Phoenix IndexTool deletes data from main table and does not populate local index
Hi,

I am using CDP 7.3.1. I am trying to create a Phoenix local index for an existing table with ~14M rows. These are the steps I followed:

1. Take a snapshot of the table from HBase (the table is salted in Phoenix).
2. Clone the snapshot to a new test table.
3. Create the table in Phoenix to link Phoenix to the new HBase table created from the cloned snapshot. I can see all rows in the Phoenix table.
4. Create the local index in async mode:

CREATE LOCAL INDEX MILESTONE_LOCAL_INDEX ON MILESTONE_TEST (h.eventTimestamp) ASYNC;

5. Run the IndexTool MR job:

./hbase org.apache.phoenix.mapreduce.index.IndexTool --data-table MILESTONE_TEST --index-table MILESTONE_LOCAL_INDEX --output-path /hbase/data/default/MILESTONE_LOCAL_INDEX_HFILE

The MR job reports success, but the row count in both the main test table and the local index is 0, and the index is marked ACTIVE. I don't see any explicit error in the MR job. What could be the issue with the MR job?

Another question on local index usage. When I tried with some 5 rows, I checked whether the local index is used:

explain select * from MILESTONE_TEST where eventTimestamp <= TO_TIMESTAMP('2022-07-25 14:03:22.559');
+---+
| PLAN |
+---+
| CLIENT 10-CHUNK 0 ROWS 0 BYTES PARALLEL 10-WAY ROUND ROBIN FULL SCAN OVER MILESTONE_TEST |
| SERVER FILTER BY H.EVENTTIMESTAMP <= TIMESTAMP '2022-07-25 14:03:22.559' |
+---+

I do not see the local index name in the explain plan, which suggests it is not being used. Does Phoenix skip the local index when the table has only a few rows? That is why I am trying with millions of rows, so that Phoenix will try to use the local index. Can I expect to see the local index name in the explain plan once the table and local index hold a few million rows?

Thanks,
Pradheep
Using a secondary index hint for a many-to-many relationship model
Hi,

We have a table where the (parentid, childid) combination is unique. We have to query it sometimes by parent id and sometimes by child id, so we have a secondary (global) index on child id. As we don't want to store all the fields again in the secondary index, we wanted to use index hinting. We also pre-split on the first field of the table as well as of the secondary index to distribute the writes. We pre-calculate Secondary_Dist_salt and write it to a column in the primary table, which is used to populate the index.

Table (TEST): Primary_Dist_salt, Parentid, Childid, other columns
Secondary index (TESTINDEX): Secondary_Dist_salt, Childid, Parentid

If I run the query below, I get the following plan. I see that the index is used, but I am worried that the full scan on the primary table will affect performance in the future, as the table may grow to several hundreds of millions of rows. I also see that Primary_Dist_salt is added implicitly when I create the secondary index without it, so I thought we have all the key columns required to do a point lookup on the primary table to fetch the remaining columns. I presume the range scan runs in parallel and hence a full scan occurs? Can you please throw some light on this? Is there some way we can avoid that full scan?

explain select /*+ INDEX(TEST TESTINDEX) */ * from TEST where Childid ='10' and Secondary_dist_salt='3k';

CLIENT 10-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER TEST
    SKIP-SCAN-JOIN TABLE 0
        CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER TESTINDEX ['3k','10']
            SERVER FILTER BY FIRST KEY ONLY
    DYNAMIC SERVER FILTER BY ("TEST.PRIMARY_DIST_SALT", "TEST.PARENTID", "TEST.CHILDID") IN (($2.$4, $2.$5, $2.$6))

Thanks,
Pradheep
Re: Phoenix Client threads
Hi Sergey,

Thanks for your response. I found the issue. Could you please throw some more light on Phoenix threads and HConnections? Is it that 1 Phoenix thread creates 30 HConnections? Assuming we are running 30 concurrent read queries, which would need 30 * 30 = 900 HConnections, but I guess we have only 256 HConnections (is there a parameter to tweak this), will the parallel scans be converted to serial scans, reducing query performance? Does increasing phoenix.query.threadPoolSize (from 128 to 256) also increase the HConnection thread pool size, and by how much? Should we definitely consider reducing the salting (from 30 to 10), considering this huge multiplication factor in the number of threads? We initially decided on 30 salt buckets because this is a write-heavy table and the total number of rows could grow to several billions.

Thanks,
Pradheep

From: <sergey.solda...@gmail.com> on behalf of Sergey Soldatov <sergeysolda...@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Tuesday, May 22, 2018 at 1:45 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Phoenix Client threads

The salting byte is calculated using a hash function over the whole row key (all PK columns). So if you use only one of the PK columns in the WHERE clause, Phoenix is unable to identify which salting byte (bucket number) should be used, so it runs scans for all salting bytes. All those threads are lightweight, mostly waiting for a response from the HBase server, so you may consider adjusting the nproc limit. Or you may decrease the number of Phoenix threads via the phoenix.query.threadPoolSize property. Decreasing the number of salt buckets works as well.
Thanks,
Sergey

On Tue, May 22, 2018 at 8:52 AM, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:

Hi,

We have a table with key (type, id1, id2) (type is the same for all rows, whereas id1 and id2 are unique per row), salted with 30 salt buckets. The load on this table is about 30 queries/sec, with each query taking ~6ms. We are using the Phoenix 4.7.0 non-thin client. We have a query like the one below:

SELECT tab.a, tab.b FROM tab WHERE tab.id1 = '1F64F5DY0J0A03692' AND tab.type = 4 AND tab.isActive = 1;

CLIENT 30-CHUNK 0 ROWS 0 BYTES PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER TAB [0,4, '1F64F5DY0J0A03692']
    SERVER FILTER BY TS.ISACTIVE = 1

Here I could see that about 30 threads are used for this query. 'type' is the same for all rows, and I thought that is the reason it looks into all the chunks to find the key, hence using 30 threads. Then I ran the same query on a similar table with the key rearranged as (id1, id2, type), also salted (30). But I still see the same 30 threads being used; I thought it could uniquely identify a row with the given id1, which should be in one of the chunks (is this due to salting, so that it does not know where the key is?).

CLIENT 30-CHUNK PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER TAB [0, '1F64F5DY0J0A03692']
    SERVER FILTER BY (TYPE = 4 AND TS.ISACTIVE = 1)

Currently I am exceeding the nproc limit set on my app server (128 Phoenix threads plus HConnection threads reaching 256 = 384 threads). Can you please throw some light on Phoenix connections and HConnections, how to reduce them to a reasonable level, and also on the query plans above? Should we consider reducing the salt number to 10 (we have 10 region servers)?

Thanks,
Pradheep
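Sergey's point about the salt byte can be sketched as follows. This is a simplified stand-in (md5 here, not Phoenix's actual SaltingUtil hash; the bucket count and id1 value are the ones from this thread): because the bucket is derived from the full concatenated row key, a WHERE clause that fixes only one PK column cannot be narrowed to a single bucket.

```python
import hashlib

# Illustrative sketch, NOT Phoenix's real SaltingUtil algorithm: the salt
# byte is a hash of the FULL concatenated row key modulo the bucket count.
SALT_BUCKETS = 30

def salt_bucket(*pk_columns: str) -> int:
    """Derive the bucket from every PK column, as Phoenix does conceptually."""
    row_key = "\x00".join(pk_columns).encode()
    return int.from_bytes(hashlib.md5(row_key).digest()[:4], "big") % SALT_BUCKETS

# The same id1 under two different 'type' values can land in different
# buckets, so a query that fixes only id1 cannot compute the bucket: the
# client must fan out one scan per bucket (30 parallel scans here).
b1 = salt_bucket("4", "1F64F5DY0J0A03692", "id2-a")
b2 = salt_bucket("5", "1F64F5DY0J0A03692", "id2-a")
```

This is why rearranging the key to (id1, id2, type) did not help: the bucket still depends on every PK column, not just the leading one.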
Phoenix Client threads
Hi,

We have a table with key (type, id1, id2) (type is the same for all rows, whereas id1 and id2 are unique per row), salted with 30 salt buckets. The load on this table is about 30 queries/sec, with each query taking ~6ms. We are using the Phoenix 4.7.0 non-thin client. We have a query like the one below:

SELECT tab.a, tab.b FROM tab WHERE tab.id1 = '1F64F5DY0J0A03692' AND tab.type = 4 AND tab.isActive = 1;

CLIENT 30-CHUNK 0 ROWS 0 BYTES PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER TAB [0,4, '1F64F5DY0J0A03692']
    SERVER FILTER BY TS.ISACTIVE = 1

Here I could see that about 30 threads are used for this query. 'type' is the same for all rows, and I thought that is the reason it looks into all the chunks to find the key, hence using 30 threads. Then I ran the same query on a similar table with the key rearranged as (id1, id2, type), also salted (30). But I still see the same 30 threads being used; I thought it could uniquely identify a row with the given id1, which should be in one of the chunks (is this due to salting, so that it does not know where the key is?).

CLIENT 30-CHUNK PARALLEL 30-WAY ROUND ROBIN RANGE SCAN OVER TAB [0, '1F64F5DY0J0A03692']
    SERVER FILTER BY (TYPE = 4 AND TS.ISACTIVE = 1)

Currently I am exceeding the nproc limit set on my app server (128 Phoenix threads plus HConnection threads reaching 256 = 384 threads). Can you please throw some light on Phoenix connections and HConnections, how to reduce them to a reasonable level, and also on the query plans above? Should we consider reducing the salt number to 10 (we have 10 region servers)?

Thanks,
Pradheep
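A back-of-envelope sketch of the thread pressure described above (all numbers are the ones from this thread; it assumes, per the reply, that a scan over a salted table fans out into roughly one scan per bucket, and that 128 is the phoenix.query.threadPoolSize in effect):

```python
# Each query over a table with B salt buckets fans out into ~B parallel
# scans, so peak client-side scan-thread demand is about
# concurrent_queries * salt_buckets, capped by the Phoenix thread pool;
# work beyond the cap queues behind the pool rather than running in parallel.
def peak_scan_threads(concurrent_queries: int, salt_buckets: int,
                      thread_pool_size: int = 128) -> int:
    demand = concurrent_queries * salt_buckets
    return min(demand, thread_pool_size)

demand = 30 * 30                     # 900 scans wanted at 30 concurrent queries
running = peak_scan_threads(30, 30)  # only 128 can run under the default pool
```

Dropping the salt buckets from 30 to 10 cuts the fan-out (and hence the demand) by a factor of three for the same query rate.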
Increase column datatype length
Hi,

I have a requirement to increase a varchar column's length from 500 to 2000. I found the thread below:

http://apache-phoenix-user-list.1124778.n5.nabble.com/Can-I-change-a-String-column-s-size-and-preserve-the-data-tt2664.html#none

How does this change the datatype length at the HBase layer? Should a major compaction be done at the HBase layer? Will the above solution work in Phoenix 4.4? The other option I see is to create a new table with the updated column and copy the data over from the old table. Are there any other options?

Thanks,
Pradheep
Re: Salt Number
Hi James,

We have a table where multiple customers can have rows. Some customers may be large and some very small in terms of number of rows. The row key is based on customerid + type + orderid. If the table is not salted, all the rows of a large customer will end up in a few regions, leading to hotspotting (a large customer is also the more frequently used one).

Thanks,
Pradheep

From: James Taylor <jamestay...@apache.org>
Sent: Friday, September 8, 2017 12:56:31 PM
To: user
Subject: Re: Salt Number

Hi Pradheep,
Would you be able to describe your use case and why you're salting? We really only recommend salting if you have write hotspotting. Otherwise, it increases the overall load on your cluster.
Thanks,
James

On Fri, Sep 8, 2017 at 9:13 AM, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:

Hi,

As the salt number cannot be changed later, what is the best number to choose, in different cases, for a cluster of 10 region servers with, say, 6 cores each? Should we consider cores when deciding the number? In some places I see the number can be in the range 1-256, and in others that it should equal the number of region servers. Can the number be a multiple of the region server count (say 20, 30, etc.) for:

- a read-heavy, large (several hundred million rows) table with range scans
- a write-heavy, large table with less frequent range scans
- a large table with a hybrid load and range scans

Thanks,
Pradheep
Optimistic lock in Phoenix
Hi,

We have a table in Phoenix 4.4 that has a modifyrevision column. When a thread updates a row, the modifyrevision has to be incremented. The only option I can see is to use UPSERT SELECT to get the latest revision, but since it is not atomic, how do we avoid dirty reads? Is there any other way this can be done?

Thanks,
Pradheep
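One common client-side pattern for this is a read-check-retry loop: read the current revision, attempt the write only if nobody incremented it in the meantime, and retry otherwise. The sketch below uses an in-memory dict as a stand-in for the table (the modifyrevision column name is the one from the question; the lock stands in for HBase's per-row atomicity, e.g. a checkAndPut, since plain Phoenix 4.4 UPSERT has no such atomicity; later Phoenix versions added UPSERT ... ON DUPLICATE KEY for this):

```python
import threading

# In-memory stand-in for the table; 'modifyrevision' is the column from the
# question. The lock emulates HBase's per-row atomic check-and-put.
table = {"row1": {"value": "a", "modifyrevision": 0}}
_row_lock = threading.Lock()

def update_if_unchanged(key: str, new_value: str, max_retries: int = 5) -> bool:
    """Optimistic update: succeed only if nobody bumped the revision meanwhile."""
    for _ in range(max_retries):
        expected = table[key]["modifyrevision"]        # 1. read current revision
        with _row_lock:                                # 2. atomic compare-and-set
            if table[key]["modifyrevision"] == expected:
                table[key] = {"value": new_value,
                              "modifyrevision": expected + 1}
                return True
        # Lost the race: another writer incremented the revision; re-read, retry.
    return False
```

A losing writer sees a stale expected revision, fails the compare, and retries against the new revision, so no increment is ever silently overwritten.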
Phoenix log
Hi,

I am doing a point lookup query by row key that returns a column from another column family (H.DATA), and it takes about 100ms through sqlline. What is a typical point lookup time? I am not sure what takes the 54ms at the beginning, any clue? Also, the times in the log do not sum to the total time returned by sqlline; I am assuming the rest is network time. The Phoenix log is below:

2017-04-26 18:47:12 DEBUG ParallelIterators:97 - Id: 4595e0a3-0d93-4ab7-b300-a5e590d7f838, Time: 54ms, Scan: {"timeRange":[0,1493232432648],"batch":-1,"startRow":"","stopRow":"\\x01","loadColumnFamiliesOnDemand":null,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":2097152,"maxVersions":1,"filter":"FilterList AND (2/2): [COLUMN_NAME IS NOT NULL, ]","caching":1}

2017-04-26 18:48:10 DEBUG ParallelIterators:97 - Id: d6ad1212-e56e-4b00-87f9-c8a4b534bbc4, Time: 4ms, Scan: {"timeRange":[0,1493232491081],"batch":-1,"startRow":"\\x00\\x00MSBO_PHOENIX_COMP_ROWKEY\\x00\\x01","stopRow":"\\x00\\x00MSBO_PHOENIX_COMP_ROWKEY\\x01","loadColumnFamiliesOnDemand":null,"totalColumns":1,"cacheBlocks":true,"families":{"0":["ALL"]},"maxResultSize":2097152,"maxVersions":1,"filter":"FilterList AND (2/2): [COLUMN_FAMILY IS NULL, ]","caching":1}

2017-04-26 18:48:10 DEBUG ParallelIterators:97 - Id: bd8f3eb7-9259-45bc-b488-89e510881816, Time: 1ms, Scan: {"timeRange":[0,1493232491022],"batch":-1,"startRow":"\\x07\\x00\\x00\\x00dPO\\x00\\x00\\x00\\x00\\x00-s\\x0C\\x04\\x00\\x00\\x00\\x07","stopRow":"\\x07\\x00\\x00\\x00dPO\\x00\\x00\\x00\\x00\\x00-s\\x0C\\x04\\x00\\x00\\x00\\x08","loadColumnFamiliesOnDemand":null,"totalColumns":2,"cacheBlocks":true,"families":{"0":["_0"],"H":["DATA"]},"maxResultSize":2097152,"maxVersions":1,"caching":1}

Thanks,
Pradheep
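For digging into where the milliseconds go, a throwaway helper like the one below (hypothetical, stdlib-only, not part of Phoenix) can pull the per-scan timings out of those ParallelIterators DEBUG lines and show which scan dominates:

```python
import re

# Matches the "Id: <id>, Time: <n>ms" fragment of ParallelIterators lines.
SCAN_RE = re.compile(r"Id: (\S+), Time: (\d+)ms")

def scan_times(log_text: str) -> dict:
    """Map each scan id to its elapsed milliseconds."""
    return {sid: int(ms) for sid, ms in SCAN_RE.findall(log_text)}

sample = (
    "DEBUG ParallelIterators:97 - Id: 4595e0a3-0d93-4ab7-b300-a5e590d7f838, Time: 54ms, Scan: {...}\n"
    "DEBUG ParallelIterators:97 - Id: bd8f3eb7-9259-45bc-b488-89e510881816, Time: 1ms, Scan: {...}\n"
)
times = scan_times(sample)
slowest = max(times, key=times.get)   # here, the 54ms scan stands out
```

Applied to the log above, the 54ms entry is a scan with an empty startRow and SYSTEM-catalog-style filters, i.e. metadata resolution rather than the data point lookup itself, which is the 1ms scan against H.DATA.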
Could not find hash cache / timeout errors
Hi,

I run a Phoenix (4.4.0, thick client) left join query from JMeter. When I run the same query from 3 to 5 parallel threads, I run into the following error:

org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache for joinId: �lÍ–LW||. The cache might have expired and have been removed.
at org.apache.phoenix.coprocessor.HashJoinRegionScanner.(HashJoinRegionScanner.java:96)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver.doPostScannerOpen(GroupedAggregateRegionObserver.java:149)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver.postScannerOpen(BaseScannerRegionObserver.java:177)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$52.call(RegionCoprocessorHost.java:1318)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1673)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1748)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1712)
at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.postScannerOpen(RegionCoprocessorHost.java:1313)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2259)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

When I increase the heap allocated to JMeter from 4 GB to 6 GB, I do not get this error. But I have to allocate memory proportional to the number of parallel threads I run.
Allocating more memory in the client application seems a costly option. I tried increasing phoenix.coprocessor.maxServerCacheTimeToLiveMs from 30 to 120 secs, and then I started getting:

org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: 89367ms passed since the last invocation, timeout is currently set to 6

The default Phoenix client timeout is 10 mins. Any idea how to resolve this?

Thanks,
Pradheep
Phoenix client config and memory
Hi,

1. When using the Phoenix thick client (4.4.0) in the application, where does the client hbase-site.xml reside? I don't see one. Or does it pull hbase-site.xml from the server before starting up? I do see the Phoenix query timeout set in the server-side hbase-site.xml, though it is a client setting.

2. When using the Phoenix thick client, does the client need extra memory allocated in general to run queries, and for operations like the client merge sort, with multiple such queries running at the same time?

Thanks,
Pradheep
Re: Phoenix query performance
Hi Arvind,

The row key is PARENTID, OWNERORGID, MILESTONETYPEID, PARENTTYPE. Each parentid has a list of MILESTONETYPEIDs (19661, 1, 2, etc.), so your query would return all the parentids. I am looking for parentids that do not have MILESTONETYPEID 19661.

Thanks,
Pradheep

From: Arvind S <arvind18...@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Thursday, February 23, 2017 at 1:19 AM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Phoenix query performance

Why can't you reduce your query to

select msbo1.PARENTID
from msbo_phoenix_comp_rowkey
where msbo1.PARENTTYPE = 'SHIPMENT'
  and msbo1.OWNERORGID = 100
  and msbo1.MILESTONETYPEID != 19661
  and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
group by msbo1.PARENTID
order by msbo1.PARENTID

?? Looks like that's what you get from this left outer join.

Cheers!!
Arvind

On 22 February 2017 at 22:58, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:

Hi,

We have an HBase cluster with 8 region servers with 20G memory each. We have a table with 1 column family, along with a secondary index.
The following query took only a few milliseconds when we had less data (< 1 million rows). After adding more data (~30M rows), performance declined to about a minute or more (not stable).

select msbo1.PARENTID
from msbo_phoenix_comp_rowkey msbo1
left outer join (
    select PARENTID, MILESTONETYPEID
    from msbo_phoenix_comp_rowkey
    where PARENTREFERENCETIME between 1479964000 and 1480464000
      and OWNERORGID = 100
      and PARENTTYPE = 'SHIPMENT'
      and MILESTONETYPEID = 19661
    group by PARENTID, MILESTONETYPEID
) msbo2 on msbo1.PARENTID = msbo2.PARENTID
where msbo1.PARENTTYPE = 'SHIPMENT'
  and msbo1.OWNERORGID = 100
  and msbo2.MILESTONETYPEID is null
  and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
group by msbo1.PARENTID
order by msbo1.PARENTID

The RHS returns about 500K rows, the LHS about 18M rows, and the final result about 500K rows. MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME is the index.

Query plan:
CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME [0,'SHIPMENT',100]
    SERVER FILTER BY FIRST KEY ONLY AND (TO_UNSIGNED_LONG("PARENTREFERENCETIME") >= 1477958400 AND TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
    SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
CLIENT MERGE SORT
    PARALLEL LEFT-JOIN TABLE 0
        CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,464,000]
            SERVER FILTER BY FIRST KEY ONLY
            SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID", "MILESTONETYPEID"]
        CLIENT MERGE SORT
    AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL

Attached the Phoenix log. I see the caching set to 100 and "maxResultSize":2097152; is that something that, if tuned, would help? Is the client merge sort what is consuming most of the time, and can it be improved?
Is there any other tuning possible? Thanks, Pradheep
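The difference between the != rewrite suggested in this thread and the left-outer-join / IS NULL formulation can be shown with a tiny in-memory sketch (toy data; 19661 is the milestone type from the query). Filtering individual rows with != keeps any parent that also has other milestone types, while the anti-join drops that parent entirely:

```python
# Toy rows: (parentid, milestonetypeid)
rows = [("p1", 19661), ("p1", 1), ("p2", 2)]

# The != rewrite: filter individual rows with MILESTONETYPEID != 19661.
ne_parents = sorted({pid for pid, mt in rows if mt != 19661})

# The original intent: parents that have NO 19661 row at all (anti-join,
# i.e. the left outer join followed by the IS NULL filter).
with_19661 = {pid for pid, mt in rows if mt == 19661}
anti_parents = sorted({pid for pid, _ in rows} - with_19661)

# ne_parents still contains p1 (via its milestone-1 row); anti_parents does not.
```

So the two queries are not equivalent whenever a parent carries both the 19661 milestone and others, which is exactly the case described.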
Re: Phoenix query performance
Hi Maryann,

Please find my answers inline.

Thanks,
Pradheep

From: Maryann Xue <maryann@gmail.com>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Wednesday, February 22, 2017 at 2:22 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Phoenix query performance

Hi Pradheep,

Thank you for posting the query and the log file! There are two things going on on the server side at the same time here. I think it'd be a good idea to isolate the problem first. So a few questions:

1. When you say data size went from "< 1M" to 30M, did the data from both LHS and RHS grow proportionately?
It is basically the same table; the query is like a self join. Yes, you can say it is proportional.

2. If yes to (1), what if we only increase the data in LHS, but keep it small for RHS? Would the query run significantly faster?
When the RHS count is 420336, the time taken is 37 seconds. When the RHS count is 63575, the time taken is 32 seconds (not a significant difference).

3. What if we only do group by on LHS? Would the query time be linear to the data size?
After removing the group by on the RHS: when the RHS count is 420336, the time taken is 34 seconds; when the RHS count is 63575, 32 seconds.

4. How was GC when running the query?
About 12ms on one RS, 10ms on another, 4-5ms on a couple of them, and less than 1ms on the rest of the region servers while the query is running.

Thanks,
Maryann

On Wed, Feb 22, 2017 at 9:28 AM, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:

Hi,

We have an HBase cluster with 8 region servers with 20G memory each. We have a table with 1 column family, along with a secondary index.
The following query took only a few milliseconds when we had less data (< 1 million rows). After adding more data (~30M rows), performance declined to about a minute or more (not stable).

select msbo1.PARENTID
from msbo_phoenix_comp_rowkey msbo1
left outer join (
    select PARENTID, MILESTONETYPEID
    from msbo_phoenix_comp_rowkey
    where PARENTREFERENCETIME between 1479964000 and 1480464000
      and OWNERORGID = 100
      and PARENTTYPE = 'SHIPMENT'
      and MILESTONETYPEID = 19661
    group by PARENTID, MILESTONETYPEID
) msbo2 on msbo1.PARENTID = msbo2.PARENTID
where msbo1.PARENTTYPE = 'SHIPMENT'
  and msbo1.OWNERORGID = 100
  and msbo2.MILESTONETYPEID is null
  and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000
group by msbo1.PARENTID
order by msbo1.PARENTID

The RHS returns about 500K rows, the LHS about 18M rows, and the final result about 500K rows. MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME is the index.

Query plan:
CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME [0,'SHIPMENT',100]
    SERVER FILTER BY FIRST KEY ONLY AND (TO_UNSIGNED_LONG("PARENTREFERENCETIME") >= 1477958400 AND TO_UNSIGNED_LONG("PARENTREFERENCETIME") <= 1480464000)
    SERVER AGGREGATE INTO DISTINCT ROWS BY ["MSBO1.:PARENTID"]
CLIENT MERGE SORT
    PARALLEL LEFT-JOIN TABLE 0
        CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER MSBO_PHOENIX_COMP_ROWKEY_INDEX_PARENTTYPE_OWNERORGID_MILESTONETYPEID_PARENTREFERENCETIME [0,'SHIPMENT',100,19661,1,477,958,400] - [0,'SHIPMENT',100,19661,1,480,464,000]
            SERVER FILTER BY FIRST KEY ONLY
            SERVER AGGREGATE INTO DISTINCT ROWS BY ["PARENTID", "MILESTONETYPEID"]
        CLIENT MERGE SORT
    AFTER-JOIN SERVER FILTER BY MSBO2.MILESTONETYPEID IS NULL

Attached the Phoenix log. I see the caching set to 100 and "maxResultSize":2097152; is that something that, if tuned, would help? Is the client merge sort what is consuming most of the time, and can it be improved? Is there any other tuning possible?

Thanks,
Pradheep
Managed splitting of Phoenix salted tables
Hi,

1. I have a couple of salted Phoenix tables. I am assuming that all the regions will grow uniformly across the region servers. Based on that, I expect region splitting to happen across all the region servers at the same time, which will impact performance once the regions get big. I am considering manual region splitting to avoid this. But given that the default split policy is IncreasingToUpperBoundRegionSplitPolicy, I cannot really disable splitting by raising hbase.hregion.max.filesize to, say, 100 GB: the effective split size is the one computed by the policy, which stays well below 100 GB for the first several splits, so automatic splitting continues. Should I consider changing the policy back to ConstantSizeRegionSplitPolicy and then effectively disabling it by setting the size to (say) 100 GB? Is there a better solution to this problem?

2. Or should I not consider manual splitting at all, given that all regions splitting at exactly the same time is unlikely?

Thanks,
Pradheep
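The threshold behaviour in question can be sketched as follows. This is a simplification of the HBase 1.x IncreasingToUpperBoundRegionSplitPolicy, where the effective limit is roughly min(max file size, 2 x flush size x R^3), with R the number of regions of the table on that region server; the 128 MB flush size is an assumed default, and the 100 GB cap is the value from the question:

```python
MB = 1024 ** 2
FLUSH_SIZE = 128 * MB             # hbase.hregion.memstore.flush.size (assumed default)
MAX_FILE_SIZE = 100 * 1024 * MB   # the 100 GB hbase.hregion.max.filesize from the question

def split_threshold(regions_on_rs: int) -> int:
    """Approximate size at which the policy triggers a split (simplified)."""
    if regions_on_rs == 0:
        return MAX_FILE_SIZE
    return min(MAX_FILE_SIZE, 2 * FLUSH_SIZE * regions_on_rs ** 3)

# 1 region -> 256 MB, 2 -> 2 GB, 3 -> ~6.75 GB, 4 -> 16 GB, ...
# The cube grows fast, but early regions still split long before 100 GB,
# which is why raising max filesize alone does not stop automatic splitting.
```

Under this policy, switching to ConstantSizeRegionSplitPolicy makes the threshold exactly hbase.hregion.max.filesize, which is why that combination is the usual way to effectively disable automatic splits.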
Phoenix tracing did not start
Hi,

I am using HDP 2.4.3.0-227. I am trying to enable Phoenix tracing to monitor the queries and analyze performance. I followed the steps outlined here: https://phoenix.apache.org/tracing.html

I placed hadoop-metrics2-hbase.properties in /etc/hbase/conf and hadoop-metrics2-phoenix.properties in /usr/hdp/2.4.3.0-227/phoenix/bin on all region servers. I also added the following properties to hbase-site.xml:

phoenix.trace.statsTableName = SYSTEM.TRACING_STATS
phoenix.trace.frequency = always

After this I am not clear where to place the DDL for SYSTEM.TRACING_STATS. Also, I could not find ./bin/traceserver.py to start the trace server.

Please advise.

Thanks,
Pradheep
Salting a secondary index
Hi,

I have an HBase table created using Phoenix which is salted. Since the queries on the table required a secondary index, I created an index using Phoenix. Can this index also be salted, which would place its rows randomly across different region servers? Even if the index is not salted, will the index still be useful when the actual table is salted? Please advise.

Thanks,
Pradheep
Re: Using Apache Pherf with HBase 1.1
Hi,

All the regions are online according to the :16010/master-status web UI. Phoenix is not able to connect properly due to the error "Caused by: java.io.IOException: hconnection-0xa59a583 closed".

Thanks,
Pradheep

From: Mujtaba Chohan <mujt...@apache.org>
Reply-To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Date: Tuesday, October 18, 2016 at 5:36 PM
To: "user@phoenix.apache.org" <user@phoenix.apache.org>
Subject: Re: Using Apache Pherf with HBase 1.1

Cannot get all table regions

Check that there are no offline regions. See the related thread here: http://apache-phoenix-user-list.1124778.n5.nabble.com/Help-w-table-that-suddenly-keeps-timing-out-tt2393.html#a2404

On Tue, Oct 18, 2016 at 2:11 PM, Pradheep Shanmugam <pradheep.shanmu...@infor.com> wrote:

Hi,

I am trying to connect Pherf to an HBase cluster running HBase 1.1 as shown below.
Could you please help me connect Pherf to an HBase cluster running v1.1?

java -Xms512m -Xmx3072m -cp "/home/ambari/pherf/phoenix/bin/../phoenix-pherf/config:/etc/hbase/conf:/home/ambari/pherf/phoenix/bin/../phoenix-client/target/phoenix-server-client-4.7.0-HBase-1.1.jar:/home/ambari/pherf/phoenix/bin/../phoenix-pherf/target/phoenix-pherf-4.8.1-HBase-1.1.jar" -Dlog4j.configuration=file:/home/ambari/pherf/phoenix/bin/log4j.properties org.apache.phoenix.pherf.Pherf -drop all -l -q -z hbase-perf-rs1 -schemaFile '.*user_defined_schema.sql' -scenarioFile '.*user_defined_scenario.xml'

And I get the exception below:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/cli/ParseException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2615)
at java.lang.Class.getMethod0(Class.java:2856)
at java.lang.Class.getMethod(Class.java:1668)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.cli.ParseException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 6 more

When I tried to connect using the command below:

java -Xms512m -Xmx3072m -cp "/home/ambari/pherf/phoenix/bin/../phoenix-pherf/config:/etc/hbase/conf:/home/ambari/pherf/phoenix/bin/../phoenix-client/target/phoenix-4.9.0-HBase-1.2-SNAPSHOT-client.jar:/home/ambari/pherf/phoenix/bin/../phoenix-pherf/target/phoenix-pherf-4.9.0-HBase-1.2-SNAPSHOT-minimal.jar" -Dlog4j.configuration=file:/home/ambari/pherf/phoenix/bin/log4j.properties org.apache.phoenix.pherf.Pherf -drop all -l -q -z hbase-perf-rs1 -schemaFile '.*user_defined_schema.sql' -scenarioFile '.*user_defined_scenario.xml'

I got the error below. I thought it could be because I am connecting to HBase 1.1 with a 1.2 client?

java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:457)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:549)
at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:542)
at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:477)
at org.apache.phoenix.iterate.BaseResultIterators.(BaseResultIterators.java:370)
at org.apache.phoenix.iterate.ParallelIterators.(ParallelIterators.java:60)
at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:218)
at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:341)
at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:206)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:290)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:270)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:269)
at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:1476)
at org.apache.phoenix.jdbc.PhoenixDatabaseMetaData.getTables(PhoenixDatabaseMetaData.java:1149)
at org.apache.phoenix.pherf.util.PhoenixUtil.getTableMetaData(PhoenixUtil.java:220)
at org.apache.phoenix.pherf.util.PhoenixUtil.d
Using Apache Pherf with HBase 1.1
Hi,

I am trying to connect Pherf to an HBase cluster running HBase 1.1 as shown below. Could you please help me connect Pherf to an HBase cluster running v1.1?

java -Xms512m -Xmx3072m -cp "/home/ambari/pherf/phoenix/bin/../phoenix-pherf/config:/etc/hbase/conf:/home/ambari/pherf/phoenix/bin/../phoenix-client/target/phoenix-server-client-4.7.0-HBase-1.1.jar:/home/ambari/pherf/phoenix/bin/../phoenix-pherf/target/phoenix-pherf-4.8.1-HBase-1.1.jar" -Dlog4j.configuration=file:/home/ambari/pherf/phoenix/bin/log4j.properties org.apache.phoenix.pherf.Pherf -drop all -l -q -z hbase-perf-rs1 -schemaFile '.*user_defined_schema.sql' -scenarioFile '.*user_defined_scenario.xml'

And I get the exception below:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/cli/ParseException
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2615)
at java.lang.Class.getMethod0(Class.java:2856)
at java.lang.Class.getMethod(Class.java:1668)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.cli.ParseException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 6 more

When I tried to connect using the command below:

java -Xms512m -Xmx3072m -cp "/home/ambari/pherf/phoenix/bin/../phoenix-pherf/config:/etc/hbase/conf:/home/ambari/pherf/phoenix/bin/../phoenix-client/target/phoenix-4.9.0-HBase-1.2-SNAPSHOT-client.jar:/home/ambari/pherf/phoenix/bin/../phoenix-pherf/target/phoenix-pherf-4.9.0-HBase-1.2-SNAPSHOT-minimal.jar" -Dlog4j.configuration=file:/home/ambari/pherf/phoenix/bin/log4j.properties org.apache.phoenix.pherf.Pherf -drop all -l -q -z hbase-perf-rs1 -schemaFile '.*user_defined_schema.sql' -scenarioFile '.*user_defined_scenario.xml'

I got the error below. I thought it could be because I am connecting to HBase 1.1 with a 1.2 client?

java.sql.SQLException: ERROR 1102 (XCL02): Cannot get all table regions.
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:457)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:549)
at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:542)
at org.apache.phoenix.iterate.BaseResultIterators.getParallelScans(BaseResultIterators.java:477)
at org.apache.phoenix.iterate.BaseResultIterators.(BaseResultIterators.java:370)
at org.apache.phoenix.iterate.ParallelIterators.(ParallelIterators.java:60)
at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:218)
at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:341)
at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:206)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:290)
at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:270)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:269)
at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:1476)
at org.apache.phoenix.jdbc.PhoenixDatabaseMetaData.getTables(PhoenixDatabaseMetaData.java:1149)
at org.apache.phoenix.pherf.util.PhoenixUtil.getTableMetaData(PhoenixUtil.java:220)
at org.apache.phoenix.pherf.util.PhoenixUtil.deleteTables(PhoenixUtil.java:192)
at org.apache.phoenix.pherf.Pherf.run(Pherf.java:234)
at org.apache.phoenix.pherf.Pherf.main(Pherf.java:188)
Caused by: java.io.IOException: hconnection-0xa59a583 closed
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(ConnectionManager.java:1685)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.isTableOnlineState(ZooKeeperRegistry.java:122)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.isTableDisabled(ConnectionManager.java:979)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1148)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.relocateRegion(ConnectionManager.java:1136)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getRegionLocation(ConnectionManager.java:957)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.getAllTableRegions(ConnectionQueryServicesImpl.java:535)