Re: Buckets VS regions
There are a few reasons, mostly around corner cases: anything from a region split to a minor compaction during the write process can leave guideposts stale. The general recommendation is to not rely on guideposts for query processing - or to disable guideposts - so that consistency is guaranteed.

On Tue, Aug 20, 2019 at 11:10 AM jesse wrote:
> Yes, that seems to do the trick; the issue is only associated with one region.
>
> What could be the causes of guideposts not being updated?
>
> On Mon, Aug 19, 2019, 6:40 PM venkata subbarayudu wrote:
>
>> Did you try updating the table statistics? It may be that the table
>> guideposts are out of sync.
>>
>> Below is the SQL to update table stats:
>> UPDATE STATISTICS <table_name>
>> By default the above executes asynchronously, so it may take some time
>> to complete depending on the table size.
>>
>> On Tue 20 Aug, 2019, 6:34 AM jesse wrote:
>>
>>> Also, the table is simple and has no indexes set up.
>>>
>>> On Mon, Aug 19, 2019, 6:03 PM jesse wrote:
>>>
>>>> We ran into some trouble; maybe someone could shed some light on this.
>>>>
>>>> The table has primary key columns c1, c2 and c3.
>>>> The table is set with SALT_BUCKETS=12. It now has 14 regions.
>>>>
>>>> The table has a record with c1='a', c2='b', c3='c'.
>>>>
>>>> select * from t where c2='b' returns some results.
>>>>
>>>> select * from t where c1='a' returns empty.
>>>> select * from t where c2='b' and c1='a' returns empty.
>>>>
>>>> select * from t where c3='c' returns the right results.
>>>> select * from t where c2='b' and c3='c' returns results.
>>>>
>>>> What the heck is going wrong? The system used to work fine.
>>>>
>>>> On Mon, Aug 19, 2019, 5:33 PM James Taylor wrote:
>>>>
>>>>> It'll start with 12 regions, but those regions may split as they're
>>>>> written to.
>>>>>
>>>>> On Mon, Aug 19, 2019 at 4:34 PM jesse wrote:
>>>>>
>>>>>> I have a table with SALT_BUCKETS = 12, but it has 14 regions - is
>>>>>> this right?
>>>>>>
>>>>>> Thanks

-- *Venkata Subbarayudu Amanchi.*
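For reference, this is what "disable guideposts" can look like in practice - a minimal sketch over JDBC, assuming a 4.x client where GUIDE_POSTS_WIDTH is settable per table and 0 turns guidepost collection off (the connection URL and MY_TABLE are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DisableGuideposts {
    public static void main(String[] args) throws Exception {
        // The ZK quorum in the URL is a placeholder for your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // GUIDE_POSTS_WIDTH = 0 disables guidepost collection for this
            // table, so queries fall back to one parallel scan per region
            // and cannot pick up stale guideposts.
            stmt.execute("ALTER TABLE MY_TABLE SET GUIDE_POSTS_WIDTH = 0");
        }
    }
}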
Re: Buckets VS regions
Did you try updating the table statistics? It may be that the table guideposts are out of sync.

Below is the SQL to update table stats:
UPDATE STATISTICS <table_name>
By default the above executes asynchronously, so it may take some time to complete depending on the table size.

On Tue 20 Aug, 2019, 6:34 AM jesse wrote:
> Also, the table is simple and has no indexes set up.
>
> On Mon, Aug 19, 2019, 6:03 PM jesse wrote:
>
>> We ran into some trouble; maybe someone could shed some light on this.
>>
>> The table has primary key columns c1, c2 and c3.
>> The table is set with SALT_BUCKETS=12. It now has 14 regions.
>>
>> The table has a record with c1='a', c2='b', c3='c'.
>>
>> select * from t where c2='b' returns some results.
>>
>> select * from t where c1='a' returns empty.
>> select * from t where c2='b' and c1='a' returns empty.
>>
>> select * from t where c3='c' returns the right results.
>> select * from t where c2='b' and c3='c' returns results.
>>
>> What the heck is going wrong? The system used to work fine.
>>
>> On Mon, Aug 19, 2019, 5:33 PM James Taylor wrote:
>>
>>> It'll start with 12 regions, but those regions may split as they're
>>> written to.
>>>
>>> On Mon, Aug 19, 2019 at 4:34 PM jesse wrote:
>>>
>>>> I have a table with SALT_BUCKETS = 12, but it has 14 regions - is this
>>>> right?
>>>>
>>>> Thanks
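A minimal sketch of running that stats refresh over JDBC (MY_TABLE and the URL are placeholders; the ALL keyword asks Phoenix to refresh statistics for the table and its indexes):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class RefreshStats {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // Recollect guideposts for the table and its indexes. The actual
            // collection runs server-side and may finish after this returns,
            // so allow some time on large tables before re-running queries.
            stmt.execute("UPDATE STATISTICS MY_TABLE ALL");
        }
    }
}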
Re: split count for mapreduce jobs with PhoenixInputFormat
You could recreate the table with the SALT_BUCKETS table option to get a reasonable number of regions, and you could try a secondary index to make the query run faster in case your MapReduce job performs specific filters.

On Thu 31 Jan, 2019, 12:09 AM Thomas D'Silva wrote:
> If stats are enabled, PhoenixInputFormat will generate a split per
> guidepost.
>
> On Wed, Jan 30, 2019 at 7:31 AM Josh Elser wrote:
>
>> You can extend/customize the PhoenixInputFormat with your own code to
>> increase the number of InputSplits and Mappers.
>>
>> On 1/30/19 6:43 AM, Edwin Litterst wrote:
>> > Hi,
>> > I am using PhoenixInputFormat as the input source for MapReduce jobs.
>> > The split count (which determines how many mappers are used for the job)
>> > is always equal to the number of regions of the table from which I
>> > select the input.
>> > Is there a way to increase the number of splits? My job is running too
>> > slowly with only one mapper for every region.
>> > (Increasing the number of regions is not an option.)
>> > regards,
>> > Eddie
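Tying Thomas's point together: since PhoenixInputFormat generates one split per guidepost when stats are enabled, shrinking the guidepost width is one way to get more mappers without adding regions. A hedged sketch (MY_TABLE, the URL, and the 10 MB width are made-up values; GUIDE_POSTS_WIDTH is the per-table stats property):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MoreInputSplits {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement()) {
            // A smaller guidepost width means more guideposts per region,
            // and therefore more InputSplits (and mappers) for the MR job.
            stmt.execute("ALTER TABLE MY_TABLE SET GUIDE_POSTS_WIDTH = " + (10L * 1024 * 1024));
            // Recollect stats so the new width takes effect.
            stmt.execute("UPDATE STATISTICS MY_TABLE ALL");
        }
    }
}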
Phoenix Delete Query - Hbase rpc timeout relation
Hello Team,

How does a *Phoenix delete query* use the *hbase.rpc.timeout*, and can you please point to any document illustrating the difference between a DELETE query execution vs an UPSERT query execution? I've noticed different behaviour between an *UPSERT* query and a *DELETE* query. Please find the details below:

hbase.rpc.timeout = 120000 (2 min)
phoenix.query.keepAliveMs = 120000 (2 min)
phoenix.query.timeoutMs = 1200000 (20 min)
phoenix-auto-commit = true

An upsert query that runs for more than 2 minutes and less than 20 minutes goes fine (a test query ran for about 12 minutes), but a delete query that runs for more than 2 minutes throws rpc-timeout errors, and when the table is checked, a few records did get deleted. When the rpc timeout was increased to match the phoenix query timeout, the delete query ran fine. Hence I want to understand how to set the rpc-timeout + phoenix-timeout values for a large table delete, and whether the delete is expected to take increasing time as the table grows (trying secondary indexing is one option to keep the timeouts lower, but I assume the delete time grows even with secondary indexes).

Thank you for your support
-- *Venkata Subbarayudu Amanchi.*
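For anyone reproducing this, a minimal sketch of passing the client-side timeouts on the Phoenix connection (the URL, table, and WHERE clause are placeholders; the property names are the standard HBase/Phoenix client settings discussed above - note that hbase.client.operation.timeout usually has to cover the RPCs as well):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class LargeDelete {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("hbase.rpc.timeout", "1200000");              // 20 min per RPC
        props.setProperty("hbase.client.operation.timeout", "1200000"); // 20 min per operation
        props.setProperty("phoenix.query.timeoutMs", "1200000");        // 20 min per statement

        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181", props)) {
            conn.setAutoCommit(true); // mirror the phoenix-auto-commit=true setup above
            try (Statement stmt = conn.createStatement()) {
                int deleted = stmt.executeUpdate(
                    "DELETE FROM MY_TABLE WHERE CREATED_DATE < TO_DATE('2019-01-01')");
                System.out.println("Deleted rows: " + deleted);
            }
        }
    }
}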
Phoenix Metrics - How to understand
Hello Team,

Phoenix provides metrics for insight into the execution of a SQL query, both at the individual SQL and client JVM level. I want to check if there is any document explaining what each metric means and what indicates that a query execution went fine or had issues.

For example, below are the queries that were fired in sequence using a standalone program, along with the metrics response, in an HBase cluster with Phoenix (HBase 1.1.2 + Phoenix 4.8.1).

Query Start Time: 2018-09-04 09:10:58,536
Query End Time: 2018-09-04 09:12:52,932
Time taken to query: ~2 minutes

Below are the Phoenix metrics.

*Mutation Write Metrics*
*PhoenixRuntime.getWriteMetricsForMutationsSinceLastReset(connection)* >>>
{UPSRT_TABLE={MUTATION_COMMIT_TIME=9325278, MUTATION_BATCH_SIZE=8097788, MUTATION_BYTES=23708733912}}

*Mutation Read Metrics*
*PhoenixRuntime.getReadMetricsForMutationsSinceLastReset(connection)* >>>
{SLET_TABLE={TASK_QUEUE_WAIT_TIME=29384207, SCAN_BYTES=17973054976, TASK_END_TO_END_TIME=41399954, TASK_EXECUTED_COUNTER=712, TASK_EXECUTION_TIME=12015738, TASK_REJECTED_COUNTER=0}}

*Global Metrics*
*PhoenixRuntime.getGlobalPhoenixClientMetrics()* >>>
MUTATION_BATCH_SIZE :: 8097788
MUTATION_BYTES :: 23708733912
MUTATION_COMMIT_TIME :: 9325278
QUERY_TIME :: 0
NUM_PARALLEL_SCANS :: 712
SCAN_BYTES :: 17973054976
SPOOL_FILE_SIZE :: 0
MEMORY_CHUNK_BYTES :: 0
MEMORY_WAIT_TIME :: 0
TASK_QUEUE_WAIT_TIME :: 29384207
TASK_END_TO_END_TIME :: 41399954
TASK_EXECUTION_TIME :: 12015738
MUTATION_SQL_COUNTER :: 95
SELECT_SQL_COUNTER :: 0
TASK_EXECUTED_COUNTER :: 712
TASK_REJECTED_COUNTER :: 0
QUERY_TIMEOUT_COUNTER :: 0
QUERY_FAILED_COUNTER :: 0
SPOOL_FILE_COUNTER :: 0
OPEN_PHOENIX_CONNECTIONS_COUNTER :: 1

*Q: How to understand the metrics below?*
TASK_QUEUE_WAIT_TIME = 29384207 milliseconds (~490 minutes)
TASK_EXECUTION_TIME = 12015738 milliseconds (~200 minutes)
But the query itself completed in ~2 minutes.

It was also noticed that OPEN_PHOENIX_CONNECTIONS_COUNTER kept increasing by 10 for each query fired subsequently, i.e. "OPEN_PHOENIX_CONNECTIONS_COUNTER = 11", "OPEN_PHOENIX_CONNECTIONS_COUNTER = 21", etc., though there were no other noticeable logs indicating an error. What does this mean - were multiple attempts tried internally, and are the connections opened by these not being cleared?

-- *Venkata Subbarayudu Amanchi.*
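For context, a minimal sketch of the pattern that produces these dumps (the URL and query are placeholders; the three PhoenixRuntime methods are the ones quoted above). On the TASK_* numbers: Phoenix runs the scans in parallel - note NUM_PARALLEL_SCANS = 712 - and the task timers are summed across all of those worker tasks, which is likely why they can far exceed the ~2-minute wall-clock time:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.phoenix.util.PhoenixRuntime;

public class MetricsDump {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM MY_TABLE")) {
            while (rs.next()) {
                System.out.println("count = " + rs.getLong(1));
            }
            // Per-connection mutation metrics, keyed by table name.
            System.out.println(PhoenixRuntime.getWriteMetricsForMutationsSinceLastReset(conn));
            System.out.println(PhoenixRuntime.getReadMetricsForMutationsSinceLastReset(conn));
            // JVM-wide counters (global client metrics must be enabled).
            System.out.println(PhoenixRuntime.getGlobalPhoenixClientMetrics());
        }
    }
}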
Phoenix Metrics - How to understand
Phoenix provides metrics for insight into the execution of a SQL query, both at the individual SQL and client JVM level. Is there any document explaining what each metric means and what indicates that a query execution went fine or had issues?

For example, below are the queries that were fired in sequence using a standalone program, with the metrics response, in an HBase cluster (hbase-1.1.2 & Phoenix-4.8).

Query Start Time: 2018-09-04 09:10:58,536
Query End Time: 2018-09-04 09:12:52,932
Time taken to query: ~2 minutes

Below are the Phoenix metrics.

*Mutation Write Metrics*
*PhoenixRuntime.getWriteMetricsForMutationsSinceLastReset(connection)* >>>
{UPSRT_TABLE={MUTATION_COMMIT_TIME=9325278, MUTATION_BATCH_SIZE=8097788, MUTATION_BYTES=23708733912}}

*Mutation Read Metrics*
*PhoenixRuntime.getReadMetricsForMutationsSinceLastReset(connection)* >>>
{SLET_TABLE={TASK_QUEUE_WAIT_TIME=29384207, SCAN_BYTES=17973054976, TASK_END_TO_END_TIME=41399954, TASK_EXECUTED_COUNTER=712, TASK_EXECUTION_TIME=12015738, TASK_REJECTED_COUNTER=0}}

*Global Metrics*
*PhoenixRuntime.getGlobalPhoenixClientMetrics()* >>>
MUTATION_BATCH_SIZE :: 8097788
MUTATION_BYTES :: 23708733912
MUTATION_COMMIT_TIME :: 9325278
QUERY_TIME :: 0
NUM_PARALLEL_SCANS :: 712
SCAN_BYTES :: 17973054976
SPOOL_FILE_SIZE :: 0
MEMORY_CHUNK_BYTES :: 0
MEMORY_WAIT_TIME :: 0
TASK_QUEUE_WAIT_TIME :: 29384207
TASK_END_TO_END_TIME :: 41399954
TASK_EXECUTION_TIME :: 12015738
MUTATION_SQL_COUNTER :: 95
SELECT_SQL_COUNTER :: 0
TASK_EXECUTED_COUNTER :: 712
TASK_REJECTED_COUNTER :: 0
QUERY_TIMEOUT_COUNTER :: 0
QUERY_FAILED_COUNTER :: 0
SPOOL_FILE_COUNTER :: 0
OPEN_PHOENIX_CONNECTIONS_COUNTER :: 1

*Q: How to understand the metrics below?*
TASK_QUEUE_WAIT_TIME = 29384207 milliseconds (~490 minutes)
TASK_EXECUTION_TIME = 12015738 milliseconds (~200 minutes)

It was also noticed that OPEN_PHOENIX_CONNECTIONS_COUNTER kept increasing by 10 for each query fired subsequently, i.e. "OPEN_PHOENIX_CONNECTIONS_COUNTER = 11", "OPEN_PHOENIX_CONNECTIONS_COUNTER = 21", etc., though there were no other noticeable logs indicating an error. What does this mean?

-- *Venkata Subbarayudu Amanchi.*
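One thing worth ruling out for the OPEN_PHOENIX_CONNECTIONS_COUNTER growth (a hypothetical sketch, not taken from the original program): every Connection the client opens - including any opened per query or per worker thread - must be closed, or the counter keeps climbing. try-with-resources guarantees that even when a query throws:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ClosedConnections {
    static void runQuery(String sql) throws Exception {
        // ResultSet, Statement and Connection are all closed on exit from
        // this block, so OPEN_PHOENIX_CONNECTIONS_COUNTER should return to
        // its baseline after each call rather than growing query by query.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                // consume results here
            }
        }
    }

    public static void main(String[] args) throws Exception {
        runQuery("SELECT COUNT(*) FROM MY_TABLE"); // MY_TABLE is a placeholder
    }
}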
Re: Why the bulkload does not support update index data?
Is the Phoenix bulk loader not updating the index at all times, or only under specific scenarios?

On Thu, Jan 18, 2018 at 7:51 AM, cloud.pos...@gmail.com <cloud.pos...@gmail.com> wrote:
> The index data will be dirty if bulkload can not update index data.

-- *Venkata Subbarayudu Amanchi.*
Re: using index with "or" query
Hi Noam,

You can evaluate the approach below:

- Create a temp table with your target schema for the result set.
- Have indexes created for all your OR-clause queries.
- Fire one query at a time (each with one OR clause) and dump the data into the target temp table (later queries update records that are already present).
- Read the data from the target temp table after firing all the queries, then drop the temp table.

Though this approach results in firing more queries, it can complete in less time than firing the queries without using an index. (A sketch of the related UNION ALL + DISTINCT variant discussed below appears after the quoted thread.)

On Wed, Sep 20, 2017 at 9:31 PM, James Taylor wrote:
> If you're using a local index, then the hint isn't necessary. However, a
> covered local index may improve performance (at the expense of extra space)
> by avoiding a local get to retrieve the data row (when a column not in
> the index is referenced).
>
> Phoenix will only use a single index. The only exception is with UNION ALL.
>
> Thanks,
> James
>
> On Tue, Sep 19, 2017 at 12:27 PM Sapir, Yoav wrote:
>
>> Hi James,
>> Thank you for the fast reply.
>>
>> 1. As far as we saw in the Phoenix documentation, UNION is not supported,
>> only UNION ALL. Breaking the query into multiple queries with UNION ALL
>> will return duplicates. It may be possible to wrap these queries with
>> SELECT DISTINCT, but it significantly complicates the queries and will
>> have a performance impact.
>> 2. In the case of select ... where a='xyz' or b='123',
>> I don't see how adding column b as a covered column to the index on
>> column a will help for finding a row such as (a: 'abc', b: '123').
>> It will help only for a row such as (a: 'xyz', b: '123').
>> 3. A hint on a global index works for a single index. Is there a way to
>> use multiple indexes - a hint for using multiple indexes?
>> A hint on a local index causes an unknown-field error if the where clause
>> refers to a field in the index and to another field that is not part of
>> that local index. There is an open bug on it.
>>
>> BR,
>> Yoav Sapir
>>
>> On 19 Sep 2017, at 18:21, James Taylor wrote:
>>
>> Hi Noam,
>> A few ideas:
>> 1) Use a UNION instead of an OR and you may be able to use more than one
>> index for one query.
>> 2) Include the columns you're referencing in the index to make it a
>> covered index [1].
>> 3) Hint [2] the query to force the index to be used.
>>
>> Thanks,
>> James
>>
>> [1] http://phoenix.apache.org/secondary_indexing.html#Index_Usage (#1)
>> [2] http://phoenix.apache.org/secondary_indexing.html#Index_Usage (#2)
>>
>> On Tue, Sep 19, 2017 at 4:21 AM Bulvik, Noam wrote:
>>
>>> Hi,
>>>
>>> We have a case where we have a table with a few indexes on different
>>> columns a, b, c, etc. It works well if we do a select with an "and"
>>> condition (for example, select ... where a='xyz' and b='123'), but when
>>> we have an "or" condition (for example, select ... where a='xyz' or
>>> b='123') we get a full scan even though we have an index on a and on b.
>>>
>>> Is there a way to get this query to use indexes and not a full scan,
>>> besides creating an index on every available column combination (index
>>> on a+b, index on a+c, ...)?
>>>
>>> Regards,
>>>
>>> *Noam*

-- *Venkata Subbarayudu Amanchi.*
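For completeness, a hedged sketch of the single-statement variant Yoav mentions - SELECT DISTINCT wrapped around a UNION ALL - using Noam's columns a and b (pk and the connection URL are made up; this assumes your Phoenix version supports DISTINCT over a UNION ALL derived table, which, as Yoav notes, carries a performance cost):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OrViaUnionAll {
    public static void main(String[] args) throws Exception {
        // Each arm of the UNION ALL can be served by its own index; the
        // DISTINCT over the derived table removes rows matched by both arms.
        String sql =
            "SELECT DISTINCT pk, a, b FROM ("
          + "  SELECT pk, a, b FROM t WHERE a = 'xyz'"
          + "  UNION ALL"
          + "  SELECT pk, a, b FROM t WHERE b = '123'"
          + ") u";
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}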
Re: phoenix upsert select query fails with : java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException
Below is the complete stack trace:

INFO [main] org.apache.phoenix.iterate.BaseResultIterators: Failed to execute task during cancel
java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 45
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.phoenix.iterate.BaseResultIterators.close(BaseResultIterators.java:882)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:819)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:721)
    at org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
    at org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
    at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
    at org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:815)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:344)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:332)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:331)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1423)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
*Caused by: java.lang.ArrayIndexOutOfBoundsException: 45*
    at org.apache.phoenix.util.ByteUtil.vlongFromBytes(ByteUtil.java:341)
    at org.apache.phoenix.util.ByteUtil.vintFromBytes(ByteUtil.java:320)
    at org.apache.phoenix.schema.KeyValueSchema.next(KeyValueSchema.java:209)
    at org.apache.phoenix.schema.KeyValueSchema.iterator(KeyValueSchema.java:165)
    at org.apache.phoenix.schema.KeyValueSchema.iterator(KeyValueSchema.java:171)
    at org.apache.phoenix.schema.KeyValueSchema.iterator(KeyValueSchema.java:175)
    at org.apache.phoenix.expression.ProjectedColumnExpression.evaluate(ProjectedColumnExpression.java:114)
    at org.apache.phoenix.expression.LongAddExpression.evaluate(LongAddExpression.java:43)
    at org.apache.phoenix.expression.CoerceExpression.evaluate(CoerceExpression.java:148)
    at org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:69)
    at org.apache.phoenix.jdbc.PhoenixResultSet.getBytes(PhoenixResultSet.java:308)
    at org.apache.phoenix.compile.UpsertCompiler.upsertSelect(UpsertCompiler.java:197)
    at org.apache.phoenix.compile.UpsertCompiler.access$000(UpsertCompiler.java:115)
    at org.apache.phoenix.compile.UpsertCompiler$UpsertingParallelIteratorFactory.mutate(UpsertCompiler.java:259)
    at org.apache.phoenix.compile.MutatingParallelIteratorFactory.newIterator(MutatingParallelIteratorFactory.java:59)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:114)
    at org.apache.phoenix.iterate.ParallelIterators$1.call(ParallelIterators.java:106)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at org.apache.phoenix.job.JobManager$InstrumentedJobFutureTask.run(JobManager.java:183)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

On Mon, Dec 12, 2016 at 1:43 AM, Josh Elser wrote:
> What's the rest of the stacktrace? You cut off the cause.
>
> venkata subbarayudu wrote:
>
>> I faced a strange issue: a Phoenix HBase upsert query fails with an
>> ArrayIndexOutOfBounds exception.
>>
>> The query looks like:
>>   upsert into table (pk, col1, col2, col3)
>>   select a.pk, a.col1, a.col2, a.col3
>>   from table a inner join table b on a.pk = b.pk
>>   where b.col2 = 'value'
>>
>> We use hbase-1.1.2 with phoenix-4.8.1.
>>
>> Below is the stack trace of the error
>> ---
>> [main] org.apache.phoenix.iterate.BaseResultIterators: Failed to execute
>> task during cancel
>> java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException
>>     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>>
phoenix upsert select query fails with : java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException
I faced a strange issue: a Phoenix HBase upsert query fails with an ArrayIndexOutOfBounds exception.

The query looks like:

upsert into table (pk, col1, col2, col3)
select a.pk, a.col1, a.col2, a.col3
from table a inner join table b on a.pk = b.pk
where b.col2 = 'value'

We use hbase-1.1.2 with phoenix-4.8.1.

Below is the stack trace of the error
---
[main] org.apache.phoenix.iterate.BaseResultIterators: Failed to execute task during cancel
java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.phoenix.iterate.BaseResultIterators.close(BaseResultIterators.java:882)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:819)
    at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:721)
    at org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
    at org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
    at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
    at org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:815)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:344)
    at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:332)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:331)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1423)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.ArrayIndexOutOfBoundsException

But on retrying the same query, it succeeded. I'm trying to understand the root cause of the problem and how to resolve it. Any help is much appreciated.

-- *Venkata Subbarayudu Amanchi.*