[jira] [Commented] (PHOENIX-2896) Support encoded column qualifiers per column family
[ https://issues.apache.org/jira/browse/PHOENIX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488565#comment-16488565 ]

Samarth Jain commented on PHOENIX-2896:
---

[~tdsilva] - I am not sure I understood your question. We use the default family name for tracking column qualifier counters for mutable tables.

{code:java}
if (immutableStorageScheme == SINGLE_CELL_ARRAY_WITH_OFFSETS && encodingScheme != NON_ENCODED_QUALIFIERS) {
    // For this scheme we track column qualifier counters at the column family level.
    cqCounterFamily = colDefFamily != null ? colDefFamily : (defaultFamilyName != null ? defaultFamilyName : DEFAULT_COLUMN_FAMILY);
} else {
    // For other schemes, column qualifier counters are tracked using the default column family.
    cqCounterFamily = defaultFamilyName != null ? defaultFamilyName : DEFAULT_COLUMN_FAMILY;
}
{code}

> Support encoded column qualifiers per column family
> ---
>
> Key: PHOENIX-2896
> URL: https://issues.apache.org/jira/browse/PHOENIX-2896
> Project: Phoenix
> Issue Type: Sub-task
> Reporter: Thomas D'Silva
> Assignee: Samarth Jain
> Priority: Major
> Fix For: 4.10.0
>
> This allows us to reduce the number of null values in the stored array that
> contains all columns for a given column family for the
> COLUMNS_STORED_IN_SINGLE_CELL Storage Scheme.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
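The selection rule in the comment above can be tried out in isolation. The following is only a minimal sketch: the class `CqCounterFamilySketch`, its two enums and the `"0"` default family value are local stand-ins chosen for illustration, not Phoenix classes, and the method merely mirrors the quoted conditional.

```java
/** Stand-alone sketch; the enums below are local stand-ins, not Phoenix classes. */
final class CqCounterFamilySketch {
    enum StorageScheme { ONE_CELL_PER_COLUMN, SINGLE_CELL_ARRAY_WITH_OFFSETS }
    enum EncodingScheme { NON_ENCODED_QUALIFIERS, TWO_BYTE_QUALIFIERS }

    // Assumed placeholder for the built-in default column family name.
    static final String DEFAULT_COLUMN_FAMILY = "0";

    /** Returns the family under which the column qualifier counter is tracked. */
    static String counterFamily(StorageScheme storage, EncodingScheme encoding,
                                String colDefFamily, String defaultFamilyName) {
        // Table-level fallback: the declared default family, else the built-in one.
        String tableDefault = defaultFamilyName != null ? defaultFamilyName : DEFAULT_COLUMN_FAMILY;
        boolean perFamily = storage == StorageScheme.SINGLE_CELL_ARRAY_WITH_OFFSETS
                && encoding != EncodingScheme.NON_ENCODED_QUALIFIERS;
        // Only the single-cell scheme with encoded qualifiers counts per column family.
        if (perFamily && colDefFamily != null) {
            return colDefFamily;
        }
        return tableDefault;
    }

    public static void main(String[] args) {
        System.out.println(counterFamily(StorageScheme.SINGLE_CELL_ARRAY_WITH_OFFSETS,
                EncodingScheme.TWO_BYTE_QUALIFIERS, "CF1", null));
        System.out.println(counterFamily(StorageScheme.ONE_CELL_PER_COLUMN,
                EncodingScheme.TWO_BYTE_QUALIFIERS, "CF1", "D"));
    }
}
```

With these stand-ins, a column declared in family CF1 gets its own counter only under the single-cell scheme; in every other case the counter lives under the table's default family.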
[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table
[ https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16451462#comment-16451462 ]

Samarth Jain commented on PHOENIX-4701:
---

I haven't looked closely at the original commit, [~jamestaylor]. But do you think we could run into some kind of infinite loop by using the Phoenix API to write to the SYSTEM.LOG table? If so, we may need to do something similar to what our tracing framework does, where it makes sure that writes to the SYSTEM.TRACE table do not generate traces themselves.

> Improve schema of SYSTEM.LOG table
> ---
>
> Key: PHOENIX-4701
> URL: https://issues.apache.org/jira/browse/PHOENIX-4701
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: James Taylor
> Priority: Major
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-4701_wip1.patch, PHOENIX-4701_wip2.patch
>
> If possible, the SYSTEM.LOG table would benefit greatly (3-5x perf gain)
> from being declared as immutable with a column encoding of 1 byte and a
> storage format of SINGLE_CELL_ARRAY_WITH_OFFSETS.
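To make the infinite-loop concern concrete, here is a rough sketch of one possible guard. It is not how the tracing framework or the query logger is actually implemented; `LogWriteGuardSketch`, its `execute` method and the way the target table is passed in are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical guard: statements that target the log table are never logged themselves. */
final class LogWriteGuardSketch {
    static final String LOG_TABLE = "SYSTEM.LOG";

    // Records every statement that was "run", in order, so the effect is observable.
    final List<String> executed = new ArrayList<>();

    void execute(String sql, String targetTable) {
        executed.add(sql);
        if (LOG_TABLE.equals(targetTable)) {
            // Without this early return, each log write would trigger another log write.
            return;
        }
        execute("UPSERT log entry for [" + sql + "]", LOG_TABLE);
    }

    public static void main(String[] args) {
        LogWriteGuardSketch client = new LogWriteGuardSketch();
        client.execute("SELECT * FROM T", "T");
        System.out.println(client.executed);
    }
}
```

A user query produces exactly one follow-up write to the log table, and that write produces none, so the recursion stops after one level.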
[jira] [Commented] (PHOENIX-4366) Rebuilding a local index fails sometimes
[ https://issues.apache.org/jira/browse/PHOENIX-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432792#comment-16432792 ] Samarth Jain commented on PHOENIX-4366: --- Ah, I see! Thanks for the explanation, [~sergey.soldatov]. > Rebuilding a local index fails sometimes > > > Key: PHOENIX-4366 > URL: https://issues.apache.org/jira/browse/PHOENIX-4366 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Marcin Januszkiewicz >Assignee: James Taylor >Priority: Blocker > Fix For: 4.14.0 > > Attachments: PHOENIX-4366_v1.patch > > > We have a table created in 4.12 with the new column encoding scheme and with > several local indexes. Sometimes when we issue an ALTER INDEX ... REBUILD > command, it fails with the following exception: > {noformat} > Error: org.apache.phoenix.exception.PhoenixIOException: > org.apache.hadoop.hbase.DoNotRetryIOException: > TRACES,\x01BY01O90A6-$599a349e,1509979836322.3f > 30c9d449ed6c60a1cda6898f766bd0.: null > > > at > org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96) > > > at > org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62) > > > at > org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:255) > > at > org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:284) > > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2541) > > > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648) > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2183) > > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > > > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183) > > > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163) > > > Caused by: 
java.lang.UnsupportedOperationException > > > at > org.apache.phoenix.schema.PTable$QualifierEncodingScheme$1.decode(PTable.java:247) > > > at > org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList.java:141) > > at > org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList.java:56) > > at > org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:560) > > > at > org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147) > > > at > org.apache.hadoop.hbase.regionserver.HRegio
[jira] [Commented] (PHOENIX-4366) Rebuilding a local index fails sometimes
[ https://issues.apache.org/jira/browse/PHOENIX-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16431669#comment-16431669 ]

Samarth Jain commented on PHOENIX-4366:
---

I was motivated by just getting hold of the column encoding related values once in preScannerOpen and reusing them across the board (instead of having to fetch them from the scan context every time). I made this change on the assumption that every region gets its own coprocessor instance. Or is it one instance per region server? If the former, why is it problematic to store these values as member variables, since their scope should be limited to the table region?

> Rebuilding a local index fails sometimes
> ---
>
> Key: PHOENIX-4366
> URL: https://issues.apache.org/jira/browse/PHOENIX-4366
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.12.0
> Reporter: Marcin Januszkiewicz
> Assignee: James Taylor
> Priority: Blocker
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4366_v1.patch
>
> We have a table created in 4.12 with the new column encoding scheme and with
> several local indexes. Sometimes when we issue an ALTER INDEX ...
REBUILD > command, it fails with the following exception: > {noformat} > Error: org.apache.phoenix.exception.PhoenixIOException: > org.apache.hadoop.hbase.DoNotRetryIOException: > TRACES,\x01BY01O90A6-$599a349e,1509979836322.3f > 30c9d449ed6c60a1cda6898f766bd0.: null > > > at > org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96) > > > at > org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62) > > > at > org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:255) > > at > org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:284) > > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2541) > > > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648) > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2183) > > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112) > > > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183) > > > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163) > > > Caused by: java.lang.UnsupportedOperationException > > > at > org.apache.phoenix.schema.PTable$QualifierEncodingScheme$1.decode(PTable.java:247) > > > at > org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList.java:141) > > at > org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList.java:56) > > at > org.apache.hadoop.hbase.regionserver.StoreScanner.
Re: [DISCUSS] Design for a "query log"
A couple more points which I think you alluded to, Josh, but which I would still like to call out:

1) Writing these query logs to a Phoenix table should be best effort, i.e. a query definitely shouldn't fail because we encountered an issue while writing its log.
2) Writing of query logs should happen asynchronously to the flow of the query, i.e. a query shouldn't incur the cost of the write to the query log table.

Doing 2) will help with 1).

On Fri, Mar 2, 2018 at 2:28 PM, Josh Elser wrote:

> Thanks Nick and Andrew! These are great points.
>
> * A TTL out of the box is a must. That's such a good suggestion
> * Sensitivity of data being stored is also a tricky-serious issue to
> consider. We'll want to lock the table down and be able to state very
> clearly what data may show up in it.
> * I like the "levels" of detail that will be persisted. It will help break
> up the development work (e.g. first impl can just be the INFO details), and
> prevents concern of runtime impact.
> * Sampling is a no-brainer for "always-on" situations. I like that too.
>
> I'll work on taking these (and others) and updating the gdoc tonight.
> Thanks again for your feedback!
>
>
> On 3/2/18 1:50 PM, Andrew Purtell wrote:
>
>> Agree with Nick's points but let me augment with an additional suggestion:
>> Tunable/configurable threshold for sampling. In many cases it's sufficient
>> to sample e.g. 1% of queries to get sufficient coverage and this would
>> prune 99% of actual load from the query log.
>>
>> Also let me underline that compliance requirements will require either
>> super strong controls of the query log if everything is always logged, in
>> which case it is important that it works well with access control features
>> to lock it down; or better what Nick suggests where we can turn off things
>> like logging the values supplied for bound parameters.
>>
>>
>>
>> On Fri, Mar 2, 2018 at 8:41 AM, Nick Dimiduk wrote:
>>
>>> I'm a big fan of this idea.
There was a brief discussion on the topic over >>> on PHOENIX-2715. >>> >>> My first concern is that the collected information is huge -- easily far >>> larger than the user data for a busy cluster. For instance, a couple 10's >>> of GB stored user data, guideposts set to default 100mb, enable salting >>> on >>> a table with an "innocent" value of 10 or 20 and the collection of RPCs >>> can >>> easily grow into the hundreds for simple queries. Even if you catalog >>> just >>> the "logical" RPC's - HBase Client API calls that Phoenix plans rather >>> than >>> the underlying HBase Client RPCs - this will be quite large. The >>> guidepost >>> themselves for such a table would be on the order of 30mb. >>> >>> My next concern is about the sensitive query parameters being stored. >>> It's >>> entirely reasonable to expect a table to store sensitive information that >>> should not be exposed to operations. >>> >>> Thus, my suggestions: >>> * minimize the unbounded nature of this table by truncating all columns >>> to >>> some max length -- perhaps 5k or 10k. >>> * enable a default TTL on the schema. 7 days seems like a good starting >>> point. >>> * consider controlling which columns are populated via some operational >>> mechanism. Use Logger level as an example, with INFO the default setting. >>> Which data is stored at this level? Then at DEBUG, then TRACE. Maybe >>> timestamp, SQL, and explain are at INFO. DEBUG adds bound parameters and >>> scan metrics. TRACE adds RPCs and timing, snapshot metadata. >>> >>> Thanks, >>> Nick >>> >>> On Mon, Feb 26, 2018 at 1:57 PM, Josh Elser wrote: >>> >>> Hiya, I wanted to share this little design doc with you about some feature work we've been thinking about. The following is a Google doc in which anyone should be allowed to comment. Feel free to comment there, or here on the thread. 
https://s.apache.org/phoenix-query-log

The high-level goal is to create a construct in which Phoenix clients will automatically serialize information about the queries they run to a table for retrospective analysis. Ideally, this information would be stored in a Phoenix table. We want this data to help answer questions like:

* What queries are running against my system
* What specific queries started between 5:35 AM and 6:20 AM two days ago
* What queries are user "bob" running
* Are my user's queries effectively using the indexes in the system

Anti-goals include:

* Cluster impact (computation/memory) usage of a query
* Query performance may be slowed to ensure all data is serialized
* A third-party service dedicated to ensuring query info is serialized (in the event of client failure)

Take a look at the document and let us know what you think please. I'm happy to try to explain this in greater detail.
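The two requirements raised at the top of this thread (best-effort and asynchronous log writes), together with the sampling threshold suggested in the quoted replies, could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class `BestEffortQueryLogSketch`, the bounded in-memory queue, and the `Consumer<String>` standing in for whatever would eventually upsert into the log table are not part of the design doc.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical best-effort, asynchronous, sampled query log buffer. */
final class BestEffortQueryLogSketch {
    private final BlockingQueue<String> pending;
    private final double sampleRate;               // 0.01 would log about 1% of queries
    private final AtomicLong dropped = new AtomicLong();

    BestEffortQueryLogSketch(int capacity, double sampleRate) {
        this.pending = new ArrayBlockingQueue<>(capacity);
        this.sampleRate = sampleRate;
    }

    /** Called on the query path: never blocks and never throws because of logging. */
    boolean offer(String entry) {
        if (ThreadLocalRandom.current().nextDouble() >= sampleRate) {
            return false;                          // sampled out, not an error
        }
        boolean accepted = pending.offer(entry);   // non-blocking; false when the queue is full
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** Called from a background thread: write failures are counted, not propagated. */
    int drainTo(Consumer<String> sink) {
        int written = 0;
        String entry;
        while ((entry = pending.poll()) != null) {
            try {
                sink.accept(entry);
                written++;
            } catch (RuntimeException e) {
                dropped.incrementAndGet();
            }
        }
        return written;
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

The query thread only ever pays for a non-blocking `offer`; a full queue or a failing sink costs a log entry, never the query. A real implementation would also have to decide how the background drain is scheduled and shut down, which this sketch leaves out.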
[jira] [Commented] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled
[ https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372463#comment-16372463 ] Samarth Jain commented on PHOENIX-4625: --- +1 > memory leak in PhoenixConnection if scanner renew lease thread is not enabled > - > > Key: PHOENIX-4625 > URL: https://issues.apache.org/jira/browse/PHOENIX-4625 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.14.0 >Reporter: Vikas Vishwakarma >Priority: Major > Fix For: 4.14.0 > > Attachments: GC_After_fix.png, GC_Leak.png, PHOENIX-4625.patch, QS.png > > > We have two different code path > # In ConnectionQueryServicesImpl RenewLeaseTasks is scheduled based on the > following checks if renew lease feature is supported and if the renew lease > config is enabled > supportsFeature(ConnectionQueryServices.Feature.RENEW_LEASE) && > renewLeaseEnabled > # In PhoenixConnection for every scan iterator is added to a Queue for lease > renewal based on just the check if the renew lease feature is supported > services.supportsFeature(Feature.RENEW_LEASE) > In PhoenixConnection we however miss the check whether renew lease config is > enabled (phoenix.scanner.lease.renew.enabled) > > Now consider a situation where Renew lease feature is supported but > phoenix.scanner.lease.renew.enabled is set to false in hbase-site.xml . In > this case PhoenixConnection will keep adding the iterators for every scan > into the scannerQueue for renewal based on the feature supported check but > the renewal task is not running because phoenix.scanner.lease.renew.enabled > is set to false, so the scannerQueue will keep growing as long as the > PhoenixConnection is alive and multiple scans requests are coming on this > connection. > > We have a use case that uses a single PhoenixConnection that is perpetual and > does billions of scans on this connection. 
In this case scannerQueue grows to several GB, ultimately leading to consecutive full GCs/OOM.
>
> Add iterators for Lease renewal in PhoenixConnection
> =
> {code:java}
> public void addIteratorForLeaseRenewal(@Nonnull TableResultIterator itr) {
>     if (services.supportsFeature(Feature.RENEW_LEASE)) {
>         checkNotNull(itr);
>         scannerQueue.add(new WeakReference<TableResultIterator>(itr));
>     }
> }
> {code}
>
> Starting the RenewLeaseTask
> =
> checks if Feature.RENEW_LEASE is supported and if
> phoenix.scanner.lease.renew.enabled is true and starts the RenewLeaseTask
> {code:java}
> ConnectionQueryServicesImpl {
>     this.renewLeaseEnabled = config.getBoolean(RENEW_LEASE_ENABLED, DEFAULT_RENEW_LEASE_ENABLED);
>     ...
>     @Override
>     public boolean isRenewingLeasesEnabled() {
>         return supportsFeature(ConnectionQueryServices.Feature.RENEW_LEASE) && renewLeaseEnabled;
>     }
>     private void scheduleRenewLeaseTasks() {
>         if (isRenewingLeasesEnabled()) {
>             renewLeaseExecutor = Executors.newScheduledThreadPool(renewLeasePoolSize, renewLeaseThreadFactory);
>             for (LinkedBlockingQueue<WeakReference> q : connectionQueues) {
>                 renewLeaseExecutor.scheduleAtFixedRate(new RenewLeaseTask(q), 0, renewLeaseTaskFrequency, TimeUnit.MILLISECONDS);
>             }
>         }
>     }
>     ...
> }
> {code}
>
> To solve this, we must add both checks in PhoenixConnection (that the feature is
> supported and that the config is enabled) before adding the iterators to
> scannerQueue:
> ConnectionQueryServices.Feature.RENEW_LEASE is true &&
> phoenix.scanner.lease.renew.enabled is true
> instead of just checking if the feature
> ConnectionQueryServices.Feature.RENEW_LEASE is supported
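The fix proposed in the issue description comes down to gating the enqueue on the same combined condition that gates the renewal task. The sketch below shows that idea in isolation; `LeaseRenewalQueueSketch` and its constructor flags are hypothetical simplifications (plain booleans instead of the real services and config lookups), not the contents of the attached patch.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical model of a connection that queues scanners for lease renewal. */
final class LeaseRenewalQueueSketch {
    private final boolean featureSupported;   // stands in for supportsFeature(RENEW_LEASE)
    private final boolean renewLeaseEnabled;  // stands in for phoenix.scanner.lease.renew.enabled
    private final LinkedBlockingQueue<WeakReference<Object>> scannerQueue = new LinkedBlockingQueue<>();

    LeaseRenewalQueueSketch(boolean featureSupported, boolean renewLeaseEnabled) {
        this.featureSupported = featureSupported;
        this.renewLeaseEnabled = renewLeaseEnabled;
    }

    /** Same combined condition that decides whether the renewal task runs at all. */
    boolean isRenewingLeasesEnabled() {
        return featureSupported && renewLeaseEnabled;
    }

    void addIteratorForLeaseRenewal(Object itr) {
        if (itr == null) {
            throw new NullPointerException("itr");
        }
        // Only enqueue when something will actually drain the queue.
        if (isRenewingLeasesEnabled()) {
            scannerQueue.add(new WeakReference<>(itr));
        }
    }

    int queuedCount() {
        return scannerQueue.size();
    }
}
```

With the feature supported but the config disabled, nothing is enqueued, so a long-lived connection no longer accumulates references that no task will ever remove.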
Re: [VOTE] Release of Apache Phoenix 4.13.2 for CDH 5.11.2 RC0
+1. Ran unit tests successfully. Executed some manual tests around secondary indexes and stats collection - looks fine.

On Sat, Jan 13, 2018 at 3:30 AM, Pedro Boado wrote:

> Hello Everyone,
>
> This is a call for a vote on Apache Phoenix 4.13.2 for CDH 5.11.2 RC0. This
> is a first release of Phoenix 4.13.x compatible with Cloudera CDH. The release
> includes a source-only release, a convenience binary release and, as a
> novelty, a parcel-based binary release ready to be installed from Cloudera
> Manager (CM).
>
> This release has feature parity with supported HBase versions and includes
> the following improvements:
> - Support for Apache Phoenix on CDH 5.11.2 (based on the HBase 1.2 branch).
> - More than 10 fixes over release 4.13.1-HBase-1.2
>
> The work is inspired by the approach taken (and now discontinued) by
> https://github.com/cloudera-labs/phoenix a while ago. Please take this
> first RC for a spin!
>
> The source tarball, including signatures, digests, etc can be found at:
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.2-cdh5.11.2-rc0/src/
>
> The binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.2-cdh5.11.2-rc0/bin/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.2-cdh5.11.2-rc0/parcels/
> (this directory can be configured in CM as parcel repository for direct
> installation)
>
> For a complete list of changes, see:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> version=12342253=Text=12315120 > > Release artifacts are signed with the following key: > https://people.apache.org/keys/committer/mujtaba.asc > https://dist.apache.org/repos/dist/release/phoenix/KEYS > > The hash and tag to be voted upon: > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h= > 60b76d2dc0a039777cc380cf5a8a927a02afff6d > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag; > h=refs/tags/v4.13.2-cdh5.11.2-rc0 > > Vote will be open for at least 72 hours. Please vote: > > [ ] +1 approve > [ ] +0 no opinion > [ ] -1 disapprove (and reason why) > > Thanks, > The Apache Phoenix Team >
[jira] [Updated] (PHOENIX-4397) Incorrect query results when with stats are disabled on a salted table
[ https://issues.apache.org/jira/browse/PHOENIX-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4397: -- Attachment: PHOENIX-4397_v2.patch Patch that fixes the issue along with tests. [~jamestaylor], please review. [~mujtabachohan] - let me know if it passes your more exhaustive tests too. > Incorrect query results when with stats are disabled on a salted table > -- > > Key: PHOENIX-4397 > URL: https://issues.apache.org/jira/browse/PHOENIX-4397 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.13.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain > Fix For: 4.13.1 > > Attachments: PHOENIX-4397.patch, PHOENIX-4397_v2.patch > > > See attached unit test. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4371) Document explain plan and how we expose estimate information in it
[ https://issues.apache.org/jira/browse/PHOENIX-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16273863#comment-16273863 ]

Samarth Jain commented on PHOENIX-4371:
---

Committed the patch after addressing the review comments.

http://phoenix.apache.org/explainplan.html
http://phoenix.apache.org/tuning_guide.html

> Document explain plan and how we expose estimate information in it
> ---
>
> Key: PHOENIX-4371
> URL: https://issues.apache.org/jira/browse/PHOENIX-4371
> Project: Phoenix
> Issue Type: Task
> Reporter: Samarth Jain
> Assignee: Samarth Jain
> Attachments: explainplan.md
[jira] [Resolved] (PHOENIX-4371) Document explain plan and how we expose estimate information in it
[ https://issues.apache.org/jira/browse/PHOENIX-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain resolved PHOENIX-4371. --- Resolution: Fixed > Document explain plan and how we expose estimate information in it > -- > > Key: PHOENIX-4371 > URL: https://issues.apache.org/jira/browse/PHOENIX-4371 > Project: Phoenix > Issue Type: Task > Reporter: Samarth Jain > Assignee: Samarth Jain > Attachments: explainplan.md > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4395) Illegal data. Expected length of at least 49 bytes, but had 4 (state=22000,code=201)
[ https://issues.apache.org/jira/browse/PHOENIX-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259575#comment-16259575 ] Samarth Jain commented on PHOENIX-4395: --- [~gjacoby] - this error doesn't have to do with column encoding. Although I can see why the error message made you think it was ;). [~rajat.thakur] - how was data added to Phoenix/HBase? Schema of your Phoenix table along with sample upsert statements will help immensely, too. > Illegal data. Expected length of at least 49 bytes, but had 4 > (state=22000,code=201) > > > Key: PHOENIX-4395 > URL: https://issues.apache.org/jira/browse/PHOENIX-4395 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.7.0, 4.12.0 >Reporter: Rajat Thakur > > I am importing Oracle ExaData to Hbase via Sqoop. And query via phoenix . > There are problem in following Column attributes (when querying via phoenix) > whose dataType is : DATE, TIMESTAMP, BIGINT > Error: ERROR 201 (22000): Illegal data. Expected length of at least 49 bytes, > but had 4 (state=22000,code=201) > java.sql.SQLException: ERROR 201 (22000): Illegal data. 
Expected length of at > least 49 bytes, but had 4 > at > org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:489) > at > org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150) > at > org.apache.phoenix.schema.KeyValueSchema.next(KeyValueSchema.java:211) > at > org.apache.phoenix.expression.ProjectedColumnExpression.evaluate(ProjectedColumnExpression.java:116) > at > org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:69) > at > org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:609) > at sqlline.Rows$Row.(Rows.java:183) > at sqlline.BufferedRows.(BufferedRows.java:38) > at sqlline.SqlLine.print(SqlLine.java:1660) > at sqlline.Commands.execute(Commands.java:833) > at sqlline.Commands.sql(Commands.java:732) > at sqlline.SqlLine.dispatch(SqlLine.java:813) > at sqlline.SqlLine.begin(SqlLine.java:686) > at sqlline.SqlLine.start(SqlLine.java:398) > at sqlline.SqlLine.main(SqlLine.java:291) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4319) Zookeeper connection should be closed immediately
[ https://issues.apache.org/jira/browse/PHOENIX-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258245#comment-16258245 ]

Samarth Jain commented on PHOENIX-4319:
---

Can you try with 4.13?

> Zookeeper connection should be closed immediately
> ---
>
> Key: PHOENIX-4319
> URL: https://issues.apache.org/jira/browse/PHOENIX-4319
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.10.0
> Environment: phoenix4.10 hbase1.2.0
> Reporter: Jepson
> Labels: patch
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> *Code:*
> {code:java}
> val zkUrl = "192.168.100.40,192.168.100.41,192.168.100.42:2181:/hbase"
> val configuration = new Configuration()
> configuration.set("hbase.zookeeper.quorum",zkUrl)
> val spark = SparkSession
>   .builder()
>   .appName("SparkPhoenixTest1")
>   .master("local[2]")
>   .getOrCreate()
> for( a <- 1 to 100){
>   val wms_doDF = spark.sqlContext.phoenixTableAsDataFrame(
>     "DW.wms_do",
>     Array("WAREHOUSE_NO", "DO_NO"),
>     predicate = Some(
>       """
>         |MOD_TIME >= TO_DATE('begin_day', 'yyyy-MM-dd')
>         |and MOD_TIME < TO_DATE('end_day', 'yyyy-MM-dd')
>       """.stripMargin.replaceAll("begin_day", "2017-10-01").replaceAll("end_day", "2017-10-25")),
>     conf = configuration
>   )
>   wms_doDF.show(100)
> }
> {code}
> *Description:*
> The connection to zookeeper is not getting closed, which causes the maximum
> number of client connections to be reached from a host (we have
> maxClientCnxns as 500 in zookeeper config).
> *Zookeeper connections:*
> [https://github.com/Hackeruncle/Images/blob/master/zookeeper%20connections.png]
> *Reference:*
> [https://community.hortonworks.com/questions/116832/hbase-zookeeper-connections-not-getting-closed.html]
[jira] [Commented] (PHOENIX-4360) Prevent System.Catalog from splitting
[ https://issues.apache.org/jira/browse/PHOENIX-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257684#comment-16257684 ] Samarth Jain commented on PHOENIX-4360: --- [~lhofhansl] - would be good to also have a test for this that basically validates admin.split('SYSTEM.CATALOG') was a no-op. > Prevent System.Catalog from splitting > - > > Key: PHOENIX-4360 > URL: https://issues.apache.org/jira/browse/PHOENIX-4360 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.13.0 >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl >Priority: Blocker > Fix For: 4.14.0 > > Attachments: 4360.txt > > > Just talked to [~jamestaylor]. > It turns out that currently System.Catalog is not prevented from splitting > generally, but does not allow splitting within a schema. > In the multi-tenant case that is not good enough. When System.Catalog splits > and a base table and view end up in different regions the following can > happen: > * DROP CASCADE no longer works for those views > * Adding/removing columns to/from the base table no longer works > Until PHOENIX-3534 is done we should simply prevent System.Catalog from > splitting. (just like HBase:meta) > [~apurtell] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4369) ArrayIndexOutOfBounds when upserting to table using ROW_TIMESTAMP
[ https://issues.apache.org/jira/browse/PHOENIX-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256713#comment-16256713 ] Samarth Jain commented on PHOENIX-4369: --- Looked at it briefly. Looks like an issue when the ROW_TIMESTAMP column's data type is TIMESTAMP. [~arfield], as a work around, you can have the PK_ROW_TIMESTAMP as DATE. > ArrayIndexOutOfBounds when upserting to table using ROW_TIMESTAMP > - > > Key: PHOENIX-4369 > URL: https://issues.apache.org/jira/browse/PHOENIX-4369 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.10.0 > Environment: Ubuntu 16.04, JRE 1.8.0_102_x64 >Reporter: Alex Field > Labels: SFDC > > Used this DDL to create a table which uses ROW_TIMESTAMP > CREATE TABLE IF NOT EXISTS FOO ( > TENANT_ID CHAR(15) NOT NULL, > PK_BAR VARCHAR(80) NOT NULL, -- NAME > PK_ROW_TIMESTAMP TIMESTAMP NOT NULL, > FIZZ TIMESTAMP, > BUZZ VARCHAR(255), -- LABEL > BAZZ CHAR(15), -- VERSION_ID > QUX INTEGER, > HODOR VARCHAR(10) -- A json blob. > CONSTRAINT PK PRIMARY KEY (TENANT_ID, PK_BAR, PK_ROW_TIMESTAMP > ROW_TIMESTAMP) > ) VERSIONS=3,MULTI_TENANT=true,REPLICATION_SCOPE=1 > Upsert causes this exception: > java.lang.ArrayIndexOutOfBoundsException: 8 > at > org.apache.phoenix.execute.MutationState.getNewRowKeyWithRowTimestamp(MutationState.java:554) > at > org.apache.phoenix.execute.MutationState.generateMutations(MutationState.java:640) > at > org.apache.phoenix.execute.MutationState.addRowMutations(MutationState.java:572) > at > org.apache.phoenix.execute.MutationState.send(MutationState.java:1003) > at > org.apache.phoenix.execute.MutationState.send(MutationState.java:1469) > at > org.apache.phoenix.execute.MutationState.commit(MutationState.java:1301) > at > org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:533) > at > org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:530) > at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53) > at > 
org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:530) > Here's a copy of the test driver: > @Test > public void testSomething() throws Exception { > String sql = "UPSERT INTO FOO (BUZZ, BAZZ, PK_BAR) VALUES (?, ?, ?)"; > try (PreparedStatement stmt = conn.prepareStatement(sql)) { > stmt.setString(1, "blah blah"); > stmt.setString(2, null); > stmt.setString(3, "blah"); > stmt.execute(); > conn.commit(); > } > } -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4371) Document explain plan and how we expose estimate information in it
[ https://issues.apache.org/jira/browse/PHOENIX-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4371: -- Attachment: explainplan.md [~jamestaylor], please review. I also removed the explain plan section from the tuning guide, copied its content and added it to the new explain plan page. > Document explain plan and how we expose estimate information in it > -- > > Key: PHOENIX-4371 > URL: https://issues.apache.org/jira/browse/PHOENIX-4371 > Project: Phoenix > Issue Type: Task > Reporter: Samarth Jain > Assignee: Samarth Jain > Attachments: explainplan.md > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PHOENIX-4371) Document explain plan and how we expose estimate information in it
Samarth Jain created PHOENIX-4371: - Summary: Document explain plan and how we expose estimate information in it Key: PHOENIX-4371 URL: https://issues.apache.org/jira/browse/PHOENIX-4371 Project: Phoenix Issue Type: Task Reporter: Samarth Jain Assignee: Samarth Jain -- This message was sent by Atlassian JIRA (v6.4.14#64029)
Re: [VOTE] Release of Apache Phoenix 4.13.0 RC1
+1

Successfully ran all unit tests. Verified recent changes and bug fixes around stats collection and reporting.

On Thu, Nov 9, 2017 at 5:02 PM, James Taylor wrote:

> +1. Verified bug fixes around delete.
>
> On Thu, Nov 9, 2017 at 10:21 AM, Mujtaba Chohan wrote:
>
> > +1.
> >
> > Verified performance and backward compat. with 4.10/11/12.
> >
> > On Tue, Nov 7, 2017 at 2:50 PM, Andrew Purtell wrote:
> >
> > > +1
> > >
> > > Checked sums and signatures: ok
> > > RAT check passes: ok (8u131) [1]
> > > Built from source: ok (8u131) [1]
> > > Unit tests pass: ok (8u131) [2]
> > >
> > > 1. There are some Maven warnings that should be fixed, but are not release
> > > blockers. "Reporting configuration should be done in <reporting> section,
> > > not in maven-site-plugin <configuration> as reportPlugins parameter." Maven
> > > 3.5.0.
> > >
> > > 2. PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild ran out of time
> > > when executed with other tests, but passed when run standalone.
> > >
> > > On Mon, Nov 6, 2017 at 3:47 PM, James Taylor wrote:
> > >
> > > > Hello Everyone,
> > > >
> > > > This is a call for a vote on Apache Phoenix 4.13.0 RC1. This is the next
> > > > minor release of Phoenix 4, compatible with Apache HBase 0.98 and 1.3. The
> > > > release includes both a source-only release and a convenience binary
> > > > release for each supported HBase version. The previous RC was sunk due to
> > > > PHOENIX-4351 which is now fixed.
> > > > > > > > This release has feature parity with supported HBase versions and > > > includes > > > > the following improvements: > > > > - Critical bug fix to prevent snapshot creation of SYSTEM.CATALOG > when > > > > connecting [1] > > > > - Numerous bug fixes around handling of row deletion [2][3][4][5] > > > > - Improvements to statistics collection [6][7][8][9] > > > > - New COLLATION_KEY built-in function for linguistic sort [10] > > > > > > > > The source tarball, including signatures, digests, etc can be found > at: > > > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoeni > > > > x-4.13.0-HBase-0.98-rc1/src/ > > > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoeni > > > > x-4.13.0-HBase-1.3-rc1/src/ > > > > > > > > The binary artifacts can be found at: > > > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoeni > > > > x-4.13.0-HBase-0.98-rc1/bin/ > > > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoeni > > > > x-4.13.0-HBase-1.3-rc1/bin/ > > > > > > > > For a complete list of changes, see: > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje > > > > ctId=12315120=12341710 > > > > > > > > Release artifacts are signed with the following key: > > > > https://people.apache.org/keys/committer/mujtaba.asc > > > > https://dist.apache.org/repos/dist/release/phoenix/KEYS > > > > > > > > The hash and tag to be voted upon: > > > > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=comm > > > > it;h=8b7e12414400c997d5993fb55586bfcc2f56d217 > > > > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag; > > > > h=refs/tags/v4.13.0-HBase-0.98-rc1 > > > > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=comm > > > > it;h=4a1f0df6143ba705a48b5051aee52dab158afe8d > > > > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag; > > > > h=refs/tags/v4.13.0-HBase-1.3-rc1 > > > > > > > > Vote will be open for at least 72 hours. 
Please vote: > > > > > > > > [ ] +1 approve > > > > [ ] +0 no opinion > > > > [ ] -1 disapprove (and reason why) > > > > > > > > Thanks, > > > > The Apache Phoenix Team > > > > > > > > [1] https://issues.apache.org/jira/browse/PHOENIX-4335 > > > > [2] https://issues.apache.org/jira/browse/PHOENIX-4280 > > > > [3] https://issues.apache.org/jira/browse/PHOENIX-4290 > > > > [4] https://issues.apache.org/jira/browse/PHOENIX-4348 > > > > [5] https://issues.apache.org/jira/browse/PHOENIX-4277 > > > > [6] https://issues.apache.org/jira/browse/PHOENIX-3368 > > > > [7] https://issues.apache.org/jira/browse/PHOENIX-4287 > > > > [8] https://issues.apache.org/jira/browse/PHOENIX-4289 > > > > [9] https://issues.apache.org/jira/browse/PHOENIX-4343 > > > > [10] https://issues.apache.org/jira/browse/PHOENIX-4237 > > > > > > > > > >
[jira] [Commented] (PHOENIX-4358) Case Sensitive String match on SqlType in PDataType
[ https://issues.apache.org/jira/browse/PHOENIX-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246418#comment-16246418 ] Samarth Jain commented on PHOENIX-4358: --- Patch looks fine to me, [~dangulo]. Please add a test in PDataTypeTest to make sure a future change doesn't end up causing a regression. > Case Sensitive String match on SqlType in PDataType > --- > > Key: PHOENIX-4358 > URL: https://issues.apache.org/jira/browse/PHOENIX-4358 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.11.0 > Environment: OSX and Linux >Reporter: Dave Angulo >Priority: Minor > Attachments: caseFix.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > fromSqlTypeName() method uses a case sensitive match on input SqlType. This > causes an issue in Spark JDBCUtils.makeSetter() which lowerCases input. The > result is the error _Unsupported sql type: varchar_. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
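The issue described above comes down to normalizing the type name before the lookup. The following is a minimal sketch of that idea, not the attached patch: `SqlTypeLookup` and its small name table are hypothetical stand-ins for illustration and do not reproduce Phoenix's `PDataType`.

```java
import java.sql.Types;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a SQL type registry; not Phoenix's PDataType. */
public final class SqlTypeLookup {

    // Names are stored upper-cased so that lookups can normalize the same way.
    private static final Map<String, Integer> TYPES_BY_NAME = new HashMap<>();

    static {
        TYPES_BY_NAME.put("VARCHAR", Types.VARCHAR);
        TYPES_BY_NAME.put("INTEGER", Types.INTEGER);
        TYPES_BY_NAME.put("BIGINT", Types.BIGINT);
    }

    private SqlTypeLookup() {
    }

    /** Resolves a SQL type name to its java.sql.Types code, ignoring case. */
    public static int fromSqlTypeName(String sqlTypeName) {
        if (sqlTypeName == null) {
            throw new IllegalArgumentException("Unsupported sql type: null");
        }
        // Locale.ROOT avoids locale-specific case mappings (for example the Turkish dotless i).
        Integer type = TYPES_BY_NAME.get(sqlTypeName.toUpperCase(Locale.ROOT));
        if (type == null) {
            throw new IllegalArgumentException("Unsupported sql type: " + sqlTypeName);
        }
        return type;
    }
}
```

With this shape, a caller that lower-cases its input, as the Spark code path in the report does, resolves `varchar` and `VARCHAR` to the same type. A regression test of the kind requested in the comment would assert exactly that equivalence.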
Re: [VOTE] Release of Apache Phoenix 4.13.0 RC1
+1

On Tue, Nov 7, 2017 at 12:06 PM, Andrew Purtell wrote:

> My bad. Ignore my -1. Somehow the bin downloads failed. Let me try again.
>
> On Tue, Nov 7, 2017 at 12:05 PM, Andrew Purtell wrote:
>
> > -1
> >
> > Signature verification fails
> >
> > $ gpg --verify apache-phoenix-4.13.0-HBase-0.98-src.tar.gz.asc apache-phoenix-4.13.0-HBase-0.98-src.tar.gz
> > gpg: Signature made Mon Nov 6 14:54:25 2017 PST
> > gpg:                using RSA key 3BFCB3929461178E
> > gpg: Good signature from "Mujtaba Chohan (CODE SIGNING KEY) <mujt...@apache.org>" [unknown]
> >
> > $ gpg --verify apache-phoenix-4.13.0-HBase-0.98-bin.tar.gz.asc apache-phoenix-4.13.0-HBase-0.98-bin.tar.gz
> > gpg: Signature made Mon Nov 6 14:54:05 2017 PST
> > gpg:                using RSA key 3BFCB3929461178E
> > *gpg: BAD signature from "Mujtaba Chohan (CODE SIGNING KEY)" [unknown]*
> >
> > $ gpg --verify apache-phoenix-4.13.0-HBase-1.3-src.tar.gz.asc apache-phoenix-4.13.0-HBase-1.3-src.tar.gz
> > gpg: Signature made Mon Nov 6 14:54:34 2017 PST
> > gpg:                using RSA key 3BFCB3929461178E
> > gpg: Good signature from "Mujtaba Chohan (CODE SIGNING KEY) <mujt...@apache.org>" [unknown]
> >
> > $ gpg --verify apache-phoenix-4.13.0-HBase-1.3-bin.tar.gz.asc apache-phoenix-4.13.0-HBase-1.3-bin.tar.gz
> > gpg: Signature made Mon Nov 6 14:54:06 2017 PST
> > gpg:                using RSA key 3BFCB3929461178E
> > *gpg: BAD signature from "Mujtaba Chohan (CODE SIGNING KEY)" [unknown]*
> >
> > On Mon, Nov 6, 2017 at 3:47 PM, James Taylor wrote:
> >
> >> Hello Everyone,
> >>
> >> This is a call for a vote on Apache Phoenix 4.13.0 RC1. This is the next minor release of Phoenix 4, compatible with Apache HBase 0.98 and 1.3. The release includes both a source-only release and a convenience binary release for each supported HBase version. The previous RC was sunk due to PHOENIX-4351 which is now fixed.
[jira] [Commented] (PHOENIX-4348) Point deletes do not work when there are immutable indexes with only row key columns
[ https://issues.apache.org/jira/browse/PHOENIX-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237067#comment-16237067 ] Samarth Jain commented on PHOENIX-4348: --- +1 > Point deletes do not work when there are immutable indexes with only row key > columns > > > Key: PHOENIX-4348 > URL: https://issues.apache.org/jira/browse/PHOENIX-4348 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor >Assignee: James Taylor >Priority: Major > Fix For: 4.13.0 > > Attachments: PHOENIX-4348.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236904#comment-16236904 ] Samarth Jain commented on PHOENIX-4287: --- Thanks. I added the comment in my commit. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_addendum4.patch, PHOENIX-4287_addendum5.patch, > PHOENIX-4287_addendum6.patch, PHOENIX-4287_addendum7.patch, > PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, > PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
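> With local index and stats disabled for parallelization:
> {noformat}
> explain select count(*) from TABLE_T;
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ | EST_ROWS_READ | EST_INFO  |
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899      | 332170        | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899      | 332170        | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899      | 332170        | 150792825 |
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> | PLAN                                                                             | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS   |
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470      | 332151        | 1507928257617 |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470      | 332151        | 1507928257617 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470      | 332151        | 1507928257617 |
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+----------------+---------------+-------------+
> | PLAN                                                 | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +------------------------------------------------------+----------------+---------------+-------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null           | null          | null        |
> | SERVER FILTER BY FIRST KEY ONLY                      | null           | null          | null        |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null           | null          | null        |
> +------------------------------------------------------+----------------+---------------+-------------+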
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_addendum7.patch Looks like an NPE happens when dropping local indexes. Addressing it in this patch. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_addendum4.patch, PHOENIX-4287_addendum5.patch, > PHOENIX-4287_addendum6.patch, PHOENIX-4287_addendum7.patch, > PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, > PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_addendum6.patch Updated patch with additional test on view and view index. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_addendum4.patch, PHOENIX-4287_addendum5.patch, > PHOENIX-4287_addendum6.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, > PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_addendum5.patch Thanks for the code snippet, [~jamestaylor]. Attached is the addendum along with a test. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_addendum4.patch, PHOENIX-4287_addendum5.patch, > PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, > PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236580#comment-16236580 ] Samarth Jain commented on PHOENIX-4287: --- Yes, that's correct. Will change the patch to fetch the property from the base table. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, > PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236544#comment-16236544 ] Samarth Jain commented on PHOENIX-4287: --- USE_STATS_FOR_PARALLELIZATION can be set at an index/view/base table level. For index to use parallelization, you need to set USE_STATS_FOR_PARALLELIZATION = true, else the default value will be used (which in your case is false) > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, > PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236538#comment-16236538 ] Samarth Jain commented on PHOENIX-4287: --- Just got back. Taking a look. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, > PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available.
[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235303#comment-16235303 ]

Samarth Jain commented on PHOENIX-4333:
---

Is it a safe assumption to make that if intersectScan is returning a non-null value, then we have an intersection?
{code}
Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes, currentGuidePostBytes, keyOffset, false);
if (newScan != null) {
    // guide post was available in the
}
{code}

> Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, > PHOENIX-4333_v2.patch > > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235294#comment-16235294 ]

Samarth Jain commented on PHOENIX-4333:
---

Good point, [~jamestaylor]. I don't think my check would work in the case below:

REGION 1 - VIEW1 and VIEW2
REGION 2 - VIEW2 and VIEW3

If we collect stats for VIEW1 and VIEW3, then even though both regions have stats, they don't have stats for VIEW2. I think I would also need to check whether any guidepost intersected for the region.

> Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, > PHOENIX-4333_v2.patch > > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
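The VIEW1/VIEW2/VIEW3 case in the comment above can be sketched as a per-region intersection check. This is an illustration only, under loud assumptions: `GuidePostCoverage`, `Range`, and the string row keys are hypothetical and are not taken from any of the attached patches, and real Phoenix keys are byte arrays rather than strings.

```java
import java.util.List;

/** Illustrative only: the class, its methods, and the key format are hypothetical, not Phoenix code. */
public final class GuidePostCoverage {

    /** Half-open key range [start, end): the part of the query's scan that falls in one region. */
    public static final class Range {
        private final String start;
        private final String end;

        public Range(String start, String end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(String key) {
            return key.compareTo(start) >= 0 && key.compareTo(end) < 0;
        }
    }

    private GuidePostCoverage() {
    }

    /**
     * Returns true only if every per-region slice of the scan contains at least one guide post.
     * A region that has stats for other views, but none inside this scan, makes the result false.
     */
    public static boolean everyRegionHasGuidePost(List<Range> scanPerRegion, List<String> guidePosts) {
        for (Range range : scanPerRegion) {
            boolean intersected = false;
            for (String guidePost : guidePosts) {
                if (range.contains(guidePost)) {
                    intersected = true;
                    break;
                }
            }
            if (!intersected) {
                return false;
            }
        }
        return true;
    }
}
```

For a VIEW2 scan split across two regions, say `["VIEW2,", "VIEW2,5")` and `["VIEW2,5", "VIEW2~")`, guide posts collected only for VIEW1 and VIEW3 keys fall outside both slices, so the method returns false even though each region holds some stats. That is the situation in which the comment suggests an estimate should not be reported.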
[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4333: -- Attachment: PHOENIX-4333_v2.patch > Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, > PHOENIX-4333_v2.patch > > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4333: -- Attachment: PHOENIX-4333_v2.patch Updated patch that sets estimate timestamp to null when we don't have guideposts available for all regions. > Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch > > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4333: -- Attachment: (was: PHOENIX-4333_v2.patch) > Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch > > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235262#comment-16235262 ]

Samarth Jain commented on PHOENIX-4333:
---------------------------------------

Actually, the check needs to be done inside this catch block:
{code}
catch (EOFException e) {
    // We have read all guide posts
}
{code}
And if we are doing it there, I think the check I had makes it easier to understand what's going on, IMHO.
{code}
+if (regionIndex < stopIndex) {
+    /*
+     * We don't have guide posts available for all regions. So in this case we
+     * conservatively say that we cannot provide estimates
+     */
+    gpsAvailableForAllRegions = false;
+}
 }
{code}

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-4333
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4333
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>            Priority: Major
>         Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on tenant A view. Querying stats on tenant B view
> yield partial results (only contains stats for B,1) which are incorrect even
> though it shows updated timestamp as current.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
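For readers following the thread, the check discussed in the comment above can be sketched in isolation. This is only an illustrative model, not the Phoenix implementation: row keys are plain ints, guide posts are a stream of sorted ints, and the class and method names (GuidePostCoverageSketch, encode) are hypothetical. Only regionIndex, stopIndex, gpsAvailableForAllRegions and the EOFException catch come from the comment. Note that the check only notices guide posts running out before the last region is reached; it does not detect gaps in the middle.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified model: int row keys, guide posts stored as sorted ints.
final class GuidePostCoverageSketch {

    // regionEndKeys[i] is the exclusive end key of region i, sorted ascending.
    static boolean gpsAvailableForAllRegions(int[] regionEndKeys, byte[] encodedGuidePosts) {
        int regionIndex = 0;
        int stopIndex = regionEndKeys.length - 1;
        boolean gpsAvailableForAllRegions = true;
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encodedGuidePosts));
        try {
            while (true) {
                int guidePost = in.readInt(); // throws EOFException once the stream is exhausted
                // Advance to the region that contains this guide post.
                while (regionIndex < stopIndex && guidePost >= regionEndKeys[regionIndex]) {
                    regionIndex++;
                }
            }
        } catch (EOFException e) {
            // We have read all guide posts.
            if (regionIndex < stopIndex) {
                // Guide posts ran out before the last region: no estimate can be given.
                gpsAvailableForAllRegions = false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return gpsAvailableForAllRegions;
    }

    // Helper for the sketch: serializes guide posts the way the reader above expects.
    static byte[] encode(int... guidePosts) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int guidePost : guidePosts) {
                out.writeInt(guidePost);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

With three regions ending at 10, 20 and Integer.MAX_VALUE, guide posts 5, 15, 25 reach the last region, while a single guide post 5 leaves regionIndex below stopIndex, which mirrors the tenant B case in the issue description.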
[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235253#comment-16235253 ] Samarth Jain commented on PHOENIX-4333: --- Ah, I see. Yes, that's true. Let me update the patch. > Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch > > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4332) Indexes should inherit guide post width of the base data table
[ https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4332: -- Summary: Indexes should inherit guide post width of the base data table (was: Stats - Allow setting guide post width on global indexes) > Indexes should inherit guide post width of the base data table > -- > > Key: PHOENIX-4332 > URL: https://issues.apache.org/jira/browse/PHOENIX-4332 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4332.patch > > > Altering guidepost with on data table does not propagate to global index > using {{ALTER TABLE}} command. > Altering global index table runs in not allowed error. > {noformat} > ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1; > Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop > column referenced by VIEW columnName=IDX (state=42M01,code=1010) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4343) In CREATE TABLE allow setting guide post width only on base data tables
[ https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4343: -- Summary: In CREATE TABLE allow setting guide post width only on base data tables (was: In CREATE TABLE only allow setting guide post width on tables and global indexes) > In CREATE TABLE allow setting guide post width only on base data tables > --- > > Key: PHOENIX-4343 > URL: https://issues.apache.org/jira/browse/PHOENIX-4343 > Project: Phoenix > Issue Type: Bug > Reporter: Samarth Jain > Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4343.patch, PHOENIX-4343_v2.patch, > PHOENIX-4343_v3.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235245#comment-16235245 ]

Samarth Jain commented on PHOENIX-4333:
---------------------------------------

It might be the late night and lack of coffee, but I am not sure I see the correlation here.
{code}
gpsAvailableForAllRegions &= initialKeyBytes != currentKeyBytes;
{code}
We reset currentKeyBytes to initialKeyBytes when we know we are not using stats for parallelisation.
{code}
if (!useStatsForParallelization) {
    /*
     * If we are not using stats for generating parallel scans, we need to reset the
     * currentKey back to what it was at the beginning of the loop.
     */
    currentKeyBytes = initialKeyBytes;
}
{code}

bq. I also think we should set the estimatedRows and estimatedSize to what we've found, but only set estimateInfoTimestamp to null if !gpsAvailableForAllRegions. That way callers can choose to use or not use the partial estimates based on estimateInfoTimestamp.

Makes sense.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-4333
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4333
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>            Priority: Major
>         Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on tenant A view. Querying stats on tenant B view
> yield partial results (only contains stats for B,1) which are incorrect even
> though it shows updated timestamp as current.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
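The suggestion quoted in the comment above (always report the partial row and byte estimates, but null out the timestamp when guide posts are missing for some regions) might look roughly like the following. This is a sketch under assumptions: EstimateInfoSketch, its factory method and isComplete are hypothetical names; only estimatedRows, estimateInfoTimestamp and gpsAvailableForAllRegions echo names from the thread.

```java
// Hypothetical value holder; not a Phoenix class.
final class EstimateInfoSketch {
    final long estimatedRows;
    final long estimatedBytes;
    // null signals "estimates are partial, use at your own risk".
    final Long estimateInfoTimestamp;

    private EstimateInfoSketch(long estimatedRows, long estimatedBytes, Long estimateInfoTimestamp) {
        this.estimatedRows = estimatedRows;
        this.estimatedBytes = estimatedBytes;
        this.estimateInfoTimestamp = estimateInfoTimestamp;
    }

    static EstimateInfoSketch of(long rows, long bytes, long guidePostTimestamp,
                                 boolean gpsAvailableForAllRegions) {
        // Keep whatever was found, but only vouch for it with a timestamp
        // when guide posts were available for every region.
        Long timestamp = gpsAvailableForAllRegions ? Long.valueOf(guidePostTimestamp) : null;
        return new EstimateInfoSketch(rows, bytes, timestamp);
    }

    // Callers can decide whether to trust the numbers based on the timestamp alone.
    boolean isComplete() {
        return estimateInfoTimestamp != null;
    }
}
```

The design choice here is that a single nullable field carries the "partial" signal, so callers that ignore it still get numbers and callers that care can check one value.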
[jira] [Updated] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4343: -- Attachment: PHOENIX-4343_v3.patch Thanks for the review, [~jamestaylor]. Attached is the updated patch. > In CREATE TABLE only allow setting guide post width on tables and global > indexes > > > Key: PHOENIX-4343 > URL: https://issues.apache.org/jira/browse/PHOENIX-4343 > Project: Phoenix > Issue Type: Bug > Reporter: Samarth Jain >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4343.patch, PHOENIX-4343_v2.patch, > PHOENIX-4343_v3.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4333:
----------------------------------
    Attachment: PHOENIX-4333_v1.patch

With this patch, we now detect when stats information is not available for all the regions and, in that case, report the estimates as null. The updated test covers this scenario. [~jamestaylor], please review.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-4333
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4333
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>            Priority: Major
>         Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on tenant A view. Querying stats on tenant B view
> yield partial results (only contains stats for B,1) which are incorrect even
> though it shows updated timestamp as current.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4343: -- Attachment: PHOENIX-4343_v2.patch Updated patch. [~jamestaylor], please review. > In CREATE TABLE only allow setting guide post width on tables and global > indexes > > > Key: PHOENIX-4343 > URL: https://issues.apache.org/jira/browse/PHOENIX-4343 > Project: Phoenix > Issue Type: Bug > Reporter: Samarth Jain >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4343.patch, PHOENIX-4343_v2.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235158#comment-16235158 ]

Samarth Jain commented on PHOENIX-4343:
---------------------------------------

With PHOENIX-4332, indexes now inherit the guide post width of the data table. The right approach would be to disallow setting guide post width on everything except the data table. Will update the patch.

> In CREATE TABLE only allow setting guide post width on tables and global indexes
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4343
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4343
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>            Priority: Major
>         Attachments: PHOENIX-4343.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
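The rule proposed in the comment above (reject a guide post width on anything other than the base data table) can be illustrated with a small validation sketch. Everything here is hypothetical: the TableKind enum, the method names and the error text are made up for illustration and are not taken from the Phoenix code base or the attached patch.

```java
// Hypothetical validation sketch; names and message are illustrative only.
final class GuidePostWidthValidationSketch {

    enum TableKind { TABLE, VIEW, INDEX }

    // A null width means the property was not specified in the DDL statement.
    static boolean isAllowed(TableKind kind, Long requestedGuidePostsWidth) {
        return requestedGuidePostsWidth == null || kind == TableKind.TABLE;
    }

    static void validate(TableKind kind, Long requestedGuidePostsWidth) {
        if (!isAllowed(kind, requestedGuidePostsWidth)) {
            throw new IllegalArgumentException(
                "Guide post width can only be set on a base data table, not on " + kind);
        }
    }
}
```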
[jira] [Commented] (PHOENIX-4332) Stats - Allow setting guide post width on global indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235145#comment-16235145 ]

Samarth Jain commented on PHOENIX-4332:
---------------------------------------

Instead of supporting ALTER TABLE or ALTER INDEX to set guide_posts_width, indexes now inherit the guide post width of the data table. This applies to global, local, and view indexes.

> Stats - Allow setting guide post width on global indexes
> --------------------------------------------------------
>
>                 Key: PHOENIX-4332
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4332
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>            Priority: Major
>         Attachments: PHOENIX-4332.patch
>
>
> Altering guidepost with on data table does not propagate to global index
> using {{ALTER TABLE}} command.
> Altering global index table runs in not allowed error.
> {noformat}
> ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1;
> Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop
> column referenced by VIEW columnName=IDX (state=42M01,code=1010)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
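The inheritance behaviour described in the comment above can be pictured as a lookup that always resolves the width from the data table. The sketch below is an assumption-laden illustration: the Table class, the Kind enum, the parent link and the fallback to a cluster-wide default are all hypothetical and are not taken from the patch.

```java
// Hypothetical model of "indexes inherit the guide post width of the data table".
final class GuidePostWidthInheritanceSketch {

    enum Kind { DATA_TABLE, GLOBAL_INDEX, LOCAL_INDEX, VIEW_INDEX }

    static final class Table {
        final Kind kind;
        final Long guidePostsWidth; // null when not set on this table
        final Table dataTable;      // null for a data table, the parent for any index

        Table(Kind kind, Long guidePostsWidth, Table dataTable) {
            this.kind = kind;
            this.guidePostsWidth = guidePostsWidth;
            this.dataTable = dataTable;
        }
    }

    // Every index kind resolves the width from its data table; the data table
    // falls back to an assumed cluster-wide default when nothing is set.
    static long effectiveWidth(Table table, long clusterDefaultWidth) {
        Table source = table.kind == Kind.DATA_TABLE ? table : table.dataTable;
        return source.guidePostsWidth != null ? source.guidePostsWidth : clusterDefaultWidth;
    }
}
```

Resolving at read time, rather than copying the value onto each index, means a later change on the data table is picked up by all of its indexes without a separate ALTER on each one.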
[jira] [Updated] (PHOENIX-4332) Stats - Allow setting guide post width on global indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4332: -- Attachment: PHOENIX-4332.patch [~jamestaylor], please review. > Stats - Allow setting guide post width on global indexes > > > Key: PHOENIX-4332 > URL: https://issues.apache.org/jira/browse/PHOENIX-4332 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4332.patch > > > Altering guidepost with on data table does not propagate to global index > using {{ALTER TABLE}} command. > Altering global index table runs in not allowed error. > {noformat} > ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1; > Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop > column referenced by VIEW columnName=IDX (state=42M01,code=1010) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_addendum4.patch > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, > PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. > With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ 
> select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | null >| null | null | > +--
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: (was: PHOENIX-4287_addendum4.patch) > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, > PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. > With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ 
count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | null >| null | null | > +-
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_addendum4.patch Updated patch with more tests including fix for an issue that the new test surfaced. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, > PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
> With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | null >
[jira] [Reopened] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain reopened PHOENIX-4287:
-----------------------------------

> Incorrect aggregate query results when stats are disable for parallelization
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4287
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>            Priority: Major
>              Labels: localIndex
>             Fix For: 4.13.0, 4.12.1
>
>         Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch,
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch,
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch,
> PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
> {noformat}
> explain select count(*) from TABLE_T;
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ | EST_ROWS_READ | EST_INFO  |
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899      | 332170        | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899      | 332170        | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899      | 332170        | 150792825 |
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> | PLAN                                                                             | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS   |
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470      | 332151        | 1507928257617 |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470      | 332151        | 1507928257617 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470      | 332151        | 1507928257617 |
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+----------------+---------------+-------------+
> | PLAN                                                 | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +------------------------------------------------------+----------------+---------------+-------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null           | null          | null        |
> | SERVER FILTER BY FIRST KEY ONLY                      | null           | null          | null        |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null           | null          | null        |
> +------------------------------------------------------+----------------+---------------+-------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 27        |
> +-----------+
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_addendum3.patch Good catch, [~jamestaylor]. I have added a test that makes sure that useStatsForParallelization returns null when the property is not set in create table. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, > PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, > PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
> With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_addendum2.patch Thanks for the reviews, [~jamestaylor]. Updated patch addresses the comment. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, > PHOENIX-4287_addendum2.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, > PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. > With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 
1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | null >| null | null | > +-
[jira] [Commented] (PHOENIX-4332) Stats - Altering guidepost width on base table does not propagate to global index
[ https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234835#comment-16234835 ] Samarth Jain commented on PHOENIX-4332: --- View indexes and local indexes use the guide post width of the data table. Global indexes need to have their guide post width set. > Stats - Altering guidepost width on base table does not propagate to global > index > - > > Key: PHOENIX-4332 > URL: https://issues.apache.org/jira/browse/PHOENIX-4332 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > > Altering guidepost with on data table does not propagate to global index > using {{ALTER TABLE}} command. > Altering global index table runs in not allowed error. > {noformat} > ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1; > Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop > column referenced by VIEW columnName=IDX (state=42M01,code=1010) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4332) Stats - Allow setting guide post width on global indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4332: -- Summary: Stats - Allow setting guide post width on global indexes (was: Stats - Altering guidepost width on base table does not propagate to global index) > Stats - Allow setting guide post width on global indexes > > > Key: PHOENIX-4332 > URL: https://issues.apache.org/jira/browse/PHOENIX-4332 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain >Priority: Major > > Altering guidepost with on data table does not propagate to global index > using {{ALTER TABLE}} command. > Altering global index table runs in not allowed error. > {noformat} > ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1; > Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop > column referenced by VIEW columnName=IDX (state=42M01,code=1010) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4287:
----------------------------------
    Attachment: PHOENIX-4287_addendum.patch

Patch that fixes the issue. We need to set the config
{code}
phoenix.use.stats.parallelization
{code}
on both the client and the server side. When building the PTable on the server side, we now use the config default if the cell for USE_STATS_FOR_PARALLELIZATION is not present. Earlier it was defaulting to true.

> Incorrect aggregate query results when stats are disable for parallelization
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4287
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>            Priority: Major
>              Labels: localIndex
>             Fix For: 4.13.0, 4.12.1
>
>         Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch,
> PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch,
> PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER B
[jira] [Updated] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4343: -- Attachment: PHOENIX-4343.patch > In CREATE TABLE only allow setting guide post width on tables and global > indexes > > > Key: PHOENIX-4343 > URL: https://issues.apache.org/jira/browse/PHOENIX-4343 > Project: Phoenix > Issue Type: Bug > Reporter: Samarth Jain >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4343.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234774#comment-16234774 ] Samarth Jain commented on PHOENIX-4343: --- [~jamestaylor], please review. > In CREATE TABLE only allow setting guide post width on tables and global > indexes > > > Key: PHOENIX-4343 > URL: https://issues.apache.org/jira/browse/PHOENIX-4343 > Project: Phoenix > Issue Type: Bug > Reporter: Samarth Jain >Assignee: Samarth Jain >Priority: Major > Attachments: PHOENIX-4343.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes
Samarth Jain created PHOENIX-4343: - Summary: In CREATE TABLE only allow setting guide post width on tables and global indexes Key: PHOENIX-4343 URL: https://issues.apache.org/jira/browse/PHOENIX-4343 Project: Phoenix Issue Type: Bug Reporter: Samarth Jain Assignee: Samarth Jain Priority: Major -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234620#comment-16234620 ] Samarth Jain commented on PHOENIX-4287: --- OK, thanks. Looks like we are hitting a similar issue when using queries against views. Views should inherit the USE_STATS_FOR_PARALLELIZATION property from the base table. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch, > PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
> With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | null >| null | null | &
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234587#comment-16234587 ] Samarth Jain commented on PHOENIX-4287: --- [~mujtabachohan] - What kind of query are you running into the issue with? Is it against a table or a view? What happens after you execute an ALTER TABLE SET USE_STATS_FOR_PARALLELIZATION=false? Can you also check the value of USE_STATS_FOR_PARALLELIZATION for the base table in SYSTEM.CATALOG? > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain >Priority: Major > Labels: localIndex > Fix For: 4.13.0, 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch, > PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
> With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY F
[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created
[ https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234412#comment-16234412 ] Samarth Jain commented on PHOENIX-4335: --- +1 > System catalog snapshot created each time a new connection is created > - > > Key: PHOENIX-4335 > URL: https://issues.apache.org/jira/browse/PHOENIX-4335 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: James Taylor >Priority: Blocker > Fix For: 4.13.0 > > Attachments: PHOENIX-4335.patch, PHOENIX-4335_v2.patch, > PHOENIX-4335_v3.patch > > > With current head of 4.x, System Catalog snapshot is created on each new > connection. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created
[ https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233715#comment-16233715 ] Samarth Jain commented on PHOENIX-4335: --- Patch looks good, [~jamestaylor]. One minor nit: I don't see why these have to be arrays. {code} +private final static boolean[] reinitialize = new boolean[1]; +private final static int[] countUpgradeAttempts = new int[1]; +private final static long[] systemTableVersion = {MetaDataProtocol.getPriorVersion()}; {code} > System catalog snapshot created each time a new connection is created > - > > Key: PHOENIX-4335 > URL: https://issues.apache.org/jira/browse/PHOENIX-4335 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: James Taylor >Priority: Blocker > Fix For: 4.13.0 > > Attachments: PHOENIX-4335.patch, PHOENIX-4335_v2.patch > > > With current head of 4.x, System Catalog snapshot is created on each new > connection. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
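For context on the nit above: a one-element array behind a final reference is a common Java idiom for giving a lambda or anonymous class a mutable slot, because captured local variables must be effectively final. A static field can simply be reassigned, so it does not need the wrapper. The sketch below only illustrates that language rule; the class and method names are made up, and it says nothing about why the patch itself chose arrays.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class HolderIdiom {

    // A captured local must be effectively final, so a one-element array is
    // sometimes used as a mutable slot behind a final reference.
    static int countWithArray(int times) {
        final int[] counter = new int[1];
        Runnable bump = () -> counter[0]++;
        for (int i = 0; i < times; i++) {
            bump.run();
        }
        return counter[0];
    }

    // A static field needs no wrapper: a lambda can update it directly.
    private static int plainCounter;

    static int countWithStaticField(int times) {
        plainCounter = 0;
        Runnable bump = () -> plainCounter++;
        for (int i = 0; i < times; i++) {
            bump.run();
        }
        return plainCounter;
    }

    // AtomicInteger is the usual choice when the slot is shared across threads.
    static int countWithAtomic(int times) {
        AtomicInteger counter = new AtomicInteger();
        Runnable bump = counter::incrementAndGet;
        for (int i = 0; i < times; i++) {
            bump.run();
        }
        return counter.get();
    }
}
```

All three variants return the same count in single-threaded use; only the atomic variant is safe if several threads increment concurrently.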
[jira] [Commented] (PHOENIX-4332) Stats - Altering guidepost width on base table does not propagate to global index
[ https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227669#comment-16227669 ] Samarth Jain commented on PHOENIX-4332: --- [~jamestaylor], if possible, I would like to get this in for the 4.13 release. > Stats - Altering guidepost width on base table does not propagate to global > index > - > > Key: PHOENIX-4332 > URL: https://issues.apache.org/jira/browse/PHOENIX-4332 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain > > Altering the guidepost width on the data table does not propagate to the global index > when using the {{ALTER TABLE}} command. > Altering the global index table directly runs into a "not allowed" error. > {noformat} > ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1; > Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop > column referenced by VIEW columnName=IDX (state=42M01,code=1010) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created
[ https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227665#comment-16227665 ] Samarth Jain commented on PHOENIX-4335: --- My comment was more about the user expectation that a snapshot of the SYSTEM.CATALOG table will be created before Phoenix executes the upgrade code. They have been getting a snapshot for the past 4 releases or so (because we have been changing the metadata, yes), and now for the 4.12 release they won't. They can always create a snapshot themselves, but that is a bit of a hassle compared to Phoenix doing it for them. > System catalog snapshot created each time a new connection is created > - > > Key: PHOENIX-4335 > URL: https://issues.apache.org/jira/browse/PHOENIX-4335 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: James Taylor >Priority: Blocker > Fix For: 4.13.0 > > Attachments: PHOENIX-4335.patch > > > With current head of 4.x, System Catalog snapshot is created on each new > connection. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created
[ https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227658#comment-16227658 ] Samarth Jain commented on PHOENIX-4335: --- Thinking about this a little bit more, there is a slight downside that we won't be creating a snapshot of SYSTEM.CATALOG when users are upgrading to the 4.12 release. Maybe we should have some upgrade code to increment the SYSTEM table's timestamp even though we are not changing the metadata. > System catalog snapshot created each time a new connection is created > - > > Key: PHOENIX-4335 > URL: https://issues.apache.org/jira/browse/PHOENIX-4335 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: James Taylor >Priority: Blocker > Fix For: 4.13.0 > > > With current head of 4.x, System Catalog snapshot is created on each new > connection. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227639#comment-16227639 ] Samarth Jain commented on PHOENIX-4333: --- I have committed the test to the master, 4.x-HBase-0.98 and 4.12* branches. > Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain > Attachments: PHOENIX-4333_test.patch > > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4333: -- Attachment: PHOENIX-4333_test.patch Test which demonstrates the issue that [~mujtabachohan] brought up. I would say it is working fine. We call these estimates for a reason :). If the user desires more accuracy, he/she should call UPDATE STATISTICS on the tenant view. FYI, [~cody.mar...@gmail.com] > Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain > Attachments: PHOENIX-4333_test.patch > > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227597#comment-16227597 ] Samarth Jain edited comment on PHOENIX-4333 at 10/31/17 9:12 PM: - Test which demonstrates the issue that [~mujtabachohan] brought up. I would say it is working as designed. We call these estimates for a reason :). If the user desires more accuracy, he/she should call UPDATE STATISTICS on the tenant view. FYI, [~cody.mar...@gmail.com] was (Author: samarthjain): Test which demonstrates the issue that [~mujtabachohan] brought up. I would say it is working fine. We call these estimates for a reason :). If the user desires more accuracy, he/she should call UPDATE STATISTICS on the tenant view. FYI, [~cody.mar...@gmail.com] > Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain > Attachments: PHOENIX-4333_test.patch > > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view
[ https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226388#comment-16226388 ] Samarth Jain commented on PHOENIX-4333: --- I am not sure what the best option is here. We possibly shouldn't rely on EST_INFO_TS for tenant views since, when tenants overlap within a region like this, we may have incomplete guide post info for a view. The user could call update stats on the view after the first data load and then rely on major compaction to collect stats for it. [~jamestaylor], WDYT? > Stats - Incorrect estimate when stats are updated on a tenant specific view > --- > > Key: PHOENIX-4333 > URL: https://issues.apache.org/jira/browse/PHOENIX-4333 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain > > Consider two tenants A, B with tenant specific view on 2 separate > regions/region servers. > {noformat} > Region 1 keys: > A,1 > A,2 > B,1 > Region 2 keys: > B,2 > B,3 > {noformat} > When stats are updated on tenant A view. Querying stats on tenant B view > yield partial results (only contains stats for B,1) which are incorrect even > though it shows updated timestamp as current. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_v4.patch Thanks for the review, [~jamestaylor]. Attached is the updated patch. Will wait for the QA run to finish before I commit. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain > Labels: localIndex > Fix For: 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch, > PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. > With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) 
from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | null >| null | null | > +--+-++--+ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 27| > +---+ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created
[ https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226372#comment-16226372 ] Samarth Jain commented on PHOENIX-4335: --- You could possibly spy/mock the ConnectionQueryServicesImpl object and make sure that when establishing more than one HConnection to the cluster (by using the EXTRA_JDBC_ARGUMENTS param in the connection properties), {code} private void createSnapshot(String snapshotName, String tableName) throws SQLException { {code} is not called more than once. Such a test will fail without your patch. > System catalog snapshot created each time a new connection is created > - > > Key: PHOENIX-4335 > URL: https://issues.apache.org/jira/browse/PHOENIX-4335 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Priority: Blocker > Fix For: 4.13.0 > > > With current head of 4.x, System Catalog snapshot is created on each new > connection. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
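The test idea in the comment above (open several connections, assert the snapshot routine ran at most once) can be sketched without any mocking library by counting calls by hand. Everything below is hypothetical: `SharedUpgradeState` stands in for whatever state the real services share across connections, and `createSnapshotIfNeeded` stands in for the guarded call to `createSnapshot(snapshotName, tableName)`. A real test would spy on the actual ConnectionQueryServicesImpl instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SnapshotOnceSketch {

    /** Hypothetical state shared by every connection to the same cluster. */
    static final class SharedUpgradeState {
        private boolean snapshotTaken;
        final AtomicInteger snapshotCalls = new AtomicInteger();

        // Stand-in for the guarded createSnapshot(...) call; it only counts.
        synchronized void createSnapshotIfNeeded() {
            if (!snapshotTaken) {
                snapshotTaken = true;
                snapshotCalls.incrementAndGet();
            }
        }
    }

    /** Simulates opening n connections; returns how many snapshots were taken. */
    static int snapshotsAfter(int connections) {
        SharedUpgradeState state = new SharedUpgradeState();
        for (int i = 0; i < connections; i++) {
            state.createSnapshotIfNeeded();
        }
        return state.snapshotCalls.get();
    }
}
```

The assertion of interest is that the count stays at one no matter how many connections are opened; without the guard, the count would equal the number of connections, which is the behavior the issue reports.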
[jira] [Commented] (PHOENIX-4334) Unable to update stats on views that reside on separate regions before phoenix.stats.updateFrequency has elapsed
[ https://issues.apache.org/jira/browse/PHOENIX-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226350#comment-16226350 ] Samarth Jain commented on PHOENIX-4334: --- [~jamestaylor] - any other ideas on how we can keep update stats on view2 from being blocked when update stats on view1 has already run? We could possibly store last_update_stats_time at the logical table level too, but that would be a non-trivial change. > Unable to update stats on views that reside on separate regions before > phoenix.stats.updateFrequency has elapsed > > > Key: PHOENIX-4334 > URL: https://issues.apache.org/jira/browse/PHOENIX-4334 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain > > Consider multiple tenant views that all reside on unique region/region > servers. Updating stats on any one of the views causes other views to report > estimated stats last update time as current resulting in stats command > getting ignored for other views till {{phoenix.stats.updateFrequency}} has > elapsed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4334) Unable to update stats on views that reside on separate regions before phoenix.stats.updateFrequency has elapsed
[ https://issues.apache.org/jira/browse/PHOENIX-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226349#comment-16226349 ] Samarth Jain commented on PHOENIX-4334: --- We store last_update_time at the physical table level. So if we end up collecting stats for view1, then we will have to wait for phoenix.stats.updateFrequency to elapse before update stats on view2 has any effect. An alternative would be to set phoenix.stats.updateFrequency to 0. I will take a look at why view2 is reporting the estimate time as the current time. > Unable to update stats on views that reside on separate regions before > phoenix.stats.updateFrequency has elapsed > > > Key: PHOENIX-4334 > URL: https://issues.apache.org/jira/browse/PHOENIX-4334 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain > > Consider multiple tenant views that all reside on unique region/region > servers. Updating stats on any one of the views causes other views to report > estimated stats last update time as current resulting in stats command > getting ignored for other views till {{phoenix.stats.updateFrequency}} has > elapsed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
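The gating described in the comment above can be modelled in a few lines. This is a sketch under stated assumptions, not Phoenix's implementation: `StatsUpdateGate`, its map, and the table name are hypothetical, and the update-frequency value used in the usage note is just an example number. The one behavior taken from the comment is that the last update time is keyed by the physical table, so views sharing that table share one timestamp.

```java
import java.util.HashMap;
import java.util.Map;

final class StatsUpdateGate {
    // Keyed by physical table, so all views over that table share one entry.
    private final Map<String, Long> lastUpdateByPhysicalTable = new HashMap<>();
    private final long updateFrequencyMs;

    StatsUpdateGate(long updateFrequencyMs) {
        this.updateFrequencyMs = updateFrequencyMs;
    }

    /** Returns true if stats collection runs, false if the request is ignored. */
    boolean requestUpdate(String physicalTable, long nowMs) {
        Long last = lastUpdateByPhysicalTable.get(physicalTable);
        if (last != null && nowMs - last < updateFrequencyMs) {
            return false; // too soon after the previous run on this table
        }
        lastUpdateByPhysicalTable.put(physicalTable, nowMs);
        return true;
    }
}
```

With a frequency of, say, 900000 ms, a request for view1 at time 1000 runs and a request for view2 on the same physical table at time 2000 is ignored. With a frequency of 0, both run, which matches the workaround suggested in the comment.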
[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created
[ https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226347#comment-16226347 ] Samarth Jain commented on PHOENIX-4335: --- Would a straightforward change be to revert the MIN_SYSTEM_TABLE_TIMESTAMP increment? We rely on the system table's timestamp to check whether we need to create a snapshot. > System catalog snapshot created each time a new connection is created > - > > Key: PHOENIX-4335 > URL: https://issues.apache.org/jira/browse/PHOENIX-4335 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 >Reporter: Mujtaba Chohan >Priority: Blocker > Fix For: 4.13.0 > > > With current head of 4.x, System Catalog snapshot is created on each new > connection. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
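One way to read the comment above is sketched below. It is an interpretation, not the actual Phoenix logic: the assumption is that a snapshot is taken whenever the stored system table timestamp is older than the minimum the client expects, and that bumping the minimum without any upgrade step that advances the stored timestamp leaves the check true for every new connection. All names here are hypothetical.

```java
final class SnapshotDecision {

    // Assumption: a snapshot is considered necessary when the existing system
    // table's timestamp is older than the minimum the client expects.
    static boolean needsSnapshot(long systemTableTimestamp, long minSystemTableTimestamp) {
        return systemTableTimestamp < minSystemTableTimestamp;
    }

    /** Counts snapshots over n connections under the assumption above. */
    static int snapshotsTaken(long storedTs, long minTs,
                              boolean upgradeAdvancesTimestamp, int connections) {
        int snapshots = 0;
        for (int i = 0; i < connections; i++) {
            if (needsSnapshot(storedTs, minTs)) {
                snapshots++;
                if (upgradeAdvancesTimestamp) {
                    storedTs = minTs; // the upgrade brings the table up to date
                }
            }
        }
        return snapshots;
    }
}
```

Under this reading, reverting the increment makes the check false from the start, while keeping the increment requires some upgrade step that advances the stored timestamp.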
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_v3.patch I think I figured out what was going on. When we are not using stats for parallelization, we need to reset the start key of the scan to either the original scan's start key (if we are looking at the first region) or to the end key of the previous region. [~jamestaylor] - your keen eyes would be much appreciated. It is tricky to get this stuff right. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain > Labels: localIndex > Fix For: 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch, > PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. 
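The start-key rule described in the v3 comment above (first region: the original scan's start key; every later region: the end key of the previous region) can be sketched as follows. This is a simplified model, not the patch: keys are plain strings instead of byte arrays, `regionEndKeys` is assumed to hold only the region boundaries that fall strictly inside the scan range, in sorted order, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PerRegionScans {

    /**
     * Splits [scanStart, scanStop) into one range per region.
     * Each element of the result is a {start, stop} pair.
     */
    static List<String[]> split(String scanStart, String scanStop,
                                List<String> regionEndKeys) {
        List<String[]> scans = new ArrayList<>();
        String start = scanStart; // first region: the original scan's start key
        for (String regionEnd : regionEndKeys) {
            scans.add(new String[] {start, regionEnd});
            start = regionEnd;    // next region: previous region's end key
        }
        scans.add(new String[] {start, scanStop});
        return scans;
    }
}
```

For a scan over [a, z) with interior region boundaries h and p, this yields [a, h), [h, p) and [p, z), so the ranges are contiguous and no rows are skipped or counted twice.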
> With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRS
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4287: -- Attachment: PHOENIX-4287_v3_wip.patch wip patch for an attempt to use the existing code. Doesn't work, yet. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain > Labels: localIndex > Fix For: 4.12.1 > > Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch, > PHOENIX-4287_v3_wip.patch > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. > With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats 
available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | null >| null | null | > +--+-++--+ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 27| > +---+ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225583#comment-16225583 ]

Samarth Jain commented on PHOENIX-4287:
---------------------------------------

There is some level of duplication, but generating estimates when statsParallelization is off is relatively simple. We only need to intersect the scan start and stop keys with the guideposts, without worrying about region boundaries and everything else that the code in getParallelScans() handles. My previous attempt at using the existing code to generate estimates without generating intra-region scans failed miserably. I will sync with you offline to see what we can do to reuse the existing code.

> Incorrect aggregate query results when stats are disable for parallelization
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4287
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>             Fix For: 4.12.1
>
>         Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch
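
The simpler estimate generation described in this comment (intersecting the scan start and stop keys with the guideposts, ignoring region boundaries) might look roughly like the sketch below. It is not the code from the patch: `GuidePostEstimateSketch`, `GuidePost`, `Estimate`, and `estimate` are hypothetical names, row keys are modeled as strings rather than byte arrays, an empty key is assumed to mean an unbounded start or stop, and counting a guidepost when its key falls in [start, stop) is an assumption about the boundary handling.

```java
import java.util.Arrays;
import java.util.List;

public class GuidePostEstimateSketch {

    /** Hypothetical guidepost: its row key plus the rows and bytes attributed to it. */
    public static final class GuidePost {
        final String key;
        final long rowCount;
        final long byteCount;

        public GuidePost(String key, long rowCount, long byteCount) {
            this.key = key;
            this.rowCount = rowCount;
            this.byteCount = byteCount;
        }
    }

    /** Hypothetical holder for the summed estimates. */
    public static final class Estimate {
        public final long rows;
        public final long bytes;

        Estimate(long rows, long bytes) {
            this.rows = rows;
            this.bytes = bytes;
        }
    }

    /**
     * Sums the row and byte counts of every guidepost whose key lies in
     * [startKey, stopKey). An empty key is treated as unbounded (assumption).
     */
    public static Estimate estimate(List<GuidePost> guidePosts, String startKey, String stopKey) {
        long rows = 0;
        long bytes = 0;
        for (GuidePost gp : guidePosts) {
            boolean atOrAfterStart = startKey.isEmpty() || gp.key.compareTo(startKey) >= 0;
            boolean beforeStop = stopKey.isEmpty() || gp.key.compareTo(stopKey) < 0;
            if (atOrAfterStart && beforeStop) {
                rows += gp.rowCount;
                bytes += gp.byteCount;
            }
        }
        return new Estimate(rows, bytes);
    }

    public static void main(String[] args) {
        List<GuidePost> guidePosts = Arrays.asList(
                new GuidePost("b", 100, 1000),
                new GuidePost("d", 200, 2000),
                new GuidePost("f", 300, 3000));
        // Only the guideposts "d" and "f" fall inside the scan range [c, g).
        Estimate e = estimate(guidePosts, "c", "g");
        System.out.println(e.rows + " rows, " + e.bytes + " bytes");
    }
}
```

No region boundaries appear anywhere in the sketch, which is the simplification the comment points at; the real code path would still need to compare HBase row keys as unsigned bytes.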
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225434#comment-16225434 ]

Samarth Jain commented on PHOENIX-4287:
---------------------------------------

Yes, v2 just has changes relevant to this JIRA.

> Incorrect aggregate query results when stats are disable for parallelization
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4287
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>             Fix For: 4.12.1
>
>         Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4287:
----------------------------------
    Attachment: PHOENIX-4287_v2.patch

> Incorrect aggregate query results when stats are disable for parallelization
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4287
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>             Fix For: 4.12.1
>
>         Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch
[jira] [Commented] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224407#comment-16224407 ]

Samarth Jain commented on PHOENIX-4289:
---------------------------------------

Tests passed. I will go ahead and commit this patch.

> UPDATE STATISTICS command does not collect stats for local indexes
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4289
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4289
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1, Phoenix 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>         Attachments: PHOENIX-4289.patch, PHOENIX-4289_v2.patch, PHOENIX-4289_v3.patch, PHOENIX-4289_v4.patch
>
>
> With clean {{SYSTEM.STATS}} table and restarted HBase server+Phoenix client.
> Ran {{UPDATE STATISTICS T ALL}} command. Global guidepost width is set to
> 100M. No stats are generated for any of the local indexes on table T.
> {noformat}
> explain select count(*) from T;
> +-----------------------------------------------------+----------------+---------------+-------------+
> | PLAN                                                | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +-----------------------------------------------------+----------------+---------------+-------------+
> | CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER T [1] | null           | null          | null        |
> |     SERVER FILTER BY FIRST KEY ONLY                 | null           | null          | null        |
> |     SERVER AGGREGATE INTO SINGLE ROW                | null           | null          | null        |
> +-----------------------------------------------------+----------------+---------------+-------------+
> select * from system.stats;
> +---------------+---------------+----------------+-------------------+-------------------------+-----------------------+
> | PHYSICAL_NAME | COLUMN_FAMILY | GUIDE_POST_KEY | GUIDE_POSTS_WIDTH | LAST_STATS_UPDATE_TIME  | GUIDE_POSTS_ROW_COUNT |
> +---------------+---------------+----------------+-------------------+-------------------------+-----------------------+
> | T             |               |                | null              | 2017-10-16 18:36:57.884 | null                  |
> | T             | 0             | [B@9bd0fa6     | 10099             |                         | 75756                 |
> | T             | 0             | [B@59d2103b    | 10057             |                         | 75748                 |
> | T             | 0             | [B@39dcf4b0    | 10058             |                         | 75748                 |
> | T             | 0             | [B@6e4de19b    | 10081             |                         | 75743                 |
> | T             | 0             | [B@f6c03cb     | 10044             |                         | 75744                 |
> | T             | 0             | [B@46f699d5    | 10023             |                         | 75741                 |
> | T             | 0             | [B@18518ccf    | 10019             |                         | 75749                 |
> | T             | 0             | [B@1991f767    | 10097             |                         | 75740                 |
> | T             | 0             | [B@768ccdc5    | 10092             |                         | 75740                 |
> | T             | 0             | [B@4c6daf0     | 10026             |                         | 75739                 |
> | T             | 0             | [B@10650953    | 10054             |                         | 75731                 |
> | T             | 0             | [B@659eef7     | 10092             |                         | 75741                 |
> | T             | 0             | [B@162be91c    | 10023             |                         | 75752                 |
> | T             | 0             | [B@2488b073    | 10096             |                         | 75743                 |
> | T             | 0             | [B@1c9f0a20    | 10025             |                         | 75745                 |
> | T             | 0             | [B@55787112    | 10104             |                         | 75725                 |
> | T             | 0             | [B@1cd201a8    | 10019             |
[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4287:
----------------------------------
    Attachment: PHOENIX-4287.patch

Patch on top of PHOENIX-4289. [~jamestaylor], please review.

> Incorrect aggregate query results when stats are disable for parallelization
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4287
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4287
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>             Fix For: 4.12.1
>
>         Attachments: PHOENIX-4287.patch
[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4289:
----------------------------------
    Attachment: PHOENIX-4289_v4.patch

Fixing test failure.

> UPDATE STATISTICS command does not collect stats for local indexes
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4289
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4289
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1, Phoenix 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>         Attachments: PHOENIX-4289.patch, PHOENIX-4289_v2.patch, PHOENIX-4289_v3.patch, PHOENIX-4289_v4.patch
[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4289:
----------------------------------
    Attachment: PHOENIX-4289_v3.patch

v3 patch to address the test failure.

> UPDATE STATISTICS command does not collect stats for local indexes
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4289
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4289
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1, Phoenix 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>         Attachments: PHOENIX-4289.patch, PHOENIX-4289_v2.patch, PHOENIX-4289_v3.patch
[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4289:
----------------------------------
    Attachment: PHOENIX-4289_v2.patch

The previous patch had an issue that prevented stats from being collected for local indexes on views. Updated patch.

> UPDATE STATISTICS command does not collect stats for local indexes
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4289
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4289
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1, Phoenix 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>         Attachments: PHOENIX-4289.patch, PHOENIX-4289_v2.patch
[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4289:
----------------------------------
    Attachment:     (was: PHOENIX-4289.patch)

> UPDATE STATISTICS command does not collect stats for local indexes
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4289
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4289
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1, Phoenix 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>         Attachments: PHOENIX-4289.patch
[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4289:
----------------------------------
    Attachment: PHOENIX-4289.patch

> UPDATE STATISTICS command does not collect stats for local indexes
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4289
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4289
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1, Phoenix 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>         Attachments: PHOENIX-4289.patch
[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4289:
----------------------------------
    Attachment: PHOENIX-4289.patch

[~jamestaylor], please review.

> UPDATE STATISTICS command does not collect stats for local indexes
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4289
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4289
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.12.0
>         Environment: HBase 1.3.1, Phoenix 4.12.0
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>              Labels: localIndex
>         Attachments: PHOENIX-4289.patch
[jira] [Commented] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi
[ https://issues.apache.org/jira/browse/PHOENIX-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219492#comment-16219492 ]

Samarth Jain commented on PHOENIX-4320:
---------------------------------------

Something was wrong with my setup. [~mujtabachohan] just pushed a commit and fixed it. Thanks, Mujtaba.

> Update website pages with information on phoenix.use.stats.parallelization confi
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4320
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4320
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-4320.patch
[jira] [Commented] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi
[ https://issues.apache.org/jira/browse/PHOENIX-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219433#comment-16219433 ]

Samarth Jain commented on PHOENIX-4320:
---------------------------------------

Oops. Will fix it right away.

> Update website pages with information on phoenix.use.stats.parallelization confi
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4320
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4320
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-4320.patch
[jira] [Resolved] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi
[ https://issues.apache.org/jira/browse/PHOENIX-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain resolved PHOENIX-4320.
-----------------------------------
    Resolution: Fixed

> Update website pages with information on phoenix.use.stats.parallelization confi
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4320
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4320
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-4320.patch
[jira] [Updated] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi
[ https://issues.apache.org/jira/browse/PHOENIX-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain updated PHOENIX-4320:
----------------------------------
    Attachment: PHOENIX-4320.patch

> Update website pages with information on phoenix.use.stats.parallelization confi
> --------------------------------------------------------------------------------
>
>                 Key: PHOENIX-4320
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4320
>             Project: Phoenix
>          Issue Type: Task
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-4320.patch
[jira] [Created] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi
Samarth Jain created PHOENIX-4320:
-------------------------------------

             Summary: Update website pages with information on phoenix.use.stats.parallelization confi
                 Key: PHOENIX-4320
                 URL: https://issues.apache.org/jira/browse/PHOENIX-4320
             Project: Phoenix
          Issue Type: Task
            Reporter: Samarth Jain
            Assignee: Samarth Jain
[jira] [Comment Edited] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213608#comment-16213608 ] Samarth Jain edited comment on PHOENIX-4289 at 10/21/17 12:58 AM: -- I think I see what is going on. When a table has an index, we run update stats twice - once for the data table and once for the index table. We control update stats being called too many times in a short duration by using the configurable setting phoenix.stats.minUpdateFrequency. The check for when update stats was last run uses the physical_name as the filter. {code} String query = "SELECT CURRENT_DATE()," + LAST_STATS_UPDATE_TIME + " FROM " + PhoenixDatabaseMetaData.SYSTEM_STATS_NAME + " WHERE " + PHYSICAL_NAME + "='" + physicalName.getString() + "' AND " + COLUMN_FAMILY + " IS NULL AND " + LAST_STATS_UPDATE_TIME + " IS NOT NULL"; {code} For local indexes, the physical_name is same for both data table and index table. As a result the second update stats ends up not collecting any stats for the index table. The default value of this config is set to 0 in our tests. So an update statistics statement was collecting stats for both index and data tables. After setting QueryServicesTestImpl.DEFAULT_MIN_STATS_UPDATE_FREQ_MS to a large value, I am seeing now that the estimates are being returned as null. was (Author: samarthjain): I think I see what is going on. When a table has an index, we run update stats twice - once for the data table and once for the index table. We control update stats being called too many times in a short duration by using the configurable setting phoenix.stats.minUpdateFrequency. The check for when update stats was last run uses the physical_table_name as the filter. For local indexes, the physical_table_name is same for both data table and index table. As a result the second update stats ends up not collecting any stats for the index table. The default value of this config is set to 0 in our tests. 
So the tests weren't able to catch this issue. After setting QueryServicesTestImpl.DEFAULT_MIN_STATS_UPDATE_FREQ_MS to a large value, I am seeing now that the estimates are being returned as null. > UPDATE STATISTICS command does not collect stats for local indexes > -- > > Key: PHOENIX-4289 > URL: https://issues.apache.org/jira/browse/PHOENIX-4289 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1, Phoenix 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain > Labels: localIndex > > With clean {{SYSTEM.STATS}} table and restarted HBase server+Phoenix client. > Ran {{UPDATE STATISTICS T ALL}} command. Global guidepost width is set to > 100M. No stats are generated for any of the local indexes on table T. > {noformat} > explain select count(*) from T; > +---+-++--+ > | PLAN| > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +---+-++--+ > | CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER T [1] | > null| null | null | > | SERVER FILTER BY FIRST KEY ONLY | > null| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | > null| null | null | > +---+-++--+ > select * from system.stats; > +--++-++--++ > |PHYSICAL_NAME | COLUMN_FAMILY | GUIDE_POST_KEY | > GUIDE_POSTS_WIDTH | LAST_STATS_UPDATE_TIME | GUIDE_POSTS_ROW_COUNT | > +--++-++--++ > | T || | null | > 2017-10-16 18:36:57.884 | null | > | T | 0 | [B@9bd0fa6 | 10099 | >| 75756 | > | T | 0 | [B@59d2103b | 10057 | >
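The throttling behaviour described in this comment can be illustrated with a small stand-alone sketch. It is not Phoenix code: the class `StatsUpdateThrottle`, its method names, and the in-memory map standing in for the SYSTEM.STATS lookup are all hypothetical. It only shows why keying the "last run" check on the physical name skips the second run for a local index, and why a minimum update frequency of 0 (as in the tests) hides the problem.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the "when was update stats last run" check.
// None of these names come from Phoenix; the map plays the role of SYSTEM.STATS.
class StatsUpdateThrottle {
    private final Map<String, Long> lastUpdateByPhysicalName = new HashMap<>();
    private final long minUpdateFrequencyMs;

    StatsUpdateThrottle(long minUpdateFrequencyMs) {
        this.minUpdateFrequencyMs = minUpdateFrequencyMs;
    }

    // Returns true if stats were collected, false if the call was skipped
    // because the previous run for the same physical name was too recent.
    boolean updateStats(String physicalName, long nowMs) {
        Long last = lastUpdateByPhysicalName.get(physicalName);
        if (last != null && nowMs - last < minUpdateFrequencyMs) {
            return false;
        }
        lastUpdateByPhysicalName.put(physicalName, nowMs);
        return true;
    }

    public static void main(String[] args) {
        // A data table and its local index share the physical name "T" (assumed here).
        StatsUpdateThrottle throttled = new StatsUpdateThrottle(900_000L);
        System.out.println("data table collected:  " + throttled.updateStats("T", 1_000L));
        System.out.println("local index collected: " + throttled.updateStats("T", 1_001L));

        // With a minimum frequency of 0, as in the tests, both runs go through.
        StatsUpdateThrottle testDefault = new StatsUpdateThrottle(0L);
        System.out.println("data table collected:  " + testDefault.updateStats("T", 1_000L));
        System.out.println("local index collected: " + testDefault.updateStats("T", 1_001L));
    }
}
```

A fix along these lines would presumably need the check to distinguish the data table from the local index, but the sketch above does not attempt to say how Phoenix should do that.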
[jira] [Commented] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213608#comment-16213608 ] Samarth Jain commented on PHOENIX-4289: --- I think I see what is going on. When a table has an index, we run update stats twice - once for the data table and once for the index table. We control update stats being called too many times in a short duration by using the configurable setting phoenix.stats.minUpdateFrequency. The check for when update stats was last run uses the physical_table_name as the filter. For local indexes, the physical_table_name is same for both data table and index table. As a result the second update stats ends up not collecting any stats for the index table. The default value of this config is set to 0 in our tests. So the tests weren't able to catch this issue. After setting QueryServicesTestImpl.DEFAULT_MIN_STATS_UPDATE_FREQ_MS to a large value, I am seeing now that the estimates are being returned as null. > UPDATE STATISTICS command does not collect stats for local indexes > -- > > Key: PHOENIX-4289 > URL: https://issues.apache.org/jira/browse/PHOENIX-4289 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1, Phoenix 4.12.0 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain > Labels: localIndex > > With clean {{SYSTEM.STATS}} table and restarted HBase server+Phoenix client. > Ran {{UPDATE STATISTICS T ALL}} command. Global guidepost width is set to > 100M. No stats are generated for any of the local indexes on table T. 
> {noformat} > explain select count(*) from T; > +---+-++--+ > | PLAN| > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +---+-++--+ > | CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER T [1] | > null| null | null | > | SERVER FILTER BY FIRST KEY ONLY | > null| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | > null| null | null | > +---+-++--+ > select * from system.stats; > +--++-++--++ > |PHYSICAL_NAME | COLUMN_FAMILY | GUIDE_POST_KEY | > GUIDE_POSTS_WIDTH | LAST_STATS_UPDATE_TIME | GUIDE_POSTS_ROW_COUNT | > +--++-++--++ > | T || | null | > 2017-10-16 18:36:57.884 | null | > | T | 0 | [B@9bd0fa6 | 10099 | >| 75756 | > | T | 0 | [B@59d2103b | 10057 | >| 75748 | > | T | 0 | [B@39dcf4b0 | 10058 | >| 75748 | > | T | 0 | [B@6e4de19b | 10081 | >| 75743 | > | T | 0 | [B@f6c03cb | 10044 | >| 75744 | > | T | 0 | [B@46f699d5 | 10023 | >| 75741 | > | T | 0 | [B@18518ccf | 10019 | >| 75749 | > | T | 0 | [B@1991f767 | 10097 | >| 75740 | > | T | 0 | [B@768ccdc5 | 10092 | >| 75740 | > | T | 0 | [B@4c6daf0 | 10026 | >| 75739 | > | T | 0 | [B@10650953 | 10054 | >| 75731 | > | T | 0 | [B@659eef7 | 10092 | >| 757
[jira] [Commented] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes
[ https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213580#comment-16213580 ] Samarth Jain commented on PHOENIX-4289: --- [~mujtabachohan], I am unable to repro this issue in a unit test. This is what I added in ExplainPlanWithStatsEnabledIT:
{code}
@Test
public void testEstimatesWithLocalIndexes() throws Exception {
    String tableName = generateUniqueName();
    String indexName = "IDX_" + generateUniqueName();
    // Create the table, load ten rows, add a local index, then collect stats.
    try (Connection conn = DriverManager.getConnection(getUrl())) {
        int guidePostWidth = 20;
        conn.createStatement()
                .execute("CREATE TABLE " + tableName
                        + " (k INTEGER PRIMARY KEY, a bigint, b bigint)"
                        + " GUIDE_POSTS_WIDTH=" + guidePostWidth);
        conn.createStatement().execute("upsert into " + tableName + " values (100,1,3)");
        conn.createStatement().execute("upsert into " + tableName + " values (101,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (102,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (103,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (104,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (105,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (106,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (107,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (108,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (109,2,4)");
        conn.commit();
        conn.createStatement().execute(
                "CREATE LOCAL INDEX " + indexName + " ON " + tableName + " (a) INCLUDE (b) ");
        conn.createStatement().execute("UPDATE STATISTICS " + tableName + "");
    }
    List<Object> binds = Lists.newArrayList();
    // Verify that the query uses the local index and that estimates are available.
    try (Connection conn = DriverManager.getConnection(getUrl())) {
        String sql = "SELECT COUNT(*) " + " FROM " + tableName;
        ResultSet rs = conn.createStatement().executeQuery(sql);
        assertTrue("Index " + indexName + " should have been used",
                rs.unwrap(PhoenixResultSet.class).getStatement().getQueryPlan().getTableRef()
                        .getTable().getName().getString().equals(indexName));
        Estimate info = getByteRowEstimates(conn, sql, binds);
        assertEquals((Long) 10L, info.estimatedRows);
        assertTrue(info.estimateInfoTs > 0);
    }
}
{code}
> UPDATE STATISTICS command does not collect stats for local indexes > ------ > > Key: PHOENIX-4289 > URL: https://issues.apache.org/jira/browse/PHOENIX-4289 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1, Phoenix 4.12.0 >Reporter: Mujtaba Chohan >Assignee: Samarth Jain > Labels: localIndex > > With clean {{SYSTEM.STATS}} table and restarted HBase server+Phoenix client. > Ran {{UPDATE STATISTICS T ALL}} command. Global guidepost width is set to > 100M. No stats are generated for any of the local indexes on table T. > {noformat} > explain select count(*) from T; > +---+-++--+ > | PLAN| > EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +---+-++--+ > | CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER T [1] | > null| null | null | > | SERVER FILTER BY FIRST KEY ONLY | > null| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | > null| null | null | > +---+-++--+ > select * from system.stats; > +
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209858#comment-16209858 ] Samarth Jain commented on PHOENIX-4287: --- Looks like it is limited to local indexes. Will keep looking. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain > Fix For: 4.12.1 > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. > With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | 
> EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | null >| null | null | > +--+-++--+ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 27| > +---+ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization
[ https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208163#comment-16208163 ] Samarth Jain commented on PHOENIX-4287: --- Yes, I am working on it. > Incorrect aggregate query results when stats are disable for parallelization > > > Key: PHOENIX-4287 > URL: https://issues.apache.org/jira/browse/PHOENIX-4287 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.12.0 > Environment: HBase 1.3.1 >Reporter: Mujtaba Chohan > Assignee: Samarth Jain > Fix For: 4.12.1 > > > With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query > returns incorrect results when stats are available. > With local index and stats disabled for parallelization: > {noformat} > explain select count(*) from TABLE_T; > +---+-++---+ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO | > +---+-++---+ > | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER > TABLE_T [1] | 625043899 | 332170 | 150792825 | > | SERVER FILTER BY FIRST KEY ONLY > | 625043899 | 332170 | 150792825 | > | SERVER AGGREGATE INTO SINGLE ROW > | 625043899 | 332170 | 150792825 | > +---+-++---+ > select count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 0 | > +---+ > {noformat} > Using data table > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-+++ > | PLAN > | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS | > +--+-+++ > | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER > TABLE_T | 438492470 | 332151 | 1507928257617 | > | SERVER FILTER BY FIRST KEY ONLY > | 438492470 | 332151 | 1507928257617 | > | SERVER AGGREGATE INTO SINGLE ROW > | 438492470 | 332151 | 1507928257617 | > +--+-+++ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 14| > +---+ > {noformat} > Without stats available, results are correct: > {noformat} > explain select /*+NO_INDEX*/ count(*) from TABLE_T; > +--+-++--+ > | PLAN | > EST_BYTES_READ | EST_ROWS_READ | 
EST_INFO_TS | > +--+-++--+ > | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null| > null | null | > | SERVER FILTER BY FIRST KEY ONLY | null >| null | null | > | SERVER AGGREGATE INTO SINGLE ROW | null >| null | null | > +--+-++--+ > select /*+NO_INDEX*/ count(*) from TABLE_T; > +---+ > | COUNT(1) | > +---+ > | 27| > +---+ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
Re: [ANNOUNCE] New Phoenix committer: Ethan Wang
Congrats, Ethan!

On Thu, Oct 12, 2017 at 9:25 AM Thomas D'Silva wrote:

> Congrats Ethan!
>
> On Thu, Oct 12, 2017 at 8:28 AM, Geoffrey Jacoby wrote:
>
> > Congrats, Ethan! Looking forward to using those new functions soon.
> >
> > Geoffrey
> >
> > On Thu, Oct 12, 2017 at 1:32 AM, rajeshb...@apache.org <chrajeshbab...@gmail.com> wrote:
> >
> > > Congratulations Ethan!! Great Job.
> > >
> > > Thanks,
> > > Rajeshbabu.
> > >
> > > On Thu, Oct 12, 2017 at 7:15 AM, James Taylor wrote:
> > >
> > > > On behalf of the Apache Phoenix PMC, I'm pleased to announce that Ethan Wang
> > > > has accepted our invitation to become a committer. He's behind some of the
> > > > great new 4.12 features of table sampling [1] and approximate count
> > > > distinct [2] along with contributing to the less sexy work of helping to
> > > > stabilize our unit tests.
> > > >
> > > > Please give Ethan a warm welcome to the project!
> > > >
> > > > James
> > > >
> > > > [1] https://phoenix.apache.org/tablesample.html
> > > > [2] https://phoenix.apache.org/language/functions.html#approx_count_distinct
Re: [ANNOUNCE] New Phoenix committer: Vincent Poon
Congrats, Vincent!

On Thu, Oct 12, 2017 at 9:25 AM Thomas D'Silva wrote:

> Congrats Vincent!
>
> On Thu, Oct 12, 2017 at 8:27 AM, Geoffrey Jacoby wrote:
>
> > Congrats, Vincent! Thanks for all your help on the index stabilization.
> >
> > On Thu, Oct 12, 2017 at 1:32 AM, rajeshb...@apache.org <chrajeshbab...@gmail.com> wrote:
> >
> > > Congratulations Vincent!! Great Job.
> > >
> > > Thanks,
> > > Rajeshbabu.
> > >
> > > On Thu, Oct 12, 2017 at 7:21 AM, James Taylor wrote:
> > >
> > > > On behalf of the Apache Phoenix PMC, I'm delighted to announce that Vincent
> > > > Poon has accepted our invitation to become a committer. He's had a big
> > > > impact in helping to stabilize our secondary index implementation,
> > > > including the creation of an index scrutiny tool that will detect
> > > > out-of-sync issues [1].
> > > >
> > > > Looking forward to continued contributions.
> > > >
> > > > Please give Vincent a warm welcome to the project!
> > > >
> > > > James
> > > >
> > > > [1] https://phoenix.apache.org/secondary_indexing.html#Index_Scrutiny_Tool
Re: [VOTE] Release of Apache Phoenix 4.12.0 RC0
+1
- built from source
- successfully ran all unit and integration tests
- collected stats using major compaction and update stats - estimates look correct
- ran some basic manual tests involving global mutable and immutable secondary indexes, looks good.

On Fri, Oct 6, 2017 at 1:03 PM, lars hofhansl wrote:

> +1
> - built from source
> - loaded a few million rows into Phoenix
> - tried some queries
> - nothing undue in the logs
> - killed a region server while the client was in the middle of a large update (UPSERT ... SELECT ...)
> - all recovered nicely
>
> From: James Taylor
> To: "dev@phoenix.apache.org"
> Sent: Wednesday, October 4, 2017 12:46 AM
> Subject: [VOTE] Release of Apache Phoenix 4.12.0 RC0
>
> Hello Everyone,
>
> This is a call for a vote on Apache Phoenix 4.12.0 RC0. This is the next
> minor release of Phoenix 4, compatible with Apache HBase 0.98, 1.1, 1.2, &
> 1.3. The release includes both a source-only release and a convenience
> binary release for each supported HBase version.
>
> This release has feature parity with supported HBase versions and includes
> the following improvements:
> - Improved scalability of global mutable secondary index
> - 100+ bug fixes (the majority around secondary indexing)
> - Index Scrutiny tool [1]
> - Stabilization of unit tests
> - Support for table sampling [2]
> - Support for APPROX_COUNT_DISTINCT aggregate function [3]
>
> The source tarball, including signatures, digests, etc can be found at:
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-v4.12.0-HBase-0.98-rc0/src/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-v4.12.0-HBase-1.1-rc0/src/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-v4.12.0-HBase-1.2-rc0/src/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-v4.12.0-HBase-1.3-rc0/src/
>
> The binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-v4.12.0-HBase-0.98-rc0/bin/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-v4.12.0-HBase-1.1-rc0/bin/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-v4.12.0-HBase-1.2-rc0/bin/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-v4.12.0-HBase-1.3-rc0/bin/
>
> For a complete list of changes, see:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315120&version=12340844
>
> Artifacts are signed with my "CODE SIGNING KEY": 308FBEE06088BE0F
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/dev/phoenix/KEYS
>
> The hash and tag to be voted upon:
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=13a7f97b49704642d67481c58a118a68c2e4c2e5
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;h=refs/tags/v4.12.0-HBase-0.98-rc0
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=e40bbfff1150e56e1ecb7cd22c49cee298496c2b
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;h=refs/tags/v4.12.0-HBase-1.1-rc0
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=d79dd50ff732f2673e1414d970cd4742e2c135de
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;h=refs/tags/v4.12.0-HBase-1.2-rc0
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=f0bc4cdb5bbf96b316c78cc816400b04f63e911b
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;h=refs/tags/v4.12.0-HBase-1.3-rc0
>
> Vote will be open for at least 72 hours. Please vote:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> The Apache Phoenix Team
>
> [1] https://phoenix.apache.org/secondary_indexing.html#Index_Scrutiny_Tool
> [2] https://phoenix.apache.org/tablesample.html
> [3] https://phoenix.apache.org/language/functions.html#approx_count_distinct
[jira] [Commented] (PHOENIX-4276) Surface metrics on statistics collection
[ https://issues.apache.org/jira/browse/PHOENIX-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191580#comment-16191580 ] Samarth Jain commented on PHOENIX-4276: --- FYI, [~Misraji] > Surface metrics on statistics collection > > > Key: PHOENIX-4276 > URL: https://issues.apache.org/jira/browse/PHOENIX-4276 > Project: Phoenix > Issue Type: Improvement > Reporter: Samarth Jain > > It would be good to get an insight on how stats collection is doing over > time. An initial set of metrics that I can think of would be: > Time taken to compute stats (reading cells and computing their size) > Time taken to commit stats per physical table. > Number of guide posts collected per physical table > Number of guide posts collected per region. > Number of regions on which stats collection happened per physical table > Number of times stats was collected due to major compaction vs update stats > per physical table > If possible, figure out if stats was collected because minor compaction was > promoted to major compaction and surface a metric for it. > Because most of the collection work happens on server side, one option would > be to see how HBase's metrics are surfaced (my guess is JMX) and follow the > same pattern. Or we could possibly use the hbase-metrics-api module but that > is an HBase 1.4 thing. Another option would be see PHOENIX-3807 for some > inspiration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4276) Surface metrics on statistics collection
[ https://issues.apache.org/jira/browse/PHOENIX-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samarth Jain updated PHOENIX-4276: -- Issue Type: Improvement (was: Bug) > Surface metrics on statistics collection > > > Key: PHOENIX-4276 > URL: https://issues.apache.org/jira/browse/PHOENIX-4276 > Project: Phoenix > Issue Type: Improvement > Reporter: Samarth Jain > > It would be good to get an insight on how stats collection is doing over > time. An initial set of metrics that I can think of would be: > Time taken to compute stats (reading cells and computing their size) > Time taken to commit stats per physical table. > Number of guide posts collected per physical table > Number of guide posts collected per region. > Number of regions on which stats collection happened per physical table > Number of times stats was collected due to major compaction vs update stats > per physical table > If possible, figure out if stats was collected because minor compaction was > promoted to major compaction and surface a metric for it. > Because most of the collection work happens on server side, one option would > be to see how HBase's metrics are surfaced (my guess is JMX) and follow the > same pattern. Or we could possibly use the hbase-metrics-api module but that > is an HBase 1.4 thing. Another option would be see PHOENIX-3807 for some > inspiration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PHOENIX-4276) Surface metrics on statistics collection
Samarth Jain created PHOENIX-4276:
Summary: Surface metrics on statistics collection
Key: PHOENIX-4276
URL: https://issues.apache.org/jira/browse/PHOENIX-4276
Project: Phoenix
Issue Type: Bug
Reporter: Samarth Jain

It would be good to get some insight into how stats collection is doing over time. An initial set of metrics that I can think of would be:
- Time taken to compute stats (reading cells and computing their size)
- Time taken to commit stats per physical table
- Number of guide posts collected per physical table
- Number of guide posts collected per region
- Number of regions on which stats collection happened per physical table
- Number of times stats was collected due to major compaction vs update stats per physical table
- If possible, figure out if stats was collected because minor compaction was promoted to major compaction and surface a metric for it.

Because most of the collection work happens on the server side, one option would be to see how HBase's metrics are surfaced (my guess is JMX) and follow the same pattern. Or we could possibly use the hbase-metrics-api module, but that is an HBase 1.4 thing. Another option would be to see PHOENIX-3807 for some inspiration.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
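As a rough illustration of the metrics listed in this issue, here is a minimal sketch of per-physical-table counters in plain Java. Everything in it is an assumption made for illustration: `StatsCollectionMetrics`, `TableMetrics`, `Trigger`, and `recordRegionRun` are hypothetical names, not Phoenix or HBase APIs, and the sketch deliberately leaves open how the counters would be surfaced (JMX, hbase-metrics-api, or something else).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical holder for stats-collection metrics, keyed by physical table name.
class StatsCollectionMetrics {
    // What caused a stats collection run (assumed to be known by the caller).
    enum Trigger { MAJOR_COMPACTION, UPDATE_STATS }

    // Counters for one physical table; LongAdder keeps concurrent updates cheap.
    static final class TableMetrics {
        final LongAdder computeTimeMs = new LongAdder();
        final LongAdder commitTimeMs = new LongAdder();
        final LongAdder guidePosts = new LongAdder();
        final LongAdder regionsCollected = new LongAdder();
        final LongAdder majorCompactionRuns = new LongAdder();
        final LongAdder updateStatsRuns = new LongAdder();
    }

    private final Map<String, TableMetrics> byPhysicalTable = new ConcurrentHashMap<>();

    // Records one region-level collection run for the given physical table.
    void recordRegionRun(String physicalTable, Trigger trigger,
                         long computeMs, long commitMs, long guidePostCount) {
        TableMetrics m = byPhysicalTable.computeIfAbsent(physicalTable, k -> new TableMetrics());
        m.computeTimeMs.add(computeMs);
        m.commitTimeMs.add(commitMs);
        m.guidePosts.add(guidePostCount);
        m.regionsCollected.increment();
        if (trigger == Trigger.MAJOR_COMPACTION) {
            m.majorCompactionRuns.increment();
        } else {
            m.updateStatsRuns.increment();
        }
    }

    // Returns the counters for a table, or null if nothing was recorded for it.
    TableMetrics forTable(String physicalTable) {
        return byPhysicalTable.get(physicalTable);
    }

    public static void main(String[] args) {
        StatsCollectionMetrics metrics = new StatsCollectionMetrics();
        metrics.recordRegionRun("T", Trigger.UPDATE_STATS, 120L, 15L, 8L);
        metrics.recordRegionRun("T", Trigger.MAJOR_COMPACTION, 300L, 20L, 12L);
        TableMetrics t = metrics.forTable("T");
        System.out.println("guide posts: " + t.guidePosts.sum()
                + ", regions: " + t.regionsCollected.sum());
    }
}
```

Per-region guide post counts and the "minor compaction promoted to major" case are not covered here; they would need extra keys or a third trigger value.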