[jira] [Commented] (PHOENIX-2896) Support encoded column qualifiers per column family

2018-05-24 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488565#comment-16488565
 ] 

Samarth Jain commented on PHOENIX-2896:
---

[~tdsilva] - I am not sure I understood your question. We use the default 
family name for tracking column qualifier counters for mutable tables. 
{code:java}
if (immutableStorageScheme == SINGLE_CELL_ARRAY_WITH_OFFSETS
        && encodingScheme != NON_ENCODED_QUALIFIERS) {
    // For this scheme we track column qualifier counters at the column family level.
    cqCounterFamily = colDefFamily != null
            ? colDefFamily
            : (defaultFamilyName != null ? defaultFamilyName : DEFAULT_COLUMN_FAMILY);
} else {
    // For other schemes, column qualifier counters are tracked using the default column family.
    cqCounterFamily = defaultFamilyName != null ? defaultFamilyName : DEFAULT_COLUMN_FAMILY;
}
{code}
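
For readers who want to try the selection logic in isolation, here is a minimal, self-contained sketch of the same decision. The class name, the two enums and the value "0" for DEFAULT_COLUMN_FAMILY are assumptions made for illustration; this is not the actual Phoenix code.

```java
final class CqCounterFamilySketch {
    // Hypothetical stand-ins for the storage and encoding schemes named in the snippet.
    enum StorageScheme { ONE_CELL_PER_COLUMN, SINGLE_CELL_ARRAY_WITH_OFFSETS }
    enum EncodingScheme { NON_ENCODED_QUALIFIERS, TWO_BYTE_QUALIFIERS }

    // Assumed placeholder for the built-in default column family name.
    static final String DEFAULT_COLUMN_FAMILY = "0";

    // Returns the column family under which the qualifier counter is tracked.
    static String counterFamily(StorageScheme storage, EncodingScheme encoding,
                                String colDefFamily, String defaultFamilyName) {
        String tableDefault = defaultFamilyName != null ? defaultFamilyName : DEFAULT_COLUMN_FAMILY;
        if (storage == StorageScheme.SINGLE_CELL_ARRAY_WITH_OFFSETS
                && encoding != EncodingScheme.NON_ENCODED_QUALIFIERS) {
            // Counters are tracked per column family for this scheme.
            return colDefFamily != null ? colDefFamily : tableDefault;
        }
        // All other schemes share one counter under the default family.
        return tableDefault;
    }

    public static void main(String[] args) {
        System.out.println(counterFamily(StorageScheme.SINGLE_CELL_ARRAY_WITH_OFFSETS,
                EncodingScheme.TWO_BYTE_QUALIFIERS, "CF1", null));
        System.out.println(counterFamily(StorageScheme.ONE_CELL_PER_COLUMN,
                EncodingScheme.TWO_BYTE_QUALIFIERS, "CF1", null));
    }
}
```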

> Support encoded column qualifiers per column family 
> 
>
> Key: PHOENIX-2896
> URL: https://issues.apache.org/jira/browse/PHOENIX-2896
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Thomas D'Silva
>Assignee: Samarth Jain
>Priority: Major
> Fix For: 4.10.0
>
>
> This allows us to reduce the number of null values in the stored array that 
> contains all columns for a given column family for the 
> COLUMNS_STORED_IN_SINGLE_CELL Storage Scheme.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4701) Improve schema of SYSTEM.LOG table

2018-04-24 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16451462#comment-16451462
 ] 

Samarth Jain commented on PHOENIX-4701:
---

I haven't looked closely at the original commit, [~jamestaylor]. But do you 
think we could run into some kind of infinite loop by using the Phoenix API to 
write to the SYSTEM.LOG table? If so, we may need to do something similar to 
what our tracing framework does, where it makes sure that writes to the 
SYSTEM.TRACE table do not generate traces themselves.
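
One way to picture the concern is a re-entrancy guard: if the log write goes through the same entry point as user queries, a per-thread flag can stop the write from logging itself. The sketch below is only an illustration under that assumption. The class, the method names and the in-memory list standing in for the log table are all hypothetical, and it does not show how the tracing framework actually does this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class LogWriteGuardSketch {
    // True while the current thread is itself writing a log entry.
    private static final ThreadLocal<Boolean> WRITING_LOG = ThreadLocal.withInitial(() -> false);

    // In-memory stand-in for the log table (hypothetical).
    static final List<String> LOG_TABLE = Collections.synchronizedList(new ArrayList<>());

    // Common entry point for user statements and for the log write itself.
    static void execute(String sql) {
        // ... the statement would run here (omitted) ...
        if (!WRITING_LOG.get()) {
            logQuery(sql);
        }
    }

    private static void logQuery(String sql) {
        WRITING_LOG.set(true);
        try {
            LOG_TABLE.add(sql);
            // The log write reuses execute(); the flag keeps it from being logged again.
            execute("UPSERT INTO SYSTEM.LOG ...");
        } finally {
            WRITING_LOG.set(false);
        }
    }

    public static void main(String[] args) {
        execute("SELECT 1");
        System.out.println(LOG_TABLE);
    }
}
```

Without the flag, the call inside logQuery would recurse until the stack overflowed, which is the kind of loop the comment above asks about.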

> Improve schema of SYSTEM.LOG table
> --
>
> Key: PHOENIX-4701
> URL: https://issues.apache.org/jira/browse/PHOENIX-4701
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-4701_wip1.patch, PHOENIX-4701_wip2.patch
>
>
> If possible, the SYSTEM.LOG table would benefit greatly  (3-5x perf gain) 
> from being declared as immutable with a column encoding of 1 byte and a 
> storage format of SINGLE_CELL_ARRAY_WITH_OFFSETS.





[jira] [Commented] (PHOENIX-4366) Rebuilding a local index fails sometimes

2018-04-10 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432792#comment-16432792
 ] 

Samarth Jain commented on PHOENIX-4366:
---

Ah, I see! Thanks for the explanation, [~sergey.soldatov].

> Rebuilding a local index fails sometimes
> 
>
> Key: PHOENIX-4366
> URL: https://issues.apache.org/jira/browse/PHOENIX-4366
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Marcin Januszkiewicz
>Assignee: James Taylor
>Priority: Blocker
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4366_v1.patch
>
>
> We have a table created in 4.12 with the new column encoding scheme and with 
> several local indexes. Sometimes when we issue an ALTER INDEX ... REBUILD 
> command, it fails with the following exception:
> {noformat}
> Error: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: TRACES,\x01BY01O90A6-$599a349e,1509979836322.3f30c9d449ed6c60a1cda6898f766bd0.: null
>   at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96)
>   at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
>   at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:255)
>   at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:284)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2541)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2183)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)
> Caused by: java.lang.UnsupportedOperationException
>   at org.apache.phoenix.schema.PTable$QualifierEncodingScheme$1.decode(PTable.java:247)
>   at org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList.java:141)
>   at org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList.java:56)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:560)
>   at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
>   at org.apache.hadoop.hbase.regionserver.HRegio

[jira] [Commented] (PHOENIX-4366) Rebuilding a local index fails sometimes

2018-04-09 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431669#comment-16431669
 ] 

Samarth Jain commented on PHOENIX-4366:
---

I was motivated by just getting hold of the column encoding related values once 
in preScannerOpen and reusing them across the board (instead of having to fetch 
them from the scan context every time). I made this change on the assumption 
that every region gets its own co-processor instance. Or is it one instance per 
region server? If the former, why is it problematic to store these values as 
member variables, since their scope should be limited to the table region?

> Rebuilding a local index fails sometimes
> 
>
> Key: PHOENIX-4366
> URL: https://issues.apache.org/jira/browse/PHOENIX-4366
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Marcin Januszkiewicz
>Assignee: James Taylor
>Priority: Blocker
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4366_v1.patch
>
>
> We have a table created in 4.12 with the new column encoding scheme and with 
> several local indexes. Sometimes when we issue an ALTER INDEX ... REBUILD 
> command, it fails with the following exception:
> {noformat}
> Error: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: TRACES,\x01BY01O90A6-$599a349e,1509979836322.3f30c9d449ed6c60a1cda6898f766bd0.: null
>   at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96)
>   at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
>   at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:255)
>   at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:284)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2541)
>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2183)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:183)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:163)
> Caused by: java.lang.UnsupportedOperationException
>   at org.apache.phoenix.schema.PTable$QualifierEncodingScheme$1.decode(PTable.java:247)
>   at org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList.java:141)
>   at org.apache.phoenix.schema.tuple.EncodedColumnQualiferCellsList.add(EncodedColumnQualiferCellsList.java:56)
>   at org.apache.hadoop.hbase.regionserver.StoreScanner.

Re: [DISCUSS] Design for a "query log"

2018-03-02 Thread Samarth Jain
A couple more points which I think you alluded to, Josh, but I would still
like to call out:
1) Writing of these query logs to a phoenix table should be best effort
i.e. a query definitely shouldn't fail because we encountered an issue
while writing its log
2) Writing of query logs should happen in a manner that is async to the
flow of the query i.e. a query shouldn't incur the cost of the write
happening to the query log table

Doing 2) will help out with 1)
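
To make the relationship between 1) and 2) concrete, here is a small sketch of a best-effort, asynchronous hand-off. It assumes a bounded in-memory queue between the query path and a background writer. The class name, the counters and the Consumer used as the sink are hypothetical; this is a sketch of the idea, not a proposal for the actual implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

final class BestEffortQueryLogSketch {
    private final BlockingQueue<String> pending;
    private final Consumer<String> sink;
    private final AtomicLong dropped = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    BestEffortQueryLogSketch(int capacity, Consumer<String> sink) {
        this.pending = new ArrayBlockingQueue<>(capacity);
        this.sink = sink;
    }

    // Called on the query path: never blocks and never throws.
    boolean submit(String entry) {
        boolean accepted = pending.offer(entry);
        if (!accepted) {
            dropped.incrementAndGet(); // queue full: drop the entry, the query carries on
        }
        return accepted;
    }

    // Called from a background thread: writes what is queued, swallowing sink failures.
    int drainOnce() {
        int written = 0;
        String entry;
        while ((entry = pending.poll()) != null) {
            try {
                sink.accept(entry);
                written++;
            } catch (RuntimeException e) {
                failed.incrementAndGet(); // best effort: count the failure and move on
            }
        }
        return written;
    }

    long droppedCount() { return dropped.get(); }
    long failedCount() { return failed.get(); }
}
```

In this shape a single-threaded scheduled executor could call drainOnce() periodically. Because the query thread only ever calls submit(), a slow or failing log table cannot fail or delay the query.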





On Fri, Mar 2, 2018 at 2:28 PM, Josh Elser  wrote:

> Thanks Nick and Andrew! These are great points.
>
> * A TTL out of the box is a must. That's such a good suggestion
> * Sensitivity of data being stored is also a tricky-serious issue to
> consider. We'll want to lock the table down and be able to state very
> clearly what data may show up in it.
> * I like the "levels" of detail that will be persisted. It will help break
> up the development work (e.g. first impl can just be the INFO details), and
> prevents concern of runtime impact.
> * Sampling is a no-brainer for "always-on" situations. I like that too.
>
> I'll work on taking these (and others) and updating the gdoc tonight.
> Thanks again for your feedback!
>
>
> On 3/2/18 1:50 PM, Andrew Purtell wrote:
>
>> Agree with Nick's points but let me augment with an additional suggestion:
>> Tunable/configurable threshold for sampling. In many cases it's sufficient
>> to sample e.g. 1% of queries to get sufficient coverage and this would
>> prune 99% of actual load from the query log.
>>
>> Also let me underline that compliance requirements will require either
>> super strong controls of the query log if everything is always logged, in
>> which case it is important that it works well with access control features
>> to lock it down; or better what Nick suggests where we can turn off things
>> like logging the values supplied for bound parameters.
>>
>>
>>
>> On Fri, Mar 2, 2018 at 8:41 AM, Nick Dimiduk  wrote:
>>
>>> I'm a big fan of this idea. There was a brief discussion on the topic over
>>> on PHOENIX-2715.
>>>
>>> My first concern is that the collected information is huge -- easily far
>>> larger than the user data for a busy cluster. For instance, a couple 10's
>>> of GB stored user data, guideposts set to default 100mb, enable salting on
>>> a table with an "innocent" value of 10 or 20 and the collection of RPCs can
>>> easily grow into the hundreds for simple queries. Even if you catalog just
>>> the "logical" RPC's - HBase Client API calls that Phoenix plans rather than
>>> the underlying HBase Client RPCs - this will be quite large. The guideposts
>>> themselves for such a table would be on the order of 30mb.
>>>
>>> My next concern is about the sensitive query parameters being stored. It's
>>> entirely reasonable to expect a table to store sensitive information that
>>> should not be exposed to operations.
>>>
>>> Thus, my suggestions:
>>> * minimize the unbounded nature of this table by truncating all columns to
>>> some max length -- perhaps 5k or 10k.
>>> * enable a default TTL on the schema. 7 days seems like a good starting
>>> point.
>>> * consider controlling which columns are populated via some operational
>>> mechanism. Use Logger level as an example, with INFO the default setting.
>>> Which data is stored at this level? Then at DEBUG, then TRACE. Maybe
>>> timestamp, SQL, and explain are at INFO. DEBUG adds bound parameters and
>>> scan metrics. TRACE adds RPCs and timing, snapshot metadata.
>>>
>>> Thanks,
>>> Nick
>>>
>>> On Mon, Feb 26, 2018 at 1:57 PM, Josh Elser  wrote:
>>>
>>>> Hiya,
>>>>
>>>> I wanted to share this little design doc with you about some feature work
>>>> we've been thinking about. The following is a Google doc in which anyone
>>>> should be allowed to comment. Feel free to comment there, or here on the
>>>> thread.
>>>>
>>>> https://s.apache.org/phoenix-query-log
>>>>
>>>> The high-level goal is to create a construct in which Phoenix clients will
>>>> automatically serialize information about the queries they run to a table
>>>> for retrospective analysis. Ideally, this information would be stored in a
>>>> Phoenix table. We want this data to help answer questions like:
>>>>
>>>> * What queries are running against my system
>>>> * What specific queries started between 5:35 AM and 6:20 AM two days ago
>>>> * What queries are user "bob" running
>>>> * Are my users' queries effectively using the indexes in the system
>>>>
>>>> Anti-goals include:
>>>>
>>>> * Cluster impact (computation/memory) usage of a query
>>>> * Query performance may be slowed to ensure all data is serialized
>>>> * A third-party service dedicated to ensuring query info is serialized (in
>>>> the event of client failure)
>>>>
>>>> Take a look at the document and let us know what you think please. I'm
>>>> happy to try to explain this in greater detail.
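
The suggestions quoted in this thread (log levels that decide which columns are populated, truncating values to a maximum length, and a configurable sampling threshold) can be sketched roughly as follows. The enum names, the field-to-level mapping and the method names are assumptions for illustration, taken loosely from the "maybe" list in the quoted message rather than from any agreed design.

```java
import java.util.EnumSet;
import java.util.Set;

final class QueryLogPolicySketch {
    enum Level { INFO, DEBUG, TRACE }

    // Hypothetical mapping of logged fields to the lowest level that records them.
    enum Field {
        TIMESTAMP(Level.INFO), SQL(Level.INFO), EXPLAIN(Level.INFO),
        BOUND_PARAMETERS(Level.DEBUG), SCAN_METRICS(Level.DEBUG),
        RPCS(Level.TRACE), TIMING(Level.TRACE), SNAPSHOT_METADATA(Level.TRACE);

        final Level minLevel;
        Field(Level minLevel) { this.minLevel = minLevel; }
    }

    // Fields populated when the log is configured at the given level.
    static Set<Field> fieldsFor(Level configured) {
        Set<Field> out = EnumSet.noneOf(Field.class);
        for (Field f : Field.values()) {
            if (f.minLevel.ordinal() <= configured.ordinal()) {
                out.add(f);
            }
        }
        return out;
    }

    // Bounds the size of a stored value by cutting it at maxLength characters.
    static String truncate(String value, int maxLength) {
        return value.length() <= maxLength ? value : value.substring(0, maxLength);
    }

    // sampleRate is in [0, 1]; roll is a uniform random draw in [0, 1).
    static boolean shouldLog(double sampleRate, double roll) {
        return roll < sampleRate;
    }
}
```

A caller would pass something like ThreadLocalRandom.current().nextDouble() as the roll, so that a sampleRate of 0.01 logs roughly 1% of queries.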

[jira] [Commented] (PHOENIX-4625) memory leak in PhoenixConnection if scanner renew lease thread is not enabled

2018-02-21 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372463#comment-16372463
 ] 

Samarth Jain commented on PHOENIX-4625:
---

+1

> memory leak in PhoenixConnection if scanner renew lease thread is not enabled
> -
>
> Key: PHOENIX-4625
> URL: https://issues.apache.org/jira/browse/PHOENIX-4625
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.14.0
>Reporter: Vikas Vishwakarma
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: GC_After_fix.png, GC_Leak.png, PHOENIX-4625.patch, QS.png
>
>
> We have two different code paths:
>  # In ConnectionQueryServicesImpl, RenewLeaseTasks are scheduled based on the 
> following checks: whether the renew lease feature is supported and whether the 
> renew lease config is enabled 
> supportsFeature(ConnectionQueryServices.Feature.RENEW_LEASE) && 
> renewLeaseEnabled
>  # In PhoenixConnection, every scan iterator is added to a queue for lease 
> renewal based on just the check of whether the renew lease feature is supported 
> services.supportsFeature(Feature.RENEW_LEASE)
> In PhoenixConnection, however, we miss the check of whether the renew lease 
> config is enabled (phoenix.scanner.lease.renew.enabled).
>  
> Now consider a situation where the renew lease feature is supported but 
> phoenix.scanner.lease.renew.enabled is set to false in hbase-site.xml. In 
> this case PhoenixConnection will keep adding the iterators for every scan 
> into the scannerQueue for renewal based on the feature supported check, but 
> the renewal task is not running because phoenix.scanner.lease.renew.enabled 
> is set to false, so the scannerQueue will keep growing as long as the 
> PhoenixConnection is alive and multiple scan requests are coming in on this 
> connection.
>  
> We have a use case that uses a single PhoenixConnection that is perpetual and 
> does billions of scans on this connection. In this case the scannerQueue 
> grows to several GB, ultimately leading to consecutive full GCs/OOM.
>  
> Add iterators for Lease renewal in PhoenixConnection
> =
> {code:java}
> public void addIteratorForLeaseRenewal(@Nonnull TableResultIterator itr) {
>     if (services.supportsFeature(Feature.RENEW_LEASE)) {
>         checkNotNull(itr);
>         scannerQueue.add(new WeakReference<TableResultIterator>(itr));
>     }
> }
> {code}
>  
> Starting the RenewLeaseTask
> =
> checks if Feature.RENEW_LEASE is supported and if 
> phoenix.scanner.lease.renew.enabled is true and starts the RenewLeaseTask
> {code:java}
> ConnectionQueryServicesImpl {
>
>     this.renewLeaseEnabled = config.getBoolean(RENEW_LEASE_ENABLED, DEFAULT_RENEW_LEASE_ENABLED);
>     .
>     @Override
>     public boolean isRenewingLeasesEnabled() {
>         return supportsFeature(ConnectionQueryServices.Feature.RENEW_LEASE) && renewLeaseEnabled;
>     }
>
>     private void scheduleRenewLeaseTasks() {
>         if (isRenewingLeasesEnabled()) {
>             renewLeaseExecutor =
>                 Executors.newScheduledThreadPool(renewLeasePoolSize, renewLeaseThreadFactory);
>             for (LinkedBlockingQueue<WeakReference> q : connectionQueues) {
>                 renewLeaseExecutor.scheduleAtFixedRate(new RenewLeaseTask(q), 0,
>                     renewLeaseTaskFrequency, TimeUnit.MILLISECONDS);
>             }
>         }
>     }
>     ...
> }
> {code}
>  
> To solve this, we must add both checks in PhoenixConnection (whether the 
> feature is supported and whether the config is enabled) before adding the 
> iterators to scannerQueue:
> ConnectionQueryServices.Feature.RENEW_LEASE is true  &&  
> phoenix.scanner.lease.renew.enabled is true 
> instead of just checking whether the feature 
> ConnectionQueryServices.Feature.RENEW_LEASE is supported.
>  
>  
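
The fix proposed in the description can be illustrated with a self-contained sketch in which the queue is only fed when both conditions hold. Everything here is a simplified stand-in: the class name is hypothetical, plain Object replaces TableResultIterator, and two booleans replace the feature check and the phoenix.scanner.lease.renew.enabled setting. It is not the attached patch.

```java
import java.lang.ref.WeakReference;
import java.util.Objects;
import java.util.concurrent.LinkedBlockingQueue;

final class RenewLeaseGuardSketch {
    private final boolean featureSupported;   // stand-in for supportsFeature(RENEW_LEASE)
    private final boolean renewLeaseEnabled;  // stand-in for the config setting
    private final LinkedBlockingQueue<WeakReference<Object>> scannerQueue =
            new LinkedBlockingQueue<>();

    RenewLeaseGuardSketch(boolean featureSupported, boolean renewLeaseEnabled) {
        this.featureSupported = featureSupported;
        this.renewLeaseEnabled = renewLeaseEnabled;
    }

    // Same combined check that decides whether the renewal task is scheduled.
    boolean isRenewingLeasesEnabled() {
        return featureSupported && renewLeaseEnabled;
    }

    // Only queue iterators when something will actually drain the queue.
    void addIteratorForLeaseRenewal(Object itr) {
        if (isRenewingLeasesEnabled()) {
            Objects.requireNonNull(itr);
            scannerQueue.add(new WeakReference<>(itr));
        }
    }

    int queuedCount() {
        return scannerQueue.size();
    }
}
```

With the feature supported but the config disabled, the queue stays empty no matter how many scans run, which is the behaviour the description asks for.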





Re: [VOTE] Release of Apache Phoenix 4.13.2 for CDH 5.11.2 RC0

2018-01-16 Thread Samarth Jain
+1. Ran unit tests successfully. Executed some manual tests around
secondary indexes and stats collection - looks fine.

On Sat, Jan 13, 2018 at 3:30 AM, Pedro Boado  wrote:

> Hello Everyone,
>
> This is a call for a vote on Apache Phoenix 4.13.2 for CDH 5.11.2 RC0. This
> is the first release of Phoenix 4.13.x compatible with Cloudera CDH. The
> release includes a source-only release, a convenience binary release and, as
> a novelty, a parcel-based binary release ready to be installed from Cloudera
> Manager (CM).
>
> This release has feature parity with supported HBase versions and includes
> the following improvements:
> - Support for Apache Phoenix on CDH 5.11.2 (based on the HBase 1.2 branch).
> - More than 10 fixes over release 4.13.1-HBase-1.2
>
> The work is inspired by the approach taken (and now discontinued) by
> https://github.com/cloudera-labs/phoenix a while ago. Please take this
> first RC for a spin!
>
> The source tarball, including signatures, digests, etc can be found at:
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.2-cdh5.11.2-rc0/src/
>
> The binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.2-cdh5.11.2-rc0/bin/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.2-cdh5.11.2-rc0/parcels/
> (this directory can be configured in CM as a parcel repository for direct
> installation)
>
> For a complete list of changes, see:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12342253&styleName=Text&projectId=12315120
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/mujtaba.asc
> https://dist.apache.org/repos/dist/release/phoenix/KEYS
>
> The hash and tag to be voted upon:
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=60b76d2dc0a039777cc380cf5a8a927a02afff6d
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;h=refs/tags/v4.13.2-cdh5.11.2-rc0
>
> Vote will be open for at least 72 hours. Please vote:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> The Apache Phoenix Team
>


[jira] [Updated] (PHOENIX-4397) Incorrect query results when with stats are disabled on a salted table

2017-12-08 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4397:
--
Attachment: PHOENIX-4397_v2.patch

Patch that fixes the issue along with tests. [~jamestaylor], please review. 
[~mujtabachohan] - let me know if it passes your more exhaustive tests too.

> Incorrect query results when with stats are disabled on a salted table
> --
>
> Key: PHOENIX-4397
> URL: https://issues.apache.org/jira/browse/PHOENIX-4397
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Fix For: 4.13.1
>
> Attachments: PHOENIX-4397.patch, PHOENIX-4397_v2.patch
>
>
> See attached unit test.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4371) Document explain plan and how we expose estimate information in it

2017-11-30 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16273863#comment-16273863
 ] 

Samarth Jain commented on PHOENIX-4371:
---

Committed the patch after addressing the review comments. 

http://phoenix.apache.org/explainplan.html
http://phoenix.apache.org/tuning_guide.html

> Document explain plan and how we expose estimate information in it
> --
>
> Key: PHOENIX-4371
> URL: https://issues.apache.org/jira/browse/PHOENIX-4371
> Project: Phoenix
>  Issue Type: Task
>    Reporter: Samarth Jain
>    Assignee: Samarth Jain
> Attachments: explainplan.md
>
>






[jira] [Resolved] (PHOENIX-4371) Document explain plan and how we expose estimate information in it

2017-11-30 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain resolved PHOENIX-4371.
---
Resolution: Fixed

> Document explain plan and how we expose estimate information in it
> --
>
> Key: PHOENIX-4371
> URL: https://issues.apache.org/jira/browse/PHOENIX-4371
> Project: Phoenix
>  Issue Type: Task
>    Reporter: Samarth Jain
>    Assignee: Samarth Jain
> Attachments: explainplan.md
>
>






[jira] [Commented] (PHOENIX-4395) Illegal data. Expected length of at least 49 bytes, but had 4 (state=22000,code=201)

2017-11-20 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259575#comment-16259575
 ] 

Samarth Jain commented on PHOENIX-4395:
---

[~gjacoby] - this error doesn't have to do with column encoding. Although I can 
see why the error message made you think it was ;). 

[~rajat.thakur] - how was the data added to Phoenix/HBase? The schema of your 
Phoenix table, along with sample upsert statements, would help immensely, too.

> Illegal data. Expected length of at least 49 bytes, but had 4 
> (state=22000,code=201)
> 
>
> Key: PHOENIX-4395
> URL: https://issues.apache.org/jira/browse/PHOENIX-4395
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0, 4.12.0
>Reporter: Rajat Thakur
>
> I am importing Oracle ExaData into HBase via Sqoop and querying it via Phoenix.
> There is a problem with the following column attributes (when querying via 
> Phoenix) whose data type is DATE, TIMESTAMP, or BIGINT:
> Error: ERROR 201 (22000): Illegal data. Expected length of at least 49 bytes, 
> but had 4 (state=22000,code=201)
> java.sql.SQLException: ERROR 201 (22000): Illegal data. Expected length of at 
> least 49 bytes, but had 4
>   at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:489)
>   at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
>   at org.apache.phoenix.schema.KeyValueSchema.next(KeyValueSchema.java:211)
>   at org.apache.phoenix.expression.ProjectedColumnExpression.evaluate(ProjectedColumnExpression.java:116)
>   at org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:69)
>   at org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:609)
>   at sqlline.Rows$Row.<init>(Rows.java:183)
>   at sqlline.BufferedRows.<init>(BufferedRows.java:38)
>   at sqlline.SqlLine.print(SqlLine.java:1660)
>   at sqlline.Commands.execute(Commands.java:833)
>   at sqlline.Commands.sql(Commands.java:732)
>   at sqlline.SqlLine.dispatch(SqlLine.java:813)
>   at sqlline.SqlLine.begin(SqlLine.java:686)
>   at sqlline.SqlLine.start(SqlLine.java:398)
>   at sqlline.SqlLine.main(SqlLine.java:291)





[jira] [Commented] (PHOENIX-4319) Zookeeper connection should be closed immediately

2017-11-18 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258245#comment-16258245
 ] 

Samarth Jain commented on PHOENIX-4319:
---

Can you try with 4.13?

> Zookeeper connection should be closed immediately
> -
>
> Key: PHOENIX-4319
> URL: https://issues.apache.org/jira/browse/PHOENIX-4319
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
> Environment: phoenix4.10 hbase1.2.0
>Reporter: Jepson
>  Labels: patch
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> *Code:*
> {code:java}
> val zkUrl = "192.168.100.40,192.168.100.41,192.168.100.42:2181:/hbase"
> val configuration = new Configuration()
> configuration.set("hbase.zookeeper.quorum", zkUrl)
> val spark = SparkSession
>   .builder()
>   .appName("SparkPhoenixTest1")
>   .master("local[2]")
>   .getOrCreate()
> for (a <- 1 to 100) {
>   val wms_doDF = spark.sqlContext.phoenixTableAsDataFrame(
>     "DW.wms_do",
>     Array("WAREHOUSE_NO", "DO_NO"),
>     predicate = Some(
>       """
>         |MOD_TIME >= TO_DATE('begin_day', 'yyyy-MM-dd')
>         |and MOD_TIME < TO_DATE('end_day', 'yyyy-MM-dd')
>       """.stripMargin.replaceAll("begin_day", "2017-10-01").replaceAll("end_day", "2017-10-25")),
>     conf = configuration
>   )
>   wms_doDF.show(100)
> }
> {code}
> *Description:*
> The connection to ZooKeeper is not getting closed, which causes the maximum 
> number of client connections from a host to be reached (we have 
> maxClientCnxns set to 500 in the ZooKeeper config).
> *Zookeeper connections:*
> [https://github.com/Hackeruncle/Images/blob/master/zookeeper%20connections.png]
> *Reference:*
> [https://community.hortonworks.com/questions/116832/hbase-zookeeper-connections-not-getting-closed.html]





[jira] [Commented] (PHOENIX-4360) Prevent System.Catalog from splitting

2017-11-17 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257684#comment-16257684
 ] 

Samarth Jain commented on PHOENIX-4360:
---

[~lhofhansl] - would be good to also have a test for this that basically 
validates admin.split('SYSTEM.CATALOG') was a no-op.

> Prevent System.Catalog from splitting
> -
>
> Key: PHOENIX-4360
> URL: https://issues.apache.org/jira/browse/PHOENIX-4360
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Blocker
> Fix For: 4.14.0
>
> Attachments: 4360.txt
>
>
> Just talked to [~jamestaylor].
> It turns out that currently System.Catalog is not prevented from splitting 
> generally, but does not allow splitting within a schema.
> In the multi-tenant case that is not good enough. When System.Catalog splits 
> and a base table and view end up in different regions the following can 
> happen:
> * DROP CASCADE no longer works for those views
> * Adding/removing columns to/from the base table no longer works
> Until PHOENIX-3534 is done we should simply prevent System.Catalog from 
> splitting. (just like HBase:meta)
> [~apurtell]





[jira] [Commented] (PHOENIX-4369) ArrayIndexOutOfBounds when upserting to table using ROW_TIMESTAMP

2017-11-17 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256713#comment-16256713
 ] 

Samarth Jain commented on PHOENIX-4369:
---

Looked at it briefly. Looks like an issue when the ROW_TIMESTAMP column's data 
type is TIMESTAMP. [~arfield], as a workaround, you can declare 
PK_ROW_TIMESTAMP as DATE.

> ArrayIndexOutOfBounds when upserting to table using ROW_TIMESTAMP
> -
>
> Key: PHOENIX-4369
> URL: https://issues.apache.org/jira/browse/PHOENIX-4369
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.10.0
> Environment: Ubuntu 16.04, JRE 1.8.0_102_x64
>Reporter: Alex Field
>  Labels: SFDC
>
> Used this DDL to create a table which uses ROW_TIMESTAMP
> CREATE TABLE IF NOT EXISTS FOO (
>   TENANT_ID CHAR(15) NOT NULL,
>   PK_BAR VARCHAR(80) NOT NULL, -- NAME
>   PK_ROW_TIMESTAMP TIMESTAMP NOT NULL,
>   FIZZ TIMESTAMP,
>   BUZZ VARCHAR(255), -- LABEL
>   BAZZ CHAR(15), -- VERSION_ID
>   QUX INTEGER,
>   HODOR VARCHAR(10) -- A json blob.
>   CONSTRAINT PK PRIMARY KEY (TENANT_ID, PK_BAR, PK_ROW_TIMESTAMP ROW_TIMESTAMP)
> ) VERSIONS=3,MULTI_TENANT=true,REPLICATION_SCOPE=1
> Upsert causes this exception:
> java.lang.ArrayIndexOutOfBoundsException: 8
>   at org.apache.phoenix.execute.MutationState.getNewRowKeyWithRowTimestamp(MutationState.java:554)
>   at org.apache.phoenix.execute.MutationState.generateMutations(MutationState.java:640)
>   at org.apache.phoenix.execute.MutationState.addRowMutations(MutationState.java:572)
>   at org.apache.phoenix.execute.MutationState.send(MutationState.java:1003)
>   at org.apache.phoenix.execute.MutationState.send(MutationState.java:1469)
>   at org.apache.phoenix.execute.MutationState.commit(MutationState.java:1301)
>   at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:533)
>   at org.apache.phoenix.jdbc.PhoenixConnection$3.call(PhoenixConnection.java:530)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at org.apache.phoenix.jdbc.PhoenixConnection.commit(PhoenixConnection.java:530)
> Here's a copy of the test driver:
> @Test
> public void testSomething() throws Exception {
>     String sql = "UPSERT INTO FOO (BUZZ, BAZZ, PK_BAR) VALUES (?, ?, ?)";
>     try (PreparedStatement stmt = conn.prepareStatement(sql)) {
>         stmt.setString(1, "blah blah");
>         stmt.setString(2, null);
>         stmt.setString(3, "blah");
>         stmt.execute();
>         conn.commit();
>     }
> }





[jira] [Updated] (PHOENIX-4371) Document explain plan and how we expose estimate information in it

2017-11-10 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4371:
--
Attachment: explainplan.md

[~jamestaylor], please review. I also removed the explain plan section from the 
tuning guide, copied its content and added it to the new explain plan page.

> Document explain plan and how we expose estimate information in it
> --
>
> Key: PHOENIX-4371
> URL: https://issues.apache.org/jira/browse/PHOENIX-4371
> Project: Phoenix
>  Issue Type: Task
>    Reporter: Samarth Jain
>    Assignee: Samarth Jain
> Attachments: explainplan.md
>
>






[jira] [Created] (PHOENIX-4371) Document explain plan and how we expose estimate information in it

2017-11-10 Thread Samarth Jain (JIRA)
Samarth Jain created PHOENIX-4371:
-

 Summary: Document explain plan and how we expose estimate 
information in it
 Key: PHOENIX-4371
 URL: https://issues.apache.org/jira/browse/PHOENIX-4371
 Project: Phoenix
  Issue Type: Task
Reporter: Samarth Jain
Assignee: Samarth Jain








Re: [VOTE] Release of Apache Phoenix 4.13.0 RC1

2017-11-09 Thread Samarth Jain
+1

Successfully ran all unit tests. Verified recent changes and bug fixes
around stats collection and reporting.

On Thu, Nov 9, 2017 at 5:02 PM, James Taylor  wrote:

> +1. Verified bug fixes around delete.
>
> On Thu, Nov 9, 2017 at 10:21 AM, Mujtaba Chohan 
> wrote:
>
> > +1.
> >
> > Verified performance and backward compat. with 4.10/11/12.
> >
> > On Tue, Nov 7, 2017 at 2:50 PM, Andrew Purtell 
> > wrote:
> >
> > > +1
> > >
> > > Checked sums and signatures: ok
> > > RAT check passes: ok (8u131) [1]
> > > Built from source: ok (8u131) [1]
> > > Unit tests pass: ok (8u131) [2]
> > >
> > >
> > > > 1. There are some Maven warnings that should be fixed, but are not release
> > > > blockers. "Reporting configuration should be done in <reporting> section,
> > > > not in maven-site-plugin <configuration> as reportPlugins parameter." Maven
> > > > 3.5.0.
> > > >
> > > > 2. PartialIndexRebuilderIT.testConcurrentUpsertsWithRebuild ran out of time
> > > > when executed with other tests, but passed when run standalone.
> > >
> > >
> > > On Mon, Nov 6, 2017 at 3:47 PM, James Taylor 
> > > wrote:
> > >
> > > > Hello Everyone,
> > > >
> > > > > This is a call for a vote on Apache Phoenix 4.13.0 RC1. This is the next
> > > > > minor release of Phoenix 4, compatible with Apache HBase 0.98 and 1.3. The
> > > > > release includes both a source-only release and a convenience binary
> > > > > release for each supported HBase version. The previous RC was sunk due to
> > > > > PHOENIX-4351 which is now fixed.
> > > > >
> > > > > This release has feature parity with supported HBase versions and includes
> > > > > the following improvements:
> > > > > - Critical bug fix to prevent snapshot creation of SYSTEM.CATALOG when
> > > > > connecting [1]
> > > > > - Numerous bug fixes around handling of row deletion [2][3][4][5]
> > > > > - Improvements to statistics collection [6][7][8][9]
> > > > > - New COLLATION_KEY built-in function for linguistic sort [10]
> > > > >
> > > > > The source tarball, including signatures, digests, etc can be found at:
> > > > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.0-HBase-0.98-rc1/src/
> > > > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.0-HBase-1.3-rc1/src/
> > > > >
> > > > > The binary artifacts can be found at:
> > > > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.0-HBase-0.98-rc1/bin/
> > > > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-4.13.0-HBase-1.3-rc1/bin/
> > > > >
> > > > > For a complete list of changes, see:
> > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315120=12341710
> > > > >
> > > > > Release artifacts are signed with the following key:
> > > > > https://people.apache.org/keys/committer/mujtaba.asc
> > > > > https://dist.apache.org/repos/dist/release/phoenix/KEYS
> > > > >
> > > > > The hash and tag to be voted upon:
> > > > > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=8b7e12414400c997d5993fb55586bfcc2f56d217
> > > > > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;h=refs/tags/v4.13.0-HBase-0.98-rc1
> > > > > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=4a1f0df6143ba705a48b5051aee52dab158afe8d
> > > > > https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;h=refs/tags/v4.13.0-HBase-1.3-rc1
> > > >
> > > > Vote will be open for at least 72 hours. Please vote:
> > > >
> > > > [ ] +1 approve
> > > > [ ] +0 no opinion
> > > > [ ] -1 disapprove (and reason why)
> > > >
> > > > Thanks,
> > > > The Apache Phoenix Team
> > > >
> > > > [1] https://issues.apache.org/jira/browse/PHOENIX-4335
> > > > [2] https://issues.apache.org/jira/browse/PHOENIX-4280
> > > > [3] https://issues.apache.org/jira/browse/PHOENIX-4290
> > > > [4] https://issues.apache.org/jira/browse/PHOENIX-4348
> > > > [5] https://issues.apache.org/jira/browse/PHOENIX-4277
> > > > [6] https://issues.apache.org/jira/browse/PHOENIX-3368
> > > > [7] https://issues.apache.org/jira/browse/PHOENIX-4287
> > > > [8] https://issues.apache.org/jira/browse/PHOENIX-4289
> > > > [9] https://issues.apache.org/jira/browse/PHOENIX-4343
> > > > [10] https://issues.apache.org/jira/browse/PHOENIX-4237
> > > >
> > >
> >
>


[jira] [Commented] (PHOENIX-4358) Case Sensitive String match on SqlType in PDataType

2017-11-09 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246418#comment-16246418
 ] 

Samarth Jain commented on PHOENIX-4358:
---

Patch looks fine to me, [~dangulo]. Please add a test in PDataTypeTest to make 
sure a future change doesn't end up causing a regression.

> Case Sensitive String match on SqlType in PDataType
> ---
>
> Key: PHOENIX-4358
> URL: https://issues.apache.org/jira/browse/PHOENIX-4358
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.11.0
> Environment: OSX and Linux
>Reporter: Dave Angulo
>Priority: Minor
> Attachments: caseFix.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The fromSqlTypeName() method uses a case-sensitive match on the input SqlType. 
> This causes an issue in Spark's JDBCUtils.makeSetter(), which lowercases its 
> input. The result is the error _Unsupported sql type: varchar_.
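
One way to picture the kind of fix discussed here is a lookup that normalizes case before matching. The sketch below is a hypothetical, self-contained illustration: `SqlTypeLookup` and `SimpleSqlType` are invented names, not Phoenix's `PDataType` API, and the attached patch may well take a different approach.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not Phoenix's PDataType implementation.
final class SqlTypeLookup {
    enum SimpleSqlType { VARCHAR, INTEGER, BIGINT }

    private static final Map<String, SimpleSqlType> BY_NAME = new HashMap<>();

    static {
        // Register every type under its upper-case constant name.
        for (SimpleSqlType type : SimpleSqlType.values()) {
            BY_NAME.put(type.name(), type);
        }
    }

    // Looks up a type by name, ignoring case, so "varchar" and "VARCHAR" both match.
    static SimpleSqlType fromSqlTypeName(String sqlTypeName) {
        if (sqlTypeName == null) {
            throw new IllegalArgumentException("Unsupported sql type: null");
        }
        SimpleSqlType type = BY_NAME.get(sqlTypeName.toUpperCase(Locale.ROOT));
        if (type == null) {
            throw new IllegalArgumentException("Unsupported sql type: " + sqlTypeName);
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(fromSqlTypeName("varchar"));
        System.out.println(fromSqlTypeName("VARCHAR"));
    }
}
```

Normalizing with `Locale.ROOT` keeps the lookup independent of the JVM's default locale, which matters for type names containing the letter "i" under some locales.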





Re: [VOTE] Release of Apache Phoenix 4.13.0 RC1

2017-11-07 Thread Samarth Jain
+1

On Tue, Nov 7, 2017 at 12:06 PM, Andrew Purtell  wrote:

> My bad. Ignore my -1. Somehow the bin downloads failed. Let me try again.
>
>
> On Tue, Nov 7, 2017 at 12:05 PM, Andrew Purtell 
> wrote:
>
> > -1
> >
> > Signature verification fails
> >
> > $ gpg --verify apache-phoenix-4.13.0-HBase-0.98-src.tar.gz.asc
> > apache-phoenix-4.13.0-HBase-0.98-src.tar.gz
> > gpg: Signature made Mon Nov  6 14:54:25 2017 PST
> > gpg:using RSA key 3BFCB3929461178E
> > gpg: Good signature from "Mujtaba Chohan (CODE SIGNING KEY) <
> > mujt...@apache.org>" [unknown]
> >
> > $ gpg --verify apache-phoenix-4.13.0-HBase-0.98-bin.tar.gz.asc
> > apache-phoenix-4.13.0-HBase-0.98-bin.tar.gz
> > gpg: Signature made Mon Nov  6 14:54:05 2017 PST
> > gpg:using RSA key 3BFCB3929461178E
> > *gpg: BAD signature from "Mujtaba Chohan (CODE SIGNING KEY)
> > >" [unknown]*
> >
> > $ gpg --verify apache-phoenix-4.13.0-HBase-1.3-src.tar.gz.asc
> > apache-phoenix-4.13.0-HBase-1.3-src.tar.gz
> > gpg: Signature made Mon Nov  6 14:54:34 2017 PST
> > gpg:using RSA key 3BFCB3929461178E
> > gpg: Good signature from "Mujtaba Chohan (CODE SIGNING KEY) <
> > mujt...@apache.org>" [unknown]
> >
> > $ gpg --verify apache-phoenix-4.13.0-HBase-1.3-bin.tar.gz.asc
> > apache-phoenix-4.13.0-HBase-1.3-bin.tar.gz
> > gpg: Signature made Mon Nov  6 14:54:06 2017 PST
> > gpg:using RSA key 3BFCB3929461178E
> > *gpg: BAD signature from "Mujtaba Chohan (CODE SIGNING KEY)
> > >" [unknown]*
> >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, 

[jira] [Commented] (PHOENIX-4348) Point deletes do not work when there are immutable indexes with only row key columns

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237067#comment-16237067
 ] 

Samarth Jain commented on PHOENIX-4348:
---

+1

> Point deletes do not work when there are immutable indexes with only row key 
> columns
> 
>
> Key: PHOENIX-4348
> URL: https://issues.apache.org/jira/browse/PHOENIX-4348
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.13.0
>
> Attachments: PHOENIX-4348.patch
>
>






[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236904#comment-16236904
 ] 

Samarth Jain commented on PHOENIX-4287:
---

Thanks. I added the comment in my commit.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_addendum5.patch, 
> PHOENIX-4287_addendum6.patch, PHOENIX-4287_addendum7.patch, 
> PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, 
> PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
> {noformat}
> explain select count(*) from TABLE_T;
> +------+----------------+---------------+----------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO |
> +------+----------------+---------------+----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899 | 332170 | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY | 625043899 | 332170 | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW | 625043899 | 332170 | 150792825 |
> +------+----------------+---------------+----------+
> select count(*) from TABLE_T;
> +----------+
> | COUNT(1) |
> +----------+
> | 0        |
> +----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------+----------------+---------------+-------------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +------+----------------+---------------+-------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470 | 332151 | 1507928257617 |
> | SERVER FILTER BY FIRST KEY ONLY | 438492470 | 332151 | 1507928257617 |
> | SERVER AGGREGATE INTO SINGLE ROW | 438492470 | 332151 | 1507928257617 |
> +------+----------------+---------------+-------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------+
> | COUNT(1) |
> +----------+
> | 14       |
> +----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------+----------------+---------------+-------------+
> | PLAN | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +------+----------------+---------------+-------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null | null | null |
> | SERVER FILTER BY FIRST KEY ONLY | null | null | null |
> | SERVER AGGREGATE INTO SINGLE ROW | null | null | null |
> +------+----------------+---------------+-------------+

[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-02 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_addendum7.patch

Looks like an NPE happens when dropping local indexes. Addressing it in this 
patch.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_addendum5.patch, 
> PHOENIX-4287_addendum6.patch, PHOENIX-4287_addendum7.patch, 
> PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, 
> PHOENIX-4287_v4.patch
>
>

[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-02 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_addendum6.patch

Updated patch with additional test on view and view index.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_addendum5.patch, 
> PHOENIX-4287_addendum6.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, 
> PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>

[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-02 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_addendum5.patch

Thanks for the code snippet, [~jamestaylor]. Attached is the addendum along 
with a test.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_addendum5.patch, 
> PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, 
> PHOENIX-4287_v4.patch
>
>

[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236580#comment-16236580
 ] 

Samarth Jain commented on PHOENIX-4287:
---

Yes, that's correct. Will change the patch to fetch the property from the base 
table.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, 
> PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>

[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236544#comment-16236544
 ] 

Samarth Jain commented on PHOENIX-4287:
---

USE_STATS_FOR_PARALLELIZATION can be set at the index, view, or base table level. 
For an index to use stats for parallelization, you need to set 
USE_STATS_FOR_PARALLELIZATION = true on the index; otherwise the default value 
is used (which in your case is false).
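
The fallback described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions: `StatsParallelizationSetting` and `resolve` are hypothetical names, and the nullable `Boolean` merely models "property not set on this index, view, or table"; it is not how Phoenix stores or reads the property.

```java
// Hypothetical sketch; names are illustrative and not Phoenix's API.
final class StatsParallelizationSetting {
    // tableLevelValue is null when USE_STATS_FOR_PARALLELIZATION was never set
    // on the index, view, or table being queried.
    static boolean resolve(Boolean tableLevelValue, boolean defaultValue) {
        return tableLevelValue != null ? tableLevelValue : defaultValue;
    }

    public static void main(String[] args) {
        // Assumed default, e.g. phoenix.use.stats.parallelization=false.
        boolean defaultValue = false;
        // Index left unset: falls back to the default.
        System.out.println(resolve(null, defaultValue));
        // Index explicitly set to true: the explicit value wins.
        System.out.println(resolve(Boolean.TRUE, defaultValue));
    }
}
```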

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, 
> PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>

[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236538#comment-16236538
 ] 

Samarth Jain commented on PHOENIX-4287:
---

Just got back. Taking a look.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, 
> PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>

[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235303#comment-16235303
 ] 

Samarth Jain commented on PHOENIX-4333:
---

Is it a safe assumption to make that if intersectScan is returning a non-null 
value, then we have an intersection? 

{code}
Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes, currentGuidePostBytes, keyOffset, false);
if (newScan != null) {
    // guide post was available in the 
}
{code}
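
To make the question concrete, here is a minimal sketch of the "null means no intersection" convention, using half-open string key ranges. It is only an illustration: `KeyRangeSketch` is a hypothetical class, and whether Phoenix's `intersectScan` follows exactly this contract is the open question above, not something this sketch confirms.

```java
// Hypothetical sketch of half-open key ranges; not Phoenix's ScanRanges or HBase's Scan.
final class KeyRangeSketch {
    final String start; // inclusive
    final String stop;  // exclusive

    KeyRangeSketch(String start, String stop) {
        this.start = start;
        this.stop = stop;
    }

    // Returns the overlap of two ranges, or null when they do not overlap.
    static KeyRangeSketch intersect(KeyRangeSketch a, KeyRangeSketch b) {
        String start = a.start.compareTo(b.start) >= 0 ? a.start : b.start;
        String stop = a.stop.compareTo(b.stop) <= 0 ? a.stop : b.stop;
        return start.compareTo(stop) < 0 ? new KeyRangeSketch(start, stop) : null;
    }

    public static void main(String[] args) {
        KeyRangeSketch scan = new KeyRangeSketch("B", "D");
        KeyRangeSketch guidePostChunk = new KeyRangeSketch("A", "C");
        KeyRangeSketch overlap = intersect(scan, guidePostChunk);
        System.out.println(overlap == null ? "no intersection" : overlap.start + ".." + overlap.stop);
    }
}
```

Under this convention a non-null result always implies a non-empty overlap, because adjacent ranges such as [A, B) and [B, C) yield null.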

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B 
> view yields partial results (containing stats only for B,1), which are 
> incorrect even though the updated timestamp is shown as current.





[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235294#comment-16235294
 ] 

Samarth Jain commented on PHOENIX-4333:
---

Good point, [~jamestaylor]. I don't think my check would work in the below case:

REGION 1 - VIEW1 and VIEW2
REGION 2 - VIEW2 and VIEW3

If we collect stats for VIEW1 and VIEW3, then even though both regions have 
stats, they don't have stats for VIEW2. I think I would also need to check 
whether any guidepost intersected for the region.
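
To make the extra check concrete, here is a minimal standalone sketch. It is not the Phoenix code: the class, method, and the use of plain strings as row keys are all made up for illustration. It assumes the scan has already been clipped to each region, and it only asks whether at least one guidepost falls inside every clipped slice.

```java
import java.util.List;
import java.util.TreeSet;

public class GuidePostCoverage {
    /** Half-open key range [start, end); strings stand in for row key bytes (illustrative only). */
    public static final class KeyRange {
        final String start;
        final String end;

        public KeyRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns true only if every region slice of the scan contains at least one guide post.
     * Hypothetical helper, not a Phoenix API.
     */
    public static boolean guidePostsInEveryRegion(List<KeyRange> regionSlices,
                                                  TreeSet<String> guidePosts) {
        for (KeyRange slice : regionSlices) {
            // ceiling() returns the smallest guide post >= slice.start, or null if there is none.
            String gp = guidePosts.ceiling(slice.start);
            if (gp == null || gp.compareTo(slice.end) >= 0) {
                // No guide post inside this slice, so the estimate would be partial.
                return false;
            }
        }
        return true;
    }
}
```

With guideposts collected only for VIEW1 and VIEW3, both slices of a VIEW2 scan come back empty and the method returns false, which is the case the plain "region has stats" check misses.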

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4333:
--
Attachment: PHOENIX-4333_v2.patch

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4333:
--
Attachment: PHOENIX-4333_v2.patch

Updated patch that sets estimate timestamp to null when we don't have 
guideposts available for all regions.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4333:
--
Attachment: (was: PHOENIX-4333_v2.patch)

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235262#comment-16235262
 ] 

Samarth Jain commented on PHOENIX-4333:
---

Actually, the check needs to be done inside this catch block:

{code}
catch (EOFException e) {
// We have read all guide posts

}
{code}

And if we are doing it there, I think the check I had makes it easier to 
understand what's going on, IMHO.

{code}
+if (regionIndex < stopIndex) {
+    /*
+     * We don't have guide posts available for all regions. So in this case we
+     * conservatively say that we cannot provide estimates
+     */
+    gpsAvailableForAllRegions = false;
+}
 }
{code}



> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235253#comment-16235253
 ] 

Samarth Jain commented on PHOENIX-4333:
---

Ah, I see. Yes, that's true. Let me update the patch.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Updated] (PHOENIX-4332) Indexes should inherit guide post width of the base data table

2017-11-02 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4332:
--
Summary: Indexes should inherit guide post width of the base data table  
(was: Stats - Allow setting guide post width on global indexes)

> Indexes should inherit guide post width of the base data table
> --
>
> Key: PHOENIX-4332
> URL: https://issues.apache.org/jira/browse/PHOENIX-4332
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4332.patch
>
>
> Altering the guidepost width on the data table using the {{ALTER TABLE}} command 
> does not propagate to the global index.
> Altering the global index table runs into a "not allowed" error.
> {noformat}
> ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1;
> Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop 
> column referenced by VIEW columnName=IDX (state=42M01,code=1010)
> {noformat}





[jira] [Updated] (PHOENIX-4343) In CREATE TABLE allow setting guide post width only on base data tables

2017-11-02 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4343:
--
Summary: In CREATE TABLE allow setting guide post width only on base data 
tables  (was: In CREATE TABLE only allow setting guide post width on tables and 
global indexes)

> In CREATE TABLE allow setting guide post width only on base data tables
> ---
>
> Key: PHOENIX-4343
> URL: https://issues.apache.org/jira/browse/PHOENIX-4343
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Samarth Jain
>    Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4343.patch, PHOENIX-4343_v2.patch, 
> PHOENIX-4343_v3.patch
>
>






[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235245#comment-16235245
 ] 

Samarth Jain commented on PHOENIX-4333:
---

It might be the late night and lack of coffee, but I am not sure I see the 
correlation here.
{code}
gpsAvailableForAllRegions &= initialKeyBytes != currentKeyBytes;
{code}

We reset currentKeyBytes to initialKeyBytes when we know we are not using stats 
for parallelisation.
{code}
if (!useStatsForParallelization) {
    /*
     * If we are not using stats for generating parallel scans, we need to reset the
     * currentKey back to what it was at the beginning of the loop.
     */
    currentKeyBytes = initialKeyBytes;
}
{code}

bq. I also think we should set the estimatedRows and estimatedSize to what 
we've found, but only set estimateInfoTimestamp to null if 
!gpsAvailableForAllRegions. That way callers can choose to use or not use the 
partial estimates based on estimateInfoTimestamp.

Makes sense.
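
A minimal sketch of how that contract could look, with hypothetical names (the `ScanEstimate` class and its `of` factory are illustrative, not the actual patch): rows and bytes are always reported, and only the timestamp is nulled out when guide posts were not available for all regions.

```java
public class ScanEstimate {
    public final Long estimatedRows;
    public final Long estimatedBytes;
    // null signals that the row/byte numbers are partial estimates
    public final Long estimateInfoTimestamp;

    private ScanEstimate(Long estimatedRows, Long estimatedBytes, Long estimateInfoTimestamp) {
        this.estimatedRows = estimatedRows;
        this.estimatedBytes = estimatedBytes;
        this.estimateInfoTimestamp = estimateInfoTimestamp;
    }

    /** Keeps whatever rows/bytes were found; drops the timestamp if coverage was incomplete. */
    public static ScanEstimate of(long rows, long bytes, long statsTimestamp,
                                  boolean gpsAvailableForAllRegions) {
        Long ts = gpsAvailableForAllRegions ? Long.valueOf(statsTimestamp) : null;
        return new ScanEstimate(rows, bytes, ts);
    }

    /** Callers can use this to decide whether to trust the estimate. */
    public boolean isComplete() {
        return estimateInfoTimestamp != null;
    }
}
```

A caller that cannot tolerate partial numbers would check `isComplete()` first; one that only needs a rough lower bound can still read the rows and bytes.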


> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Updated] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes

2017-11-02 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4343:
--
Attachment: PHOENIX-4343_v3.patch

Thanks for the review, [~jamestaylor]. Attached is the updated patch.

> In CREATE TABLE only allow setting guide post width on tables and global 
> indexes
> 
>
> Key: PHOENIX-4343
> URL: https://issues.apache.org/jira/browse/PHOENIX-4343
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Samarth Jain
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4343.patch, PHOENIX-4343_v2.patch, 
> PHOENIX-4343_v3.patch
>
>






[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4333:
--
Attachment: PHOENIX-4333_v1.patch

With this patch, if stats information is not available for all the regions, 
we report the estimates as null. The updated test covers this scenario.

[~jamestaylor], please review.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Updated] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4343:
--
Attachment: PHOENIX-4343_v2.patch

Updated patch. [~jamestaylor], please review.

> In CREATE TABLE only allow setting guide post width on tables and global 
> indexes
> 
>
> Key: PHOENIX-4343
> URL: https://issues.apache.org/jira/browse/PHOENIX-4343
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Samarth Jain
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4343.patch, PHOENIX-4343_v2.patch
>
>






[jira] [Commented] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes

2017-11-01 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235158#comment-16235158
 ] 

Samarth Jain commented on PHOENIX-4343:
---

With PHOENIX-4332, indexes now inherit the guide post width of the data table. 
The right approach would be to disallow setting guide post width on everything 
except the data table. Will update the patch.
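
Roughly, the validation could look like the sketch below. The `GuidePostWidthValidator` class and its `TableType` enum are hypothetical stand-ins for illustration, not Phoenix's own types or the contents of the patch.

```java
public class GuidePostWidthValidator {
    /** Illustrative table kinds; not the Phoenix enum. */
    public enum TableType { TABLE, INDEX, VIEW }

    /**
     * Rejects an explicit guide post width on anything other than a base data table.
     * A null width means the property was not specified, which is always allowed.
     */
    public static void validate(TableType type, Long guidePostWidth) {
        if (guidePostWidth != null && type != TableType.TABLE) {
            throw new IllegalArgumentException(
                "Guide post width can only be set on the base data table, not on " + type);
        }
    }
}
```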

> In CREATE TABLE only allow setting guide post width on tables and global 
> indexes
> 
>
> Key: PHOENIX-4343
> URL: https://issues.apache.org/jira/browse/PHOENIX-4343
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Samarth Jain
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4343.patch
>
>






[jira] [Commented] (PHOENIX-4332) Stats - Allow setting guide post width on global indexes

2017-11-01 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235145#comment-16235145
 ] 

Samarth Jain commented on PHOENIX-4332:
---

Instead of supporting ALTER TABLE or ALTER INDEX to set guide_posts_width, 
indexes now inherit the guide post width of the data table. This applies to 
global, local, and view indexes.
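
As a rough illustration of the inheritance rule (hypothetical `TableInfo` and `effectiveGuidePostWidth` names, not the patch itself): an index always resolves its width through its data table, and a data table uses its own setting.

```java
public class GuidePostWidthResolver {
    /** Minimal stand-in for table metadata; illustrative only. */
    public static final class TableInfo {
        final String name;
        final Long guidePostWidth; // null when not explicitly set
        final TableInfo dataTable; // null for a base data table, set for an index

        public TableInfo(String name, Long guidePostWidth, TableInfo dataTable) {
            this.name = name;
            this.guidePostWidth = guidePostWidth;
            this.dataTable = dataTable;
        }
    }

    /** Indexes inherit the data table's width; data tables use their own. */
    public static Long effectiveGuidePostWidth(TableInfo table) {
        return table.dataTable != null ? table.dataTable.guidePostWidth : table.guidePostWidth;
    }
}
```

A null result here simply means no table-level width was set; what default applies in that case is outside the scope of this sketch.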

> Stats - Allow setting guide post width on global indexes
> 
>
> Key: PHOENIX-4332
> URL: https://issues.apache.org/jira/browse/PHOENIX-4332
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4332.patch
>
>
> Altering the guidepost width on the data table using the {{ALTER TABLE}} command 
> does not propagate to the global index.
> Altering the global index table runs into a "not allowed" error.
> {noformat}
> ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1;
> Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop 
> column referenced by VIEW columnName=IDX (state=42M01,code=1010)
> {noformat}





[jira] [Updated] (PHOENIX-4332) Stats - Allow setting guide post width on global indexes

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4332:
--
Attachment: PHOENIX-4332.patch

[~jamestaylor], please review.

> Stats - Allow setting guide post width on global indexes
> 
>
> Key: PHOENIX-4332
> URL: https://issues.apache.org/jira/browse/PHOENIX-4332
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4332.patch
>
>
> Altering the guidepost width on the data table using the {{ALTER TABLE}} command 
> does not propagate to the global index.
> Altering the global index table runs into a "not allowed" error.
> {noformat}
> ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1;
> Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop 
> column referenced by VIEW columnName=IDX (state=42M01,code=1010)
> {noformat}





[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_addendum4.patch

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, 
> PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
> {noformat}
> explain select count(*) from TABLE_T;
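> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER BY FIRST KEY ONLY                      | null            | null           | null         |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null            | null           | null         |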
> +--

[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: (was: PHOENIX-4287_addendum4.patch)

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, 
> PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
> {noformat}
> explain select count(*) from TABLE_T;
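> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER BY FIRST KEY ONLY                      | null            | null           | null         |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null            | null           | null         |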
> +-

[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_addendum4.patch

Updated patch with more tests, including a fix for an issue that the new test 
surfaced.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, 
> PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
> {noformat}
> explain select count(*) from TABLE_T;
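> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER BY FIRST KEY ONLY                      | null            | null           | null         |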
> | SERVER AGGREGATE INTO SINGLE ROW | null 
>  

[jira] [Reopened] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain reopened PHOENIX-4287:
---

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_addendum4.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, 
> PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
> {noformat}
> explain select count(*) from TABLE_T;
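> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER BY FIRST KEY ONLY                      | null            | null           | null         |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null            | null           | null         |
> +------------------------------------------------------+-----------------+----------------+--------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 27        |
> +-----------+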
> {noformat}





[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_addendum3.patch

Good catch, [~jamestaylor]. I have added a test that makes sure that 
useStatsForParallelization returns null when the property is not set in 
CREATE TABLE.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_addendum3.patch, 
> PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, 
> PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
> {noformat}
> explain select count(*) from TABLE_T;
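> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  |  EST_INFO_TS   |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER BY FIRST KEY ONLY                      | null            | null           | null         |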
> | SERVER AGGREGATE INTO SINGLE ROW  

[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_addendum2.patch

Thanks for the reviews, [~jamestaylor]. Updated patch addresses the comment.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_addendum2.patch, PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, 
> PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
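> {noformat}
> explain select count(*) from TABLE_T;
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS    |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER BY FIRST KEY ONLY                      | null            | null           | null         |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null            | null           | null         |
> +-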

[jira] [Commented] (PHOENIX-4332) Stats - Altering guidepost width on base table does not propagate to global index

2017-11-01 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234835#comment-16234835
 ] 

Samarth Jain commented on PHOENIX-4332:
---

View indexes and local indexes use the guide post width of the data table. 
Global indexes need to have their guide post width set.
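
The rule in the comment above can be sketched as a small lookup. This is only an illustrative model, not Phoenix source: the class, the enum, and the `defaultWidth` fallback for an index with no explicit setting are all assumptions made for the example.

```java
/**
 * Hypothetical model (not Phoenix source) of which guide post width applies
 * to each kind of table, following the comment above.
 */
final class GuidePostWidthSketch {

    enum Kind { DATA_TABLE, GLOBAL_INDEX, LOCAL_INDEX, VIEW_INDEX }

    /**
     * @param kind           kind of table being resolved
     * @param ownWidth       width set on the table itself, or null if unset
     * @param dataTableWidth width in effect for the parent data table
     * @param defaultWidth   assumed fallback when nothing is set explicitly
     */
    static long effectiveWidth(Kind kind, Long ownWidth, long dataTableWidth, long defaultWidth) {
        switch (kind) {
            case LOCAL_INDEX:
            case VIEW_INDEX:
                // These share the data table's setting; their own value is ignored.
                return dataTableWidth;
            default:
                // Data tables and global indexes rely on their own setting.
                return ownWidth != null ? ownWidth : defaultWidth;
        }
    }

    public static void main(String[] args) {
        // A global index with no width of its own does not pick up the data table's value.
        System.out.println(effectiveWidth(Kind.GLOBAL_INDEX, null, 1L, 100L));
    }
}
```

Under this model, changing the width on the data table alone leaves a global index unchanged, which matches the behaviour reported in the issue.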

> Stats - Altering guidepost width on base table does not propagate to global 
> index
> -
>
> Key: PHOENIX-4332
> URL: https://issues.apache.org/jira/browse/PHOENIX-4332
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>
> Altering the guidepost width on the data table with the {{ALTER TABLE}} command 
> does not propagate to the global index.
> Altering the global index table directly fails with a "not allowed" error.
> {noformat}
> ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1;
> Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop 
> column referenced by VIEW columnName=IDX (state=42M01,code=1010)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4332) Stats - Allow setting guide post width on global indexes

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4332:
--
Summary: Stats - Allow setting guide post width on global indexes  (was: 
Stats - Altering guidepost width on base table does not propagate to global 
index)

> Stats - Allow setting guide post width on global indexes
> 
>
> Key: PHOENIX-4332
> URL: https://issues.apache.org/jira/browse/PHOENIX-4332
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
>
> Altering the guidepost width on the data table with the {{ALTER TABLE}} command 
> does not propagate to the global index.
> Altering the global index table directly fails with a "not allowed" error.
> {noformat}
> ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1;
> Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop 
> column referenced by VIEW columnName=IDX (state=42M01,code=1010)
> {noformat}





[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_addendum.patch

Patch that fixes the issue. We need to set the config 
{{phoenix.use.stats.parallelization}} on both the client and the server side. When 
building the PTable on the server side, we now use the config default if the cell for 
USE_STATS_FOR_PARALLELIZATION is not present. Earlier it was defaulting to 
true. 
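
A minimal sketch of the fallback described above, assuming the stored cell value can be modelled as an `Optional<Boolean>`. The class and method names are hypothetical and are not taken from the patch.

```java
import java.util.Optional;

/**
 * Hypothetical sketch (not Phoenix source): resolve the effective
 * "use stats for parallelization" flag for a table.
 */
final class UseStatsResolutionSketch {

    /**
     * @param cellValue     value stored for the table; empty when the cell is absent
     * @param configDefault value of phoenix.use.stats.parallelization on this side
     */
    static boolean resolve(Optional<Boolean> cellValue, boolean configDefault) {
        // Fall back to the configured default rather than a hard-coded true.
        return cellValue.orElse(configDefault);
    }

    public static void main(String[] args) {
        // Cell absent, config false: stats are not used for parallelization.
        System.out.println(resolve(Optional.empty(), false));
        // An explicit table-level value wins over the config default.
        System.out.println(resolve(Optional.of(true), false));
    }
}
```

With a hard-coded `true` fallback, a server whose config says `false` would still behave as if stats were enabled, which is why the setting has to agree on both sides.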

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_addendum.patch, 
> PHOENIX-4287_v2.patch, PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, 
> PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
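> {noformat}
> explain select count(*) from TABLE_T;
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS    |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER B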

[jira] [Updated] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes

2017-11-01 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4343:
--
Attachment: PHOENIX-4343.patch

> In CREATE TABLE only allow setting guide post width on tables and global 
> indexes
> 
>
> Key: PHOENIX-4343
> URL: https://issues.apache.org/jira/browse/PHOENIX-4343
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Samarth Jain
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4343.patch
>
>






[jira] [Commented] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes

2017-11-01 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234774#comment-16234774
 ] 

Samarth Jain commented on PHOENIX-4343:
---

[~jamestaylor], please review.

> In CREATE TABLE only allow setting guide post width on tables and global 
> indexes
> 
>
> Key: PHOENIX-4343
> URL: https://issues.apache.org/jira/browse/PHOENIX-4343
> Project: Phoenix
>  Issue Type: Bug
>    Reporter: Samarth Jain
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4343.patch
>
>






[jira] [Created] (PHOENIX-4343) In CREATE TABLE only allow setting guide post width on tables and global indexes

2017-11-01 Thread Samarth Jain (JIRA)
Samarth Jain created PHOENIX-4343:
-

 Summary: In CREATE TABLE only allow setting guide post width on 
tables and global indexes
 Key: PHOENIX-4343
 URL: https://issues.apache.org/jira/browse/PHOENIX-4343
 Project: Phoenix
  Issue Type: Bug
Reporter: Samarth Jain
Assignee: Samarth Jain
Priority: Major








[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-01 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234620#comment-16234620
 ] 

Samarth Jain commented on PHOENIX-4287:
---

OK, thanks. Looks like we are hitting a similar issue when running queries 
against views. Views should inherit the USE_STATS_FOR_PARALLELIZATION property 
from the base table.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch, 
> PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
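> {noformat}
> explain select count(*) from TABLE_T;
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS    |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER BY FIRST KEY ONLY                      | null            | null           | null         |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null            | null           | null         |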
&

[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-11-01 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234587#comment-16234587
 ] 

Samarth Jain commented on PHOENIX-4287:
---

[~mujtabachohan] - What kind of query are you running into issues with? Is it 
against a table or a view? What happens after you execute an ALTER TABLE  SET USE_STATS_FOR_PARALLELIZATION=false? Can you also check the value of 
USE_STATS_FOR_PARALLELIZATION for the base table in SYSTEM.CATALOG? 

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>Priority: Major
>  Labels: localIndex
> Fix For: 4.13.0, 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch, 
> PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
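> {noformat}
> explain select count(*) from TABLE_T;
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS    |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER BY F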

[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created

2017-11-01 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16234412#comment-16234412
 ] 

Samarth Jain commented on PHOENIX-4335:
---

+1

> System catalog snapshot created each time a new connection is created
> -
>
> Key: PHOENIX-4335
> URL: https://issues.apache.org/jira/browse/PHOENIX-4335
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: James Taylor
>Priority: Blocker
> Fix For: 4.13.0
>
> Attachments: PHOENIX-4335.patch, PHOENIX-4335_v2.patch, 
> PHOENIX-4335_v3.patch
>
>
> With current head of 4.x, System Catalog snapshot is created on each new 
> connection.





[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created

2017-11-01 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233715#comment-16233715
 ] 

Samarth Jain commented on PHOENIX-4335:
---

Patch looks good, [~jamestaylor]. One minor nit:

I don't see why these have to be arrays:
{code}
+private final static boolean[] reinitialize = new boolean[1];
+private final static int[] countUpgradeAttempts = new int[1];
+private final static long[] systemTableVersion = 
{MetaDataProtocol.getPriorVersion()};
{code}
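
For illustration, one possible alternative to the single-element arrays is the `java.util.concurrent.atomic` holders. This is a sketch under assumptions, not the patch: the class name, the accessor methods, and the placeholder constant standing in for `MetaDataProtocol.getPriorVersion()` are all hypothetical, and whether atomics are the right choice depends on how the patch synchronizes access, which is not visible here.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical holder for the shared upgrade state; not taken from the patch. */
final class UpgradeStateSketch {

    // Placeholder standing in for MetaDataProtocol.getPriorVersion(); the value is made up.
    private static final long PRIOR_VERSION_PLACEHOLDER = 0L;

    private static final AtomicBoolean reinitialize = new AtomicBoolean(false);
    private static final AtomicInteger countUpgradeAttempts = new AtomicInteger(0);
    private static final AtomicLong systemTableVersion = new AtomicLong(PRIOR_VERSION_PLACEHOLDER);

    /** Records one more upgrade attempt and returns the updated count. */
    static int recordUpgradeAttempt() {
        return countUpgradeAttempts.incrementAndGet();
    }

    static void requestReinitialize() {
        reinitialize.set(true);
    }

    static boolean shouldReinitialize() {
        return reinitialize.get();
    }

    static void setSystemTableVersion(long version) {
        systemTableVersion.set(version);
    }

    static long getSystemTableVersion() {
        return systemTableVersion.get();
    }
}
```

Single-element arrays are usually a workaround for mutating a captured local from an anonymous class or lambda; for static fields that workaround is not needed, so a plain or atomic field reads more directly.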


> System catalog snapshot created each time a new connection is created
> -
>
> Key: PHOENIX-4335
> URL: https://issues.apache.org/jira/browse/PHOENIX-4335
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: James Taylor
>Priority: Blocker
> Fix For: 4.13.0
>
> Attachments: PHOENIX-4335.patch, PHOENIX-4335_v2.patch
>
>
> With current head of 4.x, System Catalog snapshot is created on each new 
> connection.





[jira] [Commented] (PHOENIX-4332) Stats - Altering guidepost width on base table does not propagate to global index

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227669#comment-16227669
 ] 

Samarth Jain commented on PHOENIX-4332:
---

[~jamestaylor], if possible, I would like to get this in for the 4.13 release.

> Stats - Altering guidepost width on base table does not propagate to global 
> index
> -
>
> Key: PHOENIX-4332
> URL: https://issues.apache.org/jira/browse/PHOENIX-4332
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>
> Altering the guidepost width on the data table with the {{ALTER TABLE}} command 
> does not propagate to the global index.
> Altering the global index table directly fails with a "not allowed" error.
> {noformat}
> ALTER TABLE IDX SET GUIDE_POSTS_WIDTH=1;
> Error: ERROR 1010 (42M01): Not allowed to mutate table. Cannot add/drop 
> column referenced by VIEW columnName=IDX (state=42M01,code=1010)
> {noformat}





[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227665#comment-16227665
 ] 

Samarth Jain commented on PHOENIX-4335:
---

My comment was more about the user expectation that a snapshot of the 
SYSTEM.CATALOG table will be created before Phoenix ends up executing the 
upgrade code. They have been getting a snapshot for the past 4 releases or so 
(because we have been changing the metadata, yes). And now for the 4.12 release 
they won't. They can always create a snapshot themselves too; it will just 
be a bit of a hassle as opposed to Phoenix doing it for them.

> System catalog snapshot created each time a new connection is created
> -
>
> Key: PHOENIX-4335
> URL: https://issues.apache.org/jira/browse/PHOENIX-4335
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: James Taylor
>Priority: Blocker
> Fix For: 4.13.0
>
> Attachments: PHOENIX-4335.patch
>
>
> With current head of 4.x, System Catalog snapshot is created on each new 
> connection.





[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227658#comment-16227658
 ] 

Samarth Jain commented on PHOENIX-4335:
---

Thinking about this a little bit more, there is a slight downside that we won't 
be creating a snapshot of SYSTEM.CATALOG when users are upgrading to the 4.12 
release. Maybe we should have some upgrade code to increment the SYSTEM table's 
timestamp even though we are not changing the metadata.

> System catalog snapshot created each time a new connection is created
> -
>
> Key: PHOENIX-4335
> URL: https://issues.apache.org/jira/browse/PHOENIX-4335
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: James Taylor
>Priority: Blocker
> Fix For: 4.13.0
>
>
> With current head of 4.x, System Catalog snapshot is created on each new 
> connection.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227639#comment-16227639
 ] 

Samarth Jain commented on PHOENIX-4333:
---

I have committed the test to the master, 4.x-HBase-0.98 and 4.12* branches.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Attachments: PHOENIX-4333_test.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the last-updated timestamp is shown as current.





[jira] [Updated] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-10-31 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4333:
--
Attachment: PHOENIX-4333_test.patch

Test which demonstrates the issue that [~mujtabachohan] brought up. I would say 
it is working fine. We call these estimates for a reason :). If the user 
desires more accuracy, he/she should call UPDATE STATISTICS on the tenant view.

FYI, [~cody.mar...@gmail.com] 

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Attachments: PHOENIX-4333_test.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the last-updated timestamp is shown as current.





[jira] [Comment Edited] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227597#comment-16227597
 ] 

Samarth Jain edited comment on PHOENIX-4333 at 10/31/17 9:12 PM:
-

Test which demonstrates the issue that [~mujtabachohan] brought up. I would say 
it is working as designed. We call these estimates for a reason :). If the user 
desires more accuracy, he/she should call UPDATE STATISTICS on the tenant view.

FYI, [~cody.mar...@gmail.com] 


was (Author: samarthjain):
Test which demonstrates the issue that [~mujtabachohan] brought up. I would say 
it is working fine. We call these estimates for a reason :). If the user 
desires more accuracy, he/she should call UPDATE STATISTICS on the tenant view.

FYI, [~cody.mar...@gmail.com] 

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Attachments: PHOENIX-4333_test.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B view 
> yields partial results (containing stats only for B,1), which are incorrect even 
> though the last-updated timestamp is shown as current.





[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226388#comment-16226388
 ] 

Samarth Jain commented on PHOENIX-4333:
---

I am not sure what the best option is here. We possibly shouldn't be relying on 
the EST_INFO_TS for tenant views since, in situations like this where tenants 
overlap within a region, we may have incomplete guide post info for a view. The 
user can possibly call UPDATE STATISTICS on the view after the first data load, 
and then subsequently rely on major compaction to collect stats for it.
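
The overlap can be illustrated with a toy model of the example from the issue description. The model assumes that updating stats through one tenant's view covers every key in each region that holds at least one of that tenant's rows; that assumption is chosen only because it reproduces the reported outcome and is not confirmed Phoenix behaviour. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (not Phoenix source) of partial stats for a tenant whose rows span regions. */
final class TenantStatsOverlapSketch {

    /**
     * Returns the keys of queriedTenant that end up covered by stats after
     * stats are updated through updatedTenant's view.
     * Keys are "TENANT,ROW" strings, grouped by region.
     */
    static List<String> keysCovered(List<List<String>> regions, String updatedTenant, String queriedTenant) {
        List<String> covered = new ArrayList<>();
        for (List<String> region : regions) {
            // Assumption: a region is scanned only if it holds rows of the updated tenant.
            boolean scanned = region.stream().anyMatch(k -> k.startsWith(updatedTenant + ","));
            if (!scanned) {
                continue;
            }
            for (String key : region) {
                if (key.startsWith(queriedTenant + ",")) {
                    covered.add(key);
                }
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("A,1", "A,2", "B,1"),
                Arrays.asList("B,2", "B,3"));
        // Stats updated via tenant A's view; tenant B sees only part of its data covered.
        System.out.println(keysCovered(regions, "A", "B"));
    }
}
```

In this model tenant B's estimate reflects only the rows that share a region with tenant A, while the timestamp still looks fresh.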

[~jamestaylor], WDYT?

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on tenant A view. Querying stats on tenant B view 
> yield partial results (only contains stats for B,1) which are incorrect even 
> though it shows updated timestamp as current.





[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-10-31 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_v4.patch

Thanks for the review, [~jamestaylor]. Attached is the updated patch. Will wait 
for the QA run to finish before I commit.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
>  Labels: localIndex
> Fix For: 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch, 
> PHOENIX-4287_v3.patch, PHOENIX-4287_v3_wip.patch, PHOENIX-4287_v4.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
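> {noformat}
> explain select count(*) from TABLE_T;
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO  |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899       | 332170         | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899       | 332170         | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899       | 332170         | 150792825 |
> +---------------------------------------------------------------------------------------+-----------------+----------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | PLAN                                                                             | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS    |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470       | 332151         | 1507928257617  |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470       | 332151         | 1507928257617  |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470       | 332151         | 1507928257617  |
> +----------------------------------------------------------------------------------+-----------------+----------------+----------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+-----------------+----------------+--------------+
> | PLAN                                                 | EST_BYTES_READ  | EST_ROWS_READ  | EST_INFO_TS  |
> +------------------------------------------------------+-----------------+----------------+--------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null            | null           | null         |
> | SERVER FILTER BY FIRST KEY ONLY                      | null            | null           | null         |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null            | null           | null         |
> +------------------------------------------------------+-----------------+----------------+--------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 27        |
> +-----------+
> {noformat}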





[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226372#comment-16226372
 ] 

Samarth Jain commented on PHOENIX-4335:
---

You could possibly spy/mock the ConnectionQueryServicesImpl object and make 
sure that when establishing more than one HConnection to the cluster (by using 
the EXTRA_JDBC_ARGUMENTS param in the connection properties), 
{code}
private void createSnapshot(String snapshotName, String tableName)
throws SQLException {
{code}

is not called more than once. Such a test will fail without your patch. 
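
The shape of such a test can be sketched without a mocking library. Everything below is a hand-rolled stand-in and an assumption for illustration: `Services` is not the real ConnectionQueryServicesImpl, the shared `AtomicBoolean` stands in for whatever state the patch shares across instances, and the snapshot name is made up. A real test would more likely spy on the actual class.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hand-rolled sketch of a "snapshot is taken at most once" check; not Phoenix source. */
final class SnapshotOnceSketch {

    /** Stand-in for a per-HConnection services object. */
    static class Services {
        private final AtomicBoolean upgradeDone;   // shared across all instances
        private final AtomicInteger snapshotCalls; // counts createSnapshot invocations

        Services(AtomicBoolean upgradeDone, AtomicInteger snapshotCalls) {
            this.upgradeDone = upgradeDone;
            this.snapshotCalls = snapshotCalls;
        }

        void init() {
            // Only the first instance to get here takes the snapshot.
            if (upgradeDone.compareAndSet(false, true)) {
                createSnapshot("HYPOTHETICAL_SNAPSHOT_NAME", "SYSTEM.CATALOG");
            }
        }

        void createSnapshot(String snapshotName, String tableName) {
            // The real method would call into HBase; here we only count the call.
            snapshotCalls.incrementAndGet();
        }
    }

    /** Simulates opening several connections and reports how many snapshots were taken. */
    static int countSnapshots(int connections) {
        AtomicBoolean upgradeDone = new AtomicBoolean(false);
        AtomicInteger snapshotCalls = new AtomicInteger();
        for (int i = 0; i < connections; i++) {
            new Services(upgradeDone, snapshotCalls).init();
        }
        return snapshotCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(countSnapshots(3));
    }
}
```

If the guard were per instance instead of shared, the count would equal the number of connections, which is the failure the issue describes.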

> System catalog snapshot created each time a new connection is created
> -
>
> Key: PHOENIX-4335
> URL: https://issues.apache.org/jira/browse/PHOENIX-4335
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Priority: Blocker
> Fix For: 4.13.0
>
>
> With current head of 4.x, System Catalog snapshot is created on each new 
> connection.





[jira] [Commented] (PHOENIX-4334) Unable to update stats on views that reside on separate regions before phoenix.stats.updateFrequency has elapsed

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226350#comment-16226350
 ] 

Samarth Jain commented on PHOENIX-4334:
---

[~jamestaylor] - any other ideas on how we can keep update stats on view2 from 
being blocked when update stats on view1 has already run? We 
could possibly store last_update_stats_time at the logical table level too, but 
that would be a non-trivial change.

> Unable to update stats on views that reside on separate regions before 
> phoenix.stats.updateFrequency has elapsed
> ----------------------------------------------------------------------
>
> Key: PHOENIX-4334
> URL: https://issues.apache.org/jira/browse/PHOENIX-4334
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.12.0
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
>
> Consider multiple tenant views that all reside on unique regions/region 
> servers. Updating stats on any one of the views causes the other views to 
> report the estimated stats last update time as current, so the stats command 
> is ignored for the other views until {{phoenix.stats.updateFrequency}} has 
> elapsed.





[jira] [Commented] (PHOENIX-4334) Unable to update stats on views that reside on separate regions before phoenix.stats.updateFrequency has elapsed

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226349#comment-16226349
 ] 

Samarth Jain commented on PHOENIX-4334:
---

We store last_update_time at the physical table level. So if we end up 
collecting stats for view1, we have to wait for 
phoenix.stats.updateFrequency to elapse before UPDATE STATISTICS on view2 has 
any effect. An alternative would be to set phoenix.stats.updateFrequency to 0. 

I will take a look at why view2 is reporting estimate time as current time.
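
The gating described above can be modelled roughly as follows. This is a sketch under stated assumptions, not Phoenix's actual code: StatsGateSketch, its map, and the millisecond clock passed in by the caller are all hypothetical, and only the idea of keying the last update time by physical table comes from the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the gating described above; not Phoenix's actual code.
class StatsGateSketch {
    private final long updateFrequencyMs; // stands in for phoenix.stats.updateFrequency
    private final Map<String, Long> lastUpdateByPhysicalTable = new HashMap<>();

    StatsGateSketch(long updateFrequencyMs) {
        this.updateFrequencyMs = updateFrequencyMs;
    }

    // Returns true if stats are collected and false if the request is ignored.
    // The view name is deliberately unused: only the physical table decides.
    boolean updateStats(String physicalTable, String view, long nowMs) {
        Long last = lastUpdateByPhysicalTable.get(physicalTable);
        if (last != null && nowMs - last < updateFrequencyMs) {
            return false; // too soon after another view on the same physical table
        }
        lastUpdateByPhysicalTable.put(physicalTable, nowMs);
        return true;
    }

    public static void main(String[] args) {
        StatsGateSketch gate = new StatsGateSketch(15 * 60 * 1000L);
        System.out.println(gate.updateStats("T", "view1", 0L));
        System.out.println(gate.updateStats("T", "view2", 1_000L));
    }
}
```

With a frequency of 0 the comparison never holds, so every request goes through, which matches the workaround mentioned above.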



[jira] [Commented] (PHOENIX-4335) System catalog snapshot created each time a new connection is created

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226347#comment-16226347
 ] 

Samarth Jain commented on PHOENIX-4335:
---

Would a straightforward change be to revert the MIN_SYSTEM_TABLE_TIMESTAMP 
increment? We rely on the system table's timestamp to check whether we need to 
create a snapshot.
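
To make the reasoning concrete, here is a minimal sketch of how such a timestamp check could lead to a snapshot on every connection. The class, the method names and the "<" comparison are assumptions for illustration only; the real check in Phoenix may look different.

```java
// Hypothetical model of the check described above; names and the "<" comparison are assumptions.
class SnapshotTimestampSketch {
    // A snapshot is assumed to be needed when the table is older than the client's minimum.
    static boolean needsSnapshot(long systemTableTimestamp, long minSystemTableTimestamp) {
        return systemTableTimestamp < minSystemTableTimestamp;
    }

    // Counts snapshots across several connections. If the upgrade path never advances the
    // table's timestamp, every connection sees an "old" table and snapshots again.
    static int snapshotsTaken(long tableTimestamp, long minTimestamp, int connections,
                              boolean upgradeAdvancesTimestamp) {
        int snapshots = 0;
        long current = tableTimestamp;
        for (int i = 0; i < connections; i++) {
            if (needsSnapshot(current, minTimestamp)) {
                snapshots++;
                if (upgradeAdvancesTimestamp) {
                    current = minTimestamp;
                }
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        System.out.println(snapshotsTaken(10L, 11L, 3, false));
        System.out.println(snapshotsTaken(10L, 11L, 3, true));
    }
}
```

Under these assumptions, reverting the increment makes the table's timestamp equal to the minimum again, so the check stops firing.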



[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-10-31 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_v3.patch

I think I figured out what was going on. When we are not using stats for 
parallelization, we need to reset the start key of the scan to either the 
original scan's start key (if we are looking at the first region) or to the end 
key of the previous region.
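
A rough sketch of that start-key rule, assuming one scan per region: keys are plain Strings here for readability (real HBase keys are byte arrays), an empty string means "unbounded", and RegionScanSketch and perRegionScans are hypothetical names rather than anything in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the start-key rule; not the code in the patch.
class RegionScanSketch {
    // regionEndKeys: end keys of the regions the scan touches, in key order.
    // Returns {startKey, stopKey} pairs, one per region-level scan.
    static List<String[]> perRegionScans(String scanStart, String scanStop,
                                         List<String> regionEndKeys) {
        List<String[]> scans = new ArrayList<>();
        String currentStart = scanStart; // first region: the original scan's start key
        for (String regionEnd : regionEndKeys) {
            boolean lastForScan = regionEnd.isEmpty()
                    || (!scanStop.isEmpty() && regionEnd.compareTo(scanStop) >= 0);
            String stop = lastForScan ? scanStop : regionEnd;
            scans.add(new String[] {currentStart, stop});
            if (lastForScan) {
                break;
            }
            currentStart = regionEnd; // later regions: the end key of the previous region
        }
        return scans;
    }

    public static void main(String[] args) {
        for (String[] scan : perRegionScans("b", "x", Arrays.asList("g", "n", ""))) {
            System.out.println(Arrays.toString(scan));
        }
    }
}
```

The point of the sketch is only that the start key is carried forward from one region to the next instead of being left at a stale value.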

[~jamestaylor] - your keen eyes would be much appreciated. It is tricky to get 
this stuff right.


[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-10-30 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_v3_wip.patch

WIP patch that attempts to reuse the existing code. It doesn't work yet.

> Incorrect aggregate query results when stats are disable for parallelization
> ----------------------------------------------------------------------------
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.12.0
> Environment: HBase 1.3.1
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
> Labels: localIndex
> Fix For: 4.12.1
>
> Attachments: PHOENIX-4287.patch, PHOENIX-4287_v2.patch, 
> PHOENIX-4287_v3_wip.patch
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
> {noformat}
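> explain select count(*) from TABLE_T;
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ | EST_ROWS_READ | EST_INFO  |
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899      | 332170        | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899      | 332170        | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899      | 332170        | 150792825 |
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+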
> {noformat}
> Using data table
> {noformat}
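> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> | PLAN                                                                             | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS   |
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470      | 332151        | 1507928257617 |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470      | 332151        | 1507928257617 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470      | 332151        | 1507928257617 |
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+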
> {noformat}
> Without stats available, results are correct:
> {noformat}
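> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+----------------+---------------+-------------+
> | PLAN                                                 | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +------------------------------------------------------+----------------+---------------+-------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null           | null          | null        |
> | SERVER FILTER BY FIRST KEY ONLY                      | null           | null          | null        |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null           | null          | null        |
> +------------------------------------------------------+----------------+---------------+-------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 27        |
> +-----------+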
> {noformat}





[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-10-30 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225583#comment-16225583
 ] 

Samarth Jain commented on PHOENIX-4287:
---

There is some duplication, but generating estimates when 
statsParallelization is off is relatively simple. We only need to intersect 
the scan's start and stop keys with the guideposts, without worrying about 
region boundaries and everything else that the code in getParallelScans() 
handles. My previous attempt at using the existing code to generate estimates 
without generating intra-region scans failed miserably. I will sync with you 
offline to see what we can do to reuse the existing code.
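
As a rough illustration of intersecting the scan range with guideposts, the sketch below sums the row and byte counts of the guideposts that fall inside the scan's key range. Everything in it is an assumption made for the example: the Guidepost class, String keys instead of byte arrays, the empty string meaning "unbounded", and the choice of an exclusive lower and inclusive upper bound.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of estimate generation; not the code in the patch.
class GuidepostEstimateSketch {
    static final class Guidepost {
        final String key;
        final long rows;
        final long bytes;

        Guidepost(String key, long rows, long bytes) {
            this.key = key;
            this.rows = rows;
            this.bytes = bytes;
        }
    }

    // Returns {estimatedRows, estimatedBytes} for the guideposts inside (scanStart, scanStop].
    static long[] estimate(String scanStart, String scanStop, List<Guidepost> guideposts) {
        long rows = 0;
        long bytes = 0;
        for (Guidepost gp : guideposts) {
            boolean afterStart = scanStart.isEmpty() || gp.key.compareTo(scanStart) > 0;
            boolean beforeStop = scanStop.isEmpty() || gp.key.compareTo(scanStop) <= 0;
            if (afterStart && beforeStop) {
                rows += gp.rows;
                bytes += gp.bytes;
            }
        }
        return new long[] {rows, bytes};
    }

    public static void main(String[] args) {
        List<Guidepost> guideposts = Arrays.asList(
                new Guidepost("d", 10, 100),
                new Guidepost("h", 20, 200),
                new Guidepost("p", 30, 300));
        System.out.println(Arrays.toString(estimate("e", "m", guideposts)));
    }
}
```

Region boundaries do not appear anywhere in the sketch, which is the simplification the comment refers to.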


[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-10-30 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16225434#comment-16225434
 ] 

Samarth Jain commented on PHOENIX-4287:
---

Yes, v2 just has changes relevant to this JIRA.



[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-10-30 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287_v2.patch



[jira] [Commented] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-29 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224407#comment-16224407
 ] 

Samarth Jain commented on PHOENIX-4289:
---

Tests passed. I will go ahead and commit this patch.


[jira] [Updated] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-10-29 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4287:
--
Attachment: PHOENIX-4287.patch

Patch on top of PHOENIX-4289. [~jamestaylor], please review.



[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-29 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4289:
--
Attachment: PHOENIX-4289_v4.patch

Fixing test failure.


[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-29 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4289:
--
Attachment: PHOENIX-4289_v3.patch

v3 patch to address the test failure.


[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-28 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4289:
--
Attachment: PHOENIX-4289_v2.patch

The previous patch had an issue that prevented stats from being collected for 
local indexes on views. Updated the patch.


[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-27 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4289:
--
Attachment: (was: PHOENIX-4289.patch)

> UPDATE STATISTICS command does not collect stats for local indexes
> ------------------------------------------------------------------
>
> Key: PHOENIX-4289
> URL: https://issues.apache.org/jira/browse/PHOENIX-4289
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.12.0
> Environment: HBase 1.3.1, Phoenix 4.12.0
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
> Labels: localIndex
> Attachments: PHOENIX-4289.patch
>
>
> With clean {{SYSTEM.STATS}} table and restarted HBase server+Phoenix client. 
> Ran {{UPDATE STATISTICS T ALL}} command. Global guidepost width is set to 
> 100M. No stats are generated for any of the local indexes on table T.
> {noformat}
> explain select count(*) from T;
> +-----------------------------------------------------+----------------+---------------+-------------+
> | PLAN                                                | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +-----------------------------------------------------+----------------+---------------+-------------+
> | CLIENT 8-CHUNK PARALLEL 8-WAY RANGE SCAN OVER T [1] | null           | null          | null        |
> | SERVER FILTER BY FIRST KEY ONLY                     | null           | null          | null        |
> | SERVER AGGREGATE INTO SINGLE ROW                    | null           | null          | null        |
> +-----------------------------------------------------+----------------+---------------+-------------+
> select * from system.stats;
> +---------------+---------------+----------------+-------------------+-------------------------+-----------------------+
> | PHYSICAL_NAME | COLUMN_FAMILY | GUIDE_POST_KEY | GUIDE_POSTS_WIDTH | LAST_STATS_UPDATE_TIME  | GUIDE_POSTS_ROW_COUNT |
> +---------------+---------------+----------------+-------------------+-------------------------+-----------------------+
> | T             |               |                | null              | 2017-10-16 18:36:57.884 | null                  |
> | T             | 0             | [B@9bd0fa6     | 10099             |                         | 75756                 |
> | T             | 0             | [B@59d2103b    | 10057             |                         | 75748                 |
> | T             | 0             | [B@39dcf4b0    | 10058             |                         | 75748                 |
> | T             | 0             | [B@6e4de19b    | 10081             |                         | 75743                 |
> | T             | 0             | [B@f6c03cb     | 10044             |                         | 75744                 |
> | T             | 0             | [B@46f699d5    | 10023             |                         | 75741                 |
> | T             | 0             | [B@18518ccf    | 10019             |                         | 75749                 |
> | T             | 0             | [B@1991f767    | 10097             |                         | 75740                 |
> | T             | 0             | [B@768ccdc5    | 10092             |                         | 75740                 |
> | T             | 0             | [B@4c6daf0     | 10026             |                         | 75739                 |
> | T             | 0             | [B@10650953    | 10054             |                         | 75731                 |
> | T             | 0             | [B@659eef7     | 10092             |                         | 75741                 |
> | T             | 0             | [B@162be91c    | 10023             |                         | 75752                 |
> | T             | 0             | [B@2488b073    | 10096             |                         | 75743                 |
> | T             | 0             | [B@1c9f0a20    | 10025             |                         | 75745                 |
> | T             | 0             | [B@55787112    | 10104             |                         | 75725                 |
> | T             | 0             | [B@1cd201a8    | 10019             |                         | 75748                 |
> | T             | 0             | [B@7db82169    | 10080             |

[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-27 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4289:
--
Attachment: PHOENIX-4289.patch


[jira] [Updated] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-27 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4289:
--
Attachment: PHOENIX-4289.patch

[~jamestaylor], please review.


[jira] [Commented] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi

2017-10-25 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219492#comment-16219492
 ] 

Samarth Jain commented on PHOENIX-4320:
---

Something wrong with my setup. [~mujtabachohan] just pushed a commit and fixed 
it. Thanks Mujtaba.

> Update website pages with information on phoenix.use.stats.parallelization 
> confi
> 
>
> Key: PHOENIX-4320
> URL: https://issues.apache.org/jira/browse/PHOENIX-4320
> Project: Phoenix
>  Issue Type: Task
>    Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-4320.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi

2017-10-25 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16219433#comment-16219433
 ] 

Samarth Jain commented on PHOENIX-4320:
---

Oops. Will fix it right away.

> Update website pages with information on phoenix.use.stats.parallelization 
> confi
> 
>
> Key: PHOENIX-4320
> URL: https://issues.apache.org/jira/browse/PHOENIX-4320
> Project: Phoenix
>  Issue Type: Task
>    Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-4320.patch
>
>






[jira] [Resolved] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi

2017-10-25 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain resolved PHOENIX-4320.
---
Resolution: Fixed

> Update website pages with information on phoenix.use.stats.parallelization 
> confi
> 
>
> Key: PHOENIX-4320
> URL: https://issues.apache.org/jira/browse/PHOENIX-4320
> Project: Phoenix
>  Issue Type: Task
>    Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-4320.patch
>
>






[jira] [Updated] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi

2017-10-25 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4320:
--
Attachment: PHOENIX-4320.patch

> Update website pages with information on phoenix.use.stats.parallelization 
> confi
> 
>
> Key: PHOENIX-4320
> URL: https://issues.apache.org/jira/browse/PHOENIX-4320
> Project: Phoenix
>  Issue Type: Task
>    Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-4320.patch
>
>






[jira] [Created] (PHOENIX-4320) Update website pages with information on phoenix.use.stats.parallelization confi

2017-10-25 Thread Samarth Jain (JIRA)
Samarth Jain created PHOENIX-4320:
-

 Summary: Update website pages with information on 
phoenix.use.stats.parallelization confi
 Key: PHOENIX-4320
 URL: https://issues.apache.org/jira/browse/PHOENIX-4320
 Project: Phoenix
  Issue Type: Task
Reporter: Samarth Jain
Assignee: Samarth Jain








[jira] [Comment Edited] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-20 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213608#comment-16213608
 ] 

Samarth Jain edited comment on PHOENIX-4289 at 10/21/17 12:58 AM:
--

I think I see what is going on. When a table has an index, we run update stats 
twice - once for the data table and once for the index table. We control update 
stats being called too many times in a short duration by using the configurable 
setting phoenix.stats.minUpdateFrequency. The check for when update stats was 
last run uses the physical_name as the filter. 
{code}
String query = "SELECT CURRENT_DATE()," + LAST_STATS_UPDATE_TIME
        + " FROM " + PhoenixDatabaseMetaData.SYSTEM_STATS_NAME
        + " WHERE " + PHYSICAL_NAME + "='" + physicalName.getString() + "'"
        + " AND " + COLUMN_FAMILY + " IS NULL"
        + " AND " + LAST_STATS_UPDATE_TIME + " IS NOT NULL";
{code}

For local indexes, the physical_name is the same for both the data table and 
the index table. As a result, the second update stats ends up not collecting 
any stats for the index table. The default value of this config is set to 0 in 
our tests, so an update statistics statement was collecting stats for both the 
index and data tables. After setting 
QueryServicesTestImpl.DEFAULT_MIN_STATS_UPDATE_FREQ_MS to a large value, I now 
see that the estimates are returned as null.
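
To make the failure mode described above concrete, here is a minimal sketch of 
a minimum-update-frequency gate keyed only by physical name. It is not Phoenix 
code: the class StatsUpdateGate and its method tryRun are hypothetical names 
introduced for illustration, and the real check is the SYSTEM.STATS query 
quoted in the comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a "minimum update frequency" gate; not Phoenix code.
final class StatsUpdateGate {
    private final long minUpdateFrequencyMs;
    // Last successful run, keyed by physical table name only (the assumption
    // that makes a data table and its local index collide).
    private final Map<String, Long> lastRunByPhysicalName = new HashMap<>();

    StatsUpdateGate(long minUpdateFrequencyMs) {
        this.minUpdateFrequencyMs = minUpdateFrequencyMs;
    }

    /** Returns true if collection may run now, and records the run time. */
    boolean tryRun(String physicalName, long nowMs) {
        Long last = lastRunByPhysicalName.get(physicalName);
        if (last != null && nowMs - last < minUpdateFrequencyMs) {
            return false; // ran too recently for this physical name
        }
        lastRunByPhysicalName.put(physicalName, nowMs);
        return true;
    }
}
```

With a large frequency, a first call for "T" (the data table) succeeds and a 
second call for "T" a moment later (the local index, which shares the physical 
name) is rejected. With a frequency of 0, as in the tests, both calls succeed, 
which would explain why the tests did not catch the problem.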


was (Author: samarthjain):
I think I see what is going on. When a table has an index, we run update stats 
twice - once for the data table and once for the index table. We control update 
stats being called too many times in a short duration by using the configurable 
setting phoenix.stats.minUpdateFrequency. The check for when update stats was 
last run uses the physical_table_name as the filter. For local indexes, the 
physical_table_name is same for both data table and index table. As a result 
the second update stats ends up not collecting any stats for the index table. 
The default value of this config is set to 0 in our tests. So the tests weren't 
able to catch this issue. After setting 
QueryServicesTestImpl.DEFAULT_MIN_STATS_UPDATE_FREQ_MS to a large value, I am 
seeing now that the estimates are being returned as null.


[jira] [Commented] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-20 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213608#comment-16213608
 ] 

Samarth Jain commented on PHOENIX-4289:
---

I think I see what is going on. When a table has an index, we run update stats 
twice - once for the data table and once for the index table. We control update 
stats being called too many times in a short duration by using the configurable 
setting phoenix.stats.minUpdateFrequency. The check for when update stats was 
last run uses the physical_table_name as the filter. For local indexes, the 
physical_table_name is the same for both the data table and the index table. As 
a result, the second update stats ends up not collecting any stats for the 
index table. The default value of this config is set to 0 in our tests, so the 
tests weren't able to catch this issue. After setting 
QueryServicesTestImpl.DEFAULT_MIN_STATS_UPDATE_FREQ_MS to a large value, I now 
see that the estimates are returned as null.


[jira] [Commented] (PHOENIX-4289) UPDATE STATISTICS command does not collect stats for local indexes

2017-10-20 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213580#comment-16213580
 ] 

Samarth Jain commented on PHOENIX-4289:
---

[~mujtabachohan], I am unable to repro this issue in a unit test. This is what 
I added in ExplainPlanWithStatsEnabledIT:
{code}
@Test
public void testEstimatesWithLocalIndexes() throws Exception {
    String tableName = generateUniqueName();
    String indexName = "IDX_" + generateUniqueName();
    try (Connection conn = DriverManager.getConnection(getUrl())) {
        int guidePostWidth = 20;
        conn.createStatement().execute("CREATE TABLE " + tableName
                + " (k INTEGER PRIMARY KEY, a bigint, b bigint)"
                + " GUIDE_POSTS_WIDTH=" + guidePostWidth);
        // Upsert ten rows into the data table.
        conn.createStatement().execute("upsert into " + tableName + " values (100,1,3)");
        conn.createStatement().execute("upsert into " + tableName + " values (101,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (102,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (103,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (104,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (105,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (106,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (107,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (108,2,4)");
        conn.createStatement().execute("upsert into " + tableName + " values (109,2,4)");
        conn.commit();
        conn.createStatement().execute("CREATE LOCAL INDEX " + indexName + " ON " + tableName
                + " (a) INCLUDE (b) ");
        conn.createStatement().execute("UPDATE STATISTICS " + tableName);
    }
    List<Object> binds = Lists.newArrayList();
    try (Connection conn = DriverManager.getConnection(getUrl())) {
        String sql = "SELECT COUNT(*) " + " FROM " + tableName;
        ResultSet rs = conn.createStatement().executeQuery(sql);
        // The count query is expected to be served by the local index.
        assertTrue("Index " + indexName + " should have been used",
                rs.unwrap(PhoenixResultSet.class).getStatement().getQueryPlan().getTableRef()
                        .getTable().getName().getString().equals(indexName));
        Estimate info = getByteRowEstimates(conn, sql, binds);
        assertEquals((Long) 10L, info.estimatedRows);
        assertTrue(info.estimateInfoTs > 0);
    }
}
{code}



[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-10-18 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209858#comment-16209858
 ] 

Samarth Jain commented on PHOENIX-4287:
---

Looks like it is limited to local indexes. Will keep looking.

> Incorrect aggregate query results when stats are disable for parallelization
> 
>
> Key: PHOENIX-4287
> URL: https://issues.apache.org/jira/browse/PHOENIX-4287
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
> Environment: HBase 1.3.1
>Reporter: Mujtaba Chohan
>    Assignee: Samarth Jain
> Fix For: 4.12.1
>
>
> With {{phoenix.use.stats.parallelization}} set to {{false}}, aggregate query 
> returns incorrect results when stats are available.
> With local index and stats disabled for parallelization:
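> {noformat}
> explain select count(*) from TABLE_T;
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> | PLAN                                                                                  | EST_BYTES_READ | EST_ROWS_READ | EST_INFO  |
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> | CLIENT 0-CHUNK 332170 ROWS 625043899 BYTES PARALLEL 0-WAY RANGE SCAN OVER TABLE_T [1] | 625043899      | 332170        | 150792825 |
> | SERVER FILTER BY FIRST KEY ONLY                                                       | 625043899      | 332170        | 150792825 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                      | 625043899      | 332170        | 150792825 |
> +---------------------------------------------------------------------------------------+----------------+---------------+-----------+
> select count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 0         |
> +-----------+
> {noformat}
> Using data table
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> | PLAN                                                                             | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS   |
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> | CLIENT 2-CHUNK 332151 ROWS 438492470 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE_T | 438492470      | 332151        | 1507928257617 |
> | SERVER FILTER BY FIRST KEY ONLY                                                  | 438492470      | 332151        | 1507928257617 |
> | SERVER AGGREGATE INTO SINGLE ROW                                                 | 438492470      | 332151        | 1507928257617 |
> +----------------------------------------------------------------------------------+----------------+---------------+---------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 14        |
> +-----------+
> {noformat}
> Without stats available, results are correct:
> {noformat}
> explain select /*+NO_INDEX*/ count(*) from TABLE_T;
> +------------------------------------------------------+----------------+---------------+-------------+
> | PLAN                                                 | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS |
> +------------------------------------------------------+----------------+---------------+-------------+
> | CLIENT 2-CHUNK PARALLEL 1-WAY FULL SCAN OVER TABLE_T | null           | null          | null        |
> | SERVER FILTER BY FIRST KEY ONLY                      | null           | null          | null        |
> | SERVER AGGREGATE INTO SINGLE ROW                     | null           | null          | null        |
> +------------------------------------------------------+----------------+---------------+-------------+
> select /*+NO_INDEX*/ count(*) from TABLE_T;
> +-----------+
> | COUNT(1)  |
> +-----------+
> | 27        |
> +-----------+
> {noformat}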
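
For anyone trying to reproduce the setup in the description, the property 
could be supplied from the client roughly as sketched below. The property name 
comes from the issue text; the helper class StatsParallelizationProps is a 
hypothetical name, and treating the property as a per-connection override is 
an assumption that should be checked against the Phoenix configuration 
documentation.

```java
import java.util.Properties;

// Hypothetical helper; only the property name is taken from the issue text.
final class StatsParallelizationProps {
    static final String USE_STATS_FOR_PARALLELIZATION =
            "phoenix.use.stats.parallelization";

    /** Builds client properties with stats-based parallelization on or off. */
    static Properties withStatsParallelization(boolean enabled) {
        Properties props = new Properties();
        props.setProperty(USE_STATS_FOR_PARALLELIZATION, Boolean.toString(enabled));
        return props;
    }
}
```

The resulting Properties object could then be passed to 
DriverManager.getConnection(url, props), where the JDBC URL is specific to the 
deployment.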





[jira] [Commented] (PHOENIX-4287) Incorrect aggregate query results when stats are disable for parallelization

2017-10-17 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208163#comment-16208163
 ] 

Samarth Jain commented on PHOENIX-4287:
---

Yes, I am working on it.






Re: [ANNOUNCE] New Phoenix committer: Ethan Wang

2017-10-12 Thread Samarth Jain
Congrats, Ethan!

On Thu, Oct 12, 2017 at 9:25 AM Thomas D'Silva 
wrote:

> Congrats Ethan!
>
> On Thu, Oct 12, 2017 at 8:28 AM, Geoffrey Jacoby 
> wrote:
>
> > Congrats, Ethan! Looking forward to using those new functions soon.
> >
> > Geoffrey
> >
> > On Thu, Oct 12, 2017 at 1:32 AM, rajeshb...@apache.org <
> > chrajeshbab...@gmail.com> wrote:
> >
> > > Congratulations Ethan!! Great Job.
> > >
> > > Thanks,
> > > Rajeshbabu.
> > >
> > > On Thu, Oct 12, 2017 at 7:15 AM, James Taylor 
> > > wrote:
> > >
> > > > > On behalf of the Apache Phoenix PMC, I'm pleased to announce that
> Ethan
> > > Wang
> > > > has accepted our invitation to become a committer. He's behind some
> of
> > > the
> > > > great new 4.12 features of table sampling [1] and approximate count
> > > > distinct [2] along with contributing to the less sexy work of helping
> > to
> > > > stabilize our unit tests.
> > > >
> > > > Please give Ethan a warm welcome to the project!
> > > >
> > > > James
> > > >
> > > > [1] https://phoenix.apache.org/tablesample.html
> > > > [2] https://phoenix.apache.org/language/functions.html#
> > > > approx_count_distinct
> > > >
> > >
> >
>


Re: [ANNOUNCE] New Phoenix committer: Vincent Poon

2017-10-12 Thread Samarth Jain
Congrats, Vincent!

On Thu, Oct 12, 2017 at 9:25 AM Thomas D'Silva 
wrote:

> Congrats Vincent!
>
> On Thu, Oct 12, 2017 at 8:27 AM, Geoffrey Jacoby 
> wrote:
>
> > Congrats, Vincent! Thanks for all your help on the index stabilization.
> >
> > On Thu, Oct 12, 2017 at 1:32 AM, rajeshb...@apache.org <
> > chrajeshbab...@gmail.com> wrote:
> >
> > > Congratulations Vincent!! Great Job.
> > >
> > > Thanks,
> > > Rajeshbabu.
> > >
> > > On Thu, Oct 12, 2017 at 7:21 AM, James Taylor 
> > > wrote:
> > >
> > > > On behalf of the Apache Phoenix PMC, I'm delighted to announce that
> > > Vincent
> > > > Poon has accepted our invitation to become a committer. He's had a
> big
> > > > impact in helping to stabilize our secondary index implementation,
> > > > including the creation of an index scrutiny tool that will detect
> > > > out-of-sync issues [1].
> > > >
> > > > Looking forward to continued contributions.
> > > >
> > > > Please give Vincent a warm welcome to the project!
> > > >
> > > > James
> > > >
> > > >
> > > > [1] https://phoenix.apache.org/secondary_indexing.html#Index_
> > > Scrutiny_Tool
> > > >
> > >
> >
>


Re: [VOTE] Release of Apache Phoenix 4.12.0 RC0

2017-10-09 Thread Samarth Jain
+1
- built from source
- successfully ran all unit and integration tests
- collected stats using major compaction and update stats - estimates look
correct
- ran some basic manual tests involving global mutable and immutable
secondary indexes, looks good.

On Fri, Oct 6, 2017 at 1:03 PM, lars hofhansl  wrote:

> +1
> - built from source
> - loaded a few million rows into Phoenix
> - tried some queries
> - nothing undue in the logs
> - killed a region server while the client was in the middle of a large
>   update (UPSERT ... SELECT ...)
> - all recovered nicely
>
>
>   From: James Taylor 
>  To: "dev@phoenix.apache.org" 
>  Sent: Wednesday, October 4, 2017 12:46 AM
>  Subject: [VOTE] Release of Apache Phoenix 4.12.0 RC0
>
> Hello Everyone,
>
> This is a call for a vote on Apache Phoenix 4.12.0 RC0. This is the next
> minor release of Phoenix 4, compatible with Apache HBase 0.98, 1.1, 1.2, &
> 1.3. The release includes both a source-only release and a convenience
> binary release for each supported HBase version.
>
> This release has feature parity with supported HBase versions and includes
> the following improvements:
> - Improved scalability of global mutable secondary index
> - 100+ bug fixes (the majority around secondary indexing)
> - Index Scrutiny tool [1]
> - Stabilization of unit tests
> - Support for table sampling [2]
> - Support for APPROX_COUNT_DISTINCT aggregate function [3]
>
> The source tarball, including signatures, digests, etc can be found at:
> https://dist.apache.org/repos/dist/dev/phoenix/apache-
> phoenix-v4.12.0-HBase-0.98-rc0/src/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-
> phoenix-v4.12.0-HBase-1.1-rc0/src/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-
> phoenix-v4.12.0-HBase-1.2-rc0/src/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-
> phoenix-v4.12.0-HBase-1.3-rc0/src/
>
> The binary artifacts can be found at:
> https://dist.apache.org/repos/dist/dev/phoenix/apache-
> phoenix-v4.12.0-HBase-0.98-rc0/bin/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-
> phoenix-v4.12.0-HBase-1.1-rc0/bin/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-
> phoenix-v4.12.0-HBase-1.2-rc0/bin/
> https://dist.apache.org/repos/dist/dev/phoenix/apache-
> phoenix-v4.12.0-HBase-1.3-rc0/bin/
>
> For a complete list of changes, see:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?
> projectId=12315120=12340844
>
> Artifacts are signed with my "CODE SIGNING KEY": 308FBEE06088BE0F
>
> KEYS file available here:
> https://dist.apache.org/repos/dist/dev/phoenix/KEYS
>
> The hash and tag to be voted upon:
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=
> 13a7f97b49704642d67481c58a118a68c2e4c2e5
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;
> h=refs/tags/v4.12.0-HBase-0.98-rc0
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=
> e40bbfff1150e56e1ecb7cd22c49cee298496c2b
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;
> h=refs/tags/v4.12.0-HBase-1.1-rc0
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=
> d79dd50ff732f2673e1414d970cd4742e2c135de
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;
> h=refs/tags/v4.12.0-HBase-1.2-rc0
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=commit;h=
> f0bc4cdb5bbf96b316c78cc816400b04f63e911b
> https://git-wip-us.apache.org/repos/asf?p=phoenix.git;a=tag;
> h=refs/tags/v4.12.0-HBase-1.3-rc0
>
> Vote will be open for at least 72 hours. Please vote:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> The Apache Phoenix Team
>
> [1] https://phoenix.apache.org/secondary_indexing.html#Index_Scrutiny_Tool
> [2] https://phoenix.apache.org/tablesample.html
> [3] https://phoenix.apache.org/language/functions.html#
> approx_count_distinct
>
>
>
>


[jira] [Commented] (PHOENIX-4276) Surface metrics on statistics collection

2017-10-04 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16191580#comment-16191580
 ] 

Samarth Jain commented on PHOENIX-4276:
---

FYI, [~Misraji]

> Surface metrics on statistics collection
> 
>
> Key: PHOENIX-4276
> URL: https://issues.apache.org/jira/browse/PHOENIX-4276
> Project: Phoenix
>  Issue Type: Improvement
>    Reporter: Samarth Jain
>
> It would be good to get insight into how stats collection is performing over
> time. An initial set of metrics that I can think of would be:
> - Time taken to compute stats (reading cells and computing their size)
> - Time taken to commit stats per physical table
> - Number of guide posts collected per physical table
> - Number of guide posts collected per region
> - Number of regions on which stats collection happened per physical table
> - Number of times stats were collected due to major compaction vs update
>   stats, per physical table
> - If possible, whether stats were collected because a minor compaction was
>   promoted to a major compaction, with a metric surfaced for it
>
> Because most of the collection work happens on the server side, one option
> would be to see how HBase's metrics are surfaced (my guess is JMX) and follow
> the same pattern. Or we could possibly use the hbase-metrics-api module, but
> that is only available as of HBase 1.4. Another option would be to see
> PHOENIX-3807 for some inspiration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4276) Surface metrics on statistics collection

2017-10-04 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-4276:
--
Issue Type: Improvement  (was: Bug)

> Surface metrics on statistics collection
> 
>
> Key: PHOENIX-4276
> URL: https://issues.apache.org/jira/browse/PHOENIX-4276
> Project: Phoenix
>  Issue Type: Improvement
>    Reporter: Samarth Jain
>
> It would be good to get insight into how stats collection is performing over
> time. An initial set of metrics that I can think of would be:
> - Time taken to compute stats (reading cells and computing their size)
> - Time taken to commit stats per physical table
> - Number of guide posts collected per physical table
> - Number of guide posts collected per region
> - Number of regions on which stats collection happened per physical table
> - Number of times stats were collected due to major compaction vs update
>   stats, per physical table
> - If possible, whether stats were collected because a minor compaction was
>   promoted to a major compaction, with a metric surfaced for it
>
> Because most of the collection work happens on the server side, one option
> would be to see how HBase's metrics are surfaced (my guess is JMX) and follow
> the same pattern. Or we could possibly use the hbase-metrics-api module, but
> that is only available as of HBase 1.4. Another option would be to see
> PHOENIX-3807 for some inspiration.





[jira] [Created] (PHOENIX-4276) Surface metrics on statistics collection

2017-10-04 Thread Samarth Jain (JIRA)
Samarth Jain created PHOENIX-4276:
-

 Summary: Surface metrics on statistics collection
 Key: PHOENIX-4276
 URL: https://issues.apache.org/jira/browse/PHOENIX-4276
 Project: Phoenix
  Issue Type: Bug
Reporter: Samarth Jain


It would be good to get insight into how stats collection is performing over
time. An initial set of metrics that I can think of would be:
- Time taken to compute stats (reading cells and computing their size)
- Time taken to commit stats per physical table
- Number of guide posts collected per physical table
- Number of guide posts collected per region
- Number of regions on which stats collection happened per physical table
- Number of times stats were collected due to major compaction vs update
  stats, per physical table
- If possible, whether stats were collected because a minor compaction was
  promoted to a major compaction, with a metric surfaced for it

Because most of the collection work happens on the server side, one option
would be to see how HBase's metrics are surfaced (my guess is JMX) and follow
the same pattern. Or we could possibly use the hbase-metrics-api module, but
that is only available as of HBase 1.4. Another option would be to see
PHOENIX-3807 for some inspiration.
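
To make the JMX option concrete, here is a minimal sketch of how a few of the
counters listed above might be exposed as a standard MBean using only the JDK's
javax.management API. Everything named here (StatsMetricsSketch, StatsMetrics,
recordRun, the "example.phoenix.stats" domain) is hypothetical and made up for
illustration; it is not Phoenix or HBase code, and it does not reflect how
HBase actually wires up its own metrics.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: none of these names exist in Phoenix or HBase.
final class StatsMetricsSketch {

    /** Standard MBean interface: JMX exposes each getter as a read-only attribute. */
    public interface StatsMetricsMBean {
        long getGuidePostsCollected();
        long getMajorCompactionRuns();
        long getUpdateStatsRuns();
        long getCommitTimeMs();
    }

    /** Thread-safe counters for a single physical table. */
    public static final class StatsMetrics implements StatsMetricsMBean {
        private final AtomicLong guidePosts = new AtomicLong();
        private final AtomicLong majorCompactionRuns = new AtomicLong();
        private final AtomicLong updateStatsRuns = new AtomicLong();
        private final AtomicLong commitTimeMs = new AtomicLong();

        /** Records one stats collection run and what triggered it. */
        public void recordRun(boolean majorCompaction, long guidePostCount, long commitMs) {
            guidePosts.addAndGet(guidePostCount);
            commitTimeMs.addAndGet(commitMs);
            if (majorCompaction) {
                majorCompactionRuns.incrementAndGet();
            } else {
                updateStatsRuns.incrementAndGet();
            }
        }

        @Override public long getGuidePostsCollected() { return guidePosts.get(); }
        @Override public long getMajorCompactionRuns() { return majorCompactionRuns.get(); }
        @Override public long getUpdateStatsRuns() { return updateStatsRuns.get(); }
        @Override public long getCommitTimeMs() { return commitTimeMs.get(); }
    }

    /** Registers the counters under a made-up JMX domain, keyed by table name. */
    static ObjectName register(String physicalTable, StatsMetrics metrics) throws JMException {
        ObjectName name = new ObjectName(
            "example.phoenix.stats:type=StatsCollection,table=" + ObjectName.quote(physicalTable));
        ManagementFactory.getPlatformMBeanServer().registerMBean(metrics, name);
        return name;
    }

    public static void main(String[] args) throws JMException {
        StatsMetrics metrics = new StatsMetrics();
        ObjectName name = register("MY_TABLE", metrics);

        metrics.recordRun(true, 5, 120L);   // run triggered by a major compaction
        metrics.recordRun(false, 3, 80L);   // run triggered by update stats

        // Read one attribute back through JMX, as a monitoring tool would.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        System.out.println(name + " GuidePostsCollected="
            + server.getAttribute(name, "GuidePostsCollected"));
    }
}
```

One MBean per physical table keeps the per-table metrics separate; per-region
counts would need either an extra key in the ObjectName or a different
aggregation, which this sketch leaves open.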




