[jira] [Updated] (PHOENIX-3674) Upsert or Delete queries failing to translate LogicalTableModify into MutableRel in Phoenix-Calcite

2017-02-14 Thread Rajeshbabu Chintaguntla (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajeshbabu Chintaguntla updated PHOENIX-3674:
-
Labels: calcite  (was: )

> Upsert or Delete queries failing to translate LogicalTableModify into 
> MutableRel in Phoenix-Calcite 
> 
>
> Key: PHOENIX-3674
> URL: https://issues.apache.org/jira/browse/PHOENIX-3674
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Rajeshbabu Chintaguntla
>Assignee: Rajeshbabu Chintaguntla
>  Labels: calcite
>
> Here are the tests failing 
> {noformat}
> testDeleteAllFromTableWithLocalIndexNoAutoCommitSalted(org.apache.phoenix.end2end.DeleteIT)
>   Time elapsed: 8.99 sec  <<< ERROR!
> java.sql.SQLException: Error while executing SQL "DELETE FROM T000139": 
> cannot translate 
> rel#7748:LogicalTableModify.NONE.[](input=rel#7735:PhoenixTableScan.SERVER.[](table=[phoenix,
>  T000139]),table=[phoenix, T000139],operation=DELETE,flattened=false) to 
> MutableRel
>   at 
> org.apache.phoenix.end2end.DeleteIT.testDeleteAllFromTableWithIndex(DeleteIT.java:332)
>   at 
> org.apache.phoenix.end2end.DeleteIT.testDeleteAllFromTableWithLocalIndexNoAutoCommitSalted(DeleteIT.java:283)
> Caused by: java.lang.RuntimeException: cannot translate 
> rel#7748:LogicalTableModify.NONE.[](input=rel#7735:PhoenixTableScan.SERVER.[](table=[phoenix,
>  T000139]),table=[phoenix, T000139],operation=DELETE,flattened=false) to 
> MutableRel
>   at 
> org.apache.phoenix.end2end.DeleteIT.testDeleteAllFromTableWithIndex(DeleteIT.java:332)
>   at 
> org.apache.phoenix.end2end.DeleteIT.testDeleteAllFromTableWithLocalIndexNoAutoCommitSalted(DeleteIT.java:283)
> {noformat}
> {noformat}
> java.sql.SQLException: Error while executing SQL "UPSERT INTO 
> atable(organization_id,entity_id,a_integer) SELECT organization_id, 
> entity_id, CAST(null AS integer) FROM atable": cannot translate 
> rel#25411:LogicalTableModify.NONE.[](input=rel#25415:LogicalProject.NONE.[](input=rel#25389:PhoenixTableScan.SERVER.[](table=[phoenix,
>  
> ATABLE]),ORGANIZATION_ID=$0,ENTITY_ID=$1,A_STRING=null,B_STRING=null,A_INTEGER=null,A_DATE=null,A_TIME=null,A_TIMESTAMP=null,X_DECIMAL=null,X_LONG=null,X_INTEGER=null,Y_INTEGER=null,A_BYTE=null,A_SHORT=null,A_FLOAT=null,A_DOUBLE=null,A_UNSIGNED_FLOAT=null,A_UNSIGNED_DOUBLE=null),table=[phoenix,
>  ATABLE],operation=INSERT,flattened=false) to MutableRel
>   at 
> org.apache.phoenix.end2end.AggregateQueryIT.testSumOverNullIntegerColumn(AggregateQueryIT.java:80)
> Caused by: java.lang.RuntimeException: cannot translate 
> rel#25411:LogicalTableModify.NONE.[](input=rel#25415:LogicalProject.NONE.[](input=rel#25389:PhoenixTableScan.SERVER.[](table=[phoenix,
>  
> ATABLE]),ORGANIZATION_ID=$0,ENTITY_ID=$1,A_STRING=null,B_STRING=null,A_INTEGER=null,A_DATE=null,A_TIME=null,A_TIMESTAMP=null,X_DECIMAL=null,X_LONG=null,X_INTEGER=null,Y_INTEGER=null,A_BYTE=null,A_SHORT=null,A_FLOAT=null,A_DOUBLE=null,A_UNSIGNED_FLOAT=null,A_UNSIGNED_DOUBLE=null),table=[phoenix,
>  ATABLE],operation=INSERT,flattened=false) to MutableRel
>   at 
> org.apache.phoenix.end2end.AggregateQueryIT.testSumOverNullIntegerColumn(AggregateQueryIT.java:80)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3674) Upsert or Delete queries failing to translate LogicalTableModify into MutableRel in Phoenix-Calcite

2017-02-14 Thread Rajeshbabu Chintaguntla (JIRA)
Rajeshbabu Chintaguntla created PHOENIX-3674:


 Summary: Upsert or Delete queries failing to translate 
LogicalTableModify into MutableRel in Phoenix-Calcite 
 Key: PHOENIX-3674
 URL: https://issues.apache.org/jira/browse/PHOENIX-3674
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Rajeshbabu Chintaguntla
Assignee: Rajeshbabu Chintaguntla


Here are the tests failing 
{noformat}
testDeleteAllFromTableWithLocalIndexNoAutoCommitSalted(org.apache.phoenix.end2end.DeleteIT)
  Time elapsed: 8.99 sec  <<< ERROR!
java.sql.SQLException: Error while executing SQL "DELETE FROM T000139": cannot 
translate 
rel#7748:LogicalTableModify.NONE.[](input=rel#7735:PhoenixTableScan.SERVER.[](table=[phoenix,
 T000139]),table=[phoenix, T000139],operation=DELETE,flattened=false) to 
MutableRel
at 
org.apache.phoenix.end2end.DeleteIT.testDeleteAllFromTableWithIndex(DeleteIT.java:332)
at 
org.apache.phoenix.end2end.DeleteIT.testDeleteAllFromTableWithLocalIndexNoAutoCommitSalted(DeleteIT.java:283)
Caused by: java.lang.RuntimeException: cannot translate 
rel#7748:LogicalTableModify.NONE.[](input=rel#7735:PhoenixTableScan.SERVER.[](table=[phoenix,
 T000139]),table=[phoenix, T000139],operation=DELETE,flattened=false) to 
MutableRel
at 
org.apache.phoenix.end2end.DeleteIT.testDeleteAllFromTableWithIndex(DeleteIT.java:332)
at 
org.apache.phoenix.end2end.DeleteIT.testDeleteAllFromTableWithLocalIndexNoAutoCommitSalted(DeleteIT.java:283)
{noformat}
{noformat}
java.sql.SQLException: Error while executing SQL "UPSERT INTO 
atable(organization_id,entity_id,a_integer) SELECT organization_id, entity_id, 
CAST(null AS integer) FROM atable": cannot translate 
rel#25411:LogicalTableModify.NONE.[](input=rel#25415:LogicalProject.NONE.[](input=rel#25389:PhoenixTableScan.SERVER.[](table=[phoenix,
 
ATABLE]),ORGANIZATION_ID=$0,ENTITY_ID=$1,A_STRING=null,B_STRING=null,A_INTEGER=null,A_DATE=null,A_TIME=null,A_TIMESTAMP=null,X_DECIMAL=null,X_LONG=null,X_INTEGER=null,Y_INTEGER=null,A_BYTE=null,A_SHORT=null,A_FLOAT=null,A_DOUBLE=null,A_UNSIGNED_FLOAT=null,A_UNSIGNED_DOUBLE=null),table=[phoenix,
 ATABLE],operation=INSERT,flattened=false) to MutableRel
at 
org.apache.phoenix.end2end.AggregateQueryIT.testSumOverNullIntegerColumn(AggregateQueryIT.java:80)
Caused by: java.lang.RuntimeException: cannot translate 
rel#25411:LogicalTableModify.NONE.[](input=rel#25415:LogicalProject.NONE.[](input=rel#25389:PhoenixTableScan.SERVER.[](table=[phoenix,
 
ATABLE]),ORGANIZATION_ID=$0,ENTITY_ID=$1,A_STRING=null,B_STRING=null,A_INTEGER=null,A_DATE=null,A_TIME=null,A_TIMESTAMP=null,X_DECIMAL=null,X_LONG=null,X_INTEGER=null,Y_INTEGER=null,A_BYTE=null,A_SHORT=null,A_FLOAT=null,A_DOUBLE=null,A_UNSIGNED_FLOAT=null,A_UNSIGNED_DOUBLE=null),table=[phoenix,
 ATABLE],operation=INSERT,flattened=false) to MutableRel
at 
org.apache.phoenix.end2end.AggregateQueryIT.testSumOverNullIntegerColumn(AggregateQueryIT.java:80)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-2388) Support pooling Phoenix connections

2017-02-14 Thread William Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867327#comment-15867327
 ] 

William Yang commented on PHOENIX-2388:
---

Anyone have time to have a look at this? What else shall we implement?

> Support pooling Phoenix connections
> ---
>
> Key: PHOENIX-2388
> URL: https://issues.apache.org/jira/browse/PHOENIX-2388
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
> Attachments: PHOENIX-2388.patch
>
>
> Frequently users are plugging Phoenix into an ecosystem that pools 
> connections. It would be possible to implement a pooling mechanism for 
> Phoenix by creating a delegate Connection that instantiates a new Phoenix 
> connection when retrieved from the pool and then closes the connection when 
> returning it to the pool.
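
A minimal sketch of that delegate approach (class and method names below are hypothetical, not an existing Phoenix API): the pool recycles only the lightweight wrapper, while the underlying Phoenix connection is opened on checkout and closed on check-in, since Phoenix connections are cheap to create.

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/**
 * The wrapper would implement java.sql.Connection and forward every call to
 * the delegate; the delegation methods are omitted for brevity.
 */
public class PooledPhoenixConnectionSketch /* implements Connection */ {
    private final String url;
    private Connection delegate;

    public PooledPhoenixConnectionSketch(String url) {
        this.url = url;
    }

    /** Called when the pool hands the wrapper out: open a fresh Phoenix connection. */
    public void activate() throws SQLException {
        delegate = DriverManager.getConnection(url);
    }

    /** Called by application code: close the delegate and return the wrapper to the pool. */
    public void close() throws SQLException {
        if (delegate != null) {
            delegate.close();            // Phoenix connections are lightweight, so this is inexpensive
            delegate = null;
        }
        // pool.returnObject(this);      // hand the wrapper back to whatever pool manages it
    }

    public Connection unwrap() {
        return delegate;
    }
}
{code}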



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3658) Remove org.json:json dependency from flume module

2017-02-14 Thread Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867296#comment-15867296
 ] 

Kalyan commented on PHOENIX-3658:
-

Hi [~jmahonin],

we can go for com.tdunning:json; no issues from my side.

Thanks

> Remove org.json:json dependency from flume module
> -
>
> Key: PHOENIX-3658
> URL: https://issues.apache.org/jira/browse/PHOENIX-3658
> Project: Phoenix
>  Issue Type: Task
>Reporter: Josh Elser
>Assignee: Josh Mahonin
>Priority: Blocker
> Attachments: PHOENIX-3658.patch
>
>
> The phoenix-flume module depends on org.json:json which is now category-x.
> We have a grace period until 2017/04/30 to resolve this one.
> Need to replace it with something else.
> https://www.apache.org/legal/resolved#json
> https://lists.apache.org/thread.html/bb18f942ce7eb83c11438303c818b885810fb76385979490366720d5@%3Clegal-discuss.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3135) Support loading csv data using apache phoenix flume plugin

2017-02-14 Thread Kalyan (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867272#comment-15867272
 ] 

Kalyan commented on PHOENIX-3135:
-

Hi [~jmahonin], [~jamestaylor], we can change this. Sorry about my mistake.

Thanks

> Support loading csv data using apache phoenix flume plugin
> --
>
> Key: PHOENIX-3135
> URL: https://issues.apache.org/jira/browse/PHOENIX-3135
> Project: Phoenix
>  Issue Type: New Feature
> Environment: cloudera 5.4
>Reporter: Kalyan
>Assignee: Josh Mahonin
>Priority: Minor
> Fix For: 4.10.0
>
> Attachments: phoenix_csv.patch
>
>
> To work with the sample data sets below, we need support for loading CSV data 
> using the Apache Phoenix Flume plugin.
> // sample data set 1
> schema: col1 varchar , col2 double, col3 varchar, col4 integer
> input: kalyan,10.5,abc,1
> input: "kalyan",10.5,"abc",1
> // sample data set 2
> schema: col1 varchar , col2 double, col3 varchar[], col4 integer[]
> input: kalyan,10.5,"abc,pqr,xyz","1,2,3,4"
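
For the array columns in sample data set 2, the plugin could hand the parsed fields to Phoenix through plain JDBC arrays. A small sketch, assuming a quote-aware CSV parse has already split the line; the table name and JDBC URL are hypothetical:

{code}
import java.sql.Array;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class CsvArrayUpsertSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost:2181")) {
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS FLUME_CSV (COL1 VARCHAR PRIMARY KEY, "
                + "COL2 DOUBLE, COL3 VARCHAR[], COL4 INTEGER[])");
            // Values as they would come out of a quote-aware parse of:
            //   kalyan,10.5,"abc,pqr,xyz","1,2,3,4"
            String col1 = "kalyan";
            double col2 = 10.5;
            Array col3 = conn.createArrayOf("VARCHAR", new String[] {"abc", "pqr", "xyz"});
            Array col4 = conn.createArrayOf("INTEGER", new Integer[] {1, 2, 3, 4});
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO FLUME_CSV VALUES (?, ?, ?, ?)")) {
                ps.setString(1, col1);
                ps.setDouble(2, col2);
                ps.setArray(3, col3);
                ps.setArray(4, col4);
                ps.executeUpdate();
            }
            conn.commit();
        }
    }
}
{code}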



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3672) Change tests extending BaseQueryIT to use unique table names

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867226#comment-15867226
 ] 

James Taylor commented on PHOENIX-3672:
---

+1. Thanks for converting these!

> Change tests extending BaseQueryIT to use unique table names
> 
>
> Key: PHOENIX-3672
> URL: https://issues.apache.org/jira/browse/PHOENIX-3672
> Project: Phoenix
>  Issue Type: Task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3672.patch
>
>
> This is important for making sure we have good coverage for column encoding 
> and any new features we will add.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3571) Potential divide by zero exception in LongDivideExpression

2017-02-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated PHOENIX-3571:

Description: 
Running SaltedIndexIT, I saw the following:
{code}
===> 
testExpressionThrowsException(org.apache.phoenix.end2end.index.IndexExpressionIT)
 starts
2017-01-05 19:42:48,992 INFO  [main] client.HBaseAdmin: Created I
2017-01-05 19:42:48,996 INFO  [main] schema.MetaDataClient: Created index I at 
1483645369000
2017-01-05 19:42:49,066 WARN  [hconnection-0x5a45c218-shared--pool52-t6] 
client.AsyncProcess: #38, table=T, attempt=1/35 failed=1ops, last exception: 
org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: 
org.apache.phoenix.hbase.index.builder.IndexBuildingFailureException: Failed to 
build index for unexpected reason!
  at 
org.apache.phoenix.hbase.index.util.IndexManagementUtil.rethrowIndexingException(IndexManagementUtil.java:183)
  at org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:204)
  at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$35.call(RegionCoprocessorHost.java:974)
  at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1660)
  at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1734)
  at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1692)
  at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:970)
  at 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3218)
  at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2984)
  at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2926)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:718)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:680)
  at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2065)
  at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32393)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2141)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:238)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:218)
Caused by: java.lang.ArithmeticException: / by zero
  at 
org.apache.phoenix.expression.LongDivideExpression.evaluate(LongDivideExpression.java:50)
  at 
org.apache.phoenix.index.IndexMaintainer.buildRowKey(IndexMaintainer.java:521)
  at 
org.apache.phoenix.index.IndexMaintainer.buildUpdateMutation(IndexMaintainer.java:859)
  at 
org.apache.phoenix.index.PhoenixIndexCodec.getIndexUpserts(PhoenixIndexCodec.java:76)
  at 
org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addCurrentStateMutationsForBatch(NonTxIndexBuilder.java:288)
  at 
org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addUpdateForGivenTimestamp(NonTxIndexBuilder.java:256)
  at 
org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.addMutationsForBatch(NonTxIndexBuilder.java:222)
  at 
org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.batchMutationAndAddUpdates(NonTxIndexBuilder.java:109)
  at 
org.apache.phoenix.hbase.index.covered.NonTxIndexBuilder.getIndexUpdate(NonTxIndexBuilder.java:71)
  at 
org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:136)
  at 
org.apache.phoenix.hbase.index.builder.IndexBuildManager$1.call(IndexBuildManager.java:132)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at 
com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:253)
  at 
com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:56)
  at 
org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submit(BaseTaskRunner.java:58)
  at 
org.apache.phoenix.hbase.index.parallel.BaseTaskRunner.submitUninterruptible(BaseTaskRunner.java:99)
  at 
org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:143)
  at 
org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:273)
  at org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:201)
  ... 16 more
{code}
Better handling of divide by zero should be provided.
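
A purely illustrative guard of the kind that could be added (this is not the actual LongDivideExpression code, and the message and SQLSTATE are assumptions): check the divisor before dividing so a proper SQLException is surfaced instead of letting an unchecked ArithmeticException escape from the index-building path.

{code}
import java.sql.SQLException;

final class DivideGuardSketch {
    static long divide(long dividend, long divisor) throws SQLException {
        if (divisor == 0) {
            // SQLSTATE 22012 is the standard division-by-zero state
            throw new SQLException("Divide by zero", "22012");
        }
        return dividend / divisor;
    }
}
{code}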

  was:
Running SaltedIndexIT, I saw the following:
{code}
===> 
testExpressionThrowsException(org.apache.phoenix.end2end.index.IndexExpressionIT)
 starts
2017-01-05 19:42:48,992 INFO  [main] client.HBaseAdmin: Created I
2017-01-05 19:42:48,996 INFO  [main] schema.MetaDataClient: Created index I at 
1483645369000
2017-01-05 19:42:49,066 WARN  [hconnection-0x5a45c218-shared--pool52-t6] 

[GitHub] phoenix issue #232: Pull request for merging the encode columns branch to 4....

2017-02-14 Thread samarthjain
Github user samarthjain commented on the issue:

https://github.com/apache/phoenix/pull/232
  
@jtaylor-sfdc @twdsilva please review


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] phoenix pull request #232: Pull request for merging the encode columns branc...

2017-02-14 Thread samarthjain
GitHub user samarthjain opened a pull request:

https://github.com/apache/phoenix/pull/232

Pull request for merging the encode columns branch to 4.x-HBase-0.98



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/phoenix encodecolumns2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/232.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #232


commit b49fc0d1d684b7864ddfafa665ceadfcd53e424f
Author: Samarth 
Date:   2017-02-14T23:40:50Z

PHOENIX-1598 Column encoding to save space and improve performance

commit 4044378fabc836f48d1dc0ce045c0684272dcffc
Author: Samarth 
Date:   2017-02-15T02:43:10Z

Fix test failures in partial index rebuild tool




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (PHOENIX-476) Support declaration of DEFAULT in CREATE statement

2017-02-14 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867069#comment-15867069
 ] 

Thomas D'Silva commented on PHOENIX-476:


[~kliew]

Can you use a sequence (NEXT VALUE FOR) as a default expression? I get a 
parser error when I try to use a sequence. 

{code}
"CREATE TABLE IF NOT EXISTS " + sharedTable1 + " (" +
"pk1 INTEGER NOT NULL, " +
"pk2 INTEGER NOT NULL DEFAULT NEXT VALUE FOR my_seq, " +
"CONSTRAINT NAME_PK PRIMARY KEY (pk1, pk2))"
{code}

> Support declaration of DEFAULT in CREATE statement
> --
>
> Key: PHOENIX-476
> URL: https://issues.apache.org/jira/browse/PHOENIX-476
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 3.0-Release
>Reporter: James Taylor
>Assignee: Kevin Liew
> Fix For: 4.9.0
>
> Attachments: PHOENIX-476.10.patch, PHOENIX-476.11.patch, 
> PHOENIX-476.12.patch, PHOENIX-476.2.patch, PHOENIX-476.3.patch, 
> PHOENIX-476.4.patch, PHOENIX-476.5.patch, PHOENIX-476.6.patch, 
> PHOENIX-476.7.patch, PHOENIX-476.8.patch, PHOENIX-476.9.patch, 
> PHOENIX-476.patch
>
>
> Support the declaration of a default value in the CREATE TABLE/VIEW statement 
> like this:
> CREATE TABLE Persons (
> Pid int NOT NULL PRIMARY KEY,
> LastName varchar(255) NOT NULL,
> FirstName varchar(255),
> Address varchar(255),
> City varchar(255) DEFAULT 'Sandnes'
> )
> To implement this, we'd need to:
> 1. add a new DEFAULT_VALUE key value column in SYSTEM.TABLE and pass through 
> the value when the table is created (in MetaDataClient).
> 2. always set NULLABLE to ResultSetMetaData.columnNoNulls if a default value 
> is present, since the column will never be null.
> 3. add a getDefaultValue() accessor in PColumn
> 4.  for a row key column, during UPSERT use the default value if no value was 
> specified for that column. This could be done in the PTableImpl.newKey method.
> 5.  for a key value column with a default value, we can avoid any storage cost. 
> This takes a little more effort than persisting the default value on an UPSERT 
> for key value columns, but it has the benefit of not storing the default value 
> at all.
> * serialize any default value into KeyValueColumnExpression
> * in the evaluate method of KeyValueColumnExpression, conditionally use 
> the default value if the column value is not present (see the sketch after 
> this list). If doing partial evaluation, you should not yet return the default 
> value, as we may not have encountered the KeyValue for the column yet (since a 
> filter evaluates each time it sees each KeyValue, and there may be more than 
> one KeyValue referenced in the expression). Partial evaluation is determined by 
> calling Tuple.isImmutable(): false means partial evaluation is still in 
> progress, while true means the row is complete.
> * modify EvaluateOnCompletionVisitor by adding a visitor method for 
> RowKeyColumnExpression and KeyValueColumnExpression to set 
> evaluateOnCompletion to true if they have a default value specified. This 
> will cause filter evaluation to execute one final time after all KeyValues 
> for a row have been seen, since it's at that point that we know whether to use 
> the default value.
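
A minimal sketch of the evaluate logic from step 5, assuming Tuple.isImmutable() returns true once the full row has been seen; the helper name and the defaultValueBytes parameter are illustrative, not existing Phoenix API:

{code}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.phoenix.expression.Expression;
import org.apache.phoenix.schema.tuple.Tuple;

final class DefaultValueEvaluationSketch {
    /**
     * Evaluate the wrapped column expression and fall back to a serialized
     * default only once the row is known to be complete.
     */
    static boolean evaluateWithDefault(Expression column, byte[] defaultValueBytes,
                                       Tuple tuple, ImmutableBytesWritable ptr) {
        if (column.evaluate(tuple, ptr)) {
            return true;                          // the column's KeyValue was present
        }
        if (tuple == null || !tuple.isImmutable()) {
            return false;                         // partial evaluation: more KeyValues may still arrive
        }
        ptr.set(defaultValueBytes);               // row complete and column absent: substitute the default
        return true;
    }
}
{code}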



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3603) Fix compilation errors against hbase 1.3.0 release

2017-02-14 Thread Rajeshbabu Chintaguntla (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867025#comment-15867025
 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-3603:
--

bq. Would you be ok doing that? I'd recommend waiting just a few days until 
Samarth Jain puts together and hopefully commits his column encoding pull 
request.
Sure [~jamestaylor]. Will do it.

> Fix compilation errors against hbase 1.3.0 release
> --
>
> Key: PHOENIX-3603
> URL: https://issues.apache.org/jira/browse/PHOENIX-3603
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Zach York
> Fix For: 4.10.0
>
>
> hbase 1.3.0 has been released.
> I saw the following when compiling master branch against 1.3.0
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) 
> on project phoenix-core: Compilation failure: Compilation failure:
> [ERROR] 
> /Users/tyu/phoenix/phoenix-core/src/main/java/org/apache/phoenix/execute/DelegateHTable.java:[49,8]
>  org.apache.phoenix.execute.DelegateHTable is not abstract and does not 
> override abstract method getRpcTimeout() in 
> org.apache.hadoop.hbase.client.Table
> [ERROR] 
> /Users/tyu/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/ipc/PhoenixRpcScheduler.java:[32,8]
>  org.apache.hadoop.hbase.ipc.PhoenixRpcScheduler is not abstract and does not 
> override abstract method getNumLifoModeSwitches() in 
> org.apache.hadoop.hbase.ipc.RpcScheduler
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3672) Change tests extending BaseQueryIT to use unique table names

2017-02-14 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866929#comment-15866929
 ] 

Thomas D'Silva commented on PHOENIX-3672:
-

+1

> Change tests extending BaseQueryIT to use unique table names
> 
>
> Key: PHOENIX-3672
> URL: https://issues.apache.org/jira/browse/PHOENIX-3672
> Project: Phoenix
>  Issue Type: Task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3672.patch
>
>
> This is important for making sure we have good coverage for column encoding 
> and any new features we will add.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3672) Change tests extending BaseQueryIT to use unique table names

2017-02-14 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866927#comment-15866927
 ] 

Samarth Jain commented on PHOENIX-3672:
---

I am going to go ahead and push this patch to the encodecolumns2 branch for now. 
Will amend or revert if needed.

> Change tests extending BaseQueryIT to use unique table names
> 
>
> Key: PHOENIX-3672
> URL: https://issues.apache.org/jira/browse/PHOENIX-3672
> Project: Phoenix
>  Issue Type: Task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3672.patch
>
>
> This is important for making sure we have good coverage for column encoding 
> and any new features we will add.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3672) Change tests extending BaseQueryIT to use unique table names

2017-02-14 Thread Samarth Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samarth Jain updated PHOENIX-3672:
--
Attachment: PHOENIX-3672.patch

[~jamestaylor], [~tdsilva] - please review. The key change is in BaseQueryIT. 
What I have done is that for every parameter combination, we generate a unique 
table and index. The alternative was to generate the indices and tables in the 
constructor, but that slows down the tests considerably, since we would then be 
creating tables and indices for every test method for every parameter. I have 
provided isolation between methods running for a parameter via the usual cleanup 
method in the BaseClientManagedIT class. I have overridden the @AfterClass method, 
though, since we don't need to drop the tables as they are unique (unless some 
other test class ends up using hardcoded table names that collide with our 
generated table names, which is unlikely).
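
Roughly, the idea looks like the sketch below (names are hypothetical and do not match BaseQueryIT): create the table and index once per parameter combination and reuse them across all test methods for that combination.

{code}
import java.sql.Connection;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class UniqueTablePerParamsSketch {
    private static final AtomicInteger COUNTER = new AtomicInteger();
    private static final Map<String, String> TABLE_BY_PARAMS = new HashMap<>();

    /** Create (at most once per parameter combination) a uniquely named table plus index. */
    static synchronized String ensureTable(Connection conn, String paramsKey) throws SQLException {
        String tableName = TABLE_BY_PARAMS.get(paramsKey);
        if (tableName == null) {
            tableName = "T_UNIQ_" + COUNTER.incrementAndGet();
            conn.createStatement().execute(
                "CREATE TABLE " + tableName + " (K VARCHAR PRIMARY KEY, V VARCHAR)");
            conn.createStatement().execute(
                "CREATE INDEX IDX_" + tableName + " ON " + tableName + " (V)");
            TABLE_BY_PARAMS.put(paramsKey, tableName);
        }
        return tableName;
    }
}
{code}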

> Change tests extending BaseQueryIT to use unique table names
> 
>
> Key: PHOENIX-3672
> URL: https://issues.apache.org/jira/browse/PHOENIX-3672
> Project: Phoenix
>  Issue Type: Task
>Reporter: Samarth Jain
>Assignee: Samarth Jain
> Attachments: PHOENIX-3672.patch
>
>
> This is important for making sure we have good coverage for column encoding 
> and any new features we will add.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2017-02-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866818#comment-15866818
 ] 

Hudson commented on PHOENIX-3453:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1553 (See 
[https://builds.apache.org/job/Phoenix-master/1553/])
PHOENIX-3453 Secondary index and query using distinct: Outer query 
(jamestaylor: rev 799d217f6cab6fd57cd3b1c87553b607024de4f0)
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/CoerceExpression.java
* (edit) phoenix-core/src/it/java/org/apache/phoenix/end2end/GroupByCaseIT.java
* (edit) 
phoenix-core/src/test/java/org/apache/phoenix/compile/QueryCompilerTest.java


> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3453_v1.patch
>
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3360) Secondary index configuration is wrong

2017-02-14 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866815#comment-15866815
 ] 

Enis Soztutar commented on PHOENIX-3360:


bq. we can use the v1 patch with a little modification that we just set the 
conf returned by CoprocessorEnvironment#getConfiguration()
Indeed. The patch should have read env.getConfiguration() as opposed to 
env.getRegionServerServices().getConfiguration().
+1 for v4. [~rajeshbabu], do you mind committing this?
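
For illustration, a small sketch of the point about the configuration source (class and property names are hypothetical): inside a coprocessor, read settings from the environment's configuration, which can be table-scoped, rather than from the region server's global configuration.

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;

public class ConfAwareObserverSketch extends BaseRegionObserver {
    private boolean someFeatureEnabled;

    @Override
    public void start(CoprocessorEnvironment e) throws IOException {
        // Use the configuration handed to the coprocessor, not the region
        // server's global configuration.
        Configuration conf = e.getConfiguration();
        someFeatureEnabled = conf.getBoolean("phoenix.some.feature.enabled", false); // property name is hypothetical
    }
}
{code}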

> Secondary index configuration is wrong
> --
>
> Key: PHOENIX-3360
> URL: https://issues.apache.org/jira/browse/PHOENIX-3360
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: William Yang
>Priority: Critical
> Fix For: 4.10.0
>
> Attachments: ConfCP.java, PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, 
> PHOENIX-3360-v3.PATCH, PHOENIX-3360-v4.PATCH
>
>
> IndexRpcScheduler allocates some handler threads and uses a higher priority 
> for RPCs. The corresponding IndexRpcController is not used by default as it 
> is, but used through ServerRpcControllerFactory that we configure from Ambari 
> by default which sets the priority of the outgoing RPCs to either metadata 
> priority, or the index priority.
> However, after reading the code of IndexRpcController / ServerRpcController it 
> seems that the IndexRpcController DOES NOT look at whether the outgoing RPC 
> is for an index table or not. It just sets ALL rpc priorities to be the index 
> priority. The intention seems to be that ONLY on servers we 
> configure ServerRpcControllerFactory, and with clients we NEVER configure 
> ServerRpcControllerFactory, but instead use ClientRpcControllerFactory. We 
> configure ServerRpcControllerFactory from Ambari, which in effect makes it so 
> that ALL rpcs from Phoenix are handled only by the index handlers by default. 
> It means all the deadlock cases are still there. 
> The documentation in https://phoenix.apache.org/secondary_indexing.html is 
> also wrong in this sense. It does not talk about server side / client side. 
> Plus this way of configuring different values is not how HBase configuration 
> is deployed. We cannot have the configuration show the 
> ServerRpcControllerFactory even only for server nodes, because the clients 
> running on those nodes will also see the wrong values. 
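
For illustration, a sketch of the intended split, assuming the standard hbase.rpc.controllerfactory.class property and Phoenix's controller factory classes; in practice these settings belong in hbase-site.xml on the respective nodes, shown here as code only for brevity:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class RpcControllerFactoryConfigSketch {
    public static void main(String[] args) {
        // Region servers only: route index/metadata RPCs to the dedicated handlers.
        Configuration serverConf = HBaseConfiguration.create();
        serverConf.set("hbase.rpc.controllerfactory.class",
                "org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory");

        // Clients: keep the client factory so normal RPCs are not funneled
        // into the index handlers.
        Configuration clientConf = HBaseConfiguration.create();
        clientConf.set("hbase.rpc.controllerfactory.class",
                "org.apache.hadoop.hbase.ipc.controller.ClientRpcControllerFactory");
    }
}
{code}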



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-2993) Tephra: Prune invalid transaction set once all data for a given invalid transaction has been dropped

2017-02-14 Thread Poorna Chandra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866725#comment-15866725
 ] 

Poorna Chandra commented on PHOENIX-2993:
-

[~jamestaylor] Tephra 0.11.0-incubating by default has pruning of the invalid 
list disabled, since this feature has not been tested extensively yet. Once we 
do some more testing, we can enable it by default from 0.12.0-incubating 
release onwards.

It would be great to have Phoenix start using it now, and give us early 
feedback. To enable pruning in 0.11.0, just set the configuration 
{{data.tx.prune.enable}} to {{true}}. Once you restart transaction service and 
HBase region servers after the configuration change, the invalid list will get 
pruned automatically based on major compactions.

Also note that Tephra will create an HBase table called {{tephra.state}} in the 
default namespace when pruning is enabled. The name of this table can be 
controlled by using the configuration parameter {{data.tx.prune.state.table}}.
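
A minimal sketch of those settings, using the property names from the comment above (the state-table value shown is just the documented default):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class TephraPruneConfigSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        conf.setBoolean("data.tx.prune.enable", true);          // off by default in 0.11.0-incubating
        conf.set("data.tx.prune.state.table", "tephra.state");  // optional: where prune state is kept
        // Restart the transaction service and the HBase region servers after changing these.
    }
}
{code}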

> Tephra: Prune invalid transaction set once all data for a given invalid 
> transaction has been dropped
> 
>
> Key: PHOENIX-2993
> URL: https://issues.apache.org/jira/browse/PHOENIX-2993
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Poorna Chandra
>Assignee: Poorna Chandra
> Attachments: ApacheTephraAutomaticInvalidListPruning.pdf
>
>
> From TEPHRA-35 -
> In addition to dropping the data from invalid transactions we need to be able 
> to prune the invalid set of any transactions where data cleanup has been 
> completely performed. Without this, the invalid set will grow indefinitely 
> and become a greater and greater cost to in-progress transactions over time.
> To do this correctly, the TransactionDataJanitor coprocessor will need to 
> maintain some bookkeeping for the transaction data that it removes, so that 
> the transaction manager can reason about when all of a given transaction's 
> data has been removed. Only at this point can the transaction manager safely 
> drop the transaction ID from the invalid set.
> One approach would be for the TransactionDataJanitor to update a table 
> marking when a major compaction was performed on a region and what 
> transaction IDs were filtered out. Once all regions in a table containing the 
> transaction data have been compacted, we can remove the filtered out 
> transaction IDs from the invalid set. However, this will need to cope with 
> changing region names due to splits, etc.
> Note: This will be moved to Tephra JIRA once the setup of Tephra JIRA is 
> complete (INFRA-11445)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3662) PhoenixStorageHandler throws ClassCastException.

2017-02-14 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866686#comment-15866686
 ] 

Sergey Soldatov commented on PHOENIX-3662:
--

[~jamestaylor] sure. 
[~Jeongdae Kim] thank you for the patch. I will try to collect all the patches 
with the HivePhoenix label and commit them after testing (most of them are already 
tested). 

> PhoenixStorageHandler throws ClassCastException.
> 
>
> Key: PHOENIX-3662
> URL: https://issues.apache.org/jira/browse/PHOENIX-3662
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3662.1.patch, PHOENIX-3662.2.patch
>
>
> When executing a query that has BETWEEN clauses wrapped in a function, the Phoenix 
> storage handler throws a ClassCastException like the one below.
> In addition, I found some bugs in the handling of push-down predicates.
> {code}
> 2017-02-06T16:35:26,019 ERROR [7d29d400-2ec5-4ab8-84c2-041b55c3e24b 
> HiveServer2-Handler-Pool: Thread-57]: ql.Driver 
> (SessionState.java:printError(1097)) - FAILED: ClassCastException 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to 
> org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
> java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to 
> org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.processingBetweenOperator(IndexPredicateAnalyzer.java:229)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzeExpr(IndexPredicateAnalyzer.java:369)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.access$000(IndexPredicateAnalyzer.java:72)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer$1.process(IndexPredicateAnalyzer.java:165)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzePredicate(IndexPredicateAnalyzer.java:176)
>   at 
> org.apache.phoenix.hive.ppd.PhoenixPredicateDecomposer.decomposePredicate(PhoenixPredicateDecomposer.java:63)
>   at 
> org.apache.phoenix.hive.PhoenixStorageHandler.decomposePredicate(PhoenixStorageHandler.java:238)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.pushFilterToStorageHandler(OpProcFactory.java:1004)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:910)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:880)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory$TableScanPPD.process(OpProcFactory.java:429)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.hadoop.hive.ql.ppd.SimplePredicatePushDown.transform(SimplePredicatePushDown.java:102)
>   at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:242)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10921)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1242)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1229)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
>   at 
> 

[jira] [Commented] (PHOENIX-3639) WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG

2017-02-14 Thread Geoffrey Jacoby (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866677#comment-15866677
 ] 

Geoffrey Jacoby commented on PHOENIX-3639:
--

Agreed. If someone comes along later with a need for a 0.98 backport, I'd 
encourage them to open a new JIRA and assign it to me, and I'd be happy to make 
one. In the meantime, though, I think 1.x only is fine. 

> WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG
> ---
>
> Key: PHOENIX-3639
> URL: https://issues.apache.org/jira/browse/PHOENIX-3639
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>  Labels: replication
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3639.patch, PHOENIX-3639.v2.patch
>
>
> As a stopgap before PHOENIX-3520, we can create an HBase WALEntryFilter to 
> filter out non-tenant rows of SYSTEM.CATALOG from replication while allowing 
> tenant-owned rows such as tenant views to proceed with replication. 
> This would have to be incorporated into a ReplicationEndpoint subclass to be 
> useful, though HBASE-17543 would make that much simpler by doing it via 
> configuration rather than code (and avoiding the need for a new peer to be 
> created).
> If PHOENIX-3520 is on the near-future roadmap, however, that would be the 
> better solution to the "replication corrupts SYSTEM.CATALOG" problem.  
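
A minimal sketch of the idea (names and details are illustrative, not the committed SystemCatalogWALEntryFilter), assuming SYSTEM.CATALOG row keys begin with the tenant ID followed by a zero-byte separator, so a leading zero byte marks a global (non-tenant) row that should not be replicated:

{code}
import java.util.List;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.replication.WALEntryFilter;
import org.apache.hadoop.hbase.wal.WAL;

public class TenantOnlyCatalogWALEntryFilterSketch implements WALEntryFilter {
    private static final TableName SYSTEM_CATALOG = TableName.valueOf("SYSTEM.CATALOG");

    @Override
    public WAL.Entry filter(WAL.Entry entry) {
        if (!SYSTEM_CATALOG.equals(entry.getKey().getTablename())) {
            return entry;                            // every other table replicates untouched
        }
        List<Cell> cells = entry.getEdit().getCells();
        cells.removeIf(cell -> {
            byte[] row = CellUtil.cloneRow(cell);
            return row.length == 0 || row[0] == 0;   // drop global-catalog rows (empty tenant ID)
        });
        return cells.isEmpty() ? null : entry;       // returning null drops the whole entry
    }
}
{code}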



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3662) PhoenixStorageHandler throws ClassCastException.

2017-02-14 Thread Sergey Soldatov (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Soldatov updated PHOENIX-3662:
-
Labels: HivePhoenix  (was: )

> PhoenixStorageHandler throws ClassCastException.
> 
>
> Key: PHOENIX-3662
> URL: https://issues.apache.org/jira/browse/PHOENIX-3662
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3662.1.patch, PHOENIX-3662.2.patch
>
>
> When executing a query that has BETWEEN clauses wrapped in a function, the Phoenix 
> storage handler throws a ClassCastException like the one below.
> In addition, I found some bugs in the handling of push-down predicates.
> {code}
> 2017-02-06T16:35:26,019 ERROR [7d29d400-2ec5-4ab8-84c2-041b55c3e24b 
> HiveServer2-Handler-Pool: Thread-57]: ql.Driver 
> (SessionState.java:printError(1097)) - FAILED: ClassCastException 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to 
> org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
> java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to 
> org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.processingBetweenOperator(IndexPredicateAnalyzer.java:229)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzeExpr(IndexPredicateAnalyzer.java:369)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.access$000(IndexPredicateAnalyzer.java:72)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer$1.process(IndexPredicateAnalyzer.java:165)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzePredicate(IndexPredicateAnalyzer.java:176)
>   at 
> org.apache.phoenix.hive.ppd.PhoenixPredicateDecomposer.decomposePredicate(PhoenixPredicateDecomposer.java:63)
>   at 
> org.apache.phoenix.hive.PhoenixStorageHandler.decomposePredicate(PhoenixStorageHandler.java:238)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.pushFilterToStorageHandler(OpProcFactory.java:1004)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:910)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:880)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory$TableScanPPD.process(OpProcFactory.java:429)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.hadoop.hive.ql.ppd.SimplePredicatePushDown.transform(SimplePredicatePushDown.java:102)
>   at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:242)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10921)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1242)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1229)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:499)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:486)
>   at 
> 

[jira] [Resolved] (PHOENIX-3644) Phoenix Query With Multiple 'OR' operators does a full range scan when it is a tenant-specific connection

2017-02-14 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva resolved PHOENIX-3644.
-
Resolution: Not A Problem

The query that returned a range scan was filtering on a pk column that was not 
part of the leading columns of the primary key. 

> Phoenix Query With Multiple 'OR' operators does a full range scan when it is 
> a tenant-specific connection 
> ---
>
> Key: PHOENIX-3644
> URL: https://issues.apache.org/jira/browse/PHOENIX-3644
> Project: Phoenix
>  Issue Type: Bug
>Reporter: saikiran perumala
>Assignee: Thomas D'Silva
>
> I was looking at the explain plans for IN / OR operators in WHERE statements, and I 
> got some conflicting results.
> Non-tenant query:
> Here the IN and AND operators are on the PK.
> Explain Select * from CUSTOM_ENTITY."CUSTOM_ENTITY_DATA_NO_ID" Where 
> Organization_id IN ('00Dxx001i28', '00Dxx001i29') AND Key_prefix = 
> 'z0D';
> Explain Select * from CUSTOM_ENTITY."CUSTOM_ENTITY_DATA_NO_ID" Where 
> (Organization_id = '00Dxx001i28' OR Organization_id  = '00Dxx001i29') 
> AND Key_prefix = 'z0D';
> Both give same result :
> CLIENT PARALLEL 32-WAY POINT LOOKUP ON 2 KEYS OVER 
> CUSTOM_ENTITY.CUSTOM_ENTITY_DATA_NO_ID
> Tenant-specific view:
> Here the IN and AND operators are on the PK.
> explain SELECT * FROM CUSTOM_ENTITY."z0D"  WHERE C00NXX01DIBOEAS 
> IN('ROW-THREAD_1-VAL-9','ROW-THREAD_1-VAL-8','ROW-THREAD_1-VAL-7')
> this is the query plan
> CLIENT PARALLEL 32-WAY POINT LOOKUP ON 3 KEYS OVER 
> CUSTOM_ENTITY.CUSTOM_ENTITY_DATA_NO_ID
> SERVER FILTER BY PageFilter 100
> SERVER 100 ROW LIMIT
> CLIENT 100 ROW LIMIT
> But when there is an OR, say for this query:
> explain SELECT * FROM CUSTOM_ENTITY."z0D"  WHERE 
> (C00NXX01DIBUEAS='ROW-THREAD_1-VAL-8' OR 
> C00NXX01DIBUEAS='ROW-THREAD_1-VAL-7')
> This is the query plan :
> CLIENT PARALLEL 32-WAY RANGE SCAN OVER CUSTOM_ENTITY.CUSTOM_ENTITY_DATA_NO_ID 
> ['00Dxx001i28','z0D']
> SERVER FILTER BY (C00NXX01DIBUEAS = 'ROW-THREAD_1-VAL-8' OR 
> C00NXX01DIBUEAS = 'ROW-THREAD_1-VAL-7')
> SERVER 100 ROW LIMIT
> CLIENT 100 ROW LIMIT
> In a tenant-specific view, IN and OR operators on a PK return different query 
> plans; the OR filter does a full range scan instead of a point lookup. 
> DDL :
> CREATE TABLE IF NOT EXISTS CUSTOM_ENTITY.CUSTOM_TABLE (
> ORGANIZATION_ID CHAR(15) NOT NULL, 
> KEY_PREFIX CHAR(3) NOT NULL, 
> CREATED_DATE DATE,
> CREATED_BY CHAR(15),
> SYSTEM_MODSTAMP DATE,
> CONSTRAINT PK PRIMARY KEY (
> ORGANIZATION_ID, 
> KEY_PREFIX 
> )
> ) VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=TRUE, REPLICATION_SCOPE=1
> DDL FOR VIEWS :
> CREATE VIEW IF NOT EXISTS CUSTOM_VIEW."z0I" (
>   C00NXX01DII4EAC VARCHAR(50) NOT NULL, 
>   C00NXX01DII3EAC CHAR(15), 
>   C00NXX01DII5EAC CHAR(15), 
>   C00NXX01DII6EAC DATE, 
>   C00NXX01DII7EAC DATE, 
>   C00NXX01DII8EAC DECIMAL, 
>   C00NXX01DII9EAC DECIMAL, 
>   C00NXX01DIIAEAS VARCHAR(100), 
>   C00NXX01DIIBEAS DECIMAL, 
>   C00NXX01DIICEAS DECIMAL, 
>   C00NXX01DIIDEAS DECIMAL, 
>   C00NXX01DIIEEAS VARCHAR(40), 
>   C00NXX01DIIFEAS VARCHAR(255), 
>   C00NXX01DIIGEAS VARCHAR(30), 
>   C00NXX01DIIHEAS VARCHAR(30), 
>   C00NXX01DIIIEAS VARCHAR(100), 
>   C00NXX01DIIJEAS VARCHAR(100), 
>   C00NXX01DIIKEAS VARCHAR(255), 
>   C00NXX01DIILEAS VARCHAR(255), 
>   C00NXX01DIIMEAS DECIMAL CONSTRAINT PK PRIMARY KEY 
> (C00NXX01DII4EAC DESC)) AS SELECT * FROM CUSTOM_VIEW.CUSTOM_TABLE WHERE 
> KEY_PREFIX = 'z0I'



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3639) WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG

2017-02-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866651#comment-15866651
 ] 

Hudson commented on PHOENIX-3639:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1552 (See 
[https://builds.apache.org/job/Phoenix-master/1552/])
PHOENIX-3639 WALEntryFilter to replicate only multi-tenant views from 
(jamestaylor: rev d1e80e3b161628af4d58443b87d42fa4af256486)
* (add) 
phoenix-core/src/main/java/org/apache/phoenix/replication/SystemCatalogWALEntryFilter.java
* (add) 
phoenix-core/src/it/java/org/apache/phoenix/replication/TestSystemCatalogWALEntryFilter.java


> WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG
> ---
>
> Key: PHOENIX-3639
> URL: https://issues.apache.org/jira/browse/PHOENIX-3639
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>  Labels: replication
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3639.patch, PHOENIX-3639.v2.patch
>
>
> As a stopgap before PHOENIX-3520, we can create an HBase WALEntryFilter to 
> filter out non-tenant rows of SYSTEM.CATALOG from replication while allowing 
> tenant-owned rows such as tenant views to proceed with replication. 
> This would have to be incorporated into a ReplicationEndpoint subclass to be 
> useful, though HBASE-17543 would make that much simpler by doing it via 
> configuration rather than code (and avoiding the need for a new peer to be 
> created).
> If PHOENIX-3520 is on the near-future roadmap, however, that would be the 
> better solution to the "replication corrupts SYSTEM.CATALOG" problem.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866652#comment-15866652
 ] 

Hudson commented on PHOENIX-3670:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1552 (See 
[https://builds.apache.org/job/Phoenix-master/1552/])
PHOENIX-3670 KeyRange.intersect(List<KeyRange>, List<KeyRange>) is 
(jamestaylor: rev 4b4205a681f3a87c6d418462fcf282abd9ad80b0)
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/KeyRange.java
* (add) 
phoenix-core/src/test/java/org/apache/phoenix/query/KeyRangeMoreTest.java


> KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
>Assignee: chenglei
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is the following join SQL (simplified); 
> fact_table is a fact table joining the dimension tables dim_table1 and 
> dim_table2: 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small 
> dataset the SQL executes quickly, but when the dataset is bigger the SQL 
> becomes very slow: when the row count of fact_table is 30 million, 
> dim_table1 is 300 thousand, and dim_table2 is 100 thousand, the above 
> query costs 17s.
> When I debugged the SQL execution, I found that RHS1 returns 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 returns 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
> method to intersect RHS1 with RHS2 for the join dynamic filter, narrowing 
> down what fact_table.cust_id should be. 
> Surprisingly, the KeyRange.intersect method costs 11s, although the whole SQL 
> execution only costs 17s. After I read the code of the KeyRange.intersect 
> method, I found the following two problems (a merge-based alternative is 
> sketched after this description):
> (1) The double loop in line 521 and line 522 is inefficient: when the keyRanges 
> size is M and the keyRanges2 size is N, the time complexity is O(M*N), which for 
> my example is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> (2) line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
> KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i = 1; i < tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> and it seems that there are no unit tests for this 
> KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. 
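
As referenced above, a merge-based pass could avoid the double loop. A sketch under two assumptions: both inputs are sorted by KeyRange.COMPARATOR, and every range is bounded (no KeyRange.UNBOUND endpoints). One pass over both lists gives O(M + N) intersections instead of O(M * N):

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.query.KeyRange;

public class KeyRangeMergeIntersectSketch {
    public static List<KeyRange> intersectSorted(List<KeyRange> a, List<KeyRange> b) {
        List<KeyRange> result = new ArrayList<KeyRange>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            KeyRange r = a.get(i).intersect(b.get(j));
            if (r != KeyRange.EMPTY_RANGE) {
                result.add(r);
            }
            // Advance whichever range ends first; with sorted, bounded ranges the
            // one with the smaller upper bound cannot intersect anything later.
            if (Bytes.compareTo(a.get(i).getUpperRange(), b.get(j).getUpperRange()) <= 0) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }
}
{code}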



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (PHOENIX-3360) Secondary index configuration is wrong

2017-02-14 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-3360:
-

Assignee: William Yang  (was: Rajeshbabu Chintaguntla)

> Secondary index configuration is wrong
> --
>
> Key: PHOENIX-3360
> URL: https://issues.apache.org/jira/browse/PHOENIX-3360
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: William Yang
>Priority: Critical
> Fix For: 4.10.0
>
> Attachments: ConfCP.java, PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, 
> PHOENIX-3360-v3.PATCH, PHOENIX-3360-v4.PATCH
>
>
> IndexRpcScheduler allocates some handler threads and uses a higher priority 
> for RPCs. The corresponding IndexRpcController is not used by default as it 
> is, but used through ServerRpcControllerFactory that we configure from Ambari 
> by default which sets the priority of the outgoing RPCs to either metadata 
> priority, or the index priority.
> However, after reading the code of IndexRpcController / ServerRpcController it 
> seems that the IndexRpcController DOES NOT look at whether the outgoing RPC 
> is for an index table or not. It just sets ALL rpc priorities to be the index 
> priority. The intention seems to be that ONLY on servers we 
> configure ServerRpcControllerFactory, and with clients we NEVER configure 
> ServerRpcControllerFactory, but instead use ClientRpcControllerFactory. We 
> configure ServerRpcControllerFactory from Ambari, which in effect makes it so 
> that ALL rpcs from Phoenix are handled only by the index handlers by default. 
> It means all the deadlock cases are still there. 
> The documentation in https://phoenix.apache.org/secondary_indexing.html is 
> also wrong in this sense. It does not talk about server side / client side. 
> Plus this way of configuring different values is not how HBase configuration 
> is deployed. We cannot have the configuration show the 
> ServerRpcControllerFactory even only for server nodes, because the clients 
> running on those nodes will also see the wrong values. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2017-02-14 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3453:
--
Fix Version/s: 4.10.0

> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3453_v1.patch
>
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3453) Secondary index and query using distinct: Outer query results in ERROR 201 (22000): Illegal data. CHAR types may only contain single byte characters

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866566#comment-15866566
 ] 

James Taylor commented on PHOENIX-3453:
---

+1 on the patch. Thanks for the excellent work, [~comnetwork]!

> Secondary index and query using distinct: Outer query results in ERROR 201 
> (22000): Illegal data. CHAR types may only contain single byte characters
> 
>
> Key: PHOENIX-3453
> URL: https://issues.apache.org/jira/browse/PHOENIX-3453
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Joel Palmert
>Assignee: chenglei
> Attachments: PHOENIX-3453_v1.patch
>
>
> Steps to repro:
> CREATE TABLE IF NOT EXISTS TEST.TEST (
> ENTITY_ID CHAR(15) NOT NULL,
> SCORE DOUBLE,
> CONSTRAINT TEST_PK PRIMARY KEY (
> ENTITY_ID
> )
> ) VERSIONS=1, MULTI_TENANT=FALSE, REPLICATION_SCOPE=1, TTL=31536000;
> CREATE INDEX IF NOT EXISTS TEST_SCORE ON TEST.TEST (SCORE DESC, ENTITY_ID 
> DESC);
> UPSERT INTO test.test VALUES ('entity1',1.1);
> SELECT DISTINCT entity_id, score
> FROM(
> SELECT entity_id, score
> FROM test.test
> LIMIT 25
> );
> Output (in SQuirreL)
> ���   1.1
> If you run it in SQuirreL it results in the entity_id column getting the 
> above error value. Notice that if you remove the secondary index or DISTINCT 
> you get the correct result.
> I've also run the query through the Phoenix java api. Then I get the 
> following exception:
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data. CHAR types 
> may only contain single byte characters ()
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:454)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
> at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:291)
> at org.apache.phoenix.schema.types.PChar.toObject(PChar.java:121)
> at org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:997)
> at 
> org.apache.phoenix.compile.ExpressionProjector.getValue(ExpressionProjector.java:75)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:608)
> at 
> org.apache.phoenix.jdbc.PhoenixResultSet.getString(PhoenixResultSet.java:621)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-2993) Tephra: Prune invalid transaction set once all data for a given invalid transaction has been dropped

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866553#comment-15866553
 ] 

James Taylor commented on PHOENIX-2993:
---

[~poornachandra] - with Tephra 0.11.0, is there anything required on Phoenix's 
part to take advantage of the pruning of the invalid list?

> Tephra: Prune invalid transaction set once all data for a given invalid 
> transaction has been dropped
> 
>
> Key: PHOENIX-2993
> URL: https://issues.apache.org/jira/browse/PHOENIX-2993
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Poorna Chandra
>Assignee: Poorna Chandra
> Attachments: ApacheTephraAutomaticInvalidListPruning.pdf
>
>
> From TEPHRA-35 -
> In addition to dropping the data from invalid transactions we need to be able 
> to prune the invalid set of any transactions where data cleanup has been 
> completely performed. Without this, the invalid set will grow indefinitely 
> and become a greater and greater cost to in-progress transactions over time.
> To do this correctly, the TransactionDataJanitor coprocessor will need to 
> maintain some bookkeeping for the transaction data that it removes, so that 
> the transaction manager can reason about when all of a given transaction's 
> data has been removed. Only at this point can the transaction manager safely 
> drop the transaction ID from the invalid set.
> One approach would be for the TransactionDataJanitor to update a table 
> marking when a major compaction was performed on a region and what 
> transaction IDs were filtered out. Once all regions in a table containing the 
> transaction data have been compacted, we can remove the filtered out 
> transaction IDs from the invalid set. However, this will need to cope with 
> changing region names due to splits, etc.
> Note: This will be moved to Tephra JIRA once the setup of Tephra JIRA is 
> complete (INFRA-11445)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3673) Upgrade to Tephra 0.11.0 when released

2017-02-14 Thread James Taylor (JIRA)
James Taylor created PHOENIX-3673:
-

 Summary: Upgrade to Tephra 0.11.0 when released
 Key: PHOENIX-3673
 URL: https://issues.apache.org/jira/browse/PHOENIX-3673
 Project: Phoenix
  Issue Type: Improvement
Reporter: James Taylor
 Fix For: 4.10.0






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3639) WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866528#comment-15866528
 ] 

James Taylor commented on PHOENIX-3639:
---

Perhaps best just to not support this in 0.98 release then? WDYT?

> WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG
> ---
>
> Key: PHOENIX-3639
> URL: https://issues.apache.org/jira/browse/PHOENIX-3639
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>  Labels: replication
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3639.patch, PHOENIX-3639.v2.patch
>
>
> As a stopgap before PHOENIX-3520, we can create an HBase WALEntryFilter to 
> filter out non-tenant rows of SYSTEM.CATALOG from replication while allowing 
> tenant-owned rows such as tenant views to proceed with replication. 
> This would have to be incorporated into a ReplicationEndpoint subclass to be 
> useful, though HBASE-17543 would make that much simpler by doing it via 
> configuration rather than code (and avoiding the need for a new peer to be 
> created).
> If PHOENIX-3520 is on the near-future roadmap, however, that would be the 
> better solution to the "replication corrupts SYSTEM.CATALOG" problem.  
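
For illustration, a rough sketch of the kind of WALEntryFilter being described (hypothetical class name and row-key assumption, not the attached patch): it drops SYSTEM.CATALOG cells whose row key has an empty leading tenant-ID segment and lets tenant-owned rows replicate.

{code:borderStyle=solid}
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.replication.WALEntryFilter;
import org.apache.hadoop.hbase.wal.WAL;

public class TenantViewOnlyCatalogFilter implements WALEntryFilter {
    private static final TableName CATALOG = TableName.valueOf("SYSTEM.CATALOG");

    @Override
    public WAL.Entry filter(WAL.Entry entry) {
        if (!CATALOG.equals(entry.getKey().getTablename())) {
            return entry; // only SYSTEM.CATALOG needs special handling
        }
        List<Cell> cells = entry.getEdit().getCells();
        Iterator<Cell> it = cells.iterator();
        while (it.hasNext()) {
            byte[] row = CellUtil.cloneRow(it.next());
            // Assumption: tenant-owned rows start with a non-empty tenant ID; a leading
            // \x00 separator (or an empty row key) means "no tenant", so don't replicate it.
            if (row.length == 0 || row[0] == 0) {
                it.remove();
            }
        }
        return cells.isEmpty() ? null : entry;
    }
}
{code}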



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3603) Fix compilation errors against hbase 1.3.0 release

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866521#comment-15866521
 ] 

James Taylor commented on PHOENIX-3603:
---

[~rajeshbabu] - on second thought, we really should create a 4.x-HBase-1.2 
branch from master and then commit this patch on master (so that master 
corresponds to the latest HBase version). Would you be ok doing that? I'd 
recommend waiting just a few days until [~samarthjain] puts together and 
hopefully commits his column encoding pull request.

> Fix compilation errors against hbase 1.3.0 release
> --
>
> Key: PHOENIX-3603
> URL: https://issues.apache.org/jira/browse/PHOENIX-3603
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Ted Yu
>Assignee: Zach York
> Fix For: 4.10.0
>
>
> hbase 1.3.0 has been released.
> I saw the following when compiling master branch against 1.3.0
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) 
> on project phoenix-core: Compilation failure: Compilation failure:
> [ERROR] 
> /Users/tyu/phoenix/phoenix-core/src/main/java/org/apache/phoenix/execute/DelegateHTable.java:[49,8]
>  org.apache.phoenix.execute.DelegateHTable is not abstract and does not 
> override abstract method getRpcTimeout() in 
> org.apache.hadoop.hbase.client.Table
> [ERROR] 
> /Users/tyu/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/ipc/PhoenixRpcScheduler.java:[32,8]
>  org.apache.hadoop.hbase.ipc.PhoenixRpcScheduler is not abstract and does not 
> override abstract method getNumLifoModeSwitches() in 
> org.apache.hadoop.hbase.ipc.RpcScheduler
> {code}
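
The straightforward resolution (a sketch under assumptions, not necessarily the committed patch) is to add delegating overrides for the methods HBase 1.3.0 introduced; getRpcTimeout() is shown below, and PhoenixRpcScheduler would similarly delegate getNumLifoModeSwitches(). The wrapper shape and the "delegate" field name are assumptions:

{code:borderStyle=solid}
import org.apache.hadoop.hbase.client.Table;

// Sketch only: abstract so the remaining Table methods can stay unimplemented here.
public abstract class DelegatingTableSketch implements Table {
    protected final Table delegate;

    protected DelegatingTableSketch(Table delegate) {
        this.delegate = delegate;
    }

    @Override
    public int getRpcTimeout() {
        // Forward the method added in HBase 1.3.0 to the wrapped Table.
        return delegate.getRpcTimeout();
    }
}
{code}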



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3639) WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG

2017-02-14 Thread Geoffrey Jacoby (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866512#comment-15866512
 ] 

Geoffrey Jacoby commented on PHOENIX-3639:
--

Replication in 0.98 is significantly different than in 1.x. I can make a 
0.98-based patch, though the companion HBase JIRA to this (HBASE-17543) is not 
in 0.98 due to the recent EOLing of 0.98. A 0.98-based patch of PHOENIX-3639 
would still be potentially useful, but would require users of 0.98.x clusters to 
write their own custom ReplicationEndpoint implementations to make use of the 
new WALEntryFilter introduced here. Users of the upcoming HBase 1.4 and up can 
make use of it just by changing some config on their existing replication peers 
and doing a rolling restart.   

> WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG
> ---
>
> Key: PHOENIX-3639
> URL: https://issues.apache.org/jira/browse/PHOENIX-3639
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>  Labels: replication
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3639.patch, PHOENIX-3639.v2.patch
>
>
> As a stopgap before PHOENIX-3520, we can create an HBase WALEntryFilter to 
> filter out non-tenant rows of SYSTEM.CATALOG from replication while allowing 
> tenant-owned rows such as tenant views to proceed with replication. 
> This would have to be incorporated into a ReplicationEndpoint subclass to be 
> useful, though HBASE-17543 would make that much simpler by doing it via 
> configuration rather than code (and avoiding the need for a new peer to be 
> created).
> If PHOENIX-3520 is on the near-future roadmap, however, that would be the 
> better solution to the "replication corrupts SYSTEM.CATALOG" problem.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3658) Remove org.json:json dependency from flume module

2017-02-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866498#comment-15866498
 ] 

Hadoop QA commented on PHOENIX-3658:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12852625/PHOENIX-3658.patch
  against master branch at commit 7567fcd6d569a2ece7556c4e3a966a1baf34c3a5.
  ATTACHMENT ID: 12852625

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation, build,
or dev patch that doesn't require tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

 {color:red}-1 core zombie tests{color}.  There are 13 zombie test(s):  
at 
org.apache.ambari.server.controller.internal.RootServiceComponentPropertyProviderTest.testPopulateResources(RootServiceComponentPropertyProviderTest.java:64)
at 
org.apache.ambari.server.controller.internal.RootServiceComponentPropertyProviderTest.testPopulateResources_AmbariServer_JCEPolicy(RootServiceComponentPropertyProviderTest.java:48)

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/771//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/771//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/771//console

This message is automatically generated.

> Remove org.json:json dependency from flume module
> -
>
> Key: PHOENIX-3658
> URL: https://issues.apache.org/jira/browse/PHOENIX-3658
> Project: Phoenix
>  Issue Type: Task
>Reporter: Josh Elser
>Assignee: Josh Mahonin
>Priority: Blocker
> Attachments: PHOENIX-3658.patch
>
>
> The phoenix-flume module depends on org.json:json which is now category-x.
> We have a grace period until 2017/04/30 to resolve this one.
> Need to replace it with something else.
> https://www.apache.org/legal/resolved#json
> https://lists.apache.org/thread.html/bb18f942ce7eb83c11438303c818b885810fb76385979490366720d5@%3Clegal-discuss.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3639) WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866484#comment-15866484
 ] 

James Taylor commented on PHOENIX-3639:
---

[~gjacoby] - looks like we need an updated patch for 4.x-HBase-0.98 branch as 
I'm getting compilation errors:
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.0:compile (default-compile) on 
project phoenix-core: Compilation failure: Compilation failure:
[ERROR] 
/Users/jtaylor/dev/apache/phoenix/phoenix-core/src/main/java/org/apache/phoenix/replication/SystemCatalogWALEntryFilter.java:[24,35]
 package org.apache.hadoop.hbase.wal does not exist
[ERROR] 
/Users/jtaylor/dev/apache/phoenix/phoenix-core/src/main/java/org/apache/phoenix/replication/SystemCatalogWALEntryFilter.java:[40,30]
 package WAL does not exist
[ERROR] 
/Users/jtaylor/dev/apache/phoenix/phoenix-core/src/main/java/org/apache/phoenix/replication/SystemCatalogWALEntryFilter.java:[40,13]
 package WAL does not exist
{code}

> WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG
> ---
>
> Key: PHOENIX-3639
> URL: https://issues.apache.org/jira/browse/PHOENIX-3639
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>  Labels: replication
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3639.patch, PHOENIX-3639.v2.patch
>
>
> As a stopgap before PHOENIX-3520, we can create an HBase WALEntryFilter to 
> filter out non-tenant rows of SYSTEM.CATALOG from replication while allowing 
> tenant-owned rows such as tenant views to proceed with replication. 
> This would have to be incorporated into a ReplicationEndpoint subclass to be 
> useful, though HBASE-17543 would make that much simpler by doing it via 
> configuration rather than code (and avoiding the need for a new peer to be 
> created).
> If PHOENIX-3520 is on the near-future roadmap, however, that would be the 
> better solution to the "replication corrupts SYSTEM.CATALOG" problem.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3672) Change tests extending BaseQueryIT to use unique table names

2017-02-14 Thread Samarth Jain (JIRA)
Samarth Jain created PHOENIX-3672:
-

 Summary: Change tests extending BaseQueryIT to use unique table 
names
 Key: PHOENIX-3672
 URL: https://issues.apache.org/jira/browse/PHOENIX-3672
 Project: Phoenix
  Issue Type: Task
Reporter: Samarth Jain
Assignee: Samarth Jain


This is important for making sure we have good coverage for column encoding and 
any new features we will add.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3513) Throw SQLException instead of IllegalStateException when max commit size exceeded

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866467#comment-15866467
 ] 

James Taylor commented on PHOENIX-3513:
---

Ping [~gjacoby] - still planning on this minor one?

> Throw SQLException instead of IllegalStateException when max commit size 
> exceeded
> -
>
> Key: PHOENIX-3513
> URL: https://issues.apache.org/jira/browse/PHOENIX-3513
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: Geoffrey Jacoby
> Fix For: 4.10.0
>
>
> We should change this code in MutationState and implement the TODO here:
> {code}
> private void throwIfTooBig() {
> if (numRows > maxSize) {
> // TODO: throw SQLException ?
> throw new IllegalArgumentException("MutationState size of " + 
> numRows + " is bigger than max allowed size of " + maxSize);
> }
> }
> {code}
> Otherwise, it's difficult for clients to react to the exception.
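
A minimal sketch of the requested change, mirroring the method quoted above; a plain java.sql.SQLException with an illustrative SQLSTATE is used here, whereas the real fix would presumably go through Phoenix's own SQLExceptionCode machinery:

{code:borderStyle=solid}
// Sketch only (requires: import java.sql.SQLException;).
// A checked SQLException lets callers catch and react, unlike the
// unchecked IllegalArgumentException thrown today.
private void throwIfTooBig() throws SQLException {
    if (numRows > maxSize) {
        throw new SQLException("MutationState size of " + numRows
                + " is bigger than max allowed size of " + maxSize,
                "22000"); // illustrative SQLSTATE
    }
}
{code}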



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (PHOENIX-3607) Change hashCode calculation for caching ConnectionQueryServicesImpls

2017-02-14 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor resolved PHOENIX-3607.
---
Resolution: Won't Fix

> Change hashCode calculation for caching ConnectionQueryServicesImpls
> 
>
> Key: PHOENIX-3607
> URL: https://issues.apache.org/jira/browse/PHOENIX-3607
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>
> PhoenixDriver maintains a cache of ConnectionInfo -> 
> ConnectionQueryServicesImpl (each of which holds a single HConnection) : 
> The hash code of ConnectionInfo in part uses the hash code of its HBase User 
> object, which uses the *identity hash* of the Subject allocated at login. 
> There are concerns about the stability of this hashcode. When we log out and 
> log in after TGT refresh, will we have a new Subject?
> To be defensive, we should do a hash of the string returned by user.getName() 
> instead.
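
A sketch of the defensive change being described; zkQuorum, port and rootNode are hypothetical stand-ins for whatever ConnectionInfo actually hashes, and the point is simply to hash user.getName() rather than the User/Subject identity:

{code:borderStyle=solid}
// Sketch only (requires: import java.util.Objects;). Field names are hypothetical.
@Override
public int hashCode() {
    return Objects.hash(zkQuorum, port, rootNode,
            user == null ? null : user.getName()); // stable across TGT refresh / re-login
}
{code}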



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3639) WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG

2017-02-14 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3639:
--
Fix Version/s: 4.10.0

> WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG
> ---
>
> Key: PHOENIX-3639
> URL: https://issues.apache.org/jira/browse/PHOENIX-3639
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>  Labels: replication
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3639.patch, PHOENIX-3639.v2.patch
>
>
> As a stopgap before PHOENIX-3520, we can create an HBase WALEntryFilter to 
> filter out non-tenant rows of SYSTEM.CATALOG from replication while allowing 
> tenant-owned rows such as tenant views to proceed with replication. 
> This would have to be incorporated into a ReplicationEndpoint subclass to be 
> useful, though HBASE-17543 would make that much simpler by doing it via 
> configuration rather than code (and avoiding the need for a new peer to be 
> created).
> If PHOENIX-3520 is on the near-future roadmap, however, that would be the 
> better solution to the "replication corrupts SYSTEM.CATALOG" problem.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3639) WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG

2017-02-14 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3639:
--
Summary: WALEntryFilter to replicate only multi-tenant views from 
SYSTEM.CATALOG  (was: WALEntryFilter to block System.Catalog replication)

> WALEntryFilter to replicate only multi-tenant views from SYSTEM.CATALOG
> ---
>
> Key: PHOENIX-3639
> URL: https://issues.apache.org/jira/browse/PHOENIX-3639
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>  Labels: replication
> Attachments: PHOENIX-3639.patch, PHOENIX-3639.v2.patch
>
>
> As a stopgap before PHOENIX-3520, we can create an HBase WALEntryFilter to 
> filter out non-tenant rows of SYSTEM.CATALOG from replication while allowing 
> tenant-owned rows such as tenant views to proceed with replication. 
> This would have to be incorporated into a ReplicationEndpoint subclass to be 
> useful, though HBASE-17543 would make that much simpler by doing it via 
> configuration rather than code (and avoiding the need for a new peer to be 
> created).
> If PHOENIX-3520 is on the near-future roadmap, however, that would be the 
> better solution to the "replication corrupts SYSTEM.CATALOG" problem.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (PHOENIX-3214) Kafka Phoenix Consumer

2017-02-14 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-3214:
-

Assignee: Kalyan  (was: Josh Mahonin)

> Kafka Phoenix Consumer
> --
>
> Key: PHOENIX-3214
> URL: https://issues.apache.org/jira/browse/PHOENIX-3214
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Kalyan
>Assignee: Kalyan
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3214.addendum-0.98.patch, 
> PHOENIX-3214.addendum-1.1.patch, PHOENIX-3214-docs.patch, 
> PHOENIX-3214-final.patch, PHOENIX-3214_license-2.patch, 
> PHOENIX-3214_license.patch, PHOENIX-3214.patch, PHOENIX-3214-updated-1.patch, 
> PHOENIX-3214-updated-2.patch, PHOENIX-3214-updated.patch
>
>
> Providing a new feature to Phoenix.
> Directly ingest Kafka messages to Phoenix.
> Similar to flume phoenix integration.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-3670:
-

Assignee: chenglei

> KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
>Assignee: chenglei
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is the following join SQL (simplified); 
> fact_table is a fact table, joined with dimension tables dim_table1 and 
> dim_table2:
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small 
> dataset the sql executes quickly, but when the dataset is bigger the sql 
> becomes very slow: when the row count of fact_table is 30 
> million, dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query costs 17s.
> When I debug the SQL execution, I find RHS1 returns 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 returns 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
> method to compute the intersection of RHS1 and RHS2 for the join dynamic filter, 
> narrowing down what fact_table.cust_id should be. 
> Surprisingly, the KeyRange.intersect method costs 11s, although the whole sql 
> execution only costs 17s. After I read the code of the KeyRange.intersect 
> method, I found the following two problems:
> (1) The double loop at lines 521 and 522 is inefficient: when keyRanges 
> has size M and keyRanges2 has size N, the time complexity is O(M*N); for my 
> example that is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> (2) Line 540 should be r = r.union(tmp.get(i)), not intersect, just as 
> the KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i=1; i<tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> Also, it seems there are no unit tests for this KeyRange.intersect(List<KeyRange>, 
> List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3670:
--
Fix Version/s: 4.10.0

> KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
>Assignee: chenglei
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is the following join SQL (simplified); 
> fact_table is a fact table, joined with dimension tables dim_table1 and 
> dim_table2:
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small 
> dataset the sql executes quickly, but when the dataset is bigger the sql 
> becomes very slow: when the row count of fact_table is 30 
> million, dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query costs 17s.
> When I debug the SQL execution, I find RHS1 returns 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 returns 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
> method to compute the intersection of RHS1 and RHS2 for the join dynamic filter, 
> narrowing down what fact_table.cust_id should be. 
> Surprisingly, the KeyRange.intersect method costs 11s, although the whole sql 
> execution only costs 17s. After I read the code of the KeyRange.intersect 
> method, I found the following two problems:
> (1) The double loop at lines 521 and 522 is inefficient: when keyRanges 
> has size M and keyRanges2 has size N, the time complexity is O(M*N); for my 
> example that is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> (2) Line 540 should be r = r.union(tmp.get(i)), not intersect, just as 
> the KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i=1; i<tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> Also, it seems there are no unit tests for this KeyRange.intersect(List<KeyRange>, 
> List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3538) Regex Bulkload Tool

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866423#comment-15866423
 ] 

James Taylor commented on PHOENIX-3538:
---

[~kalyanhadoop] - would it be possible to do the above so we can get this 
committed?

> Regex Bulkload Tool
> ---
>
> Key: PHOENIX-3538
> URL: https://issues.apache.org/jira/browse/PHOENIX-3538
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Kalyan
>Assignee: Kalyan
>Priority: Minor
> Attachments: PHOENIX-3538-codecleanup.patch, 
> PHOENIX-3538-final.patch, PHOENIX-3538.patch, PHOENIX-3538-v1.patch
>
>
> To work with complex data, we can use a regex to load it directly.
> Similar to the JSON Bulkload Tool & CSV Bulkload Tool.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3135) Support loading csv data using apache phoenix flume plugin

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866360#comment-15866360
 ] 

James Taylor commented on PHOENIX-3135:
---

Good catch, [~jmahonin]! +1 to that change.

> Support loading csv data using apache phoenix flume plugin
> --
>
> Key: PHOENIX-3135
> URL: https://issues.apache.org/jira/browse/PHOENIX-3135
> Project: Phoenix
>  Issue Type: New Feature
> Environment: cloudera 5.4
>Reporter: Kalyan
>Assignee: Josh Mahonin
>Priority: Minor
> Fix For: 4.10.0
>
> Attachments: phoenix_csv.patch
>
>
> To work with the sample data sets below, we need support for loading csv data 
> using the apache phoenix flume plugin.
> // sample data set 1
> schema: col1 varchar , col2 double, col3 varchar, col4 integer
> input: kalyan,10.5,abc,1
> input: "kalyan",10.5,"abc",1
> // sample data set 2
> schema: col1 varchar , col2 double, col3 varchar[], col4 integer[]
> input: kalyan,10.5,"abc,pqr,xyz","1,2,3,4"
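
For reference, a minimal sketch of how commons-csv (which the plugin work builds on) would split the quoted sample rows, including the array-valued columns; the column handling here is illustrative only:

{code:borderStyle=solid}
import java.io.IOException;
import java.io.StringReader;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVRecord;

public class CsvSampleParse {
    public static void main(String[] args) throws IOException {
        // Second sample row: quoted fields carry comma-separated array values.
        String line = "kalyan,10.5,\"abc,pqr,xyz\",\"1,2,3,4\"";
        CSVRecord record = CSVFormat.DEFAULT.parse(new StringReader(line)).iterator().next();
        String col1 = record.get(0);                     // varchar   -> "kalyan"
        double col2 = Double.parseDouble(record.get(1)); // double    -> 10.5
        String[] col3 = record.get(2).split(",");        // varchar[] -> abc, pqr, xyz
        String[] col4 = record.get(3).split(",");        // integer[] -> 1, 2, 3, 4
        System.out.println(col1 + " " + col2 + " " + col3.length + " " + col4.length);
    }
}
{code}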



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3503) PhoenixStorageHandler doesn't work properly when execution engine of Hive is Tez.

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866356#comment-15866356
 ] 

James Taylor commented on PHOENIX-3503:
---

[~sergey.soldatov] - would it be possible for you to review this? [~Jeongdae 
Kim] - this might need to be rebased after PHOENIX-3346 is committed.

> PhoenixStorageHandler doesn't  work properly when execution engine of Hive is 
> Tez.
> --
>
> Key: PHOENIX-3503
> URL: https://issues.apache.org/jira/browse/PHOENIX-3503
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3503.patch
>
>
> The Hive storage handler can't correctly parse column types that have 
> parameters (length, precision, scale, ...) from serdeConstants.LIST_COLUMN_TYPES 
> when the execution engine of Hive is Tez.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3486) RoundRobinResultIterator doesn't work correctly because of setting Scan's cache size inappropriately in PhoenixInputFormat

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866354#comment-15866354
 ] 

James Taylor commented on PHOENIX-3486:
---

LGTM. [~sergey.soldatov] - would it be possible for you to review this too? 
[~Jeongdae Kim] - this might need to be rebased after PHOENIX-3346 is committed.

> RoundRobinResultIterator doesn't work correctly because of setting Scan's 
> cache size inappropriately in PhoenixInputFormat
> --
>
> Key: PHOENIX-3486
> URL: https://issues.apache.org/jira/browse/PHOENIX-3486
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3486.patch
>
>
> RoundRobinResultIterator uses "hbase.client.scanner.caching" to fill caches 
> in parallel for all scans, but because PhoenixInputFormat (phoenix-hive) calls 
> Scan.setCaching(), RoundRobinResultIterator doesn't work correctly: when a Scan 
> has an explicit cache size set via setCaching(), HBase uses the value from 
> Scan.getCaching() to fill the cache rather than "hbase.client.scanner.caching", 
> while RoundRobinResultIterator still scans the table in parallel to fill caches 
> every "hbase.client.scanner.caching" rows. This results in an unintended parallel 
> scan pattern and degrades scan performance.
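
A small illustrative snippet of that interaction (not the patch itself): an explicit per-Scan caching value takes precedence over the cluster-wide "hbase.client.scanner.caching" setting.

{code:borderStyle=solid}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

public class ScanCachingSketch {
    public static Scan buildScan(boolean overrideCaching) {
        // Cluster-wide default batch size (would normally come from hbase-site.xml).
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.client.scanner.caching", 1000);

        Scan scan = new Scan();
        if (overrideCaching) {
            // The per-Scan value wins over hbase.client.scanner.caching, which is what
            // throws off RoundRobinResultIterator's parallel cache-filling heuristics.
            scan.setCaching(100);
        }
        return scan;
    }
}
{code}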



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3512) PhoenixStorageHandler makes erroneous query string when handling between clauses with date constants.

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866349#comment-15866349
 ] 

James Taylor commented on PHOENIX-3512:
---

[~sergey.soldatov] - would it be possible for you to review this? [~Jeongdae 
Kim] - this might need to be rebased after PHOENIX-3346 is committed.

> PhoenixStorageHandler makes erroneous query string when handling between 
> clauses with date constants.
> -
>
> Key: PHOENIX-3512
> URL: https://issues.apache.org/jira/browse/PHOENIX-3512
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3512.patch
>
>
> ex) l_shipdate BETWEEN '1992-01-02' AND '1992-02-02' --> l_shipdate between 
> to_date('69427800') and to_date('69695640')



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3536) Remove creating unnecessary phoenix connections in MR Tasks of Hive

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866342#comment-15866342
 ] 

James Taylor commented on PHOENIX-3536:
---

Thanks for the patch, [~Jeongdae Kim]. What kind of impact does this have on 
performance? [~sergey.soldatov] - would it be possible for you to review this? 
[~Jeongdae Kim] - this might need to be rebased after PHOENIX-3346 is committed.

> Remove creating unnecessary phoenix connections in MR Tasks of Hive
> ---
>
> Key: PHOENIX-3536
> URL: https://issues.apache.org/jira/browse/PHOENIX-3536
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
>  Labels: HivePhoenix
> Attachments: PHOENIX-3536.1.patch
>
>
> PhoenixStorageHandler creates phoenix connections to build a QueryPlan in both 
> the getSplits phase (MR preparation) and the getRecordReader phase (map) while 
> running an MR job.
> In phoenix, creating the first phoenix connection (QueryServices) for a given 
> URL is expensive (it checks and loads phoenix schema information).
> I found it is possible to avoid creating the query plan again in the map 
> phase (getRecordReader()) by serializing the QueryPlan created in the input 
> format and passing this plan to the record reader.
> This approach improves scan performance by avoiding the unnecessary 
> connection in the map phase.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3135) Support loading csv data using apache phoenix flume plugin

2017-02-14 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866330#comment-15866330
 ] 

Josh Mahonin commented on PHOENIX-3135:
---

LGTM. [~kalyanhadoop] any issues if I change the commons-csv version to use 
${commons-csv.version} defined in the parent pom. It's version 1.0 vs 1.3. 
Integration tests all pass with that change as well.

> Support loading csv data using apache phoenix flume plugin
> --
>
> Key: PHOENIX-3135
> URL: https://issues.apache.org/jira/browse/PHOENIX-3135
> Project: Phoenix
>  Issue Type: New Feature
> Environment: cloudera 5.4
>Reporter: Kalyan
>Assignee: Josh Mahonin
>Priority: Minor
> Fix For: 4.10.0
>
> Attachments: phoenix_csv.patch
>
>
> To work with the sample data sets below, we need support for loading csv data 
> using the apache phoenix flume plugin.
> // sample data set 1
> schema: col1 varchar , col2 double, col3 varchar, col4 integer
> input: kalyan,10.5,abc,1
> input: "kalyan",10.5,"abc",1
> // sample data set 2
> schema: col1 varchar , col2 double, col3 varchar[], col4 integer[]
> input: kalyan,10.5,"abc,pqr,xyz","1,2,3,4"



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3662) PhoenixStorageHandler throws ClassCastException.

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866328#comment-15866328
 ] 

James Taylor commented on PHOENIX-3662:
---

[~sergey.soldatov] - would it be possible for you to review this? [~Jeongdae 
Kim] - this might need to be rebased after PHOENIX-3346 is committed.

> PhoenixStorageHandler throws ClassCastException.
> 
>
> Key: PHOENIX-3662
> URL: https://issues.apache.org/jira/browse/PHOENIX-3662
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.9.0
>Reporter: Jeongdae Kim
>Assignee: Jeongdae Kim
> Attachments: PHOENIX-3662.1.patch, PHOENIX-3662.2.patch
>
>
> When executing a query whose BETWEEN clause is wrapped in a function, the phoenix 
> storage handler throws the ClassCastException below.
> In addition, I found some bugs in the handling of push-down predicates.
> {code}
> 2017-02-06T16:35:26,019 ERROR [7d29d400-2ec5-4ab8-84c2-041b55c3e24b 
> HiveServer2-Handler-Pool: Thread-57]: ql.Driver 
> (SessionState.java:printError(1097)) - FAILED: ClassCastException 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to 
> org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
> java.lang.ClassCastException: 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc cannot be cast to 
> org.apache.hadoop.hive.ql.plan.ExprNodeColumnDesc
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.processingBetweenOperator(IndexPredicateAnalyzer.java:229)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzeExpr(IndexPredicateAnalyzer.java:369)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.access$000(IndexPredicateAnalyzer.java:72)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer$1.process(IndexPredicateAnalyzer.java:165)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.phoenix.hive.ql.index.IndexPredicateAnalyzer.analyzePredicate(IndexPredicateAnalyzer.java:176)
>   at 
> org.apache.phoenix.hive.ppd.PhoenixPredicateDecomposer.decomposePredicate(PhoenixPredicateDecomposer.java:63)
>   at 
> org.apache.phoenix.hive.PhoenixStorageHandler.decomposePredicate(PhoenixStorageHandler.java:238)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.pushFilterToStorageHandler(OpProcFactory.java:1004)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:910)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory.createFilter(OpProcFactory.java:880)
>   at 
> org.apache.hadoop.hive.ql.ppd.OpProcFactory$TableScanPPD.process(OpProcFactory.java:429)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
>   at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>   at 
> org.apache.hadoop.hive.ql.ppd.SimplePredicatePushDown.transform(SimplePredicatePushDown.java:102)
>   at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:242)
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10921)
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:246)
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1242)
>   at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1229)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:191)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:276)
>   at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:499)
>   at 
> 

[jira] [Commented] (PHOENIX-3346) Hive PhoenixStorageHandler doesn't work well with column mapping

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866323#comment-15866323
 ] 

James Taylor commented on PHOENIX-3346:
---

Would you have a few cycles to get this committed, [~elserj]? Should we kick 
off a test run to make sure the Hive-related tests pass?

> Hive PhoenixStorageHandler doesn't work well with column mapping
> 
>
> Key: PHOENIX-3346
> URL: https://issues.apache.org/jira/browse/PHOENIX-3346
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>  Labels: HivePhoenix
> Attachments: PHOENIX-3346-1.patch
>
>
> If column mapping is used during table creation, the hive table becomes 
> unusable and throws an UnknownColumn exception.
> There are several issues in the current implementation:
> 1. During table creation, the mapping isn't applied to primary keys
> 2. During select query building, no mapping happens
> 3. PhoenixRow should have backward mapping from phoenix column names to hive 
> names.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3658) Remove org.json:json dependency from flume module

2017-02-14 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866316#comment-15866316
 ] 

Josh Mahonin commented on PHOENIX-3658:
---

Integration tests all pass locally. 

CC [~kalyanhadoop] can you take a quick look as well? com.tdunning:json is 
supposed to be a drop-in replacement for org.json:json, but it's possible there 
are still some edge cases.

> Remove org.json:json dependency from flume module
> -
>
> Key: PHOENIX-3658
> URL: https://issues.apache.org/jira/browse/PHOENIX-3658
> Project: Phoenix
>  Issue Type: Task
>Reporter: Josh Elser
>Assignee: Josh Mahonin
>Priority: Blocker
> Attachments: PHOENIX-3658.patch
>
>
> The phoenix-flume module depends on org.json:json which is now category-x.
> We have a grace period until 2017/04/30 to resolve this one.
> Need to replace it with something else.
> https://www.apache.org/legal/resolved#json
> https://lists.apache.org/thread.html/bb18f942ce7eb83c11438303c818b885810fb76385979490366720d5@%3Clegal-discuss.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3658) Remove org.json:json dependency from flume module

2017-02-14 Thread Josh Mahonin (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Mahonin updated PHOENIX-3658:
--
Attachment: PHOENIX-3658.patch

> Remove org.json:json dependency from flume module
> -
>
> Key: PHOENIX-3658
> URL: https://issues.apache.org/jira/browse/PHOENIX-3658
> Project: Phoenix
>  Issue Type: Task
>Reporter: Josh Elser
>Assignee: Josh Mahonin
>Priority: Blocker
> Attachments: PHOENIX-3658.patch
>
>
> The phoenix-flume module depends on org.json:json which is now category-x.
> We have a grace period until 2017/04/30 to resolve this one.
> Need to replace it with something else.
> https://www.apache.org/legal/resolved#json
> https://lists.apache.org/thread.html/bb18f942ce7eb83c11438303c818b885810fb76385979490366720d5@%3Clegal-discuss.apache.org%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3639) WALEntryFilter to block System.Catalog replication

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866293#comment-15866293
 ] 

James Taylor commented on PHOENIX-3639:
---

+1. Thanks, [~gjacoby] - this looks very clean to me already. I'll get this 
committed shortly. If you have a few spare cycles, it would be most appreciated 
if you could write up a paragraph or two documenting how best to configure 
replication in Phoenix (and the config options that should be set). We get 
asked this frequently on the mailing list. Maybe a new Replication page in the 
Using menu after Tuning?

> WALEntryFilter to block System.Catalog replication
> --
>
> Key: PHOENIX-3639
> URL: https://issues.apache.org/jira/browse/PHOENIX-3639
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.8.0, 4.9.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>  Labels: replication
> Attachments: PHOENIX-3639.patch, PHOENIX-3639.v2.patch
>
>
> As a stopgap before PHOENIX-3520, we can create an HBase WALEntryFilter to 
> filter out non-tenant rows of SYSTEM.CATALOG from replication while allowing 
> tenant-owned rows such as tenant views to proceed with replication. 
> This would have to be incorporated into a ReplicationEndpoint subclass to be 
> useful, though HBASE-17543 would make that much simpler by doing it via 
> configuration rather than code (and avoiding the need for a new peer to be 
> created).
> If PHOENIX-3520 is on the near-future roadmap, however, that would be the 
> better solution to the "replication corrupts SYSTEM.CATALOG" problem.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3670) KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially for join dynamic filter

2017-02-14 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866269#comment-15866269
 ] 

James Taylor commented on PHOENIX-3670:
---

Excellent work, [~comnetwork]. I'll get this committed.

> KeyRange.intersect(List<KeyRange>, List<KeyRange>) is inefficient, especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is the following join SQL (simplified); 
> fact_table is a fact table, joined with dimension tables dim_table1 and 
> dim_table2:
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For a small 
> dataset the sql executes quickly, but when the dataset is bigger the sql 
> becomes very slow: when the row count of fact_table is 30 
> million, dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query costs 17s.
> When I debug the SQL execution, I find RHS1 returns 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 returns 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
> method to compute the intersection of RHS1 and RHS2 for the join dynamic filter, 
> narrowing down what fact_table.cust_id should be. 
> Surprisingly, the KeyRange.intersect method costs 11s, although the whole sql 
> execution only costs 17s. After I read the code of the KeyRange.intersect 
> method, I found the following two problems:
> (1) The double loop at lines 521 and 522 is inefficient: when keyRanges 
> has size M and keyRanges2 has size N, the time complexity is O(M*N); for my 
> example that is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> (2) Line 540 should be r = r.union(tmp.get(i)), not intersect, just as 
> the KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i=1; i<tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> Also, it seems there are no unit tests for this KeyRange.intersect(List<KeyRange>, 
> List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

2017-02-14 Thread Josh Mahonin (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866161#comment-15866161
 ] 

Josh Mahonin commented on PHOENIX-3664:
---

Hi [~pablo.castellanos]

I'm able to reproduce this issue on a local cluster, but I'm unable to do so 
within the Phoenix unit tests. I think the issue is the same as PHOENIX-3540, which 
is fixed in the unreleased Phoenix 4.10. Seeing as you're using a vendor-supplied 
Phoenix, you may have some success putting in a support request with them for a 
patched version.

As a temporary work around, you could look at using the RDD integration 
instead. Something like this should work:

{code}
import org.apache.spark.SparkContext
import org.apache.phoenix.spark._

val sv = new java.util.Date
val phoenixRDD = sc.phoenixTableAsRDD(
  table = "PCV2",
  columns = Seq("METER_ID", "FH", ..., "VAL_R4"),
  predicate = Some(s"""FH < TO_DATE('${sv.getTime}', 'S')"""),
  zkUrl = Some("10.0.0.11:2181:/hbase-unsecure")
)
{code}

Note that the 'predicate' value effectively takes a literal string value and 
passes it directly to Phoenix after a 'WHERE' clause. In this instance it 
should translate into a query like:
{{SELECT METER_ID, FH, ..., VAL_R4 FROM PCV2 WHERE FH < 
TO_DATE('1487092208672', 'S')}}

Please keep this ticket updated with your eventual solution, thanks!

> Pyspark: pushing filter by date against apache phoenix
> --
>
> Key: PHOENIX-3664
> URL: https://issues.apache.org/jira/browse/PHOENIX-3664
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Azure HDIndight (HDI 3.5) - pyspark using phoenix 
> client. (Spark 1.6.3 - HBase 1.1.2 under HDP 2.5)
>Reporter: Pablo Castilla
>
> I am trying to filter by date in apache phoenix from pyspark. The column in 
> phoenix is created as Date and the filter is a datetime. When I use explain I 
> see spark doesn't push the filter to phoenix. I have tried a lot of 
> combinations without luck.
> Any way to do it?
> df = sqlContext.read \
>.format("org.apache.phoenix.spark") \
>   .option("table", "TABLENAME") \
>   .option("zkUrl",zookepperServer +":2181:/hbase-unsecure" ) \
>   .load()
> print(df.printSchema())
> startValidation = datetime.datetime.now()
> print(df.filter(df['FH'] >startValidation).explain(True))
> Results:
> root
>  |-- METER_ID: string (nullable = true)
>  |-- FH: date (nullable = true)
> None
>== Parsed Logical Plan ==
> 'Filter (FH#53 > 1486726683446150)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Analyzed Logical Plan ==
> METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, 
> ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: 
> int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
> Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Optimized Logical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Physical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- Scan 
> PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
> None
> if I set the FH column as timestamp it pushes the filter but throws an 
> exception:
> Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 
> (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 
> 1, column 219.
> at 
> org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
> at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
> at 
> 

[jira] [Commented] (PHOENIX-3670) KeyRange.intersect(List , List) is inefficient,especially for join dynamic filter

2017-02-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866083#comment-15866083
 ] 

Hadoop QA commented on PHOENIX-3670:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12852592/PHOENIX-3670_v1.patch
  against master branch at commit 7567fcd6d569a2ece7556c4e3a966a1baf34c3a5.
  ATTACHMENT ID: 12852592

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
43 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+result = 
Bytes.BYTES_COMPARATOR.compare(rowKeyRange1.getUpperRange(), 
rowKeyRange2.getUpperRange());
+public static List<KeyRange> intersect(List<KeyRange> rowKeyRanges1, 
List<KeyRange> rowKeyRanges2) {
+private void doTestListIntersectWithOneResultRange(int start1,int end1,int 
step1,int start2,int end2,int step2,boolean addEmptyRange) throws Exception {
+
PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(i), true, 
PInteger.INSTANCE.toBytes(i+step1), true));
+
PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(i), true, 
PInteger.INSTANCE.toBytes(i+step2), true));
+private void doTestListIntersectWithMultiResultRange(int start1,int 
count1,int step1,int start2,int count2,int step2,boolean addEmptyRange) throws 
Exception {
+
listIntersectAndAssert(Arrays.asList(KeyRange.EMPTY_RANGE),Arrays.asList(KeyRange.EVERYTHING_RANGE),Arrays.asList(KeyRange.EMPTY_RANGE));
+private static void listIntersectAndAssert(List<KeyRange> 
rowKeyRanges1,List<KeyRange> rowKeyRanges2,List<KeyRange> expected) {

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT

 {color:red}-1 core zombie tests{color}.  There are 12 zombie test(s): 

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/770//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/770//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/770//console

This message is automatically generated.

> KeyRange.intersect(List , List) is inefficient,especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is a following join SQL(which is simplified), 
> fact_table is a fact table,  joining dimension table dim_table1 and 
> dim_table2 : 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small 
> dataset, the sql executes quickly, but when the dataset is bigger, the sql 
> becomes very slowly,when the  row count of fact_table is 30 
> million,dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query  costs 17s.
> When I debug the SQL executing, I find RHS1 return 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 return 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses  KeyRange.intersect(List , List ) 
> method to compute RHS1 intersecting RHS2 for join dynamic filter, narrowing 
> down fact_table.cust_id should be. 
> Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql 
> execution only costs 17s.After I read the code of  KeyRange.intersect 
> method,I find following two problem:
> (1) The double loop is inefficient in line 521 and line 522,when keyRanges  
> size is M, keyRanges2 size is N, the time complexity is O(M*N), for my 
> example,is 5523*23881: 
> {code:borderStyle=solid} 
> 519 public static List intersect(List keyRanges,  
> List keyRanges2) {
> 520List tmp = new ArrayList();
> 521for 

[jira] [Commented] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

2017-02-14 Thread Pablo Castilla (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15866020#comment-15866020
 ] 

Pablo Castilla commented on PHOENIX-3664:
-

Hi Josh,

Thanks for helping.

I have tried what you told me and I see errors, but different ones than with Python. 
With FH set as Date the filter is pushed to Phoenix.

The table is created with:
"CREATE TABLE  IF NOT EXISTS PCV2 (METER_ID VARCHAR(13) not null , FH DATE NOT 
NULL, SUMMERTIME VARCHAR(1), MAGNITUDE INTEGER, ENTRY_DATETIME DATE, BC 
VARCHAR(2), VAL_AE INTEGER,VAL_AI INTEGER,VAL_R1 INTEGER,VAL_R2 INTEGER,VAL_R3 
INTEGER,VAL_R4 INTEGER  CONSTRAINT pk PRIMARY KEY (METER_ID, FH)  )  
COMPRESSION='GZ' ";

The scala code is the following:
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.phoenix.spark._
import java.util.Date

val sqlContext = new SQLContext(sc)

val df = sqlContext.load(
  "org.apache.phoenix.spark",  
Map("table" -> "PCV2", "zkUrl" -> 
"10.0.0.11:2181:/hbase-unsecure","dateAsTimestamp" -> "true")
)

println(df.printSchema())

root
 |-- METER_ID: string (nullable = true)
 |-- FH: date (nullable = true)
 |-- SUMMERTIME: string (nullable = true)
 |-- MAGNITUDE: integer (nullable = true)
 |-- ENTRY_DATETIME: date (nullable = true)
 |-- BC: string (nullable = true)
 |-- VAL_AE: integer (nullable = true)
 |-- VAL_AI: integer (nullable = true)
 |-- VAL_R1: integer (nullable = true)
 |-- VAL_R2: integer (nullable = true)
 |-- VAL_R3: integer (nullable = true)
 |-- VAL_R4: integer (nullable = true)

val startValidation = new java.sql.Date(System.currentTimeMillis())
startValidation: java.sql.Date = 2017-02-14

df.filter(df("FH") >startValidation).explain(true)
== Physical Plan ==
Filter (FH#735 > 17211)
+- Scan 
PhoenixRelation(PCV2,10.0.0.11:2181:/hbase-unsecure)[METER_ID#734,FH#735,SUMMERTIME#736,MAGNITUDE#737,ENTRY_DATETIME#738,BC#739,VAL_AE#740,VAL_AI#741,VAL_R1#742,VAL_R2#743,VAL_R3#744,VAL_R4#745]
 PushedFilters: [GreaterThan(FH,2017-02-14)]

df.filter(df("FH") >startValidation).count()
Caused by: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): 
Type mismatch. DATE and BIGINT for FH > 2001
at 
org.apache.phoenix.schema.TypeMismatchException.newException(TypeMismatchException.java:53)
at 
org.apache.phoenix.expression.ComparisonExpression.create(ComparisonExpression.java:133)
at 
org.apache.phoenix.compile.ExpressionCompiler.visitLeave(ExpressionCompiler.java:228)
at 
org.apache.phoenix.compile.ExpressionCompiler.visitLeave(ExpressionCompiler.java:141)
at 
org.apache.phoenix.parse.ComparisonParseNode.accept(ComparisonParseNode.java:47)
at 
org.apache.phoenix.compile.WhereCompiler.compile(WhereCompiler.java:130)
at 
org.apache.phoenix.compile.WhereCompiler.compile(WhereCompiler.java:100)
at 
org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:558)
at 
org.apache.phoenix.compile.QueryCompiler.compileSingleQuery(QueryCompiler.java:510)
at 
org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:205)
at 
org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:160)
at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:404)
at 
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:378)
at 
org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1381)
at 
org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1374)
at 
org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
at 
org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)





Also, if I set the FH field as Timestamp I get the same error as with Python:

Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 
(42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "15" at line 
1, column 53.
at 
org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
at 
org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
at 
org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
at 
org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
at 
org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
at 
org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
... 180 more
Caused by: MismatchedTokenException(106!=129)
at 

[jira] [Commented] (PHOENIX-3671) Implement TAL functionality for Tephra

2017-02-14 Thread Ohad Shacham (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865997#comment-15865997
 ] 

Ohad Shacham commented on PHOENIX-3671:
---

Hi [~jamestaylor]
A few questions regarding the Tephra implementation that will later be 
useful for the Omid integration as well.
1.  Join is only done from a mutation state with TransactionAware list (no 
context) to a mutation state with context. 
I assume this is related to what you wrote on thread safety at [OMID-56] and 
this is adding back the TransactionAwares to the parent thread? I also assume 
that all these use the same Transaction object and that each thread has its own 
TransactionAwares (for thread safety). If this is correct then to support this 
functionality in Omid we will need to distribute these maps as well and not 
keep one map in the Transaction object.
2.  Reset is done only for txAwares and tx. I assume that this is done only 
for the child threads and not for the parent thread.

Thx,
Ohad


> Implement TAL functionality for Tephra
> --
>
> Key: PHOENIX-3671
> URL: https://issues.apache.org/jira/browse/PHOENIX-3671
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Ohad Shacham
>
> Implement TAL functionality for Tephra. The Tephra TAL will be connected to 
> Phoenix when this subtask is committed. From that stage any transaction 
> processor will be able to implement the TAL and be used by Phoenix.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3664) Pyspark: pushing filter by date against apache phoenix

2017-02-14 Thread Pablo Castilla (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Castilla updated PHOENIX-3664:

Environment: Azure HDIndight (HDI 3.5) - pyspark using phoenix client. 
(Spark 1.6.3 - HBase 1.1.2 under HDP 2.5)  (was: Azure HDIndight - pyspark 
using phoenix client.)

> Pyspark: pushing filter by date against apache phoenix
> --
>
> Key: PHOENIX-3664
> URL: https://issues.apache.org/jira/browse/PHOENIX-3664
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.7.0
> Environment: Azure HDIndight (HDI 3.5) - pyspark using phoenix 
> client. (Spark 1.6.3 - HBase 1.1.2 under HDP 2.5)
>Reporter: Pablo Castilla
>
> I am trying to filter by date in apache phoenix from pyspark. The column in 
> phoenix is created as Date and the filter is a datetime. When I use explain I 
> see spark doesn't push the filter to phoenix. I have tried a lot of 
> combinations without luck.
> Any way to do it?
> df = sqlContext.read \
>.format("org.apache.phoenix.spark") \
>   .option("table", "TABLENAME") \
>   .option("zkUrl",zookepperServer +":2181:/hbase-unsecure" ) \
>   .load()
> print(df.printSchema())
> startValidation = datetime.datetime.now()
> print(df.filter(df['FH'] >startValidation).explain(True))
> Results:
> root
>  |-- METER_ID: string (nullable = true)
>  |-- FH: date (nullable = true)
> None
>== Parsed Logical Plan ==
> 'Filter (FH#53 > 1486726683446150)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Analyzed Logical Plan ==
> METER_ID: string, FH: date, SUMMERTIME: string, MAGNITUDE: int, SOURCE: int, 
> ENTRY_DATETIME: date, BC: string, T_VAL_AE: int, T_VAL_AI: int, T_VAL_R1: 
> int, T_VAL_R2: int, T_VAL_R3: int, T_VAL_R4: int
> Filter (cast(FH#53 as string) > cast(1486726683446150 as string))
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Optimized Logical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- 
> Relation[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
>  PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)
> == Physical Plan ==
> Filter (cast(FH#53 as string) > 2017-02-10 11:38:03.44615)
> +- Scan 
> PhoenixRelation(DAILYREADS,10.0.0.13:2181:/hbase-unsecure)[METER_ID#52,FH#53,SUMMERTIME#54,MAGNITUDE#55,SOURCE#56,ENTRY_DATETIME#57,BC#58,T_VAL_AE#59,T_VAL_AI#60,T_VAL_R1#61,T_VAL_R2#62,T_VAL_R3#63,T_VAL_R4#64]
> None
> if I set the FH column as timestamp it pushes the filter but throws an 
> exception:
> Caused by: org.apache.phoenix.exception.PhoenixParserException: ERROR 604 
> (42P00): Syntax error. Mismatched input. Expecting "RPAREN", got "12" at line 
> 1, column 219.
> at 
> org.apache.phoenix.exception.PhoenixParserException.newException(PhoenixParserException.java:33)
> at org.apache.phoenix.parse.SQLParser.parseStatement(SQLParser.java:111)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$PhoenixStatementParser.parseStatement(PhoenixStatement.java:1280)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.parseStatement(PhoenixStatement.java:1363)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.compileQuery(PhoenixStatement.java:1373)
> at 
> org.apache.phoenix.jdbc.PhoenixStatement.optimizeQuery(PhoenixStatement.java:1368)
> at 
> org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:122)
> ... 102 more
> Caused by: MismatchedTokenException(106!=129)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.recoverFromMismatchedToken(PhoenixSQLParser.java:360)
> at 
> org.apache.phoenix.shaded.org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.not_expression(PhoenixSQLParser.java:6862)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.and_expression(PhoenixSQLParser.java:6677)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.or_expression(PhoenixSQLParser.java:6614)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.expression(PhoenixSQLParser.java:6579)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.single_select(PhoenixSQLParser.java:4615)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.unioned_selects(PhoenixSQLParser.java:4697)
> at 
> org.apache.phoenix.parse.PhoenixSQLParser.select_node(PhoenixSQLParser.java:4763)
> at 
> 

[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List , List) is inefficient,especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Description: 
In my business system, there is the following join SQL (simplified): fact_table 
is a fact table, joined with dimension tables dim_table1 and dim_table2:

{code:borderStyle=solid} 
select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on  
t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date between 
'2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code} 

I use the /*+ SKIP_SCAN */ hint to enable the join dynamic filter. For some small 
datasets the sql executes quickly, but when the dataset is bigger the sql 
becomes very slow: when the row count of fact_table is 30 million, dim_table1 
is 300 thousand and dim_table2 is 100 thousand, the above query costs 17s.

When I debug the SQL execution, I find RHS1 returns 5523 rows:
{code:borderStyle=solid} 
   select d1.id from dim_table1 d1 where d1.code = 2008
{code} 

and RHS2 returns 23881 rows: 
{code:borderStyle=solid}
   select d2.id from dim_table2 d2 where d2.region='us'
{code}  

then HashJoinPlan uses the KeyRange.intersect(List<KeyRange>, List<KeyRange>) 
method to intersect RHS1 with RHS2 for the join dynamic filter, narrowing 
down what fact_table.cust_id should be. 

Surprisingly, the KeyRange.intersect method costs 11s, although the whole sql 
execution only costs 17s. After reading the code of the KeyRange.intersect method, I 
find the following two problems:

(1) The double loop in line 521 and line 522 is inefficient: when the keyRanges size 
is M and the keyRanges2 size is N, the time complexity is O(M*N), which for my 
example is 5523*23881: 

{code:borderStyle=solid} 
519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520        List<KeyRange> tmp = new ArrayList<KeyRange>();
521        for (KeyRange r1 : keyRanges) {
522            for (KeyRange r2 : keyRanges2) {
523                KeyRange r = r1.intersect(r2);
524                if (EMPTY_RANGE != r) {
525                    tmp.add(r);
526                }
527            }
528        }
{code}  

(2) line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
KeyRange.coalesce method does:

{code:borderStyle=solid} 
532        Collections.sort(tmp, KeyRange.COMPARATOR);
533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534        KeyRange r = tmp.get(0);
535        for (int i=1; i<tmp.size(); i++) {
536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537                tmp2.add(r);
538                r = tmp.get(i);
539            } else {
540                r = r.intersect(tmp.get(i));
541            }
542        }
{code}

and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method.

[jira] [Comment Edited] (PHOENIX-3670) KeyRange.intersect(List , List) is inefficient,especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865978#comment-15865978
 ] 

chenglei edited comment on PHOENIX-3670 at 2/14/17 3:43 PM:


I uploaded my first patch, could someone help me review this patch? The 
time complexity of the KeyRange.intersect method in my patch is reduced to 
O(M*logM)+O(N*logN), which is faster than the current O(M*N). For my example 
explained above, after applying the patch the KeyRange.intersect method only costs 
20ms, dramatically faster than the original 11s. I also added some unit tests for 
the KeyRange.intersect(List,List) method in my patch.
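
As background for reviewers, the second problem described below (line 540 using 
intersect where union is needed) can be illustrated with a small sketch. This is an 
illustration only, not the committed patch; it assumes KeyRange.union behaves as in 
KeyRange.coalesce, and the package name is from memory:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.phoenix.query.KeyRange;

public class CoalesceMergeSketch {
    // After sorting, adjacent overlapping ranges are folded together with union(),
    // mirroring KeyRange.coalesce; using intersect() here (as line 540 currently does)
    // would shrink the accumulated range instead of extending it.
    static List<KeyRange> mergeSorted(List<KeyRange> tmp) {
        Collections.sort(tmp, KeyRange.COMPARATOR);
        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
        KeyRange r = tmp.get(0);
        for (int i = 1; i < tmp.size(); i++) {
            if (KeyRange.EMPTY_RANGE == r.intersect(tmp.get(i))) {
                tmp2.add(r);             // no overlap: emit the accumulated range
                r = tmp.get(i);          // and start a new one
            } else {
                r = r.union(tmp.get(i)); // overlap: extend with union, as coalesce does
            }
        }
        tmp2.add(r);                     // emit the last accumulated range
        return tmp2;
    }
}
{code}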


was (Author: comnetwork):
I uploaded my first patch, could someone help me review for this patch? The 
time complexity  of  KeyRange.intersect method in my patch is reduced to 
O(M*logM)+O(N*logN), which is faster than current O(M*N) ,and for my example 
explained above,after applied the patch,KeyRange.intersect method only cost 
20ms, dramatically faster than original 11s.

> KeyRange.intersect(List , List) is inefficient,especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is a following join SQL(which is simplified), 
> fact_table is a fact table,  joining dimension table dim_table1 and 
> dim_table2 : 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small 
> dataset, the sql executes quickly, but when the dataset is bigger, the sql 
> become very slowly. When the  row count of fact_table is 30 
> million,dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query  costs 17s.
> When I debug the SQL executing, I find RHS1 return 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 return 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses  KeyRange.intersect(List , List ) 
> method to compute RHS1 intersecting RHS2 for dynamic filter, narrowing down 
> fact_table.cust_id should be. 
> Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql 
> execution only costs 17s.After I read the code of  KeyRange.intersect 
> method,I find following two problem:
> 1. The double loop is inefficient in line 521 and line 522,when keyRanges  
> size is M, keyRanges2 size is N, the time complexity is O(M*N), for my 
> example,is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> 2. line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
> KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i=1; i<tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (PHOENIX-3670) KeyRange.intersect(List , List) is inefficient,especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865978#comment-15865978
 ] 

chenglei edited comment on PHOENIX-3670 at 2/14/17 3:44 PM:


I uploaded my first patch, could someone help me review this patch? The 
time complexity of the KeyRange.intersect method in my patch is reduced to 
O(M*logM)+O(N*logN), which is faster than the current O(M*N). For my example 
explained above, after applying the patch the KeyRange.intersect method only costs 
20ms, dramatically faster than the original 11s.
I also added some unit tests for the KeyRange.intersect(List,List) method in my patch.


was (Author: comnetwork):
I uploaded my first patch, could someone help me review for this patch? The 
time complexity  of  KeyRange.intersect method in my patch is reduced to 
O(M*logM)+O(N*logN), which is faster than current O(M*N) ,and for my example 
explained above,after applied the patch,KeyRange.intersect method only cost 
20ms, dramatically faster than original 11s.I also add some unit tests for 
KeyRange.intersect(List,List) method in my patch.

> KeyRange.intersect(List , List) is inefficient,especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is a following join SQL(which is simplified), 
> fact_table is a fact table,  joining dimension table dim_table1 and 
> dim_table2 : 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small 
> dataset, the sql executes quickly, but when the dataset is bigger, the sql 
> become very slowly. When the  row count of fact_table is 30 
> million,dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query  costs 17s.
> When I debug the SQL executing, I find RHS1 return 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 return 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses  KeyRange.intersect(List , List ) 
> method to compute RHS1 intersecting RHS2 for dynamic filter, narrowing down 
> fact_table.cust_id should be. 
> Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql 
> execution only costs 17s.After I read the code of  KeyRange.intersect 
> method,I find following two problem:
> 1. The double loop is inefficient in line 521 and line 522,when keyRanges  
> size is M, keyRanges2 size is N, the time complexity is O(M*N), for my 
> example,is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> 2. line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
> KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i=1; i<tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3670) KeyRange.intersect(List , List) is inefficient,especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865978#comment-15865978
 ] 

chenglei commented on PHOENIX-3670:
---

I uploaded my first patch, could someone help me review this patch? The 
time complexity of the KeyRange.intersect method in my patch is reduced to 
O(M*logM)+O(N*logN), which is faster than the current O(M*N). For my example 
explained above, after applying the patch the KeyRange.intersect method only costs 
20ms, dramatically faster than the original 11s.
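
For anyone reviewing, here is a rough sketch of the sort-and-merge idea behind the 
O(M*logM)+O(N*logN) figure. It is an illustration only, not the attached patch; it 
assumes both input lists are already coalesced into non-overlapping, fully bounded 
ranges (as for the point keys used by the join dynamic filter), and package names 
are from memory:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.query.KeyRange;

public class SortMergeIntersectSketch {
    // Sort both lists, then walk them with two pointers, always advancing the
    // range that ends first. Each candidate pair is visited at most once, so the
    // cost is the O(M*logM)+O(N*logN) sorts plus an O(M+N) scan, instead of the
    // O(M*N) cross product of the original double loop.
    public static List<KeyRange> intersect(List<KeyRange> ranges1, List<KeyRange> ranges2) {
        Collections.sort(ranges1, KeyRange.COMPARATOR);
        Collections.sort(ranges2, KeyRange.COMPARATOR);
        List<KeyRange> result = new ArrayList<KeyRange>();
        int i = 0, j = 0;
        while (i < ranges1.size() && j < ranges2.size()) {
            KeyRange r = ranges1.get(i).intersect(ranges2.get(j));
            if (r != KeyRange.EMPTY_RANGE) {
                result.add(r);
            }
            // Advance whichever range has the smaller upper bound; the other
            // range may still overlap later ranges from the opposite list.
            if (Bytes.BYTES_COMPARATOR.compare(ranges1.get(i).getUpperRange(),
                    ranges2.get(j).getUpperRange()) <= 0) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }
}
{code}

The actual patch additionally needs to handle single-key and unbounded ranges, which the sketch ignores.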

> KeyRange.intersect(List , List) is inefficient,especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is a following join SQL(which is simplified), 
> fact_table is a fact table,  joining dimension table dim_table1 and 
> dim_table2 : 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small 
> dataset, the sql executes quickly, but when the dataset is bigger, the sql 
> become very slowly. When the  row count of fact_table is 30 
> million,dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query  costs 17s.
> When I debug the SQL executing, I find RHS1 return 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 return 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses  KeyRange.intersect(List , List ) 
> method to compute RHS1 intersecting RHS2 for dynamic filter, narrowing down 
> fact_table.cust_id should be. 
> Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql 
> execution only costs 17s.After I read the code of  KeyRange.intersect 
> method,I find following two problem:
> 1. The double loop is inefficient in line 521 and line 522,when keyRanges  
> size is M, keyRanges2 size is N, the time complexity is O(M*N), for my 
> example,is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> 2. line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
> KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i=1; i<tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List , List) is inefficient,especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Attachment: PHOENIX-3670_v1.patch

> KeyRange.intersect(List , List) is inefficient,especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
> Attachments: PHOENIX-3670_v1.patch
>
>
> In my business system, there is a following join SQL(which is simplified), 
> fact_table is a fact table,  joining dimension table dim_table1 and 
> dim_table2 : 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small 
> dataset, the sql executes quickly, but when the dataset is bigger, the sql 
> become very slowly. When the  row count of fact_table is 30 
> million,dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query  costs 17s.
> When I debug the SQL executing, I find RHS1 return 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 return 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses  KeyRange.intersect(List , List ) 
> method to compute RHS1 intersecting RHS2 for dynamic filter, narrowing down 
> fact_table.cust_id should be. 
> Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql 
> execution only costs 17s.After I read the code of  KeyRange.intersect 
> method,I find following two problem:
> 1. The double loop is inefficient in line 521 and line 522,when keyRanges  
> size is M, keyRanges2 size is N, the time complexity is O(M*N), for my 
> example,is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> 2. line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
> KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i=1; i<tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3671) Implement TAL functionality for Tephra

2017-02-14 Thread Ohad Shacham (JIRA)
Ohad Shacham created PHOENIX-3671:
-

 Summary: Implement TAL functionality for Tephra
 Key: PHOENIX-3671
 URL: https://issues.apache.org/jira/browse/PHOENIX-3671
 Project: Phoenix
  Issue Type: Sub-task
Reporter: Ohad Shacham


Implement TAL functionality for Tephra. The Tephra TAL will be connected to Phoenix 
when this subtask is committed. From that stage any transaction processor 
will be able to implement the TAL and be used by Phoenix.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List , List) is inefficient,especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Description: 
In my business system, there is a following join SQL(which is simplified), 
fact_table is a fact table,  joining dimension table dim_table1 and dim_table2 
: 

{code:borderStyle=solid} 
select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on  
t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date between 
'2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code} 

I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small 
dataset, the sql executes quickly, but when the dataset is bigger, the sql 
become very slowly. When the  row count of fact_table is 30 million,dim_table1 
is 300 thousand and dim_table2 is 100 thousand, the above query  costs 17s.

When I debug the SQL executing, I find RHS1 return 5523 rows:
{code:borderStyle=solid} 
   select d1.id from dim_table1 d1 where d1.code = 2008
{code} 

and RHS2 return 23881 rows: 
{code:borderStyle=solid}
   select d2.id from dim_table2 d2 where d2.region='us'
{code}  

then HashJoinPlan uses  KeyRange.intersect(List , List ) 
method to compute RHS1 intersecting RHS2 for dynamic filter, narrowing down 
fact_table.cust_id should be. 

Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql 
execution only costs 17s.After I read the code of  KeyRange.intersect method,I 
find following two problem:

1. The double loop is inefficient in line 521 and line 522,when keyRanges  size 
is M, keyRanges2 size is N, the time complexity is O(M*N), for my example,is 
5523*23881: 

{code:borderStyle=solid} 
519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520        List<KeyRange> tmp = new ArrayList<KeyRange>();
521        for (KeyRange r1 : keyRanges) {
522            for (KeyRange r2 : keyRanges2) {
523                KeyRange r = r1.intersect(r2);
524                if (EMPTY_RANGE != r) {
525                    tmp.add(r);
526                }
527            }
528        }
{code}  

2. line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
KeyRange.coalesce method does:

{code:borderStyle=solid} 
532        Collections.sort(tmp, KeyRange.COMPARATOR);
533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534        KeyRange r = tmp.get(0);
535        for (int i=1; i<tmp.size(); i++) {
536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537                tmp2.add(r);
538                r = tmp.get(i);
539            } else {
540                r = r.intersect(tmp.get(i));
541            }
542        }
{code}

and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method.

[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List , List) is inefficient,especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Description: 
In my business system, there is a following join SQL(which is simplified), 
fact_table is a fact table,  joining dimension table dim_table1 and dim_table2 
: 

{code:borderStyle=solid} 
select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on  
t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date between 
'2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code} 

I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small 
dataset, the sql executes quickly, but when the dataset is bigger, the sql 
become very slowly. When the  row count of fact_table is 30 million,dim_table1 
is 300 thousand and dim_table2 is 100 thousand, the above query  costs 17s.

When I debug the SQL executing, I find RHS1 return 5523 rows:
{code:borderStyle=solid} 
   select d1.id from dim_table1 d1 where d1.code = 2008
{code} 

and RHS2 return 23881 rows: 
{code:borderStyle=solid}
   select d2.id from dim_table2 d2 where d2.region='us'
{code}  

then HashJoinPlan uses  KeyRange.intersect(List , List ) 
method to compute RHS1 intersecting RHS2 for dynamic filter, narrowing down 
fact_table.cust_id should be. 

Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql 
execution only costs 17s.After I read the code of  KeyRange.intersect method,I 
find following two problem:

1. The double loop is inefficient in line 521 and line 522,when keyRanges  size 
is M, keyRanges2 size is N, the time complexity is O(M*N), for my example,is 
5523*23881: 

{code:borderStyle=solid} 
519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520        List<KeyRange> tmp = new ArrayList<KeyRange>();
521        for (KeyRange r1 : keyRanges) {
522            for (KeyRange r2 : keyRanges2) {
523                KeyRange r = r1.intersect(r2);
524                if (EMPTY_RANGE != r) {
525                    tmp.add(r);
526                }
527            }
528        }
{code}  

2. line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
KeyRange.coalesce method does:

{code:borderStyle=solid} 
532        Collections.sort(tmp, KeyRange.COMPARATOR);
533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534        KeyRange r = tmp.get(0);
535        for (int i=1; i<tmp.size(); i++) {
536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537                tmp2.add(r);
538                r = tmp.get(i);
539            } else {
540                r = r.intersect(tmp.get(i));
541            }
542        }
{code}

and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method.

[jira] [Updated] (PHOENIX-3670) KeyRange.intersect(List , List) is inefficient,especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3670:
--
Summary: KeyRange.intersect(List , List) is 
inefficient,especially for join dynamic filter  (was: 
KeyRange.intersect(List , List ) is inefficient,especially 
for join dynamic filter)

> KeyRange.intersect(List , List) is inefficient,especially 
> for join dynamic filter
> -
>
> Key: PHOENIX-3670
> URL: https://issues.apache.org/jira/browse/PHOENIX-3670
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.9.0
>Reporter: chenglei
>
> In my business system, there is a following join SQL(which is simplified), 
> fact_table is a fact table,  joining dimension table dim_table1 and 
> dim_table2 : 
> {code:borderStyle=solid} 
> select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on 
>  t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date 
> between '2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
> {code} 
> I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small 
> dataset, the sql executes quickly, but when the dataset is bigger, the sql 
> become very slowly. When the  row count of fact_table is 30 
> million,dim_table1 is 300 thousand and dim_table2 is 100 thousand, the above 
> query  costs 17s.
> When I debug the SQL executing, I find RHS1 return 5523 rows:
> {code:borderStyle=solid} 
>select d1.id from dim_table1 d1 where d1.code = 2008
> {code} 
> and RHS2 return 23881 rows: 
> {code:borderStyle=solid}
>select d2.id from dim_table2 d2 where d2.region='us'
> {code}  
> then HashJoinPlan uses  KeyRange.intersect(List , List ) 
> method to compute RHS1 intersecting RHS2 for dynamic filter, narrowing down 
> fact_table.cust_id should be. 
> Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql 
> execution only costs 17s.After I read the code of  KeyRange.intersect 
> method,I find following two problem:
> 1. The double loop is inefficient in line 521 and line 522,when keyRanges  
> size is M, keyRanges2 size is N, the time complexity is O(M*N), for my 
> example,is 5523*23881: 
> {code:borderStyle=solid} 
> 519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
> 520        List<KeyRange> tmp = new ArrayList<KeyRange>();
> 521        for (KeyRange r1 : keyRanges) {
> 522            for (KeyRange r2 : keyRanges2) {
> 523                KeyRange r = r1.intersect(r2);
> 524                if (EMPTY_RANGE != r) {
> 525                    tmp.add(r);
> 526                }
> 527            }
> 528        }
> {code}  
> 2. line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
> KeyRange.coalesce method does:
> {code:borderStyle=solid} 
> 532        Collections.sort(tmp, KeyRange.COMPARATOR);
> 533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
> 534        KeyRange r = tmp.get(0);
> 535        for (int i=1; i<tmp.size(); i++) {
> 536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
> 537                tmp2.add(r);
> 538                r = tmp.get(i);
> 539            } else {
> 540                r = r.intersect(tmp.get(i));
> 541            }
> 542        }
> {code}
> and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (PHOENIX-3670) KeyRange.intersect(List , List ) is inefficient,especially for join dynamic filter

2017-02-14 Thread chenglei (JIRA)
chenglei created PHOENIX-3670:
-

 Summary: KeyRange.intersect(List , List ) is 
inefficient,especially for join dynamic filter
 Key: PHOENIX-3670
 URL: https://issues.apache.org/jira/browse/PHOENIX-3670
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 4.9.0
Reporter: chenglei


In my business system, there is a following join SQL(which is simplified), 
fact_table is a fact table,  joining dimension table dim_table1 and dim_table2 
: 

{code:borderStyle=solid} 
select /*+ SKIP_SCAN */ sum(t.click)  from fact_table t join dim_table1 d1 on  
t.cust_id=d1.id  join dim_table2 d2 on t.cust_id =d2.id  where t.date between 
'2016-01-01' and '2017-01-01' and d1.code = 2008 and d2.region = 'us';
{code} 

I use /*+ SKIP_SCAN */ hint to enable join dynamic filter. For some small 
dataset, the sql executes quickly, but when the dataset is bigger, the sql 
become very slowly. When the  row count of fact_table is 30 million,dim_table1 
is 300 thousand and dim_table2 is 100 thousand, the above query  costs 17s.

When I debug the SQL executing, I find RHS1 return 5523 rows:
{code:borderStyle=solid} 
   select d1.id from dim_table1 d1 where d1.code = 2008
{code} 

and RHS2 return 23881 rows: 
{code:borderStyle=solid}
   select d2.id from dim_table2 d2 where d2.region='us'
{code}  

then HashJoinPlan uses  KeyRange.intersect(List , List ) 
method to compute RHS1 intersecting RHS2 for dynamic filter, narrowing down 
fact_table.cust_id should be. 

Surprisingly,the KeyRange.intersect method costs 11s ! although the whole sql 
execution only costs 17s.After I read the code of  KeyRange.intersect method,I 
find following two problem:

1. The double loop is inefficient in line 521 and line 522,when keyRanges  size 
is M, keyRanges2 size is N, the time complexity is O(M*N), for my example,is 
5523*23881: 

{code:borderStyle=solid} 
519    public static List<KeyRange> intersect(List<KeyRange> keyRanges, List<KeyRange> keyRanges2) {
520        List<KeyRange> tmp = new ArrayList<KeyRange>();
521        for (KeyRange r1 : keyRanges) {
522            for (KeyRange r2 : keyRanges2) {
523                KeyRange r = r1.intersect(r2);
524                if (EMPTY_RANGE != r) {
525                    tmp.add(r);
526                }
527            }
528        }
{code}  

2. line 540 should be r = r.union(tmp.get(i)), not intersect, just as the 
KeyRange.coalesce method does:

{code:borderStyle=solid} 
532        Collections.sort(tmp, KeyRange.COMPARATOR);
533        List<KeyRange> tmp2 = new ArrayList<KeyRange>();
534        KeyRange r = tmp.get(0);
535        for (int i=1; i<tmp.size(); i++) {
536            if (EMPTY_RANGE == r.intersect(tmp.get(i))) {
537                tmp2.add(r);
538                r = tmp.get(i);
539            } else {
540                r = r.intersect(tmp.get(i));
541            }
542        }
{code}

and it seems that there are no unit tests for the KeyRange.intersect(List<KeyRange>, List<KeyRange>) method.

[jira] [Commented] (PHOENIX-3585) MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail for transactional tables and local indexes

2017-02-14 Thread Rajeshbabu Chintaguntla (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865895#comment-15865895
 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-3585:
--

[~jamestaylor] [~tdsilva] 
In IndexHalfStoreFileReaderGenerator.preCompactScannerOpen() we create 
LocalIndexStoreFileScanner only after a split, and only when there are reference 
files. In normal compaction cases we don't create them. It seems we cannot 
combine both TransactionProcessor and IndexHalfStoreFileReaderGenerator because 
the scan object is different in the two cases. Can't we skip invalidating in 
TransactionProcessor until a normal compaction, rather than doing it for the 
compaction after a split or merge? Ping [~poornachandra].

> MutableIndexIT testSplitDuringIndexScan and testIndexHalfStoreFileReader fail 
> for transactional tables and local indexes
> 
>
> Key: PHOENIX-3585
> URL: https://issues.apache.org/jira/browse/PHOENIX-3585
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
> Attachments: diff.patch
>
>
> the tests fail if we use HDFSTransactionStateStorage instead of  
> InMemoryTransactionStateStorage when we create the TransactionManager in 
> BaseTest



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3360) Secondary index configuration is wrong

2017-02-14 Thread Rajeshbabu Chintaguntla (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865398#comment-15865398
 ] 

Rajeshbabu Chintaguntla commented on PHOENIX-3360:
--

+1 on v4 patch. Nice catch [~yhxx511]. We can go ahead and commit it.
bq.  In this ctor, it will hit ZK to read the cluster id by calling 
retrieveClusterId(). This is totally unacceptable. 
We can raise an issue in HBase to see whether it's possible to avoid this.

> Secondary index configuration is wrong
> --
>
> Key: PHOENIX-3360
> URL: https://issues.apache.org/jira/browse/PHOENIX-3360
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Rajeshbabu Chintaguntla
>Priority: Critical
> Fix For: 4.10.0
>
> Attachments: ConfCP.java, PHOENIX-3360.patch, PHOENIX-3360-v2.PATCH, 
> PHOENIX-3360-v3.PATCH, PHOENIX-3360-v4.PATCH
>
>
> IndexRpcScheduler allocates some handler threads and uses a higher priority 
> for RPCs. The corresponding IndexRpcController is not used by default as it 
> is, but used through ServerRpcControllerFactory that we configure from Ambari 
> by default which sets the priority of the outgoing RPCs to either metadata 
> priority, or the index priority.
> However, after reading code of IndexRpcController / ServerRpcController it 
> seems that the IndexRPCController DOES NOT look at whether the outgoing RPC 
> is for an Index table or not. It just sets ALL rpc priorities to be the index 
> priority. The intention seems to be the case that ONLY on servers, we 
> configure ServerRpcControllerFactory, and with clients we NEVER configure 
> ServerRpcControllerFactory, but instead use ClientRpcControllerFactory. We 
> configure ServerRpcControllerFactory from Ambari, which in affect makes it so 
> that ALL rpcs from Phoenix are only handled by the index handlers by default. 
> It means all deadlock cases are still there. 
> The documentation in https://phoenix.apache.org/secondary_indexing.html is 
> also wrong in this sense. It does not talk about server side / client side. 
> Plus this way of configuring different values is not how HBase configuration 
> is deployed. We cannot have the configuration show the 
> ServerRpcControllerFactory even only for server nodes, because the clients 
> running on those nodes will also see the wrong values. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (PHOENIX-3360) Secondary index configuration is wrong

2017-02-14 Thread William Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865236#comment-15865236
 ] 

William Yang edited comment on PHOENIX-3360 at 2/14/17 8:28 AM:


bq. CompoundConfiguration treats the added configs as immutable, and has an 
internal mutable config (see the code). This means that with the original 
patch, the rest of region server (including replication) will not be affected.

I've done a simple test, see {{ConfCP.java}}. If we change the RegionServer 
level configuration in a region's coprocessor, then all the other Regions 
opened afterwards on the same RS will see the change. It has nothing to do with 
the implementation of Configuration class or any other internal classes, but is 
determined by where a region's Configuration object comes from. 

I checked the code in both hbase 1.1.2 and 0.94. See 
{{RegionCoprocessorHost#getTableCoprocessorAttrsFromSchema()}} for 1.1 and 
{{RegionCoprocessorHost#loadTableCoprocessors()}} for 0.94. 

Each region will have its own copy of Configuration, which are all copied from 
the region server's configuration object. So it is safe to change the 
configuration returned by {{CoprocessorEnvironment#getConfiguration()}} and 
this change can be seen only within this Region. But we should never change the 
Configuration returned by 
{{RegionCoprocessorEnvironment#getRegionServerServices().getConfiguration()}}, 
for this will change all the other Regions' conf (only the Regions opened 
afterwards).

How to use ConfCP.java
 * create 'test1', 'cf'
 * create 'test2', 'cf'
 * make sure that all regions of the above two tables are hosted in the same 
regionserver
 * add coprocessor ConfCP  for test1, check log, should see the print below:
{code}
YHYH1: [test1]conf hashCode = 2027310658
YHYH2: [test1]put conf (yh.special.key,XX)
YHYH3: [test1]get conf (yh.special.key,XX)
{code}
 * add coprocessor ConfCP for test2, check the log again, should see the print 
below
{code}
YHYH1: [test2]conf hashCode = 2027310658
YHYH3: [test2]get conf (yh.special.key,XX)
{code}

So we set a unique conf in coprocessor in test1, then test2 saw it.
Note that {{conf}} can be assigned in two ways. Currently 
{code}
conf = 
((RegionCoprocessorEnvironment)e).getRegionServerServices().getConfiguration();
{code}
is used, and this is what we do in V1 patch.

Change it to 
{code}
conf = e.getConfiguration();
{code}
then table test2 will not see the change that test1 did.

In summary, we can use the v1 patch with a small modification: just set 
the conf returned by {{CoprocessorEnvironment#getConfiguration()}}. And for 
PHOENIX-3271, UPSERT SELECT's writes will still have higher priority. 

WDYT? Ping [~jamestaylor], [~enis], [~rajeshbabu].
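
For anyone who wants to reproduce this without the attachment, here is a rough sketch 
of what a ConfCP-style coprocessor can look like. It is an approximation of the attached 
ConfCP.java, not the file itself; the class name and the key are illustrative:

{code}
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;

public class ConfCP extends BaseRegionObserver {
    private static final Log LOG = LogFactory.getLog(ConfCP.class);

    @Override
    public void start(CoprocessorEnvironment e) throws IOException {
        RegionCoprocessorEnvironment env = (RegionCoprocessorEnvironment) e;
        String table = env.getRegion().getTableDesc().getTableName().getNameAsString();

        // Region-server-wide Configuration: a change here is visible to every
        // Region opened afterwards on the same RS (what the v1 patch effectively did).
        Configuration conf = env.getRegionServerServices().getConfiguration();
        // Per-region copy: a change here stays local to this Region.
        // Configuration conf = e.getConfiguration();

        LOG.info("YHYH1: [" + table + "]conf hashCode = " + conf.hashCode());
        if (conf.get("yh.special.key") == null) {
            conf.set("yh.special.key", "XX");
            LOG.info("YHYH2: [" + table + "]put conf (yh.special.key,XX)");
        }
        LOG.info("YHYH3: [" + table + "]get conf (yh.special.key," + conf.get("yh.special.key") + ")");
    }
}
{code}

Adding the coprocessor to test1 and then test2 (both hosted on the same regionserver) should reproduce the hashCode and YHYH output shown above.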



was (Author: yhxx511):
bq. CompoundConfiguration treats the added configs as immutable, and has an 
internal mutable config (see the code). This means that with the original 
patch, the rest of region server (including replication) will not be affected.

I've done a simple test, see {{ConfCP.java}}. If we change the RegionServer 
level configuration in a coprocessor, then all the other Regions opened on the 
same RS will see the change. It has nothing to do with the implementation of 
Configuration class or any other internal classes, but is determined by where a 
region's Configuration object comes from. 

I checked the code in both hbase 1.1.2 and 0.94. See 
{{RegionCoprocessorHost#getTableCoprocessorAttrsFromSchema()}} for 1.1 and 
{{RegionCoprocessorHost#loadTableCoprocessors()}} for 0.94. 

Each region will have its own copy of Configuration, which are all copied from 
the region server's configuration object. So it is safe to change the 
configuration returned by {{CoprocessorEnvironment#getConfiguration()}} and 
this change can be seen only within this Region. But we should never change the 
Configurations return by 
{{RegionCoprocessorEnvironment#getRegionServerServices().getConfiguration()}} 
for this will change all the other Regions' conf.

How to use ConfCP.java
 * create 'test1', 'cf'
 * create 'test2', 'cf'
 * make sure that all regions of the above two tables are hosted in the same 
regionserver
 * add coprocessor ConfCP  for test1, check log, should see the print below:
{code}
YHYH1: [test1]conf hashCode = 2027310658
YHYH2: [test1]put conf (yh.special.key,XX)
YHYH3: [test1]get conf (yh.special.key,XX)
{code}
 * add coprocessor ConfCP for test2, check the log again, should see the print 
below
{code}
YHYH1: [test2]conf hashCode = 2027310658
YHYH3: [test2]get conf (yh.special.key,XX)
{code}

Note that {{conf}} can be assigned by two values. for 
{code}
conf = 
((RegionCoprocessorEnvironment)e).getRegionServerServices().getConfiguration();
{code}
is used now, and this is what we do in V1 patch.

Change it to 
{code}
conf = e.getConfiguration();
{code}
then table test2 will not 

[jira] [Commented] (PHOENIX-3539) Fix bulkload for StorageScheme - ONE_CELL_PER_KEYVALUE_COLUMN

2017-02-14 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865322#comment-15865322
 ] 

Samarth Jain commented on PHOENIX-3539:
---

[~an...@apache.org] - can you tell us more about this fix. Why is it needed? 
Also, can you rebase the patch to the latest of encodecolumns2 branch? Thanks!

> Fix bulkload for StorageScheme - ONE_CELL_PER_KEYVALUE_COLUMN 
> --
>
> Key: PHOENIX-3539
> URL: https://issues.apache.org/jira/browse/PHOENIX-3539
> Project: Phoenix
>  Issue Type: Sub-task
>Affects Versions: 4.10.0
>Reporter: Ankit Singhal
>Assignee: Ankit Singhal
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3539.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)