Re: [VOTE] Apache Phoenix 5.0.0-alpha rc1

2018-02-13 Thread rajeshb...@apache.org
+1

- Tested basic and index-related queries, with and without stats, on large
data. All queries work fine.
- Verified cluster restart, region server restart, and compaction-related
cases; all are OK.


Thanks,
Rajeshbabu.

On Tue, Feb 13, 2018 at 4:52 PM, Ankit Singhal 
wrote:

> +1
> - All the tests are passing (except 2 which are flaky and can be ignored).
> - Tested some basic and complex queries on cluster with large data - Ok
>
>
>
> On Tue, Feb 13, 2018 at 12:16 PM, Sergey Soldatov <
> sergeysolda...@gmail.com>
> wrote:
>
> > Tested with basic scenarios with a heavy load to salted/unsalted tables.
> > Looks stable.
> >
> > +1
> >
> > On Mon, Feb 12, 2018 at 8:01 AM, Artem Ervits 
> > wrote:
> >
> > > Hadoop 2.7.5
> > > HBase 2.0-beta1
> > > downloaded binary release: OK
> > > md5: OK
> > > loaded 1M rows with performance.py: OK
> > > ran queries in sqlline: OK
> > > started PQS and ran queries with phoenixdb python client: OK
> > > ran a java Hello World example: OK
> > >
> > >
> > > On Fri, Feb 9, 2018 at 10:34 AM, Josh Elser  wrote:
> > >
> > > > Hello Everyone,
> > > >
> > > > > This is a call for a vote on Apache Phoenix 5.0.0-alpha rc1. Please
> > > > > notice that there are known issues with this release which deserve the
> > > > > "alpha" designation. These are staged on the website[1]. (Atomic upsert
> > > > > does work on my local installation with trivial testing)
> > > > >
> > > > > Over rc0, this release contains the changes: PHOENIX-4586, PHOENIX-4546,
> > > > > PHOENIX-4549, PHOENIX-4582.
> > > >
> > > > The RC is available at the standard location:
> > > >
> > > > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-5.0.0-alpha-HBase-2.0-rc1
> > > > >
> > > > > RC0 is based on the following commit:
> > > > > 451d6a37d0d461b60edff36ceb42b17bb9610350
> > > >
> > > > Signed with my key: 9E62822F4668F17B0972ADD9B7D5CD454677D66C,
> > > > http://pgp.mit.edu/pks/lookup?op=get=0xB7D5CD454677D66C
> > > >
> > > > > Vote will be open for at least 72 hours (2018/02/12 1600GMT). Please
> > > > > vote:
> > > >
> > > > [ ] +1 approve
> > > > [ ] +0 no opinion
> > > > [ ] -1 disapprove (and reason why)
> > > >
> > > > Thanks,
> > > > The Apache Phoenix Team
> > > >
> > > > [1] https://phoenix.apache.org/release_notes.html
> > > >
> > >
> >
>


[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the problem is caused by the
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 below.
Because pk2 is not the leading PK column, the method returns null, so the
expression {{((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))}} is
not pushed to the scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

 

 

  was:
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:

 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the problem is caused by the
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 below.
Because pk2 is not the leading PK column, the method returns null, so the
expression {{((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))}} is
not pushed to the scan:

{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

 

 


> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
>

[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:

 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the problem is caused by the
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 below.
Because pk2 is not the leading PK column, the method returns null, so the
expression {{((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))}} is
not pushed to the scan:

{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

 

 

  was:
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:

 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER OR_NO_LEADING_PK [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

 

I think the problem is caused by the
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 below.
Because pk2 is not the leading PK column, the method returns null, so the
expression {{((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))}} is
not pushed to the scan:

{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

 

 


> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
>

[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}
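
As a rough illustration of why the plan above matters, here is a toy model in Java that counts the rows each strategy would read. It is only a sketch under stated assumptions: one row per (pk1, pk2) pair, pk2 values running from 0 up to a chosen bound, and an idealized skip scan that reads nothing outside the wanted pk2 ranges. The class and method names are hypothetical and are not Phoenix APIs.

```java
/** Toy model only; this is not Phoenix code. */
final class ScanCostSketch {
    /** The OR predicate on pk2 from the query above. */
    static boolean pk2Matches(int pk2) {
        return (pk2 >= 4 && pk2 < 6) || (pk2 >= 8 && pk2 < 9);
    }

    /** Range scan over pk1 in [2, 5): every row in the range is read, then filtered. */
    static int rowsReadByRangeScan(int pk2Values) {
        int read = 0;
        for (int pk1 = 2; pk1 < 5; pk1++) {
            for (int pk2 = 0; pk2 < pk2Values; pk2++) {
                read++; // the row is read before the server-side filter sees it
            }
        }
        return read;
    }

    /** Idealized skip scan: only rows whose pk2 falls in a wanted range are read. */
    static int rowsReadBySkipScan(int pk2Values) {
        int read = 0;
        for (int pk1 = 2; pk1 < 5; pk1++) {
            for (int pk2 = 0; pk2 < pk2Values; pk2++) {
                if (pk2Matches(pk2)) {
                    read++;
                }
            }
        }
        return read;
    }

    public static void main(String[] args) {
        System.out.println("range scan reads " + rowsReadByRangeScan(10) + " rows");
        System.out.println("skip scan reads " + rowsReadBySkipScan(10) + " rows");
    }
}
```

With ten pk2 values per pk1, the range scan in this model reads 30 rows and the skip scan reads 9. A real skip scan also pays for seeks, so the actual gain depends on the data distribution.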

I think the problem is caused by the
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 below.
Because pk2 is not the leading PK column, the method returns null, so the
expression {{((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))}} is
not pushed to the scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

For the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it
seems unnecessary to require that the PK column in the OrExpression be the
leading PK column; it should be enough to guarantee that the OrExpression
references only one PK column.
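
The relaxed condition suggested above could be sketched roughly as follows. This is not the Phoenix implementation or a proposed patch: the class name, the method name, and the representation of a key slot as an {{int[]}} of {pkPosition, lowerInclusive, upperExclusive} are all assumptions made for illustration. The sketch only checks that every OR operand constrains the same PK column (leading or not) and then coalesces the ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from Phoenix. */
final class OrSlotSketch {
    /**
     * Each child is {pkPosition, lowerInclusive, upperExclusive}; a null child
     * stands for an operand that constrains no PK column.
     * Returns the coalesced ranges as text, or null when the OR cannot be
     * expressed as ranges on a single PK column.
     */
    static String orRanges(int[][] children) {
        if (children == null || children.length == 0 || children[0] == null) {
            return null;
        }
        int position = children[0][0];
        for (int[] child : children) {
            // Require one shared PK column, but not necessarily the leading one.
            if (child == null || child[0] != position) {
                return null;
            }
        }
        int[][] sorted = children.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[1], b[1]));
        List<int[]> merged = new ArrayList<>();
        for (int[] r : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && r[1] <= last[1]) {
                last[1] = Math.max(last[1], r[2]); // overlapping or adjacent: extend
            } else {
                merged.add(new int[] {r[1], r[2]});
            }
        }
        StringBuilder out = new StringBuilder("PK" + (position + 1) + " in ");
        for (int i = 0; i < merged.size(); i++) {
            if (i > 0) {
                out.append(" or ");
            }
            out.append('[').append(merged.get(i)[0]).append(',').append(merged.get(i)[1]).append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // pk2 is at zero-based position 1 in PRIMARY KEY (PK1, PK2, PK3).
        System.out.println(orRanges(new int[][] {{1, 4, 6}, {1, 8, 9}}));
    }
}
```

For the query in this issue, both operands constrain pk2, so the sketch yields two disjoint ranges on that column. An OR that mixes pk1 and pk2 operands returns null and would still fall back to a server-side filter.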

 

  was:
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the problem is caused by the
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 below.
Because pk2 is not the leading PK column, the method returns null, so the
expression {{((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))}} is
not pushed to the scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

 

 


> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
>

Re: [VOTE] Apache Phoenix 5.0.0-alpha rc1

2018-02-13 Thread Ankit Singhal
+1
- All the tests are passing (except 2 which are flaky and can be ignored).
- Tested some basic and complex queries on cluster with large data - Ok



On Tue, Feb 13, 2018 at 12:16 PM, Sergey Soldatov 
wrote:

> Tested with basic scenarios with a heavy load to salted/unsalted tables.
> Looks stable.
>
> +1
>
> On Mon, Feb 12, 2018 at 8:01 AM, Artem Ervits 
> wrote:
>
> > Hadoop 2.7.5
> > HBase 2.0-beta1
> > downloaded binary release: OK
> > md5: OK
> > loaded 1M rows with performance.py: OK
> > ran queries in sqlline: OK
> > started PQS and ran queries with phoenixdb python client: OK
> > ran a java Hello World example: OK
> >
> >
> > On Fri, Feb 9, 2018 at 10:34 AM, Josh Elser  wrote:
> >
> > > Hello Everyone,
> > >
> > > This is a call for a vote on Apache Phoenix 5.0.0-alpha rc1. Please
> > > notice that there are known issues with this release which deserve the
> > > "alpha" designation. These are staged on the website[1]. (Atomic upsert
> > > does work on my local installation with trivial testing)
> > >
> > > Over rc0, this release contains the changes: PHOENIX-4586, PHOENIX-4546,
> > > PHOENIX-4549, PHOENIX-4582.
> > >
> > > The RC is available at the standard location:
> > >
> > > https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-5.0.0-alpha-HBase-2.0-rc1
> > >
> > > RC0 is based on the following commit:
> > > 451d6a37d0d461b60edff36ceb42b17bb9610350
> > >
> > > Signed with my key: 9E62822F4668F17B0972ADD9B7D5CD454677D66C,
> > > http://pgp.mit.edu/pks/lookup?op=get=0xB7D5CD454677D66C
> > >
> > > Vote will be open for at least 72 hours (2018/02/12 1600GMT). Please
> > > vote:
> > >
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> > > Thanks,
> > > The Apache Phoenix Team
> > >
> > > [1] https://phoenix.apache.org/release_notes.html
> > >
> >
>


[jira] [Commented] (PHOENIX-4423) Phoenix-hive compilation broken on >=Hive 2.3

2018-02-13 Thread Ankit Singhal (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362176#comment-16362176
 ] 

Ankit Singhal commented on PHOENIX-4423:


[~elserj], [~sergey.soldatov]

Attaching a WIP patch with Hive-3.0.0. Some tests are passing, but two tests
related to joins are still failing on both the Tez and MapReduce clusters.

I'm able to run the tests only from Eclipse, by setting JAVA_HOME in the
environment; when running through Maven, I was getting an NPE in setup. I have
not spent much time on fixing it, but I have sometimes seen the same thing with
the 4.x build as well.

 

Pending items:
 # Fix the failing tests related to joins
 # Should be able to run the tests with Maven verify
 # Need to see whether the same build can work with <3.0.0 (or <2.3.0)

 

 

> Phoenix-hive compilation broken on >=Hive 2.3
> -
>
> Key: PHOENIX-4423
> URL: https://issues.apache.org/jira/browse/PHOENIX-4423
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 5.0.0
>
> Attachments: PHOENIX-4423.002.patch, PHOENIX-4423_wip1.patch
>
>
> HIVE-15167 removed an interface which we're using in Phoenix which obviously 
> fails compilation. Will need to figure out how to work with Hive 1.x, <2.3.0, 
> and >=2.3.0.
> FYI [~sergey.soldatov]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4423) Phoenix-hive compilation broken on >=Hive 2.3

2018-02-13 Thread Ankit Singhal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal updated PHOENIX-4423:
---
Attachment: PHOENIX-4423_wip1.patch

> Phoenix-hive compilation broken on >=Hive 2.3
> -
>
> Key: PHOENIX-4423
> URL: https://issues.apache.org/jira/browse/PHOENIX-4423
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 5.0.0
>
> Attachments: PHOENIX-4423.002.patch, PHOENIX-4423_wip1.patch
>
>
> HIVE-15167 removed an interface which we're using in Phoenix which obviously 
> fails compilation. Will need to figure out how to work with Hive 1.x, <2.3.0, 
> and >=2.3.0.
> FYI [~sergey.soldatov]





[jira] [Created] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)
chenglei created PHOENIX-4602:
-

 Summary: OrExpression should can also push non-leading pk columns to scan
 Key: PHOENIX-4602
 URL: https://issues.apache.org/jira/browse/PHOENIX-4602
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 4.13.0
Reporter: chenglei








[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given following table:

{code:sql}

    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
    DATA INTEGER, 
    CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))

{code}

and a sql:

{code:sql}

  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))

{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:

 {code:sql}

   CLIENT PARALLEL 1-WAY RANGE SCAN OVER OR_NO_LEADING_PK [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))

{code}

 

I think the problem is caused by the
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 below.
Because pk2 is not the leading PK column, the method returns null, so the
expression {{((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))}} is
not pushed to the scan:

{code:java}

757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }

{code}

 

 

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
>





[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:

 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER OR_NO_LEADING_PK [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

 

I think the problem is caused by the
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 below.
Because pk2 is not the leading PK column, the method returns null, so the
expression {{((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))}} is
not pushed to the scan:

{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

 

 

  was:
Given following table:

{code}

    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}

  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:

 {code:sql}

   CLIENT PARALLEL 1-WAY RANGE SCAN OVER OR_NO_LEADING_PK [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))

{code}

 

I think the problem is caused by the
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 below.
Because pk2 is not the leading PK column, the method returns null, so the
expression {{((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))}} is
not pushed to the scan:

{code:java}

757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }

{code}

 

 


> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
>

[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given following table:

{code}

    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}

  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case where the query should use SkipScanFilter. However, the
query does not use a skip scan; it uses a range scan and pushes only the
leading PK column expression {{(t.pk1 >= 2 and t.pk1 < 5)}} to the scan. The
explain plan is:

 {code:sql}

   CLIENT PARALLEL 1-WAY RANGE SCAN OVER OR_NO_LEADING_PK [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))

{code}

 

I think the problem is caused by the 
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 in the 
excerpt below. Because pk2 is not the leading PK column, the method returns 
null, so the expression 
{{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed to 
the scan:

{code:java}

757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760    for (KeySlot slot : childSlot) {
761        if (hasFirstSlot) {
762            // if the first slot is null, return null immediately
763            if (slot == null) {
764                return null;
765            }
766            // mark that we've handled the first slot
767            hasFirstSlot = false;
768        }

{code}

 

 

  was:
Given following table:

{code:sql}

    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
    DATA INTEGER, 
    CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))

{code}

and a sql:

{code:sql}

  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))

{code}

Obviously, it is a typical case for the sql to use SkipScanFilter,however, the 
sql actually does not use Skip Scan, it use Range Scan and just push the 
leading pk column expression \{{ (t.pk1 >=2 and t.pk1<5)}} to scan,the explain 
is :

 \{code:sql}

   CLIENT PARALLEL 1-WAY RANGE SCAN OVER OR_NO_LEADING_PK [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))

{code}

 

I think the problem is affected by the 
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, in the following line 
763, because the pk2 column is not the leading pk column,so this

method return null, causing the expression \{{ ((t.pk2 >= 4 and t.pk2 <6) or 
(t.pk2 >= 8 and t.pk2 <9))}} is not push to scan

{code:java}

757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots 
specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }

{code}

 

 


> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
>
> Given following table:
> {code}
>     CREATE TABLE test_table (
>      PK1 INTEGER NOT NULL,
>      PK2 INTEGER NOT NULL,
>      PK3 INTEGER NOT NULL,
>      DATA INTEGER, 
>      CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and a sql:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 
> and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> Obviously, it is a typical case for the sql to use SkipScanFilter,however, 
> the sql actually does not use Skip Scan, it use Range Scan and just push the 
> leading pk column expression \{{ (t.pk1 >=2 and t.pk1<5)}} to scan,the 
> explain is :
>  {code:sql}
>    CLIENT PARALLEL 1-WAY RANGE SCAN OVER OR_NO_LEADING_PK [2] - [5]
>    SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
> {code}
>  
> I think the problem is affected by the 
> WhereOptimizer.KeyExpressionVisitor.orKeySlots method, in the following line 
> 763, because the pk2 column is not the leading pk column,so this
> method return null, causing the expression \{{ ((t.pk2 >= 4 and t.pk2 <6) or 
> (t.pk2 >= 8 and t.pk2 <9))}} is not push to 

[jira] [Commented] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362400#comment-16362400
 ] 

chenglei commented on PHOENIX-4602:
---

I uploaded my first patch. Please help review it, thanks.

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
> Attachments: PHOENIX-4602_v1.patch
>
>
> Given following table:
> {code}
>     CREATE TABLE test_table (
>      PK1 INTEGER NOT NULL,
>      PK2 INTEGER NOT NULL,
>      PK3 INTEGER NOT NULL,
>      DATA INTEGER, 
>      CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and a sql:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 
> and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> Obviously, it is a typical case for the sql to use SkipScanFilter,however, 
> the sql actually does not use Skip Scan, it use Range Scan and just push the 
> leading pk column expression \{{ (t.pk1 >=2 and t.pk1<5)}} to scan,the 
> explain is :
>  {code:sql}
>    CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
>    SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
> {code}
>  I think the problem is affected by the 
> WhereOptimizer.KeyExpressionVisitor.orKeySlots method, in the following line 
> 763, because the pk2 column is not the leading pk column,so this
> method return null, causing the expression {{ ((t.pk2 >= 4 and t.pk2 <6) or 
> (t.pk2 >= 8 and t.pk2 <9))}} does not  pushed to scan
> {code:java}
> 757    boolean hasFirstSlot = true;
> 758    boolean prevIsNull = false;
> 759    // TODO: Do the same optimization that we do for IN if the childSlots 
> specify a fully qualified row key
> 760   for (KeySlot slot : childSlot) {
> 761      if (hasFirstSlot) {
> 762           // if the first slot is null, return null immediately
> 763           if (slot == null) {
> 764                return null;
> 765            }
> 766           // mark that we've handled the first slot
> 767           hasFirstSlot = false;
> 768      }
> {code}
> For above {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, it seems 
> that it is not necessary to make sure the pk column in OrExpression is 
> leading pk column,guarantee there is only one PK Column in OrExpression is 
> enough.  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread John Leach (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362414#comment-16362414
 ] 

John Leach commented on PHOENIX-4602:
-

[~comnetwork] I am new to Phoenix, but when I look at WhereOptimizer.java 
it is not clear to me how or when the predicates are converted to Conjunctive 
Normal Form (https://en.wikipedia.org/wiki/Conjunctive_normal_form). I have 
always seen the following process when dealing with predicates:

1.  Convert all predicates to Conjunctive Normal Form.

2.  Mark the predicates on the key.

3.  Apply a function to the key predicates to assemble a set of scans.

4.  Apply a function to the remaining predicates to assemble a filter (an IN 
list is usually an exception).

Do you know where CNF occurs?
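
As a rough illustration of steps 2 to 4 (not Phoenix code), the sketch below 
assumes the WHERE clause has already been split into top-level conjuncts and 
that each conjunct carries the set of columns it references. The class and 
method names are hypothetical, and the DATA > 0 conjunct is added purely for 
illustration. A conjunct counts as a key predicate only if every column it 
references is a PK column; everything else is left for the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PredicateSplitSketch {
    /** One top-level conjunct of a WHERE clause plus the columns it references (hypothetical type). */
    static final class Conjunct {
        final String text;
        final Set<String> columns;

        Conjunct(String text, String... columns) {
            this.text = text;
            this.columns = new HashSet<>(Arrays.asList(columns));
        }
    }

    /** Step 2: a conjunct is "on the key" when it references only PK columns. */
    static boolean isKeyPredicate(Conjunct c, List<String> pkColumns) {
        return !c.columns.isEmpty() && pkColumns.containsAll(c.columns);
    }

    /** Input to step 3: the conjuncts that may be used to assemble scans. */
    static List<String> keyPredicates(List<Conjunct> conjuncts, List<String> pkColumns) {
        List<String> result = new ArrayList<>();
        for (Conjunct c : conjuncts) {
            if (isKeyPredicate(c, pkColumns)) {
                result.add(c.text);
            }
        }
        return result;
    }

    /** Input to step 4: the conjuncts that remain for the filter. */
    static List<String> filterPredicates(List<Conjunct> conjuncts, List<String> pkColumns) {
        List<String> result = new ArrayList<>();
        for (Conjunct c : conjuncts) {
            if (!isKeyPredicate(c, pkColumns)) {
                result.add(c.text);
            }
        }
        return result;
    }

    /** The query from this issue, plus one non-key conjunct added for illustration. */
    static List<Conjunct> exampleConjuncts() {
        return Arrays.asList(
            new Conjunct("PK1 >= 2", "PK1"),
            new Conjunct("PK1 < 5", "PK1"),
            new Conjunct("(PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9)", "PK2"),
            new Conjunct("DATA > 0", "DATA"));
    }

    public static void main(String[] args) {
        List<String> pk = Arrays.asList("PK1", "PK2", "PK3");
        System.out.println("key:    " + keyPredicates(exampleConjuncts(), pk));
        System.out.println("filter: " + filterPredicates(exampleConjuncts(), pk));
    }
}
```

Under this model the OR on PK2 is a key predicate like the two PK1 
comparisons, which is the outcome this issue asks for.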

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
> Attachments: PHOENIX-4602_v1.patch
>
>
> Given following table:
> {code}
>     CREATE TABLE test_table (
>      PK1 INTEGER NOT NULL,
>      PK2 INTEGER NOT NULL,
>      PK3 INTEGER NOT NULL,
>  DATA INTEGER, 
>      CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and a sql:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 
> and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> Obviously, it is a typical case for the sql to use SkipScanFilter,however, 
> the sql actually does not use Skip Scan, it use Range Scan and just push the 
> leading pk column expression {{(t.pk1 >=2 and t.pk1<5)}} to scan,the explain 
> sql is :
>  {code:sql}
>    CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
>    SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
> {code}
>  I think the problem is affected by the 
> {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, in the following 
> line 763, because the pk2 column is not the leading pk column,so this method 
> return null, causing the expression 
> {{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not  pushed 
> to scan:
> {code:java}
> 757    boolean hasFirstSlot = true;
> 758    boolean prevIsNull = false;
> 759    // TODO: Do the same optimization that we do for IN if the childSlots 
> specify a fully qualified row key
> 760   for (KeySlot slot : childSlot) {
> 761      if (hasFirstSlot) {
> 762           // if the first slot is null, return null immediately
> 763           if (slot == null) {
> 764                return null;
> 765            }
> 766           // mark that we've handled the first slot
> 767           hasFirstSlot = false;
> 768      }
> {code}
> For above {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, it seems 
> that it is not necessary to make sure the PK Column in OrExpression is 
> leading PK Column,just guarantee there is only one PK Column in OrExpression 
> is enough.  
>  





[jira] [Comment Edited] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362469#comment-16362469
 ] 

chenglei edited comment on PHOENIX-4602 at 2/13/18 3:41 PM:


[~jleach], in fact, Phoenix does not convert the WHERE predicates to CNF as a 
first step. However, after the WhereOptimizer.pushKeyExpressionsToScan method 
finishes, you effectively get a CNF in SkipScanFilter.slots. You can write a 
simple test to verify this, and you can mail dev@phoenix.apache.org if you 
have more questions.
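
One way to picture this (a simplified model, not the actual SkipScanFilter 
class): treat the slots as a list with one entry per PK column, where each 
entry is a list of ranges. Ranges within one entry are ORed and the entries 
are ANDed, which is the shape of a CNF. The Range type and the method names 
below are hypothetical, and the example values are the ones the query in this 
issue would yield if both the PK1 and the PK2 conditions were pushed to the 
scan.

```java
import java.util.Arrays;
import java.util.List;

class SlotsSketch {
    /** Half-open integer range [lower, upper) on one PK column (hypothetical type). */
    static final class Range {
        final int lower;
        final int upper;

        Range(int lower, int upper) {
            this.lower = lower;
            this.upper = upper;
        }

        boolean contains(int value) {
            return value >= lower && value < upper;
        }
    }

    /**
     * slots.get(i) holds the ranges for the i-th PK column.
     * A key matches when, for every slot (AND), at least one range contains
     * the corresponding key value (OR).
     */
    static boolean matches(List<List<Range>> slots, int[] key) {
        for (int i = 0; i < slots.size(); i++) {
            boolean anyRangeMatches = false;
            for (Range range : slots.get(i)) {
                if (range.contains(key[i])) {
                    anyRangeMatches = true;
                    break;
                }
            }
            if (!anyRangeMatches) {
                return false;
            }
        }
        return true;
    }

    /** (pk1 >= 2 and pk1 < 5) and ((pk2 >= 4 and pk2 < 6) or (pk2 >= 8 and pk2 < 9)) */
    static List<List<Range>> exampleSlots() {
        return Arrays.asList(
            Arrays.asList(new Range(2, 5)),
            Arrays.asList(new Range(4, 6), new Range(8, 9)));
    }

    public static void main(String[] args) {
        System.out.println(matches(exampleSlots(), new int[] {3, 4}));
        System.out.println(matches(exampleSlots(), new int[] {3, 7}));
    }
}
```

In this model a key such as (3, 4) matches, while (3, 7) does not because 7 
falls between the two PK2 ranges.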


was (Author: comnetwork):
[~jleach], Phoenix does not convert the where predicates to CNF, you can make a 
simple test to verify it , and you can mail to dev@phoenix.apache.org if you 
have more questions.

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
> Attachments: PHOENIX-4602_v1.patch
>
>
> Given following table:
> {code}
>     CREATE TABLE test_table (
>      PK1 INTEGER NOT NULL,
>      PK2 INTEGER NOT NULL,
>      PK3 INTEGER NOT NULL,
>  DATA INTEGER, 
>      CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and a sql:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 
> and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> Obviously, it is a typical case for the sql to use SkipScanFilter,however, 
> the sql actually does not use Skip Scan, it use Range Scan and just push the 
> leading pk column expression {{(t.pk1 >=2 and t.pk1<5)}} to scan,the explain 
> sql is :
>  {code:sql}
>    CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
>    SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
> {code}
>  I think the problem is affected by the 
> {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, in the following 
> line 763, because the pk2 column is not the leading pk column,so this method 
> return null, causing the expression 
> {{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not  pushed 
> to scan:
> {code:java}
> 757    boolean hasFirstSlot = true;
> 758    boolean prevIsNull = false;
> 759    // TODO: Do the same optimization that we do for IN if the childSlots 
> specify a fully qualified row key
> 760   for (KeySlot slot : childSlot) {
> 761      if (hasFirstSlot) {
> 762           // if the first slot is null, return null immediately
> 763           if (slot == null) {
> 764                return null;
> 765            }
> 766           // mark that we've handled the first slot
> 767           hasFirstSlot = false;
> 768      }
> {code}
> For above {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, it seems 
> that it is not necessary to make sure the PK Column in OrExpression is 
> leading PK Column,just guarantee there is only one PK Column in OrExpression 
> is enough.  
>  





[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given the following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and the following query:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case in which the query should use SkipScanFilter. However, 
the query does not actually use a skip scan: it uses a range scan and pushes 
only the leading PK column expression {{(t.pk1 >=2 and t.pk1<5)}} to the scan. 
The explain output is:
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the problem is caused by the 
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, at line 763 in the 
excerpt below. Because pk2 is not the leading PK column, the method returns 
null, so the expression 
{{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed to 
the scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760    for (KeySlot slot : childSlot) {
761        if (hasFirstSlot) {
762            // if the first slot is null, return null immediately
763            if (slot == null) {
764                return null;
765            }
766            // mark that we've handled the first slot
767            hasFirstSlot = false;
768        }
{code}

For the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it 
seems unnecessary to require that the PK column in the OrExpression be the 
leading PK column; it should be enough to guarantee that the OrExpression 
references only one PK column.
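
To make the proposed relaxation concrete, here is a small sketch (an 
illustration of the idea only, neither the patch nor Phoenix code). It assumes 
each branch of the OR has already been reduced to the zero-based position of 
the PK column it constrains, or null when the branch is not on a PK column; 
these inputs and all names are hypothetical. The first check models the 
behaviour described above, the second models the proposal.

```java
import java.util.Arrays;
import java.util.List;

class OrPushdownSketch {
    /**
     * Returns the PK position shared by every branch of the OR, or -1 when a
     * branch is not on a PK column or the branches touch different PK columns.
     */
    static int singlePkPosition(List<Integer> branchPkPositions) {
        int position = -1;
        for (Integer branchPosition : branchPkPositions) {
            if (branchPosition == null) {
                return -1; // this branch does not constrain a PK column
            }
            if (position == -1) {
                position = branchPosition;
            } else if (position != branchPosition) {
                return -1; // branches constrain different PK columns
            }
        }
        return position;
    }

    /** Behaviour as described in this issue: only an OR on the leading PK column is pushed. */
    static boolean pushableLeadingOnly(List<Integer> branchPkPositions) {
        return singlePkPosition(branchPkPositions) == 0;
    }

    /** Proposed behaviour: an OR on any single PK column can be pushed. */
    static boolean pushableAnySinglePk(List<Integer> branchPkPositions) {
        return singlePkPosition(branchPkPositions) >= 0;
    }

    public static void main(String[] args) {
        // Both branches of the OR in the example query constrain PK2 (position 1).
        List<Integer> exampleBranches = Arrays.asList(1, 1);
        System.out.println(pushableLeadingOnly(exampleBranches));
        System.out.println(pushableAnySinglePk(exampleBranches));
    }
}
```

For the example query both branches constrain PK2, so the leading-only check 
rejects the OR while the single-PK-column check accepts it.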
 

 

  was:
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

Obviously, it is a typical case for the sql to use SkipScanFilter,however, the 
sql actually does not use Skip Scan, it use Range Scan and just push the 
leading pk column expression \{{ (t.pk1 >=2 and t.pk1<5)}} to scan,the explain 
is :
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

 I think the problem is affected by the 
WhereOptimizer.KeyExpressionVisitor.orKeySlots method, in the following line 
763, because the pk2 column is not the leading pk column,so this
method return null, causing the expression {{ ((t.pk2 >= 4 and t.pk2 <6) or 
(t.pk2 >= 8 and t.pk2 <9))}} does not  pushed to scan
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots 
specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

For above {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, it seems 
that it is not necessary to make sure the pk column in OrExpression is leading 
pk column,guarantee there is only one PK Column in OrExpression is enough.  

 


> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
> Attachments: PHOENIX-4602_v1.patch
>
>
> Given following table:
> {code}
>     CREATE TABLE test_table (
>      PK1 INTEGER NOT NULL,
>      PK2 INTEGER NOT NULL,
>      PK3 INTEGER NOT NULL,
>      DATA INTEGER, 
>      CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and a sql:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 
> and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> Obviously, it is a typical case for the sql to use SkipScanFilter,however, 
> the sql actually does not use Skip Scan, it use Range Scan and just push the 
> leading pk column expression {{ (t.pk1 >=2 and t.pk1<5)}} to scan,the 

[jira] [Commented] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread John Leach (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362510#comment-16362510
 ] 

John Leach commented on PHOENIX-4602:
-

[~comnetwork] Thank you for the pointer in the code!

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
> Attachments: PHOENIX-4602_v1.patch
>
>
> Given following table:
> {code}
>     CREATE TABLE test_table (
>      PK1 INTEGER NOT NULL,
>      PK2 INTEGER NOT NULL,
>      PK3 INTEGER NOT NULL,
>  DATA INTEGER, 
>      CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and a sql:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 
> and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> Obviously, it is a typical case for the sql to use SkipScanFilter,however, 
> the sql actually does not use Skip Scan, it use Range Scan and just push the 
> leading pk column expression {{(t.pk1 >=2 and t.pk1<5)}} to scan,the explain 
> sql is :
>  {code:sql}
>    CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
>    SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
> {code}
>  I think the problem is affected by the 
> {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, in the following 
> line 763, because the pk2 column is not the leading pk column,so this method 
> return null, causing the expression 
> {{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not  pushed 
> to scan:
> {code:java}
> 757    boolean hasFirstSlot = true;
> 758    boolean prevIsNull = false;
> 759    // TODO: Do the same optimization that we do for IN if the childSlots 
> specify a fully qualified row key
> 760   for (KeySlot slot : childSlot) {
> 761      if (hasFirstSlot) {
> 762           // if the first slot is null, return null immediately
> 763           if (slot == null) {
> 764                return null;
> 765            }
> 766           // mark that we've handled the first slot
> 767           hasFirstSlot = false;
> 768      }
> {code}
> For above {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, it seems 
> that it is not necessary to make sure the PK Column in OrExpression is 
> leading PK Column,just guarantee there is only one PK Column in OrExpression 
> is enough.  
>  





[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given the following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER,
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and the following query:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case in which the query should use SkipScanFilter. However, 
the query does not actually use a skip scan: it uses a range scan and pushes 
only the leading PK column expression {{(t.pk1 >=2 and t.pk1<5)}} to the scan. 
The explain output is:
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the problem is caused by the 
{{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, at line 763 in the 
excerpt below. Because pk2 is not the leading PK column, the method returns 
null, so the expression 
{{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed to 
the scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760    for (KeySlot slot : childSlot) {
761        if (hasFirstSlot) {
762            // if the first slot is null, return null immediately
763            if (slot == null) {
764                return null;
765            }
766            // mark that we've handled the first slot
767            hasFirstSlot = false;
768        }
{code}

For the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it 
seems unnecessary to require that the PK column in the OrExpression be the 
leading PK column; it should be enough to guarantee that the OrExpression 
references only one PK column.
 

 

  was:
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

Obviously, it is a typical case for the sql to use SkipScanFilter,however, the 
sql actually does not use Skip Scan, it use Range Scan and just push the 
leading pk column expression {{(t.pk1 >=2 and t.pk1<5)}} to scan,the explain 
sql is :
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

 I think the problem is affected by the 
{{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, in the following 
line 763, because the pk2 column is not the leading pk column,so this method 
return null, causing the expression 
{{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not  pushed to 
scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots 
specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

For above {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, it seems 
that it is not necessary to make sure the PK Column in OrExpression is leading 
PK Column,just guarantee there is only one PK Column in OrExpression is enough. 
 

 


> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
> Attachments: PHOENIX-4602_v1.patch
>
>
> Given following table:
> {code}
>     CREATE TABLE test_table (
>      PK1 INTEGER NOT NULL,
>      PK2 INTEGER NOT NULL,
>      PK3 INTEGER NOT NULL,
>   DATA INTEGER, 
>      CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and a sql:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 
> and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> Obviously, it is a typical case for the sql to use SkipScanFilter,however, 
> the sql actually does not use Skip Scan, it use Range Scan and just push the 
> leading pk column expression {{(t.pk1 >=2 and t.pk1<5)}} to 

[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given the following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and the following query:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case in which the query should use SkipScanFilter. However, 
the query does not actually use a skip scan: it uses a range scan and pushes 
only the leading PK column expression {{(t.pk1 >=2 and t.pk1<5)}} to the scan. 
The explain output is:
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the problem is caused by the 
{{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, at line 763 in the 
excerpt below. Because pk2 is not the leading PK column, the method returns 
null, so the expression 
{{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed to 
the scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760    for (KeySlot slot : childSlot) {
761        if (hasFirstSlot) {
762            // if the first slot is null, return null immediately
763            if (slot == null) {
764                return null;
765            }
766            // mark that we've handled the first slot
767            hasFirstSlot = false;
768        }
{code}

For the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it 
seems unnecessary to require that the PK column in the OrExpression be the 
leading PK column; it should be enough to guarantee that the OrExpression 
references only one PK column.
 

 

  was:
Given following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and a sql:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

Obviously, it is a typical case for the sql to use SkipScanFilter,however, the 
sql actually does not use Skip Scan, it use Range Scan and just push the 
leading pk column expression {{(t.pk1 >=2 and t.pk1<5)}} to scan,the explain 
sql is :
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

 I think the problem is affected by the 
{{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, in the following 
line 763, because the pk2 column is not the leading pk column,so this method 
return null, causing the expression 
{{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} does not  pushed 
to scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots 
specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

For above {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, it seems 
that it is not necessary to make sure the PK Column in OrExpression is leading 
PK Column,just guarantee there is only one PK Column in OrExpression is enough. 
 

 


> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
> Attachments: PHOENIX-4602_v1.patch
>
>
> Given following table:
> {code}
>     CREATE TABLE test_table (
>      PK1 INTEGER NOT NULL,
>      PK2 INTEGER NOT NULL,
>      PK3 INTEGER NOT NULL,
>      DATA INTEGER, 
>      CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and a sql:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 
> and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> Obviously, it is a typical case for the sql to use SkipScanFilter,however, 
> the sql actually does not use Skip Scan, it use Range Scan and just push the 
> leading pk column expression {{(t.pk1 >=2 and t.pk1<5)}} to 

[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given the following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER,
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and the following query:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This is a typical case in which the query should use SkipScanFilter. However, 
the query does not actually use a skip scan: it uses a range scan and pushes 
only the leading PK column expression {{(t.pk1 >=2 and t.pk1<5)}} to the scan. 
The explain output is:
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the problem is caused by the 
{{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, at line 763 in the 
excerpt below. Because pk2 is not the leading PK column, the method returns 
null, so the expression 
{{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed to 
the scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760    for (KeySlot slot : childSlot) {
761        if (hasFirstSlot) {
762            // if the first slot is null, return null immediately
763            if (slot == null) {
764                return null;
765            }
766            // mark that we've handled the first slot
767            hasFirstSlot = false;
768        }
{code}

In the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it does 
not seem necessary to require that the PK column in the OrExpression be the 
leading PK column. It should be enough to guarantee that the OrExpression 
references only one PK column.
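
The rule proposed above can be illustrated with a small, self-contained Java sketch. This is not Phoenix code: `OrSlotSketch`, `ColumnRange`, and `orRanges` are hypothetical names invented for illustration, and the real `orKeySlots` method handles many cases that are ignored here. The sketch only assumes that each child of the OR can be summarized as a range on one PK column, and shows that the OR is treated as pushable when all children constrain the same PK column, whether or not that column is the leading one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Phoenix.
final class OrSlotSketch {

    /** A half-open range [lower, upper) on the PK column at pkPosition (0-based). */
    static final class ColumnRange {
        final int pkPosition;
        final int lower;
        final int upper;

        ColumnRange(int pkPosition, int lower, int upper) {
            this.pkPosition = pkPosition;
            this.lower = lower;
            this.upper = upper;
        }
    }

    /**
     * Collects the ranges of an OR's children when they all constrain the same
     * PK column. Returns null, meaning "do not push to the scan", when a child
     * has no key range or when the children constrain different PK columns.
     */
    static List<ColumnRange> orRanges(List<ColumnRange> children) {
        if (children.isEmpty()) {
            return null;
        }
        int position = -1;
        List<ColumnRange> ranges = new ArrayList<>();
        for (ColumnRange child : children) {
            if (child == null) {
                return null; // a child that cannot be expressed as a key range
            }
            if (position == -1) {
                position = child.pkPosition; // the first child fixes the column
            } else if (position != child.pkPosition) {
                return null; // more than one PK column inside the OR
            }
            ranges.add(child);
        }
        return ranges;
    }

    public static void main(String[] args) {
        // (pk2 >= 4 and pk2 < 6) or (pk2 >= 8 and pk2 < 9); pk2 is at position 1
        List<ColumnRange> samePk = orRanges(Arrays.asList(
                new ColumnRange(1, 4, 6), new ColumnRange(1, 8, 9)));
        System.out.println(samePk != null); // true: pushable under the proposed rule

        // A pk2 range OR-ed with a pk3 range: two PK columns inside one OR
        List<ColumnRange> mixedPk = orRanges(Arrays.asList(
                new ColumnRange(1, 4, 6), new ColumnRange(2, 8, 9)));
        System.out.println(mixedPk != null); // false: stays a server-side filter
    }
}
```

Note that the check never looks at whether `pkPosition` is 0, which is the point of the proposal: only the "single PK column" condition matters in this simplified model.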
 

 


> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
> Attachments: PHOENIX-4602_v1.patch
>
>

[jira] [Commented] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362469#comment-16362469
 ] 

chenglei commented on PHOENIX-4602:
---

[~jleach], Phoenix does not convert the WHERE predicates to CNF. You can run a 
simple test to verify this, and you can mail dev@phoenix.apache.org if you 
have more questions.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362469#comment-16362469
 ] 

chenglei edited comment on PHOENIX-4602 at 2/13/18 3:48 PM:


[~jleach], in fact, Phoenix does not convert the WHERE predicate expression to 
a CNF expression as a first step. However, after the 
WhereOptimizer.pushKeyExpressionsToScan method finishes, you effectively get a 
CNF for the PK columns in SkipScanFilter.slots. You can run a simple test to 
verify this, and you can mail dev@phoenix.apache.org if you have more questions.
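
To make the "CNF for the PK columns" remark concrete, here is a minimal Java sketch under stated assumptions. It does not use the real `SkipScanFilter`; the class `SkipScanSlotsSketch` and its `matches` method are hypothetical, and ranges are simplified to half-open integer intervals. The only idea carried over is the shape: one slot per PK column, the ranges inside a slot are OR-ed, and the slots are AND-ed together.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of per-column slots; not the Phoenix SkipScanFilter.
final class SkipScanSlotsSketch {

    /**
     * slots.get(i) holds the OR-ed half-open ranges {lower, upper} for PK column i.
     * A row key matches when every slot has at least one range containing the
     * corresponding key value (AND across slots, OR within a slot).
     */
    static boolean matches(List<List<int[]>> slots, int[] rowKey) {
        for (int i = 0; i < slots.size(); i++) {
            boolean anyRangeMatches = false;
            for (int[] range : slots.get(i)) {
                if (rowKey[i] >= range[0] && rowKey[i] < range[1]) {
                    anyRangeMatches = true;
                    break;
                }
            }
            if (!anyRangeMatches) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // (pk1 >= 2 and pk1 < 5) and ((pk2 >= 4 and pk2 < 6) or (pk2 >= 8 and pk2 < 9))
        List<List<int[]>> slots = Arrays.asList(
                Arrays.<int[]>asList(new int[] {2, 5}),
                Arrays.<int[]>asList(new int[] {4, 6}, new int[] {8, 9}));

        System.out.println(matches(slots, new int[] {3, 4, 0})); // true
        System.out.println(matches(slots, new int[] {3, 7, 0})); // false: pk2 = 7 is in neither range
        System.out.println(matches(slots, new int[] {5, 4, 0})); // false: pk1 = 5 is outside [2, 5)
    }
}
```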


was (Author: comnetwork):
[~jleach], in fact, Phoenix does not convert the WHERE predicate expression to 
a CNF expression as a first step. However, after the 
WhereOptimizer.pushKeyExpressionsToScan method finishes, you effectively get a 
CNF in SkipScanFilter.slots. You can run a simple test to verify this, and you 
can mail dev@phoenix.apache.org if you have more questions.



[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Attachment: PHOENIX-4602_v1.patch



[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given the following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and the following SQL:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This looks like a typical case for SkipScanFilter. However, the query does not 
use a skip scan: it uses a range scan and pushes only the leading PK column 
expression {{ (t.pk1 >=2 and t.pk1<5) }} to the scan. The explain plan is:
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the cause is in the 
{{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, at line 763 of the 
following excerpt. Because pk2 is not the leading PK column, the method 
returns null, so the expression 
{{ ((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9)) }} is not pushed 
to the scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

In the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it does 
not seem necessary to require that the PK column in the OrExpression be the 
leading PK column. It should be enough to guarantee that the OrExpression 
references only one PK column.
 

 


[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Description: 
Given the following table:

{code}
    CREATE TABLE test_table (
     PK1 INTEGER NOT NULL,
     PK2 INTEGER NOT NULL,
     PK3 INTEGER NOT NULL,
     DATA INTEGER, 
     CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
{code}

and the following SQL:

{code}
  select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and 
t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
{code}

This looks like a typical case for SkipScanFilter. However, the query does not 
use a skip scan: it uses a range scan and pushes only the leading PK column 
expression {{(t.pk1 >=2 and t.pk1<5)}} to the scan. The explain plan is:
 {code:sql}
   CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
   SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
{code}

I think the cause is in the 
{{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, at line 763 of the 
following excerpt. Because pk2 is not the leading PK column, the method 
returns null, so the expression 
{{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed to 
the scan:
{code:java}
757    boolean hasFirstSlot = true;
758    boolean prevIsNull = false;
759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
760   for (KeySlot slot : childSlot) {
761      if (hasFirstSlot) {
762           // if the first slot is null, return null immediately
763           if (slot == null) {
764                return null;
765            }
766           // mark that we've handled the first slot
767           hasFirstSlot = false;
768      }
{code}

In the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it does 
not seem necessary to require that the PK column in the OrExpression be the 
leading PK column. It should be enough to guarantee that the OrExpression 
references only one PK column.
 

 


[jira] [Commented] (PHOENIX-4533) Phoenix Query Server should not use SPNEGO principal to proxy user requests

2018-02-13 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362628#comment-16362628
 ] 

Josh Elser commented on PHOENIX-4533:
-

Pushed this to the 4.x and 5.x branches. Thanks again, [~lbronshtein].

One final thing: any interest in updating the website with content for the new 
configuration properties you've added?

We'd want to add them to https://phoenix.apache.org/server.html. 
https://phoenix.apache.org/building_website.html has instructions on how to do 
this. If you can get a diff against the website, I'd happily apply that too. 
Else, I'll just throw up something today myself.

> Phoenix Query Server should not use SPNEGO principal to proxy user requests
> ---
>
> Key: PHOENIX-4533
> URL: https://issues.apache.org/jira/browse/PHOENIX-4533
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Lev Bronshtein
>Assignee: Lev Bronshtein
>Priority: Minor
> Fix For: 5.0.0, 4.14.0
>
> Attachments: PHOENIX-4533.1.patch, PHOENIX-4533.2.patch, 
> PHOENIX-4533.3.patch, PHOENIX-4533.squash.patch
>
>
> Currently the HTTP/ principal is used by various components in the HADOOP 
> ecosystem to perform SPNEGO authentication.  Since there can only be one 
> HTTP/ per host, even outside of the Hadoop ecosystem, the keytab containing 
> key material for local HTTP/ principal is shared among a few applications.  
> With so many applications having access to the HTTP/ credentials, this 
> increases the chances of an attack on the proxy user capabilities of Hadoop.  
> This JIRA proposes that two different key tabs can be used to
> 1. Authenticate kerberized web requests
> 2. Communicate with the phoenix back end





[jira] [Assigned] (PHOENIX-4592) BaseResultIterators.getStatsForParallelizationProp() should use retry looking up the table without tenantId if cannot find the table using the tenantId

2018-02-13 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva reassigned PHOENIX-4592:
---

Assignee: Thomas D'Silva

> BaseResultIterators.getStatsForParallelizationProp() should use retry looking 
> up the table without tenantId if cannot find the table using the tenantId
> ---
>
> Key: PHOENIX-4592
> URL: https://issues.apache.org/jira/browse/PHOENIX-4592
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
>Priority: Major
>
> Running a query using a tenant-specific connection logs the following warning:
> {code}
> 2018-02-09 17:41:45,497 WARN  [main] iterate.BaseResultIterators - Unable to find parent table "X" of table "X" to determine USE_STATS_FOR_PARALLELIZATION
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=X
>   at org.apache.phoenix.schema.PMetaDataImpl.getTableRef(PMetaDataImpl.java:71)
>   at org.apache.phoenix.jdbc.PhoenixConnection.getTable(PhoenixConnection.java:567)
>   at org.apache.phoenix.iterate.BaseResultIterators.getStatsForParallelizationProp(BaseResultIterators.java:1282)
>   at org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:500)
>   at org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:67)
>   at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:240)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:345)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:207)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:202)
>   at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:309)
>   at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:289)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:288)
>   at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:282)
>   at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1692)
>   at sqlline.Commands.execute(Commands.java:822)
>   at sqlline.Commands.sql(Commands.java:732)
>   at sqlline.SqlLine.dispatch(SqlLine.java:807)
>   at sqlline.SqlLine.begin(SqlLine.java:681)
>   at sqlline.SqlLine.start(SqlLine.java:398)
>   at sqlline.SqlLine.main(SqlLine.java:292)
> {code}
> The following code needs to be modified:
> {code}
> if (table.getType() == PTableType.INDEX && table.getParentName() != null) {
>     PhoenixConnection conn = context.getConnection();
>     String parentTableName = table.getParentName().getString();
>     try {
>         PTable parentTable =
>                 conn.getTable(new PTableKey(conn.getTenantId(), parentTableName));
>         useStats = parentTable.useStatsForParallelization();
>         if (useStats != null) {
>             return useStats;
>         }
>     } catch (TableNotFoundException e) {
>         logger.warn("Unable to find parent table \"" + parentTableName + "\" of table \""
>                 + table.getName().getString()
>                 + "\" to determine USE_STATS_FOR_PARALLELIZATION",
>                 e);
>     }
> }
> {code}
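
The retry described in the issue title can be sketched in plain Java, assuming a simple in-memory cache. This is not the Phoenix metadata API: `TenantFallbackLookupSketch`, `register`, and `lookupUseStats` are hypothetical names, and the string key is only a stand-in for `PTableKey`. The sketch shows the intended order of lookups: first with the connection's tenant id, then without it when the first lookup finds nothing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a metadata cache; not the Phoenix PMetaData API.
final class TenantFallbackLookupSketch {

    // Keyed by "<tenantId>|<tableName>"; global tables use an empty tenant id.
    private final Map<String, Boolean> useStatsByKey = new HashMap<>();

    private static String key(String tenantId, String tableName) {
        return (tenantId == null ? "" : tenantId) + "|" + tableName;
    }

    void register(String tenantId, String tableName, boolean useStats) {
        useStatsByKey.put(key(tenantId, tableName), useStats);
    }

    /** Looks up the parent table for the tenant first, then retries as a global table. */
    Optional<Boolean> lookupUseStats(String tenantId, String parentTableName) {
        Boolean useStats = useStatsByKey.get(key(tenantId, parentTableName));
        if (useStats == null && tenantId != null) {
            // Not found under the tenant: retry without the tenant id.
            useStats = useStatsByKey.get(key(null, parentTableName));
        }
        return Optional.ofNullable(useStats);
    }

    public static void main(String[] args) {
        TenantFallbackLookupSketch cache = new TenantFallbackLookupSketch();
        cache.register(null, "X", true); // parent table "X" exists only as a global table

        System.out.println(cache.lookupUseStats("tenant1", "X").isPresent()); // true
        System.out.println(cache.lookupUseStats("tenant1", "Y").isPresent()); // false
    }
}
```

With a fallback of this shape, a warning would only be logged when both lookups fail.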





[jira] [Updated] (PHOENIX-4533) Phoenix Query Server should not use SPNEGO principal to proxy user requests

2018-02-13 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated PHOENIX-4533:

Fix Version/s: 4.14.0
   5.0.0



[jira] [Commented] (PHOENIX-4533) Phoenix Query Server should not use SPNEGO principal to proxy user requests

2018-02-13 Thread Lev Bronshtein (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362911#comment-16362911
 ] 

Lev Bronshtein commented on PHOENIX-4533:
-

I can do the docs. I am not sure what should change for the building page, but 
the server page definitely needs changes. Where is the source for the doc 
website?



[jira] [Commented] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363015#comment-16363015
 ] 

James Taylor commented on PHOENIX-4603:
---

Please review, [~tdsilva] and/or [~samarthjain]. The client-side cache is 
populated based on the client-side state, so if a table that is encoded 
already exists, Phoenix would think it's not encoded. I've filed PHOENIX-4604 
to do the verification on the server side if the table already exists.

> Remove check for table existence in MetaDataClient.createTableInternal()
> 
>
> Key: PHOENIX-4603
> URL: https://issues.apache.org/jira/browse/PHOENIX-4603
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4603_v1.patch
>
>
> Found some strange code in MetaDataClient.createTableInternal() that should 
> be removed. If a table is being created but the HBase metadata already 
> exists, we can't assume one way or the other that it's encoded or not 
> encoded. It's on the user to supply the correct existing encoding in that 
> case.
> {code}
> byte[] tableNameBytes = SchemaUtil.getTableNameAsBytes(schemaName, tableName);
> boolean tableExists = true;
> try {
>     HTableDescriptor tableDescriptor =
>             connection.getQueryServices().getTableDescriptor(tableNameBytes);
>     if (tableDescriptor == null) { // for connectionless
>         tableExists = false;
>     }
> } catch (org.apache.phoenix.schema.TableNotFoundException e) {
>     tableExists = false;
> }
> if (tableExists) {
>     encodingScheme = NON_ENCODED_QUALIFIERS;
>     immutableStorageScheme = ONE_CELL_PER_COLUMN;
> } else ...
> {code}





[jira] [Commented] (PHOENIX-4533) Phoenix Query Server should not use SPNEGO principal to proxy user requests

2018-02-13 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362972#comment-16362972
 ] 

Josh Elser commented on PHOENIX-4533:
-

bq. I am not sure what should change for building,

Nothing to change on that page -- it has the information on where to check out 
the website's source and how to build it :)

> Phoenix Query Server should not use SPNEGO principal to proxy user requests
> ---
>
> Key: PHOENIX-4533
> URL: https://issues.apache.org/jira/browse/PHOENIX-4533
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Lev Bronshtein
>Assignee: Lev Bronshtein
>Priority: Minor
> Fix For: 5.0.0, 4.14.0
>
> Attachments: PHOENIX-4533.1.patch, PHOENIX-4533.2.patch, 
> PHOENIX-4533.3.patch, PHOENIX-4533.squash.patch
>
>
> Currently the HTTP/ principal is used by various components in the Hadoop 
> ecosystem to perform SPNEGO authentication. Since there can only be one 
> HTTP/ principal per host, even outside of the Hadoop ecosystem, the keytab 
> containing key material for the local HTTP/ principal is shared among a few 
> applications. With so many applications having access to the HTTP/ 
> credentials, this increases the chances of an attack on the proxy user 
> capabilities of Hadoop. This JIRA proposes using two different keytabs to:
> 1. Authenticate kerberized web requests
> 2. Communicate with the Phoenix back end





[jira] [Commented] (PHOENIX-4423) Phoenix-hive compilation broken on >=Hive 2.3

2018-02-13 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362892#comment-16362892
 ] 

Sergey Soldatov commented on PHOENIX-4423:
--

Heh. There were some 'improvements' in HiveTestUtils compared to the default 
hive-it runner to get it working in our case for several MR/Tez jobs in the 
query (and joins are the place where we are using it). Let me check it. 

> Phoenix-hive compilation broken on >=Hive 2.3
> -
>
> Key: PHOENIX-4423
> URL: https://issues.apache.org/jira/browse/PHOENIX-4423
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Critical
> Fix For: 5.0.0
>
> Attachments: PHOENIX-4423.002.patch, PHOENIX-4423_wip1.patch
>
>
> HIVE-15167 removed an interface that we use in Phoenix, which obviously 
> breaks compilation. We will need to figure out how to work with Hive 1.x, 
> <2.3.0, and >=2.3.0.
> FYI [~sergey.soldatov]





[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4602:
--
Fix Version/s: 4.14.0

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4602_v1.patch
>
>
> Given the following table:
> {code}
> CREATE TABLE test_table (
>     PK1 INTEGER NOT NULL,
>     PK2 INTEGER NOT NULL,
>     PK3 INTEGER NOT NULL,
>     DATA INTEGER,
>     CONSTRAINT TEST_PK PRIMARY KEY (PK1, PK2, PK3))
> {code}
> and the query:
> {code}
> select * from test_table t
> where (t.pk1 >= 2 and t.pk1 < 5)
>   and ((t.pk2 >= 4 and t.pk2 < 6) or (t.pk2 >= 8 and t.pk2 < 9))
> {code}
> This is a typical case where the query should use SkipScanFilter. However, 
> it does not use a skip scan; it uses a range scan and pushes only the 
> leading PK column expression {{(t.pk1 >=2 and t.pk1<5)}} to the scan. The 
> explain output is:
> {code:sql}
> CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
> SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
> {code}
> I think the problem is caused by the 
> {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, at line 763 in 
> the snippet below: because pk2 is not the leading PK column, the method 
> returns null, so the expression 
> {{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed 
> to the scan:
> {code:java}
> 757    boolean hasFirstSlot = true;
> 758    boolean prevIsNull = false;
> 759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
> 760    for (KeySlot slot : childSlot) {
> 761        if (hasFirstSlot) {
> 762            // if the first slot is null, return null immediately
> 763            if (slot == null) {
> 764                return null;
> 765            }
> 766            // mark that we've handled the first slot
> 767            hasFirstSlot = false;
> 768        }
> {code}
> For the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it 
> does not seem necessary to require that the PK column in the OrExpression be 
> the leading PK column; it should be enough to guarantee that the 
> OrExpression references only one PK column.
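As an illustration of the description above, here is a minimal Java sketch of the key slots a skip scan could use if the OR on PK2 were pushed down: one slot per PK column, each holding the OR-ed ranges for that column. This is not Phoenix code. `SkipScanSketch`, `Range`, `exampleSlots`, `matches` and `combinations` are hypothetical names introduced here, and the real optimizer works on encoded row-key bytes rather than ints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Phoenix.
class SkipScanSketch {
    // Half-open integer range [lower, upper).
    static final class Range {
        final int lower;
        final int upper;

        Range(int lower, int upper) {
            this.lower = lower;
            this.upper = upper;
        }

        boolean contains(int value) {
            return value >= lower && value < upper;
        }

        @Override
        public String toString() {
            return "[" + lower + "," + upper + ")";
        }
    }

    // Slots for the query in the description: pk1 in [2,5), pk2 in [4,6) or [8,9).
    static List<List<Range>> exampleSlots() {
        List<List<Range>> slots = new ArrayList<>();
        slots.add(Arrays.asList(new Range(2, 5)));
        slots.add(Arrays.asList(new Range(4, 6), new Range(8, 9)));
        return slots;
    }

    // A row key matches when each leading column value falls in one of the
    // ranges of its slot. rowKey must have at least slots.size() elements.
    static boolean matches(List<List<Range>> slots, int[] rowKey) {
        for (int i = 0; i < slots.size(); i++) {
            boolean hit = false;
            for (Range r : slots.get(i)) {
                if (r.contains(rowKey[i])) {
                    hit = true;
                    break;
                }
            }
            if (!hit) {
                return false;
            }
        }
        return true;
    }

    // Enumerates the cross product of the per-column ranges, in key order.
    static List<String> combinations(List<List<Range>> slots) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (List<Range> slot : slots) {
            List<String> next = new ArrayList<>();
            for (String prefix : out) {
                for (Range r : slot) {
                    next.add(prefix.isEmpty() ? r.toString() : prefix + " x " + r);
                }
            }
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String combo : combinations(exampleSlots())) {
            System.out.println(combo);
        }
    }
}
```

With the example slots, a row key starting with (3, 8) matches while one starting with (3, 7) does not; the gap between `[2,5) x [4,6)` and `[2,5) x [8,9)` is the kind of region a skip scan is meant to jump over instead of filtering on the server.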





[jira] [Assigned] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-4602:
-

Assignee: chenglei

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Assignee: chenglei
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4602_v1.patch
>
>


[jira] [Created] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4603:
-

 Summary: Remove check for table existence in 
MetaDataClient.createTableInternal()
 Key: PHOENIX-4603
 URL: https://issues.apache.org/jira/browse/PHOENIX-4603
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor


Found some strange code in MetaDataClient.createTableInternal() that should be 
removed. If a table is being created but the HBase metadata already exists, we 
can't assume one way or the other that it's encoded or not encoded. It's on the 
user to supply the correct existing encoding in that case.
{code}
byte[] tableNameBytes = SchemaUtil.getTableNameAsBytes(schemaName, tableName);
boolean tableExists = true;
try {
    HTableDescriptor tableDescriptor =
        connection.getQueryServices().getTableDescriptor(tableNameBytes);
    if (tableDescriptor == null) { // for connectionless
        tableExists = false;
    }
} catch (org.apache.phoenix.schema.TableNotFoundException e) {
    tableExists = false;
}
if (tableExists) {
    encodingScheme = NON_ENCODED_QUALIFIERS;
    immutableStorageScheme = ONE_CELL_PER_COLUMN;
} else ...
{code}





[jira] [Assigned] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-4603:
-

Assignee: James Taylor



[jira] [Created] (PHOENIX-4605) Add TRANSACTION_PROVIDER and DEFAULT_TRANSACTION_PROVIDER instead of using boolean

2018-02-13 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4605:
-

 Summary: Add TRANSACTION_PROVIDER and DEFAULT_TRANSACTION_PROVIDER 
instead of using boolean
 Key: PHOENIX-4605
 URL: https://issues.apache.org/jira/browse/PHOENIX-4605
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor


We should deprecate QueryServices.DEFAULT_TABLE_ISTRANSACTIONAL_ATTRIB and 
instead have a QueryServices.DEFAULT_TRANSACTION_PROVIDER now that we'll have 
two transaction providers: Tephra and Omid. Along the same lines, we should add 
a TRANSACTION_PROVIDER column to SYSTEM.CATALOG  and stop using the 
IS_TRANSACTIONAL table property. For backwards compatibility, we can assume the 
provider is Tephra if the existing properties are set to true.
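The backwards-compatibility rule above could be sketched roughly as follows. This is only an illustration under stated assumptions: `TransactionProviderSketch`, the `Provider` enum and `resolve` are hypothetical names, and the two arguments stand in for the proposed TRANSACTION_PROVIDER value and the legacy IS_TRANSACTIONAL flag rather than any actual Phoenix API.

```java
import java.util.Locale;

// Hypothetical sketch; names are illustrative, not Phoenix's API.
class TransactionProviderSketch {
    enum Provider { TEPHRA, OMID }

    // Resolves the provider for a table:
    //  - an explicit TRANSACTION_PROVIDER value wins,
    //  - otherwise a legacy IS_TRANSACTIONAL=true is assumed to mean Tephra,
    //  - otherwise the table is treated as non-transactional (null).
    // An unknown provider name makes Enum.valueOf throw IllegalArgumentException.
    static Provider resolve(String transactionProvider, Boolean isTransactional) {
        if (transactionProvider != null) {
            return Provider.valueOf(transactionProvider.trim().toUpperCase(Locale.ROOT));
        }
        if (Boolean.TRUE.equals(isTransactional)) {
            return Provider.TEPHRA;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve("omid", null));
        System.out.println(resolve(null, Boolean.TRUE));
        System.out.println(resolve(null, Boolean.FALSE));
    }
}
```

Returning null for the non-transactional case is a choice made for brevity here; an explicit "none" value or an Optional would work equally well.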





[jira] [Commented] (PHOENIX-4533) Phoenix Query Server should not use SPNEGO principal to proxy user requests

2018-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362890#comment-16362890
 ] 

Hudson commented on PHOENIX-4533:
-

FAILURE: Integrated in Jenkins build Phoenix-master #1936 (See 
[https://builds.apache.org/job/Phoenix-master/1936/])
PHOENIX-4533 Modified Query Server to use two sets of Kerberos (elserj: rev 
a71c4b7e3c11f1c7d1955b51929ad65b252feb62)
* (edit) 
phoenix-queryserver/src/it/java/org/apache/phoenix/end2end/HttpParamImpersonationQueryServerIT.java
* (edit) phoenix-core/src/main/java/org/apache/phoenix/query/QueryServices.java
* (edit) 
phoenix-queryserver/src/main/java/org/apache/phoenix/queryserver/server/QueryServer.java
* (edit) 
phoenix-queryserver/src/it/java/org/apache/phoenix/end2end/SecureQueryServerIT.java




Re: [VOTE] Apache Phoenix 5.0.0-alpha rc1

2018-02-13 Thread Josh Elser
FYI, even though 72 hours have already elapsed, I plan to leave this open 
until Wednesday in the hope that some other folks will take a look 
before then (as part of the work week).


Thanks in advance!

On 2/12/18 10:34 AM, Josh Elser wrote:

s/RC0/RC1/ below. I wasn't very diligent with my copy-paste-fix :)

The git-commit SHA1 is correct.

Please take a look if you can today!

On 2/9/18 10:34 AM, Josh Elser wrote:

Hello Everyone,

This is a call for a vote on Apache Phoenix 5.0.0-alpha rc1. Please 
notice that there are known issues with this release which deserve the 
"alpha" designation. These are staged on the website[1]. (Atomic 
upsert does work on my local installation with trivial testing)


Over rc0, this release contains the changes: PHOENIX-4586, 
PHOENIX-4546, PHOENIX-4549, PHOENIX-4582.


The RC is available at the standard location:

https://dist.apache.org/repos/dist/dev/phoenix/apache-phoenix-5.0.0-alpha-HBase-2.0-rc1 



RC0 is based on the following commit: 
451d6a37d0d461b60edff36ceb42b17bb9610350


Signed with my key: 9E62822F4668F17B0972ADD9B7D5CD454677D66C, 
http://pgp.mit.edu/pks/lookup?op=get&search=0xB7D5CD454677D66C


Vote will be open for at least 72 hours (2018/02/12 1600GMT). Please 
vote:


[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

Thanks,
The Apache Phoenix Team

[1] https://phoenix.apache.org/release_notes.html


[jira] [Commented] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362926#comment-16362926
 ] 

James Taylor commented on PHOENIX-4602:
---

Patch looks good, [~comnetwork]. +1 assuming successful {{mvn verify}} run on 
4.x-HBase-1.3 branch.





[jira] [Updated] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4603:
--
Attachment: PHOENIX-4603_v1.patch

> Remove check for table existence in MetaDataClient.createTableInternal()
> 
>
> Key: PHOENIX-4603
> URL: https://issues.apache.org/jira/browse/PHOENIX-4603
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4603_v1.patch
>
>


[jira] [Updated] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4603:
--
Fix Version/s: 4.14.0



[jira] [Created] (PHOENIX-4604) If table already exists ensure that table metadata matches for non changeable properties

2018-02-13 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4604:
-

 Summary: If table already exists ensure that table metadata 
matches for non changeable properties
 Key: PHOENIX-4604
 URL: https://issues.apache.org/jira/browse/PHOENIX-4604
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor


We should check that the non-changeable properties of a Phoenix table match 
the metadata passed from the client when the table already exists. 
Otherwise, we can run into issues with existing data: for example, if it was 
encoded before and is subsequently declared as not encoded. The same issue 
applies to SALT_BUCKETS changing.
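A rough sketch of such a check, assuming the existing and requested table properties are available as simple name/value maps. `ImmutablePropertyCheckSketch`, `NON_CHANGEABLE` and `mismatches` are hypothetical names, and the property list only covers the two cases mentioned above (column encoding, taken here to be COLUMN_ENCODED_BYTES, and SALT_BUCKETS); the full set of non-changeable properties is not specified in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch; not the server-side code proposed in the issue.
class ImmutablePropertyCheckSketch {
    // Assumed list of properties that must not change once a table exists.
    static final List<String> NON_CHANGEABLE =
        Arrays.asList("COLUMN_ENCODED_BYTES", "SALT_BUCKETS");

    // Returns the names of non-changeable properties whose requested value
    // differs from the value stored for the existing table.
    static List<String> mismatches(Map<String, String> existing,
                                   Map<String, String> requested) {
        List<String> out = new ArrayList<>();
        for (String property : NON_CHANGEABLE) {
            if (!Objects.equals(existing.get(property), requested.get(property))) {
                out.add(property);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new HashMap<>();
        existing.put("COLUMN_ENCODED_BYTES", "2");
        existing.put("SALT_BUCKETS", "4");

        Map<String, String> requested = new HashMap<>();
        requested.put("COLUMN_ENCODED_BYTES", "0");
        requested.put("SALT_BUCKETS", "4");

        // Only the encoding differs, so only that property is reported.
        System.out.println(mismatches(existing, requested));
    }
}
```

A non-empty result would be the signal to reject the CREATE TABLE instead of silently reinterpreting existing data.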





[jira] [Commented] (PHOENIX-4423) Phoenix-hive compilation broken on >=Hive 2.3

2018-02-13 Thread Sergey Soldatov (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363038#comment-16363038
 ] 

Sergey Soldatov commented on PHOENIX-4423:
--

Ah, hive-it is not published as an official artifact. 
https://repository.apache.org/content/repositories/releases/org/apache/hive/
I believe that was the main reason why we used our own clone of the test util 
class.



[jira] [Commented] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363045#comment-16363045
 ] 

Thomas D'Silva commented on PHOENIX-4603:
-

+1



[jira] [Commented] (PHOENIX-4592) BaseResultIterators.getStatsForParallelizationProp() should use retry looking up the table without tenantId if cannot find the table using the tenantId

2018-02-13 Thread Thomas D'Silva (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363163#comment-16363163
 ] 

Thomas D'Silva commented on PHOENIX-4592:
-

[~jamestaylor]
Can you please review? I also changed the isMutableOnView property of 
USE_STATS_FOR_PARALLELIZATION to false. 

> BaseResultIterators.getStatsForParallelizationProp() should use retry looking 
> up the table without tenantId if cannot find the table using the tenantId
> ---
>
> Key: PHOENIX-4592
> URL: https://issues.apache.org/jira/browse/PHOENIX-4592
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
>Priority: Major
> Attachments: PHOENIX-4592-4.x-HBase-0.98.patch
>
>
> Running a query using a tenant specific connection logs the following 
> warning:
> {code}
> 2018-02-09 17:41:45,497 WARN  [main] iterate.BaseResultIterators - Unable to find parent table "X" of table "X" to determine USE_STATS_FOR_PARALLELIZATION
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=X
>   at org.apache.phoenix.schema.PMetaDataImpl.getTableRef(PMetaDataImpl.java:71)
>   at org.apache.phoenix.jdbc.PhoenixConnection.getTable(PhoenixConnection.java:567)
>   at org.apache.phoenix.iterate.BaseResultIterators.getStatsForParallelizationProp(BaseResultIterators.java:1282)
>   at org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:500)
>   at org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:67)
>   at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:240)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:345)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:207)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:202)
>   at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:309)
>   at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:289)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:288)
>   at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:282)
>   at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1692)
>   at sqlline.Commands.execute(Commands.java:822)
>   at sqlline.Commands.sql(Commands.java:732)
>   at sqlline.SqlLine.dispatch(SqlLine.java:807)
>   at sqlline.SqlLine.begin(SqlLine.java:681)
>   at sqlline.SqlLine.start(SqlLine.java:398)
>   at sqlline.SqlLine.main(SqlLine.java:292)
> {code}
> The following code needs to be modified
> {code}
> if (table.getType() == PTableType.INDEX && table.getParentName() != null) {
>     PhoenixConnection conn = context.getConnection();
>     String parentTableName = table.getParentName().getString();
>     try {
>         PTable parentTable =
>             conn.getTable(new PTableKey(conn.getTenantId(), parentTableName));
>         useStats = parentTable.useStatsForParallelization();
>         if (useStats != null) {
>             return useStats;
>         }
>     } catch (TableNotFoundException e) {
>         logger.warn("Unable to find parent table \"" + parentTableName + "\" of table \""
>                 + table.getName().getString()
>                 + "\" to determine USE_STATS_FOR_PARALLELIZATION",
>             e);
>     }
> }
> {code}
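The retry described in the issue title could look roughly like the sketch below: look the parent table up with the connection's tenant id first, and fall back to a lookup without a tenant id if nothing is found. This is not the attached patch. `TenantFallbackLookupSketch` and its map keyed by tenant id and table name are hypothetical stand-ins for the client metadata cache, used only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch; the real lookup goes through PhoenixConnection.getTable().
class TenantFallbackLookupSketch {
    // Stand-in for the metadata cache: "tenantId|tableName" -> USE_STATS value.
    // A null tenant id (global table) is stored with an empty tenant part.
    private final Map<String, Boolean> useStatsByKey = new HashMap<>();

    private static String key(String tenantId, String tableName) {
        return (tenantId == null ? "" : tenantId) + "|" + tableName;
    }

    void put(String tenantId, String tableName, Boolean useStats) {
        useStatsByKey.put(key(tenantId, tableName), useStats);
    }

    // First tries the tenant-specific entry, then retries without a tenant id.
    // Returns empty if neither lookup finds a value.
    Optional<Boolean> useStatsForParallelization(String tenantId, String parentTableName) {
        Boolean value = useStatsByKey.get(key(tenantId, parentTableName));
        if (value == null && tenantId != null) {
            value = useStatsByKey.get(key(null, parentTableName));
        }
        return Optional.ofNullable(value);
    }

    public static void main(String[] args) {
        TenantFallbackLookupSketch cache = new TenantFallbackLookupSketch();
        cache.put(null, "X", Boolean.TRUE);
        // The tenant-specific lookup misses, so the global entry is used.
        System.out.println(cache.useStatsForParallelization("tenant1", "X"));
    }
}
```

In this sketch a tenant-specific entry, when present, takes precedence over the global one; whether that ordering is right for Phoenix is a question for the patch review.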





[jira] [Commented] (PHOENIX-4592) BaseResultIterators.getStatsForParallelizationProp() should use retry looking up the table without tenantId if cannot find the table using the tenantId

2018-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363175#comment-16363175
 ] 

James Taylor commented on PHOENIX-4592:
---

+1

> BaseResultIterators.getStatsForParallelizationProp() should use retry looking 
> up the table without tenantId if cannot find the table using the tenantId
> ---
>
> Key: PHOENIX-4592
> URL: https://issues.apache.org/jira/browse/PHOENIX-4592
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
>Priority: Major
> Attachments: PHOENIX-4592-4.x-HBase-0.98.patch
>
>
> Running a query using a tenant specific connection logs the following warning 
> :
> {code}
> 2018-02-09 17:41:45,497 WARN  [main] iterate.BaseResultIterators - Unable to 
> find parent table "X" of table "X" to determine USE_STATS_FOR_PARALLELIZATION
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=X
>   at 
> org.apache.phoenix.schema.PMetaDataImpl.getTableRef(PMetaDataImpl.java:71)
>   at 
> org.apache.phoenix.jdbc.PhoenixConnection.getTable(PhoenixConnection.java:567)
>   at 
> org.apache.phoenix.iterate.BaseResultIterators.getStatsForParallelizationProp(BaseResultIterators.java:1282)
>   at 
> org.apache.phoenix.iterate.BaseResultIterators.(BaseResultIterators.java:500)
>   at 
> org.apache.phoenix.iterate.SerialIterators.(SerialIterators.java:67)
>   at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:240)
>   at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:345)
>   at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>   at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:207)
>   at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:202)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:309)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:289)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:288)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:282)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1692)
>   at sqlline.Commands.execute(Commands.java:822)
>   at sqlline.Commands.sql(Commands.java:732)
>   at sqlline.SqlLine.dispatch(SqlLine.java:807)
>   at sqlline.SqlLine.begin(SqlLine.java:681)
>   at sqlline.SqlLine.start(SqlLine.java:398)
>   at sqlline.SqlLine.main(SqlLine.java:292)
> {code}
> The following code needs to be modified
> {code}
> if (table.getType() == PTableType.INDEX && table.getParentName() != null) {
>     PhoenixConnection conn = context.getConnection();
>     String parentTableName = table.getParentName().getString();
>     try {
>         PTable parentTable =
>                 conn.getTable(new PTableKey(conn.getTenantId(), parentTableName));
>         useStats = parentTable.useStatsForParallelization();
>         if (useStats != null) {
>             return useStats;
>         }
>     } catch (TableNotFoundException e) {
>         logger.warn("Unable to find parent table \"" + parentTableName + "\" of table \""
>                 + table.getName().getString()
>                 + "\" to determine USE_STATS_FOR_PARALLELIZATION",
>                 e);
>     }
> }
> {code}
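
One way to read the change the issue title asks for: look the parent table up
with the connection's tenant id first and, if that misses, retry with no tenant
id (i.e. as a global table). The sketch below only illustrates that fallback.
TenantAwareTableLookup and its map-backed cache are hypothetical stand-ins, not
Phoenix's PhoenixConnection, PTableKey or PMetaData, and the actual patch may be
structured differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a metadata cache keyed by (tenantId, tableName).
// It is NOT Phoenix's PMetaData; it only illustrates the fallback order.
final class TenantAwareTableLookup {
    private final Map<String, Boolean> useStatsByKey = new HashMap<>();

    private static String key(String tenantId, String tableName) {
        // A null tenantId denotes a global (tenant-less) table.
        return (tenantId == null ? "" : tenantId) + "\u0000" + tableName;
    }

    void put(String tenantId, String tableName, boolean useStats) {
        useStatsByKey.put(key(tenantId, tableName), useStats);
    }

    // Try the tenant-specific key first; if it misses, retry without the tenant id.
    Optional<Boolean> findUseStats(String tenantId, String tableName) {
        Boolean value = useStatsByKey.get(key(tenantId, tableName));
        if (value == null && tenantId != null) {
            value = useStatsByKey.get(key(null, tableName));
        }
        return Optional.ofNullable(value);
    }
}
```

With this shape, a tenant-specific connection that queries an index on a global
table would find the parent on the second lookup instead of logging the warning.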



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363147#comment-16363147
 ] 

James Taylor commented on PHOENIX-4603:
---

Thanks for the review, [~tdsilva]. I uploaded the final version of the patch 
which fixes a couple of tests which needed to explicitly disable column 
encoding.

> Remove check for table existence in MetaDataClient.createTableInternal()
> 
>
> Key: PHOENIX-4603
> URL: https://issues.apache.org/jira/browse/PHOENIX-4603
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4603_v1.patch, PHOENIX-4603_v2.patch
>
>
> Found some strange code that should be removed. If a table is being 
> created but the HBase metadata already exists, we can't assume one way or the 
> other that it's encoded or not encoded. It's on the user to supply the 
> correct existing encoding in that case.
> {code}
> byte[] tableNameBytes = SchemaUtil.getTableNameAsBytes(schemaName, tableName);
> boolean tableExists = true;
> try {
>     HTableDescriptor tableDescriptor =
>             connection.getQueryServices().getTableDescriptor(tableNameBytes);
>     if (tableDescriptor == null) { // for connectionless
>         tableExists = false;
>     }
> } catch (org.apache.phoenix.schema.TableNotFoundException e) {
>     tableExists = false;
> }
> if (tableExists) {
>     encodingScheme = NON_ENCODED_QUALIFIERS;
>     immutableStorageScheme = ONE_CELL_PER_COLUMN;
> } else ...
> {code}





[jira] [Commented] (PHOENIX-4605) Add TRANSACTION_PROVIDER and DEFAULT_TRANSACTION_PROVIDER instead of using boolean

2018-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363046#comment-16363046
 ] 

James Taylor commented on PHOENIX-4605:
---

FYI, [~ohads]. Not sure what we should do about 
QueryServices.TRANSACTIONS_ENABLED (currently a boolean as well). Maybe it 
should contain a list of supported/configured transaction providers? We use 
that mostly in tests, but we also use it when we open a cluster connection to 
conditionally establish a connection to the transaction manager. Is there any 
initialization required for Omid along these lines? If so, should we add a new 
TAL method?
{code}
private void openConnection() throws SQLException {
    try {
        boolean transactionsEnabled = props.getBoolean(
                QueryServices.TRANSACTIONS_ENABLED,
                QueryServicesOptions.DEFAULT_TRANSACTIONS_ENABLED);
        this.connection =
                HBaseFactoryProvider.getHConnectionFactory().createConnection(this.config);
        GLOBAL_HCONNECTIONS_COUNTER.increment();
        logger.info("HConnection established. Stacktrace for informational purposes: "
                + connection + " " + LogUtil.getCallerStackTrace());
        // only initialize the tx service client if needed and if we succeeded in
        // getting a connection to HBase
        if (transactionsEnabled) {
            initTxServiceClient();
        }
    } catch (IOException e) {
        throw new SQLExceptionInfo.Builder(SQLExceptionCode.CANNOT_ESTABLISH_CONNECTION)
                .setRootCause(e).build().buildException();
    }
    if (this.connection.isClosed()) { // TODO: why the heck doesn't this throw above?
        throw new SQLExceptionInfo.Builder(SQLExceptionCode.CANNOT_ESTABLISH_CONNECTION)
                .build().buildException();
    }
}
{code}

One more check needed would be in MutationState to disallow updates to both 
Tephra and Omid tables in the same transaction.


> Add TRANSACTION_PROVIDER and DEFAULT_TRANSACTION_PROVIDER instead of using 
> boolean
> --
>
> Key: PHOENIX-4605
> URL: https://issues.apache.org/jira/browse/PHOENIX-4605
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Priority: Major
>
> We should deprecate QueryServices.DEFAULT_TABLE_ISTRANSACTIONAL_ATTRIB and 
> instead have a QueryServices.DEFAULT_TRANSACTION_PROVIDER now that we'll have 
> two transaction providers: Tephra and Omid. Along the same lines, we should 
> add a TRANSACTION_PROVIDER column to SYSTEM.CATALOG  and stop using the 
> IS_TRANSACTIONAL table property. For backwards compatibility, we can assume 
> the provider is Tephra if the existing properties are set to true.
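
To make the backwards-compatibility rule in the description concrete, here is a
rough sketch of resolving a provider from a new provider setting plus the old
boolean. Everything in it is an assumption for illustration: the
TransactionProvider enum, the TransactionProviderResolver class and the method
names are hypothetical and are not existing Phoenix APIs. The second method
illustrates the kind of check mentioned for MutationState, rejecting a mix of
providers within one transaction.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical enum naming the two providers mentioned in the issue.
enum TransactionProvider { TEPHRA, OMID }

final class TransactionProviderResolver {
    private TransactionProviderResolver() {}

    // providerName: value of a TRANSACTION_PROVIDER-style setting, or null if unset.
    // legacyTransactional: value of the older boolean setting.
    static Optional<TransactionProvider> resolve(String providerName,
                                                 boolean legacyTransactional) {
        if (providerName != null && !providerName.trim().isEmpty()) {
            String normalized = providerName.trim().toUpperCase(Locale.ROOT);
            return Optional.of(TransactionProvider.valueOf(normalized));
        }
        // Backwards compatibility: an old "true" with no provider means Tephra.
        return legacyTransactional
                ? Optional.of(TransactionProvider.TEPHRA)
                : Optional.<TransactionProvider>empty();
    }

    // Reject updates that would mix two providers in one transaction.
    static void checkSameProvider(TransactionProvider inTransaction,
                                  TransactionProvider incoming) {
        if (inTransaction != null && inTransaction != incoming) {
            throw new IllegalStateException("Cannot mix " + inTransaction
                    + " and " + incoming + " tables in the same transaction");
        }
    }
}
```

An explicit provider name wins over the legacy flag, so existing tables keep
working while new tables can opt into either provider.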





[jira] [Updated] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4603:
--
Attachment: PHOENIX-4603_v2.patch

> Remove check for table existence in MetaDataClient.createTableInternal()
> 
>
> Key: PHOENIX-4603
> URL: https://issues.apache.org/jira/browse/PHOENIX-4603
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4603_v1.patch, PHOENIX-4603_v2.patch
>
>
> Found some strange code that should be removed. If a table is being 
> created but the HBase metadata already exists, we can't assume one way or the 
> other that it's encoded or not encoded. It's on the user to supply the 
> correct existing encoding in that case.
> {code}
> byte[] tableNameBytes = SchemaUtil.getTableNameAsBytes(schemaName, tableName);
> boolean tableExists = true;
> try {
>     HTableDescriptor tableDescriptor =
>             connection.getQueryServices().getTableDescriptor(tableNameBytes);
>     if (tableDescriptor == null) { // for connectionless
>         tableExists = false;
>     }
> } catch (org.apache.phoenix.schema.TableNotFoundException e) {
>     tableExists = false;
> }
> if (tableExists) {
>     encodingScheme = NON_ENCODED_QUALIFIERS;
>     immutableStorageScheme = ONE_CELL_PER_COLUMN;
> } else ...
> {code}





[jira] [Updated] (PHOENIX-4592) BaseResultIterators.getStatsForParallelizationProp() should use retry looking up the table without tenantId if cannot find the table using the tenantId

2018-02-13 Thread Thomas D'Silva (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas D'Silva updated PHOENIX-4592:

Attachment: PHOENIX-4592-4.x-HBase-0.98.patch

> BaseResultIterators.getStatsForParallelizationProp() should use retry looking 
> up the table without tenantId if cannot find the table using the tenantId
> ---
>
> Key: PHOENIX-4592
> URL: https://issues.apache.org/jira/browse/PHOENIX-4592
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Thomas D'Silva
>Assignee: Thomas D'Silva
>Priority: Major
> Attachments: PHOENIX-4592-4.x-HBase-0.98.patch
>
>
> Running a query using a tenant specific connection logs the following warning:
> {code}
> 2018-02-09 17:41:45,497 WARN  [main] iterate.BaseResultIterators - Unable to find parent table "X" of table "X" to determine USE_STATS_FOR_PARALLELIZATION
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table undefined. tableName=X
>   at org.apache.phoenix.schema.PMetaDataImpl.getTableRef(PMetaDataImpl.java:71)
>   at org.apache.phoenix.jdbc.PhoenixConnection.getTable(PhoenixConnection.java:567)
>   at org.apache.phoenix.iterate.BaseResultIterators.getStatsForParallelizationProp(BaseResultIterators.java:1282)
>   at org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:500)
>   at org.apache.phoenix.iterate.SerialIterators.<init>(SerialIterators.java:67)
>   at org.apache.phoenix.execute.ScanPlan.newIterator(ScanPlan.java:240)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:345)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:207)
>   at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:202)
>   at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:309)
>   at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:289)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:288)
>   at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:282)
>   at org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1692)
>   at sqlline.Commands.execute(Commands.java:822)
>   at sqlline.Commands.sql(Commands.java:732)
>   at sqlline.SqlLine.dispatch(SqlLine.java:807)
>   at sqlline.SqlLine.begin(SqlLine.java:681)
>   at sqlline.SqlLine.start(SqlLine.java:398)
>   at sqlline.SqlLine.main(SqlLine.java:292)
> {code}
> The following code needs to be modified
> {code}
> if (table.getType() == PTableType.INDEX && table.getParentName() != null) {
>     PhoenixConnection conn = context.getConnection();
>     String parentTableName = table.getParentName().getString();
>     try {
>         PTable parentTable =
>                 conn.getTable(new PTableKey(conn.getTenantId(), parentTableName));
>         useStats = parentTable.useStatsForParallelization();
>         if (useStats != null) {
>             return useStats;
>         }
>     } catch (TableNotFoundException e) {
>         logger.warn("Unable to find parent table \"" + parentTableName + "\" of table \""
>                 + table.getName().getString()
>                 + "\" to determine USE_STATS_FOR_PARALLELIZATION",
>                 e);
>     }
> }
> {code}





[jira] [Resolved] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor resolved PHOENIX-4603.
---
   Resolution: Fixed
Fix Version/s: 5.1.0

> Remove check for table existence in MetaDataClient.createTableInternal()
> 
>
> Key: PHOENIX-4603
> URL: https://issues.apache.org/jira/browse/PHOENIX-4603
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0, 5.1.0
>
> Attachments: PHOENIX-4603_v1.patch, PHOENIX-4603_v2.patch
>
>
> Found some strange code that should be removed. If a table is being 
> created but the HBase metadata already exists, we can't assume one way or the 
> other that it's encoded or not encoded. It's on the user to supply the 
> correct existing encoding in that case.
> {code}
> byte[] tableNameBytes = SchemaUtil.getTableNameAsBytes(schemaName, tableName);
> boolean tableExists = true;
> try {
>     HTableDescriptor tableDescriptor =
>             connection.getQueryServices().getTableDescriptor(tableNameBytes);
>     if (tableDescriptor == null) { // for connectionless
>         tableExists = false;
>     }
> } catch (org.apache.phoenix.schema.TableNotFoundException e) {
>     tableExists = false;
> }
> if (tableExists) {
>     encodingScheme = NON_ENCODED_QUALIFIERS;
>     immutableStorageScheme = ONE_CELL_PER_COLUMN;
> } else ...
> {code}





phoenix newbie build question

2018-02-13 Thread Xu Cang
Hi,

I am trying to build Phoenix (on Ubuntu) and run tests by following
'build.txt' instruction from code repo.

Commands I ran:

1. mvn install -DskipTests
2. mvn process-sources
3. mvn package

Then I got this error:

[ERROR] testMultipleConnectionsAsSameUserWithoutLogin(org.apache.phoenix.jdbc.SecureUserConnectionsTest) Time elapsed: 0.013 s  <<< ERROR!
java.lang.RuntimeException: Couldn't get the current user!!
    at org.apache.phoenix.jdbc.SecureUserConnectionsTest.testMultipleConnectionsAsSameUserWithoutLogin(SecureUserConnectionsTest.java:378)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] SecureUserConnectionsTest.testMultipleConnectionsAsSameUserWithoutLogin:378 Runtime
[INFO]
[ERROR] Tests run: 1592, Failures: 0, Errors: 1, Skipped: 3
[INFO]
[INFO]

[INFO] Reactor Summary:
[INFO]
[INFO] Apache Phoenix . SUCCESS [ 0.924 s]
[INFO] Phoenix Core ... FAILURE [ 35.155 s]


The error comes from this code piece:

try {
    this.user = User.getCurrent();
} catch (IOException e) {
    throw new RuntimeException("Couldn't get the current user!!");
}


My question is, am I missing any dependencies in order to get this user?
Any pointer or help is appreciated.  Thanks,


(BTW, IndexUtilTest.java unit test ran successfully. )

Best Regards,
Xu


[jira] [Commented] (PHOENIX-4344) MapReduce Delete Support

2018-02-13 Thread Akshita Malhotra (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363286#comment-16363286
 ] 

Akshita Malhotra commented on PHOENIX-4344:
---

[~jamestaylor] Can you explain why it would do a point scan? Maybe I am 
thinking in the wrong direction, but as [~gjacoby] explained, even if the 
initial delete is over a non-PK column, when a point Phoenix delete query is 
issued, I can provide the PK information (obtained from the MapReduce scan) 
along with an extra predicate that includes the non-PK column. 

> MapReduce Delete Support
> 
>
> Key: PHOENIX-4344
> URL: https://issues.apache.org/jira/browse/PHOENIX-4344
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.12.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
>
> Phoenix already has the ability to use MapReduce for asynchronous handling of 
> long-running SELECTs. It would be really useful to have this capability for 
> long-running DELETEs, particularly of tables with indexes where using HBase's 
> own MapReduce integration would be prohibitively complicated. 





Re: phoenix newbie build question

2018-02-13 Thread Josh Elser

Hi Xu,

What version of Java and Maven are you using?

I wouldn't be super worried about the test failures -- it's likely just 
an indication that the unit test relies on something in the local 
environment which isn't there on your computer (e.g. a default 
krb5.conf). Ideally, we can figure out why it failed and fix it for the 
future, but we would need to get to the bottom of it.
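
One thing that would make such a failure easier to diagnose: the snippet quoted
below throws a new RuntimeException without passing along the IOException, so
the underlying reason never reaches the test report. The sketch here shows the
usual Java exception-chaining pattern. CurrentUserLookup and UserSource are
hypothetical names standing in for the test class and for a call like
User.getCurrent(); this is not the actual test code.

```java
import java.io.IOException;

final class CurrentUserLookup {
    private CurrentUserLookup() {}

    // Hypothetical stand-in for a call such as User.getCurrent(),
    // which may fail with an IOException.
    interface UserSource {
        String getCurrent() throws IOException;
    }

    static String currentUser(UserSource source) {
        try {
            return source.getCurrent();
        } catch (IOException e) {
            // Pass the IOException as the cause so its message and stack
            // trace show up in the report instead of being discarded.
            throw new RuntimeException("Couldn't get the current user!!", e);
        }
    }
}
```

With the cause attached, the failure output would include a "Caused by:" section
naming whatever the environment was missing.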


On 2/13/18 6:51 PM, Xu Cang wrote:

Hi,

I am trying to build Phoenix (on Ubuntu) and run tests by following
'build.txt' instruction from code repo.

Commands I ran:

1. mvn install -DskipTests
2. mvn process-sources
3. mvn package

Then I got this error:

[ERROR] testMultipleConnectionsAsSameUserWithoutLogin(org.apache.phoenix.jdbc.SecureUserConnectionsTest) Time elapsed: 0.013 s  <<< ERROR!
java.lang.RuntimeException: Couldn't get the current user!!
    at org.apache.phoenix.jdbc.SecureUserConnectionsTest.testMultipleConnectionsAsSameUserWithoutLogin(SecureUserConnectionsTest.java:378)

[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR] SecureUserConnectionsTest.testMultipleConnectionsAsSameUserWithoutLogin:378 Runtime
[INFO]
[ERROR] Tests run: 1592, Failures: 0, Errors: 1, Skipped: 3
[INFO]
[INFO]

[INFO] Reactor Summary:
[INFO]
[INFO] Apache Phoenix . SUCCESS [ 0.924 s]
[INFO] Phoenix Core ... FAILURE [ 35.155 s]


The error comes from this code piece:

try {
    this.user = User.getCurrent();
} catch (IOException e) {
    throw new RuntimeException("Couldn't get the current user!!");
}


My question is, am I missing any dependencies in order to get this user?
Any pointer or help is appreciated.  Thanks,


(BTW, IndexUtilTest.java unit test ran successfully. )

Best Regards,
Xu



[jira] [Commented] (PHOENIX-4603) Remove check for table existence in MetaDataClient.createTableInternal()

2018-02-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363355#comment-16363355
 ] 

Hudson commented on PHOENIX-4603:
-

SUCCESS: Integrated in Jenkins build Phoenix-4.x-HBase-1.3 #39 (See 
[https://builds.apache.org/job/Phoenix-4.x-HBase-1.3/39/])
PHOENIX-4603 Remove check for table existence in (jtaylor: rev 
106daa347e89e762c30089023ae8389b95b01fd3)
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/DynamicColumnIT.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/schema/MetaDataClient.java
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/MappingTableDataTypeIT.java
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/NamespaceSchemaMappingIT.java


> Remove check for table existence in MetaDataClient.createTableInternal()
> 
>
> Key: PHOENIX-4603
> URL: https://issues.apache.org/jira/browse/PHOENIX-4603
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0, 5.1.0
>
> Attachments: PHOENIX-4603_v1.patch, PHOENIX-4603_v2.patch
>
>
> Found some strange code that should be removed. If a table is being 
> created but the HBase metadata already exists, we can't assume one way or the 
> other that it's encoded or not encoded. It's on the user to supply the 
> correct existing encoding in that case.
> {code}
> byte[] tableNameBytes = SchemaUtil.getTableNameAsBytes(schemaName, tableName);
> boolean tableExists = true;
> try {
>     HTableDescriptor tableDescriptor =
>             connection.getQueryServices().getTableDescriptor(tableNameBytes);
>     if (tableDescriptor == null) { // for connectionless
>         tableExists = false;
>     }
> } catch (org.apache.phoenix.schema.TableNotFoundException e) {
>     tableExists = false;
> }
> if (tableExists) {
>     encodingScheme = NON_ENCODED_QUALIFIERS;
>     immutableStorageScheme = ONE_CELL_PER_COLUMN;
> } else ...
> {code}





Re: phoenix newbie build question

2018-02-13 Thread Xu Cang
Hi Josh,

Thanks for your reply. I have Java 1.8 and Maven 3.3.9, as shown below.

Apache Maven 3.3.9
Maven home: /usr/share/maven
Java version: 1.8.0_151, vendor: Oracle Corporation

Ok. Sounds good. Thank you.

Xu

On Tue, Feb 13, 2018 at 4:55 PM, Josh Elser  wrote:

> Hi Xu,
>
> What version of Java and Maven are you using?
>
> I wouldn't be super worried about the test failures -- it's likely just an
> indication that the unit test relies on something in the local
> environment which isn't there on your computer (e.g. a default krb5.conf).
> Ideally, we can figure out why it failed and fix it for the future, but we
> would need to get to the bottom of it.
>
>
> On 2/13/18 6:51 PM, Xu Cang wrote:
>
>> Hi,
>>
>> I am trying to build Phoenix (on Ubuntu) and run tests by following
>> 'build.txt' instruction from code repo.
>>
>> Commands I ran:
>>
>> 1. mvn install -DskipTests
>> 2. mvn process-sources
>> 3. mvn package
>>
>> Then I got this error:
>>
>> [ERROR] testMultipleConnectionsAsSameUserWithoutLogin(org.apache.phoenix.jdbc.SecureUserConnectionsTest) Time elapsed: 0.013 s  <<< ERROR!
>> java.lang.RuntimeException: Couldn't get the current user!!
>>     at org.apache.phoenix.jdbc.SecureUserConnectionsTest.testMultipleConnectionsAsSameUserWithoutLogin(SecureUserConnectionsTest.java:378)
>>
>> [INFO]
>> [INFO] Results:
>> [INFO]
>> [ERROR] Errors:
>> [ERROR] SecureUserConnectionsTest.testMultipleConnectionsAsSameUserWithoutLogin:378 Runtime
>> [INFO]
>> [ERROR] Tests run: 1592, Failures: 0, Errors: 1, Skipped: 3
>> [INFO]
>> [INFO]
>>
>> [INFO] Reactor Summary:
>> [INFO]
>> [INFO] Apache Phoenix . SUCCESS [ 0.924 s]
>> [INFO] Phoenix Core ... FAILURE [ 35.155 s]
>>
>>
>> The error comes from this code piece:
>>
>> try {
>>     this.user = User.getCurrent();
>> } catch (IOException e) {
>>     throw new RuntimeException("Couldn't get the current user!!");
>> }
>>
>>
>> My question is, am I missing any dependencies in order to get this user?
>> Any pointer or help is appreciated.  Thanks,
>>
>>
>> (BTW, IndexUtilTest.java unit test ran successfully. )
>>
>> Best Regards,
>> Xu
>>
>>


[jira] [Commented] (PHOENIX-2566) Support NOT NULL constraint for any column for immutable table

2018-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363578#comment-16363578
 ] 

James Taylor commented on PHOENIX-2566:
---

Please review, [~tdsilva].

> Support NOT NULL constraint for any column for immutable table
> --
>
> Key: PHOENIX-2566
> URL: https://issues.apache.org/jira/browse/PHOENIX-2566
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-2566_v1.patch
>
>
> Since write-once/append-only tables do not partially update rows, we can 
> support NOT NULL constraints for non PK columns.





[jira] [Updated] (PHOENIX-2566) Support NOT NULL constraint for any column for immutable table

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-2566:
--
Attachment: PHOENIX-2566_v1.patch

> Support NOT NULL constraint for any column for immutable table
> --
>
> Key: PHOENIX-2566
> URL: https://issues.apache.org/jira/browse/PHOENIX-2566
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-2566_v1.patch
>
>
> Since write-once/append-only tables do not partially update rows, we can 
> support NOT NULL constraints for non PK columns.





[jira] [Commented] (PHOENIX-4344) MapReduce Delete Support

2018-02-13 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363598#comment-16363598
 ] 

James Taylor commented on PHOENIX-4344:
---

Phoenix will do a point delete (i.e. the Phoenix client will issue an HBase 
Delete with the full row key) because it thinks it has values for all the 
columns that make up the primary key of the base table. In this case, it 
doesn't need to issue a scan at all. The problem is, Phoenix doesn't know that 
there are derived views that have extended the PK.

One solution would be to have a declaration on the base table that it would 
never be used to upsert data directly. Something like declaring it ABSTRACT. In 
that case, if you deleted from it, Phoenix could know to issue a scan instead 
of trying to optimize it as a point delete.

Another solution would be to issue the delete statement against the view in the 
MR job. Since the view has extended the PK, Phoenix wouldn't issue a point 
delete, but would issue a scan. That might not be feasible, though, as it'd be 
tricky to know all the views.
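
To spell out the decision described above, here is a small sketch of it. It is
purely illustrative: DeletePlanner, its Strategy enum and the isAbstract flag
are hypothetical names (the ABSTRACT declaration is only a proposal in this
comment), and this is not how the Phoenix compiler is actually structured.

```java
import java.util.List;
import java.util.Set;

final class DeletePlanner {
    private DeletePlanner() {}

    enum Strategy { POINT_DELETE, SCAN }

    // pkColumns: primary-key columns of the table or view the DELETE targets.
    // boundColumns: columns given an equality value in the WHERE clause.
    // isAbstract: hypothetical flag for a base table that never holds rows
    //             written directly, so derived views may have extended its PK.
    static Strategy choose(List<String> pkColumns, Set<String> boundColumns,
                           boolean isAbstract) {
        if (isAbstract) {
            // The full row key may be longer than this table's PK: must scan.
            return Strategy.SCAN;
        }
        // A point delete needs a value for every PK column.
        return boundColumns.containsAll(pkColumns)
                ? Strategy.POINT_DELETE
                : Strategy.SCAN;
    }
}
```

Against a base table with PK (A, B) and both columns bound, this picks a point
delete. Against a view whose PK is (A, B, C) with the same two columns bound, or
against an ABSTRACT base table, it falls back to a scan.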

> MapReduce Delete Support
> 
>
> Key: PHOENIX-4344
> URL: https://issues.apache.org/jira/browse/PHOENIX-4344
> Project: Phoenix
>  Issue Type: New Feature
>Affects Versions: 4.12.0
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
>
> Phoenix already has the ability to use MapReduce for asynchronous handling of 
> long-running SELECTs. It would be really useful to have this capability for 
> long-running DELETEs, particularly of tables with indexes where using HBase's 
> own MapReduce integration would be prohibitively complicated. 





[jira] [Assigned] (PHOENIX-2566) Support NOT NULL constraint for any column for immutable table

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-2566:
-

Assignee: James Taylor  (was: Vincent Poon)

> Support NOT NULL constraint for any column for immutable table
> --
>
> Key: PHOENIX-2566
> URL: https://issues.apache.org/jira/browse/PHOENIX-2566
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0
>
>
> Since write-once/append-only tables do not partially update rows, we can 
> support NOT NULL constraints for non PK columns.





[jira] [Updated] (PHOENIX-2566) Support NOT NULL constraint for any column for immutable table

2018-02-13 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-2566:
--
Fix Version/s: 4.14.0

> Support NOT NULL constraint for any column for immutable table
> --
>
> Key: PHOENIX-2566
> URL: https://issues.apache.org/jira/browse/PHOENIX-2566
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: James Taylor
>Assignee: Vincent Poon
>Priority: Major
> Fix For: 4.14.0
>
>
> Since write-once/append-only tables do not partially update rows, we can 
> support NOT NULL constraints for non PK columns.





[jira] [Commented] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363594#comment-16363594
 ] 

chenglei commented on PHOENIX-4602:
---

Pushed to master, 4.x-HBase-1.3, 4.x-HBase-1.2,  4.x-HBase-1.1, 4.x-cdh5.11.2,  
4.x-HBase-0.98, and 5.x-HBase-2.0 branches.

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Assignee: chenglei
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4602_v2.patch
>
>
> Given the following table:
> {code}
>     CREATE TABLE test_table (
>         PK1 INTEGER NOT NULL,
>         PK2 INTEGER NOT NULL,
>         PK3 INTEGER NOT NULL,
>         DATA INTEGER,
>         CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and the query:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> This looks like a typical case for SkipScanFilter. However, the query does 
> not use a skip scan. It uses a range scan and pushes only the leading PK 
> column expression {{(t.pk1 >=2 and t.pk1<5)}} to the scan. The explain plan is:
>  {code:sql}
>    CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
>    SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
> {code}
> I think the cause is the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} 
> method. At line 763 below, because pk2 is not the leading PK column, the 
> method returns null, so the expression 
> {{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed 
> to the scan:
> {code:java}
> 757    boolean hasFirstSlot = true;
> 758    boolean prevIsNull = false;
> 759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
> 760    for (KeySlot slot : childSlot) {
> 761        if (hasFirstSlot) {
> 762            // if the first slot is null, return null immediately
> 763            if (slot == null) {
> 764                return null;
> 765            }
> 766            // mark that we've handled the first slot
> 767            hasFirstSlot = false;
> 768        }
> {code}
> For the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it 
> seems unnecessary to require that the PK column in the OrExpression be the 
> leading PK column. Guaranteeing that the OrExpression references only one PK 
> column should be enough.
>  
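
As a rough illustration of what pushing the OR to the scan would need for a
single PK column: the OR'ed ranges on pk2 have to be combined into a sorted list
of disjoint ranges that a skip scan can step through. The sketch below shows
only that range-union step, using half-open integer ranges. IntRange and
OrRanges are hypothetical helper names, and this is not the code in
WhereOptimizer or in the attached patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Half-open integer range [lower, upper), e.g. pk2 >= 4 AND pk2 < 6 is [4, 6).
final class IntRange {
    final int lower; // inclusive
    final int upper; // exclusive

    IntRange(int lower, int upper) {
        if (lower >= upper) {
            throw new IllegalArgumentException("empty range");
        }
        this.lower = lower;
        this.upper = upper;
    }

    @Override
    public String toString() {
        return "[" + lower + ", " + upper + ")";
    }
}

final class OrRanges {
    private OrRanges() {}

    // Union of OR'ed ranges on one column: sort by lower bound, then merge
    // ranges that overlap or touch, leaving sorted disjoint ranges.
    static List<IntRange> coalesce(List<IntRange> ranges) {
        List<IntRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt((IntRange r) -> r.lower));
        List<IntRange> out = new ArrayList<>();
        for (IntRange r : sorted) {
            int lastIndex = out.size() - 1;
            if (lastIndex >= 0 && r.lower <= out.get(lastIndex).upper) {
                IntRange last = out.remove(lastIndex);
                out.add(new IntRange(last.lower, Math.max(last.upper, r.upper)));
            } else {
                out.add(r);
            }
        }
        return out;
    }
}
```

For the query in the description, the two pk2 ranges [4, 6) and [8, 9) do not
overlap, so they stay as two separate ranges for the pk2 position.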





[jira] [Commented] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363572#comment-16363572
 ] 

chenglei commented on PHOENIX-4602:
---

Applied the patch to 4.x-HBase-1.3 and ran all the unit tests and integration 
tests on my local machine; they all pass. I also added more tests in patch v2.

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Assignee: chenglei
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4602_v2.patch
>
>
> Given the following table:
> {code}
>     CREATE TABLE test_table (
>         PK1 INTEGER NOT NULL,
>         PK2 INTEGER NOT NULL,
>         PK3 INTEGER NOT NULL,
>         DATA INTEGER,
>         CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and the query:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> This looks like a typical case for SkipScanFilter. However, the query does 
> not use a skip scan. It uses a range scan and pushes only the leading PK 
> column expression {{(t.pk1 >=2 and t.pk1<5)}} to the scan. The explain plan is:
>  {code:sql}
>    CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
>    SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
> {code}
> I think the cause is the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} 
> method. At line 763 below, because pk2 is not the leading PK column, the 
> method returns null, so the expression 
> {{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed 
> to the scan:
> {code:java}
> 757    boolean hasFirstSlot = true;
> 758    boolean prevIsNull = false;
> 759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
> 760    for (KeySlot slot : childSlot) {
> 761        if (hasFirstSlot) {
> 762            // if the first slot is null, return null immediately
> 763            if (slot == null) {
> 764                return null;
> 765            }
> 766            // mark that we've handled the first slot
> 767            hasFirstSlot = false;
> 768        }
> {code}
> For the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it 
> seems unnecessary to require that the PK column in the OrExpression be the 
> leading PK column. Guaranteeing that the OrExpression references only one PK 
> column should be enough.
>  





[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Attachment: PHOENIX-4602_v2.patch

> OrExpression should can also push non-leading pk columns to scan
> 
>
> Key: PHOENIX-4602
> URL: https://issues.apache.org/jira/browse/PHOENIX-4602
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.13.0
>Reporter: chenglei
>Assignee: chenglei
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4602_v2.patch
>
>
> Given following table:
> {code}
>     CREATE TABLE test_table (
>      PK1 INTEGER NOT NULL,
>      PK2 INTEGER NOT NULL,
>      PK3 INTEGER NOT NULL,
>  DATA INTEGER, 
>      CONSTRAINT TEST_PK PRIMARY KEY (PK1,PK2,PK3))
> {code}
> and a sql:
> {code}
>   select * from test_table t where (t.pk1 >=2 and t.pk1<5) and ((t.pk2 >= 4 
> and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))
> {code}
> This looks like a typical case for SkipScanFilter. However, the query does 
> not use a skip scan: it uses a range scan and pushes only the leading PK 
> column expression {{(t.pk1 >=2 and t.pk1<5)}} to the scan. The explain 
> plan is:
>  {code:sql}
>    CLIENT PARALLEL 1-WAY RANGE SCAN OVER TEST_TABLE [2] - [5]
>    SERVER FILTER BY ((PK2 >= 4 AND PK2 < 6) OR (PK2 >= 8 AND PK2 < 9))
> {code}
> I think the problem lies in the 
> {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method, at line 763 
> below. Because pk2 is not the leading PK column, the method returns null, 
> so the expression 
> {{((t.pk2 >= 4 and t.pk2 <6) or (t.pk2 >= 8 and t.pk2 <9))}} is not pushed 
> to the scan:
> {code:java}
> 757    boolean hasFirstSlot = true;
> 758    boolean prevIsNull = false;
> 759    // TODO: Do the same optimization that we do for IN if the childSlots specify a fully qualified row key
> 760   for (KeySlot slot : childSlot) {
> 761      if (hasFirstSlot) {
> 762           // if the first slot is null, return null immediately
> 763           if (slot == null) {
> 764                return null;
> 765            }
> 766           // mark that we've handled the first slot
> 767           hasFirstSlot = false;
> 768      }
> {code}
> For the {{WhereOptimizer.KeyExpressionVisitor.orKeySlots}} method above, it 
> does not seem necessary to require that the PK column in the OrExpression be 
> the leading PK column; it should be enough to guarantee that the 
> OrExpression references only one PK column.
>  
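
For the example query in this description, the per-column ranges that a skip scan would need can be written out explicitly. The sketch below is only a model of the idea (one slot per PK column, ranges within a slot ORed together, slots ANDed together); the {{Range}} class and the {{slotsMatch}} method are hypothetical and do not correspond to Phoenix classes. It checks that the slot model selects exactly the same (pk1, pk2) prefixes as the WHERE clause.

```java
import java.util.Arrays;
import java.util.List;

class SkipScanSlots {
    /** Half-open integer range [lower, upper); a hypothetical stand-in for a key range. */
    static final class Range {
        final int lower;
        final int upper;
        Range(int lower, int upper) { this.lower = lower; this.upper = upper; }
        boolean contains(int v) { return v >= lower && v < upper; }
    }

    // One slot per PK column: ranges inside a slot are ORed, slots are ANDed.
    static final List<List<Range>> SLOTS = Arrays.asList(
        Arrays.asList(new Range(2, 5)),                   // pk1 >= 2 and pk1 < 5
        Arrays.asList(new Range(4, 6), new Range(8, 9))); // pk2 in [4,6) or [8,9)

    /** True if every key part falls into at least one range of its slot. */
    static boolean slotsMatch(int... key) {
        for (int i = 0; i < SLOTS.size(); i++) {
            boolean any = false;
            for (Range r : SLOTS.get(i)) {
                any |= r.contains(key[i]);
            }
            if (!any) {
                return false;
            }
        }
        return true;
    }

    /** The WHERE clause of the example query, written directly. */
    static boolean whereMatches(int pk1, int pk2) {
        return (pk1 >= 2 && pk1 < 5) && ((pk2 >= 4 && pk2 < 6) || (pk2 >= 8 && pk2 < 9));
    }

    public static void main(String[] args) {
        int matched = 0;
        for (int pk1 = 0; pk1 < 10; pk1++) {
            for (int pk2 = 0; pk2 < 10; pk2++) {
                if (slotsMatch(pk1, pk2) != whereMatches(pk1, pk2)) {
                    throw new AssertionError("slot model disagrees at " + pk1 + "," + pk2);
                }
                if (slotsMatch(pk1, pk2)) {
                    matched++;
                }
            }
        }
        System.out.println(matched + " of 100 (pk1, pk2) prefixes match"); // prints 9 of 100
    }
}
```

With only the pk1 range pushed to the scan, all 30 prefixes with pk1 in [2, 5) in this toy key space are read and then filtered on the server; with both slots, only the 9 matching prefixes need to be visited.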





[jira] [Updated] (PHOENIX-4602) OrExpression should can also push non-leading pk columns to scan

2018-02-13 Thread chenglei (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-4602:
--
Attachment: (was: PHOENIX-4602_v1.patch)



