[jira] [Updated] (PHOENIX-3328) Skip scan optimization failed for multi pk columns

2016-10-10 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3328:
--
Assignee: William Yang

> Skip scan optimization failed for multi pk columns
> --
>
> Key: PHOENIX-3328
> URL: https://issues.apache.org/jira/browse/PHOENIX-3328
> Project: Phoenix
>  Issue Type: Bug
>Reporter: William Yang
>Assignee: William Yang
>Priority: Minor
> Fix For: 4.9.0, 4.8.2
>
> Attachments: PHOENIX-3328.patch
>
>
> {code:sql}
> create table t1 (pk integer primary key, a integer);
> create table t2 (pk1 integer not null, pk2 integer not null, a integer 
> constraint pk primary key (pk1, pk2));
> explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and pk < 40);
> explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30 and pk1 < 
> 40);
> {code}
> The first SELECT statement will use skip scan filter for two ranges, so as 
> the second one. But actually the WhereOptimizer failed doing so and using a 
> full table scan instead. This happens for tables have multi PK columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PHOENIX-3328) Skip scan optimization failed for multi pk columns

2016-10-10 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-3328:
--
Fix Version/s: 4.8.2
   4.9.0

> Skip scan optimization failed for multi pk columns
> --
>
> Key: PHOENIX-3328
> URL: https://issues.apache.org/jira/browse/PHOENIX-3328
> Project: Phoenix
>  Issue Type: Bug
>Reporter: William Yang
>Priority: Minor
> Fix For: 4.9.0, 4.8.2
>
> Attachments: PHOENIX-3328.patch
>
>
> {code:sql}
> create table t1 (pk integer primary key, a integer);
> create table t2 (pk1 integer not null, pk2 integer not null, a integer 
> constraint pk primary key (pk1, pk2));
> explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and pk < 40);
> explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30 and pk1 < 
> 40);
> {code}
> The first SELECT statement will use skip scan filter for two ranges, so as 
> the second one. But actually the WhereOptimizer failed doing so and using a 
> full table scan instead. This happens for tables have multi PK columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PHOENIX-3328) Skip scan optimization failed for multi pk columns

2016-09-24 Thread William Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15519309#comment-15519309
 ] 

William Yang commented on PHOENIX-3328:
---

The patch handles PK ranges OR operation only for the FIRST PK column. If there 
are other PK columns in the WHERE clause, it cannot handle. For example:
{code:sql}
where (pk1  = 5 and pk2 > 10 and pk2 < 20) or (pk = 6 and pk2 > 60 and pk2 < 
80);
{code}

this WHERE clause indicate four ranges:
5 10
5 20
6 60
6 80
which can be optimized to SkipScanFilter for 4 ranges [5,10] - [6, 80]
But this does not always happen, see the following case:
{code:sql}
where (pk1 = 5 and pk2 > 10 and pk2 < 20) or (pk1 = 1 and pk2 > 60 and pk2 < 
80);
{code} 
we have to do two separate skip scan range scans instead of one, because the 
value of PK1 in the RHS of OR is smaller than the LHS', but range of PK2 is 
greater than the left.  I think it's OK for us to be able to handle the first 
PK only.

> Skip scan optimization failed for multi pk columns
> --
>
> Key: PHOENIX-3328
> URL: https://issues.apache.org/jira/browse/PHOENIX-3328
> Project: Phoenix
>  Issue Type: Bug
>Reporter: William Yang
>Priority: Minor
> Attachments: PHOENIX-3328.patch
>
>
> {code:sql}
> create table t1 (pk integer primary key, a integer);
> create table t2 (pk1 integer not null, pk2 integer not null, a integer 
> constraint pk primary key (pk1, pk2));
> explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and pk < 40);
> explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30 and pk1 < 
> 40);
> {code}
> The first SELECT statement will use skip scan filter for two ranges, so as 
> the second one. But actually the WhereOptimizer failed doing so and using a 
> full table scan instead. This happens for tables have multi PK columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re:Re: Skip scan optimization failed for multi pk columns

2016-09-24 Thread William
Ted:
The email in my 'send box' has the attachment, but the email received didn't. I 
don't why.
I filed a jira, see https://issues.apache.org/jira/browse/PHOENIX-3328. 
I only add one test case for this. We already have some UT and IT for other 
scenarios that WHERE clause shouldn't be optimized to skip scan filter. i.e. 
EvaluationOfORIT.

Please take a look at it. 


Thanks.
William.

At 2016-09-24 22:36:08, "Ted Yu"  wrote:
>William:
>I don't see attachment in your original email.
>
>Mind sharing the JIRA number once you open the JIRA (I assume the patch
>would be attached there) ?
>
>Thanks
>
>On Thu, Sep 22, 2016 at 9:24 PM, William  wrote:
>
>> Hi all,
>>This is a simple scenario, there are two tables:
>>create table t1 (pk integer primary key, a integer);
>>create table t2 (pk1 integer not null, pk2 integer not null, a integer
>> constraint pk primary key (pk1, pk2));
>>
>>Do the following selects:
>>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
>> pk < 40);
>>   results: SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>>
>>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
>> and pk1 < 40);
>>   results: FULL SCAN OVER T2
>>SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
>> AND PK1 < 40))
>>
>> Apparently, 2nd SELECT statement should use skip scan instead of full
>> table scan. But T2 has two pk columns and then WhereOptimizer failed to
>> optimize it. I went through the code and made a small improvement.
>> In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
>> attached patch file for detail. The main idea is we allow slot in childSlot
>> is null, only if all slots afterwards are null too. So the following
>> statements are still rejected:
>> select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
>> 40)
>>
>> Please review this. Thanks.
>> William.
>>
>>
>>
>>
>>


[jira] [Updated] (PHOENIX-3328) Skip scan optimization failed for multi pk columns

2016-09-24 Thread William Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

William Yang updated PHOENIX-3328:
--
Attachment: PHOENIX-3328.patch

> Skip scan optimization failed for multi pk columns
> --
>
> Key: PHOENIX-3328
> URL: https://issues.apache.org/jira/browse/PHOENIX-3328
> Project: Phoenix
>  Issue Type: Bug
>Reporter: William Yang
>Priority: Minor
> Attachments: PHOENIX-3328.patch
>
>
> {code:sql}
> create table t1 (pk integer primary key, a integer);
> create table t2 (pk1 integer not null, pk2 integer not null, a integer 
> constraint pk primary key (pk1, pk2));
> explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and pk < 40);
> explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30 and pk1 < 
> 40);
> {code}
> The first SELECT statement will use skip scan filter for two ranges, so as 
> the second one. But actually the WhereOptimizer failed doing so and using a 
> full table scan instead. This happens for tables have multi PK columns.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PHOENIX-3328) Skip scan optimization failed for multi pk columns

2016-09-24 Thread William Yang (JIRA)
William Yang created PHOENIX-3328:
-

 Summary: Skip scan optimization failed for multi pk columns
 Key: PHOENIX-3328
 URL: https://issues.apache.org/jira/browse/PHOENIX-3328
 Project: Phoenix
  Issue Type: Bug
Reporter: William Yang
Priority: Minor


{code:sql}
create table t1 (pk integer primary key, a integer);
create table t2 (pk1 integer not null, pk2 integer not null, a integer 
constraint pk primary key (pk1, pk2));
explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and pk < 40);
explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30 and pk1 < 
40);
{code}

The first SELECT statement will use skip scan filter for two ranges, so as the 
second one. But actually the WhereOptimizer failed doing so and using a full 
table scan instead. This happens for tables have multi PK columns.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Skip scan optimization failed for multi pk columns

2016-09-24 Thread James Taylor
William,
Yes, PHOENIX-1801 is a good one too. IMHO, we should do that work in the
calcite branch where we can leverage the cost based optimizer framework in
Calcite.
Thanks,
James

On Sat, Sep 24, 2016 at 7:36 AM, Ted Yu  wrote:

> William:
> I don't see attachment in your original email.
>
> Mind sharing the JIRA number once you open the JIRA (I assume the patch
> would be attached there) ?
>
> Thanks
>
> On Thu, Sep 22, 2016 at 9:24 PM, William  wrote:
>
> > Hi all,
> >This is a simple scenario, there are two tables:
> >create table t1 (pk integer primary key, a integer);
> >create table t2 (pk1 integer not null, pk2 integer not null, a integer
> > constraint pk primary key (pk1, pk2));
> >
> >Do the following selects:
> >   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
> > pk < 40);
> >   results: SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
> >
> >   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
> > and pk1 < 40);
> >   results: FULL SCAN OVER T2
> >SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
> > AND PK1 < 40))
> >
> > Apparently, 2nd SELECT statement should use skip scan instead of full
> > table scan. But T2 has two pk columns and then WhereOptimizer failed to
> > optimize it. I went through the code and made a small improvement.
> > In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
> > attached patch file for detail. The main idea is we allow slot in
> childSlot
> > is null, only if all slots afterwards are null too. So the following
> > statements are still rejected:
> > select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2
> <
> > 40)
> >
> > Please review this. Thanks.
> > William.
> >
> >
> >
> >
> >
>


Re: Skip scan optimization failed for multi pk columns

2016-09-24 Thread Ted Yu
William:
I don't see attachment in your original email.

Mind sharing the JIRA number once you open the JIRA (I assume the patch
would be attached there) ?

Thanks

On Thu, Sep 22, 2016 at 9:24 PM, William  wrote:

> Hi all,
>This is a simple scenario, there are two tables:
>create table t1 (pk integer primary key, a integer);
>create table t2 (pk1 integer not null, pk2 integer not null, a integer
> constraint pk primary key (pk1, pk2));
>
>Do the following selects:
>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
> pk < 40);
>   results: SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>
>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
> and pk1 < 40);
>   results: FULL SCAN OVER T2
>SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
> AND PK1 < 40))
>
> Apparently, 2nd SELECT statement should use skip scan instead of full
> table scan. But T2 has two pk columns and then WhereOptimizer failed to
> optimize it. I went through the code and made a small improvement.
> In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
> attached patch file for detail. The main idea is we allow slot in childSlot
> is null, only if all slots afterwards are null too. So the following
> statements are still rejected:
> select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
> 40)
>
> Please review this. Thanks.
> William.
>
>
>
>
>


Re:Re: Skip scan optimization failed for multi pk columns

2016-09-24 Thread William
OK James, I'll do this tomorrow and have a look at those tickets.
Also, I am planning to do this, 
https://issues.apache.org/jira/browse/PHOENIX-1801. I'd like to support 
intersect and union scenario first, then sort union. 
Actually the QueryOptimizer doesn't 'choose' the appropriate index tables, but 
just generate all probable query plans then try to find the 'best'.
If we want to support complete index merge algorithm, we have to introduce cost 
model and this is not an easy job. Do you have any suggestions? 


Thanks.
William.


At 2016-09-23 14:46:41, "James Taylor"  wrote:
>Great find, William! Please file a JIRA with a test case and a patch, and
>we'll get this reviewed and committed.
>
>If you're interested and have time, there are a couple more issues related
>to skip scan that you could take a look at too: PHOENIX-1439, PHOENIX-307,
>and PHOENIX-1193.
>
>Thanks,
>James
>
>On Thu, Sep 22, 2016 at 9:24 PM, William  wrote:
>
>> Hi all,
>>This is a simple scenario, there are two tables:
>>create table t1 (pk integer primary key, a integer);
>>create table t2 (pk1 integer not null, pk2 integer not null, a integer
>> constraint pk primary key (pk1, pk2));
>>
>>Do the following selects:
>>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
>> pk < 40);
>>   results: SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>>
>>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
>> and pk1 < 40);
>>   results: FULL SCAN OVER T2
>>SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
>> AND PK1 < 40))
>>
>> Apparently, 2nd SELECT statement should use skip scan instead of full
>> table scan. But T2 has two pk columns and then WhereOptimizer failed to
>> optimize it. I went through the code and made a small improvement.
>> In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
>> attached patch file for detail. The main idea is we allow slot in childSlot
>> is null, only if all slots afterwards are null too. So the following
>> statements are still rejected:
>> select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
>> 40)
>>
>> Please review this. Thanks.
>> William.
>>
>>
>>
>>
>>


Re: Skip scan optimization failed for multi pk columns

2016-09-23 Thread James Taylor
Great find, William! Please file a JIRA with a test case and a patch, and
we'll get this reviewed and committed.

If you're interested and have time, there are a couple more issues related
to skip scan that you could take a look at too: PHOENIX-1439, PHOENIX-307,
and PHOENIX-1193.

Thanks,
James

On Thu, Sep 22, 2016 at 9:24 PM, William  wrote:

> Hi all,
>This is a simple scenario, there are two tables:
>create table t1 (pk integer primary key, a integer);
>create table t2 (pk1 integer not null, pk2 integer not null, a integer
> constraint pk primary key (pk1, pk2));
>
>Do the following selects:
>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
> pk < 40);
>   results: SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>
>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
> and pk1 < 40);
>   results: FULL SCAN OVER T2
>SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
> AND PK1 < 40))
>
> Apparently, 2nd SELECT statement should use skip scan instead of full
> table scan. But T2 has two pk columns and then WhereOptimizer failed to
> optimize it. I went through the code and made a small improvement.
> In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
> attached patch file for detail. The main idea is we allow slot in childSlot
> is null, only if all slots afterwards are null too. So the following
> statements are still rejected:
> select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
> 40)
>
> Please review this. Thanks.
> William.
>
>
>
>
>


Skip scan optimization failed for multi pk columns

2016-09-22 Thread William
Hi all,
   This is a simple scenario, there are two tables:
   create table t1 (pk integer primary key, a integer);
   create table t2 (pk1 integer not null, pk2 integer not null, a integer 
constraint pk primary key (pk1, pk2));


   Do the following selects:
  1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and pk < 
40);
  results: SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
  
  2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30 and 
pk1 < 40);
  results: FULL SCAN OVER T2 
   SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30 AND 
PK1 < 40))


Apparently, 2nd SELECT statement should use skip scan instead of full table 
scan. But T2 has two pk columns and then WhereOptimizer failed to optimize it. 
I went through the code and made a small improvement.
In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the attached 
patch file for detail. The main idea is we allow slot in childSlot is null, 
only if all slots afterwards are null too. So the following statements are 
still rejected:
select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 < 40)


Please review this. Thanks.
William.