Re: Skip scan optimization failed for multi pk columns

2016-09-24 Thread James Taylor
William,
Yes, PHOENIX-1801 is a good one too. IMHO, we should do that work in the
calcite branch where we can leverage the cost based optimizer framework in
Calcite.
Thanks,
James

On Sat, Sep 24, 2016 at 7:36 AM, Ted Yu  wrote:

> William:
> I don't see attachment in your original email.
>
> Mind sharing the JIRA number once you open the JIRA (I assume the patch
> would be attached there) ?
>
> Thanks
>
> On Thu, Sep 22, 2016 at 9:24 PM, William  wrote:
>
> > Hi all,
> >This is a simple scenario, there are two tables:
> >create table t1 (pk integer primary key, a integer);
> >create table t2 (pk1 integer not null, pk2 integer not null, a integer
> > constraint pk primary key (pk1, pk2));
> >
> >Do the following selects:
> >   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
> > pk < 40);
> >   results: SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
> >
> >   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
> > and pk1 < 40);
> >   results: FULL SCAN OVER T2
> >SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
> > AND PK1 < 40))
> >
> > Apparently, 2nd SELECT statement should use skip scan instead of full
> > table scan. But T2 has two pk columns and then WhereOptimizer failed to
> > optimize it. I went through the code and made a small improvement.
> > In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
> > attached patch file for detail. The main idea is we allow slot in
> childSlot
> > is null, only if all slots afterwards are null too. So the following
> > statements are still rejected:
> > select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2
> <
> > 40)
> >
> > Please review this. Thanks.
> > William.
> >
> >
> >
> >
> >
>


Re: Skip scan optimization failed for multi pk columns

2016-09-24 Thread Ted Yu
William:
I don't see attachment in your original email.

Mind sharing the JIRA number once you open the JIRA (I assume the patch
would be attached there) ?

Thanks

On Thu, Sep 22, 2016 at 9:24 PM, William  wrote:

> Hi all,
>This is a simple scenario, there are two tables:
>create table t1 (pk integer primary key, a integer);
>create table t2 (pk1 integer not null, pk2 integer not null, a integer
> constraint pk primary key (pk1, pk2));
>
>Do the following selects:
>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
> pk < 40);
>   results: SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>
>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
> and pk1 < 40);
>   results: FULL SCAN OVER T2
>SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
> AND PK1 < 40))
>
> Apparently, 2nd SELECT statement should use skip scan instead of full
> table scan. But T2 has two pk columns and then WhereOptimizer failed to
> optimize it. I went through the code and made a small improvement.
> In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
> attached patch file for detail. The main idea is we allow slot in childSlot
> is null, only if all slots afterwards are null too. So the following
> statements are still rejected:
> select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
> 40)
>
> Please review this. Thanks.
> William.
>
>
>
>
>


Re: Skip scan optimization failed for multi pk columns

2016-09-23 Thread James Taylor
Great find, William! Please file a JIRA with a test case and a patch, and
we'll get this reviewed and committed.

If you're interested and have time, there are a couple more issues related
to skip scan that you could take a look at too: PHOENIX-1439, PHOENIX-307,
and PHOENIX-1193.

Thanks,
James

On Thu, Sep 22, 2016 at 9:24 PM, William  wrote:

> Hi all,
>This is a simple scenario, there are two tables:
>create table t1 (pk integer primary key, a integer);
>create table t2 (pk1 integer not null, pk2 integer not null, a integer
> constraint pk primary key (pk1, pk2));
>
>Do the following selects:
>   1. explain select * from t1 where (pk > 10 and pk < 20) or (pk > 30 and
> pk < 40);
>   results: SKIP SCAN ON 2 RANGES OVER T1 [11] - [40]
>
>   2. explain select * from t2 where (pk1 > 10 and pk1 < 20) or (pk1 > 30
> and pk1 < 40);
>   results: FULL SCAN OVER T2
>SERVER FILTER BY ((PK1 > 10 AND PK1 < 20) OR (PK1 > 30
> AND PK1 < 40))
>
> Apparently, 2nd SELECT statement should use skip scan instead of full
> table scan. But T2 has two pk columns and then WhereOptimizer failed to
> optimize it. I went through the code and made a small improvement.
> In WhereOptimizer#KeyExpressionVisitor#orKeySlots(),  see the
> attached patch file for detail. The main idea is we allow slot in childSlot
> is null, only if all slots afterwards are null too. So the following
> statements are still rejected:
> select * from t2 where (pk1 > 10 and pk1 < 20)  or (pk2 > 30 and pk2 <
> 40)
>
> Please review this. Thanks.
> William.
>
>
>
>
>