[sqlite] BestIndex problem

2015-05-22 Thread Emmanouil Karvounis
Thank you so much for your answers, we managed to do what we wanted.


Best,
Manos
Stamatis

On 21 May 2015 at 21:42, Dan Kennedy  wrote:

> On 05/21/2015 10:20 PM, Emmanouil Karvounis wrote:
>
>> Greetings,
>>
>> We are having an issue with BestIndex in our Virtual Table implementation.
>> Allow us to illustrate the problem with a simple example.
>>
>> We create a virtual table named 'vt' that conceptually has 3 columns
>> "value1, value2, value3" and then we want to execute the following
>> queries:
>>
>> 1) select value1 from vt where value1 = 7;
>>
>> In this case, BestIndex passes the equal constraint on the first column
>> and
>> by setting the output as pdxInfo->aConstraintUsage[0].argvIndex = 1,
>> we indicate that we accept this specific constraint. So, at Filter we get
>> the value 7 as argument0. Everything behaves normally so far.
>>
>> However, in case we run any of the following equivalent queries (that
>> should pass to Filter more than one value), we get an error message
>> "xBestIndex returned an invalid plan":
>>
>> 2) select value1 from vt where value1 = 7 or value1 = 8;
>> 3) select value1 from vt where value1 in (select * from tableA);//
>> suppose tableA contains an integer 'id' column and records (7, 8)
>> 4) select value1 from vt, tableA where value1 = tableA.id;
>>
>> Again, in each case we set pdxInfo->aConstraintUsage[0].argvIndex = 1 but
>> we get the above error message.
>>
>> This behavior seems rather weird, so we'd like some expert help on what we
>> might be doing wrong.
>>
>
> That happens if you set the argvIndex variable on a constraint for which
> the "usable" flag is not set. For each constraint in the aConstraint[]
> array, you need to check that the "usable" flag is set - and ignore the
> constraint if it is not.
>
> Search for "usable" in this section of the docs:
>
>   https://www.sqlite.org/vtab.html#xbestindex
>
> Dan.
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>


[sqlite] BestIndex problem

2015-05-21 Thread Emmanouil Karvounis
Greetings,

We are having an issue with BestIndex in our Virtual Table implementation.
Allow us to illustrate the problem with a simple example.

We create a virtual table named 'vt' that conceptually has 3 columns
"value1, value2, value3" and then we want to execute the following queries:

1) select value1 from vt where value1 = 7;

In this case, BestIndex passes the equal constraint on the first column and
by setting the output as pdxInfo->aConstraintUsage[0].argvIndex = 1,
we indicate that we accept this specific constraint. So, at Filter we get
the value 7 as argument0. Everything behaves normally so far.

However, in case we run any of the following equivalent queries (that
should pass to Filter more than one value), we get an error message
"xBestIndex returned an invalid plan":

2) select value1 from vt where value1 = 7 or value1 = 8;
3) select value1 from vt where value1 in (select * from tableA);//
suppose tableA contains an integer 'id' column and records (7, 8)
4) select value1 from vt, tableA where value1 = tableA.id;

Again, in each case we set pdxInfo->aConstraintUsage[0].argvIndex = 1 but
we get the above error message.

This behavior seems rather weird, so we'd like some expert help on what we
might be doing wrong.


Thank you for your time,
Manos Karvounis
Stamatis Christoforidis


Re: [sqlite] Streaming group by in union of already sorted tables

2015-01-23 Thread Emmanouil Karvounis
Thank you very much, Richard.

I'm sure this enhancement, modelled in a more general way, e.g., like an
isSorted flag on the subquery to be used by the outer query, can be a great
enhancement for many other types of queries employing nesting. I think it
will help both in time and in space complexity, something very important
when aiming at the embedded systems enviroment.

Let me know if you need us to provide any further details or other
assistance.


Regards,
Manos

On 23 January 2015 at 21:30, Richard Hipp <d...@sqlite.org> wrote:

> On 1/23/15, Emmanouil Karvounis <man...@di.uoa.gr> wrote:
> > We have two tables that are already sorted on a combination of
> > two fields (which are their primary keys) and we want to union them and
> > apply group by on both the aforementioned fields, like so:
> >
> > select c1, c2, myagg(*) from (
> > select * from tableA
> > union all
> > select * from tableB
> > )
> > group by c1, c2;
> >
> > where tableA, tableB have primary key (c1, c2) and their schema
> comprises 3
> > integers: c1, c2, and prop.
> >
> > The sqlite query plan creates a temporary B-tree to hold all the records
> of
> > both tables to execute the group by.
>
> You are correct.  Here is the test case I am using:
>
> CREATE TABLE tableA(c1 INT, c2 INT, payload INT, PRIMARY KEY(c1,c2));
> CREATE TABLE tableB(c1 INT, c2 INT, payload INT, PRIMARY KEY(c1,c2));
>
> explain query plan
> SELECT * FROM tableA
> UNION ALL
> SELECT * FROM tableB
> ORDER BY 1, 2;
>
> .print -
>
> explain query plan
> SELECT c1, c2, sum(payload) FROM (
>   SELECT * FROM tableA
>   UNION ALL
>   SELECT * FROM tableB
> )
> GROUP BY c1, c2;
>
> The first query - the UNION ALL with the ORDER BY does not use a
> separate B-tree for sorting.  So it seems logical that the second
> query that does a GROUP BY over the same UNION ALL should be able to
> get by without a separate sorting B-Tree too.
>
> For various technical reasons, this is complex change in SQLite.  But
> it seems like a worthwhile enhancement, so we'll take your suggestion
> under advisement and attempt to do a better job of handling your query
> in future releases.
>
> Thanks for posting.
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Streaming group by in union of already sorted tables

2015-01-23 Thread Emmanouil Karvounis
>
> Sorry, but SQLite does not understand how the subquery (inside the
> brackets) is going to be used by the main query.  It hqs to complete the
> subquery first and only then can it inspect the main query to find out how
> to optimize it.  This is not a bug, there just isn't enough flexibility to
> do this the way you want.


This is unfortunate, especially for an RDBMS aiming to minimize memory
consumption for embedded systems. I believe there are many other queries
where optimizing the subqueries with regard to the external query will
result in optimal memory usage.


> One of the following may or may not be useful:
>
> You may be able to use
>
> select c1, c2, myagg(*) from (
> select c1, c2 from tableA group by c1,c2
> union all
> select c1, c2 from tableB group by c1,c2
> ) group by c1,c2
>
>
Thank you for your suggestion. But the query plan is the same, how would
that be any faster? We actually run an experiment and it is a bit slower, I
suppose because of the additional overhead of creating each group in the
subquery, which doesn't contribute anything given the way the external
query is handled.

Alternatively, is there a good reason for tableA and tableB not to be
> merged with, perhaps, an extra column indicating 'A' or 'B' ?  This would
> allow you to create an index and get your answer almost instantly.  When
> you see two tables with the same columns it's often an indication that
> there should really be one table.
>

This is an alternative we wish to explore, but in our usecase it would be
considered a hack, and might cause side-effects in other parts of the code.
If we are forced to do it, we might be able to pull it off though. Thank
you for your suggestion.


Best,
Manos



>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Streaming group by in union of already sorted tables

2015-01-23 Thread Emmanouil Karvounis
To explain that with an example:

tableA
(1,2) (2,3) (2,4) (3,5)

tableB
(1,2) (2,3) (4,6)


Get a pointer on tableA and one on tableB.

(1,2) and (1,2) form a group, run count(*) and output 2.

Advance both pointers.

(2,3) and (2,3) form a group, run count(*) and output 2.

Advance both pointers.

(2,4) is less than (4,6) (because 2 < 4). (2,4) forms a group, run count(*)
and output 1.

Advance pointer in tableA.

(3,5) is less than (4,6) (because 3 < 5). (3,5) forms a group, run count(*)
and output 1.

pointer on table A is depleted. Continue with tableB only.

(4,6) forms a group. run count(*) and output 1.


The sqlite plan calls for merging both tables in a B-tree and scanning it,
which incurs unneeded time and space complecity compared to the above.


Manos

On 23 January 2015 at 18:59, Emmanouil Karvounis <man...@di.uoa.gr> wrote:

> Dear Simon,
>
> Thank you for your answer and I'm sorry if I have used inappropriate
> wording and confused you.
>
> The issue is actually very simple:
>
> tableA and tableB have both primary key on (c1, c2)
>
> explain query plan
> select c1, c2, count(*) from (
> select c1, c2 from tableA
> union all
> select c1, c2 from tableB
> )
> group by c1,c2
>
> 2|0|0|SCAN TABLE tableA USING COVERING INDEX sqlite_autoindex_tableA_1
> 3|0|0|SCAN TABLE tableB USING COVERING INDEX sqlite_autoindex_tableB_1
> 1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
> 0|0|0|SCAN SUBQUERY 1
> 0|0|0|USE TEMP B-TREE FOR GROUP BY
>
> There is no reason to create a new temp B-tree when you can sequentially
> and in-synch scan the B-tree of tableA and of tableB and get the groups in
> one pass. Think of how you would execute the sort step of mergesort on two
> already sorted subarrays.
>
>
> Manos
>
> On 23 January 2015 at 18:42, Simon Slavin <slav...@bigfraud.org> wrote:
>
>>
>> On 23 Jan 2015, at 4:15pm, Emmanouil Karvounis <man...@di.uoa.gr> wrote:
>>
>> > In short, we have two tables that are already sorted on a combination of
>> > two fields
>>
>> There is no such thing as a 'sorted table' in SQL.  Each table is a set
>> of rows and the rows have no order.
>>
>> If you want to make it easy for SQL to access a table's rows in a
>> particular order, create an index or make that order the table's primary
>> key (which is another way of making an index).
>>
>> > select c1, c2, myagg(*) from (
>> > select * from tableA
>> > union all
>> > select * from tableB
>> > )
>> > group by c1, c2;
>>
>> This command tells SQL that you want to construct a list of every row of
>> tableA and every row of tableB.  In other words, if you have 300 rows in
>> tableA and 500 rows in tableB, you are telling SQL to construct a new table
>> of 800 rows.  And because this table doesn't yet exist, it doesn't have any
>> indexes so it can't be searched quickly.  Is that what you wanted ?
>>
>> Is there a good reason for needing this data in two separate tables
>> rather than one for which you can create an index on (c1, c2) ?
>>
>> Do the groups occur entirely within one table or do you have to add the
>> tables together before SQL can figure out the groups.
>>
>> > where tableA, tableB have primary key (c1, c2) and their schema
>> comprises 3
>> > integers: c1, c2, and prop.
>>
>> It might be worth testing with something like 'total(*)' just to make
>> sure it isn't your own function which is causing the problems.
>>
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Streaming group by in union of already sorted tables

2015-01-23 Thread Emmanouil Karvounis
Dear Simon,

Thank you for your answer and I'm sorry if I have used inappropriate
wording and confused you.

The issue is actually very simple:

tableA and tableB have both primary key on (c1, c2)

explain query plan
select c1, c2, count(*) from (
select c1, c2 from tableA
union all
select c1, c2 from tableB
)
group by c1,c2

2|0|0|SCAN TABLE tableA USING COVERING INDEX sqlite_autoindex_tableA_1
3|0|0|SCAN TABLE tableB USING COVERING INDEX sqlite_autoindex_tableB_1
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR GROUP BY

There is no reason to create a new temp B-tree when you can sequentially
and in-synch scan the B-tree of tableA and of tableB and get the groups in
one pass. Think of how you would execute the sort step of mergesort on two
already sorted subarrays.


Manos

On 23 January 2015 at 18:42, Simon Slavin <slav...@bigfraud.org> wrote:

>
> On 23 Jan 2015, at 4:15pm, Emmanouil Karvounis <man...@di.uoa.gr> wrote:
>
> > In short, we have two tables that are already sorted on a combination of
> > two fields
>
> There is no such thing as a 'sorted table' in SQL.  Each table is a set of
> rows and the rows have no order.
>
> If you want to make it easy for SQL to access a table's rows in a
> particular order, create an index or make that order the table's primary
> key (which is another way of making an index).
>
> > select c1, c2, myagg(*) from (
> > select * from tableA
> > union all
> > select * from tableB
> > )
> > group by c1, c2;
>
> This command tells SQL that you want to construct a list of every row of
> tableA and every row of tableB.  In other words, if you have 300 rows in
> tableA and 500 rows in tableB, you are telling SQL to construct a new table
> of 800 rows.  And because this table doesn't yet exist, it doesn't have any
> indexes so it can't be searched quickly.  Is that what you wanted ?
>
> Is there a good reason for needing this data in two separate tables rather
> than one for which you can create an index on (c1, c2) ?
>
> Do the groups occur entirely within one table or do you have to add the
> tables together before SQL can figure out the groups.
>
> > where tableA, tableB have primary key (c1, c2) and their schema
> comprises 3
> > integers: c1, c2, and prop.
>
> It might be worth testing with something like 'total(*)' just to make sure
> it isn't your own function which is causing the problems.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Streaming group by in union of already sorted tables

2015-01-23 Thread Emmanouil Karvounis
Thank you Clemens.

In the general case, myagg() doesn't have the appropriate property to
distribute over tableA and tableB seperately.

Let me be clear, we are not looking for alternatives to get our query
running more efficiently, we sincerely believe that we have found a defect
in the sqlite query planner. This is a bug report, not a request for
assistance on formulating our query.

I do appreciate the effort and your good will, though. Thank you again.

Manos

On 23 January 2015 at 18:48, Clemens Ladisch <clem...@ladisch.de> wrote:

> Emmanouil Karvounis wrote:
> > In short, we have two tables that are already sorted on a combination of
> > two fields (which are their primary keys) and we want to union them and
> > apply group by on both the aforementioned fields, like so:
> >
> > select c1, c2, myagg(*) from (
> > select * from tableA
> > union all
> > select * from tableB
> > )
> > group by c1, c2;
> >
> > The sqlite query plan creates a temporary B-tree to hold all the records
> of
> > both tables to execute the group by. This incurs too much time and space
> > complexity
>
> If the number of result rows is small compared to the number of table
> rows, and if myagg() works correctly when called on its own results,
> then you can reduce the size of the temporary B-tree by doing another
> GROUP BY in the subqueries:
>
> select c1, c2, myagg(prop) from (
>   select c1, c2, myagg(prop) as prop from tableA group by c1, c2
>   union all
>   select c1, c2, myagg(prop) from tableB group by c2, c2
> )
> group by c1, c2;
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Streaming group by in union of already sorted tables

2015-01-23 Thread Emmanouil Karvounis
Greetings,

We are having an issue with the sqlite query plan for one of our queries,
which seems to be sub-optimal both in time and in space complexity.

In short, we have two tables that are already sorted on a combination of
two fields (which are their primary keys) and we want to union them and
apply group by on both the aforementioned fields, like so:

select c1, c2, myagg(*) from (
select * from tableA
union all
select * from tableB
)
group by c1, c2;

where tableA, tableB have primary key (c1, c2) and their schema comprises 3
integers: c1, c2, and prop.

The sqlite query plan creates a temporary B-tree to hold all the records of
both tables to execute the group by. This incurs too much time and space
complexity compared to the better plan of sequentially iterating over both
tables using their B-tree indices on the primary key (since they contain
the records already sorted in the group by fields). Like what one would do
on the sort step of mergesort to sort two already sorted lists.

When we manually implemented the idea of the sequential scan of the two
tables and manually gathering the groups and applying the aggregate
function, we had a 2X speedup compared to the sqlite plan of creating a
temp B-tree. Of course, this speedup would be even greater if we could push
the agg function inside the sqlite engine and having it do the sequential
scanning plan on the indices directly.

Note: In the general case, tableA and tableB have a non empty intersection
and neither one is a subset of the other.

At the end of the email we provide extensive details for the tables we use
and the query we want to run, plus a rundown on the complexity of the
sqlite plan and our proposed solution.

We are at your disposal for any further clarifications and we are looking
forward to your reaction on this.


Kind Regards,
Manos Karvounis


--



SQLite version: 3.8.8.1 (latest)


pragma table_info(tableA);

0|c1|int(11)|0||1
1|c2|int(11)|0||2
2|prop|int(11)|0||0


pragma table_info(tableB);

0|c1|int(11)|0||1
1|c2|int(11)|0||2
2|prop|int(11)|0||0


select count(*) from tableA;

11095298


select count(*) from tableB;

100


pragma index_list(tableA);

0|sqlite_autoindex_tableA_1|1


pragma index_info(sqlite_autoindex_tableA_1);

0|0|c1
1|1|c2


pragma index_list(tableB);

0|sqlite_autoindex_tableB_1|1


pragma index_info(sqlite_autoindex_tableB_1);

0|0|c1
1|1|c2


explain query plan
select c1, c2, myagg(*) from (
select * from tableA
union all
select * from tableB
)
group by c1, c2;

2|0|0|SCAN TABLE tableA
3|0|0|SCAN TABLE tableB
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR GROUP BY


Note that we get the same query plan even if we explicitly ask for ordering
tableA, tableB on c1, c2 prior to group by.

explain query plan
select c1, c2, myagg(*) from (
select * from tableA
union all
select * from tableB
order by c1, c2
)
group by c1, c2;

2|0|0|SCAN TABLE tableA USING INDEX sqlite_autoindex_tableA_1
3|0|0|SCAN TABLE tableB USING INDEX sqlite_autoindex_tableB_1
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR GROUP BY


SQLite Query Plan:

(1) Add every record of tableA and tableB on a new B-tree
Time complexity: O((|tableA| + |tableB|)log(|tableA|+|tableB|)).
Space overhead: O(|tableA|+|tableB|).

(2) Run through the B-tree to get groups
Time complexity: O(|tableA| + |tableB|).


Alternative Query Plan:

(1) Iterate over both B-trees in synch since they are already sorted (i.e.,
the way we would merge two already sorted lists using two running pointers,
or the behavior you indicate in 3.2 here
https://www.sqlite.org/queryplanner.html).
Time complexity: O(|tableA|+|tableB|)
Space overhead: O(1)

Yet in the following queries the plan (seems to) correctly use the primary
key index to speed up the order by.

explain query plan
select c1, c2, prop from (
select * from tableA
union all
select * from tableB
)
order by c1, c2;

1|0|0|SCAN TABLE tableA USING INDEX sqlite_autoindex_tableA_1
2|0|0|SCAN TABLE tableB USING INDEX sqlite_autoindex_tableB_1
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)


explain query plan
select c1, c2, prop from table A
order by c1, c2;

0|0|0|SCAN TABLE tableA USING INDEX sqlite_autoindex_tableA_1


Even if one of the tables did not have an index, again the better plan
would be to create a temp B-tree for that table and then iterate in synch.
Yet sqlite again creates a temp B-tree as shown below.

pragma table_info(tableC);

0|c1|int(11)|0||0
1|c2|int(11)|0||0
2|prop|int(11)|0||0


select count(*) from tableC;

100


pragma index_list(tableC);


explain query plan
select c1, c2, myagg(*) from (
select * from tableA
union all
select * from tableC
)
group by c1, c2;

2|0|0|SCAN TABLE tableA
3|0|0|SCAN TABLE tableC
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR GROUP BY