[sqlite] Discrepancy with indexing and WHERE clause with AND/OR

RP McMurphy Mon, 10 Nov 2014 07:09:24 -0800

If there is a large table and we need to select a subset of values using a 
WHERE clause with an AND/OR construct sqlite has trouble finding the answer in 
a reasonable time. Breaking the queries down into separate SELECT statements 
speeds up the process exponentially.


For example the following takes a few seconds to return the answer 1334. Note 
that the index (w) is a "low quality" index with the arguments in the wrong 
order. The reason for this is explained further down:

        with recursive cnt(x) as (select 1 union all select x+1 from cnt limit 
10000000)
        insert into v select x % 3,x from cnt;

        create index w on v(z,y);
        select count(*) from v where    z = 0 and
                                (       y between 1000000 and 1001000 
                                or      y between 2000000 and 2001000
                                or      y between 3000000 and 3001000
                                or      y between 4000000 and 4001000);

The reason the the "low quality" index is because this data is also accessed in 
a different manner. Namely like this:

        select count(*) from v group by z;

Both of the above queries each take about 3 seconds to run. I don't think the 
second query can be made faster, but the first query can certainly be much 
faster even with the "low quality" index. Thus:

        select 
        (select count(*) from v where z = 0 and y between 1000000 and 1001000) +
        (select count(*) from v where z = 0 and y between 2000000 and 2001000) +
        (select count(*) from v where z = 0 and y between 3000000 and 3001000) +
        (select count(*) from v where z = 0 and y between 4000000 and 4001000);

Now the query returns the result 1334 almost immediately. The only difference 
is that the WHERE clause has been manually flattened and broken into separate 
SELECT portions.

When we change the index to "high quality" (u) and put the arguments in the 
other order.

        drop index w;
        create index u on v(y,z);

And rerun the query:

        select count(*) from v where    z = 0 and
                                (       y between 1000000 and 1001000 
                                or      y between 2000000 and 2001000
                                or      y between 3000000 and 3001000
                                or      y between 4000000 and 4001000);

The answer 1334 is returned almost immediately. And flattening this query gives 
no advantage. But now the second type of query runs terribly slow because the 
index is very poor for this type of query:

        select count(*) from v group by z;

And takes more then 30 seconds to finish.

Trying to make two indices in the hope that sqlite will find the optimal one by 
itself gives back result times the same as if only the w index is present:

        create index w on v(z,y);
        create index u on v(y,z);

Both queries now take a few seconds to run. It is as if the u index does not 
exist.

In our application we have only created the w index (since the u index is awful 
for the grouping query) and manually generate the WHERE/AND/OR flattened 
queries in a loop. This solution is unsatisfying to us and I think we must be 
doing something wrong. Is there a way we can make the w index work with both 
queries and not have to run external loops to flatten all the WHERE clauses?

RP

PS: Below is the text in one unit that can be copied and pasted into a shell 
session running sqlite3.exe:

create table times(idx,j);
create table v(z,y);

with recursive cnt(x) as (select 1 union all select x+1 from cnt limit 10000000)
insert into v select x % 3,x from cnt;

create index w on v(z,y);
select 'With index (z,y)';
insert into times select 0,julianday('now');
select 
(select count(*) from v where z = 0 and y between 1000000 and 1001000) +
(select count(*) from v where z = 0 and y between 2000000 and 2001000) +
(select count(*) from v where z = 0 and y between 3000000 and 3001000) +
(select count(*) from v where z = 0 and y between 4000000 and 4001000);
insert into times select 1,julianday('now');
select strftime('%f',(select j from times where idx=1)-(select j from times 
where idx=0));
select count(*) from v where    z = 0 and
                        (       y between 1000000 and 1001000 
                        or      y between 2000000 and 2001000
                        or      y between 3000000 and 3001000
                        or      y between 4000000 and 4001000);
insert into times select 2,julianday('now');
select strftime('%f',(select j from times where idx=2)-(select j from times 
where idx=1));
select count(*) from v group by z;
insert into times select 3,julianday('now');
select strftime('%f',(select j from times where idx=3)-(select j from times 
where idx=2));

drop index w;
delete from times;

create index u on v(y,z);
select 'With index (y,z)';
insert into times select 0,julianday('now');
select 
(select count(*) from v where z = 0 and y between 1000000 and 1001000) +
(select count(*) from v where z = 0 and y between 2000000 and 2001000) +
(select count(*) from v where z = 0 and y between 3000000 and 3001000) +
(select count(*) from v where z = 0 and y between 4000000 and 4001000);
insert into times select 1,julianday('now');
select strftime('%f',(select j from times where idx=1)-(select j from times 
where idx=0));
select count(*) from v where    z = 0 and
                        (       y between 1000000 and 1001000 
                        or      y between 2000000 and 2001000
                        or      y between 3000000 and 3001000
                        or      y between 4000000 and 4001000);
insert into times select 2,julianday('now');
select strftime('%f',(select j from times where idx=2)-(select j from times 
where idx=1));
select count(*) from v group by z;
insert into times select 3,julianday('now');
select strftime('%f',(select j from times where idx=3)-(select j from times 
where idx=2));

delete from times;

create index w on v(z,y);
select 'With both indices (y,z) and (z,y)';
insert into times select 0,julianday('now');
select 
(select count(*) from v where z = 0 and y between 1000000 and 1001000) +
(select count(*) from v where z = 0 and y between 2000000 and 2001000) +
(select count(*) from v where z = 0 and y between 3000000 and 3001000) +
(select count(*) from v where z = 0 and y between 4000000 and 4001000);
insert into times select 1,julianday('now');
select strftime('%f',(select j from times where idx=1)-(select j from times 
where idx=0));
select count(*) from v where    z = 0 and
                        (       y between 1000000 and 1001000 
                        or      y between 2000000 and 2001000
                        or      y between 3000000 and 3001000
                        or      y between 4000000 and 4001000);
insert into times select 2,julianday('now');
select strftime('%f',(select j from times where idx=2)-(select j from times 
where idx=1));
select count(*) from v group by z;
insert into times select 3,julianday('now');
select strftime('%f',(select j from times where idx=3)-(select j from times 
where idx=2));

.quit

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

[sqlite] Discrepancy with indexing and WHERE clause with AND/OR

Reply via email to