Re: [sqlite] query optimization for inner table join

2008-12-02 Thread Igor Tandetnik
"Jos van den Oever" <[EMAIL PROTECTED]>
wrote in message
news:[EMAIL PROTECTED]
> There's two tables with the same problem. One has an undetermined
> number of values: 'm' points to user-definable tag.
> In the other table I have about 110 values. This could be spread over
> two integer columns. I'm a bit hesitant to use integer values as
> bitmasks. How is the signedness handled in the binding? Should I
> simply use a uint64_t and not worry?

I believe it would just work. But, if you think that would be a problem, 
you can use only 63 bits. Two columns will still cover 126 possible 
values.

Igor Tandetnik



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query optimization for inner table join

2008-12-02 Thread Jos van den Oever
2008/12/2 Igor Tandetnik <[EMAIL PROTECTED]>:
> You could also try something more straightforward:
>
> select distinct n from map m1 where
>exists (select 1 from map m2 where m1.n=m2.n and m2.m=3) and
>exists (select 1 from map m2 where m1.n=m2.n and m2.m=5) and
>not exists (select 1 from map m2 where m1.n=m2.n and m2.m=7);
>
> -- or
>
> select distinct n from map where
>n in (select n from map where m=3) and
>n in (select n from map where m=5) and
>n not in (select n from map where m=7);

This would have a worse worst case scenario, but by cleverly ordering
the inclusive statements from infrequent to frequent and the exclusive
ones from frequent to infrequent this could be improved. I'd have to
do a
  select m, count(m) from map group by m;
to get the info I need for that.

> If you need to run this kind of query often, and values of m are small
> (preferably less than 64), you might want to store a map from n to a
> bitmask where each bit corresponds to one value of m. Then the query
> becomes simply
>
> select n from map
> where (n & 168) = 40;
>
> This is going to be linear, but in the number of distinct values of n,
> not in the number of all pairs.

There's two tables with the same problem. One has an undetermined
number of values: 'm' points to user-definable tag.
In the other table I have about 110 values. This could be spread over
two integer columns. I'm a bit hesitant to use integer values as
bitmasks. How is the signedness handled in the binding? Should I
simply use a uint64_t and not worry?

Cheers,
Jos
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query optimization for inner table join

2008-12-01 Thread Igor Tandetnik
"Jos van den Oever" <[EMAIL PROTECTED]>
wrote in message
news:[EMAIL PROTECTED]
> 2008/12/1 Igor Tandetnik <[EMAIL PROTECTED]>:
>> Try this:
>>
>> select n from map
>> group by n
>> having
>>count(case when m=3 then 1 else null end) != 0 and
>>count(case when m=5 then 1 else null end) != 0 and
>>count(case when m=7 then 1 else null end) = 0;
>>
>> Having an index on map(n) should speed it up.
>
> Thank you very much, Igor. I would have not thought of that.
>
> This is a nicely predictable single linear scan. Still not awfully
> fast, but it will have to do.

You could also try something more straightforward:

select distinct n from map m1 where
exists (select 1 from map m2 where m1.n=m2.n and m2.m=3) and
exists (select 1 from map m2 where m1.n=m2.n and m2.m=5) and
not exists (select 1 from map m2 where m1.n=m2.n and m2.m=7);

-- or

select distinct n from map where
n in (select n from map where m=3) and
n in (select n from map where m=5) and
n not in (select n from map where m=7);


If you need to run this kind of query often, and values of m are small 
(preferably less than 64), you might want to store a map from n to a 
bitmask where each bit corresponds to one value of m. Then the query 
becomes simply

select n from map
where (n & 168) = 40;

This is going to be linear, but in the number of distinct values of n, 
not in the number of all pairs.

Igor Tandetnik 



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query optimization for inner table join

2008-12-01 Thread Jos van den Oever
2008/12/1 Igor Tandetnik <[EMAIL PROTECTED]>:
> Try this:
>
> select n from map
> group by n
> having
>count(case when m=3 then 1 else null end) != 0 and
>count(case when m=5 then 1 else null end) != 0 and
>count(case when m=7 then 1 else null end) = 0;
>
> Having an index on map(n) should speed it up.

Thank you very much, Igor. I would have not thought of that.

This is a nicely predictable single linear scan. Still not awfully
fast, but it will have to do.
Using an index on map(n,m) seems faster. This may be because the m
values are in the index and there is no need to access the table.

Cheers,
Jos
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] query optimization for inner table join

2008-12-01 Thread Igor Tandetnik
"Jos van den Oever" <[EMAIL PROTECTED]>
wrote in message
news:[EMAIL PROTECTED]
> I've trouble optimizing for an N:M mapping table. The schema of the
> table is this:
>
> CREATE TABLE map (n INTEGER NOT NULL, m INTEGER NOT NULL);
>
> I want to retrieve a list of n filtered on the presence of certain
> values of m, e.g. give me all n for which there is an m = 3 and m = 5,
> but no m = 7.

Try this:

select n from map
group by n
having
count(case when m=3 then 1 else null end) != 0 and
count(case when m=5 then 1 else null end) != 0 and
count(case when m=7 then 1 else null end) = 0;

Having an index on map(n) should speed it up.

Igor Tandetnik



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] query optimization for inner table join

2008-12-01 Thread Jos van den Oever
Hi all,

I've trouble optimizing for an N:M mapping table. The schema of the
table is this:

CREATE TABLE map (n INTEGER NOT NULL, m INTEGER NOT NULL);

I want to retrieve a list of n filtered on the presence of certain
values of m, e.g. give me all n for which there is an m = 3 and m = 5,
but no m = 7.
A naive query would look like this:

SELECT a.n FROM map a, map b, map c
  WHERE a.n = b.n AND a.n = c.n AND a.m = 3 AND b.m = 5
 AND c.id not in (select id from map where c.m = 7);

This can be slow, even for the more simple case with only positive selection:

SELECT a.n FROM map a, map b
  WHERE a.n = b.n AND a.m = 3 AND b.m = 5;

And this variation does not make it a lot faster:

SELECT n FROM map WHERE m = 3 INTERSECT SELECT n FROM map where m = 5;

There are about a million entries in the table map and want to
increase to about 10 million.

The current indexes are

CREATE INDEX map_n ON map(n);
CREATE INDEX map_m ON map(n,m);

Is there a cleverer way of doing these queries?

The fraction of n's that has a particular m can be anywhere between 0 and 1.

Cheers,
Jos
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users