Re: [sqlite] Optimization of equality comparison when NULL involved

2007-06-18 Thread Joe Wilson
--- Sean Cunningham <[EMAIL PROTECTED]> wrote:
> I have very large datasets and have found that the built in union, 
> intersect, and except operations do not seem to use indices 
> (would be happy to be proven wrong here).   As  such, they
> are not very speedy with large large data sets.

A patch to speed up queries on a view (or a subquery) with 
compound selects:

  http://www.sqlite.org/cvstrac/tktview?tn=1924
  http://marc.info/?l=sqlite-users=117958960408282



   

Need a vacation? Get great deals
to amazing places on Yahoo! Travel.
http://travel.yahoo.com/

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Optimization of equality comparison when NULL involved

2007-06-18 Thread Scott Hess

On 6/18/07, Sean Cunningham <[EMAIL PROTECTED]> wrote:

There was talk in the mailing list a while back about creating a new
operator that would act as a superset of '==' which would treat
NULL==NULL as True.  I have seen this in some other database.
Anybody know if this is on the roadmap?


It wouldn't be hard to write a user-defined function to accomplish
this.  It naturally wouldn't be an infix operator.

-scott

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Optimization of equality comparison when NULL involved

2007-06-18 Thread Sean Cunningham


On Jun 15, 2007, at 5:27 PM, Scott Hess wrote:



select tableA.path, tableA.value from tableA,tableB where
tableA.path=tableB.path and tableA.value=tableB.value union
select tableA.path, tableA.value from tableA,tableB where
tableA.path=tableB.path and tableA.value IS NULL AND tableB.value IS
NULL;




I think I oversimplified the problem a little bit.  In the example I  
gave, there
were just two columns.  In the problem I have to solve, there are n  
columns,

and any item in any column might be NULL.  The above approach will work
well with just two columns, but with multiple columns you would have to
take an iterative approach; building intermediate results and  
intersecting

with the first column.

What I am trying to do is build efficient set operations.  I have  
very large datasets
and have found that the built in union, intersect, and except  
operations do not
seem to use indices (would be happy to be proven wrong here).   As  
such, they

are not very speedy with large large data sets.


If you really can treat null as '', then you might be better off
defining the column as NOT NULL DEFAULT ''.


I am leaning toward avoid NULL and using some other token which I can  
interpret

as "no data".  Not an ideal solution, but should scale.

There was talk in the mailing list a while back about creating a new  
operator that
would act as a superset of '==' which would treat NULL==NULL as  
True.  I have
seen this in some other database.  Anybody know if this is on the  
roadmap?




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Optimization of equality comparison when NULL involved

2007-06-17 Thread Joe Wilson
> select tableA.path, tableA.value from tableA,tableB where
> tableA.path=tableB.path and (tableA.value=tableB.value or
> (tableA.value IS NULL AND tableB.value IS NULL));
>
> It's possible that won't use an index, either, due to the OR, in which
> case you could try a union between a select with is-null only, and
> another with equality, something like:
> 
> select tableA.path, tableA.value from tableA,tableB where
> tableA.path=tableB.path and tableA.value=tableB.value union
> select tableA.path, tableA.value from tableA,tableB where
> tableA.path=tableB.path and tableA.value IS NULL AND tableB.value IS
> NULL;

Even though this query has no OR operation:

  select tableA.path, tableA.value from tableA,tableB where
  tableA.path=tableB.path and tableA.value=tableB.value 

it has to do a complete pass over one of the tables 
anyway since it does not have a constrained value (or at 
least a full pass over an index). So I think in this case 
the first query with the OR listed above is more efficient
since it would do just a single pass over one of the tables
instead of 2 passes (one for each SELECT in the UNION).

I've been thinking of heuristics to rewrite SELECTs with ORs
using UNIONs, which is pretty straightforward, but knowing 
when not to perform this optimization is the tricky part.


   

Yahoo! oneSearch: Finally, mobile search 
that gives answers, not web links. 
http://mobile.yahoo.com/mobileweb/onesearch?refer=1ONXIC

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Optimization of equality comparison when NULL involved

2007-06-15 Thread Scott Hess

You can use something like:

select tableA.path, tableA.value from tableA,tableB where
tableA.path=tableB.path and (tableA.value=tableB.value or
(tableA.value IS NULL AND tableB.value IS NULL));

It's possible that won't use an index, either, due to the OR, in which
case you could try a union between a select with is-null only, and
another with equality, something like:

select tableA.path, tableA.value from tableA,tableB where
tableA.path=tableB.path and tableA.value=tableB.value union
select tableA.path, tableA.value from tableA,tableB where
tableA.path=tableB.path and tableA.value IS NULL AND tableB.value IS
NULL;

If you really can treat null as '', then you might be better off
defining the column as NOT NULL DEFAULT ''.

-scott


On 6/15/07, Sean Cunningham <[EMAIL PROTECTED]> wrote:

I am hoping there is an obvious answer to this that I've overlooked.

I have two tables:

create table tableA (path TEXT, value TEXT);
create index myIndexA on tableA (path, value);

create table tableB(path TEXT, value TEXT);
create index myIndexB on tableB (path, value);


Now some simple inserts:

insert into tableA(path,value) values('alpha', '1');
insert into tableA(path,value) values('beta', '2');
insert into tableB(path,value) values('alpha', '1');
insert into tableB(path,value) values('beta', '999');

The following select statement gets what you'd expect, and it uses
the index.

sqlite> select tableA.path, tableA.value from tableA,tableB where
tableA.path=tableB.path and tableA.value=tableB.value;
alpha|1
gamma|3


However, if your data has NULL's:

insert into tableA(path,value) values('gamma', NULL);
insert into tableB(path,value) values('gamma', NULL);

The same select above neglects to report the tuple ('gamma',NULL) as
the equal.  This is correct given SQL's treatment of NULL, and is
easily fixed:

sqlite> select tableA.path, tableA.value from tableA,tableB where
ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull
(tableA.value,'')=ifnull(tableB.value,'');
alpha|1
gamma|3
gamma|

However, the above statement has the unfortunate side effect of
bypassing the index:

sqlite> explain select tableA.path, tableA.value from tableA,tableB
where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull
(tableA.value,'')=ifnull(tableB.value,'');
0|Goto|0|35|
1|Integer|0|0|
2|OpenRead|0|2|
3|SetNumColumns|0|2|
4|Integer|0|0|
5|OpenRead|1|4|
6|SetNumColumns|1|2|
7|Rewind|0|32|
8|Rewind|1|31|
9|Column|0|0|
10|String8|0|0|
11|CollSeq|0|0|collseq(BINARY)
12|Function|2|2|ifnull(2)
13|Column|1|0|
14|String8|0|0|
15|CollSeq|0|0|collseq(BINARY)
16|Function|2|2|ifnull(2)
17|Ne|28417|30|
18|Column|0|1|
19|String8|0|0|
20|CollSeq|0|0|collseq(BINARY)
21|Function|2|2|ifnull(2)
22|Column|1|1|
23|String8|0|0|
24|CollSeq|0|0|collseq(BINARY)
25|Function|2|2|ifnull(2)
26|Ne|28417|30|
27|Column|0|0|
28|Column|0|1|
29|Callback|2|0|
30|Next|1|9|
31|Next|0|8|
32|Close|0|0|
33|Close|1|0|
34|Halt|0|0|
35|Transaction|0|0|
36|VerifyCookie|0|4|
37|Goto|0|1|
38|Noop|0|0|



My question is:Is there another way to write such a select
statement which can solve both problems of treating NULL==NULL and
using the index.

Thanks,
Sean



-
To unsubscribe, send email to [EMAIL PROTECTED]
-




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Optimization of equality comparison when NULL involved

2007-06-15 Thread Clark Christensen
In SQLite null is not equal to anything, including null.

I'm not sure what the best solution for your application is.  With help from 
the others here, I have learned to use coalesce() to convert nulls into a 
value, and to not allow null in key fields.

select 
  tableA.path,
  tableA.value 
from
  tableA,
  tableB 
where  
  tableA.path = tableB.path 
  and coalesce(tableA.value, '' ) = coalesce(tableB.value, '' );

yields 

alpha|1
gamma|

And it appears to use the myIndexB index for the join.

 -Clark

- Original Message 
From: Sean Cunningham <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Friday, June 15, 2007 1:26:49 PM
Subject: [sqlite] Optimization of equality comparison when NULL involved

I am hoping there is an obvious answer to this that I've overlooked.

I have two tables:

create table tableA (path TEXT, value TEXT);
create index myIndexA on tableA (path, value);

create table tableB(path TEXT, value TEXT);
create index myIndexB on tableB (path, value);


Now some simple inserts:

insert into tableA(path,value) values('alpha', '1');
insert into tableA(path,value) values('beta', '2');
insert into tableB(path,value) values('alpha', '1');
insert into tableB(path,value) values('beta', '999');

The following select statement gets what you'd expect, and it uses  
the index.

sqlite> select tableA.path, tableA.value from tableA,tableB where  
tableA.path=tableB.path and tableA.value=tableB.value;
alpha|1
gamma|3


However, if your data has NULL's:

insert into tableA(path,value) values('gamma', NULL);
insert into tableB(path,value) values('gamma', NULL);

The same select above neglects to report the tuple ('gamma',NULL) as  
the equal.  This is correct given SQL's treatment of NULL, and is  
easily fixed:

sqlite> select tableA.path, tableA.value from tableA,tableB where  
ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull 
(tableA.value,'')=ifnull(tableB.value,'');
alpha|1
gamma|3
gamma|

However, the above statement has the unfortunate side effect of  
bypassing the index:

sqlite> explain select tableA.path, tableA.value from tableA,tableB  
where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull 
(tableA.value,'')=ifnull(tableB.value,'');
0|Goto|0|35|
1|Integer|0|0|
2|OpenRead|0|2|
3|SetNumColumns|0|2|
4|Integer|0|0|
5|OpenRead|1|4|
6|SetNumColumns|1|2|
7|Rewind|0|32|
8|Rewind|1|31|
9|Column|0|0|
10|String8|0|0|
11|CollSeq|0|0|collseq(BINARY)
12|Function|2|2|ifnull(2)
13|Column|1|0|
14|String8|0|0|
15|CollSeq|0|0|collseq(BINARY)
16|Function|2|2|ifnull(2)
17|Ne|28417|30|
18|Column|0|1|
19|String8|0|0|
20|CollSeq|0|0|collseq(BINARY)
21|Function|2|2|ifnull(2)
22|Column|1|1|
23|String8|0|0|
24|CollSeq|0|0|collseq(BINARY)
25|Function|2|2|ifnull(2)
26|Ne|28417|30|
27|Column|0|0|
28|Column|0|1|
29|Callback|2|0|
30|Next|1|9|
31|Next|0|8|
32|Close|0|0|
33|Close|1|0|
34|Halt|0|0|
35|Transaction|0|0|
36|VerifyCookie|0|4|
37|Goto|0|1|
38|Noop|0|0|



My question is:Is there another way to write such a select  
statement which can solve both problems of treating NULL==NULL and  
using the index.

Thanks,
Sean



-
To unsubscribe, send email to [EMAIL PROTECTED]
-





-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Optimization of equality comparison when NULL involved

2007-06-15 Thread Sean Cunningham

I am hoping there is an obvious answer to this that I've overlooked.

I have two tables:

create table tableA (path TEXT, value TEXT);
create index myIndexA on tableA (path, value);

create table tableB(path TEXT, value TEXT);
create index myIndexB on tableB (path, value);


Now some simple inserts:

insert into tableA(path,value) values('alpha', '1');
insert into tableA(path,value) values('beta', '2');
insert into tableB(path,value) values('alpha', '1');
insert into tableB(path,value) values('beta', '999');

The following select statement gets what you'd expect, and it uses  
the index.


sqlite> select tableA.path, tableA.value from tableA,tableB where  
tableA.path=tableB.path and tableA.value=tableB.value;

alpha|1
gamma|3


However, if your data has NULL's:

insert into tableA(path,value) values('gamma', NULL);
insert into tableB(path,value) values('gamma', NULL);

The same select above neglects to report the tuple ('gamma',NULL) as  
the equal.  This is correct given SQL's treatment of NULL, and is  
easily fixed:


sqlite> select tableA.path, tableA.value from tableA,tableB where  
ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull 
(tableA.value,'')=ifnull(tableB.value,'');

alpha|1
gamma|3
gamma|

However, the above statement has the unfortunate side effect of  
bypassing the index:


sqlite> explain select tableA.path, tableA.value from tableA,tableB  
where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull 
(tableA.value,'')=ifnull(tableB.value,'');

0|Goto|0|35|
1|Integer|0|0|
2|OpenRead|0|2|
3|SetNumColumns|0|2|
4|Integer|0|0|
5|OpenRead|1|4|
6|SetNumColumns|1|2|
7|Rewind|0|32|
8|Rewind|1|31|
9|Column|0|0|
10|String8|0|0|
11|CollSeq|0|0|collseq(BINARY)
12|Function|2|2|ifnull(2)
13|Column|1|0|
14|String8|0|0|
15|CollSeq|0|0|collseq(BINARY)
16|Function|2|2|ifnull(2)
17|Ne|28417|30|
18|Column|0|1|
19|String8|0|0|
20|CollSeq|0|0|collseq(BINARY)
21|Function|2|2|ifnull(2)
22|Column|1|1|
23|String8|0|0|
24|CollSeq|0|0|collseq(BINARY)
25|Function|2|2|ifnull(2)
26|Ne|28417|30|
27|Column|0|0|
28|Column|0|1|
29|Callback|2|0|
30|Next|1|9|
31|Next|0|8|
32|Close|0|0|
33|Close|1|0|
34|Halt|0|0|
35|Transaction|0|0|
36|VerifyCookie|0|4|
37|Goto|0|1|
38|Noop|0|0|



My question is:Is there another way to write such a select  
statement which can solve both problems of treating NULL==NULL and  
using the index.


Thanks,
Sean



-
To unsubscribe, send email to [EMAIL PROTECTED]
-