Re: [sqlite] Optimization of equality comparison when NULL involved

2007-06-18 Thread Sean Cunningham


On Jun 15, 2007, at 5:27 PM, Scott Hess wrote:



select tableA.path, tableA.value from tableA,tableB where
tableA.path=tableB.path and tableA.value=tableB.value union
select tableA.path, tableA.value from tableA,tableB where
tableA.path=tableB.path and tableA.value IS NULL AND tableB.value IS
NULL;




I think I oversimplified the problem a little bit.  In the example I  
gave, there
were just two columns.  In the problem I have to solve, there are n  
columns,

and any item in any column might be NULL.  The above approach will work
well with just two columns, but with multiple columns you would have to
take an iterative approach; building intermediate results and  
intersecting

with the first column.

What I am trying to do is build efficient set operations.  I have  
very large datasets
and have found that the built in union, intersect, and except  
operations do not
seem to use indices (would be happy to be proven wrong here).   As  
such, they

are not very speedy with large large data sets.


If you really can treat null as '', then you might be better off
defining the column as NOT NULL DEFAULT ''.


I am leaning toward avoid NULL and using some other token which I can  
interpret

as "no data".  Not an ideal solution, but should scale.

There was talk in the mailing list a while back about creating a new  
operator that
would act as a superset of '==' which would treat NULL==NULL as  
True.  I have
seen this in some other database.  Anybody know if this is on the  
roadmap?




-
To unsubscribe, send email to [EMAIL PROTECTED]
-



[sqlite] Optimization of equality comparison when NULL involved

2007-06-15 Thread Sean Cunningham

I am hoping there is an obvious answer to this that I've overlooked.

I have two tables:

create table tableA (path TEXT, value TEXT);
create index myIndexA on tableA (path, value);

create table tableB(path TEXT, value TEXT);
create index myIndexB on tableB (path, value);


Now some simple inserts:

insert into tableA(path,value) values('alpha', '1');
insert into tableA(path,value) values('beta', '2');
insert into tableB(path,value) values('alpha', '1');
insert into tableB(path,value) values('beta', '999');

The following select statement gets what you'd expect, and it uses  
the index.


sqlite> select tableA.path, tableA.value from tableA,tableB where  
tableA.path=tableB.path and tableA.value=tableB.value;

alpha|1
gamma|3


However, if your data has NULL's:

insert into tableA(path,value) values('gamma', NULL);
insert into tableB(path,value) values('gamma', NULL);

The same select above neglects to report the tuple ('gamma',NULL) as  
the equal.  This is correct given SQL's treatment of NULL, and is  
easily fixed:


sqlite> select tableA.path, tableA.value from tableA,tableB where  
ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull 
(tableA.value,'')=ifnull(tableB.value,'');

alpha|1
gamma|3
gamma|

However, the above statement has the unfortunate side effect of  
bypassing the index:


sqlite> explain select tableA.path, tableA.value from tableA,tableB  
where ifnull(tableA.path,'')=ifnull(tableB.path,'') and ifnull 
(tableA.value,'')=ifnull(tableB.value,'');

0|Goto|0|35|
1|Integer|0|0|
2|OpenRead|0|2|
3|SetNumColumns|0|2|
4|Integer|0|0|
5|OpenRead|1|4|
6|SetNumColumns|1|2|
7|Rewind|0|32|
8|Rewind|1|31|
9|Column|0|0|
10|String8|0|0|
11|CollSeq|0|0|collseq(BINARY)
12|Function|2|2|ifnull(2)
13|Column|1|0|
14|String8|0|0|
15|CollSeq|0|0|collseq(BINARY)
16|Function|2|2|ifnull(2)
17|Ne|28417|30|
18|Column|0|1|
19|String8|0|0|
20|CollSeq|0|0|collseq(BINARY)
21|Function|2|2|ifnull(2)
22|Column|1|1|
23|String8|0|0|
24|CollSeq|0|0|collseq(BINARY)
25|Function|2|2|ifnull(2)
26|Ne|28417|30|
27|Column|0|0|
28|Column|0|1|
29|Callback|2|0|
30|Next|1|9|
31|Next|0|8|
32|Close|0|0|
33|Close|1|0|
34|Halt|0|0|
35|Transaction|0|0|
36|VerifyCookie|0|4|
37|Goto|0|1|
38|Noop|0|0|



My question is:Is there another way to write such a select  
statement which can solve both problems of treating NULL==NULL and  
using the index.


Thanks,
Sean



-
To unsubscribe, send email to [EMAIL PROTECTED]
-