According to this benchmark the break-even point is at 40% nulls. I asssume
you have a different test?
#include <stdio.h>
int main()
{
int i;
char sql[4096];
printf("CREATE TABLE x ('col1','col2','col3');\n");
printf("BEGIN;");
for(i=0;i<1000000;++i) {
if (i < 100000) {
sprintf(sql,"INSERT INTO x values(null,'col2_%d','col3_%d');",i,i,i);
}
else {
sprintf(sql,"INSERT INTO x values('col_%d','col2_%d','col3_%d');",i,i,i);
}
printf("%s\n",sql);
}
printf("COMMIT;");
printf("CREATE INDEX col1index on x('col1');\n");
}
sqlite> select count(*) from x where col1 between (select min(col1) from x) and
(select max(col1) from x);
900000
CPU Time: user 0.453125 sys 0.015625
sqlite> select count(*) from x where col1 is not null;
900000
CPU Time: user 0.281250 sys 0.125000
sqlite> update x set col1=null where rowid < 200000;
CPU Time: user 1.906250 sys 0.234375
sqlite> select count(*) from x where col1 between (select min(col1) from x) and
(select max(col1) from x);
800001
CPU Time: user 0.390625 sys 0.031250
sqlite> update x set col1=null where rowid >= 200000 and rowid < 300000;
CPU Time: user 0.953125 sys 0.125000
sqlite> select count(*) from x where col1 between (select min(col1) from x) and
(select max(col1) from x);
700001
CPU Time: user 0.359375 sys 0.015625
sqlite> select count(*) from x where col1 is not null;
700001
CPU Time: user 0.296875 sys 0.078125
sqlite> update x set col1=null where rowid >= 300000 and rowid < 400000;
CPU Time: user 0.890625 sys 0.156250
sqlite> select count(*) from x where col1 is not null;
600001
CPU Time: user 0.281250 sys 0.109375
sqlite> select count(*) from x where col1 between (select min(col1) from x) and
(select max(col1) from x);
600001
CPU Time: user 0.281250 sys 0.031250
sqlite> update x set col1=null where rowid >= 400000 and rowid < 500000;
CPU Time: user 0.921875 sys 0.171875
sqlite> select count(*) from x where col1 between (select min(col1) from x) and
(select max(col1) from x);
500001
CPU Time: user 0.250000 sys 0.015625
sqlite> select count(*) from x where col1 is not null;
500001
CPU Time: user 0.312500 sys 0.062500
Michael D. Black
Senior Scientist
Advanced Analytics Directorate
Advanced GEOINT Solutions Operating Unit
Northrop Grumman Information Systems
________________________________
From: [email protected] [[email protected]] on
behalf of Igor Tandetnik [[email protected]]
Sent: Monday, October 17, 2011 8:14 AM
To: [email protected]
Subject: EXT :Re: [sqlite] "is not null" and index
Jean-Christophe Deschamps <[email protected]> wrote:
>>> How can indexes be used with "not null" queries?
>>
>> They cannot.
>
> If one sees NOT NULL as the complement of NULL, i.e. values in the
> range {min_value, max_value} (min and max depending on the column
> expected content and type), then couldn't an index help?
> select * from table where col between min_value and max_value
If you have an extrinsic knowledge of the domain of the column values, then
yes, you can reformulate the query as
where col1 between minValue and maxValue
I don't think SQLite would be able to rewrite the query this way automatically.
> Of course if there are only few NULLs then a table scan will probably
> be faster.
In fact, unless some 90% of the rows contain NULLs, a table scan will probably
be faster. Using an index trades O(N) performance for O(M log N), where M is
the number of rows actually satisfying the condition. This is clearly an
improvement when M is much smaller than N, and a pessimization when M is close
to N. A rule of thumb is that the break-even point is somewhere around M = 0.1*N
And if 90% of the rows do contain NULL in this column, I'd consider splitting
the data into two tables - one with three columns (containing all non-NULL rows
from the original table) and the other with two columns (containing the
remaining rows).
--
Igor Tandetnik
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users