According to this benchmark the break-even point is at 40% nulls. I asssume you have a different test?
#include <stdio.h> int main() { int i; char sql[4096]; printf("CREATE TABLE x ('col1','col2','col3');\n"); printf("BEGIN;"); for(i=0;i<1000000;++i) { if (i < 100000) { sprintf(sql,"INSERT INTO x values(null,'col2_%d','col3_%d');",i,i,i); } else { sprintf(sql,"INSERT INTO x values('col_%d','col2_%d','col3_%d');",i,i,i); } printf("%s\n",sql); } printf("COMMIT;"); printf("CREATE INDEX col1index on x('col1');\n"); } sqlite> select count(*) from x where col1 between (select min(col1) from x) and (select max(col1) from x); 900000 CPU Time: user 0.453125 sys 0.015625 sqlite> select count(*) from x where col1 is not null; 900000 CPU Time: user 0.281250 sys 0.125000 sqlite> update x set col1=null where rowid < 200000; CPU Time: user 1.906250 sys 0.234375 sqlite> select count(*) from x where col1 between (select min(col1) from x) and (select max(col1) from x); 800001 CPU Time: user 0.390625 sys 0.031250 sqlite> update x set col1=null where rowid >= 200000 and rowid < 300000; CPU Time: user 0.953125 sys 0.125000 sqlite> select count(*) from x where col1 between (select min(col1) from x) and (select max(col1) from x); 700001 CPU Time: user 0.359375 sys 0.015625 sqlite> select count(*) from x where col1 is not null; 700001 CPU Time: user 0.296875 sys 0.078125 sqlite> update x set col1=null where rowid >= 300000 and rowid < 400000; CPU Time: user 0.890625 sys 0.156250 sqlite> select count(*) from x where col1 is not null; 600001 CPU Time: user 0.281250 sys 0.109375 sqlite> select count(*) from x where col1 between (select min(col1) from x) and (select max(col1) from x); 600001 CPU Time: user 0.281250 sys 0.031250 sqlite> update x set col1=null where rowid >= 400000 and rowid < 500000; CPU Time: user 0.921875 sys 0.171875 sqlite> select count(*) from x where col1 between (select min(col1) from x) and (select max(col1) from x); 500001 CPU Time: user 0.250000 sys 0.015625 sqlite> select count(*) from x where col1 is not null; 500001 CPU Time: user 0.312500 sys 0.062500 Michael D. Black Senior Scientist Advanced Analytics Directorate Advanced GEOINT Solutions Operating Unit Northrop Grumman Information Systems ________________________________ From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Igor Tandetnik [itandet...@mvps.org] Sent: Monday, October 17, 2011 8:14 AM To: sqlite-users@sqlite.org Subject: EXT :Re: [sqlite] "is not null" and index Jean-Christophe Deschamps <j...@antichoc.net> wrote: >>> How can indexes be used with "not null" queries? >> >> They cannot. > > If one sees NOT NULL as the complement of NULL, i.e. values in the > range {min_value, max_value} (min and max depending on the column > expected content and type), then couldn't an index help? > select * from table where col between min_value and max_value If you have an extrinsic knowledge of the domain of the column values, then yes, you can reformulate the query as where col1 between minValue and maxValue I don't think SQLite would be able to rewrite the query this way automatically. > Of course if there are only few NULLs then a table scan will probably > be faster. In fact, unless some 90% of the rows contain NULLs, a table scan will probably be faster. Using an index trades O(N) performance for O(M log N), where M is the number of rows actually satisfying the condition. This is clearly an improvement when M is much smaller than N, and a pessimization when M is close to N. A rule of thumb is that the break-even point is somewhere around M = 0.1*N And if 90% of the rows do contain NULL in this column, I'd consider splitting the data into two tables - one with three columns (containing all non-NULL rows from the original table) and the other with two columns (containing the remaining rows). -- Igor Tandetnik _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users