According to this benchmark the break-even point is at 40% nulls.  I asssume 
you have a different test?

#include <stdio.h>

int main()
{
  int i;
  char sql[4096];
  printf("CREATE TABLE x ('col1','col2','col3');\n");
  printf("BEGIN;");
  for(i=0;i<1000000;++i) {
    if (i < 100000) {
      sprintf(sql,"INSERT INTO x values(null,'col2_%d','col3_%d');",i,i,i);
    }
    else {
      sprintf(sql,"INSERT INTO x values('col_%d','col2_%d','col3_%d');",i,i,i);
    }
    printf("%s\n",sql);
  }
  printf("COMMIT;");
  printf("CREATE INDEX col1index on x('col1');\n");
}



sqlite> select count(*) from x where col1 between (select min(col1) from x) and 
(select max(col1) from x);
900000
CPU Time: user 0.453125 sys 0.015625
sqlite> select count(*) from x where col1 is not null;
900000
CPU Time: user 0.281250 sys 0.125000
sqlite> update x set col1=null where rowid < 200000;
CPU Time: user 1.906250 sys 0.234375
sqlite> select count(*) from x where col1 between (select min(col1) from x) and 
(select max(col1) from x);
800001
CPU Time: user 0.390625 sys 0.031250
sqlite> update x set col1=null where rowid >= 200000 and rowid < 300000;
CPU Time: user 0.953125 sys 0.125000
sqlite> select count(*) from x where col1 between (select min(col1) from x) and 
(select max(col1) from x);
700001
CPU Time: user 0.359375 sys 0.015625
sqlite> select count(*) from x where col1 is not null;
700001
CPU Time: user 0.296875 sys 0.078125
sqlite> update x set col1=null where rowid >= 300000 and rowid < 400000;
CPU Time: user 0.890625 sys 0.156250
sqlite> select count(*) from x where col1 is not null;
600001
CPU Time: user 0.281250 sys 0.109375
sqlite> select count(*) from x where col1 between (select min(col1) from x) and 
(select max(col1) from x);
600001
CPU Time: user 0.281250 sys 0.031250
sqlite> update x set col1=null where rowid >= 400000 and rowid < 500000;
CPU Time: user 0.921875 sys 0.171875
sqlite> select count(*) from x where col1 between (select min(col1) from x) and 
(select max(col1) from x);
500001
CPU Time: user 0.250000 sys 0.015625
sqlite> select count(*) from x where col1 is not null;
500001
CPU Time: user 0.312500 sys 0.062500



Michael D. Black

Senior Scientist

Advanced Analytics Directorate

Advanced GEOINT Solutions Operating Unit

Northrop Grumman Information Systems

________________________________
From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Igor Tandetnik [itandet...@mvps.org]
Sent: Monday, October 17, 2011 8:14 AM
To: sqlite-users@sqlite.org
Subject: EXT :Re: [sqlite] "is not null" and index

Jean-Christophe Deschamps <j...@antichoc.net> wrote:
>>> How can indexes be used with "not null" queries?
>>
>> They cannot.
>
> If one sees NOT NULL as the complement of NULL, i.e. values in the
> range {min_value, max_value} (min and max depending on the column
> expected content and type), then couldn't an index help?
> select * from table where col between min_value and max_value

If you have an extrinsic knowledge of the domain of the column values, then 
yes, you can reformulate the query as

where col1 between minValue and maxValue

I don't think SQLite would be able to rewrite the query this way automatically.

> Of course if there are only few NULLs then a table scan will probably
> be faster.

In fact, unless some 90% of the rows contain NULLs, a table scan will probably 
be faster. Using an index trades O(N) performance for O(M log N), where M is 
the number of rows actually satisfying the condition. This is clearly an 
improvement when M is much smaller than N, and a pessimization when M is close 
to N. A rule of thumb is that the break-even point is somewhere around M = 0.1*N

And if 90% of the rows do contain NULL in this column, I'd consider splitting 
the data into two tables - one with three columns (containing all non-NULL rows 
from the original table) and the other with two columns (containing the 
remaining rows).
--
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to