Hi, We encountered an issue in our system where we had two nearly identical subqueries in a statement, which differed only in one table (both of which had the same definition). One table had 4 rows, the other table had zero rows. The subquery with the 4-row table ran about 10,000 times faster than the subquery with the zero-row table. ANALYZE had been run on this schema, and the problem existed on sqlite version 3.7.12 up through 3.8.5 (running on Mac OS 10.8), and exists whether compiling with SQLITE_ENABLE_STAT3 or not.
After much analysis, it turns out the problem was "fixed" by inserting one row into the empty table, running ANALYZE, then removing the row. Without this step, there was no row in sqlite_stat1, and the optimizer used assumed 100,000 rows for the empty table. My naïve understanding of the sqlite engine would make me think that the problem lies in not storing a stats row for zero-sized tables, resulting in the optimizer making the unnecessary assumption. I've included a script below that will reproduce the problem. The poor performance isn't as dramatic as our production case, likely because the tables are not as wide as ours, so the cost of visiting rows is reduced, so I added some nonsensical string operations to the select clause to burn cpu. In our code, the timing for the optimal query was "CPU Time: user 0.000142 sys 0.000072" and the suboptimal one was "CPU Time: user 1.021453 sys 0.036893" Note that several conditions must exist to produce this perfect storm: 1. The subquery needs to contain several tables besides the empty one 2. The stats computed for the schema are as described above 3. Covering indexes exist for the correlation keys. Here are query plans & times produced my script: ==== Query against the subquery with 4-row table ==== selectid order from detail ---------- ---------- ---------- --------------------------------------------------------------------------------------------- 0 0 0 SEARCH TABLE child AS mc USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0 1 1 SEARCH TABLE parent AS mp USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1 1 0 0 SCAN TABLE somerows AS s (~4 rows) 1 1 1 SEARCH TABLE child AS c USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 1 2 2 SEARCH TABLE parent AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) CPU Time: user 0.000183 sys 0.000106 ==== Query against the subquery with zero-row table Note that it runs about 150 times slower than the first case ==== selectid order from detail ---------- ---------- ---------- --------------------------------------------------------------------------------------------- 0 0 0 SEARCH TABLE child AS mc USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0 1 1 SEARCH TABLE parent AS mp USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1 1 0 2 SEARCH TABLE parent AS p USING COVERING INDEX ak_p (gp_id=?) (~100 rows) 1 1 1 SEARCH TABLE child AS c USING COVERING INDEX ak_c (p_id=?) (~50 rows) 1 2 0 SEARCH TABLE norows AS n USING COVERING INDEX ak_norows (c_id=?) (~1 rows) CPU Time: user 0.026749 sys 0.000396 ==== Query against the subquery with zero-row table after stats were generated with a single row in the table ==== ==== Note that now the query plan is the same as in the "correct" case ==== selectid order from detail ---------- ---------- ---------- --------------------------------------------------------------------------------------------- 0 0 0 SEARCH TABLE child AS mc USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0 1 1 SEARCH TABLE parent AS mp USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 0 0 0 EXECUTE CORRELATED SCALAR SUBQUERY 1 1 0 0 SCAN TABLE norows AS n (~1 rows) 1 1 1 SEARCH TABLE child AS c USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) 1 2 2 SEARCH TABLE parent AS p USING INTEGER PRIMARY KEY (rowid=?) (~1 rows) CPU Time: user 0.000173 sys 0.000099 ========================== === Steps to Reproduce === ========================== Run the attached bash script and redirect to sqlite, like so: ./testdata.sh | sqlite3 test.db The script generates a db with a million-row table; 150000 rows (by choosing 100/50/30 values for the i/j/k loops) is enough to illustrate the problem as well, but the results are not as dramatic. ================= == testdata.sh == ================= #!/bin/bash echo 'create table grandparent (gp_id integer not null primary key, gp_name text not null);' echo 'create table parent (p_id integer not null primary key, p_name text not null, gp_id integernot null);' echo 'create table child (c_id integer not null primary key, c_name text not null, p_id integernot null);' echo 'create temp table if not exists variables (name text primary key, value text);' for i in `seq 1 100`; do echo insert into grandparent values \(null, \"gp $i\"\)\; echo 'insert or replace into variables values ("gp", last_insert_rowid());' for j in `seq 1 100`; do echo insert into parent select null, \"gp-\" \|\| value \|\| \" p-$j\", value from variables where name = \'gp\'\; echo 'insert or replace into variables values ("p", last_insert_rowid());' for k in `seq 1 100`; do echo insert into child select null, \"p-\" \|\| value \|\| \" c-$k\", value from variables where name = \'p\'\; done done done echo 'create table somerows (name text not null, c_id integer not null);' echo 'create table norows (name text not null, c_id integer not null);' echo 'delete from somerows;' echo 'insert into somerows values ("apple", 15001);' echo 'insert into somerows values ("peach", 15003);' echo 'insert into somerows values ("pear", 15005);' echo 'insert into somerows values ("plum", 15007);' echo 'create unique index pk_gp on grandparent (gp_id);' echo 'create unique index pk_p on parent (p_id);' echo 'create unique index pk_c on child (c_id);' echo 'create unique index pk_somerows on somerows (name);' echo 'create unique index pk_norows on norows (name);' echo 'create unique index ak_p on parent (gp_id, p_id);' echo 'create unique index ak_c on child (p_id, c_name);' echo 'create unique index ak_norows on norows (c_id);' echo 'create unique index ak_somerows on somerows (c_id);' echo 'analyze;' echo '.width 0 0 0 200' echo 'explain query plan ' \ 'select' \ '(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a"), ' \ ' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a")) - 8, ' \ ' 4) || c_name expensive_col' \ 'from somerows s' \ ' join child c' \ ' on s.c_id = c.c_id' \ ' join parent p' \ ' on c.p_id = p.p_id' \ 'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \ 'from child mc, parent mp' \ 'where mc.p_id = mp.p_id and c_id = 15010;' echo '.timer on' echo \ 'select' \ '(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a"), ' \ ' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a")) - 8, ' \ ' 4) || c_name expensive_col' \ 'from somerows s' \ ' join child c' \ ' on s.c_id = c.c_id' \ ' join parent p' \ ' on c.p_id = p.p_id' \ 'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \ 'from child mc, parent mp' \ 'where mc.p_id = mp.p_id and c_id = 15010;' echo '.timer off' echo 'explain query plan ' \ 'select' \ '(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a"), ' \ ' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a")) - 8, ' \ ' 4) || c_name expensive_col' \ 'from norows n' \ ' join child c' \ ' on n.c_id = c.c_id' \ ' join parent p' \ ' on c.p_id = p.p_id' \ 'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \ 'from child mc, parent mp' \ 'where mc.p_id = mp.p_id and c_id = 15010;' echo '.timer on' echo \ 'select' \ '(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a"), ' \ ' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a")) - 8, ' \ ' 4) || c_name expensive_col' \ 'from norows n' \ ' join child c' \ ' on n.c_id = c.c_id' \ ' join parent p' \ ' on c.p_id = p.p_id' \ 'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \ 'from child mc, parent mp' \ 'where mc.p_id = mp.p_id and c_id = 15010;' echo '.timer off' echo 'insert into norows values ("dummy", 1);' echo 'analyze;' echo 'delete from norows;' echo 'explain query plan ' \ 'select' \ '(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a"), ' \ ' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a")) - 8, ' \ ' 4) || c_name expensive_col' \ 'from norows n' \ ' join child c' \ ' on n.c_id = c.c_id' \ ' join parent p' \ ' on c.p_id = p.p_id' \ 'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \ 'from child mc, parent mp' \ 'where mc.p_id = mp.p_id and c_id = 15010;' echo '.timer on' echo \ 'select' \ '(select substr(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a"), ' \ ' length(replace(c_name || " aa aa aa aa aa aa " || p.p_id, "aa", "a")) - 8, ' \ ' 4) || c_name expensive_col' \ 'from norows n' \ ' join child c' \ ' on n.c_id = c.c_id' \ ' join parent p' \ ' on c.p_id = p.p_id' \ 'where p.gp_id = mp.gp_id and expensive_col != "nothing")' \ 'from child mc, parent mp' \ 'where mc.p_id = mp.p_id and c_id = 15010;' echo '.timer off' echo 'analyze;' _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users