I?m using 3.8.8.1 on Windows via the ?C? interface.

I work with SQLite for several years with good success. And I know that no 
optimizer ever will be perfect.
But I?ve just stumbled upon a case where a very similar query requires between 
0.2 seconds and a whopping 30 seconds.

I?ve simplified things as much as possible and included the create instructions 
and queries below.

1.  My database has a table "art_data" which holds information about an 
article. A second table "attr" holds all available attributes.
art_data may contain any number of attribues per article, typically between 50 
and 200.
The art_data table in the production database has about 22 million rows, the 
attr table 20,000.

art_data
        oid INTEGER PRIMARY KEY,
        art_oid INTEGER,
        attr_oid INTEGER,
        tdata TEXT

the attr_oid refers to the table which defines the available attributes and 
tdata is the value for that attribute. The attr table is defined as:

attr
        oid INTEGER PRIMARY KEY
        class INTEGER
        tag TEXT

For the indices created, please see below.

2.  My application needs to select all the data for a specific article and 
specific attributes. The number of attributes to select is usually between 1 
and 30 so I use an IN clause and provide the attribute ids directly in the 
SELECT query. My application has these 20,000 ids cached and always available.
If the number of articles is < 500, my application supplies a list of articles 
for the d.oid IN (...) as well.

SELECT d.oid, d.tdata FROM art_data d
WHERE d.oid IN (1,890,222,...)  
AND d.attr_oid IN (2188,2191,2251,2272,...) 
ORDER BY d.oid ASC, d.attr_oid ASC, d.rowid ASC

This query takes between 0.2 and 0.5 seconds, even if 500 article numbers are 
in the first WHERE d.oid IN clause!

If more than 500 articles are needed, I estimated that this would probably 
break the IN (is there a limit for IN?) and thus my application puts the 
article into a temporary table and JOINs with this table:

SELECT d.oid, d.attr_oid, d.tdata FROM art_data d 
INNER JOIN _temp _t ON d.oid = _t.oid  
AND d.attr_oid IN (2188,2191,2251,2272,...)
ORDER BY d.oid ASC, d.attr_oid ASC, d.rowid ASC

This query takes between 17 and 30 seconds (!) even if the temporary table only 
has 501 article numbers (one more than the threshold for the IN clause).

The only difference between 0.2 and 17 seconds is replacing an IN clause with 
500 numbers with a JOIN with a temporary table containing 501 numbers.

While playing with that, I used a SQLite GUI tool and created the temporary 
table _temp (oid INTEGER PRIMARY KEY) as a regular table and filled it with 500 
article numbers. This also resulted in the 17 to 30s query times.

For a test, I ran ANALYZE and the query time dropped down to 0.5 seconds. 
AMAZING.
Apparently the query analyzer now had the info about the (no longer) temporary 
table and was able to use it efficiently.

My SOLUTION for now was to change the query with the temporary table to

SELECT d.oid, d.attr_oid, d.tdata FROM art_data d 
WHERE d.attr_oid IN (2188,2191,2251,2272,...) 
AND d.oid IN (SELECT oid FROM _temp)

Instead of a JOIN for the temporary table I use an IN clause with a SELECT. 
This brought the query time down to 0.5 seconds as well. May also be the 
optimizer.

The question is: When JOINing large tables with a temporary table, how to 
ensure that the optimizer can work optimal? Running ANALYZE with a temporary 
table probably does not work, and ANALYZE takes about 1 minute on this database 
so this is not feasible for each query.

I'm glad to have found an apparently working solution (IN instead of JOIN) but 
I wonder if this could be somehow automated by the optimizer? Or maybe this is 
a worst-case for the optimizer?



If you want to try this out yourself, here is the complete CREATE schema and 
queries:

-- BEGIN ---------------------------------
DROP TABLE IF EXISTS art_data; 
DROP TABLE IF EXISTS attr;

CREATE TABLE art_data (oid INTEGER, attr_oid INTEGER, tdata TEXT, FOREIGN 
KEY(attr_oid) REFERENCES attr(oid) ON DELETE CASCADE);

CREATE INDEX idx_art_data_oid ON art_data(oid);
CREATE INDEX idx_art_data_oid_tag_oid ON art_data(oid,attr_oid);
CREATE INDEX idx_art_data_attr_oid ON art_data(attr_oid);

CREATE TABLE attr (oid INTEGER PRIMARY KEY, class INTEGER, tag TEXT);
CREATE INDEX idx_attr_tag ON attr(tag);

DROP TABLE IF EXISTS _temp;
CREATE TABLE _temp (OID INTEGER PRIMARY KEY);
--insert into _temp select ...

-- Fast: 0.2 seconds
-- explain query plan
SELECT d.oid, d.tdata FROM art_data d
-- Only for specific articles 
WHERE d.oid IN
(24832,161134,24852,161140,24865,161146,24870,161147,24906,161151,24953,24992,25079,25119,161165,25125,161166,25129,161167,25153,25182,161169,25183,161170,25185,161172,25217,161173,25192,25432,161176,25488,161097,161179,25558,161098,161180,25569,161099,161133,25576,25585,25725,25854,26295,26278,161187,26282,26805,26815,161192,26818,26819,161193,26820,161194,26808,161195,26888,161197,27183,27284,27302,27305,29057,29381,29382,29410,29421,29422,19439,19488,19508,19570,161224,19585,19610,19673,19676,19677,19730,19742,19734,19781,19842,161236,19999,19976,19980,20030,161244,20102,20573,161255,20795,20824,20882,161258,21018,21182,21248,21275,21324,21460,161268,21492,21706,21872,22060,22309,22371,22587,22584,22583,22642,161272,22646,22647,22724,22728,22738,22739,22840,22850,22853,22903,22975,23040,23050,23051,23070,23085,23094,161279,23219,23221,161282,23481,23558,23560,23605,23606,161285,23960,161288,24004,24094,24114,24149,24204,24257,24286,24472,24531,24650,13220,13241,13356,13420,13585,13574,13577,13580,13618,13687,13858,14180,14278,14274,14335,14354,14559,14605,14669,15104,15398,15409,15843,161321,15785,15917,15923,161326,16113,16185,161327,16347,16369,16529,16562,19154,19187,19281,17178,17418,17485,16600,16632,16643,161333,42138,16784,16794,16809,17041,17527,17593,17667,17644,17817,17844,17975,18321,18286,18350,161341,18531,18673,18924,19115,8578,9031,9253,9257,9423,10212,10247,10306,11161,11189,11870,11954,12228,12369,12469,12400,12860,12928,13089,13042,816,1002,1317,1322,1366,1448,1521,1608,1591,1941,1960,2580,2659,2921,3147,3160,3186,4322,4403,161104,4448,4499,4580,4906,4964,4982,5009,5296,5306,5307,5496,5500,5589,5548,5742,5897,5810,6203,6230,6243,6625,6850,6909,6925,6949,6972,7045,7094,7376,7733,7735,7853,8451,34026,34106,34254,34274,34711,34766,35263,35156,35270,35234,35280,35352,35359,35516,35528,35549,37347,36970,37511,37512,37928,38012,38434,38481,38638,40952,40988,41033,41213,41099,41131,41132,41807,41855,42266,42268,42416,42501,42502,42507,42510,42566,52523,42925,42984,42978,43042,43049,43098,43108,43104,43249,43251,43283,43739,43841,44140,44164,45379,45743,51006,51011,51471,51489,51532,51582,51715,51760,52130,52145,52260,52273,52319,46422,45969,46083,46538,64122,46974,47022,47280,47316,47389,47564,47759,47794,47798,48286,48505,48568,48633,48658,48774,48784,48795,49869,48964,49424,49579,49581,49582,49936,50733,50765,50813,50776,50916,56759,56762,53562,53565,53593,53657,53659,53768,53854,54061,54056,54057,54064,54096,54141,56784,56788,56861,57003,57010,56916,56927,64126,57652,57964,57968,57925,57993,58012,58059,58429,58435,58436,58437,58438,58816,58896,58918,58969,58977,58999,59052,59503,59580,59726,60963,60986,60995,60999,61255,61261,61276,61315,61346,61812,62686,62839,62899,63262,63316,63330,63696,64138,64145,64194,64202,64219,64338,64597,64707,64714,64855,64878,65047,65636,65723,65731,65939,66984,67338,67389,67604,67608,68041,68395,68504,68507,68938,69112,69329,69419,69644,69893,70359,70831,70833,70963,70965,70991,71036)
  
-- Only these attributes
AND d.attr_oid IN 
(2188,2191,2251,2272,17674,18120,18122,18128,18131,18186,18200,18201,18209,18298,18361,18362,18367,18368,18372,18375,18384,18581,18587,18589,18597,18898,18900,18901)
 
ORDER BY d.oid ASC, d.attr_oid ASC, d.rowid ASC 


-- Slow (17-30 seconds)
-- explain query plan
SELECT d.oid, d.attr_oid, d.tdata FROM art_data d 
INNER JOIN _temp _t ON d.oid = _t.oid  
AND d.attr_oid IN 
(2188,2191,2251,2272,17674,18120,18122,18128,18131,18186,18200,18201,18209,18298,18361,18362,18367,18368,18372,18375,18384,18581,18587,18589,18597,18898,18900,18901)
 
ORDER BY d.oid ASC, d.attr_oid ASC, d.rowid ASC 

-- Fast: 0.5 seconds
-- explain query plan
SELECT d.oid, d.attr_oid, d.tdata FROM art_data d 
WHERE d.attr_oid IN 
(2188,2191,2251,2272,17674,18120,18122,18128,18131,18186,18200,18201,18209,18298,18361,18362,18367,18368,18372,18375,18384,18581,18587,18589,18597,18898,18900,18901)
 
AND d.oid IN (SELECT oid FROM _temp)
ORDER BY d.oid ASC, d.attr_oid ASC, d.rowid ASC
-- END  ---------------------------------


QUERY PLANS:

-- ** FAST (Query with IN clause) 0.2 to 0.5s
SEARCH TABLE art_data AS d USING INDEX idx_art_data_attr_oid (attr_oid=?)
EXECUTE LIST SUBQUERY 1
EXECUTE LIST SUBQUERY 1

-- ** SLOW (JOIN with temporary table) 17-30s
SEARCH TABLE art_data AS d USING INDEX idx_art_data_attr_oid (attr_oid=?)
EXECUTE LIST SUBQUERY 1
SEARCH TABLE _temp AS _t USING INTEGER PRIMARY KEY (rowid=?)

-- ** FAST (JOIN replaced by IN) 0.2 to 0.5s
SEARCH TABLE art_data AS d USING INDEX idx_art_data_oid (oid=?)
EXECUTE LIST SUBQUERY 1



Reply via email to