Re: [sqlite] Index not used in simple look-up-value query

Pavel Ivanov Thu, 15 Apr 2010 09:47:10 -0700

> Thanks for your prompt response Dan. I missed mentioning that in my tests, 
> I've made sure both queries lead to a single row (value) being returned in 
> all cases.


Also you forgot to mention how many rows MyDataTable and
MyIntermediateTable have (from first email we know that MyTable
contains 100K records), what's the structure of MyDataTable and
MyIntermediateTable, what's the average size of Label in MyTable and
how did you measured the running time of the queries.... That's just
some important factors off the top of my head...


Pavel

On Thu, Apr 15, 2010 at 12:33 PM,  <diego.d...@bentley.com> wrote:
> Thanks for your prompt response Dan. I missed mentioning that in my tests, 
> I've made sure both queries lead to a single row (value) being returned in 
> all cases.
>
> -----Original Message-----
> From: sqlite-users-boun...@sqlite.org 
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Dan Kennedy
> Sent: Thursday, April 15, 2010 11:59 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Index not used in simple look-up-value query
>
>
> On Apr 15, 2010, at 10:51 PM, <diego.d...@bentley.com> wrote:
>
>> In the light of the work on query-plan optimizations currently being
>> undertaken, I wonder if this issue could be looked at more closely.
>> I can confirm the returning type doesn't affect the performance I'm
>> seeing from the simple query. For example, returning an INT instead:
>>
>> SELECT ElementID FROM MyTable WHERE ElementID = @ElementID
>>
>> ... still runs slower (from 2 to 5 times slower) than the query
>> below (which doesn't make much sense).
>>
>> SELECT
>>       MyDataTable.MyValue
>> FROM
>>       MyDataTable, MyIntermediateTable
>> WHERE
>>       MyIntermediateTable.ID1 = @ID1 AND
>>       MyIntermediateTable.ID2 = �...@id2 AND
>>       MyIntermediateTable.RecordID = MyDataTable.RecordID;
>
> Maybe there are no rows in MyIntermediateTable that
> match the WHERE conditions.
>
> Dan.
>
>
>> Thanks,
>>
>> Diego
>>
>> -----Original Message-----
>> From: Diego Diaz
>> Sent: Tuesday, April 13, 2010 4:24 PM
>> To: General Discussion of SQLite Database
>> Subject: RE: [sqlite] Index not used in simple look-up-value query
>>
>> Thanks Dan for clarifying that NotExists op does a seek on the IPK
>> column. However, I'm still confused as why the first query (simple)
>> takes much longer (from 2 to 5 times more) than the complicated one.
>> Some numbers:
>>
>> 50K runs of the simple query takes ~3 sec in total.
>> 50K runs of the complicated query takes 0.5~1 sec in total.
>>
>> One would expect those timings to be the opposite (the complicated
>> one taking longer than the simpler one).
>>
>> One more detail: the simple query is returning a VARCHAR (Label) vs.
>> a FLOAT (MyValue) returned from the complicated one. Would the
>> returning type matter in the overall execution of these look-ups?
>>
>> Thanks!
>>
>> -----Original Message-----
>> From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org
>> ] On Behalf Of Dan Kennedy
>> Sent: Tuesday, April 13, 2010 2:11 PM
>> To: General Discussion of SQLite Database
>> Subject: Re: [sqlite] Index not used in simple look-up-value query
>>
>>
>> On Apr 14, 2010, at 1:05 AM, <diego.d...@bentley.com> wrote:
>>
>>> Hello,
>>>
>>> I have a question regarding the query-optimizer. I've tested SQLite
>>> with the following situation:
>>>
>>> 1 table defined as
>>>
>>> CREATE TABLE MyTable (ElementID INTEGER CONSTRAINT ElementID_PK
>>> PRIMARY KEY AUTOINCREMENT, Label VARCHAR(255))
>>>
>>> After filling that table with lots of records (e.g. 100K), I tested
>>> random look-ups of the "Label" column based on the ElementID column
>>> (which is the primary-key of the table). That is, executing the
>>> following query, which always returns 1 label-value, many times
>>> (after preparing it only once), with random values for the
>>> @ElementID parameter:
>>>
>>> SELECT Label FROM MyTable WHERE ElementID = @ElementID
>>>
>>> I've noticed that the performance of such query is not what I
>>> expected. So, I looked at the query-plan (with the EXPLAIN keyword),
>>> confirming that it is not using the primary-key, but it seems to be
>>> doing table-scans instead (e.g. operation NotExists at addr 5):
>>
>> It is using the primary key. NotExists does a seek on
>> the IPK column. This query should be fast.
>>
>>
>>> addr    opcode  p1      p2      p3      p4      p5      comment
>>> 0       Trace   0       0       0               0
>>> 1       Variable        1       1       1       @ElementID      0
>>> 2       Goto    0       10      0               0
>>> 3       OpenRead        0       2       0       2       0
>>> 4       MustBeInt       1       8       0               0
>>> 5       NotExists       0       8       1               0
>>> 6       Column  0       1       3               0
>>> 7       ResultRow       3       1       0               0
>>> 8       Close   0       0       0               0
>>> 9       Halt    0       0       0               0
>>> 10      Transaction     0       0       0               0
>>> 11      VerifyCookie    0       2       0               0
>>> 12      TableLock       0       2       0       MyTable 0
>>> 13      Goto    0       3       0               0
>>>
>>> On the other hand, I've tried the same kind of "look-ups" but
>>> involving a few joined-tables, leading to a more complex query like:
>>>
>>> SELECT MyDataTable .MyValue FROM MyDataTable, MyIntermediateTable
>>> WHERE MyIntermediateTable.ID1 = @ID1 AND MyIntermediateTable.ID2 =
>>> @ID2 AND MyIntermediateTable.RecordID = MyDataTable.RecordID;
>>>
>>> That query returns one value for a provided pair of integer IDs
>>> (@ID1 and @ID2) and it involves an inner join between two tables
>>> (MyDataTable.RecordID is a primary key autoincrement column).
>>>
>>> Executing that second query with random IDs many times in a row
>>> leads to much better performance (takes less than half of the time)
>>> than the simple query shown before (the size in number of rows of
>>> all tables is comparable).
>>>
>>> After analyzing the query-plan for the second query, I can tell it
>>> is using the indexes to seek into those tables as shown below (e.g.
>>> operations at addr 13 through 16: SeekGe, IdxGE, IdxRowid and Seek).
>>>
>>> addr    opcode  p1      p2      p3      p4      p5      comment
>>> 0       Trace   0       0       0               0
>>> 1       Integer 1       1       0               0
>>> 2       Variable        1       2       1       @ID1    0
>>> 3       Variable        2       3       1       @ID2    0
>>> 4       Goto    0       29      0               0
>>> 5       OpenRead        1       7       0       3       0
>>> 6       OpenRead        2       8       0
>>> keyinfo(2,BINARY,BINARY)        0
>>> 7       OpenRead        0       6       0       2       0
>>> 8       SCopy   2       4       0               0
>>> 9       IsNull  4       25      0               0
>>> 10      SCopy   3       5       0               0
>>> 11      IsNull  5       25      0               0
>>> 12      Affinity        4       2       0       dd      0
>>> 13      SeekGe  2       25      4       2       0
>>> 14      IdxGE   2       25      4       2       1
>>> 15      IdxRowid        2       6       0               0
>>> 16      Seek    1       6       0               0
>>> 17      Column  1       2       7               0
>>> 18      MustBeInt       7       24      0               0
>>> 19      NotExists       0       24      7               0
>>> 20      Column  0       1       8       0       0
>>> 21      RealAffinity    8       0       0               0
>>> 22      ResultRow       8       1       0               0
>>> 23      IfZero  1       25      -1              0
>>> 24      Next    2       14      0               0
>>> 25      Close   1       0       0               0
>>> 26      Close   2       0       0               0
>>> 27      Close   0       0       0               0
>>> 28      Halt    0       0       0               0
>>> 29      Transaction     0       0       0               0
>>> 30      VerifyCookie    0       101     0               0
>>> 31      TableLock       0       7       0
>>> MyIntermediateTable     0
>>> 32      TableLock       0       6       0       MyDataTable     0
>>> 33      Goto    0       5       0               0
>>>
>>> Does anyone have any idea why the query-optimizer is not taking
>>> advantage of the index associated with the primary key in the simple
>>> query I showed initially? How can I rewrite such a query so it uses
>>> the primary key index? I've tried "INDEXED BY" as well (after
>>> creating a second index on the ElementID column) with no success.
>>>
>>> Thanks,
>>>
>>> Diego
>>>
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Index not used in simple look-up-value query

Reply via email to