Re: [sqlite] Force the use of a specified index?

Bart Smissaert Fri, 18 Sep 2009 15:22:12 -0700

DELETE
FROM
TABLE2
WHERE
ENTRY_ID NOT IN
 (SELECT T2.ENTRY_ID FROM
 TABLE2 T2 INNER JOIN TABLE1 T1 ON
 (T1.PATIENT_ID = T2.PATIENT_ID)
 WHERE julianday(T2.START_DATE, '+15 month') >
       julianday(T1.START_DATE)
 )


That is indeed a lot faster and then slightly faster than my approach
with the intermediate table. Will now have a look at Jon's suggestion.

RBS


On Fri, Sep 18, 2009 at 10:04 PM, Pavel Ivanov <paiva...@gmail.com> wrote:
> *I'm leaving aside the rant that your first delete is not identical to
> combination of the select and delete in the second approach and select
> in second approach contains typos...*
>
> But did you try to combine your insert and delete statements from the
> second approach? This approach quicker because of exactly that -
> select and delete statements are independent whereas in first approach
> your select is executed again and again for each row in TABLE2. So
> just make it like this:
>
> DELETE
> FROM
> TABLE2
> WHERE
> ENTRY_ID NOT IN
>  (SELECT T2.ENTRY_ID FROM
>  TABLE2 T2 INNER JOIN TABLE1 T1 ON
>  (T1.PATIENT_ID = T2.PATIENT_ID)
>  WHERE julianday(T2.START_DATE, '+15 month') >
>        julianday(T1.START_DATE)
>  )
>
> And about indexes: for this query index on (PATIENT_ID, START_DATE)
> doesn't do any better than on (PATIENT_ID) because SQLite must to
> check all rows with given PATIENT_ID anyway. And that is because
> START_DATE is in the query inside function call to julianday(). Index
> on 2 fields could help only if your condition was e.g. T2.START_DATE >
> T1.START_DATE.
> And in most cases there's no benefit creating index on 1 field
> (PATIENT_ID) when there's already index on 2 fields (PATIENT_ID,
> START_DATE) where PATIENT_ID is the first field.
>
> Pavel
>
> On Fri, Sep 18, 2009 at 4:46 PM, Bart Smissaert
> <bart.smissa...@gmail.com> wrote:
>> This is what I am dealing with:
>> 2 tables with exactly the same schema (but could be slightly
>> different, so can't put in same table):
>>
>> CREATE TABLE TABLE1(
>>   [PATIENT_ID] INTEGER,
>>   [ENTRY_ID] INTEGER PRIMARY KEY,
>>   [READ_CODE] TEXT,
>>   [ADDED_DATE] TEXT,
>>   [START_DATE] TEXT)
>>
>> The last 2 date fields are in the format yyyy-mm-dd
>>
>> Again in both tables the following indexes:
>> CREATE INDEX IDX_TABLE1_PATIENT_ID ON TABLE1(PATIENT_ID)
>> CREATE INDEX IDX_TABLE1_PATIENT_ID_START_DATE ON TABLE1(PATIENT_ID, 
>> START_DATE)
>> CREATE INDEX IDX_TABLE1_PATIENT_ID_ENTRY_ID ON TABLE1(PATIENT_ID, ENTRY_ID)
>> CREATE INDEX TABLE1_START_DATE ON TABLE1(START_DATE)
>>
>> Then some delete SQL's will be run so that in the end PATIENT_ID is
>> unique in both tables.
>>
>> Then the SQL I was trying to improve:
>>
>> DELETE
>> FROM
>> TABLE2
>> WHERE
>> PATIENT_ID NOT IN (
>>   SELECT
>>   TABLE1.PATIENT_ID
>>   FROM
>>   TABLE1
>>   WHERE
>>   JULIANDAY(TABLE2.START_DATE, '-14 month') >
>>   JULIANDAY(TABLE1.START_DATE) AND
>>   TABLE1.PATIENT_ID = TABLE2.PATIENT_ID)
>>
>> The index it will use for this SQL is IDX_TABLE1_PATIENT_ID_ENTRY_ID
>> and that surprises me as ENTRY_ID is not in the above SQL. If I drop
>> that index then it will
>> use IDX_TABLE1_PATIENT_ID and only if that is dropped as well will it
>> use IDX_TABLE1_PATIENT_ID_START_DATE, which I thought would be the
>> best, but turns
>> out to be no better than the first. Whatever way I do this it is slow
>> and I can do it a lot quicker by doing this:
>>
>> CREATE TEMP TABLE DATE_COMPARE_TEMP([E_ID] INTEGER)
>>
>> INSERT INTO DATE_COMPARE_TEMP (E_ID)
>>   SELECT T2.ENTRY_ID FROM
>>   TABLE2 T2 INNER JOIN TABLE1 T2 ON
>>   (T1.PATIENT_ID = T2.PATIENT_ID)
>>   WHERE julianday(T2.START_DATE, '+15 month') >
>>         julianday(T1.START_DATE)
>>
>> CREATE INDEX IDX1_DATE_COMPARE_TEMP_E_ID ON DATE_COMPARE_TEMP(E_ID)
>>
>> analyze DATE_COMPARE_TEMP
>>
>> and then the delete SQL like this:
>>
>> DELETE
>> FROM
>> TABLE2
>> WHERE
>> ENTRY_ID NOT IN
>>   (SELECT E_ID FROM DATE_COMPARE_TEMP)
>>
>> Although this involves a third temp table this method is about twice
>> as fast as the first one.
>> Maybe that is just the way it is and there just is no way to do this
>> as fast without the intermediate temp table, but I just wondered.
>> Again there is no problem here as the second method is simple and
>> fast. Just trying to increase my understanding of SQLite and indexes.
>>
>>
>> RBS
>>
>>
>>
>>
>> On Fri, Sep 18, 2009 at 8:25 PM, Simon Slavin
>> <slav...@hearsay.demon.co.uk> wrote:
>>>
>>> On 18 Sep 2009, at 4:49pm, bartsmissa...@blueyonder.co.uk wrote:
>>>
>>>>> Did something in the documentation make
>>>>> you think SQLite wouldn't use a
>>>>> multi-column index unless you forced it ?
>>>>
>>>> No, but I just noticed it didn't use the index I thought would be
>>>> best. As
>>>> it turned out it looks I was wrong in that that index didn't give the
>>>> quickest result.
>>>
>>> Ah.  Okay, that's fine.  It can be quite surprising what's best.  Glad
>>> you got a good result.
>>>
>>> Simon.
>>> _______________________________________________
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>
>> _______________________________________________
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Force the use of a specified index?

Reply via email to