Thanks, will have a look at your suggestion and yes, I had a feeling I was overlooking some elemental things here. I typed it out all bit quick (hence the typo's and difference in the deletes), but I thought it would make clear what was going on. Will test now and see if your suggestion is indeed quicker.
RBS On Fri, Sep 18, 2009 at 10:04 PM, Pavel Ivanov <paiva...@gmail.com> wrote: > *I'm leaving aside the rant that your first delete is not identical to > combination of the select and delete in the second approach and select > in second approach contains typos...* > > But did you try to combine your insert and delete statements from the > second approach? This approach quicker because of exactly that - > select and delete statements are independent whereas in first approach > your select is executed again and again for each row in TABLE2. So > just make it like this: > > DELETE > FROM > TABLE2 > WHERE > ENTRY_ID NOT IN > (SELECT T2.ENTRY_ID FROM > TABLE2 T2 INNER JOIN TABLE1 T1 ON > (T1.PATIENT_ID = T2.PATIENT_ID) > WHERE julianday(T2.START_DATE, '+15 month') > > julianday(T1.START_DATE) > ) > > And about indexes: for this query index on (PATIENT_ID, START_DATE) > doesn't do any better than on (PATIENT_ID) because SQLite must to > check all rows with given PATIENT_ID anyway. And that is because > START_DATE is in the query inside function call to julianday(). Index > on 2 fields could help only if your condition was e.g. T2.START_DATE > > T1.START_DATE. > And in most cases there's no benefit creating index on 1 field > (PATIENT_ID) when there's already index on 2 fields (PATIENT_ID, > START_DATE) where PATIENT_ID is the first field. > > Pavel > > On Fri, Sep 18, 2009 at 4:46 PM, Bart Smissaert > <bart.smissa...@gmail.com> wrote: >> This is what I am dealing with: >> 2 tables with exactly the same schema (but could be slightly >> different, so can't put in same table): >> >> CREATE TABLE TABLE1( >> [PATIENT_ID] INTEGER, >> [ENTRY_ID] INTEGER PRIMARY KEY, >> [READ_CODE] TEXT, >> [ADDED_DATE] TEXT, >> [START_DATE] TEXT) >> >> The last 2 date fields are in the format yyyy-mm-dd >> >> Again in both tables the following indexes: >> CREATE INDEX IDX_TABLE1_PATIENT_ID ON TABLE1(PATIENT_ID) >> CREATE INDEX IDX_TABLE1_PATIENT_ID_START_DATE ON TABLE1(PATIENT_ID, >> START_DATE) >> CREATE INDEX IDX_TABLE1_PATIENT_ID_ENTRY_ID ON TABLE1(PATIENT_ID, ENTRY_ID) >> CREATE INDEX TABLE1_START_DATE ON TABLE1(START_DATE) >> >> Then some delete SQL's will be run so that in the end PATIENT_ID is >> unique in both tables. >> >> Then the SQL I was trying to improve: >> >> DELETE >> FROM >> TABLE2 >> WHERE >> PATIENT_ID NOT IN ( >> SELECT >> TABLE1.PATIENT_ID >> FROM >> TABLE1 >> WHERE >> JULIANDAY(TABLE2.START_DATE, '-14 month') > >> JULIANDAY(TABLE1.START_DATE) AND >> TABLE1.PATIENT_ID = TABLE2.PATIENT_ID) >> >> The index it will use for this SQL is IDX_TABLE1_PATIENT_ID_ENTRY_ID >> and that surprises me as ENTRY_ID is not in the above SQL. If I drop >> that index then it will >> use IDX_TABLE1_PATIENT_ID and only if that is dropped as well will it >> use IDX_TABLE1_PATIENT_ID_START_DATE, which I thought would be the >> best, but turns >> out to be no better than the first. Whatever way I do this it is slow >> and I can do it a lot quicker by doing this: >> >> CREATE TEMP TABLE DATE_COMPARE_TEMP([E_ID] INTEGER) >> >> INSERT INTO DATE_COMPARE_TEMP (E_ID) >> SELECT T2.ENTRY_ID FROM >> TABLE2 T2 INNER JOIN TABLE1 T2 ON >> (T1.PATIENT_ID = T2.PATIENT_ID) >> WHERE julianday(T2.START_DATE, '+15 month') > >> julianday(T1.START_DATE) >> >> CREATE INDEX IDX1_DATE_COMPARE_TEMP_E_ID ON DATE_COMPARE_TEMP(E_ID) >> >> analyze DATE_COMPARE_TEMP >> >> and then the delete SQL like this: >> >> DELETE >> FROM >> TABLE2 >> WHERE >> ENTRY_ID NOT IN >> (SELECT E_ID FROM DATE_COMPARE_TEMP) >> >> Although this involves a third temp table this method is about twice >> as fast as the first one. >> Maybe that is just the way it is and there just is no way to do this >> as fast without the intermediate temp table, but I just wondered. >> Again there is no problem here as the second method is simple and >> fast. Just trying to increase my understanding of SQLite and indexes. >> >> >> RBS >> >> >> >> >> On Fri, Sep 18, 2009 at 8:25 PM, Simon Slavin >> <slav...@hearsay.demon.co.uk> wrote: >>> >>> On 18 Sep 2009, at 4:49pm, bartsmissa...@blueyonder.co.uk wrote: >>> >>>>> Did something in the documentation make >>>>> you think SQLite wouldn't use a >>>>> multi-column index unless you forced it ? >>>> >>>> No, but I just noticed it didn't use the index I thought would be >>>> best. As >>>> it turned out it looks I was wrong in that that index didn't give the >>>> quickest result. >>> >>> Ah. Okay, that's fine. It can be quite surprising what's best. Glad >>> you got a good result. >>> >>> Simon. >>> _______________________________________________ >>> sqlite-users mailing list >>> sqlite-users@sqlite.org >>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >>> >> _______________________________________________ >> sqlite-users mailing list >> sqlite-users@sqlite.org >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users >> > _______________________________________________ > sqlite-users mailing list > sqlite-users@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users