On 2008 September 16 (Tue) 07:12:02am PDT, "Jay A. Kreibich" <[EMAIL PROTECTED]> wrote: > On Mon, Sep 15, 2008 at 10:57:37PM -0700, Tomas Lee scratched on the wall: > >> I want to delete the 1000 members with the lowest scores. Assume that >> it is extremely unlikely for two members to have identical scores. >> Also, the 1000 is arbitrary -- it could be more or less on different >> days. But each time I do this deletion, it will be a fixed number. >> >> What's the fastest way to do this? > > > Your best bet is to just try things out and see. > > >> Method A: >> * find the 1000th lowest score: >> SELECT score FROM members ORDER BY score LIMIT 1 OFFSET 999; >> * delete the records equal to or lower than that score >> DELETE FROM members WHERE score <= $thousandth_lowest_score; > > Do it as one command using a sub-select: > > DELETE FROM members WHERE score <= ( > SELECT score FROM members ORDER BY score LIMIT 1 OFFSET 999); > > Personally I don't like this, since it has the potential to delete > too many records. Having repeating scores may be rare, but that's > not never. On the other hand, getting rid of "ties" at the cut-off > point may be desirable. > >> Method B: >> * find the uids for the 1000 lowest scores: >> SELECT uid FROM members ORDER BY score LIMIT 1000; >> * delete those records >> DELETE FROM members WHERE uid IN ([join $uids ,]); > > Again, do it as one command: > > DELETE FROM members WHERE uid IN ( > SELECT uid FROM members ORDER BY score LIMIT 1000); > > I'm not sure about speed, but I like this best from from a readability > standpoint. It is clean and straight forward, and I wouldn't be > surprised to find out it is the fastest. > > The only odd thing about this is if several records share the cut-off > score, there isn't any good way of knowing which will be deleted and > which will be left behind. That may or may not matter. > >> Method C: >> * delete the records as you find them: >> >> sqlite3_prepare_v2(db, "DELETE FROM members WHERE uid = ?", -1, &stmt_d, >> NULL); >> sqlite3_prepare_v2(db, "SELECT uid FROM members ORDER BY score LIMIT 1000", >> -1, &stmt_q, NULL); >> while (sqlite3_step(stmt_q) == SQLITE_ROW) { >> int uid = sqlite3_column_int(stmt_q, 0); >> sqlite3_bind_int(stmt_d, 0, uid); >> sqlite3_step(stmt_d); >> sqlite3_reset(stmt_d); >> } >> sqlite3_finalize(stmt_d); >> sqlite3_finalize(stmt_q); > > Essentially a manual version of Method B. > > > Which is fastest may depend on the contents of the table and if the > index will actually be used or not. There might also be variations if > the size (the 1000) changes significantly. > > Again, just try it and see. Using the sub-selects you can do this > from the command line on a test database to give you a rough idea.
Thanks for the help. I knew about using sub-selects, but I didn't want to confuse the issue. I didn't realize method B and method C were the same -- I thought SQLite would need to keep all the uids in memory in method B. When I was doing my experiments, method A came out faster than method B, and I like method A better anyway. But I think either one would work fast enough for my purposes anyway. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users