On Mon, Sep 15, 2008 at 10:57:37PM -0700, Tomas Lee scratched on the wall:

> I want to delete the 1000 members with the lowest scores.  Assume that
> it is extremely unlikely for two members to have identical scores.
> Also, the 1000 is arbitrary -- it could be more or less on different
> days.  But each time I do this deletion, it will be a fixed number.
> 
> What's the fastest way to do this?  


  Your best bet is to just try things out and see.


> Method A:
>  * find the 1000th lowest score:
>    SELECT score FROM members ORDER BY score LIMIT 1 OFFSET 999;
>  * delete the records equal to or lower than that score
>    DELETE FROM members WHERE score <= $thousandth_lowest_score;

  Do it as one command using a sub-select:

  DELETE FROM members WHERE score <= (
        SELECT score FROM members ORDER BY score LIMIT 1 OFFSET 999);

  Personally I don't like this, since it has the potential to delete
  too many records.  Having repeating scores may be rare, but that's
  not never.  On the other hand, getting rid of "ties" at the cut-off
  point may be desirable.

> Method B:
>  * find the uids for the 1000 lowest scores:
>    SELECT uid FROM members ORDER BY score LIMIT 1000;
>  * delete those records
>    DELETE FROM members WHERE uid IN ([join $uids ,]);

  Again, do it as one command:

  DELETE FROM members WHERE uid IN (
        SELECT uid FROM members ORDER BY score LIMIT 1000);

  I'm not sure about speed, but I like this best from from a readability
  standpoint.  It is clean and straight forward, and I wouldn't be
  surprised to find out it is the fastest.

  The only odd thing about this is if several records share the cut-off
  score, there isn't any good way of knowing which will be deleted and
  which will be left behind.  That may or may not matter.

> Method C:
>  * delete the records as you find them:
> 
>    sqlite3_prepare_v2(db, "DELETE FROM members WHERE uid = ?", -1, &stmt_d, 
> NULL);
>    sqlite3_prepare_v2(db, "SELECT uid FROM members ORDER BY score LIMIT 
> 1000", -1, &stmt_q, NULL);
>    while (sqlite3_step(stmt_q) == SQLITE_ROW) {
>        int uid = sqlite3_column_int(stmt_q, 0);
>        sqlite3_bind_int(stmt_d, 0, uid);
>        sqlite3_step(stmt_d);
>        sqlite3_reset(stmt_d);
>    }
>    sqlite3_finalize(stmt_d);
>    sqlite3_finalize(stmt_q);

  Essentially a manual version of Method B.


  Which is fastest may depend on the contents of the table and if the
  index will actually be used or not.  There might also be variations if
  the size (the 1000) changes significantly.

  Again, just try it and see.  Using the sub-selects you can do this
  from the command line on a test database to give you a rough idea.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to