On 2008 September 16 (Tue) 07:12:02am PDT, "Jay A. Kreibich" <[EMAIL 
PROTECTED]> wrote:
> On Mon, Sep 15, 2008 at 10:57:37PM -0700, Tomas Lee scratched on the wall:
> 
>> I want to delete the 1000 members with the lowest scores.  Assume that
>> it is extremely unlikely for two members to have identical scores.
>> Also, the 1000 is arbitrary -- it could be more or less on different
>> days.  But each time I do this deletion, it will be a fixed number.
>> 
>> What's the fastest way to do this?  
> 
> 
> Your best bet is to just try things out and see.
> 
> 
>> Method A:
>> * find the 1000th lowest score:
>> SELECT score FROM members ORDER BY score LIMIT 1 OFFSET 999;
>> * delete the records equal to or lower than that score
>> DELETE FROM members WHERE score <= $thousandth_lowest_score;
> 
> Do it as one command using a sub-select:
> 
> DELETE FROM members WHERE score <= (
> SELECT score FROM members ORDER BY score LIMIT 1 OFFSET 999);
> 
> Personally I don't like this, since it has the potential to delete
> too many records.  Having repeating scores may be rare, but that's
> not never.  On the other hand, getting rid of "ties" at the cut-off
> point may be desirable.
> 
>> Method B:
>> * find the uids for the 1000 lowest scores:
>> SELECT uid FROM members ORDER BY score LIMIT 1000;
>> * delete those records
>> DELETE FROM members WHERE uid IN ([join $uids ,]);
> 
> Again, do it as one command:
> 
> DELETE FROM members WHERE uid IN (
> SELECT uid FROM members ORDER BY score LIMIT 1000);
> 
> I'm not sure about speed, but I like this best from from a readability
> standpoint.  It is clean and straight forward, and I wouldn't be
> surprised to find out it is the fastest.
> 
> The only odd thing about this is if several records share the cut-off
> score, there isn't any good way of knowing which will be deleted and
> which will be left behind.  That may or may not matter.
> 
>> Method C:
>> * delete the records as you find them:
>> 
>> sqlite3_prepare_v2(db, "DELETE FROM members WHERE uid = ?", -1, &stmt_d, 
>> NULL);
>> sqlite3_prepare_v2(db, "SELECT uid FROM members ORDER BY score LIMIT 1000", 
>> -1, &stmt_q, NULL);
>> while (sqlite3_step(stmt_q) == SQLITE_ROW) {
>> int uid = sqlite3_column_int(stmt_q, 0);
>> sqlite3_bind_int(stmt_d, 0, uid);
>> sqlite3_step(stmt_d);
>> sqlite3_reset(stmt_d);
>> }
>> sqlite3_finalize(stmt_d);
>> sqlite3_finalize(stmt_q);
> 
> Essentially a manual version of Method B.
> 
> 
> Which is fastest may depend on the contents of the table and if the
> index will actually be used or not.  There might also be variations if
> the size (the 1000) changes significantly.
> 
> Again, just try it and see.  Using the sub-selects you can do this
> from the command line on a test database to give you a rough idea.

Thanks for the help.  I knew about using sub-selects, but I didn't
want to confuse the issue.  I didn't realize method B and method C
were the same -- I thought SQLite would need to keep all the uids in
memory in method B.

When I was doing my experiments, method A came out faster than method
B, and I like method A better anyway.  But I think either one would
work fast enough for my purposes anyway.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to