Re: [sqlite] Efficient random sampling in a large table using builtinfunctions.

Igor Tandetnik Thu, 08 Mar 2012 06:03:05 -0800

Benoit Mortgat <mort...@gmail.com> wrote:
> I have a table with millions of records.
> 
> When I run a query with
> 
>    ORDER BY random() LIMIT N;
> 
> the RANDOM() function is evaluated against all rows of my result set,
> then sorting occurs, and as a result the query is slow.
> 
> In this case the query could be rewritten as:
> 
> * Generate N, random, row numbers between 1 and (SELECT COUNT(*) FROM
>   the_table_name). Maybe using remainder operator % and builtin ABS()
>   and RANDOM() functions can help (see below)
> * SELECT FROM the_table WHERE rowid IN (those random numbers)


That is not equivalent. First, duplicate random numbers may be generated. 
Second, rowid values are not necessarily sequential. Thus, you may end up with 
fewer than N rows.

You can implement this algorithm in your application's code, if you are so 
inclined. Just prepare a query lile

select * from the_table where rowid=?;

then run it N times, binding different random numbers to the parameter. Of 
course you still need to deal with the two issues I mentioned above (perhaps 
you know that in your particular application, rowids are always sequential).

> This can however return less than N rows if by chance two random numbers
> have the same remainder modulo COUNT(*) FROM my_table.

Precisely.
-- 
Igor Tandetnik

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Efficient random sampling in a large table using builtinfunctions.

Reply via email to