I'm trying sample a (deterministically) random subset of a SELECT query, the most common solution on the internet to get random samples seems to be "SELECT * FROM (...) ORDER BY RANDOM() LIMIT n;" (this already has some question marks, since it relies on seeding RANDOM and knowing the RANDOM function is always evaluated in the same order every query run), but looking at the query plans this materialises the entire result set in memory for the sort (no surprise, I can't think of anyway that could work otherwise) which is rather undesirable if the sample size becomes large (i.e. several million rows).
Now, I already know different ways to implement a predicate function that can deterministically keep elements from a stream, however that relies on having a deterministic order for the stream. Which brings us to SQLite. I can easily write something like: SELECT * FROM (...) WHERE my_predicate_fun() ORDER BY column1, column2,... And this *seems* to evaluate the where clause for each row in the order determined by ORDER BY, but this doesn't seem at all guaranteed by the SQL spec. So, is this behaviour documented/guaranteed somewhere? If not, is there some way to guarantee my where clause is evaluated for each row in a deterministic order? In the simple case like above I could always just evaluate the query without the ORDER BY, step through the entire query, and evaluate the predicate in the application, but if I want to use this random selection as a subquery, then that doesn't work. And while I'm asking questions: What if I want to do the above, but selecting groups of rows? So, sort of like: SELECT * FROM (...) GROUP BY groupColumn HAVING my_predicate_fun(); But where I want to return all rows in the group, rather than an aggregate. Thanks in advance, Merijn _______________________________________________ sqlite-users mailing list [email protected] http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

