In the very old days before computers were common, a random number
table appeared at the back of many statistical texts. This was used
to select a series of random numbers which would then be used as
look-up indices into some other data set.

You could do the same:

  1. generate a list of pseudo-random numbers, using a pre-defined
     seed value, over the range 1 .. count(*) of records in table,

  2. use that list as record id values to select the desired subset
     of the data in the table.

This would be done in two separate operations, possibly with a
storage of the generated numbers in a separate table which could
be used in the query of the main data.

Since it is a pseudo-random number series, you can repeat it as
often as needed using the same seed value.

Chris


On Thu, 7 Nov 2019, at 15:15, Merijn Verstraaten wrote:
> 
> > On 7 Nov 2019, at 19:16, David Raymond <david.raym...@tomtom.com> wrote:
> > 
> > Along those lines SQLite includes the reverse_unordered_selects pragma
> > https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects
> > which will flip the order it sends rows in queries that don't explicitly 
> > specify an ordering. It's there to assist you in finding spots in your code 
> > where you might be relying on implicit ordering when you really shouldn't 
> > be.
> 
> Like the rest of this threads, this is just pointing out why the things 
> in my initial email don't work, but I already knew that. Which is why I 
> asked for help to see if there is a way to do what I want that *does* 
> work. I don't care particularly about the details of "can I control the 
> order the condition is evaluated", it's just that all reasonable ways 
> to sample large streams that I know would require a deterministic order.
> 
> If someone has a different/better idea on how to return just a random 
> sample from a query in a repeatable way, I'm all ears.
> 
> So far the only suggestion was "use some non-deterministic random 
> sampling method and store the result", but since my samples are large 
> and I have lots of them, this would balloon my storage by >100x and I 
> don't have the available storage to make that work.
> 
> - Merijn
> 
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> 
> Attachments:
> * signature.asc
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to