On 28/06/2010 14:29, [email protected] wrote:
Hello couchers,
how would you select a random subset of a view result (a simple view with
map only)?
Hi Mickael,
You want a random sample of predetermined size, from a list that you can
only (best) access randomly. To do this you must know the number of
records in the database.
Note - I know the stats side MUCH better than the CouchDB side, so this
might not be implementable.
Here are two methods.
Method 1.
Say, for example, you want 3 from 11.
Take the first index with a probability of 3/11 by computing a random
number in range 0 to 1, and taking the record if rand < 3/11 (using real
math, not integer).
If you take that record, adjust the number required down by 1.
Reduce the number remaining by 1.
Take the next record with a probability of 2/10 (if you took the first
record) or 3/10 (if you did not).
Continue in like manner until you either
a) Take the last record with a probability of 1/1 or
b) Have all you want, and take the remaining records with a probability
of 0/n
To do this with CouchDB I would read all the IDs into the client and
filter them there, and then read each selected record separately.
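
A rough client-side sketch of Method 1 in Python, assuming the IDs have
already been read into a list. The function name selection_sample and the
rng parameter are made up for illustration; nothing here talks to CouchDB.

```python
import random

def selection_sample(ids, k, rng=random):
    """Pick k items from ids in one sequential pass (Method 1)."""
    remaining = len(ids)  # records not yet examined
    needed = k            # records still to be taken
    chosen = []
    for doc_id in ids:
        if needed == 0:
            break  # have all we want; the rest are taken with probability 0/n
        # Take this record with probability needed/remaining (real division).
        if rng.random() < needed / remaining:
            chosen.append(doc_id)
            needed -= 1
        remaining -= 1
    return chosen

if __name__ == "__main__":
    # 3 from 11, as in the example above.
    print(selection_sample(list(range(11)), 3))
```

The chosen IDs come back in their original order, and each one can then be
fetched separately.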
Method 2 -
Allocate a random number to each record (from a large range - we don't
want duplicates). This could be a SHA or MD5 hash of the actual data.
Sort by the random number allocated.
Read the first N records that you need.
I think this sort of index could be set up on the server, so the client
need only create the index, read the first N records, and remove the
index. The work on the server will be much greater than in method 1 though.
If you use the MD5 or SHA function, then the order will be repeatable if
the data is not changed.
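
A rough sketch of Method 2 in Python, done on the client rather than as a
server-side index. It assumes each record is available as a string (for
example its serialized data) and uses MD5 from the standard hashlib module
as the sort key. The function name hash_sample is made up for illustration.

```python
import hashlib

def hash_sample(docs, n):
    """Return the n records whose MD5 digests sort first (Method 2)."""
    def digest(doc):
        # The hash of the data stands in for a random number per record.
        return hashlib.md5(doc.encode("utf-8")).hexdigest()
    # Sort by the allocated "random" number and read the first n records.
    return sorted(docs, key=digest)[:n]

if __name__ == "__main__":
    docs = ['{"name": "doc%d"}' % i for i in range(11)]
    print(hash_sample(docs, 3))
```

Because the key depends only on the data, running this twice on unchanged
records gives the same sample, whatever order they were read in.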
Regards
Ian