Re: selecting a random subset of a view

mickael . bailly Mon, 28 Jun 2010 07:25:43 -0700

Thanks for all your answers

I forgot to tell I try to avoid the "skip" param for performances reasons.

The first method Ian suggest imply to read all my doc ids... In this case I can 
easily use the client random function (client is PHP). But reading all doc ids 
is not really performance friendly.

The second method Ian speak about, is the one that others (Sebastian, Robert) 
are proposing : some random parameter in the document. But as Ian tells us, "If 
you use the MD5 or SHA function, then the order will be repeatable if 
the data is not changed."... not so random :-)

My app is a music jukebox, I want to provide a way to fill the playlist with 
random songs. Each song document have an _id composed of 'aa'+ a random sha1 .

I scratched my head already but I think the random feature should be a couchdb 
server feature, and can't be implemented client-side, but by reading all 
documents ids and using client-side random function, which is, once again, not 
really performance-friendly.

Any idea welcome...

Mickael

----- Mail Original -----
De: "Ian Hobson" <[email protected]>
À: [email protected]
Envoyé: Lundi 28 Juin 2010 16h04:09 GMT +01:00 Amsterdam / Berlin / Berne / 
Rome / Stockholm / Vienne
Objet: Re: selecting a random subset of a view

On 28/06/2010 14:29, [email protected] wrote:
> Hello couchers,
>
> how would you do to select a random subset of a view result (a simple view 
> with map only).
>
>    
Hi Mickael,

You want a random sample of predetermined size, from a list that you can 
only (best) access randomly. To do this you must know the number of 
records in the database.

Note - I know the stats side MUCH better than the couchdb so this might 
not be implementable.

Here are two methods.

Method 1.

Say, for example, you want 3 from 11.

Take the first index with a probability of 3/11  by computing a random 
number in range 0 to 1, and taking the record if rand < 3/11 (using real 
math, not integer).

If you take that record, adjust the number required down by 1.
Reduce the number remaining by 1.

Take the next record with a probability or 2/10 or 3/10

Continue in like manner until you either

a) Take the last record with a probability of 1/1 or
b) Have all you want, and take the remaining records with a probability 
of 0/n

To do this with couchdb I would read all the IDs into the client and 
filter them there, and then read each records separately.

Method 2 -

Allocate a random number to each record (from a large range - we don't 
want duplicates). This could be a sha or MD5 of the actual data.
Sort by the random number allocated.
Read the first N records that you need.

I think this sort of index could be set up on the server, so the client 
needs only create the index, and read the first N records and remove the 
index. The work on the server will be much greater that method 1 though.

If you use the MD5 or SHA function, then the order will be repeatable if 
the data is not changed.

Regards

Ian

Re: selecting a random subset of a view

Reply via email to