Re: selecting a random subset of a view

Robert Newson Mon, 28 Jun 2010 07:53:29 -0700

If your _ids are 'aa' + sha1, then you can generate a random key of the
same form and pass it as the startkey parameter to the _all_docs view.
The next N hits are then effectively a random selection.
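
A minimal sketch of the random-startkey idea, in Python rather than the
PHP client mentioned in the thread. The base URL, the database name
"jukebox" and the function name are hypothetical; only the startkey and
limit parameters of _all_docs come from CouchDB itself.

```python
import json
import secrets
from urllib.parse import urlencode


def random_all_docs_url(base_url, n, prefix="aa"):
    """Build an _all_docs URL that starts at a random key and returns n rows."""
    # 20 random bytes give 40 hex characters, the same shape as a SHA-1 digest.
    random_key = prefix + secrets.token_hex(20)
    # CouchDB expects key parameters to be JSON-encoded, hence json.dumps.
    query = urlencode({"startkey": json.dumps(random_key), "limit": n})
    return f"{base_url}/_all_docs?{query}", random_key


if __name__ == "__main__":
    # Hypothetical local database; nothing is fetched here, only the URL is built.
    url, key = random_all_docs_url("http://localhost:5984/jukebox", 5)
    print(url)
```

If the random key lands near the end of the id range, fewer than N rows
may come back, so a client would probably want to issue a second request
from the start of the range to make up the difference.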


It should be clarified that map/reduce methods have to be
deterministic, so approaches where you'd generate random numbers to
control calls to emit() are not going to work.

B.

On Mon, Jun 28, 2010 at 3:25 PM,  <[email protected]> wrote:
> Thanks for all your answers
>
> I forgot to mention that I am trying to avoid the "skip" parameter for
> performance reasons.
>
> The first method Ian suggests implies reading all my doc ids... In that case I 
> can easily use the client's random function (the client is PHP). But reading all 
> doc ids is not really performance-friendly.
>
> The second method Ian describes is the one that others (Sebastian, Robert) 
> are proposing: some random parameter in the document. But as Ian tells us, 
> "If you use the MD5 or SHA function, then the order will be repeatable if
> the data is not changed."... not so random :-)
>
> My app is a music jukebox, and I want to provide a way to fill the playlist with 
> random songs. Each song document has an _id composed of 'aa' + a random sha1.
>
> I have already scratched my head over this, and I think the random feature should 
> be a CouchDB server feature. It can't be implemented client-side except by reading 
> all document ids and using a client-side random function, which is, once 
> again, not really performance-friendly.
>
> Any idea welcome...
>
> Mickael
>
> ----- Original Message -----
> From: "Ian Hobson" <[email protected]>
> To: [email protected]
> Sent: Monday 28 June 2010 16:04:09 GMT +01:00 Amsterdam / Berlin / Bern / 
> Rome / Stockholm / Vienna
> Subject: Re: selecting a random subset of a view
>
> On 28/06/2010 14:29, [email protected] wrote:
>> Hello couchers,
>>
>> how would you select a random subset of a view result (a simple view 
>> with map only)?
>>
>>
> Hi Mickael,
>
> You want a random sample of predetermined size, from a list that you can
> only (best) access randomly. To do this you must know the number of
> records in the database.
>
> Note - I know the stats side MUCH better than the couchdb so this might
> not be implementable.
>
> Here are two methods.
>
> Method 1.
>
> Say, for example, you want 3 from 11.
>
> Take the first index with a probability of 3/11  by computing a random
> number in range 0 to 1, and taking the record if rand < 3/11 (using real
> math, not integer).
>
> If you take that record, adjust the number required down by 1.
> Reduce the number remaining by 1.
>
> Take the next record with a probability of 2/10 (if you took the first
> record) or 3/10 (if you did not).
>
> Continue in like manner until you either
>
> a) Take the last record with a probability of 1/1 or
> b) Have all you want, and take the remaining records with a probability
> of 0/n
>
> To do this with couchdb I would read all the IDs into the client and
> filter them there, and then read each record separately.
>
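
Method 1 above can be sketched roughly as follows, assuming the ids have
already been read into a client-side list. The function and variable
names are hypothetical, and Python stands in for the PHP client.

```python
import random


def selection_sample(ids, k, rng=random):
    """Pick k ids in a single pass, taking each with probability needed/remaining."""
    remaining = len(ids)  # records not yet examined
    needed = k            # records still to be taken
    chosen = []
    for doc_id in ids:
        if needed == 0:
            break  # we have all we want; the rest would be taken with probability 0
        # Real (float) division, as the description insists.
        if rng.random() < needed / remaining:
            chosen.append(doc_id)
            needed -= 1
        remaining -= 1
    return chosen


if __name__ == "__main__":
    ids = [f"aa{i:02d}" for i in range(11)]
    print(selection_sample(ids, 3))  # 3 of the 11 ids, in their original order
```

Because the probability reaches 1 once the number still needed equals
the number remaining, the loop returns exactly k ids whenever the list
holds at least k.
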
> Method 2 -
>
> Allocate a random number to each record (from a large range - we don't
> want duplicates). This could be a sha or MD5 of the actual data.
> Sort by the random number allocated.
> Read the first N records that you need.
>
> I think this sort of index could be set up on the server, so the client
> needs only create the index, and read the first N records and remove the
> index. The work on the server will be much greater than method 1 though.
>
> If you use the MD5 or SHA function, then the order will be repeatable if
> the data is not changed.
>
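
A client-side sketch of Method 2, using a SHA-1 of each record's data as
the sort key. The dictionary of documents and the function name are
hypothetical. The optional salt is not part of Ian's description; it is
an assumption added here to show one way of getting a different order
without changing the data.

```python
import hashlib


def first_n_by_hash(docs, n, salt=""):
    """Sort documents by the SHA-1 of (salt + data) and return the first n ids."""
    def sort_key(item):
        _doc_id, body = item
        return hashlib.sha1((salt + body).encode("utf-8")).hexdigest()

    ranked = sorted(docs.items(), key=sort_key)
    return [doc_id for doc_id, _body in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical song documents, keyed by id.
    docs = {"aa01": "song one", "aa02": "song two", "aa03": "song three"}
    print(first_n_by_hash(docs, 2))
    # Same data and same salt give the same order, as noted above.
    print(first_n_by_hash(docs, 2) == first_n_by_hash(docs, 2))
```
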
> Regards
>
> Ian
>
>
