If your _id's are 'aa' + sha1, then you can generate a random key as your startkey parameter to the _all_docs view. The next N hits are randomly selected.
It should be clarified that map/reduce methods have to be deterministic, so approaches where you'd generate random numbers to control calls to emit() are not going to work. B. On Mon, Jun 28, 2010 at 3:25 PM, <[email protected]> wrote: > Thanks for all your answers > > I forgot to tell I try to avoid the "skip" param for performances reasons. > > The first method Ian suggest imply to read all my doc ids... In this case I > can easily use the client random function (client is PHP). But reading all > doc ids is not really performance friendly. > > The second method Ian speak about, is the one that others (Sebastian, Robert) > are proposing : some random parameter in the document. But as Ian tells us, > "If you use the MD5 or SHA function, then the order will be repeatable if > the data is not changed."... not so random :-) > > My app is a music jukebox, I want to provide a way to fill the playlist with > random songs. Each song document have an _id composed of 'aa'+ a random sha1 . > > I scratched my head already but I think the random feature should be a > couchdb server feature, and can't be implemented client-side, but by reading > all documents ids and using client-side random function, which is, once > again, not really performance-friendly. > > Any idea welcome... > > Mickael > > ----- Mail Original ----- > De: "Ian Hobson" <[email protected]> > À: [email protected] > Envoyé: Lundi 28 Juin 2010 16h04:09 GMT +01:00 Amsterdam / Berlin / Berne / > Rome / Stockholm / Vienne > Objet: Re: selecting a random subset of a view > > On 28/06/2010 14:29, [email protected] wrote: >> Hello couchers, >> >> how would you do to select a random subset of a view result (a simple view >> with map only). >> >> > Hi Mickael, > > You want a random sample of predetermined size, from a list that you can > only (best) access randomly. To do this you must know the number of > records in the database. > > Note - I know the stats side MUCH better than the couchdb so this might > not be implementable. > > Here are two methods. > > Method 1. > > Say, for example, you want 3 from 11. > > Take the first index with a probability of 3/11 by computing a random > number in range 0 to 1, and taking the record if rand < 3/11 (using real > math, not integer). > > If you take that record, adjust the number required down by 1. > Reduce the number remaining by 1. > > Take the next record with a probability or 2/10 or 3/10 > > Continue in like manner until you either > > a) Take the last record with a probability of 1/1 or > b) Have all you want, and take the remaining records with a probability > of 0/n > > To do this with couchdb I would read all the IDs into the client and > filter them there, and then read each records separately. > > Method 2 - > > Allocate a random number to each record (from a large range - we don't > want duplicates). This could be a sha or MD5 of the actual data. > Sort by the random number allocated. > Read the first N records that you need. > > I think this sort of index could be set up on the server, so the client > needs only create the index, and read the first N records and remove the > index. The work on the server will be much greater that method 1 though. > > If you use the MD5 or SHA function, then the order will be repeatable if > the data is not changed. > > Regards > > Ian > >
