Thanks for all your answers I forgot to tell I try to avoid the "skip" param for performances reasons.
The first method Ian suggest imply to read all my doc ids... In this case I can easily use the client random function (client is PHP). But reading all doc ids is not really performance friendly. The second method Ian speak about, is the one that others (Sebastian, Robert) are proposing : some random parameter in the document. But as Ian tells us, "If you use the MD5 or SHA function, then the order will be repeatable if the data is not changed."... not so random :-) My app is a music jukebox, I want to provide a way to fill the playlist with random songs. Each song document have an _id composed of 'aa'+ a random sha1 . I scratched my head already but I think the random feature should be a couchdb server feature, and can't be implemented client-side, but by reading all documents ids and using client-side random function, which is, once again, not really performance-friendly. Any idea welcome... Mickael ----- Mail Original ----- De: "Ian Hobson" <[email protected]> À: [email protected] Envoyé: Lundi 28 Juin 2010 16h04:09 GMT +01:00 Amsterdam / Berlin / Berne / Rome / Stockholm / Vienne Objet: Re: selecting a random subset of a view On 28/06/2010 14:29, [email protected] wrote: > Hello couchers, > > how would you do to select a random subset of a view result (a simple view > with map only). > > Hi Mickael, You want a random sample of predetermined size, from a list that you can only (best) access randomly. To do this you must know the number of records in the database. Note - I know the stats side MUCH better than the couchdb so this might not be implementable. Here are two methods. Method 1. Say, for example, you want 3 from 11. Take the first index with a probability of 3/11 by computing a random number in range 0 to 1, and taking the record if rand < 3/11 (using real math, not integer). If you take that record, adjust the number required down by 1. Reduce the number remaining by 1. Take the next record with a probability or 2/10 or 3/10 Continue in like manner until you either a) Take the last record with a probability of 1/1 or b) Have all you want, and take the remaining records with a probability of 0/n To do this with couchdb I would read all the IDs into the client and filter them there, and then read each records separately. Method 2 - Allocate a random number to each record (from a large range - we don't want duplicates). This could be a sha or MD5 of the actual data. Sort by the random number allocated. Read the first N records that you need. I think this sort of index could be set up on the server, so the client needs only create the index, and read the first N records and remove the index. The work on the server will be much greater that method 1 though. If you use the MD5 or SHA function, then the order will be repeatable if the data is not changed. Regards Ian
