I think it would be great if views supported random queries, maybe by adding a query argument, but truly random results are rarely necessary in practice. For example, in generating a play list what you really need is sufficient unpredictability that play lists yield gratifying results for repeated playing over time.
You could get this by using Sabastian's suggestion, and request the documents one at a time. On the client side, generate a random hash and use that as your startkey. If you get no result get the first document in the view so that it is sure to spread your results evenly over the whole list (loop the list). This works well if you need one document at a time. If you want to retrieve several documents at a time, I'd consider periodically regenerating a UUID you are using for sorting. For a play list, I'd probably do this when I update the document with statistics, so for example when a song is played I'd update its record with a "last played" timestamp, number of plays, and anything else of interest that says this song was played, and update it with a newly generated UUID string in a sort field, which I'd use as the key in the view. Then I'd generate a random hash client-side as the start key. If you use this, you could get three at a time and still maintain sufficient randomness for satisfactorily picking music in an unpredictable way. Songs would be unlikely to be played in the same order twice (unless your list is small). If your situation supports it you could update the sort field every time you retrieve a document. Whether that's practical or not depends on your environment, the list size, the document size, and so on. You just have to weigh how "random" you want the list and the impact on the user's experience of performance. If the update is non-blocking it won't impact the user's perceived performance but will increase the random appearance of play lists. The occasions when it is really important to generate random values are uncommon, such as in situations where security requires the next in a sequence to be unguessable. Using couchdb is already a rebalancing of priorities for those of us who have built careers out of data driven applications. Sometimes the best answer lies in rethinking our requirements pragmatically. David Caylor On Mon, Jun 28, 2010 at 9:15 AM, <[email protected]> wrote: > shouldn't list functions be deterministics ? > > > ----- Mail Original ----- > De: "Randall Leeds" <[email protected]> > À: [email protected] > Envoyé: Lundi 28 Juin 2010 17h44:36 GMT +01:00 Amsterdam / Berlin / Berne / > Rome / Stockholm / Vienne > Objet: Re: selecting a random subset of a view > > Ian's Method One, implemented as a list function, would probably do the > trick. > > On Jun 28, 2010 9:04 AM, "Ian Hobson" <[email protected]> wrote: > > On 28/06/2010 14:29, [email protected] wrote: > > > > Hello couchers, > > > > how would you do to selec... > Hi Mickael, > > You want a random sample of predetermined size, from a list that you can > only (best) access randomly. To do this you must know the number of records > in the database. > > Note - I know the stats side MUCH better than the couchdb so this might not > be implementable. > > Here are two methods. > > Method 1. > > Say, for example, you want 3 from 11. > > Take the first index with a probability of 3/11 by computing a random > number in range 0 to 1, and taking the record if rand < 3/11 (using real > math, not integer). > > If you take that record, adjust the number required down by 1. > Reduce the number remaining by 1. > > Take the next record with a probability or 2/10 or 3/10 > > Continue in like manner until you either > > a) Take the last record with a probability of 1/1 or > b) Have all you want, and take the remaining records with a probability of > 0/n > > To do this with couchdb I would read all the IDs into the client and filter > them there, and then read each records separately. > > Method 2 - > > Allocate a random number to each record (from a large range - we don't want > duplicates). This could be a sha or MD5 of the actual data. > Sort by the random number allocated. > Read the first N records that you need. > > I think this sort of index could be set up on the server, so the client > needs only create the index, and read the first N records and remove the > index. The work on the server will be much greater that method 1 though. > > If you use the MD5 or SHA function, then the order will be repeatable if > the > data is not changed. > > Regards > > Ian >
