On Fri, Jun 19, 2009 at 09:43:31AM +0200, Daniel Trümper wrote:
> Hi,
>
> I am somewhat new to CouchDB but have been doing some stuff with it and
> this is my first post to the list so pardon if I am wrong :)
>
>
>> It would be really cool if there were some way to pass all the docs
>> with a value of 1 for group_key to a single map function call, so I
>> could do computation across those related documents and emit the
>> results ... I'm just using the magic group_key attribute as an
>> example, if such a feature were to actually be made I'd think you'd
>> define a javascript function which returned a single groupping k to
>> exist I
> I think this is what the reduce function is for.

No, I'm afraid it's not. The OP wants to calculate information across a
group of related documents. CouchDB does not guarantee that all the
related documents will be passed to the reduce function at the same time.

It may pass documents (d1,d2,d3) to the reduce function to generate Rx,
then pass (d4,d5,d6) to the reduce function to generate Ry, then
(d7,d8,d9) to generate Rz, then pass (Rx,Ry,Rz) to the re-reduce function
to generate the final R value.

If the values sharing the key were e.g. d3,d4 then you won't be able to
process them together, as they would not be presented to the reduce
function at the same time.

Using a grouped reduce query is better (i.e. group=true), but a large set
of documents sharing the same group key is still likely to be split into
several reductions with a re-reduce. The OP was talking about ~100
documents sharing this key, and so they may well be split this way.

Furthermore, CouchDB optimises its reductions by storing the reduced
value for all the documents within the same Btree node. For example,
suppose you have

    +-------------+  +-------------+  +-------------+
    | d1 d2 d3 Rx |  | d4 d5 d6 Ry |  | d7 d8 d9 Rz |
    +-------------+  +-------------+  +-------------+

Then you make a reduce query for the key range which includes documents
d2 to d8 inclusive (or a grouped query where d2 to d8 share the same
group key). CouchDB will calculate:

    R1 = Reduce(d2,d3)
    R2 = Reduce(d7,d8)
    R  = Rereduce(R1,Ry,R2)

That is: the already-reduced value of Ry=Reduce(d4,d5,d6) is reused
without recomputation. So the reduce function doesn't see documents d4 to
d6 again.

So in summary: you cannot rely on the reduce function to be able to
process adjacent documents. You *must* do this sort of processing
client-side.

HTH,

Brian.
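
P.S. To make the splitting concrete, here is a minimal Python sketch. It
is a toy simulation, not CouchDB code: the rows, NODE_SIZE, reduce_fn,
simulated_reduce and group_client_side are all made-up names, and the
fixed chunk size only stands in for however CouchDB happens to fill its
Btree nodes. It shows that the three documents sharing key 1 never reach
a single reduce call, while grouping the sorted rows on the client does
see them together.

```python
from itertools import groupby

# Hypothetical view rows as (key, doc id), sorted by key like view output.
rows = [(0, "d1"), (0, "d2"), (1, "d3"), (1, "d4"), (1, "d5"),
        (2, "d6"), (2, "d7"), (2, "d8"), (2, "d9")]

NODE_SIZE = 3  # assumed rows per leaf node; the real figure varies

seen_together = []  # the batch of docs handed to each first-level reduce


def reduce_fn(values, rereduce=False):
    """Count-style reduce that also records which docs arrived together."""
    if rereduce:
        # values are partial counts here, not documents
        return sum(values)
    seen_together.append(list(values))
    return len(values)


def simulated_reduce(rows):
    """Reduce each node-sized chunk, then re-reduce the partial results."""
    partials = []
    for i in range(0, len(rows), NODE_SIZE):
        chunk = [doc for _, doc in rows[i:i + NODE_SIZE]]
        partials.append(reduce_fn(chunk))
    return reduce_fn(partials, rereduce=True)


def group_client_side(rows):
    """Collect complete groups from key-sorted rows on the client."""
    return {key: [doc for _, doc in grp]
            for key, grp in groupby(rows, key=lambda r: r[0])}


total = simulated_reduce(rows)
groups = group_client_side(rows)

if __name__ == "__main__":
    print("reduce batches:", seen_together)
    print("client-side groups:", groups)
```

A count survives the split because partial counts can be summed in the
re-reduce step. Anything that needs to look at d3, d4 and d5 side by side
does not, which is why that work has to happen in the client.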
