On Wed, Feb 11, 2009 at 08:08:39PM +0000, Brian Candler wrote:
> On Tue, Feb 10, 2009 at 02:31:58PM -0800, James Marca wrote:
> > I have a situation where I want to run two different reduce functions
> > on the output of a single map function. Suppose I want one reduce
> > function to get the count of all objects in each group (for example,
> > documents with or without attachments), and another reduce to compute
> > some other aggregate, like the average and standard deviation of a
> > value (like the average size of attached documents). (Yes, I know
> > this is a stupid example, as the averaging reduce function will also
> > have the count, but my real case is too complicated to write easily.)
>
> I believe reduce values can be any JSON value, so perhaps you could
> reduce to an array of values, e.g. [count, total, sum_of_squares].
>
> The final calculation of average and SD could then be left to the
> client.
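Something like this, perhaps? (A sketch of what I take the suggestion
to mean, assuming a stock CouchDB JavaScript view; doc.group and the
use of attachment sizes are stand-ins I made up, not anything from the
thread.)

function (doc) {
  // Emit the [count, total, sum_of_squares] triple directly from the
  // map, so reduce and rereduce can share one code path.
  // doc.group is a made-up grouping field.
  for (var name in doc._attachments || {}) {
    var len = doc._attachments[name].length;  // attachment size in bytes
    emit(doc.group, [1, len, len * len]);
  }
}

function (keys, values, rereduce) {
  // Every value is a [count, total, sum_of_squares] triple, whether it
  // came from the map or from an earlier reduce, so a component-wise
  // sum handles both cases.
  var acc = [0, 0, 0];
  for (var i = 0; i < values.length; i++) {
    acc[0] += values[i][0];
    acc[1] += values[i][1];
    acc[2] += values[i][2];
  }
  return acc;
}

The client would then compute mean = total/count and
sd = sqrt(sum_of_squares/count - mean^2).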
I'll have to think about how that fits my case. I've got mean/sd and
so on handled in a reduce, but I was wondering about doing other things
with the same map.

I am analyzing a year of data from detectors, most of which report
every 30 seconds. So I want to say things like "the average, standard
deviation, min, and max for X on Tuesday between 8:05 and 8:10 was
[...]". That's one map/reduce run. But there might be other things we
want to look at, so I was wondering whether it is worth optimizing a
single map now (given the size of the data) rather than adding more
maps later.

James
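P.S. For concreteness, here is roughly the shape I have in mind for a
single map with a richer reduce. Just a sketch, with made-up field
names (doc.detector_id, doc.value, and doc.timestamp, which I assume
here is epoch milliseconds):

function (doc) {
  // Key on [detector, day-of-week, hour, 5-minute bucket] so that
  // group_level queries can roll the stats up to coarser slices.
  var d = new Date(doc.timestamp);
  var bucket = Math.floor(d.getMinutes() / 5) * 5;
  // d.getDay(): 0 = Sunday .. 6 = Saturday, so Tuesday is 2.
  emit([doc.detector_id, d.getDay(), d.getHours(), bucket],
       [1, doc.value, doc.value * doc.value, doc.value, doc.value]);
}

function (keys, values, rereduce) {
  // Values are [count, sum, sum_sq, min, max] from both the map and
  // earlier reduces, so one component-wise combine covers rereduce too.
  var a = values[0].slice();
  for (var i = 1; i < values.length; i++) {
    a[0] += values[i][0];
    a[1] += values[i][1];
    a[2] += values[i][2];
    a[3] = Math.min(a[3], values[i][3]);
    a[4] = Math.max(a[4], values[i][4]);
  }
  return a;
}

Count, mean, sd (from sum and sum_sq), min, and max for any time slice
would then all come out of the one view, and adding another aggregate
later would mean widening the emitted array rather than adding another
map.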
