On Wed, Feb 11, 2009 at 08:08:39PM +0000, Brian Candler wrote:
> On Tue, Feb 10, 2009 at 02:31:58PM -0800, James Marca wrote:
> > I have a situation where I want to run two different reduce functions
> > on the output of a single map function. Suppose I want one reduce
> > function to get the count of all objects in each group (for example,
> > documents with or without attachments), and another reduce to compute
> > some other aggregate, like the average and standard deviation of a
> > value (like the average size of attached documents). (Yes, I know
> > this is a stupid example, as the averaging reduce function will also
> > have the count, but my real case is too complicated to write easily.)
>
> I believe reduce values can be any JSON value, so perhaps you could
> reduce to an array of values, e.g. [count, total, sum_of_squares].
>
> The final calculation of average and SD could then be left to the
> client.
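Something like this, perhaps? (A sketch of what I take the suggestion
to mean, assuming a stock CouchDB JavaScript view; doc.group and the
use of attachment sizes are stand-ins I made up, not anything from the
thread.)

function (doc) {
  // Emit the [count, total, sum_of_squares] triple directly from the
  // map, so reduce and rereduce can share one code path.
  // doc.group is a made-up grouping field.
  for (var name in doc._attachments || {}) {
    var len = doc._attachments[name].length;  // attachment size in bytes
    emit(doc.group, [1, len, len * len]);
  }
}

function (keys, values, rereduce) {
  // Every value is a [count, total, sum_of_squares] triple, whether it
  // came from the map or from an earlier reduce, so a component-wise
  // sum handles both cases.
  var acc = [0, 0, 0];
  for (var i = 0; i < values.length; i++) {
    acc[0] += values[i][0];
    acc[1] += values[i][1];
    acc[2] += values[i][2];
  }
  return acc;
}

The client would then compute mean = total/count and
sd = sqrt(sum_of_squares/count - mean^2).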
I'll have to think about how that fits my case. I've got mean/sd and
so on handled in a reduce, but I was wondering about doing other things
with the same map.

I am analyzing a year of data from detectors, most of which report
every 30 seconds. So I want to say things like "the average, standard
deviation, min, and max for X on Tuesday between 8:05 and 8:10 was
[...]". That's one map/reduce run. But there might be other things we
want to look at, so I was wondering whether it is worth optimizing a
single map now (given the size of the data) rather than adding more
maps later.

James
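P.S. For concreteness, here is roughly the shape I have in mind for a
single map with a richer reduce. Just a sketch, with made-up field
names (doc.detector_id, doc.value, and doc.timestamp, which I assume
here is epoch milliseconds):

function (doc) {
  // Key on [detector, day-of-week, hour, 5-minute bucket] so that
  // group_level queries can roll the stats up to coarser slices.
  var d = new Date(doc.timestamp);
  var bucket = Math.floor(d.getMinutes() / 5) * 5;
  // d.getDay(): 0 = Sunday .. 6 = Saturday, so Tuesday is 2.
  emit([doc.detector_id, d.getDay(), d.getHours(), bucket],
       [1, doc.value, doc.value * doc.value, doc.value, doc.value]);
}

function (keys, values, rereduce) {
  // Values are [count, sum, sum_sq, min, max] from both the map and
  // earlier reduces, so one component-wise combine covers rereduce too.
  var a = values[0].slice();
  for (var i = 1; i < values.length; i++) {
    a[0] += values[i][0];
    a[1] += values[i][1];
    a[2] += values[i][2];
    a[3] = Math.min(a[3], values[i][3]);
    a[4] = Math.max(a[4], values[i][4]);
  }
  return a;
}

Count, mean, sd (from sum and sum_sq), min, and max for any time slice
would then all come out of the one view, and adding another aggregate
later would mean widening the emitted array rather than adding another
map.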
