How to do aggregate operations with multiple reducers

jeremy p Fri, 29 Nov 2013 14:38:40 -0800

Hey all,

So, I'm writing a module where I need to do aggregate operations over the
entire set of data.  However, I also want to use multiple reducers.


For example, let's say each row of input data looks like this:

DATE_TIME_TEMPERATURE

Let's say my mapper outputs DATE_TIME as the key and the temperature as the
value.  In this example, I'm using 3 reducers, which should create three
output files, each holding only some of the keys.  I want to find the day
with the highest temperature in the entire data set.
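
To make the setup concrete, here's a rough Python simulation of it. This is
only a sketch, not my actual Hadoop code: the function names, the
underscore-separated row format, and the byte-sum partitioner are all
assumptions made up for illustration.

```python
NUM_REDUCERS = 3  # matches the three reducers in the example


def map_row(row):
    """Split 'DATE_TIME_TEMPERATURE' into (DATE_TIME, temperature).

    Assumes the temperature is the last '_'-separated field.
    """
    key, temp = row.rsplit("_", 1)
    return key, float(temp)


def partition(key, num_reducers=NUM_REDUCERS):
    """Hypothetical stand-in for a hash partitioner: pick a reducer for a key."""
    return sum(key.encode("utf-8")) % num_reducers


def run_job(rows):
    """Return one {key: temperature} dict per reducer (one 'output file' each)."""
    outputs = [dict() for _ in range(NUM_REDUCERS)]
    for row in rows:
        key, temp = map_row(row)
        outputs[partition(key)][key] = temp
    return outputs
```

Each reducer only ever sees its own slice of the keys, so none of them can
tell on its own which temperature is the highest overall.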

I know I could just write a script that examines the output from the
reducers and picks out the value with the highest temperature.  I could
also write a mapreduce job that does the same thing, and chain the two jobs
together.  However, these solutions seem kinda wrong to me.
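
To be concrete about the first option, here's roughly the script I have in
mind. It's a sketch under some assumptions: that the reducer output has been
copied to a local directory, that the files are named part-* and that each
line is a key and a temperature separated by a tab. The function name
find_hottest is made up.

```python
import glob
import os


def find_hottest(output_dir):
    """Scan every reducer output file and return the (key, temperature)
    pair with the highest temperature, or (None, -inf) if there is no data."""
    best_key, best_temp = None, float("-inf")
    # Assumes reducer outputs are local files named part-*.
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path) as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue  # skip blank lines
                # Assumes "KEY<TAB>TEMPERATURE" on each line.
                key, temp = line.split("\t")
                temp = float(temp)
                if temp > best_temp:
                    best_key, best_temp = key, temp
    return best_key, best_temp
```

It works, but it's a separate step outside of the job, which is the part that
bugs me.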

What's the commonly-accepted best way to do this?

--Jeremy
