RE: Results from a Map/Reduce

Peter Haidinyak Fri, 17 Dec 2010 11:56:49 -0800

Does that mean that once job.waitForCompletion(true) returns, the results 
from the Reducer(s) are available to me? I haven't seen much on 
coprocessors; can you point me to some examples of their use?

Thanks
-Pete

-----Original Message-----
From: Jonathan Gray [mailto:[email protected]] 
Sent: Friday, December 17, 2010 11:13 AM
To: [email protected]
Subject: RE: Results from a Map/Reduce

Hey Peter,

That System.exit line is nothing important, just the main thread waiting for 
the tasks to finish before closing.

You're interested in having the MR job return a single result?  To do that, you 
would need to roll up the processing done in each of your Map tasks into a 
single Reduce task.  With one reducer, you have a single point to do the 
final aggregation of the result.

I'm not sure exactly what kind of aggregation you are doing, but funneling into 
a single reducer can range from "no problem" to "don't even try it".  Sounds 
like you just want a final number or something, so it shouldn't be an issue.
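
As a rough illustration of the roll-up described above, here is a minimal 
sketch in plain Java. It only models the data flow: each Map task produces a 
partial aggregate, and one Reduce task combines the partials into the final 
number. The class and method names (SingleReducerSketch, mapPartialSum, 
reduceFinal, run) are hypothetical and not part of Hadoop or HBase, and summing 
is just an assumed example aggregation. In a real job the single reducer is 
typically requested with job.setNumReduceTasks(1).

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the roll-up; no Hadoop or HBase classes are used here.
class SingleReducerSketch {

    // Stand-in for one Map task: aggregates its own input split into a partial sum.
    static long mapPartialSum(List<Long> split) {
        long sum = 0;
        for (long value : split) {
            sum += value;
        }
        return sum;
    }

    // Stand-in for the single Reduce task: sees every partial and produces the final number.
    static long reduceFinal(List<Long> partials) {
        long total = 0;
        for (long partial : partials) {
            total += partial;
        }
        return total;
    }

    // Runs each "map task" over its split, then funnels all partials into one reduce step.
    static long run(List<List<Long>> splits) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> split : splits) {
            partials.add(mapPartialSum(split));
        }
        return reduceFinal(partials);
    }

    public static void main(String[] args) {
        List<List<Long>> splits = List.of(
                List.of(1L, 2L, 3L),
                List.of(10L, 20L),
                List.of(100L));
        System.out.println(run(splits)); // prints 136
    }
}
```

The single reduce step only ever sees one value per map task, which is why 
this shape tends to be cheap when the answer is one number and painful when 
each mapper emits a large amount of data.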

You might also consider doing your aggregations with coprocessors if you're 
into experimenting on HBase Trunk :)

As for FirstKeyOnlyFilter:

/**
 * A filter that will only return the first KV from each row.
 * <p>
 * This filter can be used to more efficiently perform row count operations.
 */

That's what it does.  If you scan a table, regardless of what you ask for in 
the query, the filter will just return whatever the first KeyValue is on each 
row and will skip every other column/version/value of that row.

Like it says, it's generally useful for doing row counting but that's about it.
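
To make the row-counting use concrete, here is a small plain-Java model of the 
behavior described above. It is a sketch, not the HBase implementation: the 
table is assumed to be a sorted map of row key to sorted columns, and the names 
FirstKeyOnlySketch, firstKeyOnly and countRows are hypothetical. With the real 
client, the filter would typically be attached to a scan with 
scan.setFilter(new FirstKeyOnlyFilter()).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "return only the first cell of each row"; no HBase classes are used.
class FirstKeyOnlySketch {

    // Keeps only the first (lowest-sorted) column of every row and drops the rest.
    static Map<String, Map.Entry<String, String>> firstKeyOnly(
            Map<String, TreeMap<String, String>> table) {
        Map<String, Map.Entry<String, String>> firstCells = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            Map.Entry<String, String> first = row.getValue().firstEntry();
            if (first != null) { // defensive: skip a row that has no cells at all
                firstCells.put(row.getKey(), first);
            }
        }
        return firstCells;
    }

    // Row count: one surviving cell per row, so counting cells counts rows.
    static long countRows(Map<String, TreeMap<String, String>> table) {
        return firstKeyOnly(table).size();
    }

    public static void main(String[] args) {
        TreeMap<String, TreeMap<String, String>> table = new TreeMap<>();

        TreeMap<String, String> row1 = new TreeMap<>();
        row1.put("cf:a", "1");
        row1.put("cf:b", "2");
        row1.put("cf:c", "3");
        table.put("row1", row1);

        TreeMap<String, String> row2 = new TreeMap<>();
        row2.put("cf:z", "9");
        table.put("row2", row2);

        System.out.println(countRows(table)); // prints 2
        System.out.println(firstKeyOnly(table).get("row1").getKey()); // prints cf:a
    }
}
```

The point of the filter for counting is that only one cell per row has to be 
carried back, no matter how wide the row is.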

JG

> -----Original Message-----
> From: Peter Haidinyak [mailto:[email protected]]
> Sent: Friday, December 17, 2010 10:56 AM
> To: [email protected]
> Subject: Results from a Map/Reduce
> 
> Hi, dumb question again.
>   I have been using a Scan to return a result back to my client, which works
> fine except when I am returning a million rows just to aggregate the results.
> The next logical step would be to do the aggregation in a Map/Reduce. I've
> been looking at what samples I could find and they all seem to do this...
> 
>     System.exit(job.waitForCompletion(true) ? 0 : 1);
> 
> My question: is there a way to return a result from the job, similar to
> getting a ResultScanner back and iterating through the results?
> 
> Also, is there a good definition of what a 'FirstKeyOnlyFilter' does?
> 
> Thanks
> 
> -Pete
