Re: how to find top N values using map-reduce ?

2013-02-01 Thread Eugene Kirpichov
Hi, Can you tell more about:
* How big is N
* How big is the input dataset
* How many mappers you have
* Do input splits correlate with the sorting criterion for top N?
Depending on the answers, very different strategies will be optimal.
On Fri, Feb 1, 2013 at 9:05 PM, praveenesh kumar

Re: how to find top N values using map-reduce ?

2013-02-01 Thread praveenesh kumar
Actually, what I am trying to find is the top n% of the whole data. This n could be very large if my data is large. Assuming I have uniform rows of equal size and the total data size is 10 GB, then with the above-mentioned approach, to take the top 10% of the whole data set, I need 10% of 10 GB
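One way to avoid holding the whole top n% in a single task's memory is a two-pass, histogram-based cutoff. The sketch below is not taken from the thread; it simulates the idea in plain Python rather than Hadoop. It assumes numeric sort keys, and `BUCKET_WIDTH`, `map_histogram`, `find_cutoff_bucket` and `top_percent` are hypothetical names chosen for illustration. Pass 1 counts values per bucket in each split, the merged counts give a cutoff bucket, and pass 2 is a filter that only has to rank the values in the boundary bucket.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

BUCKET_WIDTH = 10  # hypothetical bucket size; tune to the key distribution


def bucket_of(value: float) -> int:
    """Map a value to its histogram bucket (floor division handles negatives)."""
    return int(value // BUCKET_WIDTH)


def map_histogram(split: Iterable[float]) -> Counter:
    """Pass 1, mapper side: count the values of one split per bucket."""
    return Counter(bucket_of(v) for v in split)


def find_cutoff_bucket(histograms: Iterable[Counter], percent: int) -> Tuple[int, int]:
    """Pass 1, reducer side: merge the counts and locate the cutoff bucket.

    Returns (cutoff_bucket, target), where target is the number of rows
    that make up the top `percent` of the input, rounded up.
    """
    total: Counter = Counter()
    for hist in histograms:
        total.update(hist)
    rows = sum(total.values())
    target = (percent * rows + 99) // 100  # integer ceiling of percent% of rows
    seen = 0
    for bucket in sorted(total, reverse=True):  # walk from the largest values down
        seen += total[bucket]
        if seen >= target:
            return bucket, target
    raise ValueError("empty input")


def top_percent(splits: Sequence[Sequence[float]], percent: int) -> List[float]:
    """Pass 2: keep everything above the cutoff bucket, trim the boundary bucket."""
    cutoff, target = find_cutoff_bucket([map_histogram(s) for s in splits], percent)
    above = [v for s in splits for v in s if bucket_of(v) > cutoff]
    boundary = [v for s in splits for v in s if bucket_of(v) == cutoff]
    # Only the boundary bucket has to be ranked, not the whole result set.
    return above + heapq.nlargest(target - len(above), boundary)


splits = [list(range(0, 50)), list(range(50, 100))]
top10 = top_percent(splits, 10)
top15 = top_percent(splits, 15)
```

In a real job the values in `above` could be written straight to the output from a map-only pass, so the memory needed depends on the size of the boundary bucket rather than on n% of the input.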

Re: how to find top N values using map-reduce ?

2013-02-01 Thread Russell Jurney
Pig. Datafu. 7 lines of code.
https://gist.github.com/4696443
https://github.com/linkedin/datafu
On Fri, Feb 1, 2013 at 11:17 PM, praveenesh kumar praveen...@gmail.com wrote: Actually, what I am trying to find is the top n% of the whole data. This n could be very large if my data is large.

Re: how to find top N values using map-reduce ?

2013-02-01 Thread praveenesh kumar
Thanks for that, Russell. Unfortunately I can't use Pig. I need to write my own MR job. I was wondering how it's usually done in the best way possible.
Regards
Praveenesh
On Sat, Feb 2, 2013 at 1:00 PM, Russell Jurney russell.jur...@gmail.com wrote: Pig. Datafu. 7 lines of code.
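For a hand-written job, a commonly used pattern for a fixed, modest N is for each mapper to keep a bounded min-heap of its N largest values and emit them once at the end of its split, with a single reducer merging the candidates. The sketch below simulates that pattern in plain Python instead of the Hadoop API, and it is not something the thread itself confirms. `map_top_n` and `reduce_top_n` are hypothetical names; in a Java job the mapper-side emit would typically go in `Mapper.cleanup()`.

```python
import heapq
from typing import Iterable, List


def map_top_n(split: Iterable[float], n: int) -> List[float]:
    """Mapper side: keep only the n largest values seen in one input split."""
    if n <= 0:
        return []
    heap: List[float] = []  # min-heap; heap[0] is the smallest value kept so far
    for value in split:
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            heapq.heapreplace(heap, value)  # drop the smallest, keep the new value
    return heap  # emitted once, after the whole split has been read


def reduce_top_n(partials: Iterable[List[float]], n: int) -> List[float]:
    """Single reducer: merge the per-mapper candidates into the global top n."""
    merged = (v for part in partials for v in part)
    return heapq.nlargest(n, merged)  # largest first


splits = [[5, 1, 9, 3], [7, 8, 2], [6, 4, 10]]
partials = [map_top_n(s, 3) for s in splits]
top3 = reduce_top_n(partials, 3)  # [10, 9, 8]
```

Each mapper sends at most N values to the reducer, so the shuffle is bounded by N times the number of mappers. That stops being practical when N is a sizeable percentage of the input, as in the top n% case discussed above.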