Re: how to find top N values using map-reduce ?

2013-02-02 Thread Harsh J
Note that a single reducer isn't always the solution. If you know your key space boundaries, consider using total-order partitioning to scale the app/job and make use of the nodes on the cluster. On Sat, Feb 2, 2013 at 10:35 AM, praveenesh kumar wrote: > I am looking for a better solution for this. > >
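
A rough sketch of the range-partitioning idea, written in plain Python rather than as Hadoop code. It assumes the key boundaries are known up front: each key is routed to the reducer that owns its range, so every reducer sorts only its own slice and the slices already sit in global order. The names `BOUNDARIES`, `partition_for` and `range_partition` are made up for illustration and are not part of any Hadoop API.

```python
from bisect import bisect_right

# Hypothetical split points (an assumption for this sketch):
# reducer 0 gets keys < 100, reducer 1 gets 100..999, reducer 2 gets keys >= 1000.
BOUNDARIES = [100, 1000]


def partition_for(key, boundaries=BOUNDARIES):
    """Return the index of the reducer whose key range contains `key`."""
    return bisect_right(boundaries, key)


def range_partition(keys, boundaries=BOUNDARIES):
    """Route keys to reducers by range, then sort each reducer's slice on its own."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for key in keys:
        parts[partition_for(key, boundaries)].append(key)
    # Each slice is sorted independently; reading the slices in order gives a global sort.
    return [sorted(part) for part in parts]


parts = range_partition([5, 2500, 150, 99, 100, 1000, 42])
```

How evenly the work spreads depends entirely on how well the boundaries match the real key distribution; badly chosen split points leave most of the data on one reducer.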

Re: how to find top N values using map-reduce ?

2013-02-02 Thread Niels Basjes
My suggestion is to use secondary sort with a single reducer. That way you can easily extract the top N. If you want to get the top N%, you'll need an additional phase to determine how many records this N% really is. -- With kind regards, Niels Basjes (Sent from mobile) On 2 Feb. 2013

Re: how to find top N values using map-reduce ?

2013-02-02 Thread praveenesh kumar
My actual problem is to rank all the values and then apply logic 1 to the top n% of values and logic 2 to the rest. 1st - Ranking? (need major suggestions here) 2nd - Find the top n% out of them. Then the rest is covered. Regards Praveenesh On Sat, Feb 2, 2013 at 1:42 PM, Lake Chang wrote: > there's one thing
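
To make the two steps concrete, here is a small in-memory sketch in Python. It is not a map-reduce job, only the logic such a job would need to reproduce: a counting pass turns n% into an absolute cutoff, then a ranking pass splits the records. The function name `rank_and_split`, rounding the cutoff up, and the arbitrary ordering of tied values are all assumptions of this sketch.

```python
import math


def rank_and_split(values, top_percent):
    """Rank values from highest to lowest and split them at the top `top_percent` cutoff."""
    # Phase 1: count the records so the percentage becomes an absolute number.
    total = len(values)
    cutoff = math.ceil(total * top_percent / 100)  # assumption: round up

    # Phase 2: assign ranks (1 = largest value) and split at the cutoff.
    ranked = list(enumerate(sorted(values, reverse=True), start=1))
    return ranked[:cutoff], ranked[cutoff:]


# `top` would go to logic 1, `rest` to logic 2.
top, rest = rank_and_split([7, 3, 9, 1, 5, 8, 2, 6, 4, 10], top_percent=20)
```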

Re: how to find top N values using map-reduce ?

2013-02-02 Thread Lake Chang
There's one thing I want to clarify: you can use multiple reducers to sort the data globally and then cat all the parts to get the top n records. The data across all the parts is globally ordered, which makes the problem much easier. On 2013-2-2 at 3:18 PM, "praveenesh kumar" wrote: > Actually what I a
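
A small Python sketch of the "cat all the parts" step, assuming each part is sorted ascending and the parts themselves are in global order (every key in part i is no larger than any key in part i+1). The function name `top_n_from_parts` and the sample data are hypothetical. Only as many records as needed are read, starting from the end of the last part.

```python
from itertools import chain, islice


def top_n_from_parts(parts, n):
    """Take the n largest records from ascending, globally ordered partitions."""
    # Walk the partitions from last to first, and each partition from its end,
    # which yields the records in descending order without re-sorting anything.
    descending = chain.from_iterable(reversed(part) for part in reversed(parts))
    return list(islice(descending, n))


parts = [[5, 42, 99], [100, 150], [1000, 2500]]
top3 = top_n_from_parts(parts, 3)
```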