Note that a single reducer isn't always the solution. If you know your
key space boundaries, consider using a total-order partitioner to scale
the job and make use of the nodes on the cluster.
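
To illustrate the idea, here is a minimal sketch in plain Python (not Hadoop API code) of range partitioning by known key boundaries. The split points in `boundaries` and the helper names are hypothetical. Each partition stands in for one reducer that sorts only its own key range, so the partitions read in order form one globally sorted sequence.

```python
from bisect import bisect_right

def range_partition(keys, boundaries):
    """Route each key to a partition based on sorted split points."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for k in keys:
        # bisect_right returns the index of the key range that k falls into
        parts[bisect_right(boundaries, k)].append(k)
    return parts

def sort_partitions(keys, boundaries):
    """Simulate one reducer per partition, each sorting independently."""
    return [sorted(p) for p in range_partition(keys, boundaries)]

keys = [42, 7, 93, 15, 58, 71, 3, 88]
boundaries = [30, 60]  # hypothetical split points, assumed known up front
parts = sort_partitions(keys, boundaries)
flat = [k for p in parts for k in p]  # concatenating the parts in order
```

How evenly the work spreads depends on how well the boundaries match the real key distribution; badly chosen split points leave some partitions nearly empty.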
On Sat, Feb 2, 2013 at 10:35 AM, praveenesh kumar wrote:
> I am looking for a better solution for this.
>
>
My suggestion is to use secondary sort with a single reducer. That way you
can easily extract the top N. If you want the top N%, you'll need an
additional phase to determine how many records that N% actually is.
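
As a rough sketch of the two phases (plain Python, not a MapReduce job): phase 1 counts the records, and phase 2 walks the sorted records and cuts at the computed position. Rounding the cutoff up with `math.ceil` is an assumption here, and the function names are made up for illustration.

```python
import math

def count_records(records):
    # Phase 1: a counting pass over the data
    return sum(1 for _ in records)

def split_top_percent(records, percent):
    """Return (top, rest), where top holds the highest `percent` of values."""
    total = count_records(records)
    n = math.ceil(total * percent / 100)  # assumption: round the cutoff up
    # Phase 2: sort descending (the single sorted reducer) and cut at n
    ranked = sorted(records, reverse=True)
    return ranked[:n], ranked[n:]

values = [5, 1, 9, 3, 7, 2, 8, 6, 4, 10]
top, rest = split_top_percent(values, 20)
```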
--
Kind regards,
Niels Basjes
(Sent from mobile)
On 2 Feb 2013
My actual problem is to rank all the values, then apply logic 1 to the
top n% of values and logic 2 to the rest.
1st - Ranking? (I need major suggestions here.)
2nd - Find the top n% of them.
The rest is then covered.
Regards
Praveenesh
On Sat, Feb 2, 2013 at 1:42 PM, Lake Chang wrote:
One thing I want to clarify: you can use multiple reducers to sort the
data globally and then cat all the parts to get the top n records. The
data across all the parts is globally ordered.
You may then find the problem is much easier.
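
A small sketch of the "cat the parts" step, in plain Python with lists standing in for the output part files. It assumes the parts are listed in partition order and that the job sorted in descending order, so the top records come first; `top_n_from_parts` is a hypothetical helper.

```python
from itertools import chain, islice

def top_n_from_parts(parts, n):
    """Read the parts back to back and stop after n records."""
    # chain.from_iterable behaves like `cat part-0 part-1 ...`,
    # islice behaves like `head -n`
    return list(islice(chain.from_iterable(parts), n))

# Each inner list stands in for one reducer's output, already globally ordered
parts_desc = [[93, 88, 71], [58, 42], [15, 7, 3]]
top4 = top_n_from_parts(parts_desc, 4)
```

Because the parts are already in global order, only the first few need to be read; the remaining parts are never touched.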
On 2013-2-2 at 3:18 PM, "praveenesh kumar" wrote:
> Actually what I a