[ https://issues.apache.org/jira/browse/PIG-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-871:
------------------------------

    Assignee: Thejas M Nair

> Improve distribution of keys in reduce phase
> --------------------------------------------
>
>                 Key: PIG-871
>                 URL: https://issues.apache.org/jira/browse/PIG-871
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Ankur
>            Assignee: Thejas M Nair
>
> The default hashing scheme used to distribute keys in the reduce phase 
> sometimes produces an uneven distribution, leaving 5-10% of reducers 
> overloaded with data. This bottleneck makes Pig jobs very slow and gives 
> users a bad impression.
> While there is no bulletproof solution to the problem in general, the 
> hashing can certainly be improved for better distribution. The proposal here 
> is to evaluate and incorporate other hashing schemes that offer a strong 
> avalanche effect and a more even distribution. We can start by evaluating 
> MurmurHash, which is Apache 2.0 licensed and freely available here: 
> http://www.getopt.org/murmur/MurmurHash.java
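
To make the skew described above concrete, here is a minimal Java sketch that counts how many keys each reducer would receive under two hash functions. It rests on several assumptions not stated in the issue: that keys are assigned with the Hadoop-style formula (hash & Integer.MAX_VALUE) % numReducers, that keys are plain integers, and that a hand-written MurmurHash2 variant stands in for the linked MurmurHash.java (it is not that file). The class name, seed, and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ReducerSkewSketch {
    // Arbitrary seed chosen for this sketch only.
    static final int SEED = 0x9747b28c;

    // 32-bit MurmurHash2 over a byte array (illustrative re-implementation).
    static int murmur2(byte[] data, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int len = data.length;
        int h = seed ^ len;
        int i = 0;
        // Mix the input four bytes at a time, read as little-endian ints.
        while (len >= 4) {
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
            len -= 4;
        }
        // Fold in the remaining 1 to 3 bytes (intentional fall-through).
        switch (len) {
            case 3: h ^= (data[i + 2] & 0xff) << 16;
            case 2: h ^= (data[i + 1] & 0xff) << 8;
            case 1: h ^= (data[i] & 0xff);
                    h *= m;
        }
        // Final avalanche steps.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Assumed partitioning rule: clear the sign bit, then take the remainder.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // Count how many keys land on each reducer under the chosen hash.
    static int[] loads(int[] keys, int numReducers, boolean useMurmur) {
        int[] counts = new int[numReducers];
        for (int key : keys) {
            int hash = useMurmur
                ? murmur2(ByteBuffer.allocate(4).putInt(key).array(), SEED)
                : Integer.hashCode(key);
            counts[partition(hash, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Keys that are all multiples of the reducer count: a worst case
        // for the identity-like Integer.hashCode.
        int[] keys = new int[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i * 10;
        }
        System.out.println("default: " + Arrays.toString(loads(keys, 10, false)));
        System.out.println("murmur2: " + Arrays.toString(loads(keys, 10, true)));
    }
}
```

With these keys, the default integer hash sends every key to reducer 0, because each key is a multiple of the reducer count. The MurmurHash2 variant mixes the bits before the remainder is taken, so it would be expected to spread the same keys across reducers; how much this helps on real Pig workloads is exactly what the issue proposes to evaluate.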

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.