Tom White commented on PIG-871:

MurmurHash (and other hashing schemes) can be found in the 
org.apache.hadoop.util.hash package of Hadoop Common.
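
For anyone who wants a quick feel for the difference before wiring in the Hadoop
classes, here is a minimal, self-contained sketch that compares reducer load under
String.hashCode() and under a MurmurHash2-style function. Everything in it is an
assumption for illustration: the class name PartitionSkew, the helper names, the
synthetic keys, and the modulo partitioning rule (hash & Integer.MAX_VALUE) %
numReducers. The inline murmur2 function is written so the example runs without
Hadoop on the classpath; it is not guaranteed to be bit-identical to the
implementation in org.apache.hadoop.util.hash.

```java
import java.nio.charset.StandardCharsets;

public class PartitionSkew {

    // MurmurHash2-style 32-bit hash (illustrative stand-in, not Hadoop's class).
    public static int murmur2(byte[] data, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ data.length;
        int i = 0;
        // Mix four bytes at a time, little-endian.
        for (; i + 4 <= data.length; i += 4) {
            int k = (data[i] & 0xff)
                  | (data[i + 1] & 0xff) << 8
                  | (data[i + 2] & 0xff) << 16
                  | (data[i + 3] & 0xff) << 24;
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Fold in the remaining 1 to 3 bytes, if any.
        int left = data.length - i;
        if (left > 0) {
            if (left == 3) h ^= (data[i + 2] & 0xff) << 16;
            if (left >= 2) h ^= (data[i + 1] & 0xff) << 8;
            h ^= (data[i] & 0xff);
            h *= m;
        }
        // Final avalanche.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Map a hash to a reducer index with a non-negative modulo (assumed rule).
    public static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // Count how many keys land on each reducer.
    public static int[] bucketCounts(String[] keys, int numReducers, boolean useMurmur) {
        int[] counts = new int[numReducers];
        for (String key : keys) {
            int h = useMurmur
                ? murmur2(key.getBytes(StandardCharsets.UTF_8), 0)
                : key.hashCode();
            counts[partition(h, numReducers)]++;
        }
        return counts;
    }

    // Busiest reducer's load divided by the mean load; 1.0 means perfectly even.
    public static double skew(int[] counts) {
        int max = 0;
        long total = 0;
        for (int c : counts) {
            max = Math.max(max, c);
            total += c;
        }
        return total == 0 ? 0.0 : max / ((double) total / counts.length);
    }

    public static void main(String[] args) {
        // Hypothetical keys; real job keys should be sampled instead.
        String[] keys = new String[100_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "user_" + (i * 16);
        }
        int reducers = 16;
        System.out.printf("hashCode skew: %.2f%n", skew(bucketCounts(keys, reducers, false)));
        System.out.printf("murmur2  skew: %.2f%n", skew(bucketCounts(keys, reducers, true)));
    }
}
```

The skew figure is only meaningful for the keys fed in, so a sample of real keys
from a slow job would be a better input than the synthetic ones used here.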

> Improve distribution of keys in reduce phase
> --------------------------------------------
>                 Key: PIG-871
>                 URL: https://issues.apache.org/jira/browse/PIG-871
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Ankur
> The default hashing scheme used to distribute keys in the reduce phase sometimes 
> results in an uneven distribution of keys, leaving 5-10% of reducers 
> overloaded with data. This bottleneck makes Pig jobs really slow 
> and gives users a bad impression.
> While there is no bulletproof solution to the problem in general, the 
> hashing can certainly be improved for better distribution. The proposal here 
> is to evaluate and incorporate other hashing schemes that give high avalanche 
> and more even distribution. We can start by evaluating MurmurHash, which is 
> Apache 2.0 licensed and freely available here: 
> http://www.getopt.org/murmur/MurmurHash.java

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
