[ https://issues.apache.org/jira/browse/PIG-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates reassigned PIG-871:
------------------------------
Assignee: Thejas M Nair
> Improve distribution of keys in reduce phase
> --------------------------------------------
>
> Key: PIG-871
> URL: https://issues.apache.org/jira/browse/PIG-871
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.3.0
> Reporter: Ankur
> Assignee: Thejas M Nair
>
> The default hashing scheme used to distribute keys in the reduce phase
> sometimes distributes them unevenly, leaving 5-10% of reducers overloaded
> with data. This bottleneck makes Pig jobs very slow and gives users a bad
> impression.
> While there is no bulletproof solution to the problem in general, the
> hashing can certainly be improved for better distribution. The proposal here
> is to evaluate and incorporate other hashing schemes that offer strong
> avalanche behavior and a more even distribution. We can start by evaluating
> MurmurHash, which is Apache 2.0 licensed and freely available here:
> http://www.getopt.org/murmur/MurmurHash.java
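
To make the skew described above concrete, here is a rough sketch, not Pig's actual partitioning code. It assumes the reducer is chosen as (hashCode & Integer.MAX_VALUE) % numReducers, which is what Hadoop's HashPartitioner does, and it swaps in the MurmurHash3 32-bit finalizer as a cheap mixing step rather than the MurmurHash.java linked in the issue. The class and method names (KeyDistributionSketch, loads, maxLoad) are made up for illustration, and the integer keys are deliberately chosen as multiples of the reducer count to show a worst case.

```java
import java.util.Arrays;

class KeyDistributionSketch {

    // Assumed default: same formula as Hadoop's HashPartitioner.
    static int plainPartition(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // MurmurHash3 32-bit finalizer, used here only as a stand-in mixing step
    // (it is not the full MurmurHash over the serialized key bytes).
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Hypothetical alternative: mix the hash code before taking the modulus.
    static int mixedPartition(Object key, int numReducers) {
        return (mix(key.hashCode()) & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many keys land on each reducer. The keys are multiples of
    // numReducers, a pattern that defeats the plain modulus scheme.
    static int[] loads(int numKeys, int numReducers, boolean mixed) {
        int[] counts = new int[numReducers];
        for (int i = 0; i < numKeys; i++) {
            Integer key = i * numReducers;
            int p = mixed ? mixedPartition(key, numReducers)
                          : plainPartition(key, numReducers);
            counts[p]++;
        }
        return counts;
    }

    static int maxLoad(int[] counts) {
        return Arrays.stream(counts).max().getAsInt();
    }

    public static void main(String[] args) {
        int numKeys = 10000;
        int numReducers = 16;
        int[] plain = loads(numKeys, numReducers, false);
        int[] mixed = loads(numKeys, numReducers, true);
        System.out.println("plain: " + Arrays.toString(plain)
                + " max=" + maxLoad(plain));
        System.out.println("mixed: " + Arrays.toString(mixed)
                + " max=" + maxLoad(mixed));
    }
}
```

With the plain scheme every key in this contrived input goes to reducer 0, because Integer.hashCode() is the value itself and each value is a multiple of 16. Real key sets are rarely this regular, so the sketch only shows the mechanism; any candidate hash would still need to be evaluated on actual Pig workloads.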