[ https://issues.apache.org/jira/browse/PIG-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates reassigned PIG-871:
------------------------------

    Assignee: Thejas M Nair

> Improve distribution of keys in reduce phase
> --------------------------------------------
>
>                 Key: PIG-871
>                 URL: https://issues.apache.org/jira/browse/PIG-871
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.3.0
>            Reporter: Ankur
>            Assignee: Thejas M Nair
>
> The default hashing scheme used to distribute keys in reduce phase sometimes
> results in an uneven distribution of keys resulting in 5 - 10 % of reducers
> being overloaded with data. This bottleneck makes the PIG jobs really slow
> and gives users a bad impression.
> While there is no bullet proof solution to the problem in general, the
> hashing can certainly be improved for better distribution. The proposal here
> is to evaluate and incorporate other hashing schemes that give high avalanche
> and more even distribution. We can start by evaluating MurmurHash which is
> Apache 2.0 licensed and freely available here -
> http://www.getopt.org/murmur/MurmurHash.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
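
To make the skew described in the issue concrete, here is a minimal sketch of the kind of comparison the proposal asks for. It is not Pig's partitioner and not the code at the URL above: the class name PartitionSketch, the helpers strideKeys and maxLoad, the seed of 0, and the choice of integer keys that are all multiples of the reducer count are assumptions made for illustration. The mixing function is a MurmurHash2-style 32-bit hash written out here so the example is self-contained. The partition step (mask the sign bit, then take the remainder) mirrors the common Hadoop-style hash partitioning.

```java
import java.nio.charset.StandardCharsets;

public class PartitionSketch {

    // MurmurHash2-style 32-bit hash (illustrative; not the linked MurmurHash.java).
    public static int murmur2(byte[] data, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int len = data.length;
        int h = seed ^ len;
        int i = 0;
        // Mix four bytes at a time, read as a little-endian int.
        while (len >= 4) {
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
            len -= 4;
        }
        // Fold in the remaining 1 to 3 bytes; the fall-through is intentional.
        switch (len) {
            case 3: h ^= (data[i + 2] & 0xff) << 16;
            case 2: h ^= (data[i + 1] & 0xff) << 8;
            case 1: h ^= (data[i] & 0xff);
                    h *= m;
        }
        // Final avalanche.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Map a hash to a reducer index: clear the sign bit, then take the remainder.
    public static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical key set: count integers spaced stride apart (0, stride, 2*stride, ...).
    public static int[] strideKeys(int count, int stride) {
        int[] keys = new int[count];
        for (int i = 0; i < count; i++) {
            keys[i] = i * stride;
        }
        return keys;
    }

    // Number of keys landing on the most heavily loaded reducer.
    public static int maxLoad(int[] keys, int numReducers, boolean useMurmur) {
        int[] load = new int[numReducers];
        for (int key : keys) {
            int hash = useMurmur
                    ? murmur2(Integer.toString(key).getBytes(StandardCharsets.UTF_8), 0)
                    : Integer.hashCode(key); // for an int, hashCode is the value itself
            load[partition(hash, numReducers)]++;
        }
        int max = 0;
        for (int n : load) {
            max = Math.max(max, n);
        }
        return max;
    }

    public static void main(String[] args) {
        int reducers = 10;
        int[] keys = strideKeys(1000, reducers);
        System.out.println("default hashCode, max reducer load: "
                + maxLoad(keys, reducers, false));
        System.out.println("murmur-style hash, max reducer load: "
                + maxLoad(keys, reducers, true));
    }
}
```

With the default hashCode, every key in this contrived set is a multiple of 10, so all 1000 keys fall on reducer 0. The murmur-style hash breaks that alignment because its output does not preserve the arithmetic pattern in the keys. Real workloads are rarely this extreme, so an actual evaluation would need to measure the distribution on representative key sets.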