Improve distribution of keys in reduce phase
--------------------------------------------
Key: PIG-871
URL: https://issues.apache.org/jira/browse/PIG-871
Project: Pig
Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Ankur
The default hashing scheme used to distribute keys in the reduce phase sometimes
spreads keys unevenly, leaving 5-10% of reducers overloaded with data. Because a
job finishes only when its slowest reducer does, this bottleneck makes Pig jobs
really slow and gives users a bad impression.

While there is no bulletproof solution to the problem in general, the hashing
can certainly be improved for better distribution. The proposal here is to
evaluate and incorporate other hashing schemes that have a strong avalanche
effect (a small change in the key flips many bits of the hash) and give a more
even distribution. We can start by evaluating MurmurHash, which is Apache 2.0
licensed and freely available here:
http://www.getopt.org/murmur/MurmurHash.java
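
To make the evaluation concrete, here is a rough sketch of how the two schemes
could be compared offline. It assumes keys are assigned to reducers the way
Hadoop's HashPartitioner does it, (hash & Integer.MAX_VALUE) % numReducers, and
it uses a 32-bit MurmurHash2 written from the published algorithm rather than
the file linked above. The class name, method names, seed and sample keys are
all made up for illustration; none of them are Pig or Hadoop APIs.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: names here are illustrative, not Pig or Hadoop APIs.
class ReducerSkewSketch {

    // 32-bit MurmurHash2, written from the published algorithm description.
    // Not checked against reference test vectors; treat as an approximation.
    static int murmur2(byte[] data, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int len = data.length;
        int h = seed ^ len;
        int i = 0;
        // Mix four bytes at a time (little-endian) into the hash.
        while (len - i >= 4) {
            int k = (data[i] & 0xff)
                  | ((data[i + 1] & 0xff) << 8)
                  | ((data[i + 2] & 0xff) << 16)
                  | ((data[i + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
            i += 4;
        }
        // Mix in the remaining 0-3 bytes.
        switch (len - i) {
            case 3: h ^= (data[i + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[i + 1] & 0xff) << 8;  // fall through
            case 1: h ^= (data[i] & 0xff);
                    h *= m;
        }
        // Final avalanche steps.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Map a key to a reducer index in [0, numReducers).
    // Assumption: same masking-and-modulo rule as Hadoop's HashPartitioner.
    static int partition(String key, int numReducers, boolean useMurmur) {
        int hash = useMurmur
            ? murmur2(key.getBytes(StandardCharsets.UTF_8), 0)
            : key.hashCode();
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    // Count how many keys land on each reducer.
    static int[] histogram(String[] keys, int numReducers, boolean useMurmur) {
        int[] counts = new int[numReducers];
        for (String key : keys) {
            counts[partition(key, numReducers, useMurmur)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample keys; a real evaluation should use keys from actual jobs.
        String[] keys = new String[10000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = "user_" + (i * 16);
        }
        for (boolean useMurmur : new boolean[] {false, true}) {
            int[] counts = histogram(keys, 16, useMurmur);
            int max = 0;
            for (int c : counts) {
                max = Math.max(max, c);
            }
            System.out.println((useMurmur ? "murmur2" : "hashCode")
                + " max keys on one reducer: " + max);
        }
    }
}
```

Comparing the maximum per-reducer count against the mean (keys / reducers) for
key sets taken from real jobs would show whether the alternative hash actually
reduces the skew described above. Note that no hash function helps when a
single key carries most of the data.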