PERFORMANCE: The sample MR job in order by implementation can use Hadoop sorting instead of doing a POSort
----------------------------------------------------------------------------------------------------------
Key: PIG-841
URL: https://issues.apache.org/jira/browse/PIG-841
Project: Pig
Issue Type: Improvement
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
Fix For: 0.3.0

Currently the sample map reduce job in the order by implementation does the following:
- sample 100 records from each map
- group all on the above output
- sort the output bag from the above grouping on the keys of the order by
- give the sorted bag to the FindQuantiles udf

Steps 2 and 3 above can be replaced by:
- group the sample output by the order by key and set the parallelism of the group to 1, so that the output of the group goes to one reducer. Since Hadoop ensures the output of the group is sorted by key, we get sorting for free without using POSort.
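
The difference between the two plans can be sketched with a small simulation. This is only an illustration in plain Python, not Pig or Hadoop code: `sample_per_map`, `find_quantiles`, `current_plan` and `proposed_plan` are hypothetical names, `find_quantiles` is a simplified stand-in rather than the actual FindQuantiles udf, and the call to `sorted` inside `proposed_plan` stands in for the sort that Hadoop's shuffle performs before a single reducer sees the keys.

```python
import random
from itertools import groupby


def sample_per_map(splits, n=100, seed=0):
    """Take up to n sample keys from each map's input split (hypothetical helper)."""
    rng = random.Random(seed)
    samples = []
    for split in splits:
        samples.extend(rng.sample(split, min(n, len(split))))
    return samples


def find_quantiles(sorted_keys, num_partitions):
    """Simplified stand-in for a quantile finder: pick num_partitions - 1
    evenly spaced boundary keys from an already sorted list."""
    if not sorted_keys:
        return []
    step = len(sorted_keys) / num_partitions
    return [sorted_keys[int(i * step)] for i in range(1, num_partitions)]


def current_plan(samples, num_partitions):
    """Group all, then sort the bag explicitly (the role played by POSort)."""
    bag = list(samples)   # "group all": every sample lands in one unsorted bag
    bag.sort()            # explicit sort inside the reducer
    return find_quantiles(bag, num_partitions)


def proposed_plan(samples, num_partitions):
    """Group by the order-by key with parallelism 1: the single reducer
    receives the groups already in key order, so no explicit sort is needed."""
    # sorted() here models the shuffle sort done by the framework,
    # not work done by the reducer itself.
    shuffled = groupby(sorted(samples))
    keys_in_order = []
    for _key, values in shuffled:
        keys_in_order.extend(values)  # one group per distinct key, in key order
    return find_quantiles(keys_in_order, num_partitions)


if __name__ == "__main__":
    splits = [[5, 3, 9, 1], [8, 2, 7], [6, 4, 0]]
    samples = sample_per_map(splits, n=2, seed=1)
    # Both plans should yield the same partition boundaries.
    print(current_plan(samples, 3) == proposed_plan(samples, 3))
```

Under these assumptions both plans hand the same sorted sequence to the quantile step; the proposed plan just relies on the framework's sort rather than doing a second one in the reducer.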