PERFORMANCE: The sample MR job in order by implementation can use Hadoop 
sorting instead of doing a POSort

                 Key: PIG-841
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.2.1
            Reporter: Pradeep Kamath
             Fix For: 0.3.0

Currently, the sample map reduce job in the order by implementation does the 
following:
 - sample 100 records from each map
 - do a GROUP ALL on the sampled output
 - sort the bag produced by that grouping on the order by keys
 - pass the sorted bag to the FindQuantiles UDF
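As a rough illustration of where the extra sort happens today, here is a minimal in-memory sketch of the four steps. It is not Pig code: the class `CurrentSampleJob`, its methods, and the simple cut-point rule in `findQuantiles` are hypothetical stand-ins for the real sampler and the FindQuantiles UDF, and the sampler here just takes the first records of each split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of the CURRENT sampling job (not Pig's implementation).
class CurrentSampleJob {
    static final int SAMPLES_PER_MAP = 100;

    // Step 1: each "map" emits at most SAMPLES_PER_MAP records.
    // Assumption: takes the first N records; a real sampler spreads its picks.
    static List<Integer> sampleMap(List<Integer> split) {
        return new ArrayList<>(split.subList(0, Math.min(SAMPLES_PER_MAP, split.size())));
    }

    static List<Integer> quantiles(List<List<Integer>> splits, int numQuantiles) {
        // Step 2: GROUP ALL, so every sample ends up in one bag on one reducer.
        List<Integer> bag = new ArrayList<>();
        for (List<Integer> split : splits) {
            bag.addAll(sampleMap(split));
        }
        // Step 3: explicit sort of the whole bag inside the reducer (the POSort).
        Collections.sort(bag);
        // Step 4: hand the sorted bag to a FindQuantiles-like routine.
        return findQuantiles(bag, numQuantiles);
    }

    // Hypothetical cut-point rule: pick numQuantiles - 1 evenly spaced keys.
    static List<Integer> findQuantiles(List<Integer> sorted, int numQuantiles) {
        List<Integer> cuts = new ArrayList<>();
        if (sorted.isEmpty()) {
            return cuts;
        }
        for (int i = 1; i < numQuantiles; i++) {
            cuts.add(sorted.get((int) ((long) i * sorted.size() / numQuantiles)));
        }
        return cuts;
    }
}
```

For example, `CurrentSampleJob.quantiles(List.of(List.of(5, 3, 9, 1), List.of(8, 2, 7, 4, 6, 0)), 5)` returns `[2, 4, 6, 8]`.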

Steps 2 and 3 above can be replaced by:
- group the sample output by the order by key and set the parallelism of the 
group to 1 so that all of its output goes to a single reducer. Since Hadoop 
sorts the map output by key before it reaches the reducer, the groups arrive 
in key order and we get the sort for free without using POSort.
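A minimal sketch of the proposed flow, under the same hypothetical setup as an in-memory simulation rather than Pig or Hadoop code: a `TreeMap` stands in for Hadoop's sort-by-key shuffle into a single reducer, and each group is reduced to a count of sampled records with that key. The names `ProposedSampleJob`, `shuffleToOneReducer`, and `reduce` are illustrative only. The point is that the reducer walks groups in key order and never calls a sort itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the PROPOSED sampling job (not Pig's implementation).
class ProposedSampleJob {
    static final int SAMPLES_PER_MAP = 100;

    // Models "GROUP BY key PARALLEL 1": all samples go to one reducer, and the
    // TreeMap plays the role of Hadoop's sort by key during the shuffle.
    // Assumption: each group is kept as a count of records with that key.
    static TreeMap<Integer, Integer> shuffleToOneReducer(List<List<Integer>> splits) {
        TreeMap<Integer, Integer> grouped = new TreeMap<>();
        for (List<Integer> split : splits) {
            int n = Math.min(SAMPLES_PER_MAP, split.size());
            for (int key : split.subList(0, n)) {
                grouped.merge(key, 1, Integer::sum);
            }
        }
        return grouped;
    }

    // The single reducer consumes groups already in key order: no explicit sort.
    static List<Integer> reduce(TreeMap<Integer, Integer> grouped, int numQuantiles) {
        List<Integer> inKeyOrder = new ArrayList<>();
        for (Map.Entry<Integer, Integer> group : grouped.entrySet()) {
            for (int i = 0; i < group.getValue(); i++) {
                inKeyOrder.add(group.getKey());
            }
        }
        List<Integer> cuts = new ArrayList<>();
        if (inKeyOrder.isEmpty()) {
            return cuts;
        }
        // Same hypothetical cut-point rule: numQuantiles - 1 evenly spaced keys.
        for (int i = 1; i < numQuantiles; i++) {
            cuts.add(inKeyOrder.get((int) ((long) i * inKeyOrder.size() / numQuantiles)));
        }
        return cuts;
    }
}
```

With the samples `[5, 3, 3, 1]` and `[8, 2, 7]`, the reducer sees the keys 1, 2, 3 (twice), 5, 7, 8 in order, and `reduce(..., 4)` returns `[2, 3, 7]`.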
