[jira] Created: (PIG-841) PERFORMANCE: The sample MR job in order by implementation can use Hadoop sorting instead of doing a POSort

Pradeep Kamath (JIRA) Tue, 09 Jun 2009 12:16:29 -0700

PERFORMANCE: The sample MR job in order by implementation can use Hadoop 
sorting instead of doing a POSort
----------------------------------------------------------------------------------------------------------


                 Key: PIG-841
                 URL: https://issues.apache.org/jira/browse/PIG-841
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.2.1
            Reporter: Pradeep Kamath
             Fix For: 0.3.0


Currently, the sample MapReduce job in the order by implementation does the 
following:
 - sample 100 records from each map
 - GROUP ALL on the above output
 - sort the bag produced by the above grouping on the order by keys
 - pass the sorted bag to the FindQuantiles UDF
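
To make the current plan concrete, here is a minimal Python sketch that simulates 
the four steps in a single process rather than on Hadoop. Uniform random sampling 
via `random.sample`, the helper names, and the evenly spaced cut-point rule in 
`find_quantiles` are assumptions for illustration; this is not Pig's actual 
FindQuantiles UDF.

```python
import random
from typing import Any, Callable, List, Sequence

SAMPLES_PER_MAP = 100  # per the issue: 100 records are sampled from each map


def sample_map(records: Sequence[Any], k: int = SAMPLES_PER_MAP, seed: int = 0) -> List[Any]:
    """Simulate one map task: draw up to k records uniformly at random (assumed sampler)."""
    rng = random.Random(seed)
    return rng.sample(list(records), min(k, len(records)))


def find_quantiles(sorted_keys: Sequence[Any], num_partitions: int) -> List[Any]:
    """Hypothetical stand-in for FindQuantiles: pick num_partitions - 1 evenly
    spaced cut points from keys that are already sorted."""
    n = len(sorted_keys)
    return [sorted_keys[(i * n) // num_partitions] for i in range(1, num_partitions)]


def current_sample_job(splits: Sequence[Sequence[Any]],
                       key: Callable[[Any], Any],
                       num_partitions: int) -> List[Any]:
    # Step 1: each map task emits its own sample.
    map_outputs = [sample_map(split, seed=i) for i, split in enumerate(splits)]
    # Step 2: GROUP ALL collapses every sampled record into a single bag.
    bag = [rec for out in map_outputs for rec in out]
    # Step 3: explicit sort of the bag on the order by key (the POSort step).
    bag.sort(key=key)
    # Step 4: hand the sorted keys to the quantile finder.
    return find_quantiles([key(r) for r in bag], num_partitions)
```

For example, `current_sample_job(splits, key=lambda r: r, num_partitions=4)` returns 
three cut points that a later job could use to range-partition the full input.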


Steps 2 and 3 above can be replaced by:
- group the sample output by the order by key and set the parallelism of the 
group to 1, so that all of the group's output goes to a single reducer. Since 
Hadoop's shuffle delivers keys to each reducer in sorted order, the grouped 
output is already sorted by key, so we get the sort for free without using 
POSort.
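
A minimal sketch of the proposed plan, again simulated in plain Python under 
stated assumptions: `shuffle_to_single_reducer` is a hypothetical model of 
Hadoop's sort-and-group phase with one reducer, not a Hadoop API, and the other 
names are illustrative. The point is that the user-level code never sorts; the 
ordering comes entirely from the framework step.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Iterable, Iterator, List, Tuple


def shuffle_to_single_reducer(pairs: Iterable[Tuple[Any, Any]]) -> Iterator[Tuple[Any, List[Any]]]:
    """Model of the framework, not user code: with parallelism 1 every
    (key, value) pair lands on the one reducer, and the sort phase orders the
    pairs by key before grouping them."""
    ordered = sorted(pairs, key=itemgetter(0))
    for k, group in groupby(ordered, key=itemgetter(0)):
        yield k, [v for _, v in group]


def proposed_sample_job(map_outputs: List[List[Any]], key: Callable[[Any], Any]) -> List[Any]:
    # Map side: emit the order by key as the grouping key.
    pairs = [(key(rec), rec) for out in map_outputs for rec in out]
    # Reduce side: groups arrive in ascending key order, so concatenating them
    # yields a sorted sample with no explicit sort in the user-level plan.
    sorted_samples: List[Any] = []
    for _, recs in shuffle_to_single_reducer(pairs):
        sorted_samples.extend(recs)
    return sorted_samples
```

The parallelism of 1 matters: with several reducers, each one would see only its 
own partition in key order, so the concatenated sample would not be globally 
sorted.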
