[ 
https://issues.apache.org/jira/browse/PIG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath resolved PIG-744.
--------------------------------

    Resolution: Duplicate

Duplicate of PIG-802

> PERFORMANCE: Bag creation can be more efficiently handled in order by
> ---------------------------------------------------------------------
>
>                 Key: PIG-744
>                 URL: https://issues.apache.org/jira/browse/PIG-744
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Pradeep Kamath
>             Fix For: 0.3.0
>
>
> Currently, order by results in multiple MapReduce jobs (2 or 3, depending 
> on the script), of which the last one does the actual ordering. In this 
> last job, we use POPackage in the reduce phase to create a bag of values 
> for each sort key (each value being the entire tuple that is being sorted). 
> The foreach following the package then immediately flattens the bag, so 
> there is really no need for the bag. To stay generic and keep using the 
> existing operators, we can instead tag the POPackage to create bags that 
> are backed by the Hadoop iterator itself. This way we do not create a bag 
> by copying each tuple from the Hadoop iterator, which should help both 
> performance and scalability by making better use of memory.
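
The difference between a copied bag and an iterator-backed bag can be
sketched roughly as follows. This is an illustration only, not Pig's actual
code: the class and method names (IteratorBackedBagSketch, copyingBag,
iteratorBackedBag, flatten) are hypothetical, and a plain java.util.Iterator
stands in for the Hadoop reduce-side iterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from Pig or Hadoop.
class IteratorBackedBagSketch {

    // Copying approach: every value is pulled out of the reduce-side
    // iterator and held in memory before anything downstream sees it.
    static <T> Iterable<T> copyingBag(Iterator<T> values) {
        List<T> copy = new ArrayList<>();
        while (values.hasNext()) {
            copy.add(values.next());
        }
        return copy;
    }

    // Iterator-backed approach: the "bag" is a thin wrapper that hands out
    // the underlying iterator. Nothing is copied, but the bag can be
    // traversed only once.
    static <T> Iterable<T> iteratorBackedBag(final Iterator<T> values) {
        return new Iterable<T>() {
            private boolean consumed = false;

            @Override
            public Iterator<T> iterator() {
                if (consumed) {
                    throw new IllegalStateException("bag can be read only once");
                }
                consumed = true;
                return values;
            }
        };
    }

    // Stand-in for the foreach that flattens the bag: each tuple is emitted
    // as soon as it is read, so a single pass is all that is needed.
    static <T> void flatten(Iterable<T> bag, Consumer<T> emit) {
        for (T tuple : bag) {
            emit.accept(tuple);
        }
    }
}
```

Because the flatten step reads the bag exactly once and in order, the
single-pass restriction of the iterator-backed variant costs nothing here. A
real implementation would also have to consider how the Hadoop iterator hands
out value objects, which this sketch ignores.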

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
