[ https://issues.apache.org/jira/browse/PIG-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath resolved PIG-744.
--------------------------------

    Resolution: Duplicate

Duplicate of PIG-802

> PERFORMANCE: Bag creation can be more efficiently handled in order by
> ---------------------------------------------------------------------
>
>                 Key: PIG-744
>                 URL: https://issues.apache.org/jira/browse/PIG-744
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Pradeep Kamath
>             Fix For: 0.3.0
>
>
> Currently order by results in multiple map reduce jobs (2 or 3 depending on
> the script) of which the last one does the actual ordering. In this last map
> reduce job, we create a bag of values (each value being the entire tuple that
> is getting sorted) for each sort key(s) using POPackage in the reduce phase.
> Then we turn around and flatten the bag in the foreach following the package.
> So there is really no need for the bag. But to be generic and use the
> existing operators, we can be more efficient by tagging the POPackage to
> create bags which are backed by the Hadoop iterator itself. This way we do
> not create a bag by making a copy of each tuple from the hadoop iterator.
> This should help both performance and scalability by making better use of
> memory.
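
The idea in the description above can be illustrated with a minimal, self-contained sketch. It does not use Pig or Hadoop classes: `CopyingBag`, `IteratorBackedBag`, and `ReadOnceBagDemo` are hypothetical names invented for this illustration, and a plain `java.util.Iterator` stands in for the reduce-side value iterator. It contrasts a bag that copies every tuple with a read-once bag that simply hands out the underlying iterator, which is enough when the only consumer is a flatten.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a bag that copies every value out of the source iterator.
final class CopyingBag<T> implements Iterable<T> {
    private final List<T> copies = new ArrayList<>();

    CopyingBag(Iterator<T> source) {
        while (source.hasNext()) {
            copies.add(source.next()); // one retained reference per tuple
        }
    }

    // Number of tuples held in memory by this bag.
    int retained() { return copies.size(); }

    @Override
    public Iterator<T> iterator() { return copies.iterator(); }
}

// Hypothetical read-once bag: exposes the underlying iterator instead of copying.
final class IteratorBackedBag<T> implements Iterable<T> {
    private Iterator<T> source;

    IteratorBackedBag(Iterator<T> source) { this.source = source; }

    @Override
    public Iterator<T> iterator() {
        if (source == null) {
            // The source can only be walked once, so a second pass is rejected.
            throw new IllegalStateException("read-once bag already consumed");
        }
        Iterator<T> it = source;
        source = null;
        return it;
    }
}

final class ReadOnceBagDemo {
    // Mimics the foreach that flattens the bag: emits each tuple once, in order.
    static <T> List<T> flatten(Iterable<T> bag) {
        List<T> out = new ArrayList<>();
        for (T t : bag) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a,1", "a,2", "a,3");
        System.out.println(flatten(new CopyingBag<>(values.iterator())));
        System.out.println(flatten(new IteratorBackedBag<>(values.iterator())));
    }
}
```

Both bags yield the same flattened sequence; the difference is that the iterator-backed one never holds the whole group in memory. The trade-off in this sketch is that such a bag can be traversed only once, which fits the order-by case described above because the bag is flattened immediately after being packaged.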