[ https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703659#action_12703659 ]
David Ciemiewicz commented on PIG-777:
--------------------------------------

This seems like it could be useful, but as a user I don't understand the full issue.

I often want to compute intermediate summaries, store them, and then continue the computation.

{code}
A = load ...
...
store D into ...
E = group D by ...
...
store H into ...
{code}

The problem I encountered in earlier versions of Pig was that, to PREVENT steps A through D from being executed twice, I had to introduce a load step before E:

{code}
A = load ...
...
store D into ...
D = load ...
E = group D by ...
...
store H into ...
{code}

It's great that you will be introducing code that possibly eliminates the D = load step during execution. However, is anything being done so that I don't need to introduce D = load in the first place?

> Code refactoring: Create optimization out of store/load post processing code
> ----------------------------------------------------------------------------
>
>                 Key: PIG-777
>                 URL: https://issues.apache.org/jira/browse/PIG-777
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Gunther Hagleitner
>
> The postProcessing method in the pig server checks whether a logical graph
> contains stores to and loads from the same location. If so, it will either
> connect the store and load, or optimize by throwing out the load and
> connecting the store predecessor with the successor of the load.
> Ideally the introduction of the store and load connection should happen in
> the query compiler, while the optimization should then happen in a separate
> optimizer step as part of the optimizer framework.