[ 
https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703659#action_12703659
 ] 

David Ciemiewicz commented on PIG-777:
--------------------------------------

This seems like it could be useful, but as a user I don't understand the full 
issue.

I often want to compute intermediate summaries, store them, and then continue 
computation.

{code}A = load ...
...
store D into ...
E = group D by ...
...
store H into ...{code}

The problem I encountered in earlier versions of Pig was that, to PREVENT 
steps A through D from executing twice, I had to introduce a load step 
before E:

{code}A = load ...
...
store D into ...
D = load ...
E = group D by ...
...
store H into ...{code}

It's great that you will be introducing code that may eliminate the D = load 
step during execution.

However, is anything being done so that I don't need to introduce D = load in 
the first place?

> Code refactoring: Create optimization out of store/load post processing code
> ----------------------------------------------------------------------------
>
>                 Key: PIG-777
>                 URL: https://issues.apache.org/jira/browse/PIG-777
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Gunther Hagleitner
>
> The postProcessing method in the pig server checks whether a logical graph 
> contains stores to and loads from the same location. If so, it will either 
> connect the store and load, or optimize by throwing out the load and 
> connecting the store predecessor with the successor of the load.
> Ideally the introduction of the store and load connection should happen in 
> the query compiler, while the optimization should then happen in a separate 
> optimizer step as part of the optimizer framework.
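
The two behaviours the issue describes (connecting a store to a load of the 
same location, or dropping the load altogether) can be sketched on a toy 
operator graph. This is only an illustration in Python, not Pig's 
postProcessing code: the Op class, rewire_store_load, and the "input" and 
"summary" locations are all made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare operators by identity, not by field values
class Op:
    """One node of a toy logical plan (hypothetical, not Pig's classes)."""
    name: str
    kind: str                       # "load", "store", or "other"
    location: Optional[str] = None  # path, used by loads and stores only
    inputs: List["Op"] = field(default_factory=list)


def rewire_store_load(plan: List[Op], drop_load: bool = True) -> List[Op]:
    """Link each load to a store in the plan that writes the same location.

    drop_load=False: make the load depend on the store (ordering only).
    drop_load=True:  remove the load and feed its successors directly
                     from the operator whose output is being stored.
    """
    stores = {op.location: op for op in plan if op.kind == "store"}
    dropped = []
    for load in [op for op in plan if op.kind == "load"]:
        store = stores.get(load.location)
        if store is None:
            continue  # an ordinary load of data the script did not write
        if not drop_load:
            load.inputs = [store]
            continue
        source = store.inputs[0]  # predecessor of the store
        for op in plan:
            op.inputs = [source if i is load else i for i in op.inputs]
        dropped.append(load)
    return [op for op in plan if op not in dropped]


# The script shape from the comment: store D, reload D, then continue with E.
a = Op("A", "load", "input")
d = Op("D", "other", inputs=[a])
store_d = Op("store D", "store", "summary", inputs=[d])
d_reloaded = Op("D (reloaded)", "load", "summary")
e = Op("E", "other", inputs=[d_reloaded])

plan = rewire_store_load([a, d, store_d, d_reloaded, e])
print([op.name for op in plan])  # operators that remain in the plan
print(e.inputs[0].name)          # the operator E now reads from
```

With drop_load=True the reload disappears and E reads straight from D, so 
steps A through D appear only once in the plan. With drop_load=False the load 
is kept but ordered after the store.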

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
