David Ciemiewicz commented on PIG-777:

This seems like it could be useful but I don't understand the full issue as a 

I often want to compute intermediate summaries, store them, and then continue processing from them, as in:

{code}A = load ...
store D into ...
E = group D by ...
store H into ...{code}

The problem I encountered in earlier versions of Pig was that, to PREVENT steps
A through D from being executed twice, I had to introduce a load step before E:

{code}A = load ...
store D into ...
D = load ...
E = group D by ...
store H into ...{code}
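The double execution can be illustrated with a toy model. It assumes, as a simplification rather than a description of Pig's real job compiler, that each store runs as its own job and recomputes every operator upstream of it. The operator names mirror the aliases in the script. `load_D` stands for the second `D = load ...`. The helpers `ancestors` and `runs_per_operator` are hypothetical. The model also ignores the ordering constraint that the second job must run only after the first has written D.

```python
from collections import defaultdict


def ancestors(inputs, node):
    """Return `node` plus every operator it transitively reads from."""
    seen, stack = set(), [node]
    while stack:
        op = stack.pop()
        if op not in seen:
            seen.add(op)
            stack.extend(inputs.get(op, []))
    return seen


def runs_per_operator(inputs, stores):
    """Count executions per operator when each store runs as its own job
    that recomputes everything upstream of it (toy assumption)."""
    counts = defaultdict(int)
    for store in stores:
        for op in ancestors(inputs, store):
            counts[op] += 1
    return dict(counts)


# Hypothetical plans: each operator maps to the operators it reads from.
# The steps between E and H are elided in the script, so H reads E directly here.
without_reload = {
    "B": ["A"], "C": ["B"], "D": ["C"], "store_D": ["D"],
    "E": ["D"], "H": ["E"], "store_H": ["H"],
}
with_reload = {
    "B": ["A"], "C": ["B"], "D": ["C"], "store_D": ["D"],
    "load_D": [],  # reads the stored file, so there is no in-plan edge back to D
    "E": ["load_D"], "H": ["E"], "store_H": ["H"],
}

print(runs_per_operator(without_reload, ["store_D", "store_H"]))  # A..D counted twice
print(runs_per_operator(with_reload, ["store_D", "store_H"]))     # A..D counted once
```

In this model, the explicit reload cuts the second job's dependency on A through D. That is why it avoids the recomputation.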

It's great that you will be introducing code that may eliminate the D = load
step from the execution.

However, is anything being done so that I don't need to introduce D = load in 
the first place?

> Code refactoring: Create optimization out of store/load post processing code
> ----------------------------------------------------------------------------
>                 Key: PIG-777
>                 URL: https://issues.apache.org/jira/browse/PIG-777
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Gunther Hagleitner
> The postProcessing method in the Pig server checks whether a logical graph 
> contains stores to and loads from the same location. If so, it either 
> connects the store and the load, or optimizes by throwing out the load and 
> connecting the store's predecessor with the load's successor.
> Ideally, the store/load connection should be introduced in the query 
> compiler, while the optimization should happen in a separate optimizer step 
> as part of the optimizer framework.

