[
https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703659#action_12703659
]
David Ciemiewicz commented on PIG-777:
--------------------------------------
This seems like it could be useful, but as a user I don't understand the full
issue.
I often want to compute intermediate summaries, store them, and then continue
computation.
{code}
A = load ...
...
store D into ...
E = group D by ...
...
store H into ...
{code}
The problem I encountered in earlier versions of Pig was that, to PREVENT
steps A through D from being executed twice, I had to introduce a load step
before E:
{code}
A = load ...
...
store D into ...
D = load ...
E = group D by ...
...
store H into ...
{code}
It's great that you will be introducing code that possibly eliminates D = load
in the execution.
However, is anything being done so that I don't need to introduce D = load in
the first place?
> Code refactoring: Create optimization out of store/load post processing code
> ----------------------------------------------------------------------------
>
> Key: PIG-777
> URL: https://issues.apache.org/jira/browse/PIG-777
> Project: Pig
> Issue Type: Improvement
> Reporter: Gunther Hagleitner
>
> The postProcessing method in the pig server checks whether a logical graph
> contains stores to and loads from the same location. If so, it will either
> connect the store and load, or optimize by throwing out the load and
> connecting the store predecessor with the successor of the load.
> Ideally, the introduction of the store and load connection should happen in
> the query compiler, while the optimization should then happen in a separate
> optimizer step as part of the optimizer framework.
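
To make the described optimization concrete, here is a minimal sketch of the
"throw out the load and connect the store predecessor with the successor of
the load" step on a toy plan. It is written in Python for brevity and is not
Pig's actual code: the names Op and bypass_store_load, the edge-list
representation, and the location strings "in", "tmp" and "out" are all
assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical stand-in for a logical-plan operator."""
    name: str            # alias or label, e.g. "D"
    kind: str            # "load", "store", or "other"
    location: str = ""   # path read or written by load/store operators


def bypass_store_load(ops, edges):
    """Drop loads that read a location stored in the same plan.

    Each edge leaving such a load is re-routed so that it starts at the
    operator feeding the matching store. The store itself is kept.
    """
    by_name = {op.name: op for op in ops}

    # Map each stored location to the operator that feeds the store.
    store_input = {}
    for src, dst in edges:
        if by_name[dst].kind == "store":
            store_input[by_name[dst].location] = src

    # Loads whose location is written by a store in this plan,
    # mapped to the operator that should replace them.
    redundant = {
        op.name: store_input[op.location]
        for op in ops
        if op.kind == "load" and op.location in store_input
    }

    # Re-route edges that leave a redundant load.
    new_edges = [(redundant.get(src, src), dst) for src, dst in edges]
    kept_ops = [op for op in ops if op.name not in redundant]
    return kept_ops, new_edges


# Toy version of the script in the comment: store D, reload it, continue.
ops = [
    Op("A", "load", "in"),
    Op("D", "other"),
    Op("store_D", "store", "tmp"),
    Op("load_D", "load", "tmp"),
    Op("E", "other"),
    Op("store_H", "store", "out"),
]
edges = [("A", "D"), ("D", "store_D"), ("load_D", "E"), ("E", "store_H")]

kept_ops, new_edges = bypass_store_load(ops, edges)
```

In this toy plan the reload of "tmp" is dropped and E reads directly from D,
so the steps up to D appear only once in the plan while the store of D is
still performed.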