Hi David,

This is exactly the problem that the multi-query optimization project is
addressing. Please see the following link for details:

http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification


Thanks,
-Richard

-----Original Message-----
From: David Ciemiewicz (JIRA) [mailto:j...@apache.org] 
Sent: Tuesday, April 28, 2009 7:43 AM
To: pig-dev@hadoop.apache.org
Subject: [jira] Commented: (PIG-777) Code refactoring: Create
optimization out of store/load post processing code


    [ https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703659#action_12703659 ]

David Ciemiewicz commented on PIG-777:
--------------------------------------

This seems like it could be useful but I don't understand the full issue
as a user.

I often want to compute intermediate summaries, store them, and then
continue computation.

{code}A = load ...
...
store D into ...
E = group D by ...
...
store H into ...{code}

The problem I encountered in earlier versions of Pig was that, to prevent
steps A through D from being executed twice, I had to introduce a load
step before E:

{code}A = load ...
...
store D into ...
D = load ...
E = group D by ...
...
store H into ...{code}

It's great that you will be introducing code that may eliminate the
"D = load" step during execution.

However, is anything being done so that I don't need to introduce
"D = load" in the first place?

> Code refactoring: Create optimization out of store/load post processing code
> ----------------------------------------------------------------------------
>
>                 Key: PIG-777
>                 URL: https://issues.apache.org/jira/browse/PIG-777
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Gunther Hagleitner
>
> The postProcessing method in the pig server checks whether a logical
> graph contains stores to and loads from the same location. If so, it
> will either connect the store and load, or optimize by throwing out the
> load and connecting the store predecessor with the successor of the
> load.
> Ideally the introduction of the store and load connection should
> happen in the query compiler, while the optimization should then happen
> in a separate optimizer step as part of the optimizer framework.
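
For readers who want to picture the optimization described in the issue, here is a rough sketch in Python. It is not Pig's code: the Plan class, the operator names, and the "tmp/D" location are all made up for illustration, and it assumes every store runs before any load from the same location. It only shows the second option from the description, where the load is thrown out and the store's predecessor is connected to the load's successors.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Toy operator graph (hypothetical, not Pig's logical plan classes)."""
    ops: dict = field(default_factory=dict)    # name -> (kind, location or None)
    edges: set = field(default_factory=set)    # (producer, consumer) pairs

    def add(self, name, kind, location=None):
        self.ops[name] = (kind, location)

    def connect(self, src, dst):
        self.edges.add((src, dst))

    def predecessors(self, name):
        return sorted(s for s, d in self.edges if d == name)

    def successors(self, name):
        return sorted(d for s, d in self.edges if s == name)


def bypass_store_load(plan):
    """Drop each load that reads a location some store writes to.

    The store itself is kept, so the intermediate result is still written.
    Assumption: the store always precedes the load in script order.
    """
    stores = {loc: n for n, (k, loc) in plan.ops.items() if k == "store"}
    for name, (kind, loc) in list(plan.ops.items()):
        if kind != "load" or loc not in stores:
            continue
        # Connect the store's predecessor(s) to the load's successor(s).
        for pred in plan.predecessors(stores[loc]):
            for succ in plan.successors(name):
                plan.connect(pred, succ)
        # Throw out the load and its outgoing edges.
        plan.edges = {(s, d) for s, d in plan.edges if s != name}
        del plan.ops[name]
    return plan


# The script from the comment above, with the explicit "D = load" step.
plan = Plan()
plan.add("A", "load", "input")
plan.add("D", "op")
plan.add("store_D", "store", "tmp/D")
plan.add("load_D", "load", "tmp/D")
plan.add("E", "op")
plan.add("H", "op")
plan.add("store_H", "store", "output")
for src, dst in [("A", "D"), ("D", "store_D"), ("load_D", "E"),
                 ("E", "H"), ("H", "store_H")]:
    plan.connect(src, dst)

bypass_store_load(plan)
```

After the call, D feeds both store_D and E directly, and load_D is gone, so steps A through D appear only once in the graph.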

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
