[jira] Commented: (PIG-777) Code refactoring: Create optimization out of store/load post processing code

2009-05-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710169#action_12710169
 ] 

Hadoop QA commented on PIG-777:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12406744/log_message.patch
  against trunk revision 775340.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
  Please justify why no tests are needed for this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/42/console

This message is automatically generated.

 Code refactoring: Create optimization out of store/load post processing code
 

         Key: PIG-777
         URL: https://issues.apache.org/jira/browse/PIG-777
     Project: Pig
  Issue Type: Improvement
    Reporter: Gunther Hagleitner
 Attachments: log_message.patch


 The postProcessing method in the pig server checks whether a logical graph 
 contains stores to and loads from the same location. If so, it will either 
 connect the store and load, or optimize by throwing out the load and 
 connecting the store predecessor with the successor of the load.
 Ideally, the introduction of the store and load connection should happen in 
 the query compiler, while the optimization should then happen in a separate 
 optimizer step as part of the optimizer framework.
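
The two behaviours described above can be pictured on a toy graph. What follows is a minimal sketch in Python, not Pig's actual Java code: the Op class, its fields, and the helper names connect, matching_pairs and bypass_load are all hypothetical and exist only for this illustration. It also assumes that a store has exactly one input.

```python
class Op:
    """Hypothetical plan node (not a Pig class)."""

    def __init__(self, kind, location=None):
        self.kind = kind          # "load", "store", or another operator name
        self.location = location  # file location; only set for loads and stores
        self.preds = []           # operators this one reads from
        self.succs = []           # operators that read from this one


def connect(src, dst):
    """Add an edge src -> dst."""
    src.succs.append(dst)
    dst.preds.append(src)


def matching_pairs(ops):
    """Return (store, load) pairs that refer to the same location."""
    stores = [o for o in ops if o.kind == "store"]
    loads = [o for o in ops if o.kind == "load"]
    return [(s, l) for s in stores for l in loads if s.location == l.location]


def bypass_load(store, load, ops):
    """Drop the load and feed its successors from the store's predecessor."""
    producer = store.preds[0]  # assumes a store has exactly one input
    for succ in load.succs:
        succ.preds = [producer if p is load else p for p in succ.preds]
        producer.succs.append(succ)
    load.succs = []
    ops.remove(load)
```

In this model, the first behaviour (connecting the store and the load) is simply connect(store, load) for each pair returned by matching_pairs. The second behaviour is bypass_load, which removes the load so that its consumers read straight from whatever fed the store.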

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-777) Code refactoring: Create optimization out of store/load post processing code

2009-05-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709590#action_12709590
 ] 

Olga Natkovich commented on PIG-777:


Looks like the patch does more than just add a log message. I think you need to 
add a unit test that shows why this is needed:

store.setInputSpec(load.getInputFile());




[jira] Commented: (PIG-777) Code refactoring: Create optimization out of store/load post processing code

2009-04-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704002#action_12704002
 ] 

Gunther Hagleitner commented on PIG-777:


David,

Per PIG-627, the first example you gave will result in a single MapReduce job 
that processes both store operations, with no duplication of steps A through 
D. So yes, you shouldn't need to introduce D = load. PIG-627 also introduced 
an optimization that throws the D = load out, basically transforming 
your second example into the first.
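
One way to picture why a single job avoids repeating steps A through D: if both stores are evaluated against one shared plan, each upstream step is computed once. The sketch below is a toy Python model with hypothetical names (plan, run, run_separately, run_shared). It is not how Pig or PIG-627 is implemented, and the aliases between A and H are assumed to form a simple chain.

```python
# Toy plan: each alias maps to the aliases it reads from (hypothetical data).
plan = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["D"],
    "H": ["E"],
}
stores = ["D", "H"]  # aliases that are written out with a store


def run(alias, counts, done=None):
    """Evaluate an alias and its inputs, counting how often each step runs."""
    if done is not None and alias in done:
        return  # already computed in this shared pass
    for parent in plan[alias]:
        run(parent, counts, done)
    counts[alias] = counts.get(alias, 0) + 1
    if done is not None:
        done.add(alias)


def run_separately(stores):
    """Each store re-evaluates its whole upstream chain."""
    counts = {}
    for s in stores:
        run(s, counts)
    return counts


def run_shared(stores):
    """All stores share one pass, so each step runs at most once."""
    counts, done = {}, set()
    for s in stores:
        run(s, counts, done)
    return counts
```

In this model, run_separately(stores) counts steps A through D twice each, while run_shared(stores) counts every step once.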

This bug is mostly about the way the optimization is written. Some code should 
be moved around to align it with the optimization framework.

Adding a log message when this happens is a good idea though. Let me add that. 
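
A rough sketch of what this could look like if the rewrite lived in a rule with separate check and transform steps. Everything here is hypothetical: the StoreLoadRule class, the edge-list representation, the logger name, and the message text are invented for illustration, and Python's standard logging module stands in for whatever logging Pig uses.

```python
import logging

log = logging.getLogger("store_load_rule")  # hypothetical logger name


class StoreLoadRule:
    """Hypothetical rule; edges are (source, target) pairs of node names."""

    def __init__(self, locations):
        # locations: node name -> file location, for store and load nodes only
        self.locations = locations

    def check(self, store, load):
        """Return True when the store writes the location the load reads."""
        loc = self.locations.get(store)
        return loc is not None and loc == self.locations.get(load)

    def transform(self, edges, store, load):
        """Return new edges with the load bypassed, and log the rewrite."""
        producers = [s for s, t in edges if t == store]
        consumers = [t for s, t in edges if s == load]
        kept = [(s, t) for s, t in edges if load not in (s, t)]
        kept += [(p, c) for p in producers for c in consumers]
        log.info("Removed load of %s; reading directly from %s",
                 self.locations[load], ", ".join(producers))
        return kept
```

Keeping check free of side effects means the detection can be tested on its own, and the log call sits in the one place where the plan actually changes.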




RE: [jira] Commented: (PIG-777) Code refactoring: Create optimization out of store/load post processing code

2009-04-28 Thread Richard Ding

Hi David,

This is exactly the problem that the multi-query optimization project is
addressing. Please see the following link for details:

http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification


Thanks,
-Richard

-----Original Message-----
From: David Ciemiewicz (JIRA) [mailto:j...@apache.org] 
Sent: Tuesday, April 28, 2009 7:43 AM
To: pig-dev@hadoop.apache.org
Subject: [jira] Commented: (PIG-777) Code refactoring: Create
optimization out of store/load post processing code


[
https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703659#action_12703659
]

David Ciemiewicz commented on PIG-777:
--

This seems like it could be useful but I don't understand the full issue
as a user.

I often want to compute intermediate summaries, store them, and then
continue computation.

{code}
A = load ...
...
store D into ...
E = group D by ...
...
store H into ...
{code}

The problem I encountered in earlier versions of Pig was that to PREVENT
two executions of steps A thru D, I had to introduce a load step before E:

{code}
A = load ...
...
store D into ...
D = load ...
E = group D by ...
...
store H into ...
{code}

It's great that you will be introducing code that possibly eliminates
D = load in the execution.

However, is anything being done so that I don't need to introduce
D = load in the first place?




[jira] Commented: (PIG-777) Code refactoring: Create optimization out of store/load post processing code

2009-04-28 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703764#action_12703764
 ] 

David Ciemiewicz commented on PIG-777:
--

Another thing ...

If you eliminate the D = load statement, could you provide some information to 
the user that this optimization is taking place?

It would help me immensely with code maintenance if I could eliminate the 
D = load steps, which often require recoding the AS clause schema.
