[ 
https://issues.apache.org/jira/browse/PIG-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894467#action_12894467
 ] 

Swati Jain commented on PIG-1530:
---------------------------------

a) This is not a developer coding issue. The example I gave is in fact a fairly
simple one. Developer programs can be fairly complex, and it is not always
easy for developers to do such optimizations on their own. One of the
important advantages of an optimizer is that it removes the burden of thinking
about these optimizations from the developer.

b) A general filter pushup rule (as you correctly observe) must be able to push
a filter as far up as possible. The way this would work is by iteratively
applying rules that push LOFilter across individual relational operators. A
simple rule must exist for pushing a filter above each relational operator;
applied in conjunction, these rules allow a filter to be pushed up as far as it
can go. As an example, after I added the rule for the above, I can see a
program in which an LOFilter below a LOForeach-LOCogroup pair is pushed above
LOCogroup. This was the result of applying PushUpFilter across LOCogroup and
across LOForeach (the latter already exists as a separate rule).
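
As a rough illustration of the iterative application described in (b), the
sketch below models a plan as a flat list of operator names and keeps applying
one-step push rules until none fires. It is a toy model in Python, not Pig's
optimizer code: the CAN_CROSS table and the push_once and push_up_filter
functions are hypothetical, the plan is linear (a real LOCogroup has several
inputs), and every listed crossing is assumed to be legal for the filter.

```python
# Toy model: a plan is a list of operator names ordered from load (index 0)
# to the last operator. "Pushing up" moves the filter toward index 0.
# The names mirror Pig's logical operators, but the rule table and the
# functions below are hypothetical and are not Pig's optimizer API.

# One simple rule per operator: may a filter be moved above it?
# Assumption for this sketch: every listed crossing is legal for the filter
# in question. A real rule would inspect the filter's predicate first.
CAN_CROSS = {
    "LOForeach": True,
    "LOCogroup": True,
    "LOLoad": False,  # nothing exists above a load
}


def push_once(plan):
    """Apply one rule: swap the filter with the operator just above it."""
    i = plan.index("LOFilter")
    if i == 0 or not CAN_CROSS.get(plan[i - 1], False):
        return plan, False
    swapped = plan[:i - 1] + ["LOFilter", plan[i - 1]] + plan[i + 1:]
    return swapped, True


def push_up_filter(plan):
    """Re-apply the single-step rules until none of them fires."""
    changed = True
    while changed:
        plan, changed = push_once(plan)
    return plan


plan = ["LOLoad", "LOCogroup", "LOForeach", "LOFilter"]
result = push_up_filter(plan)
print(result)
```

Each rule only knows how to cross one operator, yet the loop carries the
filter across LOForeach and then across LOCogroup, stopping at the load.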

c) Each relational operator has specifics that make it hard to write a single
pattern, so each operator must be handled separately to ensure that its
nuances are handled correctly. Both LOCogroup and LOJoin are examples where
the rules have fairly distinct logic. I do think, however, that there should
be a single rule (with multiple patterns) which handles pushing up an
LOFilter. That is the reason why I have added the LOCogroup optimization to
PushUpFilter instead of creating a separate rule.
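
To make the LOCogroup-specific logic in (c) a little more concrete, here is a
small Python model of why a filter that reads only the group key can be
applied to each input before the cogroup. It is a sketch under assumptions,
not Pig code: the cogroup, key and predicate helpers and the sample rows are
made up for illustration, and NULL keys and INNER flags are ignored even
though those are exactly the cases that need extra care.

```python
from collections import defaultdict


def cogroup(a_rows, b_rows, key):
    """Toy COGROUP of two inputs: {group_key: (bag_from_a, bag_from_b)}."""
    groups = defaultdict(lambda: ([], []))
    for row in a_rows:
        groups[key(row)][0].append(row)
    for row in b_rows:
        groups[key(row)][1].append(row)
    return dict(groups)


def predicate(group):
    # Same shape as the filter in the issue: group.$0 + 5 > group.$1
    return group[0] + 5 > group[1]


def key(row):
    # Group by the first two columns, like (a1,a2) and (b1,b2).
    return (row[0], row[1])


# Hypothetical sample data, three int columns per row.
A = [(1, 2, 10), (1, 9, 11), (4, 4, 12)]
B = [(1, 2, 20), (0, 7, 21)]

# Filter applied after the cogroup, as in the original script.
after = {g: bags for g, bags in cogroup(A, B, key).items() if predicate(g)}

# Filter rewritten onto each input's key columns, applied before the cogroup.
before = cogroup(
    [r for r in A if predicate(key(r))],
    [r for r in B if predicate(key(r))],
    key,
)

print(after == before)
```

The two results agree here because every row in a group shares the group key,
so dropping rows by key removes whole groups. A join has different conditions,
which is why the patterns stay separate inside the one rule.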

>  PIG Logical Optimization: Push LOFilter above LOCogroup
> --------------------------------------------------------
>
>                 Key: PIG-1530
>                 URL: https://issues.apache.org/jira/browse/PIG-1530
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>            Reporter: Swati Jain
>            Assignee: Swati Jain
>            Priority: Minor
>             Fix For: 0.8.0
>
>
> Consider the following:
> {noformat}
> A = load '<any file>' USING PigStorage(',') as (a1:int,a2:int,a3:int);
> B = load '<any file>' USING PigStorage(',') as (b1:int,b2:int,b3:int);
> G = COGROUP A by (a1,a2) , B by (b1,b2);
> D = Filter G by group.$0 + 5 > group.$1;
> explain D;
> {noformat}
> In the above example, LOFilter can be pushed above LOCogroup. Note there are 
> some tricky NULL issues to think about when the Cogroup is not of type INNER 
> (Similar to issues that need to be thought through when pushing LOFilter on 
> the right side of a LeftOuterJoin).
> Also note that typically the LOFilter in user programs will be below a 
> ForEach-Cogroup pair. To make this really useful, we need to also implement 
> LOFilter pushed across ForEach. 
