[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

Alan Gates (JIRA) Thu, 04 Feb 2010 12:23:55 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829759#action_12829759
 ]


Alan Gates commented on PIG-1178:
---------------------------------

Comments that came out of the Pig team's review of the twiki doc:

1) In OperatorPlan, the use of roots and leaves in the graph was considered 
confusing.  Some people view roots as sources and some as sinks.  It was 
recommended that we switch roots to sources and leaves to sinks to avoid 
confusion.
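
To make the proposed naming concrete, here is a minimal sketch of a plan graph 
whose edges point in the direction data flows, so a source is an operator with 
no predecessors and a sink is one with no successors.  SimplePlan, getSources() 
and getSinks() are hypothetical names used only for illustration; this is not 
the OperatorPlan API under review.

{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Pig's OperatorPlan: edges point in the direction
// data flows, so a source has no predecessors and a sink has no successors.
public class SimplePlan {
    private final Map<String, Set<String>> successors = new LinkedHashMap<>();
    private final Map<String, Set<String>> predecessors = new LinkedHashMap<>();

    // Register an operator (no-op if it is already present).
    public void add(String op) {
        successors.putIfAbsent(op, new LinkedHashSet<>());
        predecessors.putIfAbsent(op, new LinkedHashSet<>());
    }

    // Record that data flows from 'from' into 'to'.
    public void connect(String from, String to) {
        add(from);
        add(to);
        successors.get(from).add(to);
        predecessors.get(to).add(from);
    }

    // Sources: where data enters the plan, e.g. a LOAD.
    public List<String> getSources() {
        return withNoNeighbors(predecessors);
    }

    // Sinks: where data leaves the plan, e.g. a STORE.
    public List<String> getSinks() {
        return withNoNeighbors(successors);
    }

    private static List<String> withNoNeighbors(Map<String, Set<String>> edges) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : edges.entrySet()) {
            if (e.getValue().isEmpty()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SimplePlan plan = new SimplePlan();
        plan.connect("load", "filter");
        plan.connect("filter", "store");
        // prints "sources=[load] sinks=[store]"
        System.out.println("sources=" + plan.getSources() + " sinks=" + plan.getSinks());
    }
}
{code}

With edges in data-flow order, a LOAD always comes back from getSources() and 
a STORE from getSinks(), which is the unambiguous reading the rename is after.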

2) The new OperatorPlan does not include mergeSharedPlan, which the old plan 
used for multi-query functionality.  After further investigation I found that 
merge is currently used by multi-query only for physical plans.  While ideally 
we would like to use this infrastructure for physical plans too, I feel it is 
reasonable to put off adding merge until at least the initial prototyping 
phase is done.  After briefly looking at it I see no reason why it should not 
work, though we may need a more precise way to decide when two nodes are the 
same and should be merged.
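
As one possible answer to the "when are two nodes the same" question, the 
sketch below treats two nodes as identical if they have the same kind, the same 
arguments, and recursively the same inputs, and uses that structural key to 
find the nodes two separately built queries could share.  MergeSketch, Node, 
key() and distinctNodes() are hypothetical.  The key is not escaped, so 
arguments containing brackets or semicolons could collide, and a real criterion 
would likely need to compare schemas and other operator state as well.

{code}
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: decide which nodes of several plans are "the same" and
// could therefore be shared when the plans are merged.
public class MergeSketch {
    static final class Node {
        final String kind;       // e.g. "LOAD", "FILTER", "STORE"
        final String args;       // e.g. a file name or a condition
        final List<Node> inputs; // predecessors, in order

        Node(String kind, String args, Node... inputs) {
            this.kind = kind;
            this.args = args;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Structural key: same kind, same arguments, and recursively the same
    // inputs. This is one possible criterion, not a settled one.
    static String key(Node n) {
        StringBuilder sb = new StringBuilder();
        sb.append(n.kind).append('(').append(n.args).append(")[");
        for (Node in : n.inputs) {
            sb.append(key(in)).append(';');
        }
        return sb.append(']').toString();
    }

    // Walk every plan back from its sink and keep one representative per key,
    // so nodes that appear in more than one plan collapse into a single entry.
    static Map<String, Node> distinctNodes(Node... sinks) {
        Map<String, Node> merged = new LinkedHashMap<>();
        Deque<Node> stack = new ArrayDeque<>(Arrays.asList(sinks));
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            merged.putIfAbsent(key(n), n);
            for (Node in : n.inputs) {
                stack.push(in);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two queries built separately that both read the same input.
        Node store1 = new Node("STORE", "out1",
                new Node("FILTER", "x > 0", new Node("LOAD", "in")));
        Node store2 = new Node("STORE", "out2",
                new Node("GROUP", "x", new Node("LOAD", "in")));
        // The two LOAD nodes share a key, so 6 nodes reduce to 5.
        System.out.println(distinctNodes(store1, store2).size() + " distinct nodes");
    }
}
{code}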

3) A point was raised that perhaps the optimizer should reset the annotations 
on the nodes after a transform and all the attached listeners have run.  On 
further thought, I don't think it should, as there may be annotations we want 
to persist across transforms.  For example, a rule that could otherwise match 
an infinite number of times may want to "sign" a node to note that it has 
already been there, so that it does not fire on that node again.  The easiest 
way to do this signing would be with annotations.  However, I can see that 
there would be a desire to clear certain annotations so that each pass of the 
optimizer starts with fresh state.  To accomplish this I was wondering if we 
should allow developers to add visitors that would be run after all the 
listeners run.  So PlanOptimizer would change to have a new method:

{code}
// Register a visitor to be run after every transform and its listeners.
public void addStatusResettingVisitor(Visitor v) {
    resetters.add(v);
}
{code}

and in the optimize loop

{code}
for (OperatorPlan m : matches) {
    if (transformer.check(m)) {
        sawMatch = true;
        transformer.transform(m);
        for (PlanTransformListener l : listeners) {
            l.transformed(plan, transformer.reportChanges());
        }
    }
}
{code}

would change to be:

{code}
for (OperatorPlan m : matches) {
    if (transformer.check(m)) {
        sawMatch = true;
        transformer.transform(m);
        for (PlanTransformListener l : listeners) {
            l.transformed(plan, transformer.reportChanges());
        }
        // New: run the registered resetters once all listeners have finished.
        for (Visitor v : resetters) {
            v.visit();
        }
    }
}
{code}

Thoughts?
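
A minimal sketch of how signing and resetting could interact, assuming 
annotations are a simple string-keyed map on each node: the rule signs each 
node it transforms so it never fires there again, while a resetter registered 
in the spirit of addStatusResettingVisitor() clears only per-pass annotations.  
Node, Visitor, the annotation keys and the "transient." prefix convention are 
all hypothetical, and unlike the proposal's v.visit(), this Visitor is handed 
the plan directly.

{code}
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of annotation "signing" plus a status-resetting visitor.
// Node, Visitor, and the annotation keys are illustrative, not Pig's API.
public class ResetSketch {
    static final class Node {
        final String name;
        final Map<String, Object> annotations = new HashMap<>();
        Node(String name) { this.name = name; }
    }

    interface Visitor {
        void visit(List<Node> plan);
    }

    // Signature left by a rule that could otherwise match the same node forever.
    static final String SIGNATURE = "signed.ExampleRule";

    static boolean check(Node n) {
        return !n.annotations.containsKey(SIGNATURE);
    }

    static void transform(Node n) {
        n.annotations.put(SIGNATURE, Boolean.TRUE);     // meant to survive resets
        n.annotations.put("transient.schema", "stale"); // per-pass state
    }

    // Resetter that clears per-pass annotations but leaves signatures alone.
    static final Visitor CLEAR_TRANSIENT = plan -> {
        for (Node n : plan) {
            n.annotations.keySet().removeIf(k -> k.startsWith("transient."));
        }
    };

    // Same shape as the proposed loop: after each transform, run the resetters.
    // Repeats passes until nothing matches or maxIter is reached, and returns
    // how many transforms were applied.
    static int optimize(List<Node> plan, List<Visitor> resetters, int maxIter) {
        int transforms = 0;
        for (int iter = 0; iter < maxIter; iter++) {
            boolean sawMatch = false;
            for (Node m : plan) {
                if (check(m)) {
                    sawMatch = true;
                    transform(m);
                    transforms++;
                    for (Visitor v : resetters) {
                        v.visit(plan);
                    }
                }
            }
            if (!sawMatch) {
                break;
            }
        }
        return transforms;
    }

    public static void main(String[] args) {
        List<Node> plan = Arrays.asList(new Node("filter"), new Node("foreach"));
        int n = optimize(plan, Collections.singletonList(CLEAR_TRANSIENT), 100);
        System.out.println(n + " transforms; filter annotations: " + plan.get(0).annotations);
    }
}
{code}

Because the signature survives every reset, the loop terminates even though 
check() on its own would match any unsigned node, and a second optimize() call 
over the same plan applies no transforms.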

4) It is not clear how column pruning will work in the new optimizer.  Will it 
be represented by a rule?  If so, how, since the new optimizer allows matching 
only on specific operators rather than on any operator?  Or would it be better 
to have it use the Transformers but not the PlanOptimizer infrastructure, since 
it isn't clear that we would want the column pruning rule to be triggered more 
than once?  To answer these questions I think we should prototype column 
pruning soon.  It was one of the hardest parts of the existing infrastructure, 
and we want to make sure it can be done well in this new approach before 
committing to the approach.
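
For discussion of the prototype, here is a minimal sketch of the core of 
column pruning: a single backward pass from the sink that works out which 
columns each operator needs from its input, so anything not required at the 
load can be dropped.  Because it is one pass over the whole plan rather than a 
local pattern match, it may fit more naturally as a one-shot use of a 
transformer than as a rule the optimizer fires repeatedly, though that is 
exactly the open question above.  Op, passThrough(), projection() and 
requiredAtSource() are hypothetical, and the sketch handles only a linear 
pipeline (no joins, splits, or nested plans).

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of required-column analysis over a linear pipeline.
// Op and requiredAtSource are illustrative names, not Pig classes.
public class ColumnPruneSketch {
    static final class Op {
        final String name;
        final Set<String> uses;                   // columns read by the operator's own logic
        final Map<String, Set<String>> generates; // output column -> input columns; null = rows pass through

        Op(String name, Set<String> uses, Map<String, Set<String>> generates) {
            this.name = name;
            this.uses = uses;
            this.generates = generates;
        }

        // e.g. a filter: passes its input columns through and reads 'uses'.
        static Op passThrough(String name, String... uses) {
            return new Op(name, new HashSet<>(Arrays.asList(uses)), null);
        }

        // e.g. a foreach ... generate: produces new columns from input columns.
        static Op projection(String name, Map<String, Set<String>> generates) {
            return new Op(name, new HashSet<>(), generates);
        }
    }

    // Walk from the sink back toward the source, turning the set of columns
    // needed downstream into the set each operator needs from its input.
    static Set<String> requiredAtSource(List<Op> pipeline, Set<String> neededAtSink) {
        Set<String> required = new TreeSet<>(neededAtSink);
        for (int i = pipeline.size() - 1; i >= 0; i--) {
            Op op = pipeline.get(i);
            Set<String> input = new TreeSet<>(op.uses);
            if (op.generates == null) {
                input.addAll(required); // downstream needs flow upstream unchanged
            } else {
                for (String out : required) {
                    Set<String> deps = op.generates.get(out);
                    if (deps != null) {
                        input.addAll(deps);
                    }
                }
            }
            required = input;
        }
        return required;
    }

    public static void main(String[] args) {
        // A = load 'x' as (a, b, c, d); B = filter A by b > 0; C = foreach B generate a, c;
        Map<String, Set<String>> gen = new HashMap<>();
        gen.put("a", new HashSet<>(Arrays.asList("a")));
        gen.put("c", new HashSet<>(Arrays.asList("c")));
        List<Op> pipeline = Arrays.asList(
                Op.passThrough("filter", "b"),
                Op.projection("foreach", gen));
        Set<String> needed = requiredAtSource(pipeline, new HashSet<>(Arrays.asList("a", "c")));
        // prints "load must produce [a, b, c]", so column d can be pruned
        System.out.println("load must produce " + needed);
    }
}
{code}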

5) It was noted that while the examples in the document appear to show that 
the proposal will work for nested plans (that is, inner plans in foreach), they 
do not show that it will work for operators not yet nestable in foreach (e.g. 
group, foreach).  Since a stated goal of Pig Latin is to someday allow 
arbitrary nesting, we should validate that the proposal will support nesting 
these additional operators in foreach.
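
One way to check the arbitrary-nesting goal is to make sure nothing that 
traverses plans is special-cased on foreach.  The hypothetical sketch below 
gives any operator a list of inner plans and walks them recursively, so a group 
or foreach nested inside a foreach is visited exactly like a top-level one.  
Op, withInner() and walk() are illustrative names, and inner plans are modeled 
as simple lists in data-flow order rather than full graphs.

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: operators that may own inner plans, and a walk that
// recurses into every inner plan regardless of the owning operator's kind.
public class NestedWalkSketch {
    static final class Op {
        final String kind;
        final List<List<Op>> innerPlans = new ArrayList<>();

        Op(String kind) {
            this.kind = kind;
        }

        // Attach an inner plan (listed in data-flow order) and return this op.
        Op withInner(Op... plan) {
            innerPlans.add(Arrays.asList(plan));
            return this;
        }
    }

    // Records "depth:KIND" for each operator, descending into inner plans.
    // Nothing here is specific to foreach, so a GROUP or FOREACH nested inside
    // a FOREACH is visited exactly like a top-level one.
    static void walk(List<Op> plan, int depth, List<String> out) {
        for (Op op : plan) {
            out.add(depth + ":" + op.kind);
            for (List<Op> inner : op.innerPlans) {
                walk(inner, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        List<Op> plan = Arrays.asList(
                new Op("LOAD"),
                new Op("FOREACH").withInner(
                        new Op("GROUP"),
                        new Op("FOREACH").withInner(new Op("FILTER"))),
                new Op("STORE"));
        List<String> out = new ArrayList<>();
        walk(plan, 0, out);
        // prints [0:LOAD, 0:FOREACH, 1:GROUP, 1:FOREACH, 2:FILTER, 0:STORE]
        System.out.println(out);
    }
}
{code}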


> LogicalPlan and Optimizer are too complex and hard to work with
> ---------------------------------------------------------------
>
>                 Key: PIG-1178
>                 URL: https://issues.apache.org/jira/browse/PIG-1178
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Alan Gates
>            Assignee: Ying He
>         Attachments: expressions-2.patch, expressions.patch, lp.patch, 
> lp.patch, pig_1178.patch, PIG_1178.patch
>
>
> The current implementation of the logical plan and the logical optimizer in 
> Pig has proven not to be easily extensible. Developer feedback has indicated 
> that adding new rules to the optimizer is quite burdensome. In addition, the 
> logical plan has been an area of numerous bugs, many of which have been 
> difficult to fix. Developers also feel that the logical plan is difficult to 
> understand and maintain. The root cause of these issues is that a number of 
> design decisions that were made as part of the 0.2 rewrite of the front end 
> have now proven to be sub-optimal. The heart of this proposal is to revisit a 
> number of those decisions and rebuild the logical plan with a simpler design 
> that will make it much easier to maintain the logical plan as well as extend 
> the logical optimizer. 
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
> details.

-- 
This message is automatically generated by JIRA.
