[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829759#action_12829759 ]
Alan Gates commented on PIG-1178: --------------------------------- Comments that came out of a review of the twiki doc the pig team did: 1) In OperatorPlan, the use of roots and leaves in the graph was considered confusing. Some people view roots as sources and some as sinks. It was recommended that we switch roots to sources and leaves to sinks to avoid confusion. 2) The new OperatorPlan does not include mergeSharedPlan, which was used by multi-query functionality in the old plan. After further investigation I found that merge is currently only used by multi-query for physical plans. While ideally we would like to use this infrastructure for physical plans too, I feel it is reasonable to put off adding merge until at least the initial prototyping phase is done. After briefling looking at it I see no reason why it should not work, though we may need a more precise way to decide when two nodes are the same and should be merged. 3) A point was raised that perhaps the optimizer should reset the annotations on the nodes after a transform and all the attached listeners have been run. With further thought, I don't think so, as there may be annotations we want to last across transforms. For example, a rule that could match an infinite number of times may want to "sign" a node to note it's already been there so that it does not fire on the node again. The easiest way to do this signing would be with the annotations. However, I can see that there would be a desire to clear certain annotations so that each pass of the optimizer has a fresh state. To accomplish this I was wondering if we should allow developers to add visitors that would be run after all the listeners run. So PlanOptimizer would change to have a new method: {code} addStatusResettingVisitor(Visitor v) { resetters.add(v); } {code} and in the optimize loop {code} for (OperatorPlan m : matches) { if (transformer.check(m)) { sawMatch = true; transformer.transform(m); for(PlanTransformListener l: listeners) { l.transformed(plan, transformer.reportChanges()); } } } {code} would change to be: {code} for (OperatorPlan m : matches) { if (transformer.check(m)) { sawMatch = true; transformer.transform(m); for(PlanTransformListener l: listeners) { l.transformed(plan, transformer.reportChanges()); } for(Visitor v : resetters) { v.visit(); } } } {code} Thoughts? 4) There is not clarity on how column pruning will work in the new optimizer. Will it be represented by a rule? If so, how, since the new optimizer does not allow matching on any operator just on specific operators? Would it be better instead to have it use the Transformers but not the PlanOptimizer infrastructure, since it isn't clear that we would want the column pruning rule to be triggered more than once? To answer these I think we should prototype the column pruning soon. It was one of the hardest parts of the existing infrastructure. We want to make sure it can be done well in this new approach before committing to the approach. 5) The comment was made that while the examples in the document appear to show that the proposal will work for nested plans (that is, inner plans in foreach) they do not show that it will work for operators not yet nestable in foreach (e.g. group, foreach). Since a stated goal of Pig Latin is to someday allow arbitrary nesting, we should validate that the proposal will support these additional operators to be nested in foreach. > LogicalPlan and Optimizer are too complex and hard to work with > --------------------------------------------------------------- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement > Reporter: Alan Gates > Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, pig_1178.patch, PIG_1178.patch > > > The current implementation of the logical plan and the logical optimizer in > Pig has proven to not be easily extensible. Developer feedback has indicated > that adding new rules to the optimizer is quite burdensome. In addition, the > logical plan has been an area of numerous bugs, many of which have been > difficult to fix. Developers also feel that the logical plan is difficult to > understand and maintain. The root cause for these issues is that a number of > design decisions that were made as part of the 0.2 rewrite of the front end > have now proven to be sub-optimal. The heart of this proposal is to revisit a > number of those proposals and rebuild the logical plan with a simpler design > that will make it much easier to maintain the logical plan as well as extend > the logical optimizer. > See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full > details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.