[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832313#action_12832313 ]
Ying He commented on PIG-1178:
------------------------------

Here are my thoughts on using this framework to implement PruneColumns.

1. Separate prune columns and prune map keys into two rules. The current implementation mixes them in one class. Separating them makes each rule simpler.

2. The prune column rule can be implemented by creating a new visitor. This visitor is called from transform(), and it visits every LogicalRelationalOperator in reverse dependency order. Each visit(LogicalRelationalOperator) calculates the required output uids by combining the input uids of its successors. If a node is a sink of the plan, the output uids are retrieved from its schema. The input uids are calculated from the output uids by looking into the expression plan(s) of the operator. If an output uid is derived from other uids, the source uids should be put into the input uids. For example, a+b is derived from a and b, so the input uids should keep the uids of a and b. Each operator should consider its logical meaning when calculating input uids from output uids. For example, for LOCross, the input uids should contain at least one field from each input. The input uids and output uids can be added to the operator as annotations.

3. After step 2, use another visitor to go over the plan again, this time in dependency order, to prune the columns. This can be done by reading the input and output uids annotated on each node.

4. I think it's OK to implement prune column and prune map key as regular rules. They just need to override match():

    public List<OperatorPlan> match(OperatorPlan plan) {
        List<OperatorPlan> ll = new ArrayList<OperatorPlan>();
        ll.add(plan);
        return ll;
    }

This method tells the optimizer that only one match is found, which is the plan itself.

5. For the Transformer class, I suggest getting rid of check() and changing void transform() into boolean transform(). If transform() returns false, no transformation was made. If it returns true, a transformation was made.
The reason is that for some rules, such as the PruneColumn rule, it is not easy to know in advance whether a change is going to be made. If we have both check() and transform(), a lot of logic would be duplicated in these two methods.

> LogicalPlan and Optimizer are too complex and hard to work with
> ---------------------------------------------------------------
>
>                 Key: PIG-1178
>                 URL: https://issues.apache.org/jira/browse/PIG-1178
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Alan Gates
>            Assignee: Ying He
>         Attachments: expressions-2.patch, expressions.patch, lp.patch, lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch
>
>
> The current implementation of the logical plan and the logical optimizer in
> Pig has proven to not be easily extensible. Developer feedback has indicated
> that adding new rules to the optimizer is quite burdensome. In addition, the
> logical plan has been an area of numerous bugs, many of which have been
> difficult to fix. Developers also feel that the logical plan is difficult to
> understand and maintain. The root cause for these issues is that a number of
> design decisions that were made as part of the 0.2 rewrite of the front end
> have now proven to be sub-optimal. The heart of this proposal is to revisit a
> number of those proposals and rebuild the logical plan with a simpler design
> that will make it much easier to maintain the logical plan as well as extend
> the logical optimizer.
> See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full
> details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
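
To illustrate the backward pass described in step 2 of the comment, here is a minimal sketch in Java. It is not Pig code: the class UidPruningSketch, the nested Op type, and the fields schemaUids, derivedFrom, usedInternally, outputUids and inputUids are all hypothetical names invented for this example, and the plan is assumed to be handed over already sorted in reverse dependency order. Operator-specific rules such as the LOCross case are not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; none of these names come from the Pig code base.
class UidPruningSketch {

    // Simplified stand-in for a LogicalRelationalOperator.
    static class Op {
        final String name;
        final Set<Long> schemaUids = new TreeSet<>();             // uids this operator can produce
        final Map<Long, Set<Long>> derivedFrom = new HashMap<>(); // output uid -> source uids, e.g. a+b -> {a, b}
        final Set<Long> usedInternally = new TreeSet<>();         // uids the operator reads itself, e.g. a filter condition
        final List<Op> successors = new ArrayList<>();

        // Annotations filled in by the backward pass.
        final Set<Long> outputUids = new TreeSet<>();
        final Set<Long> inputUids = new TreeSet<>();

        Op(String name, long... schema) {
            this.name = name;
            for (long uid : schema) {
                schemaUids.add(uid);
            }
        }
    }

    // 'reverseOrder' must list every operator after all of its successors (sinks first).
    static void annotate(List<Op> reverseOrder) {
        for (Op op : reverseOrder) {
            if (op.successors.isEmpty()) {
                // Sink: the required output uids come from its schema.
                op.outputUids.addAll(op.schemaUids);
            } else {
                // Otherwise combine the input uids of all successors,
                // keeping only the uids this operator actually produces.
                for (Op succ : op.successors) {
                    op.outputUids.addAll(succ.inputUids);
                }
                op.outputUids.retainAll(op.schemaUids);
            }
            // Input uids: whatever the operator reads itself, plus the
            // sources of each required output uid.
            op.inputUids.addAll(op.usedInternally);
            for (long uid : op.outputUids) {
                Set<Long> sources = op.derivedFrom.get(uid);
                if (sources == null) {
                    op.inputUids.add(uid);        // passed through unchanged
                } else {
                    op.inputUids.addAll(sources); // derived, keep its sources
                }
            }
        }
    }

    // load(a=1, b=2, c=3, d=4) -> filter on d -> foreach generate a+b (uid 5) -> store
    static List<Op> buildExample() {
        Op load = new Op("load", 1L, 2L, 3L, 4L);
        Op filter = new Op("filter", 1L, 2L, 3L, 4L);
        filter.usedInternally.add(4L);
        Op foreach = new Op("foreach", 5L);
        foreach.derivedFrom.put(5L, new TreeSet<>(Arrays.asList(1L, 2L)));
        Op store = new Op("store", 5L);
        load.successors.add(filter);
        filter.successors.add(foreach);
        foreach.successors.add(store);
        return Arrays.asList(store, foreach, filter, load); // reverse dependency order
    }

    public static void main(String[] args) {
        List<Op> plan = buildExample();
        annotate(plan);
        for (Op op : plan) {
            System.out.println(op.name + ": input uids " + op.inputUids
                    + ", output uids " + op.outputUids);
        }
    }
}
```

In this example the load operator ends up with required output uids 1, 2 and 4, so column c (uid 3) is the one a later forward pass, as in step 3, could prune. Storing the two sets on the operator mirrors the suggestion to keep them as annotations, so the pruning visitor does not need to recompute anything.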