[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803565#action_12803565 ]
Alan Gates commented on PIG-1178: --------------------------------- Comments on lp.patch: 1) In LOJoin.getSchema, these lines of code: {code} for (Operator op : inputs) { LogicalSchema inputSchema = ((LogicalRelationalOperator)op).getSchema(); // the schema of one input is unknown, so the join schema is unknown, just return if (inputSchema == null) { schemaSet = true; return schema; } {code} You are assuming that schema is null. It would be better to explicitly set schema to null and then return it. 2) In SplitFilter.transform you put it in a while loop, finding each 'and' and splitting it into another filter. But there's already an outer while loop (the one in the optimizer applying the rule over and over) that will do that. One of the assertions in this design is that each rule should be as simple as possible. This rule should just split one and, and let the next application of the rule find the next and and split it again. Same comment applies to MergeFilter.transform and to PushUpFilterTransformer.check and .transform. 3) In MergeFilter.check: IIRC implicit splits aren't inserted into the plan until the logical to physical transformation. So it's possible that a filter actually has multiple successors. So instead of: {code} if (succeds != null && succeds.size()>0) { if (succeds.get(0) instanceof LOFilter) { return true; } } {code} it should read {code} if (succeds != null && succeds.size() == 1) { if (succeds.get(0) instanceof LOFilter) { return true; } } {code} 4) In MergeFilter.combineFilterCond: The expressions have been written in such a way that they manage their own connections when they are created. See for example, AndExpression. In its constructor it takes it add itself to the expression plan and connects itself to its two operands. So there no need to to do the addPlan.add and addPlan.connect calls. 5) In PushUpFilterTransformer.check, you need to check that the join type is inner. Pushing past outer joins is much trickier, and need not be handled here. 6) In PushUpFilterTransformer.check I don't understand what findCommon is doing. In any case, it should not be paying attention to aliases. It should be using the inputNums from the projection. It should be checking that all projections in the filter are associated with the same inputNum. If so, it is pushable to that inputNum. If not, not. In the same way transform should be using inputNum to find the right predecessor, not aliases. 7) We need a fourth rule to handle swapping filters, so each one can be tried against the join. Since this rule will always pass check (it would just be two filters in a row) we need a way to check that it doesn't run more than twice for a given pair of filters. We can accomplish this by having it 'sign' each filter in the node each time it is applied. This is what the annotate call on Operator is for. So each time the transform is applied, it would annotate both filters with info that it was applied, and to which filters. Then part of check can be two check that this rule has been applied at most twice. > LogicalPlan and Optimizer are too complex and hard to work with > --------------------------------------------------------------- > > Key: PIG-1178 > URL: https://issues.apache.org/jira/browse/PIG-1178 > Project: Pig > Issue Type: Improvement > Reporter: Alan Gates > Assignee: Ying He > Attachments: expressions-2.patch, expressions.patch, lp.patch, > lp.patch, PIG_1178.patch > > > The current implementation of the logical plan and the logical optimizer in > Pig has proven to not be easily extensible. Developer feedback has indicated > that adding new rules to the optimizer is quite burdensome. In addition, the > logical plan has been an area of numerous bugs, many of which have been > difficult to fix. Developers also feel that the logical plan is difficult to > understand and maintain. The root cause for these issues is that a number of > design decisions that were made as part of the 0.2 rewrite of the front end > have now proven to be sub-optimal. The heart of this proposal is to revisit a > number of those proposals and rebuild the logical plan with a simpler design > that will make it much easier to maintain the logical plan as well as extend > the logical optimizer. > See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full > details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.