[jira] Commented: (PIG-1399) Logical Optimizer: Expression optimizor rule
[ https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904808#action_12904808 ] Alan Gates commented on PIG-1399: - [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] Logical Optimizer: Expression optimizor rule Key: PIG-1399 URL: https://issues.apache.org/jira/browse/PIG-1399 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: newPatchFindbugsWarnings.html, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch We can optimize expression in several ways: 1. Constant pre-calculation Example: B = filter A by a0 5+7; = B = filter A by a0 12; 2. Boolean expression optimization Example: B = filter A by not (not(a05) or a10); = B = filter A by a05 and a=10; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1399) Logical Optimizer: Expression optimizor rule
[ https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12904356#action_12904356 ] Alan Gates commented on PIG-1399: - {code} [exec] [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] -1 findbugs. The patch appears to introduce 2 new Findbugs warnings. {code} I'll attach the results of findbugs separately. Logical Optimizer: Expression optimizor rule Key: PIG-1399 URL: https://issues.apache.org/jira/browse/PIG-1399 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 Attachments: PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch, PIG-1399.patch We can optimize expression in several ways: 1. Constant pre-calculation Example: B = filter A by a0 5+7; = B = filter A by a0 12; 2. Boolean expression optimization Example: B = filter A by not (not(a05) or a10); = B = filter A by a05 and a=10; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1399) Logical Optimizer: Expression optimizor rule
[ https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887912#action_12887912 ] Yan Zhou commented on PIG-1399: --- A couple of additional scenarios to be simplified: 5. equality Example: B = filter A by (a0 5 and a0 5); = B = filter A by a0 5; 6. complementary OR Example: B = filter A by (a0 5 OR a0 = 5); = the filtering is removed Note that by themselves they both look straightforward and may have little value. But used after other simplification rules, it could simplify the end results further but could be not obviously applicable at first place. Logical Optimizer: Expression optimizor rule Key: PIG-1399 URL: https://issues.apache.org/jira/browse/PIG-1399 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Yan Zhou Fix For: 0.8.0 We can optimize expression in several ways: 1. Constant pre-calculation Example: B = filter A by a0 5+7; = B = filter A by a0 12; 2. Boolean expression optimization Example: B = filter A by not (not(a05) or a10); = B = filter A by a05 and a=10; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1399) Logical Optimizer: Expression optimizor rule
[ https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887923#action_12887923 ] Yan Zhou commented on PIG-1399: --- This work is not to optimize on the generic boolean logics, but rather to simplify the logic expression based upon the constant values as the logical expression's operands. The former, e.g., would change an boolean expression of ((A AND B) OR (A AND C)) to (A AND (B OR C)); while the latter will change, say, (a0 5 and a0 7) to (a0 7). It is, therefore, up to the query writer to optimize his/her boolean logic, probably through use of some other tools. The algorithm works in a series of steps in order: 1) a constant expression evaluation visitor that will evaluate the constant expressions. It works by traversing the expression tree in a bottom-up manner and evaluate all subexpressions that have all constant subexpressions. All results from constant children are pushed to a stack for the parent to digest for its own evaluation. Any non-constant expression will push a null to the stack and consequently will cause all of its ancestors not to be evaluated. For simplicity, only constant binary expressions and constant unary expressions and evaluated. More complex expressions are not evaluated at this moment. For UDF, this evaluation is not planned at all for fear of possible stateful consequences resulting from the original evaluations; 2) A NOT conversion visitor that will traverse the expression tree in a depth-first manner with post-order handling. A status of negativity for a NOT expression is recorded in the depth-first traversal before subtree traversal and reversed after traversing the subtree. All reversible expressions is replaced by its negated counterpart for negative negativity. Currently equality ops, and its non-equality couter part, all range comparisons, logical AND and OR are reversible. Notably missing is the is null for lack of a is not null base expression; 3) A DNF plan is generated, through a helper DNFPlanGenerator visitor class, whose disjunctions are either of OrExpression or a new DNFExpression with type of OR, whose conjunctions are either of AndExpression or the new DNFExpression with type of AND. The introduction of the new DNFExpression, which extends LogiclExpression, is to support multiple children (vs. the two children in a BinaryExpression) to facilitate the processing of multiple children of an OR or AND operator due to the commutative property of the two operators. The leaves of the DNF are of a new LogiclExpressionProxy type that extends the LogicalExpression. This new type is to be used as a proxy toward the original leaf expression in the original filter plan. The purpose is to track how often an original expression has been put in the DNF plan as result of the normalizing process. Consequently, a DNFSplitCounter member is added to the LogicalExpression, which is incremented once a new proxy is created on the original expression. Due to the potentially exponential growth of the DNF plan, and the nonlinear complexity to trim the DNF plan (see 4 below), the size of the DNF plan is limited to 100 nodes beyind which the simplification beyond step 2) are just skipped; 4) Then the DNF plan is trimmed according to the inferrence rules between the operands of the conjunctions first, and then between the operands of the disjuction in the DNF plan. If a leaf is trimmed, the counter, DNFSpliCounter, of the source of the proxy will be decremented. Basically, the DNF plan is used as a utility to determine if an original leaf expression can be trimmed from the original filter plan or not. If all proxies of the original leaf expression have been trimmed from the DNF plan, the original leaf expression can be trimmed from the original plan then. The point is that the DNF plan is not intended to replace the original filer plan since the DNF plan in general tends to be more expensive to evaluate than the original filter plan. 5) The original filter plan is traversed in a bottom-up manner so that if a leaf's DNFSpliCounter is zero, which means all of its proxies on DNF has been trimmed, the leaf will be trimmed. For nonleafs of AND or OR expressions, if one child survives, the child will be relinked to the predecessor(s). If either or both children are trimmed, the nonleaf will be trimmed too. If the whole new filter plan is empty, the filter operator will be removed from the logical plan too. Using a example of B = filter A by NOT((a0 1) or (a1 3 and a0 3+2)): After 1), the filter plan becomes NOT((a0 1) or (a1 3 and a0 5)); After 2), the filter plan becomes (a0 = 1) AND ((a1 = 3) OR (a0 = 5)); After 3), the DNF plan is ((a0 = 1) AND (a1 = 3)) OR ((a0 = 1) AND (a0 = 5)); After 4), the DNF plan becomes a0 = 1; After 5), the filter plan becomes a0 = 1.
[jira] Commented: (PIG-1399) Logical Optimizer: Expression optimizor rule
[ https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12883348#action_12883348 ] Yan Zhou commented on PIG-1399: --- Other expression optimizations include: 3. Optimization of erasure of logical implicated expression in AND Example: B = filter A by (a0 5 and a0 7); = B = filter A by a0 7; 4. Optimization of erasure of logical implicated expression in OR Example: B = filter A by ((a0 5) or (a0 6 and a1 15); = B = filter C by a0 5; A comprehensive example of 2, 3 and 4 optimizations is: B = filter A by NOT((a0 1 and a0 0) or (a1 3 and a0 5)); = B = filter A by a0 = 1; Logical Optimizer: Expression optimizor rule Key: PIG-1399 URL: https://issues.apache.org/jira/browse/PIG-1399 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Yan Zhou We can optimize expression in several ways: 1. Constant pre-calculation Example: B = filter A by a0 5+7; = B = filter A by a0 12; 2. Boolean expression optimization Example: B = filter A by not (not(a05) or a10); = B = filter A by a05 and a=10; -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.