Hi,

I am interested in implementing logical optimization rules, and to that end I have studied the currently implemented logical rules and the rule framework. In particular, I found that the rules dealing with LOFilter cannot handle complicated boolean expressions. I would like to share suggestions to improve the handling of boolean expressions in LOFilter to enable better optimization.
1. SplitFilter Rule: The SplitFilter rule splits one LOFilter into two at a top-level "AND". However, it cannot split an LOFilter whose top-level operator is "OR". For example:

*Example script:*

    A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
    B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
    C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
    J1 = JOIN B by b1, C by c1;
    J2 = JOIN J1 by $0, A by a1;
    D = Filter J2 by ( (c1 < 10) AND (a3+b3 > 10) ) OR (c2 == 5);
    explain D;

In the above example, the current rule is not able to push any filter condition across any join, because the expression contains columns from all branches (inputs). But if we convert this expression into Conjunctive Normal Form (CNF), we would be able to push the conjunct built from c1 < 10 and c2 == 5 below both joins, since it references only columns of C. The other conjunct still references columns of A, B and C, so it stays above the joins. Here is the CNF expression for the filter condition in D:

    ( (c1 < 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )

*Suggestions:* It would be a good idea to convert LOFilter's boolean expression into CNF; it would then be easy to push parts (conjuncts) of the LOFilter boolean expression selectively. I have started thinking about the design for implementing this conversion (arbitrary boolean expression to CNF) and would appreciate any feedback or ideas.

Thanks!
Swati
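
P.S. To make the proposed conversion concrete, here is a minimal sketch of distributing OR over AND on a small expression tree. It is written in Python purely for illustration and does not use any of Pig's classes: the Atom, And and Or types and the to_cnf, conjuncts and show helpers are all hypothetical names. It also assumes the expression contains no NOT; a real implementation would first have to push NOT down to the leaves (De Morgan), and would need some guard because CNF can grow exponentially in the worst case.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical mini expression tree, NOT Pig's LogicalExpression classes.


@dataclass(frozen=True)
class Atom:
    text: str  # an opaque predicate such as "c1 < 10"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Atom, And, Or]


def to_cnf(e: Expr) -> Expr:
    """Distribute OR over AND until no AND sits below an OR."""
    if isinstance(e, Atom):
        return e
    left, right = to_cnf(e.left), to_cnf(e.right)
    if isinstance(e, And):
        return And(left, right)
    # e is an Or: (x AND y) OR r  ==>  (x OR r) AND (y OR r)
    if isinstance(left, And):
        return And(to_cnf(Or(left.left, right)), to_cnf(Or(left.right, right)))
    # l OR (x AND y)  ==>  (l OR x) AND (l OR y)
    if isinstance(right, And):
        return And(to_cnf(Or(left, right.left)), to_cnf(Or(left, right.right)))
    return Or(left, right)


def conjuncts(e: Expr) -> List[Expr]:
    """Flatten the top-level ANDs into a list of independently pushable parts."""
    if isinstance(e, And):
        return conjuncts(e.left) + conjuncts(e.right)
    return [e]


def show(e: Expr) -> str:
    """Render an expression with explicit parentheses."""
    if isinstance(e, Atom):
        return "(" + e.text + ")"
    op = " AND " if isinstance(e, And) else " OR "
    return "(" + show(e.left) + op + show(e.right) + ")"


# The filter condition from D in the example script.
cond = Or(And(Atom("c1 < 10"), Atom("a3+b3 > 10")), Atom("c2 == 5"))
cnf_conjuncts = [show(c) for c in conjuncts(to_cnf(cond))]
for c in cnf_conjuncts:
    print(c)
```

For the condition in D this prints the two conjuncts ((c1 < 10) OR (c2 == 5)) and ((a3+b3 > 10) OR (c2 == 5)). A rule could then check, for each conjunct separately, which inputs its columns come from and push only those conjuncts that depend on a single branch.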