Swinky commented on a change in pull request #31318:
URL: https://github.com/apache/spark/pull/31318#discussion_r585222944



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -366,6 +363,16 @@ object BooleanSimplification extends Rule[LogicalPlan] 
with PredicateHelper {
             // ((c || ...) && (d || ...)) || a || b
             (common :+ And(ldiff.reduce(Or), rdiff.reduce(Or))).reduce(Or)
           }
+        } else {
+          // No common factors from disjunctive predicates, reduce common 
factor from conjunction
+          val all = splitConjunctivePredicates(left) ++ 
splitConjunctivePredicates(right)

Review comment:
       Many thanks @c21 for suggestion and review.
   I was thinking about this. We can handle this as described in 3.
   For `And(left, right)`
   1. CommonFactorsInDisjunctions
   2. else CommonFactorsInConjunctions.
   3. else Mix of && and ||'s.
   ```
   split lhs on || to get leftOrs.
        if( leftOrs.size > 1)
                split rhs on && to get rightAnds.
                foreach p in leftOrs
                        distinct = Set(p, rightAnds)
                        if (distinct < rightAnds.size + 1)
                                OutputOrs +=  distinct.reduce(And)
                OutputOrs.reduce(||)
        else
                split rhs on || to gt rightOrs
                if( rightOrs.size > 1)
                        split lhs on && to get leftAnds.
                        foreach p in rightOrs
                                distinct = Set(p, leftAnds)
                                if (distinct < rightAnds.size + 1)
                                        OutputOrs +=  distinct.reduce(And)
                        OutputOrs.reduce(||)                                    
  //OutputOrs is a set.
   ```
   
   Given example would be optimized:
   ```
   => (a || b) && (a && b)
   => (a && (a && b)) || (b && (a && b))
   => a && b || a && b
   => a && b
   ```
   But for examples like this: 
   ```
   => (a || b) && (a && c)
   => (a && (a && c)) || (b && (a && c))
   => (a && c) || (a && b && c)
   ```
   `(a && c) || (a && b && c)` will be reduced in next iteration of 
`BooleanSimplification` rule to `a && b && c`.
   Hence handling mix of &&'s and ||'s in this way can be inefficient. Do you 
have any suggestions around how to optimize this in single iteration of 
`BooleanSimplification` rule.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to