Swinky commented on a change in pull request #31318:
URL: https://github.com/apache/spark/pull/31318#discussion_r585222944
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
##########
@@ -366,6 +363,16 @@ object BooleanSimplification extends Rule[LogicalPlan]
with PredicateHelper {
// ((c || ...) && (d || ...)) || a || b
(common :+ And(ldiff.reduce(Or), rdiff.reduce(Or))).reduce(Or)
}
+ } else {
+ // No common factors from disjunctive predicates, reduce common
factor from conjunction
+ val all = splitConjunctivePredicates(left) ++
splitConjunctivePredicates(right)
Review comment:
Many thanks @c21 for suggestion and review.
I was thinking about this. We can handle this as described in 3.
For `And(left, right)`
1. CommonFactorsInDisjunctions
2. else CommonFactorsInConjunctions.
3. else Mix of && and ||'s.
```
split lhs on || to get leftOrs.
if( leftOrs.size > 1)
split rhs on && to get rightAnds.
foreach p in leftOrs
distinct = Set(p, rightAnds)
if (distinct < rightAnds.size + 1)
OutputOrs += distinct.reduce(And)
OutputOrs.reduce(||)
else
split rhs on || to gt rightOrs
if( rightOrs.size > 1)
split lhs on && to get leftAnds.
foreach p in rightOrs
distinct = Set(p, leftAnds)
if (distinct < rightAnds.size + 1)
OutputOrs += distinct.reduce(And)
OutputOrs.reduce(||)
//OutputOrs is a set.
```
Given example would be optimized:
```
=> (a || b) && (a && b)
=> (a && (a && b)) || (b && (a && b))
=> a && b || a && b
=> a && b
```
But for examples like this:
```
=> (a || b) && (a && c)
=> (a && (a && c)) || (b && (a && c))
=> (a && c) || (a && b && c)
```
`(a && c) || (a && b && c)` will be reduced in next iteration of
`BooleanSimplification` rule to `a && b && c`.
Hence handling mix of &&'s and ||'s in this way can be inefficient. Do you
have any suggestions around how to optimize this in single iteration of
`BooleanSimplification` rule.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]