maropu commented on a change in pull request #24118: [SPARK-26736][SQL] if
filter condition `And` has non-determined sub function it does not do partition
prunning
URL: https://github.com/apache/spark/pull/24118#discussion_r269831064
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
##########
@@ -91,6 +98,54 @@ object PhysicalOperation extends PredicateHelper {
.map(Alias(_, a.name)(a.exprId, a.qualifier)).getOrElse(a)
}
}
+
+ /**
+ * Extract the deterministic expressions in non-deterministic expressions, i.e. 'And' and 'Or'.
+ *
+ * Example input:
+ * {{{
+ * col = 1 and rand() < 1
+ * (col1 = 1 and rand() < 1) and col2 = 1
+ * col1 = 1 or rand() < 1
+ * (col1 = 1 and rand() < 1) or (col2 = 1 and rand() < 1)
Review comment:
IMO we don't need to handle the case `(col1 = 1 and rand() < 1) or (col2 = 1
and rand() < 1)` in this PR, because DNF forms should be handled in separate
normalization logic (e.g., SPARK-6624). So I think it's OK to handle only CNF
forms here. In fact, I think we should keep the same semantics as
[PushDownPredicate](https://github.com/apache/spark/blob/39577a27a0b58fd75b41d24b10012447748b7ee9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L1031).
cc: @gatorsmile @cloud-fan
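
To make the "CNF only" suggestion concrete, here is a minimal sketch on a toy expression tree. It is not the Catalyst API: `Expr`, `Pred`, `And`, `Or`, and `CnfOnly` are hypothetical stand-ins defined only for this illustration, and the sketch assumes that "handling CNF forms only" means splitting the condition on its top-level `And` nodes and keeping the conjuncts that are fully deterministic. It ignores any evaluation-order concerns around non-deterministic predicates.

```scala
// Toy expression tree (hypothetical; stands in for Catalyst expressions).
sealed trait Expr { def deterministic: Boolean }

// A leaf predicate, e.g. "col = 1" or "rand() < 1".
final case class Pred(sql: String, deterministic: Boolean = true) extends Expr

// And/Or are deterministic only if both children are.
final case class And(left: Expr, right: Expr) extends Expr {
  def deterministic: Boolean = left.deterministic && right.deterministic
}
final case class Or(left: Expr, right: Expr) extends Expr {
  def deterministic: Boolean = left.deterministic && right.deterministic
}

object CnfOnly {
  // Flatten nested And nodes into the list of top-level conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // Keep only the conjuncts that are entirely deterministic.
  // An Or with any non-deterministic branch is dropped as a whole,
  // so DNF-shaped inputs are deliberately left unhandled.
  def deterministicConjuncts(e: Expr): Seq[Expr] =
    splitConjuncts(e).filter(_.deterministic)
}
```

Under these assumptions, `col = 1 and rand() < 1` yields `col = 1`, and `(col1 = 1 and rand() < 1) and col2 = 1` yields `col1 = 1` and `col2 = 1`. Both `col1 = 1 or rand() < 1` and the DNF example above yield nothing, leaving them to a separate normalization step.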