agubichev commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1345926575


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -461,6 +462,23 @@ object DecorrelateInnerQuery extends PredicateHelper {
       p.mapChildren(rewriteDomainJoins(outerPlan, _, conditions))
   }
 
+  private def isCountBugFree(aggregateExpressions: Seq[NamedExpression]): Boolean = {
+    // The COUNT bug only appears if an aggregate expression returns a non-NULL result on an empty
+    // input.
+    // Typical example (hence the name) is COUNT(*) that returns 0 from an empty result.
+    // However, SUM(x) IS NULL is another case that returns 0, and in general any IS/NOT IS and CASE
+    // expressions are suspect (and the combination of those).
+    // For now we conservatively accept only those expressions that are guaranteed to be safe.
+    val exprsRejectEmptyInput = aggregateExpressions.map {

Review Comment:
   For EXISTS and IN subqueries, we did not detect the count bug before, hence the incorrect results.
   For scalar subqueries, the count bug is currently detected by a rather convoluted post-processing step on the scalar subquery. I plan to refactor that path to use this function in the future, since it seems simpler and more straightforward.
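
To make the failure mode concrete, here is a minimal sketch in plain Scala collections, with no Spark dependency. It is a toy model rather than the optimizer's code: `CountBugSketch`, `outer`, `inner`, `Agg`, and `isSafe` are all hypothetical names, and `isSafe` only mimics the idea of accepting aggregates that return NULL on empty input. It models `SELECT * FROM t1 WHERE EXISTS (SELECT COUNT(*) FROM t2 WHERE t2.a = t1.a)`.

```scala
// Toy model of the COUNT bug; every name here is hypothetical, not Spark's API.
object CountBugSketch {
  val outer: Seq[Int] = Seq(1, 2, 3) // values of t1.a
  val inner: Seq[Int] = Seq(1, 1, 3) // values of t2.a

  // Reference semantics: evaluate the correlated subquery once per outer row.
  // COUNT(*) without GROUP BY always yields exactly one row (possibly 0),
  // so EXISTS over it is true for every outer row.
  def existsPerRow(a: Int): Boolean = {
    val subqueryResult: Seq[Long] = Seq(inner.count(_ == a).toLong)
    subqueryResult.nonEmpty
  }
  val correct: Seq[Int] = outer.filter(existsPerRow)

  // Naive decorrelation: GROUP BY t2.a, then semi-join on the key.
  // A key with no inner rows produces no group, so its outer row is dropped.
  val grouped: Map[Int, Long] =
    inner.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
  val naive: Seq[Int] = outer.filter(grouped.contains)

  // Simplified stand-in for the safety check: an aggregate is treated as safe
  // only if it returns NULL (modelled as None) on empty input.
  sealed trait Agg { def onEmpty: Option[Long] }
  case object Count extends Agg { val onEmpty: Option[Long] = Some(0L) }
  case object Sum extends Agg { val onEmpty: Option[Long] = None }

  def isSafe(aggs: Seq[Agg]): Boolean = aggs.forall(_.onEmpty.isEmpty)

  def main(args: Array[String]): Unit = {
    println(s"correct = $correct")
    println(s"naive   = $naive")
    println(s"isSafe(Count) = ${isSafe(Seq(Count))}")
    println(s"isSafe(Sum)   = ${isSafe(Seq(Sum))}")
  }
}
```

In this model the naive rewrite loses the outer row with `a = 2`, which has no match in `t2`, while per-row evaluation keeps it. The sketch rejects `COUNT` and accepts a bare `SUM`; it does not model wrapped expressions such as `SUM(x) IS NULL`, which the code comment above also flags as suspect.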



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

