[GitHub] [spark] cloud-fan commented on a diff in pull request #40811: [SPARK-43098][SQL] Fix correctness COUNT bug when scalar subquery has group by clause

via GitHub Tue, 18 Apr 2023 18:35:31 -0700


cloud-fan commented on code in PR #40811:
URL: https://github.com/apache/spark/pull/40811#discussion_r1170717475



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala:
##########
@@ -254,13 +254,20 @@ object SubExprUtils extends PredicateHelper {
  * scalar subquery during planning.
  *
  * Note: `exprId` is used to have a unique name in explain string output.
+ *
+ * `mayHaveCountBug` is whether it's possible for the subquery to evaluate to non-null on
+ * empty input (zero tuples). It is false if the subquery has a GROUP BY clause, because in that
+ * case the subquery yields no row at all on empty input to the GROUP BY, which evaluates to NULL.
+ * It is set in PullupCorrelatedPredicates to true/false, before it is set its value is None.
+ * See constructLeftJoins in RewriteCorrelatedScalarSubquery for more details.
  */
 case class ScalarSubquery(
     plan: LogicalPlan,
     outerAttrs: Seq[Expression] = Seq.empty,
     exprId: ExprId = NamedExpression.newExprId,
     joinCond: Seq[Expression] = Seq.empty,
-    hint: Option[HintInfo] = None)
+    hint: Option[HintInfo] = None,
+    mayHaveCountBug: Option[Boolean] = None)

Review Comment:
   How about making the name easier to understand, e.g. `hasGlobalAggregate`?
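A minimal plain-Scala sketch of why the flag matters. This is not Spark code: `CountBugSketch`, `correlated`, and `leftJoinRewrite` are hypothetical names, rows are reduced to integer keys, and the "rewrite" is a left join against a pre-aggregated map. It only approximates the decorrelated plan that the doc comment attributes to constructLeftJoins. The sketch models the subquery `SELECT COUNT(*) FROM inner WHERE inner.k = outer.k`, with and without `GROUP BY inner.k`. Without GROUP BY (a global aggregate), the subquery returns one row even on zero tuples, so the left join's NULL must be patched back to 0. With GROUP BY, it returns no row, NULL is already the right answer, and patching it would be wrong.

```scala
object CountBugSketch {
  // Reference semantics: evaluate the correlated subquery once per outer key.
  // hasGroupBy = false -> global aggregate: exactly one row, COUNT(*) = 0 on empty input.
  // hasGroupBy = true  -> one row per group: no row on empty input, so the scalar is NULL (None).
  def correlated(outer: Seq[Int], inner: Seq[Int], hasGroupBy: Boolean): Seq[Option[Long]] =
    outer.map { k =>
      val matching = inner.filter(_ == k)
      if (hasGroupBy) matching.groupBy(identity).values.headOption.map(_.size.toLong)
      else Some(matching.size.toLong)
    }

  // Hypothetical decorrelated rewrite: aggregate inner once per key, then left-join on the key.
  // Outer keys with no match come back as NULL from the join.
  def leftJoinRewrite(outer: Seq[Int], inner: Seq[Int], mayHaveCountBug: Boolean): Seq[Option[Long]] = {
    val counts: Map[Int, Long] = inner.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
    // Value the aggregate takes on zero tuples: COUNT(*) over nothing is 0.
    val onEmptyInput: Option[Long] = Some(0L)
    // Patch the join's NULL only when the subquery can be non-null on empty input.
    outer.map(k => counts.get(k).orElse(if (mayHaveCountBug) onEmptyInput else None))
  }

  def main(args: Array[String]): Unit = {
    val outer = Seq(1, 2, 3)
    val inner = Seq(1, 1, 2) // key 3 has no matching inner rows
    println(correlated(outer, inner, hasGroupBy = false))
    println(leftJoinRewrite(outer, inner, mayHaveCountBug = true))
    println(correlated(outer, inner, hasGroupBy = true))
    println(leftJoinRewrite(outer, inner, mayHaveCountBug = false))
  }
}
```

Running `main` shows the two strategies agreeing only when the flag matches the query shape. Key 3 yields `Some(0)` for the global aggregate and `None` for the grouped one. Since the flag here is true exactly when there is no GROUP BY, a name describing the aggregate's shape, such as the suggested `hasGlobalAggregate`, may convey the condition more directly than `mayHaveCountBug`.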



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
