[GitHub] [spark] jchen5 commented on a diff in pull request #43111: [SPARK-36112] [SQL] Support correlated exists subqueries using DecorrelateInnerQuery framework

via GitHub Thu, 28 Sep 2023 12:05:53 -0700


jchen5 commented on code in PR #43111:
URL: https://github.com/apache/spark/pull/43111#discussion_r1340545294



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -1397,6 +1400,10 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
           failOnInvalidOuterReference(l)
           checkPlan(input, aggregated, canContainOuter)
 
+        case o @ Offset(_, input) =>

Review Comment:
   How does this change relate, or is it a separate change to enable offset?



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -3416,6 +3416,14 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val DECORRELATE_EXISTS_AND_IN_SUBQUERIES =
+    buildConf("spark.sql.optimizer.decorrelateExistsIn.enabled")
+      .internal()
+      .doc("Decorrelate EXISTS and IN subqueries.")
+      .version("3.4.0")

Review Comment:
   I think we're on 4.0.0 now.



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala:
##########
@@ -1134,8 +1134,11 @@ trait CheckAnalysis extends PredicateHelper with 
LookupCatalog with QueryErrorsB
       isLateral: Boolean = false): Unit = {
     // Some query shapes are only supported with the DecorrelateInnerQuery 
framework.
     // Currently we only use this new framework for scalar and lateral 
subqueries.

Review Comment:
   Delete this line



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/DecorrelateInnerQuery.scala:
##########
@@ -710,6 +711,12 @@ object DecorrelateInnerQuery extends PredicateHelper {
           case a @ Aggregate(groupingExpressions, aggregateExpressions, child) 
=>
             val outerReferences = collectOuterReferences(a.expressions)
             val newOuterReferences = parentOuterReferences ++ outerReferences
+            // Find all the aggregate expressions that are subject to the 
"COUNT bug",
+            // i.e. those that have non-None default result.
+            val countBugSusceptibleAggs = 
aggregateExpressions.flatMap(_.collect {
+              case a@AggregateExpression(function, _, _, _, _)
+                if function.defaultResult.nonEmpty => a

Review Comment:
   Just checking the function's default result doesn't work for some more 
complicated cases, such as where there are nested subqueries:
   ```
   select (
     select sum(cnt)
     from (select count(*) cnt from t2 where t1.c1 = t2.c1)
   ) from t1
   ```
   or:
   ```
   select (
      select sum(a) from (
        select a from t2 where t1.c1 = t2.c1 UNION ALL select 1 as a
     )
   ) from t1
   ```
   The subquery is subject to the count bug even though the sum expression at 
the top defaults to NULL.
   
   We have logic for this at `evalSubqueryOnZeroTups` and 
`evalAggExprOnZeroTups` used below.
   
   But I'm not sure how that fits into the context here - what case motivated 
this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] jchen5 commented on a diff in pull request #43111: [SPARK-36112] [SQL] Support correlated exists subqueries using DecorrelateInnerQuery framework

Reply via email to