[spark] branch master updated: [SPARK-41985][SQL][FOLLOWUP] Remove alias in GROUP BY only when the expr is resolved

maxgekk Fri, 03 Feb 2023 04:40:58 -0800

This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 02b39f0b880 [SPARK-41985][SQL][FOLLOWUP] Remove alias in GROUP BY only when the expr is resolved
02b39f0b880 is described below

commit 02b39f0b880a2ecf63167355d9644e91c98588a8
Author: Wenchen Fan <wenc...@databricks.com>
AuthorDate: Fri Feb 3 15:40:33 2023 +0300

    [SPARK-41985][SQL][FOLLOWUP] Remove alias in GROUP BY only when the expr is resolved
    
    ### What changes were proposed in this pull request?
    
    This is a followup of https://github.com/apache/spark/pull/39508 to fix a regression. We should not remove aliases from grouping expressions if they are not resolved, as the alias may be necessary for resolution, for example when `CreateNamedStruct` takes a struct field name from the alias.
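The failure mode can be reproduced with a toy model. This is a minimal sketch, not Catalyst code: `Expr`, `NamedStruct`, `NamePlaceholder`, `resolveRefs`, `resolveNames`, `trimIfResolved` and `analyze` are hypothetical stand-ins. It assumes, following the `NamePlaceHolder` comment in the patch, that a separate rule fills in the struct field name from the alias only after the aliased child is resolved. The rules are run to a fixed point twice: once with unconditional trimming (the old `map(trimAliases)`) and once with trimming only when the expression is resolved.

```scala
// Toy expression tree (hypothetical, NOT Spark's classes).
sealed trait Expr { def resolved: Boolean }
case class UnresolvedAttr(name: String) extends Expr { def resolved = false }
case class Attr(name: String) extends Expr { def resolved = true }
case class Lit(value: String) extends Expr { def resolved = true }
case class Add(left: Expr, right: Expr) extends Expr {
  def resolved = left.resolved && right.resolved
}
case class Alias(child: Expr, name: String) extends Expr { def resolved = child.resolved }
// Stand-in for a struct field name that must later be taken from an alias.
case object NamePlaceholder extends Expr { def resolved = false }
case class NamedStruct(fields: Seq[(Expr, Expr)]) extends Expr {
  def resolved = fields.forall { case (n, v) => n.resolved && v.resolved }
}

// Apply `f` to a node first, then recurse into the children of the result.
def transformDown(e: Expr)(f: PartialFunction[Expr, Expr]): Expr =
  f.applyOrElse(e, identity[Expr]) match {
    case Add(l, r)       => Add(transformDown(l)(f), transformDown(r)(f))
    case Alias(c, n)     => Alias(transformDown(c)(f), n)
    case NamedStruct(fs) =>
      NamedStruct(fs.map { case (n, v) => (transformDown(n)(f), transformDown(v)(f)) })
    case leaf            => leaf
  }

val columns = Set("a", "b") // hypothetical input columns

// Stand-in for resolving column references in the grouping expressions.
def resolveRefs(e: Expr): Expr =
  transformDown(e) { case UnresolvedAttr(n) if columns(n) => Attr(n) }

// Stand-in for the separate rule that names a struct field from a resolved alias.
def resolveNames(e: Expr): Expr = transformDown(e) {
  case NamedStruct(fs) => NamedStruct(fs.map {
    case (NamePlaceholder, a @ Alias(_, name)) if a.resolved => (Lit(name), a)
    case other => other
  })
}

def trimAliases(e: Expr): Expr = transformDown(e) { case Alias(c, _) => c }
// The behaviour introduced by this patch: trim only once the expression is resolved.
def trimIfResolved(e: Expr): Expr = if (e.resolved) trimAliases(e) else e

// Re-run the rules, like an analyzer batch, until the tree stops changing.
def analyze(e: Expr, trim: Expr => Expr, maxIters: Int = 10): Expr = {
  var current = e
  var iters = 0
  var changed = true
  while (changed && iters < maxIters) {
    val next = resolveNames(trim(resolveRefs(current)))
    changed = next != current
    current = next
    iters += 1
  }
  current
}

// GROUP BY struct(a + 0.1 AS aa): the field name is not known until `a` resolves.
val groupBy = NamedStruct(Seq(NamePlaceholder -> Alias(Add(UnresolvedAttr("a"), Lit("0.1")), "aa")))

val oldResult = analyze(groupBy, trimAliases)    // unconditional trimming
val newResult = analyze(groupBy, trimIfResolved) // conditional trimming
println(s"old: $oldResult resolved=${oldResult.resolved}")
println(s"new: $newResult resolved=${newResult.resolved}")
```

With unconditional trimming, the first pass removes `AS aa` before the field name is filled in, so the placeholder can never be resolved. With conditional trimming, the alias survives the first pass, supplies the name, and is trimmed on the next iteration, which matches the "removed eventually, by following iterations" note in the patch.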
    
    ### Why are the changes needed?
    
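    Fix a regression introduced by https://github.com/apache/spark/pull/39508, which trimmed aliases from all grouping expressions, including unresolved ones.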
    
    ### Does this PR introduce _any_ user-facing change?
    
    no
    
    ### How was this patch tested?
    
    New test in `sql-tests/inputs/group-by.sql`.
    
    Closes #39867 from cloud-fan/column.
    
    Lead-authored-by: Wenchen Fan <wenc...@databricks.com>
    Co-authored-by: Wenchen Fan <cloud0...@gmail.com>
    Signed-off-by: Max Gekk <max.g...@gmail.com>
---
 .../sql/catalyst/analysis/ResolveReferencesInAggregate.scala  |  8 +++++++-
 sql/core/src/test/resources/sql-tests/inputs/group-by.sql     |  3 +++
 .../src/test/resources/sql-tests/results/group-by.sql.out     | 11 +++++++++++
 3 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala
index 4af2ecc91ab..1a9ed4ce16e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveReferencesInAggregate.scala
@@ -96,7 +96,13 @@ object ResolveReferencesInAggregate extends SQLConfHelper
       // can't find the grouping expressions via `semanticEquals` and the analysis will fail.
       // Example rules: ResolveGroupingAnalytics (See SPARK-31670 for more details) and
       // ResolveLateralColumnAliasReference.
-      groupingExpressions = resolvedGroupExprs.map(trimAliases),
+      groupingExpressions = resolvedGroupExprs.map { e =>
+        // Only trim the alias if the expression is resolved, as the alias may be needed to resolve
+        // the expression, such as `NamePlaceHolder` in `CreateNamedStruct`.
+        // Note: this rule will be invoked even if the Aggregate is fully resolved. So alias in
+        //       GROUP BY will be removed eventually, by following iterations.
+        if (e.resolved) trimAliases(e) else e
+      },
       aggregateExpressions = resolvedAggExprsWithOuter)
   }
 
diff --git a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql
index 1615c43cc7e..c812403ba2c 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/group-by.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/group-by.sql
@@ -34,6 +34,9 @@ SELECT a + b, COUNT(b) FROM testData GROUP BY a + b;
 SELECT a + 2, COUNT(b) FROM testData GROUP BY a + 1;
 SELECT a + 1 + 1, COUNT(b) FROM testData GROUP BY a + 1;
 
+-- struct() in group by
+SELECT count(1) FROM testData GROUP BY struct(a + 0.1 AS aa);
+
 -- Aggregate with nulls.
 SELECT SKEWNESS(a), KURTOSIS(a), MIN(a), MAX(a), AVG(a), VARIANCE(a), STDDEV(a), SUM(a), COUNT(a)
 FROM testData;
diff --git a/sql/core/src/test/resources/sql-tests/results/group-by.sql.out b/sql/core/src/test/resources/sql-tests/results/group-by.sql.out
index 0402039fafa..6e7592d6978 100644
--- a/sql/core/src/test/resources/sql-tests/results/group-by.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/group-by.sql.out
@@ -145,6 +145,17 @@ struct<((a + 1) + 1):int,count(b):bigint>
 NULL   1
 
 
+-- !query
+SELECT count(1) FROM testData GROUP BY struct(a + 0.1 AS aa)
+-- !query schema
+struct<count(1):bigint>
+-- !query output
+2
+2
+2
+3
+
+
 -- !query
 SELECT SKEWNESS(a), KURTOSIS(a), MIN(a), MAX(a), AVG(a), VARIANCE(a), STDDEV(a), SUM(a), COUNT(a)
 FROM testData

