maropu commented on a change in pull request #31758:
URL: https://github.com/apache/spark/pull/31758#discussion_r589442582



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1905,11 +1904,13 @@ class Analyzer(override val catalogManager: CatalogManager)
               .getOrElse(u)
           }
           val result = resolved match {
-            // When trimAlias = true, we will trim unnecessary alias of `GetStructField` and
-            // we won't trim the alias of top-level `GetStructField`. Since we will call
-            // CleanupAliases later in Analyzer, trim non top-level unnecessary alias of
-            // `GetStructField` here is safe.
-            case Alias(s: GetStructField, _) if trimAlias && !isTopLevel => s
+            // We trim unnecessary alias of `Get[Array]StructField` here. Note that, we cannot trim
+            // the alias of top-level `Get[Array]StructField`, as we should resolve
+            // `UnresolvedAttribute` to a named expression. The caller side can trim the alias of
+            // top-level `GetStructField` if it's safe to do so. Since we will call CleanupAliases
+            // later in Analyzer, trim non top-level unnecessary alias here is safe.
+            case Alias(s: GetStructField, _) if !isTopLevel => s
+            case Alias(s: GetArrayStructFields, _) if !isTopLevel => s

Review comment:
       It looks like we also need to add entries for the other `ExtractValue` classes, e.g., `GetMapValue`?
   They seem to have the same issue as SPARK-31670:
   ```
   scala> spark.table("t").printSchema()
   root
    |-- c0: integer (nullable = false)
    |-- c1: map (nullable = false)
    |    |-- key: string
    |    |-- value: string (valueContainsNull = true)
   
   
   scala> sql("select c0, c1.key, COUNT(1) from t group by c0, c1.key with 
cube").show()
   org.apache.spark.sql.AnalysisException: expression 't.`c1`' is neither 
present in the group by, nor is it an aggregate function. Add to group by or 
wrap in first() (or first_value) if you don't care which value you get.;
   Aggregate [c0#34, key#35, spark_grouping_id#33L], [c0#34, c1#24[key] AS 
key#28, count(1) AS count(1)#30L]
   +- Expand [List(c0#23, c1#24, c0#31, key#32, 0), List(c0#23, c1#24, c0#31, 
null, 1), List(c0#23, c1#24, null, key#32, 2), List(c0#23, c1#24, null, null, 
3)], [c0#23, c1#24, c0#34, key#35, spark_grouping_id#33L]
      +- Project [c0#23, c1#24, c0#23 AS c0#31, c1#24[key] AS key#32]
         +- SubqueryAlias t
            +- View (`t`, [c0#23,c1#24])
               +- Project [cast(col1#25 as int) AS c0#23, cast(col2#26 as map<string,string>) AS c1#24]
                  +- Project [col1#25, col2#26]
                     +- LocalRelation [col1#25, col2#26]
   ```
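
   For concreteness, the additional arms might look something like the sketch below. This only mirrors the two non-top-level cases in the diff above for the remaining `ExtractValue` implementations; `GetMapValue` and `GetArrayItem` are the actual catalyst classes, but these exact arms are hypothetical, not part of this PR, and whether trimming is actually safe for them still needs to be verified:
   ```
   val result = resolved match {
     // Existing cases from this PR: trim non-top-level aliases of struct field access.
     case Alias(s: GetStructField, _) if !isTopLevel => s
     case Alias(s: GetArrayStructFields, _) if !isTopLevel => s
     // Hypothetical additions, following the same pattern as above:
     case Alias(m: GetMapValue, _) if !isTopLevel => m
     case Alias(a: GetArrayItem, _) if !isTopLevel => a
     case other => other
   }
   ```
   If we go this way, it would be good to also add tests like the cube query above for map value and array element access.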



