maropu commented on a change in pull request #31758:
URL: https://github.com/apache/spark/pull/31758#discussion_r589442582
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1905,11 +1904,13 @@ class Analyzer(override val catalogManager: CatalogManager)
.getOrElse(u)
}
       val result = resolved match {
-        // When trimAlias = true, we will trim unnecessary alias of `GetStructField` and
-        // we won't trim the alias of top-level `GetStructField`. Since we will call
-        // CleanupAliases later in Analyzer, trim non top-level unnecessary alias of
-        // `GetStructField` here is safe.
-        case Alias(s: GetStructField, _) if trimAlias && !isTopLevel => s
+        // We trim unnecessary alias of `Get[Array]StructField` here. Note that, we cannot trim
+        // the alias of top-level `Get[Array]StructField`, as we should resolve
+        // `UnresolvedAttribute` to a named expression. The caller side can trim the alias of
+        // top-level `GetStructField` if it's safe to do so. Since we will call CleanupAliases
+        // later in Analyzer, trim non top-level unnecessary alias here is safe.
+        case Alias(s: GetStructField, _) if !isTopLevel => s
+        case Alias(s: GetArrayStructFields, _) if !isTopLevel => s
Review comment:
It looks like we also need to add entries for the other `ExtractValue` classes, e.g., `GetMapValue`?
They seem to have the same issue as SPARK-31670:
```
scala> spark.table("t").printSchema()
root
|-- c0: integer (nullable = false)
|-- c1: map (nullable = false)
| |-- key: string
| |-- value: string (valueContainsNull = true)
scala> sql("select c0, c1.key, COUNT(1) from t group by c0, c1.key with cube").show()
org.apache.spark.sql.AnalysisException: expression 't.`c1`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
Aggregate [c0#34, key#35, spark_grouping_id#33L], [c0#34, c1#24[key] AS key#28, count(1) AS count(1)#30L]
+- Expand [List(c0#23, c1#24, c0#31, key#32, 0), List(c0#23, c1#24, c0#31, null, 1), List(c0#23, c1#24, null, key#32, 2), List(c0#23, c1#24, null, null, 3)], [c0#23, c1#24, c0#34, key#35, spark_grouping_id#33L]
   +- Project [c0#23, c1#24, c0#23 AS c0#31, c1#24[key] AS key#32]
      +- SubqueryAlias t
         +- View (`t`, [c0#23,c1#24])
            +- Project [cast(col1#25 as int) AS c0#23, cast(col2#26 as map<string,string>) AS c1#24]
               +- Project [col1#25, col2#26]
                  +- LocalRelation [col1#25, col2#26]
```
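To illustrate the point, here is a minimal, self-contained sketch of the trimming logic using toy case classes (these are NOT Spark's real `Expression` hierarchy; the names merely mirror the Catalyst classes discussed above). With only the two cases from the diff, a `GetMapValue` under an alias falls through to the default case and keeps its alias, which is why `c1.key` in the `GROUP BY` fails to match the grouping expression:

```scala
// Toy model of the analyzer's alias-trimming match. Real Catalyst expressions
// carry children, data types, etc.; here a single String stands in for them.
sealed trait Expression
case class GetStructField(child: String) extends Expression
case class GetArrayStructFields(child: String) extends Expression
case class GetMapValue(child: String) extends Expression
case class Alias(child: Expression, name: String) extends Expression

def trimExtractValueAlias(e: Expression, isTopLevel: Boolean): Expression = e match {
  // The two cases from the diff: trim only non-top-level aliases.
  case Alias(s: GetStructField, _) if !isTopLevel      => s
  case Alias(s: GetArrayStructFields, _) if !isTopLevel => s
  // The extra case this comment suggests; without it, an aliased
  // GetMapValue is returned unchanged and the alias survives.
  case Alias(m: GetMapValue, _) if !isTopLevel         => m
  case other                                           => other
}
```

With the extra case, `trimExtractValueAlias(Alias(GetMapValue("c1"), "key"), isTopLevel = false)` yields the bare `GetMapValue("c1")`, while the top-level call still preserves the alias so the result remains a named expression.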