EnricoMi commented on PR #37407:
URL: https://github.com/apache/spark/pull/37407#issuecomment-1249106578
@cloud-fan I have introduced the expression `UnpivotExpr` to replace the
`(Seq[NamedExpression], Option[String])` tuple, which makes the code more readable.
However, this changes behaviour so that it deviates from
projection behaviour:
```scala
spark.range(5).select(struct($"id").as("an")).select($"an.id").show()
```
"an.id" gets alias "id":
```
+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
+---+
```
```
Project(UnresolvedAttribute("an.id"), plan)
--> ResolveReferences rule -->
Project(Alias(GetStructField(an#2.id), "id"), plan)
```
```scala
spark.range(5).select(struct($"id").as("an")).unpivot(Array($"an.id"),
Array($"an.id"), "col", "val").show()
```
before introducing `UnpivotExpr`, both ids and values get alias "id" (as in
select / `Project`):
```
+---+---+---+
| id|col|val|
+---+---+---+
|  0| id|  0|
|  1| id|  1|
|  2| id|  2|
|  3| id|  3|
|  4| id|  4|
+---+---+---+
```
after introducing `UnpivotExpr`, id "an.id" still gets alias "id", but value "an.id"
does not get an alias and hence gets name "an.id":
```
+---+-----+---+
| id|  col|val|
+---+-----+---+
|  0|an.id|  0|
|  1|an.id|  1|
|  2|an.id|  2|
|  3|an.id|  3|
|  4|an.id|  4|
+---+-----+---+
```
Now that `UnpivotExpr` is the top-level expression, the inner
`UnresolvedAttribute` / `GetStructField` no longer gets an alias:
```
Unpivot(Seq(UnresolvedAttribute("an.id")),
Seq(UnpivotExpr(Seq(UnresolvedAttribute("an.id")), ...)), ..., plan)
--> ResolveReferences -->
Unpivot(Seq(Alias(GetStructField(an#2.id), "id")),
Seq(UnpivotExpr(Seq(GetStructField(an#2.id)), ...)), ..., plan)
```
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L1770
The `CleanupAliases` rule is not the reason; the alias is removed inside
`ResolveReferences`.
The only way to get back to the old behaviour is special treatment of
`UnpivotExpr` in `QueryPlan.mapExpressions.recursiveTransform`:
https://github.com/apache/spark/pull/37407/commits/9dd66b78ec817a53325d95900f18198dac9bc3b1#diff-ece55283a94dd23d3c04f8b9d8ae35937ccff67724be690ff30f76e9f8093c6eR211
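
To make the difference concrete, here is a toy model of the aliasing behaviour described above. It is only a sketch: `Attr`, `Field`, `Named`, `Wrapper` and `AliasSketch` are hypothetical stand-ins invented for this example (not Catalyst classes), and the `wrapperChildrenAreTopLevel` flag merely imitates the idea of treating the children of `UnpivotExpr` as top-level expressions. It is not the code in the linked commit.

```scala
// Toy model only: hypothetical stand-ins, not Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr                   // unresolved name, e.g. "an.id"
case class Field(struct: String, field: String) extends Expr // resolved struct field access
case class Named(child: Expr, name: String) extends Expr     // plays the role of Alias
case class Wrapper(children: Seq[Expr]) extends Expr         // plays the role of UnpivotExpr

object AliasSketch {
  // Resolve a dotted name into a field access aliased with the field name.
  private def resolveAttr(name: String): Expr = name.split('.') match {
    case Array(single) => Attr(single) // plain column: nothing to alias
    case parts         => Named(Field(parts.init.mkString("."), parts.last), parts.last)
  }

  // Drop the alias around a field access again.
  private def trim(e: Expr): Expr = e match {
    case Named(f: Field, _) => f
    case other              => other
  }

  // Only top-level expressions keep their alias. The flag decides whether
  // the children of a Wrapper count as top-level (assumed "special treatment").
  def resolve(e: Expr, wrapperChildrenAreTopLevel: Boolean, isTopLevel: Boolean = true): Expr =
    e match {
      case Attr(name) =>
        val resolved = resolveAttr(name)
        if (isTopLevel) resolved else trim(resolved)
      case Wrapper(children) =>
        Wrapper(children.map(c =>
          resolve(c, wrapperChildrenAreTopLevel, isTopLevel = wrapperChildrenAreTopLevel)))
      case other => other
    }

  // The name an expression would contribute to the output.
  def outputName(e: Expr): String = e match {
    case Named(_, name)       => name
    case Field(struct, field) => s"$struct.$field"
    case Attr(name)           => name
    case Wrapper(children)    => children.map(outputName).mkString(",")
  }
}

object AliasSketchDemo {
  def main(args: Array[String]): Unit = {
    val id    = Attr("an.id")
    val value = Wrapper(Seq(Attr("an.id")))
    // Without special treatment: id is aliased, the wrapped value is not.
    println(AliasSketch.outputName(AliasSketch.resolve(id, wrapperChildrenAreTopLevel = false)))    // id
    println(AliasSketch.outputName(AliasSketch.resolve(value, wrapperChildrenAreTopLevel = false))) // an.id
    // With special treatment: the wrapped value is aliased as well.
    println(AliasSketch.outputName(AliasSketch.resolve(value, wrapperChildrenAreTopLevel = true)))  // id
  }
}
```

In this model the flag is the only thing that separates the "an.id" output from the "id" output, which matches the observation that the alias is decided during resolution rather than in a later cleanup step.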