HeartSaVioR commented on pull request #28326:
URL: https://github.com/apache/spark/pull/28326#issuecomment-619722391
(Please keep in mind that I'm not an expert in the SQL area.)
I've read through the code around Alias and played with the reproducer
(test), and I feel #24457 is the fix that addresses the root cause.
Stepping through this line of the reproducer with a debugger:
```
val left = df1.select('key, window('leftTime, "10 second") as 'leftWindow, 'leftValue)
```
Here, `'leftTime` is still **unresolved** when `as` is applied (hence `window`
is **unresolved** as well). `as` sets the Alias's explicit metadata to whatever
the original column has at that moment, but the metadata of `'leftWindow`
cannot be determined yet, and that is where the problem arises.
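To make the ordering problem concrete, here is a toy model in plain Scala. It is a hypothetical simplification, not Spark's Catalyst classes: `UnresolvedAttr`, `Attr`, `Alias`, `resolve`, and the `"eventTime"` metadata key are all made up for illustration. It compares an Alias whose metadata is copied while its child is still unresolved against one that derives metadata from its child after resolution.

```scala
// Toy model only: UnresolvedAttr, Attr, Alias, resolve and the "eventTime"
// key are hypothetical stand-ins, not Spark's Catalyst classes or metadata.
object EagerAliasMetadataDemo {
  type Metadata = Map[String, String]

  sealed trait Expr
  final case class UnresolvedAttr(name: String) extends Expr
  final case class Attr(name: String, metadata: Metadata) extends Expr
  final case class Alias(child: Expr, name: String, explicitMetadata: Option[Metadata])
      extends Expr {
    // Explicit metadata hides whatever the child carries.
    def metadata: Metadata = explicitMetadata.getOrElse(metadataOf(child))
  }

  // Metadata visible right now: an unresolved attribute has none yet.
  def metadataOf(e: Expr): Metadata = e match {
    case Attr(_, md)       => md
    case a: Alias          => a.metadata
    case UnresolvedAttr(_) => Map.empty
  }

  // Stand-in for the analyzer: resolves attributes against a catalog.
  def resolve(e: Expr, catalog: Map[String, Metadata]): Expr = e match {
    case UnresolvedAttr(n) => Attr(n, catalog(n))
    case a: Alias          => a.copy(child = resolve(a.child, catalog))
    case attr: Attr        => attr
  }

  val catalog: Map[String, Metadata] = Map("leftTime" -> Map("eventTime" -> "true"))
  val child: Expr = UnresolvedAttr("leftTime")

  // Metadata copied at alias time, while the child is still unresolved.
  val eager: Alias = Alias(child, "leftWindow", Some(metadataOf(child)))
  // No explicit metadata: it is derived from the child when requested.
  val derived: Alias = Alias(child, "leftWindow", None)

  // Stays empty even after resolution, because the frozen copy wins.
  val eagerMd: Metadata = metadataOf(resolve(eager, catalog))
  // Picks up the resolved attribute's metadata.
  val derivedMd: Metadata = metadataOf(resolve(derived, catalog))
}
```

In this model the only difference between the two aliases is whether metadata is captured before or after resolution, which is the gap described above.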
Even setting the analysis aside, I wonder why Alias carries explicit metadata
that hides the actual attribute's metadata, other than as an optimization,
which should be free of side effects.
Sorry if this is a naive question, but is there a real use case for this? And
even if it is valid, is hiding the actual attribute's metadata intentional?
Shouldn't we retain the actual attribute's metadata as well?
If we are concerned about the performance cost of losing the Alias metadata
shortcut, the fix below may have the same effect as #24457 while changing the
behavior only when Alias renames a Column that has an unresolved expression:
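```
def name(alias: String): Column = withExpr {
  normalizedExpr() match {
    // Resolved: the metadata is known, so it is safe to pin it on the Alias.
    case ne: NamedExpression if ne.resolved =>
      Alias(expr, alias)(explicitMetadata = Some(ne.metadata))
    // Unresolved or unnamed: no explicit metadata, so Alias derives it from its child.
    case other => Alias(other, alias)()
  }
}
```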
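A toy sketch of the guard in the proposed `name`, again in plain Scala with hypothetical stand-in types rather than Spark's real `NamedExpression`/`Alias` API (`guardedName`, `resolve`, and the `"eventTime"` key are illustrative only). A named expression over an unresolved column, standing in for `window('leftTime, ...)`, gets no explicit metadata and so picks up the resolved attribute's metadata later, while an already-resolved attribute keeps the metadata shortcut.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
object GuardedNameDemo {
  type Metadata = Map[String, String]

  sealed trait Expr { def resolved: Boolean }
  sealed trait Named extends Expr { def metadata: Metadata }

  final case class UnresolvedAttr(name: String) extends Expr {
    val resolved: Boolean = false
  }
  final case class Attr(name: String, metadata: Metadata) extends Named {
    val resolved: Boolean = true
  }
  final case class Alias(child: Expr, name: String, explicitMetadata: Option[Metadata] = None)
      extends Named {
    def resolved: Boolean = child.resolved
    // Explicit metadata wins; otherwise derive it from the child as it is now.
    def metadata: Metadata = explicitMetadata.getOrElse(childMetadata)
    private def childMetadata: Metadata = child match {
      case n: Named => n.metadata
      case _        => Map.empty
    }
  }

  // Same shape as the proposed `name`: only take the metadata shortcut
  // for named expressions that are already resolved.
  def guardedName(e: Expr, alias: String): Alias = e match {
    case n: Named if n.resolved => Alias(n, alias, Some(n.metadata))
    case other                  => Alias(other, alias)
  }

  // Stand-in for the analyzer: replaces unresolved attributes using a catalog.
  def resolve(e: Expr, catalog: Map[String, Metadata]): Expr = e match {
    case UnresolvedAttr(n) => Attr(n, catalog(n))
    case a: Alias          => a.copy(child = resolve(a.child, catalog))
    case other             => other
  }

  val catalog: Map[String, Metadata] = Map("leftTime" -> Map("eventTime" -> "true"))
  // Like window('leftTime, ...): a named expression over an unresolved column.
  val windowExpr: Alias = Alias(UnresolvedAttr("leftTime"), "window")
  val renamed: Alias = guardedName(windowExpr, "leftWindow")
  val resolvedMetadata: Metadata = resolve(renamed, catalog) match {
    case n: Named => n.metadata
    case _        => Map.empty
  }
  // A resolved attribute still gets the explicit-metadata shortcut.
  val shortcut: Alias = guardedName(Attr("key", Map("k" -> "v")), "renamedKey")
}
```

The design choice in this sketch mirrors the comment: the shortcut is skipped only for unresolved expressions, so resolved renames behave exactly as before.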