Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/21449
This is a long-standing issue; I've seen many attempts to fix it (including
my own), but none have succeeded.
The major problem is that there is no clear definition of the expected
behavior, i.e. what are the semantics of `Dataset.col`?
Some examples:
```scala
df.select(df.col("i"))                              // valid
val df1 = df.filter(...)
df1.select(df.col("i"))                             // still valid: df1's plan still carries df's attribute "i"
df.join(otherDF, df.col("i") === otherDF.col("i"))  // valid
df.join(otherDF).select(df.col("i"), otherDF("i"))  // valid
val df2 = df.select(df.col("i") + 1)
df2.select(df.col("i"))                             // invalid: df2's output is "(i + 1)", not "i"
```
Sometimes we can use an ancestor's column in a new Dataset, but sometimes we
can't. We should make the condition clear first.
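For anyone who wants to try these cases directly, here is a minimal self-contained sketch (the local SparkSession setup and sample data are my own additions, not from the examples above):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dataset-col-semantics")
  .getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("i")
val otherDF = Seq(2, 3, 4).toDF("i")

val df1 = df.filter($"i" > 1)
df1.select(df.col("i")).show()       // works: df1 still exposes df's attribute "i"

df.join(otherDF, df.col("i") === otherDF.col("i")).show()  // works

val df2 = df.select(df.col("i") + 1)
// df2.select(df.col("i")).show()    // throws AnalysisException: df's "i" is gone from df2's output
```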