GitHub user cloud-fan commented on the pull request:
https://github.com/apache/spark/pull/9548#issuecomment-155721233
I don't think every `Column` needs to belong to a `DataFrame`, and I'm fine
with exactly the same `Column` appearing in different `DataFrame`s, e.g. the
`keyColToDrop` in your example.
About the problem where we resolve the right tree of a self-join but miss the
join condition: it's actually a known bug. A workaround is to give each
`DataFrame` an alias and use `$"df.col"` in the join condition, so that the
condition stays unresolved while we are resolving the right tree.
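A minimal sketch of that workaround, assuming Spark 2.x or later (`SparkSession`, `Dataset.sparkSession`); on Spark 1.x the same pattern would go through `SQLContext` and `sqlContext.implicits._`. The object name `SelfJoinWorkaround` and the alias names `l`/`r` are illustrative only:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelfJoinWorkaround {
  // Alias both sides of the self-join and reference columns through the alias,
  // so the join condition is built from unresolved names ("l.key", "r.key")
  // instead of Column objects already bound to one side's attributes.
  def selfJoinOn(df: DataFrame, key: String): DataFrame = {
    val spark = df.sparkSession
    import spark.implicits._ // brings the $"..." column interpolator into scope
    val left = df.as("l")
    val right = df.as("r")
    left.join(right, $"l.$key" === $"r.$key")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("self-join-alias").getOrCreate()
    import spark.implicits._
    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("key", "value")
    selfJoinOn(df, "key").show()
    spark.stop()
  }
}
```

Because each key appears once, the self-join returns one row per key, with both sides' `key` and `value` columns.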
Making every `Column` globally unique is a good idea, but adding a
`DataFrame` ID may be too much, because self-join is a special case and the only
case that introduces ambiguity. How about we create new `Column`s with new
expression IDs only when `DataFrame.as` is called?
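To make the proposal concrete, here is a toy model in plain Scala. It is not Spark's code: `ToyExprIds`, `Attr`, `Frame`, `fromColumns`, and `isAmbiguous` are hypothetical names, and expression IDs are modelled as a simple global counter. It only shows why re-issuing IDs on `as` removes self-join ambiguity while leaving un-aliased frames untouched:

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical toy model; none of these types are Spark classes.
object ToyExprIds {
  private val nextId = new AtomicLong(0)
  def newId(): Long = nextId.getAndIncrement()

  // A resolved column: a name plus a globally unique expression ID.
  final case class Attr(name: String, exprId: Long)

  final case class Frame(output: Seq[Attr], alias: Option[String] = None) {
    def apply(name: String): Attr =
      output.find(_.name == name).getOrElse(throw new NoSuchElementException(name))

    // Proposed behaviour: only aliasing re-issues fresh expression IDs.
    def as(alias: String): Frame =
      Frame(output.map(a => a.copy(exprId = newId())), Some(alias))
  }

  def fromColumns(names: String*): Frame = Frame(names.map(n => Attr(n, newId())))

  // A column is ambiguous in a join if its ID appears on both sides.
  def isAmbiguous(col: Attr, left: Frame, right: Frame): Boolean =
    left.output.exists(_.exprId == col.exprId) &&
      right.output.exists(_.exprId == col.exprId)
}
```

A plain self-join of `df` with itself makes `df("key")` ambiguous, whereas `df.as("a")` and `df.as("b")` carry distinct IDs, so `a("key")` and `b("key")` each resolve to exactly one side.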