Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/21449
Thanks for your comment @cloud-fan. I understand your point. That is quite
a tricky problem, since we would probably also need to know the "DAG" of the
dataframes in order to make the right decision.
But although this change is related to that problem, I think it is different
and has a much smaller scope. Indeed, while the metadata information could be
used in many places, in this patch it is used only in the self-join case, when
it is ambiguous which column to take. The behavior in every other case is
unchanged.
So after this patch, resolving columns using `col` works exactly as before.
The only place where the provenance dataset is checked is in self-joins. The
goal here is only to support the cases that previously threw an exception when
resolving the right column.
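To make the narrow scope concrete, here is a toy model in Python of the rule described above. It is not Spark's implementation: `Column`, `dataset_id`, and `resolve` are hypothetical names, and the `dataset_id` tag only stands in for the provenance metadata the patch attaches. A reference is resolved by name as usual, and the provenance tag is consulted only in a self-join when the name alone is ambiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dataset_id: int  # hypothetical provenance tag (which dataset the column came from)


def resolve(name, candidates, requested_dataset_id=None, self_join=False):
    """Resolve `name` against `candidates` (toy model, not Spark's code)."""
    matches = [c for c in candidates if c.name == name]
    if len(matches) == 1:
        return matches[0]  # unambiguous: plain name-based resolution, unchanged
    if not matches:
        raise KeyError(f"cannot resolve column {name!r}")
    # Provenance is consulted only in a self-join with an ambiguous name.
    if self_join and requested_dataset_id is not None:
        narrowed = [c for c in matches if c.dataset_id == requested_dataset_id]
        if len(narrowed) == 1:
            return narrowed[0]
    raise ValueError(f"ambiguous reference to column {name!r}")


# Usage: dataset 2 is derived from dataset 1, so both sides expose a column "a".
left = [Column("a", 1), Column("b", 1)]
right = [Column("a", 2), Column("c", 2)]
both = left + right
print(resolve("b", both))                                          # name is unique
print(resolve("a", both, requested_dataset_id=2, self_join=True))  # provenance breaks the tie
```

Outside the self-join path, an ambiguous name such as `resolve("a", both)` still raises, mirroring the claim that behavior in every other case is unchanged.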