ahshahid commented on code in PR #45446:
URL: https://github.com/apache/spark/pull/45446#discussion_r1524263718
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala:
##########
@@ -477,6 +482,57 @@ trait ColumnResolutionHelper extends Logging with
DataTypeErrorsBase {
assert(q.children.length == 1)
q.children.head.output
},
+
+ resolveOnDatasetId = (datasetid: Long, name: String) => {
Review Comment:
I have been trying to see how I can integrate #43115 with this PR. One part
that is definitely easy to integrate, and that simplifies the code, is the
Dataset change that adds an UnresolvedAttribute. This will simplify the
changes in this PR for select, join, etc.
However, I am not comfortable, as of now, with having that function create an
UnresolvedAttribute. I would instead prefer an UnresolvedAttributeWithTag that
wraps the original attribute.
This PR's resolution logic first identifies the top-level Join and then
traverses its two legs until either a leaf or a binary node is reached. It
does not recurse any further. I am not sure we need resolution to happen
below the top-level Join, as the user is expected to extract column
attributes only from the Datasets involved in the top-level Join.
Secondly, the current resolution code for a DataFrame Column does not use the
tag from the UnresolvedAttribute. It uses the logical plan's ID.
Thirdly, having the DataSetId be a Set[Long] instead of a single PlanId is
more versatile, as it avoids the need for recursion, if any is needed at all.
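
To make the traversal described above concrete, here is a minimal,
self-contained sketch. It is not the code in this PR and does not use
Catalyst classes: Plan, Leaf, Unary, Join, LegResolution, findInLeg and
resolve are hypothetical stand-ins, and the idea that a node carries a
Set[Long] of dataset IDs is my reading of the DataSetId point. Each leg of the
top-level join is walked through unary nodes only, and the walk stops at a
leaf or a binary node.

```scala
// Toy plan model. These are NOT Spark's classes, only hypothetical stand-ins.
sealed trait Plan {
  def datasetIds: Set[Long] // assumption: a node may carry several dataset IDs
  def output: Seq[String]
}
final case class Leaf(datasetIds: Set[Long], output: Seq[String]) extends Plan
final case class Unary(datasetIds: Set[Long], child: Plan) extends Plan {
  def output: Seq[String] = child.output
}
final case class Join(datasetIds: Set[Long], left: Plan, right: Plan) extends Plan {
  def output: Seq[String] = left.output ++ right.output
}

object LegResolution {
  // Walk one leg of the top-level join through unary nodes only.
  // Returns the first node carrying the dataset ID, if any.
  @scala.annotation.tailrec
  def findInLeg(plan: Plan, datasetId: Long): Option[Plan] = plan match {
    case p if p.datasetIds.contains(datasetId) => Some(p)
    case Unary(_, child)                       => findInLeg(child, datasetId)
    case _                                     => None // leaf or binary node: stop here
  }

  // Resolve `name` against the two legs of the top-level join.
  // Returns the side ("left" or "right") and the name, or None when the
  // column is not found or is found on both legs (ambiguous).
  def resolve(top: Join, datasetId: Long, name: String): Option[(String, String)] = {
    val hits = Seq("left" -> top.left, "right" -> top.right).flatMap {
      case (side, leg) =>
        findInLeg(leg, datasetId)
          .filter(_.output.contains(name))
          .map(_ => side -> name)
    }
    if (hits.size == 1) Some(hits.head) else None
  }
}
```

With a left leg Leaf(Set(1L), Seq("a", "b")) and a right leg
Unary(Set(2L, 3L), Leaf(Set(4L), Seq("a", "c"))), resolving "a" with dataset
ID 1 picks the left leg, and resolving it with ID 3 picks the right leg at the
top of that leg, without descending. That is the sense in which a set of IDs
can reduce the need for recursion in this sketch.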
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]