Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

via GitHub Wed, 13 Mar 2024 22:36:00 -0700


ahshahid commented on code in PR #45446:
URL: https://github.com/apache/spark/pull/45446#discussion_r1524263718



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala:
##########
@@ -477,6 +482,57 @@ trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
         assert(q.children.length == 1)
         q.children.head.output
       },
+
+      resolveOnDatasetId = (datasetid: Long, name: String) => {

Review Comment:
   I have been looking at how to integrate #43115 with this PR. One part is 
definitely easy to integrate and simplifies the code: the Dataset change that 
adds an UnresolvedAttribute. This would simplify the changes in this PR for 
select, join, etc.
   But I am not comfortable, as of now, having that function create an 
UnresolvedAttribute. I would instead prefer an UnresolvedAttributeWithTag, 
which wraps the original attribute.
   This PR's resolution logic first identifies the top-level Join and then 
traverses its two legs until either a leaf or a binary node is reached. It 
does not recurse any further. I am not sure resolution needs to happen below 
the top-level Join, as the user is expected to extract column attributes only 
from the Datasets involved in the top-level Join.
   Secondly, the current code for resolving a DataFrame Column does not use 
the tag from the UnresolvedAttribute. It uses the logical plan's ID.
   Thirdly, the DataSetId being a Set[Long] instead of a single PlanId is 
more versatile, as it avoids the need for recursion, if any.
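
   To make the traversal described above concrete, here is a minimal sketch 
on a toy plan type. It is not the code in this PR and does not use Catalyst 
classes: `Plan`, `Leaf`, `Unary`, `Join`, `findOnLeg` and `resolveSide` are 
hypothetical names, and the idea that a derived node carries the accumulated 
dataset IDs of its inputs is my assumption about how a Set[Long] could avoid 
recursion.

```scala
// Toy stand-ins for logical plan nodes; none of these are Spark classes.
// Assumption: each node carries the set of dataset IDs it was derived from.
sealed trait Plan { def datasetIds: Set[Long] }
final case class Leaf(name: String, datasetIds: Set[Long]) extends Plan
final case class Unary(child: Plan, datasetIds: Set[Long]) extends Plan
final case class Join(left: Plan, right: Plan, datasetIds: Set[Long]) extends Plan

object TopLevelJoinResolution {
  sealed trait Side
  case object LeftLeg extends Side
  case object RightLeg extends Side

  // Walk down through unary nodes until the first (top-level) join is found.
  @annotation.tailrec
  def topLevelJoin(plan: Plan): Option[Join] = plan match {
    case j: Join         => Some(j)
    case Unary(child, _) => topLevelJoin(child)
    case _: Leaf         => None
  }

  // Follow one leg through unary nodes only. The search stops at a leaf or
  // at the next binary node and never descends below it.
  @annotation.tailrec
  def findOnLeg(node: Plan, datasetId: Long): Option[Plan] = node match {
    case n if n.datasetIds.contains(datasetId) => Some(n)
    case Unary(child, _)                       => findOnLeg(child, datasetId)
    case _                                     => None
  }

  // Decide which leg of the top-level join a dataset ID belongs to.
  // Returns None when the ID is missing or matches both legs (ambiguous).
  def resolveSide(plan: Plan, datasetId: Long): Option[Side] =
    topLevelJoin(plan).flatMap { j =>
      val onLeft  = findOnLeg(j.left, datasetId).isDefined
      val onRight = findOnLeg(j.right, datasetId).isDefined
      (onLeft, onRight) match {
        case (true, false) => Some(LeftLeg)
        case (false, true) => Some(RightLeg)
        case _             => None
      }
    }
}
```

   In this sketch a filtered Dataset would be modelled as 
`Unary(Leaf("t", Set(1L)), Set(1L, 2L))`, so an attribute taken from dataset 1 
matches at the head of the leg without descending. A self join that puts the 
same dataset ID on both legs resolves to `None` here, which is where an 
ambiguity error would be raised.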
   




