cloud-fan commented on PR #45552:
URL: https://github.com/apache/spark/pull/45552#issuecomment-2007037219
Column references in the classic Spark SQL DataFrame API are quite broken, and I
really don't want to add more hacks here and there to fix individual cases. In
Spark Connect, we've redesigned column references, and the new design is much
more reliable and reasonable.
How about adding a config to let classic column references use the same
implementation as Spark Connect's? Ideally, users should update their DataFrame
queries to always use named columns, as in the SQL API:
```scala
val df1 = abc.as("df1")
val df2 = xyz.as("df2")
df1.join(df2, $"df1.col" === $"df2.col")
```
But if users really want to stick with the old style, they can turn on the
config.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]