Re: [PR] [SPARK-47320][SQL] : The behaviour of Datasets involving self joins is inconsistent, unintuitive, with contradictions [spark]

via GitHub Sat, 16 Mar 2024 22:11:43 -0700


ahshahid commented on PR #45446:
URL: https://github.com/apache/spark/pull/45446#issuecomment-2002310326


   @peter-toth @cloud-fan,
   IMHO the current approach, in which Spark resolves an attribute against a 
DataFrame below the top-level DataFrame(s) and in the process adds the missing 
attribute to the projections in between, can hurt performance without the 
user being aware of the cause. The scenario I have in mind: the user has 
cached the lower DataFrames. When the plan implicitly adds the missing 
attributes to those projections, the cached plans may become unusable, 
without the user being aware of it.
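
   A minimal sketch of the kind of scenario described above, not taken from 
the PR itself. The object name, column names, and data are hypothetical, and 
whether the cache is actually bypassed may depend on the Spark version, so the 
sketch only prints the plans for inspection rather than asserting a result.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical illustration; names and data are assumptions, not from the PR.
object CachedPlanSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("cached-plan-sketch")
      .getOrCreate()
    import spark.implicits._

    val base = Seq((1, 10), (2, 20), (3, 30)).toDF("a", "b")

    // The user caches a lower-level DataFrame that exposes only column "a".
    val lower = base.select("a").cache()
    lower.count() // materialize the cache

    // Column "b" is not part of lower's output. To resolve base("b"), the
    // analyzer has to add "b" to the projection underneath the filter, so
    // that projection is no longer the plan that was cached.
    val filtered = lower.filter(base("b") > 10)

    // Inspect the printed plans: if no InMemoryRelation appears, the cached
    // data is not being reused for this query.
    filtered.explain(true)

    spark.stop()
  }
}
```

   Comparing this output with `lower.explain(true)` shows whether the cached 
relation is still picked up once the extra attribute has been added.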


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


