ahshahid commented on PR #43854:
URL: https://github.com/apache/spark/pull/43854#issuecomment-2042090754

   I understand that the suggestion is to avoid the API that adds a single
   column at a time, but I have come across many companies that generate
   DataFrames via some loop logic. In my previous work I have seen query plans
   containing more than 40 million Project nodes alone (not counting filters,
   joins, windows, etc.).
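
   To make the loop pattern concrete, here is a minimal sketch in plain
   Python. It is a toy model only: the `Project` class and its `with_column` /
   `with_columns` methods are hypothetical stand-ins, not Spark's Catalyst
   classes or the Dataset API. It assumes that each single-column call stacks
   one more projection node on the plan, while a batch call adds one node in
   total.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Project:
    """Toy stand-in for a logical-plan projection node (hypothetical, not Catalyst)."""
    columns: Tuple[str, ...]
    child: Optional["Project"] = None

    def with_column(self, name: str) -> "Project":
        # Each single-column call wraps the existing plan in one new node.
        return Project(self.columns + (name,), child=self)

    def with_columns(self, names: Iterable[str]) -> "Project":
        # A batch call adds exactly one node, however many columns it carries.
        return Project(self.columns + tuple(names), child=self)


def depth(plan: Optional[Project]) -> int:
    """Count the nodes in the chain iteratively, so deep plans cannot overflow the stack."""
    count = 0
    while plan is not None:
        count += 1
        plan = plan.child
    return count


base = Project(("id",))
new_cols = [f"c{i}" for i in range(1000)]

# Loop-generated plan: one extra node per added column.
looped = base
for name in new_cols:
    looped = looped.with_column(name)

# Batch plan: a single extra node for all added columns.
batched = base.with_columns(new_cols)

print(depth(looped), depth(batched))
```

   Both plans expose the same output columns, but in this model the looped
   plan grows by one node per column, so any rule that walks or clones the
   whole plan has proportionally more work to do.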
   
   There are other customers who are now seeing query compilation times
   increase from 3 minutes to more than 2 hours, due to the
   DeduplicateRelations rule or plan cloning at every stage.
   
   On Mon, Apr 8, 2024, 12:48 AM Asif Shahid ***@***.***> wrote:
   
   > The caching issue is fixed in this PR.
   > That was the complex part.
   > It will not miss any cache.
   > I have described the approach in the PR description.
   > And as I mentioned, it makes the cache lookup code much more robust, as
   > described in the other bug I filed.
   >
   >
   > On Mon, Apr 8, 2024, 12:22 AM Wenchen Fan ***@***.***>
   > wrote:
   >
   >> This is a well-known issue. The suggested fix is to ask users to not
   >> chain transformations too much, and use "batch" like APIs such as
   >> Dataset#withColumns.
   >>
   >> How does this PR fix the issue without the problem mentioned in 23d9822
   >> <https://github.com/apache/spark/commit/23d982204bb9ef74d3b788a32ce6608116968719>?
   >>
   >
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

