Re: [PR] [SPARK-45959][SQL] Improving performance when addition of 1 column at a time causes increase in the LogicalPlan tree depth [spark]

via GitHub Mon, 08 Apr 2024 00:49:03 -0700


ahshahid commented on PR #43854:
URL: https://github.com/apache/spark/pull/43854#issuecomment-2042082750


   The caching issue is fixed in this PR.
   That was the complex part.
   It will not miss any cache.
   I have described the approach in the PR description.
   And as I mentioned, it makes the cache lookup code much more robust, as
   described in the other bug I filed.
   
   
   On Mon, Apr 8, 2024, 12:22 AM Wenchen Fan ***@***.***> wrote:
   
   > This is a well-known issue. The suggested fix is to ask users to not chain
   > transformations too much, and use "batch" like APIs such as
   > Dataset#withColumns.
   >
   > How does this PR fix the issue without the problem mentioned in 23d9822
   > <https://github.com/apache/spark/commit/23d982204bb9ef74d3b788a32ce6608116968719>?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


