ahshahid commented on PR #43854: URL: https://github.com/apache/spark/pull/43854#issuecomment-2042082750
The caching issue is fixed in this PR. That was the complex part. It will not miss any cache hits. I have described the approach in the PR description. And, as I mentioned, it makes the cache lookup code much more robust, as described in the other bug I filed.

On Mon, Apr 8, 2024, 12:22 AM Wenchen Fan wrote:

> This is a well-known issue. The suggested fix is to ask users to not chain
> transformations too much, and use "batch" like APIs such as
> Dataset#withColumns.
>
> How does this PR fix the issue without the problem mentioned in 23d9822
> <https://github.com/apache/spark/commit/23d982204bb9ef74d3b788a32ce6608116968719>?
