khakhlyuk commented on PR #52613:
URL: https://github.com/apache/spark/pull/52613#issuecomment-3415012180
> constructing large local relations in driver sounds dangerous, is it possible to offload the data from the driver to the executor side (an RDD)?

Hey @pan3793! Thanks for the feedback, you are totally correct. Spark already fully materializes LocalRelations on the driver today (in both classic and Connect mode), so this PR is a net improvement over the existing behaviour. My changes remove the hard 2GB limit. The new limit is controlled by the `spark.sql.session.localRelationSizeLimit` conf and defaults to 3GB.

Offloading data materialization from the driver to the executors would be the next important improvement, and I believe it should be done. However, it is out of scope for this PR and outside my expertise (I'm mainly familiar with Spark Connect). I may be able to work on the executor changes in several months. If someone else can pick up the executor work, I'm happy with that. I'm also happy to create a JIRA ticket to track the follow-up.
