khakhlyuk commented on PR #52613:
URL: https://github.com/apache/spark/pull/52613#issuecomment-3415012180

   > constructing large local relations in driver sounds dangerous, is it 
possible to offload the data from the driver to the executor side (an RDD)?
   
   Hey @pan3793!
   Thanks for the feedback, you are totally correct. Spark already fully 
materializes LocalRelations on the driver today (in both classic and Connect 
mode), so this PR is a net improvement over the existing behaviour. My 
changes remove the hard 2GB limit; the new limit is controlled via the 
`spark.sql.session.localRelationSizeLimit` conf and is set to 3GB by default.
   Offloading the data materialization from the driver to the executors would be 
the next important improvement, and I believe it should be done, but it's out of 
scope for this PR and outside my expertise (I'm mainly familiar with Spark 
Connect). I may be able to work on the executor changes in several months. If 
someone else can pick up the executor work, I'm happy with that. I'm also happy 
to create a Jira ticket to track the follow-up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

