rdblue commented on issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory limit.
URL: https://github.com/apache/spark/pull/21977#issuecomment-455628479
 
 
   @HyukjinKwon, I haven't looked at `spark.python.worker.memory` before. 
Thanks for pointing it out.
   
   Looks like this limit controls when data is spilled to disk. Do you know 
what data is spilled and what is accumulating in the Python worker? My 
understanding was that Python processes groups of rows (either pickled or in 
Arrow format) and doesn't typically hold data the way the executor JVM does. More 
information here would help clarify the right way to set this.
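
   For context, here is a minimal sketch of how the two settings discussed in this thread could be applied when building a PySpark session; the app name and values are illustrative assumptions, not recommendations from this PR:

   ```python
   from pyspark.sql import SparkSession

   spark = (
       SparkSession.builder
       .appName("pyspark-memory-example")  # hypothetical app name
       # Existing setting: threshold at which the Python worker starts
       # spilling aggregation data to disk.
       .config("spark.python.worker.memory", "512m")
       # Setting added by this PR (SPARK-25004): memory limit for the
       # Python worker processes on each executor.
       .config("spark.executor.pyspark.memory", "2g")
       .getOrCreate()
   )
   ```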
