rdblue commented on issue #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.memory limit.
URL: https://github.com/apache/spark/pull/21977#issuecomment-455628479

@HyukjinKwon, I haven't looked at `spark.python.worker.memory` before. Thanks for pointing it out. It looks like this limit controls when data is spilled to disk. Do you know what data is spilled and what accumulates in the Python worker? My understanding was that Python processes groups of rows (either pickled or in Arrow format) and doesn't typically hold data the way the executor JVM does. More information here would help clarify the right way to set this.
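For context, a minimal sketch of how the two settings under discussion might be configured side by side, assuming the standard `SparkSession.builder.config` API; the values are illustrative only, and the comments reflect my reading of the configs rather than a definitive description:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("pyspark-memory-config-example")
    # New setting from SPARK-25004 (this PR): an overall memory limit for
    # the Python worker processes on each executor.
    .config("spark.executor.pyspark.memory", "2g")
    # Pre-existing setting: per-Python-worker memory used during aggregation
    # before PySpark starts spilling data to disk (default is 512m).
    .config("spark.python.worker.memory", "512m")
    .getOrCreate()
)
```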
