Hi
As the documentation says:
spark.python.worker.memory
Amount of memory to use per python worker process during aggregation, in
the same format as JVM memory strings (e.g. 512m, 2g). If the memory used
during aggregation goes above this amount, it will spill the data into
disks.
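
For reference, here is a minimal sketch of how the option can be set through
the public SparkConf API (the "512m" value and the app name are only examples):

    from pyspark import SparkConf, SparkContext

    # Sketch only: set the per-worker aggregation memory before creating the context.
    conf = (SparkConf()
            .setAppName("python-worker-memory-example")
            .set("spark.python.worker.memory", "512m"))
    sc = SparkContext(conf=conf)

    # groupByKey is one of the operations whose Python-side aggregation
    # is subject to this limit.
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
    grouped = pairs.groupByKey().mapValues(list).collect()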

I searched for this config in the Spark source code; only rdd.py uses the option.
That means the option only takes effect in Python rdd.groupByKey, rdd.sortByKey,
etc. The Python ExternalSorter or ExternalMerger will spill data to disk when
memory usage reaches the spark.python.worker.memory limit (a rough sketch of
that behaviour follows below).
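
To make what I mean concrete, here is a rough, hypothetical sketch of the spill
behaviour (my own simplification, not the actual pyspark.shuffle code;
parse_memory, get_used_memory and SimpleMerger are illustrative names):

    import resource

    def parse_memory(s):
        # Convert a JVM-style memory string like "512m" or "2g" into megabytes.
        units = {"k": 1.0 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}
        suffix = s[-1].lower()
        if suffix not in units:
            raise ValueError("invalid memory string: %r" % s)
        return int(float(s[:-1]) * units[suffix])

    def get_used_memory():
        # Resident set size of this Python process in MB (ru_maxrss is KB on Linux).
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss >> 10

    class SimpleMerger(object):
        # Illustration only: accumulate values per key and "spill" when the
        # configured memory limit is exceeded.
        def __init__(self, memory_limit_str):
            self.memory_limit = parse_memory(memory_limit_str)
            self.data = {}

        def merge(self, items):
            for k, v in items:
                self.data.setdefault(k, []).append(v)
                if get_used_memory() > self.memory_limit:
                    self.spill()

        def spill(self):
            # The real merger writes partitioned, serialized batches to local disk;
            # here we only mark where that would happen.
            print("would spill %d keys to disk" % len(self.data))
            self.data.clear()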

When PythonRunner forks a Python worker subprocess, what is the memory limit
for each Python worker? Does spark.python.worker.memory affect the memory
of a Python worker?

-- 
Best & Regards
Cyanny LIANG
email: lgrcya...@gmail.com
