GitHub user rdblue opened a pull request:
https://github.com/apache/spark/pull/21977
SPARK-25004: Add spark.executor.pyspark.memory limit.
## What changes were proposed in this pull request?
This adds `spark.executor.pyspark.memory` to configure Python's address
space limit,
[`resource.RLIMIT_AS`](https://docs.python.org/3/library/resource.html#resource.RLIMIT_AS).
Limiting Python's address space allows Python to participate in memory
management. In practice, with the limit set, we see fewer cases of Python
using too much memory simply because it doesn't know it needs to run garbage
collection, so YARN kills fewer containers. The limit also improves error
messages, so users can see that Python is consuming too much memory:
```
  File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in fe_engineer
    fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
  File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
    comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, []), mat_rec_prep.get(item, []))
  File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in leven_list_compare
    permutations = sorted(permutations, reverse=True)
MemoryError
```
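How an address space limit turns runaway allocation into a `MemoryError` can be sketched in plain Python. This is only an illustration of `resource.RLIMIT_AS`, not the code in this patch: the helper name `try_allocation` and the sizes used are hypothetical, the limit is applied in a child process so the caller is unaffected, and it assumes a Linux-like platform where `RLIMIT_AS` is enforced.

```python
import subprocess
import sys

# Script run in a child interpreter: cap the address space, then allocate.
_CHILD = """
import resource, sys
limit = int(sys.argv[1])
alloc = int(sys.argv[2])
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
# An unprivileged process cannot set a limit above its current hard limit.
if hard != resource.RLIM_INFINITY:
    limit = min(limit, hard)
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
try:
    buf = bytearray(alloc)  # one large allocation request
    print("allocated")
except MemoryError:
    print("MemoryError")
"""


def try_allocation(limit_bytes, alloc_bytes):
    """Return "allocated" or "MemoryError" for an allocation made under a cap.

    Hypothetical helper for illustration only.
    """
    result = subprocess.run(
        [sys.executable, "-c", _CHILD, str(limit_bytes), str(alloc_bytes)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    one_gib = 1 << 30
    print(try_allocation(one_gib, 16 << 20))     # request well under the cap
    print(try_allocation(one_gib, 2 * one_gib))  # request larger than the cap
```

With the cap in place, the oversized request fails inside Python as an exception with a traceback like the one above, rather than the process growing until an external manager kills it.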
The new PySpark memory setting is added to the memory requested for each YARN
container, so Python no longer has to share the overhead memory with off-heap
JVM activity.
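The effect on the container request can be sketched as simple arithmetic. This is a rough illustration, not the patch's code: the function name `container_request_mib` is hypothetical, and the default overhead of `max(384 MiB, 10% of executor memory)` is an assumption taken from Spark's documented default for `spark.executor.memoryOverhead`. Any rounding that YARN applies to its allocation increment is not modeled.

```python
def container_request_mib(executor_memory_mib, pyspark_memory_mib, overhead_mib=None):
    """Estimate the memory requested per YARN container, in MiB.

    Hypothetical helper. Assumes the request is the sum of executor heap,
    overhead, and the separate PySpark memory setting.
    """
    if overhead_mib is None:
        # Assumed default: 10% of executor memory, with a 384 MiB floor.
        overhead_mib = max(384, int(0.10 * executor_memory_mib))
    return executor_memory_mib + overhead_mib + pyspark_memory_mib


if __name__ == "__main__":
    # 4 GiB heap, default overhead (409 MiB), 2 GiB for Python workers.
    print(container_request_mib(4096, 2048))  # 6553
```

Under these assumptions, Python's share is sized explicitly instead of being carved out of the overhead allowance.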
## How was this patch tested?
Tested memory limits in our YARN cluster and verified that `MemoryError` is
raised.
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/rdblue/spark SPARK-25004-add-python-memory-limit
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21977.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21977
----
commit 19cd9c5cce4420729074a0976b129889d70fd56c
Author: Ryan Blue <blue@...>
Date: 2018-05-09T18:34:50Z
SPARK-25004: Add spark.executor.pyspark.memory limit.
----