[jira] [Commented] (SPARK-25004) Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS

2020-03-19 Thread Xiaochen Ouyang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17062333#comment-17062333
 ] 

Xiaochen Ouyang commented on SPARK-25004:
-

[~rdblue] This configuration can only control the worker.py process, and the 
maximum memory limit of the derived child process cannot be controlled. 

Worker(JVM) --> Executor–> python.demon–>python.demon , the last python demon 
process can not be controlled by this configuration.

> Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS
> --
>
> Key: SPARK-25004
> URL: https://issues.apache.org/jira/browse/SPARK-25004
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 2.4.0
>
>
> Some platforms support limiting Python's addressable memory space by limiting 
> [{{resource.RLIMIT_AS}}|https://docs.python.org/3/library/resource.html#resource.RLIMIT_AS].
> We've found that adding a limit is very useful when running in YARN because 
> when Python doesn't know about memory constraints, it doesn't know when to 
> garbage collect and will continue using memory when it doesn't need to. 
> Adding a limit reduces PySpark memory consumption and avoids YARN killing 
> containers because Python hasn't cleaned up memory.
> This also improves error messages for users, allowing them to see when Python 
> is allocating too much memory instead of YARN killing the container:
> {code:lang=python}
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in 
> fe_engineer
> fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
> comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, 
> []), mat_rec_prep.get(item, []))
>   File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in 
> leven_list_compare
> permutations = sorted(permutations, reverse=True)
>   MemoryError
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25004) Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS

2018-08-02 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16567396#comment-16567396
 ] 

Apache Spark commented on SPARK-25004:
--

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/21977

> Add spark.executor.pyspark.memory config to set resource.RLIMIT_AS
> --
>
> Key: SPARK-25004
> URL: https://issues.apache.org/jira/browse/SPARK-25004
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Priority: Major
>
> Some platforms support limiting Python's addressable memory space by limiting 
> [{{resource.RLIMIT_AS}}|https://docs.python.org/3/library/resource.html#resource.RLIMIT_AS].
> We've found that adding a limit is very useful when running in YARN because 
> when Python doesn't know about memory constraints, it doesn't know when to 
> garbage collect and will continue using memory when it doesn't need to. 
> Adding a limit reduces PySpark memory consumption and avoids YARN killing 
> containers because Python hasn't cleaned up memory.
> This also improves error messages for users, allowing them to see when Python 
> is allocating too much memory instead of YARN killing the container:
> {code:lang=python}
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 265, in 
> fe_engineer
> fe_eval_rec.update(f(src_rec_prep, mat_rec_prep))
>   File "build/bdist.linux-x86_64/egg/package/library.py", line 163, in fe_comp
> comparisons = EvaluationUtils.leven_list_compare(src_rec_prep.get(item, 
> []), mat_rec_prep.get(item, []))
>   File "build/bdist.linux-x86_64/egg/package/evaluationutils.py", line 25, in 
> leven_list_compare
> permutations = sorted(permutations, reverse=True)
>   MemoryError
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org