[ https://issues.apache.org/jira/browse/SPARK-31149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-31149.
----------------------------------
    Resolution: Won't Fix

> PySpark job not killing Spark Daemon processes after the executor is killed due to OOM
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-31149
>                 URL: https://issues.apache.org/jira/browse/SPARK-31149
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5
>            Reporter: Arsenii Venherak
>            Priority: Major
>
> {code:java}
> 2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
> Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
>         |- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://coarsegrainedschedu...@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
> {code}
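> A common mitigation for this kind of container kill, assuming the overrun comes from pyspark.daemon workers living outside the executor JVM heap, is to reserve more off-heap headroom for the container. A minimal sketch; the values below are illustrative, not taken from this job's configuration:
> {code:python}
> # Minimal sketch, assuming the physical-memory overrun is driven by
> # pyspark.daemon workers outside the executor JVM heap. Values are
> # illustrative; they are not taken from this ticket.
> from pyspark.sql import SparkSession
>
> spark = (
>     SparkSession.builder
>     .appName("memory-overhead-sketch")            # hypothetical app name
>     # Off-heap headroom that YARN counts against the container; the
>     # pyspark.daemon workers draw from this, not from -Xmx.
>     .config("spark.executor.memoryOverhead", "1g")
>     # Cap per-Python-worker memory used for aggregation before spilling.
>     .config("spark.python.worker.memory", "512m")
>     .getOrCreate()
> )
> {code}
> This does not address the leaked daemons themselves; it only makes it less likely that YARN kills the container in the first place.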
>  
>  
> After that, many pyspark.daemon processes are left behind, e.g.:
>  /apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
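> One way to confirm the leak on an affected node is to look for pyspark.daemon processes that have been reparented to PID 1 after their executor died. A minimal, Linux-only sketch; the /proc parsing and the orphan heuristic are assumptions, not part of this ticket:
> {code:python}
> # Minimal sketch: list pyspark.daemon processes whose parent is PID 1,
> # i.e. likely orphans left behind after their executor container was killed.
> # Linux-only (reads /proc directly); heuristic, not an official diagnostic.
> import os
>
> def orphaned_pyspark_daemons():
>     pids = []
>     for entry in os.listdir("/proc"):
>         if not entry.isdigit():
>             continue
>         try:
>             with open(f"/proc/{entry}/cmdline", "rb") as f:
>                 cmdline = f.read().replace(b"\0", b" ")
>             with open(f"/proc/{entry}/stat") as f:
>                 stat = f.read()
>         except OSError:
>             continue  # process exited while we were scanning
>         # ppid is the second field after the closing ")" of the comm field
>         ppid = int(stat[stat.rindex(")") + 2:].split()[1])
>         if b"pyspark.daemon" in cmdline and ppid == 1:
>             pids.append(int(entry))
>     return pids
>
> if __name__ == "__main__":
>     print(orphaned_pyspark_daemons())
> {code}
> Run it on the node that hosted the killed container; any PIDs it prints are candidates for manual cleanup.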



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
