[ https://issues.apache.org/jira/browse/SPARK-31149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean R. Owen resolved SPARK-31149.
----------------------------------
    Resolution: Won't Fix

> PySpark job not killing Spark Daemon processes after the executor is killed due to OOM
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-31149
>                 URL: https://issues.apache.org/jira/browse/SPARK-31149
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5
>            Reporter: Arsenii Venherak
>            Priority: Major
>
> {code:java}
> 2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
> Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
> |- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://coarsegrainedschedu...@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
> {code}
>
> After the container is killed, many pyspark.daemon processes are left behind on the node, e.g.:
> /apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
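
A minimal, hypothetical sketch (not from the report, and not Spark's own cleanup path) of how the leftover daemons could be located on an affected node: it lists pyspark.daemon processes that have been re-parented to PID 1, i.e. whose executor JVM is already gone. It assumes a Linux /proc filesystem; the helper name orphaned_pyspark_daemons is illustrative only.

{code:python}
# Hypothetical helper, not part of Spark: list pyspark.daemon processes whose
# parent is PID 1, i.e. whose executor container has already been torn down.
# Assumes a Linux /proc filesystem.
import os

def orphaned_pyspark_daemons():
    orphans = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode(errors="replace")
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # ppid is the second field after the closing ')' of the comm field
        ppid = int(stat.rsplit(")", 1)[1].split()[1])
        if "pyspark.daemon" in cmdline and ppid == 1:
            orphans.append((int(entry), cmdline.strip()))
    return orphans

if __name__ == "__main__":
    for pid, cmd in orphaned_pyspark_daemons():
        print(pid, cmd)
{code}

The sketch only reports the processes; whether they can simply be killed depends on the deployment.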
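
The log above shows the container's whole process tree (executor JVM plus Python workers) exceeding the 2 GB physical-memory limit even though the JVM heap is capped at 1 GB. Since the ticket was closed as Won't Fix, one commonly used mitigation, offered here only as an illustrative sketch and not as the resolution of this issue, is to give the container more headroom for the Python workers, which run outside the JVM heap but inside the process tree that YARN measures. The values shown are assumptions, not tuned numbers for this workload.

{code:python}
# Illustrative configuration sketch, not the resolution of this ticket:
# give the YARN container headroom for pyspark.daemon/worker processes.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("pyspark-memory-headroom")
    # non-heap memory added to the YARN container size request
    .config("spark.executor.memoryOverhead", "1g")
    # Spark 2.4+: bound the memory available to each executor's Python workers
    .config("spark.executor.pyspark.memory", "1g")
    .getOrCreate()
)
{code}

The same settings can also be passed with --conf at spark-submit time.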