Dear all,

We have run into a problem with failed Spark jobs. Our Spark/Hadoop cluster is CDH 5.1.2 with Spark 1.1.
We launch a Spark job with the following command:

~/soft/spark-1.1.0-bin-hadoop2.3/bin/spark-submit --master yarn-cluster \
    --executor-memory 4G --driver-memory 4G \
    --class "ru.retailrocket.spark.Upsell" \
    --num-executors 18 --executor-cores 2 \
    target/scala-2.10/Upsell-assembly-1.0.jar

If a task fails, some of the old executor processes stay resident in memory:

yarn 9916 0.0 0.0 12228 1444 ? Ss 15:05 0:00 /bin/bash -c /usr/lib/jvm/java-7-oracle/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms4096m -Xmx4096m -Djava.io.tmpdir=/dfs/dn1/yarn/local/usercache/tik/appcache/application_1414589211432_63031/container_1414589211432_63031_01_000010/tmp '-Dspark.akka.timeout=1000' '-Dspark.akka.frameSize=1000' org.apache.spark.executor.CoarseGrainedExecutorBackend akka.tcp://sparkdri...@h6.xxxxxxxxx.ru:53813/user/CoarseGrainedScheduler 16 h11.XXXXXXXXX.ru 2 1> /dfs/dn2/yarn/logs/application_1414589211432_63031/container_1414589211432_63031_01_000010/stdout 2> /dfs/dn2/yarn/logs/application_1414589211432_63031/container_1414589211432_63031_01_000010/stderr

Why doesn't Spark kill such processes?
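In the meantime, a possible manual cleanup is sketched below. This is only a workaround under stated assumptions, not a fix: it assumes the leftover containers run as the 'yarn' user (as in the listing above) and that the application id shown there has already finished, so matching on it is safe.

# List leftover Spark executor backends owned by the 'yarn' user;
# the [C] trick keeps grep from matching its own command line.
ps -o pid,etime,args -u yarn | grep [C]oarseGrainedExecutorBackend

# Once you have confirmed the owning application is no longer running,
# kill the stragglers by matching its (assumed stale) application id:
pkill -f application_1414589211432_63031

This would have to be run on each NodeManager host that shows orphaned processes, and only after checking the application's state (e.g. via yarn application -status), since pkill -f matches any process whose command line contains that id.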