Hi,

I am using Spark 0.8 and Hadoop 1.2.1 on a cluster of 2 nodes; each node has 16 CPUs and is allocated 8 GB of RAM.
I am running into a case where, if I try to save a very large JavaRDD<String> that was created with parallelize from a Java List<String>, my job's workers fail as follows:

13/11/11 19:23:48 INFO Worker: Executor app-20131111191414-0001/2 finished with state FAILED message Command exited with code 1 exitStatus 1

It looks like the Spark driver tries 5 times to execute it and then decides to kill the process.

Any help on how to get more information about the reason for the failure, or on what "code 1 exitStatus 1" means here? Is there any setting or configuration in Spark that would dump more information on errors?

Here are my logs:

13/11/11 19:14:50 INFO Worker: Asked to launch executor app-20131111190659-0000/0 for OMDBQueryService
13/11/11 19:14:50 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "0" "poc3" "16"
13/11/11 19:16:47 INFO Worker: Executor app-20131111190659-0000/0 finished with state FAILED message Command exited with code 1 exitStatus 1
13/11/11 19:16:47 INFO Worker: Asked to launch executor app-20131111190659-0000/2 for OMDBQueryService
13/11/11 19:16:47 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "2" "poc3" "16"
13/11/11 19:16:53 INFO Worker: Executor app-20131111190659-0000/2 finished with state FAILED message Command exited with code 1 exitStatus 1
13/11/11 19:16:53 INFO Worker: Asked to launch executor app-20131111190659-0000/4 for OMDBQueryService
13/11/11 19:16:53 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "4" "poc3" "16"
13/11/11 19:17:02 INFO Worker: Executor app-20131111190659-0000/4 finished with state FAILED message Command exited with code 1 exitStatus 1
13/11/11 19:17:02 INFO Worker: Asked to launch executor app-20131111190659-0000/6 for OMDBQueryService
13/11/11 19:17:02 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "6" "poc3" "16"
13/11/11 19:17:09 INFO Worker: Executor app-20131111190659-0000/6 finished with state FAILED message Command exited with code 1 exitStatus 1
13/11/11 19:17:09 INFO Worker: Asked to launch executor app-20131111190659-0000/8 for OMDBQueryService
13/11/11 19:17:09 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "8" "poc3" "16"
13/11/11 19:17:17 INFO Worker: Executor app-20131111190659-0000/8 finished with state FAILED message Command exited with code 1 exitStatus 1
13/11/11 19:17:17 INFO Worker: Asked to launch executor app-20131111190659-0000/10 for OMDBQueryService
13/11/11 19:17:17 INFO ExecutorRunner: Launch command: "java" "-cp" ":/opt/spark-0.8.0/conf:/opt/spark-0.8.0/assembly/target/scala-2.9.3/spark-assembly_2.9.3-0.8.0-incubating-hadoop1.0.4.jar" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Dspark.executor.memory=8g" "-Dspark.local.dir=/tmp/spark" "-XX:+UseParallelGC" "-XX:+UseParallelOldGC" "-XX:+DisableExplicitGC" "-XX:MaxPermSize=1024m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.StandaloneExecutorBackend" "akka://spark@poc1:54482/user/StandaloneScheduler" "10" "poc3" "16"
13/11/11 19:17:20 INFO Worker: Asked to kill executor app-20131111190659-0000/10
13/11/11 19:17:20 INFO ExecutorRunner: Killing process!
13/11/11 19:17:20 INFO ExecutorRunner: Runner thread for executor app-20131111190659-0000/10 interrupted
13/11/11 19:17:21 INFO Worker: Executor app-20131111190659-0000/10 finished with state KILLED
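
In case it helps anyone reading the Worker log above, here is a minimal sketch of how the "finished with state" lines could be pulled out and summarized per executor. It uses only the Java standard library; the class WorkerLogScan, its ExecutorExit record, and the scan method are hypothetical names made up for this illustration and are not part of Spark. The pattern is inferred only from the lines shown in this post, so other Spark versions may format them differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Spark): extracts executor exit records
// from standalone Worker log lines like the ones in this post.
class WorkerLogScan {

    // Matches "Executor <appId>/<executorId> finished with state <STATE>".
    // The trailing "message ... exitStatus <n>" part is optional, because
    // the KILLED line in the log carries no exit status.
    private static final Pattern FINISHED = Pattern.compile(
        "Executor (app-\\d+-\\d+)/(\\d+) finished with state (\\w+)"
        + "(?: message .*? exitStatus (\\d+))?");

    // One parsed record; exitStatus is null when the line has none.
    static final class ExecutorExit {
        final String appId;
        final int executorId;
        final String state;
        final Integer exitStatus;

        ExecutorExit(String appId, int executorId, String state, Integer exitStatus) {
            this.appId = appId;
            this.executorId = executorId;
            this.state = state;
            this.exitStatus = exitStatus;
        }

        @Override
        public String toString() {
            return appId + "/" + executorId + " " + state
                + (exitStatus == null ? "" : " exitStatus=" + exitStatus);
        }
    }

    // Returns one record per matching line, in log order; other lines are skipped.
    static List<ExecutorExit> scan(List<String> lines) {
        List<ExecutorExit> exits = new ArrayList<ExecutorExit>();
        for (String line : lines) {
            Matcher m = FINISHED.matcher(line);
            if (m.find()) {
                Integer status = m.group(4) == null ? null : Integer.valueOf(m.group(4));
                exits.add(new ExecutorExit(
                    m.group(1), Integer.parseInt(m.group(2)), m.group(3), status));
            }
        }
        return exits;
    }

    public static void main(String[] args) {
        // Three lines copied from the log in this post.
        List<String> sample = Arrays.asList(
            "13/11/11 19:16:47 INFO Worker: Executor app-20131111190659-0000/0 finished with state FAILED message Command exited with code 1 exitStatus 1",
            "13/11/11 19:16:47 INFO Worker: Asked to launch executor app-20131111190659-0000/2 for OMDBQueryService",
            "13/11/11 19:17:21 INFO Worker: Executor app-20131111190659-0000/10 finished with state KILLED");
        for (ExecutorExit e : scan(sample)) {
            System.out.println(e);
        }
    }
}
```

A summary like this only shows which executors exited and with what status; it does not explain why the JVM exited.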
