Hi,

Our Spark job on YARN suddenly started failing, and nothing in the output points to the actual cause. The full spark-submit --verbose output and trace are below.
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/log4j.properties
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.yarn.historyServer.address=http://ds-hnn002.dev.abc.com:18088
Adding default property: spark.yarn.am.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.ui.showConsoleProgress=true
Adding default property: spark.shuffle.service.port=7337
Adding default property: spark.master=yarn-client
Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
Adding default property: spark.eventLog.dir=hdfs://magnetic-hadoop-dev/user/spark/applicationHistory
Adding default property: spark.yarn.jar=local:/usr/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
Parsed arguments:
  master                  yarn
  deployMode              null
  executorMemory          3G
  executorCores           null
  totalExecutorCores      null
  propertiesFile          /usr/lib/spark/conf/spark-defaults.conf
  driverMemory            4G
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  /usr/lib/hadoop/lib/native
  driverExtraJavaOptions  null
  supervise               false
  queue                   null
  numExecutors            30
  files                   null
  pyFiles                 null
  archives                null
  mainClass               null
  primaryResource         file:/home/jonathanarfa/code/updb/spark/updb2vw_testing.py
  name                    updb2vw_testing.py
  childArgs               [--date 2015-05-20]
  jars                    null
  packages                null
  repositories            null
  verbose                 true

Spark properties used, including those specified through --conf and those from the properties file /usr/lib/spark/conf/spark-defaults.conf:
  spark.executor.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.yarn.jar -> local:/usr/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
  spark.driver.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.yarn.historyServer.address -> http://ds-hnn002.dev.abc.com:18088
  spark.yarn.am.extraLibraryPath -> /usr/lib/hadoop/lib/native
  spark.eventLog.enabled -> true
  spark.ui.showConsoleProgress -> true
  spark.serializer -> org.apache.spark.serializer.KryoSerializer
  spark.executor.extraJavaOptions -> -Dlog4j.configuration=file:///etc/spark/log4j.properties
  spark.shuffle.service.enabled -> true
  spark.shuffle.service.port -> 7337
  spark.eventLog.dir -> hdfs://magnetic-hadoop-dev/user/spark/applicationHistory
  spark.master -> yarn-client

Main class:
org.apache.spark.deploy.PythonRunner
Arguments:
file:/home/jonathanarfa/code/updb/spark/updb2vw_testing.py
null
--date 2015-05-20
System properties:
spark.executor.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.driver.memory -> 4G
spark.executor.memory -> 3G
spark.yarn.jar -> local:/usr/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
spark.driver.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.executor.instances -> 30
spark.yarn.historyServer.address -> http://ds-hnn002.dev.abc.com:18088
spark.yarn.am.extraLibraryPath -> /usr/lib/hadoop/lib/native
spark.ui.showConsoleProgress -> true
spark.eventLog.enabled -> true
spark.yarn.dist.files -> file:/home/jonathanarfa/code/updb/spark/updb2vw_testing.py
SPARK_SUBMIT -> true
spark.serializer -> org.apache.spark.serializer.KryoSerializer
spark.executor.extraJavaOptions -> -Dlog4j.configuration=file:///etc/spark/log4j.properties
spark.shuffle.service.enabled -> true
spark.app.name -> updb2vw_testing.py
spark.shuffle.service.port -> 7337
spark.eventLog.dir -> hdfs://magnetic-hadoop-dev/user/spark/applicationHistory
spark.master -> yarn-client
Classpath elements:

spark.akka.frameSize=60
spark.app.name=updb2vw_2015-05-20
spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.driver.maxResultSize=2G
spark.driver.memory=4G
spark.eventLog.dir=hdfs://magnetic-hadoop-dev/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/log4j.properties
spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.executor.instances=30
spark.executor.memory=3G
spark.master=yarn-client
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.manager=hash
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.task.maxFailures=6
spark.ui.showConsoleProgress=true
spark.yarn.am.extraLibraryPath=/usr/lib/hadoop/lib/native
spark.yarn.dist.files=file:/home/jonathanarfa/code/updb/spark/updb2vw_testing.py
spark.yarn.executor.memoryOverhead=2000
spark.yarn.historyServer.address=http://ds-hnn002.dev.abc.com:18088
spark.yarn.jar=local:/usr/lib/spark/assembly/lib/spark-assembly-1.3.0-cdh5.4.0-hadoop2.6.0-cdh5.4.0.jar
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/06/22 17:04:45 WARN Utils: Your hostname, datasci01.dev.abc.com resolves to a loopback address: 127.0.0.1; using 10.0.3.197 instead (on interface eth0)
15/06/22 17:04:45 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Traceback (most recent call last):
  File "/home/jonathanarfa/code/updb/spark/updb2vw_testing.py", line 125, in <module>
    spark_context = pyspark.SparkContext(conf=conf)
  File "/usr/lib/spark/python/pyspark/context.py", line 111, in __init__
    conf, jsc, profiler_cls)
  File "/usr/lib/spark/python/pyspark/context.py", line 159, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
  File "/usr/lib/spark/python/pyspark/context.py", line 212, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
  File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 701, in __call__
  File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:113)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:379)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
	at py4j.Gateway.invoke(Gateway.java:214)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
	at py4j.GatewayConnection.run(GatewayConnection.java:207)
	at java.lang.Thread.run(Thread.java:745)
15/06/22 17:08:27 ERROR Utils: Uncaught exception in thread delete Spark local dirs
java.lang.NullPointerException
	at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
	at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
	at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
	at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
	at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
Exception in thread "delete Spark local dirs" java.lang.NullPointerException
	at org.apache.spark.storage.DiskBlockManager.org$apache$spark$storage$DiskBlockManager$$doStop(DiskBlockManager.scala:161)
	at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:141)
	at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
	at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:139)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1617)
	at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:139)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

thanks
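P.S. For reference, here is a minimal sketch of the context setup that the traceback points at (updb2vw_testing.py, line 125). The conf values are reconstructed from the key=value dump printed above rather than copied from the actual script, so treat the exact calls as illustrative:

    import pyspark
    from pyspark import SparkConf

    # Conf keys taken from the properties dump above; the real script may
    # set them differently (or rely on spark-defaults.conf for some of them).
    conf = (SparkConf()
            .setAppName("updb2vw_2015-05-20")
            .set("spark.akka.frameSize", "60")
            .set("spark.driver.maxResultSize", "2G")
            .set("spark.shuffle.manager", "hash")
            .set("spark.task.maxFailures", "6")
            .set("spark.yarn.executor.memoryOverhead", "2000"))

    # This is the call that fails at line 125: the JavaSparkContext
    # constructor raises once YARN reports the application has already ended.
    spark_context = pyspark.SparkContext(conf=conf)

From what I can tell, the SparkException ("Yarn application has already ended!") is raised on the client side after the ApplicationMaster exits, so the real failure reason is presumably in the YARN container logs for the application (e.g. yarn logs -applicationId <appId>, or the ResourceManager UI) rather than in this trace, and the NullPointerException in the "delete Spark local dirs" thread looks like a secondary symptom of the context never finishing startup.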