Era created SPARK-13711:
---------------------------

             Summary: Apache Spark driver stopping JVM when master not available
                 Key: SPARK-13711
                 URL: https://issues.apache.org/jira/browse/SPARK-13711
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.0, 1.4.1
            Reporter: Era
In my application, the Java Spark context is created with an unavailable master URL (you may assume the master is down for maintenance). Creating the JavaSparkContext then stops the JVM that runs the Spark driver, with JVM exit code 50. When I checked the logs I found SparkUncaughtExceptionHandler calling System.exit. My program should run forever.

package test.mains;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CheckJavaSparkContext {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setAppName("test");
        conf.setMaster("spark://sunshinee:7077");

        try {
            new JavaSparkContext(conf);
        } catch (Throwable e) {
            System.out.println("Caught an exception : " + e.getMessage());
            //e.printStackTrace();
        }

        System.out.println("Waiting to complete...");
        while (true) { }
    }
}

Output log:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/downloads/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/03/04 18:01:15 INFO SparkContext: Running Spark version 1.6.0
16/03/04 18:01:17 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/03/04 18:01:17 WARN Utils: Your hostname, pesamara-mobl-vm1 resolves to a loopback address: 127.0.0.1; using 10.30.9.107 instead (on interface eth0)
16/03/04 18:01:17 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/03/04 18:01:18 INFO SecurityManager: Changing view acls to: ps40233
16/03/04 18:01:18 INFO SecurityManager: Changing modify acls to: ps40233
16/03/04 18:01:18 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ps40233); users with modify permissions: Set(ps40233)
16/03/04 18:01:19 INFO Utils: Successfully started service 'sparkDriver' on port 55309.
16/03/04 18:01:21 INFO Slf4jLogger: Slf4jLogger started
16/03/04 18:01:21 INFO Remoting: Starting remoting
16/03/04 18:01:22 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.30.9.107:52128]
16/03/04 18:01:22 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 52128.
16/03/04 18:01:22 INFO SparkEnv: Registering MapOutputTracker
16/03/04 18:01:22 INFO SparkEnv: Registering BlockManagerMaster
16/03/04 18:01:22 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-87c20178-357d-4252-a46a-62a755568a98
16/03/04 18:01:22 INFO MemoryStore: MemoryStore started with capacity 457.7 MB
16/03/04 18:01:22 INFO SparkEnv: Registering OutputCommitCoordinator
16/03/04 18:01:23 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/03/04 18:01:23 INFO SparkUI: Started SparkUI at http://10.30.9.107:4040
16/03/04 18:01:24 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
16/03/04 18:01:24 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
java.io.IOException: Failed to connect to sunshinee:7077
	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
	at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
	at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
	at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
	at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:123)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
	at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
	at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
	at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
	at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
	at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
	at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
	at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
	at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
	at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	... 1 more
16/03/04 18:01:44 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
16/03/04 18:01:44 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
java.io.IOException: Failed to connect to sunshinee:7077
	[same stack trace as above]
16/03/04 18:02:04 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
16/03/04 18:02:04 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
java.io.IOException: Failed to connect to sunshinee:7077
	[same stack trace as above]
16/03/04 18:02:24 INFO AppClient$ClientEndpoint: Connecting to master spark://sunshinee:7077...
16/03/04 18:02:24 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
16/03/04 18:02:24 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
16/03/04 18:02:24 WARN AppClient$ClientEndpoint: Failed to connect to master sunshinee:7077
java.io.IOException: Failed to connect to sunshinee:7077
	[same stack trace as above]
16/03/04 18:02:24 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44298.
16/03/04 18:02:24 INFO NettyBlockTransferService: Server created on 44298
16/03/04 18:02:24 INFO BlockManagerMaster: Trying to register BlockManager
16/03/04 18:02:24 INFO BlockManagerMasterEndpoint: Registering block manager 10.30.9.107:44298 with 457.7 MB RAM, BlockManagerId(driver, 10.30.9.107, 44298)
16/03/04 18:02:24 INFO BlockManagerMaster: Registered BlockManager
16/03/04 18:02:24 INFO SparkUI: Stopped Spark web UI at http://10.30.9.107:4040
16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/03/04 18:02:24 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/03/04 18:02:24 WARN AppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
16/03/04 18:02:24 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
	at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
	at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
	at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
	at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
	at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
	at scala.concurrent.Await$.result(package.scala:107)
	at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
	at org.apache.spark.deploy.client.AppClient.stop(AppClient.scala:290)
	at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.org$apache$spark$scheduler$cluster$SparkDeploySchedulerBackend$$stop(SparkDeploySchedulerBackend.scala:198)
	at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.stop(SparkDeploySchedulerBackend.scala:101)
	at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:446)
	at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:1582)
	at org.apache.spark.SparkContext$$anonfun$stop$7.apply$mcV$sp(SparkContext.scala:1731)
	at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1730)
	at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127)
	at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
	at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134)
	at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163)
	at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
16/03/04 18:02:24 INFO DiskBlockManager: Shutdown hook called
16/03/04 18:02:24 INFO ShutdownHookManager: Shutdown hook called
16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52
16/03/04 18:02:24 INFO ShutdownHookManager: Deleting directory /tmp/spark-ea68a0fa-4f0d-4dbb-8407-cce90ef78a52/userFiles-db548748-a55c-4406-adcb-c09e63b118bd

Java Result: 50

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
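Editor's note on why the try/catch in the repro above never fires: the fatal error is raised on Spark's appclient-registration-retry-thread and lands in the JVM-wide default uncaught exception handler (SparkUncaughtExceptionHandler, which calls System.exit), not in the thread that constructed the context. The following is a minimal stand-alone sketch of that mechanism; a recording handler is substituted for the exiting one so the demo can run to completion, and all names here are illustrative, not Spark APIs.

```java
public class UncaughtExitDemo {

    static volatile String handled = null;

    /**
     * Runs a background thread whose exception reaches the JVM-wide uncaught
     * exception handler rather than the caller's try/catch. Returns the
     * message the handler saw.
     */
    static String run() throws InterruptedException {
        // Analogue of SparkUncaughtExceptionHandler, which in Spark calls
        // System.exit; here it only records, so the demo can finish.
        Thread.setDefaultUncaughtExceptionHandler((t, e) -> handled = e.getMessage());

        Thread retry = new Thread(() -> {
            throw new RuntimeException("master unreachable");
        });
        try {
            retry.start();
            retry.join();
        } catch (Throwable e) {
            // Never reached: exceptions do not propagate across threads
            // into the caller's try/catch.
            return "caught in caller: " + e.getMessage();
        }
        return handled;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("handler saw: " + run());
        // prints "handler saw: master unreachable"
    }
}
```

In the real driver the handler's System.exit runs instead of the recording lambda, which is why the process dies with code 50 regardless of any try/catch around new JavaSparkContext(conf).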
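Until the underlying behavior changes, one possible application-side mitigation (an editor's suggestion under stated assumptions, not a Spark API) is a pre-flight TCP check against the master before constructing the context, so the driver never enters the registration-retry path that ends in System.exit. The host and port below are taken from the report and are placeholders for your cluster:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class MasterReachabilityCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (Exception e) {
            // Covers UnknownHostException, ConnectException, timeouts, etc.
            return false;
        }
    }

    public static void main(String[] args) {
        // "sunshinee:7077" is the master URL from the report above.
        if (!isReachable("sunshinee", 7077, 5000)) {
            System.out.println("Master not reachable; skipping SparkContext creation.");
            return; // avoids triggering the driver's exit-on-failure path
        }
        // Safe to create the JavaSparkContext here.
    }
}
```

This only narrows the window (the master can still go down between the check and context creation), but it avoids the common case of starting a driver against a master that is already offline.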