Hi, I am getting an ExecutorLostFailure when I run Spark on YARN with very long-running tasks (a couple of hours each) inside map. The error log is below.
Do you know if there is a setting that lets Spark run such very long-running tasks in map?

Thank you very much for any advice.

Best regards,
Jan

Spark log:

4533,931: [GC 394578K->20882K(1472000K), 0,0226470 secs]
Traceback (most recent call last):
  File "/home/hadoop/spark_stuff/spark_lda.py", line 112, in <module>
    models.saveAsTextFile(sys.argv[1])
  File "/home/hadoop/spark/python/pyspark/rdd.py", line 1324, in saveAsTextFile
    keyed._jrdd.map(self.ctx._jvm.BytesToString()).saveAsTextFile(path)
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/home/hadoop/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o36.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in stage 0.0 failed 4 times, most recent failure: Lost task 28.3 in stage 0.0 (TID 41, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure (executor lost)
Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Yarn log:

14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:41091 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:39160 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:45058 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-241.us-west-2.compute.internal:54111 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-238.us-west-2.compute.internal:45772 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-241.us-west-2.compute.internal:59509 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:20:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-238.us-west-2.compute.internal:35720 (size: 596.9 KB, free: 775.7 MB)
14/11/08 08:21:11 INFO network.ConnectionManager: Removing 
SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509) 14/11/08 08:21:11 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509) 14/11/08 08:21:11 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,59509) not found 14/11/08 08:21:11 INFO cluster.YarnClientSchedulerBackend: Executor 10 disconnected, so removing it 14/11/08 08:21:11 ERROR cluster.YarnClientClusterScheduler: Lost executor 10 on ip-172-16-1-241.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:21:11 INFO scheduler.TaskSetManager: Re-queueing tasks for 10 from TaskSet 0.0 14/11/08 08:21:11 WARN scheduler.TaskSetManager: Lost task 28.0 in stage 0.0 (TID 28, ip-172-16-1-241.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:21:11 INFO scheduler.DAGScheduler: Executor lost: 10 (epoch 0) 14/11/08 08:21:11 INFO storage.BlockManagerMasterActor: Trying to remove executor 10 from BlockManagerMaster. 
14/11/08 08:21:11 INFO storage.BlockManagerMaster: Removed 10 successfully in removeExecutor 14/11/08 08:21:20 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-194.us-west-2.compute.internal,45823) 14/11/08 08:21:20 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-194.us-west-2.compute.internal,45823) 14/11/08 08:21:20 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-194.us-west-2.compute.internal,45823) 14/11/08 08:21:20 INFO cluster.YarnClientSchedulerBackend: Executor 5 disconnected, so removing it 14/11/08 08:21:20 ERROR cluster.YarnClientClusterScheduler: Lost executor 5 on ip-172-16-1-194.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:21:20 INFO scheduler.TaskSetManager: Re-queueing tasks for 5 from TaskSet 0.0 14/11/08 08:21:20 WARN scheduler.TaskSetManager: Lost task 21.0 in stage 0.0 (TID 21, ip-172-16-1-194.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:21:20 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 1) 14/11/08 08:21:20 INFO network.ConnectionManager: key already cancelled ? sun.nio.ch.SelectionKeyImpl@3bb633cd java.nio.channels.CancelledKeyException at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73) at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77) at org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:289) at org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139) 14/11/08 08:21:20 INFO storage.BlockManagerMasterActor: Trying to remove executor 5 from BlockManagerMaster. 
14/11/08 08:21:20 INFO storage.BlockManagerMaster: Removed 5 successfully in removeExecutor 14/11/08 08:21:21 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-92.us-west-2.compute.internal,50928) 14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-92.us-west-2.compute.internal,50928) 14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-92.us-west-2.compute.internal,50928) 14/11/08 08:21:21 INFO cluster.YarnClientSchedulerBackend: Executor 27 disconnected, so removing it 14/11/08 08:21:21 ERROR cluster.YarnClientClusterScheduler: Lost executor 27 on ip-172-16-1-92.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:21:21 INFO scheduler.TaskSetManager: Re-queueing tasks for 27 from TaskSet 0.0 14/11/08 08:21:21 WARN scheduler.TaskSetManager: Lost task 27.0 in stage 0.0 (TID 27, ip-172-16-1-92.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:21:21 INFO scheduler.DAGScheduler: Executor lost: 27 (epoch 2) 14/11/08 08:21:21 INFO storage.BlockManagerMasterActor: Trying to remove executor 27 from BlockManagerMaster. 
14/11/08 08:21:21 INFO storage.BlockManagerMaster: Removed 27 successfully in removeExecutor 14/11/08 08:21:21 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-152.us-west-2.compute.internal,41091) 14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-152.us-west-2.compute.internal,41091) 14/11/08 08:21:21 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-152.us-west-2.compute.internal,41091) 14/11/08 08:21:21 INFO cluster.YarnClientSchedulerBackend: Executor 20 disconnected, so removing it 14/11/08 08:21:21 ERROR cluster.YarnClientClusterScheduler: Lost executor 20 on ip-172-16-1-152.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:21:21 INFO scheduler.TaskSetManager: Re-queueing tasks for 20 from TaskSet 0.0 14/11/08 08:21:21 WARN scheduler.TaskSetManager: Lost task 29.0 in stage 0.0 (TID 29, ip-172-16-1-152.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:21:21 INFO scheduler.DAGScheduler: Executor lost: 20 (epoch 3) 14/11/08 08:21:21 INFO storage.BlockManagerMasterActor: Trying to remove executor 20 from BlockManagerMaster. 
14/11/08 08:21:21 INFO storage.BlockManagerMaster: Removed 20 successfully in removeExecutor 14/11/08 08:21:26 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-23.us-west-2.compute.internal,51269) 14/11/08 08:21:26 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-23.us-west-2.compute.internal,51269) 14/11/08 08:21:26 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-23.us-west-2.compute.internal,51269) 14/11/08 08:21:26 INFO cluster.YarnClientSchedulerBackend: Executor 6 disconnected, so removing it 14/11/08 08:21:26 ERROR cluster.YarnClientClusterScheduler: Lost executor 6 on ip-172-16-1-23.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:21:26 INFO scheduler.TaskSetManager: Re-queueing tasks for 6 from TaskSet 0.0 14/11/08 08:21:26 WARN scheduler.TaskSetManager: Lost task 24.0 in stage 0.0 (TID 24, ip-172-16-1-23.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:21:26 INFO scheduler.DAGScheduler: Executor lost: 6 (epoch 4) 14/11/08 08:21:26 INFO storage.BlockManagerMasterActor: Trying to remove executor 6 from BlockManagerMaster. 
14/11/08 08:21:26 INFO storage.BlockManagerMaster: Removed 6 successfully in removeExecutor 14/11/08 08:21:26 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-90.us-west-2.compute.internal,46792) 14/11/08 08:21:26 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-90.us-west-2.compute.internal,46792) 14/11/08 08:21:26 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(ip-172-16-1-90.us-west-2.compute.internal,46792) not found 14/11/08 08:21:26 INFO cluster.YarnClientSchedulerBackend: Executor 21 disconnected, so removing it 14/11/08 08:21:26 ERROR cluster.YarnClientClusterScheduler: Lost executor 21 on ip-172-16-1-90.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:21:26 INFO scheduler.TaskSetManager: Re-queueing tasks for 21 from TaskSet 0.0 14/11/08 08:21:26 WARN scheduler.TaskSetManager: Lost task 25.0 in stage 0.0 (TID 25, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:21:26 INFO scheduler.DAGScheduler: Executor lost: 21 (epoch 5) 14/11/08 08:21:26 INFO storage.BlockManagerMasterActor: Trying to remove executor 21 from BlockManagerMaster. 
14/11/08 08:21:26 INFO storage.BlockManagerMaster: Removed 21 successfully in removeExecutor 14/11/08 08:21:29 INFO cluster.YarnClientSchedulerBackend: Executor 18 disconnected, so removing it 14/11/08 08:21:29 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,43883) 14/11/08 08:21:29 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,43883) 14/11/08 08:21:29 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,43883) 14/11/08 08:21:29 ERROR cluster.YarnClientClusterScheduler: Lost executor 18 on ip-172-16-1-222.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:21:29 INFO scheduler.TaskSetManager: Re-queueing tasks for 18 from TaskSet 0.0 14/11/08 08:21:29 WARN scheduler.TaskSetManager: Lost task 26.0 in stage 0.0 (TID 26, ip-172-16-1-222.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:21:29 INFO scheduler.DAGScheduler: Executor lost: 18 (epoch 6) 14/11/08 08:21:29 INFO storage.BlockManagerMasterActor: Trying to remove executor 18 from BlockManagerMaster. 
14/11/08 08:21:29 INFO storage.BlockManagerMaster: Removed 18 successfully in removeExecutor 14/11/08 08:21:30 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-194.us-west-2.compute.internal:50858/user/Executor#935992941] with ID 31 14/11/08 08:21:30 INFO scheduler.TaskSetManager: Starting task 26.1 in stage 0.0 (TID 30, ip-172-16-1-194.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:21:30 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-194.us-west-2.compute.internal:44263 with 776.3 MB RAM 14/11/08 08:21:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-194.us-west-2.compute.internal:44263 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:21:33 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,40102) 14/11/08 08:21:33 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,40102) 14/11/08 08:21:33 ERROR network.ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(ip-172-16-1-222.us-west-2.compute.internal,40102) not found 14/11/08 08:21:33 INFO cluster.YarnClientSchedulerBackend: Executor 26 disconnected, so removing it 14/11/08 08:21:33 ERROR cluster.YarnClientClusterScheduler: Lost executor 26 on ip-172-16-1-222.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:21:33 INFO scheduler.TaskSetManager: Re-queueing tasks for 26 from TaskSet 0.0 14/11/08 08:21:33 WARN scheduler.TaskSetManager: Lost task 23.0 in stage 0.0 (TID 23, ip-172-16-1-222.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:21:33 INFO scheduler.DAGScheduler: Executor lost: 26 (epoch 7) 14/11/08 08:21:33 INFO storage.BlockManagerMasterActor: Trying to remove executor 26 from BlockManagerMaster. 
14/11/08 08:21:33 INFO storage.BlockManagerMaster: Removed 26 successfully in removeExecutor 14/11/08 08:21:36 INFO network.ConnectionManager: Removing ReceivingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310) 14/11/08 08:21:36 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310) 14/11/08 08:21:36 INFO cluster.YarnClientSchedulerBackend: Executor 1 disconnected, so removing it 14/11/08 08:21:36 ERROR cluster.YarnClientClusterScheduler: Lost executor 1 on ip-172-16-1-241.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:21:36 INFO scheduler.TaskSetManager: Re-queueing tasks for 1 from TaskSet 0.0 14/11/08 08:21:36 WARN scheduler.TaskSetManager: Lost task 22.0 in stage 0.0 (TID 22, ip-172-16-1-241.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:21:36 ERROR network.SendingConnection: Exception while reading SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310) java.nio.channels.ClosedChannelException at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295) at org.apache.spark.network.SendingConnection.read(Connection.scala:390) at org.apache.spark.network.ConnectionManager$$anon$7.run(ConnectionManager.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) 14/11/08 08:21:36 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 8) 14/11/08 08:21:36 INFO storage.BlockManagerMasterActor: Trying to remove executor 1 from BlockManagerMaster. 
14/11/08 08:21:36 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor 14/11/08 08:21:36 INFO network.ConnectionManager: Handling connection error on connection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310) 14/11/08 08:21:36 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310) 14/11/08 08:21:36 INFO network.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-172-16-1-241.us-west-2.compute.internal,43310) 14/11/08 08:21:40 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-194.us-west-2.compute.internal:58099/user/Executor#-112835629] with ID 34 14/11/08 08:21:40 INFO scheduler.TaskSetManager: Starting task 22.1 in stage 0.0 (TID 31, ip-172-16-1-194.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:21:41 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-194.us-west-2.compute.internal:41093 with 776.3 MB RAM 14/11/08 08:21:41 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-228.us-west-2.compute.internal:36136/user/Executor#318736262] with ID 32 14/11/08 08:21:41 INFO scheduler.TaskSetManager: Starting task 23.1 in stage 0.0 (TID 32, ip-172-16-1-228.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:21:41 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:33130/user/Executor#1744030597] with ID 33 14/11/08 08:21:41 INFO scheduler.TaskSetManager: Starting task 25.1 in stage 0.0 (TID 33, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:21:41 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-92.us-west-2.compute.internal:55503/user/Executor#574084779] with ID 35 14/11/08 08:21:41 INFO 
scheduler.TaskSetManager: Starting task 24.1 in stage 0.0 (TID 34, ip-172-16-1-92.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:21:42 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-228.us-west-2.compute.internal:40128 with 776.3 MB RAM 14/11/08 08:21:42 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:32839 with 776.3 MB RAM 14/11/08 08:21:42 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-92.us-west-2.compute.internal:58081 with 776.3 MB RAM 14/11/08 08:21:42 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-194.us-west-2.compute.internal:41093 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:21:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-228.us-west-2.compute.internal:40128 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:21:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-92.us-west-2.compute.internal:58081 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:21:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:32839 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:21:43 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-152.us-west-2.compute.internal:34268/user/Executor#-937582169] with ID 36 14/11/08 08:21:43 INFO scheduler.TaskSetManager: Starting task 29.1 in stage 0.0 (TID 35, ip-172-16-1-152.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:21:44 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-152.us-west-2.compute.internal:52550 with 776.3 MB RAM 14/11/08 08:21:45 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-152.us-west-2.compute.internal:52550 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:21:46 INFO cluster.YarnClientSchedulerBackend: 
Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:34555/user/Executor#-94727554] with ID 37 14/11/08 08:21:46 INFO scheduler.TaskSetManager: Starting task 27.1 in stage 0.0 (TID 36, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:21:46 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-228.us-west-2.compute.internal:34471/user/Executor#1412546630] with ID 38 14/11/08 08:21:46 INFO scheduler.TaskSetManager: Starting task 21.1 in stage 0.0 (TID 37, ip-172-16-1-228.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:21:47 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:46194 with 776.3 MB RAM 14/11/08 08:21:47 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-228.us-west-2.compute.internal:42275 with 776.3 MB RAM 14/11/08 08:21:48 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:46194 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:21:48 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-228.us-west-2.compute.internal:42275 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:21:50 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-23.us-west-2.compute.internal:37122/user/Executor#1404320204] with ID 39 14/11/08 08:21:51 INFO scheduler.TaskSetManager: Starting task 28.1 in stage 0.0 (TID 38, ip-172-16-1-23.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:21:51 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-23.us-west-2.compute.internal:33106 with 776.3 MB RAM 14/11/08 08:21:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-23.us-west-2.compute.internal:33106 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:22:36 INFO 
cluster.YarnClientSchedulerBackend: Executor 39 disconnected, so removing it 14/11/08 08:22:36 ERROR cluster.YarnClientClusterScheduler: Lost executor 39 on ip-172-16-1-23.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:22:36 INFO scheduler.TaskSetManager: Re-queueing tasks for 39 from TaskSet 0.0 14/11/08 08:22:36 WARN scheduler.TaskSetManager: Lost task 28.1 in stage 0.0 (TID 38, ip-172-16-1-23.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:22:36 INFO scheduler.DAGScheduler: Executor lost: 39 (epoch 9) 14/11/08 08:22:36 INFO storage.BlockManagerMasterActor: Trying to remove executor 39 from BlockManagerMaster. 14/11/08 08:22:36 INFO storage.BlockManagerMaster: Removed 39 successfully in removeExecutor 14/11/08 08:22:57 INFO cluster.YarnClientSchedulerBackend: Executor 36 disconnected, so removing it 14/11/08 08:22:57 ERROR cluster.YarnClientClusterScheduler: Lost executor 36 on ip-172-16-1-152.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:22:57 INFO scheduler.TaskSetManager: Re-queueing tasks for 36 from TaskSet 0.0 14/11/08 08:22:57 WARN scheduler.TaskSetManager: Lost task 29.1 in stage 0.0 (TID 35, ip-172-16-1-152.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:22:57 INFO scheduler.DAGScheduler: Executor lost: 36 (epoch 10) 14/11/08 08:22:57 INFO storage.BlockManagerMasterActor: Trying to remove executor 36 from BlockManagerMaster. 
14/11/08 08:22:57 INFO storage.BlockManagerMaster: Removed 36 successfully in removeExecutor 14/11/08 08:23:00 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:48033/user/Executor#-1088273404] with ID 40 14/11/08 08:23:00 INFO scheduler.TaskSetManager: Starting task 29.2 in stage 0.0 (TID 39, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:23:01 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:39067 with 776.3 MB RAM 14/11/08 08:23:03 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:39067 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:23:15 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-23.us-west-2.compute.internal:48860/user/Executor#-369895446] with ID 41 14/11/08 08:23:15 INFO scheduler.TaskSetManager: Starting task 28.2 in stage 0.0 (TID 40, ip-172-16-1-23.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:23:16 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-23.us-west-2.compute.internal:38093 with 776.3 MB RAM 14/11/08 08:23:17 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-23.us-west-2.compute.internal:38093 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:23:32 INFO cluster.YarnClientSchedulerBackend: Executor 34 disconnected, so removing it 14/11/08 08:23:32 ERROR cluster.YarnClientClusterScheduler: Lost executor 34 on ip-172-16-1-194.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:23:32 INFO scheduler.TaskSetManager: Re-queueing tasks for 34 from TaskSet 0.0 14/11/08 08:23:32 WARN scheduler.TaskSetManager: Lost task 22.1 in stage 0.0 (TID 31, ip-172-16-1-194.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:23:32 INFO scheduler.DAGScheduler: 
Executor lost: 34 (epoch 11) 14/11/08 08:23:32 INFO storage.BlockManagerMasterActor: Trying to remove executor 34 from BlockManagerMaster. 14/11/08 08:23:32 INFO storage.BlockManagerMaster: Removed 34 successfully in removeExecutor 14/11/08 08:23:53 INFO cluster.YarnClientSchedulerBackend: Executor 41 disconnected, so removing it 14/11/08 08:23:53 ERROR cluster.YarnClientClusterScheduler: Lost executor 41 on ip-172-16-1-23.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:23:53 INFO scheduler.TaskSetManager: Re-queueing tasks for 41 from TaskSet 0.0 14/11/08 08:23:53 WARN scheduler.TaskSetManager: Lost task 28.2 in stage 0.0 (TID 40, ip-172-16-1-23.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:23:53 INFO scheduler.DAGScheduler: Executor lost: 41 (epoch 12) 14/11/08 08:23:53 INFO storage.BlockManagerMasterActor: Trying to remove executor 41 from BlockManagerMaster. 14/11/08 08:23:53 INFO storage.BlockManagerMaster: Removed 41 successfully in removeExecutor 14/11/08 08:23:57 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:58017/user/Executor#2094507560] with ID 42 14/11/08 08:23:57 INFO scheduler.TaskSetManager: Starting task 28.3 in stage 0.0 (TID 41, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:23:58 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:41182 with 776.3 MB RAM 14/11/08 08:24:00 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:41182 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:24:04 INFO cluster.YarnClientSchedulerBackend: Executor 35 disconnected, so removing it 14/11/08 08:24:04 ERROR cluster.YarnClientClusterScheduler: Lost executor 35 on ip-172-16-1-92.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:24:04 INFO 
scheduler.TaskSetManager: Re-queueing tasks for 35 from TaskSet 0.0 14/11/08 08:24:04 WARN scheduler.TaskSetManager: Lost task 24.1 in stage 0.0 (TID 34, ip-172-16-1-92.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:24:04 INFO scheduler.DAGScheduler: Executor lost: 35 (epoch 13) 14/11/08 08:24:04 INFO storage.BlockManagerMasterActor: Trying to remove executor 35 from BlockManagerMaster. 14/11/08 08:24:04 INFO storage.BlockManagerMaster: Removed 35 successfully in removeExecutor 14/11/08 08:24:17 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:36395/user/Executor#-1907878650] with ID 43 14/11/08 08:24:17 INFO scheduler.TaskSetManager: Starting task 24.2 in stage 0.0 (TID 42, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:24:18 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:46948 with 776.3 MB RAM 14/11/08 08:24:20 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:46948 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:24:21 INFO cluster.YarnClientSchedulerBackend: Executor 40 disconnected, so removing it 14/11/08 08:24:21 ERROR cluster.YarnClientClusterScheduler: Lost executor 40 on ip-172-16-1-90.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:24:21 INFO scheduler.TaskSetManager: Re-queueing tasks for 40 from TaskSet 0.0 14/11/08 08:24:21 WARN scheduler.TaskSetManager: Lost task 29.2 in stage 0.0 (TID 39, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:24:21 INFO scheduler.DAGScheduler: Executor lost: 40 (epoch 14) 14/11/08 08:24:21 INFO storage.BlockManagerMasterActor: Trying to remove executor 40 from BlockManagerMaster. 
14/11/08 08:24:21 INFO storage.BlockManagerMaster: Removed 40 successfully in removeExecutor 14/11/08 08:24:31 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:34467/user/Executor#-1100688472] with ID 44 14/11/08 08:24:31 INFO scheduler.TaskSetManager: Starting task 29.3 in stage 0.0 (TID 43, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:24:32 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:40126 with 776.3 MB RAM 14/11/08 08:24:34 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:40126 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:24:48 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkexecu...@ip-172-16-1-90.us-west-2.compute.internal:53257/user/Executor#-745380917] with ID 45 14/11/08 08:24:48 INFO scheduler.TaskSetManager: Starting task 22.2 in stage 0.0 (TID 44, ip-172-16-1-90.us-west-2.compute.internal, PROCESS_LOCAL, 1122 bytes) 14/11/08 08:24:49 INFO storage.BlockManagerMasterActor: Registering block manager ip-172-16-1-90.us-west-2.compute.internal:46252 with 776.3 MB RAM 14/11/08 08:24:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-16-1-90.us-west-2.compute.internal:46252 (size: 596.9 KB, free: 775.7 MB) 14/11/08 08:25:16 INFO cluster.YarnClientSchedulerBackend: Executor 38 disconnected, so removing it 14/11/08 08:25:16 ERROR cluster.YarnClientClusterScheduler: Lost executor 38 on ip-172-16-1-228.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:25:16 INFO scheduler.TaskSetManager: Re-queueing tasks for 38 from TaskSet 0.0 14/11/08 08:25:16 WARN scheduler.TaskSetManager: Lost task 21.1 in stage 0.0 (TID 37, ip-172-16-1-228.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:25:16 INFO scheduler.DAGScheduler: 
Executor lost: 38 (epoch 15) 14/11/08 08:25:16 INFO storage.BlockManagerMasterActor: Trying to remove executor 38 from BlockManagerMaster. 14/11/08 08:25:16 INFO storage.BlockManagerMaster: Removed 38 successfully in removeExecutor 14/11/08 08:25:37 INFO cluster.YarnClientSchedulerBackend: Executor 42 disconnected, so removing it 14/11/08 08:25:37 ERROR cluster.YarnClientClusterScheduler: Lost executor 42 on ip-172-16-1-90.us-west-2.compute.internal: remote Akka client disassociated 14/11/08 08:25:37 INFO scheduler.TaskSetManager: Re-queueing tasks for 42 from TaskSet 0.0 14/11/08 08:25:37 WARN scheduler.TaskSetManager: Lost task 28.3 in stage 0.0 (TID 41, ip-172-16-1-90.us-west-2.compute.internal): ExecutorLostFailure (executor lost) 14/11/08 08:25:37 ERROR scheduler.TaskSetManager: Task 28 in stage 0.0 failed 4 times; aborting job 14/11/08 08:25:37 INFO cluster.YarnClientClusterScheduler: Cancelling stage 0 14/11/08 08:25:37 INFO cluster.YarnClientClusterScheduler: Stage 0 was cancelled 14/11/08 08:25:37 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at NativeMethodAccessorImpl.java:-2 14/11/08 08:25:37 INFO scheduler.DAGScheduler: Executor lost: 42 (epoch 16) 14/11/08 08:25:37 INFO storage.BlockManagerMasterActor: Trying to remove executor 42 from BlockManagerMaster. 14/11/08 08:25:37 INFO storage.BlockManagerMaster: Removed 42 successfully in removeExecutor
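
The "Lost executor ... on <host>: <reason>" entries in the Yarn log above may be easier to read as a per-host tally. The following is only a rough sketch for summarising such a log, not a fix for the failure. It assumes the log has one entry per line; the function name summarize_lost_executors and the regular expression are made up for this illustration and are not part of Spark.

```python
import re
from collections import Counter

# Matches driver-log entries of the form seen above, for example:
#   ... ERROR cluster.YarnClientClusterScheduler: Lost executor 10 on <host>: remote Akka client disassociated
# Assumption: one log entry per line.
LOST_EXECUTOR = re.compile(r"Lost executor (\d+) on ([^\s:]+): (.+?)\s*$")


def summarize_lost_executors(log_lines):
    """Count lost executors per host and per reported reason.

    log_lines is any iterable of strings (a list, or an open file).
    Returns a pair of Counters: (losses by host, losses by reason).
    """
    by_host = Counter()
    by_reason = Counter()
    for line in log_lines:
        match = LOST_EXECUTOR.search(line)
        if match is None:
            continue  # not a "Lost executor" entry
        _executor_id, host, reason = match.groups()
        by_host[host] += 1
        by_reason[reason] += 1
    return by_host, by_reason
```

With the log saved to a file (the name driver.log is hypothetical), passing the open file object to summarize_lost_executors and printing by_host.most_common() would show whether the losses cluster on a few nodes or are spread over the whole cluster.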