[jira] [Commented] (SPARK-20814) Mesos scheduler does not respect spark.executor.extraClassPath configuration
[ https://issues.apache.org/jira/browse/SPARK-20814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072244#comment-16072244 ]

Laurent Hoss commented on SPARK-20814:
--------------------------------------

> Hmm, this sucks, we should fix it for 2.2!

+1 (for the whole Spark-on-Mesos community). The patch also looks straightforward, with no risk of breaking anything unrelated to Mesos, so why didn't it make it into 2.2.0?!

> Mesos scheduler does not respect spark.executor.extraClassPath configuration
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-20814
>                 URL: https://issues.apache.org/jira/browse/SPARK-20814
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 2.2.0
>            Reporter: Gene Pang
>            Assignee: Marcelo Vanzin
>            Priority: Critical
>             Fix For: 2.2.1
>
> When Spark executors are deployed on Mesos, the Mesos scheduler no longer
> respects the "spark.executor.extraClassPath" configuration parameter.
> MesosCoarseGrainedSchedulerBackend used to use the environment variable
> "SPARK_CLASSPATH" to add the value of "spark.executor.extraClassPath" to the
> executor classpath. However, "SPARK_CLASSPATH" was deprecated and removed in
> this commit:
> https://github.com/apache/spark/commit/8f0490e22b4c7f1fdf381c70c5894d46b7f7e6fb#diff-387c5d0c916278495fc28420571adf9eL178
> This effectively broke the ability for users to specify
> "spark.executor.extraClassPath" for Spark executors deployed on Mesos.
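For context, a minimal sketch of the idea behind the fix, using hypothetical helper names (this is not the actual Spark patch): the executor JVM classpath is built from the Spark configuration instead of from the removed SPARK_CLASSPATH environment variable.

{code}
import org.apache.spark.SparkConf

// Hypothetical sketch, not the actual patch: read the user-supplied entries
// from the configuration and prepend them to the executor JVM classpath.
def executorClassPath(conf: SparkConf, sparkJars: String): String = {
  val extra = conf.getOption("spark.executor.extraClassPath").toSeq
  (extra :+ sparkJars).mkString(":")
}

val conf = new SparkConf().set("spark.executor.extraClassPath", "/opt/libs/*")
// Illustrative launch command for the executor backend; the jar location
// "/opt/spark/jars/*" is an assumption.
val cmd = s"java -cp ${executorClassPath(conf, "/opt/spark/jars/*")} " +
  "org.apache.spark.executor.CoarseGrainedExecutorBackend"
{code}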
[jira] [Commented] (SPARK-19606) Support constraints in spark-dispatcher
[ https://issues.apache.org/jira/browse/SPARK-19606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031146#comment-16031146 ]

Laurent Hoss commented on SPARK-19606:
--------------------------------------

+1 hopefully this PR gets comitter attention

> Support constraints in spark-dispatcher
> -----------------------------------------
>
>                 Key: SPARK-19606
>                 URL: https://issues.apache.org/jira/browse/SPARK-19606
>             Project: Spark
>          Issue Type: New Feature
>          Components: Mesos
>    Affects Versions: 2.1.0
>            Reporter: Philipp Hoffmann
>
> The `spark.mesos.constraints` configuration is ignored by the
> spark-dispatcher. The constraints need to be passed in the Framework
> information when registering with Mesos.
[jira] [Comment Edited] (SPARK-19606) Support constraints in spark-dispatcher
[ https://issues.apache.org/jira/browse/SPARK-19606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031146#comment-16031146 ]

Laurent Hoss edited comment on SPARK-19606 at 5/31/17 1:28 PM:
---------------------------------------------------------------

+1 hopefully this PR gets committer attention

was (Author: laurentcoder):
+1 hopefully this PR gets comitter attention

> Support constraints in spark-dispatcher
> -----------------------------------------
>
>                 Key: SPARK-19606
>                 URL: https://issues.apache.org/jira/browse/SPARK-19606
>             Project: Spark
>          Issue Type: New Feature
>          Components: Mesos
>    Affects Versions: 2.1.0
>            Reporter: Philipp Hoffmann
>
> The `spark.mesos.constraints` configuration is ignored by the
> spark-dispatcher. The constraints need to be passed in the Framework
> information when registering with Mesos.
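For reference, a usage sketch of the setting that the dispatcher currently ignores. The attribute names and values ("rack", "zone") are made-up examples; the "attribute:value" pairs separated by ';' follow the documented spark.mesos.constraints format.

{code}
import org.apache.spark.SparkConf

// The constraints below would restrict offers to Mesos agents whose
// attributes match; today the dispatcher drops this setting on the floor.
val conf = new SparkConf()
  .setAppName("constrained-job")
  .set("spark.mesos.constraints", "rack:r1;zone:east")
{code}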
[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031143#comment-16031143 ]

Laurent Hoss commented on SPARK-15142:
--------------------------------------

+1 to get this solved!

(PS: we are calling the dispatcher with requests generated from a *Livy* container.)

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> ------------------------------------------------------------------------
>
>                 Key: SPARK-15142
>                 URL: https://issues.apache.org/jira/browse/SPARK-15142
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Mesos
>            Reporter: Devaraj K
>            Priority: Minor
>         Attachments: spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out
>
> While the Spark Mesos dispatcher is running, if the Mesos master gets restarted, the
> Spark Mesos dispatcher will keep running but will queue up all the submitted
> applications and never launch them.
[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory
[ https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875962#comment-15875962 ]

Laurent Hoss commented on SPARK-2984:
-------------------------------------

Related issue (still open): SPARK-10109

> FileNotFoundException on _temporary directory
> -----------------------------------------------
>
>                 Key: SPARK-2984
>                 URL: https://issues.apache.org/jira/browse/SPARK-2984
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Andrew Ash
>            Assignee: Josh Rosen
>            Priority: Critical
>             Fix For: 1.3.0
>
> We've seen several stack traces and threads on the user mailing list where
> people are having issues with a {{FileNotFoundException}} stemming from an
> HDFS path containing {{_temporary}}.
> I ([~aash]) think this may be related to {{spark.speculation}}. I think the
> error condition might manifest in this circumstance:
> 1) task T starts on an executor E1
> 2) it takes a long time, so task T' is started on another executor E2
> 3) T finishes in E1, so it moves its data from {{_temporary}} to the final
> destination and deletes the {{_temporary}} directory during cleanup
> 4) T' finishes in E2 and attempts to move its data from {{_temporary}}, but
> those files no longer exist! Exception.
> Some samples:
> {noformat}
> 14/08/11 08:05:08 ERROR JobScheduler: Error running job streaming job 140774430 ms.0
> java.io.FileNotFoundException: File hdfs://hadoopc/user/csong/output/human_bot/-140774430.out/_temporary/0/task_201408110805__m_07 does not exist.
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360)
>         at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
>         at org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
>         at org.apache.spark.SparkHadoopWriter.commitJob(SparkHadoopWriter.scala:126)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:841)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:724)
>         at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:643)
>         at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1068)
>         at org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:773)
>         at org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:771)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
>         at scala.util.Try$.apply(Try.scala:161)
>         at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
>         at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
> -- Chen Song at
> http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFiles-file-not-found-exception-td10686.html
> {noformat}
> I am running a Spark Streaming job that uses saveAsTextFiles to save results
> into hdfs files. However, it has an exception after 20 batches
> result-140631234/_temporary/0/task_201407251119__m_03 does not exist.
> {noformat}
> and
> {noformat}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /apps/data/vddil/real-time/checkpoint/temp: File does not exist. Holder DFSClient_NONMAPREDUCE_327993456_13 does not have any open files.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2946)
>         at
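Since the race described above is between a task and its speculative copy, one blunt mitigation is to keep speculative execution off (it is off by default), at the cost of losing straggler re-execution; a minimal sketch:

{code}
import org.apache.spark.SparkConf

// With speculation disabled there is never a second attempt T' racing the
// first attempt's commit and cleanup of the _temporary output.
val conf = new SparkConf().set("spark.speculation", "false")
{code}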
[jira] [Commented] (SPARK-10109) NPE when saving Parquet To HDFS
[ https://issues.apache.org/jira/browse/SPARK-10109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875960#comment-15875960 ]

Laurent Hoss commented on SPARK-10109:
--------------------------------------

Seems related to SPARK-2984 (though that one is closed, its last comment proposed creating a new issue).

> NPE when saving Parquet To HDFS
> ---------------------------------
>
>                 Key: SPARK-10109
>                 URL: https://issues.apache.org/jira/browse/SPARK-10109
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>        Environment: spark-ec2, standalone cluster on Amazon
>            Reporter: Virgil Palanciuc
>
> Very simple code, trying to save a dataframe.
> I get this in the driver:
> {quote}
> 15/08/19 11:21:41 INFO TaskSetManager: Lost task 9.2 in stage 217.0 (TID 4748) on executor 172.xx.xx.xx: java.lang.NullPointerException (null)
> and (not for that task):
> 15/08/19 11:21:46 WARN TaskSetManager: Lost task 5.0 in stage 543.0 (TID 5607, 172.yy.yy.yy): java.lang.NullPointerException
>         at parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:146)
>         at parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:112)
>         at parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:73)
>         at org.apache.spark.sql.parquet.ParquetOutputWriter.close(newParquet.scala:88)
>         at org.apache.spark.sql.sources.DynamicPartitionWriterContainer$$anonfun$clearOutputWriters$1.apply(commands.scala:536)
>         at org.apache.spark.sql.sources.DynamicPartitionWriterContainer$$anonfun$clearOutputWriters$1.apply(commands.scala:536)
>         at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:107)
>         at scala.collection.mutable.HashMap$$anon$2$$anonfun$foreach$3.apply(HashMap.scala:107)
>         at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>         at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>         at scala.collection.mutable.HashMap$$anon$2.foreach(HashMap.scala:107)
>         at org.apache.spark.sql.sources.DynamicPartitionWriterContainer.clearOutputWriters(commands.scala:536)
>         at org.apache.spark.sql.sources.DynamicPartitionWriterContainer.abortTask(commands.scala:552)
>         at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.org$apache$spark$sql$sources$InsertIntoHadoopFsRelation$$writeRows$2(commands.scala:269)
>         at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insertWithDynamicPartitions$3.apply(commands.scala:229)
>         at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation$$anonfun$insertWithDynamicPartitions$3.apply(commands.scala:229)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {quote}
> I get this in the executor log:
> {quote}
> 15/08/19 11:21:41 WARN DFSClient: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /gglogs/2015-07-27/_temporary/_attempt_201508191119_0217_m_09_2/dpid=18432/pid=1109/part-r-9-46ac3a79-a95c-4d9c-a2f1-b3ee76f6a46c.snappy.parquet File does not exist. Holder DFSClient_NONMAPREDUCE_1730998114_63 does not have any open files.
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at
[jira] [Commented] (SPARK-10643) Support HDFS application download in client mode spark submit
[ https://issues.apache.org/jira/browse/SPARK-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1581#comment-1581 ]

Laurent Hoss commented on SPARK-10643:
--------------------------------------

+1, this would be very useful when using Zeppelin (running in Docker) on a *Mesos* cluster.

Unfortunately, the Zeppelin GUI does not yet support adding jars from HDFS either, nor does it support any kind of HTTP upload; it only loads jars from local directories (not practical when running inside Docker) or from a (custom) Maven repository (not ideal for quick dev iterations).

After learning that this should work in cluster mode, I tried to submit a Spark job in cluster mode from within Zeppelin, but it failed because it could not find the built-in zeppelin-spark interpreter jar (when the driver runs in the cluster). I am not sure yet whether that is actually an issue, as I would assume Spark takes care of transferring the jars provided via '--jars'.

> Support HDFS application download in client mode spark submit
> ---------------------------------------------------------------
>
>                 Key: SPARK-10643
>                 URL: https://issues.apache.org/jira/browse/SPARK-10643
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Submit
>            Reporter: Alan Braithwaite
>            Priority: Minor
>
> When using Mesos with Docker and Marathon, it would be nice to be able to
> make spark-submit deployable on Marathon and have it download a jar from
> HDFS instead of having to package the jar with the Docker image.
> {code}
> $ docker run -it docker.example.com/spark:latest /usr/local/spark/bin/spark-submit --class com.example.spark.streaming.EventHandler hdfs://hdfs/tmp/application.jar
> Warning: Skip remote jar hdfs://hdfs/tmp/application.jar.
> java.lang.ClassNotFoundException: com.example.spark.streaming.EventHandler
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:348)
>         at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
>         at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:639)
>         at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>         at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
> Although I'm aware that we can run in cluster mode with Mesos, we've already
> built some nice tools around Marathon for logging and monitoring.
> Code in question:
> https://github.com/apache/spark/blob/132718ad7f387e1002b708b19e471d9cd907e105/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L723-L736
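A possible workaround sketch for the client-mode limitation described above, using the Hadoop FileSystem API to pull the application jar to local disk before invoking spark-submit; the local target path is an arbitrary choice.

{code}
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Copy the jar out of HDFS so that client-mode spark-submit, which skips
// remote jars, can load it from the local filesystem.
def fetchJar(hdfsJar: String, localPath: String): Unit = {
  val fs = FileSystem.get(new URI(hdfsJar), new Configuration())
  fs.copyToLocalFile(new Path(hdfsJar), new Path(localPath))
}

fetchJar("hdfs://hdfs/tmp/application.jar", "/tmp/application.jar")
{code}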
[jira] [Commented] (SPARK-12430) Temporary folders do not get deleted after Task completes causing problems with disk space.
[ https://issues.apache.org/jira/browse/SPARK-12430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424242#comment-15424242 ]

Laurent Hoss commented on SPARK-12430:
--------------------------------------

In Spark 2.0 this should be less of an issue, at least when using coarse-grained mode, after SPARK-12330 was fixed.

[~fede-bis] Any feedback on whether this has been resolved for you?

[~dragos] said:

> Are you using 1.6? In that case, the blockmgr directory should really be inside your Mesos sandbox, not under /tmp. At least, that's what I see when I try it out.

Let me note that one big drawback of having those dirs in the sandbox (/tmp) is that shuffling cannot be parallelized over multiple disks. That's why we prefer to set `spark.local.dir` to a number of partitions (on different disks) and ensure those get cleaned regularly using some 'find -mtime' magic on the blockmgr-* dirs (not ideal, however, so we hope the migration to Spark 2 improves the situation; a configuration sketch follows after this message).

> Temporary folders do not get deleted after Task completes causing problems with disk space.
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-12430
>                 URL: https://issues.apache.org/jira/browse/SPARK-12430
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.1, 1.5.2, 1.6.0
>        Environment: Ubuntu server
>            Reporter: Fede Bar
>
> We are experiencing an issue with automatic /tmp folder deletion after a
> framework completes. Completing an M/R job using Spark 1.5.2 (same behavior
> as Spark 1.5.1) over Mesos will not delete some temporary folders, causing
> free disk space on the server to be exhausted.
> Behavior of an M/R job using Spark 1.4.1 over a Mesos cluster:
> - Launched using spark-submit on one cluster node.
> - The following folders are created: */tmp/mesos/slaves/id#*, */tmp/spark-#/*, */tmp/spark-#/blockmgr-#*
> - When the task completes, */tmp/spark-#/* gets deleted along with its */tmp/spark-#/blockmgr-#* sub-folder.
> Behavior of the same M/R job using Spark 1.5.2 over a Mesos cluster:
> - Launched using spark-submit on one cluster node.
> - The following folders are created: */tmp/mesos/mesos/slaves/id***, */tmp/spark-***/*, {color:red}/tmp/blockmgr-***{color}
> - When the task completes, */tmp/spark-***/* gets deleted but NOT the shuffle container folder {color:red}/tmp/blockmgr-***{color}
> Unfortunately, {color:red}/tmp/blockmgr-***{color} can account for several
> GB depending on the job that ran. Over time this causes the disk to become
> full, with consequences that we all know.
> Running a shell script would probably work, but it is difficult to distinguish
> folders in use by a running M/R job from stale folders. I did notice similar
> issues opened by other users marked as "resolved", but none seems to exactly
> match the above behavior.
> I really hope someone has insights on how to fix it. Thank you very much!
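A sketch of the `spark.local.dir` workaround mentioned in the comment above; the mount points are hypothetical, and each should live on a separate physical disk so shuffle I/O is spread across them.

{code}
import org.apache.spark.SparkConf

// Point shuffle and block-manager scratch space at several disks instead of
// the single sandbox /tmp; stale blockmgr-* dirs still need external cleanup.
val conf = new SparkConf()
  .set("spark.local.dir", "/data1/spark,/data2/spark,/data3/spark")
{code}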
[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409502#comment-15409502 ]

Laurent Hoss commented on SPARK-16522:
--------------------------------------

Is this going to be fixed soon (2.0.1)?!

I just got a stack trace from a developer here whose job *failed* with the following exception:

{code}
2016-08-05 05:43:35,186 WARN org.apache.spark.rpc.netty.NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(5,Executor finished with state FINISHED)] in 1 attempts
org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
{code}

> [MESOS] Spark application throws exception on exit
> -----------------------------------------------------
>
>                 Key: SPARK-16522
>                 URL: https://issues.apache.org/jira/browse/SPARK-16522
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 2.0.0
>            Reporter: Sun Rui
>
> Spark applications running on Mesos throw an exception upon exit, as follows:
> {noformat}
> 16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
> org.apache.spark.SparkException: Exception thrown in awaitResult
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
>         at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>         at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
>         at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
>         at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
>         at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>         at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>         at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
>         at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
>         at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
> Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
>         at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
>         at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
>         at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
>         at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
>         at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
>         ... 4 more
> Exception in thread "Thread-47" org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint