[jira] [Updated] (SPARK-20170) Enhance spark framework to support failover in case mesos master failure
[ https://issues.apache.org/jira/browse/SPARK-20170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jared updated SPARK-20170:
--
Description:
When launching the Spark framework on a Mesos cluster, if the Mesos master fails, restarts, or a new master is elected, the Spark framework cannot re-register with the master.
Master failover does not appear to be taken into consideration in the current Spark framework implementation.
The Spark framework on Mesos should support failover in case of Mesos master failure or network disconnect.

> Enhance spark framework to support failover in case mesos master failure
>
> Key: SPARK-20170
> URL: https://issues.apache.org/jira/browse/SPARK-20170
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Affects Versions: 2.1.0
> Environment: CentOS 7.0, mesos 1.3.0, spark 2.1.0
> Reporter: Jared
> Priority: Minor

--
This message was sent by Atlassian JIRA (v6.3.15#6346)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
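The re-registration behaviour requested in this issue can be illustrated with a minimal sketch. It is not Spark or Mesos code: `FrameworkSession`, `register`, and the master names are hypothetical, and the only idea shown is that the framework keeps its ID and presents it again when a new master appears.

```python
class FrameworkSession:
    """Keeps the framework ID so it can be reused after a master failover.

    Hypothetical illustration only; `register` stands in for whatever call
    actually registers a framework with a master.
    """

    def __init__(self, register):
        # register: callable (master, framework_id_or_None) -> framework_id
        self._register = register
        self.framework_id = None
        self.master = None
        self.connected = False

    def connect(self, master):
        # The first call passes None and receives a fresh ID;
        # later calls present the stored ID so the same framework is resumed.
        self.framework_id = self._register(master, self.framework_id)
        self.master = master
        self.connected = True

    def on_disconnected(self):
        # The master was lost or the network dropped; keep the ID for later.
        self.connected = False

    def on_new_master(self, master):
        # A new master was elected (or the old one restarted): re-register.
        self.connect(master)
```

A caller would invoke `on_disconnected()` when the master is lost and `on_new_master(...)` once a leader is detected again; how those events are detected is outside the scope of this sketch.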
[jira] [Created] (SPARK-20170) Enhance spark framework to support failover in case mesos master failure
Jared created SPARK-20170:
-

Summary: Enhance spark framework to support failover in case mesos master failure
Key: SPARK-20170
URL: https://issues.apache.org/jira/browse/SPARK-20170
Project: Spark
Issue Type: Improvement
Components: Mesos
Affects Versions: 2.1.0
Environment: CentOS 7.0, mesos 1.3.0, spark 2.1.0
Reporter: Jared
Priority: Minor
[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts
[ https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949450#comment-15949450 ]

Jared commented on SPARK-15142:
---
[~devaraj.k] are you still working on this problem?
It seems that MesosClusterDispatcher is unaware that the master was lost.

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
> Issue Type: Bug
> Components: Deploy, Mesos
> Reporter: Devaraj K
> Priority: Minor
> Attachments: spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out
>
> While the Spark Mesos dispatcher is running, if the Mesos master gets restarted,
> the dispatcher keeps running and queues up all the submitted applications
> but does not launch them.
[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
[ https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786813#comment-15786813 ]

Jared commented on SPARK-15359:
---
Hi,
I tested the fix, but the problem still seems to exist:

I1230 11:39:07.096375 6889 sched.cpp:1223] Aborting framework
16/12/30 11:39:07 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
16/12/30 11:39:07 ERROR MesosClusterScheduler: driver.run() failed
org.apache.spark.SparkException: Error starting driver, DRIVER_ABORTED
    at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$$anon$1.run(MesosSchedulerUtils.scala:124)
Exception in thread "MesosClusterScheduler-mesos-driver" org.apache.spark.SparkException: Error starting driver, DRIVER_ABORTED
    at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$$anon$1.run(MesosSchedulerUtils.scala:124)
16/12/30 11:39:07 INFO Utils: Successfully started service on port 7077.
16/12/30 11:39:07 INFO MesosRestServer: Started REST server for submitting applications on port 7077

It seems that the thrown exception is not handled: the REST server still starts on port 7077 after the driver thread fails.
I think several other files also need to be changed to fix this problem.

> Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
> ---
>
> Key: SPARK-15359
> URL: https://issues.apache.org/jira/browse/SPARK-15359
> Project: Spark
> Issue Type: Bug
> Components: Deploy, Mesos
> Reporter: Devaraj K
> Priority: Minor
>
> Mesos dispatcher handles DRIVER_ABORTED status for mesosDriver.run() during
> the successful registration, but if mesosDriver.run() returns
> DRIVER_ABORTED status after the successful register, then there is no action
> for the status and the thread will be terminated.
> I think we need to throw the exception and shut down the dispatcher.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
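The log above shows an exception raised on the driver thread while the REST server still starts. One way to avoid that pattern, sketched here under stated assumptions, is to report the driver outcome back to the starting thread and to start the server only after registration. This is not Spark's implementation: `DriverStatus`, `start_dispatcher`, `run_driver`, `start_rest_server`, and `shutdown` are hypothetical stand-ins for the Mesos status values, `mesosDriver.run()`, and the dispatcher's startup and shutdown steps.

```python
import threading
from enum import Enum


class DriverStatus(Enum):
    # Hypothetical stand-in for the driver status values named in this thread.
    DRIVER_STOPPED = "DRIVER_STOPPED"
    DRIVER_ABORTED = "DRIVER_ABORTED"


class DispatcherStartupError(Exception):
    """Raised on the calling thread when the driver ends before registering."""


def start_dispatcher(run_driver, start_rest_server, shutdown):
    """Start the REST server only if the driver registers.

    run_driver(on_registered) blocks like a driver's run() and returns a status.
    If it aborts after registration, shutdown() is called from the driver thread.
    """
    state = {"status": None, "error": None, "registered": False}
    ready = threading.Event()  # set on registration or when the driver ends

    def on_registered():
        state["registered"] = True
        ready.set()

    def driver_loop():
        try:
            state["status"] = run_driver(on_registered)
        except Exception as exc:  # capture instead of losing it on this thread
            state["error"] = exc
        finally:
            failed = (state["status"] is DriverStatus.DRIVER_ABORTED
                      or state["error"] is not None)
            if failed and state["registered"]:
                shutdown()  # abort after a successful registration
            ready.set()

    thread = threading.Thread(target=driver_loop, name="mesos-driver", daemon=True)
    thread.start()
    ready.wait()
    if not state["registered"]:
        # The driver ended before registering: fail on the caller's thread
        # and never start the REST server.
        reason = state["status"] or state["error"]
        raise DispatcherStartupError(f"driver ended before registering: {reason}")
    start_rest_server()
    return thread
```

With this shape, an abort before registration surfaces as an exception in the caller instead of an unhandled exception on a background thread, and an abort after registration triggers the shutdown path.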
[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
[ https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780014#comment-15780014 ]

Jared commented on SPARK-15359:
---
Hi,
I ran into a similar problem when running Spark on Mesos.
In my testing, the Spark Mesos dispatcher did not register with the Mesos master successfully, but the dispatcher was still brought up and listening on the default port 7077.
I think the Mesos dispatcher should be shut down if the status of mesosDriver.run() is DRIVER_ABORTED.

I didn't quite understand the description. What is the meaning of "successful registration"? Do you mean that mesosDriver.run() returns without aborting?

If we are working on exactly the same problem, I would like to contribute to fixing this issue, for example by reviewing code changes or testing the fixes.

Thanks,
Jared

> Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
> ---
>
> Key: SPARK-15359
> URL: https://issues.apache.org/jira/browse/SPARK-15359
> Project: Spark
> Issue Type: Bug
> Components: Deploy, Mesos
> Reporter: Devaraj K
> Priority: Minor
>
> Mesos dispatcher handles DRIVER_ABORTED status for mesosDriver.run() during
> the successful registration, but if mesosDriver.run() returns
> DRIVER_ABORTED status after the successful register, then there is no action
> for the status and the thread will be terminated.
> I think we need to throw the exception and shut down the dispatcher.