[jira] [Updated] (SPARK-20170) Enhance spark framework to support failover in case mesos master failure

2017-03-31 Thread Jared (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jared updated SPARK-20170:
--
Description: 
When a Spark framework is launched on a Mesos cluster and the Mesos master fails, 
restarts, or a new master is elected, the Spark framework cannot re-register 
with the master.
Obviously, master failover is not taken into account in the current Spark 
framework implementation.
A Spark framework on Mesos should support failover in case of Mesos master 
failure or network disconnection. 
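To illustrate the failover behaviour being requested, here is a minimal sketch. It assumes the framework tracks its connection through the `registered`, `reregistered` and `disconnected` callbacks of `org.apache.mesos.Scheduler`, keeps its framework ID so a new master can accept it again, and gives up only after a timeout similar in spirit to `FrameworkInfo.failover_timeout`. The class `FrameworkConnectionState` and its methods are hypothetical and are not part of Spark or Mesos.

```java
import java.util.Optional;

/**
 * Hypothetical model of a framework's connection to the Mesos master.
 * Method names mirror org.apache.mesos.Scheduler callbacks, but this class
 * is NOT part of the Mesos or Spark API.
 */
class FrameworkConnectionState {
    enum State { UNREGISTERED, REGISTERED, DISCONNECTED, FAILED }

    private final long failoverTimeoutMs; // assumed analogue of failover_timeout
    private State state = State.UNREGISTERED;
    private String frameworkId;           // kept so re-registration reuses the same ID
    private long disconnectedAtMs = -1;

    FrameworkConnectionState(long failoverTimeoutMs) {
        this.failoverTimeoutMs = failoverTimeoutMs;
    }

    /** First successful registration with a master. */
    synchronized void registered(String id) {
        frameworkId = id;
        state = State.REGISTERED;
        disconnectedAtMs = -1;
    }

    /** A restarted or newly elected master accepted the same framework ID. */
    synchronized void reregistered() {
        if (state == State.DISCONNECTED) {
            state = State.REGISTERED;
            disconnectedAtMs = -1;
        }
    }

    /** Master failed or the network dropped: remember when, but do not give up yet. */
    synchronized void disconnected(long nowMs) {
        if (state == State.REGISTERED) {
            state = State.DISCONNECTED;
            disconnectedAtMs = nowMs;
        }
    }

    /** Called periodically; declares failure only once the timeout has elapsed. */
    synchronized State check(long nowMs) {
        if (state == State.DISCONNECTED && nowMs - disconnectedAtMs > failoverTimeoutMs) {
            state = State.FAILED;
        }
        return state;
    }

    /** The ID to present when re-registering with the new master, if any. */
    synchronized Optional<String> frameworkIdForReregistration() {
        return Optional.ofNullable(frameworkId);
    }
}
```

The key design choice is that `disconnected` moves the framework into a waiting state instead of a terminal one, so a master restart or election within the timeout is survivable.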

> Enhance spark framework to support failover in case mesos master failure
> 
>
> Key: SPARK-20170
> URL: https://issues.apache.org/jira/browse/SPARK-20170
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.1.0
> Environment: CentOS 7.0, mesos 1.3.0, spark 2.1.0
>Reporter: Jared
>Priority: Minor
>
> When a Spark framework is launched on a Mesos cluster and the Mesos master fails, 
> restarts, or a new master is elected, the Spark framework cannot re-register 
> with the master.
> Obviously, master failover is not taken into account in the current Spark 
> framework implementation.
> A Spark framework on Mesos should support failover in case of Mesos master 
> failure or network disconnection. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20170) Enhance spark framework to support failover in case mesos master failure

2017-03-31 Thread Jared (JIRA)
Jared created SPARK-20170:
-

 Summary: Enhance spark framework to support failover in case mesos master failure
 Key: SPARK-20170
 URL: https://issues.apache.org/jira/browse/SPARK-20170
 Project: Spark
  Issue Type: Improvement
  Components: Mesos
Affects Versions: 2.1.0
 Environment: CentOS 7.0, mesos 1.3.0, spark 2.1.0
Reporter: Jared
Priority: Minor









[jira] [Commented] (SPARK-15142) Spark Mesos dispatcher becomes unusable when the Mesos master restarts

2017-03-30 Thread Jared (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949450#comment-15949450
 ] 

Jared commented on SPARK-15142:
---

[~devaraj.k], are you still working on this problem?
It seems that MesosClusterDispatcher is not aware that the master has been lost.
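One way a dispatcher might become aware of a lost master, as a fallback to the `disconnected` callback of `org.apache.mesos.Scheduler`, is to track how long it has been since it last heard anything from the master. The sketch below assumes every scheduler callback records the current time; the class `MasterLivenessWatchdog` and its silence threshold are hypothetical. This is only a heuristic: offers can legitimately stop when the cluster is fully used, so silence alone may produce false positives.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical watchdog: treat the master as lost if it has been silent too long. */
class MasterLivenessWatchdog {
    private final long silenceThresholdMs;
    private final AtomicLong lastHeardMs;

    MasterLivenessWatchdog(long silenceThresholdMs, long nowMs) {
        this.silenceThresholdMs = silenceThresholdMs;
        this.lastHeardMs = new AtomicLong(nowMs);
    }

    /** Call from every scheduler callback (offers, status updates, reregistered, ...). */
    void heardFromMaster(long nowMs) {
        lastHeardMs.set(nowMs);
    }

    /** True once nothing has been heard for longer than the threshold. */
    boolean masterLikelyLost(long nowMs) {
        return nowMs - lastHeardMs.get() > silenceThresholdMs;
    }
}
```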

> Spark Mesos dispatcher becomes unusable when the Mesos master restarts
> --
>
> Key: SPARK-15142
> URL: https://issues.apache.org/jira/browse/SPARK-15142
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
> Attachments: 
> spark-devaraj-org.apache.spark.deploy.mesos.MesosClusterDispatcher-1-stobdtserver5.out
>
>
> If the Mesos master is restarted while the Spark Mesos dispatcher is running, 
> the dispatcher keeps running and queues up all submitted applications, but 
> never launches them.






[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()

2016-12-29 Thread Jared (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786813#comment-15786813
 ] 

Jared commented on SPARK-15359:
---

Hi, I tested the fix. However, the problem still seems to exist:
I1230 11:39:07.096375  6889 sched.cpp:1223] Aborting framework
16/12/30 11:39:07 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
16/12/30 11:39:07 ERROR MesosClusterScheduler: driver.run() failed
org.apache.spark.SparkException: Error starting driver, DRIVER_ABORTED
    at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$$anon$1.run(MesosSchedulerUtils.scala:124)
Exception in thread "MesosClusterScheduler-mesos-driver" org.apache.spark.SparkException: Error starting driver, DRIVER_ABORTED
    at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$$anon$1.run(MesosSchedulerUtils.scala:124)
16/12/30 11:39:07 INFO Utils: Successfully started service on port 7077.
16/12/30 11:39:07 INFO MesosRestServer: Started REST server for submitting applications on port 7077

It seems that the thrown exception is not handled: it only kills the driver 
thread, while the REST server still starts on port 7077.
I think several other files also need to be changed to fix this problem.
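A minimal sketch of how the abort could be surfaced instead of dying with the thread: run the blocking driver loop asynchronously and attach a completion handler that shuts the dispatcher down on an exception or a `DRIVER_ABORTED` status. The enum copies the status names of the Mesos Java API (`Protos.Status`); `DriverSupervisor`, `supervise` and the `shutdownDispatcher` hook are hypothetical and are not Spark's actual code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical supervisor for the blocking mesosDriver.run() loop. */
class DriverSupervisor {
    // Local copy of the status names used by the Mesos Java API (Protos.Status).
    enum Status { DRIVER_NOT_STARTED, DRIVER_RUNNING, DRIVER_ABORTED, DRIVER_STOPPED }

    /**
     * @param driverRun          stands in for mesosDriver.run(), which blocks until the driver stops
     * @param shutdownDispatcher hypothetical hook that stops the REST server and exits
     */
    static CompletableFuture<Status> supervise(Supplier<Status> driverRun, Runnable shutdownDispatcher) {
        return CompletableFuture.supplyAsync(driverRun)
            .whenComplete((status, error) -> {
                // An exception or an aborted status both mean the framework is gone:
                // shut the whole dispatcher down rather than leave it half alive.
                if (error != null || status == Status.DRIVER_ABORTED) {
                    shutdownDispatcher.run();
                }
            });
    }
}
```

Because the handler runs on whichever thread completes the future, the main thread does not need to block on the driver loop to react to its failure.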

> Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
> ---
>
> Key: SPARK-15359
> URL: https://issues.apache.org/jira/browse/SPARK-15359
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
>
> The Mesos dispatcher handles the DRIVER_ABORTED status from mesosDriver.run() 
> during the successful registration, but if mesosDriver.run() returns 
> DRIVER_ABORTED after the successful registration, no action is taken for that 
> status and the thread simply terminates. 
> I think we need to throw an exception and shut down the dispatcher.






[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()

2016-12-27 Thread Jared (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780014#comment-15780014
 ] 

Jared commented on SPARK-15359:
---

Hi, I also ran into a similar problem when running Spark on Mesos.
In my testing, the Spark Mesos dispatcher did not register with the Mesos master 
successfully, but the dispatcher was still brought up and listening on the 
default port 7077.
I think the Mesos dispatcher should be shut down if mesosDriver.run() returns 
DRIVER_ABORTED.
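A minimal sketch of that idea, assuming the dispatcher's main thread waits for the scheduler's `registered` callback before starting the REST server on port 7077, and exits instead if registration aborts or times out. `RegistrationGate` and its methods are hypothetical names; only `CountDownLatch` and `TimeUnit` are real JDK classes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical gate: start the REST server only after the driver has registered. */
class RegistrationGate {
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile boolean aborted = false;

    /** Call from the scheduler's registered(...) callback. */
    void onRegistered() {
        done.countDown();
    }

    /** Call when mesosDriver.run() returns DRIVER_ABORTED before registration. */
    void onAborted() {
        aborted = true;   // set before releasing waiters so they observe it
        done.countDown();
    }

    /** True only if registration succeeded within the timeout. */
    boolean awaitRegistration(long timeout, TimeUnit unit) {
        try {
            return done.await(timeout, unit) && !aborted;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

The main thread would call `awaitRegistration` before starting the REST server and shut down when it returns `false`, so a failed registration never leaves port 7077 open.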

I didn't quite understand the description. What does "successful registration" 
mean? Do you mean mesosDriver.run() returning without aborting?
If we're working on exactly the same problem, I would like to help fix this 
issue, for example by reviewing code changes or testing the fixes.

Thanks,
Jared


> Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
> ---
>
> Key: SPARK-15359
> URL: https://issues.apache.org/jira/browse/SPARK-15359
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
>
> The Mesos dispatcher handles the DRIVER_ABORTED status from mesosDriver.run() 
> during the successful registration, but if mesosDriver.run() returns 
> DRIVER_ABORTED after the successful registration, no action is taken for that 
> status and the thread simply terminates. 
> I think we need to throw an exception and shut down the dispatcher.


