how to overcome orphaned tasks after master failure

2015-09-14 Thread Mike Barborak
Hi, I'm trying to understand a chain of events that occurred this weekend to see if I'm doing something wrong in my custom framework. The cluster in question had a single Mesos master and < 10 slaves. At some point the master got a fatal error and apparently respawned: Log file created at:

Re: how to overcome orphaned tasks after master failure

2015-09-14 Thread Vinod Kone
The framework is expected to re-register with a failed over master with the same framework id as before. If you are using the scheduler driver, that should happen automatically. On Mon, Sep 14, 2015 at 6:38 AM, Mike Barborak wrote: > Hi, > > > > I’m trying to understand a
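A minimal sketch of one way a framework could keep "the same framework id as before", assuming the ID is kept in a local file. `FrameworkIdStore` and its file path are hypothetical and not part of Mesos; the comments only point at where the stored value would be handed to the Mesos Java API. This matters mainly when the scheduler process itself restarts. When only the master fails over, the scheduler driver is expected to re-register on its own, as the reply above says.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/**
 * Hypothetical helper (not part of Mesos): remembers the framework ID
 * across scheduler restarts so the framework can re-register under it.
 */
final class FrameworkIdStore {
    private final Path file;

    FrameworkIdStore(Path file) {
        this.file = file;
    }

    /**
     * Intended to be called from Scheduler.registered(...), passing
     * frameworkId.getValue().
     */
    void save(String frameworkId) throws IOException {
        Files.write(file, frameworkId.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Returns the saved ID, or empty on a first start.
     * If present, the value would be set on the FrameworkInfo before the
     * driver is created, e.g.
     *   FrameworkInfo.newBuilder()
     *       .setId(FrameworkID.newBuilder().setValue(id))
     *       .setFailoverTimeout(seconds)
     * (assumption: a non-zero failover timeout is wanted so the master
     * keeps the framework's tasks while the scheduler is away).
     */
    Optional<String> load() throws IOException {
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        String id = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return id.isEmpty() ? Optional.<String>empty() : Optional.of(id);
    }
}
```

A file is used here only to keep the sketch self-contained; any durable store would serve the same purpose.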

Re: Spark Job Submitting on Mesos Cluster

2015-09-14 Thread Tim Chen
Thanks Haosdent! Tim On Mon, Sep 14, 2015 at 1:29 AM, SLiZn Liu wrote: > I found the --no-switch_user flag in mesos slave configuration. Will give > it a try. Thanks Tim, and haosdent ! > > > On Mon, Sep 14, 2015 at 4:15 PM haosdent wrote: > >> >

RE: how to overcome orphaned tasks after master failure

2015-09-14 Thread Mike Barborak
Hi, This is what I see regarding the framework in the logs on the master: mesos-master.INFO:W0912 20:42:42.451398 10692 master.cpp:4926] Possibly orphaned task 0 of framework 20150908-084257-755703724-5050-2811- running on slave 20150908-084257-755703724-5050-2811-S0 at

Re: Mesos + Spark Integration

2015-09-14 Thread Sam Bessalah
Have you tried giving Spark the namenode address explicitly, e.g. hdfs://namenode_ip:8020/hdfs/? On Mon, Sep 14, 2015 at 11:09 PM, Rodrick Brown wrote: > I have separate systems for the following services > > — Mesos (3 masters + 3 slaves) > — Hadoop (2 NN + 8

Re: Apache Mesos Community Sync

2015-09-14 Thread Adam Bordelon
We'll have the next community sync this Thursday (Sept. 17th) at 8:30am Pacific. Please add items to the agenda. We will try Hangouts on Air this time. We will post the video stream

Re: Spark Job Submitting on Mesos Cluster

2015-09-14 Thread zhou weitao
At the same time, make sure SPARK_USER is a real user that exists on the slave before executing your Spark program. 2015-09-14 16:29 GMT+08:00 SLiZn Liu : > I found the --no-switch_user flag in mesos slave configuration. Will give > it a try. Thanks Tim, and haosdent ! > > > On

Re: Spark Job Submitting on Mesos Cluster

2015-09-14 Thread SLiZn Liu
No, we set up a specific user to start Mesos; it isn't root. On Mon, Sep 14, 2015 at 1:05 PM haosdent wrote: > Do you start your mesos cluster with root? > > On Mon, Sep 14, 2015 at 12:10 PM, SLiZn Liu > wrote: > >> Hi Mesos Users, >> >> I’m trying

Re: Spark Job Submitting on Mesos Cluster

2015-09-14 Thread Tim Chen
Actually, --proxy-user controls which user is impersonated to run the driver, not the user that is passed to Mesos to run tasks as. To run a Spark job as a particular user, set the SPARK_USER environment variable; that user will be passed to Mesos.
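As a rough illustration of setting SPARK_USER for a single job, the sketch below prepares a `spark-submit` process with that variable in its environment. `SparkSubmitLauncher`, the user name, and the arguments in the usage note are hypothetical placeholders, and it assumes `spark-submit` is on the PATH; whether the slave actually runs tasks as that user also depends on the slave's switch_user setting discussed elsewhere in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical launcher: prepares spark-submit with SPARK_USER set. */
final class SparkSubmitLauncher {

    /**
     * Builds, but does not start, a spark-submit process whose
     * environment carries the given SPARK_USER value.
     */
    static ProcessBuilder build(String sparkUser, List<String> sparkSubmitArgs) {
        List<String> command = new ArrayList<>();
        command.add("spark-submit"); // assumed to be on the PATH
        command.addAll(sparkSubmitArgs);

        ProcessBuilder pb = new ProcessBuilder(command);
        // Only the child process sees this value; the parent environment is unchanged.
        pb.environment().put("SPARK_USER", sparkUser);
        pb.inheritIO(); // forward the job's stdout/stderr to this process
        return pb;
    }
}
```

For example, `SparkSubmitLauncher.build("sparkuser", java.util.Arrays.asList("--master", "mesos://host:5050", "app.jar")).start()` would launch the job, where the user, master URL, and jar name are placeholders.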

RE: how to overcome orphaned tasks after master failure

2015-09-14 Thread Mike Barborak
Hi, Sorry for my ignorance, but what is the “scheduler driver?” My framework is based on the Java example: https://github.com/apache/mesos/blob/master/src/examples/java/TestFramework.java Just guessing, but is there something I should be doing in the reregistered method? My understanding was

Re: how to overcome orphaned tasks after master failure

2015-09-14 Thread Vinod Kone
On Mon, Sep 14, 2015 at 12:40 PM, Mike Barborak wrote: > Sorry for my ignorance, but what is the “scheduler driver?” My framework > is based on the Java example: > > Some details about the driver should be here:

Re: Spark Job Submitting on Mesos Cluster

2015-09-14 Thread SLiZn Liu
Thx Tommy, did you mean to add a proxy user like this: spark-submit --proxy-user ... where represents the user who started Mesos? And is this parameter documented anywhere? On Mon, Sep 14, 2015 at 1:34 PM tommy xiao wrote: > @SLiZn Liu yes, you need add proxy_user parameter

Re: Spark Job Submitting on Mesos Cluster

2015-09-14 Thread haosdent
> turn off --switch-user flag in the Mesos slave --no-switch_user :-) On Mon, Sep 14, 2015 at 4:03 PM, Tim Chen wrote: > Actually --proxy-user is more about which user you're impersonated to run > the driver, but not the user that is going to be passed to Mesos to run as. >

Re: Spark Job Submitting on Mesos Cluster

2015-09-14 Thread SLiZn Liu
I found the --no-switch_user flag in mesos slave configuration. Will give it a try. Thanks Tim, and haosdent! On Mon, Sep 14, 2015 at 4:15 PM haosdent wrote: > > turn off --switch-user flag in the Mesos slave > --no-switch_user :-) > > On Mon, Sep 14, 2015 at 4:03 PM, Tim