I think you are confusing 'principals' with 'users'. A principal is an identifier for the framework (think of it as a username). A user is the Unix user that a task will run as on the *slave*.
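To check whether a given Unix user actually resolves on a slave host (or inside the container the slave runs in), something like the sketch below may help. It assumes Python is available there; `user_exists` is a hypothetical helper name used only for illustration, and 'devel' is simply the user mentioned in this thread.

```python
import pwd


def user_exists(username: str) -> bool:
    """Return True if `username` resolves through the local passwd database."""
    try:
        # pwd.getpwnam raises KeyError when no matching entry is found.
        pwd.getpwnam(username)
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    # 'devel' is the task user from this thread; 'root' is a sanity check.
    for name in ("root", "devel"):
        status = "exists" if user_exists(name) else "is missing"
        print(f"{name}: {status}")
```

Run it in the same environment the slave process sees, since a user that exists on the host may not exist inside the slave's container.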
So, whatever users you want to run the tasks as, make sure that they exist on the Mesos slave host (or in the Docker container in which the slave runs). You will get TASK_LOST (or TASK_ERROR since 0.22.0) if there are authorization errors. TASK_FAILED has nothing to do with authorization; it is generated by the slave.

On Mon, Mar 9, 2015 at 4:36 AM, Sivaram Kannan <[email protected]> wrote:

> Hi,
>
> I apologize for bombarding with so many emails on the same issue. So, I
> modified the acl.json as below.
>
> 1. I was able to launch the framework with authentication as users devel1
> and devel2.
> 2. I was able to launch a task as user devel1
> 3. I get TASK_LOST when I try to launch task with the framework registered
> as devel2.
> 4. In the same config, if I change the run_tasks => users to devel, the
> task fails with the error described in the previous email. As far as I
> understand, an error in run_tasks users, does not give TASK_LOST, but a
> TASK_FAILED. But a mismatch in principals between register_frameworks and
> run_tasks gives a TASK_LOST.
>
> Does the above makes sense? Please correct me if I am wrong.
>
> "permissive": false,
> "register_frameworks": [
>   {
>     "principals": { "values": [ "devel1", "devel2" ] },
>     "roles": { "values": [ "apps", "dev-ops" ] }
>   },
>   {
>     "principals": { "type": "NONE" },
>     "roles": { "values": [ "apps", "dev-ops" ] }
>   }
> ],
> "run_tasks": [
>   {
>     "principals": { "values": [ "devel1" ] },
>     "users": { "values": [ "root" ] }
>   },
>   {
>     "principals": { "values": [ "marathon" ] },
>     "users": { "type": "NONE" }
>   }
> ]
> }
>
> Thanks,
> ./Siva.
>
> On Mon, Mar 9, 2015 at 3:57 PM, Sivaram Kannan <[email protected]>
> wrote:
>
>> Hi Vinod,
>>
>> The users in below run_tasks definition - does it refer to unix users in
>> the machine where the framework is run or the unix users in the mesos-slave
>> machine.
>> I think the fact that I run all softwares (mesos-master,
>> mesos-slave, marathon) as docker containers is of significance and reason
>> for the below failure.
>>
>> "run_tasks": [
>>   {
>>     "principals": {
>>       "values": [
>>         "marathon"
>>       ]
>>     },
>>     "users": {
>>       "values": [
>>         "devel"
>>       ]
>>     }
>>   },
>>   {
>>     "principals": {
>>       "values": [
>>         "marathon"
>>       ]
>>     },
>>     "users": {
>>       "type": "NONE"
>>     }
>>   }
>> ]
>>
>> When I start the marathon, I start with the flag --mesos_user=devel and
>> while bringing up mesos-slave I bring up with the flag
>> --switch_user=true(which I think anyway is default). When I try to launch a
>> task this is what I am getting
>>
>> Marathon Log:
>>
>> 0.10; rv:38.0) Gecko/20100101 Firefox/38.0"
>> (mesosphere.chaos.http.ChaosRequestLog:15)
>> [2015-03-06 06:04:04,057] INFO Received status update for task
>> busybox.9777a963-c3c6-11e4-a31a-56847afe9799: TASK_FAILED (Abnormal
>> executor termination) (mesosphere.marathon.MarathonScheduler:165)
>> [2015-03-06 06:04:04,063] INFO Task launch delay for [/busybox] is now
>> [43] seconds (mesosphere.util.RateLimiter:34)
>> [2015-03-06 06:04:04,068] INFO Task
>> busybox.9777a963-c3c6-11e4-a31a-56847afe9799 expunged and removed from
>> TaskTracker (mesosphere.marathon.tasks.TaskTracker:101)
>> [2015-03-06 06:04:04,068] INFO Sending event notification.
>> (mesosphere.marathon.MarathonScheduler:274)
>>
>> Mesos-Slave Log:
>>
>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.898473 13
>> slave.cpp:2787] Container '9835da8c-a844-4d53-a7f7-4a5e6e808a9b' for
>> executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework
>> '20150306-054714-24707342-5050-1-0000' failed to start: Failed to create
>> container: Failed to chown: Failed to get user information for 'devel':
>> Success
>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.900068 13
>> slave.cpp:2882] Termination of executor
>> 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework
>> '20150306-054714-24707342-5050-1-0000' failed: Unknown container:
>> 9835da8c-a844-4d53-a7f7-4a5e6e808a9b
>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.905900 13
>> slave.cpp:3134] Failed to unmonitor container for executor
>> busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799 of framework
>> 20150306-054714-24707342-5050-1-0000: Not monitored
>>
>> Could the failure be related to me running the mesos-slave as container
>> here?
>>
>> Thanks,
>> ./Siva.
>>
>> On Mon, Mar 9, 2015 at 10:51 AM, Sivaram Kannan <[email protected]>
>> wrote:
>>
>>> Hi Vinod,
>>>
>>> Thanks, I got it. I guess I did not understand the relationship between
>>> principals defined in authentication and in authorization. I re-read the
>>> authentication and credentials flag, it is not clear from them that the
>>> principals defined in authorization should match them to work correctly. If
>>> I could, will change the documentation to be more clear and submit a PR.
>>>
>>> Thanks,
>>> ./Siva.
>>>
>>> On Mon, Mar 9, 2015 at 2:18 AM, Vinod Kone <[email protected]> wrote:
>>>
>>>> The principal used for authenticating the framework is the same
>>>> principal used to authorize the framework too. So you need to use
>>>> 'marathon' in your credentials too.
>>>> In other words, when you start the
>>>> framework the Credential.principal should be the same as
>>>> FrameworkInfo.principal (Mesos master will validate this).
>>>>
>>>> On Sun, Mar 8, 2015 at 10:48 AM, Sivaram Kannan <[email protected]>
>>>> wrote:
>>>>
>>>>> I0308 17:41:14.876610 6 master.cpp:1342] Authorizing framework
>>>>> principal 'user1' to receive offers for role 'apps'
>>>>
>>>> As you can see from this line, the master is trying to authorize
>>>> principal 'user1' and not 'marathon'.
>>>
>>> --
>>> ever tried. ever failed. no matter.
>>> try again. fail again. fail better.
>>> -- Samuel Beckett
>>
>> --
>> ever tried. ever failed. no matter.
>> try again. fail again. fail better.
>> -- Samuel Beckett
>
> --
> ever tried. ever failed. no matter.
> try again. fail again. fail better.
> -- Samuel Beckett

