Thanks a lot to both of you, Vinod and Mark, for the patient explanation. You are right, I got confused with the terminology. I got what I wanted out of the ACLs in my environment.
Thanks again, I really appreciate it.

./Siva.

On Tue, Mar 10, 2015 at 4:30 AM, Michael Park <[email protected]> wrote:

> On 9 March 2015 at 07:36, Sivaram Kannan <[email protected]> wrote:
>
>> Hi,
>>
>> I apologize for bombarding you with so many emails on the same issue. So,
>> I modified the acl.json as below.
>>
>> 1. I was able to launch the framework with authentication as users devel1
>> and devel2.
>>
>
> Just so that our terminology matches here: you were able to *register*
> frameworks *authenticated* as the *principals* *devel1* and *devel2*.
>
> Your ACL specifies that *devel1* and *devel2* can register under the *apps*
> and *dev-ops* roles, so as long as the frameworks registered under those
> roles, the success here makes sense.
>
>> 2. I was able to launch a task as user devel1
>>
>
> I agree with Vinod that the point of confusion may be *principal* vs
> *user* for *run_tasks*. To reiterate, *principal* is essentially a
> username that Mesos uses to authenticate the framework, and *user* is the
> Unix user under which the task will run. Your ACL specifies that
> *principal* *devel1* can launch tasks as *user* *root*. So you shouldn't
> be able to launch a task as user *devel1*, but rather launch a task as
> user *root* with the framework registered as principal=*devel1*. If this
> is not the case, something's wrong.
>
>> 3. I get TASK_LOST when I try to launch a task with the framework
>> registered as devel2.
>>
>
> This is correct. Vinod already covered this point. In short, a framework
> registered as *devel2* is not permitted to run anything under your ACL,
> since none of the specified cases match and "permissive" is set to false.
>
>> 4. In the same config, if I change the run_tasks => users to devel, the
>> task fails with the error described in the previous email. As far as I
>> understand, an error in run_tasks users does not give TASK_LOST, but a
>> TASK_FAILED.
>>
>
> I'm not sure what you mean by "an error in run_tasks users".
> The error in this case occurs because there is no *devel* user available
> in the environment where you're launching the task. The relevant line in
> the error message that illustrates this is:
>
>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.898473 13
>> slave.cpp:2787] Container '9835da8c-a844-4d53-a7f7-4a5e6e808a9b' for
>> executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework
>> '20150306-054714-24707342-5050-1-0000' failed to start: Failed to create
>> container: Failed to chown: *Failed to get user information for 'devel'*:
>> Success
>
> This is indeed a *TASK_FAILED*, since authorization succeeded, but the
> task failed to launch.
>
>> But a mismatch in principals between register_frameworks and run_tasks
>> gives a TASK_LOST.
>>
>
> It's not clear exactly what you mean by "a *mismatch* in principals
> between register_frameworks and run_tasks". If you mean that every
> principal under register_frameworks must have a matching entry in
> run_tasks, that's not quite correct. For example, you modified the ACL
> to be:
>
>> {
>>   "permissive": false,
>>   "register_frameworks": [
>>     {
>>       "principals": { "values": [ "devel1", "devel2" ] },
>>       "roles": { "values": [ "apps", "dev-ops" ] }
>>     },
>>     {
>>       "principals": { "type": "NONE" },
>>       "roles": { "values": [ "apps", "dev-ops" ] }
>>     }
>>   ],
>>   "run_tasks": [
>>     {
>>       "principals": { "values": [ "devel1", "devel2" ] },
>>       "users": { "values": [ "root" ] }
>>     },
>>     {
>>       "principals": { "values": [ "marathon" ] },
>>       "users": { "type": "NONE" }
>>     }
>>   ]
>> }
>
> If we attempt to launch a task as user "mpark" with the framework
> registered as "devel2" (or "devel1"), we'll continue to get the
> *TASK_LOST* message, because it fails at the *authorization* phase.
>
>> Does the above make sense? Please correct me if I am wrong.
>>
>
> I hope my explanation above made sense!
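To make the ACL behavior above concrete, here is a minimal Python sketch that evaluates the modified ACL, assuming the semantics described in the Mesos authorization documentation: entries are checked in order, the first entry whose principal and object (role or user) both match decides the outcome, an entry with type NONE matches but denies, and "permissive" only applies when no entry matches. This is a simplified model, not Mesos's own authorizer; the names authorize, OBJECT_FIELD, _matches and _allows are hypothetical, and each request is assumed to carry one concrete principal and one role or user.

```python
from typing import Dict, List, Union

# An ACL entity: {"values": [...]} or {"type": "ANY"} / {"type": "NONE"}.
Entity = Dict[str, Union[str, List[str]]]

# Hypothetical mapping: which object field each action checks.
OBJECT_FIELD = {"register_frameworks": "roles", "run_tasks": "users"}


def _matches(value: str, entity: Entity) -> bool:
    # ANY and NONE entries match every concrete value; a "values" list
    # matches only if it contains the requested value.
    if entity.get("type") in ("ANY", "NONE"):
        return True
    return value in entity.get("values", [])


def _allows(value: str, entity: Entity) -> bool:
    # Once an entry matches: ANY grants, NONE denies, and a "values"
    # list grants only the values it lists.
    if entity.get("type") == "ANY":
        return True
    if entity.get("type") == "NONE":
        return False
    return value in entity.get("values", [])


def authorize(acls: dict, action: str, principal: str, obj: str) -> bool:
    field = OBJECT_FIELD[action]
    for entry in acls.get(action, []):
        if _matches(principal, entry["principals"]) and _matches(obj, entry[field]):
            # The first matching entry decides the outcome.
            return _allows(principal, entry["principals"]) and _allows(obj, entry[field])
    # Nothing matched: fall back to the global "permissive" flag.
    return bool(acls.get("permissive", True))


# The modified acl.json quoted above, as a Python dict.
ACLS = {
    "permissive": False,
    "register_frameworks": [
        {"principals": {"values": ["devel1", "devel2"]},
         "roles": {"values": ["apps", "dev-ops"]}},
        {"principals": {"type": "NONE"},
         "roles": {"values": ["apps", "dev-ops"]}},
    ],
    "run_tasks": [
        {"principals": {"values": ["devel1", "devel2"]},
         "users": {"values": ["root"]}},
        {"principals": {"values": ["marathon"]},
         "users": {"type": "NONE"}},
    ],
}

if __name__ == "__main__":
    print(authorize(ACLS, "run_tasks", "devel2", "root"))            # True
    print(authorize(ACLS, "run_tasks", "devel2", "mpark"))           # False
    print(authorize(ACLS, "run_tasks", "marathon", "root"))          # False
    print(authorize(ACLS, "register_frameworks", "user1", "apps"))   # False
```

In this model, a False for run_tasks corresponds to the TASK_LOST case discussed above (the request never gets past authorization). A True only means the task passed authorization; it can still end in TASK_FAILED, for example when the user does not exist where the task is launched.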
>
>
>> {
>>   "permissive": false,
>>   "register_frameworks": [
>>     {
>>       "principals": { "values": [ "devel1", "devel2" ] },
>>       "roles": { "values": [ "apps", "dev-ops" ] }
>>     },
>>     {
>>       "principals": { "type": "NONE" },
>>       "roles": { "values": [ "apps", "dev-ops" ] }
>>     }
>>   ],
>>   "run_tasks": [
>>     {
>>       "principals": { "values": [ "devel1" ] },
>>       "users": { "values": [ "root" ] }
>>     },
>>     {
>>       "principals": { "values": [ "marathon" ] },
>>       "users": { "type": "NONE" }
>>     }
>>   ]
>> }
>>
>> Thanks,
>> ./Siva.
>>
>
> Thanks,
>
> MPark.
>
>> On Mon, Mar 9, 2015 at 3:57 PM, Sivaram Kannan <[email protected]>
>> wrote:
>>
>>> Hi Vinod,
>>>
>>> Do the users in the run_tasks definition below refer to Unix users on
>>> the machine where the framework runs, or to Unix users on the
>>> mesos-slave machine? I suspect that running all the software
>>> (mesos-master, mesos-slave, marathon) as Docker containers is
>>> significant and may be the reason for the failure below.
>>>
>>> "run_tasks": [
>>>   {
>>>     "principals": { "values": [ "marathon" ] },
>>>     "users": { "values": [ "devel" ] }
>>>   },
>>>   {
>>>     "principals": { "values": [ "marathon" ] },
>>>     "users": { "type": "NONE" }
>>>   }
>>> ]
>>>
>>> I start Marathon with the flag --mesos_user=devel, and I bring up
>>> mesos-slave with the flag --switch_user=true (which I think is the
>>> default anyway).
>>> When I try to launch a task, this is what I get:
>>>
>>> Marathon Log:
>>>
>>> 0.10; rv:38.0) Gecko/20100101 Firefox/38.0"
>>> (mesosphere.chaos.http.ChaosRequestLog:15)
>>> [2015-03-06 06:04:04,057] INFO Received status update for task
>>> busybox.9777a963-c3c6-11e4-a31a-56847afe9799: TASK_FAILED (Abnormal
>>> executor termination) (mesosphere.marathon.MarathonScheduler:165)
>>> [2015-03-06 06:04:04,063] INFO Task launch delay for [/busybox] is now
>>> [43] seconds (mesosphere.util.RateLimiter:34)
>>> [2015-03-06 06:04:04,068] INFO Task
>>> busybox.9777a963-c3c6-11e4-a31a-56847afe9799 expunged and removed from
>>> TaskTracker (mesosphere.marathon.tasks.TaskTracker:101)
>>> [2015-03-06 06:04:04,068] INFO Sending event notification.
>>> (mesosphere.marathon.MarathonScheduler:274)
>>>
>>> Mesos-Slave Log:
>>>
>>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.898473 13
>>> slave.cpp:2787] Container '9835da8c-a844-4d53-a7f7-4a5e6e808a9b' for
>>> executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework
>>> '20150306-054714-24707342-5050-1-0000' failed to start: Failed to create
>>> container: Failed to chown: Failed to get user information for 'devel':
>>> Success
>>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.900068 13
>>> slave.cpp:2882] Termination of executor
>>> 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework
>>> '20150306-054714-24707342-5050-1-0000' failed: Unknown container:
>>> 9835da8c-a844-4d53-a7f7-4a5e6e808a9b
>>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.905900 13
>>> slave.cpp:3134] Failed to unmonitor container for executor
>>> busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799 of framework
>>> 20150306-054714-24707342-5050-1-0000: Not monitored
>>>
>>> Could the failure be related to my running mesos-slave as a container
>>> here?
>>>
>>> Thanks,
>>> ./Siva.
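As Michael's reply notes, the "Failed to get user information for 'devel'" error means the user named in run_tasks does not exist in the environment where the task is launched. With mesos-slave running in a Docker container, that is presumably the container's own user database rather than the host's. A quick diagnostic is to look the user up from inside that environment. The Python sketch below uses the standard pwd module (Unix only), which queries the system user database via getpwnam; the helper name lookup_task_user is hypothetical, and this is a manual check, not what Mesos itself runs.

```python
import pwd
from typing import Optional


def lookup_task_user(name: str) -> Optional[pwd.struct_passwd]:
    """Return the passwd entry for `name`, or None if the user is unknown.

    Hypothetical helper: run it inside the environment the slave launches
    tasks from (for a containerized mesos-slave, inside that container).
    """
    try:
        return pwd.getpwnam(name)
    except KeyError:
        # pwd.getpwnam raises KeyError when no such user exists.
        return None


if __name__ == "__main__":
    for user in ("root", "devel"):
        entry = lookup_task_user(user)
        if entry is None:
            print(f"{user}: not found; a task run as this user cannot start here")
        else:
            print(f"{user}: uid={entry.pw_uid} home={entry.pw_dir}")
```

If "devel" comes back as not found inside the slave container, either create that user in the container image or point run_tasks (and Marathon's --mesos_user) at a user that exists there.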
>>>
>>> On Mon, Mar 9, 2015 at 10:51 AM, Sivaram Kannan <[email protected]>
>>> wrote:
>>>
>>>> Hi Vinod,
>>>>
>>>> Thanks, I got it. I guess I did not understand the relationship between
>>>> the principals defined for authentication and those defined for
>>>> authorization. I re-read the descriptions of the authentication and
>>>> credentials flags, and they do not make clear that the principals in the
>>>> authorization ACLs must match them for things to work. If I can, I will
>>>> update the documentation to make this clearer and submit a PR.
>>>>
>>>> Thanks,
>>>> ./Siva.
>>>>
>>>> On Mon, Mar 9, 2015 at 2:18 AM, Vinod Kone <[email protected]>
>>>> wrote:
>>>>
>>>>> The principal used for authenticating the framework is the same
>>>>> principal used to authorize the framework too. So you need to use
>>>>> 'marathon' in your credentials too. In other words, when you start the
>>>>> framework, Credential.principal should be the same as
>>>>> FrameworkInfo.principal (the Mesos master will validate this).
>>>>>
>>>>> On Sun, Mar 8, 2015 at 10:48 AM, Sivaram Kannan <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I0308 17:41:14.876610 6 master.cpp:1342] Authorizing framework
>>>>>> principal 'user1' to receive offers for role 'apps'
>>>>>>
>>>>>
>>>>> As you can see from this line, the master is trying to authorize
>>>>> principal 'user1' and not 'marathon'.
>>>>>
>>>>
>>>> --
>>>> ever tried. ever failed. no matter.
>>>> try again. fail again. fail better.
>>>> -- Samuel Beckett
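Vinod's rule (one principal is used for both authentication and authorization) can be turned into a small pre-flight check. The Python sketch below is hypothetical: check_framework_principal and acl_principals are not part of Mesos. It flags a mismatch between the credential principal and FrameworkInfo.principal, and a principal that no register_frameworks or run_tasks entry names explicitly. The ACL is an assumed example: its run_tasks section is copied from the thread, while its register_frameworks entry is made up for illustration. It only inspects explicit "values" lists, so entries with type ANY, or "permissive": true, could still admit a principal it reports.

```python
from typing import List, Set

# Assumed example ACL: run_tasks is taken from the thread; the
# register_frameworks entry is hypothetical.
ACLS = {
    "permissive": False,
    "register_frameworks": [
        {"principals": {"values": ["marathon"]},
         "roles": {"values": ["apps"]}},
    ],
    "run_tasks": [
        {"principals": {"values": ["marathon"]},
         "users": {"values": ["devel"]}},
        {"principals": {"values": ["marathon"]},
         "users": {"type": "NONE"}},
    ],
}


def acl_principals(acls: dict, action: str) -> Set[str]:
    """Collect principals listed explicitly (via "values") for one action."""
    names: Set[str] = set()
    for entry in acls.get(action, []):
        names.update(entry["principals"].get("values", []))
    return names


def check_framework_principal(credential_principal: str,
                              framework_principal: str,
                              acls: dict) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    # Per Vinod's note, the master validates that these two are equal.
    if credential_principal != framework_principal:
        problems.append(
            f"credential principal {credential_principal!r} differs from "
            f"FrameworkInfo.principal {framework_principal!r}")
    # Authorization uses the same principal, so it should appear in the ACLs.
    # Only explicit "values" lists are checked (ANY entries are ignored).
    for action in ("register_frameworks", "run_tasks"):
        if framework_principal not in acl_principals(acls, action):
            problems.append(f"{framework_principal!r} is not listed in {action}")
    return problems


if __name__ == "__main__":
    # Mirrors the master log line above: the framework authenticated as 'user1'.
    for problem in check_framework_principal("user1", "user1", ACLS):
        print(problem)
```

Running it for 'user1' reports that the principal appears in neither ACL section, which matches Vinod's diagnosis; with 'marathon' in both the credentials and FrameworkInfo, the list comes back empty.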

