On 9 March 2015 at 07:36, Sivaram Kannan <[email protected]> wrote:

>
> Hi,
>
> I apologize for bombarding with so many emails on the same issue. So, I
> modified the acl.json as below.
>
> 1. I was able to launch the framework with authentication as users devel1
> and devel2.
>

Just so that our terminology matches here: you were able to *register*
frameworks *authenticated* as *principals* *devel1* and *devel2*.

Your ACL specifies *devel1* and *devel2* can register under *apps* and
*dev-ops* roles, so as long as the frameworks registered under those roles,
the success here makes sense.


> 2. I was able to launch a task as user devel1
>

I agree with Vinod that the point of confusion may be *principal* vs *user*
for *run_tasks*. To reiterate, *principal* is essentially a username that
Mesos uses to authenticate the framework, and *user* is the unix user under
which the task will run. Your ACL specifies that *principal* *devel1* can
launch tasks as *user* *root*. So you shouldn't be able to launch a task as
user *devel1*, but rather launch a task as user *root* with the framework
registered as principal=*devel1*. If this is not the case, something's
wrong.


> 3. I get TASK_LOST when I try to launch task with the framework registered
> as devel2.
>

This is correct. Vinod already covered this point. In short, a framework
registered as *devel2* is not permitted to run anything under your ACL,
since none of the specified entries match and "permissive" is set to false.


> 4. In the same config, if I change the run_tasks => users to devel, the
> task fails with the error described in the previous email. As far as I
> understand, an error in run_tasks users, does not give TASK_LOST, but a
> TASK_FAILED.
>

I'm not sure what you mean by "an error in run_tasks users". The error you
get in this case occurs because there is no *devel* user available in the
environment where you're launching the task. The relevant line in the error
message that illustrates this is:

> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.898473    13
> slave.cpp:2787] Container '9835da8c-a844-4d53-a7f7-4a5e6e808a9b' for
> executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework
> '20150306-054714-24707342-5050-1-0000' failed to start: Failed to create
> container: Failed to chown: *Failed to get user information for 'devel'*:
> Success


This is indeed a *TASK_FAILED*, since authorization succeeded, but the task
failed to launch.
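
To check this ahead of time, a small lookup like the one below can be run
in the same environment as mesos-slave (inside its container, if the slave
runs in one). This is only a sketch: it assumes Python is available there,
and the helper name unix_user_exists is made up for illustration. It
performs the same kind of passwd lookup that fails in the log line above.

```python
import pwd


def unix_user_exists(name: str) -> bool:
    """Return True if `name` resolves to a unix account on this host.

    Hypothetical helper for illustration; it is not part of Mesos.
    """
    try:
        pwd.getpwnam(name)  # raises KeyError when the account is unknown
    except KeyError:
        return False
    return True


if __name__ == "__main__":
    # 'devel' is the user from the failing run_tasks entry in this thread.
    for candidate in ("root", "devel"):
        print(candidate, unix_user_exists(candidate))
```

If this reports False for *devel* where the slave runs, the chown failure
above is expected regardless of what the ACL allows.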


> But a mismatch in principals between register_frameworks and run_tasks
> gives a TASK_LOST.
>

It's not clear exactly what you mean by "a *mismatch* in principals between
register_frameworks and run_tasks". If you mean that every principal under
register_frameworks should have a matching entry in run_tasks, that's not
quite correct. For example, suppose you modified the ACL to be:

> {
>     "permissive": false,
>     "register_frameworks": [
>         {
>             "principals": { "values": [ "devel1", "devel2" ] },
>             "roles": { "values": [ "apps", "dev-ops" ] }
>         },
>         {
>             "principals": { "type": "NONE" },
>             "roles": { "values": [ "apps", "dev-ops" ] }
>         }
>     ],
>     "run_tasks": [
>         {
>             "principals": { "values": [ "devel1", *"devel2"* ] },
>             "users": { "values": [ "root" ] }
>         },
>         {
>             "principals": { "values": [ "marathon" ] },
>             "users": { "type": "NONE" }
>         }
>     ]
> }


If we attempt to launch a task as user "mpark" with the framework
registered as "devel2" (or "devel1"), we'll continue to get the
*TASK_LOST* message because it fails at the *authorization* phase.
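
To make the matching concrete, here is a rough model of how I understand
the run_tasks check: entries are tried in order, the first entry whose
principals and users both match decides the outcome, and if nothing matches
the "permissive" value is used. It is a simplified sketch under those
assumptions, not the actual Mesos authorizer code; the function names are
made up, and only the run_tasks part of the ACL above is reproduced.

```python
# run_tasks portion of the modified ACL above, as a Python dict.
ACLS = {
    "permissive": False,
    "run_tasks": [
        {"principals": {"values": ["devel1", "devel2"]},
         "users": {"values": ["root"]}},
        {"principals": {"values": ["marathon"]},
         "users": {"type": "NONE"}},
    ],
}


def matches(value, entity):
    """Does this ACL entity apply to `value`? ANY and NONE apply to everything."""
    if entity.get("type") in ("ANY", "NONE"):
        return True
    return value in entity.get("values", [])


def allows(value, entity):
    """Given that the entity applies, does it grant access? NONE never does."""
    if entity.get("type") == "NONE":
        return False
    if entity.get("type") == "ANY":
        return True
    return value in entity.get("values", [])


def may_run_task(acls, principal, user):
    """First matching run_tasks entry decides; otherwise use 'permissive'."""
    for acl in acls.get("run_tasks", []):
        if matches(principal, acl["principals"]) and matches(user, acl["users"]):
            return allows(principal, acl["principals"]) and allows(user, acl["users"])
    # Assumed default when "permissive" is absent: allow.
    return acls.get("permissive", True)


if __name__ == "__main__":
    for principal, user in [("devel2", "root"), ("devel2", "mpark"),
                            ("marathon", "root")]:
        print(principal, user, may_run_task(ACLS, principal, user))
```

Under this model, devel2/root is allowed, devel2/mpark matches no entry
and is denied because "permissive" is false, and marathon is denied
for any user by the NONE entry.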


> Does the above makes sense? Please correct me if I am wrong.
>

I hope my explanation above made sense!


>
>     "permissive": false,
>     "register_frameworks": [
>         {
>             "principals": { "values": [ "devel1", "devel2" ] },
>             "roles": { "values": [ "apps", "dev-ops" ] }
>         },
>         {
>             "principals": { "type": "NONE" },
>             "roles": { "values": [ "apps", "dev-ops" ] }
>         }
>     ],
>     "run_tasks": [
>         {
>             "principals": { "values": [ "devel1" ] },
>             "users": { "values": [ "root" ] }
>         },
>         {
>             "principals": { "values": [ "marathon" ] },
>             "users": { "type": "NONE" } } ]
> }
>
> Thanks,
> ./Siva.
>

Thanks,

MPark.


> On Mon, Mar 9, 2015 at 3:57 PM, Sivaram Kannan <[email protected]>
> wrote:
>
>>
>> Hi Vinod,
>>
>> The users in below run_tasks definition - does it refer to unix users in
>> the machine where the framework is run or the unix users in the mesos-slave
>> machine. I think the fact that I run all softwares (mesos-master,
>> mesos-slave, marathon) as docker containers is of significance and reason
>> for the below failure.
>>
>> "run_tasks": [
>>         {
>>             "principals": {
>>                 "values": [
>>                     "marathon"
>>                 ]
>>             },
>>             "users": {
>>                 "values": [
>>                     "devel"
>>                 ]
>>             }
>>         },
>>         {
>>             "principals": {
>>                 "values": [
>>                     "marathon"
>>                 ]
>>             },
>>             "users": {
>>                 "type": "NONE"
>>             }
>>         }
>>     ]
>>
>> When I start the marathon, I start with the flag --mesos_user=devel and
>> while bringing up mesos-slave I bring up with the flag
>> --switch_user=true(which I think anyway is default). When I try to launch a
>> task this is what I am getting
>>
>> Marathon Log:
>>
>> 0.10; rv:38.0) Gecko/20100101 Firefox/38.0"
>> (mesosphere.chaos.http.ChaosRequestLog:15)
>> [2015-03-06 06:04:04,057] INFO Received status update for task
>> busybox.9777a963-c3c6-11e4-a31a-56847afe9799: TASK_FAILED (Abnormal
>> executor termination) (mesosphere.marathon.MarathonScheduler:165)
>> [2015-03-06 06:04:04,063] INFO Task launch delay for [/busybox] is now
>> [43] seconds (mesosphere.util.RateLimiter:34)
>> [2015-03-06 06:04:04,068] INFO Task
>> busybox.9777a963-c3c6-11e4-a31a-56847afe9799 expunged and removed from
>> TaskTracker (mesosphere.marathon.tasks.TaskTracker:101)
>> [2015-03-06 06:04:04,068] INFO Sending event notification.
>> (mesosphere.marathon.MarathonScheduler:274)
>>
>> Mesos-Slave Log:
>>
>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.898473    13
>> slave.cpp:2787] Container '9835da8c-a844-4d53-a7f7-4a5e6e808a9b' for
>> executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework
>> '20150306-054714-24707342-5050-1-0000' failed to start: Failed to create
>> container: Failed to chown: Failed to get user information for 'devel':
>> Success
>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.900068    13
>> slave.cpp:2882] Termination of executor
>> 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework
>> '20150306-054714-24707342-5050-1-0000' failed: Unknown container:
>> 9835da8c-a844-4d53-a7f7-4a5e6e808a9b
>> Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.905900    13
>> slave.cpp:3134] Failed to unmonitor container for executor
>> busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799 of framework
>> 20150306-054714-24707342-5050-1-0000: Not monitored
>>
>> Could the failure be related to me running the mesos-slave as container
>> here?
>>
>> Thanks,
>> ./Siva.
>>
>> On Mon, Mar 9, 2015 at 10:51 AM, Sivaram Kannan <[email protected]>
>> wrote:
>>
>>>
>>> Hi Vinod,
>>>
>>> Thanks, I got it. I guess I did not understand the relationship between
>>> principals defined in authentication and in authorization.  I re-read the
>>> authentication and credentials flag, it is not clear from them that the
>>> principals defined in authorization should match them to work correctly. If
>>> I could, will change the documentation to be more clear and submit a PR.
>>>
>>> Thanks,
>>> ./Siva.
>>>
>>> On Mon, Mar 9, 2015 at 2:18 AM, Vinod Kone <[email protected]> wrote:
>>>
>>>> The principal used for authenticating the framework is the same
>>>> principal used to authorize the framework too. So you need to use
>>>> 'marathon' in your credentials too. In other words, when you start the
>>>> framework the Credential.principal should be the same as
>>>> FrameworkInfo.principal (Mesos master will validate this).
>>>>
>>>> On Sun, Mar 8, 2015 at 10:48 AM, Sivaram Kannan <[email protected]>
>>>> wrote:
>>>>
>>>>> I0308 17:41:14.876610     6 master.cpp:1342] Authorizing framework
>>>>> principal 'user1' to receive offers for role 'apps'
>>>>>
>>>>
>>>> As you can see from this line, the master is trying to authorize
>>>> principal 'user1' and not 'marathon'.
>>>>
>>>
>>>
>>>
>>> --
>>> ever tried. ever failed. no matter.
>>> try again. fail again. fail better.
>>>         -- Samuel Beckett
>>>
