Re: Who is the user in Mesos Authorization ACL definition?
Hi,

I apologize for bombarding the list with so many emails on the same issue. I modified the acl.json as shown below.

1. I was able to launch the framework with authentication as users devel1 and devel2.
2. I was able to launch a task as user devel1.
3. I get TASK_LOST when I try to launch a task with the framework registered as devel2.
4. In the same config, if I change the run_tasks users to devel, the task fails with the error described in the previous email.

As far as I understand, an error in run_tasks users does not give TASK_LOST, but TASK_FAILED. But a mismatch in principals between register_frameworks and run_tasks gives TASK_LOST.

Does the above make sense? Please correct me if I am wrong.

{
  "permissive": false,
  "register_frameworks": [
    { "principals": { "values": [ "devel1", "devel2" ] }, "roles": { "values": [ "apps", "dev-ops" ] } },
    { "principals": { "type": "NONE" }, "roles": { "values": [ "apps", "dev-ops" ] } }
  ],
  "run_tasks": [
    { "principals": { "values": [ "devel1" ] }, "users": { "values": [ "root" ] } },
    { "principals": { "values": [ "marathon" ] }, "users": { "type": "NONE" } }
  ]
}

Thanks,
./Siva.

On Mon, Mar 9, 2015 at 3:57 PM, Sivaram Kannan sivara...@gmail.com wrote:

Hi Vinod,

The users in the run_tasks definition below: do they refer to unix users on the machine where the framework is run, or to unix users on the mesos-slave machine? I think the fact that I run all the software (mesos-master, mesos-slave, marathon) as docker containers is significant and may be the reason for the failure below.

"run_tasks": [
  { "principals": { "values": [ "marathon" ] }, "users": { "values": [ "devel" ] } },
  { "principals": { "values": [ "marathon" ] }, "users": { "type": "NONE" } }
]

When I start marathon, I start it with the flag --mesos_user=devel, and I bring up mesos-slave with the flag --switch_user=true (which I think is the default anyway).
When I try to launch a task this is what I am getting

Marathon Log:

0.10; rv:38.0) Gecko/20100101 Firefox/38.0 (mesosphere.chaos.http.ChaosRequestLog:15)
[2015-03-06 06:04:04,057] INFO Received status update for task busybox.9777a963-c3c6-11e4-a31a-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:165)
[2015-03-06 06:04:04,063] INFO Task launch delay for [/busybox] is now [43] seconds (mesosphere.util.RateLimiter:34)
[2015-03-06 06:04:04,068] INFO Task busybox.9777a963-c3c6-11e4-a31a-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:101)
[2015-03-06 06:04:04,068] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:274)

Mesos-Slave Log:

Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.89847313 slave.cpp:2787] Container '9835da8c-a844-4d53-a7f7-4a5e6e808a9b' for executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework '20150306-054714-24707342-5050-1-' failed to start: Failed to create container: Failed to chown: Failed to get user information for 'devel': Success
Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.90006813 slave.cpp:2882] Termination of executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework '20150306-054714-24707342-5050-1-' failed: Unknown container: 9835da8c-a844-4d53-a7f7-4a5e6e808a9b
Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.90590013 slave.cpp:3134] Failed to unmonitor container for executor busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799 of framework 20150306-054714-24707342-5050-1-: Not monitored

Could the failure be related to me running the mesos-slave as container here?

Thanks,
./Siva.

On Mon, Mar 9, 2015 at 10:51 AM, Sivaram Kannan sivara...@gmail.com wrote:

Hi Vinod,

Thanks, I got it. I guess I did not understand the relationship between principals defined in authentication and in authorization.
I re-read the authentication and credentials flags; it is not clear from them that the principals defined in authorization must match the authentication principals for things to work correctly. If I can, I will change the documentation to be clearer and submit a PR.

Thanks,
./Siva.

On Mon, Mar 9, 2015 at 2:18 AM, Vinod Kone vinodk...@apache.org wrote:

The principal used for authenticating the framework is the same principal used to authorize the framework too. So you need to use 'marathon' in your credentials too. In other words, when you start the framework, the Credential.principal should be the same as FrameworkInfo.principal (the Mesos master will validate this).

On Sun, Mar 8, 2015 at 10:48 AM,
mesos on coreos
Hi,

I am wondering if anybody in the community has looked into or is running mesos on top of coreos. I would be interested to hear about your experiences in the following areas:

- User management on the coreos cluster and in containers running with Mesos
- Are you using fleet to run mesos, or do you run it as a service in cloud-config and not use fleet at all?
- Networking among hosts: flannel or ?
- Any other interesting insights you found with such a setup

Thanks,
Gurvinder
Re: Who is the user in Mesos Authorization ACL definition?
On 9 March 2015 at 07:36, Sivaram Kannan sivara...@gmail.com wrote:

> Hi, I apologize for bombarding the list with so many emails on the same issue. I modified the acl.json as shown below.
>
> 1. I was able to launch the framework with authentication as users devel1 and devel2.

Just so that our terminology matches here, you were able to *register* frameworks *authenticated* as *principals* *devel1* and *devel2*. Your ACL specifies that *devel1* and *devel2* can register under the *apps* and *dev-ops* roles, so as long as the frameworks registered under those roles, the success here makes sense.

> 2. I was able to launch a task as user devel1.

I agree with Vinod that maybe the point of confusion is regarding *principal* vs *user* for *run_tasks*. To reiterate, *principal* is essentially a username for Mesos to authenticate the framework, and *user* is the unix user under which the task will run. Your ACL specifies that *principal* *devel1* can launch tasks as *user* *root*. So you shouldn't be able to launch a task as user *devel1*, but rather launch a task as user *root* with the framework registered as principal=*devel1*. If this is not the case, something's wrong.

> 3. I get TASK_LOST when I try to launch a task with the framework registered as devel2.

This is correct. Vinod already covered this point. In short, a framework registered as *devel2* is not permitted to run anything based on your ACL, since none of the specified cases match and permissive is set to false.

> 4. In the same config, if I change the run_tasks users to devel, the task fails with the error described in the previous email. As far as I understand, an error in run_tasks users does not give TASK_LOST, but TASK_FAILED.

I'm not sure what you mean by an error in run_tasks users. The error you get in this case is because you don't have a *devel* user available in the environment where you're launching the task. The relevant line in the error message that illustrates this is:

Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.89847313 slave.cpp:2787] Container '9835da8c-a844-4d53-a7f7-4a5e6e808a9b' for executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework '20150306-054714-24707342-5050-1-' failed to start: Failed to create container: Failed to chown: *Failed to get user information for 'devel'*: Success

This is indeed a *TASK_FAILED*, since authorization succeeded, but the task failed to launch.

> But a mismatch in principals between register_frameworks and run_tasks gives TASK_LOST.

It's not clear exactly what you mean by a *mismatch* in principals between register_frameworks and run_tasks. If you mean that all principals under register_frameworks should have a matching portion in run_tasks, that's not quite correct. For example, suppose you modified the ACL to be:

{
  "permissive": false,
  "register_frameworks": [
    { "principals": { "values": [ "devel1", "devel2" ] }, "roles": { "values": [ "apps", "dev-ops" ] } },
    { "principals": { "type": "NONE" }, "roles": { "values": [ "apps", "dev-ops" ] } }
  ],
  "run_tasks": [
    { "principals": { "values": [ "devel1", *"devel2"* ] }, "users": { "values": [ "root" ] } },
    { "principals": { "values": [ "marathon" ] }, "users": { "type": "NONE" } }
  ]
}

If we attempt to launch a task as user mpark with the framework registered as devel2 (or devel1), we'll continue to get the *TASK_LOST* message because it fails at the *authorization* phase.

> Does the above make sense? Please correct me if I am wrong.

I hope my explanation above made sense!

> {
>   "permissive": false,
>   "register_frameworks": [
>     { "principals": { "values": [ "devel1", "devel2" ] }, "roles": { "values": [ "apps", "dev-ops" ] } },
>     { "principals": { "type": "NONE" }, "roles": { "values": [ "apps", "dev-ops" ] } }
>   ],
>   "run_tasks": [
>     { "principals": { "values": [ "devel1" ] }, "users": { "values": [ "root" ] } },
>     { "principals": { "values": [ "marathon" ] }, "users": { "type": "NONE" } }
>   ]
> }
>
> Thanks,
> ./Siva.

Thanks,
MPark.
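To make the run_tasks outcomes discussed above concrete, here is a minimal Python sketch of how the ACL entries might be evaluated. It is only a simplified model, assuming the first-match behaviour described in the Mesos authorization documentation (entries are checked in order, and `permissive` decides when nothing matches); it is not the master's actual code. The names `Entity`, `RunTaskACL` and `authorize_run_task` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

ANY, NONE = "ANY", "NONE"


@dataclass
class Entity:
    """Hypothetical stand-in for a principals/users entry in the ACL."""
    values: Optional[List[str]] = None  # explicit list, e.g. ["devel1"]
    type: Optional[str] = None          # "ANY" or "NONE"

    def matches(self, requested: str) -> bool:
        # Assumption: ANY and NONE both match every request;
        # an explicit list matches only its members.
        if self.type in (ANY, NONE):
            return True
        return requested in (self.values or [])

    def allows(self, requested: str) -> bool:
        if self.type == ANY:
            return True
        if self.type == NONE:
            return False
        return requested in (self.values or [])


@dataclass
class RunTaskACL:
    principals: Entity
    users: Entity


def authorize_run_task(acls: List[RunTaskACL], principal: str,
                       user: str, permissive: bool) -> bool:
    """Return True if `principal` may run a task as unix `user`."""
    for acl in acls:
        # The first entry whose principal and user both match decides.
        if acl.principals.matches(principal) and acl.users.matches(user):
            return acl.principals.allows(principal) and acl.users.allows(user)
    # No entry matched: fall back to the permissive flag.
    return permissive


# The run_tasks section from the ACL in this thread.
acls = [
    RunTaskACL(Entity(values=["devel1"]), Entity(values=["root"])),
    RunTaskACL(Entity(values=["marathon"]), Entity(type=NONE)),
]

if __name__ == "__main__":
    for principal, user in [("devel1", "root"), ("devel2", "root"),
                            ("devel1", "mpark"), ("marathon", "root")]:
        print(principal, user,
              authorize_run_task(acls, principal, user, permissive=False))
```

Under this model only devel1 running as root is authorized. devel2 matches no entry and is denied because permissive is false, which lines up with the TASK_LOST case. Whether the unix user actually exists on the slave is a separate check that happens later, which is where TASK_FAILED comes from.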
On Mon, Mar 9, 2015 at 3:57 PM, Sivaram Kannan sivara...@gmail.com wrote: Hi Vinod, The users in below run_tasks definition - does it refer to unix users in the machine where the framework is run or the unix users in the mesos-slave machine. I think the fact that I run all softwares (mesos-master, mesos-slave, marathon) as docker containers is of significance and reason for the below failure. run_tasks: [ { principals: { values: [ marathon ] }, users: { values: [ devel ] } }, { principals: { values:
Re: Who is the user in Mesos Authorization ACL definition?
Hi Vinod,

The users in the run_tasks definition below: do they refer to unix users on the machine where the framework is run, or to unix users on the mesos-slave machine? I think the fact that I run all the software (mesos-master, mesos-slave, marathon) as docker containers is significant and may be the reason for the failure below.

"run_tasks": [
  { "principals": { "values": [ "marathon" ] }, "users": { "values": [ "devel" ] } },
  { "principals": { "values": [ "marathon" ] }, "users": { "type": "NONE" } }
]

When I start marathon, I start it with the flag --mesos_user=devel, and I bring up mesos-slave with the flag --switch_user=true (which I think is the default anyway).

When I try to launch a task this is what I am getting

Marathon Log:

0.10; rv:38.0) Gecko/20100101 Firefox/38.0 (mesosphere.chaos.http.ChaosRequestLog:15)
[2015-03-06 06:04:04,057] INFO Received status update for task busybox.9777a963-c3c6-11e4-a31a-56847afe9799: TASK_FAILED (Abnormal executor termination) (mesosphere.marathon.MarathonScheduler:165)
[2015-03-06 06:04:04,063] INFO Task launch delay for [/busybox] is now [43] seconds (mesosphere.util.RateLimiter:34)
[2015-03-06 06:04:04,068] INFO Task busybox.9777a963-c3c6-11e4-a31a-56847afe9799 expunged and removed from TaskTracker (mesosphere.marathon.tasks.TaskTracker:101)
[2015-03-06 06:04:04,068] INFO Sending event notification. (mesosphere.marathon.MarathonScheduler:274)

Mesos-Slave Log:

Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.89847313 slave.cpp:2787] Container '9835da8c-a844-4d53-a7f7-4a5e6e808a9b' for executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework '20150306-054714-24707342-5050-1-' failed to start: Failed to create container: Failed to chown: Failed to get user information for 'devel': Success
Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.90006813 slave.cpp:2882] Termination of executor 'busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799' of framework '20150306-054714-24707342-5050-1-' failed: Unknown container: 9835da8c-a844-4d53-a7f7-4a5e6e808a9b
Mar 06 06:06:03 node-0800279564ad sh[27684]: E0306 06:06:03.90590013 slave.cpp:3134] Failed to unmonitor container for executor busybox.d4ef22c6-c3c6-11e4-a31a-56847afe9799 of framework 20150306-054714-24707342-5050-1-: Not monitored

Could the failure be related to me running the mesos-slave as container here?

Thanks,
./Siva.

On Mon, Mar 9, 2015 at 10:51 AM, Sivaram Kannan sivara...@gmail.com wrote:

Hi Vinod,

Thanks, I got it. I guess I did not understand the relationship between principals defined in authentication and in authorization. I re-read the authentication and credentials flags; it is not clear from them that the principals defined in authorization must match the authentication principals for things to work correctly. If I can, I will change the documentation to be clearer and submit a PR.

Thanks,
./Siva.

On Mon, Mar 9, 2015 at 2:18 AM, Vinod Kone vinodk...@apache.org wrote:

The principal used for authenticating the framework is the same principal used to authorize the framework too. So you need to use 'marathon' in your credentials too. In other words, when you start the framework, the Credential.principal should be the same as FrameworkInfo.principal (the Mesos master will validate this).
On Sun, Mar 8, 2015 at 10:48 AM, Sivaram Kannan sivara...@gmail.com wrote:

I0308 17:41:14.876610 6 master.cpp:1342] Authorizing framework principal 'user1' to receive offers for role 'apps'

As you can see from this line, the master is trying to authorize principal 'user1' and not 'marathon'.

--
ever tried. ever failed. no matter. try again. fail again. fail better.
-- Samuel Beckett
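On the question of which machine the run_tasks users refer to: the slave log shows the lookup for 'devel' failing in the environment where the container is created, so a quick way to check is to run a user lookup in that same environment (inside the mesos-slave container, if the slave runs in one). Below is a minimal Python sketch of such a check. The helper name `unix_user_exists` is made up for this example, and `pwd.getpwnam` is a rough stand-in for the lookup the slave performs, not the slave's own code.

```python
import pwd


def unix_user_exists(name: str) -> bool:
    """Return True if `name` resolves to an account in the local user database."""
    try:
        pwd.getpwnam(name)  # raises KeyError when the user is unknown
        return True
    except KeyError:
        return False


if __name__ == "__main__":
    # 'devel' is the user passed via --mesos_user in this thread.
    for candidate in ("root", "devel"):
        print(candidate, unix_user_exists(candidate))
```

If this reports that 'devel' is missing where mesos-slave runs, a task launched with --switch_user=true as that user would be expected to fail at launch, even though authorization succeeded.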