Re: Authentication module
Authentication is enabled for Mesos APIs used by schedulers (to talk to master), operators (to talk to master/agent) and agents (to talk to master). Executor to agent communication is not currently authenticated.

This might throw some light: https://github.com/apache/mesos/blob/master/docs/authentication.md

On Fri, Dec 2, 2016 at 11:48 AM, Alexander Gallego wrote:
> For the authentication module: http://mesos.apache.org/documentation/latest/modules/
> does it mean kerberos, ldap, etc. for tasks or for framework registration
> or for machine registration?
>
> are there any more docs on this?
Re: Failure reason documentation
Thanks haosdent!

On Sun, Dec 4, 2016 at 10:45 AM haosdent wrote:
> Ohoh, sorry for misunderstanding the question. As far as I know, there is
> no documentation for that. We should add some comments to the reason enums.
> I created a ticket here https://issues.apache.org/jira/browse/MESOS-6686 to
> track it.
Re: Failure reason documentation
Ohoh, sorry for misunderstanding the question. As far as I know, there is no documentation for that. We should add some comments to the reason enums. I created a ticket here https://issues.apache.org/jira/browse/MESOS-6686 to track it.

On Mon, Dec 5, 2016 at 2:27 AM, Erik Weathers wrote:
> I think he's looking for documentation about what precisely each reason
> *means*. A la how there are comments beside the TaskState list in
> mesos.proto.
>
> - Erik

--
Best Regards,
Haosdent Huang
Re: Failure reason documentation
I think he's looking for documentation about what precisely each reason *means*. A la how there are comments beside the TaskState list in mesos.proto.

- Erik

On Sun, Dec 4, 2016 at 10:07 AM haosdent wrote:
> Hi @Wil
>
> You could find them here
> https://github.com/apache/mesos/blob/1.1.0/include/mesos/mesos.proto#L1577-L1609
Re: Failure reason documentation
Hi @Wil

You could find them here https://github.com/apache/mesos/blob/1.1.0/include/mesos/mesos.proto#L1577-L1609

On Sat, Dec 3, 2016 at 6:09 AM, Wil Yegelwel wrote:
> No, I'm referring to the values of the enum Reason.
>
> On Fri, Dec 2, 2016, 4:52 PM Tomek Janiszewski wrote:
>> Hi
>>
>> Are you referring to task state? If yes then take a look at comments in proto
>> https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L1552
>> http://mesos.apache.org/api/latest/java/org/apache/mesos/Protos.TaskState.html
>>
>> Best
>>
>> Tomek
>>
>> On Fri, Dec 2, 2016, 21:31, Wil Yegelwel wrote:
>>> Hey mesos users!
>>>
>>> I can't seem to find any documentation about the various reasons mesos
>>> includes when a job fails. Is there a place that describes what the reasons
>>> mean?
>>>
>>> Thanks,
>>> Wil

--
Best Regards,
Haosdent Huang
Re: MESOS-6233 Allow agents to re-register post a host reboot
> we can have the agent remove `rm -f /meta/slaves/latest` automatically upon recovery failure but only after the host has rebooted.

This sounds dangerous. When the difference in AgentInfo is caused by an operator's typo, I think the operator would prefer to correct it and try to start the agent again, rather than have the state removed automatically.

But if we decide to do that, please make sure to announce this behavior change to the mailing lists in a separate email. Thank you!

On Wed, Nov 30, 2016 at 6:24 AM, tommy xiao wrote:
> agree with james's options.
>
> 2016-11-30 0:48 GMT+08:00 James Peach:
>>
>> On Nov 28, 2016, at 6:09 PM, Yan Xu wrote:
>>>
>>> So one thing that was brought up during offline conversations was that
>>> if the host reboot is associated with hardware change (e.g., a new memory
>>> stick):
>>>
>>> • Currently: the agent would skip the recovery (and the chance of
>>>   running into incompatible agent info) and register as a new agent.
>>> • With the change: the agent could run into incompatible agent
>>>   info due to resource change and flap indefinitely until the operator
>>>   intervenes.
>>>
>>> To mitigate this and maintain the current behavior, we can have the
>>> agent remove `rm -f /meta/slaves/latest` automatically upon
>>> recovery failure but only after the host has rebooted. This way the agent
>>> can restart as a new agent without operator intervention.
>>>
>>> Any thoughts?
>>
>> I still think you need a mechanism for the master/agent to tell you
>> whether it will honor the restart policy. Without this, you have to lock
>> the framework to a Mesos version.
>>
>> An empty RestartPolicy is also problematic since it precludes using
>> RestartPolicy in pods. If you later want to restart a task inside a pod but
>> not across agent restarts you would have no way to express that.
>>
>> J
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com

--
Best Regards,
Haosdent Huang
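
For readers following the proposal, here is a rough sketch of the decision being discussed: on recovery failure, drop the `latest` symlink only if the host has rebooted, otherwise leave the state alone so the operator can fix the configuration. This is not Mesos code. The function names, the `meta_dir` parameter, the boot-id comparison and the returned strings are all assumptions made up for illustration.

```python
from pathlib import Path


def host_rebooted(saved_boot_id: str, current_boot_id: str) -> bool:
    """Hypothetical check: treat a changed boot id as a host reboot."""
    return saved_boot_id != current_boot_id


def handle_recovery_failure(meta_dir: Path, saved_boot_id: str,
                            current_boot_id: str) -> str:
    """Decide what to do after agent recovery has already failed.

    meta_dir is a hypothetical path to the agent's meta directory; the
    thread only gives the relative location meta/slaves/latest.
    """
    if not host_rebooted(saved_boot_id, current_boot_id):
        # No reboot: keep the checkpointed state so the operator can
        # correct the agent info (for example a typo) and retry.
        return "operator-intervention"

    latest = Path(meta_dir) / "slaves" / "latest"
    # Mirror `rm -f`: remove the link if present, ignore it if absent.
    # The directory the link points at is left in place.
    if latest.is_symlink() or latest.is_file():
        latest.unlink()
    return "register-as-new-agent"
```

With this shape, the operator-typo case raised above still ends in operator intervention as long as the host has not rebooted; the automatic cleanup only applies to the reboot case from Yan Xu's message.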