Question about mesos authentication and authorization?

2016-12-12 Thread Yu Wei
Hi folks,


One question about Mesos authentication and authorization.

Where can I store the credentials? According to the documentation, it seems
the credentials are stored on local disk.

Is there another approach to storing and managing credentials, e.g. via LDAP
or other third-party software?


Thanks,

Jared
Software developer
Interested in open source software, big data, Linux


Re: MESOS-6223 Allow agents to re-register post a host reboot

2016-12-12 Thread Joris Van Remoortere
> So one thing that was brought up during offline conversations was that if
> the host reboot is associated with hardware change (e.g., a new memory
> stick):
>
>    - With the change: the agent could run into incompatible agent info
>    due to resource change and flap indefinitely until the operator
>    intervenes.

Can you elaborate on this?

Would you run into this because you don't explicitly specify the memory
resource in the agent configuration? I think we highly recommend that you
do this in production to prevent accidental incompatibility of resources
even without an actual hardware change. Historically there were some issues
reported where the kernel reported a slightly different amount of memory
after reboot.

—
*Joris Van Remoortere*
Mesosphere

On Mon, Nov 28, 2016 at 6:09 PM, Yan Xu  wrote:

> So one thing that was brought up during offline conversations was that if
> the host reboot is associated with hardware change (e.g., a new memory
> stick):
>
>
>    - Currently: the agent would skip the recovery (and the chance of
>    running into incompatible agent info) and register as a new agent.
>    - With the change: the agent could run into incompatible agent info
>    due to resource change and flap indefinitely until the operator
>    intervenes.
>
> To mitigate this and maintain the current behavior, we can have the agent
> run `rm -f /meta/slaves/latest` automatically upon recovery
> failure, but only after the host has rebooted. This way the agent can
> restart as a new agent without operator intervention.
>
> Any thoughts?
>
> BTW this speaks to the need for MESOS-1739.
>
> Yan
>
> On Tue, Nov 15, 2016 at 7:37 AM, Megha Sharma  wrote:
>
>> Hi All,
>>
>> We have been working on the design for restartable tasks (MESOS-3545), and
>> allowing agents to recover and re-register post reboot is a prerequisite
>> for that.
>> Today, the agent doesn't recover its state (which includes its SlaveID)
>> after a host reboot; it short-circuits the recovery upon discovering the
>> reboot and registers with the master as a new agent. With Partition
>> Awareness, the Mesos master even allows agents that have failed the
>> master's health check pings (unreachable agents) to re-register with it
>> and reconcile the tasks/executors. The executors on a rebooted host are
>> terminated anyway, so there is no harm in letting such an agent recover
>> and re-register with the master using its old SlaveID.
>> We would like to hear from the folks here whether you see any operational
>> concerns with letting agents recover post a host reboot.
>>
>> MESOS JIRA: https://issues.apache.org/jira/browse/MESOS-6223
>>
>> Many Thanks
>> Megha Sharma
>>
>>
>>
>


Re: Building on OS X 10.12

2016-12-12 Thread Neil Conway
I think we should look into adopting "-fvisibility=hidden" and
explicitly annotating the symbols that we want to export:

https://issues.apache.org/jira/browse/MESOS-6734

I agree this isn't a trivial change and it would be good to
have some tool support here, but there are lots of benefits [1,2].
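
For readers unfamiliar with the pattern, explicit export annotations usually
look something like the sketch below. This is only an illustration: the
MESOS_EXPORT macro and the class names are hypothetical, not identifiers that
exist in the Mesos tree.

    // visibility.hpp (hypothetical). Build with -fvisibility=hidden and mark
    // only the intended public symbols for export.
    #if defined(__GNUC__) || defined(__clang__)
    #  define MESOS_EXPORT __attribute__((visibility("default")))
    #else
    #  define MESOS_EXPORT
    #endif

    // Annotated: stays in the dynamic symbol table of libmesos.so.
    class MESOS_EXPORT PublicApi
    {
    public:
      void doSomething();
    };

    // Not annotated: hidden by default, so it no longer contributes to the
    // exported-symbol count (and the linker/loader have less work to do).
    class InternalHelper
    {
    public:
      void helper();
    };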

Neil

[1] https://gcc.gnu.org/wiki/Visibility
[2] https://software.intel.com/sites/default/files/m/a/1/e/dsohowto.pdf

On Mon, Dec 12, 2016 at 2:17 PM, Joris Van Remoortere
 wrote:
> There are a significant number of developer and runtime performance
> benefits from turning this flag on.
> In my opinion it is also a dangerous flag to turn on by default without a
> strict set of rules for our codebase to ensure that we don't accidentally:
>
>- have multiple instances of static variables when we think they are a
>singleton
>- run into inequality when we expect equality for comparison of in-lined
>function pointers (For example when building a vtable in a library for
>something like variant / visitor)
>
> Although the likelihood that our codebase would suffer from these is low, I
> would prefer to have clang-tidy support and have some rule checkers to
> ensure that we can turn this flag on by default and know we will catch any
> future code that may break these rules.
>
> @James have you done any validation of the codebase and the libraries we
> depend on to ensure this is safe?
>
> Joris
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Mon, Dec 5, 2016 at 1:16 PM, James Peach  wrote:
>
>>
>> > On Dec 2, 2016, at 10:54 PM, Jie Yu  wrote:
>> >
>> > Another tip. If you are on macOS Sierra, you might notice that linking is
>> > extremely slow using the default clang.
>> >
>> > Using CXXFLAGS `-fvisibility-inlines-hidden` will greatly speed up the
>> > linking.
>>
>> Is there a reason we should not always do this? It reduces the number of
>> exported symbols in libmesos.so from 250K to 100K.
>>
>> J


Re: Building on OS X 10.12

2016-12-12 Thread Joris Van Remoortere
There are a significant number of developer and runtime performance
benefits from turning this flag on.
In my opinion it is also a dangerous flag to turn on by default without a
strict set of rules for our codebase to ensure that we don't accidentally:

   - have multiple instances of static variables when we think they are a
   singleton
   - run into inequality when we expect equality for comparison of in-lined
   function pointers (For example when building a vtable in a library for
   something like variant / visitor)

Although the likelihood that our codebase would suffer from these is low, I
would prefer to have clang-tidy support and have some rule checkers to
ensure that we can turn this flag on by default and know we will catch any
future code that may break these rules.
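
To make the first bullet above concrete, here is a minimal hypothetical
sketch (the header and function names are illustrative, not from the Mesos
tree) of how hidden visibility can quietly break a presumed singleton:

    // counter.hpp (hypothetical), included by two different shared
    // libraries, say liba.so and libb.so.
    #pragma once

    inline int& counter()
    {
      static int value = 0;  // Intended to be one process-wide instance.
      return value;
    }

    // With default visibility, the dynamic linker collapses the duplicate
    // definitions emitted into liba.so and libb.so, so both libraries share
    // one `value`. With hidden visibility (and no explicit export), each
    // library keeps its own private copy: incrementing counter() in liba.so
    // is not observed by libb.so. For the same reason &counter can differ
    // between the two libraries, which is the pointer-equality problem in
    // the second bullet.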

@James have you done any validation of the codebase and the libraries we
depend on to ensure this is safe?

Joris

—
*Joris Van Remoortere*
Mesosphere

On Mon, Dec 5, 2016 at 1:16 PM, James Peach  wrote:

>
> > On Dec 2, 2016, at 10:54 PM, Jie Yu  wrote:
> >
> > Another tip. If you are on macOS Sierra, you might notice that linking is
> > extremely slow using the default clang.
> >
> > Using CXXFLAGS `-fvisibility-inlines-hidden` will greatly speed up the
> > linking.
>
> Is there a reason we should not always do this? It reduces the number of
> exported symbols in libmesos.so from 250K to 100K.
>
> J


Re: Duplicate task IDs

2016-12-12 Thread Neil Conway
On Mon, Dec 12, 2016 at 1:32 PM, Joris Van Remoortere
 wrote:
> It sounds like using a multi_hashmap for now allows you to clean up the
> code and avoid some bugs, without changing the existing behavior.

Because we want cache-like behavior (bounded size + LRU replacement),
this would require adding a new data structure, BoundedMultiHashMap
(https://reviews.apache.org/r/54178/). That seems like overkill to me,
for now.
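
For context, here is a minimal sketch of the kind of bounded, LRU-evicting
map being discussed. It is purely illustrative and is not the data structure
from the review linked above:

    #include <cstddef>
    #include <list>
    #include <unordered_map>
    #include <utility>

    // Illustrative only: a bounded map with LRU eviction, roughly the
    // cache-like behavior described above. Not Mesos code.
    template <typename Key, typename Value>
    class BoundedLRUMap
    {
    public:
      explicit BoundedLRUMap(std::size_t capacity) : capacity(capacity) {}

      // Insert or overwrite. A duplicate key replaces the old entry, which
      // is how a (non-multi) hashmap drops duplicate task IDs.
      void put(const Key& key, const Value& value)
      {
        auto it = index.find(key);
        if (it != index.end()) {
          entries.erase(it->second);
          index.erase(it);
        }

        entries.push_front(std::make_pair(key, value));
        index[key] = entries.begin();

        if (entries.size() > capacity) {  // Evict the least recently used.
          index.erase(entries.back().first);
          entries.pop_back();
        }
      }

      // O(1) lookup; refreshes the entry's position in the LRU order.
      const Value* get(const Key& key)
      {
        auto it = index.find(key);
        if (it == index.end()) {
          return nullptr;
        }
        entries.splice(entries.begin(), entries, it->second);
        return &it->second->second;
      }

    private:
      std::size_t capacity;
      std::list<std::pair<Key, Value>> entries;  // Most recent at the front.
      std::unordered_map<
          Key,
          typename std::list<std::pair<Key, Value>>::iterator> index;
    };

A bounded multi-hashmap would instead keep both entries under a duplicate
key, which is what preserving the current behavior would require.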

> It would also be unfortunate if we said we were disallowing duplicate task
> IDs but only caught some of the manifestations.

Definitely unfortunate, but I don't see an alternative, as long as we
continue to allow frameworks to freely choose their own task IDs.

Neil


Re: Duplicate task IDs

2016-12-12 Thread Joris Van Remoortere
It sounds like using a multi_hashmap for now allows you to clean up the
code and avoid some bugs, without changing the existing behavior.

I agree that we would want a deprecation period if we changed the behavior.
It would also be unfortunate if we said we were disallowing duplicate task
IDs but only caught some of the manifestations.

—
*Joris Van Remoortere*
Mesosphere

On Mon, Dec 12, 2016 at 7:56 AM, Neil Conway  wrote:

> Hi Joris,
>
> Fair point: I didn't deliberately set out to change the behavior for
> duplicate task IDs. Rather, it was a consequence of switching from
> boost::circular_buffer to using a hashmap for managing completed
> tasks. Using a hashmap has a few minor advantages [1], but we can
> certainly continue using circular_buffer (or a multi-hashmap) if we
> want to keep the current behavior.
>
> I think we have the following options:
>
> (1) Keep the current behavior: reusing task IDs is discouraged but
> supported.
>
> (2) Per Alex's suggestion, we can say that frameworks are no longer
> allowed to reuse task IDs. Because the master only keeps a
> limited-size cache of completed tasks (which is not preserved across
> master restart or failover), we wouldn't be able to reject all
> situations in which frameworks attempt to reuse task IDs.
>
> If we pursue #2, we might need a deprecation period or master
> capability to give framework authors some time to migrate.
>
> For the moment, I'll avoid changing the behavior for duplicate task
> IDs; I've opened https://issues.apache.org/jira/browse/MESOS-6779 to
> track this issue. If you have an opinion on this change, please
> weigh in, either on this thread or on JIRA.
>
> Neil
>
> [1] Specifically, making the management of completed and unreachable
> tasks more symmetric and avoiding some bugs/UB in
> boost::circular_buffer. O(1) lookup of completed tasks might be useful
> in the future but isn't used right now.
>
> On Fri, Dec 9, 2016 at 2:13 PM, Joris Van Remoortere
>  wrote:
> > Hey Neil,
> >
> > I concur that using duplicate task IDs is bad practice and asking for
> > trouble.
> >
> > Could you please clarify *why* you want to use a hashmap? Is your goal to
> > remove duplicate task IDs or is this just a side-effect and you have a
> > different reason (e.g. performance) for using a hashmap?
> >
> > I'm wondering why a multi-hashmap is not sufficient. This would be clear
> if
> > you were explicitly *trying* to get rid of duplicates of course :-)
> >
> > Thanks,
> > Joris
> >
> > —
> > *Joris Van Remoortere*
> > Mesosphere
> >
> > On Fri, Dec 9, 2016 at 7:08 AM, Neil Conway 
> wrote:
> >
> >> Folks,
> >>
> >> The master stores a cache of metadata about recently completed tasks;
> >> for example, this information can be accessed via the "/tasks" HTTP
> >> endpoint or the "GET_TASKS" call in the new Operator API.
> >>
> >> The master currently stores this metadata using a list; this means
> >> that duplicate task IDs are permitted. We're considering [1] changing
> >> this to use a hashmap instead. Using a hashmap would mean that
> >> duplicate task IDs would be discarded: if two completed tasks have the
> >> same task ID, only the metadata for the most recently completed task
> >> would be retained by the master.
> >>
> >> If this behavior change would cause problems for your framework or
> >> other software that relies on Mesos, please let me know.
> >>
> >> (Note that if you do have two completed tasks with the same ID, you'd
> >> need an unambiguous way to tell them apart. As a recommendation, I
> >> would strongly encourage framework authors to never reuse task IDs.)
> >>
> >> Neil
> >>
> >> [1] https://reviews.apache.org/r/54179/
> >>
>
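
As a small illustration of the proposal quoted above, the sketch below
contrasts list-style storage (duplicates retained) with hashmap storage (the
later completed task overwrites the earlier one). The types are simplified
stand-ins, not Mesos code:

    #include <iostream>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    int main()
    {
      using TaskID = std::string;
      const std::pair<TaskID, std::string> first{"task-1", "finished at 10:00"};
      const std::pair<TaskID, std::string> second{"task-1", "finished at 11:00"};

      // List-like storage: both completed tasks with the same ID survive.
      std::vector<std::pair<TaskID, std::string>> asList{first, second};
      std::cout << "list keeps " << asList.size() << " entries\n";  // 2

      // Hashmap storage: the later entry overwrites the earlier one.
      std::unordered_map<TaskID, std::string> asMap;
      asMap[first.first] = first.second;
      asMap[second.first] = second.second;
      std::cout << "hashmap keeps " << asMap.size() << " entry: "
                << asMap["task-1"] << "\n";  // 1 entry, the 11:00 one
    }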


Re: Duplicate task IDs

2016-12-12 Thread Neil Conway
Hi Joris,

Fair point: I didn't deliberately set out to change the behavior for
duplicate task IDs. Rather, it was a consequence of switching from
boost::circular_buffer to using a hashmap for managing completed
tasks. Using a hashmap has a few minor advantages [1], but we can
certainly continue using circular_buffer (or a multi-hashmap) if we
want to keep the current behavior.

I think we have the following options:

(1) Keep the current behavior: reusing task IDs is discouraged but supported.

(2) Per Alex's suggestion, we can say that frameworks are no longer
allowed to reuse task IDs. Because the master only keeps a
limited-size cache of completed tasks (which is not preserved across
master restart or failover), we wouldn't be able to reject all
situations in which frameworks attempt to reuse task IDs.

If we pursue #2, we might need a deprecation period or master
capability to give framework authors some time to migrate.

For the moment, I'll avoid changing the behavior for duplicate task
IDs; I've opened https://issues.apache.org/jira/browse/MESOS-6779 to
track this issue. If you have an opinion on this change, please
weigh in, either on this thread or on JIRA.

Neil

[1] Specifically, making the management of completed and unreachable
tasks more symmetric and avoiding some bugs/UB in
boost::circular_buffer. O(1) lookup of completed tasks might be useful
in the future but isn't used right now.

On Fri, Dec 9, 2016 at 2:13 PM, Joris Van Remoortere
 wrote:
> Hey Neil,
>
> I concur that using duplicate task IDs is bad practice and asking for
> trouble.
>
> Could you please clarify *why* you want to use a hashmap? Is your goal to
> remove duplicate task IDs or is this just a side-effect and you have a
> different reason (e.g. performance) for using a hashmap?
>
> I'm wondering why a multi-hashmap is not sufficient. This would be clear if
> you were explicitly *trying* to get rid of duplicates of course :-)
>
> Thanks,
> Joris
>
> —
> *Joris Van Remoortere*
> Mesosphere
>
> On Fri, Dec 9, 2016 at 7:08 AM, Neil Conway  wrote:
>
>> Folks,
>>
>> The master stores a cache of metadata about recently completed tasks;
>> for example, this information can be accessed via the "/tasks" HTTP
>> endpoint or the "GET_TASKS" call in the new Operator API.
>>
>> The master currently stores this metadata using a list; this means
>> that duplicate task IDs are permitted. We're considering [1] changing
>> this to use a hashmap instead. Using a hashmap would mean that
>> duplicate task IDs would be discarded: if two completed tasks have the
>> same task ID, only the metadata for the most recently completed task
>> would be retained by the master.
>>
>> If this behavior change would cause problems for your framework or
>> other software that relies on Mesos, please let me know.
>>
>> (Note that if you do have two completed tasks with the same ID, you'd
>> need an unambiguous way to tell them apart. As a recommendation, I
>> would strongly encourage framework authors to never reuse task IDs.)
>>
>> Neil
>>
>> [1] https://reviews.apache.org/r/54179/
>>


Re: Command healthcheck failed but status KILLED

2016-12-12 Thread Tomek Janiszewski
Is there any information that the kill is the result of a failed health check?
TaskHealthStatus should have some details on what was wrong. When the default
executor is killing a task, it should add a reason and details to the
TaskStatus. What do you think?
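
For concreteness, the suggestion would amount to the executor attaching an
explicit reason and message to the terminal update, roughly like the
hypothetical sketch below. The REASON_TASK_HEALTH_CHECK_FAILED name is
illustrative only and is deliberately left commented out, since no such value
existed in Mesos 1.0.2:

    // Assumes the generated protobuf header for mesos.proto. The setters
    // below are the standard protobuf accessors on mesos::TaskStatus.
    #include <mesos/mesos.pb.h>

    // Hypothetical sketch of what an executor could send after killing a
    // task that exceeded its health check failure limit.
    mesos::TaskStatus healthCheckKillUpdate(const mesos::TaskID& taskId)
    {
      mesos::TaskStatus status;
      status.mutable_task_id()->CopyFrom(taskId);
      status.set_state(mesos::TASK_KILLED);
      status.set_healthy(false);
      status.set_message(
          "Task killed after exceeding the health check failure limit");
      // Illustrative name only; not an existing enum value:
      // status.set_reason(mesos::TaskStatus::REASON_TASK_HEALTH_CHECK_FAILED);
      return status;
    }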

On Mon, Dec 12, 2016 at 11:11 AM, Alex Rukletsov 
wrote:

> Technically the task has not failed but was killed by the executor
> (because it failed a health check).
>
> On Fri, Dec 9, 2016 at 11:27 AM, Tomek Janiszewski 
> wrote:
>
> > Hi
> >
> > What is the desired behavior when a command health check fails? On Mesos
> > 1.0.2, when a health check fails, the task has state KILLED, instead of
> > FAILED with a reason specifying that it was killed due to a failing
> > health check.
> >
> > Thanks
> > Tomek
> >
>


Re: Command healthcheck failed but status KILLED

2016-12-12 Thread Alex Rukletsov
Technically the task has not failed but was killed by the executor
(because it failed a health check).

On Fri, Dec 9, 2016 at 11:27 AM, Tomek Janiszewski 
wrote:

> Hi
>
> What is the desired behavior when a command health check fails? On Mesos
> 1.0.2, when a health check fails, the task has state KILLED, instead of
> FAILED with a reason specifying that it was killed due to a failing health
> check.
>
> Thanks
> Tomek
>