Hi all,
I'm working on an issue related to operation feedback on agent default
resources, MESOS-9535 <https://issues.apache.org/jira/browse/MESOS-9535>.
This involves the master's handling of an agent capability that we recently
added, AGENT_OPERATION_FEEDBACK. This new capability is optional (i.e. not
in the agent's list of capabilities required for agent startup
<https://github.com/apache/mesos/blob/761e1ca400901dd623f1cb025e1d68da9472d49c/src/slave/flags.cpp#L774-L780>),
and it has the RESOURCE_PROVIDER capability as a prerequisite.

I need to update the master code to avoid memory leaks in the case where an
agent is downgraded from AGENT_OPERATION_FEEDBACK-capable to
non-AGENT_OPERATION_FEEDBACK-capable. In this case, it is difficult for the
master to tell the difference between a true *version downgrade* to an
older agent and a *recent agent* that has simply had the capability unset
by an operator.
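
To make the ambiguity concrete, here is a rough sketch. It is not actual
Mesos code: the `Capabilities` alias and the `lostOperationFeedback` helper
are hypothetical names used only for illustration, and capabilities are
modeled as plain strings. It shows that the only thing the master can
compare is the set of capabilities reported before and after.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the capability set an agent reports when it
// (re-)registers; the real master uses its own protobuf-based types.
using Capabilities = std::set<std::string>;

// Returns true if the agent advertised AGENT_OPERATION_FEEDBACK before
// and no longer advertises it now.
bool lostOperationFeedback(
    const Capabilities& before,
    const Capabilities& after)
{
  const std::string feedback = "AGENT_OPERATION_FEEDBACK";
  return before.count(feedback) > 0 && after.count(feedback) == 0;
}
```

A check like this would fire both for a version downgrade and for an
operator unsetting the capability, since in both cases the capability is
simply absent from the newly reported set.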

To avoid this difficulty, I'm considering the possibility of making both
the RESOURCE_PROVIDER and AGENT_OPERATION_FEEDBACK capabilities required
for agent startup as of 1.8.0. This would mean that operators could
no longer opt out of all of the new operation-handling code paths in the
master (`ApplyOperationMessage`, `UpdateOperationStatusMessage`, etc.).

I wanted to reach out to the community to see how folks feel about this
change, and to find out whether any cluster operators out there have been
disabling the RESOURCE_PROVIDER capability on their agents.

Thanks in advance for your input!

Cheers,
Greg
