Hi all,

I'm working on an issue related to operation feedback on agent default resources, MESOS-9535 <https://issues.apache.org/jira/browse/MESOS-9535>. This involves the master's handling of an agent capability that we recently added, AGENT_OPERATION_FEEDBACK. This new capability is optional (i.e., it is not in the agent's list of capabilities required for agent startup <https://github.com/apache/mesos/blob/761e1ca400901dd623f1cb025e1d68da9472d49c/src/slave/flags.cpp#L774-L780>), and it has the RESOURCE_PROVIDER capability as a prerequisite.
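
To make the optional/required distinction and the prerequisite relationship concrete, here is a rough sketch of the two checks involved. This is only an illustration: `Capabilities`, `prerequisitesSatisfied`, and `hasRequired` are hypothetical names, and capabilities are modeled as plain strings rather than the types the real Mesos code uses.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for an agent's capability list. Capability names
// are modeled as strings purely for illustration.
using Capabilities = std::set<std::string>;

// Returns true if the prerequisite relationship holds:
// AGENT_OPERATION_FEEDBACK may only be set together with RESOURCE_PROVIDER.
inline bool prerequisitesSatisfied(const Capabilities& caps)
{
  if (caps.count("AGENT_OPERATION_FEEDBACK") > 0) {
    return caps.count("RESOURCE_PROVIDER") > 0;
  }
  return true;
}

// Returns true if `caps` contains every capability listed in `required`.
// An optional capability is simply one that is absent from `required`.
inline bool hasRequired(const Capabilities& caps, const Capabilities& required)
{
  for (const std::string& name : required) {
    if (caps.count(name) == 0) {
      return false;
    }
  }
  return true;
}
```

In this sketch, making a capability required amounts to adding its name to the `required` set passed to `hasRequired`, after which an agent that omits it fails the check.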
I need to update the master code to avoid memory leaks in the case where an agent is downgraded from AGENT_OPERATION_FEEDBACK-capable to non-AGENT_OPERATION_FEEDBACK-capable. In this case, it is difficult for the master to tell the difference between a true *version downgrade* to an older agent and a downgrade to a *recent agent* which has simply had the capability unset by an operator.

To avoid this difficulty, I'm considering making both the RESOURCE_PROVIDER and AGENT_OPERATION_FEEDBACK capabilities required for agent startup as of 1.8.0. This would mean that operators could no longer opt out of all of the new operation-handling code paths in the master (`ApplyOperationMessage`, `UpdateOperationStatusMessage`, etc.).

I wanted to reach out to the community to see how folks feel about this change, and to find out whether any cluster operators out there have been disabling the RESOURCE_PROVIDER capability on their agents.

Thanks in advance for your input!

Cheers,
Greg