[ https://issues.apache.org/jira/browse/MESOS-7681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279334#comment-16279334 ]
Vinod Kone commented on MESOS-7681:
-----------------------------------

FYI, Master capabilities have landed. [~mcypark] will you be working on this?

> Add safeguard for new agents with new features + old master
> -----------------------------------------------------------
>
>                 Key: MESOS-7681
>                 URL: https://issues.apache.org/jira/browse/MESOS-7681
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Neil Conway
>              Labels: mesosphere
>
> Consider this scenario:
> * Mesos cluster with 3 masters and 1 agent.
> * 2 of the masters (including the leader) are upgraded to Mesos 1.4;
> remaining master stays at Mesos 1.3 (e.g., due to operator error).
> * Agent is upgraded to Mesos 1.4
> * Framework creates a reservation refinement on the agent
> * Leading master fails; Mesos 1.3 master is elected as the new leader
> In this scenario, the agent will send resources to the master in the new
> (post-refinement) format, but the master will not understand those new
> fields. This results in an inconsistency between the agent's resources and
> the master's view of the agent's resources. This could lead to various
> problems -- in effect, the reservation the framework previously made has been
> "forgotten" during master failover. Similarly, if the agent attempts to
> unreserve the resources (using the master's version of the resource), that
> operation will be rejected by the agent.
> To fix this, it seems we need an explicit negotiation between the agent and
> the master as part of registration/re-registration. The agent would examine
> its resources and say which capabilities it _requires_ of the master (not
> just the capabilities the agent _supports_); if the master does not support
> those capabilities, the agent cannot safely register.
> We could implement this either via master capabilities (agent computes the
> master capabilities it requires and declines to register if the master isn't
> new enough), or via agent capabilities (agent tells master the capabilities
> it is "actively using"; master refuses to allow any agent to register that is
> using a capability the master doesn't recognize/support). Probably the former
> is safer/cleaner.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
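
For illustration, the first option in the description (the agent computes the master capabilities it requires and declines to register if any are missing) might be sketched roughly as follows. This is not Mesos code: the `Resource` model, the capability name `RESERVATION_REFINEMENT` as used here, and the functions `required_master_capabilities`, `missing_capabilities` and `can_register` are all hypothetical names chosen for this sketch. It assumes a refined reservation can be recognised as a resource with more than one entry in its reservation stack.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple

# Hypothetical capability identifier, used only for this sketch.
RESERVATION_REFINEMENT = "RESERVATION_REFINEMENT"


@dataclass(frozen=True)
class Resource:
    """Simplified stand-in for an agent resource (not the Mesos protobuf)."""
    name: str
    # Stack of reservation roles; more than one entry models a refinement.
    reservations: Tuple[str, ...] = ()


def required_master_capabilities(resources: Iterable[Resource]) -> FrozenSet[str]:
    """Capabilities the agent *requires* of the master, derived from the
    resources it actually holds (not merely what the agent supports)."""
    required = set()
    for resource in resources:
        if len(resource.reservations) > 1:
            required.add(RESERVATION_REFINEMENT)
    return frozenset(required)


def missing_capabilities(resources: Iterable[Resource],
                         master_capabilities: Iterable[str]) -> FrozenSet[str]:
    """Required capabilities that the master did not advertise."""
    return required_master_capabilities(resources) - frozenset(master_capabilities)


def can_register(resources: Iterable[Resource],
                 master_capabilities: Iterable[str]) -> bool:
    """The agent registers only if the master supports everything it requires."""
    return not missing_capabilities(resources, master_capabilities)


# The failover scenario from the description: a 1.3-style master advertises
# no capabilities, a 1.4-style master advertises reservation refinement.
old_master = frozenset()
new_master = frozenset({RESERVATION_REFINEMENT})

plain = [Resource("cpus", ("eng",))]
refined = [Resource("cpus", ("eng", "eng/backend"))]

print(can_register(plain, old_master))
print(can_register(refined, old_master))
print(can_register(refined, new_master))
```

Under these assumptions, an agent holding only unrefined reservations can still register with the old master, while an agent holding a refined reservation declines until a master advertising the capability is elected. The second option in the description would move the same set comparison to the master side, with the agent reporting the capabilities it is actively using.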