Megha created MESOS-6483:
----------------------------

             Summary: Potential issue with upgrading from mesos 0.28 to mesos > 1.0
                 Key: MESOS-6483
                 URL: https://issues.apache.org/jira/browse/MESOS-6483
             Project: Mesos
          Issue Type: Bug
            Reporter: Megha


When upgrading directly from Mesos 0.28 to a version > 1.0, the 
CHECK(frameworks.recovered.contains(frameworkId)) in 
Master::_markUnreachable(..) can fail. The following sequence of events 
triggers it.

1) The master is upgraded to the new version first, while an agent, say X, 
is still running Mesos 0.28.
2) Agent X (at Mesos 0.28) attempts to re-register with the master (at, say, 
1.1). It does not send the frameworks (frameworkInfos) in the ReRegisterSlave 
message, since that field was not available in the older Mesos version.
3) Among the frameworks on agent X is a framework Y that did not re-register 
after the master’s failover. Since the master builds frameworks.recovered 
from the frameworkInfos that agents provide, framework Y is in neither the 
recovered nor the registered frameworks.
4) After re-registering, agent X fails the master’s health check and is 
marked unreachable by the master. The check 
CHECK(frameworks.recovered.contains(frameworkId)) then fails for framework Y, 
since it is neither recovered nor registered but has tasks running on 
agent X.
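
The sequence above can be sketched with a small standalone model. This is not 
Mesos source code: ToyMaster, reregisterAgent and violationsOnUnreachable are 
hypothetical names made up for illustration, and the model only assumes what 
the steps describe, namely that the recovered set is filled solely from the 
frameworkInfos an agent sends. Instead of aborting the way a CHECK would, it 
returns the frameworks that would violate the invariant.

```cpp
// Toy model of the scenario described above; NOT actual Mesos code.
// All type and function names here are hypothetical.
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using FrameworkID = std::string;
using AgentID = std::string;

struct ToyMaster {
  std::set<FrameworkID> registered;  // frameworks that re-registered
  std::set<FrameworkID> recovered;   // frameworks learned from agents
  // For each agent, the frameworks that own tasks running on it.
  std::map<AgentID, std::vector<FrameworkID>> taskFrameworks;

  // Models agent re-registration. A 0.28 agent passes an empty
  // `frameworkInfos`, because that field did not exist yet.
  void reregisterAgent(const AgentID& agent,
                       const std::vector<FrameworkID>& tasks,
                       const std::vector<FrameworkID>& frameworkInfos) {
    taskFrameworks[agent] = tasks;
    for (const FrameworkID& id : frameworkInfos) {
      if (registered.count(id) == 0) {
        recovered.insert(id);
      }
    }
  }

  // Models marking an agent unreachable. Returns every framework with
  // tasks on the agent that is neither registered nor recovered, i.e.
  // the frameworks for which the CHECK in the report would fail.
  std::vector<FrameworkID> violationsOnUnreachable(
      const AgentID& agent) const {
    std::vector<FrameworkID> result;
    auto it = taskFrameworks.find(agent);
    if (it == taskFrameworks.end()) {
      return result;
    }
    for (const FrameworkID& id : it->second) {
      if (registered.count(id) == 0 && recovered.count(id) == 0) {
        result.push_back(id);
      }
    }
    return result;
  }
};
```

In this model, re-registering agent X with tasks for framework Y and an empty 
frameworkInfos list leaves Y reported as a violation, whereas passing Y in 
frameworkInfos (as a newer agent would) puts Y in the recovered set and no 
violation is reported.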



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
