I've been playing around with Marathon and Mesos recently and kept running
into a bunch of weird, inconsistent behavior with Marathon.  It turns out that
some overly strict iptables rules were blocking traffic between the Mesos
master and the ephemeral port of the Marathon framework leader (unless they
happened to be on the same box).
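
For anyone who wants to rule out the same problem, here is a minimal sketch of
a connectivity check to run from the Mesos master box.  It only tests whether a
plain TCP connection to the framework's host and port can be opened; the
function name `can_connect` and the timeout value are my own choices for
illustration, not anything from Mesos or Marathon.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    A firewall that drops packets usually shows up as a timeout, one that
    rejects them as a refused connection; both are reported as False here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (run on the master, using the address the framework registered with):
#   can_connect("10.3.0.57", 58021)
```

If this returns False from the master but True from the box the framework
leader runs on, a firewall rule between the two is a likely suspect.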

The net result was that Mesos would log a constant stream of re-registration
requests, think that they had succeeded, then disconnect the framework because
it couldn't connect to it.  Mesos would show the framework in the UI as active
and successfully registered (although the re-registered time was getting
updated continuously).  During this time, the Mesos leader's logs contained
tons of entries of the form:

Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.611101 
12534 master.cpp:1573] Re-registering framework 
20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6)  at 
[email protected]:58021
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.611127 
12534 master.cpp:1602] Framework 20141111-001826-924320522-5050-26663-0000 
(marathon-0.7.6) at 
[email protected]:58021 failed over
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.611335 
12534 hierarchical_allocator_process.hpp:375] Activated framework 
20141111-001826-924320522-5050-26663-0000
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.611882 
12534 master.cpp:3843] Sending 4 offers to framework 
20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6) at 
[email protected]:58021
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.612428 
12529 master.cpp:789] Framework 20141111-001826-924320522-5050-26663-0000 
(marathon-0.7.6) at 
[email protected]:58021 disconnected
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.612452 
12529 master.cpp:1752] Disconnecting framework 
20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6) at 
[email protected]:58021
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.612463 
12529 master.cpp:1768] Deactivating framework 
20141111-001826-924320522-5050-26663-0000 (marathon-0.7.6) at 
[email protected]:58021
Feb  4 12:57:04 dev-mesos-master1 mesos-master[12510]: I0204 12:57:04.612586 
12530 hierarchical_allocator_process.hpp:405] Deactivated framework 
20141111-001826-924320522-5050-26663-0000

Here 10.3.0.57 was the box hosting the Marathon leader.
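
To spot this loop without reading the log by eye, something like the following
rough sketch could count re-registration and disconnect events per framework
ID.  It assumes one log entry per line (the entries above are wrapped by the
mail client) and that the message wording matches the entries above; other
Mesos versions may word them differently.  The names `EVENT_RE` and
`count_flaps` are made up for this example.

```python
import re
from collections import Counter

# Matches the "Re-registering framework <id>" and "Disconnecting framework <id>"
# messages emitted from master.cpp, as seen in the log excerpt in this post.
EVENT_RE = re.compile(
    r"master\.cpp:\d+\]\s+(Re-registering|Disconnecting) framework\s+(\S+)"
)


def count_flaps(lines):
    """Return {framework_id: Counter({event_name: count})} for the given log lines."""
    counts = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            event, framework_id = match.groups()
            counts.setdefault(framework_id, Counter())[event] += 1
    return counts


# Example, assuming the master's log is available as a file at a path you supply:
#   with open(path_to_master_log) as f:
#       for framework_id, events in count_flaps(f).items():
#           print(framework_id, dict(events))
```

A framework whose re-register and disconnect counts keep climbing together is
probably stuck in the same cycle described here.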

I've posted this as an issue on Marathon's GitHub
(https://github.com/mesosphere/marathon/issues/1140), but I also wanted to post
here because it may be a Mesos issue as well: Mesos doesn't seem to handle the
case where it cannot successfully connect to a framework.  (Obviously, Mesos
handling this better wouldn't fix the issues that crop up in Marathon, but it'd
be nice if Mesos gave some indication that it's not actually able to
communicate with a framework.)

