(I'm running Mesos 0.16.0 and Marathon 0.4.0.)

Every day or two, the Mesos slaves lose touch with the master and disconnect,
which causes every service running on those slaves to be redeployed and
restarted. The only relevant entries I see in the slave logs at these times
look like this:

W0409 12:32:27.347270 22523 group.cpp:435] Timed out waiting to reconnect to ZooKeeper (sessionId=1446fc9b27d00b7)
F0409 12:32:42.366143 22523 zookeeper.cpp:195] Failed to create ZooKeeper, zookeeper_init: No such file or directory [2]
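For anyone reading these two entries closely, here is a minimal sketch that splits them into fields, assuming the usual glog prefix layout (severity letter, MMDD, time with microseconds, thread id, source file and line). The helper name parse_glog is hypothetical and not part of Mesos. It shows that the first entry is a warning and the second is fatal, that they are about 15 seconds apart, and that the trailing [2] matches errno 2 (ENOENT), which is the code behind "No such file or directory".

```python
import errno
import re
from datetime import datetime

# Assumed glog prefix: <severity><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>
GLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<tid>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)
SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}


def parse_glog(line):
    """Hypothetical helper: split one glog-style line into its fields."""
    m = GLOG_RE.match(line)
    if m is None:
        return None
    return {
        "severity": SEVERITY[m.group("sev")],
        "time": datetime.strptime(m.group("time"), "%H:%M:%S.%f"),
        "source": m.group("file") + ":" + m.group("line"),
        "message": m.group("msg"),
    }


# The two entries from the slave log, with the mail line-wrapping undone.
lines = [
    "W0409 12:32:27.347270 22523 group.cpp:435] Timed out waiting to "
    "reconnect to ZooKeeper (sessionId=1446fc9b27d00b7)",
    "F0409 12:32:42.366143 22523 zookeeper.cpp:195] Failed to create "
    "ZooKeeper, zookeeper_init: No such file or directory [2]",
]

records = [parse_glog(line) for line in lines]
gap = (records[1]["time"] - records[0]["time"]).total_seconds()

for r in records:
    print(r["severity"], r["source"], r["message"])
print("seconds between the two entries:", round(gap, 3))
print("errno 2 is", errno.errorcode[2])
```

The fatal severity on the second entry is worth noting on its own: glog aborts the process after logging a FATAL message, so the slave exits at that point instead of retrying.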

I'm not sure where to begin troubleshooting this. I will be upgrading to Mesos
0.17.0 and Marathon 0.4.1, in case that matters.

Any pointers would be appreciated!

;ted

__________________________________________________________
Ted M. Young
Guidewire Software - DevOps
Tel: +1 650 357 5291
[email protected] | www.guidewire.com
1001 E. Hillsdale Blvd, Suite 800, Foster City, CA 94404

