Anyone have any suggestions? I'm still seeing these problems and it's causing 
our slaves to constantly re-register themselves, which then causes the apps to 
move around a lot.

;ted

From: Ted Young
Sent: Wednesday, April 09, 2014 4:25 PM
To: [email protected]
Subject: RE: Mesos slaves disconnecting because of Zookeeper?

Hi Tom,

There's only one hostname right now and it's a static entry in the DNS, so 
unless there's some DNS weirdness going on (anything's possible), it's always 
resolving properly. Also, I'm only getting the error once every day or three, 
so it could be something going on somewhere on the network, but I'm not sure 
where to look next.
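
One way to narrow down an error that only shows up every day or three is to resolve the ZooKeeper hostname on a schedule and record when it fails, then compare those timestamps with the slave log. The following is only a sketch under assumptions: `watch_resolution` and its parameters are hypothetical names, and the hostname in the usage note is a placeholder.

```python
import socket
import time
from datetime import datetime


def watch_resolution(host, attempts, interval_s,
                     resolver=socket.gethostbyname, sleep=time.sleep):
    """Resolve `host` repeatedly; return (timestamp, error) for each failure."""
    failures = []
    for i in range(attempts):
        try:
            resolver(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            stamp = datetime.now().isoformat(timespec="seconds")
            failures.append((stamp, str(exc)))
        # Pause between attempts, but not after the last one.
        if i < attempts - 1:
            sleep(interval_s)
    return failures
```

For example, `watch_resolution("zk.example.com", 1440, 60)` would try once a minute for about a day. An empty result for a window in which a slave disconnected would suggest looking beyond DNS.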

Thanks,
;ted


From: Thomas Petr [mailto:[email protected]]
Sent: Wednesday, April 09, 2014 4:19 PM
To: [email protected]
Subject: Re: Mesos slaves disconnecting because of Zookeeper?

Hey Ted,

Could you check your ZooKeeper connection string and ensure that all the 
hostnames resolve correctly? When I've hit that error in the past it was due to 
ZooKeeper failing to resolve a hostname (in my case, for an EC2 instance that 
had been deleted).
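
A minimal sketch of that check, assuming a connection string of the form `zk://host1:port1,host2:port2/path` (the helper names `parse_zk_hosts` and `check_resolution` are made up for illustration, and IPv6 literals are not handled):

```python
import socket


def parse_zk_hosts(conn):
    """Split a ZooKeeper connection string into (host, port) pairs."""
    # Drop an optional zk:// scheme and an optional /path suffix.
    if conn.startswith("zk://"):
        conn = conn[len("zk://"):]
    hosts_part = conn.split("/", 1)[0]
    # Drop optional user:password@ credentials.
    hosts_part = hosts_part.rsplit("@", 1)[-1]
    pairs = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        # 2181 is the standard ZooKeeper client port.
        pairs.append((host, int(port) if port else 2181))
    return pairs


def check_resolution(conn, resolver=socket.getaddrinfo):
    """Return {host: None if it resolved, else the error text}."""
    results = {}
    for host, port in parse_zk_hosts(conn):
        try:
            resolver(host, port)
            results[host] = None
        except socket.gaierror as exc:
            results[host] = str(exc)
    return results
```

Calling `check_resolution` with the same string the slaves are started with, on one of the affected slaves, shows which entries (if any) fail to resolve from that machine.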

Thanks,
Tom

On Wed, Apr 9, 2014 at 7:09 PM, Ted Young <[email protected]> wrote:
(I'm running mesos 0.16.0 and marathon 0.4.0)

Every day or two, I'm seeing the mesos slaves lose touch with the master and 
disconnect (causing all of the services running on all of the slaves to be 
redeployed and restarted). The only thing I'm seeing in the logs at these times 
(on the slaves) is something like:

W0409 12:32:27.347270 22523 group.cpp:435] Timed out waiting to reconnect to ZooKeeper (sessionId=1446fc9b27d00b7)
F0409 12:32:42.366143 22523 zookeeper.cpp:195] Failed to create ZooKeeper, zookeeper_init: No such file or directory [2]

I'm not sure where to begin troubleshooting this. I will be upgrading to mesos 
0.17.0 and marathon 0.4.1 in case that matters.

Any pointers would be appreciated!

;ted

__________________________________________________________
Ted M. Young
Guidewire Software - DevOps
Tel: +1 650 357 5291
[email protected] | www.guidewire.com
1001 E. Hillsdale Blvd, Suite 800, Foster City, CA 94404
Deliver insurance your way with flexible software products from Guidewire.


