Mesos 0.17.0 had a major refactor around its interaction with ZooKeeper, so I would definitely recommend giving it a try to see whether the problem persists.
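
In the meantime, one way to rule out the hostname-resolution cause Tom describes further down the thread is to resolve every host in the zk connection string from a slave, ideally on a schedule so an intermittent failure gets caught. A minimal sketch, assuming a connection string of the usual `zk://host1:port1,host2:port2/path` form. The function names and the example hostname are made up for illustration, and it does not handle bracketed IPv6 literals.

```python
import socket


def parse_zk_hosts(conn):
    """Split a zk:// connection string into (host, port) pairs."""
    # Strip the scheme and the trailing znode path, e.g. "/mesos".
    if conn.startswith("zk://"):
        conn = conn[len("zk://"):]
    hosts_part = conn.split("/", 1)[0]
    # Drop optional "user:pass@" credentials.
    hosts_part = hosts_part.rsplit("@", 1)[-1]
    pairs = []
    for entry in hosts_part.split(","):
        host, _, port = entry.partition(":")
        # 2181 is ZooKeeper's default client port.
        pairs.append((host, int(port) if port else 2181))
    return pairs


def unresolved_hosts(conn):
    """Return (host, error) pairs for hosts that fail name resolution."""
    failures = []
    for host, port in parse_zk_hosts(conn):
        try:
            socket.getaddrinfo(host, port)
        except socket.gaierror as err:
            failures.append((host, str(err)))
    return failures


# Example call (hypothetical hostname):
# print(unresolved_hosts("zk://zk1.example.com:2181/mesos"))
```

Running this from cron on a slave and logging any non-empty result with a timestamp would show whether resolution failures line up with the `zookeeper_init` errors.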

On Tue, Apr 15, 2014 at 11:59 AM, Ted Young <[email protected]> wrote:

> Anyone have any suggestions? I'm still seeing these problems and it's
> causing our slaves to constantly re-register themselves, which then causes
> the apps to move around a lot.
>
> ;ted
>
> *From:* Ted Young
> *Sent:* Wednesday, April 09, 2014 4:25 PM
> *To:* [email protected]
> *Subject:* RE: Mesos slaves disconnecting because of Zookeeper?
>
> Hi Tom,
>
> There's only one hostname right now and it's a static entry in the DNS, so
> unless there's some DNS weirdness going on (anything's possible), it's
> always resolving properly. Also, I'm only getting the error once every day
> or three, so it could be something going on somewhere on the network, but
> I'm not sure where to look next.
>
> Thanks,
> ;ted
>
> *From:* Thomas Petr [mailto:[email protected] <[email protected]>]
> *Sent:* Wednesday, April 09, 2014 4:19 PM
> *To:* [email protected]
> *Subject:* Re: Mesos slaves disconnecting because of Zookeeper?
>
> Hey Ted,
>
> Could you check your zk connection string and ensure that all the
> hostnames resolve correctly? When I've hit that error in the past it was
> due to zookeeper failing to resolve a hostname (in my case, for a EC2
> instance that was deleted).
>
> Thanks,
> Tom
>
> On Wed, Apr 9, 2014 at 7:09 PM, Ted Young <[email protected]> wrote:
>
> (I'm running mesos 0.16.0 and marathon 0.4.0)
>
> Every day or two, I'm seeing the mesos slaves lose touch with the master
> and disconnect (causing all of the services running on all of the slaves to
> be redeployed and restarted). The only thing I'm seeing in the logs at
> these times (on the slaves) is something like:
>
> W0409 12:32:27.347270 22523 group.cpp:435] Timed out waiting to reconnect
> to ZooKeeper (sessionId=1446fc9b27d00b7)
>
> F0409 12:32:42.366143 22523 zookeeper.cpp:195] Failed to create ZooKeeper,
> zookeeper_init: No such file or directory [2]
>
> I'm not sure where to begin troubleshooting this. I will be upgrading to
> mesos 0.17.0 and marathon 0.4.1 in case that matters.
>
> Any pointers would be appreciated!
>
> ;ted
>
> __________________________________________________________
>
> *Ted M. Young*
> Guidewire Software - DevOps
>
> Tel: +1 650 357 5291
> [email protected] <[email protected]> | www.guidewire.com
>
> 1001 E. Hillsdale Blvd, Suite 800, Foster City, CA 94404
>
> Deliver insurance your way with flexible software products from Guidewire.

