Mesos 0.17.0 had a major refactor around its interaction with ZooKeeper, so I
would definitely recommend giving it a try and seeing if the problem persists.
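
Since the error only shows up every day or three, it may also help to run
Tom's hostname check (quoted below) repeatedly rather than once. Here is a
rough sketch in Python of what that check could look like. The function
names and the example hostnames are made up for illustration, it assumes
the usual zk://host:port,host:port/path form of the connection string, and
it does not handle IPv6 literals.

```python
import socket


def parse_zk_hosts(conn):
    """Split a ZooKeeper connection string into (host, port) pairs.

    Hypothetical helper: assumes 'zk://[user:pass@]h1:p1,h2:p2/path'
    or a bare 'h1:p1,h2:p2' list. IPv6 literals are not handled.
    """
    if conn.startswith("zk://"):
        conn = conn[len("zk://"):]
    hosts_part = conn.split("/", 1)[0]          # drop the znode path
    hosts_part = hosts_part.rpartition("@")[2]  # drop optional credentials
    pairs = []
    for entry in hosts_part.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        if not sep:
            # No port given: fall back to ZooKeeper's default client port.
            host, port = entry, "2181"
        pairs.append((host, int(port)))
    return pairs


def unresolved_hosts(conn, resolve=socket.getaddrinfo):
    """Return (host, error) for every host that fails name resolution."""
    failures = []
    for host, port in parse_zk_hosts(conn):
        try:
            resolve(host, port)
        except socket.gaierror as exc:
            failures.append((host, str(exc)))
    return failures


if __name__ == "__main__":
    # Placeholder connection string; substitute the one the slaves use.
    for host, err in unresolved_hosts("zk://zk1.example.com:2181/mesos"):
        print("cannot resolve %s: %s" % (host, err))
```

Running something like this from cron on one of the slaves and logging any
failures with a timestamp would at least show whether the disconnects line
up with DNS lookups failing.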


On Tue, Apr 15, 2014 at 11:59 AM, Ted Young <[email protected]> wrote:

>  Anyone have any suggestions? I'm still seeing these problems and it's
> causing our slaves to constantly re-register themselves, which then causes
> the apps to move around a lot.
>
>
>
> ;ted
>
>
>
> *From:* Ted Young
> *Sent:* Wednesday, April 09, 2014 4:25 PM
> *To:* [email protected]
> *Subject:* RE: Mesos slaves disconnecting because of Zookeeper?
>
>
>
> Hi Tom,
>
>
>
> There's only one hostname right now and it's a static entry in the DNS, so
> unless there's some DNS weirdness going on (anything's possible), it's
> always resolving properly. Also, I'm only getting the error once every day
> or three, so it could be something going on somewhere on the network, but
> I'm not sure where to look next.
>
>
>
> Thanks,
>
> ;ted
>
>
>
>
>
> *From:* Thomas Petr [mailto:[email protected] <[email protected]>]
> *Sent:* Wednesday, April 09, 2014 4:19 PM
> *To:* [email protected]
> *Subject:* Re: Mesos slaves disconnecting because of Zookeeper?
>
>
>
> Hey Ted,
>
>
>
> Could you check your zk connection string and ensure that all the
> hostnames resolve correctly? When I've hit that error in the past it was
> due to zookeeper failing to resolve a hostname (in my case, for an EC2
> instance that was deleted).
>
>
>
> Thanks,
>
> Tom
>
>
>
> On Wed, Apr 9, 2014 at 7:09 PM, Ted Young <[email protected]> wrote:
>
>  (I'm running mesos 0.16.0 and marathon 0.4.0)
>
>
>
> Every day or two, I'm seeing the mesos slaves lose touch with the master
> and disconnect (causing all of the services running on all of the slaves to
> be redeployed and restarted). The only thing I'm seeing in the logs at
> these times (on the slaves) is something like:
>
>
>
> W0409 12:32:27.347270 22523 group.cpp:435] Timed out waiting to reconnect
> to ZooKeeper (sessionId=1446fc9b27d00b7)
>
> F0409 12:32:42.366143 22523 zookeeper.cpp:195] Failed to create ZooKeeper,
> zookeeper_init: No such file or directory [2]
>
>
>
> I'm not sure where to begin troubleshooting this. I will be upgrading to
> mesos 0.17.0 and marathon 0.4.1 in case that matters.
>
>
>
> Any pointers would be appreciated!
>
>
>
> ;ted
>
>
>
> __________________________________________________________
>
> *Ted M. Young*
> Guidewire Software - DevOps
>
> Tel: +1 650 357 5291
> [email protected] <[email protected]> | www.guidewire.com
>
> 1001 E. Hillsdale Blvd, Suite 800, Foster City, CA 94404
>
> Deliver insurance your way with flexible software products from Guidewire.
>
>
>
>
>
>
>
