The slave logs clearly say that it's unable to connect to ZooKeeper. Something is definitely wrong with either the zk quorum or the slave's zk configuration.
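One way to test whether a slave can actually reach the ensemble is to check each ZooKeeper server it is configured to use. The minimal sketch below takes the same `zk://host1:port1,host2:port2/path` value passed to the slave's `--master` flag and sends ZooKeeper's `ruok` four-letter command to each server. A healthy server answers `imok`. The helper names `parse_zk_url` and `zk_ruok` are hypothetical. The sketch assumes plain TCP access to the client port (2181 by default) and, on ZooKeeper 3.5.3 or later, that `ruok` is allowed through `4lw.commands.whitelist`.

```python
import socket
import sys
from typing import List, Tuple
from urllib.parse import urlsplit

DEFAULT_ZK_PORT = 2181  # ZooKeeper's default client port


def parse_zk_url(url: str) -> Tuple[List[Tuple[str, int]], str]:
    """Split a zk://[user:pass@]host1:port1,host2:port2/path URL into (servers, path)."""
    parts = urlsplit(url)
    if parts.scheme != "zk":
        raise ValueError(f"expected a zk:// URL, got {url!r}")
    # Drop optional credentials in front of the host list.
    hosts = parts.netloc.rpartition("@")[2]
    servers = []
    for entry in hosts.split(","):
        host, sep, port = entry.rpartition(":")
        if not sep:  # no explicit port: fall back to the default client port
            host, port = entry, str(DEFAULT_ZK_PORT)
        servers.append((host, int(port)))
    return servers, parts.path


def zk_ruok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Send the 'ruok' four-letter command; a healthy ZooKeeper server replies 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = sock.recv(16)
    except OSError:  # connection refused, unreachable host, DNS failure or timeout
        return False
    return reply == b"imok"


if __name__ == "__main__" and len(sys.argv) > 1:
    servers, path = parse_zk_url(sys.argv[1])
    for host, port in servers:
        status = "imok" if zk_ruok(host, port) else "NOT OK (unreachable or no imok)"
        print(f"{host}:{port} -> {status}")
    print(f"Mesos znode path: {path}")
```

Run it on a slave host with the same value as that slave's `--master` flag. For example: `python zk_check.py zk://<zk1>:2181,<zk2>:2181,<zk3>:2181/mesos`. The script name and hosts are placeholders. A server that never answers `imok` would be consistent with the `Timed out waiting to connect to ZooKeeper` warnings. A host list that still names the replaced master would fit the stale-configuration explanation.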
On Fri, May 20, 2016 at 11:42 AM, <[email protected]> wrote:

> Hi,
>
> Here are the logs:
>
> Master logs:
>
> 0520 05:20:48.026689 1905 master.cpp:1457] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
> F0520 05:23:05.026454 2097 log.cpp:396] Failed to participate in ZooKeeper group: Failed to create ephemeral node at '/mesos/log_replicas' in ZooKeeper: no node
> W0520 05:23:05.256734 2115 authenticator.cpp:511] No credentials provided, authentication requests will be refused
>
> I0520 05:25:22.299355 2192 detector.cpp:479] A new leading master (UPID=master@) is detected
> I0520 05:25:22.299422 2192 master.cpp:1710] The newly elected leader is master@:5050 with id ddf2064c-9887-4f64-875d-2ed3e318310b
> I0520 05:25:22.300588 2193 network.hpp:413] ZooKeeper group memberships changed
> I0520 05:25:22.300668 2193 group.cpp:700] Trying to get '/mesos/log_replicas/0000000038' in ZooKeeper
> I0520 05:25:22.307706 2193 group.cpp:700] Trying to get '/mesos/log_replicas/0000000039' in ZooKeeper
> I0520 05:25:22.308213 2193 group.cpp:700] Trying to get '/mesos/log_replicas/0000000040' in ZooKeeper
> I0520 05:25:22.309149 2193 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@:5050, log-replica(1)@:5050 }
> I0520 05:25:30.009531 2193 detector.cpp:154] Detected a new leader: (id='40')
> I0520 05:25:30.009729 2193 group.cpp:700] Trying to get '/mesos/json.info_0000000040' in ZooKeeper
> I0520 05:25:30.010063 2190 network.hpp:413] ZooKeeper group memberships changed
> I0520 05:25:30.012197 2190 group.cpp:700] Trying to get '/mesos/log_replicas/0000000039' in ZooKeeper
> I0520 05:25:30.012583 2193 detector.cpp:479] A new leading master (UPID=master@:5050) is detected
> I0520 05:25:30.012662 2193 master.cpp:1710] The newly elected leader is master@with id 1ef713ee-313d-440c-b76e-2772cb6056c7
> I0520 05:25:30.012990 2190 group.cpp:700] Trying to get '/mesos/log_replicas/0000000040' in ZooKeeper
> I0520 05:25:30.013538 2190 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@:5050, log-replica(1)@:5050 }
>
> Slave logs:
>
> E0519 06:33:03.076380 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
> W0519 10:33:20.348443 1412 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:33:30.354156 1413 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:33:40.358856 1412 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> E0520 05:28:46.157860 1622 process.cpp:1966] Failed to shutdown socket with fd 10: Transport endpoint is not connected
> E0520 05:33:18.412081 1622 process.cpp:1966] Failed to shutdown socket with fd 10: Transport endpoint is not connected
> W0520 05:33:18.108160 1616 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected
> W0520 05:34:26.118638 1620 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected
>
> *From:* Pradeep Chhetri [mailto:[email protected]]
> *Sent:* 20 May 2016 08:58
> *To:* [email protected]
> *Subject:* Re: Mesos Slave not registering or getting activated
>
> I am assuming that you have now pointed the mesos slaves at this new zk quorum. Can you please post the mesos slave logs from when you restart it?
>
> On Fri, May 20, 2016 at 8:41 AM, <[email protected]> wrote:
>
> I'm running a zk quorum.
>
> And now I have recreated all the master nodes and tried again, but the slaves still aren't getting registered.
>
> *From:* Pradeep Chhetri [mailto:[email protected]]
> *Sent:* 19 May 2016 23:40
> *To:* [email protected]
> *Subject:* Re: Mesos Slave not registering or getting activated
>
> Are you running a zk quorum or just a standalone instance?
>
> My guess is that you were running a single zk node and your slaves are still pointing to the ZooKeeper instance on the Mesos master that you replaced.
>
> On Thu, May 19, 2016 at 4:53 PM, <[email protected]> wrote:
>
> Master logs:
>
> F0519 10:47:39.203780 28689 master.cpp:1457] Recovery failed: Failed to recover registrar: Failed to perform fetch within 1mins
>
> W0519 10:47:39.370132 28719 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected
> W0519 10:48:47.387729 28722 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected
>
> I0519 10:48:55.191784 28723 detector.cpp:154] Detected a new leader: (id='183')
> I0519 10:48:55.191987 28723 group.cpp:700] Trying to get '/mesos/json.info_0000000183' in ZooKeeper
> I0519 10:48:55.193650 28723 detector.cpp:479] A new leading master (UPID=master@) is detected
> I0519 10:48:55.193745 28723 slave.cpp:795] New master detected at master@
> I0519 10:48:55.193918 28723 slave.cpp:820] No credentials provided. Attempting to register without authentication
> I0519 10:48:55.193949 28723 slave.cpp:831] Detecting new master
> I0519 10:48:55.194022 28723 status_update_manager.cpp:174] Pausing sending status updates
> I0519 10:49:29.887965 28725 slave.cpp:4304] Current disk usage 9.13%. Max allowed age: 5.661099880651967days
> I0519 10:49:55.378037 28725 slave.cpp:3481] master@exited
> W0519 10:49:55.378182 28725 slave.cpp:3484] Master disconnected! Waiting for a new master to be elected
> I0519 10:50:03.213912 28723 detector.cpp:154] Detected a new leader: (id='184')
> I0519 10:50:03.214061 28723 group.cpp:700] Trying to get '/mesos/json.info_0000000184' in ZooKeeper
> I0519 10:50:03.215296 28723 detector.cpp:479] A new leading master (UPID=master@) is detected
> I0519 10:50:03.215395 28723 slave.cpp:795] New master detected at master@
> I0519 10:50:03.215601 28723 slave.cpp:820] No credentials provided. Attempting to register without authentication
> I0519 10:50:03.215631 28723 slave.cpp:831] Detecting new master
> I0519 10:50:03.215670 28723 status_update_manager.cpp:174] Pausing sending status updates
> I0519 10:50:29.893625 28720 slave.cpp:4304] Current disk usage 9.13%. Max allowed
>
> *From:* Abhishek Amralkar [mailto:[email protected]]
> *Sent:* 19 May 2016 16:23
> *To:* [email protected]
> *Subject:* Re: Mesos Slave not registering or getting activated
>
> How about the master logs?
>
> On 19-May-2016, at 4:20 PM, [email protected] wrote:
>
> *From:* Kumari, Suruchi
> *Sent:* 19 May 2016 16:14
> *To:* '[email protected]' <[email protected]>
> *Subject:* RE: Mesos Slave not registering or getting activated
>
> Hi,
>
> These are the slave logs:
>
> E0519 05:31:52.802345 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
> E0519 05:31:53.122215 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
> E0519 05:31:54.422402 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
> E0519 05:31:54.546566 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
> E0519 05:43:34.321432 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
> E0519 06:00:44.652227 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
> E0519 06:33:03.076380 1416 process.cpp:1958] Failed to shutdown socket with fd 10: Transport endpoint is not connected
>
> W0519 10:27:40.170183 1413 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:27:50.175251 1415 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:28:00.191156 1411 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:28:10.207135 1412 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:28:20.214915 1412 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:28:30.222079 1415 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:28:40.227010 1414 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:28:50.230976 1418 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:29:00.234498 1417 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0519 10:29:10.238258 1416 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> *From:* Abhishek Amralkar [mailto:[email protected] <[email protected]>]
> *Sent:* 19 May 2016 15:40
> *To:* [email protected]
> *Subject:* Re: Mesos Slave not registering or getting activated
>
> What are the logs saying? Any errors?
>
> -Abhishek
>
> On 19-May-2016, at 3:35 PM, [email protected] wrote:
>
> Hi,
>
> Previously I had a setup of 3 mesos-masters and 2 slave nodes, but one of the master nodes stopped working, so I replaced it with a new mesos-master. Now the slaves are not registering themselves.
>
> Why is this happening, and is there any solution?
>
> Thanks
>
> --
> Regards,
> Pradeep Chhetri
>
> --
> Regards,
> Pradeep Chhetri

--
Regards,
Pradeep Chhetri
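Pradeep's question about a zk quorum versus a standalone instance comes down to majority arithmetic. A ZooKeeper ensemble of n servers keeps serving only while a strict majority, n // 2 + 1, is reachable. The Mesos masters' `--quorum` flag for the replicated-log registry is likewise meant to be set to a majority of the masters. The minimal sketch below makes this concrete for the setups described in the thread. The helper names `majority` and `has_quorum` are hypothetical.

```python
def majority(n: int) -> int:
    """Smallest strict majority of an ensemble of n members."""
    if n < 1:
        raise ValueError("ensemble size must be at least 1")
    return n // 2 + 1


def has_quorum(ensemble_size: int, healthy: int) -> bool:
    """True if `healthy` reachable members can still form a majority."""
    return healthy >= majority(ensemble_size)


if __name__ == "__main__":
    # Setups discussed in the thread: a 3-node zk quorum versus a standalone
    # zk instance that lived on the master that was replaced.
    scenarios = [
        ("3-node ensemble, all up", 3, 3),
        ("3-node ensemble, one node lost", 3, 2),
        ("standalone zk on the replaced master", 1, 0),
    ]
    for label, size, up in scenarios:
        print(f"{label}: need {majority(size)}, have {up}, quorum={has_quorum(size, up)}")
```

With three ZooKeeper servers, losing the one co-located with the failed master still leaves 2 of 3, which is a quorum. With a single standalone ZooKeeper on that master, nothing is left. That outcome would be consistent with the repeated `Timed out waiting to connect to ZooKeeper` warnings on the slaves.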

