The masters are losing their ZooKeeper connection too, which is forcing an election:

I0412 11:01:48.887229 3677 group.cpp:460] Lost connection to ZooKeeper, attempting to reconnect ...
I0412 11:01:48.919545 3675 group.cpp:519] ZooKeeper session expired
I0412 11:01:48.919848 3680 detector.cpp:154] Detected a new leader: None
I0412 11:01:48.919922 3680 master.cpp:1710] The newly elected leader is None

I'd guess you need to tune your ZooKeeper cluster; something is not right there.

On 13 April 2016 at 06:09, <aishwarya.adyanth...@accenture.com> wrote:
> Hi,
>
> I configured the zookeeper file on the slave machine by adding the master
> details, and now the slave is getting registered.
>
> But I don't know why the three masters keep fluctuating among themselves to
> be the leader when I try accessing the master IP in the GUI.
>
> Thank you.
>
> From: haosdent [mailto:haosd...@gmail.com]
> Sent: 13 April 2016 09:25
> To: user <user@mesos.apache.org>
> Cc: Kumari, Suruchi <suruchi.kum...@accenture.com>
> Subject: Re: Slaves not getting registered
>
>> I0412 11:01:50.586612 3732 recover.cpp:578] Successfully joined the Paxos group
>
> According to this, master 1 should connect to zk successfully.
>
>> root@slave1:/var/log/mesos# tail -f mesos-slave.slave1.invalid-user.log.INFO.20160412-110554.1696
>> I0413 03:12:54.532676 1711 group.cpp:519] ZooKeeper session expired
>> I0413 03:12:58.757953 1715 slave.cpp:4304] Current disk usage 6.44%. Max allowed age: 5.848917453828577days
>> W0413 03:13:04.539577 1715 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> How about checking whether you can connect to zk on slave1?
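One way to run that connectivity check from slave1 (or from any master) without nc is a short script that sends the same "stat" four-letter command used elsewhere in this thread. This is only a sketch: the function names are made up for illustration, port 2181 and the 30.30.30.x addresses are the ones quoted in this thread, and newer ZooKeeper releases may restrict four-letter commands through a whitelist setting, in which case the reply will not contain a Mode line.

```python
import socket


def zk_four_letter(host, port=2181, cmd="stat", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # ZooKeeper closes the connection after answering, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def zk_mode(stat_output):
    """Return the value of the 'Mode:' line in 'stat' output, or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def probe(host, port=2181):
    """Return the server mode, or a short 'unreachable' note on network errors."""
    try:
        return zk_mode(zk_four_letter(host, port))
    except OSError as exc:  # refused, timed out, no route, ...
        return f"unreachable ({exc})"
```

Calling probe("30.30.30.52"), probe("30.30.30.53") and probe("30.30.30.54") on slave1 should show whether each ZooKeeper server answers from that machine. A refused connection or timeout there, while the same call works on master1, would point at the network or firewall between the slave and ZooKeeper rather than at ZooKeeper itself.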
>
> On Wed, Apr 13, 2016 at 11:49 AM, <aishwarya.adyanth...@accenture.com> wrote:
>
> I checked the zookeeper status by running the command:
>
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.52 2181 | grep Mode
> Mode: follower
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.53 2181 | grep Mode
> Mode: leader
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.54 2181 | grep Mode
> Mode: follower
>
> And it seems like it’s working fine. Is there another way to check the
> health status?
>
> From: Abhishek Amralkar [mailto:abhishek.amral...@talentica.com]
> Sent: 13 April 2016 09:10
> To: user@mesos.apache.org
> Subject: Re: Slaves not getting registered
>
> Have you checked if your ZooKeeper cluster is healthy? accessible from Mesos
> Masters?
>
> W0413 03:12:24.512336 1715 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:34.519641 1710 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:44.521181 1713 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:54.532501 1711 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> It seems Mesos masters are not able to communicate to Zookeeper.
>
> -Abhishek
>
> On 13-Apr-2016, at 9:06 AM, aishwarya.adyanth...@accenture.com wrote:
>
> Hi,
>
> I have been following the document from the digitalocean (mesos-doc-link)
> where I have set 3 masters and one slave.
> Below are the log details:
>
> root@master1:/var/log/mesos# tail -f mesos-master.INFO
> I0412 11:01:50.579818 3736 recover.cpp:193] Received a recover response from a replica in VOTING status
> I0412 11:01:50.579903 3736 recover.cpp:564] Updating replica status to RECOVERING
> I0412 11:01:50.583102 3736 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 3.154399ms
> I0412 11:01:50.583137 3736 replica.cpp:320] Persisted replica status to RECOVERING
> I0412 11:01:50.583176 3736 recover.cpp:543] Starting catch-up from position 1 to 2
> I0412 11:01:50.583732 3736 recover.cpp:564] Updating replica status to VOTING
> I0412 11:01:50.586318 3736 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 2.540703ms
> I0412 11:01:50.586484 3736 replica.cpp:320] Persisted replica status to VOTING
> I0412 11:01:50.586612 3732 recover.cpp:578] Successfully joined the Paxos group
> I0412 11:01:50.586745 3731 recover.cpp:462] Recover process terminated
>
> root@master1:/var/log/mesos# tail -f mesos-master.WARNING
> Log file created at: 2016/04/12 11:01:49
> Running on machine: master1
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> W0412 11:01:49.024226 3712 authenticator.cpp:511] No credentials provided, authentication requests will be refused
>
> root@master1:/var/log/mesos# tail -f mesos-master.master1.invalid-user.log.INFO.20160412-11014
> tail: cannot open ‘mesos-master.master1.invalid-user.log.INFO.20160412-11014’ for reading: No such file or directory
> root@master1:/var/log/mesos# tail -f mesos-master.master1.invalid-user.log.INFO.20160412-11014
> mesos-master.master1.invalid-user.log.INFO.20160412-110143.3651  mesos-master.master1.invalid-user.log.INFO.20160412-110148.3712
> root@master1:/var/log/mesos# tail -f mesos-master.master1.invalid-user.log.INFO.20160412-110143.3651
> I0412 11:01:46.424433 3676 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (5)@30.30.30.53:5050
> I0412 11:01:47.068586 3675 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (8)@30.30.30.53:5050
> I0412 11:01:47.592926 3677 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (11)@30.30.30.53:5050
> I0412 11:01:48.188248 3680 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (14)@30.30.30.53:5050
> I0412 11:01:48.887104 3678 group.cpp:460] Lost connection to ZooKeeper, attempting to reconnect ...
> I0412 11:01:48.887177 3674 group.cpp:460] Lost connection to ZooKeeper, attempting to reconnect ...
> I0412 11:01:48.887229 3677 group.cpp:460] Lost connection to ZooKeeper, attempting to reconnect ...
> I0412 11:01:48.919545 3675 group.cpp:519] ZooKeeper session expired
> I0412 11:01:48.919848 3680 detector.cpp:154] Detected a new leader: None
> I0412 11:01:48.919922 3680 master.cpp:1710] The newly elected leader is None
>
> root@slave1:/var/log/mesos# tail -f mesos-slave.slave1.invalid-user.log.INFO.20160412-110554.1696
> I0413 03:12:54.532676 1711 group.cpp:519] ZooKeeper session expired
> I0413 03:12:58.757953 1715 slave.cpp:4304] Current disk usage 6.44%. Max allowed age: 5.848917453828577days
> W0413 03:13:04.539577 1715 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> I0413 03:13:04.539798 1715 group.cpp:519] ZooKeeper session expired
> W0413 03:13:14.542245 1713 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> I0413 03:13:14.542434 1713 group.cpp:519] ZooKeeper session expired
>
> root@slave1:/var/log/mesos# tail -f mesos-slave.WARNING
> W0413 03:12:24.512336 1715 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:34.519641 1710 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:44.521181 1713 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:54.532501 1711 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> Thank you.
>
> From: June Taylor [mailto:j...@umn.edu]
> Sent: 12 April 2016 18:06
> To: user@mesos.apache.org
> Subject: Re: Slaves not getting registered
>
> Try looking in /var/log/mesos/ at these files: mesos-slave.WARNING,
> mesos-slave.INFO, mesos-slave.ERROR
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Tue, Apr 12, 2016 at 4:36 AM, Dick Davies <d...@hellooperator.net> wrote:
>
> There's no mention of a slave there; have a look at the logs on the
> slave's filesystem and see if it is giving any errors.
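When going through the files in /var/log/mesos/, it can help to pull out only the ZooKeeper-related entries. The sketch below parses lines using the format stated in the mesos-master.WARNING header quoted in this thread ([IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg) and keeps the ones logged from group.cpp that mention ZooKeeper. The names GLOG_LINE, LogEntry, parse_line and zookeeper_problems are hypothetical, and filtering on group.cpp is an assumption based only on the log lines shown here.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Pattern for: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    level: str   # I, W, E or F
    month: int
    day: int
    time: str    # hh:mm:ss.uuuuuu, kept as text
    thread: int
    file: str
    line: int
    msg: str


def parse_line(text: str) -> Optional[LogEntry]:
    """Parse one log line; return None for headers and other non-matching text."""
    m = GLOG_LINE.match(text.strip())
    if m is None:
        return None
    return LogEntry(
        m["level"], int(m["month"]), int(m["day"]), m["time"],
        int(m["thread"]), m["file"], int(m["line"]), m["msg"],
    )


def zookeeper_problems(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Yield entries from group.cpp whose message mentions ZooKeeper."""
    for raw in lines:
        entry = parse_line(raw)
        if entry and entry.file == "group.cpp" and "ZooKeeper" in entry.msg:
            yield entry
```

Passing an open log file to zookeeper_problems, for example one of the mesos-slave INFO files, yields the connection-loss, timeout and session-expiry entries in order, which makes it easier to see how often the sessions are dropping.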
>
> On 12 April 2016 at 10:17, <aishwarya.adyanth...@accenture.com> wrote:
>> The GUI log shows like this:
>>
>> I0412 08:45:51.379609 3616 master.cpp:3673] Processing DECLINE call for offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O282 ] for framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>> I0412 08:45:54.637461 3612 http.cpp:501] HTTP GET for /master/state.json from 10.211.203.147:59463 with User-Agent='Mozilla/5.0 (Windows NT 6.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
>> I0412 08:45:57.376288 3619 master.cpp:5350] Sending 1 offers to framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>> I0412 08:45:57.385325 3613 master.cpp:3673] Processing DECLINE call for offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O283 ] for framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>> I0412 08:46:03.383728 3614 master.cpp:5350] Sending 1 offers to framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>> I0412 08:46:03.396531 3612 master.cpp:3673] Processing DECLINE call for offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O284 ] for framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>> I0412 08:46:04.665582 3612 http.cpp:501] HTTP GET for /master/state.json from 10.211.203.147:59464 with User-Agent='Mozilla/5.0 (Windows NT 6.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
>> I0412 08:46:09.389493 3616 master.cpp:5350] Sending 1 offers to framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>>
>> Is there a way to find out the number of masters that are present in the
>> environment together through CLI/GUI?
>>
>> From: haosdent [mailto:haosd...@gmail.com]
>> Sent: 12 April 2016 13:37
>> To: user <user@mesos.apache.org>
>> Subject: Re: Slaves not getting registered
>>
>>> but am unable to get it registered.
>>
>> Hi, @aishwarya Could you post the master and slave logs to provide more
>> details? Usually it is because of a network problem.
>>
>> On Tue, Apr 12, 2016 at 4:02 PM, <aishwarya.adyanth...@accenture.com> wrote:
>>
>> Hi,
>>
>> I’m unable to get the slave registered with the master node. I’ve configured
>> both the masters and slave machines but am unable to get it registered.
>>
>> Thank you.
>>
>> ________________________________
>>
>> This message is for the designated recipient only and may contain
>> privileged, proprietary, or otherwise confidential information. If you have
>> received it in error, please notify the sender immediately and delete the
>> original. Any other use of the e-mail by you is prohibited. Where allowed by
>> local law, electronic communications with Accenture and its affiliates,
>> including e-mail and instant messaging (including content), may be scanned
>> by our systems for the purposes of information security and assessment of
>> internal compliance with Accenture policy.
>> ______________________________________________________________________________________
>>
>> www.accenture.com
>>
>> --
>> Best Regards,
>> Haosdent Huang
>
> --
> Best Regards,
> Haosdent Huang