>the three masters keep fluctuating among themselves to be the leader.

Is the network between the three Mesos masters and the ZooKeeper servers
stable? Do you see any packet loss when you ping the ZooKeeper servers from
each Mesos master?
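Ping only tests ICMP, so it may also be worth checking the ZooKeeper client port itself from every master and from the slave. Below is a minimal sketch of that check, assuming the ensemble still answers the `stat` four-letter command on port 2181 as in the `nc` output quoted further down (newer ZooKeeper releases may require that command to be whitelisted). The function names `parse_mode`, `four_letter` and `check_ensemble` are made up for illustration and are not part of Mesos or ZooKeeper.

```python
import socket


def parse_mode(stat_output):
    """Return the value of the 'Mode:' line in a `stat` reply, or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def four_letter(host, port=2181, command="stat", timeout=5.0):
    """Send a four-letter command over TCP and return the full reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def check_ensemble(hosts, port=2181):
    """Map each host to its reported mode, or to an 'unreachable' note."""
    results = {}
    for host in hosts:
        try:
            results[host] = parse_mode(four_letter(host, port))
        except OSError as exc:  # refused, timed out, no route, ...
            results[host] = "unreachable: %s" % exc
    return results


# Example call, using the addresses from the quoted thread:
# print(check_ensemble(["30.30.30.52", "30.30.30.53", "30.30.30.54"]))
```

Running the same call on each master and on slave1 and comparing the results should show whether one particular machine cannot reach ZooKeeper. A healthy three-node ensemble would normally report one leader and two followers from every machine.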

On Wed, Apr 13, 2016 at 1:15 PM, Abhishek Amralkar <abhishek.amral...@talentica.com> wrote:

> Not sure, but try to change the quorum and check.
>
> On 13-Apr-2016, at 10:39 AM, aishwarya.adyanth...@accenture.com wrote:
>
> Hi,
>
> I configured the zookeeper file on the slave machine by adding the master
> details, and now the slave is getting registered.
>
> But I don’t know why the three masters keep fluctuating among themselves
> to be the leader when I try accessing the master IP in the GUI.
>
> Thank you.
>
> *From:* haosdent [mailto:haosd...@gmail.com <haosd...@gmail.com>]
> *Sent:* 13 April 2016 09:25
> *To:* user <user@mesos.apache.org>
> *Cc:* Kumari, Suruchi <suruchi.kum...@accenture.com>
> *Subject:* Re: Slaves not getting registered
>
> >I0412 11:01:50.586612 3732 recover.cpp:578] Successfully joined the Paxos group
>
> According to this, master 1 should connect to zk successfully.
>
> >root@slave1:/var/log/mesos# tail -f mesos-slave.slave1.invalid-user.log.INFO.20160412-110554.1696
> >I0413 03:12:54.532676 1711 group.cpp:519] ZooKeeper session expired
> >I0413 03:12:58.757953 1715 slave.cpp:4304] Current disk usage 6.44%. Max allowed age: 5.848917453828577days
> >W0413 03:13:04.539577 1715 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> How about checking whether you can connect to zk on slave1 or not?
>
> On Wed, Apr 13, 2016 at 11:49 AM, <aishwarya.adyanth...@accenture.com> wrote:
>
> I checked the zookeeper status by running the command:
>
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.52 2181 | grep Mode
> Mode: follower
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.53 2181 | grep Mode
> Mode: leader
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.54 2181 | grep Mode
> Mode: follower
>
> And it seems like it’s working fine. Is there another way to check the
> health status?
>
> *From:* Abhishek Amralkar [mailto:abhishek.amral...@talentica.com]
> *Sent:* 13 April 2016 09:10
> *To:* user@mesos.apache.org
> *Subject:* Re: Slaves not getting registered
>
> Have you checked whether your ZooKeeper cluster is healthy and accessible
> from the Mesos masters?
>
> W0413 03:12:24.512336 1715 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:34.519641 1710 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:44.521181 1713 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:54.532501 1711 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> It seems the Mesos masters are not able to communicate with ZooKeeper.
>
> -Abhishek
>
> On 13-Apr-2016, at 9:06 AM, aishwarya.adyanth...@accenture.com wrote:
>
> Hi,
>
> I have been following the DigitalOcean guide (mesos-doc-link
> <https://www.digitalocean.com/community/tutorials/how-to-configure-a-production-ready-mesosphere-cluster-on-ubuntu-14-04>),
> where I have set up 3 masters and one slave.
> Below are the log details:
>
> root@master1:/var/log/mesos# tail -f mesos-master.INFO
> I0412 11:01:50.579818 3736 recover.cpp:193] Received a recover response from a replica in VOTING status
> I0412 11:01:50.579903 3736 recover.cpp:564] Updating replica status to RECOVERING
> I0412 11:01:50.583102 3736 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 3.154399ms
> I0412 11:01:50.583137 3736 replica.cpp:320] Persisted replica status to RECOVERING
> I0412 11:01:50.583176 3736 recover.cpp:543] Starting catch-up from position 1 to 2
> I0412 11:01:50.583732 3736 recover.cpp:564] Updating replica status to VOTING
> I0412 11:01:50.586318 3736 leveldb.cpp:304] Persisting metadata (8 bytes) to leveldb took 2.540703ms
> I0412 11:01:50.586484 3736 replica.cpp:320] Persisted replica status to VOTING
> I0412 11:01:50.586612 3732 recover.cpp:578] Successfully joined the Paxos group
> I0412 11:01:50.586745 3731 recover.cpp:462] Recover process terminated
>
> root@master1:/var/log/mesos# tail -f mesos-master.WARNING
> Log file created at: 2016/04/12 11:01:49
> Running on machine: master1
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> W0412 11:01:49.024226 3712 authenticator.cpp:511] No credentials provided, authentication requests will be refused
>
> root@master1:/var/log/mesos# tail -f mesos-master.master1.invalid-user.log.INFO.20160412-11014
> tail: cannot open ‘mesos-master.master1.invalid-user.log.INFO.20160412-11014’ for reading: No such file or directory
> root@master1:/var/log/mesos# tail -f mesos-master.master1.invalid-user.log.INFO.20160412-11014
> mesos-master.master1.invalid-user.log.INFO.20160412-110143.3651
> mesos-master.master1.invalid-user.log.INFO.20160412-110148.3712
> root@master1:/var/log/mesos# tail -f mesos-master.master1.invalid-user.log.INFO.20160412-110143.3651
> I0412 11:01:46.424433 3676 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (5)@30.30.30.53:5050
> I0412 11:01:47.068586 3675 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (8)@30.30.30.53:5050
> I0412 11:01:47.592926 3677 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (11)@30.30.30.53:5050
> I0412 11:01:48.188248 3680 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (14)@30.30.30.53:5050
> I0412 11:01:48.887104 3678 group.cpp:460] Lost connection to ZooKeeper, attempting to reconnect ...
> I0412 11:01:48.887177 3674 group.cpp:460] Lost connection to ZooKeeper, attempting to reconnect ...
> I0412 11:01:48.887229 3677 group.cpp:460] Lost connection to ZooKeeper, attempting to reconnect ...
> I0412 11:01:48.919545 3675 group.cpp:519] ZooKeeper session expired
> I0412 11:01:48.919848 3680 detector.cpp:154] Detected a new leader: None
> I0412 11:01:48.919922 3680 master.cpp:1710] The newly elected leader is None
>
> root@slave1:/var/log/mesos# tail -f mesos-slave.slave1.invalid-user.log.INFO.20160412-110554.1696
> I0413 03:12:54.532676 1711 group.cpp:519] ZooKeeper session expired
> I0413 03:12:58.757953 1715 slave.cpp:4304] Current disk usage 6.44%. Max allowed age: 5.848917453828577days
> W0413 03:13:04.539577 1715 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> I0413 03:13:04.539798 1715 group.cpp:519] ZooKeeper session expired
> W0413 03:13:14.542245 1713 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> I0413 03:13:14.542434 1713 group.cpp:519] ZooKeeper session expired
>
> root@slave1:/var/log/mesos# tail -f mesos-slave.WARNING
> W0413 03:12:24.512336 1715 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:34.519641 1710 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:44.521181 1713 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:54.532501 1711 group.cpp:503] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> Thank you.
>
> *From:* June Taylor [mailto:j...@umn.edu <j...@umn.edu>]
> *Sent:* 12 April 2016 18:06
> *To:* user@mesos.apache.org
> *Subject:* Re: Slaves not getting registered
>
> Try looking in /var/log/mesos/ at these files: mesos-slave.WARNING,
> mesos-slave.INFO, mesos-slave.ERROR
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Tue, Apr 12, 2016 at 4:36 AM, Dick Davies <d...@hellooperator.net> wrote:
>
> There's no mention of a slave there, have a look at the logs on the
> slave's filesystem and see if it is giving any errors.
>
> On 12 April 2016 at 10:17, <aishwarya.adyanth...@accenture.com> wrote:
> > The GUI log shows like this:
> >
> > I0412 08:45:51.379609 3616 master.cpp:3673] Processing DECLINE call for offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O282 ] for framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
> > I0412 08:45:54.637461 3612 http.cpp:501] HTTP GET for /master/state.json from 10.211.203.147:59463 with User-Agent='Mozilla/5.0 (Windows NT 6.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
> > I0412 08:45:57.376288 3619 master.cpp:5350] Sending 1 offers to framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
> > I0412 08:45:57.385325 3613 master.cpp:3673] Processing DECLINE call for offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O283 ] for framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
> > I0412 08:46:03.383728 3614 master.cpp:5350] Sending 1 offers to framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
> > I0412 08:46:03.396531 3612 master.cpp:3673] Processing DECLINE call for offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O284 ] for framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
> > I0412 08:46:04.665582 3612 http.cpp:501] HTTP GET for /master/state.json from 10.211.203.147:59464 with User-Agent='Mozilla/5.0 (Windows NT 6.0; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
> > I0412 08:46:09.389493 3616 master.cpp:5350] Sending 1 offers to framework 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
> >
> > Is there a way to find out the number of masters that are present in the
> > environment together through CLI/GUI?
> >
> > From: haosdent [mailto:haosd...@gmail.com]
> > Sent: 12 April 2016 13:37
> > To: user <user@mesos.apache.org>
> > Subject: Re: Slaves not getting registered
> >
> >>but am unable to get it registered.
> >
> > Hi, @aishwarya Could you post master and slave log to provide more
> > details? Usually it is because of network problem.
> >
> > On Tue, Apr 12, 2016 at 4:02 PM, <aishwarya.adyanth...@accenture.com> wrote:
> >
> > Hi,
> >
> > I’m unable to get the slave registered with the master node. I’ve
> > configured both the masters and slave machines but am unable to get it
> > registered.
> >
> > Thank you.
> >
> > ________________________________
> >
> > This message is for the designated recipient only and may contain
> > privileged, proprietary, or otherwise confidential information. If you
> > have received it in error, please notify the sender immediately and
> > delete the original. Any other use of the e-mail by you is prohibited.
> > Where allowed by local law, electronic communications with Accenture
> > and its affiliates, including e-mail and instant messaging (including
> > content), may be scanned by our systems for the purposes of information
> > security and assessment of internal compliance with Accenture policy.
> >
> > www.accenture.com
> >
> > --
> > Best Regards,
> > Haosdent Huang
>
> --
> Best Regards,
> Haosdent Huang

--
Best Regards,
Haosdent Huang