The masters are losing their ZooKeeper connection too, which is
forcing an election:

I0412 11:01:48.887229  3677 group.cpp:460] Lost connection to
ZooKeeper, attempting to reconnect ...

I0412 11:01:48.919545  3675 group.cpp:519] ZooKeeper session expired

I0412 11:01:48.919848  3680 detector.cpp:154] Detected a new leader: None

I0412 11:01:48.919922  3680 master.cpp:1710] The newly elected leader is None

I'd guess you need to tune your ZooKeeper cluster; something is
not right there.
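
On the question of another way to check health: `stat | grep Mode` only
shows each server's role. A rough sketch, assuming the four-letter-word
commands `ruok` and `mntr` are enabled on your ZooKeeper version, is to ask
each server whether it is running and then look at latency and outstanding
requests. The function names below are made up for illustration; the
addresses and port 2181 are the ones from your earlier mail.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Turn the tab-separated key/value lines of `mntr` into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def summarize(stats):
    """Keep only the fields that usually matter for a quick health check."""
    keys = (
        "zk_server_state",
        "zk_avg_latency",
        "zk_max_latency",
        "zk_outstanding_requests",
        "zk_num_alive_connections",
    )
    return {k: stats[k] for k in keys if k in stats}


def check_ensemble(hosts, port=2181):
    """Hypothetical helper: print a one-line health summary per server."""
    for host in hosts:
        try:
            alive = four_letter_word(host, port, "ruok").strip()
            stats = summarize(parse_mntr(four_letter_word(host, port, "mntr")))
            print(host, alive, stats)
        except OSError as exc:  # covers refused connections and timeouts
            print(host, "unreachable:", exc)
```

Run `check_ensemble(["30.30.30.52", "30.30.30.53", "30.30.30.54"])` from
each master and from slave1. A healthy server answers `ruok` with `imok`;
high latency, a growing number of outstanding requests, or a host that is
unreachable from one machine but not another would point at where to look.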

On 13 April 2016 at 06:09,  <aishwarya.adyanth...@accenture.com> wrote:
> Hi,
>
>
>
> I configured the zookeeper file on the slave machine by adding the master
> details, and now the slave is getting registered.
>
>
>
> But I don’t know why the three masters keep fluctuating among themselves to be
> the leader when I try accessing the master IP in the GUI.
>
>
>
> Thank you.
>
>
>
>
>
> From: haosdent [mailto:haosd...@gmail.com]
> Sent: 13 April 2016 09:25
> To: user <user@mesos.apache.org>
> Cc: Kumari, Suruchi <suruchi.kum...@accenture.com>
>
>
> Subject: Re: Slaves not getting registered
>
>
>
>>I0412 11:01:50.586612  3732 recover.cpp:578] Successfully joined the Paxos
>> group
>
>
>
> According to this, master 1 should have connected to zk successfully.
>
>
>
>>root@slave1:/var/log/mesos# tail -f
>> mesos-slave.slave1.invalid-user.log.INFO.20160412-110554.1696
>
>>I0413 03:12:54.532676  1711 group.cpp:519] ZooKeeper session expired
>
>>I0413 03:12:58.757953  1715 slave.cpp:4304] Current disk usage 6.44%. Max
>> allowed age: 5.848917453828577days
>
>>W0413 03:13:04.539577  1715 group.cpp:503] Timed out waiting to connect to
>> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
>
>
> How about checking whether you can connect to zk from slave1?
>
>
>
> On Wed, Apr 13, 2016 at 11:49 AM, <aishwarya.adyanth...@accenture.com>
> wrote:
>
> I checked the zookeeper status by running the command:
>
>
>
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.52 2181 | grep Mode
>
> Mode: follower
>
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.53 2181 | grep Mode
>
> Mode: leader
>
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.54 2181 | grep Mode
>
> Mode: follower
>
>
>
> And it seems like it’s working fine. Is there another way to check the
> health status?
>
>
>
>
>
> From: Abhishek Amralkar [mailto:abhishek.amral...@talentica.com]
> Sent: 13 April 2016 09:10
>
>
> To: user@mesos.apache.org
> Subject: Re: Slaves not getting registered
>
>
>
> Have you checked whether your ZooKeeper cluster is healthy and accessible
> from the Mesos masters?
>
>
>
> W0413 03:12:24.512336  1715 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> W0413 03:12:34.519641  1710 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> W0413 03:12:44.521181  1713 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> W0413 03:12:54.532501  1711 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
>
>
> It seems the Mesos masters are not able to communicate with ZooKeeper.
>
>
>
> -Abhishek
>
> On 13-Apr-2016, at 9:06 AM, aishwarya.adyanth...@accenture.com wrote:
>
>
>
> Hi,
>
>
>
> I have been following the document from DigitalOcean (mesos-doc-link),
> where I have set up 3 masters and one slave. Below are the log details:
>
>
>
> root@master1:/var/log/mesos# tail -f mesos-master.INFO
>
> I0412 11:01:50.579818  3736 recover.cpp:193] Received a recover response
> from a replica in VOTING status
>
> I0412 11:01:50.579903  3736 recover.cpp:564] Updating replica status to
> RECOVERING
>
> I0412 11:01:50.583102  3736 leveldb.cpp:304] Persisting metadata (8 bytes)
> to leveldb took 3.154399ms
>
> I0412 11:01:50.583137  3736 replica.cpp:320] Persisted replica status to
> RECOVERING
>
> I0412 11:01:50.583176  3736 recover.cpp:543] Starting catch-up from position
> 1 to 2
>
> I0412 11:01:50.583732  3736 recover.cpp:564] Updating replica status to
> VOTING
>
> I0412 11:01:50.586318  3736 leveldb.cpp:304] Persisting metadata (8 bytes)
> to leveldb took 2.540703ms
>
> I0412 11:01:50.586484  3736 replica.cpp:320] Persisted replica status to
> VOTING
>
> I0412 11:01:50.586612  3732 recover.cpp:578] Successfully joined the Paxos
> group
>
> I0412 11:01:50.586745  3731 recover.cpp:462] Recover process terminated
>
>
>
> root@master1:/var/log/mesos# tail -f mesos-master.WARNING
>
> Log file created at: 2016/04/12 11:01:49
>
> Running on machine: master1
>
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>
> W0412 11:01:49.024226  3712 authenticator.cpp:511] No credentials provided,
> authentication requests will be refused
>
>
>
> root@master1:/var/log/mesos# tail -f
> mesos-master.master1.invalid-user.log.INFO.20160412-11014
>
> tail: cannot open
> ‘mesos-master.master1.invalid-user.log.INFO.20160412-11014’ for reading: No
> such file or directory
>
> root@master1:/var/log/mesos# tail -f
> mesos-master.master1.invalid-user.log.INFO.20160412-11014
>
> mesos-master.master1.invalid-user.log.INFO.20160412-110143.3651
> mesos-master.master1.invalid-user.log.INFO.20160412-110148.3712
>
> root@master1:/var/log/mesos# tail -f
> mesos-master.master1.invalid-user.log.INFO.20160412-110143.3651
>
> I0412 11:01:46.424433  3676 replica.cpp:673] Replica in EMPTY status
> received a broadcasted recover request from (5)@30.30.30.53:5050
>
> I0412 11:01:47.068586  3675 replica.cpp:673] Replica in EMPTY status
> received a broadcasted recover request from (8)@30.30.30.53:5050
>
> I0412 11:01:47.592926  3677 replica.cpp:673] Replica in EMPTY status
> received a broadcasted recover request from (11)@30.30.30.53:5050
>
> I0412 11:01:48.188248  3680 replica.cpp:673] Replica in EMPTY status
> received a broadcasted recover request from (14)@30.30.30.53:5050
>
> I0412 11:01:48.887104  3678 group.cpp:460] Lost connection to ZooKeeper,
> attempting to reconnect ...
>
> I0412 11:01:48.887177  3674 group.cpp:460] Lost connection to ZooKeeper,
> attempting to reconnect ...
>
> I0412 11:01:48.887229  3677 group.cpp:460] Lost connection to ZooKeeper,
> attempting to reconnect ...
>
> I0412 11:01:48.919545  3675 group.cpp:519] ZooKeeper session expired
>
> I0412 11:01:48.919848  3680 detector.cpp:154] Detected a new leader: None
>
> I0412 11:01:48.919922  3680 master.cpp:1710] The newly elected leader is
> None
>
>
>
>
>
> root@slave1:/var/log/mesos# tail -f
> mesos-slave.slave1.invalid-user.log.INFO.20160412-110554.1696
>
> I0413 03:12:54.532676  1711 group.cpp:519] ZooKeeper session expired
>
> I0413 03:12:58.757953  1715 slave.cpp:4304] Current disk usage 6.44%. Max
> allowed age: 5.848917453828577days
>
> W0413 03:13:04.539577  1715 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> I0413 03:13:04.539798  1715 group.cpp:519] ZooKeeper session expired
>
> W0413 03:13:14.542245  1713 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> I0413 03:13:14.542434  1713 group.cpp:519] ZooKeeper session expired
>
>
>
> root@slave1:/var/log/mesos# tail -f mesos-slave.WARNING
>
> W0413 03:12:24.512336  1715 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> W0413 03:12:34.519641  1710 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> W0413 03:12:44.521181  1713 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> W0413 03:12:54.532501  1711 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
>
>
> Thank you.
>
>
>
>
>
> From: June Taylor [mailto:j...@umn.edu]
> Sent: 12 April 2016 18:06
> To: user@mesos.apache.org
> Subject: Re: Slaves not getting registered
>
>
>
> Try looking in /var/log/mesos/ at these files: mesos-slave.WARNING,
> mesos-slave.INFO, mesos-slave.ERROR
>
>
>
>
> Thanks,
>
> June Taylor
>
> System Administrator, Minnesota Population Center
>
> University of Minnesota
>
>
>
> On Tue, Apr 12, 2016 at 4:36 AM, Dick Davies <d...@hellooperator.net> wrote:
>
> There's no mention of a slave there. Have a look at the logs on the
> slave's filesystem and see if it is giving any errors.
>
>
> On 12 April 2016 at 10:17,  <aishwarya.adyanth...@accenture.com> wrote:
>> The GUI log shows like this:
>>
>>
>>
>> I0412 08:45:51.379609  3616 master.cpp:3673] Processing DECLINE call for
>> offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O282 ] for framework
>> 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
>> scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>>
>> I0412 08:45:54.637461  3612 http.cpp:501] HTTP GET for /master/state.json
>> from 10.211.203.147:59463 with User-Agent='Mozilla/5.0 (Windows NT 6.0;
>> WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
>>
>> I0412 08:45:57.376288  3619 master.cpp:5350] Sending 1 offers to framework
>> 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
>> scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>>
>> I0412 08:45:57.385325 3613 master.cpp:3673] Processing DECLINE call for
>> offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O283 ] for framework
>> 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
>> scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>>
>> I0412 08:46:03.383728  3614 master.cpp:5350] Sending 1 offers to framework
>> 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
>> scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>>
>> I0412 08:46:03.396531  3612 master.cpp:3673] Processing DECLINE call for
>> offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O284 ] for framework
>> 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
>> scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>>
>> I0412 08:46:04.665582  3612 http.cpp:501] HTTP GET for /master/state.json
>> from 10.211.203.147:59464 with User-Agent='Mozilla/5.0 (Windows NT 6.0;
>> WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
>>
>> I0412 08:46:09.389493  3616 master.cpp:5350] Sending 1 offers to framework
>> 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
>> scheduler-15022696-44ec-43d2-b193-a3cc4021d20e@30.30.30.48:42208
>>
>>
>>
>>
>>
>> Is there a way to find out the number of masters that are present in the
>> environment together through CLI/GUI?
>>
>>
>>
>>
>>
>>
>>
>> From: haosdent [mailto:haosd...@gmail.com]
>> Sent: 12 April 2016 13:37
>> To: user <user@mesos.apache.org>
>> Subject: Re: Slaves not getting registered
>>
>>
>>
>>>but am unable to get it registered.
>>
>> Hi, @aishwarya. Could you post the master and slave logs to provide more
>> details?
>> Usually it is because of a network problem.
>>
>>
>>
>> On Tue, Apr 12, 2016 at 4:02 PM, <aishwarya.adyanth...@accenture.com>
>> wrote:
>>
>> Hi,
>>
>>
>>
>> I’m unable to get the slave registered with the master node. I’ve configured
>> both the master and slave machines but am unable to get it registered.
>>
>>
>>
>> Thank you.
>>
>>
>>
>> ________________________________
>>
>>
>> This message is for the designated recipient only and may contain
>> privileged, proprietary, or otherwise confidential information. If you
>> have
>> received it in error, please notify the sender immediately and delete the
>> original. Any other use of the e-mail by you is prohibited. Where allowed
>> by
>> local law, electronic communications with Accenture and its affiliates,
>> including e-mail and instant messaging (including content), may be scanned
>> by our systems for the purposes of information security and assessment of
>> internal compliance with Accenture policy.
>>
>> ______________________________________________________________________________________
>>
>> www.accenture.com
>>
>>
>>
>>
>>
>> --
>>
>> Best Regards,
>>
>> Haosdent Huang
>
>
>
>
>
> --
>
> Best Regards,
>
> Haosdent Huang
