Re: Slaves not getting registered

haosdent Tue, 12 Apr 2016 23:10:27 -0700

>the three masters keep fluctuating among themselves to be the leader.

Is the network between the 3 Mesos masters and ZooKeeper stable? Do you see
packet loss when you ping the ZooKeeper servers from each Mesos master?
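
A single ping can miss intermittent loss, so one option is to probe the ZooKeeper client port several times from each master. Below is a minimal Python sketch, not a definitive tool. It assumes the three servers at 30.30.30.52, .53 and .54 on port 2181 (taken from the `nc` commands quoted further down) and that the `stat` four-letter command is enabled on them. The helper names `four_letter`, `parse_mode` and `probe` are hypothetical, made up for this illustration.

```python
import socket
import time

# Addresses and port copied from the `nc` commands quoted later in this thread.
ZK_SERVERS = [("30.30.30.52", 2181), ("30.30.30.53", 2181), ("30.30.30.54", 2181)]


def four_letter(host, port, cmd="stat", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output):
    """Return the value of the 'Mode:' line (e.g. leader or follower), or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def probe(servers=ZK_SERVERS, attempts=5, pause=1.0):
    """Probe each server several times; return {(host, port): (failures, modes)}."""
    results = {}
    for host, port in servers:
        failures, modes = 0, set()
        for _ in range(attempts):
            try:
                modes.add(parse_mode(four_letter(host, port)))
            except OSError:  # refused, unreachable, or timed out
                failures += 1
            time.sleep(pause)
        results[(host, port)] = (failures, modes)
    return results
```

Calling `probe()` on each master (and on slave1) returns, per server, how many attempts failed and which modes were reported. A non-zero failure count, or a server whose mode changes between attempts, would suggest looking at the network or the ZooKeeper ensemble before Mesos itself.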


On Wed, Apr 13, 2016 at 1:15 PM, Abhishek Amralkar <
[email protected]> wrote:

> Not sure, but try changing the quorum and check again.
>
>
>
> On 13-Apr-2016, at 10:39 AM, [email protected] wrote:
>
> Hi,
>
> I configured the zookeeper file on the slave machine by adding the master
> details, and now the slave is getting registered.
>
> But I don’t know why the three masters keep fluctuating among themselves to be
> the leader when I try accessing the master IP in the GUI.
>
> Thank you.
>
>
> *From:* haosdent [mailto:[email protected] <[email protected]>]
> *Sent:* 13 April 2016 09:25
> *To:* user <[email protected]>
> *Cc:* Kumari, Suruchi <[email protected]>
> *Subject:* Re: Slaves not getting registered
>
> >I0412 11:01:50.586612  3732 recover.cpp:578] Successfully joined the
> Paxos group
>
> According to this, master 1 should have connected to ZooKeeper successfully.
>
> >root@slave1:/var/log/mesos# tail -f
> mesos-slave.slave1.invalid-user.log.INFO.20160412-110554.1696
> >I0413 03:12:54.532676  1711 group.cpp:519] ZooKeeper session expired
> >I0413 03:12:58.757953  1715 slave.cpp:4304] Current disk usage 6.44%. Max
> allowed age: 5.848917453828577days
> >W0413 03:13:04.539577  1715 group.cpp:503] Timed out waiting to connect
> to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> How about checking whether you can connect to ZooKeeper from slave1?
>
> On Wed, Apr 13, 2016 at 11:49 AM, <[email protected]>
> wrote:
>
> I checked the zookeeper status by running the command:
>
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.52 2181 | grep Mode
> Mode: follower
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.53 2181 | grep Mode
> Mode: leader
> root@master1:/home/ubuntu# echo stat | nc 30.30.30.54 2181 | grep Mode
> Mode: follower
>
> And it seems like it’s working fine. Is there another way to check the
> health status?
>
>
> *From:* Abhishek Amralkar [mailto:[email protected]]
> *Sent:* 13 April 2016 09:10
>
> *To:* [email protected]
> *Subject:* Re: Slaves not getting registered
>
> Have you checked whether your ZooKeeper cluster is healthy and accessible
> from the Mesos masters?
>
> W0413 03:12:24.512336  1715 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:34.519641  1710 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:44.521181  1713 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:54.532501  1711 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> It seems the Mesos masters are not able to communicate with ZooKeeper.
>
> -Abhishek
>
> On 13-Apr-2016, at 9:06 AM, [email protected] wrote:
>
> Hi,
>
> I have been following the DigitalOcean tutorial (mesos-doc-link
> <https://www.digitalocean.com/community/tutorials/how-to-configure-a-production-ready-mesosphere-cluster-on-ubuntu-14-04>)
> and have set up 3 masters and one slave. Below are the log details:
>
> root@master1:/var/log/mesos# tail -f mesos-master.INFO
> I0412 11:01:50.579818  3736 recover.cpp:193] Received a recover response
> from a replica in VOTING status
> I0412 11:01:50.579903  3736 recover.cpp:564] Updating replica status to
> RECOVERING
> I0412 11:01:50.583102  3736 leveldb.cpp:304] Persisting metadata (8 bytes)
> to leveldb took 3.154399ms
> I0412 11:01:50.583137  3736 replica.cpp:320] Persisted replica status to
> RECOVERING
> I0412 11:01:50.583176  3736 recover.cpp:543] Starting catch-up from
> position 1 to 2
> I0412 11:01:50.583732  3736 recover.cpp:564] Updating replica status to
> VOTING
> I0412 11:01:50.586318  3736 leveldb.cpp:304] Persisting metadata (8 bytes)
> to leveldb took 2.540703ms
> I0412 11:01:50.586484  3736 replica.cpp:320] Persisted replica status to
> VOTING
> I0412 11:01:50.586612  3732 recover.cpp:578] Successfully joined the Paxos
> group
> I0412 11:01:50.586745  3731 recover.cpp:462] Recover process terminated
>
> root@master1:/var/log/mesos# tail -f mesos-master.WARNING
> Log file created at: 2016/04/12 11:01:49
> Running on machine: master1
> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
> W0412 11:01:49.024226  3712 authenticator.cpp:511] No credentials
> provided, authentication requests will be refused
>
> root@master1:/var/log/mesos# tail -f
> mesos-master.master1.invalid-user.log.INFO.20160412-11014
> tail: cannot open
> ‘mesos-master.master1.invalid-user.log.INFO.20160412-11014’ for reading: No
> such file or directory
> root@master1:/var/log/mesos# tail -f
> mesos-master.master1.invalid-user.log.INFO.20160412-11014
> mesos-master.master1.invalid-user.log.INFO.20160412-110143.3651
> mesos-master.master1.invalid-user.log.INFO.20160412-110148.3712
> root@master1:/var/log/mesos# tail -f
> mesos-master.master1.invalid-user.log.INFO.20160412-110143.3651
> I0412 11:01:46.424433  3676 replica.cpp:673] Replica in EMPTY status
> received a broadcasted recover request from (5)@30.30.30.53:5050
> I0412 11:01:47.068586  3675 replica.cpp:673] Replica in EMPTY status
> received a broadcasted recover request from (8)@30.30.30.53:5050
> I0412 11:01:47.592926  3677 replica.cpp:673] Replica in EMPTY status
> received a broadcasted recover request from (11)@30.30.30.53:5050
> I0412 11:01:48.188248  3680 replica.cpp:673] Replica in EMPTY status
> received a broadcasted recover request from (14)@30.30.30.53:5050
> I0412 11:01:48.887104  3678 group.cpp:460] Lost connection to ZooKeeper,
> attempting to reconnect ...
> I0412 11:01:48.887177  3674 group.cpp:460] Lost connection to ZooKeeper,
> attempting to reconnect ...
> I0412 11:01:48.887229  3677 group.cpp:460] Lost connection to ZooKeeper,
> attempting to reconnect ...
> I0412 11:01:48.919545  3675 group.cpp:519] ZooKeeper session expired
> I0412 11:01:48.919848  3680 detector.cpp:154] Detected a new leader: None
> I0412 11:01:48.919922  3680 master.cpp:1710] The newly elected leader is
> None
>
>
> root@slave1:/var/log/mesos# tail -f
> mesos-slave.slave1.invalid-user.log.INFO.20160412-110554.1696
> I0413 03:12:54.532676  1711 group.cpp:519] ZooKeeper session expired
> I0413 03:12:58.757953  1715 slave.cpp:4304] Current disk usage 6.44%. Max
> allowed age: 5.848917453828577days
> W0413 03:13:04.539577  1715 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> I0413 03:13:04.539798  1715 group.cpp:519] ZooKeeper session expired
> W0413 03:13:14.542245  1713 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> I0413 03:13:14.542434  1713 group.cpp:519] ZooKeeper session expired
>
> root@slave1:/var/log/mesos# tail -f mesos-slave.WARNING
> W0413 03:12:24.512336  1715 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:34.519641  1710 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:44.521181  1713 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
> W0413 03:12:54.532501  1711 group.cpp:503] Timed out waiting to connect to
> ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
>
> Thank you.
>
>
> *From:* June Taylor [mailto:[email protected] <[email protected]>]
> *Sent:* 12 April 2016 18:06
> *To:* [email protected]
> *Subject:* Re: Slaves not getting registered
>
> Try looking in /var/log/mesos/ at these files: mesos-slave.WARNING,
> mesos-slave.INFO, mesos-slave.ERROR
>
>
> Thanks,
> June Taylor
> System Administrator, Minnesota Population Center
> University of Minnesota
>
> On Tue, Apr 12, 2016 at 4:36 AM, Dick Davies <[email protected]>
> wrote:
>
> There's no mention of a slave there. Have a look at the logs on the
> slave's filesystem and see if they show any errors.
>
> On 12 April 2016 at 10:17,  <[email protected]> wrote:
> > The GUI log shows this:
> >
> >
> >
> > I0412 08:45:51.379609  3616 master.cpp:3673] Processing DECLINE call for
> > offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O282 ] for framework
> > 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
> > [email protected]:42208
> >
> > I0412 08:45:54.637461  3612 http.cpp:501] HTTP GET for /master/state.json
> > from 10.211.203.147:59463 with User-Agent='Mozilla/5.0 (Windows NT 6.0;
> > WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
> >
> > I0412 08:45:57.376288  3619 master.cpp:5350] Sending 1 offers to
> framework
> > 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
> > [email protected]:42208
> >
> > I0412 08:45:57.385325  3613 master.cpp:3673] Processing DECLINE call for
> > offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O283 ] for framework
> > 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
> > [email protected]:42208
> >
> > I0412 08:46:03.383728  3614 master.cpp:5350] Sending 1 offers to
> framework
> > 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
> > [email protected]:42208
> >
> > I0412 08:46:03.396531  3612 master.cpp:3673] Processing DECLINE call for
> > offers: [ 74f33592-fc48-4066-a59c-977818b4c13c-O284 ] for framework
> > 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
> > [email protected]:42208
> >
> > I0412 08:46:04.665582  3612 http.cpp:501] HTTP GET for /master/state.json
> > from 10.211.203.147:59464 with User-Agent='Mozilla/5.0 (Windows NT 6.0;
> > WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
> >
> > I0412 08:46:09.389493  3616 master.cpp:5350] Sending 1 offers to
> framework
> > 74f33592-fc48-4066-a59c-977818b4c13c-0001 (chronos-2.4.0) at
> > [email protected]:42208
> >
> >
> >
> >
> >
> > Is there a way to find out the number of masters that are present in the
> > environment together through CLI/GUI?
> >
> >
> >
> >
> >
> >
> >
> > From: haosdent [mailto:[email protected]]
> > Sent: 12 April 2016 13:37
> > To: user <[email protected]>
> > Subject: Re: Slaves not getting registered
> >
> >
> >
> >>but am unable to get it registered.
> >
> > Hi @aishwarya, could you post the master and slave logs to provide more
> > details? Usually this is caused by a network problem.
> >
> >
> >
> > On Tue, Apr 12, 2016 at 4:02 PM, <[email protected]>
> wrote:
> >
> > Hi,
> >
> >
> >
> > I’m unable to get the slave registered with the master node. I’ve
> configured
> > both the masters and slave machines but am unable to get it registered.
> >
> >
> >
> > Thank you.
> >
> >
> >
> > ________________________________
> >
> >
> > This message is for the designated recipient only and may contain
> > privileged, proprietary, or otherwise confidential information. If you
> have
> > received it in error, please notify the sender immediately and delete the
> > original. Any other use of the e-mail by you is prohibited. Where
> allowed by
> > local law, electronic communications with Accenture and its affiliates,
> > including e-mail and instant messaging (including content), may be
> scanned
> > by our systems for the purposes of information security and assessment of
> > internal compliance with Accenture policy.
> >
> ______________________________________________________________________________________
> >
> > www.accenture.com
> >
> >
> >
> >
> >
> > --
> >
> > Best Regards,
> >
> > Haosdent Huang
>
>
>
> --
> Best Regards,
> Haosdent Huang
>
>
>


-- 
Best Regards,
Haosdent Huang
