Re: removed slave "ID": (131.154.96.172): health check timed out

Stefano Bianchi Tue, 19 Apr 2016 02:42:08 -0700

It seems to be working now.
I guess for 2 reasons:
1) I set up /etc/mesos-master/ip and /etc/mesos-slave/ip, thanks to your
suggestion.
2) I added a route via the gateway to the routing table to reach the other network.
The second point still seems strange, since I had to add the routing rule on
only 3 machines, while on the other machines it was not necessary.
However, I now have 2 mesos masters on network 1 and 1 mesos master on
network 2 connected to each other, and all the slaves stay connected to the
leader without disconnecting.
Thanks a lot, guys. If I have other issues I will share them with you!
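
For anyone who wants to repeat the reachability checks, here is a minimal
Python sketch of the port checklist quoted below. It is only an illustration,
not taken from the playbook mentioned in the thread: the names required_ports,
can_connect and check are made up for this example, and the ports are the ones
from the checklist, so they may be different in your setup.

```python
import socket

# Ports as listed in the quoted checklist; adjust them if your setup differs.
ZK_CLIENT, ZK_ELECTION, MASTER, SLAVE = 2181, 3888, 5050, 5051


def required_ports(role):
    """Return {target_role: port} that a host with this role must reach."""
    table = {
        "zookeeper": {"zookeeper": ZK_ELECTION},
        "master": {"zookeeper": ZK_CLIENT, "master": MASTER, "slave": SLAVE},
        "slave": {"zookeeper": ZK_CLIENT, "master": MASTER},
    }
    return table[role]


def can_connect(host, port, timeout=3.0):
    """Attempt a plain TCP connection, roughly what `nc -z host port` does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check(role, hosts_by_role):
    """Run on a host with the given role; return (host, port, ok) tuples."""
    results = []
    for target_role, port in required_ports(role).items():
        for host in hosts_by_role.get(target_role, []):
            results.append((host, port, can_connect(host, port)))
    return results
```

Run check() on each host with that host's own role and a dict that maps
"zookeeper", "master" and "slave" to the IPs you wrote in /etc/mesos-master/ip
and /etc/mesos-slave/ip. Any tuple ending in False points at a direction that
is blocked or not routed, for example a master that cannot reach a slave on
tcp/5051.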


2016-04-19 10:09 GMT+02:00 Stefano Bianchi <[email protected]>:

> However, I forgot to mention that Docker is running on these machines. On
> some machines Docker is running a service, on others it isn't. I saw the
> docker interface when typing ifconfig; I guess this is what you mean, Dick Davies?
> On 19 Apr 2016 09:22, "Stefano Bianchi" <[email protected]> wrote:
>
>> Actually I have already done most of these settings, apart from
>> /etc/mesos-master/ip: should I write the IP of the master interface there? And
>> in /etc/mesos-slave/ip, should I write the IP of the slave interface?
>> Your suggestion seems to be the right one, because if I try to ping machines
>> from one network to the other, some are reachable and some are not, and the
>> latter are sometimes able to ping, commonly at boot, and after a while
>> are not.
>> Thanks for your suggestion, I am going to try it.
>> On 19 Apr 2016 03:12, "Dick Davies" <[email protected]> wrote:
>>
>>> On our network a lot of the hosts have multiple interfaces, which let
>>> some asymmetric routing
>>> issues creep in that prevented our masters replying to slaves, which
>>> reminded me of your symptoms.
>>>
>>> So we set an IP address in /etc/mesos-slave/ip and
>>> /etc/mesos-master/ip so that they only listen
>>> on one interface, and then check connectivity between those IPs.
>>>
>>> The Ansible repo we use to build the stack now has a 'signoff'
>>> playbook to check network connectivity
>>> is correct between the services it deploys to a new environment.
>>>
>>> It won't be much use to you on its own I'm afraid, but
>>> here's a checklist cribbed from that playbook (ports might be
>>> different in your setup).
>>>
>>> You can SSH to the servers and check reachability between them with
>>> netcat or telnet.
>>>
>>>
>>> zookeepers:
>>>
>>> - need to be able to reach each other on the election port (usually
>>> tcp/3888)
>>>
>>> masters:
>>>
>>> * must be able to reach zookeepers on tcp/2181
>>> * must be able to reach each other on tcp/5050
>>> * must be able to reach slaves on tcp/5051
>>>
>>> mesos slaves:
>>>
>>> - must be able to reach masters on tcp/5050
>>> - must be able to reach zookeepers on tcp/2181
>>> - any other connectivity to services your application needs
>>> (database, caches, whatever)
>>>
>>> I think that's it.
>>>
>>> On 18 April 2016 at 20:39, Stefano Bianchi <[email protected]> wrote:
>>> > Hi Dick Davies
>>> >
>>> > Could you please share your solution?
>>> > How did you set up mesos/Zookeeper to interconnect masters and slaves
>>> > among networks?
>>> >
>>> > Thanks a lot!
>>> >
>>> > 2016-04-18 20:56 GMT+02:00 Dick Davies <[email protected]>:
>>> >>
>>> >> +1 for that theory, we had some screwy issues when we tried to span
>>> >> subnets until we set every slave and master
>>> >> to listen on a specific IP so we could tie down routing correctly.
>>> >>
>>> >> Saw symptoms very similar to those that have been described.
>>> >>
>>> >> On 18 April 2016 at 18:35, Alex Rukletsov <[email protected]> wrote:
>>> >> > I believe it's because slaves are able to connect to the master, but
>>> >> > the master is not able to connect to the slaves. That's why you see
>>> >> > them connected for some time and gone afterwards.
>>> >> >
>>> >> > On Mon, Apr 18, 2016 at 6:47 PM, Stefano Bianchi <[email protected]> wrote:
>>> >> >>
>>> >> >> Indeed, I don't know why, but I am not able to reach all the machines
>>> >> >> from one network to the other; only some machines can interconnect
>>> >> >> with some others across the networks.
>>> >> >> On mesos I see that at a certain time the slaves are all connected,
>>> >> >> then disconnected, and after a while connected again. It seems they
>>> >> >> are able to connect only for a while.
>>> >> >> However, it is an openstack issue, I guess.
>>> >> >>
>>> >> >> Does this also happen when master3 is leading? My guess is that you're
>>> >> >> not allowing incoming connections from master1 and master2 to slave3.
>>> >> >> Generally, masters should be able to connect to slaves, not just
>>> >> >> respond to their requests.
>>> >> >>
>>> >> >> On 18 Apr 2016 13:17, "Stefano Bianchi" <[email protected]> wrote:
>>> >> >>>
>>> >> >>> Hi
>>> >> >>> On openstack I plugged two virtual networks into the same virtual
>>> >> >>> router so that the hosts on the 2 networks can communicate with
>>> >> >>> each other.
>>> >> >>> This is my topology:
>>> >> >>>
>>> >> >>> -----------------------internet-----------------------
>>> >> >>>                           |
>>> >> >>>                        Router1
>>> >> >>>                           |
>>> >> >>> ------------------------------------------------------
>>> >> >>> |                                                    |
>>> >> >>> Net1                                              Net2
>>> >> >>> Master1 Master2                                Master3
>>> >> >>> Slave1 Slave2                                   Slave3
>>> >> >>>
>>> >> >>> I have configured zookeeper with this line:
>>> >> >>>
>>> >> >>> zk://Master1_IP:2181,Master2_IP:2181,Master3_IP:2181/mesos
>>> >> >>>
>>> >> >>> The 3 masters, even though on 2 separate networks, elect the leader
>>> >> >>> correctly.
>>> >> >>> Now I have started the slaves, and at first I see all 3 correctly
>>> >> >>> registered, but after a while slave 3, independently of who is the
>>> >> >>> master, disconnects.
>>> >> >>> I looked in the log and I get the message in the subject.
>>> >> >>> Can you help me to solve this problem?
>>> >> >>>
>>> >> >>>
>>> >> >>> Thanks to all.
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
