It seems to be working now, I guess for 2 reasons:
1) I set up /etc/mesos-master/ip and /etc/mesos-slave/ip, thanks for your suggestion.
2) I added to the routing table the gateway needed to reach the other network.
The second point is still strange, since I had to add the routing rule on only 3 machines, while on the other machines it was not necessary. However, I now have 2 Mesos masters on network 1 and 1 Mesos master on network 2 connected to each other, and all the slaves stay connected to the leader without disconnecting. Thanks a lot guys, if I have other issues I will share them with you!
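For anyone hitting the same multi-interface problem, here is a minimal sketch (not something from the thread itself) of how one might check which local address a host actually uses to reach a peer on the other network, and compare it with the address written in /etc/mesos-master/ip or /etc/mesos-slave/ip. The function names are hypothetical, port 5050 is just the master port mentioned in the thread, and a mismatch is only a hint that routing may be asymmetric, not proof.

```python
import socket


def source_ip_towards(peer_ip, port=5050):
    """Return the local IP the kernel would use as source to reach peer_ip.

    Connecting a UDP socket sends no packets; it only makes the kernel
    choose a route and therefore a local source address. If there is no
    route to the peer at all, connect() raises OSError.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, port))
        return s.getsockname()[0]


def ip_matches_route(configured_ip, peer_ip):
    """True if the configured Mesos IP is the source address used for peer_ip."""
    return source_ip_towards(peer_ip) == configured_ip
```

Run on a slave, `ip_matches_route(<contents of /etc/mesos-slave/ip>, <leader IP>)` returning False would suggest the slave talks to the master from a different interface than the one it is told to listen on.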
2016-04-19 10:09 GMT+02:00 Stefano Bianchi <[email protected]>:

> However, I forgot to mention that Docker is running on these machines. On
> some machines Docker is running a service, on others it is not. I saw the
> docker interface when typing ifconfig; I guess this is what you mean, Dick
> Davies?
> On 19 Apr 2016 09:22, "Stefano Bianchi" <[email protected]> wrote:
>
>> Actually I have already done the majority of these settings, apart from
>> /etc/mesos-master/ip. Should I write the IP of the master interface there?
>> And in /etc/mesos-slave/ip, should I write the IP of the slave interface?
>> Your suggestion seems to be the right one, because if I try to ping
>> machines from one network to the other, some are reachable and others are
>> not, and the latter are sometimes able to ping, commonly at boot, and
>> after a while they are not.
>> Thanks for your suggestion, I am going to try it.
>> On 19 Apr 2016 03:12, "Dick Davies" <[email protected]> wrote:
>>
>>> On our network a lot of the hosts have multiple interfaces, which let
>>> some asymmetric routing issues creep in that prevented our masters
>>> replying to slaves, which reminded me of your symptoms.
>>>
>>> So we set an IP address in /etc/mesos-slave/ip and /etc/mesos-master/ip
>>> so that they only listen on one interface, and then check connectivity
>>> between those IPs.
>>>
>>> The Ansible repo we use to build the stack now has a 'signoff' playbook
>>> to check network connectivity is correct between the services it deploys
>>> to a new environment.
>>>
>>> It won't be much use to you on its own I'm afraid, but here's a checklist
>>> cribbed from that playbook (ports might be different in your setup).
>>>
>>> You can SSH to the servers and check reachability between them with
>>> netcat or telnet.
>>>
>>> zookeepers:
>>>
>>> - need to be able to reach each other on the election port (usually
>>>   tcp/3888)
>>>
>>> masters:
>>>
>>> - must be able to reach zookeepers on tcp/2181
>>> - must be able to reach each other on tcp/5050
>>> - must be able to reach slaves on tcp/5051
>>>
>>> mesos slaves:
>>>
>>> - must be able to reach masters on tcp/5050
>>> - must be able to reach zookeepers on tcp/2181
>>> - any other connectivity to services your application needs
>>>   (database, caches, whatever)
>>>
>>> I think that's it.
>>>
>>> On 18 April 2016 at 20:39, Stefano Bianchi <[email protected]> wrote:
>>> > Hi Dick Davies
>>> >
>>> > Could you please share your solution?
>>> > How did you set up mesos/Zookeeper to interconnect masters and slaves
>>> > among networks?
>>> >
>>> > Thanks a lot!
>>> >
>>> > 2016-04-18 20:56 GMT+02:00 Dick Davies <[email protected]>:
>>> >>
>>> >> +1 for that theory, we had some screwy issues when we tried to span
>>> >> subnets until we set every slave and master to listen on a specific
>>> >> IP so we could tie down routing correctly.
>>> >>
>>> >> Saw very similar symptoms that have been described.
>>> >>
>>> >> On 18 April 2016 at 18:35, Alex Rukletsov <[email protected]> wrote:
>>> >> > I believe it's because slaves are able to connect to the master,
>>> >> > but the master is not able to connect to the slaves. That's why you
>>> >> > see them connected for some time and gone afterwards.
>>> >> >
>>> >> > On Mon, Apr 18, 2016 at 6:47 PM, Stefano Bianchi <[email protected]>
>>> >> > wrote:
>>> >> >>
>>> >> >> Indeed, I don't know why, I am not able to reach all the machines
>>> >> >> from one network to the other; only some machines can interconnect
>>> >> >> with some others across the networks.
>>> >> >> On Mesos I see that at a certain time all the slaves are connected,
>>> >> >> then disconnected, and after a while connected again; it seems like
>>> >> >> they are able to connect only for a while.
>>> >> >> However, it is an OpenStack issue, I guess.
>>> >> >>
>>> >> >> Does this also happen when master3 is leading? My guess is that
>>> >> >> you're not allowing incoming connections from master1 and master2
>>> >> >> to slave3. Generally, masters should be able to connect to slaves,
>>> >> >> not just respond to their requests.
>>> >> >>
>>> >> >> On 18 Apr 2016 13:17, "Stefano Bianchi" <[email protected]> wrote:
>>> >> >>>
>>> >> >>> Hi
>>> >> >>> On OpenStack I plugged two virtual networks into the same virtual
>>> >> >>> router so that the hosts on the 2 networks can communicate with
>>> >> >>> each other.
>>> >> >>> This is my topology:
>>> >> >>>
>>> >> >>> -----------------------internet-----------------------
>>> >> >>>                            |
>>> >> >>>                         Router1
>>> >> >>>                            |
>>> >> >>> --------------------------------------------------------
>>> >> >>>             |                               |
>>> >> >>>           Net1                            Net2
>>> >> >>>     Master1 Master2                      Master3
>>> >> >>>     Slave1 slave2                        Slave3
>>> >> >>>
>>> >> >>> I have set zookeeper with this line:
>>> >> >>>
>>> >> >>> zk://Master1_IP:2181,Master2_IP:2181,Master3_IP:2181/mesos
>>> >> >>>
>>> >> >>> The 3 masters, even though on 2 separate networks, elect the
>>> >> >>> leader correctly.
>>> >> >>> Now I have started the slaves, and at first I see all 3 correctly
>>> >> >>> registered, but after a while slave 3, independently of who is
>>> >> >>> the master, disconnects.
>>> >> >>> I looked in the log and I get the message in the subject.
>>> >> >>> Can you help me to solve this problem?
>>> >> >>>
>>> >> >>>
>>> >> >>> Thanks to all.
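
The reachability checklist from Dick Davies' message quoted above could be scripted roughly as follows. This is only a sketch of the idea, not the Ansible 'signoff' playbook he mentions: the function names are made up for illustration, the ports are the ones listed in the thread (they may differ in your setup), and a successful TCP connect only shows the port is reachable from this host in this direction.

```python
import socket

# Ports taken from the checklist in the thread; adjust for your setup.
ZK_CLIENT, ZK_ELECTION, MASTER, SLAVE = 2181, 3888, 5050, 5051


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def targets_for(role, zookeepers, masters, slaves):
    """List the (host, port) pairs a node with the given role must reach."""
    if role == "zookeeper":
        return [(h, ZK_ELECTION) for h in zookeepers]
    if role == "master":
        return ([(h, ZK_CLIENT) for h in zookeepers]
                + [(h, MASTER) for h in masters]
                + [(h, SLAVE) for h in slaves])
    if role == "slave":
        return ([(h, MASTER) for h in masters]
                + [(h, ZK_CLIENT) for h in zookeepers])
    raise ValueError(f"unknown role: {role}")


def unreachable(role, zookeepers, masters, slaves):
    """Return the checklist targets that this host cannot reach."""
    return [t for t in targets_for(role, zookeepers, masters, slaves)
            if not can_connect(*t)]
```

Run `unreachable("master", ...)` on each master and `unreachable("slave", ...)` on each slave, passing the IPs written in /etc/mesos-master/ip and /etc/mesos-slave/ip. The master-to-slave check on tcp/5051 is the direction that was suspected to be failing in this thread.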

