Re: removed slave "ID": (131.154.96.172): health check timed out

2016-04-19 Thread Stefano Bianchi
Now it seems to be working. I guess for two reasons: 1) I set up /etc/mesos-master/ip and /etc/mesos-slave/ip, thanks to your suggestion; 2) I added the gateway to reach the other network to the routing table. The second point is still strange, since I had to add the route on only 3 machines.
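Why a missing route matters can be sketched with a longest-prefix-match lookup. The routing table below is hypothetical (the addresses are illustrative, not the ones from this thread): on a host whose default gateway is not the router joining the two networks, traffic for the other network leaves through the default gateway until a more specific route is added.

```python
import ipaddress

# Hypothetical routing table of one host on network 10.0.1.0/24:
# (destination prefix, next hop). Addresses are illustrative only.
ROUTES = [
    ("0.0.0.0/0", "10.0.1.1"),    # default gateway
    ("10.0.1.0/24", "direct"),    # the host's own subnet, no gateway needed
]


def next_hop(routes, dst):
    """Pick the next hop for dst by longest-prefix match, or None if no route matches."""
    addr = ipaddress.ip_address(dst)
    best_net, best_hop = None, None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, hop
    return best_hop


# Without a specific route, traffic for the other network uses the default gateway.
print(next_hop(ROUTES, "10.0.2.15"))      # 10.0.1.1

# With an explicit route via the router that joins the two networks
# (10.0.1.254 is a made-up address), the more specific prefix wins.
with_route = ROUTES + [("10.0.2.0/24", "10.0.1.254")]
print(next_hop(with_route, "10.0.2.15"))  # 10.0.1.254
```

This only models the lookup; on a real host the equivalent change is a static route added with the system's routing tools.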

Re: removed slave "ID": (131.154.96.172): health check timed out

2016-04-19 Thread Stefano Bianchi
However, I forgot to mention that Docker is running on these machines: on some of them Docker is running a service, on others it isn't. I saw the docker interface by typing ifconfig; I guess this is what you meant, Dick Davies? On 19 Apr 2016 09:22, "Stefano Bianchi" wrote:

Re: removed slave "ID": (131.154.96.172): health check timed out

2016-04-19 Thread Stefano Bianchi
Actually, I have already made most of these settings, except for /etc/mesos-master/ip: should I write the IP of the master's interface there? And in /etc/mesos-slave/ip, should I write the IP of the slave's interface? Your suggestion seems to be the right one, because if I try to ping some machines from a
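On a host with several interfaces (Docker's bridge included), one way to decide which address belongs in /etc/mesos-slave/ip is to ask the kernel which source address it would use to reach the master. This is a sketch, not a method proposed in the thread; `local_ip_towards` is a hypothetical helper, and port 5050 (the Mesos master's default port) is only a placeholder, since a UDP connect() sends no packets.

```python
import socket


def local_ip_towards(peer_ip, port=5050):
    """Return the local address the kernel would use as source to reach peer_ip.

    connect() on a UDP socket sends no packet; it only makes the kernel
    choose a route and a source address, which getsockname() reports.
    The port is irrelevant here (5050 is just the Mesos master default).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, port))
        return s.getsockname()[0]


# Local self-check; on a slave you would pass the master's address instead.
print(local_ip_towards("127.0.0.1"))  # 127.0.0.1
```

On a slave, calling it with the leading master's address gives a candidate for /etc/mesos-slave/ip; on a master, calling it with a slave's address gives a candidate for /etc/mesos-master/ip.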

Re: removed slave "ID": (131.154.96.172): health check timed out

2016-04-18 Thread Dick Davies
On our network a lot of the hosts have multiple interfaces, which let some asymmetric routing issues creep in that prevented our masters replying to slaves, which reminded me of your symptoms. So we set an IP address in /etc/mesos-slave/ip and /etc/mesos-master/ip so that they only listen on one
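A minimal sketch of that setup, assuming the Mesosphere packages, whose init wrapper turns each file under /etc/mesos-master/ and /etc/mesos-slave/ into a command-line flag of the same name (so a file named `ip` becomes `--ip`). `write_mesos_ip` is a hypothetical helper, and the demo writes to a temporary directory rather than the real /etc.

```python
import ipaddress
import tempfile
from pathlib import Path


def write_mesos_ip(role, ip, root="/etc"):
    """Write ip to <root>/mesos-<role>/ip (hypothetical helper) and return the path.

    Assumes the Mesosphere packaging, where the init wrapper turns this file
    into the --ip flag of mesos-master or mesos-slave.
    """
    if role not in ("master", "slave"):
        raise ValueError("role must be 'master' or 'slave'")
    ipaddress.IPv4Address(ip)  # raises ValueError for a malformed address
    conf_dir = Path(root) / f"mesos-{role}"
    conf_dir.mkdir(parents=True, exist_ok=True)
    path = conf_dir / "ip"
    path.write_text(ip + "\n")
    return path


# Demo against a temporary directory instead of the real /etc.
demo_root = tempfile.mkdtemp()
demo_path = write_mesos_ip("slave", "131.154.96.172", root=demo_root)
print(demo_path.read_text().strip())  # 131.154.96.172
```

On a real node you would keep the default `root="/etc"` and restart the mesos-master or mesos-slave service so the new flag takes effect.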

Re: removed slave "ID": (131.154.96.172): health check timed out

2016-04-18 Thread Stefano Bianchi
Hi Dick Davies, could you please share your solution? How did you set up Mesos/ZooKeeper to interconnect masters and slaves across networks? Thanks a lot! 2016-04-18 20:56 GMT+02:00 Dick Davies: > +1 for that theory, we had some screwy issues when we tried to span

Re: removed slave "ID": (131.154.96.172): health check timed out

2016-04-18 Thread Dick Davies
+1 for that theory, we had some screwy issues when we tried to span subnets until we set every slave and master to listen on a specific IP so we could tie down routing correctly. We saw symptoms very similar to the ones described. On 18 April 2016 at 18:35, Alex Rukletsov

Re: removed slave "ID": (131.154.96.172): health check timed out

2016-04-18 Thread Alex Rukletsov
I believe it's because the slaves are able to connect to the master, but the master is not able to connect to the slaves. That's why you see them connected for some time and then gone. On Mon, Apr 18, 2016 at 6:47 PM, Stefano Bianchi wrote: > Indeed, I don't know why, I
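A simplified model of that symptom: the slave registers over its own outbound connection, so it shows up in the master; the master then pings it, and if the pings never get through (for example because the master cannot open a connection back to the slave), the master removes it after several consecutive timeouts and logs "health check timed out". The numbers are assumptions, modelled on the defaults of the master flags --slave_ping_timeout (15secs) and --max_slave_ping_timeouts (5) in Mesos releases of this period; `removal_time` is a hypothetical helper, and each ping round is assumed to last exactly one timeout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PingPolicy:
    # Assumed values, modelled on the defaults of the master flags
    # --slave_ping_timeout (15secs) and --max_slave_ping_timeouts (5).
    timeout_secs: float = 15.0
    max_timeouts: int = 5


def removal_time(pong_received, policy=PingPolicy()):
    """Seconds until the master removes the slave, or None if it never does.

    pong_received[i] tells whether the slave answered the i-th ping.
    Simplification: every ping round lasts timeout_secs, and an answered
    ping resets the count of consecutive timeouts.
    """
    misses = 0
    for i, answered in enumerate(pong_received):
        misses = 0 if answered else misses + 1
        if misses >= policy.max_timeouts:
            return (i + 1) * policy.timeout_secs
    return None


# Master can never reach the slave: removed after 5 missed pings.
print(removal_time([False] * 5))                 # 75.0
# Occasional losses reset the counter, so the slave stays.
print(removal_time([True, False, True, False]))  # None
```

If the slave then re-registers on its own, the cycle repeats, which would match slaves that appear, disappear, and come back.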

Re: removed slave "ID": (131.154.96.172): health check timed out

2016-04-18 Thread Stefano Bianchi
Indeed, I don't know why, but I am not able to reach all the machines of one network from the other; only some machines can connect to some others across the networks. In Mesos I see that at a certain point all the slaves are connected, then disconnected, and after a while connected again, it
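When only some pairs of machines can talk, it can help to record reachability in both directions and look for one-way links. A small sketch, where the host names and observations are hypothetical; `can_reach` could be backed by a real probe run from each machine.

```python
from itertools import permutations


def asymmetric_pairs(hosts, can_reach):
    """Return sorted (a, b) pairs where a reaches b but b does not reach a."""
    return sorted(
        (a, b)
        for a, b in permutations(hosts, 2)
        if can_reach(a, b) and not can_reach(b, a)
    )


# Hypothetical observations: (source, destination) pairs that worked.
observed = {
    ("master1", "master2"),
    ("master2", "master1"),
    ("slave3", "master1"),   # slave3 reaches master1 ...
}                            # ... but master1 never reaches slave3
result = asymmetric_pairs(["master1", "master2", "slave3"],
                          lambda a, b: (a, b) in observed)
print(result)  # [('slave3', 'master1')]
```

One-way pairs like this are the pattern where a slave can register but the master cannot reach it back.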

Re: removed slave "ID": (131.154.96.172): health check timed out

2016-04-18 Thread Alex Rukletsov
Does this also happen when master3 is leading? My guess is that you're not allowing incoming connections from master1 and master2 to slave3. In general, masters should be able to connect to slaves, not just respond to their requests. On 18 Apr 2016 13:17, "Stefano Bianchi"
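A quick way to check this from each master is to try opening a TCP connection to the slave's port. This sketch assumes the slave listens on 5051, the default mesos-slave port; `can_connect` is a hypothetical helper, and the self-check at the end uses a throwaway listener on localhost.

```python
import socket

MESOS_SLAVE_PORT = 5051  # default mesos-slave port; adjust if --port was changed


def can_connect(host, port=MESOS_SLAVE_PORT, timeout=3.0):
    """Return True if this machine can open a TCP connection to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


# Self-check on localhost: a listening port is reachable, a closed one is not.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
open_port = listener.getsockname()[1]
reachable = can_connect("127.0.0.1", open_port)
listener.close()
refused = can_connect("127.0.0.1", open_port)
print(reachable, refused)
```

Run on master1 and master2, `can_connect("131.154.96.172")` returning False while the slave still shows up as registered would be consistent with Alex's guess.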

removed slave "ID": (131.154.96.172): health check timed out

2016-04-18 Thread Stefano Bianchi
Hi, on OpenStack I plugged two virtual networks into the same virtual router so that the hosts on the two networks can communicate with each other. This is my topology: ---internet--- | Router1