Re: Debugging framework registration from inside docker

2015-06-12 Thread James Vanns
Hi Vinod - this is good news! Just the fact that I'm not barking up the
wrong tree and that indeed it is a known issue.

Cheers

Jim


On 11 June 2015 at 18:16, Vinod Kone vinodk...@gmail.com wrote:


 On Thu, Jun 11, 2015 at 4:00 AM, James Vanns jvanns@gmail.com wrote:

 I think I can conclude then that this just won't work; one cannot run a
 framework as a docker container using bridged networking. This is because
 the POST that libprocess makes to the MM on your framework's behalf
 includes the non-routable private docker IP, and that is what the MM will
 then try to communicate with? Setting LIBPROCESS_IP to the host IP will of
 course not work, because libprocess (or somewhere in the mesos framework
 code) then attempts to bind() to that address and fails, since the
 interface does not exist in bridge mode.


 You are right on track. This is a known issue:
 https://issues.apache.org/jira/browse/MESOS-809. Anindya has submitted a
 short-term fix, which unfortunately never landed. I'll shepherd and commit
 it.



 *If* the above is correct then the question I suppose is why does the
 communication channel get established in that way? Why off the back of some
 data in a POST rather than the connected endpoint (that presumably docker
 would manage/forward as it would with a regular web service, for example)?
 Is this some caveat of using zookeeper?


 Longer term, the plan is for the master to reuse the connection opened
 by the scheduler rather than open a new one, as you mentioned. See
 https://issues.apache.org/jira/browse/MESOS-2289




 I'm sure someone will correct me where I'm wrong ;)


 You are not!




-- 
--
Senior Code Pig
Industrial Light & Magic


Re: Debugging framework registration from inside docker

2015-06-11 Thread James Vanns
For what exactly? I thought that was for slave-master communication?
There is no problem there. Or are you suggesting that from inside the
running container I set at least LIBPROCESS_IP to the host IP rather than
the IP of eth0 the container sees? Won't that screw with the docker bridge
routing?

This doesn't quite make sense. I have other network connections inside this
container and those channels are established and communicating fine. It's
just with the mesos master for some reason. Just to be clear;

* The running process is a scheduling framework
* It does not listen for any inbound connection requests
* It, of course, does attempt an outbound connection to the zookeeper to
get the MM
  (this works)
* It then attempts to establish a connection with the MM
  (this also works)
* When the MM sends a response, it fails - it effectively tries to send the
response back to the private/internal docker IP where my scheduler is
running.
* This problem disappears when run with --net=host

TCPDump never shows any inbound traffic;

IP 172.17.1.197.55182 > 172.20.121.193.5050
...

Therefore there is never any ACK# that corresponds with the SEQ# and these
are just re-transmissions. I think!
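
(For reference, the capture above came from something along these lines, run
inside the container - the interface name and filter are illustrative:)

  # Watch traffic to/from the master's port (5050); in bridge mode only
  # our outbound SYNs and their re-transmissions ever appear.
  tcpdump -i eth0 -nn 'port 5050'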

Jim


On 10 June 2015 at 18:16, Steven Schlansker sschlans...@opentable.com
wrote:

 On Jun 10, 2015, at 10:10 AM, James Vanns jvanns@gmail.com wrote:

  Hi. When attempting to run my scheduler inside a docker container in
 --net=bridge mode it never receives an acknowledgement or a reply to its
 registration request. However, it works fine in --net=host mode. It does
 not listen on any port as a service, so it does not expose any.
 
  The scheduler receives the mesos master (leader) from zookeeper fine but
 fails to register the framework with that master. It just loops trying to
 do so - the master sees the registration but deactivates it immediately as
 apparently it disconnects. It doesn't disconnect but is obviously
 unreachable. I see the reason for this in the sendto() call and the master
 log file -- because the internal docker bridge IP is included in the POST and
 perhaps that is how the master is trying to talk back
  to the requesting framework??
 
  Inside the container is this;
  tcp        0      0 0.0.0.0:44431           0.0.0.0:*               LISTEN      1/scheduler
 
  This is not my code! I'm at a loss where to go from here. Anyone got any
 further suggestions to fix this?

 You may need to try setting LIBPROCESS_IP and LIBPROCESS_PORT to hide the
 fact that you are on a virtual Docker interface.





-- 
--
Senior Code Pig
Industrial Light & Magic


Re: Debugging framework registration from inside docker

2015-06-11 Thread James Vanns
Looks like I share the same symptoms as this 'marathon inside container'
problem;

https://groups.google.com/d/topic/marathon-framework/aFIlv-VnF58/discussion

I guess that sheds some light on the subject ;)


On 11 June 2015 at 09:43, James Vanns jvanns@gmail.com wrote:

 For what exactly? I thought that was for slave-master communication?
 There is no problem there. Or are you suggesting that from inside the
 running container I set at least LIBPROCESS_IP to the host IP rather than
 the IP of eth0 the container sees? Won't that screw with the docker bridge
 routing?

 This doesn't quite make sense. I have other network connections inside
 this container and those channels are established and communicating fine.
 It's just with the mesos master for some reason. Just to be clear;

 * The running process is a scheduling framework
 * It does not listen for any inbound connection requests
 * It, of course, does attempt an outbound connection to the zookeeper to
 get the MM
   (this works)
 * It then attempts to establish a connection with the MM
   (this also works)
 * When the MM sends a response, it fails - it effectively tries to send the
 response back to the private/internal docker IP where my scheduler is
 running.
 * This problem disappears when run with --net=host

 TCPDump never shows any inbound traffic;

 IP 172.17.1.197.55182 > 172.20.121.193.5050
 ...

 Therefore there is never any ACK# that corresponds with the SEQ# and these
 are just re-transmissions. I think!

 Jim


 On 10 June 2015 at 18:16, Steven Schlansker sschlans...@opentable.com
 wrote:

 On Jun 10, 2015, at 10:10 AM, James Vanns jvanns@gmail.com wrote:

  Hi. When attempting to run my scheduler inside a docker container in
 --net=bridge mode it never receives an acknowledgement or a reply to its
 registration request. However, it works fine in --net=host mode. It does
 not listen on any port as a service, so it does not expose any.
 
  The scheduler receives the mesos master (leader) from zookeeper fine
 but fails to register the framework with that master. It just loops trying
 to do so - the master sees the registration but deactivates it immediately
 as apparently it disconnects. It doesn't disconnect but is obviously
 unreachable. I see the reason for this in the sendto() call and the master
 log file -- because the internal docker bridge IP is included in the POST and
 perhaps that is how the master is trying to talk back
  to the requesting framework??
 
  Inside the container is this;
  tcp        0      0 0.0.0.0:44431           0.0.0.0:*               LISTEN      1/scheduler
 
  This is not my code! I'm at a loss where to go from here. Anyone got
 any further suggestions to fix this?

 You may need to try setting LIBPROCESS_IP and LIBPROCESS_PORT to hide the
 fact that you are on a virtual Docker interface.





 --
 --
 Senior Code Pig
 Industrial Light & Magic




-- 
--
Senior Code Pig
Industrial Light & Magic


Re: Debugging framework registration from inside docker

2015-06-11 Thread James Vanns
I think I can conclude then that this just won't work; one cannot run a
framework as a docker container using bridged networking. This is because
the POST that libprocess makes to the MM on your framework's behalf
includes the non-routable private docker IP, and that is what the MM will
then try to communicate with? Setting LIBPROCESS_IP to the host IP will of
course not work, because libprocess (or somewhere in the mesos framework
code) then attempts to bind() to that address and fails, since the
interface does not exist in bridge mode.
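
(To spell out the three cases - a sketch, where the image name is a
placeholder and the IPs are the ones from my earlier capture:)

  # 1. Bridged, defaults: libprocess binds to and advertises the
  #    container's private IP (172.17.1.197), which the master cannot
  #    route back to, so registration never completes.
  docker run --net=bridge my-scheduler

  # 2. Bridged, LIBPROCESS_IP forced to the host IP: libprocess then
  #    tries to bind() to 172.20.121.193, an address the container does
  #    not own, and fails (EADDRNOTAVAIL).
  docker run --net=bridge -e LIBPROCESS_IP=172.20.121.193 my-scheduler

  # 3. Host networking: the container shares the host's interfaces, so
  #    binding and advertising both use a routable address - this works.
  docker run --net=host my-scheduler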

*If* the above is correct then the question I suppose is why does the
communication channel get established in that way? Why off the back of some
data in a POST rather than the connected endpoint (that presumably docker
would manage/forward as it would with a regular web service, for example)?
Is this some caveat of using zookeeper?

I'm sure someone will correct me where I'm wrong ;)

Cheers,

Jim


On 11 June 2015 at 10:00, James Vanns jvanns@gmail.com wrote:

 Looks like I share the same symptoms as this 'marathon inside container'
 problem;

 https://groups.google.com/d/topic/marathon-framework/aFIlv-VnF58/discussion

 I guess that sheds some light on the subject ;)


 On 11 June 2015 at 09:43, James Vanns jvanns@gmail.com wrote:

 For what exactly? I thought that was for slave-master communication?
 There is no problem there. Or are you suggesting that from inside the
 running container I set at least LIBPROCESS_IP to the host IP rather than
 the IP of eth0 the container sees? Won't that screw with the docker bridge
 routing?

 This doesn't quite make sense. I have other network connections inside
 this container and those channels are established and communicating fine.
 It's just with the mesos master for some reason. Just to be clear;

 * The running process is a scheduling framework
 * It does not listen for any inbound connection requests
 * It, of course, does attempt an outbound connection to the zookeeper to
 get the MM
   (this works)
 * It then attempts to establish a connection with the MM
   (this also works)
 * When the MM sends a response, it fails - it effectively tries to send the
 response back to the private/internal docker IP where my scheduler is
 running.
 * This problem disappears when run with --net=host

 TCPDump never shows any inbound traffic;

 IP 172.17.1.197.55182 > 172.20.121.193.5050
 ...

 Therefore there is never any ACK# that corresponds with the SEQ# and
 these are just re-transmissions. I think!

 Jim


 On 10 June 2015 at 18:16, Steven Schlansker sschlans...@opentable.com
 wrote:

 On Jun 10, 2015, at 10:10 AM, James Vanns jvanns@gmail.com wrote:

  Hi. When attempting to run my scheduler inside a docker container in
 --net=bridge mode it never receives an acknowledgement or a reply to its
 registration request. However, it works fine in --net=host mode. It does
 not listen on any port as a service, so it does not expose any.
 
  The scheduler receives the mesos master (leader) from zookeeper fine
 but fails to register the framework with that master. It just loops trying
 to do so - the master sees the registration but deactivates it immediately
 as apparently it disconnects. It doesn't disconnect but is obviously
 unreachable. I see the reason for this in the sendto() call and the master
 log file -- because the internal docker bridge IP is included in the POST and
 perhaps that is how the master is trying to talk back
  to the requesting framework??
 
  Inside the container is this;
  tcp        0      0 0.0.0.0:44431           0.0.0.0:*               LISTEN      1/scheduler
 
  This is not my code! I'm at a loss where to go from here. Anyone got
 any further suggestions to fix this?

 You may need to try setting LIBPROCESS_IP and LIBPROCESS_PORT to hide
 the fact that you are on a virtual Docker interface.





 --
 --
 Senior Code Pig
 Industrial Light & Magic




 --
 --
 Senior Code Pig
 Industrial Light & Magic




-- 
--
Senior Code Pig
Industrial Light & Magic


Re: Debugging framework registration from inside docker

2015-06-11 Thread Vinod Kone
On Thu, Jun 11, 2015 at 4:00 AM, James Vanns jvanns@gmail.com wrote:

 I think I can conclude then that this just won't work; one cannot run a
 framework as a docker container using bridged networking. This is because
 the POST that libprocess makes to the MM on your framework's behalf
 includes the non-routable private docker IP, and that is what the MM will
 then try to communicate with? Setting LIBPROCESS_IP to the host IP will of
 course not work, because libprocess (or somewhere in the mesos framework
 code) then attempts to bind() to that address and fails, since the
 interface does not exist in bridge mode.


You are right on track. This is a known issue:
https://issues.apache.org/jira/browse/MESOS-809. Anindya has submitted a
short-term fix, which unfortunately never landed. I'll shepherd and commit
it.
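
(If the short-term fix takes the shape discussed on that ticket - separate
'advertise' variables that decouple the address libprocess binds to from the
one it announces to the master - usage might look roughly like this. The
variable names and port mapping here are assumptions, not a landed API:)

  # Bind inside the container as usual, but advertise the routable host
  # address and a published port so the master's replies can get back in.
  docker run --net=bridge -p 9050:9050 \
    -e LIBPROCESS_PORT=9050 \
    -e LIBPROCESS_ADVERTISE_IP=172.20.121.193 \
    -e LIBPROCESS_ADVERTISE_PORT=9050 \
    my-scheduler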



 *If* the above is correct then the question I suppose is why does the
 communication channel get established in that way? Why off the back of some
 data in a POST rather than the connected endpoint (that presumably docker
 would manage/forward as it would with a regular web service, for example)?
 Is this some caveat of using zookeeper?


Longer term, the plan is for the master to reuse the connection opened
by the scheduler rather than open a new one, as you mentioned. See
https://issues.apache.org/jira/browse/MESOS-2289




 I'm sure someone will correct me where I'm wrong ;)


You are not!


Re: Debugging framework registration from inside docker

2015-06-11 Thread Tom Arnfeld
I believe you're correct, Jim - if you set LIBPROCESS_IP=$HOST_IP, libprocess
will try to bind to that address as well as announce it, which won't work
inside a bridged container.

We've been having a similar discussion on
https://github.com/wickman/pesos/issues/25.
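
(A quick way to see the bind() half of that from inside a bridged container -
a one-liner sketch using the host IP from Jim's capture:)

  # The host IP is not a local address inside the container, so this fails
  # straight away with EADDRNOTAVAIL ('Cannot assign requested address'):
  python -c "import socket; socket.socket().bind(('172.20.121.193', 0))"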



--
Tom Arnfeld
Developer // DueDil

On Thursday, Jun 11, 2015 at 10:00 am, James Vanns jvanns@gmail.com wrote:

 Looks like I share the same symptoms as this 'marathon inside container'
 problem;

 https://groups.google.com/d/topic/marathon-framework/aFIlv-VnF58/discussion

 I guess that sheds some light on the subject ;)


 On 11 June 2015 at 09:43, James Vanns jvanns@gmail.com wrote:

 For what exactly? I thought that was for slave-master communication?
 There is no problem there. Or are you suggesting that from inside the
 running container I set at least LIBPROCESS_IP to the host IP rather than
 the IP of eth0 the container sees? Won't that screw with the docker bridge
 routing?

 This doesn't quite make sense. I have other network connections inside
 this container and those channels are established and communicating fine.
 It's just with the mesos master for some reason. Just to be clear;

 * The running process is a scheduling framework
 * It does not listen for any inbound connection requests
 * It, of course, does attempt an outbound connection to the zookeeper to
 get the MM
   (this works)
 * It then attempts to establish a connection with the MM
   (this also works)
 * When the MM sends a response, it fails - it effectively tries to send the
 response back to the private/internal docker IP where my scheduler is
 running.
 * This problem disappears when run with --net=host

 TCPDump never shows any inbound traffic;

 IP 172.17.1.197.55182 > 172.20.121.193.5050
 ...

 Therefore there is never any ACK# that corresponds with the SEQ# and these
 are just re-transmissions. I think!

 Jim

 On 10 June 2015 at 18:16, Steven Schlansker sschlans...@opentable.com wrote:

 On Jun 10, 2015, at 10:10 AM, James Vanns jvanns@gmail.com wrote:

  Hi. When attempting to run my scheduler inside a docker container in
  --net=bridge mode it never receives an acknowledgement or a reply to its
  registration request. However, it works fine in --net=host mode. It does
  not listen on any port as a service, so it does not expose any.

  The scheduler receives the mesos master (leader) from zookeeper fine but
  fails to register the framework with that master. It just loops trying to
  do so - the master sees the registration but deactivates it immediately as
  apparently it disconnects. It doesn't disconnect but is obviously
  unreachable. I see the reason for this in the sendto() call and the master
  log file -- because the internal docker bridge IP is included in the POST
  and perhaps that is how the master is trying to talk back to the
  requesting framework??

  Inside the container is this;
  tcp        0      0 0.0.0.0:44431           0.0.0.0:*               LISTEN      1/scheduler

  This is not my code! I'm at a loss where to go from here. Anyone got any
  further suggestions to fix this?

 You may need to try setting LIBPROCESS_IP and LIBPROCESS_PORT to hide the
 fact that you are on a virtual Docker interface.

 --
 --
 Senior Code Pig
 Industrial Light & Magic


 --
 --
 Senior Code Pig
 Industrial Light & Magic

Re: Debugging framework registration from inside docker

2015-06-10 Thread Steven Schlansker
On Jun 10, 2015, at 10:10 AM, James Vanns jvanns@gmail.com wrote:

 Hi. When attempting to run my scheduler inside a docker container in
 --net=bridge mode it never receives an acknowledgement or a reply to its
 registration request. However, it works fine in --net=host mode. It does
 not listen on any port as a service, so it does not expose any.
 
 The scheduler receives the mesos master (leader) from zookeeper fine but 
 fails to register the framework with that master. It just loops trying to do 
 so - the master sees the registration but deactivates it immediately as 
 apparently it disconnects. It doesn't disconnect but is obviously 
 unreachable. I see the reason for this in the sendto() call and the master
 log file -- because the internal docker bridge IP is included in the POST and
 perhaps that is how the master is trying to talk back
 to the requesting framework?? 
 
 Inside the container is this;
 tcp        0      0 0.0.0.0:44431           0.0.0.0:*               LISTEN      1/scheduler
 
 This is not my code! I'm at a loss where to go from here. Anyone got any
 further suggestions to fix this?

You may need to try setting LIBPROCESS_IP and LIBPROCESS_PORT to hide the fact 
that you are on a virtual Docker interface.
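
(Concretely, that would look something like the sketch below - the port and
image name are illustrative, and the port must be published so the master can
reach it. As the other replies in this thread note, though, pointing
LIBPROCESS_IP at the host address fails in bridge mode because libprocess
also tries to bind() to it.)

  # Pin libprocess to a fixed, published port and advertise the host's IP
  # instead of the container's bridge IP:
  docker run --net=bridge -p 9050:9050 \
    -e LIBPROCESS_IP=172.20.121.193 \
    -e LIBPROCESS_PORT=9050 \
    my-scheduler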