Re: mesos network/port_mapping + spark traffic not flowing between containers

2017-03-19 Thread Jie Yu
Dominic,

This might be related to:
https://issues.apache.org/jira/browse/MESOS-7130

- Jie



mesos network/port_mapping + spark traffic not flowing between containers

2017-03-19 Thread Dominic Grégoire
Hello all,

I’m testing Mesos 1.1.0 on Amazon Linux to see whether it fits some of
our workloads, and I ran into a problem with the network/port_mapping
isolator. Is this a known issue?

The agent is running with these flags:
export MESOS_isolation=cgroups/cpu,cgroups/mem,network/port_mapping
export MESOS_containerizers=mesos
export MESOS_resources="ports:[31000-32000];ephemeral_ports:[32768-57344]"
export MESOS_ephemeral_ports_per_container=1024
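With these flags the ephemeral range 32768-57344 is carved into 1024-port slices, one per container. A minimal sketch that maps a port to its slice, assuming the slices are aligned to multiples of ephemeral_ports_per_container starting at 32768 (the helper name port_block is hypothetical, not part of Mesos):

```shell
#!/usr/bin/env bash
# Agent flag values from the message above.
EPH_LO=32768          # low end of ephemeral_ports
EPH_HI=57344          # high end of ephemeral_ports (inclusive)
PER_CONTAINER=1024    # ephemeral_ports_per_container

# Number of whole per-container slices in the range (integer division).
SLICES=$(( (EPH_HI - EPH_LO + 1) / PER_CONTAINER ))
# Last port covered by a whole slice.
LAST=$(( EPH_LO + SLICES * PER_CONTAINER - 1 ))

# Hypothetical helper: print the slice "start-end" that holds a port.
# ASSUMPTION: slices are aligned to multiples of PER_CONTAINER from EPH_LO.
port_block() {
  local port=$1
  if (( port < EPH_LO || port > LAST )); then
    echo "outside"
    return
  fi
  local start=$(( EPH_LO + (port - EPH_LO) / PER_CONTAINER * PER_CONTAINER ))
  echo "${start}-$(( start + PER_CONTAINER - 1 ))"
}

echo "slices: $SLICES"
for p in 33566 34294; do
  echo "$p -> $(port_block "$p")"
done
```

Under that assumption, the two ends of the stuck connection (33566 and 34294) fall in different slices, which fits traffic crossing from one container to the other.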

Running Spark 2.1.0 with two Mesos containers on the same host, the
containers can connect to each other’s block manager, but no data gets
through: it stays in the send queue (Send-Q) inside their network namespaces.

Spark is logging:
17/03/19 16:54:56 INFO TransportClientFactory: Successfully created connection to ip-10-32-20-34.ec2.internal/10.32.20.34:34294 after 12 ms (0 ms spent in bootstraps)
17/03/19 16:56:56 ERROR TransportChannelHandler: Connection to ip-10-32-20-34.ec2.internal/10.32.20.34:34294 has been quiet for 12 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
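The INFO and ERROR timestamps are exactly two minutes apart, which matches Spark's default spark.network.timeout of 120s. A small sketch that computes the gap from the HH:MM:SS fields, assuming both entries fall on the same day (to_secs is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Seconds since midnight for an HH:MM:SS timestamp.
# 10# forces base 10 so fields like "08" are not parsed as octal.
to_secs() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Timestamps copied from the INFO and ERROR lines of the Spark log.
# ASSUMPTION: both entries are on the same day (no midnight rollover).
connected="16:54:56"
declared_dead="16:56:56"

elapsed=$(( $(to_secs "$declared_dead") - $(to_secs "$connected") ))
echo "elapsed: ${elapsed}s"
```

If the gap simply tracks whatever spark.network.timeout is set to, raising the timeout would likely only delay the error rather than get the queued bytes delivered.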

I can see the connections established between the containers, but the data
stays in the send queues:
[root@ip-10-32-20-34 sysctl.d]# ip netns
4602 (id: 1)
4600 (id: 0)
[root@ip-10-32-20-34 sysctl.d]# ip netns exec 4600 netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.32.20.34:32861       0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:33003           0.0.0.0:*               LISTEN
tcp        0      0 10.32.20.34:33003       10.32.20.34:57363       ESTABLISHED
tcp        0      0 10.32.20.34:33566       10.32.20.34:34294       ESTABLISHED
tcp        0      0 10.32.20.34:33658       10.32.18.185:40600      ESTABLISHED
tcp        0      0 10.32.20.34:32832       10.32.18.185:40196      ESTABLISHED
tcp        0      0 10.32.20.34:33406       10.32.20.34:5051        ESTABLISHED
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ]         STREAM     CONNECTED     21869
unix  2      [ ]         STREAM     CONNECTED     20339
[root@ip-10-32-20-34 sysctl.d]# ip netns exec 4602 netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:33836           0.0.0.0:*               LISTEN
tcp        0      0 10.32.20.34:34294       0.0.0.0:*               LISTEN
tcp        0  24229 10.32.20.34:34294       10.32.20.34:33566       ESTABLISHED
tcp        0      0 10.32.20.34:33860       10.32.18.185:40196      ESTABLISHED
tcp        0      0 10.32.20.34:34680       10.32.18.185:40600      ESTABLISHED
tcp        0      0 10.32.20.34:34434       10.32.20.34:5051        ESTABLISHED
tcp        0      0 10.32.20.34:33836       10.32.20.34:58149       ESTABLISHED
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ]         STREAM     CONNECTED     20359
unix  2      [ ]         STREAM     CONNECTED     20373
[root@ip-10-32-20-34 sysctl.d]#
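For an established TCP socket, netstat's Send-Q column counts bytes not yet acknowledged by the remote host, so a non-zero value that never drains is the symptom described here. A minimal sketch that filters netstat output for such sockets and runs the check in every named network namespace; the function names stuck_sends and scan_all_netns are hypothetical:

```shell
#!/usr/bin/env bash
# Print established TCP sockets whose Send-Q (column 3 of netstat output)
# is non-zero: bytes queued by the kernel but not yet acknowledged by the peer.
# Reads netstat output on stdin; prints "SendQ local -> foreign".
stuck_sends() {
  awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" && $3 > 0 { print $3, $4, "->", $5 }'
}

# Run the check inside every named network namespace (needs root).
# Namespace names come from the first column of `ip netns list`,
# e.g. "4602 (id: 1)" -> "4602".
scan_all_netns() {
  local ns
  for ns in $(ip netns list | awk '{ print $1 }'); do
    echo "== netns $ns"
    ip netns exec "$ns" netstat -ant | stuck_sends
  done
}
```

Run as root on the agent, scan_all_netns would be expected, going by the output above, to report only the 24229-byte queue on 10.32.20.34:34294 -> 10.32.20.34:33566 in netns 4602; repeating it shows whether that queue ever drains.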