[jira] [Comment Edited] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network

2018-07-05 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534372#comment-16534372
 ] 

Qian Zhang edited comment on MESOS-9031 at 7/6/18 2:21 AM:
---

After more investigation, I found that there are 4 issues in total. The root 
cause of all 4 is that some iptables rules are missing, so packets are dropped 
by the default policy of the FORWARD chain. Here is the summary:

*Case 1:* Two containers join the `mesos-cni0` bridge network. The first 
container listens on its port 80, which is mapped to the host's port 8080, and 
the second container tries to access hostIP:8080.

We need the rule `iptables -t filter -A FORWARD -o mesos-cni0 -j ACCEPT` for 
this case. Without it, the packets sent from the second container are dropped 
by the default policy of the FORWARD chain; once the rule is added, the packets 
are accepted and reach the first container successfully.

*Case 2:* One container joins the `mesos-cni0` bridge network and tries to 
access the world outside the host on which it runs (e.g., it executes the 
command `curl www.myip.ch`).

We need the rule 
`iptables -t filter -A FORWARD -i mesos-cni0 ! -o mesos-cni0 -j ACCEPT` for 
this case. Without it, the packets sent from the container are dropped by the 
default policy of the FORWARD chain; once the rule is added, the packets are 
accepted and sent out of the host successfully.

*Case 3:* One container joins the `mesos-cni0` bridge network and listens on 
its port 80, which is mapped to the host's port 8080, and we try to access 
hostIP:8080 from another host.

We need both rules mentioned in cases 1 & 2: without the rule from case 1, the 
packets sent from the other host are dropped by the default policy of the 
FORWARD chain, and without the rule from case 2, the reply packets sent by the 
container are dropped as well.

*Case 4:* Two containers join the `mesos-cni0` bridge network. The first 
container listens on its port 80, and the second container tries to access the 
first container's IP:80.

We need the rule 
`iptables -t filter -A FORWARD -i mesos-cni0 -o mesos-cni0 -j ACCEPT` for this 
case. Without it, the packets sent from the second container are dropped by the 
default policy of the FORWARD chain; once the rule is added, the packets are 
accepted and reach the first container successfully.

Please note that we only have these issues when the default policy of the 
FORWARD chain is DROP; if it is ACCEPT, there are no issues.

So to resolve the above issues, we need to add the 3 rules mentioned in the 
above cases to the FORWARD chain. This is also how Docker handles these issues: 
you will find the following rules on a host which has Docker installed.
{code:java}
$ sudo iptables -t filter -nvL 
...
Chain FORWARD (policy DROP 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination 
...
0 0 ACCEPT all -- * docker0 0.0.0.0/0 0.0.0.0/0 ctstate RELATED,ESTABLISHED
0 0 ACCEPT all -- docker0 !docker0 0.0.0.0/0 0.0.0.0/0 
0 0 ACCEPT all -- docker0 docker0 0.0.0.0/0 0.0.0.0/0{code}
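
The four cases can be sketched as a small first-match simulation of the FORWARD chain. This is only an illustrative model, not how iptables or Mesos is implemented: the `Rule` class, `forward_verdict` and the uplink name `eth0` are hypothetical, the interface pairs assume the DNAT of hostIP:8080 has already happened before the FORWARD chain is consulted, and case 4 assumes bridged traffic is passed through iptables at all.

```python
from dataclasses import dataclass
from typing import Optional

BRIDGE = "mesos-cni0"
UPLINK = "eth0"  # assumption: name of the host's external interface


@dataclass(frozen=True)
class Rule:
    """A simplified FORWARD rule that only matches on in/out interface."""
    in_if: Optional[str] = None   # None means "any input interface"
    out_if: Optional[str] = None  # None means "any output interface"
    out_negated: bool = False     # True models "! -o <out_if>"
    target: str = "ACCEPT"

    def matches(self, in_if: str, out_if: str) -> bool:
        if self.in_if is not None and in_if != self.in_if:
            return False
        if self.out_if is not None:
            same = out_if == self.out_if
            if same == self.out_negated:
                return False
        return True


def forward_verdict(rules, in_if, out_if, policy="DROP"):
    """Return the target of the first matching rule, else the chain policy."""
    for rule in rules:
        if rule.matches(in_if, out_if):
            return rule.target
    return policy


# The three rules proposed in the comment, in the order they are mentioned.
RULES = [
    Rule(out_if=BRIDGE),                                  # -o mesos-cni0
    Rule(in_if=BRIDGE, out_if=BRIDGE, out_negated=True),  # -i mesos-cni0 ! -o mesos-cni0
    Rule(in_if=BRIDGE, out_if=BRIDGE),                    # -i mesos-cni0 -o mesos-cni0
]

# (input interface, output interface) assumed for each forwarded packet.
PACKETS = {
    "case 1: container -> hostIP:8080": (BRIDGE, BRIDGE),
    "case 2: container -> outside world": (BRIDGE, UPLINK),
    "case 3: other host -> hostIP:8080": (UPLINK, BRIDGE),
    "case 3: reply from container": (BRIDGE, UPLINK),
    "case 4: container -> container IP:80": (BRIDGE, BRIDGE),
}

if __name__ == "__main__":
    for name, (in_if, out_if) in PACKETS.items():
        without = forward_verdict([], in_if, out_if)
        with_rules = forward_verdict(RULES, in_if, out_if)
        print(f"{name}: no rules={without}, with rules={with_rules}")
```

In this model every packet falls through to the DROP policy when the chain is empty and is accepted once the three rules are present; with a default policy of ACCEPT the rules make no difference.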


[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network

2018-07-05 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534372#comment-16534372
 ] 

Qian Zhang commented on MESOS-9031:
---

After more investigation, I found that there are 4 issues in total. The root 
cause of all 4 is that some iptables rules are missing, so packets are dropped 
by the default policy of the FORWARD chain. Here is the summary:

*Case 1:* Two containers join the `mesos-cni0` bridge network. The first 
container listens on its port 80, which is mapped to the host's port 8080, and 
the second container tries to access hostIP:8080.

We need the rule `iptables -t filter -A FORWARD -o mesos-cni0 -j ACCEPT` for 
this case. Without it, the packets sent from the second container are dropped 
by the default policy of the FORWARD chain; once the rule is added, the packets 
are accepted and reach the first container successfully.

*Case 2:* One container joins the `mesos-cni0` bridge network and tries to 
access the world outside the host on which it runs (e.g., it executes the 
command `curl www.myip.ch`).

We need the rule 
`iptables -t filter -A FORWARD -i mesos-cni0 ! -o mesos-cni0 -j ACCEPT` for 
this case. Without it, the packets sent from the container are dropped by the 
default policy of the FORWARD chain; once the rule is added, the packets are 
accepted and sent out of the host successfully.

*Case 3:* One container joins the `mesos-cni0` bridge network and listens on 
its port 80, which is mapped to the host's port 8080, and we try to access 
hostIP:8080 from another host.

We need both rules mentioned in cases 1 & 2: without the rule from case 1, the 
packets sent from the other host are dropped by the default policy of the 
FORWARD chain, and without the rule from case 2, the reply packets sent by the 
container are dropped as well.

*Case 4:* Two containers join the `mesos-cni0` bridge network. The first 
container listens on its port 80, and the second container tries to access the 
first container's IP:80.

We need the rule 
`iptables -t filter -A FORWARD -i mesos-cni0 -o mesos-cni0 -j ACCEPT` for this 
case. Without it, the packets sent from the second container are dropped by the 
default policy of the FORWARD chain; once the rule is added, the packets are 
accepted and reach the first container successfully.

Please note that we only have these issues when the default policy of the 
FORWARD chain is DROP; if it is ACCEPT, there are no issues.

> Mesos CNI portmap plugins' iptables rules doesn't allow connections via host 
> ip and port from the same bridge container network
> ---
>
> Key: MESOS-9031
> URL: https://issues.apache.org/jira/browse/MESOS-9031
> Project: Mesos
>  Issue Type: Bug
>  Components: cni, containerization
>Affects Versions: 1.6.0
>Reporter: Kirill Plyashkevich
>Assignee: Qian Zhang
>Priority: Major
>
> using `mesos-cni-port-mapper` with following config:
> {noformat}
> { 
>    "name" : "dcos", 
>    "type" : "mesos-cni-port-mapper", 
>    "excludeDevices" : [], 
>    "chain": "MESOS-CNI0-PORT-MAPPER", 
>    "delegate": { 
>    "type": "bridge", 
>    "bridge": "mesos-cni0", 
>    "isGateway": true, 
>    "ipMasq": true, 
>    "hairpinMode": true, 
>    "ipam": { 
>    "type": "host-local", 
>    "ranges": [ 
>    [{"subnet": "172.26.0.0/16"}] 
>    ], 
>    "routes": [ 
>    {"dst": "0.0.0.0/0"} 
>    ] 
>    } 
>    } 
> }
> {noformat}
>  - 2 services running on the same mesos-slave using unified containerizer in 
> different tasks and communicating via host ip and host port
>  - connection timeouts due to iptables rules per container CNI-XXX chain
>  - actually timeouts are caused by
> {noformat}
> Chain CNI-XXX (1 references)
> num  target      prot opt source    destination
> 1    ACCEPT      all  --  anywhere  172.26.0.0/16              /* name: "dcos" id: "" */
> 2    MASQUERADE  all  --  anywhere  !base-address.mcast.net/4  /* name: "dcos" id: "" */
> {noformat}
> rule #1 is executed and no masquerading happens.
> there are multiple solutions:
>  - -simplest and fastest one is not to add that ACCEPT- - NOT A SOLUTION. 
> it's happening in `bridge` plugin and `cni/portmap` shows that 
> snat/masquerade should be done during portmapping as well.
>  - perhaps, there's a better change in iptables rules that can fix it
>  - proper one (imho) is to finally implement cni spec 0.3.x in order to be 
> able to use chaining of plugins and use cni's `bridge` 

[jira] [Comment Edited] (MESOS-9040) Break scheduler driver dependency on mesos-local.

2018-07-05 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534291#comment-16534291
 ] 

Till Toenshoff edited comment on MESOS-9040 at 7/5/18 11:46 PM:


[~jamespeach] one reason why it was likely not that popular in the past is 
that several of our examples failed to implement the {{master}} flag 
accordingly. I set that straight at some point with 
[73f6522|https://github.com/apache/mesos/commit/73f6522905db5114d1bd6d99a3bab0793b14b86d].

I still believe that this was a nice feature (hence my efforts), but I also see 
how it results in a not-so-clean dependency tree. As you can see in my commit, 
more is done in the examples to make the entire feature convenient: there is no 
need to manually set up ACLs, derived roles and authentication. It does indeed 
decrease friction a lot, especially for newbies.

I can see the following options:
 # Kill that feature
 # Have an additional (convenience) library for use in the examples that 
provides this feature, as Ben suggested
 # Let the driver fork-exec `mesos-local` when signaled to do so, without 
reaping it - but then we need to get the bound IP... not sure how
 # Leave as is

I currently like (3) as it solves the linkage issue while still leaving the 
feature intact; the open question is how to do that properly.
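
One conceivable way to learn the bound address in option (3) is for the child to report it on a pipe. The sketch below only illustrates that idea in Python: the child is a stand-in script, not `mesos-local`, and the convention of printing `host:port` as the first line of stdout is purely an assumption, as are the names `CHILD_SOURCE` and `spawn_and_get_address`.

```python
import subprocess
import sys

# Stand-in for the spawned process: binds an ephemeral port, reports the
# bound address on stdout, then stays alive for a short while.
CHILD_SOURCE = r"""
import socket, sys, time
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()
print(f"{host}:{port}", flush=True)
time.sleep(float(sys.argv[1]))
"""


def spawn_and_get_address(lifetime_s: float = 1.0):
    """Start the child and read the address it reports on its stdout."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, str(lifetime_s)],
        stdout=subprocess.PIPE,
        text=True,
    )
    line = child.stdout.readline().strip()
    host, _, port = line.rpartition(":")
    return child, host, int(port)


if __name__ == "__main__":
    child, host, port = spawn_and_get_address()
    print(f"child {child.pid} is bound to {host}:{port}")
    child.wait()  # the demo waits for the child; a driver might choose not to
```

Whether the parent reaps the child afterwards is a separate decision; the demo waits only so that it exits cleanly.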

Please note that the scheduler library also implements this feature, not just 
the scheduler driver. See 
[https://github.com/apache/mesos/blob/16455857ee00e98147d8cb9fb6f31b22554dfe52/src/scheduler/scheduler.cpp#L191-L196]


> Break scheduler driver dependency on mesos-local.
> -
>
> Key: MESOS-9040
> URL: https://issues.apache.org/jira/browse/MESOS-9040
> Project: Mesos
>  Issue Type: Task
>  Components: build, scheduler driver
>Reporter: James Peach
>Priority: Minor
>
> The scheduler driver in {{src/sched/sched.cpp}} has some special dependencies 
> on the {{mesos-local}} code. This seems fairly hacky, but it also causes 
> binary dependencies on {{src/local/local.cpp}} to be dragged into 
> {{libmesos.so}}. {{libmesos.so}} would not otherwise require this code, which 
> could be isolated in the {{mesos-local}} command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-9040) Break scheduler driver dependency on mesos-local.

2018-07-05 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534291#comment-16534291
 ] 

Till Toenshoff edited comment on MESOS-9040 at 7/5/18 11:43 PM:


[~jamespeach] one reason why it was likely not that popular in the past is 
that several of our examples failed to implement the {{master}} flag 
accordingly. I set that straight at some point with 
[73f6522|https://github.com/apache/mesos/commit/73f6522905db5114d1bd6d99a3bab0793b14b86d].

I still believe that this was a nice feature (hence my efforts), but I also see 
how it results in a not-so-clean dependency tree. As you can see in my commit, 
more is done in the examples to make the entire feature convenient: there is no 
need to manually set up ACLs, derived roles and authentication. It does indeed 
decrease friction a lot, especially for newbies.

I can see the following options:
 # Kill that feature
 # Have an additional (convenience) library for use in the examples that 
provides this feature, as Ben suggested
 # Let the driver fork-exec `mesos-local` when signaled to do so, without 
reaping it
 # Leave as is

I currently like (3) as it solves the linkage issue while still leaving the 
feature intact.

Please note that the scheduler library also implements this feature, not just 
the scheduler driver. See 
[https://github.com/apache/mesos/blob/16455857ee00e98147d8cb9fb6f31b22554dfe52/src/scheduler/scheduler.cpp#L191-L196]


> Break scheduler driver dependency on mesos-local.
> -
>
> Key: MESOS-9040
> URL: https://issues.apache.org/jira/browse/MESOS-9040
> Project: Mesos
>  Issue Type: Task
>  Components: build, scheduler driver
>Reporter: James Peach
>Priority: Minor
>
> The scheduler driver in {{src/sched/sched.cpp}} has some special dependencies 
> on the {{mesos-local}} code. This seems fairly hacky, but it also causes 
> binary dependencies on {{src/local/local.cpp}} to be dragged into 
> {{libmesos.so}}. {{libmesos.so}} would not otherwise require this code, which 
> could be isolated in the {{mesos-local}} command.





[jira] [Commented] (MESOS-9040) Break scheduler driver dependency on mesos-local.

2018-07-05 Thread Till Toenshoff (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534291#comment-16534291
 ] 

Till Toenshoff commented on MESOS-9040:
---

[~jamespeach] one reason why it was likely not that popular in the past is 
that several of our examples failed to implement the {{master}} flag 
accordingly. I set that straight at some point with 
[73f6522|https://github.com/apache/mesos/commit/73f6522905db5114d1bd6d99a3bab0793b14b86d].

I still believe that this was a nice feature (hence my efforts), but I also see 
how it results in a not-so-clean dependency tree. As you can see in my commit, 
more is done in the examples to make the entire feature convenient: there is no 
need to manually set up ACLs, derived roles and authentication. It does indeed 
decrease friction a lot, especially for newbies.

I can see the following options:
 # Kill that feature
 # Have an additional (convenience) library for use in the examples that 
provides this feature, as Ben suggested
 # Let the driver fork-exec `mesos-local` when signaled to do so, without 
reaping it
 # Leave as is

I currently like (3) as it solves the linkage issue while still leaving the 
feature intact.

Please note that the scheduler library also implements this feature, not just 
the scheduler driver. See 
https://github.com/apache/mesos/blob/16455857ee00e98147d8cb9fb6f31b22554dfe52/src/scheduler/scheduler.cpp#L191-L196

> Break scheduler driver dependency on mesos-local.
> -
>
> Key: MESOS-9040
> URL: https://issues.apache.org/jira/browse/MESOS-9040
> Project: Mesos
>  Issue Type: Task
>  Components: build, scheduler driver
>Reporter: James Peach
>Priority: Minor
>
> The scheduler driver in {{src/sched/sched.cpp}} has some special dependencies 
> on the {{mesos-local}} code. This seems fairly hacky, but it also causes 
> binary dependencies on {{src/local/local.cpp}} to be dragged into 
> {{libmesos.so}}. {{libmesos.so}} would not otherwise require this code, which 
> could be isolated in the {{mesos-local}} command.





[jira] [Commented] (MESOS-9052) Default executor should commit suicide if it cannot receive HTTP responses for LAUNCH_NESTED_CONTAINER calls.

2018-07-05 Thread Vinod Kone (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534077#comment-16534077
 ] 

Vinod Kone commented on MESOS-9052:
---

Instead of committing suicide, it should shut down the current task group, 
since one task/container failing to launch shouldn't impact other task groups.

Also, should this be applied more generically to all calls from the executor to 
the agent, or just to launch? 

cc [~gkleiman]

> Default executor should commit suicide if it cannot receive HTTP responses 
> for LAUNCH_NESTED_CONTAINER calls.
> -
>
> Key: MESOS-9052
> URL: https://issues.apache.org/jira/browse/MESOS-9052
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
>Reporter: Chun-Hung Hsiao
>Priority: Major
>
> If there is a network problem (e.g., a routing problem), it is possible that 
> the agent has received {{LAUNCH_NESTED_CONTAINER}} calls from the default 
> executor and launched the nested container, but the executor does not get the 
> HTTP response. This would result in tasks stuck at {{TASK_STARTING}} forever. 
> We should consider making the default executor commit suicide if it does not 
> receive the response in a reasonable amount of time. 





[jira] [Created] (MESOS-9052) Default executor should commit suicide if it cannot receive HTTP responses for LAUNCH_NESTED_CONTAINER calls.

2018-07-05 Thread Chun-Hung Hsiao (JIRA)
Chun-Hung Hsiao created MESOS-9052:
--

 Summary: Default executor should commit suicide if it cannot 
receive HTTP responses for LAUNCH_NESTED_CONTAINER calls.
 Key: MESOS-9052
 URL: https://issues.apache.org/jira/browse/MESOS-9052
 Project: Mesos
  Issue Type: Bug
  Components: executor
Affects Versions: 1.6.0, 1.5.0, 1.4.0, 1.7.0
Reporter: Chun-Hung Hsiao


If there is a network problem (e.g., a routing problem), it is possible that 
the agent has received {{LAUNCH_NESTED_CONTAINER}} calls from the default 
executor and launched the nested container, but the executor does not get the 
HTTP response. This would result in tasks stuck at {{TASK_STARTING}} forever. 
We should consider making the default executor commit suicide if it does not 
receive the response in a reasonable amount of time. 
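
The general shape of such a deadline can be sketched in Python as follows. This is not the default executor's code or any Mesos API: `wait_for_response`, the event and the `on_timeout` callback are hypothetical names, and what the callback does (terminate the executor, or shut down only the affected task group) is left open.

```python
import threading


def wait_for_response(response_received: threading.Event,
                      timeout_s: float,
                      on_timeout) -> str:
    """Wait until the response arrives or the deadline passes.

    `response_received` is expected to be set by whatever code handles the
    HTTP response; `on_timeout` is the recovery action chosen by the caller.
    """
    if response_received.wait(timeout_s):
        return "RESPONSE_RECEIVED"
    on_timeout()
    return "TIMED_OUT"


if __name__ == "__main__":
    never_answered = threading.Event()
    outcome = wait_for_response(
        never_answered,
        timeout_s=0.1,
        on_timeout=lambda: print("no response before the deadline"),
    )
    print(outcome)
```

Keeping the recovery action as a callback means the same wait can be reused if the deadline is later applied to other executor-to-agent calls.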





[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network

2018-07-05 Thread Kirill Plyashkevich (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533628#comment-16533628
 ] 

Kirill Plyashkevich commented on MESOS-9031:


[~qianzhang], for the purity of the check the iptables rules were clean, and I 
don't think that there was any drop rule on the server (Ubuntu 16.04.4 LTS).
Anyway, it doesn't seem to be a firewall issue; it still looks like something 
with routing the packets...

> Mesos CNI portmap plugins' iptables rules doesn't allow connections via host 
> ip and port from the same bridge container network
> ---
>
> Key: MESOS-9031
> URL: https://issues.apache.org/jira/browse/MESOS-9031
> Project: Mesos
>  Issue Type: Bug
>  Components: cni, containerization
>Affects Versions: 1.6.0
>Reporter: Kirill Plyashkevich
>Assignee: Qian Zhang
>Priority: Major
>
> using `mesos-cni-port-mapper` with following config:
> {noformat}
> { 
>    "name" : "dcos", 
>    "type" : "mesos-cni-port-mapper", 
>    "excludeDevices" : [], 
>    "chain": "MESOS-CNI0-PORT-MAPPER", 
>    "delegate": { 
>    "type": "bridge", 
>    "bridge": "mesos-cni0", 
>    "isGateway": true, 
>    "ipMasq": true, 
>    "hairpinMode": true, 
>    "ipam": { 
>    "type": "host-local", 
>    "ranges": [ 
>    [{"subnet": "172.26.0.0/16"}] 
>    ], 
>    "routes": [ 
>    {"dst": "0.0.0.0/0"} 
>    ] 
>    } 
>    } 
> }
> {noformat}
>  - 2 services running on the same mesos-slave using unified containerizer in 
> different tasks and communicating via host ip and host port
>  - connection timeouts due to iptables rules per container CNI-XXX chain
>  - actually timeouts are caused by
> {noformat}
> Chain CNI-XXX (1 references)
> num  target      prot opt source    destination
> 1    ACCEPT      all  --  anywhere  172.26.0.0/16              /* name: "dcos" id: "" */
> 2    MASQUERADE  all  --  anywhere  !base-address.mcast.net/4  /* name: "dcos" id: "" */
> {noformat}
> rule #1 is executed and no masquerading happens.
> there are multiple solutions:
>  - -simplest and fastest one is not to add that ACCEPT- - NOT A SOLUTION. 
> it's happening in `bridge` plugin and `cni/portmap` shows that 
> snat/masquerade should be done during portmapping as well.
>  - perhaps, there's a better change in iptables rules that can fix it
>  - proper one (imho) is to finally implement cni spec 0.3.x in order to be 
> able to use chaining of plugins and use cni's `bridge` and `portmap` plugins 
> in chain (and get rid of mesos-cni-port-mapper completely eventually).





[jira] [Commented] (MESOS-8985) Posting to the operator api with 'accept recordio' header can crash the agent

2018-07-05 Thread Alexander Rukletsov (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533496#comment-16533496
 ] 

Alexander Rukletsov commented on MESOS-8985:


[~bmahler] This does not look like a critical issue to me, hence no back port.

> Posting to the operator api with 'accept recordio' header can crash the agent
> -
>
> Key: MESOS-8985
> URL: https://issues.apache.org/jira/browse/MESOS-8985
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 1.4.1, 1.5.1
>Reporter: Philip Norman
>Assignee: Benno Evers
>Priority: Major
>  Labels: mesosphere
> Fix For: 1.7.0
>
> Attachments: mesos-slave-crash.log
>
>
> It's possible to crash the mesos agent by posting a reasonable request to the 
> operator API.
> h3. Background:
> Sending a request to the v1 api endpoint with an unsupported 'accept' header:
> {code:java}
> curl -X POST http://10.0.3.27:5051/api/v1 \
>   -H 'accept: application/atom+xml' \
>   -H 'content-type: application/json' \
>   -d '{"type":"GET_CONTAINERS","get_containers":{"show_nested": 
> true,"show_standalone": true}}'{code}
> Results in the following friendly error message:
> {code:java}
> Expecting 'Accept' to allow application/json or application/x-protobuf or 
> application/recordio{code}
> h3. Reproducible crash:
> However, sending the same request with 'application/recordio' 'accept' header:
> {code:java}
> curl -X POST \
> http://10.0.3.27:5051/api/v1 \
>   -H 'accept: application/recordio' \
>   -H 'content-type: application/json' \
>   -d '{"type":"GET_CONTAINERS","get_containers":{"show_nested": 
> true,"show_standalone": true}}'{code}
> causes the agent to crash (no response is received).
> Crash log is shown below, full log from the agent is attached here:
> {code:java}
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: 
> I0607 22:30:32.397320 3743 logfmt.cpp:178] type=audit timestamp=2018-06-07 
> 22:30:32.397243904+00:00 reason="Error in token 'Missing 'Authorization' 
> header from HTTP request'. Allowing anonymous connection" 
> object="/slave(1)/api/v1" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 
> 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 
> Safari/537.36" authorizer="mesos-agent" action="POST" result=allow 
> srcip=10.0.6.99 dstport=5051 srcport=42084 dstip=10.0.3.27
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: 
> W0607 22:30:32.397434 3743 authenticator.cpp:289] Error in token on request 
> from '10.0.6.99:42084': Missing 'Authorization' header from HTTP request
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: 
> W0607 22:30:32.397466 3743 authenticator.cpp:291] Falling back to anonymous 
> connection using user 'dcos_anonymous'
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: 
> I0607 22:30:32.397629 3748 http.cpp:1099] HTTP POST for /slave(1)/api/v1 from 
> 10.0.6.99:42084 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 
> 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 
> Safari/537.36'
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: 
> I0607 22:30:32.397784 3748 http.cpp:2030] Processing GET_CONTAINERS call
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: 
> F0607 22:30:32.398736 3747 http.cpp:121] Serializing a RecordIO stream is not 
> supported
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: 
> *** Check failure stack trace: ***
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 
> 0x7f619478636d google::LogMessage::Fail()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 
> 0x7f619478819d google::LogMessage::SendToLog()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 
> 0x7f6194785f5c google::LogMessage::Flush()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 
> 0x7f6194788a99 google::LogMessageFatal::~LogMessageFatal()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 
> 0x7f61935e2b9d mesos::internal::serialize()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 
> 0x7f6193a4c0ef 
> _ZNO6lambda12CallableOnceIFN7process6FutureINS1_4http8ResponseEEERKN4JSON5ArrayEEE10CallableFnIZNK5mesos8internal5slave4Http13getContainersERKNSD_5agent4CallENSD_11ContentTypeERK6OptionINS3_14authentication9PrincipalEEEUlRKNS2_IS7_EEE0_EclES9_
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 
> 0x7f6193a81d61 process::internal::thenf<>()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal 

[jira] [Comment Edited] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network

2018-07-05 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533475#comment-16533475
 ] 

Qian Zhang edited comment on MESOS-9031 at 7/5/18 10:22 AM:


[~Kirill P] It's weird. I think there should be no firewall issue in your 
environment, since the default policy of your FORWARD chain is ACCEPT:
{quote}:FORWARD ACCEPT [146628:34388360]
{quote}
So no packets will be dropped by this chain.

In my environment (an Ubuntu 17.10 VM), the default policy of the FORWARD chain 
is DROP, which is why I need to add that rule ("iptables -t filter -A FORWARD -o 
mesos-cni0 -j ACCEPT").

Can you please try an application other than akka? E.g., launch a simple 
HTTP server ("python -m SimpleHTTPServer 8080") in a container that joins a CNI 
bridge network and maps its port to a host port, and then launch another 
container in the same CNI bridge network to access that HTTP server via 
hostIP:hostPort. Does it have the same timeout issue?
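
For the client side of such a test, a small probe that distinguishes a timeout from a refused connection may be handy. The following is only a sketch: `probe` is a hypothetical helper, and the host and port passed to it would be the real hostIP and mapped host port.

```python
import socket


def probe(host: str, port: int, timeout_s: float = 3.0) -> str:
    """Try a TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return "connected"
    except socket.timeout:
        # No answer at all within the deadline.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as error:
        return f"error: {error}"
```

Calling `probe(host_ip, 8080)` from the second container (with `host_ip` set to the agent's address) would report "timeout" for the symptom described in this issue, while "refused" would point to a different problem, such as the server not listening on the mapped port.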


> Mesos CNI portmap plugins' iptables rules doesn't allow connections via host 
> ip and port from the same bridge container network
> ---
>
> Key: MESOS-9031
> URL: https://issues.apache.org/jira/browse/MESOS-9031
> Project: Mesos
>  Issue Type: Bug
>  Components: cni, containerization
>Affects Versions: 1.6.0
>Reporter: Kirill Plyashkevich
>Assignee: Qian Zhang
>Priority: Major
>
> using `mesos-cni-port-mapper` with following config:
> {noformat}
> {
>   "name": "dcos",
>   "type": "mesos-cni-port-mapper",
>   "excludeDevices": [],
>   "chain": "MESOS-CNI0-PORT-MAPPER",
>   "delegate": {
>     "type": "bridge",
>     "bridge": "mesos-cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "hairpinMode": true,
>     "ipam": {
>       "type": "host-local",
>       "ranges": [
>         [{"subnet": "172.26.0.0/16"}]
>       ],
>       "routes": [
>         {"dst": "0.0.0.0/0"}
>       ]
>     }
>   }
> }
> {noformat}
>  - 2 services running on the same mesos-slave using unified containerizer in 
> different tasks and communicating via host ip and host port
>  - connection timeouts due to iptables rules per container CNI-XXX chain
>  - actually timeouts are caused by
> {noformat}
> Chain CNI-XXX (1 references)
> num  target      prot opt source    destination
> 1    ACCEPT      all  --  anywhere  172.26.0.0/16              /* name: "dcos" id: "" */
> 2    MASQUERADE  all  --  anywhere  !base-address.mcast.net/4  /* name: "dcos" id: "" */
> {noformat}
> rule #1 matches first and accepts the packet, so rule #2 is never reached and 
> no masquerading happens.
> there are multiple possible solutions:
>  - -simplest and fastest one is not to add that ACCEPT- - NOT A SOLUTION. 
> it's happening in the `bridge` plugin, and `cni/portmap` shows that 
> SNAT/masquerade should be done during port mapping as well.
>  - perhaps there's a better change to the iptables rules that can fix it
>  - the proper one (imho) is to finally implement CNI spec 0.3.x in order to be 
> able to chain plugins and use CNI's `bridge` and `portmap` plugins 
> in a chain (and eventually get rid of mesos-cni-port-mapper completely).
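
The first-match behaviour described in the issue can be illustrated with a toy model of the two CNI-XXX rules. This is only a sketch of how rule ordering plays out, not how iptables or the plugin is implemented; `RULES` and `first_match` are illustrative names, and 224.0.0.0/4 is the multicast range that `iptables -L` displays as base-address.mcast.net/4.

```python
import ipaddress

BRIDGE_SUBNET = ipaddress.ip_network("172.26.0.0/16")
MULTICAST = ipaddress.ip_network("224.0.0.0/4")  # listed as base-address.mcast.net/4

# (target, destination predicate) pairs in the same order as the quoted chain.
RULES = [
    ("ACCEPT", lambda dst: dst in BRIDGE_SUBNET),
    ("MASQUERADE", lambda dst: dst not in MULTICAST),
]


def first_match(destination):
    """Return the target of the first rule whose destination test matches, else None."""
    dst = ipaddress.ip_address(destination)
    for target, matches in RULES:
        if matches(dst):
            return target  # both targets end traversal of this chain
    return None
```

For a destination inside the bridge subnet, such as `first_match("172.26.0.5")`, the model stops at ACCEPT and never reaches MASQUERADE, which matches the behaviour reported above.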



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network

2018-07-05 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533475#comment-16533475
 ] 

Qian Zhang edited comment on MESOS-9031 at 7/5/18 10:13 AM:


[~Kirill P] That's odd. I don't think there is a firewall issue in your 
environment, since the default policy of your FORWARD chain is ACCEPT:
{quote}:FORWARD ACCEPT [146628:34388360]
{quote}
So the chain's default policy will not drop any packets.

In my environment (an Ubuntu 17.10 VM), the default policy of the FORWARD chain 
is DROP, which is why I need to add this rule: "iptables -t filter -A FORWARD -o 
mesos-cni0 -j ACCEPT".




[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network

2018-07-05 Thread Qian Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533475#comment-16533475
 ] 

Qian Zhang commented on MESOS-9031:
---

[~Kirill P] That's odd. I don't think there is a firewall issue in your 
environment, since the default policy of your FORWARD chain is ACCEPT (":FORWARD 
ACCEPT [146628:34388360]"), so the chain's default policy will not drop any 
packets. In my environment (an Ubuntu 17.10 VM), the default policy of the 
FORWARD chain is DROP, which is why I need to add this rule: "iptables -t filter 
-A FORWARD -o mesos-cni0 -j ACCEPT".



[jira] [Assigned] (MESOS-9051) Move agent call validation into common validation library.

2018-07-05 Thread James Peach (JIRA)


 [ 
https://issues.apache.org/jira/browse/MESOS-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach reassigned MESOS-9051:
--

Assignee: James Peach

| [https://reviews.apache.org/r/67830/] | Moved `executor::Call` validation to 
common validation library. |

> Move agent call validation into common validation library.
> --
>
> Key: MESOS-9051
> URL: https://issues.apache.org/jira/browse/MESOS-9051
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, build
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>
> The executor driver calls {{executor::call::validate()}} from 
> {{src/slave/validation.cpp}}, which creates an upward dependency from 
> libmesos.so (where the executor driver has to live) to the agent. If we can 
> move the validation calls down to the common validation library, we can break 
> this dependency.


