[jira] [Comment Edited] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534372#comment-16534372 ]

Qian Zhang edited comment on MESOS-9031 at 7/6/18 2:21 AM:
---

After more investigation, I found that there are 4 issues in total. The root cause of all 4 is that some iptables rules are missing, which causes packets to be dropped by the default policy of the FORWARD chain. Here is the summary:

*Case 1:* Two containers join the `mesos-cni0` bridge network. The first container listens on its port 80, which is mapped to the host's port 8080, and the second container tries to access hostIP:8080. This case needs the rule `iptables -t filter -A FORWARD -o mesos-cni0 -j ACCEPT`. Without it, the packets sent from the second container are dropped by the default policy of the FORWARD chain; once the rule is added, the packets are accepted and reach the first container.

*Case 2:* One container joins the `mesos-cni0` bridge network and tries to access the world outside the host on which it runs (e.g., it executes the command `curl www.myip.ch`). This case needs the rule `iptables -t filter -A FORWARD -i mesos-cni0 ! -o mesos-cni0 -j ACCEPT`. Without it, the packets sent from the container are dropped by the default policy of the FORWARD chain; once the rule is added, the packets are accepted and sent out of the host.

*Case 3:* One container joins the `mesos-cni0` bridge network and listens on its port 80, which is mapped to the host's port 8080, and we try to access hostIP:8080 from another host. This case needs both rules from cases 1 and 2: without the rule from case 1, the packets sent from the other host are dropped by the default policy of the FORWARD chain, and without the rule from case 2, the reply packets sent by the container are dropped as well.

*Case 4:* Two containers join the `mesos-cni0` bridge network. The first container listens on its port 80, and the second container tries to access the first container's IP:80. This case needs the rule `iptables -t filter -A FORWARD -i mesos-cni0 -o mesos-cni0 -j ACCEPT`. Without it, the packets sent from the second container are dropped by the default policy of the FORWARD chain; once the rule is added, the packets are accepted and reach the first container.

Please note that these issues only occur when the default policy of the FORWARD chain is DROP; if it is ACCEPT, there are no issues.

So to resolve them, we need to add the 3 rules mentioned above to the FORWARD chain. This is actually how Docker handles these issues as well; you will find the following rules on a host that has Docker installed.
{code:java}
$ sudo iptables -t filter -nvL
...
Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target  prot opt in       out       source     destination
...
    0     0 ACCEPT  all  --  *        docker0   0.0.0.0/0  0.0.0.0/0    ctstate RELATED,ESTABLISHED
    0     0 ACCEPT  all  --  docker0  !docker0  0.0.0.0/0  0.0.0.0/0
    0     0 ACCEPT  all  --  docker0  docker0   0.0.0.0/0  0.0.0.0/0
{code}
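The four cases in this comment can be modelled as a first-match walk over the FORWARD chain. The sketch below is illustrative only and is not Mesos or iptables code: the `Rule` class, the `forward_verdict` helper, the uplink name `eth0`, and the in/out interface pair assigned to each case are assumptions made for the example. It also assumes that bridged container-to-container traffic is passed through the FORWARD chain at all.

```python
from dataclasses import dataclass
from typing import Optional

BRIDGE = "mesos-cni0"
UPLINK = "eth0"  # hypothetical name of the host's external interface


@dataclass(frozen=True)
class Rule:
    """A simplified ACCEPT rule that matches only on interfaces."""
    in_if: Optional[str] = None   # models "-i <iface>"; None matches any
    out_if: Optional[str] = None  # models "-o <iface>"; None matches any
    out_negated: bool = False     # True models "! -o <iface>"

    def matches(self, in_if: str, out_if: str) -> bool:
        if self.in_if is not None and self.in_if != in_if:
            return False
        if self.out_if is not None:
            same = self.out_if == out_if
            # A plain match needs the same interface, a negated one a different one.
            if same == self.out_negated:
                return False
        return True


def forward_verdict(rules, in_if, out_if, policy="DROP"):
    """Return ACCEPT if any rule matches, otherwise the chain's default policy."""
    for rule in rules:
        if rule.matches(in_if, out_if):
            return "ACCEPT"
    return policy


# The three rules quoted in the comment.
RULES = [
    Rule(out_if=BRIDGE),                                  # -o mesos-cni0
    Rule(in_if=BRIDGE, out_if=BRIDGE, out_negated=True),  # -i mesos-cni0 ! -o mesos-cni0
    Rule(in_if=BRIDGE, out_if=BRIDGE),                    # -i mesos-cni0 -o mesos-cni0
]

# Assumed (in, out) interface pairs for each case.
FLOWS = {
    "case 1: container -> hostIP:8080": (BRIDGE, BRIDGE),
    "case 2: container -> outside": (BRIDGE, UPLINK),
    "case 3: other host -> hostIP:8080": (UPLINK, BRIDGE),
    "case 3: reply from container": (BRIDGE, UPLINK),
    "case 4: container -> container": (BRIDGE, BRIDGE),
}

for name, (in_if, out_if) in FLOWS.items():
    without = forward_verdict([], in_if, out_if, policy="DROP")
    with_rules = forward_verdict(RULES, in_if, out_if, policy="DROP")
    print(f"{name}: no rules={without}, with rules={with_rules}")
```

With an empty rule list every flow falls through to the default policy, which is why the problem only shows up when that policy is DROP.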
[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534372#comment-16534372 ]

Qian Zhang commented on MESOS-9031:
---

After more investigation, I found that there are 4 issues in total. The root cause of all 4 is that some iptables rules are missing, which causes packets to be dropped by the default policy of the FORWARD chain. Here is the summary:

*Case 1:* Two containers join the `mesos-cni0` bridge network. The first container listens on its port 80, which is mapped to the host's port 8080, and the second container tries to access hostIP:8080. This case needs the rule `iptables -t filter -A FORWARD -o mesos-cni0 -j ACCEPT`. Without it, the packets sent from the second container are dropped by the default policy of the FORWARD chain; once the rule is added, the packets are accepted and reach the first container.

*Case 2:* One container joins the `mesos-cni0` bridge network and tries to access the world outside the host on which it runs (e.g., it executes the command `curl www.myip.ch`). This case needs the rule `iptables -t filter -A FORWARD -i mesos-cni0 ! -o mesos-cni0 -j ACCEPT`. Without it, the packets sent from the container are dropped by the default policy of the FORWARD chain; once the rule is added, the packets are accepted and sent out of the host.

*Case 3:* One container joins the `mesos-cni0` bridge network and listens on its port 80, which is mapped to the host's port 8080, and we try to access hostIP:8080 from another host. This case needs both rules from cases 1 and 2: without the rule from case 1, the packets sent from the other host are dropped by the default policy of the FORWARD chain, and without the rule from case 2, the reply packets sent by the container are dropped as well.

*Case 4:* Two containers join the `mesos-cni0` bridge network. The first container listens on its port 80, and the second container tries to access the first container's IP:80. This case needs the rule `iptables -t filter -A FORWARD -i mesos-cni0 -o mesos-cni0 -j ACCEPT`. Without it, the packets sent from the second container are dropped by the default policy of the FORWARD chain; once the rule is added, the packets are accepted and reach the first container.

Please note that these issues only occur when the default policy of the FORWARD chain is DROP; if it is ACCEPT, there are no issues.

> Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
> ---
>
>                 Key: MESOS-9031
>                 URL: https://issues.apache.org/jira/browse/MESOS-9031
>             Project: Mesos
>          Issue Type: Bug
>          Components: cni, containerization
>    Affects Versions: 1.6.0
>            Reporter: Kirill Plyashkevich
>            Assignee: Qian Zhang
>            Priority: Major
>
> using `mesos-cni-port-mapper` with following config:
> {noformat}
> {
>   "name" : "dcos",
>   "type" : "mesos-cni-port-mapper",
>   "excludeDevices" : [],
>   "chain": "MESOS-CNI0-PORT-MAPPER",
>   "delegate": {
>     "type": "bridge",
>     "bridge": "mesos-cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "hairpinMode": true,
>     "ipam": {
>       "type": "host-local",
>       "ranges": [
>         [{"subnet": "172.26.0.0/16"}]
>       ],
>       "routes": [
>         {"dst": "0.0.0.0/0"}
>       ]
>     }
>   }
> }
> {noformat}
> - 2 services running on the same mesos-slave using unified containerizer in different tasks and communicating via host ip and host port
> - connection timeouts due to iptables rules per container CNI-XXX chain
> - actually timeouts are caused by
> {noformat}
> Chain CNI-XXX (1 references)
> num  target      prot opt source    destination
> 1    ACCEPT      all  --  anywhere  172.26.0.0/16              /* name: "dcos" id: "" */
> 2    MASQUERADE  all  --  anywhere  !base-address.mcast.net/4  /* name: "dcos" id: "" */
> {noformat}
> rule #1 is executed and no masquerading happens.
> there are multiple solutions:
> - -simplest and fastest one is not to add that ACCEPT- - NOT A SOLUTION. it's happening in `bridge` plugin and `cni/portmap` shows that snat/masquerade should be done during portmapping as well.
> - perhaps, there's a better change in iptables rules that can fix it
> - proper one (imho) is to finally implement cni spec 0.3.x in order to be able to use chaining of plugins and use cni's `bridge`
[jira] [Comment Edited] (MESOS-9040) Break scheduler driver dependency on mesos-local.
[ https://issues.apache.org/jira/browse/MESOS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534291#comment-16534291 ]

Till Toenshoff edited comment on MESOS-9040 at 7/5/18 11:46 PM:
---

[~jamespeach] one reason why it was likely not that popular in the past is that several of our examples failed to implement the {{master}} flag accordingly. I did at some point set that straight with [73f6522|https://github.com/apache/mesos/commit/73f6522905db5114d1bd6d99a3bab0793b14b86d].

I still believe that this was a nice feature (hence my efforts), but I also see how it results in a not-so-clean dependency tree. But, as you can see in my commit, there is more done in the examples to make the entire feature perfectly convenient: no need for manually setting up ACLs, derived roles and authentication. It does indeed decrease friction a lot, especially for newbies.

I can see the following options:
# Kill that feature
# Have an additional (convenience) library for use in the examples that provides this feature, as Ben suggested
# Let the driver fork-exec `mesos-local` when signaled to do so, without reaping it - but then we need to get the bound IP... not sure how
# Leave as is

I currently like (3), as it solves the linkage issue while still leaving the feature intact; the open question is how to do that properly.

Please note that the scheduler library also implements this feature, not just the scheduler driver. See [https://github.com/apache/mesos/blob/16455857ee00e98147d8cb9fb6f31b22554dfe52/src/scheduler/scheduler.cpp#L191-L196]

> Break scheduler driver dependency on mesos-local.
> -
>
>                 Key: MESOS-9040
>                 URL: https://issues.apache.org/jira/browse/MESOS-9040
>             Project: Mesos
>          Issue Type: Task
>          Components: build, scheduler driver
>            Reporter: James Peach
>            Priority: Minor
>
> The scheduler driver in {{src/sched/sched.cpp}} has some special dependencies on the {{mesos-local}} code. This seems fairly hacky, but it also causes binary dependencies on {{src/local/local.cpp}} to be dragged into {{libmesos.so}}. {{libmesos.so}} would not otherwise require this code, which could be isolated in the {{mesos-local}} command.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
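One possible shape for option (3), including the open question of learning the bound address, is sketched below. It is only an illustration under loud assumptions: the child here is a hypothetical stand-in script, not `mesos-local`, and nothing in this thread establishes that `mesos-local` reports its address on stdout. The helper name `spawn_and_read_address` is made up for the example.

```python
import subprocess
import sys

# Hypothetical stand-in for the local cluster process. It binds an ephemeral
# port, reports "ip:port" on its first stdout line, then stays alive briefly.
CHILD_SOURCE = """
import socket, time
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
ip, port = listener.getsockname()
print(f"{ip}:{port}", flush=True)
time.sleep(5)
"""


def spawn_and_read_address():
    """Start the child and block until it has reported the address it bound."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.PIPE,
        text=True,
    )
    # The parent never links against the child's code; the only coupling is
    # this one line of text on the pipe.
    address = child.stdout.readline().strip()
    return child, address
```

The caller gets back the process handle and the reported address and is free to leave the child running, which mirrors the "without reaping it" part of the option.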
[jira] [Comment Edited] (MESOS-9040) Break scheduler driver dependency on mesos-local.
[ https://issues.apache.org/jira/browse/MESOS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534291#comment-16534291 ]

Till Toenshoff edited comment on MESOS-9040 at 7/5/18 11:43 PM:
---

[~jamespeach] one reason why it was likely not that popular in the past is that several of our examples failed to implement the {{master}} flag accordingly. I did at some point set that straight with [73f6522|https://github.com/apache/mesos/commit/73f6522905db5114d1bd6d99a3bab0793b14b86d].

I still believe that this was a nice feature (hence my efforts), but I also see how it results in a not-so-clean dependency tree. But, as you can see in my commit, there is more done in the examples to make the entire feature perfectly convenient: no need for manually setting up ACLs, derived roles and authentication. It does indeed decrease friction a lot, especially for newbies.

I can see the following options:
# Kill that feature
# Have an additional (convenience) library for use in the examples that provides this feature, as Ben suggested
# Let the driver fork-exec `mesos-local` when signaled to do so, without reaping it
# Leave as is

I currently like (3), as it solves the linkage issue while still leaving the feature intact.

Please note that the scheduler library also implements this feature, not just the scheduler driver. See [https://github.com/apache/mesos/blob/16455857ee00e98147d8cb9fb6f31b22554dfe52/src/scheduler/scheduler.cpp#L191-L196]
[jira] [Commented] (MESOS-9040) Break scheduler driver dependency on mesos-local.
[ https://issues.apache.org/jira/browse/MESOS-9040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534291#comment-16534291 ]

Till Toenshoff commented on MESOS-9040:
---

[~jamespeach] one reason why it was likely not that popular in the past is that several of our examples failed to implement the {{master}} flag accordingly. I did at some point set that straight with [73f6522|https://github.com/apache/mesos/commit/73f6522905db5114d1bd6d99a3bab0793b14b86d].

I still believe that this was a nice feature (hence my efforts), but I also see how it results in a not-so-clean dependency tree. But, as you can see in my commit, there is more done in the examples to make the entire feature perfectly convenient: no need for manually setting up ACLs, derived roles and authentication. It does indeed decrease friction a lot, especially for newbies.

I can see the following options:
# Kill that feature
# Have an additional (convenience) library for use in the examples that provides this feature, as Ben suggested
# Let the driver fork-exec `mesos-local` when signaled to do so, without reaping it
# Leave as is

I currently like (3), as it solves the linkage issue while still leaving the feature intact.

Please note that the scheduler library also implements this feature, not just the scheduler driver. See https://github.com/apache/mesos/blob/16455857ee00e98147d8cb9fb6f31b22554dfe52/src/scheduler/scheduler.cpp#L191-L196
[jira] [Commented] (MESOS-9052) Default executor should commit suicide if it cannot receive HTTP responses for LAUNCH_NESTED_CONTAINER calls.
[ https://issues.apache.org/jira/browse/MESOS-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534077#comment-16534077 ]

Vinod Kone commented on MESOS-9052:
---

Instead of committing suicide, it should shut down the current task group, since one task/container failing to launch shouldn't impact other task groups.

Also, should this be applied more generically to all calls from executor to agent, or just to launch?

cc [~gkleiman]

> Default executor should commit suicide if it cannot receive HTTP responses for LAUNCH_NESTED_CONTAINER calls.
> -
>
>                 Key: MESOS-9052
>                 URL: https://issues.apache.org/jira/browse/MESOS-9052
>             Project: Mesos
>          Issue Type: Bug
>          Components: executor
>    Affects Versions: 1.4.0, 1.5.0, 1.6.0, 1.7.0
>            Reporter: Chun-Hung Hsiao
>            Priority: Major
>
> If there is a network problem (e.g., a routing problem), it is possible that the agent has received {{LAUNCH_NESTED_CONTAINER}} calls from the default executor and launched the nested container, but the executor does not get the HTTP response. This would result in tasks stuck at {{TASK_STARTING}} forever. We should consider making the default executor commit suicide if it does not receive the response in a reasonable amount of time.
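The idea discussed here, bounding the wait for the launch response and reacting per task group rather than killing the whole executor, can be sketched as follows. This is a minimal illustration and not the default executor's code: `await_launch_response`, the returned state names, and the timeout value are all hypothetical.

```python
from concurrent.futures import Future, TimeoutError as FutureTimeout


def await_launch_response(response: Future, timeout_secs: float) -> str:
    """Wait a bounded time for the launch response of one task group.

    Returns a hypothetical next action: "RUNNING" if the response arrived,
    "SHUTDOWN_TASK_GROUP" if it did not arrive within the timeout.
    """
    try:
        response.result(timeout=timeout_secs)
        return "RUNNING"
    except FutureTimeout:
        # Only this task group is affected; other task groups keep running.
        return "SHUTDOWN_TASK_GROUP"


# Usage: one future per outstanding launch call.
answered = Future()
answered.set_result("200 OK")
print(await_launch_response(answered, timeout_secs=1.0))

lost = Future()  # the response never arrives
print(await_launch_response(lost, timeout_secs=0.05))
```

What counts as a "reasonable amount of time" is left open in the ticket, so the timeout is a parameter rather than a constant.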
[jira] [Created] (MESOS-9052) Default executor should commit suicide if it cannot receive HTTP responses for LAUNCH_NESTED_CONTAINER calls.
Chun-Hung Hsiao created MESOS-9052:
--

             Summary: Default executor should commit suicide if it cannot receive HTTP responses for LAUNCH_NESTED_CONTAINER calls.
                 Key: MESOS-9052
                 URL: https://issues.apache.org/jira/browse/MESOS-9052
             Project: Mesos
          Issue Type: Bug
          Components: executor
    Affects Versions: 1.6.0, 1.5.0, 1.4.0, 1.7.0
            Reporter: Chun-Hung Hsiao

If there is a network problem (e.g., a routing problem), it is possible that the agent has received {{LAUNCH_NESTED_CONTAINER}} calls from the default executor and launched the nested container, but the executor does not get the HTTP response. This would result in tasks stuck at {{TASK_STARTING}} forever. We should consider making the default executor commit suicide if it does not receive the response in a reasonable amount of time.
[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533628#comment-16533628 ]

Kirill Plyashkevich commented on MESOS-9031:
---

[~qianzhang], for the purity of the check, the iptables rules were clean, but I don't think that there was any drop rule on the server (Ubuntu 16.04.4 LTS). Anyway, it doesn't seem to be a firewall issue; it is still something with routing the packets...

> Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
> ---
>
>                 Key: MESOS-9031
>                 URL: https://issues.apache.org/jira/browse/MESOS-9031
>             Project: Mesos
>          Issue Type: Bug
>          Components: cni, containerization
>    Affects Versions: 1.6.0
>            Reporter: Kirill Plyashkevich
>            Assignee: Qian Zhang
>            Priority: Major
>
> using `mesos-cni-port-mapper` with following config:
> {noformat}
> {
>   "name" : "dcos",
>   "type" : "mesos-cni-port-mapper",
>   "excludeDevices" : [],
>   "chain": "MESOS-CNI0-PORT-MAPPER",
>   "delegate": {
>     "type": "bridge",
>     "bridge": "mesos-cni0",
>     "isGateway": true,
>     "ipMasq": true,
>     "hairpinMode": true,
>     "ipam": {
>       "type": "host-local",
>       "ranges": [
>         [{"subnet": "172.26.0.0/16"}]
>       ],
>       "routes": [
>         {"dst": "0.0.0.0/0"}
>       ]
>     }
>   }
> }
> {noformat}
> - 2 services running on the same mesos-slave using unified containerizer in different tasks and communicating via host ip and host port
> - connection timeouts due to iptables rules per container CNI-XXX chain
> - actually timeouts are caused by
> {noformat}
> Chain CNI-XXX (1 references)
> num  target      prot opt source    destination
> 1    ACCEPT      all  --  anywhere  172.26.0.0/16              /* name: "dcos" id: "" */
> 2    MASQUERADE  all  --  anywhere  !base-address.mcast.net/4  /* name: "dcos" id: "" */
> {noformat}
> rule #1 is executed and no masquerading happens.
> there are multiple solutions:
> - -simplest and fastest one is not to add that ACCEPT- - NOT A SOLUTION. it's happening in `bridge` plugin and `cni/portmap` shows that snat/masquerade should be done during portmapping as well.
> - perhaps, there's a better change in iptables rules that can fix it
> - proper one (imho) is to finally implement cni spec 0.3.x in order to be able to use chaining of plugins and use cni's `bridge` and `portmap` plugins in chain (and get rid of mesos-cni-port-mapper completely eventually).
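The reporter's reading of the CNI-XXX chain, that rule #1 wins and the MASQUERADE rule is never reached, can be illustrated with a first-match walk over the two rules. This is a sketch of that reading only, not of the plugin's code: the function name, the sample addresses, and the assumption that the packet's destination has already been rewritten to a container address in 172.26.0.0/16 are made up for the example. `base-address.mcast.net/4` is written out as 224.0.0.0/4.

```python
import ipaddress

CONTAINER_SUBNET = ipaddress.ip_network("172.26.0.0/16")
MULTICAST = ipaddress.ip_network("224.0.0.0/4")  # base-address.mcast.net/4


def cni_chain_verdict(destination: str) -> str:
    """Walk the two quoted rules in order and return the first match."""
    address = ipaddress.ip_address(destination)
    if address in CONTAINER_SUBNET:   # rule 1: ACCEPT to 172.26.0.0/16
        return "ACCEPT"
    if address not in MULTICAST:      # rule 2: MASQUERADE to !224.0.0.0/4
        return "MASQUERADE"
    return "NO_MATCH"                 # falls off the end of the chain


# Destination inside the bridge subnet: rule 1 matches first, so the
# source address is left untouched.
print(cni_chain_verdict("172.26.0.5"))
# Destination outside the subnet: rule 2 masquerades.
print(cni_chain_verdict("8.8.8.8"))
```

Under these assumptions, traffic between two containers on the same bridge always stops at the ACCEPT rule, which matches the "no masquerading happens" observation in the description.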
[jira] [Commented] (MESOS-8985) Posting to the operator api with 'accept recordio' header can crash the agent
[ https://issues.apache.org/jira/browse/MESOS-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533496#comment-16533496 ]

Alexander Rukletsov commented on MESOS-8985:
---

[~bmahler] This does not look like a critical issue to me, hence no backport.

> Posting to the operator api with 'accept recordio' header can crash the agent
> -
>
>                 Key: MESOS-8985
>                 URL: https://issues.apache.org/jira/browse/MESOS-8985
>             Project: Mesos
>          Issue Type: Bug
>          Components: HTTP API
>    Affects Versions: 1.4.1, 1.5.1
>            Reporter: Philip Norman
>            Assignee: Benno Evers
>            Priority: Major
>              Labels: mesosphere
>             Fix For: 1.7.0
>
>         Attachments: mesos-slave-crash.log
>
>
> It's possible to crash the mesos agent by posting a reasonable request to the operator API.
> h3. Background:
> Sending a request to the v1 api endpoint with an unsupported 'accept' header:
> {code:java}
> curl -X POST http://10.0.3.27:5051/api/v1 \
>   -H 'accept: application/atom+xml' \
>   -H 'content-type: application/json' \
>   -d '{"type":"GET_CONTAINERS","get_containers":{"show_nested": true,"show_standalone": true}}'{code}
> Results in the following friendly error message:
> {code:java}
> Expecting 'Accept' to allow application/json or application/x-protobuf or application/recordio{code}
> h3. Reproducible crash:
> However, sending the same request with 'application/recordio' 'accept' header:
> {code:java}
> curl -X POST \
>   http://10.0.3.27:5051/api/v1 \
>   -H 'accept: application/recordio' \
>   -H 'content-type: application/json' \
>   -d '{"type":"GET_CONTAINERS","get_containers":{"show_nested": true,"show_standalone": true}}'{code}
> causes the agent to crash (no response is received).
> Crash log is shown below, full log from the agent is attached here:
> {code:java}
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: I0607 22:30:32.397320 3743 logfmt.cpp:178] type=audit timestamp=2018-06-07 22:30:32.397243904+00:00 reason="Error in token 'Missing 'Authorization' header from HTTP request'. Allowing anonymous connection" object="/slave(1)/api/v1" agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36" authorizer="mesos-agent" action="POST" result=allow srcip=10.0.6.99 dstport=5051 srcport=42084 dstip=10.0.3.27
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: W0607 22:30:32.397434 3743 authenticator.cpp:289] Error in token on request from '10.0.6.99:42084': Missing 'Authorization' header from HTTP request
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: W0607 22:30:32.397466 3743 authenticator.cpp:291] Falling back to anonymous connection using user 'dcos_anonymous'
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: I0607 22:30:32.397629 3748 http.cpp:1099] HTTP POST for /slave(1)/api/v1 from 10.0.6.99:42084 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: I0607 22:30:32.397784 3748 http.cpp:2030] Processing GET_CONTAINERS call
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: F0607 22:30:32.398736 3747 http.cpp:121] Serializing a RecordIO stream is not supported
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: *** Check failure stack trace: ***
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 0x7f619478636d google::LogMessage::Fail()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 0x7f619478819d google::LogMessage::SendToLog()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 0x7f6194785f5c google::LogMessage::Flush()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 0x7f6194788a99 google::LogMessageFatal::~LogMessageFatal()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 0x7f61935e2b9d mesos::internal::serialize()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 0x7f6193a4c0ef _ZNO6lambda12CallableOnceIFN7process6FutureINS1_4http8ResponseEEERKN4JSON5ArrayEEE10CallableFnIZNK5mesos8internal5slave4Http13getContainersERKNSD_5agent4CallENSD_11ContentTypeERK6OptionINS3_14authentication9PrincipalEEEUlRKNS2_IS7_EEE0_EclES9_
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal mesos-agent[3718]: @ 0x7f6193a81d61 process::internal::thenf<>()
> Jun 07 22:30:32 ip-10-0-3-27.us-west-2.compute.internal
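The behaviour described in this ticket, where an unsupported 'accept' type is rejected cleanly while 'application/recordio' on a `GET_CONTAINERS` call reaches a fatal check, suggests validating the negotiated type against the kind of response before serializing. The sketch below illustrates that idea only; it is not the Mesos fix. The `negotiate` function, the 406 status code, the `streaming_response` flag, and the second error message are assumptions, and only the first error message is taken from the ticket.

```python
SUPPORTED = (
    "application/json",
    "application/x-protobuf",
    "application/recordio",
)
STREAMING_ONLY = "application/recordio"


def negotiate(accept: str, streaming_response: bool):
    """Return (status, body-or-content-type) for a requested 'Accept' type."""
    if accept not in SUPPORTED:
        # Message quoted from the ticket.
        return 406, ("Expecting 'Accept' to allow application/json or "
                     "application/x-protobuf or application/recordio")
    if accept == STREAMING_ONLY and not streaming_response:
        # Hypothetical message: reject instead of hitting a fatal check later.
        return 406, "application/recordio is only available for streaming responses"
    return 200, accept


print(negotiate("application/atom+xml", streaming_response=False))
print(negotiate("application/recordio", streaming_response=False))
print(negotiate("application/json", streaming_response=False))
```

The point of the design is that a request the server cannot satisfy produces an error response, never a process abort.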
[jira] [Comment Edited] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533475#comment-16533475 ]

Qian Zhang edited comment on MESOS-9031 at 7/5/18 10:22 AM:
---

[~Kirill P] It's weird. I think there should be no firewall issue in your environment, since the default policy of your FORWARD chain is ACCEPT:
{quote}:FORWARD ACCEPT [146628:34388360]
{quote}
So no packets will be dropped by this chain. In my environment (an Ubuntu 17.10 VM), the default policy of the FORWARD chain is DROP, which is why I need to add that rule ("iptables -t filter -A FORWARD -o mesos-cni0 -j ACCEPT").

Can you please try another application rather than akka? E.g., launch a simple http server ("python -m SimpleHTTPServer 8080") that joins a CNI bridge network and has its port mapped to a host port, and then launch another container in the same CNI bridge network to access that http server via hostIP:hostPort. Will it have the same timeout issue?
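The test suggested in this comment needs two pieces: a trivial HTTP server in one container and a probe with a timeout in the other. A minimal Python 3 sketch of both follows, using `http.server` in place of the Python 2 `SimpleHTTPServer` module. The function names and the 3-second timeout are assumptions, and the demo runs both ends on 127.0.0.1 only to show the mechanics; in the real test the probe would target hostIP:hostPort from the second container.

```python
import http.client
import http.server
import threading


def start_server(port: int = 0) -> http.server.ThreadingHTTPServer:
    """Serve the current directory over HTTP in a background thread."""
    server = http.server.ThreadingHTTPServer(
        ("127.0.0.1", port), http.server.SimpleHTTPRequestHandler
    )
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(host: str, port: int, timeout_secs: float = 3.0) -> str:
    """Issue one GET and report either the HTTP status or the failure."""
    connection = http.client.HTTPConnection(host, port, timeout=timeout_secs)
    try:
        connection.request("GET", "/")
        return f"HTTP {connection.getresponse().status}"
    except (OSError, http.client.HTTPException) as error:
        # A dropped packet shows up here as a timeout rather than a refusal.
        return f"failed: {error!r}"
    finally:
        connection.close()


if __name__ == "__main__":
    server = start_server()
    print(probe("127.0.0.1", server.server_address[1]))
    server.shutdown()
```

A timeout from `probe` points at packets being dropped or misrouted on the way, while a connection refusal would point at the port mapping or the server itself.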
[jira] [Comment Edited] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533475#comment-16533475 ]

Qian Zhang edited comment on MESOS-9031 at 7/5/18 10:13 AM:

[~Kirill P] It's weird; I think there should be no firewall issue in your environment, since the default policy of your FORWARD chain is ACCEPT:
{quote}:FORWARD ACCEPT [146628:34388360]
{quote}
So no packets will be dropped by this chain's default policy. In my environment (an Ubuntu 17.10 VM), the default policy of the FORWARD chain is DROP, which is why I need to add that rule ("iptables -t filter -A FORWARD -o mesos-cni0 -j ACCEPT").

was (Author: qianzhang):

[~Kirill P] It's weird; I think there should be no firewall issue in your environment, since the default policy of your FORWARD chain is ACCEPT (":FORWARD ACCEPT [146628:34388360]"), so no packets will be dropped by this chain's default policy. In my environment (an Ubuntu 17.10 VM), the default policy of the FORWARD chain is DROP, which is why I need to add that rule ("iptables -t filter -A FORWARD -o mesos-cni0 -j ACCEPT").
[jira] [Commented] (MESOS-9031) Mesos CNI portmap plugins' iptables rules doesn't allow connections via host ip and port from the same bridge container network
[ https://issues.apache.org/jira/browse/MESOS-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533475#comment-16533475 ]

Qian Zhang commented on MESOS-9031:
---

[~Kirill P] It's weird; I think there should be no firewall issue in your environment, since the default policy of your FORWARD chain is ACCEPT (":FORWARD ACCEPT [146628:34388360]"), so no packets will be dropped by this chain's default policy. In my environment (an Ubuntu 17.10 VM), the default policy of the FORWARD chain is DROP, which is why I need to add that rule ("iptables -t filter -A FORWARD -o mesos-cni0 -j ACCEPT").
[jira] [Assigned] (MESOS-9051) Move agent call validation into common validation library.
[ https://issues.apache.org/jira/browse/MESOS-9051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Peach reassigned MESOS-9051:
--

    Assignee: James Peach

| [https://reviews.apache.org/r/67830/] | Moved `executor::Call` validation to common validation library. |

> Move agent call validation into common validation library.
> --
>
>                 Key: MESOS-9051
>                 URL: https://issues.apache.org/jira/browse/MESOS-9051
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent, build
>            Reporter: James Peach
>            Assignee: James Peach
>            Priority: Minor
>
> The executor driver calls {{executor::call::validate()}} from {{src/slave/validation.cpp}}, which creates an upward dependency from libmesos.so (where the executor driver has to live) to the agent. If we can move the validation calls down to the common validation library, we can break this dependency.