[ https://issues.apache.org/jira/browse/MESOS-9127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jie Yu reassigned MESOS-9127: ----------------------------- Assignee: Jie Yu > Port mapper CNI plugin might deadlock iptables on the agent. > ------------------------------------------------------------ > > Key: MESOS-9127 > URL: https://issues.apache.org/jira/browse/MESOS-9127 > Project: Mesos > Issue Type: Bug > Components: network > Affects Versions: 1.4.1, 1.5.1, 1.6.1 > Reporter: Jie Yu > Assignee: Jie Yu > Priority: Major > > Recently, we noticed that if one launches a lot of containers that use the > port mapper CNI plugin, the iptables will get deadlock. The symptom is like > the following. > If you do any iptables command on the box, it'll get stuck on acquiring the > xtables lock: > {noformat} > core@ip-10-0-2-99 ~ $ time iptables -w -t nat -S UCR-DEFAULT-BRIDGE > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > Another app is currently holding the xtables lock; waiting for it to exit... > ^C > real 0m41.349s > user 0m0.001s > sys 0m0.000s > {noformat} > And you'll notice that a lot of iptables and mesos port mapper CNI plugin > processes on the box > {noformat} > $ ps -fp $(pidof mesos-cni-port-mapper) | wc -l > 191 > $ ps -fp $(pidof iptables) | wc -l > 192 > {noformat} > Then, we look into the process that is holding the xtables lock: > {noformat} > $ sudo netstat -p -n | grep xtables > unix 2 [ ] STREAM 225083 25048/iptables > @xtables > $ ps aux | grep 25048 > root 25048 0.0 0.0 26184 2512 ? S 22:41 0:00 iptables -w > -t nat -S UCR-DEFAULT-BRIDGE > core 31857 0.0 0.0 6760 976 pts/0 S+ 23:18 0:00 grep > --colour=auto 25048 > $ sudo strace -s 10000 -p 25048 > Process 25048 attached > write(1, "-dport 13839 -m comment --comment \"container_id: > a074efef-c119-4764-a584-87c57e6b7f68\" -j DNAT --to-destination > 172.31.254.183:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 17282 -m comment --comment \"container_id: > 47880db5-90ad-4034-9c2b-2fd246d42342\" -j DNAT --to-destination > 172.31.254.126:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 28712 -m comment --comment \"container_id: > 3293d5d0-772c-48d2-bafd-1c4b4d56247e\" -j DNAT --to-destination > 172.31.254.130:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 23893 -m comment --comment \"container_id: > f57de8eb-a3b9-44cb-8dac-7ab261bc8aac\" -j DNAT --to-destination > 172.31.254.149:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 28449 -m comment --comment \"container_id: > 9238dbf0-7b28-4fda-880a-bf0c8f40562a\" -j DNAT --to-destination > 172.31.254.190:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 26438 -m comment --comment \"container_id: > d307cf58-8972-4de4-ad45-26c29786add0\" -j DNAT --to-destination > 172.31.254.187:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 7682 -m comment --comment \"container_id: > 60f5a61b-f4c0-4846-b2cd-63cd7eb5a4e8\" -j DNAT --to-destination > 172.31.254.177:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 23904 -m comment --comment \"container_id: > f203ff9e-7b81-4e54-ab44-d45e2a937f38\" -j DNAT --to-destination > 172.31.254.157:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 8359 -m comment --comment \"container_id: > 578cc89c-83bf-46ba-9ae7-c7b89e40e739\" -j DNAT --to-destination > 172.31.254.158:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 28482 -m comment --comment \"container_id: > 70721adb-cd6c-4158-8b11-4ef694999203\" -j DNAT --to-destination > 172.31.254.163:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 2564 -m comment --comment \"container_id: > 70bc9735-8e0a-4b27-9a68-1b1151c9f6ec\" -j DNAT --to-destination > 172.31.254.105:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 6952 -m comment --comment \"container_id: > a494e0fe-ce6e-4e95-b094-504c7603b6fb\" -j DNAT --to-destination > 172.31.254.159:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 29371 -m comment --comment \"container_id: > 1dac0183-8939-4165-95e8-cf6503cc53ad\" -j DNAT --to-destination > 172.31.254.174:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 6551 -m comment --comment \"container_id: > 148a197b-7247-4a3c-b528-1417cc8ccf3f\" -j DNAT --to-destination > 172.31.254.144:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 16568 -m comment --comment \"container_id: > e68bb942-ab69-43a5-a308-40b33368d111\" -j DNAT --to-destination > 172.31.254.184:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 5915 -m comment --comment \"container_id: > 08d53ef6-7d3c-4b44-8bb7-923b6a4a1780\" -j DNAT --to-destination > 172.31.254.186:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 11252 -m comment --comment \"container_id: > 894e1455-6c77-4fc2-8414-53f3db7b492a\" -j DNAT --to-destination > 172.31.254.223:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 7583 -m comment --comment \"container_id: > 917b1fc3-12ca-4198-a5f4-13144fbf9096\" -j DNAT --to-destination > 172.31.254.164:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 12695 -m comment --comment \"container_id: > cfadeb29-3481-4e9e-8eba-e57ce6a4f693\" -j DNAT --to-destination > 172.31.254.153:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 1256 -m comment --comment \"container_id: > 212649ee-a894-4201-bc38-798545ce9e1f\" -j DNAT --to-destination > 172.31.254.132:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 28581 -m comment --comment \"container_id: > 38c46524-9023-450d-a134-24ad14eecbd5\" -j DNAT --to-destination > 172.31.254.202:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 5709 -m comment --comment \"container_id: > 8a510a8e-df16-40b2-a4b7-36605a5f9984\" -j DNAT --to-destination > 172.31.254.194:80\n-A UCR-DEFAULT-BRIDGE ! -i ucr-br0 -p tcp -m tcp --dport > 6050 -m comment --comment \"container_id: > 94757d96-b6ad-4a57-9e38-c956a2e24fd6\" -j DNAT --to-destination > 172.31.254.152:80\n-A UCR-DEFAULT-B", 4096 > {noformat} > So it gets stuck when writing to stdout. That indicates that the pipe is > full, and the reader end hasn't read from the pipe. Then we identified the > reader > {noformat} > $ ps aux | grep a074efef-c119-4764-a584-87c57e6b7f68 > core 2355 0.0 0.0 6756 1020 pts/0 S+ 23:39 0:00 grep > --colour=auto a074efef-c119-4764-a584-87c57e6b7f68 > root 26486 0.0 0.0 11948 2852 ? S 22:41 0:00 sh -c > #!/bin/sh exec 1>&2 set -x # The iptables command searches > for the DNAT rules with tag # "container_id: <CNI_CONTAINERID>", and if > it exists goes ahead # and deletes it. iptables -w -t nat -S > UCR-DEFAULT-BRIDGE | sed "/container_id: > a074efef-c119-4764-a584-87c57e6b7f68/ s/-A/iptables -w -t nat -D/e" > root 26490 0.0 0.0 4340 896 ? S 22:41 0:00 sed > /container_id: a074efef-c119-4764-a584-87c57e6b7f68/ s/-A/iptables -w -t nat > -D/e > $ sudo strace -s 10000 -p 26490 > Process 26490 attached > read(0, ^CProcess 26490 detached > {noformat} > The reader which is `sed` is reading from stdin, somehow gets stuck as well. > The corresponding container gets stuck during CNI detach after checking the > agent log. > After checking the code, we pinpoint the problematic script in our CNI port > mapper plugin: > https://github.com/apache/mesos/blob/1.6.0/src/slave/containerizer/mesos/isolators/network/cni/plugins/port_mapper/port_mapper.cpp#L372 > {code} > iptables -w -t nat -S %s | sed "/%s/ s/-A/iptables -w -t nat -D/e" > {code} > This command will deadlock if there are a lot of entires in the iptables > because the `sed` won't process the next line while executing `iptables -w -t > nat -D ...`. But the executing of `iptables -w -t nat -D ... ` might get > stuck if the first command `iptables -w -t nat -S %s` didn't finish, because > the lock is not released. The first command might not finish if it has a lot > of output, filling the pipe that `sed` hasn't had a chance to process yet. > The correct solution is to write the command to execute in a file, and > execute the file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)