Hi, we’ve encountered another problem with stateful ACLs.
Suppose we have one logical switch (ls1) with two VIF-type logical ports (lsp1, lsp2) attached to it, and a Linux VM behind each logical port. Both logical ports reside in a port group (pg1), and two ACLs are created for this PG:

to-lport   outport == @pg1 && ip4 && ip4.dst == 0.0.0.0/0   allow-related
from-lport outport == @pg1 && ip4 && ip4.src == 0.0.0.0/0   allow-related

When there is a high-connection-rate service between the VMs, TCP source/destination ports may be reused before the old connection is deleted from the LSPs' conntrack zones on the host. Let's use curl with the --local-port argument so that the same source port is used every time, and run it from one VM to the other (172.31.0.18 -> 172.31.0.17):

curl --local-port 44444 http://172.31.0.17/

To check the connections in the client's and server's VIF zones (client - zone=20, server - zone=1), run a while-true loop that prints the conntrack states a few times per second, while opening new connections with the same source/destination 5-tuple:

while true; do date; grep -e 'zone=1 ' -e zone=20 /proc/net/nf_conntrack; sleep 0.2; done

Right after curl completes successfully, the connection goes through the CLOSE_WAIT and then TIME_WAIT states:

Mon Sep 13 14:34:39 MSK 2021
ipv4 2 tcp 6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
ipv4 2 tcp 6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
Mon Sep 13 14:34:39 MSK 2021
ipv4 2 tcp 6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
ipv4 2 tcp 6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2

The entries then stay in TIME_WAIT for nf_conntrack_tcp_timeout_time_wait seconds (120 seconds on CentOS 7). Everything is okay so far.

While these TIME_WAIT entries are still installed in zones 1 and 20, let's run the same curl (source port 44444) again. The first SYN packet is lost: it never reaches the destination VM. In conntrack we have:

Mon Sep 13 14:34:41 MSK 2021
ipv4 2 tcp 6 118 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2

We see that the TIME_WAIT entry was dropped from the source VIF's zone (20). One second later TCP retransmits the SYN; now the entry in the destination (server's) zone is dropped and a new entry is created in the source (client's) zone:

Mon Sep 13 14:34:41 MSK 2021
ipv4 2 tcp 6 120 SYN_SENT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 [UNREPLIED] src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 mark=0 zone=20 use=2

The server VM still did not receive this SYN packet either; it was dropped as well.
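For completeness, the same state can also be observed from the OVS side. This is just a sketch; I'm assuming the dpctl appctl commands here, and the exact zone filter argument may differ between OVS versions:

# TIME_WAIT timeout used by the host's conntrack (120 s by default on CentOS 7)
sysctl net.netfilter.nf_conntrack_tcp_timeout_time_wait

# dump datapath conntrack entries for the client's and server's VIF zones
ovs-appctl dpctl/dump-conntrack zone=20
ovs-appctl dpctl/dump-conntrack zone=1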
Then, after 2 more seconds, TCP retransmits again and the connection finally works:

Mon Sep 13 14:34:44 MSK 2021
ipv4 2 tcp 6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
ipv4 2 tcp 6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
Mon Sep 13 14:34:44 MSK 2021
ipv4 2 tcp 6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
ipv4 2 tcp 6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2

My guess at what happens:

1. Run curl with empty conntrack zones. Everything is good: we get the HTTP response and the connection is closed. There is one TIME_WAIT entry in the client's conntrack zone and one in the server's.
2. Run curl with the same source port within nf_conntrack_tcp_timeout_time_wait seconds.
2.1. OVS gets the packet from the VM and sends it to the client's conntrack zone=20. It matches the pre-existing TIME_WAIT entry from the previous curl run, and that entry is deleted. A copy of the packet is returned to OVS, and the recirculated packet has ct.inv (?) and !ct.trk set and gets dropped (I'm NOT sure, it's just an assumption!).
3. After one second the client VM retransmits the TCP SYN.
3.1. OVS gets the packet and sends it through the client's conntrack zone=20; a new entry is added and the packet comes back with ct.trk and ct.new set. The packet goes to recirculation.
3.2. OVS sends the packet to the server's conntrack zone=1. It matches the pre-existing TIME_WAIT entry from the previous run, and conntrack removes that entry. The packet is returned to OVS with ct.inv (?) and !ct.trk and gets dropped.
4. The client VM sends the TCP SYN again after 2 more seconds.
4.1. OVS gets the packet from the client's VIF and sends it to the client's conntrack zone=20; it matches the existing SYN_SENT entry, and the packet is returned to OVS with ct.new and ct.trk set.
4.2. OVS sends the packet to the server's conntrack zone=1. The table for zone=1 is now empty, so a new entry is added and the packet is returned to OVS with ct.trk and ct.new set.
4.3. OVS sends the packet to the server's VIF, and subsequent traffic works normally.

So, with this behaviour connection establishment sometimes takes up to three seconds (2 TCP SYN retransmissions), which causes trouble for overlay services (application timeouts and service outages). A quick way to measure the added delay is sketched in the P.S. below.

I've also checked how conntrack behaves inside the VMs with this traffic pattern: when conntrack there receives a packet matching a TIME_WAIT entry, it appears to simply recreate the entry as a new connection. No tuning was done inside the VMs; as a server I used Apache with the default config from the CentOS distribution.

@Numan, @Han, @Mark, could you please take a look at this and share any suggestions or thoughts on how it can be fixed? The problem is present with OVS 2.13.4 and the latest OVN master branch; we originally hit it on OVN 20.06.3 with the same OVS, and it's very important for us. Thanks.

Regards,
Vladislav Odintsov
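P.S. A quick way to quantify the delay, using only the same curl invocation as above plus curl's built-in write-out timers (just a sketch, nothing OVN-specific assumed):

# first run (no stale conntrack entries): connect is near-instant
curl -s -o /dev/null -w 'connect: %{time_connect}s\n' --local-port 44444 http://172.31.0.17/

# re-run within the 120 s TIME_WAIT window: connect takes ~1-3 s,
# because the first one or two SYNs are dropped as described above
curl -s -o /dev/null -w 'connect: %{time_connect}s\n' --local-port 44444 http://172.31.0.17/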
