Hi,

we've encountered another problem with stateful ACLs.

Suppose we have one logical switch (ls1) with two VIF-type logical ports
attached to it (lsp1, lsp2).
Each logical port has a Linux VM behind it.

The logical ports reside in a port group (pg1), and two ACLs are created on
this PG:
to-lport outport == @pg1 && ip4 && ip4.dst == 0.0.0.0/0 allow-related
from-lport inport == @pg1 && ip4 && ip4.src == 0.0.0.0/0 allow-related
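
For reference, this setup corresponds roughly to the following ovn-nbctl
commands (the priority value 1001 is arbitrary, just for illustration):

ovn-nbctl pg-add pg1 lsp1 lsp2
ovn-nbctl acl-add pg1 to-lport 1001 'outport == @pg1 && ip4 && ip4.dst == 0.0.0.0/0' allow-related
ovn-nbctl acl-add pg1 from-lport 1001 'inport == @pg1 && ip4 && ip4.src == 0.0.0.0/0' allow-related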

With a high connection rate between VMs, TCP source/destination ports may be
reused before the old connection is removed from the LSPs' related conntrack
zones on the host.
To reproduce this, let's use curl with the --local-port argument so that every
run uses the same source port.

Run it from one VM to another VM (172.31.0.18 -> 172.31.0.17):
curl --local-port 44444 http://172.31.0.17/

Check the connections in the client's and server's VIF zones (client: zone=20,
server: zone=1).
Run a while-true loop that prints the connection state every 0.2 seconds while
opening a new connection with the same source/destination 5-tuple:

while true; do date; grep -e 'zone=1 ' -e zone=20 /proc/net/nf_conntrack; sleep 0.2; done
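
(If it's more convenient, the same entries can also be dumped per zone through
OVS itself:)

ovs-appctl dpctl/dump-conntrack zone=20
ovs-appctl dpctl/dump-conntrack zone=1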

Right after curl completes successfully, the connection goes through CLOSE_WAIT
and then TIME_WAIT states:

Mon Sep 13 14:34:39 MSK 2021
ipv4     2 tcp      6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 
zone=1 use=2
ipv4     2 tcp      6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 
zone=20 use=2
Mon Sep 13 14:34:39 MSK 2021
ipv4     2 tcp      6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 
zone=1 use=2
ipv4     2 tcp      6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 
zone=20 use=2

And the entry remains in TIME_WAIT state for nf_conntrack_tcp_timeout_time_wait
seconds (120 seconds on CentOS 7).
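
(The timeout value can be checked on the hypervisor with:)

sysctl net.netfilter.nf_conntrack_tcp_timeout_time_wait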

Everything is okay for now.
While the entries sit in TIME_WAIT state in zones 1 and 20, let's run the same
curl (source port 44444) again:
the first SYN packet is lost and never reaches the destination VM. In conntrack
we have:

Mon Sep 13 14:34:41 MSK 2021
ipv4     2 tcp      6 118 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 
zone=1 use=2

We see that the TIME_WAIT entry was removed from the source VIF's zone (20).
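
The missing SYN can be double-checked on the server side with something like
the following (eth0 is just a placeholder for the server VM's interface name):

tcpdump -ni eth0 'tcp dst port 80 and tcp[tcpflags] & tcp-syn != 0'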

Next, after about one second TCP retransmits the SYN; the entry in the
destination (server's) zone is removed and a new entry is created in the source
(client's) zone:

Mon Sep 13 14:34:41 MSK 2021
ipv4     2 tcp      6 120 SYN_SENT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 [UNREPLIED] src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 
mark=0 zone=20 use=2

The server VM still did not receive this SYN packet; it was dropped.

Then, after two more seconds, TCP retransmits again and this time the
connection works fine:

Mon Sep 13 14:34:44 MSK 2021
ipv4     2 tcp      6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 
zone=1 use=2
ipv4     2 tcp      6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 
zone=20 use=2
Mon Sep 13 14:34:44 MSK 2021
ipv4     2 tcp      6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 
zone=1 use=2
ipv4     2 tcp      6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 
dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 
zone=20 use=2

Here is my guess at what happens:
1. Run curl with empty conntrack zones. Everything is fine: we get the HTTP
response and close the connection. There is one TIME_WAIT entry in the client's
conntrack zone and one in the server's.
2. Run curl with the same source port within nf_conntrack_tcp_timeout_time_wait
seconds.
2.1. OVS gets the packet from the VM and sends it to the client's conntrack
zone=20. It matches the pre-existing TIME_WAIT entry from the previous curl
run, and that entry is deleted. The packet is returned to OVS, and the
recirculated packet has ct.inv (?) and !ct.trk states set and gets dropped (I'm
NOT sure, it's just an assumption! See the ofproto/trace sketch below this list
for one way it might be checked).
3. After one second the client VM retransmits the TCP SYN.
3.1. OVS gets the packet and sends it through the client's conntrack zone=20; a
new entry is added, and the packet comes back with ct.trk and ct.new set. The
packet is recirculated.
3.2. OVS sends the packet to the server's conntrack zone=1. It matches the
pre-existing TIME_WAIT entry from the previous run, conntrack removes that
entry, and the packet is returned to OVS with ct.inv (?) and !ct.trk. The
packet is dropped.
4. The client VM sends the TCP SYN again after two more seconds.
4.1. OVS gets the packet from the client's VIF and sends it to the client's
conntrack zone=20; it matches the pre-existing SYN_SENT entry, and the packet
is returned to OVS with ct.new and ct.trk set.
4.2. OVS sends the packet to the server's conntrack zone=1. The conntrack table
for zone=1 is now empty, so a new entry is added and the packet is returned to
OVS with ct.trk and ct.new set.
4.3. OVS sends the packet to the server's VIF, and subsequent traffic works
normally.
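
One way I can think of to test the ct.inv assumption in steps 2.1/3.2 is
ofproto/trace with the --ct-next option, which resumes the trace after a ct()
action with a caller-supplied connection state. Something along these lines
(the in_port number and MAC addresses below are placeholders for lsp1/lsp2, and
br-int is the integration bridge):

ovs-appctl ofproto/trace br-int \
  'in_port=5,tcp,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,nw_src=172.31.0.18,nw_dst=172.31.0.17,tp_src=44444,tp_dst=80,tcp_flags=+syn' \
  --ct-next 'trk,inv'

If the flows after recirculation drop a packet carrying trk,inv, that would at
least confirm where the first SYN dies.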

So, with this behaviour, connection establishment sometimes takes up to three
seconds (two TCP SYN retransmits), which causes trouble for overlay services
(application timeouts and service outages).
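
As a side note, the stalled handshakes are easy to spot from inside the client
VM while they are happening; the retransmit counter in the timer field keeps
growing for the pending SYN (port 80 here simply matches our test service):

ss -o state syn-sent '( dport = :80 )'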

I've checked how conntrack behaves inside the VMs with this traffic, and it
looks like when conntrack receives a packet that matches a TIME_WAIT
connection, it simply replaces it with a new conntrack entry. No tuning was
done inside the VMs. As the server I used Apache with the default config from
the CentOS distribution.
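
If it helps, the VM-side behaviour can be watched live with conntrack-tools,
for example (run inside one of the VMs):

conntrack -E -p tcp --dport 80

which should show the [DESTROY] of the TIME_WAIT entry followed by a [NEW]
entry when the reused 5-tuple arrives.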

@Numan, @Han, @Mark, could you please take a look at this and share any
suggestions or thoughts on how it can be fixed?
The problem is reproducible with OVS 2.13.4 and the latest OVN master branch;
we also hit it on OVN 20.06.3 with the same OVS, and it's very important for us.

Thanks.


Regards,
Vladislav Odintsov