Hi Dumitru, Numan,

I’ve sent a corresponding patch to openvswitch with my findings.
It’d be great if you could take a look at it. Thanks.
https://patchwork.ozlabs.org/project/openvswitch/patch/20211126205942.9354-1-odiv...@gmail.com/

Regards,
Vladislav Odintsov

> On 21 Sep 2021, at 14:43, Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 9/21/21 1:33 PM, Vladislav Odintsov wrote:
>> Hi Dumitru,
>
> Hi Vladislav,
>
>>
>> are you talking about any specific _missing_ patch?
>
> No, sorry for the confusion. I just meant there's a bug in the OOT
> module that was probably already fixed in the in-tree one so, likely,
> one would have to figure out the patch that fixed it.
>
>>
>> Regards,
>> Vladislav Odintsov
>
> Regards,
> Dumitru
>
>>
>>> On 16 Sep 2021, at 19:09, Dumitru Ceara <dce...@redhat.com> wrote:
>>>
>>> On 9/16/21 4:18 PM, Vladislav Odintsov wrote:
>>>> Sorry, by OOT I meant the non-inbox kmod.
>>>> I’ve tried the inbox kernel module (from the kernel package) and the
>>>> problem is resolved.
>>>>
>>>> Regards,
>>>> Vladislav Odintsov
>>>>
>>>>> On 16 Sep 2021, at 17:17, Vladislav Odintsov <odiv...@gmail.com> wrote:
>>>>>
>>>>> Hi Dumitru,
>>>>>
>>>>> I’ve tried excluding the OOT OVS kernel module.
>>>>> With OVN 20.06.3 + OVS 2.13.4 the problem is solved.
>>>>>
>>>>> Could you please try with the OOT kmod? To me it looks like a bug in
>>>>> the OOT OVS kernel module code.
>>>
>>> You're right, this seems to be a missing patch in the OOT openvswitch
>>> module. I could replicate the problem you reported with the OOT module.
>>>
>>> Regards,
>>> Dumitru
>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Regards,
>>>>> Vladislav Odintsov
>>>>>
>>>>>> On 16 Sep 2021, at 11:02, Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>
>>>>>> On 9/16/21 2:50 AM, Vladislav Odintsov wrote:
>>>>>>> Hi Dumitru,
>>>>>>>
>>>>>>> thanks for your reply.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Vladislav Odintsov
>>>>>>>
>>>>>>>> On 15 Sep 2021, at 11:24, Dumitru Ceara <dce...@redhat.com> wrote:
>>>>>>>>
>>>>>>>> Hi Vladislav,
>>>>>>>>
>>>>>>>> On 9/13/21 6:14 PM, Vladislav Odintsov wrote:
>>>>>>>>> Hi Numan,
>>>>>>>>>
>>>>>>>>> I’ve checked with OVS 2.16.0 and OVN master. The problem persists.
>>>>>>>>> Symptoms are the same.
>>>>>>>>>
>>>>>>>>> # grep ct_zero_snat /var/log/openvswitch/ovs-vswitchd.log
>>>>>>>>> 2021-09-13T16:10:01.792Z|00019|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zero_snat
>>>>>>>>
>>>>>>>> This shouldn't be related to the problem we fixed with ct_zero_snat.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Vladislav Odintsov
>>>>>>>>>
>>>>>>>>>> On 13 Sep 2021, at 17:54, Numan Siddique <num...@ovn.org> wrote:
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 13, 2021 at 8:10 AM Vladislav Odintsov <odiv...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> we’ve encountered another problem with stateful ACLs.
>>>>>>>>>>>
>>>>>>>>>>> Suppose we have one logical switch (ls1) with two VIF-type logical
>>>>>>>>>>> ports attached to it (lsp1, lsp2).
>>>>>>>>>>> Each logical port has a Linux VM behind it.
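>>>>>>>>>>>
>>>>>>>>>>> (For reference, the setup, including the port group and ACLs
>>>>>>>>>>> described below, could be created roughly as follows. The MAC
>>>>>>>>>>> addresses, the ACL priority and the lsp-to-VM mapping are
>>>>>>>>>>> illustrative assumptions, not our exact configuration.)
>>>>>>>>>>>
>>>>>>>>>>> # Minimal sketch of the topology (addresses are made up):
>>>>>>>>>>> ovn-nbctl ls-add ls1
>>>>>>>>>>> ovn-nbctl lsp-add ls1 lsp1 -- lsp-set-addresses lsp1 "00:00:00:00:00:01 172.31.0.18"
>>>>>>>>>>> ovn-nbctl lsp-add ls1 lsp2 -- lsp-set-addresses lsp2 "00:00:00:00:00:02 172.31.0.17"
>>>>>>>>>>> # Put both ports in a port group and attach the two ACLs to it:
>>>>>>>>>>> ovn-nbctl pg-add pg1 lsp1 lsp2
>>>>>>>>>>> ovn-nbctl --type=port-group acl-add pg1 to-lport 1000 'outport == @pg1 && ip4 && ip4.dst == 0.0.0.0/0' allow-related
>>>>>>>>>>> ovn-nbctl --type=port-group acl-add pg1 from-lport 1000 'inport == @pg1 && ip4 && ip4.src == 0.0.0.0/0' allow-related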
>>>>>>>>>>>
>>>>>>>>>>> The logical ports reside in a port group (pg1) and two ACLs are
>>>>>>>>>>> created within this PG:
>>>>>>>>>>> to-lport outport == @pg1 && ip4 && ip4.dst == 0.0.0.0/0 allow-related
>>>>>>>>>>> from-lport inport == @pg1 && ip4 && ip4.src == 0.0.0.0/0 allow-related
>>>>>>>>>>>
>>>>>>>>>>> When we have a high connection-rate service between the VMs, tcp
>>>>>>>>>>> source/dest ports may be reused before the connection is deleted
>>>>>>>>>>> from the LSPs’ related conntrack zones on the host.
>>>>>>>>>>> Let’s use curl, passing the --local-port argument, so that we get
>>>>>>>>>>> the same source port each time.
>>>>>>>>>>>
>>>>>>>>>>> Run it from one VM to another (172.31.0.18 -> 172.31.0.17):
>>>>>>>>>>> curl --local-port 44444 http://172.31.0.17/
>>>>>>>>>>>
>>>>>>>>>>> Check the connections in the client’s and server’s vif zones
>>>>>>>>>>> (client - zone=20, server - zone=1): run a "while true" loop that
>>>>>>>>>>> prints the connection state every 0.2 seconds, while running a new
>>>>>>>>>>> connection with the same source/dest 5-tuple:
>>>>>>>>>>>
>>>>>>>>>>> while true; do date; grep -e 'zone=1 ' -e zone=20 /proc/net/nf_conntrack; sleep 0.2; done
>>>>>>>>>>>
>>>>>>>>>>> Right after we’ve successfully run curl, the connection goes through
>>>>>>>>>>> the CLOSE_WAIT and then the TIME_WAIT state:
>>>>>>>>>>>
>>>>>>>>>>> Mon Sep 13 14:34:39 MSK 2021
>>>>>>>>>>> ipv4 2 tcp 6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>> ipv4 2 tcp 6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
>>>>>>>>>>> Mon Sep 13 14:34:39 MSK 2021
>>>>>>>>>>> ipv4 2 tcp 6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>> ipv4 2 tcp 6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
>>>>>>>>>>>
>>>>>>>>>>> And it remains in the TIME_WAIT state for
>>>>>>>>>>> nf_conntrack_time_wait_timeout (120 seconds on CentOS 7).
>>>>>>>>>>>
>>>>>>>>>>> Everything is okay so far.
>>>>>>>>>>> While we have the installed connections in TW state in zones 1 and
>>>>>>>>>>> 20, let’s run this curl (source port 44444) again:
>>>>>>>>>>> the 1st SYN packet is lost. It didn’t reach the destination VM. In
>>>>>>>>>>> conntrack we have:
>>>>>>>>>>>
>>>>>>>>>>> Mon Sep 13 14:34:41 MSK 2021
>>>>>>>>>>> ipv4 2 tcp 6 118 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>>
>>>>>>>>>>> We see that the TW entry was deleted from the source vif’s zone (20).
>>>>>>>>>>>
>>>>>>>>>>> Next, after one second, TCP retries; the connection in the
>>>>>>>>>>> destination (server’s) zone is deleted and a new connection is
>>>>>>>>>>> created in the source (client’s) zone:
>>>>>>>>>>>
>>>>>>>>>>> Mon Sep 13 14:34:41 MSK 2021
>>>>>>>>>>> ipv4 2 tcp 6 120 SYN_SENT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 [UNREPLIED] src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 mark=0 zone=20 use=2
>>>>>>>>>>>
>>>>>>>>>>> The server VM still didn’t get this SYN packet. It got dropped.
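>>>>>>>>>>>
>>>>>>>>>>> (A sketch of how the drop could presumably be located, on the
>>>>>>>>>>> assumption that the lost SYN hits a datapath flow matching an
>>>>>>>>>>> invalid ct state or a drop action:)
>>>>>>>>>>>
>>>>>>>>>>> # dump datapath flows while the retries happen; look for flows
>>>>>>>>>>> # with "inv" in ct_state and "drop" actions
>>>>>>>>>>> ovs-appctl dpctl/dump-flows | grep -E '\+inv|actions:drop'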
>>>>>>>>>>>
>>>>>>>>>>> Then, after 2 more seconds, TCP retries again and the connection
>>>>>>>>>>> works fine:
>>>>>>>>>>>
>>>>>>>>>>> Mon Sep 13 14:34:44 MSK 2021
>>>>>>>>>>> ipv4 2 tcp 6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>> ipv4 2 tcp 6 59 CLOSE_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
>>>>>>>>>>> Mon Sep 13 14:34:44 MSK 2021
>>>>>>>>>>> ipv4 2 tcp 6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=1 use=2
>>>>>>>>>>> ipv4 2 tcp 6 119 TIME_WAIT src=172.31.0.18 dst=172.31.0.17 sport=44444 dport=80 src=172.31.0.17 dst=172.31.0.18 sport=80 dport=44444 [ASSURED] mark=0 zone=20 use=2
>>>>>>>>>>>
>>>>>>>>>>> My guess at what could be happening:
>>>>>>>>>>> 1. Run curl with empty conntrack zones. Everything is good: we’ve
>>>>>>>>>>> got the http response and closed the connection. There’s one TW
>>>>>>>>>>> entry in the client’s and one in the server’s conntrack zone.
>>>>>>>>>>> 2. Run curl with the same source port within
>>>>>>>>>>> nf_conntrack_time_wait_timeout seconds.
>>>>>>>>>>> 2.1. OVS gets the packet from the VM and sends it to the client’s
>>>>>>>>>>> conntrack zone=20. It matches the pre-existing TW conntrack entry
>>>>>>>>>>> from the previous curl run. The TW connection in conntrack is
>>>>>>>>>>> deleted. A copy of the packet is returned to OVS, and the
>>>>>>>>>>> recirculated packet has the ct.inv (?) and !ct.trk states and gets
>>>>>>>>>>> dropped (I’m NOT sure, it’s just an assumption!).
>>>>>>>>>>> 3. After one second the client VM resends the TCP SYN.
>>>>>>>>>>> 3.1. OVS gets the packet and sends it through the client’s conntrack
>>>>>>>>>>> zone=20; a new connection is added and the packet has the ct.trk and
>>>>>>>>>>> ct.new states set. The packet goes to recirculation.
>>>>>>>>>>> 3.2. OVS sends the packet to the server’s conntrack zone=1. It
>>>>>>>>>>> matches the pre-existing TW conntrack entry from the previous run.
>>>>>>>>>>> Conntrack removes this entry. The packet is returned to OVS with
>>>>>>>>>>> ct.inv (?) and !ct.trk. The packet gets dropped.
>>>>>>>>>>> 4. The client’s VM sends the TCP SYN again after 2 more seconds.
>>>>>>>>>>> 4.1. OVS gets the packet from the client’s VIF and sends it to the
>>>>>>>>>>> client’s conntrack zone=20; it matches the pre-existing SYN_SENT
>>>>>>>>>>> conntrack entry, and the packet is returned to OVS with the ct.new
>>>>>>>>>>> and ct.trk flags set.
>>>>>>>>>>> 4.2. OVS sends the packet to the server’s conntrack zone=1. The
>>>>>>>>>>> conntrack table for zone=1 is empty, so it adds a new entry and
>>>>>>>>>>> returns the packet to OVS with the ct.trk and ct.new flags set.
>>>>>>>>>>> 4.3. OVS sends the packet to the server’s VIF; subsequent traffic
>>>>>>>>>>> operates normally.
>>>>>>>>>>>
>>>>>>>>>>> So, with such behaviour, connection establishment sometimes takes up
>>>>>>>>>>> to three seconds (2 TCP SYN retries) and causes trouble for overlay
>>>>>>>>>>> services (application timeouts and service outages).
>>>>>>>>>>>
>>>>>>>>>>> I’ve checked how conntrack inside the VMs handles such traffic, and
>>>>>>>>>>> it looks like when conntrack gets a packet matching a TW connection
>>>>>>>>>>> it recreates the conntrack entry. No tuning inside the VMs was
>>>>>>>>>>> performed. As a server I used apache with the default config from
>>>>>>>>>>> the CentOS distribution.
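>>>>>>>>>>>
>>>>>>>>>>> (If the assumption in 2.1/3.2 is right, the drops should presumably
>>>>>>>>>>> also show up as hits on the OpenFlow flows that OVN installs to drop
>>>>>>>>>>> invalid ct state; a sketch, assuming the integration bridge is named
>>>>>>>>>>> br-int:)
>>>>>>>>>>>
>>>>>>>>>>> # watch n_packets on flows matching ct_state=+inv while rerunning curl
>>>>>>>>>>> ovs-ofctl dump-flows br-int | grep 'ct_state=+inv'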
>>>>>>>>
>>>>>>>> I don't have a CentOS 7 at hand but I do have a RHEL 7
>>>>>>>> (3.10.0-1160.36.2.el7.x86_64) and I didn't manage to hit the issue you
>>>>>>>> reported here (using OVS and OVN upstream master). The SYN matching the
>>>>>>>> conntrack entry in state TIME_WAIT moves the entry to NEW and seems to
>>>>>>>> be forwarded just fine; the session afterwards goes to ESTABLISHED.
>>>>>>>>
>>>>>>>> Wed Sep 15 04:18:35 AM EDT 2021
>>>>>>>> conntrack v1.4.5 (conntrack-tools): 7 flow entries have been shown.
>>>>>>>> tcp 6 431930 ESTABLISHED src=42.42.42.2 dst=42.42.42.3 sport=4141 dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=6 use=1
>>>>>>>> tcp 6 431930 ESTABLISHED src=42.42.42.2 dst=42.42.42.3 sport=4141 dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=3 use=1
>>>>>>>> --
>>>>>>>> Wed Sep 15 04:18:36 AM EDT 2021
>>>>>>>> conntrack v1.4.5 (conntrack-tools): 7 flow entries have been shown.
>>>>>>>> tcp 6 119 TIME_WAIT src=42.42.42.2 dst=42.42.42.3 sport=4141 dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=6 use=1
>>>>>>>> tcp 6 119 TIME_WAIT src=42.42.42.2 dst=42.42.42.3 sport=4141 dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=3 use=1
>>>>>>>> --
>>>>>>>> Wed Sep 15 04:18:38 AM EDT 2021
>>>>>>>> conntrack v1.4.5 (conntrack-tools): 7 flow entries have been shown.
>>>>>>>> tcp 6 431999 ESTABLISHED src=42.42.42.2 dst=42.42.42.3 sport=4141 dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=6 use=1
>>>>>>>> tcp 6 431999 ESTABLISHED src=42.42.42.2 dst=42.42.42.3 sport=4141 dport=4242 src=42.42.42.3 dst=42.42.42.2 sport=4242 dport=4141 [ASSURED] mark=0 secctx=system_u:object_r:unlabeled_t:s0 zone=3 use=1
>>>>>>>> --
>>>>>>>>
>>>>>>>> The DP flows just after the second session is initiated also seem to
>>>>>>>> confirm that everything is fine:
>>>>>>>>
>>>>>>>> # ovs-appctl dpctl/dump-flows | grep -oE "ct_state(.*),ct_label"
>>>>>>>> ct_state(+new-est-rel-rpl-inv+trk),ct_label
>>>>>>>> ct_state(-new+est-rel-rpl-inv+trk),ct_label
>>>>>>>> ct_state(-new+est-rel+rpl-inv+trk),ct_label
>>>>>>>> ct_state(+new-est-rel-rpl-inv+trk),ct_label
>>>>>>>> ct_state(-new+est-rel+rpl-inv+trk),ct_label
>>>>>>>> ct_state(-new+est-rel-rpl-inv+trk),ct_label
>>>>>>>>
>>>>>>>> I also tried it out on a Fedora 34 with 5.13.14-200.fc34.x86_64; it
>>>>>>>> still works fine.
>>>>>>>>
>>>>>>>> What kernel and openvswitch module versions do you use?
>>>>>>>>
>>>>>>> On my box there is CentOS 7.5 with kernel 3.10.0-862.14.4.el7 and the
>>>>>>> OOT kernel module.
>>>>>>> I’ve tested two versions; the problem was hit with both:
>>>>>>> openvswitch-kmod-2.13.4-1.el7_5.x86_64
>>>>>>> openvswitch-kmod-2.16.0-1.el7_5.x86_64
>>>>>>>
>>>>>>> Do you think the problem could be related to the kernel (conntrack),
>>>>>>> and the kernel must be upgraded here?
>>>>>>> Or maybe I should try master OVS, as you did?
>>>>>>
>>>>>> I just tried with OVS v2.13.4, OVN master and it all worked fine (both
>>>>>> on Fedora 34 and rhel 7). I don't think the problem is in user space.
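>>>>>>
>>>>>> (To double-check which openvswitch module is actually loaded on your
>>>>>> box, something like the following should help; this is a generic
>>>>>> sketch. The in-tree module lives under kernel/net/openvswitch/, while
>>>>>> the OOT kmod package installs it elsewhere and, unlike the in-tree
>>>>>> one, usually reports an OVS "version:" field:)
>>>>>>
>>>>>> # the path tells you in-tree vs OOT; "version:" is set by the OOT module
>>>>>> modinfo openvswitch | grep -E '^(filename|version)'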
>>>>>>
>>>>>> Regards,
>>>>>> Dumitru

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev