On Fri, Nov 19, 2021 at 3:11 PM Ilya Maximets <[email protected]> wrote:
>
> On 11/19/21 19:12, Vladislav Odintsov wrote:
> > Hi,
> >
> > I’m testing OVN stateless ACL rules with `$port_group_ipVERSION` in match portion.
> > There’s a strange behaviour and sometimes I got configuration, which totally kills my transport nodes, where logical switch ports reside.
> > ovs-vswitchd and ovn-controller processes utilise 100% 1 core CPU each and ovs-vswitchd consumes all free memory and repeatedly got killed by OOM-killer. It consumes 5GB memory in 5-10 seconds!
> >
> > I reproduced this with OVS 2.13.4 & OVN main, but also tried with actual OVS master branch and the problem still reproduces.
> >
> > Below are steps to reproduce:
> >
> <snip>
> >
> > I couldn’t get any source of the problem except to find the steps to reproduce.
> > Can somebody please take a look on this?
> > This looks like a potential serious problem for OVN transport nodes.
>
> This indeed looks like a serious issue.
> And thanks for the great detailed report! That was really easy to reproduce.
>
> I think, I found the main problem. Could you try the following patch:
> https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/
> ?

Thanks Vladislav for reporting, and thanks Ilya for the quick fix! The fix looks good to me. However, I think this bug report reveals more problems that need to be addressed. I could also reproduce it easily, and I see at least 3 problems:

1) The simple ACL condition shouldn't generate such a huge number of flows (>60k) in the first place. The ovn-controller expression parser doesn't handle != for constant sets efficiently. It can be optimized to combine most of the matches. For the example in this report, I'd expect at most hundreds of flows in total. I have some ideas but need to try them out.

2) The memory spike in OVS, as explained and fixed by Ilya. Really great finding and fix! The fix is definitely required even if 1) is solved, because there are real situations in which a large number of flows is generated and installed at once.

3) What's still unclear to me, related to 2), is what happens after the bundle processing is finished: the quiescent state should be entered and the RCU thread should free the temporarily allocated memory, right? But at least in my test I don't see the memory go down. With 60K flows OVS has 3.3G RES, which is unreasonable.

Thanks,
Han

>
> Best regards, Ilya Maximets.
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
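
To illustrate why a bound in the hundreds is plausible for point 1): the set of IPv4 addresses that are *not* in a constant set of n addresses can always be covered by at most 32*n disjoint CIDR prefixes. The sketch below computes such a cover with Python's standard `ipaddress` module. It is only an illustration of the counting argument; it is not the ovn-controller expression parser and not necessarily the optimization meant above, and the helper name `complement_prefixes` and the sample addresses are made up for this example.

```python
import ipaddress


def complement_prefixes(addrs):
    """Return disjoint IPv4 prefixes covering every address NOT in addrs.

    Hypothetical helper for illustration only; not OVN code.
    """
    remaining = [ipaddress.ip_network("0.0.0.0/0")]
    for addr in addrs:
        host = ipaddress.ip_network(f"{addr}/32")
        next_remaining = []
        for net in remaining:
            if host.subnet_of(net):
                # Split `net` into the prefixes that do not contain `host`
                # (yields nothing when net == host).
                next_remaining.extend(net.address_exclude(host))
            else:
                next_remaining.append(net)
        remaining = next_remaining
    return remaining


if __name__ == "__main__":
    excluded = ["10.0.0.1", "10.0.0.2", "192.168.1.7"]  # sample values
    prefixes = complement_prefixes(excluded)
    print(len(prefixes), "prefixes cover everything except", excluded)
```

Each returned prefix corresponds to one masked match on the address field, so under this scheme the number of matches grows linearly with the size of the set rather than multiplicatively.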
