On Fri, Aug 30, 2019 at 8:18 PM Han Zhou <zhou...@gmail.com> wrote:
>
> On Fri, Aug 30, 2019 at 6:46 AM Mark Michelson <mmich...@redhat.com> wrote:
> >
> > On 8/30/19 5:39 AM, Daniel Alvarez Sanchez wrote:
> > > On Thu, Aug 29, 2019 at 10:01 PM Mark Michelson <mmich...@redhat.com> wrote:
> > >>
> > >> On 8/29/19 2:39 PM, Numan Siddique wrote:
> > >>> Hello Everyone,
> > >>>
> > >>> In one of the OVN deployments, we are seeing 100% CPU usage by
> > >>> ovn-controllers all the time.
> > >>>
> > >>> After investigating, we found the following:
> > >>>
> > >>> - ovn-controller is taking more than 20 seconds to complete a full
> > >>>   loop (mainly in the lflow_run() function).
> > >>>
> > >>> - The physical switch is sending GARPs periodically, every 10 seconds.
> > >>>
> > >>> - ovn-bridge-mappings is configured, and these GARP packets reach
> > >>>   br-int via the patch port.
> > >>>
> > >>> - We have a flow in the router pipeline which applies the put_arp
> > >>>   action if the packet is an ARP packet.
> > >>>
> > >>> - The ovn-controller pinctrl thread receives these GARPs, stores the
> > >>>   learnt MAC-IP pairs in the 'put_mac_bindings' hmap, and notifies
> > >>>   the ovn-controller main thread by incrementing the seqno.
> > >>>
> > >>> - In the ovn-controller main thread, after lflow_run() finishes,
> > >>>   pinctrl_wait() is called. This function calls poll_immediate_wake(),
> > >>>   since the 'put_mac_bindings' hmap is not empty.
> > >>>
> > >>> - This causes the ovn-controller poll_block() to not sleep at all,
> > >>>   and this repeats every iteration, resulting in 100% CPU usage.
> > >>>
> > >>> The deployment has OVS/OVN 2.9. We have backported the pinctrl_thread
> > >>> patch.
> > >>>
> > >>> Some time back I reported an issue about lflow_run() taking a lot of
> > >>> time:
> > >>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
> > >>>
> > >>> I think we need to improve the logical flow processing sooner or
> > >>> later.
> > >>
> > >> I agree that this is very important.
I know that logical flow processing
> > >> is the biggest bottleneck for ovn-controller, but 20 seconds is just
> > >> ridiculous. In your scale testing, you found that lflow_run() was
> > >> taking 10 seconds to complete.
> > >
> > > I support this statement 100% (20 seconds is just ridiculous). To be
> > > precise, in this deployment we see over 23 seconds for the main loop
> > > to process, and I've seen even 30 seconds at times. I've been talking
> > > to Numan these days about this issue, and I support profiling this
> > > actual deployment so that we can figure out how incremental processing
> > > would help.
> > >
> > >> I'm curious if there are any factors in this particular deployment's
> > >> configuration that might contribute to this. For instance, does this
> > >> deployment have a glut of ACLs? Are they not using port groups?
> > >
> > > They're not using port groups because it's 2.9 and the feature is not
> > > there. However, I don't think port groups would make a big difference
> > > in terms of ovn-controller computation. I might be wrong, but port
> > > groups help reduce the number of ACLs in the NB database while the
> > > number of logical flows would still remain the same. We'll try to get
> > > the contents of the NB database and figure out what's killing it.
> >
> > You're right that port groups won't reduce the number of logical flows.
>
> I think port groups reduce the number of logical flows significantly, and
> also reduce OVS flows when conjunctive matches are effective.
Right, definitely the number of lflows will be much lower. My bad, as I was
directly involved in this! :) I was just thinking that the number of OVS
flows would remain the same, so the computation for ovn-controller would be
similar, but I missed the conjunctive matches part in my statement.

> Please see my calculation here:
> https://www.slideshare.net/hanzhou1978/large-scale-overlay-networks-with-ovn-problems-and-solutions/30
>
> > However, it can reduce the computation in ovn-controller. The reason is
> > that the logical flows generated by ACLs that use port groups may result
> > in conjunctive matches being used. If you want a bit more information,
> > see the "Port groups" section of this blog post I wrote:
> >
> > https://developers.redhat.com/blog/2019/01/02/performance-improvements-in-ovn-past-and-future/
> >
> > The TL;DR is that with port groups, I saw the number of OpenFlow flows
> > generated by ovn-controller drop by three orders of magnitude. And that
> > meant that flow processing was 99% faster for large networks.
> >
> > You may not see the same sort of improvement for this deployment, mainly
> > because my test case was tailored to illustrate how port groups help.
> > There may be other factors in this deployment that complicate flow
> > processing.
> >
> > >> This particular deployment's configuration may give us a good
> > >> scenario for our testing to improve lflow processing time.
> > >
> > > Absolutely!
> > >
> > >>> But to fix this issue urgently, we are thinking of the below
> > >>> approach.
> > >>>
> > >>> - pinctrl_thread will locally cache the mac_binding entries (just
> > >>>   like it caches the DNS entries). (Please note that pinctrl_thread
> > >>>   cannot access the SB DB IDL.)
> > >>>
> > >>> - Upon receiving any ARP packet (via the put_arp action),
> > >>>   pinctrl_thread will check the local mac_binding cache and will
> > >>>   wake up the main ovn-controller thread only if a mac_binding
> > >>>   update is required.
> > >>>
> > >>> This approach will solve the issue, since the MACs sent by the
> > >>> physical switches will not change, so there is no need to wake up
> > >>> the ovn-controller main thread.
> > >>
> > >> I think this can work well. We have a lot of what's needed already in
> > >> pinctrl at this point. We have the hash table of MAC bindings
> > >> already. Currently, we flush this table after we write the data to
> > >> the southbound database. Instead, we would keep the bindings in
> > >> memory. We would need to ensure that the in-memory MAC bindings
> > >> eventually get deleted if they become stale.
> > >>
> > >>> In the present master/2.12, these GARPs will not cause this 100% CPU
> > >>> loop issue because incremental processing will not recompute flows.
> > >>
> > >> Another mitigating factor for master is something I'm currently
> > >> working on. I've got the beginnings of a patch series going where I
> > >> am separating pinctrl into a separate process from ovn-controller:
> > >> https://github.com/putnopvut/ovn/tree/pinctrl_process
> > >>
> > >> It's in the early stages right now, so please don't judge :)
> > >>
> > >> Separating pinctrl into its own process means that it cannot directly
> > >> cause ovn-controller to wake up like it currently can.
> > >>
> > >>> Even though the above approach is not really required for
> > >>> master/2.12, I think it is still OK to have, as there is no harm.
> > >>>
> > >>> I would like to know your comments and any concerns, if any.
> > >>
> > >> Hm, I don't really understand why we'd want to put this in
> > >> master/2.12 if the problem doesn't exist there. The main concern I
> > >> have is with regard to cache lifetime. I don't want to introduce
> > >> potential memory growth concerns into a branch if it's not necessary.
> > >>
> > >> Is there a way for us to get this included in 2.9-2.11 without having
> > >> to put it in master or 2.12?
It's hard to classify this as a bug fix,
> > >> really, but it does prevent unwanted behavior in real-world setups.
> > >> Could we get an opinion from committers on this?
> > >>
> > >>> Thanks
> > >>> Numan
> > >>>
> > >>> _______________________________________________
> > >>> discuss mailing list
> > >>> disc...@openvswitch.org
> > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
> >
> > _______________________________________________
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss