On Fri, Aug 30, 2019 at 8:18 PM Han Zhou <zhou...@gmail.com> wrote:
>
>
>
> On Fri, Aug 30, 2019 at 6:46 AM Mark Michelson <mmich...@redhat.com> wrote:
> >
> > On 8/30/19 5:39 AM, Daniel Alvarez Sanchez wrote:
> > > On Thu, Aug 29, 2019 at 10:01 PM Mark Michelson <mmich...@redhat.com> 
> > > wrote:
> > >>
> > >> On 8/29/19 2:39 PM, Numan Siddique wrote:
> > >>> Hello Everyone,
> > >>>
> > >>> In one of the OVN deployments, we are seeing 100% CPU usage by
> > >>> ovn-controllers all the time.
> > >>>
> > >>> After investigating, we found the following:
> > >>>
> > >>>    - ovn-controller is taking more than 20 seconds to complete a full
> > >>> loop (mainly in the lflow_run() function)
> > >>>
> > >>>    - The physical switch is sending GARPs every 10 seconds.
> > >>>
> > >>>    - ovn-bridge-mappings is configured, and these GARP packets
> > >>> reach br-int via the patch port.
> > >>>
> > >>>    - We have a flow in the router pipeline which applies the put_arp
> > >>> action if the packet is an ARP packet.
> > >>>
> > >>>    - The ovn-controller pinctrl thread receives these GARPs, stores the
> > >>> learned MAC-IP bindings in the 'put_mac_bindings' hmap, and notifies the
> > >>> ovn-controller main thread by incrementing the seq number.
> > >>>
> > >>>    - In the ovn-controller main thread, after lflow_run() finishes,
> > >>> pinctrl_wait() is called. This function calls poll_immediate_wake()
> > >>> because the 'put_mac_bindings' hmap is not empty.
> > >>>
> > >>> - This causes poll_block() in ovn-controller to not sleep at all, and
> > >>> this repeats on every iteration, resulting in 100% CPU usage.
> > >>>
> > >>> The deployment has OVS/OVN 2.9.  We have backported the pinctrl_thread
> > >>> patch.
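
For anyone skimming the thread, here is a rough sketch of the wake-up
pattern Numan describes. This is a simplified model, not the actual
ovn-controller source; only hmap_is_empty() and poll_immediate_wake()
are real OVS helpers, the rest is schematic:

    /* pinctrl_wait() runs once per main-loop iteration, after
     * lflow_run().  While the put_mac_bindings hmap is non-empty, it
     * arms an immediate wake-up, so poll_block() never sleeps. */
    static void
    pinctrl_wait(void)
    {
        if (!hmap_is_empty(&put_mac_bindings)) {
            /* With GARPs arriving every 10 seconds and the loop taking
             * 20+ seconds, the hmap is effectively never empty, so
             * this fires on every single iteration. */
            poll_immediate_wake();
        }
    }

    /* Schematic shape of the main loop: */
    int
    main(void)
    {
        for (;;) {
            lflow_run();       /* 20+ seconds in this deployment */
            pinctrl_wait();    /* immediate wake-up if hmap non-empty */
            poll_block();      /* returns at once; CPU never rests */
        }
    }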
> > >>>
> > >>> Some time back I had reported an issue about lflow_run() taking a lot
> > >>> of time -
> > >>> https://mail.openvswitch.org/pipermail/ovs-dev/2019-July/360414.html
> > >>>
> > >>> I think we need to improve the logical processing sooner or later.
> > >>
> > >> I agree that this is very important. I know that logical flow processing
> > >> is the biggest bottleneck for ovn-controller, but 20 seconds is just
> > >> ridiculous. In your scale testing, you found that lflow_run() was taking
> > >> 10 seconds to complete.
> > > I support this statement 100% (20 seconds is just ridiculous). To be
> > > precise, in this deployment we see over 23 seconds for the main loop
> > > to process, and at times I've seen even 30 seconds. I've been talking
> > > to Numan about this issue these days, and I support profiling this
> > > actual deployment so that we can figure out how incremental processing
> > > would help.
> > >
> > >>
> > >> I'm curious if there are any factors in this particular deployment's
> > >> configuration that might contribute to this. For instance, does this
> > >> deployment have a glut of ACLs? Are they not using port groups?
> > > They're not using port groups because this deployment is on 2.9, where
> > > the feature is not available. However, I don't think port groups would
> > > make a big difference in terms of ovn-controller computation. I might
> > > be wrong, but port groups help reduce the number of ACLs in the NB
> > > database while the number of logical flows would still remain the
> > > same. We'll try to get the contents of the NB database and figure out
> > > what's killing it.
> > >
> >
> > You're right that port groups won't reduce the number of logical flows.
>
> I think port groups reduce the number of logical flows significantly, and
> also reduce the number of OVS flows when conjunctive matches are effective.

Right, the number of lflows will definitely be much lower. My bad, as I
was directly involved in this! :) I was thinking that the number of OVS
flows would remain the same, so the computation in ovn-controller would
be similar, but I missed the conjunctive-matches part in my statement.


> Please see my calculation here: 
> https://www.slideshare.net/hanzhou1978/large-scale-overlay-networks-with-ovn-problems-and-solutions/30
>
> > However, it can reduce the computation in ovn-controller. The reason is
> > that the logical flows generated by ACLs that use port groups may result
> > in conjunctive matches being used. If you want a bit more information,
> > see the "Port groups" section of this blog post I wrote:
> >
> > https://developers.redhat.com/blog/2019/01/02/performance-improvements-in-ovn-past-and-future/
> >
> > The TL;DR is that with port groups, I saw the number of OpenFlow flows
> > generated by ovn-controller drop by 3 orders of magnitude. And that
> > meant that flow processing was 99% faster for large networks.
> >
> > You may not see the same sort of improvement for this deployment, mainly
> > because my test case was tailored to illustrate how port groups help.
> > There may be other factors in this deployment that complicate flow
> > processing.
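
As a rough illustration of the conjunction effect (hypothetical flows,
not taken from this deployment): an ACL that matches M source addresses
against N destination ports needs M*N ordinary OpenFlow flows, but only
M+N+1 with conjunctive match. For M=2, N=2:

    # Without conjunction: one flow per (src, dst) pair -> M*N flows.
    tcp,nw_src=10.0.0.1,tp_dst=80  actions=<verdict>
    tcp,nw_src=10.0.0.1,tp_dst=443 actions=<verdict>
    tcp,nw_src=10.0.0.2,tp_dst=80  actions=<verdict>
    tcp,nw_src=10.0.0.2,tp_dst=443 actions=<verdict>

    # With conjunction: M + N dimension flows plus one conj_id flow.
    ip,nw_src=10.0.0.1 actions=conjunction(1,1/2)
    ip,nw_src=10.0.0.2 actions=conjunction(1,1/2)
    tcp,tp_dst=80      actions=conjunction(1,2/2)
    tcp,tp_dst=443     actions=conjunction(1,2/2)
    conj_id=1,ip       actions=<verdict>

With thousands of ports in a group, that multiplicative-to-additive
change is where the orders-of-magnitude reduction comes from.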
> >
> > >>
> > >> This particular deployment's configuration may give us a good scenario
> > >> for our testing to improve lflow processing time.
> > > Absolutely!
> > >>
> > >>>
> > >>> But to fix this issue urgently, we are thinking of the approach below.
> > >>>
> > >>>    - pinctrl_thread will locally cache the mac_binding entries (just
> > >>> like it caches the dns entries). (Please note that pinctrl_thread
> > >>> cannot access the SB DB IDL.)
> > >>>
> > >>> - Upon receiving any ARP packet (via the put_arp action), pinctrl_thread
> > >>> will check the local mac_binding cache and will wake up the main
> > >>> ovn-controller thread only if a mac_binding update is required.
> > >>>
> > >>> This approach will solve the issue, since the MACs sent by the physical
> > >>> switches will not change, so there is no need to wake up the
> > >>> ovn-controller main thread.
> > >>
> > >> I think this can work well. We have a lot of what's needed already in
> > >> pinctrl at this point. We have the hash table of mac bindings already.
> > >> Currently, we flush this table after we write the data to the southbound
> > >> database. Instead, we would keep the bindings in memory. We would need
> > >> to ensure that the in-memory MAC bindings eventually get deleted if they
> > >> become stale.
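
To make the caching idea concrete, here is a rough sketch of what the
pinctrl-local check could look like. This is not a patch: the entry
struct and the mac_cache_find()/mac_cache_hash() helpers are made up
for illustration, and only the hmap/packets/util APIs are real OVS ones
(include paths approximate):

    #include <netinet/in.h>        /* struct in6_addr */
    #include <stdlib.h>            /* free() */
    #include "openvswitch/hmap.h"  /* hmap, HMAP_FOR_EACH_SAFE */
    #include "packets.h"           /* struct eth_addr, eth_addr_equals() */
    #include "util.h"              /* xmalloc(), xstrdup() */

    /* Hypothetical cache entry kept privately by the pinctrl thread. */
    struct cached_mac_binding {
        struct hmap_node node;
        char *logical_port;          /* logical port of the binding */
        struct in6_addr ip;          /* bound IP address */
        struct eth_addr mac;         /* last MAC seen for this IP */
        long long int last_seen_ms;  /* for expiring stale entries */
    };

    #define MAC_CACHE_TIMEOUT_MS (300 * 1000)   /* arbitrary expiry */

    /* Called from the pinctrl thread for each put_arp.  Returns true
     * only when the binding is new or its MAC changed; only then would
     * the entry be added to put_mac_bindings and the seq incremented
     * to wake the main thread. */
    static bool
    mac_cache_update(struct hmap *cache, const char *lport,
                     const struct in6_addr *ip,
                     const struct eth_addr *mac, long long int now_ms)
    {
        struct cached_mac_binding *e = mac_cache_find(cache, lport, ip);

        if (e && eth_addr_equals(e->mac, *mac)) {
            e->last_seen_ms = now_ms;   /* refresh only; no wake-up */
            return false;
        }
        if (!e) {
            e = xmalloc(sizeof *e);
            e->logical_port = xstrdup(lport);
            e->ip = *ip;
            hmap_insert(cache, &e->node, mac_cache_hash(lport, ip));
        }
        e->mac = *mac;
        e->last_seen_ms = now_ms;
        return true;    /* new or changed: wake the main thread */
    }

    /* Periodic sweep addressing the staleness concern above. */
    static void
    mac_cache_expire(struct hmap *cache, long long int now_ms)
    {
        struct cached_mac_binding *e, *next;
        HMAP_FOR_EACH_SAFE (e, next, node, cache) {
            if (now_ms - e->last_seen_ms > MAC_CACHE_TIMEOUT_MS) {
                hmap_remove(cache, &e->node);
                free(e->logical_port);
                free(e);
            }
        }
    }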
> > >>
> > >>>
> > >>> On present master/2.12, these GARPs will not cause this 100% CPU loop
> > >>> issue because incremental processing will not recompute flows.
> > >>
> > >> Another mitigating factor for master is something I'm currently working
> > >> on. I've got the beginnings of a patch series going where I am
> > >> separating pinctrl into a separate process from ovn-controller:
> > >> https://github.com/putnopvut/ovn/tree/pinctrl_process
> > >>
> > >> It's in the early stages right now, so please don't judge :)
> > >>
> > >> Separating pinctrl into its own process means that it cannot directly
> > >> cause ovn-controller to wake up like it currently might.
> > >>
> > >>>
> > >>> Even though the above approach is not really required for master/2.12, I
> > >>> think it is still OK to have it there, as there is no harm.
> > >>>
> > >>> I would like to know your comments and any concerns.
> > >>
> > >> Hm, I don't really understand why we'd want to put this in master/2.12
> > >> if the problem doesn't exist there. The main concern I have is with
> > >> regard to cache lifetime. I don't want to introduce potential memory
> > >> growth concerns into a branch if it's not necessary.
> > >>
> > >> Is there a way for us to get this included in 2.9-2.11 without having to
> > >> put it in master or 2.12? It's hard to classify this as a bug fix,
> > >> really, but it does prevent unwanted behavior in real-world setups.
> > >> Could we get an opinion from committers on this?
> > >>
> > >>>
> > >>> Thanks
> > >>> Numan
> > >>>