On 7/16/21 3:05 AM, Han Zhou wrote: > On Thu, Jul 15, 2021 at 5:03 PM Ben Pfaff <[email protected]> wrote: > >> On Mon, Jul 12, 2021 at 10:08:10AM +0200, Dumitru Ceara wrote: >>> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following >>> sequence of events happens: >>> >>> 1. The Southbound Load_Balancer record is updated. >>> 2. The Southbound Datapath_Binding records on which the Load_Balancer is >>> applied are updated. >>> 3. Southbound ovsdb-server sends updates about the Load_Balancer and >>> Datapath_Binding records to ovn-controller. >>> 4. The IDL layer in ovn-controller processes the updates at #3, but >>> because of the SB schema references between tables [0] all logical >>> flows referencing the updated Datapath_Binding are marked as >>> "updated". The same is true for Logical_DP_Group records >>> referencing the Datapath_Binding, and also for all logical flows >>> pointing to the new "updated" datapath groups. >>> 5. ovn-controller ends up recomputing (removing/readding) all flows for >>> all these tracked updates. >> >> This is kind of a weird change from my perspective. It allows for >> broken referential integrity in the database to work around a >> performance bug in the IDL.
Thanks for looking at this change! I guess the description in the commit message is a bit unclear. Let me try to fix that here. There is no bug in the IDL; it's doing what it's supposed to do given the Southbound database schema. The problem is with the schema itself. When flow generation for load balancers was moved to ovn-controller an additional reference (from Datapath_Binding to Load_Balancer) was added in the SB schema to avoid performance issues or extremely complex code in ovn-controller to handle Load_Balancer updates incrementally. My first attempt at the fix was to just remove that additional reference. That is fine, and doesn't affect correctness but: 1. makes the upgrade scenario complex because ovn-controllers running old versions would expect that reference to be there. 2. adds lots of complicated code in ovn-controller incremental-processing. > > > Yes, it did look weird and there were detailed discussions on it in the v1 > reviews. Some options that require much bigger changes were discussed for > longer term, unless ddlog is used. > This would be the longer term solution, along with removing the reference from Datapath_Binding to Load_Balancers completely. But until then, we have huge performance impact in ovn-controller due to this. Quoting from my original commit log: "Some performance measurements, on a scale test deployment simulating an ovn-kubernetes deployment with 120 nodes and a large load balancer with 16K VIPs associated to each node's logical switch, the event processing loop time in ovn-controller, when adding a new VIP, is reduced from ~39 seconds to ~8 seconds." Regards, Dumitru _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
