On 7/16/21 3:05 AM, Han Zhou wrote:
> On Thu, Jul 15, 2021 at 5:03 PM Ben Pfaff <[email protected]> wrote:
> 
>> On Mon, Jul 12, 2021 at 10:08:10AM +0200, Dumitru Ceara wrote:
>>> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
>>> sequence of events happens:
>>>
>>> 1. The Southbound Load_Balancer record is updated.
>>> 2. The Southbound Datapath_Binding records on which the Load_Balancer is
>>>    applied are updated.
>>> 3. Southbound ovsdb-server sends updates about the Load_Balancer and
>>>    Datapath_Binding records to ovn-controller.
>>> 4. The IDL layer in ovn-controller processes the updates at #3, but
>>>    because of the SB schema references between tables [0] all logical
>>>    flows referencing the updated Datapath_Binding are marked as
>>>    "updated".  The same is true for Logical_DP_Group records
>>>    referencing the Datapath_Binding, and also for all logical flows
>>>    pointing to the new "updated" datapath groups.
>>> 5. ovn-controller ends up recomputing (removing/readding) all flows for
>>>    all these tracked updates.
>>
>> This is kind of a weird change from my perspective.  It allows for
>> broken referential integrity in the database to work around a
>> performance bug in the IDL.

Thanks for looking at this change!

I guess the description in the commit message is a bit unclear.  Let me
try to fix that here.  There is no bug in the IDL; it's doing what it's
supposed to do given the Southbound database schema.

The problem is with the schema itself.  When flow generation for load
balancers was moved to ovn-controller an additional reference (from
Datapath_Binding to Load_Balancer) was added in the SB schema to avoid
performance issues or extremely complex code in ovn-controller to handle
Load_Balancer updates incrementally.

My first attempt at the fix was to just remove that additional
reference.  That is fine, and doesn't affect correctness but:
1. makes the upgrade scenario complex because ovn-controllers running
old versions would expect that reference to be there.
2. adds lots of complicated code in ovn-controller incremental-processing.

> 
> 
> Yes, it did look weird and there were detailed discussions on it in the v1
> reviews. Some options that require much bigger changes were discussed for
> longer term, unless ddlog is used.
> 

This would be the longer term solution, along with removing the
reference from Datapath_Binding to Load_Balancers completely.

But until then, we have huge performance impact in ovn-controller due to
this.  Quoting from my original commit log:

"Some performance measurements, on a scale test deployment simulating an
ovn-kubernetes deployment with 120 nodes and a large load balancer
with 16K VIPs associated to each node's logical switch, the event
processing loop time in ovn-controller, when adding a new VIP, is
reduced from ~39 seconds to ~8 seconds."

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to