Re: [ovs-dev] [PATCH v2] ovsdb-idl: Support change tracking of write-only columns.

Dumitru Ceara Wed, 20 Apr 2022 12:32:41 -0700

On 4/19/22 19:47, Han Zhou wrote:
> On Tue, Apr 19, 2022 at 12:49 AM Dumitru Ceara <[email protected]> wrote:
>>
>> On 4/16/22 00:41, Han Zhou wrote:
>>> On Fri, Apr 15, 2022 at 1:45 PM Dumitru Ceara <[email protected]> wrote:
>>>>
>>>> On 4/15/22 20:25, Han Zhou wrote:
>>>>> On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <[email protected]>
> wrote:
>>>>>>
>>>>>> At a first glance, change tracking should never be allowed for
>>>>>> write-only columns.  However, some clients (e.g., ovn-northd) that
> are
>>>>>> mostly exclusive writers of a database, use change tracking to avoid
>>>>>> duplicating the IDL row records into a local cache when implementing
>>>>>> incremental processing.
>>>>>
>>>>> Hi Dumitru,
>>>>>
>>>>
>>>> Hi Han,
>>>>
>>>>> It is still not clear to me why ovn-northd would need change-tracking
>>> for
>>>>> the write-only columns. It didn't require change tracking before using
>>> the
>>>>> incremental processing engine. In theory, to my understanding,
>>>>> change-tracking is required only if the column is an input to some of
>>> the
>>>>> engine nodes, so that it needs to be alerted when there is a change in
>>> the
>>>>> database. If the column is write-only to ovn-northd, it means it
> doesn't
>>>>> care about any change of the column from the DB, but blindly writes to
>>> the
>>>>> column regardless of the current value. I checked the patch [0], which
>>>>> mentioned that
>>>>>
>>>>>     Such nodes are primary inputs to the northd incremental processing
>>>>>     engine and without proper update processing for Southbound tables,
>>>>>     northd might fail to timely reconcile the contents of the
> Southbound
>>>>>     database.
>>>>>
>>>>> I am confused that, if the columns were write-only, how come they
> become
>>>>> inputs to northd and requires reconcile upon them? Does it mean that
>>> they
>>>>> were misused before I-P, that they were in fact read/write columns to
>>>>> ovn-northd?
>>>>
>>>> I think it's actually a bit more complicated than this.
>>>>
>>>> Even for pure write-only tables/columns we have a problem due to the
>>>> fact that clustered ovsdb offers at-least-once consistency [2].
>>>>
>>>> We actually hit the case in which a single northd transaction to insert
>>>> a SB.Load_Balancer record resulted in two identical rows being added to
>>>> the database, both of them pointing to the same logical switch datapath
>>>> binding [3].  When that logical switch was deleted northd would fail to
>>>> remove the duplicate from the SB causing a referential integrity
>>>> violation due to the dangling reference.
>>>>
>>>> Arguably the schema for load balancers is not ideal because it doesn't
>>>> defines an index on "name".  Nevertheless, that's not something we can
>>>> change easily because it will (quite likely with raft) break existing
>>>> deployments.  And it applies to other tables in the NB/SB schema.
>>>>
>>>> The fix was relatively straighforward though [4], and it meant fixing
>>>> the way northd reconciles the SB.Load_balancers.
>>>>
>>>> This is were I-P became an issue.  Consider the current main branch
>>>> code with commit e4d6d3455baf ("ovn-northd: Enable change tracking for
>>>> all SB tables.") reverted.  In a sandbox we do:
>>>>
>>>> $ ovn-nbctl ls-add ls
>>>> $ ovn-nbctl lb-add lb1 1.1.1.1:1000 2.2.2.2:2000 -- ls-lb-add ls lb1
>>>>
>>>> # At this point the SB.LB table looks ok:
>>>> $ ovn-sbctl list load_balancer
>>>
>>>
>>>> _uuid               : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92
>>>> datapaths           : [29d576ef-31f5-485b-acb7-190373ef74c3]
>>>> external_ids        : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
>>>> name                : lb1
>>>> options             : {hairpin_orig_tuple="true"}
>>>> protocol            : tcp
>>>> vips                : {"1.1.1.1:1000"="2.2.2.2:2000"}
>>>>
>>>> # Simulate the bug in [3]
>>>> $ ovn-sbctl create load_balancer name=lb1
>>> external_ids:lb_id=c2206d6a-6939-47cc-92de-554d2bc163b0
>>>>
>>>> # Even with the northd fix [4], due to the fact that the LB table is
>>>> # write-only, northd doesn't wake up so we end up with the duplicate
>>>> # staying in the SB for an indeterminate time.
>>>> $ ovn-sbctl list load_balancer
>>>> _uuid               : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92
>>>> datapaths           : [29d576ef-31f5-485b-acb7-190373ef74c3]
>>>> external_ids        : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
>>>> name                : lb1
>>>> options             : {hairpin_orig_tuple="true"}
>>>> protocol            : tcp
>>>> vips                : {"1.1.1.1:1000"="2.2.2.2:2000"}
>>>>
>>>> _uuid               : 0bff0f83-234a-4631-ab88-2348b9d07d1f
>>>> datapaths           : []
>>>> external_ids        : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
>>>> name                : lb1
>>>> options             : {}
>>>> protocol            : []
>>>> vips                : {}
>>>>
>>>> # Only when we generate an input for a "non-write-only" column the
> SB.LB
>>>> # is reconciled, because the "northd" I-P node runs.
>>>> $ ovn-nbctl --wait=sb sync
>>>> $ ovn-sbctl list load_balancer
>>>> _uuid               : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92
>>>> datapaths           : [29d576ef-31f5-485b-acb7-190373ef74c3]
>>>> external_ids        : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
>>>> name                : lb1
>>>> options             : {hairpin_orig_tuple="true"}
>>>> protocol            : tcp
>>>> vips                : {"1.1.1.1:1000"="2.2.2.2:2000"}
>>>>
>>>>> Similarly for the patch [1], I didn't figure out what's exactly broken
>>>>> without change-tracking to the options column from the commit message
>>> and
>>>>> the patch itself.
>>>>
>>>> I think the reason behind [1] is that ovn-controller (in the pinctrl
>>>> module) writes some options, SB.Port_Binding.options:ipv6_ra_pd_list,
>>>> which were then propagated by ovn-northd to
>>>> Logical_Router_Port.ipv6_prefix.
>>>>
>>>> This is done in ovn_update_ipv6_prefix(), part of ovnnb_db_run() in the
>>>> "northd" I-P node.  But, if the only change that woke up northd was the
>>>> SB.Port_Binding.options:ipv6_ra_pd_list change, because the column is
>>>> not tracked, the en_sb_port_binding I-P node (input of northd) doesn't
>>>> run.
>>>>
>>>>> I think I need to understand the context before commenting on the
>>> current
>>>>> patch. Could you help explain a little more?
>>>>>
>>>>
>>>> I hope this makes it more clear.  I'm sure there are more aspects I
>>>> missed here but I think we need to find a way to make northd work
>>>> correctly and without delays in reconciling state while at the same
>>>> time avoiding to affect latency (like in the performance regression
>>>> I introduced with [0]).
>>>>
>>> Thanks Dumitru for the detailed explanation. I think I understand the
>>> problem now. In both of the cases above, we were in fact monitoring the
>>> columns as "write-only" while they were in fact "read-write".
>>>
>>> 1) The first case (the load balancer one) looks more complex, but
>>> essentially it is an input to ovn-northd. If we consider "removing
>>> duplicates" as a feature of northd, although the duplicates would in
> fact
>>> only be triggered by northd itself (but not actually generated by
> itself,
>>> it was RAFT's problem that actually generated the duplicated row),
> northd
>>> is reading the records and then remove the duplicates and write it back.
>>> That's why the northd needs to track the load-balancer columns and
> react to
>>> the changes. (We may still call it "write-only" in this case, which
> doesn't
>>> mean northd doesn't read the value, but just that northd is the only one
>>> that decides what should be there, not worrying about overwriting
> others'
>>> changes. I'd rather call it "read-write" to be more precise.)
>>>
>>
>> Sure, but just for completeness, this applies to the following tables
>> too (they don't have an index defined in the SB schema): DHCP_Options,
>> DHCPv6_Options, DNS, HA_Chassis.
>>
>> Most of these didn't have change tracking enabled before [0] either as
>> far as I can tell; or at least not directly.  We'd have to see if there
>> are more corner cases to cover.
>>
>>> 2) The second case (the IPv6 one) is more straightforward. If
>>> ovn-controller can modify the column, the column should never be
> regarded
>>> as write-only to northd, which is a bug even existed before northd I-P.
>>> There may be other such bugs that we haven't noticed before, but can be
>>> exposed by the I-P change. These bugs were not noticed before because
>>> anything that wakes up the main loop would trigger a recompute before
> I-P
>>> implementation, so we were actually handling any changes in time. And
> the
>>> problem mentioned in [5] Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity
> of
>>> writes that don't change a column's value.") doesn't apply to this use
> case
>>> (correct me if I am wrong).
>>>
>>
>> Indeed, I think [5] is not really relevant in the ovn-northd <-> SB <->
>> ovn-controller context because both ovn-northd and ovn-controller don't
>> really care about transactions and (should) ensure eventual consistency
>> of the SB data.
> 
> I'd be careful to claim "don't really care about transactions" for all use
> cases, not that I know any use cases we do care about, but I just haven't
> examined every use case of ovn-northd and ovn-controller. Anyway, as long
> as we have the capability in the IDL to handle each column the way we want
> (with or without a flag: OVSDB_IDL_WRITE_CHANGED_ONLY), we should be able
> to handle both cases. Maybe we can assume it is ok to not care about
> atomicity of writes in northd, and write the changed columns only for now,
> but later if we figure out there is any problem for some columns, we can
> always remove the flag for those columns.
> 
>>
>>> Now to the solution:
>>>
>>> a) To fix the bugs, we shouldn't blindly change-tracking for all tables
> and
>>> columns, but instead, track changes for the columns that are read-write
> but
>>> treated as write-only by mistake, while keep the real write-only columns
>>> write-only (e.g. the nb_cfg related columns in SB_Global). I think we
>>> should fix this regardless.
>>>
>>
>> In theory, yes, but please see "c)" below.
>>
>>> b) However, this is not enough, because we are facing the performance
>>> problem introduced by such fixes, even if we only add change-tracking to
>>> very few new columns. E.g. the one added by [1] already impacts
> performance
>>> significantly even adding tracking for just one column. We need further
>>> improvement to solve the performance problem, which is primarily caused
> by
>>> the behavior introduced by [5]. However, in most use cases we don't have
>>> the problem mentioned by [5]. I think this is the assumption that leads
> to
>>> this patch as the solution for the performance problem and I think it is
>>> the right approach. It allows northd to get alerted by changes, but only
>>> write back columns that were changed. I just want to emphasize that we
> need
>>> this approach on top of a), and only apply for the columns that are
>>> read-write but don't worry about atomicity of transactions if only the
>>> changed values are written back.
>>
>> I would add:
>>
>> c) However, before [6], the non-incremental implementation of ovn-northd
>> had the side effect that *any* external change to a SB record/column,
>> regardless of how the column was monitored, would trigger northd to
>> reconcile the SB contents.  That won't be the case anymore.  I'm not
>> sure if this is a big problem but it might uncover issues that used to
>> be handled indirectly by the northd full reprocessing.
>>
>> E.g., in a sandbox with [0] reverted:
>>
>> $ ovn-nbctl ls-add ls
>> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb
>>   uuid=0xf771dbcb, table=13(ls_in_acl_after_lb ), priority=0    ,
>> match=(1), action=(next;)
>>
>> # Simulate a lflow "disappearing" from the SB.
>> $ ovn-sbctl destroy logical_flow f771dbcb
>>
>> # Northd doesn't recreate it until a read/write column changes value:
>> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb
>> $ ovn-nbctl --wait=sb sync
>> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb
>>   uuid=0x20a3b933, table=13(ls_in_acl_after_lb ), priority=0    ,
>> match=(1), action=(next;)
>>
>> So, with all this in mind, if we find a good way of defining this new
>> monitoring mode ("writing back changed columns only") I don't really see
>> a problem with blindly enabling change tracking and this new mode for
>> the Southbound.  All in all it just ensures the same behavior as before
>> [6] without any real drawbacks.
>>
> 
> I understand c), but I still think a) is important. For the example above,
> since we expect northd to detect changes in SB logical_flow table and fix
> it, it should be regarded as read-write. Probably most of the tables and
> columns fall into this category, but there are exceptions. In some cases we
> don't even want/need to fix changes. One example is nb_cfg, which is only
> for sync or performance test purposes. It is purely write-only, and no need
> to detect any changes from other components. Currently with alert enabled,
> after every nb_cfg update to the SB by northd, the northd would receive
> notification for its own change to this column which triggers a recompute
> unnecessarily. I am not sure if there are other examples. But I think
> overall we are on the same page. We can detect such cases one-by-one and
> treat them as write-only.
>


Thanks, Han, for the review!

I just posted v3:
https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/

Regards,
Dumitru


> Thanks,
> Han
> 
>>>
>>> With the above understanding, I have some comments for the patch:
>>>
>>> i) Functionally I believe it serves the purpose of solving the
> performance
>>> problem, but it seems confusing what the modes really mean now. There
> were
>>> definitions to each of the flags:
>>> - OVSDB_IDL_MONITOR /* Replicate this column? */
>>> - OVSDB_IDL_ALERT /* Alert client when column changes? */
>>> - OVSDB_IDL_TRACK // there is no comment, but it is obvious as the name
>>> suggests.
>>>
>>> Each of the meanings was clear. Now with the patch it seems to change
> the
>>> meanings of:
>>> - OVSDB_IDL_ALERT to "alert client when column changes, and the column
> is
>>> NOT write-only"
>>> - OVSDB_IDL_TRACK to "alert client when column changes, and track the
>>> changes"
>>>
>>> This is really confusing. I think the purpose here is just to provide a
>>> capability of "writing back changed columns only" while still "alert
> client
>>> when column changes". Would it be better just to add a new flag for
> this,
>>> like OVSDB_IDL_WRITE_CHANGED_ONLY? It can be set properly in existed
> APIs
>>> to keep the existed behavior, and provide an API for clients to set it
>>> explicitly for columns that are read-write but doesn't care about the
>>> atomicity.
>>
>> Sure, sounds better to make it explicit indeed.
>>
>>>
>>> ii) The comments in ovsdb-idl.h under "The possible mode combinations
> are:"
>>> needs updating.
>>
>> Ack.
>>
>> I'll prepare a v3 for the idl change.  In the meantime it would be great
>> if we could already agree on the way we want to enable this in
>> ovn-northd (i.e., blindly for all SB vs. selectively for some tables).
>>
>>>
>>> Thanks,
>>> Han
>>>
>>>>> [0]
>>>>>
>>>
> https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141
>>>>> [1]
>>>>>
>>>
> https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089
>>>>>
>>>>
>>>> [2]
>>>
> https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#understanding-cluster-consistency
>>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2045577
>>>> [4]
>>>
> https://github.com/ovn-org/ovn/commit/9deb000536e0dcf81d0e61d5a8af1d4e655960b4
>>>>
>>> [5]
>>>
> https://github.com/openvswitch/ovs/commit/1cc618c32524076d14ba3ee30e672a554b8ee605
>>>
>>
>> [6]
>>
> https://github.com/ovn-org/ovn/commit/4597317f16d1a522fd468dc5339cbe3bb851b60e
>>
>> Thanks,
>> Dumitru
>>
> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH v2] ovsdb-idl: Support change tracking of write-only columns.

Reply via email to