On 4/16/22 00:41, Han Zhou wrote:
> On Fri, Apr 15, 2022 at 1:45 PM Dumitru Ceara <[email protected]> wrote:
>>
>> On 4/15/22 20:25, Han Zhou wrote:
>>> On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <[email protected]> wrote:
>>>>
>>>> At a first glance, change tracking should never be allowed for
>>>> write-only columns. However, some clients (e.g., ovn-northd) that are
>>>> mostly exclusive writers of a database, use change tracking to avoid
>>>> duplicating the IDL row records into a local cache when implementing
>>>> incremental processing.
>>>
>>> Hi Dumitru,
>>>
>>
>> Hi Han,
>>
>>> It is still not clear to me why ovn-northd would need change-tracking
> for
>>> the write-only columns. It didn't require change tracking before using
> the
>>> incremental processing engine. In theory, to my understanding,
>>> change-tracking is required only if the column is an input to some of
> the
>>> engine nodes, so that it needs to be alerted when there is a change in
> the
>>> database. If the column is write-only to ovn-northd, it means it doesn't
>>> care about any change of the column from the DB, but blindly writes to
> the
>>> column regardless of the current value. I checked the patch [0], which
>>> mentioned that
>>>
>>> Such nodes are primary inputs to the northd incremental processing
>>> engine and without proper update processing for Southbound tables,
>>> northd might fail to timely reconcile the contents of the Southbound
>>> database.
>>>
>>> I am confused that, if the columns were write-only, how come they become
>>> inputs to northd and requires reconcile upon them? Does it mean that
> they
>>> were misused before I-P, that they were in fact read/write columns to
>>> ovn-northd?
>>
>> I think it's actually a bit more complicated than this.
>>
>> Even for pure write-only tables/columns we have a problem due to the
>> fact that clustered ovsdb offers at-least-once consistency [2].
>>
>> We actually hit the case in which a single northd transaction to insert
>> a SB.Load_Balancer record resulted in two identical rows being added to
>> the database, both of them pointing to the same logical switch datapath
>> binding [3]. When that logical switch was deleted northd would fail to
>> remove the duplicate from the SB causing a referential integrity
>> violation due to the dangling reference.
>>
>> Arguably the schema for load balancers is not ideal because it doesn't
>> defines an index on "name". Nevertheless, that's not something we can
>> change easily because it will (quite likely with raft) break existing
>> deployments. And it applies to other tables in the NB/SB schema.
>>
>> The fix was relatively straighforward though [4], and it meant fixing
>> the way northd reconciles the SB.Load_balancers.
>>
>> This is were I-P became an issue. Consider the current main branch
>> code with commit e4d6d3455baf ("ovn-northd: Enable change tracking for
>> all SB tables.") reverted. In a sandbox we do:
>>
>> $ ovn-nbctl ls-add ls
>> $ ovn-nbctl lb-add lb1 1.1.1.1:1000 2.2.2.2:2000 -- ls-lb-add ls lb1
>>
>> # At this point the SB.LB table looks ok:
>> $ ovn-sbctl list load_balancer
>
>
>> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92
>> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3]
>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
>> name : lb1
>> options : {hairpin_orig_tuple="true"}
>> protocol : tcp
>> vips : {"1.1.1.1:1000"="2.2.2.2:2000"}
>>
>> # Simulate the bug in [3]
>> $ ovn-sbctl create load_balancer name=lb1
> external_ids:lb_id=c2206d6a-6939-47cc-92de-554d2bc163b0
>>
>> # Even with the northd fix [4], due to the fact that the LB table is
>> # write-only, northd doesn't wake up so we end up with the duplicate
>> # staying in the SB for an indeterminate time.
>> $ ovn-sbctl list load_balancer
>> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92
>> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3]
>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
>> name : lb1
>> options : {hairpin_orig_tuple="true"}
>> protocol : tcp
>> vips : {"1.1.1.1:1000"="2.2.2.2:2000"}
>>
>> _uuid : 0bff0f83-234a-4631-ab88-2348b9d07d1f
>> datapaths : []
>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
>> name : lb1
>> options : {}
>> protocol : []
>> vips : {}
>>
>> # Only when we generate an input for a "non-write-only" column the SB.LB
>> # is reconciled, because the "northd" I-P node runs.
>> $ ovn-nbctl --wait=sb sync
>> $ ovn-sbctl list load_balancer
>> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92
>> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3]
>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
>> name : lb1
>> options : {hairpin_orig_tuple="true"}
>> protocol : tcp
>> vips : {"1.1.1.1:1000"="2.2.2.2:2000"}
>>
>>> Similarly for the patch [1], I didn't figure out what's exactly broken
>>> without change-tracking to the options column from the commit message
> and
>>> the patch itself.
>>
>> I think the reason behind [1] is that ovn-controller (in the pinctrl
>> module) writes some options, SB.Port_Binding.options:ipv6_ra_pd_list,
>> which were then propagated by ovn-northd to
>> Logical_Router_Port.ipv6_prefix.
>>
>> This is done in ovn_update_ipv6_prefix(), part of ovnnb_db_run() in the
>> "northd" I-P node. But, if the only change that woke up northd was the
>> SB.Port_Binding.options:ipv6_ra_pd_list change, because the column is
>> not tracked, the en_sb_port_binding I-P node (input of northd) doesn't
>> run.
>>
>>> I think I need to understand the context before commenting on the
> current
>>> patch. Could you help explain a little more?
>>>
>>
>> I hope this makes it more clear. I'm sure there are more aspects I
>> missed here but I think we need to find a way to make northd work
>> correctly and without delays in reconciling state while at the same
>> time avoiding to affect latency (like in the performance regression
>> I introduced with [0]).
>>
> Thanks Dumitru for the detailed explanation. I think I understand the
> problem now. In both of the cases above, we were in fact monitoring the
> columns as "write-only" while they were in fact "read-write".
>
> 1) The first case (the load balancer one) looks more complex, but
> essentially it is an input to ovn-northd. If we consider "removing
> duplicates" as a feature of northd, although the duplicates would in fact
> only be triggered by northd itself (but not actually generated by itself,
> it was RAFT's problem that actually generated the duplicated row), northd
> is reading the records and then remove the duplicates and write it back.
> That's why the northd needs to track the load-balancer columns and react to
> the changes. (We may still call it "write-only" in this case, which doesn't
> mean northd doesn't read the value, but just that northd is the only one
> that decides what should be there, not worrying about overwriting others'
> changes. I'd rather call it "read-write" to be more precise.)
>
Sure, but just for completeness, this applies to the following tables
too (they don't have an index defined in the SB schema): DHCP_Options,
DHCPv6_Options, DNS, HA_Chassis.
Most of these didn't have change tracking enabled before [0] either as
far as I can tell; or at least not directly. We'd have to see if there
are more corner cases to cover.
> 2) The second case (the IPv6 one) is more straightforward. If
> ovn-controller can modify the column, the column should never be regarded
> as write-only to northd, which is a bug even existed before northd I-P.
> There may be other such bugs that we haven't noticed before, but can be
> exposed by the I-P change. These bugs were not noticed before because
> anything that wakes up the main loop would trigger a recompute before I-P
> implementation, so we were actually handling any changes in time. And the
> problem mentioned in [5] Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity of
> writes that don't change a column's value.") doesn't apply to this use case
> (correct me if I am wrong).
>
Indeed, I think [5] is not really relevant in the ovn-northd <-> SB <->
ovn-controller context because both ovn-northd and ovn-controller don't
really care about transactions and (should) ensure eventual consistency
of the SB data.
> Now to the solution:
>
> a) To fix the bugs, we shouldn't blindly change-tracking for all tables and
> columns, but instead, track changes for the columns that are read-write but
> treated as write-only by mistake, while keep the real write-only columns
> write-only (e.g. the nb_cfg related columns in SB_Global). I think we
> should fix this regardless.
>
In theory, yes, but please see "c)" below.
> b) However, this is not enough, because we are facing the performance
> problem introduced by such fixes, even if we only add change-tracking to
> very few new columns. E.g. the one added by [1] already impacts performance
> significantly even adding tracking for just one column. We need further
> improvement to solve the performance problem, which is primarily caused by
> the behavior introduced by [5]. However, in most use cases we don't have
> the problem mentioned by [5]. I think this is the assumption that leads to
> this patch as the solution for the performance problem and I think it is
> the right approach. It allows northd to get alerted by changes, but only
> write back columns that were changed. I just want to emphasize that we need
> this approach on top of a), and only apply for the columns that are
> read-write but don't worry about atomicity of transactions if only the
> changed values are written back.
I would add:
c) However, before [6], the non-incremental implementation of ovn-northd
had the side effect that *any* external change to a SB record/column,
regardless of how the column was monitored, would trigger northd to
reconcile the SB contents. That won't be the case anymore. I'm not
sure if this is a big problem but it might uncover issues that used to
be handled indirectly by the northd full reprocessing.
E.g., in a sandbox with [0] reverted:
$ ovn-nbctl ls-add ls
$ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb
uuid=0xf771dbcb, table=13(ls_in_acl_after_lb ), priority=0 ,
match=(1), action=(next;)
# Simulate a lflow "disappearing" from the SB.
$ ovn-sbctl destroy logical_flow f771dbcb
# Northd doesn't recreate it until a read/write column changes value:
$ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb
$ ovn-nbctl --wait=sb sync
$ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb
uuid=0x20a3b933, table=13(ls_in_acl_after_lb ), priority=0 ,
match=(1), action=(next;)
So, with all this in mind, if we find a good way of defining this new
monitoring mode ("writing back changed columns only") I don't really see
a problem with blindly enabling change tracking and this new mode for
the Southbound. All in all it just ensures the same behavior as before
[6] without any real drawbacks.
>
> With the above understanding, I have some comments for the patch:
>
> i) Functionally I believe it serves the purpose of solving the performance
> problem, but it seems confusing what the modes really mean now. There were
> definitions to each of the flags:
> - OVSDB_IDL_MONITOR /* Replicate this column? */
> - OVSDB_IDL_ALERT /* Alert client when column changes? */
> - OVSDB_IDL_TRACK // there is no comment, but it is obvious as the name
> suggests.
>
> Each of the meanings was clear. Now with the patch it seems to change the
> meanings of:
> - OVSDB_IDL_ALERT to "alert client when column changes, and the column is
> NOT write-only"
> - OVSDB_IDL_TRACK to "alert client when column changes, and track the
> changes"
>
> This is really confusing. I think the purpose here is just to provide a
> capability of "writing back changed columns only" while still "alert client
> when column changes". Would it be better just to add a new flag for this,
> like OVSDB_IDL_WRITE_CHANGED_ONLY? It can be set properly in existed APIs
> to keep the existed behavior, and provide an API for clients to set it
> explicitly for columns that are read-write but doesn't care about the
> atomicity.
Sure, sounds better to make it explicit indeed.
>
> ii) The comments in ovsdb-idl.h under "The possible mode combinations are:"
> needs updating.
Ack.
I'll prepare a v3 for the idl change. In the meantime it would be great
if we could already agree on the way we want to enable this in
ovn-northd (i.e., blindly for all SB vs. selectively for some tables).
>
> Thanks,
> Han
>
>>> [0]
>>>
> https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141
>>> [1]
>>>
> https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089
>>>
>>
>> [2]
> https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#understanding-cluster-consistency
>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2045577
>> [4]
> https://github.com/ovn-org/ovn/commit/9deb000536e0dcf81d0e61d5a8af1d4e655960b4
>>
> [5]
> https://github.com/openvswitch/ovs/commit/1cc618c32524076d14ba3ee30e672a554b8ee605
>
[6]
https://github.com/ovn-org/ovn/commit/4597317f16d1a522fd468dc5339cbe3bb851b60e
Thanks,
Dumitru
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev