On Tue, Apr 19, 2022 at 12:49 AM Dumitru Ceara <[email protected]> wrote:
>
> On 4/16/22 00:41, Han Zhou wrote:
> > On Fri, Apr 15, 2022 at 1:45 PM Dumitru Ceara <[email protected]> wrote:
> >>
> >> On 4/15/22 20:25, Han Zhou wrote:
> >>> On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <[email protected]>
wrote:
> >>>>
> >>>> At a first glance, change tracking should never be allowed for
> >>>> write-only columns.  However, some clients (e.g., ovn-northd) that
are
> >>>> mostly exclusive writers of a database, use change tracking to avoid
> >>>> duplicating the IDL row records into a local cache when implementing
> >>>> incremental processing.
> >>>
> >>> Hi Dumitru,
> >>>
> >>
> >> Hi Han,
> >>
> >>> It is still not clear to me why ovn-northd would need change-tracking
> > for
> >>> the write-only columns. It didn't require change tracking before using
> > the
> >>> incremental processing engine. In theory, to my understanding,
> >>> change-tracking is required only if the column is an input to some of
> > the
> >>> engine nodes, so that it needs to be alerted when there is a change in
> > the
> >>> database. If the column is write-only to ovn-northd, it means it
doesn't
> >>> care about any change of the column from the DB, but blindly writes to
> > the
> >>> column regardless of the current value. I checked the patch [0], which
> >>> mentioned that
> >>>
> >>>     Such nodes are primary inputs to the northd incremental processing
> >>>     engine and without proper update processing for Southbound tables,
> >>>     northd might fail to timely reconcile the contents of the
Southbound
> >>>     database.
> >>>
> >>> I am confused that, if the columns were write-only, how come they
become
> >>> inputs to northd and requires reconcile upon them? Does it mean that
> > they
> >>> were misused before I-P, that they were in fact read/write columns to
> >>> ovn-northd?
> >>
> >> I think it's actually a bit more complicated than this.
> >>
> >> Even for pure write-only tables/columns we have a problem due to the
> >> fact that clustered ovsdb offers at-least-once consistency [2].
> >>
> >> We actually hit the case in which a single northd transaction to insert
> >> a SB.Load_Balancer record resulted in two identical rows being added to
> >> the database, both of them pointing to the same logical switch datapath
> >> binding [3].  When that logical switch was deleted northd would fail to
> >> remove the duplicate from the SB causing a referential integrity
> >> violation due to the dangling reference.
> >>
> >> Arguably the schema for load balancers is not ideal because it doesn't
> >> defines an index on "name".  Nevertheless, that's not something we can
> >> change easily because it will (quite likely with raft) break existing
> >> deployments.  And it applies to other tables in the NB/SB schema.
> >>
> >> The fix was relatively straighforward though [4], and it meant fixing
> >> the way northd reconciles the SB.Load_balancers.
> >>
> >> This is were I-P became an issue.  Consider the current main branch
> >> code with commit e4d6d3455baf ("ovn-northd: Enable change tracking for
> >> all SB tables.") reverted.  In a sandbox we do:
> >>
> >> $ ovn-nbctl ls-add ls
> >> $ ovn-nbctl lb-add lb1 1.1.1.1:1000 2.2.2.2:2000 -- ls-lb-add ls lb1
> >>
> >> # At this point the SB.LB table looks ok:
> >> $ ovn-sbctl list load_balancer
> >
> >
> >> _uuid               : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92
> >> datapaths           : [29d576ef-31f5-485b-acb7-190373ef74c3]
> >> external_ids        : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
> >> name                : lb1
> >> options             : {hairpin_orig_tuple="true"}
> >> protocol            : tcp
> >> vips                : {"1.1.1.1:1000"="2.2.2.2:2000"}
> >>
> >> # Simulate the bug in [3]
> >> $ ovn-sbctl create load_balancer name=lb1
> > external_ids:lb_id=c2206d6a-6939-47cc-92de-554d2bc163b0
> >>
> >> # Even with the northd fix [4], due to the fact that the LB table is
> >> # write-only, northd doesn't wake up so we end up with the duplicate
> >> # staying in the SB for an indeterminate time.
> >> $ ovn-sbctl list load_balancer
> >> _uuid               : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92
> >> datapaths           : [29d576ef-31f5-485b-acb7-190373ef74c3]
> >> external_ids        : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
> >> name                : lb1
> >> options             : {hairpin_orig_tuple="true"}
> >> protocol            : tcp
> >> vips                : {"1.1.1.1:1000"="2.2.2.2:2000"}
> >>
> >> _uuid               : 0bff0f83-234a-4631-ab88-2348b9d07d1f
> >> datapaths           : []
> >> external_ids        : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
> >> name                : lb1
> >> options             : {}
> >> protocol            : []
> >> vips                : {}
> >>
> >> # Only when we generate an input for a "non-write-only" column the
SB.LB
> >> # is reconciled, because the "northd" I-P node runs.
> >> $ ovn-nbctl --wait=sb sync
> >> $ ovn-sbctl list load_balancer
> >> _uuid               : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92
> >> datapaths           : [29d576ef-31f5-485b-acb7-190373ef74c3]
> >> external_ids        : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"}
> >> name                : lb1
> >> options             : {hairpin_orig_tuple="true"}
> >> protocol            : tcp
> >> vips                : {"1.1.1.1:1000"="2.2.2.2:2000"}
> >>
> >>> Similarly for the patch [1], I didn't figure out what's exactly broken
> >>> without change-tracking to the options column from the commit message
> > and
> >>> the patch itself.
> >>
> >> I think the reason behind [1] is that ovn-controller (in the pinctrl
> >> module) writes some options, SB.Port_Binding.options:ipv6_ra_pd_list,
> >> which were then propagated by ovn-northd to
> >> Logical_Router_Port.ipv6_prefix.
> >>
> >> This is done in ovn_update_ipv6_prefix(), part of ovnnb_db_run() in the
> >> "northd" I-P node.  But, if the only change that woke up northd was the
> >> SB.Port_Binding.options:ipv6_ra_pd_list change, because the column is
> >> not tracked, the en_sb_port_binding I-P node (input of northd) doesn't
> >> run.
> >>
> >>> I think I need to understand the context before commenting on the
> > current
> >>> patch. Could you help explain a little more?
> >>>
> >>
> >> I hope this makes it more clear.  I'm sure there are more aspects I
> >> missed here but I think we need to find a way to make northd work
> >> correctly and without delays in reconciling state while at the same
> >> time avoiding to affect latency (like in the performance regression
> >> I introduced with [0]).
> >>
> > Thanks Dumitru for the detailed explanation. I think I understand the
> > problem now. In both of the cases above, we were in fact monitoring the
> > columns as "write-only" while they were in fact "read-write".
> >
> > 1) The first case (the load balancer one) looks more complex, but
> > essentially it is an input to ovn-northd. If we consider "removing
> > duplicates" as a feature of northd, although the duplicates would in
fact
> > only be triggered by northd itself (but not actually generated by
itself,
> > it was RAFT's problem that actually generated the duplicated row),
northd
> > is reading the records and then remove the duplicates and write it back.
> > That's why the northd needs to track the load-balancer columns and
react to
> > the changes. (We may still call it "write-only" in this case, which
doesn't
> > mean northd doesn't read the value, but just that northd is the only one
> > that decides what should be there, not worrying about overwriting
others'
> > changes. I'd rather call it "read-write" to be more precise.)
> >
>
> Sure, but just for completeness, this applies to the following tables
> too (they don't have an index defined in the SB schema): DHCP_Options,
> DHCPv6_Options, DNS, HA_Chassis.
>
> Most of these didn't have change tracking enabled before [0] either as
> far as I can tell; or at least not directly.  We'd have to see if there
> are more corner cases to cover.
>
> > 2) The second case (the IPv6 one) is more straightforward. If
> > ovn-controller can modify the column, the column should never be
regarded
> > as write-only to northd, which is a bug even existed before northd I-P.
> > There may be other such bugs that we haven't noticed before, but can be
> > exposed by the I-P change. These bugs were not noticed before because
> > anything that wakes up the main loop would trigger a recompute before
I-P
> > implementation, so we were actually handling any changes in time. And
the
> > problem mentioned in [5] Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity
of
> > writes that don't change a column's value.") doesn't apply to this use
case
> > (correct me if I am wrong).
> >
>
> Indeed, I think [5] is not really relevant in the ovn-northd <-> SB <->
> ovn-controller context because both ovn-northd and ovn-controller don't
> really care about transactions and (should) ensure eventual consistency
> of the SB data.

I'd be careful to claim "don't really care about transactions" for all use
cases, not that I know any use cases we do care about, but I just haven't
examined every use case of ovn-northd and ovn-controller. Anyway, as long
as we have the capability in the IDL to handle each column the way we want
(with or without a flag: OVSDB_IDL_WRITE_CHANGED_ONLY), we should be able
to handle both cases. Maybe we can assume it is ok to not care about
atomicity of writes in northd, and write the changed columns only for now,
but later if we figure out there is any problem for some columns, we can
always remove the flag for those columns.

>
> > Now to the solution:
> >
> > a) To fix the bugs, we shouldn't blindly change-tracking for all tables
and
> > columns, but instead, track changes for the columns that are read-write
but
> > treated as write-only by mistake, while keep the real write-only columns
> > write-only (e.g. the nb_cfg related columns in SB_Global). I think we
> > should fix this regardless.
> >
>
> In theory, yes, but please see "c)" below.
>
> > b) However, this is not enough, because we are facing the performance
> > problem introduced by such fixes, even if we only add change-tracking to
> > very few new columns. E.g. the one added by [1] already impacts
performance
> > significantly even adding tracking for just one column. We need further
> > improvement to solve the performance problem, which is primarily caused
by
> > the behavior introduced by [5]. However, in most use cases we don't have
> > the problem mentioned by [5]. I think this is the assumption that leads
to
> > this patch as the solution for the performance problem and I think it is
> > the right approach. It allows northd to get alerted by changes, but only
> > write back columns that were changed. I just want to emphasize that we
need
> > this approach on top of a), and only apply for the columns that are
> > read-write but don't worry about atomicity of transactions if only the
> > changed values are written back.
>
> I would add:
>
> c) However, before [6], the non-incremental implementation of ovn-northd
> had the side effect that *any* external change to a SB record/column,
> regardless of how the column was monitored, would trigger northd to
> reconcile the SB contents.  That won't be the case anymore.  I'm not
> sure if this is a big problem but it might uncover issues that used to
> be handled indirectly by the northd full reprocessing.
>
> E.g., in a sandbox with [0] reverted:
>
> $ ovn-nbctl ls-add ls
> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb
>   uuid=0xf771dbcb, table=13(ls_in_acl_after_lb ), priority=0    ,
> match=(1), action=(next;)
>
> # Simulate a lflow "disappearing" from the SB.
> $ ovn-sbctl destroy logical_flow f771dbcb
>
> # Northd doesn't recreate it until a read/write column changes value:
> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb
> $ ovn-nbctl --wait=sb sync
> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb
>   uuid=0x20a3b933, table=13(ls_in_acl_after_lb ), priority=0    ,
> match=(1), action=(next;)
>
> So, with all this in mind, if we find a good way of defining this new
> monitoring mode ("writing back changed columns only") I don't really see
> a problem with blindly enabling change tracking and this new mode for
> the Southbound.  All in all it just ensures the same behavior as before
> [6] without any real drawbacks.
>

I understand c), but I still think a) is important. For the example above,
since we expect northd to detect changes in SB logical_flow table and fix
it, it should be regarded as read-write. Probably most of the tables and
columns fall into this category, but there are exceptions. In some cases we
don't even want/need to fix changes. One example is nb_cfg, which is only
for sync or performance test purposes. It is purely write-only, and no need
to detect any changes from other components. Currently with alert enabled,
after every nb_cfg update to the SB by northd, the northd would receive
notification for its own change to this column which triggers a recompute
unnecessarily. I am not sure if there are other examples. But I think
overall we are on the same page. We can detect such cases one-by-one and
treat them as write-only.

Thanks,
Han

> >
> > With the above understanding, I have some comments for the patch:
> >
> > i) Functionally I believe it serves the purpose of solving the
performance
> > problem, but it seems confusing what the modes really mean now. There
were
> > definitions to each of the flags:
> > - OVSDB_IDL_MONITOR /* Replicate this column? */
> > - OVSDB_IDL_ALERT /* Alert client when column changes? */
> > - OVSDB_IDL_TRACK // there is no comment, but it is obvious as the name
> > suggests.
> >
> > Each of the meanings was clear. Now with the patch it seems to change
the
> > meanings of:
> > - OVSDB_IDL_ALERT to "alert client when column changes, and the column
is
> > NOT write-only"
> > - OVSDB_IDL_TRACK to "alert client when column changes, and track the
> > changes"
> >
> > This is really confusing. I think the purpose here is just to provide a
> > capability of "writing back changed columns only" while still "alert
client
> > when column changes". Would it be better just to add a new flag for
this,
> > like OVSDB_IDL_WRITE_CHANGED_ONLY? It can be set properly in existed
APIs
> > to keep the existed behavior, and provide an API for clients to set it
> > explicitly for columns that are read-write but doesn't care about the
> > atomicity.
>
> Sure, sounds better to make it explicit indeed.
>
> >
> > ii) The comments in ovsdb-idl.h under "The possible mode combinations
are:"
> > needs updating.
>
> Ack.
>
> I'll prepare a v3 for the idl change.  In the meantime it would be great
> if we could already agree on the way we want to enable this in
> ovn-northd (i.e., blindly for all SB vs. selectively for some tables).
>
> >
> > Thanks,
> > Han
> >
> >>> [0]
> >>>
> >
https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141
> >>> [1]
> >>>
> >
https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089
> >>>
> >>
> >> [2]
> >
https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#understanding-cluster-consistency
> >> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2045577
> >> [4]
> >
https://github.com/ovn-org/ovn/commit/9deb000536e0dcf81d0e61d5a8af1d4e655960b4
> >>
> > [5]
> >
https://github.com/openvswitch/ovs/commit/1cc618c32524076d14ba3ee30e672a554b8ee605
> >
>
> [6]
>
https://github.com/ovn-org/ovn/commit/4597317f16d1a522fd468dc5339cbe3bb851b60e
>
> Thanks,
> Dumitru
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to