On 4/19/22 19:47, Han Zhou wrote: > On Tue, Apr 19, 2022 at 12:49 AM Dumitru Ceara <[email protected]> wrote: >> >> On 4/16/22 00:41, Han Zhou wrote: >>> On Fri, Apr 15, 2022 at 1:45 PM Dumitru Ceara <[email protected]> wrote: >>>> >>>> On 4/15/22 20:25, Han Zhou wrote: >>>>> On Fri, Apr 15, 2022 at 7:22 AM Dumitru Ceara <[email protected]> > wrote: >>>>>> >>>>>> At a first glance, change tracking should never be allowed for >>>>>> write-only columns. However, some clients (e.g., ovn-northd) that > are >>>>>> mostly exclusive writers of a database, use change tracking to avoid >>>>>> duplicating the IDL row records into a local cache when implementing >>>>>> incremental processing. >>>>> >>>>> Hi Dumitru, >>>>> >>>> >>>> Hi Han, >>>> >>>>> It is still not clear to me why ovn-northd would need change-tracking >>> for >>>>> the write-only columns. It didn't require change tracking before using >>> the >>>>> incremental processing engine. In theory, to my understanding, >>>>> change-tracking is required only if the column is an input to some of >>> the >>>>> engine nodes, so that it needs to be alerted when there is a change in >>> the >>>>> database. If the column is write-only to ovn-northd, it means it > doesn't >>>>> care about any change of the column from the DB, but blindly writes to >>> the >>>>> column regardless of the current value. I checked the patch [0], which >>>>> mentioned that >>>>> >>>>> Such nodes are primary inputs to the northd incremental processing >>>>> engine and without proper update processing for Southbound tables, >>>>> northd might fail to timely reconcile the contents of the > Southbound >>>>> database. >>>>> >>>>> I am confused that, if the columns were write-only, how come they > become >>>>> inputs to northd and requires reconcile upon them? Does it mean that >>> they >>>>> were misused before I-P, that they were in fact read/write columns to >>>>> ovn-northd? >>>> >>>> I think it's actually a bit more complicated than this. >>>> >>>> Even for pure write-only tables/columns we have a problem due to the >>>> fact that clustered ovsdb offers at-least-once consistency [2]. >>>> >>>> We actually hit the case in which a single northd transaction to insert >>>> a SB.Load_Balancer record resulted in two identical rows being added to >>>> the database, both of them pointing to the same logical switch datapath >>>> binding [3]. When that logical switch was deleted northd would fail to >>>> remove the duplicate from the SB causing a referential integrity >>>> violation due to the dangling reference. >>>> >>>> Arguably the schema for load balancers is not ideal because it doesn't >>>> defines an index on "name". Nevertheless, that's not something we can >>>> change easily because it will (quite likely with raft) break existing >>>> deployments. And it applies to other tables in the NB/SB schema. >>>> >>>> The fix was relatively straighforward though [4], and it meant fixing >>>> the way northd reconciles the SB.Load_balancers. >>>> >>>> This is were I-P became an issue. Consider the current main branch >>>> code with commit e4d6d3455baf ("ovn-northd: Enable change tracking for >>>> all SB tables.") reverted. In a sandbox we do: >>>> >>>> $ ovn-nbctl ls-add ls >>>> $ ovn-nbctl lb-add lb1 1.1.1.1:1000 2.2.2.2:2000 -- ls-lb-add ls lb1 >>>> >>>> # At this point the SB.LB table looks ok: >>>> $ ovn-sbctl list load_balancer >>> >>> >>>> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 >>>> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] >>>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >>>> name : lb1 >>>> options : {hairpin_orig_tuple="true"} >>>> protocol : tcp >>>> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} >>>> >>>> # Simulate the bug in [3] >>>> $ ovn-sbctl create load_balancer name=lb1 >>> external_ids:lb_id=c2206d6a-6939-47cc-92de-554d2bc163b0 >>>> >>>> # Even with the northd fix [4], due to the fact that the LB table is >>>> # write-only, northd doesn't wake up so we end up with the duplicate >>>> # staying in the SB for an indeterminate time. >>>> $ ovn-sbctl list load_balancer >>>> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 >>>> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] >>>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >>>> name : lb1 >>>> options : {hairpin_orig_tuple="true"} >>>> protocol : tcp >>>> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} >>>> >>>> _uuid : 0bff0f83-234a-4631-ab88-2348b9d07d1f >>>> datapaths : [] >>>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >>>> name : lb1 >>>> options : {} >>>> protocol : [] >>>> vips : {} >>>> >>>> # Only when we generate an input for a "non-write-only" column the > SB.LB >>>> # is reconciled, because the "northd" I-P node runs. >>>> $ ovn-nbctl --wait=sb sync >>>> $ ovn-sbctl list load_balancer >>>> _uuid : 71d07ce2-d8ad-44f5-8fb6-9e4f5ef7ea92 >>>> datapaths : [29d576ef-31f5-485b-acb7-190373ef74c3] >>>> external_ids : {lb_id="c2206d6a-6939-47cc-92de-554d2bc163b0"} >>>> name : lb1 >>>> options : {hairpin_orig_tuple="true"} >>>> protocol : tcp >>>> vips : {"1.1.1.1:1000"="2.2.2.2:2000"} >>>> >>>>> Similarly for the patch [1], I didn't figure out what's exactly broken >>>>> without change-tracking to the options column from the commit message >>> and >>>>> the patch itself. >>>> >>>> I think the reason behind [1] is that ovn-controller (in the pinctrl >>>> module) writes some options, SB.Port_Binding.options:ipv6_ra_pd_list, >>>> which were then propagated by ovn-northd to >>>> Logical_Router_Port.ipv6_prefix. >>>> >>>> This is done in ovn_update_ipv6_prefix(), part of ovnnb_db_run() in the >>>> "northd" I-P node. But, if the only change that woke up northd was the >>>> SB.Port_Binding.options:ipv6_ra_pd_list change, because the column is >>>> not tracked, the en_sb_port_binding I-P node (input of northd) doesn't >>>> run. >>>> >>>>> I think I need to understand the context before commenting on the >>> current >>>>> patch. Could you help explain a little more? >>>>> >>>> >>>> I hope this makes it more clear. I'm sure there are more aspects I >>>> missed here but I think we need to find a way to make northd work >>>> correctly and without delays in reconciling state while at the same >>>> time avoiding to affect latency (like in the performance regression >>>> I introduced with [0]). >>>> >>> Thanks Dumitru for the detailed explanation. I think I understand the >>> problem now. In both of the cases above, we were in fact monitoring the >>> columns as "write-only" while they were in fact "read-write". >>> >>> 1) The first case (the load balancer one) looks more complex, but >>> essentially it is an input to ovn-northd. If we consider "removing >>> duplicates" as a feature of northd, although the duplicates would in > fact >>> only be triggered by northd itself (but not actually generated by > itself, >>> it was RAFT's problem that actually generated the duplicated row), > northd >>> is reading the records and then remove the duplicates and write it back. >>> That's why the northd needs to track the load-balancer columns and > react to >>> the changes. (We may still call it "write-only" in this case, which > doesn't >>> mean northd doesn't read the value, but just that northd is the only one >>> that decides what should be there, not worrying about overwriting > others' >>> changes. I'd rather call it "read-write" to be more precise.) >>> >> >> Sure, but just for completeness, this applies to the following tables >> too (they don't have an index defined in the SB schema): DHCP_Options, >> DHCPv6_Options, DNS, HA_Chassis. >> >> Most of these didn't have change tracking enabled before [0] either as >> far as I can tell; or at least not directly. We'd have to see if there >> are more corner cases to cover. >> >>> 2) The second case (the IPv6 one) is more straightforward. If >>> ovn-controller can modify the column, the column should never be > regarded >>> as write-only to northd, which is a bug even existed before northd I-P. >>> There may be other such bugs that we haven't noticed before, but can be >>> exposed by the I-P change. These bugs were not noticed before because >>> anything that wakes up the main loop would trigger a recompute before > I-P >>> implementation, so we were actually handling any changes in time. And > the >>> problem mentioned in [5] Commit 1cc618c32524 ("ovsdb-idl: Fix atomicity > of >>> writes that don't change a column's value.") doesn't apply to this use > case >>> (correct me if I am wrong). >>> >> >> Indeed, I think [5] is not really relevant in the ovn-northd <-> SB <-> >> ovn-controller context because both ovn-northd and ovn-controller don't >> really care about transactions and (should) ensure eventual consistency >> of the SB data. > > I'd be careful to claim "don't really care about transactions" for all use > cases, not that I know any use cases we do care about, but I just haven't > examined every use case of ovn-northd and ovn-controller. Anyway, as long > as we have the capability in the IDL to handle each column the way we want > (with or without a flag: OVSDB_IDL_WRITE_CHANGED_ONLY), we should be able > to handle both cases. Maybe we can assume it is ok to not care about > atomicity of writes in northd, and write the changed columns only for now, > but later if we figure out there is any problem for some columns, we can > always remove the flag for those columns. > >> >>> Now to the solution: >>> >>> a) To fix the bugs, we shouldn't blindly change-tracking for all tables > and >>> columns, but instead, track changes for the columns that are read-write > but >>> treated as write-only by mistake, while keep the real write-only columns >>> write-only (e.g. the nb_cfg related columns in SB_Global). I think we >>> should fix this regardless. >>> >> >> In theory, yes, but please see "c)" below. >> >>> b) However, this is not enough, because we are facing the performance >>> problem introduced by such fixes, even if we only add change-tracking to >>> very few new columns. E.g. the one added by [1] already impacts > performance >>> significantly even adding tracking for just one column. We need further >>> improvement to solve the performance problem, which is primarily caused > by >>> the behavior introduced by [5]. However, in most use cases we don't have >>> the problem mentioned by [5]. I think this is the assumption that leads > to >>> this patch as the solution for the performance problem and I think it is >>> the right approach. It allows northd to get alerted by changes, but only >>> write back columns that were changed. I just want to emphasize that we > need >>> this approach on top of a), and only apply for the columns that are >>> read-write but don't worry about atomicity of transactions if only the >>> changed values are written back. >> >> I would add: >> >> c) However, before [6], the non-incremental implementation of ovn-northd >> had the side effect that *any* external change to a SB record/column, >> regardless of how the column was monitored, would trigger northd to >> reconcile the SB contents. That won't be the case anymore. I'm not >> sure if this is a big problem but it might uncover issues that used to >> be handled indirectly by the northd full reprocessing. >> >> E.g., in a sandbox with [0] reverted: >> >> $ ovn-nbctl ls-add ls >> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb >> uuid=0xf771dbcb, table=13(ls_in_acl_after_lb ), priority=0 , >> match=(1), action=(next;) >> >> # Simulate a lflow "disappearing" from the SB. >> $ ovn-sbctl destroy logical_flow f771dbcb >> >> # Northd doesn't recreate it until a read/write column changes value: >> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb >> $ ovn-nbctl --wait=sb sync >> $ ovn-sbctl --uuid lflow-list | grep ls_in_acl_after_lb >> uuid=0x20a3b933, table=13(ls_in_acl_after_lb ), priority=0 , >> match=(1), action=(next;) >> >> So, with all this in mind, if we find a good way of defining this new >> monitoring mode ("writing back changed columns only") I don't really see >> a problem with blindly enabling change tracking and this new mode for >> the Southbound. All in all it just ensures the same behavior as before >> [6] without any real drawbacks. >> > > I understand c), but I still think a) is important. For the example above, > since we expect northd to detect changes in SB logical_flow table and fix > it, it should be regarded as read-write. Probably most of the tables and > columns fall into this category, but there are exceptions. In some cases we > don't even want/need to fix changes. One example is nb_cfg, which is only > for sync or performance test purposes. It is purely write-only, and no need > to detect any changes from other components. Currently with alert enabled, > after every nb_cfg update to the SB by northd, the northd would receive > notification for its own change to this column which triggers a recompute > unnecessarily. I am not sure if there are other examples. But I think > overall we are on the same page. We can detect such cases one-by-one and > treat them as write-only. >
Thanks, Han, for the review! I just posted v3: https://patchwork.ozlabs.org/project/openvswitch/patch/[email protected]/ Regards, Dumitru > Thanks, > Han > >>> >>> With the above understanding, I have some comments for the patch: >>> >>> i) Functionally I believe it serves the purpose of solving the > performance >>> problem, but it seems confusing what the modes really mean now. There > were >>> definitions to each of the flags: >>> - OVSDB_IDL_MONITOR /* Replicate this column? */ >>> - OVSDB_IDL_ALERT /* Alert client when column changes? */ >>> - OVSDB_IDL_TRACK // there is no comment, but it is obvious as the name >>> suggests. >>> >>> Each of the meanings was clear. Now with the patch it seems to change > the >>> meanings of: >>> - OVSDB_IDL_ALERT to "alert client when column changes, and the column > is >>> NOT write-only" >>> - OVSDB_IDL_TRACK to "alert client when column changes, and track the >>> changes" >>> >>> This is really confusing. I think the purpose here is just to provide a >>> capability of "writing back changed columns only" while still "alert > client >>> when column changes". Would it be better just to add a new flag for > this, >>> like OVSDB_IDL_WRITE_CHANGED_ONLY? It can be set properly in existed > APIs >>> to keep the existed behavior, and provide an API for clients to set it >>> explicitly for columns that are read-write but doesn't care about the >>> atomicity. >> >> Sure, sounds better to make it explicit indeed. >> >>> >>> ii) The comments in ovsdb-idl.h under "The possible mode combinations > are:" >>> needs updating. >> >> Ack. >> >> I'll prepare a v3 for the idl change. In the meantime it would be great >> if we could already agree on the way we want to enable this in >> ovn-northd (i.e., blindly for all SB vs. selectively for some tables). >> >>> >>> Thanks, >>> Han >>> >>>>> [0] >>>>> >>> > https://github.com/ovn-org/ovn/commit/e4d6d3455baf09c63ed610037c384855e5f64141 >>>>> [1] >>>>> >>> > https://github.com/ovn-org/ovn/commit/d32a9bc5290e40dd63ef495eb4f0fcde9e446089 >>>>> >>>> >>>> [2] >>> > https://github.com/openvswitch/ovs/blob/master/Documentation/ref/ovsdb.7.rst#understanding-cluster-consistency >>>> [3] https://bugzilla.redhat.com/show_bug.cgi?id=2045577 >>>> [4] >>> > https://github.com/ovn-org/ovn/commit/9deb000536e0dcf81d0e61d5a8af1d4e655960b4 >>>> >>> [5] >>> > https://github.com/openvswitch/ovs/commit/1cc618c32524076d14ba3ee30e672a554b8ee605 >>> >> >> [6] >> > https://github.com/ovn-org/ovn/commit/4597317f16d1a522fd468dc5339cbe3bb851b60e >> >> Thanks, >> Dumitru >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
