On 5/31/23 01:22, Han Zhou wrote:
>
> On Tue, May 30, 2023 at 4:04 AM Ilya Maximets <[email protected]> wrote:
>>
>> On 5/29/23 20:42, Han Zhou wrote:
>> >
>> > On Mon, May 29, 2023 at 9:24 AM Ilya Maximets <[email protected]> wrote:
>> >>
>> >> On 5/13/23 02:03, Han Zhou wrote:
>> >> > This patch introduces a change handler for 'northd' input within the
>> >> > 'lflow' node. It specifically handles cases when VIFs are created,
>> >> > which is an easier start for lflow incremental processing. Support
>> >> > for update/delete will be added later.
>> >> >
>> >> > Below are the performance test results simulating an ovn-k8s topology
>> >> > of 500 nodes x 50 lsp per node:
>> >> >
>> >> > Before:
>> >> >     ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
>> >> >     -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>> >> >     -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>> >> >     ovn-northd completion: 773ms
>> >> >
>> >> > After:
>> >> >     ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01
>> >> >     -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>> >> >     -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>> >> >     ovn-northd completion: 30ms
>> >> >
>> >> > It is more than a 95% reduction (or 20x faster).
>> >> >
>> >> > Signed-off-by: Han Zhou <[email protected]>
>> >> > ---
>> >> >  northd/en-lflow.c        |  82 ++++++++-----
>> >> >  northd/en-lflow.h        |   1 +
>> >> >  northd/inc-proc-northd.c |   2 +-
>> >> >  northd/northd.c          | 245 +++++++++++++++++++++++++++++++++------
>> >> >  northd/northd.h          |   6 +-
>> >> >  tests/ovn-northd.at      |   4 +-
>> >> >  6 files changed, 277 insertions(+), 63 deletions(-)
>> >> >
>> >>
>> >> <snip>
>> >>
>> >> > @@ -5685,7 +5687,18 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
>> >> >
>> >> >  /* Adds a row with the specified contents to the Logical_Flow table.
>> >> >   * Version to use when hash bucket locking is NOT required.
>> >> > + *
>> >> > + * Note: This function can add generated lflows to the global variable
>> >> > + * temp_lflow_list as its output, controlled by the global variable
>> >> > + * add_lflow_to_temp_list. The caller of the ovn_lflow_add_... macros
>> >> > + * can get a list of lflows generated by setting add_lflow_to_temp_list
>> >> > + * to true. The caller is responsible for initializing temp_lflow_list,
>> >> > + * and also for resetting add_lflow_to_temp_list to false when it is no
>> >> > + * longer needed.
>> >> > + * XXX: this mechanism is temporary and will be replaced when we add a
>> >> > + * hash index to lflow_data and refactor related functions.
>> >> >   */
>> >> > +static bool add_lflow_to_temp_list = false;
>> >> > +static struct ovs_list temp_lflow_list;
>> >> >  static void
>> >> >  do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
>> >> >                   const unsigned long *dp_bitmap, size_t dp_bitmap_len,
>> >> > @@ -5702,12 +5715,17 @@ do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
>> >> >      size_t bitmap_len = od ? ods_size(od->datapaths) : dp_bitmap_len;
>> >> >      ovs_assert(bitmap_len);
>> >> >
>> >> > -    old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
>> >> > -                               actions, ctrl_meter, hash);
>> >> > -    if (old_lflow) {
>> >> > -        ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
>> >> > -                                        bitmap_len);
>> >> > -        return;
>> >> > +    if (add_lflow_to_temp_list) {
>> >> > +        ovs_assert(od);
>> >> > +        ovs_assert(!dp_bitmap);
>> >> > +    } else {
>> >> > +        old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
>> >> > +                                   actions, ctrl_meter, hash);
>> >> > +        if (old_lflow) {
>> >> > +            ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
>> >> > +                                            bitmap_len);
>> >> > +            return;
>> >> > +        }
>> >> >      }
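To make the mechanism described in the comment above easier to follow, here is
a small self-contained sketch of the same collect-into-a-temp-list pattern.
All the types and names below are simplified stand-ins for illustration, not
the actual OVN code:

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Simplified stand-in for 'struct ovn_lflow'. */
    struct lflow {
        char match[64];
        struct lflow *next;    /* Stand-in for the patch's ovs_list linkage. */
    };

    /* Global switch and output list, as in the patch: while the switch is
     * on, generated flows are chained onto the list for the caller. */
    static bool add_lflow_to_temp_list = false;
    static struct lflow *temp_lflow_list = NULL;

    static void
    do_lflow_add(const char *match)
    {
        struct lflow *lf = calloc(1, sizeof *lf);
        snprintf(lf->match, sizeof lf->match, "%s", match);
        if (add_lflow_to_temp_list) {
            /* I-P case: collect the flow so the change handler knows
             * exactly which flows this VIF produced; datapath-group
             * deduplication is skipped. */
            lf->next = temp_lflow_list;
            temp_lflow_list = lf;
        }
        /* Otherwise the flow would be deduplicated into the global flow
         * table (elided here). */
    }

    int
    main(void)
    {
        /* Caller: initialize the list and enable collection... */
        temp_lflow_list = NULL;
        add_lflow_to_temp_list = true;

        do_lflow_add("inport == \"lsp_1_0001_01\"");
        do_lflow_add("eth.dst == ff:f1:bb:00:01:01");

        /* ...reset the switch when done, then consume what was generated. */
        add_lflow_to_temp_list = false;
        while (temp_lflow_list) {
            struct lflow *lf = temp_lflow_list;
            temp_lflow_list = lf->next;
            printf("collected lflow: %s\n", lf->match);
            free(lf);
        }
        return 0;
    }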
>> >>
>> >> Hi, Han. Not a full review, but I'm a bit concerned that this code
>> >> doesn't support datapath groups. IIUC, if some flows can be aggregated
>> >> with datapath groups, I-P will just add them separately. Later a full
>> >> re-compute will delete all these flows and replace them with a single
>> >> grouped one. Might cause some unnecessary churn in the cluster.
>> >>
>> >
>> > Hi Ilya, thanks for the feedback. You are right that the current I-P
>> > doesn't support datapath groups, because:
>> > 1. The current I-P only supports LSP (regular VIF) related changes,
>> > which are very unlikely to generate datapath groups. Even when that
>> > happens in rare cases, the logic is still correct and the churn should
>> > be small.
>> > 2. There are two types of DP groups:
>> >     a. DP groups that are directly generated when northd has the
>> >        knowledge that a lflow should be applied to a group of DPs.
>> >     b. DP groups that are passively generated by combining lflows
>> >        generated for different DPs.
>> > Supporting I-P for the type-a DP groups is relatively straightforward:
>> > it would be part of the I-P logic for the object that generates the DP
>> > group, e.g. for LB related DP groups.
>> > Supporting I-P for the type-b DP groups is more complex, but I think
>> > for most DP groups we can optimize northd to make them type-a DP
>> > groups; for example, for ACLs we should know beforehand the group of
>> > DPs the lflow should be applied to. This conversion/optimization is an
>> > incremental process, done when we see that handling a type of change is
>> > a bottleneck and decide to implement I-P for it.
>> > In the end I hope there will be no cases that need I-P for a type-b DP
>> > group, which is why I don't worry about supporting DP groups for now.
>> >
>> >> Is my understanding correct? And can it be a performance issue? E.g.
>> >> if the database grows too much because of that. Or how hard would it
>> >> be to add datapath group support to I-P?
>> >>
>> > I'd not worry too much about DB growth for now, since it is only LSP
>> > (regular VIF) related lflows that skip DP groups, and I believe DP
>> > groups are not heavily involved there.
>> >
>> >> I suppose DP groups are one of the reasons you didn't implement
>> >> deletions here, right?
>> >
>> > Maybe partly related, but not the major reason. I just didn't get
>> > enough time for that part yet. For the reason mentioned above, I will
>> > skip DP groups for VIF related lflow generation so that I can build a
>> > hash index for deletion, and I don't see any obvious negative impact
>> > from not trying to combine such lflows with DP groups. Please let me
>> > know if you think differently.
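For readers who want the type-a/type-b distinction above in concrete terms,
here is a self-contained sketch (stand-in types, not OVN code): a type-a
producer creates one flow with the whole datapath bitmap up front, while a
type-b group only emerges when per-datapath flows turn out to be identical
and get merged, roughly what the ovn_lflow_find() plus
ovn_dp_group_add_with_reference() path in the diff above does:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define MAX_FLOWS 16

    struct flow {
        char match[32];
        uint64_t dp_bitmap;    /* Bit i set => applies to datapath i. */
    };

    static struct flow flows[MAX_FLOWS];
    static int n_flows;

    /* Type-a: the producer already knows the whole group, so the flow is
     * created once with the full bitmap. */
    static void
    add_flow_grouped(const char *match, uint64_t dp_bitmap)
    {
        struct flow *f = &flows[n_flows++];
        snprintf(f->match, sizeof f->match, "%s", match);
        f->dp_bitmap = dp_bitmap;
    }

    /* Type-b: flows are produced per datapath; identical ones are found
     * afterwards and their bitmaps merged. */
    static void
    add_flow_per_dp(const char *match, int dp)
    {
        for (int i = 0; i < n_flows; i++) {
            if (!strcmp(flows[i].match, match)) {
                flows[i].dp_bitmap |= UINT64_C(1) << dp;   /* Merge. */
                return;
            }
        }
        add_flow_grouped(match, UINT64_C(1) << dp);
    }

    int
    main(void)
    {
        add_flow_grouped("ip4 && acl1", 0x7);   /* Type-a: DPs 0,1,2 at once. */
        for (int dp = 0; dp < 3; dp++) {
            add_flow_per_dp("arp", dp);          /* Type-b: merged to 0x7. */
        }
        for (int i = 0; i < n_flows; i++) {
            printf("%-12s dp_bitmap=0x%llx\n", flows[i].match,
                   (unsigned long long) flows[i].dp_bitmap);
        }
        return 0;
    }

Both routes end with one flow row per distinct match; the difference is only
whether the group membership is known when the flow is built or discovered
after the fact.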
>>
>> It's hard to tell for sure what kind of side effects we could see,
>> because it's hard to sustain incremental-only processing for a long
>> time with the current implementation.  It falls back to re-compute too
>> frequently to see a long-term impact.
>>
>> But I would not be that dismissive of type-b flows.  For example, I
>> pulled the southbound database from our recent ovn-heater density-light
>> 500-node test.  It doesn't have any load balancers involved.
>>
>> The database by the end of the test contains 556 K logical flows.
>> Only 826 of these flows have datapath groups; as expected, not a lot.
>> However, each of these flows is applied to a group of 500+ datapaths.
>> If we unfold them, we'll get an additional 826 * 500 = 413 K flows.
>> So, the number of logical flows will effectively be double what we
>> have today.  The total database size will also grow accordingly.
>>
>> Best regards, Ilya Maximets.
>
> Maybe I should state more clearly that the current implementation of
> this patch series only performs I-P for VIF related changes. The VIF
> related lflows are mostly distinct from each other and shall never be
> applied to a group of 500+ datapaths.
I see.  Indeed, most of the flows in the case above are related to
router ports and don't really match on any port-specific information.
I missed that while looking through the database.

> In the test you mentioned, the flows that are applied to a group of 500+
> datapaths shouldn't be related to any individual VIFs. With this patch,
> they are still always recomputed and applied to DP groups instead of
> individual DPs.
> In my own ovn-k8s-simulation test environment, I have confirmed that
> after adding and binding a new VIF (which is incrementally processed),
> and then triggering a recompute from northd, there is no change in the
> logical flows (no churn observed).
>
> Also, I was not dismissive of type-b DP groups.

Sorry, it was a poor choice of words from my side.

> I totally understand the impact of the cost of "ungrouping" them and
> never meant to do so. I am just not worried now because the current
> patch series (and the immediate TODOs for deletions) does not touch the
> flows of type-b DP groups, so they will keep getting recomputed and
> work as-is for now.
>
> To implement I-P for the flows of type-b DP groups, I also made the
> proposal above: to convert them to type-a DP groups first. By that I
> mean, when generating flows that apply to a big group of DPs, northd
> must know which DPs the flows should be applied to while constructing
> them. An example is the DP groups for LBs that you optimized recently.
> Another example is the flows for ACLs that apply to DPs either for the
> LSes that reference the ACLs directly or through port-groups. When
> implementing I-P for these objects, we can construct the DP groups and
> apply the flows to the group (type-a) directly, instead of generating
> flows for each individual DP first and then combining them (type-b).
> The goal is not to convert every type-b group to type-a, but when we
> are trying to implement I-P for a certain input, this is what we need
> to do for the DP groups related to that input.

Ack.

> I hope this addresses your concerns.
>
> Finally, I don't want to suggest that there will never be DP groups
> involved in VIF related flows. In some uncommon scenarios it may still
> happen that two VIFs on different LSes have the same IP/MAC configured,
> so some of the generated flows may look identical on different DPs (and
> thus get grouped as type-b groups). But these are very rare cases, and
> the DP groups generated by such coincidentally identical flows should
> be small, so the effect of such small-sized DP groups is unimportant
> (unlike the DP groups for flows common to all LSes/LRs, or the ones for
> LBs/ACLs). So any churn caused by this type of flows should be very
> small (if it exists at all), and once I implement the I-P for deletion
> (which will disable DP groups for such coincidental small groups) there
> will be no churn at all even for these rare cases.

Thanks for the explanation.  I see that datapath groups are not really
a problem for this particular patch set.  We just need to be careful
while adding new I-P nodes for other types of resources.

Best regards, Ilya Maximets.

>
> Thanks,
> Han

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
