On Tue, May 30, 2023 at 4:04 AM Ilya Maximets <[email protected]> wrote:
>
> On 5/29/23 20:42, Han Zhou wrote:
> >
> > On Mon, May 29, 2023 at 9:24 AM Ilya Maximets <[email protected]> wrote:
> >>
> >> On 5/13/23 02:03, Han Zhou wrote:
> >> > This patch introduces a change handler for 'northd' input within the
> >> > 'lflow' node. It specifically handles cases when VIFs are created, which
> >> > is an easier start for lflow incremental processing. Support for
> >> > update/delete will be added later.
> >> >
> >> > Below are the performance test results simulating an ovn-k8s topology
> >> > of 500 nodes x 50 lsp per node:
> >> >
> >> > Before:
> >> > ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >> > ovn-northd completion: 773ms
> >> >
> >> > After:
> >> > ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >> > ovn-northd completion: 30ms
> >> >
> >> > That is more than a 95% reduction (or 20x faster).
> >> >
> >> > Signed-off-by: Han Zhou <[email protected]>
> >> > ---
> >> >  northd/en-lflow.c        |  82 ++++++++-----
> >> >  northd/en-lflow.h        |   1 +
> >> >  northd/inc-proc-northd.c |   2 +-
> >> >  northd/northd.c          | 245 +++++++++++++++++++++++++++++++++------
> >> >  northd/northd.h          |   6 +-
> >> >  tests/ovn-northd.at      |   4 +-
> >> >  6 files changed, 277 insertions(+), 63 deletions(-)
> >> >
> >>
> >> <snip>
> >>
> >> > @@ -5685,7 +5687,18 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
> >> >
> >> >  /* Adds a row with the specified contents to the Logical_Flow table.
> >> >   * Version to use when hash bucket locking is NOT required.
> >> > + *
> >> > + * Note: This function can add generated lflows to the global variable
> >> > + * temp_lflow_list as its output, controlled by the global variable
> >> > + * add_lflow_to_temp_list. The caller of the ovn_lflow_add_... macros can
> >> > + * get a list of lflows generated by setting add_lflow_to_temp_list to
> >> > + * true. The caller is responsible for initializing the temp_lflow_list,
> >> > + * and also for resetting add_lflow_to_temp_list to false when it is no
> >> > + * longer needed.
> >> > + * XXX: this mechanism is temporary and will be replaced when we add a
> >> > + * hash index to lflow_data and refactor related functions.
> >> > + */
> >> > +static bool add_lflow_to_temp_list = false;
> >> > +static struct ovs_list temp_lflow_list;
> >> >  static void
> >> >  do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
> >> >                   const unsigned long *dp_bitmap, size_t dp_bitmap_len,
> >> > @@ -5702,12 +5715,17 @@ do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
> >> >      size_t bitmap_len = od ? ods_size(od->datapaths) : dp_bitmap_len;
> >> >      ovs_assert(bitmap_len);
> >> >
> >> > -    old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
> >> > -                               actions, ctrl_meter, hash);
> >> > -    if (old_lflow) {
> >> > -        ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
> >> > -                                        bitmap_len);
> >> > -        return;
> >> > +    if (add_lflow_to_temp_list) {
> >> > +        ovs_assert(od);
> >> > +        ovs_assert(!dp_bitmap);
> >> > +    } else {
> >> > +        old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
> >> > +                                   actions, ctrl_meter, hash);
> >> > +        if (old_lflow) {
> >> > +            ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
> >> > +                                            bitmap_len);
> >> > +            return;
> >> > +        }
> >> >      }
> >>
> >> Hi, Han. Not a full review, but I'm a bit concerned that this code doesn't
> >> support datapath groups. IIUC, if some flows can be aggregated with datapath
> >> groups, I-P will just add them separately. Later a full re-compute will
> >> delete all these flows and replace them with a single grouped one. Might
> >> cause some unnecessary churn in the cluster.
> >>
> >
> > Hi Ilya, thanks for the feedback. You are right that the current I-P doesn't
> > support datapath groups, because:
> >
> > 1. The current I-P only supports LSP (regular VIF) related changes, which
> >    are very unlikely to generate datapath groups. Even when that happens in
> >    rare cases, the logic is still correct and the churn should be small.
> > 2. There are two types of DP groups:
> >    a. DP groups that are directly generated when northd knows that a lflow
> >       should be applied to a group of DPs.
> >    b. DP groups that are passively generated by combining lflows generated
> >       for different DPs.
> >
> > Supporting I-P for the type-a DP groups is relatively straightforward: it
> > would be part of the I-P logic for the object that generates the DP group,
> > e.g. for LB related DP groups. Supporting I-P for the type-b DP groups is
> > more complex, but I think for most DP groups we can optimize northd to make
> > them type-a DP groups. For example, for ACLs we should know beforehand the
> > group of DPs a lflow should be applied to. This conversion/optimization can
> > happen incrementally, whenever we see that handling a type of change is a
> > bottleneck and decide to implement I-P for it. In the end I hope there will
> > be no cases that need I-P for a type-b DP group, which is why I don't worry
> > about supporting DP groups for now.
> >
> >> Is my understanding correct? And can it be a performance issue, e.g. if
> >> the database grows too much because of that? Or how hard is it to add
> >> datapath group support to I-P?
> >>
> > I wouldn't worry too much about DB growth for now, since only LSP (regular
> > VIF) related lflows skip DP groups, and I believe DP groups are not heavily
> > involved in those.
> >
> >> I suppose, DP-groups is one of the reasons you didn't implement deletions
> >> here, right?
> >
> > Maybe partly related, but not the major reason: I just didn't get enough
> > time for that part yet. For the reason mentioned above, I will skip DP
> > groups for VIF related lflow generation so that I can build a hash index
> > for deletion, and I don't see any obvious negative impact from not trying
> > to combine such lflows with DP groups. Please let me know if you think
> > differently.
>
> It's hard to tell for sure what kind of side effects we could see,
> because it's hard to sustain incremental-only processing for a long
> time with the current implementation. It falls back to re-compute too
> frequently to see a long-term impact.
>
> But I would not be that dismissive of type-b flows. For example, I
> pulled the southbound database from our recent ovn-heater density-light
> 500-node test. It doesn't have any load balancers involved.
>
> The database by the end of the test contains 556 K logical flows.
> Only 826 of these flows have datapath groups; as expected, not a lot.
> However, each of these flows is applied to a group of 500+ datapaths.
> If we unfold them, we get an additional 826 * 500 = 413 K flows.
> So, the number of logical flows would effectively be double what we
> have today, and the total database size would grow accordingly.
>
> Best regards, Ilya Maximets.

Let me clarify: the current implementation of this patch series only performs
I-P for VIF related changes. VIF related lflows are mostly distinct from each
other and should never be applied to a group of 500+ datapaths. In the test
you mentioned, the flows applied to a group of 500+ datapaths shouldn't be
related to any individual VIFs; with this patch, they are still always
recomputed and applied to DP groups instead of individual DPs. In my own
ovn-k8s simulation environment, I have confirmed that after adding and
binding a new VIF (which is incrementally processed) and then triggering a
recompute from northd, there is no change in the logical flows (no churn
observed).

Also, I was not being dismissive of type-b DP groups. I fully understand the
cost of "ungrouping" them and never intended to do so. I am just not worried
now, because the current patch series (and the immediate TODOs for deletions)
does not touch flows with type-b DP groups, so they will keep being
recomputed and work as-is for now.

To implement I-P for flows with type-b DP groups, I made the proposal above:
convert them to type-a DP groups first. By that I mean that when generating
flows that apply to a big group of DPs, northd must know which DPs the flows
should be applied to while constructing them. One example is the DP groups
for LBs that you optimized recently. Another is the flows for ACLs, which
apply to the DPs of the LSes that reference the ACLs either directly or
through port groups. When implementing I-P for these objects, we can
construct the DP group and apply the flows to the group directly (type-a),
instead of generating flows for each individual DP first and then collecting
them (type-b). The goal is not to convert every type-b group to type-a, but
when we implement I-P for a certain input, this is what we need to do for the
DP groups related to that input. (See the sketch in the P.S. below for the
difference between the two patterns.) I hope this addresses your concerns.

Finally, I don't want to suggest that DP groups will never be involved in VIF
related flows. In some uncommon scenarios, two VIFs on different LSes may
happen to have the same IP/MAC configured, so some of the generated flows may
look identical on different DPs and get combined into type-b groups. But
these are very rare cases: the DP groups generated by such coincidentally
identical flows should be small, and the effect of such small DP groups is
negligible (unlike the DP groups for flows common to all LSes/LRs, or the
ones for LBs/ACLs). So any churn caused by this type of flow should be very
small (if it exists at all), and once I implement I-P for deletion (which
will disable DP groups for such coincidental small groups) there will be no
churn at all, even in these rare cases.

Thanks,
Han
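
P.S. To make the type-a vs. type-b distinction above concrete, below is a
minimal, self-contained sketch of the two construction patterns. All names
(sketch_lflow, add_flow_type_a, add_flow_type_b) are made up for
illustration; this is not the actual northd code, which uses hmaps and
dynamically sized datapath bitmaps.

/* Sketch of the two DP-group construction patterns; names are made up. */
#include <stdio.h>
#include <string.h>

#define N_DATAPATHS 8

struct sketch_lflow {
    const char *match;                    /* Match string shared by the group. */
    unsigned char dp_bitmap[N_DATAPATHS]; /* 1 = flow applies to this DP. */
};

/* Type-a: the caller already knows the full set of datapaths, so the
 * flow is created once with the complete group. */
static void
add_flow_type_a(struct sketch_lflow *f, const char *match,
                const int *dps, int n_dps)
{
    memset(f, 0, sizeof *f);
    f->match = match;
    for (int i = 0; i < n_dps; i++) {
        f->dp_bitmap[dps[i]] = 1;
    }
}

/* Type-b: flows are generated per datapath, and identical flows are
 * coalesced afterwards by looking up an existing flow with the same
 * match (loosely mirroring what ovn_lflow_find() +
 * ovn_dp_group_add_with_reference() do in northd). */
static void
add_flow_type_b(struct sketch_lflow *f, int *n_flows,
                const char *match, int dp)
{
    for (int i = 0; i < *n_flows; i++) {
        if (!strcmp(f[i].match, match)) { /* Existing identical flow. */
            f[i].dp_bitmap[dp] = 1;       /* Just extend its group. */
            return;
        }
    }
    memset(&f[*n_flows], 0, sizeof f[*n_flows]);
    f[*n_flows].match = match;
    f[*n_flows].dp_bitmap[dp] = 1;
    (*n_flows)++;
}

int
main(void)
{
    /* Type-a: one flow, group known up front. */
    struct sketch_lflow a;
    int group[] = {0, 1, 2, 3};
    add_flow_type_a(&a, "ip4.dst == 10.0.0.0/8", group, 4);

    /* Type-b: same end result, but discovered by coalescing per-DP flows. */
    struct sketch_lflow b[N_DATAPATHS];
    int n = 0;
    for (int dp = 0; dp < 4; dp++) {
        add_flow_type_b(b, &n, "ip4.dst == 10.0.0.0/8", dp);
    }
    printf("type-a flows: 1, type-b flows after coalescing: %d\n", n);
    return 0;
}

The point of the sketch: in the type-a pattern the group membership is an
input known while constructing the flow, while in the type-b pattern it is
discovered afterwards by matching already-generated flows, which is what
makes incremental handling (especially deletion) harder.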
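
P.P.S. For completeness, the caller pattern intended for the temp_lflow_list
mechanism in the quoted diff, as described in the patch comment, is roughly
the following. Again a simplified, self-contained sketch: the flag and list
names match the patch, but the list here is a plain singly linked list and
sketch_lflow_add() is a made-up stand-in for do_ovn_lflow_add().

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the patch's globals (simplified: a singly linked list
 * instead of struct ovs_list). */
struct tmp_lflow {
    const char *match;
    struct tmp_lflow *next;
};

static bool add_lflow_to_temp_list = false;
static struct tmp_lflow *temp_lflow_list = NULL;

/* Stand-in for do_ovn_lflow_add(): when the flag is set, the generated
 * lflow is also recorded on the temp list so the caller can track it. */
static void
sketch_lflow_add(const char *match)
{
    if (add_lflow_to_temp_list) {
        struct tmp_lflow *f = malloc(sizeof *f);
        f->match = match;
        f->next = temp_lflow_list;
        temp_lflow_list = f;
    }
    /* ... normal insertion into the lflow table would happen here ... */
}

int
main(void)
{
    /* Caller pattern from the patch comment: initialize the list, set
     * the flag, generate flows, then reset the flag and consume the
     * collected flows (e.g. to index them for later deletion). */
    temp_lflow_list = NULL;
    add_lflow_to_temp_list = true;

    sketch_lflow_add("inport == \"lsp1\"");
    sketch_lflow_add("outport == \"lsp1\"");

    add_lflow_to_temp_list = false;

    for (struct tmp_lflow *f = temp_lflow_list; f;) {
        struct tmp_lflow *next = f->next;
        printf("collected lflow: %s\n", f->match);
        free(f);
        f = next;
    }
    return 0;
}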
