On Tue, May 30, 2023 at 4:04 AM Ilya Maximets <[email protected]> wrote:
>
> On 5/29/23 20:42, Han Zhou wrote:
> >
> > On Mon, May 29, 2023 at 9:24 AM Ilya Maximets <[email protected]> wrote:
> >>
> >> On 5/13/23 02:03, Han Zhou wrote:
> >> > This patch introduces a change handler for 'northd' input within the
> >> > 'lflow' node. It specifically handles cases when VIFs are created, which
> >> > is an easier start for lflow incremental processing. Support for
> >> > update/delete will be added later.
> >> >
> >> > Below are the performance test results simulating an ovn-k8s topology
> >> > of 500 nodes x 50 lsp per node:
> >> >
> >> > Before:
> >> > ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >> > ovn-northd completion: 773ms
> >> >
> >> > After:
> >> > ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> >> > ovn-northd completion: 30ms
> >> >
> >> > That is more than a 95% reduction (or 20x faster).
> >> >
> >> > Signed-off-by: Han Zhou <[email protected]>
> >> > ---
> >> >  northd/en-lflow.c        |  82 ++++++++-----
> >> >  northd/en-lflow.h        |   1 +
> >> >  northd/inc-proc-northd.c |   2 +-
> >> >  northd/northd.c          | 245 +++++++++++++++++++++++++++++++++------
> >> >  northd/northd.h          |   6 +-
> >> >  tests/ovn-northd.at      |   4 +-
> >> >  6 files changed, 277 insertions(+), 63 deletions(-)
> >> >
> >>
> >> <snip>
> >>
> >> > @@ -5685,7 +5687,18 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
> >> >
> >> >  /* Adds a row with the specified contents to the Logical_Flow table.
> >> >   * Version to use when hash bucket locking is NOT required.
> >> > + *
> >> > + * Note: This function can add generated lflows to the global variable
> >> > + * temp_lflow_list as its output, controlled by the global variable
> >> > + * add_lflow_to_temp_list. The caller of the ovn_lflow_add_... macros can
> >> > + * get a list of lflows generated by setting add_lflow_to_temp_list to
> >> > + * true. The caller is responsible for initializing the temp_lflow_list,
> >> > + * and also for resetting add_lflow_to_temp_list to false when it is no
> >> > + * longer needed.
> >> > + * XXX: this mechanism is temporary and will be replaced when we add a
> >> > + * hash index to lflow_data and refactor related functions.
> >> > + */
> >> > +static bool add_lflow_to_temp_list = false;
> >> > +static struct ovs_list temp_lflow_list;
> >> >  static void
> >> >  do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
> >> >                   const unsigned long *dp_bitmap, size_t dp_bitmap_len,
> >> > @@ -5702,12 +5715,17 @@ do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
> >> >      size_t bitmap_len = od ? ods_size(od->datapaths) : dp_bitmap_len;
> >> >      ovs_assert(bitmap_len);
> >> >
> >> > -    old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
> >> > -                               actions, ctrl_meter, hash);
> >> > -    if (old_lflow) {
> >> > -        ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
> >> > -                                        bitmap_len);
> >> > -        return;
> >> > +    if (add_lflow_to_temp_list) {
> >> > +        ovs_assert(od);
> >> > +        ovs_assert(!dp_bitmap);
> >> > +    } else {
> >> > +        old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
> >> > +                                   actions, ctrl_meter, hash);
> >> > +        if (old_lflow) {
> >> > +            ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
> >> > +                                            bitmap_len);
> >> > +            return;
> >> > +        }
> >> >      }
> >>
> >> Hi, Han. Not a full review, but I'm a bit concerned that this code doesn't
> >> support datapath groups. IIUC, if some flows can be aggregated with datapath
> >> groups, I-P will just add them separately. Later a full re-compute will
> >> delete all these flows and replace them with a single grouped one. Might
> >> cause some unnecessary churn in the cluster.
> >>
> >
> > Hi Ilya, thanks for the feedback. You are right that the current I-P doesn't
> > support datapath groups, because:
> >
> > 1. The current I-P only supports LSP (regular VIF) related changes, which
> >    are very unlikely to generate datapath groups. Even when that happens in
> >    rare cases, the logic is still correct and the churn should be small.
> > 2. There are two types of DP groups:
> >    a. DP groups that are directly generated when northd knows that a lflow
> >       should be applied to a group of DPs.
> >    b. DP groups that are passively generated by combining lflows generated
> >       for different DPs.
> >
> > Supporting I-P for the type-a DP groups is relatively straightforward: it
> > would be part of the I-P logic for the object that generates the DP group,
> > e.g. for LB related DP groups. Supporting I-P for the type-b DP groups is
> > more complex, but I think for most DP groups we can optimize northd to make
> > them type-a DP groups. For example, for ACLs we should know beforehand the
> > group of DPs a lflow should be applied to. This conversion/optimization can
> > happen incrementally, whenever we see that handling a type of change is a
> > bottleneck and decide to implement I-P for it. In the end I hope there will
> > be no cases that need I-P for a type-b DP group, which is why I don't worry
> > about supporting DP groups for now.
> >
> >> Is my understanding correct? And can it be a performance issue, e.g. if
> >> the database grows too much because of that? Or how hard is it to add
> >> datapath group support to I-P?
> >>
> > I wouldn't worry too much about DB growth for now, since only LSP (regular
> > VIF) related lflows skip DP groups, and I believe DP groups are not heavily
> > involved in those.
> >
> >> I suppose, DP-groups is one of the reasons you didn't implement deletions
> >> here, right?
> >
> > Maybe partly related, but not the major reason: I just didn't get enough
> > time for that part yet. For the reason mentioned above, I will skip DP
> > groups for VIF related lflow generation so that I can build a hash index
> > for deletion, and I don't see any obvious negative impact from not trying
> > to combine such lflows with DP groups. Please let me know if you think
> > differently.
>
> It's hard to tell for sure what kind of side effects we could see,
> because it's hard to sustain incremental-only processing for a long
> time with the current implementation. It falls back to re-compute too
> frequently to see a long-term impact.
>
> But I would not be that dismissive of type-b flows. For example, I
> pulled the southbound database from our recent ovn-heater density-light
> 500-node test. It doesn't have any load balancers involved.
>
> The database by the end of the test contains 556 K logical flows.
> Only 826 of these flows have datapath groups; as expected, not a lot.
> However, each of these flows is applied to a group of 500+ datapaths.
> If we unfold them, we get an additional 826 * 500 = 413 K flows.
> So, the number of logical flows would effectively be double what we
> have today, and the total database size would grow accordingly.
>
> Best regards, Ilya Maximets.

Let me clarify: the current implementation of this patch series only performs
I-P for VIF related changes. VIF related lflows are mostly distinct from each
other and should never be applied to a group of 500+ datapaths. In the test
you mentioned, the flows applied to a group of 500+ datapaths shouldn't be
related to any individual VIFs; with this patch, they are still always
recomputed and applied to DP groups instead of individual DPs. In my own
ovn-k8s simulation environment, I have confirmed that after adding and
binding a new VIF (which is incrementally processed) and then triggering a
recompute from northd, there is no change in the logical flows (no churn
observed).

Also, I was not being dismissive of type-b DP groups. I fully understand the
cost of "ungrouping" them and never intended to do so. I am just not worried
now, because the current patch series (and the immediate TODOs for deletions)
does not touch flows with type-b DP groups, so they will keep being
recomputed and work as-is for now.

To implement I-P for flows with type-b DP groups, I made the proposal above:
convert them to type-a DP groups first. By that I mean that when generating
flows that apply to a big group of DPs, northd must know which DPs the flows
should be applied to while constructing them. One example is the DP groups
for LBs that you optimized recently. Another is the flows for ACLs, which
apply to the DPs of the LSes that reference the ACLs either directly or
through port groups. When implementing I-P for these objects, we can
construct the DP group and apply the flows to the group directly (type-a),
instead of generating flows for each individual DP first and then collecting
them (type-b). The goal is not to convert every type-b group to type-a, but
when we implement I-P for a certain input, this is what we need to do for the
DP groups related to that input. (See the sketch in the P.S. below for the
difference between the two patterns.) I hope this addresses your concerns.

Finally, I don't want to suggest that DP groups will never be involved in VIF
related flows. In some uncommon scenarios, two VIFs on different LSes may
happen to have the same IP/MAC configured, so some of the generated flows may
look identical on different DPs and get combined into type-b groups. But
these are very rare cases: the DP groups generated by such coincidentally
identical flows should be small, and the effect of such small DP groups is
negligible (unlike the DP groups for flows common to all LSes/LRs, or the
ones for LBs/ACLs). So any churn caused by this type of flow should be very
small (if it exists at all), and once I implement I-P for deletion (which
will disable DP groups for such coincidental small groups) there will be no
churn at all, even in these rare cases.

Thanks,
Han
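
P.S. To make the type-a vs. type-b distinction above concrete, below is a
minimal, self-contained sketch of the two construction patterns. All names
(sketch_lflow, add_flow_type_a, add_flow_type_b) are made up for
illustration; this is not the actual northd code, which uses hmaps and
dynamically sized datapath bitmaps.

/* Sketch of the two DP-group construction patterns; names are made up. */
#include <stdio.h>
#include <string.h>

#define N_DATAPATHS 8

struct sketch_lflow {
    const char *match;                    /* Match string shared by the group. */
    unsigned char dp_bitmap[N_DATAPATHS]; /* 1 = flow applies to this DP. */
};

/* Type-a: the caller already knows the full set of datapaths, so the
 * flow is created once with the complete group. */
static void
add_flow_type_a(struct sketch_lflow *f, const char *match,
                const int *dps, int n_dps)
{
    memset(f, 0, sizeof *f);
    f->match = match;
    for (int i = 0; i < n_dps; i++) {
        f->dp_bitmap[dps[i]] = 1;
    }
}

/* Type-b: flows are generated per datapath, and identical flows are
 * coalesced afterwards by looking up an existing flow with the same
 * match (loosely mirroring what ovn_lflow_find() +
 * ovn_dp_group_add_with_reference() do in northd). */
static void
add_flow_type_b(struct sketch_lflow *f, int *n_flows,
                const char *match, int dp)
{
    for (int i = 0; i < *n_flows; i++) {
        if (!strcmp(f[i].match, match)) { /* Existing identical flow. */
            f[i].dp_bitmap[dp] = 1;       /* Just extend its group. */
            return;
        }
    }
    memset(&f[*n_flows], 0, sizeof f[*n_flows]);
    f[*n_flows].match = match;
    f[*n_flows].dp_bitmap[dp] = 1;
    (*n_flows)++;
}

int
main(void)
{
    /* Type-a: one flow, group known up front. */
    struct sketch_lflow a;
    int group[] = {0, 1, 2, 3};
    add_flow_type_a(&a, "ip4.dst == 10.0.0.0/8", group, 4);

    /* Type-b: same end result, but discovered by coalescing per-DP flows. */
    struct sketch_lflow b[N_DATAPATHS];
    int n = 0;
    for (int dp = 0; dp < 4; dp++) {
        add_flow_type_b(b, &n, "ip4.dst == 10.0.0.0/8", dp);
    }
    printf("type-a flows: 1, type-b flows after coalescing: %d\n", n);
    return 0;
}

The point of the sketch: in the type-a pattern the group membership is an
input known while constructing the flow, while in the type-b pattern it is
discovered afterwards by matching already-generated flows, which is what
makes incremental handling (especially deletion) harder.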
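
P.P.S. For completeness, the caller pattern intended for the temp_lflow_list
mechanism in the quoted diff, as described in the patch comment, is roughly
the following. Again a simplified, self-contained sketch: the flag and list
names match the patch, but the list here is a plain singly linked list and
sketch_lflow_add() is a made-up stand-in for do_ovn_lflow_add().

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-ins for the patch's globals (simplified: a singly linked list
 * instead of struct ovs_list). */
struct tmp_lflow {
    const char *match;
    struct tmp_lflow *next;
};

static bool add_lflow_to_temp_list = false;
static struct tmp_lflow *temp_lflow_list = NULL;

/* Stand-in for do_ovn_lflow_add(): when the flag is set, the generated
 * lflow is also recorded on the temp list so the caller can track it. */
static void
sketch_lflow_add(const char *match)
{
    if (add_lflow_to_temp_list) {
        struct tmp_lflow *f = malloc(sizeof *f);
        f->match = match;
        f->next = temp_lflow_list;
        temp_lflow_list = f;
    }
    /* ... normal insertion into the lflow table would happen here ... */
}

int
main(void)
{
    /* Caller pattern from the patch comment: initialize the list, set
     * the flag, generate flows, then reset the flag and consume the
     * collected flows (e.g. to index them for later deletion). */
    temp_lflow_list = NULL;
    add_lflow_to_temp_list = true;

    sketch_lflow_add("inport == \"lsp1\"");
    sketch_lflow_add("outport == \"lsp1\"");

    add_lflow_to_temp_list = false;

    for (struct tmp_lflow *f = temp_lflow_list; f;) {
        struct tmp_lflow *next = f->next;
        printf("collected lflow: %s\n", f->match);
        free(f);
        f = next;
    }
    return 0;
}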
