On 5/31/23 01:22, Han Zhou wrote:
> 
> 
> On Tue, May 30, 2023 at 4:04 AM Ilya Maximets <[email protected]> wrote:
>>
>> On 5/29/23 20:42, Han Zhou wrote:
>> >
>> >
>> > On Mon, May 29, 2023 at 9:24 AM Ilya Maximets <[email protected]> wrote:
>> >>
>> >> On 5/13/23 02:03, Han Zhou wrote:
>> >> > This patch introduces a change handler for 'northd' input within the
>> >> > 'lflow' node. It specifically handles cases when VIFs are created, which
>> >> > is an easier start for lflow incremental processing. Support for
>> >> > update/delete will be added later.
>> >> >
>> >> > Below are the performance test results simulating an ovn-k8s topology 
>> >> > of 500
>> >> > nodes x 50 lsp per node:
>> >> >
>> >> > Before:
>> >> > ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 
>> >> > -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- 
>> >> > lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>> >> >     ovn-northd completion:          773ms
>> >> >
>> >> > After:
>> >> > ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 
>> >> > -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- 
>> >> > lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>> >> >     ovn-northd completion:          30ms
>> >> >
>> >> > This is more than a 95% reduction (or 20x faster).
>> >> >
>> >> > Signed-off-by: Han Zhou <[email protected]>
>> >> > ---
>> >> >  northd/en-lflow.c        |  82 ++++++++-----
>> >> >  northd/en-lflow.h        |   1 +
>> >> >  northd/inc-proc-northd.c |   2 +-
>> >> >  northd/northd.c          | 245 +++++++++++++++++++++++++++++++++------
>> >> >  northd/northd.h          |   6 +-
>> >> >  tests/ovn-northd.at      |   4 +-
>> >> >  6 files changed, 277 insertions(+), 63 deletions(-)
>> >> >
>> >>
>> >> <snip>
>> >>
>> >> > @@ -5685,7 +5687,18 @@ ovn_dp_group_add_with_reference(struct ovn_lflow 
>> >> > *lflow_ref,
>> >> >
>> >> >  /* Adds a row with the specified contents to the Logical_Flow table.
>> >> >   * Version to use when hash bucket locking is NOT required.
>> >> > + *
>> >> > + * Note: This function can add generated lflows to the global variable
>> >> > + * temp_lflow_list as its output, controlled by the global variable
>> >> > + * add_lflow_to_temp_list. The caller of the ovn_lflow_add_... macros 
>> >> > can get
>> >> > + * a list of lflows generated by setting add_lflow_to_temp_list to 
>> >> > true. The
>> >> > + * caller is responsible for initializing the temp_lflow_list, and for
>> >> > + * resetting add_lflow_to_temp_list to false when it is no longer 
>> >> > needed.
>> >> > + * XXX: this mechanism is temporary and will be replaced when we add 
>> >> > hash index
>> >> > + * to lflow_data and refactor related functions.
>> >> >   */
>> >> > +static bool add_lflow_to_temp_list = false;
>> >> > +static struct ovs_list temp_lflow_list;
>> >> >  static void
>> >> >  do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
>> >> >                   const unsigned long *dp_bitmap, size_t dp_bitmap_len,
>> >> > @@ -5702,12 +5715,17 @@ do_ovn_lflow_add(struct hmap *lflow_map, const 
>> >> > struct ovn_datapath *od,
>> >> >      size_t bitmap_len = od ? ods_size(od->datapaths) : dp_bitmap_len;
>> >> >      ovs_assert(bitmap_len);
>> >> >
>> >> > -    old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
>> >> > -                               actions, ctrl_meter, hash);
>> >> > -    if (old_lflow) {
>> >> > -        ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
>> >> > -                                        bitmap_len);
>> >> > -        return;
>> >> > +    if (add_lflow_to_temp_list) {
>> >> > +        ovs_assert(od);
>> >> > +        ovs_assert(!dp_bitmap);
>> >> > +    } else {
>> >> > +        old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, 
>> >> > match,
>> >> > +                                   actions, ctrl_meter, hash);
>> >> > +        if (old_lflow) {
>> >> > +            ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
>> >> > +                                            bitmap_len);
>> >> > +            return;
>> >> > +        }
>> >> >      }
>> >>
>> >> Hi, Han.  Not a full review, but I'm a bit concerned that this code 
>> >> doesn't
>> >> support datapath groups.  IIUC, if some flows can be aggregated with 
>> >> datapath
>> >> groups, I-P will just add them separately.  Later a full re-compute will
>> >> delete all these flows and replace them with a single grouped one.  Might
>> >> cause some unnecessary churn in the cluster.
>> >>
>> >
>> > Hi Ilya, thanks for the feedback. You are right that the current I-P 
>> > doesn't support datapath groups, because:
>> > 1. The current I-P only supports LSP (regular VIFs) related changes, which 
>> > is very unlikely to generate datapath groups. Even when that happens in 
>> > rare cases, the logic is still correct and the churn should be small.
>> > 2. There are two types of DP groups:
>> >     a. DP groups that are directly generated when northd has the knowledge 
>> > that a lflow should be applied to a group of DPs.
>> >     b. DP groups that are passively generated by combining lflows 
>> > generated for different DPs.
>> > Supporting I-P for the type-a DP groups is relatively straightforward, 
>> > which would be part of the I-P logic for the object that generates the DP 
>> > group, e.g. for LB related DP groups.
>> > Supporting I-P for the type-b DP groups is more complex, but I think for 
>> > most DP groups we can optimize northd to make them type-a DP groups, for 
>> > example, for ACLs we should know beforehand the group of DPs the lflow 
>> > should be applied. This conversion/optimization is an incremental process: 
>> > when we see that handling a certain type of change is a bottleneck, we can 
>> > decide to implement I-P for it.
>> > In the end I hope there will be no cases that need I-P for a type-b DP 
>> > group, which is why I don't worry about supporting DP groups for now.
>> >
>> >> Is my understanding correct?  And can it be a performance issue? E.g. if 
>> >> the
>> >> database grows too much because of that.  Or how hard would it be to add 
>> >> datapath group support to I-P?
>> >>
>> > I'd not worry too much about DB growth for now, since only LSP 
>> > (regular VIF) related lflows skip DP grouping, and I believe DP groups 
>> > are not heavily used for such flows.
>> >
>> >> I suppose DP groups are one of the reasons you didn't implement deletions
>> >> here, right?
>> >
>> > Maybe partly related, but not the major reason. I just didn't get enough 
>> > time for that part yet. For the reason mentioned above, I will skip the DP 
>> > group for VIF related lflow generation so that I can build a hash index for 
>> > deletion, and I don't see any obvious negative impact for not trying to 
>> > combine such lflows with DP groups. Please let me know if you think 
>> > differently.
>>
>> It's hard to tell for sure what kind of side effects we could see,
>> because it's hard to sustain incremental-only processing for a long
>> time with current implementation.  It falls back to re-compute too
>> frequently to see a long-term impact.
>>
>> But I would not be that dismissive of type-b flows.  For example, I
>> pulled the southbound database from our recent ovn-heater density-light
>> 500-node test.  It doesn't have any load balancers involved.
>>
>> The database by the end of a test contains 556 K logical flows.
>> Only 826 of these flows have datapath groups, as expected, not a lot.
>> However, each of these flows is applied to a group of 500+ datapaths.
>> If we unfold them, we'll get additional 826 * 500 = 413 K flows.
>> So, the number of logical flows will be effectively double what we
>> have today.  The total database size will also grow accordingly.
>>
>> Best regards, Ilya Maximets.
> 
> Maybe I should state more clearly that the current implementation of this 
> patch series only performs I-P for VIF related changes. The VIF related 
> lflows are mostly different, and shall never be applied to a group of 500+ 
> datapaths.

I see.  Indeed, most of the flows in the case above are related to
router ports and don't really match on any port-specific information.
I missed that while looking through the database.

> In the test you mentioned, the flows that apply to a group of 500+ 
> datapaths shouldn't be related to any individual VIFs. With this patch, they 
> are still always recomputed and applied to DP groups instead of individual 
> DPs.
> In my own ovn-k8s-simulation test environment, I have confirmed that after 
> adding and binding a new VIF (which is incrementally processed), and then 
> triggering a recompute from northd, there is no change in the logical flows 
> (no churns observed).
> 
> Also, I was not dismissive of type-b DP groups.

Sorry, it was a poor choice of words from my side.

> I totally understand the cost of "ungrouping" them and never 
> meant to do so. I am just not worried now because the current patch series 
> (and immediate TODOs for deletions) would not touch the flows of type-b DP 
> groups, so they will keep getting recomputed and work as is for now.
> 
> To implement I-P for the flows of type-b DP groups, I also had the proposal 
> above: to convert them to type-a DP groups first. By that I mean, when 
> generating flows that apply to a big group of DPs, northd must know, while 
> constructing the flows, which DPs they should be applied to. 
> An example is the DP groups for LBs that you have optimized recently. Another 
> example is the flows for ACLs that apply to DPs either for the LSes that 
> reference the ACLs directly or through port-groups. When implementing I-P for 
> these objects, we can construct the DP groups and apply the flows to the 
> group (type-a) directly instead of generating flows for each individual DP 
> first and then collecting them (type-b). The goal is not to convert every 
> type-b group to type-a, but when we are trying to implement I-P for a certain 
> input this is what we need to do for the DP groups related to that input.

Ack.

> 
> I hope this addresses your concerns.
> 
> Finally, I don't want to mislead that there will never be DP groups 
> involved in VIF related flows. In some uncommon scenarios, two VIFs on 
> different LSes may happen to have the same IP/MAC configured, so some of 
> the generated flows may look the same on the different DPs (and thus get 
> grouped as type-b groups). But these are very rare cases, the DP groups 
> generated by such coincidentally identical flows should be small, and the 
> effect of such small-sized DP groups is unimportant (unlike the DP groups 
> for common flows for all LSes/LRs or the ones for LBs/ACLs). So any churn 
> caused by this type of flows should be very small (if it exists at all), 
> and once I implement the I-P for deletion (which will disable DP groups 
> for such coincidental small groups) there will be no churn at all even for 
> these rare cases.


Thanks for the explanation.  I see that datapath groups are not really
a problem for this particular patch set.  We just need to be careful
while adding new I-P nodes for other types of resources.

Best regards, Ilya Maximets.

> 
> Thanks,
> Han

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
