On Tue, 22 Aug, 2023, 6:11 am Han Zhou, <hz...@ovn.org> wrote:

> On Fri, Aug 18, 2023 at 4:35 AM Numan Siddique <num...@ovn.org> wrote:
> >
> > On Fri, Aug 18, 2023 at 2:33 PM Numan Siddique <num...@ovn.org> wrote:
> > >
> > > On Fri, Aug 18, 2023 at 11:38 AM Han Zhou <hz...@ovn.org> wrote:
> > > >
> > > > On Wed, Aug 9, 2023 at 9:38 AM <num...@ovn.org> wrote:
> > > > >
> > > > > From: Numan Siddique <num...@ovn.org>
> > > > >
> > > > Thanks Numan for the enhancement.
> > > >
> > > > > Instead of maintaining the lport to lflow references using
> > > > > 'struct lflow_ref_list', this patch makes use of the existing
> > > > > objdep APIs.  Since objdep_mgr is not thread safe, an instance
> > > > > of objdep_mgr is maintained for each 'ovn_port'.
> > > > >
> > > > > Once we add thread-safe support to the objdep APIs, we can move
> > > > > the objdep_mgr to the 'lflow' engine data.
> > > > >
> > > > > Using objdep APIs does come with a cost in memory.  We now
> > > > > maintain an additional hmap of ovn_lflow's to look up using
> > > > > uuid.  But this will be useful to handle datapath and load
> > > > > balancer changes in the lflow engine node.
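To make the cost described above concrete, here is a minimal standalone sketch of the uuid-keyed lookup pattern the extra hmap enables. This is not the real ovn-northd code: 'struct uuid', 'lflow_node', and the table helpers are simplified stand-ins for OVS's 'struct uuid' and 'struct hmap'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for OVS's 'struct uuid'. */
struct uuid {
    uint32_t parts[4];
};

/* One logical flow, chained into a fixed bucket table by uuid hash. */
struct lflow_node {
    struct uuid uuid;
    const char *match;          /* lflow match, for illustration only. */
    struct lflow_node *next;    /* Bucket chain. */
};

#define N_BUCKETS 64

static uint32_t
uuid_hash(const struct uuid *u)
{
    /* The real code uses OVS hash helpers; a simple xor mix suffices
     * for this sketch. */
    return u->parts[0] ^ u->parts[1] ^ u->parts[2] ^ u->parts[3];
}

static void
lflow_table_insert(struct lflow_node *table[N_BUCKETS],
                   struct lflow_node *node)
{
    uint32_t b = uuid_hash(&node->uuid) % N_BUCKETS;
    node->next = table[b];
    table[b] = node;
}

/* Look up an lflow by uuid: hash, pick the bucket, walk the chain. */
static struct lflow_node *
lflow_table_find(struct lflow_node *table[N_BUCKETS], const struct uuid *u)
{
    uint32_t b = uuid_hash(u) % N_BUCKETS;
    for (struct lflow_node *n = table[b]; n; n = n->next) {
        if (!memcmp(&n->uuid, u, sizeof *u)) {
            return n;
        }
    }
    return NULL;
}
```

The memory cost the commit message mentions is exactly this second index: every lflow is reachable both through its primary table and through this uuid-keyed one.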
> > > > >
> > > >
> > > > I understand that the 'struct lflow_ref_list' may not satisfy the
> > > > needs of more complex I-P, but I wonder if objdep_mgr (without
> > > > modification) is the solution. It seems to me more complex, at
> > > > least for this patch, without adding obvious benefits. Since I am
> > > > still reviewing, maybe I missed some points which would show the
> > > > value from the rest of the patches, and please forgive me for my
> > > > slow review.
> > >
> > > Sure.   No worries.
> > >
> > > Can you please continue reviewing from the latest v6?
> > >
> > > http://patchwork.ozlabs.org/project/ovn/list/?series=369384
> > >
> > > I think we can discuss more once you go through all the patches in the
> > > series (or until patch 14).
> > > I think that would give a better idea of how it is used and we can
> > > discuss if it's worth using obj dep mgr.
> > >
> > > Also I think the patches P1 - P8 make one set and can be considered
> > > first.  I submitted all these patches as one series mainly because
> > >    -  all the patches can be applied easily
> > >    -  the soft freeze was near.
> > >
> > > The patches can be found here too -
> > > https://github.com/numansiddique/ovn/tree/northd_ip_lb_ip_v6
> > >
> > > Thanks
> > > Numan
> > >
> > > >
> > > > While I am still reviewing the patches, I did a performance test
> > > > with https://github.com/hzhou8/ovn-test-script for the 200 chassis
> > > > x 50 lsp scale of simulated ovn-k8s topology.
> > > >
> > > > - Before this patch, a recompute takes 1293ms
> > > > - After this patch, a recompute takes 1536ms, which is 20% more time.
> > > > - I also did the same test with all the rest of the patches of
> > > > this series, which takes 1690ms.
> > > >
> >
> > I did some testing using the OVN dbs of the ovn-heater density-heavy
> > run.  A triggered forced recompute with the present main of ovn-northd
> > takes around 4389ms, whereas with patch 9 of this series it takes
> > around 4838ms.
> >
> > I hacked the 'engine_compute_log_timeout_msec' [1] value from 500ms to
> > 50ms so that inc-eng logs the engine nodes taking more than 50ms in
> > the recompute.
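The hack described above amounts to lowering a logging threshold. A minimal sketch of the check, with illustrative names only (the real variable lives in northd's incremental processing engine and the real code logs via VLOG):

```c
#include <stdbool.h>

/* Illustrative stand-in for the engine's logging threshold.  In the
 * real code this is a constant that was hand-edited from 500 to 50 for
 * the test; here it is a variable so the effect is easy to see. */
static unsigned int engine_compute_log_timeout_msec = 500;

/* Returns true when a node's recompute time is long enough that the
 * engine should log it. */
static bool
engine_should_log_recompute(unsigned long long elapsed_msec)
{
    return elapsed_msec >= engine_compute_log_timeout_msec;
}
```

With the default 500ms threshold, the 325ms lb_data recompute below would never have shown up in the log; lowering it to 50ms is what makes the per-node breakdown visible.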
> >
> > With main
> > -------
> > 2023-08-18T11:28:20.714Z|00117|inc_proc_eng|INFO|User triggered force recompute.
> > 2023-08-18T11:28:23.189Z|00118|inc_proc_eng|INFO|node: northd,
> > recompute (forced) took 2475ms
> > 2023-08-18T11:28:25.102Z|00119|inc_proc_eng|INFO|node: lflow,
> > recompute (forced) took 1887ms
> > 2023-08-18T11:28:25.102Z|00120|timeval|WARN|Unreasonably long 4389ms
> > poll interval (4373ms user, 3ms system)
> >
> >
> > And with patch 9
> > ---------------------
> > 2023-08-18T11:28:10.558Z|00220|inc_proc_eng|INFO|User triggered force recompute.
> > 2023-08-18T11:28:10.883Z|00221|inc_proc_eng|INFO|node: lb_data,
> > recompute (forced) took 325ms
> > 2023-08-18T11:28:13.163Z|00222|inc_proc_eng|INFO|node: northd,
> > recompute (forced) took 2280ms
> > 2023-08-18T11:28:13.431Z|00223|inc_proc_eng|INFO|node: sync_to_sb_lb,
> > recompute (forced) took 241ms
> > 2023-08-18T11:28:15.396Z|00224|inc_proc_eng|INFO|node: lflow,
> > recompute (forced) took 1956ms
> > 2023-08-18T11:28:15.397Z|00225|timeval|WARN|Unreasonably long 4838ms
> > poll interval (4819ms user, 3ms system)
> >
> >
> > As you can see, the difference in the lflow recompute time is not
> > very significant (1887ms vs 1956ms).
> >
> > It is the lb_data engine node which takes 325ms more.  But if you
> > look at the lb_data engine code, the lb_data handlers never return
> > false, so the cost of a full lb_data recompute can be ignored: it is
> > only incurred when the engine aborts or when the user triggers a
> > recompute.
> >
> > I don't think the lflow engine recompute cost due to objdep is
> > significant as long as lflow recomputes are rare.
> >
>
> My point is that the ~20% performance degradation is caused by just
> patch 9, because I did the same test on patch 8 vs. patch 9 (and patch 8
> is similar to main in terms of the recompute time, although I did see a
> big percentage increase in the time spent for VIF I-P, which I will
> check separately later).
> I did the same test for v6 (on top of the latest main) between patch 8
> and patch 9. The percentage of the gap is similar:
>
> Before patch 9: 1282ms
> After patch 9:  1495ms
>
> The command I ran was simply creating an empty LR: ovn-nbctl --wait=hv
> --print-wait-time lr-add lr-test
> Also note that the test uses a single thread for ovn-northd.
>
> The lb_data engine node cost shouldn't be a factor because there is no
> change to that part in patch 9, right? Maybe it is not objdep that is
> causing the performance drop, because there are lots of changes in this
> patch, but it is hard to tell without profiling/perf.
>
> I didn't spend more time debugging this.  Below is just the result of
> stopwatch for reference.
> # ovn-appctl -t ovn-northd stopwatch/reset
> # ovn-appctl -t ovn-northd inc-engine/recompute
> # ovn-appctl -t ovn-northd stopwatch/show
> Left side is with patch 9, and the right side is without patch 9:
> ---
> Statistics for 'lflows_to_sb'          | Statistics for 'lflows_to_sb'
>   Total samples: 1                     |   Total samples: 1
>   Maximum: 214 msec                    |   Maximum: 197 msec
>   Minimum: 214 msec                    |   Minimum: 197 msec
>   95th percentile: 0.000000 msec       |   95th percentile: 0.000000 msec
>   Short term average: 214.000000 msec  |   Short term average: 197.000000 msec
>   Long term average: 214.000000 msec   |   Long term average: 197.000000 msec
> Statistics for 'ovnnb_db_run'          | Statistics for 'ovnnb_db_run'
>   Total samples: 1                     |   Total samples: 1
>   Maximum: 355 msec                    |   Maximum: 320 msec
>   Minimum: 355 msec                    |   Minimum: 320 msec
>   95th percentile: 0.000000 msec       |   95th percentile: 0.000000 msec
>   Short term average: 355.000000 msec  |   Short term average: 320.000000 msec
>   Long term average: 355.000000 msec   |   Long term average: 320.000000 msec
> Statistics for 'build_flows_ctx'       | Statistics for 'build_flows_ctx'
>   Total samples: 1                     |   Total samples: 1
>   Maximum: 346 msec                    |   Maximum: 311 msec
>   Minimum: 346 msec                    |   Minimum: 311 msec
>   95th percentile: 0.000000 msec       |   95th percentile: 0.000000 msec
>   Short term average: 346.000000 msec  |   Short term average: 311.000000 msec
>   Long term average: 346.000000 msec   |   Long term average: 311.000000 msec
> Statistics for 'ovn-northd-loop'       | Statistics for 'ovn-northd-loop'
>   Total samples: 4                     |   Total samples: 4
>   Maximum: 1486 msec                   |   Maximum: 1297 msec
>   Minimum: 0 msec                      |   Minimum: 0 msec
>   95th percentile: 0.000000 msec       |   95th percentile: 0.000000 msec
>   Short term average: 743.000000 msec  |   Short term average: 648.750000 msec
>   Long term average: 14.860000 msec    |   Long term average: 14.910598 msec
> Statistics for 'build_lflows'          | Statistics for 'build_lflows'
>   Total samples: 1                     |   Total samples: 1
>   Maximum: 642 msec                    |   Maximum: 588 msec
>   Minimum: 642 msec                    |   Minimum: 588 msec
>   95th percentile: 0.000000 msec       |   95th percentile: 0.000000 msec
>   Short term average: 642.000000 msec  |   Short term average: 588.000000 msec
>   Long term average: 642.000000 msec   |   Long term average: 588.000000 msec
> Statistics for 'lflows_ports'          | Statistics for 'lflows_ports'
>   Total samples: 1                     |   Total samples: 1
>   Maximum: 315 msec                    |   Maximum: 231 msec
>   Minimum: 315 msec                    |   Minimum: 231 msec
>   95th percentile: 0.000000 msec       |   95th percentile: 0.000000 msec
>   Short term average: 315.000000 msec  |   Short term average: 231.000000 msec
>   Long term average: 315.000000 msec   |   Long term average: 231.000000 msec
>


Hi Han,

I'm presently on PTO.  I'll take a look and respond once I'm back.

Meanwhile, I kindly request some reviews for the first 8 or 9 patches,
which are independent of obj dep mgr.

Thanks
Numan


> Thanks,
> Han
>
> > Thanks
> > Numan
> >
> > > > So the major increase of the recompute time is from this patch.
> > > > I didn't debug the exact cause of this increase, but I think it
> > > > might be related to the objdep usage.
> > > >
> > > > > This patch makes a few more significant changes:
> > > > >   -  Logical flows are now hashed without the logical
> > > > >      datapaths.  If a logical flow is referenced by just one
> > > > >      datapath, we don't rehash it.
> > > > >
> > > > >   -  The synthetic 'hash' column of sbrec_logical_flow now
> > > > >      doesn't use the logical datapath too.  This means that
> > > > >      when ovn-northd is updated/upgraded and has this commit,
> > > > >      all the logical flows with 'logical_datapath' column
> > > > >      set will get deleted and re-added causing some disruptions.
> > > > >
> > > > >   -  With the commit [1] which added I-P support for logical
> > > > >      port changes, multiple logical flows with same match 'M'
> > > > >      and actions 'A' are generated and stored without the
> > > > >      dp groups, which was not the case prior to
> > > > >      that patch.
> > > > >      One example to generate these lflows is:
> > > > >              ovn-nbctl lsp-set-addresses sw0p1 "MAC1 IP1"
> > > > >              ovn-nbctl lsp-set-addresses sw1p1 "MAC1 IP1"
> > > > >              ovn-nbctl lsp-set-addresses sw2p1 "MAC1 IP1"
> > > > >
> > > > >      Now with this patch we go back to the earlier way, i.e.
> > > > >      one logical flow with logical_dp_groups set.
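The hashing change in the first bullet above is what makes this sharing possible: if the hash covers only the pipeline fields and not the datapath, identical flows on different datapaths collapse into one row with a datapath group. A sketch under stated assumptions, not the actual northd code; FNV-1a stands in for OVS's hash helpers, and the field set is illustrative:

```c
#include <stdint.h>

/* FNV-1a over a string, folded into a running hash 'h'. */
static uint32_t
fnv1a(uint32_t h, const char *s)
{
    while (*s) {
        h = (h ^ (uint8_t) *s++) * 16777619u;
    }
    return h;
}

/* Hash a logical flow by table, priority, match, and actions only.
 * The datapath is deliberately NOT an input, so the same flow on any
 * number of datapaths produces the same hash and can share one SB row
 * via a datapath group. */
static uint32_t
lflow_hash(uint8_t table_id, uint16_t priority,
           const char *match, const char *actions)
{
    uint32_t h = 2166136261u;   /* FNV offset basis. */
    h = (h ^ table_id) * 16777619u;
    h = (h ^ priority) * 16777619u;
    h = fnv1a(h, match);
    return fnv1a(h, actions);
}
```

This also explains the upgrade disruption noted in the second bullet: once the synthetic 'hash' column stops covering the logical datapath, every existing flow's hash changes, so old rows are deleted and re-added.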
> > > > >
> > > > Thanks for taking the effort to achieve this. The approach is
> > > > indeed very smart. I was aware of such scenarios, but it just
> > > > didn't seem to impact scalability because the number of such
> > > > lflows should be small, so it wasn't considered a priority.
> > > >
> > > > >   -  With this patch any updates to a logical port which
> > > > >      doesn't result in new logical flows will not result in
> > > > >      deletion and addition of same logical flows.
> > > > >      Eg.
> > > > >      ovn-nbctl set logical_switch_port sw0p1 external_ids:foo=bar
> > > > >      will be a no-op to the SB logical flow table.
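A rough sketch of how such a no-op can be detected: regenerate the port's flows after the update and compare them with the set currently tracked for that port, skipping the SB delete/re-add when they match. 'port_lflows_equal' and the canonical flow keys are hypothetical names for illustration, not names from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the set of flows generated for one port, each encoded
 * as a canonical "table|priority|match|actions" key, kept sorted. */
struct port_lflows {
    const char **flows;
    size_t n;
};

/* Returns true when the regenerated flow set is identical to the old
 * one, i.e. the port update (such as an external_ids change) requires
 * no SB logical flow churn. */
static bool
port_lflows_equal(const struct port_lflows *a, const struct port_lflows *b)
{
    if (a->n != b->n) {
        return false;
    }
    for (size_t i = 0; i < a->n; i++) {
        if (strcmp(a->flows[i], b->flows[i])) {
            return false;
        }
    }
    return true;
}
```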
> > > > >
> > > > Similar to the above case, this doesn't seem to be critical for
> > > > scalability, but it is good to see the enhancement.
> > > > I will post more feedback after finishing the full review.
> > > >
> > > > Thanks,
> > > > Han
> > > >
> > > > > [1] - 8bbd67891f68 ("northd: Incremental processing of VIF
> > > > > additions in 'lflow' node.")
> > > > >
> > > > > Suggested-by: Dumitru Ceara <dce...@redhat.com>
> > > > > Signed-off-by: Numan Siddique <num...@ovn.org>
> > > > _______________________________________________
> > > > dev mailing list
> > > > d...@openvswitch.org
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
