Summary of findings.
1. The numbers from the perf test do not align with ovn-heater, which is much
closer to a realistic load. On some tests where ovn-heater gives a 5-10%
end-to-end improvement with parallelization, we get worse results with the
perf test. You spotted this one correctly.
Example of the northd averages pulled out of the test report via grep and sed.
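A rough sketch of the kind of pipeline meant here (the report file name and
the exact label are assumptions, not the literal command used):

  grep 'Average (northd-loop in msec)' perf-report.log | sed -e 's/.*: *//'

The extracted values: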
127.489353
131.509458
116.088205
94.721911
119.629756
114.896258
124.811069
129.679160
106.699905
134.490338
112.106713
135.957658
132.471111
94.106849
117.431450
115.861592
106.830657
132.396905
107.092542
128.945760
94.298464
120.455510
136.910426
134.311765
115.881292
116.918458
These values are all over the place - this is not a reproducible test.
2. In its present state you need to re-run the test 30+ times and take an
average. The standard deviation of the northd-loop values is > 10%. Compared
to that, the reproducibility of ovn-heater is significantly better: I usually
get less than 0.5% difference between runs if there were no iteration
failures. I would suggest using that instead for performance comparisons
until we have figured out what affects the perf test.
3. The reports use the short-term running average, which is probably wrong
because it is very significantly skewed by the last several values.
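To illustrate the skew, a minimal sketch contrasting a short-term running
average (the 0.9/0.1 weights are made up for illustration, not what the test
suite actually uses) with the long-term cumulative mean over the same series:

  # short-term EWMA vs. long-term cumulative average (illustrative only)
  awk '{ n++; lt += ($1 - lt) / n;
         st = (n == 1) ? $1 : 0.9 * st + 0.1 * $1 }
       END { printf "short-term=%.2f long-term=%.2f\n", st, lt }' samples.txt

Feed it a series that ends with a few large outliers and the short-term value
lands far from the long-term one.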
I will look into all of these.
Brgds,
On 30/09/2021 08:26, Han Zhou wrote:
On Thu, Sep 30, 2021 at 12:08 AM Anton Ivanov <[email protected]> wrote:
After quickly adding some more prints into the testsuite.
Test 1:
Without
1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1130
Average (NB in msec): 620.375000
Maximum (SB in msec): 23
Average (SB in msec): 21.468759
Maximum (northd-loop in msec): 6002
Minimum (northd-loop in msec): 0
Average (northd-loop in msec): 914.760417
Long term average (northd-loop in msec): 104.799340
With
1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1148
Average (NB in msec): 630.250000
Maximum (SB in msec): 24
Average (SB in msec): 21.468744
Maximum (northd-loop in msec): 6090
Minimum (northd-loop in msec): 0
Average (northd-loop in msec): 762.101565
Long term average (northd-loop in msec): 80.735192
The metric which actually matters, and which SHOULD be measured - the
long-term average - is better by 20%. Using the short-term average instead of
the long-term one in the test suite is actually a BUG.
Good catch!
Are you running yours under some sort of virtualization?
No, I am testing on bare metal.
A.
On 30/09/2021 07:52, Han Zhou wrote:
Thanks Anton for checking. I am using: Intel(R) Core(TM) i9-7920X CPU @
2.90GHz, 24 cores.
It is strange that my result is so different. I also verified with a scale
test script that creates a large-scale NB/SB with 800 nodes of a simulated
k8s setup, and then just ran:
ovn-nbctl --print-wait-time --wait=sb sync
Without parallel:
ovn-northd completion: 7807ms
With parallel:
ovn-northd completion: 41267ms
I suspected the hmap size problem, but changing the initial size to 64k
buckets didn't help. I will find some time to check the "perf" reports.
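For reference, a sketch of how one might grab such a profile while the test
is running (assumes perf(1) is available and ovn-northd is built with
symbols; the 30s window is arbitrary):

  # sample the running northd with call graphs for ~30 seconds
  perf record -g -p $(pidof ovn-northd) -- sleep 30
  # then summarise the hottest code paths
  perf report --stdio | head -n 40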
Thanks,
Han
On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov <[email protected]> wrote:
On 30/09/2021 07:16, Anton Ivanov wrote:
Results on a Ryzen 5 3600 - 6 cores 12 threads
I will also have a look into the "maximum" measurement for multi-thread.
It does not tie up with the drop in average across the board.
A.
Without
1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1256
Average (NB in msec): 679.463785
Maximum (SB in msec): 25
Average (SB in msec): 22.489798
Maximum (northd-loop in msec): 1347
Average (northd-loop in msec): 799.944878
2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1956
Average (NB in msec): 809.387285
Maximum (SB in msec): 24
Average (SB in msec): 21.649258
Maximum (northd-loop in msec): 2011
Average (northd-loop in msec): 961.718686
5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 557
Average (NB in msec): 474.010337
Maximum (SB in msec): 15
Average (SB in msec): 13.927192
Maximum (northd-loop in msec): 1261
Average (northd-loop in msec): 580.999122
6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 756
Average (NB in msec): 625.614724
Maximum (SB in msec): 15
Average (SB in msec): 14.181048
Maximum (northd-loop in msec): 1649
Average (northd-loop in msec): 746.208332
With
1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1140
Average (NB in msec): 631.125000
Maximum (SB in msec): 24
Average (SB in msec): 21.453609
Maximum (northd-loop in msec): 6080
Average (northd-loop in msec): 759.718815
2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1210
Average (NB in msec): 673.000000
Maximum (SB in msec): 27
Average (SB in msec): 22.453125
Maximum (northd-loop in msec): 6514
Average (northd-loop in msec): 808.596842
5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 798
Average (NB in msec): 429.750000
Maximum (SB in msec): 15
Average (SB in msec): 12.998533
Maximum (northd-loop in msec): 3835
Average (northd-loop in msec): 564.875986
6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1074
Average (NB in msec): 593.875000
Maximum (SB in msec): 14
Average (SB in msec): 13.655273
Maximum (northd-loop in msec): 4973
Average (northd-loop in msec): 771.102605
The only one slower is test 6, which I will look into.
The rest are > 5% faster.
A.
On 30/09/2021 00:56, Han Zhou wrote:
On Wed, Sep 15, 2021 at 5:45 AM <[email protected]> wrote:
>
> From: Anton Ivanov <[email protected]>
>
> Restore parallel build with dp groups using rwlock instead
> of per row locking as an underlying mechanism.
>
> This provides improvement ~ 10% end-to-end on ovn-heater
> under virtualization despite awakening some qemu gremlin
> which makes qemu climb to silly CPU usage. The gain on
> bare metal is likely to be higher.
>
Hi Anton,
I am trying to see the benefit of parallel_build, but encountered unexpected
performance results when running the perf tests with the command:

make check-perf TESTSUITEFLAGS="--rebuild"

It shows significantly worse performance than without parallel_build. For the
dp_group = no cases it is better, but still ~30% slower than without
parallel_build. I have 24 cores, but no thread is consuming much CPU except
the main thread. I also tried hardcoding the number of threads to just 4,
which ended up with slightly better results, but still far behind "without
parallel_build".
(columns: no parallel | parallel, 24 pool threads | parallel, 4 pool threads)

1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor
---
Maximum (NB in msec):                 1058 |         4269 |         4097
Average (NB in msec):           836.941167 |  3697.253931 |  3498.311525
Maximum (SB in msec):                   30 |           30 |           28
Average (SB in msec):            25.934011 |    26.001840 |    25.685091
Maximum (northd-loop in msec):        1204 |         4379 |         4251
Average (northd-loop in msec): 1005.330078 |  4233.871504 |  4022.774208

2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor
---
Maximum (NB in msec):                 1124 |         1480 |         1331
Average (NB in msec):           892.403405 |  1206.189287 |  1089.378455
Maximum (SB in msec):                   29 |           31 |           30
Average (SB in msec):            26.922632 |    26.636706 |    25.657484
Maximum (northd-loop in msec):        1275 |         1639 |         1495
Average (northd-loop in msec): 1074.917873 |  1458.152327 |  1301.057201

5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor
---
Maximum (NB in msec):                  768 |         3086 |         2876
Average (NB in msec):           614.491938 |  2681.688365 |  2531.255444
Maximum (SB in msec):                   18 |           17 |           18
Average (SB in msec):            16.347526 |    15.955263 |    16.278075
Maximum (northd-loop in msec):         889 |         3247 |         3031
Average (northd-loop in msec):  772.083572 |  3117.504297 |  2833.182361

6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor
---
Maximum (NB in msec):                 1046 |         1371 |         1262
Average (NB in msec):           827.735852 |  1135.514228 |   970.544792
Maximum (SB in msec):                   19 |           18 |           19
Average (SB in msec):            16.828127 |    16.083914 |    15.602525
Maximum (northd-loop in msec):        1163 |         1545 |         1411
Average (northd-loop in msec):  972.567407 |  1328.617583 |  1207.667100
I didn't debug it yet, but do you have any clue what the reason could be?
I am using the upstream commit 9242f27f63, which already includes this patch.
Below is my change to the perf-northd.at file just to enable parallel_build:

diff --git a/tests/perf-northd.at b/tests/perf-northd.at
index 74b69e9d4..9328c2e21 100644
--- a/tests/perf-northd.at
+++ b/tests/perf-northd.at
@@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hype
 PERF_RECORD_START()
 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true
 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))

@@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hyper
 PERF_RECORD_START()
 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true
 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
Thanks,
Han
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev