On 30/09/2021 20:48, Han Zhou wrote:
On Thu, Sep 30, 2021 at 7:34 AM Anton Ivanov
<[email protected]> wrote:
Summary of findings:
1. The numbers from the perf test do not align with ovn-heater, which is
much closer to a realistic load. On some tests where heater gives a
5-10% end-to-end improvement with parallelization, we get worse
results with the perf test. You spotted this one correctly.
Example of the per-run northd averages pulled out of the test report via
grep and sed:
127.489353
131.509458
116.088205
94.721911
119.629756
114.896258
124.811069
129.679160
106.699905
134.490338
112.106713
135.957658
132.471111
94.106849
117.431450
115.861592
106.830657
132.396905
107.092542
128.945760
94.298464
120.455510
136.910426
134.311765
115.881292
116.918458
These values are all over the place - this is not a reproducible test.
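For anyone wanting to reproduce the extraction, a minimal sketch (the report
file name and the exact label are assumptions about the local report layout,
not the literal pipeline I used):

  # Pull the per-run northd-loop averages out of a saved test report.
  grep 'Average (northd-loop in msec)' perf-report.txt | sed 's/.*: *//'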
2. In its present state you need to re-run it 30+ times and take
an average. The standard deviation of the northd loop values is
> 10%. Compared to that, the reproducibility of ovn-heater
is significantly better: I usually get less than 0.5% difference
between runs if there were no iteration failures. I would suggest
using that instead for performance comparisons until
we have figured out what affects the perf test.
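As a rough way to put a number on that spread, a sketch only (values.txt is
assumed to hold one extracted average per line, as in the list above):

  # Mean and standard deviation of the per-run northd averages.
  awk '{ s += $1; ss += $1 * $1; n++ }
       END { m = s / n; printf "mean=%.2f stddev=%.2f\n", m, sqrt(ss / n - m * m) }' values.txt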
3. The test suite reports the short-term running average, which is
probably wrong because that value is heavily skewed by the last
several iterations.
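To make the skew concrete, a small sketch (again against the same assumed
values.txt; this is not the test-suite code, it simply contrasts the
cumulative average with the average of the last 10 samples):

  # Long-term (cumulative) average vs. a short-term window over the last
  # 10 samples; slow final iterations pull the short-term figure away
  # from the cumulative one.
  awk '{ v[NR] = $1; s += $1 }
       END { w = (NR < 10) ? NR : 10;
             for (i = NR - w + 1; i <= NR; i++) st += v[i];
             printf "long-term=%.2f short-term(last %d)=%.2f\n", s / NR, w, st / w }' values.txt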
I will look into all of these.
Thanks for the summary! However, I think there is a bigger problem
(probably related to my environment) than the stability of the test
(make check-perf TESTSUITEFLAGS="--rebuild") itself. As I mentioned in
an earlier email, I observed even worse results with a large-scale
topology closer to a real-world deployment of ovn-k8s, just testing
with the command:
ovn-nbctl --print-wait-time --wait=sb sync
This command simply triggers a change in the NB_Global table and waits
for northd to complete the recompute and update the SB. It doesn't have
to be the "sync" command; any change to the NB DB produces a similar
result (e.g.: ovn-nbctl --print-wait-time --wait=sb ls-add ls1).
Without parallel:
ovn-northd completion: 7807ms
With parallel:
ovn-northd completion: 41267ms
Is this with current master or prior to these patches?
1. There was an issue prior to these patches where the hash was not
sized correctly on the first iteration when loading a large existing
database for the first time. These numbers sound about right for when
that bug was around.
2. At present there should be NO DIFFERENCE in a single compute cycle
on an existing database with dp groups between a run with parallel and
one without. This is because the first cycle does not use parallel
compute; it is disabled in order to arrive at the correct hash sizings
for future cycles by auto-scaling the hash.
So what exact tag/commit are you running this with, and which options
are on/off?
A.
This result is stable and consistent when repeating the command on my
machine. Would you try it on your machine as well? I understand that
only the lflow generation part can be parallelized and it doesn't
remove all the bottlenecks, but I did expect it to be faster rather
than slower. If your result always shows that parallel is better, then
I will have to dig into it myself on my test machine.
Thanks,
Han
Brgds,
On 30/09/2021 08:26, Han Zhou wrote:
On Thu, Sep 30, 2021 at 12:08 AM Anton Ivanov
<[email protected]> wrote:
After quickly adding some more prints into the testsuite.
Test 1:
Without
1: ovn-northd basic scale test -- 200 Hypervisors, 200
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1130
Average (NB in msec): 620.375000
Maximum (SB in msec): 23
Average (SB in msec): 21.468759
Maximum (northd-loop in msec): 6002
Minimum (northd-loop in msec): 0
Average (northd-loop in msec): 914.760417
Long term average (northd-loop in msec): 104.799340
With
1: ovn-northd basic scale test -- 200 Hypervisors, 200
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1148
Average (NB in msec): 630.250000
Maximum (SB in msec): 24
Average (SB in msec): 21.468744
Maximum (northd-loop in msec): 6090
Minimum (northd-loop in msec): 0
Average (northd-loop in msec): 762.101565
Long term average (northd-loop in msec): 80.735192
The metric which actually matters and which SHOULD be
measured - the long-term average - is better by 20%. Using
the short-term average instead of the long-term one in the
test suite is actually a BUG.
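As a quick sanity check on that figure, from the two long-term values quoted
above (nothing more than arithmetic; it comes out nearer 23%, in the same
ballpark as the 20% quoted):

  # (104.799340 - 80.735192) / 104.799340 ~= 0.23, i.e. roughly a 23% drop
  # in the long-term northd-loop average with parallelization enabled.
  awk 'BEGIN { printf "%.1f%%\n", (104.799340 - 80.735192) / 104.799340 * 100 }'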
Good catch!
Are you running yours under some sort of virtualization?
No, I am testing on bare metal.
A.
On 30/09/2021 07:52, Han Zhou wrote:
Thanks Anton for checking. I am using: Intel(R) Core(TM)
i9-7920X CPU @ 2.90GHz, 24 cores.
It is weird that my result is so different. I also verified
with a scale-test script that creates a large-scale NB/SB
with 800 nodes of a simulated k8s setup, and then just ran:
ovn-nbctl --print-wait-time --wait=sb sync
Without parallel:
ovn-northd completion: 7807ms
With parallel:
ovn-northd completion: 41267ms
I suspected the hmap sizing problem, but changing the initial
size to 64k buckets didn't help. I will find some time to
check the "perf" reports.
Thanks,
Han
On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov
<[email protected]> wrote:
On 30/09/2021 07:16, Anton Ivanov wrote:
Results on a Ryzen 5 3600 - 6 cores / 12 threads:
I will also have a look into the "maximum" measurement
for multi-thread.
It does not tie up with the drop in average across the
board.
A.
Without
1: ovn-northd basic scale test -- 200 Hypervisors,
200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1256
Average (NB in msec): 679.463785
Maximum (SB in msec): 25
Average (SB in msec): 22.489798
Maximum (northd-loop in msec): 1347
Average (northd-loop in msec): 799.944878
2: ovn-northd basic scale test -- 200 Hypervisors,
200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1956
Average (NB in msec): 809.387285
Maximum (SB in msec): 24
Average (SB in msec): 21.649258
Maximum (northd-loop in msec): 2011
Average (northd-loop in msec): 961.718686
5: ovn-northd basic scale test -- 500 Hypervisors, 50
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 557
Average (NB in msec): 474.010337
Maximum (SB in msec): 15
Average (SB in msec): 13.927192
Maximum (northd-loop in msec): 1261
Average (northd-loop in msec): 580.999122
6: ovn-northd basic scale test -- 500 Hypervisors, 50
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 756
Average (NB in msec): 625.614724
Maximum (SB in msec): 15
Average (SB in msec): 14.181048
Maximum (northd-loop in msec): 1649
Average (northd-loop in msec): 746.208332
With
1: ovn-northd basic scale test -- 200 Hypervisors,
200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1140
Average (NB in msec): 631.125000
Maximum (SB in msec): 24
Average (SB in msec): 21.453609
Maximum (northd-loop in msec): 6080
Average (northd-loop in msec): 759.718815
2: ovn-northd basic scale test -- 200 Hypervisors,
200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1210
Average (NB in msec): 673.000000
Maximum (SB in msec): 27
Average (SB in msec): 22.453125
Maximum (northd-loop in msec): 6514
Average (northd-loop in msec): 808.596842
5: ovn-northd basic scale test -- 500 Hypervisors, 50
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 798
Average (NB in msec): 429.750000
Maximum (SB in msec): 15
Average (SB in msec): 12.998533
Maximum (northd-loop in msec): 3835
Average (northd-loop in msec): 564.875986
6: ovn-northd basic scale test -- 500 Hypervisors, 50
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1074
Average (NB in msec): 593.875000
Maximum (SB in msec): 14
Average (SB in msec): 13.655273
Maximum (northd-loop in msec): 4973
Average (northd-loop in msec): 771.102605
The only one slower is test 6, which I will look into.
The rest are > 5% faster.
A.
On 30/09/2021 00:56, Han Zhou wrote:
On Wed, Sep 15, 2021 at 5:45 AM
<[email protected]> wrote:
>
> From: Anton Ivanov <[email protected]>
>
> Restore parallel build with dp groups using rwlock instead
> of per row locking as an underlying mechanism.
>
> This provides improvement ~ 10% end-to-end on ovn-heater
> under virtualization despite awakening some qemu gremlin
> which makes qemu climb to silly CPU usage. The gain on
> bare metal is likely to be higher.
>
Hi Anton,
I am trying to see the benefit of parallel_build, but
encountered unexpected performance results when running
the perf tests with the command:
make check-perf TESTSUITEFLAGS="--rebuild"
It shows significantly worse performance than without
parallel_build. For the dp_group = no cases it is better,
but still ~30% slower than without parallel_build. I
have 24 cores, but no thread except the main one consumes
much CPU. I also tried hardcoding the number of threads
to just 4, which ended up with slightly better results,
but still far behind "without parallel_build".
Columns: no parallel | parallel (24 pool threads) | parallel (4 pool threads)

1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   Maximum (NB in msec):          1058 | 4269 | 4097
   Average (NB in msec):          836.941167 | 3697.253931 | 3498.311525
   Maximum (SB in msec):          30 | 30 | 28
   Average (SB in msec):          25.934011 | 26.001840 | 25.685091
   Maximum (northd-loop in msec): 1204 | 4379 | 4251
   Average (northd-loop in msec): 1005.330078 | 4233.871504 | 4022.774208

2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
   Maximum (NB in msec):          1124 | 1480 | 1331
   Average (NB in msec):          892.403405 | 1206.189287 | 1089.378455
   Maximum (SB in msec):          29 | 31 | 30
   Average (SB in msec):          26.922632 | 26.636706 | 25.657484
   Maximum (northd-loop in msec): 1275 | 1639 | 1495
   Average (northd-loop in msec): 1074.917873 | 1458.152327 | 1301.057201

5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   Maximum (NB in msec):          768 | 3086 | 2876
   Average (NB in msec):          614.491938 | 2681.688365 | 2531.255444
   Maximum (SB in msec):          18 | 17 | 18
   Average (SB in msec):          16.347526 | 15.955263 | 16.278075
   Maximum (northd-loop in msec): 889 | 3247 | 3031
   Average (northd-loop in msec): 772.083572 | 3117.504297 | 2833.182361

6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
   Maximum (NB in msec):          1046 | 1371 | 1262
   Average (NB in msec):          827.735852 | 1135.514228 | 970.544792
   Maximum (SB in msec):          19 | 18 | 19
   Average (SB in msec):          16.828127 | 16.083914 | 15.602525
   Maximum (northd-loop in msec): 1163 | 1545 | 1411
   Average (northd-loop in msec): 972.567407 | 1328.617583 | 1207.667100
I didn't debug it yet, but do you have any clue what
could be the reason? I am using the upstream commit
9242f27f63, which already includes this patch.
Below is my change to the tests/perf-northd.at file,
just to enable parallel_build:
diff --git a/tests/perf-northd.at b/tests/perf-northd.at
index 74b69e9d4..9328c2e21 100644
--- a/tests/perf-northd.at
+++ b/tests/perf-northd.at
@@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hype
 PERF_RECORD_START()
 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true
 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))
@@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hyper
 PERF_RECORD_START()
 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true
 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
Thanks,
Han
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev