On 30/09/2021 20:48, Han Zhou wrote:
On Thu, Sep 30, 2021 at 7:34 AM Anton Ivanov
<[email protected]> wrote:
Summary of findings:
1. The numbers from the perf test do not align with ovn-heater, which is
much closer to a realistic load. On some tests where heater gives a
5-10% end-to-end improvement with parallelization, we get worse
results with the perf test. You spotted this one correctly.
Example of the per-run northd averages pulled out of the test report via
grep and sed:
127.489353
131.509458
116.088205
94.721911
119.629756
114.896258
124.811069
129.679160
106.699905
134.490338
112.106713
135.957658
132.471111
94.106849
117.431450
115.861592
106.830657
132.396905
107.092542
128.945760
94.298464
120.455510
136.910426
134.311765
115.881292
116.918458
These values are all over the place - this is not a reproducible test.
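For anyone wanting to reproduce the extraction, a minimal sketch (the report
file name and the exact label are assumptions about the local report layout,
not the literal pipeline I used):

  # Pull the per-run northd-loop averages out of a saved test report.
  grep 'Average (northd-loop in msec)' perf-report.txt | sed 's/.*: *//'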
2. In its present state you need to re-run it 30+ times and take
an average. The standard deviation of the northd loop values is
> 10%. Compared to that, the reproducibility of ovn-heater
is significantly better: I usually get less than 0.5% difference
between runs if there were no iteration failures. I would suggest
using that instead for performance comparisons until
we have figured out what affects the perf test.
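As a rough way to put a number on that spread, a sketch only (values.txt is
assumed to hold one extracted average per line, as in the list above):

  # Mean and standard deviation of the per-run northd averages.
  awk '{ s += $1; ss += $1 * $1; n++ }
       END { m = s / n; printf "mean=%.2f stddev=%.2f\n", m, sqrt(ss / n - m * m) }' values.txt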
3. The test suite reports the short-term running average, which is
probably wrong because that value is heavily skewed by the last
several iterations.
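To make the skew concrete, a small sketch (again against the same assumed
values.txt; this is not the test-suite code, it simply contrasts the
cumulative average with the average of the last 10 samples):

  # Long-term (cumulative) average vs. a short-term window over the last
  # 10 samples; slow final iterations pull the short-term figure away
  # from the cumulative one.
  awk '{ v[NR] = $1; s += $1 }
       END { w = (NR < 10) ? NR : 10;
             for (i = NR - w + 1; i <= NR; i++) st += v[i];
             printf "long-term=%.2f short-term(last %d)=%.2f\n", s / NR, w, st / w }' values.txt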
I will look into all of these.
Thanks for the summary! However, I think there is a bigger problem
(probably related to my environment) than the stability of the test
(make check-perf TESTSUITEFLAGS="--rebuild") itself. As I mentioned in
an earlier email, I observed even worse results with a large-scale
topology closer to a real-world deployment of ovn-k8s, just testing
with the command:
ovn-nbctl --print-wait-time --wait=sb sync
This command simply triggers a change in the NB_Global table and waits
for northd to complete the recompute and update the SB. It doesn't have
to be the "sync" command; any change to the NB DB produces a similar
result (e.g.: ovn-nbctl --print-wait-time --wait=sb ls-add ls1).
Without parallel:
ovn-northd completion: 7807ms
With parallel:
ovn-northd completion: 41267ms
Is this with current master or prior to these patches?
1. There was an issue prior to these patches where the hash was not
sized correctly on the first iteration when loading a large existing
database for the first time. These numbers sound about right for when
that bug was around.
2. At present there should be NO DIFFERENCE in a single compute cycle
on an existing database with dp groups between a run with parallel and
one without. This is because the first cycle does not use parallel
compute; it is disabled in order to arrive at the correct hash sizings
for future cycles by auto-scaling the hash.
So what exact tag/commit are you running this with, and which options
are on/off?
A.
This result is stable and consistent when repeating the command on my
machine. Would you try it on your machine as well? I understand that
only the lflow generation part can be parallelized and it doesn't
remove all the bottlenecks, but I did expect it to be faster rather
than slower. If your result always shows that parallel is better, then
I will have to dig into it myself on my test machine.
Thanks,
Han
Brgds,
On 30/09/2021 08:26, Han Zhou wrote:
On Thu, Sep 30, 2021 at 12:08 AM Anton Ivanov
<[email protected]> wrote:
After quickly adding some more prints into the testsuite.
Test 1:
Without
1: ovn-northd basic scale test -- 200 Hypervisors, 200
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1130
Average (NB in msec): 620.375000
Maximum (SB in msec): 23
Average (SB in msec): 21.468759
Maximum (northd-loop in msec): 6002
Minimum (northd-loop in msec): 0
Average (northd-loop in msec): 914.760417
Long term average (northd-loop in msec): 104.799340
With
1: ovn-northd basic scale test -- 200 Hypervisors, 200
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1148
Average (NB in msec): 630.250000
Maximum (SB in msec): 24
Average (SB in msec): 21.468744
Maximum (northd-loop in msec): 6090
Minimum (northd-loop in msec): 0
Average (northd-loop in msec): 762.101565
Long term average (northd-loop in msec): 80.735192
The metric which actually matters and which SHOULD be
measured - the long-term average - is better by 20%. Using
the short-term average instead of the long-term one in the
test suite is actually a BUG.
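As a quick sanity check on that figure, from the two long-term values quoted
above (nothing more than arithmetic; it comes out nearer 23%, in the same
ballpark as the 20% quoted):

  # (104.799340 - 80.735192) / 104.799340 ~= 0.23, i.e. roughly a 23% drop
  # in the long-term northd-loop average with parallelization enabled.
  awk 'BEGIN { printf "%.1f%%\n", (104.799340 - 80.735192) / 104.799340 * 100 }'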
Good catch!
Are you running yours under some sort of virtualization?
No, I am testing on bare metal.
A.
On 30/09/2021 07:52, Han Zhou wrote:
Thanks Anton for checking. I am using: Intel(R) Core(TM)
i9-7920X CPU @ 2.90GHz, 24 cores.
It is weird that my result is so different. I also verified
with a scale-test script that creates a large-scale NB/SB
with 800 nodes of a simulated k8s setup, and then just ran:
ovn-nbctl --print-wait-time --wait=sb sync
Without parallel:
ovn-northd completion: 7807ms
With parallel:
ovn-northd completion: 41267ms
I suspected the hmap sizing problem, but changing the initial
size to 64k buckets didn't help. I will find some time to
check the "perf" reports.
Thanks,
Han
On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov
<[email protected]> wrote:
On 30/09/2021 07:16, Anton Ivanov wrote:
Results on a Ryzen 5 3600 - 6 cores / 12 threads:
I will also have a look into the "maximum" measurement
for multi-thread.
It does not tie up with the drop in average across the
board.
A.
Without
1: ovn-northd basic scale test -- 200 Hypervisors,
200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1256
Average (NB in msec): 679.463785
Maximum (SB in msec): 25
Average (SB in msec): 22.489798
Maximum (northd-loop in msec): 1347
Average (northd-loop in msec): 799.944878
2: ovn-northd basic scale test -- 200 Hypervisors,
200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1956
Average (NB in msec): 809.387285
Maximum (SB in msec): 24
Average (SB in msec): 21.649258
Maximum (northd-loop in msec): 2011
Average (northd-loop in msec): 961.718686
5: ovn-northd basic scale test -- 500 Hypervisors, 50
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 557
Average (NB in msec): 474.010337
Maximum (SB in msec): 15
Average (SB in msec): 13.927192
Maximum (northd-loop in msec): 1261
Average (northd-loop in msec): 580.999122
6: ovn-northd basic scale test -- 500 Hypervisors, 50
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 756
Average (NB in msec): 625.614724
Maximum (SB in msec): 15
Average (SB in msec): 14.181048
Maximum (northd-loop in msec): 1649
Average (northd-loop in msec): 746.208332
With
1: ovn-northd basic scale test -- 200 Hypervisors,
200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1140
Average (NB in msec): 631.125000
Maximum (SB in msec): 24
Average (SB in msec): 21.453609
Maximum (northd-loop in msec): 6080
Average (northd-loop in msec): 759.718815
2: ovn-northd basic scale test -- 200 Hypervisors,
200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1210
Average (NB in msec): 673.000000
Maximum (SB in msec): 27
Average (SB in msec): 22.453125
Maximum (northd-loop in msec): 6514
Average (northd-loop in msec): 808.596842
5: ovn-northd basic scale test -- 500 Hypervisors, 50
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 798
Average (NB in msec): 429.750000
Maximum (SB in msec): 15
Average (SB in msec): 12.998533
Maximum (northd-loop in msec): 3835
Average (northd-loop in msec): 564.875986
6: ovn-northd basic scale test -- 500 Hypervisors, 50
Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1074
Average (NB in msec): 593.875000
Maximum (SB in msec): 14
Average (SB in msec): 13.655273
Maximum (northd-loop in msec): 4973
Average (northd-loop in msec): 771.102605
The only one slower is test 6, which I will look into.
The rest are > 5% faster.
A.
On 30/09/2021 00:56, Han Zhou wrote:
On Wed, Sep 15, 2021 at 5:45 AM
<[email protected]> wrote:
>
> From: Anton Ivanov <[email protected]>
>
> Restore parallel build with dp groups using rwlock instead
> of per row locking as an underlying mechanism.
>
> This provides improvement ~ 10% end-to-end on ovn-heater
> under virtualization despite awakening some qemu gremlin
> which makes qemu climb to silly CPU usage. The gain on
> bare metal is likely to be higher.
>
Hi Anton,
I am trying to see the benefit of parallel_build, but
encountered unexpected performance results when running
the perf tests with the command:
make check-perf TESTSUITEFLAGS="--rebuild"
It shows significantly worse performance than without
parallel_build. For the dp_group = no cases it is better,
but still ~30% slower than without parallel_build. I
have 24 cores, but no thread except the main one consumes
much CPU. I also tried hardcoding the number of threads
to just 4, which ended up with slightly better results,
but still far behind "without parallel_build".
Columns: no parallel | parallel (24 pool threads) | parallel (4 pool threads)

1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   Maximum (NB in msec):          1058 | 4269 | 4097
   Average (NB in msec):          836.941167 | 3697.253931 | 3498.311525
   Maximum (SB in msec):          30 | 30 | 28
   Average (SB in msec):          25.934011 | 26.001840 | 25.685091
   Maximum (northd-loop in msec): 1204 | 4379 | 4251
   Average (northd-loop in msec): 1005.330078 | 4233.871504 | 4022.774208

2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
   Maximum (NB in msec):          1124 | 1480 | 1331
   Average (NB in msec):          892.403405 | 1206.189287 | 1089.378455
   Maximum (SB in msec):          29 | 31 | 30
   Average (SB in msec):          26.922632 | 26.636706 | 25.657484
   Maximum (northd-loop in msec): 1275 | 1639 | 1495
   Average (northd-loop in msec): 1074.917873 | 1458.152327 | 1301.057201

5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=yes
   Maximum (NB in msec):          768 | 3086 | 2876
   Average (NB in msec):          614.491938 | 2681.688365 | 2531.255444
   Maximum (SB in msec):          18 | 17 | 18
   Average (SB in msec):          16.347526 | 15.955263 | 16.278075
   Maximum (northd-loop in msec): 889 | 3247 | 3031
   Average (northd-loop in msec): 772.083572 | 3117.504297 | 2833.182361

6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor -- ovn-northd -- dp-groups=no
   Maximum (NB in msec):          1046 | 1371 | 1262
   Average (NB in msec):          827.735852 | 1135.514228 | 970.544792
   Maximum (SB in msec):          19 | 18 | 19
   Average (SB in msec):          16.828127 | 16.083914 | 15.602525
   Maximum (northd-loop in msec): 1163 | 1545 | 1411
   Average (northd-loop in msec): 972.567407 | 1328.617583 | 1207.667100
I didn't debug it yet, but do you have any clue what
could be the reason? I am using the upstream commit
9242f27f63, which already includes this patch.
Below is my change to the tests/perf-northd.at file,
just to enable parallel_build:
diff --git a/tests/perf-northd.at b/tests/perf-northd.at
index 74b69e9d4..9328c2e21 100644
--- a/tests/perf-northd.at
+++ b/tests/perf-northd.at
@@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hype
 PERF_RECORD_START()
 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true
 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))
@@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hyper
 PERF_RECORD_START()
 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true
 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
Thanks,
Han
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev