Summary of findings.
1. The numbers from the perf test do not align with ovn-heater, which is much
closer to a realistic load. On some tests where ovn-heater gives a 5-10%
end-to-end improvement with parallelization, we get worse results with the
perf test. You spotted this one correctly.
Example of the northd averages pulled out of the test report via grep and sed.
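A rough sketch of the kind of pipeline meant here (the report file name and
the exact label are assumptions, not the literal command used):

  grep 'Average (northd-loop in msec)' perf-report.log | sed -e 's/.*: *//'

The extracted values: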
127.489353
131.509458
116.088205
94.721911
119.629756
114.896258
124.811069
129.679160
106.699905
134.490338
112.106713
135.957658
132.471111
94.106849
117.431450
115.861592
106.830657
132.396905
107.092542
128.945760
94.298464
120.455510
136.910426
134.311765
115.881292
116.918458
These values are all over the place - this is not a reproducible test.
2. In its present state you need to re-run the test 30+ times and take an
average. The standard deviation of the northd-loop values is > 10%. Compared
to that, the reproducibility of ovn-heater is significantly better: I usually
get less than 0.5% difference between runs if there were no iteration
failures. I would suggest using that instead for performance comparisons
until we have figured out what affects the perf test.
3. The reports use the short-term running average, which is probably wrong
because it is very significantly skewed by the last several values.
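To illustrate the skew, a minimal sketch contrasting a short-term running
average (the 0.9/0.1 weights are made up for illustration, not what the test
suite actually uses) with the long-term cumulative mean over the same series:

  # short-term EWMA vs. long-term cumulative average (illustrative only)
  awk '{ n++; lt += ($1 - lt) / n;
         st = (n == 1) ? $1 : 0.9 * st + 0.1 * $1 }
       END { printf "short-term=%.2f long-term=%.2f\n", st, lt }' samples.txt

Feed it a series that ends with a few large outliers and the short-term value
lands far from the long-term one.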
I will look into all of these.
Brgds,
On 30/09/2021 08:26, Han Zhou wrote:
On Thu, Sep 30, 2021 at 12:08 AM Anton Ivanov <[email protected]> wrote:
After quickly adding some more prints into the testsuite.
Test 1:
Without
1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1130
Average (NB in msec): 620.375000
Maximum (SB in msec): 23
Average (SB in msec): 21.468759
Maximum (northd-loop in msec): 6002
Minimum (northd-loop in msec): 0
Average (northd-loop in msec): 914.760417
Long term average (northd-loop in msec): 104.799340
With
1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1148
Average (NB in msec): 630.250000
Maximum (SB in msec): 24
Average (SB in msec): 21.468744
Maximum (northd-loop in msec): 6090
Minimum (northd-loop in msec): 0
Average (northd-loop in msec): 762.101565
Long term average (northd-loop in msec): 80.735192
The metric which actually matters, and which SHOULD be measured - the
long-term average - is better by 20%. Using the short-term average instead of
the long-term one in the test suite is actually a BUG.
Good catch!
Are you running yours under some sort of virtualization?
No, I am testing on bare metal.
A.
On 30/09/2021 07:52, Han Zhou wrote:
Thanks Anton for checking. I am using: Intel(R) Core(TM) i9-7920X CPU @
2.90GHz, 24 cores.
It is strange that my result is so different. I also verified with a scale
test script that creates a large-scale NB/SB with 800 nodes of a simulated
k8s setup, and then just ran:
ovn-nbctl --print-wait-time --wait=sb sync
Without parallel:
ovn-northd completion: 7807ms
With parallel:
ovn-northd completion: 41267ms
I suspected the hmap size problem, but changing the initial size to 64k
buckets didn't help. I will find some time to check the "perf" reports.
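For reference, a sketch of how one might grab such a profile while the test
is running (assumes perf(1) is available and ovn-northd is built with
symbols; the 30s window is arbitrary):

  # sample the running northd with call graphs for ~30 seconds
  perf record -g -p $(pidof ovn-northd) -- sleep 30
  # then summarise the hottest code paths
  perf report --stdio | head -n 40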
Thanks,
Han
On Wed, Sep 29, 2021 at 11:31 PM Anton Ivanov <[email protected]> wrote:
On 30/09/2021 07:16, Anton Ivanov wrote:
Results on a Ryzen 5 3600 - 6 cores 12 threads
I will also have a look into the "maximum" measurement for multi-thread.
It does not tie up with the drop in average across the board.
A.
Without
1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1256
Average (NB in msec): 679.463785
Maximum (SB in msec): 25
Average (SB in msec): 22.489798
Maximum (northd-loop in msec): 1347
Average (northd-loop in msec): 799.944878
2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1956
Average (NB in msec): 809.387285
Maximum (SB in msec): 24
Average (SB in msec): 21.649258
Maximum (northd-loop in msec): 2011
Average (northd-loop in msec): 961.718686
5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 557
Average (NB in msec): 474.010337
Maximum (SB in msec): 15
Average (SB in msec): 13.927192
Maximum (northd-loop in msec): 1261
Average (northd-loop in msec): 580.999122
6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 756
Average (NB in msec): 625.614724
Maximum (SB in msec): 15
Average (SB in msec): 14.181048
Maximum (northd-loop in msec): 1649
Average (northd-loop in msec): 746.208332
With
1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 1140
Average (NB in msec): 631.125000
Maximum (SB in msec): 24
Average (SB in msec): 21.453609
Maximum (northd-loop in msec): 6080
Average (northd-loop in msec): 759.718815
2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1210
Average (NB in msec): 673.000000
Maximum (SB in msec): 27
Average (SB in msec): 22.453125
Maximum (northd-loop in msec): 6514
Average (northd-loop in msec): 808.596842
5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=yes
---
Maximum (NB in msec): 798
Average (NB in msec): 429.750000
Maximum (SB in msec): 15
Average (SB in msec): 12.998533
Maximum (northd-loop in msec): 3835
Average (northd-loop in msec): 564.875986
6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical
Ports/Hypervisor -- ovn-northd -- dp-groups=no
---
Maximum (NB in msec): 1074
Average (NB in msec): 593.875000
Maximum (SB in msec): 14
Average (SB in msec): 13.655273
Maximum (northd-loop in msec): 4973
Average (northd-loop in msec): 771.102605
The only one slower is test 6, which I will look into.
The rest are > 5% faster.
A.
On 30/09/2021 00:56, Han Zhou wrote:
On Wed, Sep 15, 2021 at 5:45 AM <[email protected]> wrote:
>
> From: Anton Ivanov <[email protected]>
>
> Restore parallel build with dp groups using rwlock instead
> of per row locking as an underlying mechanism.
>
> This provides improvement ~ 10% end-to-end on ovn-heater
> under virtualization despite awakening some qemu gremlin
> which makes qemu climb to silly CPU usage. The gain on
> bare metal is likely to be higher.
>
Hi Anton,
I am trying to see the benefit of parallel_build, but encountered unexpected
performance results when running the perf tests with the command:

make check-perf TESTSUITEFLAGS="--rebuild"

It shows significantly worse performance than without parallel_build. For the
dp_group = no cases it is better, but still ~30% slower than without
parallel_build. I have 24 cores, but no thread is consuming much CPU except
the main thread. I also tried hardcoding the number of threads to just 4,
which ended up with slightly better results, but still far behind "without
parallel_build".
(columns: no parallel | parallel, 24 pool threads | parallel, 4 pool threads)

1: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor
---
Maximum (NB in msec):                 1058 |         4269 |         4097
Average (NB in msec):           836.941167 |  3697.253931 |  3498.311525
Maximum (SB in msec):                   30 |           30 |           28
Average (SB in msec):            25.934011 |    26.001840 |    25.685091
Maximum (northd-loop in msec):        1204 |         4379 |         4251
Average (northd-loop in msec): 1005.330078 |  4233.871504 |  4022.774208

2: ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hypervisor
---
Maximum (NB in msec):                 1124 |         1480 |         1331
Average (NB in msec):           892.403405 |  1206.189287 |  1089.378455
Maximum (SB in msec):                   29 |           31 |           30
Average (SB in msec):            26.922632 |    26.636706 |    25.657484
Maximum (northd-loop in msec):        1275 |         1639 |         1495
Average (northd-loop in msec): 1074.917873 |  1458.152327 |  1301.057201

5: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor
---
Maximum (NB in msec):                  768 |         3086 |         2876
Average (NB in msec):           614.491938 |  2681.688365 |  2531.255444
Maximum (SB in msec):                   18 |           17 |           18
Average (SB in msec):            16.347526 |    15.955263 |    16.278075
Maximum (northd-loop in msec):         889 |         3247 |         3031
Average (northd-loop in msec):  772.083572 |  3117.504297 |  2833.182361

6: ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hypervisor
---
Maximum (NB in msec):                 1046 |         1371 |         1262
Average (NB in msec):           827.735852 |  1135.514228 |   970.544792
Maximum (SB in msec):                   19 |           18 |           19
Average (SB in msec):            16.828127 |    16.083914 |    15.602525
Maximum (northd-loop in msec):        1163 |         1545 |         1411
Average (northd-loop in msec):  972.567407 |  1328.617583 |  1207.667100
I didn't debug it yet, but do you have any clue what the reason could be?
I am using the upstream commit 9242f27f63, which already includes this patch.
Below is my change to the perf-northd.at file just to enable parallel_build:

diff --git a/tests/perf-northd.at b/tests/perf-northd.at
index 74b69e9d4..9328c2e21 100644
--- a/tests/perf-northd.at
+++ b/tests/perf-northd.at
@@ -191,6 +191,7 @@ AT_SETUP([ovn-northd basic scale test -- 200 Hypervisors, 200 Logical Ports/Hype
 PERF_RECORD_START()
 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true
 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(200, 200))

@@ -203,9 +204,10 @@ AT_SETUP([ovn-northd basic scale test -- 500 Hypervisors, 50 Logical Ports/Hyper
 PERF_RECORD_START()
 ovn_start
+ovn-nbctl set nb_global . options:use_parallel_build=true
 BUILD_NBDB(OVN_BASIC_SCALE_CONFIG(500, 50))
Thanks,
Han
--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev