On Mon, Feb 25, 2019 at 09:25:03AM -0800, Han Zhou wrote:
> In scalability test with ovn-scale-test, ovsdb-server SB load is not a
> problem at least with 1k HVs. However, if we restart the ovsdb-server,
> depending on the number of HVs and scale of logical objects, e.g. the
> number of logical ports, ovsdb-server of SB become an obvious bottleneck.
>
> In our test with 1k HVs and 20k logical ports (200 lport * 100 lswitches
> connected by one single logical router). Restarting ovsdb-server of SB
> resulted in 100% CPU of ovsdb-server for more than 1 hour. All HVs (and
> northd) are reconnecting and resyncing the big amount of data at the same
> time.
>
> Similar problem would happen in failover scenario. With active-active
> cluster, the problem can be aleviated slightly, because only 1/3 (assuming
> it is 3-node cluster) of the HVs will need to resync data from new servers,
> but it is still a serious problem.
>
> For detailed discussions for the problem and solutions, see:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html
Thanks.
When I apply this series, I get a reproducible test failure in test
1920 "schema conversion online - clustered". It's an error from Address
Sanitizer. I'm appending the testsuite.log.
Obviously this needs to be fixed but I'll take a look at the patches in
more detail now too.
# -*- compilation -*-
1920. ovsdb-server.at:1126: testing schema conversion online - clustered ...
../../tests/ovsdb-server.at:869: if test $model = standalone; then
ovsdb-tool create db schema
else
ovsdb-tool create-cluster db schema unix:s1.raft
fi
../../tests/ovsdb-server.at:876: ovsdb-server -vfile -vvlog:off -vconsole:off
--detach --no-chdir --pidfile --log-file --remote=punix:db.sock db
../../tests/ovsdb-server.at:880: for pair in 'zero 0' 'one 1' 'two 2' 'three 3'
'four 4' 'five 5'; do
set -- $pair
ovsdb-client transact '
["ordinals",
{"op": "insert",
"table": "ordinals",
"row": {"name": "'$1'", "number": '$2'}},
{"op": "comment",
"comment": "add row for '"$pair"'"}]'
done | uuidfilt
stderr:
../../tests/ovsdb-server.at:900: ovsdb-client needs-conversion schema
../../tests/ovsdb-server.at:902: ovsdb-client needs-conversion new-schema
../../tests/ovsdb-server.at:907: ovsdb-client -vfile -vvlog:off --detach
--no-chdir --pidfile=monitor-ordinals-aware.pid
--log-file=monitor-ordinals-aware.log --db-change-aware --no-headings monitor
ordinals ordinals number name > monitor-ordinals-aware.stdout 2>
monitor-ordinals-aware.stderr
../../tests/ovsdb-server.at:912: ovsdb-client -vfile -vvlog:off --detach
--no-chdir --pidfile=monitor-ordinals-unaware.pid
--log-file=monitor-ordinals-unaware.log --no-db-change-aware --no-headings
monitor ordinals ordinals number name > monitor-ordinals-unaware.stdout 2>
monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:919: ovsdb-client -vfile -vvlog:off --detach
--no-chdir --pidfile=monitor-server-aware.pid
--log-file=monitor-server-aware.log --db-change-aware --no-headings monitor
_Server Database name > monitor-server-aware.stdout 2>
monitor-server-aware.stderr
../../tests/ovsdb-server.at:924: ovsdb-client -vfile -vvlog:off --detach
--no-chdir --pidfile=monitor-server-unaware.pid
--log-file=monitor-server-unaware.log --no-db-change-aware --no-headings
monitor _Server Database name > monitor-server-unaware.stdout 2>
monitor-server-unaware.stderr
../../tests/ovsdb-server.at:938: ovsdb-client -vfile -vvlog:off --detach
--no-chdir --pidfile=trigger-ordinals-aware.pid
--log-file=trigger-ordinals-aware.log --db-change-aware transact
"$ordinals_txn" > trigger-ordinals-aware.stdout 2>
trigger-ordinals-aware.stderr
../../tests/ovsdb-server.at:943: ovsdb-client -vfile -vvlog:off --detach
--no-chdir --pidfile=trigger-ordinals-unaware.pid
--log-file=trigger-ordinals-unaware.log --no-db-change-aware transact
"$ordinals_txn" > trigger-ordinals-unaware.stdout 2>
trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:957: ovsdb-client -vfile -vvlog:off --detach
--no-chdir --pidfile=trigger-server-aware.pid
--log-file=trigger-server-aware.log --db-change-aware transact "$server_txn" >
trigger-server-aware.stdout 2> trigger-server-aware.stderr
../../tests/ovsdb-server.at:962: ovsdb-client -vfile -vvlog:off --detach
--no-chdir --pidfile=trigger-server-unaware.pid
--log-file=trigger-server-unaware.log --no-db-change-aware transact
"$server_txn" > trigger-server-unaware.stdout 2> trigger-server-unaware.stderr
../../tests/ovsdb-server.at:968: ovsdb-client dump unix:db.sock ordinals
stdout:
ordinals table
_uuid name number
------------------------------------ ----- ------
c495692b-f991-440b-865b-ed7c61fe6412 five 5
d731216a-16c9-4b78-86da-31c65b0b8358 four 4
3e3e6998-2633-4bd0-bb97-3c3c4cbd7227 one 1
7fbcde1d-30ee-47ea-bf24-a40f16f05b15 three 3
6a7387b0-8115-48e9-a202-e2898a8f1772 two 2
79dcb8d0-7738-417c-b779-40d530c87c90 zero 0
../../tests/ovsdb-server.at:969: uuidfilt stdout
../../tests/ovsdb-server.at:982: ovsdb-client convert new-schema
../../tests/ovsdb-server.at:985: ovsdb-client needs-conversion schema
../../tests/ovsdb-server.at:987: ovsdb-client needs-conversion new-schema
aware
ovsdb-server.at:994: waiting while test -e monitor-ordinals-$x.pid...
ovsdb-server.at:994: wait succeeded immediately
../../tests/ovsdb-server.at:995: sort -k 3 monitor-ordinals-$x.stdout | uuidfilt
unaware
ovsdb-server.at:994: waiting while test -e monitor-ordinals-$x.pid...
ovsdb-server.at:994: wait succeeded immediately
../../tests/ovsdb-server.at:995: sort -k 3 monitor-ordinals-$x.stdout | uuidfilt
../../tests/ovsdb-server.at:1004: sed 's/.*: //' monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1006: sed 's/.*: //' monitor-ordinals-aware.stderr
../../tests/ovsdb-server.at:1012: sort -k 3 monitor-server-$x.stdout | uuidfilt
../../tests/ovsdb-server.at:1012: sort -k 3 monitor-server-$x.stdout | uuidfilt
ovsdb-server.at:1017: waiting while test -e monitor-server-unaware.pid...
ovsdb-server.at:1017: wait succeeded immediately
../../tests/ovsdb-server.at:1018: sed 's/.*: //' monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1020: test -e monitor-server-aware.pid
ovsdb-server.at:1025: waiting while test -e trigger-ordinals-$x.pid...
ovsdb-server.at:1025: wait succeeded immediately
../../tests/ovsdb-server.at:1026: cat trigger-ordinals-$x.stdout
ovsdb-server.at:1025: waiting while test -e trigger-ordinals-$x.pid...
ovsdb-server.at:1025: wait succeeded immediately
../../tests/ovsdb-server.at:1026: cat trigger-ordinals-$x.stdout
../../tests/ovsdb-server.at:1028: cat trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1030: cat trigger-ordinals-aware.stderr
../../tests/ovsdb-server.at:1036: cat trigger-server-$x.stdout
../../tests/ovsdb-server.at:1036: cat trigger-server-$x.stdout
ovsdb-server.at:1038: waiting while test -e trigger-server-unaware.pid...
ovsdb-server.at:1038: wait succeeded immediately
../../tests/ovsdb-server.at:1039: sed 's/.*: //' trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1041: test -e trigger-server-aware.pid
../../tests/ovsdb-server.at:1051: ovsdb-client dump unix:db.sock ordinals |
uuidfilt
../../tests/ovsdb-server.at:1065: ovsdb-client transact '
["ordinals",
{"op": "insert",
"table": "ordinals",
"row": {"number": 6}},
{"op": "comment",
"comment": "add row for 6"}]' | uuidfilt
../../tests/ovsdb-server.at:1078: ovsdb-client dump unix:db.sock ordinals |
uuidfilt
../../tests/ovsdb-server.at:1092: test -e $OVS_RUNDIR/ovsdb-server.pid
../../tests/ovsdb-server.at:1092: ovs-appctl --timeout=10 -t ovsdb-server exit
ovsdb-server.at:1092: waiting while kill -0 $TMPPID 2>/dev/null...
ovsdb-server.at:1092: wait succeeded after 1 seconds
../../tests/ovsdb-server.at:1093: ovsdb-server -vfile -vvlog:off -vconsole:off
--detach --no-chdir --pidfile --log-file --remote=punix:db.sock db
../../tests/ovsdb-server.at:1095: ovsdb-client dump unix:db.sock ordinals |
uuidfilt
../../tests/ovsdb-server.at:1111: test -h db
../../tests/ovsdb-server.at:1112: test -h .db.~lock~
../../tests/ovsdb-server.at:1113: test -f dir/db
../../tests/ovsdb-server.at:1114: test -f dir/.db.~lock~
../../tests/ovsdb-server.at:1117: test -e $OVS_RUNDIR/ovsdb-server.pid
../../tests/ovsdb-server.at:1117: ovs-appctl --timeout=10 -t ovsdb-server exit
ovsdb-server.at:1117: waiting while kill -0 $TMPPID 2>/dev/null...
ovsdb-server.at:1117: wait succeeded after 1 seconds
Address Sanitizer reported errors in: asan.32224 asan.32375
=================================================================
==32224==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000005c0
at pc 0x560f3a96859f bp 0x7ffee562c110 sp 0x7ffee562c108
READ of size 8 at 0x6060000005c0 thread T0
#0 0x560f3a96859e in ovsdb_row_destroy ../ovsdb/row.c:103
#1 0x560f3a973520 in ovsdb_txn_destroy_cloned ../ovsdb/transaction.c:941
#2 0x560f3a97a234 in ovsdb_txn_history_destroy ../ovsdb/transaction.c:1418
#3 0x560f3a9599de in ovsdb_destroy ../ovsdb/ovsdb.c:459
#4 0x560f3a94600e in close_db ../ovsdb/ovsdb-server.c:522
#5 0x560f3a941cd5 in main ../ovsdb/ovsdb-server.c:463
#6 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308
#7 0x560f3a9438f9 in _start
(/home/blp/nicira/ovs/_build/ovsdb/ovsdb-server+0x508f9)
0x6060000005c0 is located 0 bytes inside of 56-byte region
[0x6060000005c0,0x6060000005f8)
freed by thread T0 here:
#0 0x7f21536fafd0 in __interceptor_free
(/lib/x86_64-linux-gnu/libasan.so.5+0xe8fd0)
#1 0x560f3a959b3b in ovsdb_destroy ../ovsdb/ovsdb.c:467
#2 0x560f3a959b3b in ovsdb_destroy ../ovsdb/ovsdb.c:447
#3 0x560f3a945117 in parse_txn ../ovsdb/ovsdb-server.c:556
#4 0x560f3a945117 in read_db ../ovsdb/ovsdb-server.c:603
#5 0x560f3a9401f9 in main_loop ../ovsdb/ovsdb-server.c:232
#6 0x560f3a9401f9 in main ../ovsdb/ovsdb-server.c:458
#7 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308
previously allocated by thread T0 here:
#0 0x7f21536fb350 in __interceptor_malloc
(/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
#1 0x560f3a9fd794 in xmalloc ../lib/util.c:123
#2 0x560f3a96f46d in ovsdb_table_create ../ovsdb/table.c:295
#3 0x560f3a959603 in ovsdb_create ../ovsdb/ovsdb.c:423
#4 0x560f3a9450f3 in parse_txn ../ovsdb/ovsdb-server.c:556
#5 0x560f3a9450f3 in read_db ../ovsdb/ovsdb-server.c:603
#6 0x560f3a946b74 in open_db ../ovsdb/ovsdb-server.c:670
#7 0x560f3a9419f6 in main ../ovsdb/ovsdb-server.c:356
#8 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-use-after-free ../ovsdb/row.c:103 in
ovsdb_row_destroy
Shadow bytes around the buggy address:
0x0c0c7fff8060: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
0x0c0c7fff8070: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
0x0c0c7fff8080: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
0x0c0c7fff8090: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
0x0c0c7fff80a0: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff80b0: fd fd fd fd fa fa fa fa[fd]fd fd fd fd fd fd fa
0x0c0c7fff80c0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
0x0c0c7fff80d0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
0x0c0c7fff80e0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
0x0c0c7fff80f0: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa
0x0c0c7fff8100: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==32224==ABORTING
=================================================================
==32375==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000008c0
at pc 0x5629da28a59f bp 0x7ffc98cc14e0 sp 0x7ffc98cc14d8
READ of size 8 at 0x6060000008c0 thread T0
#0 0x5629da28a59e in ovsdb_row_destroy ../ovsdb/row.c:103
#1 0x5629da295520 in ovsdb_txn_destroy_cloned ../ovsdb/transaction.c:941
#2 0x5629da29c234 in ovsdb_txn_history_destroy ../ovsdb/transaction.c:1418
#3 0x5629da27b9de in ovsdb_destroy ../ovsdb/ovsdb.c:459
#4 0x5629da26800e in close_db ../ovsdb/ovsdb-server.c:522
#5 0x5629da263cd5 in main ../ovsdb/ovsdb-server.c:463
#6 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308
#7 0x5629da2658f9 in _start
(/home/blp/nicira/ovs/_build/ovsdb/ovsdb-server+0x508f9)
0x6060000008c0 is located 0 bytes inside of 56-byte region
[0x6060000008c0,0x6060000008f8)
freed by thread T0 here:
#0 0x7f4f4a69ffd0 in __interceptor_free
(/lib/x86_64-linux-gnu/libasan.so.5+0xe8fd0)
#1 0x5629da27bb3b in ovsdb_destroy ../ovsdb/ovsdb.c:467
#2 0x5629da27bb3b in ovsdb_destroy ../ovsdb/ovsdb.c:447
#3 0x5629da267117 in parse_txn ../ovsdb/ovsdb-server.c:556
#4 0x5629da267117 in read_db ../ovsdb/ovsdb-server.c:603
#5 0x5629da268b74 in open_db ../ovsdb/ovsdb-server.c:670
#6 0x5629da2639f6 in main ../ovsdb/ovsdb-server.c:356
#7 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308
previously allocated by thread T0 here:
#0 0x7f4f4a6a0350 in __interceptor_malloc
(/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
#1 0x5629da31f794 in xmalloc ../lib/util.c:123
#2 0x5629da29146d in ovsdb_table_create ../ovsdb/table.c:295
#3 0x5629da27b603 in ovsdb_create ../ovsdb/ovsdb.c:423
#4 0x5629da2670f3 in parse_txn ../ovsdb/ovsdb-server.c:556
#5 0x5629da2670f3 in read_db ../ovsdb/ovsdb-server.c:603
#6 0x5629da268b74 in open_db ../ovsdb/ovsdb-server.c:670
#7 0x5629da2639f6 in main ../ovsdb/ovsdb-server.c:356
#8 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-use-after-free ../ovsdb/row.c:103 in
ovsdb_row_destroy
Shadow bytes around the buggy address:
0x0c0c7fff80c0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
0x0c0c7fff80d0: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
0x0c0c7fff80e0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
0x0c0c7fff80f0: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
0x0c0c7fff8100: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff8110: fd fd fd fd fa fa fa fa[fd]fd fd fd fd fd fd fa
0x0c0c7fff8120: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
0x0c0c7fff8130: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
0x0c0c7fff8140: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 00
0x0c0c7fff8150: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
0x0c0c7fff8160: 00 00 00 00 00 00 00 00 fa fa fa fa fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==32375==ABORTING
../../tests/ovs-macros.at:194: hard failure
/home/blp/nicira/ovs/_build/tests/testsuite.dir/1920/cleanup: line 1: kill:
(32375) - No such process
1920. ovsdb-server.at:1126: 1920. schema conversion online - clustered
(ovsdb-server.at:1126): FAILED (ovs-macros.at:194)
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev