On Mon, Feb 25, 2019 at 09:25:03AM -0800, Han Zhou wrote:
> In scalability test with ovn-scale-test, ovsdb-server SB load is not a
> problem at least with 1k HVs. However, if we restart the ovsdb-server,
> depending on the number of HVs and scale of logical objects, e.g. the
> number of logical ports, ovsdb-server of SB become an obvious bottleneck.
> 
> In our test with 1k HVs and 20k logical ports (200 lport * 100 lswitches
> connected by one single logical router). Restarting ovsdb-server of SB
> resulted in 100% CPU of ovsdb-server for more than 1 hour. All HVs (and
> northd) are reconnecting and resyncing the big amount of data at the same
> time.
> 
> Similar problem would happen in failover scenario. With active-active
> cluster, the problem can be aleviated slightly, because only 1/3 (assuming
> it is 3-node cluster) of the HVs will need to resync data from new servers,
> but it is still a serious problem.
> 
> For detailed discussions for the problem and solutions, see:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047591.html

Thanks.

When I apply this series, I get a reproducible test failure in test
1920 "schema conversion online - clustered".  It's an error from Address
Sanitizer.  I'm appending the testsuite.log.

Obviously this needs to be fixed but I'll take a look at the patches in
more detail now too.

#                             -*- compilation -*-
1920. ovsdb-server.at:1126: testing schema conversion online - clustered ...
../../tests/ovsdb-server.at:869: if test $model = standalone; then
                  ovsdb-tool create db schema
              else
                  ovsdb-tool create-cluster db schema unix:s1.raft
              fi
../../tests/ovsdb-server.at:876: ovsdb-server -vfile -vvlog:off -vconsole:off 
--detach --no-chdir --pidfile --log-file --remote=punix:db.sock db
../../tests/ovsdb-server.at:880: for pair in 'zero 0' 'one 1' 'two 2' 'three 3' 
'four 4' 'five 5'; do
          set -- $pair
          ovsdb-client transact '
            ["ordinals",
             {"op": "insert",
              "table": "ordinals",
              "row": {"name": "'$1'", "number": '$2'}},
             {"op": "comment",
              "comment": "add row for '"$pair"'"}]'
        done | uuidfilt
stderr:
../../tests/ovsdb-server.at:900: ovsdb-client needs-conversion schema
../../tests/ovsdb-server.at:902: ovsdb-client needs-conversion new-schema
../../tests/ovsdb-server.at:907: ovsdb-client -vfile -vvlog:off --detach 
--no-chdir --pidfile=monitor-ordinals-aware.pid 
--log-file=monitor-ordinals-aware.log --db-change-aware --no-headings monitor 
ordinals ordinals number name > monitor-ordinals-aware.stdout 2> 
monitor-ordinals-aware.stderr
../../tests/ovsdb-server.at:912: ovsdb-client -vfile -vvlog:off --detach 
--no-chdir --pidfile=monitor-ordinals-unaware.pid 
--log-file=monitor-ordinals-unaware.log --no-db-change-aware --no-headings 
monitor ordinals ordinals number name > monitor-ordinals-unaware.stdout 2> 
monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:919: ovsdb-client -vfile -vvlog:off --detach 
--no-chdir --pidfile=monitor-server-aware.pid 
--log-file=monitor-server-aware.log --db-change-aware --no-headings monitor 
_Server Database name > monitor-server-aware.stdout 2> 
monitor-server-aware.stderr
../../tests/ovsdb-server.at:924: ovsdb-client -vfile -vvlog:off --detach 
--no-chdir --pidfile=monitor-server-unaware.pid 
--log-file=monitor-server-unaware.log --no-db-change-aware --no-headings 
monitor _Server Database name > monitor-server-unaware.stdout 2> 
monitor-server-unaware.stderr
../../tests/ovsdb-server.at:938: ovsdb-client -vfile -vvlog:off --detach 
--no-chdir --pidfile=trigger-ordinals-aware.pid 
--log-file=trigger-ordinals-aware.log --db-change-aware transact 
"$ordinals_txn"  > trigger-ordinals-aware.stdout 2> 
trigger-ordinals-aware.stderr
../../tests/ovsdb-server.at:943: ovsdb-client -vfile -vvlog:off --detach 
--no-chdir --pidfile=trigger-ordinals-unaware.pid 
--log-file=trigger-ordinals-unaware.log --no-db-change-aware transact  
"$ordinals_txn" > trigger-ordinals-unaware.stdout 2> 
trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:957: ovsdb-client -vfile -vvlog:off --detach 
--no-chdir --pidfile=trigger-server-aware.pid 
--log-file=trigger-server-aware.log --db-change-aware transact "$server_txn"  > 
trigger-server-aware.stdout 2> trigger-server-aware.stderr
../../tests/ovsdb-server.at:962: ovsdb-client -vfile -vvlog:off --detach 
--no-chdir --pidfile=trigger-server-unaware.pid 
--log-file=trigger-server-unaware.log --no-db-change-aware transact  
"$server_txn" > trigger-server-unaware.stdout 2> trigger-server-unaware.stderr
../../tests/ovsdb-server.at:968: ovsdb-client dump unix:db.sock ordinals
stdout:
ordinals table
_uuid                                name  number
------------------------------------ ----- ------
c495692b-f991-440b-865b-ed7c61fe6412 five  5
d731216a-16c9-4b78-86da-31c65b0b8358 four  4
3e3e6998-2633-4bd0-bb97-3c3c4cbd7227 one   1
7fbcde1d-30ee-47ea-bf24-a40f16f05b15 three 3
6a7387b0-8115-48e9-a202-e2898a8f1772 two   2
79dcb8d0-7738-417c-b779-40d530c87c90 zero  0
../../tests/ovsdb-server.at:969: uuidfilt stdout
../../tests/ovsdb-server.at:982: ovsdb-client convert new-schema
../../tests/ovsdb-server.at:985: ovsdb-client needs-conversion schema
../../tests/ovsdb-server.at:987: ovsdb-client needs-conversion new-schema
aware
ovsdb-server.at:994: waiting while test -e monitor-ordinals-$x.pid...
ovsdb-server.at:994: wait succeeded immediately
../../tests/ovsdb-server.at:995: sort -k 3 monitor-ordinals-$x.stdout | uuidfilt
unaware
ovsdb-server.at:994: waiting while test -e monitor-ordinals-$x.pid...
ovsdb-server.at:994: wait succeeded immediately
../../tests/ovsdb-server.at:995: sort -k 3 monitor-ordinals-$x.stdout | uuidfilt
../../tests/ovsdb-server.at:1004: sed 's/.*: //' monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1006: sed 's/.*: //' monitor-ordinals-aware.stderr
../../tests/ovsdb-server.at:1012: sort -k 3 monitor-server-$x.stdout | uuidfilt
../../tests/ovsdb-server.at:1012: sort -k 3 monitor-server-$x.stdout | uuidfilt
ovsdb-server.at:1017: waiting while test -e monitor-server-unaware.pid...
ovsdb-server.at:1017: wait succeeded immediately
../../tests/ovsdb-server.at:1018: sed 's/.*: //' monitor-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1020: test -e monitor-server-aware.pid
ovsdb-server.at:1025: waiting while test -e trigger-ordinals-$x.pid...
ovsdb-server.at:1025: wait succeeded immediately
../../tests/ovsdb-server.at:1026: cat trigger-ordinals-$x.stdout
ovsdb-server.at:1025: waiting while test -e trigger-ordinals-$x.pid...
ovsdb-server.at:1025: wait succeeded immediately
../../tests/ovsdb-server.at:1026: cat trigger-ordinals-$x.stdout
../../tests/ovsdb-server.at:1028: cat trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1030: cat trigger-ordinals-aware.stderr
../../tests/ovsdb-server.at:1036: cat trigger-server-$x.stdout
../../tests/ovsdb-server.at:1036: cat trigger-server-$x.stdout
ovsdb-server.at:1038: waiting while test -e trigger-server-unaware.pid...
ovsdb-server.at:1038: wait succeeded immediately
../../tests/ovsdb-server.at:1039: sed 's/.*: //' trigger-ordinals-unaware.stderr
../../tests/ovsdb-server.at:1041: test -e trigger-server-aware.pid
../../tests/ovsdb-server.at:1051: ovsdb-client dump unix:db.sock ordinals | 
uuidfilt
../../tests/ovsdb-server.at:1065: ovsdb-client transact '
         ["ordinals",
          {"op": "insert",
           "table": "ordinals",
           "row": {"number": 6}},
          {"op": "comment",
           "comment": "add row for 6"}]' | uuidfilt
../../tests/ovsdb-server.at:1078: ovsdb-client dump unix:db.sock ordinals | 
uuidfilt
../../tests/ovsdb-server.at:1092: test -e $OVS_RUNDIR/ovsdb-server.pid
../../tests/ovsdb-server.at:1092: ovs-appctl --timeout=10 -t ovsdb-server exit
ovsdb-server.at:1092: waiting while kill -0 $TMPPID 2>/dev/null...
ovsdb-server.at:1092: wait succeeded after 1 seconds
../../tests/ovsdb-server.at:1093: ovsdb-server -vfile -vvlog:off -vconsole:off 
--detach --no-chdir --pidfile --log-file --remote=punix:db.sock db
../../tests/ovsdb-server.at:1095: ovsdb-client dump unix:db.sock ordinals | 
uuidfilt
../../tests/ovsdb-server.at:1111: test -h db
../../tests/ovsdb-server.at:1112: test -h .db.~lock~
../../tests/ovsdb-server.at:1113: test -f dir/db
../../tests/ovsdb-server.at:1114: test -f dir/.db.~lock~
../../tests/ovsdb-server.at:1117: test -e $OVS_RUNDIR/ovsdb-server.pid
../../tests/ovsdb-server.at:1117: ovs-appctl --timeout=10 -t ovsdb-server exit
ovsdb-server.at:1117: waiting while kill -0 $TMPPID 2>/dev/null...
ovsdb-server.at:1117: wait succeeded after 1 seconds
Address Sanitizer reported errors in: asan.32224 asan.32375
=================================================================
==32224==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000005c0 
at pc 0x560f3a96859f bp 0x7ffee562c110 sp 0x7ffee562c108
READ of size 8 at 0x6060000005c0 thread T0
    #0 0x560f3a96859e in ovsdb_row_destroy ../ovsdb/row.c:103
    #1 0x560f3a973520 in ovsdb_txn_destroy_cloned ../ovsdb/transaction.c:941
    #2 0x560f3a97a234 in ovsdb_txn_history_destroy ../ovsdb/transaction.c:1418
    #3 0x560f3a9599de in ovsdb_destroy ../ovsdb/ovsdb.c:459
    #4 0x560f3a94600e in close_db ../ovsdb/ovsdb-server.c:522
    #5 0x560f3a941cd5 in main ../ovsdb/ovsdb-server.c:463
    #6 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308
    #7 0x560f3a9438f9 in _start 
(/home/blp/nicira/ovs/_build/ovsdb/ovsdb-server+0x508f9)

0x6060000005c0 is located 0 bytes inside of 56-byte region 
[0x6060000005c0,0x6060000005f8)
freed by thread T0 here:
    #0 0x7f21536fafd0 in __interceptor_free 
(/lib/x86_64-linux-gnu/libasan.so.5+0xe8fd0)
    #1 0x560f3a959b3b in ovsdb_destroy ../ovsdb/ovsdb.c:467
    #2 0x560f3a959b3b in ovsdb_destroy ../ovsdb/ovsdb.c:447
    #3 0x560f3a945117 in parse_txn ../ovsdb/ovsdb-server.c:556
    #4 0x560f3a945117 in read_db ../ovsdb/ovsdb-server.c:603
    #5 0x560f3a9401f9 in main_loop ../ovsdb/ovsdb-server.c:232
    #6 0x560f3a9401f9 in main ../ovsdb/ovsdb-server.c:458
    #7 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7f21536fb350 in __interceptor_malloc 
(/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
    #1 0x560f3a9fd794 in xmalloc ../lib/util.c:123
    #2 0x560f3a96f46d in ovsdb_table_create ../ovsdb/table.c:295
    #3 0x560f3a959603 in ovsdb_create ../ovsdb/ovsdb.c:423
    #4 0x560f3a9450f3 in parse_txn ../ovsdb/ovsdb-server.c:556
    #5 0x560f3a9450f3 in read_db ../ovsdb/ovsdb-server.c:603
    #6 0x560f3a946b74 in open_db ../ovsdb/ovsdb-server.c:670
    #7 0x560f3a9419f6 in main ../ovsdb/ovsdb-server.c:356
    #8 0x7f2152f4209a in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free ../ovsdb/row.c:103 in 
ovsdb_row_destroy
Shadow bytes around the buggy address:
  0x0c0c7fff8060: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8070: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
  0x0c0c7fff8080: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff8090: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c0c7fff80a0: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff80b0: fd fd fd fd fa fa fa fa[fd]fd fd fd fd fd fd fa
  0x0c0c7fff80c0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff80d0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fff80e0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff80f0: fa fa fa fa 00 00 00 00 00 00 00 fa fa fa fa fa
  0x0c0c7fff8100: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==32224==ABORTING
=================================================================
==32375==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000008c0 
at pc 0x5629da28a59f bp 0x7ffc98cc14e0 sp 0x7ffc98cc14d8
READ of size 8 at 0x6060000008c0 thread T0
    #0 0x5629da28a59e in ovsdb_row_destroy ../ovsdb/row.c:103
    #1 0x5629da295520 in ovsdb_txn_destroy_cloned ../ovsdb/transaction.c:941
    #2 0x5629da29c234 in ovsdb_txn_history_destroy ../ovsdb/transaction.c:1418
    #3 0x5629da27b9de in ovsdb_destroy ../ovsdb/ovsdb.c:459
    #4 0x5629da26800e in close_db ../ovsdb/ovsdb-server.c:522
    #5 0x5629da263cd5 in main ../ovsdb/ovsdb-server.c:463
    #6 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308
    #7 0x5629da2658f9 in _start 
(/home/blp/nicira/ovs/_build/ovsdb/ovsdb-server+0x508f9)

0x6060000008c0 is located 0 bytes inside of 56-byte region 
[0x6060000008c0,0x6060000008f8)
freed by thread T0 here:
    #0 0x7f4f4a69ffd0 in __interceptor_free 
(/lib/x86_64-linux-gnu/libasan.so.5+0xe8fd0)
    #1 0x5629da27bb3b in ovsdb_destroy ../ovsdb/ovsdb.c:467
    #2 0x5629da27bb3b in ovsdb_destroy ../ovsdb/ovsdb.c:447
    #3 0x5629da267117 in parse_txn ../ovsdb/ovsdb-server.c:556
    #4 0x5629da267117 in read_db ../ovsdb/ovsdb-server.c:603
    #5 0x5629da268b74 in open_db ../ovsdb/ovsdb-server.c:670
    #6 0x5629da2639f6 in main ../ovsdb/ovsdb-server.c:356
    #7 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308

previously allocated by thread T0 here:
    #0 0x7f4f4a6a0350 in __interceptor_malloc 
(/lib/x86_64-linux-gnu/libasan.so.5+0xe9350)
    #1 0x5629da31f794 in xmalloc ../lib/util.c:123
    #2 0x5629da29146d in ovsdb_table_create ../ovsdb/table.c:295
    #3 0x5629da27b603 in ovsdb_create ../ovsdb/ovsdb.c:423
    #4 0x5629da2670f3 in parse_txn ../ovsdb/ovsdb-server.c:556
    #5 0x5629da2670f3 in read_db ../ovsdb/ovsdb-server.c:603
    #6 0x5629da268b74 in open_db ../ovsdb/ovsdb-server.c:670
    #7 0x5629da2639f6 in main ../ovsdb/ovsdb-server.c:356
    #8 0x7f4f49ee709a in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: heap-use-after-free ../ovsdb/row.c:103 in 
ovsdb_row_destroy
Shadow bytes around the buggy address:
  0x0c0c7fff80c0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff80d0: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
  0x0c0c7fff80e0: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff80f0: fa fa fa fa fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c0c7fff8100: fd fd fd fd fd fd fd fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff8110: fd fd fd fd fa fa fa fa[fd]fd fd fd fd fd fd fa
  0x0c0c7fff8120: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8130: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fff8140: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c0c7fff8150: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8160: 00 00 00 00 00 00 00 00 fa fa fa fa fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==32375==ABORTING
../../tests/ovs-macros.at:194: hard failure
/home/blp/nicira/ovs/_build/tests/testsuite.dir/1920/cleanup: line 1: kill: 
(32375) - No such process
1920. ovsdb-server.at:1126: 1920. schema conversion online - clustered 
(ovsdb-server.at:1126): FAILED (ovs-macros.at:194)
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to