Hi Yun and Girish,

I submitted a patch. Would you mind testing and reviewing it? Thanks.
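Background for reviewing the hunk that follows: `ds_put_format_valist()` formats in two passes. It first tries `vsnprintf()` into the space left in the buffer. If the output does not fit, it grows the buffer with `ds_reserve()`, recomputes `available = ds->allocated - ds->length + 1`, and formats again. Below is a minimal standalone sketch of that pattern. `struct buf`, `buf_reserve()`, and `buf_put_format()` are hypothetical stand-ins, not the OVS API. The growth policy is only assumed to resemble `ds_reserve()`. The sketch takes `...` and calls `va_start()` twice, where the real function `va_copy()`s a caller-supplied `va_list`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strcmp() in the usage checks */

/* Hypothetical stand-in for struct ds: 'allocated' excludes the byte kept
 * for the terminating NUL, so the heap block is allocated + 1 bytes. */
struct buf {
    char *string;
    size_t length;      /* Characters currently stored. */
    size_t allocated;   /* Capacity, not counting the NUL byte. */
};

/* Ensure room for at least 'min_length' characters.  The growth policy is
 * an assumption, loosely modeled on ds_reserve(). */
static void
buf_reserve(struct buf *b, size_t min_length)
{
    if (min_length > b->allocated || !b->string) {
        b->allocated += min_length > b->allocated ? min_length : b->allocated;
        if (b->allocated < 8) {
            b->allocated = 8;
        }
        char *p = realloc(b->string, b->allocated + 1);
        if (!p) {
            abort();
        }
        b->string = p;
    }
}

/* Append printf-style output to 'b' using at most two vsnprintf() passes. */
static void
buf_put_format(struct buf *b, const char *format, ...)
{
    va_list args;
    size_t available = b->string ? b->allocated - b->length + 1 : 0;

    /* Pass 1: format into the free space (possibly none).  vsnprintf()
     * returns the full length the output needs, even when it truncates. */
    va_start(args, format);
    int needed = vsnprintf(b->string ? &b->string[b->length] : NULL,
                           available, format, args);
    va_end(args);
    if (needed < 0) {
        abort();                /* Output or encoding error. */
    }

    if ((size_t) needed >= available) {
        /* Pass 2: make room for the existing text plus 'needed' characters,
         * then format again into the enlarged free space. */
        buf_reserve(b, b->length + needed);
        available = b->allocated - b->length + 1;
        va_start(args, format);
        needed = vsnprintf(&b->string[b->length], available, format, args);
        va_end(args);
        assert(needed >= 0 && (size_t) needed < available);
    }
    b->length += needed;
}
```

The second pass depends on `allocated - length + 1` being larger than `needed` once the buffer has been reserved. The `assert()` in the sketch checks that invariant directly, which is the same kind of check whose failure the patch subject refers to.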
[PATCH] dynamic-string: Fix a bug that leads to assertion fail

diff --git a/lib/dynamic-string.c b/lib/dynamic-string.c
index 6f7b610a9908..4564e420544d 100644
--- a/lib/dynamic-string.c
+++ b/lib/dynamic-string.c
@@ -158,7 +158,7 @@ ds_put_format_valist(struct ds *ds, const char *format, va_list args_)
     if (needed < available) {
         ds->length += needed;
     } else {
-        ds_reserve(ds, ds->length + needed);
+        ds_reserve(ds, ds->allocated + needed);
 
         va_copy(args, args_);
         available = ds->allocated - ds->length + 1;

Thanks,
Yifeng Sun

On Wed, Jul 18, 2018 at 10:48 AM, Girish Moodalbail <gmoodalb...@gmail.com> wrote:
> Hello all,
>
> We are able to reproduce this issue on OVS 2.9.2 at will. The OVSDB NB
> server or OVSDB SB server dumps core while it is trying to compact the
> database.
>
> You can reproduce the issue by using:
>
> root@u1804-HVM-domU:/var/crash# ovs-appctl -t /var/run/openvswitch/ovnsb_db.ctl ovsdb-server/compact OVN_Southbound
>
> 2018-07-18T17:34:29Z|00001|unixctl|WARN|error communicating with unix:/var/run/openvswitch/ovnsb_db.ctl: End of file
> ovs-appctl: /var/run/openvswitch/ovnsb_db.ctl: transaction error (End of file)
> root@u1804-HVM-domU:/var/crash#
> root@u1804-HVM-domU:/var/crash#
> root@u1804-HVM-domU:/var/crash# ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: called for pid 14683, signal 6, core limit 0, dump mode 1
> ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: executable: /usr/sbin/ovsdb-server (command line "ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/openvswitch/ovsdb-server-sb.log --remote=punix:/var/run/openvswitch/ovnsb_db.sock --pidfile=/var/run/openvswitch/ovnsb_db.pid --unixctl=ovnsb_db.ctl --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections --private-key=db:OVN_Southbound,SSL,private_key --certificate=db:OVN_Southbound,SSL,certificate --ca-cert=db:OVN_Southbound,SSL,ca_cert --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers --remote=ptcp:6642:10.0.7.33 /etc/openvswitch/ovnsb_db.db")
> ERROR: apport (pid 17393) Wed Jul 18 10:34:23 2018: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
> ERROR: apport (pid 17393) Wed Jul 18 10:34:29 2018: wrote report /var/crash/_usr_sbin_ovsdb-server.0.crash
>
> Looking through the crash we see the following stack:
>
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> #1  0x00007f7c9a43c801 in __GI_abort () at abort.c:79
> #2  0x00007f7c9aaa633c in json_serialize (json=<optimized out>, s=<optimized out>) at lib/json.c:1554
> #3  0x00007f7c9aaa63ab in json_serialize_object_member (i=<optimized out>, s=<optimized out>, node=<optimized out>, node=<optimized out>) at lib/json.c:1583
> #4  0x00007f7c9aaa62f2 in json_serialize_object (s=0x7ffca2173ea0, object=0x5568dc5d5b10) at lib/json.c:1612
> #5  json_serialize (json=<optimized out>, s=0x7ffca2173ea0) at lib/json.c:1533
> #6  0x00007f7c9aaa863c in json_to_ds (json=json@entry=0x5568dc5d4a20, flags=flags@entry=0, ds=ds@entry=0x7ffca2173f30) at lib/json.c:1511
> #7  0x00007f7c9ae6750f in ovsdb_log_compose_record (json=json@entry=0x5568dc5d4a20, magic=0x5568dc5d5a60 "CLUSTER", header=header@entry=0x7ffca2173f10, data=data@entry=0x7ffca2173f30) at ovsdb/log.c:570
> #8  0x00007f7c9ae677ef in ovsdb_log_write (file=0x5568dc5d5a80, json=0x5568dc5d4a20) at ovsdb/log.c:618
> #9  0x00007f7c9ae6796e in ovsdb_log_write_and_free (log=log@entry=0x5568dc5d5a80, json=0x5568dc5d4a20) at ovsdb/log.c:651
> #10 0x00007f7c9ae6d684 in raft_write_snapshot (raft=raft@entry=0x5568dc1e3720, log=0x5568dc5d5a80, new_log_start=new_log_start@entry=539578, new_snapshot=new_snapshot@entry=0x7ffca21740e0) at ovsdb/raft.c:3588
> #11 0x00007f7c9ae6dbf3 in raft_save_snapshot (raft=raft@entry=0x5568dc1e3720, new_start=new_start@entry=539578, new_snapshot=new_snapshot@entry=0x7ffca21740e0) at ovsdb/raft.c:3647
> #12 0x00007f7c9ae757bd in raft_store_snapshot (raft=0x5568dc1e3720, new_snapshot_data=new_snapshot_data@entry=0x5568dc5d49a0) at ovsdb/raft.c:3849
> #13 0x00007f7c9ae7c7ae in ovsdb_storage_store_snapshot__ (storage=0x5568dc6b2fb0, schema=0x5568dd66f5a0, data=0x5568dca67880) at ovsdb/storage.c:541
> #14 0x00007f7c9ae7d1de in ovsdb_storage_store_snapshot (storage=0x5568dc6b2fb0, schema=schema@entry=0x5568dd66f5a0, data=data@entry=0x5568dca67880) at ovsdb/storage.c:568
> #15 0x00007f7c9ae69cab in ovsdb_snapshot (db=0x5568dc6b3020) at ovsdb/ovsdb.c:519
> #16 0x00005568daec1f82 in main_loop (is_backup=0x7ffca21742be, exiting=0x7ffca21742bf, run_process=0x0, remotes=0x7ffca2174310, unixctl=0x5568dc71ade0, all_dbs=0x7ffca2174350, jsonrpc=0x5568dc1e36a0, config=0x7ffca2174370) at ovsdb/ovsdb-server.c:239
> #17 main (argc=<optimized out>, argv=<optimized out>) at ovsdb/ovsdb-server.c:457
>
> Walking through the JSON objects being serialized we see that
> "prev_servers" is malformed.
>
> (gdb) print *((struct shash *)0x5568dc5d5b10)
> $3 = {
>   map = {
>     buckets = 0x5568dc5d1d30,
>     one = 0x0,
>     mask = 7,
>     n = 9
>   }
> }
>
> (gdb) x/6a 0x5568dc5d1d30
> 0x5568dc5d1d30: 0x5568dc5d6000  0x0
> 0x5568dc5d1d40: 0x0             0x5568dc5d5f30
> 0x5568dc5d1d50: 0x5568dc5d5e30  0x5568dc5d5bc0
>
> Let us look at the next one
>
> (gdb) print *((struct shash_node *)0x5568dc5d5e30)
> $7 = {
>   node = {
>     hash = 2043875868,
>     next = 0x0
>   },
>   name = 0x5568dc5d5e10 "prev_servers",
>   data = 0x5568dc688cd0
> }
>
> (gdb) print *((struct json *)0x5568dc688cd0)
> $10 = {
>   type = 3697839232,
>   count = 34,
>   u = {
>     object = 0x5568dc688cb0,
>     array = {
>       n = 93908862799024,
>       n_allocated = 93908862798944,
>       elems = 0x5568dc22f050
>     },
>     integer = 93908862799024,
>     real = 4.6397142949016804e-310,
>     string = 0x5568dc688cb0 "\a"
>   }
> }
>
> So, this is malformed. Somehow "prev_servers" is getting malformed.
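The gdb walk above relies on how the `hmap` inside `struct shash` places nodes: a node with hash `h` hangs off `buckets[h & mask]`, so `mask = 7` means 8 buckets. Below is a small sketch that redoes that arithmetic for the values in the dump and checks the printed `type` against the valid range. The enum is assumed to mirror `enum json_type` in OVS `lib/json.h`. `bucket_index()`, `bucket_slot_addr()`, and `json_type_is_valid()` are hypothetical helpers, not OVS functions, and the 8-byte pointer size is an assumption matching the x86_64 addresses in the dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors enum json_type from OVS lib/json.h (2.9 era).
 * Only the number of valid values matters for the check below. */
enum json_type {
    JSON_NULL,
    JSON_FALSE,
    JSON_TRUE,
    JSON_OBJECT,
    JSON_ARRAY,
    JSON_INTEGER,
    JSON_REAL,
    JSON_STRING,
    JSON_N_TYPES            /* Count of valid types, not a type itself. */
};

/* Pointer size on the x86_64 host that produced the core (assumed). */
#define PTR_SIZE 8u

/* Bucket whose chain holds a node with 'hash': buckets[hash & mask],
 * where mask is the number of buckets minus one. */
static size_t
bucket_index(uint32_t hash, size_t mask)
{
    return hash & mask;
}

/* Address of that bucket slot, i.e. the cell shown by "x/6a buckets". */
static uint64_t
bucket_slot_addr(uint64_t buckets, uint32_t hash, size_t mask)
{
    return buckets + (uint64_t) bucket_index(hash, mask) * PTR_SIZE;
}

/* True if a raw 'type' value read out of a struct json names a real type. */
static bool
json_type_is_valid(uint64_t raw_type)
{
    return raw_type < JSON_N_TYPES;
}
```

For hash 2043875868 and mask 7 the sketch gives bucket 4, whose slot is at 0x5568dc5d1d50: the first column of the third row of the `x/6a` dump, holding 0x5568dc5d5e30, which is the "prev_servers" node printed next. A `type` of 3697839232 is far outside 0 through 7, which is consistent with `json_serialize()` aborting at lib/json.c:1554, although the dump alone does not show how the node came to hold that value.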
>
> That information is coming in from 'struct raft`snap`servers'
>
> Has anyone seen this before?
>
> On Fri, Jul 13, 2018 at 3:49 PM, Yun Zhou <y...@nvidia.com> wrote:
>
> > Hi,
> >
> > We are running into some issues while trying out a 3-node raft
> > ovsdb cluster in our lab, and hopefully we can get some help from the
> > community.
> >
> > We are using ovs 2.9.2.
> > -------------------------
> >
> > We found that on one of the 3 nodes, the SB ovsdb-server was not running
> > and could not be restarted because its database was already corrupted:
> >
> > "ovsdb-server: syntax "{"encaps":["uuid","7f0f7605-c1d1-43fb-826a-1718ea70e088"],"hostname":"nd-sdn-dgx-010"}": syntax error: hostname is not a UUID"
> >
> > According to the ovsdb-server-sb log file history, the SB ovsdb-server
> > dumped core several days ago:
> >
> > "2018-07-08T06:58:15.267Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 937 died, killed (Aborted), core dumped, restarting"
> >
> > Unfortunately, the core dump was not generated.
> >
> > FWIW, we saw core dumps for the NB ovsdb on all 3 cluster nodes; here is
> > one of the stacks:
> >
> > (gdb) bt
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > #1  0x00007fc48f8c2801 in __GI_abort () at abort.c:79
> > #2  0x00007fc48ff2c33c in ?? () from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
> > #3  0x00007fc48ff2c2f2 in ?? () from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
> > #4  0x00007fc48ff2e63c in json_to_ds () from /usr/lib/x86_64-linux-gnu/libopenvswitch-2.9.so.0
> > #5  0x00007fc4902ed50f in ovsdb_log_compose_record () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #6  0x00007fc4902ed7ef in ovsdb_log_write () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #7  0x00007fc4902ed96e in ovsdb_log_write_and_free () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #8  0x00007fc4902f3684 in ?? () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #9  0x00007fc4902f3bf3 in ?? () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #10 0x00007fc4902fb7bd in raft_store_snapshot () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #11 0x00007fc4903027ae in ?? () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #12 0x00007fc4903031de in ovsdb_storage_store_snapshot () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #13 0x00007fc4902efcab in ovsdb_snapshot () from /usr/lib/x86_64-linux-gnu/libovsdb-2.9.so.0
> > #14 0x0000561e47a8cf82 in ?? ()
> > #15 0x00007fc48f8a3b97 in __libc_start_main (main=0x561e47a8bef0, argc=17, argv=0x7ffe000ce2c8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe000ce2b8) at ../csu/libc-start.c:310
> > #16 0x0000561e47a8db9a in ?? ()
> >
> > Please let us know if any more information is needed. Thanks very much!
> >
> > - Yun
> >
> > _______________________________________________
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev