On Wed, Aug 5, 2020 at 8:28 AM Dumitru Ceara <[email protected]> wrote: > > Every time a follower has to install a snapshot received from the > leader, it should also replace the data in memory. Right now this only > happens when snapshots are installed that also change the schema. > > This can lead to inconsistent DB data on follower nodes and the snapshot > may fail to get applied. > > CC: Han Zhou <[email protected]> > Fixes: bda1f6b60588 ("ovsdb-server: Don't disconnect clients after raft install_snapshot.") > Signed-off-by: Dumitru Ceara <[email protected]>
Thanks Dumitru! This is a great finding, and sorry for my mistake. This patch looks good to me. Just one minor comment below on the test case. Otherwise: Acked-by: Han Zhou <[email protected]> > --- > ovsdb/ovsdb-server.c | 21 +++++++++++++-------- > tests/idltest.ovsschema | 9 +++++++++ > tests/ovsdb-cluster.at | 28 +++++++++++++++++++++++++--- > tests/ovsdb-idl.at | 1 + > 4 files changed, 48 insertions(+), 11 deletions(-) > > diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c > index ef4e996..fd7891a 100644 > --- a/ovsdb/ovsdb-server.c > +++ b/ovsdb/ovsdb-server.c > @@ -543,13 +543,14 @@ parse_txn(struct server_config *config, struct db *db, > const struct ovsdb_schema *schema, const struct json *txn_json, > const struct uuid *txnid) > { > - if (schema && (!db->db->schema || strcmp(schema->version, > - db->db->schema->version))) { > + if (schema) { > /* We're replacing the schema (and the data). Destroy the database > * (first grabbing its storage), then replace it with the new schema. > * The transaction must also include the replacement data. > * > - * Only clustered database schema changes go through this path. */ > + * Only clustered database schema changes and snapshot installs > + * go through this path. > + */ > ovs_assert(txn_json); > ovs_assert(ovsdb_storage_is_clustered(db->db->storage)); > > @@ -559,11 +560,15 @@ parse_txn(struct server_config *config, struct db *db, > return error; > } > > - ovsdb_jsonrpc_server_reconnect( > - config->jsonrpc, false, > - (db->db->schema > - ? xasprintf("database %s schema changed", db->db->name) > - : xasprintf("database %s connected to storage", db->db->name))); > + if (!db->db->schema || > + strcmp(schema->version, db->db->schema->version)) { > + ovsdb_jsonrpc_server_reconnect( > + config->jsonrpc, false, > + (db->db->schema > + ? xasprintf("database %s schema changed", db->db->name) > + : xasprintf("database %s connected to storage", > + db->db->name))); > + } > > ovsdb_replace(db->db, ovsdb_create(ovsdb_schema_clone(schema), NULL)); > > diff --git a/tests/idltest.ovsschema b/tests/idltest.ovsschema > index e02b975..e04755e 100644 > --- a/tests/idltest.ovsschema > +++ b/tests/idltest.ovsschema > @@ -54,6 +54,15 @@ > }, > "isRoot" : true > }, > + "indexed": { > + "columns": { > + "i": { > + "type": "integer" > + } > + }, > + "indexes": [["i"]], > + "isRoot" : true > + }, > "simple": { > "columns": { > "b": { > diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at > index 9714545..06d27c9 100644 > --- a/tests/ovsdb-cluster.at > +++ b/tests/ovsdb-cluster.at > @@ -332,15 +332,31 @@ for i in `seq $n`; do > AT_CHECK([ovsdb_client_wait unix:s$i.ovsdb $schema_name connected]) > done > > +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest", > + {"op": "insert", > + "table": "indexed", > + "row": {"i": 1}}]]'], [0], [ignore], [ignore]) > + > # Kill one follower (s2) and write some data to cluster, so that the follower is falling behind > printf "\ns2: stopping\n" > OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s2], [s2.pid]) > > +# Delete "i":1 and readd it to get a different UUID for it. > +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest", > + {"op": "delete", > + "table": "indexed", > + "where": [["i", "==", 1]]}]]'], [0], [ignore], [ignore]) > + > AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest", > {"op": "insert", > - "table": "simple", > + "table": "indexed", > "row": {"i": 1}}]]'], [0], [ignore], [ignore]) > > +AT_CHECK([ovsdb-client transact unix:s1.ovsdb '[["idltest", > + {"op": "insert", > + "table": "indexed", > + "row": {"i": 2}}]]'], [0], [ignore], [ignore]) > + > # Compact leader online to generate snapshot > AT_CHECK([ovs-appctl -t "`pwd`"/s1 ovsdb-server/compact]) > > @@ -355,8 +371,14 @@ AT_CHECK([ovsdb_client_wait unix:s2.ovsdb $schema_name connected]) > # succeed. > AT_CHECK([ovsdb-client transact unix:s2.ovsdb '[["idltest", > {"op": "insert", > - "table": "simple", > - "row": {"i": 1}}]]'], [0], [ignore], [ignore]) > + "table": "indexed", > + "row": {"i": 3}}]]'], [0], [ignore], [ignore]) > + > +# The snapshot should overwrite the in-memory contents of the DB on S2 > +# without generating any constraint violations. > +AT_CHECK([grep 'Transaction causes multiple rows in "indexed" table to have identical values (1) for index on column "i"' -c s2.log], [1], [dnl > +0 > +]) Beside checking the error log, it may be better to simply check if the i=2 row is found in S2. Without this fix i=2 shouldn't get inserted. This can better ensure the correctness, in case the log text changes and the grep doesn't detect the problem. > > for i in `seq $n`; do > OVS_APP_EXIT_AND_WAIT_BY_TARGET([`pwd`/s$i], [s$i.pid]) > diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at > index 4efed88..789ae23 100644 > --- a/tests/ovsdb-idl.at > +++ b/tests/ovsdb-idl.at > @@ -954,6 +954,7 @@ AT_CHECK([sort stdout | uuidfilt], [0], > > # Check that ovsdb-idl figured out that table link2 and column l2 are missing. > AT_CHECK([grep ovsdb_idl stderr | sort], [0], [dnl > +test-ovsdb|ovsdb_idl|idltest database lacks indexed table (database needs upgrade?) > test-ovsdb|ovsdb_idl|idltest database lacks link2 table (database needs upgrade?) > test-ovsdb|ovsdb_idl|idltest database lacks simple5 table (database needs upgrade?) > test-ovsdb|ovsdb_idl|idltest database lacks singleton table (database needs upgrade?) > -- > 1.8.3.1 > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
