On Tue, Sep 11, 2018 at 10:30 AM <[email protected]> wrote: > > From: Numan Siddique <[email protected]> > > The present code resets the database when it is in the state - > 'RPL_S_SCHEMA_REQUESTED' and repopulates the database when it > receives the monitor reply when it is in the state - > 'RPL_S_MONITOR_REQUESTED'. If however, it goes to active mode > before it processes the monitor reply, the whole data is lost. > > This patch alleviates the problem by resetting the database when it > receives the monitor reply (before processing it). So that > reset and repopulation of the db happens in the same state. > > This approach still has a window for data loss if the function > process_notification() when processing the monitor reply fails for > some reason or ovsdb-server crashes for some reason during > process_notification(). > > Reported-by: Han Zhou <[email protected]> > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html > Tested-by: aginwala <[email protected]> > Acked-by: Han Zhou <[email protected]> > Signed-off-by: Numan Siddique <[email protected]> > --- > ovsdb/replication.c | 25 +++++++++++-------------- > 1 file changed, 11 insertions(+), 14 deletions(-) > > v1 -> v2 > -------- > * Updated the commit message as per Han's suggestion > * Added few comments in the code where it resets the db. > > diff --git a/ovsdb/replication.c b/ovsdb/replication.c > index 2b9ae2f83..752b3c89c 100644 > --- a/ovsdb/replication.c > +++ b/ovsdb/replication.c > @@ -299,19 +299,7 @@ replication_run(void) > /* After receiving schemas, reset the local databases that > * will be monitored and send out monitor requests for them. */ > if (hmap_is_empty(&request_ids)) { > - struct shash_node *node, *next; > - > - SHASH_FOR_EACH_SAFE (node, next, replication_dbs) { > - db = node->data; > - error = reset_database(db); > - if (error) { > - const char *db_name = db->schema->name; > - shash_find_and_delete(replication_dbs, db_name); > - ovsdb_error_assert(error); > - VLOG_WARN("Failed to reset database, " > - "%s not replicated.", db_name); > - } > - } > + struct shash_node *node; > > if (shash_is_empty(replication_dbs)) { > VLOG_WARN("Nothing to replicate."); > @@ -335,7 +323,16 @@ replication_run(void) > case RPL_S_MONITOR_REQUESTED: { > /* Reply to monitor requests. */ > struct ovsdb_error *error; > - error = process_notification(msg->result, db); > + VLOG_INFO("Monitor request received. Resetting the database"); > + /* Resetting the database here has few risks. If the > + * process_notification() fails, the database is completely > + * lost locally. In case that node becomes active, then > + * there is a chance of complete data loss in the active/standy > + * cluster. */ > + error = reset_database(db); > + if (!error) { > + error = process_notification(msg->result, db); > + } > if (error) { > ovsdb_error_assert(error); > state = RPL_S_ERR; > -- > 2.17.1 > > _______________________________________________ > dev mailing list > [email protected] > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Thanks Numan! _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
