On Tue, Sep 11, 2018 at 10:55:56AM -0700, Han Zhou wrote: > On Tue, Sep 11, 2018 at 10:30 AM <[email protected]> wrote: > > > > From: Numan Siddique <[email protected]> > > > > The present code resets the database when it is in the state - > > 'RPL_S_SCHEMA_REQUESTED' and repopulates the database when it > > receives the monitor reply when it is in the state - > > 'RPL_S_MONITOR_REQUESTED'. If however, it goes to active mode > > before it processes the monitor reply, the whole data is lost. > > > > This patch alleviates the problem by resetting the database when it > > receives the monitor reply (before processing it). So that > > reset and repopulation of the db happens in the same state. > > > > This approach still has a window for data loss if the function > > process_notification() when processing the monitor reply fails for > > some reason or ovsdb-server crashes for some reason during > > process_notification(). > > > > Reported-by: Han Zhou <[email protected]> > > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html > > Tested-by: aginwala <[email protected]> > > Acked-by: Han Zhou <[email protected]> > > Signed-off-by: Numan Siddique <[email protected]> > > --- > > ovsdb/replication.c | 25 +++++++++++-------------- > > 1 file changed, 11 insertions(+), 14 deletions(-) > > > > v1 -> v2 > > -------- > > * Updated the commit message as per Han's suggestion > > * Added few comments in the code where it resets the db. > > > > diff --git a/ovsdb/replication.c b/ovsdb/replication.c > > index 2b9ae2f83..752b3c89c 100644 > > --- a/ovsdb/replication.c > > +++ b/ovsdb/replication.c > > @@ -299,19 +299,7 @@ replication_run(void) > > /* After receiving schemas, reset the local databases > that > > * will be monitored and send out monitor requests for > them. */ > > if (hmap_is_empty(&request_ids)) { > > - struct shash_node *node, *next; > > - > > - SHASH_FOR_EACH_SAFE (node, next, replication_dbs) { > > - db = node->data; > > - error = reset_database(db); > > - if (error) { > > - const char *db_name = db->schema->name; > > - shash_find_and_delete(replication_dbs, > db_name); > > - ovsdb_error_assert(error); > > - VLOG_WARN("Failed to reset database, " > > - "%s not replicated.", db_name); > > - } > > - } > > + struct shash_node *node; > > > > if (shash_is_empty(replication_dbs)) { > > VLOG_WARN("Nothing to replicate."); > > @@ -335,7 +323,16 @@ replication_run(void) > > case RPL_S_MONITOR_REQUESTED: { > > /* Reply to monitor requests. */ > > struct ovsdb_error *error; > > - error = process_notification(msg->result, db); > > + VLOG_INFO("Monitor request received. Resetting the > database"); > > + /* Resetting the database here has few risks. If the > > + * process_notification() fails, the database is > completely > > + * lost locally. In case that node becomes active, then > > + * there is a chance of complete data loss in the > active/standy > > + * cluster. */ > > + error = reset_database(db); > > + if (!error) { > > + error = process_notification(msg->result, db); > > + } > > if (error) { > > ovsdb_error_assert(error); > > state = RPL_S_ERR; > > -- > > 2.17.1 > > > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > Thanks Numan!
Thanks Numan and Han. I applied this to master. If you'd like it backported, please let me know (and how far). _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
