On Tue, Sep 11, 2018 at 12:56 AM Han Zhou <[email protected]> wrote: > On Mon, Sep 10, 2018 at 12:56 AM <[email protected]> wrote: > > > > From: Numan Siddique <[email protected]> > > > > The present code resets the database when it is in the state - > > 'RPL_S_SCHEMA_REQUESTED' and repopulates the database when it > > receives the monitor reply when it is in the state - > > 'RPL_S_MONITOR_REQUESTED'. If however, it goes to active mode > > before it processes the monitor reply, the whole data is lost. > > > > This patch fixes the issue by resetting the database when it > > receives the monitor reply (before processing it). so that > > reset and repopulation of the db happens in the same state. > > > > This approach still has a very small window for data loss if the > > function process_notification() which processes the monitor > > reply fails for some reason. > > > Hi Numan, > > Thanks for the patch. As discussed, this method alleviates the problem > instead of fixing the problem. So I'd suggest update the title and commit > message. > > I'd like to point out that it is not only in failure scenario of > process_notification, but also normal scenarios when data size is very big > and process_notification() for the huge data takes time. During this window > if the process is killed for whatever reason, the data is lost on this > node. I'd also like to emphasize that the patch really helps in such > scenario, because the original implementation would block at receiving that > big amount of data from master, which would leave the data wiped-out state > for much longer. > > Hi Han, Thanks for the review and the suggestions. I agree. I submitted v2 - https://patchwork.ozlabs.org/patch/968638/ with the updates in commit message and some comments in the code.
Thanks Numan Other than the commit message: > Acked-by: Han Zhou <[email protected]> > > > Reported-by: Han Zhou <[email protected]> > > Reported-at: > https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html > > Tested-by: aginwala <[email protected]> > > Signed-off-by: Numan Siddique <[email protected]> > > --- > > ovsdb/replication.c | 20 ++++++-------------- > > 1 file changed, 6 insertions(+), 14 deletions(-) > > > > diff --git a/ovsdb/replication.c b/ovsdb/replication.c > > index 2b9ae2f83..44428a48e 100644 > > --- a/ovsdb/replication.c > > +++ b/ovsdb/replication.c > > @@ -299,19 +299,7 @@ replication_run(void) > > /* After receiving schemas, reset the local databases > that > > * will be monitored and send out monitor requests for > them. */ > > if (hmap_is_empty(&request_ids)) { > > - struct shash_node *node, *next; > > - > > - SHASH_FOR_EACH_SAFE (node, next, replication_dbs) { > > - db = node->data; > > - error = reset_database(db); > > - if (error) { > > - const char *db_name = db->schema->name; > > - shash_find_and_delete(replication_dbs, > db_name); > > - ovsdb_error_assert(error); > > - VLOG_WARN("Failed to reset database, " > > - "%s not replicated.", db_name); > > - } > > - } > > + struct shash_node *node; > > > > if (shash_is_empty(replication_dbs)) { > > VLOG_WARN("Nothing to replicate."); > > @@ -335,7 +323,11 @@ replication_run(void) > > case RPL_S_MONITOR_REQUESTED: { > > /* Reply to monitor requests. */ > > struct ovsdb_error *error; > > - error = process_notification(msg->result, db); > > + VLOG_INFO("Monitor request received. Resetting the > database"); > > + error = reset_database(db); > > + if (!error) { > > + error = process_notification(msg->result, db); > > + } > > if (error) { > > ovsdb_error_assert(error); > > state = RPL_S_ERR; > > -- > > 2.17.1 > > > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
