From: Numan Siddique <[email protected]>

The present code resets the database when it is in the state -
'RPL_S_SCHEMA_REQUESTED' and repopulates the database when it
receives the monitor reply when it is in the state -
'RPL_S_MONITOR_REQUESTED'. If however, it goes to active mode
before it processes the monitor reply, the whole data is lost.

This patch alleviates the problem by resetting the database when it
receives the monitor reply (before processing it). So that
reset and repopulation of the db happens in the same state.

This approach still has a window for data loss if the function
process_notification() when processing the monitor reply fails for
some reason or ovsdb-server crashes for some reason during
process_notification().

Reported-by: Han Zhou <[email protected]>
Reported-at: 
https://mail.openvswitch.org/pipermail/ovs-discuss/2018-August/047161.html
Tested-by: aginwala <[email protected]>
Acked-by: Han Zhou <[email protected]>
Signed-off-by: Numan Siddique <[email protected]>
---
 ovsdb/replication.c | 25 +++++++++++--------------
 1 file changed, 11 insertions(+), 14 deletions(-)

v1 -> v2
--------
 * Updated the commit message as per Han's suggestion
 * Added few comments in the code where it resets the db.

diff --git a/ovsdb/replication.c b/ovsdb/replication.c
index 2b9ae2f83..752b3c89c 100644
--- a/ovsdb/replication.c
+++ b/ovsdb/replication.c
@@ -299,19 +299,7 @@ replication_run(void)
                 /* After receiving schemas, reset the local databases that
                  * will be monitored and send out monitor requests for them. */
                 if (hmap_is_empty(&request_ids)) {
-                    struct shash_node *node, *next;
-
-                    SHASH_FOR_EACH_SAFE (node, next, replication_dbs) {
-                        db = node->data;
-                        error = reset_database(db);
-                        if (error) {
-                            const char *db_name = db->schema->name;
-                            shash_find_and_delete(replication_dbs, db_name);
-                            ovsdb_error_assert(error);
-                            VLOG_WARN("Failed to reset database, "
-                                      "%s not replicated.", db_name);
-                        }
-                    }
+                    struct shash_node *node;
 
                     if (shash_is_empty(replication_dbs)) {
                         VLOG_WARN("Nothing to replicate.");
@@ -335,7 +323,16 @@ replication_run(void)
             case RPL_S_MONITOR_REQUESTED: {
                 /* Reply to monitor requests. */
                 struct ovsdb_error *error;
-                error = process_notification(msg->result, db);
+                VLOG_INFO("Monitor request received. Resetting the database");
+                /* Resetting the database here has few risks. If the
+                 * process_notification() fails, the database is completely
+                 * lost locally. In case that node becomes active, then
+                 * there is a chance of complete data loss in the active/standy
+                 * cluster. */
+                error = reset_database(db);
+                if (!error) {
+                    error = process_notification(msg->result, db);
+                }
                 if (error) {
                     ovsdb_error_assert(error);
                     state = RPL_S_ERR;
-- 
2.17.1

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to