On 4/22/20 8:38 PM, Dumitru Ceara wrote:
> Assuming an ovsdb client connected to a database using OVSDB_MONITOR_V3
> (i.e., "monitor_cond_since" method) with the initial monitor condition
> MC1.
> 
> Assuming the following two transactions are executed on the
> ovsdb-server:
> TXN1: "insert record R1 in table T1"
> TXN2: "insert record R2 in table T2"
> 
> If the client's monitor condition MC1 for table T2 matches R2 then the
> client will receive the following update3 message:
> method="update3", "insert record R2 in table T2", last-txn-id=TXN2
> 
> At this point, if the presence of the new record R2 in the IDL triggers
> the client to update its monitor condition to MC2 and add a clause for
> table T1 which matches R1, a monitor_cond_change message is sent to the
> server:
> method="monitor_cond_change", "clauses from MC2"
> 
> In normal operation the ovsdb-server will reply with a new update3
> message of the form:
> method="update3", "insert record R1 in table T1", last-txn-id=TXN2
> 
> However, if the connection drops in the meantime, this last update might
> get lost.
> 
> It might also happen that during the reconnect a new transaction
> modifies the original record R1:
> TXN3: "modify record R1 in table T1"
> 
> When the client reconnects, it will try to perform a fast resync by
> sending:
> method="monitor_cond_since", "clauses from MC2", last-txn-id=TXN2
> 
> Because TXN2 is still in the ovsdb-server transaction history, the
> server replies with the changes from the most recent transactions only,
> i.e., TXN3:
> result="true", last-txn-id=TXN3, "modify record R1 in table T1"
> 
> This causes the IDL on the client to end up in an inconsistent state
> because it has never seen the update that created R1.
> 
> Such a scenario is described in:
> https://bugzilla.redhat.com/show_bug.cgi?id=1808580#c22
> 
> To avoid hitting this issue, whenever a reconnect happens (either
> due to an IDL retry or due to network issues), we clear db->last_id
> in ovsdb_idl_restart_fsm() in any of the following cases:
> - a monitor condition has been changed locally and the corresponding
>   request was not sent yet (i.e., idl->data.cond_changed == true).
> - a monitor_cond_change request is in flight.
> 
> This ensures that updates of type "insert" that happened before the last
> transaction known by the IDL but didn't match old monitor conditions are
> sent upon reconnect if the monitor condition has changed to include them
> in the meantime.
> 
> CC: Han Zhou <[email protected]>
> Fixes: 403a6a0cb003 ("ovsdb-idl: Fast resync from server when connection reset.")
> Signed-off-by: Dumitru Ceara <[email protected]>
> ---
>  lib/ovsdb-idl.c |   19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
> index 1535ad7..67c4745 100644
> --- a/lib/ovsdb-idl.c
> +++ b/lib/ovsdb-idl.c
> @@ -690,6 +690,25 @@ ovsdb_idl_send_request(struct ovsdb_idl *idl, struct jsonrpc_msg *request)
>  static void
>  ovsdb_idl_restart_fsm(struct ovsdb_idl *idl)
>  {
> +    /* If there's an outstanding request of type monitor_cond_change and
> +     * we're in monitor_cond_since mode then we can't trust that all relevant
> +     * updates from transaction idl->data.last_id have been received as we
> +     * might have relaxed the monitor condition with our last request and
> +     * might be missing previously not monitored records.
> +     *
> +     * Same reasoning applies for the case when a monitor condition has been
> +     * changed locally but the monitor_cond_change request was not sent yet.
> +     *
> +     * In both cases, clear last_id to make sure that the next time
> +     * monitor_cond_since is sent (i.e., after reconnect) we get the complete
> +     * view of the database.
> +     */
> +    if (idl->data.cond_changed ||
> +            (idl->request_id &&
> +                idl->data.monitoring == OVSDB_IDL_MONITORING_COND_SINCE)) {
> +        uuid_zero(&idl->data.last_id);

As Han pointed out during the OVN IRC meeting, when there is a change to the
SB, e.g. creating a new DP, it causes all HVs to change their condition, and
at the same time one of the SB servers might be disconnected, triggering this
bug on a big number of nodes at once.  If all of these nodes request the full
database at the same time, this might be an issue.

One possibility that came to mind is that we could store the 'last_id' of the
previous successful cond_change and use it instead of 0 on reconnection if
there was an in-flight cond_change.  This way we could reduce the pressure on
the SB server.
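A rough sketch of that idea in Python (all names here are hypothetical and
only model the state machine, not the actual ovsdb-idl code): remember the
value of last_id at the moment the previous cond_change was acked, and resync
from that point rather than from zero when a cond_change is in flux.

```python
# Hypothetical model of the suggested mitigation; not a tested patch.

ZERO_UUID = "00000000-0000-0000-0000-000000000000"

class IdlState:
    def __init__(self):
        self.last_id = ZERO_UUID              # newest txn id seen from server
        self.cond_change_last_id = ZERO_UUID  # last_id when cond was last acked
        self.cond_changed = False             # local cond change not yet sent
        self.request_id = None                # in-flight cond_change, if any

    def on_cond_change_ack(self):
        # The server confirmed the new condition: every update matching it
        # up to last_id has been delivered, so this point is safe to resync from.
        self.cond_change_last_id = self.last_id
        self.request_id = None

    def resync_point(self):
        """Txn id to send in monitor_cond_since after a reconnect."""
        if self.cond_changed or self.request_id is not None:
            # Condition in flux: fall back to the last point at which the
            # condition was known to be in sync, instead of all the way to 0.
            return self.cond_change_last_id
        return self.last_id
```

With this, a reconnect during an in-flight cond_change would replay only the
transactions since the previous acked condition, not the whole database.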

> +    }
> +
>      ovsdb_idl_send_schema_request(idl, &idl->server);
>      ovsdb_idl_transition(idl, IDL_S_SERVER_SCHEMA_REQUESTED);
>      idl->data.monitoring = OVSDB_IDL_NOT_MONITORING;
> 
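For reference, the race described in the commit message can be reproduced
with a toy model of the server's transaction history (Python; a simplified
sketch with made-up names, not the actual OVSDB protocol or IDL code):

```python
# Toy model: a server replaying history for monitor_cond_since.

class Server:
    def __init__(self):
        self.txns = []  # full history: (txn_id, table, record)

    def commit(self, txn_id, table, record):
        self.txns.append((txn_id, table, record))

    def monitor_cond_since(self, cond, last_id):
        """Reply with updates newer than last_id that match cond.
        If last_id is None (cleared), replay the whole history."""
        found = last_id is None
        updates = []
        for txn_id, table, record in self.txns:
            if found and table in cond:
                updates.append((table, record))
            if txn_id == last_id:
                found = True
        return updates

server = Server()
server.commit("TXN1", "T1", "R1")
server.commit("TXN2", "T2", "R2")
# Client monitored only T2 (MC1) and has seen up to TXN2.  It widens its
# condition to include T1 (MC2), but the connection drops before the
# server's reply arrives; meanwhile TXN3 modifies R1.
server.commit("TXN3", "T1", "R1'")

# Buggy fast resync: reuse last_id=TXN2 with the new condition MC2.
# Only TXN3's modify of R1 comes back; the insert of R1 is never seen.
updates = server.monitor_cond_since({"T1", "T2"}, "TXN2")

# With last_id cleared (the fix), the full view is replayed.
updates_fixed = server.monitor_cond_since({"T1", "T2"}, None)
```

In the buggy case the client only ever sees the "modify" of R1, never the
"insert" that created it, which is exactly the inconsistent IDL state above.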

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
