On 7/13/21 5:03 PM, Dumitru Ceara wrote:
> On 7/12/21 10:36 PM, Ilya Maximets wrote:
>> On 6/29/21 1:20 PM, Dumitru Ceara wrote:
>>> Until now clients that needed to reconnect immediately could only use
>>> reconnect_force_reconnect().  However, reconnect_force_reconnect()
>>> doesn't reset the backoff for connections that were alive long enough
>>> (more than backoff seconds).
>>>
>>> Moreover, the reconnect library cannot determine the exact reason why a
>>> client wishes to initiate a reconnection.  In most cases reconnection
>>> happens because of a fatal error when communicating with the remote,
>>> e.g., in the ovsdb-cs layer, when invalid messages are received from
>>> ovsdb-server.  In such cases it makes sense to not reset the backoff
>>> because the remote seems to be unhealthy.
>>>
>>> There are however cases when reconnection is needed for other reasons.
>>> One such example is when ovsdb-clients require "leader-only" connections
>>> to clustered ovsdb-server databases.  Whenever the client determines
>>> that the remote is not a leader anymore, it decides to reconnect to a
>>> new remote from its list, searching for the new leader.  Using
>>> jsonrpc_force_reconnect() (which calls reconnect_force_reconnect()) will
>>> not reset backoff even though the former leader is still likely in good
>>> shape.
>>>
>>> Since 3c2d6274bcee ("raft: Transfer leadership before creating
>>> snapshots.") leadership changes inside the clustered database happen
>>> more often and therefore "leader-only" clients need to reconnect more
>>> often too.  Not resetting the backoff every time a leadership change
>>> happens will cause all reconnections to happen with the maximum backoff
>>> (8 seconds) resulting in significant latency.
>>>
>>> This commit also updates the Python reconnect and IDL implementations
>>> and adds tests for force-reconnect and graceful-reconnect.
>>>
>>> Reported-at: https://bugzilla.redhat.com/1977264
>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>> ---
>>
>> Hi, Dumitru.
>>
> 
> Hi Ilya,
> 
>> Thanks for working on this issue.  I've seen it in practice while running
>> OVN tests, but I still don't quite understand why it happens.  Could you,
>> please, describe how the state transitioning works here for the ovsdb-cs case?
>>
> 
> Without the patch, assuming a current backoff of X seconds, the sequence
> of events (even for a connection that has seen activity after backoff)
> is:
> 
> - reconnect_force_reconnect() -> move to S_RECONNECT
> - ovsdb_cs_run()
>   -> jsonrpc_session_run()
>      -> reconnect_run()
>      -> reconnect_disconnected()
>         -> because the state is not S_ACTIVE | S_IDLE, the backoff is
>            not changed and stays X.
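> 
> For reference, reconnect_force_reconnect() itself does nothing but the
> state transition; roughly (from lib/reconnect.c):
> 
> void
> reconnect_force_reconnect(struct reconnect *fsm, long long int now)
> {
>     /* Only the state changes; the current backoff value is untouched. */
>     if (fsm->state & (S_CONNECTING | S_ACTIVE | S_IDLE)) {
>         reconnect_transition__(fsm, now, S_RECONNECT);
>     }
> }
> 
> so whether the backoff gets reset is entirely up to the subsequent
> reconnect_disconnected() call shown above.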

Hmm, I see.  Thanks for the explanation!

> 
>>> +# Forcefully reconnect.
>>> +force-reconnect
>>> +  in RECONNECT for 0 ms (2000 ms backoff)
>>> +  1 successful connections out of 3 attempts, seqno 2
>>> +  disconnected
>>> +run
>>> +  should disconnect
>>> +connecting
>>> +  in CONNECTING for 0 ms (2000 ms backoff)
>>
>> Especially this part seems wrong to me, because after 'should disconnect'
>> there should be 'disconnect' or 'connect-fail', not 'connecting'.  We
>> literally should disconnect here, otherwise it's a violation of the reconnect
>> API.  And my concern is that ovsdb-cs or jsonrpc violates the API somewhere
>> by not calling reconnect_disconnected() when it is required, or there is some
>> other bug that makes the 'reconnect' module jump over a few states in the FSM.
>>
>> The logical workflow for the force-reconnect, from what I see in the code
>> should be:
>>
>> 1. force-reconnect --> transition to S_RECONNECT
>> 2. run -> in S_RECONNECT, so returning RECONNECT_DISCONNECT
>> 3. disconnect -> check the state, update backoff and transition to S_BACKOFF
>> 4. run -> in S_BACKOFF, so returning RECONNECT_CONNECT
>> 5. connected ....
> 
> This is just a bug in the test case I added: I need to issue
> "disconnect" in the test case so that reconnect_disconnected()
> gets called (like ovsdb-cs does).
> 
>>
>> Something is fishy here, because ovsdb-cs somehow jumps over step #3 and
>> maybe also #4.
> 
> As mentioned above, it's not the case for ovsdb-cs.
> 
> However, while checking the test case I realized that the patch will
> cause all "graceful" reconnects to happen with an initial backoff of 2
> seconds for long-lived sessions (instead of 1s).
> 
> That's because:
> 
> reconnect_graceful_reconnect()
> -> reconnect_reset_backoff__()
>    -> session was active after the initial 'backoff' seconds, so
>       reset backoff to minimum.
>    -> reconnect_transition__(..., S_RECONNECT);
> ovsdb_cs_run()
> -> jsonrpc_session_run()
>    -> reconnect_run()
>    -> reconnect_disconnected():
>       -> if (!reconnect_reset_backoff__(..)) then double backoff.
> 
> I see a couple of potential ways to fix that:
> 1. reconnect_graceful_reconnect() could bump fsm->backoff_free_tries to
> allow a single backoff-free reconnect (a rough sketch below).
> 2. reconnect_graceful_reconnect() could reset fsm->backoff to 0 if the
> session was active recently (instead of calling reconnect_reset_backoff__()).
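> 
> A rough sketch of option 1 (reconnect_reset_backoff__() is the helper from
> the patch; its exact signature and the rest of the details here are only an
> illustration, not the final code):
> 
> void
> reconnect_graceful_reconnect(struct reconnect *fsm, long long int now)
> {
>     if (fsm->state & (S_CONNECTING | S_ACTIVE | S_IDLE)) {
>         if (reconnect_reset_backoff__(fsm)) {
>             /* Make sure the upcoming reconnect_disconnected() call sees
>              * more than one free try (its check is '> 1') and therefore
>              * sets the backoff to 0 instead of doubling it. */
>             fsm->backoff_free_tries = MAX(fsm->backoff_free_tries, 2);
>         }
>         reconnect_transition__(fsm, now, S_RECONNECT);
>     }
> }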
> 
> However, this is all under the assumption that we want to add support
> for two types of reconnect "semantics":
> 
> a. forced, e.g., when the client detects inconsistencies in the data
> received from the server (the server is in "bad shape"), in which case
> we should never reset the backoff.
> 
> b. graceful, e.g., when the client reconnects because it needs a
> leader-only connection and a leadership transfer happened on the
> server side (the server is likely in "good shape", just not a leader
> anymore).
> 
> An alternative to all this is to change reconnect_disconnected() to
> reset the backoff for connections in states S_ACTIVE | S_IDLE | S_RECONNECT
> (it used to be just S_ACTIVE and S_IDLE).
> 
> I guess we could treat this as a bug fix and continue the discussion in a
> separate follow-up patch to add the "graceful vs force" semantics if
> they turn out to make sense in the future.
> 
> In essence (excluding potential Python and test changes) the patch would
> become:
> 
> diff --git a/lib/reconnect.c b/lib/reconnect.c
> index a929ddfd2d..3a6d93f9d1 100644
> --- a/lib/reconnect.c
> +++ b/lib/reconnect.c
> @@ -385,7 +385,7 @@ reconnect_disconnected(struct reconnect *fsm, long long int now, int error)
>          if (fsm->backoff_free_tries > 1) {
>              fsm->backoff_free_tries--;
>              fsm->backoff = 0;
> -        } else if (fsm->state & (S_ACTIVE | S_IDLE)
> +        } else if (fsm->state & (S_ACTIVE | S_IDLE | S_RECONNECT)
>                     && (fsm->last_activity - fsm->last_connected >= fsm->backoff
>                         || fsm->passive)) {
>              fsm->backoff = fsm->passive ? 0 : fsm->min_backoff;
> ---
> 
> What do you think?

The problem I see with adding S_RECONNECT to this condition is that
'last_activity' and 'last_connected' don't belong to the current
connection in case we're forcefully interrupting a connection in the
S_CONNECTING state.  So, we would reset the backoff based on outdated
information.  Also, I think, in this case, 'last_connected' might
be larger than 'last_activity' and we would get a negative value here.
All values are signed, so it should not be an issue, but it's not
a clean solution.

Bumping the number of free tries seems like a better solution to me,
because:

1. I'm not sure that we need 2 types of reconnect.  I mean, backoff
   is intended to avoid overloading an already overloaded server with
   connection attempts.  In the case of a forceful re-connection the
   server is not overloaded, so we should not increase the backoff.
   IMO, backoff should only be involved in cases where we have
   problems on the server side related to load, not the leader-only
   stuff or even database inconsistency problems.

2. force-reconnect should not consume free tries for exactly the same
   reasons as in point 1.  Free tries are there to avoid backoff, but
   backoff exists only to avoid storming the remote server with
   connections while it's overloaded.
   No overload -> No need for backoff -> No need to consume free tries.

3. With bumping of free tries we're avoiding logical issues around
   using 'last_activity' and 'last_connected' from the old connection.

4. Faster reconnection, since backoff will be zero.  And that is
   fine, because as far as we know, the server is not overloaded,
   it's just not suitable for our needs for some reason.
   If it is overloaded, we will back off after the first failed
   connection attempt.

One potential problem I see with this solution is that if there
are no servers on the list that are suitable for ovsdb-cs, we will
have a log full of annoying reconnect messages.  Backoff kind of
rate-limits ovsdb-cs right now and we have no internal rate-limit
inside ovsdb-cs.  This looks more like an issue of ovsdb-cs and
not an issue of the reconnect module itself.  But we need to have a
solution for this.

And actually, the problem here is that from the jsonrpc point of
view the connection is perfectly fine and functional, but ovsdb-cs
implements high-level logic on top of it and decides that the
connection is not good at a much later stage, based on the actual
received data.  So, we need to somehow propagate this information
from ovsdb-cs down to the reconnect module or allow ovsdb-cs to
control the state machine, otherwise we will need two separate
backoffs: one inside reconnect for usual connection problems and one
in ovsdb-cs for high-level data inconsistencies and leader changes.

Thinking this way led me to a different solution.  We could expose
something like jsonrpc_session_set_backoff_free_tries() and allow
ovsdb-cs to make a decision.  E.g.:

diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
index f13065c6c..900597b96 100644
--- a/lib/ovsdb-cs.c
+++ b/lib/ovsdb-cs.c
@@ -729,6 +729,17 @@ void
 ovsdb_cs_force_reconnect(struct ovsdb_cs *cs)
 {
     if (cs->session) {
+        if (cs->state == CS_S_MONITORING) {
+            /* The ovsdb-cs was in MONITORING state, so we either had a data
+             * inconsistency on this server, or it stopped being the cluster
+             * leader, or the user requested to re-connect.  Avoid backoff
+             * in these cases, as we need to re-connect as soon as possible.
+             * Connections that are not in MONITORING state should keep their
+             * backoff to avoid a constant flood of re-connection attempts in
+             * case there is no suitable database server. */
+            jsonrpc_session_set_backoff_free_tries(
+                cs->session, jsonrpc_session_get_n_remotes(cs->session));
+        }
         jsonrpc_session_force_reconnect(cs->session);
     }
 }
---
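
If we go this way, the only other plumbing needed is a thin jsonrpc wrapper
around the existing reconnect_set_backoff_free_tries().  A minimal sketch
(the jsonrpc function name is the one proposed above, not an existing API,
and it assumes the session's FSM is reachable as s->reconnect):

void
jsonrpc_session_set_backoff_free_tries(struct jsonrpc_session *s,
                                       unsigned int backoff_free_tries)
{
    /* Just forward the request to the underlying reconnect FSM. */
    reconnect_set_backoff_free_tries(s->reconnect, backoff_free_tries);
}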

This way, if the server loses leadership or an inconsistency is detected,
the client will have 3 free attempts to find a new suitable server.
After that it will start to back off as it does now.  No changes in the
reconnect module are required.

Thoughts?

Best regards, Ilya Maximets.