Re: [ovs-dev] [PATCH] reconnect: Add graceful reconnect.

Dumitru Ceara Tue, 20 Jul 2021 07:33:15 -0700

On 7/20/21 4:29 PM, Ilya Maximets wrote:
> On 7/20/21 4:25 PM, Ilya Maximets wrote:
>> On 7/20/21 3:39 PM, Dumitru Ceara wrote:
>>> On 7/20/21 3:18 PM, Ilya Maximets wrote:
>>>> On 7/13/21 5:03 PM, Dumitru Ceara wrote:
>>>>> On 7/12/21 10:36 PM, Ilya Maximets wrote:
>>>>>> On 6/29/21 1:20 PM, Dumitru Ceara wrote:
>>>>>>> Until now clients that needed to reconnect immediately could only use
>>>>>>> reconnect_force_reconnect().  However, reconnect_force_reconnect()
>>>>>>> doesn't reset the backoff for connections that were alive long enough
>>>>>>> (more than backoff seconds).
>>>>>>>
>>>>>>> Moreover, the reconnect library cannot determine the exact reason why a
>>>>>>> client wishes to initiate a reconnection.  In most cases reconnection
>>>>>>> happens because of a fatal error when communicating with the remote,
>>>>>>> e.g., in the ovsdb-cs layer, when invalid messages are received from
>>>>>>> ovsdb-server.  In such cases it makes sense to not reset the backoff
>>>>>>> because the remote seems to be unhealthy.
>>>>>>>
>>>>>>> There are however cases when reconnection is needed for other reasons.
>>>>>>> One such example is when ovsdb-clients require "leader-only" connections
>>>>>>> to clustered ovsdb-server databases.  Whenever the client determines
>>>>>>> that the remote is not a leader anymore, it decides to reconnect to a
>>>>>>> new remote from its list, searching for the new leader.  Using
>>>>>>> jsonrpc_force_reconnect() (which calls reconnect_force_reconnect()) will
>>>>>>> not reset backoff even though the former leader is still likely in good
>>>>>>> shape.
>>>>>>>
>>>>>>> Since 3c2d6274bcee ("raft: Transfer leadership before creating
>>>>>>> snapshots.") leadership changes inside the clustered database happen
>>>>>>> more often and therefore "leader-only" clients need to reconnect more
>>>>>>> often too.  Not resetting the backoff every time a leadership change
>>>>>>> happens will cause all reconnections to happen with the maximum backoff
>>>>>>> (8 seconds) resulting in significant latency.
>>>>>>>
>>>>>>> This commit also updates the Python reconnect and IDL implementations
>>>>>>> and adds tests for force-reconnect and graceful-reconnect.
>>>>>>>
>>>>>>> Reported-at: https://bugzilla.redhat.com/1977264
>>>>>>> Signed-off-by: Dumitru Ceara <[email protected]>
>>>>>>> ---
>>>>>>
>>>>>> Hi, Dumitru.
>>>>>>
>>>>>
>>>>> Hi Ilya,
>>>>>
>>>>>> Thanks for working on this issue.  I've seen it in practice while running
>>>>>> OVN tests, but I still don't quiet understand why it happens.  Could you,
>>>>>> please, describe how state transitioning work here for the ovsdb-idl 
>>>>>> case?
>>>>>>
>>>>>
>>>>> Without the patch, assuming a current backoff of X seconds, the sequence
>>>>> of events (even for a connection that has seen activity after backoff)
>>>>> is:
>>>>>
>>>>> - reconnect_force_reconnect() -> move to S_RECONNECT
>>>>> - ovsdb_cs_run()
>>>>>   -> jsonrpc_session_run()
>>>>>      -> reconnect_run()
>>>>>      -> reconnect_disconnected()
>>>>>         -> because state is not S_ACTIVE | S_IDLE backoff is not
>>>>>            changed, stays X.
>>>>
>>>> Hmm, I see.  Thanks for explanation!
>>>>
>>>>>
>>>>>>> +# Forcefully reconnect.
>>>>>>> +force-reconnect
>>>>>>> +  in RECONNECT for 0 ms (2000 ms backoff)
>>>>>>> +  1 successful connections out of 3 attempts, seqno 2
>>>>>>> +  disconnected
>>>>>>> +run
>>>>>>> +  should disconnect
>>>>>>> +connecting
>>>>>>> +  in CONNECTING for 0 ms (2000 ms backoff)
>>>>>>
>>>>>> Especially this part seems wrong to me.  Because after 'should 
>>>>>> disconnect'
>>>>>> there should be 'disconnect' of 'connect-fail', but not 'connecting'.  We
>>>>>> literally should disconnect here, otherwise it's a violation of the 
>>>>>> reconnect
>>>>>> API.  And my concern is that ovsdb-cs or jsonrpc violates the API 
>>>>>> somewhere
>>>>>> by not calling reconnect_disconnectd() when it is required, or there is 
>>>>>> some
>>>>>> other bug that makes 'reconnect' module to jump over few states in a fsm.
>>>>>>
>>>>>> The logical workflow for the force-reconnect, from what I see in the code
>>>>>> should be:
>>>>>>
>>>>>> 1. force-reconnect --> transition to S_RECONNECT
>>>>>> 2. run -> in S_RECONNECT, so returning RECONNECT_DISCONNECT
>>>>>> 3. disconnect -> check the state, update backoff and transition to 
>>>>>> S_BACKOFF
>>>>>> 4. run -> in S_BACKOFF, so returning RECONNECT_CONNECT
>>>>>> 5. connected ....
>>>>>
>>>>> This is just a bug in the test case I added, I need to issue
>>>>> "disconnect" in the test case to trigger reconnect_disconnected()
>>>>> to be called (like ovsdb-cs does).
>>>>>
>>>>>>
>>>>>> Something is fishy here, because ovsdb-cs somehow jumps over step #3 and
>>>>>> maybe also #4.
>>>>>
>>>>> As mentioned above, it's not the case for ovsdb-cs.
>>>>>
>>>>> However, while checking the test case I realized that the patch will
>>>>> cause all "graceful" reconnects to happen with an initial backoff of 2
>>>>> seconds for long lived sessions (instead of 1s).
>>>>>
>>>>> That's because:
>>>>>
>>>>> reconnect_graceful_reconnect()
>>>>> -> reconnect_reset_backoff__()
>>>>>    -> session was active after the initial 'backoff' seconds, so
>>>>>       reset backoff to minimum.
>>>>>    -> reconnect_transition__(..., S_RECONNECT);
>>>>> ovsdb_cs_run()
>>>>> -> jsonrpc_session_run()
>>>>>    -> reconnect_run()
>>>>>    -> reconnect_disconnected():
>>>>>       -> if (!reconnect_reset_backoff__(..)) then double backoff.
>>>>>
>>>>> I see a couple of potential ways to fix that:
>>>>> 1. reconnect_graceful_reconnect() could bump fsm->backoff_free_tries to
>>>>> allow a single backoff free reconnect.
>>>>> 2. reconnect_graceful_reconnect() could reset fsm->backoff to 0 if it
>>>>> was active recently (instead of calling reconnect_reset_backoff__()).
>>>>>
>>>>> However, this is all under the assumption that we want to add support
>>>>> for two types of reconnect "semantics":
>>>>>
>>>>> a. forced, e.g., when the client detects inconsistencies in the data
>>>>> received from the server (the server is in "bad shape") in which we
>>>>> should never reset the backoff.
>>>>>
>>>>> b. graceful, e.g., when the client reconnects because it needs a
>>>>> leader-only connection and a leadership transfer happened on the
>>>>> server side (the server is likely in "good shape" just not a leader
>>>>> anymore).
>>>>>
>>>>> An alternative to all this is to change reconnect_disconnected() to
>>>>> reset backoff for connections in states S_ACTIVE | S_IDLE | S_RECONNECT
>>>>> (used to be just active and idle).
>>>>>
>>>>> I guess we could treat this as a bug fix and continue discussion for a
>>>>> separate follow up patch to add the "graceful vs force" semantics if
>>>>> they turn out to make sense for the future.
>>>>>
>>>>> In essence (excluding potential python and test changes) the patch would
>>>>> become:
>>>>>
>>>>> diff --git a/lib/reconnect.c b/lib/reconnect.c
>>>>> index a929ddfd2d..3a6d93f9d1 100644
>>>>> --- a/lib/reconnect.c
>>>>> +++ b/lib/reconnect.c
>>>>> @@ -385,7 +385,7 @@ reconnect_disconnected(struct reconnect *fsm, long 
>>>>> long int now, int error)
>>>>>          if (fsm->backoff_free_tries > 1) {
>>>>>              fsm->backoff_free_tries--;
>>>>>              fsm->backoff = 0;
>>>>> -        } else if (fsm->state & (S_ACTIVE | S_IDLE)
>>>>> +        } else if (fsm->state & (S_ACTIVE | S_IDLE | S_RECONNECT)
>>>>>                     && (fsm->last_activity - fsm->last_connected >= 
>>>>> fsm->backoff
>>>>>                         || fsm->passive)) {
>>>>>              fsm->backoff = fsm->passive ? 0 : fsm->min_backoff;
>>>>> ---
>>>>>
>>>>> What do you think?
>>>>
>>>> The problem I see with adding S_RECONNECT to this condition is that
>>>> 'last_activity' and 'last_connected' doesn't belong to current
>>>> connection in case we're forcefully interrupting connection in a
>>>> S_CONNECTING state.  So, we will re-set backoff based on outdated
>>>> information.  Also, I think, in this case, 'last_connected' might
>>>> be larger than 'last_activity' and we will have a negative value here.
>>>
>>> You're right, I missed this.
>>>
>>>> All values are signed, so it should not be an issue, but it's not
>>>> a clean solution.
>>>
>>> It's not, I agree.
>>>
>>>>
>>>> Bumping the number of free tries seems like a better solution to me,
>>>> because:
>>>>
>>>> 1. I'm not sure that we need 2 types of reconnect.  I mean, backoff
>>>>    is intended to avoid overloading the already overloaded server with
>>>>    connection attempts.  In case of forceful re-connection the server
>>>>    is not overloaded, so we should not increase a backoff.
>>>>    IMO, backoff should only be involved in cases where we have
>>>>    problems on the server side related to load, not the leader-only
>>>>    stuff or even database inconsistency problems.
>>>>
>>>> 2. force-reconnect should not consume free tries for exactly same
>>>>    reasons as in point 1.  Free tries are to avoid backoff, but
>>>>    backoff exists only to avoid storming the remote server with
>>>>    connections while it's overloaded.
>>>>    No overload -> No need for backoff -> No need to consume free tries.
>>>>
>>>> 3. With bumping of free tries we're avoiding logical issues around
>>>>    using 'last_activity' and 'last_connected' from the old connection.
>>>>
>>>> 4. Faster reconnection, since backoff will be zero.  And that is
>>>>    fine, because as far as we know, the server is not overloaded,
>>>>    it's just not suitable for our needs for some reason.
>>>>    If it is overloaded, we will backoff after the first failed
>>>>    connection attempt.
>>>
>>> This is even better (instead of min_backoff, I mean).
>>>
>>>>
>>>> One potential problem I see with this solution is that if there
>>>> are no servers on a list that are suitable for ovsdb-cs, we will
>>>> have a log full of annoying reconnect logs.  Backoff kind of
>>>> rate-limits ovsdb-cs right now and we have no internal rate-limit
>>>> inside ovsdb-cs.  This looks more like an issue of ovsdb-cs and
>>>> not the issue of reconnect module itself.  But we need to have a
>>>> solution for this.
>>>>
>>>> And actually, the problem here is that from the jsonrpc point of
>>>> view the connection is perfectly fine and functional, but ovsdb-cs
>>>> implements high-level logic on top of it and decides that
>>>> connection is not good on a much later stage based on the actual
>>>> received data.  So, we need to somehow propagate this information
>>>> from ovsdb-cs down to reconnect module or allow ovsdb-cs to control
>>>> the state machine, otherwise we will need two separate backoffs:
>>>
>>> That's what I was trying to implement with the "graceful reconnect:
>>> allow ovsdb-cs to decide how to reconnect.  But there are corner cases
>>> like the ones pointed out during the review of this patch.
>>>
>>>> one inside the reconnect for usual connection problems and one
>>>> in ovsdb-cs for high-level data inconsistencies and leader changes.
>>>>
>>>> Thinking this way led me to a different solution.  We could expose
>>>> something like jsonrpc_session_set_backoff_free_tries() and allow
>>>> ovsdb-cs to make a decision.  E.g.:
>>>>
>>>> diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
>>>> index f13065c6c..900597b96 100644
>>>> --- a/lib/ovsdb-cs.c
>>>> +++ b/lib/ovsdb-cs.c
>>>> @@ -729,6 +729,17 @@ void
>>>>  ovsdb_cs_force_reconnect(struct ovsdb_cs *cs)
>>>>  {
>>>>      if (cs->session) {
>>>> +        if (cs->state == CS_S_MONITORING) {
>>>> +            /* The ovsdb-cs was in MONITORING state, so we either had data
>>>> +             * inconsistency on this server, or it stopped being the 
>>>> cluster
>>>> +             * leader, or the user requested to re-connect.  Avoiding 
>>>> backoff
>>>> +             * in these cases, as we need to re-connect as soon as 
>>>> possible.
>>>> +             * Connections that are not in MONITORING state should have 
>>>> their
>>>> +             * backoff to avoid constant flood of re-connection attempts 
>>>> in
>>>> +             * case there is no suitable database server. */
>>>> +            jsonrpc_session_set_backoff_free_tries(
>>>> +                cs->session, jsonrpc_session_get_n_remotes(cs->session));
>>>> +        }
>>>>          jsonrpc_session_force_reconnect(cs->session);
>>>>      }
>>>>  }
>>>> ---
>>>>
>>>> This way, if the server looses leadership or inconsistency detected,
>>>> the client will have 3 free attempts to find a new suitable server.
>>>> After that it will start to backoff as it does now.  No changes in
>>>> reconnect module required.
>>>>
>>>> Thoughts?
>>>
>>> This works for me.  I just have a question regarding the new API: should
>>> we allow jsonrpc users to set the free tries to any value or shall we
>>> make it more strict, e.g., jsonrpc_session_reset_backoff_free_tries(),
>>> which would reset the number of free tries to 'n_remotes'?
> 
> And it should, probably, set number of free tries to n_remotes + 1,
> because it will count current disconnection as an attempt.
>


Makes sense.

>>
>> jsonrpc_session_reset_backoff_free_tries() seems better, because ovsdb-cs
>> doesn't actually set number of free tries for jsonrpc from the start.
>> Maybe we can name it just jsonrpc_session_reset_backoff() ?  I'm not
>> sure, but I just don't like this very long name.
>>
>>>
>>> Will you be sending a patch or shall I add your "Suggested-by"?
>>
>> I'm OK with "Suggested-by".
>> Would be also great to have some type of a test for this functionality.
>>

Let me try to come up with something.  I'll post a v2 these days.

Regards,
Dumitru

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Re: [ovs-dev] [PATCH] reconnect: Add graceful reconnect.

Reply via email to