On 4/2/25 4:47 PM, Mike Pattrick wrote:
> On Thu, Mar 27, 2025 at 6:11 PM Ilya Maximets <i.maxim...@ovn.org> wrote:
>>
>> When the cluster falls apart, there is no leader and so there are no
>> heartbeats that need to be sent or received on time.  In this case
>> the RAFT_TIMER_THRESHOLD is too small, as the process will only wake
>> up for a full election timer.  When this happens, cooperative
>> multitasking code will emit warnings and traces thinking there is a
>> timer overrun.
>>
>> Fix that by setting the multitasking interval according to the full
>> election timer, if there is no known leader.
>>
>> Fixes: d4a15647b917 ("ovsdb: raft: Enable cooperative multitasking.")
>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>> ---
>>
>> Cc: Frode Nordahl <fnord...@ubuntu.com>
>>
>>  ovsdb/raft.c | 11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/ovsdb/raft.c b/ovsdb/raft.c
>> index 9c3c351b5..a9da28780 100644
>> --- a/ovsdb/raft.c
>> +++ b/ovsdb/raft.c
>> @@ -2170,9 +2170,14 @@ raft_run(struct raft *raft)
>>          raft_reset_ping_timer(raft);
>>      }
>>
>> -    uint64_t interval = raft->joining
>> -                        ? RAFT_JOIN_TIMEOUT_MS
>> -                        : RAFT_TIMER_THRESHOLD(raft->election_timer);
>> +    uint64_t interval = RAFT_TIMER_THRESHOLD(raft->election_timer);
> 
> Why not set interval in an else clause?

I had it written this way at first, but then changed for a couple reasons:

1. The RAFT_TIMER_THRESHOLD is the most common case that we'll use the
   vast majority of the time.
2. We'll need to have a negative check for the leader_sid in the else
   branch and the comment in that branch become a little more complicated
   in order to explain what's going on.

2 may be not a big deal, but 1 seems a good enough reason.

Does that make sense?

Best regards, Ilya Maximets.

> 
> -M
> 
>> +
>> +    if (raft->joining) {
>> +        interval = RAFT_JOIN_TIMEOUT_MS;
>> +    } else if (uuid_is_zero(&raft->leader_sid)) {
>> +        /* There are no heartbeats to handle when there is no leader. */
>> +        interval = raft->election_timer;
>> +    }
>>      cooperative_multitasking_set(
>>          &raft_run_cb, (void *) raft, time_msec(),
>>          interval + interval / 10, "raft_run");
>> --
>> 2.47.0
>>
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
> 

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to