Re: [PATCH RFC 2/2] migration: abort on destination if switchover limit exceeded

Joao Martins Wed, 26 Jun 2024 04:31:03 -0700

On 25/06/2024 19:37, Daniel P. Berrangé wrote:
> On Tue, Jun 25, 2024 at 10:53:41AM -0400, Peter Xu wrote:
>> Then the question is how should we suggest the user to specify these two
>> parameters.
>>
>> The cover letter used:
>>
>>   migrate_set_parameter downtime-limit 300
>>   migrate_set_parameter switchover-limit 10
> 
> What this means is that in practice the total downtime limit
> is 310 ms, however, expressing this as two parameters is
> incredibly inflexible.
> 
> If the actual RAM transfer downtime only took 50 ms, then why
> should the switchover downtime still be limited to 10ms, when
> we've still got a budget of 250 ms that was unused.
>


The downtime limit is 300, it's more than you are giving something *extra* 10ms
when you switchover regardless of where that's spent.

If it makes it easier to understand you could see this parameter as:

'downtime-limit-max-error' = 10 ms

The name as proposed by the RFC was meant to honor what the error margin was
meant for: to account for extra time during switchover. Adding this inside
downtime-limit wouldn't work as it otherwise would be used solely for RAM
transfer during precopy.

> IOW, if my VM tolerates a downtime of 310ms, then I want that
> 310ms spread across the RAM transfer downtime and switchover
> downtime in *any* ratio. ALl that matters is the overall
> completion time.
> 
That still happens with this patches, no specific budget is given to each.
Though implicitly if downtime-limit captures only RAM transfer, then in theory
if you're migrating a busy guest that happens to meet the SLA say
expected-downtime=290, then you have a total of 20 for switchover (thanks to the
extra 10 used in switchover-limit/downtime-limit-max-error 10).

But keep in mind that the migration prediction *does not* account for anything
other than RAM transfer. It happens that maybe your configuration is cheap, or
has been optimized enough over the years that you likely don't care or noticed
that it /could/ hurt the user designated SLA even if by a little.

Regards,
        Joao

Re: [PATCH RFC 2/2] migration: abort on destination if switchover limit exceeded

Reply via email to