On 25/06/2024 19:37, Daniel P. Berrangé wrote:
> On Tue, Jun 25, 2024 at 10:53:41AM -0400, Peter Xu wrote:
>> Then the question is how should we suggest the user to specify these two
>> parameters.
>>
>> The cover letter used:
>>
>>   migrate_set_parameter downtime-limit 300
>>   migrate_set_parameter switchover-limit 10
>
> What this means is that in practice the total downtime limit
> is 310 ms, however, expressing this as two parameters is
> incredibly inflexible.
>
> If the actual RAM transfer downtime only took 50 ms, then why
> should the switchover downtime still be limited to 10ms, when
> we've still got a budget of 250 ms that was unused.
>

The downtime limit is 300 ms; it's more that you are giving something *extra*
(10 ms) when you switch over, regardless of where that's spent. If it makes it
easier to understand, you could see this parameter as:

    'downtime-limit-max-error' = 10 ms

The name proposed by the RFC was meant to honor what the error margin is
for: to account for extra time during switchover. Folding this into
downtime-limit wouldn't work, as it would otherwise be used solely for RAM
transfer during precopy.

> IOW, if my VM tolerates a downtime of 310ms, then I want that
> 310ms spread across the RAM transfer downtime and switchover
> downtime in *any* ratio. All that matters is the overall
> completion time.
>

That still happens with these patches; no specific budget is given to each.
Though implicitly, if downtime-limit captures only RAM transfer, then in
theory if you're migrating a busy guest that happens to meet the SLA, say
expected-downtime=290, you have a total of 20 ms for switchover (the 10 ms
left under downtime-limit plus the extra 10 ms from
switchover-limit/downtime-limit-max-error).

But keep in mind that the migration prediction *does not* account for
anything other than RAM transfer. It may be that your configuration is cheap,
or has been optimized enough over the years, that you don't care or haven't
noticed that it /could/ hurt the user-designated SLA, even if by a little.

Regards,
Joao
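
P.S. To make the arithmetic above concrete, here is a minimal sketch of how the two limits combine under the reading described in this mail. It is not QEMU code: the function names and the assumption that unused RAM-transfer headroom simply carries over to switchover are illustrative only.

```python
# Hypothetical model of the budget arithmetic discussed in this thread.
# Not QEMU code; all names here are illustrative assumptions.

def total_allowed_ms(downtime_limit, switchover_limit):
    """Overall downtime tolerated when the margin sits on top of downtime-limit."""
    return downtime_limit + switchover_limit


def switchover_budget_ms(downtime_limit, switchover_limit, expected_downtime):
    """Time left for switchover once the predicted RAM transfer downtime is spent.

    Assumes any headroom not used by RAM transfer is available to switchover,
    in addition to the extra margin.
    """
    ram_headroom = max(downtime_limit - expected_downtime, 0)
    return ram_headroom + switchover_limit


if __name__ == "__main__":
    # Values from the cover letter: downtime-limit 300, switchover-limit 10.
    print(total_allowed_ms(300, 10))           # 310
    # Busy guest that barely meets the SLA (expected-downtime=290).
    print(switchover_budget_ms(300, 10, 290))  # 20
    # Daniel's case: RAM transfer downtime only took 50 ms.
    print(switchover_budget_ms(300, 10, 50))   # 260
```

In this model the split between RAM transfer and switchover is not fixed; only the sum is bounded, which is the point made above.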