On 27/02/2024 07:41, Peter Xu wrote:
> On Thu, Feb 22, 2024 at 05:56:27PM +0200, Avihai Horon wrote:
>> This bug was observed in several VFIO migration scenarios where some
>> workload on the VM prevented RAM from ever reaching a hard zero, not
>> allowing VFIO initial pre-copy data to be sent, and thus destination
>> could not ack switchover. Note that the same scenario, but without
>> switchover-ack, would converge.
>>
>> Fix it by not serializing device data sending during pre-copy iterative
>> phase if switchover was not acked yet.
> 
> I am still not fully convinced that it's even legal that one device can
> consume all iterator's bandwidth, ignoring the rest..  Though again it's
> not about this patch, but about commit 90697be889.
> 
> I'm thinking whether we should allow each device to have its own portion of
> chance to push data for each call to qemu_savevm_state_iterate(),
> irrelevant of vfio's switchover-ack capability.
> 

I guess that this means the only change needed is (...)

>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index d612c8a9020..3a012796375 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -1386,7 +1386,7 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
>>   *   0 : We haven't finished, caller have to go again
>>   *   1 : We have finished, we can go to complete phase
>>   */
>> -int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
>> +int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool 
>> can_switchover)
>>  {
>>      SaveStateEntry *se;
>>      int ret = 1;
>> @@ -1430,12 +1430,20 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool 
>> postcopy)
>>                           "%d(%s): %d",
>>                           se->section_id, se->idstr, ret);
>>              qemu_file_set_error(f, ret);
>> +            return ret;
>>          }
>> -        if (ret <= 0) {
>> -            /* Do not proceed to the next vmstate before this one reported
>> -               completion of the current stage. This serializes the 
>> migration
>> -               and reduces the probability that a faster changing state is
>> -               synchronized over and over again. */
>> +
>> +        if (ret == 0 && can_switchover) {
>> +            /*
>> +             * Do not proceed to the next vmstate before this one reported
>> +             * completion of the current stage. This serializes the 
>> migration
>> +             * and reduces the probability that a faster changing state is
>> +             * synchronized over and over again.
>> +             * Do it only if migration can switchover. If migration can't
>> +             * switchover yet, do proceed to let other devices send their 
>> data
>> +             * too, as this may be required for switchover to be acked and
>> +             * migration to converge.
>> +             */
>>              break;
>>          }
>>      }

(...) is here to have:

if (ret < 0) {
        break;
}

? Or you were thinking in some heuristic?

        Joao

Reply via email to