On 13.02.20 20:09, Juan Quintela wrote:
> David Hildenbrand <da...@redhat.com> wrote:
>> Resizing while migrating is dangerous and does not work as expected.
>> The whole migration code works on the used_length of ram blocks and does
>> not expect this to change at random points in time.
>>
>> Precopy: The ram block size must not change on the source after
>> ram_save_setup(), i.e., as long as the guest is still running on the source.
>>
>> Postcopy: The ram block size must not change on the target, after
>> synchronizing the RAM block list (ram_load_precopy()).
>>
>> AFAICS, resizing can be triggered *after* (but not during) a reset in
>> ACPI code by the guest
>> - hw/arm/virt-acpi-build.c:acpi_ram_update()
>> - hw/i386/acpi-build.c:acpi_ram_update()
>>
>> I see no easy way to work around this. Fail hard instead of failing
>> somewhere in migration code for strange, unrelated reasons. AFAICS, the
>> rebuilds will be triggered during reboot, so this should not affect
>> running guests, but only guests that reboot at a very bad time and
>> actually require size changes.
>>
>> Let's further limit the impact by checking if an actual resize of the
>> RAM (in number of pages) is required.
>>
>> Don't perform the checks in qemu_ram_resize(), as that's called during
>> migration when syncing the used_length. Update documentation.
>>
>> Cc: "Dr. David Alan Gilbert" <dgilb...@redhat.com>
>> Cc: Eduardo Habkost <ehabk...@redhat.com>
>> Cc: Paolo Bonzini <pbonz...@redhat.com>
>> Cc: Igor Mammedov <imamm...@redhat.com>
>> Cc: "Michael S. Tsirkin" <m...@redhat.com>
>> Cc: Richard Henderson <richard.hender...@linaro.org>
>> Cc: Shannon Zhao <shannon.z...@linaro.org>
>> Cc: Alex Bennée <alex.ben...@linaro.org>
>> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.th...@huawei.com>
>> Cc: Juan Quintela <quint...@redhat.com>
>> Signed-off-by: David Hildenbrand <da...@redhat.com>
>> ---
> 
> 
>>
>> Any idea how to avoid killing the guest? Anything obvious I am missing?
> 
> If you avoid the resize, it should be ok for both precopy & postcopy.
> 
> But, as you point out, if the guest's ACPI code is the one changing sizes,
> we are in trouble.  But really, it makes exactly zero sense to reset during
> migration.  If we _could_ catch the reset, the "intelligent" thing to do

I guess there are cases (e.g., guest admin is different to the host
admin, "intelligent tooling") where a reset could happen while
migrating. At least failing at that point won't result in losing data,
as the guest is still booting up.

> is:
> 
> - detect reset
> - launch guest on destination from zero.

And starting completely from zero might not always be the right thing to
do ...

> 
> I.e., no migration at all.  This would be my "better" idea, but I have
> no clue how to catch that kind of thing in a sane way that works on
> every architecture.

E.g., on s390x, there are different kinds of resets routed through
system reset requests. IIRC, some require memory to be kept, others to
be reset to 0 (currently not done, as discarding ram blocks while
postcopy is running does not work as expected).

Resets while migrating are really tricky when it comes to memory.
Fortunately, this case should be very rare to trigger.
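
To make the "only fail if the number of pages actually changes" part of
the commit message concrete, here is a rough, self-contained sketch of
the check. It is not the code from the patch: the helper names, the
explicit page_size parameter, and the migration_critical flag (standing
in for "past ram_save_setup() on the source" or "RAM block list already
synchronized on the target") are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of pages needed to cover len bytes.
 * page_size is assumed to be a non-zero power of two.
 */
static uint64_t sketch_pages(uint64_t len, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    /* Round up without risking overflow of len + page_size - 1. */
    return len / page_size + (len % page_size != 0);
}

/*
 * Hypothetical check: a resize has to be refused (fail hard) only if
 * migration is in its critical phase AND the resize changes the number
 * of pages. A resize that stays within the same page count is harmless.
 */
static bool sketch_resize_must_fail(bool migration_critical,
                                    uint64_t old_len, uint64_t new_len,
                                    uint64_t page_size)
{
    if (!migration_critical) {
        return false;
    }
    return sketch_pages(old_len, page_size) !=
           sketch_pages(new_len, page_size);
}
```

With 4 KiB pages, going from 4000 to 4096 bytes would be allowed under
this sketch even while migrating, whereas going from 4096 to 4097 bytes
would not, because the latter needs a second page.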

> 
> You get the:
> 
> Reviewed-by: Juan Quintela <quint...@redhat.com>
> 
> because:
> - your code change makes sense
> - the documentation update is good.
> 
> Thanks, Juan.
> 

Thanks!

-- 
Thanks,

David / dhildenb

