Would you mind suggesting any next steps on this issue?

1. Increase maximum size for "etc/acpi/rsdp", or

2. Remove host page size based alignment, and then fix any further live
migration issue?

Thank you very much!

Dongli Zhang

On 1/31/23 1:17 AM, Feng Sun wrote:
> Michael S. Tsirkin <m...@redhat.com> 于2023年1月30日周一 23:07写道:
>>
>> On Mon, Jan 30, 2023 at 10:47:25PM +0800, Feng Sun wrote:
>>> Igor Mammedov <imamm...@redhat.com> 于2023年1月24日周二 18:30写道:
>>>>
>>>> On Tue, 17 Jan 2023 19:15:21 +0800
>>>> Sun Feng <loyo...@gmail.com> wrote:
>>>>
>>>>> Migrate from aarch64 host with PAGE_SIZE 64k to 4k failed with following 
>>>>> errors:
>>>>>
>>>>> qmp_cmd_name: migrate-incoming, arguments: {"uri": "tcp:[::]:49152"}
>>>>> {"timestamp": {"seconds": 1673922775, "microseconds": 534702}, "event": 
>>>>> "MIGRATION", "data": {"status": "setup"}}
>>>>> {"timestamp": {"seconds": 1673922776, "microseconds": 53003}, "event": 
>>>>> "MIGRATION", "data": {"status": "active"}}
>>>>> 2023-01-17T02:32:56.058827Z qemu-system-aarch64: Length too large: 
>>>>> /rom@etc/acpi/rsdp: 0x10000 > 0x1000: Invalid argument
>>>>
>>>> this should mention/explain why it's happening.
>>>>
>>>> i.e we now have 4k limit for RSDP, but then source somehow managed to 
>>>> start with 64k
>>>> allocated to for RSDP. It looks like limit isn't working as expected to me.
>>>
>>> 4k limit should be romsize limit. I can see Rom '/rom@etc/acpi/rsdp'
>>> with romsize:4096, datasize:36.
>>> RAMBlock's used_length is set with datasize aligned to PAGE_SIZE, so
>>> it become 64k when PAGE_SIZE is 64k.
>>> ```
>>> static
>>> RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>>>                                   void (*resized)(const char*,
>>>                                                   uint64_t length,
>>>                                                   void *host),
>>>                                   void *host, uint32_t ram_flags,
>>>                                   MemoryRegion *mr, Error **errp)
>>> {
>>>     RAMBlock *new_block;
>>>     Error *local_err = NULL;
>>>
>>>     assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
>>>                           RAM_NORESERVE)) == 0);
>>>     assert(!host ^ (ram_flags & RAM_PREALLOC));
>>>
>>>     size = HOST_PAGE_ALIGN(size);
>>>     max_size = HOST_PAGE_ALIGN(max_size);
>>>     new_block = g_malloc0(sizeof(*new_block));
>>>     new_block->mr = mr;
>>>     new_block->resized = resized;
>>>     new_block->used_length = size;
>>> ```
>>> So when migrate to 4k PAGE_SIZE, it will report the errors.
>>>
>>> ramblock information for PAGE_SIZE 64k and 4k.
>>> ```
>>> # getconf PAGE_SIZE
>>> 65536
>>> # virsh qemu-monitor-command testvm --hmp 'info ramblock'
>>>               Block Name    PSize              Offset
>>> Used              Total
>>>            mach-virt.ram   64 KiB  0x0000000000000000
>>> 0x0000000040000000 0x0000000040000000
>>>              virt.flash0   64 KiB  0x0000000040000000
>>> 0x0000000004000000 0x0000000004000000
>>>              virt.flash1   64 KiB  0x0000000044000000
>>> 0x0000000004000000 0x0000000004000000
>>>     /rom@etc/acpi/tables   64 KiB  0x0000000048040000
>>> 0x0000000000020000 0x0000000000200000
>>> 0000:00:01.2:00.0/virtio-net-pci.rom   64 KiB  0x0000000048000000
>>> 0x0000000000040000 0x0000000000040000
>>>    /rom@etc/table-loader   64 KiB  0x0000000048240000
>>> 0x0000000000010000 0x0000000000010000
>>>       /rom@etc/acpi/rsdp   64 KiB  0x0000000048280000
>>> 0x0000000000010000 0x0000000000010000
>>>
>>> # getconf PAGE_SIZE
>>> 4096
>>> # virsh qemu-monitor-command testvm --hmp 'info ramblock'
>>>               Block Name    PSize              Offset
>>> Used              Total
>>>            mach-virt.ram    4 KiB  0x0000000000000000
>>> 0x0000000800000000 0x0000000800000000
>>>              virt.flash0    4 KiB  0x0000000800000000
>>> 0x0000000004000000 0x0000000004000000
>>>              virt.flash1    4 KiB  0x0000000804000000
>>> 0x0000000004000000 0x0000000004000000
>>>     /rom@etc/acpi/tables    4 KiB  0x0000000808000000
>>> 0x0000000000020000 0x0000000000200000
>>>    /rom@etc/table-loader    4 KiB  0x0000000808200000
>>> 0x0000000000001000 0x0000000000010000
>>>       /rom@etc/acpi/rsdp    4 KiB  0x0000000808240000
>>> 0x0000000000001000 0x0000000000001000
>>> ```
>>
>> Oh interesting. I don't remember why I decided to align in.
>> What does the following do (warning: completely untested):
>>
>>
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index cb998cdf23..5c732101b9 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -2154,7 +2154,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, 
>> ram_addr_t max_size,
>>                            RAM_NORESERVE)) == 0);
>>      assert(!host ^ (ram_flags & RAM_PREALLOC));
>>
>> -    size = HOST_PAGE_ALIGN(size);
>> +    // size = HOST_PAGE_ALIGN(size);
>>      max_size = HOST_PAGE_ALIGN(max_size);
>>      new_block = g_malloc0(sizeof(*new_block));
>>      new_block->mr = mr;
>>
> 
> With additional change we can see actually used size with 'info ramblock',
> 
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1837,7 +1837,7 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t
> newsize, Error **errp)
> 
>      assert(block);
> 
> -    newsize = HOST_PAGE_ALIGN(newsize);
> +    //newsize = HOST_PAGE_ALIGN(newsize);
> 
>      if (block->used_length == newsize) {
>          /*
> 
> # virsh qemu-monitor-command testvm --hmp 'info ramblock'
>               Block Name    PSize              Offset
> Used              Total
>            mach-virt.ram   64 KiB  0x0000000000000000
> 0x0000000800000000 0x0000000800000000
>              virt.flash0   64 KiB  0x0000000800000000
> 0x0000000004000000 0x0000000004000000
>              virt.flash1   64 KiB  0x0000000804000000
> 0x0000000004000000 0x0000000004000000
>     /rom@etc/acpi/tables   64 KiB  0x0000000808000000
> 0x0000000000020000 0x0000000000200000
>    /rom@etc/table-loader   64 KiB  0x0000000808200000
> 0x0000000000000b00 0x0000000000010000
>       /rom@etc/acpi/rsdp   64 KiB  0x0000000808240000
> 0x0000000000000024 0x0000000000010000
> 
> but migration needs more changes. I fixed the following error during 
> migration:
> 
> qemu-system-aarch64: ../softmmu/physmem.c:1059:
> cpu_physical_memory_test_and_clear_dirty: Assertion `start >=
> ramblock->offset && start + length <= ramblock->offset +
> ramblock->used_length' failed.
> 2023-01-31 04:09:40.934+0000: shutting down, reason=crashed
> 
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1055,7 +1055,7 @@ bool
> cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>          ramblock = qemu_get_ram_block(start);
>          /* Range sanity check on the ramblock */
> 
>          assert(start >= ramblock->offset &&
> -               start + length <= ramblock->offset + ramblock->used_length);
> +               start + length <= ramblock->offset + ramblock->max_length);
> 
>          while (page < end) {
>              unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
> 
> but more issues still exist,
> 
> source:
> 2023-01-31T05:23:28.051615Z qemu-system-aarch64: failed to save
> SaveStateEntry with id(name): 3(ram): -5
> 2023-01-31T05:23:28.053256Z qemu-system-aarch64: Unable to write to
> socket: Bad file descriptor
> 
> target:
> 2023-01-31T05:23:28.049659Z qemu-system-aarch64: Received an
> unexpected compressed page
> 2023-01-31T05:23:28.049709Z qemu-system-aarch64: error while loading
> state for instance 0x0 of device 'ram'
> 2023-01-31T05:23:28.050095Z qemu-system-aarch64: load of migration
> failed: Invalid argument
> 
> In my opinion, it would be a tricky way to set 64k and would not have
> migration compatibility problems.
> Of course, the best and appropriate way is to migrate with actual data size.
> I am not quite familiar with migration codes, if needed, I can help to
> do more migration patch tests.
> 


Reply via email to