Would you mind suggesting any next steps on this issue?

1. Increase maximum size for "etc/acpi/rsdp", or
2. Remove host page size based alignment, and then fix any further live migration issue?

Thank you very much!

Dongli Zhang

On 1/31/23 1:17 AM, Feng Sun wrote:
> Michael S. Tsirkin <m...@redhat.com> wrote on Mon, Jan 30, 2023 at 23:07:
>>
>> On Mon, Jan 30, 2023 at 10:47:25PM +0800, Feng Sun wrote:
>>> Igor Mammedov <imamm...@redhat.com> wrote on Tue, Jan 24, 2023 at 18:30:
>>>>
>>>> On Tue, 17 Jan 2023 19:15:21 +0800
>>>> Sun Feng <loyo...@gmail.com> wrote:
>>>>
>>>>> Migrate from aarch64 host with PAGE_SIZE 64k to 4k failed with following
>>>>> errors:
>>>>>
>>>>> qmp_cmd_name: migrate-incoming, arguments: {"uri": "tcp:[::]:49152"}
>>>>> {"timestamp": {"seconds": 1673922775, "microseconds": 534702}, "event": "MIGRATION", "data": {"status": "setup"}}
>>>>> {"timestamp": {"seconds": 1673922776, "microseconds": 53003}, "event": "MIGRATION", "data": {"status": "active"}}
>>>>> 2023-01-17T02:32:56.058827Z qemu-system-aarch64: Length too large: /rom@etc/acpi/rsdp: 0x10000 > 0x1000: Invalid argument
>>>>
>>>> this should mention/explain why it's happening.
>>>>
>>>> i.e. we now have 4k limit for RSDP, but then source somehow managed to
>>>> start with 64k allocated for RSDP. It looks like limit isn't working
>>>> as expected to me.
>>>
>>> 4k limit should be romsize limit. I can see Rom '/rom@etc/acpi/rsdp'
>>> with romsize:4096, datasize:36.
>>> RAMBlock's used_length is set with datasize aligned to PAGE_SIZE, so
>>> it becomes 64k when PAGE_SIZE is 64k.
>>> ```
>>> static
>>> RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>>>                                   void (*resized)(const char*,
>>>                                                   uint64_t length,
>>>                                                   void *host),
>>>                                   void *host, uint32_t ram_flags,
>>>                                   MemoryRegion *mr, Error **errp)
>>> {
>>>     RAMBlock *new_block;
>>>     Error *local_err = NULL;
>>>
>>>     assert((ram_flags & ~(RAM_SHARED | RAM_RESIZEABLE | RAM_PREALLOC |
>>>                           RAM_NORESERVE)) == 0);
>>>     assert(!host ^ (ram_flags & RAM_PREALLOC));
>>>
>>>     size = HOST_PAGE_ALIGN(size);
>>>     max_size = HOST_PAGE_ALIGN(max_size);
>>>     new_block = g_malloc0(sizeof(*new_block));
>>>     new_block->mr = mr;
>>>     new_block->resized = resized;
>>>     new_block->used_length = size;
>>> ```
>>> So when migrate to 4k PAGE_SIZE, it will report the errors.
>>>
>>> ramblock information for PAGE_SIZE 64k and 4k.
>>> ```
>>> # getconf PAGE_SIZE
>>> 65536
>>> # virsh qemu-monitor-command testvm --hmp 'info ramblock'
>>> Block Name                            PSize   Offset              Used                Total
>>> mach-virt.ram                         64 KiB  0x0000000000000000  0x0000000040000000  0x0000000040000000
>>> virt.flash0                           64 KiB  0x0000000040000000  0x0000000004000000  0x0000000004000000
>>> virt.flash1                           64 KiB  0x0000000044000000  0x0000000004000000  0x0000000004000000
>>> /rom@etc/acpi/tables                  64 KiB  0x0000000048040000  0x0000000000020000  0x0000000000200000
>>> 0000:00:01.2:00.0/virtio-net-pci.rom  64 KiB  0x0000000048000000  0x0000000000040000  0x0000000000040000
>>> /rom@etc/table-loader                 64 KiB  0x0000000048240000  0x0000000000010000  0x0000000000010000
>>> /rom@etc/acpi/rsdp                    64 KiB  0x0000000048280000  0x0000000000010000  0x0000000000010000
>>>
>>> # getconf PAGE_SIZE
>>> 4096
>>> # virsh qemu-monitor-command testvm --hmp 'info ramblock'
>>> Block Name                            PSize   Offset              Used                Total
>>> mach-virt.ram                         4 KiB   0x0000000000000000  0x0000000800000000  0x0000000800000000
>>> virt.flash0                           4 KiB   0x0000000800000000  0x0000000004000000  0x0000000004000000
>>> virt.flash1                           4 KiB   0x0000000804000000  0x0000000004000000  0x0000000004000000
>>> /rom@etc/acpi/tables                  4 KiB   0x0000000808000000  0x0000000000020000  0x0000000000200000
>>> /rom@etc/table-loader                 4 KiB   0x0000000808200000  0x0000000000001000  0x0000000000010000
>>> /rom@etc/acpi/rsdp                    4 KiB   0x0000000808240000  0x0000000000001000  0x0000000000001000
>>> ```
>>
>> Oh interesting. I don't remember why I decided to align in.
>> What does the following do (warning: completely untested):
>>
>>
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index cb998cdf23..5c732101b9 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -2154,7 +2154,7 @@ RAMBlock *qemu_ram_alloc_internal(ram_addr_t size, ram_addr_t max_size,
>>                            RAM_NORESERVE)) == 0);
>>      assert(!host ^ (ram_flags & RAM_PREALLOC));
>>
>> -    size = HOST_PAGE_ALIGN(size);
>> +    // size = HOST_PAGE_ALIGN(size);
>>      max_size = HOST_PAGE_ALIGN(max_size);
>>      new_block = g_malloc0(sizeof(*new_block));
>>      new_block->mr = mr;
>>
>
> With additional change we can see actually used size with 'info ramblock',
>
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1837,7 +1837,7 @@ int qemu_ram_resize(RAMBlock *block, ram_addr_t newsize, Error **errp)
>
>      assert(block);
>
> -    newsize = HOST_PAGE_ALIGN(newsize);
> +    //newsize = HOST_PAGE_ALIGN(newsize);
>
>      if (block->used_length == newsize) {
>          /*
>
> # virsh qemu-monitor-command testvm --hmp 'info ramblock'
> Block Name                            PSize   Offset              Used                Total
> mach-virt.ram                         64 KiB  0x0000000000000000  0x0000000800000000  0x0000000800000000
> virt.flash0                           64 KiB  0x0000000800000000  0x0000000004000000  0x0000000004000000
> virt.flash1                           64 KiB  0x0000000804000000  0x0000000004000000  0x0000000004000000
> /rom@etc/acpi/tables                  64 KiB  0x0000000808000000  0x0000000000020000  0x0000000000200000
> /rom@etc/table-loader                 64 KiB  0x0000000808200000  0x0000000000000b00  0x0000000000010000
> /rom@etc/acpi/rsdp                    64 KiB  0x0000000808240000  0x0000000000000024  0x0000000000010000
>
> but migration needs more changes.
> I fixed the following error during migration:
>
> qemu-system-aarch64: ../softmmu/physmem.c:1059: cpu_physical_memory_test_and_clear_dirty: Assertion `start >= ramblock->offset && start + length <= ramblock->offset + ramblock->used_length' failed.
> 2023-01-31 04:09:40.934+0000: shutting down, reason=crashed
>
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -1055,7 +1055,7 @@ bool cpu_physical_memory_test_and_clear_dirty(ram_addr_t start,
>      ramblock = qemu_get_ram_block(start);
>      /* Range sanity check on the ramblock */
>
>      assert(start >= ramblock->offset &&
> -           start + length <= ramblock->offset + ramblock->used_length);
> +           start + length <= ramblock->offset + ramblock->max_length);
>
>      while (page < end) {
>          unsigned long idx = page / DIRTY_MEMORY_BLOCK_SIZE;
>
> but more issues still exist,
>
> source:
> 2023-01-31T05:23:28.051615Z qemu-system-aarch64: failed to save SaveStateEntry with id(name): 3(ram): -5
> 2023-01-31T05:23:28.053256Z qemu-system-aarch64: Unable to write to socket: Bad file descriptor
>
> target:
> 2023-01-31T05:23:28.049659Z qemu-system-aarch64: Received an unexpected compressed page
> 2023-01-31T05:23:28.049709Z qemu-system-aarch64: error while loading state for instance 0x0 of device 'ram'
> 2023-01-31T05:23:28.050095Z qemu-system-aarch64: load of migration failed: Invalid argument
>
> In my opinion, it would be a tricky way to set 64k and would not have
> migration compatibility problems.
> Of course, the best and appropriate way is to migrate with actual data size.
> I am not quite familiar with migration codes, if needed, I can help to
> do more migration patch tests.
>
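
For reference, the size mismatch discussed in this thread can be reproduced with plain arithmetic. The following is only a rough sketch, not QEMU code: `page_align_up()` is a hypothetical stand-in for what HOST_PAGE_ALIGN() is described as doing above, and `incoming_length_fits()` is an assumed model of the destination-side comparison behind the "Length too large" message. The 36-byte datasize and 4096-byte romsize are the values reported for '/rom@etc/acpi/rsdp'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Round size up to the next multiple of page_size (must be a power of two).
 * Hypothetical helper; it only mimics the page alignment described in the
 * thread and is not QEMU's actual HOST_PAGE_ALIGN() definition.
 */
static uint64_t page_align_up(uint64_t size, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}

/*
 * Assumed model of the destination-side check: the used length sent by the
 * source must not exceed the maximum length of the destination's block.
 */
static bool incoming_length_fits(uint64_t src_used_length,
                                 uint64_t dst_max_length)
{
    return src_used_length <= dst_max_length;
}
```

With these helpers, page_align_up(36, 65536) gives 0x10000 on the 64k source, while page_align_up(4096, 4096) gives 0x1000 as the block maximum on the 4k destination, so incoming_length_fits(0x10000, 0x1000) is false. That matches the "0x10000 > 0x1000" in the reported error. In the opposite direction (4k source, 64k destination) the same model gives 0x1000 <= 0x10000, so this particular check would pass.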