Re: [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring
On 2/22/23 3:27 AM, Peter Maydell wrote: On Mon, 13 Feb 2023 at 00:40, Gavin Shan wrote: When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to enable the backup bitmap for the dirty ring. Otherwise, the migration will fail because those two devices are using the backup bitmap to track dirty guest memory, corresponding to various hardware tables. Signed-off-by: Gavin Shan Reviewed-by: Juan Quintela --- hw/arm/virt.c| 26 ++ target/arm/kvm64.c | 25 + target/arm/kvm_arm.h | 12 3 files changed, 63 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 75f28947de..ea9bd98a65 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2024,6 +2024,8 @@ static void machvirt_init(MachineState *machine) VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine); MachineClass *mc = MACHINE_GET_CLASS(machine); const CPUArchIdList *possible_cpus; +const char *gictype = NULL; +const char *itsclass = NULL; MemoryRegion *sysmem = get_system_memory(); MemoryRegion *secure_sysmem = NULL; MemoryRegion *tag_sysmem = NULL; @@ -2071,6 +2073,30 @@ static void machvirt_init(MachineState *machine) */ finalize_gic_version(vms); +/* + * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty + * bitmap has to be enabled for KVM dirty ring, before any memory + * slot is added. Otherwise, the migration will fail with the dirty + * ring. + */ +if (kvm_enabled()) { +if (vms->gic_version != VIRT_GIC_VERSION_2) { +gictype = gicv3_class_name(); +} + +if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) { +itsclass = its_class_name(); +} + +if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) || + (itsclass && !strcmp(itsclass, "arm-its-kvm"))) && +!kvm_arm_enable_dirty_ring_with_bitmap()) { +error_report("Failed to enable the backup bitmap for " + "KVM dirty ring"); +exit(1); +} +} Why does this need to be board-specific code? Is there some way we can just do the right thing automatically? Why does the GIC/ITS matter? The kernel should already know whether we have asked it to do something that needs this extra extension, so I think we ought to be able in the generic "enable the dirty ring" code say "if the kernel says we need this extra thing, also enable this extra thing". Or if that's too early, we can do the extra part in a generic hook a bit later. In the future there might be other things, presumably, that need the backup bitmap, so it would be more future proof not to need to also change QEMU to add extra logic checks that duplicate the logic the kernel already has. When the dirty ring is enabled, a per-vcpu buffer is used to track the dirty pages. The prerequisite to use the per-vcpu buffer is existing running VCPU context. There are two cases where no running VCPU context exists and the backup bitmap extension is needed, as we know until now: (a) save/restore GICv3 tables; (b) save/restore ITS tables; These two cases are related to KVM device "kvm-arm-gicv3" and "arm-its-kvm", which are only needed by virt machine at present. So we needn't the backup bitmap extension for other boards. The host kernel always exports the capability KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP for ARM64. The capability isn't exported for x86 because we needn't it there. The generic path to enable the extension would be in kvm_init(). I think the extension is enabled unconditionally if it has been exported by host kernel. It means there will be unnecessary overhead to synchronize the backup bitmap when the aforementioned KVM devices aren't used, but the overhead should be very small and acceptable. The only concern is host kernel has to allocate the backup bitmap, which is totally no use. Please let me know your thoughts, Peter. qemu_init qemu_create_machine : : configure_accelerators do_configure_accelerator accel_init_machine kvm_init// Where the dirty ring is eanbled. Would be best kvm_arch_init // place to enable the backup bitmap extension regardless : // of 'kvm-arm-gicv3' and 'arm-its-kvm' are used : qmp_x_exit_preconfig qemu_init_board machine_run_board_init machvirt_init // memory slots are added here and where we know the KVM devices : // are used : accel_setup_post // no backend for KVM yet, xen only Note that the capability KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled if the dirty ring isn't enabled or memory slots have been added. Thanks, Gavin
Re: [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage
On 2/22/23 4:46 AM, Peter Xu wrote: On Mon, Feb 13, 2023 at 08:39:22AM +0800, Gavin Shan wrote: In the last stage of live migration or memory slot removal, the backup bitmap needs to be synchronized when it has been enabled. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 11 +++ include/sysemu/kvm_int.h | 1 + 2 files changed, 12 insertions(+) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 01a6a026af..b5e12de522 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1352,6 +1352,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, */ if (kvm_state->kvm_dirty_ring_size) { kvm_dirty_ring_reap_locked(kvm_state, NULL); +if (kvm_state->kvm_dirty_ring_with_bitmap) { +kvm_slot_sync_dirty_pages(mem); +kvm_slot_get_dirty_log(kvm_state, mem); +} } else { kvm_slot_get_dirty_log(kvm_state, mem); } IIUC after the memory atomic update changes lands QEMU, we may not need this sync at all. My understanding is that we sync dirty log here only because of non-atomic updates happening in the past and we may lose dirty bits unexpectedly. Maybe Paolo knows. But that needs some more justification and history digging, so definitely more suitable to leave it for later and separate discussion. Reviewed-by: Peter Xu Peter, could you please give some hints for me to understand the atomic and non-atomic update here? Ok, I will drop this part of changes in next revision with the assumption that we have atomic update supported for ARM64. Thanks, Gavin @@ -1573,6 +1577,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage) mem = >slots[i]; if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { kvm_slot_sync_dirty_pages(mem); + +if (s->kvm_dirty_ring_with_bitmap && last_stage && +kvm_slot_get_dirty_log(s, mem)) { +kvm_slot_sync_dirty_pages(mem); +} + /* * This is not needed by KVM_GET_DIRTY_LOG because the * ioctl will unconditionally overwrite the whole region. @@ -3701,6 +3711,7 @@ static void kvm_accel_instance_init(Object *obj) s->kernel_irqchip_split = ON_OFF_AUTO_AUTO; /* KVM dirty ring is by default off */ s->kvm_dirty_ring_size = 0; +s->kvm_dirty_ring_with_bitmap = false; s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN; s->notify_window = 0; } diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index 60b520a13e..fdd5b1bde0 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -115,6 +115,7 @@ struct KVMState } *as; uint64_t kvm_dirty_ring_bytes; /* Size of the per-vcpu dirty ring */ uint32_t kvm_dirty_ring_size; /* Number of dirty GFNs per ring */ +bool kvm_dirty_ring_with_bitmap; struct KVMDirtyRingReaper reaper; NotifyVmexitOption notify_vmexit; uint32_t notify_window; -- 2.23.0
Re: [PATCH] hw/arm/virt: Prevent CPUs in one socket to span mutiple NUMA nodes
On 2/22/23 10:31 AM, Philippe Mathieu-Daudé wrote: On 22/2/23 00:12, Gavin Shan wrote: On 2/21/23 9:21 PM, Philippe Mathieu-Daudé wrote: On 21/2/23 10:21, Gavin Shan wrote: On 2/21/23 8:15 PM, Philippe Mathieu-Daudé wrote: On 21/2/23 09:53, Gavin Shan wrote: Linux kernel guest reports warning when two CPUs in one socket have been associated with different NUMA nodes, using the following command lines. -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \ -numa node,nodeid=0,cpus=0-1,memdev=ram0 \ -numa node,nodeid=1,cpus=2-3,memdev=ram1 \ -numa node,nodeid=2,cpus=4-5,memdev=ram2 \ [ cut here ] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1 pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : build_sched_domains+0x284/0x910 lr : build_sched_domains+0x184/0x910 sp : 8804bd50 x29: 8804bd50 x28: 0002 x27: x26: 89cf9a80 x25: x24: 89cbf840 x23: 80325000 x22: 005df800 x21: 8a4ce508 x20: x19: 80324440 x18: 0014 x17: 388925c0 x16: 5386a066 x15: 9c10cc2e x14: 01c0 x13: 0001 x12: 7fffb1a0 x11: 7fffb180 x10: 8a4ce508 x9 : 0041 x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001 x5 : 0001 x4 : 0007 x3 : 0002 x2 : 1000 x1 : 8a4cf928 x0 : 0001 Call trace: build_sched_domains+0x284/0x910 sched_init_domains+0xac/0xe0 sched_init_smp+0x48/0xc8 kernel_init_freeable+0x140/0x1ac kernel_init+0x28/0x140 ret_from_fork+0x10/0x20 Fix it by preventing mutiple CPUs in one socket to be associated with different NUMA nodes. Reported-by: Yihuang Yu Signed-off-by: Gavin Shan --- hw/arm/virt.c | 37 + 1 file changed, 37 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ac626b3bef..e0af267c77 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu) return false; } +static bool numa_state_valid(MachineState *ms) +{ + MachineClass *mc = MACHINE_GET_CLASS(ms); + NumaState *state = ms->numa_state; + const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms); + const CPUArchId *cpus = possible_cpus->cpus; + int len = possible_cpus->len, i, j; + + if (!state || state->num_nodes <= 1 || len <= 1) { + return true; + } + + for (i = 0; i < len; i++) { + for (j = i + 1; j < len; j++) { + if (cpus[i].props.has_socket_id && + cpus[i].props.has_node_id && + cpus[j].props.has_socket_id && + cpus[j].props.has_node_id && + cpus[i].props.socket_id == cpus[j].props.socket_id && + cpus[i].props.node_id != cpus[j].props.node_id) { + error_report("CPU-%d and CPU-%d in socket-%ld have been " + "associated with node-%ld and node-%ld", + i, j, cpus[i].props.socket_id, + cpus[i].props.node_id, + cpus[j].props.node_id); + return false; + } + } + } + + return true; +} + static void create_randomness(MachineState *ms, const char *node) { struct { @@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine) exit(1); } + if (!numa_state_valid(machine)) { + exit(1); + } Why restrict to the virt machine? We tried x86 machines and virt machine, but the issue isn't reproducible on x86 machines. So I think it's machine or architecture specific issue. However, I believe RiscV should have similar issue because linux/drivers/base/arch_topology.c is shared by ARM64 and RiscV. x86 doesn't use the driver to populate its CPU topology. Oh, I haven't thought about the other archs, I meant this seem a generic issue which affects all (ARM) machines, so why restrict to the (ARM) virt machine? [Ccing Igor for comments] Well, virt machine is the only concern to us for now. You're right that all ARM64 and ARM machines need this check and limitation. So the check needs to be done in the generic path. The best way I can figure out is like something below. The idea is to introduce a switch to 'struct NumaState' and do the check in the generic path. The switch is turned on by individual machines. Please let me know if you have better ideas Can't this be done generically in machine_numa_finish_cpu_init() ->
Re: [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization
On 2/22/23 4:36 AM, Peter Xu wrote: On Mon, Feb 13, 2023 at 08:39:21AM +0800, Gavin Shan wrote: The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap needs to collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callbacks. This last stage indicator isn't used until the dirty ring is enabled in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Peter Xu One trivial thing to mention below. --- accel/kvm/kvm-all.c | 2 +- include/exec/memory.h | 5 +++-- migration/dirtyrate.c | 4 ++-- migration/ram.c | 20 ++-- softmmu/memory.c | 10 +- 5 files changed, 21 insertions(+), 20 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 9b26582655..01a6a026af 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener, kvm_slots_unlock(); } -static void kvm_log_sync_global(MemoryListener *l) +static void kvm_log_sync_global(MemoryListener *l, bool last_stage) { KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener); KVMState *s = kvm_state; diff --git a/include/exec/memory.h b/include/exec/memory.h index 2e602a2fad..75b2fd9f48 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -929,8 +929,9 @@ struct MemoryListener { * its @log_sync must be NULL. Vice versa. * * @listener: The #MemoryListener. + * @last_stage: The last stage to synchronize the log during migration IMHO it may be important to mention the vcpu status here that the caller guarantees to call the last_stage==true only once, only after all vcpus are stopped (and vcpus will not be started again if migration succeeded). Yes, I will update the comments in next revision accordingly. */ -void (*log_sync_global)(MemoryListener *listener); +void (*log_sync_global)(MemoryListener *listener, bool last_stage); Thanks, Gavin
Re: [PATCH v1 1/6] linux-headers: Update for dirty ring
On 2/22/23 3:30 AM, Peter Maydell wrote: On Mon, 13 Feb 2023 at 00:39, Gavin Shan wrote: Signed-off-by: Gavin Shan --- linux-headers/asm-arm64/kvm.h | 1 + linux-headers/linux/kvm.h | 2 ++ 2 files changed, 3 insertions(+) For this to be a non-RFC patch, this needs to be a proper sync of the headers against an upstream kernel tree. (By-hand tweaks are fine for purposes of working on and getting patchsets reviewed.) Yes, I vaguely remember there is script to synchronize linux header files, which is './scripts/update-linux-headers.sh'. I think I need to run the following command to update? # ./scripts/update-linux-headers.sh Thanks, Gavin
Re: [PATCH] hw/arm/virt: Prevent CPUs in one socket to span mutiple NUMA nodes
On 2/21/23 9:21 PM, Philippe Mathieu-Daudé wrote: On 21/2/23 10:21, Gavin Shan wrote: On 2/21/23 8:15 PM, Philippe Mathieu-Daudé wrote: On 21/2/23 09:53, Gavin Shan wrote: Linux kernel guest reports warning when two CPUs in one socket have been associated with different NUMA nodes, using the following command lines. -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \ -numa node,nodeid=0,cpus=0-1,memdev=ram0 \ -numa node,nodeid=1,cpus=2-3,memdev=ram1 \ -numa node,nodeid=2,cpus=4-5,memdev=ram2 \ [ cut here ] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1 pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : build_sched_domains+0x284/0x910 lr : build_sched_domains+0x184/0x910 sp : 8804bd50 x29: 8804bd50 x28: 0002 x27: x26: 89cf9a80 x25: x24: 89cbf840 x23: 80325000 x22: 005df800 x21: 8a4ce508 x20: x19: 80324440 x18: 0014 x17: 388925c0 x16: 5386a066 x15: 9c10cc2e x14: 01c0 x13: 0001 x12: 7fffb1a0 x11: 7fffb180 x10: 8a4ce508 x9 : 0041 x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001 x5 : 0001 x4 : 0007 x3 : 0002 x2 : 1000 x1 : 8a4cf928 x0 : 0001 Call trace: build_sched_domains+0x284/0x910 sched_init_domains+0xac/0xe0 sched_init_smp+0x48/0xc8 kernel_init_freeable+0x140/0x1ac kernel_init+0x28/0x140 ret_from_fork+0x10/0x20 Fix it by preventing mutiple CPUs in one socket to be associated with different NUMA nodes. Reported-by: Yihuang Yu Signed-off-by: Gavin Shan --- hw/arm/virt.c | 37 + 1 file changed, 37 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ac626b3bef..e0af267c77 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu) return false; } +static bool numa_state_valid(MachineState *ms) +{ + MachineClass *mc = MACHINE_GET_CLASS(ms); + NumaState *state = ms->numa_state; + const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms); + const CPUArchId *cpus = possible_cpus->cpus; + int len = possible_cpus->len, i, j; + + if (!state || state->num_nodes <= 1 || len <= 1) { + return true; + } + + for (i = 0; i < len; i++) { + for (j = i + 1; j < len; j++) { + if (cpus[i].props.has_socket_id && + cpus[i].props.has_node_id && + cpus[j].props.has_socket_id && + cpus[j].props.has_node_id && + cpus[i].props.socket_id == cpus[j].props.socket_id && + cpus[i].props.node_id != cpus[j].props.node_id) { + error_report("CPU-%d and CPU-%d in socket-%ld have been " + "associated with node-%ld and node-%ld", + i, j, cpus[i].props.socket_id, + cpus[i].props.node_id, + cpus[j].props.node_id); + return false; + } + } + } + + return true; +} + static void create_randomness(MachineState *ms, const char *node) { struct { @@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine) exit(1); } + if (!numa_state_valid(machine)) { + exit(1); + } Why restrict to the virt machine? We tried x86 machines and virt machine, but the issue isn't reproducible on x86 machines. So I think it's machine or architecture specific issue. However, I believe RiscV should have similar issue because linux/drivers/base/arch_topology.c is shared by ARM64 and RiscV. x86 doesn't use the driver to populate its CPU topology. Oh, I haven't thought about the other archs, I meant this seem a generic issue which affects all (ARM) machines, so why restrict to the (ARM) virt machine? [Ccing Igor for comments] Well, virt machine is the only concern to us for now. You're right that all ARM64 and ARM machines need this check and limitation. So the check needs to be done in the generic path. The best way I can figure out is like something below. The idea is to introduce a switch to 'struct NumaState' and do the check in the generic path. The switch is turned on by individual machines. Please let me know if you have better ideas - Add 'bool struct NumaState::has_strict_socket_mapping', which is 'false' by default until machine specific initialization function calls helpe
Re: [PATCH] kvm: dirty-ring: Fix race with vcpu creation
On 2/6/23 10:00 AM, Peter Xu wrote: On Tue, Sep 27, 2022 at 11:46:53AM -0400, Peter Xu wrote: It's possible that we want to reap a dirty ring on a vcpu that is during creation, because the vcpu is put onto list (CPU_FOREACH visible) before initialization of the structures. In this case: qemu_init_vcpu x86_cpu_realizefn cpu_exec_realizefn cpu_list_add < can be probed by CPU_FOREACH qemu_init_vcpu cpus_accel->create_vcpu_thread(cpu); kvm_init_vcpu map kvm_dirty_gfns <--- kvm_dirty_gfns valid Don't try to reap dirty ring on vcpus during creation or it'll crash. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2124756 Reported-by: Xiaohui Li Signed-off-by: Peter Xu --- accel/kvm/kvm-all.c | 9 + 1 file changed, 9 insertions(+) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 5acab1767f..df5fabd3a8 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -757,6 +757,15 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu) uint32_t ring_size = s->kvm_dirty_ring_size; uint32_t count = 0, fetch = cpu->kvm_fetch_index; +/* + * It's possible that we race with vcpu creation code where the vcpu is + * put onto the vcpus list but not yet initialized the dirty ring + * structures. If so, skip it. + */ +if (!cpu->created) { +return 0; +} + assert(dirty_gfns && ring_size); trace_kvm_dirty_ring_reap_vcpu(cpu->cpu_index); Reviewed-by: Gavin Shan
Re: [PATCH] hw/arm/virt: Prevent CPUs in one socket to span mutiple NUMA nodes
On 2/21/23 8:15 PM, Philippe Mathieu-Daudé wrote: On 21/2/23 09:53, Gavin Shan wrote: Linux kernel guest reports warning when two CPUs in one socket have been associated with different NUMA nodes, using the following command lines. -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \ -numa node,nodeid=0,cpus=0-1,memdev=ram0 \ -numa node,nodeid=1,cpus=2-3,memdev=ram1 \ -numa node,nodeid=2,cpus=4-5,memdev=ram2 \ [ cut here ] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1 pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : build_sched_domains+0x284/0x910 lr : build_sched_domains+0x184/0x910 sp : 8804bd50 x29: 8804bd50 x28: 0002 x27: x26: 89cf9a80 x25: x24: 89cbf840 x23: 80325000 x22: 005df800 x21: 8a4ce508 x20: x19: 80324440 x18: 0014 x17: 388925c0 x16: 5386a066 x15: 9c10cc2e x14: 01c0 x13: 0001 x12: 7fffb1a0 x11: 7fffb180 x10: 8a4ce508 x9 : 0041 x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001 x5 : 0001 x4 : 0007 x3 : 0002 x2 : 1000 x1 : 8a4cf928 x0 : 0001 Call trace: build_sched_domains+0x284/0x910 sched_init_domains+0xac/0xe0 sched_init_smp+0x48/0xc8 kernel_init_freeable+0x140/0x1ac kernel_init+0x28/0x140 ret_from_fork+0x10/0x20 Fix it by preventing mutiple CPUs in one socket to be associated with different NUMA nodes. Reported-by: Yihuang Yu Signed-off-by: Gavin Shan --- hw/arm/virt.c | 37 + 1 file changed, 37 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ac626b3bef..e0af267c77 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu) return false; } +static bool numa_state_valid(MachineState *ms) +{ + MachineClass *mc = MACHINE_GET_CLASS(ms); + NumaState *state = ms->numa_state; + const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms); + const CPUArchId *cpus = possible_cpus->cpus; + int len = possible_cpus->len, i, j; + + if (!state || state->num_nodes <= 1 || len <= 1) { + return true; + } + + for (i = 0; i < len; i++) { + for (j = i + 1; j < len; j++) { + if (cpus[i].props.has_socket_id && + cpus[i].props.has_node_id && + cpus[j].props.has_socket_id && + cpus[j].props.has_node_id && + cpus[i].props.socket_id == cpus[j].props.socket_id && + cpus[i].props.node_id != cpus[j].props.node_id) { + error_report("CPU-%d and CPU-%d in socket-%ld have been " + "associated with node-%ld and node-%ld", + i, j, cpus[i].props.socket_id, + cpus[i].props.node_id, + cpus[j].props.node_id); + return false; + } + } + } + + return true; +} + static void create_randomness(MachineState *ms, const char *node) { struct { @@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine) exit(1); } + if (!numa_state_valid(machine)) { + exit(1); + } Why restrict to the virt machine? We tried x86 machines and virt machine, but the issue isn't reproducible on x86 machines. So I think it's machine or architecture specific issue. However, I believe RiscV should have similar issue because linux/drivers/base/arch_topology.c is shared by ARM64 and RiscV. x86 doesn't use the driver to populate its CPU topology. Thanks, Gavin
[PATCH] hw/arm/virt: Prevent CPUs in one socket to span mutiple NUMA nodes
Linux kernel guest reports warning when two CPUs in one socket have been associated with different NUMA nodes, using the following command lines. -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \ -numa node,nodeid=0,cpus=0-1,memdev=ram0\ -numa node,nodeid=1,cpus=2-3,memdev=ram1\ -numa node,nodeid=2,cpus=4-5,memdev=ram2\ [ cut here ] WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 build_sched_domains+0x284/0x910 Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1 pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : build_sched_domains+0x284/0x910 lr : build_sched_domains+0x184/0x910 sp : 8804bd50 x29: 8804bd50 x28: 0002 x27: x26: 89cf9a80 x25: x24: 89cbf840 x23: 80325000 x22: 005df800 x21: 8a4ce508 x20: x19: 80324440 x18: 0014 x17: 388925c0 x16: 5386a066 x15: 9c10cc2e x14: 01c0 x13: 0001 x12: 7fffb1a0 x11: 7fffb180 x10: 8a4ce508 x9 : 0041 x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001 x5 : 0001 x4 : 0007 x3 : 0002 x2 : 1000 x1 : 8a4cf928 x0 : 0001 Call trace: build_sched_domains+0x284/0x910 sched_init_domains+0xac/0xe0 sched_init_smp+0x48/0xc8 kernel_init_freeable+0x140/0x1ac kernel_init+0x28/0x140 ret_from_fork+0x10/0x20 Fix it by preventing mutiple CPUs in one socket to be associated with different NUMA nodes. Reported-by: Yihuang Yu Signed-off-by: Gavin Shan --- hw/arm/virt.c | 37 + 1 file changed, 37 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ac626b3bef..e0af267c77 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu) return false; } +static bool numa_state_valid(MachineState *ms) +{ +MachineClass *mc = MACHINE_GET_CLASS(ms); +NumaState *state = ms->numa_state; +const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms); +const CPUArchId *cpus = possible_cpus->cpus; +int len = possible_cpus->len, i, j; + +if (!state || state->num_nodes <= 1 || len <= 1) { +return true; +} + +for (i = 0; i < len; i++) { +for (j = i + 1; j < len; j++) { +if (cpus[i].props.has_socket_id && +cpus[i].props.has_node_id && +cpus[j].props.has_socket_id && +cpus[j].props.has_node_id && +cpus[i].props.socket_id == cpus[j].props.socket_id && +cpus[i].props.node_id != cpus[j].props.node_id) { +error_report("CPU-%d and CPU-%d in socket-%ld have been " + "associated with node-%ld and node-%ld", + i, j, cpus[i].props.socket_id, + cpus[i].props.node_id, + cpus[j].props.node_id); +return false; +} +} +} + +return true; +} + static void create_randomness(MachineState *ms, const char *node) { struct { @@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine) exit(1); } +if (!numa_state_valid(machine)) { +exit(1); +} + possible_cpus = mc->possible_cpu_arch_ids(machine); /* -- 2.23.0
[PATCH v1 4/6] kvm: Add helper kvm_dirty_ring_init()
Due to multiple capabilities associated with the dirty ring for different architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and arm64 separately. There will be more to be done in order to support the dirty ring for arm64. Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this, the code looks a bit clean. No functional change intended. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 76 - 1 file changed, 47 insertions(+), 29 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index b5e12de522..e5035026c9 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1453,6 +1453,50 @@ static int kvm_dirty_ring_reaper_init(KVMState *s) return 0; } +static int kvm_dirty_ring_init(KVMState *s) +{ +uint32_t ring_size = s->kvm_dirty_ring_size; +uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn); +int ret; + +s->kvm_dirty_ring_size = 0; +s->kvm_dirty_ring_bytes = 0; + +/* Bail if the dirty ring size isn't specified */ +if (!ring_size) { +return 0; +} + +/* + * Read the max supported pages. Fall back to dirty logging mode + * if the dirty ring isn't supported. + */ +ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); +if (ret <= 0) { +warn_report("KVM dirty ring not available, using bitmap method"); +return 0; +} + +if (ring_bytes > ret) { +error_report("KVM dirty ring size %" PRIu32 " too big " + "(maximum is %ld). Please use a smaller value.", + ring_size, (long)ret / sizeof(struct kvm_dirty_gfn)); +return -EINVAL; +} + +ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); +if (ret) { +error_report("Enabling of KVM dirty ring failed: %s. " + "Suggested minimum value is 1024.", strerror(-ret)); +return -EIO; +} + +s->kvm_dirty_ring_size = ring_size; +s->kvm_dirty_ring_bytes = ring_bytes; + +return 0; +} + static void kvm_region_add(MemoryListener *listener, MemoryRegionSection *section) { @@ -2522,35 +2566,9 @@ static int kvm_init(MachineState *ms) * Enable KVM dirty ring if supported, otherwise fall back to * dirty logging mode */ -if (s->kvm_dirty_ring_size > 0) { -uint64_t ring_bytes; - -ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn); - -/* Read the max supported pages */ -ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); -if (ret > 0) { -if (ring_bytes > ret) { -error_report("KVM dirty ring size %" PRIu32 " too big " - "(maximum is %ld). Please use a smaller value.", - s->kvm_dirty_ring_size, - (long)ret / sizeof(struct kvm_dirty_gfn)); -ret = -EINVAL; -goto err; -} - -ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); -if (ret) { -error_report("Enabling of KVM dirty ring failed: %s. " - "Suggested minimum value is 1024.", strerror(-ret)); -goto err; -} - -s->kvm_dirty_ring_bytes = ring_bytes; - } else { - warn_report("KVM dirty ring not available, using bitmap method"); - s->kvm_dirty_ring_size = 0; -} +ret = kvm_dirty_ring_init(s); +if (ret < 0) { +goto err; } /* -- 2.23.0
[PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization
The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap needs to collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callbacks. This last stage indicator isn't used until the dirty ring is enabled in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 2 +- include/exec/memory.h | 5 +++-- migration/dirtyrate.c | 4 ++-- migration/ram.c | 20 ++-- softmmu/memory.c | 10 +- 5 files changed, 21 insertions(+), 20 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 9b26582655..01a6a026af 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener, kvm_slots_unlock(); } -static void kvm_log_sync_global(MemoryListener *l) +static void kvm_log_sync_global(MemoryListener *l, bool last_stage) { KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener); KVMState *s = kvm_state; diff --git a/include/exec/memory.h b/include/exec/memory.h index 2e602a2fad..75b2fd9f48 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -929,8 +929,9 @@ struct MemoryListener { * its @log_sync must be NULL. Vice versa. * * @listener: The #MemoryListener. + * @last_stage: The last stage to synchronize the log during migration */ -void (*log_sync_global)(MemoryListener *listener); +void (*log_sync_global)(MemoryListener *listener, bool last_stage); /** * @log_clear: @@ -2408,7 +2409,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr, * * Synchronizes the dirty page log for all address spaces. */ -void memory_global_dirty_log_sync(void); +void memory_global_dirty_log_sync(bool last_stage); /** * memory_global_dirty_log_sync: synchronize the dirty log for all memory diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c index 575d48c397..da9b4a1f8d 100644 --- a/migration/dirtyrate.c +++ b/migration/dirtyrate.c @@ -101,7 +101,7 @@ void global_dirty_log_change(unsigned int flag, bool start) static void global_dirty_log_sync(unsigned int flag, bool one_shot) { qemu_mutex_lock_iothread(); -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(false); if (one_shot) { memory_global_dirty_log_stop(flag); } @@ -553,7 +553,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config) * skip it unconditionally and start dirty tracking * from 2'round of log sync */ -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(false); /* * reset page protect manually and unconditionally. diff --git a/migration/ram.c b/migration/ram.c index b966e148c2..52abf015fd 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1176,7 +1176,7 @@ static void migration_trigger_throttle(RAMState *rs) } } -static void migration_bitmap_sync(RAMState *rs) +static void migration_bitmap_sync(RAMState *rs, bool last_stage) { RAMBlock *block; int64_t end_time; @@ -1188,7 +1188,7 @@ static void migration_bitmap_sync(RAMState *rs) } trace_migration_bitmap_sync_start(); -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(last_stage); qemu_mutex_lock(>bitmap_mutex); WITH_RCU_READ_LOCK_GUARD() { @@ -1222,7 +1222,7 @@ static void migration_bitmap_sync(RAMState *rs) } } -static void migration_bitmap_sync_precopy(RAMState *rs) +static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage) { Error *local_err = NULL; @@ -1235,7 +1235,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs) local_err = NULL; } -migration_bitmap_sync(rs); +migration_bitmap_sync(rs, last_stage); if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, _err)) { error_report_err(local_err); @@ -2887,7 +2887,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms) RCU_READ_LOCK_GUARD(); /* This should be our last sync, the src is now paused */ -migration_bitmap_sync(rs); +migration_bitmap_sync(rs, false); /* Easiest way to make sure we don't resume in the middle of a host-page */ rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL; @@ -3077,7 +3077,7 @@ static void ram_init_bitmaps(RAMState *rs) /* We don't use dirty log with background snapshots */ if (!migrate_background_snapshot()) { memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION); -migration_bitmap_sync_prec
[PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring
When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to enable the backup bitmap for the dirty ring. Otherwise, the migration will fail because those two devices are using the backup bitmap to track dirty guest memory, corresponding to various hardware tables. Signed-off-by: Gavin Shan Reviewed-by: Juan Quintela --- hw/arm/virt.c| 26 ++ target/arm/kvm64.c | 25 + target/arm/kvm_arm.h | 12 3 files changed, 63 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 75f28947de..ea9bd98a65 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2024,6 +2024,8 @@ static void machvirt_init(MachineState *machine) VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine); MachineClass *mc = MACHINE_GET_CLASS(machine); const CPUArchIdList *possible_cpus; +const char *gictype = NULL; +const char *itsclass = NULL; MemoryRegion *sysmem = get_system_memory(); MemoryRegion *secure_sysmem = NULL; MemoryRegion *tag_sysmem = NULL; @@ -2071,6 +2073,30 @@ static void machvirt_init(MachineState *machine) */ finalize_gic_version(vms); +/* + * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty + * bitmap has to be enabled for KVM dirty ring, before any memory + * slot is added. Otherwise, the migration will fail with the dirty + * ring. + */ +if (kvm_enabled()) { +if (vms->gic_version != VIRT_GIC_VERSION_2) { +gictype = gicv3_class_name(); +} + +if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) { +itsclass = its_class_name(); +} + +if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) || + (itsclass && !strcmp(itsclass, "arm-its-kvm"))) && +!kvm_arm_enable_dirty_ring_with_bitmap()) { +error_report("Failed to enable the backup bitmap for " + "KVM dirty ring"); +exit(1); +} +} + if (vms->secure) { /* * The Secure view of the world is the same as the NonSecure, diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c index 1197253d12..91960e1cd3 100644 --- a/target/arm/kvm64.c +++ b/target/arm/kvm64.c @@ -764,6 +764,31 @@ bool kvm_arm_steal_time_supported(void) return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME); } +bool kvm_arm_enable_dirty_ring_with_bitmap(void) +{ +int ret; + +/* No need to enable the backup bitmap if dirty ring isn't enabled */ +if (!kvm_dirty_ring_enabled()) { +return true; +} + +ret = kvm_check_extension(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP); +if (!ret) { +return false; +} + +ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, +0, 1); +if (ret) { +return false; +} + +kvm_state->kvm_dirty_ring_with_bitmap = true; + +return true; +} + QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1); uint32_t kvm_arm_sve_get_vls(CPUState *cs) diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h index 99017b635c..402281e61a 100644 --- a/target/arm/kvm_arm.h +++ b/target/arm/kvm_arm.h @@ -282,6 +282,13 @@ void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error **errp); */ bool kvm_arm_steal_time_supported(void); +/** + * kvm_arm_enable_dirty_ring_with_bitmap: + * Returns: true if KVM dirty ring's backup bitmap is enabled + * and false otherwise. + */ +bool kvm_arm_enable_dirty_ring_with_bitmap(void); + /** * kvm_arm_aarch32_supported: * @@ -395,6 +402,11 @@ static inline bool kvm_arm_steal_time_supported(void) return false; } +static inline bool kvm_arm_enable_dirty_ring_with_bitmap(void) +{ +return false; +} + /* * These functions should never actually be called without KVM support. */ -- 2.23.0
[PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage
In the last stage of live migration or memory slot removal, the backup bitmap needs to be synchronized when it has been enabled. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 11 +++ include/sysemu/kvm_int.h | 1 + 2 files changed, 12 insertions(+) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 01a6a026af..b5e12de522 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1352,6 +1352,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, */ if (kvm_state->kvm_dirty_ring_size) { kvm_dirty_ring_reap_locked(kvm_state, NULL); +if (kvm_state->kvm_dirty_ring_with_bitmap) { +kvm_slot_sync_dirty_pages(mem); +kvm_slot_get_dirty_log(kvm_state, mem); +} } else { kvm_slot_get_dirty_log(kvm_state, mem); } @@ -1573,6 +1577,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage) mem = >slots[i]; if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { kvm_slot_sync_dirty_pages(mem); + +if (s->kvm_dirty_ring_with_bitmap && last_stage && +kvm_slot_get_dirty_log(s, mem)) { +kvm_slot_sync_dirty_pages(mem); +} + /* * This is not needed by KVM_GET_DIRTY_LOG because the * ioctl will unconditionally overwrite the whole region. @@ -3701,6 +3711,7 @@ static void kvm_accel_instance_init(Object *obj) s->kernel_irqchip_split = ON_OFF_AUTO_AUTO; /* KVM dirty ring is by default off */ s->kvm_dirty_ring_size = 0; +s->kvm_dirty_ring_with_bitmap = false; s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN; s->notify_window = 0; } diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h index 60b520a13e..fdd5b1bde0 100644 --- a/include/sysemu/kvm_int.h +++ b/include/sysemu/kvm_int.h @@ -115,6 +115,7 @@ struct KVMState } *as; uint64_t kvm_dirty_ring_bytes; /* Size of the per-vcpu dirty ring */ uint32_t kvm_dirty_ring_size; /* Number of dirty GFNs per ring */ +bool kvm_dirty_ring_with_bitmap; struct KVMDirtyRingReaper reaper; NotifyVmexitOption notify_vmexit; uint32_t notify_window; -- 2.23.0
[PATCH v1 6/6] kvm: Enable dirty ring for arm64
arm64 has different capability from x86 to enable the dirty ring, which is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. To enable it in kvm_dirty_ring_init() when KVM_CAP_DIRTY_LOG_RING isn't supported. Signed-off-by: Gavin Shan Reviewed-by: Juan Quintela --- accel/kvm/kvm-all.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index e5035026c9..f6fbeae644 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1457,6 +1457,7 @@ static int kvm_dirty_ring_init(KVMState *s) { uint32_t ring_size = s->kvm_dirty_ring_size; uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn); +unsigned int capability = KVM_CAP_DIRTY_LOG_RING; int ret; s->kvm_dirty_ring_size = 0; @@ -1471,7 +1472,12 @@ static int kvm_dirty_ring_init(KVMState *s) * Read the max supported pages. Fall back to dirty logging mode * if the dirty ring isn't supported. */ -ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); +ret = kvm_vm_check_extension(s, capability); +if (ret <= 0) { +capability = KVM_CAP_DIRTY_LOG_RING_ACQ_REL; +ret = kvm_vm_check_extension(s, capability); +} + if (ret <= 0) { warn_report("KVM dirty ring not available, using bitmap method"); return 0; @@ -1484,7 +1490,7 @@ static int kvm_dirty_ring_init(KVMState *s) return -EINVAL; } -ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); +ret = kvm_vm_enable_cap(s, capability, 0, ring_bytes); if (ret) { error_report("Enabling of KVM dirty ring failed: %s. " "Suggested minimum value is 1024.", strerror(-ret)); -- 2.23.0
[PATCH v1 1/6] linux-headers: Update for dirty ring
Signed-off-by: Gavin Shan --- linux-headers/asm-arm64/kvm.h | 1 + linux-headers/linux/kvm.h | 2 ++ 2 files changed, 3 insertions(+) diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h index 4bf2d7246e..a7cfefb3a8 100644 --- a/linux-headers/asm-arm64/kvm.h +++ b/linux-headers/asm-arm64/kvm.h @@ -43,6 +43,7 @@ #define __KVM_HAVE_VCPU_EVENTS #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 +#define KVM_DIRTY_LOG_PAGE_OFFSET 64 #define KVM_REG_SIZE(id) \ (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT)) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index ebdafa576d..5b4e0e4e68 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -1175,6 +1175,8 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220 #define KVM_CAP_S390_ZPCI_OP 221 #define KVM_CAP_S390_CPU_TOPOLOGY 222 +#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223 +#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225 #ifdef KVM_CAP_IRQ_ROUTING -- 2.23.0
[PATCH v1 0/6] hw/arm/virt: Support dirty ring
This series intends to support dirty ring for live migration. The dirty ring use discrete buffer to track dirty pages. For ARM64, the speciality is to use backup bitmap to track dirty pages when there is no-running-vcpu context. It's known that the backup bitmap needs to be synchronized when KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The backup bitmap is collected in the last stage of migration. PATCH[1]Synchronize linux-headers for dirty ring PATCH[2]Introduce indicator of the last stage migration and pass it all the way down PATCH[3]Synchronize the backup bitmap in the last stage of live migration PATCH[4]Introduce helper kvm_dirty_ring_init() to enable the dirty ring PATCH[5-6] Enable dirty ring for hw/arm/virt RFCv1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00171.html Testing === (1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with dirty ring or normal dirty page tracking mechanism. All test cases passed. QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \ ./its-pending-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \ ./its-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \ ./its-pending-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \ ./its-migration (2) Combinations of migration, post-copy migration, e1000e and virtio-net devices. All test cases passed. -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ -device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0 -netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ -device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0 Changelog = v1: * Combine two patches into one PATCH[v1 2/6] for the last stage indicator (Peter) * Drop the secondary bitmap and use the original one directly (Juan) * Avoid "goto out" in helper kvm_dirty_ring_init() (Juan) Gavin Shan (6): linux-headers: Update for dirty ring migration: Add last stage indicator to global dirty log synchronization kvm: Synchronize the backup bitmap in the last stage kvm: Add helper kvm_dirty_ring_init() hw/arm/virt: Enable backup bitmap for dirty ring kvm: Enable dirty ring for arm64 accel/kvm/kvm-all.c | 95 --- hw/arm/virt.c | 26 ++ include/exec/memory.h | 5 +- include/sysemu/kvm_int.h | 1 + linux-headers/asm-arm64/kvm.h | 1 + linux-headers/linux/kvm.h | 2 + migration/dirtyrate.c | 4 +- migration/ram.c | 20 softmmu/memory.c | 10 ++-- target/arm/kvm64.c| 25 + target/arm/kvm_arm.h | 12 + 11 files changed, 152 insertions(+), 49 deletions(-) -- 2.23.0
Re: [PATCH RFCv1 2/8] memory: Add last stage indicator to global dirty log synchronization
On 2/10/23 6:48 AM, Peter Xu wrote: On Mon, Feb 06, 2023 at 07:20:04PM +0800, Gavin Shan wrote: The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap needs to collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callback. This last stage information isn't used yet. No functional change intended. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 2 +- include/exec/memory.h | 5 +++-- migration/dirtyrate.c | 4 ++-- migration/ram.c | 6 +++--- Better move the migration/ changes into the next patch. Ok, I will combine PATCH[2/8] and PATCH[3/8] in next revision. Thanks, Gavin
Re: [PATCH RFCv1 6/8] kvm: Add helper kvm_dirty_ring_init()
On 2/9/23 9:11 AM, Juan Quintela wrote: Gavin Shan wrote: Due to multiple capabilities associated with the dirty ring for different architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and arm64 separately. There will be more to be done in order to support the dirty ring for arm64. Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this, the code looks a bit clean. No functional change intended. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 73 - 1 file changed, 46 insertions(+), 27 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 9ec117c441..399ef0f7e6 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1476,6 +1476,49 @@ static int kvm_dirty_ring_reaper_init(KVMState *s) return 0; } +static int kvm_dirty_ring_init(KVMState *s) +{ +uint64_t ring_bytes; +int ret; + +/* + * Read the max supported pages. Fall back to dirty logging mode + * if the dirty ring isn't supported. + */ +ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); +if (ret <= 0) { +warn_report("KVM dirty ring not available, using bitmap method"); +s->kvm_dirty_ring_size = 0; +return 0; +} + +ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn); +if (ring_bytes > ret) { +error_report("KVM dirty ring size %" PRIu32 " too big " + "(maximum is %ld). Please use a smaller value.", + s->kvm_dirty_ring_size, + (long)ret / sizeof(struct kvm_dirty_gfn)); +ret = -EINVAL; +goto out; +} + +ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); +if (ret) { +error_report("Enabling of KVM dirty ring failed: %s. " + "Suggested minimum value is 1024.", strerror(-ret)); +ret = -EIO; +} + +out: +if (ret) { +s->kvm_dirty_ring_size = 0; +} else { +s->kvm_dirty_ring_bytes = ring_bytes; +} + +return ret; +} If you split it, you don't need the goto. static int kvm_dirty_ring_init(KVMState *s) { s->kvm_dirty_ring_size = 0; /* * Read the max supported pages. Fall back to dirty logging mode * if the dirty ring isn't supported. */ int ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); if (ret <= 0) { warn_report("KVM dirty ring not available, using bitmap method"); return 0; } uint64_t ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn); if (ring_bytes > ret) { error_report("KVM dirty ring size %" PRIu32 " too big " "(maximum is %ld). Please use a smaller value.", s->kvm_dirty_ring_size, (long)ret / sizeof(struct kvm_dirty_gfn)); return -EINVAL; } ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); if (ret) { error_report("Enabling of KVM dirty ring failed: %s. " "Suggested minimum value is 1024.", strerror(-ret)); return -EIO; } s->kvm_dirty_ring_bytes = ring_bytes; return ret; } Ok, thanks for your review. The goto will be removed in next revision. + static void kvm_region_add(MemoryListener *listener, MemoryRegionSection *section) { @@ -2545,33 +2588,9 @@ static int kvm_init(MachineState *ms) * dirty logging mode */ if (s->kvm_dirty_ring_size > 0) { -uint64_t ring_bytes; - -ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn); - -/* Read the max supported pages */ -ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); -if (ret > 0) { -if (ring_bytes > ret) { -error_report("KVM dirty ring size %" PRIu32 " too big " - "(maximum is %ld). Please use a smaller value.", - s->kvm_dirty_ring_size, - (long)ret / sizeof(struct kvm_dirty_gfn)); -ret = -EINVAL; -goto err; -} - -ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); -if (ret) { -error_report("Enabling of KVM dirty ring failed: %s. " - "Suggested minimum value is 1024.", strerror(-ret)); -goto err; -} - -s->kvm_dirty_ring_bytes = ring_bytes; - } else { - warn_report("KVM dirty ring not available, using bitmap method"); - s->kvm_dirty_ring_size = 0; +ret = kvm_dirty_ring_init(s); +if (ret < 0) { +goto err; } } Thanks, Gavin
Re: [PATCH RFCv1 4/8] kvm: Introduce secondary dirty bitmap
On 2/9/23 9:07 AM, Juan Quintela wrote: Gavin Shan wrote: When dirty ring is enabled on ARM64, the backup bitmap may be used to track the dirty pages in no-running-vcpu situations. The original bitmap is the primary one, used for the dirty ring buffer. We need the secondary bitmap to collect the backup bitmap for ARM64. No functional change intended. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 50 ++-- include/sysemu/kvm_int.h | 1 + 2 files changed, 39 insertions(+), 12 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 01a6a026af..1a93985574 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -553,13 +553,29 @@ static void kvm_log_stop(MemoryListener *listener, } } +static unsigned long *kvm_slot_dirty_bitmap(KVMSlot *slot, bool primary) +{ +if (primary) { +return slot->dirty_bmap; +} + +return slot->dirty_bmap + + slot->dirty_bmap_size / sizeof(slot->dirty_bmap[0]); +} Why? Just use two bitmaps and call it a day. Thanks for your review, Juan. Right, I had wrong assumption that the original (primary) bitmap can't be reused. It's why the secondary bitmap is introduced. The intention is to use the original (primary) bitmap to cover the dirty-ring buffers while the secondary bitmap is to cover the backup bitmap, resident in host kernel. I think the original (primary) bitmap can be reused in this case. After the dirty-ring buffer is synchronized to the original bitmap, which is updated to the dirty bits. It can be reused to cover the backup bitmap. I will remove the secondary bitmap in next revision. Thanks, Gavin
[PATCH RFCv1 6/8] kvm: Add helper kvm_dirty_ring_init()
Due to multiple capabilities associated with the dirty ring for different architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and arm64 separately. There will be more to be done in order to support the dirty ring for arm64. Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this, the code looks a bit clean. No functional change intended. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 73 - 1 file changed, 46 insertions(+), 27 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 9ec117c441..399ef0f7e6 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1476,6 +1476,49 @@ static int kvm_dirty_ring_reaper_init(KVMState *s) return 0; } +static int kvm_dirty_ring_init(KVMState *s) +{ +uint64_t ring_bytes; +int ret; + +/* + * Read the max supported pages. Fall back to dirty logging mode + * if the dirty ring isn't supported. + */ +ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); +if (ret <= 0) { +warn_report("KVM dirty ring not available, using bitmap method"); +s->kvm_dirty_ring_size = 0; +return 0; +} + +ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn); +if (ring_bytes > ret) { +error_report("KVM dirty ring size %" PRIu32 " too big " + "(maximum is %ld). Please use a smaller value.", + s->kvm_dirty_ring_size, + (long)ret / sizeof(struct kvm_dirty_gfn)); +ret = -EINVAL; +goto out; +} + +ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); +if (ret) { +error_report("Enabling of KVM dirty ring failed: %s. " + "Suggested minimum value is 1024.", strerror(-ret)); +ret = -EIO; +} + +out: +if (ret) { +s->kvm_dirty_ring_size = 0; +} else { +s->kvm_dirty_ring_bytes = ring_bytes; +} + +return ret; +} + static void kvm_region_add(MemoryListener *listener, MemoryRegionSection *section) { @@ -2545,33 +2588,9 @@ static int kvm_init(MachineState *ms) * dirty logging mode */ if (s->kvm_dirty_ring_size > 0) { -uint64_t ring_bytes; - -ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn); - -/* Read the max supported pages */ -ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); -if (ret > 0) { -if (ring_bytes > ret) { -error_report("KVM dirty ring size %" PRIu32 " too big " - "(maximum is %ld). Please use a smaller value.", - s->kvm_dirty_ring_size, - (long)ret / sizeof(struct kvm_dirty_gfn)); -ret = -EINVAL; -goto err; -} - -ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); -if (ret) { -error_report("Enabling of KVM dirty ring failed: %s. " - "Suggested minimum value is 1024.", strerror(-ret)); -goto err; -} - -s->kvm_dirty_ring_bytes = ring_bytes; - } else { - warn_report("KVM dirty ring not available, using bitmap method"); - s->kvm_dirty_ring_size = 0; +ret = kvm_dirty_ring_init(s); +if (ret < 0) { +goto err; } } -- 2.23.0
[PATCH RFCv1 5/8] kvm: Synchronize secondary bitmap in last stage
In the last stage of live migration or memory slot removal, the backup bitmap needs to be synchronized. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 1a93985574..9ec117c441 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1377,10 +1377,12 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, */ if (kvm_state->kvm_dirty_ring_size) { kvm_dirty_ring_reap_locked(kvm_state, NULL); +kvm_slot_get_dirty_log(kvm_state, mem, false); } else { kvm_slot_get_dirty_log(kvm_state, mem, true); } kvm_slot_sync_dirty_pages(mem, true); +kvm_slot_sync_dirty_pages(mem, false); } /* unregister the slot */ @@ -1604,6 +1606,11 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage) * However kvm dirty ring has no such side effect. */ kvm_slot_reset_dirty_pages(mem); + +if (s->kvm_dirty_ring_with_bitmap && last_stage && +kvm_slot_get_dirty_log(s, mem, false)) { +kvm_slot_sync_dirty_pages(mem, false); +} } } kvm_slots_unlock(); -- 2.23.0
[PATCH RFCv1 7/8] hw/arm/virt: Enable backup bitmap for dirty ring
When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to enable the backup bitmap for the dirty ring. Otherwise, the migration will fail because those two devices are using the backup bitmap to track dirty guest memory, corresponding to various hardware tables. Signed-off-by: Gavin Shan --- hw/arm/virt.c| 26 ++ target/arm/kvm64.c | 25 + target/arm/kvm_arm.h | 12 3 files changed, 63 insertions(+) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ba47728288..b91b5972bc 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2025,6 +2025,8 @@ static void machvirt_init(MachineState *machine) VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine); MachineClass *mc = MACHINE_GET_CLASS(machine); const CPUArchIdList *possible_cpus; +const char *gictype = NULL; +const char *itsclass = NULL; MemoryRegion *sysmem = get_system_memory(); MemoryRegion *secure_sysmem = NULL; MemoryRegion *tag_sysmem = NULL; @@ -2072,6 +2074,30 @@ static void machvirt_init(MachineState *machine) */ finalize_gic_version(vms); +/* + * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty + * bitmap has to be enabled for KVM dirty ring, before any memory + * slot is added. Otherwise, the migration will fail with the dirty + * ring. + */ +if (kvm_enabled()) { +if (vms->gic_version != VIRT_GIC_VERSION_2) { +gictype = gicv3_class_name(); +} + +if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) { +itsclass = its_class_name(); +} + +if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) || + (itsclass && !strcmp(itsclass, "arm-its-kvm"))) && +!kvm_arm_enable_dirty_ring_with_bitmap()) { +error_report("Failed to enable the backup bitmap for " + "KVM dirty ring"); +exit(1); +} +} + if (vms->secure) { /* * The Secure view of the world is the same as the NonSecure, diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c index 1197253d12..91960e1cd3 100644 --- a/target/arm/kvm64.c +++ b/target/arm/kvm64.c @@ -764,6 +764,31 @@ bool kvm_arm_steal_time_supported(void) return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME); } +bool kvm_arm_enable_dirty_ring_with_bitmap(void) +{ +int ret; + +/* No need to enable the backup bitmap if dirty ring isn't enabled */ +if (!kvm_dirty_ring_enabled()) { +return true; +} + +ret = kvm_check_extension(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP); +if (!ret) { +return false; +} + +ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, +0, 1); +if (ret) { +return false; +} + +kvm_state->kvm_dirty_ring_with_bitmap = true; + +return true; +} + QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1); uint32_t kvm_arm_sve_get_vls(CPUState *cs) diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h index 99017b635c..402281e61a 100644 --- a/target/arm/kvm_arm.h +++ b/target/arm/kvm_arm.h @@ -282,6 +282,13 @@ void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error **errp); */ bool kvm_arm_steal_time_supported(void); +/** + * kvm_arm_enable_dirty_ring_with_bitmap: + * Returns: true if KVM dirty ring's backup bitmap is enabled + * and false otherwise. + */ +bool kvm_arm_enable_dirty_ring_with_bitmap(void); + /** * kvm_arm_aarch32_supported: * @@ -395,6 +402,11 @@ static inline bool kvm_arm_steal_time_supported(void) return false; } +static inline bool kvm_arm_enable_dirty_ring_with_bitmap(void) +{ +return false; +} + /* * These functions should never actually be called without KVM support. */ -- 2.23.0
[PATCH RFCv1 8/8] kvm: Enable dirty ring for arm64
arm64 has different capability from x86 to enable the dirty ring, which is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. To enable it in kvm_dirty_ring_init() when KVM_CAP_DIRTY_LOG_RING isn't supported. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 399ef0f7e6..8ab31865eb 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1479,13 +1479,19 @@ static int kvm_dirty_ring_reaper_init(KVMState *s) static int kvm_dirty_ring_init(KVMState *s) { uint64_t ring_bytes; +unsigned int capability = KVM_CAP_DIRTY_LOG_RING; int ret; /* * Read the max supported pages. Fall back to dirty logging mode * if the dirty ring isn't supported. */ -ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING); +ret = kvm_vm_check_extension(s, capability); +if (ret <= 0) { +capability = KVM_CAP_DIRTY_LOG_RING_ACQ_REL; +ret = kvm_vm_check_extension(s, capability); +} + if (ret <= 0) { warn_report("KVM dirty ring not available, using bitmap method"); s->kvm_dirty_ring_size = 0; @@ -1502,7 +1508,7 @@ static int kvm_dirty_ring_init(KVMState *s) goto out; } -ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes); +ret = kvm_vm_enable_cap(s, capability, 0, ring_bytes); if (ret) { error_report("Enabling of KVM dirty ring failed: %s. " "Suggested minimum value is 1024.", strerror(-ret)); -- 2.23.0
[PATCH RFCv1 4/8] kvm: Introduce secondary dirty bitmap
When dirty ring is enabled on ARM64, the backup bitmap may be used to track the dirty pages in no-running-vcpu situations. The original bitmap is the primary one, used for the dirty ring buffer. We need the secondary bitmap to collect the backup bitmap for ARM64. No functional change intended. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 50 ++-- include/sysemu/kvm_int.h | 1 + 2 files changed, 39 insertions(+), 12 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 01a6a026af..1a93985574 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -553,13 +553,29 @@ static void kvm_log_stop(MemoryListener *listener, } } +static unsigned long *kvm_slot_dirty_bitmap(KVMSlot *slot, bool primary) +{ +if (primary) { +return slot->dirty_bmap; +} + +return slot->dirty_bmap + + slot->dirty_bmap_size / sizeof(slot->dirty_bmap[0]); +} + /* get kvm's dirty pages bitmap and update qemu's */ -static void kvm_slot_sync_dirty_pages(KVMSlot *slot) +static void kvm_slot_sync_dirty_pages(KVMSlot *slot, bool primary) { +KVMState *s = kvm_state; +unsigned long *bmap = kvm_slot_dirty_bitmap(slot, primary); ram_addr_t start = slot->ram_start_offset; ram_addr_t pages = slot->memory_size / qemu_real_host_page_size(); -cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages); +if (!s->kvm_dirty_ring_with_bitmap && !primary) { +return; +} + +cpu_physical_memory_set_dirty_lebitmap(bmap, start, pages); } static void kvm_slot_reset_dirty_pages(KVMSlot *slot) @@ -572,6 +588,9 @@ static void kvm_slot_reset_dirty_pages(KVMSlot *slot) /* Allocate the dirty bitmap for a slot */ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem) { +KVMState *s = kvm_state; +hwaddr bitmap_size, alloc_size; + if (!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) || mem->dirty_bmap) { return; } @@ -593,9 +612,11 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem) * And mem->memory_size is aligned to it (otherwise this mem can't * be registered to KVM). */ -hwaddr bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size(), -/*HOST_LONG_BITS*/ 64) / 8; -mem->dirty_bmap = g_malloc0(bitmap_size); +bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size(), +/*HOST_LONG_BITS*/ 64) / 8; +alloc_size = s->kvm_dirty_ring_with_bitmap ? 2 * bitmap_size : bitmap_size; + +mem->dirty_bmap = g_malloc0(alloc_size); mem->dirty_bmap_size = bitmap_size; } @@ -603,12 +624,16 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem) * Sync dirty bitmap from kernel to KVMSlot.dirty_bmap, return true if * succeeded, false otherwise */ -static bool kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot) +static bool kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot, bool primary) { struct kvm_dirty_log d = {}; int ret; -d.dirty_bitmap = slot->dirty_bmap; +if (!s->kvm_dirty_ring_with_bitmap && !primary) { +return false; +} + +d.dirty_bitmap = kvm_slot_dirty_bitmap(slot, primary); d.slot = slot->slot | (slot->as_id << 16); ret = kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, ); @@ -839,8 +864,8 @@ static void kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml, /* We don't have a slot if we want to trap every access. */ return; } -if (kvm_slot_get_dirty_log(s, mem)) { -kvm_slot_sync_dirty_pages(mem); +if (kvm_slot_get_dirty_log(s, mem, true)) { +kvm_slot_sync_dirty_pages(mem, true); } start_addr += slot_size; size -= slot_size; @@ -1353,9 +1378,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml, if (kvm_state->kvm_dirty_ring_size) { kvm_dirty_ring_reap_locked(kvm_state, NULL); } else { -kvm_slot_get_dirty_log(kvm_state, mem); +kvm_slot_get_dirty_log(kvm_state, mem, true); } -kvm_slot_sync_dirty_pages(mem); +kvm_slot_sync_dirty_pages(mem, true); } /* unregister the slot */ @@ -1572,7 +1597,7 @@ static void kvm_log_sync_global(MemoryListener *l, bool last_stage) for (i = 0; i < s->nr_slots; i++) { mem = >slots[i]; if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) { -kvm_slot_sync_dirty_pages(mem); +kvm_slot_sync_dirty_pages(mem, true); /* * This is not needed by KVM_GET_DIRTY_LOG because the * ioctl will unconditionally overwrite the whole region. @@ -3701,6 +3726,7 @@ static void kvm_accel_instance_init(Object *obj) s->ke
[PATCH RFCv1 3/8] migration: Add last stage indicator to global dirty log synchronization
For the pre-copy live migration scenario, the last stage indicator is needed for KVM backend to collect the dirty pages from the backup bitmap when dirty ring is used. The indicator isn't used so far. No functional change intended. Signed-off-by: Gavin Shan --- migration/ram.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/migration/ram.c b/migration/ram.c index d1b9b270ec..6c9642edee 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1176,7 +1176,7 @@ static void migration_trigger_throttle(RAMState *rs) } } -static void migration_bitmap_sync(RAMState *rs) +static void migration_bitmap_sync(RAMState *rs, bool last_stage) { RAMBlock *block; int64_t end_time; @@ -1188,7 +1188,7 @@ static void migration_bitmap_sync(RAMState *rs) } trace_migration_bitmap_sync_start(); -memory_global_dirty_log_sync(false); +memory_global_dirty_log_sync(last_stage); qemu_mutex_lock(>bitmap_mutex); WITH_RCU_READ_LOCK_GUARD() { @@ -1222,7 +1222,7 @@ static void migration_bitmap_sync(RAMState *rs) } } -static void migration_bitmap_sync_precopy(RAMState *rs) +static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage) { Error *local_err = NULL; @@ -1235,7 +1235,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs) local_err = NULL; } -migration_bitmap_sync(rs); +migration_bitmap_sync(rs, last_stage); if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, _err)) { error_report_err(local_err); @@ -2844,7 +2844,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms) RCU_READ_LOCK_GUARD(); /* This should be our last sync, the src is now paused */ -migration_bitmap_sync(rs); +migration_bitmap_sync(rs, false); /* Easiest way to make sure we don't resume in the middle of a host-page */ rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL; @@ -3034,7 +3034,7 @@ static void ram_init_bitmaps(RAMState *rs) /* We don't use dirty log with background snapshots */ if (!migrate_background_snapshot()) { memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION); -migration_bitmap_sync_precopy(rs); +migration_bitmap_sync_precopy(rs, false); } } qemu_mutex_unlock_ramlist(); @@ -3349,7 +3349,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque) WITH_RCU_READ_LOCK_GUARD() { if (!migration_in_postcopy()) { -migration_bitmap_sync_precopy(rs); +migration_bitmap_sync_precopy(rs, true); } ram_control_before_iterate(f, RAM_CONTROL_FINISH); @@ -3407,7 +3407,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, uint64_t max_size, remaining_size < max_size) { qemu_mutex_lock_iothread(); WITH_RCU_READ_LOCK_GUARD() { -migration_bitmap_sync_precopy(rs); +migration_bitmap_sync_precopy(rs, false); } qemu_mutex_unlock_iothread(); remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE; -- 2.23.0
[PATCH RFCv1 1/8] linux-headers: Update for dirty ring
Signed-off-by: Gavin Shan --- linux-headers/asm-arm64/kvm.h | 1 + linux-headers/linux/kvm.h | 2 ++ 2 files changed, 3 insertions(+) diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h index 4bf2d7246e..a7cfefb3a8 100644 --- a/linux-headers/asm-arm64/kvm.h +++ b/linux-headers/asm-arm64/kvm.h @@ -43,6 +43,7 @@ #define __KVM_HAVE_VCPU_EVENTS #define KVM_COALESCED_MMIO_PAGE_OFFSET 1 +#define KVM_DIRTY_LOG_PAGE_OFFSET 64 #define KVM_REG_SIZE(id) \ (1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT)) diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h index ebdafa576d..5b4e0e4e68 100644 --- a/linux-headers/linux/kvm.h +++ b/linux-headers/linux/kvm.h @@ -1175,6 +1175,8 @@ struct kvm_ppc_resize_hpt { #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220 #define KVM_CAP_S390_ZPCI_OP 221 #define KVM_CAP_S390_CPU_TOPOLOGY 222 +#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223 +#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225 #ifdef KVM_CAP_IRQ_ROUTING -- 2.23.0
[PATCH RFCv1 2/8] memory: Add last stage indicator to global dirty log synchronization
The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap needs to collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callback. This last stage information isn't used yet. No functional change intended. Signed-off-by: Gavin Shan --- accel/kvm/kvm-all.c | 2 +- include/exec/memory.h | 5 +++-- migration/dirtyrate.c | 4 ++-- migration/ram.c | 6 +++--- softmmu/memory.c | 10 +- 5 files changed, 14 insertions(+), 13 deletions(-) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 9b26582655..01a6a026af 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener, kvm_slots_unlock(); } -static void kvm_log_sync_global(MemoryListener *l) +static void kvm_log_sync_global(MemoryListener *l, bool last_stage) { KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener); KVMState *s = kvm_state; diff --git a/include/exec/memory.h b/include/exec/memory.h index 2e602a2fad..75b2fd9f48 100644 --- a/include/exec/memory.h +++ b/include/exec/memory.h @@ -929,8 +929,9 @@ struct MemoryListener { * its @log_sync must be NULL. Vice versa. * * @listener: The #MemoryListener. + * @last_stage: The last stage to synchronize the log during migration */ -void (*log_sync_global)(MemoryListener *listener); +void (*log_sync_global)(MemoryListener *listener, bool last_stage); /** * @log_clear: @@ -2408,7 +2409,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr, * * Synchronizes the dirty page log for all address spaces. */ -void memory_global_dirty_log_sync(void); +void memory_global_dirty_log_sync(bool last_stage); /** * memory_global_dirty_log_sync: synchronize the dirty log for all memory diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c index 4bfb97fc68..aecc8142e4 100644 --- a/migration/dirtyrate.c +++ b/migration/dirtyrate.c @@ -101,7 +101,7 @@ void global_dirty_log_change(unsigned int flag, bool start) static void global_dirty_log_sync(unsigned int flag, bool one_shot) { qemu_mutex_lock_iothread(); -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(false); if (one_shot) { memory_global_dirty_log_stop(flag); } @@ -553,7 +553,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config) * skip it unconditionally and start dirty tracking * from 2'round of log sync */ -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(false); /* * reset page protect manually and unconditionally. diff --git a/migration/ram.c b/migration/ram.c index 334309f1c6..d1b9b270ec 100644 --- a/migration/ram.c +++ b/migration/ram.c @@ -1188,7 +1188,7 @@ static void migration_bitmap_sync(RAMState *rs) } trace_migration_bitmap_sync_start(); -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(false); qemu_mutex_lock(>bitmap_mutex); WITH_RCU_READ_LOCK_GUARD() { @@ -3817,7 +3817,7 @@ void colo_incoming_start_dirty_log(void) qemu_mutex_lock_iothread(); qemu_mutex_lock_ramlist(); -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(false); WITH_RCU_READ_LOCK_GUARD() { RAMBLOCK_FOREACH_NOT_IGNORED(block) { ramblock_sync_dirty_bitmap(ram_state, block); @@ -4114,7 +4114,7 @@ void colo_flush_ram_cache(void) void *src_host; unsigned long offset = 0; -memory_global_dirty_log_sync(); +memory_global_dirty_log_sync(false); WITH_RCU_READ_LOCK_GUARD() { RAMBLOCK_FOREACH_NOT_IGNORED(block) { ramblock_sync_dirty_bitmap(ram_state, block); diff --git a/softmmu/memory.c b/softmmu/memory.c index 9d64efca26..1cc36ef028 100644 --- a/softmmu/memory.c +++ b/softmmu/memory.c @@ -2224,7 +2224,7 @@ void memory_region_set_dirty(MemoryRegion *mr, hwaddr addr, * If memory region `mr' is NULL, do global sync. Otherwise, sync * dirty bitmap for the specified memory region. */ -static void memory_region_sync_dirty_bitmap(MemoryRegion *mr) +static void memory_region_sync_dirty_bitmap(MemoryRegion *mr, bool last_stage) { MemoryListener *listener; AddressSpace *as; @@ -2254,7 +2254,7 @@ static void memory_region_sync_dirty_bitmap(MemoryRegion *mr) * is to do a global sync, because we are not capable to * sync in a finer granularity. */ -listener->log_sync_global(listener); +listener->log_sync_global(listener, l
[PATCH RFCv1 0/8] hw/arm/virt: Support dirty ring
This series intends to support dirty ring for live migration. The dirty ring use discrete buffer to track dirty pages. For ARM64, the speciality is to use backup bitmap to track dirty pages when there is no-running-vcpu context. It's known that the backup bitmap needs to be synchronized when KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The backup bitmap is collected in the last stage of migration. PATCH[1]Synchronize linux-headers for dirty ring PATCH[2-3] Introduce indicator of the last stage migration and pass it all the way down PATCH[4-5] Introduce secondary bitmap, corresponding to the backup bitmap PATCH[6-8] Enable dirty ring for hw/arm/virt Testing === (1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with dirty ring or normal dirty page tracking mechanism. All test cases passed. QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \ ./its-pending-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \ ./its-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \ ./its-pending-migration QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \ ./its-migration (2) Combinations of migration, post-copy migration, e1000e and virtio-net devices. All test cases passed. -netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ -device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0 -netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ -device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0 Gavin Shan (8): linux-headers: Update for dirty ring memory: Add last stage indicator to global dirty log synchronization migration: Add last stage indicator to global dirty log synchronization kvm: Introduce secondary dirty bitmap kvm: Synchronize secondary bitmap in last stage kvm: Add helper kvm_dirty_ring_init() hw/arm/virt: Enable backup bitmap for dirty ring kvm: Enable dirty ring for arm64 accel/kvm/kvm-all.c | 138 -- hw/arm/virt.c | 26 +++ include/exec/memory.h | 5 +- include/sysemu/kvm_int.h | 1 + linux-headers/asm-arm64/kvm.h | 1 + linux-headers/linux/kvm.h | 2 + migration/dirtyrate.c | 4 +- migration/ram.c | 20 ++--- softmmu/memory.c | 10 +-- target/arm/kvm64.c| 25 ++ target/arm/kvm_arm.h | 12 +++ 11 files changed, 185 insertions(+), 59 deletions(-) -- 2.23.0
Re: [PATCH v4] qapi/qom: Memory backend property prealloc-threads doc fix
On 11/14/22 11:24 AM, Zhenyu Zhang wrote: Commit ffac16fab3 "hostmem: introduce "prealloc-threads" property" (v5.0.0) changed the default number of threads from number of CPUs to 1. This was deemed a regression, and fixed in commit f8d426a685 "hostmem: default the amount of prealloc-threads to smp-cpus". Except the documentation remained unchanged. Update 'qapi/qom.json' to reflect the change in commit f8d426a6852c. Signed-off-by: Zhenyu Zhang --- v4: Fix the line exceeding 80 characters limitation issue (Gavin) v3: Covers historical descriptions (Markus) v2: The property is changed to smp-cpus since 5.0 (Phild) --- qapi/qom.json | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Reviewed-by: Gavin Shan diff --git a/qapi/qom.json b/qapi/qom.json index 30e76653ad..f4a7917f3d 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -576,7 +576,8 @@ # # @prealloc: if true, preallocate memory (default: false) # -# @prealloc-threads: number of CPU threads to use for prealloc (default: 1) +# @prealloc-threads: number of CPU threads to use for prealloc +#(default: number of CPUs) (since 5.0) # # @prealloc-context: thread context to use for creation of preallocation threads #(default: none) (since 7.2)
Re: [PATCH v3] qapi/qom: Memory backend property prealloc-threads doc fix
On 11/11/22 6:54 PM, Igor Mammedov wrote: On Fri, 11 Nov 2022 17:34:04 +0800 Gavin Shan wrote: On 11/11/22 5:13 PM, Igor Mammedov wrote: On Fri, 11 Nov 2022 07:47:16 +0100 Markus Armbruster wrote: Gavin Shan writes: On 11/11/22 11:05 AM, Zhenyu Zhang wrote: Commit ffac16fab3 "hostmem: introduce "prealloc-threads" property" (v5.0.0) changed the default number of threads from number of CPUs to 1. This was deemed a regression, and fixed in commit f8d426a685 "hostmem: default the amount of prealloc-threads to smp-cpus". Except the documentation remained unchanged. Update it now. Signed-off-by: Zhenyu Zhang --- v3: Covers historical descriptions (Markus) v2: The property is changed to smp-cpus since 5.0 (Phild) --- qapi/qom.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) With the following comments addressed: Reviewed-by: Gavin Shan --- Please consider amending the commit log to something like below. The default "prealloc-threads" value is set to 1 when the property is added by commit ffac16fab33b ("hostmem: introduce "prealloc-threads" property") in v5.0.0. The default value is conflicting with the sugar property as the value provided by the sugar property is number of CPUs. What is the sugar property? Can you explain the conflict in a bit more detail? my guess is that Gavin means mem_prealloc compat glue in qemu_process_sugar_options() property value should be set according to following order default -> compat -> explicit value so I don't see any conflict here. PS: if it we up to me, default would have stayed 1, and prealloc-threads fixup to vCPUs number would happen in vl.c similar to what is done in qemu_process_sugar_options(), keeping backend clean of external dependencies. Yes, it's the sugar property I was talking about. I'm not sure if we have a more popular name for this property: compat property or sugar property. When 'mem-prealloc=on' and 'prealloc-threads=xxx' aren't provided, the value is 1 before commit f8d426a6852c is applied. It's not inconsistent with 'mem-prealloc=on'. It's the conflict I was talking about and it's fixed by commit f8d426a6852c default was not supposed to be consistent with legacy mem-prealloc and sugar property takes care of mem-prealloc=on case. so commit message in its current form looks fine to me. Ok, thanks for your confirm. I think Zhenyu needs to post v4, to fix the 80 characters limitation issue. My reviewed-by is still valid. The conflict has been fixed by commit f8d426a6852c ("hostmem: default the amount of prealloc-threads to smp-cpus"). However, 'qapi/qom.json' was missed to be updated accordingly in the commit. Update 'qapi/qom.json' to reflect the change in commit f8d426a6852c. Signed-off-by: Zhenyu Zhang When a specific commit is mentioned in the commit log, we usually have fixed format like below. commit ffac16fab33b ("hostmem: introduce "prealloc-threads" property") commit f8d426a6852c ("hostmem: default the amount of prealloc-threads to smp-cpus") This is certainly a common format, but the other one is also in use. diff --git a/qapi/qom.json b/qapi/qom.json index 30e76653ad..dfd89bc6d4 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -576,7 +576,7 @@ # # @prealloc: if true, preallocate memory (default: false) # -# @prealloc-threads: number of CPU threads to use for prealloc (default: 1) +# @prealloc-threads: number of CPU threads to use for prealloc (default: number of CPUs) (since 5.0) # # @prealloc-context: thread context to use for creation of preallocation threads #(default: none) (since 7.2) The line seems exceeding 80 characters. It'd better to limit each line in 75 characters. So you probably need: # @prealloc-threads: number of CPU threads to use for prealloc (default: number of CPUs) #(since 5.0) Still exceeds :) I suggested # @prealloc-threads: number of CPU threads to use for prealloc #(default: number of CPUs) (since 5.0) Markus's suggestion works :) Thanks, Gavin
Re: [PATCH v3] qapi/qom: Memory backend property prealloc-threads doc fix
On 11/11/22 5:13 PM, Igor Mammedov wrote: On Fri, 11 Nov 2022 07:47:16 +0100 Markus Armbruster wrote: Gavin Shan writes: On 11/11/22 11:05 AM, Zhenyu Zhang wrote: Commit ffac16fab3 "hostmem: introduce "prealloc-threads" property" (v5.0.0) changed the default number of threads from number of CPUs to 1. This was deemed a regression, and fixed in commit f8d426a685 "hostmem: default the amount of prealloc-threads to smp-cpus". Except the documentation remained unchanged. Update it now. Signed-off-by: Zhenyu Zhang --- v3: Covers historical descriptions (Markus) v2: The property is changed to smp-cpus since 5.0 (Phild) --- qapi/qom.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) With the following comments addressed: Reviewed-by: Gavin Shan --- Please consider amending the commit log to something like below. The default "prealloc-threads" value is set to 1 when the property is added by commit ffac16fab33b ("hostmem: introduce "prealloc-threads" property") in v5.0.0. The default value is conflicting with the sugar property as the value provided by the sugar property is number of CPUs. What is the sugar property? Can you explain the conflict in a bit more detail? my guess is that Gavin means mem_prealloc compat glue in qemu_process_sugar_options() property value should be set according to following order default -> compat -> explicit value so I don't see any conflict here. PS: if it we up to me, default would have stayed 1, and prealloc-threads fixup to vCPUs number would happen in vl.c similar to what is done in qemu_process_sugar_options(), keeping backend clean of external dependencies. Yes, it's the sugar property I was talking about. I'm not sure if we have a more popular name for this property: compat property or sugar property. When 'mem-prealloc=on' and 'prealloc-threads=xxx' aren't provided, the value is 1 before commit f8d426a6852c is applied. It's not inconsistent with 'mem-prealloc=on'. It's the conflict I was talking about and it's fixed by commit f8d426a6852c The conflict has been fixed by commit f8d426a6852c ("hostmem: default the amount of prealloc-threads to smp-cpus"). However, 'qapi/qom.json' was missed to be updated accordingly in the commit. Update 'qapi/qom.json' to reflect the change in commit f8d426a6852c. Signed-off-by: Zhenyu Zhang When a specific commit is mentioned in the commit log, we usually have fixed format like below. commit ffac16fab33b ("hostmem: introduce "prealloc-threads" property") commit f8d426a6852c ("hostmem: default the amount of prealloc-threads to smp-cpus") This is certainly a common format, but the other one is also in use. diff --git a/qapi/qom.json b/qapi/qom.json index 30e76653ad..dfd89bc6d4 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -576,7 +576,7 @@ # # @prealloc: if true, preallocate memory (default: false) # -# @prealloc-threads: number of CPU threads to use for prealloc (default: 1) +# @prealloc-threads: number of CPU threads to use for prealloc (default: number of CPUs) (since 5.0) # # @prealloc-context: thread context to use for creation of preallocation threads #(default: none) (since 7.2) The line seems exceeding 80 characters. It'd better to limit each line in 75 characters. So you probably need: # @prealloc-threads: number of CPU threads to use for prealloc (default: number of CPUs) #(since 5.0) Still exceeds :) I suggested # @prealloc-threads: number of CPU threads to use for prealloc #(default: number of CPUs) (since 5.0) Markus's suggestion works :) Thanks, Gavin
Re: [PATCH v3] qapi/qom: Memory backend property prealloc-threads doc fix
Hi Zhenyu, On 11/11/22 11:05 AM, Zhenyu Zhang wrote: Commit ffac16fab3 "hostmem: introduce "prealloc-threads" property" (v5.0.0) changed the default number of threads from number of CPUs to 1. This was deemed a regression, and fixed in commit f8d426a685 "hostmem: default the amount of prealloc-threads to smp-cpus". Except the documentation remained unchanged. Update it now. Signed-off-by: Zhenyu Zhang --- v3: Covers historical descriptions (Markus) v2: The property is changed to smp-cpus since 5.0 (Phild) --- qapi/qom.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) With the following comments addressed: Reviewed-by: Gavin Shan --- Please consider amending the commit log to something like below. The default "prealloc-threads" value is set to 1 when the property is added by commit ffac16fab33b ("hostmem: introduce "prealloc-threads" property") in v5.0.0. The default value is conflicting with the sugar property as the value provided by the sugar property is number of CPUs. The conflict has been fixed by commit f8d426a6852c ("hostmem: default the amount of prealloc-threads to smp-cpus"). However, 'qapi/qom.json' was missed to be updated accordingly in the commit. Update 'qapi/qom.json' to reflect the change in commit f8d426a6852c. Signed-off-by: Zhenyu Zhang When a specific commit is mentioned in the commit log, we usually have fixed format like below. commit ffac16fab33b ("hostmem: introduce "prealloc-threads" property") commit f8d426a6852c ("hostmem: default the amount of prealloc-threads to smp-cpus") diff --git a/qapi/qom.json b/qapi/qom.json index 30e76653ad..dfd89bc6d4 100644 --- a/qapi/qom.json +++ b/qapi/qom.json @@ -576,7 +576,7 @@ # # @prealloc: if true, preallocate memory (default: false) # -# @prealloc-threads: number of CPU threads to use for prealloc (default: 1) +# @prealloc-threads: number of CPU threads to use for prealloc (default: number of CPUs) (since 5.0) # # @prealloc-context: thread context to use for creation of preallocation threads #(default: none) (since 7.2) The line seems exceeding 80 characters. It'd better to limit each line in 75 characters. So you probably need: # @prealloc-threads: number of CPU threads to use for prealloc (default: number of CPUs) #(since 5.0) Thanks, Gavin
Re: [PATCH v6 0/7] hw/arm/virt: Improve address assignment for high memory regions
Hi Peter, On 10/29/22 2:06 AM, Peter Maydell wrote: On Wed, 26 Oct 2022 at 01:30, Gavin Shan wrote: On 10/24/22 11:54 AM, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. Could you help to take a look when getting a chance? I think Connie and Eric are close to complete the reviews, but v7 is still needed to address extra comments from them. I hope to make v7 mergeable if possible :) If Eric and Connie and Marc are happy with it then I don't have any further issues on top of that. NB that since softfreeze is next Tuesday, this is going to be 8.0 material at this point, I'm afraid :-( (Softfreeze caught me by surprise this cycle...) Ok. v7 was just posted with Connie/Eric's comments resolved. Marc also provided his r-b. Nothing is really critical since none of the patches fixes an existing issue. It would be great if it can be merged to 7.2 if we are fortunate enough. Otherwise, 8.0 is also good to me :) https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00693.html (v7) Thanks, Gavin
Re: [PATCH v6 0/7] hw/arm/virt: Improve address assignment for high memory regions
Hi Marc, On 10/29/22 7:29 PM, Marc Zyngier wrote: On Wed, 26 Oct 2022 01:29:56 +0100, Gavin Shan wrote: On 10/24/22 11:54 AM, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions and introduces new properties for user to selectively disable those 3 high memory regions. PATCH[1-4] preparatory work for the improvment PATCH[5] improve high memory region address assignment PATCH[6] adds 'compact-highmem' to enable or disable the optimization PATCH[7] adds properties so that high memory regions can be disabled v5: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00280.html v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/ v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Could you help to take a look when getting a chance? I think Connie and Eric are close to complete the reviews, but v7 is still needed to address extra comments from them. I hope to make v7 mergeable if possible :) With the comments from Connie and Eric addressed, this looks good to me: Reviewed-by: Marc Zyngier Thanks for having gone the extra mile on this one. Thank you for your feedback and reviews. I've posted v7 with your r-b after resolving comments from Connie/Eric. https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00693.html (v7) Thanks, Gavin
[PATCH v7 7/7] hw/arm/virt: Add properties to disable high memory regions
The 3 high memory regions are usually enabled by default, but they may be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't needed by GICv2. This leads to waste in the PA space. Add properties ("highmem-redists", "highmem-ecam", "highmem-mmio") to allow users selectively disable them if needed. After that, the high memory region for GICv3 or GICv4 redistributor can be disabled by user, the number of maximal supported CPUs needs to be calculated based on 'vms->highmem_redists'. The follow-up error message is also improved to indicate if the high memory region for GICv3 and GICv4 has been enabled or not. Suggested-by: Marc Zyngier Signed-off-by: Gavin Shan Reviewed-by: Marc Zyngier --- docs/system/arm/virt.rst | 13 +++ hw/arm/virt.c| 75 ++-- 2 files changed, 86 insertions(+), 2 deletions(-) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 4454706392..188a4f211f 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -98,6 +98,19 @@ compact-highmem Set ``on``/``off`` to enable/disable the compact layout for high memory regions. The default is ``on`` for machine types later than ``virt-7.2``. +highmem-redists + Set ``on``/``off`` to enable/disable the high memory region for GICv3 or + GICv4 redistributor. The default is ``on``. Setting this to ``off`` will + limit the maximum number of CPUs when GICv3 or GICv4 is used. + +highmem-ecam + Set ``on``/``off`` to enable/disable the high memory region for PCI ECAM. + The default is ``on`` for machine types later than ``virt-3.0``. + +highmem-mmio + Set ``on``/``off`` to enable/disable the high memory region for PCI MMIO. + The default is ``on``. + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 020a95cfa2..65cff7add1 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2095,14 +2095,20 @@ static void machvirt_init(MachineState *machine) if (vms->gic_version == VIRT_GIC_VERSION_2) { virt_max_cpus = GIC_NCPU; } else { -virt_max_cpus = virt_redist_capacity(vms, VIRT_GIC_REDIST) + -virt_redist_capacity(vms, VIRT_HIGH_GIC_REDIST2); +virt_max_cpus = virt_redist_capacity(vms, VIRT_GIC_REDIST); +if (vms->highmem_redists) { +virt_max_cpus += virt_redist_capacity(vms, VIRT_HIGH_GIC_REDIST2); +} } if (max_cpus > virt_max_cpus) { error_report("Number of SMP CPUs requested (%d) exceeds max CPUs " "supported by machine 'mach-virt' (%d)", max_cpus, virt_max_cpus); +if (vms->gic_version != VIRT_GIC_VERSION_2 && !vms->highmem_redists) { +error_printf("Try 'highmem-redists=on' for more CPUs\n"); +} + exit(1); } @@ -2371,6 +2377,49 @@ static void virt_set_compact_highmem(Object *obj, bool value, Error **errp) vms->highmem_compact = value; } +static bool virt_get_highmem_redists(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_redists; +} + +static void virt_set_highmem_redists(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_redists = value; +} + +static bool virt_get_highmem_ecam(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_ecam; +} + +static void virt_set_highmem_ecam(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_ecam = value; +} + +static bool virt_get_highmem_mmio(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_mmio; +} + +static void virt_set_highmem_mmio(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_mmio = value; +} + + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2996,6 +3045,28 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) "Set on/off to enable/disable compact " "layout for high memory regions"); +object_class_property_add_bool(oc, "highmem-redists", + virt_get_highmem_redists, + virt_set_highmem_redists); +object_class_property_set_description(oc, "highmem-redists", + "Set on/off to enable/disable high " + "memory region for GICv3 or GICv4 " + "redistributor"
[PATCH v7 5/7] hw/arm/virt: Improve high memory region address assignment
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is likely to be disabled by code by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_{memmap, high_memmap}() isn't optimized because the high memory region's PA space is always reserved, regardless of whatever the actual state in the corresponding vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and 'vms->highest_gpa' are always increased for case (1), (2) and (3). It's unnecessary since the assigned PA space for the disabled high memory region won't be used afterwards. Improve the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). The memory layout may be changed after the improvement is applied, which leads to potential migration breakage. So 'vms->highmem_compact' is added to control if the improvement should be applied. For now, 'vms->highmem_compact' is set to false, meaning that we don't have memory layout change until it becomes configurable through property 'compact-highmem' in next patch. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Reviewed-by: Marc Zyngier Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 15 ++- include/hw/arm/virt.h | 1 + 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ee98a8a3b6..3b4c995f80 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms, vms->memmap[i].size = region_size; /* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. + * Check each device to see if it fits in the PA space, + * moving highest_gpa as we go. For compatibility, move + * highest_gpa for disabled fitting devices as well, if + * the compact layout has been disabled. * * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = region_base + region_size - 1; +*region_enabled &= fits; +if (vms->highmem_compact && !*region_enabled) { +continue; } -*region_enabled &= fits; base = region_base + region_size; +if (fits) { +vms->highest_gpa = base - 1; +} } } diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index 6ec479ca2b..709f623741 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -144,6 +144,7 @@ struct VirtMachineState { PFlashCFI01 *flash[2]; bool secure; bool highmem; +bool highmem_compact; bool highmem_ecam; bool highmem_mmio; bool highmem_redists; -- 2.23.0
[PATCH v7 6/7] hw/arm/virt: Add 'compact-highmem' property
After the improvement to high memory region address assignment is applied, the memory layout can be changed, introducing possible migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when the optimization is applied or not, with the following configuration. The configuration is only achievable by modifying the source code until more properties are added to allow users selectively disable those high memory regions. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host\ -machine virt-7.2,compact-highmem={on, off} \ -m 4G,maxmem=511G -monitor stdio Region compact-highmem=off compact-highmem=on MEM[1GB 512GB][1GB 512GB] HIGH_GIC_REDISTS2 [512GB 512GB+64MB] [disabled] HIGH_PCIE_ECAM [512GB+256MB 512GB+512MB] [disabled] HIGH_PCIE_MMIO [disabled] [512GB 1TB] In order to keep backwords compatibility, we need to disable the optimization on machine, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'compact-highmem' property is added so that the optimization can be explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Reviewed-by: Marc Zyngier Tested-by: Zhenyu Zhang --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 32 include/hw/arm/virt.h| 1 + 3 files changed, 37 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..4454706392 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +compact-highmem + Set ``on``/``off`` to enable/disable the compact layout for high memory regions. + The default is ``on`` for machine types later than ``virt-7.2``. + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 3b4c995f80..020a95cfa2 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -174,6 +174,12 @@ static const MemMapEntry base_memmap[] = { * Note the extended_memmap is sized so that it eventually also includes the * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last * index of base_memmap). + * + * The memory map for these Highmem IO Regions can be in legacy or compact + * layout, depending on 'compact-highmem' property. With legacy layout, the + * PA space for one specific region is always reserved, even if the region + * has been disabled or doesn't fit into the PA space. However, the PA space + * for the region won't be reserved in these circumstances with compact layout. */ static MemMapEntry extended_memmap[] = { /* Additional 64 MB redist region (can contain up to 512 redistributors) */ @@ -2351,6 +2357,20 @@ static void virt_set_highmem(Object *obj, bool value, Error **errp) vms->highmem = value; } +static bool virt_get_compact_highmem(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_compact; +} + +static void virt_set_compact_highmem(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_compact = value; +} + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2969,6 +2989,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) "Set on/off to enable/disable using " "physical address space above 32 bits"); +object_class_property_add_bool(oc, "compact-highmem", + virt_get_compact_highmem, + virt_set_compact_highmem); +object_class_property_set_description(oc, "compact-highmem", + "Set on/off to enable/disable compact " + "layout for high memory regions"); + object_class_property_add_str(oc, "gic-version", virt_get_gic_version, virt_set_gic_version); object_class_property_set_description(oc, "gic-version", @@ -3053,6 +3080,7 @@ static void virt_instance_init(Object *obj) /* High memory is enabled by default */ vms->highmem = true; +vms->highmem_compact = !vmc->no_highmem_compact; vms->gic_version = VIRT_GIC_VERSION_NOSEL; vms-&g
[PATCH v7 0/7] hw/arm/virt: Improve address assignment for high memory regions
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions and introduces new properties for user to selectively disable those 3 high memory regions. PATCH[1-4] preparatory work for the improvment PATCH[5] improve high memory region address assignment PATCH[6] adds 'compact-highmem' to enable or disable the optimization PATCH[7] adds properties so that high memory regions can be disabled v6: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00490.html v5: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00280.html v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/ v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Changelog == v7: * Pick review-by from Connie/Eric/Marc (Connie/Eric/Marc) * Use 'base - 1' to update 'vms->highest_gpa' in PATCH[v7 5] (Eric) * Replace 'even' with 'even if' in the comments about the legacy and compact layout in PATCH[v7 6] (Connie) * The maximal supported CPUs are calculated based on property 'highmem-redists' and the improved comments and error messages for this property (Connie) v6: * Pick review-by from Connie/Eric(Connie/Eric) * Make the changes obvious in PATCH[v6 5/7] (Eric) * Move the example to commit log and describe the legacy and compact layout in code's comments in PATCH[v6 6/7] (Eric) * Comment and commit message improvements(Connie/Eric) * Add 3 properties in PATCH[v6 7/7], allowing user to disable those 3 high memory regions(Marc) v5: * Pick review-by and tested-by (Connie/Zhenyu) * Add extra check in PATCH[v5 4/6] (Connie) * Improve comments about compatibility for disabled regions in PATCH[v5 5/6] (Connie) v4: * Add virt_get_high_memmap_enabled() helper (Eric) * Move 'vms->highmem_compact' and related logic from PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect breakage (Eric) * Document the legacy and optimized high memory region layout in commit log and source code (Eric) v3: * Reorder the patches(Gavin) * Add 'highmem-compact' property for backwards compatibility (Eric) v2: * Split the patches for easier review(Gavin) * Improved changelog (Marc) * Use 'bool fits' in virt_set_high_memmap() (Eric) Gavin Shan (7): hw/arm/virt: Introduce virt_set_high_memmap() helper hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap() hw/arm/virt: Introduce variable region_base in virt_set_high_memmap() hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper hw/arm/virt: Improve high memory region address assignment hw/arm/virt: Add 'compact-highmem' property hw/arm/virt: Add properties to disable high memory regions docs/system/arm/virt.rst | 17 hw/arm/virt.c| 193 --- include/hw/arm/virt.h| 2 + 3 files changed, 177 insertions(+), 35 deletions(-) -- 2.23.0
[PATCH v7 4/7] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
This introduces virt_get_high_memmap_enabled() helper, which returns the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will be used in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Reviewed-by: Marc Zyngier Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 32 +++- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 0bf3cb7057..ee98a8a3b6 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,14 +1689,31 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms, + int index) +{ +bool *enabled_array[] = { +>highmem_redists, +>highmem_ecam, +>highmem_mmio, +}; + +assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST == + ARRAY_SIZE(enabled_array)); +assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array)); + +return enabled_array[index - VIRT_LOWMEMMAP_LAST]; +} + static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { hwaddr region_base, region_size; -bool fits; +bool *region_enabled, fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_enabled = virt_get_high_memmap_enabled(vms, i); region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; @@ -1714,18 +1731,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, vms->highest_gpa = region_base + region_size - 1; } -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; -} - +*region_enabled &= fits; base = region_base + region_size; } } -- 2.23.0
[PATCH v7 3/7] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
This introduces variable 'region_base' for the base address of the specific high memory region. It's the preparatory work to optimize high memory region address assignment. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Reviewed-by: Marc Zyngier Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e2ae88cf8b..0bf3cb7057 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { -hwaddr region_size; +hwaddr region_base, region_size; bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -base = ROUND_UP(base, region_size); -vms->memmap[i].base = base; +vms->memmap[i].base = region_base; vms->memmap[i].size = region_size; /* @@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + region_size) <= BIT_ULL(pa_bits); +fits = (region_base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + region_size - 1; +vms->highest_gpa = region_base + region_size - 1; } switch (i) { @@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += region_size; +base = region_base + region_size; } } -- 2.23.0
[PATCH v7 2/7] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()
This renames variable 'size' to 'region_size' in virt_set_high_memmap(). Its counterpart ('region_base') will be introduced in next patch. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Reviewed-by: Marc Zyngier Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 7572c44bda..e2ae88cf8b 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { +hwaddr region_size; +bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; +region_size = extended_memmap[i].size; -base = ROUND_UP(base, size); +base = ROUND_UP(base, region_size); vms->memmap[i].base = base; -vms->memmap[i].size = size; +vms->memmap[i].size = region_size; /* * Check each device to see if they fit in the PA space, @@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + size) <= BIT_ULL(pa_bits); +fits = (base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + size - 1; +vms->highest_gpa = base + region_size - 1; } switch (i) { @@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += size; +base += region_size; } } -- 2.23.0
[PATCH v7 1/7] hw/arm/virt: Introduce virt_set_high_memmap() helper
This introduces virt_set_high_memmap() helper. The logic of high memory region address assignment is moved to the helper. The intention is to make the subsequent optimization for high memory region address assignment easier. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Reviewed-by: Marc Zyngier Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 74 --- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index cda9defe8f..7572c44bda 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static void virt_set_high_memmap(VirtMachineState *vms, + hwaddr base, int pa_bits) +{ +int i; + +for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +hwaddr size = extended_memmap[i].size; +bool fits; + +base = ROUND_UP(base, size); +vms->memmap[i].base = base; +vms->memmap[i].size = size; + +/* + * Check each device to see if they fit in the PA space, + * moving highest_gpa as we go. + * + * For each device that doesn't fit, disable it. + */ +fits = (base + size) <= BIT_ULL(pa_bits); +if (fits) { +vms->highest_gpa = base + size - 1; +} + +switch (i) { +case VIRT_HIGH_GIC_REDIST2: +vms->highmem_redists &= fits; +break; +case VIRT_HIGH_PCIE_ECAM: +vms->highmem_ecam &= fits; +break; +case VIRT_HIGH_PCIE_MMIO: +vms->highmem_mmio &= fits; +break; +} + +base += size; +} +} + static void virt_set_memmap(VirtMachineState *vms, int pa_bits) { MachineState *ms = MACHINE(vms); @@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) /* We know for sure that at least the memory fits in the PA space */ vms->highest_gpa = memtop - 1; -for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; - -base = ROUND_UP(base, size); -vms->memmap[i].base = base; -vms->memmap[i].size = size; - -/* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. - * - * For each device that doesn't fit, disable it. - */ -fits = (base + size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = base + size - 1; -} - -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; -} - -base += size; -} +virt_set_high_memmap(vms, base, pa_bits); if (device_memory_size > 0) { ms->device_memory = g_malloc0(sizeof(*ms->device_memory)); -- 2.23.0
Re: [PATCH v6 7/7] hw/arm/virt: Add properties to disable high memory regions
Hi Connie, On 10/26/22 7:10 PM, Cornelia Huck wrote: On Wed, Oct 26 2022, Gavin Shan wrote: On 10/25/22 6:54 PM, Cornelia Huck wrote: On Mon, Oct 24 2022, Gavin Shan wrote: These 3 high memory regions are usually enabled by default, but s/These 3/The/ ? Ok. they may be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't needed by GICv2. This leads to waste in the PA space. When building the command line, do we have enough information on when the regions provide something useful, and when they just waste space? I think the help messages are already indicative. For example, the help messages for 'highmem-redist2' indicate the region is only needed by GICv3 or GICv4. 'highmem-ecam' and 'highmem-mmio' are needed by PCI ECAM and MMIO and the key words 'high' indicates they're the corresponding second window. #./qemu-system-aarch64 -M virt,? highmem-ecam=- Set on/off to enable/disable high memory region for PCI ECAM highmem-mmio=- Set on/off to enable/disable high memory region for PCI MMIO highmem-redists= - Set on/off to enable/disable high memory region for GICv3 or GICv4 redistributor OK, hopefully this is enough for anyone building a command line directly. (Do we want to encourage management software like libvirt to switch off regions that are not needed?) Yeah, to disable those regions aren't needed will make libvirt smart. I think it's worthy for libvirt to do it. I'm not sure if arbitrary machine properties have been supported by libvirt or not. For example, 'highmem-redists' can still be turned off from 'vm1.xml' even though it's not explicitly supported in libvirt's code. Add properties to allow users selectively disable them if needed: "highmem-redists", "highmem-ecam", "highmem-mmio". Suggested-by: Marc Zyngier Signed-off-by: Gavin Shan --- docs/system/arm/virt.rst | 12 hw/arm/virt.c| 64 2 files changed, 76 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 4454706392..a1668a969d 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -98,6 +98,18 @@ compact-highmem Set ``on``/``off`` to enable/disable the compact layout for high memory regions. The default is ``on`` for machine types later than ``virt-7.2``. +highmem-redists + Set ``on``/``off`` to enable/disable the high memry region for GICv3/4 s/memry/memory/ Ok, copy-and-paste error. Will be fixed. + redistributor. The default is ``on``. Do we need to add a note about what effects setting this to "off" may have, e.g. "Setting this to ``off`` may limit the maximum number of cpus." or so? And/or "Setting this to ``off`` when using GICv2 will save some space."? We may not mention GICv2 since GICv3/v4 are already mentioned. It's a good idea to mention that the maximum number of CPUs is reduced when it's turned off. I will have something like below in next respin if you agree. highmem-redists Set ``on``/``off`` to enable/disable the high memroy region for GICv3 or GICv4 redistributor. The default is ``on``. Setting this to ``off`` will limit the maximum number of CPUs when GICv3 or GICv4 is used. OK, sounds reasonable to me. Ok, Thanks for your confirm. Since 'vms->highmem_redists' is changeable, the 'virt_max_cpus' in machvirt_init() needs to be recalculated based on that. The code change will be included into next respin. Besides, the follow-up error message will be improved to something like below. error_report("Number of SMP CPUs requested (%d) exceeds max CPUs " "supported by machine 'mach-virt' (%d). The high memory " "region for GICv3 or GICv4 redistributor has been %s", max_cpus, virt_max_cpus, vms->highmem_redists ? "enabled" : "disabled"); Hm, the doc for error_report() states that "The resulting message should be a single phrase, with no newline or trailing punctuation." Maybe if (max_cpus > virt_max_cpus) { error_report("Number of SMP CPUs requested (%d) exceeds max CPUs " "supported by machine 'mach-virt' (%d)", max_cpus, virt_max_cpus); if (vms->gic_version != VIRT_GIC_VERSION_2 && !vms->higmem_redists) { error_printf("Try 'highmem-redists=on' for more CPUs\n"); } exit(1); } Ok, I didn't notice the message raised by error_report() has this sort of requirements. Your suggested snippet looks good to me and it will be included in next respin. However, I would hold next respin (v7) for a while to see if I can receive comments from Peter/Marc on v6 :) Thanks, Gavin
Re: [PATCH v6 5/7] hw/arm/virt: Improve high memory region address assignment
Hi Connie, On 10/26/22 6:43 PM, Cornelia Huck wrote: On Wed, Oct 26 2022, Gavin Shan wrote: On 10/26/22 12:29 AM, Eric Auger wrote: On 10/24/22 05:54, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is likely to be disabled by code by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_{memmap, high_memmap}() isn't optimized because the high memory region's PA space is always reserved, regardless of whatever the actual state in the corresponding vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and 'vms->highest_gpa' are always increased for case (1), (2) and (3). It's unnecessary since the assigned PA space for the disabled high memory region won't be used afterwards. Improve the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). The memory layout may be changed after the improvement is applied, which leads to potential migration breakage. So 'vms->highmem_compact' is added to control if the improvement should be applied. For now, 'vms->highmem_compact' is set to false, meaning that we don't have memory layout change until it becomes configurable through property 'compact-highmem' in next patch. Signed-off-by: Gavin Shan Reviewed-by: Cornelia Huck the code has quite changed since Connie's R-b Right. Connie, could you please check if the changes make sense to you and I can regain your R-B? :) My R-b still holds. Thanks. Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 15 ++- include/hw/arm/virt.h | 1 + 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ee98a8a3b6..4896f600b4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms, vms->memmap[i].size = region_size; /* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. + * Check each device to see if it fits in the PA space, + * moving highest_gpa as we go. For compatibility, move + * highest_gpa for disabled fitting devices as well, if + * the compact layout has been disabled. * * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = region_base + region_size - 1; +*region_enabled &= fits; +if (vms->highmem_compact && !*region_enabled) { +continue; } -*region_enabled &= fits; base = region_base + region_size; +if (fits) { +vms->highest_gpa = region_base + region_size - 1; vms->highest_gpa = base - 1; It's personal taste actually. I was thinking of using 'base - 1', but 'region_base + region_size - 1' looks more like a direct way. I don't have strong sense though and lets use 'base - 1' in next respin. I don't really have a preference for one or the other. Ok. Lets use 'base - 1' in next respin. Thanks, Gavin
Re: [PATCH v6 7/7] hw/arm/virt: Add properties to disable high memory regions
Hi Connie, On 10/25/22 6:54 PM, Cornelia Huck wrote: On Mon, Oct 24 2022, Gavin Shan wrote: These 3 high memory regions are usually enabled by default, but s/These 3/The/ ? Ok. they may be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't needed by GICv2. This leads to waste in the PA space. When building the command line, do we have enough information on when the regions provide something useful, and when they just waste space? I think the help messages are already indicative. For example, the help messages for 'highmem-redist2' indicate the region is only needed by GICv3 or GICv4. 'highmem-ecam' and 'highmem-mmio' are needed by PCI ECAM and MMIO and the key words 'high' indicates they're the corresponding second window. #./qemu-system-aarch64 -M virt,? highmem-ecam=- Set on/off to enable/disable high memory region for PCI ECAM highmem-mmio=- Set on/off to enable/disable high memory region for PCI MMIO highmem-redists= - Set on/off to enable/disable high memory region for GICv3 or GICv4 redistributor Add properties to allow users selectively disable them if needed: "highmem-redists", "highmem-ecam", "highmem-mmio". Suggested-by: Marc Zyngier Signed-off-by: Gavin Shan --- docs/system/arm/virt.rst | 12 hw/arm/virt.c| 64 2 files changed, 76 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 4454706392..a1668a969d 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -98,6 +98,18 @@ compact-highmem Set ``on``/``off`` to enable/disable the compact layout for high memory regions. The default is ``on`` for machine types later than ``virt-7.2``. +highmem-redists + Set ``on``/``off`` to enable/disable the high memry region for GICv3/4 s/memry/memory/ Ok, copy-and-paste error. Will be fixed. + redistributor. The default is ``on``. Do we need to add a note about what effects setting this to "off" may have, e.g. "Setting this to ``off`` may limit the maximum number of cpus." or so? And/or "Setting this to ``off`` when using GICv2 will save some space."? We may not mention GICv2 since GICv3/v4 are already mentioned. It's a good idea to mention that the maximum number of CPUs is reduced when it's turned off. I will have something like below in next respin if you agree. highmem-redists Set ``on``/``off`` to enable/disable the high memroy region for GICv3 or GICv4 redistributor. The default is ``on``. Setting this to ``off`` will limit the maximum number of CPUs when GICv3 or GICv4 is used. Since 'vms->highmem_redists' is changeable, the 'virt_max_cpus' in machvirt_init() needs to be recalculated based on that. The code change will be included into next respin. Besides, the follow-up error message will be improved to something like below. error_report("Number of SMP CPUs requested (%d) exceeds max CPUs " "supported by machine 'mach-virt' (%d). The high memory " "region for GICv3 or GICv4 redistributor has been %s", max_cpus, virt_max_cpus, vms->highmem_redists ? "enabled" : "disabled"); + +highmem-ecam + Set ``on``/``off`` to enable/disable the high memry region for PCI ECAM. s/memry/memory/ Ok, copy-and-paste error. Will be fixed. + The default is ``on`` for machine types later than ``virt-3.0``. + +highmem-mmio + Set ``on``/``off`` to enable/disable the high memry region for PCI MMIO. s/memry/memory/ Ok. copy-and-paste error. Will be fixed. + The default is ``on``. + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: Thanks, Gavin
Re: [PATCH v6 6/7] hw/arm/virt: Add 'compact-highmem' property
Hi Connie, On 10/25/22 6:30 PM, Cornelia Huck wrote: On Mon, Oct 24 2022, Gavin Shan wrote: After the improvement to high memory region address assignment is applied, the memory layout can be changed, introducing possible migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when the optimization is applied or not, with the following configuration. The configuration is only achievable by modifying the source code until more properties are added to allow users selectively disable those high memory regions. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host\ -machine virt-7.2,compact-highmem={on, off} \ -m 4G,maxmem=511G -monitor stdio Region compact-highmem=off compact-highmem=on MEM[1GB 512GB][1GB 512GB] HIGH_GIC_REDISTS2 [512GB 512GB+64MB] [disabled] HIGH_PCIE_ECAM [512GB+256MB 512GB+512MB] [disabled] HIGH_PCIE_MMIO [disabled] [512GB 1TB] In order to keep backwords compatibility, we need to disable the optimization on machine, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'compact-highmem' property is added so that the optimization can be explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 32 include/hw/arm/virt.h| 1 + 3 files changed, 37 insertions(+) (...) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 4896f600b4..11b5685432 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -174,6 +174,12 @@ static const MemMapEntry base_memmap[] = { * Note the extended_memmap is sized so that it eventually also includes the * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last * index of base_memmap). + * + * The memory map for these Highmem IO Regions can be in legacy or compact + * layout, depending on 'compact-highmem' property. With legacy layout, the + * PA space for one specific region is always reserved, even the region has s/even/even if/ Thanks, it will be improved as suggested in next respin (v7). + * been disabled or doesn't fit into the PA space. However, the PA space for + * the region won't be reserved in these circumstances with compact layout. */ static MemMapEntry extended_memmap[] = { /* Additional 64 MB redist region (can contain up to 512 redistributors) */ Thanks, Gavin
Re: [PATCH v6 5/7] hw/arm/virt: Improve high memory region address assignment
Hi Eric, On 10/26/22 12:29 AM, Eric Auger wrote: On 10/24/22 05:54, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is likely to be disabled by code by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_{memmap, high_memmap}() isn't optimized because the high memory region's PA space is always reserved, regardless of whatever the actual state in the corresponding vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and 'vms->highest_gpa' are always increased for case (1), (2) and (3). It's unnecessary since the assigned PA space for the disabled high memory region won't be used afterwards. Improve the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). The memory layout may be changed after the improvement is applied, which leads to potential migration breakage. So 'vms->highmem_compact' is added to control if the improvement should be applied. For now, 'vms->highmem_compact' is set to false, meaning that we don't have memory layout change until it becomes configurable through property 'compact-highmem' in next patch. Signed-off-by: Gavin Shan Reviewed-by: Cornelia Huck the code has quite changed since Connie's R-b Right. Connie, could you please check if the changes make sense to you and I can regain your R-B? :) Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 15 ++- include/hw/arm/virt.h | 1 + 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ee98a8a3b6..4896f600b4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms, vms->memmap[i].size = region_size; /* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. + * Check each device to see if it fits in the PA space, + * moving highest_gpa as we go. For compatibility, move + * highest_gpa for disabled fitting devices as well, if + * the compact layout has been disabled. * * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = region_base + region_size - 1; +*region_enabled &= fits; +if (vms->highmem_compact && !*region_enabled) { +continue; } -*region_enabled &= fits; base = region_base + region_size; +if (fits) { +vms->highest_gpa = region_base + region_size - 1; vms->highest_gpa = base - 1; It's personal taste actually. I was thinking of using 'base - 1', but 'region_base + region_size - 1' looks more like a direct way. I don't have strong sense though and lets use 'base - 1' in next respin. +} } } diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index 6ec479ca2b..709f623741 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -144,6 +144,7 @@ struct VirtMachineState { PFlashCFI01 *flash[2]; bool secure; bool highmem; +bool highmem_compact; bool highmem_ecam; bool highmem_mmio; bool highmem_redists; Besides Reviewed-by: Eric Auger Thanks, Gavin
Re: [PATCH v6 0/7] hw/arm/virt: Improve address assignment for high memory regions
Hi Peter and Marc, On 10/24/22 11:54 AM, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions and introduces new properties for user to selectively disable those 3 high memory regions. PATCH[1-4] preparatory work for the improvment PATCH[5] improve high memory region address assignment PATCH[6] adds 'compact-highmem' to enable or disable the optimization PATCH[7] adds properties so that high memory regions can be disabled v5: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00280.html v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/ v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Could you help to take a look when getting a chance? I think Connie and Eric are close to complete the reviews, but v7 is still needed to address extra comments from them. I hope to make v7 mergeable if possible :) Thanks, Gavin Changelog == v6: * Pick review-by from Connie/Eric(Connie/Eric) * Make the changes obvious in PATCH[v6 5/7] (Eric) * Move the example to commit log and describe the legacy and compact layout in code's comments in PATCH[v6 6/7] (Eric) * Comment and commit message improvements(Connie/Eric) * Add 3 properties in PATCH[v6 7/7], allowing user to disable those 3 high memory regions(Marc) v5: * Pick review-by and tested-by (Connie/Zhenyu) * Add extra check in PATCH[v5 4/6] (Connie) * Improve comments about compatibility for disabled regions in PATCH[v5 5/6] (Connie) v4: * Add virt_get_high_memmap_enabled() helper (Eric) * Move 'vms->highmem_compact' and related logic from PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect breakage (Eric) * Document the legacy and optimized high memory region layout in commit log and source code (Eric) v3: * Reorder the patches(Gavin) * Add 'highmem-compact' property for backwards compatibility (Eric) v2: * Split the patches for easier review(Gavin) * Improved changelog (Marc) * Use 'bool fits' in virt_set_high_memmap() (Eric) Gavin Shan (7): hw/arm/virt: Introduce virt_set_high_memmap() helper hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap() hw/arm/virt: Introduce variable region_base in virt_set_high_memmap() hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper hw/arm/virt: Improve high memory region address assignment hw/arm/virt: Add 'compact-highmem' property hw/arm/virt: Add properties to disable high memory regions docs/system/arm/virt.rst | 16 hw/arm/virt.c| 182 --- include/hw/arm/virt.h| 2 + 3 files changed, 167 insertions(+), 33 deletions(-)
[PATCH v6 6/7] hw/arm/virt: Add 'compact-highmem' property
After the improvement to high memory region address assignment is applied, the memory layout can be changed, introducing possible migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when the optimization is applied or not, with the following configuration. The configuration is only achievable by modifying the source code until more properties are added to allow users selectively disable those high memory regions. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host\ -machine virt-7.2,compact-highmem={on, off} \ -m 4G,maxmem=511G -monitor stdio Region compact-highmem=off compact-highmem=on MEM[1GB 512GB][1GB 512GB] HIGH_GIC_REDISTS2 [512GB 512GB+64MB] [disabled] HIGH_PCIE_ECAM [512GB+256MB 512GB+512MB] [disabled] HIGH_PCIE_MMIO [disabled] [512GB 1TB] In order to keep backwords compatibility, we need to disable the optimization on machine, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'compact-highmem' property is added so that the optimization can be explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 32 include/hw/arm/virt.h| 1 + 3 files changed, 37 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..4454706392 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +compact-highmem + Set ``on``/``off`` to enable/disable the compact layout for high memory regions. + The default is ``on`` for machine types later than ``virt-7.2``. + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 4896f600b4..11b5685432 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -174,6 +174,12 @@ static const MemMapEntry base_memmap[] = { * Note the extended_memmap is sized so that it eventually also includes the * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last * index of base_memmap). + * + * The memory map for these Highmem IO Regions can be in legacy or compact + * layout, depending on 'compact-highmem' property. With legacy layout, the + * PA space for one specific region is always reserved, even the region has + * been disabled or doesn't fit into the PA space. However, the PA space for + * the region won't be reserved in these circumstances with compact layout. */ static MemMapEntry extended_memmap[] = { /* Additional 64 MB redist region (can contain up to 512 redistributors) */ @@ -2351,6 +2357,20 @@ static void virt_set_highmem(Object *obj, bool value, Error **errp) vms->highmem = value; } +static bool virt_get_compact_highmem(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_compact; +} + +static void virt_set_compact_highmem(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_compact = value; +} + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2969,6 +2989,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) "Set on/off to enable/disable using " "physical address space above 32 bits"); +object_class_property_add_bool(oc, "compact-highmem", + virt_get_compact_highmem, + virt_set_compact_highmem); +object_class_property_set_description(oc, "compact-highmem", + "Set on/off to enable/disable compact " + "layout for high memory regions"); + object_class_property_add_str(oc, "gic-version", virt_get_gic_version, virt_set_gic_version); object_class_property_set_description(oc, "gic-version", @@ -3053,6 +3080,7 @@ static void virt_instance_init(Object *obj) /* High memory is enabled by default */ vms->highmem = true; +vms->highmem_compact = !vmc->no_highmem_compact; vms->gic_version = VIRT_GIC_VERSION_NOSEL; vms->highmem_ecam = !vmc->no_highmem_ecam;
[PATCH v6 2/7] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()
This renames variable 'size' to 'region_size' in virt_set_high_memmap(). Its counterpart ('region_base') will be introduced in next patch. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 7572c44bda..e2ae88cf8b 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { +hwaddr region_size; +bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; +region_size = extended_memmap[i].size; -base = ROUND_UP(base, size); +base = ROUND_UP(base, region_size); vms->memmap[i].base = base; -vms->memmap[i].size = size; +vms->memmap[i].size = region_size; /* * Check each device to see if they fit in the PA space, @@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + size) <= BIT_ULL(pa_bits); +fits = (base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + size - 1; +vms->highest_gpa = base + region_size - 1; } switch (i) { @@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += size; +base += region_size; } } -- 2.23.0
[PATCH v6 3/7] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
This introduces variable 'region_base' for the base address of the specific high memory region. It's the preparatory work to optimize high memory region address assignment. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e2ae88cf8b..0bf3cb7057 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { -hwaddr region_size; +hwaddr region_base, region_size; bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -base = ROUND_UP(base, region_size); -vms->memmap[i].base = base; +vms->memmap[i].base = region_base; vms->memmap[i].size = region_size; /* @@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + region_size) <= BIT_ULL(pa_bits); +fits = (region_base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + region_size - 1; +vms->highest_gpa = region_base + region_size - 1; } switch (i) { @@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += region_size; +base = region_base + region_size; } } -- 2.23.0
[PATCH v6 4/7] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
This introduces virt_get_high_memmap_enabled() helper, which returns the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will be used in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 32 +++- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 0bf3cb7057..ee98a8a3b6 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,14 +1689,31 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms, + int index) +{ +bool *enabled_array[] = { +>highmem_redists, +>highmem_ecam, +>highmem_mmio, +}; + +assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST == + ARRAY_SIZE(enabled_array)); +assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array)); + +return enabled_array[index - VIRT_LOWMEMMAP_LAST]; +} + static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { hwaddr region_base, region_size; -bool fits; +bool *region_enabled, fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_enabled = virt_get_high_memmap_enabled(vms, i); region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; @@ -1714,18 +1731,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, vms->highest_gpa = region_base + region_size - 1; } -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; -} - +*region_enabled &= fits; base = region_base + region_size; } } -- 2.23.0
[PATCH v6 0/7] hw/arm/virt: Improve address assignment for high memory regions
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions and introduces new properties for user to selectively disable those 3 high memory regions. PATCH[1-4] preparatory work for the improvment PATCH[5] improve high memory region address assignment PATCH[6] adds 'compact-highmem' to enable or disable the optimization PATCH[7] adds properties so that high memory regions can be disabled v5: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00280.html v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/ v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Changelog == v6: * Pick review-by from Connie/Eric(Connie/Eric) * Make the changes obvious in PATCH[v6 5/7] (Eric) * Move the example to commit log and describe the legacy and compact layout in code's comments in PATCH[v6 6/7] (Eric) * Comment and commit message improvements(Connie/Eric) * Add 3 properties in PATCH[v6 7/7], allowing user to disable those 3 high memory regions(Marc) v5: * Pick review-by and tested-by (Connie/Zhenyu) * Add extra check in PATCH[v5 4/6] (Connie) * Improve comments about compatibility for disabled regions in PATCH[v5 5/6] (Connie) v4: * Add virt_get_high_memmap_enabled() helper (Eric) * Move 'vms->highmem_compact' and related logic from PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect breakage (Eric) * Document the legacy and optimized high memory region layout in commit log and source code (Eric) v3: * Reorder the patches(Gavin) * Add 'highmem-compact' property for backwards compatibility (Eric) v2: * Split the patches for easier review(Gavin) * Improved changelog (Marc) * Use 'bool fits' in virt_set_high_memmap() (Eric) Gavin Shan (7): hw/arm/virt: Introduce virt_set_high_memmap() helper hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap() hw/arm/virt: Introduce variable region_base in virt_set_high_memmap() hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper hw/arm/virt: Improve high memory region address assignment hw/arm/virt: Add 'compact-highmem' property hw/arm/virt: Add properties to disable high memory regions docs/system/arm/virt.rst | 16 hw/arm/virt.c| 182 --- include/hw/arm/virt.h| 2 + 3 files changed, 167 insertions(+), 33 deletions(-) -- 2.23.0
[PATCH v6 1/7] hw/arm/virt: Introduce virt_set_high_memmap() helper
This introduces virt_set_high_memmap() helper. The logic of high memory region address assignment is moved to the helper. The intention is to make the subsequent optimization for high memory region address assignment easier. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 74 --- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index cda9defe8f..7572c44bda 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static void virt_set_high_memmap(VirtMachineState *vms, + hwaddr base, int pa_bits) +{ +int i; + +for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +hwaddr size = extended_memmap[i].size; +bool fits; + +base = ROUND_UP(base, size); +vms->memmap[i].base = base; +vms->memmap[i].size = size; + +/* + * Check each device to see if they fit in the PA space, + * moving highest_gpa as we go. + * + * For each device that doesn't fit, disable it. + */ +fits = (base + size) <= BIT_ULL(pa_bits); +if (fits) { +vms->highest_gpa = base + size - 1; +} + +switch (i) { +case VIRT_HIGH_GIC_REDIST2: +vms->highmem_redists &= fits; +break; +case VIRT_HIGH_PCIE_ECAM: +vms->highmem_ecam &= fits; +break; +case VIRT_HIGH_PCIE_MMIO: +vms->highmem_mmio &= fits; +break; +} + +base += size; +} +} + static void virt_set_memmap(VirtMachineState *vms, int pa_bits) { MachineState *ms = MACHINE(vms); @@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) /* We know for sure that at least the memory fits in the PA space */ vms->highest_gpa = memtop - 1; -for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; - -base = ROUND_UP(base, size); -vms->memmap[i].base = base; -vms->memmap[i].size = size; - -/* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. - * - * For each device that doesn't fit, disable it. - */ -fits = (base + size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = base + size - 1; -} - -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; -} - -base += size; -} +virt_set_high_memmap(vms, base, pa_bits); if (device_memory_size > 0) { ms->device_memory = g_malloc0(sizeof(*ms->device_memory)); -- 2.23.0
[PATCH v6 5/7] hw/arm/virt: Improve high memory region address assignment
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is likely to be disabled by code by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_{memmap, high_memmap}() isn't optimized because the high memory region's PA space is always reserved, regardless of whatever the actual state in the corresponding vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and 'vms->highest_gpa' are always increased for case (1), (2) and (3). It's unnecessary since the assigned PA space for the disabled high memory region won't be used afterwards. Improve the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). The memory layout may be changed after the improvement is applied, which leads to potential migration breakage. So 'vms->highmem_compact' is added to control if the improvement should be applied. For now, 'vms->highmem_compact' is set to false, meaning that we don't have memory layout change until it becomes configurable through property 'compact-highmem' in next patch. Signed-off-by: Gavin Shan Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 15 ++- include/hw/arm/virt.h | 1 + 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ee98a8a3b6..4896f600b4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms, vms->memmap[i].size = region_size; /* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. + * Check each device to see if it fits in the PA space, + * moving highest_gpa as we go. For compatibility, move + * highest_gpa for disabled fitting devices as well, if + * the compact layout has been disabled. * * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = region_base + region_size - 1; +*region_enabled &= fits; +if (vms->highmem_compact && !*region_enabled) { +continue; } -*region_enabled &= fits; base = region_base + region_size; +if (fits) { +vms->highest_gpa = region_base + region_size - 1; +} } } diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index 6ec479ca2b..709f623741 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -144,6 +144,7 @@ struct VirtMachineState { PFlashCFI01 *flash[2]; bool secure; bool highmem; +bool highmem_compact; bool highmem_ecam; bool highmem_mmio; bool highmem_redists; -- 2.23.0
[PATCH v6 7/7] hw/arm/virt: Add properties to disable high memory regions
These 3 high memory regions are usually enabled by default, but they may be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't needed by GICv2. This leads to waste in the PA space. Add properties to allow users selectively disable them if needed: "highmem-redists", "highmem-ecam", "highmem-mmio". Suggested-by: Marc Zyngier Signed-off-by: Gavin Shan --- docs/system/arm/virt.rst | 12 hw/arm/virt.c| 64 2 files changed, 76 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 4454706392..a1668a969d 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -98,6 +98,18 @@ compact-highmem Set ``on``/``off`` to enable/disable the compact layout for high memory regions. The default is ``on`` for machine types later than ``virt-7.2``. +highmem-redists + Set ``on``/``off`` to enable/disable the high memry region for GICv3/4 + redistributor. The default is ``on``. + +highmem-ecam + Set ``on``/``off`` to enable/disable the high memry region for PCI ECAM. + The default is ``on`` for machine types later than ``virt-3.0``. + +highmem-mmio + Set ``on``/``off`` to enable/disable the high memry region for PCI MMIO. + The default is ``on``. + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 11b5685432..afafc2d1b8 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -2371,6 +2371,49 @@ static void virt_set_compact_highmem(Object *obj, bool value, Error **errp) vms->highmem_compact = value; } +static bool virt_get_highmem_redists(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_redists; +} + +static void virt_set_highmem_redists(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_redists = value; +} + +static bool virt_get_highmem_ecam(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_ecam; +} + +static void virt_set_highmem_ecam(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_ecam = value; +} + +static bool virt_get_highmem_mmio(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_mmio; +} + +static void virt_set_highmem_mmio(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_mmio = value; +} + + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2996,6 +3039,27 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) "Set on/off to enable/disable compact " "layout for high memory regions"); +object_class_property_add_bool(oc, "highmem-redists", + virt_get_highmem_redists, + virt_set_highmem_redists); +object_class_property_set_description(oc, "highmem-redists", + "Set on/off to enable/disable high " + "memory region for GICv3/4 redistributor"); + +object_class_property_add_bool(oc, "highmem-ecam", + virt_get_highmem_ecam, + virt_set_highmem_ecam); +object_class_property_set_description(oc, "highmem-ecam", + "Set on/off to enable/disable high " + "memory region for PCI ECAM"); + +object_class_property_add_bool(oc, "highmem-mmio", + virt_get_highmem_mmio, + virt_set_highmem_mmio); +object_class_property_set_description(oc, "highmem-mmio", + "Set on/off to enable/disable high " + "memory region for PCI MMIO"); + object_class_property_add_str(oc, "gic-version", virt_get_gic_version, virt_set_gic_version); object_class_property_set_description(oc, "gic-version", -- 2.23.0
Re: [PATCH v5 6/6] hw/arm/virt: Add 'compact-highmem' property
Hi Marc, On 10/20/22 5:44 PM, Marc Zyngier wrote: On Thu, 20 Oct 2022 00:57:32 +0100, Gavin Shan wrote: For Marc's suggestion to add properties so that these high memory regions can be disabled by users. I can add one patch after this one to introduce the following 3 properties. Could you please confirm the property names are good enough? It's nice if Marc can help to confirm before I'm going to work on next revision. "highmem-ecam":"on"/"off" on vms->highmem_ecam "highmem-mmio":"on"/"off" on vms->highmem_mmio "highmem-redists": "on"/"off" on vms->highmem_redists I think that'd be reasonable, and would give the user some actual control over what gets exposed in the highmem region. I guess that the annoying thing with these options is that they allow the user to request conflicting settings (256 CPUs and highmem-redists=off, for example). You'll need to make this fail more or less gracefully. Thanks for the quick confirm. The check is already existing in machvirt_init(). I think what we need is simply to calculate 'virt_max_cpus' with consideration to 'highmem-redists' property there. if (max_cpus > virt_max_cpus) { error_report("Number of SMP CPUs requested (%d) exceeds max CPUs " "supported by machine 'mach-virt' (%d)", max_cpus, virt_max_cpus); exit(1); } Thanks, Gavin
Re: [PATCH v5 6/6] hw/arm/virt: Add 'compact-highmem' property
Hi Eric, On 10/20/22 4:18 AM, Eric Auger wrote: On 10/12/22 01:18, Gavin Shan wrote: After the improvement to high memory region address assignment is applied, the memory layout can be changed, introducing possible migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when the optimization is applied or not, with the following configuration. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host\ -machine virt-7.2,compact-highmem={on, off} \ -m 4G,maxmem=511G -monitor stdio Regioncompact-highmem=off compact-highmem=on RAM [1GB 512GB][1GB 512GB] HIGH_GIC_REDISTS [512GB 512GB+64MB] [disabled] HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB] [disabled] HIGH_PCIE_MMIO[disabled] [512GB 1TB] In order to keep backwords compatibility, we need to disable the optimization on machines, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'compact-highmem' property is added so that the optimization can be explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan Tested-by: Zhenyu Zhang --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 47 include/hw/arm/virt.h| 1 + 3 files changed, 52 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..75bf5a4994 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +compact-highmem + Set ``on``/``off`` to enable/disable compact space for high memory regions. + The default is ``on`` for machine types later than ``virt-7.2`` + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index c05cfb5314..8f1dba0ece 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -174,6 +174,27 @@ static const MemMapEntry base_memmap[] = { * Note the extended_memmap is sized so that it eventually also includes the * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last * index of base_memmap). + * + * The addresses assigned to these regions are affected by 'compact-highmem' + * property, which is to enable or disable the compact space in the Highmem + * IO regions. For example, VIRT_HIGH_PCIE_MMIO can be disabled or enabled + * depending on the property in the following scenario. To me you shall rather explain here what is the so-called "compact" space vs the legacy highmem layout. If I understand correctly the example rather legitimates the use of a compat option showing how the layout can be affected by the option. I would put that in the commit msg instead. Also in your example I see VIRT_HIGH_GIC_REDISTS is disabled but the code does not disable the region excpet if it does not fit within the PA. This does not match your example. Also the region is named VIRT_HIGH_GIC_REDIST2. In v4, Marc also suggested to have individual options for each highmem region. https://lore.kernel.org/qemu-devel/0f8e6a58-0dde-fb80-6966-7bb32c4df...@redhat.com/ Have you considered that option? I think your comments make sense to me. So lets put the following comments to the code and move the example to commit log. /* * The memory map for these Highmem IO Regions can be in legacy or compact * layout, depending on 'compact-highmem' property. In legacy layout, the * PA space for one specific region is always reserved, even the region has * been disabled or doesn't fit into the PA space. However, the PA space for * the region won't be reserved in these circumstances. */ You're correct about the example. VIRT_HIGH_GIC_REDIST2 should be used. Besides, the configuration is only achievable by modifying source code at present, until Marc's suggestion rolls in to allow users disable one particular high memory regions by more properties. I will amend the commit log to have something like below. For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when the optimization is applied or not, with the following configuration. The configuration is only achievable by modifying source code, until more properties are added to allow user selectively disable those high memory regions. For Marc's suggestion to add properties so that these high memory regions can be disabled by users. I can add one patch after this one to introduce the following 3 properties. Could you please confirm the property names are good enough? It's nice if Marc can help to
Re: [PATCH v5 6/6] hw/arm/virt: Add 'compact-highmem' property
Hi Connie, On 10/19/22 10:00 PM, Cornelia Huck wrote: On Wed, Oct 12 2022, Gavin Shan wrote: After the improvement to high memory region address assignment is applied, the memory layout can be changed, introducing possible migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when the optimization is applied or not, with the following configuration. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host\ -machine virt-7.2,compact-highmem={on, off} \ -m 4G,maxmem=511G -monitor stdio Regioncompact-highmem=off compact-highmem=on RAM [1GB 512GB][1GB 512GB] HIGH_GIC_REDISTS [512GB 512GB+64MB] [disabled] HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB] [disabled] HIGH_PCIE_MMIO[disabled] [512GB 1TB] In order to keep backwords compatibility, we need to disable the optimization on machines, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'compact-highmem' property is added so that the optimization can be explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan Tested-by: Zhenyu Zhang --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 47 include/hw/arm/virt.h| 1 + 3 files changed, 52 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..75bf5a4994 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +compact-highmem + Set ``on``/``off`` to enable/disable compact space for high memory regions. Maybe s/compact space/the compact layout/ ? Yeah, 'compact layout' is better. I will amend in next respin. + The default is ``on`` for machine types later than ``virt-7.2`` + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index c05cfb5314..8f1dba0ece 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -174,6 +174,27 @@ static const MemMapEntry base_memmap[] = { * Note the extended_memmap is sized so that it eventually also includes the * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last * index of base_memmap). + * + * The addresses assigned to these regions are affected by 'compact-highmem' + * property, which is to enable or disable the compact space in the Highmem s/space in/layout for/ ? Agreed. + * IO regions. For example, VIRT_HIGH_PCIE_MMIO can be disabled or enabled + * depending on the property in the following scenario. + * + * pa_bits = 40; + * vms->highmem_redists = false; + * vms->highmem_ecam= false; + * vms->highmem_mmio= true; + * + * # qemu-system-aarch64 -accel kvm -cpu host\ + * -machine virt-7.2,compact-highmem={on, off} \ + * -m 4G,maxmem=511G -monitor stdio + * + * Regioncompact-highmem=offcompact-highmem=on + * + * RAM [1GB 512GB][1GB 512GB] + * HIGH_GIC_REDISTS [512GB 512GB+64MB] [disabled] + * HIGH_PCIE_ECAM[512GB+256GB 512GB+512MB] [disabled] + * HIGH_PCIE_MMIO[disabled] [512GB 1TB] */ static MemMapEntry extended_memmap[] = { /* Additional 64 MB redist region (can contain up to 512 redistributors) */ (...) @@ -3124,8 +3167,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(7, 2) static void virt_machine_7_1_options(MachineClass *mc) { +VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc)); + virt_machine_7_2_options(mc); compat_props_add(mc->compat_props, hw_compat_7_1, hw_compat_7_1_len); +/* Compact space for high memory regions was introduced with 7.2 */ s/space/layout/ ? Ack. +vmc->no_highmem_compact = true; } DEFINE_VIRT_MACHINE(7, 1) Other than the wording nits, lgtm. Reviewed-by: Cornelia Huck Thanks, Gavin
Re: [PATCH v5 5/6] hw/arm/virt: Improve high memory region address assignment
Hi Eric, On 10/20/22 4:07 AM, Eric Auger wrote: On 10/12/22 01:18, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. I would replace the above sentence by One specific high memory region is likely to be disabled by the code by toggling vms->highmem_{redists, ecam, mmio}: Ok. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). I would suggest: isn't optimized because the high memory region PA range is always reserved whatever the actual state of the corresponding vms->highmem_ * flag. Ok. I will have something like below in next revision. The current implementation of virt_set_{memmap, high_memmap}() isn't optimized because the high memory region's PA space is always reserved, regardless of whatever the actual state in the corresponding vms->highmem_{redists, ecam, mmio} flag. In the code, In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. This improves the address assignment for those three high memory s/This improves/Improve Ok. region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). 'vms->high_compact' is false for now, meaning that we don't have s/hight_compat/highmem_compact You also may justify the introduction of this new field. Thanks. It should be 'highmem_compact'. Yes, it makes sense to justify the addition of 'vms->highmem_compact'. I will have something like below in next revision. The memory layout may be changed after the improvement is applied, which leads to potential migration breakage. So 'vms->highmem_compact' is added to control if the improvement should be applied. For now, 'vms->highmem_compact' is set to false, meaning that we don't have memory layout change until it becomes configurable through property 'compact-highmem' in next patch. any behavior changes until it becomes configurable through property 'compact-highmem' in next patch. Signed-off-by: Gavin Shan Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 23 +++ include/hw/arm/virt.h | 1 + 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ee98a8a3b6..c05cfb5314 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1717,22 +1717,29 @@ static void virt_set_high_memmap(VirtMachineState *vms, region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -vms->memmap[i].base = region_base; -vms->memmap[i].size = region_size; - /* * Check each device to see if they fit in the PA space, while we are at it, you can change s/they fit/it fits Agreed. - * moving highest_gpa as we go. + * moving highest_gpa as we go. For compatibility, move + * highest_gpa for disabled fitting devices as well, if + * the compact layout has been disabled. * * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { +if (*region_enabled && fits) { +vms->memmap[i].base = region_base; +vms->memmap[i].size = region_size; vms->highest_gpa = region_base + region_size - 1; +base = region_base + region_size; +} else { +*region_enabled = false; +if (!vms->highmem_compact) { +base = region_base + region_size; +if (fits) { +vms->highest_gpa = region_base + region_size - 1; +} +} } - -*region_enabled &= fits; -base = region_base + region_size; } } This looks quite complicated to me. It is not obvious for instance we have the same code as before when highmem_compact is not set. Typically vms->memmap[i].base/size are not always set as they were to be and impact on the rest of the code must be double checked. Could this be rewritten in that way (pseudocode totally untested). static void fit_highmem_slot(vms, *base, i,
[PATCH v5 5/6] hw/arm/virt: Improve high memory region address assignment
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. This improves the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). 'vms->high_compact' is false for now, meaning that we don't have any behavior changes until it becomes configurable through property 'compact-highmem' in next patch. Signed-off-by: Gavin Shan Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 23 +++ include/hw/arm/virt.h | 1 + 2 files changed, 16 insertions(+), 8 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ee98a8a3b6..c05cfb5314 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1717,22 +1717,29 @@ static void virt_set_high_memmap(VirtMachineState *vms, region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -vms->memmap[i].base = region_base; -vms->memmap[i].size = region_size; - /* * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. + * moving highest_gpa as we go. For compatibility, move + * highest_gpa for disabled fitting devices as well, if + * the compact layout has been disabled. * * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { +if (*region_enabled && fits) { +vms->memmap[i].base = region_base; +vms->memmap[i].size = region_size; vms->highest_gpa = region_base + region_size - 1; +base = region_base + region_size; +} else { +*region_enabled = false; +if (!vms->highmem_compact) { +base = region_base + region_size; +if (fits) { +vms->highest_gpa = region_base + region_size - 1; +} +} } - -*region_enabled &= fits; -base = region_base + region_size; } } diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index 6ec479ca2b..709f623741 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -144,6 +144,7 @@ struct VirtMachineState { PFlashCFI01 *flash[2]; bool secure; bool highmem; +bool highmem_compact; bool highmem_ecam; bool highmem_mmio; bool highmem_redists; -- 2.23.0
[PATCH v5 2/6] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()
This renames variable 'size' to 'region_size' in virt_set_high_memmap(). Its counterpart ('region_base') will be introduced in next patch. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 7572c44bda..e2ae88cf8b 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { +hwaddr region_size; +bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; +region_size = extended_memmap[i].size; -base = ROUND_UP(base, size); +base = ROUND_UP(base, region_size); vms->memmap[i].base = base; -vms->memmap[i].size = size; +vms->memmap[i].size = region_size; /* * Check each device to see if they fit in the PA space, @@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + size) <= BIT_ULL(pa_bits); +fits = (base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + size - 1; +vms->highest_gpa = base + region_size - 1; } switch (i) { @@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += size; +base += region_size; } } -- 2.23.0
[PATCH v5 4/6] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
This introduces virt_get_high_memmap_enabled() helper, which returns the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will be used in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 32 +++- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 0bf3cb7057..ee98a8a3b6 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,14 +1689,31 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms, + int index) +{ +bool *enabled_array[] = { +>highmem_redists, +>highmem_ecam, +>highmem_mmio, +}; + +assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST == + ARRAY_SIZE(enabled_array)); +assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array)); + +return enabled_array[index - VIRT_LOWMEMMAP_LAST]; +} + static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { hwaddr region_base, region_size; -bool fits; +bool *region_enabled, fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_enabled = virt_get_high_memmap_enabled(vms, i); region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; @@ -1714,18 +1731,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, vms->highest_gpa = region_base + region_size - 1; } -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; -} - +*region_enabled &= fits; base = region_base + region_size; } } -- 2.23.0
[PATCH v5 6/6] hw/arm/virt: Add 'compact-highmem' property
After the improvement to high memory region address assignment is applied, the memory layout can be changed, introducing possible migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when the optimization is applied or not, with the following configuration. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host\ -machine virt-7.2,compact-highmem={on, off} \ -m 4G,maxmem=511G -monitor stdio Regioncompact-highmem=off compact-highmem=on RAM [1GB 512GB][1GB 512GB] HIGH_GIC_REDISTS [512GB 512GB+64MB] [disabled] HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB] [disabled] HIGH_PCIE_MMIO[disabled] [512GB 1TB] In order to keep backwords compatibility, we need to disable the optimization on machines, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'compact-highmem' property is added so that the optimization can be explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan Tested-by: Zhenyu Zhang --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 47 include/hw/arm/virt.h| 1 + 3 files changed, 52 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..75bf5a4994 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +compact-highmem + Set ``on``/``off`` to enable/disable compact space for high memory regions. + The default is ``on`` for machine types later than ``virt-7.2`` + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index c05cfb5314..8f1dba0ece 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -174,6 +174,27 @@ static const MemMapEntry base_memmap[] = { * Note the extended_memmap is sized so that it eventually also includes the * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last * index of base_memmap). + * + * The addresses assigned to these regions are affected by 'compact-highmem' + * property, which is to enable or disable the compact space in the Highmem + * IO regions. For example, VIRT_HIGH_PCIE_MMIO can be disabled or enabled + * depending on the property in the following scenario. + * + * pa_bits = 40; + * vms->highmem_redists = false; + * vms->highmem_ecam= false; + * vms->highmem_mmio= true; + * + * # qemu-system-aarch64 -accel kvm -cpu host\ + * -machine virt-7.2,compact-highmem={on, off} \ + * -m 4G,maxmem=511G -monitor stdio + * + * Regioncompact-highmem=offcompact-highmem=on + * + * RAM [1GB 512GB][1GB 512GB] + * HIGH_GIC_REDISTS [512GB 512GB+64MB] [disabled] + * HIGH_PCIE_ECAM[512GB+256GB 512GB+512MB] [disabled] + * HIGH_PCIE_MMIO[disabled] [512GB 1TB] */ static MemMapEntry extended_memmap[] = { /* Additional 64 MB redist region (can contain up to 512 redistributors) */ @@ -2353,6 +2374,20 @@ static void virt_set_highmem(Object *obj, bool value, Error **errp) vms->highmem = value; } +static bool virt_get_compact_highmem(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_compact; +} + +static void virt_set_compact_highmem(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_compact = value; +} + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2971,6 +3006,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) "Set on/off to enable/disable using " "physical address space above 32 bits"); +object_class_property_add_bool(oc, "compact-highmem", + virt_get_compact_highmem, + virt_set_compact_highmem); +object_class_property_set_description(oc, "compact-highmem", + "Set on/off to enable/disable compact " + "space for high memory regions"); + object_class_property_add_str(oc, "gic-version", virt_get_gic_version, virt_set_gic_version)
[PATCH v5 3/6] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
This introduces variable 'region_base' for the base address of the specific high memory region. It's the preparatory work to optimize high memory region address assignment. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e2ae88cf8b..0bf3cb7057 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { -hwaddr region_size; +hwaddr region_base, region_size; bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -base = ROUND_UP(base, region_size); -vms->memmap[i].base = base; +vms->memmap[i].base = region_base; vms->memmap[i].size = region_size; /* @@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + region_size) <= BIT_ULL(pa_bits); +fits = (region_base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + region_size - 1; +vms->highest_gpa = region_base + region_size - 1; } switch (i) { @@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += region_size; +base = region_base + region_size; } } -- 2.23.0
[PATCH v5 1/6] hw/arm/virt: Introduce virt_set_high_memmap() helper
This introduces virt_set_high_memmap() helper. The logic of high memory region address assignment is moved to the helper. The intention is to make the subsequent optimization for high memory region address assignment easier. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 74 --- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index cda9defe8f..7572c44bda 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static void virt_set_high_memmap(VirtMachineState *vms, + hwaddr base, int pa_bits) +{ +int i; + +for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +hwaddr size = extended_memmap[i].size; +bool fits; + +base = ROUND_UP(base, size); +vms->memmap[i].base = base; +vms->memmap[i].size = size; + +/* + * Check each device to see if they fit in the PA space, + * moving highest_gpa as we go. + * + * For each device that doesn't fit, disable it. + */ +fits = (base + size) <= BIT_ULL(pa_bits); +if (fits) { +vms->highest_gpa = base + size - 1; +} + +switch (i) { +case VIRT_HIGH_GIC_REDIST2: +vms->highmem_redists &= fits; +break; +case VIRT_HIGH_PCIE_ECAM: +vms->highmem_ecam &= fits; +break; +case VIRT_HIGH_PCIE_MMIO: +vms->highmem_mmio &= fits; +break; +} + +base += size; +} +} + static void virt_set_memmap(VirtMachineState *vms, int pa_bits) { MachineState *ms = MACHINE(vms); @@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) /* We know for sure that at least the memory fits in the PA space */ vms->highest_gpa = memtop - 1; -for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; - -base = ROUND_UP(base, size); -vms->memmap[i].base = base; -vms->memmap[i].size = size; - -/* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. - * - * For each device that doesn't fit, disable it. - */ -fits = (base + size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = base + size - 1; -} - -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; -} - -base += size; -} +virt_set_high_memmap(vms, base, pa_bits); if (device_memory_size > 0) { ms->device_memory = g_malloc0(sizeof(*ms->device_memory)); -- 2.23.0
[PATCH v5 0/6] hw/arm/virt: Improve address assignment for high memory regions
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions. PATCH[1-4] preparatory work for the improvment PATCH[5] improve high memory region address assignment PATCH[6] adds 'compact-highmem' to enable or disable the optimization v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/ v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Changelog == v5: * Pick review-by and tested-by (Connie/Zhenyu) * Add extra check in PATCH[v5 4/6] (Connie) * Improve comments about compatibility for disabled regions in PATCH[v5 5/6] (Connie) v4: * Add virt_get_high_memmap_enabled() helper (Eric) * Move 'vms->highmem_compact' and related logic from PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect breakage (Eric) * Document the legacy and optimized high memory region layout in commit log and source code (Eric) v3: * Reorder the patches(Gavin) * Add 'highmem-compact' property for backwards compatibility (Eric) v2: * Split the patches for easier review(Gavin) * Improved changelog (Marc) * Use 'bool fits' in virt_set_high_memmap() (Eric) Gavin Shan (6): hw/arm/virt: Introduce virt_set_high_memmap() helper hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap() hw/arm/virt: Introduce variable region_base in virt_set_high_memmap() hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper hw/arm/virt: Improve high memory region address assignment hw/arm/virt: Add 'compact-highmem' property docs/system/arm/virt.rst | 4 ++ hw/arm/virt.c| 135 +-- include/hw/arm/virt.h| 2 + 3 files changed, 108 insertions(+), 33 deletions(-) -- 2.23.0
Re: [PATCH v4 4/6] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
On 10/12/22 12:45 AM, Eric Auger wrote: On 10/5/22 00:47, Gavin Shan wrote: On 10/4/22 6:41 PM, Cornelia Huck wrote: On Tue, Oct 04 2022, Gavin Shan wrote: This introduces virt_get_high_memmap_enabled() helper, which returns the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will be used in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b0b679d1f4..59de7b78b5 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,14 +1689,29 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms, + int index) +{ + bool *enabled_array[] = { + >highmem_redists, + >highmem_ecam, + >highmem_mmio, + }; + + assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array)); I wonder whether we want an assert(ARRAY_SIZE(extended_memmap) == ARRAY_SIZE(enabled_array))? IIUC, we never want those two to get out of sync? Yeah, It makes sense to ensure both arrays synchronized. I will add the extra check in next respin. With Connie's suggestion this looks good to me. What we need is actually like below because the array (extended_memmap) starts from VIRT_LOWMEMMAP_LAST instead of zero. I'm adding the extra check into v5, which will be posted shortly. assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST == ARRAY_SIZE(enabled_array)); + + return enabled_array[index - VIRT_LOWMEMMAP_LAST]; +} + Thanks, Gavin
Re: [PATCH v4 6/6] hw/arm/virt: Add 'compact-highmem' property
Hi Marc, On 10/5/22 1:39 AM, Marc Zyngier wrote: On Tue, 04 Oct 2022 01:26:27 +0100, Gavin Shan wrote: After the improvement to high memory region address assignment is applied, the memory layout can be changed, introducing possible migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when the optimization is applied or not, with the following configuration. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; The question is how are these parameters specified by a user? Short of hacking the code, this isn't really possible. Yeah, It's impossible to have false for vms->highmem_redists unless the code is hacked. # qemu-system-aarch64 -accel kvm -cpu host\ -machine virt-7.2,compact-highmem={on, off} \ -m 4G,maxmem=511G -monitor stdio Regioncompact-highmem=off compact-highmem=on RAM [1GB 512GB][1GB 512GB] HIGH_GIC_REDISTS [512GB 512GB+64MB] [disabled] HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB] [disabled] HIGH_PCIE_MMIO[disabled] [512GB 1TB] In order to keep backwords compatibility, we need to disable the optimization on machines, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'compact-highmem' property is added so that the optimization can be explicitly enabled or disabled on all machine types by users. Not directly related to this series, but it seems to me that we should be aiming at reproducible results across HW implementations (at least with KVM). Depending on how many PA bits the HW implements, we end-up with a set of devices or another, which is likely to be confusing for a user. I think we should consider an additional set of changes to allow a user to specify the PA bits as well as the devices they want to see enabled. I think the idea to selectively enable devices (high memory regions) is sensible. For example, users may needn't HIGH_PCIE_MMIO at all in some systems, where they have limited PCI devices. I'm not sure about PA bits because it has been discovered from hardware and configure the automatically optimized value/bits back to KVM. The optimized value/bits is automatically calculated based on the enabled high memory regions. Thanks, Gavin
Re: [PATCH v4 5/6] hw/arm/virt: Improve high memory region address
Hi Connie, On 10/4/22 6:53 PM, Cornelia Huck wrote: On Tue, Oct 04 2022, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. This improves the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). 'vms->high_compact' is false for now, meaning that we don't have any behavior changes until it becomes configurable through property 'compact-highmem' in next patch. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 19 --- include/hw/arm/virt.h | 1 + 2 files changed, 13 insertions(+), 7 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 59de7b78b5..4164da49e9 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1715,9 +1715,6 @@ static void virt_set_high_memmap(VirtMachineState *vms, region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -vms->memmap[i].base = region_base; -vms->memmap[i].size = region_size; - /* * Check each device to see if they fit in the PA space, * moving highest_gpa as we go. Maybe tweak this comment? "Check each enabled device to see if they fit in the PA space, moving highest_gpa as we go. For compatibility, move highest_gpa for disabled fitting devices as well, if the compact layout has been disabled." (Or would that be overkill?) It looks overkill to me since the code is simple and clear. However, comments won't be harmful. I will integrate the proposed comment in next respin. Thanks, Gavin
Re: [PATCH v4 4/6] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
Hi Connie, On 10/4/22 6:41 PM, Cornelia Huck wrote: On Tue, Oct 04 2022, Gavin Shan wrote: This introduces virt_get_high_memmap_enabled() helper, which returns the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will be used in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b0b679d1f4..59de7b78b5 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,14 +1689,29 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms, + int index) +{ +bool *enabled_array[] = { +>highmem_redists, +>highmem_ecam, +>highmem_mmio, +}; + +assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array)); I wonder whether we want an assert(ARRAY_SIZE(extended_memmap) == ARRAY_SIZE(enabled_array))? IIUC, we never want those two to get out of sync? Yeah, It makes sense to ensure both arrays synchronized. I will add the extra check in next respin. + +return enabled_array[index - VIRT_LOWMEMMAP_LAST]; +} + Thanks, Gavin
[PATCH v4 5/6] hw/arm/virt: Improve high memory region address
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. This improves the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). 'vms->high_compact' is false for now, meaning that we don't have any behavior changes until it becomes configurable through property 'compact-highmem' in next patch. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 19 --- include/hw/arm/virt.h | 1 + 2 files changed, 13 insertions(+), 7 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 59de7b78b5..4164da49e9 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1715,9 +1715,6 @@ static void virt_set_high_memmap(VirtMachineState *vms, region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -vms->memmap[i].base = region_base; -vms->memmap[i].size = region_size; - /* * Check each device to see if they fit in the PA space, * moving highest_gpa as we go. @@ -1725,12 +1722,20 @@ static void virt_set_high_memmap(VirtMachineState *vms, * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { +if (*region_enabled && fits) { +vms->memmap[i].base = region_base; +vms->memmap[i].size = region_size; vms->highest_gpa = region_base + region_size - 1; +base = region_base + region_size; +} else { +*region_enabled = false; +if (!vms->highmem_compact) { +base = region_base + region_size; +if (fits) { +vms->highest_gpa = region_base + region_size - 1; +} +} } - -*region_enabled &= fits; -base = region_base + region_size; } } diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index 6ec479ca2b..709f623741 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -144,6 +144,7 @@ struct VirtMachineState { PFlashCFI01 *flash[2]; bool secure; bool highmem; +bool highmem_compact; bool highmem_ecam; bool highmem_mmio; bool highmem_redists; -- 2.23.0
[PATCH v4 6/6] hw/arm/virt: Add 'compact-highmem' property
After the improvement to high memory region address assignment is applied, the memory layout can be changed, introducing possible migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when the optimization is applied or not, with the following configuration. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host\ -machine virt-7.2,compact-highmem={on, off} \ -m 4G,maxmem=511G -monitor stdio Regioncompact-highmem=off compact-highmem=on RAM [1GB 512GB][1GB 512GB] HIGH_GIC_REDISTS [512GB 512GB+64MB] [disabled] HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB] [disabled] HIGH_PCIE_MMIO[disabled] [512GB 1TB] In order to keep backwords compatibility, we need to disable the optimization on machines, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'compact-highmem' property is added so that the optimization can be explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 47 include/hw/arm/virt.h| 1 + 3 files changed, 52 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..75bf5a4994 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +compact-highmem + Set ``on``/``off`` to enable/disable compact space for high memory regions. + The default is ``on`` for machine types later than ``virt-7.2`` + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 4164da49e9..9fe65a2ae1 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -174,6 +174,27 @@ static const MemMapEntry base_memmap[] = { * Note the extended_memmap is sized so that it eventually also includes the * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last * index of base_memmap). + * + * The addresses assigned to these regions are affected by 'compact-highmem' + * property, which is to enable or disable the compact space in the Highmem + * IO regions. For example, VIRT_HIGH_PCIE_MMIO can be disabled or enabled + * depending on the property in the following scenario. + * + * pa_bits = 40; + * vms->highmem_redists = false; + * vms->highmem_ecam= false; + * vms->highmem_mmio= true; + * + * # qemu-system-aarch64 -accel kvm -cpu host\ + * -machine virt-7.2,compact-highmem={on, off} \ + * -m 4G,maxmem=511G -monitor stdio + * + * Regioncompact-highmem=offcompact-highmem=on + * + * RAM [1GB 512GB][1GB 512GB] + * HIGH_GIC_REDISTS [512GB 512GB+64MB] [disabled] + * HIGH_PCIE_ECAM[512GB+256GB 512GB+512MB] [disabled] + * HIGH_PCIE_MMIO[disabled] [512GB 1TB] */ static MemMapEntry extended_memmap[] = { /* Additional 64 MB redist region (can contain up to 512 redistributors) */ @@ -2349,6 +2370,20 @@ static void virt_set_highmem(Object *obj, bool value, Error **errp) vms->highmem = value; } +static bool virt_get_compact_highmem(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_compact; +} + +static void virt_set_compact_highmem(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_compact = value; +} + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2967,6 +3002,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) "Set on/off to enable/disable using " "physical address space above 32 bits"); +object_class_property_add_bool(oc, "compact-highmem", + virt_get_compact_highmem, + virt_set_compact_highmem); +object_class_property_set_description(oc, "compact-highmem", + "Set on/off to enable/disable compact " + "space for high memory regions"); + object_class_property_add_str(oc, "gic-version", virt_get_gic_version, virt_set_gic_version); object_class_proper
[PATCH v4 4/6] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
This introduces virt_get_high_memmap_enabled() helper, which returns the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will be used in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b0b679d1f4..59de7b78b5 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,14 +1689,29 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms, + int index) +{ +bool *enabled_array[] = { +>highmem_redists, +>highmem_ecam, +>highmem_mmio, +}; + +assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array)); + +return enabled_array[index - VIRT_LOWMEMMAP_LAST]; +} + static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { hwaddr region_base, region_size; -bool fits; +bool *region_enabled, fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_enabled = virt_get_high_memmap_enabled(vms, i); region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; @@ -1714,18 +1729,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, vms->highest_gpa = region_base + region_size - 1; } -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; -} - +*region_enabled &= fits; base = region_base + region_size; } } -- 2.23.0
[PATCH v4 0/6] hw/arm/virt: Improve address assignment for high memory regions
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions. PATCH[1-4] preparatory work for the improvment PATCH[5] improve high memory region address assignment PATCH[6] adds 'compact-highmem' to enable or disable the optimization History === v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/ v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Changelog == v4: * Add virt_get_high_memmap_enabled() helper (Eric) * Move 'vms->highmem_compact' and related logic from PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect breakage (Eric) * Document the legacy and optimized high memory region layout in commit log and source code (Eric) v3: * Reorder the patches(Gavin) * Add 'highmem-compact' property for backwards compatibility (Eric) v2: * Split the patches for easier review(Gavin) * Improved changelog (Marc) * Use 'bool fits' in virt_set_high_memmap() (Eric) Gavin Shan (6): hw/arm/virt: Introduce virt_set_high_memmap() helper hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap() hw/arm/virt: Introduce variable region_base in virt_set_high_memmap() hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper hw/arm/virt: Improve high memory region address hw/arm/virt: Add 'compact-highmem' property docs/system/arm/virt.rst | 4 ++ hw/arm/virt.c| 131 +-- include/hw/arm/virt.h| 2 + 3 files changed, 104 insertions(+), 33 deletions(-) -- 2.23.0
[PATCH v4 2/6] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()
This renames variable 'size' to 'region_size' in virt_set_high_memmap(). Its counterpart ('region_base') will be introduced in next patch. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger --- hw/arm/virt.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 4dab528b82..187b3ee0e2 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { +hwaddr region_size; +bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; +region_size = extended_memmap[i].size; -base = ROUND_UP(base, size); +base = ROUND_UP(base, region_size); vms->memmap[i].base = base; -vms->memmap[i].size = size; +vms->memmap[i].size = region_size; /* * Check each device to see if they fit in the PA space, @@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + size) <= BIT_ULL(pa_bits); +fits = (base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + size - 1; +vms->highest_gpa = base + region_size - 1; } switch (i) { @@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += size; +base += region_size; } } -- 2.23.0
[PATCH v4 1/6] hw/arm/virt: Introduce virt_set_high_memmap() helper
This introduces virt_set_high_memmap() helper. The logic of high memory region address assignment is moved to the helper. The intention is to make the subsequent optimization for high memory region address assignment easier. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck --- hw/arm/virt.c | 74 --- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 0961e053e5..4dab528b82 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static void virt_set_high_memmap(VirtMachineState *vms, + hwaddr base, int pa_bits) +{ +int i; + +for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +hwaddr size = extended_memmap[i].size; +bool fits; + +base = ROUND_UP(base, size); +vms->memmap[i].base = base; +vms->memmap[i].size = size; + +/* + * Check each device to see if they fit in the PA space, + * moving highest_gpa as we go. + * + * For each device that doesn't fit, disable it. + */ +fits = (base + size) <= BIT_ULL(pa_bits); +if (fits) { +vms->highest_gpa = base + size - 1; +} + +switch (i) { +case VIRT_HIGH_GIC_REDIST2: +vms->highmem_redists &= fits; +break; +case VIRT_HIGH_PCIE_ECAM: +vms->highmem_ecam &= fits; +break; +case VIRT_HIGH_PCIE_MMIO: +vms->highmem_mmio &= fits; +break; +} + +base += size; +} +} + static void virt_set_memmap(VirtMachineState *vms, int pa_bits) { MachineState *ms = MACHINE(vms); @@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) /* We know for sure that at least the memory fits in the PA space */ vms->highest_gpa = memtop - 1; -for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; - -base = ROUND_UP(base, size); -vms->memmap[i].base = base; -vms->memmap[i].size = size; - -/* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. - * - * For each device that doesn't fit, disable it. - */ -fits = (base + size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = base + size - 1; -} - -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; -} - -base += size; -} +virt_set_high_memmap(vms, base, pa_bits); if (device_memory_size > 0) { ms->device_memory = g_malloc0(sizeof(*ms->device_memory)); -- 2.23.0
[PATCH v4 3/6] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
This introduces variable 'region_base' for the base address of the specific high memory region. It's the preparatory work to optimize high memory region address assignment. No functional change intended. Signed-off-by: Gavin Shan Reviewed-by: Eric Auger --- hw/arm/virt.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 187b3ee0e2..b0b679d1f4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { -hwaddr region_size; +hwaddr region_base, region_size; bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -base = ROUND_UP(base, region_size); -vms->memmap[i].base = base; +vms->memmap[i].base = region_base; vms->memmap[i].size = region_size; /* @@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + region_size) <= BIT_ULL(pa_bits); +fits = (region_base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + region_size - 1; +vms->highest_gpa = region_base + region_size - 1; } switch (i) { @@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += region_size; +base = region_base + region_size; } } -- 2.23.0
Re: [PATCH v3 5/5] hw/arm/virt: Add 'highmem-compact' property
Hi Eric, On 10/3/22 4:49 PM, Eric Auger wrote: On 9/29/22 01:49, Gavin Shan wrote: On 9/28/22 10:22 PM, Eric Auger wrote: On 9/22/22 01:13, Gavin Shan wrote: After the improvement to high memory region address assignment is applied, the memory layout is changed. For example, VIRT_HIGH_PCIE_MMIO s/the memory layout is changed./the memory layout is changed, introducing possible migration breakage. Ok, much clearer. memory region is enabled when the improvement is applied, but it's disabled if the improvement isn't applied. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam = false; vms->highmem_mmio = true; # qemu-system-aarch64 -accel kvm -cpu host \ -machine virt-7.2 -m 4G,maxmem=511G \ -monitor stdio In order to keep backwords compatibility, we need to disable the optimization on machines, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'highmem-compact' property is added so that the optimization can be I would rather rename the property into compact-highmem even if the vms field is name highmem_compact to align with other highmem fields Ok, but I would love to know why. Note that we already have 'highmem=on|off'. 'highmem_compact=on|off' seems consistent to me. To me the property name should rather sound 'english' with the adjective before the name 'high memory"' but I am not a native english speaker either. Ok. I agree 'compact-highmem' is better. The backup variable name will be still 'highmem_compact', which is consistent with the existing ones. explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan --- docs/system/arm/virt.rst | 4 hw/arm/virt.c | 33 + include/hw/arm/virt.h | 2 ++ 3 files changed, 39 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..f05ec2253b 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +highmem-compact + Set ``on``/``off`` to enable/disable compact space for high memory regions. + The default is ``on`` for machine types later than ``virt-7.2`` I think you should document what is compact layout versus legacy one, both in the commit msg and maybe as a comment in a code along with the comment in hw/arm/virt.c starting with 'Highmem IO Regions: ' Ok, I will add this into the commit log in v4. I don't think it's necessary to add duplicate comment in the code. People can check the commit log for details if needed. + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b702f8f2b5..a4fbdaef91 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1734,6 +1734,13 @@ static void virt_set_high_memmap(VirtMachineState *vms, base = region_base + region_size; } else { *region_enabled = false; + + if (!vms->highmem_compact) { this snippet should be already present in previous patch otherwise this will break bisectability. Hmm, nice catch! I think I need to swap PATCH[4] and PATCH[5] in next revision. In that order, 'compact-highmem' is introduced in PATCH[4], but not used yet. PATCH[5] has the optimization and 'compact-highmem' is used. No in general you introduce the property at the very end with the code guarded with an unset vms->highmem_compact in the previous patch. Yeah, what I need is define 'vms->highmem_compact' in PATCH[v3 4/5], whose value is false. I also need to update @base and @vms->highest_gpa on !vms->highmem_compact' in PATCH[v3 4/5]. + base = region_base + region_size; + if (fits) { + vms->highest_gpa = region_base + region_size - 1; + } + } } } } @@ -2348,6 +2355,20 @@ static void virt_set_highmem(Object *obj, bool value, Error **errp) vms->highmem = value; } +static bool virt_get_highmem_compact(Object *obj, Error **errp) +{ + VirtMachineState *vms = VIRT_MACHINE(obj); + + return vms->highmem_compact; +} + +static void virt_set_highmem_compact(Object *obj, bool value, Error **errp) +{ + VirtMachineState *vms = VIRT_MACHINE(obj); + + vms->highmem_compact = value; +} + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2966,6 +2987,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) "Set on/off to enable/disable using " "physical address space above 32 bits")
Re: [PATCH v3 4/5] hw/arm/virt: Improve high memory region address assignment
Hi Eric, On 10/3/22 4:44 PM, Eric Auger wrote: On 9/29/22 01:37, Gavin Shan wrote: On 9/28/22 10:51 PM, Eric Auger wrote: On 9/22/22 01:13, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. This improves the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). Signed-off-by: Gavin Shan --- hw/arm/virt.c | 44 ++-- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b0b679d1f4..b702f8f2b5 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1693,15 +1693,31 @@ static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { hwaddr region_base, region_size; - bool fits; + bool *region_enabled, fits; IDo you really need a pointer? If the region is unknown this is a bug in virt code. The pointer is needed so that we can disable the region by setting 'false' to it at later point. Yeah, I think you're correct that 'unknown region' is a bug and we need to do assert(region_enabled), or something like below. Yeah I don't think using a pointer here is useful. When the high memory region can't fit into the PA space, it is disabled by toggling the corresponding flag (vms->highmem_{redists, ecam, mmio}) to false. It's part of the original implementation, as below. We either need a 'switch ... case' or a pointer. A pointer is more convenient since we need check and possibly update to the value. switch (i) { case VIRT_HIGH_GIC_REDIST2: vms->highmem_redists &= fits; break; case VIRT_HIGH_PCIE_ECAM: vms->highmem_ecam &= fits; break; case VIRT_HIGH_PCIE_MMIO: vms->highmem_mmio &= fits; break; } int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; - vms->memmap[i].base = region_base; - vms->memmap[i].size = region_size; + switch (i) { + case VIRT_HIGH_GIC_REDIST2: + region_enabled = >highmem_redists; + break; + case VIRT_HIGH_PCIE_ECAM: + region_enabled = >highmem_ecam; + break; + case VIRT_HIGH_PCIE_MMIO: + region_enabled = >highmem_mmio; + break; While we are at it I would change the vms fields dealing with those highmem regions and turn those fields into an array of bool indexed using i - VIRT_LOWMEMMAP_LAST (using a macro or something alike). We would not be obliged to have this switch, now duplicated. It makes sense to me. How about to have something like below in v4? static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms, int index) { bool *enabled_array[] = { >highmem_redists, >highmem_ecam, >highmem_mmio, }; assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array)); return enabled_array[index - VIRT_LOWMEMMAP_LAST]; } I was rather thinking as directly using a vms->highmem_flags[] but your proposal may work as well. Ok. I will use my proposed change in next revision. + default: + region_enabled = NULL; + } + + /* Skip unknown region */ + if (!region_enabled) { + continue; + } /* * Check each device to see if they fit in the PA space, @@ -1710,23 +1726,15 @@ static void virt_set_high_memmap(VirtMachineState *vms, * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); - if (fits) { - vms->highest_gpa = region_base + region_size - 1; - } + if (*reg
Re: [PATCH v3 5/5] hw/arm/virt: Add 'highmem-compact' property
Hi Cornelia, On 9/29/22 8:27 PM, Cornelia Huck wrote: On Thu, Sep 29 2022, Gavin Shan wrote: On 9/28/22 10:22 PM, Eric Auger wrote: On 9/22/22 01:13, Gavin Shan wrote: After the improvement to high memory region address assignment is applied, the memory layout is changed. For example, VIRT_HIGH_PCIE_MMIO s/the memory layout is changed./the memory layout is changed, introducing possible migration breakage. Ok, much clearer. memory region is enabled when the improvement is applied, but it's disabled if the improvement isn't applied. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host \ -machine virt-7.2 -m 4G,maxmem=511G \ -monitor stdio In order to keep backwords compatibility, we need to disable the optimization on machines, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'highmem-compact' property is added so that the optimization can be I would rather rename the property into compact-highmem even if the vms field is name highmem_compact to align with other highmem fields Ok, but I would love to know why. Note that we already have 'highmem=on|off'. 'highmem_compact=on|off' seems consistent to me. FWIW, I initially misread 'highmem_compact' as 'highmem_compat' (and had to re-read because I got confused). At least to me, 'compact_highmem' has less chance of being parsed incorrectly :) (although that is probably a personal thing.) Ok. 'compact-highmem' is also fine to me. I'm really bad at naming :) explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 33 + include/hw/arm/virt.h| 2 ++ 3 files changed, 39 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..f05ec2253b 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +highmem-compact + Set ``on``/``off`` to enable/disable compact space for high memory regions. + The default is ``on`` for machine types later than ``virt-7.2`` I think you should document what is compact layout versus legacy one, both in the commit msg and maybe as a comment in a code along with the comment in hw/arm/virt.c starting with 'Highmem IO Regions: ' Ok, I will add this into the commit log in v4. I don't think it's necessary to add duplicate comment in the code. People can check the commit log for details if needed. Rather explain it in this file here, maybe? I'd prefer to be able to find out what 'compact' means without digging through the commit log. Ok, lets do as Eric suggested. There are existing comments about @extended_memmap[] in hw/arm/virt.c. We need to explain the legacy/modern laoyout and 'compact-highmem' property there. Thanks, Gavin
Re: [PATCH v3 5/5] hw/arm/virt: Add 'highmem-compact' property
Hi Eric, On 9/28/22 10:22 PM, Eric Auger wrote: On 9/22/22 01:13, Gavin Shan wrote: After the improvement to high memory region address assignment is applied, the memory layout is changed. For example, VIRT_HIGH_PCIE_MMIO s/the memory layout is changed./the memory layout is changed, introducing possible migration breakage. Ok, much clearer. memory region is enabled when the improvement is applied, but it's disabled if the improvement isn't applied. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host \ -machine virt-7.2 -m 4G,maxmem=511G \ -monitor stdio In order to keep backwords compatibility, we need to disable the optimization on machines, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'highmem-compact' property is added so that the optimization can be I would rather rename the property into compact-highmem even if the vms field is name highmem_compact to align with other highmem fields Ok, but I would love to know why. Note that we already have 'highmem=on|off'. 'highmem_compact=on|off' seems consistent to me. explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 33 + include/hw/arm/virt.h| 2 ++ 3 files changed, 39 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..f05ec2253b 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +highmem-compact + Set ``on``/``off`` to enable/disable compact space for high memory regions. + The default is ``on`` for machine types later than ``virt-7.2`` I think you should document what is compact layout versus legacy one, both in the commit msg and maybe as a comment in a code along with the comment in hw/arm/virt.c starting with 'Highmem IO Regions: ' Ok, I will add this into the commit log in v4. I don't think it's necessary to add duplicate comment in the code. People can check the commit log for details if needed. + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b702f8f2b5..a4fbdaef91 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1734,6 +1734,13 @@ static void virt_set_high_memmap(VirtMachineState *vms, base = region_base + region_size; } else { *region_enabled = false; + +if (!vms->highmem_compact) { this snippet should be already present in previous patch otherwise this will break bisectability. Hmm, nice catch! I think I need to swap PATCH[4] and PATCH[5] in next revision. In that order, 'compact-highmem' is introduced in PATCH[4], but not used yet. PATCH[5] has the optimization and 'compact-highmem' is used. +base = region_base + region_size; +if (fits) { +vms->highest_gpa = region_base + region_size - 1; +} +} } } } @@ -2348,6 +2355,20 @@ static void virt_set_highmem(Object *obj, bool value, Error **errp) vms->highmem = value; } +static bool virt_get_highmem_compact(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_compact; +} + +static void virt_set_highmem_compact(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_compact = value; +} + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2966,6 +2987,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) "Set on/off to enable/disable using " "physical address space above 32 bits"); +object_class_property_add_bool(oc, "highmem-compact", + virt_get_highmem_compact, + virt_set_highmem_compact); +object_class_property_set_description(oc, "highmem-compact", + "Set on/off to enable/disable compact " + "space for high memory regions"); + object_class_property_add_str(oc, "gic-version", virt_get_gic_version, virt_set_gic_version); object_class_property_set_description(oc, "gic-version", @@ -3050,6 +3078,7 @@ static void virt_instance_init(Object *obj) /* High memory i
Re: [PATCH v3 4/5] hw/arm/virt: Improve high memory region address assignment
Hi Eric, On 9/28/22 10:51 PM, Eric Auger wrote: On 9/22/22 01:13, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. This improves the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). Signed-off-by: Gavin Shan --- hw/arm/virt.c | 44 ++-- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b0b679d1f4..b702f8f2b5 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1693,15 +1693,31 @@ static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { hwaddr region_base, region_size; -bool fits; +bool *region_enabled, fits; IDo you really need a pointer? If the region is unknown this is a bug in virt code. The pointer is needed so that we can disable the region by setting 'false' to it at later point. Yeah, I think you're correct that 'unknown region' is a bug and we need to do assert(region_enabled), or something like below. int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -vms->memmap[i].base = region_base; -vms->memmap[i].size = region_size; +switch (i) { +case VIRT_HIGH_GIC_REDIST2: +region_enabled = >highmem_redists; +break; +case VIRT_HIGH_PCIE_ECAM: +region_enabled = >highmem_ecam; +break; +case VIRT_HIGH_PCIE_MMIO: +region_enabled = >highmem_mmio; +break; While we are at it I would change the vms fields dealing with those highmem regions and turn those fields into an array of bool indexed using i - VIRT_LOWMEMMAP_LAST (using a macro or something alike). We would not be obliged to have this switch, now duplicated. It makes sense to me. How about to have something like below in v4? static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms, int index) { bool *enabled_array[] = { >highmem_redists, >highmem_ecam, >highmem_mmio, }; assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array)); return enabled_array[index - VIRT_LOWMEMMAP_LAST]; } +default: +region_enabled = NULL; +} + +/* Skip unknown region */ +if (!region_enabled) { +continue; +} /* * Check each device to see if they fit in the PA space, @@ -1710,23 +1726,15 @@ static void virt_set_high_memmap(VirtMachineState *vms, * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = region_base + region_size - 1; -} +if (*region_enabled && fits) { +vms->memmap[i].base = region_base; +vms->memmap[i].size = region_size; -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; +vms->highest_gpa = region_base + region_size - 1; +base = region_base + region_size; +} else { +*region_enabled = false; } - -base = region_base + region_size; } } Thanks, Gavin
Re: [PATCH v3 3/5] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
Hi Eric, On 9/28/22 10:10 PM, Eric Auger wrote: On 9/22/22 01:13, Gavin Shan wrote: This introduces variable 'region_base' for the base address of the specific high memory region. It's the preparatory work to optimize high memory region address assignment. Why is it a preparatory work (same comment for previous patch, ie [2/5] ). Are those changes really needed? why? In PATCH[4/5], @base argument is added to virt_set_high_memmap(), to represent current global base address. With the optimization applied in PATCH[4/5], @base isn't unconditionally updated to the top of the iterated high memory region. So we need @region_base here (PATCH[3/5]) to track the aligned base address for the iterated high memory region, which may or may be not updated to @base. Since we have @region_base in PATCH[3/5], it'd better to have @region_size in PATCH[2/5]. Actually, PATCH[1-3/5] are all preparatory patches for PATCH[4/5]. My intention was to organize the patches in a way to keep the logical change part simple enough, for easier review. Thanks, Gavin No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 187b3ee0e2..b0b679d1f4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { -hwaddr region_size; +hwaddr region_base, region_size; bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -base = ROUND_UP(base, region_size); -vms->memmap[i].base = base; +vms->memmap[i].base = region_base; vms->memmap[i].size = region_size; /* @@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + region_size) <= BIT_ULL(pa_bits); +fits = (region_base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + region_size - 1; +vms->highest_gpa = region_base + region_size - 1; } switch (i) { @@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += region_size; +base = region_base + region_size; } }
Re: [PATCH] KVM: dirty ring: Add memory barrier when marking pfn collected
On 9/23/22 7:35 AM, Peter Xu wrote: Following commit 4802bf910eee9, add the other missing barrier when marking the PFN as collected. This will also be required just like 4802bf910eee9 on weak ordering architectures like aarch64. Cc: Marc Zyngier Cc: Gavin Shan Cc: Paolo Bonzini Signed-off-by: Peter Xu --- accel/kvm/kvm-all.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) Reviewed-by: Gavin Shan diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 136c8eaed3..60a044048e 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -728,7 +728,11 @@ static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn) static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn) { -gfn->flags = KVM_DIRTY_GFN_F_RESET; +/* + * Write the flags after reading the entry. Should pair with another + * smp_load_acquire() in KVM when reset dirty rings. + */ +qatomic_store_release(>flags, KVM_DIRTY_GFN_F_RESET); } /*
[PATCH v3 3/5] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
This introduces variable 'region_base' for the base address of the specific high memory region. It's the preparatory work to optimize high memory region address assignment. No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 187b3ee0e2..b0b679d1f4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { -hwaddr region_size; +hwaddr region_base, region_size; bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -base = ROUND_UP(base, region_size); -vms->memmap[i].base = base; +vms->memmap[i].base = region_base; vms->memmap[i].size = region_size; /* @@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + region_size) <= BIT_ULL(pa_bits); +fits = (region_base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + region_size - 1; +vms->highest_gpa = region_base + region_size - 1; } switch (i) { @@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += region_size; +base = region_base + region_size; } } -- 2.23.0
[PATCH v3 5/5] hw/arm/virt: Add 'highmem-compact' property
After the improvement to high memory region address assignment is applied, the memory layout is changed. For example, VIRT_HIGH_PCIE_MMIO memory region is enabled when the improvement is applied, but it's disabled if the improvement isn't applied. pa_bits = 40; vms->highmem_redists = false; vms->highmem_ecam= false; vms->highmem_mmio= true; # qemu-system-aarch64 -accel kvm -cpu host \ -machine virt-7.2 -m 4G,maxmem=511G \ -monitor stdio In order to keep backwords compatibility, we need to disable the optimization on machines, which is virt-7.1 or ealier than it. It means the optimization is enabled by default from virt-7.2. Besides, 'highmem-compact' property is added so that the optimization can be explicitly enabled or disabled on all machine types by users. Signed-off-by: Gavin Shan --- docs/system/arm/virt.rst | 4 hw/arm/virt.c| 33 + include/hw/arm/virt.h| 2 ++ 3 files changed, 39 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 20442ea2c1..f05ec2253b 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -94,6 +94,10 @@ highmem address space above 32 bits. The default is ``on`` for machine types later than ``virt-2.12``. +highmem-compact + Set ``on``/``off`` to enable/disable compact space for high memory regions. + The default is ``on`` for machine types later than ``virt-7.2`` + gic-version Specify the version of the Generic Interrupt Controller (GIC) to provide. Valid values are: diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b702f8f2b5..a4fbdaef91 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1734,6 +1734,13 @@ static void virt_set_high_memmap(VirtMachineState *vms, base = region_base + region_size; } else { *region_enabled = false; + +if (!vms->highmem_compact) { +base = region_base + region_size; +if (fits) { +vms->highest_gpa = region_base + region_size - 1; +} +} } } } @@ -2348,6 +2355,20 @@ static void virt_set_highmem(Object *obj, bool value, Error **errp) vms->highmem = value; } +static bool virt_get_highmem_compact(Object *obj, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +return vms->highmem_compact; +} + +static void virt_set_highmem_compact(Object *obj, bool value, Error **errp) +{ +VirtMachineState *vms = VIRT_MACHINE(obj); + +vms->highmem_compact = value; +} + static bool virt_get_its(Object *obj, Error **errp) { VirtMachineState *vms = VIRT_MACHINE(obj); @@ -2966,6 +2987,13 @@ static void virt_machine_class_init(ObjectClass *oc, void *data) "Set on/off to enable/disable using " "physical address space above 32 bits"); +object_class_property_add_bool(oc, "highmem-compact", + virt_get_highmem_compact, + virt_set_highmem_compact); +object_class_property_set_description(oc, "highmem-compact", + "Set on/off to enable/disable compact " + "space for high memory regions"); + object_class_property_add_str(oc, "gic-version", virt_get_gic_version, virt_set_gic_version); object_class_property_set_description(oc, "gic-version", @@ -3050,6 +3078,7 @@ static void virt_instance_init(Object *obj) /* High memory is enabled by default */ vms->highmem = true; +vms->highmem_compact = !vmc->no_highmem_compact; vms->gic_version = VIRT_GIC_VERSION_NOSEL; vms->highmem_ecam = !vmc->no_highmem_ecam; @@ -3119,8 +3148,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(7, 2) static void virt_machine_7_1_options(MachineClass *mc) { +VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc)); + virt_machine_7_2_options(mc); compat_props_add(mc->compat_props, hw_compat_7_1, hw_compat_7_1_len); +/* Compact space for high memory regions was introduced with 7.2 */ +vmc->no_highmem_compact = true; } DEFINE_VIRT_MACHINE(7, 1) diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h index 6ec479ca2b..c7dd59d7f1 100644 --- a/include/hw/arm/virt.h +++ b/include/hw/arm/virt.h @@ -125,6 +125,7 @@ struct VirtMachineClass { bool no_pmu; bool claim_edge_triggered_timers; bool smbios_old_sys_ver; +bool no_highmem_compact; bool no_highmem_ecam; bool no_ged; /* Machines < 4.2 have no support for ACPI GED device */ bool kvm_no_adjvtime; @@ -144,6 +145,7 @@ struct VirtMachineState { PFlashCFI01 *flash[2]; bool secure; bool highmem; +bool highmem_compact; bool highmem_ecam; bool highmem_mmio; bool highmem_redists; -- 2.23.0
[PATCH v3 1/5] hw/arm/virt: Introduce virt_set_high_memmap() helper
This introduces virt_set_high_memmap() helper. The logic of high memory region address assignment is moved to the helper. The intention is to make the subsequent optimization for high memory region address assignment easier. No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 74 --- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 0961e053e5..4dab528b82 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static void virt_set_high_memmap(VirtMachineState *vms, + hwaddr base, int pa_bits) +{ +int i; + +for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +hwaddr size = extended_memmap[i].size; +bool fits; + +base = ROUND_UP(base, size); +vms->memmap[i].base = base; +vms->memmap[i].size = size; + +/* + * Check each device to see if they fit in the PA space, + * moving highest_gpa as we go. + * + * For each device that doesn't fit, disable it. + */ +fits = (base + size) <= BIT_ULL(pa_bits); +if (fits) { +vms->highest_gpa = base + size - 1; +} + +switch (i) { +case VIRT_HIGH_GIC_REDIST2: +vms->highmem_redists &= fits; +break; +case VIRT_HIGH_PCIE_ECAM: +vms->highmem_ecam &= fits; +break; +case VIRT_HIGH_PCIE_MMIO: +vms->highmem_mmio &= fits; +break; +} + +base += size; +} +} + static void virt_set_memmap(VirtMachineState *vms, int pa_bits) { MachineState *ms = MACHINE(vms); @@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) /* We know for sure that at least the memory fits in the PA space */ vms->highest_gpa = memtop - 1; -for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; - -base = ROUND_UP(base, size); -vms->memmap[i].base = base; -vms->memmap[i].size = size; - -/* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. - * - * For each device that doesn't fit, disable it. - */ -fits = (base + size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = base + size - 1; -} - -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; -} - -base += size; -} +virt_set_high_memmap(vms, base, pa_bits); if (device_memory_size > 0) { ms->device_memory = g_malloc0(sizeof(*ms->device_memory)); -- 2.23.0
[PATCH v3 4/5] hw/arm/virt: Improve high memory region address assignment
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. This improves the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). Signed-off-by: Gavin Shan --- hw/arm/virt.c | 44 ++-- 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index b0b679d1f4..b702f8f2b5 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1693,15 +1693,31 @@ static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { hwaddr region_base, region_size; -bool fits; +bool *region_enabled, fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { region_base = ROUND_UP(base, extended_memmap[i].size); region_size = extended_memmap[i].size; -vms->memmap[i].base = region_base; -vms->memmap[i].size = region_size; +switch (i) { +case VIRT_HIGH_GIC_REDIST2: +region_enabled = >highmem_redists; +break; +case VIRT_HIGH_PCIE_ECAM: +region_enabled = >highmem_ecam; +break; +case VIRT_HIGH_PCIE_MMIO: +region_enabled = >highmem_mmio; +break; +default: +region_enabled = NULL; +} + +/* Skip unknown region */ +if (!region_enabled) { +continue; +} /* * Check each device to see if they fit in the PA space, @@ -1710,23 +1726,15 @@ static void virt_set_high_memmap(VirtMachineState *vms, * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = region_base + region_size - 1; -} +if (*region_enabled && fits) { +vms->memmap[i].base = region_base; +vms->memmap[i].size = region_size; -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; +vms->highest_gpa = region_base + region_size - 1; +base = region_base + region_size; +} else { +*region_enabled = false; } - -base = region_base + region_size; } } -- 2.23.0
[PATCH v3 2/5] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()
This renames variable 'size' to 'region_size' in virt_set_high_memmap(). Its counterpart ('region_base') will be introduced in next patch. No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 4dab528b82..187b3ee0e2 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) static void virt_set_high_memmap(VirtMachineState *vms, hwaddr base, int pa_bits) { +hwaddr region_size; +bool fits; int i; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; -bool fits; +region_size = extended_memmap[i].size; -base = ROUND_UP(base, size); +base = ROUND_UP(base, region_size); vms->memmap[i].base = base; -vms->memmap[i].size = size; +vms->memmap[i].size = region_size; /* * Check each device to see if they fit in the PA space, @@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms, * * For each device that doesn't fit, disable it. */ -fits = (base + size) <= BIT_ULL(pa_bits); +fits = (base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + size - 1; +vms->highest_gpa = base + region_size - 1; } switch (i) { @@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms, break; } -base += size; +base += region_size; } } -- 2.23.0
[PATCH v3 0/5] hw/arm/virt: Improve address assignment for high memory regions
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions. PATCH[1-3] preparatory work for the improvment PATCH[4] improve high memory region address assignment PATCH[5] adds 'highmem-compact' to enable or disable the optimization History === v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/ v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Changelog == v3: * Reorder the patches(Gavin) * Add 'highmem-compact' property for backwards compatibility (Eric) v2: * Split the patches for easier review(Gavin) * Improved changelog (Marc) * Use 'bool fits' in virt_set_high_memmap() (Eric) Gavin Shan (5): hw/arm/virt: Introduce virt_set_high_memmap() helper hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap() hw/arm/virt: Introduce variable region_base in virt_set_high_memmap() hw/arm/virt: Improve high memory region address assignment hw/arm/virt: Add 'highmem-compact' property docs/system/arm/virt.rst | 4 ++ hw/arm/virt.c| 116 --- include/hw/arm/virt.h| 2 + 3 files changed, 89 insertions(+), 33 deletions(-) -- 2.23.0
Re: [PATCH] KVM: use store-release to mark dirty pages as harvested
On 9/2/22 10:19 AM, Paolo Bonzini wrote: The following scenario can happen if QEMU sets more RESET flags while the KVM_RESET_DIRTY_RINGS ioctl is ongoing on another host CPU: CPU0 CPU1 CPU2 -- fill gfn0 store-rel flags for gfn0 fill gfn1 store-rel flags for gfn1 load-acq flags for gfn0 set RESET for gfn0 load-acq flags for gfn1 set RESET for gfn1 do ioctl! ---> ioctl(RESET_RINGS) fill gfn2 store-rel flags for gfn2 load-acq flags for gfn2 set RESET for gfn2 process gfn0 process gfn1 process gfn2 do ioctl! etc. The three load-acquire in CPU0 synchronize with the three store-release in CPU2, but CPU0 and CPU1 are only synchronized up to gfn1 and CPU1 may miss gfn2's fields other than flags. The kernel must be able to cope with invalid values of the fields, and userspace *will* invoke the ioctl once more. However, once the RESET flag is cleared on gfn2, it is lost forever, therefore in the above scenario CPU1 must read the correct value of gfn2's fields. Therefore RESET must be set with a store-release, that will synchronize with KVM's load-acquire in CPU1. Cc: Gavin Shan Cc: Peter Xu Signed-off-by: Paolo Bonzini --- accel/kvm/kvm-all.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) Reviewed-by: Gavin Shan diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 136c8eaed3..7c8ce18bdd 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -728,7 +728,23 @@ static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn) static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn) { -gfn->flags = KVM_DIRTY_GFN_F_RESET; +/* + * Use a store-release so that the CPU that executes KVM_RESET_DIRTY_RINGS + * sees the full content of the ring: + * + * CPU0 CPU1 CPU2 + * -- + * fill gfn0 + * store-rel flags for gfn0 + * load-acq flags for gfn0 + * store-rel RESET for gfn0 + * ioctl(RESET_RINGS) + *load-acq flags for gfn0 + *check if flags have RESET + * + * The synchronization goes from CPU2 to CPU0 to CPU1. + */ +qatomic_store_release(>flags, KVM_DIRTY_GFN_F_RESET); } /*
Re: [PATCH v2] KVM: dirty ring: add missing memory barrier
On 8/27/22 6:22 PM, Paolo Bonzini wrote: The KVM_DIRTY_GFN_F_DIRTY flag ensures that the entry is valid. If the read of the fields are not ordered after the read of the flag, QEMU might see stale values. Cc: Peter Xu Cc: Gavin Shan Signed-off-by: Paolo Bonzini --- accel/kvm/kvm-all.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) Reviewed-by: Gavin Shan diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 8d81ab74de..136c8eaed3 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -719,7 +719,11 @@ static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t as_id, static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn) { -return gfn->flags == KVM_DIRTY_GFN_F_DIRTY; +/* + * Read the flags before the value. Pairs with barrier in + * KVM's kvm_dirty_ring_push() function. + */ +return qatomic_load_acquire(>flags) == KVM_DIRTY_GFN_F_DIRTY; } static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)
Re: [PATCH v2 0/4] hw/arm/virt: Improve address assignment for high memory regions
Hi Eric, On 8/24/22 6:06 PM, Eric Auger wrote: On 8/24/22 05:29, Gavin Shan wrote: On 8/15/22 4:29 PM, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions: PATCH[1] and PATCH[2] are cleanup and preparatory works. PATCH[3] improves address assignment for these high memory regions PATCH[4] moves the address assignment logic into standalone helper History === v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Changelog = v2: * Split the patches for easier review (Gavin) * Improved changelog (Marc) * Use 'bool fits' in virt_set_high_memmap() (Eric) You did not really convince me about migration compat wrt the high MMIO region. Aren't the PCI BARs saved/restored meaning the device driver is expecting to find data at the same GPA. But what if your high MMIO region was relocated in the dest QEMU with a possibly smaller VM IPA? Don't you have MMIO regions now allocated outside of the dest MMIO region? How does the PCI host bridge route accesses to those regions? What do I miss? [...] Sorry for the delay because I was offline last week. I was intending to explain the migration on virtio-net device and spent some time to go through the code. I found it's still complicated for an example because PCI and virtio device models are involved. So lets still use e1000e.c as an example here. There are lots of registers residing in MMIO region, including MSIx table. The MSIx table is backed by PCIDevice::msix_table, which is a buffer. The access to MSIx table is read from or write to the buffer. The corresponding handler is hw/msix.c::msix_table_mmio_ops. msix_init() is called by e1000e.c to setup the MSIx table, which is associated with memory BAR#3. As the registers in MSIx table is totally emulated by QEMU, the BAR's base address isn't a concern. struct PCIDevice { : uint8_t *msix_table; : MemoryRegion msix_table_mmio; : }; /* @table_bar is registered as memory BAR#3 in e1000e_pci_realize() */ int msix_init(struct PCIDevice *dev, unsigned short nentries, MemoryRegion *table_bar, uint8_t table_bar_nr, unsigned table_offset, MemoryRegion *pba_bar, uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos, Error **errp) { : memory_region_init_io(>msix_table_mmio, OBJECT(dev), _table_mmio_ops, dev, "msix-table", table_size); memory_region_add_subregion(table_bar, table_offset, >msix_table_mmio); : } As we concerned, the BAR's base addresses for MSIx tables are different on source and destination VMs. It's still not a problem because the registers in MSIx table are migrated, saved on source VM and restored on destination VM one by one. It's to say, not the whole buffer (PCIDevice::msix_table) is saved and restored at once. Further more, the unique ID string, instead the corresponding BAR's base address, is used to identify the MSIx table. For this particular case, the ID string is something like "e1000e_dev_id/pci-:05:00.0/msix state". With this ID string is received on the destination VM, the object and PCI device is located and the forth-coming data is saved to PCIDevice::msix_table. static const VMStateDescription e1000e_vmstate = { .name = "e1000e", .version_id = 1, .minimum_version_id = 1, .pre_save = e1000e_pre_save, .post_load = e1000e_post_load, .fields = (VMStateField[]) { VMSTATE_PCI_DEVICE(parent_obj, E1000EState), VMSTATE_MSIX(parent_obj, E1000EState), : } }; #define VMSTATE_MSIX_TEST(_field, _state, _test) { \ .name = (stringify(_field)), \ .size = sizeof(PCIDevice),
Re: [PATCH v2 0/4] hw/arm/virt: Improve address assignment for high memory regions
Hi Eric, On 8/24/22 6:06 PM, Eric Auger wrote: On 8/24/22 05:29, Gavin Shan wrote: On 8/15/22 4:29 PM, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions: PATCH[1] and PATCH[2] are cleanup and preparatory works. PATCH[3] improves address assignment for these high memory regions PATCH[4] moves the address assignment logic into standalone helper History === v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Changelog = v2: * Split the patches for easier review (Gavin) * Improved changelog (Marc) * Use 'bool fits' in virt_set_high_memmap() (Eric) You did not really convince me about migration compat wrt the high MMIO region. Aren't the PCI BARs saved/restored meaning the device driver is expecting to find data at the same GPA. But what if your high MMIO region was relocated in the dest QEMU with a possibly smaller VM IPA? Don't you have MMIO regions now allocated outside of the dest MMIO region? How does the PCI host bridge route accesses to those regions? What do I miss? I'm currently looking into virtio-pci-net migration, but need time to investigate how the device is migrated. I will get back to you once I have something. Thanks for your comments :) Thanks, Gavin
Re: [PATCH v2 0/4] hw/arm/virt: Improve address assignment for high memory regions
Hi Marc, On 8/15/22 4:29 PM, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. The series intends to improve the address assignment for these high memory regions: PATCH[1] and PATCH[2] are cleanup and preparatory works. PATCH[3] improves address assignment for these high memory regions PATCH[4] moves the address assignment logic into standalone helper History === v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html Changelog = v2: * Split the patches for easier review(Gavin) * Improved changelog (Marc) * Use 'bool fits' in virt_set_high_memmap() (Eric) Could you help to review when you have free cycles? It's just a kindly ping :) Thanks, Gavin Gavin Shan (4): hw/arm/virt: Rename variable size to region_size in virt_set_memmap() hw/arm/virt: Introduce variable region_base in virt_set_memmap() hw/arm/virt: Improve address assignment for high memory regions virt/hw/virt: Add virt_set_high_memmap() helper hw/arm/virt.c | 84 ++- 1 file changed, 50 insertions(+), 34 deletions(-)
[PATCH v2 3/4] hw/arm/virt: Improve address assignment for high memory regions
There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is disabled by developer by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_memmap() isn't comprehensive because the space for one specific high memory region is always reserved from the PA space for case (1), (2) and (3). In the code, 'base' and 'vms->highest_gpa' are always increased for those three cases. It's unnecessary since the assigned space of the disabled high memory region won't be used afterwards. This improves the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). Signed-off-by: Gavin Shan --- hw/arm/virt.c | 42 +- 1 file changed, 25 insertions(+), 17 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 582a8960fc..e38b6919c9 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1746,10 +1746,26 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { hwaddr region_base = ROUND_UP(base, extended_memmap[i].size); hwaddr region_size = extended_memmap[i].size; -bool fits; +bool *region_enabled, fits; -vms->memmap[i].base = region_base; -vms->memmap[i].size = region_size; +switch (i) { +case VIRT_HIGH_GIC_REDIST2: +region_enabled = >highmem_redists; +break; +case VIRT_HIGH_PCIE_ECAM: +region_enabled = >highmem_ecam; +break; +case VIRT_HIGH_PCIE_MMIO: +region_enabled = >highmem_mmio; +break; +default: +region_enabled = NULL; +} + +/* Skip unknwon or disabled regions */ +if (!region_enabled || !*region_enabled) { +continue; +} /* * Check each device to see if they fit in the PA space, @@ -1759,22 +1775,14 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = region_base + region_size - 1; -} +vms->memmap[i].base = region_base; +vms->memmap[i].size = region_size; -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -vms->highmem_redists &= fits; -break; -case VIRT_HIGH_PCIE_ECAM: -vms->highmem_ecam &= fits; -break; -case VIRT_HIGH_PCIE_MMIO: -vms->highmem_mmio &= fits; -break; +base = region_base + region_size; +vms->highest_gpa = region_base + region_size - 1; +} else { +*region_enabled = false; } - -base = region_base + region_size; } if (device_memory_size > 0) { -- 2.23.0
[PATCH v2 4/4] virt/hw/virt: Add virt_set_high_memmap() helper
The logic to assign high memory region's address in virt_set_memmap() is independent. Lets move the logic to virt_set_high_memmap() helper. "each device" is replaced by "each region" in the comments. No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 92 --- 1 file changed, 50 insertions(+), 42 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index e38b6919c9..4dde08a924 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1688,6 +1688,55 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static void virt_set_high_memmap(VirtMachineState *vms, + hwaddr base, int pa_bits) +{ +hwaddr region_base, region_size; +bool *region_enabled, fits; +int i; + +for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +region_base = ROUND_UP(base, extended_memmap[i].size); +region_size = extended_memmap[i].size; + +switch (i) { +case VIRT_HIGH_GIC_REDIST2: +region_enabled = >highmem_redists; +break; +case VIRT_HIGH_PCIE_ECAM: +region_enabled = >highmem_ecam; +break; +case VIRT_HIGH_PCIE_MMIO: +region_enabled = >highmem_mmio; +break; +default: +region_enabled = NULL; +} + +/* Skip unknwon or disabled regions */ +if (!region_enabled || !*region_enabled) { +continue; +} + +/* + * Check each region to see if they fit in the PA space, + * moving highest_gpa as we go. + * + * For each device that doesn't fit, disable it. + */ +fits = (region_base + region_size) <= BIT_ULL(pa_bits); +if (fits) { +vms->memmap[i].base = region_base; +vms->memmap[i].size = region_size; + +base = region_base + region_size; +vms->highest_gpa = region_base + region_size - 1; +} else { +*region_enabled = false; +} +} +} + static void virt_set_memmap(VirtMachineState *vms, int pa_bits) { MachineState *ms = MACHINE(vms); @@ -1742,48 +1791,7 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) /* We know for sure that at least the memory fits in the PA space */ vms->highest_gpa = memtop - 1; - -for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr region_base = ROUND_UP(base, extended_memmap[i].size); -hwaddr region_size = extended_memmap[i].size; -bool *region_enabled, fits; - -switch (i) { -case VIRT_HIGH_GIC_REDIST2: -region_enabled = >highmem_redists; -break; -case VIRT_HIGH_PCIE_ECAM: -region_enabled = >highmem_ecam; -break; -case VIRT_HIGH_PCIE_MMIO: -region_enabled = >highmem_mmio; -break; -default: -region_enabled = NULL; -} - -/* Skip unknwon or disabled regions */ -if (!region_enabled || !*region_enabled) { -continue; -} - -/* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. - * - * For each device that doesn't fit, disable it. - */ -fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { -vms->memmap[i].base = region_base; -vms->memmap[i].size = region_size; - -base = region_base + region_size; -vms->highest_gpa = region_base + region_size - 1; -} else { -*region_enabled = false; -} -} +virt_set_high_memmap(vms, base, pa_bits); if (device_memory_size > 0) { ms->device_memory = g_malloc0(sizeof(*ms->device_memory)); -- 2.23.0
[PATCH v2 1/4] hw/arm/virt: Rename variable size to region_size in virt_set_memmap()
This renames variable 'size' to 'region_size' in virt_set_memmap(). It's counterpart to 'region_base', which will be introducded in next patch. No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 9633f822f3..f8e9f3e205 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1744,12 +1744,12 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) vms->highest_gpa = memtop - 1; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { -hwaddr size = extended_memmap[i].size; +hwaddr region_size = extended_memmap[i].size; bool fits; -base = ROUND_UP(base, size); +base = ROUND_UP(base, region_size); vms->memmap[i].base = base; -vms->memmap[i].size = size; +vms->memmap[i].size = region_size; /* * Check each device to see if they fit in the PA space, @@ -1757,9 +1757,9 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) * * For each device that doesn't fit, disable it. */ -fits = (base + size) <= BIT_ULL(pa_bits); +fits = (base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + size - 1; +vms->highest_gpa = base + region_size - 1; } switch (i) { @@ -1774,7 +1774,7 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) break; } -base += size; +base += region_size; } if (device_memory_size > 0) { -- 2.23.0
[PATCH v2 2/4] hw/arm/virt: Introduce variable region_base in virt_set_memmap()
This introduces variable 'region_base' for the base address of the specific high memory region. It's the preparatory to improve the address assignment for high memory region in next patch. No functional change intended. Signed-off-by: Gavin Shan --- hw/arm/virt.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index f8e9f3e205..582a8960fc 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1744,11 +1744,11 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) vms->highest_gpa = memtop - 1; for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) { +hwaddr region_base = ROUND_UP(base, extended_memmap[i].size); hwaddr region_size = extended_memmap[i].size; bool fits; -base = ROUND_UP(base, region_size); -vms->memmap[i].base = base; +vms->memmap[i].base = region_base; vms->memmap[i].size = region_size; /* @@ -1757,9 +1757,9 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) * * For each device that doesn't fit, disable it. */ -fits = (base + region_size) <= BIT_ULL(pa_bits); +fits = (region_base + region_size) <= BIT_ULL(pa_bits); if (fits) { -vms->highest_gpa = base + region_size - 1; +vms->highest_gpa = region_base + region_size - 1; } switch (i) { @@ -1774,7 +1774,7 @@ static void virt_set_memmap(VirtMachineState *vms, int pa_bits) break; } -base += region_size; +base = region_base + region_size; } if (device_memory_size > 0) { -- 2.23.0