Re: [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring

2023-02-21 Thread Gavin Shan

On 2/22/23 3:27 AM, Peter Maydell wrote:

On Mon, 13 Feb 2023 at 00:40, Gavin Shan  wrote:


When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to
enable the backup bitmap for the dirty ring. Otherwise, the migration
will fail because those two devices are using the backup bitmap to track
dirty guest memory, corresponding to various hardware tables.

Signed-off-by: Gavin Shan 
Reviewed-by: Juan Quintela 
---
  hw/arm/virt.c| 26 ++
  target/arm/kvm64.c   | 25 +
  target/arm/kvm_arm.h | 12 
  3 files changed, 63 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 75f28947de..ea9bd98a65 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2024,6 +2024,8 @@ static void machvirt_init(MachineState *machine)
  VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
  MachineClass *mc = MACHINE_GET_CLASS(machine);
  const CPUArchIdList *possible_cpus;
+const char *gictype = NULL;
+const char *itsclass = NULL;
  MemoryRegion *sysmem = get_system_memory();
  MemoryRegion *secure_sysmem = NULL;
  MemoryRegion *tag_sysmem = NULL;
@@ -2071,6 +2073,30 @@ static void machvirt_init(MachineState *machine)
   */
  finalize_gic_version(vms);

+/*
+ * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty
+ * bitmap has to be enabled for KVM dirty ring, before any memory
+ * slot is added. Otherwise, the migration will fail with the dirty
+ * ring.
+ */
+if (kvm_enabled()) {
+if (vms->gic_version != VIRT_GIC_VERSION_2) {
+gictype = gicv3_class_name();
+}
+
+if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) {
+itsclass = its_class_name();
+}
+
+if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) ||
+ (itsclass && !strcmp(itsclass, "arm-its-kvm"))) &&
+!kvm_arm_enable_dirty_ring_with_bitmap()) {
+error_report("Failed to enable the backup bitmap for "
+ "KVM dirty ring");
+exit(1);
+}
+}


Why does this need to be board-specific code? Is there
some way we can just do the right thing automatically?
Why does the GIC/ITS matter?

The kernel should already know whether we have asked it
to do something that needs this extra extension, so
I think we ought to be able in the generic "enable the
dirty ring" code say "if the kernel says we need this
extra thing, also enable this extra thing". Or if that's
too early, we can do the extra part in a generic hook a
bit later.

In the future there might be other things, presumably,
that need the backup bitmap, so it would be more future
proof not to need to also change QEMU to add extra
logic checks that duplicate the logic the kernel already has.



When the dirty ring is enabled, a per-vcpu buffer is used to track the dirty 
pages.
The prerequisite to use the per-vcpu buffer is existing running VCPU context. 
There
are two cases where no running VCPU context exists and the backup bitmap 
extension
is needed, as we know until now: (a) save/restore GICv3 tables; (b) 
save/restore ITS
tables; These two cases are related to KVM device "kvm-arm-gicv3" and 
"arm-its-kvm",
which are only needed by virt machine at present. So we needn't the backup 
bitmap
extension for other boards.

The host kernel always exports the capability KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
for ARM64. The capability isn't exported for x86 because we needn't it there. 
The
generic path to enable the extension would be in kvm_init(). I think the 
extension
is enabled unconditionally if it has been exported by host kernel. It means 
there
will be unnecessary overhead to synchronize the backup bitmap when the 
aforementioned
KVM devices aren't used, but the overhead should be very small and acceptable. 
The
only concern is host kernel has to allocate the backup bitmap, which is totally 
no
use. Please let me know your thoughts, Peter.


qemu_init
  qemu_create_machine
:
:
  configure_accelerators
do_configure_accelerator
  accel_init_machine
kvm_init// Where the dirty ring is eanbled. 
Would be best
  kvm_arch_init // place to enable the backup 
bitmap extension regardless
:   // of 'kvm-arm-gicv3' and 
'arm-its-kvm' are used
:
  qmp_x_exit_preconfig
qemu_init_board
  machine_run_board_init
machvirt_init  // memory slots are added here and 
where we know the KVM devices
:  // are used
:
  accel_setup_post // no backend for KVM yet, xen only

Note that the capability KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled if
the dirty ring isn't enabled or memory slots have been added.

Thanks,
Gavin




Re: [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage

2023-02-21 Thread Gavin Shan

On 2/22/23 4:46 AM, Peter Xu wrote:

On Mon, Feb 13, 2023 at 08:39:22AM +0800, Gavin Shan wrote:

In the last stage of live migration or memory slot removal, the
backup bitmap needs to be synchronized when it has been enabled.

Signed-off-by: Gavin Shan 
---
  accel/kvm/kvm-all.c  | 11 +++
  include/sysemu/kvm_int.h |  1 +
  2 files changed, 12 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 01a6a026af..b5e12de522 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1352,6 +1352,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
   */
  if (kvm_state->kvm_dirty_ring_size) {
  kvm_dirty_ring_reap_locked(kvm_state, NULL);
+if (kvm_state->kvm_dirty_ring_with_bitmap) {
+kvm_slot_sync_dirty_pages(mem);
+kvm_slot_get_dirty_log(kvm_state, mem);
+}
  } else {
  kvm_slot_get_dirty_log(kvm_state, mem);
  }


IIUC after the memory atomic update changes lands QEMU, we may not need
this sync at all.

My understanding is that we sync dirty log here only because of non-atomic
updates happening in the past and we may lose dirty bits unexpectedly.
Maybe Paolo knows.

But that needs some more justification and history digging, so definitely
more suitable to leave it for later and separate discussion.

Reviewed-by: Peter Xu 



Peter, could you please give some hints for me to understand the atomic
and non-atomic update here? Ok, I will drop this part of changes in next
revision with the assumption that we have atomic update supported for
ARM64.

Thanks,
Gavin




@@ -1573,6 +1577,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool 
last_stage)
  mem = >slots[i];
  if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
  kvm_slot_sync_dirty_pages(mem);
+
+if (s->kvm_dirty_ring_with_bitmap && last_stage &&
+kvm_slot_get_dirty_log(s, mem)) {
+kvm_slot_sync_dirty_pages(mem);
+}
+
  /*
   * This is not needed by KVM_GET_DIRTY_LOG because the
   * ioctl will unconditionally overwrite the whole region.
@@ -3701,6 +3711,7 @@ static void kvm_accel_instance_init(Object *obj)
  s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
  /* KVM dirty ring is by default off */
  s->kvm_dirty_ring_size = 0;
+s->kvm_dirty_ring_with_bitmap = false;
  s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
  s->notify_window = 0;
  }
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 60b520a13e..fdd5b1bde0 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -115,6 +115,7 @@ struct KVMState
  } *as;
  uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
  uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
+bool kvm_dirty_ring_with_bitmap;
  struct KVMDirtyRingReaper reaper;
  NotifyVmexitOption notify_vmexit;
  uint32_t notify_window;
--
2.23.0








Re: [PATCH] hw/arm/virt: Prevent CPUs in one socket to span mutiple NUMA nodes

2023-02-21 Thread Gavin Shan

On 2/22/23 10:31 AM, Philippe Mathieu-Daudé wrote:

On 22/2/23 00:12, Gavin Shan wrote:

On 2/21/23 9:21 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 10:21, Gavin Shan wrote:

On 2/21/23 8:15 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 09:53, Gavin Shan wrote:

Linux kernel guest reports warning when two CPUs in one socket have
been associated with different NUMA nodes, using the following command
lines.

   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
   -numa node,nodeid=0,cpus=0-1,memdev=ram0    \
   -numa node,nodeid=1,cpus=2-3,memdev=ram1    \
   -numa node,nodeid=2,cpus=4-5,memdev=ram2    \

   [ cut here ]
   WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 
build_sched_domains+0x284/0x910
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
   pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
   pc : build_sched_domains+0x284/0x910
   lr : build_sched_domains+0x184/0x910
   sp : 8804bd50
   x29: 8804bd50 x28: 0002 x27: 
   x26: 89cf9a80 x25:  x24: 89cbf840
   x23: 80325000 x22: 005df800 x21: 8a4ce508
   x20:  x19: 80324440 x18: 0014
   x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
   x14: 01c0 x13: 0001 x12: 7fffb1a0
   x11: 7fffb180 x10: 8a4ce508 x9 : 0041
   x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
   x5 : 0001 x4 : 0007 x3 : 0002
   x2 : 1000 x1 : 8a4cf928 x0 : 0001
   Call trace:
    build_sched_domains+0x284/0x910
    sched_init_domains+0xac/0xe0
    sched_init_smp+0x48/0xc8
    kernel_init_freeable+0x140/0x1ac
    kernel_init+0x28/0x140
    ret_from_fork+0x10/0x20

Fix it by preventing mutiple CPUs in one socket to be associated with
different NUMA nodes.

Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 37 +
  1 file changed, 37 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ac626b3bef..e0af267c77 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu)
  return false;
  }
+static bool numa_state_valid(MachineState *ms)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NumaState *state = ms->numa_state;
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+    const CPUArchId *cpus = possible_cpus->cpus;
+    int len = possible_cpus->len, i, j;
+
+    if (!state || state->num_nodes <= 1 || len <= 1) {
+    return true;
+    }
+
+    for (i = 0; i < len; i++) {
+    for (j = i + 1; j < len; j++) {
+    if (cpus[i].props.has_socket_id &&
+    cpus[i].props.has_node_id &&
+    cpus[j].props.has_socket_id &&
+    cpus[j].props.has_node_id &&
+    cpus[i].props.socket_id == cpus[j].props.socket_id &&
+    cpus[i].props.node_id != cpus[j].props.node_id) {
+    error_report("CPU-%d and CPU-%d in socket-%ld have been "
+ "associated with node-%ld and node-%ld",
+ i, j, cpus[i].props.socket_id,
+ cpus[i].props.node_id,
+ cpus[j].props.node_id);
+    return false;
+    }
+    }
+    }
+
+    return true;
+}
+
  static void create_randomness(MachineState *ms, const char *node)
  {
  struct {
@@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine)
  exit(1);
  }
+    if (!numa_state_valid(machine)) {
+    exit(1);
+    }


Why restrict to the virt machine?



We tried x86 machines and virt machine, but the issue isn't reproducible on x86 
machines.
So I think it's machine or architecture specific issue. However, I believe 
RiscV should
have similar issue because linux/drivers/base/arch_topology.c is shared by 
ARM64 and RiscV.
x86 doesn't use the driver to populate its CPU topology.


Oh, I haven't thought about the other archs, I meant this seem a generic
issue which affects all (ARM) machines, so why restrict to the (ARM)
virt machine?



[Ccing Igor for comments]

Well, virt machine is the only concern to us for now. You're right that all 
ARM64 and ARM machines
need this check and limitation. So the check needs to be done in the generic 
path. The best way
I can figure out is like something below. The idea is to introduce a switch to 
'struct NumaState'
and do the check in the generic path. The switch is turned on by individual 
machines. Please let me
know if you have better ideas


Can't this be done generically in machine_numa_finish_cpu_init()
->

Re: [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization

2023-02-21 Thread Gavin Shan

On 2/22/23 4:36 AM, Peter Xu wrote:

On Mon, Feb 13, 2023 at 08:39:21AM +0800, Gavin Shan wrote:

The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap needs to
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callbacks. This last stage indicator isn't used until the dirty
ring is enabled in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 


Reviewed-by: Peter Xu 

One trivial thing to mention below.


---
  accel/kvm/kvm-all.c   |  2 +-
  include/exec/memory.h |  5 +++--
  migration/dirtyrate.c |  4 ++--
  migration/ram.c   | 20 ++--
  softmmu/memory.c  | 10 +-
  5 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9b26582655..01a6a026af 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener,
  kvm_slots_unlock();
  }
  
-static void kvm_log_sync_global(MemoryListener *l)

+static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
  {
  KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
  KVMState *s = kvm_state;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..75b2fd9f48 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -929,8 +929,9 @@ struct MemoryListener {
   * its @log_sync must be NULL.  Vice versa.
   *
   * @listener: The #MemoryListener.
+ * @last_stage: The last stage to synchronize the log during migration


IMHO it may be important to mention the vcpu status here that the caller
guarantees to call the last_stage==true only once, only after all vcpus are
stopped (and vcpus will not be started again if migration succeeded).



Yes, I will update the comments in next revision accordingly.


   */
-void (*log_sync_global)(MemoryListener *listener);
+void (*log_sync_global)(MemoryListener *listener, bool last_stage);


Thanks,
Gavin




Re: [PATCH v1 1/6] linux-headers: Update for dirty ring

2023-02-21 Thread Gavin Shan

On 2/22/23 3:30 AM, Peter Maydell wrote:

On Mon, 13 Feb 2023 at 00:39, Gavin Shan  wrote:


Signed-off-by: Gavin Shan 
---
  linux-headers/asm-arm64/kvm.h | 1 +
  linux-headers/linux/kvm.h | 2 ++
  2 files changed, 3 insertions(+)


For this to be a non-RFC patch, this needs to be a proper
sync of the headers against an upstream kernel tree.
(By-hand tweaks are fine for purposes of working on
and getting patchsets reviewed.)



Yes, I vaguely remember there is script to synchronize linux header files,
which is './scripts/update-linux-headers.sh'. I think I need to run the
following command to update?

  # ./scripts/update-linux-headers.sh  

Thanks,
Gavin




Re: [PATCH] hw/arm/virt: Prevent CPUs in one socket to span mutiple NUMA nodes

2023-02-21 Thread Gavin Shan

On 2/21/23 9:21 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 10:21, Gavin Shan wrote:

On 2/21/23 8:15 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 09:53, Gavin Shan wrote:

Linux kernel guest reports warning when two CPUs in one socket have
been associated with different NUMA nodes, using the following command
lines.

   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
   -numa node,nodeid=0,cpus=0-1,memdev=ram0    \
   -numa node,nodeid=1,cpus=2-3,memdev=ram1    \
   -numa node,nodeid=2,cpus=4-5,memdev=ram2    \

   [ cut here ]
   WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 
build_sched_domains+0x284/0x910
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
   pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
   pc : build_sched_domains+0x284/0x910
   lr : build_sched_domains+0x184/0x910
   sp : 8804bd50
   x29: 8804bd50 x28: 0002 x27: 
   x26: 89cf9a80 x25:  x24: 89cbf840
   x23: 80325000 x22: 005df800 x21: 8a4ce508
   x20:  x19: 80324440 x18: 0014
   x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
   x14: 01c0 x13: 0001 x12: 7fffb1a0
   x11: 7fffb180 x10: 8a4ce508 x9 : 0041
   x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
   x5 : 0001 x4 : 0007 x3 : 0002
   x2 : 1000 x1 : 8a4cf928 x0 : 0001
   Call trace:
    build_sched_domains+0x284/0x910
    sched_init_domains+0xac/0xe0
    sched_init_smp+0x48/0xc8
    kernel_init_freeable+0x140/0x1ac
    kernel_init+0x28/0x140
    ret_from_fork+0x10/0x20

Fix it by preventing mutiple CPUs in one socket to be associated with
different NUMA nodes.

Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 37 +
  1 file changed, 37 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ac626b3bef..e0af267c77 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu)
  return false;
  }
+static bool numa_state_valid(MachineState *ms)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NumaState *state = ms->numa_state;
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+    const CPUArchId *cpus = possible_cpus->cpus;
+    int len = possible_cpus->len, i, j;
+
+    if (!state || state->num_nodes <= 1 || len <= 1) {
+    return true;
+    }
+
+    for (i = 0; i < len; i++) {
+    for (j = i + 1; j < len; j++) {
+    if (cpus[i].props.has_socket_id &&
+    cpus[i].props.has_node_id &&
+    cpus[j].props.has_socket_id &&
+    cpus[j].props.has_node_id &&
+    cpus[i].props.socket_id == cpus[j].props.socket_id &&
+    cpus[i].props.node_id != cpus[j].props.node_id) {
+    error_report("CPU-%d and CPU-%d in socket-%ld have been "
+ "associated with node-%ld and node-%ld",
+ i, j, cpus[i].props.socket_id,
+ cpus[i].props.node_id,
+ cpus[j].props.node_id);
+    return false;
+    }
+    }
+    }
+
+    return true;
+}
+
  static void create_randomness(MachineState *ms, const char *node)
  {
  struct {
@@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine)
  exit(1);
  }
+    if (!numa_state_valid(machine)) {
+    exit(1);
+    }


Why restrict to the virt machine?



We tried x86 machines and virt machine, but the issue isn't reproducible on x86 
machines.
So I think it's machine or architecture specific issue. However, I believe 
RiscV should
have similar issue because linux/drivers/base/arch_topology.c is shared by 
ARM64 and RiscV.
x86 doesn't use the driver to populate its CPU topology.


Oh, I haven't thought about the other archs, I meant this seem a generic
issue which affects all (ARM) machines, so why restrict to the (ARM)
virt machine?



[Ccing Igor for comments]

Well, virt machine is the only concern to us for now. You're right that all 
ARM64 and ARM machines
need this check and limitation. So the check needs to be done in the generic 
path. The best way
I can figure out is like something below. The idea is to introduce a switch to 
'struct NumaState'
and do the check in the generic path. The switch is turned on by individual 
machines. Please let me
know if you have better ideas

- Add 'bool struct NumaState::has_strict_socket_mapping', which is 'false' by 
default until
  machine specific initialization function calls helpe

Re: [PATCH] kvm: dirty-ring: Fix race with vcpu creation

2023-02-21 Thread Gavin Shan

On 2/6/23 10:00 AM, Peter Xu wrote:

On Tue, Sep 27, 2022 at 11:46:53AM -0400, Peter Xu wrote:

It's possible that we want to reap a dirty ring on a vcpu that is during
creation, because the vcpu is put onto list (CPU_FOREACH visible) before
initialization of the structures.  In this case:

qemu_init_vcpu
 x86_cpu_realizefn
 cpu_exec_realizefn
 cpu_list_add  < can be probed by CPU_FOREACH
 qemu_init_vcpu
 cpus_accel->create_vcpu_thread(cpu);
 kvm_init_vcpu
 map kvm_dirty_gfns  <--- kvm_dirty_gfns valid

Don't try to reap dirty ring on vcpus during creation or it'll crash.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2124756
Reported-by: Xiaohui Li 
Signed-off-by: Peter Xu 
---
  accel/kvm/kvm-all.c | 9 +
  1 file changed, 9 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 5acab1767f..df5fabd3a8 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -757,6 +757,15 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, 
CPUState *cpu)
  uint32_t ring_size = s->kvm_dirty_ring_size;
  uint32_t count = 0, fetch = cpu->kvm_fetch_index;
  
+/*

+ * It's possible that we race with vcpu creation code where the vcpu is
+ * put onto the vcpus list but not yet initialized the dirty ring
+ * structures.  If so, skip it.
+ */
+if (!cpu->created) {
+return 0;
+}
+
  assert(dirty_gfns && ring_size);
  trace_kvm_dirty_ring_reap_vcpu(cpu->cpu_index);
  


Reviewed-by: Gavin Shan 




Re: [PATCH] hw/arm/virt: Prevent CPUs in one socket to span mutiple NUMA nodes

2023-02-21 Thread Gavin Shan

On 2/21/23 8:15 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 09:53, Gavin Shan wrote:

Linux kernel guest reports warning when two CPUs in one socket have
been associated with different NUMA nodes, using the following command
lines.

   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
   -numa node,nodeid=0,cpus=0-1,memdev=ram0    \
   -numa node,nodeid=1,cpus=2-3,memdev=ram1    \
   -numa node,nodeid=2,cpus=4-5,memdev=ram2    \

   [ cut here ]
   WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 
build_sched_domains+0x284/0x910
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
   pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
   pc : build_sched_domains+0x284/0x910
   lr : build_sched_domains+0x184/0x910
   sp : 8804bd50
   x29: 8804bd50 x28: 0002 x27: 
   x26: 89cf9a80 x25:  x24: 89cbf840
   x23: 80325000 x22: 005df800 x21: 8a4ce508
   x20:  x19: 80324440 x18: 0014
   x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
   x14: 01c0 x13: 0001 x12: 7fffb1a0
   x11: 7fffb180 x10: 8a4ce508 x9 : 0041
   x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
   x5 : 0001 x4 : 0007 x3 : 0002
   x2 : 1000 x1 : 8a4cf928 x0 : 0001
   Call trace:
    build_sched_domains+0x284/0x910
    sched_init_domains+0xac/0xe0
    sched_init_smp+0x48/0xc8
    kernel_init_freeable+0x140/0x1ac
    kernel_init+0x28/0x140
    ret_from_fork+0x10/0x20

Fix it by preventing mutiple CPUs in one socket to be associated with
different NUMA nodes.

Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 37 +
  1 file changed, 37 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ac626b3bef..e0af267c77 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu)
  return false;
  }
+static bool numa_state_valid(MachineState *ms)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NumaState *state = ms->numa_state;
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+    const CPUArchId *cpus = possible_cpus->cpus;
+    int len = possible_cpus->len, i, j;
+
+    if (!state || state->num_nodes <= 1 || len <= 1) {
+    return true;
+    }
+
+    for (i = 0; i < len; i++) {
+    for (j = i + 1; j < len; j++) {
+    if (cpus[i].props.has_socket_id &&
+    cpus[i].props.has_node_id &&
+    cpus[j].props.has_socket_id &&
+    cpus[j].props.has_node_id &&
+    cpus[i].props.socket_id == cpus[j].props.socket_id &&
+    cpus[i].props.node_id != cpus[j].props.node_id) {
+    error_report("CPU-%d and CPU-%d in socket-%ld have been "
+ "associated with node-%ld and node-%ld",
+ i, j, cpus[i].props.socket_id,
+ cpus[i].props.node_id,
+ cpus[j].props.node_id);
+    return false;
+    }
+    }
+    }
+
+    return true;
+}
+
  static void create_randomness(MachineState *ms, const char *node)
  {
  struct {
@@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine)
  exit(1);
  }
+    if (!numa_state_valid(machine)) {
+    exit(1);
+    }


Why restrict to the virt machine?



We tried x86 machines and virt machine, but the issue isn't reproducible on x86 
machines.
So I think it's machine or architecture specific issue. However, I believe 
RiscV should
have similar issue because linux/drivers/base/arch_topology.c is shared by 
ARM64 and RiscV.
x86 doesn't use the driver to populate its CPU topology.

Thanks,
Gavin




[PATCH] hw/arm/virt: Prevent CPUs in one socket to span mutiple NUMA nodes

2023-02-21 Thread Gavin Shan
Linux kernel guest reports warning when two CPUs in one socket have
been associated with different NUMA nodes, using the following command
lines.

  -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
  -numa node,nodeid=0,cpus=0-1,memdev=ram0\
  -numa node,nodeid=1,cpus=2-3,memdev=ram1\
  -numa node,nodeid=2,cpus=4-5,memdev=ram2\

  [ cut here ]
  WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 
build_sched_domains+0x284/0x910
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
  pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : build_sched_domains+0x284/0x910
  lr : build_sched_domains+0x184/0x910
  sp : 8804bd50
  x29: 8804bd50 x28: 0002 x27: 
  x26: 89cf9a80 x25:  x24: 89cbf840
  x23: 80325000 x22: 005df800 x21: 8a4ce508
  x20:  x19: 80324440 x18: 0014
  x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
  x14: 01c0 x13: 0001 x12: 7fffb1a0
  x11: 7fffb180 x10: 8a4ce508 x9 : 0041
  x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
  x5 : 0001 x4 : 0007 x3 : 0002
  x2 : 1000 x1 : 8a4cf928 x0 : 0001
  Call trace:
   build_sched_domains+0x284/0x910
   sched_init_domains+0xac/0xe0
   sched_init_smp+0x48/0xc8
   kernel_init_freeable+0x140/0x1ac
   kernel_init+0x28/0x140
   ret_from_fork+0x10/0x20

Fix it by preventing mutiple CPUs in one socket to be associated with
different NUMA nodes.

Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ac626b3bef..e0af267c77 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu)
 return false;
 }
 
+static bool numa_state_valid(MachineState *ms)
+{
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+NumaState *state = ms->numa_state;
+const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+const CPUArchId *cpus = possible_cpus->cpus;
+int len = possible_cpus->len, i, j;
+
+if (!state || state->num_nodes <= 1 || len <= 1) {
+return true;
+}
+
+for (i = 0; i < len; i++) {
+for (j = i + 1; j < len; j++) {
+if (cpus[i].props.has_socket_id &&
+cpus[i].props.has_node_id &&
+cpus[j].props.has_socket_id &&
+cpus[j].props.has_node_id &&
+cpus[i].props.socket_id == cpus[j].props.socket_id &&
+cpus[i].props.node_id != cpus[j].props.node_id) {
+error_report("CPU-%d and CPU-%d in socket-%ld have been "
+ "associated with node-%ld and node-%ld",
+ i, j, cpus[i].props.socket_id,
+ cpus[i].props.node_id,
+ cpus[j].props.node_id);
+return false;
+}
+}
+}
+
+return true;
+}
+
 static void create_randomness(MachineState *ms, const char *node)
 {
 struct {
@@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine)
 exit(1);
 }
 
+if (!numa_state_valid(machine)) {
+exit(1);
+}
+
 possible_cpus = mc->possible_cpu_arch_ids(machine);
 
 /*
-- 
2.23.0




[PATCH v1 4/6] kvm: Add helper kvm_dirty_ring_init()

2023-02-12 Thread Gavin Shan
Due to multiple capabilities associated with the dirty ring for different
architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and
arm64 separately. There will be more to be done in order to support the
dirty ring for arm64.

Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this,
the code looks a bit clean.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c | 76 -
 1 file changed, 47 insertions(+), 29 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index b5e12de522..e5035026c9 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1453,6 +1453,50 @@ static int kvm_dirty_ring_reaper_init(KVMState *s)
 return 0;
 }
 
+static int kvm_dirty_ring_init(KVMState *s)
+{
+uint32_t ring_size = s->kvm_dirty_ring_size;
+uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn);
+int ret;
+
+s->kvm_dirty_ring_size = 0;
+s->kvm_dirty_ring_bytes = 0;
+
+/* Bail if the dirty ring size isn't specified */
+if (!ring_size) {
+return 0;
+}
+
+/*
+ * Read the max supported pages. Fall back to dirty logging mode
+ * if the dirty ring isn't supported.
+ */
+ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+if (ret <= 0) {
+warn_report("KVM dirty ring not available, using bitmap method");
+return 0;
+}
+
+if (ring_bytes > ret) {
+error_report("KVM dirty ring size %" PRIu32 " too big "
+ "(maximum is %ld).  Please use a smaller value.",
+ ring_size, (long)ret / sizeof(struct kvm_dirty_gfn));
+return -EINVAL;
+}
+
+ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+if (ret) {
+error_report("Enabling of KVM dirty ring failed: %s. "
+ "Suggested minimum value is 1024.", strerror(-ret));
+return -EIO;
+}
+
+s->kvm_dirty_ring_size = ring_size;
+s->kvm_dirty_ring_bytes = ring_bytes;
+
+return 0;
+}
+
 static void kvm_region_add(MemoryListener *listener,
MemoryRegionSection *section)
 {
@@ -2522,35 +2566,9 @@ static int kvm_init(MachineState *ms)
  * Enable KVM dirty ring if supported, otherwise fall back to
  * dirty logging mode
  */
-if (s->kvm_dirty_ring_size > 0) {
-uint64_t ring_bytes;
-
-ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn);
-
-/* Read the max supported pages */
-ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
-if (ret > 0) {
-if (ring_bytes > ret) {
-error_report("KVM dirty ring size %" PRIu32 " too big "
- "(maximum is %ld).  Please use a smaller value.",
- s->kvm_dirty_ring_size,
- (long)ret / sizeof(struct kvm_dirty_gfn));
-ret = -EINVAL;
-goto err;
-}
-
-ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
-if (ret) {
-error_report("Enabling of KVM dirty ring failed: %s. "
- "Suggested minimum value is 1024.", 
strerror(-ret));
-goto err;
-}
-
-s->kvm_dirty_ring_bytes = ring_bytes;
- } else {
- warn_report("KVM dirty ring not available, using bitmap method");
- s->kvm_dirty_ring_size = 0;
-}
+ret = kvm_dirty_ring_init(s);
+if (ret < 0) {
+goto err;
 }
 
 /*
-- 
2.23.0




[PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization

2023-02-12 Thread Gavin Shan
The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap needs to
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callbacks. This last stage indicator isn't used until the dirty
ring is enabled in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c   |  2 +-
 include/exec/memory.h |  5 +++--
 migration/dirtyrate.c |  4 ++--
 migration/ram.c   | 20 ++--
 softmmu/memory.c  | 10 +-
 5 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9b26582655..01a6a026af 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener,
 kvm_slots_unlock();
 }
 
-static void kvm_log_sync_global(MemoryListener *l)
+static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
 {
 KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
 KVMState *s = kvm_state;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..75b2fd9f48 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -929,8 +929,9 @@ struct MemoryListener {
  * its @log_sync must be NULL.  Vice versa.
  *
  * @listener: The #MemoryListener.
+ * @last_stage: The last stage to synchronize the log during migration
  */
-void (*log_sync_global)(MemoryListener *listener);
+void (*log_sync_global)(MemoryListener *listener, bool last_stage);
 
 /**
  * @log_clear:
@@ -2408,7 +2409,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
  *
  * Synchronizes the dirty page log for all address spaces.
  */
-void memory_global_dirty_log_sync(void);
+void memory_global_dirty_log_sync(bool last_stage);
 
 /**
  * memory_global_dirty_log_sync: synchronize the dirty log for all memory
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 575d48c397..da9b4a1f8d 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -101,7 +101,7 @@ void global_dirty_log_change(unsigned int flag, bool start)
 static void global_dirty_log_sync(unsigned int flag, bool one_shot)
 {
 qemu_mutex_lock_iothread();
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 if (one_shot) {
 memory_global_dirty_log_stop(flag);
 }
@@ -553,7 +553,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct 
DirtyRateConfig config)
  * skip it unconditionally and start dirty tracking
  * from 2'round of log sync
  */
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 
 /*
  * reset page protect manually and unconditionally.
diff --git a/migration/ram.c b/migration/ram.c
index b966e148c2..52abf015fd 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1176,7 +1176,7 @@ static void migration_trigger_throttle(RAMState *rs)
 }
 }
 
-static void migration_bitmap_sync(RAMState *rs)
+static void migration_bitmap_sync(RAMState *rs, bool last_stage)
 {
 RAMBlock *block;
 int64_t end_time;
@@ -1188,7 +1188,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 
 trace_migration_bitmap_sync_start();
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(last_stage);
 
 qemu_mutex_lock(>bitmap_mutex);
 WITH_RCU_READ_LOCK_GUARD() {
@@ -1222,7 +1222,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 }
 
-static void migration_bitmap_sync_precopy(RAMState *rs)
+static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage)
 {
 Error *local_err = NULL;
 
@@ -1235,7 +1235,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
 local_err = NULL;
 }
 
-migration_bitmap_sync(rs);
+migration_bitmap_sync(rs, last_stage);
 
 if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, _err)) {
 error_report_err(local_err);
@@ -2887,7 +2887,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
 RCU_READ_LOCK_GUARD();
 
 /* This should be our last sync, the src is now paused */
-migration_bitmap_sync(rs);
+migration_bitmap_sync(rs, false);
 
 /* Easiest way to make sure we don't resume in the middle of a host-page */
 rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;
@@ -3077,7 +3077,7 @@ static void ram_init_bitmaps(RAMState *rs)
 /* We don't use dirty log with background snapshots */
 if (!migrate_background_snapshot()) {
 memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
-migration_bitmap_sync_prec

[PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring

2023-02-12 Thread Gavin Shan
When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to
enable the backup bitmap for the dirty ring. Otherwise, the migration
will fail because those two devices are using the backup bitmap to track
dirty guest memory, corresponding to various hardware tables.

Signed-off-by: Gavin Shan 
Reviewed-by: Juan Quintela 
---
 hw/arm/virt.c| 26 ++
 target/arm/kvm64.c   | 25 +
 target/arm/kvm_arm.h | 12 
 3 files changed, 63 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 75f28947de..ea9bd98a65 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2024,6 +2024,8 @@ static void machvirt_init(MachineState *machine)
 VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *possible_cpus;
+const char *gictype = NULL;
+const char *itsclass = NULL;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *secure_sysmem = NULL;
 MemoryRegion *tag_sysmem = NULL;
@@ -2071,6 +2073,30 @@ static void machvirt_init(MachineState *machine)
  */
 finalize_gic_version(vms);
 
+/*
+ * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty
+ * bitmap has to be enabled for KVM dirty ring, before any memory
+ * slot is added. Otherwise, the migration will fail with the dirty
+ * ring.
+ */
+if (kvm_enabled()) {
+if (vms->gic_version != VIRT_GIC_VERSION_2) {
+gictype = gicv3_class_name();
+}
+
+if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) {
+itsclass = its_class_name();
+}
+
+if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) ||
+ (itsclass && !strcmp(itsclass, "arm-its-kvm"))) &&
+!kvm_arm_enable_dirty_ring_with_bitmap()) {
+error_report("Failed to enable the backup bitmap for "
+ "KVM dirty ring");
+exit(1);
+}
+}
+
 if (vms->secure) {
 /*
  * The Secure view of the world is the same as the NonSecure,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 1197253d12..91960e1cd3 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -764,6 +764,31 @@ bool kvm_arm_steal_time_supported(void)
 return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME);
 }
 
+bool kvm_arm_enable_dirty_ring_with_bitmap(void)
+{
+int ret;
+
+/* No need to enable the backup bitmap if dirty ring isn't enabled */
+if (!kvm_dirty_ring_enabled()) {
+return true;
+}
+
+ret = kvm_check_extension(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP);
+if (!ret) {
+return false;
+}
+
+ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP,
+0, 1);
+if (ret) {
+return false;
+}
+
+kvm_state->kvm_dirty_ring_with_bitmap = true;
+
+return true;
+}
+
 QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 
 uint32_t kvm_arm_sve_get_vls(CPUState *cs)
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 99017b635c..402281e61a 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -282,6 +282,13 @@ void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error 
**errp);
  */
 bool kvm_arm_steal_time_supported(void);
 
+/**
+ * kvm_arm_enable_dirty_ring_with_bitmap:
+ * Returns: true if KVM dirty ring's backup bitmap is enabled
+ * and false otherwise.
+ */
+bool kvm_arm_enable_dirty_ring_with_bitmap(void);
+
 /**
  * kvm_arm_aarch32_supported:
  *
@@ -395,6 +402,11 @@ static inline bool kvm_arm_steal_time_supported(void)
 return false;
 }
 
+static inline bool kvm_arm_enable_dirty_ring_with_bitmap(void)
+{
+return false;
+}
+
 /*
  * These functions should never actually be called without KVM support.
  */
-- 
2.23.0




[PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage

2023-02-12 Thread Gavin Shan
In the last stage of live migration or memory slot removal, the
backup bitmap needs to be synchronized when it has been enabled.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c  | 11 +++
 include/sysemu/kvm_int.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 01a6a026af..b5e12de522 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1352,6 +1352,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  */
 if (kvm_state->kvm_dirty_ring_size) {
 kvm_dirty_ring_reap_locked(kvm_state, NULL);
+if (kvm_state->kvm_dirty_ring_with_bitmap) {
+kvm_slot_sync_dirty_pages(mem);
+kvm_slot_get_dirty_log(kvm_state, mem);
+}
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem);
 }
@@ -1573,6 +1577,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool 
last_stage)
 mem = >slots[i];
 if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
 kvm_slot_sync_dirty_pages(mem);
+
+if (s->kvm_dirty_ring_with_bitmap && last_stage &&
+kvm_slot_get_dirty_log(s, mem)) {
+kvm_slot_sync_dirty_pages(mem);
+}
+
 /*
  * This is not needed by KVM_GET_DIRTY_LOG because the
  * ioctl will unconditionally overwrite the whole region.
@@ -3701,6 +3711,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
 /* KVM dirty ring is by default off */
 s->kvm_dirty_ring_size = 0;
+s->kvm_dirty_ring_with_bitmap = false;
 s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
 s->notify_window = 0;
 }
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 60b520a13e..fdd5b1bde0 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -115,6 +115,7 @@ struct KVMState
 } *as;
 uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
 uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
+bool kvm_dirty_ring_with_bitmap;
 struct KVMDirtyRingReaper reaper;
 NotifyVmexitOption notify_vmexit;
 uint32_t notify_window;
-- 
2.23.0




[PATCH v1 6/6] kvm: Enable dirty ring for arm64

2023-02-12 Thread Gavin Shan
arm64 has different capability from x86 to enable the dirty ring, which
is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. To enable it in kvm_dirty_ring_init()
when KVM_CAP_DIRTY_LOG_RING isn't supported.

Signed-off-by: Gavin Shan 
Reviewed-by: Juan Quintela 
---
 accel/kvm/kvm-all.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e5035026c9..f6fbeae644 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1457,6 +1457,7 @@ static int kvm_dirty_ring_init(KVMState *s)
 {
 uint32_t ring_size = s->kvm_dirty_ring_size;
 uint64_t ring_bytes = ring_size * sizeof(struct kvm_dirty_gfn);
+unsigned int capability = KVM_CAP_DIRTY_LOG_RING;
 int ret;
 
 s->kvm_dirty_ring_size = 0;
@@ -1471,7 +1472,12 @@ static int kvm_dirty_ring_init(KVMState *s)
  * Read the max supported pages. Fall back to dirty logging mode
  * if the dirty ring isn't supported.
  */
-ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+ret = kvm_vm_check_extension(s, capability);
+if (ret <= 0) {
+capability = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
+ret = kvm_vm_check_extension(s, capability);
+}
+
 if (ret <= 0) {
 warn_report("KVM dirty ring not available, using bitmap method");
 return 0;
@@ -1484,7 +1490,7 @@ static int kvm_dirty_ring_init(KVMState *s)
 return -EINVAL;
 }
 
-ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+ret = kvm_vm_enable_cap(s, capability, 0, ring_bytes);
 if (ret) {
 error_report("Enabling of KVM dirty ring failed: %s. "
  "Suggested minimum value is 1024.", strerror(-ret));
-- 
2.23.0




[PATCH v1 1/6] linux-headers: Update for dirty ring

2023-02-12 Thread Gavin Shan
Signed-off-by: Gavin Shan 
---
 linux-headers/asm-arm64/kvm.h | 1 +
 linux-headers/linux/kvm.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 4bf2d7246e..a7cfefb3a8 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)   \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index ebdafa576d..5b4e0e4e68 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1175,6 +1175,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
+#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.23.0




[PATCH v1 0/6] hw/arm/virt: Support dirty ring

2023-02-12 Thread Gavin Shan
This series intends to support dirty ring for live migration. The dirty
ring use discrete buffer to track dirty pages. For ARM64, the speciality
is to use backup bitmap to track dirty pages when there is no-running-vcpu
context. It's known that the backup bitmap needs to be synchronized when
KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The backup
bitmap is collected in the last stage of migration.

PATCH[1]Synchronize linux-headers for dirty ring
PATCH[2]Introduce indicator of the last stage migration and pass it
all the way down
PATCH[3]Synchronize the backup bitmap in the last stage of live migration
PATCH[4]Introduce helper kvm_dirty_ring_init() to enable the dirty ring
PATCH[5-6]  Enable dirty ring for hw/arm/virt

RFCv1: https://lists.nongnu.org/archive/html/qemu-arm/2023-02/msg00171.html

Testing
===
(1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with
dirty ring or normal dirty page tracking mechanism. All test cases passed.

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
./its-pending-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
./its-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
./its-pending-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
./its-migration

(2) Combinations of migration, post-copy migration, e1000e and virtio-net
devices. All test cases passed.

-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown  \
-device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0

-netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0

Changelog
=
v1:
  * Combine two patches into one PATCH[v1 2/6] for the last stage indicator
(Peter)
  * Drop the secondary bitmap and use the original one directly
(Juan)
  * Avoid "goto out" in helper kvm_dirty_ring_init()       
(Juan)


Gavin Shan (6):
  linux-headers: Update for dirty ring
  migration: Add last stage indicator to global dirty log
synchronization
  kvm: Synchronize the backup bitmap in the last stage
  kvm: Add helper kvm_dirty_ring_init()
  hw/arm/virt: Enable backup bitmap for dirty ring
  kvm: Enable dirty ring for arm64

 accel/kvm/kvm-all.c   | 95 ---
 hw/arm/virt.c | 26 ++
 include/exec/memory.h |  5 +-
 include/sysemu/kvm_int.h  |  1 +
 linux-headers/asm-arm64/kvm.h |  1 +
 linux-headers/linux/kvm.h |  2 +
 migration/dirtyrate.c |  4 +-
 migration/ram.c   | 20 
 softmmu/memory.c  | 10 ++--
 target/arm/kvm64.c| 25 +
 target/arm/kvm_arm.h  | 12 +
 11 files changed, 152 insertions(+), 49 deletions(-)

-- 
2.23.0




Re: [PATCH RFCv1 2/8] memory: Add last stage indicator to global dirty log synchronization

2023-02-09 Thread Gavin Shan

On 2/10/23 6:48 AM, Peter Xu wrote:

On Mon, Feb 06, 2023 at 07:20:04PM +0800, Gavin Shan wrote:

The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap needs to
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callback. This last stage information isn't used yet.

No functional change intended.

Signed-off-by: Gavin Shan 
---
  accel/kvm/kvm-all.c   |  2 +-
  include/exec/memory.h |  5 +++--
  migration/dirtyrate.c |  4 ++--
  migration/ram.c   |  6 +++---


Better move the migration/ changes into the next patch.



Ok, I will combine PATCH[2/8] and PATCH[3/8] in next revision.

Thanks,
Gavin




Re: [PATCH RFCv1 6/8] kvm: Add helper kvm_dirty_ring_init()

2023-02-09 Thread Gavin Shan

On 2/9/23 9:11 AM, Juan Quintela wrote:

Gavin Shan  wrote:

Due to multiple capabilities associated with the dirty ring for different
architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and
arm64 separately. There will be more to be done in order to support the
dirty ring for arm64.

Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this,
the code looks a bit clean.

No functional change intended.

Signed-off-by: Gavin Shan 
---
  accel/kvm/kvm-all.c | 73 -
  1 file changed, 46 insertions(+), 27 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9ec117c441..399ef0f7e6 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1476,6 +1476,49 @@ static int kvm_dirty_ring_reaper_init(KVMState *s)
  return 0;
  }
  
+static int kvm_dirty_ring_init(KVMState *s)

+{
+uint64_t ring_bytes;
+int ret;
+
+/*
+ * Read the max supported pages. Fall back to dirty logging mode
+ * if the dirty ring isn't supported.
+ */
+ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+if (ret <= 0) {
+warn_report("KVM dirty ring not available, using bitmap method");
+s->kvm_dirty_ring_size = 0;
+return 0;
+}
+
+ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn);
+if (ring_bytes > ret) {
+error_report("KVM dirty ring size %" PRIu32 " too big "
+ "(maximum is %ld).  Please use a smaller value.",
+ s->kvm_dirty_ring_size,
+ (long)ret / sizeof(struct kvm_dirty_gfn));
+ret = -EINVAL;
+goto out;
+}
+
+ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+if (ret) {
+error_report("Enabling of KVM dirty ring failed: %s. "
+ "Suggested minimum value is 1024.", strerror(-ret));
+ret = -EIO;
+}
+
+out:
+if (ret) {
+s->kvm_dirty_ring_size = 0;
+} else {
+s->kvm_dirty_ring_bytes = ring_bytes;
+}
+
+return ret;
+}


If you split it, you don't need the goto.

static int kvm_dirty_ring_init(KVMState *s)
{
 s->kvm_dirty_ring_size = 0;
 /*
  * Read the max supported pages. Fall back to dirty logging mode
  * if the dirty ring isn't supported.
  */
 int ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
 if (ret <= 0) {
 warn_report("KVM dirty ring not available, using bitmap method");
 return 0;
 }

 uint64_t ring_bytes = s->kvm_dirty_ring_size * sizeof(struct 
kvm_dirty_gfn);
 if (ring_bytes > ret) {
 error_report("KVM dirty ring size %" PRIu32 " too big "
  "(maximum is %ld).  Please use a smaller value.",
  s->kvm_dirty_ring_size,
  (long)ret / sizeof(struct kvm_dirty_gfn));
 return -EINVAL;
 }

 ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
 if (ret) {
 error_report("Enabling of KVM dirty ring failed: %s. "
  "Suggested minimum value is 1024.", strerror(-ret));
 return -EIO;
 }

 s->kvm_dirty_ring_bytes = ring_bytes;
 return ret;
}



Ok, thanks for your review. The goto will be removed in next revision.





+
  static void kvm_region_add(MemoryListener *listener,
 MemoryRegionSection *section)
  {
@@ -2545,33 +2588,9 @@ static int kvm_init(MachineState *ms)
   * dirty logging mode
   */
  if (s->kvm_dirty_ring_size > 0) {
-uint64_t ring_bytes;
-
-ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn);
-
-/* Read the max supported pages */
-ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
-if (ret > 0) {
-if (ring_bytes > ret) {
-error_report("KVM dirty ring size %" PRIu32 " too big "
- "(maximum is %ld).  Please use a smaller value.",
- s->kvm_dirty_ring_size,
- (long)ret / sizeof(struct kvm_dirty_gfn));
-ret = -EINVAL;
-goto err;
-}
-
-ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
-if (ret) {
-error_report("Enabling of KVM dirty ring failed: %s. "
- "Suggested minimum value is 1024.", 
strerror(-ret));
-goto err;
-}
-
-s->kvm_dirty_ring_bytes = ring_bytes;
- } else {
- warn_report("KVM dirty ring not available, using bitmap method");
- s->kvm_dirty_ring_size = 0;
+ret = kvm_dirty_ring_init(s);
+if (ret < 0) {
+goto err;
  }
  }



Thanks,
Gavin




Re: [PATCH RFCv1 4/8] kvm: Introduce secondary dirty bitmap

2023-02-09 Thread Gavin Shan

On 2/9/23 9:07 AM, Juan Quintela wrote:

Gavin Shan  wrote:

When dirty ring is enabled on ARM64, the backup bitmap may be used
to track the dirty pages in no-running-vcpu situations. The original
bitmap is the primary one, used for the dirty ring buffer. We need
the secondary bitmap to collect the backup bitmap for ARM64.

No functional change intended.

Signed-off-by: Gavin Shan 
---
  accel/kvm/kvm-all.c  | 50 ++--
  include/sysemu/kvm_int.h |  1 +
  2 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 01a6a026af..1a93985574 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -553,13 +553,29 @@ static void kvm_log_stop(MemoryListener *listener,
  }
  }
  
+static unsigned long *kvm_slot_dirty_bitmap(KVMSlot *slot, bool primary)

+{
+if (primary) {
+return slot->dirty_bmap;
+}
+
+return slot->dirty_bmap +
+   slot->dirty_bmap_size / sizeof(slot->dirty_bmap[0]);
+}



Why?
Just use two bitmaps and call it a day.



Thanks for your review, Juan. Right, I had wrong assumption that the original
(primary) bitmap can't be reused. It's why the secondary bitmap is introduced.
The intention is to use the original (primary) bitmap to cover the dirty-ring
buffers while the secondary bitmap is to cover the backup bitmap, resident in
host kernel.

I think the original (primary) bitmap can be reused in this case. After the
dirty-ring buffer is synchronized to the original bitmap, which is updated
to the dirty bits. It can be reused to cover the backup bitmap. I will remove
the secondary bitmap in next revision.

Thanks,
Gavin




[PATCH RFCv1 6/8] kvm: Add helper kvm_dirty_ring_init()

2023-02-06 Thread Gavin Shan
Due to multiple capabilities associated with the dirty ring for different
architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and
arm64 separately. There will be more to be done in order to support the
dirty ring for arm64.

Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this,
the code looks a bit clean.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c | 73 -
 1 file changed, 46 insertions(+), 27 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9ec117c441..399ef0f7e6 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1476,6 +1476,49 @@ static int kvm_dirty_ring_reaper_init(KVMState *s)
 return 0;
 }
 
+static int kvm_dirty_ring_init(KVMState *s)
+{
+uint64_t ring_bytes;
+int ret;
+
+/*
+ * Read the max supported pages. Fall back to dirty logging mode
+ * if the dirty ring isn't supported.
+ */
+ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+if (ret <= 0) {
+warn_report("KVM dirty ring not available, using bitmap method");
+s->kvm_dirty_ring_size = 0;
+return 0;
+}
+
+ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn);
+if (ring_bytes > ret) {
+error_report("KVM dirty ring size %" PRIu32 " too big "
+ "(maximum is %ld).  Please use a smaller value.",
+ s->kvm_dirty_ring_size,
+ (long)ret / sizeof(struct kvm_dirty_gfn));
+ret = -EINVAL;
+goto out;
+}
+
+ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+if (ret) {
+error_report("Enabling of KVM dirty ring failed: %s. "
+ "Suggested minimum value is 1024.", strerror(-ret));
+ret = -EIO;
+}
+
+out:
+if (ret) {
+s->kvm_dirty_ring_size = 0;
+} else {
+s->kvm_dirty_ring_bytes = ring_bytes;
+}
+
+return ret;
+}
+
 static void kvm_region_add(MemoryListener *listener,
MemoryRegionSection *section)
 {
@@ -2545,33 +2588,9 @@ static int kvm_init(MachineState *ms)
  * dirty logging mode
  */
 if (s->kvm_dirty_ring_size > 0) {
-uint64_t ring_bytes;
-
-ring_bytes = s->kvm_dirty_ring_size * sizeof(struct kvm_dirty_gfn);
-
-/* Read the max supported pages */
-ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
-if (ret > 0) {
-if (ring_bytes > ret) {
-error_report("KVM dirty ring size %" PRIu32 " too big "
- "(maximum is %ld).  Please use a smaller value.",
- s->kvm_dirty_ring_size,
- (long)ret / sizeof(struct kvm_dirty_gfn));
-ret = -EINVAL;
-goto err;
-}
-
-ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
-if (ret) {
-error_report("Enabling of KVM dirty ring failed: %s. "
- "Suggested minimum value is 1024.", 
strerror(-ret));
-goto err;
-}
-
-s->kvm_dirty_ring_bytes = ring_bytes;
- } else {
- warn_report("KVM dirty ring not available, using bitmap method");
- s->kvm_dirty_ring_size = 0;
+ret = kvm_dirty_ring_init(s);
+if (ret < 0) {
+goto err;
 }
 }
 
-- 
2.23.0




[PATCH RFCv1 5/8] kvm: Synchronize secondary bitmap in last stage

2023-02-06 Thread Gavin Shan
In the last stage of live migration or memory slot removal, the
backup bitmap needs to be synchronized.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 1a93985574..9ec117c441 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1377,10 +1377,12 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
  */
 if (kvm_state->kvm_dirty_ring_size) {
 kvm_dirty_ring_reap_locked(kvm_state, NULL);
+kvm_slot_get_dirty_log(kvm_state, mem, false);
 } else {
 kvm_slot_get_dirty_log(kvm_state, mem, true);
 }
 kvm_slot_sync_dirty_pages(mem, true);
+kvm_slot_sync_dirty_pages(mem, false);
 }
 
 /* unregister the slot */
@@ -1604,6 +1606,11 @@ static void kvm_log_sync_global(MemoryListener *l, bool 
last_stage)
  * However kvm dirty ring has no such side effect.
  */
 kvm_slot_reset_dirty_pages(mem);
+
+if (s->kvm_dirty_ring_with_bitmap && last_stage &&
+kvm_slot_get_dirty_log(s, mem, false)) {
+kvm_slot_sync_dirty_pages(mem, false);
+}
 }
 }
 kvm_slots_unlock();
-- 
2.23.0




[PATCH RFCv1 7/8] hw/arm/virt: Enable backup bitmap for dirty ring

2023-02-06 Thread Gavin Shan
When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to
enable the backup bitmap for the dirty ring. Otherwise, the migration
will fail because those two devices are using the backup bitmap to track
dirty guest memory, corresponding to various hardware tables.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c| 26 ++
 target/arm/kvm64.c   | 25 +
 target/arm/kvm_arm.h | 12 
 3 files changed, 63 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ba47728288..b91b5972bc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2025,6 +2025,8 @@ static void machvirt_init(MachineState *machine)
 VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 const CPUArchIdList *possible_cpus;
+const char *gictype = NULL;
+const char *itsclass = NULL;
 MemoryRegion *sysmem = get_system_memory();
 MemoryRegion *secure_sysmem = NULL;
 MemoryRegion *tag_sysmem = NULL;
@@ -2072,6 +2074,30 @@ static void machvirt_init(MachineState *machine)
  */
 finalize_gic_version(vms);
 
+/*
+ * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty
+ * bitmap has to be enabled for KVM dirty ring, before any memory
+ * slot is added. Otherwise, the migration will fail with the dirty
+ * ring.
+ */
+if (kvm_enabled()) {
+if (vms->gic_version != VIRT_GIC_VERSION_2) {
+gictype = gicv3_class_name();
+}
+
+if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) {
+itsclass = its_class_name();
+}
+
+if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) ||
+ (itsclass && !strcmp(itsclass, "arm-its-kvm"))) &&
+!kvm_arm_enable_dirty_ring_with_bitmap()) {
+error_report("Failed to enable the backup bitmap for "
+ "KVM dirty ring");
+exit(1);
+}
+}
+
 if (vms->secure) {
 /*
  * The Secure view of the world is the same as the NonSecure,
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 1197253d12..91960e1cd3 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -764,6 +764,31 @@ bool kvm_arm_steal_time_supported(void)
 return kvm_check_extension(kvm_state, KVM_CAP_STEAL_TIME);
 }
 
+bool kvm_arm_enable_dirty_ring_with_bitmap(void)
+{
+int ret;
+
+/* No need to enable the backup bitmap if dirty ring isn't enabled */
+if (!kvm_dirty_ring_enabled()) {
+return true;
+}
+
+ret = kvm_check_extension(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP);
+if (!ret) {
+return false;
+}
+
+ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP,
+0, 1);
+if (ret) {
+return false;
+}
+
+kvm_state->kvm_dirty_ring_with_bitmap = true;
+
+return true;
+}
+
 QEMU_BUILD_BUG_ON(KVM_ARM64_SVE_VQ_MIN != 1);
 
 uint32_t kvm_arm_sve_get_vls(CPUState *cs)
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 99017b635c..402281e61a 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -282,6 +282,13 @@ void kvm_arm_steal_time_finalize(ARMCPU *cpu, Error 
**errp);
  */
 bool kvm_arm_steal_time_supported(void);
 
+/**
+ * kvm_arm_enable_dirty_ring_with_bitmap:
+ * Returns: true if KVM dirty ring's backup bitmap is enabled
+ * and false otherwise.
+ */
+bool kvm_arm_enable_dirty_ring_with_bitmap(void);
+
 /**
  * kvm_arm_aarch32_supported:
  *
@@ -395,6 +402,11 @@ static inline bool kvm_arm_steal_time_supported(void)
 return false;
 }
 
+static inline bool kvm_arm_enable_dirty_ring_with_bitmap(void)
+{
+return false;
+}
+
 /*
  * These functions should never actually be called without KVM support.
  */
-- 
2.23.0




[PATCH RFCv1 8/8] kvm: Enable dirty ring for arm64

2023-02-06 Thread Gavin Shan
arm64 has different capability from x86 to enable the dirty ring, which
is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. To enable it in kvm_dirty_ring_init()
when KVM_CAP_DIRTY_LOG_RING isn't supported.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 399ef0f7e6..8ab31865eb 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1479,13 +1479,19 @@ static int kvm_dirty_ring_reaper_init(KVMState *s)
 static int kvm_dirty_ring_init(KVMState *s)
 {
 uint64_t ring_bytes;
+unsigned int capability = KVM_CAP_DIRTY_LOG_RING;
 int ret;
 
 /*
  * Read the max supported pages. Fall back to dirty logging mode
  * if the dirty ring isn't supported.
  */
-ret = kvm_vm_check_extension(s, KVM_CAP_DIRTY_LOG_RING);
+ret = kvm_vm_check_extension(s, capability);
+if (ret <= 0) {
+capability = KVM_CAP_DIRTY_LOG_RING_ACQ_REL;
+ret = kvm_vm_check_extension(s, capability);
+}
+
 if (ret <= 0) {
 warn_report("KVM dirty ring not available, using bitmap method");
 s->kvm_dirty_ring_size = 0;
@@ -1502,7 +1508,7 @@ static int kvm_dirty_ring_init(KVMState *s)
 goto out;
 }
 
-ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING, 0, ring_bytes);
+ret = kvm_vm_enable_cap(s, capability, 0, ring_bytes);
 if (ret) {
 error_report("Enabling of KVM dirty ring failed: %s. "
  "Suggested minimum value is 1024.", strerror(-ret));
-- 
2.23.0




[PATCH RFCv1 4/8] kvm: Introduce secondary dirty bitmap

2023-02-06 Thread Gavin Shan
When dirty ring is enabled on ARM64, the backup bitmap may be used
to track the dirty pages in no-running-vcpu situations. The original
bitmap is the primary one, used for the dirty ring buffer. We need
the secondary bitmap to collect the backup bitmap for ARM64.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c  | 50 ++--
 include/sysemu/kvm_int.h |  1 +
 2 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 01a6a026af..1a93985574 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -553,13 +553,29 @@ static void kvm_log_stop(MemoryListener *listener,
 }
 }
 
+static unsigned long *kvm_slot_dirty_bitmap(KVMSlot *slot, bool primary)
+{
+if (primary) {
+return slot->dirty_bmap;
+}
+
+return slot->dirty_bmap +
+   slot->dirty_bmap_size / sizeof(slot->dirty_bmap[0]);
+}
+
 /* get kvm's dirty pages bitmap and update qemu's */
-static void kvm_slot_sync_dirty_pages(KVMSlot *slot)
+static void kvm_slot_sync_dirty_pages(KVMSlot *slot, bool primary)
 {
+KVMState *s = kvm_state;
+unsigned long *bmap = kvm_slot_dirty_bitmap(slot, primary);
 ram_addr_t start = slot->ram_start_offset;
 ram_addr_t pages = slot->memory_size / qemu_real_host_page_size();
 
-cpu_physical_memory_set_dirty_lebitmap(slot->dirty_bmap, start, pages);
+if (!s->kvm_dirty_ring_with_bitmap && !primary) {
+return;
+}
+
+cpu_physical_memory_set_dirty_lebitmap(bmap, start, pages);
 }
 
 static void kvm_slot_reset_dirty_pages(KVMSlot *slot)
@@ -572,6 +588,9 @@ static void kvm_slot_reset_dirty_pages(KVMSlot *slot)
 /* Allocate the dirty bitmap for a slot  */
 static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
 {
+KVMState *s = kvm_state;
+hwaddr bitmap_size, alloc_size;
+
 if (!(mem->flags & KVM_MEM_LOG_DIRTY_PAGES) || mem->dirty_bmap) {
 return;
 }
@@ -593,9 +612,11 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
  * And mem->memory_size is aligned to it (otherwise this mem can't
  * be registered to KVM).
  */
-hwaddr bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size(),
-/*HOST_LONG_BITS*/ 64) / 8;
-mem->dirty_bmap = g_malloc0(bitmap_size);
+bitmap_size = ALIGN(mem->memory_size / qemu_real_host_page_size(),
+/*HOST_LONG_BITS*/ 64) / 8;
+alloc_size = s->kvm_dirty_ring_with_bitmap ? 2 * bitmap_size : bitmap_size;
+
+mem->dirty_bmap = g_malloc0(alloc_size);
 mem->dirty_bmap_size = bitmap_size;
 }
 
@@ -603,12 +624,16 @@ static void kvm_slot_init_dirty_bitmap(KVMSlot *mem)
  * Sync dirty bitmap from kernel to KVMSlot.dirty_bmap, return true if
  * succeeded, false otherwise
  */
-static bool kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot)
+static bool kvm_slot_get_dirty_log(KVMState *s, KVMSlot *slot, bool primary)
 {
 struct kvm_dirty_log d = {};
 int ret;
 
-d.dirty_bitmap = slot->dirty_bmap;
+if (!s->kvm_dirty_ring_with_bitmap && !primary) {
+return false;
+}
+
+d.dirty_bitmap = kvm_slot_dirty_bitmap(slot, primary);
 d.slot = slot->slot | (slot->as_id << 16);
 ret = kvm_vm_ioctl(s, KVM_GET_DIRTY_LOG, );
 
@@ -839,8 +864,8 @@ static void 
kvm_physical_sync_dirty_bitmap(KVMMemoryListener *kml,
 /* We don't have a slot if we want to trap every access. */
 return;
 }
-if (kvm_slot_get_dirty_log(s, mem)) {
-kvm_slot_sync_dirty_pages(mem);
+if (kvm_slot_get_dirty_log(s, mem, true)) {
+kvm_slot_sync_dirty_pages(mem, true);
 }
 start_addr += slot_size;
 size -= slot_size;
@@ -1353,9 +1378,9 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 if (kvm_state->kvm_dirty_ring_size) {
 kvm_dirty_ring_reap_locked(kvm_state, NULL);
 } else {
-kvm_slot_get_dirty_log(kvm_state, mem);
+kvm_slot_get_dirty_log(kvm_state, mem, true);
 }
-kvm_slot_sync_dirty_pages(mem);
+kvm_slot_sync_dirty_pages(mem, true);
 }
 
 /* unregister the slot */
@@ -1572,7 +1597,7 @@ static void kvm_log_sync_global(MemoryListener *l, bool 
last_stage)
 for (i = 0; i < s->nr_slots; i++) {
 mem = >slots[i];
 if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
-kvm_slot_sync_dirty_pages(mem);
+kvm_slot_sync_dirty_pages(mem, true);
 /*
  * This is not needed by KVM_GET_DIRTY_LOG because the
  * ioctl will unconditionally overwrite the whole region.
@@ -3701,6 +3726,7 @@ static void kvm_accel_instance_init(Object *obj)
 s->ke

[PATCH RFCv1 3/8] migration: Add last stage indicator to global dirty log synchronization

2023-02-06 Thread Gavin Shan
For the pre-copy live migration scenario, the last stage indicator
is needed for KVM backend to collect the dirty pages from the backup
bitmap when dirty ring is used. The indicator isn't used so far.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 migration/ram.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index d1b9b270ec..6c9642edee 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1176,7 +1176,7 @@ static void migration_trigger_throttle(RAMState *rs)
 }
 }
 
-static void migration_bitmap_sync(RAMState *rs)
+static void migration_bitmap_sync(RAMState *rs, bool last_stage)
 {
 RAMBlock *block;
 int64_t end_time;
@@ -1188,7 +1188,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 
 trace_migration_bitmap_sync_start();
-memory_global_dirty_log_sync(false);
+memory_global_dirty_log_sync(last_stage);
 
 qemu_mutex_lock(>bitmap_mutex);
 WITH_RCU_READ_LOCK_GUARD() {
@@ -1222,7 +1222,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 }
 
-static void migration_bitmap_sync_precopy(RAMState *rs)
+static void migration_bitmap_sync_precopy(RAMState *rs, bool last_stage)
 {
 Error *local_err = NULL;
 
@@ -1235,7 +1235,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
 local_err = NULL;
 }
 
-migration_bitmap_sync(rs);
+migration_bitmap_sync(rs, last_stage);
 
 if (precopy_notify(PRECOPY_NOTIFY_AFTER_BITMAP_SYNC, _err)) {
 error_report_err(local_err);
@@ -2844,7 +2844,7 @@ void ram_postcopy_send_discard_bitmap(MigrationState *ms)
 RCU_READ_LOCK_GUARD();
 
 /* This should be our last sync, the src is now paused */
-migration_bitmap_sync(rs);
+migration_bitmap_sync(rs, false);
 
 /* Easiest way to make sure we don't resume in the middle of a host-page */
 rs->pss[RAM_CHANNEL_PRECOPY].last_sent_block = NULL;
@@ -3034,7 +3034,7 @@ static void ram_init_bitmaps(RAMState *rs)
 /* We don't use dirty log with background snapshots */
 if (!migrate_background_snapshot()) {
 memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
-migration_bitmap_sync_precopy(rs);
+migration_bitmap_sync_precopy(rs, false);
 }
 }
 qemu_mutex_unlock_ramlist();
@@ -3349,7 +3349,7 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
 WITH_RCU_READ_LOCK_GUARD() {
 if (!migration_in_postcopy()) {
-migration_bitmap_sync_precopy(rs);
+migration_bitmap_sync_precopy(rs, true);
 }
 
 ram_control_before_iterate(f, RAM_CONTROL_FINISH);
@@ -3407,7 +3407,7 @@ static void ram_save_pending(QEMUFile *f, void *opaque, 
uint64_t max_size,
 remaining_size < max_size) {
 qemu_mutex_lock_iothread();
 WITH_RCU_READ_LOCK_GUARD() {
-migration_bitmap_sync_precopy(rs);
+migration_bitmap_sync_precopy(rs, false);
 }
 qemu_mutex_unlock_iothread();
 remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
-- 
2.23.0




[PATCH RFCv1 1/8] linux-headers: Update for dirty ring

2023-02-06 Thread Gavin Shan
Signed-off-by: Gavin Shan 
---
 linux-headers/asm-arm64/kvm.h | 1 +
 linux-headers/linux/kvm.h | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/linux-headers/asm-arm64/kvm.h b/linux-headers/asm-arm64/kvm.h
index 4bf2d7246e..a7cfefb3a8 100644
--- a/linux-headers/asm-arm64/kvm.h
+++ b/linux-headers/asm-arm64/kvm.h
@@ -43,6 +43,7 @@
 #define __KVM_HAVE_VCPU_EVENTS
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_REG_SIZE(id)   \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index ebdafa576d..5b4e0e4e68 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -1175,6 +1175,8 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 220
 #define KVM_CAP_S390_ZPCI_OP 221
 #define KVM_CAP_S390_CPU_TOPOLOGY 222
+#define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
+#define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.23.0




[PATCH RFCv1 2/8] memory: Add last stage indicator to global dirty log synchronization

2023-02-06 Thread Gavin Shan
The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap needs to
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callback. This last stage information isn't used yet.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c   |  2 +-
 include/exec/memory.h |  5 +++--
 migration/dirtyrate.c |  4 ++--
 migration/ram.c   |  6 +++---
 softmmu/memory.c  | 10 +-
 5 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9b26582655..01a6a026af 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener,
 kvm_slots_unlock();
 }
 
-static void kvm_log_sync_global(MemoryListener *l)
+static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
 {
 KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
 KVMState *s = kvm_state;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..75b2fd9f48 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -929,8 +929,9 @@ struct MemoryListener {
  * its @log_sync must be NULL.  Vice versa.
  *
  * @listener: The #MemoryListener.
+ * @last_stage: The last stage to synchronize the log during migration
  */
-void (*log_sync_global)(MemoryListener *listener);
+void (*log_sync_global)(MemoryListener *listener, bool last_stage);
 
 /**
  * @log_clear:
@@ -2408,7 +2409,7 @@ MemoryRegionSection memory_region_find(MemoryRegion *mr,
  *
  * Synchronizes the dirty page log for all address spaces.
  */
-void memory_global_dirty_log_sync(void);
+void memory_global_dirty_log_sync(bool last_stage);
 
 /**
  * memory_global_dirty_log_sync: synchronize the dirty log for all memory
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 4bfb97fc68..aecc8142e4 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -101,7 +101,7 @@ void global_dirty_log_change(unsigned int flag, bool start)
 static void global_dirty_log_sync(unsigned int flag, bool one_shot)
 {
 qemu_mutex_lock_iothread();
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 if (one_shot) {
 memory_global_dirty_log_stop(flag);
 }
@@ -553,7 +553,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct 
DirtyRateConfig config)
  * skip it unconditionally and start dirty tracking
  * from 2'round of log sync
  */
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 
 /*
  * reset page protect manually and unconditionally.
diff --git a/migration/ram.c b/migration/ram.c
index 334309f1c6..d1b9b270ec 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1188,7 +1188,7 @@ static void migration_bitmap_sync(RAMState *rs)
 }
 
 trace_migration_bitmap_sync_start();
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 
 qemu_mutex_lock(>bitmap_mutex);
 WITH_RCU_READ_LOCK_GUARD() {
@@ -3817,7 +3817,7 @@ void colo_incoming_start_dirty_log(void)
 qemu_mutex_lock_iothread();
 qemu_mutex_lock_ramlist();
 
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 WITH_RCU_READ_LOCK_GUARD() {
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 ramblock_sync_dirty_bitmap(ram_state, block);
@@ -4114,7 +4114,7 @@ void colo_flush_ram_cache(void)
 void *src_host;
 unsigned long offset = 0;
 
-memory_global_dirty_log_sync();
+memory_global_dirty_log_sync(false);
 WITH_RCU_READ_LOCK_GUARD() {
 RAMBLOCK_FOREACH_NOT_IGNORED(block) {
 ramblock_sync_dirty_bitmap(ram_state, block);
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 9d64efca26..1cc36ef028 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2224,7 +2224,7 @@ void memory_region_set_dirty(MemoryRegion *mr, hwaddr 
addr,
  * If memory region `mr' is NULL, do global sync.  Otherwise, sync
  * dirty bitmap for the specified memory region.
  */
-static void memory_region_sync_dirty_bitmap(MemoryRegion *mr)
+static void memory_region_sync_dirty_bitmap(MemoryRegion *mr, bool last_stage)
 {
 MemoryListener *listener;
 AddressSpace *as;
@@ -2254,7 +2254,7 @@ static void memory_region_sync_dirty_bitmap(MemoryRegion 
*mr)
  * is to do a global sync, because we are not capable to
  * sync in a finer granularity.
  */
-listener->log_sync_global(listener);
+listener->log_sync_global(listener, l

[PATCH RFCv1 0/8] hw/arm/virt: Support dirty ring

2023-02-06 Thread Gavin Shan
This series intends to support dirty ring for live migration. The dirty
ring use discrete buffer to track dirty pages. For ARM64, the speciality
is to use backup bitmap to track dirty pages when there is no-running-vcpu
context. It's known that the backup bitmap needs to be synchronized when
KVM device "kvm-arm-gicv3" or "arm-its-kvm" has been enabled. The backup
bitmap is collected in the last stage of migration.

PATCH[1]Synchronize linux-headers for dirty ring
PATCH[2-3]  Introduce indicator of the last stage migration and pass it
all the way down
PATCH[4-5]  Introduce secondary bitmap, corresponding to the backup bitmap
PATCH[6-8]  Enable dirty ring for hw/arm/virt

Testing
===
(1) kvm-unit-tests/its-pending-migration and kvm-unit-tests/its-migration with
dirty ring or normal dirty page tracking mechanism. All test cases passed.

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
./its-pending-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm \
./its-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
./its-pending-migration

QEMU=./qemu.main/build/qemu-system-aarch64 ACCEL=kvm,dirty-ring-size=65536 \
./its-migration

(2) Combinations of migration, post-copy migration, e1000e and virtio-net
devices. All test cases passed.

-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown  \
-device e1000e,bus=pcie.5,netdev=net0,mac=52:54:00:f1:26:a0

-netdev tap,id=vnet0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device virtio-net-pci,bus=pcie.6,netdev=vnet0,mac=52:54:00:f1:26:b0

Gavin Shan (8):
  linux-headers: Update for dirty ring
  memory: Add last stage indicator to global dirty log synchronization
  migration: Add last stage indicator to global dirty log
synchronization
  kvm: Introduce secondary dirty bitmap
  kvm: Synchronize secondary bitmap in last stage
  kvm: Add helper kvm_dirty_ring_init()
  hw/arm/virt: Enable backup bitmap for dirty ring
  kvm: Enable dirty ring for arm64

 accel/kvm/kvm-all.c   | 138 --
 hw/arm/virt.c |  26 +++
 include/exec/memory.h |   5 +-
 include/sysemu/kvm_int.h  |   1 +
 linux-headers/asm-arm64/kvm.h |   1 +
 linux-headers/linux/kvm.h |   2 +
 migration/dirtyrate.c |   4 +-
 migration/ram.c   |  20 ++---
 softmmu/memory.c  |  10 +--
 target/arm/kvm64.c|  25 ++
 target/arm/kvm_arm.h  |  12 +++
 11 files changed, 185 insertions(+), 59 deletions(-)

-- 
2.23.0




Re: [PATCH v4] qapi/qom: Memory backend property prealloc-threads doc fix

2022-11-14 Thread Gavin Shan

On 11/14/22 11:24 AM, Zhenyu Zhang wrote:

Commit ffac16fab3 "hostmem: introduce "prealloc-threads" property"
(v5.0.0) changed the default number of threads from number of CPUs
to 1.  This was deemed a regression, and fixed in commit f8d426a685
"hostmem: default the amount of prealloc-threads to smp-cpus".
Except the documentation remained unchanged.

Update 'qapi/qom.json' to reflect the change in commit f8d426a6852c.

Signed-off-by: Zhenyu Zhang 
---

v4: Fix the line exceeding 80 characters limitation issue  (Gavin)
v3: Covers historical descriptions (Markus)
v2: The property is changed to smp-cpus since 5.0  (Phild)

---
  qapi/qom.json | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)



Reviewed-by: Gavin Shan 


diff --git a/qapi/qom.json b/qapi/qom.json
index 30e76653ad..f4a7917f3d 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -576,7 +576,8 @@
  #
  # @prealloc: if true, preallocate memory (default: false)
  #
-# @prealloc-threads: number of CPU threads to use for prealloc (default: 1)
+# @prealloc-threads: number of CPU threads to use for prealloc
+#(default: number of CPUs) (since 5.0)
  #
  # @prealloc-context: thread context to use for creation of preallocation 
threads
  #(default: none) (since 7.2)






Re: [PATCH v3] qapi/qom: Memory backend property prealloc-threads doc fix

2022-11-11 Thread Gavin Shan

On 11/11/22 6:54 PM, Igor Mammedov wrote:

On Fri, 11 Nov 2022 17:34:04 +0800
Gavin Shan  wrote:

On 11/11/22 5:13 PM, Igor Mammedov wrote:

On Fri, 11 Nov 2022 07:47:16 +0100
Markus Armbruster  wrote:

Gavin Shan  writes:

On 11/11/22 11:05 AM, Zhenyu Zhang wrote:

Commit ffac16fab3 "hostmem: introduce "prealloc-threads" property"
(v5.0.0) changed the default number of threads from number of CPUs
to 1.  This was deemed a regression, and fixed in commit f8d426a685
"hostmem: default the amount of prealloc-threads to smp-cpus".
Except the documentation remained unchanged.  Update it now.
Signed-off-by: Zhenyu Zhang 
---
v3: Covers historical descriptions  (Markus)
v2: The property is changed to smp-cpus since 5.0   (Phild)
---
qapi/qom.json | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
  


With the following comments addressed:

Reviewed-by: Gavin Shan 

---

Please consider amending the commit log to something like below.

The default "prealloc-threads" value is set to 1 when the property is
added by commit ffac16fab33b ("hostmem: introduce "prealloc-threads"
property") in v5.0.0. The default value is conflicting with the sugar
property as the value provided by the sugar property is number of CPUs.


What is the sugar property?  Can you explain the conflict in a bit more
detail?


my guess is that Gavin means mem_prealloc compat glue in 
qemu_process_sugar_options()

property value should be set according to following order
   default -> compat -> explicit value
so I don't see any conflict here.

PS:
if it we up to me, default would have stayed 1,
and prealloc-threads fixup to vCPUs number would happen in vl.c
similar to what is done in qemu_process_sugar_options(),
keeping backend clean of external dependencies.
   


Yes, it's the sugar property I was talking about. I'm not sure if
we have a more popular name for this property: compat property or
sugar property.

When 'mem-prealloc=on' and 'prealloc-threads=xxx' aren't provided,
the value is 1 before commit f8d426a6852c is applied. It's not
inconsistent with 'mem-prealloc=on'. It's the conflict I was talking
about and it's fixed by commit f8d426a6852c


default was not supposed to be consistent with legacy mem-prealloc
and sugar property takes care of mem-prealloc=on case.

so commit message in its current form looks fine to me.



Ok, thanks for your confirm. I think Zhenyu needs to post v4, to fix
the 80 characters limitation issue. My reviewed-by is still valid.

  

The conflict has been fixed by commit f8d426a6852c ("hostmem: default
the amount of prealloc-threads to smp-cpus"). However, 'qapi/qom.json'
was missed to be updated accordingly in the commit.

Update 'qapi/qom.json' to reflect the change in commit f8d426a6852c.

Signed-off-by: Zhenyu Zhang 

When a specific commit is mentioned in the commit log, we usually have
fixed format like below.

commit ffac16fab33b ("hostmem: introduce "prealloc-threads" property")
commit f8d426a6852c ("hostmem: default the amount of prealloc-threads to 
smp-cpus")


This is certainly a common format, but the other one is also in use.
  

diff --git a/qapi/qom.json b/qapi/qom.json
index 30e76653ad..dfd89bc6d4 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -576,7 +576,7 @@
#
# @prealloc: if true, preallocate memory (default: false)
#
-# @prealloc-threads: number of CPU threads to use for prealloc (default: 1)
+# @prealloc-threads: number of CPU threads to use for prealloc (default: 
number of CPUs) (since 5.0)
#
# @prealloc-context: thread context to use for creation of preallocation 
threads
#(default: none) (since 7.2)
  


The line seems exceeding 80 characters. It'd better to limit each line in 75 
characters.
So you probably need:

  # @prealloc-threads: number of CPU threads to use for prealloc (default: 
number of CPUs)
  #(since 5.0)


Still exceeds :)

I suggested

# @prealloc-threads: number of CPU threads to use for prealloc
#(default: number of CPUs) (since 5.0)
  
   


Markus's suggestion works :)



Thanks,
Gavin




Re: [PATCH v3] qapi/qom: Memory backend property prealloc-threads doc fix

2022-11-11 Thread Gavin Shan

On 11/11/22 5:13 PM, Igor Mammedov wrote:

On Fri, 11 Nov 2022 07:47:16 +0100
Markus Armbruster  wrote: 

Gavin Shan  writes:

On 11/11/22 11:05 AM, Zhenyu Zhang wrote:

Commit ffac16fab3 "hostmem: introduce "prealloc-threads" property"
(v5.0.0) changed the default number of threads from number of CPUs
to 1.  This was deemed a regression, and fixed in commit f8d426a685
"hostmem: default the amount of prealloc-threads to smp-cpus".
Except the documentation remained unchanged.  Update it now.
Signed-off-by: Zhenyu Zhang 
---
v3: Covers historical descriptions  (Markus)
v2: The property is changed to smp-cpus since 5.0   (Phild)
---
   qapi/qom.json | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
   


With the following comments addressed:

Reviewed-by: Gavin Shan 

---

Please consider amending the commit log to something like below.

The default "prealloc-threads" value is set to 1 when the property is
added by commit ffac16fab33b ("hostmem: introduce "prealloc-threads"
property") in v5.0.0. The default value is conflicting with the sugar
property as the value provided by the sugar property is number of CPUs.


What is the sugar property?  Can you explain the conflict in a bit more
detail?


my guess is that Gavin means mem_prealloc compat glue in 
qemu_process_sugar_options()

property value should be set according to following order
  default -> compat -> explicit value
so I don't see any conflict here.

PS:
if it we up to me, default would have stayed 1,
and prealloc-threads fixup to vCPUs number would happen in vl.c
similar to what is done in qemu_process_sugar_options(),
keeping backend clean of external dependencies.



Yes, it's the sugar property I was talking about. I'm not sure if
we have a more popular name for this property: compat property or
sugar property.

When 'mem-prealloc=on' and 'prealloc-threads=xxx' aren't provided,
the value is 1 before commit f8d426a6852c is applied. It's not
inconsistent with 'mem-prealloc=on'. It's the conflict I was talking
about and it's fixed by commit f8d426a6852c




The conflict has been fixed by commit f8d426a6852c ("hostmem: default
the amount of prealloc-threads to smp-cpus"). However, 'qapi/qom.json'
was missed to be updated accordingly in the commit.

Update 'qapi/qom.json' to reflect the change in commit f8d426a6852c.

Signed-off-by: Zhenyu Zhang 

When a specific commit is mentioned in the commit log, we usually have
fixed format like below.

commit ffac16fab33b ("hostmem: introduce "prealloc-threads" property")
commit f8d426a6852c ("hostmem: default the amount of prealloc-threads to 
smp-cpus")


This is certainly a common format, but the other one is also in use.


diff --git a/qapi/qom.json b/qapi/qom.json
index 30e76653ad..dfd89bc6d4 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -576,7 +576,7 @@
   #
   # @prealloc: if true, preallocate memory (default: false)
   #
-# @prealloc-threads: number of CPU threads to use for prealloc (default: 1)
+# @prealloc-threads: number of CPU threads to use for prealloc (default: 
number of CPUs) (since 5.0)
   #
   # @prealloc-context: thread context to use for creation of preallocation 
threads
   #(default: none) (since 7.2)
   


The line seems exceeding 80 characters. It'd better to limit each line in 75 
characters.
So you probably need:

 # @prealloc-threads: number of CPU threads to use for prealloc (default: 
number of CPUs)
 #(since 5.0)


Still exceeds :)

I suggested

   # @prealloc-threads: number of CPU threads to use for prealloc
   #(default: number of CPUs) (since 5.0)





Markus's suggestion works :)

Thanks,
Gavin





Re: [PATCH v3] qapi/qom: Memory backend property prealloc-threads doc fix

2022-11-10 Thread Gavin Shan

Hi Zhenyu,

On 11/11/22 11:05 AM, Zhenyu Zhang wrote:

Commit ffac16fab3 "hostmem: introduce "prealloc-threads" property"
(v5.0.0) changed the default number of threads from number of CPUs
to 1.  This was deemed a regression, and fixed in commit f8d426a685
"hostmem: default the amount of prealloc-threads to smp-cpus".
Except the documentation remained unchanged.  Update it now.

Signed-off-by: Zhenyu Zhang 
---

v3: Covers historical descriptions  (Markus)
v2: The property is changed to smp-cpus since 5.0   (Phild)

---
  qapi/qom.json | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)



With the following comments addressed:

Reviewed-by: Gavin Shan 

---

Please consider amending the commit log to something like below.

The default "prealloc-threads" value is set to 1 when the property is
added by commit ffac16fab33b ("hostmem: introduce "prealloc-threads"
property") in v5.0.0. The default value is conflicting with the sugar
property as the value provided by the sugar property is number of CPUs.
The conflict has been fixed by commit f8d426a6852c ("hostmem: default
the amount of prealloc-threads to smp-cpus"). However, 'qapi/qom.json'
was missed to be updated accordingly in the commit.

Update 'qapi/qom.json' to reflect the change in commit f8d426a6852c.

Signed-off-by: Zhenyu Zhang 

When a specific commit is mentioned in the commit log, we usually have
fixed format like below.

commit ffac16fab33b ("hostmem: introduce "prealloc-threads" property")
commit f8d426a6852c ("hostmem: default the amount of prealloc-threads to 
smp-cpus")


diff --git a/qapi/qom.json b/qapi/qom.json
index 30e76653ad..dfd89bc6d4 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -576,7 +576,7 @@
  #
  # @prealloc: if true, preallocate memory (default: false)
  #
-# @prealloc-threads: number of CPU threads to use for prealloc (default: 1)
+# @prealloc-threads: number of CPU threads to use for prealloc (default: 
number of CPUs) (since 5.0)
  #
  # @prealloc-context: thread context to use for creation of preallocation 
threads
  #(default: none) (since 7.2)



The line seems exceeding 80 characters. It'd better to limit each line in 75 
characters.
So you probably need:

# @prealloc-threads: number of CPU threads to use for prealloc (default: 
number of CPUs)
#(since 5.0)

Thanks,
Gavin




Re: [PATCH v6 0/7] hw/arm/virt: Improve address assignment for high memory regions

2022-10-29 Thread Gavin Shan

Hi Peter,

On 10/29/22 2:06 AM, Peter Maydell wrote:

On Wed, 26 Oct 2022 at 01:30, Gavin Shan  wrote:


On 10/24/22 11:54 AM, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.



Could you help to take a look when getting a chance? I think Connie and
Eric are close to complete the reviews, but v7 is still needed to address
extra comments from them. I hope to make v7 mergeable if possible :)


If Eric and Connie and Marc are happy with it then I don't have
any further issues on top of that.

NB that since softfreeze is next Tuesday, this is going to be
8.0 material at this point, I'm afraid :-( (Softfreeze caught
me by surprise this cycle...)



Ok. v7 was just posted with Connie/Eric's comments resolved. Marc
also provided his r-b. Nothing is really critical since none of
the patches fixes an existing issue. It would be great if it can
be merged to 7.2 if we are fortunate enough. Otherwise, 8.0 is also
good to me :)

https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00693.html (v7)

Thanks,
Gavin




Re: [PATCH v6 0/7] hw/arm/virt: Improve address assignment for high memory regions

2022-10-29 Thread Gavin Shan

Hi Marc,

On 10/29/22 7:29 PM, Marc Zyngier wrote:

On Wed, 26 Oct 2022 01:29:56 +0100,
Gavin Shan  wrote:

On 10/24/22 11:54 AM, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
  toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
  'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
  on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
  PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions and introduces new properties for user to
selectively disable those 3 high memory regions.

PATCH[1-4] preparatory work for the improvment
PATCH[5]   improve high memory region address assignment
PATCH[6]   adds 'compact-highmem' to enable or disable the optimization
PATCH[7]   adds properties so that high memory regions can be disabled

v5: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00280.html
v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html
v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html
v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html



Could you help to take a look when getting a chance? I think Connie and
Eric are close to complete the reviews, but v7 is still needed to address
extra comments from them. I hope to make v7 mergeable if possible :)


With the comments from Connie and Eric addressed, this looks good to
me:

Reviewed-by: Marc Zyngier 

Thanks for having gone the extra mile on this one.



Thank you for your feedback and reviews. I've posted v7 with your r-b
after resolving comments from Connie/Eric.

https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00693.html (v7)

Thanks,
Gavin




[PATCH v7 7/7] hw/arm/virt: Add properties to disable high memory regions

2022-10-29 Thread Gavin Shan
The 3 high memory regions are usually enabled by default, but they may
be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't needed by GICv2.
This leads to waste in the PA space.

Add properties ("highmem-redists", "highmem-ecam", "highmem-mmio") to
allow users selectively disable them if needed. After that, the high
memory region for GICv3 or GICv4 redistributor can be disabled by user,
the number of maximal supported CPUs needs to be calculated based on
'vms->highmem_redists'. The follow-up error message is also improved
to indicate if the high memory region for GICv3 and GICv4 has been
enabled or not.

Suggested-by: Marc Zyngier 
Signed-off-by: Gavin Shan 
Reviewed-by: Marc Zyngier 
---
 docs/system/arm/virt.rst | 13 +++
 hw/arm/virt.c| 75 ++--
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 4454706392..188a4f211f 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -98,6 +98,19 @@ compact-highmem
   Set ``on``/``off`` to enable/disable the compact layout for high memory 
regions.
   The default is ``on`` for machine types later than ``virt-7.2``.
 
+highmem-redists
+  Set ``on``/``off`` to enable/disable the high memory region for GICv3 or
+  GICv4 redistributor. The default is ``on``. Setting this to ``off`` will
+  limit the maximum number of CPUs when GICv3 or GICv4 is used.
+
+highmem-ecam
+  Set ``on``/``off`` to enable/disable the high memory region for PCI ECAM.
+  The default is ``on`` for machine types later than ``virt-3.0``.
+
+highmem-mmio
+  Set ``on``/``off`` to enable/disable the high memory region for PCI MMIO.
+  The default is ``on``.
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 020a95cfa2..65cff7add1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2095,14 +2095,20 @@ static void machvirt_init(MachineState *machine)
 if (vms->gic_version == VIRT_GIC_VERSION_2) {
 virt_max_cpus = GIC_NCPU;
 } else {
-virt_max_cpus = virt_redist_capacity(vms, VIRT_GIC_REDIST) +
-virt_redist_capacity(vms, VIRT_HIGH_GIC_REDIST2);
+virt_max_cpus = virt_redist_capacity(vms, VIRT_GIC_REDIST);
+if (vms->highmem_redists) {
+virt_max_cpus += virt_redist_capacity(vms, VIRT_HIGH_GIC_REDIST2);
+}
 }
 
 if (max_cpus > virt_max_cpus) {
 error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
  "supported by machine 'mach-virt' (%d)",
  max_cpus, virt_max_cpus);
+if (vms->gic_version != VIRT_GIC_VERSION_2 && !vms->highmem_redists) {
+error_printf("Try 'highmem-redists=on' for more CPUs\n");
+}
+
 exit(1);
 }
 
@@ -2371,6 +2377,49 @@ static void virt_set_compact_highmem(Object *obj, bool 
value, Error **errp)
 vms->highmem_compact = value;
 }
 
+static bool virt_get_highmem_redists(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_redists;
+}
+
+static void virt_set_highmem_redists(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_redists = value;
+}
+
+static bool virt_get_highmem_ecam(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_ecam;
+}
+
+static void virt_set_highmem_ecam(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_ecam = value;
+}
+
+static bool virt_get_highmem_mmio(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_mmio;
+}
+
+static void virt_set_highmem_mmio(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_mmio = value;
+}
+
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2996,6 +3045,28 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable 
compact "
   "layout for high memory regions");
 
+object_class_property_add_bool(oc, "highmem-redists",
+   virt_get_highmem_redists,
+   virt_set_highmem_redists);
+object_class_property_set_description(oc, "highmem-redists",
+  "Set on/off to enable/disable high "
+  "memory region for GICv3 or GICv4 "
+  "redistributor"

[PATCH v7 5/7] hw/arm/virt: Improve high memory region address assignment

2022-10-29 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is likely to be disabled by
code by toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_{memmap, high_memmap}() isn't
optimized because the high memory region's PA space is always reserved,
regardless of whatever the actual state in the corresponding
vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and
'vms->highest_gpa' are always increased for case (1), (2) and (3).
It's unnecessary since the assigned PA space for the disabled high
memory region won't be used afterwards.

Improve the address assignment for those three high memory region by
skipping the address assignment for one specific high memory region if
it has been disabled in case (1), (2) and (3). The memory layout may
be changed after the improvement is applied, which leads to potential
migration breakage. So 'vms->highmem_compact' is added to control if
the improvement should be applied. For now, 'vms->highmem_compact' is
set to false, meaning that we don't have memory layout change until it
becomes configurable through property 'compact-highmem' in next patch.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Reviewed-by: Marc Zyngier 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 15 ++-
 include/hw/arm/virt.h |  1 +
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ee98a8a3b6..3b4c995f80 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 vms->memmap[i].size = region_size;
 
 /*
- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
+ * Check each device to see if it fits in the PA space,
+ * moving highest_gpa as we go. For compatibility, move
+ * highest_gpa for disabled fitting devices as well, if
+ * the compact layout has been disabled.
  *
  * For each device that doesn't fit, disable it.
  */
 fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = region_base + region_size - 1;
+*region_enabled &= fits;
+if (vms->highmem_compact && !*region_enabled) {
+continue;
 }
 
-*region_enabled &= fits;
 base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = base - 1;
+}
 }
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 6ec479ca2b..709f623741 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -144,6 +144,7 @@ struct VirtMachineState {
 PFlashCFI01 *flash[2];
 bool secure;
 bool highmem;
+bool highmem_compact;
 bool highmem_ecam;
 bool highmem_mmio;
 bool highmem_redists;
-- 
2.23.0




[PATCH v7 6/7] hw/arm/virt: Add 'compact-highmem' property

2022-10-29 Thread Gavin Shan
After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region
is disabled or enabled when the optimization is applied or not, with
the following configuration. The configuration is only achievable by
modifying the source code until more properties are added to allow
users selectively disable those high memory regions.

  pa_bits  = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam= false;
  vms->highmem_mmio= true;

  # qemu-system-aarch64 -accel kvm -cpu host\
-machine virt-7.2,compact-highmem={on, off} \
-m 4G,maxmem=511G -monitor stdio

  Region compact-highmem=off compact-highmem=on
  
  MEM[1GB 512GB][1GB 512GB]
  HIGH_GIC_REDISTS2  [512GB   512GB+64MB]   [disabled]
  HIGH_PCIE_ECAM [512GB+256MB 512GB+512MB]  [disabled]
  HIGH_PCIE_MMIO [disabled] [512GB   1TB]

In order to keep backwords compatibility, we need to disable the
optimization on machine, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Reviewed-by: Marc Zyngier 
Tested-by: Zhenyu Zhang 
---
 docs/system/arm/virt.rst |  4 
 hw/arm/virt.c| 32 
 include/hw/arm/virt.h|  1 +
 3 files changed, 37 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..4454706392 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
   address space above 32 bits. The default is ``on`` for machine types
   later than ``virt-2.12``.
 
+compact-highmem
+  Set ``on``/``off`` to enable/disable the compact layout for high memory 
regions.
+  The default is ``on`` for machine types later than ``virt-7.2``.
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 3b4c995f80..020a95cfa2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -174,6 +174,12 @@ static const MemMapEntry base_memmap[] = {
  * Note the extended_memmap is sized so that it eventually also includes the
  * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last
  * index of base_memmap).
+ *
+ * The memory map for these Highmem IO Regions can be in legacy or compact
+ * layout, depending on 'compact-highmem' property. With legacy layout, the
+ * PA space for one specific region is always reserved, even if the region
+ * has been disabled or doesn't fit into the PA space. However, the PA space
+ * for the region won't be reserved in these circumstances with compact layout.
  */
 static MemMapEntry extended_memmap[] = {
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
@@ -2351,6 +2357,20 @@ static void virt_set_highmem(Object *obj, bool value, 
Error **errp)
 vms->highmem = value;
 }
 
+static bool virt_get_compact_highmem(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_compact;
+}
+
+static void virt_set_compact_highmem(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_compact = value;
+}
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2969,6 +2989,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable using "
   "physical address space above 32 
bits");
 
+object_class_property_add_bool(oc, "compact-highmem",
+   virt_get_compact_highmem,
+   virt_set_compact_highmem);
+object_class_property_set_description(oc, "compact-highmem",
+  "Set on/off to enable/disable 
compact "
+  "layout for high memory regions");
+
 object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
   virt_set_gic_version);
 object_class_property_set_description(oc, "gic-version",
@@ -3053,6 +3080,7 @@ static void virt_instance_init(Object *obj)
 
 /* High memory is enabled by default */
 vms->highmem = true;
+vms->highmem_compact = !vmc->no_highmem_compact;
 vms->gic_version = VIRT_GIC_VERSION_NOSEL;
 
 vms-&g

[PATCH v7 0/7] hw/arm/virt: Improve address assignment for high memory regions

2022-10-29 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions and introduces new properties for user to
selectively disable those 3 high memory regions.

PATCH[1-4] preparatory work for the improvment
PATCH[5]   improve high memory region address assignment
PATCH[6]   adds 'compact-highmem' to enable or disable the optimization
PATCH[7]   adds properties so that high memory regions can be disabled

v6: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00490.html
v5: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00280.html
v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html
v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html
v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
==
v7:
  * Pick review-by from Connie/Eric/Marc   
(Connie/Eric/Marc)
  * Use 'base - 1' to update 'vms->highest_gpa' in PATCH[v7 5] (Eric)
  * Replace 'even' with 'even if' in the comments about the
legacy and compact layout in PATCH[v7 6]   (Connie)
  * The maximal supported CPUs are calculated based on
property 'highmem-redists' and the improved comments and
error messages for this property   (Connie)
v6:
  * Pick review-by from Connie/Eric(Connie/Eric)
  * Make the changes obvious in PATCH[v6 5/7]  (Eric)
  * Move the example to commit log and describe the legacy
and compact layout in code's comments in PATCH[v6 6/7] (Eric)
  * Comment and commit message improvements(Connie/Eric)
  * Add 3 properties in PATCH[v6 7/7], allowing user to disable
those 3 high memory regions(Marc)
v5:
  * Pick review-by and tested-by   (Connie/Zhenyu)
  * Add extra check in PATCH[v5 4/6]   (Connie)
  * Improve comments about compatibility for disabled regions
in PATCH[v5 5/6]   (Connie)
v4:
  * Add virt_get_high_memmap_enabled() helper  (Eric)
  * Move 'vms->highmem_compact' and related logic from
PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect
breakage   (Eric)
  * Document the legacy and optimized high memory region
layout in commit log and source code   (Eric)
v3:
  * Reorder the patches(Gavin)
  * Add 'highmem-compact' property for backwards compatibility (Eric)
v2:
  * Split the patches for easier review(Gavin)
  * Improved changelog (Marc)
  * Use 'bool fits' in virt_set_high_memmap()  (Eric)

Gavin Shan (7):
  hw/arm/virt: Introduce virt_set_high_memmap() helper
  hw/arm/virt: Rename variable size to region_size in
virt_set_high_memmap()
  hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
  hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
  hw/arm/virt: Improve high memory region address assignment
  hw/arm/virt: Add 'compact-highmem' property
  hw/arm/virt: Add properties to disable high memory regions

 docs/system/arm/virt.rst |  17 
 hw/arm/virt.c| 193 ---
 include/hw/arm/virt.h|   2 +
 3 files changed, 177 insertions(+), 35 deletions(-)

-- 
2.23.0




[PATCH v7 4/7] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper

2022-10-29 Thread Gavin Shan
This introduces virt_get_high_memmap_enabled() helper, which returns
the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will
be used in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Reviewed-by: Marc Zyngier 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0bf3cb7057..ee98a8a3b6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,14 +1689,31 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms,
+ int index)
+{
+bool *enabled_array[] = {
+>highmem_redists,
+>highmem_ecam,
+>highmem_mmio,
+};
+
+assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST ==
+   ARRAY_SIZE(enabled_array));
+assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));
+
+return enabled_array[index - VIRT_LOWMEMMAP_LAST];
+}
+
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
 hwaddr region_base, region_size;
-bool fits;
+bool *region_enabled, fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_enabled = virt_get_high_memmap_enabled(vms, i);
 region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
@@ -1714,18 +1731,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 vms->highest_gpa = region_base + region_size - 1;
 }
 
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
+*region_enabled &= fits;
 base = region_base + region_size;
 }
 }
-- 
2.23.0




[PATCH v7 3/7] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()

2022-10-29 Thread Gavin Shan
This introduces variable 'region_base' for the base address of the
specific high memory region. It's the preparatory work to optimize
high memory region address assignment.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Reviewed-by: Marc Zyngier 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e2ae88cf8b..0bf3cb7057 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
-hwaddr region_size;
+hwaddr region_base, region_size;
 bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, region_size);
-vms->memmap[i].base = base;
+vms->memmap[i].base = region_base;
 vms->memmap[i].size = region_size;
 
 /*
@@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + region_size) <= BIT_ULL(pa_bits);
+fits = (region_base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + region_size - 1;
+vms->highest_gpa = region_base + region_size - 1;
 }
 
 switch (i) {
@@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += region_size;
+base = region_base + region_size;
 }
 }
 
-- 
2.23.0




[PATCH v7 2/7] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()

2022-10-29 Thread Gavin Shan
This renames variable 'size' to 'region_size' in virt_set_high_memmap().
Its counterpart ('region_base') will be introduced in next patch.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Reviewed-by: Marc Zyngier 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7572c44bda..e2ae88cf8b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
+hwaddr region_size;
+bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
+region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, size);
+base = ROUND_UP(base, region_size);
 vms->memmap[i].base = base;
-vms->memmap[i].size = size;
+vms->memmap[i].size = region_size;
 
 /*
  * Check each device to see if they fit in the PA space,
@@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + size) <= BIT_ULL(pa_bits);
+fits = (base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + size - 1;
+vms->highest_gpa = base + region_size - 1;
 }
 
 switch (i) {
@@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += size;
+base += region_size;
 }
 }
 
-- 
2.23.0




[PATCH v7 1/7] hw/arm/virt: Introduce virt_set_high_memmap() helper

2022-10-29 Thread Gavin Shan
This introduces virt_set_high_memmap() helper. The logic of high
memory region address assignment is moved to the helper. The intention
is to make the subsequent optimization for high memory region address
assignment easier.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Reviewed-by: Marc Zyngier 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 74 ---
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index cda9defe8f..7572c44bda 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_high_memmap(VirtMachineState *vms,
+ hwaddr base, int pa_bits)
+{
+int i;
+
+for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+hwaddr size = extended_memmap[i].size;
+bool fits;
+
+base = ROUND_UP(base, size);
+vms->memmap[i].base = base;
+vms->memmap[i].size = size;
+
+/*
+ * Check each device to see if they fit in the PA space,
+ * moving highest_gpa as we go.
+ *
+ * For each device that doesn't fit, disable it.
+ */
+fits = (base + size) <= BIT_ULL(pa_bits);
+if (fits) {
+vms->highest_gpa = base + size - 1;
+}
+
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+vms->highmem_redists &= fits;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+vms->highmem_ecam &= fits;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+vms->highmem_mmio &= fits;
+break;
+}
+
+base += size;
+}
+}
+
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
@@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 /* We know for sure that at least the memory fits in the PA space */
 vms->highest_gpa = memtop - 1;
 
-for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
-
-base = ROUND_UP(base, size);
-vms->memmap[i].base = base;
-vms->memmap[i].size = size;
-
-/*
- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
- *
- * For each device that doesn't fit, disable it.
- */
-fits = (base + size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = base + size - 1;
-}
-
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
-base += size;
-}
+virt_set_high_memmap(vms, base, pa_bits);
 
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
-- 
2.23.0




Re: [PATCH v6 7/7] hw/arm/virt: Add properties to disable high memory regions

2022-10-28 Thread Gavin Shan

Hi Connie,

On 10/26/22 7:10 PM, Cornelia Huck wrote:

On Wed, Oct 26 2022, Gavin Shan  wrote:


On 10/25/22 6:54 PM, Cornelia Huck wrote:

On Mon, Oct 24 2022, Gavin Shan  wrote:


These 3 high memory regions are usually enabled by default, but


s/These 3/The/ ?



Ok.


they may be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't
needed by GICv2. This leads to waste in the PA space.


When building the command line, do we have enough information on when
the regions provide something useful, and when they just waste space?



I think the help messages are already indicative. For example, the help
messages for 'highmem-redist2' indicate the region is only needed by
GICv3 or GICv4. 'highmem-ecam' and 'highmem-mmio' are needed by PCI ECAM
and MMIO and the key words 'high' indicates they're the corresponding
second window.

#./qemu-system-aarch64 -M virt,?
highmem-ecam=- Set on/off to enable/disable high memory region for 
PCI ECAM
highmem-mmio=- Set on/off to enable/disable high memory region for 
PCI MMIO
highmem-redists= - Set on/off to enable/disable high memory region for 
GICv3 or GICv4 redistributor


OK, hopefully this is enough for anyone building a command line
directly. (Do we want to encourage management software like libvirt to
switch off regions that are not needed?)



Yeah, to disable those regions aren't needed will make libvirt smart.
I think it's worthy for libvirt to do it. I'm not sure if arbitrary
machine properties have been supported by libvirt or not. For example,
'highmem-redists' can still be turned off from 'vm1.xml' even though
it's not explicitly supported in libvirt's code.





Add properties to allow users selectively disable them if needed:
"highmem-redists", "highmem-ecam", "highmem-mmio".

Suggested-by: Marc Zyngier 
Signed-off-by: Gavin Shan 
---
   docs/system/arm/virt.rst | 12 
   hw/arm/virt.c| 64 
   2 files changed, 76 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 4454706392..a1668a969d 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -98,6 +98,18 @@ compact-highmem
 Set ``on``/``off`` to enable/disable the compact layout for high memory 
regions.
 The default is ``on`` for machine types later than ``virt-7.2``.
   
+highmem-redists

+  Set ``on``/``off`` to enable/disable the high memry region for GICv3/4


s/memry/memory/



Ok, copy-and-paste error. Will be fixed.


+  redistributor. The default is ``on``.


Do we need to add a note about what effects setting this to "off" may
have, e.g. "Setting this to ``off`` may limit the maximum number of
cpus." or so? And/or "Setting this to ``off`` when using GICv2 will save
some space."?



We may not mention GICv2 since GICv3/v4 are already mentioned. It's a
good idea to mention that the maximum number of CPUs is reduced when
it's turned off. I will have something like below in next respin if
you agree.

highmem-redists
Set ``on``/``off`` to enable/disable the high memroy region for GICv3 or
GICv4 redistributor. The default is ``on``. Setting this to ``off`` will
limit the maximum number of CPUs when GICv3 or GICv4 is used.


OK, sounds reasonable to me.



Ok, Thanks for your confirm.



Since 'vms->highmem_redists' is changeable, the 'virt_max_cpus' in
machvirt_init() needs to be recalculated based on that. The code change
will be included into next respin. Besides, the follow-up error message
will be improved to something like below.

error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
 "supported by machine 'mach-virt' (%d). The high memory "
 "region for GICv3 or GICv4 redistributor has been %s",
 max_cpus, virt_max_cpus,
 vms->highmem_redists ? "enabled" : "disabled");


Hm, the doc for error_report() states that "The resulting message should
be a single phrase, with no newline or trailing punctuation." Maybe

 if (max_cpus > virt_max_cpus) {
 error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
  "supported by machine 'mach-virt' (%d)",
  max_cpus, virt_max_cpus);
 if (vms->gic_version != VIRT_GIC_VERSION_2 &&
 !vms->higmem_redists) {
 error_printf("Try 'highmem-redists=on' for more CPUs\n");
 }
 exit(1);
 }



Ok, I didn't notice the message raised by error_report() has this sort
of requirements. Your suggested snippet looks good to me and it will
be included in next respin. However, I would hold next respin (v7) for
a while to see if I can receive comments from Peter/Marc on v6 :)

Thanks,
Gavin




Re: [PATCH v6 5/7] hw/arm/virt: Improve high memory region address assignment

2022-10-28 Thread Gavin Shan

Hi Connie,

On 10/26/22 6:43 PM, Cornelia Huck wrote:

On Wed, Oct 26 2022, Gavin Shan  wrote:

On 10/26/22 12:29 AM, Eric Auger wrote:

On 10/24/22 05:54, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is likely to be disabled by
  code by toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
  'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
  on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
  PA space limit.

The current implementation of virt_set_{memmap, high_memmap}() isn't
optimized because the high memory region's PA space is always reserved,
regardless of whatever the actual state in the corresponding
vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and
'vms->highest_gpa' are always increased for case (1), (2) and (3).
It's unnecessary since the assigned PA space for the disabled high
memory region won't be used afterwards.

Improve the address assignment for those three high memory region by
skipping the address assignment for one specific high memory region if
it has been disabled in case (1), (2) and (3). The memory layout may
be changed after the improvement is applied, which leads to potential
migration breakage. So 'vms->highmem_compact' is added to control if
the improvement should be applied. For now, 'vms->highmem_compact' is
set to false, meaning that we don't have memory layout change until it
becomes configurable through property 'compact-highmem' in next patch.

Signed-off-by: Gavin Shan 
Reviewed-by: Cornelia Huck 

the code has quite changed since Connie's R-b


Right. Connie, could you please check if the changes make sense to you
and I can regain your R-B? :)


My R-b still holds.



Thanks.




Tested-by: Zhenyu Zhang 
---
   hw/arm/virt.c | 15 ++-
   include/hw/arm/virt.h |  1 +
   2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ee98a8a3b6..4896f600b4 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms,
   vms->memmap[i].size = region_size;
   
   /*

- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
+ * Check each device to see if it fits in the PA space,
+ * moving highest_gpa as we go. For compatibility, move
+ * highest_gpa for disabled fitting devices as well, if
+ * the compact layout has been disabled.
*
* For each device that doesn't fit, disable it.
*/
   fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = region_base + region_size - 1;
+*region_enabled &= fits;
+if (vms->highmem_compact && !*region_enabled) {
+continue;
   }
   
-*region_enabled &= fits;

   base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;


vms->highest_gpa = base - 1;



It's personal taste actually. I was thinking of using 'base - 1', but
'region_base + region_size - 1' looks more like a direct way. I don't
have strong sense though and lets use 'base - 1' in next respin.


I don't really have a preference for one or the other.



Ok. Lets use 'base - 1' in next respin.

Thanks,
Gavin




Re: [PATCH v6 7/7] hw/arm/virt: Add properties to disable high memory regions

2022-10-25 Thread Gavin Shan

Hi Connie,

On 10/25/22 6:54 PM, Cornelia Huck wrote:

On Mon, Oct 24 2022, Gavin Shan  wrote:


These 3 high memory regions are usually enabled by default, but


s/These 3/The/ ?



Ok.


they may be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't
needed by GICv2. This leads to waste in the PA space.


When building the command line, do we have enough information on when
the regions provide something useful, and when they just waste space?



I think the help messages are already indicative. For example, the help
messages for 'highmem-redist2' indicate the region is only needed by
GICv3 or GICv4. 'highmem-ecam' and 'highmem-mmio' are needed by PCI ECAM
and MMIO and the key words 'high' indicates they're the corresponding
second window.

#./qemu-system-aarch64 -M virt,?
highmem-ecam=- Set on/off to enable/disable high memory region for 
PCI ECAM
highmem-mmio=- Set on/off to enable/disable high memory region for 
PCI MMIO
highmem-redists= - Set on/off to enable/disable high memory region for 
GICv3 or GICv4 redistributor



Add properties to allow users selectively disable them if needed:
"highmem-redists", "highmem-ecam", "highmem-mmio".

Suggested-by: Marc Zyngier 
Signed-off-by: Gavin Shan 
---
  docs/system/arm/virt.rst | 12 
  hw/arm/virt.c| 64 
  2 files changed, 76 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 4454706392..a1668a969d 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -98,6 +98,18 @@ compact-highmem
Set ``on``/``off`` to enable/disable the compact layout for high memory 
regions.
The default is ``on`` for machine types later than ``virt-7.2``.
  
+highmem-redists

+  Set ``on``/``off`` to enable/disable the high memry region for GICv3/4


s/memry/memory/



Ok, copy-and-paste error. Will be fixed.


+  redistributor. The default is ``on``.


Do we need to add a note about what effects setting this to "off" may
have, e.g. "Setting this to ``off`` may limit the maximum number of
cpus." or so? And/or "Setting this to ``off`` when using GICv2 will save
some space."?



We may not mention GICv2 since GICv3/v4 are already mentioned. It's a
good idea to mention that the maximum number of CPUs is reduced when
it's turned off. I will have something like below in next respin if
you agree.

highmem-redists
  Set ``on``/``off`` to enable/disable the high memroy region for GICv3 or
  GICv4 redistributor. The default is ``on``. Setting this to ``off`` will
  limit the maximum number of CPUs when GICv3 or GICv4 is used.

Since 'vms->highmem_redists' is changeable, the 'virt_max_cpus' in
machvirt_init() needs to be recalculated based on that. The code change
will be included into next respin. Besides, the follow-up error message
will be improved to something like below.

  error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
   "supported by machine 'mach-virt' (%d). The high memory "
   "region for GICv3 or GICv4 redistributor has been %s",
   max_cpus, virt_max_cpus,
   vms->highmem_redists ? "enabled" : "disabled");


+
+highmem-ecam
+  Set ``on``/``off`` to enable/disable the high memry region for PCI ECAM.


s/memry/memory/



Ok, copy-and-paste error. Will be fixed.


+  The default is ``on`` for machine types later than ``virt-3.0``.
+
+highmem-mmio
+  Set ``on``/``off`` to enable/disable the high memry region for PCI MMIO.


s/memry/memory/



Ok. copy-and-paste error. Will be fixed.


+  The default is ``on``.
+
  gic-version
Specify the version of the Generic Interrupt Controller (GIC) to provide.
Valid values are:




Thanks,
Gavin




Re: [PATCH v6 6/7] hw/arm/virt: Add 'compact-highmem' property

2022-10-25 Thread Gavin Shan

Hi Connie,

On 10/25/22 6:30 PM, Cornelia Huck wrote:

On Mon, Oct 24 2022, Gavin Shan  wrote:


After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region
is disabled or enabled when the optimization is applied or not, with
the following configuration. The configuration is only achievable by
modifying the source code until more properties are added to allow
users selectively disable those high memory regions.

   pa_bits  = 40;
   vms->highmem_redists = false;
   vms->highmem_ecam= false;
   vms->highmem_mmio= true;

   # qemu-system-aarch64 -accel kvm -cpu host\
 -machine virt-7.2,compact-highmem={on, off} \
 -m 4G,maxmem=511G -monitor stdio

   Region compact-highmem=off compact-highmem=on
   
   MEM[1GB 512GB][1GB 512GB]
   HIGH_GIC_REDISTS2  [512GB   512GB+64MB]   [disabled]
   HIGH_PCIE_ECAM [512GB+256MB 512GB+512MB]  [disabled]
   HIGH_PCIE_MMIO [disabled] [512GB   1TB]

In order to keep backwords compatibility, we need to disable the
optimization on machine, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
  docs/system/arm/virt.rst |  4 
  hw/arm/virt.c| 32 
  include/hw/arm/virt.h|  1 +
  3 files changed, 37 insertions(+)



(...)


diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4896f600b4..11b5685432 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -174,6 +174,12 @@ static const MemMapEntry base_memmap[] = {
   * Note the extended_memmap is sized so that it eventually also includes the
   * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last
   * index of base_memmap).
+ *
+ * The memory map for these Highmem IO Regions can be in legacy or compact
+ * layout, depending on 'compact-highmem' property. With legacy layout, the
+ * PA space for one specific region is always reserved, even the region has


s/even/even if/



Thanks, it will be improved as suggested in next respin (v7).


+ * been disabled or doesn't fit into the PA space. However, the PA space for
+ * the region won't be reserved in these circumstances with compact layout.
   */
  static MemMapEntry extended_memmap[] = {
  /* Additional 64 MB redist region (can contain up to 512 redistributors) 
*/




Thanks,
Gavin




Re: [PATCH v6 5/7] hw/arm/virt: Improve high memory region address assignment

2022-10-25 Thread Gavin Shan

Hi Eric,

On 10/26/22 12:29 AM, Eric Auger wrote:

On 10/24/22 05:54, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is likely to be disabled by
 code by toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
 'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
 on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
 PA space limit.

The current implementation of virt_set_{memmap, high_memmap}() isn't
optimized because the high memory region's PA space is always reserved,
regardless of whatever the actual state in the corresponding
vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and
'vms->highest_gpa' are always increased for case (1), (2) and (3).
It's unnecessary since the assigned PA space for the disabled high
memory region won't be used afterwards.

Improve the address assignment for those three high memory region by
skipping the address assignment for one specific high memory region if
it has been disabled in case (1), (2) and (3). The memory layout may
be changed after the improvement is applied, which leads to potential
migration breakage. So 'vms->highmem_compact' is added to control if
the improvement should be applied. For now, 'vms->highmem_compact' is
set to false, meaning that we don't have memory layout change until it
becomes configurable through property 'compact-highmem' in next patch.

Signed-off-by: Gavin Shan 
Reviewed-by: Cornelia Huck 

the code has quite changed since Connie's R-b


Right. Connie, could you please check if the changes make sense to you
and I can regain your R-B? :)


Tested-by: Zhenyu Zhang 
---
  hw/arm/virt.c | 15 ++-
  include/hw/arm/virt.h |  1 +
  2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ee98a8a3b6..4896f600b4 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  vms->memmap[i].size = region_size;
  
  /*

- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
+ * Check each device to see if it fits in the PA space,
+ * moving highest_gpa as we go. For compatibility, move
+ * highest_gpa for disabled fitting devices as well, if
+ * the compact layout has been disabled.
   *
   * For each device that doesn't fit, disable it.
   */
  fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = region_base + region_size - 1;
+*region_enabled &= fits;
+if (vms->highmem_compact && !*region_enabled) {
+continue;
  }
  
-*region_enabled &= fits;

  base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;


vms->highest_gpa = base - 1;



It's personal taste actually. I was thinking of using 'base - 1', but
'region_base + region_size - 1' looks more like a direct way. I don't
have strong sense though and lets use 'base - 1' in next respin.


+}
  }
  }
  
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h

index 6ec479ca2b..709f623741 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -144,6 +144,7 @@ struct VirtMachineState {
  PFlashCFI01 *flash[2];
  bool secure;
  bool highmem;
+bool highmem_compact;
  bool highmem_ecam;
  bool highmem_mmio;
  bool highmem_redists;

Besides
Reviewed-by: Eric Auger 



Thanks,
Gavin




Re: [PATCH v6 0/7] hw/arm/virt: Improve address assignment for high memory regions

2022-10-25 Thread Gavin Shan

Hi Peter and Marc,

On 10/24/22 11:54 AM, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
 toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
 'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
 on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
 PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions and introduces new properties for user to
selectively disable those 3 high memory regions.

PATCH[1-4] preparatory work for the improvment
PATCH[5]   improve high memory region address assignment
PATCH[6]   adds 'compact-highmem' to enable or disable the optimization
PATCH[7]   adds properties so that high memory regions can be disabled

v5: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00280.html
v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html
v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html
v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html



Could you help to take a look when getting a chance? I think Connie and
Eric are close to complete the reviews, but v7 is still needed to address
extra comments from them. I hope to make v7 mergeable if possible :)

Thanks,
Gavin


Changelog
==
v6:
   * Pick review-by from Connie/Eric(Connie/Eric)
   * Make the changes obvious in PATCH[v6 5/7]  (Eric)
   * Move the example to commit log and describe the legacy
 and compact layout in code's comments in PATCH[v6 6/7] (Eric)
   * Comment and commit message improvements(Connie/Eric)
   * Add 3 properties in PATCH[v6 7/7], allowing user to disable
 those 3 high memory regions(Marc)
v5:
   * Pick review-by and tested-by   (Connie/Zhenyu)
   * Add extra check in PATCH[v5 4/6]   (Connie)
   * Improve comments about compatibility for disabled regions
 in PATCH[v5 5/6]   (Connie)
v4:
   * Add virt_get_high_memmap_enabled() helper  (Eric)
   * Move 'vms->highmem_compact' and related logic from
 PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect
 breakage   (Eric)
   * Document the legacy and optimized high memory region
 layout in commit log and source code   (Eric)
v3:
   * Reorder the patches(Gavin)
   * Add 'highmem-compact' property for backwards compatibility (Eric)
v2:
   * Split the patches for easier review(Gavin)
   * Improved changelog (Marc)
   * Use 'bool fits' in virt_set_high_memmap()  (Eric)

Gavin Shan (7):
   hw/arm/virt: Introduce virt_set_high_memmap() helper
   hw/arm/virt: Rename variable size to region_size in
 virt_set_high_memmap()
   hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
   hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
   hw/arm/virt: Improve high memory region address assignment
   hw/arm/virt: Add 'compact-highmem' property
   hw/arm/virt: Add properties to disable high memory regions

  docs/system/arm/virt.rst |  16 
  hw/arm/virt.c| 182 ---
  include/hw/arm/virt.h|   2 +
  3 files changed, 167 insertions(+), 33 deletions(-)






[PATCH v6 6/7] hw/arm/virt: Add 'compact-highmem' property

2022-10-24 Thread Gavin Shan
After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region
is disabled or enabled when the optimization is applied or not, with
the following configuration. The configuration is only achievable by
modifying the source code until more properties are added to allow
users selectively disable those high memory regions.

  pa_bits  = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam= false;
  vms->highmem_mmio= true;

  # qemu-system-aarch64 -accel kvm -cpu host\
-machine virt-7.2,compact-highmem={on, off} \
-m 4G,maxmem=511G -monitor stdio

  Region compact-highmem=off compact-highmem=on
  
  MEM[1GB 512GB][1GB 512GB]
  HIGH_GIC_REDISTS2  [512GB   512GB+64MB]   [disabled]
  HIGH_PCIE_ECAM [512GB+256MB 512GB+512MB]  [disabled]
  HIGH_PCIE_MMIO [disabled] [512GB   1TB]

In order to keep backwords compatibility, we need to disable the
optimization on machine, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
 docs/system/arm/virt.rst |  4 
 hw/arm/virt.c| 32 
 include/hw/arm/virt.h|  1 +
 3 files changed, 37 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..4454706392 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
   address space above 32 bits. The default is ``on`` for machine types
   later than ``virt-2.12``.
 
+compact-highmem
+  Set ``on``/``off`` to enable/disable the compact layout for high memory 
regions.
+  The default is ``on`` for machine types later than ``virt-7.2``.
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4896f600b4..11b5685432 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -174,6 +174,12 @@ static const MemMapEntry base_memmap[] = {
  * Note the extended_memmap is sized so that it eventually also includes the
  * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last
  * index of base_memmap).
+ *
+ * The memory map for these Highmem IO Regions can be in legacy or compact
+ * layout, depending on 'compact-highmem' property. With legacy layout, the
+ * PA space for one specific region is always reserved, even the region has
+ * been disabled or doesn't fit into the PA space. However, the PA space for
+ * the region won't be reserved in these circumstances with compact layout.
  */
 static MemMapEntry extended_memmap[] = {
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
@@ -2351,6 +2357,20 @@ static void virt_set_highmem(Object *obj, bool value, 
Error **errp)
 vms->highmem = value;
 }
 
+static bool virt_get_compact_highmem(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_compact;
+}
+
+static void virt_set_compact_highmem(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_compact = value;
+}
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2969,6 +2989,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable using "
   "physical address space above 32 
bits");
 
+object_class_property_add_bool(oc, "compact-highmem",
+   virt_get_compact_highmem,
+   virt_set_compact_highmem);
+object_class_property_set_description(oc, "compact-highmem",
+  "Set on/off to enable/disable 
compact "
+  "layout for high memory regions");
+
 object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
   virt_set_gic_version);
 object_class_property_set_description(oc, "gic-version",
@@ -3053,6 +3080,7 @@ static void virt_instance_init(Object *obj)
 
 /* High memory is enabled by default */
 vms->highmem = true;
+vms->highmem_compact = !vmc->no_highmem_compact;
 vms->gic_version = VIRT_GIC_VERSION_NOSEL;
 
 vms->highmem_ecam = !vmc->no_highmem_ecam;

[PATCH v6 2/7] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()

2022-10-24 Thread Gavin Shan
This renames variable 'size' to 'region_size' in virt_set_high_memmap().
Its counterpart ('region_base') will be introduced in next patch.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7572c44bda..e2ae88cf8b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
+hwaddr region_size;
+bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
+region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, size);
+base = ROUND_UP(base, region_size);
 vms->memmap[i].base = base;
-vms->memmap[i].size = size;
+vms->memmap[i].size = region_size;
 
 /*
  * Check each device to see if they fit in the PA space,
@@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + size) <= BIT_ULL(pa_bits);
+fits = (base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + size - 1;
+vms->highest_gpa = base + region_size - 1;
 }
 
 switch (i) {
@@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += size;
+base += region_size;
 }
 }
 
-- 
2.23.0




[PATCH v6 3/7] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()

2022-10-24 Thread Gavin Shan
This introduces variable 'region_base' for the base address of the
specific high memory region. It's the preparatory work to optimize
high memory region address assignment.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e2ae88cf8b..0bf3cb7057 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
-hwaddr region_size;
+hwaddr region_base, region_size;
 bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, region_size);
-vms->memmap[i].base = base;
+vms->memmap[i].base = region_base;
 vms->memmap[i].size = region_size;
 
 /*
@@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + region_size) <= BIT_ULL(pa_bits);
+fits = (region_base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + region_size - 1;
+vms->highest_gpa = region_base + region_size - 1;
 }
 
 switch (i) {
@@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += region_size;
+base = region_base + region_size;
 }
 }
 
-- 
2.23.0




[PATCH v6 4/7] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper

2022-10-23 Thread Gavin Shan
This introduces virt_get_high_memmap_enabled() helper, which returns
the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will
be used in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0bf3cb7057..ee98a8a3b6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,14 +1689,31 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms,
+ int index)
+{
+bool *enabled_array[] = {
+>highmem_redists,
+>highmem_ecam,
+>highmem_mmio,
+};
+
+assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST ==
+   ARRAY_SIZE(enabled_array));
+assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));
+
+return enabled_array[index - VIRT_LOWMEMMAP_LAST];
+}
+
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
 hwaddr region_base, region_size;
-bool fits;
+bool *region_enabled, fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_enabled = virt_get_high_memmap_enabled(vms, i);
 region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
@@ -1714,18 +1731,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 vms->highest_gpa = region_base + region_size - 1;
 }
 
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
+*region_enabled &= fits;
 base = region_base + region_size;
 }
 }
-- 
2.23.0




[PATCH v6 0/7] hw/arm/virt: Improve address assignment for high memory regions

2022-10-23 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions and introduces new properties for user to
selectively disable those 3 high memory regions.

PATCH[1-4] preparatory work for the improvment
PATCH[5]   improve high memory region address assignment
PATCH[6]   adds 'compact-highmem' to enable or disable the optimization
PATCH[7]   adds properties so that high memory regions can be disabled

v5: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00280.html
v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html
v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html
v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
==
v6:
  * Pick review-by from Connie/Eric(Connie/Eric)
  * Make the changes obvious in PATCH[v6 5/7]  (Eric)
  * Move the example to commit log and describe the legacy
and compact layout in code's comments in PATCH[v6 6/7] (Eric)
  * Comment and commit message improvements(Connie/Eric)
  * Add 3 properties in PATCH[v6 7/7], allowing user to disable
those 3 high memory regions(Marc)
v5:
  * Pick review-by and tested-by   (Connie/Zhenyu)
  * Add extra check in PATCH[v5 4/6]   (Connie)
  * Improve comments about compatibility for disabled regions
in PATCH[v5 5/6]   (Connie)
v4:
  * Add virt_get_high_memmap_enabled() helper  (Eric)
  * Move 'vms->highmem_compact' and related logic from
PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect
breakage   (Eric)
  * Document the legacy and optimized high memory region
layout in commit log and source code   (Eric)
v3:
  * Reorder the patches(Gavin)
  * Add 'highmem-compact' property for backwards compatibility (Eric)
v2:
  * Split the patches for easier review(Gavin)
  * Improved changelog (Marc)
  * Use 'bool fits' in virt_set_high_memmap()  (Eric)

Gavin Shan (7):
  hw/arm/virt: Introduce virt_set_high_memmap() helper
  hw/arm/virt: Rename variable size to region_size in
virt_set_high_memmap()
  hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
  hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
  hw/arm/virt: Improve high memory region address assignment
  hw/arm/virt: Add 'compact-highmem' property
  hw/arm/virt: Add properties to disable high memory regions

 docs/system/arm/virt.rst |  16 
 hw/arm/virt.c| 182 ---
 include/hw/arm/virt.h|   2 +
 3 files changed, 167 insertions(+), 33 deletions(-)

-- 
2.23.0




[PATCH v6 1/7] hw/arm/virt: Introduce virt_set_high_memmap() helper

2022-10-23 Thread Gavin Shan
This introduces virt_set_high_memmap() helper. The logic of high
memory region address assignment is moved to the helper. The intention
is to make the subsequent optimization for high memory region address
assignment easier.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 74 ---
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index cda9defe8f..7572c44bda 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_high_memmap(VirtMachineState *vms,
+ hwaddr base, int pa_bits)
+{
+int i;
+
+for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+hwaddr size = extended_memmap[i].size;
+bool fits;
+
+base = ROUND_UP(base, size);
+vms->memmap[i].base = base;
+vms->memmap[i].size = size;
+
+/*
+ * Check each device to see if they fit in the PA space,
+ * moving highest_gpa as we go.
+ *
+ * For each device that doesn't fit, disable it.
+ */
+fits = (base + size) <= BIT_ULL(pa_bits);
+if (fits) {
+vms->highest_gpa = base + size - 1;
+}
+
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+vms->highmem_redists &= fits;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+vms->highmem_ecam &= fits;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+vms->highmem_mmio &= fits;
+break;
+}
+
+base += size;
+}
+}
+
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
@@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 /* We know for sure that at least the memory fits in the PA space */
 vms->highest_gpa = memtop - 1;
 
-for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
-
-base = ROUND_UP(base, size);
-vms->memmap[i].base = base;
-vms->memmap[i].size = size;
-
-/*
- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
- *
- * For each device that doesn't fit, disable it.
- */
-fits = (base + size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = base + size - 1;
-}
-
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
-base += size;
-}
+virt_set_high_memmap(vms, base, pa_bits);
 
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
-- 
2.23.0




[PATCH v6 5/7] hw/arm/virt: Improve high memory region address assignment

2022-10-23 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is likely to be disabled by
code by toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_{memmap, high_memmap}() isn't
optimized because the high memory region's PA space is always reserved,
regardless of whatever the actual state in the corresponding
vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and
'vms->highest_gpa' are always increased for case (1), (2) and (3).
It's unnecessary since the assigned PA space for the disabled high
memory region won't be used afterwards.

Improve the address assignment for those three high memory region by
skipping the address assignment for one specific high memory region if
it has been disabled in case (1), (2) and (3). The memory layout may
be changed after the improvement is applied, which leads to potential
migration breakage. So 'vms->highmem_compact' is added to control if
the improvement should be applied. For now, 'vms->highmem_compact' is
set to false, meaning that we don't have memory layout change until it
becomes configurable through property 'compact-highmem' in next patch.

Signed-off-by: Gavin Shan 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 15 ++-
 include/hw/arm/virt.h |  1 +
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ee98a8a3b6..4896f600b4 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 vms->memmap[i].size = region_size;
 
 /*
- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
+ * Check each device to see if it fits in the PA space,
+ * moving highest_gpa as we go. For compatibility, move
+ * highest_gpa for disabled fitting devices as well, if
+ * the compact layout has been disabled.
  *
  * For each device that doesn't fit, disable it.
  */
 fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = region_base + region_size - 1;
+*region_enabled &= fits;
+if (vms->highmem_compact && !*region_enabled) {
+continue;
 }
 
-*region_enabled &= fits;
 base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;
+}
 }
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 6ec479ca2b..709f623741 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -144,6 +144,7 @@ struct VirtMachineState {
 PFlashCFI01 *flash[2];
 bool secure;
 bool highmem;
+bool highmem_compact;
 bool highmem_ecam;
 bool highmem_mmio;
 bool highmem_redists;
-- 
2.23.0




[PATCH v6 7/7] hw/arm/virt: Add properties to disable high memory regions

2022-10-23 Thread Gavin Shan
These 3 high memory regions are usually enabled by default, but
they may be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't
needed by GICv2. This leads to waste in the PA space.

Add properties to allow users selectively disable them if needed:
"highmem-redists", "highmem-ecam", "highmem-mmio".

Suggested-by: Marc Zyngier 
Signed-off-by: Gavin Shan 
---
 docs/system/arm/virt.rst | 12 
 hw/arm/virt.c| 64 
 2 files changed, 76 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 4454706392..a1668a969d 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -98,6 +98,18 @@ compact-highmem
   Set ``on``/``off`` to enable/disable the compact layout for high memory 
regions.
   The default is ``on`` for machine types later than ``virt-7.2``.
 
+highmem-redists
+  Set ``on``/``off`` to enable/disable the high memry region for GICv3/4
+  redistributor. The default is ``on``.
+
+highmem-ecam
+  Set ``on``/``off`` to enable/disable the high memry region for PCI ECAM.
+  The default is ``on`` for machine types later than ``virt-3.0``.
+
+highmem-mmio
+  Set ``on``/``off`` to enable/disable the high memry region for PCI MMIO.
+  The default is ``on``.
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 11b5685432..afafc2d1b8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2371,6 +2371,49 @@ static void virt_set_compact_highmem(Object *obj, bool 
value, Error **errp)
 vms->highmem_compact = value;
 }
 
+static bool virt_get_highmem_redists(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_redists;
+}
+
+static void virt_set_highmem_redists(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_redists = value;
+}
+
+static bool virt_get_highmem_ecam(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_ecam;
+}
+
+static void virt_set_highmem_ecam(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_ecam = value;
+}
+
+static bool virt_get_highmem_mmio(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_mmio;
+}
+
+static void virt_set_highmem_mmio(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_mmio = value;
+}
+
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2996,6 +3039,27 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable 
compact "
   "layout for high memory regions");
 
+object_class_property_add_bool(oc, "highmem-redists",
+   virt_get_highmem_redists,
+   virt_set_highmem_redists);
+object_class_property_set_description(oc, "highmem-redists",
+  "Set on/off to enable/disable high "
+  "memory region for GICv3/4 
redistributor");
+
+object_class_property_add_bool(oc, "highmem-ecam",
+   virt_get_highmem_ecam,
+   virt_set_highmem_ecam);
+object_class_property_set_description(oc, "highmem-ecam",
+  "Set on/off to enable/disable high "
+  "memory region for PCI ECAM");
+
+object_class_property_add_bool(oc, "highmem-mmio",
+   virt_get_highmem_mmio,
+   virt_set_highmem_mmio);
+object_class_property_set_description(oc, "highmem-mmio",
+  "Set on/off to enable/disable high "
+  "memory region for PCI MMIO");
+
 object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
   virt_set_gic_version);
 object_class_property_set_description(oc, "gic-version",
-- 
2.23.0




Re: [PATCH v5 6/6] hw/arm/virt: Add 'compact-highmem' property

2022-10-20 Thread Gavin Shan

Hi Marc,

On 10/20/22 5:44 PM, Marc Zyngier wrote:

On Thu, 20 Oct 2022 00:57:32 +0100,
Gavin Shan  wrote:


For Marc's suggestion to add properties so that these high memory
regions can be disabled by users. I can add one patch after this one
to introduce the following 3 properties. Could you please confirm
the property names are good enough? It's nice if Marc can help to
confirm before I'm going to work on next revision.

 "highmem-ecam":"on"/"off" on vms->highmem_ecam
 "highmem-mmio":"on"/"off" on vms->highmem_mmio
 "highmem-redists": "on"/"off" on vms->highmem_redists


I think that'd be reasonable, and would give the user some actual
control over what gets exposed in the highmem region.

I guess that the annoying thing with these options is that they allow
the user to request conflicting settings (256 CPUs and
highmem-redists=off, for example). You'll need to make this fail more
or less gracefully.



Thanks for the quick confirm. The check is already existing in machvirt_init().
I think what we need is simply to calculate 'virt_max_cpus' with consideration
to 'highmem-redists' property there.

if (max_cpus > virt_max_cpus) {
error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
 "supported by machine 'mach-virt' (%d)",
 max_cpus, virt_max_cpus);
exit(1);
}

Thanks,
Gavin




Re: [PATCH v5 6/6] hw/arm/virt: Add 'compact-highmem' property

2022-10-19 Thread Gavin Shan

Hi Eric,

On 10/20/22 4:18 AM, Eric Auger wrote:

On 10/12/22 01:18, Gavin Shan wrote:

After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region
is disabled or enabled when the optimization is applied or not, with
the following configuration.

   pa_bits  = 40;
   vms->highmem_redists = false;
   vms->highmem_ecam= false;
   vms->highmem_mmio= true;

   # qemu-system-aarch64 -accel kvm -cpu host\
 -machine virt-7.2,compact-highmem={on, off} \
 -m 4G,maxmem=511G -monitor stdio

   Regioncompact-highmem=off compact-highmem=on
   
   RAM   [1GB 512GB][1GB 512GB]
   HIGH_GIC_REDISTS  [512GB   512GB+64MB]   [disabled]
   HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB]  [disabled]
   HIGH_PCIE_MMIO[disabled] [512GB   1TB]

In order to keep backwords compatibility, we need to disable the
optimization on machines, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
Tested-by: Zhenyu Zhang 
---
  docs/system/arm/virt.rst |  4 
  hw/arm/virt.c| 47 
  include/hw/arm/virt.h|  1 +
  3 files changed, 52 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..75bf5a4994 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
address space above 32 bits. The default is ``on`` for machine types
later than ``virt-2.12``.
  
+compact-highmem

+  Set ``on``/``off`` to enable/disable compact space for high memory regions.
+  The default is ``on`` for machine types later than ``virt-7.2``
+
  gic-version
Specify the version of the Generic Interrupt Controller (GIC) to provide.
Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c05cfb5314..8f1dba0ece 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -174,6 +174,27 @@ static const MemMapEntry base_memmap[] = {
   * Note the extended_memmap is sized so that it eventually also includes the
   * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last
   * index of base_memmap).
+ *
+ * The addresses assigned to these regions are affected by 'compact-highmem'
+ * property, which is to enable or disable the compact space in the Highmem
+ * IO regions. For example, VIRT_HIGH_PCIE_MMIO can be disabled or enabled
+ * depending on the property in the following scenario.

To me you shall rather explain here what is the so-called "compact"
space vs the legacy highmem layout.

If I understand correctly the example rather legitimates the use of a
compat option showing how the layout can be affected by the option. I
would put that in the commit msg instead. Also in your example I see
VIRT_HIGH_GIC_REDISTS is disabled but the code does not disable the
region excpet if it does not fit within the PA. This does not match your
example. Also the region is named VIRT_HIGH_GIC_REDIST2.

In v4, Marc also suggested to have individual options for each highmem
region.
https://lore.kernel.org/qemu-devel/0f8e6a58-0dde-fb80-6966-7bb32c4df...@redhat.com/

Have you considered that option?



I think your comments make sense to me. So lets put the following comments
to the code and move the example to commit log.

  /*
   * The memory map for these Highmem IO Regions can be in legacy or compact
   * layout, depending on 'compact-highmem' property. In legacy layout, the
   * PA space for one specific region is always reserved, even the region has
   * been disabled or doesn't fit into the PA space. However, the PA space for
   * the region won't be reserved in these circumstances.
   */

You're correct about the example. VIRT_HIGH_GIC_REDIST2 should be used. Besides,
the configuration is only achievable by modifying source code at present, until
Marc's suggestion rolls in to allow users disable one particular high memory
regions by more properties. I will amend the commit log to have something like
below.

For example, VIRT_HIGH_PCIE_MMIO memory region is disabled or enabled when
the optimization is applied or not, with the following configuration. The
configuration is only achievable by modifying source code, until more 
properties
are added to allow user selectively disable those high memory regions.

For Marc's suggestion to add properties so that these high memory regions can
be disabled by users. I can add one patch after this one to introduce the 
following
3 properties. Could you please confirm the property names are good enough? It's
nice if Marc can help to

Re: [PATCH v5 6/6] hw/arm/virt: Add 'compact-highmem' property

2022-10-19 Thread Gavin Shan

Hi Connie,

On 10/19/22 10:00 PM, Cornelia Huck wrote:

On Wed, Oct 12 2022, Gavin Shan  wrote:


After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region
is disabled or enabled when the optimization is applied or not, with
the following configuration.

   pa_bits  = 40;
   vms->highmem_redists = false;
   vms->highmem_ecam= false;
   vms->highmem_mmio= true;

   # qemu-system-aarch64 -accel kvm -cpu host\
 -machine virt-7.2,compact-highmem={on, off} \
 -m 4G,maxmem=511G -monitor stdio

   Regioncompact-highmem=off compact-highmem=on
   
   RAM   [1GB 512GB][1GB 512GB]
   HIGH_GIC_REDISTS  [512GB   512GB+64MB]   [disabled]
   HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB]  [disabled]
   HIGH_PCIE_MMIO[disabled] [512GB   1TB]

In order to keep backwords compatibility, we need to disable the
optimization on machines, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
Tested-by: Zhenyu Zhang 
---
  docs/system/arm/virt.rst |  4 
  hw/arm/virt.c| 47 
  include/hw/arm/virt.h|  1 +
  3 files changed, 52 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..75bf5a4994 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
address space above 32 bits. The default is ``on`` for machine types
later than ``virt-2.12``.
  
+compact-highmem

+  Set ``on``/``off`` to enable/disable compact space for high memory regions.


Maybe s/compact space/the compact layout/ ?



Yeah, 'compact layout' is better. I will amend in next respin.


+  The default is ``on`` for machine types later than ``virt-7.2``
+
  gic-version
Specify the version of the Generic Interrupt Controller (GIC) to provide.
Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c05cfb5314..8f1dba0ece 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -174,6 +174,27 @@ static const MemMapEntry base_memmap[] = {
   * Note the extended_memmap is sized so that it eventually also includes the
   * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last
   * index of base_memmap).
+ *
+ * The addresses assigned to these regions are affected by 'compact-highmem'
+ * property, which is to enable or disable the compact space in the Highmem


s/space in/layout for/ ?



Agreed.


+ * IO regions. For example, VIRT_HIGH_PCIE_MMIO can be disabled or enabled
+ * depending on the property in the following scenario.
+ *
+ * pa_bits  = 40;
+ * vms->highmem_redists = false;
+ * vms->highmem_ecam= false;
+ * vms->highmem_mmio= true;
+ *
+ * # qemu-system-aarch64 -accel kvm -cpu host\
+ *   -machine virt-7.2,compact-highmem={on, off} \
+ *   -m 4G,maxmem=511G -monitor stdio
+ *
+ * Regioncompact-highmem=offcompact-highmem=on
+ * 
+ * RAM   [1GB 512GB][1GB 512GB]
+ * HIGH_GIC_REDISTS  [512GB   512GB+64MB]   [disabled]
+ * HIGH_PCIE_ECAM[512GB+256GB 512GB+512MB]  [disabled]
+ * HIGH_PCIE_MMIO[disabled] [512GB   1TB]
   */
  static MemMapEntry extended_memmap[] = {
  /* Additional 64 MB redist region (can contain up to 512 redistributors) 
*/


(...)


@@ -3124,8 +3167,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(7, 2)
  
  static void virt_machine_7_1_options(MachineClass *mc)

  {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
  virt_machine_7_2_options(mc);
  compat_props_add(mc->compat_props, hw_compat_7_1, hw_compat_7_1_len);
+/* Compact space for high memory regions was introduced with 7.2 */


s/space/layout/ ?



Ack.


+vmc->no_highmem_compact = true;
  }
  DEFINE_VIRT_MACHINE(7, 1)
  


Other than the wording nits, lgtm.

Reviewed-by: Cornelia Huck 



Thanks,
Gavin




Re: [PATCH v5 5/6] hw/arm/virt: Improve high memory region address assignment

2022-10-19 Thread Gavin Shan

Hi Eric,

On 10/20/22 4:07 AM, Eric Auger wrote:

On 10/12/22 01:18, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
 toggling vms->highmem_{redists, ecam, mmio}.

I would replace the above sentence by

One specific high memory region is likely to be disabled by the code by toggling 
vms->highmem_{redists, ecam, mmio}:



Ok.



(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
 'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
 on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
 PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3).

I would suggest:
isn't optimized because the high memory region PA range is always

reserved whatever the actual state of the corresponding vms->highmem_
* flag.



Ok. I will have something like below in next revision.

  The current implementation of virt_set_{memmap, high_memmap}() isn't
  optimized because the high memory region's PA space is always reserved,
  regardless of whatever the actual state in the corresponding
  vms->highmem_{redists, ecam, mmio} flag. In the code, 


  In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory

s/This improves/Improve


Ok.


region by skipping the address assignment for one specific high
memory region if it has been disabled in case (1), (2) and (3).
'vms->high_compact' is false for now, meaning that we don't have

s/hight_compat/highmem_compact

You also may justify the introduction of this new field.


Thanks. It should be 'highmem_compact'. Yes, it makes sense to
justify the addition of 'vms->highmem_compact'. I will have something
like below in next revision.

  The memory layout may be changed after the improvement is applied, which
  leads to potential migration breakage. So 'vms->highmem_compact' is added
  to control if the improvement should be applied. For now, 
'vms->highmem_compact'
  is set to false, meaning that we don't have memory layout change until it
  becomes configurable through property 'compact-highmem' in next patch.


any behavior changes until it becomes configurable through property
'compact-highmem' in next patch.

Signed-off-by: Gavin Shan 

Tested-by: Zhenyu Zhang 
---
  hw/arm/virt.c | 23 +++
  include/hw/arm/virt.h |  1 +
  2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ee98a8a3b6..c05cfb5314 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1717,22 +1717,29 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  region_base = ROUND_UP(base, extended_memmap[i].size);
  region_size = extended_memmap[i].size;
  
-vms->memmap[i].base = region_base;

-vms->memmap[i].size = region_size;
-
  /*
   * Check each device to see if they fit in the PA space,

while we are at it, you can change s/they fit/it fits


Agreed.


- * moving highest_gpa as we go.
+ * moving highest_gpa as we go. For compatibility, move
+ * highest_gpa for disabled fitting devices as well, if
+ * the compact layout has been disabled.
   *
   * For each device that doesn't fit, disable it.
   */
  fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
+if (*region_enabled && fits) {
+vms->memmap[i].base = region_base;
+vms->memmap[i].size = region_size;
  vms->highest_gpa = region_base + region_size - 1;
+base = region_base + region_size;
+} else {
+*region_enabled = false;
+if (!vms->highmem_compact) {
+base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;
+}
+}
  }
-
-*region_enabled &= fits;
-base = region_base + region_size;
  }
  }

This looks quite complicated to me. It is not obvious for instance we
have the same code as before when highmem_compact is not set. Typically

vms->memmap[i].base/size are not always set as they were to be and impact on 
the rest of the code must be double checked.

Could this be rewritten in that way (pseudocode totally untested).


static void fit_highmem_slot(vms, *base, i,

[PATCH v5 5/6] hw/arm/virt: Improve high memory region address assignment

2022-10-11 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory
region by skipping the address assignment for one specific high
memory region if it has been disabled in case (1), (2) and (3).
'vms->high_compact' is false for now, meaning that we don't have
any behavior changes until it becomes configurable through property
'compact-highmem' in next patch.

Signed-off-by: Gavin Shan 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 23 +++
 include/hw/arm/virt.h |  1 +
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ee98a8a3b6..c05cfb5314 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1717,22 +1717,29 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-vms->memmap[i].base = region_base;
-vms->memmap[i].size = region_size;
-
 /*
  * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
+ * moving highest_gpa as we go. For compatibility, move
+ * highest_gpa for disabled fitting devices as well, if
+ * the compact layout has been disabled.
  *
  * For each device that doesn't fit, disable it.
  */
 fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
+if (*region_enabled && fits) {
+vms->memmap[i].base = region_base;
+vms->memmap[i].size = region_size;
 vms->highest_gpa = region_base + region_size - 1;
+base = region_base + region_size;
+} else {
+*region_enabled = false;
+if (!vms->highmem_compact) {
+base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;
+}
+}
 }
-
-*region_enabled &= fits;
-base = region_base + region_size;
 }
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 6ec479ca2b..709f623741 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -144,6 +144,7 @@ struct VirtMachineState {
 PFlashCFI01 *flash[2];
 bool secure;
 bool highmem;
+bool highmem_compact;
 bool highmem_ecam;
 bool highmem_mmio;
 bool highmem_redists;
-- 
2.23.0




[PATCH v5 2/6] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()

2022-10-11 Thread Gavin Shan
This renames variable 'size' to 'region_size' in virt_set_high_memmap().
Its counterpart ('region_base') will be introduced in next patch.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7572c44bda..e2ae88cf8b 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
+hwaddr region_size;
+bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
+region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, size);
+base = ROUND_UP(base, region_size);
 vms->memmap[i].base = base;
-vms->memmap[i].size = size;
+vms->memmap[i].size = region_size;
 
 /*
  * Check each device to see if they fit in the PA space,
@@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + size) <= BIT_ULL(pa_bits);
+fits = (base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + size - 1;
+vms->highest_gpa = base + region_size - 1;
 }
 
 switch (i) {
@@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += size;
+base += region_size;
 }
 }
 
-- 
2.23.0




[PATCH v5 4/6] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper

2022-10-11 Thread Gavin Shan
This introduces virt_get_high_memmap_enabled() helper, which returns
the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will
be used in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0bf3cb7057..ee98a8a3b6 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,14 +1689,31 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms,
+ int index)
+{
+bool *enabled_array[] = {
+>highmem_redists,
+>highmem_ecam,
+>highmem_mmio,
+};
+
+assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST ==
+   ARRAY_SIZE(enabled_array));
+assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));
+
+return enabled_array[index - VIRT_LOWMEMMAP_LAST];
+}
+
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
 hwaddr region_base, region_size;
-bool fits;
+bool *region_enabled, fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_enabled = virt_get_high_memmap_enabled(vms, i);
 region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
@@ -1714,18 +1731,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 vms->highest_gpa = region_base + region_size - 1;
 }
 
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
+*region_enabled &= fits;
 base = region_base + region_size;
 }
 }
-- 
2.23.0




[PATCH v5 6/6] hw/arm/virt: Add 'compact-highmem' property

2022-10-11 Thread Gavin Shan
After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region
is disabled or enabled when the optimization is applied or not, with
the following configuration.

  pa_bits  = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam= false;
  vms->highmem_mmio= true;

  # qemu-system-aarch64 -accel kvm -cpu host\
-machine virt-7.2,compact-highmem={on, off} \
-m 4G,maxmem=511G -monitor stdio

  Regioncompact-highmem=off compact-highmem=on
  
  RAM   [1GB 512GB][1GB 512GB]
  HIGH_GIC_REDISTS  [512GB   512GB+64MB]   [disabled]
  HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB]  [disabled]
  HIGH_PCIE_MMIO[disabled] [512GB   1TB]

In order to keep backwords compatibility, we need to disable the
optimization on machines, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
Tested-by: Zhenyu Zhang 
---
 docs/system/arm/virt.rst |  4 
 hw/arm/virt.c| 47 
 include/hw/arm/virt.h|  1 +
 3 files changed, 52 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..75bf5a4994 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
   address space above 32 bits. The default is ``on`` for machine types
   later than ``virt-2.12``.
 
+compact-highmem
+  Set ``on``/``off`` to enable/disable compact space for high memory regions.
+  The default is ``on`` for machine types later than ``virt-7.2``
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c05cfb5314..8f1dba0ece 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -174,6 +174,27 @@ static const MemMapEntry base_memmap[] = {
  * Note the extended_memmap is sized so that it eventually also includes the
  * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last
  * index of base_memmap).
+ *
+ * The addresses assigned to these regions are affected by 'compact-highmem'
+ * property, which is to enable or disable the compact space in the Highmem
+ * IO regions. For example, VIRT_HIGH_PCIE_MMIO can be disabled or enabled
+ * depending on the property in the following scenario.
+ *
+ * pa_bits  = 40;
+ * vms->highmem_redists = false;
+ * vms->highmem_ecam= false;
+ * vms->highmem_mmio= true;
+ *
+ * # qemu-system-aarch64 -accel kvm -cpu host\
+ *   -machine virt-7.2,compact-highmem={on, off} \
+ *   -m 4G,maxmem=511G -monitor stdio
+ *
+ * Regioncompact-highmem=offcompact-highmem=on
+ * 
+ * RAM   [1GB 512GB][1GB 512GB]
+ * HIGH_GIC_REDISTS  [512GB   512GB+64MB]   [disabled]
+ * HIGH_PCIE_ECAM[512GB+256GB 512GB+512MB]  [disabled]
+ * HIGH_PCIE_MMIO[disabled] [512GB   1TB]
  */
 static MemMapEntry extended_memmap[] = {
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
@@ -2353,6 +2374,20 @@ static void virt_set_highmem(Object *obj, bool value, 
Error **errp)
 vms->highmem = value;
 }
 
+static bool virt_get_compact_highmem(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_compact;
+}
+
+static void virt_set_compact_highmem(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_compact = value;
+}
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2971,6 +3006,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable using "
   "physical address space above 32 
bits");
 
+object_class_property_add_bool(oc, "compact-highmem",
+   virt_get_compact_highmem,
+   virt_set_compact_highmem);
+object_class_property_set_description(oc, "compact-highmem",
+  "Set on/off to enable/disable 
compact "
+  "space for high memory regions");
+
 object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
   virt_set_gic_version)

[PATCH v5 3/6] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()

2022-10-11 Thread Gavin Shan
This introduces variable 'region_base' for the base address of the
specific high memory region. It's the preparatory work to optimize
high memory region address assignment.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e2ae88cf8b..0bf3cb7057 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
-hwaddr region_size;
+hwaddr region_base, region_size;
 bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, region_size);
-vms->memmap[i].base = base;
+vms->memmap[i].base = region_base;
 vms->memmap[i].size = region_size;
 
 /*
@@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + region_size) <= BIT_ULL(pa_bits);
+fits = (region_base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + region_size - 1;
+vms->highest_gpa = region_base + region_size - 1;
 }
 
 switch (i) {
@@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += region_size;
+base = region_base + region_size;
 }
 }
 
-- 
2.23.0




[PATCH v5 1/6] hw/arm/virt: Introduce virt_set_high_memmap() helper

2022-10-11 Thread Gavin Shan
This introduces virt_set_high_memmap() helper. The logic of high
memory region address assignment is moved to the helper. The intention
is to make the subsequent optimization for high memory region address
assignment easier.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
Tested-by: Zhenyu Zhang 
---
 hw/arm/virt.c | 74 ---
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index cda9defe8f..7572c44bda 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_high_memmap(VirtMachineState *vms,
+ hwaddr base, int pa_bits)
+{
+int i;
+
+for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+hwaddr size = extended_memmap[i].size;
+bool fits;
+
+base = ROUND_UP(base, size);
+vms->memmap[i].base = base;
+vms->memmap[i].size = size;
+
+/*
+ * Check each device to see if they fit in the PA space,
+ * moving highest_gpa as we go.
+ *
+ * For each device that doesn't fit, disable it.
+ */
+fits = (base + size) <= BIT_ULL(pa_bits);
+if (fits) {
+vms->highest_gpa = base + size - 1;
+}
+
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+vms->highmem_redists &= fits;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+vms->highmem_ecam &= fits;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+vms->highmem_mmio &= fits;
+break;
+}
+
+base += size;
+}
+}
+
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
@@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 /* We know for sure that at least the memory fits in the PA space */
 vms->highest_gpa = memtop - 1;
 
-for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
-
-base = ROUND_UP(base, size);
-vms->memmap[i].base = base;
-vms->memmap[i].size = size;
-
-/*
- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
- *
- * For each device that doesn't fit, disable it.
- */
-fits = (base + size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = base + size - 1;
-}
-
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
-base += size;
-}
+virt_set_high_memmap(vms, base, pa_bits);
 
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
-- 
2.23.0




[PATCH v5 0/6] hw/arm/virt: Improve address assignment for high memory regions

2022-10-11 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions.

PATCH[1-4] preparatory work for the improvment
PATCH[5]   improve high memory region address assignment
PATCH[6]   adds 'compact-highmem' to enable or disable the optimization

v4: https://lists.nongnu.org/archive/html/qemu-arm/2022-10/msg00067.html
v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html
v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
==
v5:
  * Pick review-by and tested-by   (Connie/Zhenyu)
  * Add extra check in PATCH[v5 4/6]   (Connie)
  * Improve comments about compatibility for disabled regions
in PATCH[v5 5/6]   (Connie)
v4:
  * Add virt_get_high_memmap_enabled() helper  (Eric)
  * Move 'vms->highmem_compact' and related logic from
PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect
breakage   (Eric)
  * Document the legacy and optimized high memory region
layout in commit log and source code   (Eric)
v3:
  * Reorder the patches(Gavin)
  * Add 'highmem-compact' property for backwards compatibility (Eric)
v2:
  * Split the patches for easier review(Gavin)
  * Improved changelog (Marc)
  * Use 'bool fits' in virt_set_high_memmap()  (Eric)

Gavin Shan (6):
  hw/arm/virt: Introduce virt_set_high_memmap() helper
  hw/arm/virt: Rename variable size to region_size in
virt_set_high_memmap()
  hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
  hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
  hw/arm/virt: Improve high memory region address assignment
  hw/arm/virt: Add 'compact-highmem' property

 docs/system/arm/virt.rst |   4 ++
 hw/arm/virt.c| 135 +--
 include/hw/arm/virt.h|   2 +
 3 files changed, 108 insertions(+), 33 deletions(-)

-- 
2.23.0




Re: [PATCH v4 4/6] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper

2022-10-11 Thread Gavin Shan

On 10/12/22 12:45 AM, Eric Auger wrote:

On 10/5/22 00:47, Gavin Shan wrote:

On 10/4/22 6:41 PM, Cornelia Huck wrote:

On Tue, Oct 04 2022, Gavin Shan  wrote:


This introduces virt_get_high_memmap_enabled() helper, which returns
the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will
be used in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 
---
   hw/arm/virt.c | 30 +-
   1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0b679d1f4..59de7b78b5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,14 +1689,29 @@ static uint64_t
virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
   return arm_cpu_mp_affinity(idx, clustersz);
   }
   +static inline bool *virt_get_high_memmap_enabled(VirtMachineState
*vms,
+ int index)
+{
+    bool *enabled_array[] = {
+    >highmem_redists,
+    >highmem_ecam,
+    >highmem_mmio,
+    };
+
+    assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));


I wonder whether we want an assert(ARRAY_SIZE(extended_memmap) ==
ARRAY_SIZE(enabled_array))? IIUC, we never want those two to get out of
sync?



Yeah, It makes sense to ensure both arrays synchronized. I will add
the extra check in next respin.


With Connie's suggestion this looks good to me.



What we need is actually like below because the array (extended_memmap)
starts from VIRT_LOWMEMMAP_LAST instead of zero. I'm adding the extra
check into v5, which will be posted shortly.

   assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST ==
  ARRAY_SIZE(enabled_array));




+
+    return enabled_array[index - VIRT_LOWMEMMAP_LAST];
+}
+


Thanks,
Gavin




Re: [PATCH v4 6/6] hw/arm/virt: Add 'compact-highmem' property

2022-10-04 Thread Gavin Shan

Hi Marc,

On 10/5/22 1:39 AM, Marc Zyngier wrote:

On Tue, 04 Oct 2022 01:26:27 +0100,
Gavin Shan  wrote:


After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region
is disabled or enabled when the optimization is applied or not, with
the following configuration.

   pa_bits  = 40;
   vms->highmem_redists = false;
   vms->highmem_ecam= false;
   vms->highmem_mmio= true;


The question is how are these parameters specified by a user? Short of
hacking the code, this isn't really possible.



Yeah, It's impossible to have false for vms->highmem_redists unless
the code is hacked.



   # qemu-system-aarch64 -accel kvm -cpu host\
 -machine virt-7.2,compact-highmem={on, off} \
 -m 4G,maxmem=511G -monitor stdio

   Regioncompact-highmem=off compact-highmem=on
   
   RAM   [1GB 512GB][1GB 512GB]
   HIGH_GIC_REDISTS  [512GB   512GB+64MB]   [disabled]
   HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB]  [disabled]
   HIGH_PCIE_MMIO[disabled] [512GB   1TB]

In order to keep backwords compatibility, we need to disable the
optimization on machines, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.


Not directly related to this series, but it seems to me that we should
be aiming at reproducible results across HW implementations (at least
with KVM). Depending on how many PA bits the HW implements, we end-up
with a set of devices or another, which is likely to be confusing for
a user.

I think we should consider an additional set of changes to allow a
user to specify the PA bits as well as the devices they want to see
enabled.



I think the idea to selectively enable devices (high memory regions)
is sensible. For example, users may needn't HIGH_PCIE_MMIO at all
in some systems, where they have limited PCI devices.

I'm not sure about PA bits because it has been discovered from hardware
and configure the automatically optimized value/bits back to KVM. The
optimized value/bits is automatically calculated based on the enabled
high memory regions.

Thanks,
Gavin




Re: [PATCH v4 5/6] hw/arm/virt: Improve high memory region address

2022-10-04 Thread Gavin Shan

Hi Connie,

On 10/4/22 6:53 PM, Cornelia Huck wrote:

On Tue, Oct 04 2022, Gavin Shan  wrote:


There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
 toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
 'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
 on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
 PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory
region by skipping the address assignment for one specific high
memory region if it has been disabled in case (1), (2) and (3).
'vms->high_compact' is false for now, meaning that we don't have
any behavior changes until it becomes configurable through property
'compact-highmem' in next patch.

Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 19 ---
  include/hw/arm/virt.h |  1 +
  2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 59de7b78b5..4164da49e9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1715,9 +1715,6 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  region_base = ROUND_UP(base, extended_memmap[i].size);
  region_size = extended_memmap[i].size;
  
-vms->memmap[i].base = region_base;

-vms->memmap[i].size = region_size;
-
  /*
   * Check each device to see if they fit in the PA space,
   * moving highest_gpa as we go.


Maybe tweak this comment?

"Check each enabled device to see if they fit in the PA space,
moving highest_gpa as we go. For compatibility, move highest_gpa
for disabled fitting devices as well, if the compact layout has
been disabled."

(Or would that be overkill?)



It looks overkill to me since the code is simple and clear. However,
comments won't be harmful. I will integrate the proposed comment
in next respin.

Thanks,
Gavin






Re: [PATCH v4 4/6] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper

2022-10-04 Thread Gavin Shan

Hi Connie,

On 10/4/22 6:41 PM, Cornelia Huck wrote:

On Tue, Oct 04 2022, Gavin Shan  wrote:


This introduces virt_get_high_memmap_enabled() helper, which returns
the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will
be used in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 30 +-
  1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0b679d1f4..59de7b78b5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,14 +1689,29 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
  return arm_cpu_mp_affinity(idx, clustersz);
  }
  
+static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms,

+ int index)
+{
+bool *enabled_array[] = {
+>highmem_redists,
+>highmem_ecam,
+>highmem_mmio,
+};
+
+assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));


I wonder whether we want an assert(ARRAY_SIZE(extended_memmap) ==
ARRAY_SIZE(enabled_array))? IIUC, we never want those two to get out of
sync?



Yeah, It makes sense to ensure both arrays synchronized. I will add
the extra check in next respin.


+
+return enabled_array[index - VIRT_LOWMEMMAP_LAST];
+}
+


Thanks,
Gavin




[PATCH v4 5/6] hw/arm/virt: Improve high memory region address

2022-10-03 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory
region by skipping the address assignment for one specific high
memory region if it has been disabled in case (1), (2) and (3).
'vms->high_compact' is false for now, meaning that we don't have
any behavior changes until it becomes configurable through property
'compact-highmem' in next patch.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 19 ---
 include/hw/arm/virt.h |  1 +
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 59de7b78b5..4164da49e9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1715,9 +1715,6 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-vms->memmap[i].base = region_base;
-vms->memmap[i].size = region_size;
-
 /*
  * Check each device to see if they fit in the PA space,
  * moving highest_gpa as we go.
@@ -1725,12 +1722,20 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  * For each device that doesn't fit, disable it.
  */
 fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
+if (*region_enabled && fits) {
+vms->memmap[i].base = region_base;
+vms->memmap[i].size = region_size;
 vms->highest_gpa = region_base + region_size - 1;
+base = region_base + region_size;
+} else {
+*region_enabled = false;
+if (!vms->highmem_compact) {
+base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;
+}
+}
 }
-
-*region_enabled &= fits;
-base = region_base + region_size;
 }
 }
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 6ec479ca2b..709f623741 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -144,6 +144,7 @@ struct VirtMachineState {
 PFlashCFI01 *flash[2];
 bool secure;
 bool highmem;
+bool highmem_compact;
 bool highmem_ecam;
 bool highmem_mmio;
 bool highmem_redists;
-- 
2.23.0




[PATCH v4 6/6] hw/arm/virt: Add 'compact-highmem' property

2022-10-03 Thread Gavin Shan
After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region
is disabled or enabled when the optimization is applied or not, with
the following configuration.

  pa_bits  = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam= false;
  vms->highmem_mmio= true;

  # qemu-system-aarch64 -accel kvm -cpu host\
-machine virt-7.2,compact-highmem={on, off} \
-m 4G,maxmem=511G -monitor stdio

  Regioncompact-highmem=off compact-highmem=on
  
  RAM   [1GB 512GB][1GB 512GB]
  HIGH_GIC_REDISTS  [512GB   512GB+64MB]   [disabled]
  HIGH_PCIE_ECAM[512GB+256MB 512GB+512MB]  [disabled]
  HIGH_PCIE_MMIO[disabled] [512GB   1TB]

In order to keep backwords compatibility, we need to disable the
optimization on machines, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
---
 docs/system/arm/virt.rst |  4 
 hw/arm/virt.c| 47 
 include/hw/arm/virt.h|  1 +
 3 files changed, 52 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..75bf5a4994 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
   address space above 32 bits. The default is ``on`` for machine types
   later than ``virt-2.12``.
 
+compact-highmem
+  Set ``on``/``off`` to enable/disable compact space for high memory regions.
+  The default is ``on`` for machine types later than ``virt-7.2``
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4164da49e9..9fe65a2ae1 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -174,6 +174,27 @@ static const MemMapEntry base_memmap[] = {
  * Note the extended_memmap is sized so that it eventually also includes the
  * base_memmap entries (VIRT_HIGH_GIC_REDIST2 index is greater than the last
  * index of base_memmap).
+ *
+ * The addresses assigned to these regions are affected by 'compact-highmem'
+ * property, which is to enable or disable the compact space in the Highmem
+ * IO regions. For example, VIRT_HIGH_PCIE_MMIO can be disabled or enabled
+ * depending on the property in the following scenario.
+ *
+ * pa_bits  = 40;
+ * vms->highmem_redists = false;
+ * vms->highmem_ecam= false;
+ * vms->highmem_mmio= true;
+ *
+ * # qemu-system-aarch64 -accel kvm -cpu host\
+ *   -machine virt-7.2,compact-highmem={on, off} \
+ *   -m 4G,maxmem=511G -monitor stdio
+ *
+ * Regioncompact-highmem=offcompact-highmem=on
+ * 
+ * RAM   [1GB 512GB][1GB 512GB]
+ * HIGH_GIC_REDISTS  [512GB   512GB+64MB]   [disabled]
+ * HIGH_PCIE_ECAM[512GB+256GB 512GB+512MB]  [disabled]
+ * HIGH_PCIE_MMIO[disabled] [512GB   1TB]
  */
 static MemMapEntry extended_memmap[] = {
 /* Additional 64 MB redist region (can contain up to 512 redistributors) */
@@ -2349,6 +2370,20 @@ static void virt_set_highmem(Object *obj, bool value, 
Error **errp)
 vms->highmem = value;
 }
 
+static bool virt_get_compact_highmem(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_compact;
+}
+
+static void virt_set_compact_highmem(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_compact = value;
+}
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2967,6 +3002,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable using "
   "physical address space above 32 
bits");
 
+object_class_property_add_bool(oc, "compact-highmem",
+   virt_get_compact_highmem,
+   virt_set_compact_highmem);
+object_class_property_set_description(oc, "compact-highmem",
+  "Set on/off to enable/disable 
compact "
+  "space for high memory regions");
+
 object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
   virt_set_gic_version);
 object_class_proper

[PATCH v4 4/6] hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper

2022-10-03 Thread Gavin Shan
This introduces virt_get_high_memmap_enabled() helper, which returns
the pointer to vms->highmem_{redists, ecam, mmio}. The pointer will
be used in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0b679d1f4..59de7b78b5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,14 +1689,29 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms,
+ int index)
+{
+bool *enabled_array[] = {
+>highmem_redists,
+>highmem_ecam,
+>highmem_mmio,
+};
+
+assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));
+
+return enabled_array[index - VIRT_LOWMEMMAP_LAST];
+}
+
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
 hwaddr region_base, region_size;
-bool fits;
+bool *region_enabled, fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_enabled = virt_get_high_memmap_enabled(vms, i);
 region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
@@ -1714,18 +1729,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 vms->highest_gpa = region_base + region_size - 1;
 }
 
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
+*region_enabled &= fits;
 base = region_base + region_size;
 }
 }
-- 
2.23.0




[PATCH v4 0/6] hw/arm/virt: Improve address assignment for high memory regions

2022-10-03 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions.

PATCH[1-4] preparatory work for the improvment
PATCH[5]   improve high memory region address assignment
PATCH[6]   adds 'compact-highmem' to enable or disable the optimization

History
===
v3: https://lists.nongnu.org/archive/html/qemu-arm/2022-09/msg00258.html
v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
==
v4:
  * Add virt_get_high_memmap_enabled() helper  (Eric)
  * Move 'vms->highmem_compact' and related logic from
PATCH[v4 6/6] to PATCH[v4 5/6] to avoid git-bisect
breakage   (Eric)
  * Document the legacy and optimized high memory region
layout in commit log and source code   (Eric)
v3:
  * Reorder the patches(Gavin)
  * Add 'highmem-compact' property for backwards compatibility (Eric)
v2:
  * Split the patches for easier review(Gavin)
  * Improved changelog (Marc)
  * Use 'bool fits' in virt_set_high_memmap()  (Eric)

Gavin Shan (6):
  hw/arm/virt: Introduce virt_set_high_memmap() helper
  hw/arm/virt: Rename variable size to region_size in
virt_set_high_memmap()
  hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
  hw/arm/virt: Introduce virt_get_high_memmap_enabled() helper
  hw/arm/virt: Improve high memory region address
  hw/arm/virt: Add 'compact-highmem' property

 docs/system/arm/virt.rst |   4 ++
 hw/arm/virt.c| 131 +--
 include/hw/arm/virt.h|   2 +
 3 files changed, 104 insertions(+), 33 deletions(-)

-- 
2.23.0




[PATCH v4 2/6] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()

2022-10-03 Thread Gavin Shan
This renames variable 'size' to 'region_size' in virt_set_high_memmap().
Its counterpart ('region_base') will be introduced in next patch.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
---
 hw/arm/virt.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4dab528b82..187b3ee0e2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
+hwaddr region_size;
+bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
+region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, size);
+base = ROUND_UP(base, region_size);
 vms->memmap[i].base = base;
-vms->memmap[i].size = size;
+vms->memmap[i].size = region_size;
 
 /*
  * Check each device to see if they fit in the PA space,
@@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + size) <= BIT_ULL(pa_bits);
+fits = (base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + size - 1;
+vms->highest_gpa = base + region_size - 1;
 }
 
 switch (i) {
@@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += size;
+base += region_size;
 }
 }
 
-- 
2.23.0




[PATCH v4 1/6] hw/arm/virt: Introduce virt_set_high_memmap() helper

2022-10-03 Thread Gavin Shan
This introduces virt_set_high_memmap() helper. The logic of high
memory region address assignment is moved to the helper. The intention
is to make the subsequent optimization for high memory region address
assignment easier.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
Reviewed-by: Cornelia Huck 
---
 hw/arm/virt.c | 74 ---
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0961e053e5..4dab528b82 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_high_memmap(VirtMachineState *vms,
+ hwaddr base, int pa_bits)
+{
+int i;
+
+for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+hwaddr size = extended_memmap[i].size;
+bool fits;
+
+base = ROUND_UP(base, size);
+vms->memmap[i].base = base;
+vms->memmap[i].size = size;
+
+/*
+ * Check each device to see if they fit in the PA space,
+ * moving highest_gpa as we go.
+ *
+ * For each device that doesn't fit, disable it.
+ */
+fits = (base + size) <= BIT_ULL(pa_bits);
+if (fits) {
+vms->highest_gpa = base + size - 1;
+}
+
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+vms->highmem_redists &= fits;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+vms->highmem_ecam &= fits;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+vms->highmem_mmio &= fits;
+break;
+}
+
+base += size;
+}
+}
+
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
@@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 /* We know for sure that at least the memory fits in the PA space */
 vms->highest_gpa = memtop - 1;
 
-for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
-
-base = ROUND_UP(base, size);
-vms->memmap[i].base = base;
-vms->memmap[i].size = size;
-
-/*
- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
- *
- * For each device that doesn't fit, disable it.
- */
-fits = (base + size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = base + size - 1;
-}
-
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
-base += size;
-}
+virt_set_high_memmap(vms, base, pa_bits);
 
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
-- 
2.23.0




[PATCH v4 3/6] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()

2022-10-03 Thread Gavin Shan
This introduces variable 'region_base' for the base address of the
specific high memory region. It's the preparatory work to optimize
high memory region address assignment.

No functional change intended.

Signed-off-by: Gavin Shan 
Reviewed-by: Eric Auger 
---
 hw/arm/virt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 187b3ee0e2..b0b679d1f4 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
-hwaddr region_size;
+hwaddr region_base, region_size;
 bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, region_size);
-vms->memmap[i].base = base;
+vms->memmap[i].base = region_base;
 vms->memmap[i].size = region_size;
 
 /*
@@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + region_size) <= BIT_ULL(pa_bits);
+fits = (region_base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + region_size - 1;
+vms->highest_gpa = region_base + region_size - 1;
 }
 
 switch (i) {
@@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += region_size;
+base = region_base + region_size;
 }
 }
 
-- 
2.23.0




Re: [PATCH v3 5/5] hw/arm/virt: Add 'highmem-compact' property

2022-10-03 Thread Gavin Shan

Hi Eric,

On 10/3/22 4:49 PM, Eric Auger wrote:

On 9/29/22 01:49, Gavin Shan wrote:

On 9/28/22 10:22 PM, Eric Auger wrote:

On 9/22/22 01:13, Gavin Shan wrote:

After the improvement to high memory region address assignment is
applied, the memory layout is changed. For example, VIRT_HIGH_PCIE_MMIO

s/the memory layout is changed./the memory layout is changed,
introducing possible migration breakage.


Ok, much clearer.


memory region is enabled when the improvement is applied, but it's
disabled if the improvement isn't applied.

  pa_bits  = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam    = false;
  vms->highmem_mmio    = true;

  # qemu-system-aarch64 -accel kvm -cpu host \
    -machine virt-7.2 -m 4G,maxmem=511G  \
    -monitor stdio

In order to keep backwords compatibility, we need to disable the
optimization on machines, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'highmem-compact' property is added so that the optimization can be

I would rather rename the property into compact-highmem even if the vms
field is name highmem_compact to align with other highmem fields


Ok, but I would love to know why. Note that we already have
'highmem=on|off'. 'highmem_compact=on|off' seems consistent
to me.

To me the property name should rather sound 'english' with the adjective
before the name 'high memory"' but I am not a native english speaker
either.


Ok. I agree 'compact-highmem' is better. The backup variable name will
be still 'highmem_compact', which is consistent with the existing ones.




explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
---
   docs/system/arm/virt.rst |  4 
   hw/arm/virt.c    | 33 +
   include/hw/arm/virt.h    |  2 ++
   3 files changed, 39 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..f05ec2253b 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
     address space above 32 bits. The default is ``on`` for machine
types
     later than ``virt-2.12``.
   +highmem-compact
+  Set ``on``/``off`` to enable/disable compact space for high
memory regions.
+  The default is ``on`` for machine types later than ``virt-7.2``

I think you should document what is compact layout versus legacy one,
both in the commit msg and maybe as a comment in a code along with the
comment in hw/arm/virt.c starting with 'Highmem IO Regions: '


Ok, I will add this into the commit log in v4. I don't think it's
necessary
to add duplicate comment in the code. People can check the commit log for
details if needed.


+
   gic-version
     Specify the version of the Generic Interrupt Controller (GIC) to
provide.
     Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b702f8f2b5..a4fbdaef91 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1734,6 +1734,13 @@ static void
virt_set_high_memmap(VirtMachineState *vms,
   base = region_base + region_size;
   } else {
   *region_enabled = false;
+
+    if (!vms->highmem_compact) {

this snippet should be already present in previous patch otherwise this
will break bisectability.



Hmm, nice catch! I think I need to swap PATCH[4] and PATCH[5] in next
revision. In that order, 'compact-highmem' is introduced in PATCH[4],
but not used yet. PATCH[5] has the optimization and 'compact-highmem'
is used.

No in general you introduce the property at the very end with the code
guarded with an unset vms->highmem_compact in the previous patch.



Yeah, what I need is define 'vms->highmem_compact' in PATCH[v3 4/5],
whose value is false. I also need to update @base and @vms->highest_gpa
on !vms->highmem_compact' in PATCH[v3 4/5].




+    base = region_base + region_size;
+    if (fits) {
+    vms->highest_gpa = region_base + region_size - 1;
+    }
+    }
   }
   }
   }
@@ -2348,6 +2355,20 @@ static void virt_set_highmem(Object *obj,
bool value, Error **errp)
   vms->highmem = value;
   }
   +static bool virt_get_highmem_compact(Object *obj, Error **errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    return vms->highmem_compact;
+}
+
+static void virt_set_highmem_compact(Object *obj, bool value, Error
**errp)
+{
+    VirtMachineState *vms = VIRT_MACHINE(obj);
+
+    vms->highmem_compact = value;
+}
+
   static bool virt_get_its(Object *obj, Error **errp)
   {
   VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2966,6 +2987,13 @@ static void
virt_machine_class_init(ObjectClass *oc, void *data)
     "Set on/off to
enable/disable using "
     "physical address space
above 32 bits")

Re: [PATCH v3 4/5] hw/arm/virt: Improve high memory region address assignment

2022-10-03 Thread Gavin Shan

Hi Eric,

On 10/3/22 4:44 PM, Eric Auger wrote:

On 9/29/22 01:37, Gavin Shan wrote:

On 9/28/22 10:51 PM, Eric Auger wrote:

On 9/22/22 01:13, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
  toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
  'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
  on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
  PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory
region by skipping the address assignment for one specific high
memory region if it has been disabled in case (1), (2) and (3).

Signed-off-by: Gavin Shan 
---
   hw/arm/virt.c | 44 ++--
   1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0b679d1f4..b702f8f2b5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1693,15 +1693,31 @@ static void
virt_set_high_memmap(VirtMachineState *vms,
    hwaddr base, int pa_bits)
   {
   hwaddr region_base, region_size;
-    bool fits;
+    bool *region_enabled, fits;

IDo you really need a pointer? If the region is unknown this is a bug in
virt code.


The pointer is needed so that we can disable the region by setting
'false'
to it at later point. Yeah, I think you're correct that 'unknown region'
is a bug and we need to do assert(region_enabled), or something like
below.

Yeah I don't think using a pointer here is useful.


When the high memory region can't fit into the PA space, it is disabled
by toggling the corresponding flag (vms->highmem_{redists, ecam, mmio})
to false. It's part of the original implementation, as below. We either
need a 'switch ... case' or a pointer. A pointer is more convenient since
we need check and possibly update to the value.

   switch (i) {
case VIRT_HIGH_GIC_REDIST2:
vms->highmem_redists &= fits;
break;
case VIRT_HIGH_PCIE_ECAM:
vms->highmem_ecam &= fits;
break;
case VIRT_HIGH_PCIE_MMIO:
vms->highmem_mmio &= fits;
break;
}




   int i;
     for (i = VIRT_LOWMEMMAP_LAST; i <
ARRAY_SIZE(extended_memmap); i++) {
   region_base = ROUND_UP(base, extended_memmap[i].size);
   region_size = extended_memmap[i].size;
   -    vms->memmap[i].base = region_base;
-    vms->memmap[i].size = region_size;
+    switch (i) {
+    case VIRT_HIGH_GIC_REDIST2:
+    region_enabled = >highmem_redists;
+    break;
+    case VIRT_HIGH_PCIE_ECAM:
+    region_enabled = >highmem_ecam;
+    break;
+    case VIRT_HIGH_PCIE_MMIO:
+    region_enabled = >highmem_mmio;
+    break;

While we are at it I would change the vms fields dealing with those
highmem regions and turn those fields into an array of bool indexed
using i - VIRT_LOWMEMMAP_LAST (using a macro or something alike). We
would not be obliged to have this switch, now duplicated.


It makes sense to me. How about to have something like below in v4?

static inline bool *virt_get_high_memmap_enabled(VirtMachineState
*vms, int index)
{
     bool *enabled_array[] = {
   >highmem_redists,
   >highmem_ecam,
   >highmem_mmio,
     };

     assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));

     return enabled_array[index - VIRT_LOWMEMMAP_LAST];
}

I was rather thinking as directly using a vms->highmem_flags[] but your
proposal may work as well.


Ok. I will use my proposed change in next revision.




+    default:
+    region_enabled = NULL;
+    }
+
+    /* Skip unknown region */
+    if (!region_enabled) {
+    continue;
+    }
     /*
    * Check each device to see if they fit in the PA space,
@@ -1710,23 +1726,15 @@ static void
virt_set_high_memmap(VirtMachineState *vms,
    * For each device that doesn't fit, disable it.
    */
   fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-    if (fits) {
-    vms->highest_gpa = region_base + region_size - 1;
-    }
+    if (*reg

Re: [PATCH v3 5/5] hw/arm/virt: Add 'highmem-compact' property

2022-09-29 Thread Gavin Shan

Hi Cornelia,

On 9/29/22 8:27 PM, Cornelia Huck wrote:

On Thu, Sep 29 2022, Gavin Shan  wrote:

On 9/28/22 10:22 PM, Eric Auger wrote:

On 9/22/22 01:13, Gavin Shan wrote:

After the improvement to high memory region address assignment is
applied, the memory layout is changed. For example, VIRT_HIGH_PCIE_MMIO

s/the memory layout is changed./the memory layout is changed,
introducing possible migration breakage.


Ok, much clearer.


memory region is enabled when the improvement is applied, but it's
disabled if the improvement isn't applied.

  pa_bits  = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam= false;
  vms->highmem_mmio= true;

  # qemu-system-aarch64 -accel kvm -cpu host \
-machine virt-7.2 -m 4G,maxmem=511G  \
-monitor stdio

In order to keep backwords compatibility, we need to disable the
optimization on machines, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'highmem-compact' property is added so that the optimization can be

I would rather rename the property into compact-highmem even if the vms
field is name highmem_compact to align with other highmem fields


Ok, but I would love to know why. Note that we already have
'highmem=on|off'. 'highmem_compact=on|off' seems consistent
to me.


FWIW, I initially misread 'highmem_compact' as 'highmem_compat' (and had
to re-read because I got confused). At least to me, 'compact_highmem'
has less chance of being parsed incorrectly :) (although that is
probably a personal thing.)



Ok. 'compact-highmem' is also fine to me. I'm really bad at naming :)




explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
---
   docs/system/arm/virt.rst |  4 
   hw/arm/virt.c| 33 +
   include/hw/arm/virt.h|  2 ++
   3 files changed, 39 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..f05ec2253b 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
 address space above 32 bits. The default is ``on`` for machine types
 later than ``virt-2.12``.
   
+highmem-compact

+  Set ``on``/``off`` to enable/disable compact space for high memory regions.
+  The default is ``on`` for machine types later than ``virt-7.2``

I think you should document what is compact layout versus legacy one,
both in the commit msg and maybe as a comment in a code along with the
comment in hw/arm/virt.c starting with 'Highmem IO Regions: '


Ok, I will add this into the commit log in v4. I don't think it's necessary
to add duplicate comment in the code. People can check the commit log for
details if needed.


Rather explain it in this file here, maybe? I'd prefer to be able to
find out what 'compact' means without digging through the commit log.



Ok, lets do as Eric suggested. There are existing comments about
@extended_memmap[] in hw/arm/virt.c. We need to explain the legacy/modern
laoyout and 'compact-highmem' property there.

Thanks,
Gavin




Re: [PATCH v3 5/5] hw/arm/virt: Add 'highmem-compact' property

2022-09-28 Thread Gavin Shan

Hi Eric,

On 9/28/22 10:22 PM, Eric Auger wrote:

On 9/22/22 01:13, Gavin Shan wrote:

After the improvement to high memory region address assignment is
applied, the memory layout is changed. For example, VIRT_HIGH_PCIE_MMIO

s/the memory layout is changed./the memory layout is changed,
introducing possible migration breakage.


Ok, much clearer.


memory region is enabled when the improvement is applied, but it's
disabled if the improvement isn't applied.

 pa_bits  = 40;
 vms->highmem_redists = false;
 vms->highmem_ecam= false;
 vms->highmem_mmio= true;

 # qemu-system-aarch64 -accel kvm -cpu host \
   -machine virt-7.2 -m 4G,maxmem=511G  \
   -monitor stdio

In order to keep backwords compatibility, we need to disable the
optimization on machines, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'highmem-compact' property is added so that the optimization can be

I would rather rename the property into compact-highmem even if the vms
field is name highmem_compact to align with other highmem fields


Ok, but I would love to know why. Note that we already have
'highmem=on|off'. 'highmem_compact=on|off' seems consistent
to me.


explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
---
  docs/system/arm/virt.rst |  4 
  hw/arm/virt.c| 33 +
  include/hw/arm/virt.h|  2 ++
  3 files changed, 39 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..f05ec2253b 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
address space above 32 bits. The default is ``on`` for machine types
later than ``virt-2.12``.
  
+highmem-compact

+  Set ``on``/``off`` to enable/disable compact space for high memory regions.
+  The default is ``on`` for machine types later than ``virt-7.2``

I think you should document what is compact layout versus legacy one,
both in the commit msg and maybe as a comment in a code along with the
comment in hw/arm/virt.c starting with 'Highmem IO Regions: '


Ok, I will add this into the commit log in v4. I don't think it's necessary
to add duplicate comment in the code. People can check the commit log for
details if needed.


+
  gic-version
Specify the version of the Generic Interrupt Controller (GIC) to provide.
Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b702f8f2b5..a4fbdaef91 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1734,6 +1734,13 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  base = region_base + region_size;
  } else {
  *region_enabled = false;
+
+if (!vms->highmem_compact) {

this snippet should be already present in previous patch otherwise this
will break bisectability.



Hmm, nice catch! I think I need to swap PATCH[4] and PATCH[5] in next
revision. In that order, 'compact-highmem' is introduced in PATCH[4],
but not used yet. PATCH[5] has the optimization and 'compact-highmem'
is used.


+base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;
+}
+}
  }
  }
  }
@@ -2348,6 +2355,20 @@ static void virt_set_highmem(Object *obj, bool value, 
Error **errp)
  vms->highmem = value;
  }
  
+static bool virt_get_highmem_compact(Object *obj, Error **errp)

+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_compact;
+}
+
+static void virt_set_highmem_compact(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_compact = value;
+}
+
  static bool virt_get_its(Object *obj, Error **errp)
  {
  VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2966,6 +2987,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
"Set on/off to enable/disable using 
"
"physical address space above 32 
bits");
  
+object_class_property_add_bool(oc, "highmem-compact",

+   virt_get_highmem_compact,
+   virt_set_highmem_compact);
+object_class_property_set_description(oc, "highmem-compact",
+  "Set on/off to enable/disable compact 
"
+  "space for high memory regions");
+
  object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
virt_set_gic_version);
  object_class_property_set_description(oc, "gic-version",
@@ -3050,6 +3078,7 @@ static void virt_instance_init(Object *obj)
  
  /* High memory i

Re: [PATCH v3 4/5] hw/arm/virt: Improve high memory region address assignment

2022-09-28 Thread Gavin Shan

Hi Eric,

On 9/28/22 10:51 PM, Eric Auger wrote:

On 9/22/22 01:13, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
 toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
 'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
 on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
 PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory
region by skipping the address assignment for one specific high
memory region if it has been disabled in case (1), (2) and (3).

Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 44 ++--
  1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0b679d1f4..b702f8f2b5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1693,15 +1693,31 @@ static void virt_set_high_memmap(VirtMachineState *vms,
   hwaddr base, int pa_bits)
  {
  hwaddr region_base, region_size;
-bool fits;
+bool *region_enabled, fits;

IDo you really need a pointer? If the region is unknown this is a bug in
virt code.


The pointer is needed so that we can disable the region by setting 'false'
to it at later point. Yeah, I think you're correct that 'unknown region'
is a bug and we need to do assert(region_enabled), or something like below.


  int i;
  
  for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {

  region_base = ROUND_UP(base, extended_memmap[i].size);
  region_size = extended_memmap[i].size;
  
-vms->memmap[i].base = region_base;

-vms->memmap[i].size = region_size;
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+region_enabled = >highmem_redists;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+region_enabled = >highmem_ecam;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+region_enabled = >highmem_mmio;
+break;

While we are at it I would change the vms fields dealing with those
highmem regions and turn those fields into an array of bool indexed
using i - VIRT_LOWMEMMAP_LAST (using a macro or something alike). We
would not be obliged to have this switch, now duplicated.


It makes sense to me. How about to have something like below in v4?

static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms, int 
index)
{
bool *enabled_array[] = {
  >highmem_redists,
  >highmem_ecam,
  >highmem_mmio,
};

assert(index - VIRT_LOWMEMMAP_LAST < ARRAY_SIZE(enabled_array));

return enabled_array[index - VIRT_LOWMEMMAP_LAST];
}


+default:
+region_enabled = NULL;
+}
+
+/* Skip unknown region */
+if (!region_enabled) {
+continue;
+}
  
  /*

   * Check each device to see if they fit in the PA space,
@@ -1710,23 +1726,15 @@ static void virt_set_high_memmap(VirtMachineState *vms,
   * For each device that doesn't fit, disable it.
   */
  fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = region_base + region_size - 1;
-}
+if (*region_enabled && fits) {
+vms->memmap[i].base = region_base;
+vms->memmap[i].size = region_size;
  
-switch (i) {

-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
+vms->highest_gpa = region_base + region_size - 1;
+base = region_base + region_size;
+} else {
+*region_enabled = false;
  }
-
-base = region_base + region_size;
  }
  }
  


Thanks,
Gavin




Re: [PATCH v3 3/5] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()

2022-09-28 Thread Gavin Shan

Hi Eric,

On 9/28/22 10:10 PM, Eric Auger wrote:

On 9/22/22 01:13, Gavin Shan wrote:

This introduces variable 'region_base' for the base address of the
specific high memory region. It's the preparatory work to optimize
high memory region address assignment.

Why is it a preparatory work (same comment for previous patch, ie [2/5]
). Are those changes really needed? why?



In PATCH[4/5], @base argument is added to virt_set_high_memmap(), to
represent current global base address. With the optimization applied
in PATCH[4/5], @base isn't unconditionally updated to the top of the
iterated high memory region. So we need @region_base here (PATCH[3/5])
to track the aligned base address for the iterated high memory region,
which may or may be not updated to @base.

Since we have @region_base in PATCH[3/5], it'd better to have @region_size
in PATCH[2/5].

Actually, PATCH[1-3/5] are all preparatory patches for PATCH[4/5]. My
intention was to organize the patches in a way to keep the logical
change part simple enough, for easier review.

Thanks,
Gavin



No functional change intended.

Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 12 ++--
  1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 187b3ee0e2..b0b679d1f4 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
  static void virt_set_high_memmap(VirtMachineState *vms,
   hwaddr base, int pa_bits)
  {
-hwaddr region_size;
+hwaddr region_base, region_size;
  bool fits;
  int i;
  
  for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {

+region_base = ROUND_UP(base, extended_memmap[i].size);
  region_size = extended_memmap[i].size;
  
-base = ROUND_UP(base, region_size);

-vms->memmap[i].base = base;
+vms->memmap[i].base = region_base;
  vms->memmap[i].size = region_size;
  
  /*

@@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
   *
   * For each device that doesn't fit, disable it.
   */
-fits = (base + region_size) <= BIT_ULL(pa_bits);
+fits = (region_base + region_size) <= BIT_ULL(pa_bits);
  if (fits) {
-vms->highest_gpa = base + region_size - 1;
+vms->highest_gpa = region_base + region_size - 1;
  }
  
  switch (i) {

@@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  break;
  }
  
-base += region_size;

+base = region_base + region_size;
  }
  }
  







Re: [PATCH] KVM: dirty ring: Add memory barrier when marking pfn collected

2022-09-22 Thread Gavin Shan

On 9/23/22 7:35 AM, Peter Xu wrote:

Following commit 4802bf910eee9, add the other missing barrier when marking
the PFN as collected.  This will also be required just like 4802bf910eee9
on weak ordering architectures like aarch64.

Cc: Marc Zyngier 
Cc: Gavin Shan 
Cc: Paolo Bonzini 
Signed-off-by: Peter Xu 
---
  accel/kvm/kvm-all.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)



Reviewed-by: Gavin Shan 


diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 136c8eaed3..60a044048e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -728,7 +728,11 @@ static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn)
  
  static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)

  {
-gfn->flags = KVM_DIRTY_GFN_F_RESET;
+/*
+ * Write the flags after reading the entry.  Should pair with another
+ * smp_load_acquire() in KVM when reset dirty rings.
+ */
+qatomic_store_release(>flags, KVM_DIRTY_GFN_F_RESET);
  }
  
  /*







[PATCH v3 3/5] hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()

2022-09-21 Thread Gavin Shan
This introduces variable 'region_base' for the base address of the
specific high memory region. It's the preparatory work to optimize
high memory region address assignment.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 187b3ee0e2..b0b679d1f4 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,15 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
-hwaddr region_size;
+hwaddr region_base, region_size;
 bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, region_size);
-vms->memmap[i].base = base;
+vms->memmap[i].base = region_base;
 vms->memmap[i].size = region_size;
 
 /*
@@ -1709,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + region_size) <= BIT_ULL(pa_bits);
+fits = (region_base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + region_size - 1;
+vms->highest_gpa = region_base + region_size - 1;
 }
 
 switch (i) {
@@ -1726,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += region_size;
+base = region_base + region_size;
 }
 }
 
-- 
2.23.0




[PATCH v3 5/5] hw/arm/virt: Add 'highmem-compact' property

2022-09-21 Thread Gavin Shan
After the improvement to high memory region address assignment is
applied, the memory layout is changed. For example, VIRT_HIGH_PCIE_MMIO
memory region is enabled when the improvement is applied, but it's
disabled if the improvement isn't applied.

pa_bits  = 40;
vms->highmem_redists = false;
vms->highmem_ecam= false;
vms->highmem_mmio= true;

# qemu-system-aarch64 -accel kvm -cpu host \
  -machine virt-7.2 -m 4G,maxmem=511G  \
  -monitor stdio

In order to keep backwords compatibility, we need to disable the
optimization on machines, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'highmem-compact' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan 
---
 docs/system/arm/virt.rst |  4 
 hw/arm/virt.c| 33 +
 include/hw/arm/virt.h|  2 ++
 3 files changed, 39 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 20442ea2c1..f05ec2253b 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -94,6 +94,10 @@ highmem
   address space above 32 bits. The default is ``on`` for machine types
   later than ``virt-2.12``.
 
+highmem-compact
+  Set ``on``/``off`` to enable/disable compact space for high memory regions.
+  The default is ``on`` for machine types later than ``virt-7.2``
+
 gic-version
   Specify the version of the Generic Interrupt Controller (GIC) to provide.
   Valid values are:
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b702f8f2b5..a4fbdaef91 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1734,6 +1734,13 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 base = region_base + region_size;
 } else {
 *region_enabled = false;
+
+if (!vms->highmem_compact) {
+base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;
+}
+}
 }
 }
 }
@@ -2348,6 +2355,20 @@ static void virt_set_highmem(Object *obj, bool value, 
Error **errp)
 vms->highmem = value;
 }
 
+static bool virt_get_highmem_compact(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->highmem_compact;
+}
+
+static void virt_set_highmem_compact(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->highmem_compact = value;
+}
+
 static bool virt_get_its(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -2966,6 +2987,13 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "Set on/off to enable/disable using "
   "physical address space above 32 
bits");
 
+object_class_property_add_bool(oc, "highmem-compact",
+   virt_get_highmem_compact,
+   virt_set_highmem_compact);
+object_class_property_set_description(oc, "highmem-compact",
+  "Set on/off to enable/disable 
compact "
+  "space for high memory regions");
+
 object_class_property_add_str(oc, "gic-version", virt_get_gic_version,
   virt_set_gic_version);
 object_class_property_set_description(oc, "gic-version",
@@ -3050,6 +3078,7 @@ static void virt_instance_init(Object *obj)
 
 /* High memory is enabled by default */
 vms->highmem = true;
+vms->highmem_compact = !vmc->no_highmem_compact;
 vms->gic_version = VIRT_GIC_VERSION_NOSEL;
 
 vms->highmem_ecam = !vmc->no_highmem_ecam;
@@ -3119,8 +3148,12 @@ DEFINE_VIRT_MACHINE_AS_LATEST(7, 2)
 
 static void virt_machine_7_1_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
 virt_machine_7_2_options(mc);
 compat_props_add(mc->compat_props, hw_compat_7_1, hw_compat_7_1_len);
+/* Compact space for high memory regions was introduced with 7.2 */
+vmc->no_highmem_compact = true;
 }
 DEFINE_VIRT_MACHINE(7, 1)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 6ec479ca2b..c7dd59d7f1 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -125,6 +125,7 @@ struct VirtMachineClass {
 bool no_pmu;
 bool claim_edge_triggered_timers;
 bool smbios_old_sys_ver;
+bool no_highmem_compact;
 bool no_highmem_ecam;
 bool no_ged;   /* Machines < 4.2 have no support for ACPI GED device */
 bool kvm_no_adjvtime;
@@ -144,6 +145,7 @@ struct VirtMachineState {
 PFlashCFI01 *flash[2];
 bool secure;
 bool highmem;
+bool highmem_compact;
 bool highmem_ecam;
 bool highmem_mmio;
 bool highmem_redists;
-- 
2.23.0




[PATCH v3 1/5] hw/arm/virt: Introduce virt_set_high_memmap() helper

2022-09-21 Thread Gavin Shan
This introduces virt_set_high_memmap() helper. The logic of high
memory region address assignment is moved to the helper. The intention
is to make the subsequent optimization for high memory region address
assignment easier.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 74 ---
 1 file changed, 41 insertions(+), 33 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 0961e053e5..4dab528b82 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1689,6 +1689,46 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_high_memmap(VirtMachineState *vms,
+ hwaddr base, int pa_bits)
+{
+int i;
+
+for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+hwaddr size = extended_memmap[i].size;
+bool fits;
+
+base = ROUND_UP(base, size);
+vms->memmap[i].base = base;
+vms->memmap[i].size = size;
+
+/*
+ * Check each device to see if they fit in the PA space,
+ * moving highest_gpa as we go.
+ *
+ * For each device that doesn't fit, disable it.
+ */
+fits = (base + size) <= BIT_ULL(pa_bits);
+if (fits) {
+vms->highest_gpa = base + size - 1;
+}
+
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+vms->highmem_redists &= fits;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+vms->highmem_ecam &= fits;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+vms->highmem_mmio &= fits;
+break;
+}
+
+base += size;
+}
+}
+
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
@@ -1744,39 +1784,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 /* We know for sure that at least the memory fits in the PA space */
 vms->highest_gpa = memtop - 1;
 
-for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
-
-base = ROUND_UP(base, size);
-vms->memmap[i].base = base;
-vms->memmap[i].size = size;
-
-/*
- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
- *
- * For each device that doesn't fit, disable it.
- */
-fits = (base + size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = base + size - 1;
-}
-
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
-}
-
-base += size;
-}
+virt_set_high_memmap(vms, base, pa_bits);
 
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
-- 
2.23.0




[PATCH v3 4/5] hw/arm/virt: Improve high memory region address assignment

2022-09-21 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory
region by skipping the address assignment for one specific high
memory region if it has been disabled in case (1), (2) and (3).

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index b0b679d1f4..b702f8f2b5 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1693,15 +1693,31 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
 hwaddr region_base, region_size;
-bool fits;
+bool *region_enabled, fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
 region_base = ROUND_UP(base, extended_memmap[i].size);
 region_size = extended_memmap[i].size;
 
-vms->memmap[i].base = region_base;
-vms->memmap[i].size = region_size;
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+region_enabled = >highmem_redists;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+region_enabled = >highmem_ecam;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+region_enabled = >highmem_mmio;
+break;
+default:
+region_enabled = NULL;
+}
+
+/* Skip unknown region */
+if (!region_enabled) {
+continue;
+}
 
 /*
  * Check each device to see if they fit in the PA space,
@@ -1710,23 +1726,15 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  * For each device that doesn't fit, disable it.
  */
 fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = region_base + region_size - 1;
-}
+if (*region_enabled && fits) {
+vms->memmap[i].base = region_base;
+vms->memmap[i].size = region_size;
 
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
+vms->highest_gpa = region_base + region_size - 1;
+base = region_base + region_size;
+} else {
+*region_enabled = false;
 }
-
-base = region_base + region_size;
 }
 }
 
-- 
2.23.0




[PATCH v3 2/5] hw/arm/virt: Rename variable size to region_size in virt_set_high_memmap()

2022-09-21 Thread Gavin Shan
This renames variable 'size' to 'region_size' in virt_set_high_memmap().
Its counterpart ('region_base') will be introduced in next patch.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4dab528b82..187b3ee0e2 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1692,15 +1692,16 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 static void virt_set_high_memmap(VirtMachineState *vms,
  hwaddr base, int pa_bits)
 {
+hwaddr region_size;
+bool fits;
 int i;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
-bool fits;
+region_size = extended_memmap[i].size;
 
-base = ROUND_UP(base, size);
+base = ROUND_UP(base, region_size);
 vms->memmap[i].base = base;
-vms->memmap[i].size = size;
+vms->memmap[i].size = region_size;
 
 /*
  * Check each device to see if they fit in the PA space,
@@ -1708,9 +1709,9 @@ static void virt_set_high_memmap(VirtMachineState *vms,
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + size) <= BIT_ULL(pa_bits);
+fits = (base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + size - 1;
+vms->highest_gpa = base + region_size - 1;
 }
 
 switch (i) {
@@ -1725,7 +1726,7 @@ static void virt_set_high_memmap(VirtMachineState *vms,
 break;
 }
 
-base += size;
+base += region_size;
 }
 }
 
-- 
2.23.0




[PATCH v3 0/5] hw/arm/virt: Improve address assignment for high memory regions

2022-09-21 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions.

PATCH[1-3] preparatory work for the improvment
PATCH[4]   improve high memory region address assignment
PATCH[5]   adds 'highmem-compact' to enable or disable the optimization

History
===
v2: https://lore.kernel.org/all/20220815062958.100366-1-gs...@redhat.com/T/
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
==
v3:
  * Reorder the patches(Gavin)
  * Add 'highmem-compact' property for backwards compatibility (Eric)
v2:
  * Split the patches for easier review(Gavin)
  * Improved changelog (Marc)
  * Use 'bool fits' in virt_set_high_memmap()  (Eric)

Gavin Shan (5):
  hw/arm/virt: Introduce virt_set_high_memmap() helper
  hw/arm/virt: Rename variable size to region_size in
virt_set_high_memmap()
  hw/arm/virt: Introduce variable region_base in virt_set_high_memmap()
  hw/arm/virt: Improve high memory region address assignment
  hw/arm/virt: Add 'highmem-compact' property

 docs/system/arm/virt.rst |   4 ++
 hw/arm/virt.c| 116 ---
 include/hw/arm/virt.h|   2 +
 3 files changed, 89 insertions(+), 33 deletions(-)

-- 
2.23.0




Re: [PATCH] KVM: use store-release to mark dirty pages as harvested

2022-09-06 Thread Gavin Shan

On 9/2/22 10:19 AM, Paolo Bonzini wrote:

The following scenario can happen if QEMU sets more RESET flags while
the KVM_RESET_DIRTY_RINGS ioctl is ongoing on another host CPU:

 CPU0 CPU1   CPU2
  -- 
 fill gfn0
 store-rel flags for gfn0
 fill gfn1
 store-rel flags for gfn1
 load-acq flags for gfn0
 set RESET for gfn0
 load-acq flags for gfn1
 set RESET for gfn1
 do ioctl! --->
  ioctl(RESET_RINGS)
 fill gfn2
 store-rel flags for gfn2
 load-acq flags for gfn2
 set RESET for gfn2
  process gfn0
  process gfn1
  process gfn2
 do ioctl!
 etc.

The three load-acquire in CPU0 synchronize with the three store-release
in CPU2, but CPU0 and CPU1 are only synchronized up to gfn1 and CPU1
may miss gfn2's fields other than flags.

The kernel must be able to cope with invalid values of the fields, and
userspace *will* invoke the ioctl once more.  However, once the RESET flag
is cleared on gfn2, it is lost forever, therefore in the above scenario
CPU1 must read the correct value of gfn2's fields.

Therefore RESET must be set with a store-release, that will synchronize
with KVM's load-acquire in CPU1.

Cc: Gavin Shan 
Cc: Peter Xu 
Signed-off-by: Paolo Bonzini 
---
  accel/kvm/kvm-all.c | 18 +-
  1 file changed, 17 insertions(+), 1 deletion(-)



Reviewed-by: Gavin Shan 


diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 136c8eaed3..7c8ce18bdd 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -728,7 +728,23 @@ static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn)
  
  static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)

  {
-gfn->flags = KVM_DIRTY_GFN_F_RESET;
+/*
+ * Use a store-release so that the CPU that executes KVM_RESET_DIRTY_RINGS
+ * sees the full content of the ring:
+ *
+ * CPU0 CPU1 CPU2
+ * 
--
+ *   fill gfn0
+ *   store-rel flags 
for gfn0
+ * load-acq flags for gfn0
+ * store-rel RESET for gfn0
+ *  ioctl(RESET_RINGS)
+ *load-acq flags for gfn0
+ *check if flags have RESET
+ *
+ * The synchronization goes from CPU2 to CPU0 to CPU1.
+ */
+qatomic_store_release(>flags, KVM_DIRTY_GFN_F_RESET);
  }
  
  /*







Re: [PATCH v2] KVM: dirty ring: add missing memory barrier

2022-09-06 Thread Gavin Shan

On 8/27/22 6:22 PM, Paolo Bonzini wrote:

The KVM_DIRTY_GFN_F_DIRTY flag ensures that the entry is valid.  If
the read of the fields are not ordered after the read of the flag,
QEMU might see stale values.

Cc: Peter Xu 
Cc: Gavin Shan 
Signed-off-by: Paolo Bonzini 
---
  accel/kvm/kvm-all.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)



Reviewed-by: Gavin Shan 


diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 8d81ab74de..136c8eaed3 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -719,7 +719,11 @@ static void kvm_dirty_ring_mark_page(KVMState *s, uint32_t 
as_id,
  
  static bool dirty_gfn_is_dirtied(struct kvm_dirty_gfn *gfn)

  {
-return gfn->flags == KVM_DIRTY_GFN_F_DIRTY;
+/*
+ * Read the flags before the value.  Pairs with barrier in
+ * KVM's kvm_dirty_ring_push() function.
+ */
+return qatomic_load_acquire(>flags) == KVM_DIRTY_GFN_F_DIRTY;
  }
  
  static void dirty_gfn_set_collected(struct kvm_dirty_gfn *gfn)







Re: [PATCH v2 0/4] hw/arm/virt: Improve address assignment for high memory regions

2022-09-05 Thread Gavin Shan

Hi Eric,

On 8/24/22 6:06 PM, Eric Auger wrote:

On 8/24/22 05:29, Gavin Shan wrote:

On 8/15/22 4:29 PM, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.
  (1) One specific high memory region is disabled by developer by
  toggling vms->highmem_{redists, ecam, mmio}.
  (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
  'virt-2.12' or ealier than it.
  (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
  on 32-bits system.
  (4) One specific high memory region is disabled when it breaks the
  PA space limit.
  The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions:

PATCH[1] and PATCH[2] are cleanup and preparatory works.
PATCH[3] improves address assignment for these high memory regions
PATCH[4] moves the address assignment logic into standalone helper

History
===
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
=
v2:
    * Split the patches for easier review    (Gavin)
    * Improved changelog (Marc)
    * Use 'bool fits' in virt_set_high_memmap()  (Eric)


You did not really convince me about migration compat wrt the high MMIO
region. Aren't the PCI BARs saved/restored meaning the device driver is
expecting to find data at the same GPA. But what if your high MMIO
region was relocated in the dest QEMU with a possibly smaller VM IPA?
Don't you have MMIO regions now allocated outside of the dest MMIO
region? How does the PCI host bridge route accesses to those regions?
What do I miss?



[...]

Sorry for the delay because I was offline last week. I was intending
to explain the migration on virtio-net device and spent some time to
go through the code. I found it's still complicated for an example
because PCI and virtio device models are involved. So lets still use
e1000e.c as an example here.

There are lots of registers residing in MMIO region, including MSIx
table. The MSIx table is backed by PCIDevice::msix_table, which is
a buffer. The access to MSIx table is read from or write to the buffer.
The corresponding handler is hw/msix.c::msix_table_mmio_ops. msix_init()
is called by e1000e.c to setup the MSIx table, which is associated with
memory BAR#3. As the registers in MSIx table is totally emulated by
QEMU, the BAR's base address isn't a concern.

  struct PCIDevice {
 :
 uint8_t *msix_table;
 :
 MemoryRegion msix_table_mmio;
 :
  };

  /* @table_bar is registered as memory BAR#3 in e1000e_pci_realize() */
  int msix_init(struct PCIDevice *dev, unsigned short nentries,
MemoryRegion *table_bar, uint8_t table_bar_nr,
unsigned table_offset, MemoryRegion *pba_bar,
uint8_t pba_bar_nr, unsigned pba_offset, uint8_t cap_pos,
Error **errp)
  {
   :
memory_region_init_io(>msix_table_mmio, OBJECT(dev), 
_table_mmio_ops, dev,
  "msix-table", table_size);
memory_region_add_subregion(table_bar, table_offset, >msix_table_mmio);
   :
  }

As we concerned, the BAR's base addresses for MSIx tables are different on 
source
and destination VMs. It's still not a problem because the registers in MSIx 
table
are migrated, saved on source VM and restored on destination VM one by one. It's
to say, not the whole buffer (PCIDevice::msix_table) is saved and restored at 
once.
Further more, the unique ID string, instead the corresponding BAR's base 
address,
is used to identify the MSIx table. For this particular case, the ID string is
something like "e1000e_dev_id/pci-:05:00.0/msix state". With this ID string
is received on the destination VM, the object and PCI device is located and the
forth-coming data is saved to PCIDevice::msix_table.

  static const VMStateDescription e1000e_vmstate = {
.name = "e1000e",
.version_id = 1,
.minimum_version_id = 1,
.pre_save = e1000e_pre_save,
.post_load = e1000e_post_load,
.fields = (VMStateField[]) {
VMSTATE_PCI_DEVICE(parent_obj, E1000EState),
VMSTATE_MSIX(parent_obj, E1000EState),
:
}
  };

  #define VMSTATE_MSIX_TEST(_field, _state, _test) { \
.name = (stringify(_field)), \
.size = sizeof(PCIDevice),

Re: [PATCH v2 0/4] hw/arm/virt: Improve address assignment for high memory regions

2022-08-26 Thread Gavin Shan

Hi Eric,

On 8/24/22 6:06 PM, Eric Auger wrote:

On 8/24/22 05:29, Gavin Shan wrote:

On 8/15/22 4:29 PM, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.
  (1) One specific high memory region is disabled by developer by
  toggling vms->highmem_{redists, ecam, mmio}.
  (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
  'virt-2.12' or ealier than it.
  (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
  on 32-bits system.
  (4) One specific high memory region is disabled when it breaks the
  PA space limit.
  The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions:

PATCH[1] and PATCH[2] are cleanup and preparatory works.
PATCH[3] improves address assignment for these high memory regions
PATCH[4] moves the address assignment logic into standalone helper

History
===
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
=
v2:
    * Split the patches for easier review    (Gavin)
    * Improved changelog (Marc)
    * Use 'bool fits' in virt_set_high_memmap()  (Eric)


You did not really convince me about migration compat wrt the high MMIO
region. Aren't the PCI BARs saved/restored meaning the device driver is
expecting to find data at the same GPA. But what if your high MMIO
region was relocated in the dest QEMU with a possibly smaller VM IPA?
Don't you have MMIO regions now allocated outside of the dest MMIO
region? How does the PCI host bridge route accesses to those regions?
What do I miss?



I'm currently looking into virtio-pci-net migration, but need time to
investigate how the device is migrated. I will get back to you once
I have something. Thanks for your comments :)

Thanks,
Gavin




Re: [PATCH v2 0/4] hw/arm/virt: Improve address assignment for high memory regions

2022-08-23 Thread Gavin Shan

Hi Marc,

On 8/15/22 4:29 PM, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.
 
(1) One specific high memory region is disabled by developer by

 toggling vms->highmem_{redists, ecam, mmio}.
 
(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is

 'virt-2.12' or ealier than it.
 
(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded

 on 32-bits system.
 
(4) One specific high memory region is disabled when it breaks the

 PA space limit.
 
The current implementation of virt_set_memmap() isn't comprehensive

because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

The series intends to improve the address assignment for these
high memory regions:

PATCH[1] and PATCH[2] are cleanup and preparatory works.
PATCH[3] improves address assignment for these high memory regions
PATCH[4] moves the address assignment logic into standalone helper

History
===
v1: https://lists.nongnu.org/archive/html/qemu-arm/2022-08/msg00013.html

Changelog
=
v2:
   * Split the patches for easier review(Gavin)
   * Improved changelog (Marc)
   * Use 'bool fits' in virt_set_high_memmap()  (Eric)



Could you help to review when you have free cycles? It's just a kindly
ping :)

Thanks,
Gavin



Gavin Shan (4):
   hw/arm/virt: Rename variable size to region_size in virt_set_memmap()
   hw/arm/virt: Introduce variable region_base in virt_set_memmap()
   hw/arm/virt: Improve address assignment for high memory regions
   virt/hw/virt: Add virt_set_high_memmap() helper

  hw/arm/virt.c | 84 ++-
  1 file changed, 50 insertions(+), 34 deletions(-)






[PATCH v2 3/4] hw/arm/virt: Improve address assignment for high memory regions

2022-08-15 Thread Gavin Shan
There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is disabled by developer by
toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
PA space limit.

The current implementation of virt_set_memmap() isn't comprehensive
because the space for one specific high memory region is always
reserved from the PA space for case (1), (2) and (3). In the code,
'base' and 'vms->highest_gpa' are always increased for those three
cases. It's unnecessary since the assigned space of the disabled
high memory region won't be used afterwards.

This improves the address assignment for those three high memory
region by skipping the address assignment for one specific high
memory region if it has been disabled in case (1), (2) and (3).

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 42 +-
 1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 582a8960fc..e38b6919c9 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1746,10 +1746,26 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
 hwaddr region_base = ROUND_UP(base, extended_memmap[i].size);
 hwaddr region_size = extended_memmap[i].size;
-bool fits;
+bool *region_enabled, fits;
 
-vms->memmap[i].base = region_base;
-vms->memmap[i].size = region_size;
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+region_enabled = >highmem_redists;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+region_enabled = >highmem_ecam;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+region_enabled = >highmem_mmio;
+break;
+default:
+region_enabled = NULL;
+}
+
+/* Skip unknwon or disabled regions */
+if (!region_enabled || !*region_enabled) {
+continue;
+}
 
 /*
  * Check each device to see if they fit in the PA space,
@@ -1759,22 +1775,14 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
  */
 fits = (region_base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = region_base + region_size - 1;
-}
+vms->memmap[i].base = region_base;
+vms->memmap[i].size = region_size;
 
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-vms->highmem_redists &= fits;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-vms->highmem_ecam &= fits;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-vms->highmem_mmio &= fits;
-break;
+base = region_base + region_size;
+vms->highest_gpa = region_base + region_size - 1;
+} else {
+*region_enabled = false;
 }
-
-base = region_base + region_size;
 }
 
 if (device_memory_size > 0) {
-- 
2.23.0




[PATCH v2 4/4] virt/hw/virt: Add virt_set_high_memmap() helper

2022-08-15 Thread Gavin Shan
The logic to assign high memory region's address in virt_set_memmap()
is independent. Lets move the logic to virt_set_high_memmap() helper.
"each device" is replaced by "each region" in the comments.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 92 ---
 1 file changed, 50 insertions(+), 42 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index e38b6919c9..4dde08a924 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1688,6 +1688,55 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState 
*vms, int idx)
 return arm_cpu_mp_affinity(idx, clustersz);
 }
 
+static void virt_set_high_memmap(VirtMachineState *vms,
+ hwaddr base, int pa_bits)
+{
+hwaddr region_base, region_size;
+bool *region_enabled, fits;
+int i;
+
+for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+region_base = ROUND_UP(base, extended_memmap[i].size);
+region_size = extended_memmap[i].size;
+
+switch (i) {
+case VIRT_HIGH_GIC_REDIST2:
+region_enabled = >highmem_redists;
+break;
+case VIRT_HIGH_PCIE_ECAM:
+region_enabled = >highmem_ecam;
+break;
+case VIRT_HIGH_PCIE_MMIO:
+region_enabled = >highmem_mmio;
+break;
+default:
+region_enabled = NULL;
+}
+
+/* Skip unknwon or disabled regions */
+if (!region_enabled || !*region_enabled) {
+continue;
+}
+
+/*
+ * Check each region to see if they fit in the PA space,
+ * moving highest_gpa as we go.
+ *
+ * For each device that doesn't fit, disable it.
+ */
+fits = (region_base + region_size) <= BIT_ULL(pa_bits);
+if (fits) {
+vms->memmap[i].base = region_base;
+vms->memmap[i].size = region_size;
+
+base = region_base + region_size;
+vms->highest_gpa = region_base + region_size - 1;
+} else {
+*region_enabled = false;
+}
+}
+}
+
 static void virt_set_memmap(VirtMachineState *vms, int pa_bits)
 {
 MachineState *ms = MACHINE(vms);
@@ -1742,48 +1791,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 
 /* We know for sure that at least the memory fits in the PA space */
 vms->highest_gpa = memtop - 1;
-
-for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr region_base = ROUND_UP(base, extended_memmap[i].size);
-hwaddr region_size = extended_memmap[i].size;
-bool *region_enabled, fits;
-
-switch (i) {
-case VIRT_HIGH_GIC_REDIST2:
-region_enabled = >highmem_redists;
-break;
-case VIRT_HIGH_PCIE_ECAM:
-region_enabled = >highmem_ecam;
-break;
-case VIRT_HIGH_PCIE_MMIO:
-region_enabled = >highmem_mmio;
-break;
-default:
-region_enabled = NULL;
-}
-
-/* Skip unknwon or disabled regions */
-if (!region_enabled || !*region_enabled) {
-continue;
-}
-
-/*
- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
- *
- * For each device that doesn't fit, disable it.
- */
-fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->memmap[i].base = region_base;
-vms->memmap[i].size = region_size;
-
-base = region_base + region_size;
-vms->highest_gpa = region_base + region_size - 1;
-} else {
-*region_enabled = false;
-}
-}
+virt_set_high_memmap(vms, base, pa_bits);
 
 if (device_memory_size > 0) {
 ms->device_memory = g_malloc0(sizeof(*ms->device_memory));
-- 
2.23.0




[PATCH v2 1/4] hw/arm/virt: Rename variable size to region_size in virt_set_memmap()

2022-08-15 Thread Gavin Shan
This renames variable 'size' to 'region_size' in virt_set_memmap().
It's counterpart to 'region_base', which will be introducded in
next patch.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 9633f822f3..f8e9f3e205 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1744,12 +1744,12 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 vms->highest_gpa = memtop - 1;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
-hwaddr size = extended_memmap[i].size;
+hwaddr region_size = extended_memmap[i].size;
 bool fits;
 
-base = ROUND_UP(base, size);
+base = ROUND_UP(base, region_size);
 vms->memmap[i].base = base;
-vms->memmap[i].size = size;
+vms->memmap[i].size = region_size;
 
 /*
  * Check each device to see if they fit in the PA space,
@@ -1757,9 +1757,9 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + size) <= BIT_ULL(pa_bits);
+fits = (base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + size - 1;
+vms->highest_gpa = base + region_size - 1;
 }
 
 switch (i) {
@@ -1774,7 +1774,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 break;
 }
 
-base += size;
+base += region_size;
 }
 
 if (device_memory_size > 0) {
-- 
2.23.0




[PATCH v2 2/4] hw/arm/virt: Introduce variable region_base in virt_set_memmap()

2022-08-15 Thread Gavin Shan
This introduces variable 'region_base' for the base address of
the specific high memory region. It's the preparatory to improve
the address assignment for high memory region in next patch.

No functional change intended.

Signed-off-by: Gavin Shan 
---
 hw/arm/virt.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index f8e9f3e205..582a8960fc 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1744,11 +1744,11 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 vms->highest_gpa = memtop - 1;
 
 for (i = VIRT_LOWMEMMAP_LAST; i < ARRAY_SIZE(extended_memmap); i++) {
+hwaddr region_base = ROUND_UP(base, extended_memmap[i].size);
 hwaddr region_size = extended_memmap[i].size;
 bool fits;
 
-base = ROUND_UP(base, region_size);
-vms->memmap[i].base = base;
+vms->memmap[i].base = region_base;
 vms->memmap[i].size = region_size;
 
 /*
@@ -1757,9 +1757,9 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
  *
  * For each device that doesn't fit, disable it.
  */
-fits = (base + region_size) <= BIT_ULL(pa_bits);
+fits = (region_base + region_size) <= BIT_ULL(pa_bits);
 if (fits) {
-vms->highest_gpa = base + region_size - 1;
+vms->highest_gpa = region_base + region_size - 1;
 }
 
 switch (i) {
@@ -1774,7 +1774,7 @@ static void virt_set_memmap(VirtMachineState *vms, int 
pa_bits)
 break;
 }
 
-base += region_size;
+base = region_base + region_size;
 }
 
 if (device_memory_size > 0) {
-- 
2.23.0




<    1   2   3   4   5   6   7   8   9   10   >