Re: [PATCH v2 3/8] machine: Print supported CPU models instead of typenames

2023-07-30 Thread Gavin Shan



On 7/28/23 00:27, Richard Henderson wrote:

On 7/26/23 22:16, Gavin Shan wrote:


On 7/27/23 09:08, Richard Henderson wrote:

On 7/25/23 17:32, Gavin Shan wrote:

-static const char *q800_machine_valid_cpu_types[] = {
+static const char * const q800_machine_valid_cpu_types[] = {
  M68K_CPU_TYPE_NAME("m68040"),
  NULL
  };
+static const char * const q800_machine_valid_cpu_models[] = {
+    "m68040",
+    NULL
+};


I really don't like this replication.



Right, it's going to introduce a lot of replication, but it also gives much
flexibility. There are 21 targets, and there is no fixed pattern for the
mapping between CPU model name and CPU typename. The patterns in use are
summarized below.

   1 All CPU model names are mapped to a fixed CPU typename;
   2 CPU model name is the same as the CPU typename;
   3 CPU model name is an alias of the CPU typename;
   4 CPU model name is a prefix of the CPU typename;

   Target      Categories  suffix-of-CPU-typename
   ----------  ----------  ----------------------
   alpha       -234        -alpha-cpu
   arm         ---4        -arm-cpu
   avr         -2--
   cris        --34        -cris-cpu
   hexagon     ---4        -hexagon-cpu
   hppa        1---
   i386        ---4        -i386-cpu
   loongarch   -2-4        -loongarch-cpu
   m68k        ---4        -m68k-cpu
   microblaze  1---
   mips        ---4        -mips64-cpu  -mips-cpu
   nios2       1---
   openrisc    ---4        -or1k-cpu
   ppc         --34        -powerpc64-cpu  -powerpc-cpu
   riscv       ---4        -riscv-cpu
   rx          -2-4        -rx-cpu
   s390x       ---4        -s390x-cpu
   sh4         --34        -superh-cpu
   sparc       -2--
   tricore     ---4        -tricore-cpu
   xtensa      ---4        -xtensa-cpu


That is unfortunate, however...



There are several options, listed below. Please let me know which one, or
what else, would be best.

(a) Keep what we have and use mc->valid_{cpu_types, cpu_models}[] to track
the valid CPU typenames and CPU model names.

(b) Introduce CPUClass::model_by_typename(). Every target has its own
implementation to convert a CPU typename to a CPU model name. The CPU model
name is parsed from mc->valid_cpu_types[i].

 char *CPUClass::model_by_typename(const char *typename);

(c) As we discussed before, use mc->valid_cpu_type_suffix and
mc->valid_cpu_models, because the CPU type check is currently needed only by
the arm/m68k/riscv targets, where we do have a fixed pattern to convert CPU
model names to CPU typenames: the CPU typename is comprised of the CPU model
name and a suffix. However, it won't work when the CPU type check is required
by another target whose pattern differs from this.


(d) Merge the two arrays together and use macro expansion, e.g.

typedef struct {
     const char *name;
     const char *type;
} Something;

#define ARM_SOMETHING(x)  { x, ARM_CPU_TYPE_NAME(x) }

static const Something valid[] = {
     ARM_SOMETHING("cortex-a53"),
     { NULL, NULL }
};

where Something ought to be better named.
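For reference, the suffix-stripping conversion from option (b) can be sketched
as below. The helper name and signature are hypothetical, not QEMU API; it only
illustrates pattern 4 (model name is a prefix of the typename), assuming the
per-target suffix is known.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: derive a CPU model name from a QOM typename by
 * stripping a per-target suffix. Returns a malloc'd string, or NULL when
 * the typename does not end with the given suffix.
 */
static char *model_by_typename(const char *type_name, const char *suffix)
{
    size_t tlen = strlen(type_name);
    size_t slen = strlen(suffix);
    char *model;

    if (tlen <= slen || strcmp(type_name + tlen - slen, suffix) != 0) {
        return NULL;
    }
    model = malloc(tlen - slen + 1);
    if (!model) {
        return NULL;
    }
    memcpy(model, type_name, tlen - slen);
    model[tlen - slen] = '\0';
    return model;
}
```

This only covers pattern 4; patterns 1-3 (fixed typename, identity, alias)
would still need per-target handling, which is the crux of the discussion.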



Thanks, Richard. It's a nice idea, but not generalized enough. Igor suggested
reusing the existing list_cpus() in another reply, and I suggested adding
CPUClass::type_to_model() to convert a CPU typename to a model name for every
target. Please take a look and comment when you get a chance.

  https://lists.nongnu.org/archive/html/qemu-arm/2023-07/msg00659.html

Thanks,
Gavin




Re: [PATCH v3 6/6] qtest: microbit-test: add tests for nRF51 DETECT

2023-07-30 Thread Thomas Huth

On 28/07/2023 18.05, Chris Laplante wrote:

Exercise the DETECT mechanism of the GPIO peripheral.

Signed-off-by: Chris Laplante 
Reviewed-by: Peter Maydell 
---
  tests/qtest/microbit-test.c | 42 +
  1 file changed, 42 insertions(+)

diff --git a/tests/qtest/microbit-test.c b/tests/qtest/microbit-test.c
index 6022a92b6a..8f87810cd5 100644
--- a/tests/qtest/microbit-test.c
+++ b/tests/qtest/microbit-test.c
@@ -393,6 +393,47 @@ static void test_nrf51_gpio(void)
  qtest_quit(qts);
  }
  
+static void test_nrf51_gpio_detect(void) {

+QTestState *qts = qtest_init("-M microbit");
+int i;
+
+// Connect input buffer on pins 1-7, configure SENSE for high level


QEMU coding style says that // comments should be avoided. See 
docs/devel/style.rst , section "Comment style". Please use /* ... */ 
comments instead.


 Thanks,
  Thomas




Re: [PATCH v2 3/8] machine: Print supported CPU models instead of typenames

2023-07-30 Thread Gavin Shan



On 7/27/23 19:00, Igor Mammedov wrote:

On Thu, 27 Jul 2023 15:16:18 +1000
Gavin Shan  wrote:


On 7/27/23 09:08, Richard Henderson wrote:

On 7/25/23 17:32, Gavin Shan wrote:

-static const char *q800_machine_valid_cpu_types[] = {
+static const char * const q800_machine_valid_cpu_types[] = {
   M68K_CPU_TYPE_NAME("m68040"),
   NULL
   };
+static const char * const q800_machine_valid_cpu_models[] = {
+    "m68040",
+    NULL
+};


I really don't like this replication.
   


Right, it's going to introduce a lot of replication, but it also gives much
flexibility. There are 21 targets, and there is no fixed pattern for the
mapping between CPU model name and CPU typename. The patterns in use are
summarized below.

1 All CPU model names are mapped to a fixed CPU typename;


 plainly spelled out, it would be: the cpu_model name is ignored and
 a cpu type is returned anyway.

I'd make this a hard error right away, as "junk in => error out";
it's clearly a user error. I think we don't even have to follow
the deprecation process for that.



Right, it's not expected behavior to map arbitrary CPU model names to
a fixed CPU typename.


2 CPU model name is the same as the CPU typename;
3 CPU model name is an alias of the CPU typename;
4 CPU model name is a prefix of the CPU typename;


and some more:
 5. cpu model names sometimes aren't names at all, and some other
CPU property is used instead. (ppc)
This one I'd prefer to get rid of, making ppc handling more consistent
with other targets, which would need the PPC folks to be persuaded to
drop the PVR lookup.



I put this into class 3, meaning the PVRs are regarded as aliases of the CPU
typenames.



Target      Categories  suffix-of-CPU-typename
----------  ----------  ----------------------
alpha       -234        -alpha-cpu
arm         ---4        -arm-cpu
avr         -2--
cris        --34        -cris-cpu
hexagon     ---4        -hexagon-cpu
hppa        1---
i386        ---4        -i386-cpu
loongarch   -2-4        -loongarch-cpu
m68k        ---4        -m68k-cpu
microblaze  1---
mips        ---4        -mips64-cpu  -mips-cpu
nios2       1---
openrisc    ---4        -or1k-cpu
ppc         --34        -powerpc64-cpu  -powerpc-cpu
riscv       ---4        -riscv-cpu
rx          -2-4        -rx-cpu
s390x       ---4        -s390x-cpu
sh4         --34        -superh-cpu
sparc       -2--
tricore     ---4        -tricore-cpu
xtensa      ---4        -xtensa-cpu

There are several options, listed below. Please let me know which one, or
what else, would be best.

(a) Keep what we have and use mc->valid_{cpu_types, cpu_models}[] to track
the valid CPU typenames and CPU model names.

(b) Introduce CPUClass::model_by_typename(). Every target has its own
implementation to convert a CPU typename to a CPU model name. The CPU model
name is parsed from mc->valid_cpu_types[i].

  char *CPUClass::model_by_typename(const char *typename);

(c) As we discussed before, use mc->valid_cpu_type_suffix and
mc->valid_cpu_models, because the CPU type check is currently needed only by
the arm/m68k/riscv targets, where we do have a fixed pattern to convert CPU
model names to CPU typenames: the CPU typename is comprised of the CPU model
name and a suffix. However, it won't work when the CPU type check is required
by another target whose pattern differs from this.


none of the above is really good; that's why I was objecting to introducing
the reverse type->name mapping. It ends up with an increased amount of junk,
and it's not because your patches are bad, but because you are trying
to deal with cpu model names (which are a historically evolved mess).
The best option from an engineering POV would be replacing CPU models with
type names.

Even though it's a bit radical, I very much prefer replacing
cpu_model names with '-cpu type' usage directly, making it
consistent with -device/other interfaces; coincidentally, that
obsoletes the need for the reverse name mapping.

It's painful for end users who will need to change configs/scripts,
but it's a one-time job. Additionally, from QEMU's POV, the codebase
will be less messy => more maintainable, which benefits not only
developers but end users in the end.



I have to clarify that the type->model mapping has existed ever since the
model->type mapping was introduced with the help of CPU_RESOLVING_TYPE;
the logic has been there since the existence of CPU_RESOLVING_TYPE,
even if the code wasn't.

I'm not sure about the idea of switching to '-cpu <typename>' since
it was rejected by Peter Maydell before. I hope Peter can confirm
this. For me, the shorter name is beneficial. For example, users
needn't type '-cpu host-arm-cpu' instead of '-cpu host'.



[rant:
It's the same story repeating over and over, when it comes to
changing QEMU CLI, which hits resistance wall. But with QEMU
deprecation process we've changed CLI 

RE: [PATCH RFC v1 1/3] vfio/pci: detect the support of dynamic MSI-X allocation

2023-07-30 Thread Liu, Jing2
Hi C.

> On July 28, 2023 4:44 PM, Cédric Le Goater  wrote:
> 
> [ ... ]
> 
> > Sorry I didn't quite understand "info.flags be tested against
> VFIO_IRQ_INFO_NORESIZE".
> > I saw kernel < 6.4 simply added NORESIZE to info.flags and latest kernel 
> > adds
> if has_dyn_msix.
> > Would you please kindly describe more on your point?
> 
> I was trying to find the conditions to detect safely that the kernel didn't 
> have
> dynamic MSI-X support. Testing VFIO_IRQ_INFO_NORESIZE seems enough.
> 
Oh, I see.
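The detection condition discussed above can be sketched as a simple flag
test. The flag value is reproduced from <linux/vfio.h> for illustration;
the helper name is made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value reproduced from <linux/vfio.h> for illustration. */
#define VFIO_IRQ_INFO_NORESIZE (1 << 3)

/*
 * Kernels without dynamic MSI-X allocation always set NORESIZE in the
 * MSI-X index's irq_info.flags, so a cleared flag signals that dynamic
 * allocation is supported.
 */
static bool msix_dynamic_alloc_supported(uint32_t irq_info_flags)
{
    return !(irq_info_flags & VFIO_IRQ_INFO_NORESIZE);
}
```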

> >> In that case, QEMU should report an error and the trace event is not
> needed.
> >
> > I replied an email with new error handling draft code based on my
> > understanding, which reports the error and need no trace. Could you please
> help review if that is what we want?
> 
> yes. It looked good. Please send a v1 !

Thanks for reviewing that. I guess you mean v2 for the next version.

Jing
> 
> Thanks,
> 
> Cédric.



RE: [PATCH RFC v1 1/3] vfio/pci: detect the support of dynamic MSI-X allocation

2023-07-30 Thread Liu, Jing2
Hi Alex,

> On July 28, 2023 11:41 PM, Alex Williamson  wrote:
> 
> On Fri, 28 Jul 2023 10:27:17 +0200
> Cédric Le Goater  wrote:
> 
> > On 7/28/23 10:09, Liu, Jing2 wrote:
> > > Hi Alex,
> > >
> > > Thanks very much for reviewing the patches.
> > >
> > >> On July 28, 2023 1:25 AM, Alex Williamson
>  wrote:
> > >>
> > >> On Thu, 27 Jul 2023 03:24:08 -0400
> > >> Jing Liu  wrote:
> > >>
> > >>> From: Reinette Chatre 
> > >>>
> > >>> Kernel provides the guidance of dynamic MSI-X allocation support
> > >>> of passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag
> > >>> to guide user space.
> > >>>
> > >>> Fetch and store the flags from host for later use to determine if
> > >>> specific flags are set.
> > >>>
> > >>> Signed-off-by: Reinette Chatre 
> > >>> Signed-off-by: Jing Liu 
> > >>> ---
> > >>>   hw/vfio/pci.c| 12 
> > >>>   hw/vfio/pci.h|  1 +
> > >>>   hw/vfio/trace-events |  2 ++
> > >>>   3 files changed, 15 insertions(+)
> > >>>
> > >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > >>> a205c6b1130f..0c4ac0873d40 100644
> > >>> --- a/hw/vfio/pci.c
> > >>> +++ b/hw/vfio/pci.c
> > >>> @@ -1572,6 +1572,7 @@ static void
> > >>> vfio_msix_early_setup(VFIOPCIDevice
> > >>> *vdev, Error **errp)
> > >>>
> > >>>   static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error
> > >>> **errp)  {
> > >>> +struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> > >>>   int ret;
> > >>>   Error *err = NULL;
> > >>>
> > >>> @@ -1624,6 +1625,17 @@ static int vfio_msix_setup(VFIOPCIDevice
> > >>> *vdev, int
> > >> pos, Error **errp)
> > >>>   memory_region_set_enabled(&vdev->pdev.msix_table_mmio, false);
> > >>>   }
> > >>>
> > >>> +irq_info.index = VFIO_PCI_MSIX_IRQ_INDEX;
> > >>> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> > >>> +if (ret) {
> > >>> +/* This can fail for an old kernel or legacy PCI dev */
> > >>> +
> > >>> + trace_vfio_msix_setup_get_irq_info_failure(strerror(errno));
> > >>
> > >> We only call vfio_msix_setup() if the device has an MSI-X
> > >> capability, so the "legacy PCI" portion of this comment seems 
> > >> unjustified.
> > >> Otherwise the GET_IRQ_INFO ioctl has always existed, so I'd also
> > >> question the "old kernel" part of this comment.
> > >
> > > Oh, yes, I just realize that only VFIO_PCI_ERR_IRQ_INDEX and
> > > VFIO_PCI_REQ_IRQ_INDEX were added later in
> > > include/uapi/linux/vfio.h. Thus, this ioctl() with MSIX index would not 
> > > fail by
> the old-kernel or legacy-PCI reason.
> > > Thanks for pointing out this to me.
> > >
> > >> We don't currently sanity test the device-exposed MSI-X info versus
> > >> that reported by GET_IRQ_INFO, but it seems valid to do so.
> > >
> > > Do we want to keep the check of possible failure from kernel (e.g.,
> > > -EFAULT) and report the error code back to caller? Maybe like this,
> > >
> > > static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
> > > {
> > >  
> > >  msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
> > >
> > >  ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> > >  if (ret < 0) {
> > >  error_setg_errno(errp, -ret, "failed to get MSI-X IRQ INFO");
> > >  return;
> 
> Yes.
> 
> > >  } else {
> > >  vdev->msix->noresize = !!(irq_info.flags & VFIO_IRQ_INFO_NORESIZE);
> > >  }
> 
> No 'else' required here since the error branch returns.

Oh, yes. Will remove the "else" and just set the "noresize" value in the next
version.

> 
> > >  ...
> > >  trace_vfio_msix_early_setup(vdev->vbasedev.name, pos, msix->table_bar,
> > >  msix->table_offset, msix->entries,
> > >  vdev->msix->noresize);
> >
> > In the trace event, please output irq_info.flags since it gives more
> > information on the value returned by the kernel.
> >
> > >  
> > > }
> > >
> > >> I'd expect this to happen in vfio_msix_early_setup() though,
> > >> especially since that's where the remainder of VFIOMSIXInfo is setup.
> > >
> > >>
> > >>> +} else {
> > >>> +vdev->msix->irq_info_flags = irq_info.flags;
> > >>> +}
> > >>> +trace_vfio_msix_setup_irq_info_flags(vdev->vbasedev.name,
> > >>> +
> > >>> + vdev->msix->irq_info_flags);
> > >>> +
> > >>>   return 0;
> > >>>   }
> > >>>
> > >>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index
> > >>> a2771b9ff3cc..ad34ec56d0ae 100644
> > >>> --- a/hw/vfio/pci.h
> > >>> +++ b/hw/vfio/pci.h
> > >>> @@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
> > >>>   uint32_t table_offset;
> > >>>   uint32_t pba_offset;
> > >>>   unsigned long *pending;
> > >>> +uint32_t irq_info_flags;
> > >>
> > >> Why not simply pull out a "noresize" bool?  Thanks,
> > >>
> > > Will change to a bool type.
> >
> > I would simply cache the KVM flags value under VFIOMSIXInfo as you did
> > and add a helper. Both work the same but the initial 

[PATCH] hw/riscv: split RAM into low and high memory

2023-07-30 Thread Fei Wu
riscv virt platform's memory started at 0x80000000 and
straddled the 4GiB boundary. Curiously enough, this choice
of a memory layout will prevent launching a VM with
a bit more than 2000MiB and PCIe pass-thru on an x86 host, due
to identity mapping requirements for the MSI doorbell on x86,
and these (APIC/IOAPIC) live right below 4GiB.

So just split the RAM range into two portions:
- 1 GiB range from 0x80000000 to 0xc0000000.
- The remainder at 0x100000000

...leaving a hole between the ranges.
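The split described above can be sketched as follows. The constants mirror
the layout in the commit message; the helper name is made up for this sketch
and is not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Constants matching the layout described above. */
#define VIRT_DRAM_BASE       0x80000000ULL   /* low window, 1 GiB */
#define VIRT_DRAM_HIGH_BASE 0x100000000ULL   /* remainder, above 4 GiB */
#define LOW_MEM             (1 * GiB)

/*
 * Given the total RAM size, compute how much lands in the low window
 * [0x80000000, 0xc0000000) and how much goes above 4 GiB, mirroring the
 * logic the patch adds to virt_machine_init().
 */
static void split_ram(uint64_t ram_size, uint64_t *low, uint64_t *high)
{
    if (ram_size > LOW_MEM) {
        *low = LOW_MEM;
        *high = ram_size - LOW_MEM;
    } else {
        *low = ram_size;
        *high = 0;
    }
}
```

A 4 GiB guest thus gets 1 GiB at 0x80000000 and 3 GiB at 0x100000000, leaving
the MSI doorbell region below 4 GiB free for identity mapping.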

Signed-off-by: Andrei Warkentin 
Signed-off-by: Fei Wu 
---
 hw/riscv/virt.c | 74 -
 include/hw/riscv/virt.h |  1 +
 2 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index d90286dc46..8fbdc7220c 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -75,7 +75,9 @@
 #error "Can't accomodate all IMSIC groups in address space"
 #endif
 
-static const MemMapEntry virt_memmap[] = {
+#define LOW_MEM (1 * GiB)
+
+static MemMapEntry virt_memmap[] = {
 [VIRT_DEBUG] ={0x0, 0x100 },
 [VIRT_MROM] = { 0x1000,0xf000 },
 [VIRT_TEST] = {   0x100000,0x1000 },
@@ -96,6 +98,7 @@ static const MemMapEntry virt_memmap[] = {
 [VIRT_PCIE_ECAM] ={ 0x30000000,0x10000000 },
 [VIRT_PCIE_MMIO] ={ 0x40000000,0x40000000 },
 [VIRT_DRAM] = { 0x80000000,   0x0 },
+[VIRT_DRAM_HIGH] ={ 0x100000000,  0x0 },
 };
 
 /* PCIe high mmio is fixed for RV32 */
@@ -295,15 +298,12 @@ static void create_fdt_socket_cpus(RISCVVirtState *s, int socket,
 }
 }
 
-static void create_fdt_socket_memory(RISCVVirtState *s,
- const MemMapEntry *memmap, int socket)
+static void create_fdt_socket_mem_range(RISCVVirtState *s, uint64_t addr,
+uint64_t size, int socket)
 {
 char *mem_name;
-uint64_t addr, size;
 MachineState *ms = MACHINE(s);
 
-addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(ms, socket);
-size = riscv_socket_mem_size(ms, socket);
 mem_name = g_strdup_printf("/memory@%lx", (long)addr);
 qemu_fdt_add_subnode(ms->fdt, mem_name);
 qemu_fdt_setprop_cells(ms->fdt, mem_name, "reg",
@@ -313,6 +313,34 @@ static void create_fdt_socket_memory(RISCVVirtState *s,
 g_free(mem_name);
 }
 
+static void create_fdt_socket_memory(RISCVVirtState *s,
+ const MemMapEntry *memmap, int socket)
+{
+uint64_t addr, size;
+MachineState *mc = MACHINE(s);
+uint64_t sock_offset = riscv_socket_mem_offset(mc, socket);
+uint64_t sock_size = riscv_socket_mem_size(mc, socket);
+
+if (sock_offset < memmap[VIRT_DRAM].size) {
+uint64_t low_mem_end = memmap[VIRT_DRAM].base + memmap[VIRT_DRAM].size;
+
+addr = memmap[VIRT_DRAM].base + sock_offset;
+size = MIN(low_mem_end - addr, sock_size);
+create_fdt_socket_mem_range(s, addr, size, socket);
+
+size = sock_size - size;
+if (size > 0) {
+create_fdt_socket_mem_range(s, memmap[VIRT_DRAM_HIGH].base,
+size, socket);
+}
+} else {
+addr = memmap[VIRT_DRAM_HIGH].base +
+   sock_offset - memmap[VIRT_DRAM].size;
+
+create_fdt_socket_mem_range(s, addr, sock_size, socket);
+}
+}
+
 static void create_fdt_socket_clint(RISCVVirtState *s,
 const MemMapEntry *memmap, int socket,
 uint32_t *intc_phandles)
@@ -1334,10 +1362,12 @@ static void virt_machine_done(Notifier *notifier, void *data)
 
 static void virt_machine_init(MachineState *machine)
 {
-const MemMapEntry *memmap = virt_memmap;
+MemMapEntry *memmap = virt_memmap;
 RISCVVirtState *s = RISCV_VIRT_MACHINE(machine);
 MemoryRegion *system_memory = get_system_memory();
 MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
+MemoryRegion *ram_below_4g, *ram_above_4g;
+uint64_t ram_size_low, ram_size_high;
 char *soc_name;
 DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
 int i, base_hartid, hart_count;
@@ -1448,6 +1478,17 @@ static void virt_machine_init(MachineState *machine)
 }
 }
 
+if (machine->ram_size > LOW_MEM) {
+ram_size_high = machine->ram_size - LOW_MEM;
+ram_size_low = LOW_MEM;
+} else {
+ram_size_high = 0;
+ram_size_low = machine->ram_size;
+}
+
+memmap[VIRT_DRAM].size = ram_size_low;
+memmap[VIRT_DRAM_HIGH].size = ram_size_high;
+
 if (riscv_is_32bit(&s->soc[0])) {
 #if HOST_LONG_BITS == 64
 /* limit RAM size in a 32-bit system */
@@ -1460,7 +1501,8 @@ static void virt_machine_init(MachineState *machine)
 virt_high_pcie_memmap.size = VIRT32_HIGH_PCIE_MMIO_SIZE;
 } else {
 virt_high_pcie_memmap.size = VIRT64_HIGH_PCIE_MMIO_SIZE;
-

[PATCH] kvm: Fix crash by initializing kvm_state early

2023-07-30 Thread Gavin Shan
QEMU runs into a core dump on arm64; the backtrace extracted from the
core dump is shown below. It's caused by accessing @kvm_state, which
isn't initialized at that point, due to commit 176d073029 ("hw/arm/virt:
Use machine_memory_devices_init()"), where the machine's memory region
is added earlier than before.

main
qemu_init
configure_accelerators
qemu_opts_foreach
do_configure_accelerator
accel_init_machine
kvm_init
virt_kvm_type
virt_set_memmap
machine_memory_devices_init
memory_region_add_subregion
memory_region_add_subregion_common
memory_region_update_container_subregions
memory_region_transaction_begin
qemu_flush_coalesced_mmio_buffer
kvm_flush_coalesced_mmio_buffer

Fix it by initializing @kvm_state early. With this applied, no crash
is observed on arm64.

Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()")
Signed-off-by: Gavin Shan 
---
 accel/kvm/kvm-all.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 373d876c05..c825cba12f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2464,6 +2464,7 @@ static int kvm_init(MachineState *ms)
 qemu_mutex_init(&kml_slots_lock);
 
 s = KVM_STATE(ms->accelerator);
+kvm_state = s;
 
 /*
  * On systems where the kernel can support different base page
@@ -2695,8 +2696,6 @@ static int kvm_init(MachineState *ms)
 #endif
 }
 
-kvm_state = s;
-
 ret = kvm_arch_init(ms, s);
 if (ret < 0) {
 goto err;
-- 
2.41.0




Re: [PATCH v2 4/4] vhost-user-scsi: support reconnect to backend

2023-07-30 Thread Raphael Norwitz
I don’t think we should be changing any vhost-scsi-common code here. I’d rather
implement vhost_user_scsi_start/stop() as wrappers around
vhost_scsi_common_start/stop() and check started_vu there.

Otherwise I think this is looking good. 

Glad to see you caught vhost_user_scsi_handle_output and implemented it
like vhost-user-blk. Can it go in a separate change?

> On Jul 25, 2023, at 6:42 AM, Li Feng  wrote:
> 
> If the backend crashes and restarts, the device is broken.
> This patch adds reconnect for vhost-user-scsi.
> 
> Tested with spdk backend.
> 
> Signed-off-by: Li Feng 
> ---
> hw/scsi/vhost-scsi-common.c   |   6 +
> hw/scsi/vhost-user-scsi.c | 220 +++---
> include/hw/virtio/vhost-scsi-common.h |   3 +
> include/hw/virtio/vhost-user-scsi.h   |   3 +
> 4 files changed, 211 insertions(+), 21 deletions(-)
> 
> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
> index 664adb15b4..3fde477eee 100644
> --- a/hw/scsi/vhost-scsi-common.c
> +++ b/hw/scsi/vhost-scsi-common.c
> @@ -81,6 +81,7 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> error_report("Error start vhost dev");
> goto err_guest_notifiers;
> }
> +vsc->started_vu = true;
> 
> /* guest_notifier_mask/pending not used yet, so just unmask
>  * everything here.  virtio-pci will do the right thing by
> @@ -106,6 +107,11 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
> VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
> int ret = 0;
> 
> +if (!vsc->started_vu) {
> +return;
> +}
> +vsc->started_vu = false;
> +
> vhost_dev_stop(&vsc->dev, vdev, true);
> 
> if (k->set_guest_notifiers) {
> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
> index ee99b19e7a..bd32dcf999 100644
> --- a/hw/scsi/vhost-user-scsi.c
> +++ b/hw/scsi/vhost-user-scsi.c
> @@ -46,20 +46,25 @@ enum VhostUserProtocolFeature {
> static void vhost_user_scsi_set_status(VirtIODevice *vdev, uint8_t status)
> {
> VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> -bool start = (status & VIRTIO_CONFIG_S_DRIVER_OK) && vdev->vm_running;
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +bool should_start = virtio_device_should_start(vdev, status);
> +int ret;
> 
> -if (vhost_dev_is_started(&vsc->dev) == start) {
> +if (!s->connected) {
> return;
> }
> 
> -if (start) {
> -int ret;
> +if (vhost_dev_is_started(&vsc->dev) == should_start) {
> +return;
> +}
> 
> +if (should_start) {
> ret = vhost_scsi_common_start(vsc);
> if (ret < 0) {
> error_report("unable to start vhost-user-scsi: %s", 
> strerror(-ret));
> -exit(1);
> +qemu_chr_fe_disconnect(&vs->conf.chardev);
> }
> } else {
> vhost_scsi_common_stop(vsc);
> @@ -85,8 +90,160 @@ static void vhost_user_scsi_reset(VirtIODevice *vdev)
> }
> }
> 
> -static void vhost_dummy_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> +static void vhost_user_scsi_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> {
> +VHostUserSCSI *s = (VHostUserSCSI *)vdev;
> +DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj;
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +
> +Error *local_err = NULL;
> +int i, ret;
> +
> +if (!vdev->start_on_kick) {
> +return;
> +}
> +
> +if (!s->connected) {
> +return;
> +}
> +
> +if (vhost_dev_is_started(&vsc->dev)) {
> +return;
> +}
> +
> +/*
> + * Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> + * vhost here instead of waiting for .set_status().
> + */
> +ret = vhost_scsi_common_start(vsc);
> +if (ret < 0) {
> +error_reportf_err(local_err, "vhost-user-blk: vhost start failed: ");
> +qemu_chr_fe_disconnect(&vs->conf.chardev);
> +return;
> +}
> +
> +/* Kick right away to begin processing requests already in vring */
> +for (i = 0; i < vsc->dev.nvqs; i++) {
> +VirtQueue *kick_vq = virtio_get_queue(vdev, i);
> +
> +if (!virtio_queue_get_desc_addr(vdev, i)) {
> +continue;
> +}
> +event_notifier_set(virtio_queue_get_host_notifier(kick_vq));
> +}
> +}
> +
> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp)
> +{
> +VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> +VHostUserSCSI *s = VHOST_USER_SCSI(vdev);
> +VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
> +VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev);
> +int ret = 0;
> +
> +if (s->connected) {
> +return 0;
> +}
> +s->connected = true;
> +
> +vsc->dev.num_queues = vs->conf.num_queues;
> +vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues;
> +

Re: [PATCH v2 2/4] vhost-user-common: send get_inflight_fd once

2023-07-30 Thread Raphael Norwitz

> On Jul 28, 2023, at 3:49 AM, Li Feng  wrote:
> 
> 
> 
>> 2023年7月28日 下午2:04,Michael S. Tsirkin  写道:
>> 
>> On Tue, Jul 25, 2023 at 06:42:45PM +0800, Li Feng wrote:
>>> Get_inflight_fd is sent only once. When reconnecting to the backend,
>>> qemu sent set_inflight_fd to the backend.
>> 
>> I don't understand what you are trying to say here.
>> Should be:
>> Currently ABCD. This is wrong/unnecessary because EFG. This patch HIJ.
> 
> Thanks, I will reorganize the commit message in v3.
>> 
>>> Signed-off-by: Li Feng 
>>> ---
>>> hw/scsi/vhost-scsi-common.c | 37 ++---
>>> 1 file changed, 18 insertions(+), 19 deletions(-)
>>> 
>>> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
>>> index a06f01af26..664adb15b4 100644
>>> --- a/hw/scsi/vhost-scsi-common.c
>>> +++ b/hw/scsi/vhost-scsi-common.c
>>> @@ -52,20 +52,28 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>> 
>>> vsc->dev.acked_features = vdev->guest_features;
>>> 
>>> -assert(vsc->inflight == NULL);
>>> -vsc->inflight = g_new0(struct vhost_inflight, 1);
>>> -ret = vhost_dev_get_inflight(&vsc->dev,
>>> - vs->conf.virtqueue_size,
>>> - vsc->inflight);
>>> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
>>> if (ret < 0) {
>>> -error_report("Error get inflight: %d", -ret);
>>> +error_report("Error setting inflight format: %d", -ret);
>>> goto err_guest_notifiers;
>>> }
>>> 
>>> -ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>>> -if (ret < 0) {
>>> -error_report("Error set inflight: %d", -ret);
>>> -goto err_guest_notifiers;
>>> +if (vsc->inflight) {
>>> +if (!vsc->inflight->addr) {
>>> +ret = vhost_dev_get_inflight(&vsc->dev,
>>> +vs->conf.virtqueue_size,
>>> +vsc->inflight);
>>> +if (ret < 0) {
>>> +error_report("Error get inflight: %d", -ret);
>> 
>> As long as you are fixing this - should be "getting inflight”.
> I will fix it in v3.
>> 
>>> +goto err_guest_notifiers;
>>> +}
>>> +}
>>> +

Looks like you reworked this a bit so as to avoid a potential crash if
vsc->inflight is NULL.

Should we fix it for vhost-user-blk too?

>>> +ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>>> +if (ret < 0) {
>>> +error_report("Error set inflight: %d", -ret);
>>> +goto err_guest_notifiers;
>>> +}
>>> }
>>> 
>>> ret = vhost_dev_start(&vsc->dev, vdev, true);
>>> @@ -85,9 +93,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>>> return ret;
>>> 
>>> err_guest_notifiers:
>>> -g_free(vsc->inflight);
>>> -vsc->inflight = NULL;
>>> -
>>> k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
>>> err_host_notifiers:
>>> vhost_dev_disable_notifiers(&vsc->dev, vdev);
>>> @@ -111,12 +116,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
>>> }
>>> assert(ret >= 0);
>>> 

As I said before, I think this introduces a leak.

>>> -if (vsc->inflight) {
>>> -vhost_dev_free_inflight(vsc->inflight);
>>> -g_free(vsc->inflight);
>>> -vsc->inflight = NULL;
>>> -}
>>> -
>>> vhost_dev_disable_notifiers(&vsc->dev, vdev);
>>> }
>>> 
>>> -- 
>>> 2.41.0



Re: [PATCH v2 3/4] vhost: move and rename the conn retry times

2023-07-30 Thread Raphael Norwitz


> On Jul 25, 2023, at 6:42 AM, Li Feng  wrote:
> 
> Multiple devices need this macro; move it to a common header.
> 
> Signed-off-by: Li Feng 

Reviewed-by: Raphael Norwitz 

> ---
> hw/block/vhost-user-blk.c   | 4 +---
> hw/virtio/vhost-user-gpio.c | 3 +--
> include/hw/virtio/vhost.h   | 2 ++
> 3 files changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index eecf3f7a81..3c69fa47d5 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -32,8 +32,6 @@
> #include "sysemu/sysemu.h"
> #include "sysemu/runstate.h"
> 
> -#define REALIZE_CONNECTION_RETRIES 3
> -
> static const int user_feature_bits[] = {
> VIRTIO_BLK_F_SIZE_MAX,
> VIRTIO_BLK_F_SEG_MAX,
> @@ -482,7 +480,7 @@ static void vhost_user_blk_device_realize(DeviceState 
> *dev, Error **errp)
> s->inflight = g_new0(struct vhost_inflight, 1);
> s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
> 
> -retries = REALIZE_CONNECTION_RETRIES;
> +retries = VU_REALIZE_CONN_RETRIES;
> assert(!*errp);
> do {
> if (*errp) {
> diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
> index 3b013f2d0f..d9979aa5db 100644
> --- a/hw/virtio/vhost-user-gpio.c
> +++ b/hw/virtio/vhost-user-gpio.c
> @@ -15,7 +15,6 @@
> #include "standard-headers/linux/virtio_ids.h"
> #include "trace.h"
> 
> -#define REALIZE_CONNECTION_RETRIES 3
> #define VHOST_NVQS 2
> 
> /* Features required from VirtIO */
> @@ -359,7 +358,7 @@ static void vu_gpio_device_realize(DeviceState *dev, 
> Error **errp)
> qemu_chr_fe_set_handlers(&gpio->chardev, NULL, NULL, vu_gpio_event, NULL,
>  dev, NULL, true);
> 
> -retries = REALIZE_CONNECTION_RETRIES;
> +retries = VU_REALIZE_CONN_RETRIES;
> g_assert(!*errp);
> do {
> if (*errp) {
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index 6a173cb9fa..ca3131b1af 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -8,6 +8,8 @@
> #define VHOST_F_DEVICE_IOTLB 63
> #define VHOST_USER_F_PROTOCOL_FEATURES 30
> 
> +#define VU_REALIZE_CONN_RETRIES 3
> +
> /* Generic structures common for any vhost based device. */
> 
> struct vhost_inflight {
> -- 
> 2.41.0
> 




Re: [PATCH v2 1/4] vhost: fix the fd leak

2023-07-30 Thread Raphael Norwitz



> On Jul 25, 2023, at 6:42 AM, Li Feng  wrote:
> 
> When vhost-user reconnects to the backend, the notifier should be
> cleaned up. Otherwise, the fd resources will be exhausted.
> 
> Fixes: f9a09ca3ea ("vhost: add support for configure interrupt")
> 
> Signed-off-by: Li Feng 

Reviewed-by: Raphael Norwitz 

> ---
> hw/virtio/vhost.c | 2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index abf0d03c8d..e2f6ffb446 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -2044,6 +2044,8 @@ void vhost_dev_stop(struct vhost_dev *hdev, 
> VirtIODevice *vdev, bool vrings)
> event_notifier_test_and_clear(
> &hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier);
> event_notifier_test_and_clear(&hdev->config_notifier);
> +event_notifier_cleanup(
> +&hdev->vqs[VHOST_QUEUE_NUM_CONFIG_INR].masked_config_notifier);
> 
> trace_vhost_dev_stop(hdev, vdev->name, vrings);
> 
> -- 
> 2.41.0
> 




Re: [PATCH] vhost-user-scsi: support reconnect to backend

2023-07-30 Thread Raphael Norwitz


> On Jul 28, 2023, at 3:48 AM, Li Feng  wrote:
> 
> Thanks for your reply.
> 
>> 2023年7月28日 上午5:21,Raphael Norwitz  写道:
>> 
>> 
>> 
>>> On Jul 25, 2023, at 6:19 AM, Li Feng  wrote:
>>> 
>>> Thanks for your comments.
>>> 
 2023年7月25日 上午1:21,Raphael Norwitz  写道:
 
 Very excited to see this. High level looks good modulo a few small things.
 
 My major concern is around existing vhost-user-scsi backends which don’t 
 support VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. IMO we should hide the 
 reconnect behavior behind a VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD check. We 
 may want to do the same for vhost-user-blk.
 
 The question is then what happens if the check is false. IIUC without an 
 inflight FD, if a device processes requests out of order, it’s not safe to 
 continue execution on reconnect, as there’s no way for the backend to know 
 how to replay IO. Should we permanently wedge the device or have QEMU fail 
 out? May be nice to have a toggle for this.
>>> 
>>> Based on what MST said, is there anything else I need to do?
>> 
>> I don’t think so.
>> 
 
> On Jul 21, 2023, at 6:51 AM, Li Feng  wrote:
> 
> If the backend crashes and restarts, the device is broken.
> This patch adds reconnect for vhost-user-scsi.
> 
> Tested with spdk backend.
> 
> Signed-off-by: Li Feng 
> ---
> hw/block/vhost-user-blk.c   |   2 -
> hw/scsi/vhost-scsi-common.c |  27 ++---
> hw/scsi/vhost-user-scsi.c   | 163 +---
> include/hw/virtio/vhost-user-scsi.h |   3 +
> include/hw/virtio/vhost.h   |   2 +
> 5 files changed, 165 insertions(+), 32 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index eecf3f7a81..f250c740b5 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -32,8 +32,6 @@
> #include "sysemu/sysemu.h"
> #include "sysemu/runstate.h"
> 
> -#define REALIZE_CONNECTION_RETRIES 3
> -
> static const int user_feature_bits[] = {
>   VIRTIO_BLK_F_SIZE_MAX,
>   VIRTIO_BLK_F_SEG_MAX,
> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c
 
 Why can’t all the vhost-scsi-common stuff be moved to a separate change?
>>> 
>>> I will move this code to separate patch.
 
 Especially the stuff introduced for vhost-user-blk in 
 1b0063b3048af65dfaae6422a572c87db8575a92 should be moved out.
>>> OK.
>>> 
 
> index a06f01af26..08801886b8 100644
> --- a/hw/scsi/vhost-scsi-common.c
> +++ b/hw/scsi/vhost-scsi-common.c
> @@ -52,16 +52,22 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
> 
>   vsc->dev.acked_features = vdev->guest_features;
> 
> -assert(vsc->inflight == NULL);
> -vsc->inflight = g_new0(struct vhost_inflight, 1);
> -ret = vhost_dev_get_inflight(&vsc->dev,
> - vs->conf.virtqueue_size,
> - vsc->inflight);
> +ret = vhost_dev_prepare_inflight(&vsc->dev, vdev);
>   if (ret < 0) {
> -error_report("Error get inflight: %d", -ret);
> +error_report("Error setting inflight format: %d", -ret);
>   goto err_guest_notifiers;
>   }
> 
> +if (!vsc->inflight->addr) {
> +ret = vhost_dev_get_inflight(&vsc->dev,
> +vs->conf.virtqueue_size,
> +vsc->inflight);
> +if (ret < 0) {
> +error_report("Error get inflight: %d", -ret);
> +goto err_guest_notifiers;
> +}
> +}
> +
>   ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight);
>   if (ret < 0) {
>   error_report("Error set inflight: %d", -ret);
> @@ -85,9 +91,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc)
>   return ret;
> 
> err_guest_notifiers:
> -g_free(vsc->inflight);
> -vsc->inflight = NULL;
> -
>   k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false);
> err_host_notifiers:
>   vhost_dev_disable_notifiers(&vsc->dev, vdev);
> @@ -111,12 +114,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc)
>   }
>   assert(ret >= 0);
> 
 
 In the vhost-scsi (kernel backend) path, what will cleanup vsc->inflight 
 now?
>>> OK, we should check whether vsc->inflight is null; vhost-scsi doesn't
>>> allocate the inflight object memory.
>> 
>> Are you saying vhost-scsi never allocates inflight so we don’t need to check 
>> for it?
> We check vsc->inflight, and only if it is allocated do we send
> get/set_inflight_fd.
> This works with both vhost-user-scsi and vhost-scsi.

So then it sounds like this code introduces a resource leak. 
g_free(vsc->inflight) should be added to the vhost-scsi code in 
vhost_scsi_stop().

>> 
>>> 
 
> -if 

Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()

2023-07-30 Thread Helge Deller

On 7/30/23 22:03, Richard Henderson wrote:

On 7/30/23 10:56, Helge Deller wrote:

I'm quite unclear about translator_use_goto_tb() for qemu-user
emulation (and in general).

Based on the function name, the function translator_use_goto_tb() shall
help to decide if a program should use goto_tb() and exit_tb() to jump
to the next instruction.

Currently, if the destination is on the same page, it returns true.
I wonder, if it shouldn't return false in this case instead, because
arches have code like this: (taken from target/hppa/translate.c):
 if (... && translator_use_goto_tb(ctx, f)) {
 tcg_gen_goto_tb(which);
 tcg_gen_movi_reg(cpu_iaoq_f, f);
 tcg_gen_movi_reg(cpu_iaoq_b, b);
 tcg_gen_exit_tb(ctx->base.tb, which);
 } else {
 copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b);
 copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var);
 tcg_gen_lookup_and_goto_ptr();
 }

Shouldn't, if the destination is on the same page, the (faster?)
path with tcg_gen_lookup_and_goto_ptr() be taken instead?


No, because tcg_gen_lookup_and_goto_ptr is not the faster path.
That always involves a lookup, then an indirect branch.


Ah, ok. So my assumption was wrong, and this explains it.


The goto_tb path is linked, so only requires a lookup once, and the
branch may be direct (depending on the host architecture).

Probably the last question in this regard:

This code:
IN:
0x00010c98:  cmpib,<>,n 0,r19,0x10c98

generates "nop/jmp" in the code:

the tcg_gen_goto_tb() branch:
OUT:
0x7fd7e400070e:  85 db    testl    %ebx, %ebx
0x7fd7e4000710:  0f 85 20 00 00 00    jne  0x7fd7e4000736
0x7fd7e4000716:  90   nop   <- from "tcg_gen_op1i(INDEX_op_goto_tb, idx)" in tcg_gen_goto_tb()
0x7fd7e4000717:  e9 00 00 00 00   jmp  0x7fd7e400071c   <- jump is effectively useless.
0x7fd7e400071c:  c7 45 00 a3 0c 01 00 movl $0x10ca3, (%rbp)
0x7fd7e4000723:  c7 45 04 a7 0c 01 00 movl $0x10ca7, 4(%rbp)
0x7fd7e400072a:  48 8d 05 0f ff ff ff leaq -0xf1(%rip), %rax
0x7fd7e4000731:  e9 e2 f8 ff ff   jmp  0x7fd7e418
0x7fd7e4000736:  90   nop   <- here 
too.
0x7fd7e4000737:  e9 00 00 00 00   jmp  0x7fd7e400073c
0x7fd7e400073c:  c7 45 00 9f 0c 01 00 movl $0x10c9f, (%rbp)
0x7fd7e4000743:  c7 45 04 9b 0c 01 00 movl $0x10c9b, 4(%rbp)
0x7fd7e400074a:  48 8d 05 f0 fe ff ff leaq -0x110(%rip), %rax
0x7fd7e4000751:  e9 c2 f8 ff ff   jmp  0x7fd7e418

I assume those nops/jmp+0 are there to allow inserting breakpoints?

But those nops/jmps are never in the tcg_gen_lookup_and_goto_ptr() branch,
probably because breakpoints are checked in HELPER(lookup_tb_ptr), right?

0x7ff47c0006d0:  0f 85 18 00 00 00    jne  0x7ff47c0006ee
0x7ff47c0006d6:  c7 45 00 a3 0c 01 00 movl $0x10ca3, (%rbp)
0x7ff47c0006dd:  c7 45 04 a7 0c 01 00 movl $0x10ca7, 4(%rbp)
0x7ff47c0006e4:  48 8b fd movq %rbp, %rdi
0x7ff47c0006e7:  e8 34 bf 42 0f   callq    0x7ff48b42c620
0x7ff47c0006ec:  ff e0    jmpq *%rax
0x7ff47c0006ee:  c7 45 00 9f 0c 01 00 movl $0x10c9f, (%rbp)
0x7ff47c0006f5:  c7 45 04 9b 0c 01 00 movl $0x10c9b, 4(%rbp)
0x7ff47c0006fc:  48 8b fd movq %rbp, %rdi
0x7ff47c0006ff:  e8 1c bf 42 0f   callq    0x7ff48b42c620
0x7ff47c000704:  ff e0    jmpq *%rax

Question:
Couldn't those nops be avoided in the tcg_gen_goto_tb() branch too, e.g.
if breakpoints are checked when returning through the prologue exit path?

Thanks!
Helge



Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()

2023-07-30 Thread Richard Henderson

On 7/30/23 10:56, Helge Deller wrote:

I'm quite unclear about translator_use_goto_tb() for qemu-user
emulation (and in general).

Based on the function name, the function translator_use_goto_tb() shall
help to decide if a program should use goto_tb() and exit_tb() to jump
to the next instruction.

Currently, if the destination is on the same page, it returns true.
I wonder, if it shouldn't return false in this case instead, because
arches have code like this: (taken from target/hppa/translate.c):
 if (... && translator_use_goto_tb(ctx, f)) {
 tcg_gen_goto_tb(which);
 tcg_gen_movi_reg(cpu_iaoq_f, f);
 tcg_gen_movi_reg(cpu_iaoq_b, b);
 tcg_gen_exit_tb(ctx->base.tb, which);
 } else {
 copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b);
 copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var);
 tcg_gen_lookup_and_goto_ptr();
 }

Shouldn't, if the destination is on the same page, the (faster?)
path with tcg_gen_lookup_and_goto_ptr() be taken instead?


No, because tcg_gen_lookup_and_goto_ptr is not the faster path.
That always involves a lookup, then an indirect branch.

The goto_tb path is linked, so only requires a lookup once, and the branch may be direct 
(depending on the host architecture).



r~



Re: [PATCH] elf2dmp: Don't abandon when Prcb is set to 0

2023-07-30 Thread Viktor Prutyanov


>> On 2023/06/12 12:42, Viktor Prutyanov wrote:
>>
 Prcb may be set to 0 for some CPUs if the dump was taken before they
 start. The dump may still contain valuable information for started CPUs
 so don't abandon conversion in such a case.

 Signed-off-by: Akihiko Odaki 
 ---
 contrib/elf2dmp/main.c | 5 +
 1 file changed, 5 insertions(+)

 diff --git a/contrib/elf2dmp/main.c b/contrib/elf2dmp/main.c
 index d77b8f98f7..91c58e4424 100644
 --- a/contrib/elf2dmp/main.c
 +++ b/contrib/elf2dmp/main.c
 @@ -312,6 +312,11 @@ static int fill_context(KDDEBUGGER_DATA64 *kdbg,
 return 1;
 }

 + if (!Prcb) {
 + eprintf("Context for CPU #%d is missing\n", i);
 + continue;
 + }
 +
 if (va_space_rw(vs, Prcb + kdbg->OffsetPrcbContext,
 &Context, sizeof(Context), 0)) {
 eprintf("Failed to read CPU #%d ContextFrame location\n", i);

 --
 2.40.1
>>>
>>> Hi Akihiko,
>>>
>>> How can this fix be tested?
>>
>> It is a bit difficult to test it as you need to interrupt the very early
>> stage of boot. I applied the following change to TCG so that it stops
>> immediately after the first processor configures Prcb.
>>
>> diff --git a/target/i386/tcg/sysemu/misc_helper.c
>> b/target/i386/tcg/sysemu/misc_helper.c
>> index e1528b7f80..f68eba9cac 100644
>> --- a/target/i386/tcg/sysemu/misc_helper.c
>> +++ b/target/i386/tcg/sysemu/misc_helper.c
>> @@ -25,6 +25,9 @@
>> #include "exec/address-spaces.h"
>> #include "exec/exec-all.h"
>> #include "tcg/helper-tcg.h"
>> +#include "exec/gdbstub.h"
>> +#include "hw/core/cpu.h"
>> +#include "sysemu/runstate.h"
>>
>> void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
>> {
>> @@ -217,7 +220,10 @@ void helper_wrmsr(CPUX86State *env)
>> env->segs[R_FS].base = val;
>> break;
>> case MSR_GSBASE:
>> + printf("%s: %" PRIx64 "\n", __func__, val);
>> env->segs[R_GS].base = val;
>> + gdb_set_stop_cpu(current_cpu);
>> + vm_stop(RUN_STATE_PAUSED);
>> break;
>> case MSR_KERNELGSBASE:
>> env->kernelgsbase = val;
>>
>>> NumberProcessors field is still set to qemu_elf.state_nr, how does WinDbg 
>>> react to this?
>>
>> If Prcb for processor 1 is missing, WinDbg outputs: KiProcessorBlock[1]
>> is null.
>> You can still debug the started processors with no issue.
>>
>> Regards,
>> Akihiko Odaki
>>
>>> Viktor
> 
> Reviewed-by: Viktor Prutyanov 

Hi Peter,

Could you please put Akihiko's patch into your branch?

Thank you,
Viktor Prutyanov



[PULL 0/1] hw/nvme fixes

2023-07-30 Thread Klaus Jensen
From: Klaus Jensen 

Hi,

This should also fix coverity cid 1518067 and 1518066.

The following changes since commit ccb86f079a9e4d94918086a9df18c1844347aff8:

  Merge tag 'pull-nbd-2023-07-28' of https://repo.or.cz/qemu/ericb into staging 
(2023-07-28 09:56:57 -0700)

are available in the Git repository at:

  https://gitlab.com/birkelund/qemu.git tags/nvme-next-pull-request

for you to fetch changes up to c1e244b6552efdff5612a33c6630aaf95964eaf5:

  hw/nvme: use stl/ldl pci dma api (2023-07-30 20:09:54 +0200)


hw/nvme fixes

- use the stl/ldl pci dma api
-----BEGIN PGP SIGNATURE-----

iQEzBAABCgAdFiEEUigzqnXi3OaiR2bATeGvMW1PDekFAmTGuc8ACgkQTeGvMW1P
Dek41wgAwqgRmtUhmmaQJJpF5Pya3J7n3Zkbp+cULdnSp/su7W7yIUTcTzdbr34d
9LbNHmWerXYinlIxG08ZWw2lq0TwApKj+8gv/wf8H7dG86/pBYfoQvOlkNx2QKyR
vtRNlILCEbJpbSfY3LbFNvRGOkArr6HkzT4hZprUIfCvRg58u5oIxEx/ZYa+m3WU
ED0y/46e7HbVbmbwJKrn4EK3k0zGdFyeINRZ5TB5DML3lCTX6eaZTLUXGIb7LLcK
Xyv6/TCkPTggDszTam24kx0A7DhC+3f2C8DsJg7H8jnWb1F+oq/2EJam/0HU22Uk
n348MrWOusuF7kbHMCP9h28gYT3aWw==
=KjVO
-END PGP SIGNATURE-



Klaus Jensen (1):
  hw/nvme: use stl/ldl pci dma api

 hw/nvme/ctrl.c | 42 +-
 1 file changed, 13 insertions(+), 29 deletions(-)

-- 
2.41.0




[PULL 1/1] hw/nvme: use stl/ldl pci dma api

2023-07-30 Thread Klaus Jensen
From: Klaus Jensen 

Use the stl/ldl pci dma api for writing/reading doorbells. This removes
the explicit endian conversions.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Cédric Le Goater 
Tested-by: Cédric Le Goater 
Reviewed-by: Thomas Huth 
Signed-off-by: Klaus Jensen 
---
 hw/nvme/ctrl.c | 42 +-
 1 file changed, 13 insertions(+), 29 deletions(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dadc2dc7da10..f2e5a2fa737b 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -1468,20 +1468,16 @@ static inline void nvme_blk_write(BlockBackend *blk, 
int64_t offset,
 
 static void nvme_update_cq_eventidx(const NvmeCQueue *cq)
 {
-uint32_t v = cpu_to_le32(cq->head);
-
 trace_pci_nvme_update_cq_eventidx(cq->cqid, cq->head);
 
> -pci_dma_write(PCI_DEVICE(cq->ctrl), cq->ei_addr, &v, sizeof(v));
+stl_le_pci_dma(PCI_DEVICE(cq->ctrl), cq->ei_addr, cq->head,
+   MEMTXATTRS_UNSPECIFIED);
 }
 
 static void nvme_update_cq_head(NvmeCQueue *cq)
 {
-uint32_t v;
-
> -pci_dma_read(PCI_DEVICE(cq->ctrl), cq->db_addr, &v, sizeof(v));
> -
> -cq->head = le32_to_cpu(v);
> +ldl_le_pci_dma(PCI_DEVICE(cq->ctrl), cq->db_addr, &cq->head,
+   MEMTXATTRS_UNSPECIFIED);
 
 trace_pci_nvme_update_cq_head(cq->cqid, cq->head);
 }
@@ -6801,7 +6797,6 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
 PCIDevice *pci = PCI_DEVICE(n);
 uint64_t dbs_addr = le64_to_cpu(req->cmd.dptr.prp1);
 uint64_t eis_addr = le64_to_cpu(req->cmd.dptr.prp2);
-uint32_t v;
 int i;
 
 /* Address should be page aligned */
@@ -6819,8 +6814,6 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
 NvmeCQueue *cq = n->cq[i];
 
 if (sq) {
-v = cpu_to_le32(sq->tail);
-
 /*
  * CAP.DSTRD is 0, so offset of ith sq db_addr is (i<<3)
  * nvme_process_db() uses this hard-coded way to calculate
@@ -6828,7 +6821,7 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
  */
 sq->db_addr = dbs_addr + (i << 3);
 sq->ei_addr = eis_addr + (i << 3);
> -pci_dma_write(pci, sq->db_addr, &v, sizeof(sq->tail));
+stl_le_pci_dma(pci, sq->db_addr, sq->tail, MEMTXATTRS_UNSPECIFIED);
 
 if (n->params.ioeventfd && sq->sqid != 0) {
 if (!nvme_init_sq_ioeventfd(sq)) {
@@ -6838,12 +6831,10 @@ static uint16_t nvme_dbbuf_config(NvmeCtrl *n, const 
NvmeRequest *req)
 }
 
 if (cq) {
-v = cpu_to_le32(cq->head);
-
 /* CAP.DSTRD is 0, so offset of ith cq db_addr is (i<<3)+(1<<2) */
 cq->db_addr = dbs_addr + (i << 3) + (1 << 2);
 cq->ei_addr = eis_addr + (i << 3) + (1 << 2);
> -pci_dma_write(pci, cq->db_addr, &v, sizeof(cq->head));
+stl_le_pci_dma(pci, cq->db_addr, cq->head, MEMTXATTRS_UNSPECIFIED);
 
 if (n->params.ioeventfd && cq->cqid != 0) {
 if (!nvme_init_cq_ioeventfd(cq)) {
@@ -6974,20 +6965,16 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 
 static void nvme_update_sq_eventidx(const NvmeSQueue *sq)
 {
-uint32_t v = cpu_to_le32(sq->tail);
-
 trace_pci_nvme_update_sq_eventidx(sq->sqid, sq->tail);
 
> -pci_dma_write(PCI_DEVICE(sq->ctrl), sq->ei_addr, &v, sizeof(v));
+stl_le_pci_dma(PCI_DEVICE(sq->ctrl), sq->ei_addr, sq->tail,
+   MEMTXATTRS_UNSPECIFIED);
 }
 
 static void nvme_update_sq_tail(NvmeSQueue *sq)
 {
-uint32_t v;
-
> -pci_dma_read(PCI_DEVICE(sq->ctrl), sq->db_addr, &v, sizeof(v));
> -
> -sq->tail = le32_to_cpu(v);
> +ldl_le_pci_dma(PCI_DEVICE(sq->ctrl), sq->db_addr, &sq->tail,
+   MEMTXATTRS_UNSPECIFIED);
 
 trace_pci_nvme_update_sq_tail(sq->sqid, sq->tail);
 }
@@ -7592,7 +7579,7 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, 
unsigned size)
 static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int val)
 {
 PCIDevice *pci = PCI_DEVICE(n);
-uint32_t qid, v;
+uint32_t qid;
 
 if (unlikely(addr & ((1 << 2) - 1))) {
 NVME_GUEST_ERR(pci_nvme_ub_db_wr_misaligned,
@@ -7659,8 +7646,7 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int 
val)
 start_sqs = nvme_cq_full(cq) ? 1 : 0;
 cq->head = new_head;
 if (!qid && n->dbbuf_enabled) {
-v = cpu_to_le32(cq->head);
> -pci_dma_write(pci, cq->db_addr, &v, sizeof(cq->head));
+stl_le_pci_dma(pci, cq->db_addr, cq->head, MEMTXATTRS_UNSPECIFIED);
 }
 if (start_sqs) {
 NvmeSQueue *sq;
@@ -7720,8 +7706,6 @@ static void nvme_process_db(NvmeCtrl *n, hwaddr addr, int 
val)
 
 sq->tail = new_tail;
 if (!qid && n->dbbuf_enabled) {
-v = cpu_to_le32(sq->tail);
-
 /*
  * The spec states "the host shall also update the controller's
  * corresponding doorbell property to 

Re: [PATCH] hda-audio: use log-scale for amplifier levels

2023-07-30 Thread Marc-André Lureau
Hi

On Sun, Jul 30, 2023 at 5:52 PM Yuanqing Li  wrote:
>
> According to Intel's High Definition Audio Specification (Revision 1.0a,
> Section 7.3.4.10: Amplifier Capabilities), the amplifier gain levels
> should be evenly spaced in dB, i.e. using a log scale instead of linear.
>
> Here, the hda-codec reports amplifier levels from 0 to -48 dB at 1-dB
> steps matching the 8-bit dynamic range, and the -49 dB level is mapped
> to 0 (muted).
>
> Signed-off-by: Yuanqing Li 
> ---
>  hw/audio/hda-codec.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/hw/audio/hda-codec.c b/hw/audio/hda-codec.c
> index c51d8ba617..c2aa71624b 100644
> --- a/hw/audio/hda-codec.c
> +++ b/hw/audio/hda-codec.c
> @@ -121,7 +121,7 @@ static void hda_codec_parse_fmt(uint32_t format, struct 
> audsettings *as)
>  #define QEMU_HDA_PCM_FORMATS (AC_SUPPCM_BITS_16 |   \
>0x1fc /* 16 -> 96 kHz */)
>  #define QEMU_HDA_AMP_NONE(0)
> -#define QEMU_HDA_AMP_STEPS   0x4a
> +#define QEMU_HDA_AMP_STEPS   0x31 /* 20 * log10(255) = 48 dB dynamic range */
>
>  #define   PARAM mixemu
>  #define   HDA_MIXER
> @@ -433,6 +433,14 @@ static void hda_audio_set_running(HDAAudioStream *st, 
> bool running)
>  }
>  }
>
> +/* first muted; then from -48 dB to 0 dB */
> +static const uint32_t hda_vol_table[] = {
> +0,   1,   1,   1,   1,   2,   2,   2,   2,   3,   3,   3,   4,   4,   5,
> +5,   6,   6,   7,   8,   9,   10,  11,  13,  14,  16,  18,  20,  23,  26,
> +29,  32,  36,  40,  45,  51,  57,  64,  72,  81,  90,  102, 114, 128, 143,
> +161, 181, 203, 227, 255
> +};

It was not clear to me which scale is applied by the backend. I guess
mixeng uses linear, and thus all others should too.

> +
>  static void hda_audio_set_amp(HDAAudioStream *st)
>  {
>  bool muted;
> @@ -446,8 +454,8 @@ static void hda_audio_set_amp(HDAAudioStream *st)
>  left  = st->mute_left  ? 0 : st->gain_left;
>  right = st->mute_right ? 0 : st->gain_right;
>
> -left = left * 255 / QEMU_HDA_AMP_STEPS;
> -right = right * 255 / QEMU_HDA_AMP_STEPS;
> +left = hda_vol_table[left];
> +right = hda_vol_table[right];
>

Whoo, better check bounds here. (value can go up to AC_AMP_GAIN 0x7f,
reading the code)

Definitely not trivial material imho

-- 
Marc-André Lureau



[PATCH] Fix some typos in documentation and comments

2023-07-30 Thread Stefan Weil via
Signed-off-by: Stefan Weil 
---

This patch was triggered by a spelling check for the generated
QEMU documentation using codespell. It does not try to fix all
typos which still exist in the QEMU code, but has a focus on
those required to fix the documentation. Nevertheless some code
comments with the same typos were fixed, too.

I think the patch is trivial, so maybe it can still be included
in the upcoming release, but that's not strictly necessary.

Stefan

 docs/about/deprecated.rst| 2 +-
 docs/devel/qom.rst   | 2 +-
 docs/system/devices/nvme.rst | 2 +-
 hw/core/loader.c | 4 ++--
 include/exec/memory.h| 2 +-
 ui/vnc-enc-tight.c   | 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 1c35f55666..92a2bafd2b 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -369,7 +369,7 @@ mapping permissions et al by using its 'mapped' security 
model option.
 Nowadays it would make sense to reimplement the ``proxy`` backend by using
 QEMU's ``vhost`` feature, which would eliminate the high latency costs under
 which the 9p ``proxy`` backend currently suffers. However as of to date nobody
-has indicated plans for such kind of reimplemention unfortunately.
+has indicated plans for such kind of reimplementation unfortunately.
 
 
 Block device options
diff --git a/docs/devel/qom.rst b/docs/devel/qom.rst
index 0b506426d7..9918fac7f2 100644
--- a/docs/devel/qom.rst
+++ b/docs/devel/qom.rst
@@ -30,7 +30,7 @@ user configuration.
 Creating a QOM class
 
 
-A simple minimal device implementation may look something like bellow:
+A simple minimal device implementation may look something like below:
 
 .. code-block:: c
:caption: Creating a minimal type
diff --git a/docs/system/devices/nvme.rst b/docs/system/devices/nvme.rst
index a8bb8d729c..2a3af268f7 100644
--- a/docs/system/devices/nvme.rst
+++ b/docs/system/devices/nvme.rst
@@ -232,7 +232,7 @@ parameters:
   Set the number of Reclaim Groups.
 
 ``fdp.nruh`` (default: ``0``)
-  Set the number of Reclaim Unit Handles. This is a mandatory paramater and
+  Set the number of Reclaim Unit Handles. This is a mandatory parameter and
   must be non-zero.
 
 ``fdp.runs`` (default: ``96M``)
diff --git a/hw/core/loader.c b/hw/core/loader.c
index 8b7fd9e9e5..4dd5a71fb7 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -863,7 +863,7 @@ ssize_t load_image_gzipped(const char *filename, hwaddr 
addr, uint64_t max_sz)
 
 /*
  * The Linux header magic number for a EFI PE/COFF
- * image targetting an unspecified architecture.
+ * image targeting an unspecified architecture.
  */
 #define EFI_PE_LINUX_MAGIC"\xcd\x23\x82\x81"
 
@@ -1492,7 +1492,7 @@ RomGap rom_find_largest_gap_between(hwaddr base, size_t 
size)
 if (rom->mr || rom->fw_file) {
 continue;
 }
-/* ignore anything finishing bellow base */
+/* ignore anything finishing below base */
 if (rom->addr + rom->romsize <= base) {
 continue;
 }
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 7f5c11a0cc..68284428f8 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -942,7 +942,7 @@ struct MemoryListener {
  *
  * @listener: The #MemoryListener.
  * @last_stage: The last stage to synchronize the log during migration.
- * The caller should gurantee that the synchronization with true for
+ * The caller should guarantee that the synchronization with true for
  * @last_stage is triggered for once after all VCPUs have been stopped.
  */
 void (*log_sync_global)(MemoryListener *listener, bool last_stage);
diff --git a/ui/vnc-enc-tight.c b/ui/vnc-enc-tight.c
index 09200d71b8..ee853dcfcb 100644
--- a/ui/vnc-enc-tight.c
+++ b/ui/vnc-enc-tight.c
@@ -77,7 +77,7 @@ static int tight_send_framebuffer_update(VncState *vs, int x, 
int y,
 
 #ifdef CONFIG_VNC_JPEG
 static const struct {
-double jpeg_freq_min;   /* Don't send JPEG if the freq is bellow */
+double jpeg_freq_min;   /* Don't send JPEG if the freq is below */
 double jpeg_freq_threshold; /* Always send JPEG if the freq is above */
 int jpeg_idx;   /* Allow indexed JPEG */
 int jpeg_full;  /* Allow full color JPEG */
-- 
2.39.2




Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()

2023-07-30 Thread Helge Deller

On 7/30/23 20:02, Richard Henderson wrote:

On 7/30/23 11:01, Richard Henderson wrote:

On 7/30/23 10:56, Helge Deller wrote:

+++ b/accel/tcg/translator.c
@@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr 
dest)
  return false;
  }

+#ifndef CONFIG_USER_ONLY
  /* Check for the dest on the same page as the start of the TB.  */
  return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0;
+#else
+    /* linux-user doesn't need to fear pagefaults for exec swap-in */
+    return false;
+#endif
  }


No, we still need to stop at pages for breakpoints.

... and munmap for e.g. dlclose.


Ok, so we can't optimize for linux-user.
But what about my other question: Shouldn't it be
  return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) != 0;
?

helge



[RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()

2023-07-30 Thread Helge Deller
I'm quite unclear about translator_use_goto_tb() for qemu-user
emulation (and in general).

Based on the function name, the function translator_use_goto_tb() shall
help to decide if a program should use goto_tb() and exit_tb() to jump
to the next instruction.

Currently, if the destination is on the same page, it returns true.
I wonder, if it shouldn't return false in this case instead, because
arches have code like this: (taken from target/hppa/translate.c):
if (... && translator_use_goto_tb(ctx, f)) {
tcg_gen_goto_tb(which);
tcg_gen_movi_reg(cpu_iaoq_f, f);
tcg_gen_movi_reg(cpu_iaoq_b, b);
tcg_gen_exit_tb(ctx->base.tb, which);
} else {
copy_iaoq_entry(cpu_iaoq_f, f, cpu_iaoq_b);
copy_iaoq_entry(cpu_iaoq_b, b, ctx->iaoq_n_var);
tcg_gen_lookup_and_goto_ptr();
}

Shouldn't, if the destination is on the same page, the (faster?)
path with tcg_gen_lookup_and_goto_ptr() be taken instead?

Now, for user-mode emulation page faults can't happen at all.
Shouldn't the tcg_gen_lookup_and_goto_ptr() path be taken unconditionally
in this case, as shown in the patch below?

Signed-off-by: Helge Deller 

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 1a6a5448c8..07224a7f83 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr 
dest)
 return false;
 }

+#ifndef CONFIG_USER_ONLY
 /* Check for the dest on the same page as the start of the TB.  */
 return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0;
+#else
+/* linux-user doesn't need to fear pagefaults for exec swap-in */
+return false;
+#endif
 }

 void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,



Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()

2023-07-30 Thread Richard Henderson

On 7/30/23 10:56, Helge Deller wrote:

+++ b/accel/tcg/translator.c
@@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr 
dest)
  return false;
  }

+#ifndef CONFIG_USER_ONLY
  /* Check for the dest on the same page as the start of the TB.  */
  return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0;
+#else
+/* linux-user doesn't need to fear pagefaults for exec swap-in */
+return false;
+#endif
  }


No, we still need to stop at pages for breakpoints.


r~



Re: [RFC][PATCH] accel/tcg: Use lookup_and_goto_ptr() for linux-user in translator_use_goto_tb()

2023-07-30 Thread Richard Henderson

On 7/30/23 11:01, Richard Henderson wrote:

On 7/30/23 10:56, Helge Deller wrote:

+++ b/accel/tcg/translator.c
@@ -124,8 +124,13 @@ bool translator_use_goto_tb(DisasContextBase *db, vaddr 
dest)
  return false;
  }

+#ifndef CONFIG_USER_ONLY
  /* Check for the dest on the same page as the start of the TB.  */
  return ((db->pc_first ^ dest) & TARGET_PAGE_MASK) == 0;
+#else
+    /* linux-user doesn't need to fear pagefaults for exec swap-in */
+    return false;
+#endif
  }


No, we still need to stop at pages for breakpoints.


... and munmap for e.g. dlclose.


r~




Re: [PATCH] target/hppa: Move iaoq registers and thus reduce generated code size

2023-07-30 Thread Richard Henderson

On 7/30/23 09:30, Helge Deller wrote:

On hppa the Instruction Address Offset Queue (IAOQ) registers specify
the addresses of the next instructions to be executed. Each generated TB writes those
registers at least once, so those registers are used heavily in generated
code.

Looking at the generated assembly, for a x86-64 host this code
to write the address $0x7ffe826f into iaoq_f is generated:
0x7f73e8000184:  c7 85 d4 01 00 00 6f 82  movl $0x7ffe826f, 0x1d4(%rbp)
0x7f73e800018c:  fe 7f
0x7f73e800018e:  c7 85 d8 01 00 00 73 82  movl $0x7ffe8273, 0x1d8(%rbp)
0x7f73e8000196:  fe 7f

With this trivial change, by moving the variables iaoq_f and iaoq_b to
the top of struct CPUArchState, the offset from %rbp is reduced (from
0x1d4 to 0), which allows the x86-64 TCG backend to generate 3 fewer bytes
of code per move instruction:
0x7fc1e800018c:  c7 45 00 6f 82 fe 7f movl $0x7ffe826f, (%rbp)
0x7fc1e8000193:  c7 45 04 73 82 fe 7f movl $0x7ffe8273, 4(%rbp)

Overall this is a reduction of generated code (not a reduction of
number of instructions).
A test run which checks the generated code size by running "/bin/ls"
with qemu-user shows that the code size shrinks from 1616767 to 1569273
bytes, which is ~97% of the former size.

Signed-off-by: Helge Deller


Reviewed-by: Richard Henderson 

r~



Re: [PATCH] target/openrisc: Set EPCR to next PC on FPE exceptions

2023-07-30 Thread Richard Henderson

On 7/29/23 14:08, Stafford Horne wrote:

The architecture specification calls for the EPCR to be set to "Address
of next not executed instruction" when there is a floating point
exception (FPE).  This was not being done, so fix it by using the same
method as syscall.  Note, this may need a lot more work if we start
seeing floating point operations in delay slots with exceptions
enabled.

Without this patch FPU exceptions will loop, as the exception handling
will always return to the failed floating point instruction.

This was not noticed in earlier testing because:

  1. The compiler usually generates code which clobbers the input operand
 such as:

   lf.div.s r19,r17,r19

  2. The target will store the operation output to the register
 before handling the exception.  So an operation such as:

   float a = 100.0f;
   float b = 0.0f;
   float c = a / b;/* lf.div.s r19,r17,r19 */

 Will first execute:

   100 / 0-> Store inf to c (r19)
  -> triggering divide by zero exception
  -> handle and return

 Then it will execute:

   100 / inf  -> Store 0 to c  (no exception)

To confirm the looping behavior and the fix, I used the following:

 float fpu_div(float a, float b) {
float c;
asm volatile("lf.div.s %0, %1, %2"
  : "+r" (c)
  : "r" (a), "r" (b));
return c;
 }

Signed-off-by: Stafford Horne 
---
  target/openrisc/interrupt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/openrisc/interrupt.c b/target/openrisc/interrupt.c
index 3887812810..9b14b8a2c6 100644
--- a/target/openrisc/interrupt.c
+++ b/target/openrisc/interrupt.c
@@ -34,7 +34,7 @@ void openrisc_cpu_do_interrupt(CPUState *cs)
  int exception = cs->exception_index;
  
  env->epcr = env->pc;

-if (exception == EXCP_SYSCALL) {
+if (exception == EXCP_SYSCALL || exception == EXCP_FPE) {
  env->epcr += 4;
  }
  /* When we have an illegal instruction the error effective address


According to Table 6-3, when in a delay slot the EPCR should be the address of the jump, 
for both syscall and fpe.  This whole block should be moved down...



/* Set/clear dsx to indicate if we are in a delay slot exception.  */
if (env->dflag) {
env->dflag = 0;
env->sr |= SR_DSX;
env->epcr -= 4;
} else {
env->sr &= ~SR_DSX;
}


... into the else.

With that,
Reviewed-by: Richard Henderson 


r~



[PATCH QEMU v2 3/3] tests/migration: Introduce dirty-limit into guestperf

2023-07-30 Thread ~hyman
From: Hyman Huang(黄勇) 

Currently, guestperf does not cover dirty-limit
migration; add support for this feature.

Note that dirty-limit requires 'dirty-ring-size' set.

To enable dirty-limit, set x-vcpu-dirty-limit-period
to 500ms and x-vcpu-dirty-limit to 10MB/s:
$ ./tests/migration/guestperf.py \
--dirty-ring-size 4096 \
--dirty-limit --x-vcpu-dirty-limit-period 500 \
--vcpu-dirty-limit 10 --output output.json \

To run the entire standardized set of dirty-limit-enabled
comparisons, with unix migration:
$ ./tests/migration/guestperf-batch.py \
--dirty-ring-size 4096 \
--dst-host localhost --transport unix \
--filter compr-dirty-limit* --output outputdir

Signed-off-by: Hyman Huang(黄勇) 
---
 tests/migration/guestperf/comparison.py | 23 +++
 tests/migration/guestperf/engine.py | 17 +
 tests/migration/guestperf/progress.py   | 16 ++--
 tests/migration/guestperf/scenario.py   | 11 ++-
 tests/migration/guestperf/shell.py  | 18 +-
 5 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/comparison.py b/tests/migration/guestperf/comparison.py
index c03b3f6d7e..42cc0372d1 100644
--- a/tests/migration/guestperf/comparison.py
+++ b/tests/migration/guestperf/comparison.py
@@ -135,4 +135,27 @@ COMPARISONS = [
 Scenario("compr-multifd-channels-64",
  multifd=True, multifd_channels=64),
 ]),
+
+# Looking at effect of dirty-limit with
+# varying x_vcpu_dirty_limit_period
+Comparison("compr-dirty-limit-period", scenarios = [
+Scenario("compr-dirty-limit-period-500",
+ dirty_limit=True, x_vcpu_dirty_limit_period=500),
+Scenario("compr-dirty-limit-period-800",
+ dirty_limit=True, x_vcpu_dirty_limit_period=800),
+Scenario("compr-dirty-limit-period-1000",
+ dirty_limit=True, x_vcpu_dirty_limit_period=1000),
+]),
+
+
+# Looking at effect of dirty-limit with
+# varying vcpu_dirty_limit
+Comparison("compr-dirty-limit", scenarios = [
+Scenario("compr-dirty-limit-10MB",
+ dirty_limit=True, vcpu_dirty_limit=10),
+Scenario("compr-dirty-limit-20MB",
+ dirty_limit=True, vcpu_dirty_limit=20),
+Scenario("compr-dirty-limit-50MB",
+ dirty_limit=True, vcpu_dirty_limit=50),
+]),
 ]
diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index 29ebb5011b..93a6f78e46 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -102,6 +102,8 @@ class Engine(object):
 info.get("expected-downtime", 0),
 info.get("setup-time", 0),
 info.get("cpu-throttle-percentage", 0),
+info.get("dirty-limit-throttle-time-per-round", 0),
+info.get("dirty-limit-ring-full-time", 0),
 )
 
 def _migrate(self, hardware, scenario, src, dst, connect_uri):
@@ -203,6 +205,21 @@ class Engine(object):
 resp = dst.command("migrate-set-parameters",
multifd_channels=scenario._multifd_channels)
 
+if scenario._dirty_limit:
+if not hardware._dirty_ring_size:
+raise Exception("dirty ring size must be configured when "
+"testing dirty limit migration")
+
+resp = src.command("migrate-set-capabilities",
+   capabilities = [
+   { "capability": "dirty-limit",
+ "state": True }
+   ])
+resp = src.command("migrate-set-parameters",
+x_vcpu_dirty_limit_period=scenario._x_vcpu_dirty_limit_period)
+resp = src.command("migrate-set-parameters",
+   vcpu_dirty_limit=scenario._vcpu_dirty_limit)
+
 resp = src.command("migrate", uri=connect_uri)
 
 post_copy = False
diff --git a/tests/migration/guestperf/progress.py b/tests/migration/guestperf/progress.py
index ab1ee57273..d490584217 100644
--- a/tests/migration/guestperf/progress.py
+++ b/tests/migration/guestperf/progress.py
@@ -81,7 +81,9 @@ class Progress(object):
  downtime,
  downtime_expected,
  setup_time,
- throttle_pcent):
+ throttle_pcent,
+ dirty_limit_throttle_time_per_round,
+ dirty_limit_ring_full_time):
 
 self._status = status
 self._ram = ram
@@ -91,6 +93,10 @@ class Progress(object):
 self._downtime_expected = downtime_expected
 self._setup_time = setup_time
 self._throttle_pcent = throttle_pcent
+self._dirty_limit_throttle_time_per_round = \
+dirty_limit_throttle_time_per_round
+self._dirty_limit_ring_full_time = \
+

Re: [PATCH v2] target/ppc: Fix VRMA page size for ISA v3.0

2023-07-30 Thread Cédric Le Goater

On 7/30/23 13:18, Nicholas Piggin wrote:

Until v2.07s, the VRMA page size (L||LP) was encoded in LPCR[VRMASD].
In v3.0 that moved to the partition table PS field.

The powernv machine can now run KVM HPT guests on POWER9/10 CPUs with
this fix and the patch to add ASDR.

Fixes: 3367c62f522b ("target/ppc: Support for POWER9 native hash")
Signed-off-by: Nicholas Piggin 
---
Since v1:
- Added llp variable to avoid calling get_vrma_llp twice [Cedric].
- Added some bit defines for architected fields and values [Cedric].


Thanks,


Reviewed-by: Cédric Le Goater 

C.




Patches 1,3 from the previously posted series, let's defer 4-6
decrementer fixes until after 8.1, so this is the last remaining
one from the series.

Thanks,
Nick

  target/ppc/mmu-hash64.c | 45 +++--
  target/ppc/mmu-hash64.h |  5 +
  2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index a0c90df3ce..d645c0bb94 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -874,12 +874,46 @@ static target_ulong rmls_limit(PowerPCCPU *cpu)
  return rma_sizes[rmls];
  }
  
-static int build_vrma_slbe(PowerPCCPU *cpu, ppc_slb_t *slb)
+/* Return the LLP in SLB_VSID format */
+static uint64_t get_vrma_llp(PowerPCCPU *cpu)
+static uint64_t get_vrma_llp(PowerPCCPU *cpu)
  {
   CPUPPCState *env = &cpu->env;
-target_ulong lpcr = env->spr[SPR_LPCR];
-uint32_t vrmasd = (lpcr & LPCR_VRMASD) >> LPCR_VRMASD_SHIFT;
-target_ulong vsid = SLB_VSID_VRMA | ((vrmasd << 4) & SLB_VSID_LLP_MASK);
+uint64_t llp;
+
+if (env->mmu_model == POWERPC_MMU_3_00) {
+ppc_v3_pate_t pate;
+uint64_t ps, l, lp;
+
+/*
+ * ISA v3.0 removes the LPCR[VRMASD] field and puts the VRMA base
+ * page size (L||LP equivalent) in the PS field in the HPT partition
+ * table entry.
+ */
+if (!ppc64_v3_get_pate(cpu, cpu->env.spr[SPR_LPIDR], &pate)) {
+error_report("Bad VRMA with no partition table entry");
+return 0;
+}
+ps = PATE0_GET_PS(pate.dw0);
+/* PS has L||LP in 3 consecutive bits, put them into SLB LLP format */
+l = (ps >> 2) & 0x1;
+lp = ps & 0x3;
+llp = (l << SLB_VSID_L_SHIFT) | (lp << SLB_VSID_LP_SHIFT);
+
+} else {
+uint64_t lpcr = env->spr[SPR_LPCR];
+target_ulong vrmasd = (lpcr & LPCR_VRMASD) >> LPCR_VRMASD_SHIFT;
+
+/* VRMASD LLP matches SLB format, just shift and mask it */
+llp = (vrmasd << SLB_VSID_LP_SHIFT) & SLB_VSID_LLP_MASK;
+}
+
+return llp;
+}
+
+static int build_vrma_slbe(PowerPCCPU *cpu, ppc_slb_t *slb)
+{
+uint64_t llp = get_vrma_llp(cpu);
+target_ulong vsid = SLB_VSID_VRMA | llp;
  int i;
  
  for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {

@@ -897,8 +931,7 @@ static int build_vrma_slbe(PowerPCCPU *cpu, ppc_slb_t *slb)
  }
  }
  
-error_report("Bad page size encoding in LPCR[VRMASD]; LPCR=0x"
- TARGET_FMT_lx, lpcr);
+error_report("Bad VRMA page size encoding 0x" TARGET_FMT_lx, llp);
  
  return -1;

  }
diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
index 1496955d38..de653fcae5 100644
--- a/target/ppc/mmu-hash64.h
+++ b/target/ppc/mmu-hash64.h
@@ -41,8 +41,10 @@ void ppc_hash64_finalize(PowerPCCPU *cpu);
  #define SLB_VSID_KP 0x0400ULL
  #define SLB_VSID_N  0x0200ULL /* no-execute */
  #define SLB_VSID_L  0x0100ULL
+#define SLB_VSID_L_SHIFTPPC_BIT_NR(55)
  #define SLB_VSID_C  0x0080ULL /* class */
  #define SLB_VSID_LP 0x0030ULL
+#define SLB_VSID_LP_SHIFT   PPC_BIT_NR(59)
  #define SLB_VSID_ATTR   0x0FFFULL
  #define SLB_VSID_LLP_MASK   (SLB_VSID_L | SLB_VSID_LP)
  #define SLB_VSID_4K 0xULL
@@ -58,6 +60,9 @@ void ppc_hash64_finalize(PowerPCCPU *cpu);
  #define SDR_64_HTABSIZE0x001FULL
  
  #define PATE0_HTABORG   0x0FFCULL

+#define PATE0_PSPPC_BITMASK(56, 58)
+#define PATE0_GET_PS(dw0)   (((dw0) & PATE0_PS) >> PPC_BIT_NR(58))
+
  #define HPTES_PER_GROUP 8
  #define HASH_PTE_SIZE_6416
  #define HASH_PTEG_SIZE_64   (HASH_PTE_SIZE_64 * HPTES_PER_GROUP)





[PATCH QEMU v2 2/3] tests/migration: Introduce dirty-ring-size option into guestperf

2023-07-30 Thread ~hyman
From: Hyman Huang(黄勇) 

Dirty ring size configuration is not supported by the guestperf tool.

Introduce a dirty-ring-size option (ranging in [1024, 65536]) so
developers can play with the dirty-ring and dirty-limit features
more easily.

To set the dirty ring size to 4096 during a migration test:
$ ./tests/migration/guestperf.py --dirty-ring-size 4096 xxx

Signed-off-by: Hyman Huang(黄勇) 
---
 tests/migration/guestperf/engine.py   | 6 +-
 tests/migration/guestperf/hardware.py | 8 ++--
 tests/migration/guestperf/shell.py| 6 +-
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/tests/migration/guestperf/engine.py b/tests/migration/guestperf/engine.py
index e69d16a62c..29ebb5011b 100644
--- a/tests/migration/guestperf/engine.py
+++ b/tests/migration/guestperf/engine.py
@@ -325,7 +325,6 @@ class Engine(object):
 cmdline = "'" + cmdline + "'"
 
 argv = [
-"-accel", "kvm",
 "-cpu", "host",
 "-kernel", self._kernel,
 "-initrd", self._initrd,
@@ -333,6 +332,11 @@ class Engine(object):
 "-m", str((hardware._mem * 1024) + 512),
 "-smp", str(hardware._cpus),
 ]
+if hardware._dirty_ring_size:
+argv.extend(["-accel", "kvm,dirty-ring-size=%s" %
+ hardware._dirty_ring_size])
+else:
+argv.extend(["-accel", "kvm"])
 
 argv.extend(self._get_qemu_serial_args())
 
diff --git a/tests/migration/guestperf/hardware.py b/tests/migration/guestperf/hardware.py
index 3145785ffd..f779cc050b 100644
--- a/tests/migration/guestperf/hardware.py
+++ b/tests/migration/guestperf/hardware.py
@@ -23,7 +23,8 @@ class Hardware(object):
  src_cpu_bind=None, src_mem_bind=None,
  dst_cpu_bind=None, dst_mem_bind=None,
  prealloc_pages = False,
- huge_pages=False, locked_pages=False):
+ huge_pages=False, locked_pages=False,
+ dirty_ring_size=0):
 self._cpus = cpus
 self._mem = mem # GiB
 self._src_mem_bind = src_mem_bind # List of NUMA nodes
@@ -33,6 +34,7 @@ class Hardware(object):
 self._prealloc_pages = prealloc_pages
 self._huge_pages = huge_pages
 self._locked_pages = locked_pages
+self._dirty_ring_size = dirty_ring_size
 
 
 def serialize(self):
@@ -46,6 +48,7 @@ class Hardware(object):
 "prealloc_pages": self._prealloc_pages,
 "huge_pages": self._huge_pages,
 "locked_pages": self._locked_pages,
+"dirty_ring_size": self._dirty_ring_size,
 }
 
 @classmethod
@@ -59,4 +62,5 @@ class Hardware(object):
 data["dst_mem_bind"],
 data["prealloc_pages"],
 data["huge_pages"],
-data["locked_pages"])
+data["locked_pages"],
+data["dirty_ring_size"])
diff --git a/tests/migration/guestperf/shell.py b/tests/migration/guestperf/shell.py
index 8a809e3dda..7d6b8cd7cf 100644
--- a/tests/migration/guestperf/shell.py
+++ b/tests/migration/guestperf/shell.py
@@ -60,6 +60,8 @@ class BaseShell(object):
 parser.add_argument("--prealloc-pages", dest="prealloc_pages", default=False)
 parser.add_argument("--huge-pages", dest="huge_pages", default=False)
 parser.add_argument("--locked-pages", dest="locked_pages", default=False)
+parser.add_argument("--dirty-ring-size", dest="dirty_ring_size",
+default=0, type=int)
 
 self._parser = parser
 
@@ -89,7 +91,9 @@ class BaseShell(object):
 
 locked_pages=args.locked_pages,
 huge_pages=args.huge_pages,
-prealloc_pages=args.prealloc_pages)
+prealloc_pages=args.prealloc_pages,
+dirty_ring_size=args.dirty_ring_size)
 
 
 class Shell(BaseShell):
-- 
2.38.5




Re: [PATCH for-8.2 0/2] ppc: get rid of free() (gitlab #1798)

2023-07-30 Thread Daniel Henrique Barboza




On 7/29/23 12:35, Peter Maydell wrote:

On Fri, 28 Jul 2023 at 21:57, Daniel Henrique Barboza
 wrote:

Here's some trivial changes following Peter's call to arms against
free() and friends in gitlab issue #1798 in an attempt to enforce
our memory management guidelines [1].


To clarify, this isn't a "call to arms". The issue is marked up as
a "bite-sized task", which is to say that it's a potential easy
place to start for newcomers to the community who might be making
their first contribution to the codebase. The changes it suggests
aren't urgent; at most they're a nice-to-have, since glib
guarantees that you can mix malloc/free and g_malloc/g_free.


I failed to realize it was a bite-sized task :/ and my Coccinelle comment
in the bug makes me feel dumb hehe (given that Coccinelle is not newcomer
friendly).



We've had this sitting around as a suggestion on the wiki page
for bite-sized-tasks for years, and occasionally people come
through and have a go at it. I wanted to clean up and expand
on the description of what we had in mind for the change, to
give those people a better chance of successfully completing
the task.


What we can do then, since I already sent these, is perhaps link these patches
as example/template in the gitlab issue later on.


Thanks,

Daniel



thanks
-- PMM




Re: [PATCH for-8.2 2/2] target/ppc: use g_free() in test_opcode_table()

2023-07-30 Thread Daniel Henrique Barboza




On 7/29/23 12:32, Peter Maydell wrote:

On Fri, 28 Jul 2023 at 21:47, Daniel Henrique Barboza
 wrote:


Use g_free(table[i]) instead of free(table[i]) to comply with QEMU low
level memory management guidelines.

Signed-off-by: Daniel Henrique Barboza 
---
  target/ppc/translate.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index e6a0709066..d90535266e 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -7129,7 +7129,7 @@ static int test_opcode_table(opc_handler_t **table, int len)
  tmp = test_opcode_table(ind_table(table[i]),
  PPC_CPU_INDIRECT_OPCODES_LEN);
  if (tmp == 0) {
-free(table[i]);
+g_free(table[i]);
  table[i] = &invalid_handler;
  } else {
  count++;


Where is the allocation that this memory is free()ing? I
think it is the g_new() in create_new_table(), but the code
is a little complicated for me to understand...


It's on create_new_table() in the same file:

static int create_new_table(opc_handler_t **table, unsigned char idx)
{
opc_handler_t **tmp;

tmp = g_new(opc_handler_t *, PPC_CPU_INDIRECT_OPCODES_LEN);
fill_new_table(tmp, PPC_CPU_INDIRECT_OPCODES_LEN);
table[idx] = (opc_handler_t *)((uintptr_t)tmp | PPC_INDIRECT);

return 0;
}


I probably should've mentioned in the commit msg ...


Daniel




thanks
-- PMM




[PATCH QEMU v2 0/3] migration: enrich the dirty-limit test case

2023-07-30 Thread ~hyman
The dirty-limit migration test involves many passes
and takes about 1 minute on average, so put it in
the slow mode of migration-test. Inspired by Peter.

V2:
- put the dirty-limit migration test in slow mode and
  enrich the test case comment

The dirty-limit feature was introduced in 8.1, and the test
cases could be enriched to make sure the behavior and
the performance of dirty-limit are exactly what we want.

This series adds 2 test cases: the first commit aims
for the functional test and the others aim for the
performance test.

Please review, thanks.

Yong.

Hyman Huang(黄勇) (3):
  tests: Add migration dirty-limit capability test
  tests/migration: Introduce dirty-ring-size option into guestperf
  tests/migration: Introduce dirty-limit into guestperf

 tests/migration/guestperf/comparison.py |  23 
 tests/migration/guestperf/engine.py |  23 +++-
 tests/migration/guestperf/hardware.py   |   8 +-
 tests/migration/guestperf/progress.py   |  16 ++-
 tests/migration/guestperf/scenario.py   |  11 +-
 tests/migration/guestperf/shell.py  |  24 +++-
 tests/qtest/migration-test.c| 164 
 7 files changed, 261 insertions(+), 8 deletions(-)

-- 
2.38.5



[PATCH QEMU v2 1/3] tests: Add migration dirty-limit capability test

2023-07-30 Thread ~hyman
From: Hyman Huang(黄勇) 

Add a migration dirty-limit capability test, run when the kernel
supports dirty ring.

The dirty-limit capability introduces two parameters,
x-vcpu-dirty-limit-period and vcpu-dirty-limit, to implement
live migration with dirty limit.

The test case does the following things:
1. start src, dst vm and enable dirty-limit capability
2. start migration, then cancel it to check that dirty limit
   stops working.
3. restart dst vm
4. start migration and enable dirty-limit capability
5. check that migration satisfies the convergence condition
   during the pre-switchover phase.

Note that this test case involves many passes, so it runs
in slow mode only.

Signed-off-by: Hyman Huang(黄勇) 
---
 tests/qtest/migration-test.c | 164 +++
 1 file changed, 164 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 62d3f37021..0be2d17c42 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2739,6 +2739,166 @@ static void test_vcpu_dirty_limit(void)
 dirtylimit_stop_vm(vm);
 }
 
+static void migrate_dirty_limit_wait_showup(QTestState *from,
+const int64_t period,
+const int64_t value)
+{
+/* Enable dirty limit capability */
+migrate_set_capability(from, "dirty-limit", true);
+
+/* Set dirty limit parameters */
+migrate_set_parameter_int(from, "x-vcpu-dirty-limit-period", period);
+migrate_set_parameter_int(from, "vcpu-dirty-limit", value);
+
+/* Make sure migrate can't converge */
+migrate_ensure_non_converge(from);
+
+/* To check limit rate after precopy */
+migrate_set_capability(from, "pause-before-switchover", true);
+
+/* Wait for the serial output from the source */
+wait_for_serial("src_serial");
+}
+
+/*
+ * This test does:
+ *  source  destination
+ *  start vm
+ *  start incoming vm
+ *  migrate
+ *  wait dirty limit to begin
+ *  cancel migrate
+ *  cancellation check
+ *  restart incoming vm
+ *  migrate
+ *  wait dirty limit to begin
+ *  wait pre-switchover event
+ *  convergence condition check
+ *
+ * And see if dirty limit migration works correctly.
+ * This test case involves many passes, so it runs in slow mode only.
+ */
+static void test_migrate_dirty_limit(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+QTestState *from, *to;
+int64_t remaining;
+uint64_t throttle_us_per_full;
+/*
+ * We want the test to be stable and as fast as possible.
+ * E.g., with 1Gb/s bandwidth migration may pass without dirty limit,
+ * so we need to decrease the bandwidth.
+ */
+const int64_t dirtylimit_period = 1000, dirtylimit_value = 50;
+const int64_t max_bandwidth = 400000000; /* ~400Mb/s */
+const int64_t downtime_limit = 250; /* 250ms */
+/*
+ * We migrate through unix-socket (> 500Mb/s).
+ * Thus, expected migration speed ~= bandwidth limit (< 500Mb/s).
+ * So, we can predict expected_threshold
+ */
+const int64_t expected_threshold = max_bandwidth * downtime_limit / 1000;
+int max_try_count = 10;
+MigrateCommon args = {
+.start = {
+.hide_stderr = true,
+.use_dirty_ring = true,
+},
+.listen_uri = uri,
+.connect_uri = uri,
+};
+
+/* Start src, dst vm */
+if (test_migrate_start(&from, &to, args.listen_uri, &args.start)) {
+return;
+}
+
+/* Prepare for dirty limit migration and wait src vm show up */
+migrate_dirty_limit_wait_showup(from, dirtylimit_period, dirtylimit_value);
+
+/* Start migrate */
+migrate_qmp(from, uri, "{}");
+
+/* Wait for dirty limit throttle begin */
+throttle_us_per_full = 0;
+while (throttle_us_per_full == 0) {
+throttle_us_per_full =
+read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+usleep(100);
+g_assert_false(got_src_stop);
+}
+
+/* Now cancel migrate and wait for dirty limit throttle switch off */
+migrate_cancel(from);
+wait_for_migration_status(from, "cancelled", NULL);
+
+/* Check if dirty limit throttle switched off, set timeout 1ms */
+do {
+throttle_us_per_full =
+read_migrate_property_int(from, "dirty-limit-throttle-time-per-round");
+usleep(100);
+g_assert_false(got_src_stop);
+} while (throttle_us_per_full != 0 && --max_try_count);
+
+/* Assert dirty limit is not in service */
+g_assert_cmpint(throttle_us_per_full, ==, 0);
+
+args = (MigrateCommon) {
+.start = {
+.only_target = true,
+.use_dirty_ring = true,
+},
+.listen_uri = uri,
+.connect_uri = uri,
+};
+
+/* Restart dst vm, src vm already show up so we needn't wait anymore */
+if 

[PATCH QEMU v3 1/3] qapi: Reformat the dirty-limit migration doc comments

2023-07-30 Thread ~hyman
From: Hyman Huang(黄勇) 

Reformat the dirty-limit migration doc comments to conform
to current conventions, following commit a937b6aa739 (qapi:
Reformat doc comments to conform to current conventions).

Signed-off-by: Markus Armbruster 
Signed-off-by: Hyman Huang(黄勇) 
---
 qapi/migration.json | 69 ++---
 1 file changed, 34 insertions(+), 35 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index 6b49593d2f..a74ade4d72 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -258,17 +258,17 @@
 # blocked.  Present and non-empty when migration is blocked.
 # (since 6.0)
 #
-# @dirty-limit-throttle-time-per-round: Maximum throttle time (in microseconds) of virtual
-#   CPUs each dirty ring full round, which shows how
-#   MigrationCapability dirty-limit affects the guest
-#   during live migration. (since 8.1)
-#
-# @dirty-limit-ring-full-time: Estimated average dirty ring full time (in microseconds)
-#  each dirty ring full round, note that the value equals
-#  dirty ring memory size divided by average dirty page rate
-#  of virtual CPU, which can be used to observe the average
-#  memory load of virtual CPU indirectly. Note that zero
-#  means guest doesn't dirty memory (since 8.1)
+# @dirty-limit-throttle-time-per-round: Maximum throttle time
+# (in microseconds) of virtual CPUs each dirty ring full round,
+# which shows how MigrationCapability dirty-limit affects the
+# guest during live migration.  (Since 8.1)
+#
+# @dirty-limit-ring-full-time: Estimated average dirty ring full
+# time (in microseconds) for each dirty ring full round. The
+# value equals the dirty ring memory size divided by the average
+# dirty page rate of the virtual CPU, which can be used to
+# observe the average memory load of the virtual CPU indirectly.
+# Note that zero means guest doesn't dirty memory.  (Since 8.1)
 #
 # Since: 0.14
 ##
@@ -519,15 +519,14 @@
 # are present.  'return-path' capability must be enabled to use
 # it.  (since 8.1)
 #
-# @dirty-limit: If enabled, migration will use the dirty-limit algo to
-#   throttle down guest instead of auto-converge algo.
-#   Throttle algo only works when vCPU's dirtyrate greater
-#   than 'vcpu-dirty-limit', read processes in guest os
-#   aren't penalized any more, so this algo can improve
-#   performance of vCPU during live migration. This is an
-#   optional performance feature and should not affect the
-#   correctness of the existing auto-converge algo.
-#   (since 8.1)
+# @dirty-limit: If enabled, migration will use the dirty-limit
+# algorithim to throttle down guest instead of auto-converge
+# algorithim. Throttle algorithim only works when vCPU's dirtyrate
+# greater than 'vcpu-dirty-limit', read processes in guest os
+# aren't penalized any more, so this algorithim can improve
+# performance of vCPU during live migration. This is an optional
+# performance feature and should not affect the correctness of the
+# existing auto-converge algorithim.  (Since 8.1)
 #
 # Features:
 #
@@ -822,17 +821,17 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
-# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty limit during
-# live migration. Should be in the range 1 to 1000ms,
-# defaults to 1000ms. (Since 8.1)
+# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
+# limit during live migration. Should be in the range 1 to 1000ms.
+# Defaults to 1000ms.  (Since 8.1)
 #
 # @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
-#Defaults to 1. (Since 8.1)
+# Defaults to 1.  (Since 8.1)
 #
 # Features:
 #
 # @unstable: Members @x-checkpoint-delay and @x-vcpu-dirty-limit-period
-#are experimental.
+# are experimental.
 #
 # Since: 2.4
 ##
@@ -988,17 +987,17 @@
 # Nodes are mapped to their block device name if there is one, and
 # to their node name otherwise.  (Since 5.2)
 #
-# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty limit during
-# live migration. Should be in the range 1 to 1000ms,
-# defaults to 1000ms. (Since 8.1)
+# @x-vcpu-dirty-limit-period: Periodic time (in milliseconds) of dirty
+# limit during live migration. Should be in the range 1 to 1000ms.
+# Defaults to 1000ms.  (Since 8.1)
 #
 # @vcpu-dirty-limit: Dirtyrate limit (MB/s) during live migration.
-#

[PATCH QEMU v3 3/3] MAINTAINERS: Add section "Migration dirty limit and dirty page rate"

2023-07-30 Thread ~hyman
From: Hyman Huang(黄勇) 

I've developed an interest in the dirty limit and dirty page
rate features and have been working on projects related
to this subsystem.

Add a section to the MAINTAINERS file for migration
dirty limit and dirty page rate.

Add myself as a maintainer for this subsystem so that I
can help to improve the dirty limit algorithm and review
the patches about dirty page rate.

Signed-off-by: Hyman Huang(黄勇) 
Signed-off-by: Markus Armbruster 
Acked-by: Peter Xu 
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 12e59b6b27..6111b6b4d9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3209,6 +3209,15 @@ F: qapi/migration.json
 F: tests/migration/
 F: util/userfaultfd.c
 
+Migration dirty limit and dirty page rate
+M: Hyman Huang 
+S: Maintained
+F: softmmu/dirtylimit.c
+F: include/sysemu/dirtylimit.h
+F: migration/dirtyrate.c
+F: migration/dirtyrate.h
+F: include/sysemu/dirtyrate.h
+
 D-Bus
 M: Marc-André Lureau 
 S: Maintained
-- 
2.38.5



[PATCH QEMU v3 0/3] migration: craft the doc comments

2023-07-30 Thread ~hyman
Hi, please review the version 3 of the series, thanks.

V3:
- craft the commit message of
  "Add section for migration dirty limit and dirty page rate",
  and put the section after section "Migration", suggested
  by Markus.

V2:
- split the first commit in v1 into 2
- add commit message of commit:
  MAINTAINERS: Add Hyman Huang as maintainer

Yong

Hyman Huang(黄勇) (3):
  qapi: Reformat the dirty-limit migration doc comments
  qapi: Craft the dirty-limit capability comment
  MAINTAINERS: Add section "Migration dirty limit and dirty page rate"

 MAINTAINERS |  9 +++
 qapi/migration.json | 66 +
 2 files changed, 40 insertions(+), 35 deletions(-)

-- 
2.38.5



[PATCH QEMU v3 2/3] qapi: Craft the dirty-limit capability comment

2023-07-30 Thread ~hyman
From: Hyman Huang(黄勇) 

Signed-off-by: Markus Armbruster 
Signed-off-by: Hyman Huang(黄勇) 
---
 qapi/migration.json | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/qapi/migration.json b/qapi/migration.json
index a74ade4d72..62ab151da2 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -519,14 +519,11 @@
 # are present.  'return-path' capability must be enabled to use
 # it.  (since 8.1)
 #
-# @dirty-limit: If enabled, migration will use the dirty-limit
-# algorithim to throttle down guest instead of auto-converge
-# algorithim. Throttle algorithim only works when vCPU's dirtyrate
-# greater than 'vcpu-dirty-limit', read processes in guest os
-# aren't penalized any more, so this algorithim can improve
-# performance of vCPU during live migration. This is an optional
-# performance feature and should not affect the correctness of the
-# existing auto-converge algorithim.  (Since 8.1)
+# @dirty-limit: If enabled, migration will throttle vCPUs as needed to
+# keep their dirty page rate within @vcpu-dirty-limit.  This can
+# improve responsiveness of large guests during live migration,
+# and can result in more stable read performance.  Requires KVM
+# with accelerator property "dirty-ring-size" set.  (Since 8.1)
 #
 # Features:
 #
-- 
2.38.5




[PATCH] target/hppa: Move iaoq registers and thus reduce generated code size

2023-07-30 Thread Helge Deller
On hppa, the Instruction Address Offset Queue (IAOQ) registers specify
the addresses of the next instructions to be executed. Each generated TB
writes those registers at least once, so they are used heavily in
generated code.

Looking at the generated assembly, for a x86-64 host this code
to write the address $0x7ffe826f into iaoq_f is generated:
0x7f73e8000184:  c7 85 d4 01 00 00 6f 82  movl $0x7ffe826f, 0x1d4(%rbp)
0x7f73e800018c:  fe 7f
0x7f73e800018e:  c7 85 d8 01 00 00 73 82  movl $0x7ffe8273, 0x1d8(%rbp)
0x7f73e8000196:  fe 7f

With the trivial change of moving the variables iaoq_f and iaoq_b to
the top of struct CPUArchState, the offset from %rbp is reduced (from
0x1d4 to 0), which allows the x86-64 TCG backend to generate 3 fewer
bytes of code per move instruction:
0x7fc1e800018c:  c7 45 00 6f 82 fe 7f movl $0x7ffe826f, (%rbp)
0x7fc1e8000193:  c7 45 04 73 82 fe 7f movl $0x7ffe8273, 4(%rbp)

Overall this is a reduction in generated code size (not a reduction in
the number of instructions).
A test run which checks the generated code size by running "/bin/ls"
with qemu-user shows that the code size shrinks from 1616767 to 1569273
bytes, which is ~97% of the former size.

Signed-off-by: Helge Deller 

diff --git a/target/hppa/cpu.h b/target/hppa/cpu.h
index 9fe79b1242..75c5c0ccf7 100644
--- a/target/hppa/cpu.h
+++ b/target/hppa/cpu.h
@@ -168,6 +168,9 @@ typedef struct {
 } hppa_tlb_entry;

 typedef struct CPUArchState {
+target_ureg iaoq_f;  /* front */
+target_ureg iaoq_b;  /* back, aka next instruction */
+
 target_ureg gr[32];
 uint64_t fr[32];
 uint64_t sr[8];  /* stored shifted into place for gva */
@@ -186,8 +189,6 @@ typedef struct CPUArchState {
 target_ureg psw_cb;  /* in least significant bit of next nibble */
 target_ureg psw_cb_msb;  /* boolean */

-target_ureg iaoq_f;  /* front */
-target_ureg iaoq_b;  /* back, aka next instruction */
 uint64_t iasq_f;
 uint64_t iasq_b;




Re: [PATCH for-8.2 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()

2023-07-30 Thread Cédric Le Goater

On 7/16/23 10:15, Avihai Horon wrote:

Changing the device state from STOP_COPY to STOP can take time as the
device may need to free resources and do other operations as part of the
transition. Currently, this is done in vfio_save_complete_precopy() and
therefore it is counted in the migration downtime.

To avoid this, change the device state from STOP_COPY to STOP in
vfio_save_cleanup(), which is called after migration has completed and
thus is not part of migration downtime.

Signed-off-by: Avihai Horon 


Have you tried this series with vGPUs ? If so, are there any improvement
to report ?

Thanks,

C.


---
  hw/vfio/migration.c | 19 +--
  1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 2674f4bc47..8acd182a8b 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque)
  VFIODevice *vbasedev = opaque;
  VFIOMigration *migration = vbasedev->migration;
  
+/*
+ * Changing device state from STOP_COPY to STOP can take time. Do it here,
+ * after migration has completed, so it won't increase downtime.
+ */
+if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
+/*
+ * If setting the device in STOP state fails, the device should be
+ * reset. To do so, use ERROR state as a recover state.
+ */
+vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
+ VFIO_DEVICE_STATE_ERROR);
+}
+
  g_free(migration->data_buffer);
  migration->data_buffer = NULL;
  migration->precopy_init_size = 0;
@@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
  return ret;
  }
  
-/*
- * If setting the device in STOP state fails, the device should be reset.
- * To do so, use ERROR state as a recover state.
- */
-ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
-   VFIO_DEVICE_STATE_ERROR);
  trace_vfio_save_complete_precopy(vbasedev->name, ret);
  
  return ret;





Re: [PATCH for-8.2 2/6] sysemu: Add pre VM state change callback

2023-07-30 Thread Cédric Le Goater

[ ... ]


+ * qemu_add_vm_change_state_handler_prio_full:


qemu_add_vm_change_state_handler_prio_all() may be. I don't have much better
but 'full' doesn't sound right. minor.


I followed the GLib naming convention.
For example, g_tree_new and g_tree_new_full, or g_hash_table_new and 
g_hash_table_new_full.
I tend to go with GLib naming as it might be more familiar to people.
What do you think?


Let's keep it then.

Thanks,

C.




Re: [PATCH 5/6] hw/ppc: Always store the decrementer value

2023-07-30 Thread Cédric Le Goater

On 7/30/23 11:40, Nicholas Piggin wrote:

On Thu Jul 27, 2023 at 10:26 PM AEST, Cédric Le Goater wrote:

Hello Nick,

On 7/26/23 20:22, Nicholas Piggin wrote:

When writing a value to the decrementer that raises an exception, the
irq is raised, but the value is not stored so the store doesn't appear
to have changed the register when it is read again.

Always store the write value to the register.


This change has a serious performance impact when a guest is run under
PowerNV. Could you please take a look ?


Yeah, the decrementer load doesn't sign-extend the value correctly as
it should for the large-decrementer option. It makes skiboot detect
the decrementer size as 64 bits instead of 56, and things go bad from
there. KVM seems more affected because it's saving and restoring DEC
frequently.
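
The missing conversion amounts to a plain sign extension of the implemented
decrementer width. A sketch (illustrative only; QEMU itself provides
sextract64() for this, sext_bits is just an illustration):

```c
#include <stdint.h>

/*
 * Illustrative only: sign-extend the low @nbits bits of @value (nbits < 64),
 * as a large-decrementer read must do when the implemented width (e.g. 56
 * bits on POWER9) is narrower than the 64-bit register.
 */
static int64_t sext_bits(uint64_t value, int nbits)
{
    uint64_t sign = 1ULL << (nbits - 1);
    value &= (sign << 1) - 1;   /* keep only the low nbits */
    return (int64_t)((value ^ sign) - sign);
}
```

Without it, a 56-bit value with the top bit set reads back as a large
positive 64-bit number, which is what confuses skiboot's size probe.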

The fix seems simple but considering the compounding series of bugs
and issues coming up with this, I think it will be better to defer
the decrementer work until 8.2 unfortunately.


Yes. QEMU 8.1 already has a lot: fixes, tests and models [1].


PS: We should really introduce avocado tests for nested.


Yeah agreed. Both for pseries and powernv, ideally.


The same disk image could be used for the 3 HV implementations. This would
be a nice addition to Harsh's series [2]

Thanks,

C.

[1] https://wiki.qemu.org/ChangeLog/8.1#PowerPC
[2] https://patchwork.ozlabs.org/project/qemu-ppc/list/?series=364386




Re: [PATCH v3 0/3] Enable -cpu ,help

2023-07-30 Thread Peter Maydell
On Sun, 30 Jul 2023 at 08:21, Dinah Baum  wrote:
>
> This patch adds the ability to query for CPU
> features. Currently it is limited to the architecture that
> support feature probing (arm, i386, and s390x).
>
> Ex:
> athlon features:
>   3dnow=
>   3dnowext=
>   3dnowprefetch=
>   ...
>
> Suggested-by: Peter Maydell
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1480

Ccing Markus for review of the option parsing parts...

thanks
-- PMM

>
> Dinah Baum (3):
>   qapi: Moved architecture agnostic data types to `machine`
>   qapi, target/: Enable 'query-cpu-model-expansion' on all architectures
>   cpu, softmmu/vl.c: Change parsing of -cpu argument to allow -cpu
> cpu,help to print options for the CPU type similar to how the
> '-device' option works.
>
>  cpu.c|  61 ++
>  include/exec/cpu-common.h|   7 ++
>  include/qapi/qmp/qdict.h |   1 +
>  qapi/machine-target.json | 138 +--
>  qapi/machine.json| 130 +
>  qemu-options.hx  |   7 +-
>  qobject/qdict.c  |   5 ++
>  softmmu/vl.c |  35 +++-
>  target/arm/arm-qmp-cmds.c|   7 +-
>  target/arm/cpu.h |   7 ++
>  target/i386/cpu-sysemu.c |   7 +-
>  target/i386/cpu.h|   6 ++
>  target/s390x/cpu.h   |   7 ++
>  target/s390x/cpu_models_sysemu.c |   6 +-
>  14 files changed, 273 insertions(+), 151 deletions(-)
>



[PATCH] hda-audio: use log-scale for amplifier levels

2023-07-30 Thread Yuanqing Li
According to Intel's High Definition Audio Specification (Revision 1.0a,
Section 7.3.4.10: Amplifier Capabilities), the amplifier gain levels
should be evenly spaced in dB, i.e. using a log scale instead of linear.

Here, the hda-codec reports amplifier levels from 0 to -48 dB at 1-dB
steps matching the 8-bit dynamic range, and the -49 dB level is mapped
to 0 (muted).

Signed-off-by: Yuanqing Li 
---
 hw/audio/hda-codec.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/audio/hda-codec.c b/hw/audio/hda-codec.c
index c51d8ba617..c2aa71624b 100644
--- a/hw/audio/hda-codec.c
+++ b/hw/audio/hda-codec.c
@@ -121,7 +121,7 @@ static void hda_codec_parse_fmt(uint32_t format, struct audsettings *as)
 #define QEMU_HDA_PCM_FORMATS (AC_SUPPCM_BITS_16 |   \
   0x1fc /* 16 -> 96 kHz */)
 #define QEMU_HDA_AMP_NONE(0)
-#define QEMU_HDA_AMP_STEPS   0x4a
+#define QEMU_HDA_AMP_STEPS   0x31 /* 20 * log10(255) = 48 dB dynamic range */
 
 #define   PARAM mixemu
 #define   HDA_MIXER
@@ -433,6 +433,14 @@ static void hda_audio_set_running(HDAAudioStream *st, bool running)
 }
 }
 
+/* first muted; then from -48 dB to 0 dB */
+static const uint32_t hda_vol_table[] = {
+0,   1,   1,   1,   1,   2,   2,   2,   2,   3,   3,   3,   4,   4,   5,
+5,   6,   6,   7,   8,   9,   10,  11,  13,  14,  16,  18,  20,  23,  26,
+29,  32,  36,  40,  45,  51,  57,  64,  72,  81,  90,  102, 114, 128, 143,
+161, 181, 203, 227, 255
+};
+
 static void hda_audio_set_amp(HDAAudioStream *st)
 {
 bool muted;
@@ -446,8 +454,8 @@ static void hda_audio_set_amp(HDAAudioStream *st)
 left  = st->mute_left  ? 0 : st->gain_left;
 right = st->mute_right ? 0 : st->gain_right;
 
-left = left * 255 / QEMU_HDA_AMP_STEPS;
-right = right * 255 / QEMU_HDA_AMP_STEPS;
+left = hda_vol_table[left];
+right = hda_vol_table[right];
 
 if (!st->state->mixer) {
 return;
-- 
2.41.0




[PATCH v2] target/ppc: Fix VRMA page size for ISA v3.0

2023-07-30 Thread Nicholas Piggin
Until v2.07s, the VRMA page size (L||LP) was encoded in LPCR[VRMASD].
In v3.0 that moved to the partition table PS field.

The powernv machine can now run KVM HPT guests on POWER9/10 CPUs with
this fix and the patch to add ASDR.
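
For illustration, the PS -> SLB LLP conversion done by get_vrma_llp() below
can be sketched as follows (vrma_llp_from_ps is a hypothetical stand-alone
helper, not part of the patch):

```c
#include <stdint.h>

/*
 * Illustrative only: mirrors the ISA v3.0 path of get_vrma_llp().  With
 * IBM big-endian bit numbering, PPC_BIT_NR(55) == 8 and PPC_BIT_NR(59) == 4,
 * so L lands at bit 8 and LP at bits 5:4 of the SLB VSID.
 */
static uint64_t vrma_llp_from_ps(uint64_t ps)
{
    uint64_t l = (ps >> 2) & 0x1;   /* L is the top bit of the 3-bit PS */
    uint64_t lp = ps & 0x3;         /* LP is the low two bits */
    return (l << 8) | (lp << 4);
}
```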

Fixes: 3367c62f522b ("target/ppc: Support for POWER9 native hash")
Signed-off-by: Nicholas Piggin 
---
Since v1:
- Added llp variable to avoid calling get_vrma_llp twice [Cedric].
- Added some bit defines for architected fields and values [Cedric].

Patches 1,3 from the previously posted series, let's defer 4-6
decrementer fixes until after 8.1, so this is the last remaining
one from the series.

Thanks,
Nick

 target/ppc/mmu-hash64.c | 45 +++--
 target/ppc/mmu-hash64.h |  5 +
 2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index a0c90df3ce..d645c0bb94 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -874,12 +874,46 @@ static target_ulong rmls_limit(PowerPCCPU *cpu)
 return rma_sizes[rmls];
 }
 
-static int build_vrma_slbe(PowerPCCPU *cpu, ppc_slb_t *slb)
+/* Return the LLP in SLB_VSID format */
+static uint64_t get_vrma_llp(PowerPCCPU *cpu)
 {
 CPUPPCState *env = &cpu->env;
-target_ulong lpcr = env->spr[SPR_LPCR];
-uint32_t vrmasd = (lpcr & LPCR_VRMASD) >> LPCR_VRMASD_SHIFT;
-target_ulong vsid = SLB_VSID_VRMA | ((vrmasd << 4) & SLB_VSID_LLP_MASK);
+uint64_t llp;
+
+if (env->mmu_model == POWERPC_MMU_3_00) {
+ppc_v3_pate_t pate;
+uint64_t ps, l, lp;
+
+/*
+ * ISA v3.0 removes the LPCR[VRMASD] field and puts the VRMA base
+ * page size (L||LP equivalent) in the PS field in the HPT partition
+ * table entry.
+ */
+if (!ppc64_v3_get_pate(cpu, cpu->env.spr[SPR_LPIDR], &pate)) {
+error_report("Bad VRMA with no partition table entry");
+return 0;
+}
+ps = PATE0_GET_PS(pate.dw0);
+/* PS has L||LP in 3 consecutive bits, put them into SLB LLP format */
+l = (ps >> 2) & 0x1;
+lp = ps & 0x3;
+llp = (l << SLB_VSID_L_SHIFT) | (lp << SLB_VSID_LP_SHIFT);
+
+} else {
+uint64_t lpcr = env->spr[SPR_LPCR];
+target_ulong vrmasd = (lpcr & LPCR_VRMASD) >> LPCR_VRMASD_SHIFT;
+
+/* VRMASD LLP matches SLB format, just shift and mask it */
+llp = (vrmasd << SLB_VSID_LP_SHIFT) & SLB_VSID_LLP_MASK;
+}
+
+return llp;
+}
+
+static int build_vrma_slbe(PowerPCCPU *cpu, ppc_slb_t *slb)
+{
+uint64_t llp = get_vrma_llp(cpu);
+target_ulong vsid = SLB_VSID_VRMA | llp;
 int i;
 
 for (i = 0; i < PPC_PAGE_SIZES_MAX_SZ; i++) {
@@ -897,8 +931,7 @@ static int build_vrma_slbe(PowerPCCPU *cpu, ppc_slb_t *slb)
 }
 }
 
-error_report("Bad page size encoding in LPCR[VRMASD]; LPCR=0x"
- TARGET_FMT_lx, lpcr);
+error_report("Bad VRMA page size encoding 0x" TARGET_FMT_lx, llp);
 
 return -1;
 }
diff --git a/target/ppc/mmu-hash64.h b/target/ppc/mmu-hash64.h
index 1496955d38..de653fcae5 100644
--- a/target/ppc/mmu-hash64.h
+++ b/target/ppc/mmu-hash64.h
@@ -41,8 +41,10 @@ void ppc_hash64_finalize(PowerPCCPU *cpu);
 #define SLB_VSID_KP 0x0400ULL
 #define SLB_VSID_N  0x0200ULL /* no-execute */
 #define SLB_VSID_L  0x0100ULL
+#define SLB_VSID_L_SHIFT    PPC_BIT_NR(55)
 #define SLB_VSID_C  0x0080ULL /* class */
 #define SLB_VSID_LP 0x0030ULL
+#define SLB_VSID_LP_SHIFT   PPC_BIT_NR(59)
 #define SLB_VSID_ATTR   0x0FFFULL
 #define SLB_VSID_LLP_MASK   (SLB_VSID_L | SLB_VSID_LP)
 #define SLB_VSID_4K 0xULL
@@ -58,6 +60,9 @@ void ppc_hash64_finalize(PowerPCCPU *cpu);
 #define SDR_64_HTABSIZE    0x001FULL
 
 #define PATE0_HTABORG   0x0FFCULL
+#define PATE0_PS            PPC_BITMASK(56, 58)
+#define PATE0_GET_PS(dw0)   (((dw0) & PATE0_PS) >> PPC_BIT_NR(58))
+
 #define HPTES_PER_GROUP 8
 #define HASH_PTE_SIZE_64    16
 #define HASH_PTEG_SIZE_64   (HASH_PTE_SIZE_64 * HPTES_PER_GROUP)
-- 
2.40.1




Re: [PATCH 5/6] hw/ppc: Always store the decrementer value

2023-07-30 Thread Nicholas Piggin
On Thu Jul 27, 2023 at 10:26 PM AEST, Cédric Le Goater wrote:
> Hello Nick,
>
> On 7/26/23 20:22, Nicholas Piggin wrote:
> > When writing a value to the decrementer that raises an exception, the
> > irq is raised, but the value is not stored so the store doesn't appear
> > to have changed the register when it is read again.
> > 
> > Always store the write value to the register.
>
> This change has a serious performance impact when a guest is run under
> PowerNV. Could you please take a look ?

Yeah, the decrementer load doesn't sign-extend the value correctly as
it should for the large-decrementer option. It makes skiboot detect
the decrementer size as 64 bits instead of 56, and things go bad from
there. KVM seems more affected because it's saving and restoring DEC
frequently.

The fix seems simple but considering the compounding series of bugs
and issues coming up with this, I think it will be better to defer
the decrementer work until 8.2 unfortunately.

Thanks,
Nick

> Thanks,
>
> C.
>
> PS: We should really introduce avocado tests for nested.

Yeah agreed. Both for pseries and powernv, ideally.

Thanks,
Nick



Re: [PATCH] gdbstub: Fix client Ctrl-C handling

2023-07-30 Thread Nicholas Piggin
On Wed Jul 26, 2023 at 4:35 PM AEST, Joel Stanley wrote:
> On Wed, 12 Jul 2023 at 02:12, Nicholas Piggin  wrote:
> >
> > On Tue Jul 11, 2023 at 9:03 PM AEST, Matheus Tavares Bernardino wrote:
> > > > Nicholas Piggin  wrote:
> > > >
> > > > diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
> > > > index 6911b73c07..ce8b42eb15 100644
> > > > --- a/gdbstub/gdbstub.c
> > > > +++ b/gdbstub/gdbstub.c
> > > > @@ -2051,8 +2051,17 @@ void gdb_read_byte(uint8_t ch)
> > > >  return;
> > > >  }
> > > >  if (runstate_is_running()) {
> > > > -/* when the CPU is running, we cannot do anything except stop
> > > > -   it when receiving a char */
> > > > +/*
> > > > + * When the CPU is running, we cannot do anything except stop
> > > > + * it when receiving a char. This is expected on a Ctrl-C in 
> > > > the
> > > > + * gdb client. Because we are in all-stop mode, gdb sends a
> > > > + * 0x03 byte which is not a usual packet, so we handle it 
> > > > specially
> > > > + * here, but it does expect a stop reply.
> > > > + */
> > > > +if (ch != 0x03) {
> > > > +warn_report("gdbstub: client sent packet while target running\n");
> > > > +}
> > > > +gdbserver_state.allow_stop_reply = true;
> > > >  vm_stop(RUN_STATE_PAUSED);
> > > >  } else
> > > >  #endif
> > >
> > > Makes sense to me, but shouldn't we send the stop-reply packet only for
> > > Ctrl+C/0x03?
> >
> > Good question.
> >
> > I think if we get a character here that's not a 3, we're already in
> > trouble, and we eat it so even worse. Since we only send a stop packet
> > back when the vm stops, then if we don't send one now we might never
> > send it. At least if we send one then the client might have some chance
> > to get back to a sane state. And this does at least take us back to
> > behaviour before the stop filtering patch.
> >
> > Could go further and only stop the machine if it was a 3, or send a
> > stop packet even if we were stopped, etc. but all that get further from
> > a minimal fix.
>
> I was taking a look at -rc1 and it looks like this hasn't made it in.
> Is it something we want to propose including?
>
> As a user of qemu I'd vote for it to go in.

I think it should, gdb is hardly usable without it.

Thanks,
Nick



[PATCH v3 2/3] qapi, target/: Enable 'query-cpu-model-expansion' on all architectures

2023-07-30 Thread Dinah Baum
Only architectures that implement the command will return
results, others will return an error message as before.

Signed-off-by: Dinah Baum 
---
 cpu.c| 20 +++
 include/exec/cpu-common.h|  7 
 qapi/machine-target.json | 60 
 qapi/machine.json| 53 
 target/arm/arm-qmp-cmds.c|  7 ++--
 target/arm/cpu.h |  7 
 target/i386/cpu-sysemu.c |  7 ++--
 target/i386/cpu.h|  6 
 target/s390x/cpu.h   |  7 
 target/s390x/cpu_models_sysemu.c |  6 ++--
 10 files changed, 110 insertions(+), 70 deletions(-)

diff --git a/cpu.c b/cpu.c
index 1c948d1161..a99d09cd47 100644
--- a/cpu.c
+++ b/cpu.c
@@ -292,6 +292,26 @@ void list_cpus(void)
 #endif
 }
 
+CpuModelExpansionInfo *get_cpu_model_expansion_info(CpuModelExpansionType type,
+CpuModelInfo *model,
+Error **errp)
+{
+/* XXX: implement cpu_model_expansion for targets that still miss it */
+#if defined(cpu_model_expansion)
+return cpu_model_expansion(type, model, errp);
+#else
+error_setg(errp, "Could not query cpu model information");
+return NULL;
+#endif
+}
+
+CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
+ CpuModelInfo *model,
+ Error **errp)
+{
+return get_cpu_model_expansion_info(type, model, errp);
+}
+
 #if defined(CONFIG_USER_ONLY)
 void tb_invalidate_phys_addr(hwaddr addr)
 {
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 87dc9a752c..653f8a9d2b 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -6,6 +6,7 @@
 #ifndef CONFIG_USER_ONLY
 #include "exec/hwaddr.h"
 #endif
+#include "qapi/qapi-commands-machine.h"
 
 /**
  * vaddr:
@@ -167,4 +168,10 @@ int cpu_memory_rw_debug(CPUState *cpu, vaddr addr,
 /* vl.c */
 void list_cpus(void);
 
+CpuModelExpansionInfo *get_cpu_model_expansion_info(CpuModelExpansionType type,
+CpuModelInfo *model,
+Error **errp);
+void list_cpu_model_expansion(CpuModelExpansionType type,
+  CpuModelInfo *model, Error **errp);
+
 #endif /* CPU_COMMON_H */
diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index 3ee2f7ca6b..a658b1754b 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -139,66 +139,6 @@
   'returns': 'CpuModelBaselineInfo',
   'if': 'TARGET_S390X' }
 
-##
-# @CpuModelExpansionInfo:
-#
-# The result of a cpu model expansion.
-#
-# @model: the expanded CpuModelInfo.
-#
-# Since: 2.8
-##
-{ 'struct': 'CpuModelExpansionInfo',
-  'data': { 'model': 'CpuModelInfo' },
-  'if': { 'any': [ 'TARGET_S390X',
-   'TARGET_I386',
-   'TARGET_ARM' ] } }
-
-##
-# @query-cpu-model-expansion:
-#
-# Expands a given CPU model (or a combination of CPU model +
-# additional options) to different granularities, allowing tooling to
-# get an understanding what a specific CPU model looks like in QEMU
-# under a certain configuration.
-#
-# This interface can be used to query the "host" CPU model.
-#
-# The data returned by this command may be affected by:
-#
-# * QEMU version: CPU models may look different depending on the QEMU
-#   version.  (Except for CPU models reported as "static" in
-#   query-cpu-definitions.)
-# * machine-type: CPU model may look different depending on the
-#   machine-type.  (Except for CPU models reported as "static" in
-#   query-cpu-definitions.)
-# * machine options (including accelerator): in some architectures,
-#   CPU models may look different depending on machine and accelerator
-#   options.  (Except for CPU models reported as "static" in
-#   query-cpu-definitions.)
-# * "-cpu" arguments and global properties: arguments to the -cpu
-#   option and global properties may affect expansion of CPU models.
-#   Using query-cpu-model-expansion while using these is not advised.
-#
-# Some architectures may not support all expansion types.  s390x
-# supports "full" and "static". Arm only supports "full".
-#
-# Returns: a CpuModelExpansionInfo.  Returns an error if expanding CPU
-# models is not supported, if the model cannot be expanded, if the
-# model contains an unknown CPU definition name, unknown
-# properties or properties with a wrong type.  Also returns an
-# error if an expansion type is not supported.
-#
-# Since: 2.8
-##
-{ 'command': 'query-cpu-model-expansion',
-  'data': { 'type': 'CpuModelExpansionType',
-'model': 'CpuModelInfo' },
-  'returns': 'CpuModelExpansionInfo',
-  'if': { 'any': [ 'TARGET_S390X',
-   'TARGET_I386',
-   'TARGET_ARM' ] } }
-
 ##
 # 

[PATCH v3 3/3] cpu, softmmu/vl.c: Change parsing of -cpu argument to allow -cpu cpu, help to print options for the CPU type similar to how the '-device' option works.

2023-07-30 Thread Dinah Baum
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1480
Signed-off-by: Dinah Baum 

Signed-off-by: Dinah Baum 
---
 cpu.c| 41 
 include/qapi/qmp/qdict.h |  1 +
 qemu-options.hx  |  7 ---
 qobject/qdict.c  |  5 +
 softmmu/vl.c | 35 +-
 5 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/cpu.c b/cpu.c
index a99d09cd47..9971ffeeba 100644
--- a/cpu.c
+++ b/cpu.c
@@ -43,6 +43,10 @@
 #include "trace/trace-root.h"
 #include "qemu/accel.h"
 #include "qemu/plugin.h"
+#include "qemu/cutils.h"
+#include "qemu/qemu-print.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qobject.h"
 
 uintptr_t qemu_host_page_size;
 intptr_t qemu_host_page_mask;
@@ -312,6 +316,43 @@ CpuModelExpansionInfo *qmp_query_cpu_model_expansion(CpuModelExpansionType type,
 return get_cpu_model_expansion_info(type, model, errp);
 }
 
+void list_cpu_model_expansion(CpuModelExpansionType type,
+  CpuModelInfo *model,
+  Error **errp)
+{
+CpuModelExpansionInfo *expansion_info;
+QDict *qdict;
+QDictEntry *qdict_entry;
+const char *key;
+QObject *obj;
+QType q_type;
+GPtrArray *array;
+int i;
+const char *type_name;
+
+expansion_info = get_cpu_model_expansion_info(type, model, errp);
+if (expansion_info) {
+qdict = qobject_to(QDict, expansion_info->model->props);
+if (qdict) {
+qemu_printf("%s features:\n", model->name);
+array = g_ptr_array_new();
+for (qdict_entry = (QDictEntry *)qdict_first(qdict); qdict_entry;
+ qdict_entry = (QDictEntry *)qdict_next(qdict, qdict_entry)) {
+g_ptr_array_add(array, qdict_entry);
+}
+g_ptr_array_sort(array, (GCompareFunc)dict_key_compare);
+for (i = 0; i < array->len; i++) {
+qdict_entry = array->pdata[i];
+key = qdict_entry_key(qdict_entry);
+obj = qdict_get(qdict, key);
+q_type = qobject_type(obj);
+type_name = QType_str(q_type);
+qemu_printf("  %s=<%s>\n", key, type_name);
+}
+}
+}
+}
+
 #if defined(CONFIG_USER_ONLY)
 void tb_invalidate_phys_addr(hwaddr addr)
 {
diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index 82e90fc072..d0b6c3d358 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -67,5 +67,6 @@ bool qdict_get_try_bool(const QDict *qdict, const char *key, 
bool def_value);
 const char *qdict_get_try_str(const QDict *qdict, const char *key);
 
 QDict *qdict_clone_shallow(const QDict *src);
+int dict_key_compare(QDictEntry **entry1, QDictEntry **entry2);
 
 #endif /* QDICT_H */
diff --git a/qemu-options.hx b/qemu-options.hx
index 29b98c3d4c..e0f0284927 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -169,11 +169,12 @@ SRST
 ERST
 
 DEF("cpu", HAS_ARG, QEMU_OPTION_cpu,
-"-cpu cpu        select CPU ('-cpu help' for list)\n", QEMU_ARCH_ALL)
+"-cpu cpu        select CPU ('-cpu help' for list)\n"
+"            use '-cpu cpu,help' to print possible properties\n", QEMU_ARCH_ALL)
 SRST
 ``-cpu model``
-Select CPU model (``-cpu help`` for list and additional feature
-selection)
+Select CPU model (``-cpu help`` and ``-cpu cpu,help``) for list and 
additional feature
+selection
 ERST
 
 DEF("accel", HAS_ARG, QEMU_OPTION_accel,
diff --git a/qobject/qdict.c b/qobject/qdict.c
index 8faff230d3..31407e62f6 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -447,3 +447,8 @@ void qdict_unref(QDict *q)
 {
 qobject_unref(q);
 }
+
+int dict_key_compare(QDictEntry **entry1, QDictEntry **entry2)
+{
+return g_strcmp0(qdict_entry_key(*entry1), qdict_entry_key(*entry2));
+}
diff --git a/softmmu/vl.c b/softmmu/vl.c
index b0b96f67fa..1fd87f2c06 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -501,6 +501,15 @@ static QemuOptsList qemu_action_opts = {
 },
 };
 
+static QemuOptsList qemu_cpu_opts = {
+.name = "cpu",
+.implied_opt_name = "cpu",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_cpu_opts.head),
+.desc = {
+{ /* end of list */ }
+},
+};
+
 const char *qemu_get_vm_name(void)
 {
 return qemu_name;
@@ -1159,6 +1168,26 @@ static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
 return 0;
 }
 
+static int cpu_help_func(void *opaque, QemuOpts *opts, Error **errp)
+{
+CpuModelInfo *model;
+
+if (cpu_option && is_help_option(cpu_option)) {
+list_cpus();
+return 1;
+}
+
+if (!cpu_option || !qemu_opt_has_help_opt(opts)) {
+return 0;
+}
+
+model = g_new0(CpuModelInfo, 1);
+model->name = (char *)qemu_opt_get(opts, "cpu");
+/* TODO: handle other expansion cases */
+list_cpu_model_expansion(CPU_MODEL_EXPANSION_TYPE_FULL, model, errp);
+return 1;
+}
+
 static int 

Re: [PATCH for-8.2 2/6] sysemu: Add pre VM state change callback

2023-07-30 Thread Avihai Horon



On 27/07/2023 19:23, Cédric Le Goater wrote:



On 7/16/23 10:15, Avihai Horon wrote:

Add pre VM state change callback to struct VMChangeStateEntry.


This sentence could be the subject.


Sure.





The pre VM state change callback is optional and can be set by the new
function qemu_add_vm_change_state_handler_prio_full() that allows
setting this callback in addition to the main callback.

The pre VM state change callbacks and main callbacks are called in two
separate phases: First all pre VM state change callbacks are called and
only then all main callbacks are called.

The purpose of the new pre VM state change callback is to allow all
devices to run a preliminary task before calling the devices' main
callbacks.

This will facilitate adding P2P support for VFIO migration where all
VFIO devices need to be put in an intermediate P2P quiescent state
before being stopped or started by the main VM state change callback.
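
The two-phase dispatch described above can be sketched as follows
(illustrative only; the types and names like notify_all are hypothetical,
not the actual QEMU API):

```c
#include <stddef.h>

typedef void Handler(void *opaque);

typedef struct {
    Handler *cb;            /* main callback */
    Handler *prepare_cb;    /* optional pre VM state change callback */
    void *opaque;
} Entry;

/*
 * Two-phase dispatch: every registered prepare callback runs before
 * any main callback, so all devices can reach a quiescent state first.
 */
static void notify_all(Entry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].prepare_cb) {
            entries[i].prepare_cb(entries[i].opaque);
        }
    }
    for (size_t i = 0; i < n; i++) {
        entries[i].cb(entries[i].opaque);
    }
}

/* Demo handlers record the order in which they run. */
static int order[4], pos;
static void prep_a(void *opaque) { (void)opaque; order[pos++] = 1; }
static void prep_b(void *opaque) { (void)opaque; order[pos++] = 2; }
static void main_a(void *opaque) { (void)opaque; order[pos++] = 3; }
static void main_b(void *opaque) { (void)opaque; order[pos++] = 4; }

/* Returns the observed call order encoded as decimal digits. */
static int run_demo(void)
{
    Entry entries[2] = {
        { main_a, prep_a, NULL },
        { main_b, prep_b, NULL },
    };
    pos = 0;
    notify_all(entries, 2);
    return order[0] * 1000 + order[1] * 100 + order[2] * 10 + order[3];
}
```

Both prepare callbacks run before either main callback, which is the
property the VFIO P2P patches rely on.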

Signed-off-by: Avihai Horon 
---
  include/sysemu/runstate.h |  4 
  softmmu/runstate.c    | 39 +++
  2 files changed, 43 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 7beb29c2e2..bb38a4b4bd 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -16,6 +16,10 @@ VMChangeStateEntry *qemu_add_vm_change_state_handler(VMChangeStateHandler *cb,
   void *opaque);
  VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
  VMChangeStateHandler *cb, void *opaque, int priority);
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+   VMChangeStateHandler *pre_change_cb,
+   void *opaque, int priority);
  VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
VMChangeStateHandler *cb,
   void *opaque);
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index f3bd862818..a1f0653899 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -271,6 +271,7 @@ void qemu_system_vmstop_request(RunState state)
  }
  struct VMChangeStateEntry {
  VMChangeStateHandler *cb;
+    VMChangeStateHandler *pre_change_cb;


I propose to use 'prepare' instead of 'pre_change'


Sure, I will change it.





  void *opaque;
  QTAILQ_ENTRY(VMChangeStateEntry) entries;
  int priority;
@@ -293,12 +294,38 @@ static QTAILQ_HEAD(, VMChangeStateEntry) vm_change_state_head =
   */
  VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
  VMChangeStateHandler *cb, void *opaque, int priority)
+{
+    return qemu_add_vm_change_state_handler_prio_full(cb, NULL, opaque,
+ priority);
+}
+
+/**
+ * qemu_add_vm_change_state_handler_prio_full:


qemu_add_vm_change_state_handler_prio_all() may be. I don't have much better
but 'full' doesn't sound right. minor.


I followed the GLib naming convention.
For example, g_tree_new and g_tree_new_full, or g_hash_table_new and g_hash_table_new_full.

I tend to go with GLib naming as it might be more familiar to people.
What do you think?

Thanks!



The rest looks fine to me.

Thanks,

C.


+ * @cb: the main callback to invoke
+ * @pre_change_cb: a callback to invoke before the main callback
+ * @opaque: user data passed to the callbacks
+ * @priority: low priorities execute first when the vm runs and the reverse is
+ *    true when the vm stops
+ *
+ * Register a main callback function and an optional pre VM state change
+ * callback function that are invoked when the vm starts or stops running. The

+ * main callback and the pre VM state change callback are called in two
+ * separate phases: First all pre VM state change callbacks are called and only

+ * then all main callbacks are called.
+ *
+ * Returns: an entry to be freed using qemu_del_vm_change_state_handler()
+ */
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+   VMChangeStateHandler *pre_change_cb,
+   void *opaque, int priority)
  {
  VMChangeStateEntry *e;
  VMChangeStateEntry *other;

  e = g_malloc0(sizeof(*e));
  e->cb = cb;
+    e->pre_change_cb = pre_change_cb;
  e->opaque = opaque;
  e->priority = priority;

@@ -333,10 +360,22 @@ void vm_state_notify(bool running, RunState state)
  trace_vm_state_notify(running, state, RunState_str(state));

  if (running) {
+    QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
+    if (e->pre_change_cb) {
+    e->pre_change_cb(e->opaque, running, state);
+    }
+    }
+
   QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
  e->cb(e->opaque, running, state);
  }
  } else {
+    

[PATCH v3 1/3] qapi: Moved architecture agnostic data types to `machine`

2023-07-30 Thread Dinah Baum
Signed-off-by: Dinah Baum 
---
 qapi/machine-target.json | 78 +---
 qapi/machine.json| 77 +++
 2 files changed, 78 insertions(+), 77 deletions(-)

diff --git a/qapi/machine-target.json b/qapi/machine-target.json
index f0a6b72414..3ee2f7ca6b 100644
--- a/qapi/machine-target.json
+++ b/qapi/machine-target.json
@@ -4,83 +4,7 @@
 # This work is licensed under the terms of the GNU GPL, version 2 or later.
 # See the COPYING file in the top-level directory.
 
-##
-# @CpuModelInfo:
-#
-# Virtual CPU model.
-#
-# A CPU model consists of the name of a CPU definition, to which delta
-# changes are applied (e.g. features added/removed). Most magic values
-# that an architecture might require should be hidden behind the name.
-# However, if required, architectures can expose relevant properties.
-#
-# @name: the name of the CPU definition the model is based on
-#
-# @props: a dictionary of QOM properties to be applied
-#
-# Since: 2.8
-##
-{ 'struct': 'CpuModelInfo',
-  'data': { 'name': 'str',
-'*props': 'any' } }
-
-##
-# @CpuModelExpansionType:
-#
-# An enumeration of CPU model expansion types.
-#
-# @static: Expand to a static CPU model, a combination of a static
-# base model name and property delta changes.  As the static base
-# model will never change, the expanded CPU model will be the
-# same, independent of QEMU version, machine type, machine
-# options, and accelerator options.  Therefore, the resulting
-# model can be used by tooling without having to specify a
-# compatibility machine - e.g. when displaying the "host" model.
-# The @static CPU models are migration-safe.
-#
-# @full: Expand all properties.  The produced model is not guaranteed
-# to be migration-safe, but allows tooling to get an insight and
-# work with model details.
-#
-# Note: When a non-migration-safe CPU model is expanded in static
-# mode, some features enabled by the CPU model may be omitted,
-# because they can't be implemented by a static CPU model
-# definition (e.g. cache info passthrough and PMU passthrough in
-# x86). If you need an accurate representation of the features
-# enabled by a non-migration-safe CPU model, use @full.  If you
-# need a static representation that will keep ABI compatibility
-# even when changing QEMU version or machine-type, use @static
-# (but keep in mind that some features may be omitted).
-#
-# Since: 2.8
-##
-{ 'enum': 'CpuModelExpansionType',
-  'data': [ 'static', 'full' ] }
-
-##
-# @CpuModelCompareResult:
-#
-# An enumeration of CPU model comparison results.  The result is
-# usually calculated using e.g. CPU features or CPU generations.
-#
-# @incompatible: If model A is incompatible to model B, model A is not
-# guaranteed to run where model B runs and the other way around.
-#
-# @identical: If model A is identical to model B, model A is
-# guaranteed to run where model B runs and the other way around.
-#
-# @superset: If model A is a superset of model B, model B is
-# guaranteed to run where model A runs.  There are no guarantees
-# about the other way.
-#
-# @subset: If model A is a subset of model B, model A is guaranteed to
-# run where model B runs.  There are no guarantees about the other
-# way.
-#
-# Since: 2.8
-##
-{ 'enum': 'CpuModelCompareResult',
-  'data': [ 'incompatible', 'identical', 'superset', 'subset' ] }
+{ 'include': 'machine.json' }
 
 ##
 # @CpuModelBaselineInfo:
diff --git a/qapi/machine.json b/qapi/machine.json
index a08b6576ca..192c781310 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1691,3 +1691,80 @@
 { 'command': 'dumpdtb',
   'data': { 'filename': 'str' },
   'if': 'CONFIG_FDT' }
+
+##
+# @CpuModelInfo:
+#
+# Virtual CPU model.
+#
+# A CPU model consists of the name of a CPU definition, to which delta
+# changes are applied (e.g. features added/removed). Most magic values
+# that an architecture might require should be hidden behind the name.
+# However, if required, architectures can expose relevant properties.
+#
+# @name: the name of the CPU definition the model is based on
+#
+# @props: a dictionary of QOM properties to be applied
+#
+# Since: 2.8
+##
+{ 'struct': 'CpuModelInfo',
+  'data': { 'name': 'str', '*props': 'any' } }
+
+##
+# @CpuModelExpansionType:
+#
+# An enumeration of CPU model expansion types.
+#
+# @static: Expand to a static CPU model, a combination of a static
+# base model name and property delta changes.  As the static base
+# model will never change, the expanded CPU model will be the
+# same, independent of QEMU version, machine type, machine
+# options, and accelerator options.  Therefore, the resulting
+# model can be used by tooling without having to specify a
+# compatibility machine - e.g. when displaying the "host" model.
+# The @static CPU models are migration-safe.
+#
+# @full: Expand all properties.  

Re: [PATCH for-8.2 0/6] vfio/migration: Add P2P support for VFIO migration

2023-07-30 Thread Avihai Horon



On 18/07/2023 18:46, Jason Gunthorpe wrote:

On Sun, Jul 16, 2023 at 11:15:35AM +0300, Avihai Horon wrote:

Hi all,

The first patch in this series adds a small optimization to VFIO
migration by moving the STOP_COPY->STOP transition to
vfio_save_cleanup(). Testing with a ConnectX-7 VFIO device showed that
this can reduce downtime by up to 6%.

The rest of the series adds P2P support for VFIO migration.

VFIO migration uAPI defines an optional intermediate P2P quiescent
state. While in the P2P quiescent state, P2P DMA transactions cannot be
initiated by the device, but the device can respond to incoming ones.
Additionally, all outstanding P2P transactions are guaranteed to have
been completed by the time the device enters this state.

For clarity it is "all outstanding DMA transactions are guaranteed to
have been completed" - the device has no idea if something is P2P or
not.


The uAPI states that [1]: "If the device can identify P2P transactions then it
can stop only P2P DMA, otherwise it must stop all DMA.".


So I will stick to the uAPI definition.

[1] 
https://elixir.bootlin.com/linux/v6.5-rc1/source/include/uapi/linux/vfio.h#L1032





The purpose of this state is to support migration of multiple devices
that are doing P2P transactions between themselves.

s/are/might/


Sure, will change it.

Thanks!




[PATCH v3 0/3] Enable -cpu ,help

2023-07-30 Thread Dinah Baum
This patch adds the ability to query for CPU
features. Currently it is limited to the architectures that
support feature probing (arm, i386, and s390x).

Ex:
athlon features:
  3dnow=
  3dnowext=
  3dnowprefetch=
  ...
  
Suggested-by: Peter Maydell
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1480

Dinah Baum (3):
  qapi: Moved architecture agnostic data types to `machine`
  qapi, target/: Enable 'query-cpu-model-expansion' on all architectures
  cpu, softmmu/vl.c: Change parsing of -cpu argument to allow -cpu
cpu,help to print options for the CPU type similar to how the
'-device' option works.

 cpu.c|  61 ++
 include/exec/cpu-common.h|   7 ++
 include/qapi/qmp/qdict.h |   1 +
 qapi/machine-target.json | 138 +--
 qapi/machine.json| 130 +
 qemu-options.hx  |   7 +-
 qobject/qdict.c  |   5 ++
 softmmu/vl.c |  35 +++-
 target/arm/arm-qmp-cmds.c|   7 +-
 target/arm/cpu.h |   7 ++
 target/i386/cpu-sysemu.c |   7 +-
 target/i386/cpu.h|   6 ++
 target/s390x/cpu.h   |   7 ++
 target/s390x/cpu_models_sysemu.c |   6 +-
 14 files changed, 273 insertions(+), 151 deletions(-)

-- 
2.30.2