date:20240212

Re: [PATCH] tcg/arm: Fix goto_tb for large translation blocks

2024-02-12 Thread Michael Tokarev


13.02.2024 00:56, Richard Henderson:

Correct arithmetic for separating high and low
on a large negative number.

Fixes: 79ffece4447 ("tcg/arm: Implement direct branch for goto_tb")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1714
Signed-off-by: Richard Henderson 


Cc: qemu-stable@
Reviewed-by: Michael Tokarev 


---
  tcg/arm/tcg-target.c.inc | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index ffd23ef789..6a04c73c76 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1771,9 +1771,9 @@ static void tcg_out_goto_tb(TCGContext *s, int which)
   * shifted immediate from pc.
   */
  int h = -i_disp;
-int l = h & 0xfff;
+int l = -(h & 0xfff);
  
-h = encode_imm_nofail(h - l);

+h = encode_imm_nofail(h + l);
  tcg_out_dat_imm(s, COND_AL, ARITH_SUB, TCG_REG_R0, TCG_REG_PC, h);
  tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_R0, l);
  }

[PATCH] hw/i386/sgx: Use QDev API

2024-02-12 Thread Philippe Mathieu-Daudé

Prefer the QDev API over the low level QOM one.
No logical change intended.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/i386/sgx.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index 70305547d4..9176040f8f 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -286,7 +286,6 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
 SGXEPCState *sgx_epc = >sgx_epc;
 X86MachineState *x86ms = X86_MACHINE(pcms);
 SgxEPCList *list = NULL;
-Object *obj;
 
 memset(sgx_epc, 0, sizeof(SGXEPCState));
 if (!x86ms->sgx_epc_list) {
@@ -300,16 +299,15 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
 _epc->mr);
 
 for (list = x86ms->sgx_epc_list; list; list = list->next) {
-obj = object_new("sgx-epc");
+DeviceState *dev = qdev_new(TYPE_SGX_EPC);
 
 /* set the memdev link with memory backend */
-object_property_parse(obj, SGX_EPC_MEMDEV_PROP, list->value->memdev,
-  _fatal);
+object_property_parse(OBJECT(dev), SGX_EPC_MEMDEV_PROP,
+  list->value->memdev, _fatal);
 /* set the numa node property for sgx epc object */
-object_property_set_uint(obj, SGX_EPC_NUMA_NODE_PROP, 
list->value->node,
- _fatal);
-object_property_set_bool(obj, "realized", true, _fatal);
-object_unref(obj);
+object_property_set_uint(OBJECT(dev), SGX_EPC_NUMA_NODE_PROP,
+ list->value->node, _fatal);
+qdev_realize_and_unref(dev, _fatal);
 }
 
 if ((sgx_epc->base + sgx_epc->size) < sgx_epc->base) {
-- 
2.41.0

Re: [PATCH] hw/block/tc58128: Don't emit deprecation warning under qtest

2024-02-12 Thread Philippe Mathieu-Daudé


On 6/2/24 16:41, Peter Maydell wrote:

Suppress the deprecation warning when we're running under qtest,
to avoid "make check" including warning messages in its output.

Signed-off-by: Peter Maydell 
---
  hw/block/tc58128.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)


Patched queued, thanks.

Re: [PATCH] system: Move memory_ldst.c.inc to system

2024-02-12 Thread Philippe Mathieu-Daudé


On 9/2/24 01:05, BALATON Zoltan wrote:

This file is only used by system/physmem.c so move them together.

Signed-off-by: BALATON Zoltan 
---
  memory_ldst.c.inc => system/memory_ldst.c.inc | 0
  1 file changed, 0 insertions(+), 0 deletions(-)
  rename memory_ldst.c.inc => system/memory_ldst.c.inc (100%)

diff --git a/memory_ldst.c.inc b/system/memory_ldst.c.inc
similarity index 100%
rename from memory_ldst.c.inc
rename to system/memory_ldst.c.inc


Patched queued, thanks.

Re: [PATCH v2 1/2] linux-user: Map low priority rt signals

2024-02-12 Thread Philippe Mathieu-Daudé


Cc'ing Brian & Taylor

On 12/2/24 21:45, Ilya Leoshkevich wrote:

Some applications want to use low priority realtime signals (e.g.,
SIGRTMAX). Currently QEMU cannot map all target realtime signals to
host signals, and chooses to sacrifice the end of the target realtime
signal range.

Change this to the middle of that range, hoping that fewer applications
will need it.

Signed-off-by: Ilya Leoshkevich 
---
  linux-user/signal.c | 18 +-
  1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index d3e62ab030f..a81533b563a 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -511,13 +511,14 @@ static int core_dump_signal(int sig)
  
  static void signal_table_init(void)

  {
-int hsig, tsig, count;
+int hsig, hsig_count, tsig, tsig_count, tsig_hole, tsig_hole_size, count;
  
  /*

- * Signals are supported starting from TARGET_SIGRTMIN and going up
- * until we run out of host realtime signals.  Glibc uses the lower 2
- * RT signals and (hopefully) nobody uses the upper ones.
- * This is why SIGRTMIN (34) is generally greater than __SIGRTMIN (32).
+ * Signals are supported starting from TARGET_SIGRTMIN and up to
+ * TARGET_SIGRTMAX, potentially with a hole in the middle of this
+ * range, which, hopefully, nobody uses. Glibc uses the lower 2 RT
+ * signals; this is why SIGRTMIN (34) is generally greater than
+ * __SIGRTMIN (32).
   * To fix this properly we would need to do manual signal delivery
   * multiplexed over a single host signal.
   * Attempts for configure "missing" signals via sigaction will be
@@ -536,9 +537,16 @@ static void signal_table_init(void)
  host_to_target_signal_table[SIGABRT] = 0;
  host_to_target_signal_table[hsig++] = TARGET_SIGABRT;
  
+hsig_count = SIGRTMAX - hsig + 1;

+tsig_count = TARGET_NSIG - TARGET_SIGRTMIN + 1;
+tsig_hole_size = tsig_count - MIN(hsig_count, tsig_count);
+tsig_hole = TARGET_SIGRTMIN + (tsig_count - tsig_hole_size) / 2;
  for (tsig = TARGET_SIGRTMIN;
   hsig <= SIGRTMAX && tsig <= TARGET_NSIG;
   hsig++, tsig++) {
+if (tsig == tsig_hole) {
+tsig += tsig_hole_size;
+}
  host_to_target_signal_table[hsig] = tsig;
  }

Re: [PATCH 0/3] hw/usb: Rename NB_PORTS -> UHCI_PORTS / EHCI_PORTS

2024-02-12 Thread Richard Henderson


On 2/12/24 18:38, Philippe Mathieu-Daudé wrote:

Philippe Mathieu-Daudé (3):
   hw/usb: Style cleanup
   hw/usb/uhci: Rename NB_PORTS -> UHCI_PORTS
   hw/usb/ehci: Rename NB_PORTS -> EHCI_PORTS


Reviewed-by: Richard Henderson 

r~

Re: [PATCH] tcg: Increase width of temp_subindex

2024-02-12 Thread Michael Tokarev


13.02.2024 08:44, Michael Tokarev wrote:

13.02.2024 05:21, Richard Henderson:

We need values 0-3 for TCG_TYPE_I128 on 32-bit hosts.

Cc: qemu-sta...@nongnu.org
Fixes: 43eef72f4109 ("tcg: Add temp allocation for TCGv_i128")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2159
Signed-off-by: Richard Henderson 
---

I feel certain that I made this change back when I introduced TCGv_i128.
I imagine that something went wrong with a rebase and it got lost.
Worse, we don't use temp_subindex often, and we usually handle i128
this value correctly.  It took a quirk of register allocation ordering
to make an invalid value in temp_subindex lead to a crash.


Somehow it feels deja vue too, maybe we increased some other width like
this already, in stable series too, but I can't find it now.

"This" in "this value" should be omitted. With this fixed,


This is that "ENOCOFFEE" time once again ;) - I haven't noticed this
is a comment *after* the patch description.


Reviewed-by: Michael Tokarev 


Tested-by: Michael Tokarev 

I run several different guests here which did show the same crash, -
none of them crashes after this patch.

/mjt

Re: [PATCH] tcg: Increase width of temp_subindex

2024-02-12 Thread Michael Tokarev


13.02.2024 05:21, Richard Henderson:

We need values 0-3 for TCG_TYPE_I128 on 32-bit hosts.

Cc: qemu-sta...@nongnu.org
Fixes: 43eef72f4109 ("tcg: Add temp allocation for TCGv_i128")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2159
Signed-off-by: Richard Henderson 
---

I feel certain that I made this change back when I introduced TCGv_i128.
I imagine that something went wrong with a rebase and it got lost.
Worse, we don't use temp_subindex often, and we usually handle i128
this value correctly.  It took a quirk of register allocation ordering
to make an invalid value in temp_subindex lead to a crash.


Somehow it feels deja vue too, maybe we increased some other width like
this already, in stable series too, but I can't find it now.

"This" in "this value" should be omitted. With this fixed,

Reviewed-by: Michael Tokarev 

Thank you for the quick fix Richard!

/mjt


---
  include/tcg/tcg.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index daf2a5bf9e..451f3fec41 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -412,7 +412,7 @@ typedef struct TCGTemp {
  unsigned int mem_coherent:1;
  unsigned int mem_allocated:1;
  unsigned int temp_allocated:1;
-unsigned int temp_subindex:1;
+unsigned int temp_subindex:2;
  
  int64_t val;

  struct TCGTemp *mem_base;

Single step (gdb) issue on QEMU without KVM

2024-02-12 Thread Ravi Bhagavatula

Hi,

We are running into an issue using GDB with our RTOS on QEMU x86_64 system 
emulator without KVM. The same works well when we re-run the test with KVM 
enabled.

The scenario is following:

  *   ptrace sets up a HW watchpoint using debug register
  *   ptrace sets up a user program single-stepping through TF bit in EFLAGS 
register
  *   ptrace kicks the program
  *   the program hits the next instruction which shall cause a debug exception 
due to watchpoint event
  *   the program shall also cause the same debug exception due to single-step 
event

In case of KVM QEMU mode - TF bit in EFLAGS causes "int 1" in user mode, which 
is handled by LOS178 ptrace_debugerr() function (user mode debug exception 
handler), which does the following steps:

  *   clears TF bit from EFLAGS
  *   checks Debug-Status Register (DR6)
  *   since watchpoint bit is set (B0 bit in DR6) - single-step bit from DR6 is 
cleared and watchpoint event is handled.

This sequence is described in Debug Status Register (DR6) section of Intel 64 
SDM 
(https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sdm.html)
 as follows:
* BS (single step) flag (bit 14) - Indicates (when set) that the debug 
exception was triggered by the single-
step execution mode (enabled with the TF flag in the EFLAGS register). The 
single-step mode is the highest-
priority debug exception. When the BS flag is set, any of the other debug 
status bits also may be set

However, in case of no-KVM QEMU mode CPU behaves differently:

  *   debug exception is triggered because of watchpoint event, execution 
enters etrap1 function (beginning of watchpoint event handling)
  *   TF bit in EFLAGS causes CPU to execute int 1 in kernel mode inside of 
etrap1
  *   debug exception in kernel mode is handled by SKDB, if it's installed, so 
LOS178 enters SKDB
  *   SKDB doesn't ignore any debug exception, even though it didn't use any 
debugging utilities (like BPs, WPs, single-step), so it puts the LOS178 into 
SKDB command line mode

This sequence can be proved by the CPU state after SKDB is entered, here we see 
that etrap1 was interrupted in the beginning:
* t
fp=0x7c67bee8, pc=0x80005348 , sp=0x7fffdfc0

DR6 (DSTAT) value shows there are 2 bits set - B0 (watchpoint) and BS 
(single-step):
* r
...
DR0   DR1   
  DR2 DR3
00213610   
  (DR4)  (DR5)  
  DSTAT   DCTRL
  4ff1 001d0402

I've tried a few different -cpu flags for qemu, but it doesn't change behavior, 
so it appears that no-KVM QEMU implementation may have a bug, which makes CPU 
to execute exceptions in a wrong order for this scenario.
I didn't have a chance to reproduce the same scenario on Linux, but looking 
through the source code it appears that at least KGDB verifies did it set a 
single-step, has the user thread single-step flag set, and ignore the 
unexpected single-step exceptions so it may handle this case seamlessly, so the 
issue might be hidden on Linux running atop no-KVM QEMU.

Any thoughts or further steps?

Thanks,
Ravi Bhagavatula

[PATCH 2/3] hw/usb/uhci: Rename NB_PORTS -> UHCI_PORTS

2024-02-12 Thread Philippe Mathieu-Daudé

Rename NB_PORTS as UHCI_PORTS to avoid definition clash
with EHCI equivalent:

  hw/usb/hcd-uhci.h:38:9: error: 'NB_PORTS' macro redefined 
[-Werror,-Wmacro-redefined]
  #define NB_PORTS 2
  ^
  hw/usb/hcd-ehci.h:40:9: note: previous definition is here
  #define NB_PORTS 6/* Max. Number of downstream ports */
  ^

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/usb/hcd-uhci.h |  4 ++--
 hw/usb/hcd-uhci.c | 22 +++---
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/hw/usb/hcd-uhci.h b/hw/usb/hcd-uhci.h
index 69f8b40c49..6d26b94e92 100644
--- a/hw/usb/hcd-uhci.h
+++ b/hw/usb/hcd-uhci.h
@@ -35,7 +35,7 @@
 
 typedef struct UHCIQueue UHCIQueue;
 
-#define NB_PORTS 2
+#define UHCI_PORTS 2
 
 typedef struct UHCIPort {
 USBPort port;
@@ -59,7 +59,7 @@ typedef struct UHCIState {
 uint32_t frame_bytes;
 uint32_t frame_bandwidth;
 bool completions_only;
-UHCIPort ports[NB_PORTS];
+UHCIPort ports[UHCI_PORTS];
 qemu_irq irq;
 /* Interrupts that should be raised at the end of the current frame.  */
 uint32_t pending_int_mask;
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index b95b47f6a4..a03cf22e69 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -322,7 +322,7 @@ static void uhci_reset(DeviceState *dev)
 s->fl_base_addr = 0;
 s->sof_timing = 64;
 
-for(i = 0; i < NB_PORTS; i++) {
+for(i = 0; i < UHCI_PORTS; i++) {
 port = >ports[i];
 port->ctrl = 0x0080;
 if (port->port.dev && port->port.dev->attached) {
@@ -364,7 +364,7 @@ static const VMStateDescription vmstate_uhci = {
 .fields = (const VMStateField[]) {
 VMSTATE_PCI_DEVICE(dev, UHCIState),
 VMSTATE_UINT8_EQUAL(num_ports_vmstate, UHCIState, NULL),
-VMSTATE_STRUCT_ARRAY(ports, UHCIState, NB_PORTS, 1,
+VMSTATE_STRUCT_ARRAY(ports, UHCIState, UHCI_PORTS, 1,
  vmstate_uhci_port, UHCIPort),
 VMSTATE_UINT16(cmd, UHCIState),
 VMSTATE_UINT16(status, UHCIState),
@@ -404,7 +404,7 @@ static void uhci_port_write(void *opaque, hwaddr addr,
 int i;
 
 /* send reset on the USB bus */
-for(i = 0; i < NB_PORTS; i++) {
+for(i = 0; i < UHCI_PORTS; i++) {
 port = >ports[i];
 usb_device_reset(port->port.dev);
 }
@@ -457,7 +457,7 @@ static void uhci_port_write(void *opaque, hwaddr addr,
 int n;
 
 n = (addr >> 1) & 7;
-if (n >= NB_PORTS) {
+if (n >= UHCI_PORTS) {
 return;
 }
 port = >ports[n];
@@ -514,7 +514,7 @@ static uint64_t uhci_port_read(void *opaque, hwaddr addr, 
unsigned size)
 UHCIPort *port;
 int n;
 n = (addr >> 1) & 7;
-if (n >= NB_PORTS) {
+if (n >= UHCI_PORTS) {
 goto read_default;
 }
 port = >ports[n];
@@ -609,7 +609,7 @@ static USBDevice *uhci_find_device(UHCIState *s, uint8_t 
addr)
 USBDevice *dev;
 int i;
 
-for (i = 0; i < NB_PORTS; i++) {
+for (i = 0; i < UHCI_PORTS; i++) {
 UHCIPort *port = >ports[i];
 if (!(port->ctrl & UHCI_PORT_EN)) {
 continue;
@@ -1173,11 +1173,11 @@ void usb_uhci_common_realize(PCIDevice *dev, Error 
**errp)
 s->irq = pci_allocate_irq(dev);
 
 if (s->masterbus) {
-USBPort *ports[NB_PORTS];
-for(i = 0; i < NB_PORTS; i++) {
+USBPort *ports[UHCI_PORTS];
+for(i = 0; i < UHCI_PORTS; i++) {
 ports[i] = >ports[i].port;
 }
-usb_register_companion(s->masterbus, ports, NB_PORTS,
+usb_register_companion(s->masterbus, ports, UHCI_PORTS,
s->firstport, s, _port_ops,
USB_SPEED_MASK_LOW | USB_SPEED_MASK_FULL,
);
@@ -1187,14 +1187,14 @@ void usb_uhci_common_realize(PCIDevice *dev, Error 
**errp)
 }
 } else {
 usb_bus_new(>bus, sizeof(s->bus), _bus_ops, DEVICE(dev));
-for (i = 0; i < NB_PORTS; i++) {
+for (i = 0; i < UHCI_PORTS; i++) {
 usb_register_port(>bus, >ports[i].port, s, i, _port_ops,
   USB_SPEED_MASK_LOW | USB_SPEED_MASK_FULL);
 }
 }
 s->bh = qemu_bh_new_guarded(uhci_bh, s, 
(dev)->mem_reentrancy_guard);
 s->frame_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, uhci_frame_timer, s);
-s->num_ports_vmstate = NB_PORTS;
+s->num_ports_vmstate = UHCI_PORTS;
 QTAILQ_INIT(>queues);
 
 memory_region_init_io(>io_bar, OBJECT(s), _ioport_ops, s,
-- 
2.41.0

[PATCH 1/3] hw/usb: Style cleanup

2024-02-12 Thread Philippe Mathieu-Daudé

We are going to modify these lines, fix their style
in order to avoid checkpatch.pl warning.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/usb/hcd-ehci.c | 3 ++-
 hw/usb/hcd-uhci.c | 6 --
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 870c72cb59..98a2860069 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -1086,8 +1086,9 @@ static void ehci_opreg_write(void *ptr, hwaddr addr,
 case CONFIGFLAG:
 val &= 0x1;
 if (val) {
-for(i = 0; i < NB_PORTS; i++)
+for (i = 0; i < NB_PORTS; i++) {
 handle_port_owner_write(s, i, 0);
+}
 }
 break;
 
diff --git a/hw/usb/hcd-uhci.c b/hw/usb/hcd-uhci.c
index 7d3c026dae..b95b47f6a4 100644
--- a/hw/usb/hcd-uhci.c
+++ b/hw/usb/hcd-uhci.c
@@ -457,8 +457,9 @@ static void uhci_port_write(void *opaque, hwaddr addr,
 int n;
 
 n = (addr >> 1) & 7;
-if (n >= NB_PORTS)
+if (n >= NB_PORTS) {
 return;
+}
 port = >ports[n];
 dev = port->port.dev;
 if (dev && dev->attached) {
@@ -513,8 +514,9 @@ static uint64_t uhci_port_read(void *opaque, hwaddr addr, 
unsigned size)
 UHCIPort *port;
 int n;
 n = (addr >> 1) & 7;
-if (n >= NB_PORTS)
+if (n >= NB_PORTS) {
 goto read_default;
+}
 port = >ports[n];
 val = port->ctrl;
 }
-- 
2.41.0

[PATCH 3/3] hw/usb/ehci: Rename NB_PORTS -> EHCI_PORTS

2024-02-12 Thread Philippe Mathieu-Daudé

Rename NB_PORTS as EHCI_PORTS to avoid definition clash
with UHCI equivalent:

  hw/usb/hcd-ehci.h:40:9: error: 'NB_PORTS' macro redefined 
[-Werror,-Wmacro-redefined]
  #define NB_PORTS 6/* Max. Number of downstream ports */
  ^
  hw/usb/hcd-uhci.h:38:9: note: previous definition is here
  #define NB_PORTS 2
  ^

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/usb/hcd-ehci.h|  8 
 hw/usb/hcd-ehci-pci.c|  2 +-
 hw/usb/hcd-ehci-sysbus.c |  2 +-
 hw/usb/hcd-ehci.c| 20 ++--
 4 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/hw/usb/hcd-ehci.h b/hw/usb/hcd-ehci.h
index 2cd821f49e..56a1c09d1f 100644
--- a/hw/usb/hcd-ehci.h
+++ b/hw/usb/hcd-ehci.h
@@ -37,7 +37,7 @@
 #define MMIO_SIZE0x1000
 #define CAPA_SIZE0x10
 
-#define NB_PORTS 6/* Max. Number of downstream ports */
+#define EHCI_PORTS   6/* Max. Number of downstream ports */
 
 typedef struct EHCIPacket EHCIPacket;
 typedef struct EHCIQueue EHCIQueue;
@@ -288,7 +288,7 @@ struct EHCIState {
 uint32_t configflag;
 };
 };
-uint32_t portsc[NB_PORTS];
+uint32_t portsc[EHCI_PORTS];
 
 /*
  *  Internal states, shadow registers, etc
@@ -298,8 +298,8 @@ struct EHCIState {
 bool working;
 uint32_t astate; /* Current state in asynchronous schedule */
 uint32_t pstate; /* Current state in periodic schedule */
-USBPort ports[NB_PORTS];
-USBPort *companion_ports[NB_PORTS];
+USBPort ports[EHCI_PORTS];
+USBPort *companion_ports[EHCI_PORTS];
 uint32_t usbsts_pending;
 uint32_t usbsts_frindex;
 EHCIQueueHead aqueues;
diff --git a/hw/usb/hcd-ehci-pci.c b/hw/usb/hcd-ehci-pci.c
index 0b26db74d8..3ff54edf62 100644
--- a/hw/usb/hcd-ehci-pci.c
+++ b/hw/usb/hcd-ehci-pci.c
@@ -83,7 +83,7 @@ static void usb_ehci_pci_init(Object *obj)
 s->capsbase = 0x00;
 s->opregbase = 0x20;
 s->portscbase = 0x44;
-s->portnr = NB_PORTS;
+s->portnr = EHCI_PORTS;
 
 if (!dc->hotpluggable) {
 s->companion_enable = true;
diff --git a/hw/usb/hcd-ehci-sysbus.c b/hw/usb/hcd-ehci-sysbus.c
index bfb774504c..fe1dabd0bb 100644
--- a/hw/usb/hcd-ehci-sysbus.c
+++ b/hw/usb/hcd-ehci-sysbus.c
@@ -88,7 +88,7 @@ static void ehci_sysbus_class_init(ObjectClass *klass, void 
*data)
 SysBusEHCIClass *sec = SYS_BUS_EHCI_CLASS(klass);
 
 sec->portscbase = 0x44;
-sec->portnr = NB_PORTS;
+sec->portnr = EHCI_PORTS;
 
 dc->realize = usb_ehci_sysbus_realize;
 dc->vmsd = _ehci_sysbus;
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 98a2860069..01864d4649 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -783,9 +783,9 @@ static void ehci_register_companion(USBBus *bus, USBPort 
*ports[],
 EHCIState *s = container_of(bus, EHCIState, bus);
 uint32_t i;
 
-if (firstport + portcount > NB_PORTS) {
+if (firstport + portcount > EHCI_PORTS) {
 error_setg(errp, "firstport must be between 0 and %u",
-   NB_PORTS - portcount);
+   EHCI_PORTS - portcount);
 return;
 }
 
@@ -831,7 +831,7 @@ static USBDevice *ehci_find_device(EHCIState *ehci, uint8_t 
addr)
 USBPort *port;
 int i;
 
-for (i = 0; i < NB_PORTS; i++) {
+for (i = 0; i < EHCI_PORTS; i++) {
 port = >ports[i];
 if (!(ehci->portsc[i] & PORTSC_PED)) {
 DPRINTF("Port %d not enabled\n", i);
@@ -850,7 +850,7 @@ void ehci_reset(void *opaque)
 {
 EHCIState *s = opaque;
 int i;
-USBDevice *devs[NB_PORTS];
+USBDevice *devs[EHCI_PORTS];
 
 trace_usb_ehci_reset();
 
@@ -858,7 +858,7 @@ void ehci_reset(void *opaque)
  * Do the detach before touching portsc, so that it correctly gets send to
  * us or to our companion based on PORTSC_POWNER before the reset.
  */
-for(i = 0; i < NB_PORTS; i++) {
+for(i = 0; i < EHCI_PORTS; i++) {
 devs[i] = s->ports[i].dev;
 if (devs[i] && devs[i]->attached) {
 usb_detach(>ports[i]);
@@ -877,7 +877,7 @@ void ehci_reset(void *opaque)
 s->astate = EST_INACTIVE;
 s->pstate = EST_INACTIVE;
 
-for(i = 0; i < NB_PORTS; i++) {
+for(i = 0; i < EHCI_PORTS; i++) {
 if (s->companion_ports[i]) {
 s->portsc[i] = PORTSC_POWNER | PORTSC_PPOWER;
 } else {
@@ -1086,7 +1086,7 @@ static void ehci_opreg_write(void *ptr, hwaddr addr,
 case CONFIGFLAG:
 val &= 0x1;
 if (val) {
-for (i = 0; i < NB_PORTS; i++) {
+for (i = 0; i < EHCI_PORTS; i++) {
 handle_port_owner_write(s, i, 0);
 }
 }
@@ -2427,7 +2427,7 @@ static int usb_ehci_post_load(void *opaque, int 
version_id)
 EHCIState *s = opaque;
 int i;
 
-for (i = 0; i < NB_PORTS; i++) {
+for (i = 0; i < EHCI_PORTS; i++) {
 USBPort *companion = s->companion_ports[i];
 if (companion == NULL) {

[PATCH 0/3] hw/usb: Rename NB_PORTS -> UHCI_PORTS / EHCI_PORTS

2024-02-12 Thread Philippe Mathieu-Daudé

Rename variables to avoid clashing when including both
"hw/usb/hcd-ehci.h" and "hw/usb/hcd-uhci.h" in the same
file.

Philippe Mathieu-Daudé (3):
  hw/usb: Style cleanup
  hw/usb/uhci: Rename NB_PORTS -> UHCI_PORTS
  hw/usb/ehci: Rename NB_PORTS -> EHCI_PORTS

 hw/usb/hcd-ehci.h|  8 
 hw/usb/hcd-uhci.h|  4 ++--
 hw/usb/hcd-ehci-pci.c|  2 +-
 hw/usb/hcd-ehci-sysbus.c |  2 +-
 hw/usb/hcd-ehci.c| 21 +++--
 hw/usb/hcd-uhci.c| 24 +---
 6 files changed, 32 insertions(+), 29 deletions(-)

-- 
2.41.0

[PATCH] hw/i386/q35: Simplify pc_q35_init() since PCI is always enabled

2024-02-12 Thread Philippe Mathieu-Daudé

We can not create the Q35 machine without PCI, so simplify
pc_q35_init() removing pointless checks.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/i386/pc_q35.c | 32 ++--
 1 file changed, 10 insertions(+), 22 deletions(-)

diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 7ca3f465e0..5923621845 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -130,8 +130,7 @@ static void pc_q35_init(MachineState *machine)
 ISADevice *rtc_state;
 MemoryRegion *system_memory = get_system_memory();
 MemoryRegion *system_io = get_system_io();
-MemoryRegion *pci_memory;
-MemoryRegion *rom_memory;
+MemoryRegion *pci_memory = g_new(MemoryRegion, 1);
 GSIState *gsi_state;
 ISABus *isa_bus;
 int i;
@@ -143,6 +142,8 @@ static void pc_q35_init(MachineState *machine)
 bool keep_pci_slot_hpc;
 uint64_t pci_hole64_size = 0;
 
+assert(pcmc->pci_enabled);
+
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
  * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
  * also known as MMCFG).
@@ -189,16 +190,6 @@ static void pc_q35_init(MachineState *machine)
 kvmclock_create(pcmc->kvmclock_create_always);
 }
 
-/* pci enabled */
-if (pcmc->pci_enabled) {
-pci_memory = g_new(MemoryRegion, 1);
-memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
-rom_memory = pci_memory;
-} else {
-pci_memory = NULL;
-rom_memory = system_memory;
-}
-
 pc_guest_info_init(pcms);
 
 if (pcmc->smbios_defaults) {
@@ -212,14 +203,13 @@ static void pc_q35_init(MachineState *machine)
 /* create pci host bus */
 phb = OBJECT(qdev_new(TYPE_Q35_HOST_DEVICE));
 
-if (pcmc->pci_enabled) {
-pci_hole64_size = object_property_get_uint(phb,
-   
PCI_HOST_PROP_PCI_HOLE64_SIZE,
-   _abort);
-}
+pci_hole64_size = object_property_get_uint(phb,
+   PCI_HOST_PROP_PCI_HOLE64_SIZE,
+   _abort);
 
 /* allocate ram and load rom/bios */
-pc_memory_init(pcms, system_memory, rom_memory, pci_hole64_size);
+memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
+pc_memory_init(pcms, system_memory, pci_memory, pci_hole64_size);
 
 object_property_add_child(OBJECT(machine), "q35", phb);
 object_property_set_link(phb, PCI_HOST_PROP_RAM_MEM,
@@ -243,7 +233,7 @@ static void pc_q35_init(MachineState *machine)
 pcms->bus = host_bus;
 
 /* irq lines */
-gsi_state = pc_gsi_create(>gsi, pcmc->pci_enabled);
+gsi_state = pc_gsi_create(>gsi, true);
 
 /* create ISA bus */
 lpc = pci_new_multifunction(PCI_DEVFN(ICH9_LPC_DEV, ICH9_LPC_FUNC),
@@ -286,9 +276,7 @@ static void pc_q35_init(MachineState *machine)
 pc_i8259_create(isa_bus, gsi_state->i8259_irq);
 }
 
-if (pcmc->pci_enabled) {
-ioapic_init_gsi(gsi_state, "q35");
-}
+ioapic_init_gsi(gsi_state, "q35");
 
 if (tcg_enabled()) {
 x86_register_ferr_irq(x86ms->gsi[13]);
-- 
2.41.0

Re: [PATCH] tcg: Increase width of temp_subindex

2024-02-12 Thread Philippe Mathieu-Daudé


On 13/2/24 03:21, Richard Henderson wrote:

We need values 0-3 for TCG_TYPE_I128 on 32-bit hosts.

Cc: qemu-sta...@nongnu.org
Fixes: 43eef72f4109 ("tcg: Add temp allocation for TCGv_i128")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2159
Signed-off-by: Richard Henderson 
---

I feel certain that I made this change back when I introduced TCGv_i128.
I imagine that something went wrong with a rebase and it got lost.
Worse, we don't use temp_subindex often, and we usually handle i128
this value correctly.  It took a quirk of register allocation ordering
to make an invalid value in temp_subindex lead to a crash.

---
  include/tcg/tcg.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v6 0/2] acpi: report numa nodes for device memory using GI

2024-02-12 Thread Ankit Agrawal

>> >
>> > I'd find it helpful to see the resulting chunk of SRAT for these examples
>> > (disassembled) in this cover letter and the patches (where there are more 
>> > examples).
>>
>> Ack. I'll document the resulting SRAT table as well.
>
> Still didn't happen so this is dropped for now.

Hi Michael, does this mean it is dropped from Qemu v9.0?
FWIW, I'll post the next version incorporating the feedbacks by next week.

[PATCH v4 2/3] backends: Initial support for SPDM socket support

2024-02-12 Thread Alistair Francis

From: Huai-Cheng Kuo 

SPDM enables authentication, attestation and key exchange to assist in
providing infrastructure security enablement. It's a standard published
by the DMTF [1].

SPDM supports multiple transports, including PCIe DOE and MCTP.
This patch adds support to QEMU to connect to an external SPDM
instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [2].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

1: https://www.dmtf.org/standards/SPDM
2: https://github.com/DMTF/libspdm

Signed-off-by: Huai-Cheng Kuo 
Signed-off-by: Chris Browy 
Co-developed-by: Jonathan Cameron 
Signed-off-by: Jonathan Cameron 
[ Changes by WM
 - Bug fixes from testing
]
Signed-off-by: Wilfred Mallawa 
[ Changes by AF:
 - Convert to be more QEMU-ified
 - Move to backends as it isn't PCIe specific
]
Signed-off-by: Alistair Francis 
---
 include/sysemu/spdm-socket.h |  44 +++
 backends/spdm-socket.c   | 216 +++
 backends/Kconfig |   4 +
 backends/meson.build |   2 +
 4 files changed, 266 insertions(+)
 create mode 100644 include/sysemu/spdm-socket.h
 create mode 100644 backends/spdm-socket.c

diff --git a/include/sysemu/spdm-socket.h b/include/sysemu/spdm-socket.h
new file mode 100644
index 00..24e6fccb83
--- /dev/null
+++ b/include/sysemu/spdm-socket.h
@@ -0,0 +1,44 @@
+/*
+ * QEMU SPDM socket support
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef SPDM_REQUESTER_H
+#define SPDM_REQUESTER_H
+
+int spdm_socket_connect(uint16_t port, Error **errp);
+uint32_t spdm_socket_rsp(const int socket, uint32_t transport_type,
+ void *req, uint32_t req_len,
+ void *rsp, uint32_t rsp_len);
+void spdm_socket_close(const int socket, uint32_t transport_type);
+
+#define SPDM_SOCKET_COMMAND_NORMAL0x0001
+#define SPDM_SOCKET_COMMAND_OOB_ENCAP_KEY_UPDATE  0x8001
+#define SPDM_SOCKET_COMMAND_CONTINUE  0xFFFD
+#define SPDM_SOCKET_COMMAND_SHUTDOWN  0xFFFE
+#define SPDM_SOCKET_COMMAND_UNKOWN0x
+#define SPDM_SOCKET_COMMAND_TEST  0xDEAD
+
+#define SPDM_SOCKET_TRANSPORT_TYPE_MCTP   0x01
+#define SPDM_SOCKET_TRANSPORT_TYPE_PCI_DOE0x02
+
+#define SPDM_SOCKET_MAX_MESSAGE_BUFFER_SIZE   0x1200
+
+#endif
diff --git a/backends/spdm-socket.c b/backends/spdm-socket.c
new file mode 100644
index 00..d0663d696c
--- /dev/null
+++ b/backends/spdm-socket.c
@@ -0,0 +1,216 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+/*
+ * QEMU SPDM socket support
+ *
+ * This is based on:
+ * 
https://github.com/DMTF/spdm-emu/blob/07c0a838bcc1c6207c656ac75885c0603e344b6f/spdm_emu/spdm_emu_common/command.c
+ * but has been re-written to match QEMU style
+ *
+ * Copyright (c) 2021, DMTF. All rights reserved.
+ * Copyright (c) 2023. Western Digital Corporation or its affiliates.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/spdm-socket.h"
+#include "qapi/error.h"
+
+static bool read_bytes(const int socket, uint8_t *buffer,
+   size_t number_of_bytes)
+{
+ssize_t number_received = 0;
+ssize_t result;
+
+while (number_received < number_of_bytes) {
+result = recv(socket, buffer + number_received,
+  number_of_bytes - number_received, 0);
+if (result <= 0) {
+return false;
+}
+number_received += result;
+}
+return true;
+}
+
+static bool read_data32(const int socket, uint32_t *data)
+{
+bool result;
+
+result = read_bytes(socket, (uint8_t *)data, sizeof(uint32_t));
+if (!result) {
+

[PATCH v4 3/3] hw/nvme: Add SPDM over DOE support

2024-02-12 Thread Alistair Francis

From: Wilfred Mallawa 

Setup Data Object Exchance (DOE) as an extended capability for the NVME
controller and connect SPDM to it (CMA) to it.

Signed-off-by: Wilfred Mallawa 
Signed-off-by: Alistair Francis 
Reviewed-by: Jonathan Cameron 
Acked-by: Klaus Jensen 
---
 docs/specs/index.rst|   1 +
 docs/specs/spdm.rst | 122 
 include/hw/pci/pci_device.h |   5 ++
 include/hw/pci/pcie_doe.h   |   3 +
 hw/nvme/ctrl.c  |  53 
 5 files changed, 184 insertions(+)
 create mode 100644 docs/specs/spdm.rst

diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index 1484e3e760..e2d907959a 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -29,6 +29,7 @@ guest hardware that is specific to QEMU.
edu
ivshmem-spec
pvpanic
+   spdm
standard-vga
virt-ctlr
vmcoreinfo
diff --git a/docs/specs/spdm.rst b/docs/specs/spdm.rst
new file mode 100644
index 00..4d0942c1ad
--- /dev/null
+++ b/docs/specs/spdm.rst
@@ -0,0 +1,122 @@
+==
+QEMU Security Protocols and Data Models (SPDM) Support
+==
+
+SPDM enables authentication, attestation and key exchange to assist in
+providing infrastructure security enablement. It's a standard published
+by the `DMTF`_.
+
+QEMU supports connecting to a SPDM responder implementation. This allows an
+external application to emulate the SPDM responder logic for an SPDM device.
+
+Setting up a SPDM server
+
+
+When using QEMU with SPDM devices QEMU will connect to a server which
+implements the SPDM functionality.
+
+SPDM-Utils
+--
+
+You can use `SPDM Utils`_ to emulate a responder. This is the simplest method.
+
+SPDM-Utils is a Linux applications to manage, test and develop devices
+supporting DMTF Security Protocol and Data Model (SPDM). It is written in Rust
+and utilises libspdm.
+
+To use SPDM-Utils you will need to do the following steps. Details are included
+in the SPDM-Utils README.
+
+ 1. `Build libspdm`_
+ 2. `Build SPDM Utils using Cargo`_
+ 3. `Run it as a server`_
+
+spdm-emu
+
+
+You can use `spdm emu`_ to model the
+SPDM responder.
+
+.. code-block:: shell
+
+$ cd spdm-emu
+$ git submodule init; git submodule update --recursive
+$ mkdir build; cd build
+$ cmake -DARCH=x64 -DTOOLCHAIN=GCC -DTARGET=Debug -DCRYPTO=openssl ..
+$ make -j32
+$ make copy_sample_key # Build certificates, required for SPDM 
authentication.
+
+It is worth noting that the certificates should be in compliance with
+PCIe r6.1 sec 6.31.3. This means you will need to add the following to
+openssl.cnf
+
+.. code-block::
+
+subjectAltName = 
otherName:2.23.147;UTF8:Vendor=1b36:Device=0010:CC=010802:REV=02:SSVID=1af4:SSID=1100
+2.23.147 = ASN1:OID:2.23.147
+
+and then manually regenerate some certificates with:
+
+.. code-block:: shell
+
+$ openssl req -nodes -newkey ec:param.pem -keyout end_responder.key \
+-out end_responder.req -sha384 -batch \
+-subj "/CN=DMTF libspdm ECP384 responder cert"
+
+$ openssl x509 -req -in end_responder.req -out end_responder.cert \
+-CA inter.cert -CAkey inter.key -sha384 -days 3650 -set_serial 3 \
+-extensions v3_end -extfile ../openssl.cnf
+
+$ openssl asn1parse -in end_responder.cert -out end_responder.cert.der
+
+$ cat ca.cert.der inter.cert.der end_responder.cert.der > 
bundle_responder.certchain.der
+
+You can use SPDM-Utils instead as it will generate the correct certificates
+automatically.
+
+The responder can then be launched with
+
+.. code-block:: shell
+
+$ cd bin
+$ ./spdm_responder_emu --trans PCI_DOE
+
+Connecting an SPDM NVMe device
+==
+
+Once a SPDM server is running we can start QEMU and connect to the server.
+
+For an NVMe device first let's setup a block we can use
+
+.. code-block:: shell
+
+$ cd qemu-spdm/linux/image
+$ dd if=/dev/zero of=blknvme bs=1M count=2096 # 2GB NNMe Drive
+
+Then you can add this to your QEMU command line:
+
+.. code-block:: shell
+
+-drive file=blknvme,if=none,id=mynvme,format=raw \
+-device nvme,drive=mynvme,serial=deadbeef,spdm=2323
+
+At which point QEMU will try to connect to the SPDM server.
+
+
+.. _DMTF:
+   https://www.dmtf.org/standards/SPDM
+
+.. _SPDM Utils:
+   https://github.com/westerndigitalcorporation/spdm-utils
+
+.. _spdm emu:
+   https://github.com/dmtf/spdm-emu
+
+.. _Build SPDM Utils:
+   https://github.com/westerndigitalcorporation/spdm-utils#building
+
+.. _Generate the certificates:
+   
https://github.com/westerndigitalcorporation/spdm-utils#generate-mutable-certificates
+
+.. _Run it as a server:
+   
https://github.com/westerndigitalcorporation/spdm-utils#qemu-spdm-device-emulation
diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h
index d3dd0f64b2..b8379c78f1 100644
---

[PATCH v4 0/3] Initial support for SPDM Responders

2024-02-12 Thread Alistair Francis

The Security Protocol and Data Model (SPDM) Specification defines
messages, data objects, and sequences for performing message exchanges
over a variety of transport and physical media.
 - 
https://www.dmtf.org/sites/default/files/standards/documents/DSP0274_1.3.0.pdf

SPDM currently supports PCIe DOE and MCTP transports, but it can be
extended to support others in the future. This series adds
support to QEMU to connect to an external SPDM instance.

SPDM support can be added to any QEMU device by exposing a
TCP socket to a SPDM server. The server can then implement the SPDM
decoding/encoding support, generally using libspdm [1].

This is similar to how the current TPM implementation works and means
that the heavy lifting of setting up certificate chains, capabilities,
measurements and complex crypto can be done outside QEMU by a well
supported and tested library.

This series implements socket support and exposes SPDM for a NVMe device.

1: https://github.com/DMTF/libspdm

v4:
 - Rebase
v3:
 - Spelling fixes
 - Support for SPDM-Utils
v2:
 - Add cover letter
 - A few code fixes based on comments
 - Document SPDM-Utils
 - A few tweaks and clarifications to the documentation

Alistair Francis (1):
  hw/pci: Add all Data Object Types defined in PCIe r6.0

Huai-Cheng Kuo (1):
  backends: Initial support for SPDM socket support

Wilfred Mallawa (1):
  hw/nvme: Add SPDM over DOE support

 docs/specs/index.rst |   1 +
 docs/specs/spdm.rst  | 122 
 include/hw/pci/pci_device.h  |   5 +
 include/hw/pci/pcie_doe.h|   5 +
 include/sysemu/spdm-socket.h |  44 +++
 backends/spdm-socket.c   | 216 +++
 hw/nvme/ctrl.c   |  53 +
 backends/Kconfig |   4 +
 backends/meson.build |   2 +
 9 files changed, 452 insertions(+)
 create mode 100644 docs/specs/spdm.rst
 create mode 100644 include/sysemu/spdm-socket.h
 create mode 100644 backends/spdm-socket.c

-- 
2.43.0

[PATCH v4 1/3] hw/pci: Add all Data Object Types defined in PCIe r6.0

2024-02-12 Thread Alistair Francis

Add all of the defined protocols/features from the PCIe-SIG r6.0
"Table 6-32 PCI-SIG defined Data Object Types (Vendor ID = 0001h)"
table.

Signed-off-by: Alistair Francis 
Reviewed-by: Jonathan Cameron 
---
 include/hw/pci/pcie_doe.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/pci/pcie_doe.h b/include/hw/pci/pcie_doe.h
index 87dc17dcef..15d94661f9 100644
--- a/include/hw/pci/pcie_doe.h
+++ b/include/hw/pci/pcie_doe.h
@@ -46,6 +46,8 @@ REG32(PCI_DOE_CAP_STATUS, 0)
 
 /* PCI-SIG defined Data Object Types - r6.0 Table 6-32 */
 #define PCI_SIG_DOE_DISCOVERY   0x00
+#define PCI_SIG_DOE_CMA 0x01
+#define PCI_SIG_DOE_SECURED_CMA 0x02
 
 #define PCI_DOE_DW_SIZE_MAX (1 << 18)
 #define PCI_DOE_PROTOCOL_NUM_MAX256
-- 
2.43.0

[PATCH] tcg: Increase width of temp_subindex

2024-02-12 Thread Richard Henderson

We need values 0-3 for TCG_TYPE_I128 on 32-bit hosts.

Cc: qemu-sta...@nongnu.org
Fixes: 43eef72f4109 ("tcg: Add temp allocation for TCGv_i128")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2159
Signed-off-by: Richard Henderson 
---

I feel certain that I made this change back when I introduced TCGv_i128.
I imagine that something went wrong with a rebase and it got lost.
Worse, we don't use temp_subindex often, and we usually handle i128
this value correctly.  It took a quirk of register allocation ordering
to make an invalid value in temp_subindex lead to a crash.

---
 include/tcg/tcg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index daf2a5bf9e..451f3fec41 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -412,7 +412,7 @@ typedef struct TCGTemp {
 unsigned int mem_coherent:1;
 unsigned int mem_allocated:1;
 unsigned int temp_allocated:1;
-unsigned int temp_subindex:1;
+unsigned int temp_subindex:2;
 
 int64_t val;
 struct TCGTemp *mem_base;
-- 
2.34.1

Re: [RFC PATCH SET] cxl: add poison event handler

2024-02-12 Thread Dave Jiang




On 2/9/24 4:54 AM, Shiyang Ruan wrote:
> Currently driver only trace cxl events, poison injection on cxl memdev
> is silent.  OS needs to be notified then it could handle poison range
> in time.  Per CXL spec, the device error event could be signaled through
> FW-First and OS-First methods.
> 
> This draft patchset adds poison event handler in OS-First method:
>   - qemu:
> - CXL device report POISON event to OS by MSI by sending GMER after
>   injecting a poison record
>   - CXL driver
> a. read the POISON event through GMER;
> b. get POISON list;
> c. translate DPA to HPA;
> d. construct a mce instance, then call mce_log() to queue this mce
>instance;
> 
> This patchset includes 2 patches for qemu, 5 patches for cxl driver.

Hi Ruan,
Next time please split this out and post as 2 different series. You can have 
the CXL driver series cover letter reference the QEMU changes. And please add 
QEMU to subject prefix for the QEMU patches. Thank you!

> 
> Shiyang Ruan (5):
>   cxl/core: correct length of DPA field masks
>   cxl/core: introduce cxl_memdev_dpa_to_hpa()
>   cxl/core: introduce cxl_mem_report_poison()
>   cxl/core: add report option for cxl_mem_get_poison()
>   cxl/core: add poison injection event handler
> 
>  arch/x86/kernel/cpu/mce/core.c |  1 +
>  drivers/cxl/core/mbox.c| 82 +++---
>  drivers/cxl/core/memdev.c  | 16 ++-
>  drivers/cxl/core/region.c  |  8 ++--
>  drivers/cxl/core/trace.h   |  6 +--
>  drivers/cxl/cxlmem.h   | 11 ++---
>  drivers/cxl/pci.c  |  4 +-
>  7 files changed, 97 insertions(+), 31 deletions(-)
>

Re: RFC i386/sev: kernel-hashes, reference measurements and event logs

2024-02-12 Thread Dionna Amalie Glaze

On Mon, Feb 12, 2024 at 12:57 PM James Bottomley  wrote:
>
> On Mon, 2024-02-12 at 12:16 -0800, Dionna Amalie Glaze wrote:
> > This is not a patch but it felt inappropriate to derail a recent
> > patch that's just refactoring the kernel-hashes object_class_property
> > definition. Apologies if this has been discussed before, as I'm not
> > particularly active here.
>
> I haven't seen that patch, but I presume it's not relevant?
>
> > Regarding kernel-hashes, how is that load-time information passed
> > along to the guest beyond, say, OVMF? Can we require that these
> > hashes
> > are also present in fw_cfg so they can be read from the kernel? In
> > Linux it'd be nice to have /sys/firmware/qemu_fw_cfg/sev_kernel_hash,
> > /sys/firmware/qemu_fw_cfg/sev_cmdline_hash,
> > /sys/firmware/qemu_fw_cfg/sev_initrd_hash
>
> Are you referring to measured direct boot?  In that case, there's no
> point having hashes in the fw_cfg, because OVMF in the guest needs to
> hash the kernel itself and then compare to a trusted source (which the
> fw_cfg file wouldn't be because it's under the control of the

I'm not suggesting that OVMF uses the fw_cfg files to check, just to
use an interface that is readable by the guest OS. If the values are
wrong, then you can't reconstruct the measurement in the attestation
report and the host just DoSes the guest, which is already possible
without this change.

> hypervisor).  For SEV, the trusted source is a table in the launch
> measured ROM, but I'm sure TDX does something similar.

TDX measures information after launch into RTMRs, which SEV-SNP doesn't have.

>
> > I'm working on how to use standard document formats for providing
> > reference measurements of the Google Compute Engine virtual firmware
> > for remote attestation, and these hashes have an impact not just on
> > the measurement but on the entire model that the IETF RATS working
> > group is considering for authorizing attestation measurements.
> >
> > If you're assembling a VM launch configuration with firmware provided
> > by a trusted vendor (say Google), and your hashes are passed in from
> > an API, there's no easy rendezvous to state that the combination
> > produces the expected hardware measurement. This makes adding
> > kernel-hashes support unappetizing, since it makes the hardware
> > attestation report's measurement have no meaning, or at least, it
> > makes life difficult for people trying to assign it meaning.
>
> Well, no, since firmware tends to update on a longer timeframe than the
> kernel and the cmdline and initrd tend to have quicker update cycles

This is not true of our environment, where every release of the
hypervisor includes a fresh build of the firmware, which may have
codegen differences for multiple reasons. We don't have a way to
announce ahead of time which firmware an instance will get for them to
sign an initial measurement derived from it.

> than the kernel.  Thus there's no one golden reference.  Instead you
> give a tool (say the virtee sev_snp_measure tool
>
> https://github.com/virtee/sev-snp-measure
>
> ) the hashes of the firmware, kernel, command line and initrd and it
> caclulates the expected launch measure

Yes indeed, I've written one myself. Publishing it has been a bit
Kafkaesque as opposed to the go-sev-guest and go-configfs-tsm
libraries.
The problem is still preauthorization when it'd be much more
preferable to authorize components of a measurement and derive the
SEV-SNP attestation from the components.

>
> > The measurement is the product of two different entities as assembled
> > by the VMM given a trusted firmware and the kernel hashes. It's a bit
> > of a sandwich of (GCE) core firmware, (User) SEV hashes, (GCE) BSP
> > VMSA, AP VMSA*.
> >
> > When you collect "evidence" to verify locally or pass along to a
> > verification service, you need more than just the hardware
> > attestation report to make sense of the combined bits. You have a PCR
> > situation like with TPM, so you need an event log for these different
> > aspects of the ultimate measurement. There is no event log for this
> > -kernel-hashes construction.
>
> That's because it's a pre-launch measure.  The TCG log is only for post
> launch.  The idea being those values are needed for you to approve
> something in pre-launch, like key release, before the TPM starts
> running.
>
> That's not to say we shouldn't have log entries for pre-launch, but
> that's a generic problem and not specific to measured direct boot.
>

Okay, do you know which direction the solution is moving? I know
there's a measured TPM firmware change proposed to the TCG, but that's
a bit too specific.

> > We can use the TCG TPM event log to post EV_NO_ACTION events about
> > the PlatformRIM, specifically, to point at a UEFI variable that we
> > populate to store our signed document about the expected measurements
> > with the Qemu-SEV-SNP-boot-protocol, but I don't see how we might
> > collect the kernel-hashes values as extra evidence

[PULL 08/12] target/hppa: Allow read-access to PSW with rsm 0, reg instruction

2024-02-12 Thread deller

From: Helge Deller 

HP-UX 11 and HP ODE tools use the "rsm 0,%reg" instruction in not priviledged
code paths to get the current PSW flags. The constant 0 means that no bits of
the PSW shall be reset, so this is effectively a read-only access to the PSW.
Allow this read-only access even for not privileged code.

Signed-off-by: Helge Deller 
Acked-by: Richard Henderson 
---
 target/hppa/translate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 53ec57ee86..01f3188656 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -2156,10 +2156,16 @@ static bool trans_ldsid(DisasContext *ctx, arg_ldsid *a)
 
 static bool trans_rsm(DisasContext *ctx, arg_rsm *a)
 {
+#ifdef CONFIG_USER_ONLY
 CHECK_MOST_PRIVILEGED(EXCP_PRIV_OPR);
-#ifndef CONFIG_USER_ONLY
+#else
 TCGv_i64 tmp;
 
+/* HP-UX 11i and HP ODE use rsm for read-access to PSW */
+if (a->i) {
+CHECK_MOST_PRIVILEGED(EXCP_PRIV_OPR);
+}
+
 nullify_over(ctx);
 
 tmp = tcg_temp_new_i64();
-- 
2.43.0

[PULL 06/12] target/hppa: Implement do_transaction_failed handler for I/O errors

2024-02-12 Thread deller

From: Helge Deller 

Add the do_transaction_failed() handler to tigger a HPMC to the CPU
in case of I/O transaction errors.

This is a preparation commit.
We still lack implementation for some registers, so do not yet enable sending
HPMCs.  Having this hunk here now nevertheless helps for the further
development, so that it can easily be enabled later on.

Signed-off-by: Helge Deller 
---
 target/hppa/cpu.c|  1 +
 target/hppa/cpu.h|  5 +
 target/hppa/mem_helper.c | 19 +++
 3 files changed, 25 insertions(+)

diff --git a/target/hppa/cpu.c b/target/hppa/cpu.c
index 5f87c1b12a..afe73d4474 100644
--- a/target/hppa/cpu.c
+++ b/target/hppa/cpu.c
@@ -191,6 +191,7 @@ static const TCGCPUOps hppa_tcg_ops = {
 .cpu_exec_interrupt = hppa_cpu_exec_interrupt,
 .do_interrupt = hppa_cpu_do_interrupt,
 .do_unaligned_access = hppa_cpu_do_unaligned_access,
+.do_transaction_failed = hppa_cpu_do_transaction_failed,
 #endif /* !CONFIG_USER_ONLY */
 };
 
diff --git a/target/hppa/cpu.h b/target/hppa/cpu.h
index 7a181e8f33..a92dc352cb 100644
--- a/target/hppa/cpu.h
+++ b/target/hppa/cpu.h
@@ -381,6 +381,11 @@ bool hppa_cpu_exec_interrupt(CPUState *cpu, int int_req);
 int hppa_get_physical_address(CPUHPPAState *env, vaddr addr, int mmu_idx,
   int type, hwaddr *pphys, int *pprot,
   HPPATLBEntry **tlb_entry);
+void hppa_cpu_do_transaction_failed(CPUState *cs, hwaddr physaddr,
+ vaddr addr, unsigned size,
+ MMUAccessType access_type,
+ int mmu_idx, MemTxAttrs attrs,
+ MemTxResult response, uintptr_t retaddr);
 extern const MemoryRegionOps hppa_io_eir_ops;
 extern const VMStateDescription vmstate_hppa_cpu;
 void hppa_cpu_alarm_timer(void *);
diff --git a/target/hppa/mem_helper.c b/target/hppa/mem_helper.c
index 629a9d90ef..676c0b3003 100644
--- a/target/hppa/mem_helper.c
+++ b/target/hppa/mem_helper.c
@@ -353,6 +353,25 @@ raise_exception_with_ior(CPUHPPAState *env, int excp, 
uintptr_t retaddr,
 cpu_loop_exit_restore(cs, retaddr);
 }
 
+void hppa_cpu_do_transaction_failed(CPUState *cs, hwaddr physaddr,
+ vaddr addr, unsigned size,
+ MMUAccessType access_type,
+ int mmu_idx, MemTxAttrs attrs,
+ MemTxResult response, uintptr_t retaddr)
+{
+CPUHPPAState *env = cpu_env(cs);
+
+qemu_log_mask(LOG_GUEST_ERROR, "HPMC at " TARGET_FMT_lx ":" TARGET_FMT_lx
+" while accessing I/O at %#08" HWADDR_PRIx "\n",
+env->iasq_f, env->iaoq_f, physaddr);
+
+/* FIXME: Enable HPMC exceptions when firmware has clean device probing */
+if (0) {
+raise_exception_with_ior(env, EXCP_HPMC, retaddr, addr,
+ MMU_IDX_MMU_DISABLED(mmu_idx));
+}
+}
+
 bool hppa_cpu_tlb_fill(CPUState *cs, vaddr addr, int size,
MMUAccessType type, int mmu_idx,
bool probe, uintptr_t retaddr)
-- 
2.43.0

[PULL 09/12] target/hppa: PDC_BTLB_INFO uses 32-bit ints

2024-02-12 Thread deller

From: Helge Deller 

The BTLB helper function stores the BTLB info (four 32-bit ints) into
the memory of the guest. They are only available when emulating a 32-bit
CPU in the guest, so use "uint32_t" instead of "target_ulong" here.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
---
 target/hppa/mem_helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/hppa/mem_helper.c b/target/hppa/mem_helper.c
index 676c0b3003..66b8fa7d72 100644
--- a/target/hppa/mem_helper.c
+++ b/target/hppa/mem_helper.c
@@ -684,7 +684,7 @@ void HELPER(diag_btlb)(CPUHPPAState *env)
 case 0:
 /* return BTLB parameters */
 qemu_log_mask(CPU_LOG_MMU, "PDC_BLOCK_TLB: PDC_BTLB_INFO\n");
-vaddr = probe_access(env, env->gr[24], 4 * sizeof(target_ulong),
+vaddr = probe_access(env, env->gr[24], 4 * sizeof(uint32_t),
  MMU_DATA_STORE, mmu_idx, ra);
 if (vaddr == NULL) {
 env->gr[28] = -10; /* invalid argument */
-- 
2.43.0

[PULL 03/12] hw/pci-host/astro: Avoid aborting on access failure

2024-02-12 Thread deller

From: Helge Deller 

Instead of stopping the emulation, report a MEMTX_DECODE_ERROR if the OS
tries to access non-existent registers.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
---
 hw/pci-host/astro.c | 27 +++
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/hw/pci-host/astro.c b/hw/pci-host/astro.c
index 37d271118c..96d655f5fb 100644
--- a/hw/pci-host/astro.c
+++ b/hw/pci-host/astro.c
@@ -122,10 +122,6 @@ static MemTxResult elroy_chip_read_with_attrs(void 
*opaque, hwaddr addr,
 case 0x0800:/* IOSAPIC_REG_SELECT */
 val = s->iosapic_reg_select;
 break;
-case 0x0808:
-val = UINT64_MAX;/* XXX: tbc. */
-g_assert_not_reached();
-break;
 case 0x0810:/* IOSAPIC_REG_WINDOW */
 switch (s->iosapic_reg_select) {
 case 0x01:  /* IOSAPIC_REG_VERSION */
@@ -135,15 +131,15 @@ static MemTxResult elroy_chip_read_with_attrs(void 
*opaque, hwaddr addr,
 if (s->iosapic_reg_select < ARRAY_SIZE(s->iosapic_reg)) {
 val = s->iosapic_reg[s->iosapic_reg_select];
 } else {
-trace_iosapic_reg_read(s->iosapic_reg_select, size, val);
-g_assert_not_reached();
+val = 0;
+ret = MEMTX_DECODE_ERROR;
 }
 }
 trace_iosapic_reg_read(s->iosapic_reg_select, size, val);
 break;
 default:
-trace_elroy_read(addr, size, val);
-g_assert_not_reached();
+val = 0;
+ret = MEMTX_DECODE_ERROR;
 }
 trace_elroy_read(addr, size, val);
 
@@ -191,7 +187,7 @@ static MemTxResult elroy_chip_write_with_attrs(void 
*opaque, hwaddr addr,
 if (s->iosapic_reg_select < ARRAY_SIZE(s->iosapic_reg)) {
 s->iosapic_reg[s->iosapic_reg_select] = val;
 } else {
-g_assert_not_reached();
+return MEMTX_DECODE_ERROR;
 }
 break;
 case 0x0840:/* IOSAPIC_REG_EOI */
@@ -204,7 +200,7 @@ static MemTxResult elroy_chip_write_with_attrs(void 
*opaque, hwaddr addr,
 }
 break;
 default:
-g_assert_not_reached();
+return MEMTX_DECODE_ERROR;
 }
 return MEMTX_OK;
 }
@@ -594,8 +590,8 @@ static MemTxResult astro_chip_read_with_attrs(void *opaque, 
hwaddr addr,
 #undef EMPTY_PORT
 
 default:
-trace_astro_chip_read(addr, size, val);
-g_assert_not_reached();
+val = 0;
+ret = MEMTX_DECODE_ERROR;
 }
 
 /* for 32-bit accesses mask return value */
@@ -610,6 +606,7 @@ static MemTxResult astro_chip_write_with_attrs(void 
*opaque, hwaddr addr,
   uint64_t val, unsigned size,
   MemTxAttrs attrs)
 {
+MemTxResult ret = MEMTX_OK;
 AstroState *s = opaque;
 
 trace_astro_chip_write(addr, size, val);
@@ -686,11 +683,9 @@ static MemTxResult astro_chip_write_with_attrs(void 
*opaque, hwaddr addr,
 #undef EMPTY_PORT
 
 default:
-/* Controlled by astro_chip_mem_valid above.  */
-trace_astro_chip_write(addr, size, val);
-g_assert_not_reached();
+ret = MEMTX_DECODE_ERROR;
 }
-return MEMTX_OK;
+return ret;
 }
 
 static const MemoryRegionOps astro_chip_ops = {
-- 
2.43.0

[PULL 05/12] lasi: allow access to LAN MAC address registers

2024-02-12 Thread deller

From: Helge Deller 

Firmware and qemu reads and writes the MAC address for the LASI LAN via
registers in LASI. Allow those accesses and return zero even if LASI
LAN isn't enabled to avoid HPMCs (=crashes).

Signed-off-by: Helge Deller 
---
 hw/misc/lasi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/misc/lasi.c b/hw/misc/lasi.c
index 003f5b5ed8..9cfa5bb316 100644
--- a/hw/misc/lasi.c
+++ b/hw/misc/lasi.c
@@ -38,6 +38,7 @@ static bool lasi_chip_mem_valid(void *opaque, hwaddr addr,
 case LASI_LPT:
 case LASI_UART:
 case LASI_LAN:
+case LASI_LAN + 12: /* LASI LAN MAC */
 case LASI_RTC:
 
 case LASI_PCR ... LASI_AMR:
@@ -78,6 +79,7 @@ static MemTxResult lasi_chip_read_with_attrs(void *opaque, 
hwaddr addr,
 case LASI_LPT:
 case LASI_UART:
 case LASI_LAN:
+case LASI_LAN + 12:
 val = 0;
 break;
 case LASI_RTC:
-- 
2.43.0

[PULL 02/12] target/hppa: Add "diag 0x101" for console output support

2024-02-12 Thread deller

From: Helge Deller 

For debugging purposes at the early stage of the bootup process,
the SeaBIOS-hppa firmware sometimes needs to output characters to the
serial console. Note that the serial console is the default output
method for parisc machines.

At this stage PCI busses and other devices haven't been initialized
yet. So, SeaBIOS-hppa will not be able to find the correct I/O ports
for the serial ports yet.

Instead, add an emulation for the "diag 0x101" opcode to assist here.
Without any other dependencies, SeaBIOS-hppa can then load the character
to be printed in register %r26 and issue the diag assembly instruction.

The qemu diag_console_output() helper function will then print
that character to the first serial port.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
---
 target/hppa/helper.h |  1 +
 target/hppa/sys_helper.c | 36 
 target/hppa/translate.c  |  6 ++
 3 files changed, 43 insertions(+)

diff --git a/target/hppa/helper.h b/target/hppa/helper.h
index 20698f68ed..1bdbcd8f98 100644
--- a/target/hppa/helper.h
+++ b/target/hppa/helper.h
@@ -103,4 +103,5 @@ DEF_HELPER_FLAGS_1(ptlbe, TCG_CALL_NO_RWG, void, env)
 DEF_HELPER_FLAGS_2(lpa, TCG_CALL_NO_WG, tl, env, tl)
 DEF_HELPER_FLAGS_1(change_prot_id, TCG_CALL_NO_RWG, void, env)
 DEF_HELPER_1(diag_btlb, void, env)
+DEF_HELPER_1(diag_console_output, void, env)
 #endif
diff --git a/target/hppa/sys_helper.c b/target/hppa/sys_helper.c
index a59245eed3..4a31748342 100644
--- a/target/hppa/sys_helper.c
+++ b/target/hppa/sys_helper.c
@@ -23,6 +23,8 @@
 #include "exec/helper-proto.h"
 #include "qemu/timer.h"
 #include "sysemu/runstate.h"
+#include "sysemu/sysemu.h"
+#include "chardev/char-fe.h"
 
 void HELPER(write_interval_timer)(CPUHPPAState *env, target_ulong val)
 {
@@ -109,3 +111,37 @@ void HELPER(rfi_r)(CPUHPPAState *env)
 helper_getshadowregs(env);
 helper_rfi(env);
 }
+
+#ifndef CONFIG_USER_ONLY
+/*
+ * diag_console_output() is a helper function used during the initial bootup
+ * process of the SeaBIOS-hppa firmware.  During the bootup phase, addresses of
+ * serial ports on e.g. PCI busses are unknown and most other devices haven't
+ * been initialized and configured yet.  With help of a simple "diag" assembler
+ * instruction and an ASCII character code in register %r26 firmware can easily
+ * print debug output without any dependencies to the first serial port and use
+ * that as serial console.
+ */
+void HELPER(diag_console_output)(CPUHPPAState *env)
+{
+CharBackend *serial_backend;
+Chardev *serial_port;
+unsigned char c;
+
+/* find first serial port */
+serial_port = serial_hd(0);
+if (!serial_port) {
+return;
+}
+
+/* get serial_backend for the serial port */
+serial_backend = serial_port->be;
+if (!serial_backend ||
+!qemu_chr_fe_backend_connected(serial_backend)) {
+return;
+}
+
+c = (unsigned char)env->gr[26];
+qemu_chr_fe_write(serial_backend, , sizeof(c));
+}
+#endif
diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 08d09d50d7..53ec57ee86 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -4411,6 +4411,12 @@ static bool trans_diag(DisasContext *ctx, arg_diag *a)
 gen_helper_diag_btlb(tcg_env);
 return nullify_end(ctx);
 }
+if (a->i == 0x101) {
+/* print char in %r26 to first serial console, used by SeaBIOS-hppa */
+nullify_over(ctx);
+gen_helper_diag_console_output(tcg_env);
+return nullify_end(ctx);
+}
 #endif
 qemu_log_mask(LOG_UNIMP, "DIAG opcode 0x%04x ignored\n", a->i);
 return true;
-- 
2.43.0

[PULL 07/12] lasi: Add reset I/O ports for LASI audio and FDC

2024-02-12 Thread deller

From: Helge Deller 

Linux writes zeroes at bootup into the default ports for LASI audio and
LASI floppy controller to reset those devices.  Allow writing to those
registers to avoid HPMCs.

Signed-off-by: Helge Deller 
---
 hw/misc/lasi.c | 11 +++
 include/hw/misc/lasi.h |  2 ++
 2 files changed, 13 insertions(+)

diff --git a/hw/misc/lasi.c b/hw/misc/lasi.c
index 9cfa5bb316..970fc98b5c 100644
--- a/hw/misc/lasi.c
+++ b/hw/misc/lasi.c
@@ -36,10 +36,13 @@ static bool lasi_chip_mem_valid(void *opaque, hwaddr addr,
 case LASI_IAR:
 
 case LASI_LPT:
+case LASI_AUDIO:
+case LASI_AUDIO + 4:
 case LASI_UART:
 case LASI_LAN:
 case LASI_LAN + 12: /* LASI LAN MAC */
 case LASI_RTC:
+case LASI_FDC:
 
 case LASI_PCR ... LASI_AMR:
 ret = true;
@@ -80,6 +83,7 @@ static MemTxResult lasi_chip_read_with_attrs(void *opaque, 
hwaddr addr,
 case LASI_UART:
 case LASI_LAN:
 case LASI_LAN + 12:
+case LASI_FDC:
 val = 0;
 break;
 case LASI_RTC:
@@ -145,12 +149,19 @@ static MemTxResult lasi_chip_write_with_attrs(void 
*opaque, hwaddr addr,
 case LASI_LPT:
 /* XXX: reset parallel port */
 break;
+case LASI_AUDIO:
+case LASI_AUDIO + 4:
+/* XXX: reset audio port */
+break;
 case LASI_UART:
 /* XXX: reset serial port */
 break;
 case LASI_LAN:
 /* XXX: reset LAN card */
 break;
+case LASI_FDC:
+/* XXX: reset Floppy controller */
+break;
 case LASI_RTC:
 s->rtc_ref = val - time(NULL);
 break;
diff --git a/include/hw/misc/lasi.h b/include/hw/misc/lasi.h
index 0a8c7352be..f01c0f680a 100644
--- a/include/hw/misc/lasi.h
+++ b/include/hw/misc/lasi.h
@@ -26,9 +26,11 @@ OBJECT_DECLARE_SIMPLE_TYPE(LasiState, LASI_CHIP)
 #define LASI_IAR0x10
 
 #define LASI_LPT0x02000
+#define LASI_AUDIO  0x04000
 #define LASI_UART   0x05000
 #define LASI_LAN0x07000
 #define LASI_RTC0x09000
+#define LASI_FDC0x0A000
 
 #define LASI_PCR0x0C000 /* LASI Power Control register */
 #define LASI_ERRLOG 0x0C004 /* LASI Error Logging register */
-- 
2.43.0

[PULL 12/12] hw/hppa/machine: Load 64-bit firmware on 64-bit machines

2024-02-12 Thread deller

From: Helge Deller 

Load the 64-bit SeaBIOS-hppa firmware by default when running on a 64-bit
machine. This will enable us to later support more than 4GB of RAM and is
required that the OS (or PALO bootloader) will start or install a 64-bit kernel
instead of a 32-bit kernel.

Note that SeaBIOS-hppa v16 provides the "-fw_cfg opt/OS64,string=3" option with
which the user can control what the firmware shall report back to the OS:
Support of 32-bit OS, support of a 64-bit OS, or support for both (default).

Wrap firmware loading inside !qtest_enabled() to avoid this warning with
qtest: "qemu-system-hppa: no firmware provided".

Signed-off-by: Helge Deller 
Acked-by: Richard Henderson 
---
 hw/hppa/machine.c | 52 +++
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/hw/hppa/machine.c b/hw/hppa/machine.c
index eb78c46ff1..5fcaf5884b 100644
--- a/hw/hppa/machine.c
+++ b/hw/hppa/machine.c
@@ -13,6 +13,7 @@
 #include "qemu/error-report.h"
 #include "sysemu/reset.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/qtest.h"
 #include "sysemu/runstate.h"
 #include "hw/rtc/mc146818rtc.h"
 #include "hw/timer/i8254.h"
@@ -333,6 +334,7 @@ static void machine_HP_common_init_tail(MachineState 
*machine, PCIBus *pci_bus,
 const char *kernel_filename = machine->kernel_filename;
 const char *kernel_cmdline = machine->kernel_cmdline;
 const char *initrd_filename = machine->initrd_filename;
+const char *firmware = machine->firmware;
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 DeviceState *dev;
 PCIDevice *pci_dev;
@@ -408,31 +410,37 @@ static void machine_HP_common_init_tail(MachineState 
*machine, PCIBus *pci_bus,
 
 /* Load firmware.  Given that this is not "real" firmware,
but one explicitly written for the emulation, we might as
-   well load it directly from an ELF image.  */
-firmware_filename = qemu_find_file(QEMU_FILE_TYPE_BIOS,
-   machine->firmware ?: 
"hppa-firmware.img");
-if (firmware_filename == NULL) {
-error_report("no firmware provided");
-exit(1);
-}
+   well load it directly from an ELF image. Load the 64-bit
+   firmware on 64-bit machines by default if not specified
+   on command line. */
+if (!qtest_enabled()) {
+if (!firmware) {
+firmware = lasi_dev ? "hppa-firmware.img" : "hppa-firmware64.img";
+}
+firmware_filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, firmware);
+if (firmware_filename == NULL) {
+error_report("no firmware provided");
+exit(1);
+}
 
-size = load_elf(firmware_filename, NULL, translate, NULL,
-_entry, _low, _high, NULL,
-true, EM_PARISC, 0, 0);
+size = load_elf(firmware_filename, NULL, translate, NULL,
+_entry, _low, _high, NULL,
+true, EM_PARISC, 0, 0);
 
-if (size < 0) {
-error_report("could not load firmware '%s'", firmware_filename);
-exit(1);
-}
-qemu_log_mask(CPU_LOG_PAGE, "Firmware loaded at 0x%08" PRIx64
-  "-0x%08" PRIx64 ", entry at 0x%08" PRIx64 ".\n",
-  firmware_low, firmware_high, firmware_entry);
-if (firmware_low < translate(NULL, FIRMWARE_START) ||
-firmware_high >= translate(NULL, FIRMWARE_END)) {
-error_report("Firmware overlaps with memory or IO space");
-exit(1);
+if (size < 0) {
+error_report("could not load firmware '%s'", firmware_filename);
+exit(1);
+}
+qemu_log_mask(CPU_LOG_PAGE, "Firmware loaded at 0x%08" PRIx64
+  "-0x%08" PRIx64 ", entry at 0x%08" PRIx64 ".\n",
+  firmware_low, firmware_high, firmware_entry);
+if (firmware_low < translate(NULL, FIRMWARE_START) ||
+firmware_high >= translate(NULL, FIRMWARE_END)) {
+error_report("Firmware overlaps with memory or IO space");
+exit(1);
+}
+g_free(firmware_filename);
 }
-g_free(firmware_filename);
 
 rom_region = g_new(MemoryRegion, 1);
 memory_region_init_ram(rom_region, NULL, "firmware",
-- 
2.43.0

[PULL 04/12] hw/pci-host/astro: Implement Hard Fail and Soft Fail mode

2024-02-12 Thread deller

From: Helge Deller 

The Astro/Elroy chip can work in either Hard-Fail or Soft-Fail mode.

Hard fail means the system bus will send an HPMC (=crash) to the
processor, soft fail means the system bus will ignore timeouts of
MMIO-reads or MMIO-writes and return -1ULL.

The HF mode is controlled by a bit in the status register and is usually
programmed by the OS. Return the corresponing values based on the current
value of that bit.

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
---
 hw/pci-host/astro.c | 21 +++--
 include/hw/pci-host/astro.h |  2 ++
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/hw/pci-host/astro.c b/hw/pci-host/astro.c
index 96d655f5fb..e3e589ceac 100644
--- a/hw/pci-host/astro.c
+++ b/hw/pci-host/astro.c
@@ -131,15 +131,21 @@ static MemTxResult elroy_chip_read_with_attrs(void 
*opaque, hwaddr addr,
 if (s->iosapic_reg_select < ARRAY_SIZE(s->iosapic_reg)) {
 val = s->iosapic_reg[s->iosapic_reg_select];
 } else {
-val = 0;
-ret = MEMTX_DECODE_ERROR;
+goto check_hf;
 }
 }
 trace_iosapic_reg_read(s->iosapic_reg_select, size, val);
 break;
 default:
-val = 0;
-ret = MEMTX_DECODE_ERROR;
+check_hf:
+if (s->status_control & HF_ENABLE) {
+val = 0;
+ret = MEMTX_DECODE_ERROR;
+} else {
+/* return -1ULL if HardFail is disabled */
+val = ~0;
+ret = MEMTX_OK;
+}
 }
 trace_elroy_read(addr, size, val);
 
@@ -187,7 +193,7 @@ static MemTxResult elroy_chip_write_with_attrs(void 
*opaque, hwaddr addr,
 if (s->iosapic_reg_select < ARRAY_SIZE(s->iosapic_reg)) {
 s->iosapic_reg[s->iosapic_reg_select] = val;
 } else {
-return MEMTX_DECODE_ERROR;
+goto check_hf;
 }
 break;
 case 0x0840:/* IOSAPIC_REG_EOI */
@@ -200,7 +206,10 @@ static MemTxResult elroy_chip_write_with_attrs(void 
*opaque, hwaddr addr,
 }
 break;
 default:
-return MEMTX_DECODE_ERROR;
+check_hf:
+if (s->status_control & HF_ENABLE) {
+return MEMTX_DECODE_ERROR;
+}
 }
 return MEMTX_OK;
 }
diff --git a/include/hw/pci-host/astro.h b/include/hw/pci-host/astro.h
index f63fd220f3..e2966917cd 100644
--- a/include/hw/pci-host/astro.h
+++ b/include/hw/pci-host/astro.h
@@ -27,6 +27,8 @@ OBJECT_DECLARE_SIMPLE_TYPE(ElroyState, ELROY_PCI_HOST_BRIDGE)
 #define IOS_DIST_BASE_ADDR  0xfffee0ULL
 #define IOS_DIST_BASE_SIZE   0x1ULL
 
+#define HF_ENABLE   0x40/* enable HF mode (default is -1 mode) */
+
 struct AstroState;
 
 struct ElroyState {
-- 
2.43.0

[PULL 10/12] hw/net/tulip: add chip status register values

2024-02-12 Thread deller

From: Sven Schnelle 

Netbsd isn't able to detect a link on the emulated tulip card. That's
because netbsd reads the Chip Status Register of the Phy (address
0x14). The default phy data in the qemu tulip driver is all zero,
which means no link is established and autonegotation isn't complete.

Therefore set the register to 0x3b40, which means:

Link is up, Autonegotation complete, Full Duplex, 100MBit/s Link
speed.

Also clear the mask because this register is read only.

Signed-off-by: Sven Schnelle 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Helge Deller 
Tested-by: Helge Deller 
Signed-off-by: Helge Deller 
---
 hw/net/tulip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/net/tulip.c b/hw/net/tulip.c
index 6d4fb06dad..1f2ef20977 100644
--- a/hw/net/tulip.c
+++ b/hw/net/tulip.c
@@ -421,7 +421,7 @@ static uint16_t tulip_mdi_default[] = {
 /* MDI Registers 8 - 15 */
 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x,
 /* MDI Registers 16 - 31 */
-0x0003, 0x, 0x0001, 0x, 0x, 0x, 0x, 0x,
+0x0003, 0x, 0x0001, 0x, 0x3b40, 0x, 0x, 0x,
 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x,
 };
 
@@ -429,7 +429,7 @@ static uint16_t tulip_mdi_default[] = {
 static const uint16_t tulip_mdi_mask[] = {
 0x, 0x, 0x, 0x, 0xc01f, 0x, 0x, 0x,
 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x,
-0x0fff, 0x, 0x, 0x, 0x, 0x, 0x, 0x,
+0x0fff, 0x, 0x, 0x, 0x, 0x, 0x, 0x,
 0x, 0x, 0x, 0x, 0x, 0x, 0x, 0x,
 };
 
-- 
2.43.0

[PULL 00/12] Hppa64 patches

2024-02-12 Thread deller

From: Helge Deller 

The following changes since commit 39a6e4f87e7b75a45b08d6dc8b8b7c2954c87440:

  Merge tag 'pull-qapi-2024-02-03' of https://repo.or.cz/qemu/armbru into 
staging (2024-02-03 13:31:58 +)

are available in the Git repository at:

  https://github.com/hdeller/qemu-hppa.git tags/hppa64-pull-request

for you to fetch changes up to a9314795f068515ff5925d0f68adf0a3215f6d2d:

  hw/hppa/machine: Load 64-bit firmware on 64-bit machines (2024-02-13 00:44:06 
+0100)


target/hppa: Enhancements and fixes

Some enhancements and fixes for the hppa target.

The major change is, that this patchset adds a new SeaBIOS-hppa firmware
which is built as 32- and 64-bit firmware.
The new 64-bit firmware is necessary to fully support 64-bit operating systems
(HP-UX, Linux, NetBSD,...).



Helge Deller (11):
  disas/hppa: Add disassembly for qemu specific instructions
  target/hppa: Add "diag 0x101" for console output support
  hw/pci-host/astro: Avoid aborting on access failure
  hw/pci-host/astro: Implement Hard Fail and Soft Fail mode
  lasi: allow access to LAN MAC address registers
  target/hppa: Implement do_transaction_failed handler for I/O errors
  lasi: Add reset I/O ports for LASI audio and FDC
  target/hppa: Allow read-access to PSW with rsm 0,reg instruction
  target/hppa: PDC_BTLB_INFO uses 32-bit ints
  target/hppa: Update SeaBIOS-hppa to version 16
  hw/hppa/machine: Load 64-bit firmware on 64-bit machines

Sven Schnelle (1):
  hw/net/tulip: add chip status register values

 disas/hppa.c|   4 +++
 hw/hppa/machine.c   |  52 +---
 hw/misc/lasi.c  |  13 +
 hw/net/tulip.c  |   4 +--
 hw/pci-host/astro.c |  36 ++---
 include/hw/misc/lasi.h  |   2 ++
 include/hw/pci-host/astro.h |   2 ++
 pc-bios/hppa-firmware.img   | Bin 163324 -> 167820 bytes
 pc-bios/hppa-firmware64.img | Bin 0 -> 206024 bytes
 roms/seabios-hppa   |   2 +-
 target/hppa/cpu.c   |   1 +
 target/hppa/cpu.h   |   5 
 target/hppa/helper.h|   1 +
 target/hppa/mem_helper.c|  21 ++-
 target/hppa/sys_helper.c|  36 +
 target/hppa/translate.c |  14 +-
 16 files changed, 150 insertions(+), 43 deletions(-)
 mode change 100644 => 100755 pc-bios/hppa-firmware.img
 create mode 100755 pc-bios/hppa-firmware64.img

-- 
2.43.0

[PULL 01/12] disas/hppa: Add disassembly for qemu specific instructions

2024-02-12 Thread deller

From: Helge Deller 

Add disassembly of opcodes for "HALT QEMU", "RESET QEMU" and
"RESTORE SHR" (restore shadow registers).

Signed-off-by: Helge Deller 
Reviewed-by: Richard Henderson 
---
 disas/hppa.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/disas/hppa.c b/disas/hppa.c
index cce4f4aa37..22dce9b41b 100644
--- a/disas/hppa.c
+++ b/disas/hppa.c
@@ -1609,6 +1609,10 @@ static const struct pa_opcode pa_opcodes[] =
 { "call",  0xe800a000, 0xffe0e000, "nW", pa10, FLAG_STRICT},
 { "ret",   0xe840d000, 0xfffd, "n", pa20, FLAG_STRICT},
 
+/* Opcodes assigned to QEMU, used by SeaBIOS firmware and Linux kernel */
+{ "HALT QEMU", 0xfffdead0, 0xfffd, "n", pa10, FLAG_STRICT},
+{ "RESET QEMU",0xfffdead1, 0xfffd, "n", pa10, FLAG_STRICT},
+{ "RESTORE SHR",0xfffdead2, 0xfffd, "n", pa10, FLAG_STRICT},
 };
 
 #define NUMOPCODES ((sizeof pa_opcodes)/(sizeof pa_opcodes[0]))
-- 
2.43.0

Re: [PULL 11/12] target/hppa: Update SeaBIOS-hppa to version 16

2024-02-12 Thread Helge Deller


On 2/11/24 19:49, Michael Tokarev wrote:

11.02.2024 15:29, del...@kernel.org

From: Helge Deller 

SeaBIOS-hppa version 16 news & enhancements:



  pc-bios/hppa-firmware.img   | Bin 163324 -> 167820 bytes
  pc-bios/hppa-firmware64.img | Bin 0 -> 206024 bytes
  roms/seabios-hppa   |   2 +-
  3 files changed, 1 insertion(+), 1 deletion(-)
  mode change 100644 => 100755 pc-bios/hppa-firmware.img
  create mode 100755 pc-bios/hppa-firmware64.img


Can we have build instructions for these files in roms/Makefile please?


Sure, I'll add then in the next round when I send a v17 firmware.

Helge

Re: [PULL 00/12] Hppa64 patches

2024-02-12 Thread Helge Deller


On 2/12/24 22:16, Peter Maydell wrote:

On Sun, 11 Feb 2024 at 12:30,  wrote:


From: Helge Deller 

The following changes since commit 39a6e4f87e7b75a45b08d6dc8b8b7c2954c87440:

   Merge tag 'pull-qapi-2024-02-03' of https://repo.or.cz/qemu/armbru into 
staging (2024-02-03 13:31:58 +)

are available in the Git repository at:

   https://github.com/hdeller/qemu-hppa.git tags/hppa64-pull-request

for you to fetch changes up to f9d2270c85872bd71a01e15b2ebda2569f17f811:

   hw/hppa/machine: Load 64-bit firmware on 64-bit machines (2024-02-11 
13:25:15 +0100)


target/hppa: Enhancements and fixes

A new SeaBIOS-hppa firmware which is built as 32- and 64-bit firmware.
Necessary to fully support 64-bit operating systems (HP-UX, Linux, NetBSD,...).




This fails "make check", eg:
https://gitlab.com/qemu-project/qemu/-/jobs/6154451100

because when the qom-test etc tests run qemu-system-hppa, it
barfs with "qemu-system-hppa: no firmware provided".

That kind of firmware check needs to not fire when
using the qtest accel.


Ok. But how do people usually work around this kind of issue?
Test if the qtest accel is in use?
Ignore if the firmware can't be loaded?
Any hint would be great!

Helge

Re: [RFC PATCH] target/ppc: Move add and subf type fixed-point arithmetic instructions to decodetree

2024-02-12 Thread Richard Henderson


On 2/9/24 01:35, Chinmay Rath wrote:

+_tab_cy rt ra rb cy
+@Z23_tab_cy .. rt:5 ra:5 rb:5 cy:2  .   _tab_cy

...

+ADDEX   01 . . . .. 10101010 -  @Z23_tab_cy

...

+static bool trans_ADDEX(DisasContext *ctx, arg_Z23_tab_cy *a)
+{
+gen_op_arith_add(ctx, cpu_gpr[a->rt], cpu_gpr[a->ra], cpu_gpr[a->rb],
+ cpu_ov, cpu_ov32, true, true, false, false);
+return true;
+}


CY != 0 is reserved.

While you could diagnose this in trans_ADDEX, it seems cleaner to simply match 00 in the 
CY field until a future ISA defines something else.  All that is required is a comment in 
the decodetree entry.


# Z23-form, with CY=0; all other values for CY are reserved.
# This works out the same as X-form.
ADDEX01 . . . 00 10101010 -   @X


r~

[PATCH] tcg/arm: Fix goto_tb for large translation blocks

2024-02-12 Thread Richard Henderson

Correct arithmetic for separating high and low
on a large negative number.

Fixes: 79ffece4447 ("tcg/arm: Implement direct branch for goto_tb")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1714
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.c.inc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/arm/tcg-target.c.inc b/tcg/arm/tcg-target.c.inc
index ffd23ef789..6a04c73c76 100644
--- a/tcg/arm/tcg-target.c.inc
+++ b/tcg/arm/tcg-target.c.inc
@@ -1771,9 +1771,9 @@ static void tcg_out_goto_tb(TCGContext *s, int which)
  * shifted immediate from pc.
  */
 int h = -i_disp;
-int l = h & 0xfff;
+int l = -(h & 0xfff);
 
-h = encode_imm_nofail(h - l);
+h = encode_imm_nofail(h + l);
 tcg_out_dat_imm(s, COND_AL, ARITH_SUB, TCG_REG_R0, TCG_REG_PC, h);
 tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_R0, l);
 }
-- 
2.34.1

Re: [PULL 00/12] Hppa64 patches

2024-02-12 Thread Peter Maydell

On Sun, 11 Feb 2024 at 12:30,  wrote:
>
> From: Helge Deller 
>
> The following changes since commit 39a6e4f87e7b75a45b08d6dc8b8b7c2954c87440:
>
>   Merge tag 'pull-qapi-2024-02-03' of https://repo.or.cz/qemu/armbru into 
> staging (2024-02-03 13:31:58 +)
>
> are available in the Git repository at:
>
>   https://github.com/hdeller/qemu-hppa.git tags/hppa64-pull-request
>
> for you to fetch changes up to f9d2270c85872bd71a01e15b2ebda2569f17f811:
>
>   hw/hppa/machine: Load 64-bit firmware on 64-bit machines (2024-02-11 
> 13:25:15 +0100)
>
> 
> target/hppa: Enhancements and fixes
>
> A new SeaBIOS-hppa firmware which is built as 32- and 64-bit firmware.
> Necessary to fully support 64-bit operating systems (HP-UX, Linux, 
> NetBSD,...).
>
> 

This fails "make check", eg:
https://gitlab.com/qemu-project/qemu/-/jobs/6154451100

because when the qom-test etc tests run qemu-system-hppa, it
barfs with "qemu-system-hppa: no firmware provided".

That kind of firmware check needs to not fire when
using the qtest accel.

thanks
-- PMM

Re: RFC i386/sev: kernel-hashes, reference measurements and event logs

2024-02-12 Thread James Bottomley

On Mon, 2024-02-12 at 12:16 -0800, Dionna Amalie Glaze wrote:
> This is not a patch but it felt inappropriate to derail a recent
> patch that's just refactoring the kernel-hashes object_class_property
> definition. Apologies if this has been discussed before, as I'm not
> particularly active here.

I haven't seen that patch, but I presume it's not relevant?

> Regarding kernel-hashes, how is that load-time information passed
> along to the guest beyond, say, OVMF? Can we require that these
> hashes
> are also present in fw_cfg so they can be read from the kernel? In
> Linux it'd be nice to have /sys/firmware/qemu_fw_cfg/sev_kernel_hash,
> /sys/firmware/qemu_fw_cfg/sev_cmdline_hash,
> /sys/firmware/qemu_fw_cfg/sev_initrd_hash

Are you referring to measured direct boot?  In that case, there's no
point having hashes in the fw_cfg, because OVMF in the guest needs to
hash the kernel itself and then compare to a trusted source (which the
fw_cfg file wouldn't be because it's under the control of the
hypervisor).  For SEV, the trusted source is a table in the launch
measured ROM, but I'm sure TDX does something similar.

> I'm working on how to use standard document formats for providing
> reference measurements of the Google Compute Engine virtual firmware
> for remote attestation, and these hashes have an impact not just on
> the measurement but on the entire model that the IETF RATS working
> group is considering for authorizing attestation measurements.
> 
> If you're assembling a VM launch configuration with firmware provided
> by a trusted vendor (say Google), and your hashes are passed in from
> an API, there's no easy rendezvous to state that the combination
> produces the expected hardware measurement. This makes adding
> kernel-hashes support unappetizing, since it makes the hardware
> attestation report's measurement have no meaning, or at least, it
> makes life difficult for people trying to assign it meaning.

Well, no, since firmware tends to update on a longer timeframe than the
kernel and the cmdline and initrd tend to have quicker update cycles
than the kernel.  Thus there's no one golden reference.  Instead you
give a tool (say the virtee sev_snp_measure tool

https://github.com/virtee/sev-snp-measure

) the hashes of the firmware, kernel, command line and initrd and it
caclulates the expected launch measure

> The measurement is the product of two different entities as assembled
> by the VMM given a trusted firmware and the kernel hashes. It's a bit
> of a sandwich of (GCE) core firmware, (User) SEV hashes, (GCE) BSP
> VMSA, AP VMSA*.
> 
> When you collect "evidence" to verify locally or pass along to a
> verification service, you need more than just the hardware
> attestation report to make sense of the combined bits. You have a PCR
> situation like with TPM, so you need an event log for these different
> aspects of the ultimate measurement. There is no event log for this
> -kernel-hashes construction.

That's because it's a pre-launch measure.  The TCG log is only for post
launch.  The idea being those values are needed for you to approve
something in pre-launch, like key release, before the TPM starts
running.

That's not to say we shouldn't have log entries for pre-launch, but
that's a generic problem and not specific to measured direct boot.

> We can use the TCG TPM event log to post EV_NO_ACTION events about
> the PlatformRIM, specifically, to point at a UEFI variable that we
> populate to store our signed document about the expected measurements
> with the Qemu-SEV-SNP-boot-protocol, but I don't see how we might
> collect the kernel-hashes values as extra evidence to combine and
> derive the attestation report's MEASUREMENT field to accept
> "evidence" objects for the core firmware component and the kernel
> hashes component.

This sounds like a first measurement thing.  In many ways, the pre-
launch measurement is equivalent to the SRTM of a physical system which
is collected in the EV_S_CRTM_* events.  But for that to happen, I
think the TCG would have to bless it in some form.

James

> So my question is if this feature is to be a long term feature, how
> do you expect to collect the SEV hashes as a separate evidence object
> to play nicely with IETF RATS?
> 
> Is this a long term feature, or are we expecting it to be deprecated
> by SVSM?
> 
> I've tagged in people in CC that I could imagine would have something
> to say about this.
> 
> Thanks y'all
> 
> --
> -Dionna Glaze, PhD (she/her)

[PATCH v2 2/2] tests/tcg: Add SIGRTMIN/SIGRTMAX test

2024-02-12 Thread Ilya Leoshkevich

Test the lowest and the highest real-time signals.

Signed-off-by: Ilya Leoshkevich 
---
 tests/tcg/multiarch/linux/linux-sigrtminmax.c | 41 +++
 1 file changed, 41 insertions(+)
 create mode 100644 tests/tcg/multiarch/linux/linux-sigrtminmax.c

diff --git a/tests/tcg/multiarch/linux/linux-sigrtminmax.c 
b/tests/tcg/multiarch/linux/linux-sigrtminmax.c
new file mode 100644
index 000..773383a3fef
--- /dev/null
+++ b/tests/tcg/multiarch/linux/linux-sigrtminmax.c
@@ -0,0 +1,41 @@
+/*
+ * Test the lowest and the highest real-time signals.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static bool seen_sigrtmin, seen_sigrtmax;
+
+static void handle_signal(int sig)
+{
+if (sig == SIGRTMIN) {
+seen_sigrtmin = true;
+} else if (sig == SIGRTMAX) {
+seen_sigrtmax = true;
+} else {
+_exit(1);
+}
+}
+
+int main(void)
+{
+struct sigaction act;
+
+memset(, 0, sizeof(act));
+act.sa_handler = handle_signal;
+assert(sigaction(SIGRTMIN, , NULL) == 0);
+assert(sigaction(SIGRTMAX, , NULL) == 0);
+
+assert(kill(getpid(), SIGRTMIN) == 0);
+assert(seen_sigrtmin);
+assert(kill(getpid(), SIGRTMAX) == 0);
+assert(seen_sigrtmax);
+
+return EXIT_SUCCESS;
+}
-- 
2.43.0

[PATCH v2 1/2] linux-user: Map low priority rt signals

2024-02-12 Thread Ilya Leoshkevich

Some applications want to use low priority realtime signals (e.g.,
SIGRTMAX). Currently QEMU cannot map all target realtime signals to
host signals, and chooses to sacrifice the end of the target realtime
signal range.

Change this to the middle of that range, hoping that fewer applications
will need it.

Signed-off-by: Ilya Leoshkevich 
---
 linux-user/signal.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/linux-user/signal.c b/linux-user/signal.c
index d3e62ab030f..a81533b563a 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -511,13 +511,14 @@ static int core_dump_signal(int sig)
 
 static void signal_table_init(void)
 {
-int hsig, tsig, count;
+int hsig, hsig_count, tsig, tsig_count, tsig_hole, tsig_hole_size, count;
 
 /*
- * Signals are supported starting from TARGET_SIGRTMIN and going up
- * until we run out of host realtime signals.  Glibc uses the lower 2
- * RT signals and (hopefully) nobody uses the upper ones.
- * This is why SIGRTMIN (34) is generally greater than __SIGRTMIN (32).
+ * Signals are supported starting from TARGET_SIGRTMIN and up to
+ * TARGET_SIGRTMAX, potentially with a hole in the middle of this
+ * range, which, hopefully, nobody uses. Glibc uses the lower 2 RT
+ * signals; this is why SIGRTMIN (34) is generally greater than
+ * __SIGRTMIN (32).
  * To fix this properly we would need to do manual signal delivery
  * multiplexed over a single host signal.
  * Attempts for configure "missing" signals via sigaction will be
@@ -536,9 +537,16 @@ static void signal_table_init(void)
 host_to_target_signal_table[SIGABRT] = 0;
 host_to_target_signal_table[hsig++] = TARGET_SIGABRT;
 
+hsig_count = SIGRTMAX - hsig + 1;
+tsig_count = TARGET_NSIG - TARGET_SIGRTMIN + 1;
+tsig_hole_size = tsig_count - MIN(hsig_count, tsig_count);
+tsig_hole = TARGET_SIGRTMIN + (tsig_count - tsig_hole_size) / 2;
 for (tsig = TARGET_SIGRTMIN;
  hsig <= SIGRTMAX && tsig <= TARGET_NSIG;
  hsig++, tsig++) {
+if (tsig == tsig_hole) {
+tsig += tsig_hole_size;
+}
 host_to_target_signal_table[hsig] = tsig;
 }
 
-- 
2.43.0

[PATCH v2 0/2] linux-user: Map low priority rt signals

2024-02-12 Thread Ilya Leoshkevich

Hi,

There are apps out there that want to use SIGRTMAX, which linux-user
currently does not map to a host signal. The reason is that with the
current approach it's not possible to map all target signals, so it
was decided to sacrifice the end of the range.

In this series I propose to sacrifice the middle instead. Patch 1 makes
the change, patch 2 is a test.

Best regards,
Ilya

Ilya Leoshkevich (2):
  linux-user: Map low priority rt signals
  tests/tcg: Add SIGRTMIN/SIGRTMAX test

 linux-user/signal.c   | 18 +---
 tests/tcg/multiarch/linux/linux-sigrtminmax.c | 41 +++
 2 files changed, 54 insertions(+), 5 deletions(-)
 create mode 100644 tests/tcg/multiarch/linux/linux-sigrtminmax.c

-- 
2.43.0

Re: Re: [PULL 05/13] linux-user: Use walk_memory_regions for open_self_maps

2024-02-12 Thread Ilya Leoshkevich

On Mon, Feb 05, 2024 at 11:11:06AM +, Richard Purdie wrote:
> On Mon, 2024-02-05 at 13:05 +1000, Richard Henderson wrote:
> > On 1/26/24 23:52, Richard Purdie wrote:
> > > On Fri, 2024-01-26 at 16:33 +0300, Michael Tokarev wrote:
> > > > 26.01.2024 16:03, Richard Purdie wrote:
> > > > > I've run into a problem with this change.
> > > > > 
> > > > > We (Yocto Project) upgraded to qemu 8.2.0 recently and after that we
> > > > > started seeing errors cross compiling webkitgtk on x86_64 for x86_64
> > > > > during the introspection code which runs under user mode qemu.
> > > > 
> > > > Besides your observations, please be aware there's quite a few issues 
> > > > in 8.2.0.
> > > > Please take a look at 
> > > > https://gitlab.com/mjt0k/qemu/-/commits/staging-8.2/
> > > > (and https://gitlab.com/qemu-project/qemu/-/commits/staging-8.2/ which 
> > > > is updated
> > > > less often) for fixes already queued up, if you haven't looked there 
> > > > already.
> > > > 8.2.1 stable/bugfix release is scheduled for the beginning of the next 
> > > > week.
> > > 
> > > Thanks.
> > > 
> > > I should note that I did test the staging-8.2 branch and nothing there
> > > helped. The issue was also present with master as of yesterday.
> > > 
> > > https://bugzilla.yoctoproject.org/show_bug.cgi?id=15367 is Yocto
> > > Projects tracking of the issue which has the commits for master and
> > > staging-8.2 that I tested.
> > 
> > The yocto logs referenced here are not helpful for reproducing the problem.
> 
> It took me a couple of days I didn't have to workout which commit
> caused it, which versions showed the issue and how to work around it.
> 
> It looks host kernel specific since it doesn't happen on some systems
> so even with the binaries/command/environment vars, it may not be
> enough.
> 
> I was hoping the indication of the cause might help point to the fix as
> there is quite a bit of work in trying to extract this into a
> reproducer. The failure is 20 mins into a webkitgtk compile on a remote
> CI system which no longer has the context on it.
> 
> > Please extract a binary to run, inputs, and command-line.
> 
> I wish I could say that to the bug reports I get! :)
> 
> I'll do my best but finding the time is going to be a challenge.
> 
> Cheers,
> 
> Richard

I just ran into a similar crash and could reproduce it with
5005aed8a7e7 alpha-linux-user as follows:

#include 
#include 

int main(void)
{
shmat(shmget(IPC_PRIVATE, 1836016, IPC_CREAT | 0600), (void 
*)0x2804000, 0);
open("/proc/self/maps", O_RDONLY);
}

Apparently an mmap() is missing for shmat() when g>h and shmaddr is
specified. The mismatch between the host's and the guest's view of the
mapping's tail appears to be causing the SEGV.

RFC i386/sev: kernel-hashes, reference measurements and event logs

2024-02-12 Thread Dionna Amalie Glaze

This is not a patch but it felt inappropriate to derail a recent patch
that's just refactoring the kernel-hashes object_class_property
definition. Apologies if this has been discussed before, as I'm not
particularly active here.

Regarding kernel-hashes, how is that load-time information passed
along to the guest beyond, say, OVMF? Can we require that these hashes
are also present in fw_cfg so they can be read from the kernel? In
Linux it'd be nice to have /sys/firmware/qemu_fw_cfg/sev_kernel_hash,
/sys/firmware/qemu_fw_cfg/sev_cmdline_hash,
/sys/firmware/qemu_fw_cfg/sev_initrd_hash

I'm working on how to use standard document formats for providing
reference measurements of the Google Compute Engine virtual firmware
for remote attestation, and these hashes have an impact not just on
the measurement but on the entire model that the IETF RATS working
group is considering for authorizing attestation measurements.

If you're assembling a VM launch configuration with firmware provided
by a trusted vendor (say Google), and your hashes are passed in from
an API, there's no easy rendezvous to state that the combination
produces the expected hardware measurement. This makes adding
kernel-hashes support unappetizing, since it makes the hardware
attestation report's measurement have no meaning, or at least, it
makes life difficult for people trying to assign it meaning.

The measurement is the product of two different entities as assembled
by the VMM given a trusted firmware and the kernel hashes. It's a bit
of a sandwich of (GCE) core firmware, (User) SEV hashes, (GCE) BSP
VMSA, AP VMSA*.

When you collect "evidence" to verify locally or pass along to a
verification service, you need more than just the hardware attestation
report to make sense of the combined bits. You have a PCR situation
like with TPM, so you need an event log for these different aspects of
the ultimate measurement. There is no event log for this
-kernel-hashes construction.

We can use the TCG TPM event log to post EV_NO_ACTION events about the
PlatformRIM, specifically, to point at a UEFI variable that we
populate to store our signed document about the expected measurements
with the Qemu-SEV-SNP-boot-protocol, but I don't see how we might
collect the kernel-hashes values as extra evidence to combine and
derive the attestation report's MEASUREMENT field to accept "evidence"
objects for the core firmware component and the kernel hashes
component.

So my question is if this feature is to be a long term feature, how do
you expect to collect the SEV hashes as a separate evidence object to
play nicely with IETF RATS?

Is this a long term feature, or are we expecting it to be deprecated by SVSM?

I've tagged in people in CC that I could imagine would have something
to say about this.

Thanks y'all

--
-Dionna Glaze, PhD (she/her)

Re: [PATCH v4 3/8] Add an internal PLL Clock object

2024-02-12 Thread Arnaud Minier

Hello Alistair,

Yes, I think we should bail out if pll_set_vco_multiplier receives an invalid 
value to respect the hardware defined bounds.
I actually intended to add a return there but I missed it. It will be added in 
the next version.

Thanks,
Arnaud Minier


- Mail original -
De: "Alistair Francis" 
À: "Arnaud Minier" 
Cc: "qemu-devel" , "Laurent Vivier" 
, "Alistair Francis" , "Inès 
Varhol" , "Peter Maydell" 
, "Paolo Bonzini" , "Thomas 
Huth" , qemu-...@nongnu.org
Envoyé: Jeudi 1 Février 2024 01:18:22
Objet: Re: [PATCH v4 3/8] Add an internal PLL Clock object

On Wed, Jan 31, 2024 at 2:09 AM Arnaud Minier
 wrote:
>
> This object represents the PLLs and their channels. The PLLs allow for a
> more fine-grained control of the clocks frequency.
>
> Wasn't sure about how to handle the reset and the migration so used the
> same appproach as the BCM2835 CPRMAN.
>
> Signed-off-by: Arnaud Minier 
> Signed-off-by: Inès Varhol 
> ---
>  hw/misc/stm32l4x5_rcc.c   | 175 ++
>  hw/misc/trace-events  |   5 +
>  include/hw/misc/stm32l4x5_rcc.h   |  40 +
>  include/hw/misc/stm32l4x5_rcc_internals.h |  22 +++
>  4 files changed, 242 insertions(+)
>
> diff --git a/hw/misc/stm32l4x5_rcc.c b/hw/misc/stm32l4x5_rcc.c
> index ed10832f88..fb0233c3e9 100644
> --- a/hw/misc/stm32l4x5_rcc.c
> +++ b/hw/misc/stm32l4x5_rcc.c
> @@ -162,6 +162,156 @@ static void clock_mux_set_source(RccClockMuxState *mux, 
> RccClockMuxSource src)
>  clock_mux_update(mux);
>  }
>
> +static void pll_update(RccPllState *pll)
> +{
> +uint64_t vco_freq, old_channel_freq, channel_freq;
> +int i;
> +
> +/* The common PLLM factor is handled by the PLL mux */
> +vco_freq = muldiv64(clock_get_hz(pll->in), pll->vco_multiplier, 1);
> +
> +for (i = 0; i < RCC_NUM_CHANNEL_PLL_OUT; i++) {
> +if (!pll->channel_exists[i]) {
> +continue;
> +}
> +
> +old_channel_freq = clock_get_hz(pll->channels[i]);
> +if (!pll->enabled ||
> +!pll->channel_enabled[i] ||
> +!pll->channel_divider[i]) {
> +channel_freq = 0;
> +} else {
> +channel_freq = muldiv64(vco_freq,
> +1,
> +pll->channel_divider[i]);
> +}
> +
> +/* No change, early continue to avoid log spam and useless 
> propagation */
> +if (old_channel_freq == channel_freq) {
> +continue;
> +}
> +
> +clock_update_hz(pll->channels[i], channel_freq);
> +trace_stm32l4x5_rcc_pll_update(pll->id, i, vco_freq,
> +old_channel_freq, channel_freq);
> +}
> +}
> +
> +static void pll_src_update(void *opaque, ClockEvent event)
> +{
> +RccPllState *s = opaque;
> +pll_update(s);
> +}
> +
> +static void pll_init(Object *obj)
> +{
> +RccPllState *s = RCC_PLL(obj);
> +size_t i;
> +
> +s->in = qdev_init_clock_in(DEVICE(s), "in",
> +   pll_src_update, s, ClockUpdate);
> +
> +const char *names[] = {
> +"out-p", "out-q", "out-r",
> +};
> +
> +for (i = 0; i < RCC_NUM_CHANNEL_PLL_OUT; i++) {
> +s->channels[i] = qdev_init_clock_out(DEVICE(s), names[i]);
> +}
> +}
> +
> +static void pll_reset_hold(Object *obj)
> +{ }
> +
> +static const VMStateDescription pll_vmstate = {
> +.name = TYPE_RCC_PLL,
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT32(id, RccPllState),
> +VMSTATE_CLOCK(in, RccPllState),
> +VMSTATE_ARRAY_CLOCK(channels, RccPllState,
> +RCC_NUM_CHANNEL_PLL_OUT),
> +VMSTATE_BOOL(enabled, RccPllState),
> +VMSTATE_UINT32(vco_multiplier, RccPllState),
> +VMSTATE_BOOL_ARRAY(channel_enabled, RccPllState, 
> RCC_NUM_CHANNEL_PLL_OUT),
> +VMSTATE_BOOL_ARRAY(channel_exists, RccPllState, 
> RCC_NUM_CHANNEL_PLL_OUT),
> +VMSTATE_UINT32_ARRAY(channel_divider, RccPllState, 
> RCC_NUM_CHANNEL_PLL_OUT),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
> +static void pll_class_init(ObjectClass *klass, void *data)
> +{
> +DeviceClass *dc = DEVICE_CLASS(klass);
> +ResettableClass *rc = RESETTABLE_CLASS(klass);
> +
> +rc->phases.hold = pll_reset_hold;
> +dc->vmsd = _vmstate;
> +}
> +
> +static void pll_set_vco_multiplier(RccPllState *pll, uint32_t vco_multiplier)
> +{
> +if (pll->vco_multiplier == vco_multiplier) {
> +return;
> +}
> +
> +if (vco_multiplier < 8 || vco_multiplier > 86) {
> +qemu_log_mask(LOG_GUEST_ERROR,
> +"%s: VCO multiplier is out of bound (%u) for PLL %u\n",
> +__func__, vco_multiplier, pll->id);

Should we bail out with an invalid value?

Alistair

> +}
> +
> +trace_stm32l4x5_rcc_pll_set_vco_multiplier(pll->id,
> +pll->vco_multiplier, vco_multiplier);
> +
> +pll->vco_multiplier =

Re: [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages

2024-02-12 Thread Joao Martins

On 12/02/2024 17:17, Markus Armbruster wrote:
> Joao Martins  writes:
> 
>> Allow disabling hugepages to be dirty track at base page
>> granularity in similar vein to vfio_type1_iommu.disable_hugepages
>> but per IOAS.
>>
>> Signed-off-by: Joao Martins 
> 
> [...]
> 
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 84af23fe245d..9ad27e2b939b 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -809,7 +809,7 @@
>>  # Since: 9.0
>>  ##
>>  { 'struct': 'IOMMUFDProperties',
>> -  'data': { '*fd': 'str' } }
>> +  'data': { '*fd': 'str', '*hugepages': 'bool' } }
>>  
>>  ##
>>  # @RngProperties:
> 
> Missing documentation for the new member.
> 
> The latest QAPI PR is making this a hard error.
> 

Gah, sorry. I think I didn't have that PR yet as I didn't hit any build errors.
Missing the doc was pure distraction.

Will fix it for the next version.

Joao

Re: [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages

2024-02-12 Thread Markus Armbruster

Joao Martins  writes:

> Allow disabling hugepages to be dirty track at base page
> granularity in similar vein to vfio_type1_iommu.disable_hugepages
> but per IOAS.
>
> Signed-off-by: Joao Martins 

[...]

> diff --git a/qapi/qom.json b/qapi/qom.json
> index 84af23fe245d..9ad27e2b939b 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -809,7 +809,7 @@
>  # Since: 9.0
>  ##
>  { 'struct': 'IOMMUFDProperties',
> -  'data': { '*fd': 'str' } }
> +  'data': { '*fd': 'str', '*hugepages': 'bool' } }
>  
>  ##
>  # @RngProperties:

Missing documentation for the new member.

The latest QAPI PR is making this a hard error.

Re: [PULL 00/17] Misc fixes patches

2024-02-12 Thread Peter Maydell

On Fri, 9 Feb 2024 at 14:07, Daniel P. Berrangé  wrote:
>
> The following changes since commit 9e34f127f419b3941b36dfdfac79640dc81e97e2:
>
>   Merge tag 'pull-request-2024-02-06' of https://gitlab.com/thuth/qemu into 
> staging (2024-02-08 11:59:28 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/berrange/qemu tags/misc-fixes-pull-request
>
> for you to fetch changes up to d87b258b75498d3e8563ec8ebaaf67efc27be945:
>
>   tests: Add case for LUKS volume with detached header (2024-02-09 12:50:38 
> +)
>
> 
>  - LUKS support for detached headers
>  - Update x86 CPU model docs and script
>  - Add missing close of chardev QIOChannel
>  - More trace events o nTKS handshake
>  - Drop unsafe VNC constants
>  - Increase NOFILE limit during startup
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

Re: [PULL 00/10] testing, doc and gdbstub updates

2024-02-12 Thread Peter Maydell

On Fri, 9 Feb 2024 at 19:49, Alex Bennée  wrote:
>
> The following changes since commit 5d1fc614413b10dd94858b07a1b2e26b1aa0296c:
>
>   Merge tag 'migration-staging-pull-request' of 
> https://gitlab.com/peterx/qemu into staging (2024-02-09 11:22:20 +)
>
> are available in the Git repository at:
>
>   https://gitlab.com/stsquad/qemu.git tags/pull-maintainer-updates-090224-1
>
> for you to fetch changes up to 86b75667e04b49a0b75f061f589b3fbec3fb78f1:
>
>   tests/tcg: Add the syscall catchpoint gdbstub test (2024-02-09 17:52:40 
> +)
>
> 
> testing, doc and gdbstub updates:
>
>   - add sqlite3 to openSUSE image
>   - mark CRIS as deprecated
>   - re-enable the TCG plugin tests
>   - use select for semihosting
>   - implement syscall catching in gdbstub
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

Re: [PULL 06/15] tests/qtest/migration: Don't use -cpu max for aarch64

2024-02-12 Thread Peter Maydell

On Mon, 5 Feb 2024 at 02:56, Peter Xu  wrote:
> Thanks, but then this is pretty sad.  I'm surprised aarch64 doesn't have
> such requirement to allow some VM config to run across all kinds of hosts.

It just hasn't been anything that anybody so far has wanted
enough to put the necessary kernel-side work into. (There are
also some tricky issues surrounding errata workarounds that
the guest needs to do: you need to have some way of telling the
guest "the vCPU looks like it's type X but you need to do
errata workarounds ABC for CPU type Y, not the ones for X".)

> > The difference is just that we provide different cross-version migration
> > compatibility support levels for the two cases. (Strictly speaking, I'm
> > not sure we strongly support migration compat for 'max' on KVM either --
> > for instance you probably need to be doing a migration on the same host
> > CPU type and the same host kernel version. It's just that the definition
> > of "max" on KVM is less QEMU-dependent and more host-kernel-dependent, so
> > in your particular situation running the test cases you're less likely to
> > see any possible breakage.)
>
> Yes we don't have issue for the current CI on KVM compatibilities, but QEMU
> does matter for sure.
>
> Then we can either (1) add code as Fabiano suggested to choose different
> cpu model by adding hack code in qtest, or (2) we simply not support
> aarch64 on cross binary test like most of the rest of the arch, but only
> support x86, until any arch can provide a stable CPU that support all
> config of hosts (we can document it in the CI file).

That seems a bit pessimistic. How about "always only test with TCG" ?
That will defend the migration compat on all the device models etc,
which is the bit we're most likely to break by accident.

thanks
-- PMM

Re: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation

2024-02-12 Thread Joao Martins

On 12/02/2024 16:27, Jason Gunthorpe wrote:
> On Mon, Feb 12, 2024 at 01:56:37PM +, Joao Martins wrote:
>> There's generally two modes of operation for IOMMUFD:
>>
>> * The simple user API which intends to perform relatively simple things
>> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
>> and mainly performs IOAS_MAP and UNMAP.
>>
>> * The native IOMMUFD API where you have fine grained control of the
>> IOMMU domain and model it accordingly. This is where most new feature
>> are being steered to.
>>
>> For dirty tracking 2) is required, as it needs to ensure that
>> the stage-2/parent IOMMU domain will only attach devices
>> that support dirty tracking (so far it is all homogeneous in x86, likely
>> not the case for smmuv3). Such invariant on dirty tracking provides a
>> useful guarantee to VMMs that will refuse incompatible device
>> attachments for IOMMU domains.
>>
>> For dirty tracking such property is enabled/enforced via HWPT_ALLOC,
>> which is responsible for creating an IOMMU domain. This is contrast to
>> the 'simple API' where the IOMMU domain is created by IOMMUFD
>> automatically when it attaches to VFIO (usually referred as autodomains)
>>
>> To support dirty tracking with the advanced IOMMUFD API, it needs
>> similar logic, where IOMMU domains are created and devices attached to
>> compatible domains. Essentially mimmicing kernel
>> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
>> to IOAS attach.
>>
>> Signed-off-by: Joao Martins 
>> ---
>> Right now the only alternative to a userspace autodomains implementation
>> is to mimmicing all the flags being added to HWPT_ALLOC but into VFIO
>> IOAS attach.
> 
> It was my expectation that VMM userspace would implement direct hwpt
> support. This is needed in all kinds of cases going forward because
> hwpt allocation is not uniform across iommu instances and managing
> this in the kernel is only feasible for simpler cases. Dirty tracking
> would be one of them.
> 

/me nods

>> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>> +   uint32_t pt_id, uint32_t flags,
>> +   uint32_t data_type, uint32_t data_len,
>> +   void *data_ptr, uint32_t *out_hwpt)
>> +{
>> +int ret;
>> +struct iommu_hwpt_alloc alloc_hwpt = {
>> +.size = sizeof(struct iommu_hwpt_alloc),
>> +.flags = flags,
>> +.dev_id = dev_id,
>> +.pt_id = pt_id,
>> +.data_type = data_type,
>> +.data_len = data_len,
>> +.data_uptr = (uint64_t)data_ptr,
>> +.__reserved = 0,
> 
> Unless the coding style requirs this it is not necessary to zero in
> the __reserved within a bracketed in initializer..
> 

Ah yes; and no other helper is doing this, so definitely doesn't look code
style. I'll remove it.

>> +static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> +VFIOIOMMUFDContainer *container,
>> +Error **errp)
>> +{
>> +int iommufd = vbasedev->iommufd_dev.iommufd->fd;
>> +VFIOIOASHwpt *hwpt;
>> +Error *err = NULL;
>> +int ret = -EINVAL;
>> +uint32_t hwpt_id;
>> +
>> +/* Try to find a domain */
>> +QLIST_FOREACH(hwpt, >hwpt_list, next) {
>> +ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, );
>> +if (ret) {
>> +/* -EINVAL means the domain is incompatible with the device. */
>> +if (ret == -EINVAL) {
> 
> Please send a kernel kdoc patch to document this..
> 

Ack

> The approach looks good to me
> 
> The nesting patches surely need this too?

>From what I understand, yes, but they seem to be able to hid this inside
intel-iommu for the parent hwpt allocation somehow. I'll let them comment;

Re: [PATCH 08/14] vfio: Use new Error** argument in vfio_save_setup()

2024-02-12 Thread Cédric Le Goater


On 2/12/24 10:17, Avihai Horon wrote:

Hi Cedric,

On 07/02/2024 15:33, Cédric Le Goater wrote:

External email: Use caution opening links or attachments


Add an Error** argument to vfio_migration_set_state() and adjust
callers, including vfio_save_setup(). The error will be propagated up
to qemu_savevm_state_setup() where the save_setup() handler is
executed.

Signed-off-by: Cédric Le Goater 
---
  hw/vfio/migration.c | 62 +
  1 file changed, 40 insertions(+), 22 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 
2dfbe671f6f45aa530c7341177bb532d8292cecd..2e0a79967cc97f44d9be5575c3cfe18c9f349dab
 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -84,7 +84,8 @@ static const char *mig_state_to_str(enum 
vfio_device_mig_state state)

  static int vfio_migration_set_state(VFIODevice *vbasedev,
  enum vfio_device_mig_state new_state,
-    enum vfio_device_mig_state recover_state)
+    enum vfio_device_mig_state recover_state,
+    Error **errp)
  {
  VFIOMigration *migration = vbasedev->migration;
  uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
@@ -104,15 +105,15 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
  ret = -errno;

  if (recover_state == VFIO_DEVICE_STATE_ERROR) {
-    error_report("%s: Failed setting device state to %s, err: %s. "
- "Recover state is ERROR. Resetting device",
- vbasedev->name, mig_state_to_str(new_state),
- strerror(errno));
+    error_setg(errp, "%s: Failed setting device state to %s, err: %s. "
+   "Recover state is ERROR. Resetting device",
+   vbasedev->name, mig_state_to_str(new_state),
+   strerror(errno));

  goto reset_device;
  }

-    error_report(
+    error_setg(errp,
  "%s: Failed setting device state to %s, err: %s. Setting device in 
recover state %s",
   vbasedev->name, mig_state_to_str(new_state),
   strerror(errno), mig_state_to_str(recover_state));
@@ -120,7 +121,7 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
  mig_state->device_state = recover_state;
  if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
  ret = -errno;
-    error_report(
+    error_setg(errp,
  "%s: Failed setting device in recover state, err: %s. Resetting 
device",
   vbasedev->name, strerror(errno));


I think here we will assert because errp is already set.

Adding an error_append() API would be useful here I guess.


yes.


Otherwise, we need to move the first error_setg() below, to before we return 
from a successful recover state change, and construct the error message 
differently (e.g., provide a full error message for the recover state fail case 
containing also the first error).

Do you have other ideas?


Errors for :

if (ioctl(vbasedev->fd, VFIO_DEVICE_RESET)) {

should be treated as the others with and error_append() and not
hw_error(). This needs a rework before any new changes.

I also wonder why we have twice :

migration->device_state = recover_state;

It looks redundant. The ioctl VFIO_DEVICE_FEATURE should leave the
state unmodified.

Thanks,

C.

[RFC PATCH] include/qom/object.h: New OBJECT_DEFINE_SIMPLE_TYPE{, _WITH_INTEFACES} macros

2024-02-12 Thread Peter Maydell

We have an OBJECT_DEFINE_TYPE_EXTENDED macro, plus several variations
on it, which emits the boilerplate for the TypeInfo and ensures it is
registered with the type system.  However, all the existing macros
insist that the type being defined has its own FooClass struct, so
they aren't useful for the common case of a simple leaf class which
doesn't have any new methods or any other need for its own class
struct (that is, for the kind of type that OBJECT_DECLARE_SIMPLE_TYPE
declares).

Pull the actual implementation of OBJECT_DEFINE_TYPE_EXTENDED out
into a new DO_OBJECT_DEFINE_TYPE_EXTENDED which parameterizes the
value we use for the class_size field.  This lets us add a new
OBJECT_DEFINE_SIMPLE_TYPE which does the same job as the various
existing OBJECT_DEFINE_*_TYPE_* family macros for this kind of simple
type, and the variant OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES for
when the type will implement some interfaces.

Signed-off-by: Peter Maydell 
---
I've marked this RFC largely because this patch doesn't include
any uses of the new macros. I wanted them for a series I'm currently
working on, but I wanted to send this out early so it could have an
extended review period and a bit more visibility than if I stuck
it inside an 8 patch series about some other topic. (In fact, this
is the second time I looked at the OBJECT_DEFINE_* macros and found
they weren't suitable for the common case kind of type. The first
time around I went back to writing the type out the old fashioned
way, but this time I figured I'd try improving the macros.)

 docs/devel/qom.rst   |  34 +++--
 include/qom/object.h | 114 +--
 2 files changed, 117 insertions(+), 31 deletions(-)

diff --git a/docs/devel/qom.rst b/docs/devel/qom.rst
index 9918fac7f21..0889ca949c1 100644
--- a/docs/devel/qom.rst
+++ b/docs/devel/qom.rst
@@ -348,12 +348,14 @@ used. This does the same as OBJECT_DECLARE_SIMPLE_TYPE(), 
but without
 the 'struct MyDeviceClass' definition.
 
 To implement the type, the OBJECT_DEFINE macro family is available.
-In the simple case the OBJECT_DEFINE_TYPE macro is suitable:
+For the simplest case of a leaf class which doesn't need any of its
+own virtual functions (i.e. which was declared with OBJECT_DECLARE_SIMPLE_TYPE)
+the OBJECT_DEFINE_SIMPLE_TYPE macro is suitable:
 
 .. code-block:: c
:caption: Defining a simple type
 
-   OBJECT_DEFINE_TYPE(MyDevice, my_device, MY_DEVICE, DEVICE)
+   OBJECT_DEFINE_SIMPLE_TYPE(MyDevice, my_device, MY_DEVICE, DEVICE)
 
 This is equivalent to the following:
 
@@ -370,7 +372,6 @@ This is equivalent to the following:
.instance_size = sizeof(MyDevice),
.instance_init = my_device_init,
.instance_finalize = my_device_finalize,
-   .class_size = sizeof(MyDeviceClass),
.class_init = my_device_class_init,
};
 
@@ -385,13 +386,36 @@ This is sufficient to get the type registered with the 
type
 system, and the three standard methods now need to be implemented
 along with any other logic required for the type.
 
+If the class needs its own virtual methods, or has some other
+per-class state it needs to store in its own class struct,
+then you can use the OBJECT_DEFINE_TYPE macro. This does the
+same thing as OBJECT_DEFINE_SIMPLE_TYPE, but it also sets the
+class_size of the type to the size of the class struct.
+
+.. code-block:: c
+   :caption: Defining a type which needs a class struct
+
+   OBJECT_DEFINE_TYPE(MyDevice, my_device, MY_DEVICE, DEVICE)
+
 If the type needs to implement one or more interfaces, then the
-OBJECT_DEFINE_TYPE_WITH_INTERFACES() macro can be used instead.
-This accepts an array of interface type names.
+OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES() and
+OBJECT_DEFINE_TYPE_WITH_INTERFACES() macros can be used instead.
+These accept an array of interface type names. The difference between
+them is that the former is for simple leaf classes that don't need
+a class struct, and the latter is for when you will be defining
+a class struct.
 
 .. code-block:: c
:caption: Defining a simple type implementing interfaces
 
+   OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES(MyDevice, my_device,
+ MY_DEVICE, DEVICE,
+ { TYPE_USER_CREATABLE },
+ { NULL })
+
+.. code-block:: c
+   :caption: Defining a type implementing interfaces
+
OBJECT_DEFINE_TYPE_WITH_INTERFACES(MyDevice, my_device,
   MY_DEVICE, DEVICE,
   { TYPE_USER_CREATABLE },
diff --git a/include/qom/object.h b/include/qom/object.h
index afccd24ca7a..f52ab216cdd 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -258,6 +258,51 @@ struct Object
 DECLARE_INSTANCE_CHECKER(InstanceType, MODULE_OBJ_NAME, 
TYPE_##MODULE_OBJ_NAME)
 
 
+/**
+ * DO_OBJECT_DEFINE_TYPE_EXTENDED:
+ * @ModuleObjName: the object name with initial caps

Re: [RFC v2 0/5] ARM Nested Virt Support

2024-02-12 Thread Marc Zyngier


On 2024-02-09 18:57, Peter Maydell wrote:

On Fri, 9 Feb 2024 at 16:00, Eric Auger  wrote:


This series adds ARM Nested Virtualization support in KVM mode.
This is a respin of previous contributions from Miguel [1] and Haibo 
[2].


This was tested with Marc's v11 [3] on Ampere HW with fedora L1 guest 
and

L2 guests booted without EDK2. However it does not work yet with
EDK2 but it looks unrelated to this qemu integration (host hard 
lockups).


The host needs to be booted with "kvm-arm.mode=nested" option and
qemu needs to be invoked with :

-machine virt,virtualization=on

There is a known issue with hosts supporting SVE. Kernel does not 
support both

SVE and NV2 and the current qemu integration has an issue with the
scratch_host_vcpu startup because both are enabled if exposed by the 
kernel.
This is independent on whether sve is disabled on the command line. 
Unfortunately
I lost access to the HW that expose that issue so I couldn't fix it in 
this

version.


You can probably repro that by running the whole setup under
QEMU's FEAT_NV emulation, which will be able to give you a CPU
with both FEAT_NV and SVE.

Personally I think that this is a kernel missing-feature that
should really be fixed as part of getting the kernel patches
upstreamed. There's no cause to force every userspace VMM to
develop extra complications for this.


I don't plan to make NV visible to userspace before this is fixed.
Which may delay KVM NV by another year or five, but I don't think
anyone is really waiting for it anyway.

M.
--
Jazz is not dead. It just smells funny...

Re: [RFC v2 0/5] ARM Nested Virt Support

2024-02-12 Thread Eric Auger

Hi Peter,

On 2/9/24 19:57, Peter Maydell wrote:
> On Fri, 9 Feb 2024 at 16:00, Eric Auger  wrote:
>> This series adds ARM Nested Virtualization support in KVM mode.
>> This is a respin of previous contributions from Miguel [1] and Haibo [2].
>>
>> This was tested with Marc's v11 [3] on Ampere HW with fedora L1 guest and
>> L2 guests booted without EDK2. However it does not work yet with
>> EDK2 but it looks unrelated to this qemu integration (host hard lockups).
>>
>> The host needs to be booted with "kvm-arm.mode=nested" option and
>> qemu needs to be invoked with :
>>
>> -machine virt,virtualization=on
>>
>> There is a known issue with hosts supporting SVE. Kernel does not support 
>> both
>> SVE and NV2 and the current qemu integration has an issue with the
>> scratch_host_vcpu startup because both are enabled if exposed by the kernel.
>> This is independent on whether sve is disabled on the command line. 
>> Unfortunately
>> I lost access to the HW that expose that issue so I couldn't fix it in this
>> version.
> You can probably repro that by running the whole setup under
> QEMU's FEAT_NV emulation, which will be able to give you a CPU
> with both FEAT_NV and SVE.

indeed, this should work now we have FEAT_NV.
>
> Personally I think that this is a kernel missing-feature that
> should really be fixed as part of getting the kernel patches
> upstreamed. There's no cause to force every userspace VMM to
> develop extra complications for this.
yes maybe this will be fixed later on.

Thanks!

Eric

>
> thanks
> -- PMM
>

[PATCH 2/2] target/riscv/csr: Added the ability to delegate LCOFI to VS

2024-02-12 Thread Irina Ryapolova

From: Vadim Shakirov 

In the AIA specification in the paragraph "Virtual interrupts for VS level"
it is indicated for interrupts 13-63: if the bit in hideleg is enabled,
then the corresponding vsip and vsie bits are aliases to sip and sie

Signed-off-by: Vadim Shakirov 
---
 target/riscv/csr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0c21145eaf..51b1099e10 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1136,7 +1136,7 @@ static RISCVException write_stimecmph(CPURISCVState *env, 
int csrno,
 static const uint64_t delegable_ints =
 S_MODE_INTERRUPTS | VS_MODE_INTERRUPTS | MIP_LCOFIP;
 static const uint64_t vs_delegable_ints =
-(VS_MODE_INTERRUPTS | LOCAL_INTERRUPTS) & ~MIP_LCOFIP;
+VS_MODE_INTERRUPTS | LOCAL_INTERRUPTS;
 static const uint64_t all_ints = M_MODE_INTERRUPTS | S_MODE_INTERRUPTS |
  HS_MODE_INTERRUPTS | LOCAL_INTERRUPTS;
 #define DELEGABLE_EXCPS ((1ULL << (RISCV_EXCP_INST_ADDR_MIS)) | \
-- 
2.25.1

[PATCH 1/2] target/riscv/csr.c: Add functional of hvictl CSR

2024-02-12 Thread Irina Ryapolova

CSR hvictl (Hypervisor Virtual Interrupt Control) provides further flexibility
for injecting interrupts into VS level in situations not fully supported by the
facilities described thus far, but only with more active involvement of the 
hypervisor.
(See riscv-interrupts-1.0: Interrupts at VS level)

Signed-off-by: Irina Ryapolova 
---
 target/riscv/csr.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 674ea075a4..0c21145eaf 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3585,6 +3585,21 @@ static int read_hvictl(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_hvictl(CPURISCVState *env, int csrno, target_ulong val)
 {
 env->hvictl = val & HVICTL_VALID_MASK;
+if (env->hvictl & HVICTL_VTI)
+{
+uint32_t hviid = get_field(env->hvictl, HVICTL_IID);
+uint32_t hviprio = get_field(env->hvictl, HVICTL_IPRIO);
+/* the pair IID = 9, IPRIO = 0 generally to represent no interrupt in 
hvictl. */
+if (!(hviid == IRQ_S_EXT && hviprio == 0)) {
+uint64_t new_val = BIT(hviid) ;
+ if (new_val & S_MODE_INTERRUPTS) {
+rmw_hvip64(env, csrno, NULL, new_val << 1, new_val << 1);
+} else if (new_val & LOCAL_INTERRUPTS) {
+rmw_hvip64(env, csrno, NULL, new_val, new_val);
+}
+}
+}
+
 return RISCV_EXCP_NONE;
 }
 
-- 
2.25.1

[PATCH 2/2] target/riscv/csr: Added the ability to delegate LCOFI to VS

2024-02-12 Thread Irina Ryapolova

From: Vadim Shakirov 

In the AIA specification in the paragraph "Virtual interrupts for VS level"
it is indicated for interrupts 13-63: if the bit in hideleg is enabled,
then the corresponding vsip and vsie bits are aliases to sip and sie

Signed-off-by: Vadim Shakirov 
---
 target/riscv/csr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 0c21145eaf..51b1099e10 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1136,7 +1136,7 @@ static RISCVException write_stimecmph(CPURISCVState *env, 
int csrno,
 static const uint64_t delegable_ints =
 S_MODE_INTERRUPTS | VS_MODE_INTERRUPTS | MIP_LCOFIP;
 static const uint64_t vs_delegable_ints =
-(VS_MODE_INTERRUPTS | LOCAL_INTERRUPTS) & ~MIP_LCOFIP;
+VS_MODE_INTERRUPTS | LOCAL_INTERRUPTS;
 static const uint64_t all_ints = M_MODE_INTERRUPTS | S_MODE_INTERRUPTS |
  HS_MODE_INTERRUPTS | LOCAL_INTERRUPTS;
 #define DELEGABLE_EXCPS ((1ULL << (RISCV_EXCP_INST_ADDR_MIS)) | \
-- 
2.25.1

[PATCH 1/2] target/riscv/csr.c: Add functional of hvictl CSR

2024-02-12 Thread Irina Ryapolova

CSR hvictl (Hypervisor Virtual Interrupt Control) provides further flexibility
for injecting interrupts into VS level in situations not fully supported by the
facilities described thus far, but only with more active involvement of the 
hypervisor.
(See riscv-interrupts-1.0: Interrupts at VS level)

Signed-off-by: Irina Ryapolova 
---
 target/riscv/csr.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 674ea075a4..0c21145eaf 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -3585,6 +3585,21 @@ static int read_hvictl(CPURISCVState *env, int csrno, 
target_ulong *val)
 static int write_hvictl(CPURISCVState *env, int csrno, target_ulong val)
 {
 env->hvictl = val & HVICTL_VALID_MASK;
+if (env->hvictl & HVICTL_VTI)
+{
+uint32_t hviid = get_field(env->hvictl, HVICTL_IID);
+uint32_t hviprio = get_field(env->hvictl, HVICTL_IPRIO);
+/* the pair IID = 9, IPRIO = 0 generally to represent no interrupt in 
hvictl. */
+if (!(hviid == IRQ_S_EXT && hviprio == 0)) {
+uint64_t new_val = BIT(hviid) ;
+ if (new_val & S_MODE_INTERRUPTS) {
+rmw_hvip64(env, csrno, NULL, new_val << 1, new_val << 1);
+} else if (new_val & LOCAL_INTERRUPTS) {
+rmw_hvip64(env, csrno, NULL, new_val, new_val);
+}
+}
+}
+
 return RISCV_EXCP_NONE;
 }
 
-- 
2.25.1

Re: [PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages

2024-02-12 Thread Jason Gunthorpe

On Mon, Feb 12, 2024 at 01:56:41PM +, Joao Martins wrote:
> Allow disabling hugepages to be dirty track at base page
> granularity in similar vein to vfio_type1_iommu.disable_hugepages
> but per IOAS.

No objection to this, but I just wanted to observe I didn't imagine
using this option for this purpose. It should work OK but it is a
pretty big an inefficient hammer :)

Jason

Re: [PATCH 04/14] migration: Modify ram_init_bitmaps() to report dirty tracking errors

2024-02-12 Thread Cédric Le Goater


On 2/12/24 09:51, Avihai Horon wrote:

Hi Cedric,

On 07/02/2024 15:33, Cédric Le Goater wrote:

External email: Use caution opening links or attachments


The .save_setup() handler has now an Error** argument that we can use
to propagate errors reported by the .log_global_start() handler. Do
that for the RAM. qemu_savevm_state_setup() will store the error under
the migration stream for later detection in the migration sequence.

Signed-off-by: Cédric Le Goater 
---
  migration/ram.c | 19 ++-
  1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/migration/ram.c b/migration/ram.c
index 
d86626bb1c704b2d3497b323a702ca6ca8939a79..b87245466bb46937fd0358d0c66432bcc6280018
 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2802,19 +2802,17 @@ static void 
migration_bitmap_clear_discarded_pages(RAMState *rs)
  }
  }

-static void ram_init_bitmaps(RAMState *rs)
+static void ram_init_bitmaps(RAMState *rs, Error **errp)
  {
-    Error *local_err = NULL;
-
  qemu_mutex_lock_ramlist();

  WITH_RCU_READ_LOCK_GUARD() {
  ram_list_init_bitmaps();
  /* We don't use dirty log with background snapshots */
  if (!migrate_background_snapshot()) {
-    memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, _err);
-    if (local_err) {
-    error_report_err(local_err);
+    memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
+    if (*errp) {


I think we should use ERRP_GUARD() or a local error here and also below at 
ram_init_bitmaps() (or return bool like Philippe suggested).


yes. I will rework that part.

Thanks,

C.




Thanks.


+    break;
  }
  migration_bitmap_sync_precopy(rs, false);
  }
@@ -2828,7 +2826,7 @@ static void ram_init_bitmaps(RAMState *rs)
  migration_bitmap_clear_discarded_pages(rs);
  }

-static int ram_init_all(RAMState **rsp)
+static int ram_init_all(RAMState **rsp, Error **errp)
  {
  if (ram_state_init(rsp)) {
  return -1;
@@ -2839,7 +2837,10 @@ static int ram_init_all(RAMState **rsp)
  return -1;
  }

-    ram_init_bitmaps(*rsp);
+    ram_init_bitmaps(*rsp, errp);
+    if (*errp) {
+    return -1;
+    }

  return 0;
  }
@@ -2952,7 +2953,7 @@ static int ram_save_setup(QEMUFile *f, void *opaque, 
Error **errp)

  /* migration has already setup the bitmap, reuse it. */
  if (!migration_in_colo_state()) {
-    if (ram_init_all(rsp) != 0) {
+    if (ram_init_all(rsp, errp) != 0) {
  compress_threads_save_cleanup();
  return -1;
  }
--
2.43.0

Re: [PATCH 03/14] memory: Add Error** argument to .log_global*() handlers

2024-02-12 Thread Cédric Le Goater


On 2/12/24 09:43, Avihai Horon wrote:

Hi Cedric,

On 09/02/2024 12:14, Cédric Le Goater wrote:

External email: Use caution opening links or attachments


On 2/8/24 06:48, Peter Xu wrote:

On Wed, Feb 07, 2024 at 02:33:36PM +0100, Cédric Le Goater wrote:

@@ -2936,14 +2940,14 @@ void memory_global_dirty_log_start(unsigned int flags)
  trace_global_dirty_changed(global_dirty_tracking);

  if (!old_flags) {
-    MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
+    MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward, errp);
  memory_region_transaction_begin();
  memory_region_update_pending = true;
  memory_region_transaction_commit();
  }
  }

-static void memory_global_dirty_log_do_stop(unsigned int flags)
+static void memory_global_dirty_log_do_stop(unsigned int flags, Error **errp)
  {
  assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
  assert((global_dirty_tracking & flags) == flags);
@@ -2955,7 +2959,7 @@ static void memory_global_dirty_log_do_stop(unsigned int 
flags)
  memory_region_transaction_begin();
  memory_region_update_pending = true;
  memory_region_transaction_commit();
-    MEMORY_LISTENER_CALL_GLOBAL(log_global_stop, Reverse);
+    MEMORY_LISTENER_CALL_GLOBAL(log_global_stop, Reverse, errp);
  }
  }


I'm a little bit surprised to see that MEMORY_LISTENER_CALL_GLOBAL()
already allows >2 args, with the ability to conditionally pass over errp
with such oneliner change; even if all callers were only using 2 args
before this patch..

yes. The proposal takes the easy path.

Should we change all memory listener global handlers :

  begin
  commit
  log_global_after_sync
  log_global_start
  log_global_stop

to take an extra Error **errp argument ?

I think we should distinguish begin + commit handlers from the log_global_*
with a new macro. In which case, we could also change the handler to return
a bool and fail at the first error in MEMORY_LISTENER_CALL_GLOBAL(...).


I think we must fail at first error in any case. Otherwise, if two handlers 
error and call error_setg() with errp, the second handler will assert IIUC.


Good point. I will respin with a new MEMORY_LISTENER_CALL_GLOBAL_ERR macro
exiting the loop at first error.

Thanks,

C.

[PATCH 1/2] hw/arm: Use TYPE_OR_IRQ when connecting STM32L4x5 EXTI fan-in IRQs

2024-02-12 Thread Inès Varhol

Fixes: 52671f69f7a4 ("[PATCH v8 0/3] Add device STM32L4x5 EXTI")
Signed-off-by: Inès Varhol 
---
 hw/arm/stm32l4x5_soc.c | 69 --
 1 file changed, 59 insertions(+), 10 deletions(-)

diff --git a/hw/arm/stm32l4x5_soc.c b/hw/arm/stm32l4x5_soc.c
index f470ff74ec..df5bb02315 100644
--- a/hw/arm/stm32l4x5_soc.c
+++ b/hw/arm/stm32l4x5_soc.c
@@ -26,6 +26,7 @@
 #include "qapi/error.h"
 #include "exec/address-spaces.h"
 #include "sysemu/sysemu.h"
+#include "hw/or-irq.h"
 #include "hw/arm/stm32l4x5_soc.h"
 #include "hw/qdev-clock.h"
 #include "hw/misc/unimp.h"
@@ -48,15 +49,14 @@ static const int exti_irq[NUM_EXTI_IRQ] = {
 8,  /* GPIO[2] */
 9,  /* GPIO[3] */
 10, /* GPIO[4] */
-23, 23, 23, 23, 23, /* GPIO[5..9]  */
-40, 40, 40, 40, 40, 40, /* GPIO[10..15]*/
-1,  /* PVD */
+-1, -1, -1, -1, -1, /* GPIO[5..9] OR gate 23   */
+-1, -1, -1, -1, -1, -1, /* GPIO[10..15] OR gate 40 */
+-1, /* PVD OR gate 1   */
 67, /* OTG_FS_WKUP, Direct */
 41, /* RTC_ALARM   */
 2,  /* RTC_TAMP_STAMP2/CSS_LSE */
 3,  /* RTC wakeup timer*/
-63, /* COMP1   */
-63, /* COMP2   */
+-1, -1, /* COMP[1..2] OR gate 63   */
 31, /* I2C1 wakeup, Direct */
 33, /* I2C2 wakeup, Direct */
 72, /* I2C3 wakeup, Direct */
@@ -69,13 +69,29 @@ static const int exti_irq[NUM_EXTI_IRQ] = {
 65, /* LPTIM1, Direct  */
 66, /* LPTIM2, Direct  */
 76, /* SWPMI1 wakeup, Direct   */
-1,  /* PVM1 wakeup */
-1,  /* PVM2 wakeup */
-1,  /* PVM3 wakeup */
-1,  /* PVM4 wakeup */
+-1, -1, -1, -1, /* PVM[1..4] OR gate 1 */
 78  /* LCD wakeup, Direct  */
 };
 
+#define NUM_EXTI_OR_GATES 4
+static const int exti_or_gates_out[NUM_EXTI_OR_GATES] = {
+23, 40, 63, 1,
+};
+
+#define NUM_EXTI_SIMPLE_FANIN_IRQ 3
+static const int exti_or_gates_num_lines_in[NUM_EXTI_SIMPLE_FANIN_IRQ] = {
+5, 6, 2,
+};
+
+static const int exti_or_gates_first_line_in[NUM_EXTI_SIMPLE_FANIN_IRQ] = {
+5, 10, 21,
+};
+
+#define NUM_EXTI_OR_GATE1_NUM_LINES_IN 5
+static const int exti_or_gate1_lines_in[NUM_EXTI_OR_GATE1_NUM_LINES_IN] = {
+16, 35, 36, 37, 38,
+};
+
 static void stm32l4x5_soc_initfn(Object *obj)
 {
 Stm32l4x5SocState *s = STM32L4X5_SOC(obj);
@@ -175,8 +191,41 @@ static void stm32l4x5_soc_realize(DeviceState *dev_soc, 
Error **errp)
 return;
 }
 sysbus_mmio_map(busdev, 0, EXTI_ADDR);
+
+/* IRQs with fan-in that require an OR gate */
+for (unsigned i = 0; i < NUM_EXTI_OR_GATES; i++) {
+Object *orgate = object_new(TYPE_OR_IRQ);
+object_property_set_int(orgate, "num-lines",
+exti_or_gates_num_lines_in[i], _fatal);
+/* Should unref be used? */
+qdev_realize(DEVICE(orgate), NULL, _fatal);
+
+qdev_connect_gpio_out(DEVICE(orgate), 0,
+qdev_get_gpio_in(armv7m, exti_or_gates_out[i]));
+
+/* consecutive inputs for OR gates 23, 40, 63 */
+if (i < NUM_EXTI_SIMPLE_FANIN_IRQ) {
+for (unsigned j = 0; j < exti_or_gates_num_lines_in[i]; j++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(>exti),
+exti_or_gates_first_line_in[i] + j,
+qdev_get_gpio_in(DEVICE(orgate), j));
+}
+/* non-consecutive inputs for OR gate 1 */
+} else {
+for (unsigned j = 0; j < NUM_EXTI_OR_GATE1_NUM_LINES_IN; j++) {
+sysbus_connect_irq(SYS_BUS_DEVICE(>exti),
+exti_or_gate1_lines_in[j],
+qdev_get_gpio_in(DEVICE(orgate), j));
+}
+}
+}
+
+/* IRQs that don't require fan-in */
 for (unsigned i = 0; i < NUM_EXTI_IRQ; i++) {
-sysbus_connect_irq(busdev, i, qdev_get_gpio_in(armv7m, exti_irq[i]));
+if (exti_irq[i] != -1) {
+sysbus_connect_irq(busdev, i,
+   qdev_get_gpio_in(armv7m, exti_irq[i]));
+}
 }
 
 for (unsigned i = 0; i < 16; i++) {
-- 
2.43.0

[PATCH 2/2] tests/qtest: Check that EXTI fan-in irqs are correctly connected

2024-02-12 Thread Inès Varhol

This commit adds a QTest that verifies each input line of a specific
EXTI OR gate can influence the output line.

Signed-off-by: Inès Varhol 
---
 tests/qtest/stm32l4x5_exti-test.c | 97 +++
 1 file changed, 97 insertions(+)

diff --git a/tests/qtest/stm32l4x5_exti-test.c 
b/tests/qtest/stm32l4x5_exti-test.c
index c390077713..276b7adc7a 100644
--- a/tests/qtest/stm32l4x5_exti-test.c
+++ b/tests/qtest/stm32l4x5_exti-test.c
@@ -31,6 +31,11 @@
 
 #define EXTI0_IRQ 6
 #define EXTI1_IRQ 7
+#define EXTI5_IRQ 23
+#define EXTI6_IRQ 23
+#define EXTI7_IRQ 23
+#define EXTI8_IRQ 23
+#define EXTI9_IRQ 23
 #define EXTI35_IRQ 1
 
 static void enable_nvic_irq(unsigned int n)
@@ -499,6 +504,96 @@ static void test_interrupt(void)
 g_assert_false(check_nvic_pending(EXTI1_IRQ));
 }
 
+static void test_orred_interrupts(void)
+{
+/*
+ * For lines EXTI5..9 (fanned-in to NVIC irq 23),
+ * test that rising the line pends interrupt
+ * 23 in NVIC.
+ */
+enable_nvic_irq(EXTI5_IRQ);
+/* Check that there are no interrupts already pending in PR */
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 0x);
+/* Check that this specific interrupt isn't pending in NVIC */
+g_assert_false(check_nvic_pending(EXTI5_IRQ));
+
+/* Enable interrupt lines EXTI[5..9] */
+exti_writel(EXTI_IMR1, (0x1F << 5));
+
+/* Configure interrupt on rising edge */
+exti_writel(EXTI_RTSR1, (0x1F << 5));
+
+/* Simulate rising edge from GPIO line 7 */
+exti_set_irq(7, 1);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 1 << 7);
+g_assert_true(check_nvic_pending(EXTI7_IRQ));
+
+/* Clear the pending bit in PR */
+exti_writel(EXTI_PR1, 1 << 7);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 0x);
+g_assert_true(check_nvic_pending(EXTI7_IRQ));
+
+/* Clean NVIC */
+unpend_nvic_irq(EXTI7_IRQ);
+g_assert_false(check_nvic_pending(EXTI7_IRQ));
+
+/* Simulate rising edge from GPIO line 6 */
+exti_set_irq(6, 1);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 1 << 6);
+g_assert_true(check_nvic_pending(EXTI6_IRQ));
+
+/* Clear the pending bit in PR */
+exti_writel(EXTI_PR1, 1 << 6);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 0x);
+g_assert_true(check_nvic_pending(EXTI6_IRQ));
+
+/* Clean NVIC */
+unpend_nvic_irq(EXTI6_IRQ);
+g_assert_false(check_nvic_pending(EXTI6_IRQ));
+
+/* Simulate rising edge from GPIO line 5 */
+exti_set_irq(5, 1);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 1 << 5);
+g_assert_true(check_nvic_pending(EXTI5_IRQ));
+
+/* Clear the pending bit in PR */
+exti_writel(EXTI_PR1, 1 << 5);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 0x);
+g_assert_true(check_nvic_pending(EXTI5_IRQ));
+
+/* Clean NVIC */
+unpend_nvic_irq(EXTI5_IRQ);
+g_assert_false(check_nvic_pending(EXTI5_IRQ));
+
+/* Simulate rising edge from GPIO line 8 */
+exti_set_irq(8, 1);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 1 << 8);
+g_assert_true(check_nvic_pending(EXTI8_IRQ));
+
+/* Clear the pending bit in PR */
+exti_writel(EXTI_PR1, 1 << 8);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 0x);
+g_assert_true(check_nvic_pending(EXTI8_IRQ));
+
+/* Clean NVIC */
+unpend_nvic_irq(EXTI8_IRQ);
+g_assert_false(check_nvic_pending(EXTI8_IRQ));
+
+/* Simulate rising edge from GPIO line 9 */
+exti_set_irq(9, 1);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 1 << 9);
+g_assert_true(check_nvic_pending(EXTI9_IRQ));
+
+/* Clear the pending bit in PR */
+exti_writel(EXTI_PR1, 1 << 9);
+g_assert_cmpuint(exti_readl(EXTI_PR1), ==, 0x);
+g_assert_true(check_nvic_pending(EXTI9_IRQ));
+
+/* Clean NVIC */
+unpend_nvic_irq(EXTI9_IRQ);
+g_assert_false(check_nvic_pending(EXTI9_IRQ));
+}
+
 int main(int argc, char **argv)
 {
 int ret;
@@ -515,6 +610,8 @@ int main(int argc, char **argv)
 qtest_add_func("stm32l4x5/exti/masked_interrupt", test_masked_interrupt);
 qtest_add_func("stm32l4x5/exti/interrupt", test_interrupt);
 qtest_add_func("stm32l4x5/exti/test_edge_selector", test_edge_selector);
+qtest_add_func("stm32l4x5/exti/test_orred_interrupts",
+   test_orred_interrupts);
 
 qtest_start("-machine b-l475e-iot01a");
 ret = g_test_run();
-- 
2.43.0

[PATCH 0/2] hw/arm: Fix STM32L4x5 EXTI to CPU irq fan-in connections

2024-02-12 Thread Inès Varhol

The original code was connecting several outbounds qemu_irqs to the
same qemu_irq without using a TYPE_OR_IRQ.

This patch fixes the issue by using OR gates when necessary (1st commit).

I attempted to check that the problem is fixed by using a QTest (2nd commit)
but actually the test is passing even before the fix :
when any fan-in input line is raised, the output is raised too.

Fixes: 52671f69f7a4 ("[PATCH v8 0/3] Add device STM32L4x5 EXTI")
Signed-off-by: Inès Varhol 

Inès Varhol (2):
  hw/arm: Use TYPE_OR_IRQ when connecting STM32L4x5 EXTI fan-in IRQs
  tests/qtest: Check that EXTI fan-in irqs are correctly connected

 hw/arm/stm32l4x5_soc.c| 69 ++
 tests/qtest/stm32l4x5_exti-test.c | 97 +++
 2 files changed, 156 insertions(+), 10 deletions(-)

-- 
2.43.0

Re: [PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation

2024-02-12 Thread Jason Gunthorpe

On Mon, Feb 12, 2024 at 01:56:37PM +, Joao Martins wrote:
> There's generally two modes of operation for IOMMUFD:
> 
> * The simple user API which intends to perform relatively simple things
> with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
> and mainly performs IOAS_MAP and UNMAP.
> 
> * The native IOMMUFD API where you have fine grained control of the
> IOMMU domain and model it accordingly. This is where most new feature
> are being steered to.
> 
> For dirty tracking 2) is required, as it needs to ensure that
> the stage-2/parent IOMMU domain will only attach devices
> that support dirty tracking (so far it is all homogeneous in x86, likely
> not the case for smmuv3). Such invariant on dirty tracking provides a
> useful guarantee to VMMs that will refuse incompatible device
> attachments for IOMMU domains.
> 
> For dirty tracking such property is enabled/enforced via HWPT_ALLOC,
> which is responsible for creating an IOMMU domain. This is contrast to
> the 'simple API' where the IOMMU domain is created by IOMMUFD
> automatically when it attaches to VFIO (usually referred as autodomains)
> 
> To support dirty tracking with the advanced IOMMUFD API, it needs
> similar logic, where IOMMU domains are created and devices attached to
> compatible domains. Essentially mimmicing kernel
> iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
> to IOAS attach.
> 
> Signed-off-by: Joao Martins 
> ---
> Right now the only alternative to a userspace autodomains implementation
> is to mimmicing all the flags being added to HWPT_ALLOC but into VFIO
> IOAS attach.

It was my expectation that VMM userspace would implement direct hwpt
support. This is needed in all kinds of cases going forward because
hwpt allocation is not uniform across iommu instances and managing
this in the kernel is only feasible for simpler cases. Dirty tracking
would be one of them.

> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
> +   uint32_t pt_id, uint32_t flags,
> +   uint32_t data_type, uint32_t data_len,
> +   void *data_ptr, uint32_t *out_hwpt)
> +{
> +int ret;
> +struct iommu_hwpt_alloc alloc_hwpt = {
> +.size = sizeof(struct iommu_hwpt_alloc),
> +.flags = flags,
> +.dev_id = dev_id,
> +.pt_id = pt_id,
> +.data_type = data_type,
> +.data_len = data_len,
> +.data_uptr = (uint64_t)data_ptr,
> +.__reserved = 0,

Unless the coding style requirs this it is not necessary to zero in
the __reserved within a bracketed in initializer..

> +static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> +VFIOIOMMUFDContainer *container,
> +Error **errp)
> +{
> +int iommufd = vbasedev->iommufd_dev.iommufd->fd;
> +VFIOIOASHwpt *hwpt;
> +Error *err = NULL;
> +int ret = -EINVAL;
> +uint32_t hwpt_id;
> +
> +/* Try to find a domain */
> +QLIST_FOREACH(hwpt, >hwpt_list, next) {
> +ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, );
> +if (ret) {
> +/* -EINVAL means the domain is incompatible with the device. */
> +if (ret == -EINVAL) {

Please send a kernel kdoc patch to document this..

The approach looks good to me

The nesting patches surely need this too?

Jason

Bug: physmem: address_space_read_cached_slow() accesses wrong MemoryRegion on latter part of large reads.

2024-02-12 Thread Jonathan Cameron via

Hi All,

The continuing saga of a getting CXL emulation to play well with using the
memory as normal RAM ran into (hopefully) a last issue.

When running my boot image via virtio-blk-pci and having deliberately forced 
some
buffers to end up in the CXL memory via
$ numactl --membind=1 ls
then on shut down I was getting a failure to allocate a large enough DMA buffer
even with Mattias' set to increase the size and an extra patch to apply that to
the main system memory (as virtio ignores iommu and PCI address space).
gdb/bt pointed at virtqueue_split_pop() bt pointed later than the actual problem
as descriptors were loaded but corrupt here:
https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L1573
https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L306

After a chase (via a highly suspicious read size of 0x it turns
out it was issuing a 16 byte read via address_space_read_cached_slow() and
getting a very odd answer.

The first 8 bytes was the correct virtio descriptor content the second 8
were coming from the flash region at physical address 0 onwards on arm-virt.

With those breadcrumbs the problem was (I think) easy to find.

https://elixir.bootlin.com/qemu/latest/source/include/exec/memory.h#L3029
address_space_read_cached_slow() operates on a MemoryRegionCache with the
hwaddr being relative to the start of that cached region.

However it calls flatview_read_continue()  That's fine as long as we can
do the read in one go.


For normal memory flatview_read_continue() will deal with splitting a read up
but on each iteration it loads the mr via
mr = flatview_translate(fv, addr, , , false, attrs);
to cope with reads that cross MemoryRegions.
https://elixir.bootlin.com/qemu/latest/source/system/physmem.c#L2737

Unfortunately the addr passed in here is the one addressing into the
MemoryRegionCache offset by whatever we already read.
Which for this bug example that was 0x8 (in the flash memory).

Assuming I have correctly identified the problem.
One potential fix is to define a new

MemtxResult address_space_read_cached_continue(MemoryRegionCache *cache,
   hwaddr addr, MemTxAttr attrs,
   void *ptr, hwaddr len,
   hwaddr addr1, hwaddr l,
   MemoryRegion *mr)

That is nearly identical to flatview_read_continue() but with the 
mr = flatview_translate() replaced with
mr = address_space_translate_cached(cache, addr, , , false, attrs)

That's a bit ugly though given the duplicated code but any other change
is going to involve some more invasive splitting out of utility functions
to share all but the outer loop.

I don't currently have a test hitting it but assume
flatview_write_continue() in address_space_write_cached_slow() has the
same problem.

Jonathan

p.s. Will tidy this and the rest of my house of cards up then post it.
I suspect we'll carry on hitting QEMU limitations with the CXL emulation
but for now I have x86 and ARM setups that work with TCG.
Hmm. Need to spend some time getting regressions tests in place :(

Re: Requesting suggestions on how to access I2C Bus of the Guest OS

2024-02-12 Thread Peter Maydell

On Mon, 12 Feb 2024 at 16:01, Harshit Aghera  wrote:
>
> We have a Linux image targeted for architecture Cortex A7, that we are 
> running inside QEMU on x86 machine.
>
>
>
> Our Linux image has I2C slave with EEPROM backend, instantiated from 
> user-space. Reference - Linux I2C slave EEPROM backend — The Linux Kernel 
> documentation.
>
> Commands used from user-space to instantiate these I2C slaves –
>
> echo slave-24c512ro 0x1054 > /sys/bus/i2c/devices/i2c-0/new_device
>
> echo slave-24c02 0x1054 > /sys/bus/i2c/devices/i2c-1/new_device
>
>
>
> Is there a way to send I2C messages to these I2C buses (bus 0 and bus 1) on 
> the Guest OS, as a I2C Master from the Host OS?
>
> Is there a way to send I2C messages to these I2C buses from a separate QEMU 
> Instance’s Guest OS acting as a I2C Master?

No; our i2c bus framework does not currently support either of those
things.

thanks
-- PMM

Re: [PATCH v6 0/2] acpi: report numa nodes for device memory using GI

2024-02-12 Thread Michael S. Tsirkin

On Thu, Jan 04, 2024 at 03:05:27AM +, Ankit Agrawal wrote:
> 
> >>
> >> -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
> >> -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
> >> -numa node,nodeid=8 -numa node,nodeid=9 \
> >> -device 
> >> vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 
> >> \
> >> -object acpi-generic-initiator,id=gi0,pci-dev=dev0,host-nodes=2-9 \
> >>
> >
> > I'd find it helpful to see the resulting chunk of SRAT for these examples
> > (disassembled) in this cover letter and the patches (where there are more 
> > examples).
> 
> Ack. I'll document the resulting SRAT table as well.

Still didn't happen so this is dropped for now.

Re: [RFC PATCH 14/14] migration: Fix return-path thread exit

2024-02-12 Thread Cédric Le Goater


Hello Peter

On 2/8/24 06:57, Peter Xu wrote:

On Wed, Feb 07, 2024 at 02:33:47PM +0100, Cédric Le Goater wrote:

In case of error, close_return_path_on_source() can perform a shutdown
to exit the return-path thread.  However, in migrate_fd_cleanup(),
'to_dst_file' is closed before calling close_return_path_on_source()
and the shutdown fails, leaving the source and destination waiting for
an event to occur.

Close the file after calling close_return_path_on_source() so that the
shutdown succeeds and the return-path thread exits.

Signed-off-by: Cédric Le Goater 
---

  This is an RFC because the correct fix implies reworking the QEMUFile
  construct, built on top of the QEMU I/O channel.

  migration/migration.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 
5f55af3d7624750ca416c4177781241b3e291e5d..de329f2c553288935d824748286e79e535929b8b
 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1313,6 +1313,8 @@ void migrate_set_state(int *state, int old_state, int 
new_state)
  
  static void migrate_fd_cleanup(MigrationState *s)

  {
+QEMUFile *tmp = NULL;
+
  g_free(s->hostname);
  s->hostname = NULL;
  json_writer_free(s->vmdesc);
@@ -1321,8 +1323,6 @@ static void migrate_fd_cleanup(MigrationState *s)
  qemu_savevm_state_cleanup();
  
  if (s->to_dst_file) {

-QEMUFile *tmp;
-
  trace_migrate_fd_cleanup();
  bql_unlock();
  if (s->migration_thread_running) {
@@ -1341,15 +1341,14 @@ static void migrate_fd_cleanup(MigrationState *s)
   * critical section won't block for long.
   */
  migration_ioc_unregister_yank_from_file(tmp);
-qemu_fclose(tmp);
  }
  
-/*

- * We already cleaned up to_dst_file, so errors from the return
- * path might be due to that, ignore them.
- */
  close_return_path_on_source(s);
  
+if (tmp) {

+qemu_fclose(tmp);
+}
+
  assert(!migration_is_active(s));
  
  if (s->state == MIGRATION_STATUS_CANCELLING) {


I think this is okay to me for a short term plan.  I'll see how others
think, also add Dan into the loop.

If so, would you please add a rich comment explaining why tmp needs to be
closed later?  Especially, explicit comment on the ordering requirement
would be helpful: IMHO here it's an order that qemu_fclose() must happen
after close_return_path_on_source().  So when others work on this code we
don't easily break it without noticing.


Sure. I will when we have clarified with Fabiano what is the best
approach.


Also please feel free to post separately on migration patches if you'd like
us to merge the patches when repost.


This series is a collection of multiple (related) changes :

* extra Error** parameter to save_setup() migration handlers.
  This change has consequences on the various callers which are not
  fully analyzed.
* similar changes for memory logging handlers. These looks more self
  contained and I will see if I can send then separately.
* return-path thread termination

and then, in background we have open questions regarding :

* the QEMUfile implementation and its QIOChannel usage for migration
  streams
* qemu_file_set_error* vs. migrate_set_error. It is confusing, at least
  for me. Do we have some documentation on best practices ?

Thanks,

C.

Requesting suggestions on how to access I2C Bus of the Guest OS

2024-02-12 Thread Harshit Aghera

We have a Linux image targeted for architecture Cortex A7, that we are running 
inside QEMU on x86 machine.

Our Linux image has I2C slave with EEPROM backend, instantiated from 
user-space. Reference - Linux I2C slave EEPROM backend - The Linux Kernel 
documentation.
Commands used from user-space to instantiate these I2C slaves -

echo slave-24c512ro 0x1054 > /sys/bus/i2c/devices/i2c-0/new_device

echo slave-24c02 0x1054 > /sys/bus/i2c/devices/i2c-1/new_device

Is there a way to send I2C messages to these I2C buses (bus 0 and bus 1) on the 
Guest OS, as a I2C Master from the Host OS?
Is there a way to send I2C messages to these I2C buses from a separate QEMU 
Instance's Guest OS acting as a I2C Master?


Thanks,
Harshit.

Re: [PATCH 01/14] migration: Add Error** argument to .save_setup() handler

2024-02-12 Thread Avihai Horon




On 12/02/2024 16:49, Cédric Le Goater wrote:

External email: Use caution opening links or attachments


On 2/12/24 09:36, Avihai Horon wrote:

Hi, Cedric

On 07/02/2024 15:33, Cédric Le Goater wrote:

External email: Use caution opening links or attachments


The purpose is to record a potential error in the migration stream if
qemu_savevm_state_setup() fails. Most of the current .save_setup()
handlers can be modified to use the Error argument instead of managing
their own and calling locally error_report(). The following patches
will introduce such changes for VFIO first.

Signed-off-by: Cédric Le Goater 
---
  include/migration/register.h   | 2 +-
  hw/ppc/spapr.c | 2 +-
  hw/s390x/s390-stattrib.c   | 2 +-
  hw/vfio/migration.c    | 2 +-
  migration/block-dirty-bitmap.c | 2 +-
  migration/block.c  | 2 +-
  migration/ram.c    | 2 +-
  migration/savevm.c | 4 ++--
  8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/migration/register.h 
b/include/migration/register.h
index 
9ab1f79512c605f0c88a45b560c57486fa054441..831600a00eae4efd0464b60925d65de4d9dbcff8 
100644

--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -25,7 +25,7 @@ typedef struct SaveVMHandlers {
   * used to perform early checks.
   */
  int (*save_prepare)(void *opaque, Error **errp);
-    int (*save_setup)(QEMUFile *f, void *opaque);
+    int (*save_setup)(QEMUFile *f, void *opaque, Error **errp);
  void (*save_cleanup)(void *opaque);
  int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
  int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 
0d72d286d80f0435122593555f79fae4d90acf81..a1b0aa02582ad2d68a13476c1859b18143da7bb8 
100644

--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2142,7 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
  }
  };

-static int htab_save_setup(QEMUFile *f, void *opaque)
+static int htab_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  SpaprMachineState *spapr = opaque;

diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 
c483b62a9b5f71772639fc180bdad15ecb6711cb..c934df424a555d83d2198f5ddfc0cbe0ea98e9ec 
100644

--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -166,7 +166,7 @@ static int cmma_load(QEMUFile *f, void *opaque, 
int version_id)

  return ret;
  }

-static int cmma_save_setup(QEMUFile *f, void *opaque)
+static int cmma_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  S390StAttribState *sas = S390_STATTRIB(opaque);
  S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 
70e6b1a709f9b67e4c9eb41033d76347275cac42..8bcb4bc73cd5ba5338e3ffa4d907d0e6bfbb9485 
100644

--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -378,7 +378,7 @@ static int vfio_save_prepare(void *opaque, Error 
**errp)

  return 0;
  }

-static int vfio_save_setup(QEMUFile *f, void *opaque)
+static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  VFIODevice *vbasedev = opaque;
  VFIOMigration *migration = vbasedev->migration;
diff --git a/migration/block-dirty-bitmap.c 
b/migration/block-dirty-bitmap.c
index 
2708abf3d762de774ed294d3fdb8e56690d2974c..16f84e6c57c2403a8c2d6319f4e7b6360dade28c 
100644

--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1213,7 +1213,7 @@ fail:
  return ret;
  }

-static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error 
**errp)

  {
  DBMSaveState *s = &((DBMState *)opaque)->save;
  SaveBitmapState *dbms = NULL;
diff --git a/migration/block.c b/migration/block.c
index 
8c6ebafacc1ffe930d1d4f19d968817b14852c69..df15319ceab66201b043f15eac1b0a7d6522b60c 
100644

--- a/migration/block.c
+++ b/migration/block.c
@@ -708,7 +708,7 @@ static void block_migration_cleanup(void *opaque)
  blk_mig_unlock();
  }

-static int block_save_setup(QEMUFile *f, void *opaque)
+static int block_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  int ret;

diff --git a/migration/ram.c b/migration/ram.c
index 
d5b7cd5ac2f31aabf4a248b966153401c48912cf..136c237f4079f68d4e578cf1c72eec2efc815bc8 
100644

--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2931,7 +2931,7 @@ void qemu_guest_free_page_hint(void *addr, 
size_t len)

   * @f: QEMUFile where to send the data
   * @opaque: RAMState pointer
   */
-static int ram_save_setup(QEMUFile *f, void *opaque)
+static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  RAMState **rsp = opaque;
  RAMBlock *block;
diff --git a/migration/savevm.c b/migration/savevm.c
index 
d612c8a9020b204d5d078d5df85f0e6449c27645..f2ae799bad13e631bccf733a34c3a8fd22e8dd48 
100644

--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1342,10 +1342,10 @@ void

Re: [RFC PATCH 14/14] migration: Fix return-path thread exit

2024-02-12 Thread Cédric Le Goater


Hello Fabiano

On 2/8/24 14:29, Fabiano Rosas wrote:

Cédric Le Goater  writes:


In case of error, close_return_path_on_source() can perform a shutdown
to exit the return-path thread.  However, in migrate_fd_cleanup(),
'to_dst_file' is closed before calling close_return_path_on_source()
and the shutdown fails, leaving the source and destination waiting for
an event to occur.


Hi, Cédric

Are you sure this is not caused by patch 13? 


It happens with upstream QEMU without any patch.

When vfio_listener_log_global_start() fails, it sets an error on the
QEMUFile. To reproduce without a VFIO device, you can inject an error
when dirty tracking is started. Something like below,

@@ -2817,6 +2817,8 @@ static void ram_init_bitmaps(RAMState *r
  * containing all 1s to exclude any discarded pages from migration.
  */
 migration_bitmap_clear_discarded_pages(rs);
+
+qemu_file_set_error(migrate_get_current()->to_dst_file, -EAGAIN);
 }
 
 static int ram_init_all(RAMState **rsp)


Activate return-path and migrate.


That 'if (ms->to_dst_file'
was there to avoid this sort of thing happening.

Is there some reordering possibility that I'm not spotting in the code
below? I think the data dependency on to_dst_file shouldn't allow it.

migrate_fd_cleanup:
 qemu_mutex_lock(>qemu_file_lock);
 tmp = s->to_dst_file;
 s->to_dst_file = NULL;
 qemu_mutex_unlock(>qemu_file_lock);
 ...
 qemu_fclose(tmp);

close_return_path_on_source:
 WITH_QEMU_LOCK_GUARD(>qemu_file_lock) {
 if (ms->to_dst_file && ms->rp_state.from_dst_file &&
 qemu_file_get_error(ms->to_dst_file)) {
 qemu_file_shutdown(ms->rp_state.from_dst_file);
 }
 }


close_return_path_on_source() is called by migrate_fd_cleanup() in
the same thread. So, when we reach the locking section ms->to_dst_file
is already NULL and qemu_fclose() has been closed :/

May be I misunderstood. Please try to reproduce with the little hack
above.

Thanks,

C.


I'm thinking maybe the culprit is the close_return_path_on_source() at
migration_completion(). It might be possible for it to race with the
migrate_fd_cleanup_bh from migration_iteration_finish().

If that's the case, then I think that one possible fix would be to hold
the BQL at migration_completion() so the BH doesn't get dispatched until
we properly close the return path.



Close the file after calling close_return_path_on_source() so that the
shutdown succeeds and the return-path thread exits.

Signed-off-by: Cédric Le Goater 
---

  This is an RFC because the correct fix implies reworking the QEMUFile
  construct, built on top of the QEMU I/O channel.

  migration/migration.c | 13 ++---
  1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 
5f55af3d7624750ca416c4177781241b3e291e5d..de329f2c553288935d824748286e79e535929b8b
 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1313,6 +1313,8 @@ void migrate_set_state(int *state, int old_state, int 
new_state)
  
  static void migrate_fd_cleanup(MigrationState *s)

  {
+QEMUFile *tmp = NULL;
+
  g_free(s->hostname);
  s->hostname = NULL;
  json_writer_free(s->vmdesc);
@@ -1321,8 +1323,6 @@ static void migrate_fd_cleanup(MigrationState *s)
  qemu_savevm_state_cleanup();
  
  if (s->to_dst_file) {

-QEMUFile *tmp;
-
  trace_migrate_fd_cleanup();
  bql_unlock();
  if (s->migration_thread_running) {
@@ -1341,15 +1341,14 @@ static void migrate_fd_cleanup(MigrationState *s)
   * critical section won't block for long.
   */
  migration_ioc_unregister_yank_from_file(tmp);
-qemu_fclose(tmp);
  }
  
-/*

- * We already cleaned up to_dst_file, so errors from the return
- * path might be due to that, ignore them.
- */
  close_return_path_on_source(s);
  
+if (tmp) {

+qemu_fclose(tmp);
+}
+
  assert(!migration_is_active(s));
  
  if (s->state == MIGRATION_STATUS_CANCELLING) {

Re: [PATCH V3 09/13] migration: notifier error checking

2024-02-12 Thread Steven Sistare

On 2/12/2024 4:24 AM, David Hildenbrand wrote:
> On 08.02.24 19:54, Steve Sistare wrote:
>> Check the status returned by migration notifiers and report errors.
>> If notifiers fail, call the notifiers again so they can clean up.
> 
> IIUC, if any of the notifiers will actually start to fail, say, during 
> MIG_EVENT_PRECOPY_SETUP, you will call MIG_EVENT_PRECOPY_FAILED on all 
> notifiers.
> 
> That will include notifiers that have never seen a MIG_EVENT_PRECOPY_SETUP 
> call.

Correct.

> Is that what we expect notifiers to be able to deal with? Can we document 
> that?

The notifiers have always needed to handle failure without knowing the previous 
migration
states, and are robust about unwinding their own internal state.  I will 
document it.

Thanks for all the RBs.

- Steve

Re: [PATCH] iotests: Make 144 deterministic again

2024-02-12 Thread Stefan Hajnoczi

On Fri, Feb 09, 2024 at 06:31:03PM +0100, Kevin Wolf wrote:
> Since commit effd60c8 changed how QMP commands are processed, the order
> of the block-commit return value and job events in iotests 144 wasn't
> fixed and more and caused the test to fail intermittently.
> 
> Change the test to cache events first and then print them in a
> predefined order.
> 
> Waiting three times for JOB_STATUS_CHANGE is a bit uglier than just
> waiting for the JOB_STATUS_CHANGE that has "status": "ready", but the
> tooling we have doesn't seem to allow the latter easily.
> 
> Fixes: effd60c878176bcaf97fa7ce2b12d04bb8ead6f7
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2126
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/144 | 12 +++-
>  tests/qemu-iotests/144.out |  2 +-
>  2 files changed, 12 insertions(+), 2 deletions(-)

Thank you!

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [PATCH 01/14] migration: Add Error** argument to .save_setup() handler

2024-02-12 Thread Cédric Le Goater


On 2/12/24 09:36, Avihai Horon wrote:

Hi, Cedric

On 07/02/2024 15:33, Cédric Le Goater wrote:

External email: Use caution opening links or attachments


The purpose is to record a potential error in the migration stream if
qemu_savevm_state_setup() fails. Most of the current .save_setup()
handlers can be modified to use the Error argument instead of managing
their own and calling locally error_report(). The following patches
will introduce such changes for VFIO first.

Signed-off-by: Cédric Le Goater 
---
  include/migration/register.h   | 2 +-
  hw/ppc/spapr.c | 2 +-
  hw/s390x/s390-stattrib.c   | 2 +-
  hw/vfio/migration.c    | 2 +-
  migration/block-dirty-bitmap.c | 2 +-
  migration/block.c  | 2 +-
  migration/ram.c    | 2 +-
  migration/savevm.c | 4 ++--
  8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 
9ab1f79512c605f0c88a45b560c57486fa054441..831600a00eae4efd0464b60925d65de4d9dbcff8
 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -25,7 +25,7 @@ typedef struct SaveVMHandlers {
   * used to perform early checks.
   */
  int (*save_prepare)(void *opaque, Error **errp);
-    int (*save_setup)(QEMUFile *f, void *opaque);
+    int (*save_setup)(QEMUFile *f, void *opaque, Error **errp);
  void (*save_cleanup)(void *opaque);
  int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
  int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 
0d72d286d80f0435122593555f79fae4d90acf81..a1b0aa02582ad2d68a13476c1859b18143da7bb8
 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2142,7 +2142,7 @@ static const VMStateDescription vmstate_spapr = {
  }
  };

-static int htab_save_setup(QEMUFile *f, void *opaque)
+static int htab_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  SpaprMachineState *spapr = opaque;

diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 
c483b62a9b5f71772639fc180bdad15ecb6711cb..c934df424a555d83d2198f5ddfc0cbe0ea98e9ec
 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -166,7 +166,7 @@ static int cmma_load(QEMUFile *f, void *opaque, int 
version_id)
  return ret;
  }

-static int cmma_save_setup(QEMUFile *f, void *opaque)
+static int cmma_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  S390StAttribState *sas = S390_STATTRIB(opaque);
  S390StAttribClass *sac = S390_STATTRIB_GET_CLASS(sas);
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 
70e6b1a709f9b67e4c9eb41033d76347275cac42..8bcb4bc73cd5ba5338e3ffa4d907d0e6bfbb9485
 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -378,7 +378,7 @@ static int vfio_save_prepare(void *opaque, Error **errp)
  return 0;
  }

-static int vfio_save_setup(QEMUFile *f, void *opaque)
+static int vfio_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  VFIODevice *vbasedev = opaque;
  VFIOMigration *migration = vbasedev->migration;
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 
2708abf3d762de774ed294d3fdb8e56690d2974c..16f84e6c57c2403a8c2d6319f4e7b6360dade28c
 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -1213,7 +1213,7 @@ fail:
  return ret;
  }

-static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque)
+static int dirty_bitmap_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  DBMSaveState *s = &((DBMState *)opaque)->save;
  SaveBitmapState *dbms = NULL;
diff --git a/migration/block.c b/migration/block.c
index 
8c6ebafacc1ffe930d1d4f19d968817b14852c69..df15319ceab66201b043f15eac1b0a7d6522b60c
 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -708,7 +708,7 @@ static void block_migration_cleanup(void *opaque)
  blk_mig_unlock();
  }

-static int block_save_setup(QEMUFile *f, void *opaque)
+static int block_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  int ret;

diff --git a/migration/ram.c b/migration/ram.c
index 
d5b7cd5ac2f31aabf4a248b966153401c48912cf..136c237f4079f68d4e578cf1c72eec2efc815bc8
 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2931,7 +2931,7 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
   * @f: QEMUFile where to send the data
   * @opaque: RAMState pointer
   */
-static int ram_save_setup(QEMUFile *f, void *opaque)
+static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
  {
  RAMState **rsp = opaque;
  RAMBlock *block;
diff --git a/migration/savevm.c b/migration/savevm.c
index 
d612c8a9020b204d5d078d5df85f0e6449c27645..f2ae799bad13e631bccf733a34c3a8fd22e8dd48
 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1342,10 +1342,10 @@ void qemu_savevm_state_setup(QEMUFile *f)
  }
  save_section_header(f, se, QEMU_VM_SECTION_START);

-    ret = se->ops->save_setup(f,

Re: possible deprecation and removal of some old QEMU Arm machine types (pxa2xx, omap, sa1110)

2024-02-12 Thread Guenter Roeck


[ sorry for the earlier noise; accidentally hit "send" ]

On 2/12/24 04:32, Peter Maydell wrote:

QEMU includes some models of old Arm machine types which are
a bit problematic for us because:
  * they're written in a very old way that uses numerous APIs that we
would like to get away from (eg they don't use qdev, they use
qemu_system_reset_request(), they use vmstate_register(), etc)
  * they've been that way for a decade plus and nobody particularly has
stepped up to try to modernise the code (beyond some occasional
work here and there)
  * we often don't have test cases for them, which means that if we
do try to do the necessary refactoring work on them we have no
idea if they even still work at all afterwards

All these machine types are also of hardware that has largely passed
away into history and where I would not be surprised to find that
e.g. the Linux kernel support was never tested on real hardware
any more.

So I would like to explore whether we can deprecate-and-drop
some or all of them. This would let us delete the code entirely
rather than spending a long time trying to bring it up to scratch
for a probably very small to nonexistent userbase. The aim of this
email is to see if anybody is still using any of these and would be
upset if they went away. Reports of "I tried to use this machine
type and it's just broken" are also interesting as they would
strongly suggest that the machine has no real users and can be
removed.

The machines I have in mind are:

PXA2xx machines:

akitaSharp SL-C1000 (Akita) PDA (PXA270)
borzoi   Sharp SL-C3100 (Borzoi) PDA (PXA270)
connex   Gumstix Connex (PXA255)
mainstoneMainstone II (PXA27x)
spitzSharp SL-C3000 (Spitz) PDA (PXA270)
terrier  Sharp SL-C3200 (Terrier) PDA (PXA270)
tosa Sharp SL-6000 (Tosa) PDA (PXA255)
verdex   Gumstix Verdex Pro XL6P COMs (PXA270)
z2   Zipit Z2 (PXA27x)


I test akita, borzoi, spitz, and terrier. Upstream Linux removed support
for mainstone, tosa, and z2 from the Linux kernel as of version 6.0, so
I am no longer testing those.

I never managed to boot connex or verdex.


OMAP1 machines:

cheetah  Palm Tungsten|E aka. Cheetah PDA (OMAP310)
sx1  Siemens SX1 (OMAP310) V2
sx1-v1   Siemens SX1 (OMAP310) V1


I test sx1. I don't think I ever tried cheetah, and I could not get sx1-v1
to work.


OMAP2 machines:

n800 Nokia N800 tablet aka. RX-34 (OMAP2420)
n810 Nokia N810 tablet aka. RX-44 (OMAP2420)


I never managed to get those to boot the Linux kernel.


The one SA1110 machine:

collie   Sharp SL-5500 (Collie) PDA (SA-1110)


I do test collie.

All the ones I use still boot the latest Linux kernel.


Obviously if we can remove all the machines that used a given
SoC, that's much more effective than if we just delete one or two.

I don't have any test images for the SA1110 or OMAP1 machines,
so those are the ones I am most keen to be able to drop.
I do have test images for a few of the pxa2xx and the OMAP2 machines.


I don't mind dropping them, just listing what I use for testing the
Linux kernel. I suspect I may be the only "user" of those boards,
though, both in Linux and qemu.

Guenter

Re: [PULL 00/61] riscv-to-apply queue

2024-02-12 Thread Peter Maydell

On Fri, 9 Feb 2024 at 10:59, Alistair Francis  wrote:
>
> The following changes since commit 03e4bc0bc02779fdf6f8e8d83197f05e70881abf:
>
>   Merge tag 'pull-tcg-20240205-2' of https://gitlab.com/rth7680/qemu into 
> staging (2024-02-08 16:08:42 +)
>
> are available in the Git repository at:
>
>   https://github.com/alistair23/qemu.git tags/pull-riscv-to-apply-20240209
>
> for you to fetch changes up to deb0ff0c777d20602ecc5b6f74f18cb7ecc0b91f:
>
>   target/riscv: add rv32i, rv32e and rv64e CPUs (2024-02-09 20:49:41 +1000)
>
> 
> RISC-V PR for 9.0
>
> * Check for 'A' extension on all atomic instructions
> * Add support for 'B' extension
> * Internally deprecate riscv_cpu_options
> * Implement optional CSR mcontext of debug Sdtrig extension
> * Internally add cpu->cfg.vlenb and  remove cpu->cfg.vlen
> * Support vlenb and vregs[] in KVM
> * RISC-V gdbstub and TCG plugin improvements
> * Remove vxrm and vxsat from FCSR
> * Use RISCVException as return type for all csr ops
> * Use g_autofree more and fix a memory leak
> * Add support for Zaamo and Zalrsc
> * Support new isa extension detection devicetree properties
> * SMBIOS support for RISC-V virt machine
> * Enable xtheadsync under user mode
> * Add rv32i,rv32e and rv64e CPUs
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.

-- PMM

[PATCH v2] tap-win32: Remove unnecessary stubs

2024-02-12 Thread Akihiko Odaki

Some of them are only necessary for POSIX systems. The others are
assigned to function pointers in NetClientInfo that can actually be
NULL.

Signed-off-by: Akihiko Odaki 
---
Changes in v2:
- Rebased.
- Link to v1: 
https://lore.kernel.org/r/20231006051127.5429-1-akihiko.od...@daynix.com
---
 net/tap-win32.c | 54 --
 1 file changed, 54 deletions(-)

diff --git a/net/tap-win32.c b/net/tap-win32.c
index 7b8b4be02cff..7edbd7163370 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -707,70 +707,16 @@ static void tap_win32_send(void *opaque)
 }
 }
 
-static bool tap_has_ufo(NetClientState *nc)
-{
-return false;
-}
-
-static bool tap_has_vnet_hdr(NetClientState *nc)
-{
-return false;
-}
-
-int tap_probe_vnet_hdr_len(int fd, int len)
-{
-return 0;
-}
-
-void tap_fd_set_vnet_hdr_len(int fd, int len)
-{
-}
-
-int tap_fd_set_vnet_le(int fd, int is_le)
-{
-return -EINVAL;
-}
-
-int tap_fd_set_vnet_be(int fd, int is_be)
-{
-return -EINVAL;
-}
-
-static void tap_using_vnet_hdr(NetClientState *nc, bool using_vnet_hdr)
-{
-}
-
-static void tap_set_offload(NetClientState *nc, int csum, int tso4,
- int tso6, int ecn, int ufo, int uso4, int uso6)
-{
-}
-
 struct vhost_net *tap_get_vhost_net(NetClientState *nc)
 {
 return NULL;
 }
 
-static bool tap_has_vnet_hdr_len(NetClientState *nc, int len)
-{
-return false;
-}
-
-static void tap_set_vnet_hdr_len(NetClientState *nc, int len)
-{
-abort();
-}
-
 static NetClientInfo net_tap_win32_info = {
 .type = NET_CLIENT_DRIVER_TAP,
 .size = sizeof(TAPState),
 .receive = tap_receive,
 .cleanup = tap_cleanup,
-.has_ufo = tap_has_ufo,
-.has_vnet_hdr = tap_has_vnet_hdr,
-.has_vnet_hdr_len = tap_has_vnet_hdr_len,
-.using_vnet_hdr = tap_using_vnet_hdr,
-.set_offload = tap_set_offload,
-.set_vnet_hdr_len = tap_set_vnet_hdr_len,
 };
 
 static int tap_win32_init(NetClientState *peer, const char *model,

---
base-commit: 5d1fc614413b10dd94858b07a1b2e26b1aa0296c
change-id: 20240212-tap-51087194c8eb

Best regards,
-- 
Akihiko Odaki

Re: possible deprecation and removal of some old QEMU Arm machine types (pxa2xx, omap, sa1110)

2024-02-12 Thread Guenter Roeck


On 2/12/24 04:32, Peter Maydell wrote:

QEMU includes some models of old Arm machine types which are
a bit problematic for us because:
  * they're written in a very old way that uses numerous APIs that we
would like to get away from (eg they don't use qdev, they use
qemu_system_reset_request(), they use vmstate_register(), etc)
  * they've been that way for a decade plus and nobody particularly has
stepped up to try to modernise the code (beyond some occasional
work here and there)
  * we often don't have test cases for them, which means that if we
do try to do the necessary refactoring work on them we have no
idea if they even still work at all afterwards

All these machine types are also of hardware that has largely passed
away into history and where I would not be surprised to find that
e.g. the Linux kernel support was never tested on real hardware
any more.

So I would like to explore whether we can deprecate-and-drop
some or all of them. This would let us delete the code entirely
rather than spending a long time trying to bring it up to scratch
for a probably very small to nonexistent userbase. The aim of this
email is to see if anybody is still using any of these and would be
upset if they went away. Reports of "I tried to use this machine
type and it's just broken" are also interesting as they would
strongly suggest that the machine has no real users and can be
removed.

The machines I have in mind are:

PXA2xx machines:

akitaSharp SL-C1000 (Akita) PDA (PXA270)
borzoi   Sharp SL-C3100 (Borzoi) PDA (PXA270)
connex   Gumstix Connex (PXA255)
mainstoneMainstone II (PXA27x)
spitzSharp SL-C3000 (Spitz) PDA (PXA270)
terrier  Sharp SL-C3200 (Terrier) PDA (PXA270)
tosa Sharp SL-6000 (Tosa) PDA (PXA255)
verdex   Gumstix Verdex Pro XL6P COMs (PXA270)
z2   Zipit Z2 (PXA27x)

OMAP1 machines:

cheetah  Palm Tungsten|E aka. Cheetah PDA (OMAP310)
sx1  Siemens SX1 (OMAP310) V2
sx1-v1   Siemens SX1 (OMAP310) V1

OMAP2 machines:

n800 Nokia N800 tablet aka. RX-34 (OMAP2420)
n810 Nokia N810 tablet aka. RX-44 (OMAP2420)

The one SA1110 machine:

collie   Sharp SL-5500 (Collie) PDA (SA-1110)

Obviously if we can remove all the machines that used a given
SoC, that's much more effective than if we just delete one or two.

I don't have any test images for the SA1110 or OMAP1 machines,
so those are the ones I am most keen to be able to drop.
I do have test images for a few of the pxa2xx and the OMAP2 machines.

thanks
-- PMM

[PATCH RFCv2 8/8] vfio/common: Allow disabling device dirty page tracking

2024-02-12 Thread Joao Martins

The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole
tracking of VF pre-copy phase of dirty page tracking, though it means
that it will only be used at the start of the switchover phase.

Add an option that disables the VF dirty page tracking, and fall
back into container-based dirty page tracking. This also allows to
use IOMMU dirty tracking even on VFs with their own dirty
tracker scheme.

Signed-off-by: Joao Martins 
---
 hw/vfio/common.c  | 7 +++
 hw/vfio/migration.c   | 3 ++-
 hw/vfio/pci.c | 3 +++
 include/hw/vfio/vfio-common.h | 1 +
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a940c0b6ede8..9fe113ea016d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -213,6 +213,9 @@ bool vfio_devices_all_device_dirty_tracking(const 
VFIOContainerBase *bcontainer)
 VFIODevice *vbasedev;
 
 QLIST_FOREACH(vbasedev, >device_list, container_next) {
+if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+return false;
+}
 if (!vbasedev->dirty_pages_supported) {
 return false;
 }
@@ -236,6 +239,10 @@ bool vfio_device_dirty_pages_supported(VFIODevice 
*vbasedev)
 return false;
 }
 
+if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+return false;
+}
+
 return !vbasedev->dirty_pages_supported;
 }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 674e76b3f3df..09e742fbeb9f 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -938,7 +938,8 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error 
**errp)
 return !vfio_block_migration(vbasedev, err, errp);
 }
 
-if (!vbasedev->dirty_pages_supported &&
+if ((!vbasedev->dirty_pages_supported ||
+ vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
 (vbasedev->iommufd_dev.iommufd &&
  !iommufd_dirty_pages_supported(>iommufd_dev, ))) {
 if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dedb64fc080e..d3b516b2e4d3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3349,6 +3349,9 @@ static Property vfio_pci_dev_properties[] = {
 DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking", VFIOPCIDevice,
 vbasedev.pre_copy_dirty_page_tracking,
 ON_OFF_AUTO_ON),
+DEFINE_PROP_ON_OFF_AUTO("x-device-dirty-page-tracking", VFIOPCIDevice,
+vbasedev.device_dirty_page_tracking,
+ON_OFF_AUTO_ON),
 DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
 display, ON_OFF_AUTO_OFF),
 DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index a3e691c126c6..e67f5e74cebd 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -136,6 +136,7 @@ typedef struct VFIODevice {
 VFIOMigration *migration;
 Error *migration_blocker;
 OnOffAuto pre_copy_dirty_page_tracking;
+OnOffAuto device_dirty_page_tracking;
 bool dirty_pages_supported;
 bool dirty_tracking;
 union {
-- 
2.39.3

[PATCH RFCv2 7/8] vfio/migration: Don't block migration device dirty tracking is unsupported

2024-02-12 Thread Joao Martins

By default VFIO migration is set to auto, which will support live
migration if the migration capability is set *and* also dirty page
tracking is supported.

For testing purposes one can force enable without dirty page tracking
via enable-migration=on, but that option is generally left for testing
purposes.

So starting with IOMMU dirty tracking it can use to acomodate the lack of
VF dirty page tracking allowing us to minimize the VF requirements for
migration and thus enabling migration by default for those.

Signed-off-by: Joao Martins 
---
 hw/vfio/iommufd.c| 3 +--
 hw/vfio/migration.c  | 4 +++-
 include/sysemu/iommufd.h | 1 +
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 697d40841d7f..78d8f4391b68 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -275,8 +275,7 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice 
*vbasedev, Error **errp)
 return ret;
 }
 
-static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
-  Error **errp)
+bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev, Error **errp)
 {
 uint64_t caps;
 int r;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 70e6b1a709f9..674e76b3f3df 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -938,7 +938,9 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error 
**errp)
 return !vfio_block_migration(vbasedev, err, errp);
 }
 
-if (!vbasedev->dirty_pages_supported) {
+if (!vbasedev->dirty_pages_supported &&
+(vbasedev->iommufd_dev.iommufd &&
+ !iommufd_dirty_pages_supported(>iommufd_dev, ))) {
 if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
 error_setg(,
"%s: VFIO device doesn't support device dirty tracking",
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index bc6607e3d444..d6be49f2ac78 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -53,6 +53,7 @@ typedef struct IOMMUFDDevice {
 void iommufd_device_init(IOMMUFDDevice *idev);
 int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
Error **errp);
+bool iommufd_dirty_pages_supported(IOMMUFDDevice *idev, Error **errp);
 int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
uint32_t pt_id, uint32_t flags,
uint32_t data_type, uint32_t data_len,
-- 
2.39.3

[PATCH RFCv2 4/8] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support

2024-02-12 Thread Joao Martins

ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that
enables or disables dirty page tracking.

It is called on the whole list of iommu domains it is are tracking,
and on failure it rolls it back.

Signed-off-by: Joao Martins 
---
 backends/iommufd.c   | 19 +++
 backends/trace-events|  1 +
 hw/vfio/common.c |  7 ++-
 hw/vfio/iommufd.c| 28 
 include/sysemu/iommufd.h |  3 +++
 5 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2970135af4b9..954de61c2da0 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -240,6 +240,25 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t 
dev_id,
 return !ret ? 0 : -errno;
 }
 
+int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
+   bool start)
+{
+int ret;
+struct iommu_hwpt_set_dirty_tracking set_dirty = {
+.size = sizeof(set_dirty),
+.hwpt_id = hwpt_id,
+.flags = !start ? 0 : IOMMU_HWPT_DIRTY_TRACKING_ENABLE,
+};
+
+ret = ioctl(be->fd, IOMMU_HWPT_SET_DIRTY_TRACKING, _dirty);
+trace_iommufd_backend_set_dirty(be->fd, hwpt_id, start, ret);
+if (ret) {
+error_report("IOMMU_HWPT_SET_DIRTY_TRACKING failed: %s",
+ strerror(errno));
+}
+return !ret ? 0 : -errno;
+}
+
 static const TypeInfo iommufd_backend_info = {
 .name = TYPE_IOMMUFD_BACKEND,
 .parent = TYPE_OBJECT,
diff --git a/backends/trace-events b/backends/trace-events
index f83a276a4253..feba2baca5f7 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -16,3 +16,4 @@ iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, 
uint64_t iova, uint64_t si
 iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, 
uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t 
out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u 
len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
 iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d 
ioas=%d (%d)"
 iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d 
(%d)"
+iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) 
" iommufd=%d hwpt=%d enable=%d (%d)"
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index d8fc7077f839..a940c0b6ede8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -190,7 +190,7 @@ static bool 
vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
 QLIST_FOREACH(vbasedev, >device_list, container_next) {
 VFIOMigration *migration = vbasedev->migration;
 
-if (!migration) {
+if (!migration && !vbasedev->iommufd_dev.iommufd) {
 return false;
 }
 
@@ -199,6 +199,11 @@ static bool 
vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
  vfio_device_state_is_precopy(vbasedev))) {
 return false;
 }
+
+if (vbasedev->iommufd_dev.iommufd &&
+!bcontainer->dirty_pages_supported) {
+return false;
+}
 }
 return true;
 }
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index edacb6d72748..361e659288fd 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
 #include "qemu/cutils.h"
 #include "qemu/chardev_open.h"
 #include "pci.h"
+#include "migration/migration.h"
 
 static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
 ram_addr_t size, void *vaddr, bool readonly)
@@ -115,6 +116,32 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice 
*vbasedev)
 iommufd_backend_disconnect(vbasedev->iommufd_dev.iommufd);
 }
 
+static int iommufd_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
+   bool start)
+{
+const VFIOIOMMUFDContainer *container =
+container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+int ret;
+VFIOIOASHwpt *hwpt;
+
+QLIST_FOREACH(hwpt, >hwpt_list, next) {
+ret = iommufd_backend_set_dirty_tracking(container->be,
+ hwpt->hwpt_id, start);
+if (ret) {
+goto err;
+}
+}
+
+return 0;
+
+err:
+QLIST_FOREACH(hwpt, >hwpt_list, next) {
+iommufd_backend_set_dirty_tracking(container->be,
+   hwpt->hwpt_id, !start);
+}
+return ret;
+}
+
 static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
 {
 long int ret = -ENOTTY;
@@ -737,6 +764,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass 
*klass, void *data)
 vioc->detach_device = iommufd_cdev_detach;
 vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
 vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
+vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
 };

[PATCH RFCv2 3/8] vfio/iommufd: Probe and request hwpt dirty tracking capability

2024-02-12 Thread Joao Martins

Probe hardware dirty tracking support by querying device hw capabilities
via IOMMUFD_GET_HW_INFO.

In preparation to using the dirty tracking UAPI, request dirty tracking in
the HWPT flags when the device doesn't support dirty page tracking or has
it disabled; or when support when the VF backing IOMMU supports dirty
tracking. The latter is in the possibility of a device being attached
that doesn't have a dirty tracker.

Signed-off-by: Joao Martins 
---
 hw/vfio/common.c  | 18 ++
 hw/vfio/iommufd.c | 25 -
 include/hw/vfio/vfio-common.h |  2 ++
 3 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f7f85160be88..d8fc7077f839 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -216,6 +216,24 @@ bool vfio_devices_all_device_dirty_tracking(const 
VFIOContainerBase *bcontainer)
 return true;
 }
 
+bool vfio_device_migration_supported(VFIODevice *vbasedev)
+{
+if (!vbasedev->migration) {
+return false;
+}
+
+return vbasedev->migration->mig_flags & VFIO_MIGRATION_STOP_COPY;
+}
+
+bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev)
+{
+if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) {
+return false;
+}
+
+return !vbasedev->dirty_pages_supported;
+}
+
 /*
  * Check if all VFIO devices are running and migration is active, which is
  * essentially equivalent to the migration being in pre-copy phase.
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index ca7ec45e725c..edacb6d72748 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -219,11 +219,26 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice 
*vbasedev, Error **errp)
 return ret;
 }
 
+static bool iommufd_dirty_pages_supported(IOMMUFDDevice *iommufd_dev,
+  Error **errp)
+{
+uint64_t caps;
+int r;
+
+r = iommufd_device_get_hw_capabilities(iommufd_dev, , errp);
+if (r) {
+return false;
+}
+
+return caps & IOMMU_HW_CAP_DIRTY_TRACKING;
+}
+
 static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
 VFIOIOMMUFDContainer *container,
 Error **errp)
 {
 int iommufd = vbasedev->iommufd_dev.iommufd->fd;
+uint32_t flags = 0;
 VFIOIOASHwpt *hwpt;
 Error *err = NULL;
 int ret = -EINVAL;
@@ -245,9 +260,15 @@ static int iommufd_cdev_autodomains_get(VFIODevice 
*vbasedev,
 }
 }
 
+if ((vfio_device_migration_supported(vbasedev) &&
+ !vfio_device_dirty_pages_supported(vbasedev)) ||
+iommufd_dirty_pages_supported(>iommufd_dev, )) {
+flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+}
+
 ret = iommufd_backend_alloc_hwpt(iommufd,
  vbasedev->iommufd_dev.devid,
- container->ioas_id, 0, 0, 0,
+ container->ioas_id, flags, 0, 0,
  NULL, _id);
 if (ret) {
 error_append_hint(,
@@ -271,6 +292,8 @@ static int iommufd_cdev_autodomains_get(VFIODevice 
*vbasedev,
 vbasedev->hwpt = hwpt;
 QLIST_INSERT_HEAD(>device_list, vbasedev, hwpt_next);
 QLIST_INSERT_HEAD(>hwpt_list, hwpt, next);
+container->bcontainer.dirty_pages_supported =
+  (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING);
 return 0;
 }
 
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7f7d823221e2..a3e691c126c6 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -271,6 +271,8 @@ bool
 vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
 bool
 vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
+bool vfio_device_migration_supported(VFIODevice *vbasedev);
+bool vfio_device_dirty_pages_supported(VFIODevice *vbasedev);
 int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
 VFIOBitmap *vbmap, hwaddr iova,
 hwaddr size);
-- 
2.39.3

[PATCH RFCv2 6/8] backends/iommufd: Add ability to disable hugepages

2024-02-12 Thread Joao Martins

Allow disabling hugepages to be dirty track at base page
granularity in similar vein to vfio_type1_iommu.disable_hugepages
but per IOAS.

Signed-off-by: Joao Martins 
---
 backends/iommufd.c   | 36 
 backends/trace-events|  1 +
 hw/vfio/iommufd.c|  4 
 include/sysemu/iommufd.h |  4 
 qapi/qom.json|  2 +-
 5 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index dd676d493c37..72fd98a9a50c 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -29,6 +29,7 @@ static void iommufd_backend_init(Object *obj)
 be->fd = -1;
 be->users = 0;
 be->owned = true;
+be->hugepages = 1;
 }
 
 static void iommufd_backend_finalize(Object *obj)
@@ -63,6 +64,14 @@ static bool iommufd_backend_can_be_deleted(UserCreatable *uc)
 return !be->users;
 }
 
+static void iommufd_backend_set_hugepages(Object *obj, bool enabled,
+  Error **errp)
+{
+IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+
+be->hugepages = enabled;
+}
+
 static void iommufd_backend_class_init(ObjectClass *oc, void *data)
 {
 UserCreatableClass *ucc = USER_CREATABLE_CLASS(oc);
@@ -70,6 +79,11 @@ static void iommufd_backend_class_init(ObjectClass *oc, void 
*data)
 ucc->can_be_deleted = iommufd_backend_can_be_deleted;
 
 object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
+
+object_class_property_add_bool(oc, "hugepages", NULL,
+   iommufd_backend_set_hugepages);
+object_class_property_set_description(oc, "hugepages",
+  "Set to 'off' to disable hugepages");
 }
 
 int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
@@ -106,6 +120,28 @@ out:
 trace_iommufd_backend_disconnect(be->fd, be->users);
 }
 
+int iommufd_backend_set_option(int fd, uint32_t object_id,
+   uint32_t option_id, uint64_t val64)
+{
+int ret;
+struct iommu_option option = {
+.size = sizeof(option),
+.option_id = option_id,
+.op = IOMMU_OPTION_OP_SET,
+.val64 = val64,
+.object_id = object_id,
+};
+
+ret = ioctl(fd, IOMMU_OPTION, );
+if (ret) {
+error_report("Failed to set option %x to value %"PRIx64" %m", 
option_id,
+ val64);
+}
+trace_iommufd_backend_set_option(fd, object_id, option_id, val64, ret);
+
+return ret;
+}
+
 int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
Error **errp)
 {
diff --git a/backends/trace-events b/backends/trace-events
index 11a27cb114b6..076166552881 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -15,6 +15,7 @@ iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t 
ioas, uint64_t iova, u
 iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t 
size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
 iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, 
uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t 
out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u 
len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
 iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d 
ioas=%d (%d)"
+iommufd_backend_set_option(int iommufd, uint32_t object_id, uint32_t 
option_id, uint64_t val, int ret) " iommufd=%d object_id=%u option_id=%u 
val64=0x%"PRIx64" (%d)"
 iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d 
(%d)"
 iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) 
" iommufd=%d hwpt=%d enable=%d (%d)"
 iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, 
uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d 
iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 79b13bd262cc..697d40841d7f 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -521,6 +521,10 @@ static int iommufd_cdev_attach(const char *name, 
VFIODevice *vbasedev,
 goto err_alloc_ioas;
 }
 
+if (!vbasedev->iommufd_dev.iommufd->hugepages) {
+iommufd_backend_set_option(vbasedev->iommufd_dev.iommufd->fd, ioas_id,
+   IOMMU_OPTION_HUGE_PAGES, 0);
+}
 trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd_dev.iommufd->fd, ioas_id);
 
 container = g_malloc0(sizeof(*container));
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index ba19b7ea4c19..bc6607e3d444 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -19,6 +19,7 @@ struct IOMMUFDBackend {
 /*< protected >*/
 int fd;/* /dev/iommu file descriptor */
 bool owned;/* is the /dev/iommu opened internally */
+bool hugepages;/* are hugepages

[PATCH RFCv2 5/8] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support

2024-02-12 Thread Joao Martins

ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI
that fetches the bitmap that tells what was dirty in an IOVA
range.

A single bitmap is allocated and used across all the hwpts
sharing an IOAS which is then used in log_sync() to set Qemu
global bitmaps.

Signed-off-by: Joao Martins 
---
 backends/iommufd.c   | 24 
 backends/trace-events|  1 +
 hw/vfio/iommufd.c| 30 ++
 include/sysemu/iommufd.h |  3 +++
 4 files changed, 58 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 954de61c2da0..dd676d493c37 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -259,6 +259,30 @@ int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, 
uint32_t hwpt_id,
 return !ret ? 0 : -errno;
 }
 
+int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
+ uint64_t iova, ram_addr_t size,
+ uint64_t page_size, uint64_t *data)
+{
+int ret;
+struct iommu_hwpt_get_dirty_bitmap get_dirty_bitmap = {
+.size = sizeof(get_dirty_bitmap),
+.hwpt_id = hwpt_id,
+.iova = iova, .length = size,
+.page_size = page_size, .data = (uintptr_t)data,
+};
+
+ret = ioctl(be->fd, IOMMU_HWPT_GET_DIRTY_BITMAP, _dirty_bitmap);
+trace_iommufd_backend_get_dirty_bitmap(be->fd, hwpt_id, iova, size,
+   page_size, ret);
+if (ret) {
+error_report("IOMMU_HWPT_GET_DIRTY_BITMAP (iova: 0x%"PRIx64
+ " size: 0x%"PRIx64") failed: %s", iova,
+ size, strerror(errno));
+}
+
+return !ret ? 0 : -errno;
+}
+
 static const TypeInfo iommufd_backend_info = {
 .name = TYPE_IOMMUFD_BACKEND,
 .parent = TYPE_OBJECT,
diff --git a/backends/trace-events b/backends/trace-events
index feba2baca5f7..11a27cb114b6 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -17,3 +17,4 @@ iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, 
uint32_t pt_id, uint32_
 iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d 
ioas=%d (%d)"
 iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d 
(%d)"
 iommufd_backend_set_dirty(int iommufd, uint32_t hwpt_id, bool start, int ret) 
" iommufd=%d hwpt=%d enable=%d (%d)"
+iommufd_backend_get_dirty_bitmap(int iommufd, uint32_t hwpt_id, uint64_t iova, 
uint64_t size, uint64_t page_size, int ret) " iommufd=%d hwpt=%d 
iova=0x%"PRIx64" size=0x%"PRIx64" page_size=0x%"PRIx64" (%d)"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 361e659288fd..79b13bd262cc 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -25,6 +25,7 @@
 #include "qemu/cutils.h"
 #include "qemu/chardev_open.h"
 #include "pci.h"
+#include "exec/ram_addr.h"
 #include "migration/migration.h"
 
 static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
@@ -142,6 +143,34 @@ err:
 return ret;
 }
 
+static int iommufd_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
+  VFIOBitmap *vbmap, uint64_t iova,
+  uint64_t size)
+{
+VFIOIOMMUFDContainer *container = container_of(bcontainer,
+   VFIOIOMMUFDContainer,
+   bcontainer);
+int ret;
+VFIOIOASHwpt *hwpt;
+unsigned long page_size;
+
+if (!bcontainer->dirty_pages_supported) {
+return 0;
+}
+
+page_size = qemu_real_host_page_size();
+QLIST_FOREACH(hwpt, >hwpt_list, next) {
+ret = iommufd_backend_get_dirty_bitmap(container->be, hwpt->hwpt_id,
+   iova, size, page_size,
+   vbmap->bitmap);
+if (ret) {
+break;
+}
+}
+
+return ret;
+}
+
 static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
 {
 long int ret = -ENOTTY;
@@ -765,6 +794,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass 
*klass, void *data)
 vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
 vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
 vioc->set_dirty_page_tracking = iommufd_set_dirty_page_tracking;
+vioc->query_dirty_bitmap = iommufd_query_dirty_bitmap;
 };
 
 static const TypeInfo types[] = {
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 562c189dd92c..ba19b7ea4c19 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -55,5 +55,8 @@ int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
void *data_ptr, uint32_t *out_hwpt);
 int iommufd_backend_set_dirty_tracking(IOMMUFDBackend *be, uint32_t hwpt_id,
bool start);
+int iommufd_backend_get_dirty_bitmap(IOMMUFDBackend *be, uint32_t hwpt_id,
+

[PATCH RFCv2 1/8] backends/iommufd: Introduce helper function iommufd_device_get_hw_capabilities()

2024-02-12 Thread Joao Martins

The new helper will fetch vendor agnostic IOMMU capabilities supported
both by hardware and software. Right now it is only iommu dirty
tracking.

Signed-off-by: Joao Martins 
---
 backends/iommufd.c   | 25 +
 include/sysemu/iommufd.h |  2 ++
 2 files changed, 27 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index d92791bba935..8486894f1b3f 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -237,3 +237,28 @@ void iommufd_device_init(IOMMUFDDevice *idev)
 host_iommu_base_device_init(>base, HID_IOMMUFD,
 sizeof(IOMMUFDDevice));
 }
+
+int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
+   Error **errp)
+{
+struct iommu_hw_info info = {
+.size = sizeof(info),
+.flags = 0,
+.dev_id = idev->devid,
+.data_len = 0,
+.__reserved = 0,
+.data_uptr = 0,
+.out_capabilities = 0,
+};
+int ret;
+
+ret = ioctl(idev->iommufd->fd, IOMMU_GET_HW_INFO, );
+if (ret) {
+error_setg_errno(errp, errno,
+ "Failed to get hardware info capabilities");
+} else {
+*caps = info.out_capabilities;
+}
+
+return ret;
+}
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index c3f346976039..4afe97307dbe 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -47,4 +47,6 @@ typedef struct IOMMUFDDevice {
 } IOMMUFDDevice;
 
 void iommufd_device_init(IOMMUFDDevice *idev);
+int iommufd_device_get_hw_capabilities(IOMMUFDDevice *idev, uint64_t *caps,
+   Error **errp);
 #endif
-- 
2.39.3

[PATCH RFCv2 0/8] vfio/iommufd: IOMMUFD Dirty Tracking

2024-02-12 Thread Joao Martins

This small series adds support for Dirty Tracking in IOMMUFD backend.
The sole reason I still made it RFC is because of the second patch,
where we are implementing user-managed auto domains.

In essence it is quite similar to the original IOMMUFD series where we
would allocate a HWPT, until we switched later on into a IOAS attach.
Patch 2 goes into more detail, but the gist is that there's two modes of
using IOMMUFD and by keep using kernel managed auto domains we would end
up duplicating the same flags we have in HWPT but into the VFIO IOAS
attach. While it is true that just adding a flag is simpler, it also
creates duplication and motivates duplicate what hwpt-alloc already has.
But there's a chance I have the wrong expectation here, so any feedback
welcome.

The series is divided into:

* Patch 1: Adds a simple helper to get device capabilities;

* Patches 2 - 5: IOMMUFD backend support for dirty tracking;

The workflow is relatively simple:

1) Probe device and allow dirty tracking in the HWPT
2) Toggling dirty tracking on/off
3) Read-and-clear of Dirty IOVAs

The heuristics selected for (1) were to enable it *if* device supports
migration but doesn't support VF dirty tracking or IOMMU dirty tracking
is supported. The latter is for the hotplug case where we can add a device
without a tracker and thus still support migration.

The unmap case is deferred until further vIOMMU support with migration
is added[3] which will then introduce the usage of
IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
dma unmap bitmap flow.

* Patches 6-8: Add disabling of hugepages to allow tracking at base
page; avoid blocking live migration where there's no VF dirty
tracker, considering that we have IOMMU dirty tracking. And allow
disabling VF dirty tracker via qemu command line.

This series builds on top of Zhengzhong series[0], but only requires the
first 9 patches i.e. up to ("vfio/pci: Initialize host iommu device
instance after attachment")[1] that are more generic IOMMUFD device
plumbing, and doesn't require the nesting counterpart.

This is stored on github:
https://github.com/jpemartins/qemu/commits/iommufd-v5

Note: While Linux v6.7 has IOMMU dirty tracking feature, I suggest folks
use the latest for-rc of iommufd kernel tree as there's some fixes there.

Comments and feedback appreciated.

Cheers,
Joao

Chances since RFCv1[2]:
* Remove intel/amd dirty tracking emulation enabling
* Remove the dirtyrate improvement for VF/IOMMU dirty tracking
[Will pursue these two in separate series]
* Introduce auto domains support
* Enforce dirty tracking following the IOMMUFD UAPI for this
* Add support for toggling hugepages in IOMMUFD
* Auto enable support when VF supports migration to use IOMMU
when it doesn't have VF dirty tracking
* Add a parameter to toggle VF dirty tracking

[0] 
https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.d...@intel.com/
[1] 
https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.d...@intel.com/
[2] 
https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.mart...@oracle.com/
[3] 
https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.mart...@oracle.com/

Joao Martins (8):
  backends/iommufd: Introduce helper function
iommufd_device_get_hw_capabilities()
  vfio/iommufd: Introduce auto domain creation
  vfio/iommufd: Probe and request hwpt dirty tracking capability
  vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
  vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
  backends/iommufd: Add ability to disable hugepages
  vfio/migration: Don't block migration device dirty tracking is
unsupported
  vfio/common: Allow disabling device dirty page tracking

 backends/iommufd.c| 133 
 backends/trace-events |   4 +
 hw/vfio/common.c  |  32 ++-
 hw/vfio/iommufd.c | 162 ++
 hw/vfio/migration.c   |   5 +-
 hw/vfio/pci.c |   3 +
 include/hw/vfio/vfio-common.h |  12 +++
 include/sysemu/iommufd.h  |  17 
 qapi/qom.json |   2 +-
 9 files changed, 367 insertions(+), 3 deletions(-)

-- 
2.39.3

[PATCH RFCv2 2/8] vfio/iommufd: Introduce auto domain creation

2024-02-12 Thread Joao Martins

There's generally two modes of operation for IOMMUFD:

* The simple user API which intends to perform relatively simple things
with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
and mainly performs IOAS_MAP and UNMAP.

* The native IOMMUFD API where you have fine grained control of the
IOMMU domain and model it accordingly. This is where most new feature
are being steered to.

For dirty tracking 2) is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.

For dirty tracking such property is enabled/enforced via HWPT_ALLOC,
which is responsible for creating an IOMMU domain. This is contrast to
the 'simple API' where the IOMMU domain is created by IOMMUFD
automatically when it attaches to VFIO (usually referred as autodomains)

To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains. Essentially mimmicing kernel
iommufd_device_auto_get_domain(). If this fails (i.e. mdevs) it falls back
to IOAS attach.

Signed-off-by: Joao Martins 
---
Right now the only alternative to a userspace autodomains implementation
is to mimmicing all the flags being added to HWPT_ALLOC but into VFIO
IOAS attach. So opted for autodomains userspace approach to avoid the
duplication of hwpt-alloc flags vs attach-ioas flags. I lack mdev real
drivers atm, so testing with those is still TBD.

Opinions, comments, welcome!
---
 backends/iommufd.c| 29 +
 backends/trace-events |  1 +
 hw/vfio/iommufd.c | 78 +++
 include/hw/vfio/vfio-common.h |  9 
 include/sysemu/iommufd.h  |  4 ++
 5 files changed, 121 insertions(+)

diff --git a/backends/iommufd.c b/backends/iommufd.c
index 8486894f1b3f..2970135af4b9 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -211,6 +211,35 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t 
ioas_id,
 return ret;
 }
 
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+   uint32_t pt_id, uint32_t flags,
+   uint32_t data_type, uint32_t data_len,
+   void *data_ptr, uint32_t *out_hwpt)
+{
+int ret;
+struct iommu_hwpt_alloc alloc_hwpt = {
+.size = sizeof(struct iommu_hwpt_alloc),
+.flags = flags,
+.dev_id = dev_id,
+.pt_id = pt_id,
+.data_type = data_type,
+.data_len = data_len,
+.data_uptr = (uint64_t)data_ptr,
+.__reserved = 0,
+};
+
+ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, _hwpt);
+trace_iommufd_backend_alloc_hwpt(iommufd, dev_id, pt_id, flags, data_type,
+ data_len, (uint64_t)data_ptr,
+ alloc_hwpt.out_hwpt_id, ret);
+if (ret) {
+error_report("IOMMU_HWPT_ALLOC failed: %m");
+} else {
+*out_hwpt = alloc_hwpt.out_hwpt_id;
+}
+return !ret ? 0 : -errno;
+}
+
 static const TypeInfo iommufd_backend_info = {
 .name = TYPE_IOMMUFD_BACKEND,
 .parent = TYPE_OBJECT,
diff --git a/backends/trace-events b/backends/trace-events
index d45c6e31a67e..f83a276a4253 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -13,5 +13,6 @@ iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
 iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t 
size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d 
iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
 iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, 
uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d 
iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
 iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t 
size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, 
uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t 
out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u 
len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
 iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d 
ioas=%d (%d)"
 iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d 
(%d)"
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 7d39d7a5fa51..ca7ec45e725c 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -219,10 +219,82 @@ static int iommufd_cdev_detach_ioas_hwpt(VFIODevice 
*vbasedev, Error **errp)
 return ret;
 }
 
+static int iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
+

Re: [PATCH v3] hw/usb: fix xhci port notify

2024-02-12 Thread Nikita Ostrenkov

(+Michael)

ping
https://patchew.org/QEMU/20231117173916.3658-1-n.ostren...@gmail.com/

чт, 25 янв. 2024 г. в 23:06, Nikita Ostrenkov :

> ping
> https://patchew.org/QEMU/20231117173916.3658-1-n.ostren...@gmail.com/
>
> пн, 18 дек. 2023 г., 13:40 Nikita Ostrenkov :
>
>> ping
>> https://patchew.org/QEMU/20231117173916.3658-1-n.ostren...@gmail.com/
>>
>> пт, 17 нояб. 2023 г., 20:39 Nikita Ostrenkov :
>>
>>> From MCF5253 Reference manual
>>> https://www.nxp.com/docs/en/reference-manual/MCF5253RM.pdf
>>>
>>> Host mode: Port Change Detect. The controller sets this bit to a one
>>> when on any port a Connect Status occurs, a PortEnable/Disable Change
>>> occurs, an Over Current Change occurs, or the Force Port Resume bit is set
>>> as theresult of a J-K transition on the suspended port.
>>>
>>> Signed-off-by: Nikita Ostrenkov 
>>> ---
>>>  hw/usb/hcd-xhci.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
>>> index 4b60114207..1b2f4ac721 100644
>>> --- a/hw/usb/hcd-xhci.c
>>> +++ b/hw/usb/hcd-xhci.c
>>> @@ -2627,6 +2627,7 @@ static void xhci_port_notify(XHCIPort *port,
>>> uint32_t bits)
>>>  if (!xhci_running(port->xhci)) {
>>>  return;
>>>  }
>>> +port->xhci->usbsts |= USBSTS_PCD;
>>>  xhci_event(port->xhci, , 0);
>>>  }
>>>
>>> --
>>> 2.34.1
>>>
>>>

Re: [PATCH 13/14] migration: Use migrate_has_error() in close_return_path_on_source()

2024-02-12 Thread Cédric Le Goater


On 2/8/24 14:57, Fabiano Rosas wrote:

Cédric Le Goater  writes:


On 2/8/24 14:07, Fabiano Rosas wrote:

Cédric Le Goater  writes:


close_return_path_on_source() retrieves the migration error from the
the QEMUFile '->to_dst_file' to know if a shutdown is required. This
shutdown is required to exit the return-path thread. However, in
migrate_fd_cleanup(), '->to_dst_file' is cleaned up before calling
close_return_path_on_source() and the shutdown is never performed,
leaving the source and destination waiting for an event to occur.

Avoid relying on '->to_dst_file' and use migrate_has_error() instead.

Suggested-by: Peter Xu 
Signed-off-by: Cédric Le Goater 
---
   migration/migration.c | 3 +--
   1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 
d5f705ceef4c925589aa49335969672c0d761fa2..5f55af3d7624750ca416c4177781241b3e291e5d
 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2372,8 +2372,7 @@ static bool close_return_path_on_source(MigrationState 
*ms)
* cause it to unblock if it's stuck waiting for the destination.
*/
   WITH_QEMU_LOCK_GUARD(>qemu_file_lock) {
-if (ms->to_dst_file && ms->rp_state.from_dst_file &&
-qemu_file_get_error(ms->to_dst_file)) {
+if (migrate_has_error(ms) && ms->rp_state.from_dst_file) {
   qemu_file_shutdown(ms->rp_state.from_dst_file);
   }
   }


Hm, maybe Peter can help defend this, but this assumes that every
function that takes an 'f' and sets the file error also sets
migrate_set_error(). I'm not sure we have determined that, have we?


How could we check all the code path ? I agree it is difficult when
looking at the code :/


It would help if the thing wasn't called 'f' for the most part of the
code to begin with.

Whenever there's a file error at to_dst_file there's the chance that the
rp_state.from_dst_file got stuck. So we cannot ignore the file error.

Would it work if we checked it earlier during cleanup as you did
previously and then set the migration error?


Do you mean doing something similar to what is done in
source_return_path_thread() ?

if (qemu_file_get_error(s->to_dst_file)) {
qemu_file_get_error_obj(s->to_dst_file, );
if (err) {
migrate_set_error(ms, err);
error_free(err);
...

Yes. That would be safer I think.


Nevertheless, I am struggling to understand how qemu_file_set_error()
and migrate_set_error() fit together. I was expecting some kind of
synchronization  routine but there isn't it seems. Are they completely
orthogonal ? when should we use these routines and when not ?

My initial goal was to modify some of the memory handlers (log_global*)
and migration handlers to propagate errors at the QMP level and them
report to the management layer. This is growing in something bigger
and currently, I don't find a good approach to the problem.

The last two patches of this series try to fix the return-path thread
termination. Let's keep that for after.

Thanks,

C.

Re: [PATCH v3 07/11] hw/sh4/r2d: Realize IDE controller before accessing it

2024-02-12 Thread Yoshinori Sato

On Fri, 09 Feb 2024 03:12:40 +0900,
Philippe Mathieu-Daudé wrote:
> 
> We should not wire IRQs on unrealized device.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/sh4/r2d.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/sh4/r2d.c b/hw/sh4/r2d.c
> index e9f316a6ce..c73e8f49b8 100644
> --- a/hw/sh4/r2d.c
> +++ b/hw/sh4/r2d.c
> @@ -285,9 +285,9 @@ static void r2d_init(MachineState *machine)
>  dinfo = drive_get(IF_IDE, 0, 0);
>  dev = qdev_new("mmio-ide");
>  busdev = SYS_BUS_DEVICE(dev);
> -sysbus_connect_irq(busdev, 0, irq[CF_IDE]);
>  qdev_prop_set_uint32(dev, "shift", 1);
>  sysbus_realize_and_unref(busdev, _fatal);
> +sysbus_connect_irq(busdev, 0, irq[CF_IDE]);
>  sysbus_mmio_map(busdev, 0, 0x14001000);
>  sysbus_mmio_map(busdev, 1, 0x1400080c);
>  mmio_ide_init_drives(dev, dinfo, NULL);
> -- 
> 2.41.0
> 

Reviewed-by: Yoshinori Sato 

-- 
Yosinori Sato

possible deprecation and removal of some old QEMU Arm machine types (pxa2xx, omap, sa1110)

2024-02-12 Thread Peter Maydell

QEMU includes some models of old Arm machine types which are
a bit problematic for us because:
 * they're written in a very old way that uses numerous APIs that we
   would like to get away from (eg they don't use qdev, they use
   qemu_system_reset_request(), they use vmstate_register(), etc)
 * they've been that way for a decade plus and nobody particularly has
   stepped up to try to modernise the code (beyond some occasional
   work here and there)
 * we often don't have test cases for them, which means that if we
   do try to do the necessary refactoring work on them we have no
   idea if they even still work at all afterwards

All these machine types are also of hardware that has largely passed
away into history and where I would not be surprised to find that
e.g. the Linux kernel support was never tested on real hardware
any more.

So I would like to explore whether we can deprecate-and-drop
some or all of them. This would let us delete the code entirely
rather than spending a long time trying to bring it up to scratch
for a probably very small to nonexistent userbase. The aim of this
email is to see if anybody is still using any of these and would be
upset if they went away. Reports of "I tried to use this machine
type and it's just broken" are also interesting as they would
strongly suggest that the machine has no real users and can be
removed.

The machines I have in mind are:

PXA2xx machines:

akitaSharp SL-C1000 (Akita) PDA (PXA270)
borzoi   Sharp SL-C3100 (Borzoi) PDA (PXA270)
connex   Gumstix Connex (PXA255)
mainstoneMainstone II (PXA27x)
spitzSharp SL-C3000 (Spitz) PDA (PXA270)
terrier  Sharp SL-C3200 (Terrier) PDA (PXA270)
tosa Sharp SL-6000 (Tosa) PDA (PXA255)
verdex   Gumstix Verdex Pro XL6P COMs (PXA270)
z2   Zipit Z2 (PXA27x)

OMAP1 machines:

cheetah  Palm Tungsten|E aka. Cheetah PDA (OMAP310)
sx1  Siemens SX1 (OMAP310) V2
sx1-v1   Siemens SX1 (OMAP310) V1

OMAP2 machines:

n800 Nokia N800 tablet aka. RX-34 (OMAP2420)
n810 Nokia N810 tablet aka. RX-44 (OMAP2420)

The one SA1110 machine:

collie   Sharp SL-5500 (Collie) PDA (SA-1110)

Obviously if we can remove all the machines that used a given
SoC, that's much more effective than if we just delete one or two.

I don't have any test images for the SA1110 or OMAP1 machines,
so those are the ones I am most keen to be able to drop.
I do have test images for a few of the pxa2xx and the OMAP2 machines.

thanks
-- PMM

[PATCH] main-loop: Avoid some unnecessary poll calls

2024-02-12 Thread Ross Lagerwall via

A common pattern is seen where a timer fires, the callback does some
work, then rearms the timer which implicitly calls qemu_notify_event().

qemu_notify_event() is supposed to interrupt the main loop's poll() by
calling qemu_bh_schedule(). In the case that this is being called from a
main loop callback, the main loop is already not waiting on poll() and
instead it means the main loop does an addition iteration with a timeout
of 0 to handle the bottom half wakeup, before once again polling with
the expected timeout value.

Detect this situation by skipping the qemu_bh_schedule() call if the
default main context is currently owned by the caller. i.e. it is being
called as part of a poll / timer callback. Adjust the scope of the main
context acquire / release to cover the timer callbacks in
qemu_clock_run_all_timers().

Signed-off-by: Ross Lagerwall 
---
 util/main-loop.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/util/main-loop.c b/util/main-loop.c
index a0386cfeb60c..a2afbb7d0e13 100644
--- a/util/main-loop.c
+++ b/util/main-loop.c
@@ -145,10 +145,16 @@ AioContext *qemu_get_aio_context(void)
 
 void qemu_notify_event(void)
 {
+GMainContext *context;
+
 if (!qemu_aio_context) {
 return;
 }
-qemu_bh_schedule(qemu_notify_bh);
+
+context = g_main_context_default();
+if (!g_main_context_is_owner(context)) {
+qemu_bh_schedule(qemu_notify_bh);
+}
 }
 
 static GArray *gpollfds;
@@ -292,11 +298,8 @@ static void glib_pollfds_poll(void)
 
 static int os_host_main_loop_wait(int64_t timeout)
 {
-GMainContext *context = g_main_context_default();
 int ret;
 
-g_main_context_acquire(context);
-
 glib_pollfds_fill();
 
 bql_unlock();
@@ -309,8 +312,6 @@ static int os_host_main_loop_wait(int64_t timeout)
 
 glib_pollfds_poll();
 
-g_main_context_release(context);
-
 return ret;
 }
 #else
@@ -470,15 +471,12 @@ static int os_host_main_loop_wait(int64_t timeout)
 fd_set rfds, wfds, xfds;
 int nfds;
 
-g_main_context_acquire(context);
-
 /* XXX: need to suppress polling by better using win32 events */
 ret = 0;
 for (pe = first_polling_entry; pe != NULL; pe = pe->next) {
 ret |= pe->func(pe->opaque);
 }
 if (ret != 0) {
-g_main_context_release(context);
 return ret;
 }
 
@@ -538,8 +536,6 @@ static int os_host_main_loop_wait(int64_t timeout)
 g_main_context_dispatch(context);
 }
 
-g_main_context_release(context);
-
 return select_ret || g_poll_ret;
 }
 #endif
@@ -559,6 +555,7 @@ void main_loop_poll_remove_notifier(Notifier *notify)
 
 void main_loop_wait(int nonblocking)
 {
+GMainContext *context = g_main_context_default();
 MainLoopPoll mlpoll = {
 .state = MAIN_LOOP_POLL_FILL,
 .timeout = UINT32_MAX,
@@ -586,7 +583,10 @@ void main_loop_wait(int nonblocking)
   timerlistgroup_deadline_ns(
   _loop_tlg));
 
+g_main_context_acquire(context);
+
 ret = os_host_main_loop_wait(timeout_ns);
+
 mlpoll.state = ret < 0 ? MAIN_LOOP_POLL_ERR : MAIN_LOOP_POLL_OK;
 notifier_list_notify(_loop_poll_notifiers, );
 
@@ -598,6 +598,8 @@ void main_loop_wait(int nonblocking)
 icount_start_warp_timer();
 }
 qemu_clock_run_all_timers();
+
+g_main_context_release(context);
 }
 
 /* Functions to operate on the main QEMU AioContext.  */
-- 
2.43.0

Re: [PATCH v3 03/11] hw/rx/rx62n: Only call qdev_get_gpio_in() when necessary

2024-02-12 Thread Yoshinori Sato

On Fri, 09 Feb 2024 03:12:36 +0900,
Philippe Mathieu-Daudé wrote:
> 
> Instead of filling an array of all the possible IRQs, only call
> qdev_get_gpio_in() when an IRQ is used. Remove the array from
> RX62NState. Doing so we avoid calling qdev_get_gpio_in() on an
> unrealized device.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/hw/rx/rx62n.h |  1 -
>  hw/rx/rx62n.c | 16 
>  2 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/include/hw/rx/rx62n.h b/include/hw/rx/rx62n.h
> index bcda583ab3..766fe0e435 100644
> --- a/include/hw/rx/rx62n.h
> +++ b/include/hw/rx/rx62n.h
> @@ -67,7 +67,6 @@ struct RX62NState {
>  MemoryRegion iomem2;
>  MemoryRegion iomem3;
>  MemoryRegion c_flash;
> -qemu_irq irq[NR_IRQS];
>  
>  /* Input Clock (XTAL) frequency */
>  uint32_t xtal_freq_hz;
> diff --git a/hw/rx/rx62n.c b/hw/rx/rx62n.c
> index d3f61a6837..560f53a58a 100644
> --- a/hw/rx/rx62n.c
> +++ b/hw/rx/rx62n.c
> @@ -148,14 +148,11 @@ static void register_icu(RX62NState *s)
>  qlist_append_int(trigger_level, levelirq[i]);
>  }
>  qdev_prop_set_array(DEVICE(icu), "trigger-level", trigger_level);
> -
> -for (i = 0; i < NR_IRQS; i++) {
> -s->irq[i] = qdev_get_gpio_in(DEVICE(icu), i);
> -}
>  sysbus_realize(icu, _abort);
> +
>  sysbus_connect_irq(icu, 0, qdev_get_gpio_in(DEVICE(>cpu), 
> RX_CPU_IRQ));
>  sysbus_connect_irq(icu, 1, qdev_get_gpio_in(DEVICE(>cpu), 
> RX_CPU_FIR));
> -sysbus_connect_irq(icu, 2, s->irq[SWI]);
> +sysbus_connect_irq(icu, 2, qdev_get_gpio_in(DEVICE(>icu), SWI));
>  sysbus_mmio_map(icu, 0, RX62N_ICU_BASE);
>  }
>  
> @@ -172,7 +169,8 @@ static void register_tmr(RX62NState *s, int unit)
>  
>  irqbase = RX62N_TMR_IRQ + TMR_NR_IRQ * unit;
>  for (i = 0; i < TMR_NR_IRQ; i++) {
> -sysbus_connect_irq(tmr, i, s->irq[irqbase + i]);
> +sysbus_connect_irq(tmr, i,
> +   qdev_get_gpio_in(DEVICE(>icu), irqbase + i));
>  }
>  sysbus_mmio_map(tmr, 0, RX62N_TMR_BASE + unit * 0x10);
>  }
> @@ -190,7 +188,8 @@ static void register_cmt(RX62NState *s, int unit)
>  
>  irqbase = RX62N_CMT_IRQ + CMT_NR_IRQ * unit;
>  for (i = 0; i < CMT_NR_IRQ; i++) {
> -sysbus_connect_irq(cmt, i, s->irq[irqbase + i]);
> +sysbus_connect_irq(cmt, i,
> +   qdev_get_gpio_in(DEVICE(>icu), irqbase + i));
>  }
>  sysbus_mmio_map(cmt, 0, RX62N_CMT_BASE + unit * 0x10);
>  }
> @@ -209,7 +208,8 @@ static void register_sci(RX62NState *s, int unit)
>  
>  irqbase = RX62N_SCI_IRQ + SCI_NR_IRQ * unit;
>  for (i = 0; i < SCI_NR_IRQ; i++) {
> -sysbus_connect_irq(sci, i, s->irq[irqbase + i]);
> +sysbus_connect_irq(sci, i,
> +   qdev_get_gpio_in(DEVICE(>icu), irqbase + i));
>  }
>  sysbus_mmio_map(sci, 0, RX62N_SCI_BASE + unit * 0x08);
>  }
> -- 
> 2.41.0
> 

Reviewed-by: Yoshinori Sato 

-- 
Yosinori Sato

Re: [PATCH v3 02/11] hw/rx/rx62n: Reduce inclusion of 'qemu/units.h'

2024-02-12 Thread Yoshinori Sato

On Fri, 09 Feb 2024 03:12:35 +0900,
Philippe Mathieu-Daudé wrote:
> 
> "qemu/units.h" is not used in the "hw/rx/rx62n.h"
> header, include it in the source where it is.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  include/hw/rx/rx62n.h | 1 -
>  hw/rx/rx-gdbsim.c | 1 +
>  hw/rx/rx62n.c | 1 +
>  3 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/rx/rx62n.h b/include/hw/rx/rx62n.h
> index 73ceeb58e5..bcda583ab3 100644
> --- a/include/hw/rx/rx62n.h
> +++ b/include/hw/rx/rx62n.h
> @@ -29,7 +29,6 @@
>  #include "hw/timer/renesas_tmr.h"
>  #include "hw/timer/renesas_cmt.h"
>  #include "hw/char/renesas_sci.h"
> -#include "qemu/units.h"
>  #include "qom/object.h"
>  
>  #define TYPE_RX62N_MCU "rx62n-mcu"
> diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c
> index 47c17026c7..bb4746c556 100644
> --- a/hw/rx/rx-gdbsim.c
> +++ b/hw/rx/rx-gdbsim.c
> @@ -20,6 +20,7 @@
>  #include "qemu/cutils.h"
>  #include "qemu/error-report.h"
>  #include "qemu/guest-random.h"
> +#include "qemu/units.h"
>  #include "qapi/error.h"
>  #include "hw/loader.h"
>  #include "hw/rx/rx62n.h"
> diff --git a/hw/rx/rx62n.c b/hw/rx/rx62n.c
> index 4dc44afd9d..d3f61a6837 100644
> --- a/hw/rx/rx62n.c
> +++ b/hw/rx/rx62n.c
> @@ -23,6 +23,7 @@
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
>  #include "qemu/error-report.h"
> +#include "qemu/units.h"
>  #include "hw/rx/rx62n.h"
>  #include "hw/loader.h"
>  #include "hw/sysbus.h"
> -- 
> 2.41.0
> 

Reviewed-by: Yoshinori Sato 

-- 
Yosinori Sato

Re:Re: [PATCH] This patch implements several Octeon +/II instructions.

2024-02-12 Thread owl129

Hi > How can we test it? Is there any distribution producing kernel for
> Octeon+/2? Per https://github.com/MarvellEmbeddedProcessors/marvell-dpdk
>  I understand there could be Linux and FreeBSD, is that correct?Actually I 
> don't know how to fully test each intruction. 


To the best of my knowledge, the Octeon Instruction is optimized for networking 
security/application processor.
And the latest reference manual is not public available. 
(https://dokumen.tips/documents/cavium-networks-octeon-plus-cn50xx-hardware-2008-cavium-networks-octeon-plus.html?page=1
 )


I find the Instruction specification from Marvell's toolchain and Vargrind's 
vex ir translation 
(https://sourceware.org/git/?p=valgrind.git;a=blob;f=VEX/priv/guest_mips_toIR.c;h=1285edad0b83b0f0a6b21fc63d2235d50f94d204;hb=HEAD#l2909)
I have successfully emulated an ELF binary compiled for Cavium (Marvell) on x86 
architecture.


Can you help or give me some suggestions about testing?






Best


owl129

[PATCH v3 7/7] pcie_sriov: Release VFs failed to realize

2024-02-12 Thread Akihiko Odaki

Release VFs failed to realize just as we do in unregister_vfs().

Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization 
(SR/IOV)")
Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 9ba34cf8f8ed..9d668b8d6c17 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -91,6 +91,8 @@ bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset,
 vf->exp.sriov_vf.vf_number = i;
 
 if (!qdev_realize(>qdev, bus, errp)) {
+object_unparent(OBJECT(vf));
+object_unref(vf);
 unrealize_vfs(dev, i);
 return false;
 }

-- 
2.43.0

1 2 >

1 - 100 of 164 matches

Mail list logo