Re: [PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD

2023-11-30 Thread Harsh Prateek Bora




On 12/1/23 01:57, Stefan Hajnoczi wrote:

On Thu, Nov 30, 2023 at 10:14:47AM +0100, Ilya Leoshkevich wrote:

On Wed, 2023-11-29 at 16:26 -0500, Stefan Hajnoczi wrote:

The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead, it is already widely used and unambiguous.

Signed-off-by: Stefan Hajnoczi 
---
  include/qemu/main-loop.h  | 20 ++--
  hw/i386/kvm/xen_evtchn.c  | 14 +++---
  hw/i386/kvm/xen_gnttab.c  |  2 +-
  hw/mips/mips_int.c    |  2 +-
  hw/ppc/ppc.c  |  2 +-
  target/i386/kvm/xen-emu.c |  2 +-
  target/ppc/excp_helper.c  |  2 +-
  target/ppc/helper_regs.c  |  2 +-
  target/riscv/cpu_helper.c |  4 ++--
  9 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index d6f75e57bd..0b6a3e4824 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -344,13 +344,13 @@ void qemu_bql_lock_impl(const char *file, int
line);
  void qemu_bql_unlock(void);
  
  /**

- * QEMU_IOTHREAD_LOCK_GUARD
+ * QEMU_BQL_LOCK_GUARD
   *
- * Wrap a block of code in a conditional
qemu_mutex_{lock,unlock}_iothread.
+ * Wrap a block of code in a conditional qemu_bql_{lock,unlock}.
   */
-typedef struct IOThreadLockAuto IOThreadLockAuto;
+typedef struct BQLLockAuto BQLLockAuto;
  
-static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char

*file,
+static inline BQLLockAuto *qemu_bql_auto_lock(const char *file,
  int line)


The padding is not correct anymore.


Good point, I didn't check the formatting after search-and-replace. I
will fix this across the patch series in v2.



Yeh, some comments in 5/6 and 6/6 can also make full use of 80 char 
width after search-replace effect.


regards,
Harsh


Stefan




Re: [PATCH 9.0 00/13] Consolidate common vdpa members in VhostVDPAShared

2023-11-30 Thread Jason Wang
On Sat, Nov 25, 2023 at 1:14 AM Eugenio Pérez  wrote:
>
> Current memory operations like pinning may take a lot of time at the
> destination.  Currently they are done after the source of the migration is
> stopped, and before the workload is resumed at the destination.  This is a
> period where neigher traffic can flow, nor the VM workload can continue
> (downtime).
>
> We can do better as we know the memory layout of the guest RAM at the
> destination from the moment the migration starts.  Moving that operation 
> allows
> QEMU to communicate the kernel the maps while the workload is still running in
> the source, so Linux can start mapping them.  Ideally, all IOMMU is 
> configured,
> but if the vDPA parent driver uses on-chip IOMMU and .set_map we're still
> saving all the pinning time.
>
> This is a first required step to consolidate all the members in a common
> struct.  This is needed because the destination does not know what vhost_vdpa
> struct will have the registered listener member, so it is easier to place them
> in a shared struct rather to keep them in vhost_vdpa struct.
>
> v1 from RFC:
> * Fix vhost_vdpa_net_cvq_start checking for always_svq instead of
>   shadow_data.  This could cause CVQ not being shadowed if
>   vhost_vdpa_net_cvq_start was called in the middle of a migration.

With the renaming of the VhostVDPAShared to VhostVDPAParent.

Acked-by: Jason Wang 

Thanks

>
> Eugenio Pérez (13):
>   vdpa: add VhostVDPAShared
>   vdpa: move iova tree to the shared struct
>   vdpa: move iova_range to vhost_vdpa_shared
>   vdpa: move shadow_data to vhost_vdpa_shared
>   vdpa: use vdpa shared for tracing
>   vdpa: move file descriptor to vhost_vdpa_shared
>   vdpa: move iotlb_batch_begin_sent to vhost_vdpa_shared
>   vdpa: move backend_cap to vhost_vdpa_shared
>   vdpa: remove msg type of vhost_vdpa
>   vdpa: move iommu_list to vhost_vdpa_shared
>   vdpa: use VhostVDPAShared in vdpa_dma_map and unmap
>   vdpa: use dev_shared in vdpa_iommu
>   vdpa: move memory listener to vhost_vdpa_shared
>
>  include/hw/virtio/vhost-vdpa.h |  36 +---
>  hw/virtio/vdpa-dev.c   |   7 +-
>  hw/virtio/vhost-vdpa.c | 160 +
>  net/vhost-vdpa.c   | 117 
>  hw/virtio/trace-events |  14 +--
>  5 files changed, 174 insertions(+), 160 deletions(-)
>
> --
> 2.39.3
>
>




Re: [PATCH 9.0 01/13] vdpa: add VhostVDPAShared

2023-11-30 Thread Eugenio Perez Martin
On Fri, Dec 1, 2023 at 6:35 AM Jason Wang  wrote:
>
> On Sat, Nov 25, 2023 at 1:14 AM Eugenio Pérez  wrote:
> >
> > It will hold properties shared among all vhost_vdpa instances associated
> > with of the same device.  For example, we just need one iova_tree or one
> > memory listener for the entire device.
> >
> > Next patches will register the vhost_vdpa memory listener at the
> > beginning of the VM migration at the destination. This enables QEMU to
> > map the memory to the device before stopping the VM at the source,
> > instead of doing while both source and destination are stopped, thus
> > minimizing the downtime.
> >
> > However, the destination QEMU is unaware of which vhost_vdpa struct will
> > register its memory_listener.  If the source guest has CVQ enabled, it
> > will be the one associated with the CVQ.  Otherwise, it will be the
> > first one.
> >
> > Save the memory operations related members in a common place rather than
> > always in the first / last vhost_vdpa.
>
> Great.
>
> Patch looks good but I think we probably need a better name like
> VhostVDPAParent?
>

Sure, I'm ok with the renaming. I'll change it for v2.

Thanks!

> And it would be better in the future if we can convert it to QOM.
>
> Thanks
>
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >  include/hw/virtio/vhost-vdpa.h |  5 +
> >  net/vhost-vdpa.c   | 24 ++--
> >  2 files changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> > index 5407d54fd7..eb1a56d75a 100644
> > --- a/include/hw/virtio/vhost-vdpa.h
> > +++ b/include/hw/virtio/vhost-vdpa.h
> > @@ -30,6 +30,10 @@ typedef struct VhostVDPAHostNotifier {
> >  void *addr;
> >  } VhostVDPAHostNotifier;
> >
> > +/* Info shared by all vhost_vdpa device models */
> > +typedef struct vhost_vdpa_shared {
> > +} VhostVDPAShared;
> > +
> >  typedef struct vhost_vdpa {
> >  int device_fd;
> >  int index;
> > @@ -46,6 +50,7 @@ typedef struct vhost_vdpa {
> >  bool suspended;
> >  /* IOVA mapping used by the Shadow Virtqueue */
> >  VhostIOVATree *iova_tree;
> > +VhostVDPAShared *shared;
> >  GPtrArray *shadow_vqs;
> >  const VhostShadowVirtqueueOps *shadow_vq_ops;
> >  void *shadow_vq_ops_opaque;
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index d0614d7954..8b661b9e6d 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -240,6 +240,10 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
> >  qemu_close(s->vhost_vdpa.device_fd);
> >  s->vhost_vdpa.device_fd = -1;
> >  }
> > +if (s->vhost_vdpa.index != 0) {
> > +return;
> > +}
> > +g_free(s->vhost_vdpa.shared);
> >  }
> >
> >  /** Dummy SetSteeringEBPF to support RSS for vhost-vdpa backend  */
> > @@ -1661,6 +1665,7 @@ static NetClientState 
> > *net_vhost_vdpa_init(NetClientState *peer,
> > bool svq,
> > struct vhost_vdpa_iova_range 
> > iova_range,
> > uint64_t features,
> > +   VhostVDPAShared *shared,
> > Error **errp)
> >  {
> >  NetClientState *nc = NULL;
> > @@ -1696,6 +1701,7 @@ static NetClientState 
> > *net_vhost_vdpa_init(NetClientState *peer,
> >  if (queue_pair_index == 0) {
> >  vhost_vdpa_net_valid_svq_features(features,
> >
> > >vhost_vdpa.migration_blocker);
> > +s->vhost_vdpa.shared = g_new0(VhostVDPAShared, 1);
> >  } else if (!is_datapath) {
> >  s->cvq_cmd_out_buffer = mmap(NULL, 
> > vhost_vdpa_net_cvq_cmd_page_len(),
> >   PROT_READ | PROT_WRITE,
> > @@ -1708,11 +1714,16 @@ static NetClientState 
> > *net_vhost_vdpa_init(NetClientState *peer,
> >  s->vhost_vdpa.shadow_vq_ops_opaque = s;
> >  s->cvq_isolated = cvq_isolated;
> >  }
> > +if (queue_pair_index != 0) {
> > +s->vhost_vdpa.shared = shared;
> > +}
> > +
> >  ret = vhost_vdpa_add(nc, (void *)>vhost_vdpa, queue_pair_index, 
> > nvqs);
> >  if (ret) {
> >  qemu_del_net_client(nc);
> >  return NULL;
> >  }
> > +
> >  return nc;
> >  }
> >
> > @@ -1824,17 +1835,26 @@ int net_init_vhost_vdpa(const Netdev *netdev, const 
> > char *name,
> >  ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
> >
> >  for (i = 0; i < queue_pairs; i++) {
> > +VhostVDPAShared *shared = NULL;
> > +
> > +if (i) {
> > +shared = DO_UPCAST(VhostVDPAState, nc, 
> > ncs[0])->vhost_vdpa.shared;
> > +}
> >  ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >   vdpa_device_fd, i, 2, true, 
> > opts->x_svq,
> > - iova_range, features, errp);
> > + 

Re: [PATCH v4] migration: Plug memory leak with migration URIs

2023-11-30 Thread Markus Armbruster
Peter Xu  writes:

> On Thu, Nov 30, 2023 at 07:35:43PM +0100, Markus Armbruster wrote:
>> Peter Xu  writes:
>> 
>> > On Wed, Nov 29, 2023 at 08:43:01PM +, Het Gala wrote:
>> >> migrate_uri_parse() allocates memory to 'channel' if the user
>> >> opts for old syntax - uri, which is leaked because there is no
>> >> code for freeing 'channel'.
>> >> So, free channel to avoid memory leak in case where 'channels'
>> >> is empty and uri parsing is required.
>> >> 
>> >> Fixes: 5994024f ("migration: Implement MigrateChannelList to qmp 
>> >> migration flow")
>> >> Signed-off-by: Het Gala 
>> >> Suggested-by: Markus Armbruster 
>> >
>> > Reviewed-by: Peter Xu 
>> >
>> >> @@ -533,18 +533,18 @@ static void qemu_start_incoming_migration(const 
>> >> char *uri, bool has_channels,
  -MigrationChannel *channel = NULL;
  +g_autoptr(MigrationChannel) channel = NULL;
   MigrationAddress *addr = NULL;
   MigrationIncomingState *mis = migration_incoming_get_current();

   /*
* Having preliminary checks for uri and channel
*/
   if (uri && has_channels) {
   error_setg(errp, "'uri' and 'channels' arguments are mutually "
  "exclusive; exactly one of the two should be present 
in "
  "'migrate-incoming' qmp command ");
   return;
   } else if (channels) {
   /* To verify that Migrate channel list has only item */
   if (channels->next) {
>> >>  error_setg(errp, "Channel list has more than one entries");
>> >>  return;
>> >>  }
>> >> -channel = channels->value;
>> >> +addr = channels->value->addr;
>> >>  } else if (uri) {
>> >>  /* caller uses the old URI syntax */
>> >>  if (!migrate_uri_parse(uri, , errp)) {
>> >>  return;
>> >>  }
>> >> +addr = channel->addr;
>> >>  } else {
>> >>  error_setg(errp, "neither 'uri' or 'channels' argument are "
>> >> "specified in 'migrate-incoming' qmp command ");
>> >>  return;
>> >>  }
>> >> -addr = channel->addr;
>> >
>> > Why these "addr" lines need change?  Won't that behave the same as before?
>> 
>> In the first case, @channel is now null.  If we left the assignment to
>> @addr alone, it would crash.  Clearer now?
>
> Is it this one?
>
> if (uri && has_channels) {
> error_setg(errp, "'uri' and 'channels' arguments are mutually "
>"exclusive; exactly one of the two should be present in "
>"'migrate-incoming' qmp command ");
> return;
> }
>
> It returns already?

I meant the first visible case, i.e. if (channels).  Sorry for being
less than clear!

The problem is to free the result of migrate_uri_parse().

The patch's solution is to use @channel *only* for holding that result,
so it can be g_autoptr: drop channel = channels->value from the if
(channels) conditional.

Since this breaks addr = channel->addr, we move that assignment into the
conditionals that reach it, which lets us unbreak it the if (channels)
one.




Re: [PATCH v3] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER

2023-11-30 Thread Gavin Shan

Hi Shaoqin,

On 11/29/23 14:08, Shaoqin Huang wrote:

The KVM_ARM_VCPU_PMU_V3_FILTER provide the ability to let the VMM decide
which PMU events are provided to the guest. Add a new option
`pmu-filter` as -accel sub-option to set the PMU Event Filtering.
Without the filter, the KVM will expose all events from the host to
guest by default.

The `pmu-filter` has such format:

   pmu-filter="{A,D}:start-end[;{A,D}:start-end...]"

The A means "allow" and D means "deny", start is the first event of the
range and the end is the last one. The first registered range defines
the global policy(global ALLOW if the first @action is DENY, global DENY
if the first @action is ALLOW). The start and end only support hex
format now. For example:

   pmu-filter="A:0x11-0x11;A:0x23-0x3a;D:0x30-0x30"

Since the first action is allow, we have a global deny policy. It
will allow event 0x11 (The cycle counter), events 0x23 to 0x3a is
also allowed except the event 0x30 is denied, and all the other events
are disallowed.

Here is an real example shows how to use the PMU Event Filtering, when
we launch a guest by use kvm, add such command line:

   # qemu-system-aarch64 \
-accel kvm,pmu-filter="D:0x11-0x11"

Since the first action is deny, we have a global allow policy. This
disables the filtering of the cycle counter (event 0x11 being CPU_CYCLES).

And then in guest, use the perf to count the cycle:

   # perf stat sleep 1

Performance counter stats for 'sleep 1':

   1.22 msec task-clock   #0.001 CPUs 
utilized
  1  context-switches #  820.695 /sec
  0  cpu-migrations   #0.000 /sec
 55  page-faults  #   45.138 K/sec
  cycles
1128954  instructions
 227031  branches #  186.323 M/sec
   8686  branch-misses#3.83% of all 
branches

1.002492480 seconds time elapsed

0.001752000 seconds user
0.0 seconds sys

As we can see, the cycle counter has been disabled in the guest, but
other pmu events are still work.

Signed-off-by: Shaoqin Huang 
---
v2->v3:
   - Improve commits message, use kernel doc wording, add more explaination on
 filter example, fix some typo error.[Eric]
   - Add g_free() in kvm_arch_set_pmu_filter() to prevent memory leak. [Eric]
   - Add more precise error message report.  [Eric]
   - In options doc, add pmu-filter rely on KVM_ARM_VCPU_PMU_V3_FILTER support 
in
 KVM.[Eric]

v1->v2:
   - Add more description for allow and deny meaning in
 commit message. [Sebastian]
   - Small improvement.  [Sebastian]

v2: https://lore.kernel.org/all/20231117060838.39723-1-shahu...@redhat.com/
v1: https://lore.kernel.org/all/20231113081713.153615-1-shahu...@redhat.com/
---
  include/sysemu/kvm_int.h |  1 +
  qemu-options.hx  | 21 +
  target/arm/kvm.c | 23 ++
  target/arm/kvm64.c   | 68 
  4 files changed, 113 insertions(+)

diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index fd846394be..8f4601474f 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -120,6 +120,7 @@ struct KVMState
  uint32_t xen_caps;
  uint16_t xen_gnttab_max_frames;
  uint16_t xen_evtchn_max_pirq;
+char *kvm_pmu_filter;
  };
  
  void kvm_memory_listener_register(KVMState *s, KVMMemoryListener *kml,

diff --git a/qemu-options.hx b/qemu-options.hx
index 42fd09e4de..8b721d6668 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -187,6 +187,7 @@ DEF("accel", HAS_ARG, QEMU_OPTION_accel,
  "tb-size=n (TCG translation block cache size)\n"
  "dirty-ring-size=n (KVM dirty ring GFN count, default 
0)\n"
  "eager-split-size=n (KVM Eager Page Split chunk size, default 
0, disabled. ARM only)\n"
+"pmu-filter={A,D}:start-end[;...] (KVM PMU Event Filter, 
default no filter. ARM only)\n"

  ^^^

Potential alignment issue, or the email isn't shown for me correctly.
Besides, why not follow the pattern in the commit log, which is nicer
than what's of being:

pmu-filter={A,D}:start-end[;...]

to

pmu-filter="{A,D}:start-end[;{A,D}:start-end...]


  "notify-vmexit=run|internal-error|disable,notify-window=n 
(enable notify VM exit and set notify window, x86 only)\n"
  "thread=single|multi (enable multi-threaded TCG)\n", 
QEMU_ARCH_ALL)
  SRST
@@ -259,6 +260,26 @@ SRST
  impact on the memory. By default, this feature is disabled
  (eager-split-size=0).
  
+``pmu-filter={A,D}:start-end[;...]``

+KVM implements pmu event filtering to prevent a guest from 

Re: [PATCH 9.0 01/13] vdpa: add VhostVDPAShared

2023-11-30 Thread Jason Wang
On Sat, Nov 25, 2023 at 1:14 AM Eugenio Pérez  wrote:
>
> It will hold properties shared among all vhost_vdpa instances associated
> with of the same device.  For example, we just need one iova_tree or one
> memory listener for the entire device.
>
> Next patches will register the vhost_vdpa memory listener at the
> beginning of the VM migration at the destination. This enables QEMU to
> map the memory to the device before stopping the VM at the source,
> instead of doing while both source and destination are stopped, thus
> minimizing the downtime.
>
> However, the destination QEMU is unaware of which vhost_vdpa struct will
> register its memory_listener.  If the source guest has CVQ enabled, it
> will be the one associated with the CVQ.  Otherwise, it will be the
> first one.
>
> Save the memory operations related members in a common place rather than
> always in the first / last vhost_vdpa.

Great.

Patch looks good but I think we probably need a better name like
VhostVDPAParent?

And it would be better in the future if we can convert it to QOM.

Thanks

>
> Signed-off-by: Eugenio Pérez 
> ---
>  include/hw/virtio/vhost-vdpa.h |  5 +
>  net/vhost-vdpa.c   | 24 ++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
> index 5407d54fd7..eb1a56d75a 100644
> --- a/include/hw/virtio/vhost-vdpa.h
> +++ b/include/hw/virtio/vhost-vdpa.h
> @@ -30,6 +30,10 @@ typedef struct VhostVDPAHostNotifier {
>  void *addr;
>  } VhostVDPAHostNotifier;
>
> +/* Info shared by all vhost_vdpa device models */
> +typedef struct vhost_vdpa_shared {
> +} VhostVDPAShared;
> +
>  typedef struct vhost_vdpa {
>  int device_fd;
>  int index;
> @@ -46,6 +50,7 @@ typedef struct vhost_vdpa {
>  bool suspended;
>  /* IOVA mapping used by the Shadow Virtqueue */
>  VhostIOVATree *iova_tree;
> +VhostVDPAShared *shared;
>  GPtrArray *shadow_vqs;
>  const VhostShadowVirtqueueOps *shadow_vq_ops;
>  void *shadow_vq_ops_opaque;
> diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> index d0614d7954..8b661b9e6d 100644
> --- a/net/vhost-vdpa.c
> +++ b/net/vhost-vdpa.c
> @@ -240,6 +240,10 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
>  qemu_close(s->vhost_vdpa.device_fd);
>  s->vhost_vdpa.device_fd = -1;
>  }
> +if (s->vhost_vdpa.index != 0) {
> +return;
> +}
> +g_free(s->vhost_vdpa.shared);
>  }
>
>  /** Dummy SetSteeringEBPF to support RSS for vhost-vdpa backend  */
> @@ -1661,6 +1665,7 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
> bool svq,
> struct vhost_vdpa_iova_range 
> iova_range,
> uint64_t features,
> +   VhostVDPAShared *shared,
> Error **errp)
>  {
>  NetClientState *nc = NULL;
> @@ -1696,6 +1701,7 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
>  if (queue_pair_index == 0) {
>  vhost_vdpa_net_valid_svq_features(features,
>>vhost_vdpa.migration_blocker);
> +s->vhost_vdpa.shared = g_new0(VhostVDPAShared, 1);
>  } else if (!is_datapath) {
>  s->cvq_cmd_out_buffer = mmap(NULL, vhost_vdpa_net_cvq_cmd_page_len(),
>   PROT_READ | PROT_WRITE,
> @@ -1708,11 +1714,16 @@ static NetClientState 
> *net_vhost_vdpa_init(NetClientState *peer,
>  s->vhost_vdpa.shadow_vq_ops_opaque = s;
>  s->cvq_isolated = cvq_isolated;
>  }
> +if (queue_pair_index != 0) {
> +s->vhost_vdpa.shared = shared;
> +}
> +
>  ret = vhost_vdpa_add(nc, (void *)>vhost_vdpa, queue_pair_index, nvqs);
>  if (ret) {
>  qemu_del_net_client(nc);
>  return NULL;
>  }
> +
>  return nc;
>  }
>
> @@ -1824,17 +1835,26 @@ int net_init_vhost_vdpa(const Netdev *netdev, const 
> char *name,
>  ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
>
>  for (i = 0; i < queue_pairs; i++) {
> +VhostVDPAShared *shared = NULL;
> +
> +if (i) {
> +shared = DO_UPCAST(VhostVDPAState, nc, 
> ncs[0])->vhost_vdpa.shared;
> +}
>  ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>   vdpa_device_fd, i, 2, true, opts->x_svq,
> - iova_range, features, errp);
> + iova_range, features, shared, errp);
>  if (!ncs[i])
>  goto err;
>  }
>
>  if (has_cvq) {
> +VhostVDPAState *s0 = DO_UPCAST(VhostVDPAState, nc, ncs[0]);
> +VhostVDPAShared *shared = s0->vhost_vdpa.shared;
> +
>  nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
>   vdpa_device_fd, i, 1, 

Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Harsh Prateek Bora

Hi Stefan,

On 11/30/23 02:56, Stefan Hajnoczi wrote:

The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void qemu_bql_lock(void)
- void qemu_bql_unlock(void)
- bool qemu_bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi 
---
  include/block/aio-wait.h |   2 +-
  include/qemu/main-loop.h |  26 +++---
  accel/accel-blocker.c|  10 +--
  accel/dummy-cpus.c   |   8 +-
  accel/hvf/hvf-accel-ops.c|   4 +-
  accel/kvm/kvm-accel-ops.c|   4 +-
  accel/kvm/kvm-all.c  |  22 ++---
  accel/tcg/cpu-exec.c |  26 +++---
  accel/tcg/cputlb.c   |  16 ++--
  accel/tcg/tcg-accel-ops-icount.c |   4 +-
  accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
  accel/tcg/tcg-accel-ops-rr.c |  14 ++--
  accel/tcg/tcg-accel-ops.c|   2 +-
  accel/tcg/translate-all.c|   2 +-
  cpu-common.c |   4 +-
  dump/dump.c  |   4 +-
  hw/core/cpu-common.c |   6 +-
  hw/i386/intel_iommu.c|   6 +-
  hw/i386/kvm/xen_evtchn.c |  16 ++--
  hw/i386/kvm/xen_overlay.c|   2 +-
  hw/i386/kvm/xen_xenstore.c   |   2 +-
  hw/intc/arm_gicv3_cpuif.c|   2 +-
  hw/intc/s390_flic.c  |  18 ++--
  hw/misc/edu.c|   4 +-
  hw/misc/imx6_src.c   |   2 +-
  hw/misc/imx7_src.c   |   2 +-
  hw/net/xen_nic.c |   8 +-
  hw/ppc/pegasos2.c|   2 +-
  hw/ppc/ppc.c |   4 +-
  hw/ppc/spapr.c   |   2 +-
  hw/ppc/spapr_rng.c   |   4 +-
  hw/ppc/spapr_softmmu.c   |   4 +-
  hw/remote/mpqemu-link.c  |  12 +--
  hw/remote/vfio-user-obj.c|   2 +-
  hw/s390x/s390-skeys.c|   2 +-
  migration/block-dirty-bitmap.c   |   4 +-
  migration/block.c|  16 ++--
  migration/colo.c |  60 +++---
  migration/dirtyrate.c|  12 +--
  migration/migration.c|  52 ++--
  migration/ram.c  |  12 +--
  replay/replay-internal.c |   2 +-
  semihosting/console.c|   8 +-
  stubs/iothread-lock.c|   6 +-
  system/cpu-throttle.c|   4 +-
  system/cpus.c|  28 +++
  system/dirtylimit.c  |   4 +-
  system/memory.c  |   2 +-
  system/physmem.c |   8 +-
  system/runstate.c|   2 +-
  system/watchpoint.c  |   4 +-
  target/arm/arm-powerctl.c|  14 ++--
  target/arm/helper.c  |   4 +-
  target/arm/hvf/hvf.c |   8 +-
  target/arm/kvm.c |   4 +-
  target/arm/kvm64.c   |   4 +-
  target/arm/ptw.c |   6 +-
  target/arm/tcg/helper-a64.c  |   8 +-
  target/arm/tcg/m_helper.c|   4 +-
  target/arm/tcg/op_helper.c   |  24 +++---
  target/arm/tcg/psci.c|   2 +-
  target/hppa/int_helper.c |   8 +-
  target/i386/hvf/hvf.c|   6 +-
  target/i386/kvm/hyperv.c |   4 +-
  target/i386/kvm/kvm.c|  28 +++
  target/i386/kvm/xen-emu.c|  14 ++--
  target/i386/nvmm/nvmm-accel-ops.c|   4 +-
  target/i386/nvmm/nvmm-all.c  |  20 ++---
  target/i386/tcg/sysemu/fpu_helper.c  |   6 +-
  target/i386/tcg/sysemu/misc_helper.c |   4 +-
  target/i386/whpx/whpx-accel-ops.c|   4 +-
  target/i386/whpx/whpx-all.c  |  24 +++---
  target/loongarch/csr_helper.c|   4 +-
  target/mips/kvm.c|   4 +-
  target/mips/tcg/sysemu/cp0_helper.c  |   4 +-
  target/openrisc/sys_helper.c |  16 ++--
  target/ppc/excp_helper.c |  12 +--
  target/ppc/kvm.c |   4 +-
  target/ppc/misc_helper.c |   8 +-
  target/ppc/timebase_helper.c |   8 +-
  target/s390x/kvm/kvm.c   |   4 +-
  

Re: [PATCH v2 04/14] spapr: nested: Introduce cap-nested-papr for Nested PAPR API

2023-11-30 Thread Harsh Prateek Bora




On 11/30/23 16:41, Nicholas Piggin wrote:

On Thu Nov 30, 2023 at 4:19 PM AEST, Harsh Prateek Bora wrote:



On 11/29/23 09:31, Nicholas Piggin wrote:

On Thu Oct 12, 2023 at 8:49 PM AEST, Harsh Prateek Bora wrote:

Introduce a SPAPR capability cap-nested-papr which provides a nested
HV facility to the guest. This is similar to cap-nested-hv, but uses
a different (incompatible) API and so they are mutually exclusive.
This new API is to enable support for KVM on PowerVM and recently the
Linux kernel side patches have been accepted upstream as well [1].
Support for related hcalls is being added in next set of patches.


We do want to be able to support both APIs on a per-guest basis. It
doesn't look like the vmstate bits will be a problem, both could be
enabled if the logic permitted it and that wouldn't cause a
compatibility problem I think?



I am not sure if it makes sense to have both APIs working in parallel
for a nested guest.


Not for the nested guest, but for the nested KVM host (i.e., the direct
pseries guest running QEMU). QEMU doesn't know ahead of time which API
might be used by the OS.


Former uses h_enter_guest and expects L1 to store
most of the regs, and has no concept like GSB where the communication
between L1 and L0 takes place in a standard format which is used at
nested guest exit also. Here, we have separate APIs for guest/vcpu
create and then do a run_vcpu for a specific vcpu. So, we cant really
use both APIs interchangeably while running a nested guest. BTW, L1
kernel uses only either of the APIs at a time, preferably this one if
supported.


Yeah not on the same guest. And it's less about running two different
APIs on different guests with the same L1 simultaneously (although we
could probably change KVM to support that fairly easily, and we might
want to for testing purposes), but more about compatibility. What if
we boot or exec into an old kernel that doesn't support the new API?


Hmm, ok, that's a possible use case, will drop the mutual exclusion in v3.

regards,
Harsh




And it's a bit of a nitpick, but the capability should not be permitted
before the actual APIs are supported IMO. You could split this into
adding .api first, so the implementation can test it, and add the spapr
caps at the end.



Agree, I shall update as suggested.


Thanks,
Nick




Re: [PATCH v8 1/9] machine: Use error handling when CPU type is checked

2023-11-30 Thread Gavin Shan

Hi Markus,

On 11/29/23 19:20, Markus Armbruster wrote:

Gavin Shan  writes:


QEMU will be terminated if the specified CPU type isn't supported
in machine_run_board_init(). The list of supported CPU type names
is tracked by mc->valid_cpu_types.


Suggest to drop the second sentence.



Indeed, it's not so helpful.


The error handling can be used to propagate error messages, to be
consistent how the errors are handled for other situations in the
same function.

No functional change intended.

Suggested-by: Igor Mammedov 
Signed-off-by: Gavin Shan 
---
v8: Drop @local_err and use @errp to be compatible with
 ERRP_GUARD()  (Phil)
---
  hw/core/machine.c | 13 +++--
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 0c17398141..bde7f4af6d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -1466,15 +1466,16 @@ void machine_run_board_init(MachineState *machine, 
const char *mem_path, Error *
  
  if (!machine_class->valid_cpu_types[i]) {

  /* The user specified CPU is not valid */
-error_report("Invalid CPU type: %s", machine->cpu_type);
-error_printf("The valid types are: %s",
- machine_class->valid_cpu_types[0]);
+error_setg(errp, "Invalid CPU type: %s", machine->cpu_type);
+error_append_hint(errp, "The valid types are: %s",
+  machine_class->valid_cpu_types[0]);
  for (i = 1; machine_class->valid_cpu_types[i]; i++) {
-error_printf(", %s", machine_class->valid_cpu_types[i]);
+error_append_hint(errp, ", %s",
+  machine_class->valid_cpu_types[i]);
  }
-error_printf("\n");
  
-exit(1);

+error_append_hint(, "\n");
+return;
  }
  }


This cleans up an anti-pattern: use of error_report() within a function that
returns errors through an Error **errp parameter.

Cleanup, not bug fix, because the only caller passes _abort.

Suggest to start the commit message with a mention of the anti-pattern.
Here's how I'd write it:

 Functions that use an Error **errp parameter to return errors should
 not also report them to the user, because reporting is the caller's
 job.

 machine_run_board_init() violates this principle: it calls
 error_report(), error_printf(), and exit(1) when the machine doesn't
 support the requested CPU type.

 Clean this up by using error_setg() and error_append_hint() instead.
 No functional change, as the only caller passes _fatal.



Thanks for the nice write-up. I will take it if v9 is needed to address
comments from other people.


Whether you use my suggestion or not:
Reviewed-by: Markus Armbruster 



Thanks for your review.

Thanks,
Gavin




[PATCH 2/2] linux-user: Fix openat() emulation to not modify atime

2023-11-30 Thread Shu-Chun Weng
Commit b8002058 strengthened openat()'s /proc detection by calling
realpath(3) on the given path, which allows various paths and symlinks
that points to the /proc file system to be intercepted correctly.

Using realpath(3), though, has a side effect that it reads the symlinks
along the way, and thus changes their atime. The results in the
following code snippet already get ~now instead of the real atime:

  int fd = open("/path/to/a/symlink", O_PATH | O_NOFOLLOW);
  struct stat st;
  fstat(fd, st);
  return st.st_atime;

This change opens a path that doesn't appear to be part of /proc
directly and checks the destination of /proc/self/fd/n to determine if
it actually refers to a file in /proc.

Neither this nor the existing code works with symlinks or indirect paths
(e.g.  /tmp/../proc/self/exe) that points to /proc/self/exe because it
is itself a symlink, and both realpath(3) and /proc/self/fd/n will
resolve into the location of QEMU.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/syscall.c | 42 +-
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e384e14248..25e2cda10a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -8308,8 +8308,6 @@ static int open_net_route(CPUArchState *cpu_env, int fd)
 int do_guest_openat(CPUArchState *cpu_env, int dirfd, const char *fname,
 int flags, mode_t mode, bool safe)
 {
-g_autofree char *proc_name = NULL;
-const char *pathname;
 struct fake_open {
 const char *filename;
 int (*fill)(CPUArchState *cpu_env, int fd);
@@ -8333,13 +8331,39 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, 
const char *fname,
 #endif
 { NULL, NULL, NULL }
 };
+char pathname[PATH_MAX];
 
-/* if this is a file from /proc/ filesystem, expand full name */
-proc_name = realpath(fname, NULL);
-if (proc_name && strncmp(proc_name, "/proc/", 6) == 0) {
-pathname = proc_name;
+if (strncmp(fname, "/proc/", 6) == 0) {
+pstrcpy(pathname, sizeof(pathname), fname);
 } else {
-pathname = fname;
+char procpath[PATH_MAX];
+int fd, n;
+
+if (safe) {
+fd = safe_openat(dirfd, path(fname), flags, mode);
+} else {
+fd = openat(dirfd, path(fname), flags, mode);
+}
+if (fd < 0) {
+return fd;
+}
+
+/*
+ * Try to get the real path of the file we just opened. We avoid 
calling
+ * `realpath(3)` because it calls `readlink(2)` on symlinks which
+ * changes their atime. Note that since `/proc/self/exe` is a symlink,
+ * `pathname` will never resolves to it (neither will `realpath(3)`).
+ * That's why we check `fname` against the "/proc/" prefix first.
+ */
+snprintf(procpath, sizeof(procpath), "/proc/self/fd/%d", fd);
+n = readlink(procpath, pathname, sizeof(pathname));
+pathname[n < sizeof(pathname) ? n : sizeof(pathname)] = '\0';
+
+/* if this is not a file from /proc/ filesystem, the fd is good as-is 
*/
+if (strncmp(pathname, "/proc/", 6) != 0) {
+return fd;
+}
+close(fd);
 }
 
 if (is_proc_myself(pathname, "exe")) {
@@ -8390,9 +8414,9 @@ int do_guest_openat(CPUArchState *cpu_env, int dirfd, 
const char *fname,
 }
 
 if (safe) {
-return safe_openat(dirfd, path(pathname), flags, mode);
+return safe_openat(dirfd, pathname, flags, mode);
 } else {
-return openat(dirfd, path(pathname), flags, mode);
+return openat(dirfd, pathname, flags, mode);
 }
 }
 



[PATCH 1/2] linux-user: Define TARGET_O_LARGEFILE for aarch64

2023-11-30 Thread Shu-Chun Weng
In 050a1ba, when moving the macros from preprocessor-guarding to
file-based definition, TARGET_O_LARGEFILE appeared to have been
accidentally left off.

This may have correctness implication, but so far I was only confused by
strace's output.

Signed-off-by: Shu-Chun Weng 
---
 linux-user/aarch64/target_fcntl.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-user/aarch64/target_fcntl.h 
b/linux-user/aarch64/target_fcntl.h
index efdf6e5f05..55ab788a7c 100644
--- a/linux-user/aarch64/target_fcntl.h
+++ b/linux-user/aarch64/target_fcntl.h
@@ -11,6 +11,7 @@
 #define TARGET_O_DIRECTORY  04 /* must be a directory */
 #define TARGET_O_NOFOLLOW  010 /* don't follow links */
 #define TARGET_O_DIRECT020 /* direct disk access hint */
+#define TARGET_O_LARGEFILE 040
 
 #include "../generic/fcntl.h"
 #endif



[PATCH 0/2] linux-user: openat() fixes

2023-11-30 Thread Shu-Chun Weng
Shu-Chun Weng (2):
  linux-user: Define TARGET_O_LARGEFILE for aarch64
  linux-user: Fix openat() emulation to not modify atime

 linux-user/aarch64/target_fcntl.h |  1 +
 linux-user/syscall.c  | 42 ---
 2 files changed, 34 insertions(+), 9 deletions(-)




[PATCH v5] accel/kvm: Turn DPRINTF macro use into tracepoints

2023-11-30 Thread Jai Arora
Patch removes DRPINTF macro and adds multiple tracepoints
to capture different kvm events.

We also drop the DPRINTFs that don't add any additional
information than trace_kvm_run_exit already does.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827

Signed-off-by: Jai Arora 
Reviewed-by: Alex Bennée 
---
v5: Adds Reviewed-by tag as requested by Alex Bennee

Added it now, thanks for the feedback. :)

 accel/kvm/kvm-all.c| 28 ++--
 accel/kvm/trace-events |  7 ++-
 2 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e39a810a4e..80ac7b35b7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -69,16 +69,6 @@
 #define KVM_GUESTDBG_BLOCKIRQ 0
 #endif
 
-//#define DEBUG_KVM
-
-#ifdef DEBUG_KVM
-#define DPRINTF(fmt, ...) \
-do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-do { } while (0)
-#endif
-
 struct KVMParkedVcpu {
 unsigned long vcpu_id;
 int kvm_fd;
@@ -331,7 +321,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
 struct KVMParkedVcpu *vcpu = NULL;
 int ret = 0;
 
-DPRINTF("kvm_destroy_vcpu\n");
+trace_kvm_destroy_vcpu();
 
 ret = kvm_arch_destroy_vcpu(cpu);
 if (ret < 0) {
@@ -341,7 +331,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
 ret = mmap_size;
-DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
+trace_kvm_failed_get_vcpu_mmap_size();
 goto err;
 }
 
@@ -443,7 +433,6 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
 if (cpu->kvm_dirty_gfns == MAP_FAILED) {
 ret = -errno;
-DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret);
 goto err;
 }
 }
@@ -2821,7 +2810,7 @@ int kvm_cpu_exec(CPUState *cpu)
 struct kvm_run *run = cpu->kvm_run;
 int ret, run_ret;
 
-DPRINTF("kvm_cpu_exec()\n");
+trace_kvm_cpu_exec();
 
 if (kvm_arch_process_async_events(cpu)) {
 qatomic_set(>exit_request, 0);
@@ -2848,7 +2837,7 @@ int kvm_cpu_exec(CPUState *cpu)
 
 kvm_arch_pre_run(cpu, run);
 if (qatomic_read(>exit_request)) {
-DPRINTF("interrupt exit requested\n");
+   trace_kvm_interrupt_exit_request();
 /*
  * KVM requires us to reenter the kernel after IO exits to complete
  * instruction emulation. This self-signal will ensure that we
@@ -2878,7 +2867,7 @@ int kvm_cpu_exec(CPUState *cpu)
 
 if (run_ret < 0) {
 if (run_ret == -EINTR || run_ret == -EAGAIN) {
-DPRINTF("io window exit\n");
+trace_kvm_io_window_exit();
 kvm_eat_signals(cpu);
 ret = EXCP_INTERRUPT;
 break;
@@ -2900,7 +2889,6 @@ int kvm_cpu_exec(CPUState *cpu)
 trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
 switch (run->exit_reason) {
 case KVM_EXIT_IO:
-DPRINTF("handle_io\n");
 /* Called outside BQL */
 kvm_handle_io(run->io.port, attrs,
   (uint8_t *)run + run->io.data_offset,
@@ -2910,7 +2898,6 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 case KVM_EXIT_MMIO:
-DPRINTF("handle_mmio\n");
 /* Called outside BQL */
 address_space_rw(_space_memory,
  run->mmio.phys_addr, attrs,
@@ -2920,11 +2907,9 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 case KVM_EXIT_IRQ_WINDOW_OPEN:
-DPRINTF("irq_window_open\n");
 ret = EXCP_INTERRUPT;
 break;
 case KVM_EXIT_SHUTDOWN:
-DPRINTF("shutdown\n");
 qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
 ret = EXCP_INTERRUPT;
 break;
@@ -2959,6 +2944,7 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
+trace_kvm_run_exit_system_event(cpu->cpu_index, 
run->system_event.type);
 switch (run->system_event.type) {
 case KVM_SYSTEM_EVENT_SHUTDOWN:
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
@@ -2976,13 +2962,11 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 default:
-DPRINTF("kvm_arch_handle_exit\n");
 ret = kvm_arch_handle_exit(cpu, run);
 break;
 }
 break;
 default:
-DPRINTF("kvm_arch_handle_exit\n");
 ret = kvm_arch_handle_exit(cpu, run);
 break;
 }
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 399aaeb0ec..f61a21019a 100644
--- a/accel/kvm/trace-events

Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching

2023-11-30 Thread Edgecombe, Rick P
On Wed, 2023-11-29 at 15:07 -0600, Madhavan T. Venkataraman wrote:
> Threat Model
> 
> 
> In the threat model in Heki, the attacker is a user space attacker
> who exploits
> a kernel vulnerability to gain more privileges or bypass the kernel's
> access
> control and self-protection mechanisms. 
> 
> In the context of the guest page table, one of the things that the
> threat model translates
> to is a hacker gaining access to a guest page with RWX permissions.
> E.g., by adding execute
> permissions to a writable page or by adding write permissions to an
> executable page.
> 
> Today, the permissions for a guest page in the extended page table
> are RWX by
> default. So, if a hacker manages to establish RWX for a page in the
> guest page
> table, then that is all he needs to do some damage.

I had a few random comments from watching the plumbers talk online:

Is there really a big difference between a page that is RWX, and a RW
page that is about to become RX? I realize that there is an addition of
timing, but when executable code is getting loaded it can be written to
then and later executed. I think that gap could be addressed in two
different ways, both pretty difficult:
 1. Verifying the loaded code before it gets marked 
executable. This is difficult because the kernel does lots of 
tweaks on the code it is loading (alternatives, etc). It can't 
just check a signature.
 2. Loading the code in a protected environment. In this model the 
(for example) module signature would be checked, then the code 
would be loaded in some sort of protected environment. This way 
integrity of the loaded code would be enforced. But extracting 
module loading into a separate domain would be difficult. 
Various scattered features all have their hands in the loading.

Secondly, I wonder if another way to look at the memory parts of HEKI
could be that this is a way to protect certain page table bits from
stay writes. The RWX bits in the EPT are not directly writable, so more
steps are needed to change things than just a stray write (instead the
helpers involved in the operations need to be called). If that is a
fair way of looking at it, then I wonder how HEKI compares to a
solution like this security-wise:
https://lore.kernel.org/lkml/20210830235927.6443-1-rick.p.edgeco...@intel.com/

Functional-wise it had the benefit of working on bare metal and
supporting the normal kernel features.


Re: [PATCH V6 05/14] migration: propagate suspended runstate

2023-11-30 Thread Peter Xu
On Thu, Nov 30, 2023 at 01:37:18PM -0800, Steve Sistare wrote:
> If the outgoing machine was previously suspended, propagate that to the
> incoming side via global_state, so a subsequent vm_start restores the
> suspended state.  To maintain backward and forward compatibility, define
> the new field in a zero'd hole in the GlobalState struct.
> 
> Signed-off-by: Steve Sistare 
> ---
>  migration/global_state.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/migration/global_state.c b/migration/global_state.c
> index 4e2a9d8..de2532c 100644
> --- a/migration/global_state.c
> +++ b/migration/global_state.c
> @@ -25,6 +25,7 @@ typedef struct {
>  uint8_t runstate[100];
>  RunState state;
>  bool received;
> +bool vm_was_suspended;
>  } GlobalState;
>  
>  static GlobalState global_state;
> @@ -35,6 +36,7 @@ static void global_state_do_store(RunState state)
>  assert(strlen(state_str) < sizeof(global_state.runstate));
>  strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
>state_str, '\0');
> +global_state.vm_was_suspended = vm_get_suspended();
>  }
>  
>  void global_state_store(void)
> @@ -68,6 +70,12 @@ static bool global_state_needed(void *opaque)
>  return true;
>  }
>  
> +/* If the suspended state must be remembered, it is needed */
> +
> +if (vm_get_suspended()) {
> +return true;
> +}
> +
>  /* If state is running or paused, it is not needed */
>  
>  if (strcmp(runstate, "running") == 0 ||
> @@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int 
> version_id)
>  return -EINVAL;
>  }
>  s->state = r;
> +vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);

IIUC current vm_was_suspended (based on my read of your patch) was not the
same as a boolean representing "whether VM is suspended", but only a
temporary field to remember that for a VM stop request.  To be explicit, I
didn't see this flag set in qemu_system_suspend() in your previous patch.

If so, we can already do:

  vm_set_suspended(s->vm_was_suspended);

Irrelevant of RUN_STATE_SUSPENDED?

>  
>  return 0;
>  }
> @@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
>  .fields = (VMStateField[]) {
>  VMSTATE_UINT32(size, GlobalState),
>  VMSTATE_BUFFER(runstate, GlobalState),
> +VMSTATE_BOOL(vm_was_suspended, GlobalState),
>  VMSTATE_END_OF_LIST()
>  },
>  };

I think this will break migration between old/new, unfortunately.  And
since the global state exist mostly for every VM, all VM setup should be
affected, and over all archs.

We used to have the version_id field right above for adding fields, but I
_think_ that will still break backward migration fron new->old binary, so
not wanted.  Juan can keep me honest.

The best thing is still machine compat properties, afaict, to fix.  It's
slightly involved, but let me attach a sample diff for you (at the end,
possibly working with your current patch kind-of squashed, but not ever
tested), hopefully make it slightly easier.

I'm wondering how bad it is to just ignore it, it's not as bad as if we
don't fix stop-during-suspend, in this case the worst case of forgetting
this field over migration is: if VM stopped (and used to be suspended) then
after migration it'll keep being stopped, however after "cont" it'll forget
the suspended state.  Not that bad!  IIUC SPR should always migrate with
suspended (rather than any fully stopped state), right?  Then shouldn't be
affected.  If risk is low, maybe we can leave this one for later?

Thanks,

===8<===

diff --git a/migration/migration.h b/migration/migration.h
index cf2c9c88e0..c3fd1f8347 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -470,6 +470,8 @@ struct MigrationState {
 bool switchover_acked;
 /* Is this a rdma migration */
 bool rdma_migration;
+/* Whether remember global vm_was_suspended field? */
+bool store_vm_was_suspended;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 0c17398141..365e01c1c9 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -37,6 +37,7 @@ GlobalProperty hw_compat_8_1[] = {
 { "ramfb", "x-migrate", "off" },
 { "vfio-pci-nohotplug", "x-ramfb-migrate", "off" },
 { "igb", "x-pcie-flr-init", "off" },
+{ "migration", "store-vm-was-suspended", false },
 };
 const size_t hw_compat_8_1_len = G_N_ELEMENTS(hw_compat_8_1);
 
diff --git a/migration/global_state.c b/migration/global_state.c
index 4e2a9d8ec0..ffa7bf82ca 100644
--- a/migration/global_state.c
+++ b/migration/global_state.c
@@ -25,6 +25,7 @@ typedef struct {
 uint8_t runstate[100];
 RunState state;
 bool received;
+bool vm_was_suspended;
 } GlobalState;
 
 static GlobalState global_state;
@@ -124,6 +125,25 @@ static int global_state_pre_save(void *opaque)
 return 0;
 }
 

Re: [PATCH 10/12] scsi: remove outdated AioContext lock comment

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:51PM -0500, Stefan Hajnoczi wrote:
> The SCSI subsystem no longer uses the AioContext lock. Request
> processing runs exclusively in the BlockBackend's AioContext since
> "scsi: only access SCSIDevice->requests from one thread" and hence the
> lock is unnecessary.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/scsi/scsi-disk.c | 1 -
>  1 file changed, 1 deletion(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH 09/12] docs: remove AioContext lock from IOThread docs

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:50PM -0500, Stefan Hajnoczi wrote:
> Encourage the use of locking primitives and stop mentioning the
> AioContext lock since it is being removed.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  docs/devel/multiple-iothreads.txt | 45 +++
>  1 file changed, 15 insertions(+), 30 deletions(-)
> 
> diff --git a/docs/devel/multiple-iothreads.txt 
> b/docs/devel/multiple-iothreads.txt
> index a3e949f6b3..4865196bde 100644
> --- a/docs/devel/multiple-iothreads.txt

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH-for-9.0] accel/tcg: Remove unused tb_invalidate_phys_addr()

2023-11-30 Thread Richard Henderson

On 11/30/23 14:32, Philippe Mathieu-Daudé wrote:

Commit e3f7c801f1 introduced the TCGCPUOps::debug_check_breakpoint()
handler, and commit 10c37828b2 "moved breakpoint recognition outside
of translation", so "we no longer need to flush any TBs when changing
BPs".

The last target using tb_invalidate_phys_addr() was converted to the
debug_check_breakpoint(), so this function is now unused. Remove it.

Signed-off-by: Philippe Mathieu-Daudé 
---
Based-on: <20231130171920.3798954-1-jcmvb...@gmail.com>
---
  include/exec/exec-all.h |  5 -
  cpu-target.c| 29 -
  2 files changed, 34 deletions(-)



Reviewed-by: Richard Henderson 


r~



Re: [PATCH V6 03/14] cpus: stop vm in suspended runstate

2023-11-30 Thread Peter Xu
On Thu, Nov 30, 2023 at 01:37:16PM -0800, Steve Sistare wrote:
> Currently, a vm in the suspended state is not completely stopped.  The VCPUs
> have been paused, but the cpu clock still runs, and runstate notifiers for
> the transition to stopped have not been called.  This causes problems for
> live migration.  Stale cpu timers_state is saved to the migration stream,
> causing time errors in the guest when it wakes from suspend, and state that
> would have been modified by runstate notifiers is wrong.
> 
> Modify vm_stop to completely stop the vm if the current state is suspended,
> transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
> Modify vm_start to restore the suspended state.
> 
> This affects all callers of vm_stop and vm_start, notably, the qapi stop and
> cont commands.  For example:
> 
> (qemu) info status
> VM status: paused (suspended)
> 
> (qemu) stop
> (qemu) info status
> VM status: paused
> 
> (qemu) cont
> (qemu) info status
> VM status: paused (suspended)
> 
> (qemu) system_wakeup
> (qemu) info status
> VM status: running

So system_wakeup for a stopped (but used to be suspended) VM will fail
directly, not touching vm_was_suspended.  It's not mentioned here, but that
behavior makes sense to me.

> 
> Suggested-by: Peter Xu 
> Signed-off-by: Steve Sistare 

Reviewed-by: Peter Xu 

Since you touched qapi/, please copy maintainers too.  I've copied Markus
and Eric in this reply.

I also have some nitpicks which may not affect the R-b, please see below.

> ---
>  include/sysemu/runstate.h |  5 +
>  qapi/misc.json| 10 --
>  system/cpus.c | 19 ++-
>  system/runstate.c |  3 +++
>  4 files changed, 30 insertions(+), 7 deletions(-)
> 
> diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
> index f6a337b..1d6828f 100644
> --- a/include/sysemu/runstate.h
> +++ b/include/sysemu/runstate.h
> @@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
> cause)
>  return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
>  }
>  
> +static inline bool runstate_is_started(RunState state)

Would runstate_has_vm_running() sound better?  It is a bit awkward when
saying something like "start a runstate".

> +{
> +return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
> +}
> +
>  void vm_start(void);
>  
>  /**
> diff --git a/qapi/misc.json b/qapi/misc.json
> index cda2eff..efb8d44 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -134,7 +134,7 @@
>  ##
>  # @stop:
>  #
> -# Stop all guest VCPU execution.
> +# Stop all guest VCPU and VM execution.
>  #
>  # Since: 0.14
>  #
> @@ -143,6 +143,9 @@
>  # the guest remains paused once migration finishes, as if the -S
>  # option was passed on the command line.
>  #
> +# In the "suspended" state, it will completely stop the VM and
> +# cause a transition to the "paused" state. (Since 9.0)
> +#
>  # Example:
>  #
>  # -> { "execute": "stop" }
> @@ -153,7 +156,7 @@
>  ##
>  # @cont:
>  #
> -# Resume guest VCPU execution.
> +# Resume guest VCPU and VM execution.
>  #
>  # Since: 0.14
>  #
> @@ -165,6 +168,9 @@
>  # guest starts once migration finishes, removing the effect of the
>  # -S command line option if it was passed.
>  #
> +# If the VM was previously suspended, and not been reset or woken,
> +# this command will transition back to the "suspended" state. (Since 9.0)
> +#
>  # Example:
>  #
>  # -> { "execute": "cont" }
> diff --git a/system/cpus.c b/system/cpus.c
> index ef7a0d3..cbc6d6d 100644
> --- a/system/cpus.c
> +++ b/system/cpus.c
> @@ -277,11 +277,15 @@ bool vm_get_suspended(void)
>  static int do_vm_stop(RunState state, bool send_stop)
>  {
>  int ret = 0;
> +RunState oldstate = runstate_get();
>  
> -if (runstate_is_running()) {
> +if (runstate_is_started(oldstate)) {
> +vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED);
>  runstate_set(state);
>  cpu_disable_ticks();
> -pause_all_vcpus();
> +if (oldstate == RUN_STATE_RUNNING) {
> +pause_all_vcpus();
> +}
>  vm_state_notify(0, state);
>  if (send_stop) {
>  qapi_event_send_stop();
> @@ -736,8 +740,13 @@ int vm_prepare_start(bool step_pending, RunState state)
>  
>  void vm_start(void)
>  {
> -if (!vm_prepare_start(false, RUN_STATE_RUNNING)) {
> -resume_all_vcpus();
> +RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : 
> RUN_STATE_RUNNING;
> +
> +if (!vm_prepare_start(false, state)) {
> +if (state == RUN_STATE_RUNNING) {
> +resume_all_vcpus();
> +}
> +vm_was_suspended = false;
>  }
>  }
>  
> @@ -745,7 +754,7 @@ void vm_start(void)
> current state is forgotten forever */
>  int vm_stop_force_state(RunState state)
>  {
> -if (runstate_is_running()) {
> +if (runstate_is_started(runstate_get())) {
>  

Re: [PATCH V6 02/14] cpus: vm_was_suspended

2023-11-30 Thread Peter Xu
On Thu, Nov 30, 2023 at 01:37:15PM -0800, Steve Sistare wrote:
> Add a state variable to remember if a vm previously transitioned into a
> suspended state.
> 
> Signed-off-by: Steve Sistare 

I'd even consider squashing this small patch into the next, the reasoning
to have it resides there, but not a huge deal:

Reviewed-by: Peter Xu 

Thanks,

-- 
Peter Xu




Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Eric Farman
On Wed, 2023-11-29 at 16:26 -0500, Stefan Hajnoczi wrote:
> The Big QEMU Lock (BQL) has many names and they are confusing. The
> actual QemuMutex variable is called qemu_global_mutex but it's
> commonly
> referred to as the BQL in discussions and some code comments. The
> locking APIs, however, are called qemu_mutex_lock_iothread() and
> qemu_mutex_unlock_iothread().
> 
> The "iothread" name is historic and comes from when the main thread
> was
> split into into KVM vcpu threads and the "iothread" (now called the
> main
> loop thread). I have contributed to the confusion myself by
> introducing
> a separate --object iothread, a separate concept unrelated to the
> BQL.
> 
> The "iothread" name is no longer appropriate for the BQL. Rename the
> locking APIs to:
> - void qemu_bql_lock(void)
> - void qemu_bql_unlock(void)
> - bool qemu_bql_locked(void)
> 
> There are more APIs with "iothread" in their names. Subsequent
> patches
> will rename them. There are also comments and documentation that will
> be
> updated in later patches.
> 
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Eric Farman 



Re: [PATCH 08/12] aio: remove aio_context_acquire()/aio_context_release() API

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:49PM -0500, Stefan Hajnoczi wrote:
> Delete these functions because nothing calls these functions anymore.
> 
> I introduced these APIs in commit 98563fc3ec44 ("aio: add
> aio_context_acquire() and aio_context_release()") in 2014. It's with a
> sigh of relief that I delete these APIs almost 10 years later.
> 
> Thanks to Paolo Bonzini's vision for multi-queue QEMU, we got an
> understanding of where the code needed to go in order to remove the
> limitations that the original dataplane and the IOThread/AioContext
> approach that followed it.
> 
> Emanuele Giuseppe Esposito had the splendid determination to convert
> large parts of the codebase so that they no longer needed the AioContext
> lock. This was a painstaking process, both in the actual code changes
> required and the iterations of code review that Emanuele eeked out of

s/eeked/eked/

> Kevin and me over many months.
> 
> Kevin Wolf tackled multitudes of graph locking conversions to protect
> in-flight I/O from run-time changes to the block graph as well as the
> clang Thread Safety Analysis annotations that allow the compiler to
> check whether the graph lock is being used correctly.
> 
> And me, well, I'm just here to add some pizzazz to the QEMU multi-queue
> block layer :). Thank you to everyone who helped with this effort,
> including Eric Blake, code reviewer extraordinaire, and others who I've
> forgotten to mention.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/aio.h | 17 -
>  util/async.c| 10 --
>  2 files changed, 27 deletions(-)
>

Yay!

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread BALATON Zoltan

On Thu, 30 Nov 2023, Stefan Hajnoczi wrote:


On Thu, Nov 30, 2023 at 03:08:49PM -0500, Peter Xu wrote:

On Wed, Nov 29, 2023 at 04:26:20PM -0500, Stefan Hajnoczi wrote:

The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void qemu_bql_lock(void)
- void qemu_bql_unlock(void)
- bool qemu_bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi 


Acked-by: Peter Xu 

Two nickpicks:

  - BQL contains "QEMU" as the 2nd character, so maybe easier to further
rename qemu_bql into bql_?


Philippe wondered whether the variable name should end with _mutex (or
_lock is common too), so an alternative might be big_qemu_lock. That's
imperfect because it doesn't start with the usual qemu_ prefix.
qemu_big_lock is better in that regard but inconsistent with our BQL
abbreviation.


BQL isn't very specific for those unfamiliar with the code but it's short 
and already used and known by people so I'm OK with qemu_bql with some 
comments and docs explainig here and there what bql stands for should be 
enough for new people to quickly find out. If we want to be more verbose 
how about "qemu_global_mutex" which is self describing but longer and does 
not resemble BQL so then comments may be needed to explain this is what 
was called BQL as well. I don't mind either way though.


Regards,
BALATON Zoltan


I don't like putting an underscore at the end. It's unusual and would
make me wonder what that means.

Naming is hard, but please discuss and I'm open to change to BQL
variable's name to whatever we all agree on.



  - Could we keep the full spell of BQL at some places, so people can still
reference it if not familiar?  IIUC most of the BQL helpers will root
back to the major three functions (_lock, _unlock, _locked), perhaps
add a comment of "BQL stands for..." over these three functions as
comment?


Yes, I'll update the doc comments to say "Big QEMU Lock (BQL)" for each
of these functions.

Stefan





Re: [PATCH 07/12] aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED()

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:48PM -0500, Stefan Hajnoczi wrote:
> Now that the AioContext lock no longer exists, AIO_WAIT_WHILE() and
> AIO_WAIT_WHILE_UNLOCKED() are equivalent.
> 
> A future patch will get rid of AIO_WAIT_WHILE_UNLOCKED().
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/aio-wait.h | 16 
>  1 file changed, 4 insertions(+), 12 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




[PATCH V6 00/14] fix migration of suspended runstate

2023-11-30 Thread Steve Sistare
Migration of a guest in the suspended runstate is broken.  The incoming
migration code automatically tries to wake the guest, which is wrong;
the guest should end migration in the same runstate it started.  Further,
after saving a snapshot in the suspended state and loading it, the vm_start
fails.  The runstate is RUNNING, but the guest is not.

See the commit messages for the details.

Changes in V2:
  * simplify "start on wakeup request"
  * fix postcopy, snapshot, and background migration
  * refactor fixes for each type of migration
  * explicitly handled suspended events and runstate in tests
  * add test for postcopy and background migration

Changes in V3:
  * rebase to tip
  * fix hang in new function migrate_wait_for_dirty_mem

Changes in V4:
  * rebase to tip
  * add patch for vm_prepare_start (thanks Peter)
  * add patch to preserve cpu ticks

Changes in V5:
  * rebase to tip
  * added patches to completely stop vm in suspended state:
  cpus: refactor vm_stop
  cpus: stop vm in suspended state
  * added patch to partially resume vm in suspended state:
  cpus: start vm in suspended state
  * modified "preserve suspended ..." patches to use the above.
  * deleted patch "preserve cpu ticks if suspended".  stop ticks in
vm_stop_force_state instead.
  * deleted patch "add runstate function".  defined new helper function
migrate_new_runstate in "preserve suspended runstate"
  * Added some RB's, but removed other RB's because the patches changed.

Changes in V6:
  * all vm_stop calls completely stop the suspended state
  * refactored and updated the "cpus" patches
  * simplified the "preserve suspended" patches
  * added patch "bootfile per vm"

Steve Sistare (14):
  cpus: pass runstate to vm_prepare_start
  cpus: vm_was_suspended
  cpus: stop vm in suspended runstate
  cpus: vm_resume
  migration: propagate suspended runstate
  migration: preserve suspended runstate
  migration: preserve suspended for snapshot
  migration: preserve suspended for bg_migration
  tests/qtest: migration events
  tests/qtest: option to suspend during migration
  tests/qtest: precopy migration with suspend
  tests/qtest: postcopy migration with suspend
  tests/qtest: bootfile per vm
  tests/qtest: background migration with suspend

 backends/tpm/tpm_emulator.c  |   2 +-
 gdbstub/system.c |   2 +-
 hw/usb/hcd-ehci.c|   2 +-
 hw/usb/redirect.c|   2 +-
 hw/xen/xen-hvm-common.c  |   2 +-
 include/migration/snapshot.h |   7 +
 include/sysemu/runstate.h|  19 ++-
 migration/global_state.c |  10 ++
 migration/migration-hmp-cmds.c   |   8 +-
 migration/migration.c|  15 +--
 migration/savevm.c   |  23 ++--
 qapi/misc.json   |  10 +-
 system/cpus.c|  49 +--
 system/runstate.c|   9 ++
 system/vl.c  |   2 +
 tests/migration/i386/Makefile|   5 +-
 tests/migration/i386/a-b-bootblock.S |  50 +++-
 tests/migration/i386/a-b-bootblock.h |  26 ++--
 tests/qtest/migration-helpers.c  |  27 ++--
 tests/qtest/migration-helpers.h  |  11 +-
 tests/qtest/migration-test.c | 240 +--
 21 files changed, 382 insertions(+), 139 deletions(-)

-- 
1.8.3.1




[PATCH V6 06/14] migration: preserve suspended runstate

2023-11-30 Thread Steve Sistare
A guest that is migrated in the suspended state automaticaly wakes and
continues execution.  This is wrong; the guest should end migration in
the same state it started.  The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended.  For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.

Signed-off-by: Steve Sistare 
---
 migration/migration.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 28a34c9..d1d94c4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -603,7 +603,7 @@ static void process_incoming_migration_bh(void *opaque)
  */
 if (!migrate_late_block_activate() ||
  (autostart && (!global_state_received() ||
-global_state_get_runstate() == RUN_STATE_RUNNING))) {
+runstate_is_started(global_state_get_runstate() {
 /* Make sure all file formats throw away their mutable metadata.
  * If we get an error here, just don't restart the VM yet. */
 bdrv_activate_all(_err);
@@ -627,7 +627,7 @@ static void process_incoming_migration_bh(void *opaque)
 dirty_bitmap_mig_before_vm_start();
 
 if (!global_state_received() ||
-global_state_get_runstate() == RUN_STATE_RUNNING) {
+runstate_is_started(global_state_get_runstate())) {
 if (autostart) {
 vm_start();
 } else {
@@ -2415,7 +2415,6 @@ static int postcopy_start(MigrationState *ms, Error 
**errp)
 
 migration_downtime_start(ms);
 
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 global_state_store();
 ret = migration_stop_vm(RUN_STATE_FINISH_MIGRATE);
 if (ret < 0) {
@@ -2614,7 +2613,6 @@ static int migration_completion_precopy(MigrationState *s,
 
 qemu_mutex_lock_iothread();
 migration_downtime_start(s);
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 
 s->vm_old_state = runstate_get();
 global_state_store();
@@ -3135,7 +3133,7 @@ static void migration_iteration_finish(MigrationState *s)
 case MIGRATION_STATUS_FAILED:
 case MIGRATION_STATUS_CANCELLED:
 case MIGRATION_STATUS_CANCELLING:
-if (s->vm_old_state == RUN_STATE_RUNNING) {
+if (runstate_is_started(s->vm_old_state)) {
 if (!runstate_check(RUN_STATE_SHUTDOWN)) {
 vm_start();
 }
-- 
1.8.3.1




[PATCH V6 11/14] tests/qtest: precopy migration with suspend

2023-11-30 Thread Steve Sistare
Add a test case to verify that the suspended state is handled correctly
during live migration precopy.  The test suspends the src, migrates, then
wakes the dest.

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-helpers.c |  3 ++
 tests/qtest/migration-helpers.h |  2 ++
 tests/qtest/migration-test.c| 64 ++---
 3 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index fd3b94e..37e8e81 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -32,6 +32,9 @@ bool migrate_watch_for_events(QTestState *who, const char 
*name,
 if (g_str_equal(name, "STOP")) {
 state->stop_seen = true;
 return true;
+} else if (g_str_equal(name, "SUSPEND")) {
+state->suspend_seen = true;
+return true;
 } else if (g_str_equal(name, "RESUME")) {
 state->resume_seen = true;
 return true;
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 3d32699..b478549 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -18,6 +18,8 @@
 typedef struct QTestMigrationState {
 bool stop_seen;
 bool resume_seen;
+bool suspend_seen;
+bool suspend_me;
 } QTestMigrationState;
 
 bool migrate_watch_for_events(QTestState *who, const char *name,
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e10d5a4..200f023 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -178,7 +178,7 @@ static void bootfile_delete(void)
 /*
  * Wait for some output in the serial output file,
  * we get an 'A' followed by an endless string of 'B's
- * but on the destination we won't have the A.
+ * but on the destination we won't have the A (unless we enabled 
suspend/resume)
  */
 static void wait_for_serial(const char *side)
 {
@@ -245,6 +245,13 @@ static void wait_for_resume(QTestState *who, 
QTestMigrationState *state)
 }
 }
 
+static void wait_for_suspend(QTestState *who, QTestMigrationState *state)
+{
+if (!state->suspend_seen) {
+qtest_qmp_eventwait(who, "SUSPEND");
+}
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -299,7 +306,7 @@ static void wait_for_migration_pass(QTestState *who)
 {
 uint64_t pass, prev_pass = 0, changes = 0;
 
-while (changes < 2 && !src_state.stop_seen) {
+while (changes < 2 && !src_state.stop_seen && !src_state.suspend_seen) {
 usleep(1000);
 pass = get_migration_pass(who);
 changes += (pass != prev_pass);
@@ -595,7 +602,8 @@ static void migrate_wait_for_dirty_mem(QTestState *from,
 watch_byte = qtest_readb(from, watch_address);
 do {
 usleep(1000 * 10);
-} while (qtest_readb(from, watch_address) == watch_byte);
+} while (qtest_readb(from, watch_address) == watch_byte &&
+ !src_state.suspend_seen);
 }
 
 
@@ -771,6 +779,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 dst_state = (QTestMigrationState) { };
 src_state = (QTestMigrationState) { };
 bootfile_create(tmpfs, args->suspend_me);
+src_state.suspend_me = args->suspend_me;
 
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 memory_size = "150M";
@@ -1730,6 +1739,9 @@ static void test_precopy_common(MigrateCommon *args)
  * change anything.
  */
 if (args->result == MIG_TEST_SUCCEED) {
+if (src_state.suspend_me) {
+wait_for_suspend(from, _state);
+}
 qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
 wait_for_stop(from, _state);
 migrate_ensure_converge(from);
@@ -1777,6 +1789,9 @@ static void test_precopy_common(MigrateCommon *args)
  */
 wait_for_migration_complete(from);
 
+if (src_state.suspend_me) {
+wait_for_suspend(from, _state);
+}
 wait_for_stop(from, _state);
 
 } else {
@@ -1793,6 +1808,11 @@ static void test_precopy_common(MigrateCommon *args)
 
 wait_for_resume(to, _state);
 
+if (args->start.suspend_me) {
+/* wakeup succeeds only if guest is suspended */
+qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+}
+
 wait_for_serial("dest_serial");
 }
 
@@ -1879,6 +1899,34 @@ static void test_precopy_unix_plain(void)
 test_precopy_common();
 }
 
+static void test_precopy_unix_suspend_live(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+/*
+ * despite being live, the test is fast because the src
+ * suspends immediately.
+ */
+.live = true,
+.start.suspend_me = 

[PATCH V6 10/14] tests/qtest: option to suspend during migration

2023-11-30 Thread Steve Sistare
Add an option to suspend the src in a-b-bootblock.S, which puts the guest
in S3 state after one round of writing to memory.  The option is enabled by
poking a 1 into the suspend_me word in the boot block prior to starting the
src vm.  Generate symbol offsets in a-b-bootblock.h so that the suspend_me
offset is known.  Generate the bootblock for each test, because suspend_me
may differ for each.

Signed-off-by: Steve Sistare 
Acked-by: Peter Xu 
---
 tests/migration/i386/Makefile|  5 ++--
 tests/migration/i386/a-b-bootblock.S | 50 +---
 tests/migration/i386/a-b-bootblock.h | 26 +--
 tests/qtest/migration-test.c | 12 ++---
 4 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/tests/migration/i386/Makefile b/tests/migration/i386/Makefile
index 5c03241..37a72ae 100644
--- a/tests/migration/i386/Makefile
+++ b/tests/migration/i386/Makefile
@@ -4,9 +4,10 @@
 .PHONY: all clean
 all: a-b-bootblock.h
 
-a-b-bootblock.h: x86.bootsect
+a-b-bootblock.h: x86.bootsect x86.o
echo "$$__note" > header.tmp
xxd -i $< | sed -e 's/.*int.*//' >> header.tmp
+   nm x86.o | awk '{print "#define SYM_"$$3" 0x"$$1}' >> header.tmp
mv header.tmp $@
 
 x86.bootsect: x86.boot
@@ -16,7 +17,7 @@ x86.boot: x86.o
$(CROSS_PREFIX)objcopy -O binary $< $@
 
 x86.o: a-b-bootblock.S
-   $(CROSS_PREFIX)gcc -m32 -march=i486 -c $< -o $@
+   $(CROSS_PREFIX)gcc -I.. -m32 -march=i486 -c $< -o $@
 
 clean:
@rm -rf *.boot *.o *.bootsect
diff --git a/tests/migration/i386/a-b-bootblock.S 
b/tests/migration/i386/a-b-bootblock.S
index 6bb..6f39eb6 100644
--- a/tests/migration/i386/a-b-bootblock.S
+++ b/tests/migration/i386/a-b-bootblock.S
@@ -9,6 +9,23 @@
 #
 # Author: dgilb...@redhat.com
 
+#include "migration-test.h"
+
+#define ACPI_ENABLE 0xf1
+#define ACPI_PORT_SMI_CMD   0xb2
+#define ACPI_PM_BASE0x600
+#define PM1A_CNT_OFFSET 4
+
+#define ACPI_SCI_ENABLE 0x0001
+#define ACPI_SLEEP_TYPE 0x0400
+#define ACPI_SLEEP_ENABLE   0x2000
+#define SLEEP (ACPI_SCI_ENABLE + ACPI_SLEEP_TYPE + ACPI_SLEEP_ENABLE)
+
+#define LOW_ADDRX86_TEST_MEM_START
+#define HIGH_ADDR   X86_TEST_MEM_END
+
+/* Save the suspended status at an address that is not written in the loop. */
+#define suspended   (X86_TEST_MEM_START + 4)
 
 .code16
 .org 0x7c00
@@ -35,8 +52,8 @@ start: # at 0x7c00 ?
 mov %eax,%ds
 
 # Start from 1MB
-.set TEST_MEM_START, (1024*1024)
-.set TEST_MEM_END, (100*1024*1024)
+.set TEST_MEM_START, X86_TEST_MEM_START
+.set TEST_MEM_END, X86_TEST_MEM_END
 
 mov $65,%ax
 mov $0x3f8,%dx
@@ -69,7 +86,30 @@ innerloop:
 mov $0x3f8,%dx
 outb %al,%dx
 
-jmp mainloop
+# should this test suspend?
+mov (suspend_me),%eax
+cmp $0,%eax
+je mainloop
+
+# are we waking after suspend?  do not suspend again.
+mov $suspended,%eax
+mov (%eax),%eax
+cmp $1,%eax
+je mainloop
+
+# enable acpi
+mov $ACPI_ENABLE,%al
+outb %al,$ACPI_PORT_SMI_CMD
+
+# suspend to ram
+mov $suspended,%eax
+movl $1,(%eax)
+mov $SLEEP,%ax
+mov $(ACPI_PM_BASE + PM1A_CNT_OFFSET),%dx
+outw %ax,%dx
+# not reached.  The wakeup causes reset and restart at 0x7c00, and we
+# do not save and restore registers as a real kernel would do.
+
 
 # GDT magic from old (GPLv2)  Grub startup.S
 .p2align2   /* force 4-byte alignment */
@@ -95,6 +135,10 @@ gdtdesc:
 .word   0x27/* limit */
 .long   gdt /* addr */
 
+/* test launcher can poke a 1 here to exercise suspend */
+suspend_me:
+.int  0
+
 /* I'm a bootable disk */
 .org 0x7dfe
 .byte 0x55
diff --git a/tests/migration/i386/a-b-bootblock.h 
b/tests/migration/i386/a-b-bootblock.h
index 5b52391..c83f871 100644
--- a/tests/migration/i386/a-b-bootblock.h
+++ b/tests/migration/i386/a-b-bootblock.h
@@ -4,7 +4,7 @@
  * the header and the assembler differences in your patch submission.
  */
 unsigned char x86_bootsect[] = {
-  0xfa, 0x0f, 0x01, 0x16, 0x8c, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
+  0xfa, 0x0f, 0x01, 0x16, 0xb8, 0x7c, 0x66, 0xb8, 0x01, 0x00, 0x00, 0x00,
   0x0f, 0x22, 0xc0, 0x66, 0xea, 0x20, 0x7c, 0x00, 0x00, 0x08, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe4, 0x92, 0x0c, 0x02,
   0xe6, 0x92, 0xb8, 0x10, 0x00, 0x00, 0x00, 0x8e, 0xd8, 0x66, 0xb8, 0x41,
@@ -13,13 +13,13 @@ unsigned char x86_bootsect[] = {
   0x40, 0x06, 0x7c, 0xf1, 0xb8, 0x00, 0x00, 0x10, 0x00, 0xfe, 0x00, 0x05,
   0x00, 0x10, 0x00, 0x00, 0x3d, 0x00, 0x00, 0x40, 0x06, 0x7c, 0xf2, 0xfe,
   0xc3, 0x80, 0xe3, 0x3f, 0x75, 0xe6, 0x66, 0xb8, 0x42, 0x00, 0x66, 0xba,
-  0xf8, 0x03, 0xee, 0xeb, 0xdb, 0x8d, 0x76, 0x00, 0x00, 0x00, 0x00, 0x00,
-  0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 

[PATCH V6 08/14] migration: preserve suspended for bg_migration

2023-11-30 Thread Steve Sistare
Do not wake a suspended guest during bg_migration, and restore the prior
state at finish rather than unconditionally running.  Allow the additional
state transitions that occur.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
---
 migration/migration.c | 7 +--
 system/runstate.c | 1 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d1d94c4..63c616f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3389,7 +3389,7 @@ static void bg_migration_vm_start_bh(void *opaque)
 qemu_bh_delete(s->vm_start_bh);
 s->vm_start_bh = NULL;
 
-vm_start();
+vm_resume(s->vm_old_state);
 migration_downtime_end(s);
 }
 
@@ -3461,11 +3461,6 @@ static void *bg_migration_thread(void *opaque)
 
 qemu_mutex_lock_iothread();
 
-/*
- * If VM is currently in suspended state, then, to make a valid runstate
- * transition in vm_stop_force_state() we need to wakeup it up.
- */
-qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 s->vm_old_state = runstate_get();
 
 global_state_store();
diff --git a/system/runstate.c b/system/runstate.c
index ca9eb54..621a023 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -168,6 +168,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_SUSPENDED, RUN_STATE_PAUSED},
 { RUN_STATE_SUSPENDED, RUN_STATE_SAVE_VM },
 { RUN_STATE_SUSPENDED, RUN_STATE_RESTORE_VM },
+{ RUN_STATE_SUSPENDED, RUN_STATE_SHUTDOWN },
 
 { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING },
 { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE },
-- 
1.8.3.1




[PATCH V6 12/14] tests/qtest: postcopy migration with suspend

2023-11-30 Thread Steve Sistare
Add a test case to verify that the suspended state is handled correctly by
live migration postcopy.  The test suspends the src, migrates, then wakes
the dest.

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-test.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 200f023..af661f8 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -638,6 +638,9 @@ static void migrate_postcopy_start(QTestState *from, 
QTestState *to)
 {
 qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }");
 
+if (src_state.suspend_me) {
+wait_for_suspend(from, _state);
+}
 wait_for_stop(from, _state);
 qtest_qmp_eventwait(to, "RESUME");
 }
@@ -1359,6 +1362,11 @@ static void migrate_postcopy_complete(QTestState *from, 
QTestState *to,
 {
 wait_for_migration_complete(from);
 
+if (args->start.suspend_me) {
+/* wakeup succeeds only if guest is suspended */
+qtest_qmp_assert_success(to, "{'execute': 'system_wakeup'}");
+}
+
 /* Make sure we get at least one "B" on destination */
 wait_for_serial("dest_serial");
 
@@ -1392,6 +1400,15 @@ static void test_postcopy(void)
 test_postcopy_common();
 }
 
+static void test_postcopy_suspend(void)
+{
+MigrateCommon args = {
+.start.suspend_me = true,
+};
+
+test_postcopy_common();
+}
+
 static void test_postcopy_compress(void)
 {
 MigrateCommon args = {
@@ -3412,7 +3429,10 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/postcopy/recovery/double-failures",
test_postcopy_recovery_double_fail);
 #endif /* _WIN32 */
-
+if (is_x86) {
+qtest_add_func("/migration/postcopy/suspend",
+   test_postcopy_suspend);
+}
 }
 
 qtest_add_func("/migration/bad_dest", test_baddest);
-- 
1.8.3.1




[PATCH V6 09/14] tests/qtest: migration events

2023-11-30 Thread Steve Sistare
Define a state object to capture events seen by migration tests, to allow
more events to be captured in a subsequent patch, and simplify event
checking in wait_for_migration_pass.  No functional change.

Signed-off-by: Steve Sistare 
Reviewed-by: Fabiano Rosas 
Reviewed-by: Daniel P. Berrangé 
---
 tests/qtest/migration-helpers.c | 24 ---
 tests/qtest/migration-helpers.h |  9 ++--
 tests/qtest/migration-test.c| 91 +++--
 3 files changed, 56 insertions(+), 68 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 24fb7b3..fd3b94e 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -24,26 +24,16 @@
  */
 #define MIGRATION_STATUS_WAIT_TIMEOUT 120
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-QDict *event, void *opaque)
-{
-bool *seen = opaque;
-
-if (g_str_equal(name, "STOP")) {
-*seen = true;
-return true;
-}
-
-return false;
-}
-
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque)
 {
-bool *seen = opaque;
+QTestMigrationState *state = opaque;
 
-if (g_str_equal(name, "RESUME")) {
-*seen = true;
+if (g_str_equal(name, "STOP")) {
+state->stop_seen = true;
+return true;
+} else if (g_str_equal(name, "RESUME")) {
+state->resume_seen = true;
 return true;
 }
 
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index e31dc85..3d32699 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -15,9 +15,12 @@
 
 #include "libqtest.h"
 
-bool migrate_watch_for_stop(QTestState *who, const char *name,
-QDict *event, void *opaque);
-bool migrate_watch_for_resume(QTestState *who, const char *name,
+typedef struct QTestMigrationState {
+bool stop_seen;
+bool resume_seen;
+} QTestMigrationState;
+
+bool migrate_watch_for_events(QTestState *who, const char *name,
   QDict *event, void *opaque);
 
 G_GNUC_PRINTF(3, 4)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0fbaa6a..05c0740 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -43,8 +43,8 @@
 unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
-static bool got_src_stop;
-static bool got_dst_resume;
+static QTestMigrationState src_state;
+static QTestMigrationState dst_state;
 
 /*
  * An initial 3 MB offset is used as that corresponds
@@ -230,6 +230,20 @@ static void wait_for_serial(const char *side)
 } while (true);
 }
 
+static void wait_for_stop(QTestState *who, QTestMigrationState *state)
+{
+if (!state->stop_seen) {
+qtest_qmp_eventwait(who, "STOP");
+}
+}
+
+static void wait_for_resume(QTestState *who, QTestMigrationState *state)
+{
+if (!state->resume_seen) {
+qtest_qmp_eventwait(who, "RESUME");
+}
+}
+
 /*
  * It's tricky to use qemu's migration event capability with qtest,
  * events suddenly appearing confuse the qmp()/hmp() responses.
@@ -277,21 +291,19 @@ static void read_blocktime(QTestState *who)
 qobject_unref(rsp_return);
 }
 
+/*
+ * Wait for two changes in the migration pass count, but bail if we stop.
+ */
 static void wait_for_migration_pass(QTestState *who)
 {
-uint64_t initial_pass = get_migration_pass(who);
-uint64_t pass;
+uint64_t pass, prev_pass = 0, changes = 0;
 
-/* Wait for the 1st sync */
-while (!got_src_stop && !initial_pass) {
-usleep(1000);
-initial_pass = get_migration_pass(who);
-}
-
-do {
+while (changes < 2 && !src_state.stop_seen) {
 usleep(1000);
 pass = get_migration_pass(who);
-} while (pass == initial_pass && !got_src_stop);
+changes += (pass != prev_pass);
+prev_pass = pass;
+}
 }
 
 static void check_guests_ram(QTestState *who)
@@ -617,10 +629,7 @@ static void migrate_postcopy_start(QTestState *from, 
QTestState *to)
 {
 qtest_qmp_assert_success(from, "{ 'execute': 'migrate-start-postcopy' }");
 
-if (!got_src_stop) {
-qtest_qmp_eventwait(from, "STOP");
-}
-
+wait_for_stop(from, _state);
 qtest_qmp_eventwait(to, "RESUME");
 }
 
@@ -756,8 +765,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 }
 }
 
-got_src_stop = false;
-got_dst_resume = false;
+dst_state = (QTestMigrationState) { };
+src_state = (QTestMigrationState) { };
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 memory_size = "150M";
 
@@ -848,8 +857,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 if (!args->only_target) {
 *from = qtest_init_with_env(QEMU_ENV_SRC, cmd_source);
 

[PATCH V6 13/14] tests/qtest: bootfile per vm

2023-11-30 Thread Steve Sistare
Create a separate bootfile for the outgoing and incoming vm, so the block
layer can lock the file during the background migration test.  Otherwise,
the test fails with:
  "Failed to get "write" lock.  Is another process using the image
   [/tmp/migration-test-WAKPD2/bootsect]?"

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-test.c | 44 +++-
 1 file changed, 27 insertions(+), 17 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index af661f8..e16710f 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -124,7 +124,8 @@ static bool ufd_version_check(void)
 #endif
 
 static char *tmpfs;
-static char *bootpath;
+static char *src_bootpath;
+static char *dst_bootpath;
 
 /* The boot file modifies memory area in [start_address, end_address)
  * repeatedly. It outputs a 'B' at a fixed rate while it's still running.
@@ -133,13 +134,13 @@ static char *bootpath;
 #include "tests/migration/aarch64/a-b-kernel.h"
 #include "tests/migration/s390x/a-b-bios.h"
 
-static void bootfile_create(char *dir, bool suspend_me)
+static char *bootfile_create(char *dir, const char *prefix, bool suspend_me)
 {
 const char *arch = qtest_get_arch();
 unsigned char *content;
 size_t len;
+char *bootpath = g_strdup_printf("%s/%s-bootsect", dir, prefix);
 
-bootpath = g_strdup_printf("%s/bootsect", dir);
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
 /* the assembled x86 boot sector should be exactly one sector large */
 g_assert(sizeof(x86_bootsect) == 512);
@@ -153,7 +154,7 @@ static void bootfile_create(char *dir, bool suspend_me)
 /*
  * sane architectures can be programmed at the boot prompt
  */
-return;
+return NULL;
 } else if (strcmp(arch, "aarch64") == 0) {
 content = aarch64_kernel;
 len = sizeof(aarch64_kernel);
@@ -166,13 +167,15 @@ static void bootfile_create(char *dir, bool suspend_me)
 
 g_assert_cmpint(fwrite(content, len, 1, bootfile), ==, 1);
 fclose(bootfile);
+return bootpath;
 }
 
-static void bootfile_delete(void)
+static void bootfile_delete(char *bootpath)
 {
-unlink(bootpath);
-g_free(bootpath);
-bootpath = NULL;
+if (bootpath) {
+unlink(bootpath);
+g_free(bootpath);
+}
 }
 
 /*
@@ -766,6 +769,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 const gchar *ignore_stderr;
 g_autofree char *shmem_opts = NULL;
 g_autofree char *shmem_path = NULL;
+const char *arch_boot_fmt = NULL;
 const char *kvm_opts = NULL;
 const char *arch = qtest_get_arch();
 const char *memory_size;
@@ -781,7 +785,8 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 
 dst_state = (QTestMigrationState) { };
 src_state = (QTestMigrationState) { };
-bootfile_create(tmpfs, args->suspend_me);
+src_bootpath = bootfile_create(tmpfs, "src", args->suspend_me);
+dst_bootpath = bootfile_create(tmpfs, "dst", args->suspend_me);
 src_state.suspend_me = args->suspend_me;
 
 if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
@@ -792,15 +797,14 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 } else {
 machine_alias = "q35";
 }
-arch_opts = g_strdup_printf(
-"-drive if=none,id=d0,file=%s,format=raw "
-"-device ide-hd,drive=d0,secs=1,cyls=1,heads=1", bootpath);
+arch_boot_fmt = "-drive if=none,id=d0,file=%s,format=raw "
+"-device ide-hd,drive=d0,secs=1,cyls=1,heads=1";
 start_address = X86_TEST_MEM_START;
 end_address = X86_TEST_MEM_END;
 } else if (g_str_equal(arch, "s390x")) {
 memory_size = "128M";
 machine_alias = "s390-ccw-virtio";
-arch_opts = g_strdup_printf("-bios %s", bootpath);
+arch_boot_fmt = "-bios %s";
 start_address = S390_TEST_MEM_START;
 end_address = S390_TEST_MEM_END;
 } else if (strcmp(arch, "ppc64") == 0) {
@@ -818,13 +822,18 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 memory_size = "150M";
 machine_alias = "virt";
 machine_opts = "gic-version=max";
-arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
+arch_boot_fmt = "-cpu max -kernel %s";
 start_address = ARM_TEST_MEM_START;
 end_address = ARM_TEST_MEM_END;
 } else {
 g_assert_not_reached();
 }
 
+if (arch_boot_fmt) {
+arch_source = g_strdup_printf(arch_boot_fmt, src_bootpath);
+arch_target = g_strdup_printf(arch_boot_fmt, dst_bootpath);
+}
+
 if (!getenv("QTEST_LOG") && args->hide_stderr) {
 #ifndef _WIN32
 ignore_stderr = "2>/dev/null";
@@ -3052,13 +3061,13 @@ static QTestState *dirtylimit_start_vm(void)
 QTestState *vm = NULL;
 g_autofree gchar *cmd = NULL;
 
-

[PATCH V6 03/14] cpus: stop vm in suspended runstate

2023-11-30 Thread Steve Sistare
Currently, a vm in the suspended state is not completely stopped.  The VCPUs
have been paused, but the cpu clock still runs, and runstate notifiers for
the transition to stopped have not been called.  This causes problems for
live migration.  Stale cpu timers_state is saved to the migration stream,
causing time errors in the guest when it wakes from suspend, and state that
would have been modified by runstate notifiers is wrong.

Modify vm_stop to completely stop the vm if the current state is suspended,
transition to RUN_STATE_PAUSED, and remember that the machine was suspended.
Modify vm_start to restore the suspended state.

This affects all callers of vm_stop and vm_start, notably, the qapi stop and
cont commands.  For example:

(qemu) info status
VM status: paused (suspended)

(qemu) stop
(qemu) info status
VM status: paused

(qemu) cont
(qemu) info status
VM status: paused (suspended)

(qemu) system_wakeup
(qemu) info status
VM status: running

Suggested-by: Peter Xu 
Signed-off-by: Steve Sistare 
---
 include/sysemu/runstate.h |  5 +
 qapi/misc.json| 10 --
 system/cpus.c | 19 ++-
 system/runstate.c |  3 +++
 4 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index f6a337b..1d6828f 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -40,6 +40,11 @@ static inline bool shutdown_caused_by_guest(ShutdownCause 
cause)
 return cause >= SHUTDOWN_CAUSE_GUEST_SHUTDOWN;
 }
 
+static inline bool runstate_is_started(RunState state)
+{
+return state == RUN_STATE_RUNNING || state == RUN_STATE_SUSPENDED;
+}
+
 void vm_start(void);
 
 /**
diff --git a/qapi/misc.json b/qapi/misc.json
index cda2eff..efb8d44 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -134,7 +134,7 @@
 ##
 # @stop:
 #
-# Stop all guest VCPU execution.
+# Stop all guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -143,6 +143,9 @@
 # the guest remains paused once migration finishes, as if the -S
 # option was passed on the command line.
 #
+# In the "suspended" state, it will completely stop the VM and
+# cause a transition to the "paused" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "stop" }
@@ -153,7 +156,7 @@
 ##
 # @cont:
 #
-# Resume guest VCPU execution.
+# Resume guest VCPU and VM execution.
 #
 # Since: 0.14
 #
@@ -165,6 +168,9 @@
 # guest starts once migration finishes, removing the effect of the
 # -S command line option if it was passed.
 #
+# If the VM was previously suspended, and not been reset or woken,
+# this command will transition back to the "suspended" state. (Since 9.0)
+#
 # Example:
 #
 # -> { "execute": "cont" }
diff --git a/system/cpus.c b/system/cpus.c
index ef7a0d3..cbc6d6d 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -277,11 +277,15 @@ bool vm_get_suspended(void)
 static int do_vm_stop(RunState state, bool send_stop)
 {
 int ret = 0;
+RunState oldstate = runstate_get();
 
-if (runstate_is_running()) {
+if (runstate_is_started(oldstate)) {
+vm_was_suspended = (oldstate == RUN_STATE_SUSPENDED);
 runstate_set(state);
 cpu_disable_ticks();
-pause_all_vcpus();
+if (oldstate == RUN_STATE_RUNNING) {
+pause_all_vcpus();
+}
 vm_state_notify(0, state);
 if (send_stop) {
 qapi_event_send_stop();
@@ -736,8 +740,13 @@ int vm_prepare_start(bool step_pending, RunState state)
 
 void vm_start(void)
 {
-if (!vm_prepare_start(false, RUN_STATE_RUNNING)) {
-resume_all_vcpus();
+RunState state = vm_was_suspended ? RUN_STATE_SUSPENDED : 
RUN_STATE_RUNNING;
+
+if (!vm_prepare_start(false, state)) {
+if (state == RUN_STATE_RUNNING) {
+resume_all_vcpus();
+}
+vm_was_suspended = false;
 }
 }
 
@@ -745,7 +754,7 @@ void vm_start(void)
current state is forgotten forever */
 int vm_stop_force_state(RunState state)
 {
-if (runstate_is_running()) {
+if (runstate_is_started(runstate_get())) {
 return vm_stop(state);
 } else {
 int ret;
diff --git a/system/runstate.c b/system/runstate.c
index ea9d6c2..e2fa204 100644
--- a/system/runstate.c
+++ b/system/runstate.c
@@ -108,6 +108,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_PAUSED, RUN_STATE_POSTMIGRATE },
 { RUN_STATE_PAUSED, RUN_STATE_PRELAUNCH },
 { RUN_STATE_PAUSED, RUN_STATE_COLO},
+{ RUN_STATE_PAUSED, RUN_STATE_SUSPENDED},
 
 { RUN_STATE_POSTMIGRATE, RUN_STATE_RUNNING },
 { RUN_STATE_POSTMIGRATE, RUN_STATE_FINISH_MIGRATE },
@@ -161,6 +162,7 @@ static const RunStateTransition runstate_transitions_def[] 
= {
 { RUN_STATE_SUSPENDED, RUN_STATE_FINISH_MIGRATE },
 { RUN_STATE_SUSPENDED, RUN_STATE_PRELAUNCH },
 { RUN_STATE_SUSPENDED, RUN_STATE_COLO},
+{ RUN_STATE_SUSPENDED, 

[PATCH V6 01/14] cpus: pass runstate to vm_prepare_start

2023-11-30 Thread Steve Sistare
When a vm in the suspended state is migrated, we must call vm_prepare_start
on the destination, so a later system_wakeup properly resumes the guest,
when main_loop_should_exit calls resume_all_vcpus.  However, the runstate
should remain suspended until system_wakeup is called, so allow the caller
to pass the new state to vm_prepare_start, rather than assume the new state
is RUN_STATE_RUNNING.  Modify vm state change handlers that check
RUN_STATE_RUNNING to instead use the running parameter.

No functional change.

Suggested-by: Peter Xu 
Signed-off-by: Steve Sistare 
Reviewed-by: Peter Xu 
---
 backends/tpm/tpm_emulator.c | 2 +-
 gdbstub/system.c| 2 +-
 hw/usb/hcd-ehci.c   | 2 +-
 hw/usb/redirect.c   | 2 +-
 hw/xen/xen-hvm-common.c | 2 +-
 include/sysemu/runstate.h   | 4 +++-
 system/cpus.c   | 8 
 7 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/backends/tpm/tpm_emulator.c b/backends/tpm/tpm_emulator.c
index f7f1b4a..254fce7 100644
--- a/backends/tpm/tpm_emulator.c
+++ b/backends/tpm/tpm_emulator.c
@@ -904,7 +904,7 @@ static void tpm_emulator_vm_state_change(void *opaque, bool 
running,
 
 trace_tpm_emulator_vm_state_change(running, state);
 
-if (!running || state != RUN_STATE_RUNNING || !tpm_emu->relock_storage) {
+if (!running || !tpm_emu->relock_storage) {
 return;
 }
 
diff --git a/gdbstub/system.c b/gdbstub/system.c
index 783ac14..7ab9f82 100644
--- a/gdbstub/system.c
+++ b/gdbstub/system.c
@@ -570,7 +570,7 @@ int gdb_continue_partial(char *newstates)
 }
 }
 
-if (vm_prepare_start(step_requested)) {
+if (vm_prepare_start(step_requested, RUN_STATE_RUNNING)) {
 return 0;
 }
 
diff --git a/hw/usb/hcd-ehci.c b/hw/usb/hcd-ehci.c
index 19b4534..10c82ce 100644
--- a/hw/usb/hcd-ehci.c
+++ b/hw/usb/hcd-ehci.c
@@ -2451,7 +2451,7 @@ static void usb_ehci_vm_state_change(void *opaque, bool 
running, RunState state)
  * USB-devices which have async handled packages have a packet in the
  * ep queue to match the completion with.
  */
-if (state == RUN_STATE_RUNNING) {
+if (running) {
 ehci_advance_async_state(ehci);
 }
 
diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
index c9893df..3785bb0 100644
--- a/hw/usb/redirect.c
+++ b/hw/usb/redirect.c
@@ -1403,7 +1403,7 @@ static void usbredir_vm_state_change(void *priv, bool 
running, RunState state)
 {
 USBRedirDevice *dev = priv;
 
-if (state == RUN_STATE_RUNNING && dev->parser != NULL) {
+if (running && dev->parser != NULL) {
 usbredirparser_do_write(dev->parser); /* Flush any pending writes */
 }
 }
diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
index 565dc39..47e6cb1 100644
--- a/hw/xen/xen-hvm-common.c
+++ b/hw/xen/xen-hvm-common.c
@@ -623,7 +623,7 @@ void xen_hvm_change_state_handler(void *opaque, bool 
running,
 
 xen_set_ioreq_server_state(xen_domid,
state->ioservid,
-   (rstate == RUN_STATE_RUNNING));
+   running);
 }
 
 void xen_exit_notifier(Notifier *n, void *data)
diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index c8c2bd8..9e78c7f 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -46,8 +46,10 @@ void vm_start(void);
  * vm_prepare_start: Prepare for starting/resuming the VM
  *
  * @step_pending: whether any of the CPUs is about to be single-stepped by gdb
+ * @state: the vm state to setup
  */
-int vm_prepare_start(bool step_pending);
+int vm_prepare_start(bool step_pending, RunState state);
+
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
diff --git a/system/cpus.c b/system/cpus.c
index a444a74..0c60d7a 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -682,7 +682,7 @@ int vm_stop(RunState state)
  * Returns -1 if the vCPUs are not to be restarted (e.g. if they are already
  * running or in case of an error condition), 0 otherwise.
  */
-int vm_prepare_start(bool step_pending)
+int vm_prepare_start(bool step_pending, RunState state)
 {
 RunState requested;
 
@@ -714,14 +714,14 @@ int vm_prepare_start(bool step_pending)
 qapi_event_send_resume();
 
 cpu_enable_ticks();
-runstate_set(RUN_STATE_RUNNING);
-vm_state_notify(1, RUN_STATE_RUNNING);
+runstate_set(state);
+vm_state_notify(1, state);
 return 0;
 }
 
 void vm_start(void)
 {
-if (!vm_prepare_start(false)) {
+if (!vm_prepare_start(false, RUN_STATE_RUNNING)) {
 resume_all_vcpus();
 }
 }
-- 
1.8.3.1




[PATCH V6 07/14] migration: preserve suspended for snapshot

2023-11-30 Thread Steve Sistare
Restoring a snapshot can break a suspended guest.  Snapshots suffer from
the same suspended-state issues that affect live migration, plus they must
handle an additional problematic scenario, which is that a running vm must
remain running if it loads a suspended snapshot.  Currently, after loading
such a snapshot, the vm_start fails.  The runstate is RUNNING, but the guest
is not.

To save, the vm_stop call now completely stops the suspended state, courtesy
of a recent patch.  Finish with vm_resume to leave the vm in the state it had
prior to the save, correctly restoring the suspended state.

To load, if the snapshot is not suspended, then vm_stop + vm_resume
correctly handles all states, and leaves the vm in the state it had prior
to the load.  However, if the snapshot is suspended, restoration is
trickier.  First, call vm_resume to restore the state to suspended so the
current state matches the saved state.  Then, if the pre-load state is
running, call wakeup to resume running.

Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and
RUN_STATE_RESTORE_VM did not change runstate if the current state was
paused, suspended, or prelaunch, but now it does, so allow these
transitions.

Signed-off-by: Steve Sistare 
---
 include/migration/snapshot.h   |  7 +++
 migration/migration-hmp-cmds.c |  8 +---
 migration/savevm.c | 23 +--
 system/runstate.c  |  5 +
 system/vl.c|  2 ++
 5 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index e72083b..9e4dcaa 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -16,6 +16,7 @@
 #define QEMU_MIGRATION_SNAPSHOT_H
 
 #include "qapi/qapi-builtin-types.h"
+#include "qapi/qapi-types-run-state.h"
 
 /**
  * save_snapshot: Save an internal snapshot.
@@ -61,4 +62,10 @@ bool delete_snapshot(const char *name,
 bool has_devices, strList *devices,
 Error **errp);
 
+/**
+ * load_snapshot_resume: Restore runstate after loading snapshot.
+ * @state: state to restore
+ */
+void load_snapshot_resume(RunState state);
+
 #endif
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 86ae832..c8d70bc 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -399,15 +399,17 @@ void hmp_info_migrate_parameters(Monitor *mon, const 
QDict *qdict)
 
 void hmp_loadvm(Monitor *mon, const QDict *qdict)
 {
-int saved_vm_running  = runstate_is_running();
+RunState saved_state = runstate_get();
+
 const char *name = qdict_get_str(qdict, "name");
 Error *err = NULL;
 
 vm_stop(RUN_STATE_RESTORE_VM);
 
-if (load_snapshot(name, NULL, false, NULL, ) && saved_vm_running) {
-vm_start();
+if (load_snapshot(name, NULL, false, NULL, )) {
+load_snapshot_resume(saved_state);
 }
+
 hmp_handle_error(mon, err);
 }
 
diff --git a/migration/savevm.c b/migration/savevm.c
index eec5503..78697c0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -3046,7 +3046,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 QEMUSnapshotInfo sn1, *sn = 
 int ret = -1, ret2;
 QEMUFile *f;
-int saved_vm_running;
+RunState saved_state = runstate_get();
 uint64_t vm_state_size;
 g_autoptr(GDateTime) now = g_date_time_new_now_local();
 AioContext *aio_context;
@@ -3094,8 +3094,6 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 }
 aio_context = bdrv_get_aio_context(bs);
 
-saved_vm_running = runstate_is_running();
-
 global_state_store();
 vm_stop(RUN_STATE_SAVE_VM);
 
@@ -3163,9 +3161,7 @@ bool save_snapshot(const char *name, bool overwrite, 
const char *vmstate,
 
 bdrv_drain_all_end();
 
-if (saved_vm_running) {
-vm_start();
-}
+vm_resume(saved_state);
 return ret == 0;
 }
 
@@ -3339,6 +3335,14 @@ err_drain:
 return false;
 }
 
+void load_snapshot_resume(RunState state)
+{
+vm_resume(state);
+if (state == RUN_STATE_RUNNING && runstate_get() == RUN_STATE_SUSPENDED) {
+qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
+}
+}
+
 bool delete_snapshot(const char *name, bool has_devices,
  strList *devices, Error **errp)
 {
@@ -3403,16 +3407,15 @@ static void snapshot_load_job_bh(void *opaque)
 {
 Job *job = opaque;
 SnapshotJob *s = container_of(job, SnapshotJob, common);
-int orig_vm_running;
+RunState orig_state = runstate_get();
 
 job_progress_set_remaining(>common, 1);
 
-orig_vm_running = runstate_is_running();
 vm_stop(RUN_STATE_RESTORE_VM);
 
 s->ret = load_snapshot(s->tag, s->vmstate, true, s->devices, s->errp);
-if (s->ret && orig_vm_running) {
-vm_start();
+if (s->ret) {
+load_snapshot_resume(orig_state);
 }
 
 job_progress_update(>common, 1);
diff 

[PATCH V6 14/14] tests/qtest: background migration with suspend

2023-11-30 Thread Steve Sistare
Add a test case to verify that the suspended state is handled correctly by
a background migration.  The test suspends the src, migrates, then wakes
the dest.

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-test.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e16710f..30d4b32 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1954,6 +1954,26 @@ static void test_precopy_unix_suspend_notlive(void)
 test_precopy_common();
 }
 
+static void *test_bg_suspend_start(QTestState *from, QTestState *to)
+{
+migrate_set_capability(from, "background-snapshot", true);
+return NULL;
+}
+
+static void test_bg_suspend(void)
+{
+g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
+MigrateCommon args = {
+.listen_uri = uri,
+.connect_uri = uri,
+.live = true,   /* runs fast, the src suspends immediately. */
+.start.suspend_me = true,
+.start_hook = test_bg_suspend_start
+};
+
+test_precopy_common();
+}
+
 static void test_precopy_unix_dirty_ring(void)
 {
 g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
@@ -3441,6 +3461,7 @@ int main(int argc, char **argv)
 if (is_x86) {
 qtest_add_func("/migration/postcopy/suspend",
test_postcopy_suspend);
+qtest_add_func("/migration/bg/suspend", test_bg_suspend);
 }
 }
 
-- 
1.8.3.1




[PATCH V6 05/14] migration: propagate suspended runstate

2023-11-30 Thread Steve Sistare
If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state.  To maintain backward and forward compatibility, define
the new field in a zero'd hole in the GlobalState struct.

Signed-off-by: Steve Sistare 
---
 migration/global_state.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/migration/global_state.c b/migration/global_state.c
index 4e2a9d8..de2532c 100644
--- a/migration/global_state.c
+++ b/migration/global_state.c
@@ -25,6 +25,7 @@ typedef struct {
 uint8_t runstate[100];
 RunState state;
 bool received;
+bool vm_was_suspended;
 } GlobalState;
 
 static GlobalState global_state;
@@ -35,6 +36,7 @@ static void global_state_do_store(RunState state)
 assert(strlen(state_str) < sizeof(global_state.runstate));
 strpadcpy((char *)global_state.runstate, sizeof(global_state.runstate),
   state_str, '\0');
+global_state.vm_was_suspended = vm_get_suspended();
 }
 
 void global_state_store(void)
@@ -68,6 +70,12 @@ static bool global_state_needed(void *opaque)
 return true;
 }
 
+/* If the suspended state must be remembered, it is needed */
+
+if (vm_get_suspended()) {
+return true;
+}
+
 /* If state is running or paused, it is not needed */
 
 if (strcmp(runstate, "running") == 0 ||
@@ -109,6 +117,7 @@ static int global_state_post_load(void *opaque, int 
version_id)
 return -EINVAL;
 }
 s->state = r;
+vm_set_suspended(s->vm_was_suspended || r == RUN_STATE_SUSPENDED);
 
 return 0;
 }
@@ -134,6 +143,7 @@ static const VMStateDescription vmstate_globalstate = {
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(size, GlobalState),
 VMSTATE_BUFFER(runstate, GlobalState),
+VMSTATE_BOOL(vm_was_suspended, GlobalState),
 VMSTATE_END_OF_LIST()
 },
 };
-- 
1.8.3.1




[PATCH V6 04/14] cpus: vm_resume

2023-11-30 Thread Steve Sistare
Define the vm_resume helper, for use in subsequent patches.

Signed-off-by: Steve Sistare 
---
 include/sysemu/runstate.h | 8 
 system/cpus.c | 9 +
 2 files changed, 17 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 1d6828f..a900cec 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -55,6 +55,14 @@ void vm_start(void);
  */
 int vm_prepare_start(bool step_pending, RunState state);
 
+/**
+ * vm_resume: If @state is a startable state, start the vm and set the state,
+ * else just set the state.
+ *
+ * @state: the state to restore
+ */
+void vm_resume(RunState state);
+
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
diff --git a/system/cpus.c b/system/cpus.c
index cbc6d6d..63cf356 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -750,6 +750,15 @@ void vm_start(void)
 }
 }
 
+void vm_resume(RunState state)
+{
+if (runstate_is_started(state)) {
+vm_start();
+} else {
+runstate_set(state);
+}
+}
+
 /* does a state transition even if the VM is already stopped,
current state is forgotten forever */
 int vm_stop_force_state(RunState state)
-- 
1.8.3.1




[PATCH V6 02/14] cpus: vm_was_suspended

2023-11-30 Thread Steve Sistare
Add a state variable to remember if a vm previously transitioned into a
suspended state.

Signed-off-by: Steve Sistare 
---
 include/sysemu/runstate.h |  2 ++
 system/cpus.c | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/include/sysemu/runstate.h b/include/sysemu/runstate.h
index 9e78c7f..f6a337b 100644
--- a/include/sysemu/runstate.h
+++ b/include/sysemu/runstate.h
@@ -53,6 +53,8 @@ int vm_prepare_start(bool step_pending, RunState state);
 int vm_stop(RunState state);
 int vm_stop_force_state(RunState state);
 int vm_shutdown(void);
+void vm_set_suspended(bool suspended);
+bool vm_get_suspended(void);
 
 typedef enum WakeupReason {
 /* Always keep QEMU_WAKEUP_REASON_NONE = 0 */
diff --git a/system/cpus.c b/system/cpus.c
index 0c60d7a..ef7a0d3 100644
--- a/system/cpus.c
+++ b/system/cpus.c
@@ -259,6 +259,21 @@ void cpu_interrupt(CPUState *cpu, int mask)
 }
 }
 
+/*
+ * True if the vm was previously suspended, and has not been woken or reset.
+ */
+static int vm_was_suspended;
+
+void vm_set_suspended(bool suspended)
+{
+vm_was_suspended = suspended;
+}
+
+bool vm_get_suspended(void)
+{
+return vm_was_suspended;
+}
+
 static int do_vm_stop(RunState state, bool send_stop)
 {
 int ret = 0;
-- 
1.8.3.1




Re: [PATCH 06/12] scsi: remove AioContext locking

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:47PM -0500, Stefan Hajnoczi wrote:
> The AioContext lock no longer has any effect. Remove it.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/hw/virtio/virtio-scsi.h | 14 --
>  hw/scsi/scsi-bus.c  |  2 --
>  hw/scsi/scsi-disk.c | 28 
>  hw/scsi/virtio-scsi.c   | 18 --
>  4 files changed, 4 insertions(+), 58 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH 05/12] block: remove AioContext locking

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:46PM -0500, Stefan Hajnoczi wrote:
> This is the big patch that removes
> aio_context_acquire()/aio_context_release() from the block layer and
> affected block layer users.
> 
> There isn't a clean way to split this patch and the reviewers are likely
> the same group of people, so I decided to do it in one patch.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---

> +++ b/block.c
> @@ -7585,29 +7433,12 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, 
> AioContext *old_ctx)
>  
>  void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
>  {
> -AioContext *ctx = bdrv_get_aio_context(bs);
> -
> -/* In the main thread, bs->aio_context won't change concurrently */
> -assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> -
> -/*
> - * We're in coroutine context, so we already hold the lock of the main
> - * loop AioContext. Don't lock it twice to avoid deadlocks.
> - */
> -assert(qemu_in_coroutine());

Is this assertion worth keeping in the short term?...

> -if (ctx != qemu_get_aio_context()) {
> -aio_context_acquire(ctx);
> -}
> +/* TODO removed in next patch */
>  }

...I guess I'll see in the next patch.

>  
>  void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
>  {
> -AioContext *ctx = bdrv_get_aio_context(bs);
> -
> -assert(qemu_in_coroutine());
> -if (ctx != qemu_get_aio_context()) {
> -aio_context_release(ctx);
> -}
> +/* TODO removed in next patch */
>  }

Same comment.

> +++ b/blockdev.c
> @@ -1395,7 +1352,6 @@ static void external_snapshot_action(TransactionAction 
> *action,
>  /* File name of the new image (for 'blockdev-snapshot-sync') */
>  const char *new_image_file;
>  ExternalSnapshotState *state = g_new0(ExternalSnapshotState, 1);
> -AioContext *aio_context;
>  uint64_t perm, shared;
>  
>  /* TODO We'll eventually have to take a writer lock in this function */

I'm guessing removal of the locking gets us one step closer to
implementing this TODO at a later time?  Or is it now a stale comment?
Either way, it doesn't affect this patch.

> +++ b/migration/block.c
> @@ -270,7 +270,6 @@ static int mig_save_device_bulk(QEMUFile *f, 
> BlkMigDevState *bmds)
>  
>  if (bmds->shared_base) {
>  qemu_mutex_lock_iothread();
> -aio_context_acquire(blk_get_aio_context(bb));
...
> @@ -313,22 +311,10 @@ static int mig_save_device_bulk(QEMUFile *f, 
> BlkMigDevState *bmds)
>  block_mig_state.submitted++;
>  blk_mig_unlock();
>  
> -/* We do not know if bs is under the main thread (and thus does
> - * not acquire the AioContext when doing AIO) or rather under
> - * dataplane.  Thus acquire both the iothread mutex and the
> - * AioContext.
> - *
> - * This is ugly and will disappear when we make bdrv_* thread-safe,
> - * without the need to acquire the AioContext.
> - */
> -qemu_mutex_lock_iothread();
> -aio_context_acquire(blk_get_aio_context(bmds->blk));

Will conflict, but with trivial resolution, with your other thread
renaming things to qemu_bql_lock().


> +++ b/tests/unit/test-blockjob.c

> -static void test_complete_in_standby(void)
> -{

> @@ -531,13 +402,5 @@ int main(int argc, char **argv)
>  g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
>  g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
>  g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
> -
> -/*
> - * This test is flaky and sometimes fails in CI and otherwise:
> - * don't run unless user opts in via environment variable.
> - */
> -if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> -g_test_add_func("/blockjob/complete_in_standby", 
> test_complete_in_standby);
> -}

Looks like you ripped out this entire test, because it is no longer
viable.  I might have mentioned it in the commit message, or squashed
the removal of this test into the earlier 02/12 patch.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v4] accel/kvm: Turn DPRINTF macro use into tracepoints

2023-11-30 Thread Alex Bennée
Jai Arora  writes:

> Patch removes DRPINTF macro and adds multiple tracepoints
> to capture different kvm events.
>
> We also drop the DPRINTFs that don't add any additional
> information than trace_kvm_run_exit already does.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827
>
> Signed-off-by: Jai Arora 

But you didn't add my reviewed-by tag ;-)

Tools like b4 will do the collection of tags for you if you want.

> ---
> v4: Adds changes in commit message requested by Alex Bennee
>
> ps.
>
> I tried using git notes for the change log
> May be it did not reflect. Thanks for the feedback and review
>
>  accel/kvm/kvm-all.c| 28 ++--
>  accel/kvm/trace-events |  7 ++-
>  2 files changed, 12 insertions(+), 23 deletions(-)
>
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index e39a810a4e..80ac7b35b7 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -69,16 +69,6 @@
>  #define KVM_GUESTDBG_BLOCKIRQ 0
>  #endif
>  
> -//#define DEBUG_KVM
> -
> -#ifdef DEBUG_KVM
> -#define DPRINTF(fmt, ...) \
> -do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
> -#else
> -#define DPRINTF(fmt, ...) \
> -do { } while (0)
> -#endif
> -
>  struct KVMParkedVcpu {
>  unsigned long vcpu_id;
>  int kvm_fd;
> @@ -331,7 +321,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>  struct KVMParkedVcpu *vcpu = NULL;
>  int ret = 0;
>  
> -DPRINTF("kvm_destroy_vcpu\n");
> +trace_kvm_destroy_vcpu();
>  
>  ret = kvm_arch_destroy_vcpu(cpu);
>  if (ret < 0) {
> @@ -341,7 +331,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
>  mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
>  if (mmap_size < 0) {
>  ret = mmap_size;
> -DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
> +trace_kvm_failed_get_vcpu_mmap_size();
>  goto err;
>  }
>  
> @@ -443,7 +433,6 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
> PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
>  if (cpu->kvm_dirty_gfns == MAP_FAILED) {
>  ret = -errno;
> -DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret);
>  goto err;
>  }
>  }
> @@ -2821,7 +2810,7 @@ int kvm_cpu_exec(CPUState *cpu)
>  struct kvm_run *run = cpu->kvm_run;
>  int ret, run_ret;
>  
> -DPRINTF("kvm_cpu_exec()\n");
> +trace_kvm_cpu_exec();
>  
>  if (kvm_arch_process_async_events(cpu)) {
>  qatomic_set(>exit_request, 0);
> @@ -2848,7 +2837,7 @@ int kvm_cpu_exec(CPUState *cpu)
>  
>  kvm_arch_pre_run(cpu, run);
>  if (qatomic_read(>exit_request)) {
> -DPRINTF("interrupt exit requested\n");
> + trace_kvm_interrupt_exit_request();
>  /*
>   * KVM requires us to reenter the kernel after IO exits to 
> complete
>   * instruction emulation. This self-signal will ensure that we
> @@ -2878,7 +2867,7 @@ int kvm_cpu_exec(CPUState *cpu)
>  
>  if (run_ret < 0) {
>  if (run_ret == -EINTR || run_ret == -EAGAIN) {
> -DPRINTF("io window exit\n");
> +trace_kvm_io_window_exit();
>  kvm_eat_signals(cpu);
>  ret = EXCP_INTERRUPT;
>  break;
> @@ -2900,7 +2889,6 @@ int kvm_cpu_exec(CPUState *cpu)
>  trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
>  switch (run->exit_reason) {
>  case KVM_EXIT_IO:
> -DPRINTF("handle_io\n");
>  /* Called outside BQL */
>  kvm_handle_io(run->io.port, attrs,
>(uint8_t *)run + run->io.data_offset,
> @@ -2910,7 +2898,6 @@ int kvm_cpu_exec(CPUState *cpu)
>  ret = 0;
>  break;
>  case KVM_EXIT_MMIO:
> -DPRINTF("handle_mmio\n");
>  /* Called outside BQL */
>  address_space_rw(_space_memory,
>   run->mmio.phys_addr, attrs,
> @@ -2920,11 +2907,9 @@ int kvm_cpu_exec(CPUState *cpu)
>  ret = 0;
>  break;
>  case KVM_EXIT_IRQ_WINDOW_OPEN:
> -DPRINTF("irq_window_open\n");
>  ret = EXCP_INTERRUPT;
>  break;
>  case KVM_EXIT_SHUTDOWN:
> -DPRINTF("shutdown\n");
>  qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
>  ret = EXCP_INTERRUPT;
>  break;
> @@ -2959,6 +2944,7 @@ int kvm_cpu_exec(CPUState *cpu)
>  ret = 0;
>  break;
>  case KVM_EXIT_SYSTEM_EVENT:
> +trace_kvm_run_exit_system_event(cpu->cpu_index, 
> run->system_event.type);
>  switch (run->system_event.type) {
>  case KVM_SYSTEM_EVENT_SHUTDOWN:
>  qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
> @@ -2976,13 +2962,11 @@ int kvm_cpu_exec(CPUState *cpu)
>  ret = 0;
>  break;
>  

Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Peter Xu
On Thu, Nov 30, 2023 at 03:43:25PM -0500, Stefan Hajnoczi wrote:
> On Thu, Nov 30, 2023 at 03:08:49PM -0500, Peter Xu wrote:
> > On Wed, Nov 29, 2023 at 04:26:20PM -0500, Stefan Hajnoczi wrote:
> > > The Big QEMU Lock (BQL) has many names and they are confusing. The
> > > actual QemuMutex variable is called qemu_global_mutex but it's commonly
> > > referred to as the BQL in discussions and some code comments. The
> > > locking APIs, however, are called qemu_mutex_lock_iothread() and
> > > qemu_mutex_unlock_iothread().
> > > 
> > > The "iothread" name is historic and comes from when the main thread was
> > > split into into KVM vcpu threads and the "iothread" (now called the main
> > > loop thread). I have contributed to the confusion myself by introducing
> > > a separate --object iothread, a separate concept unrelated to the BQL.
> > > 
> > > The "iothread" name is no longer appropriate for the BQL. Rename the
> > > locking APIs to:
> > > - void qemu_bql_lock(void)
> > > - void qemu_bql_unlock(void)
> > > - bool qemu_bql_locked(void)
> > > 
> > > There are more APIs with "iothread" in their names. Subsequent patches
> > > will rename them. There are also comments and documentation that will be
> > > updated in later patches.
> > > 
> > > Signed-off-by: Stefan Hajnoczi 
> > 
> > Acked-by: Peter Xu 
> > 
> > Two nickpicks:
> > 
> >   - BQL contains "QEMU" as the 2nd character, so maybe easier to further
> > rename qemu_bql into bql_?
> 
> Philippe wondered whether the variable name should end with _mutex (or
> _lock is common too), so an alternative might be big_qemu_lock. That's

IMHO mutex isn't important in this context, but an implementation detail of
the "lock" as an abstract concept.

For example, we won't need to rename it again then if the impl changes,
e.g. using pure futex or a rwlock replacement.  When that happens we don't
need to change all call sites again.

(never really meant to change the lock impl, just an example.. :)

KVM actually has that example of KVM_MMU_LOCK() macro taking as the rwlock
write lock when the spinlock is replaced with rwlock, while it'll keep to
be the spinlock "lock()" when !KVM_HAVE_MMU_RWLOCK.

> imperfect because it doesn't start with the usual qemu_ prefix.
> qemu_big_lock is better in that regard but inconsistent with our BQL
> abbreviation.
> 
> I don't like putting an underscore at the end. It's unusual and would
> make me wonder what that means.

Ah, I meant replacing the "qemu_bql_" prefix with "bql_", as that contains
QEMU already, rather than making "_" at the end.  So they'll be bql_lock(),
bql_unlock(), bql_locked().

> 
> Naming is hard, but please discuss and I'm open to change to BQL
> variable's name to whatever we all agree on.

I'm pretty okay with qemu_bql_lock(), etc. too.  I prefer a tiny little bit
on bql_ over qemu_bql_ in this regard, but frankly they're all names good
enough to me.  The "qemu_" prefix can still be a good thing saying "this is
a qemu global function", even if contained inside "bql" itself.

> 
> > 
> >   - Could we keep the full spell of BQL at some places, so people can still
> > reference it if not familiar?  IIUC most of the BQL helpers will root
> > back to the major three functions (_lock, _unlock, _locked), perhaps
> > add a comment of "BQL stands for..." over these three functions as
> > comment?
> 
> Yes, I'll update the doc comments to say "Big QEMU Lock (BQL)" for each
> of these functions.

Thanks!

-- 
Peter Xu




[PATCH-for-9.0] accel/tcg: Remove tb_invalidate_phys_page() from system emulation

2023-11-30 Thread Philippe Mathieu-Daudé
Since previous commit, tb_invalidate_phys_page() is not used
anymore in system emulation. Make it static for user emulation
and remove its public declaration in "exec/translate-all.h".

Signed-off-by: Philippe Mathieu-Daudé 
---
Based-on: <20231130203241.31099-1-phi...@linaro.org>
---
 include/exec/translate-all.h |  1 -
 accel/tcg/tb-maint.c | 24 +---
 2 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/include/exec/translate-all.h b/include/exec/translate-all.h
index 88602ae8d8..85c9460c7c 100644
--- a/include/exec/translate-all.h
+++ b/include/exec/translate-all.h
@@ -23,7 +23,6 @@
 
 
 /* translate-all.c */
-void tb_invalidate_phys_page(tb_page_addr_t addr);
 void tb_check_watchpoint(CPUState *cpu, uintptr_t retaddr);
 
 #ifdef CONFIG_USER_ONLY
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 3d2a896220..da39a43bd8 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -1021,7 +1021,7 @@ void tb_invalidate_phys_range(tb_page_addr_t start, 
tb_page_addr_t last)
  * Called with mmap_lock held for user-mode emulation
  * NOTE: this function must not be called while a TB is running.
  */
-void tb_invalidate_phys_page(tb_page_addr_t addr)
+static void tb_invalidate_phys_page(tb_page_addr_t addr)
 {
 tb_page_addr_t start, last;
 
@@ -1160,28 +1160,6 @@ tb_invalidate_phys_page_range__locked(struct 
page_collection *pages,
 #endif
 }
 
-/*
- * Invalidate all TBs which intersect with the target physical
- * address page @addr.
- */
-void tb_invalidate_phys_page(tb_page_addr_t addr)
-{
-struct page_collection *pages;
-tb_page_addr_t start, last;
-PageDesc *p;
-
-p = page_find(addr >> TARGET_PAGE_BITS);
-if (p == NULL) {
-return;
-}
-
-start = addr & TARGET_PAGE_MASK;
-last = addr | ~TARGET_PAGE_MASK;
-pages = page_collection_lock(start, last);
-tb_invalidate_phys_page_range__locked(pages, p, start, last, 0);
-page_collection_unlock(pages);
-}
-
 /*
  * Invalidate all TBs which intersect with the target physical address range
  * [start;last]. NOTE: start and end may refer to *different* physical pages.
-- 
2.41.0




Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 03:08:49PM -0500, Peter Xu wrote:
> On Wed, Nov 29, 2023 at 04:26:20PM -0500, Stefan Hajnoczi wrote:
> > The Big QEMU Lock (BQL) has many names and they are confusing. The
> > actual QemuMutex variable is called qemu_global_mutex but it's commonly
> > referred to as the BQL in discussions and some code comments. The
> > locking APIs, however, are called qemu_mutex_lock_iothread() and
> > qemu_mutex_unlock_iothread().
> > 
> > The "iothread" name is historic and comes from when the main thread was
> > split into into KVM vcpu threads and the "iothread" (now called the main
> > loop thread). I have contributed to the confusion myself by introducing
> > a separate --object iothread, a separate concept unrelated to the BQL.
> > 
> > The "iothread" name is no longer appropriate for the BQL. Rename the
> > locking APIs to:
> > - void qemu_bql_lock(void)
> > - void qemu_bql_unlock(void)
> > - bool qemu_bql_locked(void)
> > 
> > There are more APIs with "iothread" in their names. Subsequent patches
> > will rename them. There are also comments and documentation that will be
> > updated in later patches.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> Acked-by: Peter Xu 
> 
> Two nickpicks:
> 
>   - BQL contains "QEMU" as the 2nd character, so maybe easier to further
> rename qemu_bql into bql_?

Philippe wondered whether the variable name should end with _mutex (or
_lock is common too), so an alternative might be big_qemu_lock. That's
imperfect because it doesn't start with the usual qemu_ prefix.
qemu_big_lock is better in that regard but inconsistent with our BQL
abbreviation.

I don't like putting an underscore at the end. It's unusual and would
make me wonder what that means.

Naming is hard, but please discuss and I'm open to change to BQL
variable's name to whatever we all agree on.

> 
>   - Could we keep the full spell of BQL at some places, so people can still
> reference it if not familiar?  IIUC most of the BQL helpers will root
> back to the major three functions (_lock, _unlock, _locked), perhaps
> add a comment of "BQL stands for..." over these three functions as
> comment?

Yes, I'll update the doc comments to say "Big QEMU Lock (BQL)" for each
of these functions.

Stefan


signature.asc
Description: PGP signature


RE: [PATCH] Hexagon (target/hexagon) Fix shadow variable when idef-parser is off

2023-11-30 Thread Brian Cain


> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Thursday, November 30, 2023 2:17 PM
> To: Taylor Simpson ; qemu-devel@nongnu.org
> Cc: Brian Cain ; Matheus Bernardino (QUIC)
> ; Sid Manning ; Marco
> Liebel (QUIC) ; richard.hender...@linaro.org;
> a...@rev.ng; a...@rev.ng
> Subject: Re: [PATCH] Hexagon (target/hexagon) Fix shadow variable when idef-
> parser is off
> 
> WARNING: This email originated from outside of Qualcomm. Please be wary of
> any links or attachments, and do not enable macros.
> 
> On 30/11/23 19:39, Taylor Simpson wrote:
> > Adding -Werror=shadow=compatible-local causes Hexagon not to build
> > when idef-parser is off.  The "label" variable in CHECK_NOSHUF_PRED
> > shadows a variable in the surrounding code.
> >
> > Signed-off-by: Taylor Simpson 
> > ---
> >   target/hexagon/macros.h | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
> > index 9a51b5709b..f99390e2a8 100644
> > --- a/target/hexagon/macros.h
> > +++ b/target/hexagon/macros.h
> > @@ -93,13 +93,13 @@
> >
> >   #define CHECK_NOSHUF_PRED(GET_EA, SIZE, PRED) \
> >   do { \
> > -TCGLabel *label = gen_new_label(); \
> > -tcg_gen_brcondi_tl(TCG_COND_EQ, PRED, 0, label); \
> > +TCGLabel *noshuf_label = gen_new_label(); \
> > +tcg_gen_brcondi_tl(TCG_COND_EQ, PRED, 0, noshuf_label); \
> 
> Fragile, but sufficient.

The fragility here refers to the fact that CHECK_NOSHUF_PRED() macro could show 
up in other contexts and then could shadow those?

We could change the macro to a function or expand the macro to take a label 
declared outside.  Would that be preferred?  Or are there other suggestions?

-Brian


Re: [PATCH 6/6] Rename "QEMU global mutex" to "BQL" in comments and docs

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 02:49:48PM +0100, Philippe Mathieu-Daudé wrote:
> On 29/11/23 22:26, Stefan Hajnoczi wrote:
> > The term "QEMU global mutex" is identical to the more widely used Big
> > QEMU Lock ("BQL"). Update the code comments and documentation to use
> > "BQL" instead of "QEMU global mutex".
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   docs/devel/multi-thread-tcg.rst   |  7 +++
> >   docs/devel/qapi-code-gen.rst  |  2 +-
> >   docs/devel/replay.rst |  2 +-
> >   docs/devel/multiple-iothreads.txt | 16 
> >   include/block/blockjob.h  |  6 +++---
> >   include/io/task.h |  2 +-
> >   include/qemu/coroutine-core.h |  2 +-
> >   include/qemu/coroutine.h  |  2 +-
> >   hw/block/dataplane/virtio-blk.c   |  8 
> >   hw/block/virtio-blk.c |  2 +-
> >   hw/scsi/virtio-scsi-dataplane.c   |  6 +++---
> >   net/tap.c |  2 +-
> >   12 files changed, 28 insertions(+), 29 deletions(-)
> 
> 
> > diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> > index e594c10d23..b2bc7c04d6 100644
> > --- a/include/block/blockjob.h
> > +++ b/include/block/blockjob.h
> > @@ -54,7 +54,7 @@ typedef struct BlockJob {
> >   /**
> >* Speed that was set with @block_job_set_speed.
> > - * Always modified and read under QEMU global mutex 
> > (GLOBAL_STATE_CODE).
> > + * Always modified and read under BQL (GLOBAL_STATE_CODE).
> 
> "under the BQL"
> 
> >*/
> >   int64_t speed;
> > @@ -66,7 +66,7 @@ typedef struct BlockJob {
> >   /**
> >* Block other operations when block job is running.
> > - * Always modified and read under QEMU global mutex 
> > (GLOBAL_STATE_CODE).
> > + * Always modified and read under BQL (GLOBAL_STATE_CODE).
> 
> Ditto,
> 
> >*/
> >   Error *blocker;
> > @@ -89,7 +89,7 @@ typedef struct BlockJob {
> >   /**
> >* BlockDriverStates that are involved in this block job.
> > - * Always modified and read under QEMU global mutex 
> > (GLOBAL_STATE_CODE).
> > + * Always modified and read under BQL (GLOBAL_STATE_CODE).
> 
> Ditto.
> 
> >*/
> >   GSList *nodes;
> >   } BlockJob;

Will fix in v2.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 5/6] Replace "iothread lock" with "BQL" in comments

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 02:47:49PM +0100, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
> 
> On 29/11/23 22:26, Stefan Hajnoczi wrote:
> > The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
> > in their names. Update the code comments to use "BQL" instead of
> > "iothread lock".
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   docs/devel/reset.rst |  2 +-
> >   hw/display/qxl.h |  2 +-
> >   include/exec/cpu-common.h|  2 +-
> >   include/exec/memory.h|  4 ++--
> >   include/exec/ramblock.h  |  2 +-
> >   include/migration/register.h |  8 
> >   target/arm/internals.h   |  4 ++--
> >   accel/tcg/cputlb.c   |  4 ++--
> >   accel/tcg/tcg-accel-ops-icount.c |  2 +-
> >   hw/remote/mpqemu-link.c  |  2 +-
> >   migration/block-dirty-bitmap.c   | 10 +-
> >   migration/block.c| 24 
> >   migration/colo.c |  2 +-
> >   migration/migration.c|  2 +-
> >   migration/ram.c  |  4 ++--
> >   system/physmem.c |  6 +++---
> >   target/arm/helper.c  |  2 +-
> >   target/arm/tcg/m_helper.c|  2 +-
> >   ui/spice-core.c  |  2 +-
> >   util/rcu.c   |  2 +-
> >   audio/coreaudio.m|  4 ++--
> >   ui/cocoa.m   |  6 +++---
> >   22 files changed, 49 insertions(+), 49 deletions(-)
> 
> 
> > diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> > index 69c6a53902..a2bc0a345d 100644
> > --- a/include/exec/ramblock.h
> > +++ b/include/exec/ramblock.h
> > @@ -34,7 +34,7 @@ struct RAMBlock {
> >   ram_addr_t max_length;
> >   void (*resized)(const char*, uint64_t length, void *host);
> >   uint32_t flags;
> > -/* Protected by iothread lock.  */
> > +/* Protected by BQL.  */
> 
> There is only one single BQL, so preferably:
> 
> "by the BQL"
> 
> >   char idstr[256];
> >   /* RCU-enabled, writes protected by the ramlist lock */
> >   QLIST_ENTRY(RAMBlock) next;
> 
> 
> 
> 
> > -/* Called with iothread lock taken.  */
> > +/* Called with BQL taken.  */
> 
> "with the BQL" (other uses)

I will try to change these for v2. It's a pre-existing issue though
because there was only ever one "iothread lock" too.

Stefan


signature.asc
Description: PGP signature


[PATCH-for-9.0] accel/tcg: Remove unused tb_invalidate_phys_addr()

2023-11-30 Thread Philippe Mathieu-Daudé
Commit e3f7c801f1 introduced the TCGCPUOps::debug_check_breakpoint()
handler, and commit 10c37828b2 "moved breakpoint recognition outside
of translation", so "we no longer need to flush any TBs when changing
BPs".

The last target using tb_invalidate_phys_addr() was converted to the
debug_check_breakpoint(), so this function is now unused. Remove it.

Signed-off-by: Philippe Mathieu-Daudé 
---
Based-on: <20231130171920.3798954-1-jcmvb...@gmail.com>
---
 include/exec/exec-all.h |  5 -
 cpu-target.c| 29 -
 2 files changed, 34 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ee90ef122b..df3d93a2e2 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -518,11 +518,6 @@ static inline void tb_set_page_addr1(TranslationBlock *tb,
 uint32_t curr_cflags(CPUState *cpu);
 
 /* TranslationBlock invalidate API */
-#if defined(CONFIG_USER_ONLY)
-void tb_invalidate_phys_addr(hwaddr addr);
-#else
-void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs);
-#endif
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t last);
 void tb_set_jmp_target(TranslationBlock *tb, int n, uintptr_t addr);
diff --git a/cpu-target.c b/cpu-target.c
index 508013e23d..997ca2e846 100644
--- a/cpu-target.c
+++ b/cpu-target.c
@@ -276,35 +276,6 @@ void list_cpus(void)
 #endif
 }
 
-#if defined(CONFIG_USER_ONLY)
-void tb_invalidate_phys_addr(hwaddr addr)
-{
-mmap_lock();
-tb_invalidate_phys_page(addr);
-mmap_unlock();
-}
-#else
-void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr, MemTxAttrs attrs)
-{
-ram_addr_t ram_addr;
-MemoryRegion *mr;
-hwaddr l = 1;
-
-if (!tcg_enabled()) {
-return;
-}
-
-RCU_READ_LOCK_GUARD();
-mr = address_space_translate(as, addr, , , false, attrs);
-if (!(memory_region_is_ram(mr)
-  || memory_region_is_romd(mr))) {
-return;
-}
-ram_addr = memory_region_get_ram_addr(mr) + addr;
-tb_invalidate_phys_page(ram_addr);
-}
-#endif
-
 /* enable or disable single step mode. EXCP_DEBUG is returned by the
CPU loop after each instruction */
 void cpu_single_step(CPUState *cpu, int enabled)
-- 
2.41.0




Re: [PATCH 4/6] system/cpus: rename qemu_global_mutex to qemu_bql

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 02:44:07PM +0100, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
> 
> On 29/11/23 22:26, Stefan Hajnoczi wrote:
> > The APIs using qemu_global_mutex now follow the Big QEMU Lock (BQL)
> > nomenclature. It's a little strange that the actual QemuMutex variable
> > that embodies the BQL is called qemu_global_mutex instead of qemu_bql.
> > Rename it for consistency.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   system/cpus.c | 20 ++--
> >   1 file changed, 10 insertions(+), 10 deletions(-)
> > 
> > diff --git a/system/cpus.c b/system/cpus.c
> > index eb24a4db8e..138720a540 100644
> > --- a/system/cpus.c
> > +++ b/system/cpus.c
> > @@ -65,7 +65,7 @@
> >   #endif /* CONFIG_LINUX */
> > -static QemuMutex qemu_global_mutex;
> > +static QemuMutex qemu_bql;
> 
> I thought we were using _cond/_sem/_mutex suffixes, but
> this is not enforced:

I'm open to alternative names. Here are some I can think of:
- big_qemu_lock (although grepping for "bql" won't find it)
- qemu_bql_mutex

If there is no strong feeling about this then let's leave it at
qemu_bql. Otherwise, please discuss.

Thanks,
Stefan


signature.asc
Description: PGP signature


Re: [PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 10:14:47AM +0100, Ilya Leoshkevich wrote:
> On Wed, 2023-11-29 at 16:26 -0500, Stefan Hajnoczi wrote:
> > The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
> > instead, it is already widely used and unambiguous.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/qemu/main-loop.h  | 20 ++--
> >  hw/i386/kvm/xen_evtchn.c  | 14 +++---
> >  hw/i386/kvm/xen_gnttab.c  |  2 +-
> >  hw/mips/mips_int.c    |  2 +-
> >  hw/ppc/ppc.c  |  2 +-
> >  target/i386/kvm/xen-emu.c |  2 +-
> >  target/ppc/excp_helper.c  |  2 +-
> >  target/ppc/helper_regs.c  |  2 +-
> >  target/riscv/cpu_helper.c |  4 ++--
> >  9 files changed, 25 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> > index d6f75e57bd..0b6a3e4824 100644
> > --- a/include/qemu/main-loop.h
> > +++ b/include/qemu/main-loop.h
> > @@ -344,13 +344,13 @@ void qemu_bql_lock_impl(const char *file, int
> > line);
> >  void qemu_bql_unlock(void);
> >  
> >  /**
> > - * QEMU_IOTHREAD_LOCK_GUARD
> > + * QEMU_BQL_LOCK_GUARD
> >   *
> > - * Wrap a block of code in a conditional
> > qemu_mutex_{lock,unlock}_iothread.
> > + * Wrap a block of code in a conditional qemu_bql_{lock,unlock}.
> >   */
> > -typedef struct IOThreadLockAuto IOThreadLockAuto;
> > +typedef struct BQLLockAuto BQLLockAuto;
> >  
> > -static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char
> > *file,
> > +static inline BQLLockAuto *qemu_bql_auto_lock(const char *file,
> >  int line)
> 
> The padding is not correct anymore.

Good point, I didn't check the formatting after search-and-replace. I
will fix this across the patch series in v2.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH] Hexagon (target/hexagon) Fix shadow variable when idef-parser is off

2023-11-30 Thread Philippe Mathieu-Daudé

On 30/11/23 19:39, Taylor Simpson wrote:

Adding -Werror=shadow=compatible-local causes Hexagon not to build
when idef-parser is off.  The "label" variable in CHECK_NOSHUF_PRED
shadows a variable in the surrounding code.

Signed-off-by: Taylor Simpson 
---
  target/hexagon/macros.h | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 9a51b5709b..f99390e2a8 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -93,13 +93,13 @@
  
  #define CHECK_NOSHUF_PRED(GET_EA, SIZE, PRED) \

  do { \
-TCGLabel *label = gen_new_label(); \
-tcg_gen_brcondi_tl(TCG_COND_EQ, PRED, 0, label); \
+TCGLabel *noshuf_label = gen_new_label(); \
+tcg_gen_brcondi_tl(TCG_COND_EQ, PRED, 0, noshuf_label); \


Fragile, but sufficient.

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 1/2] block: commit: Allow users to request only format driver names in backing file format

2023-11-30 Thread Peter Krempa
On Thu, Nov 30, 2023 at 13:24:18 -0600, Eric Blake wrote:
> On Thu, Nov 30, 2023 at 05:06:03PM +0100, Peter Krempa wrote:
> > Introduce a new flag 'backing_file_format_no_protocol' for the
> > block-commit QMP command which instructs the internals to use 'raw'
> > instead of the protocol driver in case when a image is used without a
> > dummy 'raw' wrapper.
> > 
> > The flag is designed such that it can be always asserted by management
> > tools even when there isn't any update to backing files.
> > 
> > The flag will be used by libvirt so that the backing images still
> > reference the proper format even when libvirt will stop using the dummy
> > raw driver (raw driver with no other config). Libvirt needs this so that
> > the images stay compatible with older libvirt versions which didn't
> > expect that a protocol driver name can appear in the backing file format
> > field.
> > 
> > Signed-off-by: Peter Krempa 
> > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > ---
> 
> > +++ b/qapi/block-core.json
> > @@ -1810,6 +1810,14 @@
> >  # Care should be taken when specifying the string, to specify a
> >  # valid filename or protocol.  (Since 2.1)
> >  #
> > +# @backing-file-format-no-protocol: If true always use a 'format' driver 
> > name
> > +# for the 'backing file format' field if updating the image header of 
> > the
> > +# overlay of 'top'. Otherwise the real name of the driver of the 
> > backing
> > +# image may be used which may be a protocol driver.
> > +#
> > +# Can be used also when no image header will be updated.
> > +# (default: false; since: 9.0)


As I've previously stated, I don't really care about a name as long as I
don't have to keep re-sending,

> This is a long name.  What about:

But is the long name really a problem?

> @backing-mask-protocol: If true, replace any protocol mentioned in the
> 'backing file format' with 'raw', rather than storing the protocol
> name as the backing format.  Can be used even when no image header
> will be updated (default false; since 9.0).

Sounds okay to me. In the end, nobody will really see this as libvirt
will be using it internally




Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Peter Xu
On Wed, Nov 29, 2023 at 04:26:20PM -0500, Stefan Hajnoczi wrote:
> The Big QEMU Lock (BQL) has many names and they are confusing. The
> actual QemuMutex variable is called qemu_global_mutex but it's commonly
> referred to as the BQL in discussions and some code comments. The
> locking APIs, however, are called qemu_mutex_lock_iothread() and
> qemu_mutex_unlock_iothread().
> 
> The "iothread" name is historic and comes from when the main thread was
> split into into KVM vcpu threads and the "iothread" (now called the main
> loop thread). I have contributed to the confusion myself by introducing
> a separate --object iothread, a separate concept unrelated to the BQL.
> 
> The "iothread" name is no longer appropriate for the BQL. Rename the
> locking APIs to:
> - void qemu_bql_lock(void)
> - void qemu_bql_unlock(void)
> - bool qemu_bql_locked(void)
> 
> There are more APIs with "iothread" in their names. Subsequent patches
> will rename them. There are also comments and documentation that will be
> updated in later patches.
> 
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Peter Xu 

Two nickpicks:

  - BQL contains "QEMU" as the 2nd character, so maybe easier to further
rename qemu_bql into bql_?

  - Could we keep the full spell of BQL at some places, so people can still
reference it if not familiar?  IIUC most of the BQL helpers will root
back to the major three functions (_lock, _unlock, _locked), perhaps
add a comment of "BQL stands for..." over these three functions as
comment?

Please take or ignore these nitpicks; my ACK will stand irrelevant.

Thanks,

-- 
Peter Xu




[PATCH v4] accel/kvm: Turn DPRINTF macro use into tracepoints

2023-11-30 Thread Jai Arora
Patch removes DRPINTF macro and adds multiple tracepoints
to capture different kvm events.

We also drop the DPRINTFs that don't add any additional
information than trace_kvm_run_exit already does.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827

Signed-off-by: Jai Arora 
---
v4: Adds changes in commit message requested by Alex Bennee

ps.

I tried using git notes for the change log
May be it did not reflect. Thanks for the feedback and review

 accel/kvm/kvm-all.c| 28 ++--
 accel/kvm/trace-events |  7 ++-
 2 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e39a810a4e..80ac7b35b7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -69,16 +69,6 @@
 #define KVM_GUESTDBG_BLOCKIRQ 0
 #endif
 
-//#define DEBUG_KVM
-
-#ifdef DEBUG_KVM
-#define DPRINTF(fmt, ...) \
-do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-do { } while (0)
-#endif
-
 struct KVMParkedVcpu {
 unsigned long vcpu_id;
 int kvm_fd;
@@ -331,7 +321,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
 struct KVMParkedVcpu *vcpu = NULL;
 int ret = 0;
 
-DPRINTF("kvm_destroy_vcpu\n");
+trace_kvm_destroy_vcpu();
 
 ret = kvm_arch_destroy_vcpu(cpu);
 if (ret < 0) {
@@ -341,7 +331,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
 ret = mmap_size;
-DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
+trace_kvm_failed_get_vcpu_mmap_size();
 goto err;
 }
 
@@ -443,7 +433,6 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
 if (cpu->kvm_dirty_gfns == MAP_FAILED) {
 ret = -errno;
-DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret);
 goto err;
 }
 }
@@ -2821,7 +2810,7 @@ int kvm_cpu_exec(CPUState *cpu)
 struct kvm_run *run = cpu->kvm_run;
 int ret, run_ret;
 
-DPRINTF("kvm_cpu_exec()\n");
+trace_kvm_cpu_exec();
 
 if (kvm_arch_process_async_events(cpu)) {
 qatomic_set(>exit_request, 0);
@@ -2848,7 +2837,7 @@ int kvm_cpu_exec(CPUState *cpu)
 
 kvm_arch_pre_run(cpu, run);
 if (qatomic_read(>exit_request)) {
-DPRINTF("interrupt exit requested\n");
+   trace_kvm_interrupt_exit_request();
 /*
  * KVM requires us to reenter the kernel after IO exits to complete
  * instruction emulation. This self-signal will ensure that we
@@ -2878,7 +2867,7 @@ int kvm_cpu_exec(CPUState *cpu)
 
 if (run_ret < 0) {
 if (run_ret == -EINTR || run_ret == -EAGAIN) {
-DPRINTF("io window exit\n");
+trace_kvm_io_window_exit();
 kvm_eat_signals(cpu);
 ret = EXCP_INTERRUPT;
 break;
@@ -2900,7 +2889,6 @@ int kvm_cpu_exec(CPUState *cpu)
 trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
 switch (run->exit_reason) {
 case KVM_EXIT_IO:
-DPRINTF("handle_io\n");
 /* Called outside BQL */
 kvm_handle_io(run->io.port, attrs,
   (uint8_t *)run + run->io.data_offset,
@@ -2910,7 +2898,6 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 case KVM_EXIT_MMIO:
-DPRINTF("handle_mmio\n");
 /* Called outside BQL */
 address_space_rw(_space_memory,
  run->mmio.phys_addr, attrs,
@@ -2920,11 +2907,9 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 case KVM_EXIT_IRQ_WINDOW_OPEN:
-DPRINTF("irq_window_open\n");
 ret = EXCP_INTERRUPT;
 break;
 case KVM_EXIT_SHUTDOWN:
-DPRINTF("shutdown\n");
 qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
 ret = EXCP_INTERRUPT;
 break;
@@ -2959,6 +2944,7 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
+trace_kvm_run_exit_system_event(cpu->cpu_index, 
run->system_event.type);
 switch (run->system_event.type) {
 case KVM_SYSTEM_EVENT_SHUTDOWN:
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
@@ -2976,13 +2962,11 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 default:
-DPRINTF("kvm_arch_handle_exit\n");
 ret = kvm_arch_handle_exit(cpu, run);
 break;
 }
 break;
 default:
-DPRINTF("kvm_arch_handle_exit\n");
 ret = kvm_arch_handle_exit(cpu, run);
 break;
 }
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 

Re: [PATCH v3] accel/kvm: Turn DPRINTF macro use into tracepoints

2023-11-30 Thread Alex Bennée
Jai Arora  writes:

> Patch removes DRPINTF macro and adds multiple tracepoints
> to capture different kvm events.

maybe add "We drop the DPRINTFs that don't add any additional
information than trace_kvm_run_exit already does."

Otherwise:

Reviewed-by: Alex Bennée 

p.s.

It is helpful to add a mini-changelog bellow the --- line so reviewers
can see what changes have already been made to the patch. The git
tooling will strip this log when the maintainer applies the patches.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v3 03/70] RAMBlock/guest_memfd: Enable KVM_GUEST_MEMFD_ALLOW_HUGEPAGE

2023-11-30 Thread David Hildenbrand

On 30.11.23 18:51, Daniel P. Berrangé wrote:

On Thu, Nov 30, 2023 at 05:54:26PM +0100, David Hildenbrand wrote:

On 30.11.23 17:01, Sean Christopherson wrote:

On Thu, Nov 30, 2023, David Hildenbrand wrote:

On 30.11.23 08:32, Xiaoyao Li wrote:

On 11/20/2023 5:26 PM, David Hildenbrand wrote:



... did you shamelessly copy that from hw/virtio/virtio-mem.c ? ;)


Get caught.


This should be factored out into a common helper.


Sure, will do it in next version.


Factor it out in a separate patch. Then, this patch is get small that
you can just squash it into #2.

And my comment regarding "flags = 0" to patch #2 does no longer apply :)



I see.

But it depends on if KVM_GUEST_MEMFD_ALLOW_HUGEPAGE will appear together
with initial guest memfd in linux (hopefully 6.8)
https://lore.kernel.org/all/CABgObfa=dh7fysbvif63os9svog_wt-aqygtuagkqny5biz...@mail.gmail.com/



Doesn't seem to be in -next if I am looking at the right tree:

https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=next


Yeah, we punted on adding hugepage support for the initial guest_memfd merge so
as not to rush in kludgy uABI.  The internal KVM code isn't problematic, we just
haven't figured out exactly what the ABI should look like, e.g. should hugepages
be dependent on THP being enabled, and if not, how does userspace discover the
supported hugepage sizes?


Are we talking about THP or hugetlb? They are two different things, and
"KVM_GUEST_MEMFD_ALLOW_HUGEPAGE" doesn't make it clearer what we are talking
about.

This patch here "get_thp_size()" indicates that we care about THP, not
hugetlb.


THP lives in:
/sys/kernel/mm/transparent_hugepage/
and hugetlb in:
/sys/kernel/mm/hugepages/

THP for shmem+anon currently really only supports PMD-sized THP, that size
can be observed via:
/sys/kernel/mm/transparent_hugepage/hpage_pmd_size

hugetlb sizes can be detected simply by looking at the folders inside
/sys/kernel/mm/hugepages/. "tools/testing/selftests/mm/vm_util.c" in the
kernel has a function "detect_hugetlb_page_sizes()" that uses that interface
to detect the sizes.


But likely we want THP support here. Because for hugetlb, one would actually
have to instruct the kernel which size to use, like we do for memfd with
hugetlb.


Would we not want both ultimately ?


Likely we want both somehow, although I am not sure how to obtain either 
cleanly and fully.


My question is targeted at what the current interface/implementation 
promises, and how it relates to both, THP and hugetlb.


--
Cheers,

David / dhildenb




Re: [PATCH v2 2/2] block: stream: Allow users to request only format driver names in backing file format

2023-11-30 Thread Eric Blake
On Thu, Nov 30, 2023 at 05:06:04PM +0100, Peter Krempa wrote:
> Introduce a new flag 'backing_file_format_no_protocol' for the
> block-commit QMP command which instructs the internals to use 'raw'
> instead of the protocol driver in case when a image is used without a
> dummy 'raw' wrapper.

Same long name as in 1/2; if we come up with a nicer name (my proposal
in 1/2 was backing-mask-protocol; but I'm open to others), we should
keep the two patches consistent.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v3 03/70] RAMBlock/guest_memfd: Enable KVM_GUEST_MEMFD_ALLOW_HUGEPAGE

2023-11-30 Thread David Hildenbrand

On 30.11.23 18:46, Peter Xu wrote:

On Thu, Nov 30, 2023 at 05:54:26PM +0100, David Hildenbrand wrote:

But likely we want THP support here. Because for hugetlb, one would actually
have to instruct the kernel which size to use, like we do for memfd with
hugetlb.


I doubt it, as VM can still leverage larger sizes if possible?


What do you doubt? I am talking about the current implementation and 
expected semantics of KVM_GUEST_MEMFD_ALLOW_HUGEPAGE.


--
Cheers,

David / dhildenb




Re: [PATCH v2 1/2] block: commit: Allow users to request only format driver names in backing file format

2023-11-30 Thread Eric Blake
On Thu, Nov 30, 2023 at 05:06:03PM +0100, Peter Krempa wrote:
> Introduce a new flag 'backing_file_format_no_protocol' for the
> block-commit QMP command which instructs the internals to use 'raw'
> instead of the protocol driver in case when a image is used without a
> dummy 'raw' wrapper.
> 
> The flag is designed such that it can be always asserted by management
> tools even when there isn't any update to backing files.
> 
> The flag will be used by libvirt so that the backing images still
> reference the proper format even when libvirt will stop using the dummy
> raw driver (raw driver with no other config). Libvirt needs this so that
> the images stay compatible with older libvirt versions which didn't
> expect that a protocol driver name can appear in the backing file format
> field.
> 
> Signed-off-by: Peter Krempa 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> ---

> +++ b/qapi/block-core.json
> @@ -1810,6 +1810,14 @@
>  # Care should be taken when specifying the string, to specify a
>  # valid filename or protocol.  (Since 2.1)
>  #
> +# @backing-file-format-no-protocol: If true always use a 'format' driver name
> +# for the 'backing file format' field if updating the image header of the
> +# overlay of 'top'. Otherwise the real name of the driver of the backing
> +# image may be used which may be a protocol driver.
> +#
> +# Can be used also when no image header will be updated.
> +# (default: false; since: 9.0)

This is a long name.  What about:

@backing-mask-protocol: If true, replace any protocol mentioned in the
'backing file format' with 'raw', rather than storing the protocol
name as the backing format.  Can be used even when no image header
will be updated (default false; since 9.0).

or s/mask/hide/ if that sounds better.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v3 03/70] RAMBlock/guest_memfd: Enable KVM_GUEST_MEMFD_ALLOW_HUGEPAGE

2023-11-30 Thread David Hildenbrand

On 30.11.23 18:57, David Hildenbrand wrote:

On 30.11.23 18:46, Peter Xu wrote:

On Thu, Nov 30, 2023 at 05:54:26PM +0100, David Hildenbrand wrote:

But likely we want THP support here. Because for hugetlb, one would actually
have to instruct the kernel which size to use, like we do for memfd with
hugetlb.


I doubt it, as VM can still leverage larger sizes if possible?


What do you doubt? I am talking about the current implementation and
expected semantics of KVM_GUEST_MEMFD_ALLOW_HUGEPAGE.



I looked at the kernel implementation, and it simply allocates a 
PMD-sized folio and puts it into the pagecache. So hugetlb is not involved.


That raises various questions:

1) What are the semantics if we ever allow migrating/compacting such
   folios. Would we allow split them into smaller pages when required
   (or compact into larger)? What happens when we would partially zap
   them (fallocate?)right now? IOW, do they behave like THP, and do we
   want them to behave like THP?

2) If they behave like THP, wow would we able to compact them into
   bigger pages? khugepaged only works on VMAs IIRC.

3) How would you allocate gigantic pages if not by the help of hugetlb
   and reserved pools? At least as of today, runtime allocation of
   gigantic pages is extremely unreliable and compaction into gigantic
   pages does not work. So gigantic pages would be something for that
   far distant future.

4) cont-pte-sizes folios?

Maybe it's all clarified already, in that case I'd appreciate a pointer.

Looking at the current code, it looks like it behaves like shmem thp, 
just without any way to collapse afterwards (unless I am missing something).


--
Cheers,

David / dhildenb




RE: [PATCH] Hexagon (target/hexagon) Fix shadow variable when idef-parser is off

2023-11-30 Thread Brian Cain
> -Original Message-
> From: Taylor Simpson 
> Sent: Thursday, November 30, 2023 12:40 PM
> To: qemu-devel@nongnu.org
> Cc: Brian Cain ; Matheus Bernardino (QUIC)
> ; Sid Manning ; Marco
> Liebel (QUIC) ; richard.hender...@linaro.org;
> phi...@linaro.org; a...@rev.ng; a...@rev.ng; ltaylorsimp...@gmail.com
> Subject: [PATCH] Hexagon (target/hexagon) Fix shadow variable when idef-
> parser is off
> 
> WARNING: This email originated from outside of Qualcomm. Please be wary of
> any links or attachments, and do not enable macros.
> 
> Adding -Werror=shadow=compatible-local causes Hexagon not to build
> when idef-parser is off.  The "label" variable in CHECK_NOSHUF_PRED
> shadows a variable in the surrounding code.
> 
> Signed-off-by: Taylor Simpson 
> ---
>  target/hexagon/macros.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
> index 9a51b5709b..f99390e2a8 100644
> --- a/target/hexagon/macros.h
> +++ b/target/hexagon/macros.h
> @@ -93,13 +93,13 @@
> 
>  #define CHECK_NOSHUF_PRED(GET_EA, SIZE, PRED) \
>  do { \
> -TCGLabel *label = gen_new_label(); \
> -tcg_gen_brcondi_tl(TCG_COND_EQ, PRED, 0, label); \
> +TCGLabel *noshuf_label = gen_new_label(); \
> +tcg_gen_brcondi_tl(TCG_COND_EQ, PRED, 0, noshuf_label); \
>  GET_EA; \
>  if (insn->slot == 0 && ctx->pkt->pkt_has_store_s1) { \
>  probe_noshuf_load(EA, SIZE, ctx->mem_idx); \
>  } \
> -gen_set_label(label); \
> +gen_set_label(noshuf_label); \
>  if (insn->slot == 0 && ctx->pkt->pkt_has_store_s1) { \
>  process_store(ctx, 1); \
>  } \
> --
> 2.34.1

Reviewed-by: Brian Cain 


Re: [PATCH 04/12] graph-lock: remove AioContext locking

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:45PM -0500, Stefan Hajnoczi wrote:
> Stop acquiring/releasing the AioContext lock in
> bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any
> effect.
> 
> The distinction between bdrv_graph_wrunlock() and
> bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed
> into one function.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v4] migration: Plug memory leak with migration URIs

2023-11-30 Thread Peter Xu
On Thu, Nov 30, 2023 at 07:35:43PM +0100, Markus Armbruster wrote:
> Peter Xu  writes:
> 
> > On Wed, Nov 29, 2023 at 08:43:01PM +, Het Gala wrote:
> >> migrate_uri_parse() allocates memory to 'channel' if the user
> >> opts for old syntax - uri, which is leaked because there is no
> >> code for freeing 'channel'.
> >> So, free channel to avoid memory leak in case where 'channels'
> >> is empty and uri parsing is required.
> >> 
> >> Fixes: 5994024f ("migration: Implement MigrateChannelList to qmp migration 
> >> flow")
> >> Signed-off-by: Het Gala 
> >> Suggested-by: Markus Armbruster 
> >
> > Reviewed-by: Peter Xu 
> >
> >> @@ -533,18 +533,18 @@ static void qemu_start_incoming_migration(const char 
> >> *uri, bool has_channels,
> >>  error_setg(errp, "Channel list has more than one entries");
> >>  return;
> >>  }
> >> -channel = channels->value;
> >> +addr = channels->value->addr;
> >>  } else if (uri) {
> >>  /* caller uses the old URI syntax */
> >>  if (!migrate_uri_parse(uri, , errp)) {
> >>  return;
> >>  }
> >> +addr = channel->addr;
> >>  } else {
> >>  error_setg(errp, "neither 'uri' or 'channels' argument are "
> >> "specified in 'migrate-incoming' qmp command ");
> >>  return;
> >>  }
> >> -addr = channel->addr;
> >
> > Why these "addr" lines need change?  Won't that behave the same as before?
> 
> In the first case, @channel is now null.  If we left the assignment to
> @addr alone, it would crash.  Clearer now?

Is it this one?

if (uri && has_channels) {
error_setg(errp, "'uri' and 'channels' arguments are mutually "
   "exclusive; exactly one of the two should be present in "
   "'migrate-incoming' qmp command ");
return;
}

It returns already?

Thanks,

-- 
Peter Xu




[PATCH] Hexagon (target/hexagon) Fix shadow variable when idef-parser is off

2023-11-30 Thread Taylor Simpson
Adding -Werror=shadow=compatible-local causes Hexagon not to build
when idef-parser is off.  The "label" variable in CHECK_NOSHUF_PRED
shadows a variable in the surrounding code.

Signed-off-by: Taylor Simpson 
---
 target/hexagon/macros.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 9a51b5709b..f99390e2a8 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -93,13 +93,13 @@
 
 #define CHECK_NOSHUF_PRED(GET_EA, SIZE, PRED) \
 do { \
-TCGLabel *label = gen_new_label(); \
-tcg_gen_brcondi_tl(TCG_COND_EQ, PRED, 0, label); \
+TCGLabel *noshuf_label = gen_new_label(); \
+tcg_gen_brcondi_tl(TCG_COND_EQ, PRED, 0, noshuf_label); \
 GET_EA; \
 if (insn->slot == 0 && ctx->pkt->pkt_has_store_s1) { \
 probe_noshuf_load(EA, SIZE, ctx->mem_idx); \
 } \
-gen_set_label(label); \
+gen_set_label(noshuf_label); \
 if (insn->slot == 0 && ctx->pkt->pkt_has_store_s1) { \
 process_store(ctx, 1); \
 } \
-- 
2.34.1




Re: [PATCH v4] migration: Plug memory leak with migration URIs

2023-11-30 Thread Markus Armbruster
Peter Xu  writes:

> On Wed, Nov 29, 2023 at 08:43:01PM +, Het Gala wrote:
>> migrate_uri_parse() allocates memory to 'channel' if the user
>> opts for old syntax - uri, which is leaked because there is no
>> code for freeing 'channel'.
>> So, free channel to avoid memory leak in case where 'channels'
>> is empty and uri parsing is required.
>> 
>> Fixes: 5994024f ("migration: Implement MigrateChannelList to qmp migration 
>> flow")
>> Signed-off-by: Het Gala 
>> Suggested-by: Markus Armbruster 
>
> Reviewed-by: Peter Xu 
>
>> @@ -533,18 +533,18 @@ static void qemu_start_incoming_migration(const char 
>> *uri, bool has_channels,
>>  error_setg(errp, "Channel list has more than one entries");
>>  return;
>>  }
>> -channel = channels->value;
>> +addr = channels->value->addr;
>>  } else if (uri) {
>>  /* caller uses the old URI syntax */
>>  if (!migrate_uri_parse(uri, , errp)) {
>>  return;
>>  }
>> +addr = channel->addr;
>>  } else {
>>  error_setg(errp, "neither 'uri' or 'channels' argument are "
>> "specified in 'migrate-incoming' qmp command ");
>>  return;
>>  }
>> -addr = channel->addr;
>
> Why these "addr" lines need change?  Won't that behave the same as before?

In the first case, @channel is now null.  If we left the assignment to
@addr alone, it would crash.  Clearer now?




[PATCH for-9.0 2/4] linux-headers: riscv: add ptrace.h

2023-11-30 Thread Daniel Henrique Barboza
KVM vector support for RISC-V requires the linux-header ptrace.h.

Signed-off-by: Daniel Henrique Barboza 
---
 linux-headers/asm-riscv/ptrace.h | 132 +++
 scripts/update-linux-headers.sh  |   3 +
 2 files changed, 135 insertions(+)
 create mode 100644 linux-headers/asm-riscv/ptrace.h

diff --git a/linux-headers/asm-riscv/ptrace.h b/linux-headers/asm-riscv/ptrace.h
new file mode 100644
index 00..1e3166caca
--- /dev/null
+++ b/linux-headers/asm-riscv/ptrace.h
@@ -0,0 +1,132 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2012 Regents of the University of California
+ */
+
+#ifndef _ASM_RISCV_PTRACE_H
+#define _ASM_RISCV_PTRACE_H
+
+#ifndef __ASSEMBLY__
+
+#include 
+
+#define PTRACE_GETFDPIC33
+
+#define PTRACE_GETFDPIC_EXEC   0
+#define PTRACE_GETFDPIC_INTERP 1
+
+/*
+ * User-mode register state for core dumps, ptrace, sigcontext
+ *
+ * This decouples struct pt_regs from the userspace ABI.
+ * struct user_regs_struct must form a prefix of struct pt_regs.
+ */
+struct user_regs_struct {
+   unsigned long pc;
+   unsigned long ra;
+   unsigned long sp;
+   unsigned long gp;
+   unsigned long tp;
+   unsigned long t0;
+   unsigned long t1;
+   unsigned long t2;
+   unsigned long s0;
+   unsigned long s1;
+   unsigned long a0;
+   unsigned long a1;
+   unsigned long a2;
+   unsigned long a3;
+   unsigned long a4;
+   unsigned long a5;
+   unsigned long a6;
+   unsigned long a7;
+   unsigned long s2;
+   unsigned long s3;
+   unsigned long s4;
+   unsigned long s5;
+   unsigned long s6;
+   unsigned long s7;
+   unsigned long s8;
+   unsigned long s9;
+   unsigned long s10;
+   unsigned long s11;
+   unsigned long t3;
+   unsigned long t4;
+   unsigned long t5;
+   unsigned long t6;
+};
+
+struct __riscv_f_ext_state {
+   __u32 f[32];
+   __u32 fcsr;
+};
+
+struct __riscv_d_ext_state {
+   __u64 f[32];
+   __u32 fcsr;
+};
+
+struct __riscv_q_ext_state {
+   __u64 f[64] __attribute__((aligned(16)));
+   __u32 fcsr;
+   /*
+* Reserved for expansion of sigcontext structure.  Currently zeroed
+* upon signal, and must be zero upon sigreturn.
+*/
+   __u32 reserved[3];
+};
+
+struct __riscv_ctx_hdr {
+   __u32 magic;
+   __u32 size;
+};
+
+struct __riscv_extra_ext_header {
+   __u32 __padding[129] __attribute__((aligned(16)));
+   /*
+* Reserved for expansion of sigcontext structure.  Currently zeroed
+* upon signal, and must be zero upon sigreturn.
+*/
+   __u32 reserved;
+   struct __riscv_ctx_hdr hdr;
+};
+
+union __riscv_fp_state {
+   struct __riscv_f_ext_state f;
+   struct __riscv_d_ext_state d;
+   struct __riscv_q_ext_state q;
+};
+
+struct __riscv_v_ext_state {
+   unsigned long vstart;
+   unsigned long vl;
+   unsigned long vtype;
+   unsigned long vcsr;
+   unsigned long vlenb;
+   void *datap;
+   /*
+* In signal handler, datap will be set a correct user stack offset
+* and vector registers will be copied to the address of datap
+* pointer.
+*/
+};
+
+struct __riscv_v_regset_state {
+   unsigned long vstart;
+   unsigned long vl;
+   unsigned long vtype;
+   unsigned long vcsr;
+   unsigned long vlenb;
+   char vreg[];
+};
+
+/*
+ * According to spec: The number of bits in a single vector register,
+ * VLEN >= ELEN, which must be a power of 2, and must be no greater than
+ * 2^16 = 65536bits = 8192bytes
+ */
+#define RISCV_MAX_VLENB (8192)
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_RISCV_PTRACE_H */
diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 34295c0fe5..a0006eec6f 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -156,6 +156,9 @@ for arch in $ARCHLIST; do
 cp_portable "$tmpdir/bootparam.h" \
 "$output/include/standard-headers/asm-$arch"
 fi
+if [ $arch = riscv ]; then
+cp "$tmpdir/include/asm/ptrace.h" "$output/linux-headers/asm-riscv/"
+fi
 done
 
 rm -rf "$output/linux-headers/linux"
-- 
2.41.0




[PATCH for-9.0 1/4] linux-headers: Update to Linux v6.7-rc3

2023-11-30 Thread Daniel Henrique Barboza
We'll add a new RISC-V linux-header file, but first let's update all
headers.

Headers for 'asm-loongarch' were added in this update.

 old commit msg:

add asm-riscv/ptrace.h

KVM Vector support for RISC-V requires import linux-header
 arch/riscv/include/uapi/asm/ptrace.h.

Signed-off-by: Daniel Henrique Barboza 
---
 include/standard-headers/drm/drm_fourcc.h |   2 +
 include/standard-headers/linux/pci_regs.h |  24 ++-
 include/standard-headers/linux/vhost_types.h  |   7 +
 .../standard-headers/linux/virtio_config.h|   5 +
 include/standard-headers/linux/virtio_pci.h   |  11 ++
 linux-headers/asm-arm64/kvm.h |  32 
 linux-headers/asm-generic/unistd.h|  14 +-
 linux-headers/asm-loongarch/bitsperlong.h |   1 +
 linux-headers/asm-loongarch/kvm.h | 108 +++
 linux-headers/asm-loongarch/mman.h|   1 +
 linux-headers/asm-loongarch/unistd.h  |   5 +
 linux-headers/asm-mips/unistd_n32.h   |   4 +
 linux-headers/asm-mips/unistd_n64.h   |   4 +
 linux-headers/asm-mips/unistd_o32.h   |   4 +
 linux-headers/asm-powerpc/unistd_32.h |   4 +
 linux-headers/asm-powerpc/unistd_64.h |   4 +
 linux-headers/asm-riscv/kvm.h |  12 ++
 linux-headers/asm-s390/unistd_32.h|   4 +
 linux-headers/asm-s390/unistd_64.h|   4 +
 linux-headers/asm-x86/unistd_32.h |   4 +
 linux-headers/asm-x86/unistd_64.h |   3 +
 linux-headers/asm-x86/unistd_x32.h|   3 +
 linux-headers/linux/iommufd.h | 180 +-
 linux-headers/linux/kvm.h |  11 ++
 linux-headers/linux/psp-sev.h |   1 +
 linux-headers/linux/stddef.h  |   7 +
 linux-headers/linux/userfaultfd.h |   9 +-
 linux-headers/linux/vfio.h|  47 +++--
 linux-headers/linux/vhost.h   |   8 +
 29 files changed, 497 insertions(+), 26 deletions(-)
 create mode 100644 linux-headers/asm-loongarch/bitsperlong.h
 create mode 100644 linux-headers/asm-loongarch/kvm.h
 create mode 100644 linux-headers/asm-loongarch/mman.h
 create mode 100644 linux-headers/asm-loongarch/unistd.h

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index 72279f4d25..3afb70160f 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -322,6 +322,8 @@ extern "C" {
  * index 1 = Cr:Cb plane, [39:0] Cr1:Cb1:Cr0:Cb0 little endian
  */
 #define DRM_FORMAT_NV15fourcc_code('N', 'V', '1', '5') /* 2x2 
subsampled Cr:Cb plane */
+#define DRM_FORMAT_NV20fourcc_code('N', 'V', '2', '0') /* 2x1 
subsampled Cr:Cb plane */
+#define DRM_FORMAT_NV30fourcc_code('N', 'V', '3', '0') /* 
non-subsampled Cr:Cb plane */
 
 /*
  * 2 plane YCbCr MSB aligned
diff --git a/include/standard-headers/linux/pci_regs.h 
b/include/standard-headers/linux/pci_regs.h
index e5f558d964..a39193213f 100644
--- a/include/standard-headers/linux/pci_regs.h
+++ b/include/standard-headers/linux/pci_regs.h
@@ -80,6 +80,7 @@
 #define  PCI_HEADER_TYPE_NORMAL0
 #define  PCI_HEADER_TYPE_BRIDGE1
 #define  PCI_HEADER_TYPE_CARDBUS   2
+#define  PCI_HEADER_TYPE_MFD   0x80/* Multi-Function Device 
(possible) */
 
 #define PCI_BIST   0x0f/* 8 bits */
 #define  PCI_BIST_CODE_MASK0x0f/* Return result */
@@ -637,6 +638,7 @@
 #define PCI_EXP_RTCAP  0x1e/* Root Capabilities */
 #define  PCI_EXP_RTCAP_CRSVIS  0x0001  /* CRS Software Visibility capability */
 #define PCI_EXP_RTSTA  0x20/* Root Status */
+#define  PCI_EXP_RTSTA_PME_RQ_ID 0x /* PME Requester ID */
 #define  PCI_EXP_RTSTA_PME 0x0001 /* PME status */
 #define  PCI_EXP_RTSTA_PENDING 0x0002 /* PME pending */
 /*
@@ -930,12 +932,13 @@
 
 /* Process Address Space ID */
 #define PCI_PASID_CAP  0x04/* PASID feature register */
-#define  PCI_PASID_CAP_EXEC0x02/* Exec permissions Supported */
-#define  PCI_PASID_CAP_PRIV0x04/* Privilege Mode Supported */
+#define  PCI_PASID_CAP_EXEC0x0002  /* Exec permissions Supported */
+#define  PCI_PASID_CAP_PRIV0x0004  /* Privilege Mode Supported */
+#define  PCI_PASID_CAP_WIDTH   0x1f00
 #define PCI_PASID_CTRL 0x06/* PASID control register */
-#define  PCI_PASID_CTRL_ENABLE 0x01/* Enable bit */
-#define  PCI_PASID_CTRL_EXEC   0x02/* Exec permissions Enable */
-#define  PCI_PASID_CTRL_PRIV   0x04/* Privilege Mode Enable */
+#define  PCI_PASID_CTRL_ENABLE 0x0001  /* Enable bit */
+#define  PCI_PASID_CTRL_EXEC   0x0002  /* Exec permissions Enable */
+#define  PCI_PASID_CTRL_PRIV   0x0004  /* Privilege Mode Enable */
 #define PCI_EXT_CAP_PASID_SIZEOF   8
 
 /* Single Root I/O Virtualization */
@@ -975,6 +978,8 @@
 #define  PCI_LTR_VALUE_MASK0x03ff
 

[PATCH for-9.0 0/4] target/riscv: add RVV CSRs

2023-11-30 Thread Daniel Henrique Barboza
Hi,

This series adds RVV, vstart, vl and vtype regs to the KVM driver.

But first we need a couple of things done. We need 'ptrace.h' RISC-V
linux-header to be able to read/write RVV CSRs. This is done in patch 2.
Patch 1 is the usual linux-header bump for all archs.

Patch 3 is adding a realize() callback for the KVM driver because we're
doing a prctl() to enable Vector support for the thread.

Patches are based on master. You'll need the following KVM fix to be
able to build:

[PATCH for-8.2] target/riscv/kvm: fix shadowing in kvm_riscv_(get|put)_regs_csr


Daniel Henrique Barboza (4):
  linux-headers: Update to Linux v6.7-rc3
  linux-headers: riscv: add ptrace.h
  target/riscv/kvm: do PR_RISCV_V_SET_CONTROL during realize()
  target/riscv/kvm: add RVV and Vector CSR regs

 include/standard-headers/drm/drm_fourcc.h |   2 +
 include/standard-headers/linux/pci_regs.h |  24 ++-
 include/standard-headers/linux/vhost_types.h  |   7 +
 .../standard-headers/linux/virtio_config.h|   5 +
 include/standard-headers/linux/virtio_pci.h   |  11 ++
 linux-headers/asm-arm64/kvm.h |  32 
 linux-headers/asm-generic/unistd.h|  14 +-
 linux-headers/asm-loongarch/bitsperlong.h |   1 +
 linux-headers/asm-loongarch/kvm.h | 108 +++
 linux-headers/asm-loongarch/mman.h|   1 +
 linux-headers/asm-loongarch/unistd.h  |   5 +
 linux-headers/asm-mips/unistd_n32.h   |   4 +
 linux-headers/asm-mips/unistd_n64.h   |   4 +
 linux-headers/asm-mips/unistd_o32.h   |   4 +
 linux-headers/asm-powerpc/unistd_32.h |   4 +
 linux-headers/asm-powerpc/unistd_64.h |   4 +
 linux-headers/asm-riscv/kvm.h |  12 ++
 linux-headers/asm-riscv/ptrace.h  | 132 +
 linux-headers/asm-s390/unistd_32.h|   4 +
 linux-headers/asm-s390/unistd_64.h|   4 +
 linux-headers/asm-x86/unistd_32.h |   4 +
 linux-headers/asm-x86/unistd_64.h |   3 +
 linux-headers/asm-x86/unistd_x32.h|   3 +
 linux-headers/linux/iommufd.h | 180 +-
 linux-headers/linux/kvm.h |  11 ++
 linux-headers/linux/psp-sev.h |   1 +
 linux-headers/linux/stddef.h  |   7 +
 linux-headers/linux/userfaultfd.h |   9 +-
 linux-headers/linux/vfio.h|  47 +++--
 linux-headers/linux/vhost.h   |   8 +
 scripts/update-linux-headers.sh   |   3 +
 target/riscv/kvm/kvm-cpu.c| 103 ++
 32 files changed, 735 insertions(+), 26 deletions(-)
 create mode 100644 linux-headers/asm-loongarch/bitsperlong.h
 create mode 100644 linux-headers/asm-loongarch/kvm.h
 create mode 100644 linux-headers/asm-loongarch/mman.h
 create mode 100644 linux-headers/asm-loongarch/unistd.h
 create mode 100644 linux-headers/asm-riscv/ptrace.h

-- 
2.41.0




[PATCH for-9.0 4/4] target/riscv/kvm: add RVV and Vector CSR regs

2023-11-30 Thread Daniel Henrique Barboza
Add support for RVV and Vector CSR KVM regs vstart, vl and vtype.

Support for vregs[] requires KVM side changes and an extra reg (vlenb)
and will be added later.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/kvm/kvm-cpu.c | 74 ++
 1 file changed, 74 insertions(+)

diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 273c71baea..5408ead81c 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -89,6 +89,10 @@ static uint64_t kvm_riscv_reg_id(CPURISCVState *env, 
uint64_t type,
 
 #define RISCV_FP_D_REG(env, idx)  kvm_riscv_reg_id(env, KVM_REG_RISCV_FP_D, 
idx)
 
+#define RISCV_VECTOR_CSR_REG(env, name) \
+kvm_riscv_reg_id(env, KVM_REG_RISCV_VECTOR, \
+ KVM_REG_RISCV_VECTOR_CSR_REG(name))
+
 #define KVM_RISCV_GET_CSR(cs, env, csr, reg) \
 do { \
 int _ret = kvm_get_one_reg(cs, RISCV_CSR_REG(env, csr), ); \
@@ -142,6 +146,7 @@ static KVMCPUConfig kvm_misa_ext_cfgs[] = {
 KVM_MISA_CFG(RVH, KVM_RISCV_ISA_EXT_H),
 KVM_MISA_CFG(RVI, KVM_RISCV_ISA_EXT_I),
 KVM_MISA_CFG(RVM, KVM_RISCV_ISA_EXT_M),
+KVM_MISA_CFG(RVV, KVM_RISCV_ISA_EXT_V),
 };
 
 static void kvm_cpu_get_misa_ext_cfg(Object *obj, Visitor *v,
@@ -688,6 +693,65 @@ static void kvm_riscv_put_regs_timer(CPUState *cs)
 env->kvm_timer_dirty = false;
 }
 
+static int kvm_riscv_get_regs_vector(CPUState *cs)
+{
+CPURISCVState *env = _CPU(cs)->env;
+target_ulong reg;
+int ret = 0;
+
+if (!riscv_has_ext(env, RVV)) {
+return 0;
+}
+
+ret = kvm_get_one_reg(cs, RISCV_VECTOR_CSR_REG(env, vstart), );
+if (ret) {
+return ret;
+}
+env->vstart = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_VECTOR_CSR_REG(env, vl), );
+if (ret) {
+return ret;
+}
+env->vl = reg;
+
+ret = kvm_get_one_reg(cs, RISCV_VECTOR_CSR_REG(env, vtype), );
+if (ret) {
+return ret;
+}
+env->vtype = reg;
+
+return 0;
+}
+
+static int kvm_riscv_put_regs_vector(CPUState *cs)
+{
+CPURISCVState *env = _CPU(cs)->env;
+target_ulong reg;
+int ret = 0;
+
+if (!riscv_has_ext(env, RVV)) {
+return 0;
+}
+
+reg = env->vstart;
+ret = kvm_set_one_reg(cs, RISCV_VECTOR_CSR_REG(env, vstart), );
+if (ret) {
+return ret;
+}
+
+reg = env->vl;
+ret = kvm_set_one_reg(cs, RISCV_VECTOR_CSR_REG(env, vl), );
+if (ret) {
+return ret;
+}
+
+reg = env->vtype;
+ret = kvm_set_one_reg(cs, RISCV_VECTOR_CSR_REG(env, vtype), );
+
+return ret;
+}
+
 typedef struct KVMScratchCPU {
 int kvmfd;
 int vmfd;
@@ -989,6 +1053,11 @@ int kvm_arch_get_registers(CPUState *cs)
 return ret;
 }
 
+ret = kvm_riscv_get_regs_vector(cs);
+if (ret) {
+return ret;
+}
+
 return ret;
 }
 
@@ -1029,6 +1098,11 @@ int kvm_arch_put_registers(CPUState *cs, int level)
 return ret;
 }
 
+ret = kvm_riscv_put_regs_vector(cs);
+if (ret) {
+return ret;
+}
+
 if (KVM_PUT_RESET_STATE == level) {
 RISCVCPU *cpu = RISCV_CPU(cs);
 if (cs->cpu_index == 0) {
-- 
2.41.0




[PATCH for-9.0 3/4] target/riscv/kvm: do PR_RISCV_V_SET_CONTROL during realize()

2023-11-30 Thread Daniel Henrique Barboza
Linux RISC-V vector documentation (Document/arch/riscv/vector.rst)
mandates a prctl() in order to allow an userspace thread to use the
Vector extension from the host.

This is something to be done in realize() time, after init(), when we
already decided whether we're using RVV or not. We don't have a
realize() callback for KVM yet, so add kvm_cpu_realize() and enable RVV
for the thread via PR_RISCV_V_SET_CONTROL.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/kvm/kvm-cpu.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/target/riscv/kvm/kvm-cpu.c b/target/riscv/kvm/kvm-cpu.c
index 45b6cf1cfa..273c71baea 100644
--- a/target/riscv/kvm/kvm-cpu.c
+++ b/target/riscv/kvm/kvm-cpu.c
@@ -18,6 +18,7 @@
 
 #include "qemu/osdep.h"
 #include 
+#include 
 
 #include 
 
@@ -47,6 +48,9 @@
 #include "sysemu/runstate.h"
 #include "hw/riscv/numa.h"
 
+#define PR_RISCV_V_SET_CONTROL69
+#define PR_RISCV_V_VSTATE_CTRL_ON  2
+
 void riscv_kvm_aplic_request(void *opaque, int irq, int level)
 {
 kvm_set_irq(kvm_state, irq, !!level);
@@ -1481,11 +1485,36 @@ static void kvm_cpu_instance_init(CPUState *cs)
 }
 }
 
+/*
+ * We'll get here via the following path:
+ *
+ * riscv_cpu_realize()
+ *   -> cpu_exec_realizefn()
+ *  -> kvm_cpu_realize() (via accel_cpu_common_realize())
+ */
+static bool kvm_cpu_realize(CPUState *cs, Error **errp)
+{
+RISCVCPU *cpu = RISCV_CPU(cs);
+int ret;
+
+if (riscv_has_ext(>env, RVV)) {
+ret = prctl(PR_RISCV_V_SET_CONTROL, PR_RISCV_V_VSTATE_CTRL_ON);
+if (ret) {
+error_setg(errp, "Error in prctl PR_RISCV_V_SET_CONTROL, code: %s",
+   strerrorname_np(errno));
+return false;
+}
+}
+
+   return true;
+}
+
 static void kvm_cpu_accel_class_init(ObjectClass *oc, void *data)
 {
 AccelCPUClass *acc = ACCEL_CPU_CLASS(oc);
 
 acc->cpu_instance_init = kvm_cpu_instance_init;
+acc->cpu_target_realize = kvm_cpu_realize;
 }
 
 static const TypeInfo kvm_cpu_accel_type_info = {
-- 
2.41.0




Re: [PATCH 2/2] tests/tcg/xtensa: add icount/ibreak priority test

2023-11-30 Thread Richard Henderson

On 11/30/23 11:19, Max Filippov wrote:

When icount and ibreak exceptions are due to happen on the same address
icount has higher precedence.

Signed-off-by: Max Filippov 
---
  tests/tcg/xtensa/test_break.S | 25 -
  1 file changed, 24 insertions(+), 1 deletion(-)



Acked-by: Richard Henderson 


r~



Re: [PATCH 1/2] target/xtensa: use generic instruction breakpoint infrastructure

2023-11-30 Thread Richard Henderson

On 11/30/23 11:19, Max Filippov wrote:

Don't embed ibreak exception generation into TB and don't invalidate TB
on ibreak address change. Add CPUBreakpoint pointers to xtensa
CPUArchState, use cpu_breakpoint_insert/cpu_breakpoint_remove_by_ref to
manage ibreak breakpoints and provide TCGCPUOps::debug_check_breakpoint
callback that recognizes valid instruction breakpoints.

Signed-off-by: Max Filippov 
---
  target/xtensa/cpu.c|  1 +
  target/xtensa/cpu.h|  4 
  target/xtensa/dbg_helper.c | 46 +-
  target/xtensa/helper.c | 12 ++
  target/xtensa/translate.c  | 17 --
  5 files changed, 47 insertions(+), 33 deletions(-)


Thanks a bunch,

Reviewed-by: Richard Henderson 


r~



Re: [PATCH 03/12] aio: make aio_context_acquire()/aio_context_release() a no-op

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:44PM -0500, Stefan Hajnoczi wrote:
> aio_context_acquire()/aio_context_release() has been replaced by
> fine-grained locking to protect state shared by multiple threads. The
> AioContext lock still plays the role of balancing locking in
> AIO_WAIT_WHILE() and many functions in QEMU either require that the
> AioContext lock is held or not held for this reason. In other words, the
> AioContext lock is purely there for consistency with itself and serves
> no real purpose anymore.
> 
> Stop actually acquiring/releasing the lock in
> aio_context_acquire()/aio_context_release() so that subsequent patches
> can remove callers across the codebase incrementally.
> 
> I have performed "make check" and qemu-iotests stress tests across
> x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
> result of eliminating the lock.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  util/async.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 8f90ddc304..04ee83d220 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -725,12 +725,12 @@ void aio_context_unref(AioContext *ctx)
>  
>  void aio_context_acquire(AioContext *ctx)
>  {
> -qemu_rec_mutex_lock(>lock);
> +/* TODO remove this function */
>  }
>  
>  void aio_context_release(AioContext *ctx)
>  {
> -qemu_rec_mutex_unlock(>lock);
> +/* TODO remove this function */
>  }
>  
>  QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
> -- 
> 2.42.0

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v3 03/70] RAMBlock/guest_memfd: Enable KVM_GUEST_MEMFD_ALLOW_HUGEPAGE

2023-11-30 Thread Daniel P . Berrangé
On Thu, Nov 30, 2023 at 05:54:26PM +0100, David Hildenbrand wrote:
> On 30.11.23 17:01, Sean Christopherson wrote:
> > On Thu, Nov 30, 2023, David Hildenbrand wrote:
> > > On 30.11.23 08:32, Xiaoyao Li wrote:
> > > > On 11/20/2023 5:26 PM, David Hildenbrand wrote:
> > > > > 
> > > > > > > ... did you shamelessly copy that from hw/virtio/virtio-mem.c ? ;)
> > > > > > 
> > > > > > Get caught.
> > > > > > 
> > > > > > > This should be factored out into a common helper.
> > > > > > 
> > > > > > Sure, will do it in next version.
> > > > > 
> > > > > Factor it out in a separate patch. Then, this patch is get small that
> > > > > you can just squash it into #2.
> > > > > 
> > > > > And my comment regarding "flags = 0" to patch #2 does no longer apply 
> > > > > :)
> > > > > 
> > > > 
> > > > I see.
> > > > 
> > > > But it depends on if KVM_GUEST_MEMFD_ALLOW_HUGEPAGE will appear together
> > > > with initial guest memfd in linux (hopefully 6.8)
> > > > https://lore.kernel.org/all/CABgObfa=dh7fysbvif63os9svog_wt-aqygtuagkqny5biz...@mail.gmail.com/
> > > > 
> > > 
> > > Doesn't seem to be in -next if I am looking at the right tree:
> > > 
> > > https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=next
> > 
> > Yeah, we punted on adding hugepage support for the initial guest_memfd 
> > merge so
> > as not to rush in kludgy uABI.  The internal KVM code isn't problematic, we 
> > just
> > haven't figured out exactly what the ABI should look like, e.g. should 
> > hugepages
> > be dependent on THP being enabled, and if not, how does userspace discover 
> > the
> > supported hugepage sizes?
> 
> Are we talking about THP or hugetlb? They are two different things, and
> "KVM_GUEST_MEMFD_ALLOW_HUGEPAGE" doesn't make it clearer what we are talking
> about.
> 
> This patch here "get_thp_size()" indicates that we care about THP, not
> hugetlb.
> 
> 
> THP lives in:
>   /sys/kernel/mm/transparent_hugepage/
> and hugetlb in:
>   /sys/kernel/mm/hugepages/
> 
> THP for shmem+anon currently really only supports PMD-sized THP, that size
> can be observed via:
>   /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
> 
> hugetlb sizes can be detected simply by looking at the folders inside
> /sys/kernel/mm/hugepages/. "tools/testing/selftests/mm/vm_util.c" in the
> kernel has a function "detect_hugetlb_page_sizes()" that uses that interface
> to detect the sizes.
> 
> 
> But likely we want THP support here. Because for hugetlb, one would actually
> have to instruct the kernel which size to use, like we do for memfd with
> hugetlb.

Would we not want both ultimately ?

THP is good because it increases performance vs non-HP out of the box
without the user or mgmt app having to make any decisions.

It does not give you deterministic performance though, because it has
to opportunistically assign huge pages basd on what is available and
that may differ each time a VM is launched.  Explicit admin/mgmt app
controlled huge page usage gives determinism, at the cost of increased
mgmt overhead.

Both are valid use cases depending on the tradeoff a deployment and/or
mgmt app wants to make.


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v3 03/70] RAMBlock/guest_memfd: Enable KVM_GUEST_MEMFD_ALLOW_HUGEPAGE

2023-11-30 Thread Peter Xu
On Thu, Nov 30, 2023 at 05:54:26PM +0100, David Hildenbrand wrote:
> But likely we want THP support here. Because for hugetlb, one would actually
> have to instruct the kernel which size to use, like we do for memfd with
> hugetlb.

I doubt it, as VM can still leverage larger sizes if possible?

IIUC one of the major challenges of gmem hugepage is how to support
security features while reusing existing mm infrastructures as much as
possible.

Thanks,

-- 
Peter Xu




Re: [PATCH v4] migration: Plug memory leak with migration URIs

2023-11-30 Thread Peter Xu
On Wed, Nov 29, 2023 at 08:43:01PM +, Het Gala wrote:
> migrate_uri_parse() allocates memory to 'channel' if the user
> opts for old syntax - uri, which is leaked because there is no
> code for freeing 'channel'.
> So, free channel to avoid memory leak in case where 'channels'
> is empty and uri parsing is required.
> 
> Fixes: 5994024f ("migration: Implement MigrateChannelList to qmp migration 
> flow")
> Signed-off-by: Het Gala 
> Suggested-by: Markus Armbruster 

Reviewed-by: Peter Xu 

> @@ -533,18 +533,18 @@ static void qemu_start_incoming_migration(const char 
> *uri, bool has_channels,
>  error_setg(errp, "Channel list has more than one entries");
>  return;
>  }
> -channel = channels->value;
> +addr = channels->value->addr;
>  } else if (uri) {
>  /* caller uses the old URI syntax */
>  if (!migrate_uri_parse(uri, , errp)) {
>  return;
>  }
> +addr = channel->addr;
>  } else {
>  error_setg(errp, "neither 'uri' or 'channels' argument are "
> "specified in 'migrate-incoming' qmp command ");
>  return;
>  }
> -addr = channel->addr;

Why these "addr" lines need change?  Won't that behave the same as before?

Thanks,

-- 
Peter Xu




[PATCH 1/2] target/xtensa: use generic instruction breakpoint infrastructure

2023-11-30 Thread Max Filippov
Don't embed ibreak exception generation into TB and don't invalidate TB
on ibreak address change. Add CPUBreakpoint pointers to xtensa
CPUArchState, use cpu_breakpoint_insert/cpu_breakpoint_remove_by_ref to
manage ibreak breakpoints and provide TCGCPUOps::debug_check_breakpoint
callback that recognizes valid instruction breakpoints.

Signed-off-by: Max Filippov 
---
 target/xtensa/cpu.c|  1 +
 target/xtensa/cpu.h|  4 
 target/xtensa/dbg_helper.c | 46 +-
 target/xtensa/helper.c | 12 ++
 target/xtensa/translate.c  | 17 --
 5 files changed, 47 insertions(+), 33 deletions(-)

diff --git a/target/xtensa/cpu.c b/target/xtensa/cpu.c
index e20fe87bf255..b74ee8917065 100644
--- a/target/xtensa/cpu.c
+++ b/target/xtensa/cpu.c
@@ -235,6 +235,7 @@ static const struct TCGCPUOps xtensa_tcg_ops = {
 .do_interrupt = xtensa_cpu_do_interrupt,
 .do_transaction_failed = xtensa_cpu_do_transaction_failed,
 .do_unaligned_access = xtensa_cpu_do_unaligned_access,
+.debug_check_breakpoint = xtensa_debug_check_breakpoint,
 #endif /* !CONFIG_USER_ONLY */
 };
 
diff --git a/target/xtensa/cpu.h b/target/xtensa/cpu.h
index dd8172930653..8a423706d8c0 100644
--- a/target/xtensa/cpu.h
+++ b/target/xtensa/cpu.h
@@ -229,6 +229,7 @@ enum {
 #define MAX_NCCOMPARE 3
 #define MAX_TLB_WAY_SIZE 8
 #define MAX_NDBREAK 2
+#define MAX_NIBREAK 2
 #define MAX_NMEMORY 4
 #define MAX_MPU_FOREGROUND_SEGMENTS 32
 
@@ -547,6 +548,8 @@ struct CPUArchState {
 
 /* Watchpoints for DBREAK registers */
 struct CPUWatchpoint *cpu_watchpoint[MAX_NDBREAK];
+/* Breakpoints for IBREAK registers */
+struct CPUBreakpoint *cpu_breakpoint[MAX_NIBREAK];
 };
 
 /**
@@ -590,6 +593,7 @@ void xtensa_cpu_do_transaction_failed(CPUState *cs, hwaddr 
physaddr, vaddr addr,
   int mmu_idx, MemTxAttrs attrs,
   MemTxResult response, uintptr_t retaddr);
 hwaddr xtensa_cpu_get_phys_page_debug(CPUState *cpu, vaddr addr);
+bool xtensa_debug_check_breakpoint(CPUState *cs);
 #endif
 void xtensa_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
 void xtensa_count_regs(const XtensaConfig *config,
diff --git a/target/xtensa/dbg_helper.c b/target/xtensa/dbg_helper.c
index 3e0c9e8e8be0..497dafca719c 100644
--- a/target/xtensa/dbg_helper.c
+++ b/target/xtensa/dbg_helper.c
@@ -33,27 +33,21 @@
 #include "exec/exec-all.h"
 #include "exec/address-spaces.h"
 
-static void tb_invalidate_virtual_addr(CPUXtensaState *env, uint32_t vaddr)
-{
-uint32_t paddr;
-uint32_t page_size;
-unsigned access;
-int ret = xtensa_get_physical_addr(env, false, vaddr, 2, 0,
-   , _size, );
-if (ret == 0) {
-tb_invalidate_phys_addr(_space_memory, paddr,
-MEMTXATTRS_UNSPECIFIED);
-}
-}
-
 void HELPER(wsr_ibreakenable)(CPUXtensaState *env, uint32_t v)
 {
+CPUState *cs = env_cpu(env);
 uint32_t change = v ^ env->sregs[IBREAKENABLE];
 unsigned i;
 
 for (i = 0; i < env->config->nibreak; ++i) {
 if (change & (1 << i)) {
-tb_invalidate_virtual_addr(env, env->sregs[IBREAKA + i]);
+if (v & (1 << i)) {
+cpu_breakpoint_insert(cs, env->sregs[IBREAKA + i],
+  BP_CPU, >cpu_breakpoint[i]);
+} else {
+cpu_breakpoint_remove_by_ref(cs, env->cpu_breakpoint[i]);
+env->cpu_breakpoint[i] = NULL;
+}
 }
 }
 env->sregs[IBREAKENABLE] = v & ((1 << env->config->nibreak) - 1);
@@ -62,12 +56,32 @@ void HELPER(wsr_ibreakenable)(CPUXtensaState *env, uint32_t 
v)
 void HELPER(wsr_ibreaka)(CPUXtensaState *env, uint32_t i, uint32_t v)
 {
 if (env->sregs[IBREAKENABLE] & (1 << i) && env->sregs[IBREAKA + i] != v) {
-tb_invalidate_virtual_addr(env, env->sregs[IBREAKA + i]);
-tb_invalidate_virtual_addr(env, v);
+CPUState *cs = env_cpu(env);
+
+cpu_breakpoint_remove_by_ref(cs, env->cpu_breakpoint[i]);
+cpu_breakpoint_insert(cs, v, BP_CPU, >cpu_breakpoint[i]);
 }
 env->sregs[IBREAKA + i] = v;
 }
 
+bool xtensa_debug_check_breakpoint(CPUState *cs)
+{
+XtensaCPU *cpu = XTENSA_CPU(cs);
+CPUXtensaState *env = >env;
+unsigned int i;
+
+if (xtensa_get_cintlevel(env) >= env->config->debug_level) {
+return false;
+}
+for (i = 0; i < env->config->nibreak; ++i) {
+if (env->sregs[IBREAKENABLE] & (1 << i) &&
+env->sregs[IBREAKA + i] == env->pc) {
+return true;
+}
+}
+return false;
+}
+
 static void set_dbreak(CPUXtensaState *env, unsigned i, uint32_t dbreaka,
 uint32_t dbreakc)
 {
diff --git a/target/xtensa/helper.c b/target/xtensa/helper.c
index dbeb97a953cc..151da75a41c5 100644
--- a/target/xtensa/helper.c
+++ b/target/xtensa/helper.c
@@ -231,6 +231,18 @@ void 

[PATCH 2/2] tests/tcg/xtensa: add icount/ibreak priority test

2023-11-30 Thread Max Filippov
When icount and ibreak exceptions are due to happen on the same address
icount has higher precedence.

Signed-off-by: Max Filippov 
---
 tests/tcg/xtensa/test_break.S | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/tests/tcg/xtensa/test_break.S b/tests/tcg/xtensa/test_break.S
index 3aa18b5cec3f..4c618feb5b10 100644
--- a/tests/tcg/xtensa/test_break.S
+++ b/tests/tcg/xtensa/test_break.S
@@ -129,7 +129,7 @@ test ibreak_remove
 4:
 test_end
 
-test ibreak_priority
+test ibreak_break_priority
 set_vector debug_vector, 2f
 rsila2, debug_level - 1
 movia2, 1f
@@ -145,6 +145,29 @@ test ibreak_priority
 movia3, 0x2
 assert  eq, a2, a3
 test_end
+
+test ibreak_icount_priority
+set_vector debug_vector, 2f
+rsila2, debug_level - 1
+movia2, 1f
+wsr a2, ibreaka0
+movia2, 1
+wsr a2, ibreakenable
+movia2, -2
+wsr a2, icount
+movia2, 1
+wsr a2, icountlevel
+isync
+rsila2, 0
+nop
+1:
+break   0, 0
+test_fail
+2:
+rsr a2, debugcause
+movia3, 0x1
+assert  eq, a2, a3
+test_end
 #endif
 
 test icount
-- 
2.39.2




[PATCH 0/2] target/xtensa: use generic instruction breakpoint infrastructure

2023-11-30 Thread Max Filippov
Hello,

this series makes target/xtensa use generic instruction breakpoint
infrastructure removing its use of tb_invalidate_phys_addr. It also adds
a new TCG test checking relative priority of icount and ibreak exceptions
for target/xtensa.

Max Filippov (2):
  target/xtensa: use generic instruction breakpoint infrastructure
  tests/tcg/xtensa: add icount/ibreak priority test

 target/xtensa/cpu.c   |  1 +
 target/xtensa/cpu.h   |  4 +++
 target/xtensa/dbg_helper.c| 46 +++
 target/xtensa/helper.c| 12 +
 target/xtensa/translate.c | 17 -
 tests/tcg/xtensa/test_break.S | 25 ++-
 6 files changed, 71 insertions(+), 34 deletions(-)

-- 
2.39.2




Re: [PATCH 2/3] docs/devel: rationalise unstable gitlab tests under FLAKY_TESTS

2023-11-30 Thread Alex Bennée
Peter Maydell  writes:

> On Thu, 30 Nov 2023 at 15:33, Alex Bennée  wrote:
>>
>> It doesn't make sense to have two classes of flaky tests. While it may
>> take the constrained environment of CI to trigger failures easily it
>> doesn't mean they don't occasionally happen on developer machines. As
>> CI is the gating factor to passing there is no point developers
>> running the tests locally anyway unless they are trying to fix things.
>>
>> While we are at it update the language in the docs to discourage the
>> QEMU_TEST_FLAKY_TESTS becoming a permanent solution.
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>  docs/devel/testing.rst   | 31 +++-
>>  tests/avocado/boot_linux.py  |  8 +++---
>>  tests/avocado/boot_linux_console.py  |  5 ++--
>>  tests/avocado/intel_iommu.py |  5 ++--
>>  tests/avocado/linux_initrd.py|  5 ++--
>>  tests/avocado/machine_aspeed.py  |  8 +++---
>>  tests/avocado/machine_mips_malta.py  |  8 +++---
>>  tests/avocado/machine_rx_gdbsim.py   |  8 +++---
>>  tests/avocado/machine_s390_ccw_virtio.py |  2 +-
>>  tests/avocado/replay_kernel.py   |  5 ++--
>>  tests/avocado/reverse_debugging.py   | 14 +++
>>  tests/avocado/smmu.py|  5 ++--
>>  tests/avocado/tuxrun_baselines.py|  4 +--
>>  13 files changed, 67 insertions(+), 41 deletions(-)
>>
>> diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
>> index 22218dbedb..579d1837e0 100644
>> --- a/docs/devel/testing.rst
>> +++ b/docs/devel/testing.rst
>> @@ -1371,23 +1371,32 @@ conditions. For example, tests that take longer to 
>> execute when QEMU is
>>  compiled with debug flags. Therefore, the ``AVOCADO_TIMEOUT_EXPECTED`` 
>> variable
>>  has been used to determine whether those tests should run or not.
>>
>> -GITLAB_CI
>> -^
>> -A number of tests are flagged to not run on the GitLab CI. Usually because
>> -they proved to the flaky or there are constraints on the CI environment 
>> which
>> -would make them fail. If you encounter a similar situation then use that
>> -variable as shown on the code snippet below to skip the test:
>> +QEMU_TEST_FLAKY_TESTS
>> +^
>> +Some tests are not working reliably and thus are disabled by default.
>> +This includes tests that don't run reliably on GitLab's CI which
>> +usually expose real issues that are rarely seen on developer machines
>> +due to the constraints of the CI environment. If you encounter a
>> +similar situation then mark the test as shown on the code snippet
>> +below:
>>
>>  .. code::
>>
>> -  @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
>> +  @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on 
>> GitLab')
>>def test(self):
>>do_something()
>
> Can we also say here that when marking a test as flaky:
>  * we should raise a github issue giving details of what goes wrong
>  * the URL of that issue should be in a comment above the @skipUnless
>line ?
>
> That way we have a history of why we disabled the test and we
> might even manage to fix it some day, in which case we'll know
> we are able to unmark it as flaky...

Good idea, I'll update.

>
>> +To run such tests locally you will need to set the environment
>> +variable. For example:
>> +
>> +.. code::
>> +
>> +   env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run \
>> +  tests/avocado/boot_linux.py:BootLinuxPPC64.test_pseries_tcg
>
> The "env" here is unnecessary (assuming a standard Posix shell);
> "VAR=value foo" will run "foo" with VAR set to 'value' only
> for the duration of that command.

Ahh I habit I've picked up from running a fish shell. Do we want to
assume everyone's interactive shell is Posix compatible?

>
> thanks
> -- PMM

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v3 03/70] RAMBlock/guest_memfd: Enable KVM_GUEST_MEMFD_ALLOW_HUGEPAGE

2023-11-30 Thread David Hildenbrand

On 30.11.23 17:01, Sean Christopherson wrote:

On Thu, Nov 30, 2023, David Hildenbrand wrote:

On 30.11.23 08:32, Xiaoyao Li wrote:

On 11/20/2023 5:26 PM, David Hildenbrand wrote:



... did you shamelessly copy that from hw/virtio/virtio-mem.c ? ;)


Get caught.


This should be factored out into a common helper.


Sure, will do it in next version.


Factor it out in a separate patch. Then, this patch is get small that
you can just squash it into #2.

And my comment regarding "flags = 0" to patch #2 does no longer apply :)



I see.

But it depends on if KVM_GUEST_MEMFD_ALLOW_HUGEPAGE will appear together
with initial guest memfd in linux (hopefully 6.8)
https://lore.kernel.org/all/CABgObfa=dh7fysbvif63os9svog_wt-aqygtuagkqny5biz...@mail.gmail.com/



Doesn't seem to be in -next if I am looking at the right tree:

https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=next


Yeah, we punted on adding hugepage support for the initial guest_memfd merge so
as not to rush in kludgy uABI.  The internal KVM code isn't problematic, we just
haven't figured out exactly what the ABI should look like, e.g. should hugepages
be dependent on THP being enabled, and if not, how does userspace discover the
supported hugepage sizes?


Are we talking about THP or hugetlb? They are two different things, and 
"KVM_GUEST_MEMFD_ALLOW_HUGEPAGE" doesn't make it clearer what we are 
talking about.


This patch here "get_thp_size()" indicates that we care about THP, not 
hugetlb.



THP lives in:
/sys/kernel/mm/transparent_hugepage/
and hugetlb in:
/sys/kernel/mm/hugepages/

THP for shmem+anon currently really only supports PMD-sized THP, that 
size can be observed via:

/sys/kernel/mm/transparent_hugepage/hpage_pmd_size

hugetlb sizes can be detected simply by looking at the folders inside
/sys/kernel/mm/hugepages/. "tools/testing/selftests/mm/vm_util.c" in the 
kernel has a function "detect_hugetlb_page_sizes()" that uses that 
interface to detect the sizes.



But likely we want THP support here. Because for hugetlb, one would 
actually have to instruct the kernel which size to use, like we do for 
memfd with hugetlb.



Anon support for smaller sizes than PMDs is in the works, and once 
upstream, it can then be detected via 
/sys/kernel/mm/transparent_hugepage/ as well.


shmem support for smaller sizes is partially in the works: only on the 
write() path. Likely, we'll make it configurable/observable in 
/sys/kernel/mm/transparent_hugepage/ as well.



So if we are talking about THP for shmem, there really only is 
/sys/kernel/mm/transparent_hugepage/hpage_pmd_size.


--
Cheers,

David / dhildenb




Re: [PATCH v3] arm/kvm: Enable support for KVM_ARM_VCPU_PMU_V3_FILTER

2023-11-30 Thread Sebastian Ott

On Tue, 28 Nov 2023, Shaoqin Huang wrote:

+static void kvm_arm_pmu_filter_init(CPUState *cs)
+{
+static bool pmu_filter_init = false;
+struct kvm_pmu_event_filter filter;
+struct kvm_device_attr attr = {
+.group  = KVM_ARM_VCPU_PMU_V3_CTRL,
+.attr   = KVM_ARM_VCPU_PMU_V3_FILTER,
+.addr   = (uint64_t),
+};
+KVMState *kvm_state = cs->kvm_state;
+char *tmp;
+char *str, act;
+
+if (!kvm_state->kvm_pmu_filter)
+return;
+
+if (kvm_vcpu_ioctl(cs, KVM_HAS_DEVICE_ATTR, attr)) {
+error_report("The kernel doesn't support the pmu event filter!\n");
+abort();
+}
+
+/* The filter only needs to be initialized for 1 vcpu. */
+if (!pmu_filter_init)
+pmu_filter_init = true;


Imho this is missing an else to bail out. Or the shorter version

if (pmu_filter_init)
return;

pmu_filter_init = true;

which could also move above the other tests.

Sebastian




Re: [PATCH 2/3] docs/devel: rationalise unstable gitlab tests under FLAKY_TESTS

2023-11-30 Thread Peter Maydell
On Thu, 30 Nov 2023 at 15:33, Alex Bennée  wrote:
>
> It doesn't make sense to have two classes of flaky tests. While it may
> take the constrained environment of CI to trigger failures easily it
> doesn't mean they don't occasionally happen on developer machines. As
> CI is the gating factor to passing there is no point developers
> running the tests locally anyway unless they are trying to fix things.
>
> While we are at it update the language in the docs to discourage the
> QEMU_TEST_FLAKY_TESTS becoming a permanent solution.
>
> Signed-off-by: Alex Bennée 
> ---
>  docs/devel/testing.rst   | 31 +++-
>  tests/avocado/boot_linux.py  |  8 +++---
>  tests/avocado/boot_linux_console.py  |  5 ++--
>  tests/avocado/intel_iommu.py |  5 ++--
>  tests/avocado/linux_initrd.py|  5 ++--
>  tests/avocado/machine_aspeed.py  |  8 +++---
>  tests/avocado/machine_mips_malta.py  |  8 +++---
>  tests/avocado/machine_rx_gdbsim.py   |  8 +++---
>  tests/avocado/machine_s390_ccw_virtio.py |  2 +-
>  tests/avocado/replay_kernel.py   |  5 ++--
>  tests/avocado/reverse_debugging.py   | 14 +++
>  tests/avocado/smmu.py|  5 ++--
>  tests/avocado/tuxrun_baselines.py|  4 +--
>  13 files changed, 67 insertions(+), 41 deletions(-)
>
> diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
> index 22218dbedb..579d1837e0 100644
> --- a/docs/devel/testing.rst
> +++ b/docs/devel/testing.rst
> @@ -1371,23 +1371,32 @@ conditions. For example, tests that take longer to 
> execute when QEMU is
>  compiled with debug flags. Therefore, the ``AVOCADO_TIMEOUT_EXPECTED`` 
> variable
>  has been used to determine whether those tests should run or not.
>
> -GITLAB_CI
> -^
> -A number of tests are flagged to not run on the GitLab CI. Usually because
> -they proved to the flaky or there are constraints on the CI environment which
> -would make them fail. If you encounter a similar situation then use that
> -variable as shown on the code snippet below to skip the test:
> +QEMU_TEST_FLAKY_TESTS
> +^
> +Some tests are not working reliably and thus are disabled by default.
> +This includes tests that don't run reliably on GitLab's CI which
> +usually expose real issues that are rarely seen on developer machines
> +due to the constraints of the CI environment. If you encounter a
> +similar situation then mark the test as shown on the code snippet
> +below:
>
>  .. code::
>
> -  @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
> +  @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on 
> GitLab')
>def test(self):
>do_something()

Can we also say here that when marking a test as flaky:
 * we should raise a github issue giving details of what goes wrong
 * the URL of that issue should be in a comment above the @skipUnless
   line ?

That way we have a history of why we disabled the test and we
might even manage to fix it some day, in which case we'll know
we are able to unmark it as flaky...

> +To run such tests locally you will need to set the environment
> +variable. For example:
> +
> +.. code::
> +
> +   env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run \
> +  tests/avocado/boot_linux.py:BootLinuxPPC64.test_pseries_tcg

The "env" here is unnecessary (assuming a standard Posix shell);
"VAR=value foo" will run "foo" with VAR set to 'value' only
for the duration of that command.

thanks
-- PMM



Re: [PATCH v1 1/2] hw/cxl/device: read from register values in mdev_reg_read()

2023-11-30 Thread Jonathan Cameron via
On Mon, 27 Nov 2023 12:27:02 -0800
Davidlohr Bueso  wrote:

> On Mon, 27 Nov 2023, Hyeonggon Yoo wrote:
> 
> >In the current mdev_reg_read() implementation, it consistently returns
> >that the Media Status is Ready (01b). This was fine until commit
> >25a52959f99d ("hw/cxl: Add support for device sanitation") because the
> >media was presumed to be ready.
> >
> >However, as per the CXL 3.0 spec "8.2.9.8.5.1 Sanitize (Opcode 4400h)",
> >during sanitation, the Media State should be set to Disabled (11b). The
> >mentioned commit correctly sets it to Disabled, but mdev_reg_read()
> >still returns Media Status as Ready.
> >
> >To address this, update mdev_reg_read() to read register values instead
> >of returning dummy values.
> >
> >Fixes: commit 25a52959f99d ("hw/cxl: Add support for device sanitation")
> >Signed-off-by: Hyeonggon Yoo <42.hye...@gmail.com>  
> 
> Looks good, thanks.
> 
> Reviewed-by: Davidlohr Bueso 
> 
> In addition how about the following to further robustify?
>- disallow certain incoming cci cmd when media is disabled
>- deal with memory reads/writes when media is disabled
>- make __toggle_media() a nop when passed value is already set
>- play nice with arm64 uses little endian reads and writes (this
>  should be extended to all of mbox/cci of course).
This one you've lost me on.  Arm64 and x86 both little endian.

If you mean generally harden the code we haven't fixed up for
big endian systems then fair enough - that indeed needs doing.
Tricky to be sure we got it all right though unless we have a big
endian arch to test on...

Jonathan




Re: [PATCH 2/3] docs/devel: rationalise unstable gitlab tests under FLAKY_TESTS

2023-11-30 Thread Philippe Mathieu-Daudé

On 30/11/23 16:33, Alex Bennée wrote:

It doesn't make sense to have two classes of flaky tests. While it may
take the constrained environment of CI to trigger failures easily it
doesn't mean they don't occasionally happen on developer machines. As
CI is the gating factor to passing there is no point developers
running the tests locally anyway unless they are trying to fix things.

While we are at it update the language in the docs to discourage the
QEMU_TEST_FLAKY_TESTS becoming a permanent solution.

Signed-off-by: Alex Bennée 
---
  docs/devel/testing.rst   | 31 +++-
  tests/avocado/boot_linux.py  |  8 +++---
  tests/avocado/boot_linux_console.py  |  5 ++--
  tests/avocado/intel_iommu.py |  5 ++--
  tests/avocado/linux_initrd.py|  5 ++--
  tests/avocado/machine_aspeed.py  |  8 +++---
  tests/avocado/machine_mips_malta.py  |  8 +++---
  tests/avocado/machine_rx_gdbsim.py   |  8 +++---
  tests/avocado/machine_s390_ccw_virtio.py |  2 +-
  tests/avocado/replay_kernel.py   |  5 ++--
  tests/avocado/reverse_debugging.py   | 14 +++
  tests/avocado/smmu.py|  5 ++--
  tests/avocado/tuxrun_baselines.py|  4 +--
  13 files changed, 67 insertions(+), 41 deletions(-)




diff --git a/tests/avocado/boot_linux.py b/tests/avocado/boot_linux.py
index be30dcbd58..9e9773e6e1 100644
--- a/tests/avocado/boot_linux.py
+++ b/tests/avocado/boot_linux.py
@@ -12,7 +12,7 @@
  
  from avocado_qemu import LinuxTest, BUILD_DIR
  
-from avocado import skipIf

+from avocado import skipUnless
  
  
  class BootLinuxX8664(LinuxTest):

@@ -93,7 +93,8 @@ class BootLinuxPPC64(LinuxTest):
  
  timeout = 360
  
-@skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')

+@skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on 
GitLab')


Later we could move that to a @flakyTest decorator around QemuBaseTest.

Reviewed-by: Philippe Mathieu-Daudé 


+
  def test_pseries_tcg(self):
  """
  :avocado: tags=machine:pseries





Re: [PATCH v7 00/10] Introduce model for IBM's FSI

2023-11-30 Thread Ninad Palsule

Hello Cedric,

On 11/29/23 15:29, Cédric Le Goater wrote:

On 11/29/23 15:56, Ninad Palsule wrote:

Hello Cedric,

On 11/27/23 10:31, Cédric Le Goater wrote:

Hello Ninad,

On 10/26/23 18:47, Ninad Palsule wrote:

Hello,

Please review the patch-set version 7.
I have incorporated review comments from Cedric, Philippe and Thomas.



I reworked v7 with the suggestions I made in patches 1-6. Please 
check :


  https://github.com/legoater/qemu/commits/aspeed-8.2

I will have more questions on the mappings because some parts are 
really

unclear.

I forgot to mention in my last mail. If I build against your 
aspeed-8.2 branch then rainier machine is failing to boot.


The same images are working with qemu master branch.

fsi/qemu-system-arm -M rainier-bmc -nographic -kernel 
./fitImage-linux.bin -dtb ./aspeed-bmc-ibm-rainier.dtb -initrd 
./obmc-phosphor-initramfs.rootfs.cpio.xz -drive 
file=./obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 -append 
'rootwait console=ttyS4,115200n8 root=PARTLABEL=rofs-a' -net nic -net 
user,hostfwd=:127.0.0.1:3222-:22,hostfwd=:127.0.0.1:2234-:1234 -trace 
'fsi*'

 |
 |
Starting systemd-udevd version 254^
[   50.630407] /dev/disk/by-partlabel/rofs-a: Can't open blockdev
mount: mounting /dev/disk/by-partlabel/rofs-a on /mnt/rofs failed: No 
such file or directory

/bin/sh: can't access tty; job control turned off


Hello,

Please try with  '-M rainier-bmc,boot-emmc=false' to avoid using the
boot emmc mode which is the default for the rainier machine.

I just updated the aspeed-8.2 branch to take into account the -kernel
option when booting the rainier.

That worked. Thanks!



Thanks,

C.







Re: [PATCH] pcie_sriov: Remove g_new assertion

2023-11-30 Thread Michael S. Tsirkin
On Thu, Nov 30, 2023 at 04:37:43PM +0100, Cédric Le Goater wrote:
> On 11/30/23 10:52, YangHang Liu wrote:
> > After applying this patch, the VM with a igbvf will not crash during reboot.
> > 
> > Tested-by: Yanghang Liumailto:yangh...@redhat.com>>
> 
> Michael, do you have plans to send a PR for -rc3 ?
> 
> Thanks,
> 
> C.


Yes.

> 
> 
> > 
> > On Mon, Nov 27, 2023 at 5:12 PM Cédric Le Goater  > > wrote:
> > 
> > On 11/23/23 08:56, Akihiko Odaki wrote:
> >  > g_new() aborts if the allocation fails so it returns NULL only if the
> >  > requested allocation size is zero. register_vfs() makes such an
> >  > allocation if NumVFs is zero so it should not assert that g_new()
> >  > returns a non-NULL value.
> >  >
> >  > Fixes: 7c0fa8dff8 ("pcie: Add support for Single Root I/O 
> > Virtualization (SR/IOV)")
> >  > Buglink: https://issues.redhat.com/browse/RHEL-17209 
> > 
> >  > Signed-off-by: Akihiko Odaki  > >
> > 
> > 
> > Reviewed-by: Cédric Le Goater mailto:c...@redhat.com>>
> > 
> > Thanks,
> > 
> > C.
> > 
> > 
> >  > ---
> >  >   hw/pci/pcie_sriov.c | 1 -
> >  >   1 file changed, 1 deletion(-)
> >  >
> >  > diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
> >  > index 5ef8950940..a1fe65f5d8 100644
> >  > --- a/hw/pci/pcie_sriov.c
> >  > +++ b/hw/pci/pcie_sriov.c
> >  > @@ -178,7 +178,6 @@ static void register_vfs(PCIDevice *dev)
> >  >       num_vfs = pci_get_word(dev->config + sriov_cap + 
> > PCI_SRIOV_NUM_VF);
> >  >
> >  >       dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
> >  > -    assert(dev->exp.sriov_pf.vf);
> >  >
> >  >       trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
> >  >                                PCI_FUNC(dev->devfn), num_vfs);
> > 
> > 




[PATCH v2 2/2] block: stream: Allow users to request only format driver names in backing file format

2023-11-30 Thread Peter Krempa
Introduce a new flag 'backing_file_format_no_protocol' for the
block-commit QMP command which instructs the internals to use 'raw'
instead of the protocol driver in case when a image is used without a
dummy 'raw' wrapper.

The flag is designed such that it can be always asserted by management
tools even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even when libvirt will stop using the dummy
raw driver (raw driver with no other config). Libvirt needs this so that
the images stay compatible with older libvirt versions which didn't
expect that a protocol driver name can appear in the backing file format
field.

Signed-off-by: Peter Krempa 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/monitor/block-hmp-cmds.c |  2 +-
 block/stream.c | 10 +-
 blockdev.c |  7 +++
 include/block/block_int-global-state.h |  3 +++
 qapi/block-core.json   | 11 ++-
 5 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index c729cbf1eb..9080e29d4d 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -509,7 +509,7 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict)
 const char *base = qdict_get_try_str(qdict, "base");
 int64_t speed = qdict_get_try_int(qdict, "speed", 0);

-qmp_block_stream(device, device, base, NULL, NULL, NULL,
+qmp_block_stream(device, device, base, NULL, NULL, false, false, NULL,
  qdict_haskey(qdict, "speed"), speed,
  true, BLOCKDEV_ON_ERROR_REPORT, NULL,
  false, false, false, false, );
diff --git a/block/stream.c b/block/stream.c
index 01fe7c0f16..42befd6b1d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -39,6 +39,7 @@ typedef struct StreamBlockJob {
 BlockDriverState *target_bs;
 BlockdevOnError on_error;
 char *backing_file_str;
+bool backing_file_format_no_protocol;
 bool bs_read_only;
 } StreamBlockJob;

@@ -95,7 +96,12 @@ static int stream_prepare(Job *job)
 if (unfiltered_base) {
 base_id = s->backing_file_str ?: unfiltered_base->filename;
 if (unfiltered_base->drv) {
-base_fmt = unfiltered_base->drv->format_name;
+if (s->backing_file_format_no_protocol &&
+unfiltered_base->drv->protocol_name) {
+base_fmt = "raw";
+} else {
+base_fmt = unfiltered_base->drv->format_name;
+}
 }
 }

@@ -247,6 +253,7 @@ static const BlockJobDriver stream_job_driver = {

 void stream_start(const char *job_id, BlockDriverState *bs,
   BlockDriverState *base, const char *backing_file_str,
+  bool backing_file_format_no_protocol,
   BlockDriverState *bottom,
   int creation_flags, int64_t speed,
   BlockdevOnError on_error,
@@ -398,6 +405,7 @@ void stream_start(const char *job_id, BlockDriverState *bs,
 s->base_overlay = base_overlay;
 s->above_base = above_base;
 s->backing_file_str = g_strdup(backing_file_str);
+s->backing_file_format_no_protocol = backing_file_format_no_protocol;
 s->cor_filter_bs = cor_filter_bs;
 s->target_bs = bs;
 s->bs_read_only = bs_read_only;
diff --git a/blockdev.c b/blockdev.c
index 038031bb03..dc477c4f7e 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2408,6 +2408,8 @@ void qmp_block_stream(const char *job_id, const char 
*device,
   const char *base,
   const char *base_node,
   const char *backing_file,
+  bool has_backing_file_format_no_protocol,
+  bool backing_file_format_no_protocol,
   const char *bottom,
   bool has_speed, int64_t speed,
   bool has_on_error, BlockdevOnError on_error,
@@ -2443,6 +2445,10 @@ void qmp_block_stream(const char *job_id, const char 
*device,
 return;
 }

+if (!has_backing_file_format_no_protocol) {
+backing_file_format_no_protocol = false;
+}
+
 if (!has_on_error) {
 on_error = BLOCKDEV_ON_ERROR_REPORT;
 }
@@ -2531,6 +2537,7 @@ void qmp_block_stream(const char *job_id, const char 
*device,
 }

 stream_start(job_id, bs, base_bs, backing_file,
+ backing_file_format_no_protocol,
  bottom_bs, job_flags, has_speed ? speed : 0, on_error,
  filter_node_name, _err);
 if (local_err) {
diff --git a/include/block/block_int-global-state.h 
b/include/block/block_int-global-state.h
index 4f253ff362..4301061048 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -46,6 +46,8 @@
  * flatten 

Re: [PATCH 2/3] docs/devel: rationalise unstable gitlab tests under FLAKY_TESTS

2023-11-30 Thread Cédric Le Goater

On 11/30/23 16:33, Alex Bennée wrote:

It doesn't make sense to have two classes of flaky tests. While it may
take the constrained environment of CI to trigger failures easily it
doesn't mean they don't occasionally happen on developer machines. As
CI is the gating factor to passing there is no point developers
running the tests locally anyway unless they are trying to fix things.

While we are at it update the language in the docs to discourage the
QEMU_TEST_FLAKY_TESTS becoming a permanent solution.

Signed-off-by: Alex Bennée 
---
  docs/devel/testing.rst   | 31 +++-
  tests/avocado/boot_linux.py  |  8 +++---
  tests/avocado/boot_linux_console.py  |  5 ++--
  tests/avocado/intel_iommu.py |  5 ++--
  tests/avocado/linux_initrd.py|  5 ++--
  tests/avocado/machine_aspeed.py  |  8 +++---
  tests/avocado/machine_mips_malta.py  |  8 +++---
  tests/avocado/machine_rx_gdbsim.py   |  8 +++---
  tests/avocado/machine_s390_ccw_virtio.py |  2 +-
  tests/avocado/replay_kernel.py   |  5 ++--
  tests/avocado/reverse_debugging.py   | 14 +++
  tests/avocado/smmu.py|  5 ++--
  tests/avocado/tuxrun_baselines.py|  4 +--
  13 files changed, 67 insertions(+), 41 deletions(-)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 22218dbedb..579d1837e0 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -1371,23 +1371,32 @@ conditions. For example, tests that take longer to 
execute when QEMU is
  compiled with debug flags. Therefore, the ``AVOCADO_TIMEOUT_EXPECTED`` 
variable
  has been used to determine whether those tests should run or not.
  
-GITLAB_CI

-^
-A number of tests are flagged to not run on the GitLab CI. Usually because
-they proved to the flaky or there are constraints on the CI environment which
-would make them fail. If you encounter a similar situation then use that
-variable as shown on the code snippet below to skip the test:
+QEMU_TEST_FLAKY_TESTS
+^
+Some tests are not working reliably and thus are disabled by default.
+This includes tests that don't run reliably on GitLab's CI which
+usually expose real issues that are rarely seen on developer machines
+due to the constraints of the CI environment. If you encounter a
+similar situation then mark the test as shown on the code snippet
+below:
  
  .. code::
  
-  @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')

+  @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
def test(self):
do_something()
  
-QEMU_TEST_FLAKY_TESTS

-^
-Some tests are not working reliably and thus are disabled by default.
-Set this environment variable to enable them.
+Tests should not live in this state forever and should either be fixed
+or eventually removed. If you move a test into this category please
+consider raising a bug to track progress towards a fix.
+
+To run such tests locally you will need to set the environment
+variable. For example:
+
+.. code::
+
+   env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run \
+  tests/avocado/boot_linux.py:BootLinuxPPC64.test_pseries_tcg
  
  Uninstalling Avocado

  
diff --git a/tests/avocado/boot_linux.py b/tests/avocado/boot_linux.py
index be30dcbd58..9e9773e6e1 100644
--- a/tests/avocado/boot_linux.py
+++ b/tests/avocado/boot_linux.py
@@ -12,7 +12,7 @@
  
  from avocado_qemu import LinuxTest, BUILD_DIR
  
-from avocado import skipIf

+from avocado import skipUnless
  
  
  class BootLinuxX8664(LinuxTest):

@@ -93,7 +93,8 @@ class BootLinuxPPC64(LinuxTest):
  
  timeout = 360
  
-@skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')

+@skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on 
GitLab')
+
  def test_pseries_tcg(self):


is it because the test is too slow ? If this is the case then we should
probably remove. I have never see it fails though.


  """
  :avocado: tags=machine:pseries
@@ -111,7 +112,8 @@ class BootLinuxS390X(LinuxTest):
  
  timeout = 240
  
-@skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')

+@skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on 
GitLab')
+
  def test_s390_ccw_virtio_tcg(self):
  """
  :avocado: tags=machine:s390-ccw-virtio
diff --git a/tests/avocado/boot_linux_console.py 
b/tests/avocado/boot_linux_console.py
index 6eab515718..231b4f68e5 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -15,7 +15,7 @@
  
  from avocado import skip

  from avocado import skipUnless
-from avocado import skipIf
+from avocado import skipUnless
  from avocado_qemu import QemuSystemTest
  from avocado_qemu import exec_command
  from avocado_qemu import exec_command_and_wait_for_pattern
@@ -1419,7 +1419,8 @@ def test_ppc_mac99(self):
  # This test has a 6-10% failure rate on various hosts 

[PATCH v2 0/2] block: commit/stream: Allow users to request only format driver names in backing file format

2023-11-30 Thread Peter Krempa
Please see patches for rationale.

Libvirt patches using this new flag will be posted soon-ish (after
cleanup).

v2:
 - fixed mistaken argument order in 'hmp_block_stream'
 - changed version in docs to 9.0 as getting this into RC 3 probably
   isn't realistic

Peter Krempa (2):
  block: commit: Allow users to request only format driver names in
backing file format
  block: stream: Allow users to request only format driver names in
backing file format

 block.c| 37 +-
 block/commit.c |  6 -
 block/monitor/block-hmp-cmds.c |  2 +-
 block/stream.c | 10 ++-
 blockdev.c | 13 +
 include/block/block-global-state.h |  3 ++-
 include/block/block_int-common.h   |  4 ++-
 include/block/block_int-global-state.h |  6 +
 qapi/block-core.json   | 22 +--
 tests/unit/test-bdrv-drain.c   |  3 ++-
 10 files changed, 91 insertions(+), 15 deletions(-)

-- 
2.43.0




[PATCH v2 1/2] block: commit: Allow users to request only format driver names in backing file format

2023-11-30 Thread Peter Krempa
Introduce a new flag 'backing_file_format_no_protocol' for the
block-commit QMP command which instructs the internals to use 'raw'
instead of the protocol driver in case when a image is used without a
dummy 'raw' wrapper.

The flag is designed such that it can be always asserted by management
tools even when there isn't any update to backing files.

The flag will be used by libvirt so that the backing images still
reference the proper format even when libvirt will stop using the dummy
raw driver (raw driver with no other config). Libvirt needs this so that
the images stay compatible with older libvirt versions which didn't
expect that a protocol driver name can appear in the backing file format
field.

Signed-off-by: Peter Krempa 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block.c| 37 +-
 block/commit.c |  6 -
 blockdev.c |  6 +
 include/block/block-global-state.h |  3 ++-
 include/block/block_int-common.h   |  4 ++-
 include/block/block_int-global-state.h |  3 +++
 qapi/block-core.json   | 11 +++-
 tests/unit/test-bdrv-drain.c   |  3 ++-
 8 files changed, 61 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index bfb0861ec6..986a529941 100644
--- a/block.c
+++ b/block.c
@@ -1309,11 +1309,14 @@ static void bdrv_backing_detach(BdrvChild *c)
 }

 static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
-const char *filename, Error **errp)
+const char *filename,
+bool backing_file_format_no_protocol,
+Error **errp)
 {
 BlockDriverState *parent = c->opaque;
 bool read_only = bdrv_is_read_only(parent);
 int ret;
+const char *format_name;
 GLOBAL_STATE_CODE();

 if (read_only) {
@@ -1323,9 +1326,23 @@ static int bdrv_backing_update_filename(BdrvChild *c, 
BlockDriverState *base,
 }
 }

-ret = bdrv_change_backing_file(parent, filename,
-   base->drv ? base->drv->format_name : "",
-   false);
+if (base->drv) {
+/*
+ * If the new base image doesn't have a format driver layer, which we
+ * detect by the fact that @base is a protocol driver, we record
+ * 'raw' as the format instead of putting the protocol name as the
+ * backing format
+ */
+if (backing_file_format_no_protocol && base->drv->protocol_name) {
+format_name = "raw";
+} else {
+format_name = base->drv->format_name;
+}
+} else {
+format_name = "";
+}
+
+ret = bdrv_change_backing_file(parent, filename, format_name, false);
 if (ret < 0) {
 error_setg_errno(errp, -ret, "Could not update backing file link");
 }
@@ -1479,10 +1496,14 @@ static void GRAPH_WRLOCK bdrv_child_cb_detach(BdrvChild 
*child)
 }

 static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
- const char *filename, Error **errp)
+ const char *filename,
+ bool backing_file_format_no_protocol,
+ Error **errp)
 {
 if (c->role & BDRV_CHILD_COW) {
-return bdrv_backing_update_filename(c, base, filename, errp);
+return bdrv_backing_update_filename(c, base, filename,
+backing_file_format_no_protocol,
+errp);
 }
 return 0;
 }
@@ -5961,7 +5982,8 @@ void bdrv_unfreeze_backing_chain(BlockDriverState *bs, 
BlockDriverState *base)
  *
  */
 int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
-   const char *backing_file_str)
+   const char *backing_file_str,
+   bool backing_file_format_no_protocol)
 {
 BlockDriverState *explicit_top = top;
 bool update_inherits_from;
@@ -6027,6 +6049,7 @@ int bdrv_drop_intermediate(BlockDriverState *top, 
BlockDriverState *base,

 if (c->klass->update_filename) {
 ret = c->klass->update_filename(c, base, backing_file_str,
+backing_file_format_no_protocol,
 _err);
 if (ret < 0) {
 /*
diff --git a/block/commit.c b/block/commit.c
index 69cc75be0c..5a584b712e 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -42,6 +42,7 @@ typedef struct CommitBlockJob {
 bool base_read_only;
 bool chain_frozen;
 char *backing_file_str;
+bool backing_file_format_no_protocol;
 } CommitBlockJob;

 static int commit_prepare(Job *job)
@@ -61,7 +62,8 @@ static int 

Re: [PATCH 0/3] final fixes for 8.2

2023-11-30 Thread Richard Henderson

On 11/30/23 09:33, Alex Bennée wrote:

Alex Bennée (3):
   gdbstub: use a better signal when we halt for IO reasons
   docs/devel: rationalise unstable gitlab tests under FLAKY_TESTS
   gitlab: add optional job to run flaky avocado tests


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v3 03/70] RAMBlock/guest_memfd: Enable KVM_GUEST_MEMFD_ALLOW_HUGEPAGE

2023-11-30 Thread Sean Christopherson
On Thu, Nov 30, 2023, David Hildenbrand wrote:
> On 30.11.23 08:32, Xiaoyao Li wrote:
> > On 11/20/2023 5:26 PM, David Hildenbrand wrote:
> > > 
> > > > > ... did you shamelessly copy that from hw/virtio/virtio-mem.c ? ;)
> > > > 
> > > > Get caught.
> > > > 
> > > > > This should be factored out into a common helper.
> > > > 
> > > > Sure, will do it in next version.
> > > 
> > > Factor it out in a separate patch. Then, this patch is get small that
> > > you can just squash it into #2.
> > > 
> > > And my comment regarding "flags = 0" to patch #2 does no longer apply :)
> > > 
> > 
> > I see.
> > 
> > But it depends on if KVM_GUEST_MEMFD_ALLOW_HUGEPAGE will appear together
> > with initial guest memfd in linux (hopefully 6.8)
> > https://lore.kernel.org/all/CABgObfa=dh7fysbvif63os9svog_wt-aqygtuagkqny5biz...@mail.gmail.com/
> > 
> 
> Doesn't seem to be in -next if I am looking at the right tree:
> 
> https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=next

Yeah, we punted on adding hugepage support for the initial guest_memfd merge so
as not to rush in kludgy uABI.  The internal KVM code isn't problematic, we just
haven't figured out exactly what the ABI should look like, e.g. should hugepages
be dependent on THP being enabled, and if not, how does userspace discover the
supported hugepage sizes?



[PATCH v3] accel/kvm: Turn DPRINTF macro use into tracepoints

2023-11-30 Thread Jai Arora
Patch removes DRPINTF macro and adds multiple tracepoints
to capture different kvm events.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827

Signed-off-by: Jai Arora 
---

Notes:
v3: Addresses review comments by Alex Benn??e

Removes trace events kvm_run_exit_reason, kvm_exit_system_event
and their usage.

Adds trace event kvm_run_exit_system_event to trace run->system_event.type

 accel/kvm/kvm-all.c| 28 ++--
 accel/kvm/trace-events |  7 ++-
 2 files changed, 12 insertions(+), 23 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index e39a810a4e..80ac7b35b7 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -69,16 +69,6 @@
 #define KVM_GUESTDBG_BLOCKIRQ 0
 #endif
 
-//#define DEBUG_KVM
-
-#ifdef DEBUG_KVM
-#define DPRINTF(fmt, ...) \
-do { fprintf(stderr, fmt, ## __VA_ARGS__); } while (0)
-#else
-#define DPRINTF(fmt, ...) \
-do { } while (0)
-#endif
-
 struct KVMParkedVcpu {
 unsigned long vcpu_id;
 int kvm_fd;
@@ -331,7 +321,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
 struct KVMParkedVcpu *vcpu = NULL;
 int ret = 0;
 
-DPRINTF("kvm_destroy_vcpu\n");
+trace_kvm_destroy_vcpu();
 
 ret = kvm_arch_destroy_vcpu(cpu);
 if (ret < 0) {
@@ -341,7 +331,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
 ret = mmap_size;
-DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
+trace_kvm_failed_get_vcpu_mmap_size();
 goto err;
 }
 
@@ -443,7 +433,6 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
PAGE_SIZE * KVM_DIRTY_LOG_PAGE_OFFSET);
 if (cpu->kvm_dirty_gfns == MAP_FAILED) {
 ret = -errno;
-DPRINTF("mmap'ing vcpu dirty gfns failed: %d\n", ret);
 goto err;
 }
 }
@@ -2821,7 +2810,7 @@ int kvm_cpu_exec(CPUState *cpu)
 struct kvm_run *run = cpu->kvm_run;
 int ret, run_ret;
 
-DPRINTF("kvm_cpu_exec()\n");
+trace_kvm_cpu_exec();
 
 if (kvm_arch_process_async_events(cpu)) {
 qatomic_set(>exit_request, 0);
@@ -2848,7 +2837,7 @@ int kvm_cpu_exec(CPUState *cpu)
 
 kvm_arch_pre_run(cpu, run);
 if (qatomic_read(>exit_request)) {
-DPRINTF("interrupt exit requested\n");
+   trace_kvm_interrupt_exit_request();
 /*
  * KVM requires us to reenter the kernel after IO exits to complete
  * instruction emulation. This self-signal will ensure that we
@@ -2878,7 +2867,7 @@ int kvm_cpu_exec(CPUState *cpu)
 
 if (run_ret < 0) {
 if (run_ret == -EINTR || run_ret == -EAGAIN) {
-DPRINTF("io window exit\n");
+trace_kvm_io_window_exit();
 kvm_eat_signals(cpu);
 ret = EXCP_INTERRUPT;
 break;
@@ -2900,7 +2889,6 @@ int kvm_cpu_exec(CPUState *cpu)
 trace_kvm_run_exit(cpu->cpu_index, run->exit_reason);
 switch (run->exit_reason) {
 case KVM_EXIT_IO:
-DPRINTF("handle_io\n");
 /* Called outside BQL */
 kvm_handle_io(run->io.port, attrs,
   (uint8_t *)run + run->io.data_offset,
@@ -2910,7 +2898,6 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 case KVM_EXIT_MMIO:
-DPRINTF("handle_mmio\n");
 /* Called outside BQL */
 address_space_rw(_space_memory,
  run->mmio.phys_addr, attrs,
@@ -2920,11 +2907,9 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 case KVM_EXIT_IRQ_WINDOW_OPEN:
-DPRINTF("irq_window_open\n");
 ret = EXCP_INTERRUPT;
 break;
 case KVM_EXIT_SHUTDOWN:
-DPRINTF("shutdown\n");
 qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
 ret = EXCP_INTERRUPT;
 break;
@@ -2959,6 +2944,7 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 case KVM_EXIT_SYSTEM_EVENT:
+trace_kvm_run_exit_system_event(cpu->cpu_index, 
run->system_event.type);
 switch (run->system_event.type) {
 case KVM_SYSTEM_EVENT_SHUTDOWN:
 qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
@@ -2976,13 +2962,11 @@ int kvm_cpu_exec(CPUState *cpu)
 ret = 0;
 break;
 default:
-DPRINTF("kvm_arch_handle_exit\n");
 ret = kvm_arch_handle_exit(cpu, run);
 break;
 }
 break;
 default:
-DPRINTF("kvm_arch_handle_exit\n");
 ret = kvm_arch_handle_exit(cpu, run);
 break;
 }
diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
index 399aaeb0ec..f61a21019a 100644
--- 

Re: [PATCH 0/3] Virtio dmabuf improvements

2023-11-30 Thread Albert Esteve
On Tue, Nov 7, 2023 at 10:37 AM Albert Esteve  wrote:

> Various improvements for the virtio-dmabuf module.
> This patch includes:
>
> - Check for ownership before allowing a vhost device
>   to remove an object from the table.
> - Properly cleanup shared resources if a vhost device
>   object gets cleaned up.
> - Rename virtio dmabuf functions to `virtio_dmabuf_*`
>
> Albert Esteve (3):
>   hw/virtio: check owner for removing objects
>   hw/virtio: cleanup shared resources
>   hw/virtio: rename virtio dmabuf API
>
>  hw/display/virtio-dmabuf.c| 14 +-
>  hw/virtio/vhost-user.c| 33 ++-
>  hw/virtio/vhost.c |  5 
>  include/hw/virtio/vhost.h |  6 +
>  include/hw/virtio/virtio-dmabuf.h | 33 ---
>  tests/unit/test-virtio-dmabuf.c   | 44 +++
>  6 files changed, 83 insertions(+), 52 deletions(-)
>
> --
> 2.41.0
>
>
Bump :)

@Marc-André Lureau  could you please take a
look? You suggested the API upgrades, so would be great if you could check
if it is what you had in mind.

Thanks!


Re: [PATCH] pcie_sriov: Remove g_new assertion

2023-11-30 Thread Cédric Le Goater

On 11/30/23 10:52, YangHang Liu wrote:

After applying this patch, the VM with a igbvf will not crash during reboot.

Tested-by: Yanghang Liumailto:yangh...@redhat.com>>


Michael, do you have plans to send a PR for -rc3 ?

Thanks,

C.





On Mon, Nov 27, 2023 at 5:12 PM Cédric Le Goater mailto:c...@redhat.com>> wrote:

On 11/23/23 08:56, Akihiko Odaki wrote:
 > g_new() aborts if the allocation fails so it returns NULL only if the
 > requested allocation size is zero. register_vfs() makes such an
 > allocation if NumVFs is zero so it should not assert that g_new()
 > returns a non-NULL value.
 >
 > Fixes: 7c0fa8dff8 ("pcie: Add support for Single Root I/O Virtualization 
(SR/IOV)")
 > Buglink: https://issues.redhat.com/browse/RHEL-17209 

 > Signed-off-by: Akihiko Odaki mailto:akihiko.od...@daynix.com>>


Reviewed-by: Cédric Le Goater mailto:c...@redhat.com>>

Thanks,

C.


 > ---
 >   hw/pci/pcie_sriov.c | 1 -
 >   1 file changed, 1 deletion(-)
 >
 > diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
 > index 5ef8950940..a1fe65f5d8 100644
 > --- a/hw/pci/pcie_sriov.c
 > +++ b/hw/pci/pcie_sriov.c
 > @@ -178,7 +178,6 @@ static void register_vfs(PCIDevice *dev)
 >       num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF);
 >
 >       dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs);
 > -    assert(dev->exp.sriov_pf.vf);
 >
 >       trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn),
 >                                PCI_FUNC(dev->devfn), num_vfs);







[PATCH 2/3] docs/devel: rationalise unstable gitlab tests under FLAKY_TESTS

2023-11-30 Thread Alex Bennée
It doesn't make sense to have two classes of flaky tests. While it may
take the constrained environment of CI to trigger failures easily it
doesn't mean they don't occasionally happen on developer machines. As
CI is the gating factor to passing there is no point developers
running the tests locally anyway unless they are trying to fix things.

While we are at it update the language in the docs to discourage the
QEMU_TEST_FLAKY_TESTS becoming a permanent solution.

Signed-off-by: Alex Bennée 
---
 docs/devel/testing.rst   | 31 +++-
 tests/avocado/boot_linux.py  |  8 +++---
 tests/avocado/boot_linux_console.py  |  5 ++--
 tests/avocado/intel_iommu.py |  5 ++--
 tests/avocado/linux_initrd.py|  5 ++--
 tests/avocado/machine_aspeed.py  |  8 +++---
 tests/avocado/machine_mips_malta.py  |  8 +++---
 tests/avocado/machine_rx_gdbsim.py   |  8 +++---
 tests/avocado/machine_s390_ccw_virtio.py |  2 +-
 tests/avocado/replay_kernel.py   |  5 ++--
 tests/avocado/reverse_debugging.py   | 14 +++
 tests/avocado/smmu.py|  5 ++--
 tests/avocado/tuxrun_baselines.py|  4 +--
 13 files changed, 67 insertions(+), 41 deletions(-)

diff --git a/docs/devel/testing.rst b/docs/devel/testing.rst
index 22218dbedb..579d1837e0 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing.rst
@@ -1371,23 +1371,32 @@ conditions. For example, tests that take longer to 
execute when QEMU is
 compiled with debug flags. Therefore, the ``AVOCADO_TIMEOUT_EXPECTED`` variable
 has been used to determine whether those tests should run or not.
 
-GITLAB_CI
-^
-A number of tests are flagged to not run on the GitLab CI. Usually because
-they proved to the flaky or there are constraints on the CI environment which
-would make them fail. If you encounter a similar situation then use that
-variable as shown on the code snippet below to skip the test:
+QEMU_TEST_FLAKY_TESTS
+^
+Some tests are not working reliably and thus are disabled by default.
+This includes tests that don't run reliably on GitLab's CI which
+usually expose real issues that are rarely seen on developer machines
+due to the constraints of the CI environment. If you encounter a
+similar situation then mark the test as shown on the code snippet
+below:
 
 .. code::
 
-  @skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
+  @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
   def test(self):
   do_something()
 
-QEMU_TEST_FLAKY_TESTS
-^
-Some tests are not working reliably and thus are disabled by default.
-Set this environment variable to enable them.
+Tests should not live in this state forever and should either be fixed
+or eventually removed. If you move a test into this category please
+consider raising a bug to track progress towards a fix.
+
+To run such tests locally you will need to set the environment
+variable. For example:
+
+.. code::
+
+   env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run \
+  tests/avocado/boot_linux.py:BootLinuxPPC64.test_pseries_tcg
 
 Uninstalling Avocado
 
diff --git a/tests/avocado/boot_linux.py b/tests/avocado/boot_linux.py
index be30dcbd58..9e9773e6e1 100644
--- a/tests/avocado/boot_linux.py
+++ b/tests/avocado/boot_linux.py
@@ -12,7 +12,7 @@
 
 from avocado_qemu import LinuxTest, BUILD_DIR
 
-from avocado import skipIf
+from avocado import skipUnless
 
 
 class BootLinuxX8664(LinuxTest):
@@ -93,7 +93,8 @@ class BootLinuxPPC64(LinuxTest):
 
 timeout = 360
 
-@skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
+@skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on 
GitLab')
+
 def test_pseries_tcg(self):
 """
 :avocado: tags=machine:pseries
@@ -111,7 +112,8 @@ class BootLinuxS390X(LinuxTest):
 
 timeout = 240
 
-@skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
+@skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on 
GitLab')
+
 def test_s390_ccw_virtio_tcg(self):
 """
 :avocado: tags=machine:s390-ccw-virtio
diff --git a/tests/avocado/boot_linux_console.py 
b/tests/avocado/boot_linux_console.py
index 6eab515718..231b4f68e5 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -15,7 +15,7 @@
 
 from avocado import skip
 from avocado import skipUnless
-from avocado import skipIf
+from avocado import skipUnless
 from avocado_qemu import QemuSystemTest
 from avocado_qemu import exec_command
 from avocado_qemu import exec_command_and_wait_for_pattern
@@ -1419,7 +1419,8 @@ def test_ppc_mac99(self):
 # This test has a 6-10% failure rate on various hosts that look
 # like issues with a buggy kernel. As a result we don't want it
 # gating releases on Gitlab.
-@skipIf(os.getenv('GITLAB_CI'), 'Running on GitLab')
+@skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test 

[PATCH 0/3] final fixes for 8.2

2023-11-30 Thread Alex Bennée
8.2 is looking fairly stable but I do have one bug fix for gdbstub
which I came across while debugging something else. The changes for
avocado rationalise all flaky tests under the QEMU_TEST_FLAKY_TESTS
environment variable. The final patch re-adds the flaky tests to the
CI as a manually run allow_fail job so we can still attempt to debug
their failure in the place they tend to fall over.

Alex.

Alex Bennée (3):
  gdbstub: use a better signal when we halt for IO reasons
  docs/devel: rationalise unstable gitlab tests under FLAKY_TESTS
  gitlab: add optional job to run flaky avocado tests

 docs/devel/testing.rst   | 32 
 gdbstub/internals.h  |  1 +
 gdbstub/system.c |  2 +-
 .gitlab-ci.d/buildtest.yml   | 30 ++
 tests/avocado/boot_linux.py  | 10 +---
 tests/avocado/boot_linux_console.py  |  6 +++--
 tests/avocado/intel_iommu.py |  6 +++--
 tests/avocado/linux_initrd.py|  7 --
 tests/avocado/machine_aspeed.py  | 10 +---
 tests/avocado/machine_mips_malta.py  | 10 +---
 tests/avocado/machine_rx_gdbsim.py   | 10 +---
 tests/avocado/machine_s390_ccw_virtio.py |  3 ++-
 tests/avocado/replay_kernel.py   |  7 --
 tests/avocado/reverse_debugging.py   | 16 
 tests/avocado/smmu.py|  6 +++--
 tests/avocado/tuxrun_baselines.py|  5 ++--
 16 files changed, 119 insertions(+), 42 deletions(-)

-- 
2.39.2




[PATCH 1/3] gdbstub: use a better signal when we halt for IO reasons

2023-11-30 Thread Alex Bennée
The gdb description GDB_SIGNAL_IO is "I/O possible" and by default gdb
will try and restart the guest, getting us nowhere. Report
GDB_SIGNAL_STOP instead which should at least halt the session at the
failure point.

Signed-off-by: Alex Bennée 
Cc: Luis Machado 
Message-Id: <20231123131905.2640498-1-alex.ben...@linaro.org>
---
 gdbstub/internals.h | 1 +
 gdbstub/system.c| 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 465c24b36e..5c0c725e54 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -24,6 +24,7 @@ enum {
 GDB_SIGNAL_TRAP = 5,
 GDB_SIGNAL_ABRT = 6,
 GDB_SIGNAL_ALRM = 14,
+GDB_SIGNAL_STOP = 17,
 GDB_SIGNAL_IO = 23,
 GDB_SIGNAL_XCPU = 24,
 GDB_SIGNAL_UNKNOWN = 143
diff --git a/gdbstub/system.c b/gdbstub/system.c
index 783ac140b9..83fd452800 100644
--- a/gdbstub/system.c
+++ b/gdbstub/system.c
@@ -183,7 +183,7 @@ static void gdb_vm_state_change(void *opaque, bool running, 
RunState state)
 break;
 case RUN_STATE_IO_ERROR:
 trace_gdbstub_hit_io_error();
-ret = GDB_SIGNAL_IO;
+ret = GDB_SIGNAL_STOP;
 break;
 case RUN_STATE_WATCHDOG:
 trace_gdbstub_hit_watchdog();
-- 
2.39.2




  1   2   3   >