Re: [PATCH trivial] qemu-options.hx: document that tftp=dir is readonly

2024-02-07 Thread Daniel P. Berrangé
On Thu, Feb 08, 2024 at 09:02:28AM +0300, Michael Tokarev wrote:
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1286
> Signed-off-by: Michael Tokarev 
> ---
>  qemu-options.hx | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Daniel P. Berrangé 


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH] vdpa-dev: Fix initialisation order to restore VDUSE compatibility

2024-02-07 Thread Jason Wang
On Wed, Feb 7, 2024 at 4:47 PM Stefano Garzarella  wrote:
>
> On Wed, Feb 07, 2024 at 11:17:34AM +0800, Jason Wang wrote:
> >On Tue, Feb 6, 2024 at 4:31 PM Stefano Garzarella  
> >wrote:
> >>
> >> On Tue, Feb 06, 2024 at 10:47:40AM +0800, Jason Wang wrote:
> >> >On Mon, Feb 5, 2024 at 6:51 PM Stefano Garzarella  
> >> >wrote:
> >> >>
> >> >> On Fri, Feb 02, 2024 at 02:25:21PM +0100, Kevin Wolf wrote:
> >> >> >VDUSE requires that virtqueues are first enabled before the DRIVER_OK
> >> >> >status flag is set; with the current API of the kernel module, it is
> >> >> >impossible to enable the opposite order in our block export code 
> >> >> >because
> >> >> >userspace is not notified when a virtqueue is enabled.
> >> >
> >> >Did this mean virtio-blk will enable a virtqueue after DRIVER_OK?
> >>
> >> It's not specific to virtio-blk, but to the generic vdpa device we have
> >> in QEMU (i.e. vhost-vdpa-device). Yep, after commit
> >> 6c4825476a4351530bcac17abab72295b75ffe98, virtqueues are enabled after
> >> DRIVER_OK.
> >
> >Right.
> >
> >>
> >> >Spec is not clear about this and that's why we introduced
> >> >VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK.
> >>
> >> Ah, I didn't know about this new feature. So after commit
> >> 6c4825476a4351530bcac17abab72295b75ffe98 the vhost-vdpa-device is not
> >> complying with the specification, right?
> >
> >Kind of, but as stated, it's just because the spec is unclear about the
> >behaviour. There's a chance that spec will explicitly support it in
> >the future.
> >
> >>
> >> >
> >> >>
> >> >> Yeah, IMHO the VDUSE protocol is missing a VDUSE_SET_VQ_READY message,
> >> >
> >> >I think you meant when VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is
> >> >negotiated.
> >>
> >> At this point yes. But if VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK is not
> >> negotiated, should we return an error in vhost-vdpa kernel module if
> >> VHOST_VDPA_SET_VRING_ENABLE is called when DRIVER_OK is already set?
> >
> >I'm not sure if this can break some setups or not. It might be better
> >to leave it as is?
>
> As I also answered in the kernel patch, IMHO we have to return an error,
> because any setups are broken anyway (see the problem we're fixing with
> this patch),

For VDUSE probably yes, but not for other parents. It would have been
easy to mandate this on day 0, but it might be hard now.

For mlx5e, it supports the semantics of
VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK even before the feature bit was
introduced.

And we don't do other checks like for example setting vq address, size
after DRIVER_OK.

We can hear from others.

> so at this point it's better to return an error, so they
> don't spend time figuring out why the VDUSE device doesn't work.
>
> >
> >Without VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we don't know if the
> >parent supports vq_ready after driver_ok.
>
> So we have to assume that it doesn't support it, to be more
> conservative, right?
>
> >With VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK, we know the parent supports
> >vq_ready after driver_ok.
> >
> >>
> >> >If this is true, it seems a little more complicated; for example,
> >> >does get_backend_features need to be forwarded to userspace?
> >>
> >> I'm not understanding, don't we already have VHOST_GET_BACKEND_FEATURES
> >> for this? Or do you mean userspace on the VDUSE side?
> >
> >Yes, since in this case the parent is in the userspace, there's no way
> >for VDUSE to know if user space supports vq_ready after driver_ok or
> >not.
> >
> >As you may have noticed, we don't have a message for vq_ready which
> >implies that vq_ready after driver_ok can't be supported.
>
> Yep, I see.
>
> >
> >>
> >> >This seems suboptimal; we may need to implement this in the spec
> >> >first and then we can leverage the feature. Or we can have another
> >> >parameter for the ioctl that creates the vduse device.
> >>
> >> I got a little lost, though in vhost-user, the device can always expect
> >> a vring_enable/disable, so I thought it was not complicated in VDUSE.
> >
> >Yes, the problem is that, assuming we have a message for vq_ready, there
> >could be a "legacy" userspace that doesn't support it.  So in that
> >case, VDUSE needs to know if the userspace parent can support that or
> >not.
>
> I think VDUSE needs to negotiate features (if it doesn't already) as
> vhost-user does with the backend. Also for new future functionality.

It negotiates virtio features but not vhost backend features.

Thanks




Re: [PATCH 2/4] tests/unit/test-char: Fix qemu_socket(), make_udp_socket() check

2024-02-07 Thread Markus Armbruster
Eric Blake  writes:

> On Sat, Feb 03, 2024 at 09:02:26AM +0100, Markus Armbruster wrote:
>> qemu_socket() and make_udp_socket() return a file descriptor on
>> success, -1 on failure.  The check misinterprets 0 as failure.  Fix
>> that.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  tests/unit/test-char.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Reviewed-by: Eric Blake 
>
> Might be worth amending the commit message of 1/4 where you called out
> this bug to mention it will be fixed in the next patch.

Yes.  Thanks!




Re: [PATCH 1/4] chardev/parallel: Don't close stdin on inappropriate device

2024-02-07 Thread Markus Armbruster
Eric Blake  writes:

> On Sat, Feb 03, 2024 at 09:02:25AM +0100, Markus Armbruster wrote:
>> The __linux__ version of qemu_chr_open_pp_fd() tries to claim the
>> parport device with a PPCLAIM ioctl().  On success, it stores the file
>> descriptor in the chardev object, and returns success.  On failure, it
>> closes the file descriptor, and returns failure.
>> 
>> chardev_new() then passes the Chardev to object_unref().  This duly
>> calls char_parallel_finalize(), which closes the file descriptor
>> stored in the chardev object.  Since qemu_chr_open_pp_fd() didn't
>> store it, it's still zero, so this closes standard input.  Ooopsie.
>> 
>> To demonstrate, add a unit test.  With the bug above unfixed, running
>> this test closes standard input.  char_hotswap_test() happens to run
>> next.  It opens a socket, duly gets file descriptor 0, and since it
>> tests for success with > 0 instead of >= 0, it fails.
>
> Two bugs for the price of one!
>
>> 
>> The test needs to be conditional exactly like the chardev it tests.
>> Since the condition is rather complicated, steal the solution from the
>> serial chardev: define HAVE_CHARDEV_PARALLEL in qemu/osdep.h.  This
>> also permits simplifying chardev/meson.build a bit.
>> 
>> The bug fix is easy enough: store the file descriptor, and leave
>> closing it to char_parallel_finalize().
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>
>> +++ b/include/qemu/osdep.h
>> @@ -508,11 +508,18 @@ void qemu_anon_ram_free(void *ptr, size_t size);
>>  
>>  #ifdef _WIN32
>>  #define HAVE_CHARDEV_SERIAL 1
>> -#elif defined(__linux__) || defined(__sun__) || defined(__FreeBSD__)\
>> +#define HAVE_CHARDEV_PARALLEL 1
>> +#else
>> +#if defined(__linux__) || defined(__sun__) || defined(__FreeBSD__)   \
>>  || defined(__NetBSD__) || defined(__OpenBSD__) || 
>> defined(__DragonFly__) \
>>  || defined(__GLIBC__) || defined(__APPLE__)
>>  #define HAVE_CHARDEV_SERIAL 1
>>  #endif
>> +#if defined(__linux__) || defined(__FreeBSD__) \
>> +|| defined(__FreeBSD_kernel__) || defined(__DragonFly__)
>> +#define HAVE_CHARDEV_PARALLEL 1
>> +#endif
>> +#endif
>
> Not for this patch, but I've grown to like a preprocessor style I've
> seen in other projects to make it easier to read nested #if:
>
> #ifdef _WIN32
> # define HAVE_CHARDEV_SERIAL 1
> # define HAVE_CHARDEV_PARALLEL 1
> #else
> # if defined(__linux__) ... defined(__APPLE__)
> #  define HAVE_CHARDEV_SERIAL 1
> # endif
> # if defined(__linux__) ... defined(__DragonFly__)
> #  define HAVE_CHARDEV_PARALLEL 1
> # endif
> #endif

I like this style, too.

>> +++ b/chardev/meson.build
>> @@ -21,11 +21,9 @@ if host_os == 'windows'
>>  else
>>chardev_ss.add(files(
>>'char-fd.c',
>> +'char-parallel.c',
>>'char-pty.c',
>
> Indentation looks off.  Otherwise,

Will fix.

> Reviewed-by: Eric Blake 

Thanks!




Re: [PATCH 09/15] qga/qapi-schema: Plug trivial documentation holes

2024-02-07 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Mon, Feb 05, 2024 at 08:47:03AM +0100, Markus Armbruster wrote:
>> Add missing return member documentation of guest-get-disks,
>> guest-get-devices, guest-get-diskstats, and guest-get-cpustats.
>> 
>> The NVMe SMART information returned by guest-getdisks remains
>> undocumented.  Add a TODO there.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  qga/qapi-schema.json | 24 ++--
>>  1 file changed, 14 insertions(+), 10 deletions(-)
>
> Reviewed-by: Daniel P. Berrangé 
>
>> @@ -978,7 +974,7 @@
>>  #
>>  # Disk type related smart information.
>>  #
>> -# - @nvme: NVMe disk smart
>> +# @type: disk bus type
>>  #
>>  # Since: 7.1
>>  ##
>
> Fun, so not only were the fields that exist not documented,
> but we've documented fields which don't exist.

I think the @nvme line tried to document the branch.  Not useful; the
doc generator takes care of that:

GuestDiskSmart (Object)
-----------------------

Disk type related smart information.

* nvme: NVMe disk smart

Members
~~~~~~~

"type": "GuestDiskBusType"
Not documented

--> The members of "GuestNVMeSmart" when "type" is "nvme"

>> @@ -1780,7 +1784,7 @@
>>  #
>>  # Get statistics of each CPU in millisecond.
>>  #
>> -# - @linux: Linux style CPU statistics
>> +# @type: guest operating system
>>  #
>>  # Since: 7.1
>>  ##
>
> And more things which don't exist!

Likewise.




[PATCH trivial] qemu-options.hx: document that tftp=dir is readonly

2024-02-07 Thread Michael Tokarev
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1286
Signed-off-by: Michael Tokarev 
---
 qemu-options.hx | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/qemu-options.hx b/qemu-options.hx
index 5adbed1101..05f70231c9 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3100,6 +3100,8 @@ SRST
 server. The files in dir will be exposed as the root of a TFTP
 server. The TFTP client on the guest must be configured in
 binary mode (use the command ``bin`` of the Unix TFTP client).
+The built-in TFTP server is read-only; qemu will not write to
+this directory.
 
 ``tftp-server-name=name``
 In BOOTP reply, broadcast name as the "TFTP server name"
-- 
2.39.2




Re: [RFC PATCH 14/14] migration: Fix return-path thread exit

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 02:33:47PM +0100, Cédric Le Goater wrote:
> In case of error, close_return_path_on_source() can perform a shutdown
> to exit the return-path thread.  However, in migrate_fd_cleanup(),
> 'to_dst_file' is closed before calling close_return_path_on_source()
> and the shutdown fails, leaving the source and destination waiting for
> an event to occur.
> 
> Close the file after calling close_return_path_on_source() so that the
> shutdown succeeds and the return-path thread exits.
> 
> Signed-off-by: Cédric Le Goater 
> ---
> 
>  This is an RFC because the correct fix implies reworking the QEMUFile
>  construct, built on top of the QEMU I/O channel.
> 
>  migration/migration.c | 13 ++---
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 
> 5f55af3d7624750ca416c4177781241b3e291e5d..de329f2c553288935d824748286e79e535929b8b
>  100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1313,6 +1313,8 @@ void migrate_set_state(int *state, int old_state, int 
> new_state)
>  
>  static void migrate_fd_cleanup(MigrationState *s)
>  {
> +QEMUFile *tmp = NULL;
> +
>  g_free(s->hostname);
>  s->hostname = NULL;
>  json_writer_free(s->vmdesc);
> @@ -1321,8 +1323,6 @@ static void migrate_fd_cleanup(MigrationState *s)
>  qemu_savevm_state_cleanup();
>  
>  if (s->to_dst_file) {
> -QEMUFile *tmp;
> -
>  trace_migrate_fd_cleanup();
>  bql_unlock();
>  if (s->migration_thread_running) {
> @@ -1341,15 +1341,14 @@ static void migrate_fd_cleanup(MigrationState *s)
>   * critical section won't block for long.
>   */
>  migration_ioc_unregister_yank_from_file(tmp);
> -qemu_fclose(tmp);
>  }
>  
> -/*
> - * We already cleaned up to_dst_file, so errors from the return
> - * path might be due to that, ignore them.
> - */
>  close_return_path_on_source(s);
>  
> +if (tmp) {
> +qemu_fclose(tmp);
> +}
> +
>  assert(!migration_is_active(s));
>  
>  if (s->state == MIGRATION_STATUS_CANCELLING) {

I think this is okay as a short-term plan.  I'll see how others
think; also adding Dan into the loop.

If so, would you please add a rich comment explaining why tmp needs to be
closed later?  Especially, an explicit comment on the ordering requirement
would be helpful: IMHO the rule here is that qemu_fclose() must happen
after close_return_path_on_source().  That way, when others work on this
code they won't easily break it without noticing.

Also, please feel free to post the migration patches separately if you'd
like us to merge them when reposted.

Thanks,

-- 
Peter Xu




Re: [PATCH 13/14] migration: Use migrate_has_error() in close_return_path_on_source()

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 02:33:46PM +0100, Cédric Le Goater wrote:
> close_return_path_on_source() retrieves the migration error from the
> QEMUFile '->to_dst_file' to know if a shutdown is required. This
> shutdown is required to exit the return-path thread. However, in
> migrate_fd_cleanup(), '->to_dst_file' is cleaned up before calling
> close_return_path_on_source() and the shutdown is never performed,
> leaving the source and destination waiting for an event to occur.
> 
> Avoid relying on '->to_dst_file' and use migrate_has_error() instead.
> 
> Suggested-by: Peter Xu 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH 12/14] migration: Report error when shutdown fails

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 02:33:45PM +0100, Cédric Le Goater wrote:
> This will help detect issues regarding I/O channels usage.
> 
> Signed-off-by: Cédric Le Goater 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH 03/14] memory: Add Error** argument to .log_global*() handlers

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 02:33:36PM +0100, Cédric Le Goater wrote:
> @@ -2936,14 +2940,14 @@ void memory_global_dirty_log_start(unsigned int flags)
>  trace_global_dirty_changed(global_dirty_tracking);
>  
>  if (!old_flags) {
> -MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
> +MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward, errp);
>  memory_region_transaction_begin();
>  memory_region_update_pending = true;
>  memory_region_transaction_commit();
>  }
>  }
>  
> -static void memory_global_dirty_log_do_stop(unsigned int flags)
> +static void memory_global_dirty_log_do_stop(unsigned int flags, Error **errp)
>  {
>  assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
>  assert((global_dirty_tracking & flags) == flags);
> @@ -2955,7 +2959,7 @@ static void memory_global_dirty_log_do_stop(unsigned 
> int flags)
>  memory_region_transaction_begin();
>  memory_region_update_pending = true;
>  memory_region_transaction_commit();
> -MEMORY_LISTENER_CALL_GLOBAL(log_global_stop, Reverse);
> +MEMORY_LISTENER_CALL_GLOBAL(log_global_stop, Reverse, errp);
>  }
>  }

I'm a little bit surprised to see that MEMORY_LISTENER_CALL_GLOBAL()
already allows >2 args, with the ability to conditionally pass over errp
with such a one-liner change; even if all callers were only using 2 args
before this patch..

Acked-by: Peter Xu 

-- 
Peter Xu




Re: [PULL 1/1] virtio-blk: avoid using ioeventfd state in irqfd conditional

2024-02-07 Thread Michael Tokarev

06.02.2024 18:31, Stefan Hajnoczi :

Requests that complete in an IOThread use irqfd to notify the guest
while requests that complete in the main loop thread use the traditional
qdev irq code path. The reason for this conditional is that the irq code
path requires the BQL:

   if (s->ioeventfd_started && !s->ioeventfd_disabled) {
   virtio_notify_irqfd(vdev, req->vq);
   } else {
   virtio_notify(vdev, req->vq);
   }

There is a corner case where the conditional invokes the irq code path
instead of the irqfd code path:

   static void virtio_blk_stop_ioeventfd(VirtIODevice *vdev)
   {
   ...
   /*
* Set ->ioeventfd_started to false before draining so that host 
notifiers
* are not detached/attached anymore.
*/
   s->ioeventfd_started = false;

   /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
   blk_drain(s->conf.conf.blk);

During blk_drain() the conditional produces the wrong result because
ioeventfd_started is false.

Use qemu_in_iothread() instead of checking the ioeventfd state.

Buglink: https://issues.redhat.com/browse/RHEL-15394
Signed-off-by: Stefan Hajnoczi 
Message-id: 20240122172625.415386-1-stefa...@redhat.com
Signed-off-by: Stefan Hajnoczi 


Cc qemu-stable?  This smells like stable material; please let me know
if it is not.

(And yes I've seen it also included in another pullreq)

Thanks,

/mjt



Re: [RFC/INCOMPLETE PATCH 0/8] Attempt to make qemu-img options consistent and --help working

2024-02-07 Thread Michael Tokarev

07.02.2024 22:01, Manos Pitsidianakis:

Hello Michael,

Such changes are long overdue. However, given the complexity of the
commands and arguments, maybe it'd be a good idea to write a code
generator for the command-line interface. This way you could also
generate --help output, manpages, and shell completions, all from the
same command-line spec we'd use to generate the argv parsing routines.


I tried starting with just a set of common options in --help output,
but quickly abandoned that idea.  Even there, as I mentioned in the
email you're replying to, we need slightly different text in different
contexts.  So it seems better to just write it in two places.  Two
*different* places: the --help output and the qemu-img.rst
documentation (from which the manpage is generated).  The two are
really different, because --help is just a very brief usage/options
summary, while the doc is a much more complete descriptive guide.

There's one more context: the option parsing code.  There are ways to
make that easier, libpopt for example, but in my view such attempts to
make life easier end up making it even more complex.  In my taste
anyway, YMMV.

In short: while this stuff seems like a good candidate for some
automation, it might not be, and the first step IMHO is to get the
first task done, which is to review and refresh all options, see what
can be done with the messy differences in meaning between subcommands,
and describe them.  Maybe after that the whole thing can be automated
(if that's possible at all given the differently-named hell, and with
readable output).

/mjt



Re: [PATCH 02/14] migration: Add Error** argument to .load_setup() handler

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 02:33:35PM +0100, Cédric Le Goater wrote:
> diff --git a/migration/ram.c b/migration/ram.c
> index 
> 136c237f4079f68d4e578cf1c72eec2efc815bc8..8dac9bac2fe8b8c19e102c771a7ef6e976252906
>  100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -3498,7 +3498,7 @@ void colo_release_ram_cache(void)
>   * @f: QEMUFile where to receive the data
>   * @opaque: RAMState pointer

Another doc comment may need a touch-up..

>   */
> -static int ram_load_setup(QEMUFile *f, void *opaque)
> +static int ram_load_setup(QEMUFile *f, void *opaque, Error **errp)
>  {
>  xbzrle_load_setup();
>  ramblock_recv_map_init();
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 
> f2ae799bad13e631bccf733a34c3a8fd22e8dd48..990f4249a26d28117ee365d8b20fc5bbca0d43d6
>  100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -2737,7 +2737,7 @@ static void 
> qemu_loadvm_state_switchover_ack_needed(MigrationIncomingState *mis)
>  
> trace_loadvm_state_switchover_ack_needed(mis->switchover_ack_pending_num);
>  }
>  
> -static int qemu_loadvm_state_setup(QEMUFile *f)
> +static int qemu_loadvm_state_setup(QEMUFile *f, Error **errp)
>  {
>  SaveStateEntry *se;
>  int ret;
> @@ -2753,10 +2753,11 @@ static int qemu_loadvm_state_setup(QEMUFile *f)
>  }
>  }
>  
> -ret = se->ops->load_setup(f, se->opaque);
> +ret = se->ops->load_setup(f, se->opaque, errp);
>  if (ret < 0) {
> +error_prepend(errp, "Load state of device %s failed: ",
> +  se->idstr);
>  qemu_file_set_error(f, ret);

Do we also want to switch to _set_error_obj()?  Or even use
migrate_set_error() (the latter may apply to previous patch too if it
works)?

> -error_report("Load state of device %s failed", se->idstr);
>  return ret;
>  }
>  }
> @@ -2937,7 +2938,8 @@ int qemu_loadvm_state(QEMUFile *f)
>  return ret;
>  }
>  
> -if (qemu_loadvm_state_setup(f) != 0) {
> +if (qemu_loadvm_state_setup(f, _err) != 0) {
> +error_report_err(local_err);
>  return -EINVAL;
>  }
>  
> -- 
> 2.43.0
> 
> 

-- 
Peter Xu




Re: [PATCH 01/14] migration: Add Error** argument to .save_setup() handler

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 02:33:34PM +0100, Cédric Le Goater wrote:
> diff --git a/migration/ram.c b/migration/ram.c
> index 
> d5b7cd5ac2f31aabf4a248b966153401c48912cf..136c237f4079f68d4e578cf1c72eec2efc815bc8
>  100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -2931,7 +2931,7 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
>   * @f: QEMUFile where to send the data
>   * @opaque: RAMState pointer

The doc comment may need a touch-up.

>   */
> -static int ram_save_setup(QEMUFile *f, void *opaque)
> +static int ram_save_setup(QEMUFile *f, void *opaque, Error **errp)
>  {
>  RAMState **rsp = opaque;
>  RAMBlock *block;

Besides:

Reviewed-by: Peter Xu 

-- 
Peter Xu




[PATCH 1/2] migration/multifd: Cleanup TLS iochannel referencing

2024-02-07 Thread peterx
From: Peter Xu 

Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to
blocking handshake") introduced a thread for TLS channels, which will
resolve the issue on blocking the main thread.  However in the same commit
p->c is slightly abused just to be able to pass over the pointer "p" into
the thread.

That's the major reason we'll need to conditionally free the io channel in
the fault paths.

To clean it up, use a separate structure to pass over both "p" and "tioc"
in the tls handshake thread.  Then we can make it a rule that p->c will
never be set until the channel is completely setup.  With that, we can drop
the tricky conditional unref of the io channel in the error path.

Signed-off-by: Peter Xu 
---
 migration/multifd.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index adfe8c9a0a..4a85a6b7b3 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -873,16 +873,22 @@ out:
 
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque);
 
+typedef struct {
+MultiFDSendParams *p;
+QIOChannelTLS *tioc;
+} MultiFDTLSThreadArgs;
+
 static void *multifd_tls_handshake_thread(void *opaque)
 {
-MultiFDSendParams *p = opaque;
-QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c);
+MultiFDTLSThreadArgs *args = opaque;
 
-qio_channel_tls_handshake(tioc,
+qio_channel_tls_handshake(args->tioc,
   multifd_new_send_channel_async,
-  p,
+  args->p,
   NULL,
   NULL);
+g_free(args);
+
 return NULL;
 }
 
@@ -892,6 +898,7 @@ static bool multifd_tls_channel_connect(MultiFDSendParams 
*p,
 {
 MigrationState *s = migrate_get_current();
 const char *hostname = s->hostname;
+MultiFDTLSThreadArgs *args;
 QIOChannelTLS *tioc;
 
 tioc = migration_tls_client_create(ioc, hostname, errp);
@@ -906,11 +913,14 @@ static bool multifd_tls_channel_connect(MultiFDSendParams 
*p,
 object_unref(OBJECT(ioc));
 trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname);
 qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing");
-p->c = QIO_CHANNEL(tioc);
+
+args = g_new0(MultiFDTLSThreadArgs, 1);
+args->tioc = tioc;
+args->p = p;
 
 p->tls_thread_created = true;
 qemu_thread_create(>tls_thread, "multifd-tls-handshake-worker",
-   multifd_tls_handshake_thread, p,
+   multifd_tls_handshake_thread, args,
QEMU_THREAD_JOINABLE);
 return true;
 }
@@ -923,6 +933,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 
 migration_ioc_register_yank(ioc);
 p->registered_yank = true;
+/* Setup p->c only if the channel is completely setup */
 p->c = ioc;
 
 p->thread_created = true;
@@ -976,14 +987,12 @@ out:
 
 trace_multifd_new_send_channel_async_error(p->id, local_err);
 multifd_send_set_error(local_err);
-if (!p->c) {
-/*
- * If no channel has been created, drop the initial
- * reference. Otherwise cleanup happens at
- * multifd_send_channel_destroy()
- */
-object_unref(OBJECT(ioc));
-}
+/*
+ * For error cases (TLS or non-TLS), IO channel is always freed here
+ * rather than when cleanup multifd: since p->c is not set, multifd
+ * cleanup code doesn't even know its existence.
+ */
+object_unref(OBJECT(ioc));
 error_free(local_err);
 }
 
-- 
2.43.0




[PATCH 0/2] migration: cleanup TLS channel referencing

2024-02-07 Thread peterx
From: Peter Xu 

Based-on: <20240208030528.368214-1-pet...@redhat.com>

The patchset is based on the latest migration pull request.  This is a
small cleanup patchset to firstly cleanup tls iochannel deref on error
paths, then further remove one unused var on yank if the cleanup applies.

These are exactly the same patches I attached here in this email reply:

https://lore.kernel.org/r/ZcLrF5HN920rUTSw@x1n

It's just a formal post, collecting one R-b from Fabiano in patch 2.

Please feel free to have a look, thanks.

Peter Xu (2):
  migration/multifd: Cleanup TLS iochannel referencing
  migration/multifd: Drop registered_yank

 migration/multifd.h |  2 --
 migration/multifd.c | 44 ++--
 2 files changed, 26 insertions(+), 20 deletions(-)

-- 
2.43.0




[PATCH 2/2] migration/multifd: Drop registered_yank

2024-02-07 Thread peterx
From: Peter Xu 

With a clear definition of the p->c protocol, where we only set it up once
the channel is fully established (TLS or non-TLS), the registered_yank
boolean has the same meaning as "p->c != NULL".

Drop registered_yank by checking p->c instead.

Reviewed-by: Fabiano Rosas 
Signed-off-by: Peter Xu 
---
 migration/multifd.h | 2 --
 migration/multifd.c | 7 +++
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 8a1cad0996..b3fe27ae93 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -78,8 +78,6 @@ typedef struct {
 bool tls_thread_created;
 /* communication channel */
 QIOChannel *c;
-/* is the yank function registered */
-bool registered_yank;
 /* packet allocated len */
 uint32_t packet_len;
 /* guest page size */
diff --git a/migration/multifd.c b/migration/multifd.c
index 4a85a6b7b3..278453cf84 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -648,11 +648,11 @@ static int multifd_send_channel_destroy(QIOChannel *send)
 
 static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
 {
-if (p->registered_yank) {
+if (p->c) {
 migration_ioc_unregister_yank(p->c);
+multifd_send_channel_destroy(p->c);
+p->c = NULL;
 }
-multifd_send_channel_destroy(p->c);
-p->c = NULL;
 qemu_sem_destroy(>sem);
 qemu_sem_destroy(>sem_sync);
 g_free(p->name);
@@ -932,7 +932,6 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 qio_channel_set_delay(ioc, false);
 
 migration_ioc_register_yank(ioc);
-p->registered_yank = true;
 /* Setup p->c only if the channel is completely setup */
 p->c = ioc;
 
-- 
2.43.0




Re: [PATCH v3 0/6] migration/multifd: Fix channel creation vs. cleanup races

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 10:31:51AM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > On Tue, Feb 06, 2024 at 06:51:12PM -0300, Fabiano Rosas wrote:
> >> Based-on: 20240202102857.110210-1-pet...@redhat.com
> >> [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups
> >> https://lore.kernel.org/r/20240202102857.110210-1-pet...@redhat.com
> >> 
> >> Hi,
> >> 
> >> For v3 I fixed the refcounting issue spotted by Avihai. The situation
> >> there is a bit clunky due to historical reasons. The gist is that we
> >> have an assumption that channel creation never fails after p->c has
> >> been set, so when 'p->c == NULL' we have to unref and when 'p->c !=
> >> NULL' the cleanup code will do the unref.
> >
> > Yes, this looks good to me.  That's a good catch.
> >
> > I'll leave at least one more day for Avihai and/or Dan to have another
> > look.  My r-b persist as of now on patch 5.
> >
> > Actually I think the conditional unref is slightly tricky, but it's not its
> > own fault, IMHO, OTOH it's more about a1af605bd5ad where p->c is slightly
> > abused.  My understanding is we can avoid that conditional unref with below
> > patch 1 as a cleanup (on top of this series).  Then patch 2 comes all
> > alongside.
> 
> Yes, I even wrote some code to always set p->c and leave the unref to
> the cleanup routine. Doing reference counting in the middle of the code
> like that leaves us exposed to new code relying on p->c being valid
> during cleanup. There's already yank and the cleanup itself which expect
> p->c to be valid.
> 
> However, I couldn't get past the ugliness of overwriting p->c, so I kept
> the changes minimal for this version.

Yep. The good part of "only set p->c if the channel is fully established"
is that the migration thread then fully owns the ioc as long as it is set,
and no overwriting is possible.

> 
> I'm also wondering whether we can remove the TLS handshake thread
> altogether now that we moved the multifd_send_setup call into the
> migration thread. My (poor) understanding is that a1af605bd5ad hit the
> issue that the QIOTask completion would just execute after
> multifd_save_setup returned. Otherwise I don't see how adding a thread
> to an already async task would have helped. But I need to think about
> that a bit more.

It could be even simpler than that, IIUC.  Note that in the stack shown in
that commit message, it hasn't even reached the async handling:

Src: (multifd_send_0)
multifd_channel_connect
  multifd_tls_channel_connect
    multifd_tls_channel_connect
      qio_channel_tls_handshake
        qio_channel_tls_handshake_task  <-- async handling provided here..
          qcrypto_tls_session_handshake
            gnutls_handshake            <-- but we're still at the sync phase..
              ...
                recvmsg (Blocking I/O waiting for response)

IMHO it'll be much easier if we can remove those threads.  Please do keep
exploring that; just a heads-up that the async paths seem to have a close
dependency so far on gmainloop contexts, while the migration thread may not
provide that anymore (and I hope we don't introduce anything either; I
think it's better we stick with a fully threaded model in migration rather
than an event-based one).

> 
> >
> > We don't need to rush on these, though, we should fix the thread race
> >first because multiple of us hit it, and all cleanups can be done
> >later.
> 
> I said we should not merge these two changes right now, but I take that
> back. I'll leave it up to you, there doesn't seem to be that much impact
> in adding them.

Thanks.  I just sent the pull without them, as I don't yet plan to
queue them so soon; I'd be accused of abusing the maintainership. :-)

If you think they're worthwhile, I can still repost them as formal patches
later and put them on the list.  If your exploration of a bigger hammer
works out, then I think we can ignore these two patches.  But if you or
anyone thinks they are good cleanups, we can also merge them first for 9.0
and leave whatever else for later.

> 
> >
> > =
> > From 0830819d86e08c5175d6669505aa712a0a09717f Mon Sep 17 00:00:00 2001
> > From: Peter Xu 
> > Date: Wed, 7 Feb 2024 10:08:35 +0800
> > Subject: [PATCH 1/2] migration/multifd: Cleanup TLS iochannel referencing
> >
> > Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to
> > blocking handshake") introduced a thread for TLS channels, which will
> > resolve the issue on blocking the main thread.  However in the same commit
> > p->c is slightly abused just to be able to pass over the pointer "p" into
> > the thread.
> >
> > That's the major reason we'll need to conditionally free the io channel in
> > the fault paths.
> >
> > To clean it up, using a separate structure to pass over both "p" and "tioc"
> > in the tls handshake thread.  Then we can make it a rule that p->c will
> > never be set until the channel is completely setup.  With that, we can drop
> > the tricky 

[PULL 18/34] migration/multifd: Change retval of multifd_send_pages()

2024-02-07 Thread peterx
From: Peter Xu 

Using int is overkill when there are only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-18-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 12e587fda8..35d4e8ad1f 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -449,9 +449,10 @@ static void multifd_send_kick_main(MultiFDSendParams *p)
  * thread is using the channel mutex when changing it, and the channel
  * have to had finish with its own, otherwise pending_job can't be
  * false.
+ *
+ * Returns true if succeed, false otherwise.
  */
-
-static int multifd_send_pages(void)
+static bool multifd_send_pages(void)
 {
 int i;
 static int next_channel;
@@ -459,7 +460,7 @@ static int multifd_send_pages(void)
 MultiFDPages_t *pages = multifd_send_state->pages;
 
 if (multifd_send_should_exit()) {
-return -1;
+return false;
 }
 
 /* We wait here, until at least one channel is ready */
@@ -473,7 +474,7 @@ static int multifd_send_pages(void)
 next_channel %= migrate_multifd_channels();
 for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
 if (multifd_send_should_exit()) {
-return -1;
+return false;
 }
p = &multifd_send_state->params[i];
 /*
@@ -502,7 +503,7 @@ static int multifd_send_pages(void)
qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
 
-return 1;
+return true;
 }
 
 /* Returns true if enqueue successful, false otherwise */
@@ -526,7 +527,7 @@ bool multifd_queue_page(RAMBlock *block, ram_addr_t offset)
 changed = true;
 }
 
-if (multifd_send_pages() < 0) {
+if (!multifd_send_pages()) {
 return false;
 }
 
@@ -666,7 +667,7 @@ int multifd_send_sync_main(void)
 return 0;
 }
 if (multifd_send_state->pages->num) {
-if (multifd_send_pages() < 0) {
+if (!multifd_send_pages()) {
 error_report("%s: multifd_send_pages fail", __func__);
 return -1;
 }
-- 
2.43.0




[PULL 23/34] migration/multifd: Fix MultiFDSendParams.packet_num race

2024-02-07 Thread peterx
From: Peter Xu 

As reported correctly by Fabiano [1] (which, per Fabiano, traces back to
Elena's initial report in Oct 2023), MultiFDSendParams.packet_num is buggy
in how it is assigned and stored.  Consider two consecutive operations: (1)
queue a job into multifd send thread X, then (2) queue another sync request
to the same send thread X.  The MultiFDSendParams.packet_num will then be
assigned twice, and the first assignment can already get lost.

To avoid that, we move the packet_num assignment from p->packet_num into
where the thread will fill in the packet.  Use atomic operations to protect
the field, making sure there's no race.

Note that atomic fetch_add() may not be good for scaling purposes, however
multifd should be fine as number of threads should normally not go beyond
16 threads.  Let's leave that concern for later but fix the issue first.

There's also a trick on how to make it always work even on 32 bit hosts for
a uint64_t packet number.  Switch to uintptr_t for now to simplify the
case.  It will cause the packet number to overflow more easily on 32 bit,
but that shouldn't be a major concern for now, as 32 bit systems are not
the main audience for any performance concerns like what multifd wants to
address.

We also need to move multifd_send_state definition upper, so that
multifd_send_fill_packet() can reference it.

[1] https://lore.kernel.org/r/87o7d1jlu5@suse.de
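The counter scheme above can be sketched with plain C11 atomics (a hedged
illustration, not QEMU's qatomic API; `packet_num` and `next_packet_num()`
are names invented here):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Global packet counter.  uintptr_t is used so that plain atomic ops
 * work naturally on both 32-bit and 64-bit hosts, matching the
 * reasoning in the commit message above. */
static _Atomic uintptr_t packet_num;

/* Called by a sender thread when filling a packet header.  The
 * fetch-and-increment returns the previous value, so concurrent
 * callers are each handed a unique packet number. */
static uintptr_t next_packet_num(void)
{
    return atomic_fetch_add_explicit(&packet_num, 1, memory_order_relaxed);
}
```

Since fetch-and-add returns the previous value atomically, two threads
racing on this function can never be handed the same number, which is
exactly the property the old per-channel p->packet_num assignment lacked.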

Reported-by: Elena Ufimtseva 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-23-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h |  2 --
 migration/multifd.c | 56 +++--
 2 files changed, 34 insertions(+), 24 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 9b40a53cb6..98876ff94a 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -97,8 +97,6 @@ typedef struct {
 bool running;
 /* multifd flags for each packet */
 uint32_t flags;
-/* global number of generated multifd packets */
-uint64_t packet_num;
 /*
  * The sender thread has work to do if either of below boolean is set.
  *
diff --git a/migration/multifd.c b/migration/multifd.c
index 130f86a1fb..b317d57d61 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -45,6 +45,35 @@ typedef struct {
 uint64_t unused2[4];/* Reserved for future use */
 } __attribute__((packed)) MultiFDInit_t;
 
+struct {
+MultiFDSendParams *params;
+/* array of pages to sent */
+MultiFDPages_t *pages;
+/*
+ * Global number of generated multifd packets.
+ *
+ * Note that we used 'uintptr_t' because it'll naturally support atomic
+ * operations on both 32bit / 64 bits hosts.  It means on 32bit systems
+ * multifd will overflow the packet_num easier, but that should be
+ * fine.
+ *
+ * Another option is to use QEMU's Stat64 then it'll be 64 bits on all
+ * hosts, however so far it does not support atomic fetch_add() yet.
+ * Make it easy for now.
+ */
+uintptr_t packet_num;
+/* send channels ready */
+QemuSemaphore channels_ready;
+/*
+ * Have we already run terminate threads.  There is a race when it
+ * happens that we got one error while we are exiting.
+ * We will use atomic operations.  Only valid values are 0 and 1.
+ */
+int exiting;
+/* multifd ops */
+MultiFDMethods *ops;
+} *multifd_send_state;
+
 /* Multifd without compression */
 
 /**
@@ -292,13 +321,16 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 {
 MultiFDPacket_t *packet = p->packet;
 MultiFDPages_t *pages = p->pages;
+uint64_t packet_num;
 int i;
 
 packet->flags = cpu_to_be32(p->flags);
 packet->pages_alloc = cpu_to_be32(p->pages->allocated);
 packet->normal_pages = cpu_to_be32(pages->num);
 packet->next_packet_size = cpu_to_be32(p->next_packet_size);
-packet->packet_num = cpu_to_be64(p->packet_num);
+
+packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);
+packet->packet_num = cpu_to_be64(packet_num);
 
 if (pages->block) {
 strncpy(packet->ramblock, pages->block->idstr, 256);
@@ -314,7 +346,7 @@ void multifd_send_fill_packet(MultiFDSendParams *p)
 p->packets_sent++;
 p->total_normal_pages += pages->num;
 
-trace_multifd_send(p->id, p->packet_num, pages->num, p->flags,
+trace_multifd_send(p->id, packet_num, pages->num, p->flags,
p->next_packet_size);
 }
 
@@ -398,24 +430,6 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams 
*p, Error **errp)
 return 0;
 }
 
-struct {
-MultiFDSendParams *params;
-/* array of pages to sent */
-MultiFDPages_t *pages;
-/* global number of generated multifd packets */
-uint64_t packet_num;
-/* send channels ready */
-QemuSemaphore channels_ready;
-/*
- * Have we already run terminate threads.  There is a race when it
- * happens that we got one error while we are exiting.
- 

[PULL 06/34] migration/multifd: Drop MultiFDSendParams.normal[] array

2024-02-07 Thread peterx
From: Peter Xu 

This array is redundant when p->pages exists.  Now we extended the life of
p->pages to the whole period where pending_job is set, it should be safe to
always use p->pages->offset[] rather than p->normal[].  Drop the array.

Alongside, the normal_num is also redundant, being the same as
p->pages->num.

This doesn't apply to recv side, because there's no extra buffering on recv
side, so p->normal[] array is still needed.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-6-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h  |  4 
 migration/multifd-zlib.c |  7 ---
 migration/multifd-zstd.c |  7 ---
 migration/multifd.c  | 33 +
 4 files changed, 21 insertions(+), 30 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 7c040cb85a..3920bdbcf1 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -122,10 +122,6 @@ typedef struct {
 struct iovec *iov;
 /* number of iovs used */
 uint32_t iovs_num;
-/* Pages that are not zero */
-ram_addr_t *normal;
-/* num of non zero pages */
-uint32_t normal_num;
 /* used for compression methods */
 void *data;
 }  MultiFDSendParams;
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 37ce48621e..100809abc1 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -116,17 +116,18 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  */
 static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
+MultiFDPages_t *pages = p->pages;
 struct zlib_data *z = p->data;
z_stream *zs = &z->zs;
 uint32_t out_size = 0;
 int ret;
 uint32_t i;
 
-for (i = 0; i < p->normal_num; i++) {
+for (i = 0; i < pages->num; i++) {
 uint32_t available = z->zbuff_len - out_size;
 int flush = Z_NO_FLUSH;
 
-if (i == p->normal_num - 1) {
+if (i == pages->num - 1) {
 flush = Z_SYNC_FLUSH;
 }
 
@@ -135,7 +136,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
  * with compression. zlib does not guarantee that this is safe,
  * therefore copy the page before calling deflate().
  */
-memcpy(z->buf, p->pages->block->host + p->normal[i], p->page_size);
+memcpy(z->buf, p->pages->block->host + pages->offset[i], p->page_size);
 zs->avail_in = p->page_size;
 zs->next_in = z->buf;
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index b471daadcd..2023edd8cc 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -113,6 +113,7 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  */
 static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
+MultiFDPages_t *pages = p->pages;
 struct zstd_data *z = p->data;
 int ret;
 uint32_t i;
@@ -121,13 +122,13 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
**errp)
 z->out.size = z->zbuff_len;
 z->out.pos = 0;
 
-for (i = 0; i < p->normal_num; i++) {
+for (i = 0; i < pages->num; i++) {
 ZSTD_EndDirective flush = ZSTD_e_continue;
 
-if (i == p->normal_num - 1) {
+if (i == pages->num - 1) {
 flush = ZSTD_e_flush;
 }
-z->in.src = p->pages->block->host + p->normal[i];
+z->in.src = p->pages->block->host + pages->offset[i];
 z->in.size = p->page_size;
 z->in.pos = 0;
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 5633ac245a..8bb1fd95cf 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -90,13 +90,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error 
**errp)
 {
 MultiFDPages_t *pages = p->pages;
 
-for (int i = 0; i < p->normal_num; i++) {
-p->iov[p->iovs_num].iov_base = pages->block->host + p->normal[i];
+for (int i = 0; i < pages->num; i++) {
+p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
 p->iov[p->iovs_num].iov_len = p->page_size;
 p->iovs_num++;
 }
 
-p->next_packet_size = p->normal_num * p->page_size;
+p->next_packet_size = pages->num * p->page_size;
 p->flags |= MULTIFD_FLAG_NOCOMP;
 return 0;
 }
@@ -269,21 +269,22 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
 static void multifd_send_fill_packet(MultiFDSendParams *p)
 {
 MultiFDPacket_t *packet = p->packet;
+MultiFDPages_t *pages = p->pages;
 int i;
 
 packet->flags = cpu_to_be32(p->flags);
 packet->pages_alloc = cpu_to_be32(p->pages->allocated);
-packet->normal_pages = cpu_to_be32(p->normal_num);
+packet->normal_pages = cpu_to_be32(pages->num);
 packet->next_packet_size = cpu_to_be32(p->next_packet_size);
 packet->packet_num = cpu_to_be64(p->packet_num);
 
-if (p->pages->block) {
-strncpy(packet->ramblock, p->pages->block->idstr, 256);
+if (pages->block) {
+

[PULL 08/34] migration/multifd: Simplify locking in sender thread

2024-02-07 Thread peterx
From: Peter Xu 

The sender thread will yield the p->mutex before IO starts, trying to not
block the requester thread.  This may be unnecessary lock optimizations,
because the requester can already read pending_job safely even without the
lock, because the requester is currently the only one who can assign a
task.

Drop that lock complication on both sides:

  (1) in the sender thread, always take the mutex until job done
  (2) in the requester thread, check pending_job clear lockless

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-8-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index ea25bbe6bd..4d5a01ed93 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -429,7 +429,9 @@ static int multifd_send_pages(void)
 return -1;
 }
 
+/* We wait here, until at least one channel is ready */
qemu_sem_wait(&multifd_send_state->channels_ready);
+
 /*
  * next_channel can remain from a previous migration that was
  * using more channels, so ensure it doesn't overflow if the
@@ -441,17 +443,26 @@ static int multifd_send_pages(void)
 return -1;
 }
p = &multifd_send_state->params[i];
-qemu_mutex_lock(&p->mutex);
+/*
+ * Lockless read to p->pending_job is safe, because only multifd
+ * sender thread can clear it.
+ */
if (qatomic_read(&p->pending_job) == false) {
-qatomic_set(&p->pending_job, true);
 next_channel = (i + 1) % migrate_multifd_channels();
 break;
 }
-qemu_mutex_unlock(&p->mutex);
 }
+
+qemu_mutex_lock(&p->mutex);
 assert(!p->pages->num);
 assert(!p->pages->block);
-
+/*
+ * Double check on pending_job==false with the lock.  In the future if
+ * we can have >1 requester thread, we can replace this with a "goto
+ * retry", but that is for later.
+ */
+assert(qatomic_read(&p->pending_job) == false);
+qatomic_set(&p->pending_job, true);
 p->packet_num = multifd_send_state->packet_num++;
 multifd_send_state->pages = p->pages;
 p->pages = pages;
@@ -709,8 +720,6 @@ static void *multifd_send_thread(void *opaque)
 multifd_send_fill_packet(p);
 p->num_packets++;
 p->total_normal_pages += pages->num;
-qemu_mutex_unlock(&p->mutex);
-
 trace_multifd_send(p->id, packet_num, pages->num, p->flags,
p->next_packet_size);
 
@@ -730,6 +739,7 @@ static void *multifd_send_thread(void *opaque)
 ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
  0, p->write_flags, &local_err);
 if (ret != 0) {
+qemu_mutex_unlock(&p->mutex);
 break;
 }
 
@@ -738,7 +748,6 @@ static void *multifd_send_thread(void *opaque)
 
 multifd_pages_reset(p->pages);
 p->next_packet_size = 0;
-qemu_mutex_lock(&p->mutex);
qatomic_set(&p->pending_job, false);
qemu_mutex_unlock(&p->mutex);
} else if (qatomic_read(&p->pending_sync)) {
-- 
2.43.0




[PULL 19/34] migration/multifd: Rewrite multifd_queue_page()

2024-02-07 Thread peterx
From: Peter Xu 

The current multifd_queue_page() is not easy to read and follow.  It is not
good for a few reasons:

  - No helper at all to show what exactly a condition means; in short,
  readability is low.

  - Rely on pages->ramblock being cleared to detect an empty queue.  It's
  slightly an overload of the ramblock pointer, per Fabiano [1], with which
  I also agree.

  - Contains a self recursion, even if not necessary.

Rewrite this function.  We add some comments to make it even clearer on
what it does.

[1] https://lore.kernel.org/r/87wmrpjzew@suse.de
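The rewritten control flow can be modeled as a tiny standalone sketch
(hedged; `page_queue`, `flush()` and friends are invented stand-ins for
MultiFDPages_t and multifd_send_pages(), not QEMU code):

```c
#include <stdbool.h>
#include <stddef.h>

enum { QUEUE_CAP = 4 };

/* Invented stand-in for MultiFDPages_t: a queue of offsets that must
 * all belong to the same "block". */
struct page_queue {
    const void *block;          /* block the queued offsets belong to */
    size_t offset[QUEUE_CAP];
    size_t num;
    int flushes;                /* counts stand-in "send" operations */
};

static bool queue_empty(struct page_queue *q) { return q->num == 0; }
static bool queue_full(struct page_queue *q)  { return q->num == QUEUE_CAP; }

/* Stand-in for handing the queue to a sender thread. */
static void flush(struct page_queue *q)
{
    q->flushes++;
    q->num = 0;
    q->block = NULL;
}

/* Mirrors the rewritten logic: enqueue directly when empty; on a block
 * change or a full queue, flush once and retry, with no recursion. */
static bool queue_page(struct page_queue *q, const void *block, size_t off)
{
retry:
    if (queue_empty(q)) {
        q->block = block;
        q->offset[q->num++] = off;
        return true;
    }
    if (q->block != block || queue_full(q)) {
        flush(q);
        goto retry;
    }
    q->offset[q->num++] = off;
    return true;
}
```

A full queue or a block change triggers exactly one flush followed by a
retry; the retry always lands in the "empty" branch, so the loop runs at
most twice.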

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-19-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 56 ++---
 1 file changed, 37 insertions(+), 19 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 35d4e8ad1f..4ab8e6eff2 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -506,35 +506,53 @@ static bool multifd_send_pages(void)
 return true;
 }
 
+static inline bool multifd_queue_empty(MultiFDPages_t *pages)
+{
+return pages->num == 0;
+}
+
+static inline bool multifd_queue_full(MultiFDPages_t *pages)
+{
+return pages->num == pages->allocated;
+}
+
+static inline void multifd_enqueue(MultiFDPages_t *pages, ram_addr_t offset)
+{
+pages->offset[pages->num++] = offset;
+}
+
 /* Returns true if enqueue successful, false otherwise */
 bool multifd_queue_page(RAMBlock *block, ram_addr_t offset)
 {
-MultiFDPages_t *pages = multifd_send_state->pages;
-bool changed = false;
+MultiFDPages_t *pages;
+
+retry:
+pages = multifd_send_state->pages;
 
-if (!pages->block) {
+/* If the queue is empty, we can already enqueue now */
+if (multifd_queue_empty(pages)) {
 pages->block = block;
+multifd_enqueue(pages, offset);
+return true;
 }
 
-if (pages->block == block) {
-pages->offset[pages->num] = offset;
-pages->num++;
-
-if (pages->num < pages->allocated) {
-return true;
+/*
+ * Not empty, meanwhile we need a flush.  It can because of either:
+ *
+ * (1) The page is not on the same ramblock of previous ones, or,
+ * (2) The queue is full.
+ *
+ * After flush, always retry.
+ */
+if (pages->block != block || multifd_queue_full(pages)) {
+if (!multifd_send_pages()) {
+return false;
 }
-} else {
-changed = true;
-}
-
-if (!multifd_send_pages()) {
-return false;
-}
-
-if (changed) {
-return multifd_queue_page(block, offset);
+goto retry;
 }
 
+/* Not empty, and we still have space, do it! */
+multifd_enqueue(pages, offset);
 return true;
 }
 
-- 
2.43.0




[PULL 24/34] migration/multifd: Optimize sender side to be lockless

2024-02-07 Thread peterx
From: Peter Xu 

When reviewing my attempt to refactor send_prepare(), Fabiano suggested we
try out with dropping the mutex in multifd code [1].

I thought about that before but I never tried to change the code.  Now
maybe it's time to give it a stab.  This only optimizes the sender side.

The trick here is multifd has a clear provider/consumer model, that the
migration main thread publishes requests (either pending_job/pending_sync),
while the multifd sender threads are consumers.  Here we don't have a lot
of complicated data sharing, and the jobs can logically be submitted
lockless.

Arm the code with atomic weapons.  Two things worth mentioning:

  - For multifd_send_pages(): we can use qatomic_load_acquire() when trying
  to find a free channel, but that's expensive if we attach one ACQUIRE per
  channel.  Instead, keep the qatomic_read() on reading the pending_job
  flag as we do already, meanwhile use one smp_mb_acquire() after the loop
  to guarantee the memory ordering.

  - For pending_sync: it doesn't have any extra data required; since
  p->flags is never touched now, it should be safe not to use a memory
  barrier.  That's different from pending_job.

Provide rich comments for all the lockless operations to state how they are
paired.  With that, we can remove the mutex.

[1] https://lore.kernel.org/r/87o7d1jlu5@suse.de
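The pairing described in the first bullet can be sketched with plain C11
atomics (a hedged sketch under invented names, not QEMU's qatomic
wrappers):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Invented stand-ins: payload plays the role of p->pages, pending_job
 * the role of the per-channel flag. */
struct job {
    int payload;
    _Atomic bool pending_job;
};

/* Requester: publish the payload first, then release-store the flag so
 * a consumer's acquire-load that sees true also sees the payload. */
static void submit(struct job *j, int payload)
{
    j->payload = payload;
    atomic_store_explicit(&j->pending_job, true, memory_order_release);
}

/* Consumer: the acquire load pairs with the release store above; only
 * after it reads true is the payload guaranteed visible. */
static bool try_take(struct job *j, int *payload)
{
    if (!atomic_load_explicit(&j->pending_job, memory_order_acquire)) {
        return false;
    }
    *payload = j->payload;
    atomic_store_explicit(&j->pending_job, false, memory_order_release);
    return true;
}
```

The release store in submit() and the acquire load in try_take() form the
same publish/consume edge as qatomic_store_release() on pending_job in
multifd_send_pages() and qatomic_load_acquire() in the sender thread.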

Suggested-by: Fabiano Rosas 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-24-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h |  2 --
 migration/multifd.c | 51 +++--
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 98876ff94a..78a2317263 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -91,8 +91,6 @@ typedef struct {
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
 
-/* this mutex protects the following parameters */
-QemuMutex mutex;
 /* is this channel thread running */
 bool running;
 /* multifd flags for each packet */
diff --git a/migration/multifd.c b/migration/multifd.c
index b317d57d61..fbdb129088 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -501,19 +501,19 @@ static bool multifd_send_pages(void)
 }
 }
 
-qemu_mutex_lock(&p->mutex);
-assert(!p->pages->num);
-assert(!p->pages->block);
 /*
- * Double check on pending_job==false with the lock.  In the future if
- * we can have >1 requester thread, we can replace this with a "goto
- * retry", but that is for later.
+ * Make sure we read p->pending_job before all the rest.  Pairs with
+ * qatomic_store_release() in multifd_send_thread().
  */
-assert(qatomic_read(&p->pending_job) == false);
-qatomic_set(&p->pending_job, true);
+smp_mb_acquire();
+assert(!p->pages->num);
 multifd_send_state->pages = p->pages;
 p->pages = pages;
-qemu_mutex_unlock(&p->mutex);
+/*
+ * Making sure p->pages is setup before marking pending_job=true. Pairs
+ * with the qatomic_load_acquire() in multifd_send_thread().
+ */
+qatomic_store_release(&p->pending_job, true);
qemu_sem_post(&p->sem);
 
 return true;
@@ -648,7 +648,6 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams 
*p, Error **errp)
 }
 multifd_send_channel_destroy(p->c);
 p->c = NULL;
-qemu_mutex_destroy(&p->mutex);
qemu_sem_destroy(&p->sem);
qemu_sem_destroy(&p->sem_sync);
 g_free(p->name);
@@ -742,14 +741,12 @@ int multifd_send_sync_main(void)
 
 trace_multifd_send_sync_main_signal(p->id);
 
-qemu_mutex_lock(&p->mutex);
 /*
  * We should be the only user so far, so not possible to be set by
  * others concurrently.
  */
assert(qatomic_read(&p->pending_sync) == false);
qatomic_set(&p->pending_sync, true);
-qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
 }
 for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -796,9 +793,12 @@ static void *multifd_send_thread(void *opaque)
 if (multifd_send_should_exit()) {
 break;
 }
-qemu_mutex_lock(&p->mutex);
 
-if (qatomic_read(&p->pending_job)) {
+/*
+ * Read pending_job flag before p->pages.  Pairs with the
+ * qatomic_store_release() in multifd_send_pages().
+ */
+if (qatomic_load_acquire(&p->pending_job)) {
 MultiFDPages_t *pages = p->pages;
 
 p->iovs_num = 0;
@@ -806,14 +806,12 @@ static void *multifd_send_thread(void *opaque)
 
 ret = multifd_send_state->ops->send_prepare(p, _err);
 if (ret != 0) {
-qemu_mutex_unlock(&p->mutex);
 break;
 }
 
 ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
   0, p->write_flags, _err);
 if (ret != 0) {
-

[PULL 07/34] migration/multifd: Separate SYNC request with normal jobs

2024-02-07 Thread peterx
From: Peter Xu 

Multifd provide a threaded model for processing jobs.  On sender side,
there can be two kinds of job: (1) a list of pages to send, or (2) a sync
request.

The sync request is a very special kind of job.  It never contains a page
array, but only a multifd packet telling the dest side to synchronize with
sent pages.

Before this patch, both requests use the pending_job field: no matter what
the request is, it will bump pending_job, while the multifd sender thread
will decrement it after it finishes one job.

However, this is racy, because SYNC is special in that it needs to set
p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request.
Consider a sequence of operations where:

  - migration thread enqueue a job to send some pages, pending_job++ (0->1)

  - [...before the selected multifd sender thread wakes up...]

  - migration thread enqueue another job to sync, pending_job++ (1->2),
setup p->flags=MULTIFD_FLAG_SYNC

  - multifd sender thread wakes up, found pending_job==2
- send the 1st packet with MULTIFD_FLAG_SYNC and list of pages
- send the 2nd packet with flags==0 and no pages

This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done
after all the pages are received.  Meanwhile, the 2nd packet will be
completely useless, which contains zero information.

I didn't verify the above, but I think this issue is still benign, in that
at least on the recv side we always receive pages before handling
MULTIFD_FLAG_SYNC.  However, that's not always guaranteed, and it's just
tricky.

One other reason I want to separate it is that using p->flags to
communicate between the two threads is also not clearly defined; it's very
hard to read and understand why accessing p->flags is always safe.  See the
current impl of multifd_send_thread() where we tried to cache only
p->flags; it doesn't need to be that complicated.

This patch introduces pending_sync, a separate flag just to show that the
requester needs a sync.  Alongside, we remove the tricky caching of
p->flags now because after this patch p->flags should only be used by
multifd sender thread now, which will be crystal clear.  So it is always
thread safe to access p->flags.

With that, we can also safely convert the pending_job into a boolean,
because we don't support >1 pending jobs anyway.

Always use atomic ops to access both flags to avoid any caching effect.
While at it, drop the initial setting of "pending_job = 0" because the
struct is always allocated using g_new0().

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-7-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h | 13 +++--
 migration/multifd.c | 39 +--
 2 files changed, 36 insertions(+), 16 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 3920bdbcf1..08f26ef3fe 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -99,8 +99,17 @@ typedef struct {
 uint32_t flags;
 /* global number of generated multifd packets */
 uint64_t packet_num;
-/* thread has work to do */
-int pending_job;
+/*
+ * The sender thread has work to do if either of below boolean is set.
+ *
+ * @pending_job:  a job is pending
+ * @pending_sync: a sync request is pending
+ *
+ * For both of these fields, they're only set by the requesters, and
+ * cleared by the multifd sender threads.
+ */
+bool pending_job;
+bool pending_sync;
 /* array of pages to sent.
  * The owner of 'pages' depends of 'pending_job' value:
  * pending_job == 0 -> migration_thread can use it.
diff --git a/migration/multifd.c b/migration/multifd.c
index 8bb1fd95cf..ea25bbe6bd 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -442,8 +442,8 @@ static int multifd_send_pages(void)
 }
 p = _send_state->params[i];
qemu_mutex_lock(&p->mutex);
-if (!p->pending_job) {
-p->pending_job++;
+if (qatomic_read(&p->pending_job) == false) {
+qatomic_set(&p->pending_job, true);
 next_channel = (i + 1) % migrate_multifd_channels();
 break;
 }
@@ -631,8 +631,12 @@ int multifd_send_sync_main(void)
 
qemu_mutex_lock(&p->mutex);
 p->packet_num = multifd_send_state->packet_num++;
-p->flags |= MULTIFD_FLAG_SYNC;
-p->pending_job++;
+/*
+ * We should be the only user so far, so not possible to be set by
+ * others concurrently.
+ */
+assert(qatomic_read(&p->pending_sync) == false);
+qatomic_set(&p->pending_sync, true);
qemu_mutex_unlock(&p->mutex);
qemu_sem_post(&p->sem);
 }
@@ -685,10 +689,9 @@ static void *multifd_send_thread(void *opaque)
 }
qemu_mutex_lock(&p->mutex);
 
-if (p->pending_job) {
+if (qatomic_read(&p->pending_job)) {
 uint64_t packet_num = p->packet_num;
 MultiFDPages_t *pages = p->pages;
-

[PULL 27/34] migration/multifd: Remove p->running

2024-02-07 Thread peterx
From: Fabiano Rosas 

We currently only need p->running to avoid calling qemu_thread_join()
on a non-existent thread if the thread has never been created.

However, there are at least two bugs in this logic:

1) On the sending side, p->running is set too early and
qemu_thread_create() can be skipped due to an error during TLS
handshake, leaving the flag set and leading to a crash when
multifd_send_cleanup() calls qemu_thread_join().

2) During exit, the multifd thread clears the flag while holding the
channel lock. The counterpart at multifd_send_cleanup() reads the flag
outside of the lock and might free the mutex while the multifd thread
still has it locked.

Fix the first issue by setting the flag right before creating the
thread. Rename it from p->running to p->thread_created to clarify its
usage.

Fix the second issue by not clearing the flag at the multifd thread
exit. We don't have any use for that.

Note that these bugs are straightforward logic issues and not race
conditions. There is still a gap for races to affect this code, due to
multifd_send_cleanup() being allowed to run concurrently with the
thread creation loop. That issue is solved in the next patches.
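The shape of the fix for bug (1) can be sketched with plain pthreads (a
hedged sketch; the names here are invented stand-ins, not QEMU code):

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Invented stand-in for a multifd channel: the flag means "there is a
 * thread to join", nothing more. */
struct channel {
    pthread_t thread;
    bool thread_created;
};

static void *worker(void *opaque)
{
    /* Note: the thread does NOT clear thread_created on exit; cleanup
     * still needs to join it. */
    (void)opaque;
    return NULL;
}

/* Set the flag only right before a successful pthread_create(), so an
 * earlier failure (e.g. a failed TLS handshake) leaves it false. */
static bool channel_start(struct channel *c, bool setup_ok)
{
    if (!setup_ok) {
        return false;           /* flag stays false: no thread exists */
    }
    if (pthread_create(&c->thread, NULL, worker, c) != 0) {
        return false;
    }
    c->thread_created = true;
    return true;
}

/* Join only when a thread was actually created; joining an
 * uninitialized pthread_t is undefined behavior. */
static void channel_cleanup(struct channel *c)
{
    if (c->thread_created) {
        pthread_join(c->thread, NULL);
        c->thread_created = false;
    }
}
```

Under this scheme cleanup after a failed setup is a no-op, which is the
crash the commit message describes multifd_send_cleanup() hitting.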

Cc: qemu-stable 
Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
Reported-by: Avihai Horon 
Reported-by: chenyuh...@huawei.com
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-3-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.h |  7 ++-
 migration/multifd.c | 27 ---
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 720c9d50db..7881980ee6 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -73,6 +73,7 @@ typedef struct {
 char *name;
 /* channel thread id */
 QemuThread thread;
+bool thread_created;
 QemuThread tls_thread;
 bool tls_thread_created;
 /* communication channel */
@@ -93,8 +94,6 @@ typedef struct {
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
 
-/* is this channel thread running */
-bool running;
 /* multifd flags for each packet */
 uint32_t flags;
 /*
@@ -143,6 +142,7 @@ typedef struct {
 char *name;
 /* channel thread id */
 QemuThread thread;
+bool thread_created;
 /* communication channel */
 QIOChannel *c;
 /* packet allocated len */
@@ -157,8 +157,6 @@ typedef struct {
 
 /* this mutex protects the following parameters */
 QemuMutex mutex;
-/* is this channel thread running */
-bool running;
 /* should this thread finish */
 bool quit;
 /* multifd flags for each packet */
@@ -217,4 +215,3 @@ static inline void 
multifd_send_prepare_header(MultiFDSendParams *p)
 
 
 #endif
-
diff --git a/migration/multifd.c b/migration/multifd.c
index 5551711a2a..e6ac1ad6dc 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -634,7 +634,7 @@ static void multifd_send_terminate_threads(void)
qemu_thread_join(&p->tls_thread);
 }
 
-if (p->running) {
+if (p->thread_created) {
qemu_thread_join(&p->thread);
 }
 }
@@ -862,7 +862,6 @@ out:
 error_free(local_err);
 }
 
-p->running = false;
 rcu_unregister_thread();
 migration_threads_remove(thread);
 trace_multifd_send_thread_end(p->id, p->packets_sent, 
p->total_normal_pages);
@@ -953,6 +952,8 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 migration_ioc_register_yank(ioc);
 p->registered_yank = true;
 p->c = ioc;
+
+p->thread_created = true;
qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
QEMU_THREAD_JOINABLE);
 return true;
@@ -967,7 +968,6 @@ static void multifd_new_send_channel_async(QIOTask *task, 
gpointer opaque)
 trace_multifd_new_send_channel_async(p->id);
if (!qio_task_propagate_error(task, &local_err)) {
 qio_channel_set_delay(ioc, false);
-p->running = true;
 if (multifd_channel_connect(p, ioc, _err)) {
 return;
 }
@@ -1128,15 +1128,15 @@ void multifd_recv_cleanup(void)
 for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDRecvParams *p = &multifd_recv_state->params[i];
 
-if (p->running) {
-/*
- * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
- * however try to wakeup it without harm in cleanup phase.
- */
-qemu_sem_post(&p->sem_sync);
-}
+/*
+ * multifd_recv_thread may hung at MULTIFD_FLAG_SYNC handle code,
+ * however try to wakeup it without harm in cleanup phase.
+ */
+qemu_sem_post(&p->sem_sync);
 
-qemu_thread_join(&p->thread);
+if (p->thread_created) {
+qemu_thread_join(&p->thread);
+}
 }
 for (i = 0; i < migrate_multifd_channels(); i++) {
 

[PULL 04/34] migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths

2024-02-07 Thread peterx
From: Peter Xu 

Multifd send side has two fields to indicate error quits:

  - MultiFDSendParams.quit
  - &multifd_send_state->exiting

Merge them into the global one.  The replacement is done by changing all
p->quit checks into the global var check.  The global check doesn't need
any lock.

A few more things done on top of this altogether:

  - multifd_send_terminate_threads()

Moving the xchg() of multifd_send_state->exiting upper, so as to cover
the tracepoint, migrate_set_error() and migrate_set_state().

  - multifd_send_sync_main()

In the 2nd loop, add one more check over the global var to make sure we
don't keep looping if QEMU has already decided to quit.

  - multifd_tls_outgoing_handshake()

Use multifd_send_terminate_threads() to set the error state.  That has
a benefit of updating MigrationState.error to that error too, so we can
persist that 1st error we hit in that specific channel.

  - multifd_new_send_channel_async()

Take similar approach like above, drop the migrate_set_error() because
multifd_send_terminate_threads() already covers that.  Unwrap the helper
multifd_new_send_channel_cleanup() along the way; not really needed.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-4-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h |  2 --
 migration/multifd.c | 85 ++---
 2 files changed, 33 insertions(+), 54 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 35d11f103c..7c040cb85a 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -95,8 +95,6 @@ typedef struct {
 QemuMutex mutex;
 /* is this channel thread running */
 bool running;
-/* should this thread finish */
-bool quit;
 /* multifd flags for each packet */
 uint32_t flags;
 /* global number of generated multifd packets */
diff --git a/migration/multifd.c b/migration/multifd.c
index b8d2c96533..2c98023d67 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -372,6 +372,11 @@ struct {
 MultiFDMethods *ops;
 } *multifd_send_state;
 
+static bool multifd_send_should_exit(void)
+{
+return qatomic_read(&multifd_send_state->exiting);
+}
+
 /*
  * The migration thread can wait on either of the two semaphores.  This
  * function can be used to kick the main thread out of waiting on either of
@@ -409,7 +414,7 @@ static int multifd_send_pages(void)
 MultiFDSendParams *p = NULL; /* make happy gcc */
 MultiFDPages_t *pages = multifd_send_state->pages;
 
-if (qatomic_read(&multifd_send_state->exiting)) {
+if (multifd_send_should_exit()) {
 return -1;
 }
 
@@ -421,14 +426,11 @@ static int multifd_send_pages(void)
  */
 next_channel %= migrate_multifd_channels();
 for (i = next_channel;; i = (i + 1) % migrate_multifd_channels()) {
-p = &multifd_send_state->params[i];
-
-qemu_mutex_lock(&p->mutex);
-if (p->quit) {
-error_report("%s: channel %d has already quit!", __func__, i);
-qemu_mutex_unlock(&p->mutex);
+if (multifd_send_should_exit()) {
 return -1;
 }
+p = &multifd_send_state->params[i];
+qemu_mutex_lock(&p->mutex);
 if (!p->pending_job) {
 p->pending_job++;
 next_channel = (i + 1) % migrate_multifd_channels();
@@ -483,6 +485,16 @@ static void multifd_send_terminate_threads(Error *err)
 {
 int i;
 
+/*
+ * We don't want to exit each threads twice.  Depending on where
+ * we get the error, or if there are two independent errors in two
+ * threads at the same time, we can end calling this function
+ * twice.
+ */
+if (qatomic_xchg(&multifd_send_state->exiting, 1)) {
+return;
+}
+
 trace_multifd_send_terminate_threads(err != NULL);
 
 if (err) {
@@ -497,26 +509,13 @@ static void multifd_send_terminate_threads(Error *err)
 }
 }
 
-/*
- * We don't want to exit each threads twice.  Depending on where
- * we get the error, or if there are two independent errors in two
- * threads at the same time, we can end calling this function
- * twice.
- */
-if (qatomic_xchg(&multifd_send_state->exiting, 1)) {
-return;
-}
-
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
 
-qemu_mutex_lock(&p->mutex);
-p->quit = true;
 qemu_sem_post(&p->sem);
 if (p->c) {
 qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
 }
-qemu_mutex_unlock(&p->mutex);
 }
 }
 
@@ -615,16 +614,13 @@ int multifd_send_sync_main(void)
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
 
-trace_multifd_send_sync_main_signal(p->id);
-
-qemu_mutex_lock(&p->mutex);
-
-if (p->quit) {
-error_report("%s: channel %d has already quit", __func__, i);
-

[PULL 13/34] migration/multifd: multifd_send_prepare_header()

2024-02-07 Thread peterx
From: Peter Xu 

Introduce a helper multifd_send_prepare_header() to setup the header packet
for multifd sender.

It's fine to setup the IOV[0] _before_ send_prepare() because the packet
buffer is already ready, even if the content is to be filled in.

With this helper, we can already slightly clean up the zero copy path.

Note that I explicitly put it into multifd.h, because I want it inlined
directly into multifd*.c where necessary later.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-13-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h |  8 
 migration/multifd.c | 16 
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 2e4ad0dc56..4ec005f53f 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -209,5 +209,13 @@ typedef struct {
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
 
+static inline void multifd_send_prepare_header(MultiFDSendParams *p)
+{
+p->iov[0].iov_len = p->packet_len;
+p->iov[0].iov_base = p->packet;
+p->iovs_num++;
+}
+
+
 #endif
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 44163e4e28..cd4467aff4 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -712,10 +712,14 @@ static void *multifd_send_thread(void *opaque)
 if (qatomic_read(&p->pending_job)) {
 MultiFDPages_t *pages = p->pages;
 
-if (use_zero_copy_send) {
-p->iovs_num = 0;
-} else {
-p->iovs_num = 1;
+p->iovs_num = 0;
+
+if (!use_zero_copy_send) {
+/*
+ * Only !zerocopy needs the header in IOV; zerocopy will
+ * send it separately.
+ */
+multifd_send_prepare_header(p);
 }
 
 assert(pages->num);
@@ -735,10 +739,6 @@ static void *multifd_send_thread(void *opaque)
 if (ret != 0) {
 break;
 }
-} else {
-/* Send header using the same writev call */
-p->iov[0].iov_len = p->packet_len;
-p->iov[0].iov_base = p->packet;
 }
 
 ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
-- 
2.43.0




[PULL 15/34] migration/multifd: Forbid spurious wakeups

2024-02-07 Thread peterx
From: Peter Xu 

Now multifd's logic is designed to have no spurious wakeup.  I still
remember a talk with Juan where he seemed to agree we should drop it; if my
memory is right, it was there because multifd used to hit spurious wakeups
while still being debugged.

Let's drop it and see what can explode; as long as it's not reaching
soft-freeze.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-15-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 6aa44340de..28b54100cd 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -756,7 +756,9 @@ static void *multifd_send_thread(void *opaque)
 p->next_packet_size = 0;
 qatomic_set(&p->pending_job, false);
 qemu_mutex_unlock(&p->mutex);
-} else if (qatomic_read(&p->pending_sync)) {
+} else {
+/* If not a normal job, must be a sync request */
+assert(qatomic_read(&p->pending_sync));
 p->flags = MULTIFD_FLAG_SYNC;
 multifd_send_fill_packet(p);
 ret = qio_channel_write_all(p->c, (void *)p->packet,
@@ -771,9 +773,6 @@ static void *multifd_send_thread(void *opaque)
 qatomic_set(&p->pending_sync, false);
 qemu_mutex_unlock(&p->mutex);
 qemu_sem_post(&p->sem_sync);
-} else {
-qemu_mutex_unlock(&p->mutex);
-/* sometimes there are spurious wakeups */
 }
 }
 
-- 
2.43.0




[PULL 20/34] migration/multifd: Cleanup multifd_save_cleanup()

2024-02-07 Thread peterx
From: Peter Xu 

Shrink the function by moving relevant works into helpers: move the thread
join()s into multifd_send_terminate_threads(), then create two more helpers
to cover channel/state cleanups.

Add a TODO entry for the thread terminate process because p->running is
still buggy.  We need to fix it at some point, but that is not yet covered.

Suggested-by: Fabiano Rosas 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-20-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 91 +
 1 file changed, 59 insertions(+), 32 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 4ab8e6eff2..4cb0d2cc17 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -593,6 +593,11 @@ static void multifd_send_terminate_threads(void)
  * always set it.
  */
 qatomic_set(&multifd_send_state->exiting, 1);
+
+/*
+ * Firstly, kick all threads out; no matter whether they are just idle,
+ * or blocked in an IO system call.
+ */
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -601,6 +606,21 @@ static void multifd_send_terminate_threads(void)
 qio_channel_shutdown(p->c, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
 }
 }
+
+/*
+ * Finally recycle all the threads.
+ *
+ * TODO: p->running is still buggy, e.g. we can reach here without the
+ * corresponding multifd_new_send_channel_async() get invoked yet,
+ * then a new thread can even be created after this function returns.
+ */
+for (i = 0; i < migrate_multifd_channels(); i++) {
+MultiFDSendParams *p = &multifd_send_state->params[i];
+
+if (p->running) {
+qemu_thread_join(&p->thread);
+}
+}
 }
 
 static int multifd_send_channel_destroy(QIOChannel *send)
@@ -608,6 +628,41 @@ static int multifd_send_channel_destroy(QIOChannel *send)
 return socket_send_channel_destroy(send);
 }
 
+static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
+{
+if (p->registered_yank) {
+migration_ioc_unregister_yank(p->c);
+}
+multifd_send_channel_destroy(p->c);
+p->c = NULL;
+qemu_mutex_destroy(&p->mutex);
+qemu_sem_destroy(&p->sem);
+qemu_sem_destroy(&p->sem_sync);
+g_free(p->name);
+p->name = NULL;
+multifd_pages_clear(p->pages);
+p->pages = NULL;
+p->packet_len = 0;
+g_free(p->packet);
+p->packet = NULL;
+g_free(p->iov);
+p->iov = NULL;
+multifd_send_state->ops->send_cleanup(p, errp);
+
+return *errp == NULL;
+}
+
+static void multifd_send_cleanup_state(void)
+{
+qemu_sem_destroy(&multifd_send_state->channels_ready);
+g_free(multifd_send_state->params);
+multifd_send_state->params = NULL;
+multifd_pages_clear(multifd_send_state->pages);
+multifd_send_state->pages = NULL;
+g_free(multifd_send_state);
+multifd_send_state = NULL;
+}
+
 void multifd_save_cleanup(void)
 {
 int i;
@@ -615,48 +670,20 @@ void multifd_save_cleanup(void)
 if (!migrate_multifd()) {
 return;
 }
+
 multifd_send_terminate_threads();
-for (i = 0; i < migrate_multifd_channels(); i++) {
-MultiFDSendParams *p = &multifd_send_state->params[i];
 
-if (p->running) {
-qemu_thread_join(&p->thread);
-}
-}
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
 Error *local_err = NULL;
 
-if (p->registered_yank) {
-migration_ioc_unregister_yank(p->c);
-}
-multifd_send_channel_destroy(p->c);
-p->c = NULL;
-qemu_mutex_destroy(&p->mutex);
-qemu_sem_destroy(&p->sem);
-qemu_sem_destroy(&p->sem_sync);
-g_free(p->name);
-p->name = NULL;
-multifd_pages_clear(p->pages);
-p->pages = NULL;
-p->packet_len = 0;
-g_free(p->packet);
-p->packet = NULL;
-g_free(p->iov);
-p->iov = NULL;
-multifd_send_state->ops->send_cleanup(p, &local_err);
-if (local_err) {
+if (!multifd_send_cleanup_channel(p, &local_err)) {
 migrate_set_error(migrate_get_current(), local_err);
 error_free(local_err);
 }
 }
-qemu_sem_destroy(&multifd_send_state->channels_ready);
-g_free(multifd_send_state->params);
-multifd_send_state->params = NULL;
-multifd_pages_clear(multifd_send_state->pages);
-multifd_send_state->pages = NULL;
-g_free(multifd_send_state);
-multifd_send_state = NULL;
+
+multifd_send_cleanup_state();
 }
 
 static int multifd_zero_copy_flush(QIOChannel *c)
-- 
2.43.0




[PULL 26/34] migration/multifd: Join the TLS thread

2024-02-07 Thread peterx
From: Fabiano Rosas 

We're currently leaking the resources of the TLS thread by not joining
it and also overwriting the p->thread pointer altogether.

Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake")
Cc: qemu-stable 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-2-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.h | 2 ++
 migration/multifd.c | 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 78a2317263..720c9d50db 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -73,6 +73,8 @@ typedef struct {
 char *name;
 /* channel thread id */
 QemuThread thread;
+QemuThread tls_thread;
+bool tls_thread_created;
 /* communication channel */
 QIOChannel *c;
 /* is the yank function registered */
diff --git a/migration/multifd.c b/migration/multifd.c
index fbdb129088..5551711a2a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -630,6 +630,10 @@ static void multifd_send_terminate_threads(void)
 for (i = 0; i < migrate_multifd_channels(); i++) {
 MultiFDSendParams *p = &multifd_send_state->params[i];
 
+if (p->tls_thread_created) {
+qemu_thread_join(&p->tls_thread);
+}
+
 if (p->running) {
 qemu_thread_join(&p->thread);
 }
@@ -921,7 +925,9 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
 trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname);
 qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing");
 p->c = QIO_CHANNEL(tioc);
-qemu_thread_create(&p->thread, "multifd-tls-handshake-worker",
+
+p->tls_thread_created = true;
+qemu_thread_create(&p->tls_thread, "multifd-tls-handshake-worker",
multifd_tls_handshake_thread, p,
QEMU_THREAD_JOINABLE);
 return true;
-- 
2.43.0




[PULL 33/34] ci: Remove tag dependency for build-previous-qemu

2024-02-07 Thread peterx
From: Peter Xu 

The new build-previous-qemu job relies on QEMU release tag being present,
while that may not always be true for personal git repositories, since by
default tags are not pushed.  The job can fail on those CI kicks, as reported
by Peter Maydell.

Fix it by fetching the tags remotely from the official repository, as
suggested by Dan.

[1] https://lore.kernel.org/r/zcc9sckj7vvqe...@redhat.com

Reported-by: Peter Maydell 
Suggested-by: "Daniel P. Berrangé" 
Reviewed-by: "Daniel P. Berrangé" 
Link: https://lore.kernel.org/r/20240207005403.242235-3-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 .gitlab-ci.d/buildtest.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 79bbc8585b..cfe95c1b17 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -189,6 +189,8 @@ build-previous-qemu:
 TARGETS: x86_64-softmmu aarch64-softmmu
   before_script:
 - export QEMU_PREV_VERSION="$(sed 's/\([0-9.]*\)\.[0-9]*/v\1.0/' VERSION)"
+- git remote add upstream https://gitlab.com/qemu-project/qemu
+- git fetch upstream $QEMU_PREV_VERSION
 - git checkout $QEMU_PREV_VERSION
   after_script:
 - mv build build-previous
-- 
2.43.0




[PULL 12/34] migration/multifd: Move trace_multifd_send|recv()

2024-02-07 Thread peterx
From: Peter Xu 

Move them into fill/unfill of packets.  With that, we can further cleanup
the send/recv thread procedure, and remove one more temp var.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-12-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 94a0124934..44163e4e28 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -291,6 +291,9 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 
 p->packets_sent++;
 p->total_normal_pages += pages->num;
+
+trace_multifd_send(p->id, p->packet_num, pages->num, p->flags,
+   p->next_packet_size);
 }
 
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
@@ -341,6 +344,9 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 p->packets_recved++;
 p->total_normal_pages += p->normal_num;
 
+trace_multifd_recv(p->id, p->packet_num, p->normal_num, p->flags,
+   p->next_packet_size);
+
 if (p->normal_num == 0) {
 return 0;
 }
@@ -704,7 +710,6 @@ static void *multifd_send_thread(void *opaque)
 qemu_mutex_lock(&p->mutex);
 
 if (qatomic_read(&p->pending_job)) {
-uint64_t packet_num = p->packet_num;
 MultiFDPages_t *pages = p->pages;
 
 if (use_zero_copy_send) {
@@ -722,8 +727,6 @@ static void *multifd_send_thread(void *opaque)
 }
 
 multifd_send_fill_packet(p);
-trace_multifd_send(p->id, packet_num, pages->num, p->flags,
-   p->next_packet_size);
 
 if (use_zero_copy_send) {
 /* Send header first, without zerocopy */
@@ -1123,8 +1126,6 @@ static void *multifd_recv_thread(void *opaque)
 flags = p->flags;
 /* recv methods don't know how to handle the SYNC flag */
 p->flags &= ~MULTIFD_FLAG_SYNC;
-trace_multifd_recv(p->id, p->packet_num, p->normal_num, flags,
-   p->next_packet_size);
 qemu_mutex_unlock(&p->mutex);
 
 if (p->normal_num) {
-- 
2.43.0




[PULL 30/34] migration/multifd: Unify multifd and TLS connection paths

2024-02-07 Thread peterx
From: Fabiano Rosas 

During multifd channel creation (multifd_send_new_channel_async) when
TLS is enabled, the multifd_channel_connect function is called twice,
once to create the TLS handshake thread and another time after the
asynchronous TLS handshake has finished.

This creates a slightly confusing call stack where
multifd_channel_connect() is called more times than the number of
channels. It also splits error handling between the two callers of
multifd_channel_connect() causing some code duplication. Lastly, it
gets in the way of having a single point to determine whether all
channel creation tasks have been initiated.

Refactor the code to move the reentrancy one level up at the
multifd_new_send_channel_async() level, de-duplicating the error
handling and allowing for the next patch to introduce a
synchronization point common to all the multifd channel creation,
regardless of TLS.

Note that the previous code would never fail once p->c had been set.
This patch changes this assumption, which affects refcounting, so add
comments around object_unref to explain the situation.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-6-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 83 ++---
 1 file changed, 40 insertions(+), 43 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index cf865edba0..3db18dc79e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -869,30 +869,7 @@ out:
 return NULL;
 }
 
-static bool multifd_channel_connect(MultiFDSendParams *p,
-QIOChannel *ioc,
-Error **errp);
-
-static void multifd_tls_outgoing_handshake(QIOTask *task,
-   gpointer opaque)
-{
-MultiFDSendParams *p = opaque;
-QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
-Error *err = NULL;
-
-if (!qio_task_propagate_error(task, &err)) {
-trace_multifd_tls_outgoing_handshake_complete(ioc);
-if (multifd_channel_connect(p, ioc, &err)) {
-return;
-}
-}
-
-trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
-
-multifd_send_set_error(err);
-multifd_send_kick_main(p);
-error_free(err);
-}
+static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque);
 
 static void *multifd_tls_handshake_thread(void *opaque)
 {
@@ -900,7 +877,7 @@ static void *multifd_tls_handshake_thread(void *opaque)
 QIOChannelTLS *tioc = QIO_CHANNEL_TLS(p->c);
 
 qio_channel_tls_handshake(tioc,
-  multifd_tls_outgoing_handshake,
+  multifd_new_send_channel_async,
   p,
   NULL,
   NULL);
@@ -920,6 +897,10 @@ static bool multifd_tls_channel_connect(MultiFDSendParams *p,
 return false;
 }
 
+/*
+ * Ownership of the socket channel now transfers to the newly
+ * created TLS channel, which has already taken a reference.
+ */
 object_unref(OBJECT(ioc));
 trace_multifd_tls_outgoing_handshake_start(ioc, tioc, hostname);
 qio_channel_set_name(QIO_CHANNEL(tioc), "multifd-tls-outgoing");
@@ -936,18 +917,7 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 QIOChannel *ioc,
 Error **errp)
 {
-trace_multifd_set_outgoing_channel(
-ioc, object_get_typename(OBJECT(ioc)),
-migrate_get_current()->hostname);
-
-if (migrate_channel_requires_tls_upgrade(ioc)) {
-/*
- * tls_channel_connect will call back to this
- * function after the TLS handshake,
- * so we mustn't call multifd_send_thread until then
- */
-return multifd_tls_channel_connect(p, ioc, errp);
-}
+qio_channel_set_delay(ioc, false);
 
 migration_ioc_register_yank(ioc);
 p->registered_yank = true;
@@ -959,24 +929,51 @@ static bool multifd_channel_connect(MultiFDSendParams *p,
 return true;
 }
 
+/*
+ * When TLS is enabled this function is called once to establish the
+ * TLS connection and a second time after the TLS handshake to create
+ * the multifd channel. Without TLS it goes straight into the channel
+ * creation.
+ */
 static void multifd_new_send_channel_async(QIOTask *task, gpointer opaque)
 {
 MultiFDSendParams *p = opaque;
 QIOChannel *ioc = QIO_CHANNEL(qio_task_get_source(task));
 Error *local_err = NULL;
+bool ret;
 
 trace_multifd_new_send_channel_async(p->id);
-if (!qio_task_propagate_error(task, &local_err)) {
-qio_channel_set_delay(ioc, false);
-if (multifd_channel_connect(p, ioc, &local_err)) {
-return;
-}
+
+if (qio_task_propagate_error(task, &local_err)) {
+ret = false;
+goto out;
+}
+
+

[PULL 25/34] migration: Fix logic of channels and transport compatibility check

2024-02-07 Thread peterx
From: Avihai Horon 

The commit in the fixes line mistakenly modified the channels and
transport compatibility check logic so it now checks multi-channel
support only for the socket transport type.

Thus, running multifd migration using a transport other than socket that
is incompatible with multi-channels (such as "exec") would lead to a
segmentation fault instead of an error message.
For example:
  (qemu) migrate_set_capability multifd on
  (qemu) migrate -d "exec:cat > /tmp/vm_state"
  Segmentation fault (core dumped)

Fix it by checking multi-channel compatibility for all transport types.

Cc: qemu-stable 
Fixes: d95533e1cdcc ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax")
Signed-off-by: Avihai Horon 
Reviewed-by: Peter Xu 
Link: https://lore.kernel.org/r/20240125162528.7552-2-avih...@nvidia.com
Signed-off-by: Peter Xu 
---
 migration/migration.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 9b695685b1..b427be8762 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -129,11 +129,17 @@ static bool migration_needs_multiple_sockets(void)
 return migrate_multifd() || migrate_postcopy_preempt();
 }
 
-static bool transport_supports_multi_channels(SocketAddress *saddr)
+static bool transport_supports_multi_channels(MigrationAddress *addr)
 {
-return saddr->type == SOCKET_ADDRESS_TYPE_INET ||
-   saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
-   saddr->type == SOCKET_ADDRESS_TYPE_VSOCK;
+if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
+SocketAddress *saddr = &addr->u.socket;
+
+return saddr->type == SOCKET_ADDRESS_TYPE_INET ||
+   saddr->type == SOCKET_ADDRESS_TYPE_UNIX ||
+   saddr->type == SOCKET_ADDRESS_TYPE_VSOCK;
+}
+
+return false;
 }
 
 static bool
@@ -141,8 +147,7 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
 Error **errp)
 {
 if (migration_needs_multiple_sockets() &&
-(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) &&
-!transport_supports_multi_channels(&addr->u.socket)) {
+!transport_supports_multi_channels(addr)) {
 error_setg(errp, "Migration requires multi-channel URIs (e.g. tcp)");
 return false;
 }
-- 
2.43.0




[PULL 21/34] migration/multifd: Cleanup multifd_load_cleanup()

2024-02-07 Thread peterx
From: Peter Xu 

Use similar logic to cleanup the recv side.

Note that multifd_recv_terminate_threads() may need some similar rework
like the sender side, but let's leave that for later.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-21-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 52 ++---
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 4cb0d2cc17..e2dd2f6e04 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1070,6 +1070,34 @@ void multifd_load_shutdown(void)
 }
 }
 
+static void multifd_recv_cleanup_channel(MultiFDRecvParams *p)
+{
+migration_ioc_unregister_yank(p->c);
+object_unref(OBJECT(p->c));
+p->c = NULL;
+qemu_mutex_destroy(&p->mutex);
+qemu_sem_destroy(&p->sem_sync);
+g_free(p->name);
+p->name = NULL;
+p->packet_len = 0;
+g_free(p->packet);
+p->packet = NULL;
+g_free(p->iov);
+p->iov = NULL;
+g_free(p->normal);
+p->normal = NULL;
+multifd_recv_state->ops->recv_cleanup(p);
+}
+
+static void multifd_recv_cleanup_state(void)
+{
+qemu_sem_destroy(&multifd_recv_state->sem_sync);
+g_free(multifd_recv_state->params);
+multifd_recv_state->params = NULL;
+g_free(multifd_recv_state);
+multifd_recv_state = NULL;
+}
+
 void multifd_load_cleanup(void)
 {
 int i;
@@ -1092,29 +1120,9 @@ void multifd_load_cleanup(void)
 qemu_thread_join(&p->thread);
 }
 for (i = 0; i < migrate_multifd_channels(); i++) {
-MultiFDRecvParams *p = &multifd_recv_state->params[i];
-
-migration_ioc_unregister_yank(p->c);
-object_unref(OBJECT(p->c));
-p->c = NULL;
-qemu_mutex_destroy(&p->mutex);
-qemu_sem_destroy(&p->sem_sync);
-g_free(p->name);
-p->name = NULL;
-p->packet_len = 0;
-g_free(p->packet);
-p->packet = NULL;
-g_free(p->iov);
-p->iov = NULL;
-g_free(p->normal);
-p->normal = NULL;
-multifd_recv_state->ops->recv_cleanup(p);
+multifd_recv_cleanup_channel(&multifd_recv_state->params[i]);
 }
-qemu_sem_destroy(&multifd_recv_state->sem_sync);
-g_free(multifd_recv_state->params);
-multifd_recv_state->params = NULL;
-g_free(multifd_recv_state);
-multifd_recv_state = NULL;
+multifd_recv_cleanup_state();
 }
 
 void multifd_recv_sync_main(void)
-- 
2.43.0




[PULL 10/34] migration/multifd: Rename p->num_packets and clean it up

2024-02-07 Thread peterx
From: Peter Xu 

This field, no matter whether on src or dest, is only used for debugging
purpose.

They can even be removed already, unless it still more or less provide some
accounting on "how many packets are sent/recved for this thread".  The
other more important one is called packet_num, which is embedded in the
multifd packet headers (MultiFDPacket_t).

So let's keep them for now, but make them much easier to understand, by
doing below:

  - Rename both of them to packets_sent / packets_recved, the old
  name (num_packets) is way too confusing when we already have
  MultiFDPacket_t.packets_num.

  - Avoid worrying about the "initial packet": we know we will send it, that's
  good enough.  The accounting won't matter a great deal to start with 0 or
  with 1.

  - Move them to where we send/recv the packets.  They're:

- multifd_send_fill_packet() for senders.
- multifd_recv_unfill_packet() for receivers.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-10-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h |  6 +++---
 migration/multifd.c | 13 +
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 08f26ef3fe..2e4ad0dc56 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -124,7 +124,7 @@ typedef struct {
 /* size of the next packet that contains pages */
 uint32_t next_packet_size;
 /* packets sent through this channel */
-uint64_t num_packets;
+uint64_t packets_sent;
 /* non zero pages sent through this channel */
 uint64_t total_normal_pages;
 /* buffers to send */
@@ -174,8 +174,8 @@ typedef struct {
 MultiFDPacket_t *packet;
 /* size of the next packet that contains pages */
 uint32_t next_packet_size;
-/* packets sent through this channel */
-uint64_t num_packets;
+/* packets received through this channel */
+uint64_t packets_recved;
 /* ramblock */
 RAMBlock *block;
 /* ramblock host address */
diff --git a/migration/multifd.c b/migration/multifd.c
index 518f9de723..eca76e2c18 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -288,6 +288,8 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 
 packet->offset[i] = cpu_to_be64(temp);
 }
+
+p->packets_sent++;
 }
 
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
@@ -335,6 +337,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 
 p->next_packet_size = be32_to_cpu(packet->next_packet_size);
 p->packet_num = be64_to_cpu(packet->packet_num);
+p->packets_recved++;
 
 if (p->normal_num == 0) {
 return 0;
@@ -688,8 +691,6 @@ static void *multifd_send_thread(void *opaque)
 ret = -1;
 goto out;
 }
-/* initial packet */
-p->num_packets = 1;
 
 while (true) {
 qemu_sem_post(&multifd_send_state->channels_ready);
@@ -719,7 +720,6 @@ static void *multifd_send_thread(void *opaque)
 }
 
 multifd_send_fill_packet(p);
-p->num_packets++;
 p->total_normal_pages += pages->num;
 trace_multifd_send(p->id, packet_num, pages->num, p->flags,
p->next_packet_size);
@@ -787,7 +787,7 @@ out:
 
 rcu_unregister_thread();
 migration_threads_remove(thread);
-trace_multifd_send_thread_end(p->id, p->num_packets, 
p->total_normal_pages);
+trace_multifd_send_thread_end(p->id, p->packets_sent, 
p->total_normal_pages);
 
 return NULL;
 }
@@ -1124,7 +1124,6 @@ static void *multifd_recv_thread(void *opaque)
 p->flags &= ~MULTIFD_FLAG_SYNC;
 trace_multifd_recv(p->id, p->packet_num, p->normal_num, flags,
p->next_packet_size);
-p->num_packets++;
 p->total_normal_pages += p->normal_num;
 qemu_mutex_unlock(&p->mutex);
 
@@ -1150,7 +1149,7 @@ static void *multifd_recv_thread(void *opaque)
 qemu_mutex_unlock(&p->mutex);
 
 rcu_unregister_thread();
-trace_multifd_recv_thread_end(p->id, p->num_packets, 
p->total_normal_pages);
+trace_multifd_recv_thread_end(p->id, p->packets_recved, 
p->total_normal_pages);
 
 return NULL;
 }
@@ -1252,8 +1251,6 @@ void multifd_recv_new_channel(QIOChannel *ioc, Error **errp)
 }
 p->c = ioc;
 object_ref(OBJECT(ioc));
-/* initial packet */
-p->num_packets = 1;
 
 p->running = true;
 qemu_thread_create(&p->thread, p->name, multifd_recv_thread, p,
-- 
2.43.0




[PULL 17/34] migration/multifd: Change retval of multifd_queue_page()

2024-02-07 Thread peterx
From: Peter Xu 

Using int is an overkill when there're only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-17-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h | 2 +-
 migration/multifd.c | 9 +
 migration/ram.c | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 34a2ecb9f4..a320c53a6f 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -22,7 +22,7 @@ bool multifd_recv_all_channels_created(void);
 void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
 int multifd_send_sync_main(void);
-int multifd_queue_page(RAMBlock *block, ram_addr_t offset);
+bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
 
 /* Multifd Compression flags */
 #define MULTIFD_FLAG_SYNC (1 << 0)
diff --git a/migration/multifd.c b/migration/multifd.c
index ba86f9dda5..12e587fda8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -505,7 +505,8 @@ static int multifd_send_pages(void)
 return 1;
 }
 
-int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
+/* Returns true if enqueue successful, false otherwise */
+bool multifd_queue_page(RAMBlock *block, ram_addr_t offset)
 {
 MultiFDPages_t *pages = multifd_send_state->pages;
 bool changed = false;
@@ -519,21 +520,21 @@ int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
 pages->num++;
 
 if (pages->num < pages->allocated) {
-return 1;
+return true;
 }
 } else {
 changed = true;
 }
 
 if (multifd_send_pages() < 0) {
-return -1;
+return false;
 }
 
 if (changed) {
 return multifd_queue_page(block, offset);
 }
 
-return 1;
+return true;
 }
 
 /* Multifd send side hit an error; remember it and prepare to quit */
diff --git a/migration/ram.c b/migration/ram.c
index d5b7cd5ac2..4649a81204 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1252,7 +1252,7 @@ static int ram_save_page(RAMState *rs, PageSearchStatus 
*pss)
 
 static int ram_save_multifd_page(RAMBlock *block, ram_addr_t offset)
 {
-if (multifd_queue_page(block, offset) < 0) {
+if (!multifd_queue_page(block, offset)) {
 return -1;
 }
 stat64_add(_stats.normal_pages, 1);
-- 
2.43.0




[PULL 28/34] migration/multifd: Move multifd_send_setup error handling in to the function

2024-02-07 Thread peterx
From: Fabiano Rosas 

Hide the error handling inside multifd_send_setup to make it cleaner
for the next patch to move the function around.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-4-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.h   |  2 +-
 migration/migration.c |  6 +-
 migration/multifd.c   | 24 +---
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 7881980ee6..8a1cad0996 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -13,7 +13,7 @@
 #ifndef QEMU_MIGRATION_MULTIFD_H
 #define QEMU_MIGRATION_MULTIFD_H
 
-int multifd_send_setup(Error **errp);
+bool multifd_send_setup(void);
 void multifd_send_shutdown(void);
 int multifd_recv_setup(Error **errp);
 void multifd_recv_cleanup(void);
diff --git a/migration/migration.c b/migration/migration.c
index b427be8762..6432a81e8b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3635,11 +3635,7 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 return;
 }
 
-if (multifd_send_setup(&local_err) != 0) {
-migrate_set_error(s, local_err);
-error_report_err(local_err);
-migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-  MIGRATION_STATUS_FAILED);
+if (!multifd_send_setup()) {
 migrate_fd_cleanup(s);
 return;
 }
diff --git a/migration/multifd.c b/migration/multifd.c
index e6ac1ad6dc..cf865edba0 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -985,14 +985,16 @@ static void multifd_new_send_channel_create(gpointer 
opaque)
 socket_send_channel_create(multifd_new_send_channel_async, opaque);
 }
 
-int multifd_send_setup(Error **errp)
+bool multifd_send_setup(void)
 {
-int thread_count;
+MigrationState *s = migrate_get_current();
+Error *local_err = NULL;
+int thread_count, ret = 0;
 uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
 uint8_t i;
 
 if (!migrate_multifd()) {
-return 0;
+return true;
 }
 
 thread_count = migrate_multifd_channels();
@@ -1026,14 +1028,22 @@ int multifd_send_setup(Error **errp)
 
 for (i = 0; i < thread_count; i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
-int ret;
 
-ret = multifd_send_state->ops->send_setup(p, errp);
+ret = multifd_send_state->ops->send_setup(p, &local_err);
 if (ret) {
-return ret;
+break;
 }
 }
-return 0;
+
+if (ret) {
+migrate_set_error(s, local_err);
+error_report_err(local_err);
+migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+  MIGRATION_STATUS_FAILED);
+return false;
+}
+
+return true;
 }
 
 struct {
-- 
2.43.0




[PULL 11/34] migration/multifd: Move total_normal_pages accounting

2024-02-07 Thread peterx
From: Peter Xu 

Just like the previous patch, move the accounting for total_normal_pages on
both src/dst sides into the packet fill/unfill procedures.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-11-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index eca76e2c18..94a0124934 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -290,6 +290,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 }
 
 p->packets_sent++;
+p->total_normal_pages += pages->num;
 }
 
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
@@ -338,6 +339,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, 
Error **errp)
 p->next_packet_size = be32_to_cpu(packet->next_packet_size);
 p->packet_num = be64_to_cpu(packet->packet_num);
 p->packets_recved++;
+p->total_normal_pages += p->normal_num;
 
 if (p->normal_num == 0) {
 return 0;
@@ -720,7 +722,6 @@ static void *multifd_send_thread(void *opaque)
 }
 
 multifd_send_fill_packet(p);
-p->total_normal_pages += pages->num;
 trace_multifd_send(p->id, packet_num, pages->num, p->flags,
p->next_packet_size);
 
@@ -1124,7 +1125,6 @@ static void *multifd_recv_thread(void *opaque)
 p->flags &= ~MULTIFD_FLAG_SYNC;
 trace_multifd_recv(p->id, p->packet_num, p->normal_num, flags,
p->next_packet_size);
-p->total_normal_pages += p->normal_num;
 qemu_mutex_unlock(>mutex);
 
 if (p->normal_num) {
-- 
2.43.0




[PULL 34/34] ci: Update comment for migration-compat-aarch64

2024-02-07 Thread peterx
From: Peter Xu 

It turns out that we may not be able to enable this test even for the
upcoming v9.0.  Document what we're still missing.

Reviewed-by: "Daniel P. Berrangé" 
Link: https://lore.kernel.org/r/20240207005403.242235-4-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 .gitlab-ci.d/buildtest.yml | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index cfe95c1b17..f56df59c94 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -219,9 +219,10 @@ build-previous-qemu:
 - QTEST_QEMU_BINARY_DST=./qemu-system-${TARGET}
   QTEST_QEMU_BINARY=../build/qemu-system-${TARGET} 
./tests/qtest/migration-test
 
-# This job is disabled until we release 9.0. The existing
-# migration-test in 8.2 is broken on aarch64. The fix was already
-# commited, but it will only take effect once 9.0 is out.
+# This job needs to be disabled until we can have an aarch64 CPU model that
+# will both (1) support both KVM and TCG, and (2) provide a stable ABI.
+# Currently only "-cpu max" can provide (1), however it doesn't guarantee
+# (2).  Mark this test skipped until later.
 migration-compat-aarch64:
   extends: .migration-compat-common
   variables:
-- 
2.43.0




[PULL 31/34] migration/multifd: Add a synchronization point for channel creation

2024-02-07 Thread peterx
From: Fabiano Rosas 

It is possible that one of the multifd channels fails to be created at
multifd_new_send_channel_async() while the rest of the channel
creation tasks are still in flight.

This could lead to multifd_save_cleanup() executing the
qemu_thread_join() loop too early and not waiting for the threads
which haven't been created yet, leading to the freeing of resources
that the newly created threads will try to access and crash.

Add a synchronization point after which there will be no attempts at
thread creation and therefore calling multifd_save_cleanup() past that
point will ensure it properly waits for the threads.

A note about performance: Prior to this patch, if a channel took too
long to be established, other channels could finish connecting first
and already start taking load. Now we're bounded by the
slowest-connecting channel.

Reported-by: Avihai Horon 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-7-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 32 ++--
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 3db18dc79e..adfe8c9a0a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -62,6 +62,11 @@ struct {
  * Make it easy for now.
  */
 uintptr_t packet_num;
+/*
+ * Synchronization point past which no more channels will be
+ * created.
+ */
+QemuSemaphore channels_created;
 /* send channels ready */
 QemuSemaphore channels_ready;
 /*
@@ -622,10 +627,6 @@ static void multifd_send_terminate_threads(void)
 
 /*
  * Finally recycle all the threads.
- *
- * TODO: p->running is still buggy, e.g. we can reach here without the
- * corresponding multifd_new_send_channel_async() get invoked yet,
- * then a new thread can even be created after this function returns.
  */
 for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
@@ -670,6 +671,7 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams 
*p, Error **errp)
 
 static void multifd_send_cleanup_state(void)
 {
+qemu_sem_destroy(&multifd_send_state->channels_created);
qemu_sem_destroy(&multifd_send_state->channels_ready);
 g_free(multifd_send_state->params);
 multifd_send_state->params = NULL;
@@ -954,18 +956,26 @@ static void multifd_new_send_channel_async(QIOTask *task, 
gpointer opaque)
 
 if (migrate_channel_requires_tls_upgrade(ioc)) {
ret = multifd_tls_channel_connect(p, ioc, &local_err);
+if (ret) {
+return;
+}
 } else {
ret = multifd_channel_connect(p, ioc, &local_err);
 }
 
+out:
+/*
+ * Here we're not interested whether creation succeeded, only that
+ * it happened at all.
+ */
+qemu_sem_post(&multifd_send_state->channels_created);
+
 if (ret) {
 return;
 }
 
-out:
 trace_multifd_new_send_channel_async_error(p->id, local_err);
 multifd_send_set_error(local_err);
-multifd_send_kick_main(p);
 if (!p->c) {
 /*
  * If no channel has been created, drop the initial
@@ -998,6 +1008,7 @@ bool multifd_send_setup(void)
 multifd_send_state = g_malloc0(sizeof(*multifd_send_state));
 multifd_send_state->params = g_new0(MultiFDSendParams, thread_count);
 multifd_send_state->pages = multifd_pages_init(page_count);
+qemu_sem_init(&multifd_send_state->channels_created, 0);
qemu_sem_init(&multifd_send_state->channels_ready, 0);
qatomic_set(&multifd_send_state->exiting, 0);
 multifd_send_state->ops = multifd_ops[migrate_multifd_compression()];
@@ -1023,6 +1034,15 @@ bool multifd_send_setup(void)
 multifd_new_send_channel_create(p);
 }
 
+/*
+ * Wait until channel creation has started for all channels. The
+ * creation can still fail, but no more channels will be created
+ * past this point.
+ */
+for (i = 0; i < thread_count; i++) {
+qemu_sem_wait(&multifd_send_state->channels_created);
+}
+
 for (i = 0; i < thread_count; i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
 
-- 
2.43.0




[PULL 29/34] migration/multifd: Move multifd_send_setup into migration thread

2024-02-07 Thread peterx
From: Fabiano Rosas 

We currently have an unfavorable situation around multifd channels
creation and the migration thread execution.

We create the multifd channels with qio_channel_socket_connect_async
-> qio_task_run_in_thread, but only connect them at the
multifd_new_send_channel_async callback, called from
qio_task_complete, which is registered as a glib event.

So at multifd_send_setup() we create the channels, but they will only
be actually usable after the whole multifd_send_setup() calling stack
returns back to the main loop. Which means that the migration thread
is already up and running without any possibility for the multifd
channels to be ready on time.

We currently rely on the channels-ready semaphore blocking
multifd_send_sync_main() until channels start to come up and release
it. However there have been bugs recently found when a channel's
creation fails and multifd_send_cleanup() is allowed to run while
other channels are still being created.

Let's start to organize this situation by moving the
multifd_send_setup() call into the migration thread. That way we
unblock the main-loop to dispatch the completion callbacks and
actually have a chance of getting the multifd channels ready for when
the migration thread needs them.

The next patches will deal with the synchronization aspects.

Note that this takes multifd_send_setup() out of the BQL.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240206215118.6171-5-faro...@suse.de
Signed-off-by: Peter Xu 
---
 migration/migration.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6432a81e8b..ab21de2cad 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3327,6 +3327,10 @@ static void *migration_thread(void *opaque)
 object_ref(OBJECT(s));
 update_iteration_initial_status(s);
 
+if (!multifd_send_setup()) {
+goto out;
+}
+
 bql_lock();
 qemu_savevm_state_header(s->to_dst_file);
 bql_unlock();
@@ -3398,6 +3402,7 @@ static void *migration_thread(void *opaque)
 urgent = migration_rate_limit();
 }
 
+out:
 trace_migration_thread_after_loop();
 migration_iteration_finish(s);
 object_unref(OBJECT(s));
@@ -3635,11 +3640,6 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 return;
 }
 
-if (!multifd_send_setup()) {
-migrate_fd_cleanup(s);
-return;
-}
-
 if (migrate_background_snapshot()) {
 qemu_thread_create(>thread, "bg_snapshot",
 bg_migration_thread, s, QEMU_THREAD_JOINABLE);
-- 
2.43.0




[PULL 16/34] migration/multifd: Split multifd_send_terminate_threads()

2024-02-07 Thread peterx
From: Peter Xu 

Split multifd_send_terminate_threads() into two functions:

  - multifd_send_set_error(): used when an error happened on the sender
side, set error and quit state only

  - multifd_send_terminate_threads(): used only by the main thread to kick
all multifd send threads out of sleep, for the last recycling.

Use multifd_send_set_error() in the three old call sites where only the
error will be set.

Use multifd_send_terminate_threads() in the last one where the main thread
will kick the multifd threads at last in multifd_save_cleanup().

Both helpers will need to set quitting=1.

Suggested-by: Fabiano Rosas 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-16-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c| 27 ++-
 migration/trace-events |  2 +-
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 28b54100cd..ba86f9dda5 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -536,10 +536,9 @@ int multifd_queue_page(RAMBlock *block, ram_addr_t offset)
 return 1;
 }
 
-static void multifd_send_terminate_threads(Error *err)
+/* Multifd send side hit an error; remember it and prepare to quit */
+static void multifd_send_set_error(Error *err)
 {
-int i;
-
 /*
  * We don't want to exit each threads twice.  Depending on where
  * we get the error, or if there are two independent errors in two
@@ -550,8 +549,6 @@ static void multifd_send_terminate_threads(Error *err)
 return;
 }
 
-trace_multifd_send_terminate_threads(err != NULL);
-
 if (err) {
 MigrationState *s = migrate_get_current();
 migrate_set_error(s, err);
@@ -563,7 +560,19 @@ static void multifd_send_terminate_threads(Error *err)
   MIGRATION_STATUS_FAILED);
 }
 }
+}
+
+static void multifd_send_terminate_threads(void)
+{
+int i;
+
+trace_multifd_send_terminate_threads();
 
+/*
+ * Tell everyone we're quitting.  No xchg() needed here; we simply
+ * always set it.
+ */
+qatomic_set(&multifd_send_state->exiting, 1);
 for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -586,7 +595,7 @@ void multifd_save_cleanup(void)
 if (!migrate_multifd()) {
 return;
 }
-multifd_send_terminate_threads(NULL);
+multifd_send_terminate_threads();
 for (i = 0; i < migrate_multifd_channels(); i++) {
MultiFDSendParams *p = &multifd_send_state->params[i];
 
@@ -780,7 +789,7 @@ out:
 if (ret) {
 assert(local_err);
 trace_multifd_send_error(p->id);
-multifd_send_terminate_threads(local_err);
+multifd_send_set_error(local_err);
 multifd_send_kick_main(p);
 error_free(local_err);
 }
@@ -816,7 +825,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
 
 trace_multifd_tls_outgoing_handshake_error(ioc, error_get_pretty(err));
 
-multifd_send_terminate_threads(err);
+multifd_send_set_error(err);
 multifd_send_kick_main(p);
 error_free(err);
 }
@@ -898,7 +907,7 @@ static void multifd_new_send_channel_async(QIOTask *task, 
gpointer opaque)
 }
 
 trace_multifd_new_send_channel_async_error(p->id, local_err);
-multifd_send_terminate_threads(local_err);
+multifd_send_set_error(local_err);
 multifd_send_kick_main(p);
 object_unref(OBJECT(ioc));
 error_free(local_err);
diff --git a/migration/trace-events b/migration/trace-events
index de4a743c8a..298ad2b0dd 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -141,7 +141,7 @@ multifd_send_error(uint8_t id) "channel %u"
 multifd_send_sync_main(long packet_num) "packet num %ld"
 multifd_send_sync_main_signal(uint8_t id) "channel %u"
 multifd_send_sync_main_wait(uint8_t id) "channel %u"
-multifd_send_terminate_threads(bool error) "error %d"
+multifd_send_terminate_threads(void) ""
 multifd_send_thread_end(uint8_t id, uint64_t packets, uint64_t normal_pages) 
"channel %u packets %" PRIu64 " normal pages %"  PRIu64
 multifd_send_thread_start(uint8_t id) "%u"
 multifd_tls_outgoing_handshake_start(void *ioc, void *tioc, const char 
*hostname) "ioc=%p tioc=%p hostname=%s"
-- 
2.43.0




[PULL 22/34] migration/multifd: Stick with send/recv on function names

2024-02-07 Thread peterx
From: Peter Xu 

Most of the multifd code uses send/recv to represent the two sides, but
some rare cases use save/load.

Since send/recv is the majority, replace the save/load use cases with
send/recv globally.  Now we have a consensus on the naming.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-22-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h   | 10 +-
 migration/migration.c | 12 ++--
 migration/multifd.c   | 10 +-
 3 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index a320c53a6f..9b40a53cb6 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -13,11 +13,11 @@
 #ifndef QEMU_MIGRATION_MULTIFD_H
 #define QEMU_MIGRATION_MULTIFD_H
 
-int multifd_save_setup(Error **errp);
-void multifd_save_cleanup(void);
-int multifd_load_setup(Error **errp);
-void multifd_load_cleanup(void);
-void multifd_load_shutdown(void);
+int multifd_send_setup(Error **errp);
+void multifd_send_shutdown(void);
+int multifd_recv_setup(Error **errp);
+void multifd_recv_cleanup(void);
+void multifd_recv_shutdown(void);
 bool multifd_recv_all_channels_created(void);
 void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
diff --git a/migration/migration.c b/migration/migration.c
index b574e66f7b..9b695685b1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -312,7 +312,7 @@ void migration_incoming_state_destroy(void)
 {
 struct MigrationIncomingState *mis = migration_incoming_get_current();
 
-multifd_load_cleanup();
+multifd_recv_cleanup();
 compress_threads_load_cleanup();
 
 if (mis->to_src_file) {
@@ -663,7 +663,7 @@ static void process_incoming_migration_bh(void *opaque)
 
 trace_vmstate_downtime_checkpoint("dst-precopy-bh-announced");
 
-multifd_load_shutdown();
+multifd_recv_shutdown();
 
 dirty_bitmap_mig_before_vm_start();
 
@@ -760,7 +760,7 @@ fail:
   MIGRATION_STATUS_FAILED);
 qemu_fclose(mis->from_src_file);
 
-multifd_load_cleanup();
+multifd_recv_cleanup();
 compress_threads_load_cleanup();
 
 exit(EXIT_FAILURE);
@@ -886,7 +886,7 @@ void migration_ioc_process_incoming(QIOChannel *ioc, Error 
**errp)
 default_channel = !mis->from_src_file;
 }
 
-if (multifd_load_setup(errp) != 0) {
+if (multifd_recv_setup(errp) != 0) {
 return;
 }
 
@@ -1332,7 +1332,7 @@ static void migrate_fd_cleanup(MigrationState *s)
 }
 bql_lock();
 
-multifd_save_cleanup();
+multifd_send_shutdown();
qemu_mutex_lock(&s->qemu_file_lock);
 tmp = s->to_dst_file;
 s->to_dst_file = NULL;
@@ -3630,7 +3630,7 @@ void migrate_fd_connect(MigrationState *s, Error 
*error_in)
 return;
 }
 
-if (multifd_save_setup(&local_err) != 0) {
+if (multifd_send_setup(&local_err) != 0) {
 migrate_set_error(s, local_err);
 error_report_err(local_err);
migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
diff --git a/migration/multifd.c b/migration/multifd.c
index e2dd2f6e04..130f86a1fb 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -663,7 +663,7 @@ static void multifd_send_cleanup_state(void)
 multifd_send_state = NULL;
 }
 
-void multifd_save_cleanup(void)
+void multifd_send_shutdown(void)
 {
 int i;
 
@@ -965,7 +965,7 @@ static void multifd_new_send_channel_create(gpointer opaque)
 socket_send_channel_create(multifd_new_send_channel_async, opaque);
 }
 
-int multifd_save_setup(Error **errp)
+int multifd_send_setup(Error **errp)
 {
 int thread_count;
 uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
@@ -1063,7 +1063,7 @@ static void multifd_recv_terminate_threads(Error *err)
 }
 }
 
-void multifd_load_shutdown(void)
+void multifd_recv_shutdown(void)
 {
 if (migrate_multifd()) {
 multifd_recv_terminate_threads(NULL);
@@ -1098,7 +1098,7 @@ static void multifd_recv_cleanup_state(void)
 multifd_recv_state = NULL;
 }
 
-void multifd_load_cleanup(void)
+void multifd_recv_cleanup(void)
 {
 int i;
 
@@ -1213,7 +1213,7 @@ static void *multifd_recv_thread(void *opaque)
 return NULL;
 }
 
-int multifd_load_setup(Error **errp)
+int multifd_recv_setup(Error **errp)
 {
 int thread_count;
 uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
-- 
2.43.0




[PULL 02/34] migration/multifd: Drop stale comment for multifd zero copy

2024-02-07 Thread peterx
From: Peter Xu 

We've already done that with multifd_flush_after_each_section, for multifd
in general.  Drop the stale "TODO-like" comment.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-2-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 25cbc6dc6b..eee2586770 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -598,17 +598,6 @@ int multifd_send_sync_main(void)
 }
 }
 
-/*
- * When using zero-copy, it's necessary to flush the pages before any of
- * the pages can be sent again, so we'll make sure the new version of the
- * pages will always arrive _later_ than the old pages.
- *
- * Currently we achieve this by flushing the zero-page requested writes
- * per ram iteration, but in the future we could potentially optimize it
- * to be less frequent, e.g. only after we finished one whole scanning of
- * all the dirty bitmaps.
- */
-
 flush_zero_copy = migrate_zero_copy_send();
 
 for (i = 0; i < migrate_multifd_channels(); i++) {
-- 
2.43.0




[PULL 32/34] tests/migration-test: Stick with gicv3 in aarch64 test

2024-02-07 Thread peterx
From: Peter Xu 

Recently we introduced the cross-binary migration test.  We always want
migration-test to use a stable guest ABI for both QEMU binaries in this
case, so that both QEMU binaries will be compatible on the migration
stream with the cmdline specified.

Switch to a static gic version "3" rather than using version "max", so that
GIC should be stable now across any future QEMU binaries for migration-test.

Here the version can actually be anything as long as the ABI is stable.  We
choose "3" because it's the majority of what we already use in QEMU while
still new enough: "git grep gic-version=3" shows 6 hits, while version 4 has
no direct user yet besides "max".

Note that even with this change, aarch64 won't be able to work yet with
migration cross binary test, but then the only missing piece will be the
stable CPU model.

Reviewed-by: "Daniel P. Berrangé" 
Link: https://lore.kernel.org/r/20240207005403.242235-2-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 7675519cfa..8a5bb1752e 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -819,7 +819,7 @@ static int test_migrate_start(QTestState **from, QTestState 
**to,
 } else if (strcmp(arch, "aarch64") == 0) {
 memory_size = "150M";
 machine_alias = "virt";
-machine_opts = "gic-version=max";
+machine_opts = "gic-version=3";
 arch_opts = g_strdup_printf("-cpu max -kernel %s", bootpath);
 start_address = ARM_TEST_MEM_START;
 end_address = ARM_TEST_MEM_END;
-- 
2.43.0




[PULL 03/34] migration/multifd: multifd_send_kick_main()

2024-02-07 Thread peterx
From: Peter Xu 

When a multifd sender thread hits an error, it always needs to kick the main
thread by posting all the semaphores that it may be waiting on.

Provide a helper for it and deduplicate the code.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-3-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index eee2586770..b8d2c96533 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -372,6 +372,18 @@ struct {
 MultiFDMethods *ops;
 } *multifd_send_state;
 
+/*
+ * The migration thread can wait on either of the two semaphores.  This
+ * function can be used to kick the main thread out of waiting on either of
+ * them.  Should mostly only be called when something wrong happened with
+ * the current multifd send thread.
+ */
+static void multifd_send_kick_main(MultiFDSendParams *p)
+{
+qemu_sem_post(&p->sem_sync);
+qemu_sem_post(&multifd_send_state->channels_ready);
+}
+
 /*
  * How we use multifd_send_state->pages and channel->pages?
  *
@@ -739,8 +751,7 @@ out:
 assert(local_err);
 trace_multifd_send_error(p->id);
 multifd_send_terminate_threads(local_err);
-qemu_sem_post(&p->sem_sync);
-qemu_sem_post(&multifd_send_state->channels_ready);
+multifd_send_kick_main(p);
 error_free(local_err);
 }
 
@@ -781,8 +792,7 @@ static void multifd_tls_outgoing_handshake(QIOTask *task,
  * is not created, and then tell who pay attention to me.
  */
 p->quit = true;
-qemu_sem_post(&multifd_send_state->channels_ready);
-qemu_sem_post(&p->sem_sync);
+multifd_send_kick_main(p);
 error_free(err);
 }
 
@@ -852,8 +862,7 @@ static void 
multifd_new_send_channel_cleanup(MultiFDSendParams *p,
 {
  migrate_set_error(migrate_get_current(), err);
  /* Error happen, we need to tell who pay attention to me */
- qemu_sem_post(&multifd_send_state->channels_ready);
- qemu_sem_post(&p->sem_sync);
+ multifd_send_kick_main(p);
  /*
   * Although multifd_send_thread is not created, but main migration
   * thread need to judge whether it is running, so we need to mark
-- 
2.43.0




[PULL 14/34] migration/multifd: Move header prepare/fill into send_prepare()

2024-02-07 Thread peterx
From: Peter Xu 

This patch redefines the interfacing of ->send_prepare().  It further
simplifies multifd_send_thread() especially on zero copy.

Now with the new interface, we require the hook to do all the work for
preparing the IOVs to send.  After it's completed, the IOVs should be ready
to be dumped into the specific multifd QIOChannel later.

So now the API looks like:

  p->pages --->  send_prepare() -> IOVs

This also prepares for the case where the input can be extended beyond
p->pages.  But that's for later.

This patch will achieve similar goal of what Fabiano used to propose here:

https://lore.kernel.org/r/20240126221943.26628-1-faro...@suse.de

However the send() interface may not be necessary.  I'm boldly attaching a
"Co-developed-by" for Fabiano.

Co-developed-by: Fabiano Rosas 
Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-14-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.h  |  1 +
 migration/multifd-zlib.c |  4 +++
 migration/multifd-zstd.c |  4 +++
 migration/multifd.c  | 61 ++--
 4 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 4ec005f53f..34a2ecb9f4 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -208,6 +208,7 @@ typedef struct {
 } MultiFDMethods;
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
+void multifd_send_fill_packet(MultiFDSendParams *p);
 
 static inline void multifd_send_prepare_header(MultiFDSendParams *p)
 {
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 100809abc1..012e3bdea1 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -123,6 +123,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
 int ret;
 uint32_t i;
 
+multifd_send_prepare_header(p);
+
 for (i = 0; i < pages->num; i++) {
 uint32_t available = z->zbuff_len - out_size;
 int flush = Z_NO_FLUSH;
@@ -172,6 +174,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
 p->next_packet_size = out_size;
 p->flags |= MULTIFD_FLAG_ZLIB;
 
+multifd_send_fill_packet(p);
+
 return 0;
 }
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 2023edd8cc..dc8fe43e94 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -118,6 +118,8 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
**errp)
 int ret;
 uint32_t i;
 
+multifd_send_prepare_header(p);
+
 z->out.dst = z->zbuff;
 z->out.size = z->zbuff_len;
 z->out.pos = 0;
@@ -161,6 +163,8 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
**errp)
 p->next_packet_size = z->out.pos;
 p->flags |= MULTIFD_FLAG_ZSTD;
 
+multifd_send_fill_packet(p);
+
 return 0;
 }
 
diff --git a/migration/multifd.c b/migration/multifd.c
index cd4467aff4..6aa44340de 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -50,15 +50,15 @@ typedef struct {
 /**
  * nocomp_send_setup: setup send side
  *
- * For no compression this function does nothing.
- *
- * Returns 0 for success or -1 for error
- *
  * @p: Params for the channel that we are using
  * @errp: pointer to an error
  */
 static int nocomp_send_setup(MultiFDSendParams *p, Error **errp)
 {
+if (migrate_zero_copy_send()) {
+p->write_flags |= QIO_CHANNEL_WRITE_FLAG_ZERO_COPY;
+}
+
 return 0;
 }
 
@@ -88,7 +88,17 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  */
 static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
 {
+bool use_zero_copy_send = migrate_zero_copy_send();
 MultiFDPages_t *pages = p->pages;
+int ret;
+
+if (!use_zero_copy_send) {
+/*
+ * Only !zerocopy needs the header in IOV; zerocopy will
+ * send it separately.
+ */
+multifd_send_prepare_header(p);
+}
 
 for (int i = 0; i < pages->num; i++) {
 p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
@@ -98,6 +108,18 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error 
**errp)
 
 p->next_packet_size = pages->num * p->page_size;
 p->flags |= MULTIFD_FLAG_NOCOMP;
+
+multifd_send_fill_packet(p);
+
+if (use_zero_copy_send) {
+/* Send header first, without zerocopy */
+ret = qio_channel_write_all(p->c, (void *)p->packet,
+p->packet_len, errp);
+if (ret != 0) {
+return -1;
+}
+}
+
 return 0;
 }
 
@@ -266,7 +288,7 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
 g_free(pages);
 }
 
-static void multifd_send_fill_packet(MultiFDSendParams *p)
+void multifd_send_fill_packet(MultiFDSendParams *p)
 {
 MultiFDPacket_t *packet = p->packet;
 MultiFDPages_t *pages = p->pages;
@@ -688,7 +710,6 @@ static void *multifd_send_thread(void *opaque)
 MigrationThread *thread = 

[PULL 09/34] migration/multifd: Drop pages->num check in sender thread

2024-02-07 Thread peterx
From: Peter Xu 

Now with a split SYNC handler, we always have pages->num set for
pending_job==true.  Assert it instead.

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-9-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 4d5a01ed93..518f9de723 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -710,13 +710,14 @@ static void *multifd_send_thread(void *opaque)
 p->iovs_num = 1;
 }
 
-if (pages->num) {
-ret = multifd_send_state->ops->send_prepare(p, &local_err);
-if (ret != 0) {
-qemu_mutex_unlock(&p->mutex);
-break;
-}
+assert(pages->num);
+
+ret = multifd_send_state->ops->send_prepare(p, &local_err);
+if (ret != 0) {
+qemu_mutex_unlock(&p->mutex);
+break;
 }
+
 multifd_send_fill_packet(p);
 p->num_packets++;
 p->total_normal_pages += pages->num;
-- 
2.43.0




[PULL 00/34] Migration staging patches

2024-02-07 Thread peterx
From: Peter Xu 

The following changes since commit 39a6e4f87e7b75a45b08d6dc8b8b7c2954c87440:

  Merge tag 'pull-qapi-2024-02-03' of https://repo.or.cz/qemu/armbru into 
staging (2024-02-03 13:31:58 +)

are available in the Git repository at:

  https://gitlab.com/peterx/qemu.git tags/migration-staging-pull-request

for you to fetch changes up to 940bf8ff1ca82aa458c553d9aa9dd7671ed15a4d:

  ci: Update comment for migration-compat-aarch64 (2024-02-07 10:51:27 +0800)


Migration pull

- William's fix on hwpoison migration which used to crash QEMU
- Peter's multifd cleanup + bugfix + optimizations
- Avihai's fix on multifd crash over non-socket channels
- Fabiano's multifd thread-race fix
- Peter's CI fix series



Avihai Horon (1):
  migration: Fix logic of channels and transport compatibility check

Fabiano Rosas (6):
  migration/multifd: Join the TLS thread
  migration/multifd: Remove p->running
  migration/multifd: Move multifd_send_setup error handling in to the
function
  migration/multifd: Move multifd_send_setup into migration thread
  migration/multifd: Unify multifd and TLS connection paths
  migration/multifd: Add a synchronization point for channel creation

Peter Xu (26):
  migration/multifd: Drop stale comment for multifd zero copy
  migration/multifd: multifd_send_kick_main()
  migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
  migration/multifd: Postpone reset of MultiFDPages_t
  migration/multifd: Drop MultiFDSendParams.normal[] array
  migration/multifd: Separate SYNC request with normal jobs
  migration/multifd: Simplify locking in sender thread
  migration/multifd: Drop pages->num check in sender thread
  migration/multifd: Rename p->num_packets and clean it up
  migration/multifd: Move total_normal_pages accounting
  migration/multifd: Move trace_multifd_send|recv()
  migration/multifd: multifd_send_prepare_header()
  migration/multifd: Move header prepare/fill into send_prepare()
  migration/multifd: Forbid spurious wakeups
  migration/multifd: Split multifd_send_terminate_threads()
  migration/multifd: Change retval of multifd_queue_page()
  migration/multifd: Change retval of multifd_send_pages()
  migration/multifd: Rewrite multifd_queue_page()
  migration/multifd: Cleanup multifd_save_cleanup()
  migration/multifd: Cleanup multifd_load_cleanup()
  migration/multifd: Stick with send/recv on function names
  migration/multifd: Fix MultiFDSendParams.packet_num race
  migration/multifd: Optimize sender side to be lockless
  tests/migration-test: Stick with gicv3 in aarch64 test
  ci: Remove tag dependency for build-previous-qemu
  ci: Update comment for migration-compat-aarch64

William Roche (1):
  migration: prevent migration when VM has poisoned memory

 include/sysemu/kvm.h |   6 +
 migration/multifd.h  |  59 +--
 accel/kvm/kvm-all.c  |  10 +
 accel/stubs/kvm-stub.c   |   5 +
 migration/migration.c|  48 ++-
 migration/multifd-zlib.c |  11 +-
 migration/multifd-zstd.c |  11 +-
 migration/multifd.c  | 778 ---
 migration/ram.c  |   2 +-
 tests/qtest/migration-test.c |   2 +-
 .gitlab-ci.d/buildtest.yml   |   9 +-
 migration/trace-events   |   2 +-
 12 files changed, 547 insertions(+), 396 deletions(-)

-- 
2.43.0




[PULL 05/34] migration/multifd: Postpone reset of MultiFDPages_t

2024-02-07 Thread peterx
From: Peter Xu 

Now we reset MultiFDPages_t object in the multifd sender thread in the
middle of the sending job.  That's not necessary, because the "*pages"
struct will not be reused anyway until pending_job is cleared.

Move that to the end after the job is completed, provide a helper to reset
a "*pages" object.  Use that same helper when free the object too.

This prepares us to keep using p->pages in the follow up patches, where we
may drop p->normal[].

Reviewed-by: Fabiano Rosas 
Link: https://lore.kernel.org/r/20240202102857.110210-5-pet...@redhat.com
Signed-off-by: Peter Xu 
---
 migration/multifd.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 2c98023d67..5633ac245a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -172,6 +172,17 @@ void multifd_register_ops(int method, MultiFDMethods *ops)
 multifd_ops[method] = ops;
 }
 
+/* Reset a MultiFDPages_t* object for the next use */
+static void multifd_pages_reset(MultiFDPages_t *pages)
+{
+/*
+ * We don't need to touch offset[] array, because it will be
+ * overwritten later when reused.
+ */
+pages->num = 0;
+pages->block = NULL;
+}
+
 static int multifd_send_initial_packet(MultiFDSendParams *p, Error **errp)
 {
 MultiFDInit_t msg = {};
@@ -248,9 +259,8 @@ static MultiFDPages_t *multifd_pages_init(uint32_t n)
 
 static void multifd_pages_clear(MultiFDPages_t *pages)
 {
-pages->num = 0;
+multifd_pages_reset(pages);
 pages->allocated = 0;
-pages->block = NULL;
 g_free(pages->offset);
 pages->offset = NULL;
 g_free(pages);
@@ -704,8 +714,6 @@ static void *multifd_send_thread(void *opaque)
 p->flags = 0;
 p->num_packets++;
 p->total_normal_pages += p->normal_num;
-p->pages->num = 0;
-p->pages->block = NULL;
 qemu_mutex_unlock(&p->mutex);
 
 trace_multifd_send(p->id, packet_num, p->normal_num, flags,
@@ -732,6 +740,8 @@ static void *multifd_send_thread(void *opaque)
 
 stat64_add(&mig_stats.multifd_bytes,
p->next_packet_size + p->packet_len);
+
+multifd_pages_reset(p->pages);
 p->next_packet_size = 0;
 qemu_mutex_lock(&p->mutex);
 p->pending_job--;
-- 
2.43.0




[PULL 01/34] migration: prevent migration when VM has poisoned memory

2024-02-07 Thread peterx
From: William Roche 

A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page
#5  ram_save_target_page_legacy
#6  ram_save_host_page
#7  ram_find_and_save_block
#8  ram_save_iterate
#9  qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start

To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.

Signed-off-by: William Roche 
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.ro...@oracle.com
Signed-off-by: Peter Xu 
---
 include/sysemu/kvm.h   |  6 ++
 accel/kvm/kvm-all.c| 10 ++
 accel/stubs/kvm-stub.c |  5 +
 migration/migration.c  |  7 +++
 4 files changed, 28 insertions(+)

diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index d614878164..fad9a7e8ff 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -538,4 +538,10 @@ bool kvm_arch_cpu_check_are_resettable(void);
 bool kvm_dirty_ring_enabled(void);
 
 uint32_t kvm_dirty_ring_size(void);
+
+/**
+ * kvm_hwpoisoned_mem - indicate if there is any hwpoisoned page
+ * reported for the VM.
+ */
+bool kvm_hwpoisoned_mem(void);
 #endif
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 49e755ec4a..a8cecd040e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1119,6 +1119,11 @@ int kvm_vm_check_extension(KVMState *s, unsigned int extension)
 return ret;
 }
 
+/*
+ * We track the poisoned pages to be able to:
+ * - replace them on VM reset
+ * - block a migration for a VM with a poisoned page
+ */
 typedef struct HWPoisonPage {
 ram_addr_t ram_addr;
 QLIST_ENTRY(HWPoisonPage) list;
@@ -1152,6 +1157,11 @@ void kvm_hwpoison_page_add(ram_addr_t ram_addr)
 QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);
 }
 
+bool kvm_hwpoisoned_mem(void)
+{
+return !QLIST_EMPTY(&hwpoison_page_list);
+}
+
 static uint32_t adjust_ioeventfd_endianness(uint32_t val, uint32_t size)
 {
 #if HOST_BIG_ENDIAN != TARGET_BIG_ENDIAN
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 1b37d9a302..ca38172884 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -124,3 +124,8 @@ uint32_t kvm_dirty_ring_size(void)
 {
 return 0;
 }
+
+bool kvm_hwpoisoned_mem(void)
+{
+return false;
+}
diff --git a/migration/migration.c b/migration/migration.c
index d5f705ceef..b574e66f7b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -67,6 +67,7 @@
 #include "options.h"
 #include "sysemu/dirtylimit.h"
 #include "qemu/sockets.h"
+#include "sysemu/kvm.h"
 
 static NotifierList migration_state_notifiers =
 NOTIFIER_LIST_INITIALIZER(migration_state_notifiers);
@@ -1906,6 +1907,12 @@ static bool migrate_prepare(MigrationState *s, bool blk, bool blk_inc,
 return false;
 }
 
+if (kvm_hwpoisoned_mem()) {
+error_setg(errp, "Can't migrate this vm with hardware poisoned memory, "
+   "please reboot the vm and try again");
+return false;
+}
+
 if (migration_is_blocked(errp)) {
 return false;
 }
-- 
2.43.0




Re: [PATCH v3 0/6] migration/multifd: Fix channel creation vs. cleanup races

2024-02-07 Thread Peter Xu
On Tue, Feb 06, 2024 at 06:51:12PM -0300, Fabiano Rosas wrote:
> Based-on: 20240202102857.110210-1-pet...@redhat.com
> [PATCH v2 00/23] migration/multifd: Refactor ->send_prepare() and cleanups
> https://lore.kernel.org/r/20240202102857.110210-1-pet...@redhat.com
> 
> Hi,
> 
> For v3 I fixed the refcounting issue spotted by Avihai. The situation
> there is a bit clunky due to historical reasons. The gist is that we
> have an assumption that channel creation never fails after p->c has
> been set, so when 'p->c == NULL' we have to unref and when 'p->c !=
> NULL' the cleanup code will do the unref.
> 
> CI run: https://gitlab.com/farosas/qemu/-/pipelines/1166889341

Apologies if I queued this too fast, but I'll disappear tomorrow, so I want
to have this thread race fixed soon.  I hope it already covers all the races
that can happen; if not, we can work on top.

queued, thanks.

-- 
Peter Xu




Re: [External] Re: Re: [PATCH 2/6] migration/multifd: Add zero pages and zero bytes counter to migration status interface.

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 03:44:18PM -0800, Hao Xiang wrote:
> On Wed, Feb 7, 2024 at 12:41 AM Jiri Denemark  wrote:
> >
> > On Wed, Feb 07, 2024 at 12:37:15 +0800, Peter Xu wrote:
> > > On Wed, Feb 07, 2024 at 12:13:10PM +0800, Peter Xu wrote:
> > > > On Tue, Feb 06, 2024 at 11:19:04PM +, Hao Xiang wrote:
> > > > > This change extends the MigrationStatus interface to track zero pages
> > > > > and zero bytes counter.
> > > > >
> > > > > Signed-off-by: Hao Xiang 
> > > >
> > > > Reviewed-by: Peter Xu 
> > >
> > > I'll need to scratch this, sorry..
> > >
> > > The issue is I forgot we have "duplicate" which is exactly "zero
> > > page"s.. See:
> > >
> > > info->ram->duplicate = stat64_get(&mig_stats.zero_pages);
> > >
> > > If you think the name too confusing and want a replacement, maybe it's 
> > > fine
> > > and maybe we can do that.  Then we can keep this zero page counter
> > > introduced, reporting the same value as duplicates, then with a follow up
> > > patch to deprecate "duplicate" parameter.  See an example on how to
> > > deprecate in 7b24d326348e1672.
> > >
> > > One thing I'm not sure is whether Libvirt will be fine on losing
> > > "duplicates" after 2+ QEMU major releases.  Copy Jiri for this.  My
> > > understanding is that Libvirt should be keeping an eye on deprecation list
> > > and react, but I'd like to double check..
> >
> > This should not be a big deal as we can internally map either one
> > (depending on what QEMU supports) to the same libvirt's field. AFAIK
> > there is a consensus on Cc-ing libvirt-devel on patches that deprecate

I see.

> > QEMU interfaces so that we can update our code in time before the
> > deprecated interface is dropped.

Right.

What mostly worries me is "old libvirt" + "new qemu", where the old libvirt
only knows "duplicates", while the new QEMU (after 2 releases) will only
report "zeros".

> >
> > BTW, libvirt maps "duplicate" to:
> >
> > /**
> >  * VIR_DOMAIN_JOB_MEMORY_CONSTANT:
> >  *
> >  * virDomainGetJobStats field: number of pages filled with a constant
> >  * byte (all bytes in a single page are identical) transferred since the
> >  * beginning of the migration job, as VIR_TYPED_PARAM_ULLONG.
> >  *
> >  * The most common example of such pages are zero pages, i.e., pages filled
> >  * with zero bytes.
> >  *
> >  * Since: 1.0.3
> >  */
> > # define VIR_DOMAIN_JOB_MEMORY_CONSTANT  "memory_constant"
> >
> > Jirka
> >
> 
> Interesting. I didn't notice the existence of "duplicate" for zero
> pages. I do think the name is quite confusing. I will create the
> "zero/zero_bytes" counter and a separate commit to deprecate
> "duplicate". Will add libvirt devs per instruction above.

Yeah, please go ahead, and I hope my worry above is not a real concern; we
can figure that out later.  Even without deprecating "duplicate", it may at
least still be worthwhile to start having "zeros" reported alongside.  Then
after 10/20/30/N years we always have a chance to deprecate the other one;
it's just a matter of the compatibility window.

Thanks,

-- 
Peter Xu




Re: [PATCH v2 0/3] ci: Fixes on the recent cross-binary test case

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 08:54:00AM +0800, pet...@redhat.com wrote:
> From: Peter Xu 
> 
> v2:
> - Fix a typo in patch 2 on QEMU_PREV_VERSION
> - Added R-bs for Dan
> 
> Hi,
> 
> This small patchset updates the recent cross-binary test for migration on
> a few things.
> 
> Patch 1 modifies the aarch64 test GIC version to 3 rather than "max",
> paving way for enabling it, even if the CPU model is not yet ready.
> 
> Patch 2 removes the tag dependency of the new build-previous-qemu job, so
> that in personal CI pipelines the job won't fail if the tag is missing, as
> reported by Peter Maydell, and solution suggested by Dan.
> 
> Patch 3 updates the comment for aarch64 on the test to state the fact, and
> what is missing.  Then we don't target it support for v9.0, but only until
> we have a stable CPU model for aarch64 (if ever possible to support both
> tcg and kvm).
> 
> Comments welcomed, thanks.

queued.

-- 
Peter Xu




Re: [External] Re: [PATCH 0/6] Introduce multifd zero page checking.

2024-02-07 Thread Peter Xu
On Wed, Feb 07, 2024 at 04:47:27PM -0800, Hao Xiang wrote:
> Sure I will drop "throughput" to avoid confusion. In my testing, 1
> multifd channel already makes the main thread spin at 100%. So the
> total-time is the same across 1/2/4 multifd channels as long as zero
> page is run on the main migration thread. Of course, this is based on
> the fact that the network is not the bottleneck. One interesting
> finding is that multifd 1 channel with multifd zero page has better
> performance than multifd 1 channel with main migration thread.

It's probably because the main thread has even more work to do than
"detecting zero pages" alone.

When zero detection is done in the main thread and the guest is fully
idle, scanning those pages already consumes a major portion of the main
thread's CPU.  With all pages being zero, the multifd threads should be
fully idle, so n_channels may not matter here.

When one multifd thread is created with zero-page offloading, the zero
page work is fully offloaded from the main thread to the multifd thread,
even if there is only one.  It has a similar effect to forking the main
thread into two threads, so the main thread can be more efficient on
other tasks (fetching/scanning dirty bits, etc.).

Thanks,

-- 
Peter Xu




Re: [External] RE: Regarding to the recent Intel IAA/DSA/QAT support on migration

2024-02-07 Thread Hao Xiang
On Wed, Feb 7, 2024 at 12:38 AM Liu, Yuan1  wrote:
>
> Thank you very much for your reminder and the rapid updates to the
> multifd function. I will incorporate your suggestions into the next
> version (IAA Accelerated Live Migration solution).
>
> Regarding the QAT and DSA optimization, my colleagues and I have
> already started reviewing and testing them, and it seems like a
> promising optimization direction. I am more than willing to contribute
> further efforts to the long-term maintenance of Intel accelerators in
> live migration.
>
> > -Original Message-
> > From: Peter Xu 
> > Sent: Wednesday, February 7, 2024 4:10 PM
> > To: Bryan Zhang ; Hao Xiang
> > ; Liu, Yuan1 
> > Cc: Fabiano Rosas ; QEMU Devel Mailing List  > de...@nongnu.org>
> > Subject: Regarding to the recent Intel IAA/DSA/QAT support on migration
> >
> > Copy qemu-devel.
> >
> > On Wed, Feb 07, 2024 at 04:07:40PM +0800, Peter Xu wrote:
> > > Hi,
> > >
> > > I'm sending this email just to leave a generic comment to the recent
> > > migration efforts to enable these new Intel technologies.
> > >
> > > The relevant patchsets (latest version so far) we're discussing are:
> > >
> > >   [PATCH v3 0/4] Live Migration Acceleration with IAA Compression
> > >
> > > https://lore.kernel.org/r/20240103112851.908082-1-yuan1@intel.com
> > >
> > >   [PATCH v3 00/20] Use Intel DSA accelerator to offload zero page
> > >   checking in multifd live migration.
> > >
> > > https://lore.kernel.org/r/20240104004452.324068-1-hao.xiang@bytedance.com
> > >
> > >   [PATCH 0/5] *** Implement using Intel QAT to offload ZLIB
> > >
> > > https://lore.kernel.org/r/20231231205804.2366509-1-bryan.zhang@bytedance.com
> > >
> > > I want to comment in a generic way since this should apply to all
> > > these
> > > series:
> > >
> > >   - A heads-up that multifd code is rapidly changing recently, I
> > >     apologize that you'll need a rebase.  It's just that it's probably
> > >     much better to do this before anything lands there.
> > >
> > >     IIUC the good thing is we found that send_prepare() doesn't need to
> > >     be changed that much, however there's still some change; please
> > >     refer to the new code (I'll prepare a pull tomorrow to include most
> > >     of the changes, and we should have a major thread race fixed too
> > >     with Fabiano & Avihai's help). I hope this will also provide some
> > >     kind of isolation to e.g. other works that may touch other areas.
> > >     E.g., I hope fixed-ram won't need to conflict much with any of the
> > >     above series now.

Thanks for the update. The rebase shouldn't be that bad so no worries.

> > >
> > >   - When posting the new patchset (if there is a plan..), please make
> > >     sure we have:
> > >
> > >     - Proper unit tests for the new code (probably mostly software
> > >       fallbacks to be tested on the new libraries being introduced;
> > >       just to make sure the new library code paths can get some torture
> > >       please).
> > >
> > >     - Proper documentation for the new code.  Please feel free to start
> > >       creating your own .rst file under docs/devel/migration/, we can
> > >       try to merge them later.  It should help avoid conflicts.  Please
> > >       also link the new file into index.rst there.
> > >
> > >       IMHO the document can contain many things, the important ones
> > >       could start from: who should enable such a feature; what one can
> > >       get from having it enabled; what the HW requirements to enable it
> > >       are; how one should tune the new parameters, and so on... some
> > >       links to the technology behind it would be nice to reference too.
> > >
> > >     - Try to add new code (especially HW/library based) into new files.
> > >       I see that QPL & QAT already proposed their own files
> > >       (multifd-qpl.c, multifd-qatzip.c) which is great.
> > >
> > >   Xiang, please also consider doing so for the DSA based zero page
> > >   detection.  It can be called multifd-zero-page.c, for example, and
> > >   you can create it when working on the
> > >   offload-zero-page-detect-to-multifd patchset already.

Sounds good.

> > >
> > >     - Please provide someone who can potentially maintain this code if
> > >       ever possible.  Pushing this code upstream is great, but
> > >       maintaining it will also take effort.  It might be impractical
> > >       for the migration maintainers (currently Fabiano and myself) if
> > >       this keeps growing, so we may like to have people covering these
> > >       areas, especially when the new code is not directly relevant to
> > >       the migration framework.
> > >
> > >       I'd suggest for each of the projects we can add an entry in
> > >       MAINTAINERS below the "Migration" section, adding relevant files
> > >       (and these files should exist in both the new section and
> > >       "Migration").  I am not sure whether 

Re: [External] Re: [PATCH 1/6] migration/multifd: Add new migration option multifd-zero-page.

2024-02-07 Thread Hao Xiang
On Tue, Feb 6, 2024 at 7:45 PM Peter Xu  wrote:
>
> On Tue, Feb 06, 2024 at 11:19:03PM +, Hao Xiang wrote:
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 819708321d..ff033a0344 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -874,6 +874,11 @@
> >  # @mode: Migration mode. See description in @MigMode. Default is 'normal'.
> >  #(Since 8.2)
> >  #
> > +# @multifd-zero-page: Multifd zero page checking. If the parameter is true,
> > +# zero page checking is done on the multifd sender thread. If the 
> > parameter
> > +# is false, zero page checking is done on the migration main thread. 
> > Default
> > +# is set to true. (Since 9.0)
>
> I replied somewhere before on this, but I can try again..
>
> Do you think it'll be better to introduce a generic parameter for zero page
> detection?
>
>   - "none" if disabled,
>   - "legacy" for main thread,
>   - "multifd" for multifd (software-based).
>
> A string could work, but maybe cleaner to introduce
> @MigrationZeroPageDetector enum?
>
> When you add more, you can keep extending that with the single field
> ("multifd-dsa", etc.).
>
> --
> Peter Xu
>

Sorry I overlooked the previous email. This sounds like a good idea.



Re: [External] Re: [PATCH 0/6] Introduce multifd zero page checking.

2024-02-07 Thread Hao Xiang
On Tue, Feb 6, 2024 at 7:39 PM Peter Xu  wrote:
>
> On Tue, Feb 06, 2024 at 11:19:02PM +, Hao Xiang wrote:
> > This patchset is based on Juan Quintela's old series here
> > https://lore.kernel.org/all/20220802063907.18882-1-quint...@redhat.com/
> >
> > In the multifd live migration model, there is a single migration main
> > thread scanning the page map, queuing the pages to multiple multifd
> > sender threads. The migration main thread runs zero page checking on
> > every page before queuing the page to the sender threads. Zero page
> > checking is a CPU intensive task and hence having a single thread doing
> > all that doesn't scale well. This change introduces a new function
> > to run the zero page checking on the multifd sender threads. This
> > patchset also lays the groundwork for future changes to offload the zero
> > page checking task to accelerator hardware.
> >
> > Use two Intel 4th generation Xeon servers for testing.
> >
> > Architecture:x86_64
> > CPU(s):  192
> > Thread(s) per core:  2
> > Core(s) per socket:  48
> > Socket(s):   2
> > NUMA node(s):2
> > Vendor ID:   GenuineIntel
> > CPU family:  6
> > Model:   143
> > Model name:  Intel(R) Xeon(R) Platinum 8457C
> > Stepping:8
> > CPU MHz: 2538.624
> > CPU max MHz: 3800.
> > CPU min MHz: 800.
> >
> > Perform multifd live migration with below setup:
> > 1. VM has 100GB memory. All pages in the VM are zero pages.
> > 2. Use tcp socket for live migration.
> > 3. Use 4 multifd channels and zero page checking on migration main thread.
> > 4. Use 1/2/4 multifd channels and zero page checking on multifd sender
> > threads.
> > 5. Record migration total time from sender QEMU console's "info migrate"
> > command.
> > 6. Calculate throughput with "100GB / total time".
> >
> > +--------------------+----------------+------------------+
> > | zero-page-checking | total-time(ms) | throughput(GB/s) |
> > +--------------------+----------------+------------------+
> > | main-thread        | 9629           | 10.38GB/s        |
> > +--------------------+----------------+------------------+
> > | multifd-1-threads  | 6182           | 16.17GB/s        |
> > +--------------------+----------------+------------------+
> > | multifd-2-threads  | 4643           | 21.53GB/s        |
> > +--------------------+----------------+------------------+
> > | multifd-4-threads  | 4143           | 24.13GB/s        |
> > +--------------------+----------------+------------------+
>
> This "throughput" is slightly confusing; I was initially surprised to see a
> large throughput for idle guests.  IMHO the "total-time" would explain.
> Feel free to drop that column if there's a repost.
>
> Did you check why 4 channels mostly already reached the top line?  Is it
> because main thread is already spinning 100%?
>
> Thanks,
>
> --
> Peter Xu

Sure I will drop "throughput" to avoid confusion. In my testing, 1
multifd channel already makes the main thread spin at 100%. So the
total-time is the same across 1/2/4 multifd channels as long as zero
page is run on the main migration thread. Of course, this is based on
the fact that the network is not the bottleneck. One interesting
finding is that multifd 1 channel with multifd zero page has better
performance than multifd 1 channel with main migration thread.
>



Re: [External] Re: Re: [PATCH 2/6] migration/multifd: Add zero pages and zero bytes counter to migration status interface.

2024-02-07 Thread Hao Xiang
On Wed, Feb 7, 2024 at 12:41 AM Jiri Denemark  wrote:
>
> On Wed, Feb 07, 2024 at 12:37:15 +0800, Peter Xu wrote:
> > On Wed, Feb 07, 2024 at 12:13:10PM +0800, Peter Xu wrote:
> > > On Tue, Feb 06, 2024 at 11:19:04PM +, Hao Xiang wrote:
> > > > This change extends the MigrationStatus interface to track zero pages
> > > > and zero bytes counter.
> > > >
> > > > Signed-off-by: Hao Xiang 
> > >
> > > Reviewed-by: Peter Xu 
> >
> > I'll need to scratch this, sorry..
> >
> > The issue is I forgot we have "duplicate" which is exactly "zero
> > page"s.. See:
> >
> > info->ram->duplicate = stat64_get(&mig_stats.zero_pages);
> >
> > If you think the name too confusing and want a replacement, maybe it's fine
> > and maybe we can do that.  Then we can keep this zero page counter
> > introduced, reporting the same value as duplicates, then with a follow up
> > patch to deprecate "duplicate" parameter.  See an example on how to
> > deprecate in 7b24d326348e1672.
> >
> > One thing I'm not sure is whether Libvirt will be fine on losing
> > "duplicates" after 2+ QEMU major releases.  Copy Jiri for this.  My
> > understanding is that Libvirt should be keeping an eye on deprecation list
> > and react, but I'd like to double check..
>
> This should not be a big deal as we can internally map either one
> (depending on what QEMU supports) to the same libvirt's field. AFAIK
> there is a consensus on Cc-ing libvirt-devel on patches that deprecate
> QEMU interfaces so that we can update our code in time before the
> deprecated interface is dropped.
>
> BTW, libvirt maps "duplicate" to:
>
> /**
>  * VIR_DOMAIN_JOB_MEMORY_CONSTANT:
>  *
>  * virDomainGetJobStats field: number of pages filled with a constant
>  * byte (all bytes in a single page are identical) transferred since the
>  * beginning of the migration job, as VIR_TYPED_PARAM_ULLONG.
>  *
>  * The most common example of such pages are zero pages, i.e., pages filled
>  * with zero bytes.
>  *
>  * Since: 1.0.3
>  */
> # define VIR_DOMAIN_JOB_MEMORY_CONSTANT  "memory_constant"
>
> Jirka
>

Interesting. I didn't notice the existence of "duplicate" for zero
pages. I do think the name is quite confusing. I will create the
"zero/zero_bytes" counter and a separate commit to deprecate
"duplicate". Will add libvirt devs per instruction above.



Re: [PULL 0/1] Block patches

2024-02-07 Thread Kevin Wolf
Am 06.02.2024 um 16:31 hat Stefan Hajnoczi geschrieben:
> The following changes since commit 39a6e4f87e7b75a45b08d6dc8b8b7c2954c87440:
> 
>   Merge tag 'pull-qapi-2024-02-03' of https://repo.or.cz/qemu/armbru into 
> staging (2024-02-03 13:31:58 +)
> 
> are available in the Git repository at:
> 
>   https://gitlab.com/stefanha/qemu.git tags/block-pull-request
> 
> for you to fetch changes up to 1d52cc0ac27761e296b14655c2f5b2649ee69491:
> 
>   virtio-blk: avoid using ioeventfd state in irqfd conditional (2024-02-06 
> 10:22:18 -0500)
> 
> 
> Pull request
> 
> A bug fix for in-flight I/O during ioeventfd shutdown.
> 
> 
> 
> Stefan Hajnoczi (1):
>   virtio-blk: avoid using ioeventfd state in irqfd conditional

I noticed that this patch is also in the pull request I sent, so I
guess if mine goes through, you don't have to process this one
separately.

Kevin




Re: [PATCH v8 2/3] hw/arm: Connect STM32L4x5 EXTI to STM32L4x5 SoC

2024-02-07 Thread Philippe Mathieu-Daudé

Hi Inès,

(this is now commit 52671f69f7).

On 9/1/24 17:06, Inès Varhol wrote:

Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Signed-off-by: Arnaud Minier 
Signed-off-by: Inès Varhol 
---
  hw/arm/Kconfig |  1 +
  hw/arm/stm32l4x5_soc.c | 52 +-
  include/hw/arm/stm32l4x5_soc.h |  3 ++
  3 files changed, 55 insertions(+), 1 deletion(-)




+#define NUM_EXTI_IRQ 40
+/* Match exti line connections with their CPU IRQ number */
+/* See Vector Table (Reference Manual p.396) */
+static const int exti_irq[NUM_EXTI_IRQ] = {
+6,  /* GPIO[0] */
+7,  /* GPIO[1] */
+8,  /* GPIO[2] */
+9,  /* GPIO[3] */
+10, /* GPIO[4] */
+23, 23, 23, 23, 23, /* GPIO[5..9]  */
+40, 40, 40, 40, 40, 40, /* GPIO[10..15]*/


I'm sorry because I missed that earlier, and I'm surprised
you aren't chasing weird bugs. Due to how QEMU IRQs are
implemented, we can not wire multiple input lines to the same
output without using an intermediate "OR gate". We model it
as TYPE_OR_IRQ. See the comment in "hw/qdev-core.h" added in
commit cd07d7f9f5 ("qdev: Document GPIO related functions"):

 * It is not valid to try to connect one outbound GPIO to multiple
 * qemu_irqs at once, or to connect multiple outbound GPIOs to the
 * same qemu_irq. (Warning: there is no assertion or other guard to
 * catch this error: the model will just not do the right thing.)
 * Instead, for fan-out you can use the TYPE_SPLIT_IRQ device: connect
 * a device's outbound GPIO to the splitter's input, and connect each
 * of the splitter's outputs to a different device.  For fan-in you
 * can use the TYPE_OR_IRQ device, which is a model of a logical OR
 * gate with multiple inputs and one output.

So for example for the GPIO[10..15] you need to create a 6-line
OR gate as (totally untested):

  /* 6-line OR IRQ gate */
  Object *orgate40 = object_new(TYPE_OR_IRQ);
  object_property_set_int(orgate40, "num-lines", 6, &error_fatal);
  qdev_realize(DEVICE(orgate40), NULL, &error_fatal);

  /* OR gate -> IRQ #40 */
  qdev_connect_gpio_out(DEVICE(orgate40), 0,
qdev_get_gpio_in(armv7m, 40));

  /* EXTI GPIO[10..15] -> OR gate */
  for (unsigned i = 0; i < 6; i++) {
  sysbus_connect_irq(SYS_BUS_DEVICE(&s->exti), 10 + i,
 qdev_get_gpio_in(DEVICE(orgate40), i));
  }


+1,  /* PVD */
+67, /* OTG_FS_WKUP, Direct */
+41, /* RTC_ALARM   */
+2,  /* RTC_TAMP_STAMP2/CSS_LSE */
+3,  /* RTC wakeup timer*/
+63, /* COMP1   */
+63, /* COMP2   */
+31, /* I2C1 wakeup, Direct */
+33, /* I2C2 wakeup, Direct */
+72, /* I2C3 wakeup, Direct */
+37, /* USART1 wakeup, Direct   */
+38, /* USART2 wakeup, Direct   */
+39, /* USART3 wakeup, Direct   */
+52, /* UART4 wakeup, Direct*/
+53, /* UART5 wakeup, Direct*/
+70, /* LPUART1 wakeup, Direct  */
+65, /* LPTIM1, Direct  */
+66, /* LPTIM2, Direct  */
+76, /* SWPMI1 wakeup, Direct   */
+1,  /* PVM1 wakeup */
+1,  /* PVM2 wakeup */
+1,  /* PVM3 wakeup */
+1,  /* PVM4 wakeup */
+78  /* LCD wakeup, Direct  */
+};



+busdev = SYS_BUS_DEVICE(&s->exti);
+if (!sysbus_realize(busdev, errp)) {
+return;
+}
+sysbus_mmio_map(busdev, 0, EXTI_ADDR);
+for (unsigned i = 0; i < NUM_EXTI_IRQ; i++) {
+sysbus_connect_irq(busdev, i, qdev_get_gpio_in(armv7m, exti_irq[i]));


  ^^

+}

Regards,

Phil.



[PULL 03/16] virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb()

2024-02-07 Thread Kevin Wolf
From: Stefan Hajnoczi 

Hanna Czenczek  noted that the array index in
virtio_blk_dma_restart_cb() is not bounds-checked:

  g_autofree VirtIOBlockReq **vq_rq = g_new0(VirtIOBlockReq *, num_queues);
  ...
  while (rq) {
  VirtIOBlockReq *next = rq->next;
  uint16_t idx = virtio_get_queue_index(rq->vq);

  rq->next = vq_rq[idx];
 ^^

The code is correct because both rq->vq and vq_rq[] depend on
num_queues, but this is indirect and not 100% obvious. Add an assertion.

Suggested-by: Hanna Czenczek 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240206190610.107963-4-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Kevin Wolf 
---
 hw/block/virtio-blk.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index e430ba583c..31212506ca 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1209,6 +1209,8 @@ static void virtio_blk_dma_restart_cb(void *opaque, bool running,
 VirtIOBlockReq *next = rq->next;
 uint16_t idx = virtio_get_queue_index(rq->vq);
 
+/* Only num_queues vqs were created so vq_rq[idx] is within bounds */
+assert(idx < num_queues);
 rq->next = vq_rq[idx];
 vq_rq[idx] = rq;
 rq = next;
-- 
2.43.0




[PULL 06/16] block-backend: Allow concurrent context changes

2024-02-07 Thread Kevin Wolf
From: Hanna Czenczek 

Since AioContext locks have been removed, a BlockBackend's AioContext
may really change at any time (only exception is that it is often
confined to a drained section, as noted in this patch).  Therefore,
blk_get_aio_context() cannot rely on its root node's context always
matching that of the BlockBackend.

In practice, whether they match does not matter anymore anyway: Requests
can be sent to BDSs from any context, so anyone who requests the BB's
context should have no reason to require the root node to have the same
context.  Therefore, we can and should remove the assertion to that
effect.

In addition, because the context can be set and queried from different
threads concurrently, it has to be accessed with atomic operations.

Buglink: https://issues.redhat.com/browse/RHEL-19381
Suggested-by: Kevin Wolf 
Signed-off-by: Hanna Czenczek 
Message-ID: <20240202144755.671354-2-hre...@redhat.com>
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 209eb07528..9c4de79e6b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -44,7 +44,7 @@ struct BlockBackend {
 char *name;
 int refcnt;
 BdrvChild *root;
-AioContext *ctx;
+AioContext *ctx; /* access with atomic operations only */
 DriveInfo *legacy_dinfo;/* null unless created by drive_new() */
 QTAILQ_ENTRY(BlockBackend) link; /* for block_backends */
 QTAILQ_ENTRY(BlockBackend) monitor_link; /* for monitor_block_backends */
@@ -2414,22 +2414,22 @@ void blk_op_unblock_all(BlockBackend *blk, Error *reason)
 }
 }
 
+/**
+ * Return BB's current AioContext.  Note that this context may change
+ * concurrently at any time, with one exception: If the BB has a root node
+ * attached, its context will only change through bdrv_try_change_aio_context(),
+ * which creates a drained section.  Therefore, incrementing such a BB's
+ * in-flight counter will prevent its context from changing.
+ */
 AioContext *blk_get_aio_context(BlockBackend *blk)
 {
-BlockDriverState *bs;
 IO_CODE();
 
 if (!blk) {
 return qemu_get_aio_context();
 }
 
-bs = blk_bs(blk);
-if (bs) {
-AioContext *ctx = bdrv_get_aio_context(blk_bs(blk));
-assert(ctx == blk->ctx);
-}
-
-return blk->ctx;
+return qatomic_read(&blk->ctx);
 }
 
 int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
@@ -2442,7 +2442,7 @@ int blk_set_aio_context(BlockBackend *blk, AioContext *new_context,
 GLOBAL_STATE_CODE();
 
 if (!bs) {
-blk->ctx = new_context;
+qatomic_set(&blk->ctx, new_context);
 return 0;
 }
 
@@ -2471,7 +2471,7 @@ static void blk_root_set_aio_ctx_commit(void *opaque)
 AioContext *new_context = s->new_ctx;
 ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
 
-blk->ctx = new_context;
+qatomic_set(&blk->ctx, new_context);
 if (tgm->throttle_state) {
 throttle_group_detach_aio_context(tgm);
 throttle_group_attach_aio_context(tgm, new_context);
-- 
2.43.0




[PULL 11/16] scsi: Don't ignore most usb-storage properties

2024-02-07 Thread Kevin Wolf
usb-storage is for the most part just a wrapper around an internally
created scsi-disk device. It uses DEFINE_BLOCK_PROPERTIES() to offer all
of the usual block device properties to the user, but then only forwards
a few select properties to the internal device while the rest is
silently ignored.

This changes scsi_bus_legacy_add_drive() to accept a whole BlockConf
instead of some individual values inside of it so that usb-storage can
now pass the whole configuration to the internal scsi-disk. This enables
the remaining block device properties, e.g. logical/physical_block_size
or discard_granularity.

Buglink: https://issues.redhat.com/browse/RHEL-22375
Signed-off-by: Kevin Wolf 
Message-ID: <20240131130607.24117-1-kw...@redhat.com>
Signed-off-by: Kevin Wolf 
---
 include/hw/scsi/scsi.h   |  5 +
 hw/scsi/scsi-bus.c   | 33 +
 hw/usb/dev-storage-classic.c |  5 +
 3 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 10c4e8288d..c3d5e17e38 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -199,10 +199,7 @@ static inline SCSIBus *scsi_bus_from_device(SCSIDevice *d)
 }
 
 SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
-  int unit, bool removable, int bootindex,
-  bool share_rw,
-  BlockdevOnError rerror,
-  BlockdevOnError werror,
+  int unit, bool removable, BlockConf *conf,
   const char *serial, Error **errp);
 void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
 void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 230313022c..9e40b0c920 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -376,15 +376,13 @@ static void scsi_qdev_unrealize(DeviceState *qdev)
 
 /* handle legacy '-drive if=scsi,...' cmd line args */
 SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
-  int unit, bool removable, int bootindex,
-  bool share_rw,
-  BlockdevOnError rerror,
-  BlockdevOnError werror,
+  int unit, bool removable, BlockConf *conf,
   const char *serial, Error **errp)
 {
 const char *driver;
 char *name;
 DeviceState *dev;
+SCSIDevice *s;
 DriveInfo *dinfo;
 
 if (blk_is_sg(blk)) {
@@ -402,11 +400,10 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
 object_property_add_child(OBJECT(bus), name, OBJECT(dev));
 g_free(name);
 
+s = SCSI_DEVICE(dev);
+s->conf = *conf;
+
 qdev_prop_set_uint32(dev, "scsi-id", unit);
-if (bootindex >= 0) {
-object_property_set_int(OBJECT(dev), "bootindex", bootindex,
-&error_abort);
-}
 if (object_property_find(OBJECT(dev), "removable")) {
 qdev_prop_set_bit(dev, "removable", removable);
 }
@@ -417,19 +414,12 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, BlockBackend *blk,
 object_unparent(OBJECT(dev));
 return NULL;
 }
-if (!object_property_set_bool(OBJECT(dev), "share-rw", share_rw, errp)) {
-object_unparent(OBJECT(dev));
-return NULL;
-}
-
-qdev_prop_set_enum(dev, "rerror", rerror);
-qdev_prop_set_enum(dev, "werror", werror);
 
 if (!qdev_realize_and_unref(dev, &bus->qbus, errp)) {
 object_unparent(OBJECT(dev));
 return NULL;
 }
-return SCSI_DEVICE(dev);
+return s;
 }
 
 void scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
@@ -437,6 +427,12 @@ void scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
 Location loc;
 DriveInfo *dinfo;
 int unit;
+BlockConf conf = {
+.bootindex = -1,
+.share_rw = false,
+.rerror = BLOCKDEV_ON_ERROR_AUTO,
+.werror = BLOCKDEV_ON_ERROR_AUTO,
+};
 
 loc_push_none(&loc);
 for (unit = 0; unit <= bus->info->max_target; unit++) {
@@ -446,10 +442,7 @@ void scsi_bus_legacy_handle_cmdline(SCSIBus *bus)
 }
 qemu_opts_loc_restore(dinfo->opts);
 scsi_bus_legacy_add_drive(bus, blk_by_legacy_dinfo(dinfo),
-  unit, false, -1, false,
-  BLOCKDEV_ON_ERROR_AUTO,
-  BLOCKDEV_ON_ERROR_AUTO,
-  NULL, _fatal);
+  unit, false, &conf, NULL, &error_fatal);
 }
 loc_pop(&loc);
 }
diff --git a/hw/usb/dev-storage-classic.c b/hw/usb/dev-storage-classic.c
index 84d19752b5..50a3ad6285 100644
--- a/hw/usb/dev-storage-classic.c
+++ b/hw/usb/dev-storage-classic.c
@@ -67,10 +67,7 @@ static void 

[PULL 09/16] iotests: give tempdir an identifying name

2024-02-07 Thread Kevin Wolf
From: Daniel P. Berrangé 

If something goes wrong causing the iotests not to cleanup their
temporary directory, it is useful if the dir had an identifying
name to show what is to blame.

Signed-off-by: Daniel P. Berrangé 
Message-ID: <20240205155158.1843304-1-berra...@redhat.com>
Reviewed-by: Michael Tokarev 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/testenv.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index 3ff38f2661..588f30a4f1 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -126,7 +126,7 @@ def init_directories(self) -> None:
 self.tmp_sock_dir = False
 Path(self.sock_dir).mkdir(parents=True, exist_ok=True)
 except KeyError:
-self.sock_dir = tempfile.mkdtemp()
+self.sock_dir = tempfile.mkdtemp(prefix="qemu-iotests-")
 self.tmp_sock_dir = True
 
 self.sample_img_dir = os.getenv('SAMPLE_IMG_DIR',
-- 
2.43.0




[PULL 01/16] virtio-blk: enforce iothread-vq-mapping validation

2024-02-07 Thread Kevin Wolf
From: Stefan Hajnoczi 

Hanna Czenczek  noticed that the safety of
`vq_aio_context[vq->value] = ctx;` with user-defined vq->value inputs is
not obvious.

The code is structured in validate() + apply() steps so input validation
is there, but it happens way earlier and there is nothing that
guarantees apply() can only be called with validated inputs.

This patch moves the validate() call inside the apply() function so
validation is guaranteed. I also added the bounds checking assertion
that Hanna suggested.

Signed-off-by: Stefan Hajnoczi 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
Message-ID: <20240206190610.107963-2-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Kevin Wolf 
---
 hw/block/virtio-blk.c | 183 +++---
 1 file changed, 102 insertions(+), 81 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 227d83569f..6e3e3a23ee 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1485,68 +1485,6 @@ static int virtio_blk_load_device(VirtIODevice *vdev, QEMUFile *f,
 return 0;
 }
 
-static bool
-validate_iothread_vq_mapping_list(IOThreadVirtQueueMappingList *list,
-uint16_t num_queues, Error **errp)
-{
-g_autofree unsigned long *vqs = bitmap_new(num_queues);
-g_autoptr(GHashTable) iothreads =
-g_hash_table_new(g_str_hash, g_str_equal);
-
-for (IOThreadVirtQueueMappingList *node = list; node; node = node->next) {
-const char *name = node->value->iothread;
-uint16List *vq;
-
-if (!iothread_by_id(name)) {
-error_setg(errp, "IOThread \"%s\" object does not exist", name);
-return false;
-}
-
-if (!g_hash_table_add(iothreads, (gpointer)name)) {
-error_setg(errp,
-"duplicate IOThread name \"%s\" in iothread-vq-mapping",
-name);
-return false;
-}
-
-if (node != list) {
-if (!!node->value->vqs != !!list->value->vqs) {
-error_setg(errp, "either all items in iothread-vq-mapping "
- "must have vqs or none of them must have it");
-return false;
-}
-}
-
-for (vq = node->value->vqs; vq; vq = vq->next) {
-if (vq->value >= num_queues) {
-error_setg(errp, "vq index %u for IOThread \"%s\" must be "
-"less than num_queues %u in iothread-vq-mapping",
-vq->value, name, num_queues);
-return false;
-}
-
-if (test_and_set_bit(vq->value, vqs)) {
-error_setg(errp, "cannot assign vq %u to IOThread \"%s\" "
-"because it is already assigned", vq->value, name);
-return false;
-}
-}
-}
-
-if (list->value->vqs) {
-for (uint16_t i = 0; i < num_queues; i++) {
-if (!test_bit(i, vqs)) {
-error_setg(errp,
-"missing vq %u IOThread assignment in iothread-vq-mapping",
-i);
-return false;
-}
-}
-}
-
-return true;
-}
-
 static void virtio_resize_cb(void *opaque)
 {
 VirtIODevice *vdev = opaque;
@@ -1613,15 +1551,95 @@ static const BlockDevOps virtio_block_ops = {
 .drained_end   = virtio_blk_drained_end,
 };
 
-/* Generate vq:AioContext mappings from a validated iothread-vq-mapping list */
-static void
-apply_vq_mapping(IOThreadVirtQueueMappingList *iothread_vq_mapping_list,
- AioContext **vq_aio_context, uint16_t num_queues)
+static bool
+validate_iothread_vq_mapping_list(IOThreadVirtQueueMappingList *list,
+uint16_t num_queues, Error **errp)
+{
+g_autofree unsigned long *vqs = bitmap_new(num_queues);
+g_autoptr(GHashTable) iothreads =
+g_hash_table_new(g_str_hash, g_str_equal);
+
+for (IOThreadVirtQueueMappingList *node = list; node; node = node->next) {
+const char *name = node->value->iothread;
+uint16List *vq;
+
+if (!iothread_by_id(name)) {
+error_setg(errp, "IOThread \"%s\" object does not exist", name);
+return false;
+}
+
+if (!g_hash_table_add(iothreads, (gpointer)name)) {
+error_setg(errp,
+"duplicate IOThread name \"%s\" in iothread-vq-mapping",
+name);
+return false;
+}
+
+if (node != list) {
+if (!!node->value->vqs != !!list->value->vqs) {
+error_setg(errp, "either all items in iothread-vq-mapping "
+ "must have vqs or none of them must have it");
+return false;
+}
+}
+
+for (vq = node->value->vqs; vq; vq = vq->next) {
+if (vq->value >= num_queues) {
+error_setg(errp, "vq index 

[PULL 10/16] virtio-blk: do not use C99 mixed declarations

2024-02-07 Thread Kevin Wolf
From: Stefan Hajnoczi 

QEMU's coding style generally forbids C99 mixed declarations.

Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240206140410.65650-1-stefa...@redhat.com>
Reviewed-by: Hanna Czenczek 
Reviewed-by: Kevin Wolf 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Kevin Wolf 
---
 hw/block/virtio-blk.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 31212506ca..bda5c117d4 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -661,6 +661,9 @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
 int64_t zrp_size, n, j = 0;
 int64_t nz = data->zone_report_data.nr_zones;
 int8_t err_status = VIRTIO_BLK_S_OK;
+struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
+.nr_zones = cpu_to_le64(nz),
+};
 
 trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
 if (ret) {
@@ -668,9 +671,6 @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
 goto out;
 }
 
-struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
-.nr_zones = cpu_to_le64(nz),
-};
 zrp_size = sizeof(struct virtio_blk_zone_report)
+ sizeof(struct virtio_blk_zone_descriptor) * nz;
 n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
@@ -898,13 +898,14 @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
 
 int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
 int64_t len = iov_size(out_iov, out_num);
+ZoneCmdData *data;
 
 trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
 if (!check_zoned_request(s, offset, len, true, _status)) {
 goto out;
 }
 
-ZoneCmdData *data = g_malloc(sizeof(ZoneCmdData));
+data = g_malloc(sizeof(ZoneCmdData));
 data->req = req;
 data->in_iov = in_iov;
 data->in_num = in_num;
@@ -1191,14 +1192,15 @@ static void virtio_blk_dma_restart_cb(void *opaque, bool running,
 {
 VirtIOBlock *s = opaque;
 uint16_t num_queues = s->conf.num_queues;
+g_autofree VirtIOBlockReq **vq_rq = NULL;
+VirtIOBlockReq *rq;
 
 if (!running) {
 return;
 }
 
 /* Split the device-wide s->rq request list into per-vq request lists */
-g_autofree VirtIOBlockReq **vq_rq = g_new0(VirtIOBlockReq *, num_queues);
-VirtIOBlockReq *rq;
+vq_rq = g_new0(VirtIOBlockReq *, num_queues);
 
 WITH_QEMU_LOCK_GUARD(&s->rq_lock) {
 rq = s->rq;
@@ -1961,6 +1963,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOBlock *s = VIRTIO_BLK(dev);
 VirtIOBlkConf *conf = >conf;
+BlockDriverState *bs;
 Error *err = NULL;
 unsigned i;
 
@@ -2006,7 +2009,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-BlockDriverState *bs = blk_bs(conf->conf.blk);
+bs = blk_bs(conf->conf.blk);
 if (bs->bl.zoned != BLK_Z_NONE) {
 virtio_add_feature(>host_features, VIRTIO_BLK_F_ZONED);
 if (bs->bl.zoned == BLK_Z_HM) {
-- 
2.43.0




[PULL 05/16] monitor: use aio_co_reschedule_self()

2024-02-07 Thread Kevin Wolf
From: Stefan Hajnoczi 

The aio_co_reschedule_self() API is designed to avoid the race
condition between scheduling the coroutine in another AioContext and
yielding.

The QMP dispatch code uses the open-coded version that appears
susceptible to the race condition at first glance:

  aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
  qemu_coroutine_yield();

The code is actually safe because the iohandler and qemu_aio_context
AioContext run under the Big QEMU Lock. Nevertheless, set a good example
and use aio_co_reschedule_self() so it's obvious that there is no race.

Suggested-by: Hanna Reitz 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240206190610.107963-6-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Acked-by: Markus Armbruster 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Kevin Wolf 
---
 qapi/qmp-dispatch.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 176b549473..f3488afeef 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -212,8 +212,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
  * executing the command handler so that it can make progress if it
  * involves an AIO_WAIT_WHILE().
  */
-aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
-qemu_coroutine_yield();
+aio_co_reschedule_self(qemu_get_aio_context());
 }
 
 monitor_set_cur(qemu_coroutine_self(), cur_mon);
@@ -227,9 +226,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
  * Move back to iohandler_ctx so that nested event loops for
  * qemu_aio_context don't start new monitor commands.
  */
-aio_co_schedule(iohandler_get_aio_context(),
-qemu_coroutine_self());
-qemu_coroutine_yield();
+aio_co_reschedule_self(iohandler_get_aio_context());
 }
 } else {
/*
-- 
2.43.0




[PULL 04/16] virtio-blk: declare VirtIOBlock::rq with a type

2024-02-07 Thread Kevin Wolf
From: Stefan Hajnoczi 

The VirtIOBlock::rq field has had the type void * since its introduction
in commit 869a5c6df19a ("Stop VM on error in virtio-blk. (Gleb
Natapov)").

Perhaps this was done to avoid the forward declaration of
VirtIOBlockReq.

Hanna Czenczek  pointed out the missing type. Specify
the actual type because there is no need to use void * here.

Suggested-by: Hanna Czenczek 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240206190610.107963-5-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Kevin Wolf 
---
 include/hw/virtio/virtio-blk.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index 833a9a344f..5c14110c4b 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -55,7 +55,7 @@ struct VirtIOBlock {
 VirtIODevice parent_obj;
 BlockBackend *blk;
 QemuMutex rq_lock;
-void *rq; /* protected by rq_lock */
+struct VirtIOBlockReq *rq; /* protected by rq_lock */
 VirtIOBlkConf conf;
 unsigned short sector_mask;
 bool original_wce;
-- 
2.43.0




[PULL 12/16] blkio: Respect memory-alignment for bounce buffer allocations

2024-02-07 Thread Kevin Wolf
blkio_alloc_mem_region() requires that the requested buffer size is a
multiple of the memory-alignment property. If it isn't, the allocation
fails with a return value of -EINVAL.

Fix the call in blkio_resize_bounce_pool() to make sure the requested
size is properly aligned.

I observed this problem with vhost-vdpa, which requires page aligned
memory. As the virtio-blk device behind it still had 512 byte blocks, we
got bs->bl.request_alignment = 512, but actually any request that needed
a bounce buffer and was not aligned to 4k would fail without this fix.

Suggested-by: Stefano Garzarella 
Signed-off-by: Kevin Wolf 
Message-ID: <20240131173140.42398-1-kw...@redhat.com>
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Stefano Garzarella 
Signed-off-by: Kevin Wolf 
---
 block/blkio.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blkio.c b/block/blkio.c
index bc2f21784c..882e1c297b 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -89,6 +89,9 @@ static int blkio_resize_bounce_pool(BDRVBlkioState *s, int64_t bytes)
 /* Pad size to reduce frequency of resize calls */
 bytes += 128 * 1024;
 
+/* Align the pool size to avoid blkio_alloc_mem_region() failure */
+bytes = QEMU_ALIGN_UP(bytes, s->mem_region_alignment);
+
 WITH_QEMU_LOCK_GUARD(&s->blkio_lock) {
 int ret;
 
-- 
2.43.0




[PULL 07/16] scsi: Await request purging

2024-02-07 Thread Kevin Wolf
From: Hanna Czenczek 

scsi_device_for_each_req_async() currently does not provide any way to
be awaited.  One of its callers is scsi_device_purge_requests(), which
therefore currently does not guarantee that all requests are fully
settled when it returns.

We want all requests to be settled, because scsi_device_purge_requests()
is called through the unrealize path, including the one invoked by
virtio_scsi_hotunplug() through qdev_simple_device_unplug_cb(), which
most likely assumes that all SCSI requests are done then.

In fact, scsi_device_purge_requests() already contains a blk_drain(),
but this will not fully await scsi_device_for_each_req_async(), only the
I/O requests it potentially cancels (not the non-I/O requests).
However, we can have scsi_device_for_each_req_async() increment the BB
in-flight counter, and have scsi_device_for_each_req_async_bh()
decrement it when it is done.  This way, the blk_drain() will fully
await all SCSI requests to be purged.

This also removes the need for scsi_device_for_each_req_async_bh() to
double-check the current context and potentially re-schedule itself,
should it now differ from the BB's context: Changing a BB's AioContext
with a root node is done through bdrv_try_change_aio_context(), which
creates a drained section.  With this patch, we keep the BB in-flight
counter elevated throughout, so we know the BB's context cannot change.

Signed-off-by: Hanna Czenczek 
Message-ID: <20240202144755.671354-3-hre...@redhat.com>
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 hw/scsi/scsi-bus.c | 30 +-
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 0a2eb11c56..230313022c 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -120,17 +120,13 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
 SCSIRequest *next;
 
 /*
- * If the AioContext changed before this BH was called then reschedule into
- * the new AioContext before accessing ->requests. This can happen when
- * scsi_device_for_each_req_async() is called and then the AioContext is
- * changed before BHs are run.
+ * The BB cannot have changed contexts between this BH being scheduled and
+ * now: BBs' AioContexts, when they have a node attached, can only be
+ * changed via bdrv_try_change_aio_context(), in a drained section.  While
+ * we have the in-flight counter incremented, that drain must block.
  */
 ctx = blk_get_aio_context(s->conf.blk);
-if (ctx != qemu_get_current_aio_context()) {
-aio_bh_schedule_oneshot(ctx, scsi_device_for_each_req_async_bh,
-g_steal_pointer(&data));
-return;
-}
+assert(ctx == qemu_get_current_aio_context());
 
 QTAILQ_FOREACH_SAFE(req, &s->requests, next, next) {
 data->fn(req, data->fn_opaque);
@@ -138,11 +134,16 @@ static void scsi_device_for_each_req_async_bh(void *opaque)
 
 /* Drop the reference taken by scsi_device_for_each_req_async() */
 object_unref(OBJECT(s));
+
+/* Paired with blk_inc_in_flight() in scsi_device_for_each_req_async() */
+blk_dec_in_flight(s->conf.blk);
 }
 
 /*
  * Schedule @fn() to be invoked for each enqueued request in device @s. @fn()
  * runs in the AioContext that is executing the request.
+ * Keeps the BlockBackend's in-flight counter incremented until everything is
+ * done, so draining it will settle all scheduled @fn() calls.
  */
 static void scsi_device_for_each_req_async(SCSIDevice *s,
void (*fn)(SCSIRequest *, void *),
@@ -163,6 +164,8 @@ static void scsi_device_for_each_req_async(SCSIDevice *s,
  */
 object_ref(OBJECT(s));
 
+/* Paired with blk_dec_in_flight() in scsi_device_for_each_req_async_bh() */
+blk_inc_in_flight(s->conf.blk);
 aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
 scsi_device_for_each_req_async_bh,
 data);
@@ -1728,11 +1731,20 @@ static void scsi_device_purge_one_req(SCSIRequest *req, void *opaque)
 scsi_req_cancel_async(req, NULL);
 }
 
+/**
+ * Cancel all requests, and block until they are deleted.
+ */
 void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense)
 {
 scsi_device_for_each_req_async(sdev, scsi_device_purge_one_req, NULL);
 
+/*
+ * Await all the scsi_device_purge_one_req() calls scheduled by
+ * scsi_device_for_each_req_async(), and all I/O requests that were
+ * cancelled this way, but may still take a bit of time to settle.
+ */
 blk_drain(sdev->conf.blk);
+
 scsi_device_set_ua(sdev, sense);
 }
 
-- 
2.43.0




[PULL 16/16] virtio-blk: avoid using ioeventfd state in irqfd conditional

2024-02-07 Thread Kevin Wolf
From: Stefan Hajnoczi 

Requests that complete in an IOThread use irqfd to notify the guest
while requests that complete in the main loop thread use the traditional
qdev irq code path. The reason for this conditional is that the irq code
path requires the BQL:

  if (s->ioeventfd_started && !s->ioeventfd_disabled) {
  virtio_notify_irqfd(vdev, req->vq);
  } else {
  virtio_notify(vdev, req->vq);
  }

There is a corner case where the conditional invokes the irq code path
instead of the irqfd code path:

  static void virtio_blk_stop_ioeventfd(VirtIODevice *vdev)
  {
  ...
  /*
   * Set ->ioeventfd_started to false before draining so that host notifiers
   * are not detached/attached anymore.
   */
  s->ioeventfd_started = false;

  /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
  blk_drain(s->conf.conf.blk);

During blk_drain() the conditional produces the wrong result because
ioeventfd_started is false.

Use qemu_in_iothread() instead of checking the ioeventfd state.

Buglink: https://issues.redhat.com/browse/RHEL-15394
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240122172625.415386-1-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 hw/block/virtio-blk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 4ca5e632ea..738cb2ac36 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -66,7 +66,7 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, unsigned char status)
 iov_discard_undo(&req->inhdr_undo);
 iov_discard_undo(&req->outhdr_undo);
 virtqueue_push(req->vq, &req->elem, req->in_len);
-if (s->ioeventfd_started && !s->ioeventfd_disabled) {
+if (qemu_in_iothread()) {
 virtio_notify_irqfd(vdev, req->vq);
 } else {
 virtio_notify(vdev, req->vq);
-- 
2.43.0




[PULL 15/16] virtio-blk: Use ioeventfd_attach in start_ioeventfd

2024-02-07 Thread Kevin Wolf
From: Hanna Czenczek 

Commit d3f6f294aeadd5f88caf0155e4360808c95b3146 ("virtio-blk: always set
ioeventfd during startup") has made virtio_blk_start_ioeventfd() always
kick the virtqueue (set the ioeventfd), regardless of whether the BB is
drained.  That is no longer necessary, because attaching the host
notifier will now set the ioeventfd, too; this happens either
immediately right here in virtio_blk_start_ioeventfd(), or later when
the drain ends, in virtio_blk_ioeventfd_attach().

With event_notifier_set() removed, the code becomes the same as the one
in virtio_blk_ioeventfd_attach(), so we can reuse that function.

Signed-off-by: Hanna Czenczek 
Message-ID: <20240202153158.788922-4-hre...@redhat.com>
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 hw/block/virtio-blk.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index bda5c117d4..4ca5e632ea 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -37,6 +37,8 @@
 #include "hw/virtio/virtio-blk-common.h"
 #include "qemu/coroutine.h"
 
+static void virtio_blk_ioeventfd_attach(VirtIOBlock *s);
+
 static void virtio_blk_init_request(VirtIOBlock *s, VirtQueue *vq,
 VirtIOBlockReq *req)
 {
@@ -1847,17 +1849,14 @@ static int virtio_blk_start_ioeventfd(VirtIODevice *vdev)
 s->ioeventfd_started = true;
 smp_wmb(); /* paired with aio_notify_accept() on the read side */
 
-/* Get this show started by hooking up our callbacks */
-for (i = 0; i < nvqs; i++) {
-VirtQueue *vq = virtio_get_queue(vdev, i);
-AioContext *ctx = s->vq_aio_context[i];
-
-/* Kick right away to begin processing requests already in vring */
-event_notifier_set(virtio_queue_get_host_notifier(vq));
-
-if (!blk_in_drain(s->conf.conf.blk)) {
-virtio_queue_aio_attach_host_notifier(vq, ctx);
-}
+/*
+ * Get this show started by hooking up our callbacks.  If drained now,
+ * virtio_blk_drained_end() will do this later.
+ * Attaching the notifier also kicks the virtqueues, processing any requests
+ * they may already have.
+ */
+if (!blk_in_drain(s->conf.conf.blk)) {
+virtio_blk_ioeventfd_attach(s);
 }
 return 0;
 
-- 
2.43.0




[PULL 02/16] virtio-blk: clarify that there is at least 1 virtqueue

2024-02-07 Thread Kevin Wolf
From: Stefan Hajnoczi 

It is not possible to instantiate a virtio-blk device with 0 virtqueues.
The following check is located in ->realize():

  if (!conf->num_queues) {
  error_setg(errp, "num-queues property must be larger than 0");
  return;
  }

Later on we access s->vq_aio_context[0] under the assumption that there
is as least one virtqueue. Hanna Czenczek  noted that
it would help to show that the array index is already valid.

Add an assertion to document that s->vq_aio_context[0] is always
safe...and catch future code changes that break this assumption.

Suggested-by: Hanna Czenczek 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Hanna Czenczek 
Signed-off-by: Stefan Hajnoczi 
Message-ID: <20240206190610.107963-3-stefa...@redhat.com>
Reviewed-by: Kevin Wolf 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Kevin Wolf 
---
 hw/block/virtio-blk.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 6e3e3a23ee..e430ba583c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1824,6 +1824,7 @@ static int virtio_blk_start_ioeventfd(VirtIODevice *vdev)
  * Try to change the AioContext so that block jobs and other operations can
  * co-locate their activity in the same AioContext. If it fails, nevermind.
  */
+assert(nvqs > 0); /* enforced during ->realize() */
 r = blk_set_aio_context(s->conf.conf.blk, s->vq_aio_context[0],
 &local_err);
 if (r < 0) {
-- 
2.43.0




[PULL 14/16] virtio: Re-enable notifications after drain

2024-02-07 Thread Kevin Wolf
From: Hanna Czenczek 

During drain, we do not care about virtqueue notifications, which is why
we remove the handlers on it.  When removing those handlers, whether vq
notifications are enabled or not depends on whether we were in polling
mode or not; if not, they are enabled (by default); if so, they have
been disabled by the io_poll_start callback.

Because we do not care about those notifications after removing the
handlers, this is fine.  However, we have to explicitly ensure they are
enabled when re-attaching the handlers, so we will resume receiving
notifications.  We do this in virtio_queue_aio_attach_host_notifier*().
If such a function is called while we are in a polling section,
attaching the notifiers will then invoke the io_poll_start callback,
re-disabling notifications.

Because we will always miss virtqueue updates in the drained section, we
also need to poll the virtqueue once after attaching the notifiers.

Buglink: https://issues.redhat.com/browse/RHEL-3934
Signed-off-by: Hanna Czenczek 
Message-ID: <20240202153158.788922-3-hre...@redhat.com>
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 include/block/aio.h |  7 ++-
 hw/virtio/virtio.c  | 42 ++
 2 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/include/block/aio.h b/include/block/aio.h
index 5d0a114988..8378553eb9 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -480,9 +480,14 @@ void aio_set_event_notifier(AioContext *ctx,
 AioPollFn *io_poll,
 EventNotifierHandler *io_poll_ready);
 
-/* Set polling begin/end callbacks for an event notifier that has already been
+/*
+ * Set polling begin/end callbacks for an event notifier that has already been
  * registered with aio_set_event_notifier.  Do nothing if the event notifier is
  * not registered.
+ *
+ * Note that if the io_poll_end() callback (or the entire notifier) is removed
+ * during polling, it will not be called, so an io_poll_begin() is not
+ * necessarily always followed by an io_poll_end().
  */
 void aio_set_event_notifier_poll(AioContext *ctx,
  EventNotifier *notifier,
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7549094154..d229755eae 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3556,6 +3556,17 @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n)
 
 void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
+/*
+ * virtio_queue_aio_detach_host_notifier() can leave notifications 
disabled.
+ * Re-enable them.  (And if detach has not been used before, notifications
+ * being enabled is still the default state while a notifier is attached;
+ * see virtio_queue_host_notifier_aio_poll_end(), which will always leave
+ * notifications enabled once the polling section is left.)
+ */
+if (!virtio_queue_get_notification(vq)) {
+virtio_queue_set_notification(vq, 1);
+}
+
    aio_set_event_notifier(ctx, &vq->host_notifier,
virtio_queue_host_notifier_read,
virtio_queue_host_notifier_aio_poll,
@@ -3563,6 +3574,13 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
    aio_set_event_notifier_poll(ctx, &vq->host_notifier,
 virtio_queue_host_notifier_aio_poll_begin,
 virtio_queue_host_notifier_aio_poll_end);
+
+/*
+ * We will have ignored notifications about new requests from the guest
+ * while no notifiers were attached, so "kick" the virt queue to process
+ * those requests now.
+ */
+event_notifier_set(&vq->host_notifier);
 }
 
 /*
@@ -3573,14 +3591,38 @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx)
  */
void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx)
 {
+/* See virtio_queue_aio_attach_host_notifier() */
+if (!virtio_queue_get_notification(vq)) {
+virtio_queue_set_notification(vq, 1);
+}
+
    aio_set_event_notifier(ctx, &vq->host_notifier,
virtio_queue_host_notifier_read,
NULL, NULL);
+
+/*
+ * See virtio_queue_aio_attach_host_notifier().
+ * Note that this may be unnecessary for the type of virtqueues this
+ * function is used for.  Still, it will not hurt to have a quick look into
+ * whether we can/should process any of the virtqueue elements.
+ */
+event_notifier_set(&vq->host_notifier);
 }
 
 void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx)
 {
    aio_set_event_notifier(ctx, &vq->host_notifier, NULL, NULL, NULL);
+
+/*
+ * aio_set_event_notifier_poll() does not guarantee whether io_poll_end()
+ * will run after io_poll_begin(), so by removing the notifier, we do not
+ * know whether 

[PULL 08/16] iotests: fix leak of tmpdir in dry-run mode

2024-02-07 Thread Kevin Wolf
From: Daniel P. Berrangé 

Creating an instance of the 'TestEnv' class will create a temporary
directory. This dir is only deleted, however, in the __exit__ handler
invoked by a context manager.

In dry-run mode, we don't use the TestEnv via a context manager, so
were leaking the temporary directory. Since meson invokes 'check'
5 times on each configure run, developers' /tmp was filling up with
empty temporary directories.

Signed-off-by: Daniel P. Berrangé 
Message-ID: <20240205154019.1841037-1-berra...@redhat.com>
Reviewed-by: Michael Tokarev 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/check | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index f2e9d27dcf..56d88ca423 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -184,7 +184,8 @@ if __name__ == '__main__':
 sys.exit(str(e))
 
 if args.dry_run:
-print('\n'.join([os.path.basename(t) for t in tests]))
+with env:
+print('\n'.join([os.path.basename(t) for t in tests]))
 else:
 with TestRunner(env, tap=args.tap,
 color=args.color) as tr:
-- 
2.43.0




[PULL 13/16] virtio-scsi: Attach event vq notifier with no_poll

2024-02-07 Thread Kevin Wolf
From: Hanna Czenczek 

As of commit 38738f7dbbda90fbc161757b7f4be35b52205552 ("virtio-scsi:
don't waste CPU polling the event virtqueue"), we only attach an io_read
notifier for the virtio-scsi event virtqueue instead, and no polling
notifiers.  During operation, the event virtqueue is typically
non-empty, but none of the buffers are intended to be used immediately.
Instead, they only get used when certain events occur.  Therefore, it
makes no sense to continuously poll it when non-empty, because it is
supposed to be and stay non-empty.

We do this by using virtio_queue_aio_attach_host_notifier_no_poll()
instead of virtio_queue_aio_attach_host_notifier() for the event
virtqueue.

Commit 766aa2de0f29b657148e04599320d771c36fd126 ("virtio-scsi: implement
BlockDevOps->drained_begin()") however has virtio_scsi_drained_end() use
virtio_queue_aio_attach_host_notifier() for all virtqueues, including
the event virtqueue.  This can lead to it being polled again, undoing
the benefit of commit 38738f7dbbda90fbc161757b7f4be35b52205552.

Fix it by using virtio_queue_aio_attach_host_notifier_no_poll() for the
event virtqueue.

Reported-by: Fiona Ebner 
Fixes: 766aa2de0f29b657148e04599320d771c36fd126
   ("virtio-scsi: implement BlockDevOps->drained_begin()")
Reviewed-by: Stefan Hajnoczi 
Tested-by: Fiona Ebner 
Reviewed-by: Fiona Ebner 
Signed-off-by: Hanna Czenczek 
Message-ID: <20240202153158.788922-2-hre...@redhat.com>
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Kevin Wolf 
Signed-off-by: Kevin Wolf 
---
 hw/scsi/virtio-scsi.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 690aceec45..9f02ceea09 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -1149,6 +1149,7 @@ static void virtio_scsi_drained_begin(SCSIBus *bus)
 static void virtio_scsi_drained_end(SCSIBus *bus)
 {
 VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus);
+VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s);
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
 uint32_t total_queues = VIRTIO_SCSI_VQ_NUM_FIXED +
 s->parent_obj.conf.num_queues;
@@ -1166,7 +1167,11 @@ static void virtio_scsi_drained_end(SCSIBus *bus)
 
 for (uint32_t i = 0; i < total_queues; i++) {
 VirtQueue *vq = virtio_get_queue(vdev, i);
-virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+if (vq == vs->event_vq) {
+virtio_queue_aio_attach_host_notifier_no_poll(vq, s->ctx);
+} else {
+virtio_queue_aio_attach_host_notifier(vq, s->ctx);
+}
 }
 }
 
-- 
2.43.0




[PULL 00/16] Block layer patches

2024-02-07 Thread Kevin Wolf
The following changes since commit 39a6e4f87e7b75a45b08d6dc8b8b7c2954c87440:

  Merge tag 'pull-qapi-2024-02-03' of https://repo.or.cz/qemu/armbru into 
staging (2024-02-03 13:31:58 +)

are available in the Git repository at:

  https://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 7ccd0415f2d67e6739da756241f60d98d5c80bf8:

  virtio-blk: avoid using ioeventfd state in irqfd conditional (2024-02-07 21:59:07 +0100)


Block layer patches

- Allow concurrent BB context changes
- virtio: Re-enable notifications after drain
- virtio-blk: Fix missing use of irqfd
- scsi: Don't ignore most usb-storage properties
- blkio: Respect memory-alignment for bounce buffer allocations
- iotests tmpdir fixes
- virtio-blk: Code cleanups


Daniel P. Berrangé (2):
  iotests: fix leak of tmpdir in dry-run mode
  iotests: give tempdir an identifying name

Hanna Czenczek (5):
  block-backend: Allow concurrent context changes
  scsi: Await request purging
  virtio-scsi: Attach event vq notifier with no_poll
  virtio: Re-enable notifications after drain
  virtio-blk: Use ioeventfd_attach in start_ioeventfd

Kevin Wolf (2):
  scsi: Don't ignore most usb-storage properties
  blkio: Respect memory-alignment for bounce buffer allocations

Stefan Hajnoczi (7):
  virtio-blk: enforce iothread-vq-mapping validation
  virtio-blk: clarify that there is at least 1 virtqueue
  virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb()
  virtio-blk: declare VirtIOBlock::rq with a type
  monitor: use aio_co_reschedule_self()
  virtio-blk: do not use C99 mixed declarations
  virtio-blk: avoid using ioeventfd state in irqfd conditional

 include/block/aio.h|   7 +-
 include/hw/scsi/scsi.h |   5 +-
 include/hw/virtio/virtio-blk.h |   2 +-
 block/blkio.c  |   3 +
 block/block-backend.c  |  22 ++--
 hw/block/virtio-blk.c  | 226 +++--
 hw/scsi/scsi-bus.c |  63 ++--
 hw/scsi/virtio-scsi.c  |   7 +-
 hw/usb/dev-storage-classic.c   |   5 +-
 hw/virtio/virtio.c |  42 
 qapi/qmp-dispatch.c|   7 +-
 tests/qemu-iotests/testenv.py  |   2 +-
 tests/qemu-iotests/check   |   3 +-
 13 files changed, 236 insertions(+), 158 deletions(-)




Re: [PATCH] ui/console: Fix console resize with placeholder surface

2024-02-07 Thread Michael Tokarev

07.02.2024 20:20, Tianlan Zhou :

In `qemu_console_resize()`, the old surface of the console is kept if the new
console size is the same as the old one. If the old surface is a placeholder,
and the new size of console is the same as the placeholder surface (640*480),
the surface won't be replaced.
In this situation, the surface's `QEMU_PLACEHOLDER_FLAG` flag is still set, so
the console won't be displayed in SDL display mode.
This patch fixes this problem by forcing a new surface if the old one is a
placeholder.


Cc qemu-stable

/mjt



Re: [PATCH v3 29/29] user: Prefer fast cpu_env() over slower CPU QOM cast macro

2024-02-07 Thread Warner Losh
[[ I don't know if it's too late ]]

On Mon, Jan 29, 2024 at 9:48 AM Philippe Mathieu-Daudé 
wrote:

> Mechanical patch produced running the command documented
> in scripts/coccinelle/cpu_env.cocci_template header.
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  bsd-user/signal.c   | 3 +--
>  linux-user/signal.c | 6 ++
>  2 files changed, 3 insertions(+), 6 deletions(-)
>

Reviewed-by: Warner Losh 


> diff --git a/bsd-user/signal.c b/bsd-user/signal.c
> index ca31470772..c6f0b1be38 100644
> --- a/bsd-user/signal.c
> +++ b/bsd-user/signal.c
> @@ -463,14 +463,13 @@ static int fatal_signal(int sig)
>  void force_sig_fault(int sig, int code, abi_ulong addr)
>  {
>  CPUState *cpu = thread_cpu;
> -CPUArchState *env = cpu_env(cpu);
>  target_siginfo_t info = {};
>
>  info.si_signo = sig;
>  info.si_errno = 0;
>  info.si_code = code;
>  info.si_addr = addr;
> -queue_signal(env, sig, QEMU_SI_FAULT, &info);
> +queue_signal(cpu_env(cpu), sig, QEMU_SI_FAULT, &info);
>  }
>
>  static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index c9527adfa3..f78f7fc476 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -623,7 +623,6 @@ void signal_init(void)
>  void force_sig(int sig)
>  {
>  CPUState *cpu = thread_cpu;
> -CPUArchState *env = cpu_env(cpu);
>  target_siginfo_t info = {};
>
>  info.si_signo = sig;
> @@ -631,7 +630,7 @@ void force_sig(int sig)
>  info.si_code = TARGET_SI_KERNEL;
>  info._sifields._kill._pid = 0;
>  info._sifields._kill._uid = 0;
> -queue_signal(env, info.si_signo, QEMU_SI_KILL, &info);
> +queue_signal(cpu_env(cpu), info.si_signo, QEMU_SI_KILL, &info);
>  }
>
>  /*
> @@ -641,14 +640,13 @@ void force_sig(int sig)
>  void force_sig_fault(int sig, int code, abi_ulong addr)
>  {
>  CPUState *cpu = thread_cpu;
> -CPUArchState *env = cpu_env(cpu);
>  target_siginfo_t info = {};
>
>  info.si_signo = sig;
>  info.si_errno = 0;
>  info.si_code = code;
>  info._sifields._sigfault._addr = addr;
> -queue_signal(env, sig, QEMU_SI_FAULT, &info);
> +queue_signal(cpu_env(cpu), sig, QEMU_SI_FAULT, &info);
>  }
>
>  /* Force a SIGSEGV if we couldn't write to memory trying to set
> --
> 2.41.0
>
>


Re: [PATCH] virtio-blk: avoid using ioeventfd state in irqfd conditional

2024-02-07 Thread Kevin Wolf
Am 22.01.2024 um 18:26 hat Stefan Hajnoczi geschrieben:
> Requests that complete in an IOThread use irqfd to notify the guest
> while requests that complete in the main loop thread use the traditional
> qdev irq code path. The reason for this conditional is that the irq code
> path requires the BQL:
> 
>   if (s->ioeventfd_started && !s->ioeventfd_disabled) {
>   virtio_notify_irqfd(vdev, req->vq);
>   } else {
>   virtio_notify(vdev, req->vq);
>   }
> 
> There is a corner case where the conditional invokes the irq code path
> instead of the irqfd code path:
> 
>   static void virtio_blk_stop_ioeventfd(VirtIODevice *vdev)
>   {
>   ...
>   /*
>* Set ->ioeventfd_started to false before draining so that host 
> notifiers
>* are not detached/attached anymore.
>*/
>   s->ioeventfd_started = false;
> 
>   /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
>   blk_drain(s->conf.conf.blk);
> 
> During blk_drain() the conditional produces the wrong result because
> ioeventfd_started is false.
> 
> Use qemu_in_iothread() instead of checking the ioeventfd state.
> 
> Buglink: https://issues.redhat.com/browse/RHEL-15394
> Signed-off-by: Stefan Hajnoczi 

Thanks, applied to the block branch.

Kevin




Re: [PATCH v2 0/3] virtio: Re-enable notifications after drain

2024-02-07 Thread Kevin Wolf
Am 02.02.2024 um 16:31 hat Hanna Czenczek geschrieben:
> Hanna Czenczek (3):
>   virtio-scsi: Attach event vq notifier with no_poll
>   virtio: Re-enable notifications after drain
>   virtio-blk: Use ioeventfd_attach in start_ioeventfd
> 
>  include/block/aio.h   |  7 ++-
>  hw/block/virtio-blk.c | 21 ++---
>  hw/scsi/virtio-scsi.c |  7 ++-
>  hw/virtio/virtio.c| 42 ++
>  4 files changed, 64 insertions(+), 13 deletions(-)

Thanks, applied to the block branch.

Kevin




Re: [PATCH v2 2/2] aspeed: fix hardcode boot address 0

2024-02-07 Thread Philippe Mathieu-Daudé

Hi Jamin,

On 7/2/24 20:52, Jamin Lin via wrote:

In the previous design of ASPEED SOCs QEMU model, it set the boot
address at "0" which was the hardcode setting for ast10x0, ast2600,
ast2500 and ast2400.

According to the design of ast2700, it has bootmcu which is used for
executing SPL and initialize DRAM, then, CPUs(cortex-a35)
execute u-boot, kernel and rofs. QEMU will only support CPU(cortex-a35)
parts and the boot address is "0x4 " for ast2700.


This justification from here ...


Therefore, fixed hardcode boot address 0.


... to here is still unclear. You provided an explanation in previous
patch, maybe worth including it in this description?

Otherwise for the code changes:
Reviewed-by: Philippe Mathieu-Daudé 


Signed-off-by: Troy Lee 
Signed-off-by: Jamin Lin 
---
  hw/arm/aspeed.c | 4 +++-
  hw/arm/aspeed_ast2400.c | 4 ++--
  hw/arm/aspeed_ast2600.c | 2 +-
  include/hw/arm/aspeed_soc.h | 2 --
  4 files changed, 6 insertions(+), 6 deletions(-)





Re: [PATCH 12/14] migration: Report error when shutdown fails

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

This will help detect issues regarding I/O channels usage.


English isn't my native language but I'd expect "detecting" here.


Signed-off-by: Cédric Le Goater 
---
  migration/qemu-file.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 





Re: [PATCH 11/14] vfio: Extend vfio_set_migration_error() with Error* argument

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

vfio_set_migration_error() sets the 'return' error on the migration
stream if a migration is in progress. To improve error reporting, add
a new Error* argument to also set the Error object on the migration
stream.

Signed-off-by: Cédric Le Goater 
---
  hw/vfio/common.c | 50 +---
  1 file changed, 30 insertions(+), 20 deletions(-)




-static void vfio_set_migration_error(int err)
+static void vfio_set_migration_error(int ret, Error *err)


Maybe rename vfio_report_migration_error() for clarity?


  {
  MigrationState *ms = migrate_get_current();
  
  if (migration_is_setup_or_active(ms->state)) {

  WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
  if (ms->to_dst_file) {
-qemu_file_set_error(ms->to_dst_file, err);
+qemu_file_set_error_obj(ms->to_dst_file, ret, err);
  }
  }
+} else {
+error_report_err(err);
  }
  }





Re: [PATCH 09/14] vfio: Add Error** argument to .vfio_save_config() handler

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

Use vmstate_save_state_with_err() to improve error reporting in the
callers.

Signed-off-by: Cédric Le Goater 
---
  include/hw/vfio/vfio-common.h |  2 +-
  hw/vfio/migration.c   | 18 --
  hw/vfio/pci.c |  5 +++--
  3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9b7ef7d02b5a0ad5266bcc4d06cd6874178978e4..710e0d6a880b97848af6ddc2e7968a01054fa122 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -133,7 +133,7 @@ struct VFIODeviceOps {
  int (*vfio_hot_reset_multi)(VFIODevice *vdev);
  void (*vfio_eoi)(VFIODevice *vdev);
  Object *(*vfio_get_object)(VFIODevice *vdev);
-void (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f);
+int (*vfio_save_config)(VFIODevice *vdev, QEMUFile *f, Error **errp);


Worth a one-line docstring?

Otherwise,
Reviewed-by: Philippe Mathieu-Daudé 


  int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
  };





Re: [PATCH 08/14] vfio: Use new Error** argument in vfio_save_setup()

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

Add an Error** argument to vfio_migration_set_state() and adjust
callers, including vfio_save_setup(). The error will be propagated up
to qemu_savevm_state_setup() where the save_setup() handler is
executed.

Signed-off-by: Cédric Le Goater 
---
  hw/vfio/migration.c | 62 +
  1 file changed, 40 insertions(+), 22 deletions(-)




@@ -429,13 +431,18 @@ static void vfio_save_cleanup(void *opaque)
  {
  VFIODevice *vbasedev = opaque;
  VFIOMigration *migration = vbasedev->migration;
+Error *local_err = NULL;
  
  /*

   * Changing device state from STOP_COPY to STOP can take time. Do it here,
   * after migration has completed, so it won't increase downtime.
   */
  if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
-vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_STOP);
+vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_STOP,
+  _err);
+if (local_err) {


Please check callee return value instead.


+error_report_err(local_err);
+}
  }
  
  g_free(migration->data_buffer);

@@ -541,11 +548,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
  VFIODevice *vbasedev = opaque;
  ssize_t data_size;
  int ret;
+Error *local_err = NULL;
  
  /* We reach here with device state STOP or STOP_COPY only */

  ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP_COPY,
-   VFIO_DEVICE_STATE_STOP);
-if (ret) {
+   VFIO_DEVICE_STATE_STOP, &local_err);
+if (local_err) {


Ditto.


+error_report_err(local_err);
  return ret;
  }




@@ -760,6 +773,7 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
  VFIOMigration *migration = container_of(notifier, VFIOMigration,
  migration_state);
  VFIODevice *vbasedev = migration->vbasedev;
+Error *local_err = NULL;
  
  trace_vfio_migration_state_notifier(vbasedev->name,

  MigrationStatus_str(s->state));
@@ -768,7 +782,11 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
  case MIGRATION_STATUS_CANCELLING:
  case MIGRATION_STATUS_CANCELLED:
  case MIGRATION_STATUS_FAILED:
-vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_RUNNING);
+vfio_migration_set_state_or_reset(vbasedev, VFIO_DEVICE_STATE_RUNNING,
+  &local_err);
+if (local_err) {


Ditto.


+error_report_err(local_err);
+}
  }
  }
  





Re: [PATCH 07/14] vfio: Add Error** argument to vfio_devices_dma_logging_stop()

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

This improves error reporting in the log_global_stop() VFIO handler.

Signed-off-by: Cédric Le Goater 
---
  hw/vfio/common.c | 19 ++-
  1 file changed, 14 insertions(+), 5 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 06/14] vfio: Add Error** argument to vfio_devices_dma_logging_start()

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

This allows to update the Error argument of the VFIO log_global_start()
handler. Errors detected when device level logging is started will be
propagated up to qemu_savevm_state_setup() when the ram save_setup()
handler is executed.

The vfio_set_migration_error() call becomes redudant. Remove it.


Typo "redundant".



Signed-off-by: Cédric Le Goater 
---
  hw/vfio/common.c | 21 +
  1 file changed, 9 insertions(+), 12 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 05/14] vfio: Add Error** argument to .set_dirty_page_tracking() handler

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

We will use the Error object to improve error reporting in the
.log_global*() handlers of VFIO.

Signed-off-by: Cédric Le Goater 
---
  include/hw/vfio/vfio-container-base.h | 4 ++--
  hw/vfio/common.c  | 4 ++--
  hw/vfio/container-base.c  | 4 ++--
  hw/vfio/container.c   | 6 +++---
  4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index b2813b0c117985425c842d91f011bb895955d738..f22fcb5a214be2717b42815371346401bb7fce51 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -81,7 +81,7 @@ int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
  void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
 MemoryRegionSection *section);
  int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
-   bool start);
+   bool start, Error **errp);


Since here, please document modified prototypes, otherwise:

Reviewed-by: Philippe Mathieu-Daudé 


  int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap,
hwaddr iova, hwaddr size);
@@ -122,7 +122,7 @@ struct VFIOIOMMUClass {
  void (*detach_device)(VFIODevice *vbasedev);
  /* migration feature */
  int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
-   bool start);
+   bool start, Error **errp);
  int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap,
hwaddr iova, hwaddr size);





Re: [PATCH 04/14] migration: Modify ram_init_bitmaps() to report dirty tracking errors

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

The .save_setup() handler has now an Error** argument that we can use
to propagate errors reported by the .log_global_start() handler. Do
that for the RAM. qemu_savevm_state_setup() will store the error under
the migration stream for later detection in the migration sequence.

Signed-off-by: Cédric Le Goater 
---
  migration/ram.c | 19 ++-
  1 file changed, 10 insertions(+), 9 deletions(-)




-static void ram_init_bitmaps(RAMState *rs)
+static void ram_init_bitmaps(RAMState *rs, Error **errp)


Please return a boolean.


  {
-Error *local_err = NULL;
-
  qemu_mutex_lock_ramlist();
  
  WITH_RCU_READ_LOCK_GUARD() {

  ram_list_init_bitmaps();
  /* We don't use dirty log with background snapshots */
  if (!migrate_background_snapshot()) {
-memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, &local_err);
-if (local_err) {
-error_report_err(local_err);
+memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION, errp);
+if (*errp) {
+break;
  }
  migration_bitmap_sync_precopy(rs, false);
  }
@@ -2828,7 +2826,7 @@ static void ram_init_bitmaps(RAMState *rs)
  migration_bitmap_clear_discarded_pages(rs);
  }





Re: [PATCH 02/14] migration: Add Error** argument to .load_setup() handler

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

This will be useful to report errors at a higher level, mostly in VFIO
today.

Signed-off-by: Cédric Le Goater 
---
  include/migration/register.h |  2 +-
  hw/vfio/migration.c  |  2 +-
  migration/ram.c  |  2 +-
  migration/savevm.c   | 10 ++
  4 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 831600a00eae4efd0464b60925d65de4d9dbcff8..e6bc226c98b27c1fb0f9e2b56d8aff491aa14d65 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -72,7 +72,7 @@ typedef struct SaveVMHandlers {
  void (*state_pending_exact)(void *opaque, uint64_t *must_precopy,
  uint64_t *can_postcopy);
  LoadStateHandler *load_state;
-int (*load_setup)(QEMUFile *f, void *opaque);
+int (*load_setup)(QEMUFile *f, void *opaque, Error **errp);


Please document this prototype. Otherwise:

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 01/14] migration: Add Error** argument to .save_setup() handler

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 14:33, Cédric Le Goater wrote:

The purpose is to record a potential error in the migration stream if
qemu_savevm_state_setup() fails. Most of the current .save_setup()
handlers can be modified to use the Error argument instead of managing
their own and calling locally error_report(). The following patches
will introduce such changes for VFIO first.

Signed-off-by: Cédric Le Goater 
---
  include/migration/register.h   | 2 +-
  hw/ppc/spapr.c | 2 +-
  hw/s390x/s390-stattrib.c   | 2 +-
  hw/vfio/migration.c| 2 +-
  migration/block-dirty-bitmap.c | 2 +-
  migration/block.c  | 2 +-
  migration/ram.c| 2 +-
  migration/savevm.c | 4 ++--
  8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index 9ab1f79512c605f0c88a45b560c57486fa054441..831600a00eae4efd0464b60925d65de4d9dbcff8 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -25,7 +25,7 @@ typedef struct SaveVMHandlers {
   * used to perform early checks.
   */
  int (*save_prepare)(void *opaque, Error **errp);
-int (*save_setup)(QEMUFile *f, void *opaque);
+int (*save_setup)(QEMUFile *f, void *opaque, Error **errp);


Since you change this, do you mind adding a docstring
describing this prototype?

Otherwise,
Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v3 0/6] target/arm: assorted mte fixes

2024-02-07 Thread Gustavo Romero

Hi Richard,

On 2/6/24 11:52 PM, Richard Henderson wrote:

Changes for v3:
   - As if /sys/devices/system/cpu/cpu/mte_tcf_preferred is "sync".
   - Fix do_st_zpa as well as do_ld_zpa.  Oops.

Because of the above, I dropped Gustavo's t-b.


r~


Richard Henderson (6):
   linux-user/aarch64: Choose SYNC as the preferred MTE mode
   target/arm: Fix nregs computation in do_{ld,st}_zpa
   target/arm: Adjust and validate mtedesc sizem1
   target/arm: Split out make_svemte_desc
   target/arm: Handle mte in do_ldrq, do_ldro
   target/arm: Fix SVE/SME gross MTE suppression checks

  linux-user/aarch64/target_prctl.h | 29 ++-
  target/arm/internals.h|  2 +-
  target/arm/tcg/translate-a64.h|  2 +
  target/arm/tcg/sme_helper.c   |  8 +--
  target/arm/tcg/sve_helper.c   | 12 ++---
  target/arm/tcg/translate-sme.c| 15 ++
  target/arm/tcg/translate-sve.c| 83 ++-
  7 files changed, 83 insertions(+), 68 deletions(-)


Since this patchset fixes the "prctl() failed: Invalid argument"
on QEMU user-mode when both flags (ASYNC | SYNC) are passed
to prctl(PR_SET_TAGGED_ADDR_CTRL, ...), I tested it again --
expecting no different result -- so:

Tested-by: Gustavo Romero 

If that t-b tag doesn't make sense, feel free to drop it :)

Thanks for fixing it!


Cheers,
Gustavo



Re: [PATCH v2 10/14] gdbstub: Expose TARGET_SIGTRAP in a target-agnostic way

2024-02-07 Thread Philippe Mathieu-Daudé

On 7/2/24 17:38, Alex Bennée wrote:

From: Ilya Leoshkevich 

The upcoming syscall catchpoint support needs to send SIGTRAP stop
packets to GDB. Being able to compile this support only once for all
targets is a good thing, and it requires hiding TARGET_SIGTRAP behind
a function call.

Signed-off-by: Ilya Leoshkevich 
Message-Id: <20240202152506.279476-2-...@linux.ibm.com>
Signed-off-by: Alex Bennée 
---
  gdbstub/internals.h   | 1 +
  gdbstub/user-target.c | 5 +
  2 files changed, 6 insertions(+)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 5c0c725e54c..aeb0d9b5377 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -136,6 +136,7 @@ void gdb_append_thread_id(CPUState *cpu, GString *buf);
  int gdb_get_cpu_index(CPUState *cpu);
  unsigned int gdb_get_max_cpus(void); /* both */
  bool gdb_can_reverse(void); /* softmmu, stub for user */
+int gdb_target_sigtrap(void); /* user */
  
  void gdb_create_default_process(GDBState *s);
  
diff --git a/gdbstub/user-target.c b/gdbstub/user-target.c

index c4bba4c72c7..b7d4c37cd81 100644
--- a/gdbstub/user-target.c
+++ b/gdbstub/user-target.c
@@ -418,3 +418,8 @@ void gdb_handle_query_xfer_exec_file(GArray *params, void *user_ctx)
  ts->bprm->filename + offset);
  gdb_put_strbuf();
  }
+
+int gdb_target_sigtrap(void)
+{
+return TARGET_SIGTRAP;
+}


Reviewed-by: Philippe Mathieu-Daudé 
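The pattern being reviewed — hiding a per-target macro behind a function call so the common code can be compiled once for all targets — can be sketched as follows. The value 5 is only an illustrative stand-in; the real TARGET_SIGTRAP value varies per architecture and lives in target-specific headers:

```c
/* Per-target compilation unit: the only place that knows the constant.
 * (Illustrative value; the real one is architecture-dependent.) */
#define TARGET_SIGTRAP 5

int gdb_target_sigtrap(void)
{
    return TARGET_SIGTRAP;
}

/* Target-agnostic code: needs no target headers, so it can be built once
 * and linked against whichever per-target unit provides the function. */
int stop_packet_signal(void)
{
    return gdb_target_sigtrap();
}
```

The indirection costs one function call but removes the need to recompile the stop-packet code per target.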




Re: [PATCH v3 1/6] linux-user/aarch64: Choose SYNC as the preferred MTE mode

2024-02-07 Thread Gustavo Romero



On 2/6/24 11:52 PM, Richard Henderson wrote:

The API does not generate an error for setting ASYNC | SYNC; that merely
constrains the selection vs the per-cpu default.  For qemu linux-user,
choose SYNC as the default.

Cc: qemu-sta...@nongnu.org
Reported-by: Gustavo Romero 
Signed-off-by: Richard Henderson 
---
  linux-user/aarch64/target_prctl.h | 29 +
  1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/linux-user/aarch64/target_prctl.h b/linux-user/aarch64/target_prctl.h
index 5067e7d731..aa8e203c15 100644
--- a/linux-user/aarch64/target_prctl.h
+++ b/linux-user/aarch64/target_prctl.h
@@ -173,21 +173,26 @@ static abi_long do_prctl_set_tagged_addr_ctrl(CPUArchState *env, abi_long arg2)
  env->tagged_addr_enable = arg2 & PR_TAGGED_ADDR_ENABLE;
  
  if (cpu_isar_feature(aa64_mte, cpu)) {

-switch (arg2 & PR_MTE_TCF_MASK) {
-case PR_MTE_TCF_NONE:
-case PR_MTE_TCF_SYNC:
-case PR_MTE_TCF_ASYNC:
-break;
-default:
-return -EINVAL;
-}
-
  /*
   * Write PR_MTE_TCF to SCTLR_EL1[TCF0].
- * Note that the syscall values are consistent with hw.
+ *
+ * The kernel has a per-cpu configuration for the sysadmin,
+ * /sys/devices/system/cpu/cpu/mte_tcf_preferred,
+ * which qemu does not implement.
+ *
+ * Because there is no performance difference between the modes, and
+ * because SYNC is most useful for debugging MTE errors, choose SYNC
+ * as the preferred mode.  With this preference, and the way the API
+ * uses only two bits, there is no way for the program to select
+ * ASYMM mode.
   */
-env->cp15.sctlr_el[1] =
-deposit64(env->cp15.sctlr_el[1], 38, 2, arg2 >> PR_MTE_TCF_SHIFT);
+unsigned tcf = 0;
+if (arg2 & PR_MTE_TCF_SYNC) {
+tcf = 1;
+} else if (arg2 & PR_MTE_TCF_ASYNC) {
+tcf = 2;
+}
+env->cp15.sctlr_el[1] = deposit64(env->cp15.sctlr_el[1], 38, 2, tcf);
  
  /*

   * Write PR_MTE_TAG to GCR_EL1[Exclude].



OK, so no ASYMM in QEMU user-mode, and if both the SYNC and ASYNC flags are
specified by the user, SYNC is selected. This is contrary to what happens by
default on Linux because of the mte_tcf_preferred value, which defaults to
ASYNC; there the final value is selected by:

resolved_mte_tcf = (mte_ctrl & pref) ? pref : mte_ctrl; [0]

where pref is mte_tcf_preferred (the per-CPU value set via sysfs in
mte_tcf_preferred) and mte_ctrl comes from the process, i.e. it is the value
specified by the user in the flags -- hence the default on Linux when both
flags are specified is ASYNC, not SYNC.

(just some notes for the record)

Thanks.


[0] 
https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/mte.c#L180-L186



[PATCH v2 2/2] aspeed: fix hardcode boot address 0

2024-02-07 Thread Jamin Lin via
In the previous design of the ASPEED SoC QEMU models, the boot address
was hardcoded to "0", which was correct for ast10x0, ast2600, ast2500
and ast2400.

The ast2700 design has a bootmcu which executes the SPL and initializes
the DRAM; the CPUs (Cortex-A35) then execute u-boot, the kernel and
rofs. QEMU will only support the CPU (Cortex-A35) parts, and the boot
address is "0x4 " for ast2700.
Therefore, fix the hardcoded boot address 0.

Signed-off-by: Troy Lee 
Signed-off-by: Jamin Lin 
---
 hw/arm/aspeed.c | 4 +++-
 hw/arm/aspeed_ast2400.c | 4 ++--
 hw/arm/aspeed_ast2600.c | 2 +-
 include/hw/arm/aspeed_soc.h | 2 --
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 06d863958b..39758557be 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -289,12 +289,14 @@ static void aspeed_install_boot_rom(AspeedMachineState *bmc, BlockBackend *blk,
 uint64_t rom_size)
 {
 AspeedSoCState *soc = bmc->soc;
+AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(soc);
 
 memory_region_init_rom(&soc->boot_rom, NULL, "aspeed.boot_rom", rom_size,
&error_abort);
 memory_region_add_subregion_overlap(&soc->spi_boot_container, 0,
 &soc->boot_rom, 1);
-write_boot_rom(blk, ASPEED_SOC_SPI_BOOT_ADDR, rom_size, &error_abort);
+write_boot_rom(blk, sc->memmap[ASPEED_DEV_SPI_BOOT],
+   rom_size, &error_abort);
 }
 
 void aspeed_board_init_flashes(AspeedSMCState *s, const char *flashtype,
diff --git a/hw/arm/aspeed_ast2400.c b/hw/arm/aspeed_ast2400.c
index 95da85fee0..d125886207 100644
--- a/hw/arm/aspeed_ast2400.c
+++ b/hw/arm/aspeed_ast2400.c
@@ -26,7 +26,7 @@
 #define ASPEED_SOC_IOMEM_SIZE   0x00200000
 
 static const hwaddr aspeed_soc_ast2400_memmap[] = {
-[ASPEED_DEV_SPI_BOOT]  =  ASPEED_SOC_SPI_BOOT_ADDR,
+[ASPEED_DEV_SPI_BOOT]  = 0x00000000,
 [ASPEED_DEV_IOMEM]  = 0x1E600000,
 [ASPEED_DEV_FMC]= 0x1E620000,
 [ASPEED_DEV_SPI1]   = 0x1E630000,
@@ -61,7 +61,7 @@ static const hwaddr aspeed_soc_ast2400_memmap[] = {
 };
 
 static const hwaddr aspeed_soc_ast2500_memmap[] = {
-[ASPEED_DEV_SPI_BOOT]  = ASPEED_SOC_SPI_BOOT_ADDR,
+[ASPEED_DEV_SPI_BOOT]  = 0x00000000,
 [ASPEED_DEV_IOMEM]  = 0x1E600000,
 [ASPEED_DEV_FMC]= 0x1E620000,
 [ASPEED_DEV_SPI1]   = 0x1E630000,
diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
index f74561ecdc..174be53770 100644
--- a/hw/arm/aspeed_ast2600.c
+++ b/hw/arm/aspeed_ast2600.c
@@ -22,7 +22,7 @@
 #define ASPEED_SOC_DPMCU_SIZE   0x00040000
 
 static const hwaddr aspeed_soc_ast2600_memmap[] = {
-[ASPEED_DEV_SPI_BOOT]  = ASPEED_SOC_SPI_BOOT_ADDR,
+[ASPEED_DEV_SPI_BOOT]  = 0x00000000,
 [ASPEED_DEV_SRAM]  = 0x10000000,
 [ASPEED_DEV_DPMCU] = 0x18000000,
 /* 0x16000000 0x17FFFFFF : AHB BUS do LPC Bus bridge */
diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index 5ab0902da0..bf43ad8351 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -224,8 +224,6 @@ enum {
 ASPEED_DEV_FSI2,
 };
 
-#define ASPEED_SOC_SPI_BOOT_ADDR 0x0
-
 qemu_irq aspeed_soc_get_irq(AspeedSoCState *s, int dev);
 bool aspeed_soc_uart_realize(AspeedSoCState *s, Error **errp);
 void aspeed_soc_uart_set_chr(AspeedSoCState *s, int dev, Chardev *chr);
-- 
2.25.1




[PATCH v2 1/2] aspeed: introduce a new UART0 device name

2024-02-07 Thread Jamin Lin via
The Aspeed datasheets refer to the UART controllers as UART1 - UART13
for the ast10x0, ast2600, ast2500 and ast2400 SoCs, while the ast2700
introduces a UART0 and refers to its controllers as UART0 - UART12.

To keep the naming in the QEMU models in sync with the datasheets,
introduce a new UART0 device name and make the required adjustments.

Signed-off-by: Troy Lee 
Signed-off-by: Jamin Lin 
---
 hw/arm/aspeed.c | 13 -
 hw/arm/aspeed_ast10x0.c |  1 +
 hw/arm/aspeed_ast2400.c |  2 ++
 hw/arm/aspeed_ast2600.c |  1 +
 hw/arm/aspeed_soc_common.c  | 14 +-
 include/hw/arm/aspeed_soc.h |  2 ++
 6 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index 09b1e823ba..06d863958b 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -342,7 +342,7 @@ static void connect_serial_hds_to_uarts(AspeedMachineState *bmc)
 int uart_chosen = bmc->uart_chosen ? bmc->uart_chosen : amc->uart_default;
 
 aspeed_soc_uart_set_chr(s, uart_chosen, serial_hd(0));
-for (int i = 1, uart = ASPEED_DEV_UART1; i < sc->uarts_num; i++, uart++) {
+for (int i = 0, uart = sc->uarts_base; i < sc->uarts_num; i++, uart++) {
 if (uart == uart_chosen) {
 continue;
 }
@@ -1094,7 +1094,7 @@ static char *aspeed_get_bmc_console(Object *obj, Error **errp)
 AspeedMachineClass *amc = ASPEED_MACHINE_GET_CLASS(bmc);
 int uart_chosen = bmc->uart_chosen ? bmc->uart_chosen : amc->uart_default;
 
-return g_strdup_printf("uart%d", uart_chosen - ASPEED_DEV_UART1 + 1);
+return g_strdup_printf("uart%d", uart_chosen - ASPEED_DEV_UART0);
 }
 
 static void aspeed_set_bmc_console(Object *obj, const char *value, Error 
**errp)
@@ -1103,6 +1103,8 @@ static void aspeed_set_bmc_console(Object *obj, const char *value, Error **errp)
 AspeedMachineClass *amc = ASPEED_MACHINE_GET_CLASS(bmc);
 AspeedSoCClass *sc = ASPEED_SOC_CLASS(object_class_by_name(amc->soc_name));
 int val;
+int start = sc->uarts_base - ASPEED_DEV_UART0;
+int end = start + sc->uarts_num;
 
 if (sscanf(value, "uart%u", &val) != 1) {
 error_setg(errp, "Bad value for \"uart\" property");
@@ -1110,11 +1112,12 @@ static void aspeed_set_bmc_console(Object *obj, const char *value, Error **errp)
 }
 
 /* The number of UART depends on the SoC */
-if (val < 1 || val > sc->uarts_num) {
-error_setg(errp, "\"uart\" should be in range [1 - %d]", sc->uarts_num);
+if (val < start || val >= end) {
+error_setg(errp, "\"uart\" should be in range [%d - %d]",
+   start, end - 1);
 return;
 }
-bmc->uart_chosen = ASPEED_DEV_UART1 + val - 1;
+bmc->uart_chosen = val + ASPEED_DEV_UART0;
 }
 
 static void aspeed_machine_class_props_init(ObjectClass *oc)
diff --git a/hw/arm/aspeed_ast10x0.c b/hw/arm/aspeed_ast10x0.c
index c3b5116a6a..2634e0f654 100644
--- a/hw/arm/aspeed_ast10x0.c
+++ b/hw/arm/aspeed_ast10x0.c
@@ -436,6 +436,7 @@ static void aspeed_soc_ast1030_class_init(ObjectClass *klass, void *data)
 sc->wdts_num = 4;
 sc->macs_num = 1;
 sc->uarts_num = 13;
+sc->uarts_base = ASPEED_DEV_UART1;
 sc->irqmap = aspeed_soc_ast1030_irqmap;
 sc->memmap = aspeed_soc_ast1030_memmap;
 sc->num_cpus = 1;
diff --git a/hw/arm/aspeed_ast2400.c b/hw/arm/aspeed_ast2400.c
index 8829561bb6..95da85fee0 100644
--- a/hw/arm/aspeed_ast2400.c
+++ b/hw/arm/aspeed_ast2400.c
@@ -523,6 +523,7 @@ static void aspeed_soc_ast2400_class_init(ObjectClass *oc, void *data)
 sc->wdts_num = 2;
 sc->macs_num = 2;
 sc->uarts_num= 5;
+sc->uarts_base   = ASPEED_DEV_UART1;
 sc->irqmap   = aspeed_soc_ast2400_irqmap;
 sc->memmap   = aspeed_soc_ast2400_memmap;
 sc->num_cpus = 1;
@@ -551,6 +552,7 @@ static void aspeed_soc_ast2500_class_init(ObjectClass *oc, void *data)
 sc->wdts_num = 3;
 sc->macs_num = 2;
 sc->uarts_num= 5;
+sc->uarts_base   = ASPEED_DEV_UART1;
 sc->irqmap   = aspeed_soc_ast2500_irqmap;
 sc->memmap   = aspeed_soc_ast2500_memmap;
 sc->num_cpus = 1;
diff --git a/hw/arm/aspeed_ast2600.c b/hw/arm/aspeed_ast2600.c
index 4ee32ea99d..f74561ecdc 100644
--- a/hw/arm/aspeed_ast2600.c
+++ b/hw/arm/aspeed_ast2600.c
@@ -666,6 +666,7 @@ static void aspeed_soc_ast2600_class_init(ObjectClass *oc, void *data)
 sc->wdts_num = 4;
 sc->macs_num = 4;
 sc->uarts_num= 13;
+sc->uarts_base   = ASPEED_DEV_UART1;
 sc->irqmap   = aspeed_soc_ast2600_irqmap;
 sc->memmap   = aspeed_soc_ast2600_memmap;
 sc->num_cpus = 2;
diff --git a/hw/arm/aspeed_soc_common.c b/hw/arm/aspeed_soc_common.c
index 123a0c432c..54c875c8d5 100644
--- a/hw/arm/aspeed_soc_common.c
+++ b/hw/arm/aspeed_soc_common.c
@@ -36,7 +36,7 @@ bool aspeed_soc_uart_realize(AspeedSoCState *s, Error **errp)
 AspeedSoCClass *sc = ASPEED_SOC_GET_CLASS(s);
 SerialMM *smm;
 
-
