On 29/01/2024 6:17, Peter Xu wrote:


On Sun, Jan 28, 2024 at 05:43:52PM +0200, Avihai Horon wrote:
On 25/01/2024 22:57, Fabiano Rosas wrote:


Avihai Horon <avih...@nvidia.com> writes:

The commit in the Fixes line moved multifd thread creation to a
different location, but forgot to move the p->running = true assignment
as well. Thus, p->running is set to true before the multifd thread is
actually created.

p->running is used in multifd_save_cleanup() to decide whether to join
the multifd thread or not.

With TLS, an error in multifd_tls_channel_connect() can lead to a
segmentation fault because p->running is true but p->thread is never
initialized, so multifd_save_cleanup() tries to join an uninitialized
thread.

Fix it by moving the p->running = true assignment to right after multifd
thread creation. Also move qio_channel_set_delay() there, as that is
where it was originally.

Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon <avih...@nvidia.com>
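
For illustration only, here is a standalone sketch of the ordering the commit
message describes -- plain pthreads, not the actual QEMU multifd code and not
the literal patch: the "running" flag is set only after the thread is really
created, so cleanup never joins an uninitialized thread.

/*
 * Standalone illustration of the ordering issue described above; all
 * names are made up for this sketch.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    pthread_t thread;
    bool running;
} SendChannel;

static void *send_thread(void *arg)
{
    /* ... the worker would push migration data here ... */
    return NULL;
}

static bool channel_connect(SendChannel *p)
{
    /* (In QEMU this is also where qio_channel_set_delay() would go.) */
    if (pthread_create(&p->thread, NULL, send_thread, p) != 0) {
        return false;       /* thread was never created */
    }
    p->running = true;      /* set only after the thread really exists */
    return true;
}

static void channel_cleanup(SendChannel *p)
{
    /* Mirrors the multifd_save_cleanup() logic: join only if running. */
    if (p->running) {
        pthread_join(p->thread, NULL);
        p->running = false;
    }
}

int main(void)
{
    SendChannel p = { .running = false };

    if (channel_connect(&p)) {
        channel_cleanup(&p);
    }
    printf("done\n");
    return 0;
}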
Just for context, I haven't looked at this patch yet, but we were
planning to remove p->running altogether:

https://lore.kernel.org/r/20231110200241.20679-1-faro...@suse.de
Thanks for putting me in the picture.
I see that there has been a discussion about the multifd creation/teardown
flow.
In light of this discussion, I can already see a few problems in my series
that I didn't notice before (such as the TLS handshake thread leak).
The thread you mentioned here and some of my patches point out problems
in multifd creation/teardown. I guess we can discuss them and see what's the
best way to solve them.

Regarding this patch: your solution indeed solves the bug it addresses,
so maybe it could be dropped (or only noted in your patch).

Maybe I should also put you (and Peter) in context for this whole series --
I am writing it as preparation for adding a separate migration channel for
VFIO device migration, so VFIO devices could be migrated in parallel.
So this series tries to lay down some foundations to facilitate it.
Avihai, is throughput the only reason VFIO would like to have a
separate channel?

Actually, the main reason is to be able to send and load the data of multiple VFIO devices in parallel. Today, if we have three VFIO devices, they are migrated sequentially, one after another. This particularly hurts during the complete pre-copy phase (downtime), as loading the VFIO data on the destination involves FW interaction and resource allocation, which takes time and simply blocks the other devices from sending and loading their data. Providing a separate channel and thread for each VFIO device solves this problem and ideally reduces the VFIO contribution to downtime from sum{VFIO device #1, ..., VFIO device #N} to max{VFIO device #1, ..., VFIO device #N}.
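
To make that concrete with purely made-up numbers: if three devices take 4s,
3s and 2s to load their state, then

    sequential: 4 + 3 + 2 = 9s contributed to downtime
    parallel:   max(4, 3, 2) = 4s contributed to downtime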


I'm wondering if we can also use multifd threads to send VFIO data at some
point.  Right now multifd is indeed closely bound to RAM pages, but maybe
it'll change in the near future to take any load?

Multifd already exists to solve the throughput issue. If VFIO has the same
goal, IMHO it would be good to keep them using the same thread model, instead
of managing different threads in different places.  With that, any user
setting (for example, multifd-n-threads) would naturally apply to all
components, rather than relying on yet another vfio-migration-threads-num
parameter.

Frankly, I didn't really pay much attention to the throughput factor, and my plan was to introduce only a single thread per device. VFIO devices may have many GBs of data to migrate (e.g., vGPUs), and even mlx5 VFs can have a few GBs of data. So what you are saying here is interesting, although I haven't tested such a scenario to see the actual benefit.

I am trying to think if/how this could work and I have a few concerns:
1. RAM is made of fixed-position pages that can be randomly read/written, so sending these pages over multiple channels and loading them in the destination can work pretty naturally without much overhead.
   VFIO device data, on the other hand, is just an opaque stream of bytes from QEMU's point of view. This means that if we break this data into "packets" and send them over multiple channels, we must preserve the order in which the data was originally read from the device, and write it in the same order to the destination device (see the sketch after this list).
   I am wondering if the overhead of maintaining such order may hurt performance, making it not worthwhile.

2. As I mentioned above, the main motivation for a separate VFIO migration channel/thread, as I see it today, is to allow parallel migration of VFIO devices.
   AFAIU, multifd as it is today doesn't provide such parallelism (i.e., in the complete pre-copy phase each device, be it RAM or VFIO, will fully send its data over the multifd threads, and only after it finishes will the next device send its data).
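
To illustrate the ordering concern from point 1, here is a hypothetical,
self-contained sketch -- not existing QEMU or VFIO code, every name is made
up: the opaque device-state stream is split into sequence-numbered packets,
and the destination only writes a packet once all earlier packets have been
written.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PKT_SIZE 4096

typedef struct {
    uint64_t seq;              /* position of this packet in the stream */
    uint32_t len;              /* valid bytes in data[] */
    uint8_t  data[PKT_SIZE];
} StatePacket;

typedef struct {
    uint64_t next_seq;         /* next packet we are allowed to write */
} Reassembler;

static void write_to_device(const uint8_t *buf, uint32_t len)
{
    /* stand-in for the real "load device state" step */
    (void)buf;
    printf("wrote %u bytes\n", len);
}

/* Returns true if the packet was written, false if it must be buffered. */
static bool reassembler_deliver(Reassembler *r, const StatePacket *pkt)
{
    if (pkt->seq != r->next_seq) {
        return false;          /* out of order: caller buffers and retries */
    }
    write_to_device(pkt->data, pkt->len);
    r->next_seq++;
    return true;
}

int main(void)
{
    Reassembler r = { .next_seq = 0 };
    StatePacket p0 = { .seq = 0, .len = 16 };
    StatePacket p1 = { .seq = 1, .len = 16 };

    /* Packet 1 arrives first, e.g. on a faster channel: it must be held. */
    printf("p1: %s\n", reassembler_deliver(&r, &p1) ? "written" : "buffered");
    printf("p0: %s\n", reassembler_deliver(&r, &p0) ? "written" : "buffered");
    printf("p1: %s\n", reassembler_deliver(&r, &p1) ? "written" : "buffered");
    return 0;
}

The buffering of out-of-order packets (and hence the extra copies and
bookkeeping) is exactly the overhead I am worried about above.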

This is just what came to mind. Maybe we can think about this more thoroughly and see if it could work somehow, now or in the future. However, I think making the multifd threads generic so they can send any kind of data is a good thing in general, regardless of VFIO.

