On 11/03/2025 15:04, Cédric Le Goater wrote:
On 3/7/25 14:45, Maciej S. Szmigiero wrote:
On 7.03.2025 13:03, Cédric Le Goater wrote:
On 3/7/25 11:57, Maciej S. Szmigiero wrote:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>
There's already a max in-flight VFIO device state buffers *count* limit,
no. there isn't. Do we need both?
This is on top of the remaining patches (x-migration-load-config-after-iter and x-migration-max-queued-buffers) - I thought we were supposed to work on these after the main series was merged, as they are relatively non-critical.
yes. we don't need both count and size limits though, a size limit is enough.
I would also give x-migration-load-config-after-iter priority over x-migration-max-queued-buffers{,-size}, as the former is a correctness fix while the latter are just additional functionality.
ok. I have kept both patches in my tree with the doc updates.
Also, if some setup is truly worried about these buffers consuming too much memory then roughly the same thing could be achieved by (temporarily) putting the target QEMU process in a memory-limited cgroup (sketched below).
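A minimal sketch of that approach, assuming cgroup v2 mounted at /sys/fs/cgroup and sufficient privileges; the "vfio-dst" group name, the 2G value and the PID are made up for illustration (systemd users could get the same effect with "systemd-run -p MemoryMax=2G"):

/*
 * Sketch only: cap the destination QEMU's memory from the outside via
 * cgroup v2 instead of limiting queued buffers inside QEMU.
 */
#include <stdio.h>
#include <sys/stat.h>

static int write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    int ret = 0;

    if (!f) {
        return -1;
    }
    if (fprintf(f, "%s\n", val) < 0) {
        ret = -1;
    }
    if (fclose(f) != 0) {
        ret = -1;
    }
    return ret;
}

int main(void)
{
    if (mkdir("/sys/fs/cgroup/vfio-dst", 0755) != 0) {
        perror("mkdir");
        return 1;
    }
    /* memory.max is the cgroup v2 hard memory limit */
    if (write_str("/sys/fs/cgroup/vfio-dst/memory.max", "2G") != 0 ||
        /* moving the target QEMU PID into the group applies the limit */
        write_str("/sys/fs/cgroup/vfio-dst/cgroup.procs", "12345") != 0) {
        perror("cgroup setup");
        return 1;
    }
    return 0;
}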
yes.
That said, since QEMU exchanges 1 MiB VFIODeviceStatePackets when using multifd and the overall device state is on the order of 100s of MB:
/*
 * This is an arbitrary size based on migration of mlx5 devices, where typically
 * total device migration size is on the order of 100s of MB. Testing with
 * larger values, e.g. 128MB and 1GB, did not show a performance improvement.
 */
#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
Could we set the limit to 1GB?

Avihai, would that make sense?
There can be many use cases, each one with its own requirements and constraints, so it's hard for me to think of a "good" default value.

IIUC this limit is mostly relevant for the extreme cases where devices have a big state and writing the buffers to the device is slow.

So IMHO let's set it to unlimited by default and let the users decide if they want to set such a limit and to what value. (Note also that even when unlimited, it is really limited to 2 * device_state_size.)

Unless you have other reasons why 1GB or another value is preferable?
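For illustration, here is a minimal sketch of what enforcing such a size cap on the destination's queuing path might look like. This is not the actual patch code; the load_buf_queued_pending_size and migration_max_queued_buffers_size names are made up. For scale, with 1 MiB VFIODeviceStatePackets a 1 GiB cap would correspond to at most 1024 queued buffers.

/*
 * Sketch only -- identifiers are illustrative, not the exact patch code.
 * Called when an out-of-order device state buffer must be queued instead
 * of being written to the device immediately.
 */
static bool vfio_load_buffer_may_queue(VFIOMultifd *multifd,
                                       VFIODevice *vbasedev,
                                       size_t data_size, Error **errp)
{
    /* UINT64_MAX is the documented "no limit" default */
    if (data_size > vbasedev->migration_max_queued_buffers_size -
                    multifd->load_buf_queued_pending_size) {
        error_setg(errp,
                   "%s: queuing %zu more bytes of device state would "
                   "exceed the %" PRIu64 " byte limit",
                   vbasedev->name, data_size,
                   vbasedev->migration_max_queued_buffers_size);
        return false;
    }

    multifd->load_buf_queued_pending_size += data_size;
    return true;
}

(The matching decrement would happen once a queued buffer is actually written to the device and freed.)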
Thanks.
Thanks,
C.
On the other hand, the network endianness patch is urgent since it affects the bit stream.
add also max queued buffers *size* limit.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
---
 docs/devel/migration/vfio.rst |  8 +++++---
 hw/vfio/migration-multifd.c   | 21 +++++++++++++++++++--
 hw/vfio/pci.c                 |  9 +++++++++
 include/hw/vfio/vfio-common.h |  1 +
 4 files changed, 34 insertions(+), 5 deletions(-)
diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
index 7c9cb7bdbf87..127a1db35949 100644
--- a/docs/devel/migration/vfio.rst
+++ b/docs/devel/migration/vfio.rst
@@ -254,12 +254,14 @@ This means that a malicious QEMU source could theoretically cause the target
 QEMU to allocate unlimited amounts of memory for such buffers-in-flight.
 
 The "x-migration-max-queued-buffers" property allows capping the maximum count
-of these VFIO device state buffers queued at the destination.
+of these VFIO device state buffers queued at the destination while
+"x-migration-max-queued-buffers-size" property allows capping their total queued
+size.
 
 Because a malicious QEMU source causing OOM on the target is not expected to be
 a realistic threat in most of VFIO live migration use cases and the right value
-depends on the particular setup by default this queued buffers limit is
-disabled by setting it to UINT64_MAX.
+depends on the particular setup by default these queued buffers limits are
+disabled by setting them to UINT64_MAX.
 
 Some host platforms (like ARM64) require that VFIO device config is loaded only
 after all iterables were loaded.
diff --git a/hw/vfio/migration-multifd.c b/hw/vfio/migration-multifd.c
index dccd763d7c39..a9d41b9f1cb1 100644
--- a/hw/vfio/migration-multifd.c
+++ b/hw/vfio/migration-multifd.c
@@ -83,6 +83,7 @@ typedef struct VFIOMultifd {
     uint32_t load_buf_idx;
     uint32_t load_buf_idx_last;
     uint32_t load_buf_queued_pending_buffers;
'load_buf_queued_pending_buffers' is not in mainline. Please rebase.
Thanks,
C.
Thanks,
Maciej