On Wed, Apr 30, 2025 at 09:49:40AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> On 30/04/2025 16:48, Zhijian Li (Fujitsu) via wrote:
> >>> stderr:
> >>> qemu-system-x86_64: cannot get rkey
> >>> qemu-system-x86_64: error while loading state section id 2(ram)
> >>> qemu-system-x86_64: load of migration failed: Operation not permitted
> >>> qemu-system-x86_64: rdma migration: recv polling control error!
> >>> qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
> >>> qemu-system-x86_64: failed to save SaveStateEntry with id(name): 2(ram): -1
> >>> qemu-system-x86_64: Channel error: Operation not permitted
> >>> **
> >>> ERROR:../tests/qtest/migration/migration-qmp.c:200:check_migration_status:
> >>> assertion failed (current_status != "failed"): ("failed" != "failed")
> >>> qemu-system-x86_64: warning: Early error. Sending error.
> >>> qemu-system-x86_64: warning: rdma migration: send polling control error
> >>> ../tests/qtest/libqtest.c:199: kill_qemu() tried to terminate QEMU
> >>> process but encountered exit status 1 (expected 0)
> >>>
> >>> So running the test also needs root?  Is it possible we fix the test so it
> >>> can also be smart enough to skip if it knows it'll hit the "cannot get
> >>> rkey" error (even if it sees the rdma link setup)?  Not something urgent
> >>> but definitely good to have.
> 
> It seems it's a security problem; I don't have a good idea yet.
> 
> Let me see...
> 
> Another workaround is to update 'ulimit -l' to >= 128M for a non-root user
> (in practice this value works well on Fedora 40).

OK, so it's about the locked memory limit.. thanks for looking.
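As a side note, a rough sketch only: an unprivileged process is allowed to
raise its soft RLIMIT_MEMLOCK up to the hard limit, so a skip check like the
one in the patch quoted below could first attempt that before giving up.
The helper name here is hypothetical and not part of the quoted patch:

#include <stdbool.h>
#include <sys/resource.h>

/*
 * Hypothetical helper, not in the patch below: try to raise the soft
 * RLIMIT_MEMLOCK up to the hard limit, which requires no privilege.
 * 'required' is in bytes.
 */
static bool try_raise_memlock(rlim_t required)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_MEMLOCK, &rlim) != 0) {
        return false;
    }
    if (rlim.rlim_cur >= required) {
        return true;        /* soft limit already large enough */
    }
    if (rlim.rlim_max < required) {
        return false;       /* hard limit too small; only the admin can help */
    }
    rlim.rlim_cur = rlim.rlim_max;
    return setrlimit(RLIMIT_MEMLOCK, &rlim) == 0;
}

This only helps when the hard limit is already >= 128M, though; with the
usual small defaults the user still needs to bump the limit, e.g. via
'ulimit -l' as suggested above.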
> 
> So we would have something like this:
> 
> diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
> index 9f7236dc59f..1f24753c5a5 100644
> --- a/tests/qtest/migration/precopy-tests.c
> +++ b/tests/qtest/migration/precopy-tests.c
> @@ -101,6 +101,26 @@ static void test_precopy_unix_dirty_ring(void)
>  
>  #ifdef CONFIG_RDMA
>  
> +#include <sys/resource.h>
> +#define REQUIRED_MEMLOCK (128 * 1024 * 1024) // 128MB

Where does the 128M come from?  Is it correlated with the VM size somehow?
Btw, migrate_start() says for x86 we use a 150MB VM.

When you feel confident, feel free to send a formal patch; it can also
include the reposted version of the current patch so that this can be a
series.

It'll also be great if you could make sure they apply on top of:

https://gitlab.com/peterx/qemu/-/tree/migration-staging

Thanks,

> +
> +static bool mlock_check(void)
> +{
> +    uid_t uid;
> +    struct rlimit rlim;
> +
> +    uid = getuid();
> +    if (uid == 0) {
> +        return true;
> +    }
> +
> +    if (getrlimit(RLIMIT_MEMLOCK, &rlim) != 0) {
> +        return false;
> +    }
> +
> +    return rlim.rlim_cur >= REQUIRED_MEMLOCK;
> +}
> +
>  #define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
>  static int new_rdma_link(char *buffer, bool ipv6)
>  {
> @@ -137,6 +157,11 @@ static void test_precopy_rdma_plain_ip(bool ipv6)
>  {
>      char buffer[128] = {};
>  
> +    if (!mlock_check()) {
> +        g_test_skip("'ulimit -l' is too small, require 128M");
> +        return;
> +    }
> +
>      if (new_rdma_link(buffer, ipv6)) {
>          g_test_skip("No rdma link available\n"
>                      "# To enable the test:\n"

-- 
Peter Xu