On Wed, Apr 30, 2025 at 09:49:40AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 30/04/2025 16:48, Zhijian Li (Fujitsu) via wrote:
> >>> stderr:
> >>> qemu-system-x86_64: cannot get rkey
> >>> qemu-system-x86_64: error while loading state section id 2(ram)
> >>> qemu-system-x86_64: load of migration failed: Operation not permitted
> >>> qemu-system-x86_64: rdma migration: recv polling control error!
> >>> qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
> >>> qemu-system-x86_64: failed to save SaveStateEntry with id(name): 2(ram): 
> >>> -1
> >>> qemu-system-x86_64: Channel error: Operation not permitted
> >>> **
> >>> ERROR:../tests/qtest/migration/migration-qmp.c:200:check_migration_status:
> >>>  assertion failed (current_status != "failed"): ("failed" != "failed")
> >>> qemu-system-x86_64: warning: Early error. Sending error.
> >>> qemu-system-x86_64: warning: rdma migration: send polling control error
> >>> ../tests/qtest/libqtest.c:199: kill_qemu() tried to terminate QEMU 
> >>> process but encountered exit status 1 (expected 0)
> >>>
> >>> So running the test also needs root?  Is it possible we fix the test so it
> >>> can also be smart enough to skip if it knows it'll hit the "cannot get
> >>> rkey" error (even if it sees the rdma link setup)?  Not something urgent
> >>> but definitely good to have.
> > It seems it's a security problem; I don't have a good idea yet.
> > 
> > Let me have a look...
> 
> Another workaround is to raise 'ulimit -l' to >=128M for a non-root user
> (in practice this value works well on Fedora 40).

OK, so it's about the locked memory limit.. thanks for looking into it.
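For reference, this is how the limit can be checked and raised from a shell
without root (values are in KiB; the limits.conf line below is just one
possible persistent setup and the exact syntax may vary by distro):

```shell
# Print the current max locked memory for this shell, in KiB
ulimit -l

# Raise the soft limit to 128 MiB (131072 KiB) for this shell only;
# a non-root user can only raise it up to the hard limit
ulimit -l 131072 2>/dev/null || echo "cannot raise: hard limit too low"

# For a persistent change, add something like this to
# /etc/security/limits.conf (then re-login):
#   <user>  hard  memlock  131072
```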

> 
> So we would have something like this:
> 
> diff --git a/tests/qtest/migration/precopy-tests.c 
> b/tests/qtest/migration/precopy-tests.c
> index 9f7236dc59f..1f24753c5a5 100644
> --- a/tests/qtest/migration/precopy-tests.c
> +++ b/tests/qtest/migration/precopy-tests.c
> @@ -101,6 +101,26 @@ static void test_precopy_unix_dirty_ring(void)
>   
>   #ifdef CONFIG_RDMA
>   
> +#include <sys/resource.h>
> +#define REQUIRED_MEMLOCK (128 * 1024 * 1024) // 128MB

Where does the 128M come from?  Is it correlated to the VM size somehow?
Btw, migrate_start() says for x86 we use 150MB VM.

When you feel confident, feel free to send a formal patch, it can also
include the reposted version of the current patch so that can be a series.

It'll also be great if you could make sure they apply on top of:

https://gitlab.com/peterx/qemu/-/tree/migration-staging

Thanks,

> +
> +static bool mlock_check(void)
> +{
> +    uid_t uid;
> +    struct rlimit rlim;
> +
> +    uid = getuid();
> +    if (uid == 0) {
> +        return true;
> +    }
> +
> +    if (getrlimit(RLIMIT_MEMLOCK, &rlim) != 0) {
> +        return false;
> +    }
> +
> +    return rlim.rlim_cur >= REQUIRED_MEMLOCK;
> +}
> +
>   #define RDMA_MIGRATION_HELPER "scripts/rdma-migration-helper.sh"
>   static int new_rdma_link(char *buffer, bool ipv6)
>   {
> @@ -137,6 +157,11 @@ static void test_precopy_rdma_plain_ip(bool ipv6)
>   {
>       char buffer[128] = {};
>   
> +    if (!mlock_check()) {
> +        g_test_skip("'ulimit -l' is too small, require 128M");
> +        return;
> +    }
> +
>       if (new_rdma_link(buffer, ipv6)) {
>           g_test_skip("No rdma link available\n"
>                       "# To enable the test:\n"

-- 
Peter Xu

