Re: [PATCH 1/7] migration/rdma: Fix save_page method to fail on polling error

2023-09-05 Thread Zhijian Li (Fujitsu)
On 31/08/2023 21:25, Markus Armbruster wrote: > qemu_rdma_save_page() reports polling error with error_report(), then > succeeds anyway. This is because the variable holding the polling > status *shadows* the variable the function returns. The latter > remains zero. > > Broken since day one,

Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-14 Thread Zhijian Li (Fujitsu)
On 15/09/2023 04:20, “William Roche wrote: > From: William Roche > > A memory page poisoned from the hypervisor level is no longer readable. > Thus, it is now treated as a zero-page for the ram saving migration phase. > > The migration of a VM will crash Qemu when it tries to read the >

Re: [PATCH v3 02/13] migration/rdma: Unfold ram_control_before_iterate()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Once there: > - Remove unused data parameter > - unfold it in its callers. > - change all callers to call qemu_rdma_registration_start() > - We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma() > > Reviewed-by: Peter Xu >

Re: [PATCH v3 13/13] migration/rdma: Remove all "ret" variables that are used only once

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Change code that is: > > int ret; > ... > > ret = foo(); > if (ret[ < 0]?) { > > to: > > if (foo()[ < 0]) { > > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-10-16 Thread Zhijian Li (Fujitsu)
On 16/10/2023 20:11, Jason Gunthorpe wrote: > On Sat, Oct 07, 2023 at 03:40:50AM +0000, Zhijian Li (Fujitsu) wrote: >> +rdma-core >> >> >> Is global variable *errno* reliable when the documentation only states >> "returns 0 on success, or the valu

Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-17 Thread Zhijian Li (Fujitsu)
On 15/09/2023 19:31, William Roche wrote: > On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote: >> >> >> I'm okay with "RDMA isn't touched". >> BTW, could you share your reproducing program/hacking to poison the page, so >> that >> i am able

Re: [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_poll()'s return type is uint64_t, even though it returns 0, > -1, or @ret, which is int. Its callers assign the return value to int > variables, then check whether it's negative. Unclean. > > Return int instead. > > Signed-off-by:

Re: [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_accept() returns 0 in some cases even when it didn't > complete its job due to errors. Impact is not obvious. I figure the > caller will soon fail again with a misleading error message. > > Fix it to return -1 on any failure. > >

Re: [PATCH 09/52] migration/rdma: Put @errp parameter last

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > include/qapi/error.h demands: > > * - Functions that use Error to report errors have an Error **errp > * parameter. It should be the last parameter, except for functions > * taking variable arguments. > > qemu_rdma_connect() does not

Re: [PATCH 00/52] migration/rdma: Error handling fixes

2023-09-21 Thread Zhijian Li (Fujitsu)
Perter, On 20/09/2023 00:49, Peter Xu wrote: > On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote: >> Oh dear, where to start. There's so much wrong, and in pretty obvious >> ways. This code should never have passed review. I'm refraining from >> saying more; see the commit

Re: [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > wrid_desc[] uses 4001 pointers to map four integer values to strings. > > print_wrid() accesses wrid_desc[] out of bounds when passed a negative > argument. It returns null for values 2..1999 and 2001..3999. > > qemu_rdma_poll() and

Re: [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 10/52] migration/rdma: Eliminate error_propagate()

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > When all we do with an Error we receive into a local variable is > propagating to somewhere else, we can just as well receive it there > right away. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 19

Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > rdma_add_block() can't fail. Return void, and drop the unreachable > error handling. > > Signed-off-by: Markus Armbruster > --- > migration/rdma.c | 30 +- > 1 file changed, 9 insertions(+), 21 deletions(-) >

Re: [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s return type

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_data_init() return type is void *. It actually returns > RDMAContext *, and all its callers assign the value to an > RDMAContext *. Unclean. > > Return RDMAContext * instead. > > Signed-off-by: Markus Armbruster Reviewed-by: Li

Re: [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s return type

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > rdma_delete_block() always returns 0, which its only caller ignores. > Return void instead. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > The QEMUFileHooks methods don't come with a written contract. Digging > through the code calling them, we find: > > * save_page(): I'm fine with this > >Negative values RAM_SAVE_CONTROL_DELAYED and >RAM_SAVE_CONTROL_NOT_SUPP are

Re: [PATCH 22/52] migration/rdma: Fix rdma_getaddrinfo() error checking

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > rdma_getaddrinfo() returns 0 on success. On error, it returns one of > the EAI_ error codes like getaddrinfo() does, Good catch, It used to be -1 on error, rdma_getaddrinfo(3) updated it 2021. or -1 with errno set. > This is broken by

Re: [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_wait_comp_channel() returns 0 on success, and either -1 or > rdma->error_state on failure. Callers actually expect a negative > error value. I don't see the only one callers expect a negative error code. migration/rdma.c:1654:

Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > All we do with the value of RDMAContext member @error_state is test > whether it's zero. Change to bool and rename to @errored. > make sense! Reviewed-by: Li Zhijian Can we move this patch ahead "[PATCH 23/52] migration/rdma: Clean up

Re: [PATCH 27/52] migration/rdma: Drop superfluous assignments to @ret

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 35 ++- > 1 file changed, 10 insertions(+), 25 deletions(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index

Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > When a function returns 0 on success, negative value on error, > checking for non-zero suffices, but checking for negative is clearer. > So do that. > This patch is no my favor convention. @Peter, Juan I'd like to hear your opinions. Thanks

Re: [PATCH 29/52] migration/rdma: Plug a memory leak and improve a message

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > When migration capability @rdma-pin-all is true, but the server cannot > honor it, qemu_rdma_connect() calls macro ERROR(), then returns > success. > > ERROR() sets an error. Since qemu_rdma_connect() returns success, its > caller

Re: [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Hiding return statements in macros is a bad idea. Use a function > instead, and open code the return part. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 43

Re: [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Several error messages include numeric error codes returned by failed > functions: > > * ibv_poll_cq() returns an unspecified negative value. Useless. > > * rdma_accept and rmda_get_cm_event() return -1. Useless.

Re: [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > @error_reported and @received_error are flags. The latter is even > assigned bool true. Change them from int to bool. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 6 +++--- > 1 file changed, 3

Re: [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_buffer_mergable() is semantically a predicate. It returns > int 0 or 1. Return bool instead. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() error handling

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_search_ram_block() can't fail. Return void, and drop the > unreachable error handling. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until > they find on that works. If none works, they return the first Error > set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one. > > qemu_rdma_broken_ipv6_kernel()

Re: [PATCH 1/2] migration: Fix rdma migration failed

2023-09-22 Thread Zhijian Li (Fujitsu)
On 20/09/2023 20:46, Fabiano Rosas wrote: > Li Zhijian writes: > >> From: Li Zhijian >> >> Destination will fail with: >> qemu-system-x86_64: rdma: Too many requests in this message >> (3638950032).Bailing. >> >> migrate with RDMA is different from tcp. RDMA has its own control >> message,

Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling

2023-09-22 Thread Zhijian Li (Fujitsu)
On 21/09/2023 19:15, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> rdma_add_block() can't fail. Return void, and drop the unreachable >>> error handling. >>> >>> Signed

Re: [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > QIOChannelClass methods qio_channel_rdma_readv() and > qio_channel_rdma_writev() violate their method contract when > rdma->error_state is non-zero: > > 1. They return whatever is in rdma->error_state then. Only -1 will be > fine. -2 will be

Re: [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always set error

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_get_cm_event_timeout() neglects to set an error when it fails > because rdma_get_cm_event() fails. Harmless, as its caller > qemu_rdma_connect() substitutes a generic error then. Fix it anyway. > > qemu_rdma_connect() also sets the generic

Re: [PATCH 34/52] migration/rdma: Convert qemu_rdma_exchange_recv() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 42/52] migration/rdma: Convert qemu_rdma_post_recv_control() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Just for symmetry with qemu_rdma_post_send_control(). Error messages > lose detail I consider of no use to users. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 44/52] migration/rdma: Silence qemu_rdma_resolve_host()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 48/52] migration/rdma: Silence qemu_rdma_block_for_wrid()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > if (local->nb_blocks != nb_dest_blocks) { > -fprintf(stderr, "ram blocks mismatch (Number of blocks %d vs %d) > " > -"Your QEMU command line parameters are probably " > -"not identical

Re: [PATCH] MAINTAINERS: Add entry for rdma migration

2023-09-26 Thread Zhijian Li (Fujitsu)
jian can see the patches and review when he still has > the bandwidth. Feel free to add my Acked tag. thanks. Acked-by: Li Zhijian > > Cc: Daniel P. Berrangé > Cc: Juan Quintela > Cc: Markus Armbruster > Cc: Zhijian Li (Fujitsu) > Cc: Fabiano Rosas > Signed-off-by:

Re: [PATCH 41/52] migration/rdma: Convert qemu_rdma_post_send_control() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 26/09/2023 13:50, Li Zhijian wrote: > > > On 18/09/2023 22:41, Markus Armbruster wrote: >> Functions that use an Error **errp parameter to return errors should >> not also report them to the user, because reporting is the caller's >> job.  When the caller does, the error is reported twice. 

Re: [PATCH 40/52] migration/rdma: Convert qemu_rdma_write() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Just for consistency with qemu_rdma_write_one() and > qemu_rdma_write_flush(), and for slightly simpler code. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 47/52] migration/rdma: Don't report received completion events as error

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > When qemu_rdma_wait_comp_channel() receives an event from the > completion channel, it reports an error "receive cm event while wait > comp channel,cm event is T", where T is the numeric event type. > However, the function fails only when T is a

Re: [PATCH 52/52] migration/rdma: Fix how we show device details on open

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > qemu_rdma_dump_id() dumps RDMA device details to stdout. > > rdma_start_outgoing_migration() calls it via qemu_rdma_source_init() > and qemu_rdma_resolve_host() to show source device details. > rdma_start_incoming_migration() arranges its call via

Re: [PATCH 36/52] migration/rdma: Convert qemu_rdma_exchange_get_response() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 46/52] migration/rdma: Silence qemu_rdma_reg_control()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 37/52] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 45/52] migration/rdma: Silence qemu_rdma_connect()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 35/52] migration/rdma: Convert qemu_rdma_exchange_send() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 38/52] migration/rdma: Convert qemu_rdma_write_flush() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 49/52] migration/rdma: Silence qemu_rdma_register_and_get_keys()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error

2023-09-26 Thread Zhijian Li (Fujitsu)
On 26/09/2023 14:41, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> Functions that use an Error **errp parameter to return errors should >>> not also report them to the user, because

Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 26/09/2023 19:52, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:42, Markus Armbruster wrote: >>> Functions that use an Error **errp parameter to return errors should >>> not also report them to the user, because

Re: [PATCH 24/52] migration/rdma: Return -1 instead of negative errno code

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Several functions return negative errno codes on failure. Callers > check for specific codes exactly never. For some of the functions, > callers couldn't check even if they wanted to, because the functions > also return negative values that

Re: [PATCH 25/52] migration/rdma: Dumb down remaining int error values to -1

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > This is just to make the error value more obvious. Callers don't > mind. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored

2023-09-26 Thread Zhijian Li (Fujitsu)
On 25/09/2023 15:09, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> All we do with the value of RDMAContext member @error_state is test >>> whether it's zero. Change to bool and

Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-20 Thread Zhijian Li (Fujitsu)
On 15/09/2023 19:31, William Roche wrote: > On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote: >> >> >> I'm okay with "RDMA isn't touched". >> BTW, could you share your reproducing program/hacking to poison the page, so >> that >> i am able

Re: [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs

2023-09-18 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > We use int instead of uint64_t in a few places. Change them to > uint64_t. > > This cleans up a comparison of signed qemu_rdma_block_for_wrid() > parameter @wrid_requested with unsigned @wr_id. Harmless, because the > actual arguments are

Re: [PATCH 1/2] migration: Fix rdma migration failed

2023-09-20 Thread Zhijian Li (Fujitsu)
Sorry to all, i forgot to update my email address to lizhij...@fujitsu.com. Corrected it. On 20/09/2023 17:04, Li Zhijian wrote: > From: Li Zhijian > > Destination will fail with: > qemu-system-x86_64: rdma: Too many requests in this message > (3638950032).Bailing. > > migrate with RDMA is

Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear

2023-09-20 Thread Zhijian Li (Fujitsu)
On 20/09/2023 21:01, Fabiano Rosas wrote: > Li Zhijian writes: > >> From: Li Zhijian >> >> Previously, we got a confusion error that complains >> the RDMAControlHeader.repeat: >> qemu-system-x86_64: rdma: Too many requests in this message >> (3638950032).Bailing. >> >> Actually, it's caused

Re: [PATCH 30/52] migration/rdma: Delete inappropriate error_report() in macro ERROR()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH 31/52] migration/rdma: Retire macro ERROR()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > ERROR() has become "error_setg() unless an error has been set > already". Hiding the conditional in the macro is in the way of > further work. Replace the macro uses by their expansion, and delete > the macro. > > Signed-off-by: Markus

Re: [PATCH 33/52] migration/rdma: Drop "@errp is clear" guards around error_setg()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > These guards are all redundant now. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 164 +++ > 1 file changed, 51 insertions(+), 113 deletions(-) > >

Re: [PATCH 32/52] migration/rdma: Fix error handling around rdma_getaddrinfo()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over > addresses to find one that works, holding onto the first Error from > qemu_rdma_broken_ipv6_kernel() for use when no address works. Issues: > > 1. If @errp was _abort or _fatal,

Re: [PATCH 1/2] migration: Fix rdma migration failed

2023-09-25 Thread Zhijian Li (Fujitsu)
On 22/09/2023 23:42, Peter Xu wrote: > On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote: >> From: Li Zhijian >> >> Destination will fail with: >> qemu-system-x86_64: rdma: Too many requests in this message >> (3638950032).Bailing. >> >> migrate with RDMA is different from tcp. RDMA

Re: [PATCH 00/52] migration/rdma: Error handling fixes

2023-09-25 Thread Zhijian Li (Fujitsu)
On 22/09/2023 23:21, Peter Xu wrote: > On Thu, Sep 21, 2023 at 08:27:24AM +0000, Zhijian Li (Fujitsu) wrote: >> I'm worried that I may not have enough time, ability, or environment to >> review/test >> the RDMA patches. but for this patch set, i will take a look later. &g

Re: [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values

2023-09-25 Thread Zhijian Li (Fujitsu)
On 25/09/2023 14:36, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> The QEMUFileHooks methods don't come with a written contract. Digging >>> through the code calling them, we fin

Re: [PATCH v2 2/7] migration: Clean up local variable shadowing

2023-09-21 Thread Zhijian Li (Fujitsu)
On 21/09/2023 02:31, Markus Armbruster wrote: > Local variables shadowing other local variables or parameters make the > code needlessly hard to understand. Tracked down with -Wshadow=local. > Clean up: delete inner declarations when they are actually redundant, > else rename variables. > >

Re: [PATCH 1/1] migration: Non multifd migration don't care about multifd flushes

2023-10-11 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:55, Juan Quintela wrote: > RDMA was having trouble because > migrate_multifd_flush_after_each_section() can only be true or false, > but we don't want to send any flush when we are not in multifd > migration. > > CC: Fabiano Rosas Fixes:

Re: [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:20, Markus Armbruster wrote: > qemu_rdma_dump_id() dumps RDMA device details to stdout. > > rdma_start_outgoing_migration() calls it via qemu_rdma_source_init() > and qemu_rdma_resolve_host() to show source device details. > rdma_start_incoming_migration() arranges its call via

Re: [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:20, Markus Armbruster wrote: > error_report() obeys -msg, reports the current error location if any, > and reports to the current monitor if any. Reporting to stderr > directly with fprintf() or perror() is wrong, because it loses all > this. > > Fix the offenders. Bonus:

Re: [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > qemu_rdma_exchange_get_response() compares int parameter @expecting > with uint32_t head->type. Actual arguments are non-negative > enumeration constants, RDMAControlHeader uint32_t member type, or > qemu_rdma_exchange_recv() int parameter

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-10-06 Thread Zhijian Li (Fujitsu)
+rdma-core Is global variable *errno* reliable when the documentation only states "returns 0 on success, or the value of errno on failure (which indicates the failure reason)." Someone read it as "assign error code to errno and return it.", I used to think the same way. but ibv_post_send()

Re: [PATCH v2 1/2] migration: Fix rdma migration failed

2023-10-07 Thread Zhijian Li (Fujitsu)
On 04/10/2023 02:57, Juan Quintela wrote: > commit c638f66121ce30063fbf68c3eab4d7429cf2b209 > Author: Juan Quintela > Date: Tue Oct 3 20:53:38 2023 +0200 > > migration: Non multifd migration don't care about multifd flushes > > RDMA was having trouble because >

Re: [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill() > to an int variable before it adds it to @done / subtracts it from > @want, both size_t. Truncation when qemu_rdma_fill() copies more than > INT_MAX bytes. Seems vanishingly

Re: [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:20, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the

Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > We use errno after calling Libibverbs functions that are not > documented to set errno (manual page does not mention errno), or where > the documentation is unclear ("returns [...] the value of errno on > failure"). While this could be read as

Re: [PATCH v2 22/53] migration/rdma: Drop dead qemu_rdma_data_init() code for !@host_port

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > qemu_rdma_data_init() neglects to set an Error when it fails because > @host_port is null. Fortunately, no caller passes null, so this is > merely a latent bug. Drop the flawed code handling null argument. > > Signed-off-by: Markus Armbruster

Re: [PATCH] hw/cxl: Fix opaque type interpret wrongly

2023-10-18 Thread Zhijian Li (Fujitsu)
On 13/10/2023 16:52, Philippe Mathieu-Daudé wrote: > On 13/10/23 03:55, Li Zhijian wrote: >> void cxl_component_register_block_init(Object *obj, >>     CXLComponentState *cxl_cstate, >>     const char *type) >> { >>

Re: [PATCH v2 1/2] hw/cxl: Pass CXLComponentState to cache_mem_ops

2023-10-23 Thread Zhijian Li (Fujitsu)
On 19/10/2023 18:50, Jonathan Cameron wrote: > On Wed, 18 Oct 2023 16:24:07 +0800 > Li Zhijian wrote: > >> cache_mem_ops.{read,write}() interprets opaque as >> CXLComponentState(cxl_cstate) instead of ComponentRegisters(cregs). >> >> Fortunately, cregs is the first member of cxl_cstate, so

Re: [PATCH v3 01/13] migration: Create migrate_rdma()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Helper to say if we are doing a migration over rdma. > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/migration.h | 2 ++ > migration/options.h | 1 + > migration/migration.c | 1 + >

Re: [PATCH v3 10/13] migration/rdma: Check sooner if we are in postcopy for save_page()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/rdma.c | 6 +- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index

Re: [PATCH v3 04/13] migration/rdma: Remove all uses of RAM_CONTROL_HOOK

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Instead of going trhough ram_control_load_hook(), call > qemu_rdma_registration_handle() directly. > s/trhough/through Reviewed-by: Li Zhijian > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela > --- > migration/qemu-file.h | 1 - >

Re: [PATCH v3 07/13] qemu-file: Remove QEMUFileHooks

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > The only user was rdma, and its use is gone. > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/qemu-file.h | 4 > migration/qemu-file.c | 6 -- > migration/rdma.c | 9 - >

Re: [PATCH v3 03/13] migration/rdma: Unfold ram_control_after_iterate()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Once there: > - Remove unused data parameter > - unfold it in its callers > - change all callers to call qemu_rdma_registration_stop() > - We need to call QIO_CHANNEL_RDMA() after we check for migrate_rdma() > > Reviewed-by: Peter Xu > Signed-off-by:

Re: [PATCH v3 08/13] migration/rdma: Move rdma constants from qemu-file.h to rdma.h

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/qemu-file.h | 17 - > migration/rdma.h | 16 > migration/ram.c | 2 +- > 3 files changed, 17

Re: [PATCH v3 06/13] migration/rdma: Create rdma_control_save_page()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > The only user of ram_control_save_page() and save_page() hook was > rdma. Just move the function to rdma.c, rename it to > rdma_control_save_page(). > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela [...] > > +int

Re: [PATCH v3 09/13] migration/rdma: Remove qemu_ prefix from exported functions

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Functions are long enough even without this. > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/rdma.h | 12 ++-- > migration/ram.c| 14 +++--- > migration/rdma.c

Re: [PATCH v3 11/13] migration/rdma: Use i as for index instead of idx

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Once there, all the uses are local to the for, so declare the variable > inside the for statement. > > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/rdma.c | 49 ++-- > 1 file

Re: [PATCH v3 05/13] migration/rdma: Unfold hook_ram_load()

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > There is only one flag called with: RAM_CONTROL_BLOCK_REG. > > Reviewed-by: Peter Xu > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian > --- > migration/qemu-file.h | 11 --- > migration/rdma.h | 3 +++ >

Re: [PATCH v3 12/13] migration/rdma: Declare for index variables local

2023-10-13 Thread Zhijian Li (Fujitsu)
On 12/10/2023 04:35, Juan Quintela wrote: > Declare all variables that are only used inside a for loop inside the > for statement. > > This makes clear that they are not used outside of the for loop. > > Signed-off-by: Juan Quintela Reviewed-by: Li Zhijian

Re: [PATCH 2/2] docs/clx: Change to lowercase as others

2023-05-25 Thread Zhijian Li (Fujitsu)
On 25/05/2023 19:49, Jonathan Cameron via wrote: > On Fri, 19 May 2023 16:58:02 +0800 > Li Zhijian wrote: > >> Using the same style except the 'Topo' abbreviation. >> >> Signed-off-by: Li Zhijian >> --- >> I'm not a native speaker, feel free to correct me. > > I've edited slightly and

Re: [PATCH v2] hw/cxl: Fix CFMW config memory leak

2023-05-31 Thread Zhijian Li (Fujitsu)
On 31/05/2023 13:45, Philippe Mathieu-Daudé wrote: > Hi Li, > > On 31/5/23 04:34, Li Zhijian wrote: >> Only 'fw' pointer is marked as g_autofree, so we shoud free other >> resource manually in error path. >> >> Signed-off-by: Li Zhijian >> --- >> V2: Delete unnecesarry check >> --- >>  

Re: [PATCH] hw/clx: Fix CFMW config memory leak

2023-05-30 Thread Zhijian Li (Fujitsu)
On 30/05/2023 23:14, Jonathan Cameron via wrote: > On Mon, 29 May 2023 15:59:56 +0800 > Li Zhijian wrote: > >> Only 'fw' pointer is marked as g_autofree, so we shoud free other >> resource manually in error path. >> > Good spot. > > Patch title typo > hw/cxl:> OMG, it was the 2nd time i

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-09 Thread Zhijian Li (Fujitsu)
on 4/10/2024 3:46 AM, Peter Xu wrote: >> Is there document/link about the unittest/CI for migration tests, Why >> are those tests missing? >> Is it hard or very special to set up an environment for that? maybe we >> can help in this regards. > See tests/qtest/migration-test.c. We put most of

Re: [PATCH] migration/colo: Fix bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed.

2024-04-16 Thread Zhijian Li (Fujitsu)
On 17/04/2024 10:44, Li Zhijian wrote: > bdrv_activate_all() should not be called from the coroutine context, move > it to the QEMU thread colo_process_incoming_thread() with the bql_lock > protected. > > The backtrace is as follows: > #4 0x561af7948362 in bdrv_graph_rdlock_main_loop ()

  1   2   >