"Zhijian Li (Fujitsu)" <lizhij...@fujitsu.com> writes:

> On 18/09/2023 22:41, Markus Armbruster wrote:
>> The QEMUFileHooks methods don't come with a written contract.  Digging
>> through the code calling them, we find:
>> 
>> * save_page():
>
> I'm fine with this
>
>> 
>>    Negative values RAM_SAVE_CONTROL_DELAYED and
>>    RAM_SAVE_CONTROL_NOT_SUPP are special.  Any other negative value is
>>    an unspecified error.
>> 
>>    qemu_rdma_save_page() returns -EIO or rdma->error_state on error.  I
>>    believe the latter is always negative.  Nothing stops either of them
>>    to clash with the special values, though.  Feels unlikely, but fix
>>    it anyway to return only the special values and -1.
>> 
>> * before_ram_iterate(), before_ram_iterate():
>
> error code returned by before_ram_iterate() and before_ram_iterate() will be
> assigned to QEMUFile for upper layer.
> But it seems that no callers take care about the error ?  Shouldn't let the 
> callers
> to check the error instead ?

The error values returned by qemu_rdma_registration_start() and
qemu_rdma_registration_stop() carry no additional information a caller
could check.

Both return either -EIO or rdma->error_state on error.  The latter is
*not* a negative errno code.  Evidence: qio_channel_rdma_shutdown()
assigns -1, qemu_rdma_block_for_wrid() assigns the error value of
qemu_rdma_poll(), which can be the error value of ibv_poll_cq(), which
is an unspecified negative value, ...

I decided not to investigate what qemu-file.c does with the error values
after one quick glance at the code.  It's confusing, and quite possibly
confused.  But I'm already at 50+ patches, and am neither inclined nor
able to take on more migration cleanup at this time.

>>    Negative value means error.  qemu_rdma_registration_start() and
>>    qemu_rdma_registration_stop() comply as far as I can tell.  Make
>>    them comply *obviously*, by returning -1 on error.
>> 
>> * hook_ram_load:
>> 
>>    Negative value means error.  rdma_load_hook() already returns -1 on
>>    error.  Leave it alone.
>
> see inline reply below,
>
>> 
>> Signed-off-by: Markus Armbruster <arm...@redhat.com>
>> ---
>>   migration/rdma.c | 79 +++++++++++++++++++++++-------------------------
>>   1 file changed, 37 insertions(+), 42 deletions(-)
>> 
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index cc59155a50..46b5859268 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -3219,12 +3219,11 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>>       rdma = qatomic_rcu_read(&rioc->rdmaout);
>>   
>>       if (!rdma) {
>> -        return -EIO;
>> +        return -1;
>>       }
>>   
>> -    ret = check_error_state(rdma);
>> -    if (ret) {
>> -        return ret;
>> +    if (check_error_state(rdma)) {
>> +        return -1;
>>       }
>>   
>>       qemu_fflush(f);
>> @@ -3290,9 +3289,10 @@ static size_t qemu_rdma_save_page(QEMUFile *f,
>>       }
>>   
>>       return RAM_SAVE_CONTROL_DELAYED;
>> +
>>   err:
>>       rdma->error_state = ret;
>> -    return ret;
>> +    return -1;
>>   }
>>   
>>   static void rdma_accept_incoming_migration(void *opaque);
>> @@ -3538,12 +3538,11 @@ static int qemu_rdma_registration_handle(QEMUFile *f)
>>       rdma = qatomic_rcu_read(&rioc->rdmain);
>>   
>>       if (!rdma) {
>> -        return -EIO;
>> +        return -1;
>
> that's because EIO is not accurate here ?

It's because the function does not return a negative errno code, and
returning -EIO is misleading readers into assuming it does.

>>       }
>>   
>> -    ret = check_error_state(rdma);
>> -    if (ret) {
>> -        return ret;
>
> Ditto

Likewise.

[...]


Reply via email to