Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear

2023-09-22 Thread Peter Xu
On Wed, Sep 20, 2023 at 05:04:12PM +0800, Li Zhijian wrote:
> From: Li Zhijian 
> 
> Previously, we got a confusion error that complains
> the RDMAControlHeader.repeat:
> qemu-system-x86_64: rdma: Too many requests in this message 
> (3638950032).Bailing.
> 
> Actually, it's caused by an unexpected RDMAControlHeader.type.
> After this patch, error will become:
> qemu-system-x86_64: Unknown control message QEMU FILE
> 
> Signed-off-by: Li Zhijian 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear

2023-09-21 Thread Fabiano Rosas
"Zhijian Li (Fujitsu)"  writes:

> On 20/09/2023 21:01, Fabiano Rosas wrote:
>> Li Zhijian  writes:
>> 
>>> From: Li Zhijian 
>>>
>>> Previously, we got a confusion error that complains
>>> the RDMAControlHeader.repeat:
>>> qemu-system-x86_64: rdma: Too many requests in this message 
>>> (3638950032).Bailing.
>>>
>>> Actually, it's caused by an unexpected RDMAControlHeader.type.
>>> After this patch, error will become:
>>> qemu-system-x86_64: Unknown control message QEMU FILE
>>>
>>> Signed-off-by: Li Zhijian 
>>> ---
>>>   migration/rdma.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index a2a3db35b1..3073d9953c 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel 
>>> *ioc,
>>>   size_t remaining = iov[i].iov_len;
>>>   uint8_t * data = (void *)iov[i].iov_base;
>>>   while (remaining) {
>>> -RDMAControlHeader head;
>>> +RDMAControlHeader head = {};
>>>   
>>>   len = MIN(remaining, RDMA_SEND_INCREMENT);
>>>   remaining -= len;
>> 
>
> 2815 RDMAControlHeader head = {};
> 2816
> 2817 len = MIN(remaining, RDMA_SEND_INCREMENT);
> 2818 remaining -= len;
> 2819
> 2820 head.len = len;
> 2821 head.type = RDMA_CONTROL_QEMU_FILE;
> 2822
> 2823 ret = qemu_rdma_exchange_send(rdma, , data, NULL, NULL, 
> NULL);
>
>> I'm struggling to see how head is used before we set the type a couple
>> of lines below. Could you expand on it?
>
>
> IIUC, head is used for both common migration control path and RDMA specific 
> control path.
>
> hook_stage(RAM_SAVE_FLAG_HOOK) {
> rdma_hook_process(qemu_rdma_registration_handle) {
>do {
>// this is a RDMA own control block, should not be disturbed by 
> the common migration control path.
>// head will be extracted and processed here.
>// qio_channel_rdma_writev() will send RDMA_CONTROL_QEMU_FILE, 
> which is an unexpected message for this block.
>// head.repeat will be examined before the type, so an 
> uninitialized repeat will confuse us here.
>} while (!RDMA_CONTROL_REGISTER_FINISHED || !error)
> }
> }
>
>
> when qio_channel_rdma_writev() is used for common migration control path, 
> repeat is useless and will not be examined.
>
> With this patch, we can quickly know the cause.
>

Ah, right. Somehow I interpreted the commit message as meaning the
'type' field was bogus. But it's the 'repeat' field that causes the
issue. Thanks for the explanation.

Reviewed-by: Fabiano Rosas 




Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear

2023-09-20 Thread Zhijian Li (Fujitsu)


On 20/09/2023 21:01, Fabiano Rosas wrote:
> Li Zhijian  writes:
> 
>> From: Li Zhijian 
>>
>> Previously, we got a confusion error that complains
>> the RDMAControlHeader.repeat:
>> qemu-system-x86_64: rdma: Too many requests in this message 
>> (3638950032).Bailing.
>>
>> Actually, it's caused by an unexpected RDMAControlHeader.type.
>> After this patch, error will become:
>> qemu-system-x86_64: Unknown control message QEMU FILE
>>
>> Signed-off-by: Li Zhijian 
>> ---
>>   migration/rdma.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index a2a3db35b1..3073d9953c 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>>   size_t remaining = iov[i].iov_len;
>>   uint8_t * data = (void *)iov[i].iov_base;
>>   while (remaining) {
>> -RDMAControlHeader head;
>> +RDMAControlHeader head = {};
>>   
>>   len = MIN(remaining, RDMA_SEND_INCREMENT);
>>   remaining -= len;
> 

2815 RDMAControlHeader head = {};
2816
2817 len = MIN(remaining, RDMA_SEND_INCREMENT);
2818 remaining -= len;
2819
2820 head.len = len;
2821 head.type = RDMA_CONTROL_QEMU_FILE;
2822
2823 ret = qemu_rdma_exchange_send(rdma, , data, NULL, NULL, 
NULL);

> I'm struggling to see how head is used before we set the type a couple
> of lines below. Could you expand on it?


IIUC, head is used for both common migration control path and RDMA specific 
control path.

hook_stage(RAM_SAVE_FLAG_HOOK) {
rdma_hook_process(qemu_rdma_registration_handle) {
   do {
   // this is a RDMA own control block, should not be disturbed by the 
common migration control path.
   // head will be extracted and processed here.
   // qio_channel_rdma_writev() will send RDMA_CONTROL_QEMU_FILE, which 
is an unexpected message for this block.
   // head.repeat will be examined before the type, so an uninitialized 
repeat will confuse us here.
   } while (!RDMA_CONTROL_REGISTER_FINISHED || !error)
}
}


when qio_channel_rdma_writev() is used for common migration control path, 
repeat is useless and will not be examined.

With this patch, we can quickly know the cause.


> 
> Also, a smoke test could have caught both issues early on. Is there any
> reason for not having any?

i have no idea yet :)


Thanks
Zhijian

Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear

2023-09-20 Thread Fabiano Rosas
Li Zhijian  writes:

> From: Li Zhijian 
>
> Previously, we got a confusion error that complains
> the RDMAControlHeader.repeat:
> qemu-system-x86_64: rdma: Too many requests in this message 
> (3638950032).Bailing.
>
> Actually, it's caused by an unexpected RDMAControlHeader.type.
> After this patch, error will become:
> qemu-system-x86_64: Unknown control message QEMU FILE
>
> Signed-off-by: Li Zhijian 
> ---
>  migration/rdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index a2a3db35b1..3073d9953c 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
>  size_t remaining = iov[i].iov_len;
>  uint8_t * data = (void *)iov[i].iov_base;
>  while (remaining) {
> -RDMAControlHeader head;
> +RDMAControlHeader head = {};
>  
>  len = MIN(remaining, RDMA_SEND_INCREMENT);
>  remaining -= len;

I'm struggling to see how head is used before we set the type a couple
of lines below. Could you expand on it?

Also, a smoke test could have caught both issues early on. Is there any
reason for not having any?



[PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear

2023-09-20 Thread Li Zhijian
From: Li Zhijian 

Previously, we got a confusion error that complains
the RDMAControlHeader.repeat:
qemu-system-x86_64: rdma: Too many requests in this message 
(3638950032).Bailing.

Actually, it's caused by an unexpected RDMAControlHeader.type.
After this patch, error will become:
qemu-system-x86_64: Unknown control message QEMU FILE

Signed-off-by: Li Zhijian 
---
 migration/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/migration/rdma.c b/migration/rdma.c
index a2a3db35b1..3073d9953c 100644
--- a/migration/rdma.c
+++ b/migration/rdma.c
@@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel *ioc,
 size_t remaining = iov[i].iov_len;
 uint8_t * data = (void *)iov[i].iov_base;
 while (remaining) {
-RDMAControlHeader head;
+RDMAControlHeader head = {};
 
 len = MIN(remaining, RDMA_SEND_INCREMENT);
 remaining -= len;
-- 
2.31.1