"Zhijian Li (Fujitsu)" <lizhij...@fujitsu.com> writes:

> On 20/09/2023 21:01, Fabiano Rosas wrote:
>> Li Zhijian <lizhij...@fujitsu.com> writes:
>> 
>>> From: Li Zhijian <lizhij...@cn.fujitsu.com>
>>>
>>> Previously, we got a confusion error that complains
>>> the RDMAControlHeader.repeat:
>>> qemu-system-x86_64: rdma: Too many requests in this message 
>>> (3638950032).Bailing.
>>>
>>> Actually, it's caused by an unexpected RDMAControlHeader.type.
>>> After this patch, error will become:
>>> qemu-system-x86_64: Unknown control message QEMU FILE
>>>
>>> Signed-off-by: Li Zhijian <lizhij...@cn.fujitsu.com>
>>> ---
>>>   migration/rdma.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/migration/rdma.c b/migration/rdma.c
>>> index a2a3db35b1..3073d9953c 100644
>>> --- a/migration/rdma.c
>>> +++ b/migration/rdma.c
>>> @@ -2812,7 +2812,7 @@ static ssize_t qio_channel_rdma_writev(QIOChannel 
>>> *ioc,
>>>           size_t remaining = iov[i].iov_len;
>>>           uint8_t * data = (void *)iov[i].iov_base;
>>>           while (remaining) {
>>> -            RDMAControlHeader head;
>>> +            RDMAControlHeader head = {};
>>>   
>>>               len = MIN(remaining, RDMA_SEND_INCREMENT);
>>>               remaining -= len;
>> 
>
> 2815             RDMAControlHeader head = {};
> 2816
> 2817             len = MIN(remaining, RDMA_SEND_INCREMENT);
> 2818             remaining -= len;
> 2819
> 2820             head.len = len;
> 2821             head.type = RDMA_CONTROL_QEMU_FILE;
> 2822
> 2823             ret = qemu_rdma_exchange_send(rdma, &head, data, NULL, NULL, 
> NULL);
>
>> I'm struggling to see how head is used before we set the type a couple
>> of lines below. Could you expand on it?
>
>
> IIUC, head is used for both common migration control path and RDMA specific 
> control path.
>
> hook_stage(RAM_SAVE_FLAG_HOOK) {
>     rdma_hook_process(qemu_rdma_registration_handle) {
>        do {
>            // this is a RDMA own control block, should not be disturbed by 
> the common migration control path.
>            // head will be extracted and processed here.
>            // qio_channel_rdma_writev() will send RDMA_CONTROL_QEMU_FILE, 
> which is an unexpected message for this block.
>            // head.repeat will be examined before the type, so an 
> uninitialized repeat will confuse us here.
>        } while (!RDMA_CONTROL_REGISTER_FINISHED || !error)
>     }
> }
>
>
> when qio_channel_rdma_writev() is used for common migration control path, 
> repeat is useless and will not be examined.
>
> With this patch, we can quickly know the cause.
>

Ah, right. Somehow I interpreted the commit message as meaning the
'type' field was bogus. But it's the 'repeat' field that causes the
issue. Thanks for the explanation.

Reviewed-by: Fabiano Rosas <faro...@suse.de>


Reply via email to