Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-22 Thread 858585 jemmy
On Wed, May 16, 2018 at 9:13 PM, Dr. David Alan Gilbert
 wrote:
> * 858585 jemmy (jemmy858...@gmail.com) wrote:
>
> 
>
>> >> >> > I wonder why dereg_mr takes so long - I could understand if reg_mr
>> >> >> > took a long time, but why for dereg, that sounds like the easy side.
>> >> >>
>> >> >> I used perf to collect the information when ibv_dereg_mr is invoked.
>> >> >>
>> >> >> -   9.95%  client2  [kernel.kallsyms]  [k] put_compound_page
>> >> >>- put_compound_page
>> >> >>   - 98.45% put_page
>> >> >>__ib_umem_release
>> >> >>ib_umem_release
>> >> >>dereg_mr
>> >> >>mlx5_ib_dereg_mr
>> >> >>ib_dereg_mr
>> >> >>uverbs_free_mr
>> >> >>remove_commit_idr_uobject
>> >> >>_rdma_remove_commit_uobject
>> >> >>rdma_remove_commit_uobject
>> >> >>ib_uverbs_dereg_mr
>> >> >>ib_uverbs_write
>> >> >>vfs_write
>> >> >>sys_write
>> >> >>system_call_fastpath
>> >> >>__GI___libc_write
>> >> >>0
>> >> >>   + 1.55% __ib_umem_release
>> >> >> +   8.31%  client2  [kernel.kallsyms]  [k] compound_unlock_irqrestore
>> >> >> +   7.01%  client2  [kernel.kallsyms]  [k] page_waitqueue
>> >> >> +   7.00%  client2  [kernel.kallsyms]  [k] set_page_dirty
>> >> >> +   6.61%  client2  [kernel.kallsyms]  [k] unlock_page
>> >> >> +   6.33%  client2  [kernel.kallsyms]  [k] put_page_testzero
>> >> >> +   5.68%  client2  [kernel.kallsyms]  [k] set_page_dirty_lock
>> >> >> +   4.30%  client2  [kernel.kallsyms]  [k] __wake_up_bit
>> >> >> +   4.04%  client2  [kernel.kallsyms]  [k] free_pages_prepare
>> >> >> +   3.65%  client2  [kernel.kallsyms]  [k] release_pages
>> >> >> +   3.62%  client2  [kernel.kallsyms]  [k] arch_local_irq_save
>> >> >> +   3.35%  client2  [kernel.kallsyms]  [k] page_mapping
>> >> >> +   3.13%  client2  [kernel.kallsyms]  [k] get_pageblock_flags_group
>> >> >> +   3.09%  client2  [kernel.kallsyms]  [k] put_page
>> >> >>
>> >> >> the reason is __ib_umem_release will loop many times for each page.
>> >> >>
>> >> >> static void __ib_umem_release(struct ib_device *dev, struct ib_umem
>> >> >> *umem, int dirty)
>> >> >> {
>> >> >> struct scatterlist *sg;
>> >> >> struct page *page;
>> >> >> int i;
>> >> >>
>> >> >> if (umem->nmap > 0)
>> >> >>  ib_dma_unmap_sg(dev, umem->sg_head.sgl,
>> >> >> umem->npages,
>> >> >> DMA_BIDIRECTIONAL);
>> >> >>
>> >> >>  for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {  <<
>> >> >> loop a lot of times for each page.here
>> >> >
>> >> > Why 'lot of times for each page'?  I don't know this code at all, but
>> >> > I'd expected once per page?
>> >>
>> >> Sorry, once per page, but there are a lot of pages for a big virtual machine.
>> >
>> > Ah OK; so yes it seems best if you can find a way to do the release in
>> > the migration thread then;  still maybe this is something some
>> > of the kernel people could look at speeding up?
>>
>> The kernel code seems not complex, but I have no idea how to speed it up.
>
> Me neither; but I'll ask around.
>
>> >> >
>> >> > With your other kernel fix, does the problem of the missing
>> >> > RDMA_CM_EVENT_DISCONNECTED events go away?
>> >>
>> >> Yes, after the kernel and qemu fixes, this issue never happens again.
>> >
>> > I'm confused; which qemu fix; my question was whether the kernel fix by
>> > itself fixed the problem of the missing event.
>>
>> this qemu fix:
>> migration: update index field when delete or qsort RDMALocalBlock
>
> OK good; so then we shouldn't need this 2/2 patch.
>
>> This issue is also caused by some ram blocks not being released, but I
>> have not found the root cause.
>
> Hmm, we should try and track that down.
>
>> >
>> >> Do you think we should remove rdma_get_cm_event after rdma_disconnect?
>> >
>> > I don't think so; if 'rdma_disconnect' is supposed to generate the event
>> > I think we're supposed to wait for it to know that the disconnect is
>> > really complete.
>>
>> After moving qemu_fclose to the migration thread, it will not block the
>> main thread while waiting for the disconnect event.
>
> I'm not sure about moving the fclose to the migration thread; it worries
> me with the interaction with cancel and other failures.

Bad news: I found today that the destination qemu also has this problem.
When precopy migration finishes, the destination qemu main thread
(process_incoming_migration_co) calls qemu_fclose(f), which causes the
guest OS to hang for a while.

So I think it is better to solve this problem inside rdma.c itself,
rather than changing the migration.c code around qemu_fclose.

We can create a separate thread in qemu_rdma_cleanup and release the
resources in that separate thread; a sketch follows.
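A minimal sketch of that idea, modeled with plain pthreads rather than
QEMU's thread helpers; rdma_cleanup_work() is a hypothetical stand-in for
the body of qemu_rdma_cleanup(), not the actual patch:

/* Sketch: move the blocking RDMA teardown (rdma_disconnect, waiting for
 * RDMA_CM_EVENT_DISCONNECTED, the ibv_dereg_mr loop) off the thread that
 * must stay responsive. */
#include <pthread.h>
#include <stdio.h>

static void *rdma_cleanup_work(void *opaque)
{
    /* ... rdma_disconnect(), wait for the CM event,
     *     ibv_dereg_mr() for every ram block,
     *     destroy the event channel ... */
    return NULL;
}

int main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, rdma_cleanup_work, NULL);
    pthread_detach(tid);    /* do not join: the caller keeps running */
    puts("main thread is not blocked by the teardown");
    return 0;
}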

>
> Dave
>
>> >
>> > Dave
>> >
>> >>
>> >> >
>> >> > Dave
>> >> >
>> >> >> 

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-16 Thread Dr. David Alan Gilbert
* 858585 jemmy (jemmy858...@gmail.com) wrote:



> >> >> > I wonder why dereg_mr takes so long - I could understand if reg_mr
> >> >> > took a long time, but why for dereg, that sounds like the easy side.
> >> >>
> >> >> I used perf to collect the information when ibv_dereg_mr is invoked.
> >> >>
> >> >> -   9.95%  client2  [kernel.kallsyms]  [k] put_compound_page
> >> >>- put_compound_page
> >> >>   - 98.45% put_page
> >> >>__ib_umem_release
> >> >>ib_umem_release
> >> >>dereg_mr
> >> >>mlx5_ib_dereg_mr
> >> >>ib_dereg_mr
> >> >>uverbs_free_mr
> >> >>remove_commit_idr_uobject
> >> >>_rdma_remove_commit_uobject
> >> >>rdma_remove_commit_uobject
> >> >>ib_uverbs_dereg_mr
> >> >>ib_uverbs_write
> >> >>vfs_write
> >> >>sys_write
> >> >>system_call_fastpath
> >> >>__GI___libc_write
> >> >>0
> >> >>   + 1.55% __ib_umem_release
> >> >> +   8.31%  client2  [kernel.kallsyms]  [k] compound_unlock_irqrestore
> >> >> +   7.01%  client2  [kernel.kallsyms]  [k] page_waitqueue
> >> >> +   7.00%  client2  [kernel.kallsyms]  [k] set_page_dirty
> >> >> +   6.61%  client2  [kernel.kallsyms]  [k] unlock_page
> >> >> +   6.33%  client2  [kernel.kallsyms]  [k] put_page_testzero
> >> >> +   5.68%  client2  [kernel.kallsyms]  [k] set_page_dirty_lock
> >> >> +   4.30%  client2  [kernel.kallsyms]  [k] __wake_up_bit
> >> >> +   4.04%  client2  [kernel.kallsyms]  [k] free_pages_prepare
> >> >> +   3.65%  client2  [kernel.kallsyms]  [k] release_pages
> >> >> +   3.62%  client2  [kernel.kallsyms]  [k] arch_local_irq_save
> >> >> +   3.35%  client2  [kernel.kallsyms]  [k] page_mapping
> >> >> +   3.13%  client2  [kernel.kallsyms]  [k] get_pageblock_flags_group
> >> >> +   3.09%  client2  [kernel.kallsyms]  [k] put_page
> >> >>
> >> >> the reason is __ib_umem_release will loop many times for each page.
> >> >>
> >> >> static void __ib_umem_release(struct ib_device *dev, struct ib_umem
> >> >> *umem, int dirty)
> >> >> {
> >> >> struct scatterlist *sg;
> >> >> struct page *page;
> >> >> int i;
> >> >>
> >> >> if (umem->nmap > 0)
> >> >>  ib_dma_unmap_sg(dev, umem->sg_head.sgl,
> >> >> umem->npages,
> >> >> DMA_BIDIRECTIONAL);
> >> >>
> >> >>  for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {  <<
> >> >> loop a lot of times for each page.here
> >> >
> >> > Why 'lot of times for each page'?  I don't know this code at all, but
> >> > I'd expected once per page?
> >>
> >> Sorry, once per page, but there are a lot of pages for a big virtual machine.
> >
> > Ah OK; so yes it seems best if you can find a way to do the release in
> > the migration thread then;  still maybe this is something some
> > of the kernel people could look at speeding up?
> 
> The kernel code seems not complex, but I have no idea how to speed it up.

Me neither; but I'll ask around.
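(For a sense of scale, a back-of-envelope sketch, assuming 4 KiB pages and
no huge pages; the 140 GB figure is taken from the timing table later in
the thread:)

/* Rough count of put_page() iterations __ib_umem_release() performs
 * for a large guest.  4 KiB page size is an assumption. */
#include <stdio.h>

int main(void)
{
    unsigned long long guest_bytes = 140ULL << 30;      /* 140 GiB guest */
    unsigned long long pages = guest_bytes / 4096ULL;   /* 4 KiB pages   */

    printf("%llu pages\n", pages);   /* ~36.7 million loop iterations */
    return 0;
}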

> >> >
> >> > With your other kernel fix, does the problem of the missing
> >> > RDMA_CM_EVENT_DISCONNECTED events go away?
> >>
> >> Yes, after the kernel and qemu fixes, this issue never happens again.
> >
> > I'm confused; which qemu fix; my question was whether the kernel fix by
> > itself fixed the problem of the missing event.
> 
> this qemu fix:
> migration: update index field when delete or qsort RDMALocalBlock

OK good; so then we shouldn't need this 2/2 patch.

> This issue is also caused by some ram blocks not being released, but I
> have not found the root cause.

Hmm, we should try and track that down.

> >
> >> Do you think we should remove rdma_get_cm_event after rdma_disconnect?
> >
> > I don't think so; if 'rdma_disconnect' is supposed to generate the event
> > I think we're supposed to wait for it to know that the disconnect is
> > really complete.
> 
> After moving qemu_fclose to the migration thread, it will not block the
> main thread while waiting for the disconnect event.

I'm not sure about moving the fclose to the migration thread; it worries
me with the interaction with cancel and other failures.

Dave

> >
> > Dave
> >
> >>
> >> >
> >> > Dave
> >> >
> >> >>   page = sg_page(sg);
> >> >>   if (umem->writable && dirty)
> >> >>   set_page_dirty_lock(page);
> >> >>   put_page(page);
> >> >>  }
> >> >>
> >> >> >>  sg_free_table(&umem->sg_head);
> >> >>  return;
> >> >> }
> >> >>
> >> >> >
> >> >> > Dave
> >> >> >
> >> >> >
> >> >> >> >
> >> >> >> > Dave
> >> >> >> >
> >> >> >> >> >
> >> >> >> >> >> Anyway, it should not invoke rdma_get_cm_event in the main
> >> >> >> >> >> thread, and the event channel
> >> >> >> >> >> is also destroyed in qemu_rdma_cleanup.
> >> >> >> >> >>
> >> >> >> >> >> Signed-off-by: Lidong Chen 
> >> >> >> >> 

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-16 Thread 858585 jemmy
On Wed, May 16, 2018 at 5:53 PM, Dr. David Alan Gilbert
 wrote:
> * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> On Wed, May 16, 2018 at 5:39 PM, Dr. David Alan Gilbert
>>  wrote:
>> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> >> On Tue, May 15, 2018 at 3:27 AM, Dr. David Alan Gilbert
>> >>  wrote:
>> >> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> >> >> On Sat, May 12, 2018 at 2:03 AM, Dr. David Alan Gilbert
>> >> >>  wrote:
>> >> >> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> >> >> >> On Wed, May 9, 2018 at 2:40 AM, Dr. David Alan Gilbert
>> >> >> >>  wrote:
>> >> >> >> > * Lidong Chen (jemmy858...@gmail.com) wrote:
>> >> >> >> >> When cancelling migration during RDMA precopy, the source qemu main
>> >> >> >> >> thread sometimes hangs.
>> >> >> >> >>
>> >> >> >> >> The backtrace is:
>> >> >> >> >> (gdb) bt
>> >> >> >> >> #0  0x7f249eabd43d in write () from 
>> >> >> >> >> /lib64/libpthread.so.0
>> >> >> >> >> #1  0x7f24a1ce98e4 in rdma_get_cm_event 
>> >> >> >> >> (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189
>> >> >> >> >> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) 
>> >> >> >> >> at migration/rdma.c:2296
>> >> >> >> >> #3  0x007b7cae in qio_channel_rdma_close 
>> >> >> >> >> (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
>> >> >> >> >> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, 
>> >> >> >> >> errp=0x0) at io/channel.c:273
>> >> >> >> >> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) 
>> >> >> >> >> at migration/qemu-file-channel.c:98
>> >> >> >> >> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
>> >> >> >> >> migration/qemu-file.c:334
>> >> >> >> >> #7  0x00795b96 in migrate_fd_cleanup 
>> >> >> >> >> (opaque=0x3b46280) at migration/migration.c:1162
>> >> >> >> >> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at 
>> >> >> >> >> util/async.c:90
>> >> >> >> >> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at 
>> >> >> >> >> util/async.c:118
>> >> >> >> >> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
>> >> >> >> >> util/aio-posix.c:436
>> >> >> >> >> #11 0x0093ab41 in aio_ctx_dispatch 
>> >> >> >> >> (source=0x3b121c0, callback=0x0, user_data=0x0)
>> >> >> >> >> at util/async.c:261
>> >> >> >> >> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
>> >> >> >> >> /lib64/libglib-2.0.so.0
>> >> >> >> >> #13 0x0093dc5e in glib_pollfds_poll () at 
>> >> >> >> >> util/main-loop.c:215
>> >> >> >> >> #14 0x0093dd4e in os_host_main_loop_wait 
>> >> >> >> >> (timeout=2800) at util/main-loop.c:263
>> >> >> >> >> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
>> >> >> >> >> util/main-loop.c:522
>> >> >> >> >> #16 0x005bc6a5 in main_loop () at vl.c:1944
>> >> >> >> >> #17 0x005c39b5 in main (argc=56, 
>> >> >> >> >> argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752
>> >> >> >> >>
>> >> >> >> >> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
>> >> >> >> >> after rdma_disconnect.
>> >> >> >> >> I have not found the root cause of the missing
>> >> >> >> >> RDMA_CM_EVENT_DISCONNECTED event, but
>> >> >> >> >> it can be reproduced if ibv_dereg_mr is not invoked to release
>> >> >> >> >> all ram blocks, which is fixed
>> >> >> >> >> in the previous patch.
>> >> >> >> >
>> >> >> >> > Does this happen without your other changes?
>> >> >> >>
>> >> >> >> Yes, this issue also happens on v2.12.0, based on
>> >> >> >> commit 4743c23509a51bd4ee85cc272287a41917d1be35
>> >> >> >>
>> >> >> >> > Can you give me instructions to repeat it and also say which
>> >> >> >> > cards you were using?
>> >> >> >>
>> >> >> >> This issue can be reproduced by starting and cancelling migration;
>> >> >> >> the issue reproduces within 10 attempts.
>> >> >> >>
>> >> >> >> The command line is:
>> >> >> >> virsh migrate --live --copy-storage-all  --undefinesource 
>> >> >> >> --persistent
>> >> >> >> --timeout 10800 \
>> >> >> >>  --verbose 83e0049e-1325-4f31-baf9-25231509ada1  \
>> >> >> >> qemu+ssh://9.16.46.142/system rdma://9.16.46.142
>> >> >> >>
>> >> >> >> The net card I use is:
>> >> >> >> 0000:3b:00.0 Ethernet controller: Mellanox Technologies MT27710
>> >> >> >> Family [ConnectX-4 Lx]
>> >> >> >> 0000:3b:00.1 Ethernet controller: Mellanox Technologies MT27710
>> >> >> >> Family [ConnectX-4 Lx]
>> >> >> >>
>> >> >> >> This issue is related to ibv_dereg_mr: if ibv_dereg_mr is not
>> >> >> >> invoked for all ram blocks, this issue can be reproduced.
>> >> >> >> If we fix the bugs and use ibv_dereg_mr to release all ram blocks,
>> >> >> >> this issue never happens.
>> >> >> >
>> >> >> > Maybe that is the right fix; I can imagine that the RDMA code doesn't
>> >> >> > like closing down if there are still ramblocks 

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-16 Thread Dr. David Alan Gilbert
* 858585 jemmy (jemmy858...@gmail.com) wrote:
> On Wed, May 16, 2018 at 5:39 PM, Dr. David Alan Gilbert
>  wrote:
> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
> >> On Tue, May 15, 2018 at 3:27 AM, Dr. David Alan Gilbert
> >>  wrote:
> >> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
> >> >> On Sat, May 12, 2018 at 2:03 AM, Dr. David Alan Gilbert
> >> >>  wrote:
> >> >> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
> >> >> >> On Wed, May 9, 2018 at 2:40 AM, Dr. David Alan Gilbert
> >> >> >>  wrote:
> >> >> >> > * Lidong Chen (jemmy858...@gmail.com) wrote:
> >> >> >> >> When cancelling migration during RDMA precopy, the source qemu main
> >> >> >> >> thread sometimes hangs.
> >> >> >> >>
> >> >> >> >> The backtrace is:
> >> >> >> >> (gdb) bt
> >> >> >> >> #0  0x7f249eabd43d in write () from /lib64/libpthread.so.0
> >> >> >> >> #1  0x7f24a1ce98e4 in rdma_get_cm_event 
> >> >> >> >> (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189
> >> >> >> >> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) 
> >> >> >> >> at migration/rdma.c:2296
> >> >> >> >> #3  0x007b7cae in qio_channel_rdma_close 
> >> >> >> >> (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
> >> >> >> >> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, 
> >> >> >> >> errp=0x0) at io/channel.c:273
> >> >> >> >> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) at 
> >> >> >> >> migration/qemu-file-channel.c:98
> >> >> >> >> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
> >> >> >> >> migration/qemu-file.c:334
> >> >> >> >> #7  0x00795b96 in migrate_fd_cleanup 
> >> >> >> >> (opaque=0x3b46280) at migration/migration.c:1162
> >> >> >> >> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at 
> >> >> >> >> util/async.c:90
> >> >> >> >> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at 
> >> >> >> >> util/async.c:118
> >> >> >> >> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
> >> >> >> >> util/aio-posix.c:436
> >> >> >> >> #11 0x0093ab41 in aio_ctx_dispatch (source=0x3b121c0, 
> >> >> >> >> callback=0x0, user_data=0x0)
> >> >> >> >> at util/async.c:261
> >> >> >> >> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
> >> >> >> >> /lib64/libglib-2.0.so.0
> >> >> >> >> #13 0x0093dc5e in glib_pollfds_poll () at 
> >> >> >> >> util/main-loop.c:215
> >> >> >> >> #14 0x0093dd4e in os_host_main_loop_wait 
> >> >> >> >> (timeout=2800) at util/main-loop.c:263
> >> >> >> >> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
> >> >> >> >> util/main-loop.c:522
> >> >> >> >> #16 0x005bc6a5 in main_loop () at vl.c:1944
> >> >> >> >> #17 0x005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, 
> >> >> >> >> envp=0x3ad0030) at vl.c:4752
> >> >> >> >>
> >> >> >> >> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
> >> >> >> >> after rdma_disconnect.
> >> >> >> >> I have not found the root cause of the missing
> >> >> >> >> RDMA_CM_EVENT_DISCONNECTED event, but
> >> >> >> >> it can be reproduced if ibv_dereg_mr is not invoked to release
> >> >> >> >> all ram blocks, which is fixed
> >> >> >> >> in the previous patch.
> >> >> >> >
> >> >> >> > Does this happen without your other changes?
> >> >> >>
> >> >> >> Yes, this issue also happens on v2.12.0, based on
> >> >> >> commit 4743c23509a51bd4ee85cc272287a41917d1be35
> >> >> >>
> >> >> >> > Can you give me instructions to repeat it and also say which
> >> >> >> > cards you were using?
> >> >> >>
> >> >> >> This issue can be reproduced by starting and cancelling migration;
> >> >> >> the issue reproduces within 10 attempts.
> >> >> >>
> >> >> >> The command line is:
> >> >> >> virsh migrate --live --copy-storage-all  --undefinesource 
> >> >> >> --persistent
> >> >> >> --timeout 10800 \
> >> >> >>  --verbose 83e0049e-1325-4f31-baf9-25231509ada1  \
> >> >> >> qemu+ssh://9.16.46.142/system rdma://9.16.46.142
> >> >> >>
> >> >> >> The net card I use is:
> >> >> >> 0000:3b:00.0 Ethernet controller: Mellanox Technologies MT27710
> >> >> >> Family [ConnectX-4 Lx]
> >> >> >> 0000:3b:00.1 Ethernet controller: Mellanox Technologies MT27710
> >> >> >> Family [ConnectX-4 Lx]
> >> >> >>
> >> >> >> This issue is related to ibv_dereg_mr: if ibv_dereg_mr is not
> >> >> >> invoked for all ram blocks, this issue can be reproduced.
> >> >> >> If we fix the bugs and use ibv_dereg_mr to release all ram blocks,
> >> >> >> this issue never happens.
> >> >> >
> >> >> > Maybe that is the right fix; I can imagine that the RDMA code doesn't
> >> >> > like closing down if there are still ramblocks registered that
> >> >> > potentially could have incoming DMA?
> >> >> >
> >> >> >> And for the kernel part, there is also a bug that causes ram blocks
> >> >> >> not to be released when cancelling live migration.
> >> >> >> 

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-16 Thread 858585 jemmy
On Wed, May 16, 2018 at 5:39 PM, Dr. David Alan Gilbert
 wrote:
> * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> On Tue, May 15, 2018 at 3:27 AM, Dr. David Alan Gilbert
>>  wrote:
>> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> >> On Sat, May 12, 2018 at 2:03 AM, Dr. David Alan Gilbert
>> >>  wrote:
>> >> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> >> >> On Wed, May 9, 2018 at 2:40 AM, Dr. David Alan Gilbert
>> >> >>  wrote:
>> >> >> > * Lidong Chen (jemmy858...@gmail.com) wrote:
>> >> >> >> When cancelling migration during RDMA precopy, the source qemu main
>> >> >> >> thread sometimes hangs.
>> >> >> >>
>> >> >> >> The backtrace is:
>> >> >> >> (gdb) bt
>> >> >> >> #0  0x7f249eabd43d in write () from /lib64/libpthread.so.0
>> >> >> >> #1  0x7f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, 
>> >> >> >> event=0x7ffe2f643dd0) at src/cma.c:2189
>> >> >> >> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at 
>> >> >> >> migration/rdma.c:2296
>> >> >> >> #3  0x007b7cae in qio_channel_rdma_close 
>> >> >> >> (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999
>> >> >> >> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, 
>> >> >> >> errp=0x0) at io/channel.c:273
>> >> >> >> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) at 
>> >> >> >> migration/qemu-file-channel.c:98
>> >> >> >> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
>> >> >> >> migration/qemu-file.c:334
>> >> >> >> #7  0x00795b96 in migrate_fd_cleanup (opaque=0x3b46280) 
>> >> >> >> at migration/migration.c:1162
>> >> >> >> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at 
>> >> >> >> util/async.c:90
>> >> >> >> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at 
>> >> >> >> util/async.c:118
>> >> >> >> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
>> >> >> >> util/aio-posix.c:436
>> >> >> >> #11 0x0093ab41 in aio_ctx_dispatch (source=0x3b121c0, 
>> >> >> >> callback=0x0, user_data=0x0)
>> >> >> >> at util/async.c:261
>> >> >> >> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
>> >> >> >> /lib64/libglib-2.0.so.0
>> >> >> >> #13 0x0093dc5e in glib_pollfds_poll () at 
>> >> >> >> util/main-loop.c:215
>> >> >> >> #14 0x0093dd4e in os_host_main_loop_wait 
>> >> >> >> (timeout=2800) at util/main-loop.c:263
>> >> >> >> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
>> >> >> >> util/main-loop.c:522
>> >> >> >> #16 0x005bc6a5 in main_loop () at vl.c:1944
>> >> >> >> #17 0x005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, 
>> >> >> >> envp=0x3ad0030) at vl.c:4752
>> >> >> >>
>> >> >> >> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
>> >> >> >> after rdma_disconnect.
>> >> >> >> I have not found the root cause of the missing
>> >> >> >> RDMA_CM_EVENT_DISCONNECTED event, but
>> >> >> >> it can be reproduced if ibv_dereg_mr is not invoked to release
>> >> >> >> all ram blocks, which is fixed
>> >> >> >> in the previous patch.
>> >> >> >
>> >> >> > Does this happen without your other changes?
>> >> >>
>> >> >> Yes, this issue also happens on v2.12.0, based on
>> >> >> commit 4743c23509a51bd4ee85cc272287a41917d1be35
>> >> >>
>> >> >> > Can you give me instructions to repeat it and also say which
>> >> >> > cards you were using?
>> >> >>
>> >> >> This issue can be reproduced by starting and cancelling migration;
>> >> >> the issue reproduces within 10 attempts.
>> >> >>
>> >> >> The command line is:
>> >> >> virsh migrate --live --copy-storage-all  --undefinesource --persistent
>> >> >> --timeout 10800 \
>> >> >>  --verbose 83e0049e-1325-4f31-baf9-25231509ada1  \
>> >> >> qemu+ssh://9.16.46.142/system rdma://9.16.46.142
>> >> >>
>> >> >> The net card I use is:
>> >> >> 0000:3b:00.0 Ethernet controller: Mellanox Technologies MT27710 Family
>> >> >> [ConnectX-4 Lx]
>> >> >> 0000:3b:00.1 Ethernet controller: Mellanox Technologies MT27710 Family
>> >> >> [ConnectX-4 Lx]
>> >> >>
>> >> >> This issue is related to ibv_dereg_mr: if ibv_dereg_mr is not invoked
>> >> >> for all ram blocks, this issue can be reproduced.
>> >> >> If we fix the bugs and use ibv_dereg_mr to release all ram blocks,
>> >> >> this issue never happens.
>> >> >
>> >> > Maybe that is the right fix; I can imagine that the RDMA code doesn't
>> >> > like closing down if there are still ramblocks registered that
>> >> > potentially could have incoming DMA?
>> >> >
>> >> >> And for the kernel part, there is also a bug that causes ram blocks
>> >> >> not to be released when cancelling live migration.
>> >> >> https://patchwork.kernel.org/patch/10385781/
>> >> >
>> >> > OK, that's a pain; which threads are doing the dereg - is some stuff
>> >> > in the migration thread and some stuff in the main thread on cleanup?
>> >>
>> >> Yes, the migration thread invokes ibv_reg_mr, and the 

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-16 Thread Dr. David Alan Gilbert
* 858585 jemmy (jemmy858...@gmail.com) wrote:
> On Tue, May 15, 2018 at 3:27 AM, Dr. David Alan Gilbert
>  wrote:
> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
> >> On Sat, May 12, 2018 at 2:03 AM, Dr. David Alan Gilbert
> >>  wrote:
> >> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
> >> >> On Wed, May 9, 2018 at 2:40 AM, Dr. David Alan Gilbert
> >> >>  wrote:
> >> >> > * Lidong Chen (jemmy858...@gmail.com) wrote:
> >> >> >> When cancelling migration during RDMA precopy, the source qemu main
> >> >> >> thread sometimes hangs.
> >> >> >>
> >> >> >> The backtrace is:
> >> >> >> (gdb) bt
> >> >> >> #0  0x7f249eabd43d in write () from /lib64/libpthread.so.0
> >> >> >> #1  0x7f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, 
> >> >> >> event=0x7ffe2f643dd0) at src/cma.c:2189
> >> >> >> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at 
> >> >> >> migration/rdma.c:2296
> >> >> >> #3  0x007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, 
> >> >> >> errp=0x0) at migration/rdma.c:2999
> >> >> >> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, 
> >> >> >> errp=0x0) at io/channel.c:273
> >> >> >> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) at 
> >> >> >> migration/qemu-file-channel.c:98
> >> >> >> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
> >> >> >> migration/qemu-file.c:334
> >> >> >> #7  0x00795b96 in migrate_fd_cleanup (opaque=0x3b46280) 
> >> >> >> at migration/migration.c:1162
> >> >> >> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at 
> >> >> >> util/async.c:90
> >> >> >> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at 
> >> >> >> util/async.c:118
> >> >> >> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
> >> >> >> util/aio-posix.c:436
> >> >> >> #11 0x0093ab41 in aio_ctx_dispatch (source=0x3b121c0, 
> >> >> >> callback=0x0, user_data=0x0)
> >> >> >> at util/async.c:261
> >> >> >> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
> >> >> >> /lib64/libglib-2.0.so.0
> >> >> >> #13 0x0093dc5e in glib_pollfds_poll () at 
> >> >> >> util/main-loop.c:215
> >> >> >> #14 0x0093dd4e in os_host_main_loop_wait 
> >> >> >> (timeout=2800) at util/main-loop.c:263
> >> >> >> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
> >> >> >> util/main-loop.c:522
> >> >> >> #16 0x005bc6a5 in main_loop () at vl.c:1944
> >> >> >> #17 0x005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, 
> >> >> >> envp=0x3ad0030) at vl.c:4752
> >> >> >>
> >> >> >> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
> >> >> >> after rdma_disconnect.
> >> >> >> I have not found the root cause of the missing
> >> >> >> RDMA_CM_EVENT_DISCONNECTED event, but
> >> >> >> it can be reproduced if ibv_dereg_mr is not invoked to release
> >> >> >> all ram blocks, which is fixed
> >> >> >> in the previous patch.
> >> >> >
> >> >> > Does this happen without your other changes?
> >> >>
> >> >> Yes, this issue also happens on v2.12.0, based on
> >> >> commit 4743c23509a51bd4ee85cc272287a41917d1be35
> >> >>
> >> >> > Can you give me instructions to repeat it and also say which
> >> >> > cards you were using?
> >> >>
> >> >> This issue can be reproduced by starting and cancelling migration;
> >> >> the issue reproduces within 10 attempts.
> >> >>
> >> >> The command line is:
> >> >> virsh migrate --live --copy-storage-all  --undefinesource --persistent
> >> >> --timeout 10800 \
> >> >>  --verbose 83e0049e-1325-4f31-baf9-25231509ada1  \
> >> >> qemu+ssh://9.16.46.142/system rdma://9.16.46.142
> >> >>
> >> >> The net card I use is:
> >> >> 0000:3b:00.0 Ethernet controller: Mellanox Technologies MT27710 Family
> >> >> [ConnectX-4 Lx]
> >> >> 0000:3b:00.1 Ethernet controller: Mellanox Technologies MT27710 Family
> >> >> [ConnectX-4 Lx]
> >> >>
> >> >> This issue is related to ibv_dereg_mr: if ibv_dereg_mr is not invoked
> >> >> for all ram blocks, this issue can be reproduced.
> >> >> If we fix the bugs and use ibv_dereg_mr to release all ram blocks,
> >> >> this issue never happens.
> >> >
> >> > Maybe that is the right fix; I can imagine that the RDMA code doesn't
> >> > like closing down if there are still ramblocks registered that
> >> > potentially could have incoming DMA?
> >> >
> >> >> And for the kernel part, there is also a bug that causes ram blocks
> >> >> not to be released when cancelling live migration.
> >> >> https://patchwork.kernel.org/patch/10385781/
> >> >
> >> > OK, that's a pain; which threads are doing the dereg - is some stuff
> >> > in the migration thread and some stuff in the main thread on cleanup?
> >>
> >> Yes, the migration thread invokes ibv_reg_mr, and the main-thread BH
> >> will invoke ibv_dereg_mr.
> >
> > OK, although my reading of that discussion is that *should* be allowed
> > (with your fix).
> >
> >> and when the main thread schedule 

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-16 Thread 858585 jemmy
On Tue, May 15, 2018 at 3:27 AM, Dr. David Alan Gilbert
 wrote:
> * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> On Sat, May 12, 2018 at 2:03 AM, Dr. David Alan Gilbert
>>  wrote:
>> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> >> On Wed, May 9, 2018 at 2:40 AM, Dr. David Alan Gilbert
>> >>  wrote:
>> >> > * Lidong Chen (jemmy858...@gmail.com) wrote:
>> >> >> When cancelling migration during RDMA precopy, the source qemu main
>> >> >> thread sometimes hangs.
>> >> >>
>> >> >> The backtrace is:
>> >> >> (gdb) bt
>> >> >> #0  0x7f249eabd43d in write () from /lib64/libpthread.so.0
>> >> >> #1  0x7f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, 
>> >> >> event=0x7ffe2f643dd0) at src/cma.c:2189
>> >> >> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at 
>> >> >> migration/rdma.c:2296
>> >> >> #3  0x007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, 
>> >> >> errp=0x0) at migration/rdma.c:2999
>> >> >> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, 
>> >> >> errp=0x0) at io/channel.c:273
>> >> >> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) at 
>> >> >> migration/qemu-file-channel.c:98
>> >> >> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
>> >> >> migration/qemu-file.c:334
>> >> >> #7  0x00795b96 in migrate_fd_cleanup (opaque=0x3b46280) at 
>> >> >> migration/migration.c:1162
>> >> >> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at 
>> >> >> util/async.c:90
>> >> >> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at 
>> >> >> util/async.c:118
>> >> >> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
>> >> >> util/aio-posix.c:436
>> >> >> #11 0x0093ab41 in aio_ctx_dispatch (source=0x3b121c0, 
>> >> >> callback=0x0, user_data=0x0)
>> >> >> at util/async.c:261
>> >> >> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
>> >> >> /lib64/libglib-2.0.so.0
>> >> >> #13 0x0093dc5e in glib_pollfds_poll () at 
>> >> >> util/main-loop.c:215
>> >> >> #14 0x0093dd4e in os_host_main_loop_wait 
>> >> >> (timeout=2800) at util/main-loop.c:263
>> >> >> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
>> >> >> util/main-loop.c:522
>> >> >> #16 0x005bc6a5 in main_loop () at vl.c:1944
>> >> >> #17 0x005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, 
>> >> >> envp=0x3ad0030) at vl.c:4752
>> >> >>
>> >> >> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
>> >> >> after rdma_disconnect.
>> >> >> I have not found the root cause of the missing
>> >> >> RDMA_CM_EVENT_DISCONNECTED event, but
>> >> >> it can be reproduced if ibv_dereg_mr is not invoked to release
>> >> >> all ram blocks, which is fixed
>> >> >> in the previous patch.
>> >> >
>> >> > Does this happen without your other changes?
>> >>
>> >> Yes, this issue also happens on v2.12.0, based on
>> >> commit 4743c23509a51bd4ee85cc272287a41917d1be35
>> >>
>> >> > Can you give me instructions to repeat it and also say which
>> >> > cards you were using?
>> >>
>> >> This issue can be reproduced by starting and cancelling migration;
>> >> the issue reproduces within 10 attempts.
>> >>
>> >> The command line is:
>> >> virsh migrate --live --copy-storage-all  --undefinesource --persistent
>> >> --timeout 10800 \
>> >>  --verbose 83e0049e-1325-4f31-baf9-25231509ada1  \
>> >> qemu+ssh://9.16.46.142/system rdma://9.16.46.142
>> >>
>> >> The net card I use is:
>> >> 0000:3b:00.0 Ethernet controller: Mellanox Technologies MT27710 Family
>> >> [ConnectX-4 Lx]
>> >> 0000:3b:00.1 Ethernet controller: Mellanox Technologies MT27710 Family
>> >> [ConnectX-4 Lx]
>> >>
>> >> This issue is related to ibv_dereg_mr: if ibv_dereg_mr is not invoked
>> >> for all ram blocks, this issue can be reproduced.
>> >> If we fix the bugs and use ibv_dereg_mr to release all ram blocks,
>> >> this issue never happens.
>> >
>> > Maybe that is the right fix; I can imagine that the RDMA code doesn't
>> > like closing down if there are still ramblocks registered that
>> > potentially could have incoming DMA?
>> >
>> >> And for the kernel part, there is also a bug that causes ram blocks
>> >> not to be released when cancelling live migration.
>> >> https://patchwork.kernel.org/patch/10385781/
>> >
>> > OK, that's a pain; which threads are doing the dereg - is some stuff
>> > in the migration thread and some stuff in the main thread on cleanup?
>>
>> Yes, the migration thread invokes ibv_reg_mr, and the main-thread BH
>> will invoke ibv_dereg_mr.
>
> OK, although my reading of that discussion is that *should* be allowed
> (with your fix).
>
>> And when the main thread schedules s->cleanup_bh, the migration thread
>> may have already exited.
>>
>> And I find that ibv_dereg_mr may take a long time for a virtual machine
>> with a large memory size.
>>
>> The test result is:
>>  10GB  326ms
>>  20GB  699ms
>>  30GB  1021ms
>>  40GB  1387ms
>>  

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-14 Thread Dr. David Alan Gilbert
* 858585 jemmy (jemmy858...@gmail.com) wrote:
> On Sat, May 12, 2018 at 2:03 AM, Dr. David Alan Gilbert
>  wrote:
> > * 858585 jemmy (jemmy858...@gmail.com) wrote:
> >> On Wed, May 9, 2018 at 2:40 AM, Dr. David Alan Gilbert
> >>  wrote:
> >> > * Lidong Chen (jemmy858...@gmail.com) wrote:
> >> When cancelling migration during RDMA precopy, the source qemu main
> >> thread sometimes hangs.
> >> >>
> >> >> The backtrace is:
> >> >> (gdb) bt
> >> >> #0  0x7f249eabd43d in write () from /lib64/libpthread.so.0
> >> >> #1  0x7f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, 
> >> >> event=0x7ffe2f643dd0) at src/cma.c:2189
> >> >> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at 
> >> >> migration/rdma.c:2296
> >> >> #3  0x007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, 
> >> >> errp=0x0) at migration/rdma.c:2999
> >> >> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, 
> >> >> errp=0x0) at io/channel.c:273
> >> >> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) at 
> >> >> migration/qemu-file-channel.c:98
> >> >> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
> >> >> migration/qemu-file.c:334
> >> >> #7  0x00795b96 in migrate_fd_cleanup (opaque=0x3b46280) at 
> >> >> migration/migration.c:1162
> >> >> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at 
> >> >> util/async.c:90
> >> >> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at 
> >> >> util/async.c:118
> >> >> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
> >> >> util/aio-posix.c:436
> >> >> #11 0x0093ab41 in aio_ctx_dispatch (source=0x3b121c0, 
> >> >> callback=0x0, user_data=0x0)
> >> >> at util/async.c:261
> >> >> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
> >> >> /lib64/libglib-2.0.so.0
> >> >> #13 0x0093dc5e in glib_pollfds_poll () at 
> >> >> util/main-loop.c:215
> >> >> #14 0x0093dd4e in os_host_main_loop_wait (timeout=2800) 
> >> >> at util/main-loop.c:263
> >> >> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
> >> >> util/main-loop.c:522
> >> >> #16 0x005bc6a5 in main_loop () at vl.c:1944
> >> >> #17 0x005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, 
> >> >> envp=0x3ad0030) at vl.c:4752
> >> >>
> >> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
> >> after rdma_disconnect.
> >> I have not found the root cause of the missing
> >> RDMA_CM_EVENT_DISCONNECTED event, but
> >> it can be reproduced if ibv_dereg_mr is not invoked to release
> >> all ram blocks, which is fixed
> >> in the previous patch.
> >> >
> >> > Does this happen without your other changes?
> >>
> >> Yes, this issue also happens on v2.12.0, based on
> >> commit 4743c23509a51bd4ee85cc272287a41917d1be35
> >>
> >> > Can you give me instructions to repeat it and also say which
> >> > cards you were using?
> >>
> >> This issue can be reproduced by starting and cancelling migration;
> >> the issue reproduces within 10 attempts.
> >>
> >> The command line is:
> >> virsh migrate --live --copy-storage-all  --undefinesource --persistent
> >> --timeout 10800 \
> >>  --verbose 83e0049e-1325-4f31-baf9-25231509ada1  \
> >> qemu+ssh://9.16.46.142/system rdma://9.16.46.142
> >>
> >> The net card I use is:
> >> 0000:3b:00.0 Ethernet controller: Mellanox Technologies MT27710 Family
> >> [ConnectX-4 Lx]
> >> 0000:3b:00.1 Ethernet controller: Mellanox Technologies MT27710 Family
> >> [ConnectX-4 Lx]
> >>
> >> This issue is related to ibv_dereg_mr: if ibv_dereg_mr is not invoked
> >> for all ram blocks, this issue can be reproduced.
> >> If we fix the bugs and use ibv_dereg_mr to release all ram blocks,
> >> this issue never happens.
> >
> > Maybe that is the right fix; I can imagine that the RDMA code doesn't
> > like closing down if there are still ramblocks registered that
> > potentially could have incoming DMA?
> >
> >> And for the kernel part, there is also a bug that causes ram blocks
> >> not to be released when cancelling live migration.
> >> https://patchwork.kernel.org/patch/10385781/
> >
> > OK, that's a pain; which threads are doing the dereg - is some stuff
> > in the migration thread and some stuff in the main thread on cleanup?
> 
> Yes, the migration thread invokes ibv_reg_mr, and the main-thread BH
> will invoke ibv_dereg_mr.

OK, although my reading of that discussion is that *should* be allowed
(with your fix).

> And when the main thread schedules s->cleanup_bh, the migration thread
> may have already exited.
> 
> And I find that ibv_dereg_mr may take a long time for a virtual machine
> with a large memory size.
> 
> The test result is:
>  10GB  326ms
>  20GB  699ms
>  30GB  1021ms
>  40GB  1387ms
>  50GB  1712ms
>  60GB  2034ms
>  70GB  2457ms
>  80GB  2807ms
>  90GB  3107ms
> 100GB 3474ms
> 110GB 3735ms
> 120GB 4064ms
> 130GB 4567ms
> 140GB 4886ms
> 
> when canceling migration, the guest os 

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-14 Thread 858585 jemmy
On Sat, May 12, 2018 at 2:03 AM, Dr. David Alan Gilbert
 wrote:
> * 858585 jemmy (jemmy858...@gmail.com) wrote:
>> On Wed, May 9, 2018 at 2:40 AM, Dr. David Alan Gilbert
>>  wrote:
>> > * Lidong Chen (jemmy858...@gmail.com) wrote:
>> >> When cancelling migration during RDMA precopy, the source qemu main
>> >> thread sometimes hangs.
>> >>
>> >> The backtrace is:
>> >> (gdb) bt
>> >> #0  0x7f249eabd43d in write () from /lib64/libpthread.so.0
>> >> #1  0x7f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, 
>> >> event=0x7ffe2f643dd0) at src/cma.c:2189
>> >> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at 
>> >> migration/rdma.c:2296
>> >> #3  0x007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, 
>> >> errp=0x0) at migration/rdma.c:2999
>> >> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) 
>> >> at io/channel.c:273
>> >> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) at 
>> >> migration/qemu-file-channel.c:98
>> >> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
>> >> migration/qemu-file.c:334
>> >> #7  0x00795b96 in migrate_fd_cleanup (opaque=0x3b46280) at 
>> >> migration/migration.c:1162
>> >> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at 
>> >> util/async.c:90
>> >> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at 
>> >> util/async.c:118
>> >> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
>> >> util/aio-posix.c:436
>> >> #11 0x0093ab41 in aio_ctx_dispatch (source=0x3b121c0, 
>> >> callback=0x0, user_data=0x0)
>> >> at util/async.c:261
>> >> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
>> >> /lib64/libglib-2.0.so.0
>> >> #13 0x0093dc5e in glib_pollfds_poll () at util/main-loop.c:215
>> >> #14 0x0093dd4e in os_host_main_loop_wait (timeout=2800) 
>> >> at util/main-loop.c:263
>> >> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
>> >> util/main-loop.c:522
>> >> #16 0x005bc6a5 in main_loop () at vl.c:1944
>> >> #17 0x005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, 
>> >> envp=0x3ad0030) at vl.c:4752
>> >>
>> >> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
>> >> after rdma_disconnect.
>> >> I have not found the root cause of the missing
>> >> RDMA_CM_EVENT_DISCONNECTED event, but
>> >> it can be reproduced if ibv_dereg_mr is not invoked to release
>> >> all ram blocks, which is fixed
>> >> in the previous patch.
>> >
>> > Does this happen without your other changes?
>>
>> Yes, this issue also happens on v2.12.0, based on
>> commit 4743c23509a51bd4ee85cc272287a41917d1be35
>>
>> > Can you give me instructions to repeat it and also say which
>> > cards you were using?
>>
>> This issue can be reproduced by starting and cancelling migration;
>> the issue reproduces within 10 attempts.
>>
>> The command line is:
>> virsh migrate --live --copy-storage-all  --undefinesource --persistent
>> --timeout 10800 \
>>  --verbose 83e0049e-1325-4f31-baf9-25231509ada1  \
>> qemu+ssh://9.16.46.142/system rdma://9.16.46.142
>>
>> The net card I use is:
>> 0000:3b:00.0 Ethernet controller: Mellanox Technologies MT27710 Family
>> [ConnectX-4 Lx]
>> 0000:3b:00.1 Ethernet controller: Mellanox Technologies MT27710 Family
>> [ConnectX-4 Lx]
>>
>> This issue is related to ibv_dereg_mr: if ibv_dereg_mr is not invoked
>> for all ram blocks, this issue can be reproduced.
>> If we fix the bugs and use ibv_dereg_mr to release all ram blocks,
>> this issue never happens.
>
> Maybe that is the right fix; I can imagine that the RDMA code doesn't
> like closing down if there are still ramblocks registered that
> potentially could have incoming DMA?
>
>> And for the kernel part, there is also a bug that causes ram blocks
>> not to be released when cancelling live migration.
>> https://patchwork.kernel.org/patch/10385781/
>
> OK, that's a pain; which threads are doing the dereg - is some stuff
> in the migration thread and some stuff in the main thread on cleanup?

Yes, the migration thread invokes ibv_reg_mr, and the main-thread BH
will invoke ibv_dereg_mr.
And when the main thread schedules s->cleanup_bh, the migration thread
may have already exited.
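(For context, a minimal sketch of the two verbs calls being discussed; the
signatures are libibverbs' own, but the helper names and access flags are
illustrative assumptions, not QEMU's actual code:)

/* Register and deregister one memory region.  In the scenario above, the
 * registration runs on the migration thread and the deregistration runs
 * from the main-thread cleanup BH. */
#include <infiniband/verbs.h>
#include <stddef.h>

static struct ibv_mr *reg_block(struct ibv_pd *pd, void *host_addr,
                                size_t len)
{
    return ibv_reg_mr(pd, host_addr, len,
                      IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
}

static int dereg_block(struct ibv_mr *mr)
{
    /* This call unpins every page of the region, so its cost grows with
     * the region size; see the timings below. */
    return ibv_dereg_mr(mr);
}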

And I find that ibv_dereg_mr may take a long time for a virtual machine
with a large memory size.

The test result is:
 10GB  326ms
 20GB  699ms
 30GB  1021ms
 40GB  1387ms
 50GB  1712ms
 60GB  2034ms
 70GB  2457ms
 80GB  2807ms
 90GB  3107ms
100GB 3474ms
110GB 3735ms
120GB 4064ms
130GB 4567ms
140GB 4886ms

When cancelling migration, the guest OS will hang for a while,
so I think we should invoke qemu_fclose(s->to_dst_file) in
migration_thread, without holding the iothread lock.
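(Reading the numbers above: the cost grows roughly linearly with guest
memory size, at about 33-35 ms per GB; 326 ms / 10 GB is about 33 ms/GB,
and 4886 ms / 140 GB is about 35 ms/GB. That is consistent with the
per-page put_page() loop in __ib_umem_release.)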

>
> Dave
>
>> >
>> >> Anyway, it should not invoke rdma_get_cm_event in the main thread, and
>> >> the event channel
>> >> is also destroyed in qemu_rdma_cleanup.
>> >>
>> >> Signed-off-by: Lidong Chen 
>> >> ---
>> >>  

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-11 Thread Dr. David Alan Gilbert
* 858585 jemmy (jemmy858...@gmail.com) wrote:
> On Wed, May 9, 2018 at 2:40 AM, Dr. David Alan Gilbert
>  wrote:
> > * Lidong Chen (jemmy858...@gmail.com) wrote:
> >> When cancelling migration during RDMA precopy, the source qemu main
> >> thread sometimes hangs.
> >>
> >> The backtrace is:
> >> (gdb) bt
> >> #0  0x7f249eabd43d in write () from /lib64/libpthread.so.0
> >> #1  0x7f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, 
> >> event=0x7ffe2f643dd0) at src/cma.c:2189
> >> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at 
> >> migration/rdma.c:2296
> >> #3  0x007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, 
> >> errp=0x0) at migration/rdma.c:2999
> >> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) 
> >> at io/channel.c:273
> >> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) at 
> >> migration/qemu-file-channel.c:98
> >> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
> >> migration/qemu-file.c:334
> >> #7  0x00795b96 in migrate_fd_cleanup (opaque=0x3b46280) at 
> >> migration/migration.c:1162
> >> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
> >> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at 
> >> util/async.c:118
> >> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
> >> util/aio-posix.c:436
> >> #11 0x0093ab41 in aio_ctx_dispatch (source=0x3b121c0, 
> >> callback=0x0, user_data=0x0)
> >> at util/async.c:261
> >> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
> >> /lib64/libglib-2.0.so.0
> >> #13 0x0093dc5e in glib_pollfds_poll () at util/main-loop.c:215
> >> #14 0x0093dd4e in os_host_main_loop_wait (timeout=2800) at 
> >> util/main-loop.c:263
> >> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
> >> util/main-loop.c:522
> >> #16 0x005bc6a5 in main_loop () at vl.c:1944
> >> #17 0x005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, 
> >> envp=0x3ad0030) at vl.c:4752
> >>
> >> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
> >> after rdma_disconnect.
> >> I have not found the root cause of the missing
> >> RDMA_CM_EVENT_DISCONNECTED event, but
> >> it can be reproduced if ibv_dereg_mr is not invoked to release
> >> all ram blocks, which is fixed
> >> in the previous patch.
> >
> > Does this happen without your other changes?
> 
> Yes, this issue also happens on v2.12.0, based on
> commit 4743c23509a51bd4ee85cc272287a41917d1be35
>
> > Can you give me instructions to repeat it and also say which
> > cards you were using?
>
> This issue can be reproduced by starting and cancelling migration;
> the issue reproduces within 10 attempts.
> 
> The command line is:
> virsh migrate --live --copy-storage-all  --undefinesource --persistent
> --timeout 10800 \
>  --verbose 83e0049e-1325-4f31-baf9-25231509ada1  \
> qemu+ssh://9.16.46.142/system rdma://9.16.46.142
> 
> The net card I use is:
> 0000:3b:00.0 Ethernet controller: Mellanox Technologies MT27710 Family
> [ConnectX-4 Lx]
> 0000:3b:00.1 Ethernet controller: Mellanox Technologies MT27710 Family
> [ConnectX-4 Lx]
>
> This issue is related to ibv_dereg_mr: if ibv_dereg_mr is not invoked
> for all ram blocks, this issue can be reproduced.
> If we fix the bugs and use ibv_dereg_mr to release all ram blocks,
> this issue never happens.

Maybe that is the right fix; I can imagine that the RDMA code doesn't
like closing down if there are still ramblocks registered that
potentially could have incoming DMA?

> And for the kernel part, there is also a bug that causes ram blocks
> not to be released when cancelling live migration.
> https://patchwork.kernel.org/patch/10385781/

OK, that's a pain; which threads are doing the dereg - is some stuff
in the migration thread and some stuff in the main thread on cleanup?

Dave

> >
> >> Anyway, it should not invoke rdma_get_cm_event in the main thread, and
> >> the event channel
> >> is also destroyed in qemu_rdma_cleanup.
> >>
> >> Signed-off-by: Lidong Chen 
> >> ---
> >>  migration/rdma.c   | 12 ++--
> >>  migration/trace-events |  1 -
> >>  2 files changed, 2 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/migration/rdma.c b/migration/rdma.c
> >> index 0dd4033..92e4d30 100644
> >> --- a/migration/rdma.c
> >> +++ b/migration/rdma.c
> >> @@ -2275,8 +2275,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext 
> >> *rdma,
> >>
> >>  static void qemu_rdma_cleanup(RDMAContext *rdma)
> >>  {
> >> -struct rdma_cm_event *cm_event;
> >> -int ret, idx;
> >> +int idx;
> >>
> >>  if (rdma->cm_id && rdma->connected) {
> >>  if ((rdma->error_state ||
> >> @@ -2290,14 +2289,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
> >>  qemu_rdma_post_send_control(rdma, NULL, &head);
> >>  }
> >>
> >> -ret = rdma_disconnect(rdma->cm_id);
> >> -if (!ret) {
> >> 

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-08 Thread 858585 jemmy
On Wed, May 9, 2018 at 2:40 AM, Dr. David Alan Gilbert
 wrote:
> * Lidong Chen (jemmy858...@gmail.com) wrote:
>> When cancelling migration during RDMA precopy, the source qemu main
>> thread sometimes hangs.
>>
>> The backtrace is:
>> (gdb) bt
>> #0  0x7f249eabd43d in write () from /lib64/libpthread.so.0
>> #1  0x7f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, 
>> event=0x7ffe2f643dd0) at src/cma.c:2189
>> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at 
>> migration/rdma.c:2296
>> #3  0x007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, 
>> errp=0x0) at migration/rdma.c:2999
>> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at 
>> io/channel.c:273
>> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) at 
>> migration/qemu-file-channel.c:98
>> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
>> migration/qemu-file.c:334
>> #7  0x00795b96 in migrate_fd_cleanup (opaque=0x3b46280) at 
>> migration/migration.c:1162
>> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
>> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118
>> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
>> util/aio-posix.c:436
>> #11 0x0093ab41 in aio_ctx_dispatch (source=0x3b121c0, 
>> callback=0x0, user_data=0x0)
>> at util/async.c:261
>> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
>> /lib64/libglib-2.0.so.0
>> #13 0x0093dc5e in glib_pollfds_poll () at util/main-loop.c:215
>> #14 0x0093dd4e in os_host_main_loop_wait (timeout=2800) at 
>> util/main-loop.c:263
>> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
>> util/main-loop.c:522
>> #16 0x005bc6a5 in main_loop () at vl.c:1944
>> #17 0x005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, 
>> envp=0x3ad0030) at vl.c:4752
>>
>> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
>> after rdma_disconnect.
>> I have not found the root cause of the missing
>> RDMA_CM_EVENT_DISCONNECTED event, but
>> it can be reproduced if ibv_dereg_mr is not invoked to release
>> all ram blocks, which is fixed
>> in the previous patch.
>
> Does this happen without your other changes?

Yes, this issue also happens on v2.12.0, based on
commit 4743c23509a51bd4ee85cc272287a41917d1be35

> Can you give me instructions to repeat it and also say which
> cards you were using?

This issue can be reproduced by starting and cancelling migration;
the issue reproduces within 10 attempts.

The command line is:
virsh migrate --live --copy-storage-all  --undefinesource --persistent
--timeout 10800 \
 --verbose 83e0049e-1325-4f31-baf9-25231509ada1  \
qemu+ssh://9.16.46.142/system rdma://9.16.46.142

The net card I use is:
0000:3b:00.0 Ethernet controller: Mellanox Technologies MT27710 Family
[ConnectX-4 Lx]
0000:3b:00.1 Ethernet controller: Mellanox Technologies MT27710 Family
[ConnectX-4 Lx]

This issue is related to ibv_dereg_mr: if ibv_dereg_mr is not invoked
for all ram blocks, this issue can be reproduced.
If we fix the bugs and use ibv_dereg_mr to release all ram blocks,
this issue never happens.

And for the kernel part, there is also a bug that causes ram blocks
not to be released when cancelling live migration.
https://patchwork.kernel.org/patch/10385781/

>
>> Anyway, it should not invoke rdma_get_cm_event in the main thread, and the
>> event channel
>> is also destroyed in qemu_rdma_cleanup.
>>
>> Signed-off-by: Lidong Chen 
>> ---
>>  migration/rdma.c   | 12 ++--
>>  migration/trace-events |  1 -
>>  2 files changed, 2 insertions(+), 11 deletions(-)
>>
>> diff --git a/migration/rdma.c b/migration/rdma.c
>> index 0dd4033..92e4d30 100644
>> --- a/migration/rdma.c
>> +++ b/migration/rdma.c
>> @@ -2275,8 +2275,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext 
>> *rdma,
>>
>>  static void qemu_rdma_cleanup(RDMAContext *rdma)
>>  {
>> -struct rdma_cm_event *cm_event;
>> -int ret, idx;
>> +int idx;
>>
>>  if (rdma->cm_id && rdma->connected) {
>>  if ((rdma->error_state ||
>> @@ -2290,14 +2289,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>> >>  qemu_rdma_post_send_control(rdma, NULL, &head);
>>  }
>>
>> -ret = rdma_disconnect(rdma->cm_id);
>> -if (!ret) {
>> -trace_qemu_rdma_cleanup_waiting_for_disconnect();
>> >> -ret = rdma_get_cm_event(rdma->channel, &cm_event);
>> -if (!ret) {
>> -rdma_ack_cm_event(cm_event);
>> -}
>> -}
>> +rdma_disconnect(rdma->cm_id);
>
> I'm worried whether this change could break stuff:
> The docs say for rdma_disconnect that it flushes any posted work
> requests to the completion queue;  so unless we wait for the event
> do we know the stuff has been flushed?   In the normal non-cancel case
> I'm worried that means we could lose something.
> 

Re: [Qemu-devel] [PATCH 2/2] migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect

2018-05-08 Thread Dr. David Alan Gilbert
* Lidong Chen (jemmy858...@gmail.com) wrote:
> When cancelling migration during RDMA precopy, the source qemu main
> thread sometimes hangs.
> 
> The backtrace is:
> (gdb) bt
> #0  0x7f249eabd43d in write () from /lib64/libpthread.so.0
> #1  0x7f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, 
> event=0x7ffe2f643dd0) at src/cma.c:2189
> #2  0x007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at 
> migration/rdma.c:2296
> #3  0x007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, 
> errp=0x0) at migration/rdma.c:2999
> #4  0x008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at 
> io/channel.c:273
> #5  0x007a8765 in channel_close (opaque=0x3bfcc30) at 
> migration/qemu-file-channel.c:98
> #6  0x007a71f9 in qemu_fclose (f=0x527c000) at 
> migration/qemu-file.c:334
> #7  0x00795b96 in migrate_fd_cleanup (opaque=0x3b46280) at 
> migration/migration.c:1162
> #8  0x0093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90
> #9  0x0093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118
> #10 0x0093f2ad in aio_dispatch (ctx=0x3b121c0) at 
> util/aio-posix.c:436
> #11 0x0093ab41 in aio_ctx_dispatch (source=0x3b121c0, 
> callback=0x0, user_data=0x0)
> at util/async.c:261
> #12 0x7f249f73c7aa in g_main_context_dispatch () from 
> /lib64/libglib-2.0.so.0
> #13 0x0093dc5e in glib_pollfds_poll () at util/main-loop.c:215
> #14 0x0093dd4e in os_host_main_loop_wait (timeout=2800) at 
> util/main-loop.c:263
> #15 0x0093de05 in main_loop_wait (nonblocking=0) at 
> util/main-loop.c:522
> #16 0x005bc6a5 in main_loop () at vl.c:1944
> #17 0x005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, 
> envp=0x3ad0030) at vl.c:4752
> 
> Sometimes it does not get the RDMA_CM_EVENT_DISCONNECTED event
> after rdma_disconnect.
> I have not found the root cause of the missing
> RDMA_CM_EVENT_DISCONNECTED event, but
> it can be reproduced if ibv_dereg_mr is not invoked to release
> all ram blocks, which is fixed
> in the previous patch.

Does this happen without your other changes?
Can you give me instructions to repeat it and also say which
cards you were using?

> Anyway, it should not invoke rdma_get_cm_event in the main thread, and the
> event channel
> is also destroyed in qemu_rdma_cleanup.
> 
> Signed-off-by: Lidong Chen 
> ---
>  migration/rdma.c   | 12 ++--
>  migration/trace-events |  1 -
>  2 files changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 0dd4033..92e4d30 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -2275,8 +2275,7 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext 
> *rdma,
>  
>  static void qemu_rdma_cleanup(RDMAContext *rdma)
>  {
> -struct rdma_cm_event *cm_event;
> -int ret, idx;
> +int idx;
>  
>  if (rdma->cm_id && rdma->connected) {
>  if ((rdma->error_state ||
> @@ -2290,14 +2289,7 @@ static void qemu_rdma_cleanup(RDMAContext *rdma)
>  qemu_rdma_post_send_control(rdma, NULL, &head);
>  }
>  
> -ret = rdma_disconnect(rdma->cm_id);
> -if (!ret) {
> -trace_qemu_rdma_cleanup_waiting_for_disconnect();
> -ret = rdma_get_cm_event(rdma->channel, &cm_event);
> -if (!ret) {
> -rdma_ack_cm_event(cm_event);
> -}
> -}
> +rdma_disconnect(rdma->cm_id);

I'm worried whether this change could break stuff:
The docs say for rdma_disconnect that it flushes any posted work
requests to the completion queue;  so unless we wait for the event
do we know the stuff has been flushed?   In the normal non-cancel case
I'm worried that means we could lose something.
(But I don't know the rdma/infiniband specs well enough to know if it's
really a problem).

Dave

>  trace_qemu_rdma_cleanup_disconnect();
>  rdma->connected = false;
>  }
> diff --git a/migration/trace-events b/migration/trace-events
> index d6be74b..64573ff 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -125,7 +125,6 @@ qemu_rdma_accept_pin_state(bool pin) "%d"
>  qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
>  qemu_rdma_block_for_wrid_miss(const char *wcompstr, int wcomp, const char 
> *gcompstr, uint64_t req) "A Wanted wrid %s (%d) but got %s (%" PRIu64 ")"
>  qemu_rdma_cleanup_disconnect(void) ""
> -qemu_rdma_cleanup_waiting_for_disconnect(void) ""
>  qemu_rdma_close(void) ""
>  qemu_rdma_connect_pin_all_requested(void) ""
>  qemu_rdma_connect_pin_all_outcome(bool pin) "%d"
> -- 
> 1.8.3.1
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK