[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Fri, Oct 11, 2019 at 08:30:02AM +0000, Jan Glauber wrote:
> On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote:
> > On 11/10/19 08:05, Jan Glauber wrote:
> > > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > >>> ...but if I bump notify_me size to uint64_t the issue goes away.
> > >>
> > >> Ouch. :)  Is this with or without my patch(es)?
> >
> > You didn't answer this question.
>
> Oh, sorry... I did but the mail probably didn't make it out.
> I have both of your changes applied (as I think they make sense).
>
> > >> Also, what if you just add a dummy uint32_t after notify_me?
> > >
> > > With the dummy the testcase also runs fine for 500 iterations.
> >
> > You might be lucky and causing list_lock to be in another cache line.
> > What if you add __attribute__((aligned(16))) to notify_me (and keep the
> > dummy)?
>
> Good point. I'll try to force both into the same cacheline.

On the Hi1620, this still hangs in the first iteration:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..00e56a5412 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
      * Instead, the aio_poll calls include both the prepare and the
      * dispatch phase, hence a simple counter is enough for them.
      */
-    uint32_t notify_me;
+    __attribute__((aligned(16))) uint64_t notify_me;
 
     /* A lock to protect between QEMUBH and AioHandler adders and deleter,
      * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
     aio_notify_accept(ctx);
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (bh->scheduled) {
+        if (atomic_mb_read(&bh->scheduled)) {
             return true;
         }
     }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 
 void aio_notify(AioContext *ctx)
 {
-    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
-     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
+     * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
+     * atomic_add in aio_poll.
      */
-    smp_mb();
-    if (ctx->notify_me) {
-        event_notifier_set(&ctx->notifier);
+    if (atomic_mb_read(&ctx->notify_me)) {
+        event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
     }
 }
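
For reference, a self-contained sketch of how one can check whether two adjacent
fields end up in the same 64-byte cache line after such a layout change. The
struct below is a made-up stand-in, not QEMU's real AioContext, and the earlier
padding size is arbitrary:

    /* Standalone sketch: mimics the field layout being discussed, not the
     * real AioContext.  Prints member offsets and whether the two fields
     * can share a 64-byte cache line. */
    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    struct fake_ctx {                        /* hypothetical stand-in */
        char earlier_fields[0xa0];           /* arbitrary padding */
        __attribute__((aligned(16))) uint64_t notify_me;
        uint32_t list_lock;                  /* the real field is a QemuLockCnt */
    };

    int main(void)
    {
        printf("offset notify_me: %zu\n", offsetof(struct fake_ctx, notify_me));
        printf("offset list_lock: %zu\n", offsetof(struct fake_ctx, list_lock));
        /* Two fields share a 64-byte line iff offset/64 is equal. */
        printf("same 64-byte line: %d\n",
               offsetof(struct fake_ctx, notify_me) / 64 ==
               offsetof(struct fake_ctx, list_lock) / 64);
        return 0;
    }

With the layout above the two members still land in the same line, which is why
an aligned(16) attribute alone does not guarantee they end up on separate lines.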
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Fri, Oct 11, 2019 at 06:05:25AM +0000, Jan Glauber wrote:
> On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > On 09/10/19 10:02, Jan Glauber wrote:
> > > I'm still not sure what the actual issue is here, but could it be some bad
> > > interaction between the notify_me and the list_lock? They are both 4 bytes
> > > and side-by-side:
> > >
> > > address notify_me: 0xdb528aa0  sizeof notify_me: 4
> > > address list_lock: 0xdb528aa4  sizeof list_lock: 4
> > >
> > > AFAICS the generated code looks OK (all load/store exclusive done
> > > with 32 bit size):
> > >
> > >   e6c:   885ffc01        ldaxr   w1, [x0]
> > >   e70:   11000821        add     w1, w1, #0x2
> > >   e74:   8802fc01        stlxr   w2, w1, [x0]
> > >
> > > ...but if I bump notify_me size to uint64_t the issue goes away.
> >
> > Ouch. :)  Is this with or without my patch(es)?
> >
> > Also, what if you just add a dummy uint32_t after notify_me?
>
> With the dummy the testcase also runs fine for 500 iterations.
>
> Dann, can you try if this works on the Hi1620 too?

On Hi1620, it hung on the first iteration. Here's the complete patch
I'm running with:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..e6fd6f1a1a 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
      * Instead, the aio_poll calls include both the prepare and the
      * dispatch phase, hence a simple counter is enough for them.
      */
-    uint32_t notify_me;
+    uint64_t notify_me;
 
     /* A lock to protect between QEMUBH and AioHandler adders and deleter,
      * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
     aio_notify_accept(ctx);
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
-        if (bh->scheduled) {
+        if (atomic_mb_read(&bh->scheduled)) {
             return true;
         }
     }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 
 void aio_notify(AioContext *ctx)
 {
-    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
-     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
+     * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
+     * atomic_add in aio_poll.
      */
-    smp_mb();
-    if (ctx->notify_me) {
-        event_notifier_set(&ctx->notifier);
+    if (atomic_mb_read(&ctx->notify_me)) {
+        event_notifier_set(&ctx->notifier);
         atomic_mb_set(&ctx->notified, true);
     }
 }
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote:
> On 11/10/19 08:05, Jan Glauber wrote:
> > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> >>> ...but if I bump notify_me size to uint64_t the issue goes away.
> >>
> >> Ouch. :)  Is this with or without my patch(es)?
>
> You didn't answer this question.

Oh, sorry... I did but the mail probably didn't make it out.
I have both of your changes applied (as I think they make sense).

> >> Also, what if you just add a dummy uint32_t after notify_me?
> >
> > With the dummy the testcase also runs fine for 500 iterations.
>
> You might be lucky and causing list_lock to be in another cache line.
> What if you add __attribute__((aligned(16))) to notify_me (and keep the
> dummy)?

Good point. I'll try to force both into the same cacheline.

--Jan

> Paolo
>
> > Dann, can you try if this works on the Hi1620 too?
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> On 09/10/19 10:02, Jan Glauber wrote:
> > I'm still not sure what the actual issue is here, but could it be some bad
> > interaction between the notify_me and the list_lock? They are both 4 bytes
> > and side-by-side:
> >
> > address notify_me: 0xdb528aa0  sizeof notify_me: 4
> > address list_lock: 0xdb528aa4  sizeof list_lock: 4
> >
> > AFAICS the generated code looks OK (all load/store exclusive done
> > with 32 bit size):
> >
> >   e6c:   885ffc01        ldaxr   w1, [x0]
> >   e70:   11000821        add     w1, w1, #0x2
> >   e74:   8802fc01        stlxr   w2, w1, [x0]
> >
> > ...but if I bump notify_me size to uint64_t the issue goes away.
>
> Ouch. :)  Is this with or without my patch(es)?
>
> Also, what if you just add a dummy uint32_t after notify_me?

With the dummy the testcase also runs fine for 500 iterations.

Dann, can you try if this works on the Hi1620 too?

--Jan
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Mon, Oct 07, 2019 at 04:58:30PM +0200, Paolo Bonzini wrote:
> On 07/10/19 16:44, dann frazier wrote:
> > On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> >> On 02/10/19 11:23, Jan Glauber wrote:
> >>> I've looked into this on ThunderX2. The arm64 code generated for the
> >>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> >>> memory barriers. It is just plain ldaxr/stlxr.
> >>>
> >>> From my understanding this is not sufficient for SMP sync.
> >>>
> >>> If I read this comment correctly:
> >>>
> >>>     void aio_notify(AioContext *ctx)
> >>>     {
> >>>         /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >>>          * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >>>          */
> >>>         smp_mb();
> >>>         if (ctx->notify_me) {
> >>>
> >>> it points out that the smp_mb() should be paired. But as
> >>> I said the used atomics don't generate any barriers at all.
> >>
> >> Based on the rest of the thread, this patch should also fix the bug:
> >>
> >> diff --git a/util/async.c b/util/async.c
> >> index 47dcbfa..721ea53 100644
> >> --- a/util/async.c
> >> +++ b/util/async.c
> >> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
> >>      aio_notify_accept(ctx);
> >>
> >>      for (bh = ctx->first_bh; bh; bh = bh->next) {
> >> -        if (bh->scheduled) {
> >> +        if (atomic_mb_read(&bh->scheduled)) {
> >>              return true;
> >>          }
> >>      }
> >>
> >> And also the memory barrier in aio_notify can actually be replaced
> >> with a SEQ_CST load:
> >>
> >> diff --git a/util/async.c b/util/async.c
> >> index 47dcbfa..721ea53 100644
> >> --- a/util/async.c
> >> +++ b/util/async.c
> >> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
> >>
> >>  void aio_notify(AioContext *ctx)
> >>  {
> >> -    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >> -     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >> +    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> >> +     * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
> >> +     * atomic_add in aio_poll.
> >>       */
> >> -    smp_mb();
> >> -    if (ctx->notify_me) {
> >> +    if (atomic_mb_read(&ctx->notify_me)) {
> >>          event_notifier_set(&ctx->notifier);
> >>          atomic_mb_set(&ctx->notified, true);
> >>      }
> >>
> >> Would you be able to test these (one by one possibly)?
> >
> > Paolo,
> >   I tried them both separately and together on a Hi1620 system, each
> > time it hung in the first iteration. Here's a backtrace of a run with
> > both patches applied:
>
> Ok, not a great start...  I'll find myself an aarch64 machine and look
> at it more closely.  I'd like the patch to be something we can
> understand and document, since this is probably the second most-used
> memory barrier idiom in QEMU.
>
> Paolo

I'm still not sure what the actual issue is here, but could it be some
bad interaction between the notify_me and the list_lock? They are both
4 bytes and side-by-side:

address notify_me: 0xdb528aa0  sizeof notify_me: 4
address list_lock: 0xdb528aa4  sizeof list_lock: 4

AFAICS the generated code looks OK (all load/store exclusive done
with 32 bit size):

  e6c:   885ffc01        ldaxr   w1, [x0]
  e70:   11000821        add     w1, w1, #0x2
  e74:   8802fc01        stlxr   w2, w1, [x0]

...but if I bump notify_me size to uint64_t the issue goes away.

BTW, the image file I convert in the testcase is ~20 GB.
--Jan

diff --git a/include/block/aio.h b/include/block/aio.h
index a1d6b9e24939..e8a5ea3860bb 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -83,7 +83,7 @@ struct AioContext {
      * Instead, the aio_poll calls include both the prepare and the
      * dispatch phase, hence a simple counter is enough for them.
      */
-    uint32_t notify_me;
+    uint64_t notify_me;
 
     /* A lock to protect between QEMUBH and AioHandler adders and deleter,
      * and to ensure that no callbacks are removed while we're walking and
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> >
> > From my understanding this is not sufficient for SMP sync.
> >
> > If I read this comment correctly:
> >
> >     void aio_notify(AioContext *ctx)
> >     {
> >         /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >          * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >          */
> >         smp_mb();
> >         if (ctx->notify_me) {
> >
> > it points out that the smp_mb() should be paired. But as
> > I said the used atomics don't generate any barriers at all.
>
> Based on the rest of the thread, this patch should also fix the bug:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>      aio_notify_accept(ctx);
>
>      for (bh = ctx->first_bh; bh; bh = bh->next) {
> -        if (bh->scheduled) {
> +        if (atomic_mb_read(&bh->scheduled)) {
>              return true;
>          }
>      }
>
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>
>  void aio_notify(AioContext *ctx)
>  {
> -    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> -     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> +    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> +     * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
> +     * atomic_add in aio_poll.
>       */
> -    smp_mb();
> -    if (ctx->notify_me) {
> +    if (atomic_mb_read(&ctx->notify_me)) {
>          event_notifier_set(&ctx->notifier);
>          atomic_mb_set(&ctx->notified, true);
>      }
>
> Would you be able to test these (one by one possibly)?

Sure.

> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue after ~20
> > iterations).
>
> Sorry for asking the obvious---500 iterations of what?

The testcase mentioned in the Canonical issue:
https://bugs.launchpad.net/qemu/+bug/1805256

It's a simple image convert:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Usually it got stuck after 3-20 iterations.

--Jan
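
For illustration, the two code shapes being compared in the second patch can be
written as standalone C using the GCC builtins. This is only a sketch; whether
QEMU's atomic_mb_read expands exactly to a sequentially consistent load here is
an assumption, and the names below are made up:

    /* Illustrative only: a full barrier followed by a plain read (the
     * current aio_notify) versus a single SEQ_CST load (what the proposed
     * patch uses to read notify_me). */
    #include <stdint.h>

    static uint32_t notify_me_flag;   /* stand-in for ctx->notify_me */

    /* Variant A: explicit full fence, then a plain read. */
    uint32_t read_after_full_barrier(void)
    {
        __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* smp_mb() equivalent */
        return notify_me_flag;
    }

    /* Variant B: one sequentially consistent load. */
    uint32_t read_seq_cst(void)
    {
        return __atomic_load_n(&notify_me_flag, __ATOMIC_SEQ_CST);
    }

The thread's open question is precisely whether variant B gives the same
cross-CPU ordering guarantees as variant A on arm64.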
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> >
> > From my understanding this is not sufficient for SMP sync.
> >
> > If I read this comment correctly:
> >
> >     void aio_notify(AioContext *ctx)
> >     {
> >         /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >          * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >          */
> >         smp_mb();
> >         if (ctx->notify_me) {
> >
> > it points out that the smp_mb() should be paired. But as
> > I said the used atomics don't generate any barriers at all.
>
> Based on the rest of the thread, this patch should also fix the bug:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>      aio_notify_accept(ctx);
>
>      for (bh = ctx->first_bh; bh; bh = bh->next) {
> -        if (bh->scheduled) {
> +        if (atomic_mb_read(&bh->scheduled)) {
>              return true;
>          }
>      }
>
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
>
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>
>  void aio_notify(AioContext *ctx)
>  {
> -    /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> -     * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> +    /* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> +     * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
> +     * atomic_add in aio_poll.
>       */
> -    smp_mb();
> -    if (ctx->notify_me) {
> +    if (atomic_mb_read(&ctx->notify_me)) {
>          event_notifier_set(&ctx->notifier);
>          atomic_mb_set(&ctx->notified, true);
>      }
>
> Would you be able to test these (one by one possibly)?

Paolo,
  I tried them both separately and together on a Hi1620 system, each
time it hung in the first iteration.
Here's a backtrace of a run with both patches applied:

(gdb) thread apply all bt

Thread 3 (Thread 0x8154b820 (LWP 63900)):
#0  0x8b9402cc in __GI___sigtimedwait (set=<optimized out>, set@entry=0xf1e08070, info=info@entry=0x8154ad98, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1  0x8ba77fac in __sigwait (set=set@entry=0xf1e08070, sig=sig@entry=0x8154ae74) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2  0xb7dc1610 in sigwait_compat (opaque=0xf1e08070) at util/compatfd.c:35
#3  0xb7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#4  0x8ba6d088 in start_thread (arg=0xceefbf4f) at pthread_create.c:463
#5  0x8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0x81d4c820 (LWP 63899)):
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0xb7dc4cd8 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/ubuntu/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0xb7e48708 <rcu_call_ready_event>) at util/qemu-thread-posix.c:459
#3  0xb7ddf44c in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:260
#4  0xb7dc3e80 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:519
#5  0x8ba6d088 in start_thread (arg=0xceefc05f) at pthread_create.c:463
#6  0x8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0x81e83010 (LWP 63898)):
#0  0x8b9d4154 in __GI_ppoll (fds=0xf1e0dbc0, nfds=187650205809964, timeout=<optimized out>, timeout@entry=0x0, sigmask=0xceefbef0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0xb7dbedb0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=-1) at util/qemu-timer.c:340
#3  0xb7dbfd2c in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:236
#4  main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:517
#5  0xb7ce86e8 in convert_do_copy (s=0xceefc068) at qemu-img.c:2028
#6  img_convert (argc=<optimized out>, argv=<optimized out>) at qemu-img.c:2520
#7  0xb7ce1e54 in main (argc=8, argv=<optimized out>) at qemu-img.c:5097

> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue after ~20
> > iterations).
>
> Sorry for asking the obvious---500 iterations of what?

$ for i in $(seq 1 500); do echo "==$i=="; ./qemu/qemu-img convert -p -f qcow2 -O qcow2 bionic-server-cloudimg-arm64.img out.img; done
==1==
    (37.19/100%)

  -dann
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Wed, Oct 02, 2019 at 11:45:19AM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue after ~20
> > iterations).
>
> Awesome! That would be a compiler bug though, as atomic_add and atomic_sub
> are defined as sequentially consistent:
>
> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))

Compiler bug sounds kind of unlikely...

> What compiler are you using and what distro? Can you compile util/aio-posix.c
> with "-fdump-rtl-all -fdump-tree-all", zip the boatload of debugging files and
> send them my way?

This is on Ubuntu 18.04.3, gcc version 7.4.0 (Ubuntu/Linaro
7.4.0-1ubuntu1~18.04.1).

I've uploaded the debug files to:
https://bugs.launchpad.net/qemu/+bug/1805256/+attachment/5293619/+files/aio-posix.tar.xz

Thanks,
Jan

> Thanks,
>
> Paolo
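
For anyone who wants to compare what different compiler versions emit for these
macros without rebuilding QEMU, a small standalone probe along these lines is
enough. The file name and function names are made up; it simply mirrors the two
macro definitions quoted above. Compiling it with something like
"gcc -O2 -S codegen-probe.c" should show a ldaxr/stlxr sequence comparable to
the disassembly posted earlier (the exact output depends on compiler version
and -march flags):

    /* codegen-probe.c -- standalone snippet to inspect the code generated
     * for a sequentially consistent fetch-add/fetch-sub, mirroring QEMU's
     * atomic_add/atomic_sub macros. */
    #include <stdint.h>

    uint32_t notify_me;

    void probe_add(void)
    {
        (void) __atomic_fetch_add(&notify_me, 2, __ATOMIC_SEQ_CST);
    }

    void probe_sub(void)
    {
        (void) __atomic_fetch_sub(&notify_me, 2, __ATOMIC_SEQ_CST);
    }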
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
I've looked into this on ThunderX2. The arm64 code generated for the
atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
memory barriers. It is just plain ldaxr/stlxr.

From my understanding this is not sufficient for SMP sync.

If I read this comment correctly:

    void aio_notify(AioContext *ctx)
    {
        /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
         * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
         */
        smp_mb();
        if (ctx->notify_me) {

it points out that the smp_mb() should be paired. But as I said, the
atomics that are used don't generate any barriers at all.

I've tried to verify my theory with this patch and didn't run into the
issue for ~500 iterations (usually I would trigger the issue after ~20
iterations).

--Jan

diff --git a/util/aio-posix.c b/util/aio-posix.c
index d8f0cb4af8dd..d07dcd4e9993 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
      */
     if (blocking) {
         atomic_add(&ctx->notify_me, 2);
+        smp_mb();
     }
 
     qemu_lockcnt_inc(&ctx->list_lock);
@@ -632,6 +633,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
     if (blocking) {
         atomic_sub(&ctx->notify_me, 2);
+        smp_mb();
     }
 
     /* Adjust polling time */
diff --git a/util/async.c b/util/async.c
index 4dd9d95a9e73..92ac209c4615 100644
--- a/util/async.c
+++ b/util/async.c
@@ -222,6 +222,7 @@ aio_ctx_prepare(GSource *source, gint    *timeout)
     AioContext *ctx = (AioContext *) source;
 
     atomic_or(&ctx->notify_me, 1);
+    smp_mb();
 
     /* We assume there is no timeout already supplied */
     *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
@@ -240,6 +241,7 @@ aio_ctx_check(GSource *source)
     QEMUBH *bh;
 
     atomic_and(&ctx->notify_me, ~1);
+    smp_mb();
     aio_notify_accept(ctx);
 
     for (bh = ctx->first_bh; bh; bh = bh->next) {
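
The pairing the comment talks about follows the classic "each side writes its
own flag, then reads the other side's flag" (store-buffering) pattern. A reduced
standalone sketch of that shape, with made-up names rather than QEMU code, looks
like this:

    /* Reduced sketch of the protocol between aio_poll (the "poller") and
     * aio_notify (the "notifier").  Not QEMU code. */
    #include <stdint.h>
    #include <stdbool.h>

    static uint32_t notify_me;   /* stands in for ctx->notify_me */
    static uint32_t scheduled;   /* stands in for bh->scheduled  */

    /* Poller side: announce that we are about to block, then re-check work. */
    static bool poller_prepare(void)
    {
        __atomic_fetch_add(&notify_me, 2, __ATOMIC_SEQ_CST);
        /* The ordering both sides rely on: the write to notify_me must be
         * visible before the read of scheduled... */
        return __atomic_load_n(&scheduled, __ATOMIC_SEQ_CST) != 0;
    }

    /* Notifier side: publish work, then decide whether the poller needs a
     * wakeup. */
    static bool notifier_publish(void)
    {
        __atomic_store_n(&scheduled, 1, __ATOMIC_SEQ_CST);
        /* ...and, symmetrically, the write to scheduled must be visible
         * before the read of notify_me.  If either ordering is missing,
         * both sides can miss each other's update and the poller blocks
         * forever -- which matches the observed hang. */
        return __atomic_load_n(&notify_me, __ATOMIC_SEQ_CST) != 0;
    }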
[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues
On Wed, Sep 11, 2019 at 04:09:25PM -0300, Rafael David Tinoco wrote:
> > Zhengui's theory that notify_me doesn't work properly on ARM is more
> > promising, but he couldn't provide a clear explanation of why he thought
> > notify_me is involved.  In particular, I would have expected notify_me to
> > be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
> >
> >     glib_pollfds_fill
> >       g_main_context_prepare
> >         aio_ctx_prepare
> >           atomic_or(&ctx->notify_me, 1)
> >     qemu_poll_ns
> >     glib_pollfds_poll
> >       g_main_context_check
> >         aio_ctx_check
> >           atomic_and(&ctx->notify_me, ~1)
> >       g_main_context_dispatch
> >         aio_ctx_dispatch
> >           /* do something for event */
> >             qemu_poll_ns
>
> Paolo,
>
> I tried confining execution in a single NUMA domain (cpu & mem) and
> still faced the issue, then I added a mutex "ctx->notify_me_lcktest"
> into the context to protect "ctx->notify_me", as shown below, and it
> seems to have either fixed or mitigated it.
>
> I was able to cause the hang once every 3 or 4 runs. I have already run
> qemu-img convert more than 30 times now and couldn't reproduce it again.
>
> Next step is to play with the barriers and check why the existing ones
> aren't enough for ordering access to ctx->notify_me ... or should I
> try/do something else in your opinion?
>
> This arch/machine (Huawei D06):
>
> $ lscpu
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              96
> On-line CPU(s) list: 0-95
> Thread(s) per core:  1
> Core(s) per socket:  48
> Socket(s):           2
> NUMA node(s):        4
> Vendor ID:           0x48
> Model:               0
> Stepping:            0x0
> CPU max MHz:         2000.
> CPU min MHz:         200.
> BogoMIPS:            200.00
> L1d cache:           64K
> L1i cache:           64K
> L2 cache:            512K
> L3 cache:            32768K
> NUMA node0 CPU(s):   0-23
> NUMA node1 CPU(s):   24-47
> NUMA node2 CPU(s):   48-71
> NUMA node3 CPU(s):   72-95
> Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
>                      cpuid asimdrdm dcpop

Note that I'm also seeing this on a ThunderX2 (same calltrace):

$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              224
On-line CPU(s) list: 0-223
Thread(s) per core:  4
Core(s) per socket:  28
Socket(s):           2
NUMA node(s):        2
Vendor ID:           Cavium
Model:               1
Model name:          ThunderX2 99xx
Stepping:            0x1
BogoMIPS:            400.00
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            32768K
NUMA node0 CPU(s):   0-111
NUMA node1 CPU(s):   112-223
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
                     cpuid asimdrdm

  -dann

>
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 0ca25dfec6..0724086d91 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -84,6 +84,7 @@ struct AioContext {
>       * dispatch phase, hence a simple counter is enough for them.
>       */
>      uint32_t notify_me;
> +    QemuMutex notify_me_lcktest;
>
>      /* A lock to protect between QEMUBH and AioHandler adders and deleter,
>       * and to ensure that no callbacks are removed while we're walking and
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 51c41ed3c9..031d6e2997 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -529,7 +529,9 @@ static bool run_poll_handlers(AioContext *ctx, int64_t max_ns, int64_t *timeout)
>      bool progress;
>      int64_t start_time, elapsed_time;
>
> +    qemu_mutex_lock(&ctx->notify_me_lcktest);
>      assert(ctx->notify_me);
> +    qemu_mutex_unlock(&ctx->notify_me_lcktest);
>      assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
>
>      trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
> @@ -601,8 +603,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
>       * so disable the optimization now.
>       */
>      if (blocking) {
> +        qemu_mutex_lock(&ctx->notify_me_lcktest);
>          assert(in_aio_context_home_thread(ctx));
>          atomic_add(&ctx->notify_me, 2);
> +        qemu_mutex_unlock(&ctx->notify_me_lcktest);
>      }
>
>      qemu_lockcnt_inc(&ctx->list_lock);
> @@ -647,8 +651,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      }
>
>      if (blocking) {
> +        qemu_mutex_lock(&ctx->notify_me_lcktest);
>          atomic_sub(&ctx->notify_me, 2);
>          aio_notify_accept(ctx);
> +        qemu_mutex_unlock(&ctx->notify_me_lcktest);
>      }
>
>      /* Adjust polling time */
> diff --git a/util/async.c b/util/async.c
> index c10642a385..140e1e86f5 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -221,7 +221,9 @@ aio_ctx_prepare(GSource *source, gint    *timeout)
>  {
>      AioContext *ctx = (AioContext *) source;
>
> +    qemu_mutex_lock(&ctx->notify_me_lcktest);
>      atomic_or(&ctx->no