[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-11 Thread dann frazier
On Fri, Oct 11, 2019 at 08:30:02AM +, Jan Glauber wrote:
> On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote:
> > On 11/10/19 08:05, Jan Glauber wrote:
> > > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > >>> ...but if I bump notify_me size to uint64_t the issue goes away.
> > >>
> > >> Ouch. :)  Is this with or without my patch(es)?
> > 
> > You didn't answer this question.
> 
> Oh, sorry... I did but the mail probably didn't make it out.
> I have both of your changes applied (as I think they make sense).
> 
> > >> Also, what if you just add a dummy uint32_t after notify_me?
> > > 
> > > With the dummy the testcase also runs fine for 500 iterations.
> > 
> > You might just be lucky, with list_lock ending up in another cache line.
> > What if you add __attribute__((aligned(16))) to notify_me (and keep the
> > dummy)?
> 
> Good point. I'll try to force both into the same cache line.

On the Hi1620, this still hangs in the first iteration:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..00e56a5412 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
  * Instead, the aio_poll calls include both the prepare and the
  * dispatch phase, hence a simple counter is enough for them.
  */
-uint32_t notify_me;
+__attribute__((aligned(16))) uint64_t notify_me;
 
 /* A lock to protect between QEMUBH and AioHandler adders and deleter,
  * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
 aio_notify_accept(ctx);
 
 for (bh = ctx->first_bh; bh; bh = bh->next) {
-if (bh->scheduled) {
+if (atomic_mb_read(&bh->scheduled)) {
 return true;
 }
 }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 
 void aio_notify(AioContext *ctx)
 {
-/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
- * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
+ * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
+ * atomic_add in aio_poll.
  */
-smp_mb();
-if (ctx->notify_me) {
-event_notifier_set(&ctx->notifier);
+if (atomic_mb_read(&ctx->notify_me)) {
+   event_notifier_set(&ctx->notifier);
 atomic_mb_set(&ctx->notified, true);
 }
 }

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

To manage notifications about this bug go to:
https://bugs.launchpad.net/kunpeng920/+bug/1805256/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-11 Thread dann frazier
On Fri, Oct 11, 2019 at 06:05:25AM +, Jan Glauber wrote:
> On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > On 09/10/19 10:02, Jan Glauber wrote:
> 
> > > I'm still not sure what the actual issue is here, but could it be some bad
> > > interaction between the notify_me and the list_lock? They are both 4 bytes
> > > in size and sit side by side:
> > > 
> > > address notify_me: 0xdb528aa0  sizeof notify_me: 4
> > > address list_lock: 0xdb528aa4  sizeof list_lock: 4
> > > 
> > > AFAICS the generated code looks OK (all load/store exclusive done
> > > with 32 bit size):
> > > 
> > >  e6c:   885ffc01ldaxr   w1, [x0]
> > >  e70:   11000821add w1, w1, #0x2
> > >  e74:   8802fc01stlxr   w2, w1, [x0]
> > > 
> > > ...but if I bump notify_me size to uint64_t the issue goes away.
> > 
> > Ouch. :)  Is this with or without my patch(es)?
> > 
> > Also, what if you just add a dummy uint32_t after notify_me?
> 
> With the dummy the testcase also runs fine for 500 iterations.
> 
> Dann, can you try whether this works on the Hi1620 too?

On the Hi1620, it hung on the first iteration. Here's the complete patch
I'm running with:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..e6fd6f1a1a 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
  * Instead, the aio_poll calls include both the prepare and the
  * dispatch phase, hence a simple counter is enough for them.
  */
-uint32_t notify_me;
+uint64_t notify_me;
 
 /* A lock to protect between QEMUBH and AioHandler adders and deleter,
  * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
 aio_notify_accept(ctx);
 
 for (bh = ctx->first_bh; bh; bh = bh->next) {
-if (bh->scheduled) {
+if (atomic_mb_read(&bh->scheduled)) {
 return true;
 }
 }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 
 void aio_notify(AioContext *ctx)
 {
-/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
- * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
+ * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
+ * atomic_add in aio_poll.
  */
-smp_mb();
-if (ctx->notify_me) {
-event_notifier_set(&ctx->notifier);
+if (atomic_mb_read(&ctx->notify_me)) {
+   event_notifier_set(&ctx->notifier);
 atomic_mb_set(&ctx->notified, true);
 }
 }


[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-11 Thread Jan Glauber
On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote:
> On 11/10/19 08:05, Jan Glauber wrote:
> > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> >>> ...but if I bump notify_me size to uint64_t the issue goes away.
> >>
> >> Ouch. :)  Is this with or without my patch(es)?
> 
> You didn't answer this question.

Oh, sorry... I did but the mail probably didn't make it out.
I have both of your changes applied (as I think they make sense).

> >> Also, what if you just add a dummy uint32_t after notify_me?
> > 
> > With the dummy the testcase also runs fine for 500 iterations.
> 
> You might just be lucky, with list_lock ending up in another cache line.
> What if you add __attribute__((aligned(16))) to notify_me (and keep the
> dummy)?

Good point. I'll try to force both into the same cache line.

--Jan

> Paolo
> 
> > Dann, can you try whether this works on the Hi1620 too?


[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-10 Thread Jan Glauber
On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> On 09/10/19 10:02, Jan Glauber wrote:

> > I'm still not sure what the actual issue is here, but could it be some bad
> > interaction between the notify_me and the list_lock? They are both 4 bytes
> > in size and sit side by side:
> > 
> > address notify_me: 0xdb528aa0  sizeof notify_me: 4
> > address list_lock: 0xdb528aa4  sizeof list_lock: 4
> > 
> > AFAICS the generated code looks OK (all load/store exclusive done
> > with 32 bit size):
> > 
> >  e6c:   885ffc01ldaxr   w1, [x0]
> >  e70:   11000821add w1, w1, #0x2
> >  e74:   8802fc01stlxr   w2, w1, [x0]
> > 
> > ...but if I bump notify_me size to uint64_t the issue goes away.
> 
> Ouch. :)  Is this with or without my patch(es)?
> 
> Also, what if you just add a dummy uint32_t after notify_me?

With the dummy the testcase also runs fine for 500 iterations.

Dann, can you try whether this works on the Hi1620 too?

--Jan


[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-09 Thread Jan Glauber
On Mon, Oct 07, 2019 at 04:58:30PM +0200, Paolo Bonzini wrote:
> On 07/10/19 16:44, dann frazier wrote:
> > On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> >> On 02/10/19 11:23, Jan Glauber wrote:
> >>> I've looked into this on ThunderX2. The arm64 code generated for the
> >>> atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> >>> memory barriers. It is just plain ldaxr/stlxr.
> >>>
> >>> From my understanding this is not sufficient for SMP sync.
> >>>
> >>> If I read this comment correctly:
> >>>
> >>> void aio_notify(AioContext *ctx)
> >>> {
> >>> /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >>>  * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >>>  */
> >>> smp_mb();
> >>> if (ctx->notify_me) {
> >>>
> >>> it points out that the smp_mb() should be paired. But as
> >>> I said, the atomics used don't generate any barriers at all.
> >>
> >> Based on the rest of the thread, this patch should also fix the bug:
> >>
> >> diff --git a/util/async.c b/util/async.c
> >> index 47dcbfa..721ea53 100644
> >> --- a/util/async.c
> >> +++ b/util/async.c
> >> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
> >>  aio_notify_accept(ctx);
> >>  
> >>  for (bh = ctx->first_bh; bh; bh = bh->next) {
> >> -if (bh->scheduled) {
> >> +if (atomic_mb_read(&bh->scheduled)) {
> >>  return true;
> >>  }
> >>  }
> >>
> >>
> >> And also the memory barrier in aio_notify can actually be replaced
> >> with a SEQ_CST load:
> >>
> >> diff --git a/util/async.c b/util/async.c
> >> index 47dcbfa..721ea53 100644
> >> --- a/util/async.c
> >> +++ b/util/async.c
> >> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
> >>  
> >>  void aio_notify(AioContext *ctx)
> >>  {
> >> -/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >> +/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> >> + * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
> >> + * atomic_add in aio_poll.
> >>   */
> >> -smp_mb();
> >> -if (ctx->notify_me) {
> >> +if (atomic_mb_read(&ctx->notify_me)) {
> >>  event_notifier_set(&ctx->notifier);
> >>  atomic_mb_set(&ctx->notified, true);
> >>  }
> >>
> >>
> >> Would you be able to test these (one by one possibly)?
> > 
> > Paolo,
> >   I tried them both separately and together on a Hi1620 system, each
> > time it hung in the first iteration. Here's a backtrace of a run with
> > both patches applied:
> 
> Ok, not a great start...  I'll find myself an aarch64 machine and look
> at it more closely.  I'd like the patch to be something we can
> understand and document, since this is probably the second most-used
> memory barrier idiom in QEMU.
> 
> Paolo

I'm still not sure what the actual issue is here, but could it be some bad
interaction between the notify_me and the list_lock? They are both 4 bytes
in size and sit side by side:

address notify_me: 0xdb528aa0  sizeof notify_me: 4
address list_lock: 0xdb528aa4  sizeof list_lock: 4

AFAICS the generated code looks OK (all load/store exclusive done
with 32 bit size):

 e6c:   885ffc01ldaxr   w1, [x0]
 e70:   11000821add w1, w1, #0x2
 e74:   8802fc01stlxr   w2, w1, [x0]

...but if I bump notify_me size to uint64_t the issue goes away.

BTW, the image file I convert in the testcase is ~20 GB.

--Jan

diff --git a/include/block/aio.h b/include/block/aio.h
index a1d6b9e24939..e8a5ea3860bb 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -83,7 +83,7 @@ struct AioContext {
  * Instead, the aio_poll calls include both the prepare and the
  * dispatch phase, hence a simple counter is enough for them.
  */
-uint32_t notify_me;
+uint64_t notify_me;
 
 /* A lock to protect between QEMUBH and AioHandler adders and deleter,
  * and to ensure that no callbacks are removed while we're walking and
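
A minimal sketch of the layout question above, assuming 64-byte cache lines (the struct and field names below are illustrative, not actual QEMU code): forcing notify_me onto its own cache line, so list_lock can no longer share it, would separate the false-sharing question from the question of what ordering the atomics actually provide.

#include <stdint.h>

/* Illustrative only: notify_me gets its own 64-byte cache line; the
 * padding keeps the next field (standing in for QemuLockCnt list_lock)
 * off that line. */
struct AioContextFieldsSketch {
    uint32_t notify_me __attribute__((aligned(64)));
    char     pad[60];
    uint32_t list_lock;
};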


[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-07 Thread Jan Glauber
On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> > 
> > From my understanding this is not sufficient for SMP sync.
> > 
> > If I read this comment correctly:
> > 
> > void aio_notify(AioContext *ctx)
> > {
> > /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >  * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >  */
> > smp_mb();
> > if (ctx->notify_me) {
> > 
> > it points out that the smp_mb() should be paired. But as
> > I said, the atomics used don't generate any barriers at all.
> 
> Based on the rest of the thread, this patch should also fix the bug:
> 
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>  aio_notify_accept(ctx);
>  
>  for (bh = ctx->first_bh; bh; bh = bh->next) {
> -if (bh->scheduled) {
> +if (atomic_mb_read(&bh->scheduled)) {
>  return true;
>  }
>  }
> 
> 
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
> 
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>  
>  void aio_notify(AioContext *ctx)
>  {
> -/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> +/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> + * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
> + * atomic_add in aio_poll.
>   */
> -smp_mb();
> -if (ctx->notify_me) {
> +if (atomic_mb_read(&ctx->notify_me)) {
>  event_notifier_set(&ctx->notifier);
>  atomic_mb_set(&ctx->notified, true);
>  }
> 
> 
> Would you be able to test these (one by one possibly)?

Sure.

> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue within ~20
> > iterations).
> 
> Sorry for asking the obvious---500 iterations of what?

The testcase mentioned in the Canonical issue:
https://bugs.launchpad.net/qemu/+bug/1805256

It's a simple image convert:
qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

Usually it got stuck after 3-20 iterations.

--Jan


[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-07 Thread dann frazier
On Mon, Oct 07, 2019 at 01:06:20PM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've looked into this on ThunderX2. The arm64 code generated for the
> > atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
> > memory barriers. It is just plain ldaxr/stlxr.
> > 
> > From my understanding this is not sufficient for SMP sync.
> > 
> > If I read this comment correctly:
> > 
> > void aio_notify(AioContext *ctx)
> > {
> > /* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> >  * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> >  */
> > smp_mb();
> > if (ctx->notify_me) {
> > 
> > it points out that the smp_mb() should be paired. But as
> > I said, the atomics used don't generate any barriers at all.
> 
> Based on the rest of the thread, this patch should also fix the bug:
> 
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -249,7 +249,7 @@ aio_ctx_check(GSource *source)
>  aio_notify_accept(ctx);
>  
>  for (bh = ctx->first_bh; bh; bh = bh->next) {
> -if (bh->scheduled) {
> +if (atomic_mb_read(&bh->scheduled)) {
>  return true;
>  }
>  }
> 
> 
> And also the memory barrier in aio_notify can actually be replaced
> with a SEQ_CST load:
> 
> diff --git a/util/async.c b/util/async.c
> index 47dcbfa..721ea53 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -349,11 +349,11 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
>  
>  void aio_notify(AioContext *ctx)
>  {
> -/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
> - * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
> +/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
> + * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
> + * atomic_add in aio_poll.
>   */
> -smp_mb();
> -if (ctx->notify_me) {
> +if (atomic_mb_read(&ctx->notify_me)) {
>  event_notifier_set(&ctx->notifier);
>  atomic_mb_set(&ctx->notified, true);
>  }
> 
> 
> Would you be able to test these (one by one possibly)?

Paolo,
  I tried them both separately and together on a Hi1620 system, each
time it hung in the first iteration. Here's a backtrace of a run with
both patches applied:

(gdb) thread apply all bt

Thread 3 (Thread 0x8154b820 (LWP 63900)):
#0  0x8b9402cc in __GI___sigtimedwait (set=, set@entry=0xf1e08070, info=info@entry=0x8154ad98, timeout=timeout@entry=0x0) at ../sysdeps/unix/sysv/linux/sigtimedwait.c:42
#1  0x8ba77fac in __sigwait (set=set@entry=0xf1e08070, sig=sig@entry=0x8154ae74) at ../sysdeps/unix/sysv/linux/sigwait.c:28
#2  0xb7dc1610 in sigwait_compat (opaque=0xf1e08070) at util/compatfd.c:35
#3  0xb7dc3e80 in qemu_thread_start (args=) at util/qemu-thread-posix.c:519
#4  0x8ba6d088 in start_thread (arg=0xceefbf4f) at pthread_create.c:463
#5  0x8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 2 (Thread 0x81d4c820 (LWP 63899)):
#0  syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1  0xb7dc4cd8 in qemu_futex_wait (val=, f=) at /home/ubuntu/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0xb7e48708 ) at util/qemu-thread-posix.c:459
#3  0xb7ddf44c in call_rcu_thread (opaque=) at util/rcu.c:260
#4  0xb7dc3e80 in qemu_thread_start (args=) at util/qemu-thread-posix.c:519
#5  0x8ba6d088 in start_thread (arg=0xceefc05f) at pthread_create.c:463
#6  0x8b9dd4ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78

Thread 1 (Thread 0x81e83010 (LWP 63898)):
#0  0x8b9d4154 in __GI_ppoll (fds=0xf1e0dbc0, nfds=187650205809964, timeout=, timeout@entry=0x0, sigmask=0xceefbef0) at ../sysdeps/unix/sysv/linux/ppoll.c:39
#1  0xb7dbedb0 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/aarch64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at util/qemu-timer.c:340
#3  0xb7dbfd2c in os_host_main_loop_wait (timeout=-1) at util/main-loop.c:236
#4  main_loop_wait (nonblocking=) at util/main-loop.c:517
#5  0xb7ce86e8 in convert_do_copy (s=0xceefc068) at qemu-img.c:2028
#6  img_convert (argc=, argv=) at qemu-img.c:2520
#7  0xb7ce1e54 in main (argc=8, argv=) at qemu-img.c:5097

> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue within ~20
> > iterations).
> 
> Sorry for asking the obvious---500 iterations of what?

$ for i in $(seq 1 500); do echo "==$i=="; ./qemu/qemu-img convert -p -f qcow2 -O qcow2 bionic-server-cloudimg-arm64.img out.img; done
==1==
(37.19/100%)

  -dann


[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-02 Thread Jan Glauber
On Wed, Oct 02, 2019 at 11:45:19AM +0200, Paolo Bonzini wrote:
> On 02/10/19 11:23, Jan Glauber wrote:
> > I've tried to verify my theory with this patch and didn't run into the
> > issue for ~500 iterations (usually I would trigger the issue within ~20
> > iterations).
> 
> Awesome!  That would be a compiler bug though, as atomic_add and atomic_sub
> are defined as sequentially consistent:
> 
> #define atomic_add(ptr, n) ((void) __atomic_fetch_add(ptr, n, __ATOMIC_SEQ_CST))
> #define atomic_sub(ptr, n) ((void) __atomic_fetch_sub(ptr, n, __ATOMIC_SEQ_CST))

Compiler bug sounds kind of unlikely...

> What compiler are you using and what distro?  Can you compile util/aio-posix.c
> with "-fdump-rtl-all -fdump-tree-all", zip the boatload of debugging files and
> send them my way?

This is on Ubuntu 18.04.3,
gcc version 7.4.0 (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1)

I've uploaded the debug files to:
https://bugs.launchpad.net/qemu/+bug/1805256/+attachment/5293619/+files/aio-posix.tar.xz

Thanks,
Jan

> Thanks,
> 
> Paolo
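
As a side note, a minimal standalone test (a hypothetical file, not one of the thread's attachments) can show what the compiler actually emits for one of these sequentially consistent read-modify-writes: compile it with the same toolchain and disassemble the result (e.g. with objdump -d) to see whether the SEQ_CST fetch_add becomes a bare ldaxr/stlxr loop, an LSE ldaddal, or includes an explicit dmb.

#include <stdint.h>

/* Sketch: mirrors the atomic_add(&ctx->notify_me, 2) call in aio_poll. */
void bump_notify_me(uint32_t *notify_me)
{
    __atomic_fetch_add(notify_me, 2, __ATOMIC_SEQ_CST);
}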


[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-02 Thread Jan Glauber
I've looked into this on ThunderX2. The arm64 code generated for the
atomic_[add|sub] accesses of ctx->notify_me doesn't contain any
memory barriers. It is just plain ldaxr/stlxr.

From my understanding this is not sufficient for SMP sync.

If I read this comment correctly:

void aio_notify(AioContext *ctx)
{
/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
 * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
 */
smp_mb();
if (ctx->notify_me) {

it points out that the smp_mb() should be paired. But as
I said, the atomics used don't generate any barriers at all.

I've tried to verify my theory with this patch and didn't run into the
issue for ~500 iterations (usually I would trigger the issue within ~20 iterations).

--Jan

diff --git a/util/aio-posix.c b/util/aio-posix.c
index d8f0cb4af8dd..d07dcd4e9993 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
  */
 if (blocking) {
 atomic_add(&ctx->notify_me, 2);
+smp_mb();
 }
 
 qemu_lockcnt_inc(&ctx->list_lock);
@@ -632,6 +633,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
 
 if (blocking) {
 atomic_sub(&ctx->notify_me, 2);
+smp_mb();
 }
 
 /* Adjust polling time */
diff --git a/util/async.c b/util/async.c
index 4dd9d95a9e73..92ac209c4615 100644
--- a/util/async.c
+++ b/util/async.c
@@ -222,6 +222,7 @@ aio_ctx_prepare(GSource *source, gint *timeout)
 AioContext *ctx = (AioContext *) source;
 
 atomic_or(&ctx->notify_me, 1);
+smp_mb();
 
 /* We assume there is no timeout already supplied */
 *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
@@ -240,6 +241,7 @@ aio_ctx_check(GSource *source)
 QEMUBH *bh;
 
 atomic_and(&ctx->notify_me, ~1);
+smp_mb();
 aio_notify_accept(ctx);
 
 for (bh = ctx->first_bh; bh; bh = bh->next) {
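
For reference, the pairing idiom under discussion can be sketched standalone with C11 atomics instead of QEMU's macros (all names below are illustrative): one side publishes work and then reads notify_me, the other side bumps notify_me and then re-checks for work before blocking. If either side loses its full ordering, the two threads can each miss the other's store and the poller blocks forever in the futex wait.

#include <stdatomic.h>
#include <stdbool.h>

static _Atomic unsigned notify_me;  /* "a poller may block" counter      */
static _Atomic bool scheduled;      /* "there is work to dispatch" flag  */
static _Atomic bool notified;       /* stands in for the event notifier  */

/* Notifier side (cf. aio_notify): publish the work, then decide whether
 * a wakeup is needed.  Plain C11 atomics default to seq_cst ordering. */
void notify(void)
{
    atomic_store(&scheduled, true);
    if (atomic_load(&notify_me)) {
        atomic_store(&notified, true);   /* cf. event_notifier_set() */
    }
}

/* Poller side (cf. aio_poll / aio_ctx_prepare): advertise that we may
 * block, then re-check for work.  The seq_cst fetch_add must not be
 * reordered with the following load of scheduled. */
bool poll_once(void)
{
    atomic_fetch_add(&notify_me, 2);
    bool work = atomic_load(&scheduled);
    atomic_fetch_sub(&notify_me, 2);
    return work || atomic_exchange(&notified, false);
}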


[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-09-24 Thread dann frazier
On Wed, Sep 11, 2019 at 04:09:25PM -0300, Rafael David Tinoco wrote:
> > Zhengui's theory that notify_me doesn't work properly on ARM is more
> > promising, but he couldn't provide a clear explanation of why he thought
> > notify_me is involved.  In particular, I would have expected notify_me to
> > be wrong if the qemu_poll_ns call came from aio_ctx_dispatch, for example:
> > 
> > 
> > glib_pollfds_fill
> >   g_main_context_prepare
> > aio_ctx_prepare
> >   atomic_or(&ctx->notify_me, 1)
> > qemu_poll_ns
> > glib_pollfds_poll
> >   g_main_context_check
> > aio_ctx_check
> >   atomic_and(&ctx->notify_me, ~1)
> >   g_main_context_dispatch
> > aio_ctx_dispatch
> >   /* do something for event */
> > qemu_poll_ns 
> > 
> 
> Paolo,
> 
> I tried confining execution to a single NUMA domain (cpu & mem) and
> still faced the issue; then I added a mutex "ctx->notify_me_lcktest"
> into the context to protect "ctx->notify_me", as shown below, and it
> seems to have either fixed or mitigated it.
> 
> I was able to cause the hang once every 3 or 4 runs. I have already run
> qemu-img convert more than 30 times now and couldn't reproduce it again.
> 
> The next step is to play with the barriers and check why the existing ones
> aren't enough for ordering access to ctx->notify_me ... or should I
> try/do something else, in your opinion?
> 
> This arch/machine (Huawei D06):
> 
> $ lscpu
> Architecture:aarch64
> Byte Order:  Little Endian
> CPU(s):  96
> On-line CPU(s) list: 0-95
> Thread(s) per core:  1
> Core(s) per socket:  48
> Socket(s):   2
> NUMA node(s):4
> Vendor ID:   0x48
> Model:   0
> Stepping:0x0
> CPU max MHz: 2000.
> CPU min MHz: 200.
> BogoMIPS:200.00
> L1d cache:   64K
> L1i cache:   64K
> L2 cache:512K
> L3 cache:32768K
> NUMA node0 CPU(s):   0-23
> NUMA node1 CPU(s):   24-47
> NUMA node2 CPU(s):   48-71
> NUMA node3 CPU(s):   72-95
> Flags:   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics
> cpuid asimdrdm dcpop

Note that I'm also seeing this on a ThunderX2 (same calltrace):

$ lscpu
Architecture:aarch64
Byte Order:  Little Endian
CPU(s):  224
On-line CPU(s) list: 0-223
Thread(s) per core:  4
Core(s) per socket:  28
Socket(s):   2
NUMA node(s):2
Vendor ID:   Cavium
Model:   1
Model name:  ThunderX2 99xx
Stepping:0x1
BogoMIPS:400.00
L1d cache:   32K
L1i cache:   32K
L2 cache:256K
L3 cache:32768K
NUMA node0 CPU(s):   0-111
NUMA node1 CPU(s):   112-223
Flags:   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics cpuid 
asimdrdm

  -dann

> 
> 
> diff --git a/include/block/aio.h b/include/block/aio.h
> index 0ca25dfec6..0724086d91 100644
> --- a/include/block/aio.h
> +++ b/include/block/aio.h
> @@ -84,6 +84,7 @@ struct AioContext {
>   * dispatch phase, hence a simple counter is enough for them.
>   */
>  uint32_t notify_me;
> +QemuMutex notify_me_lcktest;
> 
>  /* A lock to protect between QEMUBH and AioHandler adders and deleter,
>   * and to ensure that no callbacks are removed while we're walking and
> diff --git a/util/aio-posix.c b/util/aio-posix.c
> index 51c41ed3c9..031d6e2997 100644
> --- a/util/aio-posix.c
> +++ b/util/aio-posix.c
> @@ -529,7 +529,9 @@ static bool run_poll_handlers(AioContext *ctx,
> int64_t max_ns, int64_t *timeout)
>  bool progress;
>  int64_t start_time, elapsed_time;
> 
> +qemu_mutex_lock(&ctx->notify_me_lcktest);
>  assert(ctx->notify_me);
> +qemu_mutex_unlock(&ctx->notify_me_lcktest);
>  assert(qemu_lockcnt_count(&ctx->list_lock) > 0);
> 
>  trace_run_poll_handlers_begin(ctx, max_ns, *timeout);
> @@ -601,8 +603,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
>   * so disable the optimization now.
>   */
>  if (blocking) {
> +qemu_mutex_lock(&ctx->notify_me_lcktest);
>  assert(in_aio_context_home_thread(ctx));
>  atomic_add(&ctx->notify_me, 2);
> +qemu_mutex_unlock(&ctx->notify_me_lcktest);
>  }
> 
>  qemu_lockcnt_inc(&ctx->list_lock);
> @@ -647,8 +651,10 @@ bool aio_poll(AioContext *ctx, bool blocking)
>  }
> 
>  if (blocking) {
> +qemu_mutex_lock(&ctx->notify_me_lcktest);
>  atomic_sub(&ctx->notify_me, 2);
>  aio_notify_accept(ctx);
> +qemu_mutex_unlock(&ctx->notify_me_lcktest);
>  }
> 
>  /* Adjust polling time */
> diff --git a/util/async.c b/util/async.c
> index c10642a385..140e1e86f5 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -221,7 +221,9 @@ aio_ctx_prepare(GSource *source, gint *timeout)
>  {
>  AioContext *ctx = (AioContext *) source;
> 
> +qemu_mutex_lock(&ctx->notify_me_lcktest);
>  atomic_or(&ctx->no