Re: [Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-11 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20191011175536.GB25464@xps13.dannf/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 
possible issues
Type: series
Message-id: 20191011175536.GB25464@xps13.dannf

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Switched to a new branch 'test'
d3707c3 qemu_futex_wait() lockups in ARM64: 2 possible issues

=== OUTPUT BEGIN ===
WARNING: Block comments use a leading /* on a separate line
#66: FILE: util/async.c:345:
+/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before

ERROR: code indent should never use tabs
#74: FILE: util/async.c:350:
+^Ievent_notifier_set(>notifier);$

ERROR: Missing Signed-off-by: line(s)

total: 2 errors, 1 warnings, 33 lines checked

Commit d3707c31b5da (qemu_futex_wait() lockups in ARM64: 2 possible issues) has 
style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20191011175536.GB25464@xps13.dannf/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-11 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20191011175536.GB25464@xps13.dannf/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

In file included from /tmp/qemu-test/src/include/qemu/osdep.h:51,
 from /tmp/qemu-test/src/util/async.c:26:
/tmp/qemu-test/src/util/async.c: In function 'aio_notify':
/tmp/qemu-test/src/include/qemu/compiler.h:86:36: error: static assertion 
failed: "not expecting: sizeof(*>notify_me) > ATOMIC_REG_SIZE"
 #define QEMU_BUILD_BUG_MSG(x, msg) _Static_assert(!(x), msg)
^~
/tmp/qemu-test/src/include/qemu/compiler.h:94:30: note: in expansion of macro 
'QEMU_BUILD_BUG_MSG'
---
/tmp/qemu-test/src/util/async.c:349:9: note: in expansion of macro 
'atomic_mb_read'
 if (atomic_mb_read(>notify_me)) {
 ^~
make: *** [/tmp/qemu-test/src/rules.mak:69: util/async.o] Error 1
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=b90449ea684e4a14ad6b34f4f5dcb0c9', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-0y2eb54d/src/docker-src.2019-10-11-20.12.57.30254:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=b90449ea684e4a14ad6b34f4f5dcb0c9
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-0y2eb54d/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real11m27.735s
user0m7.532s


The full log is available at
http://patchew.org/logs/20191011175536.GB25464@xps13.dannf/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[Bug 1805256] Re: [Qemu-devel] qemu_futex_wait() lockups in ARM64: 2 possible issues

2019-10-11 Thread dann frazier
On Fri, Oct 11, 2019 at 08:30:02AM +, Jan Glauber wrote:
> On Fri, Oct 11, 2019 at 10:18:18AM +0200, Paolo Bonzini wrote:
> > On 11/10/19 08:05, Jan Glauber wrote:
> > > On Wed, Oct 09, 2019 at 11:15:04AM +0200, Paolo Bonzini wrote:
> > >>> ...but if I bump notify_me size to uint64_t the issue goes away.
> > >>
> > >> Ouch. :)  Is this with or without my patch(es)?
> > 
> > You didn't answer this question.
> 
> Oh, sorry... I did but the mail probably didn't make it out.
> I have both of your changes applied (as I think they make sense).
> 
> > >> Also, what if you just add a dummy uint32_t after notify_me?
> > > 
> > > With the dummy the testcase also runs fine for 500 iterations.
> > 
> > You might be lucky and causing list_lock to be in another cache line.
> > What if you add __attribute__((aligned(16)) to notify_me (and keep the
> > dummy)?
> 
> Good point. I'll try to force both into the same cacheline.

On the Hi1620, this still hangs in the first iteration:

diff --git a/include/block/aio.h b/include/block/aio.h
index 6b0d52f732..00e56a5412 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -82,7 +82,7 @@ struct AioContext {
  * Instead, the aio_poll calls include both the prepare and the
  * dispatch phase, hence a simple counter is enough for them.
  */
-uint32_t notify_me;
+__attribute__((aligned(16))) uint64_t notify_me;
 
 /* A lock to protect between QEMUBH and AioHandler adders and deleter,
  * and to ensure that no callbacks are removed while we're walking and
diff --git a/util/async.c b/util/async.c
index ca83e32c7f..024c4c567d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -242,7 +242,7 @@ aio_ctx_check(GSource *source)
 aio_notify_accept(ctx);
 
 for (bh = ctx->first_bh; bh; bh = bh->next) {
-if (bh->scheduled) {
+if (atomic_mb_read(>scheduled)) {
 return true;
 }
 }
@@ -342,12 +342,12 @@ LinuxAioState *aio_get_linux_aio(AioContext *ctx)
 
 void aio_notify(AioContext *ctx)
 {
-/* Write e.g. bh->scheduled before reading ctx->notify_me.  Pairs
- * with atomic_or in aio_ctx_prepare or atomic_add in aio_poll.
+/* Using atomic_mb_read ensures that e.g. bh->scheduled is written before
+ * ctx->notify_me is read.  Pairs with atomic_or in aio_ctx_prepare or
+ * atomic_add in aio_poll.
  */
-smp_mb();
-if (ctx->notify_me) {
-event_notifier_set(>notifier);
+if (atomic_mb_read(>notify_me)) {
+   event_notifier_set(>notifier);
 atomic_mb_set(>notified, true);
 }
 }

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1805256

Title:
  qemu-img hangs on rcu_call_ready_event logic in Aarch64 when
  converting images

Status in kunpeng920:
  New
Status in QEMU:
  In Progress
Status in qemu package in Ubuntu:
  In Progress
Status in qemu source package in Bionic:
  New
Status in qemu source package in Disco:
  New
Status in qemu source package in Eoan:
  In Progress
Status in qemu source package in FF-Series:
  New

Bug description:
  Command:

  qemu-img convert -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Hangs indefinitely approximately 30% of the runs.

  

  Workaround:

  qemu-img convert -m 1 -f qcow2 -O qcow2 ./disk01.qcow2 ./output.qcow2

  Run "qemu-img convert" with "a single coroutine" to avoid this issue.

  

  (gdb) thread 1
  ...
  (gdb) bt
  #0 0xbf1ad81c in __GI_ppoll
  #1 0xaabcf73c in ppoll
  #2 qemu_poll_ns
  #3 0xaabd0764 in os_host_main_loop_wait
  #4 main_loop_wait
  ...

  (gdb) thread 2
  ...
  (gdb) bt
  #0 syscall ()
  #1 0xaabd41cc in qemu_futex_wait
  #2 qemu_event_wait (ev=ev@entry=0xaac86ce8 )
  #3 0xaabed05c in call_rcu_thread
  #4 0xaabd34c8 in qemu_thread_start
  #5 0xbf25c880 in start_thread
  #6 0xbf1b6b9c in thread_start ()

  (gdb) thread 3
  ...
  (gdb) bt
  #0 0xbf11aa20 in __GI___sigtimedwait
  #1 0xbf2671b4 in __sigwait
  #2 0xaabd1ddc in sigwait_compat
  #3 0xaabd34c8 in qemu_thread_start
  #4 0xbf25c880 in start_thread
  #5 0xbf1b6b9c in thread_start

  

  (gdb) run
  Starting program: /usr/bin/qemu-img convert -f qcow2 -O qcow2
  ./disk01.ext4.qcow2 ./output.qcow2

  [New Thread 0xbec5ad90 (LWP 72839)]
  [New Thread 0xbe459d90 (LWP 72840)]
  [New Thread 0xbdb57d90 (LWP 72841)]
  [New Thread 0xacac9d90 (LWP 72859)]
  [New Thread 0xa7ffed90 (LWP 72860)]
  [New Thread 0xa77fdd90 (LWP 72861)]
  [New Thread 0xa6ffcd90 (LWP 72862)]
  [New Thread 0xa67fbd90 (LWP 72863)]
  [New Thread 0xa5ffad90 (LWP 72864)]

  [Thread 0xa5ffad90 (LWP 72864) exited]
  [Thread 0xa6ffcd90 (LWP 72862) exited]
  [Thread 0xa77fdd90 (LWP 72861) exited]
  [Thread 0xbdb57d90 (LWP 72841) exited]
  [Thread 0xa67fbd90 (LWP 72863) exited]
  [Thread 0xacac9d90 (LWP