Re: [Qemu-devel] Remaining CI failures

2019-01-14 Thread Alex Bennée


Emilio G. Cota  writes:

> On Fri, Jan 11, 2019 at 19:10:07 +, Alex Bennée wrote:
>> So trying to narrow down the remaining failures in the CI system. There
>> is one with a patch in flight (use g_usleep instead of sleep) but there
>> remains two failure modes, both erratic.
>>
>> tests/qht-par:
>>
>> I can trigger this on my dev machine with a gprof enabled build:
>>
>>   # QEMU configure log Fri Jan 11 14:10:45 GMT 2019
>>   # Configured with: './configure' '--disable-tools' '--disable-docs' 
>> '--enable-gprof' '--enable-gcov'
>>
>> I only seem to be able to trigger it when running via the wrapper in the
>> make system:
>>
>>   retry.py -n 30 --invert make check-tests/test-qht-par
>>
>> Eventually this crashes with:
>>
>>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 
>> == 0)
>>
>> Leaving a core dump for the child:
>
> I can't replicate this on my machine :-(
>
> I suspect this is probably related to the fact that gprof
> is not even meant to support multi-threaded programs (like qht-bench).
>
> Given that we've fixed the sleep issue (thanks!) and that there is no use
> in running test-qht-par under gprof, I propose to permanently skip
> it under gprof, e.g.:
>
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -88,7 +88,8 @@ check-unit-y += tests/test-rcu-simpleq$(EXESUF)
>  check-unit-y += tests/test-rcu-tailq$(EXESUF)
>  check-unit-y += tests/test-qdist$(EXESUF)
>  check-unit-y += tests/test-qht$(EXESUF)
> -# FIXME: {test-qht-par + gprof} often break on Travis CI
> +# test-qht-par invokes qht-bench, which is multi-threaded.
> +# gprof doesn't support multi-threaded programs, so skip this test under 
> gprof.
>  check-unit-$(call lnot,$(CONFIG_GPROF)) += tests/test-qht-par$(EXESUF)
>  check-unit-y += tests/test-bitops$(EXESUF)
>  check-unit-y += tests/test-bitcnt$(EXESUF)
>
> If you agree, I can submit a proper patch with the above.

Heh, that fix is already in although I reverted the work-around in my PR
as it seemed a little too hacky given it seems to be fixed now.

--
Alex Bennée



Re: [Qemu-devel] Remaining CI failures

2019-01-14 Thread Daniel P . Berrangé
On Mon, Jan 14, 2019 at 10:02:06AM -0500, Emilio G. Cota wrote:
> On Fri, Jan 11, 2019 at 19:10:07 +, Alex Bennée wrote:
> > So trying to narrow down the remaining failures in the CI system. There
> > is one with a patch in flight (use g_usleep instead of sleep) but there
> > remains two failure modes, both erratic.
> > 
> > tests/qht-par:
> > 
> > I can trigger this on my dev machine with a gprof enabled build:
> > 
> >   # QEMU configure log Fri Jan 11 14:10:45 GMT 2019
> >   # Configured with: './configure' '--disable-tools' '--disable-docs' 
> > '--enable-gprof' '--enable-gcov'
> > 
> > I only seem to be able to trigger it when running via the wrapper in the
> > make system:
> > 
> >   retry.py -n 30 --invert make check-tests/test-qht-par
> > 
> > Eventually this crashes with:
> > 
> >   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): 
> > (35584 == 0)
> > 
> > Leaving a core dump for the child:
> 
> I can't replicate this on my machine :-(
> 
> I suspect this is probably related to the fact that gprof
> is not even meant to support multi-threaded programs (like qht-bench).
> 
> Given that we've fixed the sleep issue (thanks!) and that there is no use
> in running test-qht-par under gprof, I propose to permanently skip
> it under gprof, e.g.:
> 
> --- a/tests/Makefile.include
> +++ b/tests/Makefile.include
> @@ -88,7 +88,8 @@ check-unit-y += tests/test-rcu-simpleq$(EXESUF)
>  check-unit-y += tests/test-rcu-tailq$(EXESUF)
>  check-unit-y += tests/test-qdist$(EXESUF)
>  check-unit-y += tests/test-qht$(EXESUF)
> -# FIXME: {test-qht-par + gprof} often break on Travis CI
> +# test-qht-par invokes qht-bench, which is multi-threaded.
> +# gprof doesn't support multi-threaded programs, so skip this test under 
> gprof.

If gprof doesn't support threads, then we potentially need to skip more
of our tests. eg test-io-channel-socket uses APIs that spawn threads
behind the scenes, not to mention the test-thread-pool test. Some
chardev tests probably hit APIs that spawn background threads too.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [Qemu-devel] Remaining CI failures

2019-01-14 Thread Emilio G. Cota
On Fri, Jan 11, 2019 at 19:10:07 +, Alex Bennée wrote:
> So trying to narrow down the remaining failures in the CI system. There
> is one with a patch in flight (use g_usleep instead of sleep) but there
> remains two failure modes, both erratic.
> 
> tests/qht-par:
> 
> I can trigger this on my dev machine with a gprof enabled build:
> 
>   # QEMU configure log Fri Jan 11 14:10:45 GMT 2019
>   # Configured with: './configure' '--disable-tools' '--disable-docs' 
> '--enable-gprof' '--enable-gcov'
> 
> I only seem to be able to trigger it when running via the wrapper in the
> make system:
> 
>   retry.py -n 30 --invert make check-tests/test-qht-par
> 
> Eventually this crashes with:
> 
>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 
> == 0)
> 
> Leaving a core dump for the child:

I can't replicate this on my machine :-(

I suspect this is probably related to the fact that gprof
is not even meant to support multi-threaded programs (like qht-bench).

Given that we've fixed the sleep issue (thanks!) and that there is no use
in running test-qht-par under gprof, I propose to permanently skip
it under gprof, e.g.:

--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -88,7 +88,8 @@ check-unit-y += tests/test-rcu-simpleq$(EXESUF)
 check-unit-y += tests/test-rcu-tailq$(EXESUF)
 check-unit-y += tests/test-qdist$(EXESUF)
 check-unit-y += tests/test-qht$(EXESUF)
-# FIXME: {test-qht-par + gprof} often break on Travis CI
+# test-qht-par invokes qht-bench, which is multi-threaded.
+# gprof doesn't support multi-threaded programs, so skip this test under gprof.
 check-unit-$(call lnot,$(CONFIG_GPROF)) += tests/test-qht-par$(EXESUF)
 check-unit-y += tests/test-bitops$(EXESUF)
 check-unit-y += tests/test-bitcnt$(EXESUF)

If you agree, I can submit a proper patch with the above.

Thanks,

Emilio



Re: [Qemu-devel] Remaining CI failures

2019-01-14 Thread Peter Maydell
On Fri, 11 Jan 2019 at 19:10, Alex Bennée  wrote:
> I did run the make under strace to see who is using O_NONBLOCK but even
> after filtering out a bunch of stuff it seems to be quite embedded:
>
>  ag "O_NONBLOCK" check.strace  | ag -v "/sys" | ag -v "/dev" | grep -v ".git" 
> | wc -l
>  2449

The strace rune I used to track down the other O_NONBLOCK issue was
strace -o /tmp/strace.log -e fcntl,execve,clone -e signal=none -f make ...

and then looking through the strace.log for uses of O_NONBLOCK
on fds 0, 1 or 2, and backtracking those to which binary had been
exec'd to give that PID. Then I did a full strace of just
that test, and looked to check whether the 0/1/2 was really
the top-level make's stdin/stdout/stderr or whether something else
had dup'd or opened an fd there.

thanks
-- PMM



[Qemu-devel] Remaining CI failures

2019-01-11 Thread Alex Bennée


Hi,

So trying to narrow down the remaining failures in the CI system. There
is one with a patch in flight (use g_usleep instead of sleep) but there
remains two failure modes, both erratic.

tests/qht-par:

I can trigger this on my dev machine with a gprof enabled build:

  # QEMU configure log Fri Jan 11 14:10:45 GMT 2019
  # Configured with: './configure' '--disable-tools' '--disable-docs' 
'--enable-gprof' '--enable-gcov'

I only seem to be able to trigger it when running via the wrapper in the
make system:

  retry.py -n 30 --invert make check-tests/test-qht-par

Eventually this crashes with:

  ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 
0)

Leaving a core dump for the child:

  Core was generated by `tests/qht-bench -R -S0.1 -D1 -N1 -n 2 -u 20 -d 1'
  (gdb) info thread
Id   Target Id Frame
  * 1Thread 0x76a7e700 (LWP 15473) 0x5557c306 in 
call_rcu_thread (opaque=0x0) at util/rcu.c:255
2Thread 0x77fbe780 (LWP 15472) 0x555b8d50 in 
gcov_read_words ()
  (gdb) bt
  #0  __mcount_internal (frompc=, selfpc=93824992383630) at 
mcount.c:98
  #1  0x76e15e24 in mcount () at ../sysdeps/x86_64/_mcount.S:51
  #2  0x5557928e in qemu_event_reset (ev=0x3cc692b8d452f400) at 
util/qemu-thread-posix.c:408
  #3  0x5557c306 in call_rcu_thread (opaque=0x0) at util/rcu.c:255
  #4  0x55579630 in qemu_thread_start (args=0x55808080) at 
util/qemu-thread-posix.c:502
  #5  0x770e96db in start_thread (arg=0x76a7e700) at 
pthread_create.c:463
  #6  0x76e1288f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  (gdb) thread 2
  [Switching to thread 2 (Thread 0x77fbe780 (LWP 15472))]
  #0  0x555b8d50 in gcov_read_words ()
  (gdb) bt
  #0  0x555b8d50 in gcov_read_words ()
  #1  0x555b9453 in __gcov_read_summary ()
  #2  0x555ba461 in gcov_do_dump ()
  #3  0x555bab62 in __gcov_exit ()
  #4  0x555b8c22 in _GLOBAL__sub_D_00100_1_json_lexer_init () at 
qobject/json-lexer.c:365
  #5  0x77de5b73 in _dl_fini () at dl-fini.c:138
  #6  0x76d34041 in __run_exit_handlers (status=0, listp=0x770dc718 
<__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, 
run_dtors=run_dtors@entry=true) at
  exit.c:108
  #7  0x76d3413a in __GI_exit (status=) at exit.c:139
  #8  0x76d12b9e in __libc_start_main (main=0x55575d50 , 
argc=11, argv=0x7fffdf78, init=, fini=, 
rtld_fini=, stack_end=0x7fffdf68) at ../csu/libc-start.c:344
  #9  0x55573c1a in _start ()

To trigger the second failure I had to run on a limited Xenial machine
(16.04, 2 cores, 8Gb RAM) again with gprof build:

  # QEMU configure log Thu 10 Jan 22:22:52 GMT 2019
  # Configured with: './configure' '--enable-gprof' '--enable-gcov' 
'--disable-pie' 
'--target-list=aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu'

Running it like Travis does:

  retry.py -n 40 --invert -- make -j 3 check V=1

It eventually fails with:

  PASS: tests/test-hmp
  make: write error: stdout

It's hard to tell from the output what was running that failed. So far
I've managed to get the following information out of execsnoop:

  qemu-system-x86  1345   1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest 
unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev 
socket,path=/tmp/qtest-1332.qmp,no
  wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display 
none -S -M pc-i440fx-4.0
  sh   1350   1332 0 /bin/sh -c exec 
x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait 
-qtest-log /dev/null -chardev socket,path=/tmp/q
  t
  qemu-system-x86  1350   1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest 
unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev 
socket,path=/tmp/qtest-1332.qmp,no
  wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display 
none -S -M pc-q35-3.1
  sh   1356   1332 0 /bin/sh -c exec 
x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait 
-qtest-log /dev/null -chardev socket,path=/tmp/q
  t
  qemu-system-x86  1356   1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest 
unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev 
socket,path=/tmp/qtest-1332.qmp,no
  wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display 
none -S -M pc-i440fx-3.1
  sh   1361   1332 0 /bin/sh -c exec 
x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait 
-qtest-log /dev/null -chardev socket,path=/tmp/q
  t
  qemu-system-x86  1361   1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest 
unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev 
socket,path=/tmp/qtest-1332.qmp,no
  wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display 
none -S -M pc-q35-4.0
  sh