Re: [Qemu-devel] Remaining CI failures
Emilio G. Cota writes: > On Fri, Jan 11, 2019 at 19:10:07 +, Alex Bennée wrote: >> So trying to narrow down the remaining failures in the CI system. There >> is one with a patch in flight (use g_usleep instead of sleep) but there >> remains two failure modes, both erratic. >> >> tests/qht-par: >> >> I can trigger this on my dev machine with a gprof enabled build: >> >> # QEMU configure log Fri Jan 11 14:10:45 GMT 2019 >> # Configured with: './configure' '--disable-tools' '--disable-docs' >> '--enable-gprof' '--enable-gcov' >> >> I only seem to be able to trigger it when running via the wrapper in the >> make system: >> >> retry.py -n 30 --invert make check-tests/test-qht-par >> >> Eventually this crashes with: >> >> ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 >> == 0) >> >> Leaving a core dump for the child: > > I can't replicate this on my machine :-( > > I suspect this is probably related to the fact that gprof > is not even meant to support multi-threaded programs (like qht-bench). > > Given that we've fixed the sleep issue (thanks!) and that there is no use > in running test-qht-par under gprof, I propose to permanently skip > it under gprof, e.g.: > > --- a/tests/Makefile.include > +++ b/tests/Makefile.include > @@ -88,7 +88,8 @@ check-unit-y += tests/test-rcu-simpleq$(EXESUF) > check-unit-y += tests/test-rcu-tailq$(EXESUF) > check-unit-y += tests/test-qdist$(EXESUF) > check-unit-y += tests/test-qht$(EXESUF) > -# FIXME: {test-qht-par + gprof} often break on Travis CI > +# test-qht-par invokes qht-bench, which is multi-threaded. > +# gprof doesn't support multi-threaded programs, so skip this test under > gprof. > check-unit-$(call lnot,$(CONFIG_GPROF)) += tests/test-qht-par$(EXESUF) > check-unit-y += tests/test-bitops$(EXESUF) > check-unit-y += tests/test-bitcnt$(EXESUF) > > If you agree, I can submit a proper patch with the above. Heh, that fix is already in although I reverted the work-around in my PR as it seemed a little too hacky given it seems to be fixed now. -- Alex Bennée
Re: [Qemu-devel] Remaining CI failures
On Mon, Jan 14, 2019 at 10:02:06AM -0500, Emilio G. Cota wrote: > On Fri, Jan 11, 2019 at 19:10:07 +, Alex Bennée wrote: > > So trying to narrow down the remaining failures in the CI system. There > > is one with a patch in flight (use g_usleep instead of sleep) but there > > remains two failure modes, both erratic. > > > > tests/qht-par: > > > > I can trigger this on my dev machine with a gprof enabled build: > > > > # QEMU configure log Fri Jan 11 14:10:45 GMT 2019 > > # Configured with: './configure' '--disable-tools' '--disable-docs' > > '--enable-gprof' '--enable-gcov' > > > > I only seem to be able to trigger it when running via the wrapper in the > > make system: > > > > retry.py -n 30 --invert make check-tests/test-qht-par > > > > Eventually this crashes with: > > > > ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): > > (35584 == 0) > > > > Leaving a core dump for the child: > > I can't replicate this on my machine :-( > > I suspect this is probably related to the fact that gprof > is not even meant to support multi-threaded programs (like qht-bench). > > Given that we've fixed the sleep issue (thanks!) and that there is no use > in running test-qht-par under gprof, I propose to permanently skip > it under gprof, e.g.: > > --- a/tests/Makefile.include > +++ b/tests/Makefile.include > @@ -88,7 +88,8 @@ check-unit-y += tests/test-rcu-simpleq$(EXESUF) > check-unit-y += tests/test-rcu-tailq$(EXESUF) > check-unit-y += tests/test-qdist$(EXESUF) > check-unit-y += tests/test-qht$(EXESUF) > -# FIXME: {test-qht-par + gprof} often break on Travis CI > +# test-qht-par invokes qht-bench, which is multi-threaded. > +# gprof doesn't support multi-threaded programs, so skip this test under > gprof. If gprof doesn't support threads, then we potentially need to skip more of our tests. eg test-io-channel-socket uses APIs that spawn threads behind the scenes, not to mention the test-thread-pool test. Some chardev tests probably hit APIs that spawn background threads too. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [Qemu-devel] Remaining CI failures
On Fri, Jan 11, 2019 at 19:10:07 +, Alex Bennée wrote: > So trying to narrow down the remaining failures in the CI system. There > is one with a patch in flight (use g_usleep instead of sleep) but there > remains two failure modes, both erratic. > > tests/qht-par: > > I can trigger this on my dev machine with a gprof enabled build: > > # QEMU configure log Fri Jan 11 14:10:45 GMT 2019 > # Configured with: './configure' '--disable-tools' '--disable-docs' > '--enable-gprof' '--enable-gcov' > > I only seem to be able to trigger it when running via the wrapper in the > make system: > > retry.py -n 30 --invert make check-tests/test-qht-par > > Eventually this crashes with: > > ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 > == 0) > > Leaving a core dump for the child: I can't replicate this on my machine :-( I suspect this is probably related to the fact that gprof is not even meant to support multi-threaded programs (like qht-bench). Given that we've fixed the sleep issue (thanks!) and that there is no use in running test-qht-par under gprof, I propose to permanently skip it under gprof, e.g.: --- a/tests/Makefile.include +++ b/tests/Makefile.include @@ -88,7 +88,8 @@ check-unit-y += tests/test-rcu-simpleq$(EXESUF) check-unit-y += tests/test-rcu-tailq$(EXESUF) check-unit-y += tests/test-qdist$(EXESUF) check-unit-y += tests/test-qht$(EXESUF) -# FIXME: {test-qht-par + gprof} often break on Travis CI +# test-qht-par invokes qht-bench, which is multi-threaded. +# gprof doesn't support multi-threaded programs, so skip this test under gprof. check-unit-$(call lnot,$(CONFIG_GPROF)) += tests/test-qht-par$(EXESUF) check-unit-y += tests/test-bitops$(EXESUF) check-unit-y += tests/test-bitcnt$(EXESUF) If you agree, I can submit a proper patch with the above. Thanks, Emilio
Re: [Qemu-devel] Remaining CI failures
On Fri, 11 Jan 2019 at 19:10, Alex Bennée wrote: > I did run the make under strace to see who is using O_NONBLOCK but even > after filtering out a bunch of stuff it seems to be quite embedded: > > ag "O_NONBLOCK" check.strace | ag -v "/sys" | ag -v "/dev" | grep -v ".git" > | wc -l > 2449 The strace rune I used to track down the other O_NONBLOCK issue was strace -o /tmp/strace.log -e fcntl,execve,clone -e signal=none -f make ... and then looking through the strace.log for uses of O_NONBLOCK on fds 0, 1 or 2, and backtracking those to which binary had been exec'd to give that PID. Then I did a full strace of just that test, and looked to check whether the 0/1/2 was really the top-level make's stdin/stdout/stderr or whether something else had dup'd or opened an fd there. thanks -- PMM
[Qemu-devel] Remaining CI failures
Hi, So trying to narrow down the remaining failures in the CI system. There is one with a patch in flight (use g_usleep instead of sleep) but there remains two failure modes, both erratic. tests/qht-par: I can trigger this on my dev machine with a gprof enabled build: # QEMU configure log Fri Jan 11 14:10:45 GMT 2019 # Configured with: './configure' '--disable-tools' '--disable-docs' '--enable-gprof' '--enable-gcov' I only seem to be able to trigger it when running via the wrapper in the make system: retry.py -n 30 --invert make check-tests/test-qht-par Eventually this crashes with: ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0) Leaving a core dump for the child: Core was generated by `tests/qht-bench -R -S0.1 -D1 -N1 -n 2 -u 20 -d 1' (gdb) info thread Id Target Id Frame * 1Thread 0x76a7e700 (LWP 15473) 0x5557c306 in call_rcu_thread (opaque=0x0) at util/rcu.c:255 2Thread 0x77fbe780 (LWP 15472) 0x555b8d50 in gcov_read_words () (gdb) bt #0 __mcount_internal (frompc=, selfpc=93824992383630) at mcount.c:98 #1 0x76e15e24 in mcount () at ../sysdeps/x86_64/_mcount.S:51 #2 0x5557928e in qemu_event_reset (ev=0x3cc692b8d452f400) at util/qemu-thread-posix.c:408 #3 0x5557c306 in call_rcu_thread (opaque=0x0) at util/rcu.c:255 #4 0x55579630 in qemu_thread_start (args=0x55808080) at util/qemu-thread-posix.c:502 #5 0x770e96db in start_thread (arg=0x76a7e700) at pthread_create.c:463 #6 0x76e1288f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (gdb) thread 2 [Switching to thread 2 (Thread 0x77fbe780 (LWP 15472))] #0 0x555b8d50 in gcov_read_words () (gdb) bt #0 0x555b8d50 in gcov_read_words () #1 0x555b9453 in __gcov_read_summary () #2 0x555ba461 in gcov_do_dump () #3 0x555bab62 in __gcov_exit () #4 0x555b8c22 in _GLOBAL__sub_D_00100_1_json_lexer_init () at qobject/json-lexer.c:365 #5 0x77de5b73 in _dl_fini () at dl-fini.c:138 #6 0x76d34041 in __run_exit_handlers (status=0, listp=0x770dc718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108 #7 0x76d3413a in __GI_exit (status=) at exit.c:139 #8 0x76d12b9e in __libc_start_main (main=0x55575d50 , argc=11, argv=0x7fffdf78, init=, fini=, rtld_fini=, stack_end=0x7fffdf68) at ../csu/libc-start.c:344 #9 0x55573c1a in _start () To trigger the second failure I had to run on a limited Xenial machine (16.04, 2 cores, 8Gb RAM) again with gprof build: # QEMU configure log Thu 10 Jan 22:22:52 GMT 2019 # Configured with: './configure' '--enable-gprof' '--enable-gcov' '--disable-pie' '--target-list=aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu' Running it like Travis does: retry.py -n 40 --invert -- make -j 3 check V=1 It eventually fails with: PASS: tests/test-hmp make: write error: stdout It's hard to tell from the output what was running that failed. So far I've managed to get the following information out of execsnoop: qemu-system-x86 1345 1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,no wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-i440fx-4.0 sh 1350 1332 0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/q t qemu-system-x86 1350 1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,no wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-q35-3.1 sh 1356 1332 0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/q t qemu-system-x86 1356 1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,no wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-i440fx-3.1 sh 1361 1332 0 /bin/sh -c exec x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/q t qemu-system-x86 1361 1332 0 x86_64-softmmu/qemu-system-x86_64 -qtest unix:/tmp/qtest-1332.sock,nowait -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1332.qmp,no wait,id=char0 -mon chardev=char0,mode=control -machine accel=qtest -display none -S -M pc-q35-4.0 sh