On Tue, Jul 04, 2023 at 10:37:38AM +0100, Anthony PERARD wrote:
> On Wed, Jun 28, 2023 at 02:31:39PM +0200, Roger Pau Monné wrote:
> > On Fri, Jun 23, 2023 at 03:04:21PM +0000, osstest service owner wrote:
> > > flight 181558 xen-unstable real [real]
> > > http://logs.test-lab.xenproject.org/osstest/logs/181558/
> > > 
> > > Regressions :-(
> > > 
> > > Tests which did not succeed and are blocking,
> > > including tests which could not be run:
> > >  test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 181545
> > 
> > The test failing here is hitting the assert in qemu_cond_signal() as
> > called by worker_thread():
> > 
> > #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #1  0x00007ffff740b535 in __GI_abort () at abort.c:79
> > #2  0x00007ffff740b40f in __assert_fail_base (fmt=0x7ffff756cef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55555614abcb "cond->initialized",
> >     file=0x55555614ab88 "../qemu-xen-dir-remote/util/qemu-thread-posix.c", line=198, function=<optimized out>) at assert.c:92
> > #3  0x00007ffff74191a2 in __GI___assert_fail (assertion=0x55555614abcb "cond->initialized", file=0x55555614ab88 "../qemu-xen-dir-remote/util/qemu-thread-posix.c", line=198,
> >     function=0x55555614ad80 <__PRETTY_FUNCTION__.17104> "qemu_cond_signal") at assert.c:101
> > #4  0x0000555555f1c8d2 in qemu_cond_signal (cond=0x7fffb800db30) at ../qemu-xen-dir-remote/util/qemu-thread-posix.c:198
> > #5  0x0000555555f36973 in worker_thread (opaque=0x7fffb800dab0) at ../qemu-xen-dir-remote/util/thread-pool.c:129
> > #6  0x0000555555f1d1d2 in qemu_thread_start (args=0x7fffb8000b20) at ../qemu-xen-dir-remote/util/qemu-thread-posix.c:505
> > #7  0x00007ffff75b0fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> > #8  0x00007ffff74e206f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > 
> > I've been trying to figure out how it can get into such a state, but so
> > far I've had no luck.  I'm not a QEMU expert, so it's probably better if
> > someone else handles this.
> > 
> > In the failures I've seen, and the reproduction I have, the assert
> > triggers in the QEMU dom0 instance responsible for locally-attaching
> > the disk to dom0 in order to run pygrub.
> > 
> > This is also with QEMU 7.2, as testing with upstream QEMU is blocked
> > ATM, so there's a chance it has already been fixed upstream.
> > 
> > Thanks, Roger.
> 
> So, I've run a test with the latest QEMU and I can still reproduce the
> issue. The test also fails with QEMU 7.1.0.
> 
> But QEMU 7.0 seems to pass the test, even with a start-stop loop of 200
> iterations. So I'll try to find out whether something changed in that
> range, or why the thread pool would not be initialised properly.

Thanks for looking into this.

There is a set of changes from Paolo Bonzini that landed in 7.1 and seem
like possible candidates:

232e9255478f thread-pool: remove stopping variable
900fa208f506 thread-pool: replace semaphore with condition variable
3c7b72ddca9c thread-pool: optimize scheduling of completion bottom half
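
For context on the failure mode: the `cond->initialized` assertion in the
backtrace is the kind of check that trips when a condition variable is
signalled after it has been torn down. Below is a minimal C sketch of the
shutdown ordering a condition-variable-based pool needs, assuming a
simplified design. `checked_cond`, `toy_pool`, `toy_worker`, `toy_pool_new`
and `toy_pool_free` are hypothetical names, and this is not the code in
`util/thread-pool.c`. Each worker is counted before it starts, and it
decrements the count and signals `worker_stopped` while holding the lock.
The freeing side destroys the condition variables only after the count has
reached zero. If the accounting ever let the count reach zero while a worker
was still about to signal, a wrapper like `checked_cond_signal()` would abort
in much the same way as the trace above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical condvar wrapper with an `initialized` flag: any use after
 * destroy trips an assertion instead of touching a dead object. */
typedef struct {
    pthread_cond_t cond;
    bool initialized;
} checked_cond;

static void checked_cond_init(checked_cond *c) {
    pthread_cond_init(&c->cond, NULL);
    c->initialized = true;
}

static void checked_cond_destroy(checked_cond *c) {
    assert(c->initialized);
    c->initialized = false;
    pthread_cond_destroy(&c->cond);
}

static void checked_cond_signal(checked_cond *c) {
    assert(c->initialized); /* fires if a worker signals after teardown */
    pthread_cond_signal(&c->cond);
}

static void checked_cond_broadcast(checked_cond *c) {
    assert(c->initialized);
    pthread_cond_broadcast(&c->cond);
}

static void checked_cond_wait(checked_cond *c, pthread_mutex_t *m) {
    assert(c->initialized);
    pthread_cond_wait(&c->cond, m);
}

/* Hypothetical toy pool whose workers idle until asked to stop. */
typedef struct {
    pthread_mutex_t lock;
    checked_cond request_cond;   /* wakes idle workers */
    checked_cond worker_stopped; /* signalled by each exiting worker */
    int cur_threads;             /* workers that have not yet signalled exit */
    int nthreads;
    bool stopping;
    pthread_t *threads;
} toy_pool;

static void *toy_worker(void *opaque) {
    toy_pool *p = opaque;
    pthread_mutex_lock(&p->lock);
    while (!p->stopping)
        checked_cond_wait(&p->request_cond, &p->lock);
    /* Decrement and signal under the lock: the freeing thread cannot see
     * cur_threads == 0, and destroy worker_stopped, before this completes. */
    p->cur_threads--;
    checked_cond_signal(&p->worker_stopped);
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

static toy_pool *toy_pool_new(int n) {
    toy_pool *p = calloc(1, sizeof(*p));
    if (!p) abort();
    pthread_mutex_init(&p->lock, NULL);
    checked_cond_init(&p->request_cond);
    checked_cond_init(&p->worker_stopped);
    /* Count each worker before it starts, so the count can only reach zero
     * after every worker has signalled worker_stopped. */
    p->cur_threads = p->nthreads = n;
    p->threads = calloc((size_t)n + 1, sizeof(pthread_t));
    if (!p->threads) abort();
    for (int i = 0; i < n; i++)
        if (pthread_create(&p->threads[i], NULL, toy_worker, p) != 0) abort();
    return p;
}

/* Stops every worker, tears the pool down, returns the final worker count. */
static int toy_pool_free(toy_pool *p) {
    pthread_mutex_lock(&p->lock);
    p->stopping = true;
    checked_cond_broadcast(&p->request_cond);
    while (p->cur_threads > 0)
        checked_cond_wait(&p->worker_stopped, &p->lock);
    int remaining = p->cur_threads;
    pthread_mutex_unlock(&p->lock);
    /* Join so no worker still references `p` (e.g. inside its final unlock)
     * when it is freed; only then destroy the condition variables. */
    for (int i = 0; i < p->nthreads; i++)
        pthread_join(p->threads[i], NULL);
    checked_cond_destroy(&p->request_cond);
    checked_cond_destroy(&p->worker_stopped);
    pthread_mutex_destroy(&p->lock);
    free(p->threads);
    free(p);
    return remaining;
}
```

For example, `toy_pool_free(toy_pool_new(4))` returns 0 once all four
workers have exited. The sketch joins its threads before freeing to keep
itself simple; it illustrates the ordering and is not a proposed fix.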

Roger.
