Alright, here is what is happening:
Whenever the program is stuck, thread #2's backtrace is this:
(gdb) bt
#0 syscall () at ../sysdeps/unix/sysv/linux/aarch64/syscall.S:38
#1 0xaabd41b0 in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at ./util/qemu-thread-posix.c:438
#2 qemu_event_wait
Alright,
I'm still investigating this but wanted to share some findings... I
haven't got a kernel dump of the frozen task yet, so I have analyzed
only the userland part of the problem (although I did check at some
point whether code was running inside the kernel, with perf
cycles:u/cycles:k).
The big picture
Alright, with a d06 aarch64 machine I was able to reproduce it after 8
attempts. I'll debug it today and provide feedback on my findings.
(gdb) bt full
#0 0xb0b2181c in __GI_ppoll (fds=0xce5ab770, nfds=4,
timeout=<optimized out>, timeout@entry=0x0,
sigmask=sigmask@entry=0x0) at
Alright, I couldn't reproduce this yet. I'm running the same test case
on a 24-core box while causing lots of context switches and CPU
migrations in parallel (trying to stress the logic).
I'll let this run for some time to check.
Unfortunately this could be related to QEMU AIO BH locking/primitives
Oh, never mind the virtual environment test - I just remembered we
don't have 2nd-level KVM for aarch64 yet (at least on ARMv8
implementing the virt extension). I'll try to reproduce in the real
environment only.
--
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
Hello Liz,
I'll try to reproduce this issue in a Cortex-A53 aarch64 real
environment (w/ 24 HW threads) AND in a virtual environment w/ lots of
vCPUs... but if it's a missing barrier - or a lack of atomicity
and/or ordering in a primitive - then I'm afraid the context switch in
between vCPUs
** Changed in: qemu (Ubuntu)
Status: Confirmed => In Progress
** Changed in: qemu (Ubuntu)
Assignee: (unassigned) => Rafael David Tinoco (rafaeldtinoco)
** Changed in: qemu (Ubuntu)
Importance: Undecided => Medium
--
** Also affects: qemu (Ubuntu)
Importance: Undecided
Status: New
** Changed in: qemu (Ubuntu)
Status: New => Confirmed
--
I can reproduce this problem with qemu.git/master. It still exists in
qemu.git/master. I found that when an I/O request returns in a worker
thread and wants to call aio_notify to wake up the main_loop, it finds
that ctx->notify_me has been cleared to 0 by the main_loop in
aio_ctx_check by calling
frazier, have you found the conditions that reliably make this problem appear?
--
Do you have any good ideas about it? Maybe memory barriers are missing
somewhere and are causing it?
--
No, sorry - this bug still persists w/ latest upstream (@ afccfc0). I
found a report of similar symptoms:
https://patchwork.kernel.org/patch/10047341/
https://bugzilla.redhat.com/show_bug.cgi?id=1524770#c13
To be clear, ^ is already fixed upstream, so it is not the *same* issue
- but
** Changed in: qemu
Status: New => Confirmed
--
Sorry, I made a spelling mistake here ("Hi, I also found a problem
that qemu-img convert hands in ARM."). The correct sentence is "I also
found a problem that qemu-img convert hangs in ARM".
--
Hi, I also found a problem that qemu-img convert hands in ARM.
The convert command line is "qemu-img convert -f qcow2 -O raw disk.qcow2
disk.raw ".
The bt is below:
Thread 2 (Thread 0x4b776e50 (LWP 27215)):
#0 0x4a3f2994 in sigtimedwait () from /lib64/libc.so.6
#1 0x4a39c60c
** Tags added: qemu-img
--
https://bugs.launchpad.net/bugs/1805256
Title:
qemu-img hangs on high core count ARM system
Status in QEMU:
New
Bug description:
On the HiSilicon D06
ext4 filesystem, SATA drive:
(gdb) thread apply all bt
Thread 3 (Thread 0x9bffc9a0 (LWP 9015)):
#0 0xaaa462cc in __GI___sigtimedwait (set=<optimized out>,
set@entry=0xe725c070, info=info@entry=0x9bffbf18,
timeout=0x3ff1, timeout@entry=0x0)
at
Hi, can you do a `thread apply all bt` instead? If I were to bet, we're
probably waiting for some slow call like lseek to return in another
thread.
What filesystem/blockdevice is involved here?
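For reference, the `thread apply all bt` requested above can be
captured non-interactively. This is just a sketch: it assumes gdb is
installed, that the stuck process is qemu-img, and that the caller has
ptrace permission (root, or ptrace_scope relaxed).

```shell
# Hypothetical one-liner: attach to the hung qemu-img, dump every
# thread's backtrace, then detach automatically (batch mode).
gdb -p "$(pidof qemu-img)" -batch -ex 'thread apply all bt'
```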