Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
Paolo Bonzini writes:

> On 05/07/2017 18:14, Peter Maydell wrote:
>>> - Guest resets board, writing to some hw address (e.g.
>>>   arm_sysctl_write)
>>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>> - We exit iowrite and drop the BQL
>>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>> - we start writing new values to CPU env while still in TCG code
>>> - CHAOS!
>>>
>>> The general solution for this is to ensure these sort of tasks are done
>>> with safe work in the CPUs context when we know nothing else is running.
>>> It seems this is probably best done by modifying
>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>> as safe work - I don't think the vl.c thread should ever be messing
>>> about with calling cpu_reset directly.
>> My first thought is that qemu_system_reset() should absolutely
>> stop every CPU (or other runnable thing like a DMA agent) in the
>> system. The semantics are basically "like a power cycle", so
>> that should include a complete stop of the world. (Is this
>> what vm_stop() does? Dunno...)
>
> I agree, it should do vm_stop() as the first thing and, if applicable,
> vm_start() as the last thing, similar to e.g. savevm.

OK, I did some more digging and basically the problem is that
cpu_stop_current does the wrong thing. It can set cpu->stopped while
still in the vCPU thread, which means that when the vl.c thread does
pause_all_vcpus() it thinks the thread is paused when in fact it isn't,
leading to the chaos. I think the fix is to tighten up our usage of
these two functions.

So my current plan is:

  * pause_all_vcpus() should never be called from vCPU/HW emulation

    One case in kvm_apic has been fixed by Pranith. The other case in
    s390 should be converted to use async_safe_work. Once this is done
    we can assert that pause_all_vcpus() is not in a vCPU thread and
    keep it for qmp, hmp and gdb type operations.

  * vm_stop() is probably being misused by vCPU threads

    There are more uses than pause_all_vcpus here but they all seem to
    be for error handling bail-out type things.

  * cpu_stop_current() is probably superfluous now

    It certainly shouldn't be called directly from the vCPU code
    (rtas_power_off) and once we know pause_all_vcpus() can't be called
    directly at least one call is gone. I think the current_cpu
    handling is a relic of the days of single-threaded handling when it
    was a global.

Does that sound reasonable?

--
Alex Bennée
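
The first item of the plan above (refuse pause_all_vcpus() from vCPU
context) can be sketched roughly as follows. This is a toy model, not
QEMU code: ToyCPU, toy_current_cpu and toy_pause_all_vcpus() are
hypothetical names invented for illustration, and an error return stands
in for the assert() the mail proposes so the sketch can be exercised
without aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy vCPU state; hypothetical, far smaller than any real CPU state. */
typedef struct ToyCPU {
    bool stop;     /* a pause has been requested */
    bool stopped;  /* the vCPU has acknowledged the pause */
} ToyCPU;

/* Hypothetical marker: non-NULL only while the caller is a vCPU thread. */
ToyCPU *toy_current_cpu = NULL;

/*
 * Toy pause-all operation that refuses to run from vCPU context.
 * Returns 0 on success and -1 when called from a vCPU thread; the
 * plan in the mail would assert() at this point instead.
 */
int toy_pause_all_vcpus(ToyCPU *cpus, size_t n)
{
    if (toy_current_cpu != NULL) {
        return -1;  /* caller is a vCPU thread: not allowed */
    }
    for (size_t i = 0; i < n; i++) {
        cpus[i].stop = true;
        /* A real implementation would wait for each vCPU thread to
         * acknowledge; the toy simply marks it as stopped. */
        cpus[i].stopped = true;
    }
    return 0;
}
```

Callers such as monitor or gdbstub handlers would see toy_current_cpu as
NULL and succeed, while a device model running on a vCPU thread would be
rejected and would have to queue the work instead.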
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
Peter Maydell writes:

> On 5 July 2017 at 20:30, Alex Bennée wrote:
>>
>> Peter Maydell writes:
>>
>>> On 5 July 2017 at 17:01, Alex Bennée wrote:
>>>> An interesting bug was reported on #qemu today. It was bisected to
>>>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>>>> with taskset -c 0. Originally the fingers were pointed at mttcg but it
>>>> occurs in both single and multi-threaded modes.
>>>>
>>>> I think the problem is qemu_system_reset_request() is certainly racy
>>>> when resetting a running CPU. AFAICT:
>>>>
>>>> - Guest resets board, writing to some hw address (e.g.
>>>>   arm_sysctl_write)
>>>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>>> - We exit iowrite and drop the BQL
>>>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>>> - we start writing new values to CPU env while still in TCG code
>>>> - CHAOS!
>>>>
>>>> The general solution for this is to ensure these sort of tasks are done
>>>> with safe work in the CPUs context when we know nothing else is running.
>>>> It seems this is probably best done by modifying
>>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>>> as safe work - I don't think the vl.c thread should ever be messing
>>>> about with calling cpu_reset directly.
>>>
>>> My first thought is that qemu_system_reset() should absolutely
>>> stop every CPU (or other runnable thing like a DMA agent) in the
>>> system.
>>
>> Are all these reset calls system wide though?
>
> It's called 'system_reset' because it resets the entire system...
>
>> After all with PSCI you
>> can bring individual cores up and down. I appreciate the vexpress stuff
>> pre-dates those well defined semantics though.
>
> It's individual core reset that's a more ad-hoc afterthought,
> really.
>
>> vm_stop certainly tries to deal with things gracefully as well as send
>> qapi events, drain IO queues and the rest of it. My only concern is it
>> handles two cases - external vm_stops and those from the current CPU.
>>
>> I think it may be cleaner for CPU originated halts to use the
>> async_safe_run_on_cpu() mechanism.
>
> System reset already has an async component to it -- you call
> qemu_system_reset_request(), which just says "schedule a system
> reset as soon as convenient". qemu_system_reset() is the thing
> that runs later and actually does the job (from the io thread,
> not the CPU thread).
>
> Looking more closely at the vl.c code, it looks like it
> calls pause_all_vcpus() before calling qemu_system_reset():
> shouldn't that be pausing all the TCG CPUs?

Looking deeper, it seems cpu_stop_current() is doing the wrong thing.
Because it sets cpu->stopped, the pause_all_vcpus() in the vl.c thread
doesn't wait. I suspect it should really be doing a cpu_loop_exit. I'll
see if I can work up a patch.

>
> thanks
> -- PMM

--
Alex Bennée
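
The failure mode described in this reply can be modelled in a few lines.
The sketch below is a single-threaded toy, not QEMU code; every name in
it (ToyCPU, toy_cpu_stop_current_early, toy_cpu_stop_current_deferred,
toy_cpu_loop_exit and the two predicates) is hypothetical. It only shows
why a waiter that trusts a "stopped" flag returns too early if the flag
is set while the vCPU is still executing, and how deferring the flag
until the vCPU has left its execution loop avoids that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyCPU {
    bool executing;     /* ground truth: still inside guest code */
    bool stopped;       /* what a pause-all waiter looks at */
    bool exit_request;  /* stop requested, not yet acted upon */
} ToyCPU;

/* Models the behaviour reported in the mail: the flag is set from
 * inside the vCPU thread while it is still executing. */
void toy_cpu_stop_current_early(ToyCPU *cpu)
{
    cpu->stopped = true;
}

/* Alternative: only record the request... */
void toy_cpu_stop_current_deferred(ToyCPU *cpu)
{
    cpu->exit_request = true;
}

/* ...and publish "stopped" once execution has really been left. */
void toy_cpu_loop_exit(ToyCPU *cpu)
{
    cpu->executing = false;
    if (cpu->exit_request) {
        cpu->exit_request = false;
        cpu->stopped = true;
    }
}

/* The condition a pause-all waiter would poll. */
bool toy_all_vcpus_look_paused(const ToyCPU *cpus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].stopped) {
            return false;
        }
    }
    return true;
}

/* Ground truth used to expose the mismatch. */
bool toy_any_vcpu_executing(const ToyCPU *cpus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].executing) {
            return true;
        }
    }
    return false;
}
```

With the early variant both predicates can be true at once, which is the
window in which a reset from another thread would overwrite state that
is still in use.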
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
Peter Maydell writes:

> On 5 July 2017 at 20:30, Alex Bennée wrote:
>>
>> Peter Maydell writes:
>>
>>> On 5 July 2017 at 17:01, Alex Bennée wrote:
>>>> An interesting bug was reported on #qemu today. It was bisected to
>>>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>>>> with taskset -c 0. Originally the fingers were pointed at mttcg but it
>>>> occurs in both single and multi-threaded modes.
>>>>
>>>> I think the problem is qemu_system_reset_request() is certainly racy
>>>> when resetting a running CPU. AFAICT:
>>>>
>>>> - Guest resets board, writing to some hw address (e.g.
>>>>   arm_sysctl_write)
>>>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>>> - We exit iowrite and drop the BQL
>>>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>>> - we start writing new values to CPU env while still in TCG code
>>>> - CHAOS!
>>>>
>>>> The general solution for this is to ensure these sort of tasks are done
>>>> with safe work in the CPUs context when we know nothing else is running.
>>>> It seems this is probably best done by modifying
>>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>>> as safe work - I don't think the vl.c thread should ever be messing
>>>> about with calling cpu_reset directly.
>>>
>>> My first thought is that qemu_system_reset() should absolutely
>>> stop every CPU (or other runnable thing like a DMA agent) in the
>>> system.
>>
>> Are all these reset calls system wide though?
>
> It's called 'system_reset' because it resets the entire system...
>
>> After all with PSCI you
>> can bring individual cores up and down. I appreciate the vexpress stuff
>> pre-dates those well defined semantics though.
>
> It's individual core reset that's a more ad-hoc afterthought,
> really.
>
>> vm_stop certainly tries to deal with things gracefully as well as send
>> qapi events, drain IO queues and the rest of it. My only concern is it
>> handles two cases - external vm_stops and those from the current CPU.
>>
>> I think it may be cleaner for CPU originated halts to use the
>> async_safe_run_on_cpu() mechanism.
>
> System reset already has an async component to it -- you call
> qemu_system_reset_request(), which just says "schedule a system
> reset as soon as convenient". qemu_system_reset() is the thing
> that runs later and actually does the job (from the io thread,
> not the CPU thread).
>
> Looking more closely at the vl.c code, it looks like it
> calls pause_all_vcpus() before calling qemu_system_reset():
> shouldn't that be pausing all the TCG CPUs?

Hmm, it should - but it doesn't seem to have in this backtrace:

#0 0x5593fdd3 in arm_cpu_reset (s=0x569abb90) at /home/alex/lsrc/qemu/qemu.git/target/arm/cpu.c:119
#1 0x55bcc74a in cpu_reset (cpu=0x569abb90) at qom/cpu.c:268
#2 0x5589d82a in do_cpu_reset (opaque=0x569abb90) at /home/alex/lsrc/qemu/qemu.git/hw/arm/boot.c:570
#3 0x55a257e4 in qemu_devices_reset () at hw/core/reset.c:69
#4 0x559697a8 in qemu_system_reset (reason=SHUTDOWN_CAUSE_GUEST_RESET) at vl.c:1713
#5 0x55969c0d in main_loop_should_exit () at vl.c:1885
#6 0x55969cda in main_loop () at vl.c:1922
#7 0x55971aca in main (argc=16, argv=0x7fffd918, envp=0x7fffd9a0) at vl.c:4749

Thread 4 (Thread 0x7fff731ff700 (LWP 10098)):
#0 0x7fffdf4f5a15 in do_futex_wait (private=0, abstime=0x7fff731fc670, expected=0, futex_word=0x7fff64cbb5b8) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1 0x7fffdf4f5a15 in do_futex_wait (sem=sem@entry=0x7fff64cbb5b8, abstime=abstime@entry=0x7fff731fc670) at sem_waitcommon.c:111
#2 0x7fffdf4f5adf in __new_sem_wait_slow (sem=0x7fff64cbb5b8, abstime=0x7fff731fc670) at sem_waitcommon.c:181
#3 0x7fffdf4f5b92 in sem_timedwait (sem=, abstime=) at sem_timedwait.c:36
#4 0x55d27488 in qemu_sem_timedwait (sem=0x7fff64cbb5b8, ms=1) at util/qemu-thread-posix.c:271
#5 0x55d20aad in worker_thread (opaque=0x7fff64cbb550) at util/thread-pool.c:92
#6 0x7fffdf4ed6ba in start_thread (arg=0x7fff731ff700) at pthread_create.c:333
#7 0x7fffdf2233dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7fff7ebff700 (LWP 10097)):
#0 0x7fffdf4f630a in __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
#1 0x7fffdf4f14ff in __GI___pthread_mutex_unlock (decr=1, mutex=0x5641ae20) at pthread_mutex_unlock.c:55
#2 0x7fffdf4f14ff in __GI___pthread_mutex_unlock (mutex=0x5641ae20) at pthread_mutex_unlock.c:314
#3 0x55d27091 in qemu_mutex_unlock (mutex=0x5641ae20) at util/qemu-thread-posix.c:88
#4 0x557aa911 in qemu_mutex_unlock_iothread () at /home/alex/lsrc/qemu/qemu.git/cpus.c:1589
#5 0x557d791a in
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
On 5 July 2017 at 20:30, Alex Bennée wrote:
>
> Peter Maydell writes:
>
>> On 5 July 2017 at 17:01, Alex Bennée wrote:
>>> An interesting bug was reported on #qemu today. It was bisected to
>>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>>> with taskset -c 0. Originally the fingers were pointed at mttcg but it
>>> occurs in both single and multi-threaded modes.
>>>
>>> I think the problem is qemu_system_reset_request() is certainly racy
>>> when resetting a running CPU. AFAICT:
>>>
>>> - Guest resets board, writing to some hw address (e.g.
>>>   arm_sysctl_write)
>>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>> - We exit iowrite and drop the BQL
>>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>> - we start writing new values to CPU env while still in TCG code
>>> - CHAOS!
>>>
>>> The general solution for this is to ensure these sort of tasks are done
>>> with safe work in the CPUs context when we know nothing else is running.
>>> It seems this is probably best done by modifying
>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>> as safe work - I don't think the vl.c thread should ever be messing
>>> about with calling cpu_reset directly.
>>
>> My first thought is that qemu_system_reset() should absolutely
>> stop every CPU (or other runnable thing like a DMA agent) in the
>> system.
>
> Are all these reset calls system wide though?

It's called 'system_reset' because it resets the entire system...

> After all with PSCI you
> can bring individual cores up and down. I appreciate the vexpress stuff
> pre-dates those well defined semantics though.

It's individual core reset that's a more ad-hoc afterthought,
really.

> vm_stop certainly tries to deal with things gracefully as well as send
> qapi events, drain IO queues and the rest of it. My only concern is it
> handles two cases - external vm_stops and those from the current CPU.
>
> I think it may be cleaner for CPU originated halts to use the
> async_safe_run_on_cpu() mechanism.

System reset already has an async component to it -- you call
qemu_system_reset_request(), which just says "schedule a system
reset as soon as convenient". qemu_system_reset() is the thing
that runs later and actually does the job (from the io thread,
not the CPU thread).

Looking more closely at the vl.c code, it looks like it
calls pause_all_vcpus() before calling qemu_system_reset():
shouldn't that be pausing all the TCG CPUs?

thanks
-- PMM
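
The request-then-service split described in this message can be
illustrated with a toy model. Nothing below is QEMU code: the toy_*
names are hypothetical, the log letters stand in for the real calls, and
the resume step after the reset is an assumption (the mail only states
that pause_all_vcpus() is called before qemu_system_reset()).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    TOY_CAUSE_NONE = 0,
    TOY_CAUSE_GUEST_RESET
} ToyCause;

ToyCause toy_pending_reset = TOY_CAUSE_NONE;

/* Records the order of steps taken by the servicing loop. */
char toy_log[16];
size_t toy_log_len = 0;

void toy_log_step(char c)
{
    if (toy_log_len + 1 < sizeof(toy_log)) {
        toy_log[toy_log_len++] = c;
        toy_log[toy_log_len] = '\0';
    }
}

/* Called from the device model: only records that a reset is wanted. */
void toy_system_reset_request(ToyCause cause)
{
    toy_pending_reset = cause;
}

/* Called later from the main loop: services a pending request.
 * Returns 1 if a reset was performed, 0 if nothing was pending. */
int toy_main_loop_poll(void)
{
    if (toy_pending_reset == TOY_CAUSE_NONE) {
        return 0;
    }
    toy_pending_reset = TOY_CAUSE_NONE;
    toy_log_step('P');  /* pause all vCPUs */
    toy_log_step('R');  /* reset devices and CPUs */
    toy_log_step('S');  /* resume vCPUs (assumed step) */
    return 1;
}
```

The point of the split is that the requesting thread never touches CPU
state itself; correctness then depends entirely on the pause step really
waiting until every vCPU has stopped executing.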
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
Paolo Bonzini writes:

> On 05/07/2017 18:14, Peter Maydell wrote:
>>> - Guest resets board, writing to some hw address (e.g.
>>>   arm_sysctl_write)
>>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>>> - We exit iowrite and drop the BQL
>>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>>> - we start writing new values to CPU env while still in TCG code
>>> - CHAOS!
>>>
>>> The general solution for this is to ensure these sort of tasks are done
>>> with safe work in the CPUs context when we know nothing else is running.
>>> It seems this is probably best done by modifying
>>> qemu_system_reset_request to queue work up on current_cpu and execute it
>>> as safe work - I don't think the vl.c thread should ever be messing
>>> about with calling cpu_reset directly.
>> My first thought is that qemu_system_reset() should absolutely
>> stop every CPU (or other runnable thing like a DMA agent) in the
>> system. The semantics are basically "like a power cycle", so
>> that should include a complete stop of the world. (Is this
>> what vm_stop() does? Dunno...)
>
> I agree, it should do vm_stop() as the first thing and, if applicable,
> vm_start() as the last thing, similar to e.g. savevm.

Why not use our async_safe_run_on_cpu mechanism for it? Certainly I
wouldn't expect the vCPU hitting its own reset button to need to be
graceful about it.

>
> In fact, the above bug probably has existed forever in KVM.
>
> Paolo

--
Alex Bennée
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
Peter Maydell writes:

> On 5 July 2017 at 17:01, Alex Bennée wrote:
>> An interesting bug was reported on #qemu today. It was bisected to
>> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
>> with taskset -c 0. Originally the fingers were pointed at mttcg but it
>> occurs in both single and multi-threaded modes.
>>
>> I think the problem is qemu_system_reset_request() is certainly racy
>> when resetting a running CPU. AFAICT:
>>
>> - Guest resets board, writing to some hw address (e.g.
>>   arm_sysctl_write)
>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>> - We exit iowrite and drop the BQL
>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>> - we start writing new values to CPU env while still in TCG code
>> - CHAOS!
>>
>> The general solution for this is to ensure these sort of tasks are done
>> with safe work in the CPUs context when we know nothing else is running.
>> It seems this is probably best done by modifying
>> qemu_system_reset_request to queue work up on current_cpu and execute it
>> as safe work - I don't think the vl.c thread should ever be messing
>> about with calling cpu_reset directly.
>
> My first thought is that qemu_system_reset() should absolutely
> stop every CPU (or other runnable thing like a DMA agent) in the
> system.

Are all these reset calls system wide though? After all with PSCI you
can bring individual cores up and down. I appreciate the vexpress stuff
pre-dates those well defined semantics though.

> The semantics are basically "like a power cycle", so
> that should include a complete stop of the world. (Is this
> what vm_stop() does? Dunno...)

vm_stop certainly tries to deal with things gracefully as well as send
qapi events, drain IO queues and the rest of it. My only concern is it
handles two cases - external vm_stops and those from the current CPU.

I think it may be cleaner for CPU originated halts to use the
async_safe_run_on_cpu() mechanism. It has clear semantics with respect
to the behaviour of other CPUs. If you queue work with
async_safe_run_on_cpu and do a cpu_loop_exit you can guarantee all vCPUs
have stopped and the work has been serviced before the originating vCPU
executes its next instruction.

>
> thanks
> -- PMM

--
Alex Bennée
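
The guarantee described here (queued safe work runs only once every vCPU
has left its execution loop) can be sketched with a single-threaded toy.
This is not how QEMU implements async_safe_run_on_cpu(); ToySystem,
TOY_NCPUS and the toy_* functions are hypothetical, and having the last
vCPU to leave run the work is a simplifying assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NCPUS 2

typedef void (*ToyWorkFn)(void *opaque);

typedef struct ToySystem {
    bool executing[TOY_NCPUS];  /* true while a vCPU runs guest code */
    ToyWorkFn work;             /* at most one queued safe work item */
    void *opaque;
} ToySystem;

/* Example work item: counts how many times it has been run. */
void toy_count_work(void *opaque)
{
    (*(int *)opaque)++;
}

/* Queue work that must only run while no vCPU is executing. */
void toy_queue_safe_work(ToySystem *s, ToyWorkFn fn, void *opaque)
{
    s->work = fn;
    s->opaque = opaque;
}

/* Run the queued item if, and only if, every vCPU is quiescent. */
bool toy_run_safe_work(ToySystem *s)
{
    for (size_t i = 0; i < TOY_NCPUS; i++) {
        if (s->executing[i]) {
            return false;
        }
    }
    if (s->work == NULL) {
        return false;
    }
    ToyWorkFn fn = s->work;
    s->work = NULL;
    fn(s->opaque);
    return true;
}

/* A vCPU leaves its execution loop, then tries to service the queue. */
void toy_cpu_leave_exec(ToySystem *s, size_t cpu)
{
    s->executing[cpu] = false;
    toy_run_safe_work(s);
}
```

In the model a reset queued by vCPU 0 cannot run while vCPU 1 is still
executing, which is the property the racy path in this thread lacks.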
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
On Jul 5, 2017, at 12:42 PM, qemu-devel-requ...@nongnu.org wrote:

> Hi,
>
> An interesting bug was reported on #qemu today. It was bisected to
> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
> with taskset -c 0. Originally the fingers were pointed at mttcg but it
> occurs in both single and multi-threaded modes.
>
> I think the problem is qemu_system_reset_request() is certainly racy
> when resetting a running CPU. AFAICT:
>
> - Guest resets board, writing to some hw address (e.g.
>   arm_sysctl_write)
> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
> - We exit iowrite and drop the BQL
> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
> - we start writing new values to CPU env while still in TCG code
> - CHAOS!
>
> The general solution for this is to ensure these sort of tasks are done
> with safe work in the CPUs context when we know nothing else is running.
> It seems this is probably best done by modifying
> qemu_system_reset_request to queue work up on current_cpu and execute it
> as safe work - I don't think the vl.c thread should ever be messing
> about with calling cpu_reset directly.

Maybe vl.c should be changed so it registers a request to reset the
emulator instead. So instead of cpu_reset() we do request_cpu_reset().

> Looking at the calls most of these are made by device code but I see KVM
> also does it. I just wanted to check this was a reasonable approach and
> wouldn't upset anything else. Any thoughts?

I think the problem with the QEMU monitor command "stop" (which causes
the emulator to crash) is related to this issue as well.
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
On 05/07/2017 18:14, Peter Maydell wrote:
>> - Guest resets board, writing to some hw address (e.g.
>>   arm_sysctl_write)
>> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
>> - We exit iowrite and drop the BQL
>> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
>> - we start writing new values to CPU env while still in TCG code
>> - CHAOS!
>>
>> The general solution for this is to ensure these sort of tasks are done
>> with safe work in the CPUs context when we know nothing else is running.
>> It seems this is probably best done by modifying
>> qemu_system_reset_request to queue work up on current_cpu and execute it
>> as safe work - I don't think the vl.c thread should ever be messing
>> about with calling cpu_reset directly.
> My first thought is that qemu_system_reset() should absolutely
> stop every CPU (or other runnable thing like a DMA agent) in the
> system. The semantics are basically "like a power cycle", so
> that should include a complete stop of the world. (Is this
> what vm_stop() does? Dunno...)

I agree, it should do vm_stop() as the first thing and, if applicable,
vm_start() as the last thing, similar to e.g. savevm.

In fact, the above bug probably has existed forever in KVM.

Paolo
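
The "vm_stop() first, vm_start() last if applicable" bracket suggested
above might look roughly like the toy below. It is not QEMU code: the
toy_* names are hypothetical, the real vm_stop() takes a run-state
argument that the model leaves out, and "if applicable" is read here as
"only if the VM was running beforehand", which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

bool toy_vm_running = true;

/* Counts operations that (wrongly) ran while the VM was running. */
int toy_ops_while_running = 0;

void toy_vm_stop(void)  { toy_vm_running = false; }
void toy_vm_start(void) { toy_vm_running = true; }

/* Stand-in for the reset work; it notes whether the world was stopped. */
void toy_reset_op(void)
{
    if (toy_vm_running) {
        toy_ops_while_running++;
    }
}

/* Stop the VM, run the operation, and restart only if it was running. */
void toy_with_vm_stopped(void (*op)(void))
{
    bool was_running = toy_vm_running;

    toy_vm_stop();
    op();
    if (was_running) {
        toy_vm_start();
    }
}
```

A VM that was already paused (for example by the monitor) stays paused
after the operation, which matches the "if applicable" wording.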
Re: [Qemu-devel] qemu_system_reset_request() broken w.r.t BQL locking regime
On 5 July 2017 at 17:01, Alex Bennée wrote:
> An interesting bug was reported on #qemu today. It was bisected to
> 8d04fb55 (drop global lock for TCG) and only occurred when QEMU was run
> with taskset -c 0. Originally the fingers were pointed at mttcg but it
> occurs in both single and multi-threaded modes.
>
> I think the problem is qemu_system_reset_request() is certainly racy
> when resetting a running CPU. AFAICT:
>
> - Guest resets board, writing to some hw address (e.g.
>   arm_sysctl_write)
> - This triggers qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET)
> - We exit iowrite and drop the BQL
> - vl.c schedules qemu_system_reset->qemu_devices_reset...arm_cpu_reset
> - we start writing new values to CPU env while still in TCG code
> - CHAOS!
>
> The general solution for this is to ensure these sort of tasks are done
> with safe work in the CPUs context when we know nothing else is running.
> It seems this is probably best done by modifying
> qemu_system_reset_request to queue work up on current_cpu and execute it
> as safe work - I don't think the vl.c thread should ever be messing
> about with calling cpu_reset directly.

My first thought is that qemu_system_reset() should absolutely
stop every CPU (or other runnable thing like a DMA agent) in the
system. The semantics are basically "like a power cycle", so
that should include a complete stop of the world. (Is this
what vm_stop() does? Dunno...)

thanks
-- PMM