Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution
On 11/08/2015 23:34, Frederic Konrad wrote:
>> Also if qemu_cond_broadcast(&qemu_io_proceeded_cond) is being dropped
>> there is no point keeping the guff around in qemu_tcg_wait_io_event.
>
> Yes good point.
>
> BTW this leads to high consumption of host CPU, e.g. 100% per VCPU
> thread, as the VCPU threads are no longer waiting for
> qemu_io_proceeded_cond.

If the guest CPU is busy waiting, that's expected. But if the guest CPU
is halted, it should not have 100% host CPU consumption.

Paolo
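Paolo's distinction can be made concrete with a small model: only a guest
that is genuinely busy-waiting should burn a host core; a halted guest CPU
should park its host thread on a condition variable. The following is a
minimal standalone pthreads sketch of that idea; halted, halt_cond and bql
are illustrative stand-ins for QEMU's cpu->halted, cpu->halt_cond and the
global mutex, not the actual cpus.c code.

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t halt_cond = PTHREAD_COND_INITIALIZER;
    static bool halted;
    static bool interrupt_pending;

    /* VCPU thread, called with bql held once the guest executes HLT:
     * sleeps without consuming host CPU until something kicks it. */
    static void vcpu_wait_halt(void)
    {
        while (halted && !interrupt_pending) {
            pthread_cond_wait(&halt_cond, &bql);
        }
    }

    /* Iothread side: raising an interrupt wakes the halted VCPU. */
    static void kick_halted_vcpu(void)
    {
        pthread_mutex_lock(&bql);
        interrupt_pending = true;
        pthread_cond_broadcast(&halt_cond);
        pthread_mutex_unlock(&bql);
    }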
Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution
On 12/08/2015 11:58, Paolo Bonzini wrote:
> On 11/08/2015 23:34, Frederic Konrad wrote:
>>> Also if qemu_cond_broadcast(&qemu_io_proceeded_cond) is being dropped
>>> there is no point keeping the guff around in qemu_tcg_wait_io_event.
>>
>> Yes good point.
>>
>> BTW this leads to high consumption of host CPU, e.g. 100% per VCPU
>> thread, as the VCPU threads are no longer waiting for
>> qemu_io_proceeded_cond.
>
> If the guest CPU is busy waiting, that's expected. But if the guest CPU
> is halted, it should not have 100% host CPU consumption.

Hmm, so that's definitely strange... I mean theoretically it's the same
as before?

Another thing: it seems that we need to signal the VCPUs when the
iothread takes the lock, e.g.:

    if (tcg_enabled() && qemu_thread_is_self(&io_thread)) {
        CPU_FOREACH(cpu) {
            cpu_exit(cpu);
        }
    }

to make this patch work without MTTCG.

Fred
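A self-contained pthreads sketch of the kick Fred describes: when the
iothread takes the lock, it signals every VCPU to break out of translated
code so the VCPU re-synchronizes with the iothread's work before running
more guest code. CPUStateModel, the flag, and the function names are
illustrative stand-ins, not QEMU's real API; the real cpu_exit() likewise
forces the TCG loop to stop at the next translation-block boundary.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>

    #define MAX_CPUS 4

    typedef struct CPUStateModel {
        atomic_bool exit_request;   /* polled by the TCG execution loop */
    } CPUStateModel;

    static CPUStateModel cpus[MAX_CPUS];
    static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_t io_thread;     /* assumed to be set at startup */

    /* Ask one VCPU to leave translated code at the next opportunity. */
    static void cpu_exit_model(CPUStateModel *cpu)
    {
        atomic_store(&cpu->exit_request, true);
    }

    static void lock_iothread_model(void)
    {
        if (pthread_equal(pthread_self(), io_thread)) {
            /* Bump every VCPU out of the TCG loop before the iothread
             * starts mutating shared state under the lock. */
            for (int i = 0; i < MAX_CPUS; i++) {
                cpu_exit_model(&cpus[i]);
            }
        }
        pthread_mutex_lock(&global_mutex);
    }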
Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution
On 11/08/2015 22:12, Alex Bennée wrote:
> Paolo Bonzini <pbonz...@redhat.com> writes:
>> On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:
>>>  void qemu_mutex_lock_iothread(void)
>>>  {
>>> -    atomic_inc(&iothread_requesting_mutex);
>>> -    /* In the simple case there is no need to bump the VCPU thread out of
>>> -     * TCG code execution.
>>> -     */
>>> -    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
>>> -        !first_cpu || !first_cpu->thread) {
>>> -        qemu_mutex_lock(&qemu_global_mutex);
>>> -        atomic_dec(&iothread_requesting_mutex);
>>> -    } else {
>>> -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
>>> -            qemu_cpu_kick_thread(first_cpu);
>>> -            qemu_mutex_lock(&qemu_global_mutex);
>>> -        }
>>> -        atomic_dec(&iothread_requesting_mutex);
>>> -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
>>> -    }
>>> -    iothread_locked = true;
>>
>> iothread_locked = true must be kept. Otherwise... yay! :)
>
> Also if qemu_cond_broadcast(&qemu_io_proceeded_cond) is being dropped
> there is no point keeping the guff around in qemu_tcg_wait_io_event.

Yes good point.

BTW this leads to high consumption of host CPU, e.g. 100% per VCPU
thread, as the VCPU threads are no longer waiting for
qemu_io_proceeded_cond.

Fred
Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution
On 10/08/2015 18:15, Paolo Bonzini wrote:
> On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:
>>  void qemu_mutex_lock_iothread(void)
>>  {
>> -    atomic_inc(&iothread_requesting_mutex);
>> -    /* In the simple case there is no need to bump the VCPU thread out of
>> -     * TCG code execution.
>> -     */
>> -    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
>> -        !first_cpu || !first_cpu->thread) {
>> -        qemu_mutex_lock(&qemu_global_mutex);
>> -        atomic_dec(&iothread_requesting_mutex);
>> -    } else {
>> -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
>> -            qemu_cpu_kick_thread(first_cpu);
>> -            qemu_mutex_lock(&qemu_global_mutex);
>> -        }
>> -        atomic_dec(&iothread_requesting_mutex);
>> -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
>> -    }
>> -    iothread_locked = true;
>
> iothread_locked = true must be kept. Otherwise... yay! :)

oops :).

>> @@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
>>     can be detected */
>>  void tlb_protect_code(ram_addr_t ram_addr)
>>  {
>> +    qemu_mutex_lock_iothread();
>>      cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
>>                                               DIRTY_MEMORY_CODE);
>> +    qemu_mutex_unlock_iothread();
>>  }
>
> Not needed anymore.
>
>> diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
>> index 52c5d65..55f63bf 100644
>> --- a/target-i386/misc_helper.c
>> +++ b/target-i386/misc_helper.c
>
> None of this is needed anymore either! :)
>
>> +    /*
>> +     * Some device's reset needs to grab the global_mutex. So just release it
>> +     * here.
>> +     */
>> +    qemu_mutex_unlock_iothread();
>>      /* reset all devices */
>>      QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
>>          re->func(re->opaque);
>>      }
>> +    qemu_mutex_lock_iothread();
>
> Should never have been true? (And, I think, it was pointed out in a
> previous version too).

I had a double lock with the reset handler from vexpress-a15. I don't
really remember why. But I hacked that. It's fixed now :)

Thanks,
Fred

> Paolo
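The double lock Fred mentions is the classic self-deadlock on a
non-recursive mutex: qemu_devices_reset() runs with the global lock held,
so a reset handler that takes the lock again blocks forever. A minimal
pthreads model of the unlock-around-reset workaround in the patch; the
names are illustrative, not QEMU's real reset API.

    #include <pthread.h>

    static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Hypothetical reset handler that, like the vexpress-a15 one Fred
     * hit, wants the global lock for itself. */
    static void device_reset_handler(void)
    {
        pthread_mutex_lock(&global_mutex);
        /* ... reset device state ... */
        pthread_mutex_unlock(&global_mutex);
    }

    /* Model of qemu_devices_reset(): entered with the lock held, so it
     * must drop the lock before running handlers that re-take it, or
     * the non-recursive mutex deadlocks. */
    static void devices_reset_model(void)
    {
        pthread_mutex_unlock(&global_mutex);
        device_reset_handler();
        pthread_mutex_lock(&global_mutex);
    }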
Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution
Paolo Bonzini <pbonz...@redhat.com> writes:

> On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:
>>  void qemu_mutex_lock_iothread(void)
>>  {
>> -    atomic_inc(&iothread_requesting_mutex);
>> -    /* In the simple case there is no need to bump the VCPU thread out of
>> -     * TCG code execution.
>> -     */
>> -    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
>> -        !first_cpu || !first_cpu->thread) {
>> -        qemu_mutex_lock(&qemu_global_mutex);
>> -        atomic_dec(&iothread_requesting_mutex);
>> -    } else {
>> -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
>> -            qemu_cpu_kick_thread(first_cpu);
>> -            qemu_mutex_lock(&qemu_global_mutex);
>> -        }
>> -        atomic_dec(&iothread_requesting_mutex);
>> -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
>> -    }
>> -    iothread_locked = true;
>
> iothread_locked = true must be kept. Otherwise... yay! :)

Also if qemu_cond_broadcast(&qemu_io_proceeded_cond) is being dropped
there is no point keeping the guff around in qemu_tcg_wait_io_event.

--
Alex Bennée
[Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution
From: KONRAD Frederic <fred.kon...@greensocs.com>

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and
not kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

I'm pretty sure some cases are still broken, definitely SMP (we no
longer perform round-robin scheduling by chance). Still, a Linux x86 UP
guest and my Musicpal ARM model boot fine here. These numbers
demonstrate where we gain something:

    20338 jan  20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
    20337 jan  20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

    32206 jan  20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
    32204 jan  20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Note that this patch depends on
http://thread.gmane.org/gmane.comp.emulators.qemu/118657

Changes from Fred Konrad:
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpus.c                    | 20 +++-----------------
 cputlb.c                  |  5 +++++
 target-i386/misc_helper.c | 27 ++++++++++++++++++++++++---
 translate-all.c           |  2 ++
 vl.c                      |  6 ++++++
 5 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/cpus.c b/cpus.c
index 2550be2..154a081 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1232,23 +1232,7 @@ bool qemu_mutex_iothread_locked(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->thread) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_thread(first_cpu);
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
-    iothread_locked = true;
+    qemu_mutex_lock(&qemu_global_mutex);
 }
 
 void qemu_mutex_unlock_iothread(void)
@@ -1469,7 +1453,9 @@ static int tcg_cpu_exec(CPUState *cpu)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+    qemu_mutex_unlock_iothread();
     ret = cpu_exec(cpu);
+    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
 #endif
diff --git a/cputlb.c b/cputlb.c
index a506086..79fff1c 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -30,6 +30,9 @@
 #include "exec/ram_addr.h"
 #include "tcg/tcg.h"
 
+void qemu_mutex_lock_iothread(void);
+void qemu_mutex_unlock_iothread(void);
+
 //#define DEBUG_TLB
 //#define DEBUG_TLB_CHECK
 
@@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
 {
+    qemu_mutex_lock_iothread();
     cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
                                              DIRTY_MEMORY_CODE);
+    qemu_mutex_unlock_iothread();
 }
 
 /* update the TLB so that writes in physical page 'phys_addr' are no longer
diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
index 52c5d65..55f63bf 100644
--- a/target-i386/misc_helper.c
+++ b/target-i386/misc_helper.c
@@ -27,8 +27,10 @@ void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
 #ifdef CONFIG_USER_ONLY
     fprintf(stderr, "outb: port=0x%04x, data=%02x\n", port, data);
 #else
+    qemu_mutex_lock_iothread();
     address_space_stb(&address_space_io, port, data,
                       cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
 #endif
 }
 
@@ -38,8 +40,13 @@ target_ulong helper_inb(CPUX86State *env, uint32_t port)
     fprintf(stderr, "inb: port=0x%04x\n", port);
     return 0;
 #else
-    return address_space_ldub(&address_space_io, port,
+    target_ulong ret;
+
+    qemu_mutex_lock_iothread();
+    ret = address_space_ldub(&address_space_io, port,
                               cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
+    return ret;
 #endif
 }
 
@@ -48,8 +55,10 @@ void helper_outw(CPUX86State *env, uint32_t port, uint32_t data)
 #ifdef CONFIG_USER_ONLY
     fprintf(stderr, "outw: port=0x%04x, data=%04x\n", port, data);
 #else
+    qemu_mutex_lock_iothread();
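The locking discipline the patch above introduces, condensed into a
standalone pthreads sketch: translated code runs outside the global lock,
and only I/O emulation re-takes it. The *_model functions are hypothetical
stand-ins for QEMU's cpu_exec(), tcg_cpu_exec() and the memory API, not
the real implementation.

    #include <pthread.h>

    static pthread_mutex_t qemu_global_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* Device emulation still expects the global lock, so MMIO/PIO
     * paths acquire it for the duration of the access. */
    static void mmio_access_model(void)
    {
        pthread_mutex_lock(&qemu_global_mutex);
        /* ... read or write device state ... */
        pthread_mutex_unlock(&qemu_global_mutex);
    }

    /* Stands in for cpu_exec(): pure TCG execution, lock-free except
     * when the guest touches an I/O region. */
    static int cpu_exec_model(void)
    {
        mmio_access_model();
        return 0;
    }

    /* Stands in for tcg_cpu_exec(): called with the lock held, drops
     * it around guest code and re-takes it before returning to the
     * main loop. */
    static int tcg_cpu_exec_model(void)
    {
        int ret;

        pthread_mutex_unlock(&qemu_global_mutex);
        ret = cpu_exec_model();
        pthread_mutex_lock(&qemu_global_mutex);
        return ret;
    }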
Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution
On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:
>  void qemu_mutex_lock_iothread(void)
>  {
> -    atomic_inc(&iothread_requesting_mutex);
> -    /* In the simple case there is no need to bump the VCPU thread out of
> -     * TCG code execution.
> -     */
> -    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
> -        !first_cpu || !first_cpu->thread) {
> -        qemu_mutex_lock(&qemu_global_mutex);
> -        atomic_dec(&iothread_requesting_mutex);
> -    } else {
> -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
> -            qemu_cpu_kick_thread(first_cpu);
> -            qemu_mutex_lock(&qemu_global_mutex);
> -        }
> -        atomic_dec(&iothread_requesting_mutex);
> -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
> -    }
> -    iothread_locked = true;

iothread_locked = true must be kept. Otherwise... yay! :)

> @@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
>     can be detected */
>  void tlb_protect_code(ram_addr_t ram_addr)
>  {
> +    qemu_mutex_lock_iothread();
>      cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
>                                               DIRTY_MEMORY_CODE);
> +    qemu_mutex_unlock_iothread();
>  }

Not needed anymore.

> diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
> index 52c5d65..55f63bf 100644
> --- a/target-i386/misc_helper.c
> +++ b/target-i386/misc_helper.c

None of this is needed anymore either! :)

> +    /*
> +     * Some device's reset needs to grab the global_mutex. So just release it
> +     * here.
> +     */
> +    qemu_mutex_unlock_iothread();
>      /* reset all devices */
>      QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
>          re->func(re->opaque);
>      }
> +    qemu_mutex_lock_iothread();

Should never have been true? (And, I think, it was pointed out in a
previous version too).

Paolo