Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution

2015-08-12 Thread Paolo Bonzini


On 11/08/2015 23:34, Frederic Konrad wrote:

 Also if qemu_cond_broadcast(qemu_io_proceeded_cond) is being dropped
 there is no point keeping the guff around in qemu_tcg_wait_io_event.

 Yes good point.
 
 BTW this leads to high host CPU consumption, e.g. 100% per VCPU thread, as
 the VCPU threads are no longer waiting for qemu_io_proceeded_cond.

If the guest CPU is busy waiting, that's expected.  But if the guest CPU
is halted, it should not have 100% host CPU consumption.
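
For reference, the shape of the wait I'd expect a halted VCPU to sit in
(a minimal sketch with a made-up name, modeled on what
qemu_tcg_wait_io_event() in cpus.c does today; treat the details as
illustrative):

static void tcg_idle_wait(CPUState *cpu)
{
    /* A halted VCPU must block on its condition variable instead of
     * spinning: qemu_cond_wait() drops qemu_global_mutex while asleep
     * and re-takes it when the CPU is kicked.
     */
    while (all_cpu_threads_idle()) {
        qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
    }
}

A VCPU thread that leaves cpu_exec() with nothing to do should end up
parked here, so 100% host load should only appear when the guest really
is busy-waiting.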

Paolo



Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution

2015-08-12 Thread Frederic Konrad

On 12/08/2015 11:58, Paolo Bonzini wrote:


On 11/08/2015 23:34, Frederic Konrad wrote:

Also if qemu_cond_broadcast(qemu_io_proceeded_cond) is being dropped
there is no point keeping the guff around in qemu_tcg_wait_io_event.


Yes good point.

BTW this leads to high host CPU consumption, e.g. 100% per VCPU thread, as
the VCPU threads are no longer waiting for qemu_io_proceeded_cond.

If the guest CPU is busy waiting, that's expected.  But if the guest CPU
is halted, it should not have 100% host CPU consumption.

Paolo

Hmm, so that's definitely strange... I mean, theoretically it's the same as
before?


Another thing: it seems that we need to signal the VCPUs when the iothread
takes the lock, e.g.:

if (tcg_enabled() && qemu_thread_is_self(&io_thread)) {
    CPU_FOREACH(cpu) {
        cpu_exit(cpu);
    }
}

This is needed to make the patch work without MTTCG.
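
Roughly like this inside qemu_mutex_lock_iothread() (just a sketch of
the idea; io_thread is assumed to be the QemuThread handle that cpus.c
records for the iothread at startup):

void qemu_mutex_lock_iothread(void)
{
    qemu_mutex_lock(&qemu_global_mutex);
    /* Without MTTCG the single TCG thread can stay in the execution
     * loop for a long time while the lock is contended; kick all CPUs
     * out so the iothread's work is noticed promptly.
     */
    if (tcg_enabled() && qemu_thread_is_self(&io_thread)) {
        CPUState *cpu;

        CPU_FOREACH(cpu) {
            cpu_exit(cpu);
        }
    }
}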

Fred



Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution

2015-08-11 Thread Frederic Konrad

On 11/08/2015 22:12, Alex Bennée wrote:

Paolo Bonzini <pbonz...@redhat.com> writes:


On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:

  void qemu_mutex_lock_iothread(void)
  {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->thread) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_thread(first_cpu);
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
-    iothread_locked = true;

iothread_locked = true must be kept.  Otherwise... yay! :)


Also if qemu_cond_broadcast(qemu_io_proceeded_cond) is being dropped
there is no point keeping the guff around in qemu_tcg_wait_io_event.


Yes good point.

BTW this leads to high host CPU consumption, e.g. 100% per VCPU thread, as
the VCPU threads are no longer waiting for qemu_io_proceeded_cond.

Fred



Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution

2015-08-11 Thread Frederic Konrad

On 10/08/2015 18:15, Paolo Bonzini wrote:


On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:

  void qemu_mutex_lock_iothread(void)
  {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->thread) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_thread(first_cpu);
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
-    iothread_locked = true;

iothread_locked = true must be kept.  Otherwise... yay! :)


oops :).



@@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
 {
+    qemu_mutex_lock_iothread();
     cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
                                              DIRTY_MEMORY_CODE);
+    qemu_mutex_unlock_iothread();
 }
  

Not needed anymore.


diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
index 52c5d65..55f63bf 100644
--- a/target-i386/misc_helper.c
+++ b/target-i386/misc_helper.c

None of this is needed anymore either! :)


+    /*
+     * Some devices' reset handlers need to grab the global mutex, so
+     * release it here.
+     */
+    qemu_mutex_unlock_iothread();
     /* reset all devices */
     QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
         re->func(re->opaque);
     }
+    qemu_mutex_lock_iothread();

Should never have been true?  (And, I think, it was pointed out in a
previous version too).

I had a double lock with the reset handler from vexpress-a15; I don't really
remember why, but I hacked around it. It's fixed now :)

Thanks,
Fred


Paolo






Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution

2015-08-11 Thread Alex Bennée

Paolo Bonzini <pbonz...@redhat.com> writes:

 On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:
  void qemu_mutex_lock_iothread(void)
  {
 -    atomic_inc(&iothread_requesting_mutex);
 -    /* In the simple case there is no need to bump the VCPU thread out of
 -     * TCG code execution.
 -     */
 -    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
 -        !first_cpu || !first_cpu->thread) {
 -        qemu_mutex_lock(&qemu_global_mutex);
 -        atomic_dec(&iothread_requesting_mutex);
 -    } else {
 -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
 -            qemu_cpu_kick_thread(first_cpu);
 -            qemu_mutex_lock(&qemu_global_mutex);
 -        }
 -        atomic_dec(&iothread_requesting_mutex);
 -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
 -    }
 -    iothread_locked = true;

 iothread_locked = true must be kept.  Otherwise... yay! :)


Also if qemu_cond_broadcast(qemu_io_proceeded_cond) is being dropped
there is no point keeping the guff around in qemu_tcg_wait_io_event.

-- 
Alex Bennée



[Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution

2015-08-10 Thread fred.konrad
From: KONRAD Frederic <fred.kon...@greensocs.com>

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.
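
In sketch form the split looks like this (illustrative only, with a
made-up function name; the real hunks are in the diff below):

static int tcg_cpu_exec_unlocked(CPUState *cpu)
{
    int ret;

    /* Pure guest code runs without the global mutex... */
    qemu_mutex_unlock_iothread();
    ret = cpu_exec(cpu);
    /* ...and we take it back before touching shared device state. */
    qemu_mutex_lock_iothread();
    return ret;
}

MMIO and PIO helpers bracket each device access with the same
lock/unlock pair, as the target-i386 hunks below show.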

We have to revert a few optimizations for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.
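
For context on the RAM block point: the lookup fast path moves the hit
block to the front of the block list, which is only safe while one lock
serializes all lookups. A simplified sketch of the most-recently-used
reordering that has to stay disabled (loosely based on exec.c's
qemu_get_ram_block(); the function name and fields here are
illustrative):

static RAMBlock *ram_block_lookup(ram_addr_t addr)
{
    RAMBlock *block;

    QLIST_FOREACH(block, &ram_list.blocks, next) {
        if (addr - block->offset < block->used_length) {
            /* MRU move-to-front: another thread walking the list
             * without the global mutex could observe it mid-update,
             * so it stays disabled for now.
             */
#if 0
            QLIST_REMOVE(block, next);
            QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
#endif
            return block;
        }
    }
    return NULL;
}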

I'm pretty sure some cases are still broken, definitely SMP (we no
longer perform round-robin scheduling by chance). Still, a Linux x86
UP guest and my Musicpal ARM model boot fine here. These numbers
demonstrate where we gain something:

20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independently on a second core. Without the patch we don't get beyond

32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Note that this patch depends on
http://thread.gmane.org/gmane.comp.emulators.qemu/118657

Changes from Fred Konrad:
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpus.c| 20 +++-
 cputlb.c  |  5 +
 target-i386/misc_helper.c | 27 ---
 translate-all.c   |  2 ++
 vl.c  |  6 ++
 5 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/cpus.c b/cpus.c
index 2550be2..154a081 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1232,23 +1232,7 @@ bool qemu_mutex_iothread_locked(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->thread) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_thread(first_cpu);
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
-    iothread_locked = true;
+    qemu_mutex_lock(&qemu_global_mutex);
 }
 
 void qemu_mutex_unlock_iothread(void)
@@ -1469,7 +1453,9 @@ static int tcg_cpu_exec(CPUState *cpu)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+    qemu_mutex_unlock_iothread();
     ret = cpu_exec(cpu);
+    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
 tcg_time += profile_getclock() - ti;
 #endif
diff --git a/cputlb.c b/cputlb.c
index a506086..79fff1c 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -30,6 +30,9 @@
 #include "exec/ram_addr.h"
 #include "tcg/tcg.h"
 
+void qemu_mutex_lock_iothread(void);
+void qemu_mutex_unlock_iothread(void);
+
 //#define DEBUG_TLB
 //#define DEBUG_TLB_CHECK
 
@@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
    can be detected */
 void tlb_protect_code(ram_addr_t ram_addr)
 {
+    qemu_mutex_lock_iothread();
     cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
                                              DIRTY_MEMORY_CODE);
+    qemu_mutex_unlock_iothread();
 }
 
 /* update the TLB so that writes in physical page 'phys_addr' are no longer
diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
index 52c5d65..55f63bf 100644
--- a/target-i386/misc_helper.c
+++ b/target-i386/misc_helper.c
@@ -27,8 +27,10 @@ void helper_outb(CPUX86State *env, uint32_t port, uint32_t data)
 #ifdef CONFIG_USER_ONLY
     fprintf(stderr, "outb: port=0x%04x, data=%02x\n", port, data);
 #else
+    qemu_mutex_lock_iothread();
     address_space_stb(&address_space_io, port, data,
                       cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
 #endif
 }
 
@@ -38,8 +40,13 @@ target_ulong helper_inb(CPUX86State *env, uint32_t port)
     fprintf(stderr, "inb: port=0x%04x\n", port);
     return 0;
 #else
-    return address_space_ldub(&address_space_io, port,
+    target_ulong ret;
+
+    qemu_mutex_lock_iothread();
+    ret = address_space_ldub(&address_space_io, port,
                               cpu_get_mem_attrs(env), NULL);
+    qemu_mutex_unlock_iothread();
+    return ret;
 #endif
 }
 
@@ -48,8 +55,10 @@ void helper_outw(CPUX86State *env, uint32_t port, uint32_t data)
 #ifdef CONFIG_USER_ONLY
     fprintf(stderr, "outw: port=0x%04x, data=%04x\n", port, data);
 #else
+    qemu_mutex_lock_iothread();
 

Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution

2015-08-10 Thread Paolo Bonzini


On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:
  void qemu_mutex_lock_iothread(void)
  {
 -    atomic_inc(&iothread_requesting_mutex);
 -    /* In the simple case there is no need to bump the VCPU thread out of
 -     * TCG code execution.
 -     */
 -    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
 -        !first_cpu || !first_cpu->thread) {
 -        qemu_mutex_lock(&qemu_global_mutex);
 -        atomic_dec(&iothread_requesting_mutex);
 -    } else {
 -        if (qemu_mutex_trylock(&qemu_global_mutex)) {
 -            qemu_cpu_kick_thread(first_cpu);
 -            qemu_mutex_lock(&qemu_global_mutex);
 -        }
 -        atomic_dec(&iothread_requesting_mutex);
 -        qemu_cond_broadcast(&qemu_io_proceeded_cond);
 -    }
 -    iothread_locked = true;

iothread_locked = true must be kept.  Otherwise... yay! :)

 @@ -125,8 +128,10 @@ void tlb_flush_page(CPUState *cpu, target_ulong addr)
     can be detected */
  void tlb_protect_code(ram_addr_t ram_addr)
  {
 +    qemu_mutex_lock_iothread();
      cpu_physical_memory_test_and_clear_dirty(ram_addr, TARGET_PAGE_SIZE,
                                               DIRTY_MEMORY_CODE);
 +    qemu_mutex_unlock_iothread();
  }
  

Not needed anymore.

 diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
 index 52c5d65..55f63bf 100644
 --- a/target-i386/misc_helper.c
 +++ b/target-i386/misc_helper.c

None of this is needed anymore either! :)

 +    /*
 +     * Some devices' reset handlers need to grab the global mutex, so
 +     * release it here.
 +     */
 +    qemu_mutex_unlock_iothread();
      /* reset all devices */
      QTAILQ_FOREACH_SAFE(re, &reset_handlers, entry, nre) {
          re->func(re->opaque);
      }
 +    qemu_mutex_lock_iothread();

Should never have been true?  (And, I think, it was pointed out in a
previous version too).

Paolo