Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-26 Thread Paolo Bonzini


On 24/09/2016 22:44, Richard Henderson wrote:
> On 09/24/2016 04:51 AM, Paolo Bonzini wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Richard Henderson" 
>>> To: "Paolo Bonzini" , qemu-devel@nongnu.org
>>> Cc: "serge fdrv" , c...@braap.org, "alex
>>> bennee" , "sergey fedorov"
>>> 
>>> Sent: Friday, September 23, 2016 8:06:09 PM
>>> Subject: Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe
>>>
>>> On 09/23/2016 12:31 AM, Paolo Bonzini wrote:
>>>> +    unsigned tb_flush_req = (unsigned) (uintptr_t) data;
>>>
>>> Extra cast?
>>>
>>>> -    tcg_ctx.tb_ctx.tb_flush_count++;
>>>> +    atomic_inc(&tcg_ctx.tb_ctx.tb_flush_count);
>>>
>>> Since this is the only place this value is incremented, and we're under a
>>> lock, it should be cheaper to use
>>>
>>>atomic_mb_set(&tcg_ctx.tb_ctx.tb_flush_count, tb_flush_req + 1);
>>
>> Even atomic_set will do.  Though it's not really a fast path, which is
>> why I went for atomic_inc.
> 
> Don't we need the flush to be complete before the new count is seen? 
> That's why I was suggesting the mb_set.

You're right that mb_set is more correct; even better would be
a store-release.

Actually even atomic_set works, though in a fairly surprising manner:
the final check is done by do_tb_flush under the mutex, so it does
wait for the flush to be complete.

If tb_flush_count is exposed too early to tb_flush, all that can happen
is that do_tb_flush sees a tb_flush_req that's more recent than it
should.  do_tb_flush then incorrectly redoes the flush, but that's not a
disaster.

But I'm changing it to mb_set as you suggested.
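
Concretely, the tail of do_tb_flush would then end up looking like this
(just a sketch; the done label comes from the goto earlier in the
function):

    /* Publish the new count only after the flush has completed, so
     * that anyone who reads the new count also sees the flushed
     * translation buffer.
     */
    atomic_mb_set(&tcg_ctx.tb_ctx.tb_flush_count, tb_flush_req + 1);

done:
    tb_unlock();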

Paolo



Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-24 Thread Richard Henderson

On 09/24/2016 04:51 AM, Paolo Bonzini wrote:
> 
> 
> ----- Original Message -----
>> From: "Richard Henderson" 
>> To: "Paolo Bonzini" , qemu-devel@nongnu.org
>> Cc: "serge fdrv" , c...@braap.org, "alex bennee" 
>> , "sergey fedorov"
>> 
>> Sent: Friday, September 23, 2016 8:06:09 PM
>> Subject: Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe
>> 
>> On 09/23/2016 12:31 AM, Paolo Bonzini wrote:
>>> +    unsigned tb_flush_req = (unsigned) (uintptr_t) data;
>> 
>> Extra cast?
>> 
>>> -    tcg_ctx.tb_ctx.tb_flush_count++;
>>> +    atomic_inc(&tcg_ctx.tb_ctx.tb_flush_count);
>> 
>> Since this is the only place this value is incremented, and we're under a
>> lock, it should be cheaper to use
>> 
>>    atomic_mb_set(&tcg_ctx.tb_ctx.tb_flush_count, tb_flush_req + 1);
> 
> Even atomic_set will do.  Though it's not really a fast path, which is
> why I went for atomic_inc.

Don't we need the flush to be complete before the new count is seen?  That's
why I was suggesting the mb_set.



r~



Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-24 Thread Paolo Bonzini


----- Original Message -----
> From: "Richard Henderson" 
> To: "Paolo Bonzini" , qemu-devel@nongnu.org
> Cc: "serge fdrv" , c...@braap.org, "alex bennee" 
> , "sergey fedorov"
> 
> Sent: Friday, September 23, 2016 8:06:09 PM
> Subject: Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe
> 
> On 09/23/2016 12:31 AM, Paolo Bonzini wrote:
> > +    unsigned tb_flush_req = (unsigned) (uintptr_t) data;
> 
> Extra cast?
> 
> > -    tcg_ctx.tb_ctx.tb_flush_count++;
> > +    atomic_inc(&tcg_ctx.tb_ctx.tb_flush_count);
> 
> Since this is the only place this value is incremented, and we're under a
> lock, it should be cheaper to use
> 
>atomic_mb_set(&tcg_ctx.tb_ctx.tb_flush_count, tb_flush_req + 1);

Even atomic_set will do.  Though it's not really a fast path, which is
why I went for atomic_inc.

> > +    uintptr_t tb_flush_req = (uintptr_t)
> > +        atomic_read(&tcg_ctx.tb_ctx.tb_flush_count);
> 
> Extra cast?

Yeah.

Paolo

> That said, it's correct as-is so,
> 
> Reviewed-by: Richard Henderson 
> 
> 
> r~
> 



Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-23 Thread Richard Henderson

On 09/23/2016 12:31 AM, Paolo Bonzini wrote:
> +    unsigned tb_flush_req = (unsigned) (uintptr_t) data;

Extra cast?

> -    tcg_ctx.tb_ctx.tb_flush_count++;
> +    atomic_inc(&tcg_ctx.tb_ctx.tb_flush_count);

Since this is the only place this value is incremented, and we're under a lock,
it should be cheaper to use

   atomic_mb_set(&tcg_ctx.tb_ctx.tb_flush_count, tb_flush_req + 1);

> +    uintptr_t tb_flush_req = (uintptr_t)
> +        atomic_read(&tcg_ctx.tb_ctx.tb_flush_count);

Extra cast?

That said, it's correct as-is so,

Reviewed-by: Richard Henderson 


r~



[Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-23 Thread Paolo Bonzini
From: Sergey Fedorov 

Use async_safe_run_on_cpu() to make tb_flush() thread safe.  This is
possible now that code generation does not happen in the middle of
execution.

It can happen that multiple threads schedule safe work to flush the
translation buffer. To keep statistics and debugging output sane, always
check if the translation buffer has already been flushed.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
[AJB: minor re-base fixes]
Signed-off-by: Alex Bennée 
Message-Id: <1470158864-17651-13-git-send-email-alex.ben...@linaro.org>
Signed-off-by: Paolo Bonzini 
---
 cpu-exec.c                | 12 ++----------
 include/exec/tb-context.h |  2 +-
 include/qom/cpu.h         |  2 --
 translate-all.c           | 38 ++++++++++++++++++++++++--------------
 4 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 9f4bd0b..8823d23 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -204,20 +204,16 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
                              TranslationBlock *orig_tb, bool ignore_icount)
 {
     TranslationBlock *tb;
-    bool old_tb_flushed;
 
     /* Should never happen.
        We only end up here when an existing TB is too long.  */
     if (max_cycles > CF_COUNT_MASK)
         max_cycles = CF_COUNT_MASK;
 
-    old_tb_flushed = cpu->tb_flushed;
-    cpu->tb_flushed = false;
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-    tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
-    cpu->tb_flushed |= old_tb_flushed;
+    tb->orig_tb = orig_tb;
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
     cpu_tb_exec(cpu, tb);
@@ -338,10 +334,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
             tb_lock();
             have_tb_lock = true;
         }
-        /* Check if translation buffer has been flushed */
-        if (cpu->tb_flushed) {
-            cpu->tb_flushed = false;
-        } else if (!tb->invalid) {
+        if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
@@ -606,7 +599,6 @@ int cpu_exec(CPUState *cpu)
             break;
         }
 
-        atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
         for(;;) {
             cpu_handle_interrupt(cpu, &last_tb);
             tb = tb_find(cpu, last_tb, tb_exit);
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index dce95d9..c7f17f2 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -38,7 +38,7 @@ struct TBContext {
     QemuMutex tb_lock;
 
     /* statistics */
-    int tb_flush_count;
+    unsigned tb_flush_count;
     int tb_phys_invalidate_count;
 };
 
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 4092dd9..5dfe74a 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -253,7 +253,6 @@ struct qemu_work_item;
  * @crash_occurred: Indicates the OS reported a crash (panic) for this CPU
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *                CPU and return to its top level loop.
- * @tb_flushed: Indicates the translation buffer has been flushed.
  * @singlestep_enabled: Flags for single-stepping.
  * @icount_extra: Instructions until next timer event.
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
@@ -306,7 +305,6 @@ struct CPUState {
     bool unplug;
     bool crash_occurred;
     bool exit_request;
-    bool tb_flushed;
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
diff --git a/translate-all.c b/translate-all.c
index e9bc90c..1385736 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -834,12 +834,19 @@ static void page_flush_tb(void)
 }
 
 /* flush all the translation blocks */
-/* XXX: tb_flush is currently not thread safe */
-void tb_flush(CPUState *cpu)
+static void do_tb_flush(CPUState *cpu, void *data)
 {
-    if (!tcg_enabled()) {
-        return;
+    unsigned tb_flush_req = (unsigned) (uintptr_t) data;
+
+    tb_lock();
+
+    /* If it's already been done on request of another CPU,
+     * just retry.
+     */
+    if (tcg_ctx.tb_ctx.tb_flush_count != tb_flush_req) {
+        goto done;
     }
+
 #if defined(DEBUG_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
@@ -858,7 +865,6 @@ void tb_flush(CPUState *cpu)
         for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
             atomic_set(&cpu->tb_jmp_cache[i], NULL);
         }
-        atomic_mb_set(&cpu->tb_flushed, true);
     }
 
     tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -868,7 +874,19 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
-    tcg_ctx.tb_ctx.tb_flush_count++;
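
(The archive truncates the patch here. Pieced together from the hunks
quoted in the review replies, the remainder of the hunk would read
roughly as follows; this is a sketch, not the verbatim posting:)

+    atomic_inc(&tcg_ctx.tb_ctx.tb_flush_count);
+
+done:
+    tb_unlock();
+}
+
+void tb_flush(CPUState *cpu)
+{
+    if (tcg_enabled()) {
+        uintptr_t tb_flush_req = (uintptr_t)
+            atomic_read(&tcg_ctx.tb_ctx.tb_flush_count);
+        async_safe_run_on_cpu(cpu, do_tb_flush, (void *) tb_flush_req);
+    }
+}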

Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-21 Thread Paolo Bonzini


On 21/09/2016 19:37, Emilio G. Cota wrote:
> On Wed, Sep 21, 2016 at 18:19:26 +0200, Paolo Bonzini wrote:
>>
>>
>> On 21/09/2016 18:05, Emilio G. Cota wrote:
>>>> +    tb_lock();
>>>> +
>>>> +    /* If it's already been done on request of another CPU,
>>>> +     * just retry.
>>>> +     */
>>>> +    if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) != tb_flush_req) {
>>>> +        goto done;
>>> tb_flush_count is always accessed with tb_lock held, right? If so, I don't
>>> see a reason to access it with atomic_read/set.
>>
>> tb_flush accesses it outside tb_lock.  Technically this one you're
>> quoting need not use atomic_read, but others need to.
> 
> Sorry for being thick, but when does tb_flush not own tb_lock?
> (I'm assuming we're talking only user-mode, since full-system has
> for now empty tb_lock/unlock helpers.)

When called from the gdbstub I think it doesn't (and in system-mode
there are other such cases too, so better to be ready anyway).

Paolo



Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-21 Thread Emilio G. Cota
On Wed, Sep 21, 2016 at 18:19:26 +0200, Paolo Bonzini wrote:
> 
> 
> On 21/09/2016 18:05, Emilio G. Cota wrote:
>>> +    tb_lock();
>>> +
>>> +    /* If it's already been done on request of another CPU,
>>> +     * just retry.
>>> +     */
>>> +    if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) != tb_flush_req) {
>>> +        goto done;
> > tb_flush_count is always accessed with tb_lock held, right? If so, I don't
> > see a reason to access it with atomic_read/set.
> 
> tb_flush accesses it outside tb_lock.  Technically this one you're
> quoting need not use atomic_read, but others need to.

Sorry for being thick, but when does tb_flush not own tb_lock?
(I'm assuming we're talking only user-mode, since full-system has
for now empty tb_lock/unlock helpers.)

E.



Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-21 Thread Paolo Bonzini


On 21/09/2016 18:05, Emilio G. Cota wrote:
>> +    tb_lock();
>> +
>> +    /* If it's already been done on request of another CPU,
>> +     * just retry.
>> +     */
>> +    if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) != tb_flush_req) {
>> +        goto done;
> tb_flush_count is always accessed with tb_lock held, right? If so, I don't
> see a reason to access it with atomic_read/set.

tb_flush accesses it outside tb_lock.  Technically this one you're
quoting need not use atomic_read, but others need to.

>> +    cpu_fprintf(f, "TB flush count  %d\n",
>> +                atomic_read(&tcg_ctx.tb_ctx.tb_flush_count));
> 
>  s/%d/%u/ would be more appropriate given the type change.


Ok.
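  The line then becomes (sketch):

    cpu_fprintf(f, "TB flush count  %u\n",
                atomic_read(&tcg_ctx.tb_ctx.tb_flush_count));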

Paolo



Re: [Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-21 Thread Emilio G. Cota
On Mon, Sep 19, 2016 at 14:50:58 +0200, Paolo Bonzini wrote:
> From: Sergey Fedorov 
> 
> Use async_safe_run_on_cpu() to make tb_flush() thread safe.  This is
> possible now that code generation does not happen in the middle of
> execution.
> 
> It can happen that multiple threads schedule safe work to flush the
> translation buffer. To keep statistics and debugging output sane, always
> check if the translation buffer has already been flushed.
> 
> Signed-off-by: Sergey Fedorov 
> Signed-off-by: Sergey Fedorov 
> [AJB: minor re-base fixes]
> Signed-off-by: Alex Bennée 
> Message-Id: <1470158864-17651-13-git-send-email-alex.ben...@linaro.org>
> Signed-off-by: Paolo Bonzini 
> ---
(snip)
> @@ -38,7 +38,7 @@ struct TBContext {
>      QemuMutex tb_lock;
>  
>      /* statistics */
> -    int tb_flush_count;
> +    unsigned tb_flush_count;
(snip)
>  /* flush all the translation blocks */
> -/* XXX: tb_flush is currently not thread safe */
> -void tb_flush(CPUState *cpu)
> +static void do_tb_flush(CPUState *cpu, void *data)
>  {
> -    if (!tcg_enabled()) {
> -        return;
> +    unsigned tb_flush_req = (unsigned) (uintptr_t) data;
> +
> +    tb_lock();
> +
> +    /* If it's already been done on request of another CPU,
> +     * just retry.
> +     */
> +    if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) != tb_flush_req) {
> +        goto done;

tb_flush_count is always accessed with tb_lock held, right? If so, I don't
see a reason to access it with atomic_read/set.

(snip)
> @@ -1773,7 +1790,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
>      qht_statistics_destroy(&hst);
>  
>      cpu_fprintf(f, "\nStatistics:\n");
> -    cpu_fprintf(f, "TB flush count  %d\n", tcg_ctx.tb_ctx.tb_flush_count);
> +    cpu_fprintf(f, "TB flush count  %d\n",
> +                atomic_read(&tcg_ctx.tb_ctx.tb_flush_count));

 s/%d/%u/ would be more appropriate given the type change.

E.



[Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-19 Thread Paolo Bonzini
From: Sergey Fedorov 

Use async_safe_run_on_cpu() to make tb_flush() thread safe.  This is
possible now that code generation does not happen in the middle of
execution.

It can happen that multiple threads schedule safe work to flush the
translation buffer. To keep statistics and debugging output sane, always
check if the translation buffer has already been flushed.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
[AJB: minor re-base fixes]
Signed-off-by: Alex Bennée 
Message-Id: <1470158864-17651-13-git-send-email-alex.ben...@linaro.org>
Signed-off-by: Paolo Bonzini 
---
 cpu-exec.c                | 12 ++----------
 include/exec/tb-context.h |  2 +-
 include/qom/cpu.h         |  2 --
 translate-all.c           | 38 ++++++++++++++++++++++++--------------
 4 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index b240b9f..a8ff2a1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -203,20 +203,16 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
                              TranslationBlock *orig_tb, bool ignore_icount)
 {
     TranslationBlock *tb;
-    bool old_tb_flushed;
 
     /* Should never happen.
        We only end up here when an existing TB is too long.  */
     if (max_cycles > CF_COUNT_MASK)
         max_cycles = CF_COUNT_MASK;
 
-    old_tb_flushed = cpu->tb_flushed;
-    cpu->tb_flushed = false;
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-    tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
-    cpu->tb_flushed |= old_tb_flushed;
+    tb->orig_tb = orig_tb;
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
     cpu_tb_exec(cpu, tb);
@@ -337,10 +333,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
             tb_lock();
             have_tb_lock = true;
         }
-        /* Check if translation buffer has been flushed */
-        if (cpu->tb_flushed) {
-            cpu->tb_flushed = false;
-        } else if (!tb->invalid) {
+        if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
@@ -605,7 +598,6 @@ int cpu_exec(CPUState *cpu)
             break;
         }
 
-        atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
         for(;;) {
             cpu_handle_interrupt(cpu, &last_tb);
             tb = tb_find(cpu, last_tb, tb_exit);
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index dce95d9..c7f17f2 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -38,7 +38,7 @@ struct TBContext {
     QemuMutex tb_lock;
 
     /* statistics */
-    int tb_flush_count;
+    unsigned tb_flush_count;
     int tb_phys_invalidate_count;
 };
 
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 4092dd9..5dfe74a 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -253,7 +253,6 @@ struct qemu_work_item;
  * @crash_occurred: Indicates the OS reported a crash (panic) for this CPU
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *                CPU and return to its top level loop.
- * @tb_flushed: Indicates the translation buffer has been flushed.
  * @singlestep_enabled: Flags for single-stepping.
  * @icount_extra: Instructions until next timer event.
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
@@ -306,7 +305,6 @@ struct CPUState {
     bool unplug;
     bool crash_occurred;
     bool exit_request;
-    bool tb_flushed;
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
diff --git a/translate-all.c b/translate-all.c
index b6663dc..ab657e7 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -832,12 +832,19 @@ static void page_flush_tb(void)
 }
 
 /* flush all the translation blocks */
-/* XXX: tb_flush is currently not thread safe */
-void tb_flush(CPUState *cpu)
+static void do_tb_flush(CPUState *cpu, void *data)
 {
-    if (!tcg_enabled()) {
-        return;
+    unsigned tb_flush_req = (unsigned) (uintptr_t) data;
+
+    tb_lock();
+
+    /* If it's already been done on request of another CPU,
+     * just retry.
+     */
+    if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) != tb_flush_req) {
+        goto done;
     }
+
 #if defined(DEBUG_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
@@ -856,7 +863,6 @@ void tb_flush(CPUState *cpu)
         for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
             atomic_set(&cpu->tb_jmp_cache[i], NULL);
         }
-        atomic_mb_set(&cpu->tb_flushed, true);
     }
 
     tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -866,7 +872,19 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
-    tcg_ctx.tb_ctx.tb_flush_count++;

[Qemu-devel] [PATCH 15/16] tcg: Make tb_flush() thread safe

2016-09-12 Thread Paolo Bonzini
From: Sergey Fedorov 

Use async_safe_run_on_cpu() to make tb_flush() thread safe.  This is
possible now that code generation does not happen in the middle of
execution.

It can happen that multiple threads schedule safe work to flush the
translation buffer. To keep statistics and debugging output sane, always
check if the translation buffer has already been flushed.

Signed-off-by: Sergey Fedorov 
Signed-off-by: Sergey Fedorov 
[AJB: minor re-base fixes]
Signed-off-by: Alex Bennée 
Message-Id: <1470158864-17651-13-git-send-email-alex.ben...@linaro.org>
Signed-off-by: Paolo Bonzini 
---
 cpu-exec.c                | 12 ++----------
 include/exec/tb-context.h |  2 +-
 include/qom/cpu.h         |  2 --
 translate-all.c           | 38 ++++++++++++++++++++++++--------------
 4 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index b240b9f..a8ff2a1 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -203,20 +203,16 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
                              TranslationBlock *orig_tb, bool ignore_icount)
 {
     TranslationBlock *tb;
-    bool old_tb_flushed;
 
     /* Should never happen.
        We only end up here when an existing TB is too long.  */
    if (max_cycles > CF_COUNT_MASK)
        max_cycles = CF_COUNT_MASK;
 
-    old_tb_flushed = cpu->tb_flushed;
-    cpu->tb_flushed = false;
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-    tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
-    cpu->tb_flushed |= old_tb_flushed;
+    tb->orig_tb = orig_tb;
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
     cpu_tb_exec(cpu, tb);
@@ -337,10 +333,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
             tb_lock();
             have_tb_lock = true;
         }
-        /* Check if translation buffer has been flushed */
-        if (cpu->tb_flushed) {
-            cpu->tb_flushed = false;
-        } else if (!tb->invalid) {
+        if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
@@ -605,7 +598,6 @@ int cpu_exec(CPUState *cpu)
             break;
         }
 
-        atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
         for(;;) {
             cpu_handle_interrupt(cpu, &last_tb);
             tb = tb_find(cpu, last_tb, tb_exit);
diff --git a/include/exec/tb-context.h b/include/exec/tb-context.h
index dce95d9..c7f17f2 100644
--- a/include/exec/tb-context.h
+++ b/include/exec/tb-context.h
@@ -38,7 +38,7 @@ struct TBContext {
     QemuMutex tb_lock;
 
     /* statistics */
-    int tb_flush_count;
+    unsigned tb_flush_count;
     int tb_phys_invalidate_count;
 };
 
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index d1ca31c..3eb595c 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -253,7 +253,6 @@ struct qemu_work_item;
  * @crash_occurred: Indicates the OS reported a crash (panic) for this CPU
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *                CPU and return to its top level loop.
- * @tb_flushed: Indicates the translation buffer has been flushed.
  * @singlestep_enabled: Flags for single-stepping.
  * @icount_extra: Instructions until next timer event.
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
@@ -306,7 +305,6 @@ struct CPUState {
     bool unplug;
     bool crash_occurred;
     bool exit_request;
-    bool tb_flushed;
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
diff --git a/translate-all.c b/translate-all.c
index b6663dc..ab657e7 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -832,12 +832,19 @@ static void page_flush_tb(void)
 }
 
 /* flush all the translation blocks */
-/* XXX: tb_flush is currently not thread safe */
-void tb_flush(CPUState *cpu)
+static void do_tb_flush(CPUState *cpu, void *data)
 {
-    if (!tcg_enabled()) {
-        return;
+    unsigned tb_flush_req = (unsigned) (uintptr_t) data;
+
+    tb_lock();
+
+    /* If it's already been done on request of another CPU,
+     * just retry.
+     */
+    if (atomic_read(&tcg_ctx.tb_ctx.tb_flush_count) != tb_flush_req) {
+        goto done;
     }
+
 #if defined(DEBUG_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
@@ -856,7 +863,6 @@ void tb_flush(CPUState *cpu)
         for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
             atomic_set(&cpu->tb_jmp_cache[i], NULL);
         }
-        atomic_mb_set(&cpu->tb_flushed, true);
     }
 
     tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -866,7 +872,19 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.code_gen_ptr = tcg_ctx.code_gen_buffer;
     /* XXX: flush processor icache at this point if cache flush is
        expensive */
-    tcg_ctx.tb_ctx.tb_flush_count++;