Re: [Qemu-devel] [PATCH 12/16] translate-all: discard TB when tb_link_page returns an existing matching TB

2018-04-05 Thread Emilio G. Cota
On Thu, Mar 29, 2018 at 16:19:39 +0100, Alex Bennée wrote:
> 
> Emilio G. Cota  writes:
> 
> > Use the recently-gained QHT feature of returning the matching TB if it
> > already exists. This allows us to get rid of the lookup we perform
> > right after acquiring tb_lock.
> >
> > Suggested-by: Richard Henderson 
> > Signed-off-by: Emilio G. Cota 
> > ---
> >  accel/tcg/cpu-exec.c  | 14 ++
> >  accel/tcg/translate-all.c | 47 
> > ++-
> >  2 files changed, 40 insertions(+), 21 deletions(-)
> >
> > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> > index 7c83887..8aed38c 100644
> > --- a/accel/tcg/cpu-exec.c
> > +++ b/accel/tcg/cpu-exec.c
> > @@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
> >  if (tb == NULL) {
> >  mmap_lock();
> >  tb_lock();
> > -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
> > -if (likely(tb == NULL)) {
> > -tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
> > -}
> > +tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
> 
> tb_gen_code needs to be renamed to reflect it's semantics.
> tb_get_or_gen_code? Or maybe tb_get_code with a sub-helper to do the
> generation.

I think it can remain as tb_gen_code. The caller still gets
a TB, and whether that TB has been generated by this thread or
any other thread is irrelevant.

(snip)
> > --- a/accel/tcg/translate-all.c
> > +++ b/accel/tcg/translate-all.c
> > @@ -1503,12 +1503,16 @@ static inline void tb_page_add(PageDesc *p, 
> > TranslationBlock *tb,
> >   * (-1) to indicate that only one page contains the TB.
> >   *
> >   * Called with mmap_lock held for user-mode emulation.
> > + *
> > + * Returns @tb or an existing TB that matches @tb.
> 
> That's just confusing to read. So this returns a TB like the @tb we
> passed in but actually a different one matching the same conditions?

Good point. Here tb_link_page is not a great name, but instead
of adding a long name such as tb_link_page_or_get_existing, in
v2 I've expanded the above comment. It now looks as follows:

 * Returns a pointer @tb, or a pointer to an existing TB that matches @tb.
 * Note that in !user-mode, another thread might have already added a TB
 * for the same block of guest code that @tb corresponds to. In that case,
 * the caller should discard the original @tb, and use instead the returned TB.
 
> > @@ -1706,7 +1727,15 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> >   * memory barrier is required before tb_link_page() makes the TB 
> > visible
> >   * through the physical hash table and physical page list.
> >   */
> > -tb_link_page(tb, phys_pc, phys_page2);
> > +existing_tb = tb_link_page(tb, phys_pc, phys_page2);
> > +/* if the TB already exists, discard what we just translated */
> 
> So are we in the position now that we could potentially do a translation
> but be beaten by another thread generating the same code?

Exactly.

> I suspect we could
> do with a bit of explanatory commentary for the tb_gen_code functions.

As I said above I don't think tb_gen_code changes at all
to its callers, since the caller still gets a TB pointer that it
did not have before.

tb_link_page is the key here -- I hope the updated comment
I quoted above is enough to make things clear.

> Also I think the "Translation Blocks" section needs updating in the
> MTTCG design document to make this clear.

I've added a comment at the bottom of that section:

Parallel code generation is supported. QHT is used at insertion time
as the synchronization point across threads, thereby ensuring that we only
keep track of a single TranslationBlock for each guest code block.

> I'm curious if we should be counting unused translations somewhere in
> the JIT stats. I'm guessing you need to work at a pathalogical case to
> hit this much?

This should be extremely rare on most workloads. Given that and the
fact that we won't have unused translated code (we discard it by
resetting code_gen_ptr), I wouldn't worry too much about this.
In the unlikely case that it ever became a problem, TCG profiling
time would account for it, and on a perf profile we'd see the slow
path in tb_link_page being taken.

Thanks,

Emilio



Re: [Qemu-devel] [PATCH 12/16] translate-all: discard TB when tb_link_page returns an existing matching TB

2018-03-29 Thread Alex Bennée

Emilio G. Cota  writes:

> Use the recently-gained QHT feature of returning the matching TB if it
> already exists. This allows us to get rid of the lookup we perform
> right after acquiring tb_lock.
>
> Suggested-by: Richard Henderson 
> Signed-off-by: Emilio G. Cota 
> ---
>  accel/tcg/cpu-exec.c  | 14 ++
>  accel/tcg/translate-all.c | 47 
> ++-
>  2 files changed, 40 insertions(+), 21 deletions(-)
>
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 7c83887..8aed38c 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
>  if (tb == NULL) {
>  mmap_lock();
>  tb_lock();
> -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
> -if (likely(tb == NULL)) {
> -tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
> -}
> +tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);

tb_gen_code needs to be renamed to reflect it's semantics.
tb_get_or_gen_code? Or maybe tb_get_code with a sub-helper to do the
generation.

>  tb_unlock();
>  mmap_unlock();
>  }
> @@ -396,14 +393,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
>  tb_lock();
>  acquired_tb_lock = true;
>
> -/* There's a chance that our desired tb has been translated while
> - * taking the locks so we check again inside the lock.
> - */
> -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
> -if (likely(tb == NULL)) {
> -/* if no translated code available, then translate it now */
> -tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
> -}
> +tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
>
>  mmap_unlock();
>  /* We add the TB in the virtual pc hash table for the fast lookup */
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 82832ef..dbe6c12 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1503,12 +1503,16 @@ static inline void tb_page_add(PageDesc *p, 
> TranslationBlock *tb,
>   * (-1) to indicate that only one page contains the TB.
>   *
>   * Called with mmap_lock held for user-mode emulation.
> + *
> + * Returns @tb or an existing TB that matches @tb.

That's just confusing to read. So this returns a TB like the @tb we
passed in but actually a different one matching the same conditions?

>   */
> -static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
> - tb_page_addr_t phys_page2)
> +static TranslationBlock *
> +tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
> + tb_page_addr_t phys_page2)
>  {
>  PageDesc *p;
>  PageDesc *p2 = NULL;
> +void *existing_tb;
>  uint32_t h;
>
>  assert_memory_lock();
> @@ -1516,6 +1520,11 @@ static void tb_link_page(TranslationBlock *tb, 
> tb_page_addr_t phys_pc,
>  /*
>   * Add the TB to the page list.
>   * To avoid deadlock, acquire first the lock of the lower-addressed page.
> + * We keep the locks held until after inserting the TB in the hash table,
> + * so that if the insertion fails we know for sure that the TBs are still
> + * in the page descriptors.
> + * Note that inserting into the hash table first isn't an option, since
> + * we can only insert TBs that are fully initialized.
>   */
>  p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1);
>  if (likely(phys_page2 == -1)) {
> @@ -1535,21 +1544,33 @@ static void tb_link_page(TranslationBlock *tb, 
> tb_page_addr_t phys_pc,
>  tb_page_add(p2, tb, 1, phys_page2);
>  }
>
> +/* add in the hash table */
> +h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
> + tb->trace_vcpu_dstate);
> +existing_tb = qht_insert(_ctx.htable, tb, h);

modulo comments about qht_insert API earlier in the series.

> +
> +/* remove TB from the page(s) if we couldn't insert it */
> +if (unlikely(existing_tb)) {
> +tb_page_remove(p, tb);
> +invalidate_page_bitmap(p);
> +if (p2) {
> +tb_page_remove(p2, tb);
> +invalidate_page_bitmap(p2);
> +}
> +tb = existing_tb;
> +}
> +
>  if (p2) {
>  page_unlock(p2);
>  }
>  page_unlock(p);
>
> -/* add in the hash table */
> -h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
> - tb->trace_vcpu_dstate);
> -qht_insert(_ctx.htable, tb, h);
> -
>  #ifdef CONFIG_USER_ONLY
>  if (DEBUG_TB_CHECK_GATE) {
>  tb_page_check();
>  }
>  #endif
> +return tb;
>  }
>
>  /* Called with mmap_lock held for user mode emulation.  */
> @@ -1558,7 +1579,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,

[Qemu-devel] [PATCH 12/16] translate-all: discard TB when tb_link_page returns an existing matching TB

2018-02-26 Thread Emilio G. Cota
Use the recently-gained QHT feature of returning the matching TB if it
already exists. This allows us to get rid of the lookup we perform
right after acquiring tb_lock.

Suggested-by: Richard Henderson 
Signed-off-by: Emilio G. Cota 
---
 accel/tcg/cpu-exec.c  | 14 ++
 accel/tcg/translate-all.c | 47 ++-
 2 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 7c83887..8aed38c 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu)
 if (tb == NULL) {
 mmap_lock();
 tb_lock();
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
-if (likely(tb == NULL)) {
-tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
-}
+tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
 tb_unlock();
 mmap_unlock();
 }
@@ -396,14 +393,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
 tb_lock();
 acquired_tb_lock = true;
 
-/* There's a chance that our desired tb has been translated while
- * taking the locks so we check again inside the lock.
- */
-tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask);
-if (likely(tb == NULL)) {
-/* if no translated code available, then translate it now */
-tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
-}
+tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask);
 
 mmap_unlock();
 /* We add the TB in the virtual pc hash table for the fast lookup */
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 82832ef..dbe6c12 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1503,12 +1503,16 @@ static inline void tb_page_add(PageDesc *p, 
TranslationBlock *tb,
  * (-1) to indicate that only one page contains the TB.
  *
  * Called with mmap_lock held for user-mode emulation.
+ *
+ * Returns @tb or an existing TB that matches @tb.
  */
-static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
- tb_page_addr_t phys_page2)
+static TranslationBlock *
+tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
+ tb_page_addr_t phys_page2)
 {
 PageDesc *p;
 PageDesc *p2 = NULL;
+void *existing_tb;
 uint32_t h;
 
 assert_memory_lock();
@@ -1516,6 +1520,11 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 /*
  * Add the TB to the page list.
  * To avoid deadlock, acquire first the lock of the lower-addressed page.
+ * We keep the locks held until after inserting the TB in the hash table,
+ * so that if the insertion fails we know for sure that the TBs are still
+ * in the page descriptors.
+ * Note that inserting into the hash table first isn't an option, since
+ * we can only insert TBs that are fully initialized.
  */
 p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1);
 if (likely(phys_page2 == -1)) {
@@ -1535,21 +1544,33 @@ static void tb_link_page(TranslationBlock *tb, 
tb_page_addr_t phys_pc,
 tb_page_add(p2, tb, 1, phys_page2);
 }
 
+/* add in the hash table */
+h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
+ tb->trace_vcpu_dstate);
+existing_tb = qht_insert(_ctx.htable, tb, h);
+
+/* remove TB from the page(s) if we couldn't insert it */
+if (unlikely(existing_tb)) {
+tb_page_remove(p, tb);
+invalidate_page_bitmap(p);
+if (p2) {
+tb_page_remove(p2, tb);
+invalidate_page_bitmap(p2);
+}
+tb = existing_tb;
+}
+
 if (p2) {
 page_unlock(p2);
 }
 page_unlock(p);
 
-/* add in the hash table */
-h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK,
- tb->trace_vcpu_dstate);
-qht_insert(_ctx.htable, tb, h);
-
 #ifdef CONFIG_USER_ONLY
 if (DEBUG_TB_CHECK_GATE) {
 tb_page_check();
 }
 #endif
+return tb;
 }
 
 /* Called with mmap_lock held for user mode emulation.  */
@@ -1558,7 +1579,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
   uint32_t flags, int cflags)
 {
 CPUArchState *env = cpu->env_ptr;
-TranslationBlock *tb;
+TranslationBlock *tb, *existing_tb;
 tb_page_addr_t phys_pc, phys_page2;
 target_ulong virt_page2;
 tcg_insn_unit *gen_code_buf;
@@ -1706,7 +1727,15 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  * memory barrier is required before tb_link_page() makes the TB visible
  * through the physical hash table and physical page list.
  */
-tb_link_page(tb, phys_pc, phys_page2);
+existing_tb = tb_link_page(tb, phys_pc, phys_page2);
+/* if the TB