Re: [Qemu-devel] [PATCH 12/16] translate-all: discard TB when tb_link_page returns an existing matching TB
On Thu, Mar 29, 2018 at 16:19:39 +0100, Alex Bennée wrote: > > Emilio G. Cotawrites: > > > Use the recently-gained QHT feature of returning the matching TB if it > > already exists. This allows us to get rid of the lookup we perform > > right after acquiring tb_lock. > > > > Suggested-by: Richard Henderson > > Signed-off-by: Emilio G. Cota > > --- > > accel/tcg/cpu-exec.c | 14 ++ > > accel/tcg/translate-all.c | 47 > > ++- > > 2 files changed, 40 insertions(+), 21 deletions(-) > > > > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c > > index 7c83887..8aed38c 100644 > > --- a/accel/tcg/cpu-exec.c > > +++ b/accel/tcg/cpu-exec.c > > @@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu) > > if (tb == NULL) { > > mmap_lock(); > > tb_lock(); > > -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask); > > -if (likely(tb == NULL)) { > > -tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); > > -} > > +tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); > > tb_gen_code needs to be renamed to reflect it's semantics. > tb_get_or_gen_code? Or maybe tb_get_code with a sub-helper to do the > generation. I think it can remain as tb_gen_code. The caller still gets a TB, and whether that TB has been generated by this thread or any other thread is irrelevant. (snip) > > --- a/accel/tcg/translate-all.c > > +++ b/accel/tcg/translate-all.c > > @@ -1503,12 +1503,16 @@ static inline void tb_page_add(PageDesc *p, > > TranslationBlock *tb, > > * (-1) to indicate that only one page contains the TB. > > * > > * Called with mmap_lock held for user-mode emulation. > > + * > > + * Returns @tb or an existing TB that matches @tb. > > That's just confusing to read. So this returns a TB like the @tb we > passed in but actually a different one matching the same conditions? Good point. Here tb_link_page is not a great name, but instead of adding a long name such as tb_link_page_or_get_existing, in v2 I've expanded the above comment. It now looks as follows: * Returns a pointer @tb, or a pointer to an existing TB that matches @tb. * Note that in !user-mode, another thread might have already added a TB * for the same block of guest code that @tb corresponds to. In that case, * the caller should discard the original @tb, and use instead the returned TB. > > @@ -1706,7 +1727,15 @@ TranslationBlock *tb_gen_code(CPUState *cpu, > > * memory barrier is required before tb_link_page() makes the TB > > visible > > * through the physical hash table and physical page list. > > */ > > -tb_link_page(tb, phys_pc, phys_page2); > > +existing_tb = tb_link_page(tb, phys_pc, phys_page2); > > +/* if the TB already exists, discard what we just translated */ > > So are we in the position now that we could potentially do a translation > but be beaten by another thread generating the same code? Exactly. > I suspect we could > do with a bit of explanatory commentary for the tb_gen_code functions. As I said above I don't think tb_gen_code changes at all to its callers, since the caller still gets a TB pointer that it did not have before. tb_link_page is the key here -- I hope the updated comment I quoted above is enough to make things clear. > Also I think the "Translation Blocks" section needs updating in the > MTTCG design document to make this clear. I've added a comment at the bottom of that section: Parallel code generation is supported. QHT is used at insertion time as the synchronization point across threads, thereby ensuring that we only keep track of a single TranslationBlock for each guest code block. > I'm curious if we should be counting unused translations somewhere in > the JIT stats. I'm guessing you need to work at a pathalogical case to > hit this much? This should be extremely rare on most workloads. Given that and the fact that we won't have unused translated code (we discard it by resetting code_gen_ptr), I wouldn't worry too much about this. In the unlikely case that it ever became a problem, TCG profiling time would account for it, and on a perf profile we'd see the slow path in tb_link_page being taken. Thanks, Emilio
Re: [Qemu-devel] [PATCH 12/16] translate-all: discard TB when tb_link_page returns an existing matching TB
Emilio G. Cotawrites: > Use the recently-gained QHT feature of returning the matching TB if it > already exists. This allows us to get rid of the lookup we perform > right after acquiring tb_lock. > > Suggested-by: Richard Henderson > Signed-off-by: Emilio G. Cota > --- > accel/tcg/cpu-exec.c | 14 ++ > accel/tcg/translate-all.c | 47 > ++- > 2 files changed, 40 insertions(+), 21 deletions(-) > > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c > index 7c83887..8aed38c 100644 > --- a/accel/tcg/cpu-exec.c > +++ b/accel/tcg/cpu-exec.c > @@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu) > if (tb == NULL) { > mmap_lock(); > tb_lock(); > -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask); > -if (likely(tb == NULL)) { > -tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); > -} > +tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); tb_gen_code needs to be renamed to reflect it's semantics. tb_get_or_gen_code? Or maybe tb_get_code with a sub-helper to do the generation. > tb_unlock(); > mmap_unlock(); > } > @@ -396,14 +393,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu, > tb_lock(); > acquired_tb_lock = true; > > -/* There's a chance that our desired tb has been translated while > - * taking the locks so we check again inside the lock. > - */ > -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask); > -if (likely(tb == NULL)) { > -/* if no translated code available, then translate it now */ > -tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask); > -} > +tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask); > > mmap_unlock(); > /* We add the TB in the virtual pc hash table for the fast lookup */ > diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c > index 82832ef..dbe6c12 100644 > --- a/accel/tcg/translate-all.c > +++ b/accel/tcg/translate-all.c > @@ -1503,12 +1503,16 @@ static inline void tb_page_add(PageDesc *p, > TranslationBlock *tb, > * (-1) to indicate that only one page contains the TB. > * > * Called with mmap_lock held for user-mode emulation. > + * > + * Returns @tb or an existing TB that matches @tb. That's just confusing to read. So this returns a TB like the @tb we passed in but actually a different one matching the same conditions? > */ > -static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, > - tb_page_addr_t phys_page2) > +static TranslationBlock * > +tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, > + tb_page_addr_t phys_page2) > { > PageDesc *p; > PageDesc *p2 = NULL; > +void *existing_tb; > uint32_t h; > > assert_memory_lock(); > @@ -1516,6 +1520,11 @@ static void tb_link_page(TranslationBlock *tb, > tb_page_addr_t phys_pc, > /* > * Add the TB to the page list. > * To avoid deadlock, acquire first the lock of the lower-addressed page. > + * We keep the locks held until after inserting the TB in the hash table, > + * so that if the insertion fails we know for sure that the TBs are still > + * in the page descriptors. > + * Note that inserting into the hash table first isn't an option, since > + * we can only insert TBs that are fully initialized. > */ > p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1); > if (likely(phys_page2 == -1)) { > @@ -1535,21 +1544,33 @@ static void tb_link_page(TranslationBlock *tb, > tb_page_addr_t phys_pc, > tb_page_add(p2, tb, 1, phys_page2); > } > > +/* add in the hash table */ > +h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK, > + tb->trace_vcpu_dstate); > +existing_tb = qht_insert(_ctx.htable, tb, h); modulo comments about qht_insert API earlier in the series. > + > +/* remove TB from the page(s) if we couldn't insert it */ > +if (unlikely(existing_tb)) { > +tb_page_remove(p, tb); > +invalidate_page_bitmap(p); > +if (p2) { > +tb_page_remove(p2, tb); > +invalidate_page_bitmap(p2); > +} > +tb = existing_tb; > +} > + > if (p2) { > page_unlock(p2); > } > page_unlock(p); > > -/* add in the hash table */ > -h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK, > - tb->trace_vcpu_dstate); > -qht_insert(_ctx.htable, tb, h); > - > #ifdef CONFIG_USER_ONLY > if (DEBUG_TB_CHECK_GATE) { > tb_page_check(); > } > #endif > +return tb; > } > > /* Called with mmap_lock held for user mode emulation. */ > @@ -1558,7 +1579,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
[Qemu-devel] [PATCH 12/16] translate-all: discard TB when tb_link_page returns an existing matching TB
Use the recently-gained QHT feature of returning the matching TB if it already exists. This allows us to get rid of the lookup we perform right after acquiring tb_lock. Suggested-by: Richard HendersonSigned-off-by: Emilio G. Cota --- accel/tcg/cpu-exec.c | 14 ++ accel/tcg/translate-all.c | 47 ++- 2 files changed, 40 insertions(+), 21 deletions(-) diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c index 7c83887..8aed38c 100644 --- a/accel/tcg/cpu-exec.c +++ b/accel/tcg/cpu-exec.c @@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu) if (tb == NULL) { mmap_lock(); tb_lock(); -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask); -if (likely(tb == NULL)) { -tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); -} +tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); tb_unlock(); mmap_unlock(); } @@ -396,14 +393,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu, tb_lock(); acquired_tb_lock = true; -/* There's a chance that our desired tb has been translated while - * taking the locks so we check again inside the lock. - */ -tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask); -if (likely(tb == NULL)) { -/* if no translated code available, then translate it now */ -tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask); -} +tb = tb_gen_code(cpu, pc, cs_base, flags, cf_mask); mmap_unlock(); /* We add the TB in the virtual pc hash table for the fast lookup */ diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index 82832ef..dbe6c12 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -1503,12 +1503,16 @@ static inline void tb_page_add(PageDesc *p, TranslationBlock *tb, * (-1) to indicate that only one page contains the TB. * * Called with mmap_lock held for user-mode emulation. + * + * Returns @tb or an existing TB that matches @tb. */ -static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, - tb_page_addr_t phys_page2) +static TranslationBlock * +tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, + tb_page_addr_t phys_page2) { PageDesc *p; PageDesc *p2 = NULL; +void *existing_tb; uint32_t h; assert_memory_lock(); @@ -1516,6 +1520,11 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, /* * Add the TB to the page list. * To avoid deadlock, acquire first the lock of the lower-addressed page. + * We keep the locks held until after inserting the TB in the hash table, + * so that if the insertion fails we know for sure that the TBs are still + * in the page descriptors. + * Note that inserting into the hash table first isn't an option, since + * we can only insert TBs that are fully initialized. */ p = page_find_alloc(phys_pc >> TARGET_PAGE_BITS, 1); if (likely(phys_page2 == -1)) { @@ -1535,21 +1544,33 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc, tb_page_add(p2, tb, 1, phys_page2); } +/* add in the hash table */ +h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK, + tb->trace_vcpu_dstate); +existing_tb = qht_insert(_ctx.htable, tb, h); + +/* remove TB from the page(s) if we couldn't insert it */ +if (unlikely(existing_tb)) { +tb_page_remove(p, tb); +invalidate_page_bitmap(p); +if (p2) { +tb_page_remove(p2, tb); +invalidate_page_bitmap(p2); +} +tb = existing_tb; +} + if (p2) { page_unlock(p2); } page_unlock(p); -/* add in the hash table */ -h = tb_hash_func(phys_pc, tb->pc, tb->flags, tb->cflags & CF_HASH_MASK, - tb->trace_vcpu_dstate); -qht_insert(_ctx.htable, tb, h); - #ifdef CONFIG_USER_ONLY if (DEBUG_TB_CHECK_GATE) { tb_page_check(); } #endif +return tb; } /* Called with mmap_lock held for user mode emulation. */ @@ -1558,7 +1579,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu, uint32_t flags, int cflags) { CPUArchState *env = cpu->env_ptr; -TranslationBlock *tb; +TranslationBlock *tb, *existing_tb; tb_page_addr_t phys_pc, phys_page2; target_ulong virt_page2; tcg_insn_unit *gen_code_buf; @@ -1706,7 +1727,15 @@ TranslationBlock *tb_gen_code(CPUState *cpu, * memory barrier is required before tb_link_page() makes the TB visible * through the physical hash table and physical page list. */ -tb_link_page(tb, phys_pc, phys_page2); +existing_tb = tb_link_page(tb, phys_pc, phys_page2); +/* if the TB