Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
On 10/09/2017 12:24 PM, Emilio G. Cota wrote: > On Thu, Oct 05, 2017 at 19:24:16 -0400, Emilio G. Cota wrote: >> I'm including the fixups below. > > Just pushed v5 of this series. I rebased the series on top of the current > master, > and added the already mentioned fixes plus a new one (see below). > > Grab the patches from: > https://github.com/cota/qemu/tree/multi-tcg-v5 Thanks. I was just about to do this rebase myself. > Changes since v4: > - Rebase to current master. (fixed a few conflicts, mostly due to some targets > having been converted to the generic translation loop.) > - Add the removal of CF_COUNT_MASK as a separate "FIXUP" commit to signal > that this still requires attention. > - Fix tb TC comparison function to support deletions (otherwise -icount > breaks) > - [new fix] TCG regions: Rely on max_cpus instead of smp_cpus. Otherwise we'd > break CPU hotplug on targets that support it. (max_cpus and smp_cpus are > set from the command line with -smp foo,maxcpus=bar; max_cpus is always > >= smp_cpus.) Excellent. I'll work on reviewing these changes this week. In the meantime, I've cherry-picked about half of the patch set. r~
Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
On Thu, Oct 05, 2017 at 19:24:16 -0400, Emilio G. Cota wrote: > I'm including the fixups below. Just pushed v5 of this series. I rebased the series on top of the current master, and added the already mentioned fixes plus a new one (see below). Grab the patches from: https://github.com/cota/qemu/tree/multi-tcg-v5 Changes since v4: - Rebase to current master. (fixed a few conflicts, mostly due to some targets having been converted to the generic translation loop.) - Add the removal of CF_COUNT_MASK as a separate "FIXUP" commit to signal that this still requires attention. - Fix tb TC comparison function to support deletions (otherwise -icount breaks) - [new fix] TCG regions: Rely on max_cpus instead of smp_cpus. Otherwise we'd break CPU hotplug on targets that support it. (max_cpus and smp_cpus are set from the command line with -smp foo,maxcpus=bar; max_cpus is always >= smp_cpus.) Thanks, Emilio
Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
On Mon, Sep 25, 2017 at 10:01:15 -0700, Richard Henderson wrote: > On 09/22/2017 01:40 PM, Emilio G. Cota wrote: > > Hi Richard, > > > > Are you planning to get this patchset merged in this window? If so, I can > > give it a respin on top of the current master. > > Yes, I do. I've been intending to look at ... > > > Anyway, before doing so we should fix the issue around CF_COUNT_MASK that > > Pranith reported: > > ... this one and figure out why my intuition is wrong. > I'm at Linaro Connect this week, so it may have to wait til next. I just tested this with icount. Turns out two fixups to this patchset are necessary to not break icount. The first one is (again) to just include CF_PARALLEL in the hash mask. The second is a fix for the comparison function of tb_tc, without which removals don't work. I'm including the fixups below. Thanks, Emilio commit 7a899a8df2d769dd80ba1a7f559cb4ddbb0c568b Author: Emilio G. Cota Date: Thu Oct 5 18:40:30 2017 -0400 FIXUP for "tcg: define CF_PARALLEL ..." Signed-off-by: Emilio G. Cota diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 025fae0..8b2f233 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -326,7 +326,7 @@ struct TranslationBlock { #define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock */ #define CF_PARALLEL0x10 /* Generate code for a parallel context */ /* cflags' mask for hashing/comparison */ -#define CF_HASH_MASK (CF_COUNT_MASK | CF_PARALLEL) +#define CF_HASH_MASK (CF_PARALLEL) /* Per-vCPU dynamic tracing state used to generate this TB */ uint32_t trace_vcpu_dstate; commit c102c2409a5a134fd7f9ba61f69a5ae29df99c89 Author: Emilio G. Cota Date: Thu Oct 5 18:51:24 2017 -0400 FIXUP for "translate-all: use a binary search tree to ..." Signed-off-by: Emilio G. Cota diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c index 9f1faff..fe607ca 100644 --- a/accel/tcg/translate-all.c +++ b/accel/tcg/translate-all.c @@ -793,19 +793,19 @@ static gint tb_tc_cmp(gconstpointer ap, gconstpointer bp) const struct tb_tc *b = bp; /* - * When both sizes are set, we know this isn't a lookup and therefore - * the two buffers are non-overlapping. + * When both sizes are set, we know this isn't a lookup. * This is the most likely case: every TB must be inserted; lookups * are a lot less frequent. */ if (likely(a->size && b->size)) { -/* a->ptr == b->ptr would mean the buffers overlap */ -g_assert(a->ptr != b->ptr); - if (a->ptr > b->ptr) { return 1; +} else if (a->ptr < b->ptr) { +return -1; } -return -1; +/* a->ptr == b->ptr should happen only on deletions */ +g_assert(a->size == b->size); +return 0; } /* * All lookups have either .size field set to 0.
Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
On 09/22/2017 01:40 PM, Emilio G. Cota wrote: > Hi Richard, > > Are you planning to get this patchset merged in this window? If so, I can > give it a respin on top of the current master. Yes, I do. I've been intending to look at ... > Anyway, before doing so we should fix the issue around CF_COUNT_MASK that > Pranith reported: ... this one and figure out why my intuition is wrong. I'm at Linaro Connect this week, so it may have to wait til next. r~
Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
Hi Richard, Are you planning to get this patchset merged in this window? If so, I can give it a respin on top of the current master. Anyway, before doing so we should fix the issue around CF_COUNT_MASK that Pranith reported: On Wed, Aug 30, 2017 at 10:43:28 -0400, Pranith Kumar wrote: > On Tue, Aug 29, 2017 at 5:16 PM, Emilio G. Cota wrote: > > On Sun, Aug 27, 2017 at 18:15:50 -0400, Pranith Kumar wrote: > >> Hi Emilio, > >> > >> On Fri, Jul 21, 2017 at 1:59 AM, Emilio G. Cota wrote: > >> > This will enable us to decouple code translation from the value > >> > of parallel_cpus at any given time. It will also help us minimize > >> > TB flushes when generating code via EXCP_ATOMIC. > >> > > >> > Note that the declaration of parallel_cpus is brought to exec-all.h > >> > to be able to define there the "curr_cflags" inline. > >> > > >> > Signed-off-by: Emilio G. Cota > >> > >> I was testing a winxp image today and I bisected a infinite loop to > >> this commit. The loop happens both with and without mttcg, so I think > >> it has got to do with something else. > > > > Can you test the below? It lets me boot ubuntu, otherwise it reliably > > chokes on a 'rep movsb' *very* early (doesn't even get to grub). > > > > This discusson on v2 might be relevant (I added CF_COUNT_MASK as a > > result of it, but it seems I have to revisit that): > > https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06456.html > > > > Anyway let me know if this fixes it for you. Thanks for testing! > > > > Yes, this fixes the issue for me. My quick-fix is just not to check those bits, but as you pointed out during v3's review the whole thing is a little bit fragile if we don't check them. Should we just go with the CF_NOCHAIN flag that you proposed? If so, do you want me to give that approach a go, or you prefer to do it yourself? Thanks, Emilio
Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
On Tue, Aug 29, 2017 at 5:16 PM, Emilio G. Cota wrote: > On Sun, Aug 27, 2017 at 18:15:50 -0400, Pranith Kumar wrote: >> Hi Emilio, >> >> On Fri, Jul 21, 2017 at 1:59 AM, Emilio G. Cota wrote: >> > This will enable us to decouple code translation from the value >> > of parallel_cpus at any given time. It will also help us minimize >> > TB flushes when generating code via EXCP_ATOMIC. >> > >> > Note that the declaration of parallel_cpus is brought to exec-all.h >> > to be able to define there the "curr_cflags" inline. >> > >> > Signed-off-by: Emilio G. Cota >> >> I was testing a winxp image today and I bisected a infinite loop to >> this commit. The loop happens both with and without mttcg, so I think >> it has got to do with something else. > > Can you test the below? It lets me boot ubuntu, otherwise it reliably > chokes on a 'rep movsb' *very* early (doesn't even get to grub). > > This discusson on v2 might be relevant (I added CF_COUNT_MASK as a > result of it, but it seems I have to revisit that): > https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06456.html > > Anyway let me know if this fixes it for you. Thanks for testing! > Yes, this fixes the issue for me. Thanks, -- Pranith
Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
On Sun, Aug 27, 2017 at 18:15:50 -0400, Pranith Kumar wrote: > Hi Emilio, > > On Fri, Jul 21, 2017 at 1:59 AM, Emilio G. Cota wrote: > > This will enable us to decouple code translation from the value > > of parallel_cpus at any given time. It will also help us minimize > > TB flushes when generating code via EXCP_ATOMIC. > > > > Note that the declaration of parallel_cpus is brought to exec-all.h > > to be able to define there the "curr_cflags" inline. > > > > Signed-off-by: Emilio G. Cota > > I was testing a winxp image today and I bisected a infinite loop to > this commit. The loop happens both with and without mttcg, so I think > it has got to do with something else. Can you test the below? It lets me boot ubuntu, otherwise it reliably chokes on a 'rep movsb' *very* early (doesn't even get to grub). This discusson on v2 might be relevant (I added CF_COUNT_MASK as a result of it, but it seems I have to revisit that): https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06456.html Anyway let me know if this fixes it for you. Thanks for testing! Emilio diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 025fae0..8b2f233 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -326,7 +326,7 @@ struct TranslationBlock { #define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock */ #define CF_PARALLEL0x10 /* Generate code for a parallel context */ /* cflags' mask for hashing/comparison */ -#define CF_HASH_MASK (CF_COUNT_MASK | CF_PARALLEL) +#define CF_HASH_MASK (CF_PARALLEL) /* Per-vCPU dynamic tracing state used to generate this TB */ uint32_t trace_vcpu_dstate;
Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
Hi Emilio, On Fri, Jul 21, 2017 at 1:59 AM, Emilio G. Cota wrote: > This will enable us to decouple code translation from the value > of parallel_cpus at any given time. It will also help us minimize > TB flushes when generating code via EXCP_ATOMIC. > > Note that the declaration of parallel_cpus is brought to exec-all.h > to be able to define there the "curr_cflags" inline. > > Signed-off-by: Emilio G. Cota I was testing a winxp image today and I bisected a infinite loop to this commit. The loop happens both with and without mttcg, so I think it has got to do with something else. It seems to be executing this instruction: Trace 0x7fffc1d036c0 [0: 000c9a6b] IN: 0x000c9a6b: rep movsb %cs:(%si),%es:(%di) and never stops. You can find an execution log here: http://pranith.org/files/log_n.txt.gz Let me know if you need more information. Thanks, > --- > include/exec/exec-all.h | 20 +++- > include/exec/tb-hash-xx.h | 9 ++--- > include/exec/tb-hash.h| 4 ++-- > include/exec/tb-lookup.h | 6 +++--- > tcg/tcg.h | 1 - > accel/tcg/cpu-exec.c | 45 +++-- > accel/tcg/translate-all.c | 13 + > exec.c| 2 +- > tcg/tcg-runtime.c | 2 +- > tests/qht-bench.c | 2 +- > 10 files changed, 65 insertions(+), 39 deletions(-) > > diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h > index 8cbd90b..f6a928d 100644 > --- a/include/exec/exec-all.h > +++ b/include/exec/exec-all.h > @@ -353,6 +353,9 @@ struct TranslationBlock { > #define CF_USE_ICOUNT 0x2 > #define CF_IGNORE_ICOUNT 0x4 /* Do not generate icount code */ > #define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock > */ > +#define CF_PARALLEL0x10 /* Generate code for a parallel context */ > +/* cflags' mask for hashing/comparison */ > +#define CF_HASH_MASK (CF_COUNT_MASK | CF_PARALLEL) > > /* Per-vCPU dynamic tracing state used to generate this TB */ > uint32_t trace_vcpu_dstate; > @@ -396,11 +399,26 @@ struct TranslationBlock { > uintptr_t jmp_list_first; > }; > > +extern bool parallel_cpus; > + > +/* Hide the atomic_read to make code a little easier on the eyes */ > +static inline uint32_t tb_cflags(const TranslationBlock *tb) > +{ > +return atomic_read(&tb->cflags); > +} > + > +/* current cflags for hashing/comparison */ > +static inline uint32_t curr_cflags(void) > +{ > +return parallel_cpus ? CF_PARALLEL : 0; > +} > + > void tb_free(TranslationBlock *tb); > void tb_flush(CPUState *cpu); > void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr); > TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc, > - target_ulong cs_base, uint32_t flags); > + target_ulong cs_base, uint32_t flags, > + uint32_t cf_mask); > > #if defined(USE_DIRECT_JUMP) > > diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h > index 6cd3022..747a9a6 100644 > --- a/include/exec/tb-hash-xx.h > +++ b/include/exec/tb-hash-xx.h > @@ -48,8 +48,8 @@ > * xxhash32, customized for input variables that are not guaranteed to be > * contiguous in memory. > */ > -static inline > -uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f) > +static inline uint32_t > +tb_hash_func7(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f, uint32_t g) > { > uint32_t v1 = TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2; > uint32_t v2 = TB_HASH_XX_SEED + PRIME32_2; > @@ -78,7 +78,7 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t > e, uint32_t f) > v4 *= PRIME32_1; > > h32 = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18); > -h32 += 24; > +h32 += 28; > > h32 += e * PRIME32_3; > h32 = rol32(h32, 17) * PRIME32_4; > @@ -86,6 +86,9 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t > e, uint32_t f) > h32 += f * PRIME32_3; > h32 = rol32(h32, 17) * PRIME32_4; > > +h32 += g * PRIME32_3; > +h32 = rol32(h32, 17) * PRIME32_4; > + > h32 ^= h32 >> 15; > h32 *= PRIME32_2; > h32 ^= h32 >> 13; > diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h > index 17b5ee0..0526c4f 100644 > --- a/include/exec/tb-hash.h > +++ b/include/exec/tb-hash.h > @@ -59,9 +59,9 @@ static inline unsigned int > tb_jmp_cache_hash_func(target_ulong pc) > > static inline > uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t > flags, > - uint32_t trace_vcpu_dstate) > + uint32_t cf_mask, uint32_t trace_vcpu_dstate) > { > -return tb_hash_func6(phys_pc, pc, flags, trace_vcpu_dstate); > +return tb_hash_func7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate); > } > > #endif > diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h > index 436b6d5..2961385
Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
On 07/20/2017 07:59 PM, Emilio G. Cota wrote: This will enable us to decouple code translation from the value of parallel_cpus at any given time. It will also help us minimize TB flushes when generating code via EXCP_ATOMIC. Note that the declaration of parallel_cpus is brought to exec-all.h to be able to define there the "curr_cflags" inline. Signed-off-by: Emilio G. Cota --- include/exec/exec-all.h | 20 +++- include/exec/tb-hash-xx.h | 9 ++--- include/exec/tb-hash.h| 4 ++-- include/exec/tb-lookup.h | 6 +++--- tcg/tcg.h | 1 - accel/tcg/cpu-exec.c | 45 +++-- accel/tcg/translate-all.c | 13 + exec.c| 2 +- tcg/tcg-runtime.c | 2 +- tests/qht-bench.c | 2 +- 10 files changed, 65 insertions(+), 39 deletions(-) Reviewed-by: Richard Henderson r~
[Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK
This will enable us to decouple code translation from the value of parallel_cpus at any given time. It will also help us minimize TB flushes when generating code via EXCP_ATOMIC. Note that the declaration of parallel_cpus is brought to exec-all.h to be able to define there the "curr_cflags" inline. Signed-off-by: Emilio G. Cota --- include/exec/exec-all.h | 20 +++- include/exec/tb-hash-xx.h | 9 ++--- include/exec/tb-hash.h| 4 ++-- include/exec/tb-lookup.h | 6 +++--- tcg/tcg.h | 1 - accel/tcg/cpu-exec.c | 45 +++-- accel/tcg/translate-all.c | 13 + exec.c| 2 +- tcg/tcg-runtime.c | 2 +- tests/qht-bench.c | 2 +- 10 files changed, 65 insertions(+), 39 deletions(-) diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h index 8cbd90b..f6a928d 100644 --- a/include/exec/exec-all.h +++ b/include/exec/exec-all.h @@ -353,6 +353,9 @@ struct TranslationBlock { #define CF_USE_ICOUNT 0x2 #define CF_IGNORE_ICOUNT 0x4 /* Do not generate icount code */ #define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock */ +#define CF_PARALLEL0x10 /* Generate code for a parallel context */ +/* cflags' mask for hashing/comparison */ +#define CF_HASH_MASK (CF_COUNT_MASK | CF_PARALLEL) /* Per-vCPU dynamic tracing state used to generate this TB */ uint32_t trace_vcpu_dstate; @@ -396,11 +399,26 @@ struct TranslationBlock { uintptr_t jmp_list_first; }; +extern bool parallel_cpus; + +/* Hide the atomic_read to make code a little easier on the eyes */ +static inline uint32_t tb_cflags(const TranslationBlock *tb) +{ +return atomic_read(&tb->cflags); +} + +/* current cflags for hashing/comparison */ +static inline uint32_t curr_cflags(void) +{ +return parallel_cpus ? CF_PARALLEL : 0; +} + void tb_free(TranslationBlock *tb); void tb_flush(CPUState *cpu); void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr); TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc, - target_ulong cs_base, uint32_t flags); + target_ulong cs_base, uint32_t flags, + uint32_t cf_mask); #if defined(USE_DIRECT_JUMP) diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h index 6cd3022..747a9a6 100644 --- a/include/exec/tb-hash-xx.h +++ b/include/exec/tb-hash-xx.h @@ -48,8 +48,8 @@ * xxhash32, customized for input variables that are not guaranteed to be * contiguous in memory. */ -static inline -uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f) +static inline uint32_t +tb_hash_func7(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f, uint32_t g) { uint32_t v1 = TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2; uint32_t v2 = TB_HASH_XX_SEED + PRIME32_2; @@ -78,7 +78,7 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f) v4 *= PRIME32_1; h32 = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18); -h32 += 24; +h32 += 28; h32 += e * PRIME32_3; h32 = rol32(h32, 17) * PRIME32_4; @@ -86,6 +86,9 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f) h32 += f * PRIME32_3; h32 = rol32(h32, 17) * PRIME32_4; +h32 += g * PRIME32_3; +h32 = rol32(h32, 17) * PRIME32_4; + h32 ^= h32 >> 15; h32 *= PRIME32_2; h32 ^= h32 >> 13; diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h index 17b5ee0..0526c4f 100644 --- a/include/exec/tb-hash.h +++ b/include/exec/tb-hash.h @@ -59,9 +59,9 @@ static inline unsigned int tb_jmp_cache_hash_func(target_ulong pc) static inline uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags, - uint32_t trace_vcpu_dstate) + uint32_t cf_mask, uint32_t trace_vcpu_dstate) { -return tb_hash_func6(phys_pc, pc, flags, trace_vcpu_dstate); +return tb_hash_func7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate); } #endif diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h index 436b6d5..2961385 100644 --- a/include/exec/tb-lookup.h +++ b/include/exec/tb-lookup.h @@ -21,7 +21,7 @@ /* Might cause an exception, so have a longjmp destination ready */ static inline TranslationBlock * tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base, - uint32_t *flags) + uint32_t *flags, uint32_t cf_mask) { CPUArchState *env = (CPUArchState *)cpu->env_ptr; TranslationBlock *tb; @@ -35,10 +35,10 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base, tb->cs_base == *cs_base && tb->flags == *flags && tb->trace_vcpu_dstate == *cpu->trace_dstate && - !(atomic_read(&tb->cflags) & CF_INVALID))) { + (t