Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

2017-10-09 Thread Richard Henderson
On 10/09/2017 12:24 PM, Emilio G. Cota wrote:
> On Thu, Oct 05, 2017 at 19:24:16 -0400, Emilio G. Cota wrote:
>> I'm including the fixups below.
> 
> Just pushed v5 of this series. I rebased the series on top of the current 
> master,
> and added the already mentioned fixes plus a new one (see below).
> 
> Grab the patches from:
>   https://github.com/cota/qemu/tree/multi-tcg-v5

Thanks.  I was just about to do this rebase myself.

> Changes since v4:
> - Rebase to current master. (fixed a few conflicts, mostly due to some targets
>   having been converted to the generic translation loop.)
> - Add the removal of CF_COUNT_MASK as a separate "FIXUP" commit to signal
>   that this still requires attention.
> - Fix tb TC comparison function to support deletions (otherwise -icount 
> breaks)
> - [new fix] TCG regions: Rely on max_cpus instead of smp_cpus. Otherwise we'd
>   break CPU hotplug on targets that support it. (max_cpus and smp_cpus are
>   set from the command line with -smp foo,maxcpus=bar; max_cpus is always
>   >= smp_cpus.)

Excellent.  I'll work on reviewing these changes this week.
In the meantime, I've cherry-picked about half of the patch set.


r~




Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

2017-10-09 Thread Emilio G. Cota
On Thu, Oct 05, 2017 at 19:24:16 -0400, Emilio G. Cota wrote:
> I'm including the fixups below.

Just pushed v5 of this series. I rebased the series on top of the current 
master,
and added the already mentioned fixes plus a new one (see below).

Grab the patches from:
  https://github.com/cota/qemu/tree/multi-tcg-v5

Changes since v4:
- Rebase to current master. (fixed a few conflicts, mostly due to some targets
  having been converted to the generic translation loop.)
- Add the removal of CF_COUNT_MASK as a separate "FIXUP" commit to signal
  that this still requires attention.
- Fix tb TC comparison function to support deletions (otherwise -icount breaks)
- [new fix] TCG regions: Rely on max_cpus instead of smp_cpus. Otherwise we'd
  break CPU hotplug on targets that support it. (max_cpus and smp_cpus are
  set from the command line with -smp foo,maxcpus=bar; max_cpus is always
  >= smp_cpus.)

Thanks,

Emilio



Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

2017-10-05 Thread Emilio G. Cota
On Mon, Sep 25, 2017 at 10:01:15 -0700, Richard Henderson wrote:
> On 09/22/2017 01:40 PM, Emilio G. Cota wrote:
> > Hi Richard,
> > 
> > Are you planning to get this patchset merged in this window? If so, I can
> > give it a respin on top of the current master.
> 
> Yes, I do.  I've been intending to look at ...
> 
> > Anyway, before doing so we should fix the issue around CF_COUNT_MASK that
> > Pranith reported:
> 
> ... this one and figure out why my intuition is wrong.
> I'm at Linaro Connect this week, so it may have to wait til next.

I just tested this with icount. Turns out two fixups to this patchset are
necessary to not break icount. The first one is (again) to just
include CF_PARALLEL in the hash mask. The second is a fix for the comparison
function of tb_tc, without which removals don't work.

I'm including the fixups below.

Thanks,

Emilio

commit 7a899a8df2d769dd80ba1a7f559cb4ddbb0c568b
Author: Emilio G. Cota 
Date:   Thu Oct 5 18:40:30 2017 -0400

FIXUP for "tcg: define CF_PARALLEL ..."

Signed-off-by: Emilio G. Cota 

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 025fae0..8b2f233 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -326,7 +326,7 @@ struct TranslationBlock {
 #define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock */
 #define CF_PARALLEL0x10 /* Generate code for a parallel context */
 /* cflags' mask for hashing/comparison */
-#define CF_HASH_MASK (CF_COUNT_MASK | CF_PARALLEL)
+#define CF_HASH_MASK (CF_PARALLEL)
 
 /* Per-vCPU dynamic tracing state used to generate this TB */
 uint32_t trace_vcpu_dstate;

commit c102c2409a5a134fd7f9ba61f69a5ae29df99c89
Author: Emilio G. Cota 
Date:   Thu Oct 5 18:51:24 2017 -0400

FIXUP for "translate-all: use a binary search tree to ..."

Signed-off-by: Emilio G. Cota 

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 9f1faff..fe607ca 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -793,19 +793,19 @@ static gint tb_tc_cmp(gconstpointer ap, gconstpointer bp)
 const struct tb_tc *b = bp;
 
 /*
- * When both sizes are set, we know this isn't a lookup and therefore
- * the two buffers are non-overlapping.
+ * When both sizes are set, we know this isn't a lookup.
  * This is the most likely case: every TB must be inserted; lookups
  * are a lot less frequent.
  */
 if (likely(a->size && b->size)) {
-/* a->ptr == b->ptr would mean the buffers overlap */
-g_assert(a->ptr != b->ptr);
-
 if (a->ptr > b->ptr) {
 return 1;
+} else if (a->ptr < b->ptr) {
+return -1;
 }
-return -1;
+/* a->ptr == b->ptr should happen only on deletions */
+g_assert(a->size == b->size);
+return 0;
 }
 /*
  * All lookups have either .size field set to 0.




Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

2017-09-25 Thread Richard Henderson
On 09/22/2017 01:40 PM, Emilio G. Cota wrote:
> Hi Richard,
> 
> Are you planning to get this patchset merged in this window? If so, I can
> give it a respin on top of the current master.

Yes, I do.  I've been intending to look at ...

> Anyway, before doing so we should fix the issue around CF_COUNT_MASK that
> Pranith reported:

... this one and figure out why my intuition is wrong.
I'm at Linaro Connect this week, so it may have to wait til next.


r~



Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

2017-09-25 Thread Emilio G. Cota
Hi Richard,

Are you planning to get this patchset merged in this window? If so, I can
give it a respin on top of the current master.

Anyway, before doing so we should fix the issue around CF_COUNT_MASK that
Pranith reported:

On Wed, Aug 30, 2017 at 10:43:28 -0400, Pranith Kumar wrote:
> On Tue, Aug 29, 2017 at 5:16 PM, Emilio G. Cota  wrote:
> > On Sun, Aug 27, 2017 at 18:15:50 -0400, Pranith Kumar wrote:
> >> Hi Emilio,
> >>
> >> On Fri, Jul 21, 2017 at 1:59 AM, Emilio G. Cota  wrote:
> >> > This will enable us to decouple code translation from the value
> >> > of parallel_cpus at any given time. It will also help us minimize
> >> > TB flushes when generating code via EXCP_ATOMIC.
> >> >
> >> > Note that the declaration of parallel_cpus is brought to exec-all.h
> >> > to be able to define there the "curr_cflags" inline.
> >> >
> >> > Signed-off-by: Emilio G. Cota 
> >>
> >> I was testing a winxp image today and I bisected a infinite loop to
> >> this commit. The loop happens both with and without mttcg, so I think
> >> it has got to do with something else.
> >
> > Can you test the below? It lets me boot ubuntu, otherwise it reliably
> > chokes on a 'rep movsb' *very* early (doesn't even get to grub).
> >
> > This discusson on v2 might be relevant (I added CF_COUNT_MASK as a
> > result of it, but it seems I have to revisit that):
> >   https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06456.html
> >
> > Anyway let me know if this fixes it for you. Thanks for testing!
> >
> 
> Yes, this fixes the issue for me.

My quick-fix is just not to check those bits, but as you pointed out during
v3's review the whole thing is a little bit fragile if we don't check them.

Should we just go with the CF_NOCHAIN flag that you proposed? If so, do you
want me to give that approach a go, or you prefer to do it yourself?

Thanks,

Emilio




Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

2017-08-30 Thread Pranith Kumar
On Tue, Aug 29, 2017 at 5:16 PM, Emilio G. Cota  wrote:
> On Sun, Aug 27, 2017 at 18:15:50 -0400, Pranith Kumar wrote:
>> Hi Emilio,
>>
>> On Fri, Jul 21, 2017 at 1:59 AM, Emilio G. Cota  wrote:
>> > This will enable us to decouple code translation from the value
>> > of parallel_cpus at any given time. It will also help us minimize
>> > TB flushes when generating code via EXCP_ATOMIC.
>> >
>> > Note that the declaration of parallel_cpus is brought to exec-all.h
>> > to be able to define there the "curr_cflags" inline.
>> >
>> > Signed-off-by: Emilio G. Cota 
>>
>> I was testing a winxp image today and I bisected a infinite loop to
>> this commit. The loop happens both with and without mttcg, so I think
>> it has got to do with something else.
>
> Can you test the below? It lets me boot ubuntu, otherwise it reliably
> chokes on a 'rep movsb' *very* early (doesn't even get to grub).
>
> This discusson on v2 might be relevant (I added CF_COUNT_MASK as a
> result of it, but it seems I have to revisit that):
>   https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06456.html
>
> Anyway let me know if this fixes it for you. Thanks for testing!
>

Yes, this fixes the issue for me.

Thanks,
--
Pranith



Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

2017-08-29 Thread Emilio G. Cota
On Sun, Aug 27, 2017 at 18:15:50 -0400, Pranith Kumar wrote:
> Hi Emilio,
> 
> On Fri, Jul 21, 2017 at 1:59 AM, Emilio G. Cota  wrote:
> > This will enable us to decouple code translation from the value
> > of parallel_cpus at any given time. It will also help us minimize
> > TB flushes when generating code via EXCP_ATOMIC.
> >
> > Note that the declaration of parallel_cpus is brought to exec-all.h
> > to be able to define there the "curr_cflags" inline.
> >
> > Signed-off-by: Emilio G. Cota 
> 
> I was testing a winxp image today and I bisected a infinite loop to
> this commit. The loop happens both with and without mttcg, so I think
> it has got to do with something else.

Can you test the below? It lets me boot ubuntu, otherwise it reliably
chokes on a 'rep movsb' *very* early (doesn't even get to grub).

This discusson on v2 might be relevant (I added CF_COUNT_MASK as a
result of it, but it seems I have to revisit that):
  https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg06456.html

Anyway let me know if this fixes it for you. Thanks for testing!

Emilio

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 025fae0..8b2f233 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -326,7 +326,7 @@ struct TranslationBlock {
 #define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock */
 #define CF_PARALLEL0x10 /* Generate code for a parallel context */
 /* cflags' mask for hashing/comparison */
-#define CF_HASH_MASK (CF_COUNT_MASK | CF_PARALLEL)
+#define CF_HASH_MASK (CF_PARALLEL)

 /* Per-vCPU dynamic tracing state used to generate this TB */
 uint32_t trace_vcpu_dstate;



Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

2017-08-27 Thread Pranith Kumar
Hi Emilio,

On Fri, Jul 21, 2017 at 1:59 AM, Emilio G. Cota  wrote:
> This will enable us to decouple code translation from the value
> of parallel_cpus at any given time. It will also help us minimize
> TB flushes when generating code via EXCP_ATOMIC.
>
> Note that the declaration of parallel_cpus is brought to exec-all.h
> to be able to define there the "curr_cflags" inline.
>
> Signed-off-by: Emilio G. Cota 

I was testing a winxp image today and I bisected a infinite loop to
this commit. The loop happens both with and without mttcg, so I think
it has got to do with something else.

It seems to be executing this instruction:

Trace 0x7fffc1d036c0 [0: 000c9a6b]

IN:
0x000c9a6b:  rep movsb %cs:(%si),%es:(%di)

and never stops.

You can find an execution log here: http://pranith.org/files/log_n.txt.gz

Let me know if you need more information.

Thanks,

> ---
>  include/exec/exec-all.h   | 20 +++-
>  include/exec/tb-hash-xx.h |  9 ++---
>  include/exec/tb-hash.h|  4 ++--
>  include/exec/tb-lookup.h  |  6 +++---
>  tcg/tcg.h |  1 -
>  accel/tcg/cpu-exec.c  | 45 +++--
>  accel/tcg/translate-all.c | 13 +
>  exec.c|  2 +-
>  tcg/tcg-runtime.c |  2 +-
>  tests/qht-bench.c |  2 +-
>  10 files changed, 65 insertions(+), 39 deletions(-)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 8cbd90b..f6a928d 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -353,6 +353,9 @@ struct TranslationBlock {
>  #define CF_USE_ICOUNT  0x2
>  #define CF_IGNORE_ICOUNT 0x4 /* Do not generate icount code */
>  #define CF_INVALID 0x8 /* TB is stale. Setters must acquire tb_lock 
> */
> +#define CF_PARALLEL0x10 /* Generate code for a parallel context */
> +/* cflags' mask for hashing/comparison */
> +#define CF_HASH_MASK (CF_COUNT_MASK | CF_PARALLEL)
>
>  /* Per-vCPU dynamic tracing state used to generate this TB */
>  uint32_t trace_vcpu_dstate;
> @@ -396,11 +399,26 @@ struct TranslationBlock {
>  uintptr_t jmp_list_first;
>  };
>
> +extern bool parallel_cpus;
> +
> +/* Hide the atomic_read to make code a little easier on the eyes */
> +static inline uint32_t tb_cflags(const TranslationBlock *tb)
> +{
> +return atomic_read(>cflags);
> +}
> +
> +/* current cflags for hashing/comparison */
> +static inline uint32_t curr_cflags(void)
> +{
> +return parallel_cpus ? CF_PARALLEL : 0;
> +}
> +
>  void tb_free(TranslationBlock *tb);
>  void tb_flush(CPUState *cpu);
>  void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
>  TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
> -   target_ulong cs_base, uint32_t flags);
> +   target_ulong cs_base, uint32_t flags,
> +   uint32_t cf_mask);
>
>  #if defined(USE_DIRECT_JUMP)
>
> diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h
> index 6cd3022..747a9a6 100644
> --- a/include/exec/tb-hash-xx.h
> +++ b/include/exec/tb-hash-xx.h
> @@ -48,8 +48,8 @@
>   * xxhash32, customized for input variables that are not guaranteed to be
>   * contiguous in memory.
>   */
> -static inline
> -uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f)
> +static inline uint32_t
> +tb_hash_func7(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f, uint32_t g)
>  {
>  uint32_t v1 = TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2;
>  uint32_t v2 = TB_HASH_XX_SEED + PRIME32_2;
> @@ -78,7 +78,7 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t 
> e, uint32_t f)
>  v4 *= PRIME32_1;
>
>  h32 = rol32(v1, 1) + rol32(v2, 7) + rol32(v3, 12) + rol32(v4, 18);
> -h32 += 24;
> +h32 += 28;
>
>  h32 += e * PRIME32_3;
>  h32  = rol32(h32, 17) * PRIME32_4;
> @@ -86,6 +86,9 @@ uint32_t tb_hash_func6(uint64_t a0, uint64_t b0, uint32_t 
> e, uint32_t f)
>  h32 += f * PRIME32_3;
>  h32  = rol32(h32, 17) * PRIME32_4;
>
> +h32 += g * PRIME32_3;
> +h32  = rol32(h32, 17) * PRIME32_4;
> +
>  h32 ^= h32 >> 15;
>  h32 *= PRIME32_2;
>  h32 ^= h32 >> 13;
> diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
> index 17b5ee0..0526c4f 100644
> --- a/include/exec/tb-hash.h
> +++ b/include/exec/tb-hash.h
> @@ -59,9 +59,9 @@ static inline unsigned int 
> tb_jmp_cache_hash_func(target_ulong pc)
>
>  static inline
>  uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t 
> flags,
> -  uint32_t trace_vcpu_dstate)
> +  uint32_t cf_mask, uint32_t trace_vcpu_dstate)
>  {
> -return tb_hash_func6(phys_pc, pc, flags, trace_vcpu_dstate);
> +return tb_hash_func7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
>  }
>
>  #endif
> diff --git a/include/exec/tb-lookup.h 

Re: [Qemu-devel] [PATCH v4 11/43] tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK

2017-07-21 Thread Richard Henderson

On 07/20/2017 07:59 PM, Emilio G. Cota wrote:

This will enable us to decouple code translation from the value
of parallel_cpus at any given time. It will also help us minimize
TB flushes when generating code via EXCP_ATOMIC.

Note that the declaration of parallel_cpus is brought to exec-all.h
to be able to define there the "curr_cflags" inline.

Signed-off-by: Emilio G. Cota
---
  include/exec/exec-all.h   | 20 +++-
  include/exec/tb-hash-xx.h |  9 ++---
  include/exec/tb-hash.h|  4 ++--
  include/exec/tb-lookup.h  |  6 +++---
  tcg/tcg.h |  1 -
  accel/tcg/cpu-exec.c  | 45 +++--
  accel/tcg/translate-all.c | 13 +
  exec.c|  2 +-
  tcg/tcg-runtime.c |  2 +-
  tests/qht-bench.c |  2 +-
  10 files changed, 65 insertions(+), 39 deletions(-)


Reviewed-by: Richard Henderson 


r~