Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation

David Gibson Tue, 20 Sep 2016 19:12:54 -0700

On Tue, Sep 20, 2016 at 10:40:03PM +0530, Nikunj A Dadhania wrote:
> David Gibson <da...@gibson.dropbear.id.au> writes:
> 
> > [ Unknown signature status ]
> > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
> >> David Gibson <da...@gibson.dropbear.id.au> writes:
> >> > [ Unknown signature status ]
> >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
> >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
> >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c 
> >> >> > b/target-ppc/translate/vsx-impl.inc.c
> >> >> > index eee6052..df278df 100644
> >> >> > --- a/target-ppc/translate/vsx-impl.inc.c
> >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
> >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
> >> >> >  static void gen_lxvw4x(DisasContext *ctx)
> >> >> >  {
> >> >> >      TCGv EA;
> >> >> > -    TCGv_i64 tmp;
> >> >> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
> >> >> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
> >> >> >      if (unlikely(!ctx->vsx_enabled)) {
> >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
> >> >> >      }
> >> >> >      gen_set_access_type(ctx, ACCESS_INT);
> >> >> >      EA = tcg_temp_new();
> >> >> > -    tmp = tcg_temp_new_i64();
> >> >> >  
> >> >> >      gen_addr_reg_index(ctx, EA);
> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
> >> >> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
> >> >> > -
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
> >> >> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
> >> >> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
> >> >> > -
> >> >> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
> >> >> > +    gen_helper_deposit32x2(xth, xth);
> >> >> > +    tcg_gen_addi_tl(EA, EA, 8);
> >> >> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
> >> >> > +    gen_helper_deposit32x2(xtl, xtl);
> >> >
> >> > ..and I think this is wrong for BE mode.  The deposit32x2 will get the
> >> > words in the right order, but the bytes within each word will be wrong
> >> > because of the LE mode load on a BE setup.
> >> 
> >> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
> >> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
> >> The order within the word is not changed. Snippet of the test code at
> >> the end of email. Can share full code if needed (maybe will do it in
> >> kvm-unit-test)
> >
> > Ugh.. now I'm confused.  I would not have expected the results you've
> > seen from these tests.  But I still can't understand *how* the
> > emulation could be correct: IIUC MO_LEQ would mean it loads the 8
> > bytes as a single 64-bit LE integer.
> 
> For both the case LE/BE we do a LE read ...


.. and I can't see how that can be right for the BE case.

> > Which should be the same as
> > loading one 32-bit LE integer into the low half of the target
> > register, then a 32-bit LE integer into the high half ot the target
> > register.
> 
> .. The 64-bit integer read is not same in these cases. The input itself
> would be in the order of the format.
> 
> Input rb32[]:  00010203  20212223  30313233  40414243 
> 
> LE:
> helper_deposit32x2: 2021222300010203 
> helper_deposit32x2: 4041424330313233
> 
> BE
> helper_deposit32x2: 2322212003020100
> helper_deposit32x2: 4342414033323130


Sorry.. I can't really follow the above, because I'm not sure if
you're displaying the bytes within each word in significance order, or
increasing-address order.

> >
> > As I said above, the deposit32x2 will swap the order of the two ints,
> > but it won't byteswap the individual int32s which should have been BE
> > in memory.
> >
> > Can you find the flaw in my reasoning?
> 
> One anomaly that I see in BE code generation: it also generates a
> stxvw4x after lxvw4x. I am not sure why.

Ah... see I'm wondering if it's using the stxvw4x to store back to the
union which you then get the results from.  If that's so it could
explain the results, since the bug I suspect is in lxvw4x would be
cancelled out by the corresponding bug in stxv4wx, which is exactly
why I'd prefer the approach to testing mentioned below.

> 
> >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
> Input rb32[]:   00010203  20212223  30313233  40414243 
> 
> gen_lxvw4x: called
> helper_deposit32x2: 2322212003020100
> helper_deposit32x2: 4342414033323130
> gen_stxvw4x: called
> helper_deposit32x2: 0302010023222120
> helper_deposit32x2: 3332313043424140
> Output VRT32:  00010203 20212223 30313233 40414243 
> 
> >> vsx.h:
> >> ======
> >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
> >> 
> >> typedef union {
> >>     __vector uint32_t v;
> >>     uint32_t a[U32_SIZE];
> >> } vuint32_t;
> >
> > I am a little suspicious that whatever the compiler does to convert
> > the vector to an array via this union might be undoing a byte reverse.
> >
> > I'd be more confident if you used VSX instructions to extract and
> > store separately one of the 32-bit subwords of the vector.
> 
> I will try to figure those instructions.

Ok, thanks.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation

Reply via email to