Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation

Nikunj A Dadhania Tue, 20 Sep 2016 20:47:05 -0700

David Gibson <da...@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Tue, Sep 20, 2016 at 10:40:03PM +0530, Nikunj A Dadhania wrote:
>> David Gibson <da...@gibson.dropbear.id.au> writes:
>> 
>> > [ Unknown signature status ]
>> > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
>> >> David Gibson <da...@gibson.dropbear.id.au> writes:
>> >> > [ Unknown signature status ]
>> >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
>> >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
>> >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c 
>> >> >> > b/target-ppc/translate/vsx-impl.inc.c
>> >> >> > index eee6052..df278df 100644
>> >> >> > --- a/target-ppc/translate/vsx-impl.inc.c
>> >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
>> >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
>> >> >> >  static void gen_lxvw4x(DisasContext *ctx)
>> >> >> >  {
>> >> >> >      TCGv EA;
>> >> >> > -    TCGv_i64 tmp;
>> >> >> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>> >> >> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>> >> >> >      if (unlikely(!ctx->vsx_enabled)) {
>> >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
>> >> >> >      }
>> >> >> >      gen_set_access_type(ctx, ACCESS_INT);
>> >> >> >      EA = tcg_temp_new();
>> >> >> > -    tmp = tcg_temp_new_i64();
>> >> >> >  
>> >> >> >      gen_addr_reg_index(ctx, EA);
>> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> >> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
>> >> >> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
>> >> >> > -
>> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> >> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
>> >> >> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
>> >> >> > -
>> >> >> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> >> >> > +    gen_helper_deposit32x2(xth, xth);
>> >> >> > +    tcg_gen_addi_tl(EA, EA, 8);
>> >> >> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> >> >> > +    gen_helper_deposit32x2(xtl, xtl);
>> >> >
>> >> > ..and I think this is wrong for BE mode.  The deposit32x2 will get the
>> >> > words in the right order, but the bytes within each word will be wrong
>> >> > because of the LE mode load on a BE setup.
>> >> 
>> >> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
>> >> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
>> >> The order within the word is not changed. Snippet of the test code at
>> >> the end of email. Can share full code if needed (maybe will do it in
>> >> kvm-unit-test)
>> >
>> > Ugh.. now I'm confused.  I would not have expected the results you've
>> > seen from these tests.  But I still can't understand *how* the
>> > emulation could be correct: IIUC MO_LEQ would mean it loads the 8
>> > bytes as a single 64-bit LE integer.
>> 
>> For both the case LE/BE we do a LE read ...
>
> .. and I can't see how that can be right for the BE case.
>
>> > Which should be the same as
>> > loading one 32-bit LE integer into the low half of the target
>> > register, then a 32-bit LE integer into the high half ot the target
>> > register.
>> 
>> .. The 64-bit integer read is not same in these cases. The input itself
>> would be in the order of the format.
>> 
>> Input rb32[]:  00010203  20212223  30313233  40414243 
>> 
>> LE:
>> helper_deposit32x2: 2021222300010203 
>> helper_deposit32x2: 4041424330313233
>> 
>> BE
>> helper_deposit32x2: 2322212003020100
>> helper_deposit32x2: 4342414033323130
>
>
> Sorry.. I can't really follow the above, because I'm not sure if
> you're displaying the bytes within each word in significance order, or
> increasing-address order.


Ah, thats just a print inside the helper just to check what MO_LEQ returned:

 uint64_t helper_deposit32x2(uint64_t x)
 {
     fprintf(stderr, "%s: %016lx\n", __func__, x);
     return deposit64((x >> 32), 32, 32, (x));
 }

>
>> >
>> > As I said above, the deposit32x2 will swap the order of the two ints,
>> > but it won't byteswap the individual int32s which should have been BE
>> > in memory.
>> >
>> > Can you find the flaw in my reasoning?
>> 
>> One anomaly that I see in BE code generation: it also generates a
>> stxvw4x after lxvw4x. I am not sure why.
>
> Ah... see I'm wondering if it's using the stxvw4x to store back to the
> union which you then get the results from.  If that's so it could
> explain the results, since the bug I suspect is in lxvw4x would be
> cancelled out by the corresponding bug in stxv4wx, which is exactly
> why I'd prefer the approach to testing mentioned below.

Yes, I am investigating it.

>> >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
>> Input rb32[]:   00010203  20212223  30313233  40414243 
>> 
>> gen_lxvw4x: called
>> helper_deposit32x2: 2322212003020100
>> helper_deposit32x2: 4342414033323130
>> gen_stxvw4x: called
>> helper_deposit32x2: 0302010023222120
>> helper_deposit32x2: 3332313043424140
>> Output VRT32:  00010203 20212223 30313233 40414243 
>> 
>> >> vsx.h:
>> >> ======
>> >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
>> >> 
>> >> typedef union {
>> >>     __vector uint32_t v;
>> >>     uint32_t a[U32_SIZE];
>> >> } vuint32_t;
>> >
>> > I am a little suspicious that whatever the compiler does to convert
>> > the vector to an array via this union might be undoing a byte reverse.
>> >
>> > I'd be more confident if you used VSX instructions to extract and
>> > store separately one of the 32-bit subwords of the vector.
>> 
>> I will try to figure those instructions.
>
> Ok, thanks.
>

Regards,
Nikunj

Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation

Reply via email to