Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation

Nikunj A Dadhania Tue, 20 Sep 2016 11:01:27 -0700

David Gibson <da...@gibson.dropbear.id.au> writes:

> [ Unknown signature status ]
> On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote:
>> David Gibson <da...@gibson.dropbear.id.au> writes:
>> > [ Unknown signature status ]
>> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote:
>> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote:
>> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c 
>> >> > b/target-ppc/translate/vsx-impl.inc.c
>> >> > index eee6052..df278df 100644
>> >> > --- a/target-ppc/translate/vsx-impl.inc.c
>> >> > +++ b/target-ppc/translate/vsx-impl.inc.c
>> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx)
>> >> >  static void gen_lxvw4x(DisasContext *ctx)
>> >> >  {
>> >> >      TCGv EA;
>> >> > -    TCGv_i64 tmp;
>> >> >      TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode));
>> >> >      TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode));
>> >> >      if (unlikely(!ctx->vsx_enabled)) {
>> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx)
>> >> >      }
>> >> >      gen_set_access_type(ctx, ACCESS_INT);
>> >> >      EA = tcg_temp_new();
>> >> > -    tmp = tcg_temp_new_i64();
>> >> >  
>> >> >      gen_addr_reg_index(ctx, EA);
>> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> > -    gen_qemu_ld32u_i64(ctx, xth, EA);
>> >> > -    tcg_gen_deposit_i64(xth, xth, tmp, 32, 32);
>> >> > -
>> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> > -    gen_qemu_ld32u_i64(ctx, tmp, EA);
>> >> > -    tcg_gen_addi_tl(EA, EA, 4);
>> >> > -    gen_qemu_ld32u_i64(ctx, xtl, EA);
>> >> > -    tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32);
>> >> > -
>> >> > +    tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ);
>> >> > +    gen_helper_deposit32x2(xth, xth);
>> >> > +    tcg_gen_addi_tl(EA, EA, 8);
>> >> > +    tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ);
>> >> > +    gen_helper_deposit32x2(xtl, xtl);
>> >
>> > ..and I think this is wrong for BE mode.  The deposit32x2 will get the
>> > words in the right order, but the bytes within each word will be wrong
>> > because of the LE mode load on a BE setup.
>> 
>> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test
>> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware.
>> The order within the word is not changed. Snippet of the test code at
>> the end of email. Can share full code if needed (maybe will do it in
>> kvm-unit-test)
>
> Ugh.. now I'm confused.  I would not have expected the results you've
> seen from these tests.  But I still can't understand *how* the
> emulation could be correct: IIUC MO_LEQ would mean it loads the 8
> bytes as a single 64-bit LE integer.


For both the case LE/BE we do a LE read ...

> Which should be the same as
> loading one 32-bit LE integer into the low half of the target
> register, then a 32-bit LE integer into the high half ot the target
> register.

.. The 64-bit integer read is not same in these cases. The input itself
would be in the order of the format.

Input rb32[]:  00010203  20212223  30313233  40414243 

LE:
helper_deposit32x2: 2021222300010203 
helper_deposit32x2: 4041424330313233

BE
helper_deposit32x2: 2322212003020100
helper_deposit32x2: 4342414033323130

>
> As I said above, the deposit32x2 will swap the order of the two ints,
> but it won't byteswap the individual int32s which should have been BE
> in memory.
>
> Can you find the flaw in my reasoning?

One anomaly that I see in BE code generation: it also generates a
stxvw4x after lxvw4x. I am not sure why.

>>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>>
Input rb32[]:   00010203  20212223  30313233  40414243 

gen_lxvw4x: called
helper_deposit32x2: 2322212003020100
helper_deposit32x2: 4342414033323130
gen_stxvw4x: called
helper_deposit32x2: 0302010023222120
helper_deposit32x2: 3332313043424140
Output VRT32:  00010203 20212223 30313233 40414243 

>> vsx.h:
>> ======
>> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t))
>> 
>> typedef union {
>>     __vector uint32_t v;
>>     uint32_t a[U32_SIZE];
>> } vuint32_t;
>
> I am a little suspicious that whatever the compiler does to convert
> the vector to an array via this union might be undoing a byte reverse.
>
> I'd be more confident if you used VSX instructions to extract and
> store separately one of the 32-bit subwords of the vector.

I will try to figure those instructions.

Regards
Nikunj

Re: [Qemu-devel] [PATCH v3 2/5] target-ppc: improve lxvw4x implementation

Reply via email to