David Gibson <da...@gibson.dropbear.id.au> writes: > [ Unknown signature status ] > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote: >> David Gibson <da...@gibson.dropbear.id.au> writes: >> > [ Unknown signature status ] >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote: >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote: >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c >> >> > b/target-ppc/translate/vsx-impl.inc.c >> >> > index eee6052..df278df 100644 >> >> > --- a/target-ppc/translate/vsx-impl.inc.c >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx) >> >> > static void gen_lxvw4x(DisasContext *ctx) >> >> > { >> >> > TCGv EA; >> >> > - TCGv_i64 tmp; >> >> > TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode)); >> >> > TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode)); >> >> > if (unlikely(!ctx->vsx_enabled)) { >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx) >> >> > } >> >> > gen_set_access_type(ctx, ACCESS_INT); >> >> > EA = tcg_temp_new(); >> >> > - tmp = tcg_temp_new_i64(); >> >> > >> >> > gen_addr_reg_index(ctx, EA); >> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA); >> >> > - tcg_gen_addi_tl(EA, EA, 4); >> >> > - gen_qemu_ld32u_i64(ctx, xth, EA); >> >> > - tcg_gen_deposit_i64(xth, xth, tmp, 32, 32); >> >> > - >> >> > - tcg_gen_addi_tl(EA, EA, 4); >> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA); >> >> > - tcg_gen_addi_tl(EA, EA, 4); >> >> > - gen_qemu_ld32u_i64(ctx, xtl, EA); >> >> > - tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32); >> >> > - >> >> > + tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ); >> >> > + gen_helper_deposit32x2(xth, xth); >> >> > + tcg_gen_addi_tl(EA, EA, 8); >> >> > + tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ); >> >> > + gen_helper_deposit32x2(xtl, xtl); >> > >> > ..and I think this is wrong for BE mode. The deposit32x2 will get the >> > words in the right order, but the bytes within each word will be wrong >> > because of the LE mode load on a BE setup. >> >> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test >> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware. >> The order within the word is not changed. Snippet of the test code at >> the end of email. Can share full code if needed (maybe will do it in >> kvm-unit-test) > > Ugh.. now I'm confused. I would not have expected the results you've > seen from these tests. But I still can't understand *how* the > emulation could be correct: IIUC MO_LEQ would mean it loads the 8 > bytes as a single 64-bit LE integer.
For both the case LE/BE we do a LE read ... > Which should be the same as > loading one 32-bit LE integer into the low half of the target > register, then a 32-bit LE integer into the high half ot the target > register. .. The 64-bit integer read is not same in these cases. The input itself would be in the order of the format. Input rb32[]: 00010203 20212223 30313233 40414243 LE: helper_deposit32x2: 2021222300010203 helper_deposit32x2: 4041424330313233 BE helper_deposit32x2: 2322212003020100 helper_deposit32x2: 4342414033323130 > > As I said above, the deposit32x2 will swap the order of the two ints, > but it won't byteswap the individual int32s which should have been BE > in memory. > > Can you find the flaw in my reasoning? One anomaly that I see in BE code generation: it also generates a stxvw4x after lxvw4x. I am not sure why. >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>> Input rb32[]: 00010203 20212223 30313233 40414243 gen_lxvw4x: called helper_deposit32x2: 2322212003020100 helper_deposit32x2: 4342414033323130 gen_stxvw4x: called helper_deposit32x2: 0302010023222120 helper_deposit32x2: 3332313043424140 Output VRT32: 00010203 20212223 30313233 40414243 >> vsx.h: >> ====== >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t)) >> >> typedef union { >> __vector uint32_t v; >> uint32_t a[U32_SIZE]; >> } vuint32_t; > > I am a little suspicious that whatever the compiler does to convert > the vector to an array via this union might be undoing a byte reverse. > > I'd be more confident if you used VSX instructions to extract and > store separately one of the 32-bit subwords of the vector. I will try to figure those instructions. Regards Nikunj