David Gibson <da...@gibson.dropbear.id.au> writes: > [ Unknown signature status ] > On Tue, Sep 20, 2016 at 10:40:03PM +0530, Nikunj A Dadhania wrote: >> David Gibson <da...@gibson.dropbear.id.au> writes: >> >> > [ Unknown signature status ] >> > On Mon, Sep 19, 2016 at 04:06:40PM +0530, Nikunj A Dadhania wrote: >> >> David Gibson <da...@gibson.dropbear.id.au> writes: >> >> > [ Unknown signature status ] >> >> > On Mon, Sep 19, 2016 at 04:19:34PM +1000, David Gibson wrote: >> >> >> On Fri, Sep 16, 2016 at 04:21:48PM +0530, Nikunj A Dadhania wrote: >> >> >> > diff --git a/target-ppc/translate/vsx-impl.inc.c >> >> >> > b/target-ppc/translate/vsx-impl.inc.c >> >> >> > index eee6052..df278df 100644 >> >> >> > --- a/target-ppc/translate/vsx-impl.inc.c >> >> >> > +++ b/target-ppc/translate/vsx-impl.inc.c >> >> >> > @@ -75,7 +75,6 @@ static void gen_lxvdsx(DisasContext *ctx) >> >> >> > static void gen_lxvw4x(DisasContext *ctx) >> >> >> > { >> >> >> > TCGv EA; >> >> >> > - TCGv_i64 tmp; >> >> >> > TCGv_i64 xth = cpu_vsrh(xT(ctx->opcode)); >> >> >> > TCGv_i64 xtl = cpu_vsrl(xT(ctx->opcode)); >> >> >> > if (unlikely(!ctx->vsx_enabled)) { >> >> >> > @@ -84,22 +83,14 @@ static void gen_lxvw4x(DisasContext *ctx) >> >> >> > } >> >> >> > gen_set_access_type(ctx, ACCESS_INT); >> >> >> > EA = tcg_temp_new(); >> >> >> > - tmp = tcg_temp_new_i64(); >> >> >> > >> >> >> > gen_addr_reg_index(ctx, EA); >> >> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA); >> >> >> > - tcg_gen_addi_tl(EA, EA, 4); >> >> >> > - gen_qemu_ld32u_i64(ctx, xth, EA); >> >> >> > - tcg_gen_deposit_i64(xth, xth, tmp, 32, 32); >> >> >> > - >> >> >> > - tcg_gen_addi_tl(EA, EA, 4); >> >> >> > - gen_qemu_ld32u_i64(ctx, tmp, EA); >> >> >> > - tcg_gen_addi_tl(EA, EA, 4); >> >> >> > - gen_qemu_ld32u_i64(ctx, xtl, EA); >> >> >> > - tcg_gen_deposit_i64(xtl, xtl, tmp, 32, 32); >> >> >> > - >> >> >> > + tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_LEQ); >> >> >> > + gen_helper_deposit32x2(xth, xth); >> >> >> > + tcg_gen_addi_tl(EA, EA, 8); >> >> >> > + tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_LEQ); >> >> >> > + gen_helper_deposit32x2(xtl, xtl); >> >> > >> >> > ..and I think this is wrong for BE mode. The deposit32x2 will get the >> >> > words in the right order, but the bytes within each word will be wrong >> >> > because of the LE mode load on a BE setup. >> >> >> >> Since lxvw4x/stxvw4x is available on POWER8. I tried running my test >> >> code on BE and LE Fedora24 VM. TCG Results match the POWER8 hardware. >> >> The order within the word is not changed. Snippet of the test code at >> >> the end of email. Can share full code if needed (maybe will do it in >> >> kvm-unit-test) >> > >> > Ugh.. now I'm confused. I would not have expected the results you've >> > seen from these tests. But I still can't understand *how* the >> > emulation could be correct: IIUC MO_LEQ would mean it loads the 8 >> > bytes as a single 64-bit LE integer. >> >> For both the case LE/BE we do a LE read ... > > .. and I can't see how that can be right for the BE case. > >> > Which should be the same as >> > loading one 32-bit LE integer into the low half of the target >> > register, then a 32-bit LE integer into the high half ot the target >> > register. >> >> .. The 64-bit integer read is not same in these cases. The input itself >> would be in the order of the format. >> >> Input rb32[]: 00010203 20212223 30313233 40414243 >> >> LE: >> helper_deposit32x2: 2021222300010203 >> helper_deposit32x2: 4041424330313233 >> >> BE >> helper_deposit32x2: 2322212003020100 >> helper_deposit32x2: 4342414033323130 > > > Sorry.. I can't really follow the above, because I'm not sure if > you're displaying the bytes within each word in significance order, or > increasing-address order.
Ah, thats just a print inside the helper just to check what MO_LEQ returned: uint64_t helper_deposit32x2(uint64_t x) { fprintf(stderr, "%s: %016lx\n", __func__, x); return deposit64((x >> 32), 32, 32, (x)); } > >> > >> > As I said above, the deposit32x2 will swap the order of the two ints, >> > but it won't byteswap the individual int32s which should have been BE >> > in memory. >> > >> > Can you find the flaw in my reasoning? >> >> One anomaly that I see in BE code generation: it also generates a >> stxvw4x after lxvw4x. I am not sure why. > > Ah... see I'm wondering if it's using the stxvw4x to store back to the > union which you then get the results from. If that's so it could > explain the results, since the bug I suspect is in lxvw4x would be > cancelled out by the corresponding bug in stxv4wx, which is exactly > why I'd prefer the approach to testing mentioned below. Yes, I am investigating it. >> >>>>>>>>>>>>>>>> BE BE BE >>>>>>>>>>>>>> >> Input rb32[]: 00010203 20212223 30313233 40414243 >> >> gen_lxvw4x: called >> helper_deposit32x2: 2322212003020100 >> helper_deposit32x2: 4342414033323130 >> gen_stxvw4x: called >> helper_deposit32x2: 0302010023222120 >> helper_deposit32x2: 3332313043424140 >> Output VRT32: 00010203 20212223 30313233 40414243 >> >> >> vsx.h: >> >> ====== >> >> #define U32_SIZE (sizeof(__vector uint32_t) / sizeof(uint32_t)) >> >> >> >> typedef union { >> >> __vector uint32_t v; >> >> uint32_t a[U32_SIZE]; >> >> } vuint32_t; >> > >> > I am a little suspicious that whatever the compiler does to convert >> > the vector to an array via this union might be undoing a byte reverse. >> > >> > I'd be more confident if you used VSX instructions to extract and >> > store separately one of the 32-bit subwords of the vector. >> >> I will try to figure those instructions. > > Ok, thanks. > Regards, Nikunj