Re: [Qemu-devel] [PATCH 5/6] target-ppc: add lxvb16x and lxvh8x

Nikunj A Dadhania Wed, 10 Aug 2016 02:24:13 -0700

Richard Henderson <r...@twiddle.net> writes:

> On 08/08/2016 10:57 AM, Richard Henderson wrote:
>> On 08/07/2016 11:06 PM, Nikunj A Dadhania wrote:
>>> +#define LXV(name, access, swap, type, elems)                     \
>>> +uint64_t helper_##name(CPUPPCState *env,                         \
>>> +                       target_ulong addr)                        \
>>> +{                                                                \
>>> +    type r[elems] = {0};                                         \
>>> +    int i, index, bound, step;                                   \
>>> +    if (msr_le) {                                                \
>>> +        index = elems - 1;                                       \
>>> +        bound = -1;                                              \
>>> +        step = -1;                                               \
>>> +    } else  {                                                    \
>>> +        index = 0;                                               \
>>> +        bound = elems;                                           \
>>> +        step = 1;                                                \
>>> +    }                                                            \
>>> +                                                                 \
>>> +    for (i = index; i != bound; i += step) {                     \
>>> +        if (needs_byteswap(env)) {                               \
>>> +            r[i] = swap(access(env, addr, GETPC()));             \
>>> +        } else {                                                 \
>>> +            r[i] =  access(env, addr, GETPC());                  \
>>> +        }                                                        \
>>> +        addr = addr_add(env, addr, sizeof(type));                \
>>> +    }                                                            \
>>> +    return  *((uint64_t *)r);                                    \
>>> +}
>>
>> This looks more complicated than necessary.
>>
>> (1) In big-endian mode, surely this simplifies to two 64-bit big-endian 
>> loads.
>>
>> (2) In little-endian mode, the overhead of accessing memory surely dominates,
>> and therefore we should perform two 64-bit loads and manipulate the data 
>> after.
>>
>> AFAICS, this is easiest done by requesting two 64-bit *big-endian* loads, and
>> then swapping bytes.  E.g.
>>
>> uint64_t helper_bswap16x4(uint64_t x)
>> {
>>     uint64_t m = 0x00ff00ff00ff00ffull;
>>     return ((x & m) << 8) | ((x >> 8) & m);
>> }
>>
>> uint64_t helper_bswap32x2(uint64_t x)
>> {
>>     return deposit64(bswap32(x >> 32), 32, 32, bswap32(x));
>> }
>
> To correct myself, this big-endian load really only makes sense for lxvh8x.
> For lxvw4x, a little-endian load with a word swap is fewer operations.  I.e.
>
>    tcg_gen_qemu_ld_i64(t0, addr, ctx->mem_idx, MO_LEQ);
>    tcg_gen_shri_i64(t1, t0, 32);
>    tcg_gen_deposit_i64(dest, t1, t0, 32, 32);



I have following for lxvw4x:

    if (ctx->le_mode) {
        t0 = tcg_temp_new_i64();
        t1 = tcg_temp_new_i64();
        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
        tcg_gen_shri_i64(t1, t0, 32);
        tcg_gen_deposit_i64(xth, t1, t0, 32, 32);
        tcg_gen_addi_tl(EA, EA, 8);
        tcg_gen_qemu_ld_i64(t0, EA, ctx->mem_idx, MO_LEQ);
        tcg_gen_shri_i64(t1, t0, 32);
        tcg_gen_deposit_i64(xtl, t1, t0, 32, 32);
        tcg_temp_free_i64(t0);
        tcg_temp_free_i64(t1);
    } else {
        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_BEQ);
        tcg_gen_addi_tl(EA, EA, 8);
        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_BEQ);
    }

Here is the test code:

__vector uint32_t vrt32;
uint32_t rb32[4] = {0x00010203, 0x20212223, 0x30313233, 0x40414243};
asm("lxvw4x %x0, 0, %1 \n\t" \
     : "=ws"(vrt32) : "r"(&rb32));
printf("VRT32 = "); vec_put_u32(vrt32);

Result On LE:
VRT32 = 40414243 30313233 20212223 00010203

Result On BE:
VRT32 = 03020100 23222120 33323130 43424140

I had have expected the the BE result to be
VRT32 = 00010203 20212223 30313233 40414243

But seems there is something going under the hoods. Am I missing
something here?

I can fix the BE case using following but not sure if that will be
correct !

        tcg_gen_qemu_ld_i64(xth, EA, ctx->mem_idx, MO_Q);
        gen_helper_bswap32x2(xth, xth);
        tcg_gen_addi_tl(EA, EA, 8);
        tcg_gen_qemu_ld_i64(xtl, EA, ctx->mem_idx, MO_Q);
        gen_helper_bswap32x2(xtl, xtl);

uint64_t helper_bswap32x2(uint64_t x)
{
    return deposit64((x >> 32), 32, 32, (x));
}

And then I get the expected result:
VRT32 = 00010203 20212223 30313233 40414243

Regards,
Nikunj

Re: [Qemu-devel] [PATCH 5/6] target-ppc: add lxvb16x and lxvh8x

Reply via email to