The QEMU 0.12.5 implementation of gen_neon_mull is as follows:
static inline void gen_neon_mull(TCGv_i64 dest, TCGv a, TCGv b, int size, int u)
{
    TCGv_i64 tmp;

    switch ((size << 1) | u) {
    case 0: gen_helper_neon_mull_s8(dest, a, b); break;
    case 1: gen_helper_neon_mull_u8(dest, a, b); break;
    case 2: gen_helper_neon_mull_s16(dest, a, b); break;
    case 3: gen_helper_neon_mull_u16(dest, a, b); break;
    case 4:
        tmp = gen_muls_i64_i32(a, b);
        tcg_gen_mov_i64(dest, tmp);
        break;
    case 5:
        tmp = gen_mulu_i64_i32(a, b);
        tcg_gen_mov_i64(dest, tmp);
        break;
    default: abort();
    }
} 

Logically, vmull.s32 and vmull.u32 could be implemented the same way as the
8-bit and 16-bit cases. For example:
    case 4: gen_helper_neon_mull_s32(dest, a, b); break;
    case 5: gen_helper_neon_mull_u32(dest, a, b); break;
I implemented it this way and tested it, and it works. So I don't understand
why vmull.s32 and vmull.u32 were implemented differently in QEMU 0.12.5. Could
someone please explain?
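
To make the comparison concrete, here is a rough plain-C model of what the
two kinds of case compute. It is only a sketch under an assumption: that a
and b are 32-bit values holding packed lanes (two 16-bit lanes in the 16-bit
case, one 32-bit lane in the 32-bit case) and that dest is the 64-bit packed
result. The *_model names are hypothetical and are not QEMU functions. Under
that assumption, the 32-bit case is a single 32x32->64 multiply, which may be
why it can be generated inline rather than through a helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the signed 16-bit case: each 32-bit input is
 * assumed to hold two 16-bit lanes; the 64-bit result holds the two
 * 32-bit products. */
uint64_t mull_s16_model(uint32_t a, uint32_t b)
{
    int32_t lo = (int32_t)(int16_t)(a & 0xffff) * (int16_t)(b & 0xffff);
    int32_t hi = (int32_t)(int16_t)(a >> 16) * (int16_t)(b >> 16);
    return (uint64_t)(uint32_t)lo | ((uint64_t)(uint32_t)hi << 32);
}

/* Hypothetical model of the signed 32-bit case: one lane per input, so
 * the whole operation is one sign-extended 32x32->64 multiply. */
uint64_t mull_s32_model(uint32_t a, uint32_t b)
{
    return (uint64_t)((int64_t)(int32_t)a * (int64_t)(int32_t)b);
}

/* Hypothetical model of the unsigned 32-bit case: one zero-extended
 * 32x32->64 multiply. */
uint64_t mull_u32_model(uint32_t a, uint32_t b)
{
    return (uint64_t)a * (uint64_t)b;
}

/* Small self-check showing how signed and unsigned results differ. */
void mull_model_self_check(void)
{
    assert(mull_s32_model(0xffffffffu, 2) == 0xfffffffffffffffeULL); /* -1 * 2 */
    assert(mull_u32_model(0xffffffffu, 2) == 0x00000001fffffffeULL);
}
```

A helper-based implementation of cases 4 and 5 would be expected to return
the same values as the two 32-bit models above, so both approaches can pass
the same tests.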


Thanks and Best Regards,
=====================
Tran Van Dung