On Thu, Sep 29, 2016 at 12:11:51AM +0530, Nikunj A Dadhania wrote:
> This series contains 7 new instructions for POWER9 ISA3.0
> Use newer qemu load/store tcg helpers and optimize stxvw4x and lxvw4x.
>
> GCC was adding epilogue for every VSX instructions causing change in
> behaviour. For testing the load vector instructions used mfvsrld/mfvsrd
> for loading vsr to register. And for testing store vector, used mtvsrdd
> instructions. This helped in getting rid of the epilogue added by gcc.
>
> Patches:
> 01: mfvsrld: Move From VSR Lower Doubleword
> 02: mtvsrdd: Move To VSR Double Doubleword
> 03: mtvsrws: Move To VSR Word & Splat
> 05: lxvw4x: improve implementation
> 05: stxv4x: improve implementation
> 06: lxvh8x: Load VSX Vector Halfword*8
> 07: stxvh8x: Store VSX Vector Halfword*8
> 08: lxvb16x: Load VSX Vector Byte*16
> 09: stxvb16x: Store VSX Vector Byte*16

I've applied everything that rth reviewed to ppc-for-2.8.  I've tweaked
the ascii art diagrams describing the endianness transformations.
Specifically I removed the within-element spaces for each element on
the vector (not memory) side.  That's to emphasise the fact that
in-register there's no endianness, just numbers.

>
> Changelog:
> v4:
> * Added gen_bswap16x8 inline for lxvh8x and stxvh8x in tcg
> * Dropped helper_bswap16x4
> * Use temporaries in stxvh8x and not clobber the register
>
> v3:
> * Added 3 new VSR instructions.
> * Fixed all the vector load/store instructions for BE/LE.
> * Added detailed commit messages to patches.
> * Dropped deposit32x2 and implemented it using tcg ops
>
> v2:
> * Fix lxvw4x/stxv4x translation as LE/BE were both similar
>   one in tcg and other as helper
> * Rename bswap32x2 to deposit32x2 as it does not need to
>   swap content(32bit)
> * stxvh8x had a bug as David suggested.
>
> v1:
> * More load/store cleanups in byte reverse routines
> * ld64/st64 converted to newer macro and updated call sites
> * Cleanup load with reservation and store conditional
> * Return invalid random for darn instruction
>
> v0:
> * darn - read /dev/random to get the random number
> * xxspltib - make is PPC64 only
> * Consolidate load/store operations and use macros to generate qemu_st/ld
> * Simplify load/store vsx endian manipulation
>
> Nikunj A Dadhania (6):
>   target-ppc: improve lxvw4x implementation
>   target-ppc: improve stxvw4x implementation
>   target-ppc: add lxvh8x instruction
>   target-ppc: add stxvh8x instruction
>   target-ppc: add lxvb16x instruction
>   target-ppc: add stxvb16x instruction
>
> Ravi Bangoria (3):
>   target-ppc: Implement mfvsrld instruction
>   target-ppc: Implement mtvsrdd instruction
>   target-ppc: Implement mtvsrws instruction
>
>  target-ppc/translate/vsx-impl.inc.c | 238 ++++++++++++++++++++++++++++++++----
>  target-ppc/translate/vsx-ops.inc.c  |   7 ++
>  2 files changed, 221 insertions(+), 24 deletions(-)
>

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
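
For anyone following the v4 changelog note about gen_bswap16x8, here is a minimal sketch in plain C of swapping the two bytes inside each 16-bit lane of a 64-bit value. It is an illustration only, not the code from the series: the names bswap16x4 and bswap16x4_example are hypothetical, and it operates on a plain uint64_t rather than on TCG values.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Swap the two bytes inside each of the four 16-bit lanes of x.
 * Hypothetical helper for illustration; not taken from the patches.
 */
static inline uint64_t bswap16x4(uint64_t x)
{
    const uint64_t mask = 0x00FF00FF00FF00FFULL;

    /* Low byte of each lane moves up, high byte of each lane moves down. */
    return ((x & mask) << 8) | ((x >> 8) & mask);
}

/* Small self-check: each halfword is byte-swapped, lane order is kept. */
static inline void bswap16x4_example(void)
{
    assert(bswap16x4(0x0011223344556677ULL) == 0x1100332255447766ULL);
    /* Applying the swap twice gives back the original value. */
    assert(bswap16x4(bswap16x4(0x0123456789ABCDEFULL)) ==
           0x0123456789ABCDEFULL);
}
```

A 128-bit vector of eight halfwords would apply the same operation to each of its two 64-bit halves. Only the bytes within a lane move; the lanes themselves stay in place.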