On Mon, Oct 31, 2016 at 11:16 AM, Jochen Rollwagen <[email protected]> wrote: > Am 31.10.2016 um 07:01 schrieb Matt Turner: >> >> On Fri, Oct 28, 2016 at 1:28 AM, Jochen Rollwagen <[email protected]> >> wrote: >>> >>> Hi there, >>> >>> gcc seems to create some sub-optimal code for the following code sequence >>> in >>> radeon_accel.c: >>> >>> for (; nwords > 0; --nwords, ++d, ++s) >>> *d = ((*s & 0xffff) << 16) | ((*s >> 16) & 0xffff); >>> >>> the body of the loop compiles to >>> >>> lwz 9,40(31) >>> lwz 9,0(9) >>> rotlwi 10,9,16 >>> lwz 9,36(31) >>> stw 10,0(9) >>> lwz 9,44(31) >>> addi 9,9,-1 >>> stw 9,44(31) >>> lwz 9,36(31) >>> addi 9,9,4 >>> stw 9,36(31) >>> lwz 9,40(31) >>> addi 9,9,4 >>> stw 9,40(31) >>> >>> this patch adds some (hopefully optimal) assembler code, bringing it in >>> line >>> with the other cases in the switch: >>> >>> diff --git a/src/radeon_accel.c b/src/radeon_accel.c >>> index 1def2a3..580fa33 100644 >>> --- a/src/radeon_accel.c >>> +++ b/src/radeon_accel.c >>> @@ -138,7 +138,16 @@ void RADEONCopySwap(uint8_t *dst, uint8_t *src, >>> unsigned int size, int swap) >>> unsigned int nwords = size >> 2; >>> >>> for (; nwords > 0; --nwords, ++d, ++s) >>> - *d = ((*s & 0xffff) << 16) | ((*s >> 16) & 0xffff); >>> +#ifdef __powerpc__ >>> + __asm__ volatile ("rlwinm %0,%1,%2,%3,%4\n\t" >>> + "rlwimi >>> %0,%1,%5,%6,%7\n\t" >>> + : "=&r" (*d) >>> + : "r" (*s),"i" (16), >>> "i" >>> (16),"i" (31) ,"i" (16), "i" (0),"i" (15) >>> + :); >>> + >>> +#else >>> + *d = ((*s & 0xffff) << 16) | ((*s >> 16) & >>> 0xffff); >>> +#endif >> >> It looks like this code just swaps the two 16-bit shorts. That looks >> like a rotate by 16. I don't know anything about PPC assembly, but if >> this description [1] of the rlwinm is correct, you should be able to >> do it in a single rlwinm instruction, right? E.g., __rlwinm(*s, 16, 0, >> 31). >> >> Also, if it's possible it would be more readable to embed the >> immediates in the assembly text string instead of "i" inputs. >> >> [1] https://gist.github.com/rygorous/1440600 > > > Hi Matt, > > thanks for your review. > > You’re perfectly right, the one rlwinm does the trick –with half the > instructions (one). > > Which very probably really makes this „the optimal“ code sequence.
Cool! > As for the integers, putting them directly in the asm statement seems to > make gcc’s inline asm interpret them as register names (r0, r16, r31). > There’s probably a funky compiler switch for that, but i guess the safest, > cleanest version is the one below. You might try prefixing them with a # or $. Some assembly language's syntax requires such a prefix for immediate values. Also, if you want you might be interested in replacing the other #ifdef __powerpc__ cases later in that file with uses of __builtin_bswap*. _______________________________________________ xorg-driver-ati mailing list [email protected] https://lists.x.org/mailman/listinfo/xorg-driver-ati
