Re: [PATCH] micro-optimize RADEONCopySwap in radeon_accel.c for powerpc

2016-11-04 Thread Jochen Rollwagen
On 02.11.2016 at 04:28, Michel Dänzer wrote: On 02/11/16 02:39 AM, Jochen Rollwagen wrote: So as far as I’m concerned the last version of the patch I sent is the final one (I’m attaching it). Please re-send the patch you want to be applied generated by git format-patch, ideally with your

Re: [PATCH] micro-optimize RADEONCopySwap in radeon_accel.c for powerpc

2016-11-01 Thread Michel Dänzer
On 02/11/16 02:39 AM, Jochen Rollwagen wrote:
> So as far as I’m concerned the last version of the patch I sent is the
> final one (I’m attaching it).

Please re-send the patch you want to be applied generated by git format-patch, ideally with your Signed-off-by tag.

> On my machine I’m

Re: [PATCH] micro-optimize RADEONCopySwap in radeon_accel.c for powerpc

2016-11-01 Thread Jochen Rollwagen
On 31.10.2016 at 19:30, Matt Turner wrote: On Mon, Oct 31, 2016 at 11:16 AM, Jochen Rollwagen wrote: On 31.10.2016 at 07:01, Matt Turner wrote: On Fri, Oct 28, 2016 at 1:28 AM, Jochen Rollwagen wrote: Hi there, gcc seems to create some

Re: [PATCH] micro-optimize RADEONCopySwap in radeon_accel.c for powerpc

2016-10-31 Thread Jochen Rollwagen
On 31.10.2016 at 07:01, Matt Turner wrote: On Fri, Oct 28, 2016 at 1:28 AM, Jochen Rollwagen wrote: Hi there, gcc seems to create some sub-optimal code for the following code sequence in radeon_accel.c: for (; nwords > 0; --nwords, ++d, ++s) *d = ((*s &

Re: [PATCH] micro-optimize RADEONCopySwap in radeon_accel.c for powerpc

2016-10-31 Thread Matt Turner
On Mon, Oct 31, 2016 at 11:16 AM, Jochen Rollwagen wrote:
> On 31.10.2016 at 07:01, Matt Turner wrote:
>> On Fri, Oct 28, 2016 at 1:28 AM, Jochen Rollwagen wrote:
>>> Hi there,
>>>
>>> gcc seems to create some sub-optimal code for the

Re: [PATCH] micro-optimize RADEONCopySwap in radeon_accel.c for powerpc

2016-10-31 Thread Matt Turner
On Fri, Oct 28, 2016 at 1:28 AM, Jochen Rollwagen wrote:
> Hi there,
>
> gcc seems to create some sub-optimal code for the following code sequence in
> radeon_accel.c:
>
> for (; nwords > 0; --nwords, ++d, ++s)
>     *d = ((*s & 0xffff) << 16) | ((*s >> 16) &

[PATCH] micro-optimize RADEONCopySwap in radeon_accel.c for powerpc

2016-10-30 Thread Jochen Rollwagen
Hi there,

gcc seems to create some sub-optimal code for the following code sequence in radeon_accel.c:

    for (; nwords > 0; --nwords, ++d, ++s)
        *d = ((*s & 0xffff) << 16) | ((*s >> 16) & 0xffff);

The body of the loop compiles to

    lwz 9,40(31)
    lwz 9,0(9)
    rotlwi 10,9,16
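
For readers following the thread, the loop quoted above swaps the two 16-bit halves of each 32-bit word, which is the same operation as a rotate by 16 bits (consistent with the `rotlwi 10,9,16` in the listing). The following is a minimal, standalone sketch of that equivalence, not the patch discussed in the thread. It assumes the mask is `0xffff`, which is what a swap of 16-bit halves requires, and the function names `swap_halves_mask`, `swap_halves_rot` and `copy_swap_halves` are hypothetical and do not come from radeon_accel.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the 16-bit halves of a word using explicit masks,
 * in the style of the loop body quoted in the thread
 * (mask value 0xffff is an assumption). */
static inline uint32_t swap_halves_mask(uint32_t w)
{
    return ((w & 0xffffu) << 16) | ((w >> 16) & 0xffffu);
}

/* The same operation written as a 32-bit rotate by 16.
 * Compilers commonly recognize this idiom and emit a single
 * rotate instruction where the target has one. */
static inline uint32_t swap_halves_rot(uint32_t w)
{
    return (w << 16) | (w >> 16);
}

/* Hypothetical helper: copy nwords words from src to dst,
 * swapping the 16-bit halves of each word on the way. */
static void copy_swap_halves(uint32_t *dst, const uint32_t *src,
                             size_t nwords)
{
    for (size_t i = 0; i < nwords; ++i)
        dst[i] = swap_halves_rot(src[i]);
}
```

Both helpers return the same value for every input; which form yields better machine code depends on the compiler version and optimization level, so any gain would need to be measured on the target machine.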