[Pixman] [PATCH 2/3] armv7: Faster fill operations

2015-03-04 Thread Ben Avison
This eliminates a number of branches over blocks of code that are either empty or can be trivially combined with a separate code block at the start and end of each scanline. This has a surprisingly big effect, at least on Cortex-A7, for src_n_8: Before After Mean StdDev Mean

[Pixman] [PATCH 0/3] armv7 patches

2015-03-04 Thread Ben Avison
These are completely independent of my other patch series - a simple set of changes arising from my initial playing with the ARMv7 code.
Ben Avison (3):
  armv7: Coalesce scalar accesses where possible
  armv7: Faster fill operations
  armv7: Use VLD-to-all-lanes
pixman/pixman-arm-neon-asm.S |

[Pixman] [PATCH 3/3] armv7: Use VLD-to-all-lanes

2015-03-04 Thread Ben Avison
I noticed in passing that a number of opportunities to use the all-lanes variant of VLD have been missed. I don't expect any measurable speedup because these are all in init code, but this simplifies the code a bit. --- pixman/pixman-arm-neon-asm.S | 142 +-

Re: [Pixman] [PATCH 2/3] armv7: Faster fill operations

2015-03-04 Thread Ben Avison
On Thu, 05 Mar 2015 02:10:18 -, Matt Turner matts...@gmail.com wrote: "What do you use to generate this?" I use the script below to munge the output of lowlevel-blt-bench into a CSV file, then feed it through the perfcmp Python script. The script was originally written for this purpose, but
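The script Ben refers to is not included in this listing. As a rough illustration of the "munge into CSV" step only, here is a minimal Python sketch. It assumes each lowlevel-blt-bench result line has the shape `name = L1: x L2: x M: x (...) HT: x VT: x R: x RT: x (...)`; that line shape, the function names `bench_to_rows` and `rows_to_csv`, and the column order are assumptions for illustration, not taken from the thread, and this is not the perfcmp script.

```python
import csv
import io
import re

# Assumed result-line shape (hypothetical, not confirmed by the thread):
#   src_n_8 =  L1: 100.50  L2:  90.25  M: 80.00 ( 12.34%)  HT: 40.00 ...
LINE_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")
# A field is a short label, a colon, then a decimal number.
FIELD_RE = re.compile(r"([A-Za-z0-9]+):\s*([0-9]+(?:\.[0-9]+)?)")


def bench_to_rows(text):
    """Return a list of (test_name, {label: value}) for lines that parse."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # banner text, separators, blank lines
        fields = {k: float(v) for k, v in FIELD_RE.findall(match.group(2))}
        if fields:
            rows.append((match.group(1), fields))
    return rows


def rows_to_csv(rows, columns=("L1", "L2", "M", "HT", "VT", "R", "RT")):
    """Serialise parsed rows as CSV text, one benchmark per line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(("test",) + tuple(columns))
    for name, fields in rows:
        # Missing labels become empty cells rather than raising.
        writer.writerow([name] + [fields.get(c, "") for c in columns])
    return out.getvalue()
```

Running one such pass over a "before" log and one over an "after" log would give two CSV files that a separate comparison step could then consume.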

Re: [Pixman] [PATCH 2/3] armv7: Faster fill operations

2015-03-04 Thread Matt Turner
On Wed, Mar 4, 2015 at 5:56 PM, Ben Avison bavi...@riscosopen.org wrote: This eliminates a number of branches over blocks of code that are either empty or can be trivially combined with a separate code block at the start and end of each scanline. This has a surprisingly big effect, at least on