This eliminates a number of branches over blocks of code that are either
empty or can be trivially combined with a separate code block at the start
and end of each scanline. This has a surprisingly big effect, at least on
Cortex-A7, for src_n_8:
Before After
Mean StdDev Mean
These are completely independent of my other patch series - a simple set of
changes arising from my initial playing with the ARMv7 code.
Ben Avison (3):
  armv7: Coalesce scalar accesses where possible
  armv7: Faster fill operations
  armv7: Use VLD-to-all-lanes
pixman/pixman-arm-neon-asm.S |
I noticed in passing that a number of opportunities to use the all-lanes
variant of VLD have been missed. I don't expect any measurable speedup because
these are all in init code, but this simplifies the code a bit.
---
pixman/pixman-arm-neon-asm.S | 142 +-
On Thu, 05 Mar 2015 02:10:18 -, Matt Turner matts...@gmail.com wrote:
> What do you use to generate this?
I use the script below to munge the output of lowlevel-blt-bench into a
CSV file, then feed it through the perfcmp Python script. The script
was originally written for this purpose, but
On Wed, Mar 4, 2015 at 5:56 PM, Ben Avison bavi...@riscosopen.org wrote:
> This eliminates a number of branches over blocks of code that are either
> empty or can be trivially combined with a separate code block at the start
> and end of each scanline. This has a surprisingly big effect, at least on