On Fri, 1 Feb 2019, Martin Storsjö wrote:
The previous version was a pretty exact translation of the arm
version. This version does some unnecessary arithmetic (it does
more operations on vectors that are only half filled; it does 4
uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
of packing data together (which could be done for free in the arm
version).
This gives a decent speedup on Cortex A53, a minor speedup on
A72 and a very minor slowdown on Cortex A73.
Before:              Cortex A53    A72    A73
vp8_idct_add_neon:         79.7   67.5   65.0
After:
vp8_idct_add_neon:         67.7   64.8   66.7
---
libavcodec/aarch64/vp8dsp_neon.S | 49 ++++++++++++++++++++--------------------
1 file changed, 25 insertions(+), 24 deletions(-)
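
For readers not familiar with the two instructions named in the commit
message, here is a rough scalar model in C of what one lane of uaddw and
sqxtun does when the residual is added back onto the destination pixels.
This is only a sketch to illustrate the arithmetic, not the code from
vp8dsp_neon.S; the function names (uaddw_lane, sqxtun_lane,
add_residual_row, check_add_residual_row) are made up for this example,
and a 4 pixel row is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of UADDW: widen the 8-bit value and add it to the 16-bit
 * value, wrapping modulo 2^16. */
static uint16_t uaddw_lane(uint16_t wide, uint8_t narrow)
{
    return (uint16_t)(wide + narrow);
}

/* Reinterpret 16 bits as a signed value without relying on
 * implementation-defined narrowing conversions. */
static int16_t as_signed16(uint16_t v)
{
    return v >= 0x8000 ? (int16_t)((int32_t)v - 0x10000) : (int16_t)v;
}

/* One lane of SQXTUN: narrow a signed 16-bit value to unsigned 8 bits,
 * saturating to the range 0..255. */
static uint8_t sqxtun_lane(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical helper: add a row of signed residuals to a row of pixels
 * and clamp, using the two lane models above. */
static void add_residual_row(uint8_t dst[4], const int16_t residual[4])
{
    for (int i = 0; i < 4; i++) {
        uint16_t sum = uaddw_lane((uint16_t)residual[i], dst[i]);
        dst[i] = sqxtun_lane(as_signed16(sum));
    }
}

/* Small self-check covering underflow, overflow and in-range results. */
static void check_add_residual_row(void)
{
    uint8_t row[4] = { 10, 250, 0, 128 };
    const int16_t residual[4] = { -20, 10, 5, -1 };

    add_residual_row(row, residual);
    assert(row[0] == 0);    /* 10 - 20 clamps to 0 */
    assert(row[1] == 255);  /* 250 + 10 clamps to 255 */
    assert(row[2] == 5);
    assert(row[3] == 127);
}
```

The unsigned add is fine for signed residuals because the sum wraps
modulo 2^16 and is only interpreted as signed at the narrowing step.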
22:38 <jannau> feel free to push next week if I didn't manage to start by
then
I'll push this patchset soon, with some changes squashed as suggested by
Diego.
// Martin