On Fri, 1 Feb 2019, Martin Storsjö wrote:

The previous version was a pretty exact translation of the ARM
version. This version does some unnecessary arithmetic (it does
more operations on vectors that are only half filled; it does 4
uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
of packing data together (which could be done for free in the ARM
version).
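
For readers unfamiliar with the two instructions named above, here is a
rough scalar sketch in C of what one lane of uaddw followed by sqxtun
computes: the 8-bit destination pixel is widened and added to the 16-bit
IDCT residual, and the sum is then narrowed back to 8 bits with unsigned
saturation. This is only an illustration, not the NEON code from the
patch; the function names (uaddw_lane, sqxtun_lane, add_residual_row) are
made up for this example, and the sum is assumed to stay within the
16-bit range.

```c
#include <stdint.h>

/* Scalar model of one uaddw lane: zero-extend an 8-bit pixel and add it
 * to a 16-bit value. Assumes the sum fits in 16 bits (hypothetical helper). */
static int16_t uaddw_lane(int16_t residual, uint8_t pixel)
{
    return (int16_t)(residual + pixel);
}

/* Scalar model of one sqxtun lane: narrow a signed 16-bit value to an
 * unsigned 8-bit value, clamping to the range [0, 255]. */
static uint8_t sqxtun_lane(int16_t value)
{
    if (value < 0)
        return 0;
    if (value > 255)
        return 255;
    return (uint8_t)value;
}

/* Add one row of residuals to one row of destination pixels in place. */
static void add_residual_row(uint8_t *dst, const int16_t *residual, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = sqxtun_lane(uaddw_lane(residual[i], dst[i]));
}
```

With four pixels per row, only half of an eight-lane 16-bit vector holds
useful data, which is the "half filled" case the description refers to.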

This gives a decent speedup on Cortex A53, a minor speedup on
A72 and a very minor slowdown on Cortex A73.

Before:              Cortex A53    A72    A73
vp8_idct_add_neon:         79.7   67.5   65.0
After:
vp8_idct_add_neon:         67.7   64.8   66.7
---
libavcodec/aarch64/vp8dsp_neon.S | 49 ++++++++++++++++++++--------------------
1 file changed, 25 insertions(+), 24 deletions(-)

22:38 <jannau> feel free to push next week if I didn't manage to start by
               then

I'll push this patchset soon, with some changes squashed as suggested by Diego.

// Martin