Re: [libav-devel] [PATCH 19/19] aarch64: vp8: Optimize vp8_idct_add_neon for aarch64

2019-02-19 Thread Martin Storsjö
On Fri, 1 Feb 2019, Martin Storsjö wrote: The previous version was a pretty exact translation of the arm version. This version does do some unnecessary arithmetic (it does more operations on vectors that are only half filled; it does 4 uaddw and 4 sqxtun instead of 2 of each), but it reduces

[libav-devel] [PATCH 19/19] aarch64: vp8: Optimize vp8_idct_add_neon for aarch64

2019-02-01 Thread Martin Storsjö
The previous version was a pretty exact translation of the arm version. This version does do some unnecessary arithmetic (it does more operations on vectors that are only half filled; it does 4 uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead of packing data together (which
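
For readers unfamiliar with the two instructions named above, here is a rough scalar C model of what a uaddw followed by a sqxtun does per lane when a 4x4 residual is added to the destination pixels. It is only a sketch, not the code from the patch: the helper names (widen_add, sat_narrow_u8, add_residual_4x4) and the row-major residual layout are assumptions made for illustration, and the NEON version handles several lanes per instruction rather than looping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one uaddw lane: widen an unsigned 8-bit pixel and add it
 * to a 16-bit value. Assumes the sum fits in 16 bits, which holds for the
 * small residuals an IDCT produces. */
static int16_t widen_add(int16_t residual, uint8_t pixel)
{
    return (int16_t)(residual + pixel);
}

/* Scalar model of one sqxtun lane: narrow a signed 16-bit value to an
 * unsigned 8-bit value, saturating to the range 0..255. */
static uint8_t sat_narrow_u8(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical helper: add a 4x4 block of 16-bit residuals (row-major, an
 * assumed layout) to the 8-bit destination pixels, clamping each result. */
static void add_residual_4x4(uint8_t *dst, ptrdiff_t stride,
                             const int16_t residual[16])
{
    assert(dst != NULL && residual != NULL);
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            uint8_t *p = &dst[y * stride + x];
            *p = sat_narrow_u8(widen_add(residual[y * 4 + x], *p));
        }
    }
}
```

In this model, doing "4 uaddw and 4 sqxtun instead of 2 of each" corresponds to handling one 4-pixel row per instruction in a half-filled vector, rather than first packing two rows into one full vector.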