Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-12-17 Thread Michael Niedermayer
On Mon, Dec 16, 2019 at 10:53:26PM +0100, Jean-Baptiste Kempf wrote: > On Mon, Dec 9, 2019, at 18:42, Sebastian Pop wrote: > > On Mon, Dec 9, 2019 at 5:01 AM Clément Bœsch wrote: > > > > > > On Sun, Dec 08, 2019 at 11:08:31PM +0200, Martin Storsjö wrote: > > > > On Sun, 8 Dec 2019, Clément Bœsch

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-12-16 Thread Jean-Baptiste Kempf
On Mon, Dec 9, 2019, at 18:42, Sebastian Pop wrote: > On Mon, Dec 9, 2019 at 5:01 AM Clément Bœsch wrote: > > > > On Sun, Dec 08, 2019 at 11:08:31PM +0200, Martin Storsjö wrote: > > > On Sun, 8 Dec 2019, Clément Bœsch wrote: > > > > > > > On Wed, Dec 04, 2019 at 05:24:46PM -0600, Sebastian Pop

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-12-09 Thread Sebastian Pop
On Mon, Dec 9, 2019 at 5:01 AM Clément Bœsch wrote: > > On Sun, Dec 08, 2019 at 11:08:31PM +0200, Martin Storsjö wrote: > > On Sun, 8 Dec 2019, Clément Bœsch wrote: > > > > > On Wed, Dec 04, 2019 at 05:24:46PM -0600, Sebastian Pop wrote: > > > > Hi Clément, > > > > > > > > please find attached

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-12-09 Thread Clément Bœsch
On Sun, Dec 08, 2019 at 11:08:31PM +0200, Martin Storsjö wrote: > On Sun, 8 Dec 2019, Clément Bœsch wrote: > > > On Wed, Dec 04, 2019 at 05:24:46PM -0600, Sebastian Pop wrote: > > > Hi Clément, > > > > > > please find attached the updated patch addressing all your comments. > > > Let me know if

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-12-08 Thread Martin Storsjö
On Sun, 8 Dec 2019, Clément Bœsch wrote: On Wed, Dec 04, 2019 at 05:24:46PM -0600, Sebastian Pop wrote: Hi Clément, please find attached the updated patch addressing all your comments. Let me know if there is anything else that I missed and that I need to address. I can't test but patch

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-12-08 Thread Clément Bœsch
On Wed, Dec 04, 2019 at 05:24:46PM -0600, Sebastian Pop wrote: > Hi Clément, > > please find attached the updated patch addressing all your comments. > Let me know if there is anything else that I missed and that I need to > address. > I can't test but patch LGTM. Aside from the commit

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-12-04 Thread Sebastian Pop
Hi Clément, please find attached the updated patch addressing all your comments. Let me know if there is anything else that I missed and that I need to address. Thanks, Sebastian On Sun, Dec 1, 2019 at 3:01 PM Martin Storsjö wrote: > > On Sun, 1 Dec 2019, Clément Bœsch wrote: > > > On Wed, Nov

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-12-01 Thread Martin Storsjö
On Sun, 1 Dec 2019, Clément Bœsch wrote: On Wed, Nov 27, 2019 at 12:30:35PM -0600, Sebastian Pop wrote: [...] From 9ecaa99fab4b8bedf3884344774162636eaa5389 Mon Sep 17 00:00:00 2001 From: Sebastian Pop Date: Sun, 17 Nov 2019 14:13:13 -0600 Subject: [PATCH] [aarch64] use FMA and increase vector

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-12-01 Thread Clément Bœsch
On Wed, Nov 27, 2019 at 12:30:35PM -0600, Sebastian Pop wrote: [...] > From 9ecaa99fab4b8bedf3884344774162636eaa5389 Mon Sep 17 00:00:00 2001 > From: Sebastian Pop > Date: Sun, 17 Nov 2019 14:13:13 -0600 > Subject: [PATCH] [aarch64] use FMA and increase vector factor to 4 > > This patch

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-27 Thread Ronald S. Bultje
Hi, On Thu, Nov 28, 2019 at 2:08 AM Ronald S. Bultje wrote: > Hi, > > On Wed, Nov 27, 2019 at 3:28 PM Sebastian Pop wrote: > >> On Wed, Nov 27, 2019 at 2:13 PM Clément Bœsch wrote: >> > Yeah I will by the end of the week. I wrote that a few years ago so I >> need >> > to take some time to get

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-27 Thread Ronald S. Bultje
Hi, On Wed, Nov 27, 2019 at 3:28 PM Sebastian Pop wrote: > On Wed, Nov 27, 2019 at 2:13 PM Clément Bœsch wrote: > > Yeah I will by the end of the week. I wrote that a few years ago so I > need > > to take some time to get back in the context. > > Thanks Clément for your help. > > > > > BTW,

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-27 Thread Sebastian Pop
On Wed, Nov 27, 2019 at 2:13 PM Clément Bœsch wrote: > Yeah I will by the end of the week. I wrote that a few years ago so I need > to take some time to get back in the context. Thanks Clément for your help. > > BTW, that's quite a huge speed improvement you're bringing in, are you > sure you

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-27 Thread Clément Bœsch
On Wed, Nov 27, 2019 at 07:36:01PM +, Pop, Sebastian wrote: > Thanks Jean-Baptiste for your review and suggestions on how to improve my > patch submission. > From the git logs I found out that Clément Bœsch wrote the original aarch64 > vectorization for that function. > Maybe Clément could

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-27 Thread Jean-Baptiste Kempf
On Wed, Nov 27, 2019, at 19:46, Sebastian Pop wrote: > On Wed, Nov 27, 2019 at 12:37 PM Jean-Baptiste Kempf > wrote: > > > Please let me know if I can make the patch better. > > > > Remove the commented lines. > > Attached the updated patch. OK for me. Cannot comment on the content. --

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-27 Thread Sebastian Pop
On Wed, Nov 27, 2019 at 12:37 PM Jean-Baptiste Kempf wrote: > > Please let me know if I can make the patch better. > > Remove the commented lines. Attached the updated patch. Thank you, Sebastian 0001-aarch64-use-FMA-and-increase-vector-factor-to-4.patch Description: Binary data

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-27 Thread Sebastian Pop
On Mon, Nov 25, 2019 at 11:20 PM Jean-Baptiste Kempf wrote: > > Is there a coding rule in ffmpeg that restricts the use of intrinsics? > > Yes. See doc/optimization.txt. > Use external asm (nasm/yasm) or inline asm (__asm__()), do not use intrinsics. Thanks for the pointer. > Also, here, you're

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-27 Thread Jean-Baptiste Kempf
Hello, On Wed, Nov 27, 2019, at 19:30, Sebastian Pop wrote: > Please find attached a patch that improves the existing code in > aarch64/hscale.S > Performance test with gcc and clang shows that the patch improves > performance by 34% on Graviton A1 instances: So, that is better than the

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-25 Thread Jean-Baptiste Kempf
On Tue, Nov 26, 2019, at 05:51, Sebastian Pop wrote: > On Mon, Nov 25, 2019 at 4:18 PM Jean-Baptiste Kempf wrote: > > Why adding a new version, in intrinsics, instead of changing the existing > > implementation? > > > > Personal preference: I like to read c code instead of asm. > Also I find it

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-25 Thread Sebastian Pop
On Mon, Nov 25, 2019 at 4:18 PM Jean-Baptiste Kempf wrote: > Why adding a new version, in intrinsics, instead of changing the existing > implementation? > Personal preference: I like to read c code instead of asm. Also I find it much easier to experiment by changing c code rather than asm. Is

Re: [FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-25 Thread Jean-Baptiste Kempf
Hello, On Mon, Nov 25, 2019, at 22:59, Sebastian Pop wrote: > This patch implements ff_hscale_8_to_15_neon with NEON fused multiply > accumulate > and bumps the vectorization factor from 2 to 4. I have seen speedups up to 15% > on Graviton A1 instances based on A-72 cpus. Why adding a new

[FFmpeg-devel] [aarch64] improve performance of ff_hscale_8_to_15_neon

2019-11-25 Thread Sebastian Pop
Hi, This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate and bumps the vectorization factor from 2 to 4. I have seen speedups up to 15% on Graviton A1 instances based on A-72 cpus. $ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf
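
For readers skimming the thread, here is a portable C sketch of the idea behind the patch description (multiply-accumulate, with more outputs handled per loop iteration). It is not the NEON assembly from the attached patch, which this index does not show. The function names are hypothetical. The semantics (8-bit source, 16-bit coefficients, shift right by 7, clamp to 15 bits) are an assumption based on the function name and on how the generic C scaler is commonly described. Reading "vectorization factor 4" as four outputs per iteration is also an assumption.

```c
#include <stdint.h>

/* Assumed semantics: clamp the shifted sum to the 15-bit maximum. */
static int16_t clamp15(int val)
{
    val >>= 7;
    return (int16_t)(val < 32767 ? val : 32767);
}

/* Scalar reference: one output per iteration, each output is a
 * filterSize-tap dot product of source pixels and coefficients. */
static void hscale_8_to_15_ref(int16_t *dst, int dstW, const uint8_t *src,
                               const int16_t *filter,
                               const int32_t *filterPos, int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += (int)src[filterPos[i] + j] * filter[filterSize * i + j];
        dst[i] = clamp15(val);
    }
}

/* Same result, four outputs per iteration with four independent
 * accumulators. Independent accumulators are what lets a SIMD unit
 * (or a compiler) keep several multiply-accumulate chains in flight. */
static void hscale_8_to_15_x4(int16_t *dst, int dstW, const uint8_t *src,
                              const int16_t *filter,
                              const int32_t *filterPos, int filterSize)
{
    int i = 0;
    for (; i + 4 <= dstW; i += 4) {
        int acc[4] = { 0, 0, 0, 0 };
        for (int j = 0; j < filterSize; j++)
            for (int k = 0; k < 4; k++)
                acc[k] += (int)src[filterPos[i + k] + j] *
                          filter[filterSize * (i + k) + j];
        for (int k = 0; k < 4; k++)
            dst[i + k] = clamp15(acc[k]);
    }
    /* Leftover outputs when dstW is not a multiple of 4. */
    for (; i < dstW; i++) {
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += (int)src[filterPos[i] + j] * filter[filterSize * i + j];
        dst[i] = clamp15(val);
    }
}
```

Both functions should produce identical output for the same inputs. Any speedup from the second form depends on the compiler and CPU, so the figures quoted in the thread apply only to the actual NEON patch.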