Hi,
This patch implements ff_hscale_8_to_15_neon with NEON fused multiply-accumulate
and bumps the vectorization factor from 2 to 4. I have seen speedups of up to 15%
on Graviton A1 instances based on Cortex-A72 CPUs.
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop
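
For readers following the thread, here is a minimal portable C sketch of the computation that hscale 8-to-15 performs, written so that four output samples are accumulated per iteration. It is not the patch itself: the function name hscale_8_to_15_x4 is hypothetical, and reading "vectorization factor 4" as "four output samples per iteration" is an assumption. The fixed-point convention (14-bit coefficients, result shifted right by 7 and clamped to 15 bits) follows the usual swscale C reference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; a scalar model of 8-bit in / 15-bit out horizontal
 * scaling that keeps four independent accumulators per iteration. */
void hscale_8_to_15_x4(int16_t *dst, int dstW, const uint8_t *src,
                       const int16_t *filter, const int32_t *filterPos,
                       int filterSize)
{
    /* Assumption for this sketch: the width is a multiple of 4. */
    assert(dstW % 4 == 0 && filterSize > 0);

    for (int i = 0; i < dstW; i += 4) {
        int32_t acc[4] = { 0, 0, 0, 0 };

        /* One multiply-accumulate per tap and per output sample. */
        for (int j = 0; j < filterSize; j++)
            for (int k = 0; k < 4; k++)
                acc[k] += (int32_t)src[filterPos[i + k] + j] *
                          filter[filterSize * (i + k) + j];

        /* Scale back and clamp to the 15-bit output range. */
        for (int k = 0; k < 4; k++) {
            int32_t v = acc[k] >> 7;
            dst[i + k] = (int16_t)(v < 32767 ? v : 32767);
        }
    }
}
```

With a single tap of 16384 (1.0 in 14-bit fixed point) each output is simply the input sample shifted left by 7.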
On Mon, Nov 25, 2019 at 4:18 PM Jean-Baptiste Kempf wrote:
> Why adding a new version, in intrinsics, instead of changing the existing
> implementation?
>
Personal preference: I like to read C code instead of asm.
Also, I find it much easier to experiment by changing C code rather than asm.
Is t
:0.030462 min:0.030051
Tested with `make check` on aarch64-linux.
Please let me know if I can make the patch better.
Thank you,
Sebastian
From e04f9606f7ea581d8398eb2f37df2f59add8b374 Mon Sep 17 00:00:00 2001
From: Sebastian Pop
Date: Sun, 17 Nov 2019 14:13:13 -0600
Subject: [PATCH] [aarch64] use
On Wed, Nov 27, 2019 at 12:37 PM Jean-Baptiste Kempf wrote:
> > Please let me know if I can make the patch better.
>
> Remove the commented lines.
Attached the updated patch.
Thank you,
Sebastian
0001-aarch64-use-FMA-and-increase-vector-factor-to-4.patch
On Wed, Nov 27, 2019 at 2:13 PM Clément Bœsch wrote:
> Yeah I will by the end of the week. I wrote that a few years ago so I need
> to take some time to get back in the context.
Thanks Clément for your help.
>
> BTW, that's quite a huge speed improvement you're bringing in, are you
> sure you ar
> On Wed, Nov 27, 2019 at 12:30:35PM -0600, Sebastian Pop wrote:
> > [...]
> >> From 9ecaa99fab4b8bedf3884344774162636eaa5389 Mon Sep 17 00:00:00 2001
> >> From: Sebastian Pop
> >> Date: Sun, 17 Nov 2019 14:13:13 -0600
> >> Subject: [PATCH] [aarch64] use
On Mon, Dec 9, 2019 at 5:01 AM Clément Bœsch wrote:
>
> On Sun, Dec 08, 2019 at 11:08:31PM +0200, Martin Storsjö wrote:
> > On Sun, 8 Dec 2019, Clément Bœsch wrote:
> >
> > > On Wed, Dec 04, 2019 at 05:24:46PM -0600, Sebastian Pop wrote:
> > > > Hi Clémen
Hi,
This patch rewrites the innermost loop of ff_yuv2planeX_8_neon to avoid zips and
horizontal adds by using fused multiply-adds. It also uses ld1r to load one
element and replicate it across all lanes of the vector, and it improves the
clipping code by removing the shift right ins
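
As an illustration of the idea described above, here is a rough scalar C model of a vertical filter that keeps one accumulator per lane and multiplies a whole row segment by a single replicated coefficient, so no interleaving or cross-lane sum is needed. It is a sketch, not the NEON code from the patch: yuv2planeX_8_sketch and clip_uint8 are hypothetical names, the 8-lane width is an assumption, and the constants (dither << 12, final >> 19) follow the usual swscale C reference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: saturate an int to the 0..255 range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical name; scalar model of an 8-lane vertical filter. */
void yuv2planeX_8_sketch(const int16_t *filter, int filterSize,
                         const int16_t **src, uint8_t *dest, int dstW,
                         const uint8_t *dither, int offset)
{
    /* Assumption for this sketch: the width is a multiple of 8. */
    assert(dstW % 8 == 0 && filterSize > 0);

    for (int i = 0; i < dstW; i += 8) {
        int32_t acc[8];

        /* Start every lane from its dither value. */
        for (int k = 0; k < 8; k++)
            acc[k] = (int32_t)dither[(i + k + offset) & 7] << 12;

        for (int j = 0; j < filterSize; j++) {
            /* One coefficient load, reused by all lanes
             * (the scalar analogue of a load-and-replicate). */
            const int32_t coeff = filter[j];
            for (int k = 0; k < 8; k++)
                acc[k] += coeff * src[j][i + k];
        }

        /* Scale back to 8 bits and saturate. */
        for (int k = 0; k < 8; k++)
            dest[i + k] = clip_uint8(acc[k] >> 19);
    }
}
```

Each lane only ever adds into its own accumulator, which is why this layout needs no horizontal add at the end.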
On Mon, Dec 16, 2019 at 3:56 PM Jean-Baptiste Kempf wrote:
>
> On Tue, Dec 10, 2019, at 23:38, Sebastian Pop wrote:
>> Please let me know how I can improve the patch.
>
> No remarks from me.
>
Clément, any further feedback to improve the patch?
Ok to commi
hscale is bound by the number of multiply-adds available on a given core.
The attached patch doubles the number of multiply-adds available by distributing
half the load to a helper thread.
Performance improves by up to 50% on Graviton2 Arm Neoverse-N1 processors.
$ ./ffmpeg_g -nostats -f lavfi -i testsrc2
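
To make the load-splitting idea concrete, here is a small sketch using POSIX threads: the calling thread filters the lower half of the output row while a helper thread filters the upper half. This is not the attached patch. HScaleJob, hscale_range, hscale_worker and hscale_with_helper are hypothetical names, and creating a thread on every call is a simplification made only for readability; a real implementation would more likely keep a persistent helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical job descriptor: one contiguous range of output samples. */
typedef struct HScaleJob {
    int16_t *dst;
    const uint8_t *src;
    const int16_t *filter;
    const int32_t *filterPos;
    int filterSize;
    int begin, end;
} HScaleJob;

/* Filter output samples [begin, end) of one row. */
static void hscale_range(const HScaleJob *job)
{
    for (int i = job->begin; i < job->end; i++) {
        int32_t val = 0;
        for (int j = 0; j < job->filterSize; j++)
            val += (int32_t)job->src[job->filterPos[i] + j] *
                   job->filter[job->filterSize * i + j];
        val >>= 7;
        job->dst[i] = (int16_t)(val < 32767 ? val : 32767);
    }
}

static void *hscale_worker(void *arg)
{
    hscale_range((const HScaleJob *)arg);
    return NULL;
}

/* Split one row between the caller and a helper thread. */
void hscale_with_helper(int16_t *dst, int dstW, const uint8_t *src,
                        const int16_t *filter, const int32_t *filterPos,
                        int filterSize)
{
    assert(dstW >= 0 && filterSize > 0);

    const int half = dstW / 2;
    HScaleJob lower = { dst, src, filter, filterPos, filterSize, 0, half };
    HScaleJob upper = { dst, src, filter, filterPos, filterSize, half, dstW };
    pthread_t helper;

    if (pthread_create(&helper, NULL, hscale_worker, &upper) != 0) {
        /* Could not start the helper: do all the work on this thread. */
        hscale_range(&lower);
        hscale_range(&upper);
        return;
    }
    hscale_range(&lower);        /* caller handles the lower half */
    pthread_join(helper, NULL);  /* wait for the upper half */
}
```

The two ranges write disjoint parts of dst and only read the shared inputs, so no locking is needed beyond the final join.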
On Sat, Jul 18, 2020 at 1:35 AM Michael Niedermayer
wrote:
> Multithreading support should be added in an architecture-independent way
>
>
Attached patch moves helper threads up from hscale to
chr_h_scale and lum_h_scale in an architecture independent way.
This new version of the patch improves pe
Hi,
Unrolling the outer loop in yuv2planeX by 4 reduces the number of cache
accesses by 7.5%.
The values loaded for the filter are reused across the 4 unrolled iterations,
which avoids reloading the same values 3 times.
The performance was measured on an Arm64 Neoverse-N1 Graviton2 c6g.metal
instance with
Thanks Michael for your feedback.
On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer
wrote:
> faster is better obviously, so if it's tested with odd sizes and Arm
> developers had a chance to comment, it should be OK
>
>
The current patch was tested with `make check` on Arm64 Graviton2.
I also h
ch?
Thanks,
Sebastian
On Wed, Aug 19, 2020 at 1:37 PM Sebastian Pop wrote:
> Thanks Michael for your feedback.
>
> On Wed, Aug 19, 2020 at 6:55 AM Michael Niedermayer
> wrote:
>
>> faster is better obviously, so if its tested with odd sizes and arm
>> developers ha
44
> --- a/libswscale/aarch64/hscale.S
> +++ b/libswscale/aarch64/hscale.S
> @@ -1,5 +1,7 @@
> /*
> * Copyright (c) 2016 Clément Bœsch
> + * Copyright (c) 2019-2021 Sebastian Pop
> + * Co