Thanks James for spotting this. I have sent two patches fixing the valgrind
error from checkasm and the unchecked av_mallocs.
I do not believe that the two remaining valgrind errors come from my patch,
although I may be mistaken. Using git bisect, I have
identified b94cd55155d8c061f1e1faca9076afe5
On 2/17/2021 5:24 PM, Paul B Mahol wrote:
On Tue, Feb 16, 2021 at 6:31 PM Alan Kelly <
alankelly-at-google@ffmpeg.org> wrote:
Looks like there are no comments, is this OK to be applied? Thanks
Applied, thanks for pinging.
Valgrind complains about this change. The checkasm test specific
On Tue, Feb 16, 2021 at 6:31 PM Alan Kelly <
alankelly-at-google@ffmpeg.org> wrote:
> Looks like there are no comments, is this OK to be applied? Thanks
>
Applied, thanks for pinging.
>
> On Tue, Feb 9, 2021 at 6:25 PM Paul B Mahol wrote:
>
> > Will apply if no comments.
> > __
Looks like there are no comments, is this OK to be applied? Thanks
On Tue, Feb 9, 2021 at 6:25 PM Paul B Mahol wrote:
> Will apply if no comments.
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-
Will apply if no comments.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Ping!
On Thu, Jan 14, 2021 at 3:47 PM Alan Kelly wrote:
> ---
> Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro
> Tests for multiple sizes in checkasm-sw_scale
> checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned
> loads
> libswscale/x86/Makefile |
---
Replaces cpuflag(mmx) with notcpuflag(sse3) for store macro
Tests for multiple sizes in checkasm-sw_scale
checkasm-sw_scale aligns memory on 8 bytes instead of 32 to catch aligned loads
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c | 130 ---
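On the alignment note above (8 bytes instead of 32): a minimal C sketch of
the idea, not the checkasm code itself. A buffer that happens to be 32-byte
aligned hides aligned loads and stores that would fault on less aligned
input, so the test hands out a pointer that is only 8-byte aligned. The
names ROW_BYTES, backing and misaligned_row are hypothetical and chosen for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROW_BYTES 64  /* hypothetical row size for the illustration */

/* Backing storage aligned to 32 bytes, with room for the 8-byte shift. */
static _Alignas(32) uint8_t backing[ROW_BYTES + 8];

/* Returns a pointer that is 8-byte aligned but never 32-byte aligned.
 * Code that uses a 16- or 32-byte aligned load on it will fault instead
 * of silently passing the test. */
static uint8_t *misaligned_row(void)
{
    uint8_t *p = backing + 8;
    assert((uintptr_t)p % 8 == 0);
    return p;
}
```

Passing such a pointer to the function under test is one way to make an
aligned load (mova) on unaligned data show up as a crash in the test rather
than in the field.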
Apologies for this: when I added mmx to the yasm file, I added a macro for
the stores selecting mova for mmx and movdqu for the others. However, the
condition "if cpuflag(mmx)" evaluates to true on every architecture, so I
replaced it with "if notcpuflag(sse3)".
The alignment in the checkasm test has been changed to 8 from 3
On Mon, Jan 11, 2021 at 05:46:31PM +0100, Alan Kelly wrote:
> ---
> Fixes a bug where, if there is no offset and a tail which is not processed by
> the sse3/avx2 version, the dither is modified
> Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it
> to yuv2yuvX.asm to reduce
---
Fixes a bug where, if there is no offset and a tail which is not processed by
the sse3/avx2 version, the dither is modified
Deletes mmx/mmxext yuv2yuvX version from swscale_template and adds it
to yuv2yuvX.asm to reduce code duplication and so that it may be used
to process the tail from th
It's a bug in the patch. The tail not processed by the sse3/avx2 version is
done by the mmx version. I used offset to account for the src pixels
already processed, however, dither is modified if offset is not 0. In cases
where there is a tail and offset is 0, this bug appears. I am working on a
sol
On Thu, Jan 07, 2021 at 10:41:19AM +0100, Alan Kelly wrote:
> ---
> Replaces mova with movdqu due to alignment issues
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c| 106 +---
> libswscale/x86/yuv2yuvX.asm | 117 ++
On Thu, Jan 07, 2021 at 10:39:56AM +0100, Alan Kelly wrote:
> Thanks for your patience with this, I have replaced mova with movdqu - movu
> generated a compile error on ssse3. What system did this crash on?
AMD Ryzen 9 3950X on linux
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF1
Thanks for your patience with this, I have replaced mova with movdqu - movu
generated a compile error on ssse3. What system did this crash on?
On Wed, Jan 6, 2021 at 9:10 PM Michael Niedermayer
wrote:
> On Tue, Jan 05, 2021 at 01:31:25PM +0100, Alan Kelly wrote:
> > Ping!
>
> crashes (due to ali
---
Replaces mova with movdqu due to alignment issues
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 106 +---
libswscale/x86/yuv2yuvX.asm | 117
tests/checkasm/sw_scale.c | 98 ++
On Tue, Jan 05, 2021 at 01:31:25PM +0100, Alan Kelly wrote:
> Ping!
crashes (due to alignment, I think)
(gdb) disassemble $rip-32,$rip+32
Dump of assembler code from 0x555730a1 to 0x555730e1:
0x555730a1 : int $0x71
0x555730a3 : out %al,$0x3
0x5557
Ping!
On Thu, Dec 17, 2020 at 11:42 AM Alan Kelly wrote:
> ---
> Fixes memory alignment problem in checkasm-sw_scale
> Tested on Linux 32 and 64 bit and mingw32
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c| 106 +---
> libswscale/x86/yuv2yu
---
Fixes memory alignment problem in checkasm-sw_scale
Tested on Linux 32 and 64 bit and mingw32
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 106 +---
libswscale/x86/yuv2yuvX.asm | 117
tests/checkasm/sw_sca
On Thu, Dec 10, 2020 at 04:46:26PM +0100, Alan Kelly wrote:
> ---
> Replaces ff_sws_init_swscale_x86 with ff_getSwsFunc
> Load offset is not gprsize but 8 on both 32 and 64 bit
> Removes sfence as NT store no longer used
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c| 106
---
Replaces ff_sws_init_swscale_x86 with ff_getSwsFunc
Load offset is not gprsize but 8 on both 32 and 64 bit
Removes sfence as NT store no longer used
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 106 +---
libswscale/x86/yuv2yuvX.asm | 117 +++
On 2020/12/09 11:19, Alan Kelly wrote:
---
Activates avx2 version of yuv2yuvX
Adds checkasm for yuv2yuvX
Modifies ff_yuv2yuvX_* signature to match yuv2yuvX_*
Replaces non-temporal stores with temporal stores
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 106 +++
This function is tested by fate-filter-fps-r. I have also added a checkasm
test and bench.
I have done a lot more testing and benching of this code and I am now happy
to activate the avx2 version because the performance is so good. On my
machine I get the following results for filter size 4 and 0
---
Activates avx2 version of yuv2yuvX
Adds checkasm for yuv2yuvX
Modifies ff_yuv2yuvX_* signature to match yuv2yuvX_*
Replaces non-temporal stores with temporal stores
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 106 +---
libswscale/x86/yuv2y
Quoting Alan Kelly (2020-11-19 09:41:56)
> ---
> All of Henrik's suggestions have been implemented. Additionally,
> m3 and m6 are permuted in avx2 before storing to ensure bit by bit
> identical results in avx2.
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c| 75 +++-
Ping
On Thu, Nov 19, 2020 at 9:42 AM Alan Kelly wrote:
> ---
> All of Henrik's suggestions have been implemented. Additionally,
> m3 and m6 are permuted in avx2 before storing to ensure bit by bit
> identical results in avx2.
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c
---
All of Henrik's suggestions have been implemented. Additionally,
m3 and m6 are permuted in avx2 before storing to ensure bit by bit
identical results in avx2.
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 75 +++
libswscale/x86/yuv2yuvX.asm | 118 ++
On Mon, Nov 16, 2020 at 11:03 AM Alan Kelly
wrote:
> +cglobal yuv2yuvX, 6, 7, 16, filter, filterSize, dest, dstW, dither, offset,
> src
Only 8 xmm registers are used, so 8 should be used instead of 16 here.
Otherwise it causes unnecessary spilling of registers on 64-bit
Windows.
> +%if ARCH_X86_
---
Fixes bug in sse3 path where m1 is not set correctly resulting in off
by one errors. The results are now bit by bit identical.
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 75
libswscale/x86/yuv2yuvX.asm | 114 ++
On Thu, Nov 12, 2020 at 09:33:18AM +0100, Alan Kelly wrote:
> ---
> It now works on x86-32
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c| 75
> libswscale/x86/yuv2yuvX.asm | 110
> 3 files changed, 121 insertio
---
It now works on x86-32
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 75
libswscale/x86/yuv2yuvX.asm | 110
3 files changed, 121 insertions(+), 65 deletions(-)
create mode 100644 libswscale/x86/yuv2yuvX.asm
Am Fr., 6. Nov. 2020 um 09:04 Uhr schrieb Alan Kelly
:
>
> The function was re-written in asm, this code is heavily derived from the
> original code, the algorithm remains unchanged, the implementation is
> optimized. Would you agree to adding the copyright from swscale.c:
> * Copyright (C) 2001-20
On Tue, Nov 10, 2020 at 09:43:47AM +0100, Alan Kelly wrote:
> ---
> yuv2yuvX.asm: Ports yuv2yuvX to asm, unrolls main loop and adds
> other small optimizations for ~20% speed-up. Copyright updated to
> include the original from swscale.c
> swscale.c: Removes yuv2yuvX_sse3 and calls new function
---
yuv2yuvX.asm: Ports yuv2yuvX to asm, unrolls main loop and adds
other small optimizations for ~20% speed-up. Copyright updated to
include the original from swscale.c
swscale.c: Removes yuv2yuvX_sse3 and calls new function ff_yuv2yuvX_sse3.
Calls yuv2yuvX_mmxext on remaining elements if r
The function was re-written in asm, this code is heavily derived from the
original code, the algorithm remains unchanged, the implementation is
optimized. Would you agree to adding the copyright from swscale.c:
* Copyright (C) 2001-2011 Michael Niedermayer
to this file, having both copyrights? Th
Am Di., 27. Okt. 2020 um 09:56 Uhr schrieb Alan Kelly
:
> --- /dev/null
> +++ b/libswscale/x86/yuv2yuvX.asm
> @@ -0,0 +1,105 @@
> +;**
> +;* x86-optimized yuv2yuvX
> +;* Copyright 2020 Google LLC
Either the commit message
Thanks for the feedback Anton.
The second patch incorporates changes suggested by James Almer:
avx2 instructions are wrapped in if cpuflag(avx2) and movddup restored
mm1 is replaced by m1 on x86_32
On Tue, Oct 27, 2020 at 10:40 AM Anton Khirnov wrote:
> Hi,
> Quoting Alan Kelly (2020-10-27 10
Hi,
Quoting Alan Kelly (2020-10-27 10:10:14)
> ---
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c| 75 -
> libswscale/x86/yuv2yuvX.asm | 109
> 3 files changed, 120 insertions(+), 65 deletions(-)
> create mode 10
---
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 75 -
libswscale/x86/yuv2yuvX.asm | 109
3 files changed, 120 insertions(+), 65 deletions(-)
create mode 100644 libswscale/x86/yuv2yuvX.asm
diff --git a/libswscale
Apologies for the multiple threads, my git send-email was wrongly
configured. This has been fixed.
This code has been tested on AVX2 giving a significant speedup, however,
until the ff_hscale* functions are ported to avx2, this should not be
enabled as it results in an overall slowdown of swscale
Thanks for the review, I have made the required changes. As I have changed
the subject the patch is in a new thread.
On Fri, Oct 23, 2020 at 4:10 PM James Almer wrote:
> On 10/23/2020 10:17 AM, Alan Kelly wrote:
> > Fixed. The wrong step size was used causing a write past the end of
> > the
---
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 75 --
libswscale/x86/yuv2yuvX.asm | 105
3 files changed, 116 insertions(+), 65 deletions(-)
create mode 100644 libswscale/x86/yuv2yuvX.asm
diff --git a/libswscal
On Fri, Oct 23, 2020 at 03:34:18PM +0200, Alan Kelly wrote:
> Fixed. The wrong step size was used causing a write past the end of
> the buffer. yuv2yuvX_mmxext is now called if there are any remaining
> pixels.
>
> There is currently no checkasm for these functions. Is this required for
> sub
On 10/23/2020 10:17 AM, Alan Kelly wrote:
> Fixed. The wrong step size was used causing a write past the end of
> the buffer. yuv2yuvX_mmxext is now called if there are any remaining pixels.
Please fix the commit subject (It's too long and contains commentary),
and keep comments about fixes be
Fixed. The wrong step size was used causing a write past the end of
the buffer. yuv2yuvX_mmxext is now called if there are any remaining
pixels.
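To illustrate the fix described above, here is a minimal C sketch, not the
actual patch. It assumes a wide kernel that writes a fixed block of pixels
per iteration and a narrow fallback for the remainder (the role
yuv2yuvX_mmxext plays here). BLOCK, wide_kernel, narrow_kernel and
process_row are hypothetical names, and the arithmetic inside the kernels is
a placeholder rather than the real filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of pixels one wide iteration writes; the real
 * value depends on the asm. */
#define BLOCK 16

/* Stand-in for the wide (sse3/avx2) kernel: writes exactly BLOCK pixels. */
static void wide_kernel(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] = (uint8_t)(src[i] >> 1);
}

/* Stand-in for the narrow fallback that handles the tail. */
static void narrow_kernel(uint8_t *dst, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] >> 1);
}

static void process_row(uint8_t *dst, const uint8_t *src, int width)
{
    assert(width >= 0);
    /* Only whole blocks go to the wide kernel. */
    int done = width - width % BLOCK;
    /* The loop step must equal the number of pixels written per call;
     * a smaller bound check or larger write is what runs off the end. */
    for (int i = 0; i < done; i += BLOCK)
        wide_kernel(dst + i, src + i);
    /* Remaining pixels, if any, go to the narrow fallback. */
    if (done < width)
        narrow_kernel(dst + done, src + done, width - done);
}
```

With width 37 and BLOCK 16, the wide kernel covers pixels 0 to 31 and the
fallback covers the last 5, so nothing beyond dst[36] is touched.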
There is currently no checkasm for these functions. Is this required for
submission?
(Apologies for the double mail, I used git send-email but it
Fixed. The wrong step size was used causing a write past the end of
the buffer. yuv2yuvX_mmxext is now called if there are any remaining pixels.
---
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 75 --
libswscale/x86/yuv2yuvX.asm | 105
On Thu, Oct 22, 2020 at 09:43:53AM +0200, Alan Kelly wrote:
> Other functions to be ported to avx2 have been identified and are on
> the todo list.
> ---
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c| 72 +++--
> libswscale/x86/yuv2yuvX.asm | 105 ++
Do we have checkasm for those functions?
On Thu, 22 Oct 2020, at 09:43, Alan Kelly wrote:
> Other functions to be ported to avx2 have been identified and are on
> the todo list.
> ---
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c| 72 +++--
> libswscal
Other functions to be ported to avx2 have been identified and are on
the todo list.
---
libswscale/x86/Makefile | 1 +
libswscale/x86/swscale.c| 72 +++--
libswscale/x86/yuv2yuvX.asm | 105
3 files changed, 112 insertions(+), 66 d