Re: [FFmpeg-devel] [PATCH 1/5] lavu/pixfmt: add Y210/AYUV/Y410 pixel formats

2019-06-28 Thread James Darnley
On 2019-06-28 03:03, Hendrik Leppkes wrote: > On Fri, Jun 28, 2019 at 1:26 AM James Darnley wrote: >> >> On 2019-06-28 04:26, Linjie Fu wrote: >>> Previously, media driver provided planar format(like 420 8 bit), but >>> for HEVC Range Extension (422

Re: [FFmpeg-devel] [PATCH 1/5] lavu/pixfmt: add Y210/AYUV/Y410 pixel formats

2019-06-27 Thread James Darnley
On 2019-06-28 04:26, Linjie Fu wrote: > Previously, media driver provided planar format(like 420 8 bit), but > for HEVC Range Extension (422/444 8/10 bit), the decoded image is > produced in packed format. > > Y210/AYUV/Y410 are packed formats which are needed in HEVC Rext decoding > for both

Re: [FFmpeg-devel] [PATCH] avcodec: Add librav1e encoder

2019-05-28 Thread James Darnley
On 2019-05-28 22:00, Derek Buitenhuis wrote: > On 28/05/2019 20:58, James Almer wrote: >> I think x26* and vpx/aom call it crf? It's not in option_tables.h in any >> case. > > They do not. This is a constant quantizer mode, not constant rate factor. IIRC either qp or cqp signature.asc

Re: [FFmpeg-devel] [PATCH 1/7] libavfilter/vf_overlay.c: change the commands style for the macro defined function

2019-05-24 Thread James Darnley
On 2019-05-24 12:06, James Darnley wrote: > On 2019-05-24 11:36, lance.lmw...@gmail.com wrote: >> From: Limin Wang >> >> ... > > Why? I see why: so you don't screw-up the macros you create later.

Re: [FFmpeg-devel] [PATCH 1/7] libavfilter/vf_overlay.c: change the commands style for the macro defined function

2019-05-24 Thread James Darnley
On 2019-05-24 11:36, lance.lmw...@gmail.com wrote: > From: Limin Wang > > ... Why? And these are "comments" not "commands". signature.asc Description: OpenPGP digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org

Re: [FFmpeg-devel] [PATCH] avcodec/v210dec: Fix alignment check for AVX2

2019-05-18 Thread James Darnley
On 2019-05-18 12:15, Michael Niedermayer wrote: > On Sat, May 18, 2019 at 12:02:55PM +0200, James Darnley wrote: >> I object to the commit message though because it isn't a "null pointer >> dereference" but if that is the error as reported by the tool then keep >>

Re: [FFmpeg-devel] [PATCH] avcodec/v210dec: Fix alignment check for AVX2

2019-05-18 Thread James Darnley
On 2019-05-18 09:39, Michael Niedermayer wrote: > Fixes: "null pointer dereference" > Fixes: > 14551/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_V210_fuzzer-5088609952071680 > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg >

Re: [FFmpeg-devel] [PATCH 0/3] v210dec checkasm test and avx2 function

2019-04-18 Thread James Darnley
On 2019-04-10 14:47, James Darnley wrote: > I am resending my patches because I am not sure if I sent this version in > the past. I split my changes into two patches because they do separate > things. > > I also changed some tabs to spaces in Mike's AVX2 patch. >

Re: [FFmpeg-devel] [PATCH 3/3] libavcodec Adding ff_v210_planar_unpack AVX2

2019-04-10 Thread James Darnley
On 2019-04-10 14:47, James Darnley wrote: > From: Michael Stoner Screw you mailing list or git, which ever one of you managed to screw up the author's address. I will correct that, if I can.

[FFmpeg-devel] [PATCH 1/3] avcodec/v210dec: move DSP function setting into dedicated function

2019-04-10 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 16 ++-- libavcodec/v210dec.h | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..fd8a6b0d78 100644 --- a/libavcodec/v210dec.c +++

[FFmpeg-devel] [PATCH 3/3] libavcodec Adding ff_v210_planar_unpack AVX2

2019-04-10 Thread James Darnley
From: Michael Stoner Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck AVX2 is 1.4x faster than AVX --- Mike, is this still the patch you want applied. I had to make a small amendment to it because you had some tabs as indentation. libavcodec/v210dec.c | 10 +-

[FFmpeg-devel] [PATCH 0/3] v210dec checkasm test and avx2 function

2019-04-10 Thread James Darnley
I am resending my patches because I am not sure if I sent this version in the past. I split my changes into two patches because they do separate things. I also changed some tabs to spaces in Mike's AVX2 patch. James Darnley (2): avcodec/v210dec: move DSP function setting into dedicated

[FFmpeg-devel] [PATCH 2/3] checkasm: add test for v210dec

2019-04-10 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7dd50a8271 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is

Re: [FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-27 Thread James Darnley
On 2019-03-26 21:22, Mike Stoner via ffmpeg-devel wrote: > Hello, > I’ve accounted for all feedback on this so far, I’m wondering if it is ready > to be pushed upstream? > > Here are my results from ‘checkasm’ (lower is better): > > v210_unpack_c: 1636 > v210_unpack_ssse3: 611 >

[FFmpeg-devel] [PATCH 1/2] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-06 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 16 ++-- libavcodec/v210dec.h | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..fd8a6b0d78 100644 --- a/libavcodec/v210dec.c +++

[FFmpeg-devel] [PATCH 2/2] checkasm: add test for v210dec

2019-03-06 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7dd50a8271 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is

[FFmpeg-devel] [PATCH] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-06 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 16 ++-- libavcodec/v210dec.h | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..6db662538e 100644 --- a/libavcodec/v210dec.c +++

Re: [FFmpeg-devel] [PATCH] checkasm: add test for v210dec

2019-03-06 Thread James Darnley
On 2019-03-06 20:31, James Darnley wrote: > ... Wrong patch and wrong reference. Please ignore this.

[FFmpeg-devel] [PATCH] checkasm: add test for v210dec

2019-03-06 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7320ed5e37 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,76 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-06 Thread James Darnley
On 2019-03-06 10:11, Paul B Mahol wrote: > On 3/6/19, Carl Eugen Hoyos wrote: >> 2019-03-04 23:58 GMT+01:00, James Darnley : >>> Prepare for checkasm test. >>> --- >>> libavcodec/v210dec.c | 13 + >>> libavcodec/v210dec.h | 1 + >

[FFmpeg-devel] [PATCH 2/2] checkasm: add test for v210dec

2019-03-04 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7320ed5e37 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,76 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is

[FFmpeg-devel] [PATCH 1/2] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-04 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 13 + libavcodec/v210dec.h | 1 + 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..28cf00d320 100644 --- a/libavcodec/v210dec.c +++

Re: [FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2

2019-03-04 Thread James Darnley
On 2019-03-01 18:41, Michael Stoner wrote: > The AVX2 code leverages VPERMD to process 12 pixels/iteration. This is my > first patch submission so any comments are greatly appreciated. > > -Mike > > Tested on Skylake (Win32 & Win64) > 1920x1080 input frame > = > C code -

Re: [FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2

2019-03-04 Thread James Darnley
On 2019-03-03 15:44, Martin Vignali wrote: > Hello, > > ... > > Not directly related to this patch, but it can be interesting for testing > purpose to write a checkasm test for the v210 func decoding. > So it's more easy to check the perf for "each" cpu flags, and be sure, the > various width
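The idea behind such a test can be sketched in plain C: run a reference function and a candidate on the same pseudo-random input for a range of widths and compare the results. This is only an illustration under stated assumptions. It does not use the real checkasm macros, `unpack_ref`, `unpack_alt` and `functions_agree` are hypothetical names, and the samples are written out in word order rather than split into planes as the real decoder does.

```c
#include <stdint.h>
#include <string.h>

typedef void (*unpack_fn)(const uint32_t *src, uint16_t *dst, int words);

/* Reference: each 32-bit v210 word carries three 10-bit samples
 * in bits 0-9, 10-19 and 20-29. */
static void unpack_ref(const uint32_t *src, uint16_t *dst, int words)
{
    for (int i = 0; i < words; i++) {
        dst[3 * i + 0] =  src[i]        & 0x3FF;
        dst[3 * i + 1] = (src[i] >> 10) & 0x3FF;
        dst[3 * i + 2] = (src[i] >> 20) & 0x3FF;
    }
}

/* Hypothetical stand-in for an optimized version under test. */
static void unpack_alt(const uint32_t *src, uint16_t *dst, int words)
{
    for (int i = 0; i < words; i++) {
        uint32_t w = src[i];
        for (int k = 0; k < 3; k++, w >>= 10)
            dst[3 * i + k] = (uint16_t)(w & 0x3FF);
    }
}

/* Returns 1 if both functions agree for every width from 1 to max_words
 * (capped at 64 words), 0 otherwise. */
static int functions_agree(unpack_fn a, unpack_fn b, int max_words)
{
    enum { MAX = 64 };
    uint32_t src[MAX];
    uint16_t out_a[3 * MAX], out_b[3 * MAX];
    uint32_t state = 12345u;

    if (max_words > MAX)
        max_words = MAX;
    for (int words = 1; words <= max_words; words++) {
        for (int i = 0; i < words; i++) {
            state = state * 1664525u + 1013904223u; /* simple LCG */
            src[i] = state;
        }
        /* Clear both buffers so untouched tails compare equal. */
        memset(out_a, 0, sizeof(out_a));
        memset(out_b, 0, sizeof(out_b));
        a(src, out_a, words);
        b(src, out_b, words);
        if (memcmp(out_a, out_b, sizeof(out_a)))
            return 0;
    }
    return 1;
}
```

Looping over every width is what catches the "various width" cases mentioned above, where a SIMD routine handles full vectors correctly but gets the tail wrong.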

Re: [FFmpeg-devel] Lossy GIF encoding

2019-02-15 Thread James Darnley
On 2019-02-15 10:01, Kornel wrote: > libavcodec/gif.c in ff_gif_encoder.pix_fmts seems to passively declare types > of pixel formats it accepts. If you want to experiment you can change that so it accepts rgb (also or only). Then you can implement and test what you want, then you can ask about

Re: [FFmpeg-devel] [PATCH] avformat/matroskaenc: add reserve free space option

2018-09-06 Thread James Darnley
On 2018-09-06 19:39, Sigríður Regína Sigurþórsdóttir wrote: > +if (s->metadata_header_padding) { > +if (s->metadata_header_padding == 1) > +s->metadata_header_padding++; > +put_ebml_void(pb, s->metadata_header_padding); > +} Unfortunately I was forced to make
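The bump from 1 to 2 in the quoted hunk is presumably there because the smallest EBML Void element occupies two bytes (a one-byte element ID, 0xEC, plus a one-byte size field), so a one-byte reservation cannot be written. A minimal sketch of that clamp follows. `void_bytes_to_reserve` and `EBML_VOID_MIN_SIZE` are hypothetical names, not matroskaenc code, and unlike the quoted hunk this sketch treats negative values as "no padding".

```c
/* Smallest EBML Void element: 1-byte ID (0xEC) + 1-byte size field. */
#define EBML_VOID_MIN_SIZE 2

/* Hypothetical helper mirroring the quoted hunk: returns how many bytes
 * to reserve, or 0 when no padding was requested. */
static int void_bytes_to_reserve(int requested_padding)
{
    if (requested_padding <= 0)
        return 0;
    if (requested_padding < EBML_VOID_MIN_SIZE)
        return EBML_VOID_MIN_SIZE;
    return requested_padding;
}
```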

Re: [FFmpeg-devel] [PATCH] avformat/matroskaenc: add reserve free space option

2018-09-05 Thread James Darnley
On 2018-09-05 22:52, Sigríður Regína Sigurþórsdóttir wrote: > +{"reserve_free_space", "Reserve a given amount of space at the > beginning of the file for unspecified purpose." I added the "metadata_header_padding" global option many years ago. Can you not reuse it for this purpose? Is it

Re: [FFmpeg-devel] [PATCH] frame: Simplify the video allocation

2018-09-03 Thread James Darnley
On 2018-09-03 15:29, James Almer wrote: > pass 32 - 1 to both av_image_fill_pointers() calls directly? Please do not add a magic number where nobody will find it. Use one of the 3 already existing methods for knowing the alignment necessary for assembly. If this is unrelated, my apologies.
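To illustrate the objection to a bare `32 - 1`: the same rounding can be written against a named constant so that the alignment lives in one findable place. This is a sketch only. `SIMD_ALIGN`, `ALIGN_UP` and `padded_linesize` are hypothetical names, and the value 32 is just the number from the quoted suggestion; real code should take it from wherever the project already records the requirement.

```c
#include <stddef.h>

/* Hypothetical named constant; the value is the one from the quoted mail. */
#define SIMD_ALIGN 32

/* Round x up to the next multiple of a, where a is a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Hypothetical helper: line size padded to the SIMD alignment. */
static size_t padded_linesize(size_t linesize)
{
    return ALIGN_UP(linesize, SIMD_ALIGN);
}
```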

Re: [FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions

2018-07-27 Thread James Darnley
On 2018-07-27 15:05, Henrik Gramner wrote: > On Fri, Jul 27, 2018 at 1:47 PM, James Darnley wrote: >> On 2018-07-26 17:29, Rostislav Pehlivanov wrote: >>>> +cglobal horizontal_compose_haar_10bit, 3, 6+ARCH_X86_64, 4, b, temp_, w, >>>> x, b2 >>>> +

Re: [FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions

2018-07-27 Thread James Darnley
On 2018-07-26 17:29, Rostislav Pehlivanov wrote: > On 26 July 2018 at 12:28, James Darnley wrote: > +cglobal vertical_compose_haar_10bit, 3, 6, 4, b0, b1, w >> +DECLARE_REG_TMP 4,5 >> + >> +mova m2, [pd_1] >> +mov r3d, wd >> +and wd,

[FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions

2018-07-26 Thread James Darnley
wavelet transform +;* Copyright (c) 2018 James Darnley +;* +;* This file is part of FFmpeg. +;* +;* FFmpeg is free software; you can redistribute it and/or +;* modify it under the terms of the GNU Lesser General Public +;* License as published by the Free Software Foundation; either +;* ver

[FFmpeg-devel] [PATCH 0/3 v2] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-26 Thread James Darnley
I will ask the same question as last time. Is the AVX worth it in Haar? Also I am surprised that the AVX2 doesn't have a bigger difference on some of the vertical transforms. James Darnley (3): diracdec: add 10-bit Haar SIMD functions diracdec: add 10-bit Legall 5,3 (5_3) SIMD functions

[FFmpeg-devel] [PATCH 3/3] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function

2018-07-26 Thread James Darnley
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the relevant transform. C: 84fps SSE2: 111fps AVX2: 115fps dd97 vertical hi sse2: 2.77x faster (31773 vs. 11457 decicycles) compared with C avx2: 3.83x faster (31773 vs. 8297 decicycles) compared with C ---
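The "Nx faster" figures quoted above are simply the C cycle count divided by the SIMD cycle count. A small sketch to reproduce them, where `speedup` is a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: ratio of reference time to optimized time.
 * Both arguments must be in the same unit (here decicycles, as in
 * the benchmark output quoted above). */
static double speedup(uint64_t ref_decicycles, uint64_t opt_decicycles)
{
    return (double)ref_decicycles / (double)opt_decicycles;
}
```

With the numbers from the message, `speedup(31773, 11457)` is about 2.77 and `speedup(31773, 8297)` is about 3.83, which matches the quoted SSE2 and AVX2 figures.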

[FFmpeg-devel] [PATCH 2/3] diracdec: add 10-bit Legall 5, 3 (5_3) SIMD functions

2018-07-26 Thread James Darnley
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the relevant transform. C: 94fps SSE2: 118fps AVX2: 121fps legall vertical hi sse2: 3.86x faster (20201 vs. 5231 decicycles) compared with C avx2: 6.70x faster (20201 vs. 3014 decicycles) compared with C legall vertical

Re: [FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-25 Thread James Darnley
On 2018-07-19 17:23, Rostislav Pehlivanov wrote: > Could you provide standard overall transform results using START/STOP_TIMER > rather than overall decoding speed? Ask and ye shall receive. > haar horizontal compose > sse2: 3.67x faster (45248±108.1 vs. 12328±21.1 decicycles) compared with

Re: [FFmpeg-devel] [PATCH 3/6] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function

2018-07-19 Thread James Darnley
On 2018-07-19 17:26, Rostislav Pehlivanov wrote: > On 19 July 2018 at 15:52, James Darnley wrote: > >> int32_t *b1, int32_t *b2, int >> b1[i] = COMPOSE_DIRAC53iH0(b0[i], b1[i], b2[i]); >> } >> >> +static void dd97_vertical_hi_sse2(i

Re: [FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-19 Thread James Darnley
On 2018-07-19 17:23, Rostislav Pehlivanov wrote: > > Could you provide standard overall transform results using START/STOP_TIMER > rather than overall decoding speed? > Coefficients sizes and therefore golomb unpacking speed changes with > respect to the transform so potentially there could be

[FFmpeg-devel] [PATCH 5/6] diracdec: avx2 dd97

2018-07-19 Thread James Darnley
--- libavcodec/x86/dirac_dwt_10bit.asm| 3 ++- libavcodec/x86/dirac_dwt_init_10bit.c | 13 + 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/libavcodec/x86/dirac_dwt_10bit.asm b/libavcodec/x86/dirac_dwt_10bit.asm index ae110d2945..2e039e11ea 100644 ---

[FFmpeg-devel] [PATCH 4/6] diracdec: avx2 legall

2018-07-19 Thread James Darnley
--- libavcodec/x86/dirac_dwt_10bit.asm| 4 +++- libavcodec/x86/dirac_dwt_init_10bit.c | 22 ++ 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/libavcodec/x86/dirac_dwt_10bit.asm b/libavcodec/x86/dirac_dwt_10bit.asm index 681de5e1df..ae110d2945 100644 ---

[FFmpeg-devel] [PATCH 3/6] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function

2018-07-19 Thread James Darnley
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the relevant transform. C: 84fps SSE2: 111fps AVX2: 115fps --- libavcodec/x86/dirac_dwt_10bit.asm| 38 +++ libavcodec/x86/dirac_dwt_init_10bit.c | 16 +++ 2 files changed, 54 insertions(+)

[FFmpeg-devel] [PATCH 1/6] diracdec: add 10-bit Haar SIMD functions

2018-07-19 Thread James Darnley
@@ -0,0 +1,113 @@ +;** +;* x86 optimized discrete 10-bit wavelet transform +;* Copyright (c) 2018 James Darnley +;* +;* This file is part of FFmpeg. +;* +;* FFmpeg is free software; you can redistribute it and/or +;* modify

[FFmpeg-devel] [PATCH 2/6] diracdec: add 10-bit Legall 5, 3 (5_3) SIMD functions

2018-07-19 Thread James Darnley
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the relevant transform. C: 94fps SSE2: 118fps AVX2: 121fps --- libavcodec/x86/dirac_dwt_10bit.asm| 55 +++ libavcodec/x86/dirac_dwt_init_10bit.c | 23 +++ 2 files changed, 78 insertions(+)

[FFmpeg-devel] [PATCH 6/6] diracdec: increase rodata alignment for avx2

2018-07-19 Thread James Darnley
--- libavcodec/x86/dirac_dwt_10bit.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/x86/dirac_dwt_10bit.asm b/libavcodec/x86/dirac_dwt_10bit.asm index 2e039e11ea..d0da822a81 100644 --- a/libavcodec/x86/dirac_dwt_10bit.asm +++ b/libavcodec/x86/dirac_dwt_10bit.asm

[FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-19 Thread James Darnley
it in Haar? Is the AVX2 worth it in the latter two? I added those later which is why they are separate patches. I will squash them before pushing if I keep them. James Darnley (6): diracdec: add 10-bit Haar SIMD functions diracdec: add 10-bit Legall 5,3 (5_3) SIMD functions diracdec: add 10

Re: [FFmpeg-devel] [PATCH v2] fate: add more vc2 encoder tests

2018-07-18 Thread James Darnley
On 2018-07-18 02:24, Michael Niedermayer wrote: > On Mon, Jul 16, 2018 at 01:03:53PM +0200, James Darnley wrote: >> From: James Darnley >> >> --- >> Michael, can you test this for the same failure you saw last time? > > seems to work in all cases i tested Good

[FFmpeg-devel] [PATCH v2] fate: add more vc2 encoder tests

2018-07-16 Thread James Darnley
From: James Darnley --- Michael, can you test this for the same failure you saw last time? tests/fate/vcodec.mak | 24 tests/ref/vsynth/vsynth1-vc2-t5_3 | 4 tests/ref/vsynth/vsynth1-vc2-thaar | 4 tests/ref/vsynth/vsynth2-vc2-t5_3

Re: [FFmpeg-devel] [PATCH] fate: add more vc2 encoder tests

2018-07-14 Thread James Darnley
On 2018-07-14 23:31, Michael Niedermayer wrote: > [SMPTE VC-2 encoder @ 0x2a42640] Error setting option wavelet_type to value > 444p12. Yep. Good work make!

Re: [FFmpeg-devel] [PATCH] fate: add more vc2 encoder tests

2018-07-14 Thread James Darnley
On 2018-07-14 17:58, Michael Niedermayer wrote: > On Fri, Jul 13, 2018 at 08:09:57PM +0200, James Darnley wrote: >> From: James Darnley >> >> --- >> tests/fate/vcodec.mak | 24 >> tests/ref/vsynth/vsynth1-vc2-5_3

[FFmpeg-devel] [PATCH] fate: add more vc2 encoder tests

2018-07-13 Thread James Darnley
From: James Darnley --- tests/fate/vcodec.mak | 24 tests/ref/vsynth/vsynth1-vc2-5_3 | 4 tests/ref/vsynth/vsynth1-vc2-haar | 4 tests/ref/vsynth/vsynth2-vc2-5_3 | 4 tests/ref/vsynth/vsynth2-vc2-haar | 4 tests

Re: [FFmpeg-devel] [PATCH 1/4] avcodec/dirac_dwt_template: Fix signedness regression in interleave()

2018-07-13 Thread James Darnley
On 2018-07-13 19:26, Michael Niedermayer wrote: > Found-by: > > Signed-off-by: Michael Niedermayer > --- > libavcodec/dirac_dwt_template.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/dirac_dwt_template.c b/libavcodec/dirac_dwt_template.c > index

Re: [FFmpeg-devel] [PATCH 2/8] ffserver: Implement lua config file reader

2018-05-21 Thread James Darnley
On 2018-05-20 20:53, Stephan Holljes wrote: > +#include > +#include > +#include That's not portable. Lua headers are not in a subdirectory. > +int configs_read(struct HTTPDConfig **configs, const char *filename) > +{ > +int ret = 0; > +int nb_configs = 0; > +int nb_streams = 0; >

Re: [FFmpeg-devel] [RFC][PATCH] configure: Disable unsafe demuxers by default

2018-05-10 Thread James Darnley
On 2018-05-11 00:57, Derek Buitenhuis wrote: >> Disabling demuxers by default does not seem to be a good idea to me. > > So rendering arbitrary text files by default seems like a good idea in > comparison? I want to argue some more so here you go: it isn't "by default". It gets rendered because

Re: [FFmpeg-devel] [RFC][PATCH] configure: Disable unsafe demuxers by default

2018-05-10 Thread James Darnley
On 2018-05-10 17:44, Derek Buitenhuis wrote: > These demuxers have probes that mainly probe based on file extension, > and map to codec IDs that render text as video. The result is that > ffmpeg will, by default, happily render, for example, .txt files > as images. This is not exactly a good

Re: [FFmpeg-devel] github

2018-04-26 Thread James Darnley
On 2018-04-26 13:15, Daniel Oberhoff wrote: > I was wondering if there is any chance to move development to github? > I.e. not just mirror, but as primary development repo, with issues and > pull requests? Would make collaboration a *lot* easier (think of > submitting a pr instead of having to

[FFmpeg-devel] [PATCH 2/2] avcodec/vc2enc: do not reset the last_parse_code variable every frame

2018-02-23 Thread James Darnley
It should be kept between frames so that the encoder can actually know whether the previous parse_code was an End Sequence. --- libavcodec/vc2enc.c | 1 - tests/ref/vsynth/vsynth1-vc2-420p | 2 +- tests/ref/vsynth/vsynth1-vc2-420p10 | 2 +-

[FFmpeg-devel] [PATCH 0/2] two small vc2enc fixes

2018-02-23 Thread James Darnley
James Darnley (2): avcodec/vc2enc: do not write an End Sequence after the first field of field-separated pictures avcodec/vc2enc: do not reset the last_parse_code variable every frame libavcodec/vc2enc.c | 9 +++-- tests/ref/vsynth/vsynth1-vc2-420p | 2

[FFmpeg-devel] [PATCH 1/2] avcodec/vc2enc: do not write an End Sequence after the first field of field-separated pictures

2018-02-23 Thread James Darnley
A lax or tolerant decoder may support an End Sequence between fields of the same frame but the spec. says this is not allowed. 2012 spec. 10.3.2, 2017 spec. 10.4.3: Where pictures are fields, a sequence shall comprise a whole number of frames (i.e., an even number of fields) and shall begin and
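The rule quoted from the spec (a field-coded sequence must contain an even number of fields) can be expressed as a simple check. This is a sketch under assumptions, not the vc2enc change itself: `may_write_end_sequence` and its parameters are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical check: is an End Sequence allowed after this many
 * pictures have been written in the current sequence?
 * For frame pictures it is always allowed; for field pictures only
 * after a whole number of frames, i.e. an even number of fields. */
static bool may_write_end_sequence(bool pictures_are_fields,
                                   unsigned pictures_written)
{
    if (!pictures_are_fields)
        return true;
    return pictures_written % 2 == 0;
}
```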

Re: [FFmpeg-devel] [PATCH] h264_idct: enable unmacro on newer NASM versions

2018-02-12 Thread James Darnley
On 2018-02-10 14:17, Rostislav Pehlivanov wrote: > Signed-off-by: Rostislav Pehlivanov > --- > libavcodec/x86/h264_idct.asm | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm > index

Re: [FFmpeg-devel] [PATCH] vc2enc: prevent bitrate overshoots

2018-02-01 Thread James Darnley
On 2018-01-31 15:56, James Darnley wrote: > From: Rostislav Pehlivanov <rpehliva...@obe.tv> > > The rounding caused by the size scaler wasn't compensated for and the > slice sizes grew beyond what is allowed per frame. > > Signed-off-by: Rostislav Pehlivanov <rpehliva

[FFmpeg-devel] [PATCH] vc2enc: prevent bitrate overshoots

2018-01-31 Thread James Darnley
From: Rostislav Pehlivanov The rounding caused by the size scaler wasn't compensated for and the slice sizes grew beyond what is allowed per frame. Signed-off-by: Rostislav Pehlivanov --- libavcodec/vc2enc.c | 22 ++ 1 file changed,

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-16 Thread James Darnley
On 2018-01-16 22:26, Martin Vignali wrote: > diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm > index d7cd996842..9db2d90e57 100644 > --- a/libavutil/x86/x86util.asm > +++ b/libavutil/x86/x86util.asm > @@ -335,7 +335,7 @@ > %endmacro > > %macro ABS2 4 > -%if cpuflag(ssse3) >

Re: [FFmpeg-devel] [PATCH] avfilter/vf_lut: add support for gray formats

2017-12-22 Thread James Darnley
On 2017-12-22 10:57, Paul B Mahol wrote: > Signed-off-by: Paul B Mahol > --- > libavfilter/vf_lut.c | 6 +- > tests/ref/fate/filter-pixfmts-lut | 5 + > 2 files changed, 10 insertions(+), 1 deletion(-) > > diff --git a/libavfilter/vf_lut.c

Re: [FFmpeg-devel] [PATCH 0/7] AVX-512 support (v.3)

2017-12-21 Thread James Darnley
On 2017-12-21 15:06, Carl Eugen Hoyos wrote: > 2017-12-21 14:40 GMT+01:00 James Darnley <jdarn...@obe.tv>: >> I have addressed all the comments raised in the previous threads. >> While some patches were okayed last time I am still sending them >> as part of these to g

[FFmpeg-devel] [PATCH 4/7] avutil: add alignment needed for AVX-512

2017-12-21 Thread James Darnley
--- libavutil/mem.c | 2 +- libavutil/x86/cpu.c | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/libavutil/mem.c b/libavutil/mem.c index 6ad409daf4..79e8b597f1 100644 --- a/libavutil/mem.c +++ b/libavutil/mem.c @@ -61,7 +61,7 @@ void free(void *ptr); #include

[FFmpeg-devel] [PATCH 0/7] AVX-512 support (v.3)

2017-12-21 Thread James Darnley
I have addressed all the comments raised in the previous threads. While some patches were okayed last time I am still sending them as part of these to give everyone a final chance to see them again and to object if they wish. Henrik Gramner (1): x86inc: AVX-512 support James Darnley (6

[FFmpeg-devel] [PATCH 2/7] avutil: add AVX-512 flags

2017-12-21 Thread James Darnley
--- Changelog | 1 + doc/APIchanges| 3 +++ libavutil/cpu.c | 6 +- libavutil/cpu.h | 1 + libavutil/tests/cpu.c | 1 + libavutil/version.h | 2 +- libavutil/x86/cpu.h | 2 ++ 7 files changed, 14 insertions(+), 2 deletions(-) diff --git a/Changelog

[FFmpeg-devel] [PATCH 7/7] checkasm: support for AVX-512 functions

2017-12-21 Thread James Darnley
--- tests/checkasm/checkasm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 45a70aa87f..ff0ca5b68d 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -204,6 +204,7 @@ static const struct { { "FMA3",

[FFmpeg-devel] [PATCH 1/7] configure: test whether x86 assembler supports AVX-512

2017-12-21 Thread James Darnley
--- configure | 5 + 1 file changed, 5 insertions(+) diff --git a/configure b/configure index d09eec4155..07fb825f91 100755 --- a/configure +++ b/configure @@ -411,6 +411,7 @@ Optimization options (experts only): --disable-fma3 disable FMA3 optimizations --disable-fma4

[FFmpeg-devel] [PATCH 3/7] avutil: detect when AVX-512 is available

2017-12-21 Thread James Darnley
--- libavutil/x86/cpu.c | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c index f33088c8c7..696f47b3bf 100644 --- a/libavutil/x86/cpu.c +++ b/libavutil/x86/cpu.c @@ -97,6 +97,7 @@ int ff_get_cpu_flags_x86(void) int

[FFmpeg-devel] [PATCH 5/7] avcodec: add stride alignment needed for AVX-512

2017-12-21 Thread James Darnley
--- configure | 2 ++ libavcodec/internal.h | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/configure b/configure index 07fb825f91..d3187d71ed 100755 --- a/configure +++ b/configure @@ -1892,6 +1892,7 @@ ARCH_FEATURES=" local_aligned simd_align_16

[FFmpeg-devel] [PATCH 6/7] x86inc: AVX-512 support

2017-12-21 Thread James Darnley
From: Henrik Gramner AVX-512 consists of a plethora of different extensions, but in order to keep things a bit more manageable we group together the following extensions under a single baseline cpu flag which should cover SKL-X and future CPUs: * AVX-512 Foundation (F) *

Re: [FFmpeg-devel] avfilter/x86/vf_threshold : add SSE4 and AVX2 for threshold 16

2017-12-03 Thread James Darnley
On 2017-12-03 19:30, Martin Vignali wrote: > libavfilter/x86/vf_threshold.asm| 19 ++- > libavfilter/x86/vf_threshold_init.c | 34 -- > 2 files changed, 34 insertions(+), 19 deletions(-) > > diff --git a/libavfilter/x86/vf_threshold.asm >

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-12-02 Thread James Darnley
On 2017-11-27 17:50, Henrik Gramner wrote: > On Sun, Nov 26, 2017 at 11:51 PM, James Darnley <james.darn...@gmail.com> > wrote: >> -pd_0_int_min: times 2 dd 0, -2147483648 >> -pq_int_min: times 2 dq -2147483648 >> -pq_int_max: times 2 dq 2147483647

Re: [FFmpeg-devel] avutil/x86util : add macro for 128 bits constant load

2017-11-27 Thread James Darnley
On 2017-11-27 20:19, Martin Vignali wrote: > +%macro VBROADCASTI128 2 ; dst xmm/ymm, src : 128bits val > +%if mmsize == 32 > +vbroadcasti128 %1, %2 > +%else > +mova %1, %2 > +%endif > +%endmacro If the condition was made "mmsize > 16" would this work correctly for zmm registers?

Re: [FFmpeg-devel] [PATCH 6/8] lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder

2017-11-26 Thread James Darnley
On 2017-11-27 00:17, Rostislav Pehlivanov wrote: > On 26 November 2017 at 22:51, James Darnley <james.darn...@gmail.com> wrote: >> @@ -152,13 +152,13 @@ RET >> %macro FUNCTION_BODY_32 0 >> >> %if ARCH_X86_64 >> -cglobal flac_enc_lpc_32, 5, 7,

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Darnley
On 2017-11-27 00:13, Rostislav Pehlivanov wrote: > On 26 November 2017 at 22:51, James Darnley <james.darn...@gmail.com> wrote: >> @@ -123,7 +123,10 @@ RET >> %endmacro >> >> %macro PMINSQ 3 >> -pcmpgtq %3, %2, %1 >> +mova%3, %2 >>

[FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Darnley
When compared to the SSE4.2 version, runtime is reduced by 1 to 26%. The function itself is around 2 times faster. --- libavcodec/x86/flac_dsp_gpl.asm | 56 +++-- libavcodec/x86/flacdsp_init.c | 5 +++- 2 files changed, 47 insertions(+), 14 deletions(-)
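One way to relate the two figures above (function about 2 times faster, total runtime down by at most 26%) is Amdahl's law, assuming the rest of the encoder is unchanged. A minimal sketch, where `runtime_reduction` is a hypothetical helper and the fraction of time spent in the function is an input the message does not state:

```c
/* Fraction of total runtime saved when a function that accounts for
 * `fraction` (0..1) of the runtime becomes `factor` times faster
 * (Amdahl's law, everything else unchanged). */
static double runtime_reduction(double fraction, double factor)
{
    return fraction * (1.0 - 1.0 / factor);
}
```

Under that model a 26% overall reduction from a 2x function speed-up would correspond to the function taking roughly 52% of the original runtime, since 0.52 * (1 - 1/2) = 0.26.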

[FFmpeg-devel] [PATCH 6/8] lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder

2017-11-26 Thread James Darnley
Around 1.1 times faster and reduces runtime by up to 6%. --- libavcodec/x86/flac_dsp_gpl.asm | 91 - 1 file changed, 72 insertions(+), 19 deletions(-) diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm index

[FFmpeg-devel] [PATCH 3/8] avcodec/flac: add SSE4.2 version of the 32-bit lpc encoder

2017-11-26 Thread James Darnley
From 1.3 to 2.5 times faster. Runtime reduced by 4 to 58%. As with the 16-bit version the speed-up generally increases with compression_level. Also like the 16-bit version, it is not used with levels less than 3. After a bug fix long, long ago in e609cfd697 this 32-bit lpc encoder is

[FFmpeg-devel] [PATCH 1/8] avcodec/flac: document limitations of the LPC encoder

2017-11-26 Thread James Darnley
State that the maximum value of order is 32. This limit is used in both C and x86 assembly code. --- libavcodec/flacdsp.h | 8 1 file changed, 8 insertions(+) diff --git a/libavcodec/flacdsp.h b/libavcodec/flacdsp.h index 7bb0dd0e9a..90fd3f04b5 100644 --- a/libavcodec/flacdsp.h +++

[FFmpeg-devel] [PATCH 8/8] checkasm: add tests for flacenc lpc coder

2017-11-26 Thread James Darnley
--- tests/checkasm/flacdsp.c | 72 1 file changed, 72 insertions(+) diff --git a/tests/checkasm/flacdsp.c b/tests/checkasm/flacdsp.c index dccb54d672..08e5e264ea 100644 --- a/tests/checkasm/flacdsp.c +++ b/tests/checkasm/flacdsp.c @@ -20,13 +20,16

[FFmpeg-devel] [PATCH 4/8] avcodec/flac: partially unroll loop in flac_enc_lpc_32

2017-11-26 Thread James Darnley
Now does 6 samples per iteration, up from 2. From 1.6 to 2.1 times faster again. 2.5 to 3.9 times faster overall. Runtime is reduced by a further 4 to 17%. Reduced by 9 to 65% overall. Same conditions as previously. --- libavcodec/x86/flac_dsp_gpl.asm | 30 +- 1

[FFmpeg-devel] [PATCH 0/8] left-overs of an ancient patch set for the flac encoder

2017-11-26 Thread James Darnley
the benchmarking I originally did a little less useful because both types of the lpc coder are used for both sample depths (16 and 24). That does make the 32-bit version more useful though because it gets used with 16-bit samples when the intermediates overflow 32 bits. James Darnley (8): avcodec/flac

[FFmpeg-devel] [PATCH 2/8] avcodec/flac: add AVX2 version of the 16-bit LPC encoder

2017-11-26 Thread James Darnley
When compared to the SSE4 version, runtime is reduced by 0.5 to 20%. After a bug fix long, long ago in e609cfd697 the 16-bit lpc encoder is used so little that the runtime reduction is no longer correct. The function itself is around 2 times faster. (As one might expect for doing twice as many

[FFmpeg-devel] [PATCH 5/8] lavc/x86/flac_dsp_gpl: cosmetic whitespace alignment

2017-11-26 Thread James Darnley
--- libavcodec/x86/flac_dsp_gpl.asm | 40 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm index 4d212ed212..952fc8b86b 100644 --- a/libavcodec/x86/flac_dsp_gpl.asm +++

Re: [FFmpeg-devel] [PATCH 03/11] avutil: detect when AVX-512 is available

2017-11-20 Thread James Darnley
On 2017-11-10 03:11, James Almer wrote: > On 11/9/2017 8:58 AM, James Darnley wrote: >> @@ -154,6 +155,13 @@ int ff_get_cpu_flags_x86(void) >> if (ebx & 0x0100) >> rval |= AV_CPU_FLAG_BMI2; >> } >> +#if HAVE_AVX512 /*
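The quoted hunk tests feature bits in EBX of CPUID leaf 7. A small sketch of that kind of bit test follows, kept as a pure function so it needs no CPUID call. The macro and function names are hypothetical; the BMI2 mask is the 0x0100 from the hunk and the AVX-512 Foundation bit position (16) is taken from Intel's documentation. A real detector must also confirm that the OS saves the ZMM register state, which is not shown here.

```c
#include <stdint.h>
#include <stdbool.h>

/* CPUID leaf 7 (sub-leaf 0), EBX feature bits. */
#define LEAF7_EBX_BMI2    (UINT32_C(1) << 8)   /* 0x0100, as in the quoted hunk */
#define LEAF7_EBX_AVX512F (UINT32_C(1) << 16)  /* AVX-512 Foundation */

/* Hypothetical helper: true only if every bit in `mask` is set in `ebx`. */
static bool leaf7_has(uint32_t ebx, uint32_t mask)
{
    return (ebx & mask) == mask;
}
```

Requiring all bits of a mask at once is one way to group several extensions under a single baseline flag, as the AVX-512 cover letter in this series describes.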

[FFmpeg-devel] [PATCH] configure: add audio_frame_queue dependency for aptx codec

2017-11-19 Thread James Darnley
--- configure | 2 ++ 1 file changed, 2 insertions(+) diff --git a/configure b/configure index 8b7b7e164b..48761934be 100755 --- a/configure +++ b/configure @@ -2439,6 +2439,8 @@ amv_encoder_select="aandcttables jpegtables mpegvideoenc" ape_decoder_select="bswapdsp llauddsp"

Re: [FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-11-13 Thread James Darnley
On 2017-11-10 22:13, James Darnley wrote: > The IRC log should appear at the link below. >> https://lists.ffmpeg.org/pipermail/ffmpeg-devel-irc/2017-November/004651.html Of course when I try to predict what number an email will get based on the past few it ends up being out of order. T

Re: [FFmpeg-devel] [PATCH] avfilter/vf_threshold: add x86 SIMD

2017-11-12 Thread James Darnley
On 2017-11-12 21:15, Rostislav Pehlivanov wrote: > On 12 November 2017 at 19:15, Paul B Mahol wrote: > +mova m7, [pb_128] >> +add inq, wq >> +add thresholdq, wq >> +add minq, wq >> +add maxq, wq >> +add outq, wq >> +neg

Re: [FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-11-10 Thread James Darnley
On 2017-11-10 14:32, James Darnley wrote: > I mentioned previously that using ZMM registers will cause the CPU to > reduce its frequency. > > Gramner said on IRC that a user should spend 20-30% of time in > AVX-512/ZMM code for it to be a net gain in speed. > From ffmpeg-devel

Re: [FFmpeg-devel] [PATCH 05/11] avcodec: add stride alignment needed for AVX-512

2017-11-10 Thread James Darnley
On 2017-11-10 02:38, James Almer wrote: > On 11/9/2017 8:58 AM, James Darnley wrote: >> --- >> configure | 2 ++ >> libavcodec/internal.h | 4 +++- >> 2 files changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/configure b/configure

Re: [FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-11-10 Thread James Darnley
On 2017-11-09 20:42, Martin Vignali wrote: > I don't want to block this patch, but > since you said (in your previous version) that this version is not faster, > I'm not sure it's worth applying. > You already made "real" avx512 version for other funcs, in order to check > the rest of

Re: [FFmpeg-devel] [PATCH 11/11] avcodec/lossless_videodsp: add AVX-512 version of add_bytes

2017-11-10 Thread James Darnley
On 2017-11-09 20:43, Martin Vignali wrote: > 2017-11-09 20:37 GMT+01:00 Martin Vignali : >> lgtm >> >> Can you post your checkasm benchmark result for this ? Yep > $ ./tests/checkasm/checkasm --bench --test=llviddsp > benchmarking with native FFmpeg timers > nop: 26.0 >

Re: [FFmpeg-devel] [PATCH 10/11] avcodec/blockdsp: add AVX-512 version of clear_block(s)

2017-11-10 Thread James Darnley
On 2017-11-09 20:35, Martin Vignali wrote: > 2017-11-09 12:58 GMT+01:00 James Darnley <jdarn...@obe.tv>: > >> From: James Darnley <james.darn...@gmail.com> >> >> Also adjust alignment requirements where necessary. >> --- >> Whether this patc

[FFmpeg-devel] [PATCH 05/11] avcodec: add stride alignment needed for AVX-512

2017-11-09 Thread James Darnley
--- configure | 2 ++ libavcodec/internal.h | 4 +++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/configure b/configure index 146a87324c..fce8030d91 100755 --- a/configure +++ b/configure @@ -1886,6 +1886,7 @@ ARCH_FEATURES=" local_aligned simd_align_16
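As a rough illustration of what a stride alignment requirement amounts to, here is a small C sketch that rounds a row size up to a 64-byte multiple. The 64 is an assumption based on a ZMM register being 64 bytes wide; the macro and function names are made up for this example and are not the ones touched by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant for this sketch: a ZMM register holds 64 bytes. */
#define SKETCH_STRIDE_ALIGN 64

/* Round size up to the next multiple of align (a power of two). */
static size_t align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Bytes per row for a plane of the given width and sample size,
 * padded so that every row starts on an aligned address when the
 * buffer itself is aligned. */
static size_t aligned_stride(size_t width, size_t bytes_per_sample)
{
    return align_up(width * bytes_per_sample, SKETCH_STRIDE_ALIGN);
}
```

With this, a 1920-byte row needs no padding while a 1921-byte row is padded up to the next multiple of 64, so aligned full-width loads and stores can be used on every row.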

[FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-11-09 Thread James Darnley
--- libavcodec/x86/v210enc.asm| 5 + libavcodec/x86/v210enc_init.c | 7 +++ 2 files changed, 12 insertions(+) diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm index 965f2bea3c..5068af27f8 100644 --- a/libavcodec/x86/v210enc.asm +++ b/libavcodec/x86/v210enc.asm @@

[FFmpeg-devel] [PATCH 09/11] avcodec/blockdsp: roll-up x86asm preprocessor loop

2017-11-09 Thread James Darnley
From: James Darnley <james.darn...@gmail.com> --- libavcodec/x86/blockdsp.asm | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/libavcodec/x86/blockdsp.asm b/libavcodec/x86/blockdsp.asm index 9d203df8f5..9d0e8a3242 100644 --- a/libavcodec/x86/blockdsp.asm

[FFmpeg-devel] [PATCH 02/11] avutil: add AVX-512 flags

2017-11-09 Thread James Darnley
--- libavutil/cpu.c | 6 +- libavutil/cpu.h | 1 + libavutil/tests/cpu.c | 1 + libavutil/x86/cpu.h | 2 ++ 4 files changed, 9 insertions(+), 1 deletion(-) diff --git a/libavutil/cpu.c b/libavutil/cpu.c index c8401b8258..6548cc3042 100644 --- a/libavutil/cpu.c +++

[FFmpeg-devel] [PATCH 03/11] avutil: detect when AVX-512 is available

2017-11-09 Thread James Darnley
--- I've changed this patch slightly because I discovered that it would cause an illegal instruction exception on much older processors (probably all without AVX). I was running xgetbv() almost unconditionally. Now it is a little more like what is in the x264 patch. libavutil/x86/cpu.c | 12
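The ordering problem described above can be sketched in C roughly as follows. This is not the code from the patch: the function name, the callback type, and the stand-in XCR0 reader are hypothetical, and the sketch only looks at the AVX512F bit. The bit positions themselves (OSXSAVE in CPUID leaf 1 ECX bit 27, AVX512F in leaf 7 EBX bit 16, XCR0 bits 1, 2, 5, 6 and 7) are the architectural ones. The caller is assumed to pass 0 for the leaf 7 value when that leaf is not supported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID7_EBX_AVX512F (1u << 16)
/* XCR0 bits 1,2 (SSE, AVX state) and 5,6,7 (opmask, ZMM_Hi256, Hi16_ZMM). */
#define XCR0_AVX512_STATE  0xE6u

/* Callback standing in for the xgetbv instruction (hypothetical). */
typedef uint64_t (*xcr0_reader)(void);

/* Only read XCR0 once CPUID says the OS enabled XSAVE; executing
 * xgetbv without OSXSAVE raises an illegal instruction exception. */
static bool avx512f_usable(uint32_t cpuid1_ecx, uint32_t cpuid7_ebx,
                           xcr0_reader read_xcr0)
{
    assert(read_xcr0 != NULL);
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;                      /* never touch xgetbv here */
    if ((read_xcr0() & XCR0_AVX512_STATE) != XCR0_AVX512_STATE)
        return false;                      /* OS does not save ZMM state */
    return (cpuid7_ebx & CPUID7_EBX_AVX512F) != 0;
}

/* Stand-in reader for demonstration: counts calls and reports that
 * x87, SSE, AVX and all AVX-512 state components are enabled. */
static int xcr0_reads;
static uint64_t fake_xcr0_full(void)
{
    xcr0_reads++;
    return 0xE7;
}
```

Passing the reader as a callback makes the guard visible: with OSXSAVE clear the function returns before the reader is ever called, which is the behaviour an old CPU needs.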

[FFmpeg-devel] [PATCH 07/11] checkasm: support for AVX-512 functions

2017-11-09 Thread James Darnley
--- tests/checkasm/checkasm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index b8b0e32dbd..9fb1438bdb 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -192,6 +192,7 @@ static const struct { { "FMA3",
