Re: [FFmpeg-devel] [PATCHv2 4/4] avfilter/vf_framerate: add SIMD functions for frame blending

2018-01-18 Thread Martin Vignali
> +
> +
> +%if HAVE_AVX2_EXTERNAL
> +
> +INIT_YMM avx2
> +BLEND_FRAMES
> +
> +INIT_YMM avx2

I don't think it's necessary to repeat INIT_YMM avx2.

> +BLEND_FRAMES16
> +
> +%endif

--
Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org

Re: [FFmpeg-devel] [PATCH 4/4] avfilter/vf_framerate: add SIMD functions for frame blending

2018-01-18 Thread Martin Vignali
> if (s->bitdepth == 8) {
> s->blend_factor_max = 1 << BLEND_FACTOR_DEPTH8;
> -s->blend = blend_frames_c;
> +if (ARCH_X86 && EXTERNAL_AVX2(cpu_flags))
> +s->blend = ff_blend_frames_avx2;

I think it should be: if (EXTERNAL_AVX2_FAST(cpu_flags))

> +else
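For context on the review comment, here is a minimal sketch of the init-time function-pointer dispatch the patch uses (`s->blend = ...`). Everything in it is hypothetical: `FLAG_AVX2` and `FLAG_AVXSLOW` stand in for libavutil's CPU flags, `pick_blend()` stands in for the init code, and the kernels are placeholders. It assumes, as we understand libavutil/x86/cpu.h, that the `_FAST` variant of the check additionally rejects CPUs flagged as slow for AVX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits standing in for AV_CPU_FLAG_AVX2 and
 * AV_CPU_FLAG_AVXSLOW; the values are made up for this sketch. */
enum { FLAG_AVX2 = 1 << 0, FLAG_AVXSLOW = 1 << 1 };

typedef void (*blend_fn)(const uint8_t *a, const uint8_t *b, uint8_t *dst, ptrdiff_t n);

/* Portable fallback (a rounded average, used only as a placeholder kernel). */
static void blend_c(const uint8_t *a, const uint8_t *b, uint8_t *dst, ptrdiff_t n)
{
    assert(n >= 0);
    for (ptrdiff_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Placeholder for an AVX2 kernel; here it simply forwards to the C code. */
static void blend_avx2_placeholder(const uint8_t *a, const uint8_t *b, uint8_t *dst, ptrdiff_t n)
{
    blend_c(a, b, dst, n);
}

/* "Fast" AVX2 check in the spirit of EXTERNAL_AVX2_FAST(): AVX2 is present
 * and the CPU is not flagged as slow for 256-bit AVX. */
static int avx2_fast(unsigned flags)
{
    return (flags & FLAG_AVX2) && !(flags & FLAG_AVXSLOW);
}

/* Pick the kernel once at init time, the way s->blend is assigned. */
static blend_fn pick_blend(unsigned flags)
{
    return avx2_fast(flags) ? blend_avx2_placeholder : blend_c;
}
```

With this check, a CPU that reports AVX2 but is marked slow for AVX keeps the fallback.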

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-17 Thread Martin Vignali
Hello,

New patch attached, with modifications to average, grain extract, multiply, screen and grain merge.

-- blend Average --
Previous patch:
average_c: 15605.4
average_sse2: 1205.9
average_avx2: 772.4

New patch:
average_c: 15604.4
average_sse2: 490.9
average_avx2: 265.2

With 3 operand : using

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-16 Thread Martin Vignali
2018-01-16 23:00 GMT+01:00 James Darnley <james.darn...@gmail.com>:
> On 2018-01-16 22:26, Martin Vignali wrote:
> > diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
> > index d7cd996842..9db2d90e57 100644
> > --- a/libavutil/x86/x86util.asm
> >

[FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-16 Thread Martin Vignali
Hello,

Following Henrik Gramner's comments (in the discussion "avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)"), attached is a new patch adding an AVX2 version of each 8-bit function (except divide).
001 : avutil : add ABS2 for avx2
002 : avfilter : add AVX2 version for most of the funcs, the AVX2 is a

Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-14 Thread Martin Vignali
Hello,

New patch attached. There is no more segfault when I remove movsxdifnidn, after the change in the "declare_func_emms" part. I also modified the checkasm test, following your comments.

Martin
0001-avcodec-utvideoenc-add-SIMD-avx-for-sub_left_predict.patch

Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-13 Thread Martin Vignali
Hello,

Following Henrik Gramner's comments, new patch attached. I tried to change int width -> ptrdiff_t width to remove movsxdifnidn, but I get a segfault if height > 1. It passes FATE for me.

Martin
0001-avcodec-utvideoenc-add-SIMD-avx-for.patch

Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-12 Thread Martin Vignali
> this changes the output:
> make -j12 && ./ffmpeg -i ~/videos/matrixbench_mpeg2.mpg -an -vcodec utvideo -t 1 -pix_fmt yuv420p -pred left -t 1 test2.avi
>
> -rw-r- 1 michael michael 3744402 Jan 12 04:20 test2.avi
> -rw-r- 1 michael michael 3753358 Jan 12 04:19 test.avi

Hello,

[FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-11 Thread Martin Vignali
Hello,

Attached is a patch adding SIMD for sub_left_pred in utvideoenc.
001 : add SIMD for utvideoenc
002 : add checkasm for llviddspenc (diff bytes and sub_left_pred)

Encoding result: ./ffmpeg -i utvideo_file.avi -c:v utvideo -pred left res.avi
Without: frame= 3316 fps=194 q=-0.0 Lsize= 3613675kB
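A scalar sketch of what sub_left_pred computes, modeled on our reading of the utvideo encoder's C left prediction (the name `sub_left_predict_ref` is hypothetical): each output byte is the pixel minus the previous pixel, mod 256, with the predictor starting at 0x80 and carried from the end of one line into the start of the next. The output is written densely (`width` bytes per line) while the input advances by `stride`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Left prediction for encoding: residual = pixel - previous pixel (mod 256).
 * The predictor starts at 0x80 and is NOT reset at the start of each line. */
static void sub_left_predict_ref(uint8_t *dst, const uint8_t *src,
                                 ptrdiff_t stride, ptrdiff_t width, int height)
{
    uint8_t prev = 0x80;
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; y++) {
        for (ptrdiff_t x = 0; x < width; x++) {
            *dst++ = (uint8_t)(src[x] - prev);
            prev   = src[x];
        }
        src += stride;
    }
}
```

For a 3x2 input {0x80, 0x81, 0x83 / 0x10, 0x10, 0x0F}, the residuals are {0, 1, 2, 0x8D, 0, 0xFF}: the first residual of the second line is taken against 0x83, the last pixel of the first line.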

Re: [FFmpeg-devel] avfilter/vf_interlace : add checkasm for lowpass_line and AVX2 version

2018-01-11 Thread Martin Vignali
2017-12-30 19:57 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
> >> This broke several interlace fate tests, including the new checkasm one
> >> you added.
>
> New patch in attach f

Re: [FFmpeg-devel] avfilter/vf_interlace : add checkasm for lowpass_line and AVX2 version

2017-12-30 Thread Martin Vignali
> This broke several interlace fate tests, including the new checkasm one
> you added.

New patch attached. For the AVX2 version, I added a step that processes only 1 * mmsize before the loop (which processes 2 * mmsize per iteration). Pass
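The loop shape described above, sketched in C (hypothetical names; `BLOCK` stands in for mmsize, and the line length is assumed to be a multiple of it). The per-pixel formula is the simple low-pass as we understand vf_interlace's C version: an integer form of 0.5 * current + 0.25 * above + 0.25 * below, with +1 for rounding. This only illustrates the single-block head followed by a two-block loop, not the actual asm in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 32 /* stands in for mmsize with YMM registers */

/* Integer form of 0.5*cur + 0.25*above + 0.25*below; the "1 +" rounds. */
static void lowpass_ref(uint8_t *dst, const uint8_t *cur, const uint8_t *above,
                        const uint8_t *below, ptrdiff_t n)
{
    for (ptrdiff_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((1 + cur[i] + cur[i] + above[i] + below[i]) >> 2);
}

/* If the number of blocks is odd, handle one block first; the main loop
 * then always consumes exactly 2 blocks per iteration. */
static void lowpass_blocked(uint8_t *dst, const uint8_t *cur, const uint8_t *above,
                            const uint8_t *below, ptrdiff_t n)
{
    ptrdiff_t i = 0;
    assert(n % BLOCK == 0);
    if ((n / BLOCK) & 1) {
        lowpass_ref(dst, cur, above, below, BLOCK);
        i = BLOCK;
    }
    for (; i < n; i += 2 * BLOCK)
        lowpass_ref(dst + i, cur + i, above + i, below + i, 2 * BLOCK);
}
```

Without the head step, a line of 3 blocks would make the 2-block loop run past the end of the line.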

Re: [FFmpeg-devel] avfilter/vf_interlace : add checkasm for lowpass_line and AVX2 version

2017-12-19 Thread Martin Vignali
2017-12-19 21:59 GMT+01:00 James Almer <jamr...@gmail.com>:
> On 12/19/2017 5:16 PM, Martin Vignali wrote:
> >> LGTM, thanks.
> >
> > Pushed, thanks
>
> This broke several interlace fate tests, including the new checkasm one

Re: [FFmpeg-devel] avfilter/vf_interlace : add checkasm for lowpass_line and AVX2 version

2017-12-19 Thread Martin Vignali
> > LGTM, thanks.
> Pushed, thanks
> Do you think that the complex low pass filter could also be improved that way?

Yes, probably. I will take a look.

Martin

Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

2017-12-19 Thread Martin Vignali
Pushed, using vpermq (faster for me), and with "%if HAVE_AVX2_EXTERNAL" added around INIT_YMM...

Thanks
Martin

Re: [FFmpeg-devel] avfilter/vf_interlace : add checkasm for lowpass_line and AVX2 version

2017-12-18 Thread Martin Vignali
> Please also add the changes you made in patch 1 and avx2 to vf_tinterlace.

For patch 1, IMHO, it's not necessary (the modification is mainly to make the checkasm test easier to write, and since vf_interlace and vf_tinterlace use the same asm, only one of them is needed for checkasm). Attached

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)

2017-12-18 Thread Martin Vignali
2017-12-17 19:41 GMT+01:00 Henrik Gramner <hen...@gramner.com>:
> On Thu, Dec 14, 2017 at 11:16 AM, Martin Vignali <martin.vign...@gmail.com> wrote:
> > 2017-12-13 17:37 GMT+01:00 Henrik Gramner <hen...@gramner.com>:
> >> You could also do vextracti12

Re: [FFmpeg-devel] avfilter/vf_interlace and vf_tinterlace : remove avx version

2017-12-18 Thread Martin Vignali
> when running checkasm several times sse2 is mostly faster here, not always.
> But the difference is quite small.
> Since I´m not an SIMD expert I´m fine with this patch as long as no one
> with more expertise objects.

It seems Paul B Mahol is against it. And I don't have a strong opinion on

[FFmpeg-devel] avfilter/vf_limiter : add checkasm and AVX2 version

2017-12-16 Thread Martin Vignali
Hello,

Attached is a patch adding a checkasm test for the vf_limiter SIMD (8 and 16 bit). The checkasm patch needs to be applied after the patch in the discussion "avfilter/vf_interlace : add checkasm for lowpass_line and AVX2 version".
007 : add ff_limiter_init func to init the dsp func
008 : checkasm patch can be

[FFmpeg-devel] avfilter/vf_interlace and vf_tinterlace : remove avx version

2017-12-16 Thread Martin Vignali
Hello,

Following the discussion "avfilter/vf_interlace : add checkasm for lowpass_line and AVX2 version", the AVX version seems to be slower than SSE. The attached patch removes it for both vf_interlace and vf_tinterlace (both use the same SIMD).

Martin
0004-avfilter-vf_interlace-and-vf_tinterlace.patch

Re: [FFmpeg-devel] avfilter/vf_interlace : add checkasm for lowpass_line and AVX2 version

2017-12-16 Thread Martin Vignali
2017-12-16 14:48 GMT+01:00 Carl Eugen Hoyos <ceffm...@gmail.com>:
> 2017-12-16 14:17 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
> > 002 : Checkasm test for lowpass_line
>
> The change to checkasm.mak contains unexpected tabs iiuc.

New patch in

[FFmpeg-devel] avfilter/vf_interlace : add checkasm for lowpass_line and AVX2 version

2017-12-16 Thread Martin Vignali
Hello,

Attached is a patch adding a checkasm test for lowpass_line (8 and 16 bit) and an AVX2 version of lowpass_line (8 and 16 bit).
001 : Modify the init part of vf_interlace (add ff_interlace_init and modify ff_interlace_init_x86)
002 : Checkasm test for lowpass_line, can be tested with

Re: [FFmpeg-devel] avfilter/x86/vf_interlace : fix crash if unaligned data (ticket 6491)

2017-12-15 Thread Martin Vignali
> Would it be faster to instead process the unaligned pixels
> separately and use aligned access for most of the line?

Probably, but the asm code would become more complex. Without a checkasm test for these funcs, I prefer not to try to

Re: [FFmpeg-devel] avfilter/x86/vf_interlace : fix crash if unaligned data (ticket 6491)

2017-12-15 Thread Martin Vignali
> Patch LGTM, thanks!
>
> Regards,
> Thomas

Pushed, thanks
Martin

Re: [FFmpeg-devel] avfilter/x86/vf_interlace : fix crash in low_pass_complex

2017-12-15 Thread Martin Vignali
> Patch LGTM, thanks!
>
> Regards,
> Thomas

Pushed, thanks
Martin

Re: [FFmpeg-devel] avfilter/x86/vf_interlace : fix crash if unaligned data (ticket 6491)

2017-12-14 Thread Martin Vignali
2017-12-14 21:33 GMT+01:00 Thomas Mundt <tmund...@gmail.com>:
> Hi,
>
> 2017-12-14 17:01 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
> > Hello,
> >
> > in attach patch to fix crash using this command line
> > ./ffmpeg -f lavf

[FFmpeg-devel] avfilter/x86/vf_interlace : fix crash in low_pass_complex

2017-12-14 Thread Martin Vignali
Hello,

Related to ticket 6491 (crash using crop and vf_interlace): the attached patch fixes the crash with unaligned data when using low_pass_complex filtering (the previous patch fixed the crash for low_pass_simple filtering). Can be tested with:
For 8 bits: ./ffmpeg -f lavfi -i

[FFmpeg-devel] avfilter/x86/vf_interlace : fix crash if unaligned data (ticket 6491)

2017-12-14 Thread Martin Vignali
Hello,

Attached is a patch to fix the crash with this command line (ticket 6491):
./ffmpeg -f lavfi -i testsrc=s=hd1080,format=yuv420p -vf crop=1440:1080,interlace -f null -
It uses unaligned loads to avoid the crash. It doesn't fix the crash when using low_pass_complex:
./ffmpeg -f lavfi -i
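A worked example of why aligned loads can fault with this command line, assuming crop keeps its default centered x offset of (in_w - out_w) / 2 and that the uncropped planes start on a 16-byte boundary (both are assumptions; the helper names are hypothetical). Cropping 1920 to 1440 starts the luma rows 240 bytes in, and the 4:2:0 chroma rows (960 to 720) 120 bytes in. 120 is not a multiple of 16, and 240 is not a multiple of 32, so `mova` loads on those rows can fault.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the first kept pixel in an 8-bit plane when the crop is
 * horizontally centered (assumed default x = (in_w - out_w) / 2). */
static int centered_crop_offset(int in_w, int out_w)
{
    assert(in_w >= out_w);
    return (in_w - out_w) / 2;
}

/* Non-zero if `p` is a multiple of `align` (a power of two). */
static int is_aligned(uintptr_t p, uintptr_t align)
{
    return (p & (align - 1)) == 0;
}
```

Under these assumptions the chroma rows start 8 bytes past a 16-byte boundary, which fits the crash going away once the loads are made unaligned.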

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)

2017-12-14 Thread Martin Vignali
2017-12-13 17:37 GMT+01:00 Henrik Gramner <hen...@gramner.com>:
> On Sat, Dec 9, 2017 at 1:11 PM, Martin Vignali <martin.vign...@gmail.com> wrote:
> > the idea in AVX2 is to load 128 bits of data (2x 64 bits)
> > then shuffle across lanes, the two 64 bit

Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

2017-12-14 Thread Martin Vignali
2017-12-13 17:18 GMT+01:00 Henrik Gramner <hen...@gramner.com>:
> On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali <martin.vign...@gmail.com> wrote:
> > +vpermq m1, [srcq + xq - mmsize + %3], 0x4e; flip each lane at load
> > +vpermq m2, [s
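vpermq with immediate 0x4e selects qwords 2, 3, 0, 1, so it swaps the two 128-bit lanes. pshufb only shuffles bytes within a lane, so swapping the lanes at load and then reversing the bytes inside each lane gives a full 32-byte flip. Below is a scalar emulation of that idea (helper names are hypothetical; this is not the patch code).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulates vpermq ymm, src, imm: destination qword i takes source qword
 * (imm >> 2*i) & 3. */
static void vpermq_emul(uint64_t dst[4], const uint64_t src[4], unsigned imm)
{
    uint64_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm >> (2 * i)) & 3];
    memcpy(dst, tmp, sizeof(tmp));
}

/* Emulates pshufb with the 15..0 mask on a 32-byte register: each 16-byte
 * lane is reversed on its own, since pshufb never moves bytes across lanes. */
static void reverse_each_lane(uint8_t v[32])
{
    for (int lane = 0; lane < 32; lane += 16)
        for (int i = 0; i < 8; i++) {
            uint8_t t = v[lane + i];
            v[lane + i] = v[lane + 15 - i];
            v[lane + 15 - i] = t;
        }
}

/* Full 32-byte reversal: swap lanes with vpermq 0x4e, then reverse in-lane. */
static void flip32(uint8_t out[32], const uint8_t in[32])
{
    uint64_t q[4];
    memcpy(q, in, sizeof(q));
    vpermq_emul(q, q, 0x4e);
    memcpy(out, q, sizeof(q));
    reverse_each_lane(out);
}
```

Only whole qwords are moved by the lane swap, so the result does not depend on host endianness.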

[FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

2017-12-13 Thread Martin Vignali
Hello,

Attached is a patch to merge the byte and short hflip asm funcs into a macro and add an AVX2 version.

Checkasm result (Kaby Lake, x86_64, macOS 10.12):
hflip_byte_c: 30.9
hflip_byte_ssse3: 30.4
hflip_byte_avx2: 21.9
hflip_short_c: 31.6
hflip_short_ssse3: 30.4
hflip_short_avx2: 22.4

Martin

Re: [FFmpeg-devel] checkasm/vf_hflip : add test for vf_hflip SIMD

2017-12-13 Thread Martin Vignali
> Tested on linux/mingw 32/64 x86 and linux mips/arm

Thanks for the comments and testing. Pushed.

Martin

Re: [FFmpeg-devel] checkasm/vf_hflip : add test for vf_hflip SIMD

2017-12-11 Thread Martin Vignali
> It doesn't run (the test is skipped) on 32-bit VS 2017 with command:
> configure --enable-gpl --toolchain=msvc && make fate-rsync SAMPLES=../fate-suite && make fate SAMPLES=../fate-suite
>
> With exactly the same command it runs on 64-bit VS 2017.
> With similar command line (without

Re: [FFmpeg-devel] checkasm/vf_hflip : add test for vf_hflip SIMD

2017-12-11 Thread Martin Vignali
2017-12-11 10:49 GMT+01:00 Mateusz <mateu...@poczta.onet.pl>:
> On 11.12.2017 at 00:51, Mateusz wrote:
> > On 10.12.2017 at 21:13, Martin Vignali wrote:
> >>> For me there is no "src + (width - 1) * step" in tests/checkasm/vf_

Re: [FFmpeg-devel] checkasm/vf_hflip : add test for vf_hflip SIMD

2017-12-10 Thread Martin Vignali
> For me there is no "src + (width - 1) * step" in tests/checkasm/vf_hflip.c
> You pass start of the src buffer but you should pass end of the buffer.

Thanks! New patch attached.

Martin
0001-avfilter-vf_hflip-move-context-func-init-in.patch
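The review point follows from how the C reference reads its input. As we understand vf_hflip, the kernel receives a pointer to the last pixel of the source line and walks backwards, so the caller must pass `src + (width - 1) * step` rather than the start of the buffer. Here is a minimal sketch of the byte variant (`hflip_byte_ref` is a hypothetical name).

```c
#include <assert.h>
#include <stdint.h>

/* Flip one 8-bit line: `src` points at the LAST pixel of the input line
 * and is read backwards, so dst[0] receives the rightmost pixel. */
static void hflip_byte_ref(const uint8_t *src, uint8_t *dst, int w)
{
    assert(w >= 0);
    for (int j = 0; j < w; j++)
        dst[j] = src[-j];
}
```

Passing the start of the line instead would make `src[-j]` read before the beginning of the buffer for every j > 0.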

[FFmpeg-devel] avcodec/magicyuv : use gradient dsp func

2017-12-09 Thread Martin Vignali
Hello,

Attached is a patch to use the gradient_pred dsp func for magicyuvdec. Passes FATE for me.

Martin
0004-avcodec-magicyuv-use-gradient_pred-dsp-func-for-8-bi.patch

[FFmpeg-devel] avcodec/utvideodec : use gradient pred for interlace

2017-12-09 Thread Martin Vignali
Hello,

Attached is a patch to use the gradient_pred dsp func for interlaced content. Passes FATE for me (x86 64).

Martin
0003-avcodec-utvideodec-use-gradient_pred-dsp-in-interlac.patch

[FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)

2017-12-09 Thread Martin Vignali
Hello,

Attached is a patch adding an AVX2 version of each 8-bit func (except divide).
001 : avutil : add ABS2 for avx2
002 : avfilter : add AVX2 version for most of the funcs; the AVX2 is a simple modification: VBROADCASTI128 for constant loading when the process stays in 8 bits; when the process uses

Re: [FFmpeg-devel] checkasm/vf_hflip : add test for vf_hflip SIMD

2017-12-09 Thread Martin Vignali
> > Do you test on X86_32 or x86_64?
> failure occurs on both
> > Nasm or Yasm?
> NASM version 2.10.09 compiled on Dec 29 2013

I tried to compile with the same nasm version (on OS X, x86_64), using --x86asmexe=nasm_exe_path2.10rc9 in the configure, and checkasm also passes

Re: [FFmpeg-devel] avcodec/utvideodec : add SIMD (SSSE3 and AVX2) for gradient_pred (V2)

2017-12-09 Thread Martin Vignali
2017-12-02 19:55 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
> Hello,
>
> New patchs in attach for adding gradient pred SIMD (SSSE3 and AVX2)
> (use by utvideo dec now (more use will be add later))
>
> Checkasm result (width = 1024)

Re: [FFmpeg-devel] avcodec/x86/lossless_videodsp : add_left_pred AVX2 v2

2017-12-09 Thread Martin Vignali
2017-12-02 19:12 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
> New patch in attach
>
> 001, 002 : unchanged
>
> 003 : use VBROADCASTI128 macro for constant loading in XMM/YMM instead of
> 256 bits constants.

Re: [FFmpeg-devel] avcodec/huffyuvenc : try to call dsp with aligned data, and remove code duplication

2017-12-09 Thread Martin Vignali
2017-12-02 18:59 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
>> requiring FFMIN() to be evaluated per iteration could be slower
>> if the compiler fails to factor it out
>
> New patchs in attach :
>
> 001 : unchanged

Re: [FFmpeg-devel] avcodec/utvideodec : use dsp add_median_pred for second line

2017-12-09 Thread Martin Vignali
2017-11-26 19:23 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
> Hello,
>
> Patch in attach
>
> dsp func need align16 data
> make only the start of the line in scalar, and call the dsp for the rest
> instead of process the entire line in scalar

Re: [FFmpeg-devel] avfilter/x86/vf_threshold : add SSE4 and AVX2 for threshold 16

2017-12-09 Thread Martin Vignali
Thanks for the comments and testing. Pushed.

Martin

Re: [FFmpeg-devel] checkasm/vf_hflip : add test for vf_hflip SIMD

2017-12-08 Thread Martin Vignali
> issue still happens with both reverted
>
> checkasm: using random seed 1616253308
> SSSE3:
>  hflip_byte_ssse3 (vf_hflip.c:63)
>  - vf_hflip.hflip_byte [FAILED]
>  hflip_short_ssse3 (vf_hflip.c:63)
>  - vf_hflip.hflip_short [FAILED]
> checkasm: 2 of 2 tests have failed

Thanks for

Re: [FFmpeg-devel] checkasm/vf_hflip : add test for vf_hflip SIMD

2017-12-08 Thread Martin Vignali
> maybe iam missing something
> but my box doesnt like your test:

Could this be related to these recent commits?
https://github.com/FFmpeg/FFmpeg/commit/dc33fe1d0080e932faa9fe3c7fb4850dfde161a8
https://github.com/FFmpeg/FFmpeg/commit/f2aa0ce5a059cf02ee4cbd68111dd2ad622edc85

Martin

Re: [FFmpeg-devel] avfilter/x86/vf_threshold : add SSE4 and AVX2 for threshold 16

2017-12-07 Thread Martin Vignali
> You should also change the cglobal line for x86_32, right below this else

New patch attached.
0001-avfilter-x86-vf_threshold-add-threshold16-SIMD-SSE4.patch
0002-checkasm-vf_threshold-add-test-for-threshold16.patch

[FFmpeg-devel] checkasm/vf_hflip : add test for vf_hflip SIMD

2017-12-07 Thread Martin Vignali
Hello,

The attached patch adds a checkasm test for vf_hflip (byte and short).

Martin
0001-avfilter-vf_hflip-move-context-func-init-in.patch
0002-checkasm-vf_hflip-add-test-for-vf_hflip-byte-and-sho.patch

Re: [FFmpeg-devel] avfilter/x86/vf_threshold : add SSE4 and AVX2 for threshold 16

2017-12-07 Thread Martin Vignali
2017-12-03 21:28 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
> 2017-12-03 21:15 GMT+01:00 James Darnley <james.darn...@gmail.com>:
>> On 2017-12-03 19:30, Martin Vignali wrote:
>> > libavfilter/x86/vf_threshold.asm| 19 ++

Re: [FFmpeg-devel] [PATCH] avfilter/x86/vf_hflip.asm: improve indentation

2017-12-04 Thread Martin Vignali
> .end:
> -RET
> +RET

Maybe indent RET further so it sits "inside" the .end label. Otherwise OK (easier to read).

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
Checkasm result (OS X) for your last patch:
hflip_byte_c: 28.5
hflip_byte_ssse3: 29.0
hflip_short_c: 277.7
hflip_short_ssse3: 65.0

If you add a "cmp xq, wq" after the SIMD loop, you can be faster than C (clang) when the width is a multiple of mmsize * 2:
hflip_byte_c: 28.5
hflip_byte_ssse3: 27.5

See below

Re: [FFmpeg-devel] avfilter/x86/vf_threshold : add SSE4 and AVX2 for threshold 16

2017-12-03 Thread Martin Vignali
2017-12-03 21:15 GMT+01:00 James Darnley <james.darn...@gmail.com>:
> On 2017-12-03 19:30, Martin Vignali wrote:
> > libavfilter/x86/vf_threshold.asm| 19 ++-
> > libavfilter/x86/vf_threshold_init.c | 34 --

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
I modified the checkasm test to test various widths:

if (check_func(s.flip_line[0], "hflip_%s", report_name)) {
    for (i = 1; i < w; i++) {
        call_ref(src, dst_ref, i);
        call_new(src, dst_new, i);
        if (memcmp(dst_ref, dst_new, WIDTH)) {
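The same idea as a standalone sketch. A blocked version (full blocks first, then a scalar tail, where the tail check plays the role of the suggested `cmp xq, wq`) is compared against a reference for every width from 1 up, as in the modified checkasm loop. `BLOCK`, `flip_ref` and `flip_blocked` are hypothetical stand-ins, not the patch's asm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16 /* stands in for mmsize */

/* Reference: src points at the last input pixel and is read backwards. */
static void flip_ref(const uint8_t *src, uint8_t *dst, int w)
{
    for (int j = 0; j < w; j++)
        dst[j] = src[-j];
}

/* Blocked version: whole BLOCK-sized chunks first, then a scalar tail for
 * whatever is left. The tail loop's x < w test is the scalar analogue of
 * checking "cmp xq, wq" after the SIMD loop. */
static void flip_blocked(const uint8_t *src, uint8_t *dst, int w)
{
    int x = 0;
    for (; x + BLOCK <= w; x += BLOCK)
        for (int k = 0; k < BLOCK; k++)
            dst[x + k] = src[-(x + k)];
    for (; x < w; x++)
        dst[x] = src[-x];
}
```

If both outputs are cleared before each call, comparing the full buffer (as the memcmp over WIDTH does) also catches writes beyond the requested width.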

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
Maybe the problem comes from the skip part:

> +INIT_XMM ssse3
> +cglobal hflip_byte, 3, 5, 3, src, dst, w, x, v
> +mova    m0, [pb_flip_byte]
> +mov     xq, 0
> +mov     wd, dword wm
> +sub     wq, 2 * mmsize
> +cmp     wq, mmsize
> +jl .skip
> +
> +.loop0:
> +

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
2017-12-03 20:36 GMT+01:00 Paul B Mahol <one...@gmail.com>:
> On 12/3/17, Martin Vignali <martin.vign...@gmail.com> wrote:
> >>
> >> In any case, if clang or gcc can generate better code, then the hand
> >> written version needs to be optimized to be a

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
> In any case, if clang or gcc can generate better code, then the hand
> written version needs to be optimized to be as fast or faster.

Quick test: passes checkasm (but probably only because width = 256)
hflip_byte_c: 26.4
hflip_byte_ssse3: 20.4

INIT_XMM ssse3
cglobal hflip_byte, 3, 5, 2,

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
> Can you post a disassembly of hflip_byte_c?

At -O1: clang -S -O1 test_asm_gen.c

.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 12
.globl _hflip_byte_c
.p2align 4, 0x90
_hflip_byte_c: ## @hflip_byte_c

[FFmpeg-devel] avfilter/x86/vf_threshold : add SSE4 and AVX2 for threshold 16

2017-12-03 Thread Martin Vignali
Hello,

The attached patch adds SIMD for threshold16.

Checkasm result:
threshold16_c: 304.5
threshold16_sse4: 60.5
threshold16_avx2: 45.0

001 : modify threshold macro, and add threshold16
002 : add checkasm test for threshold16

Martin

Re: [FFmpeg-devel] avfilter/vf_threshold : add checkasm and avx2 version for threshold8

2017-12-03 Thread Martin Vignali
2017-12-03 18:01 GMT+01:00 Paul B Mahol <one...@gmail.com>:
> On 12/3/17, Martin Vignali <martin.vign...@gmail.com> wrote:
> >>> > 002 : Add checkasm test for vf_threshold
> >>>
> >>> Why is this GPL?

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
> 2017-12-03 17:46 GMT+01:00 Paul B Mahol <one...@gmail.com>:
>> On 12/3/17, Martin Vignali <martin.vign...@gmail.com> wrote:
>> > Hello,
>> >
>> > Maybe you can use a macro for byte and short version,
>> > only few lines are dif

Re: [FFmpeg-devel] avfilter/vf_threshold : add checkasm and avx2 version for threshold8

2017-12-03 Thread Martin Vignali
> >> > 002 : Add checkasm test for vf_threshold
> >> Why is this GPL?
> > Because i copy paste the header of vf_blend checkasm
> Will change to LGPL.

In fact, most (all?) checkasm tests are under the GPL license, and the checkasm executable is under the GPL too. So I will keep the current GPL license

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
2017-12-03 17:46 GMT+01:00 Paul B Mahol <one...@gmail.com>:
> On 12/3/17, Martin Vignali <martin.vign...@gmail.com> wrote:
> > Hello,
> >
> > Maybe you can use a macro for byte and short version,
> > only few lines are different in each version

Re: [FFmpeg-devel] avfilter/vf_threshold : add checkasm and avx2 version for threshold8

2017-12-03 Thread Martin Vignali
> > 002 : Add checkasm test for vf_threshold
> Why is this GPL?

Because I copy-pasted the header of the vf_blend checkasm test. Will change to LGPL.

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-03 Thread Martin Vignali
Hello,

Maybe you can use a macro for the byte and short versions; only a few lines differ between them.

Martin

[FFmpeg-devel] avfilter/vf_threshold : add checkasm and avx2 version for threshold8

2017-12-03 Thread Martin Vignali
Hello,

Attached is a patch adding a checkasm test for the recently added threshold8 SIMD, and also an AVX2 version.

Checkasm result:
threshold8_c: 584.8
threshold8_sse4: 65.0
threshold8_avx2: 43.5

001 : create ff_threshold_init_func in order to simplify writing the checkasm test (more like previous checkasm

[FFmpeg-devel] avcodec/utvideodec : add SIMD (SSSE3 and AVX2) for gradient_pred (V2)

2017-12-02 Thread Martin Vignali
Hello,

New patches attached, adding gradient pred SIMD (SSSE3 and AVX2) (used by the utvideo decoder for now; more uses will be added later).

Checkasm result (width = 1024):
add_gradient_pred_c: 2070.2
add_gradient_pred_ssse3: 602.4
add_gradient_pred_avx2: 385.7

Needs to be applied after add_left_pred AVX2
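For reference, here is a scalar sketch of gradient prediction decoding as we understand it (`add_gradient_pred_ref` is a hypothetical name). The predictor is left + top - topleft, and the stored residual is added mod 256. Each pixel depends on the left pixel that was just decoded, which is what makes the SIMD version non-trivial. The sketch assumes `src[-1]` and the previous line at `src - stride` are readable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one line of gradient prediction in place: each byte of `src` holds
 * a residual on entry and the reconstructed pixel on exit. Requires src[-1]
 * and the previous line (src - stride, including its [-1]) to be readable. */
static void add_gradient_pred_ref(uint8_t *src, ptrdiff_t stride, ptrdiff_t width)
{
    assert(width >= 0 && stride >= width);
    for (ptrdiff_t i = 0; i < width; i++) {
        int top     = src[i - stride];
        int topleft = src[i - stride - 1];
        int left    = src[i - 1];          /* just reconstructed when i > 0 */
        src[i] = (uint8_t)(top - topleft + left + src[i]);
    }
}
```

Example: with a zero padding column, a previous line of 10, 20, 30, 40 and residuals of 5 everywhere, the decoded line is 15, 30, 45, 60.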

Re: [FFmpeg-devel] avcodec/x86/lossless_videodsp : add_left_pred AVX2 v2

2017-12-02 Thread Martin Vignali
New patch attached.
001, 002 : unchanged
003 : use VBROADCASTI128 macro for constant loading in XMM/YMM instead of 256-bit constants.

Martin
0001-checkasm-llviddsp-test-return-of-add_left_pred-16.patch
0002-avcodec-x86-lossless_videodsp.asm-make-macro-for.patch

Re: [FFmpeg-devel] avcodec/huffyuvenc : try to call dsp with aligned data, and remove code duplication

2017-12-02 Thread Martin Vignali
> requiring FFMIN() to be evaluated per iteration could be slower
> if the compiler fails to factor it out

New patches attached:
001 : unchanged
002 : add "int min_width = FFMIN(w, 32)" at the start of the func
003 : add "int min_width = FFMIN(w, 8)" at the start of the func
Pass
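Here is a sketch of the pattern discussed above (hypothetical names; `MIN` stands in for FFMIN, and `diff_bytes_dsp` for the SIMD routine). The scalar bound is computed once before the loop instead of in the loop condition. The first 32 pixels are predicted in scalar code, and the rest goes to the DSP routine as `src[i] - src[i - 1]`. If `dst` and `src` are 32-byte aligned, then `dst + 32` and `src + 32` are too.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b)) /* local stand-in for FFMIN */

/* Stand-in for the SIMD routine: dst[i] = a[i] - b[i], mod 256. */
static void diff_bytes_dsp(uint8_t *dst, const uint8_t *a, const uint8_t *b, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (uint8_t)(a[i] - b[i]);
}

/* Left prediction dst[i] = src[i] - src[i - 1], where `left` is the value
 * before src[0]. Returns the last source pixel (or `left` if w == 0). */
static int sub_left_pred_sketch(uint8_t *dst, const uint8_t *src, int w, int left)
{
    const int min_width = MIN(w, 32); /* evaluated once, not per iteration */
    assert(w >= 0);
    for (int i = 0; i < min_width; i++) {
        const int temp = src[i];
        dst[i] = (uint8_t)(temp - left);
        left   = temp;
    }
    if (w > 32) {
        /* Remaining pixels: each minus its left neighbour. */
        diff_bytes_dsp(dst + 32, src + 32, src + 31, w - 32);
        left = src[w - 1];
    }
    return left;
}
```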

Re: [FFmpeg-devel] avutil/x86util : add macro for 128 bits constant load

2017-12-02 Thread Martin Vignali
2017-12-02 13:13 GMT+01:00 Henrik Gramner <hen...@gramner.com>:
> On Fri, Dec 1, 2017 at 9:03 PM, Martin Vignali <martin.vign...@gmail.com> wrote:
> > If no one have objections, i will push these patch tomorrow.
> >
> > Martin
>
> Follow James' sugges

Re: [FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

2017-12-02 Thread Martin Vignali
> +
> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION_RODATA
> +
> +pb_flip_byte: times 16 db 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0
> +pb_flip_short: times 16 db 14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1
> +

times 16 ?

Martin
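On the "times 16 ?" question: `times 16 db` followed by 16 values emits the 16-byte pattern 16 times, which is 256 bytes, while an XMM load only reads 16 (a YMM load reads 32). A scalar model of pshufb with the two masks shows what each one does. The byte mask reverses all 16 bytes. The short mask reverses the order of the 16-bit words but keeps the two bytes of each word in order. `pshufb16` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-byte row of each mask from the patch. */
static const uint8_t pb_flip_byte[16]  = {15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0};
static const uint8_t pb_flip_short[16] = {14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1};

/* Scalar model of pshufb on one 16-byte lane: a mask byte with bit 7 set
 * yields 0, otherwise out[i] = in[mask[i] & 15]. */
static void pshufb16(uint8_t out[16], const uint8_t in[16], const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? 0 : in[mask[i] & 15];
}
```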

Re: [FFmpeg-devel] [PATCH] avfilter/vf_threshold: add x86 SIMD

2017-12-01 Thread Martin Vignali
> > Do you need pxor m0, m4 and pxor m1, m4 ?
> Yes, I need it.

Yes, you're right, sorry for the noise.

Martin
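Here is why the pxor instructions are plausibly needed. SSE's ordered integer compares (pcmpgtb and friends) are signed, and flipping the top bit of both operands (XOR 0x80) maps unsigned order onto signed order. We assume m4 holds 0x80 bytes; the excerpt does not show it. The sketch also applies the threshold rule as we understand vf_threshold: output `min` when in < threshold, otherwise `max`. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed byte compare, like one lane of pcmpgtb. */
static int gt_signed(uint8_t a, uint8_t b)
{
    return (int8_t)a > (int8_t)b;
}

/* Unsigned a > b built from the signed compare: XOR with 0x80 flips the
 * top bit of both operands, mapping 0..255 onto -128..127 in order. */
static int gt_unsigned_via_xor(uint8_t a, uint8_t b)
{
    return gt_signed((uint8_t)(a ^ 0x80), (uint8_t)(b ^ 0x80));
}

/* Threshold rule per pixel: `min` where in < thr, otherwise `max`. */
static uint8_t threshold_px(uint8_t in, uint8_t thr, uint8_t min, uint8_t max)
{
    return gt_unsigned_via_xor(thr, in) ? min : max;
}
```

Without the XOR, 200 would compare as -56 and sort below 100.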

Re: [FFmpeg-devel] [PATCH] avfilter/vf_threshold: add x86 SIMD

2017-12-01 Thread Martin Vignali
Hello,

> +INIT_XMM sse4

Maybe use a macro (an AVX2 version can probably be added easily).

> +cglobal threshold8, 10, 13, 5, in, threshold, min, max, out, ilinesize, tlinesize, flinesize, slinesize, olinesize, w, h, x
> +mov wd, dword wm
> +mov hd, dword hm

Maybe you can use

Re: [FFmpeg-devel] avcodec/huffyuvenc : try to call dsp with aligned data, and remove code duplication

2017-12-01 Thread Martin Vignali
2017-11-26 19:07 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
> Hello,
>
> in attach patchs
>
> 0001-avcodec-huffyuvenc-increase-scalar-loop-count
> and
> 0003-avcodec-huffyuvenc-sub_left_prediction_bgr32-call-ds
>
> like diff_bytes and diff_bytes16, have

Re: [FFmpeg-devel] avutil/x86util : add macro for 128 bits constant load

2017-12-01 Thread Martin Vignali
2017-11-28 21:04 GMT+01:00 Henrik Gramner :
> On Mon, Nov 27, 2017 at 11:37 PM, James Almer wrote:
> > On 11/27/2017 7:33 PM, James Darnley wrote:
> >> If the condition was made "mmsize > 16" would this work correctly for
> >> zmm registers? (Assume I

[FFmpeg-devel] avutil/x86util : add macro for 128 bits constant load

2017-11-27 Thread Martin Vignali
Hello,

Following a suggestion by Henrik Gramner, attached is a patch adding a macro to x86util.asm that loads a 128-bit constant into an XMM register, or into each part of a ZMM register.
Not sure about the name of this macro, or its position in the x86util file.
Patch 002 : Use this new
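Here is a scalar model of what such a macro achieves (the name `broadcast128` is hypothetical). A 16-byte constant is stored once in read-only data, and each 128-bit lane of the wider register gets a copy. This is what AVX2's vbroadcasti128 does for a YMM register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy one 16-byte constant into every 128-bit lane of a wider "register"
 * of reg_bytes bytes (16 for XMM, 32 for YMM, 64 for ZMM). */
static void broadcast128(uint8_t *reg, size_t reg_bytes, const uint8_t c[16])
{
    assert(reg_bytes % 16 == 0);
    for (size_t off = 0; off < reg_bytes; off += 16)
        memcpy(reg + off, c, 16);
}
```

With XMM registers the loop runs once, which matches a plain 16-byte load.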

Re: [FFmpeg-devel] avcodec/x86/bswapdsp : convert pb_bswap32 to ymm constant in order to simplify code

2017-11-27 Thread Martin Vignali
2017-11-27 17:59 GMT+01:00 Henrik Gramner <hen...@gramner.com>:
> On Sat, Nov 25, 2017 at 9:53 PM, Martin Vignali <martin.vign...@gmail.com> wrote:
> > Hello,
> >
> > In attach patch to convert pb_bswap32 to ymm constant
> > and remove the vbroadcasti

[FFmpeg-devel] avcodec/utvideodec : add x86 SIMD (SSSE3) for gradient prediction

2017-11-26 Thread Martin Vignali
Hello,

The attached patch adds SIMD (SSSE3) for gradient prediction, and a checkasm test.

Checkasm result (width = 1024) (Kaby Lake, macOS 10.12):
add_gradient_pred_c: 1708.8
add_gradient_pred_ssse3: 533.0

Benchmark on a 3 min HD file in gradient (422), without SIMD:
bench: utime=102.695s
bench:

[FFmpeg-devel] avcodec/utvideodec : use dsp add_median_pred for second line

2017-11-26 Thread Martin Vignali
Hello,

Patch attached. The dsp func needs 16-byte aligned data, so only the start of the line is processed in scalar code and the dsp is called for the rest, instead of processing the entire line in scalar code. Passes make fate-utvideo for me.

Martin
0001-avcodec-utvideodec-use-dsp-add_median_pred-for-secon.patch
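Here is a sketch of the approach. It assumes the median predictor median(left, top, left + top - topleft) with the residual added mod 256, as we understand the lossless video DSP C code. The split version processes just enough pixels in scalar code to bring `dst` to a 16-byte boundary, then calls the routine standing in for the aligned DSP on the rest. `left`/`left_top` carry the state across the split. All names are hypothetical, and only `dst` alignment is considered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Median of three: max(a, min(b, c)) after ordering a <= b. */
static int mid3(int a, int b, int c)
{
    if (a > b) { int t = a; a = b; b = t; }
    if (b > c) b = c;
    return a > b ? a : b;
}

/* Median prediction decode; left/left_top are read and updated. */
static void add_median_pred_ref(uint8_t *dst, const uint8_t *top, const uint8_t *diff,
                                ptrdiff_t w, int *left, int *left_top)
{
    uint8_t l = (uint8_t)*left, lt = (uint8_t)*left_top;
    for (ptrdiff_t i = 0; i < w; i++) {
        l      = (uint8_t)(mid3(l, top[i], (l + top[i] - lt) & 0xFF) + diff[i]);
        lt     = top[i];
        dst[i] = l;
    }
    *left     = l;
    *left_top = lt;
}

/* Scalar head up to a 16-byte boundary of dst, then the "aligned" part. */
static void add_median_pred_aligned(uint8_t *dst, const uint8_t *top, const uint8_t *diff,
                                    ptrdiff_t w, int *left, int *left_top)
{
    ptrdiff_t head = (ptrdiff_t)((16 - ((uintptr_t)dst & 15)) & 15);
    if (head > w)
        head = w;
    add_median_pred_ref(dst, top, diff, head, left, left_top);
    assert(head == w || ((uintptr_t)(dst + head) & 15) == 0);
    /* This is where the aligned SIMD routine would be called. */
    add_median_pred_ref(dst + head, top + head, diff + head, w - head, left, left_top);
}
```

The state variables are carried across the split, so the result matches a single pass over the whole line.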

[FFmpeg-devel] avcodec/huffyuvenc : try to call dsp with aligned data, and remove code duplication

2017-11-26 Thread Martin Vignali
Hello,

Attached patches:
0001-avcodec-huffyuvenc-increase-scalar-loop-count
and
0003-avcodec-huffyuvenc-sub_left_prediction_bgr32-call-ds
Since diff_bytes and diff_bytes16 have an AVX2 version, increase the scalar loop so that the aligned version is called in most cases

[FFmpeg-devel] fate/hap : add test for hap encoding

2017-11-26 Thread Martin Vignali
Hello,

The attached patch adds a test for hap encoding (currently not covered) (patch 002) and moves the decoding tests to a separate file (patch 001).
Decoding can be tested with make fate-hap SAMPLES=fate-suite/
and encoding with make fate-hapenc SAMPLES=fate-suite/
Hap encoding needs ffmpeg

[FFmpeg-devel] avcodec/x86/lossless_videodsp : add_left_pred AVX2 v2

2017-11-25 Thread Martin Vignali
Hello,

New patches attached, adding an AVX2 version of add_left_pred.
Changes since the v1 patch:
- use ymm constant
- use 3-operand mode

Checkasm result:
add_left_pred_rnd_acc_c: 1279.8
add_left_pred_rnd_acc_ssse3: 261.3
add_left_pred_rnd_acc_avx2: 209.8
add_left_pred_zero_c: 1284.8
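The patch code is not shown here, but a common way to vectorize a running sum like add_left_pred is an in-register prefix sum. It takes log2(16) = 4 shift-and-add steps per 16-byte block (the pslldq/paddb pattern), followed by adding the carry from the previous block. Below is a scalar model of that technique, checked against the plain loop. Names are hypothetical, and bytes wrap mod 256 as in the 8-bit output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain running sum: dst[i] = acc + src[0] + ... + src[i], mod 256. */
static uint8_t add_left_pred_ref(uint8_t *dst, const uint8_t *src, int w, uint8_t acc)
{
    for (int i = 0; i < w; i++)
        dst[i] = acc = (uint8_t)(acc + src[i]);
    return acc;
}

/* Inclusive prefix sum of 16 bytes in 4 shift-and-add steps (shift by
 * 1, 2, 4, 8 bytes), mirroring the pslldq + paddb pattern. */
static void prefix16(uint8_t v[16])
{
    for (int shift = 1; shift < 16; shift <<= 1) {
        uint8_t shifted[16] = {0};
        for (int i = shift; i < 16; i++)
            shifted[i] = v[i - shift];
        for (int i = 0; i < 16; i++)
            v[i] = (uint8_t)(v[i] + shifted[i]);
    }
}

/* Prefix-sum each 16-byte block, add the carry from the previous block,
 * then finish any remainder with the scalar loop. */
static uint8_t add_left_pred_blocks(uint8_t *dst, const uint8_t *src, int w, uint8_t acc)
{
    int x = 0;
    for (; x + 16 <= w; x += 16) {
        uint8_t v[16];
        memcpy(v, src + x, 16);
        prefix16(v);
        for (int i = 0; i < 16; i++)
            dst[x + i] = (uint8_t)(v[i] + acc);
        acc = dst[x + 15];
    }
    return add_left_pred_ref(dst + x, src + x, w - x, acc);
}
```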

[FFmpeg-devel] avcodec/x86/bswapdsp : convert pb_bswap32 to ymm constant in order to simplify code

2017-11-25 Thread Martin Vignali
Hello,

Attached is a patch to convert pb_bswap32 to a ymm constant and remove the vbroadcasti128 part. Speed seems to be similar for me.

Martin
0004-avcodec-x86-bswapdsp-convert-pb_bswap32-to-ymm.patch

Re: [FFmpeg-devel] fate/hapdec : add test for hap alpha only

2017-11-25 Thread Martin Vignali
> LGTM
>
> thx

Pushed, thanks
Martin

Re: [FFmpeg-devel] avcodec/hapdec : use gray8 for HapAlphaOnly decoding instead of RGB0

2017-11-25 Thread Martin Vignali
2017-11-23 3:46 GMT+01:00 Carl Eugen Hoyos <ceffm...@gmail.com>:
> 2017-11-16 23:41 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>:
> > Hello,
> >
> > Following previous discussion
> > patch in attach change pix_fmt for hap alpha only decoding to use g

Re: [FFmpeg-devel] avcodec/x86/exrdsp : use ymm constant for pb_80 instead of vbroadcasti128

2017-11-23 Thread Martin Vignali
2017-11-22 18:21 GMT+01:00 James Almer <jamr...@gmail.com>:
> On 11/21/2017 6:09 PM, Martin Vignali wrote:
> > Hello,
> >
> > After patch by James Almer
> > (pb_80 now fit an ymm)
> >
> > The two mode (SSE, AVX2) for constant loading can be r

[FFmpeg-devel] avcodec/x86/exrdsp : use ymm constant for pb_80 instead of vbroadcasti128

2017-11-21 Thread Martin Vignali
Hello,

After the patch by James Almer (pb_80 now fits a ymm), the two modes (SSE, AVX2) for constant loading can be removed. Speed seems to be similar for me.

Martin
0002-avcodec-x86-exrdsp-use-ymm-constant-for-pb_80.patch

[FFmpeg-devel] checkasm/utvideo : be more explicit for WIDTH_PADDED define

2017-11-21 Thread Martin Vignali
Hello,

Patch attached.

Martin
0001-checkasm-utvideo-be-more-explicit-to-the-WIDTH_PADDE.patch

Re: [FFmpeg-devel] libavcodec/utvideodsp : add avx2 version

2017-11-21 Thread Martin Vignali
Hello,

> > Checkasm result (Kaby Lake, os 10.12)
> > restore_rgb_planes_c: 8371.0
> > restore_rgb_planes_sse2: 6583.7
> > restore_rgb_planes_avx2: 3596.5
> >
> > restore_rgb_planes10_c: 16735.7
> > restore_rgb_planes10_sse2: 11478.5
> > restore_rgb_planes10_avx2: 7193.7
>
> Curious, on my
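For reference, here is a sketch of what restore_rgb_planes computes, as we understand utvideodsp's C version. R and B are stored relative to G with a mid-range bias, so decoding adds G back and removes the bias, wrapping at the bit depth. That means 0x80 and mod 256 for 8-bit, and 0x200 and mod 1024 for the 10-bit variant. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8-bit: r = r + g - 0x80, b = b + g - 0x80, wrapping mod 256. */
static void restore_rgb_planes_ref(uint8_t *r, const uint8_t *g, uint8_t *b, ptrdiff_t n)
{
    for (ptrdiff_t i = 0; i < n; i++) {
        r[i] = (uint8_t)(r[i] + g[i] - 0x80);
        b[i] = (uint8_t)(b[i] + g[i] - 0x80);
    }
}

/* 10-bit: same idea with bias 0x200 and wraparound mod 1024. */
static void restore_rgb_planes10_ref(uint16_t *r, const uint16_t *g, uint16_t *b, ptrdiff_t n)
{
    for (ptrdiff_t i = 0; i < n; i++) {
        r[i] = (uint16_t)((r[i] + g[i] - 0x200) & 0x3FF);
        b[i] = (uint16_t)((b[i] + g[i] - 0x200) & 0x3FF);
    }
}
```

Every pixel is independent, so the loop vectorizes directly; wider registers mainly mean more pixels per iteration.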

Re: [FFmpeg-devel] libavcodec/utvideodsp : add avx2 version

2017-11-20 Thread Martin Vignali
Hello,

> If noone reviews, and you tested it then it should be ok to
> apply

OK, I will apply AVX2 for utvideodsp, huffyuv(enc)dsp and hapqa decoding (and fate).

> especially considering you waited a month (which is longer than
> needed generally)

What is the recommended time to wait a

Re: [FFmpeg-devel] libavcodec/hapdec : add support for hapqa decoding

2017-11-20 Thread Martin Vignali
2017-11-13 20:06 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>: > > > 2017-10-21 21:37 GMT+02:00 Carl Eugen Hoyos <ceffm...@gmail.com>: > >> 2017-10-21 21:32 GMT+02:00 Martin Vignali <martin.vign...@gmail.com>: >> > 2017-10-21 21:23 G

Re: [FFmpeg-devel] libavcodec/utvideodsp : add avx2 version

2017-11-20 Thread Martin Vignali
2017-11-04 19:33 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>: > > > 2017-10-25 21:53 GMT+02:00 Martin Vignali <martin.vign...@gmail.com>: > >> >> >> 2017-10-22 14:05 GMT+02:00 Martin Vignali <martin.vign...@gmail.com>: >>

Re: [FFmpeg-devel] libavcodec/huffyuvdsp(enc) : add avx2 version for add(diff)_int16

2017-11-20 Thread Martin Vignali
2017-11-04 19:31 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>: > > > 2017-10-22 0:26 GMT+02:00 Martin Vignali <martin.vign...@gmail.com>: > >> Hello, >> >> In attach patch to add avx2 version for huffyuv dsp and huffyuvdsp enc >> for add_
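As a reference point for the add_int16 and diff_int16 functions mentioned above, here is a minimal scalar sketch in C. It assumes the helpers operate element-wise on 16-bit samples and use a mask (for example 0x3FF for 10-bit content) to wrap results into the valid sample range; the *_sketch names are hypothetical and this is not FFmpeg's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: dst += src, wrapped to the valid sample bits. */
static void add_int16_sketch(uint16_t *dst, const uint16_t *src, unsigned mask, int w)
{
    assert(w >= 0);
    for (int i = 0; i < w; i++)
        dst[i] = (uint16_t)((dst[i] + src[i]) & mask);
}

/* Hypothetical sketch: dst = src1 - src2, wrapped to the valid sample bits.
 * A negative difference becomes a large unsigned value before masking,
 * so the result is the difference modulo (mask + 1). */
static void diff_int16_sketch(uint16_t *dst, const uint16_t *src1,
                              const uint16_t *src2, unsigned mask, int w)
{
    assert(w >= 0);
    for (int i = 0; i < w; i++)
        dst[i] = (uint16_t)((src1[i] - src2[i]) & mask);
}
```

Because both operations wrap with the same mask, adding src2 back to a diff output recovers src1 exactly, which is the property an encoder/decoder pair relies on.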

Re: [FFmpeg-devel] [PATCH] dvenc: Prevent out-of-bounds read

2017-11-17 Thread Martin Vignali
2017-11-17 17:20 GMT+01:00 Derek Buitenhuis : > mb_area_start has 5 entries, and 'a' is iterated through from 0 to 3. > 'a2' is set to 'a + 1', and mb_area_start[a2 + 1] is accessed, so if > a is 3, then we try to access mb_area_start[5]. > > Signed-off-by: Derek
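The off-by-one described in the dvenc patch can be checked with a small C sketch of the index arithmetic. It only models the loop as the commit message states it (five entries, a from 0 to 3, a2 = a + 1, read of mb_area_start[a2 + 1]); the helper names are hypothetical, and the snippet does not show how the patch actually fixes the read.

```c
#include <assert.h>

#define MB_AREA_START_ENTRIES 5 /* valid indices are 0..4 */

/* Index read in iteration 'a': a2 = a + 1, then mb_area_start[a2 + 1]. */
static int index_read(int a)
{
    assert(a >= 0);
    int a2 = a + 1;
    return a2 + 1;
}

/* Count iterations of a = 0..3 whose read lands past the last entry. */
static int out_of_bounds_reads(void)
{
    int oob = 0;
    for (int a = 0; a <= 3; a++)
        if (index_read(a) >= MB_AREA_START_ENTRIES)
            oob++;
    return oob;
}
```

Only the last iteration (a == 3, index 5) overruns the array, which matches the commit message.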

Re: [FFmpeg-devel] fate/hapdec : add test for hap alpha only

2017-11-16 Thread Martin Vignali
> > New patch attached (uses gray8 pix_fmt) > Needs to be applied after the patch in discussion > avcodec/hapdec : use gray8 for HapAlphaOnly decoding instead of RGB0 > > Martin > With the attachment 0007-fate-hapAlphaOnly-add-test-for-hap-alpha-only-decodi.patch

Re: [FFmpeg-devel] fate/hapdec : add test for hap alpha only

2017-11-16 Thread Martin Vignali
2017-09-28 21:53 GMT+02:00 Martin Vignali <martin.vign...@gmail.com>: > > > 2017-09-24 11:53 GMT+02:00 Michael Niedermayer <mich...@niedermayer.cc>: > >> On Sat, Sep 23, 2017 at 09:53:45PM +0200, Martin Vignali wrote: >> > Hello, >> > >> >

[FFmpeg-devel] avcodec/hapdec : use gray8 for HapAlphaOnly decoding instead of RGB0

2017-11-16 Thread Martin Vignali
Hello, Following the previous discussion, the attached patches change the pix_fmt for Hap Alpha Only decoding to gray8 instead of RGB0. 0005-avcodec-texturedsp-add-rgtc1u-gray-decoding.patch adds an rgtc1u_gray function in order to decode into a gray8 picture
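To show why a single-channel format maps naturally onto gray8, here is a minimal C sketch that decodes one RGTC1 (BC4, unsigned) block into a 4x4 area of a gray8 picture. It follows the generic BC4 block layout, not FFmpeg's texturedsp code; the function name is hypothetical, and the interpolation uses truncating integer division, which is a simplification (real decoders may round differently).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of one RGTC1/BC4 unsigned block -> 4x4 gray8 pixels.
 * Bytes 0 and 1 are the endpoints r0 and r1; bytes 2..7 hold sixteen
 * 3-bit palette indices, least significant bits first. */
static void rgtc1u_block_to_gray8(uint8_t *dst, ptrdiff_t stride, const uint8_t block[8])
{
    uint8_t pal[8];
    int r0 = block[0], r1 = block[1];

    assert(dst != NULL);
    pal[0] = (uint8_t)r0;
    pal[1] = (uint8_t)r1;
    if (r0 > r1) {
        /* 8-value mode: six values interpolated between the endpoints */
        for (int i = 1; i <= 6; i++)
            pal[i + 1] = (uint8_t)(((7 - i) * r0 + i * r1) / 7);
    } else {
        /* 6-value mode: four interpolated values plus explicit 0 and 255 */
        for (int i = 1; i <= 4; i++)
            pal[i + 1] = (uint8_t)(((5 - i) * r0 + i * r1) / 5);
        pal[6] = 0;
        pal[7] = 255;
    }

    /* Gather the 48 index bits into one integer. */
    uint64_t bits = 0;
    for (int i = 0; i < 6; i++)
        bits |= (uint64_t)block[2 + i] << (8 * i);

    /* Each output byte is one palette entry: one gray pixel. */
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            int idx = (int)((bits >> (3 * (4 * y + x))) & 7);
            dst[y * stride + x] = pal[idx];
        }
}
```

Writing one byte per pixel is the point of the gray8 change: decoding the same block into RGB0 would write four bytes per pixel, three of them redundant for alpha-only content.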

Re: [FFmpeg-devel] [PATCH] avfilter: add normalize filter

2017-11-16 Thread Martin Vignali
Hello, > Maybe there's some other way to send a patch (base64, attached zip file, > ???) > > > https://ffmpeg.org/git-howto.html#Preparing-a-patchset Try git format-patch origin/master and attach the resulting file(s). Martin

Re: [FFmpeg-devel] avcodec/hapqa_extract_bsf : add bsf filter for hapqa (to hapq or hapalpha only) conversion (WIP)

2017-11-14 Thread Martin Vignali
2017-11-14 1:26 GMT+01:00 Carl Eugen Hoyos <ceffm...@gmail.com>: > 2017-11-13 22:43 GMT+01:00 Martin Vignali <martin.vign...@gmail.com>: > > > In attach patch to add a new bitstream filter > > > > The goal is to convert HAPQA file to HAPQ (removing al

[FFmpeg-devel] avcodec/hapqa_extract_bsf : add bsf filter for hapqa (to hapq or hapalpha only) conversion (WIP)

2017-11-13 Thread Martin Vignali
Hello, Attached is a patch to add a new bitstream filter. The goal is to convert a HAPQA file to HAPQ (removing alpha) or to HAPAlphaOnly (removing RGB). HAPQA data is separated into two parts, one for RGB data and one for alpha data, so the conversion can be done without loss by copying the right part.
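The "copy the right part" idea rests on Hap frames being built from length-prefixed sections. Below is a minimal C sketch of locating the n-th of several consecutive sections, assuming the section header layout from the public Hap specification (a 3-byte little-endian length plus a 1-byte type, with a 4-byte length following when the 3-byte field is zero). The function names are hypothetical, and the sketch deliberately does not interpret section type codes or say which section holds RGB and which holds alpha.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse one section header. Returns the header size (4 or 8 bytes),
 * or -1 if the buffer is too short. */
static int read_hap_section_header(const uint8_t *buf, size_t size,
                                   uint32_t *payload_len, uint8_t *type)
{
    assert(buf && payload_len && type);
    if (size < 4)
        return -1;
    uint32_t len = buf[0] | (buf[1] << 8) | ((uint32_t)buf[2] << 16);
    *type = buf[3];
    if (len != 0) {
        *payload_len = len;
        return 4;
    }
    /* zero 3-byte length: a 4-byte length follows (8-byte header) */
    if (size < 8)
        return -1;
    *payload_len = buf[4] | (buf[5] << 8) | ((uint32_t)buf[6] << 16) |
                   ((uint32_t)buf[7] << 24);
    return 8;
}

/* Locate the payload of the n-th consecutive section (0 = first).
 * Returns 0 on success, -1 on truncated or malformed input. */
static int find_section(const uint8_t *buf, size_t size, int n,
                        size_t *offset, uint32_t *len, uint8_t *type)
{
    size_t pos = 0;
    for (int i = 0; ; i++) {
        int hdr = read_hap_section_header(buf + pos, size - pos, len, type);
        if (hdr < 0 || *len > size - pos - (size_t)hdr)
            return -1;
        if (i == n) {
            *offset = pos + (size_t)hdr;
            return 0;
        }
        pos += (size_t)hdr + *len;
    }
}
```

Once a section's offset and length are known, a filter of this kind can emit those bytes unchanged (with a suitable header), which is why the conversion is lossless: no texture data is re-encoded.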

Re: [FFmpeg-devel] libavcodec/hapdec : add support for hapqa decoding

2017-11-13 Thread Martin Vignali
2017-10-21 21:37 GMT+02:00 Carl Eugen Hoyos <ceffm...@gmail.com>: > 2017-10-21 21:32 GMT+02:00 Martin Vignali <martin.vign...@gmail.com>: > > 2017-10-21 21:23 GMT+02:00 Carl Eugen Hoyos <ceffm...@gmail.com>: > > > >> 2017-10-21 19:35 GMT+02:0

Re: [FFmpeg-devel] libavcodec/hap : add HapAlphaOnly decoding/encoding

2017-11-13 Thread Martin Vignali
2017-10-16 16:37 GMT+02:00 Tom Butterworth : > > >> Patches 0001, 0002, 0006 and 0007 LGTM and are uncontentious. As they > are > >> required for Hap Q Alpha support I will commit these shortly, assuming > >> nobody objects. > >> > >> > > Ok for that, so we can discuss in
