Re: [FFmpeg-devel] [PATCH v2 3/3] avfilter/vf_colordetect: add x86 SIMD implementation

2025-07-16 Thread Henrik Gramner via ffmpeg-devel
On Wed, Jul 16, 2025 at 6:26 PM Niklas Haas wrote: > +cglobal detect_range%1, 6, 7, 5, data, stride, width, height, mpeg_min, > mpeg_max, x > +movd xm0, mpeg_mind > +movd xm1, mpeg_maxd > +vpbroadcast%1 m0, xm0 > +vpbroadcast%1 m1, xm1 You could perhaps also do something like the

Re: [FFmpeg-devel] [PATCH] avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 10bpc inverse transforms

2025-05-26 Thread Henrik Gramner via ffmpeg-devel
On Wed, May 21, 2025 at 5:48 PM Henrik Gramner wrote: > > Tested to pass FATE on Linux and Windows. Pushed. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link abo

[FFmpeg-devel] [PATCH] avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 10bpc inverse transforms

2025-05-21 Thread Henrik Gramner via ffmpeg-devel
Tested to pass FATE on Linux and Windows. Checkasm numbers vs the existing SSE2 code on Zen 5 (Strix Halo): vp9_inv_adst_adst_16x16_sub16_add_10_sse2: 1041.8 ( 1.92x) vp9_inv_adst_adst_16x16_sub16_add_10_avx512icl: 132.5 (15.06x) vp9_inv_dct_adst_16x16_sub16_add_10_sse2: 901.0 ( 1

Re: [FFmpeg-devel] [PATCH] avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 8bpc inverse transforms

2025-05-19 Thread Henrik Gramner via ffmpeg-devel
On Sat, May 17, 2025 at 12:59 AM Henrik Gramner wrote: > > Placed in a new separate file as the existing combined MMX/SSE/AVX > file is humongous and takes forever to assemble as is. > > This adds ~16 KiB of .text. The existing 8bpc asm is ~240 KiB of which > the correspond

[FFmpeg-devel] [PATCH] avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 8bpc inverse transforms

2025-05-16 Thread Henrik Gramner via ffmpeg-devel
Placed in a new separate file as the existing combined MMX/SSE/AVX file is humongous and takes forever to assemble as is. This adds ~16 KiB of .text. The existing 8bpc asm is ~240 KiB of which the corresponding AVX2 functions make up ~42 KiB. Tested to pass FATE on Linux and Windows. Checkasm n

Re: [FFmpeg-devel] [PATCH v2] checkasm: add sample argument to adjust during bench

2024-05-21 Thread Henrik Gramner via ffmpeg-devel
On Tue, May 21, 2024 at 2:33 PM J. Dekker wrote: > @@ -338,8 +338,9 @@ typedef struct CheckasmPerf { > uint64_t tsum = 0;\ > int ti, tcount = 0;\ > uint64_t t = 0; \ > +const uint64_t truns = bench_runs;\ > checkasm_set_signal_handler

Re: [FFmpeg-devel] [PATCH v3 2/5] ffbuild/libversion.sh: add shebang

2024-04-09 Thread Henrik Gramner via ffmpeg-devel
On Tue, Apr 9, 2024 at 11:52 PM Marth64 wrote: > > +#!/bin/sh > Might I suggest `#!/usr/bin/env sh` instead for this case? > I tend to prefer it from a portability and usability perspective, > but I can imagine for sh it might not matter. /bin/sh exists on virtually every *NIX system whereas /usr

Re: [FFmpeg-devel] [PATCH] lavf/vsrc_ddagrab: WinAPI functions must be called as stdcall in x86_32

2024-04-07 Thread Henrik Gramner via ffmpeg-devel
On Sun, Apr 7, 2024 at 2:59 AM Vadim Guchenko wrote: > +typedef DPI_AWARENESS_CONTEXT (__stdcall > *set_thread_dpi_t)(DPI_AWARENESS_CONTEXT); I believe most existing code uses WINAPI instead of __stdcall.

Re: [FFmpeg-devel] [GASPP PATCH] Implicitly start out in the text section for armasm

2024-04-04 Thread Henrik Gramner via ffmpeg-devel
On Wed, Apr 3, 2024 at 3:47 PM Martin Storsjö wrote: > > This fixes assembling files starting with bare symbol declarations, > without explicitly switching to .text first. lgtm.

Re: [FFmpeg-devel] [PATCH] avcodec/x86/h264_idct: Fix incorrect xmm spilling on win64

2024-03-25 Thread Henrik Gramner via ffmpeg-devel
On Sun, Mar 24, 2024 at 8:21 PM Henrik Gramner wrote: > > Broken in afa471d0efed1df5dca6eeeb2fcdd211ae4cad4e. It just happened > to work before due to x86inc.asm previously performing XMM spills in > INIT_MMX mode which was more of a bug than an intentional feature.

Re: [FFmpeg-devel] [PATCH] avformat/mov_chan: Use anonymous union

2024-03-25 Thread Henrik Gramner via ffmpeg-devel
On Mon, Mar 25, 2024 at 4:01 PM Andreas Rheinhardt wrote: > > Right, it is an anonymous enum, not union. Amended locally. > > - Andreas Can confirm this eliminates the warnings, lgtm.

[FFmpeg-devel] [PATCH] avcodec/x86/h264_idct: Fix incorrect xmm spilling on win64

2024-03-24 Thread Henrik Gramner via ffmpeg-devel
Broken in afa471d0efed1df5dca6eeeb2fcdd211ae4cad4e. It just happened to work before due to x86inc.asm previously performing XMM spills in INIT_MMX mode which was more of a bug than an intentional feature. x86_h264_idct_spill_xmm.patch Description: Binary data

Re: [FFmpeg-devel] [PATCH] x86: Update x86inc.asm

2024-03-24 Thread Henrik Gramner via ffmpeg-devel
On Tue, Mar 19, 2024 at 11:20 AM Henrik Gramner wrote: > > Will push in a few days if there are no comments. Pushed.

Re: [FFmpeg-devel] [PATCH] x86: Update x86inc.asm

2024-03-19 Thread Henrik Gramner via ffmpeg-devel
On Sat, Mar 16, 2024 at 8:53 PM Henrik Gramner wrote: > Makes things up-to-date with the upstream at > https://code.videolan.org/videolan/x86inc.asm Will push in a few days if there are no comments.

Re: [FFmpeg-devel] [PATCH] avutil/x86util: Fix broken pre-SSE4.1 PMINSD emulation

2024-03-17 Thread Henrik Gramner via ffmpeg-devel
On Sun, Mar 17, 2024 at 1:44 PM James Almer wrote: > LGTM. I wonder why we even added a float based fallback for this. Thanks, pushed.

[FFmpeg-devel] [PATCH] avutil/x86util: Fix broken pre-SSE4.1 PMINSD emulation

2024-03-17 Thread Henrik Gramner via ffmpeg-devel
Fixes yadif-16 which allows FATE to pass. Broken since 2904db90458a1253e4aea6844ba9a59ac11923b6 (2017). pminsd_emulation.patch Description: Binary data

[FFmpeg-devel] [PATCH] x86: Update x86inc.asm

2024-03-16 Thread Henrik Gramner via ffmpeg-devel
Makes things up-to-date with the upstream at https://code.videolan.org/videolan/x86inc.asm Specifying every individual change is difficult as there have been divergences and cherry-picks over time, but the full upstream change log can be found at https://code.videolan.org/videolan/x86inc.asm/-/com

Re: [FFmpeg-devel] [PATCH] libavcodec/h264pred: Remove pred8x8_horizontal_8_mmxext

2024-03-02 Thread Henrik Gramner via ffmpeg-devel
On Sat, Mar 2, 2024 at 10:13 PM Kieran Kunhya wrote: > SPLATB_LOAD m0, r0+r1*0-1, m2 > SPLATB_LOAD m1, r0+r1*1-1, m2 This adds an extra unnecessary shuffle in the SSE2 code as it splats to a full register. The easiest way of fixing it would probably be to unroll the macro and manually g

Re: [FFmpeg-devel] [PATCH] avcodec/x86/hevc: fix luma 12b overflow

2024-02-25 Thread Henrik Gramner via ffmpeg-devel
On Sun, Feb 25, 2024 at 5:42 PM Ronald S. Bultje wrote: > +mova m13, [pw_8] > +paddw m10, m12, m12 > +paddw m12, m10 ; 9 * (q0 - p0) - 3 * ( q1 - p1 ) > paddw m12, m13; + 8 Memory operand > +paddw m10, m13, m13 > +paddw

Re: [FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

2023-12-22 Thread Henrik Gramner via ffmpeg-devel
On Fri, Dec 22, 2023 at 7:20 AM Rémi Denis-Courmont wrote: > >> > +checkasm_fail_func("%s", > >> > + s == SIGFPE ? "fatal arithmetic error" : > >> > + s == SIGILL ? "illegal instruction" : > >> > + s == SIGBUS ?

Re: [FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

2023-12-21 Thread Henrik Gramner via ffmpeg-devel
On Thu, Dec 21, 2023 at 9:16 PM Rémi Denis-Courmont wrote: > > +checkasm_fail_func("%s", > > + s == SIGFPE ? "fatal arithmetic error" : > > + s == SIGILL ? "illegal instruction" : > > + s == SIGBUS ? "bus error"

Re: [FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

2023-12-21 Thread Henrik Gramner via ffmpeg-devel
On Tue, Dec 19, 2023 at 1:02 PM Martin Storsjö wrote: > This replaces the riscv specific handling from > 7212466e735aa187d82f51dadbce957fe3da77f0 (which essentially is > reverted, together with 286d6742218ba0235c32876b50bf593cb1986353) > with a different implementation of the same (plus a bit more

Re: [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows

2023-11-27 Thread Henrik Gramner via ffmpeg-devel
On Mon, Nov 27, 2023 at 2:42 PM Mark Thompson wrote: > Is it reasonable to set this global state from a library without the parent > program knowing? We'd really prefer not to affect the global state > unexpectedly. CreateWaitableTimerExW() with the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag m

Re: [FFmpeg-devel] [PATCH 2/3] x86/ac3dsp: add ff_float_to_fixed24_avx2()

2023-11-23 Thread Henrik Gramner via ffmpeg-devel
On Thu, Nov 23, 2023 at 12:51 PM James Almer wrote: > movdqa with ymm is avx2. I could change it to movaps, but technically > the registers contain floats and i don't know if any old AVX cpu has > penalties for changing domains. Fwiw I believe what domain the result of fp <-> int conversion instr

Re: [FFmpeg-devel] [PATCH 1/4] avutil/x86/pixelutils: Empty MMX state in ff_pixelutils_sad_8x8_mmxext

2023-11-01 Thread Henrik Gramner via ffmpeg-devel
On Wed, Nov 1, 2023 at 10:44 AM Andreas Rheinhardt wrote: > libavutil/x86/pixelutils.asm | 1 + > 1 file changed, 1 insertion(+) IIRC the emms instruction is quite slow on many systems, so if this is the only pixelutils function using mmx it's probably better to just rewrite it to use SSE2 inst

Re: [FFmpeg-devel] [PATCH] x86inc: Add REPX macro to repeat instructions/operations

2023-10-01 Thread Henrik Gramner via ffmpeg-devel
On Fri, Sep 29, 2023 at 1:38 PM Frank Plowman wrote: > libavutil/x86/x86inc.asm | 10 ++ > 1 file changed, 10 insertions(+) LGTM. As a side note https://code.videolan.org/videolan/x86inc.asm is the upstream repo for x86inc.asm.

Re: [FFmpeg-devel] [PATCH] x86: replace explicit REP_RETs with RETs

2023-01-31 Thread Henrik Gramner
lgtm

Re: [FFmpeg-devel] [PATCH] RFC: v210enc optimisations and initial AVX-512

2022-10-21 Thread Henrik Gramner
On Fri, Oct 21, 2022 at 5:41 AM Kieran Kunhya wrote: > > Hi, > > Please see attached an attempt to optimise the 8-bit input to v210enc to > reduce the number of shuffles. > This comes at the cost of having to extract the middle element and perform > a DWORD shift on it and then reinserting it. > I

Re: [FFmpeg-devel] [PATCH 2/4] lavc/pthread_frame: set worker thread names

2022-10-18 Thread Henrik Gramner
On Tue, Oct 18, 2022 at 6:54 PM Anton Khirnov wrote: > +static void thread_set_name(PerThreadContext *p) > +{ > +AVCodecContext *avctx = p->avctx; > +int idx = p - p->parent->threads; > +char name[16]; > + > +snprintf(name, sizeof(name), "d:%.7s:ft%d", avctx->codec->name, idx); > +

Re: [FFmpeg-devel] [PATCH v4] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI

2022-09-19 Thread Henrik Gramner
On Wed, Sep 7, 2022 at 8:47 AM wrote: > +.loop1: > +pxor m4, m4 > +pxor m5, m5 Those zero-initializations are redundant. Aside from that the asm LGTM.

Re: [FFmpeg-devel] [PATCH v2] x86/tx_float: Fix building for platforms with a symbol prefix

2022-09-06 Thread Henrik Gramner
LGTM.

Re: [FFmpeg-devel] [PATCH v3] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI

2022-09-06 Thread Henrik Gramner
On Tue, Aug 23, 2022 at 10:43 AM wrote: > +.loop1: > +pxor m4, m4 > +pxor m5, m5 > + > +;Gx > +SOBEL_MUL_16 0, data_n1, 4 > +SOBEL_MUL_16 1, data_n2, 4 > +SOBEL_MUL_16 2, data_n1, 4 > +SOBEL_ADD_16 6, 4 > +SOBEL_MUL_16 7, data_p2, 4 > +SOBEL_ADD_16 8, 4 > + > [.

Re: [FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly

2022-09-02 Thread Henrik Gramner
On Fri, Sep 2, 2022 at 7:55 AM Lynne wrote: > +movd xmm4, strided > +neg t2d > +movd xmm5, t2d > +SPLATD xmm4 > +SPLATD xmm5 > +vperm2f128 m4, m4, m4, 0x00 ; +stride splatted > +vperm2f128 m5, m5, m5, 0x00 ; -stride splatted movd xm4, strided pxor m5, m5 vpbr

Re: [FFmpeg-devel] Discrepancy between comments for AVX512 flags

2022-08-27 Thread Henrik Gramner
> On Sat, Aug 27, 2022 at 12:04 AM James Darnley wrote: > I think the feature selection is fine as-is, if you want to clarify > the comments go ahead. AVX512 wouldn't be useful with a subset even > smaller than what the plain AVX512 is looking for (there is also no > CPUs with any smaller set, afa

Re: [FFmpeg-devel] [PATCH] avcodec/x86/pngdsp: Remove obsolete ff_add_bytes_l2_mmx()

2022-07-25 Thread Henrik Gramner
On Mon, Jul 25, 2022 at 5:43 AM Andreas Rheinhardt wrote: > > It is overridden by ff_add_bytes_l2_sse2() on any non-ancient CPU. > > Signed-off-by: Andreas Rheinhardt Lgtm

Re: [FFmpeg-devel] [PATCH 1/2] libavutil: Add av_visibility_hidden for setting hidden symbol visibility

2022-07-11 Thread Henrik Gramner
On Mon, Jul 11, 2022 at 11:19 AM Martin Storsjö wrote: > +#if (AV_GCC_VERSION_AT_LEAST(4,0) || defined(__clang__)) && > (defined(__ELF__) || defined(__MACH__)) > +#define av_visibility_hidden __attribute__((visibility("hidden"))) > +#else > +#define av_visibility_hidden > +#endif The usu

Re: [FFmpeg-devel] [PATCH v2 5/5] avcodec/x86/hevc_mc: add qpel_h64_8_avx512icl

2022-03-11 Thread Henrik Gramner
All 5/5 LGTM.

Re: [FFmpeg-devel] [PATCH 2/6] avcodec/x86/hevc_mc: add qpel_h8_8_avx512icl and qpel_hv8_8_avx512icl

2022-03-10 Thread Henrik Gramner
On Wed, Feb 23, 2022 at 9:58 AM wrote: > +%macro HEVC_PUT_HEVC_QPEL_AVX512ICL 2 > [...] > +vpmovdw xm6, m6 > +movu [dstq], xm6 vpmovdw can take a memory operand as dst directly: vpmovdw [dstq], m6 (the same applies to the hv function) > +%macro HEVC_PUT_

Re: [FFmpeg-devel] [PATCH 1/6] avutil/cpu: add AVX512 Icelake flag

2022-03-10 Thread Henrik Gramner
On Wed, Feb 23, 2022 at 9:57 AM wrote: > > From: Wu Jianhua > > Signed-off-by: Wu Jianhua > --- > configure | 13 +++--- > libavutil/cpu.c | 1 + > libavutil/cpu.h | 1 + > libavutil/x86/cpu.c | 8 -- > libavutil/x86/cpu.h | 1 + > lib

Re: [FFmpeg-devel] [PATCH v2 3/9] avcodec/av1dec: support setup shear process

2021-07-06 Thread Henrik Gramner
On Mon, Jul 5, 2021 at 4:32 AM Fei Wang wrote: > +int64_t v, w; > +int32_t *param = &s->cur_frame.gm_params[idx][0]; ... > +v = param[4] * (1 << AV1_WARPEDMODEL_PREC_BITS); > +w = param[3] * param[4]; Possible integer overflow? Might need some int64_t casting before the mu
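The overflow concern raised in this reply can be illustrated with a minimal C sketch. It assumes the operands are `int32_t`, as in the quoted patch; the helper names `scale_wide` and `mul_wide` are hypothetical, and `PREC_BITS` is a stand-in constant whose value of 16 is an assumption rather than something taken from the patch. Casting one operand to `int64_t` before multiplying makes the whole product 64-bit, whereas assigning a 32-bit product to an `int64_t` afterwards would be too late.

```c
#include <stdint.h>

/* Hypothetical stand-in for the precision constant in the quoted patch;
 * the value 16 is an assumption made for this sketch. */
#define PREC_BITS 16

/* Widen the operand first so the scaling is computed in 64 bits.
 * Without the cast, p * (1 << PREC_BITS) is evaluated as int and can
 * overflow once |p| reaches 2^15. */
int64_t scale_wide(int32_t p)
{
    return (int64_t)p * (1 << PREC_BITS);
}

/* Same idea for a product of two 32-bit values: one cast is enough,
 * the other operand is promoted to int64_t automatically. */
int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```

With these helpers, inputs around 2^20 produce the exact 64-bit result instead of relying on a 32-bit intermediate.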

Re: [FFmpeg-devel] [PATCH] ffmpeg: add -fpsmin to clamp output framerate

2021-06-14 Thread Henrik Gramner
On Mon, Jun 14, 2021 at 9:22 AM Matthias Neugebauer wrote: > Anything I can do to not land in spam? On another Google groups > mailing list I (and many others including the admin accounts) had > the same issue a couple of times. This is caused by sending emails from a domain with a DMARC reject o

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-17 Thread Henrik Gramner
On Mon, Nov 16, 2020 at 11:03 AM Alan Kelly wrote: > +cglobal yuv2yuvX, 6, 7, 16, filter, filterSize, dest, dstW, dither, offset, > src Only 8 xmm registers are used, so 8 should be used instead of 16 here. Otherwise it causes unnecessary spilling of registers on 64-bit Windows. > +%if ARCH_X86_
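As a rough model of why the declared XMM count matters on 64-bit Windows: the Win64 calling convention treats xmm0 to xmm5 as volatile and xmm6 to xmm15 as callee-saved, so every declared register beyond the first six has to be preserved by the function. The sketch below only counts those registers; the function name is hypothetical and it does not reproduce how x86inc.asm actually emits the saves.

```c
/* Number of XMM registers a function must preserve under the Win64 ABI,
 * given how many it declares (xmm0..xmm5 are volatile, xmm6..xmm15 are
 * callee-saved). Hypothetical helper for illustration only. */
int win64_xmm_to_save(int declared_xmm)
{
    return declared_xmm > 6 ? declared_xmm - 6 : 0;
}
```

Under this model, declaring 8 registers costs two saves while declaring 16 costs ten, which is the unnecessary work the reply points out.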

Re: [FFmpeg-devel] [PATCH 3/3] x86/vf_blend: fix warnings about trailing empty parameters

2020-07-10 Thread Henrik Gramner
On Thu, Jul 9, 2020 at 4:54 PM James Almer wrote: > @@ -38,7 +38,7 @@ pb_255: times 16 db 255 > > SECTION .text > > -%macro BLEND_INIT 2-3 > +%macro BLEND_INIT 2 > %if ARCH_X86_64 > cglobal blend_%1, 6, 9, %2, top, top_linesize, bottom, bottom_linesize, dst, > dst_linesize, width, end, x >

Re: [FFmpeg-devel] Project orientation

2020-07-05 Thread Henrik Gramner
On Sun, Jul 5, 2020 at 9:10 PM Marton Balint wrote: > I don't know enough about x262/x264 to do this with reasonable amount of > work. Do you think there is a chance of this happening if I post a bounty > or get a sponsorship? x264 is an H.264/AVC encoder and as such an MPEG-2 encoder is not in s

Re: [FFmpeg-devel] [PATCH 5/5] checkasm: aarch64: Check for stack overflows

2020-05-15 Thread Henrik Gramner
All 5 lgtm.

Re: [FFmpeg-devel] [PATCH] Workaround to build ffmpeg on MacOs 10.15

2020-01-03 Thread Henrik Gramner
On Fri, Jan 3, 2020 at 7:37 PM Moritz Barsnick wrote: > On Fri, Jan 03, 2020 at 11:05:25 +0100, Timo Rothenpieler wrote: > > I think this was discussed on this list in the past. > > Not sure what the conclusion was, but I think an unconditional flag like > > this being added wasn't all that well r

Re: [FFmpeg-devel] [PATCH V3 2/2] libswscale/x86/yuv2rgb: add ssse3 version

2019-12-16 Thread Henrik Gramner
On Wed, Dec 4, 2019 at 4:03 AM Ting Fu wrote: > +VBROADCASTSD y_offset, [pointer_c_ditherq + 8 * 8] > +VBROADCASTSD u_offset, [pointer_c_ditherq + 9 * 8] > +VBROADCASTSD v_offset, [pointer_c_ditherq + 10 * 8] > +VBROADCASTSD ug_coff, [pointer_c_ditherq + 7 * 8] > +VBROADCAS

Re: [FFmpeg-devel] [PATCH 4/4] avfilter/vf_v360: x86 SIMD for interpolations

2019-09-05 Thread Henrik Gramner
On Wed, Sep 4, 2019 at 9:29 PM Paul B Mahol wrote: > +movd xm6, [pd_255] > +vpbroadcastd m6, xm6 vpbroadcastd m6, [pd_255]

Re: [FFmpeg-devel] [PATCH 4/4] avfilter/vf_v360: x86 SIMD for interpolations

2019-09-04 Thread Henrik Gramner
On Wed, Sep 4, 2019 at 10:01 PM James Almer wrote: > On 9/4/2019 4:28 PM, Paul B Mahol wrote: > > +vpmulld m3, m1, m0 > > +vpaddd m1, m3, m2 > > pmulld m1, m0 > paddd m1, m2 Could use pmaddwd instead as well, it's faster than pmulld on pretty much every CPU. >

Re: [FFmpeg-devel] [PATCH] avutil/mem: Mark DECLARE_ASM_ALIGNED as visibility("hidden") for __GNUC__

2019-03-13 Thread Henrik Gramner
On Wed, Feb 20, 2019 at 8:03 PM Fāng-ruì Sòng wrote: > --- a/libavutil/mem.h > +++ b/libavutil/mem.h > > +#if defined(__GNUC__) && !(defined(_WIN32) || defined(__CYGWIN__)) > +#define DECLARE_HIDDEN __attribute__ ((visibility ("hidden"))) > +#else > +#define DECLARE_HIDDEN > +#endif libav

Re: [FFmpeg-devel] [PATCH] avutil: Rename RSHIFT macro to ROUNDED_RSHIFT

2019-01-27 Thread Henrik Gramner
On Mon, Jan 21, 2019 at 9:54 PM James Almer wrote: > There's also no good way to deprecate a define and replace it with > another while informing the library user, so for something purely > cosmetic like this i don't think it's worth the trouble. Would it be possible to create a deprecated inline

Re: [FFmpeg-devel] [PATCH] avcodec/libx264: remove FF_CODEC_CAP_INIT_THREADSAFE flag

2018-10-23 Thread Henrik Gramner
On Tue, Oct 23, 2018 at 3:22 PM Derek Buitenhuis wrote: > I'd like to point out that this patch or some variant may be required anyway. > > libx264 only uses strtok_r or strtok_s if available on the platform. > > See: > https://git.videolan.org/?p=x264.git;a=blob;f=common/osdep.h;h=715ef8a00c01ad

Re: [FFmpeg-devel] [PATCH] avcodec/libx264: remove FF_CODEC_CAP_INIT_THREADSAFE flag

2018-10-21 Thread Henrik Gramner
Fixed in x264-sandbox. All uses of plain strtok() will be removed from x264 in the next push. I thought all of the strtok() uses in x264 had already been converted to strtok_r() but apparently that wasn't the case. Sorry about that.

Re: [FFmpeg-devel] swscale/x86/rgb2rgb : port shuffle2103 to external asm

2018-10-09 Thread Henrik Gramner
On Mon, Oct 8, 2018 at 7:46 PM Martin Vignali wrote: > > Hello, > > Patch in attach port inline asm shuffle 2103 func (mmx/mmxext) to external > asm > and remove the inline asm version > > Martin Keeping both MMX and MMXEXT seems a bit excessive. Ideally both would be replaced with something more

Re: [FFmpeg-devel] [PATCH 2/2] avutil/float_dsp: add ff_vector_dmul_{sse2, avx}

2018-09-14 Thread Henrik Gramner
On Fri, Sep 14, 2018 at 3:26 PM, James Almer wrote: > On 9/14/2018 9:57 AM, Henrik Gramner wrote: >> Also if you want a 32-bit result from lea it should be written as "lea >> lend, [lenq*8 - mmsize*4]" which is equivalent but has a shorter >> opcode (e.g. always u

Re: [FFmpeg-devel] [PATCH 2/2] avutil/float_dsp: add ff_vector_dmul_{sse2, avx}

2018-09-14 Thread Henrik Gramner
On Fri, Sep 14, 2018 at 4:51 PM, Henrik Gramner wrote: > I can't really think of any scenario where using a 32-bit register > address operand with a 64-bit destination for LEA is not a mistake. To clarify on this, using a 32-bit memory operand means the calculated effective address

Re: [FFmpeg-devel] [PATCH 2/2] avutil/float_dsp: add ff_vector_dmul_{sse2, avx}

2018-09-14 Thread Henrik Gramner
On Thu, Sep 13, 2018 at 3:08 PM, James Almer wrote: > +lea lenq, [lend*8 - mmsize*4] Is len guaranteed to be a multiple of mmsize/8? Otherwise this would cause misalignment. It will also break if len < mmsize/2. Also if you want a 32-bit result from lea it should be written as "lea len

Re: [FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions

2018-07-27 Thread Henrik Gramner
On Fri, Jul 27, 2018 at 4:03 PM, James Darnley wrote: > On 2018-07-27 15:05, Henrik Gramner wrote: >> Can't you just use 7 GPR:s on x86-32 as well? > > I'm sure I've done that in the past and at least 1 platform has always > complained due to PIE or stack alignm

Re: [FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions

2018-07-27 Thread Henrik Gramner
On Fri, Jul 27, 2018 at 1:47 PM, James Darnley wrote: > On 2018-07-26 17:29, Rostislav Pehlivanov wrote: >>> +cglobal horizontal_compose_haar_10bit, 3, 6+ARCH_X86_64, 4, b, temp_, w, >>> x, b2 >>> +DECLARE_REG_TMP 2,5 >>> +%if ARCH_X86_64 >>> +%define tail r6d >>> +%else >>> +

Re: [FFmpeg-devel] [PATCH] avfilter/vf_overlay: add x86 SIMD

2018-05-01 Thread Henrik Gramner
On Tue, May 1, 2018 at 10:02 AM, Paul B Mahol wrote: > +cglobal overlay_row_22, 6, 8, 8, 0, d, da, s, a, w, al, r, x [...] > +movu m2, [aq+2*xq] > +pand m2, m3 > +movu m6, [aq+2*xq] > +pand m6, m7 > +psrlw m6, 8 > +p

Re: [FFmpeg-devel] [PATCH] avfilter/vf_overlay: add x86 SIMD for yuv444 format when main stream has no alpha

2018-04-30 Thread Henrik Gramner
On Mon, Apr 30, 2018 at 6:17 PM, Paul B Mahol wrote: > +.loop0: > +movu m1, [dq + xq] > +movu m2, [aq + xq] > +movu m3, [sq + xq] > + > +pshufb m1, [pb_b2dw] > +pshufb m2, [pb_b2dw] > +pshufb m3, [pb_b2dw] > +

Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-27 Thread Henrik Gramner
On Fri, Apr 27, 2018 at 4:47 PM, Jerome Borsboom wrote: > In the put_no_rnd_pixels functions, the psubusb instruction subtracts one > from each > unsigned byte to correct for the rounding that the PAVGB instruction performs. > The psubusb > instruction, however, uses saturation when the value doe
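The saturation issue described in the quoted text can be modelled per byte lane in C. This is a scalar sketch, not the actual assembly: the function names are hypothetical, and the exact shape of the shortcut (subtract one from a single input, then average) is an assumption based on the description above.

```c
#include <stdint.h>

/* PAVGB on one byte lane: average that rounds up, (a + b + 1) >> 1. */
uint8_t pavgb(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* PSUBUSB on one byte lane: unsigned subtraction that saturates at 0. */
uint8_t psubusb(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* The truncating average a "no rounding" function is meant to produce. */
uint8_t avg_no_rnd(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) >> 1);
}

/* Assumed shortcut: bias one input down by 1, then use PAVGB.
 * Exact whenever a > 0, but for a == 0 the subtraction saturates
 * and the rounding bias is not cancelled. */
uint8_t avg_no_rnd_shortcut(uint8_t a, uint8_t b)
{
    return pavgb(psubusb(a, 1), b);
}
```

For inputs 5 and 2 both functions agree, while for 0 and 1 the shortcut yields 1 where the truncating average is 0.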

Re: [FFmpeg-devel] [PATCH 0/5] x86inc: Sync changes from x264

2018-01-20 Thread Henrik Gramner
Pushed.

[FFmpeg-devel] [PATCH 5/5] x86inc: Drop cpuflags_slowctz

2018-01-18 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 438863042f..5044ee86f0 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -827,9 +827,8 @@ BRANCH_INSTR jz, je, jnz,

[FFmpeg-devel] [PATCH 4/5] x86inc: Correctly set mmreg variables

2018-01-18 Thread Henrik Gramner
;* ;* Authors: Loren Merritt ;* Henrik Gramner @@ -892,6 +892,36 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %undef %1%2 %endmacro +%macro DEFINE_MMREGS 1 ; mmtype +%assign %%prev_mmregs 0 +%ifdef num_mmregs +%assign

[FFmpeg-devel] [PATCH 1/5] x86inc: Enable AVX emulation for floating-point pseudo-instructions

2018-01-18 Thread Henrik Gramner
There are 32 pseudo-instructions for each floating-point comparison instruction, but only 8 of them are actually valid in legacy-encoded mode. The remaining 24 require the use of VEX-encoded (v-prefixed) instructions and can therefore be disregarded for this purpose. --- libavutil/x86/x86inc.asm
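The 8-versus-32 split mentioned in this commit message comes from the comparison predicate immediate: legacy SSE encodings accept predicates 0 to 7, while VEX encodings take a 5-bit predicate. A small C sketch of that rule follows; the table and function names are hypothetical and only illustrate the predicate range, not how x86inc.asm generates its emulation macros.

```c
#include <stddef.h>

/* The eight predicates available in legacy (non-VEX) encoding, indexed by
 * their immediate value. They correspond to pseudo-instructions such as
 * cmpeqps, cmpltps, cmpleps and so on. */
static const char *const legacy_cmp_predicates[8] = {
    "eq", "lt", "le", "unord", "neq", "nlt", "nle", "ord"
};

/* Predicates 8..31 exist only for VEX-encoded compares. */
int predicate_is_legacy_encodable(unsigned imm)
{
    return imm < 8;
}

/* Returns the predicate suffix, or NULL if it needs VEX encoding. */
const char *legacy_predicate_name(unsigned imm)
{
    return imm < 8 ? legacy_cmp_predicates[imm] : NULL;
}
```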

[FFmpeg-devel] [PATCH 2/5] x86inc: Use .rdata instead of .rodata on Windows

2018-01-18 Thread Henrik Gramner
The standard section for read-only data on Windows is .rdata. Nasm will flag non-standard sections as executable by default which isn't ideal. --- libavutil/x86/x86inc.asm | 4 1 file changed, 4 insertions(+) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 3b43dbc2e0..

[FFmpeg-devel] [PATCH 3/5] x86inc: Support creating global symbols from local labels

2018-01-18 Thread Henrik Gramner
index 57cd4d80de..de048f863d 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -4,9 +4,9 @@ ;* Copyright (C) 2005-2017 x264 project ;* ;* Authors: Loren Merritt +;* Henrik Gramner ;* Anton Mitrofanov ;* Fiona Glaser -;* Henrik

[FFmpeg-devel] [PATCH 0/5] x86inc: Sync changes from x264

2018-01-18 Thread Henrik Gramner
Henrik Gramner (5): x86inc: Enable AVX emulation for floating-point pseudo-instructions x86inc: Use .rdata instead of .rodata on Windows x86inc: Support creating global symbols from local labels x86inc: Correctly set mmreg variables x86inc: Drop cpuflags_slowctz libavutil/x86

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-17 Thread Henrik Gramner
On Tue, Jan 16, 2018 at 11:33 PM, Martin Vignali wrote: > BLEND_INIT grainextract, 4 You could also try doing twice as much per iteration which might be more efficient, especially in avx2 since it avoids cross-lane shuffles. Applies to some other ones as well. E.g. something like: pxor

Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-14 Thread Henrik Gramner
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali wrote: > +#define randomize_buffers(buf, size) \ > +do { \ > +int j; \ > +uint8_t *tmp_buf = (uint8_t *)buf;\ > +for (j = 0; j < size; j++) \ > +

Re: [FFmpeg-devel] [PATCH 3/3] avfilter/vf_framerate: add SIMD functions for frame blending

2018-01-14 Thread Henrik Gramner
On Sat, Jan 13, 2018 at 10:57 PM, Marton Balint wrote: > +.loop: > +movu m0, [src1q + xq] > +movu m1, [src2q + xq] > +punpckl%1%2 m5, m0, m2 ; 0e0f0g0h > +punpckh%1%2 m0, m2 ; 0a0b0c0d > +punpckl%1%2

Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-13 Thread Henrik Gramner
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali wrote: > i try to change int width -> ptrdiff_t width to remove movsxdifnidn > but i have a segfault if height > 1 I'm guessing due to > +declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *dst, const uint8_t > *src, > + ptr

Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-12 Thread Henrik Gramner
On Thu, Jan 11, 2018 at 9:45 PM, Martin Vignali wrote: > +if (check_func(c.sub_left_predict, "sub_left_predict")) { > +call_ref(dst0, src0, stride, width, height); > +call_new(dst1, src0, stride, width, height); > +if (memcmp(dst0, dst1, width)) > +fail(); >

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)

2017-12-17 Thread Henrik Gramner
On Thu, Dec 14, 2017 at 11:16 AM, Martin Vignali wrote: > 2017-12-13 17:37 GMT+01:00 Henrik Gramner : >> You could also do vextracti128 + 128-bit packuswb instead of 256-bit >> packuswb + vpermq. >> > Sorry don't understand this part > do you mean 128 bit pack

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)

2017-12-13 Thread Henrik Gramner
On Sat, Dec 9, 2017 at 1:11 PM, Martin Vignali wrote: > the idea in AVX2 is to load 128bits of data (2x 64 bits) > then shuffle across lane, the two 64 bits in the low part of each lane, to > keep the rest of the process similar > to the sse version What about using pmovzxbw instead of movu + vp

Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

2017-12-13 Thread Henrik Gramner
On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali wrote: > +vpermq m1, [srcq + xq - mmsize + %3], 0x4e; flip each lane at > load > +vpermq m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at > load Would doing 2x 128-bit movu + 2x vinserti128 be faster?

Re: [FFmpeg-devel] avutil/x86util : add macro for 128 bits constant load

2017-12-02 Thread Henrik Gramner
On Fri, Dec 1, 2017 at 9:03 PM, Martin Vignali wrote: > If no one have objections, i will push these patch tomorrow. > > Martin Follow James' suggestion to use >16 instead of ==32, otherwise OK.

Re: [FFmpeg-devel] avutil/x86util : add macro for 128 bits constant load

2017-11-28 Thread Henrik Gramner
On Mon, Nov 27, 2017 at 11:37 PM, James Almer wrote: > On 11/27/2017 7:33 PM, James Darnley wrote: >> If the condition was made "mmsize > 16" would this work correctly for >> zmm registers? (Assume I finally push my AVX-512 patches). > > No, there's no EVEX variant of vbroadcasti128. For that you

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-27 Thread Henrik Gramner
>> Using 128-bit broadcasts is preferable over duplicating the constants >> to 256-bit unless there's a good reason for doing so since it wastes >> less cache and is faster on AMD CPU:s. > > What would that reason be? Afaik broadcasts are expensive, since they > both load from memory then splat dat

Re: [FFmpeg-devel] avcodec/x86/bswapdsp : convert pb_bswap32 to ymm constant in order to simplify code

2017-11-27 Thread Henrik Gramner
On Sat, Nov 25, 2017 at 9:53 PM, Martin Vignali wrote: > Hello, > > In attach patch to convert pb_bswap32 to ymm constant > and remove the vbroadcasti128 part > > Speed seems to be similar to me This just wastes cache for no reason. A tiny amount, sure, but minor things tend to add up eventually

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-27 Thread Henrik Gramner
On Sun, Nov 26, 2017 at 11:51 PM, James Darnley wrote: > -pd_0_int_min: times 2 dd 0, -2147483648 > -pq_int_min: times 2 dq -2147483648 > -pq_int_max: times 2 dq 2147483647 > +pd_0_int_min: times 4 dd 0, -2147483648 > +pq_int_min: times 4 dq -2147483648 > +pq_int_max: times 4 dq 21

Re: [FFmpeg-devel] [PATCH] Remove REP_RET usage throughout x86 asm files

2017-11-13 Thread Henrik Gramner
On Sun, Nov 12, 2017 at 9:59 PM, Rostislav Pehlivanov wrote: > No longer needed as AUTO_REP_RET deals with it on normal RETs. Only when the RET follows a branch instruction. If it's a branch target (that isn't by itself preceded by a branch instruction) there is no way of automatically detecting

Re: [FFmpeg-devel] [PATCH 8/8] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-10-30 Thread Henrik Gramner
On Mon, Oct 30, 2017 at 2:08 PM, James Darnley wrote: > +INIT_YMM avx512 ymm?

Re: [FFmpeg-devel] [PATCH] swscale: Reduce verbosity of misalignment reporting

2017-10-29 Thread Henrik Gramner
On Sun, Oct 22, 2017 at 11:47 AM, Henrik Gramner wrote: > It's a bit overzealous to complain about misalignment with AV_LOG_WARNING, > especially since memory bandwidth is much more likely to be the bottleneck > compared to data alignment which the user may not even have contr

[FFmpeg-devel] [PATCH] swscale: Reduce verbosity of misalignment reporting

2017-10-22 Thread Henrik Gramner
It's a bit overzealous to complain about misalignment with AV_LOG_WARNING, especially since memory bandwidth is much more likely to be the bottleneck compared to data alignment which the user may not even have control over. --- libswscale/swscale.c | 18 +++--- 1 file changed, 3 insert

Re: [FFmpeg-devel] [PATCH]lavc/h264:Only check x264_build if it was set

2017-10-06 Thread Henrik Gramner
On Thu, Oct 5, 2017 at 8:31 AM, Carl Eugen Hoyos wrote: > Hi! > > Attached patch fixes ticket #6717. > > Please comment, Carl Eugen Signed numbers are converted to unsigned when compared to unsigned numbers which means -1 becomes UINT_MAX so this patch shouldn't actually change anything. #6717 i
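The conversion rule mentioned in this reply can be checked with a small C sketch. The helper names and the limit value 151 are hypothetical and chosen only for illustration; they are not taken from the patch or from the H.264 decoder.

```c
#include <limits.h>

/* Hypothetical helper: the value an int takes once it has been
 * converted to unsigned, as happens in a mixed comparison. */
unsigned as_unsigned(int v)
{
    return (unsigned)v;
}

/* Hypothetical check: 'build' is signed, 'limit' is unsigned, so
 * 'build' is converted to unsigned before the comparison. A build
 * of -1 ("not set") therefore compares as UINT_MAX, not as a small
 * number. */
int build_below(int build, unsigned limit)
{
    return build < limit;
}
```

With these helpers, `as_unsigned(-1)` equals `UINT_MAX`, and `build_below(-1, 151)` is false, which is the same result an explicit "was it set" guard would give.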

Re: [FFmpeg-devel] libavcodec/exr : add x86 SIMD for predictor

2017-10-01 Thread Henrik Gramner
On Sun, Oct 1, 2017 at 4:14 PM, James Almer wrote: > We normally use int for counters, and don't mix declaration and statements. > And in any case ptrdiff_t would be "more correct" for this. Ah right. C90, ugh. Too used to C99. Yeah, feel free to use whatever datatype that's most appropriate for

Re: [FFmpeg-devel] libavcodec/exr : add x86 SIMD for predictor

2017-10-01 Thread Henrik Gramner
On Fri, Sep 22, 2017 at 11:12 PM, Martin Vignali wrote: > +static void predictor_scalar(uint8_t *src, ptrdiff_t size) > +{ > +uint8_t *t= src + 1; > +uint8_t *stop = src + size; > + > +while (t < stop) { > +int d = (int) t[-1] + (int) t[0] - 128; > +t[0] = d; > +

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/x86/lossless_videoencdsp: Fix warning: signed dword value exceeds bounds

2017-09-30 Thread Henrik Gramner
On Sat, Sep 30, 2017 at 12:58 AM, Michael Niedermayer wrote: > -and i, -2 * regsize > +and i, -(2 * regsize) regsize is defined to mmsize / 2 in the relevant case so the expression resolves to -2 * 16 / 2. In nasm integers are 64-bit and / is unsigned divisio
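The arithmetic described in this reply can be emulated in C. This is only a sketch of the evaluation order under the stated rules (64-bit integers, unsigned division); the function names are hypothetical and the code is not nasm's actual implementation.

```c
#include <stdint.h>

/* -2 * 16 / 2 evaluated left to right: the product wraps to
 * 0xFFFFFFFFFFFFFFE0 and is then divided as an unsigned value. */
uint64_t unparenthesized(void)
{
    return ((uint64_t)-2 * 16) / 2;
}

/* -(2 * 16 / 2): the division sees only small positive numbers,
 * and the negation is applied last. */
int64_t parenthesized(void)
{
    return -(int64_t)((uint64_t)2 * 16 / 2);
}

/* Nonzero when v can be encoded as a signed 32-bit immediate. */
int fits_signed_dword(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

Under these assumptions the unparenthesized form yields 0x7FFFFFFFFFFFFFF0, which does not fit a signed dword, while the parenthesized form yields -16.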

Re: [FFmpeg-devel] libavcodec/exr : add SIMD for reorder pixels (SSE and AVX2) v3 (WIP)

2017-09-10 Thread Henrik Gramner
On Sun, Sep 10, 2017 at 5:17 PM, Martin Vignali wrote: > +void (*reorder_pixels)(uint8_t *src, uint8_t *dst, int size); size should be ptrdiff_t instead of int since it's used as a 64-bit operand in the asm on x86-64 and the upper 32 bits are undefined otherwise. > +++ b/libavcodec/x86/exrds
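A minimal sketch of the suggested signature change follows. The struct name, the init function, and the byte-copy body are hypothetical stand-ins, not the code from the patch; only the `ptrdiff_t size` parameter reflects the review comment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DSP context; name and layout are illustrative only. */
typedef struct ExampleDSPContext {
    /* ptrdiff_t is pointer-sized, so on x86-64 the callee receives a
     * full 64-bit value and can use it directly as a 64-bit operand. */
    void (*reorder_pixels)(uint8_t *src, uint8_t *dst, ptrdiff_t size);
} ExampleDSPContext;

/* Stand-in C fallback: a plain byte copy, NOT the real reordering. */
static void reorder_pixels_c(uint8_t *src, uint8_t *dst, ptrdiff_t size)
{
    for (ptrdiff_t i = 0; i < size; i++)
        dst[i] = src[i];
}

/* Hypothetical init: installs the C fallback in the context. */
void example_dsp_init(ExampleDSPContext *c)
{
    c->reorder_pixels = reorder_pixels_c;
}
```

With an `int` parameter the same asm would have to sign-extend the argument first, since only its low 32 bits would be defined.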

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-06 Thread Henrik Gramner
On Sat, Aug 5, 2017 at 12:58 AM, Ivan Kalvachev wrote: > 8 packed, 8 scalar. > > Unless I miss something (and as I've said before, > I'm not confident enough to mess with that code.) > > (AVX does extend to 32 variants, but they are not > SSE compatible, so no need to emulate them.) Oh, right. I

Re: [FFmpeg-devel] [PATCH] Add macros used in opus_pvq_search to x86util.asm

2017-08-06 Thread Henrik Gramner
On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev wrote: > +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm > +%if cpuflag(avx2) > +vbroadcastss %1, %2; ymm, xmm > +%elif cpuflag(avx) > +%ifnum sizeof%2 ; avx1 register > +vpermilps xmm%1, xmm%2, q

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-04 Thread Henrik Gramner
On Thu, Aug 3, 2017 at 11:36 PM, Ivan Kalvachev wrote: >> 1234_1234_1234_123 >> VBROADCASTSS ym1, xm1 >> BLENDVPS m1, m2, m3 >> >> is the most commonly used alignment. > > I see that a lot of .asm files use different alignments. > I'll try to pick something similar that I

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-02 Thread Henrik Gramner
On Tue, Aug 1, 2017 at 11:46 PM, Ivan Kalvachev wrote: > On 7/31/17, Henrik Gramner wrote: >> Use rN instead of rNq for numbered registers (q suffix is used for >> named args only due to preprocessor limitations). > > Is this documented? Not sure, but there's probably

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-07-31 Thread Henrik Gramner
On Wed, Jul 26, 2017 at 4:56 PM, Ivan Kalvachev wrote: > +++ b/libavcodec/x86/opus_pvq_search.asm Generic minor stuff: Use rN instead of rNq for numbered registers (q suffix is used for named args only due to preprocessor limitations). Use the same "standard" vertical alignment rules as most ex

Re: [FFmpeg-devel] [PATCH] avfilter: add LIBVMAF filter

2017-07-16 Thread Henrik Gramner
`./configure && make` results in "libavfilter/vf_libvmaf.c:29:21: fatal error: libvmaf.h: No such file or directory". I don't have libvmaf installed, but it configures it as enabled and detects it as installed anyway.

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: add sbrdsp tests

2017-06-29 Thread Henrik Gramner
On Fri, Jun 30, 2017 at 1:58 AM, Michael Niedermayer wrote: > Program received signal SIGSEGV, Segmentation fault. > 0x00684919 in ff_sbr_hf_gen_sse () > 0x00684909 : sub %r9,%r8 > => 0x00684919 : movaps (%rsi,%r8,1),%xmm0 > r9 0xdeadbeef0080

Re: [FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-25 Thread Henrik Gramner
On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev wrote: > +%define HADDPS_IS_FAST 0 > +%define PHADDD_IS_FAST 0 [...] > +haddps %1, %1 > +haddps %1, %1 [...] > + phaddd xmm%1,xmm%1 > + phaddd xmm%1,xmm%1 You can safely assume that those instru

Re: [FFmpeg-devel] [PATCH 09/11] avcodec/x86: allow future 8-bit simple idct to have "DC only hack"

2017-06-24 Thread Henrik Gramner
On Mon, Jun 19, 2017 at 5:11 PM, James Darnley wrote: > +por m1, m8, m13 > +por m1, m12 > +por m1, [blockq+ 16] ; { row[1] }[0-7] > +por m1, [blockq+ 48] ; { row[3] }[0-7] > +por m1, [blockq+ 80] ; { row[5] }[0-7] > +por m1, [blockq
