On Wed, Jul 16, 2025 at 6:26 PM Niklas Haas wrote:
> +cglobal detect_range%1, 6, 7, 5, data, stride, width, height, mpeg_min,
> mpeg_max, x
> +movd xm0, mpeg_mind
> +movd xm1, mpeg_maxd
> +vpbroadcast%1 m0, xm0
> +vpbroadcast%1 m1, xm1
You could perhaps also do something like the
On Wed, May 21, 2025 at 5:48 PM Henrik Gramner wrote:
>
> Tested to pass FATE on Linux and Windows.
Pushed.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link abo
Tested to pass FATE on Linux and Windows.
Checkasm numbers vs the existing SSE2 code on Zen 5 (Strix Halo):
vp9_inv_adst_adst_16x16_sub16_add_10_sse2: 1041.8 ( 1.92x)
vp9_inv_adst_adst_16x16_sub16_add_10_avx512icl: 132.5 (15.06x)
vp9_inv_dct_adst_16x16_sub16_add_10_sse2: 901.0 ( 1
On Sat, May 17, 2025 at 12:59 AM Henrik Gramner wrote:
>
> Placed in a new separate file as the existing combined MMX/SSE/AVX
> file is humongous and takes forever to assemble as is.
>
> This adds ~16 KiB of .text. The existing 8bpc asm is ~240 KiB of which
> the correspond
Placed in a new separate file as the existing combined MMX/SSE/AVX
file is humongous and takes forever to assemble as is.
This adds ~16 KiB of .text. The existing 8bpc asm is ~240 KiB of which
the corresponding AVX2 functions make up ~42 KiB.
Tested to pass FATE on Linux and Windows.
Checkasm n
On Tue, May 21, 2024 at 2:33 PM J. Dekker wrote:
> @@ -338,8 +338,9 @@ typedef struct CheckasmPerf {
> uint64_t tsum = 0;\
> int ti, tcount = 0;\
> uint64_t t = 0; \
> +const uint64_t truns = bench_runs;\
> checkasm_set_signal_handler
On Tue, Apr 9, 2024 at 11:52 PM Marth64 wrote:
> > +#!/bin/sh
> Might I suggest `#!/usr/bin/env sh` instead for this case?
> I tend to prefer it from a portability and usability perspective,
> but I can imagine for sh it might not matter.
/bin/sh exists on virtually every *NIX system whereas /usr
On Sun, Apr 7, 2024 at 2:59 AM Vadim Guchenko wrote:
> +typedef DPI_AWARENESS_CONTEXT (__stdcall
> *set_thread_dpi_t)(DPI_AWARENESS_CONTEXT);
I believe most existing code uses WINAPI instead of __stdcall.
On Wed, Apr 3, 2024 at 3:47 PM Martin Storsjö wrote:
>
> This fixes assembling files starting with bare symbol declarations,
> without explicitly switching to .text first.
lgtm.
On Sun, Mar 24, 2024 at 8:21 PM Henrik Gramner wrote:
>
> Broken in afa471d0efed1df5dca6eeeb2fcdd211ae4cad4e. It just happened
> to work before due to x86inc.asm previously performing XMM spills in
> INIT_MMX mode which was more of a bug than an intentional feature.
On Mon, Mar 25, 2024 at 4:01 PM Andreas Rheinhardt
wrote:
>
> Right, it is an anonymous enum, not union. Amended locally.
>
> - Andreas
Can confirm this eliminates the warnings, lgtm.
Broken in afa471d0efed1df5dca6eeeb2fcdd211ae4cad4e. It just happened
to work before due to x86inc.asm previously performing XMM spills in
INIT_MMX mode which was more of a bug than an intentional feature.
x86_h264_idct_spill_xmm.patch
On Tue, Mar 19, 2024 at 11:20 AM Henrik Gramner wrote:
>
> Will push in a few days if there are no comments.
Pushed.
On Sat, Mar 16, 2024 at 8:53 PM Henrik Gramner wrote:
> Makes things up-to-date with the upstream at
> https://code.videolan.org/videolan/x86inc.asm
Will push in a few days if there are no comments.
On Sun, Mar 17, 2024 at 1:44 PM James Almer wrote:
> LGTM. I wonder why we even added a float based fallback for this.
Thanks, pushed.
Fixes yadif-16 which allows FATE to pass.
Broken since 2904db90458a1253e4aea6844ba9a59ac11923b6 (2017).
pminsd_emulation.patch
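The attached patch itself is not reproduced in the archive. As a general illustration only (not the patch's contents), SSE2 has no pminsd, but a signed 32-bit minimum can be built exactly from integer compare-and-select. A scalar C model of that technique, with a hypothetical function name:

```c
#include <stdint.h>

/* Scalar model of emulating SSE4.1 pminsd with SSE2 integer instructions:
 * pcmpgtd produces an all-ones mask where a > b, then pand/pandn/por pick
 * b where the mask is set and a elsewhere. */
static int32_t pminsd_sse2_model(int32_t a, int32_t b)
{
    uint32_t mask = a > b ? UINT32_C(0xFFFFFFFF) : 0;          /* pcmpgtd */
    uint32_t r = ((uint32_t)b & mask) | ((uint32_t)a & ~mask); /* pand, pandn, por */
    return (int32_t)r;
}
```

Because everything stays in the integer domain, values such as 16777217, which a float cannot represent exactly, come out unchanged.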
Makes things up-to-date with the upstream at
https://code.videolan.org/videolan/x86inc.asm
Specifying every individual change is difficult as there have been
divergences and cherry-picks over time, but the full upstream change
log can be found at
https://code.videolan.org/videolan/x86inc.asm/-/com
On Sat, Mar 2, 2024 at 10:13 PM Kieran Kunhya wrote:
> SPLATB_LOAD m0, r0+r1*0-1, m2
> SPLATB_LOAD m1, r0+r1*1-1, m2
This adds an extra unnecessary shuffle in the SSE2 code as it splats
to a full register. The easiest way of fixing it would probably be to
unroll the macro and manually g
On Sun, Feb 25, 2024 at 5:42 PM Ronald S. Bultje wrote:
> +mova m13, [pw_8]
> +paddw m10, m12, m12
> +paddw m12, m10 ; 9 * (q0 - p0) - 3 * ( q1 - p1 )
> paddw m12, m13; + 8
Memory operand
> +paddw m10, m13, m13
> +paddw
On Fri, Dec 22, 2023 at 7:20 AM Rémi Denis-Courmont wrote:
> >> > +checkasm_fail_func("%s",
> >> > + s == SIGFPE ? "fatal arithmetic error" :
> >> > + s == SIGILL ? "illegal instruction" :
> >> > + s == SIGBUS ?
On Thu, Dec 21, 2023 at 9:16 PM Rémi Denis-Courmont wrote:
> > +checkasm_fail_func("%s",
> > + s == SIGFPE ? "fatal arithmetic error" :
> > + s == SIGILL ? "illegal instruction" :
> > + s == SIGBUS ? "bus error"
On Tue, Dec 19, 2023 at 1:02 PM Martin Storsjö wrote:
> This replaces the riscv specific handling from
> 7212466e735aa187d82f51dadbce957fe3da77f0 (which essentially is
> reverted, together with 286d6742218ba0235c32876b50bf593cb1986353)
> with a different implementation of the same (plus a bit more
On Mon, Nov 27, 2023 at 2:42 PM Mark Thompson wrote:
> Is it reasonable to set this global state from a library without the parent
> program knowing? We'd really prefer not to affect the global state
> unexpectedly.
CreateWaitableTimerExW() with the
CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag m
On Thu, Nov 23, 2023 at 12:51 PM James Almer wrote:
> movdqa wiht ymm is avx2. I could change it to movaps, but technically
> the registers contain floats and i don't know if any old AVX cpu has
> penalties for changing domains.
Fwiw I believe what domain the result of fp <-> int conversion
instr
On Wed, Nov 1, 2023 at 10:44 AM Andreas Rheinhardt
wrote:
> libavutil/x86/pixelutils.asm | 1 +
> 1 file changed, 1 insertion(+)
IIRC the emms instruction is quite slow on many systems, so if this
is the only pixelutils function using mmx it's probably better to just
rewrite it to use SSE2 inst
On Fri, Sep 29, 2023 at 1:38 PM Frank Plowman wrote:
> libavutil/x86/x86inc.asm | 10 ++
> 1 file changed, 10 insertions(+)
LGTM.
As a side note https://code.videolan.org/videolan/x86inc.asm is the
upstream repo for x86inc.asm.
lgtm
On Fri, Oct 21, 2022 at 5:41 AM Kieran Kunhya wrote:
>
> Hi,
>
> Please see attached an attempt to optimise the 8-bit input to v210enc to
> reduce the number of shuffles.
> This comes at the cost of having to extract the middle element and perform
> a DWORD shift on it and then reinserting it.
> I
On Tue, Oct 18, 2022 at 6:54 PM Anton Khirnov wrote:
> +static void thread_set_name(PerThreadContext *p)
> +{
> +AVCodecContext *avctx = p->avctx;
> +int idx = p - p->parent->threads;
> +char name[16];
> +
> +snprintf(name, sizeof(name), "d:%.7s:ft%d", avctx->codec->name, idx);
> +
On Wed, Sep 7, 2022 at 8:47 AM wrote:
> +.loop1:
> +pxor m4, m4
> +pxor m5, m5
Those zero-initializations are redundant. Aside from that the asm LGTM.
LGTM.
On Tue, Aug 23, 2022 at 10:43 AM wrote:
> +.loop1:
> +pxor m4, m4
> +pxor m5, m5
> +
> +;Gx
> +SOBEL_MUL_16 0, data_n1, 4
> +SOBEL_MUL_16 1, data_n2, 4
> +SOBEL_MUL_16 2, data_n1, 4
> +SOBEL_ADD_16 6, 4
> +SOBEL_MUL_16 7, data_p2, 4
> +SOBEL_ADD_16 8, 4
> +
> [.
On Fri, Sep 2, 2022 at 7:55 AM Lynne wrote:
> +movd xmm4, strided
> +neg t2d
> +movd xmm5, t2d
> +SPLATD xmm4
> +SPLATD xmm5
> +vperm2f128 m4, m4, m4, 0x00 ; +stride splatted
> +vperm2f128 m5, m5, m5, 0x00 ; -stride splatted
movd xm4, strided
pxor m5, m5
vpbr
> On Sat, Aug 27, 2022 at 12:04 AM James Darnley wrote:
> I think the feature selection is fine as-is, if you want to clarify
> the comments go ahead. AVX512 wouldn't be useful with a subset even
> smaller then what the plain AVX512 is looking for (there is also no
> CPUs with any smaller set, afa
On Mon, Jul 25, 2022 at 5:43 AM Andreas Rheinhardt
wrote:
>
> It is overridden by ff_add_bytes_l2_sse2() on any non-ancient CPU.
>
> Signed-off-by: Andreas Rheinhardt
Lgtm
On Mon, Jul 11, 2022 at 11:19 AM Martin Storsjö wrote:
> +#if (AV_GCC_VERSION_AT_LEAST(4,0) || defined(__clang__)) &&
> (defined(__ELF__) || defined(__MACH__))
> +#define av_visibility_hidden __attribute__((visibility("hidden")))
> +#else
> +#define av_visibility_hidden
> +#endif
The usu
All 5/5 LGTM.
On Wed, Feb 23, 2022 at 9:58 AM wrote:
> +%macro HEVC_PUT_HEVC_QPEL_AVX512ICL 2
> [...]
> +vpmovdw xm6, m6
> +movu [dstq], xm6
vpmovdw can take a memory operand as dst directly:
vpmovdw [dstq], m6
(the same applies to the hv function)
> +%macro HEVC_PUT_
On Wed, Feb 23, 2022 at 9:57 AM wrote:
>
> From: Wu Jianhua
>
> Signed-off-by: Wu Jianhua
> ---
> configure | 13 +++---
> libavutil/cpu.c | 1 +
> libavutil/cpu.h | 1 +
> libavutil/x86/cpu.c | 8 --
> libavutil/x86/cpu.h | 1 +
> lib
On Mon, Jul 5, 2021 at 4:32 AM Fei Wang wrote:
> +int64_t v, w;
> +int32_t *param = &s->cur_frame.gm_params[idx][0];
...
> +v = param[4] * (1 << AV1_WARPEDMODEL_PREC_BITS);
> +w = param[3] * param[4];
Possible integer overflow? Might need some int64_t casting before the
mu
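For context on the overflow question: in C, `param[3] * param[4]` is computed in `int` before it is assigned to the `int64_t` `w`, so widening at the assignment is too late. A minimal sketch of the casting fix, assuming `int32_t` parameters as in the quoted code; `WARP_PREC_BITS` and the helper names are hypothetical, and the value 16 is assumed from the AV1 specification's warped-model precision:

```c
#include <stdint.h>

/* Hypothetical stand-in for AV1_WARPEDMODEL_PREC_BITS (assumed to be 16). */
#define WARP_PREC_BITS 16

/* Casting one operand to int64_t makes the whole multiply 64-bit; the
 * other operand is converted by the usual arithmetic conversions. */
static int64_t warp_v(const int32_t *param)
{
    return (int64_t)param[4] * (1 << WARP_PREC_BITS);
}

static int64_t warp_w(const int32_t *param)
{
    return (int64_t)param[3] * param[4];
}
```

Without the cast, `(1 << 20) * (1 << 20)` would overflow a 32-bit `int`, which is undefined behavior.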
On Mon, Jun 14, 2021 at 9:22 AM Matthias Neugebauer wrote:
> Anything I can do to not land in spam? On another Google groups
> mailing list I (and many others including the admin accounts) had
> the same issue a couple of times.
This is caused by sending emails from a domain with a DMARC reject o
On Mon, Nov 16, 2020 at 11:03 AM Alan Kelly
wrote:
> +cglobal yuv2yuvX, 6, 7, 16, filter, filterSize, dest, dstW, dither, offset,
> src
Only 8 xmm registers are used, so 8 should be used instead of 16 here.
Otherwise it causes unnecessary spilling of registers on 64-bit
Windows.
> +%if ARCH_X86_
On Thu, Jul 9, 2020 at 4:54 PM James Almer wrote:
> @@ -38,7 +38,7 @@ pb_255: times 16 db 255
>
> SECTION .text
>
> -%macro BLEND_INIT 2-3
> +%macro BLEND_INIT 2
> %if ARCH_X86_64
> cglobal blend_%1, 6, 9, %2, top, top_linesize, bottom, bottom_linesize, dst,
> dst_linesize, width, end, x
>
On Sun, Jul 5, 2020 at 9:10 PM Marton Balint wrote:
> I don't know enough about x262/x264 to do this with reasonable amount of
> work. Do you think there is a chance of this happening if I post a bounty
> or get a sponsorship?
x264 is an H.264/AVC encoder and as such an MPEG-2 encoder is not in
s
All 5 lgtm.
On Fri, Jan 3, 2020 at 7:37 PM Moritz Barsnick wrote:
> On Fri, Jan 03, 2020 at 11:05:25 +0100, Timo Rothenpieler wrote:
> > I think this was discussed on this list in the past.
> > Not sure what the conclusion was, but I think an unconditional flag like
> > this being added wasn't all that well r
On Wed, Dec 4, 2019 at 4:03 AM Ting Fu wrote:
> +VBROADCASTSD y_offset, [pointer_c_ditherq + 8 * 8]
> +VBROADCASTSD u_offset, [pointer_c_ditherq + 9 * 8]
> +VBROADCASTSD v_offset, [pointer_c_ditherq + 10 * 8]
> +VBROADCASTSD ug_coff, [pointer_c_ditherq + 7 * 8]
> +VBROADCAS
On Wed, Sep 4, 2019 at 9:29 PM Paul B Mahol wrote:
> +movd xm6, [pd_255]
> +vpbroadcastd m6, xm6
vpbroadcastd m6, [pd_255]
On Wed, Sep 4, 2019 at 10:01 PM James Almer wrote:
> On 9/4/2019 4:28 PM, Paul B Mahol wrote:
> > +vpmulld m3, m1, m0
> > +vpaddd m1, m3, m2
>
> pmulld m1, m0
> paddd m1, m2
Could use pmaddwd instead as well, it's faster than pmulld on pretty
much every CPU.
>
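The pmaddwd substitution only works when the operands fit in 16 bits, because pmaddwd treats each 32-bit lane as two signed 16-bit words and sums the pairwise products. A scalar model of one lane of each instruction (hypothetical helper names; the value ranges of the quoted patch are not shown here, so the condition has to be checked per use):

```c
#include <stdint.h>

/* One 32-bit lane of pmulld: low 32 bits of the product. */
static int32_t lane_pmulld(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a * (uint32_t)b);
}

/* Extract the 16-bit word at the given bit offset and sign-extend it. */
static int32_t word_at(int32_t x, unsigned shift)
{
    return (int16_t)(uint16_t)((uint32_t)x >> shift);
}

/* One 32-bit lane of pmaddwd: multiply the low words and the high words
 * as signed 16-bit values, then add the two 32-bit products. */
static int32_t lane_pmaddwd(int32_t a, int32_t b)
{
    uint32_t lo = (uint32_t)(word_at(a, 0) * word_at(b, 0));
    uint32_t hi = (uint32_t)(word_at(a, 16) * word_at(b, 16));
    return (int32_t)(lo + hi);
}
```

With both operands in [0, 32767] the high words are zero and the two results agree; a value like 70000 has a non-zero high word, so the results diverge.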
On Wed, Feb 20, 2019 at 8:03 PM Fāng-ruì Sòng
wrote:
> --- a/libavutil/mem.h
> +++ b/libavutil/mem.h
>
> +#if defined(__GNUC__) && !(defined(_WIN32) || defined(__CYGWIN__))
> +#define DECLARE_HIDDEN __attribute__ ((visibility ("hidden")))
> +#else
> +#define DECLARE_HIDDEN
> +#endif
libav
On Mon, Jan 21, 2019 at 9:54 PM James Almer wrote:
> There's also no good way to deprecate a define and replace it with
> another while informing the library user, so for something purely
> cosmetic like this i don't think it's worth the trouble.
Would it be possible to create a deprecated inline
On Tue, Oct 23, 2018 at 3:22 PM Derek Buitenhuis
wrote:
> I'd like to point out that this patch or some variant may be required anyway.
>
> libx264 only uses strtok_r or strtok_s if available on the platform.
>
> See:
> https://git.videolan.org/?p=x264.git;a=blob;f=common/osdep.h;h=715ef8a00c01ad
Fixed in x264-sandbox. All uses of plain strtok() will be removed from
x264 in the next push.
I thought all of the strtok() uses in x264 had already been converted
to strtok_r() but apparently that wasn't the case. Sorry about that.
On Mon, Oct 8, 2018 at 7:46 PM Martin Vignali wrote:
>
> Hello,
>
> Patch in attach port inline asm shuffle 2103 func (mmx/mmxext) to external
> asm
> and remove the inline asm version
>
> Martin
Keeping both MMX and MMXEXT seems a bit excessive. Ideally both would
be replaced with something more
On Fri, Sep 14, 2018 at 3:26 PM, James Almer wrote:
> On 9/14/2018 9:57 AM, Henrik Gramner wrote:
>> Also if you want a 32-bit result from lea it should be written as "lea
>> lend, [lenq*8 - mmsize*4]" which is equivalent but has a shorter
>> opcode (e.g. always u
On Fri, Sep 14, 2018 at 4:51 PM, Henrik Gramner wrote:
> I can't really think of any scenario where using a 32-bit register
> address operand with a 64-bit destination for LEA is not a mistake.
To clarify on this, using a 32-bit memory operand means the calculated
effective address
On Thu, Sep 13, 2018 at 3:08 PM, James Almer wrote:
> +lea lenq, [lend*8 - mmsize*4]
Is len guaranteed to be a multiple of mmsize/8? Otherwise this would
cause misalignment. It will also break if len < mmsize/2.
Also if you want a 32-bit result from lea it should be written as "lea
len
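The lea behavior discussed above follows from how a 32-bit address size works in 64-bit mode: the effective address is computed modulo 2^32 and then zero-extended into the 64-bit destination. A scalar C model (hypothetical helper names; mmsize = 32 is only an example value) shows why the two spellings give the same result and why len < mmsize/2 breaks, since the intended negative offset turns into a large positive one:

```c
#include <stdint.h>

/* lea r64, [r32*8 + disp]: 32-bit address size. The address is computed
 * modulo 2^32 from the low 32 bits of the index, then zero-extended. */
static uint64_t lea_q_from_d(uint64_t lenq, int32_t disp)
{
    uint32_t lend = (uint32_t)lenq;
    return (uint32_t)(lend * 8u + (uint32_t)disp);
}

/* lea r32, [r64*8 + disp]: 64-bit address, 32-bit destination. The result
 * is truncated to 32 bits, and writing a 32-bit register zero-extends it. */
static uint64_t lea_d_from_q(uint64_t lenq, int32_t disp)
{
    return (uint32_t)(lenq * 8u + (uint64_t)(int64_t)disp);
}
```

With len = 1 and mmsize = 32, 8 - 128 = -120 ends up as 0xFFFFFF88 rather than a negative offset.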
On Fri, Jul 27, 2018 at 4:03 PM, James Darnley wrote:
> On 2018-07-27 15:05, Henrik Gramner wrote:
>> Can't you just use 7 GPR:s on x86-32 as well?
>
> I'm sure I've done that in the past and at least 1 platform has always
> complained due to PIE or stack alignm
On Fri, Jul 27, 2018 at 1:47 PM, James Darnley wrote:
> On 2018-07-26 17:29, Rostislav Pehlivanov wrote:
>>> +cglobal horizontal_compose_haar_10bit, 3, 6+ARCH_X86_64, 4, b, temp_, w,
>>> x, b2
>>> +DECLARE_REG_TMP 2,5
>>> +%if ARCH_X86_64
>>> +%define tail r6d
>>> +%else
>>> +
On Tue, May 1, 2018 at 10:02 AM, Paul B Mahol wrote:
> +cglobal overlay_row_22, 6, 8, 8, 0, d, da, s, a, w, al, r, x
[...]
> +movu m2, [aq+2*xq]
> +pand m2, m3
> +movu m6, [aq+2*xq]
> +pand m6, m7
> +psrlw m6, 8
> +p
On Mon, Apr 30, 2018 at 6:17 PM, Paul B Mahol wrote:
> +.loop0:
> +movu m1, [dq + xq]
> +movu m2, [aq + xq]
> +movu m3, [sq + xq]
> +
> +pshufb m1, [pb_b2dw]
> +pshufb m2, [pb_b2dw]
> +pshufb m3, [pb_b2dw]
> +
On Fri, Apr 27, 2018 at 4:47 PM, Jerome Borsboom
wrote:
> In the put_no_rnd_pixels functions, the psubusb instruction subtracts one
> from each
> unsigned byte to correct for the rouding that the PAVGB instruction performs.
> The psubusb
> instruction, however, uses saturation when the value doe
Pushed.
---
libavutil/x86/x86inc.asm | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 438863042f..5044ee86f0 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -827,9 +827,8 @@ BRANCH_INSTR jz, je, jnz,
;*
;* Authors: Loren Merritt
;* Henrik Gramner
@@ -892,6 +892,36 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg,
jge, jng, jnge, ja, jae,
%undef %1%2
%endmacro
+%macro DEFINE_MMREGS 1 ; mmtype
+%assign %%prev_mmregs 0
+%ifdef num_mmregs
+%assign
There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 require the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.
---
libavutil/x86/x86inc.asm
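For reference, the 8 legacy predicates are immediates 0-7 of the SSE compare instructions, and a pseudo-instruction such as cmpltps is an alias for cmpps with immediate 1. A hypothetical C lookup table (not code from x86inc.asm) capturing which pseudo-instructions an SSE-level emulation would need to cover:

```c
#include <stddef.h>
#include <string.h>

/* Predicate immediates accepted by legacy-encoded cmpps/cmppd/cmpss/cmpsd.
 * Immediates 8-31 exist only in the VEX-encoded (AVX) forms. */
static const struct {
    const char *name;
    int imm;
} legacy_cmp_predicates[] = {
    { "eq",  0 }, { "lt",  1 }, { "le",  2 }, { "unord", 3 },
    { "neq", 4 }, { "nlt", 5 }, { "nle", 6 }, { "ord",   7 },
};

/* Return the immediate for a legacy predicate name, or -1 if the predicate
 * needs VEX encoding (or is unknown). */
static int legacy_cmp_imm(const char *pred)
{
    size_t n = sizeof(legacy_cmp_predicates) / sizeof(legacy_cmp_predicates[0]);
    for (size_t i = 0; i < n; i++)
        if (!strcmp(legacy_cmp_predicates[i].name, pred))
            return legacy_cmp_predicates[i].imm;
    return -1;
}
```

For example, "ge" (immediate 13) is VEX-only and therefore returns -1.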
The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default which isn't ideal.
---
libavutil/x86/x86inc.asm | 4
1 file changed, 4 insertions(+)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 3b43dbc2e0..
index 57cd4d80de..de048f863d 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -4,9 +4,9 @@
;* Copyright (C) 2005-2017 x264 project
;*
;* Authors: Loren Merritt
+;* Henrik Gramner
;* Anton Mitrofanov
;* Fiona Glaser
-;* Henrik
Henrik Gramner (5):
x86inc: Enable AVX emulation for floating-point pseudo-instructions
x86inc: Use .rdata instead of .rodata on Windows
x86inc: Support creating global symbols from local labels
x86inc: Correctly set mmreg variables
x86inc: Drop cpuflags_slowctz
libavutil/x86
On Tue, Jan 16, 2018 at 11:33 PM, Martin Vignali
wrote:
> BLEND_INIT grainextract, 4
You could also try doing twice as much per iteration which might be
more efficient, especially in avx2 since it avoids cross-lane
shuffles. Applies to some other ones as well.
E.g. something like:
pxor
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali
wrote:
> +#define randomize_buffers(buf, size) \
> +do { \
> +int j; \
> +uint8_t *tmp_buf = (uint8_t *)buf;\
> +for (j = 0; j < size; j++) \
> +
On Sat, Jan 13, 2018 at 10:57 PM, Marton Balint wrote:
> +.loop:
> +movu m0, [src1q + xq]
> +movu m1, [src2q + xq]
> +punpckl%1%2 m5, m0, m2 ; 0e0f0g0h
> +punpckh%1%2 m0, m2 ; 0a0b0c0d
> +punpckl%1%2
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali
wrote:
> i try to change int width -> ptrdiff_t width to remove movsxdifnidn
> but i have a segfault if height > 1
I'm guessing due to
> +declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *dst, const uint8_t
> *src,
> + ptr
On Thu, Jan 11, 2018 at 9:45 PM, Martin Vignali
wrote:
> +if (check_func(c.sub_left_predict, "sub_left_predict")) {
> +call_ref(dst0, src0, stride, width, height);
> +call_new(dst1, src0, stride, width, height);
> +if (memcmp(dst0, dst1, width))
> +fail();
>
On Thu, Dec 14, 2017 at 11:16 AM, Martin Vignali
wrote:
> 2017-12-13 17:37 GMT+01:00 Henrik Gramner :
>> You could also do vextracti128 + 128-bit packuswb instead of 256-bit
>> packuswb + vpermq.
>>
> Sorry don't understand this part
> do you mean 128 bit pack
On Sat, Dec 9, 2017 at 1:11 PM, Martin Vignali wrote:
> the idea in AVX2 is to load 128bits of data (2x 64 bits)
> then shuffle accross lane, the two 64 bits in the low part of each lane, to
> keep the rest of the process similar
> to the sse version
What about using pmovzxbw instead of movu + vp
On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali
wrote:
> +vpermq m1, [srcq + xq - mmsize + %3], 0x4e; flip each lane at
> load
> +vpermq m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at
> load
Would doing 2x 128-bit movu + 2x vinserti128 be faster?
On Fri, Dec 1, 2017 at 9:03 PM, Martin Vignali wrote:
> If no one have objections, i will push these patch tomorrow.
>
> Martin
Follow James' suggestion to use >16 instead of ==32, otherwise OK.
On Mon, Nov 27, 2017 at 11:37 PM, James Almer wrote:
> On 11/27/2017 7:33 PM, James Darnley wrote:
>> If the condition was made "mmsize > 16" would this work correctly for
>> zmm registers? (Assume I finally push my AVX-512 patches).
>
> No, there's no EVEX variant of vbroadcasti128. For that you
>> Using 128-bit broadcasts is preferable over duplicating the constants
>> to 256-bit unless there's a good reason for doing so since it wastes
>> less cache and is faster on AMD CPU:s.
>
> What would that reason be? Afaik broadcasts are expensive, since they
> both load from memory then splat dat
On Sat, Nov 25, 2017 at 9:53 PM, Martin Vignali
wrote:
> Hello,
>
> In attach patch to convert pb_bswap32 to ymm constant
> and remove the vbroadcasti128 part
>
> Speed seems to be similar to me
This just wastes cache for no reason. A tiny amount, sure, but minor
things tend to add up eventually
On Sun, Nov 26, 2017 at 11:51 PM, James Darnley wrote:
> -pd_0_int_min: times 2 dd 0, -2147483648
> -pq_int_min: times 2 dq -2147483648
> -pq_int_max: times 2 dq 2147483647
> +pd_0_int_min: times 4 dd 0, -2147483648
> +pq_int_min: times 4 dq -2147483648
> +pq_int_max: times 4 dq 21
On Sun, Nov 12, 2017 at 9:59 PM, Rostislav Pehlivanov
wrote:
> No longer needed as AUTO_REP_RET deals with it on normal RETs.
Only when the RET follows a branch instruction. If it's a branch
target (that isn't by itself preceded by a branch instruction) there
is no way of automatically detecting
On Mon, Oct 30, 2017 at 2:08 PM, James Darnley wrote:
> +INIT_YMM avx512
ymm?
On Sun, Oct 22, 2017 at 11:47 AM, Henrik Gramner wrote:
> It's a bit overzealous to complain about misalignment with AV_LOG_WARNING,
> especially since memory bandwidth is much more likely to be the bottleneck
> compared to data alignment which the user may not even have contr
It's a bit overzealous to complain about misalignment with AV_LOG_WARNING,
especially since memory bandwidth is much more likely to be the bottleneck
compared to data alignment which the user may not even have control over.
---
libswscale/swscale.c | 18 +++---
1 file changed, 3 insert
On Thu, Oct 5, 2017 at 8:31 AM, Carl Eugen Hoyos wrote:
> Hi!
>
> Attached patch fixes ticket #6717.
>
> Please comment, Carl Eugen
Signed numbers are converted to unsigned when compared to unsigned
numbers which means -1 becomes UINT_MAX so this patch shouldn't
actually change anything.
#6717 i
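A minimal C illustration of the conversion rule (hypothetical helpers; the ticket's code is not reproduced here): for an unsigned operand, comparing against -1 and comparing against UINT_MAX give identical results, and a mixed signed/unsigned `<` treats -1 as the largest value.

```c
#include <limits.h>

/* The usual arithmetic conversions turn the int operand into unsigned int
 * before the comparison, so -1 becomes UINT_MAX. */
static int eq_minus_one(unsigned x)
{
    return x == -1;   /* -1 is converted to UINT_MAX */
}

static int eq_uint_max(unsigned x)
{
    return x == UINT_MAX;
}

static int lt_mixed(int i, unsigned u)
{
    return i < u;     /* i is converted to unsigned int */
}
```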
On Sun, Oct 1, 2017 at 4:14 PM, James Almer wrote:
> We normally use int for counters, and don't mix declaration and statements.
> And in any case ptrdiff_t would be "more correct" for this.
Ah right. C90, ugh. Too used to C99.
Yeah, feel free to use whatever datatype that's most appropriate for
On Fri, Sep 22, 2017 at 11:12 PM, Martin Vignali
wrote:
> +static void predictor_scalar(uint8_t *src, ptrdiff_t size)
> +{
> +uint8_t *t= src + 1;
> +uint8_t *stop = src + size;
> +
> +while (t < stop) {
> +int d = (int) t[-1] + (int) t[0] - 128;
> +t[0] = d;
> +
On Sat, Sep 30, 2017 at 12:58 AM, Michael Niedermayer
wrote:
> -andi, -2 * regsize
> +andi, -(2 * regsize)
regsize is defined to mmsize / 2 in the relevant case so the
expression resolves to -2 * 16 / 2
In nasm integers are 64-bit and / is unsigned divisio
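The arithmetic can be reproduced in C by mimicking NASM's 64-bit evaluation with an unsigned `/` (helper names are hypothetical; only the two expressions come from the quoted patch, with mmsize = 16):

```c
#include <stdint.h>

/* NASM evaluates expressions on 64-bit integers, and "/" is unsigned
 * division ("//" is the signed operator). */
static uint64_t nasm_udiv(int64_t a, int64_t b)
{
    return (uint64_t)a / (uint64_t)b;
}

/* "-2 * regsize" with regsize defined as "mmsize / 2" expands textually
 * to "-2 * 16 / 2", i.e. (-2 * 16) / 2 with an unsigned divide. */
static uint64_t expr_unparenthesized(void)
{
    return nasm_udiv(-2 * 16, 2);
}

/* "-(2 * regsize)" expands to "-(2 * 16 / 2)": the division only sees
 * positive operands, so the result is -16 as intended. */
static uint64_t expr_parenthesized(void)
{
    return (uint64_t)-(int64_t)nasm_udiv(2 * 16, 2);
}
```

The unparenthesized form yields 0x7FFFFFFFFFFFFFF0 instead of -16 (0xFFFFFFFFFFFFFFF0). The two values differ only in bit 63, so the mistake is invisible in the low 32 bits.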
On Sun, Sep 10, 2017 at 5:17 PM, Martin Vignali
wrote:
> +void (*reorder_pixels)(uint8_t *src, uint8_t *dst, int size);
size should be ptrdiff_t instead of int since it's used as a 64-bit
operand in the asm on x86-64 and the upper 32 bits are undefined
otherwise.
> +++ b/libavcodec/x86/exrds
On Sat, Aug 5, 2017 at 12:58 AM, Ivan Kalvachev wrote:
> 8 packed, 8 scalar.
>
> Unless I miss something (and as I've said before,
> I'm not confident enough to mess with that code.)
>
> (AVX does extend to 32 variants, but they are not
> SSE compatible, so no need to emulate them.)
Oh, right. I
On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev wrote:
> +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
> +%if cpuflag(avx2)
> +vbroadcastss %1, %2; ymm, xmm
> +%elif cpuflag(avx)
> +%ifnum sizeof%2 ; avx1 register
> +vpermilps xmm%1, xmm%2, q
On Thu, Aug 3, 2017 at 11:36 PM, Ivan Kalvachev wrote:
>> 1234_1234_1234_123
>> VBROADCASTSS ym1, xm1
>> BLENDVPS m1, m2, m3
>>
>> is the most commonly used alignment.
>
> I see that a lot of .asm files use different alignments.
> I'll try to pick something similar that I
On Tue, Aug 1, 2017 at 11:46 PM, Ivan Kalvachev wrote:
> On 7/31/17, Henrik Gramner wrote:
>> Use rN instead of rNq for numbered registers (q suffix is used for
>> named args only due to preprocessor limitations).
>
> Is this documented?
Not sure, but there's probably
On Wed, Jul 26, 2017 at 4:56 PM, Ivan Kalvachev wrote:
> +++ b/libavcodec/x86/opus_pvq_search.asm
Generic minor stuff:
Use rN instead of rNq for numbered registers (q suffix is used for
named args only due to preprocessor limitations).
Use the same "standard" vertical alignment rules as most ex
`./configure && make` results in "libavfilter/vf_libvmaf.c:29:21:
fatal error: libvmaf.h: No such file or directory".
I don't have libvmaf installed, but it configures it as enabled and
detects it as installed anyway.
___
ffmpeg-devel mailing list
ffmpeg
On Fri, Jun 30, 2017 at 1:58 AM, Michael Niedermayer
wrote:
> Program received signal SIGSEGV, Segmentation fault.
> 0x00684919 in ff_sbr_hf_gen_sse ()
>0x00684909 : sub%r9,%r8
> => 0x00684919 : movaps (%rsi,%r8,1),%xmm0
> r9 0xdeadbeef0080
On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev wrote:
> +%define HADDPS_IS_FAST 0
> +%define PHADDD_IS_FAST 0
[...]
> +haddps %1, %1
> +haddps %1, %1
[...]
> + phaddd xmm%1,xmm%1
> + phaddd xmm%1,xmm%1
You can safely assume that those instru
On Mon, Jun 19, 2017 at 5:11 PM, James Darnley wrote:
> +por m1, m8, m13
> +por m1, m12
> +por m1, [blockq+ 16] ; { row[1] }[0-7]
> +por m1, [blockq+ 48] ; { row[3] }[0-7]
> +por m1, [blockq+ 80] ; { row[5] }[0-7]
> +por m1, [blockq