Re: [libav-devel] [PATCH 4/4] x86: fft: Port to cpuflags
On Fri, Mar 10, 2017 at 3:17 PM, Diego Biurrun wrote:
> +%macro INTERL 5
> +%if cpuflag(avx)
> +    vunpckhps    %3, %2, %1
> +    vunpcklps    %2, %2, %1
> +    vextractf128 %4(%5), %2, 0
> +    vextractf128 %4 %+ H(%5), %3, 0
> +    vextractf128 %4(%5 + 1), %2, 1
> +    vextractf128 %4 %+ H(%5 + 1), %3, 1
> +%elif cpuflag(sse)
> +    mova     %3, %2
> +    unpcklps %2, %1
> +    unpckhps %3, %1
> +    mova     %4(%5), %2
> +    mova     %4(%5+1), %3
> +%endif
> +%endmacro

The unpacks can be factored outside the %ifs; just use the 3-arg form unconditionally when dst != src1. Drop the v prefix for instructions that have both legacy and VEX encodings - x86inc automatically uses VEX in AVX functions. Never use vextract(f|i)128 with a 0 immediate; use a basic move instruction with the corresponding xmm register as source instead.

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
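To illustrate the 3-arg point: with x86inc, writing the unpacks once in 3-operand form covers both code paths. A minimal sketch with assumed register numbers (not the actual patch):

```
; Assembled under INIT_XMM avx this emits vunpckhps m3, m2, m1 directly;
; under INIT_XMM sse, x86inc expands it to "mova m3, m2" followed by
; "unpckhps m3, m1", since dst != src1.
unpckhps m3, m2, m1
unpcklps m2, m2, m1
```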
Re: [libav-devel] [PATCH] mov: Avoid memcmp of uninitialised data
On Sun, Jan 29, 2017 at 8:59 PM, Mark Thompson wrote:
> strncmp

Any particular reason for not just using plain strcmp()?
Re: [libav-devel] [FFmpeg-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer
On Mon, Dec 26, 2016 at 2:52 PM, Ronald S. Bultje wrote:
> Hm, OK, I think it affects unix64/x86-32 also when using 32-byte
> alignment. We do use the stack pointer then.

On 32-bit and UNIX64 it simply uses a different caller-saved register, which doesn't require additional instructions.

> I think my hesitation comes from how I view x86inc.asm. There's two ways to
> see it:
> - it's a universal tool, like a compiler, to assist writing assembly
>   (combined with yasm/nasm as actual assembler);
> or
> - it's a local tool for ffmpeg/libav/x26[5], like libavutil/attributes.h,
>   to assist writing assembly.

In practice it's basically (a), but designed around the use case of (b).

> If x86inc.asm were like a compiler, every micro-optimization, no matter the
> benefit, would be important. If it were a local tool, we indeed wouldn't
> care because ffmpeg spends most runtime for important use cases in other
> areas. (There's obviously a grayscale in this black/white range that I'm
> drawing out.) So having said that, patch is OK. If someone would later come
> in to add something to take return value type (void vs. non-void) into
> account, I would still find that helpful. :)

Specifying a full function prototype for cglobal instead of the current implementation would be ideal. It would also allow things like full floating-point abstraction and the ability to auto-load a non-contiguous subset of parameters, with optional sign extension of 32-bit args, etc. The problem is that it's difficult to implement in a clean way with the limited Yasm syntax. Nasm does have better string-parsing capabilities (although I haven't looked into it in detail), so if we decide to drop Yasm support at some point in the future this feature could perhaps be considered.
Re: [libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer
On Mon, Dec 26, 2016 at 2:32 AM, Ronald S. Bultje wrote:
> I know I'm terribly nitpicking here for the limited scope of the comment,
> but this only matters for functions that have a return value. Do you think
> it makes sense to allow functions to opt out of this requirement if they
> explicitly state to not have a return value?

An opt-out would only be relevant on 64-bit Windows when all of the following criteria are true for a function:

 * Reserves exactly 6 registers
 * Reserves stack space with the original stack pointer stored in a register (as opposed to the stack)
 * Requires >16-byte stack alignment (e.g. spilling ymm registers to the stack)
 * Does not have a return value

If and only if all of those are true, this would result in one register being unnecessarily saved (the cost of which would likely be hidden by out-of-order execution). On systems other than WIN64, or if any of the conditions above is false, an opt-out doesn't make any sense.

Considering how rare that corner case is, combined with how insignificant the downside is, I'm not sure it makes much sense to complicate the x86inc API further with an opt-out just for that specific scenario.
[libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer
When allocating stack space with an alignment requirement that is larger than the current stack alignment, we need to store a copy of the original stack pointer in order to be able to restore it later. If we choose to use another register for this purpose, we should not pick eax/rax, since it can be overwritten as a return value.
---
 libavutil/x86/x86inc.asm | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index b2e9c60..128ddc1 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -385,7 +385,14 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
     %ifnum %1
         %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
             %if %1 > 0
+                ; Reserve an additional register for storing the original stack pointer, but avoid using
+                ; eax/rax for this purpose since it can potentially get overwritten as a return value.
                 %assign regs_used (regs_used + 1)
+                %if ARCH_X86_64 && regs_used == 7
+                    %assign regs_used 8
+                %elif ARCH_X86_64 == 0 && regs_used == 1
+                    %assign regs_used 2
+                %endif
             %endif
             %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
                 ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)
--
2.7.4
Re: [libav-devel] [PATCH 1/3] ratecontrol: Use correct function pointer casts instead of void*
On Fri, Nov 11, 2016 at 1:22 PM, Diego Biurrun wrote:
> ISO C forbids initialization between function pointer and ‘void *’

ISO C technically allows quite a lot of weird stuff, like having function pointers that are different from data pointers. Is there even any known relevant system where casting function pointers to void * isn't well defined? I know it's required by POSIX (e.g. dlsym() et al.) at least.
Re: [libav-devel] [PATCH 2/2] hevc: x86: Add add_residual optimizations
On Wed, Oct 19, 2016 at 10:18 AM, Diego Biurrun wrote:
> +%macro ADD_RES_MMX_4_8 0
> +    mova     m2, [r1]
> +    mova     m4, [r1+8]
> +    pxor     m3, m3
> +    psubw    m3, m2
> +    packuswb m2, m2
> +    packuswb m3, m3
> +    pxor     m5, m5
> +    psubw    m5, m4
> +    packuswb m4, m4
> +    packuswb m5, m5
> +
> +    movh     m0, [r0]
> +    movh     m1, [r0+r2]
> +    paddusb  m0, m2
> +    paddusb  m1, m4
> +    psubusb  m0, m3
> +    psubusb  m1, m5
> +    movh     [r0], m0
> +    movh     [r0+r2], m1
> +%endmacro

Suggested alternative:

    mova      m0, [r1]
    mova      m2, [r1+8]
    pxor      m1, m1
    pxor      m3, m3
    psubw     m1, m0
    psubw     m3, m2
    packuswb  m0, m2
    packuswb  m1, m3
    movd      m2, [r0]
    movd      m3, [r0+r2]
    punpckldq m2, m3
    paddusb   m0, m2
    psubusb   m0, m1
    movd      [r0], m0
    psrlq     m0, 32
    movd      [r0+r2], m0

[...]

> +cglobal hevc_add_residual_4_8, 3, 4, 6

r3 isn't used, no need to reserve it.

[...]

> +%if cpuflag(avx)
> +    psubw m3, m0, m4
> +    psubw m5, m0, m6
> +%else
> +    mova  m3, m0
> +    mova  m5, m0
> +    psubw m3, m4
> +    psubw m5, m6
> +%endif

Pointless %else. x86inc will do this automatically for non-AVX when 3-arg syntax is used.

[...]

> +    dec r4d
> +    jnz .loop

Nit: jg .loop

[...]

> +cglobal hevc_add_residual_4_10, 3, 4, 6

r3 isn't used.

[...]

> +cglobal hevc_add_residual_8_10, 3, 5, 6

r4 isn't used.
Re: [libav-devel] [PATCH 1/2] checkasm: Add a test for HEVC add_residual
On Wed, Oct 19, 2016 at 5:43 PM, Diego Biurrun wrote:
> What exactly segfaults?

checkasm --bench=add_res

The stride for bench_new() shouldn't be different from call_new(). Actually it should probably be more like:

    int stride = block_size << (bit_depth > 8);
    call_ref(dst0, res0, stride);

to test with stricter (smaller) alignments.

> I get a complaint from clang-asan for size 32:

The randomize_buffers2() call is wrong, it shouldn't multiply size by 2.
Re: [libav-devel] [PATCH 1/2] checkasm: Add a test for HEVC add_residual
On Wed, Oct 19, 2016 at 10:18 AM, Diego Biurrun wrote:
> +    bench_new(dst1, res1, block_size);

Segfaults. Should probably be block_size * 2 like the other calls.
Re: [libav-devel] [PATCH 2/2] checkasm: Add a test for HEVC add_residual
On Fri, Oct 14, 2016 at 10:29 AM, Luca Barbato wrote:
> The term checkasm is misleading. The whole thing is a unit-test for some
> specific dsp functions.

Not really, no. The checkasm tests only check whether or not the output of the assembly functions matches the output of the C function. They don't try to verify that the C function is actually correct.

With that said, I really don't see any point in disabling tests just because there's currently no assembly implementation.
Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT
On Fri, Oct 7, 2016 at 6:32 PM, Alexandra Hájková wrote:
> On Fri, Oct 7, 2016 at 12:32 AM, Diego Biurrun wrote:
>> There should be no need to redefine the transpose functions, just call
>> the right one with the help of the cpuname macro.
>
> The transpose functions are called by the IDCT_size*size macros and the
> macro itself is the same for avx and sse2. I think the only way to avoid
> this define is to group the init by SIMD instead of grouping it by bit
> depth, but what to do with the bit depth then? So I think it would be
> better to leave the define as it is.

I think he means

    call hevc_idct_transpose_NxN_ %+ cpuname

which indeed allows you to get rid of the defines.
Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT
On Tue, Oct 4, 2016 at 7:35 PM, Alexandra Hájková wrote:
> +cglobal hevc_idct_16x16_%1, 1, 2, 16, coeffs
> +    mov r1d, 3
> +.loop16:
> +    TR_16x4 8 * r1, 7, [pd_64], 64, 2, 32, 8, 16, 1, 0
> +    dec r1

dec r1d

[...]

> +++ b/libavcodec/x86/hevcdsp_init.c

The function pointers for the AVX versions of 4x4 and 8x8 are not assigned on 32-bit.

Otherwise LGTM. Tested and passes checkasm on 64-bit Linux, 64-bit Windows, and 32-bit Windows.
Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT
On Sat, Oct 1, 2016 at 12:55 PM, wrote:
> +cglobal hevc_idct_4x4_ %+ %1, 1, 1, 5, coeffs

cglobal hevc_idct_4x4_%1, 1, 1, 5, coeffs

[...]

> +%macro SWAP_BLOCKS 5
[...]
> +    TRANSPOSE_4x4 4, 5, 8
[...]
> +    TRANSPOSE_4x4 4, 5, 8
[...]
> +%macro TRANSPOSE_BLOCK 3
[...]
> +    TRANSPOSE_4x4 4, 5, 8

TRANSPOSE_4x4 4, 5, 6

Makes the 8x8 IDCT use one xmm register less (9 -> 8).

[...]

> +%macro TRANSPOSE_8x8 0

Might as well turn this one into a function too while we're at it.

[...]

> +cglobal hevc_idct_8x8_ %+ %1, 1, 1, 8, coeffs

cglobal hevc_idct_8x8_%1, 1, 1, 8, coeffs

[...]

> +cglobal transpose_16x16, 0, 0, 0

Should be prefixed with hevc_idct_. This is also still SSE2-only; an AVX version would avoid some register-register moves. Template it and instantiate both SSE2 and AVX versions. Then inside the INIT_IDCT macro you can do the following:

    INIT_XMM sse2
    %define transpose_16x16 hevc_idct_transpose_16x16_sse2
    ...
    INIT_XMM avx
    %define transpose_16x16 hevc_idct_transpose_16x16_avx
    ...

This applies to the other transposes as well.

[...]

> +cglobal hevc_idct_16x16_ %+ %1, 1, 3, 15, coeffs
[...]
> +    call transpose_16x16
> +    RET

cglobal hevc_idct_16x16_%1, 1, 2, 16, coeffs
[...]
TAIL_CALL transpose_16x16, 1

[...]

> +%macro E32_O32 5
[...]
> +    mova  m11, [rsp + %5]
> +    paddd m11, m14

paddd m11, m14, [rsp + %5]

[...]

> +%macro TR_32x4 3
[...]
> +    lea r2, [trans_coeff32 + 15 * 128]
> +    lea r3, [coeffsq + %1 + 960]
> +    lea r4, [coeffsq + %1 + 16 * 64]
> +    mov r5d, 16 * 16
> +%%loop:
> +    E32_O32 r2, r3, r4, shift, r5 - 16
> +    sub r2, 128
> +    sub r3, 64
> +    add r4, 64
> +    sub r5d, 16
> +    jg %%loop

    lea r2, [trans_coeff32 + 15 * 128]
    lea r3, [coeffsq + %1]
    lea r4, [r3 + 16 * 64]
    mov r5d, 15 * 16
%%loop:
    E32_O32 r2, r3 + r5*4, r4, shift, r5
    sub r2, 128
    add r4, 64
    sub r5d, 16
    jge %%loop

[...]

> +cglobal hevc_idct_32x32_ %+ %1, 1, 7, 15, 512, coeffs
> +    mov r1d, 8
> +.loop32:
> +    TR_32x4 (8 * r1 - 8), %1, 1
> +    dec r1d
> +    jg .loop32
[...]
> +    call transpose_32x32
> +    RET

cglobal hevc_idct_32x32_%1, 1, 6, 16, 256, coeffs
    mov r1d, 7
.loop32:
    TR_32x4 8 * r1, %1, 1
    dec r1d
    jge .loop32
[...]
TAIL_CALL transpose_32x32, 1

[...]

> @@ -270,6 +288,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>          c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_8_sse2;
>          c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_8_sse2;
>
> +
>          c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_sse2;
>          c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_sse2;
>          c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_sse2;

Unnecessary extra newline.

[...]

> #if ARCH_X86_64
>     if (bit_depth == 8) {
> +        if (EXTERNAL_SSE2(cpu_flags)) {
> +            c->idct[0] = ff_hevc_idct_4x4_8_sse2;
> +            c->idct[1] = ff_hevc_idct_8x8_8_sse2;

Both 4x4 and 8x8 should work on 32-bit x86 as well.
Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT
On Fri, Sep 30, 2016 at 5:40 PM, wrote:
> +%if cpuflag(avx)
> +    pmaddwd m2, m0, [pw_64]    ; e0
> +    pmaddwd m3, m1, [pw_83_36] ; o0
> +%else
> +    mova    m2, m0
> +    pmaddwd m2, [pw_64]
> +    mova    m3, m1
> +    pmaddwd m3, [pw_83_36]
> +%endif

Redundant %else. x86inc will automatically turn 3-arg instructions into a move + 2-arg instruction when targeting pre-AVX. For commutative instructions (e.g. A op B == B op A) it will also use the memory operand for the move, since that's advantageous on some CPUs. This applies to several other code sections as well.

[...]

> +%macro LOAD_BLOCK 7
> +    movq   %1, [r0q + %3 + %7]
> +    movhps %1, [r0q + %5 + %7]
> +    movq   %2, [r0q + %4 + %7]
> +    movhps %2, [r0q + %6 + %7]
> +%endmacro

The q suffix for registers is redundant, just use r0. Applies to the STORE_PACKED macro as well.

[...]

> +%macro TRANSPOSE_16x16 0

Make this into a function, just like for 32x32.

[...]

> +%macro E32_O32 5
[...]
> +    mova m11, [rsp + %5]
[...]
> +    mov r2, 16
> +    mov r3, trans_coeff32
> +    mov r4, coeffsq
> +    mov r5, 0
> +    mov r6, coeffsq
> +    add r6, 31 * 64
> +.loopE32_%3
> +    E32_O32 r3, r4 + %1, r6 + %1, shift, r5
> +    sub r6, 64
> +    add r5, 16
> +    add r4, 64
> +    add r3, 128
> +    dec r2
> +    jg .loopE32_%3

    mova m11, [rsp + %5 + 256]
[...]
    lea r2, [trans_coeff32]
    lea r3, [coeffsq + %1 + 1024]
    lea r4, [r3 + 31*64 - 1024]
    mov r5, -256
%%loop:
    E32_O32 r2, r3 + r5*4, r4, shift, r5
    add r2, 128
    sub r4, 64
    add r5, 16
    jl %%loop

(untested)

> +transpose_32x32:

Use cglobal (with 0,0,0 register args) and instantiate SSE2+AVX versions.

[...]

> +    mov r1, 7
> +    mov r2, 7 * 256
> +.loop_transpose
> +    SWAP_BLOCKS 0, r2, 64, 0, r1 * 8
> +    sub r2, 256
> +    dec r1
> +    jg .loop_transpose

Use dword registers (r1d/r2d) for the mov, sub, and dec instructions (but keep native size for offsets like the SWAP_BLOCKS arguments). Applies to several other code sections as well (whenever the value is positive and fits in a dword).
Re: [libav-devel] [PATCH 3/9] blockdsp/x86: yasmify
On Thu, Sep 22, 2016 at 9:39 AM, Anton Khirnov <an...@khirnov.net> wrote:
> Quoting Henrik Gramner (2016-09-21 17:13:31)
>> Why not use xorps like the original code then? INIT_XMM sse will also
>> make mova assemble to movaps instead of movdqa, so no problem there.
>
> mmx only has pxor, so I'd need yet more ifdefs then.

If you really care about optimizing for Pentium II, yes. Alternatively just drop the MMX implementation.
Re: [libav-devel] [PATCH 3/9] blockdsp/x86: yasmify
On Wed, Sep 21, 2016 at 9:01 AM, Anton Khirnov wrote:
> Yes they are, because pxor does not exist in SSE.

Why not use xorps like the original code then? INIT_XMM sse will also make mova assemble to movaps instead of movdqa, so no problem there.
Re: [libav-devel] [PATCH 1/2] hevc: Add AVX IDCT
Not a super-thorough review by any means, but anyway...

On Sun, Sep 18, 2016 at 7:35 PM, Alexandra Hájková wrote:

[...]

> +SECTION_RODATA

Check if any of the constants are duplicates of already existing ones.

[...]

> +%macro TR_4x4 2
> +    ; interleaves src0 with src2 to m0
> +    ; and src1 with scr3 to m2
> +    ; src0: 00 01 02 03   m0: 00 20 01 21 02 22 03 23

[...]

> +    SWAP 3, 0
> +    SWAP 3, 2

SWAP 3, 2, 0

[...]

> +cglobal hevc_idct_4x4_ %+ %1, 1, 14, 14, coeffs

I'm pretty sure this function doesn't require 14 GPRs and 14 vector registers.

[...]

> +%macro STORE_16 5
> +    movu [rsp + %1], %5
> +    movu [rsp + %2], %3
> +%endmacro

I don't see any reason for doing unaligned stores. This will likely result in a performance hit compared to using aligned ones.

[...]

> +%macro E8_O8 8
> +    pmaddwd m6, m4, %3
> +    pmaddwd m7, m5, %4
> +    paddd   m6, m7
> +
> +%if %8 == 8
> +    paddd %7, m8
> +%endif
> +
> +    paddd m7, m6, %7 ; o8 + e8
> +    psubd %7, m6     ; e8 - o8
> +    STORE_%8 %5 + %1, %6 + %1, %7, %2, m7
> +%endmacro

If you do the middle paddd inside the TR_4x4 macro instead (it even takes a parameter for this already) you need to do 8 fewer adds in the 8x8 idct. You also save a register, which means the function will work on 32-bit x86 as well.

[...]

> +; transpose src packed in m4, m5
> +; to m3, m1
> +%macro TRANSPOSE 0
> +    SBUTTERFLY wd, 4, 5, 8
> +    SBUTTERFLY dq, 4, 5, 8
> +%endmacro

"TRANSPOSE" is kind of generic, a more specific macro name would be useful. The comment is also wrong. Furthermore, in simple macros like this it's IMO preferable to have the registers as arguments instead of being hard-coded.

[...]
> +%macro SWAP_BLOCKS 5
> +    ; M_i
> +    LOAD_BLOCK m6, m7, %2, %2 + %3, %2 + 2 * %3, %2 + 3 * %3, %1
> +
> +    ; M_j
> +    LOAD_BLOCK m4, m5, %4, %4 + %3, %4 + 2 * %3, %4 + 3 * %3, %5
> +    TRANSPOSE
> +    STORE_PACKED m4, m5, %2, %2 + %3, %2 + 2 * %3, %2 + 3 * %3, %1
> +
> +    ; transpose and store M_i
> +    SWAP m6, m4
> +    SWAP m7, m5
> +    TRANSPOSE
> +    STORE_PACKED m4, m5, %4, %4 + %3, %4 + 2 * %3, %4 + 3 * %3, %5
> +%endmacro

You should perform the loads in the same order as you operate on them, e.g.:

    LOAD      m4, m5
    TRANSPOSE m4, m5, m6
    LOAD      m6, m7
    STORE     m4, m5
    TRANSPOSE m6, m7, m4
    STORE     m6, m7

[...]

> +cglobal hevc_idct_8x8_ %+ %1, 1, 14, 14, coeffs

I'm pretty sure this function doesn't require 14 GPRs and 14 vector registers either.

[...]

> +; %1, 2 - transform constants
> +; %3, 4 - regs with interleaved coeffs
> +%macro ADD 4
> +    pmaddwd m8, %3, %1
> +    pmaddwd m9, %4, %2
> +    paddd   m8, m9
> +    paddd   m10, m8
> +%endmacro

ADD is defined in x86inc already, which could potentially cause weird issues; use a different macro name.

[...]

> +; %1 ... %4 transform coeffs
> +; %5, %6 offsets for storing e+o/e-o back to coeffsq
> +; %7 - shift
> +; %8 - add
> +; %9 - block_size
> +%macro E16_O16 9
> +    pxor m10, m10
> +    ADD %1, %2, m0, m1
> +    ADD %3, %4, m2, m3
> +
> +    movu m4, [rsp + %5]
> +%if %9 == 8
> +    paddd m4, %8
> +%endif
> +
> +    paddd m5, m10, m4 ; o16 + e16
> +    psubd m4, m10     ; e16 - o16
> +    STORE_%9 %5, %6, m4, %7, m5
> +%endmacro

Zeroing the accumulator can be avoided by, for example, making the ADD macro take an additional parameter which switches between paddd and SWAP. Also, it doesn't seem like you're using that many registers here, so try to use a lower register number instead of m10.

Align your offsets so you don't have to do unaligned loads/stores. Looking at the disassembly, 50% of your loads/stores are misaligned by 8 for no obvious reason. Furthermore, the distance between the 16-byte stores seems to be 32 bytes, which means half of the stack space is sitting unused.
You're storing 8 registers' worth of data to the stack before loading it back again. If you avoid using more than 8 xmm registers you could use the remaining 8 for temporary storage instead of the stack on x86-64 (keep using the stack on x86-32).

[...]

> +%macro TR_16x4 9
> +    mova m12, [pd_64]
> +
> +    ; produce 8x4 matrix of e16 coeffs
> +    ; for 4 first rows and store it on stack (128 bytes)
> +    TR_8x4 %1, 7, %4, %5, %6, %8
> +
> +    ; load 8 even rows
> +    LOAD_BLOCK m0, m1, %9 * %6, %9 * 3 * %6, %9 * 5 * %6, %9 * 7 * %6, %1
> +    LOAD_BLOCK m2, m3, %9 * 9 * %6, %9 * 11 * %6, %9 * 13 * %6, %9 * 15 * %6, %1
> +
> +    SBUTTERFLY wd, 0, 1, 4
> +    SBUTTERFLY wd, 2, 3, 4
> +
> +    mova m7, %3
> +
> +    E16_O16 [pw_90_87], [pw_80_70], [pw_57_43], [pw_25_9], 0 + %1, 15 * %6 + %1, %2, m7, %7
> +    E16_O16 [pw_87_57], [pw_9_m43], [pw_m80_m90], [pw_m70_m25], %6 + %1, 14 * %6 + %1, %2, m7, %7
> +    E16_O16 [pw_80_9], [pw_m70_m87], [pw_m25_57], [pw_90_43], 2 * %6 + %1, 13 * %6 + %1, %2, m7, %7
> +    E16_O16 [pw_70_m43], [pw_m87_9],
Re: [libav-devel] [PATCH 9/9] audiodsp/x86: yasmify vector_clipf_sse
On Tue, Sep 6, 2016 at 11:39 AM, Anton Khirnov wrote:
>> Use 3-arg maxps instead of mova.
>
> Isn't that AVX-only?

It is; x86inc will simply convert it to mova+minps when assembling it as non-AVX code, but it reduces the line count. It's certainly not worth going into bikeshedding territory about it, however, so if you prefer to use plain movas just keep them.

>> Otherwise LGTM, you could make an AVX version using ymm registers
>> as well in a separate patch if you want to, just need to make sure
>> the buffers are aligned.
>
> This function is only used in two rather obscure places, so probably
> not worth it.

Fair enough.
Re: [libav-devel] [PATCH] audiodsp/x86: clear the high bits of the order parameter on 64bit
On Tue, Sep 6, 2016 at 11:44 AM, Anton Khirnov wrote:
> Also change shl to add, since it can be faster on some CPUs.
>
> CC: libav-sta...@libav.org
> ---
>  libavcodec/x86/audiodsp.asm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Ok.
Re: [libav-devel] [PATCH 5/9] audiodsp/x86: sign extend the order argument to scalarproduct_int16 on 64bit
On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote:
> CC: libav-sta...@libav.org
> ---
>  libavcodec/x86/audiodsp.asm | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libavcodec/x86/audiodsp.asm b/libavcodec/x86/audiodsp.asm
> index dc38ada..0e3019c 100644
> --- a/libavcodec/x86/audiodsp.asm
> +++ b/libavcodec/x86/audiodsp.asm
> @@ -26,6 +26,7 @@ SECTION .text
>  %macro SCALARPRODUCT 0
>  ; int ff_scalarproduct_int16(int16_t *v1, int16_t *v2, int order)
>  cglobal scalarproduct_int16, 3,3,3, v1, v2, order
> +    movsxdifnidn orderq, orderd
>      shl orderq, 1
>      add v1q, orderq
>      add v2q, orderq

Alternatively, replace "shl orderq, 1" with "add orderd, orderd" instead: one instruction less, since instructions operating on 32-bit registers implicitly zero the upper 32 bits. Using "shl orderd, 1" works equally well, but add can be faster than shl on some CPUs, so we might as well use that instead.
Re: [libav-devel] [PATCH 9/9] audiodsp/x86: yasmify vector_clipf_sse
On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote:
> +    shl lenq, 2

You could also skip this shift and just use 4*lenq instead in the memory operands; multiplying by 2, 4, or 8 in memory args is free.
Re: [libav-devel] [PATCH 9/9] audiodsp/x86: yasmify vector_clipf_sse
On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote:
> +cglobal vector_clipf, 3, 3, 6, dst, src, len, min, max
> +%if ARCH_X86_32
> +    VBROADCASTSS m0, minm
> +    VBROADCASTSS m1, maxm
> +%else
> +    VBROADCASTSS m0, m0
> +    VBROADCASTSS m1, m1
> +%endif

This will fail on WIN64. To deal with the somewhat silly calling conventions on that platform you need to do something like:

    VBROADCASTSS m0, m3
    VBROADCASTSS m1, maxm

(not tested, I don't have access to a Windows machine at the moment)

> +    movsxdifnidn lenq, lend
> +    shl lenq, 2
> +
> +.loop
> +    sub lenq, 4 * mmsize

Move the subtraction to just before the branch (jg) to allow macro-op fusion on modern Intel CPUs.

> +
> +    mova m2, [srcq + lenq + 0 * mmsize]
> +    mova m3, [srcq + lenq + 1 * mmsize]
> +    mova m4, [srcq + lenq + 2 * mmsize]
> +    mova m5, [srcq + lenq + 3 * mmsize]
> +
> +    maxps m2, m0
> +    maxps m3, m0
> +    maxps m4, m0
> +    maxps m5, m0

Use 3-arg maxps instead of mova.

> +    minps m2, m1
> +    minps m3, m1
> +    minps m4, m1
> +    minps m5, m1
> +
> +    mova [dstq + lenq + 0 * mmsize], m2
> +    mova [dstq + lenq + 1 * mmsize], m3
> +    mova [dstq + lenq + 2 * mmsize], m4
> +    mova [dstq + lenq + 3 * mmsize], m5
> +
> +    jg .loop
> +
> +    RET

Otherwise LGTM. You could make an AVX version using ymm registers as well in a separate patch if you want to; just need to make sure the buffers are aligned.
Re: [libav-devel] [PATCH 1/2 v2] x86/hevc: add add_residual
On Thu, Jul 21, 2016 at 2:48 AM, Josh de Kock wrote:
> +cglobal hevc_add_residual_16_8, 3, 5, 7, dst, coeffs, stride
> +    pxor m0, m0
> +    lea  r3, [strideq * 3]
> +    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
> +    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> +    mov  r4d, 3
> +.loop:
> +    add coeffsq, 128
> +    lea dstq, [dstq + strideq * 4]
> +    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
> +    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> +    dec r4d
> +    jnz .loop
> +    RET

You can do all iterations within the loop instead, e.g. something like:

    mov r4d, 4
.loop:
    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
    add coeffsq, 128
    lea dstq, [dstq + strideq * 4]
    dec r4d
    jnz .loop

(the same applies to all other similar functions)
Re: [libav-devel] [PATCH 1/3] x86/hevc: add add_residual
On Thu, Jul 14, 2016 at 7:25 PM, Josh de Kock wrote:

Some of those functions are several kilobytes large. That's going to result in a lot of cache misses. I suggest using loops instead of duplicating the same code over and over with %reps.
Re: [libav-devel] [PATCH] checkasm: add HEVC test for testing IDCT DC
On Mon, Jul 18, 2016 at 8:11 PM, Alexandra Hájková wrote:
> +        if (check_func(h.idct_dc[i - 2], "idct_%dx%d_dc_%d", block_size,
> +                       block_size, bit_depth)) {
> +            call_ref(coeffs0);
> +            call_new(coeffs1);
> +            if (memcmp(coeffs0, coeffs1, sizeof(*coeffs0) * size)) {
> +                printf("fail: i %d, block_size %d, bit_depth %d\n", i,
> +                       block_size, bit_depth);
> +                fail();
> +            }
> +        }

bench_new() as well - otherwise there won't be any performance numbers.
Re: [libav-devel] [PATCH 6/6] hevc: Add AVX2 DC IDCT
On Sun, Jul 10, 2016 at 1:10 PM, Alexandra Hájková wrote:

Some fairly minor nits:

> +++ b/libavcodec/x86/hevc_idct.asm
> +cglobal hevc_idct_%1x%1_dc_%3, 1, 2, 1, coeff, tmp
> +    movsx tmpq, word [coeffq]
> +    add   tmpw, ((1 << 14-%3) + 1)
> +    sar   tmpw, (15-%3)
> +    movd  xm0, tmpd

Using dword instead of qword for the movsx gets rid of an unnecessary REX prefix. Can the add overflow 16 bits, i.e. is the use of a 16-bit shift instead of a 32-bit one required for truncation? If not, use dword for all those instructions to prevent the possibility of partial-register access stalls on some CPUs.

[...]

> +.loop:
> +    mova [coeffq+mmsize*0], m0
> +    mova [coeffq+mmsize*1], m0
> +    mova [coeffq+mmsize*2], m0
> +    mova [coeffq+mmsize*3], m0
> +    mova [coeffq+mmsize*4], m0
> +    mova [coeffq+mmsize*5], m0
> +    mova [coeffq+mmsize*6], m0
> +    mova [coeffq+mmsize*7], m0
> +    add  coeffq, mmsize*8
> +    dec  cntd
> +    jg .loop

Offsets in the range [-128,127] can be encoded in 1 byte, whereas larger offsets require 4 bytes, and mmsize*4 is 128 when using ymm registers. The code size can therefore be slightly reduced by reordering instructions like this:

    mova [coeffq+mmsize*0], m0
    mova [coeffq+mmsize*1], m0
    mova [coeffq+mmsize*2], m0
    mova [coeffq+mmsize*3], m0
    add  coeffq, mmsize*8
    mova [coeffq+mmsize*-4], m0
    mova [coeffq+mmsize*-3], m0
    mova [coeffq+mmsize*-2], m0
    mova [coeffq+mmsize*-1], m0
[libav-devel] [PATCH 3/4] x86inc: Improve handling of %ifid with multi-token parameters
From: Anton Mitrofanov

The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want.
---
 libavutil/x86/x86inc.asm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 60aad23..b79cc19 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1136,7 +1136,7 @@ INIT_XMM
                 CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, %8
             %endif
             %if %5 && %4 == 0
-                %ifnid %8
+                %ifnnum sizeof%8
                     ; 3-operand AVX instructions with a memory arg can only have it in src2,
                     ; whereas SSE emulation prefers to have it in src1 (i.e. the mov).
                     ; So, if the instruction is commutative with a memory arg, swap them.
@@ -1500,7 +1500,7 @@ FMA_INSTR pmadcswd, pmaddwd, paddd
         v%5%6 %1, %2, %3, %4
     %elifidn %1, %2
         ; If %3 or %4 is a memory operand it needs to be encoded as the last operand.
-        %ifid %3
+        %ifnum sizeof%3
             v%{5}213%6 %2, %3, %4
         %else
             v%{5}132%6 %2, %4, %3
--
1.9.1
[libav-devel] [PATCH 4/4] x86inc: Enable AVX emulation in additional cases
From: Anton Mitrofanov

Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`.
---
 libavutil/x86/x86inc.asm | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index b79cc19..dca1f78 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1129,14 +1129,12 @@ INIT_XMM
     %if __emulate_avx
         %xdefine __src1 %7
         %xdefine __src2 %8
-        %ifnidn %6, %7
-            %if %0 >= 9
-                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, %8, %9
-            %else
-                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, %8
-            %endif
-            %if %5 && %4 == 0
-                %ifnnum sizeof%8
+        %if %5 && %4 == 0
+            %ifnidn %6, %7
+                %ifidn %6, %8
+                    %xdefine __src1 %8
+                    %xdefine __src2 %7
+                %elifnnum sizeof%8
                     ; 3-operand AVX instructions with a memory arg can only have it in src2,
                     ; whereas SSE emulation prefers to have it in src1 (i.e. the mov).
                     ; So, if the instruction is commutative with a memory arg, swap them.
@@ -1144,6 +1142,13 @@ INIT_XMM
                     %xdefine __src2 %7
                 %endif
             %endif
+        %endif
+        %ifnidn %6, __src1
+            %if %0 >= 9
+                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, __src2, %9
+            %else
+                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, __src2
+            %endif
             %if __sizeofreg == 8
                 MOVQ %6, __src1
             %elif %3
--
1.9.1
[libav-devel] [PATCH 2/4] x86inc: Fix AVX emulation of some instructions
From: Anton Mitrofanov
---
 libavutil/x86/x86inc.asm | 44 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 10352fc..60aad23 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1096,7 +1096,7 @@ INIT_XMM
 ;%1 == instruction
 ;%2 == minimal instruction set
 ;%3 == 1 if float, 0 if int
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
 ;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
 ;%6+: operands
 %macro RUN_AVX_INSTR 6-9+
@@ -1171,9 +1171,9 @@ INIT_XMM
 ;%1 == instruction
 ;%2 == minimal instruction set
 ;%3 == 1 if float, 0 if int
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
 ;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
-%macro AVX_INSTR 1-5 fnord, 0, 1, 0
+%macro AVX_INSTR 1-5 fnord, 0, 255, 0
     %macro %1 1-10 fnord, fnord, fnord, fnord, %1, %2, %3, %4, %5
         %ifidn %2, fnord
             RUN_AVX_INSTR %6, %7, %8, %9, %10, %1
@@ -1207,10 +1207,10 @@ AVX_INSTR andnpd, sse2, 1, 0, 0
 AVX_INSTR andnps, sse, 1, 0, 0
 AVX_INSTR andpd, sse2, 1, 0, 1
 AVX_INSTR andps, sse, 1, 0, 1
-AVX_INSTR blendpd, sse4, 1, 0, 0
-AVX_INSTR blendps, sse4, 1, 0, 0
-AVX_INSTR blendvpd, sse4, 1, 0, 0
-AVX_INSTR blendvps, sse4, 1, 0, 0
+AVX_INSTR blendpd, sse4, 1, 1, 0
+AVX_INSTR blendps, sse4, 1, 1, 0
+AVX_INSTR blendvpd, sse4 ; can't be emulated
+AVX_INSTR blendvps, sse4 ; can't be emulated
 AVX_INSTR cmppd, sse2, 1, 1, 0
 AVX_INSTR cmpps, sse, 1, 1, 0
 AVX_INSTR cmpsd, sse2, 1, 1, 0
@@ -1281,7 +1281,7 @@ AVX_INSTR movsldup, sse3
 AVX_INSTR movss, sse, 1, 0, 0
 AVX_INSTR movupd, sse2
 AVX_INSTR movups, sse
-AVX_INSTR mpsadbw, sse4
+AVX_INSTR mpsadbw, sse4, 0, 1, 0
 AVX_INSTR mulpd, sse2, 1, 0, 1
 AVX_INSTR mulps, sse, 1, 0, 1
 AVX_INSTR mulsd, sse2, 1, 0, 0
@@ -1303,14 +1303,18 @@ AVX_INSTR paddsb, mmx, 0, 0, 1
 AVX_INSTR paddsw, mmx, 0, 0, 1
 AVX_INSTR paddusb, mmx, 0, 0, 1
 AVX_INSTR paddusw, mmx, 0, 0, 1
-AVX_INSTR palignr, ssse3
+AVX_INSTR palignr, ssse3, 0, 1, 0
 AVX_INSTR pand, mmx, 0, 0, 1
 AVX_INSTR pandn, mmx, 0, 0, 0
 AVX_INSTR pavgb, mmx2, 0, 0, 1
 AVX_INSTR pavgw, mmx2, 0, 0, 1
-AVX_INSTR pblendvb, sse4, 0, 0, 0
-AVX_INSTR pblendw, sse4
-AVX_INSTR pclmulqdq
+AVX_INSTR pblendvb, sse4 ; can't be emulated
+AVX_INSTR pblendw, sse4, 0, 1, 0
+AVX_INSTR pclmulqdq, fnord, 0, 1, 0
+AVX_INSTR pclmulhqhqdq, fnord, 0, 0, 0
+AVX_INSTR pclmulhqlqdq, fnord, 0, 0, 0
+AVX_INSTR pclmullqhqdq, fnord, 0, 0, 0
+AVX_INSTR pclmullqlqdq, fnord, 0, 0, 0
 AVX_INSTR pcmpestri, sse42
 AVX_INSTR pcmpestrm, sse42
 AVX_INSTR pcmpistri, sse42
@@ -1334,10 +1338,10 @@ AVX_INSTR phminposuw, sse4
 AVX_INSTR phsubw, ssse3, 0, 0, 0
 AVX_INSTR phsubd, ssse3, 0, 0, 0
 AVX_INSTR phsubsw, ssse3, 0, 0, 0
-AVX_INSTR pinsrb, sse4
-AVX_INSTR pinsrd, sse4
-AVX_INSTR pinsrq, sse4
-AVX_INSTR pinsrw, mmx2
+AVX_INSTR pinsrb, sse4, 0, 1, 0
+AVX_INSTR pinsrd, sse4, 0, 1, 0
+AVX_INSTR pinsrq, sse4, 0, 1, 0
+AVX_INSTR pinsrw, mmx2, 0, 1, 0
 AVX_INSTR pmaddwd, mmx, 0, 0, 1
 AVX_INSTR pmaddubsw, ssse3, 0, 0, 0
 AVX_INSTR pmaxsb, sse4, 0, 0, 1
@@ -1409,18 +1413,18 @@ AVX_INSTR punpcklwd, mmx, 0, 0, 0
 AVX_INSTR punpckldq, mmx, 0, 0, 0
 AVX_INSTR punpcklqdq, sse2, 0, 0, 0
 AVX_INSTR pxor, mmx, 0, 0, 1
-AVX_INSTR rcpps, sse, 1, 0, 0
+AVX_INSTR rcpps, sse
 AVX_INSTR rcpss, sse, 1, 0, 0
 AVX_INSTR roundpd, sse4
 AVX_INSTR roundps, sse4
 AVX_INSTR roundsd, sse4, 1, 1, 0
 AVX_INSTR roundss, sse4, 1, 1, 0
-AVX_INSTR rsqrtps, sse, 1, 0, 0
+AVX_INSTR rsqrtps, sse
 AVX_INSTR rsqrtss, sse, 1, 0, 0
 AVX_INSTR shufpd, sse2, 1, 1, 0
 AVX_INSTR shufps, sse, 1, 1, 0
-AVX_INSTR sqrtpd, sse2, 1, 0, 0
-AVX_INSTR sqrtps, sse, 1, 0, 0
+AVX_INSTR sqrtpd, sse2
+AVX_INSTR sqrtps, sse
 AVX_INSTR sqrtsd, sse2, 1, 0, 0
 AVX_INSTR sqrtss, sse, 1, 0, 0
 AVX_INSTR stmxcsr, sse
--
1.9.1
[libav-devel] [PATCH 1/4] x86inc: Fix AVX emulation of scalar float instructions
Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified. --- libavutil/x86/x86inc.asm | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 20ef7b8..10352fc 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1193,8 +1193,8 @@ INIT_XMM ; Non-destructive instructions are written without parameters AVX_INSTR addpd, sse2, 1, 0, 1 AVX_INSTR addps, sse, 1, 0, 1 -AVX_INSTR addsd, sse2, 1, 0, 1 -AVX_INSTR addss, sse, 1, 0, 1 +AVX_INSTR addsd, sse2, 1, 0, 0 +AVX_INSTR addss, sse, 1, 0, 0 AVX_INSTR addsubpd, sse3, 1, 0, 0 AVX_INSTR addsubps, sse3, 1, 0, 0 AVX_INSTR aesdec, fnord, 0, 0, 0 @@ -1224,10 +1224,10 @@ AVX_INSTR cvtpd2ps, sse2 AVX_INSTR cvtps2dq, sse2 AVX_INSTR cvtps2pd, sse2 AVX_INSTR cvtsd2si, sse2 -AVX_INSTR cvtsd2ss, sse2 -AVX_INSTR cvtsi2sd, sse2 -AVX_INSTR cvtsi2ss, sse -AVX_INSTR cvtss2sd, sse2 +AVX_INSTR cvtsd2ss, sse2, 1, 0, 0 +AVX_INSTR cvtsi2sd, sse2, 1, 0, 0 +AVX_INSTR cvtsi2ss, sse, 1, 0, 0 +AVX_INSTR cvtss2sd, sse2, 1, 0, 0 AVX_INSTR cvtss2si, sse AVX_INSTR cvttpd2dq, sse2 AVX_INSTR cvttps2dq, sse2 @@ -1250,12 +1250,12 @@ AVX_INSTR ldmxcsr, sse AVX_INSTR maskmovdqu, sse2 AVX_INSTR maxpd, sse2, 1, 0, 1 AVX_INSTR maxps, sse, 1, 0, 1 -AVX_INSTR maxsd, sse2, 1, 0, 1 -AVX_INSTR maxss, sse, 1, 0, 1 +AVX_INSTR maxsd, sse2, 1, 0, 0 +AVX_INSTR maxss, sse, 1, 0, 0 AVX_INSTR minpd, sse2, 1, 0, 1 AVX_INSTR minps, sse, 1, 0, 1 -AVX_INSTR minsd, sse2, 1, 0, 1 -AVX_INSTR minss, sse, 1, 0, 1 +AVX_INSTR minsd, sse2, 1, 0, 0 +AVX_INSTR minss, sse, 1, 0, 0 AVX_INSTR movapd, sse2 AVX_INSTR movaps, sse AVX_INSTR movd, mmx @@ -1284,8 +1284,8 @@ AVX_INSTR movups, sse AVX_INSTR mpsadbw, sse4 AVX_INSTR mulpd, sse2, 1, 0, 1 AVX_INSTR mulps, sse, 1, 0, 1 -AVX_INSTR mulsd, sse2, 1, 0, 1 -AVX_INSTR mulss, sse, 1, 0, 1 +AVX_INSTR mulsd, sse2, 1, 0, 0 +AVX_INSTR mulss, sse, 1, 0, 0 AVX_INSTR orpd, sse2, 1, 0, 1 
AVX_INSTR orps, sse, 1, 0, 1 AVX_INSTR pabsb, ssse3 @@ -1413,8 +1413,8 @@ AVX_INSTR rcpps, sse, 1, 0, 0 AVX_INSTR rcpss, sse, 1, 0, 0 AVX_INSTR roundpd, sse4 AVX_INSTR roundps, sse4 -AVX_INSTR roundsd, sse4 -AVX_INSTR roundss, sse4 +AVX_INSTR roundsd, sse4, 1, 1, 0 +AVX_INSTR roundss, sse4, 1, 1, 0 AVX_INSTR rsqrtps, sse, 1, 0, 0 AVX_INSTR rsqrtss, sse, 1, 0, 0 AVX_INSTR shufpd, sse2, 1, 1, 0 -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
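Why these flags matter can be seen by modeling one of the scalar instructions in plain C — a toy sketch (names are illustrative, not libav code) of the semantics the commit message describes:

```c
#include <assert.h>

/* Toy model (not libav code) of addsd dst, src: only element 0 is
 * summed; element 1 is passed through from dst. Swapping the operands
 * therefore changes the upper element of the result. */
typedef struct { double d[2]; } vec2d;

static vec2d addsd(vec2d dst, vec2d src)
{
    vec2d r = { { dst.d[0] + src.d[0], dst.d[1] } };
    return r;
}
```

Since the upper element comes from whichever operand is the destination, the emulation layer must never swap the sources for these instructions, which is exactly what clearing the commutative flag guarantees.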
[libav-devel] [PATCH 0/4] x86inc: Sync changes from x264
Anton Mitrofanov (3): x86inc: Fix AVX emulation of some instructions x86inc: Improve handling of %ifid with multi-token parameters x86inc: Enable AVX emulation in additional cases Henrik Gramner (1): x86inc: Fix AVX emulation of scalar float instructions libavutil/x86/x86inc.asm | 95 ++-- 1 file changed, 52 insertions(+), 43 deletions(-) -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] h264: Use isprint to sanitize the SEI debug message
On Sat, Feb 6, 2016 at 7:34 PM, Luca Barbato wrote: > Given how this function is used it is not really important; its purpose > is to not break the terminal by printing garbage. That's true I guess. > Do you have time to get me a function that is locale independent? static inline av_const int av_isprint(int c) { return c > 31 && c < 127; }
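As a sketch of how such a locale-independent check might be used — the `sanitize()` helper below is a hypothetical illustration, not the actual h264 SEI code:

```c
#include <assert.h>
#include <string.h>

/* Locale-independent printable check as proposed above. */
static inline int av_isprint(int c)
{
    return c > 31 && c < 127;
}

/* Hypothetical helper: replace any byte that would print garbage on a
 * terminal with '.' before logging the string. */
static void sanitize(char *s)
{
    for (; *s; s++)
        if (!av_isprint((unsigned char)*s))
            *s = '.';
}
```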
Re: [libav-devel] [PATCH] h264: Use isprint to sanitize the SEI debug message
On Sat, Feb 6, 2016 at 1:03 PM, Luca Barbato wrote: > +if (isprint(val)) Shouldn't we use a locale-independent version similar to the other functions in libavutil/avstring.h?
Re: [libav-devel] [PATCH] h264: Parse only the x264 info unregistered SEI
On Wed, Jul 29, 2015 at 10:51 PM, Luca Barbato wrote: > And restrict the string to ascii text. Restricting to printable characters would be even better.
[libav-devel] [PATCH] msvc: Fix libx264 linking
--- configure | 1 + 1 file changed, 1 insertion(+) diff --git a/configure b/configure index c5bcb78..0bf29c2 100755 --- a/configure +++ b/configure @@ -2951,6 +2951,7 @@ msvc_common_flags(){ -lz) echo zlib.lib ;; -lavifil32) echo vfw32.lib ;; -lavicap32) echo vfw32.lib user32.lib ;; +-lx264) echo libx264.lib ;; -l*) echo ${flag#-l}.lib ;; -L*) echo -libpath:${flag#-L} ;; *)echo $flag ;; -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH v2] x86inc: Preserve arguments when allocating stack space
When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. --- libavutil/x86/x86inc.asm | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index fc58b74..7d6c171 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -386,8 +386,11 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT %if %1 > 0 %assign regs_used (regs_used + 1) -%elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2 -%warning "Stack pointer will overwrite register argument" +%endif +%if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3 +; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax) +; since it's used as a hidden argument in vararg functions to specify the number of vector registers used. +%assign regs_used 5 + UNIX64 * 3 %endif %endif %endif -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space
On Mon, Jan 18, 2016 at 2:35 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote: > On Sun, Jan 17, 2016 at 6:21 PM, Henrik Gramner <hen...@gramner.com> wrote: >> @@ -386,8 +386,10 @@ DECLARE_REG_TMP_SIZE >> 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 >> %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT >> %if %1 > 0 >> %assign regs_used (regs_used + 1) >> -%elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + >> UNIX64 * 2 >> -%warning "Stack pointer will overwrite register argument" >> +%endif >> +%if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3 >> +; Ensure that we don't clobber any registers containing >> arguments >> +%assign regs_used 5 + UNIX64 * 3 > > Why 5 + unix * 3 and not 5 +unix * 2? Isn't unix64 6 regs and win64 4 regs? Because in the System V ABI, r6 (rax) is used to specify the number of arguments passed in vector registers in vararg functions so we use r7 instead of potentially clobbering it. It's certainly unlikely for it to actually be relevant in handwritten assembly functions, but there's not really any drawback of supporting that use case here (both r6 and r7 are volatile). ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
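The vararg convention referred to above can be illustrated from the C side; `sum_doubles` is a hypothetical example, and the al bookkeeping it relies on is emitted automatically by the compiler at every call site:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic helper illustrating the convention above: in
 * the System V AMD64 ABI the caller of a vararg function sets al (the
 * low byte of rax) to an upper bound on the number of vector registers
 * used for arguments. C compilers handle this automatically; handwritten
 * asm that clobbered rax before making such a call would break it. */
static double sum_doubles(int n, ...)
{
    va_list ap;
    double s = 0.0;
    va_start(ap, n);
    while (n--)
        s += va_arg(ap, double); /* float args travel in xmm registers */
    va_end(ap);
    return s;
}
```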
[libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space
When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. --- libavutil/x86/x86inc.asm | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index fc58b74..c355ee7 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -386,8 +386,10 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT %if %1 > 0 %assign regs_used (regs_used + 1) -%elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2 -%warning "Stack pointer will overwrite register argument" +%endif +%if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3 +; Ensure that we don't clobber any registers containing arguments +%assign regs_used 5 + UNIX64 * 3 %endif %endif %endif -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 5/8] x86inc: Use more consistent indentation
--- libavutil/x86/x86inc.asm | 134 +++ 1 file changed, 67 insertions(+), 67 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index c355ee7..de20e76 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -183,9 +183,9 @@ %define e%1h %3 %define r%1b %2 %define e%1b %2 -%if ARCH_X86_64 == 0 -%define r%1 e%1 -%endif +%if ARCH_X86_64 == 0 +%define r%1 e%1 +%endif %endmacro DECLARE_REG_SIZE ax, al, ah @@ -503,9 +503,9 @@ DECLARE_REG 14, R15, 120 %macro RET 0 WIN64_RESTORE_XMM_INTERNAL rsp POP_IF_USED 14, 13, 12, 11, 10, 9, 8, 7 -%if mmsize == 32 -vzeroupper -%endif +%if mmsize == 32 +vzeroupper +%endif AUTO_REP_RET %endmacro @@ -542,17 +542,17 @@ DECLARE_REG 14, R15, 72 %define has_epilogue regs_used > 9 || mmsize == 32 || stack_size > 0 %macro RET 0 -%if stack_size_padded > 0 -%if required_stack_alignment > STACK_ALIGNMENT -mov rsp, rstkm -%else -add rsp, stack_size_padded -%endif -%endif +%if stack_size_padded > 0 +%if required_stack_alignment > STACK_ALIGNMENT +mov rsp, rstkm +%else +add rsp, stack_size_padded +%endif +%endif POP_IF_USED 14, 13, 12, 11, 10, 9 -%if mmsize == 32 -vzeroupper -%endif +%if mmsize == 32 +vzeroupper +%endif AUTO_REP_RET %endmacro @@ -598,29 +598,29 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14 %define has_epilogue regs_used > 3 || mmsize == 32 || stack_size > 0 %macro RET 0 -%if stack_size_padded > 0 -%if required_stack_alignment > STACK_ALIGNMENT -mov rsp, rstkm -%else -add rsp, stack_size_padded -%endif -%endif +%if stack_size_padded > 0 +%if required_stack_alignment > STACK_ALIGNMENT +mov rsp, rstkm +%else +add rsp, stack_size_padded +%endif +%endif POP_IF_USED 6, 5, 4, 3 -%if mmsize == 32 -vzeroupper -%endif +%if mmsize == 32 +vzeroupper +%endif AUTO_REP_RET %endmacro %endif ;== %if WIN64 == 0 -%macro WIN64_SPILL_XMM 1 -%endmacro -%macro WIN64_RESTORE_XMM 1 -%endmacro -%macro WIN64_PUSH_XMM 0 -%endmacro +%macro WIN64_SPILL_XMM 1 +%endmacro +%macro WIN64_RESTORE_XMM 1 +%endmacro +%macro 
WIN64_PUSH_XMM 0 +%endmacro %endif ; On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either @@ -846,14 +846,14 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define movnta movntq %assign %%i 0 %rep 8 -CAT_XDEFINE m, %%i, mm %+ %%i -CAT_XDEFINE nnmm, %%i, %%i -%assign %%i %%i+1 +CAT_XDEFINE m, %%i, mm %+ %%i +CAT_XDEFINE nnmm, %%i, %%i +%assign %%i %%i+1 %endrep %rep 8 -CAT_UNDEF m, %%i -CAT_UNDEF nnmm, %%i -%assign %%i %%i+1 +CAT_UNDEF m, %%i +CAT_UNDEF nnmm, %%i +%assign %%i %%i+1 %endrep INIT_CPUFLAGS %1 %endmacro @@ -864,7 +864,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define mmsize 16 %define num_mmregs 8 %if ARCH_X86_64 -%define num_mmregs 16 +%define num_mmregs 16 %endif %define mova movdqa %define movu movdqu @@ -872,9 +872,9 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define movnta movntdq %assign %%i 0 %rep num_mmregs -CAT_XDEFINE m, %%i, xmm %+ %%i -CAT_XDEFINE nnxmm, %%i, %%i -%assign %%i %%i+1 +CAT_XDEFINE m, %%i, xmm %+ %%i +CAT_XDEFINE nnxmm, %%i, %%i +%assign %%i %%i+1 %endrep INIT_CPUFLAGS %1 %endmacro @@ -885,7 +885,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define mmsize 32 %define num_mmregs 8 %if ARCH_X86_64 -%define num_mmregs 16 +%define num_mmregs 16 %endif %define mova movdqa %define movu movdqu @@ -893,9 +893,9 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define movnta movntdq %assign %%i 0 %rep num_mmregs -CAT_XDEFINE m, %%i, ymm %+ %%i -CAT_XDEFINE nnymm, %%i, %%i -%assign %%i %%i+1 +CAT_XDEFINE m, %%i, ymm %+ %%i +CAT_XDEFINE nnymm, %%i, %%i +%assign %%i %%i+1 %endrep INIT_CPUFLAGS %1 %endmacro @@ -919,7 +919,7 @@ INIT_XMM %assign i 0 %rep 16 DECLARE_MMCAST i -%assign i i+1 +%assign i i+1 %endrep ; I often want to use macros that permute their arguments. e.g. 
there's no @@ -937,23 +937,23 @@ INIT_XMM ; doesn't cost any cycles. %macro PERMUTE 2-* ; takes a list of pairs to swap -%rep %0/2 -%xdefine %%tmp%2 m%2 -%rotate 2 -%endrep -%rep %0/2 -%xdefine m%1 %%tmp%2 -CAT_XDEFINE nn, m%1, %1 -%rotate 2 -%endrep +%rep %0/2 +
[libav-devel] [PATCH 0/8] x86inc: Sync changes from x264
The following patches were recently pushed to x264. Geza Lore (1): x86inc: Add debug symbols indicating sizes of compiled functions Henrik Gramner (7): x86inc: Make cpuflag() and notcpuflag() return 0 or 1 x86inc: Be more verbose in assertion failures x86inc: Improve FMA instruction handling x86inc: Preserve arguments when allocating stack space x86inc: Use more consistent indentation x86inc: Simplify AUTO_REP_RET x86inc: Avoid creating unnecessary local labels libavcodec/x86/proresdsp.asm| 2 +- libavutil/x86/x86inc.asm| 259 ++-- tests/checkasm/x86/checkasm.asm | 8 +- 3 files changed, 146 insertions(+), 123 deletions(-) -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 7/8] x86inc: Avoid creating unnecessary local labels
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such cpus. Note that EQU is just creating a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything. --- libavutil/x86/x86inc.asm | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 05a5790..980d753 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -647,8 +647,10 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14 %rep %0 %macro %1 1-2 %1 %2 %1 -%%branch_instr: -%xdefine last_branch_adr %%branch_instr +%if notcpuflag(ssse3) +%%branch_instr equ $ +%xdefine last_branch_adr %%branch_instr +%endif %endmacro %rotate 1 %endrep -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 8/8] x86inc: Add debug symbols indicating sizes of compiled functions
From: Geza Lore

Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. e.g. this fixes `gdb`, but not `perf`. Currently only implemented for ELF. --- libavcodec/x86/proresdsp.asm| 2 +- libavutil/x86/x86inc.asm| 23 +++ tests/checkasm/x86/checkasm.asm | 8 3 files changed, 28 insertions(+), 5 deletions(-) diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/x86/proresdsp.asm index a0e97b3..5a329cb 100644 --- a/libavcodec/x86/proresdsp.asm +++ b/libavcodec/x86/proresdsp.asm @@ -54,7 +54,7 @@ cextern pw_8 cextern pw_512 cextern pw_1019 -section .text align=16 +SECTION .text ; interleave data while maintaining source ; %1=type, %2=dstlo, %3=dsthi, %4=src, %5=interleave diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 980d753..8afce5b 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -633,6 +633,7 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14 %else rep ret %endif +annotate_function_size %endmacro %define last_branch_adr $$ @@ -641,6 +642,7 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14 times ((last_branch_adr-$)>>31)+1 rep ; times 1 iff $ == last_branch_adr.
%endif ret +annotate_function_size %endmacro %macro BRANCH_INSTR 0-* @@ -665,6 +667,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %elif %2 jmp %1 %endif +annotate_function_size %endmacro ;= @@ -686,6 +689,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, cglobal_internal 0, %1 %+ SUFFIX, %2 %endmacro %macro cglobal_internal 2-3+ +annotate_function_size %if %1 %xdefine %%FUNCTION_PREFIX private_prefix %xdefine %%VISIBILITY hidden @@ -699,6 +703,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, CAT_XDEFINE cglobaled_, %2, 1 %endif %xdefine current_function %2 +%xdefine current_function_section __SECT__ %if FORMAT_ELF global %2:function %%VISIBILITY %else @@ -747,6 +752,24 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, [SECTION .note.GNU-stack noalloc noexec nowrite progbits] %endif +; Tell debuggers how large the function was. +; This may be invoked multiple times per function; we rely on later instances overriding earlier ones. +; This is invoked by RET and similar macros, and also cglobal does it for the previous function, +; but if the last function in a source file doesn't use any of the standard macros for its epilogue, +; then its size might be unspecified. 
+%macro annotate_function_size 0 +%ifdef __YASM_VER__ +%ifdef current_function +%if FORMAT_ELF +current_function_section +%%ecf equ $ +size current_function %%ecf - current_function +__SECT__ +%endif +%endif +%endif +%endmacro + ; cpuflags %assign cpuflags_mmx (1<<0) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index 52d10ae..55212fc 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -66,14 +66,14 @@ cextern fail_func ;- cglobal stack_clobber, 1,2 ; Clobber the stack with junk below the stack pointer -%define size (max_args+6)*8 -SUB rsp, size -mov r1, size-8 +%define argsize (max_args+6)*8 +SUB rsp, argsize +mov r1, argsize-8 .loop: mov [rsp+r1], r0 sub r1, 8 jge .loop -ADD rsp, size +ADD rsp, argsize RET %if WIN64 -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 2/8] x86inc: Be more verbose in assertion failures
--- libavutil/x86/x86inc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index afcd6b8..dabb6cc 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -295,7 +295,7 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 %macro ASSERT 1 %if (%1) == 0 -%error assert failed +%error assertion ``%1'' failed %endif %endmacro -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 3/8] x86inc: Improve FMA instruction handling
* Correctly handle FMA instructions with memory operands. * Print a warning if FMA instructions are used without the correct cpuflag. * Simplify the instantiation code. * Clarify documentation. Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instructions to FMA3 instructions we can utilize the fact that multiply is a commutative operation and reorder operands if necessary to ensure that a memory operand is used only as the last operand. --- libavutil/x86/x86inc.asm | 77 +++- 1 file changed, 37 insertions(+), 40 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index dabb6cc..fc58b74 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1,7 +1,7 @@ ;* ;* x86inc.asm: x264asm abstraction layer ;* -;* Copyright (C) 2005-2015 x264 project +;* Copyright (C) 2005-2016 x264 project ;* ;* Authors: Loren Merritt;* Anton Mitrofanov @@ -1456,47 +1456,44 @@ FMA_INSTR pmadcswd, pmaddwd, paddd ; This lets us use tzcnt without bumping the yasm version requirement yet. %define tzcnt rep bsf -; convert FMA4 to FMA3 if possible -%macro FMA4_INSTR 4 -%macro %1 4-8 %1, %2, %3, %4 -%if cpuflag(fma4) -v%5 %1, %2, %3, %4 -%elifidn %1, %2 -v%6 %1, %4, %3 ; %1 = %1 * %3 + %4 -%elifidn %1, %3 -v%7 %1, %2, %4 ; %1 = %2 * %1 + %4 -%elifidn %1, %4 -v%8 %1, %2, %3 ; %1 = %2 * %3 + %1 -%else -%error fma3 emulation of ``%5 %1, %2, %3, %4'' is not supported -%endif -%endmacro +; Macros for consolidating FMA3 and FMA4 using 4-operand (dst, src1, src2, src3) syntax. +; FMA3 is only possible if dst is the same as one of the src registers. +; Either src2 or src3 can be a memory operand. 
+%macro FMA4_INSTR 2-* +%push fma4_instr +%xdefine %$prefix %1 +%rep %0 - 1 +%macro %$prefix%2 4-6 %$prefix, %2 +%if notcpuflag(fma3) && notcpuflag(fma4) +%error use of ``%5%6'' fma instruction in cpuname function: current_function +%elif cpuflag(fma4) +v%5%6 %1, %2, %3, %4 +%elifidn %1, %2 +; If %3 or %4 is a memory operand it needs to be encoded as the last operand. +%ifid %3 +v%{5}213%6 %2, %3, %4 +%else +v%{5}132%6 %2, %4, %3 +%endif +%elifidn %1, %3 +v%{5}213%6 %3, %2, %4 +%elifidn %1, %4 +v%{5}231%6 %4, %2, %3 +%else +%error fma3 emulation of ``%5%6 %1, %2, %3, %4'' is not supported +%endif +%endmacro +%rotate 1 +%endrep +%pop %endmacro -FMA4_INSTR fmaddpd, fmadd132pd, fmadd213pd, fmadd231pd -FMA4_INSTR fmaddps, fmadd132ps, fmadd213ps, fmadd231ps -FMA4_INSTR fmaddsd, fmadd132sd, fmadd213sd, fmadd231sd -FMA4_INSTR fmaddss, fmadd132ss, fmadd213ss, fmadd231ss - -FMA4_INSTR fmaddsubpd, fmaddsub132pd, fmaddsub213pd, fmaddsub231pd -FMA4_INSTR fmaddsubps, fmaddsub132ps, fmaddsub213ps, fmaddsub231ps -FMA4_INSTR fmsubaddpd, fmsubadd132pd, fmsubadd213pd, fmsubadd231pd -FMA4_INSTR fmsubaddps, fmsubadd132ps, fmsubadd213ps, fmsubadd231ps - -FMA4_INSTR fmsubpd, fmsub132pd, fmsub213pd, fmsub231pd -FMA4_INSTR fmsubps, fmsub132ps, fmsub213ps, fmsub231ps -FMA4_INSTR fmsubsd, fmsub132sd, fmsub213sd, fmsub231sd -FMA4_INSTR fmsubss, fmsub132ss, fmsub213ss, fmsub231ss - -FMA4_INSTR fnmaddpd, fnmadd132pd, fnmadd213pd, fnmadd231pd -FMA4_INSTR fnmaddps, fnmadd132ps, fnmadd213ps, fnmadd231ps -FMA4_INSTR fnmaddsd, fnmadd132sd, fnmadd213sd, fnmadd231sd -FMA4_INSTR fnmaddss, fnmadd132ss, fnmadd213ss, fnmadd231ss - -FMA4_INSTR fnmsubpd, fnmsub132pd, fnmsub213pd, fnmsub231pd -FMA4_INSTR fnmsubps, fnmsub132ps, fnmsub213ps, fnmsub231ps -FMA4_INSTR fnmsubsd, fnmsub132sd, fnmsub213sd, fnmsub231sd -FMA4_INSTR fnmsubss, fnmsub132ss, fnmsub213ss, fnmsub231ss +FMA4_INSTR fmadd,pd, ps, sd, ss +FMA4_INSTR fmaddsub, pd, ps +FMA4_INSTR fmsub,pd, ps, sd, ss +FMA4_INSTR fmsubadd, pd, ps +FMA4_INSTR 
fnmadd, pd, ps, sd, ss +FMA4_INSTR fnmsub, pd, ps, sd, ss ; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug (fixed in 1.3.0) %ifdef __YASM_VER__ -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
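The operand reordering described in the commit message can be sketched in C. `fma3_form` is a hypothetical helper mirroring the macro's dispatch: given the 4-operand form `dst = a*b + c`, it reports which FMA3 variant to emit so that the destination aliases one source and any memory operand ends up last:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the FMA4 -> FMA3 dispatch above. FMA3 forms:
 *   213: x1 = x2*x1 + x3    132: x1 = x1*x3 + x2    231: x1 = x2*x3 + x1
 * Only the last operand of an FMA3 instruction may be a memory location,
 * and multiplication being commutative lets us reorder a and b freely. */
static const char *fma3_form(const char *dst, const char *a,
                             const char *b, const char *c, int b_is_mem)
{
    if (!strcmp(dst, a))
        return b_is_mem ? "132" : "213"; /* 132 encodes b as the last operand */
    if (!strcmp(dst, b))
        return "213";                    /* b = a*b + c, c last */
    if (!strcmp(dst, c))
        return "231";                    /* c = a*b + c, b last */
    return NULL; /* dst aliases no source: FMA3 emulation impossible */
}
```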
Re: [libav-devel] [PATCH v2 1/1] x86: use emms after ff_int32_to_float_fmul_scalar_sse
On Wed, Dec 30, 2015 at 1:43 PM, Janne Grunau wrote: > libavcodec/x86/fmtconvert.asm | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) Ok.
Re: [libav-devel] [PATCH 1/1] x86: use emms after ff_int32_to_float_fmul_scalar_sse
On Tue, Dec 29, 2015 at 12:32 PM, Janne Grunau wrote: > Intel's Instruction Set Reference (as of September 2015) clearly states > that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the > source is a memory location. The Instruction Set Reference from 1999 > (Order Number 243191) describes this behaviour but all later versions > I've seen make no distinction whether MMX registers or memory is > used as source. > The documentation for the matching SSE2 instruction to convert to double > (cvtpi2pd) was fixed (see the valgrind bug > https://bugs.kde.org/show_bug.cgi?id=210264). > > It will take time to get a clarification and fixes in place. In the > meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to > be correct according to the documentation. The vast majority of users > will have SSE2 so a change to the SSE version has little effect. > > Fixes fate-checkasm on x86 valgrind targets. > > Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059 > --- > libavcodec/x86/fmtconvert.asm | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/libavcodec/x86/fmtconvert.asm b/libavcodec/x86/fmtconvert.asm > index 0383322..c2ff707 100644 > --- a/libavcodec/x86/fmtconvert.asm > +++ b/libavcodec/x86/fmtconvert.asm > @@ -61,6 +61,13 @@ cglobal int32_to_float_fmul_scalar, 4, 4, %1, dst, src, > mul, len > mova [dstq+lenq+16], m2 > add lenq, 32 > jl .loop > +%if cpuflag(sse) > +;; cvtpi2ps switches to MMX even if the source is a memory location > +;; possibly an error in documentation since every tested CPU disagrees > with > +;; that. Use emms anyway since the vast majority of machines will use the > +;; SSE2 variant > +emms > +%endif > REP_RET > %endmacro Should be notcpuflag(sse2). Also the REP_RET could be replaced with RET, but that's a pretty minor thing.
Re: [libav-devel] [PATCH 2/2] checkasm: x86: post commit review fixes
On Tue, Dec 22, 2015 at 10:59 PM, Janne Grunau wrote: > Check the full FPU tag word instead of only the upper half and simplify > the comparison. It previously only checked the lower half, not the upper. > Use upper-case function base name as macro name to instantiate both > checked_call variants. > --- > tests/checkasm/x86/checkasm.asm | 20 +--- > 1 file changed, 9 insertions(+), 11 deletions(-) Otherwise ok.
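The tag-word layout behind this fix can be modeled in plain C — a hypothetical helper, not checkasm code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the x87 FPU tag word: two bits per register, 0b11 =
 * empty, 0b00 = valid, so a fully empty FPU reads as 0xffff. A check
 * that inspects only one byte of the word covers just four of the
 * eight registers, which is why the full 16 bits must be compared. */
static uint16_t tag_word(const int in_use[8])
{
    uint16_t tw = 0;
    for (int i = 0; i < 8; i++)
        tw |= (uint16_t)((in_use[i] ? 0 : 3) << (2 * i));
    return tw;
}
```

For example, a leftover value in register 5 still leaves the low byte of the tag word at 0xff, so a single-byte comparison would miss it while `cmp` against the full word catches it.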
Re: [libav-devel] [PATCH 1/2] x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
On Tue, Dec 22, 2015 at 10:59 PM, Janne Grunau wrote: > This reverts commit 5dfe4edad63971d669ae456b0bc40ef9364cca80. > --- > libavcodec/x86/fmtconvert.asm | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) Ok.
Re: [libav-devel] [libav-commits] checkasm: add fmtconvert tests
On Tue, Dec 22, 2015 at 10:44 PM, Janne Grunau wrote: >> Intel's current documentation is very clear on cvtpi2ps: "This >> instruction causes a transition from x87 FPU to MMX technology >> operation". > > every tested silicon (nothing ancient or SSE only though) and the copy > of the manual from 1999 (Order Number 243191) (Pentium 3 and SSE were > introduced 1999) disagree. The instruction reference manual from 2002 > (Order Number 245471-006) misses the comment. But that could easily be > an edit error. The comments from the end of the instruction description > were removed and the main description was extended. Huh, that's interesting. Also note that cvtpi2ps and cvtpi2pd have different exception mechanics as well. > Since it will probably take ages to get this resolved: adding an emms > before the return of int32_to_float_fmul_scalar_sse is enough for all > x86 systems/calling conventions? That seems like the easiest, safest and most compatible way of handling it. Especially considering that CPUs with SSE2 won't even run that code path anyway.
Re: [libav-devel] [PATCH 1/1] x86: checkasm: check for or handle missing cleanup after MMX instructions
On Fri, Dec 11, 2015 at 6:40 PM, Janne Grunau wrote: > +#define declare_new_emms(cpu_flags, ret, ...) \ > +ret (*checked_call)(void *, int, int, int, int, int, __VA_ARGS__) = \ > +((cpu_flags) & av_get_cpu_flags()) ? (void > *)checkasm_checked_call_emms : \ > + (void *)checkasm_checked_call; > +#define declare_new_emms(cpu_flags, ret, ...) ret (*checked_call)(void *, __VA_ARGS__) = \ > +((cpu_flags) & av_get_cpu_flags()) ? (void > *)checkasm_checked_call_emms :\ > + Do we need to have cpu_flags as a parameter here? Couldn't we just use the checked_call_emms codepath unconditionally whenever declare_new_emms is used on x86 or am I missing something? > +%macro check_call 0-1 > +cglobal checked_call%1, 2,15,16,max_args*8+8 s/check_call/CHECKED_CALL/ Also I'm not sure if using %1 when it can be undefined is a good idea. It might just happen to accidentally work right now. > +%ifnid %1, _emms > +fstenv [rsp] > +mov r9h, [rsp + 8] > +add r9h, 1 > +jz .emms_ok > +report_fail error_message_emms > +emms > +.emms_ok: > +%else > +emms > +%endif You're not checking if registers 4-7 are empty here because the FPU tag word is 16 bits and rNh is an 8-bit register corresponding to bits 8-15 of a full register. mov/add should be replaced with cmp word [rsp + 8], 0x (and jz with je IMO even though they assemble to the same opcode because "equal" makes more sense than "zero" in that case). > +%ifnid %1, _emms > +fstenv [rsp] > +mov r3h, [rsp + 8] > +add r3h, 1 > +jz .emms_ok > +report_fail error_message_emms > +emms > +.emms_ok: > +%else > +emms > +%endif Ditto, also s/rsp/esp/ for consistency with the rest of the 32-bit code.
Re: [libav-devel] [libav-commits] checkasm: add fmtconvert tests
On Tue, Dec 22, 2015 at 5:41 PM, Janne Grunau wrote: > I found an HTML copy from 1999 of Intel's manual(1) which says that > cvtpi2ps with a memory location as source doesn't cause a transition to > MMX state. The current documentation for cvtpi2pd (packed int to packed > double conversion) says the same. Valgrind wasn't following that > until Vitor reported it as #210264(2) in 2009 and it was fixed in (3). > As Julian Seward says in the commit message the situation is a little > bit fishy. Intel's current documentation is very clear on cvtpi2ps: "This instruction causes a transition from x87 FPU to MMX technology operation". For cvtpi2pd it does point out that a state transition only happens when the source is an MMX register. I'm guessing that this difference in behavior is due to the fact that cvtpi2ps is SSE while cvtpi2pd is SSE2.
Re: [libav-devel] [PATCH 1/2] configure: Support msys2 out of box
On Sat, Nov 21, 2015 at 7:53 AM, Hendrik Leppkes wrote:
> msys2 provides various .sh scripts to set up the environment, one for
> msys2 building, and one for mingw32/64 respectively.
> You need to launch it using the appropriate shell script, not just by
> running sh.exe.
>
> - Hendrik

Which is not really obvious and somewhat counter-intuitive if you're using, say, msvc instead of mingw. Is there any downside to making stuff "just work" (tm) with the default msys shell?
[libav-devel] [PATCH] checkasm: Fix compilation with --disable-avcodec
--- tests/checkasm/checkasm.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 9219a83..3ed78b6 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -57,17 +57,19 @@ static const struct { const char *name; void (*func)(void); } tests[] = { -#if CONFIG_BSWAPDSP -{ "bswapdsp", checkasm_check_bswapdsp }, -#endif -#if CONFIG_H264PRED -{ "h264pred", checkasm_check_h264pred }, -#endif -#if CONFIG_H264QPEL -{ "h264qpel", checkasm_check_h264qpel }, -#endif -#if CONFIG_V210_ENCODER -{ "v210enc", checkasm_check_v210enc }, +#if CONFIG_AVCODEC +#if CONFIG_BSWAPDSP +{ "bswapdsp", checkasm_check_bswapdsp }, +#endif +#if CONFIG_H264PRED +{ "h264pred", checkasm_check_h264pred }, +#endif +#if CONFIG_H264QPEL +{ "h264qpel", checkasm_check_h264qpel }, +#endif +#if CONFIG_V210_ENCODER +{ "v210enc", checkasm_check_v210enc }, +#endif #endif { NULL } }; -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] checkasm: Fix compilation with --disable-avcodec
On Sun, Oct 4, 2015 at 8:39 PM, Luca Barbato wrote:
> Alternatively we might make sure if avcodec is disabled all its
> components are as well.
>
> might simplify a lot the code...

Yes, that's indeed a solid approach as well. Who's volunteering for that, though? I don't really know much about the build system.
[libav-devel] [PATCH] checkasm: Fix the function name sorting algorithm
The previous implementation was behaving incorrectly in some corner cases.
---
 tests/checkasm/checkasm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 013e197..9219a83 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -280,12 +280,16 @@ static void print_benchs(CheckasmFunc *f)
 /* ASCIIbetical sort except preserving natural order for numbers */
 static int cmp_func_names(const char *a, const char *b)
 {
+    const char *start = a;
     int ascii_diff, digit_diff;
 
-    for (; !(ascii_diff = *a - *b) && *a; a++, b++);
+    for (; !(ascii_diff = *(const unsigned char*)a - *(const unsigned char*)b) && *a; a++, b++);
     for (; av_isdigit(*a) && av_isdigit(*b); a++, b++);
 
-    return (digit_diff = av_isdigit(*a) - av_isdigit(*b)) ? digit_diff : ascii_diff;
+    if (a > start && av_isdigit(a[-1]) && (digit_diff = av_isdigit(*a) - av_isdigit(*b)))
+        return digit_diff;
+
+    return ascii_diff;
 }
 
 /* Perform a tree rotation in the specified direction and return the new root */
-- 
1.9.1
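To see what the "natural order for numbers" comparison does, the fixed comparator can be exercised standalone. This is a transcription of the patched function with av_isdigit() swapped for a local helper so it builds outside the libav tree; the logic is otherwise unchanged.

```c
#include <assert.h>

/* Local stand-in for av_isdigit() so the sketch compiles standalone */
static int is_digit(int c)
{
    return c >= '0' && c <= '9';
}

/* ASCIIbetical order, except that digit runs following an equal prefix
 * compare in natural (numeric) order: "sad_8x8" < "sad_16x16" */
static int cmp_func_names(const char *a, const char *b)
{
    const char *start = a;
    int ascii_diff, digit_diff;

    /* Advance past the common prefix, remembering the first byte difference */
    for (; !(ascii_diff = *(const unsigned char*)a - *(const unsigned char*)b) && *a; a++, b++);
    /* Skip the remainder of a shared digit run, if any */
    for (; is_digit(*a) && is_digit(*b); a++, b++);

    /* If the difference occurred inside a digit run, the shorter run of
     * digits (i.e. the numerically smaller value) sorts first */
    if (a > start && is_digit(a[-1]) && (digit_diff = is_digit(*a) - is_digit(*b)))
        return digit_diff;

    return ascii_diff;
}
```

The `a > start && is_digit(a[-1])` guard is the actual fix: it restricts the numeric comparison to differences that really occur inside a digit run, instead of applying it whenever a digit happens to follow the mismatch.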
Re: [libav-devel] [PATCH] avutil/avstring: Inline some tiny functions
On Mon, Sep 28, 2015 at 9:49 AM, Anton Khirnov wrote:
> But does it actually improve performance measurably? I'd argue that
> those functions are used in places where it doesn't really matter.

I was using some perf tools through checkasm when I noticed an awful lot of time was spent calling av_isdigit(), which was kind of silly, and inlining it made it run around 5% faster overall. But yes, it's obviously not a performance-critical piece of code by any means. I haven't really looked at other code that uses any of those functions though.

> And since inline public functions tend to generate pain, it's better to
> avoid them unless there are large practical gains otherwise.

av_toupper() and av_tolower() are similar short functions in the same file that are currently inlined though, so one could argue that this patch improves consistency if nothing else.
[libav-devel] [PATCH] avutil/avstring: Inline some tiny functions
They're short enough that inlining them actually reduces code size due to all the overhead associated with making a function call. --- libavutil/avstring.c | 22 -- libavutil/avstring.h | 22 ++ 2 files changed, 18 insertions(+), 26 deletions(-) diff --git a/libavutil/avstring.c b/libavutil/avstring.c index eb5c95a..5a443ab 100644 --- a/libavutil/avstring.c +++ b/libavutil/avstring.c @@ -212,28 +212,6 @@ const char *av_dirname(char *path) return path; } -int av_isdigit(int c) -{ -return c >= '0' && c <= '9'; -} - -int av_isgraph(int c) -{ -return c > 32 && c < 127; -} - -int av_isspace(int c) -{ -return c == ' ' || c == '\f' || c == '\n' || c == '\r' || c == '\t' || - c == '\v'; -} - -int av_isxdigit(int c) -{ -c = av_tolower(c); -return av_isdigit(c) || (c >= 'a' && c <= 'f'); -} - int av_match_name(const char *name, const char *names) { const char *p; diff --git a/libavutil/avstring.h b/libavutil/avstring.h index 7c30ee1..780f109 100644 --- a/libavutil/avstring.h +++ b/libavutil/avstring.h @@ -154,17 +154,27 @@ char *av_get_token(const char **buf, const char *term); /** * Locale-independent conversion of ASCII isdigit. */ -av_const int av_isdigit(int c); +static inline av_const int av_isdigit(int c) +{ +return c >= '0' && c <= '9'; +} /** * Locale-independent conversion of ASCII isgraph. */ -av_const int av_isgraph(int c); +static inline av_const int av_isgraph(int c) +{ +return c > 32 && c < 127; +} /** * Locale-independent conversion of ASCII isspace. */ -av_const int av_isspace(int c); +static inline av_const int av_isspace(int c) +{ +return c == ' ' || c == '\f' || c == '\n' || c == '\r' || c == '\t' || + c == '\v'; +} /** * Locale-independent conversion of ASCII characters to uppercase. @@ -189,7 +199,11 @@ static inline av_const int av_tolower(int c) /** * Locale-independent conversion of ASCII isxdigit. 
*/ -av_const int av_isxdigit(int c); +static inline av_const int av_isxdigit(int c) +{ +c = av_tolower(c); +return av_isdigit(c) || (c >= 'a' && c <= 'f'); +} /* * Locale-independent case-insensitive compare. -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
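For a quick sanity check outside the tree, the helper bodies from the patch can be compiled standalone. The `av_` prefixes and attribute macros are dropped here; `tolower_ascii()` is an assumption, since the patch only calls av_tolower() and its body lies outside this hunk, so a plain ASCII-range version is substituted.

```c
#include <assert.h>

/* Assumed ASCII-only lowercasing; av_tolower()'s body is not in the hunk */
static int tolower_ascii(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + 0x20 : c;
}

/* Bodies below are transcribed from the patch; none of them consult the
 * current locale, which is the entire point of these helpers */
static int isdigit_ascii(int c)
{
    return c >= '0' && c <= '9';
}

static int isgraph_ascii(int c)
{
    return c > 32 && c < 127;
}

static int isspace_ascii(int c)
{
    return c == ' ' || c == '\f' || c == '\n' || c == '\r' || c == '\t' ||
           c == '\v';
}

static int isxdigit_ascii(int c)
{
    c = tolower_ascii(c);
    return isdigit_ascii(c) || (c >= 'a' && c <= 'f');
}
```

Unlike the <ctype.h> versions, these give the same answers regardless of the process locale, so parsers built on them behave identically everywhere.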
[libav-devel] [PATCH] checkasm: Use a self-balancing tree
Tested functions are internally kept in a binary search tree for efficient lookups. The downside of the current implementation is that the tree quickly becomes unbalanced which causes an unneccessary amount of comparisons between nodes. Improve this by changing the tree into a self-balancing left-leaning red-black tree with a worst case lookup/insertion time complexity of O(log n). Significantly reduces the recursion depth and makes the tests run around 10% faster overall. The relative performance improvement compared to the existing non-balanced tree will also most likely increase as more tests are added. --- tests/checkasm/checkasm.c | 59 +-- 1 file changed, 47 insertions(+), 12 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 3d47a43..013e197 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -125,6 +125,7 @@ typedef struct CheckasmFuncVersion { typedef struct CheckasmFunc { struct CheckasmFunc *child[2]; CheckasmFuncVersion versions; +uint8_t color; /* 0 = red, 1 = black */ char name[1]; } CheckasmFunc; @@ -287,24 +288,57 @@ static int cmp_func_names(const char *a, const char *b) return (digit_diff = av_isdigit(*a) - av_isdigit(*b)) ? 
digit_diff : ascii_diff; } +/* Perform a tree rotation in the specified direction and return the new root */ +static CheckasmFunc *rotate_tree(CheckasmFunc *f, int dir) +{ +CheckasmFunc *r = f->child[dir^1]; +f->child[dir^1] = r->child[dir]; +r->child[dir] = f; +r->color = f->color; +f->color = 0; +return r; +} + +#define is_red(f) ((f) && !(f)->color) + +/* Balance a left-leaning red-black tree at the specified node */ +static void balance_tree(CheckasmFunc **root) +{ +CheckasmFunc *f = *root; + +if (is_red(f->child[0]) && is_red(f->child[1])) { +f->color ^= 1; +f->child[0]->color = f->child[1]->color = 1; +} + +if (!is_red(f->child[0]) && is_red(f->child[1])) +*root = rotate_tree(f, 0); /* Rotate left */ +else if (is_red(f->child[0]) && is_red(f->child[0]->child[0])) +*root = rotate_tree(f, 1); /* Rotate right */ +} + /* Get a node with the specified name, creating it if it doesn't exist */ -static CheckasmFunc *get_func(const char *name, int length) +static CheckasmFunc *get_func(CheckasmFunc **root, const char *name) { -CheckasmFunc *f, **f_ptr = +CheckasmFunc *f = *root; -/* Search the tree for a matching node */ -while ((f = *f_ptr)) { +if (f) { +/* Search the tree for a matching node */ int cmp = cmp_func_names(name, f->name); -if (!cmp) -return f; +if (cmp) { +f = get_func(>child[cmp > 0], name); -f_ptr = >child[(cmp > 0)]; +/* Rebalance the tree on the way up if a new node was inserted */ +if (!f->versions.func) +balance_tree(root); +} +} else { +/* Allocate and insert a new node into the tree */ +int name_length = strlen(name); +f = *root = checkasm_malloc(sizeof(CheckasmFunc) + name_length); +memcpy(f->name, name, name_length + 1); } -/* Allocate and insert a new node into the tree */ -f = *f_ptr = checkasm_malloc(sizeof(CheckasmFunc) + length); -memcpy(f->name, name, length+1); - return f; } @@ -405,7 +439,8 @@ void *checkasm_check_func(void *func, const char *name, ...) 
if (!func || name_length <= 0 || name_length >= sizeof(name_buf)) return NULL; -state.current_func = get_func(name_buf, name_length); +state.current_func = get_func(, name_buf); +state.funcs->color = 1; v = _func->versions; if (v->func) { -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
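The balancing scheme is easier to follow in isolation. Below is a minimal sketch of the same left-leaning red-black logic over plain int keys; the rotate and balance steps mirror the patch, but the type and function names are illustrative, not taken from checkasm.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    struct Node *child[2];
    int color; /* 0 = red, 1 = black */
    int key;
} Node;

#define IS_RED(n) ((n) && !(n)->color)

/* Rotate in the specified direction and return the new subtree root */
static Node *rotate(Node *f, int dir)
{
    Node *r = f->child[dir ^ 1];
    f->child[dir ^ 1] = r->child[dir];
    r->child[dir] = f;
    r->color = f->color;
    f->color = 0;
    return r;
}

/* Restore the left-leaning red-black invariants at one node */
static void balance(Node **root)
{
    Node *f = *root;
    if (IS_RED(f->child[0]) && IS_RED(f->child[1])) {
        f->color ^= 1;
        f->child[0]->color = f->child[1]->color = 1;
    }
    if (!IS_RED(f->child[0]) && IS_RED(f->child[1]))
        *root = rotate(f, 0); /* rotate left */
    else if (IS_RED(f->child[0]) && IS_RED(f->child[0]->child[0]))
        *root = rotate(f, 1); /* rotate right */
}

static void insert(Node **root, int key)
{
    Node *f = *root;
    if (f) {
        if (key != f->key) {
            insert(&f->child[key > f->key], key);
            balance(root); /* rebalance on the way back up */
        }
    } else {
        f = *root = calloc(1, sizeof(*f)); /* new nodes start out red */
        f->key = key;
    }
}

static void tree_insert(Node **root, int key)
{
    insert(root, key);
    (*root)->color = 1; /* the root is always black */
}

static int count(const Node *f)
{
    return f ? 1 + count(f->child[0]) + count(f->child[1]) : 0;
}

static int depth(const Node *f)
{
    int l = f ? depth(f->child[0]) : 0;
    int r = f ? depth(f->child[1]) : 0;
    return f ? 1 + (l > r ? l : r) : 0;
}
```

Sequential insertion is the pathological case for an unbalanced binary search tree (it degenerates into a linked list); with the balancing above, 1023 ascending keys stay within the red-black height bound of roughly 2*log2(n+1) levels.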
[libav-devel] [PATCH] checkasm/x86: Correctly handle variadic functions
The System V ABI on x86-64 specifies that the al register contains an upper bound of the number of arguments passed in vector registers when calling variadic functions, so we aren't allowed to clobber it. checkasm_fail_func() is a variadic function so also zero al before calling it. --- tests/checkasm/x86/checkasm.asm | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index 828352c..94b19b6 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -77,8 +77,10 @@ cglobal stack_clobber, 1,2 %if WIN64 %assign free_regs 7 +DECLARE_REG_TMP 4 %else %assign free_regs 9 +DECLARE_REG_TMP 7 %endif ;- @@ -86,7 +88,7 @@ cglobal stack_clobber, 1,2 ;- INIT_XMM cglobal checked_call, 2,15,16,max_args*8+8 -mov r6, r0 +mov t0, r0 ; All arguments have been pushed on the stack instead of registers in order to ; test for incorrect assumptions that 32-bit ints are zero-extended to 64-bit. @@ -129,7 +131,7 @@ cglobal checked_call, 2,15,16,max_args*8+8 mov r %+ i, [n %+ i] %assign i i-1 %endrep -call r6 +call t0 %assign i 14 %rep 15-free_regs xor r %+ i, [n %+ i] @@ -156,6 +158,7 @@ cglobal checked_call, 2,15,16,max_args*8+8 mov r9, rax mov r10, rdx lea r0, [error_message] +xor eax, eax call fail_func mov rdx, r10 mov rax, r9 -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
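For context, the rule being honored here is worth spelling out: the System V AMD64 ABI requires the caller of a variadic function to place an upper bound on the number of vector registers used for arguments in al before the call. A C compiler emits that automatically, which is easy to forget when the call site is hand-written assembly, as in checked_call. A hedged sketch of the kind of callee involved; the name is illustrative, not from checkasm:

```c
#include <assert.h>
#include <stdarg.h>

/* A variadic callee comparable in shape to checkasm_fail_func(). When a C
 * compiler emits a call to this, it sets al to the number of vector
 * registers carrying variadic arguments (0 here, all arguments are ints);
 * hand-written asm calling such a function must do the same. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}
```

A callee is allowed to use the value in al to decide how many vector registers to spill, so garbage in al can misbehave in arbitrary ways even when no floating-point arguments are passed.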
Re: [libav-devel] [PATCH] tiny_psnr: Use the correct abs() version
On Tue, Sep 22, 2015 at 9:28 PM, Vittorio Giovara wrote:
> I am puzzled as well, msdn reports this function available only from
> vs2013, but there is a vs2012 fate instance which seems to compile
> fine with it.

That wouldn't exactly be the first incorrect thing in the MSDN documentation though.
[libav-devel] [PATCH] checkasm: v210: Fix array overwrite
---
 tests/checkasm/v210enc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/checkasm/v210enc.c b/tests/checkasm/v210enc.c
index cdb8e76..4f5f6ba 100644
--- a/tests/checkasm/v210enc.c
+++ b/tests/checkasm/v210enc.c
@@ -43,7 +43,7 @@
         AV_WN32A(v0 + i, r);                       \
         AV_WN32A(v1 + i, r);                       \
     }                                              \
-    for (i = 0; i < BUF_SIZE * 8 / 3; i += 4) {    \
+    for (i = 0; i < width * 8 / 3; i += 4) {       \
         uint32_t r = rnd();                        \
         AV_WN32A(dst0 + i, r);                     \
         AV_WN32A(dst1 + i, r);                     \
-- 
1.8.3.2
[libav-devel] [PATCH] checkasm: add unit tests for v210enc
--- libavcodec/v210enc.c | 15 +--- libavcodec/v210enc.h | 2 + tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 ++ tests/checkasm/checkasm.h | 1 + tests/checkasm/v210enc.c | 94 +++ 6 files changed, 111 insertions(+), 5 deletions(-) create mode 100644 tests/checkasm/v210enc.c diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c index 42cbb86..ca6ad2e 100644 --- a/libavcodec/v210enc.c +++ b/libavcodec/v210enc.c @@ -82,6 +82,15 @@ static void v210_planar_pack_10_c(const uint16_t *y, const uint16_t *u, } } +av_cold void ff_v210enc_init(V210EncContext *s) +{ +s->pack_line_8 = v210_planar_pack_8_c; +s->pack_line_10 = v210_planar_pack_10_c; + +if (ARCH_X86) +ff_v210enc_init_x86(s); +} + static av_cold int encode_init(AVCodecContext *avctx) { V210EncContext *s = avctx->priv_data; @@ -97,11 +106,7 @@ FF_DISABLE_DEPRECATION_WARNINGS FF_ENABLE_DEPRECATION_WARNINGS #endif -s->pack_line_8 = v210_planar_pack_8_c; -s->pack_line_10 = v210_planar_pack_10_c; - -if (ARCH_X86) -ff_v210enc_init_x86(s); +ff_v210enc_init(s); return 0; } diff --git a/libavcodec/v210enc.h b/libavcodec/v210enc.h index be9b66d..81a3531 100644 --- a/libavcodec/v210enc.h +++ b/libavcodec/v210enc.h @@ -30,6 +30,8 @@ typedef struct V210EncContext { const uint16_t *v, uint8_t *dst, ptrdiff_t width); } V210EncContext; +void ff_v210enc_init(V210EncContext *s); + void ff_v210enc_init_x86(V210EncContext *s); #endif /* AVCODEC_V210ENC_H */ diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index ff57aa8..5fccad9 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -2,6 +2,7 @@ AVCODECOBJS-$(CONFIG_BSWAPDSP) += bswapdsp.o AVCODECOBJS-$(CONFIG_H264PRED) += h264pred.o AVCODECOBJS-$(CONFIG_H264QPEL) += h264qpel.o +AVCODECOBJS-$(CONFIG_V210_ENCODER) += v210enc.o CHECKASMOBJS-$(CONFIG_AVCODEC) += $(AVCODECOBJS-yes) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index a120bc3..3d47a43 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -66,6 
+66,9 @@ static const struct { #if CONFIG_H264QPEL { "h264qpel", checkasm_check_h264qpel }, #endif +#if CONFIG_V210_ENCODER +{ "v210enc", checkasm_check_v210enc }, +#endif { NULL } }; diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index cbf3dca..aa32655 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -32,6 +32,7 @@ void checkasm_check_bswapdsp(void); void checkasm_check_h264pred(void); void checkasm_check_h264qpel(void); +void checkasm_check_v210enc(void); void *checkasm_check_func(void *func, const char *name, ...) av_printf_format(2, 3); int checkasm_bench_func(void); diff --git a/tests/checkasm/v210enc.c b/tests/checkasm/v210enc.c new file mode 100644 index 000..08c79e4 --- /dev/null +++ b/tests/checkasm/v210enc.c @@ -0,0 +1,94 @@ +/* + * Copyright (c) 2015 Henrik Gramner + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * Libav is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with Libav; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#include +#include "checkasm.h" +#include "libavcodec/v210enc.h" +#include "libavutil/common.h" +#include "libavutil/internal.h" +#include "libavutil/intreadwrite.h" + +#define BUF_SIZE 512 + +#define randomize_buffers(mask)\ +do { \ +int i, size = sizeof(*y0); \ +for (i = 0; i < BUF_SIZE; i += 4/size) { \ +uint32_t r = rnd() & mask; \ +AV_WN32A(y0+i, r); \ +AV_WN32A(y1+i, r); \ +} \ +for (i = 0; i < BUF_SIZE/2; i += 4/size) { \ +uint32_t r = rnd() & mask; \ +AV_WN32A(u0+i, r); \ +AV_WN32A(u1+i, r); \ +r = rnd() & mask; \ +AV_WN32A(v0+i, r); \ +AV_WN32A(v1+i, r); \ +} \ +for (i = 0; i <
[libav-devel] [PATCH] checkasm: Fix floating point arguments on 64-bit Windows
--- tests/checkasm/x86/checkasm.asm | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index 4948fc9..828352c 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -103,16 +103,20 @@ cglobal checked_call, 2,15,16,max_args*8+8 mov [rsp+(i-6)*8], r9 %assign i i+1 %endrep -%else +%else ; WIN64 %assign i 4 %rep max_args-4 mov r9, [rsp+stack_offset+(i+7)*8] mov [rsp+i*8], r9 %assign i i+1 %endrep -%endif -%if WIN64 +; Move possible floating-point arguments to the correct registers +movq m0, r0 +movq m1, r1 +movq m2, r2 +movq m3, r3 + %assign i 6 %rep 16-6 mova m %+ i, [x %+ i] -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] hevcdsp: add x86 SIMD for MC
On Sun, Aug 23, 2015 at 8:27 PM, Anton Khirnov <an...@khirnov.net> wrote:
> Quoting James Almer (2015-08-22 23:58:41)
>> You need to use the d suffix instead of q on the register names to
>> make sure the high bits are cleared.
>
> Eh? Perhaps I'm misunderstanding something, but I'd expect that using d
> here would do exactly the opposite and keep the random data in the high
> bits.

Operations on 32-bit registers zero the high bits of the register.
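The zero-extension rule has a direct C analogue: converting a value through a 32-bit unsigned type discards the upper half and guarantees zeros when widening back, just like a write to a d register does on x86-64. A tiny standalone check, illustrative only and not part of any patch here:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors what writing to a 32-bit (d) register does on x86-64: the
 * result is zero-extended, so the upper 32 bits cannot retain stale data. */
static uint64_t low32(uint64_t x)
{
    uint32_t lo = (uint32_t)x; /* the "write to the d register" */
    return (uint64_t)lo;       /* upper 32 bits are guaranteed zero */
}
```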
Re: [libav-devel] [PATCH] checkasm: add HEVC MC tests
Minor nits:

> +#define randomize_buffers(buf, size, depth)

s/buffers/buffer/ since you're only randomizing a single one at a time.

> +static const char *interp_names[2][2] = { { "pixels", "h" }, { "v", "hv" } };

const char * const

Otherwise lgtm.
[libav-devel] [PATCH v2] checkasm: Explicitly declare function prototypes
Now we no longer have to rely on function pointers intentionally declared without specified argument types. This makes it easier to support functions with floating point parameters or return values as well as functions returning 64-bit values on 32-bit architectures. It also avoids having to explicitly cast strides to ptrdiff_t for example. --- v2: Updated to fix comments in x86/checkasm.asm --- tests/checkasm/Makefile | 3 --- tests/checkasm/bswapdsp.c | 2 ++ tests/checkasm/checkasm.c | 6 +++--- tests/checkasm/checkasm.h | 38 ++ tests/checkasm/h264pred.c | 32 tests/checkasm/h264qpel.c | 7 --- tests/checkasm/x86/checkasm.asm | 4 ++-- 7 files changed, 53 insertions(+), 39 deletions(-) diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 9498ebf..ff57aa8 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -17,9 +17,6 @@ CHECKASMDIRS := $(sort $(dir $(CHECKASMOBJS))) $(CHECKASMOBJS): | $(CHECKASMDIRS) OBJDIRS += $(CHECKASMDIRS) -# We rely on function pointers intentionally declared without specified argument types. 
-tests/checkasm/%.o: CFLAGS := $(CFLAGS:-Wstrict-prototypes=-Wno-strict-prototypes) - CHECKASM := tests/checkasm/checkasm$(EXESUF) $(CHECKASM): $(EXEOBJS) $(CHECKASMOBJS) $(FF_STATIC_DEP_LIBS) diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c index 748a886..829ebaa 100644 --- a/tests/checkasm/bswapdsp.c +++ b/tests/checkasm/bswapdsp.c @@ -43,6 +43,8 @@ #define check_bswap(type) \ do { \ int w; \ +declare_func(void, type *dst, const type *src, int w); \ + \ for (w = 0; w BUF_SIZE / sizeof(type); w++) { \ int offset = (BUF_SIZE / sizeof(type) - w) 15; /* Test various alignments */ \ randomize_buffers(); \ diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index b564e7e..a120bc3 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -111,7 +111,7 @@ static const struct { typedef struct CheckasmFuncVersion { struct CheckasmFuncVersion *next; -intptr_t (*func)(); +void *func; int ok; int cpu; int iterations; @@ -387,10 +387,10 @@ int main(int argc, char *argv[]) /* Decide whether or not the specified function needs to be tested and * allocate/initialize data structures if needed. Returns a pointer to a * reference function if the function should be tested, otherwise NULL */ -intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() +void *checkasm_check_func(void *func, const char *name, ...) { char name_buf[256]; -intptr_t (*ref)() = func; +void *ref = func; CheckasmFuncVersion *v; int name_length; va_list arg; diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index 443546a..cbf3dca 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -33,7 +33,7 @@ void checkasm_check_bswapdsp(void); void checkasm_check_h264pred(void); void checkasm_check_h264qpel(void); -intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() av_printf_format(2, 3); +void *checkasm_check_func(void *func, const char *name, ...) 
av_printf_format(2, 3); int checkasm_bench_func(void); void checkasm_fail_func(const char *msg, ...) av_printf_format(1, 2); void checkasm_update_bench(int iterations, uint64_t cycles); @@ -42,14 +42,16 @@ void checkasm_report(const char *name, ...) av_printf_format(1, 2); extern AVLFG checkasm_lfg; #define rnd() av_lfg_get(checkasm_lfg) -static av_unused intptr_t (*func_ref)(); -static av_unused intptr_t (*func_new)(); +static av_unused void *func_ref, *func_new; #define BENCH_RUNS 1000 /* Trade-off between accuracy and speed */ /* Decide whether or not the specified function needs to be tested */ -#define check_func(func, ...) ((func_new = (intptr_t (*)())func) \ - (func_ref = checkasm_check_func(func_new, __VA_ARGS__))) +#define check_func(func, ...) (func_ref = checkasm_check_func((func_new = func), __VA_ARGS__)) + +/* Declare the function prototype. The first argument is the return value, the remaining + * arguments are the function parameters. Naming parameters is optional. */ +#define declare_func(ret, ...) declare_new(ret, __VA_ARGS__) typedef ret func_type(__VA_ARGS__) /* Indicate that the current test has failed */ #define fail() checkasm_fail_func(%s:%d, av_basename(__FILE__), __LINE__) @@
Re: [libav-devel] [PATCH 7/8] checkasm: add HEVC MC tests
On Wed, Aug 19, 2015 at 9:43 PM, Anton Khirnov <an...@khirnov.net> wrote:
> +const int srcstride = FFALIGN(width, 16) * sizeof(*src0);
> +const int dststride = FFALIGN(width, 16) * PIXEL_SIZE(bit_depth);

Strides, and any other pointer-sized value, should be ptrdiff_t - or preferably, review/push my checkasm patch and rebase this one on top of that to get rid of the issue. :)

> +report("%s", "qpel");
> +report("%s", "epel");
> +report("%s", "unweighted_pred");
> +report("%s", "weighted_pred");

The "%s" is redundant with string literals.
[libav-devel] [PATCH] checkasm: Explicitly declare function prototypes
Now we no longer have to rely on function pointers intentionally declared without specified argument types. This makes it easier to support functions with floating point parameters or return values as well as functions returning 64-bit values on 32-bit architectures. It also avoids having to explicitly cast strides to ptrdiff_t for example. --- tests/checkasm/Makefile | 3 --- tests/checkasm/bswapdsp.c | 2 ++ tests/checkasm/checkasm.c | 6 +++--- tests/checkasm/checkasm.h | 38 ++ tests/checkasm/h264pred.c | 32 tests/checkasm/h264qpel.c | 7 --- tests/checkasm/x86/checkasm.asm | 2 +- 7 files changed, 52 insertions(+), 38 deletions(-) diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 9498ebf..ff57aa8 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -17,9 +17,6 @@ CHECKASMDIRS := $(sort $(dir $(CHECKASMOBJS))) $(CHECKASMOBJS): | $(CHECKASMDIRS) OBJDIRS += $(CHECKASMDIRS) -# We rely on function pointers intentionally declared without specified argument types. 
-tests/checkasm/%.o: CFLAGS := $(CFLAGS:-Wstrict-prototypes=-Wno-strict-prototypes) - CHECKASM := tests/checkasm/checkasm$(EXESUF) $(CHECKASM): $(EXEOBJS) $(CHECKASMOBJS) $(FF_STATIC_DEP_LIBS) diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c index 748a886..829ebaa 100644 --- a/tests/checkasm/bswapdsp.c +++ b/tests/checkasm/bswapdsp.c @@ -43,6 +43,8 @@ #define check_bswap(type) \ do { \ int w; \ +declare_func(void, type *dst, const type *src, int w); \ + \ for (w = 0; w BUF_SIZE / sizeof(type); w++) { \ int offset = (BUF_SIZE / sizeof(type) - w) 15; /* Test various alignments */ \ randomize_buffers(); \ diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index b564e7e..a120bc3 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -111,7 +111,7 @@ static const struct { typedef struct CheckasmFuncVersion { struct CheckasmFuncVersion *next; -intptr_t (*func)(); +void *func; int ok; int cpu; int iterations; @@ -387,10 +387,10 @@ int main(int argc, char *argv[]) /* Decide whether or not the specified function needs to be tested and * allocate/initialize data structures if needed. Returns a pointer to a * reference function if the function should be tested, otherwise NULL */ -intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() +void *checkasm_check_func(void *func, const char *name, ...) { char name_buf[256]; -intptr_t (*ref)() = func; +void *ref = func; CheckasmFuncVersion *v; int name_length; va_list arg; diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index 443546a..cbf3dca 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -33,7 +33,7 @@ void checkasm_check_bswapdsp(void); void checkasm_check_h264pred(void); void checkasm_check_h264qpel(void); -intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() av_printf_format(2, 3); +void *checkasm_check_func(void *func, const char *name, ...) 
av_printf_format(2, 3); int checkasm_bench_func(void); void checkasm_fail_func(const char *msg, ...) av_printf_format(1, 2); void checkasm_update_bench(int iterations, uint64_t cycles); @@ -42,14 +42,16 @@ void checkasm_report(const char *name, ...) av_printf_format(1, 2); extern AVLFG checkasm_lfg; #define rnd() av_lfg_get(checkasm_lfg) -static av_unused intptr_t (*func_ref)(); -static av_unused intptr_t (*func_new)(); +static av_unused void *func_ref, *func_new; #define BENCH_RUNS 1000 /* Trade-off between accuracy and speed */ /* Decide whether or not the specified function needs to be tested */ -#define check_func(func, ...) ((func_new = (intptr_t (*)())func) \ - (func_ref = checkasm_check_func(func_new, __VA_ARGS__))) +#define check_func(func, ...) (func_ref = checkasm_check_func((func_new = func), __VA_ARGS__)) + +/* Declare the function prototype. The first argument is the return value, the remaining + * arguments are the function parameters. Naming parameters is optional. */ +#define declare_func(ret, ...) declare_new(ret, __VA_ARGS__) typedef ret func_type(__VA_ARGS__) /* Indicate that the current test has failed */ #define fail() checkasm_fail_func(%s:%d, av_basename(__FILE__), __LINE__) @@ -58,18 +60,16 @@ static av_unused intptr_t (*func_new)();
[libav-devel] [PATCH] checkasm: x86: properly save rdx/edx in checked_call()
If the return value doesn't fit in a single register rdx/edx can in some cases be used in addition to rax/eax. Doesn't affect any of the existing checkasm tests but might be useful later. Also comment the relevant code a bit better. --- tests/checkasm/x86/checkasm.asm | 7 +++ 1 file changed, 7 insertions(+) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index cc8745f..38fc90e 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -145,10 +145,15 @@ cglobal checked_call, 2,15,16,max_args*8+8 or r14, r5 %endif +; Call fail_func() with a descriptive message to mark it as a failure +; if the called function didn't preserve all callee-saved registers. +; Save the return value located in rdx:rax first to prevent clobbering. jz .ok mov r9, rax +mov r10, rdx lea r0, [error_message] call fail_func +mov rdx, r10 mov rax, r9 .ok: RET @@ -182,9 +187,11 @@ cglobal checked_call, 1,7 or r3, r5 jz .ok mov r3, eax +mov r4, edx lea r0, [error_message] mov [esp], r0 call fail_func +mov edx, r4 mov eax, r3 .ok: add esp, max_args*4 -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
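A hedged illustration of when the rdx/edx half of the pair matters: on 32-bit x86, any C function returning a 64-bit value hands it back in edx:eax, for example a widening multiply. (On x86-64 this particular example fits in rax alone; there it takes a 128-bit result to occupy rdx:rax.)

```c
#include <assert.h>
#include <stdint.h>

/* On 32-bit x86 the 64-bit result of this function is returned in the
 * edx:eax register pair, which is why the 32-bit path of checked_call
 * must preserve edx as well as eax around the fail_func() call. */
static uint64_t mul32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

Without the save/restore added by this patch, a register-preservation failure report would silently corrupt the upper half of such a return value.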
[libav-devel] [PATCH] x86inc: Various minor backports from x264
--- libavutil/x86/x86inc.asm | 32 +--- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index a519fd5..6ad9785 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1,7 +1,7 @@ ;* ;* x86inc.asm: x264asm abstraction layer ;* -;* Copyright (C) 2005-2013 x264 project +;* Copyright (C) 2005-2015 x264 project ;* ;* Authors: Loren Merritt lor...@u.washington.edu ;* Anton Mitrofanov bugmas...@narod.ru @@ -67,6 +67,15 @@ %endif %endif +%define FORMAT_ELF 0 +%ifidn __OUTPUT_FORMAT__,elf +%define FORMAT_ELF 1 +%elifidn __OUTPUT_FORMAT__,elf32 +%define FORMAT_ELF 1 +%elifidn __OUTPUT_FORMAT__,elf64 +%define FORMAT_ELF 1 +%endif + %ifdef PREFIX %define mangle(x) _ %+ x %else @@ -688,7 +697,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, CAT_XDEFINE cglobaled_, %2, 1 %endif %xdefine current_function %2 -%ifidn __OUTPUT_FORMAT__,elf +%if FORMAT_ELF global %2:function %%VISIBILITY %else global %2 @@ -714,14 +723,16 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, ; like cextern, but without the prefix %macro cextern_naked 1 -%xdefine %1 mangle(%1) +%ifdef PREFIX +%xdefine %1 mangle(%1) +%endif CAT_XDEFINE cglobaled_, %1, 1 extern %1 %endmacro %macro const 1-2+ %xdefine %1 mangle(private_prefix %+ _ %+ %1) -%ifidn __OUTPUT_FORMAT__,elf +%if FORMAT_ELF global %1:data hidden %else global %1 @@ -729,10 +740,9 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %1: %2 %endmacro -; This is needed for ELF, otherwise the GNU linker assumes the stack is -; executable by default. -%ifidn __OUTPUT_FORMAT__,elf -[section .note.GNU-stack noalloc noexec nowrite progbits] +; This is needed for ELF, otherwise the GNU linker assumes the stack is executable by default. 
+%if FORMAT_ELF +[SECTION .note.GNU-stack noalloc noexec nowrite progbits] %endif ; cpuflags @@ -751,8 +761,8 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %assign cpuflags_avx (111)| cpuflags_sse42 %assign cpuflags_xop (112)| cpuflags_avx %assign cpuflags_fma4 (113)| cpuflags_avx -%assign cpuflags_avx2 (114)| cpuflags_avx -%assign cpuflags_fma3 (115)| cpuflags_avx +%assign cpuflags_fma3 (114)| cpuflags_avx +%assign cpuflags_avx2 (115)| cpuflags_fma3 %assign cpuflags_cache32 (116) %assign cpuflags_cache64 (117) @@ -801,7 +811,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %endif %endif -%if cpuflag(sse2) +%if ARCH_X86_64 || cpuflag(sse2) CPUNOP amdnop %else CPUNOP basicnop -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH] checkasm: Remove unnecessary include
---
 tests/checkasm/checkasm.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 82c635e..b564e7e 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -33,10 +33,6 @@
 #include <io.h>
 #endif
 
-#if ARCH_X86
-#include "libavutil/x86/cpu.h"
-#endif
-
 #if HAVE_SETCONSOLETEXTATTRIBUTE
 #include <windows.h>
 #define COLOR_RED FOREGROUND_RED
-- 
1.8.3.2
Re: [libav-devel] [PATCH 7/8] x86inc: nasm support
On Sat, Aug 1, 2015 at 5:27 PM, Henrik Gramner hen...@gramner.com wrote:
> ---
>  configure                |  3 ---
>  libavutil/x86/x86inc.asm | 42 +-
>  2 files changed, 29 insertions(+), 16 deletions(-)

Skip this one for now; nasm seems to have a bug with dependency generation when using smartalign. x264 doesn't handle dependencies the same way, so it worked fine there. I've filed a bug report with nasm.
Re: [libav-devel] [PATCH 8/8] x86inc: Various minor backports from x264
On Sat, Aug 1, 2015 at 9:34 PM, James Almer jamr...@gmail.com wrote:
> The same could be done in av_parse_cpu_flags(). It doesn't affect this
> patch, and can be done separately. Just throwing the idea out there.

Yeah, I guess.

> What about bmi/bmi2, for that matter?

What about them?
Re: [libav-devel] [PATCH] x86: dct: Disable dct32_float_sse on x86-64
On Sat, Aug 1, 2015 at 8:28 PM, Anton Khirnov an...@khirnov.net wrote:
> Any specific reason you use ARCH_X86_64 in one file and ARCH_X86_32 in
> the other?

I missed that there's a define for ARCH_X86_32 in asm (some other code used %if ARCH_X86_64 == 0, so I assumed it didn't). Using ARCH_X86_32 in both places is obviously clearer.
[libav-devel] [PATCH] x86: dcadsp: Avoid SSE2 instructions in SSE functions
---
 libavcodec/x86/dcadsp.asm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/x86/dcadsp.asm b/libavcodec/x86/dcadsp.asm
index c42ee23..c99df12 100644
--- a/libavcodec/x86/dcadsp.asm
+++ b/libavcodec/x86/dcadsp.asm
@@ -148,7 +148,7 @@ DECODE_HF
     addps   m4, va      ; va1+3 vb1+3 va2+4 vb2+4
     movhlps vb, m4      ; va1+3 vb1+3
     addps   vb, m4      ; va0..4 vb0..4
-    movh    [outq + count], vb
+    movlps  [outq + count], vb
 %if %1
     sub     cf0q, 8*NUM_COEF
 %endif
-- 
1.8.3.2
[libav-devel] [PATCH 3/8] x86inc: warn when instructions incompatible with current cpuflags are used
From: Anton Mitrofanov bugmas...@narod.ru Signed-off-by: Henrik Gramner hen...@gramner.com --- libavutil/x86/x86inc.asm | 587 --- 1 file changed, 299 insertions(+), 288 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index ae6813a..96ebe37 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1069,15 +1069,16 @@ INIT_XMM %endmacro ;%1 == instruction -;%2 == 1 if float, 0 if int -;%3 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise -;%4 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not -;%5+: operands -%macro RUN_AVX_INSTR 5-8+ -%ifnum sizeof%6 +;%2 == minimal instruction set +;%3 == 1 if float, 0 if int +;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise +;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not +;%6+: operands +%macro RUN_AVX_INSTR 6-9+ +%ifnum sizeof%7 +%assign __sizeofreg sizeof%7 +%elifnum sizeof%6 %assign __sizeofreg sizeof%6 -%elifnum sizeof%5 -%assign __sizeofreg sizeof%5 %else %assign __sizeofreg mmsize %endif @@ -1086,325 +1087,335 @@ INIT_XMM %xdefine __instr v%1 %else %xdefine __instr %1 -%if %0 = 7+%3 +%if %0 = 8+%4 %assign __emulate_avx 1 %endif %endif +%ifnidn %2, fnord +%ifdef cpuname +%if notcpuflag(%2) +%error use of ``%1'' %2 instruction in cpuname function: current_function +%elif cpuflags_%2 cpuflags_sse notcpuflag(sse2) __sizeofreg 8 +%error use of ``%1'' sse2 instruction in cpuname function: current_function +%endif +%endif +%endif %if __emulate_avx -%xdefine __src1 %6 -%xdefine __src2 %7 -%ifnidn %5, %6 -%if %0 = 8 -CHECK_AVX_INSTR_EMU {%1 %5, %6, %7, %8}, %5, %7, %8 +%xdefine __src1 %7 +%xdefine __src2 %8 +%ifnidn %6, %7 +%if %0 = 9 +CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, %8, %9 %else -CHECK_AVX_INSTR_EMU {%1 %5, %6, %7}, %5, %7 +CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, %8 %endif -%if %4 %3 == 0 -%ifnid %7 +%if %5 %4 == 0 +%ifnid %8 ; 3-operand AVX instructions with a memory arg 
can only have it in src2, ; whereas SSE emulation prefers to have it in src1 (i.e. the mov). ; So, if the instruction is commutative with a memory arg, swap them. -%xdefine __src1 %7 -%xdefine __src2 %6 +%xdefine __src1 %8 +%xdefine __src2 %7 %endif %endif %if __sizeofreg == 8 -MOVQ %5, __src1 -%elif %2 -MOVAPS %5, __src1 +MOVQ %6, __src1 +%elif %3 +MOVAPS %6, __src1 %else -MOVDQA %5, __src1 +MOVDQA %6, __src1 %endif %endif -%if %0 = 8 -%1 %5, __src2, %8 +%if %0 = 9 +%1 %6, __src2, %9 %else -%1 %5, __src2 +%1 %6, __src2 %endif -%elif %0 = 8 -__instr %5, %6, %7, %8 +%elif %0 = 9 +__instr %6, %7, %8, %9 +%elif %0 == 8 +__instr %6, %7, %8 %elif %0 == 7 -__instr %5, %6, %7 -%elif %0 == 6 -__instr %5, %6 +__instr %6, %7 %else -__instr %5 +__instr %6 %endif %endmacro ;%1 == instruction -;%2 == 1 if float, 0 if int -;%3 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise -;%4 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not -%macro AVX_INSTR 1-4 0, 1, 0 -%macro %1 1-9 fnord, fnord, fnord, fnord, %1, %2, %3, %4 +;%2 == minimal instruction set +;%3 == 1 if float, 0 if int +;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise +;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not +%macro AVX_INSTR 1-5 fnord, 0, 1, 0 +%macro %1 1-10 fnord, fnord, fnord, fnord, %1, %2, %3, %4, %5 %ifidn %2, fnord -RUN_AVX_INSTR %6, %7, %8, %9, %1 +RUN_AVX_INSTR %6, %7, %8, %9, %10, %1 %elifidn %3, fnord -RUN_AVX_INSTR %6, %7, %8, %9, %1, %2 +RUN_AVX_INSTR %6, %7, %8, %9, %10, %1, %2 %elifidn %4, fnord -RUN_AVX_INSTR %6, %7, %8, %9, %1, %2, %3 +RUN_AVX_INSTR %6, %7, %8, %9, %10, %1, %2, %3 %elifidn %5, fnord -RUN_AVX_INSTR %6, %7, %8, %9, %1, %2, %3, %4 +RUN_AVX_INSTR %6, %7, %8, %9, %10, %1, %2, %3, %4 %else
[libav-devel] [PATCH 8/8] x86inc: Various minor backports from x264
---
 libavutil/x86/x86inc.asm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index d70a5f9..0e2f447 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1,7 +1,7 @@
 ;*****************************************************************************
 ;* x86inc.asm: x264asm abstraction layer
 ;*****************************************************************************
-;* Copyright (C) 2005-2013 x264 project
+;* Copyright (C) 2005-2015 x264 project
 ;*
 ;* Authors: Loren Merritt lor...@u.washington.edu
 ;*          Anton Mitrofanov bugmas...@narod.ru
@@ -740,7 +740,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
 ; This is needed for ELF, otherwise the GNU linker assumes the stack is executable by default.
 %if FORMAT_ELF
-[section .note.GNU-stack noalloc noexec nowrite progbits]
+[SECTION .note.GNU-stack noalloc noexec nowrite progbits]
 %endif
 
 ; cpuflags
@@ -759,8 +759,8 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
 %assign cpuflags_avx      (1<<11) | cpuflags_sse42
 %assign cpuflags_xop      (1<<12) | cpuflags_avx
 %assign cpuflags_fma4     (1<<13) | cpuflags_avx
-%assign cpuflags_avx2     (1<<14) | cpuflags_avx
-%assign cpuflags_fma3     (1<<15) | cpuflags_avx
+%assign cpuflags_fma3     (1<<14) | cpuflags_avx
+%assign cpuflags_avx2     (1<<15) | cpuflags_fma3
 %assign cpuflags_cache32  (1<<16)
 %assign cpuflags_cache64  (1<<17)
@@ -809,7 +809,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
 %endif
 %endif
-%if cpuflag(sse2)
+%if ARCH_X86_64 || cpuflag(sse2)
 %ifdef __NASM_VER__
 ALIGNMODE k8
 %else
-- 
1.8.3.2
[libav-devel] [PATCH 1/8] x86inc: warn if XOP integer FMA instruction emulation is impossible
From: Anton Mitrofanov bugmas...@narod.ru

Emulation requires a temporary register if arguments 1 and 4 are the same; this doesn't obey the semantics of the original instruction, so we can't emulate that in x86inc. Also add pmacsdql emulation.

Signed-off-by: Henrik Gramner hen...@gramner.com
---
 libavutil/x86/x86inc.asm | 7 +++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index a6e1f33..4c0a4bd 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1410,15 +1410,18 @@ AVX_INSTR pfmul, 1, 0, 1
 %macro %1 4-7 %1, %2, %3
 %if cpuflag(xop)
 v%5 %1, %2, %3, %4
-%else
+%elifnidn %1, %4
 %6 %1, %2, %3
 %7 %1, %4
+%else
+%error non-xop emulation of ``%5 %1, %2, %3, %4'' is not supported
 %endif
 %endmacro
 %endmacro
 
-FMA_INSTR pmacsdd, pmulld, paddd
 FMA_INSTR pmacsww, pmullw, paddw
+FMA_INSTR pmacsdd, pmulld, paddd ; sse4 emulation
+FMA_INSTR pmacsdql, pmuldq, paddq ; sse4 emulation
 FMA_INSTR pmadcswd, pmaddwd, paddd
 
 ; tzcnt is equivalent to rep bsf and is backwards-compatible with bsf.
-- 
1.8.3.2
[libav-devel] [PATCH 2/8] x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not. --- libavcodec/x86/h264_deblock.asm | 4 +-- libavutil/x86/x86inc.asm| 62 ++--- 2 files changed, 42 insertions(+), 24 deletions(-) diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm index d2067c8..33fd5a9 100644 --- a/libavcodec/x86/h264_deblock.asm +++ b/libavcodec/x86/h264_deblock.asm @@ -444,13 +444,13 @@ cglobal deblock_%1_luma_8, 5,5,8,2*%2 ;int8_t *tc0) ;- INIT_MMX cpuname -cglobal deblock_h_luma_8, 0,5,8,0x60+HAVE_ALIGNED_STACK*12 +cglobal deblock_h_luma_8, 0,5,8,0x60+12 movr0, r0mp movr3, r1m lear4, [r3*3] subr0, 4 lear1, [r0+r4] -%define pix_tmp esp+12*HAVE_ALIGNED_STACK +%define pix_tmp esp+12 ; transpose 6x16 - tmp space TRANSPOSE6x8_MEM PASS8ROWS(r0, r1, r3, r4), pix_tmp diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 4c0a4bd..ae6813a 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -42,6 +42,17 @@ %define public_prefix private_prefix %endif +%if HAVE_ALIGNED_STACK +%define STACK_ALIGNMENT 16 +%endif +%ifndef STACK_ALIGNMENT +%if ARCH_X86_64 +%define STACK_ALIGNMENT 16 +%else +%define STACK_ALIGNMENT 4 +%endif +%endif + %define WIN64 0 %define UNIX64 0 %if ARCH_X86_64 @@ -108,8 +119,9 @@ ; %1 = number of arguments. loads them from stack if needed. ; %2 = number of registers used. pushes callee-saved regs if needed. ; %3 = number of xmm registers used. pushes callee-saved xmm regs if needed. -; %4 = (optional) stack size to be allocated. If not aligned (x86-32 ICC 10.x, -; MSVC or YMM), the stack will be manually aligned (to 16 or 32 bytes), +; %4 = (optional) stack size to be allocated. The stack will be aligned before +; allocating the specified stack size. 
If the required stack alignment is +; larger than the known stack alignment the stack will be manually aligned ; and an extra register will be allocated to hold the original stack ; pointer (to not invalidate r0m etc.). To prevent the use of an extra ; register as stack pointer, request a negative stack size. @@ -117,8 +129,10 @@ ; PROLOGUE can also be invoked by adding the same options to cglobal ; e.g. -; cglobal foo, 2,3,0, dst, src, tmp -; declares a function (foo), taking two args (dst and src) and one local variable (tmp) +; cglobal foo, 2,3,7,0x40, dst, src, tmp +; declares a function (foo) that automatically loads two arguments (dst and +; src) into registers, uses one additional register (tmp) plus 7 vector +; registers (m0-m6) and allocates 0x40 bytes of stack space. ; TODO Some functions can use some args directly from the stack. If they're the ; last args then you can just not declare them, but if they're in the middle @@ -319,26 +333,28 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 %assign n_arg_names %0 %endmacro +%define required_stack_alignment ((mmsize + 15) ~15) + %macro ALLOC_STACK 1-2 0 ; stack_size, n_xmm_regs (for win64 only) %ifnum %1 %if %1 != 0 -%assign %%stack_alignment ((mmsize + 15) ~15) +%assign %%pad 0 %assign stack_size %1 %if stack_size 0 %assign stack_size -stack_size %endif -%assign stack_size_padded stack_size %if WIN64 -%assign stack_size_padded stack_size_padded + 32 ; reserve 32 bytes for shadow space +%assign %%pad %%pad + 32 ; shadow space %if mmsize != 8 %assign xmm_regs_used %2 %if xmm_regs_used 8 -%assign stack_size_padded stack_size_padded + (xmm_regs_used-8)*16 +%assign %%pad %%pad + (xmm_regs_used-8)*16 ; callee-saved xmm registers %endif %endif %endif -%if mmsize = 16 HAVE_ALIGNED_STACK -%assign stack_size_padded stack_size_padded + %%stack_alignment - gprsize - (stack_offset (%%stack_alignment - 1)) +%if required_stack_alignment = STACK_ALIGNMENT +; maintain the current stack alignment +%assign 
stack_size_padded stack_size + %%pad + ((-%%pad-stack_offset-gprsize) (STACK_ALIGNMENT-1)) SUB rsp, stack_size_padded %else %assign %%reg_num (regs_used - 1) @@ -347,17 +363,17 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 ; it, i.e. in [rsp+stack_size_padded], so we can restore the ; stack in a single
[libav-devel] [PATCH 0/8] x86inc: Sync changes from x264
This brings x86inc.asm in libavutil up to date with x86inc.asm in x264. They're not 100% identical but the difference is tiny compared to before.

Anton Mitrofanov (2):
  x86inc: warn if XOP integer FMA instruction emulation is impossible
  x86inc: warn when instructions incompatible with current cpuflags are used

Christophe Gisquet (1):
  x86inc: Fix instantiation of YMM registers

Henrik Gramner (5):
  x86inc: Support arbitrary stack alignments
  x86inc: Disable vpbroadcastq workaround in newer yasm versions
  x86inc: Drop SECTION_TEXT macro
  x86inc: nasm support
  x86inc: Various minor backports from x264

 configure                           |   3 -
 libavcodec/x86/apedsp.asm           |   2 +-
 libavcodec/x86/audiodsp.asm         |   2 +-
 libavcodec/x86/bswapdsp.asm         |   2 +-
 libavcodec/x86/dcadsp.asm           |   2 +-
 libavcodec/x86/dct32.asm            |   2 +-
 libavcodec/x86/fft.asm              |   2 +-
 libavcodec/x86/fmtconvert.asm       |   2 +-
 libavcodec/x86/h263_loopfilter.asm  |   2 +-
 libavcodec/x86/h264_deblock.asm     |   4 +-
 libavcodec/x86/hpeldsp.asm          |   2 +-
 libavcodec/x86/huffyuvdsp.asm       |   2 +-
 libavcodec/x86/imdct36.asm          |   2 +-
 libavcodec/x86/pngdsp.asm           |   2 +-
 libavcodec/x86/qpeldsp.asm          |   2 +-
 libavcodec/x86/sbrdsp.asm           |   2 +-
 libavfilter/x86/af_volume.asm       |   2 +-
 libavresample/x86/audio_convert.asm |   2 +-
 libavresample/x86/audio_mix.asm     |   2 +-
 libavresample/x86/dither.asm        |   2 +-
 libavutil/x86/x86inc.asm            | 742 +++-
 21 files changed, 410 insertions(+), 375 deletions(-)

-- 
1.8.3.2
[libav-devel] [PATCH 6/8] x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`. --- libavcodec/x86/apedsp.asm | 2 +- libavcodec/x86/audiodsp.asm | 2 +- libavcodec/x86/bswapdsp.asm | 2 +- libavcodec/x86/dcadsp.asm | 2 +- libavcodec/x86/dct32.asm| 2 +- libavcodec/x86/fft.asm | 2 +- libavcodec/x86/fmtconvert.asm | 2 +- libavcodec/x86/h263_loopfilter.asm | 2 +- libavcodec/x86/hpeldsp.asm | 2 +- libavcodec/x86/huffyuvdsp.asm | 2 +- libavcodec/x86/imdct36.asm | 2 +- libavcodec/x86/pngdsp.asm | 2 +- libavcodec/x86/qpeldsp.asm | 2 +- libavcodec/x86/sbrdsp.asm | 2 +- libavfilter/x86/af_volume.asm | 2 +- libavresample/x86/audio_convert.asm | 2 +- libavresample/x86/audio_mix.asm | 2 +- libavresample/x86/dither.asm| 2 +- libavutil/x86/x86inc.asm| 12 19 files changed, 18 insertions(+), 30 deletions(-) diff --git a/libavcodec/x86/apedsp.asm b/libavcodec/x86/apedsp.asm index d721ebd..d6abd98 100644 --- a/libavcodec/x86/apedsp.asm +++ b/libavcodec/x86/apedsp.asm @@ -20,7 +20,7 @@ %include libavutil/x86/x86util.asm -SECTION_TEXT +SECTION .text %macro SCALARPRODUCT 0 ; int ff_scalarproduct_and_madd_int16(int16_t *v1, int16_t *v2, int16_t *v3, diff --git a/libavcodec/x86/audiodsp.asm b/libavcodec/x86/audiodsp.asm index f2e831d..696a73b 100644 --- a/libavcodec/x86/audiodsp.asm +++ b/libavcodec/x86/audiodsp.asm @@ -21,7 +21,7 @@ %include libavutil/x86/x86util.asm -SECTION_TEXT +SECTION .text %macro SCALARPRODUCT 0 ; int ff_scalarproduct_int16(int16_t *v1, int16_t *v2, int order) diff --git a/libavcodec/x86/bswapdsp.asm b/libavcodec/x86/bswapdsp.asm index 42580a3..4810867 100644 --- a/libavcodec/x86/bswapdsp.asm +++ b/libavcodec/x86/bswapdsp.asm @@ -24,7 +24,7 @@ SECTION_RODATA pb_bswap32: db 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 -SECTION_TEXT +SECTION .text ; %1 = aligned/unaligned %macro BSWAP_LOOPS 1 diff --git a/libavcodec/x86/dcadsp.asm b/libavcodec/x86/dcadsp.asm index c99df12..18c7a0c 100644 --- 
a/libavcodec/x86/dcadsp.asm +++ b/libavcodec/x86/dcadsp.asm @@ -24,7 +24,7 @@ SECTION_RODATA pf_inv16: times 4 dd 0x3D80 ; 1/16 -SECTION_TEXT +SECTION .text ; void decode_hf(float dst[DCA_SUBBANDS][8], const int32_t vq_num[DCA_SUBBANDS], ;const int8_t hf_vq[1024][32], intptr_t vq_offset, diff --git a/libavcodec/x86/dct32.asm b/libavcodec/x86/dct32.asm index fa723b0..c7d2b6b 100644 --- a/libavcodec/x86/dct32.asm +++ b/libavcodec/x86/dct32.asm @@ -191,7 +191,7 @@ ps_p1p1m1m1: dd 0, 0, 0x8000, 0x8000, 0, 0, 0x8000, 0x8000 %endmacro INIT_YMM avx -SECTION_TEXT +SECTION .text ; void ff_dct32_float_avx(FFTSample *out, const FFTSample *in) cglobal dct32_float, 2,3,8, out, in, tmp ; pass 1 diff --git a/libavcodec/x86/fft.asm b/libavcodec/x86/fft.asm index e4744a3..d3be72e 100644 --- a/libavcodec/x86/fft.asm +++ b/libavcodec/x86/fft.asm @@ -90,7 +90,7 @@ cextern cos_ %+ i %1 %endmacro -SECTION_TEXT +SECTION .text %macro T2_3DNOW 4 ; z0, z1, mem0, mem1 mova %1, %3 diff --git a/libavcodec/x86/fmtconvert.asm b/libavcodec/x86/fmtconvert.asm index e82f149..727daa9 100644 --- a/libavcodec/x86/fmtconvert.asm +++ b/libavcodec/x86/fmtconvert.asm @@ -21,7 +21,7 @@ %include libavutil/x86/x86util.asm -SECTION_TEXT +SECTION .text ;-- ; void ff_int32_to_float_fmul_scalar(float *dst, const int32_t *src, float mul, diff --git a/libavcodec/x86/h263_loopfilter.asm b/libavcodec/x86/h263_loopfilter.asm index 673f795..cd726ba 100644 --- a/libavcodec/x86/h263_loopfilter.asm +++ b/libavcodec/x86/h263_loopfilter.asm @@ -24,7 +24,7 @@ SECTION_RODATA cextern pb_FC cextern h263_loop_filter_strength -SECTION_TEXT +SECTION .text %macro H263_LOOP_FILTER 5 pxor m7, m7 diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm index 073f7f9..b8929b9 100644 --- a/libavcodec/x86/hpeldsp.asm +++ b/libavcodec/x86/hpeldsp.asm @@ -23,7 +23,7 @@ SECTION_RODATA cextern pb_1 -SECTION_TEXT +SECTION .text ; void ff_put_pixels8_x2(uint8_t *block, const uint8_t *pixels, ptrdiff_t line_size, int h) %macro 
PUT_PIXELS8_X2 0 diff --git a/libavcodec/x86/huffyuvdsp.asm b/libavcodec/x86/huffyuvdsp.asm index 436abc8..e7536da 100644 --- a/libavcodec/x86/huffyuvdsp.asm +++ b/libavcodec/x86/huffyuvdsp.asm @@ -28,7 +28,7 @@ pb_7: times 8 db 7 pb_: db -1,-1,-1,-1,3,3,3,3,-1,-1,-1,-1,11,11,11,11 pb_zz11zz55zz99zzdd: db -1,-1,1,1,-1,-1,5,5,-1,-1,9,9,-1,-1,13,13 -SECTION_TEXT +SECTION .text ; void ff_add_hfyu_median_pred_mmxext(uint8_t *dst, const uint8_t *top, ; const uint8_t *diff, int w, diff --git a/libavcodec/x86/imdct36.asm
[libav-devel] [PATCH 5/8] x86inc: Disable vpbroadcastq workaround in newer yasm versions
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
---
 libavutil/x86/x86inc.asm | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 2844fdf..d4ce68f 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1499,13 +1499,15 @@ FMA4_INSTR fnmsubps, fnmsub132ps, fnmsub213ps, fnmsub231ps
 FMA4_INSTR fnmsubsd, fnmsub132sd, fnmsub213sd, fnmsub231sd
 FMA4_INSTR fnmsubss, fnmsub132ss, fnmsub213ss, fnmsub231ss
-; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug
-%if ARCH_X86_64 == 0
-%macro vpbroadcastq 2
-%if sizeof%1 == 16
-movddup %1, %2
-%else
-vbroadcastsd %1, %2
-%endif
-%endmacro
+; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug (fixed in 1.3.0)
+%ifdef __YASM_VER__
+%if __YASM_VERSION_ID__ < 0x01030000 && ARCH_X86_64 == 0
+%macro vpbroadcastq 2
+%if sizeof%1 == 16
+movddup %1, %2
+%else
+vbroadcastsd %1, %2
+%endif
+%endmacro
+%endif
 %endif
-- 
1.8.3.2
[libav-devel] [PATCH 7/8] x86inc: nasm support
--- configure| 3 --- libavutil/x86/x86inc.asm | 42 +- 2 files changed, 29 insertions(+), 16 deletions(-) diff --git a/configure b/configure index 482be43..79dd3a5 100755 --- a/configure +++ b/configure @@ -1353,7 +1353,6 @@ ARCH_EXT_LIST_PPC= ARCH_EXT_LIST_X86= $ARCH_EXT_LIST_X86_SIMD -cpunop i686 @@ -1732,7 +1731,6 @@ ppc4xx_deps=ppc vsx_deps=altivec power8_deps=vsx -cpunop_deps=i686 x86_64_select=i686 x86_64_suggest=fast_cmov @@ -4151,7 +4149,6 @@ EOF check_yasm vpmacsdd xmm0, xmm1, xmm2, xmm3 || disable xop_external check_yasm vfmadd132ps ymm0, ymm1, ymm2|| disable fma3_external check_yasm vfmaddps ymm0, ymm1, ymm2, ymm3 || disable fma4_external -check_yasm CPU amdnop || disable cpunop fi case $cpu in diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index a519fd5..d70a5f9 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -67,6 +67,15 @@ %endif %endif +%define FORMAT_ELF 0 +%ifidn __OUTPUT_FORMAT__,elf +%define FORMAT_ELF 1 +%elifidn __OUTPUT_FORMAT__,elf32 +%define FORMAT_ELF 1 +%elifidn __OUTPUT_FORMAT__,elf64 +%define FORMAT_ELF 1 +%endif + %ifdef PREFIX %define mangle(x) _ %+ x %else @@ -96,11 +105,9 @@ default rel %endif -%macro CPUNOP 1 -%if HAVE_CPUNOP -CPU %1 -%endif -%endmacro +%ifdef __NASM_VER__ +%use smartalign +%endif ; Macros to eliminate most code duplication between x86_32 and x86_64: ; Currently this works only for leaf functions which load all their arguments @@ -688,7 +695,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, CAT_XDEFINE cglobaled_, %2, 1 %endif %xdefine current_function %2 -%ifidn __OUTPUT_FORMAT__,elf +%if FORMAT_ELF global %2:function %%VISIBILITY %else global %2 @@ -714,14 +721,16 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, ; like cextern, but without the prefix %macro cextern_naked 1 -%xdefine %1 mangle(%1) +%ifdef PREFIX +%xdefine %1 mangle(%1) +%endif CAT_XDEFINE cglobaled_, %1, 1 extern %1 %endmacro %macro const 
1-2+ %xdefine %1 mangle(private_prefix %+ _ %+ %1) -%ifidn __OUTPUT_FORMAT__,elf +%if FORMAT_ELF global %1:data hidden %else global %1 @@ -729,9 +738,8 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %1: %2 %endmacro -; This is needed for ELF, otherwise the GNU linker assumes the stack is -; executable by default. -%ifidn __OUTPUT_FORMAT__,elf +; This is needed for ELF, otherwise the GNU linker assumes the stack is executable by default. +%if FORMAT_ELF [section .note.GNU-stack noalloc noexec nowrite progbits] %endif @@ -802,9 +810,17 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %endif %if cpuflag(sse2) -CPUNOP amdnop +%ifdef __NASM_VER__ +ALIGNMODE k8 +%else +CPU amdnop +%endif %else -CPUNOP basicnop +%ifdef __NASM_VER__ +ALIGNMODE nop +%else +CPU basicnop +%endif %endif %endmacro -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 4/8] x86inc: Fix instantiation of YMM registers
From: Christophe Gisquet christophe.gisq...@gmail.com

Signed-off-by: Henrik Gramner hen...@gramner.com
---
 libavutil/x86/x86inc.asm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 96ebe37..2844fdf 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -893,7 +893,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
 %assign %%i 0
 %rep num_mmregs
 CAT_XDEFINE m, %%i, ymm %+ %%i
-CAT_XDEFINE nymm, %%i, %%i
+CAT_XDEFINE nnymm, %%i, %%i
 %assign %%i %%i+1
 %endrep
 INIT_CPUFLAGS %1
-- 
1.8.3.2
[libav-devel] [PATCH] x86: dct: Disable dct32_float_sse on x86-64
There is an SSE2 implementation so the SSE version is never used. The SSE version also happens to contain SSE2 instructions on x86-64.
---
 libavcodec/x86/dct32.asm  | 3 +++
 libavcodec/x86/dct_init.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/libavcodec/x86/dct32.asm b/libavcodec/x86/dct32.asm
index 9c147b9..fa723b0 100644
--- a/libavcodec/x86/dct32.asm
+++ b/libavcodec/x86/dct32.asm
@@ -482,7 +482,10 @@ cglobal dct32_float, 2, 3, 16, out, in, tmp
 %endif
 %endmacro
 
+%if ARCH_X86_64 == 0
 INIT_XMM sse
 DCT32_FUNC
+%endif
+
 INIT_XMM sse2
 DCT32_FUNC

diff --git a/libavcodec/x86/dct_init.c b/libavcodec/x86/dct_init.c
index ca9fbc7..b2e43a9 100644
--- a/libavcodec/x86/dct_init.c
+++ b/libavcodec/x86/dct_init.c
@@ -30,8 +30,10 @@ av_cold void ff_dct_init_x86(DCTContext *s)
 {
     int cpu_flags = av_get_cpu_flags();
 
+#if ARCH_X86_32
     if (EXTERNAL_SSE(cpu_flags))
         s->dct32 = ff_dct32_float_sse;
+#endif
     if (EXTERNAL_SSE2(cpu_flags))
         s->dct32 = ff_dct32_float_sse2;
     if (EXTERNAL_AVX_FAST(cpu_flags))
-- 
1.8.3.2
Re: [libav-devel] [PATCH] x86: dcadsp: Avoid SSE2 instructions in SSE functions
On Sat, Aug 1, 2015 at 8:49 PM, James Almer jamr...@gmail.com wrote:
> I however think movq/sd should be used here for sse2 and above instead
> of movlps.

That's a moot point in this case since the code in question is SSE only (and even if it weren't, I'm skeptical of the claim that it would be measurably slower than movsd).
Re: [libav-devel] [PATCH] checkasm: Include io.h for isatty, if available
On Wed, Jul 29, 2015 at 10:09 PM, Martin Storsjö mar...@martin.st wrote:
> configure does check for isatty, and checkasm properly checks
> HAVE_ISATTY, but on some platforms (e.g. WinRT), io.h needs to be
> included for isatty to be available.

Ok.
[libav-devel] [PATCH 2/2] checkasm: Use LOCAL_ALIGNED
From: Michael Niedermayer mich...@niedermayer.cc Fixes alignment issues and bus errors. --- tests/checkasm/bswapdsp.c | 9 + tests/checkasm/h264pred.c | 5 +++-- tests/checkasm/h264qpel.c | 9 + 3 files changed, 13 insertions(+), 10 deletions(-) diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c index 3871029..748a886 100644 --- a/tests/checkasm/bswapdsp.c +++ b/tests/checkasm/bswapdsp.c @@ -22,6 +22,7 @@ #include checkasm.h #include libavcodec/bswapdsp.h #include libavutil/common.h +#include libavutil/internal.h #include libavutil/intreadwrite.h #define BUF_SIZE 512 @@ -55,10 +56,10 @@ void checkasm_check_bswapdsp(void) { -DECLARE_ALIGNED(16, uint8_t, src0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, src1)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, dst0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, dst1)[BUF_SIZE]; +LOCAL_ALIGNED_16(uint8_t, src0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, src1, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, dst0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, dst1, [BUF_SIZE]); BswapDSPContext h; ff_bswapdsp_init(h); diff --git a/tests/checkasm/h264pred.c b/tests/checkasm/h264pred.c index 40e949a..08f23e6 100644 --- a/tests/checkasm/h264pred.c +++ b/tests/checkasm/h264pred.c @@ -23,6 +23,7 @@ #include libavcodec/avcodec.h #include libavcodec/h264pred.h #include libavutil/common.h +#include libavutil/internal.h #include libavutil/intreadwrite.h static const int codec_ids[4] = { AV_CODEC_ID_H264, AV_CODEC_ID_VP8, AV_CODEC_ID_RV40, AV_CODEC_ID_SVQ3 }; @@ -232,8 +233,8 @@ void checkasm_check_h264pred(void) { check_pred8x8l, pred8x8l }, }; -DECLARE_ALIGNED(16, uint8_t, buf0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, buf1)[BUF_SIZE]; +LOCAL_ALIGNED_16(uint8_t, buf0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, buf1, [BUF_SIZE]); H264PredContext h; int test, codec, chroma_format, bit_depth; diff --git a/tests/checkasm/h264qpel.c b/tests/checkasm/h264qpel.c index 550f9d9..f734945 100644 --- a/tests/checkasm/h264qpel.c +++ b/tests/checkasm/h264qpel.c @@ 
-22,6 +22,7 @@ #include checkasm.h #include libavcodec/h264qpel.h #include libavutil/common.h +#include libavutil/internal.h #include libavutil/intreadwrite.h static const uint32_t pixel_mask[3] = { 0x, 0x01ff01ff, 0x03ff03ff }; @@ -48,10 +49,10 @@ static const uint32_t pixel_mask[3] = { 0x, 0x01ff01ff, 0x03ff03ff }; void checkasm_check_h264qpel(void) { -DECLARE_ALIGNED(16, uint8_t, buf0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, buf1)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, dst0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, dst1)[BUF_SIZE]; +LOCAL_ALIGNED_16(uint8_t, buf0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, buf1, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, dst0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, dst1, [BUF_SIZE]); H264QpelContext h; int op, bit_depth, i, j; -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 1/2] checkasm: Modify report format
Makes it a bit more clear where each test belongs. Suggested by Anton Khirnov.
---
 tests/checkasm/checkasm.c | 57 ++++++++++++++++++++++----------------------
 tests/checkasm/checkasm.h |  2 +-
 tests/checkasm/h264qpel.c |  2 +-
 3 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index e6cf3d7..f1e9cd9 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -53,17 +53,20 @@
 #endif

 /* List of tests to invoke */
-static void (* const tests[])(void) = {
+static const struct {
+    const char *name;
+    void (*func)(void);
+} tests[] = {
 #if CONFIG_BSWAPDSP
-    checkasm_check_bswapdsp,
+    { "bswapdsp", checkasm_check_bswapdsp },
 #endif
 #if CONFIG_H264PRED
-    checkasm_check_h264pred,
+    { "h264pred", checkasm_check_h264pred },
 #endif
 #if CONFIG_H264QPEL
-    checkasm_check_h264qpel,
+    { "h264qpel", checkasm_check_h264qpel },
 #endif
-    NULL
+    { NULL }
 };

 /* List of cpu flags to check */
@@ -127,6 +130,7 @@ static struct {
     CheckasmFunc *funcs;
     CheckasmFunc *current_func;
     CheckasmFuncVersion *current_func_ver;
+    const char *current_test_name;
     const char *bench_pattern;
     int bench_pattern_len;
     int num_checked;
@@ -314,8 +318,10 @@ static void check_cpu_flag(const char *name, int flag)
         int i;

         state.cpu_flag_name = name;
-        for (i = 0; tests[i]; i++)
-            tests[i]();
+        for (i = 0; tests[i].func; i++) {
+            state.current_test_name = tests[i].name;
+            tests[i].func();
+        }
     }
 }

@@ -332,7 +338,7 @@ int main(int argc, char *argv[])
 {
     int i, seed, ret = 0;

-    if (!tests[0] || !cpus[0].flag) {
+    if (!tests[0].func || !cpus[0].flag) {
         fprintf(stderr, "checkasm: no tests to perform\n");
         return 0;
     }
@@ -464,19 +470,15 @@ void checkasm_report(const char *name, ...)
     static int prev_checked, prev_failed, max_length;

     if (state.num_checked > prev_checked) {
-        print_cpu_name();
-
-        if (*name) {
-            int pad_length = max_length;
-            va_list arg;
+        int pad_length = max_length + 4;
+        va_list arg;

-            fprintf(stderr, " - ");
-            va_start(arg, name);
-            pad_length -= vfprintf(stderr, name, arg);
-            va_end(arg);
-            fprintf(stderr, "%*c", FFMAX(pad_length, 0) + 2, '[');
-        } else
-            fprintf(stderr, " - %-*s [", max_length, state.current_func->name);
+        print_cpu_name();
+        pad_length -= fprintf(stderr, " - %s.", state.current_test_name);
+        va_start(arg, name);
+        pad_length -= vfprintf(stderr, name, arg);
+        va_end(arg);
+        fprintf(stderr, "%*c", FFMAX(pad_length, 0) + 2, '[');

         if (state.num_failed == prev_failed)
             color_printf(COLOR_GREEN, "OK");
@@ -487,16 +489,13 @@ void checkasm_report(const char *name, ...)
         prev_checked = state.num_checked;
         prev_failed  = state.num_failed;
     } else if (!state.cpu_flag) {
-        int length;
-
         /* Calculate the amount of padding required to make the output vertically aligned */
-        if (*name) {
-            va_list arg;
-            va_start(arg, name);
-            length = vsnprintf(NULL, 0, name, arg);
-            va_end(arg);
-        } else
-            length = strlen(state.current_func->name);
+        int length = strlen(state.current_test_name);
+        va_list arg;
+
+        va_start(arg, name);
+        length += vsnprintf(NULL, 0, name, arg);
+        va_end(arg);

         if (length > max_length)
             max_length = length;
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index b7a36ee..443546a 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -55,7 +55,7 @@ static av_unused intptr_t (*func_new)();
 #define fail() checkasm_fail_func("%s:%d", av_basename(__FILE__), __LINE__)

 /* Print the test outcome */
-#define report(...) checkasm_report("" __VA_ARGS__)
+#define report checkasm_report

 /* Call the reference function */
 #define call_ref(...) func_ref(__VA_ARGS__)
diff --git a/tests/checkasm/h264qpel.c b/tests/checkasm/h264qpel.c
index 01b97ae..550f9d9 100644
--- a/tests/checkasm/h264qpel.c
+++ b/tests/checkasm/h264qpel.c
@@ -74,6 +74,6 @@ void checkasm_check_h264qpel(void)
                 }
             }
         }
-        report("%s_h264_qpel", op_name);
+        report("%s", op_name);
     }
 }
--
1.8.3.2

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] [RFC] use a wrapper script to call MS link.exe to avoid mixing with /usr/bin/link.exe
On Thu, Jul 23, 2015 at 7:23 PM, Steve Lhomme rob...@gmail.com wrote:
> On Thu, Jul 23, 2015 at 7:02 PM, Derek Buitenhuis derek.buitenh...@gmail.com wrote:
>> Broken permissions.
>
> Not sure how I can tweak that under Windows.

git update-index --chmod=+x file
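For reference, a quick sketch of what that command does. The executable bit is recorded in the git index itself, so this works even on Windows filesystems that have no POSIX permission bits; the wrapper-script filename below is a hypothetical placeholder for the actual file in the patch:

```shell
# Record the executable bit directly in the git index
# ("mslink-wrapper.sh" is a hypothetical stand-in for the patched file).
git update-index --chmod=+x mslink-wrapper.sh

# Confirm the staged mode: 100755 means the executable bit is set.
git ls-files --stage mslink-wrapper.sh
```

The change then shows up as a mode change (100644 -> 100755) in the next commit.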
Re: [libav-devel] [PATCH] [RFC] use a wrapper script to call MS link.exe to avoid mixing with /usr/bin/link.exe
On Thu, Jul 23, 2015 at 9:04 AM, Martin Storsjö mar...@martin.st wrote:
> Why is this suddenly using command instead of which now? This won't work
> in a linux environment.

Why wouldn't it work in a linux environment? `command` is POSIX.

This stackoverflow post sums it up fairly well:
https://stackoverflow.com/a/677212
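To illustrate the point being made: `command -v` is a POSIX shell builtin, whereas `which` is an external program whose output and exit status vary between systems. A minimal sketch of a configure-style probe (the `find_program` helper and the `link` lookup are illustrative, not the actual patch):

```shell
#!/bin/sh
# Portable program lookup: command -v is specified by POSIX and works in
# dash, bash, busybox sh, MSYS sh, etc.; which(1) is not standardized.
find_program() {
    command -v "$1" 2>/dev/null
}

# Hypothetical usage: locate a linker binary, if present in PATH.
if LINK=$(find_program link); then
    echo "using linker: $LINK"
else
    echo "link not found in PATH" >&2
fi
```

The `if LINK=$(...)` form also gives a clean success/failure test without parsing any output.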
Re: [libav-devel] [PATCH 1/1] checkasm: remove empty array initializer list in h264pred test
On Mon, Jul 20, 2015 at 11:18 PM, Janne Grunau janne-li...@jannau.net wrote:
> Fixes MSVC compilation.
> ---
>  tests/checkasm/h264pred.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Ok.
Re: [libav-devel] [PATCH 1/1] checkasm: fix MSVC build by adding a zero initializer for an empty array
On Mon, Jul 20, 2015 at 11:58 AM, Janne Grunau janne-li...@jannau.net wrote:
> ---
>  tests/checkasm/h264pred.c | 1 +
>  1 file changed, 1 insertion(+)

Shouldn't it be NULL instead of 0 since those are pointers? Otherwise OK.
Re: [libav-devel] [PATCH 2/4] checkasm: test all architectures with optimisations
lgtm.
[libav-devel] [PATCH 2/2] tests/checkasm/checkasm: Give macro a body to avoid potential unexpected syntax issues
From: Michael Niedermayer mich...@niedermayer.cc

Signed-off-by: Michael Niedermayer mich...@niedermayer.cc
---
 tests/checkasm/checkasm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 1a46e9b..b54be16 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -110,7 +110,7 @@ void checkasm_stack_clobber(uint64_t clobber, ...);
     }                                           \
 } while (0)
 #else
-#define bench_new(...)
+#define bench_new(...) while(0)
 #endif

 #endif
--
1.8.3.2
[libav-devel] [PATCH 1/2] checkasm: exit with status 0 instead of 1 if there are no tests to perform
---
 tests/checkasm/checkasm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 7b1ea8f..0aa3d1c 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -317,7 +317,7 @@ int main(int argc, char *argv[])

     if (!tests[0] || !cpus[0].flag) {
         fprintf(stderr, "checkasm: no tests to perform\n");
-        return 1;
+        return 0;
     }

     if (argc > 1 && !strncmp(argv[1], "--bench", 7)) {
--
1.8.3.2
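The rationale behind this one-character change: test drivers (a `make check` rule, a CI script) treat any non-zero exit status as failure, so a build where no asm tests were compiled in would spuriously abort the whole test run. A minimal sketch of such a caller; the checkasm binary path is whatever the build produces:

```shell
# Sketch of a harness that relies on checkasm's exit status.
# "$@" would be the path to the checkasm binary plus its arguments.
run_checkasm() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "checkasm FAILED (exit status $status)" >&2
        return 1
    fi
    echo "checkasm OK"
}
```

With the old behaviour, `run_checkasm ./tests/checkasm/checkasm` would report failure even when there was simply nothing to test.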
Re: [libav-devel] [PATCH] cosmetics: Reformat checkasm tests
On Fri, Jul 17, 2015 at 8:08 PM, Luca Barbato lu_z...@gentoo.org wrote:
> -    qpel_mc_func (*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : h.put_h264_qpel_pixels_tab;
> +    qpel_mc_func(*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : h.put_h264_qpel_pixels_tab;

No space between type and identifier? I don't particularly have any
preference for either way, but it's used in most other places:
https://git.libav.org/?p=libav.git&a=search&h=HEAD&st=grep&s=[_a-zA-Z0-9]%2B+%2B\%28\*[^%2C]%2B\%29\[.*\]&sr=1

Otherwise OK.
[libav-devel] [PATCH 1/2] x86: bswapdsp: Don't treat 32-bit integers as 64-bit
The upper halves are not guaranteed to be zero in x86-64.

Also use `test` instead of `and` when the result isn't used for anything
other than as a branch condition; this allows some register moves to be
eliminated.
---
 libavcodec/x86/bswapdsp.asm | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/libavcodec/x86/bswapdsp.asm b/libavcodec/x86/bswapdsp.asm
index 17a6cb1..42580a3 100644
--- a/libavcodec/x86/bswapdsp.asm
+++ b/libavcodec/x86/bswapdsp.asm
@@ -28,8 +28,8 @@ SECTION_TEXT

 ; %1 = aligned/unaligned
 %macro BSWAP_LOOPS 1
-    mov      r3, r2
-    sar      r2, 3
+    mov      r3d, r2d
+    sar      r2d, 3
     jz       .left4_%1
 .loop8_%1:
     mov%1    m0, [r1 + 0]
@@ -57,11 +57,11 @@ SECTION_TEXT
 %endif
     add      r0, 32
     add      r1, 32
-    dec      r2
+    dec      r2d
     jnz      .loop8_%1
 .left4_%1:
-    mov      r2, r3
-    and      r3, 4
+    mov      r2d, r3d
+    test     r3d, 4
     jz       .left
     mov%1    m0, [r1]
 %if cpuflag(ssse3)
@@ -84,13 +84,11 @@ SECTION_TEXT
 %macro BSWAP32_BUF 0
 %if cpuflag(ssse3)
 cglobal bswap32_buf, 3,4,3
-    mov      r3, r1
     mova     m2, [pb_bswap32]
 %else
 cglobal bswap32_buf, 3,4,5
-    mov      r3, r1
 %endif
-    and      r3, 15
+    test     r1, 15
     jz       .start_align
     BSWAP_LOOPS  u
     jmp      .left
@@ -98,8 +96,7 @@ cglobal bswap32_buf, 3,4,5
     BSWAP_LOOPS  a
 .left:
 %if cpuflag(ssse3)
-    mov      r3, r2
-    and      r2, 2
+    test     r2d, 2
     jz       .left1
     movq     m0, [r1]
     pshufb   m0, m2
@@ -107,13 +104,13 @@ cglobal bswap32_buf, 3,4,5
     add      r1, 8
     add      r0, 8
 .left1:
-    and      r3, 1
+    test     r2d, 1
     jz       .end
     mov      r2d, [r1]
     bswap    r2d
     mov      [r0], r2d
 %else
-    and      r2, 3
+    and      r2d, 3
     jz       .end
 .loop2:
     mov      r3d, [r1]
     bswap    r3d
     mov      [r0], r3d
     add      r1, 4
     add      r0, 4
-    dec      r2
+    dec      r2d
     jnz      .loop2
 %endif
 .end:
--
1.8.3.2
[libav-devel] [PATCH 2/2] checkasm: add unit tests for bswapdsp
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/bswapdsp.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++
 tests/checkasm/checkasm.c |  3 +++
 tests/checkasm/checkasm.h |  1 +
 4 files changed, 78 insertions(+)
 create mode 100644 tests/checkasm/bswapdsp.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 0758746..483ad13 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -1,4 +1,5 @@
 # libavcodec tests
+AVCODECOBJS-$(CONFIG_BSWAPDSP) += bswapdsp.o
 AVCODECOBJS-$(CONFIG_H264PRED) += h264pred.o
 AVCODECOBJS-$(CONFIG_H264QPEL) += h264qpel.o
diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c
new file mode 100644
index 0000000..7b1566b
--- /dev/null
+++ b/tests/checkasm/bswapdsp.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2015 Henrik Gramner
+ *
+ * This file is part of Libav.
+ *
+ * Libav is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * Libav is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with Libav; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+#include "checkasm.h"
+#include "libavcodec/bswapdsp.h"
+#include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
+
+#define BUF_SIZE 512
+
+#define randomize_buffers()                 \
+do {                                        \
+    int i;                                  \
+    for (i = 0; i < BUF_SIZE; i += 4) {     \
+        uint32_t r = rnd();                 \
+        AV_WN32A(src0+i, r);                \
+        AV_WN32A(src1+i, r);                \
+        r = rnd();                          \
+        AV_WN32A(dst0+i, r);                \
+        AV_WN32A(dst1+i, r);                \
+    }                                       \
+} while (0)
+
+#define check_bswap(type)                                                           \
+do {                                                                                \
+    int w;                                                                          \
+    for (w = 0; w < BUF_SIZE/sizeof(type); w++) {                                   \
+        int offset = (BUF_SIZE/sizeof(type) - w) & 15; /* Test various alignments */\
+        randomize_buffers();                                                        \
+        call_ref((type*)dst0+offset, (type*)src0+offset, w);                        \
+        call_new((type*)dst1+offset, (type*)src1+offset, w);                        \
+        if (memcmp(src0, src1, BUF_SIZE) || memcmp(dst0, dst1, BUF_SIZE))           \
+            fail();                                                                 \
+        bench_new((type*)dst1+offset, (type*)src1+offset, w);                       \
+    }                                                                               \
+} while (0)
+
+void checkasm_check_bswapdsp(void)
+{
+    DECLARE_ALIGNED(16, uint8_t, src0)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, src1)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, dst0)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, dst1)[BUF_SIZE];
+    BswapDSPContext h;
+
+    ff_bswapdsp_init(&h);
+
+    if (check_func(h.bswap_buf, "bswap_buf"))
+        check_bswap(uint32_t);
+
+    if (check_func(h.bswap16_buf, "bswap16_buf"))
+        check_bswap(uint16_t);
+
+    report("bswap");
+}
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 7b1ea8f..ce73778 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -54,6 +54,9 @@

 /* List of tests to invoke */
 static void (* const tests[])(void) = {
+#if CONFIG_BSWAPDSP
+    checkasm_check_bswapdsp,
+#endif
 #if CONFIG_H264PRED
     checkasm_check_h264pred,
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 1a46e9b..c2e359f 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -29,6 +29,7 @@
 #include "libavutil/lfg.h"
 #include "libavutil/timer.h"

+void checkasm_check_bswapdsp(void);
 void checkasm_check_h264pred(void);
 void checkasm_check_h264qpel(void);
--
1.8.3.2
[libav-devel] [PATCH] checkasm: Add unit tests for h264qpel
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/checkasm.c |  3 ++
 tests/checkasm/checkasm.h |  1 +
 tests/checkasm/h264qpel.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+)
 create mode 100644 tests/checkasm/h264qpel.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 33e2c09..0758746 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -1,5 +1,6 @@
 # libavcodec tests
 AVCODECOBJS-$(CONFIG_H264PRED) += h264pred.o
+AVCODECOBJS-$(CONFIG_H264QPEL) += h264qpel.o

 CHECKASMOBJS-$(CONFIG_AVCODEC) += $(AVCODECOBJS-yes)
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 59383b8..7b1ea8f 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -57,6 +57,9 @@ static void (* const tests[])(void) = {
 #if CONFIG_H264PRED
     checkasm_check_h264pred,
 #endif
+#if CONFIG_H264QPEL
+    checkasm_check_h264qpel,
+#endif
     NULL
 };
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 90844e2..1a46e9b 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -30,6 +30,7 @@
 #include "libavutil/timer.h"

 void checkasm_check_h264pred(void);
+void checkasm_check_h264qpel(void);

 intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() av_printf_format(2, 3);
 int checkasm_bench_func(void);
diff --git a/tests/checkasm/h264qpel.c b/tests/checkasm/h264qpel.c
new file mode 100644
index 0000000..06bc6ad
--- /dev/null
+++ b/tests/checkasm/h264qpel.c
@@ -0,0 +1,80 @@
+/*
+ * Copyright (c) 2015 Henrik Gramner
+ *
+ * This file is part of Libav.
+ *
+ * Libav is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * Libav is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with Libav; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+#include "checkasm.h"
+#include "libavcodec/h264qpel.h"
+#include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
+
+static const uint32_t pixel_mask[3] = { 0xffffffff, 0x01ff01ff, 0x03ff03ff };
+
+#define SIZEOF_PIXEL ((bit_depth + 7) / 8)
+#define BUF_SIZE (2*16*(16+3+4))
+
+#define randomize_buffers()                        \
+do {                                               \
+    uint32_t mask = pixel_mask[bit_depth-8];       \
+    int k;                                         \
+    for (k = 0; k < BUF_SIZE; k += 4) {            \
+        uint32_t r = rnd() & mask;                 \
+        AV_WN32A(buf0+k, r);                       \
+        AV_WN32A(buf1+k, r);                       \
+        r = rnd();                                 \
+        AV_WN32A(dst0+k, r);                       \
+        AV_WN32A(dst1+k, r);                       \
+    }                                              \
+} while (0)
+
+#define src0 (buf0 + 3*2*16) /* h264qpel functions read data from negative src pointer offsets */
+#define src1 (buf1 + 3*2*16)
+
+void checkasm_check_h264qpel(void)
+{
+    DECLARE_ALIGNED(16, uint8_t, buf0)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, buf1)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, dst0)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, dst1)[BUF_SIZE];
+    H264QpelContext h;
+    int op, bit_depth, i, j;
+
+    for (op = 0; op < 2; op++) {
+        qpel_mc_func (*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : h.put_h264_qpel_pixels_tab;
+        const char *op_name = op ? "avg" : "put";
+
+        for (bit_depth = 8; bit_depth <= 10; bit_depth++) {
+            ff_h264qpel_init(&h, bit_depth);
+            for (i = 0; i < (op ? 3 : 4); i++) {
+                int size = 16 >> i;
+                for (j = 0; j < 16; j++) {
+                    if (check_func(tab[i][j], "%s_h264_qpel_%d_mc%d%d_%d", op_name, size, j & 3, j >> 2, bit_depth)) {
+                        randomize_buffers();
+                        call_ref(dst0, src0, (ptrdiff_t)size*SIZEOF_PIXEL);
+                        call_new(dst1, src1, (ptrdiff_t)size*SIZEOF_PIXEL);
+                        if (memcmp(dst0, dst1, BUF_SIZE))
+                            fail();
+                        bench_new(dst1, src1, (ptrdiff_t)size*SIZEOF_PIXEL);
+                    }
+                }
+            }
+        }
+        report("%s_h264_qpel", op_name);
+    }
+}
--
1.8.3.2