Re: [libav-devel] [PATCH 4/4] x86: fft: Port to cpuflags
On Fri, Mar 10, 2017 at 3:17 PM, Diego Biurrun wrote:
> +%macro INTERL 5
> +%if cpuflag(avx)
> +    vunpckhps    %3, %2, %1
> +    vunpcklps    %2, %2, %1
> +    vextractf128 %4(%5), %2, 0
> +    vextractf128 %4 %+ H(%5), %3, 0
> +    vextractf128 %4(%5 + 1), %2, 1
> +    vextractf128 %4 %+ H(%5 + 1), %3, 1
> +%elif cpuflag(sse)
> +    mova     %3, %2
> +    unpcklps %2, %1
> +    unpckhps %3, %1
> +    mova     %4(%5), %2
> +    mova     %4(%5+1), %3
> +%endif
> +%endmacro

The unpacks can be factored outside the %ifs; just use the 3-arg form unconditionally when dst != src1. Drop the v prefix for instructions that have both legacy and VEX encodings - x86inc automatically uses VEX in AVX functions. Never use vextract(f|i)128 with a 0 immediate; use a basic move instruction with the corresponding xmm register as source instead.

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
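To illustrate the 3-arg point: with x86inc, writing the unpacks once in 3-operand form covers both code paths. A minimal sketch with assumed register numbers (not the actual patch):

```
; Assembled under INIT_XMM avx this emits vunpckhps m3, m2, m1 directly;
; under INIT_XMM sse, x86inc expands it to "mova m3, m2" followed by
; "unpckhps m3, m1", since dst != src1.
unpckhps m3, m2, m1
unpcklps m2, m2, m1
```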
Re: [libav-devel] [PATCH] mov: Avoid memcmp of uninitialised data
On Sun, Jan 29, 2017 at 8:59 PM, Mark Thompson wrote:
> strncmp

Any particular reason for not just using plain strcmp()?
Re: [libav-devel] [FFmpeg-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer
On Mon, Dec 26, 2016 at 2:52 PM, Ronald S. Bultje wrote:
> Hm, OK, I think it affects unix64/x86-32 also when using 32-byte
> alignment. We do use the stack pointer then.

On 32-bit and UNIX64 it simply uses a different caller-saved register, which doesn't require additional instructions.

> I think my hesitation comes from how I view x86inc.asm. There's two ways to
> see it:
> - it's a universal tool, like a compiler, to assist writing assembly
>   (combined with yasm/nasm as actual assembler);
> or
> - it's a local tool for ffmpeg/libav/x26[5], like libavutil/attributes.h,
>   to assist writing assembly.

In practice it's basically (a), but designed around the use case of (b).

> If x86inc.asm were like a compiler, every micro-optimization, no matter the
> benefit, would be important. If it were a local tool, we indeed wouldn't
> care because ffmpeg spends most runtime for important use cases in other
> areas. (There's obviously a grayscale in this black/white range that I'm
> drawing out.) So having said that, patch is OK. If someone would later come
> in to add something to take return value type (void vs. non-void) into
> account, I would still find that helpful. :)

Specifying a full function prototype for cglobal instead of the current implementation would be ideal. It would also allow things like full floating-point abstraction and the ability to auto-load a non-contiguous subset of parameters, with optional sign extension of 32-bit args, etc. The problem is that it's difficult to implement in a clean way with the limited Yasm syntax. Nasm does have better string-parsing capabilities (although I haven't looked into it in detail), so if we decide to drop Yasm support at some point in the future this feature could perhaps be considered.
Re: [libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer
On Mon, Dec 26, 2016 at 2:32 AM, Ronald S. Bultje wrote:
> I know I'm terribly nitpicking here for the limited scope of the comment,
> but this only matters for functions that have a return value. Do you think
> it makes sense to allow functions to opt out of this requirement if they
> explicitly state to not have a return value?

An opt-out would only be relevant on 64-bit Windows when all of the following criteria are true for a function:

 * Reserves exactly 6 registers
 * Reserves stack space with the original stack pointer stored in a register (as opposed to the stack)
 * Requires >16-byte stack alignment (e.g. spilling ymm registers to the stack)
 * Does not have a return value

If and only if all of those are true, this would result in one register being unnecessarily saved (the cost of which would likely be hidden by out-of-order execution). On systems other than WIN64, or if any of the conditions above is false, an opt-out doesn't make any sense.

Considering how rare that corner case is, combined with how insignificant the downside is, I'm not sure it makes much sense to complicate the x86inc API further with an opt-out just for that specific scenario.
[libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer
When allocating stack space with an alignment requirement that is larger than the current stack alignment, we need to store a copy of the original stack pointer in order to be able to restore it later. If we choose to use another register for this purpose, we should not pick eax/rax, since it can be overwritten as a return value.
---
 libavutil/x86/x86inc.asm | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index b2e9c60..128ddc1 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -385,7 +385,14 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
     %ifnum %1
         %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT
             %if %1 > 0
+                ; Reserve an additional register for storing the original stack pointer, but avoid using
+                ; eax/rax for this purpose since it can potentially get overwritten as a return value.
                 %assign regs_used (regs_used + 1)
+                %if ARCH_X86_64 && regs_used == 7
+                    %assign regs_used 8
+                %elif ARCH_X86_64 == 0 && regs_used == 1
+                    %assign regs_used 2
+                %endif
             %endif
             %if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3
                 ; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax)
--
2.7.4
Re: [libav-devel] [PATCH 1/3] ratecontrol: Use correct function pointer casts instead of void*
On Fri, Nov 11, 2016 at 1:22 PM, Diego Biurrun wrote:
> ISO C forbids initialization between function pointer and ‘void *’

ISO C technically allows quite a lot of weird stuff, like having function pointers that are different from data pointers. Is there even any known relevant system where casting function pointers to void * isn't well defined? I know it's required by POSIX (e.g. dlsym() et al.) at least.
Re: [libav-devel] [PATCH 2/2] hevc: x86: Add add_residual optimizations
On Wed, Oct 19, 2016 at 10:18 AM, Diego Biurrun wrote:
> +%macro ADD_RES_MMX_4_8 0
> +    mova     m2, [r1]
> +    mova     m4, [r1+8]
> +    pxor     m3, m3
> +    psubw    m3, m2
> +    packuswb m2, m2
> +    packuswb m3, m3
> +    pxor     m5, m5
> +    psubw    m5, m4
> +    packuswb m4, m4
> +    packuswb m5, m5
> +
> +    movh     m0, [r0]
> +    movh     m1, [r0+r2]
> +    paddusb  m0, m2
> +    paddusb  m1, m4
> +    psubusb  m0, m3
> +    psubusb  m1, m5
> +    movh     [r0], m0
> +    movh     [r0+r2], m1
> +%endmacro

Suggested alternative:

    mova      m0, [r1]
    mova      m2, [r1+8]
    pxor      m1, m1
    pxor      m3, m3
    psubw     m1, m0
    psubw     m3, m2
    packuswb  m0, m2
    packuswb  m1, m3
    movd      m2, [r0]
    movd      m3, [r0+r2]
    punpckldq m2, m3
    paddusb   m0, m2
    psubusb   m0, m1
    movd      [r0], m0
    psrlq     m0, 32
    movd      [r0+r2], m0

[...]

> +cglobal hevc_add_residual_4_8, 3, 4, 6

r3 isn't used, no need to reserve it.

[...]

> +%if cpuflag(avx)
> +    psubw m3, m0, m4
> +    psubw m5, m0, m6
> +%else
> +    mova  m3, m0
> +    mova  m5, m0
> +    psubw m3, m4
> +    psubw m5, m6
> +%endif

Pointless %else. x86inc will do this automatically for non-AVX when 3-arg syntax is used.

[...]

> +    dec r4d
> +    jnz .loop

Nit: jg .loop

[...]

> +cglobal hevc_add_residual_4_10, 3, 4, 6

r3 isn't used.

[...]

> +cglobal hevc_add_residual_8_10, 3, 5, 6

r4 isn't used.
Re: [libav-devel] [PATCH 1/2] checkasm: Add a test for HEVC add_residual
On Wed, Oct 19, 2016 at 5:43 PM, Diego Biurrun wrote:
> What exactly segfaults?

checkasm --bench=add_res

The stride for bench_new() shouldn't be different from call_new(). Actually it should probably be more like:

    int stride = block_size << (bit_depth > 8);
    call_ref(dst0, res0, stride);

to test with stricter (smaller) alignments.

> I get a complaint from clang-asan for size 32:

The randomize_buffers2() call is wrong, it shouldn't multiply size by 2.
Re: [libav-devel] [PATCH 1/2] checkasm: Add a test for HEVC add_residual
On Wed, Oct 19, 2016 at 10:18 AM, Diego Biurrun wrote:
> +    bench_new(dst1, res1, block_size);

Segfaults. Should probably be block_size * 2 like the other calls.
Re: [libav-devel] [PATCH 2/2] checkasm: Add a test for HEVC add_residual
On Fri, Oct 14, 2016 at 10:29 AM, Luca Barbato wrote:
> The term checkasm is misleading. The whole thing is a unit-test for some
> specific dsp functions.

Not really, no. The checkasm tests only check whether or not the output of the assembly functions matches the output of the C function. They don't try to verify that the C function is actually correct.

With that said, I really don't see any point in disabling tests just because there's currently no assembly implementation.
Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT
On Fri, Oct 7, 2016 at 6:32 PM, Alexandra Hájková wrote:
> On Fri, Oct 7, 2016 at 12:32 AM, Diego Biurrun wrote:
>> There should be no need to redefine the transpose functions, just call
>> the right one with the help of the cpuname macro.
>
> The transpose functions are called by the IDCT_size*size macros and the
> macro itself is the same for avx and sse2. I think the only way to avoid
> this define is to group the init by SIMD instead of grouping it by bit
> depth, but what to do with the bit depth then? So I think it would be
> better to leave the define as it is.

I think he means

    call hevc_idct_transpose_NxN_ %+ cpuname

which indeed allows you to get rid of the defines.
Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT
On Tue, Oct 4, 2016 at 7:35 PM, Alexandra Hájková wrote:
> +cglobal hevc_idct_16x16_%1, 1, 2, 16, coeffs
> +    mov r1d, 3
> +.loop16:
> +    TR_16x4 8 * r1, 7, [pd_64], 64, 2, 32, 8, 16, 1, 0
> +    dec r1

dec r1d

[...]

> +++ b/libavcodec/x86/hevcdsp_init.c

The function pointers for the AVX versions of 4x4 and 8x8 are not assigned on 32-bit.

Otherwise LGTM. Tested and passes checkasm on 64-bit Linux, 64-bit Windows, and 32-bit Windows.
Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT
On Sat, Oct 1, 2016 at 12:55 PM, wrote:
> +cglobal hevc_idct_4x4_ %+ %1, 1, 1, 5, coeffs

cglobal hevc_idct_4x4_%1, 1, 1, 5, coeffs

[...]

> +%macro SWAP_BLOCKS 5
[...]
> +    TRANSPOSE_4x4 4, 5, 8
[...]
> +    TRANSPOSE_4x4 4, 5, 8
[...]
> +%macro TRANSPOSE_BLOCK 3
[...]
> +    TRANSPOSE_4x4 4, 5, 8

TRANSPOSE_4x4 4, 5, 6

Makes the 8x8 IDCT use one xmm register less (9 -> 8).

[...]

> +%macro TRANSPOSE_8x8 0

Might as well turn this one into a function too while we're at it.

[...]

> +cglobal hevc_idct_8x8_ %+ %1, 1, 1, 8, coeffs

cglobal hevc_idct_8x8_%1, 1, 1, 8, coeffs

[...]

> +cglobal transpose_16x16, 0, 0, 0

Should be prefixed with hevc_idct_. This is also still SSE2-only; an AVX version would avoid some register-register moves. Template it and instantiate both SSE2 and AVX versions. Then inside the INIT_IDCT macro you can do the following:

    INIT_XMM sse2
    %define transpose_16x16 hevc_idct_transpose_16x16_sse2
    ...
    INIT_XMM avx
    %define transpose_16x16 hevc_idct_transpose_16x16_avx
    ...

This applies to the other transposes as well.

[...]

> +cglobal hevc_idct_16x16_ %+ %1, 1, 3, 15, coeffs
[...]
> +    call transpose_16x16
> +    RET

cglobal hevc_idct_16x16_%1, 1, 2, 16, coeffs
[...]
TAIL_CALL transpose_16x16, 1

[...]

> +%macro E32_O32 5
[...]
> +    mova  m11, [rsp + %5]
> +    paddd m11, m14

paddd m11, m14, [rsp + %5]

[...]

> +%macro TR_32x4 3
[...]
> +    lea r2, [trans_coeff32 + 15 * 128]
> +    lea r3, [coeffsq + %1 + 960]
> +    lea r4, [coeffsq + %1 + 16 * 64]
> +    mov r5d, 16 * 16
> +%%loop:
> +    E32_O32 r2, r3, r4, shift, r5 - 16
> +    sub r2, 128
> +    sub r3, 64
> +    add r4, 64
> +    sub r5d, 16
> +    jg %%loop

    lea r2, [trans_coeff32 + 15 * 128]
    lea r3, [coeffsq + %1]
    lea r4, [r3 + 16 * 64]
    mov r5d, 15 * 16
%%loop:
    E32_O32 r2, r3 + r5*4, r4, shift, r5
    sub r2, 128
    add r4, 64
    sub r5d, 16
    jge %%loop

[...]

> +cglobal hevc_idct_32x32_ %+ %1, 1, 7, 15, 512, coeffs
> +    mov r1d, 8
> +.loop32:
> +    TR_32x4 (8 * r1 - 8), %1, 1
> +    dec r1d
> +    jg .loop32
[...]
> +    call transpose_32x32
> +    RET

cglobal hevc_idct_32x32_%1, 1, 6, 16, 256, coeffs
    mov r1d, 7
.loop32:
    TR_32x4 8 * r1, %1, 1
    dec r1d
    jge .loop32
[...]
TAIL_CALL transpose_32x32, 1

[...]

> @@ -270,6 +288,7 @@ void ff_hevc_dsp_init_x86(HEVCDSPContext *c, const int bit_depth)
>          c->hevc_v_loop_filter_chroma = ff_hevc_v_loop_filter_chroma_8_sse2;
>          c->hevc_h_loop_filter_chroma = ff_hevc_h_loop_filter_chroma_8_sse2;
>
> +
>          c->idct_dc[1] = ff_hevc_idct_8x8_dc_8_sse2;
>          c->idct_dc[2] = ff_hevc_idct_16x16_dc_8_sse2;
>          c->idct_dc[3] = ff_hevc_idct_32x32_dc_8_sse2;

Unnecessary extra newline.

[...]

> #if ARCH_X86_64
>     if (bit_depth == 8) {
> +        if (EXTERNAL_SSE2(cpu_flags)) {
> +            c->idct[0] = ff_hevc_idct_4x4_8_sse2;
> +            c->idct[1] = ff_hevc_idct_8x8_8_sse2;

Both 4x4 and 8x8 should work on 32-bit x86 as well.
Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT
On Fri, Sep 30, 2016 at 5:40 PM, wrote:
> +%if cpuflag(avx)
> +    pmaddwd m2, m0, [pw_64]    ; e0
> +    pmaddwd m3, m1, [pw_83_36] ; o0
> +%else
> +    mova    m2, m0
> +    pmaddwd m2, [pw_64]
> +    mova    m3, m1
> +    pmaddwd m3, [pw_83_36]
> +%endif

Redundant %else. x86inc will automatically turn 3-arg instructions into a move + 2-arg instruction when targeting pre-AVX. For commutative instructions (e.g. A op B == B op A) it will also use the memory operand for the move, since that's advantageous on some CPUs. This applies to several other code sections as well.

[...]

> +%macro LOAD_BLOCK 7
> +    movq   %1, [r0q + %3 + %7]
> +    movhps %1, [r0q + %5 + %7]
> +    movq   %2, [r0q + %4 + %7]
> +    movhps %2, [r0q + %6 + %7]
> +%endmacro

The q suffix for registers is redundant, just use r0. Applies to the STORE_PACKED macro as well.

[...]

> +%macro TRANSPOSE_16x16 0

Make this into a function, just like for 32x32.

[...]

> +%macro E32_O32 5
[...]
> +    mova m11, [rsp + %5]
[...]
> +    mov r2, 16
> +    mov r3, trans_coeff32
> +    mov r4, coeffsq
> +    mov r5, 0
> +    mov r6, coeffsq
> +    add r6, 31 * 64
> +.loopE32_%3
> +    E32_O32 r3, r4 + %1, r6 + %1, shift, r5
> +    sub r6, 64
> +    add r5, 16
> +    add r4, 64
> +    add r3, 128
> +    dec r2
> +    jg .loopE32_%3

    mova m11, [rsp + %5 + 256]
[...]
    lea r2, [trans_coeff32]
    lea r3, [coeffsq + %1 + 1024]
    lea r4, [r3 + 31*64 - 1024]
    mov r5, -256
%%loop:
    E32_O32 r2, r3 + r5*4, r4, shift, r5
    add r2, 128
    sub r4, 64
    add r5, 16
    jl %%loop

(untested)

> +transpose_32x32:

Use cglobal (with 0,0,0 register args) and instantiate SSE2+AVX versions.

[...]

> +    mov r1, 7
> +    mov r2, 7 * 256
> +.loop_transpose
> +    SWAP_BLOCKS 0, r2, 64, 0, r1 * 8
> +    sub r2, 256
> +    dec r1
> +    jg .loop_transpose

Use dword registers (r1d/r2d) for the mov, sub, and dec instructions (but keep native size for offsets like the SWAP_BLOCKS arguments). Applies to several other code sections as well (whenever the value is positive and fits in a dword).
Re: [libav-devel] [PATCH 3/9] blockdsp/x86: yasmify
On Thu, Sep 22, 2016 at 9:39 AM, Anton Khirnov <an...@khirnov.net> wrote:
> Quoting Henrik Gramner (2016-09-21 17:13:31)
>> Why not use xorps like the original code then? INIT_XMM sse will also
>> make mova assemble to movaps instead of movdqa, so no problem there.
>
> mmx only has pxor, so I'd need yet more ifdefs then.

If you really care about optimizing for Pentium II, yes. Alternatively just drop the MMX implementation.
Re: [libav-devel] [PATCH 3/9] blockdsp/x86: yasmify
On Wed, Sep 21, 2016 at 9:01 AM, Anton Khirnov wrote:
> Yes they are, because pxor does not exist in SSE.

Why not use xorps like the original code then? INIT_XMM sse will also make mova assemble to movaps instead of movdqa, so no problem there.
Re: [libav-devel] [PATCH 1/2] hevc: Add AVX IDCT
Not a super-thorough review by any means, but anyway...

On Sun, Sep 18, 2016 at 7:35 PM, Alexandra Hájková wrote:

[...]

> +SECTION_RODATA

Check if any of the constants are duplicates of already existing ones.

[...]

> +%macro TR_4x4 2
> +    ; interleaves src0 with src2 to m0
> +    ; and src1 with scr3 to m2
> +    ; src0: 00 01 02 03   m0: 00 20 01 21 02 22 03 23

[...]

> +    SWAP 3, 0
> +    SWAP 3, 2

SWAP 3, 2, 0

[...]

> +cglobal hevc_idct_4x4_ %+ %1, 1, 14, 14, coeffs

I'm pretty sure this function doesn't require 14 GPRs and 14 vector registers.

[...]

> +%macro STORE_16 5
> +    movu [rsp + %1], %5
> +    movu [rsp + %2], %3
> +%endmacro

I don't see any reason for doing unaligned stores. This will likely result in a performance hit compared to using aligned ones.

[...]

> +%macro E8_O8 8
> +    pmaddwd m6, m4, %3
> +    pmaddwd m7, m5, %4
> +    paddd   m6, m7
> +
> +%if %8 == 8
> +    paddd %7, m8
> +%endif
> +
> +    paddd m7, m6, %7 ; o8 + e8
> +    psubd %7, m6     ; e8 - o8
> +    STORE_%8 %5 + %1, %6 + %1, %7, %2, m7
> +%endmacro

If you do the middle paddd inside the TR_4x4 macro instead (it even takes a parameter for this already) you need to do 8 fewer adds in the 8x8 idct. You also save a register, which means the function will work on 32-bit x86 as well.

[...]

> +; transpose src packed in m4, m5
> +; to m3, m1
> +%macro TRANSPOSE 0
> +    SBUTTERFLY wd, 4, 5, 8
> +    SBUTTERFLY dq, 4, 5, 8
> +%endmacro

"TRANSPOSE" is kind of generic, a more specific macro name would be useful. The comment is also wrong. Furthermore, in simple macros like this it's IMO preferable to have the registers as arguments instead of being hard-coded.

[...]
> +%macro SWAP_BLOCKS 5
> +    ; M_i
> +    LOAD_BLOCK m6, m7, %2, %2 + %3, %2 + 2 * %3, %2 + 3 * %3, %1
> +
> +    ; M_j
> +    LOAD_BLOCK m4, m5, %4, %4 + %3, %4 + 2 * %3, %4 + 3 * %3, %5
> +    TRANSPOSE
> +    STORE_PACKED m4, m5, %2, %2 + %3, %2 + 2 * %3, %2 + 3 * %3, %1
> +
> +    ; transpose and store M_i
> +    SWAP m6, m4
> +    SWAP m7, m5
> +    TRANSPOSE
> +    STORE_PACKED m4, m5, %4, %4 + %3, %4 + 2 * %3, %4 + 3 * %3, %5
> +%endmacro

You should perform the loads in the same order as you operate on them, e.g.:

    LOAD      m4, m5
    TRANSPOSE m4, m5, m6
    LOAD      m6, m7
    STORE     m4, m5
    TRANSPOSE m6, m7, m4
    STORE     m6, m7

[...]

> +cglobal hevc_idct_8x8_ %+ %1, 1, 14, 14, coeffs

I'm pretty sure this function doesn't require 14 GPRs and 14 vector registers either.

[...]

> +; %1, 2 - transform constants
> +; %3, 4 - regs with interleaved coeffs
> +%macro ADD 4
> +    pmaddwd m8, %3, %1
> +    pmaddwd m9, %4, %2
> +    paddd   m8, m9
> +    paddd   m10, m8
> +%endmacro

ADD is defined in x86inc already, which could potentially cause weird issues; use a different macro name.

[...]

> +; %1 ... %4 transform coeffs
> +; %5, %6 offsets for storing e+o/e-o back to coeffsq
> +; %7 - shift
> +; %8 - add
> +; %9 - block_size
> +%macro E16_O16 9
> +    pxor m10, m10
> +    ADD %1, %2, m0, m1
> +    ADD %3, %4, m2, m3
> +
> +    movu m4, [rsp + %5]
> +%if %9 == 8
> +    paddd m4, %8
> +%endif
> +
> +    paddd m5, m10, m4 ; o16 + e16
> +    psubd m4, m10     ; e16 - o16
> +    STORE_%9 %5, %6, m4, %7, m5
> +%endmacro

Zeroing the accumulator can be avoided by, for example, making the ADD macro take an additional parameter which switches between paddd and SWAP. Also, it doesn't seem like you're using that many registers here, so try to use a lower register number instead of m10.

Align your offsets so you don't have to do unaligned loads/stores. Looking at the disassembly, 50% of your loads/stores are misaligned by 8 for no obvious reason. Furthermore, the distance between the 16-byte stores seems to be 32 bytes, which means half of the stack space is sitting unused.
You're storing 8 registers' worth of data to the stack before loading it back again. If you avoid using more than 8 xmm registers you could use the remaining 8 for temporary storage instead of the stack on x86-64 (keep using the stack on x86-32).

[...]

> +%macro TR_16x4 9
> +    mova m12, [pd_64]
> +
> +    ; produce 8x4 matrix of e16 coeffs
> +    ; for 4 first rows and store it on stack (128 bytes)
> +    TR_8x4 %1, 7, %4, %5, %6, %8
> +
> +    ; load 8 even rows
> +    LOAD_BLOCK m0, m1, %9 * %6, %9 * 3 * %6, %9 * 5 * %6, %9 * 7 * %6, %1
> +    LOAD_BLOCK m2, m3, %9 * 9 * %6, %9 * 11 * %6, %9 * 13 * %6, %9 * 15 * %6, %1
> +
> +    SBUTTERFLY wd, 0, 1, 4
> +    SBUTTERFLY wd, 2, 3, 4
> +
> +    mova m7, %3
> +
> +    E16_O16 [pw_90_87], [pw_80_70], [pw_57_43], [pw_25_9], 0 + %1, 15 * %6 + %1, %2, m7, %7
> +    E16_O16 [pw_87_57], [pw_9_m43], [pw_m80_m90], [pw_m70_m25], %6 + %1, 14 * %6 + %1, %2, m7, %7
> +    E16_O16 [pw_80_9], [pw_m70_m87], [pw_m25_57], [pw_90_43], 2 * %6 + %1, 13 * %6 + %1, %2, m7, %7
> +    E16_O16 [pw_70_m43], [pw_m87_9],
Re: [libav-devel] [PATCH 9/9] audiodsp/x86: yasmify vector_clipf_sse
On Tue, Sep 6, 2016 at 11:39 AM, Anton Khirnov wrote:
>> Use 3-arg maxps instead of mova.
>
> Isn't that AVX-only?

It is; x86inc will simply convert it to mova+minps when assembling it as non-AVX code, but it reduces the line count. It's certainly not worth going into bikeshedding territory about it, however, so if you prefer to use plain movas just keep them.

>> Otherwise LGTM, you could make an AVX version using ymm registers
>> as well in a separate patch if you want to, just need to make sure
>> the buffers are aligned.
>
> This function is only used in two rather obscure places, so probably
> not worth it.

Fair enough.
Re: [libav-devel] [PATCH] audiodsp/x86: clear the high bits of the order parameter on 64bit
On Tue, Sep 6, 2016 at 11:44 AM, Anton Khirnov wrote:
> Also change shl to add, since it can be faster on some CPUs.
>
> CC: libav-sta...@libav.org
> ---
>  libavcodec/x86/audiodsp.asm | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Ok.
Re: [libav-devel] [PATCH 5/9] audiodsp/x86: sign extend the order argument to scalarproduct_int16 on 64bit
On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote:
> CC: libav-sta...@libav.org
> ---
>  libavcodec/x86/audiodsp.asm | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/libavcodec/x86/audiodsp.asm b/libavcodec/x86/audiodsp.asm
> index dc38ada..0e3019c 100644
> --- a/libavcodec/x86/audiodsp.asm
> +++ b/libavcodec/x86/audiodsp.asm
> @@ -26,6 +26,7 @@ SECTION .text
>  %macro SCALARPRODUCT 0
>  ; int ff_scalarproduct_int16(int16_t *v1, int16_t *v2, int order)
>  cglobal scalarproduct_int16, 3,3,3, v1, v2, order
> +    movsxdifnidn orderq, orderd
>      shl orderq, 1
>      add v1q, orderq
>      add v2q, orderq

Alternatively, replace "shl orderq, 1" with "add orderd, orderd" instead: one instruction less, since instructions operating on 32-bit registers implicitly zero the upper 32 bits. Using "shl orderd, 1" works equally well, but add can be faster than shl on some CPUs, so we might as well use that instead.
Re: [libav-devel] [PATCH 9/9] audiodsp/x86: yasmify vector_clipf_sse
On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote:
> +    shl lenq, 2

You could also skip this shift and just use 4*lenq instead in the memory operands; multiplying by 2, 4, or 8 in memory args is free.
Re: [libav-devel] [PATCH 9/9] audiodsp/x86: yasmify vector_clipf_sse
On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote:
> +cglobal vector_clipf, 3, 3, 6, dst, src, len, min, max
> +%if ARCH_X86_32
> +    VBROADCASTSS m0, minm
> +    VBROADCASTSS m1, maxm
> +%else
> +    VBROADCASTSS m0, m0
> +    VBROADCASTSS m1, m1
> +%endif

This will fail on WIN64. To deal with the somewhat silly calling conventions on that platform you need to do something like:

    VBROADCASTSS m0, m3
    VBROADCASTSS m1, maxm

(not tested, I don't have access to a Windows machine at the moment)

> +    movsxdifnidn lenq, lend
> +    shl lenq, 2
> +
> +.loop
> +    sub lenq, 4 * mmsize

Move the subtraction to just before the branch (jg) to allow macro-op fusion on modern Intel CPUs.

> +
> +    mova m2, [srcq + lenq + 0 * mmsize]
> +    mova m3, [srcq + lenq + 1 * mmsize]
> +    mova m4, [srcq + lenq + 2 * mmsize]
> +    mova m5, [srcq + lenq + 3 * mmsize]
> +
> +    maxps m2, m0
> +    maxps m3, m0
> +    maxps m4, m0
> +    maxps m5, m0

Use 3-arg maxps instead of mova.

> +    minps m2, m1
> +    minps m3, m1
> +    minps m4, m1
> +    minps m5, m1
> +
> +    mova [dstq + lenq + 0 * mmsize], m2
> +    mova [dstq + lenq + 1 * mmsize], m3
> +    mova [dstq + lenq + 2 * mmsize], m4
> +    mova [dstq + lenq + 3 * mmsize], m5
> +
> +    jg .loop
> +
> +    RET

Otherwise LGTM. You could make an AVX version using ymm registers as well in a separate patch if you want to; just need to make sure the buffers are aligned.
Re: [libav-devel] [PATCH 1/2 v2] x86/hevc: add add_residual
On Thu, Jul 21, 2016 at 2:48 AM, Josh de Kock wrote:
> +cglobal hevc_add_residual_16_8, 3, 5, 7, dst, coeffs, stride
> +    pxor m0, m0
> +    lea  r3, [strideq * 3]
> +    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
> +    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> +    mov  r4d, 3
> +.loop:
> +    add coeffsq, 128
> +    lea dstq, [dstq + strideq * 4]
> +    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
> +    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
> +    dec r4d
> +    jnz .loop
> +    RET

You can do all iterations within the loop instead, e.g. something like:

    mov r4d, 4
.loop:
    RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq
    RES_ADD_SSE_16_32_8 64, dstq + strideq * 2, dstq + r3
    add coeffsq, 128
    lea dstq, [dstq + strideq * 4]
    dec r4d
    jnz .loop

(the same applies to all other similar functions)
Re: [libav-devel] [PATCH 1/3] x86/hevc: add add_residual
On Thu, Jul 14, 2016 at 7:25 PM, Josh de Kock wrote:

Some of those functions are several kilobytes large. That's going to result in a lot of cache misses. I suggest using loops instead of duplicating the same code over and over with %reps.
Re: [libav-devel] [PATCH] checkasm: add HEVC test for testing IDCT DC
On Mon, Jul 18, 2016 at 8:11 PM, Alexandra Hájková wrote:
> +        if (check_func(h.idct_dc[i - 2], "idct_%dx%d_dc_%d", block_size,
> +                       block_size, bit_depth)) {
> +            call_ref(coeffs0);
> +            call_new(coeffs1);
> +            if (memcmp(coeffs0, coeffs1, sizeof(*coeffs0) * size)) {
> +                printf("fail: i %d, block_size %d, bit_depth %d\n", i,
> +                       block_size, bit_depth);
> +                fail();
> +            }
> +        }

bench_new() as well - otherwise there won't be any performance numbers.
Re: [libav-devel] [PATCH 6/6] hevc: Add AVX2 DC IDCT
On Sun, Jul 10, 2016 at 1:10 PM, Alexandra Hájková wrote:

Some fairly minor nits:

> +++ b/libavcodec/x86/hevc_idct.asm
> +cglobal hevc_idct_%1x%1_dc_%3, 1, 2, 1, coeff, tmp
> +    movsx tmpq, word [coeffq]
> +    add   tmpw, ((1 << 14-%3) + 1)
> +    sar   tmpw, (15-%3)
> +    movd  xm0, tmpd

Using dword instead of qword for the movsx gets rid of an unnecessary REX prefix. Can the add overflow 16 bits, i.e. is the use of a 16-bit shift instead of a 32-bit one required for truncation? If not, use dword for all those instructions to prevent the possibility of partial-register access stalls on some CPUs.

[...]

> +.loop:
> +    mova [coeffq+mmsize*0], m0
> +    mova [coeffq+mmsize*1], m0
> +    mova [coeffq+mmsize*2], m0
> +    mova [coeffq+mmsize*3], m0
> +    mova [coeffq+mmsize*4], m0
> +    mova [coeffq+mmsize*5], m0
> +    mova [coeffq+mmsize*6], m0
> +    mova [coeffq+mmsize*7], m0
> +    add  coeffq, mmsize*8
> +    dec  cntd
> +    jg .loop

Offsets in the range [-128,127] can be encoded in 1 byte, whereas larger offsets require 4 bytes, and mmsize*4 is 128 when using ymm registers. The code size can therefore be slightly reduced by reordering instructions like this:

    mova [coeffq+mmsize*0], m0
    mova [coeffq+mmsize*1], m0
    mova [coeffq+mmsize*2], m0
    mova [coeffq+mmsize*3], m0
    add  coeffq, mmsize*8
    mova [coeffq+mmsize*-4], m0
    mova [coeffq+mmsize*-3], m0
    mova [coeffq+mmsize*-2], m0
    mova [coeffq+mmsize*-1], m0
[libav-devel] [PATCH 3/4] x86inc: Improve handling of %ifid with multi-token parameters
From: Anton Mitrofanov

The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want.
---
 libavutil/x86/x86inc.asm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 60aad23..b79cc19 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1136,7 +1136,7 @@ INIT_XMM
                 CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, %8
             %endif
             %if %5 && %4 == 0
-                %ifnid %8
+                %ifnnum sizeof%8
                     ; 3-operand AVX instructions with a memory arg can only have it in src2,
                     ; whereas SSE emulation prefers to have it in src1 (i.e. the mov).
                     ; So, if the instruction is commutative with a memory arg, swap them.
@@ -1500,7 +1500,7 @@ FMA_INSTR pmadcswd, pmaddwd, paddd
         v%5%6 %1, %2, %3, %4
     %elifidn %1, %2
         ; If %3 or %4 is a memory operand it needs to be encoded as the last operand.
-        %ifid %3
+        %ifnum sizeof%3
             v%{5}213%6 %2, %3, %4
         %else
             v%{5}132%6 %2, %4, %3
--
1.9.1
[libav-devel] [PATCH 4/4] x86inc: Enable AVX emulation in additional cases
From: Anton Mitrofanov

Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`.
---
 libavutil/x86/x86inc.asm | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index b79cc19..dca1f78 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1129,14 +1129,12 @@ INIT_XMM
     %if __emulate_avx
         %xdefine __src1 %7
         %xdefine __src2 %8
-        %ifnidn %6, %7
-            %if %0 >= 9
-                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, %8, %9
-            %else
-                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, %8
-            %endif
-            %if %5 && %4 == 0
-                %ifnnum sizeof%8
+        %if %5 && %4 == 0
+            %ifnidn %6, %7
+                %ifidn %6, %8
+                    %xdefine __src1 %8
+                    %xdefine __src2 %7
+                %elifnnum sizeof%8
                     ; 3-operand AVX instructions with a memory arg can only have it in src2,
                     ; whereas SSE emulation prefers to have it in src1 (i.e. the mov).
                     ; So, if the instruction is commutative with a memory arg, swap them.
@@ -1144,6 +1142,13 @@ INIT_XMM
                     %xdefine __src2 %7
                 %endif
             %endif
+        %endif
+        %ifnidn %6, __src1
+            %if %0 >= 9
+                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, __src2, %9
+            %else
+                CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, __src2
+            %endif
             %if __sizeofreg == 8
                 MOVQ %6, __src1
             %elif %3
--
1.9.1
[libav-devel] [PATCH 2/4] x86inc: Fix AVX emulation of some instructions
From: Anton Mitrofanov
---
 libavutil/x86/x86inc.asm | 44 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 20 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 10352fc..60aad23 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1096,7 +1096,7 @@ INIT_XMM
 ;%1 == instruction
 ;%2 == minimal instruction set
 ;%3 == 1 if float, 0 if int
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
 ;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
 ;%6+: operands
 %macro RUN_AVX_INSTR 6-9+
@@ -1171,9 +1171,9 @@ INIT_XMM
 ;%1 == instruction
 ;%2 == minimal instruction set
 ;%3 == 1 if float, 0 if int
-;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise
+;%4 == 1 if 4-operand emulation, 0 if 3-operand emulation, 255 otherwise (no emulation)
 ;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not
-%macro AVX_INSTR 1-5 fnord, 0, 1, 0
+%macro AVX_INSTR 1-5 fnord, 0, 255, 0
     %macro %1 1-10 fnord, fnord, fnord, fnord, %1, %2, %3, %4, %5
         %ifidn %2, fnord
             RUN_AVX_INSTR %6, %7, %8, %9, %10, %1
@@ -1207,10 +1207,10 @@ AVX_INSTR andnpd, sse2, 1, 0, 0
 AVX_INSTR andnps, sse, 1, 0, 0
 AVX_INSTR andpd, sse2, 1, 0, 1
 AVX_INSTR andps, sse, 1, 0, 1
-AVX_INSTR blendpd, sse4, 1, 0, 0
-AVX_INSTR blendps, sse4, 1, 0, 0
-AVX_INSTR blendvpd, sse4, 1, 0, 0
-AVX_INSTR blendvps, sse4, 1, 0, 0
+AVX_INSTR blendpd, sse4, 1, 1, 0
+AVX_INSTR blendps, sse4, 1, 1, 0
+AVX_INSTR blendvpd, sse4 ; can't be emulated
+AVX_INSTR blendvps, sse4 ; can't be emulated
 AVX_INSTR cmppd, sse2, 1, 1, 0
 AVX_INSTR cmpps, sse, 1, 1, 0
 AVX_INSTR cmpsd, sse2, 1, 1, 0
@@ -1281,7 +1281,7 @@ AVX_INSTR movsldup, sse3
 AVX_INSTR movss, sse, 1, 0, 0
 AVX_INSTR movupd, sse2
 AVX_INSTR movups, sse
-AVX_INSTR mpsadbw, sse4
+AVX_INSTR mpsadbw, sse4, 0, 1, 0
 AVX_INSTR mulpd, sse2, 1, 0, 1
 AVX_INSTR mulps, sse, 1, 0, 1
 AVX_INSTR mulsd, sse2, 1, 0, 0
@@ -1303,14 +1303,18 @@ AVX_INSTR paddsb, mmx, 0, 0, 1
 AVX_INSTR paddsw, mmx, 0, 0, 1
 AVX_INSTR paddusb, mmx, 0, 0, 1
 AVX_INSTR paddusw, mmx, 0, 0, 1
-AVX_INSTR palignr, ssse3
+AVX_INSTR palignr, ssse3, 0, 1, 0
 AVX_INSTR pand, mmx, 0, 0, 1
 AVX_INSTR pandn, mmx, 0, 0, 0
 AVX_INSTR pavgb, mmx2, 0, 0, 1
 AVX_INSTR pavgw, mmx2, 0, 0, 1
-AVX_INSTR pblendvb, sse4, 0, 0, 0
-AVX_INSTR pblendw, sse4
-AVX_INSTR pclmulqdq
+AVX_INSTR pblendvb, sse4 ; can't be emulated
+AVX_INSTR pblendw, sse4, 0, 1, 0
+AVX_INSTR pclmulqdq, fnord, 0, 1, 0
+AVX_INSTR pclmulhqhqdq, fnord, 0, 0, 0
+AVX_INSTR pclmulhqlqdq, fnord, 0, 0, 0
+AVX_INSTR pclmullqhqdq, fnord, 0, 0, 0
+AVX_INSTR pclmullqlqdq, fnord, 0, 0, 0
 AVX_INSTR pcmpestri, sse42
 AVX_INSTR pcmpestrm, sse42
 AVX_INSTR pcmpistri, sse42
@@ -1334,10 +1338,10 @@ AVX_INSTR phminposuw, sse4
 AVX_INSTR phsubw, ssse3, 0, 0, 0
 AVX_INSTR phsubd, ssse3, 0, 0, 0
 AVX_INSTR phsubsw, ssse3, 0, 0, 0
-AVX_INSTR pinsrb, sse4
-AVX_INSTR pinsrd, sse4
-AVX_INSTR pinsrq, sse4
-AVX_INSTR pinsrw, mmx2
+AVX_INSTR pinsrb, sse4, 0, 1, 0
+AVX_INSTR pinsrd, sse4, 0, 1, 0
+AVX_INSTR pinsrq, sse4, 0, 1, 0
+AVX_INSTR pinsrw, mmx2, 0, 1, 0
 AVX_INSTR pmaddwd, mmx, 0, 0, 1
 AVX_INSTR pmaddubsw, ssse3, 0, 0, 0
 AVX_INSTR pmaxsb, sse4, 0, 0, 1
@@ -1409,18 +1413,18 @@ AVX_INSTR punpcklwd, mmx, 0, 0, 0
 AVX_INSTR punpckldq, mmx, 0, 0, 0
 AVX_INSTR punpcklqdq, sse2, 0, 0, 0
 AVX_INSTR pxor, mmx, 0, 0, 1
-AVX_INSTR rcpps, sse, 1, 0, 0
+AVX_INSTR rcpps, sse
 AVX_INSTR rcpss, sse, 1, 0, 0
 AVX_INSTR roundpd, sse4
 AVX_INSTR roundps, sse4
 AVX_INSTR roundsd, sse4, 1, 1, 0
 AVX_INSTR roundss, sse4, 1, 1, 0
-AVX_INSTR rsqrtps, sse, 1, 0, 0
+AVX_INSTR rsqrtps, sse
 AVX_INSTR rsqrtss, sse, 1, 0, 0
 AVX_INSTR shufpd, sse2, 1, 1, 0
 AVX_INSTR shufps, sse, 1, 1, 0
-AVX_INSTR sqrtpd, sse2, 1, 0, 0
-AVX_INSTR sqrtps, sse, 1, 0, 0
+AVX_INSTR sqrtpd, sse2
+AVX_INSTR sqrtps, sse
 AVX_INSTR sqrtsd, sse2, 1, 0, 0
 AVX_INSTR sqrtss, sse, 1, 0, 0
 AVX_INSTR stmxcsr, sse
--
1.9.1
[libav-devel] [PATCH 1/4] x86inc: Fix AVX emulation of scalar float instructions
Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified. --- libavutil/x86/x86inc.asm | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 20ef7b8..10352fc 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1193,8 +1193,8 @@ INIT_XMM ; Non-destructive instructions are written without parameters AVX_INSTR addpd, sse2, 1, 0, 1 AVX_INSTR addps, sse, 1, 0, 1 -AVX_INSTR addsd, sse2, 1, 0, 1 -AVX_INSTR addss, sse, 1, 0, 1 +AVX_INSTR addsd, sse2, 1, 0, 0 +AVX_INSTR addss, sse, 1, 0, 0 AVX_INSTR addsubpd, sse3, 1, 0, 0 AVX_INSTR addsubps, sse3, 1, 0, 0 AVX_INSTR aesdec, fnord, 0, 0, 0 @@ -1224,10 +1224,10 @@ AVX_INSTR cvtpd2ps, sse2 AVX_INSTR cvtps2dq, sse2 AVX_INSTR cvtps2pd, sse2 AVX_INSTR cvtsd2si, sse2 -AVX_INSTR cvtsd2ss, sse2 -AVX_INSTR cvtsi2sd, sse2 -AVX_INSTR cvtsi2ss, sse -AVX_INSTR cvtss2sd, sse2 +AVX_INSTR cvtsd2ss, sse2, 1, 0, 0 +AVX_INSTR cvtsi2sd, sse2, 1, 0, 0 +AVX_INSTR cvtsi2ss, sse, 1, 0, 0 +AVX_INSTR cvtss2sd, sse2, 1, 0, 0 AVX_INSTR cvtss2si, sse AVX_INSTR cvttpd2dq, sse2 AVX_INSTR cvttps2dq, sse2 @@ -1250,12 +1250,12 @@ AVX_INSTR ldmxcsr, sse AVX_INSTR maskmovdqu, sse2 AVX_INSTR maxpd, sse2, 1, 0, 1 AVX_INSTR maxps, sse, 1, 0, 1 -AVX_INSTR maxsd, sse2, 1, 0, 1 -AVX_INSTR maxss, sse, 1, 0, 1 +AVX_INSTR maxsd, sse2, 1, 0, 0 +AVX_INSTR maxss, sse, 1, 0, 0 AVX_INSTR minpd, sse2, 1, 0, 1 AVX_INSTR minps, sse, 1, 0, 1 -AVX_INSTR minsd, sse2, 1, 0, 1 -AVX_INSTR minss, sse, 1, 0, 1 +AVX_INSTR minsd, sse2, 1, 0, 0 +AVX_INSTR minss, sse, 1, 0, 0 AVX_INSTR movapd, sse2 AVX_INSTR movaps, sse AVX_INSTR movd, mmx @@ -1284,8 +1284,8 @@ AVX_INSTR movups, sse AVX_INSTR mpsadbw, sse4 AVX_INSTR mulpd, sse2, 1, 0, 1 AVX_INSTR mulps, sse, 1, 0, 1 -AVX_INSTR mulsd, sse2, 1, 0, 1 -AVX_INSTR mulss, sse, 1, 0, 1 +AVX_INSTR mulsd, sse2, 1, 0, 0 +AVX_INSTR mulss, sse, 1, 0, 0 AVX_INSTR orpd, sse2, 1, 0, 1 
AVX_INSTR orps, sse, 1, 0, 1 AVX_INSTR pabsb, ssse3 @@ -1413,8 +1413,8 @@ AVX_INSTR rcpps, sse, 1, 0, 0 AVX_INSTR rcpss, sse, 1, 0, 0 AVX_INSTR roundpd, sse4 AVX_INSTR roundps, sse4 -AVX_INSTR roundsd, sse4 -AVX_INSTR roundss, sse4 +AVX_INSTR roundsd, sse4, 1, 1, 0 +AVX_INSTR roundss, sse4, 1, 1, 0 AVX_INSTR rsqrtps, sse, 1, 0, 0 AVX_INSTR rsqrtss, sse, 1, 0, 0 AVX_INSTR shufpd, sse2, 1, 1, 0 -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
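Why these flags matter can be seen by modeling one of the scalar instructions in plain C — a toy sketch (names are illustrative, not libav code) of the semantics the commit message describes:

```c
#include <assert.h>

/* Toy model (not libav code) of addsd dst, src: only element 0 is
 * summed; element 1 is passed through from dst. Swapping the operands
 * therefore changes the upper element of the result. */
typedef struct { double d[2]; } vec2d;

static vec2d addsd(vec2d dst, vec2d src)
{
    vec2d r = { { dst.d[0] + src.d[0], dst.d[1] } };
    return r;
}
```

Since the upper element comes from whichever operand is the destination, the emulation layer must never swap the sources for these instructions, which is exactly what clearing the commutative flag guarantees.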
[libav-devel] [PATCH 0/4] x86inc: Sync changes from x264
Anton Mitrofanov (3): x86inc: Fix AVX emulation of some instructions x86inc: Improve handling of %ifid with multi-token parameters x86inc: Enable AVX emulation in additional cases Henrik Gramner (1): x86inc: Fix AVX emulation of scalar float instructions libavutil/x86/x86inc.asm | 95 ++-- 1 file changed, 52 insertions(+), 43 deletions(-) -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] h264: Use isprint to sanitize the SEI debug message
On Sat, Feb 6, 2016 at 7:34 PM, Luca Barbato wrote: > Given how this function is used it is not really important; its purpose > is to not break the terminal by printing garbage. That's true I guess. > Do you have time to get me a function that is locale independent? static inline av_const int av_isprint(int c) { return c > 31 && c < 127; }
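As a sketch of how such a locale-independent check might be used — the `sanitize()` helper below is a hypothetical illustration, not the actual h264 SEI code:

```c
#include <assert.h>
#include <string.h>

/* Locale-independent printable check as proposed above. */
static inline int av_isprint(int c)
{
    return c > 31 && c < 127;
}

/* Hypothetical helper: replace any byte that would print garbage on a
 * terminal with '.' before logging the string. */
static void sanitize(char *s)
{
    for (; *s; s++)
        if (!av_isprint((unsigned char)*s))
            *s = '.';
}
```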
Re: [libav-devel] [PATCH] h264: Use isprint to sanitize the SEI debug message
On Sat, Feb 6, 2016 at 1:03 PM, Luca Barbato wrote: > +if (isprint(val)) Shouldn't we use a locale-independent version similar to the other functions in libavutil/avstring.h?
Re: [libav-devel] [PATCH] h264: Parse only the x264 info unregistered SEI
On Wed, Jul 29, 2015 at 10:51 PM, Luca Barbato wrote: > And restrict the string to ascii text. Restricting to printable characters would be even better.
[libav-devel] [PATCH] msvc: Fix libx264 linking
--- configure | 1 + 1 file changed, 1 insertion(+) diff --git a/configure b/configure index c5bcb78..0bf29c2 100755 --- a/configure +++ b/configure @@ -2951,6 +2951,7 @@ msvc_common_flags(){ -lz) echo zlib.lib ;; -lavifil32) echo vfw32.lib ;; -lavicap32) echo vfw32.lib user32.lib ;; +-lx264) echo libx264.lib ;; -l*) echo ${flag#-l}.lib ;; -L*) echo -libpath:${flag#-L} ;; *)echo $flag ;; -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH v2] x86inc: Preserve arguments when allocating stack space
When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. --- libavutil/x86/x86inc.asm | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index fc58b74..7d6c171 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -386,8 +386,11 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT %if %1 > 0 %assign regs_used (regs_used + 1) -%elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2 -%warning "Stack pointer will overwrite register argument" +%endif +%if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3 +; Ensure that we don't clobber any registers containing arguments. For UNIX64 we also preserve r6 (rax) +; since it's used as a hidden argument in vararg functions to specify the number of vector registers used. +%assign regs_used 5 + UNIX64 * 3 %endif %endif %endif -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space
On Mon, Jan 18, 2016 at 2:35 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote: > On Sun, Jan 17, 2016 at 6:21 PM, Henrik Gramner <hen...@gramner.com> wrote: >> @@ -386,8 +386,10 @@ DECLARE_REG_TMP_SIZE >> 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 >> %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT >> %if %1 > 0 >> %assign regs_used (regs_used + 1) >> -%elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + >> UNIX64 * 2 >> -%warning "Stack pointer will overwrite register argument" >> +%endif >> +%if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3 >> +; Ensure that we don't clobber any registers containing >> arguments >> +%assign regs_used 5 + UNIX64 * 3 > > Why 5 + unix * 3 and not 5 +unix * 2? Isn't unix64 6 regs and win64 4 regs? Because in the System V ABI, r6 (rax) is used to specify the number of arguments passed in vector registers in vararg functions so we use r7 instead of potentially clobbering it. It's certainly unlikely for it to actually be relevant in handwritten assembly functions, but there's not really any drawback of supporting that use case here (both r6 and r7 are volatile). ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
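The vararg convention referred to above can be illustrated from the C side; `sum_doubles` is a hypothetical example, and the al bookkeeping it relies on is emitted automatically by the compiler at every call site:

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical variadic helper illustrating the convention above: in
 * the System V AMD64 ABI the caller of a vararg function sets al (the
 * low byte of rax) to an upper bound on the number of vector registers
 * used for arguments. C compilers handle this automatically; handwritten
 * asm that clobbered rax before making such a call would break it. */
static double sum_doubles(int n, ...)
{
    va_list ap;
    double s = 0.0;
    va_start(ap, n);
    while (n--)
        s += va_arg(ap, double); /* float args travel in xmm registers */
    va_end(ap);
    return s;
}
```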
[libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space
When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. --- libavutil/x86/x86inc.asm | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index fc58b74..c355ee7 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -386,8 +386,10 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 %if %1 != 0 && required_stack_alignment > STACK_ALIGNMENT %if %1 > 0 %assign regs_used (regs_used + 1) -%elif ARCH_X86_64 && regs_used == num_args && num_args <= 4 + UNIX64 * 2 -%warning "Stack pointer will overwrite register argument" +%endif +%if ARCH_X86_64 && regs_used < 5 + UNIX64 * 3 +; Ensure that we don't clobber any registers containing arguments +%assign regs_used 5 + UNIX64 * 3 %endif %endif %endif -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 5/8] x86inc: Use more consistent indentation
--- libavutil/x86/x86inc.asm | 134 +++ 1 file changed, 67 insertions(+), 67 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index c355ee7..de20e76 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -183,9 +183,9 @@ %define e%1h %3 %define r%1b %2 %define e%1b %2 -%if ARCH_X86_64 == 0 -%define r%1 e%1 -%endif +%if ARCH_X86_64 == 0 +%define r%1 e%1 +%endif %endmacro DECLARE_REG_SIZE ax, al, ah @@ -503,9 +503,9 @@ DECLARE_REG 14, R15, 120 %macro RET 0 WIN64_RESTORE_XMM_INTERNAL rsp POP_IF_USED 14, 13, 12, 11, 10, 9, 8, 7 -%if mmsize == 32 -vzeroupper -%endif +%if mmsize == 32 +vzeroupper +%endif AUTO_REP_RET %endmacro @@ -542,17 +542,17 @@ DECLARE_REG 14, R15, 72 %define has_epilogue regs_used > 9 || mmsize == 32 || stack_size > 0 %macro RET 0 -%if stack_size_padded > 0 -%if required_stack_alignment > STACK_ALIGNMENT -mov rsp, rstkm -%else -add rsp, stack_size_padded -%endif -%endif +%if stack_size_padded > 0 +%if required_stack_alignment > STACK_ALIGNMENT +mov rsp, rstkm +%else +add rsp, stack_size_padded +%endif +%endif POP_IF_USED 14, 13, 12, 11, 10, 9 -%if mmsize == 32 -vzeroupper -%endif +%if mmsize == 32 +vzeroupper +%endif AUTO_REP_RET %endmacro @@ -598,29 +598,29 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14 %define has_epilogue regs_used > 3 || mmsize == 32 || stack_size > 0 %macro RET 0 -%if stack_size_padded > 0 -%if required_stack_alignment > STACK_ALIGNMENT -mov rsp, rstkm -%else -add rsp, stack_size_padded -%endif -%endif +%if stack_size_padded > 0 +%if required_stack_alignment > STACK_ALIGNMENT +mov rsp, rstkm +%else +add rsp, stack_size_padded +%endif +%endif POP_IF_USED 6, 5, 4, 3 -%if mmsize == 32 -vzeroupper -%endif +%if mmsize == 32 +vzeroupper +%endif AUTO_REP_RET %endmacro %endif ;== %if WIN64 == 0 -%macro WIN64_SPILL_XMM 1 -%endmacro -%macro WIN64_RESTORE_XMM 1 -%endmacro -%macro WIN64_PUSH_XMM 0 -%endmacro +%macro WIN64_SPILL_XMM 1 +%endmacro +%macro WIN64_RESTORE_XMM 1 +%endmacro +%macro 
WIN64_PUSH_XMM 0 +%endmacro %endif ; On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either @@ -846,14 +846,14 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define movnta movntq %assign %%i 0 %rep 8 -CAT_XDEFINE m, %%i, mm %+ %%i -CAT_XDEFINE nnmm, %%i, %%i -%assign %%i %%i+1 +CAT_XDEFINE m, %%i, mm %+ %%i +CAT_XDEFINE nnmm, %%i, %%i +%assign %%i %%i+1 %endrep %rep 8 -CAT_UNDEF m, %%i -CAT_UNDEF nnmm, %%i -%assign %%i %%i+1 +CAT_UNDEF m, %%i +CAT_UNDEF nnmm, %%i +%assign %%i %%i+1 %endrep INIT_CPUFLAGS %1 %endmacro @@ -864,7 +864,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define mmsize 16 %define num_mmregs 8 %if ARCH_X86_64 -%define num_mmregs 16 +%define num_mmregs 16 %endif %define mova movdqa %define movu movdqu @@ -872,9 +872,9 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define movnta movntdq %assign %%i 0 %rep num_mmregs -CAT_XDEFINE m, %%i, xmm %+ %%i -CAT_XDEFINE nnxmm, %%i, %%i -%assign %%i %%i+1 +CAT_XDEFINE m, %%i, xmm %+ %%i +CAT_XDEFINE nnxmm, %%i, %%i +%assign %%i %%i+1 %endrep INIT_CPUFLAGS %1 %endmacro @@ -885,7 +885,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define mmsize 32 %define num_mmregs 8 %if ARCH_X86_64 -%define num_mmregs 16 +%define num_mmregs 16 %endif %define mova movdqa %define movu movdqu @@ -893,9 +893,9 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %define movnta movntdq %assign %%i 0 %rep num_mmregs -CAT_XDEFINE m, %%i, ymm %+ %%i -CAT_XDEFINE nnymm, %%i, %%i -%assign %%i %%i+1 +CAT_XDEFINE m, %%i, ymm %+ %%i +CAT_XDEFINE nnymm, %%i, %%i +%assign %%i %%i+1 %endrep INIT_CPUFLAGS %1 %endmacro @@ -919,7 +919,7 @@ INIT_XMM %assign i 0 %rep 16 DECLARE_MMCAST i -%assign i i+1 +%assign i i+1 %endrep ; I often want to use macros that permute their arguments. e.g. 
there's no @@ -937,23 +937,23 @@ INIT_XMM ; doesn't cost any cycles. %macro PERMUTE 2-* ; takes a list of pairs to swap -%rep %0/2 -%xdefine %%tmp%2 m%2 -%rotate 2 -%endrep -%rep %0/2 -%xdefine m%1 %%tmp%2 -CAT_XDEFINE nn, m%1, %1 -%rotate 2 -%endrep +%rep %0/2 +
[libav-devel] [PATCH 0/8] x86inc: Sync changes from x264
The following patches were recently pushed to x264. Geza Lore (1): x86inc: Add debug symbols indicating sizes of compiled functions Henrik Gramner (7): x86inc: Make cpuflag() and notcpuflag() return 0 or 1 x86inc: Be more verbose in assertion failures x86inc: Improve FMA instruction handling x86inc: Preserve arguments when allocating stack space x86inc: Use more consistent indentation x86inc: Simplify AUTO_REP_RET x86inc: Avoid creating unnecessary local labels libavcodec/x86/proresdsp.asm| 2 +- libavutil/x86/x86inc.asm| 259 ++-- tests/checkasm/x86/checkasm.asm | 8 +- 3 files changed, 146 insertions(+), 123 deletions(-) -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 7/8] x86inc: Avoid creating unnecessary local labels
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such cpus. Note that EQU is just creating a local label when using nasm instead of yasm. This is probably a bug, but at least it doesn't break anything. --- libavutil/x86/x86inc.asm | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 05a5790..980d753 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -647,8 +647,10 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14 %rep %0 %macro %1 1-2 %1 %2 %1 -%%branch_instr: -%xdefine last_branch_adr %%branch_instr +%if notcpuflag(ssse3) +%%branch_instr equ $ +%xdefine last_branch_adr %%branch_instr +%endif %endmacro %rotate 1 %endrep -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 8/8] x86inc: Add debug symbols indicating sizes of compiled functions
From: Geza Lore

Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. e.g. this fixes `gdb`, but not `perf`. Currently only implemented for ELF. --- libavcodec/x86/proresdsp.asm| 2 +- libavutil/x86/x86inc.asm| 23 +++ tests/checkasm/x86/checkasm.asm | 8 3 files changed, 28 insertions(+), 5 deletions(-) diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/x86/proresdsp.asm index a0e97b3..5a329cb 100644 --- a/libavcodec/x86/proresdsp.asm +++ b/libavcodec/x86/proresdsp.asm @@ -54,7 +54,7 @@ cextern pw_8 cextern pw_512 cextern pw_1019 -section .text align=16 +SECTION .text ; interleave data while maintaining source ; %1=type, %2=dstlo, %3=dsthi, %4=src, %5=interleave diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 980d753..8afce5b 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -633,6 +633,7 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14 %else rep ret %endif +annotate_function_size %endmacro %define last_branch_adr $$ @@ -641,6 +642,7 @@ DECLARE_ARG 7, 8, 9, 10, 11, 12, 13, 14 times ((last_branch_adr-$)>>31)+1 rep ; times 1 iff $ == last_branch_adr.
%endif ret +annotate_function_size %endmacro %macro BRANCH_INSTR 0-* @@ -665,6 +667,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %elif %2 jmp %1 %endif +annotate_function_size %endmacro ;= @@ -686,6 +689,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, cglobal_internal 0, %1 %+ SUFFIX, %2 %endmacro %macro cglobal_internal 2-3+ +annotate_function_size %if %1 %xdefine %%FUNCTION_PREFIX private_prefix %xdefine %%VISIBILITY hidden @@ -699,6 +703,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, CAT_XDEFINE cglobaled_, %2, 1 %endif %xdefine current_function %2 +%xdefine current_function_section __SECT__ %if FORMAT_ELF global %2:function %%VISIBILITY %else @@ -747,6 +752,24 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, [SECTION .note.GNU-stack noalloc noexec nowrite progbits] %endif +; Tell debuggers how large the function was. +; This may be invoked multiple times per function; we rely on later instances overriding earlier ones. +; This is invoked by RET and similar macros, and also cglobal does it for the previous function, +; but if the last function in a source file doesn't use any of the standard macros for its epilogue, +; then its size might be unspecified. 
+%macro annotate_function_size 0 +%ifdef __YASM_VER__ +%ifdef current_function +%if FORMAT_ELF +current_function_section +%%ecf equ $ +size current_function %%ecf - current_function +__SECT__ +%endif +%endif +%endif +%endmacro + ; cpuflags %assign cpuflags_mmx (1<<0) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index 52d10ae..55212fc 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -66,14 +66,14 @@ cextern fail_func ;- cglobal stack_clobber, 1,2 ; Clobber the stack with junk below the stack pointer -%define size (max_args+6)*8 -SUB rsp, size -mov r1, size-8 +%define argsize (max_args+6)*8 +SUB rsp, argsize +mov r1, argsize-8 .loop: mov [rsp+r1], r0 sub r1, 8 jge .loop -ADD rsp, size +ADD rsp, argsize RET %if WIN64 -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 2/8] x86inc: Be more verbose in assertion failures
--- libavutil/x86/x86inc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index afcd6b8..dabb6cc 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -295,7 +295,7 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 %macro ASSERT 1 %if (%1) == 0 -%error assert failed +%error assertion ``%1'' failed %endif %endmacro -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 3/8] x86inc: Improve FMA instruction handling
* Correctly handle FMA instructions with memory operands. * Print a warning if FMA instructions are used without the correct cpuflag. * Simplify the instantiation code. * Clarify documentation. Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instructions to FMA3 instructions we can utilize the fact that multiply is a commutative operation and reorder operands if necessary to ensure that a memory operand is used only as the last operand. --- libavutil/x86/x86inc.asm | 77 +++- 1 file changed, 37 insertions(+), 40 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index dabb6cc..fc58b74 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1,7 +1,7 @@ ;* ;* x86inc.asm: x264asm abstraction layer ;* -;* Copyright (C) 2005-2015 x264 project +;* Copyright (C) 2005-2016 x264 project ;* ;* Authors: Loren Merritt;* Anton Mitrofanov @@ -1456,47 +1456,44 @@ FMA_INSTR pmadcswd, pmaddwd, paddd ; This lets us use tzcnt without bumping the yasm version requirement yet. %define tzcnt rep bsf -; convert FMA4 to FMA3 if possible -%macro FMA4_INSTR 4 -%macro %1 4-8 %1, %2, %3, %4 -%if cpuflag(fma4) -v%5 %1, %2, %3, %4 -%elifidn %1, %2 -v%6 %1, %4, %3 ; %1 = %1 * %3 + %4 -%elifidn %1, %3 -v%7 %1, %2, %4 ; %1 = %2 * %1 + %4 -%elifidn %1, %4 -v%8 %1, %2, %3 ; %1 = %2 * %3 + %1 -%else -%error fma3 emulation of ``%5 %1, %2, %3, %4'' is not supported -%endif -%endmacro +; Macros for consolidating FMA3 and FMA4 using 4-operand (dst, src1, src2, src3) syntax. +; FMA3 is only possible if dst is the same as one of the src registers. +; Either src2 or src3 can be a memory operand. 
+%macro FMA4_INSTR 2-* +%push fma4_instr +%xdefine %$prefix %1 +%rep %0 - 1 +%macro %$prefix%2 4-6 %$prefix, %2 +%if notcpuflag(fma3) && notcpuflag(fma4) +%error use of ``%5%6'' fma instruction in cpuname function: current_function +%elif cpuflag(fma4) +v%5%6 %1, %2, %3, %4 +%elifidn %1, %2 +; If %3 or %4 is a memory operand it needs to be encoded as the last operand. +%ifid %3 +v%{5}213%6 %2, %3, %4 +%else +v%{5}132%6 %2, %4, %3 +%endif +%elifidn %1, %3 +v%{5}213%6 %3, %2, %4 +%elifidn %1, %4 +v%{5}231%6 %4, %2, %3 +%else +%error fma3 emulation of ``%5%6 %1, %2, %3, %4'' is not supported +%endif +%endmacro +%rotate 1 +%endrep +%pop %endmacro -FMA4_INSTR fmaddpd, fmadd132pd, fmadd213pd, fmadd231pd -FMA4_INSTR fmaddps, fmadd132ps, fmadd213ps, fmadd231ps -FMA4_INSTR fmaddsd, fmadd132sd, fmadd213sd, fmadd231sd -FMA4_INSTR fmaddss, fmadd132ss, fmadd213ss, fmadd231ss - -FMA4_INSTR fmaddsubpd, fmaddsub132pd, fmaddsub213pd, fmaddsub231pd -FMA4_INSTR fmaddsubps, fmaddsub132ps, fmaddsub213ps, fmaddsub231ps -FMA4_INSTR fmsubaddpd, fmsubadd132pd, fmsubadd213pd, fmsubadd231pd -FMA4_INSTR fmsubaddps, fmsubadd132ps, fmsubadd213ps, fmsubadd231ps - -FMA4_INSTR fmsubpd, fmsub132pd, fmsub213pd, fmsub231pd -FMA4_INSTR fmsubps, fmsub132ps, fmsub213ps, fmsub231ps -FMA4_INSTR fmsubsd, fmsub132sd, fmsub213sd, fmsub231sd -FMA4_INSTR fmsubss, fmsub132ss, fmsub213ss, fmsub231ss - -FMA4_INSTR fnmaddpd, fnmadd132pd, fnmadd213pd, fnmadd231pd -FMA4_INSTR fnmaddps, fnmadd132ps, fnmadd213ps, fnmadd231ps -FMA4_INSTR fnmaddsd, fnmadd132sd, fnmadd213sd, fnmadd231sd -FMA4_INSTR fnmaddss, fnmadd132ss, fnmadd213ss, fnmadd231ss - -FMA4_INSTR fnmsubpd, fnmsub132pd, fnmsub213pd, fnmsub231pd -FMA4_INSTR fnmsubps, fnmsub132ps, fnmsub213ps, fnmsub231ps -FMA4_INSTR fnmsubsd, fnmsub132sd, fnmsub213sd, fnmsub231sd -FMA4_INSTR fnmsubss, fnmsub132ss, fnmsub213ss, fnmsub231ss +FMA4_INSTR fmadd,pd, ps, sd, ss +FMA4_INSTR fmaddsub, pd, ps +FMA4_INSTR fmsub,pd, ps, sd, ss +FMA4_INSTR fmsubadd, pd, ps +FMA4_INSTR 
fnmadd, pd, ps, sd, ss +FMA4_INSTR fnmsub, pd, ps, sd, ss ; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug (fixed in 1.3.0) %ifdef __YASM_VER__ -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
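The operand reordering described in the commit message can be sketched in C. `fma3_form` is a hypothetical helper mirroring the macro's dispatch: given the 4-operand form `dst = a*b + c`, it reports which FMA3 variant to emit so that the destination aliases one source and any memory operand ends up last:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the FMA4 -> FMA3 dispatch above. FMA3 forms:
 *   213: x1 = x2*x1 + x3    132: x1 = x1*x3 + x2    231: x1 = x2*x3 + x1
 * Only the last operand of an FMA3 instruction may be a memory location,
 * and multiplication being commutative lets us reorder a and b freely. */
static const char *fma3_form(const char *dst, const char *a,
                             const char *b, const char *c, int b_is_mem)
{
    if (!strcmp(dst, a))
        return b_is_mem ? "132" : "213"; /* 132 encodes b as the last operand */
    if (!strcmp(dst, b))
        return "213";                    /* b = a*b + c, c last */
    if (!strcmp(dst, c))
        return "231";                    /* c = a*b + c, b last */
    return NULL; /* dst aliases no source: FMA3 emulation impossible */
}
```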
Re: [libav-devel] [PATCH v2 1/1] x86: use emms after ff_int32_to_float_fmul_scalar_sse
On Wed, Dec 30, 2015 at 1:43 PM, Janne Grunau wrote: > libavcodec/x86/fmtconvert.asm | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) Ok.
Re: [libav-devel] [PATCH 1/1] x86: use emms after ff_int32_to_float_fmul_scalar_sse
On Tue, Dec 29, 2015 at 12:32 PM, Janne Grunau wrote: > Intel's Instruction Set Reference (as of September 2015) clearly states > that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the > source is a memory location. The Instruction Set Reference from 1999 > (Order Number 243191) describes this behaviour but all later versions > I've seen make no distinction whether MMX registers or memory is > used as source. > The documentation for the matching SSE2 instruction to convert to double > (cvtpi2pd) was fixed (see the valgrind bug > https://bugs.kde.org/show_bug.cgi?id=210264). > > It will take time to get a clarification and fixes in place. In the > meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to > be correct according to the documentation. The vast majority of users > will have SSE2 so a change to the SSE version has little effect. > > Fixes fate-checkasm on x86 valgrind targets. > > Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059 > --- > libavcodec/x86/fmtconvert.asm | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/libavcodec/x86/fmtconvert.asm b/libavcodec/x86/fmtconvert.asm > index 0383322..c2ff707 100644 > --- a/libavcodec/x86/fmtconvert.asm > +++ b/libavcodec/x86/fmtconvert.asm > @@ -61,6 +61,13 @@ cglobal int32_to_float_fmul_scalar, 4, 4, %1, dst, src, > mul, len > mova [dstq+lenq+16], m2 > add lenq, 32 > jl .loop > +%if cpuflag(sse) > +;; cvtpi2ps switches to MMX even if the source is a memory location > +;; possibly an error in documentation since every tested CPU disagrees > with > +;; that. Use emms anyway since the vast majority of machines will use the > +;; SSE2 variant > +emms > +%endif > REP_RET > %endmacro Should be notcpuflag(sse2). Also the REP_RET could be replaced with RET, but that's a pretty minor thing.
Re: [libav-devel] [PATCH 2/2] checkasm: x86: post commit review fixes
On Tue, Dec 22, 2015 at 10:59 PM, Janne Grunau wrote: > Check the full FPU tag word instead of only the upper half and simplify > the comparison. It previously only checked the lower half, not the upper. > Use upper-case function base name as macro name to instantiate both > checked_call variants. > --- > tests/checkasm/x86/checkasm.asm | 20 +--- > 1 file changed, 9 insertions(+), 11 deletions(-) Otherwise ok.
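The tag-word layout behind this fix can be modeled in plain C — a hypothetical helper, not checkasm code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the x87 FPU tag word: two bits per register, 0b11 =
 * empty, 0b00 = valid, so a fully empty FPU reads as 0xffff. A check
 * that inspects only one byte of the word covers just four of the
 * eight registers, which is why the full 16 bits must be compared. */
static uint16_t tag_word(const int in_use[8])
{
    uint16_t tw = 0;
    for (int i = 0; i < 8; i++)
        tw |= (uint16_t)((in_use[i] ? 0 : 3) << (2 * i));
    return tw;
}
```

For example, a leftover value in register 5 still leaves the low byte of the tag word at 0xff, so a single-byte comparison would miss it while `cmp` against the full word catches it.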
Re: [libav-devel] [PATCH 1/2] x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
On Tue, Dec 22, 2015 at 10:59 PM, Janne Grunau wrote: > This reverts commit 5dfe4edad63971d669ae456b0bc40ef9364cca80. > --- > libavcodec/x86/fmtconvert.asm | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) Ok.
Re: [libav-devel] [libav-commits] checkasm: add fmtconvert tests
On Tue, Dec 22, 2015 at 10:44 PM, Janne Grunau wrote: >> Intel's current documentation is very clear on cvtpi2ps: "This >> instruction causes a transition from x87 FPU to MMX technology >> operation". > > every tested silicon (nothing ancient or SSE only though) and the copy > of the manual from 1999 (Order Number 243191) (Pentium 3 and SSE were > introduced 1999) disagree. The instruction reference manual from 2002 > (Order Number 245471-006) misses the comment. But that could easily be > an edit error. The comments from the end of the instruction description > were removed and the main description was extended. Huh, that's interesting. Also note that cvtpi2ps and cvtpi2pd have different exception mechanics as well. > Since it will probably take ages to get this resolved: adding an emms > before the return of int32_to_float_fmul_scalar_sse is enough for all > x86 systems/calling conventions? That seems like the easiest, safest and most compatible way of handling it. Especially considering that CPUs with SSE2 won't even run that code path anyway.
Re: [libav-devel] [PATCH 1/1] x86: checkasm: check for or handle missing cleanup after MMX instructions
On Fri, Dec 11, 2015 at 6:40 PM, Janne Grunau wrote: > +#define declare_new_emms(cpu_flags, ret, ...) \ > +ret (*checked_call)(void *, int, int, int, int, int, __VA_ARGS__) = \ > +((cpu_flags) & av_get_cpu_flags()) ? (void > *)checkasm_checked_call_emms : \ > + (void *)checkasm_checked_call; > +#define declare_new_emms(cpu_flags, ret, ...) ret (*checked_call)(void *, __VA_ARGS__) = \ > +((cpu_flags) & av_get_cpu_flags()) ? (void > *)checkasm_checked_call_emms :\ > + Do we need to have cpu_flags as a parameter here? Couldn't we just use the checked_call_emms codepath unconditionally whenever declare_new_emms is used on x86 or am I missing something? > +%macro check_call 0-1 > +cglobal checked_call%1, 2,15,16,max_args*8+8 s/check_call/CHECKED_CALL/ Also I'm not sure if using %1 when it can be undefined is a good idea. It might just happen to accidentally work right now. > +%ifnid %1, _emms > +fstenv [rsp] > +mov r9h, [rsp + 8] > +add r9h, 1 > +jz .emms_ok > +report_fail error_message_emms > +emms > +.emms_ok: > +%else > +emms > +%endif You're not checking if registers 4-7 are empty here because the FPU tag word is 16 bits and rNh is an 8-bit register corresponding to bits 8-15 of a full register. mov/add should be replaced with cmp word [rsp + 8], 0x (and jz with je IMO even though they assemble to the same opcode because "equal" makes more sense than "zero" in that case). > +%ifnid %1, _emms > +fstenv [rsp] > +mov r3h, [rsp + 8] > +add r3h, 1 > +jz .emms_ok > +report_fail error_message_emms > +emms > +.emms_ok: > +%else > +emms > +%endif Ditto, also s/rsp/esp/ for consistency with the rest of the 32-bit code.
Re: [libav-devel] [libav-commits] checkasm: add fmtconvert tests
On Tue, Dec 22, 2015 at 5:41 PM, Janne Grunau wrote: > I found an HTML copy from 1999 of Intel's manual(1) which says that > cvtpi2ps with a memory location as source doesn't cause a transition to > MMX state. The current documentation for cvtpi2pd (packed int to packed > double conversion) says the same. Valgrind wasn't following that > until Vitor reported it as #210264(2) in 2009 and it was fixed in (3). > As Julian Seward says in the commit message the situation is a little > bit fishy. Intel's current documentation is very clear on cvtpi2ps: "This instruction causes a transition from x87 FPU to MMX technology operation". For cvtpi2pd it does point out that a state transition only happens when the source is an MMX register. I'm guessing that this difference in behavior is due to the fact that cvtpi2ps is SSE while cvtpi2pd is SSE2.
Re: [libav-devel] [PATCH 1/2] configure: Support msys2 out of box
On Sat, Nov 21, 2015 at 7:53 AM, Hendrik Leppkes wrote:
> msys2 provides various .sh scripts to set up the environment, one for
> msys2 building, and one for mingw32/64 respectively.
> You need to launch it using the appropriate shell script, not just by
> running sh.exe.
>
> - Hendrik

Which is not really obvious and somewhat counter-intuitive if you're using, say, msvc instead of mingw. Is there any downside to making stuff "just work" (tm) with the default msys shell?
[libav-devel] [PATCH] checkasm: Fix compilation with --disable-avcodec
--- tests/checkasm/checkasm.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 9219a83..3ed78b6 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -57,17 +57,19 @@ static const struct { const char *name; void (*func)(void); } tests[] = { -#if CONFIG_BSWAPDSP -{ "bswapdsp", checkasm_check_bswapdsp }, -#endif -#if CONFIG_H264PRED -{ "h264pred", checkasm_check_h264pred }, -#endif -#if CONFIG_H264QPEL -{ "h264qpel", checkasm_check_h264qpel }, -#endif -#if CONFIG_V210_ENCODER -{ "v210enc", checkasm_check_v210enc }, +#if CONFIG_AVCODEC +#if CONFIG_BSWAPDSP +{ "bswapdsp", checkasm_check_bswapdsp }, +#endif +#if CONFIG_H264PRED +{ "h264pred", checkasm_check_h264pred }, +#endif +#if CONFIG_H264QPEL +{ "h264qpel", checkasm_check_h264qpel }, +#endif +#if CONFIG_V210_ENCODER +{ "v210enc", checkasm_check_v210enc }, +#endif #endif { NULL } }; -- 1.9.1 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] checkasm: Fix compilation with --disable-avcodec
On Sun, Oct 4, 2015 at 8:39 PM, Luca Barbato wrote:
> Alternatively we might make sure if avcodec is disabled all its
> components are as well.
>
> might simplify a lot the code...

Yes, that's indeed a solid approach as well. Who's volunteering for that, though? I don't really know much about the build system.
[libav-devel] [PATCH] checkasm: Fix the function name sorting algorithm
The previous implementation was behaving incorrectly in some corner cases.
---
 tests/checkasm/checkasm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 013e197..9219a83 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -280,12 +280,16 @@ static void print_benchs(CheckasmFunc *f)
 /* ASCIIbetical sort except preserving natural order for numbers */
 static int cmp_func_names(const char *a, const char *b)
 {
+    const char *start = a;
     int ascii_diff, digit_diff;
 
-    for (; !(ascii_diff = *a - *b) && *a; a++, b++);
+    for (; !(ascii_diff = *(const unsigned char*)a - *(const unsigned char*)b) && *a; a++, b++);
     for (; av_isdigit(*a) && av_isdigit(*b); a++, b++);
 
-    return (digit_diff = av_isdigit(*a) - av_isdigit(*b)) ? digit_diff : ascii_diff;
+    if (a > start && av_isdigit(a[-1]) && (digit_diff = av_isdigit(*a) - av_isdigit(*b)))
+        return digit_diff;
+
+    return ascii_diff;
 }
 
 /* Perform a tree rotation in the specified direction and return the new root */
-- 
1.9.1
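To see what the "natural order for numbers" comparison does, the fixed comparator can be exercised standalone. This is a transcription of the patched function with av_isdigit() swapped for a local helper so it builds outside the libav tree; the logic is otherwise unchanged.

```c
#include <assert.h>

/* Local stand-in for av_isdigit() so the sketch compiles standalone */
static int is_digit(int c)
{
    return c >= '0' && c <= '9';
}

/* ASCIIbetical order, except that digit runs following an equal prefix
 * compare in natural (numeric) order: "sad_8x8" < "sad_16x16" */
static int cmp_func_names(const char *a, const char *b)
{
    const char *start = a;
    int ascii_diff, digit_diff;

    /* Advance past the common prefix, remembering the first byte difference */
    for (; !(ascii_diff = *(const unsigned char*)a - *(const unsigned char*)b) && *a; a++, b++);
    /* Skip the remainder of a shared digit run, if any */
    for (; is_digit(*a) && is_digit(*b); a++, b++);

    /* If the difference occurred inside a digit run, the shorter run of
     * digits (i.e. the numerically smaller value) sorts first */
    if (a > start && is_digit(a[-1]) && (digit_diff = is_digit(*a) - is_digit(*b)))
        return digit_diff;

    return ascii_diff;
}
```

The `a > start && is_digit(a[-1])` guard is the actual fix: it restricts the numeric comparison to differences that really occur inside a digit run, instead of applying it whenever a digit happens to follow the mismatch.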
Re: [libav-devel] [PATCH] avutil/avstring: Inline some tiny functions
On Mon, Sep 28, 2015 at 9:49 AM, Anton Khirnov wrote:
> But does it actually improve performance measurably? I'd argue that
> those functions are used in places where it doesn't really matter.

I was using some perf tools through checkasm when I noticed an awful lot of time was spent calling av_isdigit(), which was kind of silly, and inlining it made it run around 5% faster overall. But yes, it's obviously not a performance-critical piece of code by any means. I haven't really looked at other code that uses any of those functions though.

> And since inline public functions tend to generate pain, it's better to
> avoid them unless there are large practical gains otherwise.

av_toupper() and av_tolower() are similar short functions in the same file that are currently inlined though, so one could argue that this patch improves consistency if nothing else.
[libav-devel] [PATCH] avutil/avstring: Inline some tiny functions
They're short enough that inlining them actually reduces code size due to all the overhead associated with making a function call. --- libavutil/avstring.c | 22 -- libavutil/avstring.h | 22 ++ 2 files changed, 18 insertions(+), 26 deletions(-) diff --git a/libavutil/avstring.c b/libavutil/avstring.c index eb5c95a..5a443ab 100644 --- a/libavutil/avstring.c +++ b/libavutil/avstring.c @@ -212,28 +212,6 @@ const char *av_dirname(char *path) return path; } -int av_isdigit(int c) -{ -return c >= '0' && c <= '9'; -} - -int av_isgraph(int c) -{ -return c > 32 && c < 127; -} - -int av_isspace(int c) -{ -return c == ' ' || c == '\f' || c == '\n' || c == '\r' || c == '\t' || - c == '\v'; -} - -int av_isxdigit(int c) -{ -c = av_tolower(c); -return av_isdigit(c) || (c >= 'a' && c <= 'f'); -} - int av_match_name(const char *name, const char *names) { const char *p; diff --git a/libavutil/avstring.h b/libavutil/avstring.h index 7c30ee1..780f109 100644 --- a/libavutil/avstring.h +++ b/libavutil/avstring.h @@ -154,17 +154,27 @@ char *av_get_token(const char **buf, const char *term); /** * Locale-independent conversion of ASCII isdigit. */ -av_const int av_isdigit(int c); +static inline av_const int av_isdigit(int c) +{ +return c >= '0' && c <= '9'; +} /** * Locale-independent conversion of ASCII isgraph. */ -av_const int av_isgraph(int c); +static inline av_const int av_isgraph(int c) +{ +return c > 32 && c < 127; +} /** * Locale-independent conversion of ASCII isspace. */ -av_const int av_isspace(int c); +static inline av_const int av_isspace(int c) +{ +return c == ' ' || c == '\f' || c == '\n' || c == '\r' || c == '\t' || + c == '\v'; +} /** * Locale-independent conversion of ASCII characters to uppercase. @@ -189,7 +199,11 @@ static inline av_const int av_tolower(int c) /** * Locale-independent conversion of ASCII isxdigit. 
*/ -av_const int av_isxdigit(int c); +static inline av_const int av_isxdigit(int c) +{ +c = av_tolower(c); +return av_isdigit(c) || (c >= 'a' && c <= 'f'); +} /* * Locale-independent case-insensitive compare. -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
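For a quick sanity check outside the tree, the helper bodies from the patch can be compiled standalone. The `av_` prefixes and attribute macros are dropped here; `tolower_ascii()` is an assumption, since the patch only calls av_tolower() and its body lies outside this hunk, so a plain ASCII-range version is substituted.

```c
#include <assert.h>

/* Assumed ASCII-only lowercasing; av_tolower()'s body is not in the hunk */
static int tolower_ascii(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + 0x20 : c;
}

/* Bodies below are transcribed from the patch; none of them consult the
 * current locale, which is the entire point of these helpers */
static int isdigit_ascii(int c)
{
    return c >= '0' && c <= '9';
}

static int isgraph_ascii(int c)
{
    return c > 32 && c < 127;
}

static int isspace_ascii(int c)
{
    return c == ' ' || c == '\f' || c == '\n' || c == '\r' || c == '\t' ||
           c == '\v';
}

static int isxdigit_ascii(int c)
{
    c = tolower_ascii(c);
    return isdigit_ascii(c) || (c >= 'a' && c <= 'f');
}
```

Unlike the <ctype.h> versions, these give the same answers regardless of the process locale, so parsers built on them behave identically everywhere.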
[libav-devel] [PATCH] checkasm: Use a self-balancing tree
Tested functions are internally kept in a binary search tree for efficient lookups. The downside of the current implementation is that the tree quickly becomes unbalanced which causes an unneccessary amount of comparisons between nodes. Improve this by changing the tree into a self-balancing left-leaning red-black tree with a worst case lookup/insertion time complexity of O(log n). Significantly reduces the recursion depth and makes the tests run around 10% faster overall. The relative performance improvement compared to the existing non-balanced tree will also most likely increase as more tests are added. --- tests/checkasm/checkasm.c | 59 +-- 1 file changed, 47 insertions(+), 12 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 3d47a43..013e197 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -125,6 +125,7 @@ typedef struct CheckasmFuncVersion { typedef struct CheckasmFunc { struct CheckasmFunc *child[2]; CheckasmFuncVersion versions; +uint8_t color; /* 0 = red, 1 = black */ char name[1]; } CheckasmFunc; @@ -287,24 +288,57 @@ static int cmp_func_names(const char *a, const char *b) return (digit_diff = av_isdigit(*a) - av_isdigit(*b)) ? 
digit_diff : ascii_diff; } +/* Perform a tree rotation in the specified direction and return the new root */ +static CheckasmFunc *rotate_tree(CheckasmFunc *f, int dir) +{ +CheckasmFunc *r = f->child[dir^1]; +f->child[dir^1] = r->child[dir]; +r->child[dir] = f; +r->color = f->color; +f->color = 0; +return r; +} + +#define is_red(f) ((f) && !(f)->color) + +/* Balance a left-leaning red-black tree at the specified node */ +static void balance_tree(CheckasmFunc **root) +{ +CheckasmFunc *f = *root; + +if (is_red(f->child[0]) && is_red(f->child[1])) { +f->color ^= 1; +f->child[0]->color = f->child[1]->color = 1; +} + +if (!is_red(f->child[0]) && is_red(f->child[1])) +*root = rotate_tree(f, 0); /* Rotate left */ +else if (is_red(f->child[0]) && is_red(f->child[0]->child[0])) +*root = rotate_tree(f, 1); /* Rotate right */ +} + /* Get a node with the specified name, creating it if it doesn't exist */ -static CheckasmFunc *get_func(const char *name, int length) +static CheckasmFunc *get_func(CheckasmFunc **root, const char *name) { -CheckasmFunc *f, **f_ptr = +CheckasmFunc *f = *root; -/* Search the tree for a matching node */ -while ((f = *f_ptr)) { +if (f) { +/* Search the tree for a matching node */ int cmp = cmp_func_names(name, f->name); -if (!cmp) -return f; +if (cmp) { +f = get_func(>child[cmp > 0], name); -f_ptr = >child[(cmp > 0)]; +/* Rebalance the tree on the way up if a new node was inserted */ +if (!f->versions.func) +balance_tree(root); +} +} else { +/* Allocate and insert a new node into the tree */ +int name_length = strlen(name); +f = *root = checkasm_malloc(sizeof(CheckasmFunc) + name_length); +memcpy(f->name, name, name_length + 1); } -/* Allocate and insert a new node into the tree */ -f = *f_ptr = checkasm_malloc(sizeof(CheckasmFunc) + length); -memcpy(f->name, name, length+1); - return f; } @@ -405,7 +439,8 @@ void *checkasm_check_func(void *func, const char *name, ...) 
if (!func || name_length <= 0 || name_length >= sizeof(name_buf)) return NULL; -state.current_func = get_func(name_buf, name_length); +state.current_func = get_func(, name_buf); +state.funcs->color = 1; v = _func->versions; if (v->func) { -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
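The balancing scheme is easier to follow in isolation. Below is a minimal sketch of the same left-leaning red-black logic over plain int keys; the rotate and balance steps mirror the patch, but the type and function names are illustrative, not taken from checkasm.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    struct Node *child[2];
    int color; /* 0 = red, 1 = black */
    int key;
} Node;

#define IS_RED(n) ((n) && !(n)->color)

/* Rotate in the specified direction and return the new subtree root */
static Node *rotate(Node *f, int dir)
{
    Node *r = f->child[dir ^ 1];
    f->child[dir ^ 1] = r->child[dir];
    r->child[dir] = f;
    r->color = f->color;
    f->color = 0;
    return r;
}

/* Restore the left-leaning red-black invariants at one node */
static void balance(Node **root)
{
    Node *f = *root;
    if (IS_RED(f->child[0]) && IS_RED(f->child[1])) {
        f->color ^= 1;
        f->child[0]->color = f->child[1]->color = 1;
    }
    if (!IS_RED(f->child[0]) && IS_RED(f->child[1]))
        *root = rotate(f, 0); /* rotate left */
    else if (IS_RED(f->child[0]) && IS_RED(f->child[0]->child[0]))
        *root = rotate(f, 1); /* rotate right */
}

static void insert(Node **root, int key)
{
    Node *f = *root;
    if (f) {
        if (key != f->key) {
            insert(&f->child[key > f->key], key);
            balance(root); /* rebalance on the way back up */
        }
    } else {
        f = *root = calloc(1, sizeof(*f)); /* new nodes start out red */
        f->key = key;
    }
}

static void tree_insert(Node **root, int key)
{
    insert(root, key);
    (*root)->color = 1; /* the root is always black */
}

static int count(const Node *f)
{
    return f ? 1 + count(f->child[0]) + count(f->child[1]) : 0;
}

static int depth(const Node *f)
{
    int l = f ? depth(f->child[0]) : 0;
    int r = f ? depth(f->child[1]) : 0;
    return f ? 1 + (l > r ? l : r) : 0;
}
```

Sequential insertion is the pathological case for an unbalanced binary search tree (it degenerates into a linked list); with the balancing above, 1023 ascending keys stay within the red-black height bound of roughly 2*log2(n+1) levels.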
[libav-devel] [PATCH] checkasm/x86: Correctly handle variadic functions
The System V ABI on x86-64 specifies that the al register contains an upper bound of the number of arguments passed in vector registers when calling variadic functions, so we aren't allowed to clobber it. checkasm_fail_func() is a variadic function so also zero al before calling it. --- tests/checkasm/x86/checkasm.asm | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index 828352c..94b19b6 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -77,8 +77,10 @@ cglobal stack_clobber, 1,2 %if WIN64 %assign free_regs 7 +DECLARE_REG_TMP 4 %else %assign free_regs 9 +DECLARE_REG_TMP 7 %endif ;- @@ -86,7 +88,7 @@ cglobal stack_clobber, 1,2 ;- INIT_XMM cglobal checked_call, 2,15,16,max_args*8+8 -mov r6, r0 +mov t0, r0 ; All arguments have been pushed on the stack instead of registers in order to ; test for incorrect assumptions that 32-bit ints are zero-extended to 64-bit. @@ -129,7 +131,7 @@ cglobal checked_call, 2,15,16,max_args*8+8 mov r %+ i, [n %+ i] %assign i i-1 %endrep -call r6 +call t0 %assign i 14 %rep 15-free_regs xor r %+ i, [n %+ i] @@ -156,6 +158,7 @@ cglobal checked_call, 2,15,16,max_args*8+8 mov r9, rax mov r10, rdx lea r0, [error_message] +xor eax, eax call fail_func mov rdx, r10 mov rax, r9 -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
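For context, the rule being honored here is worth spelling out: the System V AMD64 ABI requires the caller of a variadic function to place an upper bound on the number of vector registers used for arguments in al before the call. A C compiler emits that automatically, which is easy to forget when the call site is hand-written assembly, as in checked_call. A hedged sketch of the kind of callee involved; the name is illustrative, not from checkasm:

```c
#include <assert.h>
#include <stdarg.h>

/* A variadic callee comparable in shape to checkasm_fail_func(). When a C
 * compiler emits a call to this, it sets al to the number of vector
 * registers carrying variadic arguments (0 here, all arguments are ints);
 * hand-written asm calling such a function must do the same. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}
```

A callee is allowed to use the value in al to decide how many vector registers to spill, so garbage in al can misbehave in arbitrary ways even when no floating-point arguments are passed.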
Re: [libav-devel] [PATCH] tiny_psnr: Use the correct abs() version
On Tue, Sep 22, 2015 at 9:28 PM, Vittorio Giovara wrote:
> I am puzzled as well, msdn reports this function available only from
> vs2013, but there is a vs2012 fate instance which seems to compile
> fine with it.

That wouldn't exactly be the first incorrect thing in the MSDN documentation though.
[libav-devel] [PATCH] checkasm: v210: Fix array overwrite
---
 tests/checkasm/v210enc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/checkasm/v210enc.c b/tests/checkasm/v210enc.c
index cdb8e76..4f5f6ba 100644
--- a/tests/checkasm/v210enc.c
+++ b/tests/checkasm/v210enc.c
@@ -43,7 +43,7 @@
         AV_WN32A(v0 + i, r);                       \
         AV_WN32A(v1 + i, r);                       \
     }                                              \
-    for (i = 0; i < BUF_SIZE * 8 / 3; i += 4) {    \
+    for (i = 0; i < width * 8 / 3; i += 4) {       \
         uint32_t r = rnd();                        \
         AV_WN32A(dst0 + i, r);                     \
         AV_WN32A(dst1 + i, r);                     \
-- 
1.8.3.2
[libav-devel] [PATCH] checkasm: add unit tests for v210enc
--- libavcodec/v210enc.c | 15 +--- libavcodec/v210enc.h | 2 + tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 ++ tests/checkasm/checkasm.h | 1 + tests/checkasm/v210enc.c | 94 +++ 6 files changed, 111 insertions(+), 5 deletions(-) create mode 100644 tests/checkasm/v210enc.c diff --git a/libavcodec/v210enc.c b/libavcodec/v210enc.c index 42cbb86..ca6ad2e 100644 --- a/libavcodec/v210enc.c +++ b/libavcodec/v210enc.c @@ -82,6 +82,15 @@ static void v210_planar_pack_10_c(const uint16_t *y, const uint16_t *u, } } +av_cold void ff_v210enc_init(V210EncContext *s) +{ +s->pack_line_8 = v210_planar_pack_8_c; +s->pack_line_10 = v210_planar_pack_10_c; + +if (ARCH_X86) +ff_v210enc_init_x86(s); +} + static av_cold int encode_init(AVCodecContext *avctx) { V210EncContext *s = avctx->priv_data; @@ -97,11 +106,7 @@ FF_DISABLE_DEPRECATION_WARNINGS FF_ENABLE_DEPRECATION_WARNINGS #endif -s->pack_line_8 = v210_planar_pack_8_c; -s->pack_line_10 = v210_planar_pack_10_c; - -if (ARCH_X86) -ff_v210enc_init_x86(s); +ff_v210enc_init(s); return 0; } diff --git a/libavcodec/v210enc.h b/libavcodec/v210enc.h index be9b66d..81a3531 100644 --- a/libavcodec/v210enc.h +++ b/libavcodec/v210enc.h @@ -30,6 +30,8 @@ typedef struct V210EncContext { const uint16_t *v, uint8_t *dst, ptrdiff_t width); } V210EncContext; +void ff_v210enc_init(V210EncContext *s); + void ff_v210enc_init_x86(V210EncContext *s); #endif /* AVCODEC_V210ENC_H */ diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index ff57aa8..5fccad9 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -2,6 +2,7 @@ AVCODECOBJS-$(CONFIG_BSWAPDSP) += bswapdsp.o AVCODECOBJS-$(CONFIG_H264PRED) += h264pred.o AVCODECOBJS-$(CONFIG_H264QPEL) += h264qpel.o +AVCODECOBJS-$(CONFIG_V210_ENCODER) += v210enc.o CHECKASMOBJS-$(CONFIG_AVCODEC) += $(AVCODECOBJS-yes) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index a120bc3..3d47a43 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -66,6 
+66,9 @@ static const struct { #if CONFIG_H264QPEL { "h264qpel", checkasm_check_h264qpel }, #endif +#if CONFIG_V210_ENCODER +{ "v210enc", checkasm_check_v210enc }, +#endif { NULL } }; diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index cbf3dca..aa32655 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -32,6 +32,7 @@ void checkasm_check_bswapdsp(void); void checkasm_check_h264pred(void); void checkasm_check_h264qpel(void); +void checkasm_check_v210enc(void); void *checkasm_check_func(void *func, const char *name, ...) av_printf_format(2, 3); int checkasm_bench_func(void); diff --git a/tests/checkasm/v210enc.c b/tests/checkasm/v210enc.c new file mode 100644 index 000..08c79e4 --- /dev/null +++ b/tests/checkasm/v210enc.c @@ -0,0 +1,94 @@ +/* + * Copyright (c) 2015 Henrik Gramner + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * Libav is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with Libav; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 
+ */ + +#include +#include "checkasm.h" +#include "libavcodec/v210enc.h" +#include "libavutil/common.h" +#include "libavutil/internal.h" +#include "libavutil/intreadwrite.h" + +#define BUF_SIZE 512 + +#define randomize_buffers(mask)\ +do { \ +int i, size = sizeof(*y0); \ +for (i = 0; i < BUF_SIZE; i += 4/size) { \ +uint32_t r = rnd() & mask; \ +AV_WN32A(y0+i, r); \ +AV_WN32A(y1+i, r); \ +} \ +for (i = 0; i < BUF_SIZE/2; i += 4/size) { \ +uint32_t r = rnd() & mask; \ +AV_WN32A(u0+i, r); \ +AV_WN32A(u1+i, r); \ +r = rnd() & mask; \ +AV_WN32A(v0+i, r); \ +AV_WN32A(v1+i, r); \ +} \ +for (i = 0; i <
[libav-devel] [PATCH] checkasm: Fix floating point arguments on 64-bit Windows
--- tests/checkasm/x86/checkasm.asm | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index 4948fc9..828352c 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -103,16 +103,20 @@ cglobal checked_call, 2,15,16,max_args*8+8 mov [rsp+(i-6)*8], r9 %assign i i+1 %endrep -%else +%else ; WIN64 %assign i 4 %rep max_args-4 mov r9, [rsp+stack_offset+(i+7)*8] mov [rsp+i*8], r9 %assign i i+1 %endrep -%endif -%if WIN64 +; Move possible floating-point arguments to the correct registers +movq m0, r0 +movq m1, r1 +movq m2, r2 +movq m3, r3 + %assign i 6 %rep 16-6 mova m %+ i, [x %+ i] -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] hevcdsp: add x86 SIMD for MC
On Sun, Aug 23, 2015 at 8:27 PM, Anton Khirnov <an...@khirnov.net> wrote:
> Quoting James Almer (2015-08-22 23:58:41)
>> You need to use the d suffix instead of q on the register names to
>> make sure the high bits are cleared.
>
> Eh? Perhaps I'm misunderstanding something, but I'd expect that using d
> here would do exactly the opposite and keep the random data in the high
> bits.

Operations on 32-bit registers zero the high bits of the register.
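The zero-extension rule has a direct C analogue: converting a value through a 32-bit unsigned type discards the upper half and guarantees zeros when widening back, just like a write to a d register does on x86-64. A tiny standalone check, illustrative only and not part of any patch here:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors what writing to a 32-bit (d) register does on x86-64: the
 * result is zero-extended, so the upper 32 bits cannot retain stale data. */
static uint64_t low32(uint64_t x)
{
    uint32_t lo = (uint32_t)x; /* the "write to the d register" */
    return (uint64_t)lo;       /* upper 32 bits are guaranteed zero */
}
```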
Re: [libav-devel] [PATCH] checkasm: add HEVC MC tests
Minor nits:

> +#define randomize_buffers(buf, size, depth)

s/buffers/buffer/ since you're only randomizing a single one at a time.

> +static const char *interp_names[2][2] = { { "pixels", "h" }, { "v", "hv" } };

const char * const

Otherwise lgtm.
[libav-devel] [PATCH v2] checkasm: Explicitly declare function prototypes
Now we no longer have to rely on function pointers intentionally declared without specified argument types. This makes it easier to support functions with floating point parameters or return values as well as functions returning 64-bit values on 32-bit architectures. It also avoids having to explicitly cast strides to ptrdiff_t for example. --- v2: Updated to fix comments in x86/checkasm.asm --- tests/checkasm/Makefile | 3 --- tests/checkasm/bswapdsp.c | 2 ++ tests/checkasm/checkasm.c | 6 +++--- tests/checkasm/checkasm.h | 38 ++ tests/checkasm/h264pred.c | 32 tests/checkasm/h264qpel.c | 7 --- tests/checkasm/x86/checkasm.asm | 4 ++-- 7 files changed, 53 insertions(+), 39 deletions(-) diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 9498ebf..ff57aa8 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -17,9 +17,6 @@ CHECKASMDIRS := $(sort $(dir $(CHECKASMOBJS))) $(CHECKASMOBJS): | $(CHECKASMDIRS) OBJDIRS += $(CHECKASMDIRS) -# We rely on function pointers intentionally declared without specified argument types. 
-tests/checkasm/%.o: CFLAGS := $(CFLAGS:-Wstrict-prototypes=-Wno-strict-prototypes) - CHECKASM := tests/checkasm/checkasm$(EXESUF) $(CHECKASM): $(EXEOBJS) $(CHECKASMOBJS) $(FF_STATIC_DEP_LIBS) diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c index 748a886..829ebaa 100644 --- a/tests/checkasm/bswapdsp.c +++ b/tests/checkasm/bswapdsp.c @@ -43,6 +43,8 @@ #define check_bswap(type) \ do { \ int w; \ +declare_func(void, type *dst, const type *src, int w); \ + \ for (w = 0; w BUF_SIZE / sizeof(type); w++) { \ int offset = (BUF_SIZE / sizeof(type) - w) 15; /* Test various alignments */ \ randomize_buffers(); \ diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index b564e7e..a120bc3 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -111,7 +111,7 @@ static const struct { typedef struct CheckasmFuncVersion { struct CheckasmFuncVersion *next; -intptr_t (*func)(); +void *func; int ok; int cpu; int iterations; @@ -387,10 +387,10 @@ int main(int argc, char *argv[]) /* Decide whether or not the specified function needs to be tested and * allocate/initialize data structures if needed. Returns a pointer to a * reference function if the function should be tested, otherwise NULL */ -intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() +void *checkasm_check_func(void *func, const char *name, ...) { char name_buf[256]; -intptr_t (*ref)() = func; +void *ref = func; CheckasmFuncVersion *v; int name_length; va_list arg; diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index 443546a..cbf3dca 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -33,7 +33,7 @@ void checkasm_check_bswapdsp(void); void checkasm_check_h264pred(void); void checkasm_check_h264qpel(void); -intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() av_printf_format(2, 3); +void *checkasm_check_func(void *func, const char *name, ...) 
av_printf_format(2, 3); int checkasm_bench_func(void); void checkasm_fail_func(const char *msg, ...) av_printf_format(1, 2); void checkasm_update_bench(int iterations, uint64_t cycles); @@ -42,14 +42,16 @@ void checkasm_report(const char *name, ...) av_printf_format(1, 2); extern AVLFG checkasm_lfg; #define rnd() av_lfg_get(checkasm_lfg) -static av_unused intptr_t (*func_ref)(); -static av_unused intptr_t (*func_new)(); +static av_unused void *func_ref, *func_new; #define BENCH_RUNS 1000 /* Trade-off between accuracy and speed */ /* Decide whether or not the specified function needs to be tested */ -#define check_func(func, ...) ((func_new = (intptr_t (*)())func) \ - (func_ref = checkasm_check_func(func_new, __VA_ARGS__))) +#define check_func(func, ...) (func_ref = checkasm_check_func((func_new = func), __VA_ARGS__)) + +/* Declare the function prototype. The first argument is the return value, the remaining + * arguments are the function parameters. Naming parameters is optional. */ +#define declare_func(ret, ...) declare_new(ret, __VA_ARGS__) typedef ret func_type(__VA_ARGS__) /* Indicate that the current test has failed */ #define fail() checkasm_fail_func(%s:%d, av_basename(__FILE__), __LINE__) @@
Re: [libav-devel] [PATCH 7/8] checkasm: add HEVC MC tests
On Wed, Aug 19, 2015 at 9:43 PM, Anton Khirnov <an...@khirnov.net> wrote:
> +const int srcstride = FFALIGN(width, 16) * sizeof(*src0);
> +const int dststride = FFALIGN(width, 16) * PIXEL_SIZE(bit_depth);

Strides, and any other pointer-sized value, should be ptrdiff_t - or preferably, review/push my checkasm patch and rebase this one on top of that to get rid of the issue. :)

> +report("%s", "qpel");
> +report("%s", "epel");
> +report("%s", "unweighted_pred");
> +report("%s", "weighted_pred");

The "%s" is redundant with string literals.
[libav-devel] [PATCH] checkasm: Explicitly declare function prototypes
Now we no longer have to rely on function pointers intentionally declared without specified argument types. This makes it easier to support functions with floating point parameters or return values as well as functions returning 64-bit values on 32-bit architectures. It also avoids having to explicitly cast strides to ptrdiff_t for example. --- tests/checkasm/Makefile | 3 --- tests/checkasm/bswapdsp.c | 2 ++ tests/checkasm/checkasm.c | 6 +++--- tests/checkasm/checkasm.h | 38 ++ tests/checkasm/h264pred.c | 32 tests/checkasm/h264qpel.c | 7 --- tests/checkasm/x86/checkasm.asm | 2 +- 7 files changed, 52 insertions(+), 38 deletions(-) diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 9498ebf..ff57aa8 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -17,9 +17,6 @@ CHECKASMDIRS := $(sort $(dir $(CHECKASMOBJS))) $(CHECKASMOBJS): | $(CHECKASMDIRS) OBJDIRS += $(CHECKASMDIRS) -# We rely on function pointers intentionally declared without specified argument types. 
-tests/checkasm/%.o: CFLAGS := $(CFLAGS:-Wstrict-prototypes=-Wno-strict-prototypes) - CHECKASM := tests/checkasm/checkasm$(EXESUF) $(CHECKASM): $(EXEOBJS) $(CHECKASMOBJS) $(FF_STATIC_DEP_LIBS) diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c index 748a886..829ebaa 100644 --- a/tests/checkasm/bswapdsp.c +++ b/tests/checkasm/bswapdsp.c @@ -43,6 +43,8 @@ #define check_bswap(type) \ do { \ int w; \ +declare_func(void, type *dst, const type *src, int w); \ + \ for (w = 0; w BUF_SIZE / sizeof(type); w++) { \ int offset = (BUF_SIZE / sizeof(type) - w) 15; /* Test various alignments */ \ randomize_buffers(); \ diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index b564e7e..a120bc3 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -111,7 +111,7 @@ static const struct { typedef struct CheckasmFuncVersion { struct CheckasmFuncVersion *next; -intptr_t (*func)(); +void *func; int ok; int cpu; int iterations; @@ -387,10 +387,10 @@ int main(int argc, char *argv[]) /* Decide whether or not the specified function needs to be tested and * allocate/initialize data structures if needed. Returns a pointer to a * reference function if the function should be tested, otherwise NULL */ -intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() +void *checkasm_check_func(void *func, const char *name, ...) { char name_buf[256]; -intptr_t (*ref)() = func; +void *ref = func; CheckasmFuncVersion *v; int name_length; va_list arg; diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index 443546a..cbf3dca 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm/checkasm.h @@ -33,7 +33,7 @@ void checkasm_check_bswapdsp(void); void checkasm_check_h264pred(void); void checkasm_check_h264qpel(void); -intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() av_printf_format(2, 3); +void *checkasm_check_func(void *func, const char *name, ...) 
av_printf_format(2, 3); int checkasm_bench_func(void); void checkasm_fail_func(const char *msg, ...) av_printf_format(1, 2); void checkasm_update_bench(int iterations, uint64_t cycles); @@ -42,14 +42,16 @@ void checkasm_report(const char *name, ...) av_printf_format(1, 2); extern AVLFG checkasm_lfg; #define rnd() av_lfg_get(checkasm_lfg) -static av_unused intptr_t (*func_ref)(); -static av_unused intptr_t (*func_new)(); +static av_unused void *func_ref, *func_new; #define BENCH_RUNS 1000 /* Trade-off between accuracy and speed */ /* Decide whether or not the specified function needs to be tested */ -#define check_func(func, ...) ((func_new = (intptr_t (*)())func) \ - (func_ref = checkasm_check_func(func_new, __VA_ARGS__))) +#define check_func(func, ...) (func_ref = checkasm_check_func((func_new = func), __VA_ARGS__)) + +/* Declare the function prototype. The first argument is the return value, the remaining + * arguments are the function parameters. Naming parameters is optional. */ +#define declare_func(ret, ...) declare_new(ret, __VA_ARGS__) typedef ret func_type(__VA_ARGS__) /* Indicate that the current test has failed */ #define fail() checkasm_fail_func(%s:%d, av_basename(__FILE__), __LINE__) @@ -58,18 +60,16 @@ static av_unused intptr_t (*func_new)();
[libav-devel] [PATCH] checkasm: x86: properly save rdx/edx in checked_call()
If the return value doesn't fit in a single register rdx/edx can in some cases be used in addition to rax/eax. Doesn't affect any of the existing checkasm tests but might be useful later. Also comment the relevant code a bit better. --- tests/checkasm/x86/checkasm.asm | 7 +++ 1 file changed, 7 insertions(+) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index cc8745f..38fc90e 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -145,10 +145,15 @@ cglobal checked_call, 2,15,16,max_args*8+8 or r14, r5 %endif +; Call fail_func() with a descriptive message to mark it as a failure +; if the called function didn't preserve all callee-saved registers. +; Save the return value located in rdx:rax first to prevent clobbering. jz .ok mov r9, rax +mov r10, rdx lea r0, [error_message] call fail_func +mov rdx, r10 mov rax, r9 .ok: RET @@ -182,9 +187,11 @@ cglobal checked_call, 1,7 or r3, r5 jz .ok mov r3, eax +mov r4, edx lea r0, [error_message] mov [esp], r0 call fail_func +mov edx, r4 mov eax, r3 .ok: add esp, max_args*4 -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
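A hedged illustration of when the rdx/edx half of the pair matters: on 32-bit x86, any C function returning a 64-bit value hands it back in edx:eax, for example a widening multiply. (On x86-64 this particular example fits in rax alone; there it takes a 128-bit result to occupy rdx:rax.)

```c
#include <assert.h>
#include <stdint.h>

/* On 32-bit x86 the 64-bit result of this function is returned in the
 * edx:eax register pair, which is why the 32-bit path of checked_call
 * must preserve edx as well as eax around the fail_func() call. */
static uint64_t mul32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

Without the save/restore added by this patch, a register-preservation failure report would silently corrupt the upper half of such a return value.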
[libav-devel] [PATCH] x86inc: Various minor backports from x264
--- libavutil/x86/x86inc.asm | 32 +--- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index a519fd5..6ad9785 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1,7 +1,7 @@ ;* ;* x86inc.asm: x264asm abstraction layer ;* -;* Copyright (C) 2005-2013 x264 project +;* Copyright (C) 2005-2015 x264 project ;* ;* Authors: Loren Merritt lor...@u.washington.edu ;* Anton Mitrofanov bugmas...@narod.ru @@ -67,6 +67,15 @@ %endif %endif +%define FORMAT_ELF 0 +%ifidn __OUTPUT_FORMAT__,elf +%define FORMAT_ELF 1 +%elifidn __OUTPUT_FORMAT__,elf32 +%define FORMAT_ELF 1 +%elifidn __OUTPUT_FORMAT__,elf64 +%define FORMAT_ELF 1 +%endif + %ifdef PREFIX %define mangle(x) _ %+ x %else @@ -688,7 +697,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, CAT_XDEFINE cglobaled_, %2, 1 %endif %xdefine current_function %2 -%ifidn __OUTPUT_FORMAT__,elf +%if FORMAT_ELF global %2:function %%VISIBILITY %else global %2 @@ -714,14 +723,16 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, ; like cextern, but without the prefix %macro cextern_naked 1 -%xdefine %1 mangle(%1) +%ifdef PREFIX +%xdefine %1 mangle(%1) +%endif CAT_XDEFINE cglobaled_, %1, 1 extern %1 %endmacro %macro const 1-2+ %xdefine %1 mangle(private_prefix %+ _ %+ %1) -%ifidn __OUTPUT_FORMAT__,elf +%if FORMAT_ELF global %1:data hidden %else global %1 @@ -729,10 +740,9 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %1: %2 %endmacro -; This is needed for ELF, otherwise the GNU linker assumes the stack is -; executable by default. -%ifidn __OUTPUT_FORMAT__,elf -[section .note.GNU-stack noalloc noexec nowrite progbits] +; This is needed for ELF, otherwise the GNU linker assumes the stack is executable by default. 
+%if FORMAT_ELF +[SECTION .note.GNU-stack noalloc noexec nowrite progbits] %endif ; cpuflags @@ -751,8 +761,8 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %assign cpuflags_avx (111)| cpuflags_sse42 %assign cpuflags_xop (112)| cpuflags_avx %assign cpuflags_fma4 (113)| cpuflags_avx -%assign cpuflags_avx2 (114)| cpuflags_avx -%assign cpuflags_fma3 (115)| cpuflags_avx +%assign cpuflags_fma3 (114)| cpuflags_avx +%assign cpuflags_avx2 (115)| cpuflags_fma3 %assign cpuflags_cache32 (116) %assign cpuflags_cache64 (117) @@ -801,7 +811,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %endif %endif -%if cpuflag(sse2) +%if ARCH_X86_64 || cpuflag(sse2) CPUNOP amdnop %else CPUNOP basicnop -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH] checkasm: Remove unnecessary include
---
 tests/checkasm/checkasm.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 82c635e..b564e7e 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -33,10 +33,6 @@
 #include <io.h>
 #endif
 
-#if ARCH_X86
-#include "libavutil/x86/cpu.h"
-#endif
-
 #if HAVE_SETCONSOLETEXTATTRIBUTE
 #include <windows.h>
 #define COLOR_RED FOREGROUND_RED
-- 
1.8.3.2
Re: [libav-devel] [PATCH 7/8] x86inc: nasm support
On Sat, Aug 1, 2015 at 5:27 PM, Henrik Gramner hen...@gramner.com wrote:
> ---
>  configure                |  3 ---
>  libavutil/x86/x86inc.asm | 42 +-
>  2 files changed, 29 insertions(+), 16 deletions(-)

Skip this one for now; nasm seems to have a bug with dependency generation when using smartalign. x264 doesn't handle dependencies the same way, so it worked fine there. I've filed a bug report with nasm.
Re: [libav-devel] [PATCH 8/8] x86inc: Various minor backports from x264
On Sat, Aug 1, 2015 at 9:34 PM, James Almer jamr...@gmail.com wrote:
> The same could be done in av_parse_cpu_flags(). It doesn't affect this
> patch, and can be done separately. Just throwing the idea out there.

Yeah, I guess.

> What about bmi/bmi2, for that matter?

What about them?
Re: [libav-devel] [PATCH] x86: dct: Disable dct32_float_sse on x86-64
On Sat, Aug 1, 2015 at 8:28 PM, Anton Khirnov an...@khirnov.net wrote:
> Any specific reason you use ARCH_X86_64 in one file and ARCH_X86_32 in
> the other?

I missed that there's a define for ARCH_X86_32 in asm (some other code used %if ARCH_X86_64 == 0, so I assumed it didn't). Using ARCH_X86_32 in both places is obviously clearer.
[libav-devel] [PATCH] x86: dcadsp: Avoid SSE2 instructions in SSE functions
---
 libavcodec/x86/dcadsp.asm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/x86/dcadsp.asm b/libavcodec/x86/dcadsp.asm
index c42ee23..c99df12 100644
--- a/libavcodec/x86/dcadsp.asm
+++ b/libavcodec/x86/dcadsp.asm
@@ -148,7 +148,7 @@ DECODE_HF
     addps   m4, va      ; va1+3 vb1+3 va2+4 vb2+4
     movhlps vb, m4      ; va1+3 vb1+3
     addps   vb, m4      ; va0..4 vb0..4
-    movh    [outq + count], vb
+    movlps  [outq + count], vb
 %if %1
     sub     cf0q, 8*NUM_COEF
 %endif
-- 
1.8.3.2
[libav-devel] [PATCH 3/8] x86inc: warn when instructions incompatible with current cpuflags are used
From: Anton Mitrofanov bugmas...@narod.ru Signed-off-by: Henrik Gramner hen...@gramner.com --- libavutil/x86/x86inc.asm | 587 --- 1 file changed, 299 insertions(+), 288 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index ae6813a..96ebe37 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1069,15 +1069,16 @@ INIT_XMM %endmacro ;%1 == instruction -;%2 == 1 if float, 0 if int -;%3 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise -;%4 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not -;%5+: operands -%macro RUN_AVX_INSTR 5-8+ -%ifnum sizeof%6 +;%2 == minimal instruction set +;%3 == 1 if float, 0 if int +;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise +;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not +;%6+: operands +%macro RUN_AVX_INSTR 6-9+ +%ifnum sizeof%7 +%assign __sizeofreg sizeof%7 +%elifnum sizeof%6 %assign __sizeofreg sizeof%6 -%elifnum sizeof%5 -%assign __sizeofreg sizeof%5 %else %assign __sizeofreg mmsize %endif @@ -1086,325 +1087,335 @@ INIT_XMM %xdefine __instr v%1 %else %xdefine __instr %1 -%if %0 = 7+%3 +%if %0 = 8+%4 %assign __emulate_avx 1 %endif %endif +%ifnidn %2, fnord +%ifdef cpuname +%if notcpuflag(%2) +%error use of ``%1'' %2 instruction in cpuname function: current_function +%elif cpuflags_%2 cpuflags_sse notcpuflag(sse2) __sizeofreg 8 +%error use of ``%1'' sse2 instruction in cpuname function: current_function +%endif +%endif +%endif %if __emulate_avx -%xdefine __src1 %6 -%xdefine __src2 %7 -%ifnidn %5, %6 -%if %0 = 8 -CHECK_AVX_INSTR_EMU {%1 %5, %6, %7, %8}, %5, %7, %8 +%xdefine __src1 %7 +%xdefine __src2 %8 +%ifnidn %6, %7 +%if %0 = 9 +CHECK_AVX_INSTR_EMU {%1 %6, %7, %8, %9}, %6, %8, %9 %else -CHECK_AVX_INSTR_EMU {%1 %5, %6, %7}, %5, %7 +CHECK_AVX_INSTR_EMU {%1 %6, %7, %8}, %6, %8 %endif -%if %4 %3 == 0 -%ifnid %7 +%if %5 %4 == 0 +%ifnid %8 ; 3-operand AVX instructions with a memory arg 
can only have it in src2, ; whereas SSE emulation prefers to have it in src1 (i.e. the mov). ; So, if the instruction is commutative with a memory arg, swap them. -%xdefine __src1 %7 -%xdefine __src2 %6 +%xdefine __src1 %8 +%xdefine __src2 %7 %endif %endif %if __sizeofreg == 8 -MOVQ %5, __src1 -%elif %2 -MOVAPS %5, __src1 +MOVQ %6, __src1 +%elif %3 +MOVAPS %6, __src1 %else -MOVDQA %5, __src1 +MOVDQA %6, __src1 %endif %endif -%if %0 = 8 -%1 %5, __src2, %8 +%if %0 = 9 +%1 %6, __src2, %9 %else -%1 %5, __src2 +%1 %6, __src2 %endif -%elif %0 = 8 -__instr %5, %6, %7, %8 +%elif %0 = 9 +__instr %6, %7, %8, %9 +%elif %0 == 8 +__instr %6, %7, %8 %elif %0 == 7 -__instr %5, %6, %7 -%elif %0 == 6 -__instr %5, %6 +__instr %6, %7 %else -__instr %5 +__instr %6 %endif %endmacro ;%1 == instruction -;%2 == 1 if float, 0 if int -;%3 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise -;%4 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not -%macro AVX_INSTR 1-4 0, 1, 0 -%macro %1 1-9 fnord, fnord, fnord, fnord, %1, %2, %3, %4 +;%2 == minimal instruction set +;%3 == 1 if float, 0 if int +;%4 == 1 if non-destructive or 4-operand (xmm, xmm, xmm, imm), 0 otherwise +;%5 == 1 if commutative (i.e. doesn't matter which src arg is which), 0 if not +%macro AVX_INSTR 1-5 fnord, 0, 1, 0 +%macro %1 1-10 fnord, fnord, fnord, fnord, %1, %2, %3, %4, %5 %ifidn %2, fnord -RUN_AVX_INSTR %6, %7, %8, %9, %1 +RUN_AVX_INSTR %6, %7, %8, %9, %10, %1 %elifidn %3, fnord -RUN_AVX_INSTR %6, %7, %8, %9, %1, %2 +RUN_AVX_INSTR %6, %7, %8, %9, %10, %1, %2 %elifidn %4, fnord -RUN_AVX_INSTR %6, %7, %8, %9, %1, %2, %3 +RUN_AVX_INSTR %6, %7, %8, %9, %10, %1, %2, %3 %elifidn %5, fnord -RUN_AVX_INSTR %6, %7, %8, %9, %1, %2, %3, %4 +RUN_AVX_INSTR %6, %7, %8, %9, %10, %1, %2, %3, %4 %else
[libav-devel] [PATCH 8/8] x86inc: Various minor backports from x264
---
 libavutil/x86/x86inc.asm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index d70a5f9..0e2f447 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1,7 +1,7 @@
 ;*****************************************************************************
 ;* x86inc.asm: x264asm abstraction layer
 ;*****************************************************************************
-;* Copyright (C) 2005-2013 x264 project
+;* Copyright (C) 2005-2015 x264 project
 ;*
 ;* Authors: Loren Merritt lor...@u.washington.edu
 ;*          Anton Mitrofanov bugmas...@narod.ru
@@ -740,7 +740,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
 ; This is needed for ELF, otherwise the GNU linker assumes the stack is executable by default.
 %if FORMAT_ELF
-[section .note.GNU-stack noalloc noexec nowrite progbits]
+[SECTION .note.GNU-stack noalloc noexec nowrite progbits]
 %endif
 
 ; cpuflags
@@ -759,8 +759,8 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
 %assign cpuflags_avx      (1<<11) | cpuflags_sse42
 %assign cpuflags_xop      (1<<12) | cpuflags_avx
 %assign cpuflags_fma4     (1<<13) | cpuflags_avx
-%assign cpuflags_avx2     (1<<14) | cpuflags_avx
-%assign cpuflags_fma3     (1<<15) | cpuflags_avx
+%assign cpuflags_fma3     (1<<14) | cpuflags_avx
+%assign cpuflags_avx2     (1<<15) | cpuflags_fma3
 %assign cpuflags_cache32  (1<<16)
 %assign cpuflags_cache64  (1<<17)
@@ -809,7 +809,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
 %endif
 %endif
-%if cpuflag(sse2)
+%if ARCH_X86_64 || cpuflag(sse2)
 %ifdef __NASM_VER__
 ALIGNMODE k8
 %else
-- 
1.8.3.2
[libav-devel] [PATCH 1/8] x86inc: warn if XOP integer FMA instruction emulation is impossible
From: Anton Mitrofanov bugmas...@narod.ru

Emulation requires a temporary register if arguments 1 and 4 are the same; this doesn't obey the semantics of the original instruction, so we can't emulate that in x86inc. Also add pmacsdql emulation.

Signed-off-by: Henrik Gramner hen...@gramner.com
---
 libavutil/x86/x86inc.asm | 7 +++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index a6e1f33..4c0a4bd 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1410,15 +1410,18 @@ AVX_INSTR pfmul, 1, 0, 1
 %macro %1 4-7 %1, %2, %3
 %if cpuflag(xop)
 v%5 %1, %2, %3, %4
-%else
+%elifnidn %1, %4
 %6 %1, %2, %3
 %7 %1, %4
+%else
+%error non-xop emulation of ``%5 %1, %2, %3, %4'' is not supported
 %endif
 %endmacro
 %endmacro
 
-FMA_INSTR pmacsdd, pmulld, paddd
 FMA_INSTR pmacsww, pmullw, paddw
+FMA_INSTR pmacsdd, pmulld, paddd ; sse4 emulation
+FMA_INSTR pmacsdql, pmuldq, paddq ; sse4 emulation
 FMA_INSTR pmadcswd, pmaddwd, paddd
 
 ; tzcnt is equivalent to rep bsf and is backwards-compatible with bsf.
-- 
1.8.3.2
[libav-devel] [PATCH 2/8] x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not. --- libavcodec/x86/h264_deblock.asm | 4 +-- libavutil/x86/x86inc.asm| 62 ++--- 2 files changed, 42 insertions(+), 24 deletions(-) diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm index d2067c8..33fd5a9 100644 --- a/libavcodec/x86/h264_deblock.asm +++ b/libavcodec/x86/h264_deblock.asm @@ -444,13 +444,13 @@ cglobal deblock_%1_luma_8, 5,5,8,2*%2 ;int8_t *tc0) ;- INIT_MMX cpuname -cglobal deblock_h_luma_8, 0,5,8,0x60+HAVE_ALIGNED_STACK*12 +cglobal deblock_h_luma_8, 0,5,8,0x60+12 movr0, r0mp movr3, r1m lear4, [r3*3] subr0, 4 lear1, [r0+r4] -%define pix_tmp esp+12*HAVE_ALIGNED_STACK +%define pix_tmp esp+12 ; transpose 6x16 - tmp space TRANSPOSE6x8_MEM PASS8ROWS(r0, r1, r3, r4), pix_tmp diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 4c0a4bd..ae6813a 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -42,6 +42,17 @@ %define public_prefix private_prefix %endif +%if HAVE_ALIGNED_STACK +%define STACK_ALIGNMENT 16 +%endif +%ifndef STACK_ALIGNMENT +%if ARCH_X86_64 +%define STACK_ALIGNMENT 16 +%else +%define STACK_ALIGNMENT 4 +%endif +%endif + %define WIN64 0 %define UNIX64 0 %if ARCH_X86_64 @@ -108,8 +119,9 @@ ; %1 = number of arguments. loads them from stack if needed. ; %2 = number of registers used. pushes callee-saved regs if needed. ; %3 = number of xmm registers used. pushes callee-saved xmm regs if needed. -; %4 = (optional) stack size to be allocated. If not aligned (x86-32 ICC 10.x, -; MSVC or YMM), the stack will be manually aligned (to 16 or 32 bytes), +; %4 = (optional) stack size to be allocated. The stack will be aligned before +; allocating the specified stack size. 
If the required stack alignment is +; larger than the known stack alignment the stack will be manually aligned ; and an extra register will be allocated to hold the original stack ; pointer (to not invalidate r0m etc.). To prevent the use of an extra ; register as stack pointer, request a negative stack size. @@ -117,8 +129,10 @@ ; PROLOGUE can also be invoked by adding the same options to cglobal ; e.g. -; cglobal foo, 2,3,0, dst, src, tmp -; declares a function (foo), taking two args (dst and src) and one local variable (tmp) +; cglobal foo, 2,3,7,0x40, dst, src, tmp +; declares a function (foo) that automatically loads two arguments (dst and +; src) into registers, uses one additional register (tmp) plus 7 vector +; registers (m0-m6) and allocates 0x40 bytes of stack space. ; TODO Some functions can use some args directly from the stack. If they're the ; last args then you can just not declare them, but if they're in the middle @@ -319,26 +333,28 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 %assign n_arg_names %0 %endmacro +%define required_stack_alignment ((mmsize + 15) ~15) + %macro ALLOC_STACK 1-2 0 ; stack_size, n_xmm_regs (for win64 only) %ifnum %1 %if %1 != 0 -%assign %%stack_alignment ((mmsize + 15) ~15) +%assign %%pad 0 %assign stack_size %1 %if stack_size 0 %assign stack_size -stack_size %endif -%assign stack_size_padded stack_size %if WIN64 -%assign stack_size_padded stack_size_padded + 32 ; reserve 32 bytes for shadow space +%assign %%pad %%pad + 32 ; shadow space %if mmsize != 8 %assign xmm_regs_used %2 %if xmm_regs_used 8 -%assign stack_size_padded stack_size_padded + (xmm_regs_used-8)*16 +%assign %%pad %%pad + (xmm_regs_used-8)*16 ; callee-saved xmm registers %endif %endif %endif -%if mmsize = 16 HAVE_ALIGNED_STACK -%assign stack_size_padded stack_size_padded + %%stack_alignment - gprsize - (stack_offset (%%stack_alignment - 1)) +%if required_stack_alignment = STACK_ALIGNMENT +; maintain the current stack alignment +%assign 
stack_size_padded stack_size + %%pad + ((-%%pad-stack_offset-gprsize) (STACK_ALIGNMENT-1)) SUB rsp, stack_size_padded %else %assign %%reg_num (regs_used - 1) @@ -347,17 +363,17 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 ; it, i.e. in [rsp+stack_size_padded], so we can restore the ; stack in a single
[libav-devel] [PATCH 0/8] x86inc: Sync changes from x264
This brings x86inc.asm in libavutil up to date with x86inc.asm in x264. They're not 100% identical but the difference is tiny compared to before.

Anton Mitrofanov (2):
  x86inc: warn if XOP integer FMA instruction emulation is impossible
  x86inc: warn when instructions incompatible with current cpuflags are used

Christophe Gisquet (1):
  x86inc: Fix instantiation of YMM registers

Henrik Gramner (5):
  x86inc: Support arbitrary stack alignments
  x86inc: Disable vpbroadcastq workaround in newer yasm versions
  x86inc: Drop SECTION_TEXT macro
  x86inc: nasm support
  x86inc: Various minor backports from x264

 configure                           |   3 -
 libavcodec/x86/apedsp.asm           |   2 +-
 libavcodec/x86/audiodsp.asm         |   2 +-
 libavcodec/x86/bswapdsp.asm         |   2 +-
 libavcodec/x86/dcadsp.asm           |   2 +-
 libavcodec/x86/dct32.asm            |   2 +-
 libavcodec/x86/fft.asm              |   2 +-
 libavcodec/x86/fmtconvert.asm       |   2 +-
 libavcodec/x86/h263_loopfilter.asm  |   2 +-
 libavcodec/x86/h264_deblock.asm     |   4 +-
 libavcodec/x86/hpeldsp.asm          |   2 +-
 libavcodec/x86/huffyuvdsp.asm       |   2 +-
 libavcodec/x86/imdct36.asm          |   2 +-
 libavcodec/x86/pngdsp.asm           |   2 +-
 libavcodec/x86/qpeldsp.asm          |   2 +-
 libavcodec/x86/sbrdsp.asm           |   2 +-
 libavfilter/x86/af_volume.asm       |   2 +-
 libavresample/x86/audio_convert.asm |   2 +-
 libavresample/x86/audio_mix.asm     |   2 +-
 libavresample/x86/dither.asm        |   2 +-
 libavutil/x86/x86inc.asm            | 742 +++-
 21 files changed, 410 insertions(+), 375 deletions(-)

-- 
1.8.3.2
[libav-devel] [PATCH 6/8] x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`. --- libavcodec/x86/apedsp.asm | 2 +- libavcodec/x86/audiodsp.asm | 2 +- libavcodec/x86/bswapdsp.asm | 2 +- libavcodec/x86/dcadsp.asm | 2 +- libavcodec/x86/dct32.asm| 2 +- libavcodec/x86/fft.asm | 2 +- libavcodec/x86/fmtconvert.asm | 2 +- libavcodec/x86/h263_loopfilter.asm | 2 +- libavcodec/x86/hpeldsp.asm | 2 +- libavcodec/x86/huffyuvdsp.asm | 2 +- libavcodec/x86/imdct36.asm | 2 +- libavcodec/x86/pngdsp.asm | 2 +- libavcodec/x86/qpeldsp.asm | 2 +- libavcodec/x86/sbrdsp.asm | 2 +- libavfilter/x86/af_volume.asm | 2 +- libavresample/x86/audio_convert.asm | 2 +- libavresample/x86/audio_mix.asm | 2 +- libavresample/x86/dither.asm| 2 +- libavutil/x86/x86inc.asm| 12 19 files changed, 18 insertions(+), 30 deletions(-) diff --git a/libavcodec/x86/apedsp.asm b/libavcodec/x86/apedsp.asm index d721ebd..d6abd98 100644 --- a/libavcodec/x86/apedsp.asm +++ b/libavcodec/x86/apedsp.asm @@ -20,7 +20,7 @@ %include libavutil/x86/x86util.asm -SECTION_TEXT +SECTION .text %macro SCALARPRODUCT 0 ; int ff_scalarproduct_and_madd_int16(int16_t *v1, int16_t *v2, int16_t *v3, diff --git a/libavcodec/x86/audiodsp.asm b/libavcodec/x86/audiodsp.asm index f2e831d..696a73b 100644 --- a/libavcodec/x86/audiodsp.asm +++ b/libavcodec/x86/audiodsp.asm @@ -21,7 +21,7 @@ %include libavutil/x86/x86util.asm -SECTION_TEXT +SECTION .text %macro SCALARPRODUCT 0 ; int ff_scalarproduct_int16(int16_t *v1, int16_t *v2, int order) diff --git a/libavcodec/x86/bswapdsp.asm b/libavcodec/x86/bswapdsp.asm index 42580a3..4810867 100644 --- a/libavcodec/x86/bswapdsp.asm +++ b/libavcodec/x86/bswapdsp.asm @@ -24,7 +24,7 @@ SECTION_RODATA pb_bswap32: db 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 -SECTION_TEXT +SECTION .text ; %1 = aligned/unaligned %macro BSWAP_LOOPS 1 diff --git a/libavcodec/x86/dcadsp.asm b/libavcodec/x86/dcadsp.asm index c99df12..18c7a0c 100644 --- 
a/libavcodec/x86/dcadsp.asm +++ b/libavcodec/x86/dcadsp.asm @@ -24,7 +24,7 @@ SECTION_RODATA pf_inv16: times 4 dd 0x3D80 ; 1/16 -SECTION_TEXT +SECTION .text ; void decode_hf(float dst[DCA_SUBBANDS][8], const int32_t vq_num[DCA_SUBBANDS], ;const int8_t hf_vq[1024][32], intptr_t vq_offset, diff --git a/libavcodec/x86/dct32.asm b/libavcodec/x86/dct32.asm index fa723b0..c7d2b6b 100644 --- a/libavcodec/x86/dct32.asm +++ b/libavcodec/x86/dct32.asm @@ -191,7 +191,7 @@ ps_p1p1m1m1: dd 0, 0, 0x8000, 0x8000, 0, 0, 0x8000, 0x8000 %endmacro INIT_YMM avx -SECTION_TEXT +SECTION .text ; void ff_dct32_float_avx(FFTSample *out, const FFTSample *in) cglobal dct32_float, 2,3,8, out, in, tmp ; pass 1 diff --git a/libavcodec/x86/fft.asm b/libavcodec/x86/fft.asm index e4744a3..d3be72e 100644 --- a/libavcodec/x86/fft.asm +++ b/libavcodec/x86/fft.asm @@ -90,7 +90,7 @@ cextern cos_ %+ i %1 %endmacro -SECTION_TEXT +SECTION .text %macro T2_3DNOW 4 ; z0, z1, mem0, mem1 mova %1, %3 diff --git a/libavcodec/x86/fmtconvert.asm b/libavcodec/x86/fmtconvert.asm index e82f149..727daa9 100644 --- a/libavcodec/x86/fmtconvert.asm +++ b/libavcodec/x86/fmtconvert.asm @@ -21,7 +21,7 @@ %include libavutil/x86/x86util.asm -SECTION_TEXT +SECTION .text ;-- ; void ff_int32_to_float_fmul_scalar(float *dst, const int32_t *src, float mul, diff --git a/libavcodec/x86/h263_loopfilter.asm b/libavcodec/x86/h263_loopfilter.asm index 673f795..cd726ba 100644 --- a/libavcodec/x86/h263_loopfilter.asm +++ b/libavcodec/x86/h263_loopfilter.asm @@ -24,7 +24,7 @@ SECTION_RODATA cextern pb_FC cextern h263_loop_filter_strength -SECTION_TEXT +SECTION .text %macro H263_LOOP_FILTER 5 pxor m7, m7 diff --git a/libavcodec/x86/hpeldsp.asm b/libavcodec/x86/hpeldsp.asm index 073f7f9..b8929b9 100644 --- a/libavcodec/x86/hpeldsp.asm +++ b/libavcodec/x86/hpeldsp.asm @@ -23,7 +23,7 @@ SECTION_RODATA cextern pb_1 -SECTION_TEXT +SECTION .text ; void ff_put_pixels8_x2(uint8_t *block, const uint8_t *pixels, ptrdiff_t line_size, int h) %macro 
PUT_PIXELS8_X2 0 diff --git a/libavcodec/x86/huffyuvdsp.asm b/libavcodec/x86/huffyuvdsp.asm index 436abc8..e7536da 100644 --- a/libavcodec/x86/huffyuvdsp.asm +++ b/libavcodec/x86/huffyuvdsp.asm @@ -28,7 +28,7 @@ pb_7: times 8 db 7 pb_: db -1,-1,-1,-1,3,3,3,3,-1,-1,-1,-1,11,11,11,11 pb_zz11zz55zz99zzdd: db -1,-1,1,1,-1,-1,5,5,-1,-1,9,9,-1,-1,13,13 -SECTION_TEXT +SECTION .text ; void ff_add_hfyu_median_pred_mmxext(uint8_t *dst, const uint8_t *top, ; const uint8_t *diff, int w, diff --git a/libavcodec/x86/imdct36.asm
[libav-devel] [PATCH 5/8] x86inc: Disable vpbroadcastq workaround in newer yasm versions
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
---
 libavutil/x86/x86inc.asm | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 2844fdf..d4ce68f 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -1499,13 +1499,15 @@ FMA4_INSTR fnmsubps, fnmsub132ps, fnmsub213ps, fnmsub231ps
 FMA4_INSTR fnmsubsd, fnmsub132sd, fnmsub213sd, fnmsub231sd
 FMA4_INSTR fnmsubss, fnmsub132ss, fnmsub213ss, fnmsub231ss
-; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug
-%if ARCH_X86_64 == 0
-%macro vpbroadcastq 2
-%if sizeof%1 == 16
-movddup %1, %2
-%else
-vbroadcastsd %1, %2
-%endif
-%endmacro
+; workaround: vpbroadcastq is broken in x86_32 due to a yasm bug (fixed in 1.3.0)
+%ifdef __YASM_VER__
+%if __YASM_VERSION_ID__ < 0x01030000 && ARCH_X86_64 == 0
+%macro vpbroadcastq 2
+%if sizeof%1 == 16
+movddup %1, %2
+%else
+vbroadcastsd %1, %2
+%endif
+%endmacro
+%endif
 %endif
-- 
1.8.3.2
[libav-devel] [PATCH 7/8] x86inc: nasm support
--- configure| 3 --- libavutil/x86/x86inc.asm | 42 +- 2 files changed, 29 insertions(+), 16 deletions(-) diff --git a/configure b/configure index 482be43..79dd3a5 100755 --- a/configure +++ b/configure @@ -1353,7 +1353,6 @@ ARCH_EXT_LIST_PPC= ARCH_EXT_LIST_X86= $ARCH_EXT_LIST_X86_SIMD -cpunop i686 @@ -1732,7 +1731,6 @@ ppc4xx_deps=ppc vsx_deps=altivec power8_deps=vsx -cpunop_deps=i686 x86_64_select=i686 x86_64_suggest=fast_cmov @@ -4151,7 +4149,6 @@ EOF check_yasm vpmacsdd xmm0, xmm1, xmm2, xmm3 || disable xop_external check_yasm vfmadd132ps ymm0, ymm1, ymm2|| disable fma3_external check_yasm vfmaddps ymm0, ymm1, ymm2, ymm3 || disable fma4_external -check_yasm CPU amdnop || disable cpunop fi case $cpu in diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index a519fd5..d70a5f9 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -67,6 +67,15 @@ %endif %endif +%define FORMAT_ELF 0 +%ifidn __OUTPUT_FORMAT__,elf +%define FORMAT_ELF 1 +%elifidn __OUTPUT_FORMAT__,elf32 +%define FORMAT_ELF 1 +%elifidn __OUTPUT_FORMAT__,elf64 +%define FORMAT_ELF 1 +%endif + %ifdef PREFIX %define mangle(x) _ %+ x %else @@ -96,11 +105,9 @@ default rel %endif -%macro CPUNOP 1 -%if HAVE_CPUNOP -CPU %1 -%endif -%endmacro +%ifdef __NASM_VER__ +%use smartalign +%endif ; Macros to eliminate most code duplication between x86_32 and x86_64: ; Currently this works only for leaf functions which load all their arguments @@ -688,7 +695,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, CAT_XDEFINE cglobaled_, %2, 1 %endif %xdefine current_function %2 -%ifidn __OUTPUT_FORMAT__,elf +%if FORMAT_ELF global %2:function %%VISIBILITY %else global %2 @@ -714,14 +721,16 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, ; like cextern, but without the prefix %macro cextern_naked 1 -%xdefine %1 mangle(%1) +%ifdef PREFIX +%xdefine %1 mangle(%1) +%endif CAT_XDEFINE cglobaled_, %1, 1 extern %1 %endmacro %macro const 
1-2+ %xdefine %1 mangle(private_prefix %+ _ %+ %1) -%ifidn __OUTPUT_FORMAT__,elf +%if FORMAT_ELF global %1:data hidden %else global %1 @@ -729,9 +738,8 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %1: %2 %endmacro -; This is needed for ELF, otherwise the GNU linker assumes the stack is -; executable by default. -%ifidn __OUTPUT_FORMAT__,elf +; This is needed for ELF, otherwise the GNU linker assumes the stack is executable by default. +%if FORMAT_ELF [section .note.GNU-stack noalloc noexec nowrite progbits] %endif @@ -802,9 +810,17 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %endif %if cpuflag(sse2) -CPUNOP amdnop +%ifdef __NASM_VER__ +ALIGNMODE k8 +%else +CPU amdnop +%endif %else -CPUNOP basicnop +%ifdef __NASM_VER__ +ALIGNMODE nop +%else +CPU basicnop +%endif %endif %endmacro -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 4/8] x86inc: Fix instantiation of YMM registers
From: Christophe Gisquet christophe.gisq...@gmail.com

Signed-off-by: Henrik Gramner hen...@gramner.com
---
 libavutil/x86/x86inc.asm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 96ebe37..2844fdf 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -893,7 +893,7 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae,
 %assign %%i 0
 %rep num_mmregs
 CAT_XDEFINE m, %%i, ymm %+ %%i
-CAT_XDEFINE nymm, %%i, %%i
+CAT_XDEFINE nnymm, %%i, %%i
 %assign %%i %%i+1
 %endrep
 INIT_CPUFLAGS %1
-- 
1.8.3.2
[libav-devel] [PATCH] x86: dct: Disable dct32_float_sse on x86-64
There is an SSE2 implementation so the SSE version is never used. The SSE version also happens to contain SSE2 instructions on x86-64.
---
 libavcodec/x86/dct32.asm  | 3 +++
 libavcodec/x86/dct_init.c | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/libavcodec/x86/dct32.asm b/libavcodec/x86/dct32.asm
index 9c147b9..fa723b0 100644
--- a/libavcodec/x86/dct32.asm
+++ b/libavcodec/x86/dct32.asm
@@ -482,7 +482,10 @@ cglobal dct32_float, 2, 3, 16, out, in, tmp
 %endif
 %endmacro
 
+%if ARCH_X86_64 == 0
 INIT_XMM sse
 DCT32_FUNC
+%endif
+
 INIT_XMM sse2
 DCT32_FUNC

diff --git a/libavcodec/x86/dct_init.c b/libavcodec/x86/dct_init.c
index ca9fbc7..b2e43a9 100644
--- a/libavcodec/x86/dct_init.c
+++ b/libavcodec/x86/dct_init.c
@@ -30,8 +30,10 @@ av_cold void ff_dct_init_x86(DCTContext *s)
 {
     int cpu_flags = av_get_cpu_flags();
 
+#if ARCH_X86_32
     if (EXTERNAL_SSE(cpu_flags))
         s->dct32 = ff_dct32_float_sse;
+#endif
     if (EXTERNAL_SSE2(cpu_flags))
         s->dct32 = ff_dct32_float_sse2;
     if (EXTERNAL_AVX_FAST(cpu_flags))
-- 
1.8.3.2
Re: [libav-devel] [PATCH] x86: dcadsp: Avoid SSE2 instructions in SSE functions
On Sat, Aug 1, 2015 at 8:49 PM, James Almer jamr...@gmail.com wrote:
> I however think movq/sd should be used here for sse2 and above instead
> of movlps.

That's a moot point in this case since the code in question is SSE only (and even if it weren't, I'm skeptical of the claim that it would be measurably slower than movsd).
Re: [libav-devel] [PATCH] checkasm: Include io.h for isatty, if available
On Wed, Jul 29, 2015 at 10:09 PM, Martin Storsjö mar...@martin.st wrote:
> configure does check for isatty, and checkasm properly checks
> HAVE_ISATTY, but on some platforms (e.g. WinRT), io.h needs to be
> included for isatty to be available.

Ok.
[libav-devel] [PATCH 2/2] checkasm: Use LOCAL_ALIGNED
From: Michael Niedermayer mich...@niedermayer.cc Fixes alignment issues and bus errors. --- tests/checkasm/bswapdsp.c | 9 + tests/checkasm/h264pred.c | 5 +++-- tests/checkasm/h264qpel.c | 9 + 3 files changed, 13 insertions(+), 10 deletions(-) diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c index 3871029..748a886 100644 --- a/tests/checkasm/bswapdsp.c +++ b/tests/checkasm/bswapdsp.c @@ -22,6 +22,7 @@ #include checkasm.h #include libavcodec/bswapdsp.h #include libavutil/common.h +#include libavutil/internal.h #include libavutil/intreadwrite.h #define BUF_SIZE 512 @@ -55,10 +56,10 @@ void checkasm_check_bswapdsp(void) { -DECLARE_ALIGNED(16, uint8_t, src0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, src1)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, dst0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, dst1)[BUF_SIZE]; +LOCAL_ALIGNED_16(uint8_t, src0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, src1, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, dst0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, dst1, [BUF_SIZE]); BswapDSPContext h; ff_bswapdsp_init(h); diff --git a/tests/checkasm/h264pred.c b/tests/checkasm/h264pred.c index 40e949a..08f23e6 100644 --- a/tests/checkasm/h264pred.c +++ b/tests/checkasm/h264pred.c @@ -23,6 +23,7 @@ #include libavcodec/avcodec.h #include libavcodec/h264pred.h #include libavutil/common.h +#include libavutil/internal.h #include libavutil/intreadwrite.h static const int codec_ids[4] = { AV_CODEC_ID_H264, AV_CODEC_ID_VP8, AV_CODEC_ID_RV40, AV_CODEC_ID_SVQ3 }; @@ -232,8 +233,8 @@ void checkasm_check_h264pred(void) { check_pred8x8l, pred8x8l }, }; -DECLARE_ALIGNED(16, uint8_t, buf0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, buf1)[BUF_SIZE]; +LOCAL_ALIGNED_16(uint8_t, buf0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, buf1, [BUF_SIZE]); H264PredContext h; int test, codec, chroma_format, bit_depth; diff --git a/tests/checkasm/h264qpel.c b/tests/checkasm/h264qpel.c index 550f9d9..f734945 100644 --- a/tests/checkasm/h264qpel.c +++ b/tests/checkasm/h264qpel.c @@ 
-22,6 +22,7 @@ #include checkasm.h #include libavcodec/h264qpel.h #include libavutil/common.h +#include libavutil/internal.h #include libavutil/intreadwrite.h static const uint32_t pixel_mask[3] = { 0x, 0x01ff01ff, 0x03ff03ff }; @@ -48,10 +49,10 @@ static const uint32_t pixel_mask[3] = { 0x, 0x01ff01ff, 0x03ff03ff }; void checkasm_check_h264qpel(void) { -DECLARE_ALIGNED(16, uint8_t, buf0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, buf1)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, dst0)[BUF_SIZE]; -DECLARE_ALIGNED(16, uint8_t, dst1)[BUF_SIZE]; +LOCAL_ALIGNED_16(uint8_t, buf0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, buf1, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, dst0, [BUF_SIZE]); +LOCAL_ALIGNED_16(uint8_t, dst1, [BUF_SIZE]); H264QpelContext h; int op, bit_depth, i, j; -- 1.8.3.2 ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH 1/2] checkasm: Modify report format
Makes it a bit more clear where each test belongs. Suggested by Anton Khirnov.
---
 tests/checkasm/checkasm.c | 57 ++++++++++++++++++++++----------------------
 tests/checkasm/checkasm.h |  2 +-
 tests/checkasm/h264qpel.c |  2 +-
 3 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index e6cf3d7..f1e9cd9 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -53,17 +53,20 @@
 #endif

 /* List of tests to invoke */
-static void (* const tests[])(void) = {
+static const struct {
+    const char *name;
+    void (*func)(void);
+} tests[] = {
 #if CONFIG_BSWAPDSP
-    checkasm_check_bswapdsp,
+    { "bswapdsp", checkasm_check_bswapdsp },
 #endif
 #if CONFIG_H264PRED
-    checkasm_check_h264pred,
+    { "h264pred", checkasm_check_h264pred },
 #endif
 #if CONFIG_H264QPEL
-    checkasm_check_h264qpel,
+    { "h264qpel", checkasm_check_h264qpel },
 #endif
-    NULL
+    { NULL }
 };

 /* List of cpu flags to check */
@@ -127,6 +130,7 @@ static struct {
     CheckasmFunc *funcs;
     CheckasmFunc *current_func;
     CheckasmFuncVersion *current_func_ver;
+    const char *current_test_name;
     const char *bench_pattern;
     int bench_pattern_len;
     int num_checked;
@@ -314,8 +318,10 @@ static void check_cpu_flag(const char *name, int flag)
         int i;

         state.cpu_flag_name = name;
-        for (i = 0; tests[i]; i++)
-            tests[i]();
+        for (i = 0; tests[i].func; i++) {
+            state.current_test_name = tests[i].name;
+            tests[i].func();
+        }
     }
 }

@@ -332,7 +338,7 @@ int main(int argc, char *argv[])
 {
     int i, seed, ret = 0;

-    if (!tests[0] || !cpus[0].flag) {
+    if (!tests[0].func || !cpus[0].flag) {
         fprintf(stderr, "checkasm: no tests to perform\n");
         return 0;
     }
@@ -464,19 +470,15 @@ void checkasm_report(const char *name, ...)
     static int prev_checked, prev_failed, max_length;

     if (state.num_checked > prev_checked) {
-        print_cpu_name();
-
-        if (*name) {
-            int pad_length = max_length;
-            va_list arg;
+        int pad_length = max_length + 4;
+        va_list arg;

-            fprintf(stderr, " - ");
-            va_start(arg, name);
-            pad_length -= vfprintf(stderr, name, arg);
-            va_end(arg);
-            fprintf(stderr, "%*c", FFMAX(pad_length, 0) + 2, '[');
-        } else
-            fprintf(stderr, " - %-*s [", max_length, state.current_func->name);
+        print_cpu_name();
+        pad_length -= fprintf(stderr, " - %s.", state.current_test_name);
+        va_start(arg, name);
+        pad_length -= vfprintf(stderr, name, arg);
+        va_end(arg);
+        fprintf(stderr, "%*c", FFMAX(pad_length, 0) + 2, '[');

         if (state.num_failed == prev_failed)
             color_printf(COLOR_GREEN, "OK");
@@ -487,16 +489,13 @@ void checkasm_report(const char *name, ...)
         prev_checked = state.num_checked;
         prev_failed  = state.num_failed;
     } else if (!state.cpu_flag) {
-        int length;
-
         /* Calculate the amount of padding required to make the output vertically aligned */
-        if (*name) {
-            va_list arg;
-            va_start(arg, name);
-            length = vsnprintf(NULL, 0, name, arg);
-            va_end(arg);
-        } else
-            length = strlen(state.current_func->name);
+        int length = strlen(state.current_test_name);
+        va_list arg;
+
+        va_start(arg, name);
+        length += vsnprintf(NULL, 0, name, arg);
+        va_end(arg);

         if (length > max_length)
             max_length = length;
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index b7a36ee..443546a 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -55,7 +55,7 @@ static av_unused intptr_t (*func_new)();
 #define fail() checkasm_fail_func("%s:%d", av_basename(__FILE__), __LINE__)

 /* Print the test outcome */
-#define report(...) checkasm_report("" __VA_ARGS__)
+#define report checkasm_report

 /* Call the reference function */
 #define call_ref(...) func_ref(__VA_ARGS__)
diff --git a/tests/checkasm/h264qpel.c b/tests/checkasm/h264qpel.c
index 01b97ae..550f9d9 100644
--- a/tests/checkasm/h264qpel.c
+++ b/tests/checkasm/h264qpel.c
@@ -74,6 +74,6 @@ void checkasm_check_h264qpel(void)
                 }
             }
         }
-        report("%s_h264_qpel", op_name);
+        report("%s", op_name);
     }
 }
--
1.8.3.2

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
Re: [libav-devel] [PATCH] [RFC] use a wrapper script to call MS link.exe to avoid mixing with /usr/bin/link.exe
On Thu, Jul 23, 2015 at 7:23 PM, Steve Lhomme rob...@gmail.com wrote:
> On Thu, Jul 23, 2015 at 7:02 PM, Derek Buitenhuis derek.buitenh...@gmail.com wrote:
>> Broken permissions.
>
> Not sure how I can tweak that under Windows.

git update-index --chmod=+x file
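For reference, a quick sketch of what that command does. The executable bit is recorded in the git index itself, so this works even on Windows filesystems that have no POSIX permission bits; the wrapper-script filename below is a hypothetical placeholder for the actual file in the patch:

```shell
# Record the executable bit directly in the git index
# ("mslink-wrapper.sh" is a hypothetical stand-in for the patched file).
git update-index --chmod=+x mslink-wrapper.sh

# Confirm the staged mode: 100755 means the executable bit is set.
git ls-files --stage mslink-wrapper.sh
```

The change then shows up as a mode change (100644 -> 100755) in the next commit.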
Re: [libav-devel] [PATCH] [RFC] use a wrapper script to call MS link.exe to avoid mixing with /usr/bin/link.exe
On Thu, Jul 23, 2015 at 9:04 AM, Martin Storsjö mar...@martin.st wrote:
> Why is this suddenly using command instead of which now? This won't work
> in a linux environment.

Why wouldn't it work in a linux environment? `command` is POSIX.

This stackoverflow post sums it up fairly well:
https://stackoverflow.com/a/677212
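To illustrate the point being made: `command -v` is a POSIX shell builtin, whereas `which` is an external program whose output and exit status vary between systems. A minimal sketch of a configure-style probe (the `find_program` helper and the `link` lookup are illustrative, not the actual patch):

```shell
#!/bin/sh
# Portable program lookup: command -v is specified by POSIX and works in
# dash, bash, busybox sh, MSYS sh, etc.; which(1) is not standardized.
find_program() {
    command -v "$1" 2>/dev/null
}

# Hypothetical usage: locate a linker binary, if present in PATH.
if LINK=$(find_program link); then
    echo "using linker: $LINK"
else
    echo "link not found in PATH" >&2
fi
```

The `if LINK=$(...)` form also gives a clean success/failure test without parsing any output.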
Re: [libav-devel] [PATCH 1/1] checkasm: remove empty array initializer list in h264pred test
On Mon, Jul 20, 2015 at 11:18 PM, Janne Grunau janne-li...@jannau.net wrote:
> Fixes MSVC compilation.
> ---
>  tests/checkasm/h264pred.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Ok.
Re: [libav-devel] [PATCH 1/1] checkasm: fix MSVC build by adding a zero initializer for an empty array
On Mon, Jul 20, 2015 at 11:58 AM, Janne Grunau janne-li...@jannau.net wrote:
> ---
>  tests/checkasm/h264pred.c | 1 +
>  1 file changed, 1 insertion(+)

Shouldn't it be NULL instead of 0 since those are pointers? Otherwise OK.
Re: [libav-devel] [PATCH 2/4] checkasm: test all architectures with optimisations
lgtm.
[libav-devel] [PATCH 2/2] tests/checkasm/checkasm: Give macro a body to avoid potential unexpected syntax issues
From: Michael Niedermayer mich...@niedermayer.cc

Signed-off-by: Michael Niedermayer mich...@niedermayer.cc
---
 tests/checkasm/checkasm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 1a46e9b..b54be16 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -110,7 +110,7 @@ void checkasm_stack_clobber(uint64_t clobber, ...);
     }                                           \
 } while (0)
 #else
-#define bench_new(...)
+#define bench_new(...) while(0)
 #endif

 #endif
--
1.8.3.2
[libav-devel] [PATCH 1/2] checkasm: exit with status 0 instead of 1 if there are no tests to perform
---
 tests/checkasm/checkasm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 7b1ea8f..0aa3d1c 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -317,7 +317,7 @@ int main(int argc, char *argv[])

     if (!tests[0] || !cpus[0].flag) {
         fprintf(stderr, "checkasm: no tests to perform\n");
-        return 1;
+        return 0;
     }

     if (argc > 1 && !strncmp(argv[1], "--bench", 7)) {
--
1.8.3.2
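The rationale behind this one-character change: test drivers (a `make check` rule, a CI script) treat any non-zero exit status as failure, so a build where no asm tests were compiled in would spuriously abort the whole test run. A minimal sketch of such a caller; the checkasm binary path is whatever the build produces:

```shell
# Sketch of a harness that relies on checkasm's exit status.
# "$@" would be the path to the checkasm binary plus its arguments.
run_checkasm() {
    "$@"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "checkasm FAILED (exit status $status)" >&2
        return 1
    fi
    echo "checkasm OK"
}
```

With the old behaviour, `run_checkasm ./tests/checkasm/checkasm` would report failure even when there was simply nothing to test.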
Re: [libav-devel] [PATCH] cosmetics: Reformat checkasm tests
On Fri, Jul 17, 2015 at 8:08 PM, Luca Barbato lu_z...@gentoo.org wrote:
> -    qpel_mc_func (*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : h.put_h264_qpel_pixels_tab;
> +    qpel_mc_func(*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : h.put_h264_qpel_pixels_tab;

No space between type and identifier? I don't particularly have any
preference for either way, but it's used in most other places:
https://git.libav.org/?p=libav.git&a=search&h=HEAD&st=grep&s=[_a-zA-Z0-9]%2B+%2B\%28\*[^%2C]%2B\%29\[.*\]&sr=1

Otherwise OK.
[libav-devel] [PATCH 1/2] x86: bswapdsp: Don't treat 32-bit integers as 64-bit
The upper halves are not guaranteed to be zero in x86-64.

Also use `test` instead of `and` when the result isn't used for anything
other than as a branch condition; this allows some register moves to be
eliminated.
---
 libavcodec/x86/bswapdsp.asm | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/libavcodec/x86/bswapdsp.asm b/libavcodec/x86/bswapdsp.asm
index 17a6cb1..42580a3 100644
--- a/libavcodec/x86/bswapdsp.asm
+++ b/libavcodec/x86/bswapdsp.asm
@@ -28,8 +28,8 @@ SECTION_TEXT

 ; %1 = aligned/unaligned
 %macro BSWAP_LOOPS 1
-    mov      r3, r2
-    sar      r2, 3
+    mov      r3d, r2d
+    sar      r2d, 3
     jz       .left4_%1
 .loop8_%1:
     mov%1    m0, [r1 + 0]
@@ -57,11 +57,11 @@ SECTION_TEXT
 %endif
     add      r0, 32
     add      r1, 32
-    dec      r2
+    dec      r2d
     jnz      .loop8_%1
 .left4_%1:
-    mov      r2, r3
-    and      r3, 4
+    mov      r2d, r3d
+    test     r3d, 4
     jz       .left
     mov%1    m0, [r1]
 %if cpuflag(ssse3)
@@ -84,13 +84,11 @@ SECTION_TEXT
 %macro BSWAP32_BUF 0
 %if cpuflag(ssse3)
 cglobal bswap32_buf, 3,4,3
-    mov      r3, r1
     mova     m2, [pb_bswap32]
 %else
 cglobal bswap32_buf, 3,4,5
-    mov      r3, r1
 %endif
-    and      r3, 15
+    test     r1, 15
     jz       .start_align
     BSWAP_LOOPS  u
     jmp      .left
@@ -98,8 +96,7 @@ cglobal bswap32_buf, 3,4,5
     BSWAP_LOOPS  a
 .left:
 %if cpuflag(ssse3)
-    mov      r3, r2
-    and      r2, 2
+    test     r2d, 2
     jz       .left1
     movq     m0, [r1]
     pshufb   m0, m2
@@ -107,13 +104,13 @@ cglobal bswap32_buf, 3,4,5
     add      r1, 8
     add      r0, 8
 .left1:
-    and      r3, 1
+    test     r2d, 1
     jz       .end
     mov      r2d, [r1]
     bswap    r2d
     mov      [r0], r2d
 %else
-    and      r2, 3
+    and      r2d, 3
     jz       .end
 .loop2:
     mov      r3d, [r1]
     bswap    r3d
     mov      [r0], r3d
     add      r1, 4
     add      r0, 4
-    dec      r2
+    dec      r2d
     jnz      .loop2
 %endif
 .end:
--
1.8.3.2
[libav-devel] [PATCH 2/2] checkasm: add unit tests for bswapdsp
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/bswapdsp.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++
 tests/checkasm/checkasm.c |  3 +++
 tests/checkasm/checkasm.h |  1 +
 4 files changed, 78 insertions(+)
 create mode 100644 tests/checkasm/bswapdsp.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 0758746..483ad13 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -1,4 +1,5 @@
 # libavcodec tests
+AVCODECOBJS-$(CONFIG_BSWAPDSP) += bswapdsp.o
 AVCODECOBJS-$(CONFIG_H264PRED) += h264pred.o
 AVCODECOBJS-$(CONFIG_H264QPEL) += h264qpel.o
diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c
new file mode 100644
index 0000000..7b1566b
--- /dev/null
+++ b/tests/checkasm/bswapdsp.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2015 Henrik Gramner
+ *
+ * This file is part of Libav.
+ *
+ * Libav is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * Libav is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with Libav; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+#include "checkasm.h"
+#include "libavcodec/bswapdsp.h"
+#include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
+
+#define BUF_SIZE 512
+
+#define randomize_buffers()                 \
+do {                                        \
+    int i;                                  \
+    for (i = 0; i < BUF_SIZE; i += 4) {     \
+        uint32_t r = rnd();                 \
+        AV_WN32A(src0+i, r);                \
+        AV_WN32A(src1+i, r);                \
+        r = rnd();                          \
+        AV_WN32A(dst0+i, r);                \
+        AV_WN32A(dst1+i, r);                \
+    }                                       \
+} while (0)
+
+#define check_bswap(type)                                                           \
+do {                                                                                \
+    int w;                                                                          \
+    for (w = 0; w < BUF_SIZE/sizeof(type); w++) {                                   \
+        int offset = (BUF_SIZE/sizeof(type) - w) & 15; /* Test various alignments */\
+        randomize_buffers();                                                        \
+        call_ref((type*)dst0+offset, (type*)src0+offset, w);                        \
+        call_new((type*)dst1+offset, (type*)src1+offset, w);                        \
+        if (memcmp(src0, src1, BUF_SIZE) || memcmp(dst0, dst1, BUF_SIZE))           \
+            fail();                                                                 \
+        bench_new((type*)dst1+offset, (type*)src1+offset, w);                       \
+    }                                                                               \
+} while (0)
+
+void checkasm_check_bswapdsp(void)
+{
+    DECLARE_ALIGNED(16, uint8_t, src0)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, src1)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, dst0)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, dst1)[BUF_SIZE];
+    BswapDSPContext h;
+
+    ff_bswapdsp_init(&h);
+
+    if (check_func(h.bswap_buf, "bswap_buf"))
+        check_bswap(uint32_t);
+
+    if (check_func(h.bswap16_buf, "bswap16_buf"))
+        check_bswap(uint16_t);
+
+    report("bswap");
+}
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 7b1ea8f..ce73778 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -54,6 +54,9 @@

 /* List of tests to invoke */
 static void (* const tests[])(void) = {
+#if CONFIG_BSWAPDSP
+    checkasm_check_bswapdsp,
+#endif
 #if CONFIG_H264PRED
     checkasm_check_h264pred,
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 1a46e9b..c2e359f 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -29,6 +29,7 @@
 #include "libavutil/lfg.h"
 #include "libavutil/timer.h"

+void checkasm_check_bswapdsp(void);
 void checkasm_check_h264pred(void);
 void checkasm_check_h264qpel(void);
--
1.8.3.2
[libav-devel] [PATCH] checkasm: Add unit tests for h264qpel
---
 tests/checkasm/Makefile   |  1 +
 tests/checkasm/checkasm.c |  3 ++
 tests/checkasm/checkasm.h |  1 +
 tests/checkasm/h264qpel.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+)
 create mode 100644 tests/checkasm/h264qpel.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 33e2c09..0758746 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -1,5 +1,6 @@
 # libavcodec tests
 AVCODECOBJS-$(CONFIG_H264PRED) += h264pred.o
+AVCODECOBJS-$(CONFIG_H264QPEL) += h264qpel.o

 CHECKASMOBJS-$(CONFIG_AVCODEC) += $(AVCODECOBJS-yes)
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 59383b8..7b1ea8f 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -57,6 +57,9 @@ static void (* const tests[])(void) = {
 #if CONFIG_H264PRED
     checkasm_check_h264pred,
 #endif
+#if CONFIG_H264QPEL
+    checkasm_check_h264qpel,
+#endif
     NULL
 };
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 90844e2..1a46e9b 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -30,6 +30,7 @@
 #include "libavutil/timer.h"

 void checkasm_check_h264pred(void);
+void checkasm_check_h264qpel(void);

 intptr_t (*checkasm_check_func(intptr_t (*func)(), const char *name, ...))() av_printf_format(2, 3);
 int checkasm_bench_func(void);
diff --git a/tests/checkasm/h264qpel.c b/tests/checkasm/h264qpel.c
new file mode 100644
index 0000000..06bc6ad
--- /dev/null
+++ b/tests/checkasm/h264qpel.c
@@ -0,0 +1,80 @@
+/*
+ * Copyright (c) 2015 Henrik Gramner
+ *
+ * This file is part of Libav.
+ *
+ * Libav is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * Libav is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with Libav; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+#include "checkasm.h"
+#include "libavcodec/h264qpel.h"
+#include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
+
+static const uint32_t pixel_mask[3] = { 0xffffffff, 0x01ff01ff, 0x03ff03ff };
+
+#define SIZEOF_PIXEL ((bit_depth + 7) / 8)
+#define BUF_SIZE (2*16*(16+3+4))
+
+#define randomize_buffers()                        \
+do {                                               \
+    uint32_t mask = pixel_mask[bit_depth-8];       \
+    int k;                                         \
+    for (k = 0; k < BUF_SIZE; k += 4) {            \
+        uint32_t r = rnd() & mask;                 \
+        AV_WN32A(buf0+k, r);                       \
+        AV_WN32A(buf1+k, r);                       \
+        r = rnd();                                 \
+        AV_WN32A(dst0+k, r);                       \
+        AV_WN32A(dst1+k, r);                       \
+    }                                              \
+} while (0)
+
+#define src0 (buf0 + 3*2*16) /* h264qpel functions read data from negative src pointer offsets */
+#define src1 (buf1 + 3*2*16)
+
+void checkasm_check_h264qpel(void)
+{
+    DECLARE_ALIGNED(16, uint8_t, buf0)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, buf1)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, dst0)[BUF_SIZE];
+    DECLARE_ALIGNED(16, uint8_t, dst1)[BUF_SIZE];
+    H264QpelContext h;
+    int op, bit_depth, i, j;
+
+    for (op = 0; op < 2; op++) {
+        qpel_mc_func (*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : h.put_h264_qpel_pixels_tab;
+        const char *op_name = op ? "avg" : "put";
+
+        for (bit_depth = 8; bit_depth <= 10; bit_depth++) {
+            ff_h264qpel_init(&h, bit_depth);
+            for (i = 0; i < (op ? 3 : 4); i++) {
+                int size = 16 >> i;
+                for (j = 0; j < 16; j++) {
+                    if (check_func(tab[i][j], "%s_h264_qpel_%d_mc%d%d_%d", op_name, size, j & 3, j >> 2, bit_depth)) {
+                        randomize_buffers();
+                        call_ref(dst0, src0, (ptrdiff_t)size*SIZEOF_PIXEL);
+                        call_new(dst1, src1, (ptrdiff_t)size*SIZEOF_PIXEL);
+                        if (memcmp(dst0, dst1, BUF_SIZE))
+                            fail();
+                        bench_new(dst1, src1, (ptrdiff_t)size*SIZEOF_PIXEL);
+                    }
+                }
+            }
+        }
+        report("%s_h264_qpel", op_name);
+    }
+}
--
1.8.3.2