Re: [libav-devel] [PATCH 4/4] x86: fft: Port to cpuflags

2017-03-14 Thread Henrik Gramner
On Fri, Mar 10, 2017 at 3:17 PM, Diego Biurrun wrote: > +%macro INTERL 5 > +%if cpuflag(avx) > +vunpckhps %3, %2, %1 > +vunpcklps %2, %2, %1 > +vextractf128 %4(%5), %2, 0 > +vextractf128 %4 %+ H(%5), %3, 0 > +vextractf128 %4(%5 + 1), %2, 1 > +

Re: [libav-devel] [PATCH] mov: Avoid memcmp of uninitialised data

2017-01-29 Thread Henrik Gramner
On Sun, Jan 29, 2017 at 8:59 PM, Mark Thompson wrote: > strncmp Any particular reason for not just using plain strcmp()? ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel

Re: [libav-devel] [FFmpeg-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-26 Thread Henrik Gramner
On Mon, Dec 26, 2016 at 2:52 PM, Ronald S. Bultje wrote: > Hm, OK, I think it affects unix64/x86-32 also when using 32-byte > alignment. We do use the stack pointer then. On 32-bit and UNIX64 it simply uses a different caller-saved register which doesn't require additional

Re: [libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-26 Thread Henrik Gramner
On Mon, Dec 26, 2016 at 2:32 AM, Ronald S. Bultje wrote: > I know I'm terribly nitpicking here for the limited scope of the comment, > but this only matters for functions that have a return value. Do you think > it makes sense to allow functions to opt out of this requirement

[libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-25 Thread Henrik Gramner
When allocating stack space with an alignment requirement that is larger than the current stack alignment we need to store a copy of the original stack pointer in order to be able to restore it later. If we chose to use another register for this purpose we should not pick eax/rax since it can be
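The bookkeeping described above can be modelled in plain C. This is only a sketch under stated assumptions: `alloc_stack_model`, `frame_model` and the "align first, then subtract a padded size" order are illustrative names and choices, not the actual x86inc macro expansion. The point is that the original stack pointer has to be kept somewhere, because masking it discards the low bits needed to restore it.

```c
#include <stdint.h>

/* Hypothetical model of an over-aligned stack allocation. */
typedef struct {
    uintptr_t sp;        /* new, aligned stack pointer                     */
    uintptr_t saved_sp;  /* copy of the original stack pointer for restore */
} frame_model;

static frame_model alloc_stack_model(uintptr_t sp, uintptr_t size, uintptr_t align)
{
    frame_model f;

    /* Keep the original value; in assembly this copy lives in a scratch
     * register, which is why the choice of register matters. */
    f.saved_sp = sp;

    /* Align downwards (align must be a power of two), then allocate a
     * size rounded up to the alignment so the result stays aligned. */
    uintptr_t aligned = sp & ~(align - 1);
    uintptr_t padded  = (size + align - 1) & ~(align - 1);
    f.sp = aligned - padded;
    return f;
}
```

On return, a function modelled this way would simply reload `saved_sp`; that value cannot be recomputed from `sp` alone.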

Re: [libav-devel] [PATCH 1/3] ratecontrol: Use correct function pointer casts instead of void*

2016-11-11 Thread Henrik Gramner
On Fri, Nov 11, 2016 at 1:22 PM, Diego Biurrun wrote: > ISO C forbids initialization between function pointer and ‘void *’ ISO C technically allows quite a lot of weird stuff, like having function pointers that are different from data pointers. Is there even any known relevant
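For context on the warning being discussed: ISO C does guarantee that a function pointer converted to another function pointer type and back compares equal to the original, whereas it makes no such promise for `void *`. A minimal sketch of storing callbacks behind a generic function pointer type follows; `generic_fn`, `rc_func`, `entry` and `call_entry` are hypothetical names for illustration and are not taken from the ratecontrol patch.

```c
/* Hypothetical names, for illustration only. */
typedef void   (*generic_fn)(void);   /* generic function pointer type */
typedef double (*rc_func)(double);    /* the real signature            */

static double twice(double x) { return 2.0 * x; }

struct entry {
    const char *name;
    generic_fn  fn;                   /* stored without going through void * */
};

static const struct entry table[] = {
    { "twice", (generic_fn)twice },
};

static double call_entry(const struct entry *e, double x)
{
    /* Cast back to the original type before calling; calling through
     * the generic type directly would be undefined behaviour. */
    return ((rc_func)e->fn)(x);
}
```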

Re: [libav-devel] [PATCH 2/2] hevc: x86: Add add_residual optimizations

2016-10-19 Thread Henrik Gramner
On Wed, Oct 19, 2016 at 10:18 AM, Diego Biurrun wrote: > +%macro ADD_RES_MMX_4_8 0 > +mova m2, [r1] > +mova m4, [r1+8] > +pxor m3, m3 > +psubw m3, m2 > +packuswb m2, m2 > +packuswb m3,

Re: [libav-devel] [PATCH 1/2] checkasm: Add a test for HEVC add_residual

2016-10-19 Thread Henrik Gramner
On Wed, Oct 19, 2016 at 5:43 PM, Diego Biurrun wrote: > What exactly segfaults? checkasm --bench=add_res The stride for bench_new() shouldn't be different from call_new() Actually it should probably be more like: int stride = block_size << (bit_depth > 8);
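The suggested expression can be read as "one byte per sample at 8 bits, two bytes per sample above that". A small sketch, assuming samples wider than 8 bits are stored as 16-bit values; `add_res_stride` is a hypothetical helper name, not a function from the test:

```c
/* Hypothetical helper: stride in bytes for one row of a block.
 * (bit_depth > 8) evaluates to 0 or 1, so the shift either leaves
 * block_size unchanged or doubles it. */
static int add_res_stride(int block_size, int bit_depth)
{
    return block_size << (bit_depth > 8);
}
```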

Re: [libav-devel] [PATCH 1/2] checkasm: Add a test for HEVC add_residual

2016-10-19 Thread Henrik Gramner
On Wed, Oct 19, 2016 at 10:18 AM, Diego Biurrun wrote: > +bench_new(dst1, res1, block_size); Segfaults. Should probably be block_size * 2 like the other calls.

Re: [libav-devel] [PATCH 2/2] checkasm: Add a test for HEVC add_residual

2016-10-14 Thread Henrik Gramner
On Fri, Oct 14, 2016 at 10:29 AM, Luca Barbato wrote: > The term checkasm is misleading. The whole thing is a unit-test for some > specific dsp functions. Not really, no. The checkasm tests only test whether or not the output of the assembly functions matches the output of

Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT

2016-10-07 Thread Henrik Gramner
On Fri, Oct 7, 2016 at 6:32 PM, Alexandra Hájková wrote: > On Fri, Oct 7, 2016 at 12:32 AM, Diego Biurrun wrote: >> There should be no need to redefine the transpose functions, just call >> the right one with the help of the cpuname macro. > > The

Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT

2016-10-04 Thread Henrik Gramner
On Tue, Oct 4, 2016 at 7:35 PM, Alexandra Hájková wrote: > +cglobal hevc_idct_16x16_%1, 1, 2, 16, coeffs > +mov r1d, 3 > +.loop16: > +TR_16x4 8 * r1, 7, [pd_64], 64, 2, 32, 8, 16, 1, 0 > +dec r1 dec r1d [...] > +++ b/libavcodec/x86/hevcdsp_init.c The

Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT

2016-10-02 Thread Henrik Gramner
On Sat, Oct 1, 2016 at 12:55 PM, wrote: > +cglobal hevc_idct_4x4_ %+ %1, 1, 1, 5, coeffs cglobal hevc_idct_4x4_%1, 1, 1, 5, coeffs [...] > +%macro SWAP_BLOCKS 5 [...] > +TRANSPOSE_4x4 4, 5, 8 [...] > +TRANSPOSE_4x4 4, 5, 8 [...] > +%macro TRANSPOSE_BLOCK 3

Re: [libav-devel] [PATCH 1/2] hevc: Add SSE2 and AVX IDCT

2016-09-30 Thread Henrik Gramner
On Fri, Sep 30, 2016 at 5:40 PM, wrote: > +%if cpuflag(avx) > +pmaddwd m2, m0, [pw_64] ; e0 > +pmaddwd m3, m1, [pw_83_36] ; o0 > +%else > +mova m2, m0 > +pmaddwd m2, [pw_64] > +mova m3, m1 > +pmaddwd m3, [pw_83_36] > +%endif Redundant %else.

Re: [libav-devel] [PATCH 3/9] blockdsp/x86: yasmify

2016-09-22 Thread Henrik Gramner
On Thu, Sep 22, 2016 at 9:39 AM, Anton Khirnov <an...@khirnov.net> wrote: > Quoting Henrik Gramner (2016-09-21 17:13:31) >> Why not use xorps like the original code then? INIT_XMM sse will also >> make mova assemble to movaps instead of movdqa, so no problem there. > >

Re: [libav-devel] [PATCH 3/9] blockdsp/x86: yasmify

2016-09-21 Thread Henrik Gramner
On Wed, Sep 21, 2016 at 9:01 AM, Anton Khirnov wrote: > Yes they are, because pxor does not exist in SSE. Why not use xorps like the original code then? INIT_XMM sse will also make mova assemble to movaps instead of movdqa, so no problem there.

Re: [libav-devel] [PATCH 1/2] hevc: Add AVX IDCT

2016-09-19 Thread Henrik Gramner
Not a super-thorough review by any means, but anyway... On Sun, Sep 18, 2016 at 7:35 PM, Alexandra Hájková wrote: [...] > +SECTION_RODATA Check if any of the constants are duplicates of already existing ones. [...] > +%macro TR_4x4 2 > +; interleaves src0

Re: [libav-devel] [PATCH 9/9] audiodsp/x86: yasmify vector_clipf_sse

2016-09-06 Thread Henrik Gramner
On Tue, Sep 6, 2016 at 11:39 AM, Anton Khirnov wrote: >> Use 3-arg maxps instead of mova. > > Isn't that AVX-only? It is, x86inc will simply convert it to mova+minps when assembling it as non-AVX code but it reduces the line count. It's certainly not worth to go into

Re: [libav-devel] [PATCH] audiodsp/x86: clear the high bits of the order parameter on 64bit

2016-09-06 Thread Henrik Gramner
On Tue, Sep 6, 2016 at 11:44 AM, Anton Khirnov wrote: > Also change shl to add, since it can be faster on some CPUs. > > CC: libav-sta...@libav.org > --- > libavcodec/x86/audiodsp.asm | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) Ok.

Re: [libav-devel] [PATCH 5/9] audiodsp/x86: sign extend the order argument to scalarproduct_int16 on 64bit

2016-09-05 Thread Henrik Gramner
On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote: > CC: libav-sta...@libav.org > --- > libavcodec/x86/audiodsp.asm | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/libavcodec/x86/audiodsp.asm b/libavcodec/x86/audiodsp.asm > index dc38ada..0e3019c 100644 > ---

Re: [libav-devel] [PATCH 9/9] audiodsp/x86: yasmify vector_clipf_sse

2016-09-05 Thread Henrik Gramner
On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote: > +shl lenq, 2 You could also skip this shift and just use 4*lenq instead in the memory operands, multiplying by 2, 4, or 8 in memory args is free.

Re: [libav-devel] [PATCH 9/9] audiodsp/x86: yasmify vector_clipf_sse

2016-09-05 Thread Henrik Gramner
On Mon, Sep 5, 2016 at 1:02 PM, Anton Khirnov wrote: > +cglobal vector_clipf, 3, 3, 6, dst, src, len, min, max > +%if ARCH_X86_32 > +VBROADCASTSS m0, minm > +VBROADCASTSS m1, maxm > +%else > +VBROADCASTSS m0, m0 > +VBROADCASTSS m1, m1 > +%endif This will fail

Re: [libav-devel] [PATCH 1/2 v2] x86/hevc: add add_residual

2016-07-21 Thread Henrik Gramner
On Thu, Jul 21, 2016 at 2:48 AM, Josh de Kock wrote: > +cglobal hevc_add_residual_16_8, 3, 5, 7, dst, coeffs, stride > +pxor m0, m0 > +lea r3, [strideq * 3] > +RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq > +RES_ADD_SSE_16_32_8 64,

Re: [libav-devel] [PATCH 1/3] x86/hevc: add add_residual

2016-07-19 Thread Henrik Gramner
On Thu, Jul 14, 2016 at 7:25 PM, Josh de Kock wrote: Some of those functions are several kilobytes large. That's going to result in a lot of cache misses. I suggest using loops instead of duplicating the same code over and over with %reps.

Re: [libav-devel] [PATCH] checkasm: add HEVC test for testing IDCT DC

2016-07-19 Thread Henrik Gramner
On Mon, Jul 18, 2016 at 8:11 PM, Alexandra Hájková wrote: > +if (check_func(h.idct_dc[i - 2], "idct_%dx%d_dc_%d", block_size, > block_size, bit_depth)) { > +call_ref(coeffs0); > +call_new(coeffs1); > +if (memcmp(coeffs0,

Re: [libav-devel] [PATCH 6/6] hevc: Add AVX2 DC IDCT

2016-07-10 Thread Henrik Gramner
On Sun, Jul 10, 2016 at 1:10 PM, Alexandra Hájková wrote: Some fairly minor nits: > +++ b/libavcodec/x86/hevc_idct.asm > +cglobal hevc_idct_%1x%1_dc_%3, 1, 2, 1, coeff, tmp > +movsx tmpq, word [coeffq] > +add tmpw, ((1 << 14-%3) +

[libav-devel] [PATCH 3/4] x86inc: Improve handling of %ifid with multi-token parameters

2016-04-20 Thread Henrik Gramner
From: Anton Mitrofanov The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want. --- libavutil/x86/x86inc.asm | 4 ++-- 1 file changed, 2 insertions(+), 2

[libav-devel] [PATCH 4/4] x86inc: Enable AVX emulation in additional cases

2016-04-20 Thread Henrik Gramner
From: Anton Mitrofanov Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`. --- libavutil/x86/x86inc.asm | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git

[libav-devel] [PATCH 2/4] x86inc: Fix AVX emulation of some instructions

2016-04-20 Thread Henrik Gramner
From: Anton Mitrofanov --- libavutil/x86/x86inc.asm | 44 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 10352fc..60aad23 100644 ---

[libav-devel] [PATCH 1/4] x86inc: Fix AVX emulation of scalar float instructions

2016-04-20 Thread Henrik Gramner
Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified. --- libavutil/x86/x86inc.asm | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/libavutil/x86/x86inc.asm
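Why a scalar add is not commutative at the register level can be shown with a plain C model of the three-operand form (`vaddss dst, src1, src2`): only lane 0 is computed, and the upper lanes are copied from the first source. The types and function below are a hypothetical model written for this note, not x86inc code.

```c
/* Hypothetical model of a 128-bit register holding four floats. */
typedef struct { float f[4]; } vec4;

/* Model of vaddss dst, src1, src2:
 * lane 0 = src1[0] + src2[0], lanes 1..3 are taken from src1. */
static vec4 vaddss_model(vec4 src1, vec4 src2)
{
    vec4 dst = src1;
    dst.f[0] = src1.f[0] + src2.f[0];
    return dst;
}
```

Swapping the two sources gives the same lane 0 but different upper lanes, so an emulation layer that reorders operands as if the instruction were commutative would change the result.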

[libav-devel] [PATCH 0/4] x86inc: Sync changes from x264

2016-04-20 Thread Henrik Gramner
Anton Mitrofanov (3): x86inc: Fix AVX emulation of some instructions x86inc: Improve handling of %ifid with multi-token parameters x86inc: Enable AVX emulation in additional cases Henrik Gramner (1): x86inc: Fix AVX emulation of scalar float instructions libavutil/x86/x86inc.asm | 95

Re: [libav-devel] [PATCH] h264: Use isprint to sanitize the SEI debug message

2016-02-06 Thread Henrik Gramner
On Sat, Feb 6, 2016 at 7:34 PM, Luca Barbato wrote: > Give how this function is used it is not really important, its purpose > is to not break the terminal printing garbage. That's true I guess. > Do you have time to get me a function that is local independent? static

Re: [libav-devel] [PATCH] h264: Use isprint to sanitize the SEI debug message

2016-02-06 Thread Henrik Gramner
On Sat, Feb 6, 2016 at 1:03 PM, Luca Barbato wrote: > +if (isprint(val)) Shouldn't we use a locale-independent version similar to the other functions in libavutil/avstring.h?
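A locale-independent check of the kind asked about here could look roughly like the sketch below. `ascii_isprint` and `sanitize_ascii` are hypothetical names chosen for this note; they are not the functions that ended up in libavutil, and the choice of `'?'` as the replacement character is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper: printable ASCII is the range 0x20 (space)
 * through 0x7e ('~'), regardless of the current locale. */
static int ascii_isprint(int c)
{
    return c >= 0x20 && c <= 0x7e;
}

/* Hypothetical helper: replace anything non-printable in a buffer so
 * that logging it cannot disturb the terminal. */
static void sanitize_ascii(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!ascii_isprint(buf[i]))
            buf[i] = '?';
}
```

Unlike `isprint()`, the result does not depend on `setlocale()` and the function is well defined for any `int` argument.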

Re: [libav-devel] [PATCH] h264: Parse only the x264 info unregisterd sei

2016-02-04 Thread Henrik Gramner
On Wed, Jul 29, 2015 at 10:51 PM, Luca Barbato wrote: > And restrict the string to ascii text. Restricting to printable characters would be even better.

[libav-devel] [PATCH] msvc: Fix libx264 linking

2016-01-28 Thread Henrik Gramner
--- configure | 1 + 1 file changed, 1 insertion(+) diff --git a/configure b/configure index c5bcb78..0bf29c2 100755 --- a/configure +++ b/configure @@ -2951,6 +2951,7 @@ msvc_common_flags(){ -lz) echo zlib.lib ;; -lavifil32) echo vfw32.lib ;;

[libav-devel] [PATCH v2] x86inc: Preserve arguments when allocating stack space

2016-01-20 Thread Henrik Gramner
When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. --- libavutil/x86/x86inc.asm | 7 +-- 1 file changed, 5 insertions(+), 2

Re: [libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space

2016-01-18 Thread Henrik Gramner
On Mon, Jan 18, 2016 at 2:35 PM, Ronald S. Bultje <rsbul...@gmail.com> wrote: > On Sun, Jan 17, 2016 at 6:21 PM, Henrik Gramner <hen...@gramner.com> wrote: >> @@ -386,8 +386,10 @@ DECLARE_REG_TMP_SIZE >> 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 >> %if %1

[libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space

2016-01-17 Thread Henrik Gramner
When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. --- libavutil/x86/x86inc.asm | 6 -- 1 file changed, 4 insertions(+), 2

[libav-devel] [PATCH 5/8] x86inc: Use more consistent indentation

2016-01-17 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 134 +++ 1 file changed, 67 insertions(+), 67 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index c355ee7..de20e76 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@

[libav-devel] [PATCH 0/8] x86inc: Sync changes from x264

2016-01-17 Thread Henrik Gramner
The following patches were recently pushed to x264. Geza Lore (1): x86inc: Add debug symbols indicating sizes of compiled functions Henrik Gramner (7): x86inc: Make cpuflag() and notcpuflag() return 0 or 1 x86inc: Be more verbose in assertion failures x86inc: Improve FMA instruction

[libav-devel] [PATCH 7/8] x86inc: Avoid creating unnecessary local labels

2016-01-17 Thread Henrik Gramner
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such

[libav-devel] [PATCH 8/8] x86inc: Add debug symbols indicating sizes of compiled functions

2016-01-17 Thread Henrik Gramner
From: Geza Lore Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. e.g.

[libav-devel] [PATCH 2/8] x86inc: Be more verbose in assertion failures

2016-01-17 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index afcd6b8..dabb6cc 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -295,7 +295,7 @@ DECLARE_REG_TMP_SIZE

[libav-devel] [PATCH 3/8] x86inc: Improve FMA instruction handling

2016-01-17 Thread Henrik Gramner
* Correctly handle FMA instructions with memory operands. * Print a warning if FMA instructions are used without the correct cpuflag. * Simplify the instantiation code. * Clarify documentation. Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4

Re: [libav-devel] [PATCH v2 1/1] x86: use emms after ff_int32_to_float_fmul_scalar_sse

2015-12-30 Thread Henrik Gramner
On Wed, Dec 30, 2015 at 1:43 PM, Janne Grunau wrote: > libavcodec/x86/fmtconvert.asm | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) Ok.

Re: [libav-devel] [PATCH 1/1] x86: use emms after ff_int32_to_float_fmul_scalar_sse

2015-12-29 Thread Henrik Gramner
On Tue, Dec 29, 2015 at 12:32 PM, Janne Grunau wrote: > Intel's Instruction Set Reference (as of September 2015) clearly states > that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the > source is a memory location. The Instruction Set Reference from 1999 >

Re: [libav-devel] [PATCH 2/2] checkasm: x86: post commit review fixes

2015-12-25 Thread Henrik Gramner
On Tue, Dec 22, 2015 at 10:59 PM, Janne Grunau wrote: > Check the full FPU tag word instead of only the upper half and simplify > the comparison. It previously only checked the lower half, not the upper. > Use upper-case function base name as macro name to instantiate

Re: [libav-devel] [PATCH 1/2] x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly

2015-12-25 Thread Henrik Gramner
On Tue, Dec 22, 2015 at 10:59 PM, Janne Grunau wrote: > This reverts commit 5dfe4edad63971d669ae456b0bc40ef9364cca80. > --- > libavcodec/x86/fmtconvert.asm | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) Ok.

Re: [libav-devel] [libav-commits] checkasm: add fmtconvert tests

2015-12-23 Thread Henrik Gramner
On Tue, Dec 22, 2015 at 10:44 PM, Janne Grunau wrote: >> Intel's current documentation is very clear on cvtpi2ps: "This >> instruction causes a transition from x87 FPU to MMX technology >> operation". > > every tested silicon (nothing ancient or SSE only though) and the

Re: [libav-devel] [PATCH 1/1] x86: checkasm: check for or handle missing cleanup after MMX instructions

2015-12-22 Thread Henrik Gramner
On Fri, Dec 11, 2015 at 6:40 PM, Janne Grunau wrote: > +#define declare_new_emms(cpu_flags, ret, ...) \ > +ret (*checked_call)(void *, int, int, int, int, int, __VA_ARGS__) = \ > +((cpu_flags) & av_get_cpu_flags()) ? (void > *)checkasm_checked_call_emms : \ >

Re: [libav-devel] [libav-commits] checkasm: add fmtconvert tests

2015-12-22 Thread Henrik Gramner
On Tue, Dec 22, 2015 at 5:41 PM, Janne Grunau wrote: > I found HTML copy from 1999 of Intel's manual(1) which says that > cvtpi2ps with a memory location as source doesn't cause a transition to > MMX state. The current documentation for cvtpi2pd (packed int to packed >

Re: [libav-devel] [PATCH 1/2] configure: Support msys2 out of box

2015-11-21 Thread Henrik Gramner
On Sat, Nov 21, 2015 at 7:53 AM, Hendrik Leppkes wrote: > msys2 provides various .sh scripts to setup the environment, one for > msys2 building, and one for mingw32/64 respectively. > You need to launch it using the appropriate shell script, but just > running sh.exe. > > -

[libav-devel] [PATCH] checkasm: Fix compilation with --disable-avcodec

2015-10-04 Thread Henrik Gramner
--- tests/checkasm/checkasm.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 9219a83..3ed78b6 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -57,17 +57,19 @@

Re: [libav-devel] [PATCH] checkasm: Fix compilation with --disable-avcodec

2015-10-04 Thread Henrik Gramner
On Sun, Oct 4, 2015 at 8:39 PM, Luca Barbato wrote: > Alternatively we might make sure if avcodec is disabled all its > components are as well. > > might simplify a lot the code... Yes, that's indeed a solid approach as well. Who's volunteering for that though? I don't really

[libav-devel] [PATCH] checkasm: Fix the function name sorting algorithm

2015-09-28 Thread Henrik Gramner
The previous implementation was behaving incorrectly in some corner cases. --- tests/checkasm/checkasm.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 013e197..9219a83 100644 --- a/tests/checkasm/checkasm.c

Re: [libav-devel] [PATCH] avutil/avstring: Inline some tiny functions

2015-09-28 Thread Henrik Gramner
On Mon, Sep 28, 2015 at 9:49 AM, Anton Khirnov wrote: > But does it actually improve performance measurably? I'd argue that > those functions are used in places where it doesn't really matter. I was using some perf tools through checkasm when I noticed an awful lot of time was

[libav-devel] [PATCH] avutil/avstring: Inline some tiny functions

2015-09-26 Thread Henrik Gramner
They're short enough that inlining them actually reduces code size due to all the overhead associated with making a function call. --- libavutil/avstring.c | 22 -- libavutil/avstring.h | 22 ++ 2 files changed, 18 insertions(+), 26 deletions(-) diff --git

[libav-devel] [PATCH] checkasm: Use a self-balancing tree

2015-09-25 Thread Henrik Gramner
Tested functions are internally kept in a binary search tree for efficient lookups. The downside of the current implementation is that the tree quickly becomes unbalanced which causes an unnecessary number of comparisons between nodes. Improve this by changing the tree into a self-balancing
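To illustrate the problem and one possible remedy: inserting keys in sorted order into a plain binary search tree yields a chain as deep as the number of keys, while a self-balancing tree keeps the depth logarithmic. The sketch below uses a left-leaning red-black tree with integer keys. Which balancing scheme the patch actually uses is not visible in the excerpt, so the scheme and every name here (`Node`, `tree_insert`, `tree_height`) are assumptions made for illustration.

```c
#include <stdlib.h>

typedef struct Node {
    struct Node *left, *right;
    int key;
    int red;   /* colour of the link from the parent: 1 = red, 0 = black */
} Node;

static int is_red(const Node *n) { return n && n->red; }

static Node *rotate_left(Node *h)
{
    Node *x  = h->right;
    h->right = x->left;
    x->left  = h;
    x->red   = h->red;
    h->red   = 1;
    return x;
}

static Node *rotate_right(Node *h)
{
    Node *x  = h->left;
    h->left  = x->right;
    x->right = h;
    x->red   = h->red;
    h->red   = 1;
    return x;
}

static Node *insert_rec(Node *h, int key)
{
    if (!h) {
        Node *n = calloc(1, sizeof(*n));
        if (!n)
            abort();
        n->key = key;
        n->red = 1;                      /* new nodes hang off a red link */
        return n;
    }
    if (key < h->key)
        h->left = insert_rec(h->left, key);
    else if (key > h->key)
        h->right = insert_rec(h->right, key);
    /* equal keys: already present, nothing to insert */

    /* Restore the left-leaning red-black invariants on the way back up. */
    if (is_red(h->right) && !is_red(h->left))
        h = rotate_left(h);
    if (is_red(h->left) && is_red(h->left->left))
        h = rotate_right(h);
    if (is_red(h->left) && is_red(h->right)) {
        h->red = 1;
        h->left->red = 0;
        h->right->red = 0;
    }
    return h;
}

static Node *tree_insert(Node *root, int key)
{
    root = insert_rec(root, key);
    root->red = 0;                       /* the root is always black */
    return root;
}

/* Number of nodes on the longest root-to-leaf path. */
static int tree_height(const Node *n)
{
    if (!n)
        return 0;
    int l = tree_height(n->left);
    int r = tree_height(n->right);
    return 1 + (l > r ? l : r);
}
```

With 1023 keys inserted in ascending order, an unbalanced tree would be 1023 nodes deep; a red-black tree of that size is bounded by roughly twice the base-2 logarithm of the node count.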

[libav-devel] [PATCH] checkasm/x86: Correctly handle variadic functions

2015-09-23 Thread Henrik Gramner
The System V ABI on x86-64 specifies that the al register contains an upper bound of the number of arguments passed in vector registers when calling variadic functions, so we aren't allowed to clobber it. checkasm_fail_func() is a variadic function so also zero al before calling it. ---

Re: [libav-devel] [PATCH] tiny_psnr: Use the correct abs() version

2015-09-22 Thread Henrik Gramner
On Tue, Sep 22, 2015 at 9:28 PM, Vittorio Giovara wrote: > I am puzzled as well, msdn reports this function available only from > vs2013, but there is a vs2012 fate instance which seems to compile > fine with it. That wouldn't exactly be the first incorrect thing in

[libav-devel] [PATCH] checkasm: v210: Fix array overwrite

2015-09-16 Thread Henrik Gramner
--- tests/checkasm/v210enc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/checkasm/v210enc.c b/tests/checkasm/v210enc.c index cdb8e76..4f5f6ba 100644 --- a/tests/checkasm/v210enc.c +++ b/tests/checkasm/v210enc.c @@ -43,7 +43,7 @@ AV_WN32A(v0 + i, r);

[libav-devel] [PATCH] checkasm: add unit tests for v210enc

2015-09-05 Thread Henrik Gramner
null +++ b/tests/checkasm/v210enc.c @@ -0,0 +1,94 @@ +/* + * Copyright (c) 2015 Henrik Gramner + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundati

[libav-devel] [PATCH] checkasm: Fix floating point arguments on 64-bit Windows

2015-08-24 Thread Henrik Gramner
--- tests/checkasm/x86/checkasm.asm | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index 4948fc9..828352c 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -103,16

Re: [libav-devel] [PATCH] hevcdsp: add x86 SIMD for MC

2015-08-23 Thread Henrik Gramner
On Sun, Aug 23, 2015 at 8:27 PM, Anton Khirnov an...@khirnov.net wrote: Quoting James Almer (2015-08-22 23:58:41) You need to use the d suffix instead of q on the register names to make sure the high bits are cleared. Eh? Perhaps I'm misunderstading something, but I'd expect that using d

Re: [libav-devel] [PATCH] checkasm: add HEVC MC tests

2015-08-22 Thread Henrik Gramner
Minor nits: +#define randomize_buffers(buf, size, depth) s/buffers/buffer/ since you're only randomizing a single one at a time. +static const char *interp_names[2][2] = { { "pixels", "h" }, { "v", "hv" } }; const char * const Otherwise lgtm.
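The `const char * const` nit is about making the pointers themselves read-only, not just the characters they point at, so the whole table can be placed in read-only data. A small sketch, in which the index order and the accessor `interp_name` are assumptions added for illustration:

```c
/* Both the strings and the pointers in the table are const. */
static const char * const interp_names[2][2] = {
    { "pixels", "h"  },
    { "v",      "hv" },
};

/* Hypothetical accessor: first index = vertical filtering on/off,
 * second index = horizontal filtering on/off. */
static const char *interp_name(int v, int h)
{
    return interp_names[!!v][!!h];
}
```

With the second `const`, an accidental assignment such as `interp_names[0][0] = "x";` is rejected at compile time.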

[libav-devel] [PATCH v2] checkasm: Explicitly declare function prototypes

2015-08-20 Thread Henrik Gramner
Now we no longer have to rely on function pointers intentionally declared without specified argument types. This makes it easier to support functions with floating point parameters or return values as well as functions returning 64-bit values on 32-bit architectures. It also avoids having to

Re: [libav-devel] [PATCH 7/8] checkasm: add HEVC MC tests

2015-08-20 Thread Henrik Gramner
On Wed, Aug 19, 2015 at 9:43 PM, Anton Khirnov an...@khirnov.net wrote: +const int srcstride = FFALIGN(width, 16) * sizeof(*src0); +const int dststride = FFALIGN(width, 16) * PIXEL_SIZE(bit_depth); Strides, and any other pointer-sized value, should be ptrdiff_t - or more preferable,
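One reason for preferring `ptrdiff_t` for strides: a stride is a signed byte offset added to a pointer, so it should have pointer width and be allowed to be negative (for bottom-up images, for example). The following sketch uses a hypothetical function `sum_rows` that is not part of the test under review:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum all pixels of a width x height block whose
 * rows are `stride` bytes apart. The stride may be negative. */
static unsigned sum_rows(const uint8_t *src, ptrdiff_t stride,
                         int width, int height)
{
    unsigned sum = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            sum += src[(ptrdiff_t)y * stride + x];   /* pointer-sized offset */
    return sum;
}
```

On 64-bit targets an `int` stride passed to assembly may also leave the upper half of the register undefined, which is a related issue raised elsewhere in this archive (the `d` versus `q` register suffix discussion).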

[libav-devel] [PATCH] checkasm: Explicitly declare function prototypes

2015-08-16 Thread Henrik Gramner
Now we no longer have to rely on function pointers intentionally declared without specified argument types. This makes it easier to support functions with floating point parameters or return values as well as functions returning 64-bit values on 32-bit architectures. It also avoids having to

[libav-devel] [PATCH] checkasm: x86: properly save rdx/edx in checked_call()

2015-08-16 Thread Henrik Gramner
If the return value doesn't fit in a single register rdx/edx can in some cases be used in addition to rax/eax. Doesn't affect any of the existing checkasm tests but might be useful later. Also comment the relevant code a bit better. --- tests/checkasm/x86/checkasm.asm | 7 +++ 1 file

[libav-devel] [PATCH] x86inc: Various minor backports from x264

2015-08-11 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 32 +--- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index a519fd5..6ad9785 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1,7 +1,7 @@

[libav-devel] [PATCH] checkasm: Remove unnecessary include

2015-08-05 Thread Henrik Gramner
--- tests/checkasm/checkasm.c | 4 1 file changed, 4 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 82c635e..b564e7e 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -33,10 +33,6 @@ #include <io.h> #endif -#if ARCH_X86

Re: [libav-devel] [PATCH 7/8] x86inc: nasm support

2015-08-02 Thread Henrik Gramner
On Sat, Aug 1, 2015 at 5:27 PM, Henrik Gramner hen...@gramner.com wrote: --- configure| 3 --- libavutil/x86/x86inc.asm | 42 +- 2 files changed, 29 insertions(+), 16 deletions(-) Skip this one for now, nasm seems to have a bug

Re: [libav-devel] [PATCH 8/8] x86inc: Various minor backports from x264

2015-08-02 Thread Henrik Gramner
On Sat, Aug 1, 2015 at 9:34 PM, James Almer jamr...@gmail.com wrote: The same could be done in av_parse_cpu_flags(). It doesn't affect this patch, and can be done separately. Just throwing the idea out there. Yeah, I guess. What about bmi/bmi2, for that matter? What about them?

Re: [libav-devel] [PATCH] x86: dct: Disable dct32_float_sse on x86-64

2015-08-01 Thread Henrik Gramner
On Sat, Aug 1, 2015 at 8:28 PM, Anton Khirnov an...@khirnov.net wrote: Any specific reason you use ARCH_X86_64 in one file and ARCH_X86_32 in the other? I missed that there's a define for ARCH_X86_32 in asm (some other code used %if ARCH_X86_64 == 0 so I assumed it didn't). Using ARCH_X86_32 in

[libav-devel] [PATCH] x86: dcadsp: Avoid SSE2 instructions in SSE functions

2015-08-01 Thread Henrik Gramner
--- libavcodec/x86/dcadsp.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/x86/dcadsp.asm b/libavcodec/x86/dcadsp.asm index c42ee23..c99df12 100644 --- a/libavcodec/x86/dcadsp.asm +++ b/libavcodec/x86/dcadsp.asm @@ -148,7 +148,7 @@ DECODE_HF addps m4,

[libav-devel] [PATCH 3/8] x86inc: warn when instructions incompatible with current cpuflags are used

2015-08-01 Thread Henrik Gramner
From: Anton Mitrofanov bugmas...@narod.ru Signed-off-by: Henrik Gramner hen...@gramner.com --- libavutil/x86/x86inc.asm | 587 --- 1 file changed, 299 insertions(+), 288 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm

[libav-devel] [PATCH 8/8] x86inc: Various minor backports from x264

2015-08-01 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index d70a5f9..0e2f447 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1,7 +1,7 @@

[libav-devel] [PATCH 1/8] x86inc: warn if XOP integer FMA instruction emulation is impossible

2015-08-01 Thread Henrik Gramner
From: Anton Mitrofanov bugmas...@narod.ru Emulation requires a temporary register if arguments 1 and 4 are the same; this doesn't obey the semantics of the original instruction, so we can't emulate that in x86inc. Also add pmacsdql emulation. Signed-off-by: Henrik Gramner hen...@gramner.com

[libav-devel] [PATCH 2/8] x86inc: Support arbitrary stack alignments

2015-08-01 Thread Henrik Gramner
Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not. --- libavcodec/x86/h264_deblock.asm | 4 +--

[libav-devel] [PATCH 0/8] x86inc: Sync changes from x264

2015-08-01 Thread Henrik Gramner
cpuflags are used Christophe Gisquet (1): x86inc: Fix instantiation of YMM registers Henrik Gramner (5): x86inc: Support arbitrary stack alignments x86inc: Disable vpbroadcastq workaround in newer yasm versions x86inc: Drop SECTION_TEXT macro x86inc: nasm support x86inc: Various

[libav-devel] [PATCH 6/8] x86inc: Drop SECTION_TEXT macro

2015-08-01 Thread Henrik Gramner
The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`. --- libavcodec/x86/apedsp.asm | 2 +- libavcodec/x86/audiodsp.asm | 2 +- libavcodec/x86/bswapdsp.asm | 2 +-

[libav-devel] [PATCH 5/8] x86inc: Disable vpbroadcastq workaround in newer yasm versions

2015-08-01 Thread Henrik Gramner
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions. --- libavutil/x86/x86inc.asm | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 2844fdf..d4ce68f 100644 ---

[libav-devel] [PATCH 7/8] x86inc: nasm support

2015-08-01 Thread Henrik Gramner
--- configure| 3 --- libavutil/x86/x86inc.asm | 42 +- 2 files changed, 29 insertions(+), 16 deletions(-) diff --git a/configure b/configure index 482be43..79dd3a5 100755 --- a/configure +++ b/configure @@ -1353,7 +1353,6 @@

[libav-devel] [PATCH 4/8] x86inc: Fix instantiation of YMM registers

2015-08-01 Thread Henrik Gramner
From: Christophe Gisquet christophe.gisq...@gmail.com Signed-off-by: Henrik Gramner hen...@gramner.com --- libavutil/x86/x86inc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 96ebe37..2844fdf 100644

[libav-devel] [PATCH] x86: dct: Disable dct32_float_sse on x86-64

2015-08-01 Thread Henrik Gramner
There is an SSE2 implementation so the SSE version is never used. The SSE version also happens to contain SSE2 instructions on x86-64. --- libavcodec/x86/dct32.asm | 3 +++ libavcodec/x86/dct_init.c | 2 ++ 2 files changed, 5 insertions(+) diff --git a/libavcodec/x86/dct32.asm

Re: [libav-devel] [PATCH] x86: dcadsp: Avoid SSE2 instructions in SSE functions

2015-08-01 Thread Henrik Gramner
On Sat, Aug 1, 2015 at 8:49 PM, James Almer jamr...@gmail.com wrote: I however think movq/sd should be used here for sse2 and above instead of movlps. That's a moot point in this case since the code in question is SSE only (and even if it wasn't I'm skeptical to the claim that it would be

Re: [libav-devel] [PATCH] checkasm: Include io.h for isatty, if available

2015-07-29 Thread Henrik Gramner
On Wed, Jul 29, 2015 at 10:09 PM, Martin Storsjö mar...@martin.st wrote: configure does check for isatty, and checkasm properly checks HAVE_ISATTY, but on some platforms (e.g. WinRT), io.h needs to be included for isatty to be available. Ok.

[libav-devel] [PATCH 2/2] checkasm: Use LOCAL_ALIGNED

2015-07-24 Thread Henrik Gramner
From: Michael Niedermayer mich...@niedermayer.cc Fixes alignment issues and bus errors. --- tests/checkasm/bswapdsp.c | 9 + tests/checkasm/h264pred.c | 5 +++-- tests/checkasm/h264qpel.c | 9 + 3 files changed, 13 insertions(+), 10 deletions(-) diff --git

[libav-devel] [PATCH 1/2] checkasm: Modify report format

2015-07-24 Thread Henrik Gramner
Makes it a bit more clear where each test belongs. Suggested by Anton Khirnov. --- tests/checkasm/checkasm.c | 57 +++ tests/checkasm/checkasm.h | 2 +- tests/checkasm/h264qpel.c | 2 +- 3 files changed, 30 insertions(+), 31 deletions(-) diff --git

Re: [libav-devel] [PATCH] [RFC] use a wrapper script to call MS link.exe to avoid mixing with /usr/bin/link.exe

2015-07-23 Thread Henrik Gramner
On Thu, Jul 23, 2015 at 7:23 PM, Steve Lhomme rob...@gmail.com wrote: > On Thu, Jul 23, 2015 at 7:02 PM, Derek Buitenhuis derek.buitenh...@gmail.com wrote: >> Broken permissions. > Not sure how I can tweak that under Windows. git update-index --chmod=+x file

Re: [libav-devel] [PATCH] [RFC] use a wrapper script to call MS link.exe to avoid mixing with /usr/bin/link.exe

2015-07-23 Thread Henrik Gramner
On Thu, Jul 23, 2015 at 9:04 AM, Martin Storsjö mar...@martin.st wrote: > Why is this suddenly using command instead of which now? This won't work in a linux environment. Why wouldn't it work in a linux environment? `command` is POSIX. This stackoverflow post sums it up fairly well:

Re: [libav-devel] [PATCH 1/1] checkasm: remove empty array initializer list in h264pred test

2015-07-20 Thread Henrik Gramner
On Mon, Jul 20, 2015 at 11:18 PM, Janne Grunau janne-li...@jannau.net wrote: > Fixes MSVC compilation. --- tests/checkasm/h264pred.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) Ok.

Re: [libav-devel] [PATCH 1/1] checkasm: fix MSVC build by adding a zero initializer for an empty array

2015-07-20 Thread Henrik Gramner
On Mon, Jul 20, 2015 at 11:58 AM, Janne Grunau janne-li...@jannau.net wrote: > --- tests/checkasm/h264pred.c | 1 + 1 file changed, 1 insertion(+) Shouldn't it be NULL instead of 0 since those are pointers? Otherwise OK.
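The review point above (NULL rather than 0 when the array elements are pointers) can be illustrated with a minimal sketch. The table name, its contents, and the helper below are hypothetical and are not taken from h264pred.c; the only assumption carried over from the thread is that an otherwise empty row needs an explicit initializer, since an empty "{ }" initializer is not valid C before C23.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ROWS 2
#define NUM_COLS 3

/* Hypothetical table of mode names. Row 1 has no entries, so it gets an
 * explicit NULL instead of an empty "{ }" initializer. NULL and 0 produce
 * the same null pointers here; NULL just documents that the elements are
 * pointers. The remaining elements of row 1 are zero-initialized. */
static const char *const mode_names[NUM_ROWS][NUM_COLS] = {
    { "vertical", "horizontal", "dc" },
    { NULL },
};

/* Count the non-NULL names in one row of the table. */
static int count_names(int row)
{
    int n = 0;
    assert(row >= 0 && row < NUM_ROWS);
    for (int i = 0; i < NUM_COLS; i++)
        if (mode_names[row][i])
            n++;
    return n;
}
```

Calling count_names(1) walks a row that contains only null pointers, which is the case the explicit initializer exists for.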

Re: [libav-devel] [PATCH 2/4] checkasm: test all architectures with optimisations

2015-07-17 Thread Henrik Gramner
lgtm.

[libav-devel] [PATCH 2/2] tests/checkasm/checkasm: Give macro a body to avoid potential unexpected syntax issues

2015-07-17 Thread Henrik Gramner
From: Michael Niedermayer mich...@niedermayer.cc Signed-off-by: Michael Niedermayer mich...@niedermayer.cc --- tests/checkasm/checkasm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index 1a46e9b..b54be16 100644 ---
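As a rough illustration of why a disabled macro is usually given a body, here is a sketch with a hypothetical macro name; the actual macro changed in checkasm.h is not visible in the truncated diff above, and the do/while form is one common idiom, not necessarily the one the patch uses. With an empty expansion, a call used as the sole statement of an if would leave only a bare ";" behind, which some compilers warn about.

```c
#include <assert.h>

/* Hypothetical no-op macro. Expanding to "do { } while (0)" keeps every
 * call a complete statement, so it behaves predictably inside if/else. */
#define BENCH_DISABLED(...) do { } while (0)

/* Count negative entries; the macro call is the whole body of the if. */
static int count_negative(const int *v, int n)
{
    int neg = 0;
    assert(v && n >= 0);
    for (int i = 0; i < n; i++) {
        if (v[i] < 0)
            BENCH_DISABLED("negative at %d", i); /* a real, empty statement */
        else
            continue;
        neg++;
    }
    return neg;
}
```

The if/else pairs up as written because the macro call plus its trailing semicolon forms exactly one statement.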

[libav-devel] [PATCH 1/2] checkasm: exit with status 0 instead of 1 if there are no tests to perform

2015-07-17 Thread Henrik Gramner
--- tests/checkasm/checkasm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 7b1ea8f..0aa3d1c 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -317,7 +317,7 @@ int main(int argc, char
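The behaviour named in the subject line can be sketched as a small helper. The function name and its parameters are hypothetical and do not mirror the code in checkasm.c; the sketch only assumes that "no tests to perform" means zero functions were checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: map the outcome of a test run to an exit status.
 * An empty run is treated as success, since nothing actually failed. */
static int run_exit_status(int num_checked, int num_failed)
{
    assert(num_checked >= 0 && num_failed >= 0 && num_failed <= num_checked);
    if (num_checked == 0)
        return EXIT_SUCCESS; /* nothing to test is not an error */
    return num_failed ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

Returning success for an empty run keeps a test harness from reporting a failure on builds where no optimized functions are available to check.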

Re: [libav-devel] [PATCH] cosmetics: Reformat checkasm tests

2015-07-17 Thread Henrik Gramner
On Fri, Jul 17, 2015 at 8:08 PM, Luca Barbato lu_z...@gentoo.org wrote: -qpel_mc_func (*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : h.put_h264_qpel_pixels_tab; +qpel_mc_func(*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : h.put_h264_qpel_pixels_tab; No space between type and

[libav-devel] [PATCH 1/2] x86: bswapdsp: Don't treat 32-bit integers as 64-bit

2015-07-15 Thread Henrik Gramner
The upper halves are not guaranteed to be zero in x86-64. Also use `test` instead of `and` when the result isn't used for anything other than as a branch condition; this allows some register moves to be eliminated. --- libavcodec/x86/bswapdsp.asm | 23 ++- 1 file changed, 10
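For context, a plain C reference for a 32-bit buffer byte swap might look like the sketch below. The function names are hypothetical and this is not the libav implementation; it only shows the computation the assembly has to reproduce, with the length as a 32-bit int. At the C level the compiler handles the width of that argument; hand-written x86-64 assembly has to handle it itself, because the upper 32 bits of the register holding a 32-bit argument are not guaranteed to be zero.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the byte order of a single 32-bit word. */
static uint32_t bswap32_ref(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Hypothetical reference: byte-swap len 32-bit words from src into dst. */
static void bswap32_buf_ref(uint32_t *dst, const uint32_t *src, int len)
{
    assert(dst && src && len >= 0);
    for (int i = 0; i < len; i++)
        dst[i] = bswap32_ref(src[i]);
}
```

A unit test for an optimized version would typically run both on the same input and compare the output buffers.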

[libav-devel] [PATCH 2/2] checkasm: add unit tests for bswapdsp

2015-07-15 Thread Henrik Gramner
diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c new file mode 100644 index 000..7b1566b --- /dev/null +++ b/tests/checkasm/bswapdsp.c @@ -0,0 +1,73 @@ +/* + * Copyright (c) 2015 Henrik Gramner + * + * This file is part of Libav. + * + * Libav is free software; you can

[libav-devel] [PATCH] checkasm: Add unit tests for h264qpel

2015-07-13 Thread Henrik Gramner
--- /dev/null +++ b/tests/checkasm/h264qpel.c @@ -0,0 +1,80 @@ +/* + * Copyright (c) 2015 Henrik Gramner + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software
