[libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-25 Thread Henrik Gramner
When allocating stack space with an alignment requirement that is larger than the current stack alignment we need to store a copy of the original stack pointer in order to be able to restore it later. If we chose to use another register for this purpose we should not pick eax/rax since it can be o

Re: [libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-26 Thread Henrik Gramner
On Mon, Dec 26, 2016 at 2:32 AM, Ronald S. Bultje wrote: > I know I'm terribly nitpicking here for the limited scope of the comment, > but this only matters for functions that have a return value. Do you think > it makes sense to allow functions to opt out of this requirement if they > explicitly

Re: [libav-devel] [FFmpeg-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-26 Thread Henrik Gramner
On Mon, Dec 26, 2016 at 2:52 PM, Ronald S. Bultje wrote: > Hm, OK, I think it affects unix64/x86-32 also when using 32-byte > alignment. We do use the stack pointer then. On 32-bit and UNIX64 it simply uses a different caller-saved register which doesn't require additional instructions. > I thi

Re: [libav-devel] [PATCH] mov: Avoid memcmp of uninitialised data

2017-01-29 Thread Henrik Gramner
On Sun, Jan 29, 2017 at 8:59 PM, Mark Thompson wrote: > strncmp Any particular reason for not just using plain strcmp()? ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel

Re: [libav-devel] [PATCH 4/4] x86: fft: Port to cpuflags

2017-03-14 Thread Henrik Gramner
On Fri, Mar 10, 2017 at 3:17 PM, Diego Biurrun wrote: > +%macro INTERL 5 > +%if cpuflag(avx) > +vunpckhps %3, %2, %1 > +vunpcklps %2, %2, %1 > +vextractf128 %4(%5), %2, 0 > +vextractf128 %4 %+ H(%5), %3, 0 > +vextractf128 %4(%5 + 1), %2, 1 > +vextractf128 %4

Re: [libav-devel] [RFC] checkasm: assembly testing and benchmarking tool

2015-07-03 Thread Henrik Gramner
sion from 2014-11-09. I might have some additional changes locally on top of that, but I'm not at home right now so I can't check it at this moment. /Henrik From 6efacab3472889ba6acb88d60d7ca7413512edb8 Mon Sep 17 00:00:00 2001 From: Henrik Gramner Date: Sun, 9 Nov 2014 15:43:40 +0100 Su

Re: [libav-devel] [RFC] checkasm: assembly testing and benchmarking tool

2015-07-06 Thread Henrik Gramner
Attaching updated version with all local changes merged. I'll support it and make changes if necessary if there's any interest in getting it committed this time. /Henrik From 14e611818921403230c70177fa36e201a9d4d6f6 Mon Sep 17 00:00:00 2001 From: Henrik Gramner Date: Mon, 6 Jul 201

Re: [libav-devel] [RFC] checkasm: assembly testing and benchmarking tool

2015-07-07 Thread Henrik Gramner
or you now? (no other changes compared to the previous version aside from Makefile stuff) /Henrik From b685f0a886d8e460735f1d2be9e68678fe7521f2 Mon Sep 17 00:00:00 2001 From: Henrik Gramner Date: Tue, 7 Jul 2015 13:51:49 +0200 Subject: [PATCH] Checkasm: assembly testing and benchmarking to

[libav-devel] [PATCH] x86/timer: serialize rdtsc

2015-07-08 Thread Henrik Gramner
Improves the accuracy of measurements, especially in short sections. To quote the Intel 64 and IA-32 Architectures Software Developer's Manual: "The RDTSC instruction is not a serializing instruction. It does not necessarily wait until all previous instructions have been executed before reading th

Re: [libav-devel] [RFC] checkasm: assembly testing and benchmarking tool

2015-07-11 Thread Henrik Gramner
Updated with some fairly minor changes (mainly using AV_READ_TIME()) after comments on IRC etc. /Henrik From a2b30e3e4dbab5912947a820c63fd32baf7b81ab Mon Sep 17 00:00:00 2001 From: Henrik Gramner Date: Sat, 11 Jul 2015 20:32:11 +0200 Subject: [PATCH] Checkasm: assembly testing and benchmarking

[libav-devel] [PATCH] checkasm: Add unit tests for h264qpel

2015-07-13 Thread Henrik Gramner
0..06bc6ad --- /dev/null +++ b/tests/checkasm/h264qpel.c @@ -0,0 +1,80 @@ +/* + * Copyright (c) 2015 Henrik Gramner + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by +

Re: [libav-devel] [PATCH] checkasm: Add unit tests for h264qpel

2015-07-13 Thread Henrik Gramner
Attaching a modified version that also verifies that the src buffers are identical after the calls. Probably overkill, but might as well check while we're at it. /Henrik From 259c3acc59581f13bb0a4b8debe62ea1c65d66f4 Mon Sep 17 00:00:00 2001 From: Henrik Gramner Date: Mon, 13 Jul 2015 23:

[libav-devel] [PATCH 2/2] checkasm: add unit tests for bswapdsp

2015-07-15 Thread Henrik Gramner
diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/bswapdsp.c new file mode 100644 index 000..7b1566b --- /dev/null +++ b/tests/checkasm/bswapdsp.c @@ -0,0 +1,73 @@ +/* + * Copyright (c) 2015 Henrik Gramner + * + * This file is part of Libav. + * + * Libav is free software; you can

[libav-devel] [PATCH 1/2] x86: bswapdsp: Don't treat 32-bit integers as 64-bit

2015-07-15 Thread Henrik Gramner
The upper halves are not guaranteed to be zero in x86-64. Also use `test` instead of `and` when the result isn't used for anything other than as a branch condition; this allows some register moves to be eliminated. --- libavcodec/x86/bswapdsp.asm | 23 ++- 1 file changed, 10 i
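
To illustrate the first point: on x86-64 a 32-bit argument sits in the low half of a 64-bit register, and nothing guarantees that the upper half is zero. The following is a minimal C sketch that models such a register with a plain `uint64_t`; the function names are hypothetical and are not taken from the patch.

```c
#include <stdint.h>

/* Models a 64-bit register carrying a 32-bit argument:
 * only the low 32 bits are meaningful, the rest may be garbage. */

/* Buggy: consumes the full register, garbage included. */
static uint64_t length_as_64bit(uint64_t reg)
{
    return reg;
}

/* Correct: keeps only the low 32 bits, which corresponds to
 * operating on the 32-bit register name in assembly. */
static uint64_t length_as_32bit(uint64_t reg)
{
    return (uint32_t)reg;
}
```

With a register value of `0xdeadbeef00000010`, the first function yields a huge bogus length while the second yields 16.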

Re: [libav-devel] [PATCH 2/4] checkasm: test all architectures with optimisations

2015-07-17 Thread Henrik Gramner
lgtm.

[libav-devel] [PATCH 1/2] checkasm: exit with status 0 instead of 1 if there are no tests to perform

2015-07-17 Thread Henrik Gramner
--- tests/checkasm/checkasm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 7b1ea8f..0aa3d1c 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -317,7 +317,7 @@ int main(int argc, char *argv[])

[libav-devel] [PATCH 2/2] tests/checkasm/checkasm: Give macro a body to avoid potential unexpected syntax issues

2015-07-17 Thread Henrik Gramner
From: Michael Niedermayer Signed-off-by: Michael Niedermayer --- tests/checkasm/checkasm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h index 1a46e9b..b54be16 100644 --- a/tests/checkasm/checkasm.h +++ b/tests/checkasm

Re: [libav-devel] [PATCH] cosmetics: Reformat checkasm tests

2015-07-17 Thread Henrik Gramner
On Fri, Jul 17, 2015 at 8:08 PM, Luca Barbato wrote: > -qpel_mc_func (*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : > h.put_h264_qpel_pixels_tab; > +qpel_mc_func(*tab)[16] = op ? h.avg_h264_qpel_pixels_tab : > h.put_h264_qpel_pixels_tab; No space between type and identifier? I d

Re: [libav-devel] [PATCH 1/1] checkasm: fix MSVC build by adding a zero initializer for an empty array

2015-07-20 Thread Henrik Gramner
On Mon, Jul 20, 2015 at 11:58 AM, Janne Grunau wrote: > --- > tests/checkasm/h264pred.c | 1 + > 1 file changed, 1 insertion(+) Shouldn't it be NULL instead of 0 since those are pointers? Otherwise OK.

Re: [libav-devel] [PATCH 1/1] checkasm: remove empty array initializer list in h264pred test

2015-07-20 Thread Henrik Gramner
On Mon, Jul 20, 2015 at 11:18 PM, Janne Grunau wrote: > Fixes MSVC compilation. > --- > tests/checkasm/h264pred.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) Ok.

Re: [libav-devel] [PATCH] [RFC] use a wrapper script to call MS link.exe to avoid mixing with /usr/bin/link.exe

2015-07-23 Thread Henrik Gramner
On Thu, Jul 23, 2015 at 9:04 AM, Martin Storsjö wrote: > Why is this suddenly using "command" instead of "which" now? This won't work > in a linux environment. Why wouldn't it work in a linux environment? `command` is POSIX. This stackoverflow post sums it up fairly well: https://stackoverflow.c

Re: [libav-devel] [PATCH] [RFC] use a wrapper script to call MS link.exe to avoid mixing with /usr/bin/link.exe

2015-07-23 Thread Henrik Gramner
On Thu, Jul 23, 2015 at 7:23 PM, Steve Lhomme wrote: > On Thu, Jul 23, 2015 at 7:02 PM, Derek Buitenhuis > wrote: >> Broken permissions. > > Not sure how I can tweak that under Windows. git update-index --chmod=+x

[libav-devel] [PATCH 2/2] checkasm: Use LOCAL_ALIGNED

2015-07-24 Thread Henrik Gramner
From: Michael Niedermayer Fixes alignment issues and bus errors. --- tests/checkasm/bswapdsp.c | 9 + tests/checkasm/h264pred.c | 5 +++-- tests/checkasm/h264qpel.c | 9 + 3 files changed, 13 insertions(+), 10 deletions(-) diff --git a/tests/checkasm/bswapdsp.c b/tests/checkasm/

[libav-devel] [PATCH 1/2] checkasm: Modify report format

2015-07-24 Thread Henrik Gramner
Makes it a bit more clear where each test belongs. Suggested by Anton Khirnov. --- tests/checkasm/checkasm.c | 57 +++ tests/checkasm/checkasm.h | 2 +- tests/checkasm/h264qpel.c | 2 +- 3 files changed, 30 insertions(+), 31 deletions(-) diff --git a

Re: [libav-devel] [PATCH] checkasm: Include io.h for isatty, if available

2015-07-29 Thread Henrik Gramner
On Wed, Jul 29, 2015 at 10:09 PM, Martin Storsjö wrote: > configure does check for isatty, and checkasm properly checks > HAVE_ISATTY, but on some platforms (e.g. WinRT), io.h needs to be > included for isatty to be available. Ok.

[libav-devel] [PATCH] x86: dcadsp: Avoid SSE2 instructions in SSE functions

2015-08-01 Thread Henrik Gramner
--- libavcodec/x86/dcadsp.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/x86/dcadsp.asm b/libavcodec/x86/dcadsp.asm index c42ee23..c99df12 100644 --- a/libavcodec/x86/dcadsp.asm +++ b/libavcodec/x86/dcadsp.asm @@ -148,7 +148,7 @@ DECODE_HF addps m4, v

[libav-devel] [PATCH] x86: dct: Disable dct32_float_sse on x86-64

2015-08-01 Thread Henrik Gramner
There is an SSE2 implementation so the SSE version is never used. The "SSE" version also happens to contain SSE2 instructions on x86-64. --- libavcodec/x86/dct32.asm | 3 +++ libavcodec/x86/dct_init.c | 2 ++ 2 files changed, 5 insertions(+) diff --git a/libavcodec/x86/dct32.asm b/libavcodec/x86

[libav-devel] [PATCH 1/8] x86inc: warn if XOP integer FMA instruction emulation is impossible

2015-08-01 Thread Henrik Gramner
From: Anton Mitrofanov Emulation requires a temporary register if arguments 1 and 4 are the same; this doesn't obey the semantics of the original instruction, so we can't emulate that in x86inc. Also add pmacsdql emulation. Signed-off-by: Henrik Gramner --- libavutil/x86/x86i

[libav-devel] [PATCH 2/8] x86inc: Support arbitrary stack alignments

2015-08-01 Thread Henrik Gramner
Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not. --- libavcodec/x86/h264_deblock.asm | 4 +-- libavutil/x86/x86inc.a
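
The "align first, then allocate" order can be sketched in C by treating the stack pointer as an integer. This is only an illustration under stated assumptions, not the macro's actual code: `StackFrame` and `alloc_aligned` are hypothetical names, `align` is assumed to be a power of two, and the stack is assumed to grow downwards.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t saved_sp; /* copy of the original stack pointer, restored on exit */
    uintptr_t sp;       /* new stack pointer after aligning and allocating */
} StackFrame;

static StackFrame alloc_aligned(uintptr_t sp, size_t size, uintptr_t align)
{
    StackFrame f;

    f.saved_sp = sp;                         /* keep the original for the epilogue */
    sp &= ~(align - 1);                      /* 1. align (round down) */
    sp -= (size + align - 1) & ~(align - 1); /* 2. allocate a padded size */
    f.sp = sp;
    return f;
}
```

Because the allocation size is padded to a multiple of the alignment, the pointer stays aligned after step 2, and the saved copy is what makes it possible to undo the rounding later.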

[libav-devel] [PATCH 0/8] x86inc: Sync changes from x264

2015-08-01 Thread Henrik Gramner
rrent cpuflags are used Christophe Gisquet (1): x86inc: Fix instantiation of YMM registers Henrik Gramner (5): x86inc: Support arbitrary stack alignments x86inc: Disable vpbroadcastq workaround in newer yasm versions x86inc: Drop SECTION_TEXT macro x86inc: nasm support x86inc: Va

[libav-devel] [PATCH 7/8] x86inc: nasm support

2015-08-01 Thread Henrik Gramner
--- configure| 3 --- libavutil/x86/x86inc.asm | 42 +- 2 files changed, 29 insertions(+), 16 deletions(-) diff --git a/configure b/configure index 482be43..79dd3a5 100755 --- a/configure +++ b/configure @@ -1353,7 +1353,6 @@ ARCH_EXT_LIST_

[libav-devel] [PATCH 4/8] x86inc: Fix instantiation of YMM registers

2015-08-01 Thread Henrik Gramner
From: Christophe Gisquet Signed-off-by: Henrik Gramner --- libavutil/x86/x86inc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 96ebe37..2844fdf 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86

[libav-devel] [PATCH 6/8] x86inc: Drop SECTION_TEXT macro

2015-08-01 Thread Henrik Gramner
The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`. --- libavcodec/x86/apedsp.asm | 2 +- libavcodec/x86/audiodsp.asm | 2 +- libavcodec/x86/bswapdsp.asm | 2 +- libavcodec/x86/d

[libav-devel] [PATCH 5/8] x86inc: Disable vpbroadcastq workaround in newer yasm versions

2015-08-01 Thread Henrik Gramner
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions. --- libavutil/x86/x86inc.asm | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 2844fdf..d4ce68f 100644 --- a/libavutil/x

[libav-devel] [PATCH 3/8] x86inc: warn when instructions incompatible with current cpuflags are used

2015-08-01 Thread Henrik Gramner
From: Anton Mitrofanov Signed-off-by: Henrik Gramner --- libavutil/x86/x86inc.asm | 587 --- 1 file changed, 299 insertions(+), 288 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index ae6813a..96ebe37 100644 --- a

[libav-devel] [PATCH 8/8] x86inc: Various minor backports from x264

2015-08-01 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index d70a5f9..0e2f447 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1,7 +1,7 @@ ;

Re: [libav-devel] [PATCH] x86: dct: Disable dct32_float_sse on x86-64

2015-08-01 Thread Henrik Gramner
On Sat, Aug 1, 2015 at 8:28 PM, Anton Khirnov wrote: >Any specific reason you use ARCH_X86_64 in one file and >ARCH_X86_32 in the other? I missed that there's a define for ARCH_X86_32 in asm (some other code used %if ARCH_X86_64 == 0 so I assumed it didn't). Using ARCH_X86_32 in both places is o

Re: [libav-devel] [PATCH] x86: dcadsp: Avoid SSE2 instructions in SSE functions

2015-08-01 Thread Henrik Gramner
On Sat, Aug 1, 2015 at 8:49 PM, James Almer wrote: > I however think movq/sd should be used here for sse2 and above instead of > movlps. That's a moot point in this case since the code in question is SSE only (and even if it wasn't I'm skeptical of the claim that it would be measurably slower tha

Re: [libav-devel] [PATCH 7/8] x86inc: nasm support

2015-08-02 Thread Henrik Gramner
On Sat, Aug 1, 2015 at 5:27 PM, Henrik Gramner wrote: > --- > configure| 3 --- > libavutil/x86/x86inc.asm | 42 +- > 2 files changed, 29 insertions(+), 16 deletions(-) Skip this one for now, nasm seems to have a bug wit

Re: [libav-devel] [PATCH 8/8] x86inc: Various minor backports from x264

2015-08-02 Thread Henrik Gramner
On Sat, Aug 1, 2015 at 9:34 PM, James Almer wrote: > The same could be done in av_parse_cpu_flags(). > It doesn't affect this patch, and can be done separately. Just throwing > the idea out there. Yeah, I guess. > What about bmi/bmi2, for that matter? What about them?

[libav-devel] [PATCH] checkasm: Remove unnecessary include

2015-08-05 Thread Henrik Gramner
--- tests/checkasm/checkasm.c | 4 1 file changed, 4 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 82c635e..b564e7e 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -33,10 +33,6 @@ #include #endif -#if ARCH_X86 -#include "

[libav-devel] [PATCH] x86inc: Various minor backports from x264

2015-08-11 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 32 +--- 1 file changed, 21 insertions(+), 11 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index a519fd5..6ad9785 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -1,7 +1,7 @@ ;

[libav-devel] [PATCH] checkasm: Explicitly declare function prototypes

2015-08-16 Thread Henrik Gramner
Now we no longer have to rely on function pointers intentionally declared without specified argument types. This makes it easier to support functions with floating point parameters or return values as well as functions returning 64-bit values on 32-bit architectures. It also avoids having to expli
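
A small C sketch of why explicit prototypes help: when a call goes through a pointer declared without argument types, a `float` argument is promoted to `double`, so a function that really takes a `float` cannot be called correctly that way. The macro and function names below are hypothetical stand-ins, not the ones used by the patch.

```c
#include <stdint.h>

/* Generic pointer type used only for storage; never called directly. */
typedef void (*generic_fn)(void);

/* Hypothetical helper: names the exact prototype of the function under test. */
#define DECLARE_FUNC(ret, ...) typedef ret (*func_type)(__VA_ARGS__)

/* Example function with a float parameter and a 64-bit return value. */
static int64_t scale(float gain, int32_t x)
{
    return (int64_t)(gain * (float)x);
}

static int64_t call_scale(generic_fn fn, float gain, int32_t x)
{
    DECLARE_FUNC(int64_t, float, int32_t);

    /* Cast back to the real prototype before calling, so that the
     * arguments and the return value are passed as the callee expects. */
    return ((func_type)fn)(gain, x);
}
```

For example, `call_scale((generic_fn)scale, 2.5f, 4)` evaluates to 10.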

[libav-devel] [PATCH] checkasm: x86: properly save rdx/edx in checked_call()

2015-08-16 Thread Henrik Gramner
If the return value doesn't fit in a single register rdx/edx can in some cases be used in addition to rax/eax. Doesn't affect any of the existing checkasm tests but might be useful later. Also comment the relevant code a bit better. --- tests/checkasm/x86/checkasm.asm | 7 +++ 1 file changed

[libav-devel] [PATCH v2] checkasm: Explicitly declare function prototypes

2015-08-20 Thread Henrik Gramner
Now we no longer have to rely on function pointers intentionally declared without specified argument types. This makes it easier to support functions with floating point parameters or return values as well as functions returning 64-bit values on 32-bit architectures. It also avoids having to expli

Re: [libav-devel] [PATCH 7/8] checkasm: add HEVC MC tests

2015-08-20 Thread Henrik Gramner
On Wed, Aug 19, 2015 at 9:43 PM, Anton Khirnov wrote: > +const int srcstride = FFALIGN(width, 16) * sizeof(*src0); > +const int dststride = FFALIGN(width, 16) * PIXEL_SIZE(bit_depth); Strides, and any other pointer-sized value, should be ptrdiff_t - or, preferably, review/push my che
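
As a sketch of the `ptrdiff_t` recommendation: a stride is a difference between two row pointers, so it can be negative (bottom-up images) and should have pointer width. The helper below is hypothetical and only illustrates the type choice.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies height rows of width bytes. Strides are signed and pointer-sized,
 * so a negative stride walks the source from the last row to the first. */
static void copy_rows(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *src, ptrdiff_t src_stride,
                      int width, int height)
{
    for (int y = 0; y < height; y++) {
        const uint8_t *s = src + y * src_stride;
        uint8_t *d       = dst + y * dst_stride;

        for (int x = 0; x < width; x++)
            d[x] = s[x];
    }
}
```

Passing a pointer to the last source row together with a negative `src_stride` flips the image vertically while copying.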

Re: [libav-devel] [PATCH] checkasm: add HEVC MC tests

2015-08-22 Thread Henrik Gramner
Minor nits: > +#define randomize_buffers(buf, size, depth) s/buffers/buffer/ since you're only randomizing a single one at a time. > +static const char *interp_names[2][2] = { { "pixels", "h" }, { "v", "hv" } }; const char * const Otherwise lgtm.
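
The `const char * const` nit, written out as a sketch: the second `const` makes the pointers in the table read-only as well, not just the strings they point to. The accessor function and the reading of the two indices as (vertical, horizontal) are assumptions made for illustration.

```c
/* Both the strings and the pointers stored in the table are read-only. */
static const char *const interp_names[2][2] = { { "pixels", "h" }, { "v", "hv" } };

/* Hypothetical accessor; v and h are 0 or 1. */
static const char *interp_name(int v, int h)
{
    return interp_names[v][h];
}
```

With this declaration an accidental assignment such as `interp_names[0][0] = "x";` is rejected at compile time.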

Re: [libav-devel] [PATCH] hevcdsp: add x86 SIMD for MC

2015-08-23 Thread Henrik Gramner
On Sun, Aug 23, 2015 at 8:27 PM, Anton Khirnov wrote: > Quoting James Almer (2015-08-22 23:58:41) >> You need to use the d suffix >> instead of q on the register names to make sure the high bits are cleared. > > Eh? Perhaps I'm misunderstanding something, but I'd expect that using d > here would do

[libav-devel] [PATCH] checkasm: Fix floating point arguments on 64-bit Windows

2015-08-24 Thread Henrik Gramner
--- tests/checkasm/x86/checkasm.asm | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/tests/checkasm/x86/checkasm.asm b/tests/checkasm/x86/checkasm.asm index 4948fc9..828352c 100644 --- a/tests/checkasm/x86/checkasm.asm +++ b/tests/checkasm/x86/checkasm.asm @@ -103,16

[libav-devel] [PATCH] checkasm: add unit tests for v210enc

2015-09-05 Thread Henrik Gramner
null +++ b/tests/checkasm/v210enc.c @@ -0,0 +1,94 @@ +/* + * Copyright (c) 2015 Henrik Gramner + * + * This file is part of Libav. + * + * Libav is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundati

[libav-devel] [PATCH] checkasm: v210: Fix array overwrite

2015-09-16 Thread Henrik Gramner
--- tests/checkasm/v210enc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/checkasm/v210enc.c b/tests/checkasm/v210enc.c index cdb8e76..4f5f6ba 100644 --- a/tests/checkasm/v210enc.c +++ b/tests/checkasm/v210enc.c @@ -43,7 +43,7 @@ AV_WN32A(v0 + i, r);

Re: [libav-devel] [PATCH] tiny_psnr: Use the correct abs() version

2015-09-22 Thread Henrik Gramner
On Tue, Sep 22, 2015 at 9:28 PM, Vittorio Giovara wrote: > I am puzzled as well, msdn reports this function available only from > vs2013, but there is a vs2012 fate instance which seems to compile > fine with it. That wouldn't exactly be the first incorrect thing in the MSDN documentation though.

[libav-devel] [PATCH] checkasm/x86: Correctly handle variadic functions

2015-09-23 Thread Henrik Gramner
The System V ABI on x86-64 specifies that the al register contains an upper bound of the number of arguments passed in vector registers when calling variadic functions, so we aren't allowed to clobber it. checkasm_fail_func() is a variadic function so also zero al before calling it. --- tests/che

[libav-devel] [PATCH] checkasm: Use a self-balancing tree

2015-09-25 Thread Henrik Gramner
Tested functions are internally kept in a binary search tree for efficient lookups. The downside of the current implementation is that the tree quickly becomes unbalanced which causes an unnecessary number of comparisons between nodes. Improve this by changing the tree into a self-balancing left-l
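
For readers unfamiliar with the technique, here is a minimal sketch of insertion into a left-leaning red-black tree keyed by name. It follows the textbook formulation of the data structure and is not the code from the patch; the type and function names are hypothetical, and names longer than 31 characters are simply truncated.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Node {
    struct Node *child[2]; /* [0] = left, [1] = right */
    int red;               /* 1 if the link from the parent is red */
    char name[32];
} Node;

static int is_red(const Node *n)
{
    return n && n->red;
}

/* Rotate the subtree so that n->child[!dir] becomes its root.
 * dir = 0 is a left rotation, dir = 1 a right rotation. */
static Node *rotate(Node *n, int dir)
{
    Node *r = n->child[!dir];
    n->child[!dir] = r->child[dir];
    r->child[dir]  = n;
    r->red = n->red;
    n->red = 1;
    return r;
}

static Node *insert_rec(Node *n, const char *name)
{
    int cmp;

    if (!n) {
        n = calloc(1, sizeof(*n));
        if (!n)
            abort();
        n->red = 1; /* new nodes are attached with a red link */
        strncpy(n->name, name, sizeof(n->name) - 1);
        return n;
    }

    cmp = strcmp(name, n->name);
    if (cmp)
        n->child[cmp > 0] = insert_rec(n->child[cmp > 0], name);

    /* Restore the left-leaning invariants on the way back up. */
    if (is_red(n->child[1]) && !is_red(n->child[0]))
        n = rotate(n, 0); /* right-leaning red link */
    if (is_red(n->child[0]) && is_red(n->child[0]->child[0]))
        n = rotate(n, 1); /* two red links in a row */
    if (is_red(n->child[0]) && is_red(n->child[1])) {
        n->red = 1;       /* split a temporary 4-node */
        n->child[0]->red = n->child[1]->red = 0;
    }
    return n;
}

static Node *tree_insert(Node *root, const char *name)
{
    root = insert_rec(root, name);
    root->red = 0; /* the root is always black */
    return root;
}

static const Node *tree_find(const Node *n, const char *name)
{
    while (n) {
        int cmp = strcmp(name, n->name);
        if (!cmp)
            return n;
        n = n->child[cmp > 0];
    }
    return NULL;
}

static int tree_height(const Node *n)
{
    int l, r;

    if (!n)
        return 0;
    l = tree_height(n->child[0]);
    r = tree_height(n->child[1]);
    return 1 + (l > r ? l : r);
}
```

Inserting names in sorted order is the worst case for a plain binary search tree, which degenerates into a linked list. With the rebalancing steps above the height stays logarithmic in the number of nodes.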

[libav-devel] [PATCH] avutil/avstring: Inline some tiny functions

2015-09-26 Thread Henrik Gramner
They're short enough that inlining them actually reduces code size due to all the overhead associated with making a function call. --- libavutil/avstring.c | 22 -- libavutil/avstring.h | 22 ++ 2 files changed, 18 insertions(+), 26 deletions(-) diff --git
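
A sketch of what such header-level helpers can look like. The names are illustrative and ASCII is assumed; the point is only that each body is a comparison or two, which is smaller than the code needed to set up and perform a call.

```c
/* Hypothetical header-style helpers; each compiles to a few instructions. */
static inline int is_digit(int c)
{
    return c >= '0' && c <= '9';
}

static inline int to_lower(int c)
{
    /* In ASCII, upper and lower case letters differ only in bit 0x20. */
    return (c >= 'A' && c <= 'Z') ? c ^ 0x20 : c;
}
```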

[libav-devel] [PATCH] checkasm: Fix the function name sorting algorithm

2015-09-28 Thread Henrik Gramner
The previous implementation was behaving incorrectly in some corner cases. --- tests/checkasm/checkasm.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 013e197..9219a83 100644 --- a/tests/checkasm/checkasm.c +

Re: [libav-devel] [PATCH] avutil/avstring: Inline some tiny functions

2015-09-28 Thread Henrik Gramner
On Mon, Sep 28, 2015 at 9:49 AM, Anton Khirnov wrote: > But does it actually improve performance measurably? I'd argue that > those functions are used in places where it doesn't really matter. I was using some perf tools through checkasm when I noticed an awful lot of time was spent calling av_is

[libav-devel] [PATCH] checkasm: Fix compilation with --disable-avcodec

2015-10-04 Thread Henrik Gramner
--- tests/checkasm/checkasm.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 9219a83..3ed78b6 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -57,17 +57,19 @@ stati

Re: [libav-devel] [PATCH] checkasm: Fix compilation with --disable-avcodec

2015-10-04 Thread Henrik Gramner
On Sun, Oct 4, 2015 at 8:39 PM, Luca Barbato wrote: > Alternatively we might make sure if avcodec is disabled all its > components are as well. > > might simplify a lot the code... Yes, that's indeed a solid approach as well. Who's volunteering for that though? I don't really know much about the

[libav-devel] [PATCH] x86: h264_intrapred: Don't treat 32-bit integers as 64-bit

2014-10-01 Thread Henrik Gramner
The upper halves are not guaranteed to be zero in x86-64. --- libavcodec/x86/h264_intrapred.asm | 98 +++ 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/libavcodec/x86/h264_intrapred.asm b/libavcodec/x86/h264_intrapred.asm index 4a4fa10..df657a

[libav-devel] [RFC] checkasm: assembly testing and benchmarking tool

2014-10-10 Thread Henrik Gramner
ff0fd4437d83 Mon Sep 17 00:00:00 2001 From: Henrik Gramner Date: Fri, 10 Oct 2014 21:53:38 +0200 Subject: [PATCH] Checkasm: assembly testing and benchmarking tool It provides the following features: * verify correctness by comparing the output to the C version. * detect failure to save and rest
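
The first listed feature, comparing an optimized function against the C version, can be sketched as follows. This is not checkasm's code: the byte-swap functions, the buffer size and `check_against_ref` are hypothetical, and a second C formulation stands in for the assembly implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BUF_LEN 64

typedef void (*bswap_fn)(uint32_t *dst, const uint32_t *src, int len);

/* Reference: straightforward byte swap of each 32-bit word. */
static void bswap_ref(uint32_t *dst, const uint32_t *src, int len)
{
    for (int i = 0; i < len; i++) {
        uint32_t v = src[i];
        dst[i] = (v >> 24) | ((v >> 8) & 0xff00) | ((v << 8) & 0xff0000) | (v << 24);
    }
}

/* Candidate: same result via a different formulation. */
static void bswap_opt(uint32_t *dst, const uint32_t *src, int len)
{
    for (int i = 0; i < len; i++) {
        uint32_t v = src[i];
        v = ((v & 0x00ff00ffu) << 8) | ((v >> 8) & 0x00ff00ffu);
        dst[i] = (v << 16) | (v >> 16);
    }
}

/* Deliberately broken candidate: swaps the 16-bit halves but
 * forgets to swap the bytes inside each half. */
static void bswap_broken(uint32_t *dst, const uint32_t *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = (src[i] << 16) | (src[i] >> 16);
}

/* Returns 1 if both functions produce identical output on random input. */
static int check_against_ref(bswap_fn ref, bswap_fn opt, unsigned seed)
{
    uint32_t src[BUF_LEN], out_ref[BUF_LEN], out_opt[BUF_LEN];

    srand(seed);
    for (int i = 0; i < BUF_LEN; i++)
        src[i] = ((uint32_t)rand() << 16) ^ (uint32_t)rand();

    memset(out_ref, 0, sizeof(out_ref));
    memset(out_opt, 0, sizeof(out_opt));
    ref(out_ref, src, BUF_LEN);
    opt(out_opt, src, BUF_LEN);

    return !memcmp(out_ref, out_opt, sizeof(out_ref));
}
```

`check_against_ref(bswap_ref, bswap_opt, seed)` should report a match, while passing `bswap_broken` as the candidate should report a mismatch for any realistic random input.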

Re: [libav-devel] [RFC] checkasm: assembly testing and benchmarking tool

2014-10-10 Thread Henrik Gramner
On Fri, Oct 10, 2014 at 10:18 PM, Luca Barbato wrote: > Given that probably all the init functions do not have a cpuflag parameter > wouldn't it be better to override the cpu state with av_set_cpu_flags_mask and not > go over the codebase to add the parameter? Oh, I totally missed the existence of that

[libav-devel] [PATCH] Mark some character handling functions av_const

2014-11-04 Thread Henrik Gramner
--- libavutil/avstring.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libavutil/avstring.h b/libavutil/avstring.h index 9a18ddd..7c30ee1 100644 --- a/libavutil/avstring.h +++ b/libavutil/avstring.h @@ -154,22 +154,22 @@ char *av_get_token(const char **buf, const
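
As background for this kind of annotation: GCC and Clang provide `__attribute__((const))` for functions whose result depends only on their arguments, which lets the compiler merge or hoist repeated calls. The sketch below uses a hypothetical `FN_CONST` macro and helper name rather than the project's own definitions.

```c
#if defined(__GNUC__)
#    define FN_CONST __attribute__((const))
#else
#    define FN_CONST
#endif

/* Result depends only on c: no memory reads, no side effects. */
static inline FN_CONST int is_space(int c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}
```

The range `'\t'` to `'\r'` covers tab, newline, vertical tab, form feed and carriage return in ASCII.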

[libav-devel] [PATCH] checkasm: assembly testing and benchmarking tool

2014-11-09 Thread Henrik Gramner
n x86-64 (the upper halves are not guaranteed to be zero - but in practice they very often are, which makes those bugs hard to spot otherwise). * easy benchmarking. From 6efacab3472889ba6acb88d60d7ca7413512edb8 Mon Sep 17 00:00:00 2001 From: Henrik Gramner Date: Sun, 9 Nov 2014 15:43:40 +0100 Su

Re: [libav-devel] [PATCH] vf_interlace: Add SIMD for lowpass filter

2014-11-10 Thread Henrik Gramner
On Mon, Nov 10, 2014 at 10:42 PM, Kieran Kunhya wrote: > +pw_1: times 8 dw 1 cextern pw_1 > +psllw m0, 1 > +psllw m1, 1 paddw mN, mN
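
The second suggestion replaces a left shift by one with an addition of the register to itself. The equivalence for a single 16-bit lane can be checked with a short C sketch; the function names are hypothetical, and the snippet itself does not state the reason for preferring the add.

```c
#include <stdint.h>

/* One 16-bit lane of a left shift by 1 (wraps modulo 2^16). */
static uint16_t shift_left_1(uint16_t x)
{
    return (uint16_t)(x << 1);
}

/* One 16-bit lane of adding a value to itself (wraps modulo 2^16). */
static uint16_t add_to_self(uint16_t x)
{
    return (uint16_t)(x + x);
}
```

Both functions agree for every possible 16-bit input, including those where the top bit is shifted out.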

Re: [libav-devel] [PATCH] checkasm: assembly testing and benchmarking tool

2014-11-14 Thread Henrik Gramner
Ping. On Sun, Nov 9, 2014 at 4:35 PM, Henrik Gramner wrote: > Libav currently doesn't have any good unit tests for assembly code > which makes it difficult to write new assembly functions and/or > improve the existing ones. > > x264 have the checkasm tool which does the job

Re: [libav-devel] [PATCH] checkasm: assembly testing and benchmarking tool

2014-11-16 Thread Henrik Gramner
On Sat, Nov 15, 2014 at 12:33 AM, Luca Barbato wrote: > On 09/11/14 16:35, Henrik Gramner wrote: >> Libav currently doesn't have any good unit tests for assembly code >> which makes it difficult to write new assembly functions and/or >> improve the existing ones. >>

Re: [libav-devel] [PATCH] vf_interlace: x86: improve asm performance

2014-11-24 Thread Henrik Gramner
> +mova m2, [r2+r1] > +mova m3, [r2+r1+mmsize] > +pxor m2, m6 > +pxor m3, m6 pxor m2, m6, [r2+r1] pxor m3, m6, [r2+r1+mmsize] Avoids two moves in AVX, otherwise LGTM.

Re: [libav-devel] [PATCH 1/2] configure: Support msys2 out of box

2015-11-21 Thread Henrik Gramner
On Sat, Nov 21, 2015 at 7:53 AM, Hendrik Leppkes wrote: > msys2 provides various .sh scripts to setup the environment, one for > msys2 building, and one for mingw32/64 respectively. > You need to launch it using the appropriate shell script, but just > running sh.exe. > > - Hendrik Which is not r

Re: [libav-devel] [PATCH 1/1] x86: checkasm: check for or handle missing cleanup after MMX instructions

2015-12-22 Thread Henrik Gramner
On Fri, Dec 11, 2015 at 6:40 PM, Janne Grunau wrote: > +#define declare_new_emms(cpu_flags, ret, ...) \ > +ret (*checked_call)(void *, int, int, int, int, int, __VA_ARGS__) = \ > +((cpu_flags) & av_get_cpu_flags()) ? (void > *)checkasm_checked_call_emms : \ > +

Re: [libav-devel] [libav-commits] checkasm: add fmtconvert tests

2015-12-22 Thread Henrik Gramner
On Tue, Dec 22, 2015 at 5:41 PM, Janne Grunau wrote: > I found an HTML copy from 1999 of Intel's manual(1) which says that > cvtpi2ps with a memory location as source doesn't cause a transition to > MMX state. The current documentation for cvtpi2pd (packed int to packed > double conversion) says the

Re: [libav-devel] [libav-commits] checkasm: add fmtconvert tests

2015-12-23 Thread Henrik Gramner
On Tue, Dec 22, 2015 at 10:44 PM, Janne Grunau wrote: >> Intel's current documentation is very clear on cvtpi2ps: "This >> instruction causes a transition from x87 FPU to MMX technology >> operation". > > every tested silicon (nothing ancient or SSE only though) and the copy > of the manual from 1

Re: [libav-devel] [PATCH 1/2] x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly

2015-12-25 Thread Henrik Gramner
On Tue, Dec 22, 2015 at 10:59 PM, Janne Grunau wrote: > This reverts commit 5dfe4edad63971d669ae456b0bc40ef9364cca80. > --- > libavcodec/x86/fmtconvert.asm | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) Ok.

Re: [libav-devel] [PATCH 2/2] checkasm: x86: post commit review fixes

2015-12-25 Thread Henrik Gramner
On Tue, Dec 22, 2015 at 10:59 PM, Janne Grunau wrote: > Check the full FPU tag word instead of only the upper half and simplify > the comparison. It previously only checked the lower half, not the upper. > Use upper-case function base name as macro name to instantiate both > checked_call variant

Re: [libav-devel] [PATCH 1/1] x86: use emms after ff_int32_to_float_fmul_scalar_sse

2015-12-29 Thread Henrik Gramner
On Tue, Dec 29, 2015 at 12:32 PM, Janne Grunau wrote: > Intel's Instruction Set Reference (as of September 2015) clearly states > that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the > source is a memory location. The Instruction Set Reference from 1999 > (Order Number 243191) des

Re: [libav-devel] [PATCH v2 1/1] x86: use emms after ff_int32_to_float_fmul_scalar_sse

2015-12-30 Thread Henrik Gramner
On Wed, Dec 30, 2015 at 1:43 PM, Janne Grunau wrote: > libavcodec/x86/fmtconvert.asm | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) Ok.

[libav-devel] [PATCH 1/8] x86inc: Make cpuflag() and notcpuflag() return 0 or 1

2016-01-17 Thread Henrik Gramner
Makes it possible to use them in arithmetic expressions. --- libavutil/x86/x86inc.asm | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 6ad9785..afcd6b8 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc
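
A C analogue of why a strict 0-or-1 result matters: a raw mask test is nonzero when the flag is set, but its value depends on the bit position, so it cannot be used directly in arithmetic. The flag values and function names below are hypothetical and only mirror the idea, not the macro definitions in x86inc.

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
enum { FLAG_SSE2 = 1 << 3, FLAG_AVX = 1 << 7 };

/* Raw mask test: nonzero when set, but not necessarily 1. */
static uint32_t flag_raw(uint32_t cpuflags, uint32_t flag)
{
    return cpuflags & flag;
}

/* Normalized test: exactly 0 or 1. */
static int flag_bool(uint32_t cpuflags, uint32_t flag)
{
    return (cpuflags & flag) == flag;
}

/* Arithmetic use: 16 bytes per vector, doubled when the AVX flag is set. */
static int vector_bytes(uint32_t cpuflags)
{
    return 16 << flag_bool(cpuflags, FLAG_AVX);
}
```

Using `flag_raw` in the shift instead would shift by 128 rather than by 1, which is why the normalized form is the one that composes with arithmetic.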

[libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space

2016-01-17 Thread Henrik Gramner
When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. --- libavutil/x86/x86inc.asm | 6 -- 1 file changed, 4 insertions(+), 2 deletio

[libav-devel] [PATCH 5/8] x86inc: Use more consistent indentation

2016-01-17 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 134 +++ 1 file changed, 67 insertions(+), 67 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index c355ee7..de20e76 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -18

[libav-devel] [PATCH 6/8] x86inc: Simplify AUTO_REP_RET

2016-01-17 Thread Henrik Gramner
cpuflags is never undefined any more, it's set to 0 instead. Also fix an incorrect comment. --- libavutil/x86/x86inc.asm | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index de20e76..05a5790 100644 --- a/libavutil/x86/

[libav-devel] [PATCH 0/8] x86inc: Sync changes from x264

2016-01-17 Thread Henrik Gramner
The following patches were recently pushed to x264. Geza Lore (1): x86inc: Add debug symbols indicating sizes of compiled functions Henrik Gramner (7): x86inc: Make cpuflag() and notcpuflag() return 0 or 1 x86inc: Be more verbose in assertion failures x86inc: Improve FMA instruction

[libav-devel] [PATCH 7/8] x86inc: Avoid creating unnecessary local labels

2016-01-17 Thread Henrik Gramner
The REP_RET workaround is only needed on old AMD cpus, and the labels clutter up the symbol table and confuse debugging/profiling tools, so use EQU to create SHN_ABS symbols instead of creating local labels. Furthermore, skip the workaround completely in functions that definitely won't run on such

[libav-devel] [PATCH 8/8] x86inc: Add debug symbols indicating sizes of compiled functions

2016-01-17 Thread Henrik Gramner
From: Geza Lore Some debuggers/profilers use this metadata to determine which function a given instruction is in; without it they can get confused by local labels (if you haven't stripped those). On the other hand, some tools are still confused even with this metadata. e.g. this fixes `gdb`, but

[libav-devel] [PATCH 2/8] x86inc: Be more verbose in assertion failures

2016-01-17 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index afcd6b8..dabb6cc 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -295,7 +295,7 @@ DECLARE_REG_TMP_SIZE 0,1,2,3,4,5,6,7,

[libav-devel] [PATCH 3/8] x86inc: Improve FMA instruction handling

2016-01-17 Thread Henrik Gramner
* Correctly handle FMA instructions with memory operands. * Print a warning if FMA instructions are used without the correct cpuflag. * Simplify the instantiation code. * Clarify documentation. Only the last operand in FMA3 instructions can be a memory operand. When converting FMA4 instruction

Re: [libav-devel] [PATCH 4/8] x86inc: Preserve arguments when allocating stack space

2016-01-18 Thread Henrik Gramner
On Mon, Jan 18, 2016 at 2:35 PM, Ronald S. Bultje wrote: > On Sun, Jan 17, 2016 at 6:21 PM, Henrik Gramner wrote: >> @@ -386,8 +386,10 @@ DECLARE_REG_TMP_SIZE >> 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14 >> %if %1 != 0 && required_stack_alignment > STACK_ALIG

[libav-devel] [PATCH v2] x86inc: Preserve arguments when allocating stack space

2016-01-20 Thread Henrik Gramner
When allocating stack space with a larger alignment than the known stack alignment a temporary register is used for storing the stack pointer. Ensure that this isn't one of the registers used for passing arguments. --- libavutil/x86/x86inc.asm | 7 +-- 1 file changed, 5 insertions(+), 2 deleti
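To illustrate the idea in the message above, here is a minimal model in plain C rather than x86inc macro code. It assumes the aligned allocation is done by subtracting the size and masking the pointer down to a power-of-two boundary; the struct and function names (`stack_model`, `alloc_aligned`) are hypothetical and do not appear in the patch. The `saved_sp` field stands in for the temporary register that holds the original stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: "sp" is the new, aligned stack pointer and
 * "saved_sp" is the copy of the original value that real code would
 * keep in a temporary register so it can be restored on return. */
typedef struct {
    uintptr_t sp;
    uintptr_t saved_sp;
} stack_model;

static stack_model alloc_aligned(uintptr_t sp, uintptr_t size, uintptr_t align)
{
    stack_model s;

    /* The mask trick below only works for power-of-two alignments. */
    assert(align != 0 && (align & (align - 1)) == 0);

    s.saved_sp = sp;                    /* remember the original pointer */
    s.sp = (sp - size) & ~(align - 1);  /* the stack grows down: round down */
    return s;
}
```

Because masking discards the low bits, the original pointer cannot be recomputed from the aligned one, which is why a copy has to live somewhere. If that copy were placed in a register that carries a function argument, the argument would be lost.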

[libav-devel] [PATCH] msvc: Fix libx264 linking

2016-01-28 Thread Henrik Gramner
--- configure | 1 + 1 file changed, 1 insertion(+) diff --git a/configure b/configure index c5bcb78..0bf29c2 100755 --- a/configure +++ b/configure @@ -2951,6 +2951,7 @@ msvc_common_flags(){ -lz) echo zlib.lib ;; -lavifil32) echo vfw32.lib ;;

Re: [libav-devel] [PATCH] h264: Parse only the x264 info unregisterd sei

2016-02-04 Thread Henrik Gramner
On Wed, Jul 29, 2015 at 10:51 PM, Luca Barbato wrote: > And restrict the string to ascii text. Restricting to printable characters would be even better. ___ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-de

Re: [libav-devel] [PATCH] h264: Use isprint to sanitize the SEI debug message

2016-02-06 Thread Henrik Gramner
On Sat, Feb 6, 2016 at 1:03 PM, Luca Barbato wrote: > +if (isprint(val)) Shouldn't we use a locale-independent version similar to the other functions in libavutil/avstring.h?

Re: [libav-devel] [PATCH] h264: Use isprint to sanitize the SEI debug message

2016-02-06 Thread Henrik Gramner
On Sat, Feb 6, 2016 at 7:34 PM, Luca Barbato wrote: > Given how this function is used it is not really important, its purpose > is to not break the terminal printing garbage. That's true I guess. > Do you have time to get me a function that is locale independent? static inline av_const int av_isp
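The function proposed in the reply is cut off in this listing, so the following is only a sketch of what a locale-independent printable-character check can look like. The name `ascii_isprint` is hypothetical and is not claimed to match the code in the thread. It assumes "printable" means the ASCII range from space through tilde.

```c
/* Hypothetical helper: locale-independent check for printable ASCII.
 * Unlike isprint() from <ctype.h>, the result does not depend on the
 * current locale, and any int argument is safe to pass. */
static inline int ascii_isprint(int c)
{
    /* Printable ASCII spans 0x20 (space) through 0x7E ('~'). */
    return c >= 0x20 && c <= 0x7E;
}
```

A range comparison like this treats every byte above 0x7E as non-printable, which suits the stated purpose of keeping garbage out of terminal output.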

[libav-devel] [PATCH 0/4] x86inc: Sync changes from x264

2016-04-20 Thread Henrik Gramner
Anton Mitrofanov (3): x86inc: Fix AVX emulation of some instructions x86inc: Improve handling of %ifid with multi-token parameters x86inc: Enable AVX emulation in additional cases Henrik Gramner (1): x86inc: Fix AVX emulation of scalar float instructions libavutil/x86/x86inc.asm | 95

[libav-devel] [PATCH 2/4] x86inc: Fix AVX emulation of some instructions

2016-04-20 Thread Henrik Gramner
From: Anton Mitrofanov --- libavutil/x86/x86inc.asm | 44 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 10352fc..60aad23 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/

[libav-devel] [PATCH 3/4] x86inc: Improve handling of %ifid with multi-token parameters

2016-04-20 Thread Henrik Gramner
From: Anton Mitrofanov The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want. --- libavutil/x86/x86inc.asm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/l

[libav-devel] [PATCH 1/4] x86inc: Fix AVX emulation of scalar float instructions

2016-04-20 Thread Henrik Gramner
Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified. --- libavutil/x86/x86inc.asm | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86i
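The point about commutativity can be shown with a small model in plain C. This is a sketch, not x86inc code: `vec4` and `model_addss` are hypothetical names, and the function models a two-operand scalar add such as `addss dst, src`, where only the lowest element is computed and the rest of the destination is left untouched.

```c
/* Hypothetical model of a 4 x float vector register. */
typedef struct {
    float f[4];
} vec4;

/* Model of a destructive scalar add (addss dst, src): element 0 becomes
 * dst[0] + src[0], elements 1..3 keep the value they had in dst. */
static vec4 model_addss(vec4 dst, vec4 src)
{
    dst.f[0] += src.f[0];
    return dst;
}
```

Swapping the operands gives the same lowest element but different upper elements, because those come from whichever operand is the destination. That is why an emulation layer cannot freely reorder the sources of such instructions.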

[libav-devel] [PATCH 4/4] x86inc: Enable AVX emulation in additional cases

2016-04-20 Thread Henrik Gramner
From: Anton Mitrofanov Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`. --- libavutil/x86/x86inc.asm | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavut
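The case described above can be modeled in plain C. This is a sketch under assumptions: the real logic lives in NASM macros, and the names `vec4`, `addps2` and `emu_addps3` are hypothetical. It assumes the emulation turns a three-operand form into a register copy followed by a destructive two-operand instruction.

```c
/* Hypothetical model of a 4 x float vector register. */
typedef struct {
    float f[4];
} vec4;

/* Model of the two-operand SSE form: dst = dst + src, element-wise. */
static void addps2(vec4 *dst, const vec4 *src)
{
    for (int i = 0; i < 4; i++)
        dst->f[i] += src->f[i];
}

/* Model of emulating the three-operand form "addps dst, src1, src2"
 * using only a copy and the two-operand form. */
static void emu_addps3(vec4 *dst, const vec4 *src1, const vec4 *src2)
{
    if (dst == src2) {
        /* Copying src1 into dst first would overwrite src2. Packed
         * addition is commutative, so add src1 into dst instead. */
        addps2(dst, src1);
    } else {
        if (dst != src1)
            *dst = *src1;   /* corresponds to a register move */
        addps2(dst, src2);
    }
}
```

With this model, `emu_addps3(&m0, &m1, &m0)` mirrors `addps m0, m1, m0` and produces the sum without needing a scratch register. The same swap would be wrong for a non-commutative operation such as subtraction.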

Re: [libav-devel] [PATCH 6/6] hevc: Add AVX2 DC IDCT

2016-07-10 Thread Henrik Gramner
On Sun, Jul 10, 2016 at 1:10 PM, Alexandra Hájková wrote: Some fairly minor nits: > +++ b/libavcodec/x86/hevc_idct.asm > +cglobal hevc_idct_%1x%1_dc_%3, 1, 2, 1, coeff, tmp > +movsx tmpq, word [coeffq] > +add tmpw, ((1 << 14-%3) + 1) > +sar tm

Re: [libav-devel] [PATCH] checkasm: add HEVC test for testing IDCT DC

2016-07-19 Thread Henrik Gramner
On Mon, Jul 18, 2016 at 8:11 PM, Alexandra Hájková wrote: > +if (check_func(h.idct_dc[i - 2], "idct_%dx%d_dc_%d", block_size, > block_size, bit_depth)) { > +call_ref(coeffs0); > +call_new(coeffs1); > +if (memcmp(coeffs0, coeffs1, sizeof(*coeffs0) * size

Re: [libav-devel] [PATCH 1/3] x86/hevc: add add_residual

2016-07-19 Thread Henrik Gramner
On Thu, Jul 14, 2016 at 7:25 PM, Josh de Kock wrote: Some of those functions are several kilobytes large. That's going to result in a lot of cache misses. I suggest using loops instead of duplicating the same code over and over with %reps.

Re: [libav-devel] [PATCH 1/2 v2] x86/hevc: add add_residual

2016-07-21 Thread Henrik Gramner
On Thu, Jul 21, 2016 at 2:48 AM, Josh de Kock wrote: > +cglobal hevc_add_residual_16_8, 3, 5, 7, dst, coeffs, stride > +pxorm0, m0 > +lea r3, [strideq * 3] > +RES_ADD_SSE_16_32_8 0, dstq, dstq + strideq > +RES_ADD_SSE_16_32_8 64, dstq + strideq * 2,
