Re: [libav-devel] [GASPP PATCH] Comment out "it" instructions for armasm

2019-10-03 Thread Janne Grunau
On 2019-10-02 11:53:28 +0300, Martin Storsjö wrote: > Armasm implicitly adds it instructions as needed. In VS 2019 16.3, > there's a bug [1] in armasm making it fail to parse these it instructions > (but it can still add them implicitly just fine). > > I'm not sure if it really is worth working

Re: [libav-devel] [GASPP PATCH] Filter out the armasm argument "-oldit" from the preprocessor

2019-10-03 Thread Janne Grunau
On 2019-10-02 10:58:46 +0300, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index 6da37c1..b6c2786 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@ -162,6 +162,7 @@ if

Re: [libav-devel] [PATCH] aarch64: Add assembly support for -fsanitize=hwaddress tagged globals.

2019-08-22 Thread Janne Grunau
On 2019-08-21 22:40:13 +0300, Martin Storsjö wrote: > From: Peter Collingbourne > > As of LLVM r368102, Clang will set a pointer tag in bits 56-63 of the > address of a global when compiling with -fsanitize=hwaddress. This requires > an adjustment to assembly code that takes the address of such
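Note on the snippet above: it says the pointer tag lives in bits 56-63 of a global's address. The following is only a minimal sketch of what that bit layout means for a 64-bit value; the function names are hypothetical and nothing here is taken from the patch itself.

```python
# Sketch only: models a 64-bit pointer whose top byte (bits 56-63) is a tag.
# TAG_SHIFT follows the snippet; the helper names are made up for illustration.
TAG_SHIFT = 56
ADDR_MASK = (1 << TAG_SHIFT) - 1
U64_MASK = (1 << 64) - 1


def split_tagged_pointer(ptr):
    """Return (tag, untagged_address) for a 64-bit pointer value."""
    ptr &= U64_MASK
    return ptr >> TAG_SHIFT, ptr & ADDR_MASK


def with_tag(addr, tag):
    """Place an 8-bit tag into bits 56-63 of an address."""
    return ((tag & 0xFF) << TAG_SHIFT) | (addr & ADDR_MASK)


if __name__ == "__main__":
    tagged = with_tag(0x12345678, 0x2A)
    print(hex(tagged), split_tagged_pointer(tagged))
```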

[libav-devel] [PATCH 1/2] h264/arm64: implement missing 4:2:2 chroma loop filter neon functions

2019-02-27 Thread Janne Grunau
--- libavcodec/aarch64/h264dsp_init_aarch64.c | 18 ++-- libavcodec/aarch64/h264dsp_neon.S | 36 +++ 2 files changed, 46 insertions(+), 8 deletions(-) diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c b/libavcodec/aarch64/h264dsp_init_aarch64.c index

[libav-devel] [PATCH 2/2] checkasm/h264: test 4:2:2 chroma loop filter functions

2019-02-27 Thread Janne Grunau
--- tests/checkasm/h264dsp.c | 44 1 file changed, 26 insertions(+), 18 deletions(-) diff --git a/tests/checkasm/h264dsp.c b/tests/checkasm/h264dsp.c index 706fc79397..ee07121ab4 100644 --- a/tests/checkasm/h264dsp.c +++ b/tests/checkasm/h264dsp.c @@

Re: [libav-devel] [PATCH 1/1] h264/x86: sign extend int stride in deblock functions

2019-01-27 Thread Janne Grunau
On 2019-01-27 11:39:13 +0100, Diego Biurrun wrote: > On Sun, Jan 27, 2019 at 11:18:41AM +0100, Janne Grunau wrote: > > Fixes checkasm errors after adding the h264 deblock tests. > > --- > > libavcodec/x86/h264_deblock.asm | 8 > > libavcodec/x86

[libav-devel] [PATCH 1/1] h264/x86: sign extend int stride in deblock functions

2019-01-27 Thread Janne Grunau
Fixes checkasm errors after adding the h264 deblock tests. --- libavcodec/x86/h264_deblock.asm | 8 libavcodec/x86/h264_deblock_10bit.asm | 9 + 2 files changed, 17 insertions(+) diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm index
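Note on the subject above: a 32-bit int argument held in a 64-bit register may carry unspecified upper bits, so assembly that uses the full register as a stride can go wrong unless it sign-extends first. This is a rough sketch of that idea in Python, assuming that is the failure checkasm exposed; the helper name is hypothetical.

```python
# Sketch only: emulate sign-extending the low 32 bits of a 64-bit register.
def sign_extend_32_to_64(reg):
    """Interpret the low 32 bits of reg as a signed int, ignoring upper bits."""
    low = reg & 0xFFFFFFFF
    if low & 0x80000000:
        low -= 1 << 32
    return low


if __name__ == "__main__":
    # Low half encodes -16; the upper half is leftover garbage.
    dirty = 0xDEADBEEFFFFFFFF0
    print(sign_extend_32_to_64(dirty))
```

Using the register without this step would treat the same bits as a huge positive offset instead of a small negative stride.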

Re: [libav-devel] [PATCH 2/4] checkasm/h264: add loop filter tests

2019-01-27 Thread Janne Grunau
On 2019-01-26 23:22:42 +0200, Martin Storsjö wrote: > On Tue, 1 Jan 2019, Janne Grunau wrote: > > > --- > > tests/checkasm/h264dsp.c | 124 +++ > > 1 file changed, 124 insertions(+) > > This newly added test seems to fail on mac

Re: [libav-devel] [PATCH] libopenh264dec: Use a newer decoding entry point function

2019-01-26 Thread Janne Grunau
On 2019-01-25 10:39:13 +0200, Martin Storsjö wrote: > The "new" entry point actually has existed since OpenH264 1.4 in > 2015, but with B-frames, this entry point is essential for actually > getting the right frames returned and reordered. > > The name of this function, DecodeFrameNoDelay, is

Re: [libav-devel] [GASPP PATCH] Name read-only data sections .rdata, convert both .rdata and .rodata in the same way

2019-01-26 Thread Janne Grunau
On 2019-01-11 15:24:56 +0200, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index 4131c46..0137718 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@ -1198,7

[libav-devel] [PATCH 1/4] h264/aarch64: sign extend int stride in loop filter asm

2019-01-01 Thread Janne Grunau
--- libavcodec/aarch64/h264dsp_neon.S | 3 +++ 1 file changed, 3 insertions(+) diff --git a/libavcodec/aarch64/h264dsp_neon.S b/libavcodec/aarch64/h264dsp_neon.S index 9b4610a4d4..60ffa24500 100644 --- a/libavcodec/aarch64/h264dsp_neon.S +++ b/libavcodec/aarch64/h264dsp_neon.S @@ -130,6 +130,7

[libav-devel] [PATCH 2/4] checkasm/h264: add loop filter tests

2019-01-01 Thread Janne Grunau
--- tests/checkasm/h264dsp.c | 124 +++ 1 file changed, 124 insertions(+) diff --git a/tests/checkasm/h264dsp.c b/tests/checkasm/h264dsp.c index f355a72a74..706fc79397 100644 --- a/tests/checkasm/h264dsp.c +++ b/tests/checkasm/h264dsp.c @@ -28,6 +28,7 @@

[libav-devel] [PATCH 4/4] h264/aarch64: add intra loop filter neon asm

2019-01-01 Thread Janne Grunau
c/aarch64/h264dsp_neon.S index b649f1d018..448e575b8c 100644 --- a/libavcodec/aarch64/h264dsp_neon.S +++ b/libavcodec/aarch64/h264dsp_neon.S @@ -1,6 +1,7 @@ /* * Copyright (c) 2008 Mans Rullgard * Copyright (c) 2013 Janne Grunau + * Copyright (c) 2014 Janne Grunau * * This file

[libav-devel] [PATCH 3/4] h264/aarch64: optimize neon loop filter

2019-01-01 Thread Janne Grunau
Exit as soon as possible if no filtering will be done. Improves the checkasm --bench cycle count on a Snapdragon 820e: h264_h_loop_filter_luma_8bpp_c: 72.4 -> 72.5 h264_h_loop_filter_luma_8bpp_neon: 97.1 -> 56.3 h264_v_loop_filter_luma_8bpp_c: 174.0 -> 173.5

Re: [libav-devel] [GASPP PATCH] Use the correct variable $line instead of the implicit variable

2018-10-26 Thread Janne Grunau
On 2018-10-22 23:24:12 +0300, Martin Storsjö wrote: > This fixes cases where the input parameter is something other than > the currently iterated variable. > --- > gas-preprocessor.pl | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gas-preprocessor.pl

Re: [libav-devel] [GASPP PATCH] Add a -verbose option for printing all executed commands

2018-10-26 Thread Janne Grunau
On 2018-10-22 23:23:38 +0300, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index fd9aac8..41d7b69 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@ -27,6 +27,7 @@

Re: [libav-devel] [GASPP PATCH 2/2] Don't match whitespace as branch condition codes

2018-10-22 Thread Janne Grunau
On 2018-10-20 00:18:27 +0300, Martin Storsjö wrote: > For cases like "b1b", this could previously be matched as > $cond = " ". > > This fixes preprocessing with a preprocessor that preserves multiple > consecutive spaces, like cl.exe does. > --- > Better fix, which also works in a number of

Re: [libav-devel] [GASPP PATCH] Extend armasm64 workaround for uxtw/sxtw to uxth/sxth and uxtb/sxtb as well

2018-10-22 Thread Janne Grunau
On 2018-10-22 12:51:47 +0300, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index 7efe3b9..669d435 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@ -1011,7

Re: [libav-devel] [PATCH] arm: Emit .thumb_func directives

2018-10-12 Thread Janne Grunau
On 2018-10-12 14:43:56 +0300, Martin Storsjö wrote: > Prior to Xcode 9.3, the clang built-in assembler didn't support > altmacro, and gas-preprocessor was used for assembling for arm/darwin. > > For thumb functions, gas-preprocessor took care of adding the .thumb_func > directives, but when now

Re: [libav-devel] [PATCH 1/2] avcodec: rename the AV1 profiles

2018-03-29 Thread Janne Grunau
On 2018-03-29 13:10:49 -0300, James Almer wrote: > Use the proper names instead of numbers > > Signed-off-by: James Almer > --- > libavcodec/avcodec.h | 6 +++--- > libavcodec/libaomenc.c | 6 +++--- > 2 files changed, 6 insertions(+), 6 deletions(-) > > diff --git

Re: [libav-devel] [GASPP PATCH 2/2] Convert {v0.8b-v3.8b} into {v0.8b, v1.8b, v2.8b, v3.8b} for armasm64

2018-03-14 Thread Janne Grunau
On 2018-03-08 15:26:14 +0200, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 18 ++ > 1 file changed, 18 insertions(+) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index 9a7f6d8..5158cc7 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@
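Note on the subject above: the conversion it names turns a register range such as {v0.8b-v3.8b} into an explicit list. The actual patch is Perl and is not shown in the snippet; the following is an independent sketch in Python with a made-up regex and function name, and it does not handle ranges that wrap around past v31.

```python
import re

# Sketch only: this pattern is an illustration, not the one in gas-preprocessor.pl.
RANGE_RE = re.compile(r"\{\s*v(\d+)\.(\w+)\s*-\s*v(\d+)\.(\w+)\s*\}")


def expand_register_range(line):
    """Rewrite {vA.T-vB.T} as {vA.T, ..., vB.T}; leave anything else untouched."""
    def repl(m):
        first, arr_first = int(m.group(1)), m.group(2)
        last, arr_last = int(m.group(3)), m.group(4)
        if arr_first != arr_last or last < first:
            return m.group(0)  # not a simple ascending range, keep as written
        regs = ["v%d.%s" % (n, arr_first) for n in range(first, last + 1)]
        return "{" + ", ".join(regs) + "}"
    return RANGE_RE.sub(repl, line)


if __name__ == "__main__":
    print(expand_register_range("ld1 {v0.8b-v3.8b}, [x0]"))
```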

Re: [libav-devel] [GASPP PATCH 1/2] Convert .extern into IMPORT for armasm

2018-03-14 Thread Janne Grunau
On 2018-03-08 15:26:13 +0200, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index b0c909c..9a7f6d8 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@ -1119,6 +1119,7 @@ sub

Re: [libav-devel] [GASPP PATCH 3/3] Don't skip negative offsets for ldr by default for armasm64

2018-03-14 Thread Janne Grunau
On 2018-03-06 10:58:32 +0200, Martin Storsjö wrote: > The version of armasm64 in Visual Studio 2017 15.6 can assemble > these just fine. > --- > gas-preprocessor.pl | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index

Re: [libav-devel] [GASPP PATCH 2/3] Don't skip prfum instructions by default for armasm64

2018-03-14 Thread Janne Grunau
On 2018-03-06 10:58:31 +0200, Martin Storsjö wrote: > The version of armasm64 in Visual Studio 2017 15.5 can assemble > these just fine. > --- > gas-preprocessor.pl | 10 ++ > 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl >

Re: [libav-devel] [GASPP PATCH 1/3] Document what versions were buggy and required GASPP_ARMASM64_INVERT_SCALE

2018-03-14 Thread Janne Grunau
On 2018-03-06 10:58:30 +0200, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index 3787756..9ff47a9 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@ -1041,7

Re: [libav-devel] [PATCH 6/8] Define _WIN32 while preprocessing for armasm

2017-10-18 Thread Janne Grunau
On 2017-10-18 09:43:11 +0300, Martin Storsjö wrote: > On Wed, 18 Oct 2017, Janne Grunau wrote: > > >On 2017-10-14 23:35:20 +0300, Martin Storsjö wrote: > >>--- > >> gas-preprocessor.pl | 1 + > >> 1 file changed, 1 insertion(+) > >> > >>

Re: [libav-devel] [GASPP PATCH 6/6] Work around an armasm64 bug in the scale operand to fcvtzs/scvtf

2017-10-18 Thread Janne Grunau
On 2017-10-16 22:38:19 +0300, Martin Storsjö wrote: > The operand shouldn't be stored as is, but stored as 64-scale, in > the opcode, but armasm64 fails to do this. > > This might be a big enough bug to report and try to get fixed, but > that requires removing this workaround at that point.

Re: [libav-devel] [GASPP PATCH 5/6] Convert local labels in tbz instructions for armasm

2017-10-18 Thread Janne Grunau
On 2017-10-16 22:38:18 +0300, Martin Storsjö wrote: > Also convert the register from wX into xX, since armasm fails to > assemble it when referring to the register as wX. > --- > gas-preprocessor.pl | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > diff --git

Re: [libav-devel] [GASPP PATCH 4/6] Convert ldr/str/ldrb/strb etc into ldurb, when the offset is negative

2017-10-18 Thread Janne Grunau
On 2017-10-16 22:38:17 +0300, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 13 + > 1 file changed, 13 insertions(+) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index 552ed0c..b650c39 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@ -1012,6

Re: [libav-devel] [GASPP PATCH 3/6] Handle cinc just like ccmp/csel

2017-10-18 Thread Janne Grunau
On 2017-10-16 22:38:16 +0300, Martin Storsjö wrote: > This can be squashed into "Add support for MS armasm64"; this > was found while trying to build x264. > --- > gas-preprocessor.pl | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index

Re: [libav-devel] [GASPP PATCH 2/6] Allow register names such as xzr instead of the pattern [xw]\d+ in ccmp/csel

2017-10-18 Thread Janne Grunau
On 2017-10-16 22:38:15 +0300, Martin Storsjö wrote: > Also update the csel pattern similarly. > > This is required for building x264. > --- > gas-preprocessor.pl | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index

Re: [libav-devel] [GASPP PATCH] Support converting uxtl into ushll on a line that starts with a local label

2017-10-18 Thread Janne Grunau
On 2017-10-16 12:36:13 +0300, Martin Storsjö wrote: > Also make a note that this conversion is necessary for armasm64. > > For consistency, allow local labels in all similar full-line > conversions as well. > --- > gas-preprocessor.pl | 19 ++- > 1 file changed, 10 insertions(+),

Re: [libav-devel] [PATCH 2/3] aarch64: Remove a dot from a label

2017-10-18 Thread Janne Grunau
On 2017-10-14 23:35:32 +0300, Martin Storsjö wrote: > This fixes building with armasm64 (when run through gas-preprocessor). > --- > libavcodec/aarch64/mpegaudiodsp_neon.S | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/aarch64/mpegaudiodsp_neon.S >

Re: [libav-devel] [PATCH 8/8] Add support for MS armasm64

2017-10-18 Thread Janne Grunau
On 2017-10-14 23:35:22 +0300, Martin Storsjö wrote: > --- > I haven't been able to assemble prfum instructions with armasm64 yet; > dumpbin -disasm does disassemble the instruction correctly (e.g. from > an object file assembled with llvm), but armasm64 doesn't support > assembling it, either in

Re: [libav-devel] [PATCH 6/8] Define _WIN32 while preprocessing for armasm

2017-10-18 Thread Janne Grunau
On 2017-10-14 23:35:20 +0300, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index 456ee24..63b0ab3 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@ -98,6 +98,7 @@ if

Re: [libav-devel] [PATCH 5/8] Operate on the right variable instead of the implicit variable

2017-10-18 Thread Janne Grunau
On 2017-10-14 23:35:19 +0300, Martin Storsjö wrote: > Apparently, this hasn't caused any issues in practice. > --- > gas-preprocessor.pl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index fe9c746..456ee24 100755 > ---

Re: [libav-devel] [PATCH 4/8] Require boundaries around local labels in handle_local_label

2017-10-18 Thread Janne Grunau
On 2017-10-14 23:35:18 +0300, Martin Storsjö wrote: > Since we're doing a replace of a string that looks like e.g. "1b" > over a full line, such a string could conceivably be a substring of > another identifier as well. > > This doesn't fix any known issue, but attempts to make this > less
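Note on the snippet above: replacing a string like "1b" across a whole line can also hit the same characters inside a longer identifier. This is a minimal sketch of the difference that boundary checks make, written in Python rather than the Perl of gas-preprocessor.pl; the function names and the replacement label are hypothetical.

```python
import re


def replace_label_naive(line, label, new):
    """Plain substring replacement: also rewrites matches inside identifiers."""
    return line.replace(label, new)


def replace_label_bounded(line, label, new):
    """Replace label only when it is not part of a longer word."""
    pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
    return re.sub(pattern, lambda m: new, line)


if __name__ == "__main__":
    line = "bne 1b // uses table_1b"
    print(replace_label_naive(line, "1b", "local_1_0"))
    print(replace_label_bounded(line, "1b", "local_1_0"))
```

The lookarounds are used instead of `\b` only so the intent is explicit; either form keeps "table_1b" intact here.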

Re: [libav-devel] [PATCH 3/8] Correctly check for arm condition codes when trying to filter out 'bic'

2017-10-18 Thread Janne Grunau
On 2017-10-14 23:35:17 +0300, Martin Storsjö wrote: > Since an empty condition code also is valid, this also matched for > any other string, since it matched the empty string. By making sure > the pattern matches the full string, we avoid that issue. > > Thanks to the later is_arm_register check,
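Note on the snippet above: a pattern whose only content is an optional condition code matches the empty string, so it "matches" any input unless the whole string is required to match. This is a small Python sketch of that pitfall; the condition-code list is written out here for illustration and is not copied from gas-preprocessor.pl.

```python
import re

# Sketch only: an optional ARM condition code. Because of the trailing "?",
# the pattern can match zero characters.
COND = r"(?:eq|ne|cs|hs|cc|lo|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)?"


def matches_prefix(s):
    """Anchored only at the start: succeeds on every input via the empty match."""
    return re.match(COND, s) is not None


def matches_fully(s):
    """Requires the whole string to be a condition code (or empty)."""
    return re.fullmatch(COND, s) is not None


if __name__ == "__main__":
    for candidate in ("ne", "", "xyz"):
        print(repr(candidate), matches_prefix(candidate), matches_fully(candidate))
```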

Re: [libav-devel] [PATCH 1/8] Pass -undef to cpp instead of undefining __ELF__ and __MACH__

2017-10-18 Thread Janne Grunau
On 2017-10-14 23:35:15 +0300, Martin Storsjö wrote: > --- > gas-preprocessor.pl | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl > index afdfc9e..6aae65d 100755 > --- a/gas-preprocessor.pl > +++ b/gas-preprocessor.pl > @@ -97,8

Re: [libav-devel] [PATCH] arm: Fix SIGBUS on ARM when compiled with binutils 2.29

2017-09-01 Thread Janne Grunau
On 2017-08-31 12:10:56 +0300, Martin Storsjö wrote: > In binutils 2.29, the behavior of the ADR instruction changed so that 1 is > added to the address of a Thumb function (previously nothing was added). This > allows the loaded address to be passed to a BLX instruction and the correct > mode

Re: [libav-devel] [PATCH 2/3] configure: Simplify dlopen check

2017-02-23 Thread Janne Grunau
On 2017-02-21 18:26:25 +0100, Diego Biurrun wrote: > --- > > This was previously approved. > > configure | 26 +- > 1 file changed, 9 insertions(+), 17 deletions(-) > > diff --git a/configure b/configure > index 6f1be32..ef6a8e0 100755 > --- a/configure > +++

Re: [libav-devel] [PATCH 1/3] Revert "configure: Add proper weak dependency of drawtext filter on libfontconfig"

2017-02-23 Thread Janne Grunau
On 2017-02-21 18:26:24 +0100, Diego Biurrun wrote: > External dependencies cannot be handled as weak dependencies since they need > to be explicitly enabled. If a weak dependency is set, the variable > corresponding > to the weak dependency can be enabled without the rest of the build system >

Re: [libav-devel] [PATCH 3/4] arm: vp9itxfm: Reorder iadst16 coeffs

2017-02-23 Thread Janne Grunau
On 2017-02-09 14:33:55 +0200, Martin Storsjö wrote: > This matches the order they are in the 16 bpp version. > > There they are in this order, to make sure we access them in the > same order they are declared, easing loading only half of the > coefficients at a time. > > This makes the 8 bpp

Re: [libav-devel] [PATCH 4/4] aarch64: vp9itxfm: Reorder iadst16 coeffs

2017-02-23 Thread Janne Grunau
On 2017-02-09 14:33:56 +0200, Martin Storsjö wrote: > This matches the order they are in the 16 bpp version. > > There they are in this order, to make sure we access them in the > same order they are declared, easing loading only half of the > coefficients at a time. > > This makes the 8 bpp

Re: [libav-devel] [PATCH 2/4] aarch64: vp9itxfm: Reorder the idct coefficients for better pairing

2017-02-23 Thread Janne Grunau
On 2017-02-09 14:33:54 +0200, Martin Storsjö wrote: > All elements are used pairwise, except for the first one. > Previously, the 16th element was unused. Move the unused element > to the second slot, to make the later element pairs not split > across registers. > > This simplifies loading only

Re: [libav-devel] [PATCH 1/4] arm: vp9itxfm: Reorder the idct coefficients for better pairing

2017-02-23 Thread Janne Grunau
On 2017-02-09 14:33:53 +0200, Martin Storsjö wrote: > All elements are used pairwise, except for the first one. > Previously, the 16th element was unused. Move the unused element > to the second slot, to make the later element pairs not split > across registers. > > This simplifies loading only

Re: [libav-devel] [PATCH] arm: vp9itxfm: Avoid reloading the idct32 coefficients

2017-02-23 Thread Janne Grunau
On 2017-02-09 13:39:55 +0200, Martin Storsjö wrote: > The idct32x32 function actually backed up and restored q4-q7 even > though it didn't clobber them; there are plenty of registers that > can be used to allow keeping all the idct coefficients in registers > without having to reload different

Re: [libav-devel] [PATCH 6/6] arm: vp9lpf: Implement the mix2_44 function with one single filter pass

2017-02-23 Thread Janne Grunau
On 2017-02-11 23:42:05 +0200, Martin Storsjö wrote: > On Sat, 11 Feb 2017, Martin Storsjö wrote: > > >On Fri, 10 Feb 2017, Janne Grunau wrote: > > > >>On 2017-01-15 22:55:52 +0200, Martin Storsjö wrote: > >>>For this case, with 8 inputs but only changing

Re: [libav-devel] [PATCH 2/6] arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit

2017-02-23 Thread Janne Grunau
On 2017-02-11 22:19:02 +0200, Martin Storsjö wrote: > On Fri, 10 Feb 2017, Janne Grunau wrote: > > >On 2017-01-15 22:55:48 +0200, Martin Storsjö wrote: > >>The theoretical maximum value of E is 193, so we can just > >>saturate the addition to 255. > >> >

Re: [libav-devel] [PATCH] aarch64: vp9itxfm: Avoid reloading the idct32 coefficients

2017-02-10 Thread Janne Grunau
On 2017-02-09 13:27:04 +0200, Martin Storsjö wrote: > The idct32x32 function actually backed up and restored d8-d15 even ... pushed onto the stack ... is imo clearer even though there are no explicit push/pop instructions > though it didn't clobber them; there are plenty of registers that > can

Re: [libav-devel] [PATCH 6/6] arm: vp9lpf: Implement the mix2_44 function with one single filter pass

2017-02-10 Thread Janne Grunau
On 2017-01-15 22:55:52 +0200, Martin Storsjö wrote: > For this case, with 8 inputs but only changing 4 of them, we can fit > all 16 input pixels into a q register, and still have enough temporary > registers for doing the loop filter. > > The wd=8 filters would require too many temporary

Re: [libav-devel] [PATCH 5/6] aarch64: vp9lpf: Interleave the start of flat8in into the calculation above

2017-02-10 Thread Janne Grunau
On 2017-01-15 22:55:51 +0200, Martin Storsjö wrote: > --- > libavcodec/aarch64/vp9lpf_neon.S | 16 +--- > 1 file changed, 13 insertions(+), 3 deletions(-) > > diff --git a/libavcodec/aarch64/vp9lpf_neon.S > b/libavcodec/aarch64/vp9lpf_neon.S > index 4553173..3894307 100644 > ---

Re: [libav-devel] [PATCH 4/6] arm: vp9lpf: Interleave the start of flat8in into the calculation above

2017-02-10 Thread Janne Grunau
On 2017-01-15 22:55:50 +0200, Martin Storsjö wrote: > This adds lots of extra .ifs, but speeds it up by a couple cycles, > by avoiding stalls. > --- > libavcodec/arm/vp9lpf_neon.S | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/arm/vp9lpf_neon.S

Re: [libav-devel] [PATCH 3/6] arm: vp9lpf: Use orrs instead of orr+cmp

2017-02-10 Thread Janne Grunau
On 2017-01-15 22:55:49 +0200, Martin Storsjö wrote: > --- > libavcodec/arm/vp9lpf_neon.S | 12 > 1 file changed, 4 insertions(+), 8 deletions(-) > > diff --git a/libavcodec/arm/vp9lpf_neon.S b/libavcodec/arm/vp9lpf_neon.S > index 5e154f6..9be4cef 100644 > ---

Re: [libav-devel] [PATCH 2/6] arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit

2017-02-10 Thread Janne Grunau
On 2017-01-15 22:55:48 +0200, Martin Storsjö wrote: > The theoretical maximum value of E is 193, so we can just > saturate the addition to 255. > > Before: Cortex A7 A8 A9 A53 A53/AArch64 > vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0
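Note on the snippet above: if the threshold E never exceeds 193, comparing a sum saturated at 255 against E gives the same answer as comparing the exact, wider sum. This is a sketch of that argument in Python. Which two values are actually added in the loop filter is not shown in the snippet, so `a` and `b` below are just arbitrary 8-bit inputs.

```python
# Sketch only: E_MAX is taken from the snippet; everything else is illustrative.
E_MAX = 193


def saturating_add_u8(a, b):
    """8-bit unsigned add that clamps at 255 instead of wrapping."""
    return min(a + b, 255)


def exceeds_wide(a, b, e):
    """Reference comparison using the exact (wider than 8 bit) sum."""
    return a + b > e


def exceeds_saturated(a, b, e):
    """Same comparison, but staying within 8 bits via saturation."""
    return saturating_add_u8(a, b) > e


if __name__ == "__main__":
    # 200 + 100 saturates to 255, which is still above any E <= 193.
    print(exceeds_wide(200, 100, E_MAX), exceeds_saturated(200, 100, E_MAX))
```

The two only differ when the exact sum is above 255, and in that case both sides are already greater than any E of at most 193.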

Re: [libav-devel] [PATCH 1/6] arm/aarch64: vp9lpf: Calculate !hev directly

2017-02-10 Thread Janne Grunau
On 2017-01-15 22:55:47 +0200, Martin Storsjö wrote: > Previously we first calculated hev, and then negated it. > > Since we were able to schedule the negation in the middle > of another calculation, we don't see any gain in all cases. > > Before: Cortex A7 A8 A9

Re: [libav-devel] [PATCH 1/2] arm: vp9itxfm: Optimize 16x16 and 32x32 idct dc by unrolling

2017-02-10 Thread Janne Grunau
On 2017-01-05 09:35:36 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > Before:Cortex A7 A8 A9 A53 > vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8 > vp9_inv_dct_dct_32x32_sub1_add_neon: 752.0

Re: [libav-devel] [PATCH 6/6] aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter

2017-02-10 Thread Janne Grunau
On 2017-01-02 14:17:56 +0200, Martin Storsjö wrote: > No measured speedup on a Cortex A53, but other cores might benefit. A little surprised that it didn't make a difference on the cortex-a53 since certain sites reported the NEON unit isn't fully 128-bit wide. So unlikely that it makes a

Re: [libav-devel] [PATCH 4/6] aarch64: vp9mc: Simplify the extmla macro parameters

2017-02-10 Thread Janne Grunau
On 2017-01-02 14:17:54 +0200, Martin Storsjö wrote: > Fold the field lengths into the macro. > > This makes the macro invocations much more readable, when the > lines are shorter. > > This also makes it easier to use only half the registers within > the macro. > --- >

Re: [libav-devel] [PATCH 5/6] arm: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter

2017-02-10 Thread Janne Grunau
On 2017-01-02 14:17:55 +0200, Martin Storsjö wrote: > Before:Cortex A7 A8 A9 A53 > vp9_put_8tap_smooth_4h_neon: 378.1 273.2 340.7 229.5 > After: > vp9_put_8tap_smooth_4h_neon: 352.1 222.2 290.5 229.5 > --- > libavcodec/arm/vp9mc_neon.S | 33

Re: [libav-devel] [PATCH 1/6] arm: vp9itxfm: Share instructions for loading idct coeffs in the 8x8 function

2017-02-09 Thread Janne Grunau
On 2017-02-09 14:29:56 +0200, Martin Storsjö wrote: > --- > libavcodec/arm/vp9itxfm_neon.S | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/libavcodec/arm/vp9itxfm_neon.S b/libavcodec/arm/vp9itxfm_neon.S > index 167d517..3d0b0fa 100644 > ---

Re: [libav-devel] [PATCH 5/5] aarch64: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible (alternative 1)

2017-02-09 Thread Janne Grunau
On 2017-02-09 09:50:48 +0200, Martin Storsjö wrote: > On Thu, 9 Feb 2017, Janne Grunau wrote: > > >On 2017-02-05 14:05:49 +0200, Martin Storsjö wrote: > >>On Sun, 5 Feb 2017, Janne Grunau wrote: > >> > >>>> // out1 = in1 + in2 > >>>> //

Re: [libav-devel] [PATCH 5/5] aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 (alternative 2)

2017-02-08 Thread Janne Grunau
On 2017-02-06 00:16:41 +0200, Martin Storsjö wrote: > > Ok, so after running a slightly shorter clip (which seems to have about as > large percentage of runtime doing IDCT as the previous one) with a bit more > iterations, I've got the following results (the 'user' part from 'time > avconv

Re: [libav-devel] [PATCH 5/5] aarch64: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible (alternative 1)

2017-02-08 Thread Janne Grunau
On 2017-02-05 14:05:49 +0200, Martin Storsjö wrote: > On Sun, 5 Feb 2017, Janne Grunau wrote: > > >> // out1 = in1 + in2 > >> // out2 = in1 - in2 > >> .macro butterfly_8h out1, out2, in1, in2 > >>@@ -463,7 +510,7 @@ function idct16x16_dc_add_neon > >

Re: [libav-devel] [PATCH] configure: Rework dependency handling for conflicting components

2017-02-07 Thread Janne Grunau
On 2017-02-06 17:22:06 +0100, Diego Biurrun wrote: > This makes the feature more visible and obvious. > --- > > Changed to use _conflict instead of _not as Janne suggested. > > configure | 22 +- > 1 file changed, 13 insertions(+), 9 deletions(-) ok Janne

Re: [libav-devel] [PATCH] configure: Add name parameter to require_pkg_config() helper function

2017-02-07 Thread Janne Grunau
On 2017-02-06 18:08:00 +0100, Diego Biurrun wrote: > This allows distinguishing between the internal variable name for > external libraries and the pkg-config package name. Having both > names available avoids special-casing outside the helper function > when the two identifiers do not match. >

Re: [libav-devel] [PATCH 07/12] configure: Add name parameter to require_pkg_config() helper function

2017-02-05 Thread Janne Grunau
On 2017-01-24 18:12:47 +0100, Diego Biurrun wrote: > This allows distinguishing between the internal variable name for > external libraries and the pkg-config package name. Having both > names available avoids special-casing outside the helper function > when the two identifiers do not match. >

Re: [libav-devel] [PATCH 5/5] aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 (alternative 2)

2017-02-05 Thread Janne Grunau
On 2016-12-01 11:27:02 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > This makes it easier to avoid filling the temp buffer with zeros for the > skipped slices, and leads to slightly more straightforward code for these > cases (for the 16x16 case, where the

Re: [libav-devel] [PATCH] arm: vp9itxfm: Avoid .irp when it doesn't save any lines

2017-02-05 Thread Janne Grunau
On 2017-02-04 23:37:37 +0200, Martin Storsjö wrote: > This makes it more readable. > --- > This was suggested by Janne in a review of a patch that added a > modified copy of this function; similar code already exists as well. > --- > libavcodec/arm/vp9itxfm_neon.S | 24 >

Re: [libav-devel] [PATCH 5/5] aarch64: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible (alternative 1)

2017-02-05 Thread Janne Grunau
On 2016-12-01 11:27:01 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > This increases the code size of libavcodec/aarch64/vp9itxfm_neon.o > from 14740 to 18504 bytes. > > Before: > vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3 >

Re: [libav-devel] [PATCH 2/5] arm: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 (alternative 2)

2017-02-05 Thread Janne Grunau
On 2017-02-05 00:34:16 +0200, Martin Storsjö wrote: > On Sat, 4 Feb 2017, Janne Grunau wrote: > > >I'm not really sure which variant I prefer. Is the speed difference > >measurable for idct heavy real world samples? If you have a preference for one > >or the other varian

Re: [libav-devel] [PATCH 4/5] aarch64: vp9itxfm: Restructure the idct32 store macros

2017-02-04 Thread Janne Grunau
On 2016-12-01 11:27:00 +0200, Martin Storsjö wrote: > This avoids concatenation, which can't be used if the whole macro > is wrapped within another macro. > --- > libavcodec/aarch64/vp9itxfm_neon.S | 80 > +++--- > 1 file changed, 40 insertions(+), 40 deletions(-)

Re: [libav-devel] [PATCH 3/5] aarch64: vp9itxfm: Make the larger core transforms standalone functions

2017-02-04 Thread Janne Grunau
On 2016-12-01 11:26:59 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from > 19496 to 14740 bytes. > > This gives a small slowdown of a couple of tens of cycles, but makes > it more feasible to

Re: [libav-devel] [PATCH 2/5] arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible (alternative 1)

2017-02-04 Thread Janne Grunau
On 2017-02-03 23:44:51 +0200, Martin Storsjö wrote: > On Fri, 3 Feb 2017, Janne Grunau wrote: > > >On 2016-12-01 11:26:57 +0200, Martin Storsjö wrote: > >>This work is sponsored by, and copyright, Google. > >> > > >>@@ -668,13 +756,40

Re: [libav-devel] [PATCH 2/5] arm: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 (alternative 2)

2017-02-04 Thread Janne Grunau
On 2016-12-01 11:26:58 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > This makes it easier to avoid filling the temp buffer with zeros for the > skipped slices, and leads to slightly more straightforward code for these > cases (for the 16x16 case, where the

Re: [libav-devel] [PATCH 2/5] arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible (alternative 1)

2017-02-03 Thread Janne Grunau
On 2016-12-01 11:26:57 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > This increases the code size of libavcodec/arm/vp9itxfm_neon.o > from 12388 to 15064 bytes. > > Before: Cortex A7 A8 A9 A53 >

Re: [libav-devel] [PATCH 1/5] arm: vp9itxfm: Make the larger core transforms standalone functions

2017-02-03 Thread Janne Grunau
On 2016-12-01 11:26:56 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from > 15324 to 12388 bytes. > > This gives a small slowdown of a couple tens of cycles, up to around > 150 cycles for the full

Re: [libav-devel] [PATCH] build: Move build-system-related helper files to a separate subdirectory

2016-12-22 Thread Janne Grunau
On 2016-12-22 13:07:14 +0100, Diego Biurrun wrote: > This unclutters the top-level directory and groups related files together. > --- > > Now with "avbuild" as directory to store files in instead of "build". > > .gitignore | 3 ++- > Makefile |

Re: [libav-devel] [PATCH] h264dec: make sure to only end a field if it has been started

2016-12-18 Thread Janne Grunau
On 2016-12-18 11:36:30 +0100, Anton Khirnov wrote: > Calling ff_h264_field_end() when the per-field state is not properly > initialized leads to all kinds of undefined behaviour. > > CC: libav-sta...@libav.org > Bug-Id: 977 978 992 > --- > libavcodec/h264_picture.c | 1 + >

Re: [libav-devel] [PATCH] pthread_frame: do not run hwaccel decoding asynchronously unless it's safe

2016-12-18 Thread Janne Grunau
On 2016-12-14 09:56:40 +0100, Anton Khirnov wrote: > Certain hardware decoding APIs are not guaranteed to be thread-safe, so > having the user access decoded hardware surfaces while the decoder is > running in another thread can cause failures (this is mainly known to > happen with DXVA2). > >

Re: [libav-devel] [PATCH] pthread_frame: ensure the threads don't run simultaneously with hwaccel

2016-12-18 Thread Janne Grunau
On 2016-12-14 09:56:23 +0100, Anton Khirnov wrote: > --- > libavcodec/h263dec.c | 2 +- > libavcodec/h264dec.c | 2 +- > libavcodec/pthread_frame.c | 35 +++ > 3 files changed, 37 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/h263dec.c

[libav-devel] [PATCH 1/1] arm64: replace 'bic' with immediate with 'and' with inverted immediate

2016-12-08 Thread Janne Grunau
The former is not an official pseudo instruction although gas and llvm's internal assembler support it. Fixes a build error with xcode 6.2 reported by Memphiz on github. --- libavcodec/aarch64/synth_filter_neon.S | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git
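For readers unfamiliar with the two mnemonics, here is a minimal sketch of why the replacement is safe: clearing the bits of an immediate is the same operation as ANDing with that immediate inverted. The function names and the 64-bit register width are assumptions made for illustration; they are not taken from the patch.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit general-purpose registers


def bic(value: int, imm: int) -> int:
    """Bit clear: keep only the bits of value that are not set in imm."""
    return value & ~imm & MASK64


def and_inverted(value: int, imm: int) -> int:
    """AND with an immediate that was inverted ahead of time."""
    inverted = ~imm & MASK64  # the inverted immediate as written in the source
    return value & inverted


# Both forms agree for every combination tried here.
for value in (0, 0x1234_5678_9ABC_DEF0, MASK64):
    for imm in (0x7, 0xFF, 0xFFFF_0000):
        assert bic(value, imm) == and_inverted(value, imm)
```

Writing the inverted immediate explicitly only moves the inversion from the assembler to the source, which is why an assembler without the 'bic' pseudo form can still build the same code.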

Re: [libav-devel] [PATCH 3/3] pthread_frame: do not run hwaccel decoding asynchronously unless it's safe

2016-12-06 Thread Janne Grunau
On 2016-12-03 17:34:34 +0100, Anton Khirnov wrote: > Certain hardware decoding APIs are often not thread-safe, so having the user > access decoded hardware surfaces while the decoder is running in another > thread can cause failures (this is mainly known to happen with DXVA2). > > For such

Re: [libav-devel] [PATCH 2/3] pthread_frame: ensure the threads don't run simultaneously with hwaccel

2016-12-06 Thread Janne Grunau
On 2016-12-03 17:34:33 +0100, Anton Khirnov wrote: > --- > libavcodec/h263dec.c | 2 +- > libavcodec/h264dec.c | 2 +- > libavcodec/pthread_frame.c | 27 +++ > 3 files changed, 29 insertions(+), 2 deletions(-) > > diff --git a/libavcodec/h263dec.c

Re: [libav-devel] [PATCH 1/3] hevc: decouple calling get_format() from exporting the SPS parameters

2016-12-06 Thread Janne Grunau
On 2016-12-03 17:34:32 +0100, Anton Khirnov wrote: > This makes sure ff_get_format() does not get called unnecessarily from > update_thread_context(). > --- > libavcodec/hevcdec.c | 49 ++--- > 1 file changed, 30 insertions(+), 19 deletions(-) > > diff

Re: [libav-devel] [PATCH 3/3] aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

2016-11-30 Thread Janne Grunau
On 2016-11-28 11:26:02 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > Previously all subpartitions except the eob=1 (DC) case ran with > the same runtime: > > vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2 > vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0 >
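The idea named in the subject line can be sketched as follows, under stated assumptions: the first pass of a separable 2-D transform works on slices of rows, and a slice that holds only zeros can be skipped because a linear transform maps zero input to zero output. The matrix below is an arbitrary stand-in, not the VP9 idct, and this sketch inspects the data directly to find empty slices rather than deriving that from eob; all names are hypothetical.

```python
def transform_1d(vec, matrix):
    """Apply a linear 1-D transform (stand-in for the real idct) to one row."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def first_pass(block, matrix, slice_rows=4, skip_empty=True):
    """Transform the block slice by slice, optionally skipping all-zero slices."""
    n = len(block)
    out = [[0] * n for _ in range(n)]
    slices_done = 0
    for start in range(0, n, slice_rows):
        rows = block[start:start + slice_rows]
        if skip_empty and not any(any(row) for row in rows):
            continue  # zero input gives zero output for a linear transform
        slices_done += 1
        for offset, row in enumerate(rows):
            out[start + offset] = transform_1d(row, matrix)
    return out, slices_done


N = 8
MATRIX = [[(i + 1) * (j + 2) % 7 - 3 for j in range(N)] for i in range(N)]
BLOCK = [[0] * N for _ in range(N)]
BLOCK[0][0], BLOCK[0][1], BLOCK[1][0] = 64, -12, 7  # coefficients near the DC corner

full, full_slices = first_pass(BLOCK, MATRIX, skip_empty=False)
fast, fast_slices = first_pass(BLOCK, MATRIX, skip_empty=True)
assert fast == full  # skipping empty slices does not change the result
```

With the nonzero coefficients clustered in the first rows, the skipping variant transforms fewer slices while producing identical output, which is the source of the runtime difference between subpartitions.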

Re: [libav-devel] [PATCH 2/3] arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

2016-11-30 Thread Janne Grunau
On 2016-11-28 11:26:01 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > Previously all subpartitions except the eob=1 (DC) case ran with > the same runtime: > > vp9_inv_dct_dct_16x16_sub16_add_neon: 3188.1 2435.4 2499.0 1969.0 >

Re: [libav-devel] [PATCH 1/3] arm: vp9itxfm: Only reload the idct coeffs for the iadst_idct combination

2016-11-30 Thread Janne Grunau
On 2016-11-28 11:26:00 +0200, Martin Storsjö wrote: > This avoids reloading them if they haven't been clobbered, if the > first pass also was idct. > > This is similar to what was done in the aarch64 version. > --- > libavcodec/arm/vp9itxfm_neon.S | 2 +- > 1 file changed, 1 insertion(+), 1

Re: [libav-devel] [PATCH] vp9dsp: add DC only versions for idct/idct.

2016-11-30 Thread Janne Grunau
On 2016-11-29 14:55:41 +0200, Martin Storsjö wrote: > From: Clément Bœsch > > before: > > time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null - > real 0m11.125s > user 0m11.059s > sys 0m0.050s > > time ./avconv -v 0 -nostats -threads 1 -i

Re: [libav-devel] [PATCH] pthread_frame: do not run hwaccel decoding asynchronously

2016-11-24 Thread Janne Grunau
On 2016-11-24 19:19:59 +0100, Anton Khirnov wrote: > Only allow the decoding thread to run while the user is inside a lavc > decode call (avcodec_send_packet/receive_frame). > Hardware decoding APIs are often not thread-safe, so having the user > access decoded hardware surfaces while the decoder
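A rough sketch of the constraint described above, assuming a single worker thread and a condition variable: the worker may only make progress while the caller is blocked inside the decode call, so the caller never touches a surface while decoding is in flight. The class, its fields, and the doubling "decode" step are hypothetical and only illustrate the synchronization pattern, not the pthread_frame code.

```python
import threading


class GatedDecoder:
    """Worker thread that runs only while the user is inside decode()."""

    def __init__(self):
        self._cond = threading.Condition()
        self._user_inside = False  # True only while a decode() call is active
        self._pending = None       # packet waiting to be decoded
        self._result = None
        self._closed = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        with self._cond:
            while True:
                # Sleep until closed, or until the user is inside decode() with work queued.
                self._cond.wait_for(
                    lambda: self._closed
                    or (self._user_inside and self._pending is not None)
                )
                if self._closed:
                    return
                self._result = self._pending * 2  # stand-in for real decoding
                self._pending = None
                self._cond.notify_all()

    def decode(self, packet):
        with self._cond:
            self._user_inside = True
            self._pending = packet
            self._cond.notify_all()
            # Block here until the worker has finished this packet.
            self._cond.wait_for(lambda: self._pending is None)
            self._user_inside = False
            return self._result

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._worker.join()
```

The cost of this design is that decoding no longer overlaps with the caller's own work, which is the trade-off the subject line describes.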

Re: [libav-devel] [PATCH 14/15] configure: Do not add newlines in filter()/filter_out() functions

2016-11-24 Thread Janne Grunau
On 2016-11-24 17:24:00 +0100, Diego Biurrun wrote: > --- > configure | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/configure b/configure > index a5295bf..27fb6ea 100755 > --- a/configure > +++ b/configure > @@ -430,7 +430,7 @@ filter(){ > pat=$1 > shift >

Re: [libav-devel] [PATCH 03/15] configure: Add missing asyncts filter, movie filter, and output example deps

2016-11-24 Thread Janne Grunau
On 2016-11-24 19:32:54 +0100, Diego Biurrun wrote: > On Thu, Nov 24, 2016 at 06:44:35PM +0100, Janne Grunau wrote: > > On 2016-11-24 17:23:49 +0100, Diego Biurrun wrote: > > > --- a/configure > > > +++ b/configure > > > @@ -2440,6 +2441,7

Re: [libav-devel] [PATCH 15/15] [RFC] configure: more fine-grained link-time dependency settings

2016-11-24 Thread Janne Grunau
On 2016-11-24 17:24:01 +0100, Diego Biurrun wrote: > --- > > This works as advertised. > > Issues: > > - Maybe keeping _extralibs as suffix is better than _lbs, dunno. > - Possibly I should investigate Janne's idea of using the function > name as variable name instead of adding a library name

Re: [libav-devel] [PATCH 04/15] configure: Use correct libm linker flag during math function checks

2016-11-24 Thread Janne Grunau
On 2016-11-24 17:23:50 +0100, Diego Biurrun wrote: > --- > > I suspect very many missing math functions were actually spurious test > failures related to this ... > > configure | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/configure b/configure > index

Re: [libav-devel] [PATCH 03/15] configure: Add missing asyncts filter, movie filter, and output example deps

2016-11-24 Thread Janne Grunau
On 2016-11-24 17:23:49 +0100, Diego Biurrun wrote: > --- > configure | 8 ++-- > libavfilter/vsrc_movie.c | 4 > 2 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/configure b/configure > index f204dc2..8fa2f46 100755 > --- a/configure > +++ b/configure >

Re: [libav-devel] [PATCH 01/15] configure: Remove old avisynth support leftover

2016-11-24 Thread Janne Grunau
On 2016-11-24 17:23:47 +0100, Diego Biurrun wrote: > --- > configure | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/configure b/configure > index 42c1848..78f1cac 100755 > --- a/configure > +++ b/configure > @@ -3039,7 +3039,6 @@ msvc_common_flags(){ > -mthumb)

Re: [libav-devel] [PATCHv2] aarch64: vp9itxfm: Don't repeatedly set x9 when nothing overwrites it

2016-11-23 Thread Janne Grunau
On 2016-11-24 00:09:35 +0200, Martin Storsjö wrote: > --- > libavcodec/aarch64/vp9itxfm_neon.S | 26 +++--- > 1 file changed, 15 insertions(+), 11 deletions(-) > > diff --git a/libavcodec/aarch64/vp9itxfm_neon.S > b/libavcodec/aarch64/vp9itxfm_neon.S > index 2dc6b75..f4194a6

Re: [libav-devel] [PATCH 2/2] checkasm: vp9dsp: benchmark all sub-IDCTs (but not WHT or ADST).

2016-11-23 Thread Janne Grunau
On 2016-11-18 13:57:05 +0200, Martin Storsjö wrote: > From: "Ronald S. Bultje" > > --- > tests/checkasm/vp9dsp.c | 21 ++--- > 1 file changed, 14 insertions(+), 7 deletions(-) > > diff --git a/tests/checkasm/vp9dsp.c b/tests/checkasm/vp9dsp.c > index

Re: [libav-devel] [PATCH 04/11] arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

2016-11-23 Thread Janne Grunau
On 2016-11-23 15:00:51 +0200, Martin Storsjö wrote: > This work is sponsored by, and copyright, Google. > > Previously all subpartitions except the eob=1 (DC) case ran with > the same runtime: > > vp9_inv_dct_dct_16x16_sub16_add_neon: 3189.0 2486.8 2509.9 1964.1 >

Re: [libav-devel] [PATCH 03/11] arm: vp9itxfm: Rename a macro parameter to fit better

2016-11-23 Thread Janne Grunau
On 2016-11-23 15:00:50 +0200, Martin Storsjö wrote: > Since the same parameter is used for both input and output, > the name inout is more fitting. > > This matches the naming used below in the dmbutterfly macro. > --- > libavcodec/arm/vp9itxfm_neon.S | 14 +++--- > 1 file changed, 7
