On 2019-10-02 11:53:28 +0300, Martin Storsjö wrote:
> Armasm implicitly adds IT instructions as needed. In VS 2019 16.3,
> there's a bug [1] in armasm making it fail to parse explicit IT instructions
> (but it can still add them implicitly just fine).
>
> I'm not sure if it really is worth working
On 2019-10-02 10:58:46 +0300, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index 6da37c1..b6c2786 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@ -162,6 +162,7 @@ if
On 2019-08-21 22:40:13 +0300, Martin Storsjö wrote:
> From: Peter Collingbourne
>
> As of LLVM r368102, Clang will set a pointer tag in bits 56-63 of the
> address of a global when compiling with -fsanitize=hwaddress. This requires
> an adjustment to assembly code that takes the address of such
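To illustrate the bit layout described above, here is a minimal Python sketch that splits a 64-bit address into the tag held in bits 56-63 and the remaining address bits. The function and constant names are hypothetical, and this is not the adjustment made by the patch itself, only an illustration of where the tag sits.

```python
TAG_SHIFT = 56                 # tag occupies bits 56-63, per the message above
TAG_MASK = 0xFF << TAG_SHIFT   # mask covering the top byte
ADDR_64 = 0xFFFFFFFFFFFFFFFF   # keep results within 64 bits


def split_tagged_address(addr):
    """Return (tag, untagged_address) for a 64-bit tagged address."""
    tag = (addr & TAG_MASK) >> TAG_SHIFT
    untagged = addr & ~TAG_MASK & ADDR_64
    return tag, untagged
```

For example, `split_tagged_address(0x2A00000012345678)` separates the tag byte `0x2A` from the address `0x12345678`.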
---
libavcodec/aarch64/h264dsp_init_aarch64.c | 18 ++--
libavcodec/aarch64/h264dsp_neon.S | 36 +++
2 files changed, 46 insertions(+), 8 deletions(-)
diff --git a/libavcodec/aarch64/h264dsp_init_aarch64.c
b/libavcodec/aarch64/h264dsp_init_aarch64.c
index
---
tests/checkasm/h264dsp.c | 44
1 file changed, 26 insertions(+), 18 deletions(-)
diff --git a/tests/checkasm/h264dsp.c b/tests/checkasm/h264dsp.c
index 706fc79397..ee07121ab4 100644
--- a/tests/checkasm/h264dsp.c
+++ b/tests/checkasm/h264dsp.c
@@
On 2019-01-27 11:39:13 +0100, Diego Biurrun wrote:
> On Sun, Jan 27, 2019 at 11:18:41AM +0100, Janne Grunau wrote:
> > Fixes checkasm errors after adding the h264 deblock tests.
> > ---
> > libavcodec/x86/h264_deblock.asm | 8
> > libavcodec/x86
Fixes checkasm errors after adding the h264 deblock tests.
---
libavcodec/x86/h264_deblock.asm | 8
libavcodec/x86/h264_deblock_10bit.asm | 9 +
2 files changed, 17 insertions(+)
diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
index
On 2019-01-26 23:22:42 +0200, Martin Storsjö wrote:
> On Tue, 1 Jan 2019, Janne Grunau wrote:
>
> > ---
> > tests/checkasm/h264dsp.c | 124 +++
> > 1 file changed, 124 insertions(+)
>
> This newly added test seems to fail on mac
On 2019-01-25 10:39:13 +0200, Martin Storsjö wrote:
> The "new" entry point actually has existed since OpenH264 1.4 in
> 2015, but with B-frames, this entry point is essential for actually
> getting the right frames returned and reordered.
>
> The name of this function, DecodeFrameNoDelay, is
On 2019-01-11 15:24:56 +0200, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index 4131c46..0137718 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@ -1198,7
---
libavcodec/aarch64/h264dsp_neon.S | 3 +++
1 file changed, 3 insertions(+)
diff --git a/libavcodec/aarch64/h264dsp_neon.S
b/libavcodec/aarch64/h264dsp_neon.S
index 9b4610a4d4..60ffa24500 100644
--- a/libavcodec/aarch64/h264dsp_neon.S
+++ b/libavcodec/aarch64/h264dsp_neon.S
@@ -130,6 +130,7
---
tests/checkasm/h264dsp.c | 124 +++
1 file changed, 124 insertions(+)
diff --git a/tests/checkasm/h264dsp.c b/tests/checkasm/h264dsp.c
index f355a72a74..706fc79397 100644
--- a/tests/checkasm/h264dsp.c
+++ b/tests/checkasm/h264dsp.c
@@ -28,6 +28,7 @@
c/aarch64/h264dsp_neon.S
index b649f1d018..448e575b8c 100644
--- a/libavcodec/aarch64/h264dsp_neon.S
+++ b/libavcodec/aarch64/h264dsp_neon.S
@@ -1,6 +1,7 @@
/*
* Copyright (c) 2008 Mans Rullgard
* Copyright (c) 2013 Janne Grunau
+ * Copyright (c) 2014 Janne Grunau
*
* This file
Exit as soon as possible if no filtering will be done.
Improves the checkasm --bench cycle count on a Snapdragon 820e:
h264_h_loop_filter_luma_8bpp_c: 72.4 -> 72.5
h264_h_loop_filter_luma_8bpp_neon: 97.1 -> 56.3
h264_v_loop_filter_luma_8bpp_c: 174.0 -> 173.5
On 2018-10-22 23:24:12 +0300, Martin Storsjö wrote:
> This fixes cases where the input parameter is something other than
> the currently iterated variable.
> ---
> gas-preprocessor.pl | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gas-preprocessor.pl
On 2018-10-22 23:23:38 +0300, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 9 +
> 1 file changed, 9 insertions(+)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index fd9aac8..41d7b69 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@ -27,6 +27,7 @@
On 2018-10-20 00:18:27 +0300, Martin Storsjö wrote:
> For cases like "b1b", this could previously be matched as
> $cond = " ".
>
> This fixes preprocessing with a preprocessor that preserves multiple
> consecutive spaces, like cl.exe does.
> ---
> Better fix, which also works in a number of
On 2018-10-22 12:51:47 +0300, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index 7efe3b9..669d435 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@ -1011,7
On 2018-10-12 14:43:56 +0300, Martin Storsjö wrote:
> Prior to Xcode 9.3, the clang built-in assembler didn't support
> altmacro, and gas-preprocessor was used for assembling for arm/darwin.
>
> For thumb functions, gas-preprocessor took care of adding the .thumb_func
> directives, but when now
On 2018-03-29 13:10:49 -0300, James Almer wrote:
> Use the proper names instead of numbers
>
> Signed-off-by: James Almer
> ---
> libavcodec/avcodec.h | 6 +++---
> libavcodec/libaomenc.c | 6 +++---
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git
On 2018-03-08 15:26:14 +0200, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 18 ++
> 1 file changed, 18 insertions(+)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index 9a7f6d8..5158cc7 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@
On 2018-03-08 15:26:13 +0200, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index b0c909c..9a7f6d8 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@ -1119,6 +1119,7 @@ sub
On 2018-03-06 10:58:32 +0200, Martin Storsjö wrote:
> The version of armasm64 in Visual Studio 2017 15.6 can assemble
> these just fine.
> ---
> gas-preprocessor.pl | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index
On 2018-03-06 10:58:31 +0200, Martin Storsjö wrote:
> The version of armasm64 in Visual Studio 2017 15.5 can assemble
> these just fine.
> ---
> gas-preprocessor.pl | 10 ++
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
>
On 2018-03-06 10:58:30 +0200, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index 3787756..9ff47a9 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@ -1041,7
On 2017-10-18 09:43:11 +0300, Martin Storsjö wrote:
> On Wed, 18 Oct 2017, Janne Grunau wrote:
>
> >On 2017-10-14 23:35:20 +0300, Martin Storsjö wrote:
> >>---
> >> gas-preprocessor.pl | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >
On 2017-10-16 22:38:19 +0300, Martin Storsjö wrote:
> The operand shouldn't be stored as is, but stored as 64-scale, in
> the opcode, but armasm64 fails to do this.
>
> This might be a big enough bug to report and try to get fixed, but
> that requires removing this workaround at that point.
On 2017-10-16 22:38:18 +0300, Martin Storsjö wrote:
> Also convert the register from wX into xX, since armasm fails to
> assemble it when referring to the register as wX.
> ---
> gas-preprocessor.pl | 11 +--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git
On 2017-10-16 22:38:17 +0300, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 13 +
> 1 file changed, 13 insertions(+)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index 552ed0c..b650c39 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@ -1012,6
On 2017-10-16 22:38:16 +0300, Martin Storsjö wrote:
> This can be squashed into "Add support for MS armasm64"; this
> was found while trying to build x264.
> ---
> gas-preprocessor.pl | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index
On 2017-10-16 22:38:15 +0300, Martin Storsjö wrote:
> Also update the csel pattern similarly.
>
> This is required for building x264.
> ---
> gas-preprocessor.pl | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index
On 2017-10-16 12:36:13 +0300, Martin Storsjö wrote:
> Also make a note that this conversion is necessary for armasm64.
>
> For consistency, allow local labels in all similar full-line
> conversions as well.
> ---
> gas-preprocessor.pl | 19 ++-
> 1 file changed, 10 insertions(+),
On 2017-10-14 23:35:32 +0300, Martin Storsjö wrote:
> This fixes building with armasm64 (when run through gas-preprocessor).
> ---
> libavcodec/aarch64/mpegaudiodsp_neon.S | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/aarch64/mpegaudiodsp_neon.S
>
On 2017-10-14 23:35:22 +0300, Martin Storsjö wrote:
> ---
> I haven't been able to assemble prfum instructions with armasm64 yet;
> dumpbin -disasm does disassemble the instruction correctly (e.g. from
> an object file assembled with llvm), but armasm64 doesn't support
> assembling it, either in
On 2017-10-14 23:35:20 +0300, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index 456ee24..63b0ab3 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@ -98,6 +98,7 @@ if
On 2017-10-14 23:35:19 +0300, Martin Storsjö wrote:
> Apparently, this hasn't caused any issues in practice.
> ---
> gas-preprocessor.pl | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index fe9c746..456ee24 100755
> ---
On 2017-10-14 23:35:18 +0300, Martin Storsjö wrote:
> Since we're doing a replace of a string that looks like e.g "1b"
> over a full line, such a string could conceivably be a substring of
> another identifier as well.
>
> This doesn't fix any known issue, but attempts to make this
> less
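The substring concern above can be shown with a small Python sketch. It assumes a hypothetical replacement name (tmp_label_1) and uses word boundaries as one possible way to restrict the match to a whole token; the actual change in gas-preprocessor.pl is not visible in this excerpt and may differ.

```python
import re


def replace_naive(line, ref, target):
    """Replace every occurrence of ref, even inside longer identifiers."""
    return line.replace(ref, target)


def replace_whole_token(line, ref, target):
    """Replace ref only where it stands alone as a token."""
    pattern = r"\b" + re.escape(ref) + r"\b"
    # A function is used as the replacement so target is inserted literally.
    return re.sub(pattern, lambda m: target, line)
```

With these helpers, a line such as `ldr r0, =table1b` is corrupted by the naive version but left alone by the token-aware one, while `bne 1b` is rewritten by both.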
On 2017-10-14 23:35:17 +0300, Martin Storsjö wrote:
> Since an empty condition code also is valid, this also matched for
> any other string, since it matched the empty string. By making sure
> the pattern matches the full string, we avoid that issue.
>
> Thanks to the later is_arm_register check,
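The effect of an optional condition code in an unanchored pattern can be reproduced in a few lines of Python. This is only a sketch of the general regex behaviour, with hypothetical function names; the pattern in gas-preprocessor.pl itself is Perl and is not shown in this excerpt.

```python
import re

# ARM condition codes; the trailing "?" makes the empty condition valid too.
COND = r"(?:eq|ne|cs|hs|cc|lo|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|al)?"


def is_cond_unanchored(s):
    """Matches at the start only, so the empty alternative accepts anything."""
    return re.match(COND, s) is not None


def is_cond_anchored(s):
    """Requires the whole string to be a condition code (or empty)."""
    return re.fullmatch(COND, s) is not None
```

Here `is_cond_unanchored("r0")` succeeds because the empty string matches at position 0, whereas `is_cond_anchored("r0")` rejects it and still accepts `""` and `"eq"`.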
On 2017-10-14 23:35:15 +0300, Martin Storsjö wrote:
> ---
> gas-preprocessor.pl | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
> index afdfc9e..6aae65d 100755
> --- a/gas-preprocessor.pl
> +++ b/gas-preprocessor.pl
> @@ -97,8
On 2017-08-31 12:10:56 +0300, Martin Storsjö wrote:
> In binutils 2.29, the behavior of the ADR instruction changed so that 1 is
> added to the address of a Thumb function (previously nothing was added). This
> allows the loaded address to be passed to a BLX instruction and the correct
> mode
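For context on why the low bit matters: on ARM, an interworking branch such as BLX with a register operand uses bit 0 of the target address to select the instruction set, with 1 meaning Thumb. A minimal Python sketch of that convention follows; the helper names are hypothetical and this is not code from the patch.

```python
def make_thumb_pointer(func_addr):
    """Set bit 0, as ADR does for Thumb functions per the message above."""
    return func_addr | 1


def decode_branch_target(target):
    """Return (instruction_address, mode) as an interworking branch reads it."""
    mode = "thumb" if target & 1 else "arm"
    return target & ~1, mode
```

A pointer produced by `make_thumb_pointer(0x8000)` decodes to address `0x8000` in Thumb mode, while the plain address `0x8000` decodes as ARM mode.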
On 2017-02-21 18:26:25 +0100, Diego Biurrun wrote:
> ---
>
> This was previously approved.
>
> configure | 26 +-
> 1 file changed, 9 insertions(+), 17 deletions(-)
>
> diff --git a/configure b/configure
> index 6f1be32..ef6a8e0 100755
> --- a/configure
> +++
On 2017-02-21 18:26:24 +0100, Diego Biurrun wrote:
> External dependencies cannot be handled as weak dependencies since they need
> to be explicitly enabled. If a weak dependency is set, the variable
> corresponding to the weak dependency can be enabled without the rest of the
> build system
>
On 2017-02-09 14:33:55 +0200, Martin Storsjö wrote:
> This matches the order they are in the 16 bpp version.
>
> There they are in this order, to make sure we access them in the
> same order they are declared, easing loading only half of the
> coefficients at a time.
>
> This makes the 8 bpp
On 2017-02-09 14:33:56 +0200, Martin Storsjö wrote:
> This matches the order they are in the 16 bpp version.
>
> There they are in this order, to make sure we access them in the
> same order they are declared, easing loading only half of the
> coefficients at a time.
>
> This makes the 8 bpp
On 2017-02-09 14:33:54 +0200, Martin Storsjö wrote:
> All elements are used pairwise, except for the first one.
> Previously, the 16th element was unused. Move the unused element
> to the second slot, to make the later element pairs not split
> across registers.
>
> This simplifies loading only
On 2017-02-09 14:33:53 +0200, Martin Storsjö wrote:
> All elements are used pairwise, except for the first one.
> Previously, the 16th element was unused. Move the unused element
> to the second slot, to make the later element pairs not split
> across registers.
>
> This simplifies loading only
On 2017-02-09 13:39:55 +0200, Martin Storsjö wrote:
> The idct32x32 function actually backed up and restored q4-q7 even
> though it didn't clobber them; there are plenty of registers that
> can be used to allow keeping all the idct coefficients in registers
> without having to reload different
On 2017-02-11 23:42:05 +0200, Martin Storsjö wrote:
> On Sat, 11 Feb 2017, Martin Storsjö wrote:
>
> >On Fri, 10 Feb 2017, Janne Grunau wrote:
> >
> >>On 2017-01-15 22:55:52 +0200, Martin Storsjö wrote:
> >>>For this case, with 8 inputs but only changing
On 2017-02-11 22:19:02 +0200, Martin Storsjö wrote:
> On Fri, 10 Feb 2017, Janne Grunau wrote:
>
> >On 2017-01-15 22:55:48 +0200, Martin Storsjö wrote:
> >>The theoretical maximum value of E is 193, so we can just
> >>saturate the addition to 255.
> >>
>
On 2017-02-09 13:27:04 +0200, Martin Storsjö wrote:
> The idct32x32 function actually backed up and restored d8-d15 even
... pushed onto the stack ... is imo clearer even though there are no
explicit push/pop instructions
> though it didn't clobber them; there are plenty of registers that
> can
On 2017-01-15 22:55:52 +0200, Martin Storsjö wrote:
> For this case, with 8 inputs but only changing 4 of them, we can fit
> all 16 input pixels into a q register, and still have enough temporary
> registers for doing the loop filter.
>
> The wd=8 filters would require too many temporary
On 2017-01-15 22:55:51 +0200, Martin Storsjö wrote:
> ---
> libavcodec/aarch64/vp9lpf_neon.S | 16 +---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/libavcodec/aarch64/vp9lpf_neon.S
> b/libavcodec/aarch64/vp9lpf_neon.S
> index 4553173..3894307 100644
> ---
On 2017-01-15 22:55:50 +0200, Martin Storsjö wrote:
> This adds lots of extra .ifs, but speeds it up by a couple cycles,
> by avoiding stalls.
> ---
> libavcodec/arm/vp9lpf_neon.S | 8 ++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/arm/vp9lpf_neon.S
On 2017-01-15 22:55:49 +0200, Martin Storsjö wrote:
> ---
> libavcodec/arm/vp9lpf_neon.S | 12
> 1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/libavcodec/arm/vp9lpf_neon.S b/libavcodec/arm/vp9lpf_neon.S
> index 5e154f6..9be4cef 100644
> ---
On 2017-01-15 22:55:48 +0200, Martin Storsjö wrote:
> The theoretical maximum value of E is 193, so we can just
> saturate the addition to 255.
>
> Before: Cortex A7 A8 A9 A53 A53/AArch64
> vp9_loop_filter_v_4_8_neon: 143.0 127.7 114.8 88.0
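As a reminder of what saturating an 8-bit addition means, here is a small Python sketch comparing it with a wrapping add. The helper names are hypothetical and the operand values in the usage note are made up for illustration; how the bound of 193 on E is used in the NEON code is not visible in this excerpt.

```python
U8_MAX = 255


def sat_add_u8(a, b):
    """Unsigned 8-bit add that clamps at 255 instead of overflowing."""
    return min(a + b, U8_MAX)


def wrap_add_u8(a, b):
    """Unsigned 8-bit add that wraps around modulo 256."""
    return (a + b) & 0xFF
```

When the true sum fits in 8 bits the two agree, for instance both give 150 for 100 + 50. They only differ once the sum exceeds 255, where the saturating version stays at 255.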
On 2017-01-15 22:55:47 +0200, Martin Storsjö wrote:
> Previously we first calculated hev, and then negated it.
>
> Since we were able to schedule the negation in the middle
> of another calculation, we don't see any gain in all cases.
>
> Before: Cortex A7 A8 A9
On 2017-01-05 09:35:36 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> Before: Cortex A7 A8 A9 A53
> vp9_inv_dct_dct_16x16_sub1_add_neon: 273.0 189.5 211.7 235.8
> vp9_inv_dct_dct_32x32_sub1_add_neon: 752.0
On 2017-01-02 14:17:56 +0200, Martin Storsjö wrote:
> No measured speedup on a Cortex A53, but other cores might benefit.
A little surprised that it didn't make a difference on the cortex-a53
since certain sites reported the NEON unit isn't fully 128-bit wide, so
unlikely that it makes a
On 2017-01-02 14:17:54 +0200, Martin Storsjö wrote:
> Fold the field lengths into the macro.
>
> This makes the macro invocations much more readable, when the
> lines are shorter.
>
> This also makes it easier to use only half the registers within
> the macro.
> ---
>
On 2017-01-02 14:17:55 +0200, Martin Storsjö wrote:
> Before: Cortex A7 A8 A9 A53
> vp9_put_8tap_smooth_4h_neon: 378.1 273.2 340.7 229.5
> After:
> vp9_put_8tap_smooth_4h_neon: 352.1 222.2 290.5 229.5
> ---
> libavcodec/arm/vp9mc_neon.S | 33
On 2017-02-09 14:29:56 +0200, Martin Storsjö wrote:
> ---
> libavcodec/arm/vp9itxfm_neon.S | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/libavcodec/arm/vp9itxfm_neon.S b/libavcodec/arm/vp9itxfm_neon.S
> index 167d517..3d0b0fa 100644
> ---
On 2017-02-09 09:50:48 +0200, Martin Storsjö wrote:
> On Thu, 9 Feb 2017, Janne Grunau wrote:
>
> >On 2017-02-05 14:05:49 +0200, Martin Storsjö wrote:
> >>On Sun, 5 Feb 2017, Janne Grunau wrote:
> >>
> >>>> // out1 = in1 + in2
> >>>> //
On 2017-02-06 00:16:41 +0200, Martin Storsjö wrote:
>
> Ok, so after running a slightly shorter clip (which seems to have about as
> large percentage of runtime doing IDCT as the previous one) with a bit more
> iterations, I've got the following results (the 'user' part from 'time
> avconv
On 2017-02-05 14:05:49 +0200, Martin Storsjö wrote:
> On Sun, 5 Feb 2017, Janne Grunau wrote:
>
> >> // out1 = in1 + in2
> >> // out2 = in1 - in2
> >> .macro butterfly_8h out1, out2, in1, in2
> >>@@ -463,7 +510,7 @@ function idct16x16_dc_add_neon
>
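The two comment lines quoted above (out1 = in1 + in2, out2 = in1 - in2) describe a butterfly applied lane by lane. A minimal Python sketch of that arithmetic, using plain lists of 8 values to stand in for the 8 halfword lanes; the function name is hypothetical and lane overflow is not modelled.

```python
def butterfly(in1, in2):
    """Return (out1, out2) with out1 = in1 + in2 and out2 = in1 - in2 per lane."""
    out1 = [a + b for a, b in zip(in1, in2)]
    out2 = [a - b for a, b in zip(in1, in2)]
    return out1, out2
```

Called with two 8-element lists, it returns the lane-wise sums and differences as two new lists, leaving the inputs untouched.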
On 2017-02-06 17:22:06 +0100, Diego Biurrun wrote:
> This makes the feature more visible and obvious.
> ---
>
> Changed to use _conflict instead of _not as Janne suggested.
>
> configure | 22 +-
> 1 file changed, 13 insertions(+), 9 deletions(-)
ok
Janne
On 2017-02-06 18:08:00 +0100, Diego Biurrun wrote:
> This allows distinguishing between the internal variable name for
> external libraries and the pkg-config package name. Having both
> names available avoids special-casing outside the helper function
> when the two identifiers do not match.
>
On 2017-01-24 18:12:47 +0100, Diego Biurrun wrote:
> This allows distinguishing between the internal variable name for
> external libraries and the pkg-config package name. Having both
> names available avoids special-casing outside the helper function
> when the two identifiers do not match.
>
On 2016-12-01 11:27:02 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> This makes it easier to avoid filling the temp buffer with zeros for the
> skipped slices, and leads to slightly more straightforward code for these
> cases (for the 16x16 case, where the
On 2017-02-04 23:37:37 +0200, Martin Storsjö wrote:
> This makes it more readable.
> ---
> This was suggested by Janne in a review of a patch that added a
> modified copy of this function; similar code already exists as well.
> ---
> libavcodec/arm/vp9itxfm_neon.S | 24
>
On 2016-12-01 11:27:01 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> This increases the code size of libavcodec/aarch64/vp9itxfm_neon.o
> from 14740 to 18504 bytes.
>
> Before:
> vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3
>
On 2017-02-05 00:34:16 +0200, Martin Storsjö wrote:
> On Sat, 4 Feb 2017, Janne Grunau wrote:
>
> >I'm not really sure which variant I prefer. Is the speed difference
> >measurable for idct heavy real world samples? If you have a preference for one
> >or the other varian
On 2016-12-01 11:27:00 +0200, Martin Storsjö wrote:
> This avoids concatenation, which can't be used if the whole macro
> is wrapped within another macro.
> ---
> libavcodec/aarch64/vp9itxfm_neon.S | 80
> +++---
> 1 file changed, 40 insertions(+), 40 deletions(-)
On 2016-12-01 11:26:59 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
> 19496 to 14740 bytes.
>
> This gives a small slowdown of a couple of tens of cycles, but makes
> it more feasible to
On 2017-02-03 23:44:51 +0200, Martin Storsjö wrote:
> On Fri, 3 Feb 2017, Janne Grunau wrote:
>
> >On 2016-12-01 11:26:57 +0200, Martin Storsjö wrote:
> >>This work is sponsored by, and copyright, Google.
> >>
>
> >>@@ -668,13 +756,40
On 2016-12-01 11:26:58 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> This makes it easier to avoid filling the temp buffer with zeros for the
> skipped slices, and leads to slightly more straightforward code for these
> cases (for the 16x16 case, where the
On 2016-12-01 11:26:57 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> This increases the code size of libavcodec/arm/vp9itxfm_neon.o
> from 12388 to 15064 bytes.
>
> Before: Cortex A7 A8 A9 A53
>
On 2016-12-01 11:26:56 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
> 15324 to 12388 bytes.
>
> This gives a small slowdown of a couple tens of cycles, up to around
> 150 cycles for the full
On 2016-12-22 13:07:14 +0100, Diego Biurrun wrote:
> This unclutters the top-level directory and groups related files together.
> ---
>
> Now with "avbuild" as directory to store files in instead of "build".
>
> .gitignore | 3 ++-
> Makefile |
On 2016-12-18 11:36:30 +0100, Anton Khirnov wrote:
> Calling ff_h264_field_end() when the per-field state is not properly
> initialized leads to all kinds of undefined behaviour.
>
> CC: libav-sta...@libav.org
> Bug-Id: 977 978 992
> ---
> libavcodec/h264_picture.c | 1 +
>
On 2016-12-14 09:56:40 +0100, Anton Khirnov wrote:
> Certain hardware decoding APIs are not guaranteed to be thread-safe, so
> having the user access decoded hardware surfaces while the decoder is
> running in another thread can cause failures (this is mainly known to
> happen with DXVA2).
>
>
On 2016-12-14 09:56:23 +0100, Anton Khirnov wrote:
> ---
> libavcodec/h263dec.c | 2 +-
> libavcodec/h264dec.c | 2 +-
> libavcodec/pthread_frame.c | 35 +++
> 3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/h263dec.c
The former is not an official pseudo instruction although gas and llvm's
internal assembler support it. Fixes a build error with xcode 6.2
reported by Memphiz on github.
---
libavcodec/aarch64/synth_filter_neon.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
On 2016-12-03 17:34:34 +0100, Anton Khirnov wrote:
> Certain hardware decoding APIs are often not thread-safe, so having the user
> access decoded hardware surfaces while the decoder is running in another
> thread can cause failures (this is mainly known to happen with DXVA2).
>
> For such
On 2016-12-03 17:34:33 +0100, Anton Khirnov wrote:
> ---
> libavcodec/h263dec.c | 2 +-
> libavcodec/h264dec.c | 2 +-
> libavcodec/pthread_frame.c | 27 +++
> 3 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/h263dec.c
On 2016-12-03 17:34:32 +0100, Anton Khirnov wrote:
> This makes sure ff_get_format() does not get called unnecessarily from
> update_thread_context().
> ---
> libavcodec/hevcdec.c | 49 ++---
> 1 file changed, 30 insertions(+), 19 deletions(-)
>
> diff
On 2016-11-28 11:26:02 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> Previously all subpartitions except the eob=1 (DC) case ran with
> the same runtime:
>
> vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2
> vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0
>
On 2016-11-28 11:26:01 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> Previously all subpartitions except the eob=1 (DC) case ran with
> the same runtime:
>
> vp9_inv_dct_dct_16x16_sub16_add_neon: 3188.1 2435.4 2499.0 1969.0
>
On 2016-11-28 11:26:00 +0200, Martin Storsjö wrote:
> This avoids reloading them if they haven't been clobbered, if the
> first pass also was idct.
>
> This is similar to what was done in the aarch64 version.
> ---
> libavcodec/arm/vp9itxfm_neon.S | 2 +-
> 1 file changed, 1 insertion(+), 1
On 2016-11-29 14:55:41 +0200, Martin Storsjö wrote:
> From: Clément Bœsch
>
> before:
>
> time ./avconv -v 0 -nostats -threads 1 -i sintel_vp9_500kbps.webm -f null -
> real0m11.125s
> user0m11.059s
> sys 0m0.050s
>
> time ./avconv -v 0 -nostats -threads 1 -i
On 2016-11-24 19:19:59 +0100, Anton Khirnov wrote:
> Only allow the decoding thread to run while the user is inside a lavc
> decode call (avcodec_send_packet/receive_frame).
> Hardware decoding APIs are often not thread-safe, so having the user
> access decoded hardware surfaces while the decoder
On 2016-11-24 17:24:00 +0100, Diego Biurrun wrote:
> ---
> configure | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/configure b/configure
> index a5295bf..27fb6ea 100755
> --- a/configure
> +++ b/configure
> @@ -430,7 +430,7 @@ filter(){
> pat=$1
> shift
>
On 2016-11-24 19:32:54 +0100, Diego Biurrun wrote:
> On Thu, Nov 24, 2016 at 06:44:35PM +0100, Janne Grunau wrote:
> > On 2016-11-24 17:23:49 +0100, Diego Biurrun wrote:
> > > --- a/configure
> > > +++ b/configure
> > > @@ -2440,6 +2441,7
On 2016-11-24 17:24:01 +0100, Diego Biurrun wrote:
> ---
>
> This works as advertised.
>
> Issues:
>
> - Maybe keeping _extralibs as suffix is better than _lbs, dunno.
> - Possibly I should investigate Janne's idea of using the function
> name as variable name instead of adding a library name
On 2016-11-24 17:23:50 +0100, Diego Biurrun wrote:
> ---
>
> I suspect very many missing math functions were actually spurious test
> failures related to this ...
>
> configure | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/configure b/configure
> index
On 2016-11-24 17:23:49 +0100, Diego Biurrun wrote:
> ---
> configure | 8 ++--
> libavfilter/vsrc_movie.c | 4
> 2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/configure b/configure
> index f204dc2..8fa2f46 100755
> --- a/configure
> +++ b/configure
>
On 2016-11-24 17:23:47 +0100, Diego Biurrun wrote:
> ---
> configure | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/configure b/configure
> index 42c1848..78f1cac 100755
> --- a/configure
> +++ b/configure
> @@ -3039,7 +3039,6 @@ msvc_common_flags(){
> -mthumb)
On 2016-11-24 00:09:35 +0200, Martin Storsjö wrote:
> ---
> libavcodec/aarch64/vp9itxfm_neon.S | 26 +++---
> 1 file changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/libavcodec/aarch64/vp9itxfm_neon.S
> b/libavcodec/aarch64/vp9itxfm_neon.S
> index 2dc6b75..f4194a6
On 2016-11-18 13:57:05 +0200, Martin Storsjö wrote:
> From: "Ronald S. Bultje"
>
> ---
> tests/checkasm/vp9dsp.c | 21 ++---
> 1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/tests/checkasm/vp9dsp.c b/tests/checkasm/vp9dsp.c
> index
On 2016-11-23 15:00:51 +0200, Martin Storsjö wrote:
> This work is sponsored by, and copyright, Google.
>
> Previously all subpartitions except the eob=1 (DC) case ran with
> the same runtime:
>
> vp9_inv_dct_dct_16x16_sub16_add_neon: 3189.0 2486.8 2509.9 1964.1
>
On 2016-11-23 15:00:50 +0200, Martin Storsjö wrote:
> Since the same parameter is used for both input and output,
> the name inout is more fitting.
>
> This matches the naming used below in the dmbutterfly macro.
> ---
> libavcodec/arm/vp9itxfm_neon.S | 14 +++---
> 1 file changed, 7