On Sun, 3 Jan 2016, Derek Buitenhuis wrote:
It serves absolutely no purpose other than to confuse potential
Android developers about how to use hardware acceleration properly
on the platform. Both stagefright itself, and MediaCodec, have
avcodec backends already, and this is the correct
This fixes trac issue #5417.
This is cherry-picked from libav commit
d825b1a5306576dcd0553b7d0d24a3a46ad92864.
---
Updated the commit message to mention the ticket number.
---
libavcodec/libopenh264dec.c | 2 ++
libavcodec/libopenh264enc.c | 26 --
2 files changed, 26
This is cherrypicked from libav, from commits
82b7525173f20702a8cbc26ebedbf4b69b8fecec and
d0b1e6049b06ca146ece4d2f199c5dba1565.
---
Fixed the issues pointed out by Michael, removed the parts of the
commit message as requested by Carl.
---
Changelog | 1 +
configure
On Tue, 26 Jul 2016, Michael Niedermayer wrote:
On Tue, Jul 26, 2016 at 09:31:17PM +0300, Martin Storsjö wrote:
This is cherrypicked from libav, from commits
82b7525173f20702a8cbc26ebedbf4b69b8fecec and
d0b1e6049b06ca146ece4d2f199c5dba1565.
---
Fixed the issues pointed out by Michael
While it is less featureful (and slower) than the built-in H264
decoder, one could potentially want to use it to take advantage
of the Cisco patent license offer.
This is cherrypicked from libav, from commits
82b7525173f20702a8cbc26ebedbf4b69b8fecec and
d0b1e6049b06ca146ece4d2f199c5dba1565.
This is cherry-picked from libav commit
d825b1a5306576dcd0553b7d0d24a3a46ad92864.
---
libavcodec/libopenh264dec.c | 2 ++
libavcodec/libopenh264enc.c | 26 --
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/libavcodec/libopenh264dec.c
On Thu, 19 Jan 2017, Michael Niedermayer wrote:
On Wed, Jan 18, 2017 at 11:45:08PM +0200, Martin Storsjö wrote:
This work is sponsored by, and copyright, Google.
This is more in line with how it will be extended for more bitdepths.
---
libavcodec/arm/vp9dsp_init_arm.c | 24
This work is sponsored by, and copyright, Google.
This is more in line with how it will be extended for more bitdepths.
---
libavcodec/aarch64/vp9dsp_init_aarch64.c | 24 +---
1 file changed, 9 insertions(+), 15 deletions(-)
diff --git
This work is sponsored by, and copyright, Google.
This is similar to the arm version, but due to the larger registers
on aarch64, we can do 8 pixels at a time for all filter sizes.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
ARM
This work is sponsored by, and copyright, Google.
The plain pixel put/copy functions are used from the 8 bit version,
for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new
copy128 is added.
Compared with the 8 bit version, the filters can no longer use the
trick to accumulate in 16
This work is sponsored by, and copyright, Google.
This is structured similarly to the 8 bit version. In the 8 bit
version, the coefficients are 16 bits, and intermediates are 32 bits.
Here, the coefficients are 32 bit. For the 4x4 transforms for 10 bit
content, the intermediates also fit in 32
This work is sponsored by, and copyright, Google.
This has mostly got the same differences to the 8 bit version as
in the arm version. For the horizontal filters, we do 16 pixels
in parallel as well. For the 8 pixel wide vertical filters, we can
accumulate 4 rows before storing, just as in the 8
This work is sponsored by, and copyright, Google.
Compared to the arm version, on aarch64 we can keep the full 8x8
transform in registers, and for 16x16 and 32x32, we can process
it in slices of 4 pixels instead of 2.
Examples of runtimes vs the 32 bit version, on a Cortex A53:
This work is sponsored by, and copyright, Google.
This is pretty much similar to the 8 bpp version, but in some senses
simpler. All input pixels are 16 bits, and all intermediates also fit
in 16 bits, so there's no lengthening/narrowing in the filter at all.
For the full 16 pixel wide filter, we
This work is sponsored by, and copyright, Google.
This is more in line with how it will be extended for more bitdepths.
---
libavcodec/arm/vp9dsp_init_arm.c | 24 +---
1 file changed, 9 insertions(+), 15 deletions(-)
diff --git a/libavcodec/arm/vp9dsp_init_arm.c
On Mon, 14 Nov 2016, Ronald S. Bultje wrote:
Hi,
On Mon, Nov 14, 2016 at 5:32 AM, Martin Storsjö <mar...@martin.st> wrote:
Make them aligned, to allow efficient access to them from simd.
This is an adapted cherry-pick from libav commit
a4cfcddcb0f76e837d5abc06840c2b26c0
This work is sponsored by, and copyright, Google.
The implementation tries to have smart handling of cases
where no pixels need the full filtering for the 8/16 width
filters, skipping both calculation and writeback of the
unmodified pixels in those cases. The actual effect of this
is hard to test
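The skip logic described above can be sketched in scalar C (a hypothetical model, not the actual NEON implementation; the threshold test and the averaging filter body are placeholders for the real loop filter math):

```c
#include <stdlib.h>

/* Hypothetical scalar sketch of the early-skip idea: before running the
 * expensive filter, test whether any pixel pair in the block actually
 * exceeds the edge threshold. If none does, skip both the computation
 * and the writeback, leaving the destination untouched. */
static int any_pixel_needs_filtering(const unsigned char *p, int n, int threshold)
{
    for (int i = 0; i < n; i++)
        if (abs(p[2 * i] - p[2 * i + 1]) > threshold)
            return 1;
    return 0;
}

static void filter_edge(unsigned char *p, int n, int threshold)
{
    if (!any_pixel_needs_filtering(p, n, threshold))
        return;                 /* fast path: nothing is modified */
    for (int i = 0; i < n; i++) /* placeholder for the real filter */
        p[2 * i] = (p[2 * i] + p[2 * i + 1] + 1) / 2;
}
```

In the NEON version the "needs filtering" test is a vector comparison whose result is reduced to a single flag, so whole blocks of unmodified pixels avoid both calculation and stores.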
This work is sponsored by, and copyright, Google.
These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the loop filters with
16 pixels at a time. The implementation is fully templated, with
a single macro which can generate versions for both 8 and
This work is sponsored by, and copyright, Google.
These are ported from the ARM version; it is essentially a 1:1
port with no extra added features, but with some hand tuning
(especially for the plain copy/avg functions). The ARM version
isn't very register starved to begin with, so there's not
This work is sponsored by, and copyright, Google.
The filter coefficients are signed values, where the product of the
multiplication with one individual filter coefficient doesn't
overflow a 16 bit signed value (the largest filter coefficient is
127). But when the products are accumulated, the
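The overflow concern can be modeled in scalar C (an illustrative sketch; the rounding constant and final clipping follow the usual 8-tap convolution pattern, and are assumptions rather than details from this commit):

```c
#include <stdint.h>

/* Illustrative scalar model (not the actual NEON code): each product
 * pixel * coefficient fits in a signed 16-bit value (the largest
 * coefficient is 127), but the sum of the eight taps can exceed it,
 * so the accumulator must be wider than 16 bits. */
static int apply_8tap(const uint8_t *src, const int8_t *filter)
{
    int32_t sum = 0;            /* widen before accumulating */
    for (int i = 0; i < 8; i++)
        sum += src[i] * filter[i];
    /* round, shift and clip back to the 8-bit pixel range */
    sum = (sum + 64) >> 7;
    if (sum < 0)   sum = 0;
    if (sum > 255) sum = 255;
    return (int)sum;
}
```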
With Apple tools, the linker fails with errors like these, if the
offset is negative:
ld: in section __TEXT,__text reloc 8: symbol index out of range for
architecture arm64
This is cherry-picked from libav commit
c44a8a3eabcd6acd2ba79f32ec8a432e6ebe552c.
---
libavutil/aarch64/asm.S | 14
This work is sponsored by, and copyright, Google.
For the transforms up to 8x8, we can fit all the data (including
temporaries) in registers and just do a straightforward transform
of all the data. For 16x16, we do a transform of 4x16 pixels in
4 slices, using a temporary buffer. For 32x32, we
We reset .Lpic_gp to zero at the start of each function, which means
that the logic within movrelx for clearing gp when necessary will
be missed.
This fixes using movrelx in different functions with a different
helper register.
This is cherry-picked from libav commit
This work is sponsored by, and copyright, Google.
These are ported from the ARM version; thanks to the larger
amount of registers available, we can do the 16x16 and 32x32
transforms in slices 8 pixels wide instead of 4. This gives
a speedup of around 1.4x compared to the 32 bit version.
The fact
Make them aligned, to allow efficient access to them from simd.
This is an adapted cherry-pick from libav commit
a4cfcddcb0f76e837d5abc06840c2b26c0e8aefc.
---
libavcodec/vp9dsp.c | 56 +++
libavcodec/vp9dsp.h | 3 +++
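In plain C11 the alignment change amounts to something like the following (a sketch; FFmpeg itself uses its DECLARE_ALIGNED macro, and the table values here are made up for illustration):

```c
#include <stdint.h>

/* 16-byte alignment lets SIMD code use aligned vector loads on the
 * table. FFmpeg expresses this with DECLARE_ALIGNED(16, ...); in
 * standard C11 the equivalent is _Alignas. The values below are
 * placeholders, not the real VP9 filter coefficients. */
static _Alignas(16) const int16_t vp9_filter_row[8] = {
    -1, 3, -10, 122, 18, -6, 2, 0
};

static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}
```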
From: Janne Grunau
Since aarch64 has enough free general purpose registers use them to
branch to the appropriate storage code. 1-2 cycles faster for the
functions using loop_filter 8/16, ... on a cortex-a53. Mixed results
(up to 2 cycles faster/slower) on a cortex-a57.
This work is sponsored by, and copyright, Google.
Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:
vp9_inv_dct_dct_16x16_sub16_add_neon: 1373.2
vp9_inv_dct_dct_32x32_sub32_add_neon: 8089.0
By skipping individual 8x16 or 8x32 pixel slices in the first pass,
This work is sponsored by, and copyright, Google.
Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:
Cortex                                    A7      A8      A9     A53
vp9_inv_dct_dct_16x16_sub16_add_neon: 3188.1  2435.4  2499.0  1969.0
Since the same parameter is used for both input and output,
the name inout is more fitting.
This matches the naming used below in the dmbutterfly macro.
This is cherrypicked from libav commit
79566ec8c77969d5f9be533de04b1349834cca62.
---
libavcodec/arm/vp9itxfm_neon.S | 14 +++---
1
This is cherrypicked from libav commit
721bc37522c5c1d6a8c3cea5e9c3fcde8d256c05.
---
libavcodec/aarch64/vp9itxfm_neon.S | 16
libavcodec/arm/vp9itxfm_neon.S | 8
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/libavcodec/aarch64/vp9itxfm_neon.S
This is cherrypicked from libav commit
85ad5ea72ce3983947a3b07e4b35c66cb16dfaba.
---
libavcodec/aarch64/vp9mc_neon.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavcodec/aarch64/vp9mc_neon.S b/libavcodec/aarch64/vp9mc_neon.S
index 69dad6d..80d1d23 100644
---
From: Janne Grunau
This is one instruction less for thumb, and leaves only one or two
arm/thumb specific instructions.
This is cherrypicked from libav commit
e5b0fc170f85b00f7dd0ac514918fb5c95253d39.
---
libavcodec/arm/vp9itxfm_neon.S | 28
1
This is cherrypicked from libav commit
65074791e8f8397600aacc9801efdd1eb6e3.
---
libavcodec/aarch64/vp9dsp_init_aarch64.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/libavcodec/aarch64/vp9dsp_init_aarch64.c
b/libavcodec/aarch64/vp9dsp_init_aarch64.c
index
This avoids reloading them if they haven't been clobbered, if the
first pass also was idct.
This is similar to what was done in the aarch64 version.
This is cherrypicked from libav commit
3c87039a404c5659ae9bf7454a04e186532eb40b.
---
libavcodec/arm/vp9itxfm_neon.S | 2 +-
1 file changed, 1
This is cherrypicked from libav commit
2f99117f6ff24ce5be2abb9e014cb8b86c2aa0e0.
---
libavcodec/aarch64/vp9itxfm_neon.S | 26 +++---
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/libavcodec/aarch64/vp9itxfm_neon.S
b/libavcodec/aarch64/vp9itxfm_neon.S
index
The clobbering tests in checkasm are only invoked when testing
correctness, so this bug didn't show up when benchmarking the
dc-only version.
This is cherrypicked from libav commit
4d960a11855f4212eb3a4e470ce890db7f01df29.
---
libavcodec/aarch64/vp9itxfm_neon.S | 8
1 file changed, 4
From: Janne Grunau
The latter is 1 cycle faster on a cortex-a53 and since the operands are
bytewise (or larger) bitmask (impossible to overflow to zero) both are
equivalent.
This is cherrypicked from libav commit
e7ae8f7a715843a5089d18e033afb3ee19ab3057.
---
This is cherrypicked from libav commit
c536e5e8698110c139b1c17938998a5547550aa3.
---
libavcodec/arm/vp9mc_neon.S | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/libavcodec/arm/vp9mc_neon.S b/libavcodec/arm/vp9mc_neon.S
index 5fe3024..83235ff 100644
---
This reduces the number of lines and reduces the duplication.
Also simplify the eob check for the half case.
If we are in the half case, we know we will at least need to do the
first three slices; we only need to check eob for the fourth one,
so we can hardcode the value to check against instead
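The hardcoded eob check can be sketched as (a hypothetical scalar model; the threshold constant is invented for illustration, the real value comes from the min_eob table in the assembly):

```c
/* In the half case the first three 4-pixel slices are always needed,
 * so only the fourth slice's threshold has to be checked, and it can
 * be a hardcoded immediate instead of a load from the min_eob array. */
static int slices_needed_half(int eob)
{
    const int fourth_slice_min_eob = 36; /* assumed value, not from the source */
    return eob >= fourth_slice_min_eob ? 4 : 3;
}
```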
This makes the code slightly clearer, but doesn't make any functional
difference.
---
libavcodec/arm/vp9itxfm_16bpp_neon.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavcodec/arm/vp9itxfm_16bpp_neon.S
b/libavcodec/arm/vp9itxfm_16bpp_neon.S
index e6e9440..a92f323
Keep the idct32 coefficients in narrow form in q6-q7, and idct16
coefficients in lengthened 32 bit form in q0-q3. Avoid clobbering
q0-q3 in the pass1 function, and squeeze the idct16 coefficients
into q0-q1 in the pass2 function to avoid reloading them.
The idct16 coefficients are clobbered and
In the half/quarter cases where we don't use the min_eob array, defer
loading the pointer until we know it will be needed.
This is cherrypicked from libav commit
3a0d5e206d24d41d87a25ba16a79b2ea04c39d4c.
---
libavcodec/aarch64/vp9itxfm_neon.S | 3 ++-
libavcodec/arm/vp9itxfm_neon.S | 4 ++--
This allows reusing the macro for a separate implementation of the
pass2 function.
---
libavcodec/aarch64/vp9itxfm_16bpp_neon.S | 98
1 file changed, 49 insertions(+), 49 deletions(-)
diff --git a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
---
libavcodec/arm/vp9itxfm_16bpp_neon.S | 20 ++--
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/libavcodec/arm/vp9itxfm_16bpp_neon.S
b/libavcodec/arm/vp9itxfm_16bpp_neon.S
index a92f323..9c02ed9 100644
--- a/libavcodec/arm/vp9itxfm_16bpp_neon.S
+++
---
libavcodec/aarch64/vp9itxfm_16bpp_neon.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
b/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
index f53e94a..f80604f 100644
--- a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
+++
This work is sponsored by, and copyright, Google.
This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
This gives a pretty substantial speedup for the smaller subpartitions.
This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/arm/vp9itxfm_16bpp_neon.o from
17500 to 14516 bytes.
This gives a small slowdown of a couple tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible
This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/aarch64/vp9itxfm_16bpp_neon.o from
26288 to 21512 bytes.
This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.
Before:
This makes the code a bit more readable.
---
libavcodec/aarch64/vp9itxfm_16bpp_neon.S | 24
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
b/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
index f80604f..86ea29e 100644
This work is sponsored by, and copyright, Google.
This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
This gives a pretty substantial speedup for the smaller subpartitions.
Align the second/third operands as they usually are.
Due to the wildly varying sizes of the written out operands
in aarch64 assembly, the column alignment is usually not as clear
as in arm assembly.
This is cherrypicked from libav commit
7995ebfad12002033c73feed422a1cfc62081e8f.
---
This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.
---
libavcodec/aarch64/vp9itxfm_16bpp_neon.S | 90
1 file changed, 45 insertions(+), 45 deletions(-)
diff --git a/libavcodec/aarch64/vp9itxfm_16bpp_neon.S
This work is sponsored by, and copyright, Google.
This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
This gives a pretty substantial speedup for the smaller subpartitions.
This work is sponsored by, and copyright, Google.
This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
This gives a pretty substantial speedup for the smaller subpartitions.
No measured speedup on a Cortex A53, but other cores might benefit.
This is cherrypicked from libav commit
388e0d2515bc6bbc9d0c9af1d230bd16cf945fe7.
---
libavcodec/aarch64/vp9mc_neon.S | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git
Before:                      Cortex A7      A8      A9     A53
vp9_put_8tap_smooth_4h_neon:    378.1   273.2   340.7   229.5
After:
vp9_put_8tap_smooth_4h_neon:    352.1   222.2   290.5   229.5
This is cherrypicked from libav commit
fea92a4b57d1c328b1de226a5f213a629ee63754.
---
This is cherrypicked from libav commit
0c0b87f12d48d4e7f0d3d13f9345e828a3a5ea32.
---
libavcodec/aarch64/vp9itxfm_neon.S | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/libavcodec/aarch64/vp9itxfm_neon.S
b/libavcodec/aarch64/vp9itxfm_neon.S
index 5219d6e..6bb097b 100644
This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/aarch64/vp9itxfm_neon.o from
19496 to 14740 bytes.
This gives a small slowdown of a couple of tens of cycles, but makes
it more feasible to add more optimized versions of these transforms.
Before:
This makes it more readable.
This is cherrypicked from libav commit
3bc5b28d5a191864c54bba60646933a63da31656.
---
libavcodec/arm/vp9itxfm_neon.S | 24
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/libavcodec/arm/vp9itxfm_neon.S
This avoids concatenation, which can't be used if the whole macro
is wrapped within another macro.
This is also arguably more readable.
This is cherrypicked from libav commit
58d87e0f49bcbbc6f426328f53b657bae7430cd2.
---
libavcodec/aarch64/vp9itxfm_neon.S | 80
All elements are used pairwise, except for the first one.
Previously, the 16th element was unused. Move the unused element
to the second slot, to make the later element pairs not split
across registers.
This simplifies loading only parts of the coefficients,
reducing the difference to the 16 bpp
All elements are used pairwise, except for the first one.
Previously, the 16th element was unused. Move the unused element
to the second slot, to make the later element pairs not split
across registers.
This simplifies loading only parts of the coefficients,
reducing the difference to the 16 bpp
Fold the field lengths into the macro.
This makes the macro invocations much more readable, when the
lines are shorter.
This also makes it easier to use only half the registers within
the macro.
This is cherrypicked from libav commit
5e0c2158fbc774f87d3ce4b7b950ba4d42c4a7b8.
---
This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.
This is cherrypicked from libav commit
b0806088d3b27044145b20421da8d39089ae0c6a.
---
libavcodec/aarch64/vp9lpf_neon.S | 14 +++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git
The idct32x32 function actually pushed q4-q7 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.
Since the
The theoretical maximum value of E is 193, so we can just
saturate the addition to 255.
Before:                     Cortex A7      A8      A9    A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:     143.0   127.7   114.8   88.0         87.7
vp9_loop_filter_v_8_8_neon:     241.0   197.2   173.7
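A scalar model of the saturating-add trick (the NEON code would use an unsigned saturating add such as uqadd; this sketch only shows why clamping at 255 is safe given the stated bound E ≤ 193):

```c
#include <stdint.h>

/* Scalar model of an unsigned saturating 8-bit add: since the
 * theoretical maximum of E is 193, any overflow past 255 carries no
 * information, so the wider addition can be replaced by a saturating
 * one that stays in 8 bits. */
static uint8_t sat_add_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return sum > 255 ? 255 : (uint8_t)sum;
}
```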
This fixes building with clang for linux with PIC enabled.
This is cherrypicked from libav commit
8847eeaa14189885038140fb2b8a7adc7100.
---
libavutil/aarch64/asm.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavutil/aarch64/asm.S b/libavutil/aarch64/asm.S
index
This is cherrypicked from libav commit
07b5136c481d394992c7e951967df0cfbb346c0b.
---
libavcodec/aarch64/vp9lpf_neon.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavcodec/aarch64/vp9lpf_neon.S b/libavcodec/aarch64/vp9lpf_neon.S
index cd3e26c..ebfd9be 100644
---
The idct32x32 function actually pushed d8-d15 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.
After
This is one cycle faster in total, and three instructions fewer.
Before:
vp9_loop_filter_mix2_v_44_16_neon: 123.2
After:
vp9_loop_filter_mix2_v_44_16_neon: 122.2
This is cherrypicked from libav commit
3bf9c48320f25f3d5557485b0202f22ae60748b0.
---
libavcodec/aarch64/vp9lpf_neon.S | 21
For this case, with 8 inputs but only changing 4 of them, we can fit
all 16 input pixels into a q register, and still have enough temporary
registers for doing the loop filter.
The wd=8 filters would require too many temporary registers for
processing all 16 pixels at once though.
Before:
This matches the order they are in the 16 bpp version.
There they are in this order, to make sure we access them in the
same order they are declared, easing loading only half of the
coefficients at a time.
This makes the 8 bpp version match the 16 bpp version better.
This is cherrypicked from
This matches the order they are in the 16 bpp version.
There they are in this order, to make sure we access them in the
same order they are declared, easing loading only half of the
coefficients at a time.
This makes the 8 bpp version match the 16 bpp version better.
This is cherrypicked from
This work is sponsored by, and copyright, Google.
This reduces the code size of libavcodec/arm/vp9itxfm_neon.o from
15324 to 12388 bytes.
This gives a small slowdown of a couple tens of cycles, up to around
150 cycles for the full case of the largest transform, but makes
it more feasible to add
This adds lots of extra .ifs, but speeds it up by a couple cycles,
by avoiding stalls.
This is cherrypicked from libav commit
e18c39005ad1dbb178b336f691da1de91afd434e.
---
libavcodec/arm/vp9lpf_neon.S | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git
This is cherrypicked from libav commit
435cd7bc99671bf561193421a50ac6e9d63c4266.
---
libavcodec/arm/vp9lpf_neon.S | 12
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/libavcodec/arm/vp9lpf_neon.S b/libavcodec/arm/vp9lpf_neon.S
index 2761956..3d289e5 100644
---
This allows reusing the macro for a separate implementation of the
pass2 function.
This is cherrypicked from libav commit
79d332ebbde8c0a3e9da094dcfd10abd33ba7378.
---
libavcodec/aarch64/vp9itxfm_neon.S | 90 +++---
1 file changed, 45 insertions(+), 45
This allows reusing the macro for a separate implementation of the
pass2 function.
This is cherrypicked from libav commit
47b3c2c18d1897f3c753ba0cec4b2d7aa24526af.
---
libavcodec/arm/vp9itxfm_neon.S | 72 +-
1 file changed, 36 insertions(+), 36
This work is sponsored by, and copyright, Google.
Before:                              Cortex A7      A8      A9     A53
vp9_inv_dct_dct_16x16_sub1_add_neon:    273.0   189.5   211.7   235.8
vp9_inv_dct_dct_32x32_sub1_add_neon:    752.0   459.2   862.2   553.9
After:
This work is sponsored by, and copyright, Google.
Before: Cortex A53
vp9_inv_dct_dct_16x16_sub1_add_neon: 235.3
vp9_inv_dct_dct_32x32_sub1_add_neon: 555.1
After:
vp9_inv_dct_dct_16x16_sub1_add_neon: 180.2
vp9_inv_dct_dct_32x32_sub1_add_neon: 475.3
This is
This is cherrypicked from libav commit
8476eb0d3ab1f7a52317b23346646389c08fb57a.
---
libavcodec/aarch64/vp9itxfm_neon.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/libavcodec/aarch64/vp9itxfm_neon.S
b/libavcodec/aarch64/vp9itxfm_neon.S
index 3b34749..5219d6e 100644
The ld1r is a leftover from the arm version, where this trick is
beneficial on some cores.
Use a single-lane load where we don't need the semantics of ld1r.
This is cherrypicked from libav commit
ed8d293306e12c9b79022d37d39f48825ce7f2fa.
---
libavcodec/aarch64/vp9itxfm_neon.S | 16
This is cherrypicked from libav commit
3933b86bb93aca47f29fbd493075b0f110c1e3f5.
---
libavcodec/arm/vp9itxfm_neon.S | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/libavcodec/arm/vp9itxfm_neon.S b/libavcodec/arm/vp9itxfm_neon.S
index 33a7af1..78fdae6 100644
---
This is cherrypicked from libav commit
3dd7827258ddaa2e51085d0c677d6f3b1be3572f.
---
libavcodec/aarch64/vp9itxfm_neon.S | 16
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/libavcodec/aarch64/vp9itxfm_neon.S
b/libavcodec/aarch64/vp9itxfm_neon.S
index
Previously we first calculated hev, and then negated it.
Since we were able to schedule the negation in the middle
of another calculation, we don't see any gain in all cases.
Before:                     Cortex A7      A8  A9  A53  A53/AArch64
vp9_loop_filter_v_4_8_neon:     147.0   129.0
This is cherrypicked from libav commit
4da4b2b87f08a1331650c7e36eb7d4029a160776.
---
libavcodec/aarch64/vp9itxfm_neon.S | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/libavcodec/aarch64/vp9itxfm_neon.S
b/libavcodec/aarch64/vp9itxfm_neon.S
index 3eb999a..df178d2 100644
---
Hi Jorge,
On Mon, 7 Aug 2017, Jorge Ramirez wrote:
On 08/03/2017 01:53 AM, Mark Thompson wrote:
+    default:
+        return 0;
+    }
+
+    SET_V4L_EXT_CTRL(value, qmin, avctx->qmin, "minimum video quantizer scale");
+    SET_V4L_EXT_CTRL(value, qmax, avctx->qmax, "maximum video
From: Memphiz
Properly use the b.eq/b.ge forms instead of the nonstandard forms
(which both gas and newer clang accept though), and expand the
register list that used a range (which the Xcode 6.2 clang, based
on clang 3.5 svn, didn't support).
This is cherrypicked from libav
From: Memphiz
Properly use the b.eq form instead of the nonstandard form (which
both gas and newer clang accept though), and expand the register
lists that used a range (which the Xcode 6.2 clang, based on clang
3.5 svn, didn't support).
---
Vanilla clang supports altmacro since clang 5.0, and thus doesn't
require gas-preprocessor for building the arm assembly any longer.
However, the built-in assembler doesn't support .dn directives.
This readds checks that were removed in d7320ca3ed10f0d, when
the last usage of .dn directives
Clang supports the macro expansion counter (used for making unique
labels within macro expansions), but not when targeting darwin.
Convert uses of the counter into normal local labels, as used
elsewhere.
Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using
When targeting darwin, clang requires commas between arguments,
while the no-comma form is allowed for other targets.
Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.
---
libavcodec/arm/hevcdsp_deblock_neon.S | 8
1 file
On Sat, 31 Mar 2018, Hendrik Leppkes wrote:
On Fri, Mar 30, 2018 at 9:14 PM, Martin Storsjö <mar...@martin.st> wrote:
Clang supports the macro expansion counter (used for making unique
labels within macro expansions), but not when targeting darwin.
Convert uses of the counter into
---
Removed the option and made this behaviour the default.
---
libavformat/flv.h| 1 +
libavformat/flvdec.c | 18 ++
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/libavformat/flv.h b/libavformat/flv.h
index 3aabb3adc9..3571b90279 100644
---
On Sun, 28 Oct 2018, Michael Niedermayer wrote:
On Sat, Oct 27, 2018 at 09:22:18PM +0300, Martin Storsjö wrote:
On Sat, 27 Oct 2018, Michael Niedermayer wrote:
On Thu, Oct 25, 2018 at 03:59:17PM +0300, Martin Storsjö wrote:
---
libavformat/flv.h| 1 +
libavformat/flvdec.c | 21
On Mon, 29 Oct 2018, Derek Buitenhuis wrote:
On 29/10/2018 14:10, Martin Storsjö wrote:
I don't understand why this is being used in favour of a proper
pointer field? An integer field is just asking to be misused.
Even the doxygen is really sketchy on it.
It's essentially meant to be used
On Mon, 29 Oct 2018, Derek Buitenhuis wrote:
On 25/10/2018 13:58, Martin Storsjö wrote:
+x4->nb_reordered_opaque = x264_encoder_maximum_delayed_frames(x4->enc) + 1;
Is it possible this changes when the encoder is reconfigured (e.g. to
interlaced)?
Good point. I'm sure it's po
On Wed, 31 Oct 2018, Derek Buitenhuis wrote:
On 30/10/2018 19:49, Martin Storsjö wrote:
Hmm, that might make sense, but with a little twist. The max reordered
frames for H.264 is known, but onto that you also get more delay due to
frame threads and other details that this function within x264
On Thu, 1 Nov 2018, Derek Buitenhuis wrote:
On 31/10/2018 21:41, Martin Storsjö wrote:
Even though we do allow reconfiguration, it doesn't look like we support
changing any parameters which would actually affect the delay, only RC
method and targets (CRF, bitrate, etc). So given
On Tue, 30 Oct 2018, Derek Buitenhuis wrote:
On 29/10/2018 21:06, Martin Storsjö wrote:
As I guess there can be old frames in flight, the only safe option is to
enlarge, not to shrink it. But in case a realloc moves the array, the old
pointers end up pretty useless.
Just always allocate