Re: [libav-devel] [PATCH] asfdec: fix reading files larger than 2GB
On 24/02/2017 01:05, John Stebbins wrote:
> avio_skip returns file position and overflows int
> ---
>  libavformat/asfdec.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/libavformat/asfdec.c b/libavformat/asfdec.c
> index 34730b2..10d3396 100644
> --- a/libavformat/asfdec.c
> +++ b/libavformat/asfdec.c
> @@ -976,7 +976,8 @@ static int asf_read_simple_index(AVFormatContext *s, const GUIDParseTable *g)
>      uint64_t interval; // index entry time interval in 100 ns units, usually it's 1s
>      uint32_t pkt_num, nb_entries;
>      int32_t prev_pkt_num = -1;
> -    int i, ret;
> +    int i;
> +    int64_t ret;
>      uint64_t size = avio_rl64(pb);
>
>      // simple index objects should be ordered by stream number, this loop tries to find

Sounds good. I haven't looked at the code, but it might be clearer to use a second variable.

lu
___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH] asfdec: fix reading files larger than 2GB
avio_skip returns file position and overflows int
---
 libavformat/asfdec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavformat/asfdec.c b/libavformat/asfdec.c
index 34730b2..10d3396 100644
--- a/libavformat/asfdec.c
+++ b/libavformat/asfdec.c
@@ -976,7 +976,8 @@ static int asf_read_simple_index(AVFormatContext *s, const GUIDParseTable *g)
     uint64_t interval; // index entry time interval in 100 ns units, usually it's 1s
     uint32_t pkt_num, nb_entries;
     int32_t prev_pkt_num = -1;
-    int i, ret;
+    int i;
+    int64_t ret;
     uint64_t size = avio_rl64(pb);

     // simple index objects should be ordered by stream number, this loop tries to find
--
2.9.3
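For readers following the thread, here is a minimal sketch of the failure mode the commit message describes. It is not libav code: fake_skip() is a hypothetical stand-in for a call that, like avio_skip as described above, reports the new file position on success and a negative value on error. It also assumes a platform where int is 32 bits wide.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for a skip call: returns the new absolute
 * position on success, or a negative value on error. */
static int64_t fake_skip(int64_t pos, int64_t offset)
{
    if (offset < 0)
        return -1;
    return pos + offset;
}

/* Nonzero when a 64-bit position survives being stored in an int.
 * With a 32-bit int this is false for anything at or past 2 GiB. */
static int fits_in_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Keeping the result in an int64_t means a large but valid position
 * is never mistaken for a negative error code. */
static int skip_ok(int64_t pos, int64_t offset)
{
    int64_t ret = fake_skip(pos, offset);
    return ret >= 0;
}
```

A position of 2147483648 fails fits_in_int() on such a platform, while skip_ok() still reports success for it because the comparison is done at 64 bits.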
Re: [libav-devel] [PATCH 2/6] arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit
On Thu, 23 Feb 2017, Janne Grunau wrote:

> On 2017-02-11 22:19:02 +0200, Martin Storsjö wrote:
>> On Fri, 10 Feb 2017, Janne Grunau wrote:
>>
>>> On 2017-01-15 22:55:48 +0200, Martin Storsjö wrote:
>>>> The theoretical maximum value of E is 193, so we can just
>>>> saturate the addition to 255.
>>>>
>>>> Before:                        Cortex A7      A8      A9     A53  A53/AArch64
>>>> vp9_loop_filter_v_4_8_neon:        143.0   127.7   114.8    88.0         87.7
>>>> vp9_loop_filter_v_8_8_neon:        241.0   197.2   173.7   140.0        136.7
>>>> vp9_loop_filter_v_16_8_neon:       497.0   419.5   379.7   293.0        275.7
>>>> vp9_loop_filter_v_16_16_neon:      965.2   818.7   731.4   579.0        452.0
>>>> After:
>>>> vp9_loop_filter_v_4_8_neon:        136.0   125.7   112.6    84.0         83.0
>>>> vp9_loop_filter_v_8_8_neon:        234.0   195.5   171.5   136.0        133.7
>>>> vp9_loop_filter_v_16_8_neon:       490.0   417.5   377.7   289.0        271.0
>>>> vp9_loop_filter_v_16_16_neon:      951.2   814.7   732.3   571.0        446.7
>>>> ---
>>>>  libavcodec/aarch64/vp9lpf_neon.S | 40 +---
>>>>  libavcodec/arm/vp9lpf_neon.S     | 11 +--
>>>>  2 files changed, 14 insertions(+), 37 deletions(-)
>>>>
>>>> diff --git a/libavcodec/aarch64/vp9lpf_neon.S b/libavcodec/aarch64/vp9lpf_neon.S
>>>> index 3b8e6eb..4553173 100644
>>>> --- a/libavcodec/aarch64/vp9lpf_neon.S
>>>> +++ b/libavcodec/aarch64/vp9lpf_neon.S
>>>> @@ -51,13 +51,6 @@
>>>>  // see the arm version instead.
>>>>
>>>>
>>>> -.macro uabdl_sz dst1, dst2, in1, in2, sz
>>>> -        uabdl \dst1, \in1\().8b, \in2\().8b
>>>> -.ifc \sz, .16b
>>>> -        uabdl2 \dst2, \in1\().16b, \in2\().16b
>>>> -.endif
>>>> -.endm
>>>> -
>>>>  .macro add_sz dst1, dst2, in1, in2, in3, in4, sz
>>>>          add \dst1, \in1, \in3
>>>>  .ifc \sz, .16b
>>>> @@ -86,20 +79,6 @@
>>>>  .endif
>>>>  .endm
>>>>
>>>> -.macro cmhs_sz dst1, dst2, in1, in2, in3, in4, sz
>>>> -        cmhs \dst1, \in1, \in3
>>>> -.ifc \sz, .16b
>>>> -        cmhs \dst2, \in2, \in4
>>>> -.endif
>>>> -.endm
>>>> -
>>>> -.macro xtn_sz dst, in1, in2, sz
>>>> -        xtn \dst\().8b, \in1
>>>> -.ifc \sz, .16b
>>>> -        xtn2 \dst\().16b, \in2
>>>> -.endif
>>>> -.endm
>>>> -
>>>>  .macro usubl_sz dst1, dst2, in1, in2, sz
>>>>          usubl \dst1, \in1\().8b, \in2\().8b
>>>>  .ifc \sz, .16b
>>>> @@ -179,20 +158,20 @@
>>>>  // tmpq2 == tmp3 + tmp4, etc.
>>>>  .macro loop_filter wd, sz, mix, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp8
>>>>  .if \mix == 0
>>>> -        dup v0.8h, w2 // E
>>>> -        dup v1.8h, w2 // E
>>>> +        dup v0\sz, w2 // E
>>>>          dup v2\sz, w3 // I
>>>>          dup v3\sz, w4 // H
>>>>  .else
>>>> -        dup v0.8h, w2 // E
>>>> +        dup v0.8b, w2 // E
>>>>          dup v2.8b, w3 // I
>>>>          dup v3.8b, w4 // H
>>>> +        lsr w5, w2, #8
>>>>          lsr w6, w3, #8
>>>>          lsr w7, w4, #8
>>>> -        ushr v1.8h, v0.8h, #8 // E
>>>> +        dup v1.8b, w5 // E
>>>>          dup v4.8b, w6 // I
>>>> -        bic v0.8h, #255, lsl 8 // E
>>>>          dup v5.8b, w7 // H
>>>> +        trn1 v0.2d, v0.2d, v1.2d
>>>
>>> isn't this equivalent to
>>>
>>>     dup v0.8h, w2
>>>     uzp1 v0.16b, v0.16b, v0.16b
>>>
>>> on little endian?
>>
>> Nice idea, but it isn't quite as straightforward on aarch64 - on arm it
>> would have been.
>
> gah, yes.
>
>> All the even values will be output in the output registers of uzp1, so
>> you need uzp2 as well.
>>
>> So instead of this as we have now:
>>
>>     dup v0.8b, w2
>>     lsr w5, w2, #8
>>     dup v1.8b, w5
>>     trn1 v0.2d, v0.2d, v1.2d
>>
>> We could do:
>>
>>     dup v0.8h, w2
>>     uzp2 v1.16b, v0.16b, v0.16b
>>     uzp1 v0.16b, v0.16b, v0.16b
>>     trn1 v0.2d, v0.2d, v1.2d
>
>     rev16 v1.16b, v0.16b // or ext ..x or any other instruction
>     uzp1 v0.16b, v0.16b, v1.16b
>
> is one instruction less but also not straight forward

Neat, thanks! This turns out to be one cycle faster in total, and three instructions less.

I'll push that as a separate patch since it changes the existing ones quite a bit as well, not just the registers touched by this patch.

// Martin
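The equivalence discussed in this thread can be checked with a small scalar model. The sketch below is only an illustration, assuming little-endian lane numbering and modelling each AArch64 instruction as a plain C function on a 16-byte array; the type and function names (vec128, split_e_current, split_e_suggested, and so on) are hypothetical and do not come from the patch.

```c
#include <stdint.h>
#include <string.h>

/* A 128-bit vector register modelled as 16 byte lanes (lane 0 = lowest). */
typedef struct { uint8_t b[16]; } vec128;

/* dup vd.8b, w: low byte of w in the 8 low lanes, upper half zeroed. */
static vec128 dup_8b(uint32_t w)
{
    vec128 r;
    memset(r.b, 0, sizeof(r.b));
    memset(r.b, (uint8_t)w, 8);
    return r;
}

/* dup vd.8h, w: low 16 bits of w replicated into all 8 halfword lanes. */
static vec128 dup_8h(uint32_t w)
{
    vec128 r;
    for (int i = 0; i < 8; i++) {
        r.b[2 * i]     = (uint8_t)w;
        r.b[2 * i + 1] = (uint8_t)(w >> 8);
    }
    return r;
}

/* trn1 vd.2d, vn.2d, vm.2d: low doubleword of n, then low doubleword of m. */
static vec128 trn1_2d(vec128 n, vec128 m)
{
    vec128 r;
    memcpy(r.b, n.b, 8);
    memcpy(r.b + 8, m.b, 8);
    return r;
}

/* rev16 vd.16b, vn.16b: swap the two bytes inside each halfword. */
static vec128 rev16_16b(vec128 n)
{
    vec128 r;
    for (int i = 0; i < 8; i++) {
        r.b[2 * i]     = n.b[2 * i + 1];
        r.b[2 * i + 1] = n.b[2 * i];
    }
    return r;
}

/* uzp1 vd.16b, vn.16b, vm.16b: even lanes of n, then even lanes of m. */
static vec128 uzp1_16b(vec128 n, vec128 m)
{
    vec128 r;
    for (int i = 0; i < 8; i++) {
        r.b[i]     = n.b[2 * i];
        r.b[i + 8] = m.b[2 * i];
    }
    return r;
}

/* Sequence from the patch: dup, lsr, dup, trn1. */
static vec128 split_e_current(uint32_t w2)
{
    return trn1_2d(dup_8b(w2), dup_8b(w2 >> 8));
}

/* Sequence suggested in the review: dup .8h, rev16, uzp1. */
static vec128 split_e_suggested(uint32_t w2)
{
    vec128 v0 = dup_8h(w2);
    return uzp1_16b(v0, rev16_16b(v0));
}

static int same_lanes(vec128 a, vec128 b)
{
    return memcmp(a.b, b.b, sizeof(a.b)) == 0;
}

static uint8_t lane(vec128 v, int i)
{
    return v.b[i];
}
```

In this model both sequences leave the low byte of w2 in lanes 0-7 and the second byte in lanes 8-15, which is the layout the mix2 filters need for the two E values. The model says nothing about cycle counts.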
Re: [libav-devel] [PATCH 2/3] configure: Simplify dlopen check
On 2017-02-21 18:26:25 +0100, Diego Biurrun wrote:
> ---
>
> This was previously approved.
>
>  configure | 26 +-
>  1 file changed, 9 insertions(+), 17 deletions(-)
>
> diff --git a/configure b/configure
> index 6f1be32..ef6a8e0 100755
> --- a/configure
> +++ b/configure
> @@ -1608,7 +1608,6 @@ SYSTEM_FUNCS="
>      CommandLineToArgvW
>      CoTaskMemFree
>      CryptGenRandom
> -    dlopen
>      fcntl
>      flt_lim
>      fork
> @@ -2218,10 +2217,8 @@ wmv3_vaapi_hwaccel_select="vc1_vaapi_hwaccel"
>  wmv3_vdpau_hwaccel_select="vc1_vdpau_hwaccel"
>
>  # hardware-accelerated codecs
> -nvenc_deps_any="dlopen LoadLibrary"
> -nvenc_extralibs='$ldl'
> -omx_deps="dlopen pthreads"
> -omx_extralibs='$ldl'
> +nvenc_deps_any="libdl LoadLibrary"
> +omx_deps="libdl pthreads"
>  omx_rpi_select="omx"
>  qsvdec_select="qsv"
>  qsvenc_select="qsv"
> @@ -2280,7 +2277,7 @@ mjpeg2jpeg_bsf_select="jpegtables"
>
>  # external libraries
>  avisynth_deps="LoadLibrary"
> -avxsynth_deps="dlopen"
> +avxsynth_deps="libdl"
>  avisynth_demuxer_deps_any="avisynth avxsynth"
>  avisynth_demuxer_select="riffdec"
>  libdcadec_decoder_deps="libdcadec"
> @@ -2472,10 +2469,8 @@ deinterlace_qsv_filter_deps="libmfx"
>  deinterlace_vaapi_filter_deps="vaapi"
>  delogo_filter_deps="gpl"
>  drawtext_filter_deps="libfreetype"
> -frei0r_filter_deps="frei0r dlopen"
> -frei0r_filter_extralibs='$ldl'
> -frei0r_src_filter_deps="frei0r dlopen"
> -frei0r_src_filter_extralibs='$ldl'
> +frei0r_filter_deps="frei0r libdl"
> +frei0r_src_filter_deps="frei0r libdl"
>  hdcd_filter_deps="libhdcd"
>  hqdn3d_filter_deps="gpl"
>  interlace_filter_deps="gpl"
> @@ -4461,12 +4456,6 @@ check_code cc arm_neon.h "int16x8_t test = vdupq_n_s16(0)" && enable intrinsics_
>
>  check_ldflags -Wl,--as-needed
>
> -if check_func dlopen; then
> -    ldl=
> -elif check_func dlopen -ldl; then
> -    ldl=-ldl
> -fi
> -
>  if ! disabled network; then
>      check_func getaddrinfo $network_extralibs
>      check_func inet_aton $network_extralibs
> @@ -4638,6 +4627,9 @@ enabled pthreads &&
>  disabled zlib || check_lib zlib zlib.h zlibVersion -lz
>  disabled bzlib || check_lib bzlib bzlib.h BZ2_bzlibVersion -lbz2
>
> +# On some systems dynamic loading requires no extra linker flags
> +check_lib libdl dlfcn.h dlopen || check_lib libdl dlfcn.h dlopen -ldl
> +
>  check_lib libm math.h sin -lm && LIBM="-lm"
>
>  atan2f_args=2
> @@ -4650,7 +4642,7 @@ done
>
>  # these are off by default, so fail if requested and not available
>  enabled avisynth && require_header avisynth/avisynth_c.h
> -enabled avxsynth && require avxsynth "avxsynth/avxsynth_c.h dlfcn.h" dlopen -ldl
> +enabled avxsynth && require_header avxsynth/avxsynth_c.h
>  enabled cuda && require cuda cuda.h cuInit -lcuda
>  enabled frei0r && require_header frei0r.h
>  enabled gnutls && require_pkg_config gnutls gnutls gnutls/gnutls.h gnutls_global_init

ok

Janne
Re: [libav-devel] [PATCH 1/3] Revert "configure: Add proper weak dependency of drawtext filter on libfontconfig"
On 2017-02-21 18:26:24 +0100, Diego Biurrun wrote:
> External dependencies cannot be handled as weak dependencies since they need
> to be explicitly enabled. If a weak dependency is set, the variable corresponding
> to the weak dependency can be enabled without the rest of the build system
> settings, resulting in a failing build.
>
> This reverts commit 66988320794a107f2a460eaa71dbd9fab8056842.
> ---
>  configure | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/configure b/configure
> index 24e9fc3..6f1be32 100755
> --- a/configure
> +++ b/configure
> @@ -2472,7 +2472,6 @@ deinterlace_qsv_filter_deps="libmfx"
>  deinterlace_vaapi_filter_deps="vaapi"
>  delogo_filter_deps="gpl"
>  drawtext_filter_deps="libfreetype"
> -drawtext_filter_suggest="libfontconfig"
>  frei0r_filter_deps="frei0r dlopen"
>  frei0r_filter_extralibs='$ldl'
>  frei0r_src_filter_deps="frei0r dlopen"

ok

Janne
Re: [libav-devel] [PATCH 3/4] arm: vp9itxfm: Reorder iadst16 coeffs
On 2017-02-09 14:33:55 +0200, Martin Storsjö wrote:
> This matches the order they are in the 16 bpp version.
>
> There they are in this order, to make sure we access them in the
> same order they are declared, easing loading only half of the
> coefficients at a time.
>
> This makes the 8 bpp version match the 16 bpp version better.
> ---
>  libavcodec/arm/vp9itxfm_neon.S | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/libavcodec/arm/vp9itxfm_neon.S b/libavcodec/arm/vp9itxfm_neon.S
> index f74d542..c8eeb76 100644
> --- a/libavcodec/arm/vp9itxfm_neon.S
> +++ b/libavcodec/arm/vp9itxfm_neon.S
> @@ -37,8 +37,8 @@ idct_coeffs:
>  endconst
>
>  const iadst16_coeffs, align=4
> -        .short 16364, 804, 15893, 3981, 14811, 7005, 13160, 9760
> -        .short 11003, 12140, 8423, 14053, 5520, 15426, 2404, 16207
> +        .short 16364, 804, 15893, 3981, 11003, 12140, 8423, 14053
> +        .short 14811, 7005, 13160, 9760, 5520, 15426, 2404, 16207
>  endconst
>
>  @ Do four 4x4 transposes, using q registers for the subtransposes that don't
> @@ -672,19 +672,19 @@ function iadst16
>          vld1.16 {q0-q1}, [r12,:128]
>
>          mbutterfly_l q3, q2, d31, d16, d0[1], d0[0] @ q3 = t1, q2 = t0
> -        mbutterfly_l q5, q4, d23, d24, d2[1], d2[0] @ q5 = t9, q4 = t8
> +        mbutterfly_l q5, q4, d23, d24, d1[1], d1[0] @ q5 = t9, q4 = t8
>          butterfly_n d31, d24, q3, q5, q6, q5 @ d31 = t1a, d24 = t9a
>          mbutterfly_l q7, q6, d29, d18, d0[3], d0[2] @ q7 = t3, q6 = t2
>          butterfly_n d16, d23, q2, q4, q3, q4 @ d16 = t0a, d23 = t8a
>
> -        mbutterfly_l q3, q2, d21, d26, d2[3], d2[2] @ q3 = t11, q2 = t10
> +        mbutterfly_l q3, q2, d21, d26, d1[3], d1[2] @ q3 = t11, q2 = t10
>          butterfly_n d29, d26, q7, q3, q4, q3 @ d29 = t3a, d26 = t11a
> -        mbutterfly_l q5, q4, d27, d20, d1[1], d1[0] @ q5 = t5, q4 = t4
> +        mbutterfly_l q5, q4, d27, d20, d2[1], d2[0] @ q5 = t5, q4 = t4
>          butterfly_n d18, d21, q6, q2, q3, q2 @ d18 = t2a, d21 = t10a
>
>          mbutterfly_l q7, q6, d19, d28, d3[1], d3[0] @ q7 = t13, q6 = t12
>          butterfly_n d20, d28, q5, q7, q2, q7 @ d20 = t5a, d28 = t13a
> -        mbutterfly_l q3, q2, d25, d22, d1[3], d1[2] @ q3 = t7, q2 = t6
> +        mbutterfly_l q3, q2, d25, d22, d2[3], d2[2] @ q3 = t7, q2 = t6
>          butterfly_n d27, d19, q4, q6, q5, q6 @ d27 = t4a, d19 = t12a
>
>          mbutterfly_l q5, q4, d17, d30, d3[3], d3[2] @ q5 = t15, q4 = t14

ok

Janne
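The reordering in this patch amounts to swapping the second and third group of four coefficients, which is why every d1[n] reference becomes d2[n] and vice versa. A small sketch, using the two tables exactly as they appear in the diff; the helper names new_index and reorder_is_consistent are made up for illustration, and the code assumes each d register holds four consecutive 16-bit coefficients.

```c
#include <stdint.h>

/* iadst16 coefficients in the order used before the patch. */
static const int16_t iadst16_old[16] = {
    16364,   804, 15893,  3981, 14811,  7005, 13160,  9760,
    11003, 12140,  8423, 14053,  5520, 15426,  2404, 16207
};

/* The same coefficients in the order used after the patch. */
static const int16_t iadst16_new[16] = {
    16364,   804, 15893,  3981, 11003, 12140,  8423, 14053,
    14811,  7005, 13160,  9760,  5520, 15426,  2404, 16207
};

/* Map a flat index into the old table to the matching index in the
 * new one: groups 1 and 2 (registers d1 and d2) trade places. */
static int new_index(int old_index)
{
    int d    = old_index / 4;
    int lane = old_index % 4;

    if (d == 1)
        d = 2;
    else if (d == 2)
        d = 1;
    return d * 4 + lane;
}

/* Returns 1 when every old coefficient sits at its remapped position. */
static int reorder_is_consistent(void)
{
    for (int i = 0; i < 16; i++)
        if (iadst16_new[new_index(i)] != iadst16_old[i])
            return 0;
    return 1;
}
```

For example, the t9/t8 butterfly used d2[1] and d2[0] (flat indices 9 and 8) before the patch; new_index() maps those to 5 and 4, i.e. d1[1] and d1[0], matching the changed line in the diff.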
Re: [libav-devel] [PATCH 4/4] aarch64: vp9itxfm: Reorder iadst16 coeffs
On 2017-02-09 14:33:56 +0200, Martin Storsjö wrote:
> This matches the order they are in the 16 bpp version.
>
> There they are in this order, to make sure we access them in the
> same order they are declared, easing loading only half of the
> coefficients at a time.
>
> This makes the 8 bpp version match the 16 bpp version better.
> ---
>  libavcodec/aarch64/vp9itxfm_neon.S | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/libavcodec/aarch64/vp9itxfm_neon.S b/libavcodec/aarch64/vp9itxfm_neon.S
> index f87f6bd..7b7dbd4 100644
> --- a/libavcodec/aarch64/vp9itxfm_neon.S
> +++ b/libavcodec/aarch64/vp9itxfm_neon.S
> @@ -37,8 +37,8 @@ idct_coeffs:
>  endconst
>
>  const iadst16_coeffs, align=4
> -        .short 16364, 804, 15893, 3981, 14811, 7005, 13160, 9760
> -        .short 11003, 12140, 8423, 14053, 5520, 15426, 2404, 16207
> +        .short 16364, 804, 15893, 3981, 11003, 12140, 8423, 14053
> +        .short 14811, 7005, 13160, 9760, 5520, 15426, 2404, 16207
>  endconst
>
>  // out1 = ((in1 + in2) * d0[0] + (1 << 13)) >> 14
> @@ -622,19 +622,19 @@ function iadst16
>          ld1 {v0.8h,v1.8h}, [x11]
>
>          dmbutterfly_l v6, v7, v4, v5, v31, v16, v0.h[1], v0.h[0] // v6,v7 = t1, v4,v5 = t0
> -        dmbutterfly_l v10, v11, v8, v9, v23, v24, v1.h[1], v1.h[0] // v10,v11 = t9, v8,v9 = t8
> +        dmbutterfly_l v10, v11, v8, v9, v23, v24, v0.h[5], v0.h[4] // v10,v11 = t9, v8,v9 = t8
>          dbutterfly_n v31, v24, v6, v7, v10, v11, v12, v13, v10, v11 // v31 = t1a, v24 = t9a
>          dmbutterfly_l v14, v15, v12, v13, v29, v18, v0.h[3], v0.h[2] // v14,v15 = t3, v12,v13 = t2
>          dbutterfly_n v16, v23, v4, v5, v8, v9, v6, v7, v8, v9 // v16 = t0a, v23 = t8a
>
> -        dmbutterfly_l v6, v7, v4, v5, v21, v26, v1.h[3], v1.h[2] // v6,v7 = t11, v4,v5 = t10
> +        dmbutterfly_l v6, v7, v4, v5, v21, v26, v0.h[7], v0.h[6] // v6,v7 = t11, v4,v5 = t10
>          dbutterfly_n v29, v26, v14, v15, v6, v7, v8, v9, v6, v7 // v29 = t3a, v26 = t11a
> -        dmbutterfly_l v10, v11, v8, v9, v27, v20, v0.h[5], v0.h[4] // v10,v11 = t5, v8,v9 = t4
> +        dmbutterfly_l v10, v11, v8, v9, v27, v20, v1.h[1], v1.h[0] // v10,v11 = t5, v8,v9 = t4
>          dbutterfly_n v18, v21, v12, v13, v4, v5, v6, v7, v4, v5 // v18 = t2a, v21 = t10a
>
>          dmbutterfly_l v14, v15, v12, v13, v19, v28, v1.h[5], v1.h[4] // v14,v15 = t13, v12,v13 = t12
>          dbutterfly_n v20, v28, v10, v11, v14, v15, v4, v5, v14, v15 // v20 = t5a, v28 = t13a
> -        dmbutterfly_l v6, v7, v4, v5, v25, v22, v0.h[7], v0.h[6] // v6,v7 = t7, v4,v5 = t6
> +        dmbutterfly_l v6, v7, v4, v5, v25, v22, v1.h[3], v1.h[2] // v6,v7 = t7, v4,v5 = t6
>          dbutterfly_n v27, v19, v8, v9, v12, v13, v10, v11, v12, v13 // v27 = t4a, v19 = t12a
>
>          dmbutterfly_l v10, v11, v8, v9, v17, v30, v1.h[7], v1.h[6] // v10,v11 = t15, v8,v9 = t14

ok

Janne
Re: [libav-devel] [PATCH 2/4] aarch64: vp9itxfm: Reorder the idct coefficients for better pairing
On 2017-02-09 14:33:54 +0200, Martin Storsjö wrote:
> All elements are used pairwise, except for the first one.
> Previously, the 16th element was unused. Move the unused element
> to the second slot, to make the later element pairs not split
> across registers.
>
> This simplifies loading only parts of the coefficients,
> reducing the difference to the 16 bpp version.
> ---
>  libavcodec/aarch64/vp9itxfm_neon.S | 124 ++---
>  1 file changed, 62 insertions(+), 62 deletions(-)

ok

Janne
Re: [libav-devel] [PATCH 1/4] arm: vp9itxfm: Reorder the idct coefficients for better pairing
On 2017-02-09 14:33:53 +0200, Martin Storsjö wrote:
> All elements are used pairwise, except for the first one.
> Previously, the 16th element was unused. Move the unused element
> to the second slot, to make the later element pairs not split
> across registers.
>
> This simplifies loading only parts of the coefficients,
> reducing the difference to the 16 bpp version.
> ---
> The 16 bpp version is only in ffmpeg for now, since libav's vp9
> decoder doesn't support the high bitdepth profiles. This change
> in itself still makes sense to do though.
> ---
>  libavcodec/arm/vp9itxfm_neon.S | 124 -
>  1 file changed, 62 insertions(+), 62 deletions(-)

ok

Janne
Re: [libav-devel] [PATCH] arm: vp9itxfm: Avoid reloading the idct32 coefficients
On 2017-02-09 13:39:55 +0200, Martin Storsjö wrote:
> The idct32x32 function actually backed up and restored q4-q7 even
> though it didn't clobber them; there are plenty of registers that
> can be used to allow keeping all the idct coefficients in registers
> without having to reload different subsets of them at different
> stages in the transform.
>
> Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
> q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
> in the idct16 function), and the lanewise vmul needs a register in
> the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5
> while doing idct16.
>
> While keeping these coefficients in registers, we still can skip backing
> up and restoring q7.
>
> Before:                              Cortex A7       A8       A9      A53
> vp9_inv_dct_dct_32x32_sub32_add_neon:  18553.8  17182.7  14303.3  12089.7
> After:
> vp9_inv_dct_dct_32x32_sub32_add_neon:  18470.3  16717.7  14173.6  11860.8
> ---
>  libavcodec/arm/vp9itxfm_neon.S | 246 -
>  1 file changed, 120 insertions(+), 126 deletions(-)

ok

Janne
Re: [libav-devel] [PATCH 6/6] arm: vp9lpf: Implement the mix2_44 function with one single filter pass
On 2017-02-11 23:42:05 +0200, Martin Storsjö wrote:
> On Sat, 11 Feb 2017, Martin Storsjö wrote:
>
>> On Fri, 10 Feb 2017, Janne Grunau wrote:
>>
>>> On 2017-01-15 22:55:52 +0200, Martin Storsjö wrote:
>>>> For this case, with 8 inputs but only changing 4 of them, we can fit
>>>> all 16 input pixels into a q register, and still have enough temporary
>>>> registers for doing the loop filter.
>>>>
>>>> The wd=8 filters would require too many temporary registers for
>>>> processing all 16 pixels at once though.
>>>>
>>>> Before:                           Cortex A7     A8     A9    A53
>>>> vp9_loop_filter_mix2_v_44_16_neon:    289.7  256.2  237.5  181.2
>>>> After:
>>>> vp9_loop_filter_mix2_v_44_16_neon:    221.2  150.5  177.7  138.0
>>>> ---
>>>>  libavcodec/arm/vp9dsp_init_arm.c | 7 +-
>>>>  libavcodec/arm/vp9lpf_neon.S     | 191 +++
>>>>  2 files changed, 195 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/libavcodec/arm/vp9dsp_init_arm.c b/libavcodec/arm/vp9dsp_init_arm.c
>>>> index e99d931..1ede170 100644
>>>> --- a/libavcodec/arm/vp9dsp_init_arm.c
>>>> +++ b/libavcodec/arm/vp9dsp_init_arm.c
>>>> @@ -194,6 +194,8 @@ define_loop_filters(8, 8);
>>>>  define_loop_filters(16, 8);
>>>>  define_loop_filters(16, 16);
>>>>
>>>> +define_loop_filters(44, 16);
>>>> +
>>>>  #define lf_mix_fn(dir, wd1, wd2, stridea) \
>>>>  static void loop_filter_##dir##_##wd1##wd2##_16_neon(uint8_t *dst, \
>>>>                                                       ptrdiff_t stride, \
>>>> @@ -207,7 +209,6 @@ static void loop_filter_##dir##_##wd1##wd2##_16_neon(uint8_t *dst,
>>>>  lf_mix_fn(h, wd1, wd2, stride) \
>>>>  lf_mix_fn(v, wd1, wd2, sizeof(uint8_t))
>>>>
>>>> -lf_mix_fns(4, 4)
>>>>  lf_mix_fns(4, 8)
>>>>  lf_mix_fns(8, 4)
>>>>  lf_mix_fns(8, 8)
>>>> @@ -227,8 +228,8 @@ static av_cold void vp9dsp_loopfilter_init_arm(VP9DSPContext *dsp)
>>>>      dsp->loop_filter_16[0] = ff_vp9_loop_filter_h_16_16_neon;
>>>>      dsp->loop_filter_16[1] = ff_vp9_loop_filter_v_16_16_neon;
>>>>
>>>> -    dsp->loop_filter_mix2[0][0][0] = loop_filter_h_44_16_neon;
>>>> -    dsp->loop_filter_mix2[0][0][1] = loop_filter_v_44_16_neon;
>>>> +    dsp->loop_filter_mix2[0][0][0] = ff_vp9_loop_filter_h_44_16_neon;
>>>> +    dsp->loop_filter_mix2[0][0][1] = ff_vp9_loop_filter_v_44_16_neon;
>>>>      dsp->loop_filter_mix2[0][1][0] = loop_filter_h_48_16_neon;
>>>>      dsp->loop_filter_mix2[0][1][1] = loop_filter_v_48_16_neon;
>>>>      dsp->loop_filter_mix2[1][0][0] = loop_filter_h_84_16_neon;
>>>> diff --git a/libavcodec/arm/vp9lpf_neon.S b/libavcodec/arm/vp9lpf_neon.S
>>>> index e31c807..12984a9 100644
>>>> --- a/libavcodec/arm/vp9lpf_neon.S
>>>> +++ b/libavcodec/arm/vp9lpf_neon.S
>>>> @@ -44,6 +44,109 @@
>>>>          vtrn.8 \r2, \r3
>>>>  .endm
>>>>
>>>> +@ The input to and output from this macro is in the registers q8-q15,
>>>> +@ and q0-q7 are used as scratch registers.
>>>> +@ p3 = q8, p0 = q11, q0 = q12, q3 = q15
>>>> +.macro loop_filter_q
>>>> +        vdup.u8 d0, r2 @ E
>>>> +        lsr r2, r2, #8
>>>> +        vdup.u8 d2, r3 @ I
>>>> +        lsr r3, r3, #8
>>>> +        vdup.u8 d1, r2 @ E
>>>> +        vdup.u8 d3, r3 @ I
>>
>> I tried implementing your suggestion with uzp here, but it ended up being
>> slower actually. With the version of the patch I posted here:
>>
>> vp9_loop_filter_mix2_v_44_16_neon:    221.2  150.5  185.0  139.0
>>
>> With this block replaced with this:
>>
>>         vdup.u16 q0, r2 @ E
>>         vdup.u16 q1, r3 @ I
>>         vuzp.u8 d0, d1 @ E
>>         vuzp.u8 d2, d3 @ I
>>
>> I get the following:
>>
>> vp9_loop_filter_mix2_v_44_16_neon:    223.2  150.5  186.1  142.0
>>
>> I.e. 1-3 cycles slower on A7, A9 and A53, identical on A8.
>
> If I move the two vuzp further down, I get the following:
>
> vp9_loop_filter_mix2_v_44_16_neon:    223.2  148.5  185.1  141.0
>
> I.e. +2 on A7, -2 on A8, 0 on A9, +2 on A53. So on average it's still worse,
> even though it codewise is neater.

leave it as it was then

Janne
Re: [libav-devel] [PATCH 2/6] arm/aarch64: vp9lpf: Keep the comparison to E within 8 bit
On 2017-02-11 22:19:02 +0200, Martin Storsjö wrote:
> On Fri, 10 Feb 2017, Janne Grunau wrote:
>
>> On 2017-01-15 22:55:48 +0200, Martin Storsjö wrote:
>>> The theoretical maximum value of E is 193, so we can just
>>> saturate the addition to 255.
>>>
>>> Before:                        Cortex A7      A8      A9     A53  A53/AArch64
>>> vp9_loop_filter_v_4_8_neon:        143.0   127.7   114.8    88.0         87.7
>>> vp9_loop_filter_v_8_8_neon:        241.0   197.2   173.7   140.0        136.7
>>> vp9_loop_filter_v_16_8_neon:       497.0   419.5   379.7   293.0        275.7
>>> vp9_loop_filter_v_16_16_neon:      965.2   818.7   731.4   579.0        452.0
>>> After:
>>> vp9_loop_filter_v_4_8_neon:        136.0   125.7   112.6    84.0         83.0
>>> vp9_loop_filter_v_8_8_neon:        234.0   195.5   171.5   136.0        133.7
>>> vp9_loop_filter_v_16_8_neon:       490.0   417.5   377.7   289.0        271.0
>>> vp9_loop_filter_v_16_16_neon:      951.2   814.7   732.3   571.0        446.7
>>> ---
>>>  libavcodec/aarch64/vp9lpf_neon.S | 40 +---
>>>  libavcodec/arm/vp9lpf_neon.S     | 11 +--
>>>  2 files changed, 14 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/libavcodec/aarch64/vp9lpf_neon.S b/libavcodec/aarch64/vp9lpf_neon.S
>>> index 3b8e6eb..4553173 100644
>>> --- a/libavcodec/aarch64/vp9lpf_neon.S
>>> +++ b/libavcodec/aarch64/vp9lpf_neon.S
>>> @@ -51,13 +51,6 @@
>>>  // see the arm version instead.
>>>
>>>
>>> -.macro uabdl_sz dst1, dst2, in1, in2, sz
>>> -        uabdl \dst1, \in1\().8b, \in2\().8b
>>> -.ifc \sz, .16b
>>> -        uabdl2 \dst2, \in1\().16b, \in2\().16b
>>> -.endif
>>> -.endm
>>> -
>>>  .macro add_sz dst1, dst2, in1, in2, in3, in4, sz
>>>          add \dst1, \in1, \in3
>>>  .ifc \sz, .16b
>>> @@ -86,20 +79,6 @@
>>>  .endif
>>>  .endm
>>>
>>> -.macro cmhs_sz dst1, dst2, in1, in2, in3, in4, sz
>>> -        cmhs \dst1, \in1, \in3
>>> -.ifc \sz, .16b
>>> -        cmhs \dst2, \in2, \in4
>>> -.endif
>>> -.endm
>>> -
>>> -.macro xtn_sz dst, in1, in2, sz
>>> -        xtn \dst\().8b, \in1
>>> -.ifc \sz, .16b
>>> -        xtn2 \dst\().16b, \in2
>>> -.endif
>>> -.endm
>>> -
>>>  .macro usubl_sz dst1, dst2, in1, in2, sz
>>>          usubl \dst1, \in1\().8b, \in2\().8b
>>>  .ifc \sz, .16b
>>> @@ -179,20 +158,20 @@
>>>  // tmpq2 == tmp3 + tmp4, etc.
>>>  .macro loop_filter wd, sz, mix, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp8
>>>  .if \mix == 0
>>> -        dup v0.8h, w2 // E
>>> -        dup v1.8h, w2 // E
>>> +        dup v0\sz, w2 // E
>>>          dup v2\sz, w3 // I
>>>          dup v3\sz, w4 // H
>>>  .else
>>> -        dup v0.8h, w2 // E
>>> +        dup v0.8b, w2 // E
>>>          dup v2.8b, w3 // I
>>>          dup v3.8b, w4 // H
>>> +        lsr w5, w2, #8
>>>          lsr w6, w3, #8
>>>          lsr w7, w4, #8
>>> -        ushr v1.8h, v0.8h, #8 // E
>>> +        dup v1.8b, w5 // E
>>>          dup v4.8b, w6 // I
>>> -        bic v0.8h, #255, lsl 8 // E
>>>          dup v5.8b, w7 // H
>>> +        trn1 v0.2d, v0.2d, v1.2d
>>
>> isn't this equivalent to
>>
>>     dup v0.8h, w2
>>     uzp1 v0.16b, v0.16b, v0.16b
>>
>> on little endian?
>
> Nice idea, but it isn't quite as straightforward on aarch64 - on arm it
> would have been.

gah, yes.

> All the even values will be output in the output registers of uzp1, so
> you need uzp2 as well.
>
> So instead of this as we have now:
>
>     dup v0.8b, w2
>     lsr w5, w2, #8
>     dup v1.8b, w5
>     trn1 v0.2d, v0.2d, v1.2d
>
> We could do:
>
>     dup v0.8h, w2
>     uzp2 v1.16b, v0.16b, v0.16b
>     uzp1 v0.16b, v0.16b, v0.16b
>     trn1 v0.2d, v0.2d, v1.2d

    rev16 v1.16b, v0.16b // or ext ..x or any other instruction
    uzp1 v0.16b, v0.16b, v1.16b

is one instruction less but also not straight forward

ok as is

Janne
Re: [libav-devel] [PATCH 1/3] movenc: Add an option for enabling negative CTS offsets
On Thu, 23 Feb 2017, Yusuke Nakamura wrote:

> 2017-02-20 6:22 GMT+09:00 Martin Storsjö:
>
>> This reduces the need for an edit list; streams that start with
>> e.g. dts=-1, pts=0 can be encoded as dts=0, pts=0 (which is valid
>> in mov/mp4) by shifting the dts values of all packets forward.
>>
>> This avoids the need for edit lists for such streams (while they
>> still are needed for audio streams with encoder delay).
>> ---
>>  libavformat/movenc.c | 24
>>  libavformat/movenc.h | 2 ++
>>  2 files changed, 22 insertions(+), 4 deletions(-)
>>
>> diff --git a/libavformat/movenc.c b/libavformat/movenc.c
>> index 840190d..713c145 100644
>> --- a/libavformat/movenc.c
>> +++ b/libavformat/movenc.c
>> @@ -62,6 +62,7 @@ static const AVOption options[] = {
>>      { "delay_moov", "Delay writing the initial moov until the first fragment is cut, or until the first fragment flush", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_DELAY_MOOV}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
>>      { "global_sidx", "Write a global sidx index at the start of the file", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_GLOBAL_SIDX}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
>>      { "skip_trailer", "Skip writing the mfra/tfra/mfro trailer for fragmented files", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_SKIP_TRAILER}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
>> +    { "negative_cts_offsets", "Use negative CTS offsets (reducing the need for edit lists)", 0, AV_OPT_TYPE_CONST, {.i64 = FF_MOV_FLAG_NEGATIVE_CTS_OFFSETS}, INT_MIN, INT_MAX, AV_OPT_FLAG_ENCODING_PARAM, "movflags" },
>>      FF_RTP_FLAG_OPTS(MOVMuxContext, rtp_flags),
>>      { "skip_iods", "Skip writing iods atom.", offsetof(MOVMuxContext, iods_skip), AV_OPT_TYPE_INT, {.i64 = 0}, 0, 1, AV_OPT_FLAG_ENCODING_PARAM},
>>      { "iods_audio_profile", "iods audio profile atom.", offsetof(MOVMuxContext, iods_audio_profile), AV_OPT_TYPE_INT, {.i64 = -1}, -1, 255, AV_OPT_FLAG_ENCODING_PARAM},
>> @@ -1163,8 +1164,9 @@ static int mov_write_stsd_tag(AVFormatContext *s, AVIOContext *pb, MOVTrack *tra
>>      return update_size(pb, pos);
>>  }
>>
>> -static int mov_write_ctts_tag(AVIOContext *pb, MOVTrack *track)
>> +static int mov_write_ctts_tag(AVFormatContext *s, AVIOContext *pb, MOVTrack *track)
>>  {
>> +    MOVMuxContext *mov = s->priv_data;
>>      MOVStts *ctts_entries;
>>      uint32_t entries = 0;
>>      uint32_t atom_size;
>> @@ -1188,7 +1190,11 @@ static int mov_write_ctts_tag(AVIOContext *pb, MOVTrack *track)
>>      atom_size = 16 + (entries * 8);
>>      avio_wb32(pb, atom_size); /* size */
>>      ffio_wfourcc(pb, "ctts");
>> -    avio_wb32(pb, 0); /* version & flags */
>> +    if (mov->flags & FF_MOV_FLAG_NEGATIVE_CTS_OFFSETS)
>> +        avio_w8(pb, 1); /* version */
>
> ctts ver. 1 is defined in iso4 or later isobmff brands.

Thanks, will change so that we declare iso4 as major brand if this flag is set (unless some other option is set that requires declaring iso5).

>> +    else
>> +        avio_w8(pb, 0); /* version */
>> +    avio_wb24(pb, 0); /* flags */
>>      avio_wb32(pb, entries); /* entry count */
>>      for (i = 0; i < entries; i++) {
>>          avio_wb32(pb, ctts_entries[i].count);
>> @@ -1273,7 +1279,7 @@ static int mov_write_stbl_tag(AVFormatContext *s, AVIOContext *pb, MOVTrack *tra
>>          mov_write_stss_tag(pb, track, MOV_PARTIAL_SYNC_SAMPLE);
>>      if (track->par->codec_type == AVMEDIA_TYPE_VIDEO &&
>>          track->flags & MOV_TRACK_CTTS && track->entry)
>> -        mov_write_ctts_tag(pb, track);
>> +        mov_write_ctts_tag(s, pb, track);
>>      mov_write_stsc_tag(pb, track);
>>      mov_write_stsz_tag(pb, track);
>>      mov_write_stco_tag(pb, track);
>> @@ -2594,7 +2600,10 @@ static int mov_write_trun_tag(AVIOContext *pb, MOVMuxContext *mov,
>>
>>      avio_wb32(pb, 0); /* size placeholder */
>>      ffio_wfourcc(pb, "trun");
>> -    avio_w8(pb, 0); /* version */
>> +    if (mov->flags & FF_MOV_FLAG_NEGATIVE_CTS_OFFSETS)
>> +        avio_w8(pb, 1); /* version */
>> +    else
>> +        avio_w8(pb, 0); /* version */
>>      avio_wb24(pb, flags);
>>
>>      avio_wb32(pb, end - first); /* sample count */
>> @@ -3729,6 +3738,12 @@ static int mov_write_packet(AVFormatContext *s, AVPacket *pkt)
>>          mov->flags &= ~FF_MOV_FLAG_FRAG_DISCONT;
>>      }
>>
>> +    if (mov->flags & FF_MOV_FLAG_NEGATIVE_CTS_OFFSETS) {
>> +        if (trk->dts_shift == AV_NOPTS_VALUE)
>> +            trk->dts_shift = pkt->pts - pkt->dts;
>
> Do you care about the issue of negative composition time offsets on early
> flush of movie fragments? Reordering of leading samples could confuse
> demuxers, because the first sample has a non-zero cts and the subsequent
> samples are not examined. This can occur when starting to remux from an
> Open-GOP boundary (also, don't forget that AVC and HEVC can output P or B
> pictures before the IDR picture).

Good point - I hadn't thought about that. In those cases, we won't get exactly the desired result. On the other hand, I don't have any better idea on heuristics that would do the right thing either. So I'd declare that as a known
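As a rough illustration of why the flag goes together with version 1 of ctts/trun: once every dts is shifted forward by the first packet's pts - dts, later reordered frames can end up with pts smaller than the shifted dts, so the stored composition offset becomes negative and needs a signed field. The sketch below uses made-up names (Timestamps, shift_dts, cts_offset, needs_signed_offsets) and follows only what the commit message describes; it is not the movenc implementation.

```c
#include <stdint.h>

/* Hypothetical per-packet timestamps, in stream time base units. */
typedef struct {
    int64_t dts;
    int64_t pts;
} Timestamps;

/* Shift all dts values forward so the first packet has dts == pts.
 * Returns the shift that was applied. */
static int64_t shift_dts(Timestamps *pkts, int n)
{
    int64_t shift;

    if (n <= 0)
        return 0;
    shift = pkts[0].pts - pkts[0].dts;
    for (int i = 0; i < n; i++)
        pkts[i].dts += shift;
    return shift;
}

/* Composition time offset as it would be stored per sample. */
static int64_t cts_offset(Timestamps t)
{
    return t.pts - t.dts;
}

/* Returns 1 when at least one offset is negative, i.e. when an
 * unsigned offset field cannot represent the stream. */
static int needs_signed_offsets(const Timestamps *pkts, int n)
{
    for (int i = 0; i < n; i++)
        if (cts_offset(pkts[i]) < 0)
            return 1;
    return 0;
}
```

With packets (dts, pts) = (-1, 0), (0, 3), (1, 1), (2, 2), the shift is 1 and the resulting offsets are 0, 2, -1, -1. The case raised in the review (leading samples reordered before the first sample is seen) is not handled by this sketch either.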
Re: [libav-devel] [PATCH 3/3] Add Apple Pixlet decoder
On Wed, Feb 22, 2017 at 12:53:35PM -0500, Vittorio Giovara wrote:
> --- /dev/null
> +++ b/libavcodec/pixlet.c
> @@ -0,0 +1,689 @@
> +static int read_high_coeffs(AVCodecContext *avctx, uint8_t *src, int16_t *dst,
> +                            int size, int64_t c, int a, int64_t d,
> +                            int width, ptrdiff_t stride)
> +{
> +    PixletContext *ctx = avctx->priv_data;
> +    BitstreamContext *bc = &ctx->bc;
> +    unsigned cnt1, shbits, rlen, nbits, length, i = 0, j = 0, k;
> +    int ret, escape, pfx, value, yflag, xflag, flag = 0;
> +    int64_t state = 3, tmp;
> +
> +    while (i < size) {
> +        if (state >> 8 != -3) {
> +            value = ff_clz((state >> 8) + 3) ^ 0x1F;
> +        } else {
> +            value = -1;
> +        }

nit: pointless ()

> +        cnt1 = get_unary(bc, 0, length);
> +        if (cnt1 >= length) {
> +            cnt1 = bitstream_read(bc, nbits);
> +        } else {
> +            pfx = 14 + ((((uint64_t) (value - 14)) >> 32) & (value - 14));

Maybe just make value uint64_t instead of casting?

> +static int read_highpass(AVCodecContext *avctx, uint8_t *ptr,
> +                         int plane, AVFrame *frame)
> +{
> +    for (i = 0; i < ctx->levels * 3; i++) {
> +        uint32_t magic = bytestream2_get_be32(&ctx->gb);
> +
> +        if (magic != PIXLET_MAGIC) {
> +            av_log(avctx, AV_LOG_ERROR,
> +                   "wrong magic number: 0x%08X for plane %d, band %d\n",
> +                   magic, plane, i);

magic is uint32_t, use the correct C99 printf conversion specifier.

> +static int pixlet_decode_frame(AVCodecContext *avctx, void *data,
> +                               int *got_frame, AVPacket *avpkt)
> +{
> +    uint32_t pktsize;
> +
> +    pktsize = bytestream2_get_be32(&ctx->gb);
> +    if (pktsize <= 44 || pktsize - 4 > bytestream2_get_bytes_left(&ctx->gb)) {
> +        av_log(avctx, AV_LOG_ERROR, "Invalid packet size %u.\n", pktsize);

same

Diego
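The "correct C99 printf conversion specifier" for a uint32_t comes from <inttypes.h>. A minimal sketch with plain snprintf rather than av_log; format_magic and format_pktsize are hypothetical helper names used only to show the format strings.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 32-bit magic number as 8 upper-case hex digits using the
 * PRIX32 macro instead of a bare %X. Returns snprintf's result. */
static int format_magic(char *buf, size_t size, uint32_t magic)
{
    return snprintf(buf, size, "wrong magic number: 0x%08" PRIX32, magic);
}

/* Format a 32-bit packet size using PRIu32 instead of a bare %u. */
static int format_pktsize(char *buf, size_t size, uint32_t pktsize)
{
    return snprintf(buf, size, "Invalid packet size %" PRIu32 ".", pktsize);
}
```

On most current platforms uint32_t is unsigned int and %X / %u happen to work, but the macros keep the format string correct wherever uint32_t maps to a different type.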
Re: [libav-devel] [PATCH 3/3] Add Apple Pixlet decoder
On 22/02/2017 18:53, Vittorio Giovara wrote:
> +    /* elenril reads this as if (cthulhu->state == fhtagn) */
> +    if ((a >= 0) + (a ^ (a >> 31)) - (a >> 31) != 1) {
> +        nbits = 33 - ff_clz((a >= 0) + (a ^ (a >> 31)) - (a >> 31) - 1);
> +        if (nbits > 16)
> +            return AVERROR_INVALIDDATA;
> +    } else {
> +        nbits = 1;
> +    }

    cthulu = (a >= 0) + (a ^ (a >> 31)) - (a >> 31);
    if (cthulu != 1) {
        nbits = 33 - ff_clz(cthulu - 1);
    ...

The rest looks fine.

lu
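For what it is worth, assuming a 32-bit int and an arithmetic right shift, the repeated expression is |a| + (a >= 0), which makes the suggested temporary easier to name. Below is a self-contained sketch of the refactored logic; clz32 is a hypothetical portable stand-in for ff_clz, and the -1 return stands in for AVERROR_INVALIDDATA.

```c
#include <stdint.h>

/* Count leading zero bits of a non-zero 32-bit value.
 * Hypothetical portable stand-in for ff_clz. */
static int clz32(uint32_t x)
{
    int n = 0;

    while (!(x & UINT32_C(0x80000000))) {
        x <<= 1;
        n++;
    }
    return n;
}

/* |a| + (a >= 0), computed without shifting a negative value. */
static uint32_t magnitude_plus_sign(int32_t a)
{
    uint32_t mag = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;

    return mag + (a >= 0);
}

/* Returns nbits, or -1 where the patch returns AVERROR_INVALIDDATA. */
static int compute_nbits(int32_t a)
{
    uint32_t v = magnitude_plus_sign(a);
    int nbits;

    if (v == 1)
        return 1;
    nbits = 33 - clz32(v - 1);
    if (nbits > 16)
        return -1;
    return nbits;
}
```

v is at least 1 for every input, so clz32 is never called with zero.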
Re: [libav-devel] [PATCH 2/3] libavutil: add av_mod_uintp2
On 22/02/2017 18:53, Vittorio Giovara wrote:
> From: James Almer
>
> Signed-off-by: James Almer
> ---
>  libavutil/common.h | 14 ++
>  1 file changed, 14 insertions(+)
>

Ok.
Re: [libav-devel] [PATCH 1/3] intmath: add faster clz support
On 22/02/2017 18:53, Vittorio Giovara wrote:
> From: Ganesh Ajjanagadde
>
> ---
>  libavutil/intmath.h | 19 +++
>  1 file changed, 19 insertions(+)
>

Sure