Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdec: detect non-conformant missing refs
On 4/5/22 17:10, Anton Khirnov wrote: > Quoting Xiaolei Yu (2022-04-05 08:49:24) >> >> For cases which prefer rejecting broken bitstreams. >> --- >> libavcodec/hevc_refs.c | 15 --- >> 1 file changed, 12 insertions(+), 3 deletions(-) >> >> diff --git a/libavcodec/hevc_refs.c b/libavcodec/hevc_refs.c >> index fe18ca2b1d..7ea70e301b 100644 >> --- a/libavcodec/hevc_refs.c >> +++ b/libavcodec/hevc_refs.cfind_ref_idx( >> @@ -426,7 +426,7 @@ static HEVCFrame *generate_missing_ref(HEVCContext *s, >> int poc) >> >> /* add a reference with the given poc to the list and mark it as used in >> DPB */ >> static int add_candidate_ref(HEVCContext *s, RefPicList *list, >> - int poc, int ref_flag, uint8_t use_msb) >> + int poc, int ref_flag, uint8_t use_msb, int >> maybe_missing) > > allow_missing would be clearer IMO > done >> { >> HEVCFrame *ref = find_ref_idx(s, poc, use_msb); >> >> @@ -434,6 +434,9 @@ static int add_candidate_ref(HEVCContext *s, RefPicList >> *list, >> return AVERROR_INVALIDDATA; >> >> if (!ref) { >> +if ((s->avctx->err_recognition & AV_EF_COMPLIANT) && !maybe_missing) > > a log message would be nice, so one can easily tell where exactly the > error comes from > Moved the log message from find_ref_idx to where the missing refs are to be generated. OpenPGP_0x22AEF7EFBDCEFF03.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH v2 1/2] avcodec/hevcdec: skip generating missing refs in foll lists
Missing refs shall be generated only when they are actually used. Without this change a sequence of a BLA picture and an associated RASL picture would still be decoded without complaints if the RASL picture is mislabeled as RADL. --- libavcodec/hevc_refs.c | 19 +-- 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/libavcodec/hevc_refs.c b/libavcodec/hevc_refs.c index fe18ca2b1d..84a21991c7 100644 --- a/libavcodec/hevc_refs.c +++ b/libavcodec/hevc_refs.c @@ -378,9 +378,6 @@ static HEVCFrame *find_ref_idx(HEVCContext *s, int poc, uint8_t use_msb) } } -if (s->nal_unit_type != HEVC_NAL_CRA_NUT && !IS_BLA(s)) -av_log(s->avctx, AV_LOG_ERROR, - "Could not find ref with POC %d\n", poc); return NULL; } @@ -426,7 +423,7 @@ static HEVCFrame *generate_missing_ref(HEVCContext *s, int poc) /* add a reference with the given poc to the list and mark it as used in DPB */ static int add_candidate_ref(HEVCContext *s, RefPicList *list, - int poc, int ref_flag, uint8_t use_msb) + int poc, int ref_flag, uint8_t use_msb, enum RPSType rps_type) { HEVCFrame *ref = find_ref_idx(s, poc, use_msb); @@ -434,6 +431,15 @@ static int add_candidate_ref(HEVCContext *s, RefPicList *list, return AVERROR_INVALIDDATA; if (!ref) { +int allow_missing = !(rps_type == ST_CURR_BEF || rps_type == ST_CURR_AFT || + rps_type == LT_CURR); + +/* skip generating missing refs in foll lists as they are not used by current frame */ +if (allow_missing) +return 0; + +av_log(s->avctx, AV_LOG_ERROR, "Could not find ref with POC %d\n", poc); + ref = generate_missing_ref(s, poc); if (!ref) return AVERROR(ENOMEM); @@ -484,7 +490,7 @@ int ff_hevc_frame_rps(HEVCContext *s) else list = ST_CURR_AFT; -ret = add_candidate_ref(s, &rps[list], poc, HEVC_FRAME_FLAG_SHORT_REF, 1); +ret = add_candidate_ref(s, &rps[list], poc, HEVC_FRAME_FLAG_SHORT_REF, 1, list); if (ret < 0) goto fail; } @@ -494,7 +500,8 @@ int ff_hevc_frame_rps(HEVCContext *s) int poc = long_rps->poc[i]; int list = long_rps->used[i] ? LT_CURR : LT_FOLL; -ret = add_candidate_ref(s, &rps[list], poc, HEVC_FRAME_FLAG_LONG_REF, long_rps->poc_msb_present[i]); +ret = add_candidate_ref(s, &rps[list], poc, HEVC_FRAME_FLAG_LONG_REF, long_rps->poc_msb_present[i], +list); if (ret < 0) goto fail; } -- 2.35.1 OpenPGP_0x22AEF7EFBDCEFF03.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH v2 2/2] avcodec/hevcdec: detect non-conformant missing refs
For cases which prefer rejecting broken bitstreams. --- libavcodec/hevc_refs.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libavcodec/hevc_refs.c b/libavcodec/hevc_refs.c index 84a21991c7..9f8b6022c4 100644 --- a/libavcodec/hevc_refs.c +++ b/libavcodec/hevc_refs.c @@ -439,6 +439,8 @@ static int add_candidate_ref(HEVCContext *s, RefPicList *list, return 0; av_log(s->avctx, AV_LOG_ERROR, "Could not find ref with POC %d\n", poc); +if (s->avctx->err_recognition & AV_EF_COMPLIANT) +return AVERROR_INVALIDDATA; ref = generate_missing_ref(s, poc); if (!ref) -- 2.35.1 OpenPGP_0x22AEF7EFBDCEFF03.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 1/1] librtmp: use AVBPrint instead of char *
On Fri, 15 Apr 2022, Tristan Matthews wrote: This avoids having to do one pass to calculate the full length to allocate followed by a second pass to actually append values. --- libavformat/librtmp.c | 124 +++--- 1 file changed, 33 insertions(+), 91 deletions(-) diff --git a/libavformat/librtmp.c b/libavformat/librtmp.c index 43013e46e0..b7e9fc81cf 100644 --- a/libavformat/librtmp.c +++ b/libavformat/librtmp.c @@ -25,6 +25,7 @@ */ #include "libavutil/avstring.h" +#include "libavutil/bprint.h" #include "libavutil/mathematics.h" #include "libavutil/opt.h" #include "avformat.h" @@ -38,6 +39,7 @@ typedef struct LibRTMPContext { const AVClass *class; +AVBPrint filename; RTMP rtmp; char *app; char *conn; @@ -50,7 +52,6 @@ typedef struct LibRTMPContext { char *pageurl; char *client_buffer_time; int live; -char *temp_filename; int buffer_size; } LibRTMPContext; This looks ok to me. (I guess it also could be considered viable to not store the AVBprint in the context but detach an allocated string instead; that would decrease the persistent allocation size a little, at the cost of some more copying, but the difference is probably negligible, so this probably is sensible as is.) Let's still wait for Marton's review too. // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH v2 0/1] lavc/aarch64: add some neon pix_abs functions
On Fri, 15 Apr 2022, Martin Storsjö wrote: On Thu, 14 Apr 2022, Swinney, Jonathan wrote: Thanks Martin for the review. I made some updates according to the suggestions you made. I added a checkasm function, but I'm new to the test framework, so it may need some work still. Thanks for putting in the effort to make a test - that adds a lot of value to the project! I'll follow up with a review of the newly added test. Btw, for a guide on how checkasm tests, see https://ffmpeg.org/pipermail/ffmpeg-devel/2022-March/294400.html as a bit of a walkthrough. // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 1/2] swscale/aarch64: add hscale specializations
On Fri, 15 Apr 2022, Swinney, Jonathan wrote: This patch adds specializations for hscale for filterSize == 4 and 8 and converts the existing implementation for the X8 version. For the old code, now used for the X8 version, it improves the efficiency of the final summations by reducing 11 instructions to 7. ff_hscale8to15_8_neon is mostly unchanged from the original except for a few changes. - The loads for the filter data were consolidated into a single 64 byte ld1 instruction. Couldn't you do this optimization on the existing function too? - The final summations were improved. - The inner loop on filterSize was completely removed I presume that this is the only differing factor which affects whether it's worthwhile to keep a separate width=8 function or not. At least from the checkasm benchmark numbers, the difference is notable but not huge (on the range of 4-10%, while the summation improvements gain even more). Given a fully optimized function that has an inner loop (which is only taken once for the width=8 case), is the separate function without an inner loop really necessary? ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck here is loading the data from src, this data is loaded a whole block ahead and stored back to the stack to be loaded again with ld4. This arranges the data for most efficient use of the vector instructions and removes the need for completion adds at the end. The number of iterations of the C per iteration of the assembly is increased from 4 to 8, but because of the prefetching, it can only be used when dstW is >= 16. This improves speed by 26% on Graviton 2 (Neoverse N1) ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null - before: t:0.001796 avg:0.001839 max:0.002756 min:0.001733 after: t:0.001690 avg:0.001352 max:0.002171 min:0.001292 In direct micro benchmarks I wrote the benefit is more dramatic when filterSize == 4. | (seconds) | c6g | | | --- | - | - | | filterSize | 4 | 8 | | original| 7.554 | 7.621 | | optimized | 3.736 | 7.054 | | improvement | 50.5% | 7.44% | This function does already have a checkasm test, so it'd be useful to include those numbers too! FWIW that test (runnable with "checkasm --test=sw_scale", benchmarkable with "checkasm --bench=hscale --test=sw_scale") is a bit lacking - it only ever tests with one dstW, 512. As this patch shows special handling of smaller dstW, it'd probably be good to e.g. randomly test a couple widths in the range of 1-512, or test one width < 16 and a couple more bigger ones. Or just exhaustively testing the full range, although that's quite uncommon among tests. Additionally, that testcase only tests with a steady 1 pixel stride in filterPos for each destination pixel - I guess it'd be good for testcase coverage to actually test more realistic scaling scenarios too. So if you'd be able to improve that testcase while at it, that'd be very much appreciated! Signed-off-by: Jonathan Swinney --- libswscale/aarch64/hscale.S | 263 +-- libswscale/aarch64/swscale.c | 41 -- libswscale/utils.c | 2 +- 3 files changed, 284 insertions(+), 22 deletions(-) diff --git a/libswscale/aarch64/hscale.S b/libswscale/aarch64/hscale.S index af55ffe2b7..a934653a46 100644 --- a/libswscale/aarch64/hscale.S +++ b/libswscale/aarch64/hscale.S @@ -1,5 +1,7 @@ /* * Copyright (c) 2016 Clément Bœsch + * Copyright (c) 2019-2021 Sebastian Pop + * Copyright (c) 2022 Jonathan Swinney * * This file is part of FFmpeg. * @@ -20,7 +22,25 @@ #include "libavutil/aarch64/asm.S" -function ff_hscale_8_to_15_neon, export=1 +/* +;- +; horizontal line scaling +; +; void hscaleto__ +; (SwsContext *c, int{16,32}_t *dst, +;int dstW, const uint{8,16}_t *src, +;const int16_t *filter, +;const int32_t *filterPos, int filterSize); +; +; Scale one horizontal line. Input is either 8-bit width or 16-bit width +; ($source_width can be either 8, 9, 10 or 16, difference is whether we have to +; downscale before multiplying). Filter is 14 bits. Output is either 15 bits +; (in int16_t) or 19 bits (in int32_t), as given in $intermediate_nbits. Each +; output pixel is generated from $filterSize input pixels, the position of +; the first pixel is given in filterPos[nOutputPixel]. +;- */ + +function ff_hscale8to15_X8_neon, export=1 sbfiz x7, x6, #1, #32 // filterSize*2 (*2 because int16) 1: ldr w8, [x5], #4// filterPos[idx] ldr w0, [x5], #4// filterPos[idx + 1] @@ -61,20 +81,239 @@ function ff_hscale_8
Re: [FFmpeg-devel] [PATCH 2/2] swscale/aarch64: add vscale specializations
On Fri, 15 Apr 2022, Swinney, Jonathan wrote: This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By using specialized code with unrolling to match the filterSize we can improve performance. | (seconds) | c6g | | | | | - | - | - | | filterSize | 2 | 4 | 8 | | original| 0.581 | 0.974 | 1.744 | | optimized | 0.399 | 0.569 | 1.052 | | improvement | 31.1% | 41.6% | 39.7% | Signed-off-by: Jonathan Swinney --- libswscale/aarch64/output.S | 147 +-- libswscale/aarch64/swscale.c | 12 +++ 2 files changed, 153 insertions(+), 6 deletions(-) I'll have a closer look at the assembly itself at a later time, but first: The checkasm tests in tests/checkasm/sw_scale.c does test yuv2planeX, but there's no testing of yuv2plane1, can you extend it to cover that too? And that existing test only tests filter sizes 1, 4, 8, 16, but apparently should be extended to test size 2 too? // Martin ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".