Re: [FFmpeg-devel] [PATCH] libavcodec/hevcdec: detect non-conformant missing refs

2022-04-16 Thread Xiaolei Yu
On 4/5/22 17:10, Anton Khirnov wrote:
> Quoting Xiaolei Yu (2022-04-05 08:49:24)
>>
>> For cases which prefer rejecting broken bitstreams.
>> ---
>>  libavcodec/hevc_refs.c | 15 ---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/libavcodec/hevc_refs.c b/libavcodec/hevc_refs.c
>> index fe18ca2b1d..7ea70e301b 100644
>> --- a/libavcodec/hevc_refs.c
>> +++ b/libavcodec/hevc_refs.cfind_ref_idx(
>> @@ -426,7 +426,7 @@ static HEVCFrame *generate_missing_ref(HEVCContext *s, 
>> int poc)
>>
>>  /* add a reference with the given poc to the list and mark it as used in 
>> DPB */
>>  static int add_candidate_ref(HEVCContext *s, RefPicList *list,
>> - int poc, int ref_flag, uint8_t use_msb)
>> + int poc, int ref_flag, uint8_t use_msb, int 
>> maybe_missing)
>
> allow_missing would be clearer IMO
>

done

>>  {
>>  HEVCFrame *ref = find_ref_idx(s, poc, use_msb);
>>
>> @@ -434,6 +434,9 @@ static int add_candidate_ref(HEVCContext *s, RefPicList 
>> *list,
>>  return AVERROR_INVALIDDATA;
>>
>>  if (!ref) {
>> +if ((s->avctx->err_recognition & AV_EF_COMPLIANT) && !maybe_missing)
>
> a log message would be nice, so one can easily tell where exactly the
> error comes from
>

Moved the log message from find_ref_idx to where the missing refs are to be 
generated.

OpenPGP_0x22AEF7EFBDCEFF03.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH v2 1/2] avcodec/hevcdec: skip generating missing refs in foll lists

2022-04-16 Thread Xiaolei Yu
Missing refs shall be generated only when they are actually used.

Without this change a sequence of a BLA picture and an associated RASL picture
would still be decoded without complaints if the RASL picture is mislabeled
as RADL.
---
 libavcodec/hevc_refs.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/libavcodec/hevc_refs.c b/libavcodec/hevc_refs.c
index fe18ca2b1d..84a21991c7 100644
--- a/libavcodec/hevc_refs.c
+++ b/libavcodec/hevc_refs.c
@@ -378,9 +378,6 @@ static HEVCFrame *find_ref_idx(HEVCContext *s, int poc, 
uint8_t use_msb)
 }
 }
 
-if (s->nal_unit_type != HEVC_NAL_CRA_NUT && !IS_BLA(s))
-av_log(s->avctx, AV_LOG_ERROR,
-   "Could not find ref with POC %d\n", poc);
 return NULL;
 }
 
@@ -426,7 +423,7 @@ static HEVCFrame *generate_missing_ref(HEVCContext *s, int 
poc)
 
 /* add a reference with the given poc to the list and mark it as used in DPB */
 static int add_candidate_ref(HEVCContext *s, RefPicList *list,
- int poc, int ref_flag, uint8_t use_msb)
+ int poc, int ref_flag, uint8_t use_msb, enum 
RPSType rps_type)
 {
 HEVCFrame *ref = find_ref_idx(s, poc, use_msb);
 
@@ -434,6 +431,15 @@ static int add_candidate_ref(HEVCContext *s, RefPicList 
*list,
 return AVERROR_INVALIDDATA;
 
 if (!ref) {
+int allow_missing = !(rps_type == ST_CURR_BEF || rps_type == 
ST_CURR_AFT ||
+  rps_type == LT_CURR);
+
+/* skip generating missing refs in foll lists as they are not used by 
current frame */
+if (allow_missing)
+return 0;
+
+av_log(s->avctx, AV_LOG_ERROR, "Could not find ref with POC %d\n", 
poc);
+
 ref = generate_missing_ref(s, poc);
 if (!ref)
 return AVERROR(ENOMEM);
@@ -484,7 +490,7 @@ int ff_hevc_frame_rps(HEVCContext *s)
 else
 list = ST_CURR_AFT;
 
-ret = add_candidate_ref(s, &rps[list], poc, HEVC_FRAME_FLAG_SHORT_REF, 
1);
+ret = add_candidate_ref(s, &rps[list], poc, HEVC_FRAME_FLAG_SHORT_REF, 
1, list);
 if (ret < 0)
 goto fail;
 }
@@ -494,7 +500,8 @@ int ff_hevc_frame_rps(HEVCContext *s)
 int poc  = long_rps->poc[i];
 int list = long_rps->used[i] ? LT_CURR : LT_FOLL;
 
-ret = add_candidate_ref(s, &rps[list], poc, HEVC_FRAME_FLAG_LONG_REF, 
long_rps->poc_msb_present[i]);
+ret = add_candidate_ref(s, &rps[list], poc, HEVC_FRAME_FLAG_LONG_REF, 
long_rps->poc_msb_present[i],
+list);
 if (ret < 0)
 goto fail;
 }
-- 
2.35.1


OpenPGP_0x22AEF7EFBDCEFF03.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-devel] [PATCH v2 2/2] avcodec/hevcdec: detect non-conformant missing refs

2022-04-16 Thread Xiaolei Yu
For cases which prefer rejecting broken bitstreams.
---
 libavcodec/hevc_refs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavcodec/hevc_refs.c b/libavcodec/hevc_refs.c
index 84a21991c7..9f8b6022c4 100644
--- a/libavcodec/hevc_refs.c
+++ b/libavcodec/hevc_refs.c
@@ -439,6 +439,8 @@ static int add_candidate_ref(HEVCContext *s, RefPicList 
*list,
 return 0;
 
 av_log(s->avctx, AV_LOG_ERROR, "Could not find ref with POC %d\n", 
poc);
+if (s->avctx->err_recognition & AV_EF_COMPLIANT)
+return AVERROR_INVALIDDATA;
 
 ref = generate_missing_ref(s, poc);
 if (!ref)
-- 
2.35.1



OpenPGP_0x22AEF7EFBDCEFF03.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 1/1] librtmp: use AVBPrint instead of char *

2022-04-16 Thread Martin Storsjö

On Fri, 15 Apr 2022, Tristan Matthews wrote:


This avoids having to do one pass to calculate the full length to allocate
followed by a second pass to actually append values.
---
libavformat/librtmp.c | 124 +++---
1 file changed, 33 insertions(+), 91 deletions(-)

diff --git a/libavformat/librtmp.c b/libavformat/librtmp.c
index 43013e46e0..b7e9fc81cf 100644
--- a/libavformat/librtmp.c
+++ b/libavformat/librtmp.c
@@ -25,6 +25,7 @@
 */

#include "libavutil/avstring.h"
+#include "libavutil/bprint.h"
#include "libavutil/mathematics.h"
#include "libavutil/opt.h"
#include "avformat.h"
@@ -38,6 +39,7 @@

typedef struct LibRTMPContext {
const AVClass *class;
+AVBPrint filename;
RTMP rtmp;
char *app;
char *conn;
@@ -50,7 +52,6 @@ typedef struct LibRTMPContext {
char *pageurl;
char *client_buffer_time;
int live;
-char *temp_filename;
int buffer_size;
} LibRTMPContext;


This looks ok to me.

(I guess it also could be considered viable to not store the AVBprint in 
the context but detach an allocated string instead; that would decrease 
the persistent allocation size a little, at the cost of some more copying, 
but the difference is probably negligible, so this probably is sensible as 
is.)


Let's still wait for Marton's review too.

// Martin

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH v2 0/1] lavc/aarch64: add some neon pix_abs functions

2022-04-16 Thread Martin Storsjö

On Fri, 15 Apr 2022, Martin Storsjö wrote:


On Thu, 14 Apr 2022, Swinney, Jonathan wrote:

Thanks Martin for the review. I made some updates according to the 
suggestions you made.


I added a checkasm function, but I'm new to the test framework, so it may 
need some work still.


Thanks for putting in the effort to make a test - that adds a lot of value to 
the project! I'll follow up with a review of the newly added test.


Btw, for a guide on how checkasm tests, see 
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-March/294400.html as a bit 
of a walkthrough.


// Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 1/2] swscale/aarch64: add hscale specializations

2022-04-16 Thread Martin Storsjö

On Fri, 15 Apr 2022, Swinney, Jonathan wrote:


This patch adds specializations for hscale for filterSize == 4 and 8 and
converts the existing implementation for the X8 version. For the old code, now
used for the X8 version, it improves the efficiency of the final summations by
reducing 11 instructions to 7.

ff_hscale8to15_8_neon is mostly unchanged from the original except for a few
changes.
- The loads for the filter data were consolidated into a single 64 byte ld1
  instruction.


Couldn't you do this optimization on the existing function too?


- The final summations were improved.
- The inner loop on filterSize was completely removed


I presume that this is the only differing factor which affects whether 
it's worthwhile to keep a separate width=8 function or not. At least from 
the checkasm benchmark numbers, the difference is notable but not huge (on 
the range of 4-10%, while the summation improvements gain even more).


Given a fully optimized function that has an inner loop (which is only 
taken once for the width=8 case), is the separate function without an 
inner loop really necessary?




ff_hscale8to15_4_neon is a complete rewrite. Since the main bottleneck here is
loading the data from src, this data is loaded a whole block ahead and stored
back to the stack to be loaded again with ld4. This arranges the data for most
efficient use of the vector instructions and removes the need for completion
adds at the end. The number of iterations of the C per iteration of the assembly
is increased from 4 to 8, but because of the prefetching, it can only be used
when dstW is >= 16.

This improves speed by 26% on Graviton 2 (Neoverse N1)
ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf 
bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.001796 avg:0.001839 max:0.002756 min:0.001733
after:  t:0.001690 avg:0.001352 max:0.002171 min:0.001292

In direct micro benchmarks I wrote the benefit is more dramatic when filterSize 
== 4.

| (seconds)   | c6g   |   |
| --- | - | - |
| filterSize  | 4 | 8 |
| original| 7.554 | 7.621 |
| optimized   | 3.736 | 7.054 |
| improvement | 50.5% | 7.44% |


This function does already have a checkasm test, so it'd be useful to 
include those numbers too!


FWIW that test (runnable with "checkasm --test=sw_scale", benchmarkable 
with "checkasm --bench=hscale --test=sw_scale") is a bit lacking - it only 
ever tests with one dstW, 512. As this patch shows special handling of 
smaller dstW, it'd probably be good to e.g. randomly test a couple widths 
in the range of 1-512, or test one width < 16 and a couple more bigger 
ones. Or just exhaustively testing the full range, although that's quite 
uncommon among tests.


Additionally, that testcase only tests with a steady 1 pixel stride in 
filterPos for each destination pixel - I guess it'd be good for testcase 
coverage to actually test more realistic scaling scenarios too.


So if you'd be able to improve that testcase while at it, that'd be very 
much appreciated!




Signed-off-by: Jonathan Swinney 
---
libswscale/aarch64/hscale.S  | 263 +--
libswscale/aarch64/swscale.c |  41 --
libswscale/utils.c   |   2 +-
3 files changed, 284 insertions(+), 22 deletions(-)

diff --git a/libswscale/aarch64/hscale.S b/libswscale/aarch64/hscale.S
index af55ffe2b7..a934653a46 100644
--- a/libswscale/aarch64/hscale.S
+++ b/libswscale/aarch64/hscale.S
@@ -1,5 +1,7 @@
/*
 * Copyright (c) 2016 Clément Bœsch 
+ * Copyright (c) 2019-2021 Sebastian Pop 
+ * Copyright (c) 2022 Jonathan Swinney 
 *
 * This file is part of FFmpeg.
 *
@@ -20,7 +22,25 @@

#include "libavutil/aarch64/asm.S"

-function ff_hscale_8_to_15_neon, export=1
+/*
+;-
+; horizontal line scaling
+;
+; void hscaleto__
+;   (SwsContext *c, int{16,32}_t *dst,
+;int dstW, const uint{8,16}_t *src,
+;const int16_t *filter,
+;const int32_t *filterPos, int filterSize);
+;
+; Scale one horizontal line. Input is either 8-bit width or 16-bit width
+; ($source_width can be either 8, 9, 10 or 16, difference is whether we have to
+; downscale before multiplying). Filter is 14 bits. Output is either 15 bits
+; (in int16_t) or 19 bits (in int32_t), as given in $intermediate_nbits. Each
+; output pixel is generated from $filterSize input pixels, the position of
+; the first pixel is given in filterPos[nOutputPixel].
+;- 
*/
+
+function ff_hscale8to15_X8_neon, export=1
sbfiz   x7, x6, #1, #32 // filterSize*2 (*2 
because int16)
1:  ldr w8, [x5], #4// filterPos[idx]
ldr w0, [x5], #4// filterPos[idx + 1]
@@ -61,20 +81,239 @@ function ff_hscale_8

Re: [FFmpeg-devel] [PATCH 2/2] swscale/aarch64: add vscale specializations

2022-04-16 Thread Martin Storsjö

On Fri, 15 Apr 2022, Swinney, Jonathan wrote:


This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By
using specialized code with unrolling to match the filterSize we can improve
performance.

| (seconds)   | c6g   |   |   |
| | - | - | - |
| filterSize  | 2 | 4 | 8 |
| original| 0.581 | 0.974 | 1.744 |
| optimized   | 0.399 | 0.569 | 1.052 |
| improvement | 31.1% | 41.6% | 39.7% |

Signed-off-by: Jonathan Swinney 
---
libswscale/aarch64/output.S  | 147 +--
libswscale/aarch64/swscale.c |  12 +++
2 files changed, 153 insertions(+), 6 deletions(-)


I'll have a closer look at the assembly itself at a later time, but first:

The checkasm tests in tests/checkasm/sw_scale.c does test yuv2planeX, but 
there's no testing of yuv2plane1, can you extend it to cover that too? And 
that existing test only tests filter sizes 1, 4, 8, 16, but apparently 
should be extended to test size 2 too?


// Martin

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".