[FFmpeg-devel] [PATCH V8 3/3] lavfi: add filter dnn_detect for object detection

2021-04-12 Thread Guo, Yejun
Below are the example steps to do object detection:

1. download and install l_openvino_toolkit_p_2021.1.110.tgz from
https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html
  or get the source code (tag 2021.1), then build and install it.
2. export LD_LIBRARY_PATH with openvino settings, for example:
.../deployment_tools/inference_engine/lib/intel64/:.../deployment_tools/inference_engine/external/tbb/lib/
3. rebuild ffmpeg from source code with configure option:
--enable-libopenvino
--extra-cflags='-I.../deployment_tools/inference_engine/include/'
--extra-ldflags='-L.../deployment_tools/inference_engine/lib/intel64'
4. download model files and test image
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/face-detection-adas-0001.bin
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/face-detection-adas-0001.xml
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/face-detection-adas-0001.label
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/images/cici.jpg
5. run ffmpeg with:
./ffmpeg -i cici.jpg -vf dnn_detect=dnn_backend=openvino:model=face-detection-adas-0001.xml:input=data:output=detection_out:confidence=0.6:labels=face-detection-adas-0001.label,showinfo -f null -

The detection result is printed as below:
[Parsed_showinfo_1 @ 0x560c21ecbe40]   side data - detection bounding boxes:
[Parsed_showinfo_1 @ 0x560c21ecbe40] source: face-detection-adas-0001.xml
[Parsed_showinfo_1 @ 0x560c21ecbe40] index: 0,  region: (1005, 813) -> (1086, 905), label: face, confidence: 10000/10000.
[Parsed_showinfo_1 @ 0x560c21ecbe40] index: 1,  region: (888, 839) -> (967, 926), label: face, confidence: 6917/10000.

Two faces are detected, with confidence 100% and 69.17% respectively.

Signed-off-by: Guo, Yejun 
---
 configure   |   1 +
 doc/filters.texi|  40 
 libavfilter/Makefile|   1 +
 libavfilter/allfilters.c|   1 +
 libavfilter/vf_dnn_detect.c | 421 
 5 files changed, 464 insertions(+)
 create mode 100644 libavfilter/vf_dnn_detect.c

diff --git a/configure b/configure
index d7a3f507e8..cc1013fb1d 100755
--- a/configure
+++ b/configure
@@ -3555,6 +3555,7 @@ derain_filter_select="dnn"
 deshake_filter_select="pixelutils"
 deshake_opencl_filter_deps="opencl"
 dilation_opencl_filter_deps="opencl"
+dnn_detect_filter_select="dnn"
 dnn_processing_filter_select="dnn"
 drawtext_filter_deps="libfreetype"
 drawtext_filter_suggest="libfontconfig libfribidi"
diff --git a/doc/filters.texi b/doc/filters.texi
index 5e35fa6467..68f17dd563 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -10127,6 +10127,46 @@ ffmpeg -i INPUT -f lavfi -i nullsrc=hd720,geq='r=128+80*(sin(sqrt((X-W/2)*(X-W/2
 @end example
 @end itemize
 
+@section dnn_detect
+
+Do object detection with deep neural networks.
+
+The filter accepts the following options:
+
+@table @option
+@item dnn_backend
+Specify which DNN backend to use for model loading and execution. This option accepts
+only openvino now, tensorflow backends will be added.
+
+@item model
+Set path to model file specifying network architecture and its parameters.
+Note that different backends use different file formats.
+
+@item input
+Set the input name of the dnn network.
+
+@item output
+Set the output name of the dnn network.
+
+@item confidence
+Set the confidence threshold (default: 0.5).
+
+@item labels
+Set path to label file specifying the mapping between label id and name.
+Each label name is written in one line, trailing spaces and empty lines are skipped.
+The first line is the name of label id 0 (usually it is 'background'),
+and the second line is the name of label id 1, etc.
+The label id is considered as name if the label file is not provided.
+
+@item backend_configs
+Set the configs to be passed into backend
+
+@item async
+use DNN async execution if set (default: set),
+roll back to sync execution if the backend does not support async.
+
+@end table
+
 @anchor{dnn_processing}
 @section dnn_processing
 
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index b2c254ea67..b77f2276a4 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -245,6 +245,7 @@ OBJS-$(CONFIG_DILATION_FILTER)   += vf_neighbor.o
 OBJS-$(CONFIG_DILATION_OPENCL_FILTER)+= vf_neighbor_opencl.o opencl.o \
 opencl/neighbor.o
 OBJS-$(CONFIG_DISPLACE_FILTER)   += vf_displace.o framesync.o
+OBJS-$(CONFIG_DNN_DETECT_FILTER) += vf_dnn_detect.o
 OBJS-$(CONFIG_DNN_PROCESSING_FILTER) += vf_dnn_processing.o
 OBJS-$(CONFIG_DOUBLEWEAVE_FILTER)+= vf_weave.o
 OBJS-$(CONFIG_DRAWBOX_FILTER)+= vf_drawbox.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 0872c6e0f2..0d2bf7bbee 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -230,6 +230,7 @@ extern AVFilter 

[FFmpeg-devel] [PATCH V8 2/3] lavfi: show side data of detection bounding boxes

2021-04-12 Thread Guo, Yejun
---
 libavfilter/f_sidedata.c  |  2 ++
 libavfilter/vf_showinfo.c | 29 +
 2 files changed, 31 insertions(+)

diff --git a/libavfilter/f_sidedata.c b/libavfilter/f_sidedata.c
index 3757723375..6f25d2b311 100644
--- a/libavfilter/f_sidedata.c
+++ b/libavfilter/f_sidedata.c
@@ -71,6 +71,7 @@ static const AVOption filt_name##_options[] = { \
 {   "S12M_TIMECOD",   "", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_S12M_TIMECODE  }, 0, 0, FLAGS, "type" }, \
 {   "DYNAMIC_HDR_PLUS",   "", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_DYNAMIC_HDR_PLUS   }, 0, 0, FLAGS, "type" }, \
 {   "REGIONS_OF_INTEREST","", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_REGIONS_OF_INTEREST}, 0, 0, FLAGS, "type" }, \
+{   "DETECTION_BOUNDING_BOXES",   "", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_DETECTION_BBOXES   }, 0, 0, FLAGS, "type" }, \
 {   "SEI_UNREGISTERED",   "", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_SEI_UNREGISTERED   }, 0, 0, FLAGS, "type" }, \
 { NULL } \
 }
@@ -100,6 +101,7 @@ static const AVOption filt_name##_options[] = { \
 {   "S12M_TIMECOD",   "", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_S12M_TIMECODE  }, 0, 0, FLAGS, "type" }, \
 {   "DYNAMIC_HDR_PLUS",   "", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_DYNAMIC_HDR_PLUS   }, 0, 0, FLAGS, "type" }, \
 {   "REGIONS_OF_INTEREST","", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_REGIONS_OF_INTEREST}, 0, 0, FLAGS, "type" }, \
+{   "DETECTION_BOUNDING_BOXES",   "", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_DETECTION_BBOXES   }, 0, 0, FLAGS, "type" }, \
 {   "SEI_UNREGISTERED",   "", 0, AV_OPT_TYPE_CONST,  {.i64 = AV_FRAME_DATA_SEI_UNREGISTERED   }, 0, 0, FLAGS, "type" }, \
 { NULL } \
 }
diff --git a/libavfilter/vf_showinfo.c b/libavfilter/vf_showinfo.c
index 6208892005..ae6f6bb7b1 100644
--- a/libavfilter/vf_showinfo.c
+++ b/libavfilter/vf_showinfo.c
@@ -38,6 +38,7 @@
 #include "libavutil/timecode.h"
 #include "libavutil/mastering_display_metadata.h"
 #include "libavutil/video_enc_params.h"
+#include "libavutil/detection_bbox.h"
 
 #include "avfilter.h"
 #include "internal.h"
@@ -153,6 +154,31 @@ static void dump_roi(AVFilterContext *ctx, const AVFrameSideData *sd)
 }
 }
 
+static void dump_detection_bbox(AVFilterContext *ctx, const AVFrameSideData *sd)
+{
+    int nb_bboxes;
+    const AVDetectionBBoxHeader *header;
+    const AVDetectionBBox *bbox;
+
+    header = (const AVDetectionBBoxHeader *)sd->data;
+    nb_bboxes = header->nb_bboxes;
+    av_log(ctx, AV_LOG_INFO, "detection bounding boxes:\n");
+    av_log(ctx, AV_LOG_INFO, "source: %s\n", header->source);
+
+    for (int i = 0; i < nb_bboxes; i++) {
+        bbox = av_get_detection_bbox(header, i);
+        av_log(ctx, AV_LOG_INFO, "index: %d,\tregion: (%d, %d) -> (%d, %d), label: %s, confidence: %d/%d.\n",
+                                 i, bbox->x, bbox->y, bbox->x + bbox->w, bbox->y + bbox->h,
+                                 bbox->detect_label, bbox->detect_confidence.num, bbox->detect_confidence.den);
+        if (bbox->classify_count > 0) {
+            for (int j = 0; j < bbox->classify_count; j++) {
+                av_log(ctx, AV_LOG_INFO, "\t\tclassify:  label: %s, confidence: %d/%d.\n",
+                       bbox->classify_labels[j], bbox->classify_confidences[j].num, bbox->classify_confidences[j].den);
+            }
+        }
+    }
+}
+
 static void dump_mastering_display(AVFilterContext *ctx, const AVFrameSideData *sd)
 {
 const AVMasteringDisplayMetadata *mastering_display;
@@ -494,6 +520,9 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
 case AV_FRAME_DATA_REGIONS_OF_INTEREST:
 dump_roi(ctx, sd);
 break;
+case AV_FRAME_DATA_DETECTION_BBOXES:
+dump_detection_bbox(ctx, sd);
+break;
 case AV_FRAME_DATA_MASTERING_DISPLAY_METADATA:
 dump_mastering_display(ctx, sd);
 break;
-- 
2.17.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

[FFmpeg-devel] [PATCH V8 1/3] lavu: add side data AV_FRAME_DATA_DETECTION_BBOXES for object detection/classification

2021-04-12 Thread Guo, Yejun
---
 doc/APIchanges |   2 +
 libavutil/Makefile |   2 +
 libavutil/detection_bbox.c |  73 +
 libavutil/detection_bbox.h | 107 +
 libavutil/frame.c  |   1 +
 libavutil/frame.h  |   6 +++
 6 files changed, 191 insertions(+)
 create mode 100644 libavutil/detection_bbox.c
 create mode 100644 libavutil/detection_bbox.h

diff --git a/doc/APIchanges b/doc/APIchanges
index 9dfcc97d5c..30bd235691 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -14,6 +14,8 @@ libavutil: 2017-10-21
 
 
 API changes, most recent first:
+2021-04-xx - xx - lavu 56.xx.100 - frame.h detection_bbox.h
+  Add AV_FRAME_DATA_DETECTION_BBOXES
 
 2021-04-06 - xx - lavf 58.78.100 - avformat.h
   Add avformat_index_get_entries_count(), avformat_index_get_entry(),
diff --git a/libavutil/Makefile b/libavutil/Makefile
index 27bafe9e12..47efb718d2 100644
--- a/libavutil/Makefile
+++ b/libavutil/Makefile
@@ -21,6 +21,7 @@ HEADERS = adler32.h \
   cpu.h \
   crc.h \
   des.h \
+  detection_bbox.h  \
   dict.h\
   display.h \
   dovi_meta.h   \
@@ -113,6 +114,7 @@ OBJS = adler32.o \
cpu.o\
crc.o\
des.o\
+   detection_bbox.o \
dict.o   \
display.o\
dovi_meta.o  \
diff --git a/libavutil/detection_bbox.c b/libavutil/detection_bbox.c
new file mode 100644
index 0000000000..c54a30d9e5
--- /dev/null
+++ b/libavutil/detection_bbox.c
@@ -0,0 +1,73 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "detection_bbox.h"
+
+AVDetectionBBoxHeader *av_detection_bbox_alloc(uint32_t nb_bboxes, size_t *out_size)
+{
+    size_t size;
+    struct {
+        AVDetectionBBoxHeader header;
+        AVDetectionBBox boxes[1];
+    } *ret;
+
+    size = sizeof(*ret);
+    if (nb_bboxes - 1 > (SIZE_MAX - size) / sizeof(*ret->boxes))
+        return NULL;
+    size += sizeof(*ret->boxes) * (nb_bboxes - 1);
+
+    ret = av_mallocz(size);
+    if (!ret)
+        return NULL;
+
+    ret->header.nb_bboxes = nb_bboxes;
+    ret->header.bbox_size = sizeof(*ret->boxes);
+    ret->header.bboxes_offset = (char *)&ret->boxes - (char *)&ret->header;
+
+    if (out_size)
+        *out_size = size;
+
+    return &ret->header;
+}
+
+AVDetectionBBoxHeader *av_detection_bbox_create_side_data(AVFrame *frame, uint32_t nb_bboxes)
+{
+    AVBufferRef *buf;
+    AVDetectionBBoxHeader *header;
+    size_t size;
+
+    header = av_detection_bbox_alloc(nb_bboxes, &size);
+    if (!header)
+        return NULL;
+    if (size > INT_MAX) {
+        av_freep(&header);
+        return NULL;
+    }
+    buf = av_buffer_create((uint8_t *)header, size, NULL, NULL, 0);
+    if (!buf) {
+        av_freep(&header);
+        return NULL;
+    }
+
+    if (!av_frame_new_side_data_from_buf(frame, AV_FRAME_DATA_DETECTION_BBOXES, buf)) {
+        av_buffer_unref(&buf);
+        return NULL;
+    }
+
+    return header;
+}
diff --git a/libavutil/detection_bbox.h b/libavutil/detection_bbox.h
new file mode 100644
index 0000000000..4ad05d3b95
--- /dev/null
+++ b/libavutil/detection_bbox.h
@@ -0,0 +1,107 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * 

Re: [FFmpeg-devel] [PATCH] libavcodec/qsvenc: add mbbrc to hevc_qsv

2021-04-12 Thread Xiang, Haihao
On Tue, 2021-04-13 at 10:22 +0800, wenbin.c...@intel.com wrote:
> From: "Chen,Wenbin" 
> 
> Add mbbrc to hevc_qsv
> For detailed description, please see "mbbrc" part in:
> https://github.com/Intel-Media-SDK/MediaSDK/blob/master/doc/mediasdk-man.md#mfxextcodingoption2
> 
> Signed-off-by: Wenbin Chen 
> ---
>  libavcodec/qsvenc.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/qsvenc.c b/libavcodec/qsvenc.c
> index 566a5c8552..19e246a8fb 100644
> --- a/libavcodec/qsvenc.c
> +++ b/libavcodec/qsvenc.c
> @@ -701,8 +701,6 @@ FF_ENABLE_DEPRECATION_WARNINGS
>  
>  if (q->bitrate_limit >= 0)
>  q->extco2.BitrateLimit = q->bitrate_limit ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
> -if (q->mbbrc >= 0)
> -q->extco2.MBBRC = q->mbbrc ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
>  
>  if (q->max_frame_size >= 0)
>  q->extco2.MaxFrameSize = q->max_frame_size;
> @@ -755,6 +753,9 @@ FF_ENABLE_DEPRECATION_WARNINGS
>  q->extco2.MaxQPP = q->extco2.MaxQPB = q->extco2.MaxQPI;
>  }
>  #endif
> +if (q->mbbrc >= 0)
> +q->extco2.MBBRC = q->mbbrc ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
> +
>  q->extco2.Header.BufferId = MFX_EXTBUFF_CODING_OPTION2;
>  q->extco2.Header.BufferSz = sizeof(q->extco2);

LGTM, thanks!

-Haihao



[FFmpeg-devel] [PATCH] libavcodec/qsvenc: add mbbrc to hevc_qsv

2021-04-12 Thread wenbin . chen
From: "Chen,Wenbin" 

Add mbbrc to hevc_qsv
For detailed description, please see "mbbrc" part in:
https://github.com/Intel-Media-SDK/MediaSDK/blob/master/doc/mediasdk-man.md#mfxextcodingoption2

Signed-off-by: Wenbin Chen 
---
 libavcodec/qsvenc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libavcodec/qsvenc.c b/libavcodec/qsvenc.c
index 566a5c8552..19e246a8fb 100644
--- a/libavcodec/qsvenc.c
+++ b/libavcodec/qsvenc.c
@@ -701,8 +701,6 @@ FF_ENABLE_DEPRECATION_WARNINGS
 
 if (q->bitrate_limit >= 0)
 q->extco2.BitrateLimit = q->bitrate_limit ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
-if (q->mbbrc >= 0)
-q->extco2.MBBRC = q->mbbrc ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
 
 if (q->max_frame_size >= 0)
 q->extco2.MaxFrameSize = q->max_frame_size;
@@ -755,6 +753,9 @@ FF_ENABLE_DEPRECATION_WARNINGS
 q->extco2.MaxQPP = q->extco2.MaxQPB = q->extco2.MaxQPI;
 }
 #endif
+if (q->mbbrc >= 0)
+q->extco2.MBBRC = q->mbbrc ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
+
 q->extco2.Header.BufferId = MFX_EXTBUFF_CODING_OPTION2;
 q->extco2.Header.BufferSz = sizeof(q->extco2);
 
-- 
2.25.1


Re: [FFmpeg-devel] [PATCH] checkasm: add (private) kperf timing for macOS

2021-04-12 Thread Lynne
Apr 13, 2021, 02:45 by j...@itanimul.li:

> Signed-off-by: Josh Dekker 
> ---
>  configure|   2 +
>  tests/checkasm/Makefile  |   1 +
>  tests/checkasm/checkasm.c|  19 -
>  tests/checkasm/checkasm.h|  10 ++-
>  tests/checkasm/macos_kperf.c | 143 +++
>  tests/checkasm/macos_kperf.h |  23 ++
>  6 files changed, 195 insertions(+), 3 deletions(-)
>  create mode 100644 tests/checkasm/macos_kperf.c
>  create mode 100644 tests/checkasm/macos_kperf.h
>
> diff --git a/configure b/configure
> index d7a3f507e8..a47e3dea67 100755
> --- a/configure
> +++ b/configure
> @@ -490,6 +490,7 @@ Developer options (useful when working on FFmpeg itself):
>  --ignore-tests=TESTS comma-separated list (without "fate-" prefix
>  in the name) of tests whose result is ignored
>  --enable-linux-perf  enable Linux Performance Monitor API
> +  --enable-macos-kperf enable macOS kperf (private) API
>  --disable-large-testsdisable tests that use a large amount of memory
>  
>  NOTE: Object files are built at the place where configure is launched.
> @@ -1949,6 +1950,7 @@ CONFIG_LIST="
>  fontconfig
>  large_tests
>  linux_perf
> +macos_kperf
>  memory_poisoning
>  neon_clobber_test
>  ossfuzz
> diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
> index 1827a4e134..4abaef9c63 100644
> --- a/tests/checkasm/Makefile
> +++ b/tests/checkasm/Makefile
> @@ -58,6 +58,7 @@ CHECKASMOBJS-$(CONFIG_AVUTIL)  += $(AVUTILOBJS)
>  CHECKASMOBJS-$(ARCH_AARCH64)+= aarch64/checkasm.o
>  CHECKASMOBJS-$(HAVE_ARMV5TE_EXTERNAL)   += arm/checkasm.o
>  CHECKASMOBJS-$(HAVE_X86ASM) += x86/checkasm.o
> +CHECKASMOBJS-$(CONFIG_MACOS_KPERF)  += macos_kperf.o
>  
>  CHECKASMOBJS += $(CHECKASMOBJS-yes) checkasm.o
>  CHECKASMOBJS := $(sort $(CHECKASMOBJS:%=tests/checkasm/%))
> diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
> index 8338e8ff58..4c42040244 100644
> --- a/tests/checkasm/checkasm.c
> +++ b/tests/checkasm/checkasm.c
> @@ -26,6 +26,8 @@
>  # ifndef _GNU_SOURCE
>  #  define _GNU_SOURCE // for syscall (performance monitoring API)
>  # endif
> +#elif CONFIG_MACOS_KPERF
> +#include "macos_kperf.h"
>  #endif
>  
>  #include 
> @@ -637,9 +639,20 @@ static int bench_init_linux(void)
>  }
>  return 0;
>  }
> -#endif
> +#elif CONFIG_MACOS_KPERF
> +static int bench_init_kperf(void)
> +{
> +if (ff_kperf_init() || ff_kperf_setup())
> +return -1;
>  
> -#if !CONFIG_LINUX_PERF
> +if (ff_kperf_cycles(NULL)) {
> +fprintf(stderr, "checkasm must be run as root to use kperf on macOS\n");
> +return -1;
> +}
> +
> +return 0;
> +}
> +#else
>  static int bench_init_ffmpeg(void)
>  {
>  #ifdef AV_READ_TIME
> @@ -656,6 +669,8 @@ static int bench_init(void)
>  {
>  #if CONFIG_LINUX_PERF
>  int ret = bench_init_linux();
> +#elif CONFIG_MACOS_KPERF
> +int ret = bench_init_kperf();
>  #else
>  int ret = bench_init_ffmpeg();
>  #endif
> diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
> index ef6645e3a2..4127081d74 100644
> --- a/tests/checkasm/checkasm.h
> +++ b/tests/checkasm/checkasm.h
> @@ -31,6 +31,8 @@
>  #include 
>  #include 
>  #include 
> +#elif CONFIG_MACOS_KPERF
> +#include "macos_kperf.h"
>  #endif
>  
>  #include "libavutil/avstring.h"
> @@ -224,7 +226,7 @@ typedef struct CheckasmPerf {
>  int iterations;
>  } CheckasmPerf;
>  
> -#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF
> +#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF || CONFIG_MACOS_KPERF
>  
>  #if CONFIG_LINUX_PERF
>  #define PERF_START(t) do {  \
> @@ -235,6 +237,12 @@ typedef struct CheckasmPerf {
>  ioctl(sysfd, PERF_EVENT_IOC_DISABLE, 0);\
>  read(sysfd, &t, sizeof(t)); \
>  } while (0)
> +#elif CONFIG_MACOS_KPERF
> +#define PERF_START(t) do {  \
> +t = 0;  \
> +ff_kperf_cycles(&t);\
> +} while (0)
> +#define PERF_STOP(t) ff_kperf_cycles(&t)
>  #else
>  #define PERF_START(t) t = AV_READ_TIME()
>  #define PERF_STOP(t)  t = AV_READ_TIME() - t
> diff --git a/tests/checkasm/macos_kperf.c b/tests/checkasm/macos_kperf.c
> new file mode 100644
> index 0000000000..e6ae316608
> --- /dev/null
> +++ b/tests/checkasm/macos_kperf.c
> @@ -0,0 +1,143 @@
> +/*
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of 

[FFmpeg-devel] [PATCH] checkasm: add (private) kperf timing for macOS

2021-04-12 Thread Josh Dekker
Signed-off-by: Josh Dekker 
---
 configure|   2 +
 tests/checkasm/Makefile  |   1 +
 tests/checkasm/checkasm.c|  19 -
 tests/checkasm/checkasm.h|  10 ++-
 tests/checkasm/macos_kperf.c | 143 +++
 tests/checkasm/macos_kperf.h |  23 ++
 6 files changed, 195 insertions(+), 3 deletions(-)
 create mode 100644 tests/checkasm/macos_kperf.c
 create mode 100644 tests/checkasm/macos_kperf.h

diff --git a/configure b/configure
index d7a3f507e8..a47e3dea67 100755
--- a/configure
+++ b/configure
@@ -490,6 +490,7 @@ Developer options (useful when working on FFmpeg itself):
   --ignore-tests=TESTS comma-separated list (without "fate-" prefix
in the name) of tests whose result is ignored
   --enable-linux-perf  enable Linux Performance Monitor API
+  --enable-macos-kperf enable macOS kperf (private) API
   --disable-large-testsdisable tests that use a large amount of memory
 
 NOTE: Object files are built at the place where configure is launched.
@@ -1949,6 +1950,7 @@ CONFIG_LIST="
 fontconfig
 large_tests
 linux_perf
+macos_kperf
 memory_poisoning
 neon_clobber_test
 ossfuzz
diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 1827a4e134..4abaef9c63 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -58,6 +58,7 @@ CHECKASMOBJS-$(CONFIG_AVUTIL)  += $(AVUTILOBJS)
 CHECKASMOBJS-$(ARCH_AARCH64)+= aarch64/checkasm.o
 CHECKASMOBJS-$(HAVE_ARMV5TE_EXTERNAL)   += arm/checkasm.o
 CHECKASMOBJS-$(HAVE_X86ASM) += x86/checkasm.o
+CHECKASMOBJS-$(CONFIG_MACOS_KPERF)  += macos_kperf.o
 
 CHECKASMOBJS += $(CHECKASMOBJS-yes) checkasm.o
 CHECKASMOBJS := $(sort $(CHECKASMOBJS:%=tests/checkasm/%))
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 8338e8ff58..4c42040244 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -26,6 +26,8 @@
 # ifndef _GNU_SOURCE
 #  define _GNU_SOURCE // for syscall (performance monitoring API)
 # endif
+#elif CONFIG_MACOS_KPERF
+#include "macos_kperf.h"
 #endif
 
 #include 
@@ -637,9 +639,20 @@ static int bench_init_linux(void)
 }
 return 0;
 }
-#endif
+#elif CONFIG_MACOS_KPERF
+static int bench_init_kperf(void)
+{
+if (ff_kperf_init() || ff_kperf_setup())
+return -1;
 
-#if !CONFIG_LINUX_PERF
+if (ff_kperf_cycles(NULL)) {
+fprintf(stderr, "checkasm must be run as root to use kperf on macOS\n");
+return -1;
+}
+
+return 0;
+}
+#else
 static int bench_init_ffmpeg(void)
 {
 #ifdef AV_READ_TIME
@@ -656,6 +669,8 @@ static int bench_init(void)
 {
 #if CONFIG_LINUX_PERF
 int ret = bench_init_linux();
+#elif CONFIG_MACOS_KPERF
+int ret = bench_init_kperf();
 #else
 int ret = bench_init_ffmpeg();
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index ef6645e3a2..4127081d74 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -31,6 +31,8 @@
 #include 
 #include 
 #include 
+#elif CONFIG_MACOS_KPERF
+#include "macos_kperf.h"
 #endif
 
 #include "libavutil/avstring.h"
@@ -224,7 +226,7 @@ typedef struct CheckasmPerf {
 int iterations;
 } CheckasmPerf;
 
-#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF
+#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF || CONFIG_MACOS_KPERF
 
 #if CONFIG_LINUX_PERF
 #define PERF_START(t) do {  \
@@ -235,6 +237,12 @@ typedef struct CheckasmPerf {
 ioctl(sysfd, PERF_EVENT_IOC_DISABLE, 0);\
 read(sysfd, &t, sizeof(t)); \
 } while (0)
+#elif CONFIG_MACOS_KPERF
+#define PERF_START(t) do {  \
+t = 0;  \
+ff_kperf_cycles(&t);\
+} while (0)
+#define PERF_STOP(t) ff_kperf_cycles(&t)
 #else
 #define PERF_START(t) t = AV_READ_TIME()
 #define PERF_STOP(t)  t = AV_READ_TIME() - t
diff --git a/tests/checkasm/macos_kperf.c b/tests/checkasm/macos_kperf.c
new file mode 100644
index 0000000000..e6ae316608
--- /dev/null
+++ b/tests/checkasm/macos_kperf.c
@@ -0,0 +1,143 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include "macos_kperf.h"
+#include 
+#include 
+#include 
+

Re: [FFmpeg-devel] [PATCH] avcodec/jpeglsenc: Remove redundant pixel format checks

2021-04-12 Thread Andreas Rheinhardt
James Almer:
> On 4/12/2021 2:07 PM, Andreas Rheinhardt wrote:
>> This encoder has AVCodec.pix_fmts set, so ff_encode_preinit() already
>> checks for this.
>>
>> Signed-off-by: Andreas Rheinhardt 
>> ---
>> Will apply tomorrow unless there are objections.
>>
>>   libavcodec/jpeglsenc.c | 8 
>>   1 file changed, 8 deletions(-)
>>
>> diff --git a/libavcodec/jpeglsenc.c b/libavcodec/jpeglsenc.c
>> index 2bb6b1407a..d03ce32f41 100644
>> --- a/libavcodec/jpeglsenc.c
>> +++ b/libavcodec/jpeglsenc.c
>> @@ -429,14 +429,6 @@ FF_DISABLE_DEPRECATION_WARNINGS
>>   FF_ENABLE_DEPRECATION_WARNINGS
>>   #endif
>>   -    if (ctx->pix_fmt != AV_PIX_FMT_GRAY8  &&
>> -    ctx->pix_fmt != AV_PIX_FMT_GRAY16 &&
>> -    ctx->pix_fmt != AV_PIX_FMT_RGB24  &&
>> -    ctx->pix_fmt != AV_PIX_FMT_BGR24) {
>> -    av_log(ctx, AV_LOG_ERROR,
>> -   "Only grayscale and RGB24/BGR24 images are supported\n");
>> -    return -1;
>> -    }
>>   return 0;
>>   }
> 
> nit: The only code left in this function after this patch will be gone
> after the bump, so maybe either wrap the entire function (and the
> AVCodec initializer) with the relevant check, or postpone applying this
> patch until after the bump so you can remove the whole thing in one go.
> 

I am aware of that and my current plan is to just remove the whole init
function in the patch that removes the coded frame. I don't think it
makes much sense to touch the #ifs and even add new ones.

> LGTM regardless of the above.

Re: [FFmpeg-devel] [PATCH] Added Closed caption support for cuviddec for preserving a53 data n GPU decoding

2021-04-12 Thread James Almer

On 4/12/2021 5:21 PM, Dhanish Vijayan wrote:

Signed-off-by: Dhanish Vijayan 
---
  libavcodec/cuviddec.c | 199 ++
  1 file changed, 199 insertions(+)

diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
index ec57afdefe..3b07d0a874 100644
--- a/libavcodec/cuviddec.c
+++ b/libavcodec/cuviddec.c
@@ -46,6 +46,9 @@
  #define CUVID_HAS_AV1_SUPPORT
  #endif
  
+#define MAX_FRAME_COUNT 25
+#define A53_QUEUE_SIZE (MAX_FRAME_COUNT + 8)
+
  typedef struct CuvidContext
  {
  AVClass *avclass;
@@ -89,6 +92,11 @@ typedef struct CuvidContext
  cudaVideoCodec codec_type;
  cudaVideoChromaFormat chroma_format;
  
+uint8_t* a53_caption;
+int a53_caption_size;
+uint8_t* a53_caption_queue[A53_QUEUE_SIZE];
+int a53_caption_size_queue[A53_QUEUE_SIZE];
+
  CUVIDDECODECAPS caps8, caps10, caps12;
  
  CUVIDPARSERPARAMS cuparseinfo;

@@ -103,6 +111,8 @@ typedef struct CuvidParsedFrame
  CUVIDPARSERDISPINFO dispinfo;
  int second_field;
  int is_deinterlacing;
+uint8_t* a53_caption;
+int a53_caption_size;
  } CuvidParsedFrame;
  
  #define CHECK_CU(x) FF_CUDA_CHECK_DL(avctx, ctx->cudl, x)

@@ -338,6 +348,24 @@ static int CUDAAPI cuvid_handle_picture_decode(void *opaque, CUVIDPICPARAMS* pic
  
  ctx->key_frame[picparams->CurrPicIdx] = picparams->intra_pic_flag;
  
+if (ctx->a53_caption)
+{
+
+if (picparams->CurrPicIdx >= A53_QUEUE_SIZE)
+{
+av_log(avctx, AV_LOG_WARNING, "CurrPicIdx too big: %d\n", picparams->CurrPicIdx);
+av_freep(&ctx->a53_caption);
+}
+else
+{
+int pos = picparams->CurrPicIdx;
+av_freep(&ctx->a53_caption_queue[pos]);
+ctx->a53_caption_queue[pos] = ctx->a53_caption;
+ctx->a53_caption_size_queue[pos] = ctx->a53_caption_size;
+ctx->a53_caption = NULL;
+}
+}
+
  ctx->internal_error = CHECK_CU(ctx->cvdl->cuvidDecodePicture(ctx->cudecoder, picparams));
  if (ctx->internal_error < 0)
  return 0;
@@ -350,6 +378,20 @@ static int CUDAAPI cuvid_handle_picture_display(void *opaque, CUVIDPARSERDISPINF
  AVCodecContext *avctx = opaque;
  CuvidContext *ctx = avctx->priv_data;
  CuvidParsedFrame parsed_frame = { { 0 } };
+uint8_t* a53_caption = NULL;
+int a53_caption_size = 0;
+
+if (dispinfo->picture_index >= A53_QUEUE_SIZE)
+{
+av_log(avctx, AV_LOG_WARNING, "picture_index too big: %d\n", dispinfo->picture_index);
+}
+else
+{
+int pos = dispinfo->picture_index;
+a53_caption = ctx->a53_caption_queue[pos];
+a53_caption_size = ctx->a53_caption_size_queue[pos];
+ctx->a53_caption_queue[pos] = NULL;
+}
  
  parsed_frame.dispinfo = *dispinfo;

  ctx->internal_error = 0;
@@ -358,11 +400,17 @@ static int CUDAAPI cuvid_handle_picture_display(void *opaque, CUVIDPARSERDISPINF
  parsed_frame.dispinfo.progressive_frame = ctx->progressive_sequence;
  
  if (ctx->deint_mode_current == cudaVideoDeinterlaceMode_Weave) {
+parsed_frame.a53_caption = a53_caption;
+parsed_frame.a53_caption_size = a53_caption_size;
  av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
  } else {
  parsed_frame.is_deinterlacing = 1;
+parsed_frame.a53_caption = a53_caption;
+parsed_frame.a53_caption_size = a53_caption_size;
  av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
  if (!ctx->drop_second_field) {
+parsed_frame.a53_caption = NULL;
+parsed_frame.a53_caption_size = 0;
  parsed_frame.second_field = 1;
  av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
  }
@@ -382,6 +430,139 @@ static int cuvid_is_buffer_full(AVCodecContext *avctx)
  return (av_fifo_size(ctx->frame_queue) / sizeof(CuvidParsedFrame)) + delay >= ctx->nb_surfaces;
  }
  
+
+static void cuvid_mpeg_parse_a53(CuvidContext *ctx, const uint8_t* p, int buf_size)
+{
+const uint8_t* buf_end = p + buf_size;
+for(;;)
+{
+uint32_t start_code = -1;
+p = avpriv_find_start_code(p, buf_end, &start_code);
+if (start_code > 0x1ff)
+break;
+if (start_code != 0x1b2)
+continue;
+buf_size = buf_end - p;
+if (buf_size >= 6 &&
+p[0] == 'G' && p[1] == 'A' && p[2] == '9' && p[3] == '4' && p[4] == 3 && (p[5] & 0x40))
+{
+/* extract A53 Part 4 CC data */
+int cc_count = p[5] & 0x1f;
+if (cc_count > 0 && buf_size >= 7 + cc_count * 3)
+{
+av_freep(&ctx->a53_caption);
+ctx->a53_caption_size = cc_count * 3;
+ctx->a53_caption  = av_malloc(ctx->a53_caption_size);
+if (ctx->a53_caption)
+memcpy(ctx->a53_caption, p + 7, 

Re: [FFmpeg-devel] [PATCH] avcodec/jpeglsenc: Remove redundant pixel format checks

2021-04-12 Thread James Almer

On 4/12/2021 2:07 PM, Andreas Rheinhardt wrote:

This encoder has AVCodec.pix_fmts set, so ff_encode_preinit() already
checks for this.

Signed-off-by: Andreas Rheinhardt 
---
Will apply tomorrow unless there are objections.

  libavcodec/jpeglsenc.c | 8 
  1 file changed, 8 deletions(-)

diff --git a/libavcodec/jpeglsenc.c b/libavcodec/jpeglsenc.c
index 2bb6b1407a..d03ce32f41 100644
--- a/libavcodec/jpeglsenc.c
+++ b/libavcodec/jpeglsenc.c
@@ -429,14 +429,6 @@ FF_DISABLE_DEPRECATION_WARNINGS
  FF_ENABLE_DEPRECATION_WARNINGS
  #endif
  
-if (ctx->pix_fmt != AV_PIX_FMT_GRAY8  &&
-ctx->pix_fmt != AV_PIX_FMT_GRAY16 &&
-ctx->pix_fmt != AV_PIX_FMT_RGB24  &&
-ctx->pix_fmt != AV_PIX_FMT_BGR24) {
-av_log(ctx, AV_LOG_ERROR,
-   "Only grayscale and RGB24/BGR24 images are supported\n");
-return -1;
-}
  return 0;
  }


nit: The only code left in this function after this patch will be gone 
after the bump, so maybe either wrap the entire function (and the 
AVCodec initializer) with the relevant check, or postpone applying this 
patch until after the bump so you can remove the whole thing in one go.


LGTM regardless of the above.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] libavdevice/gdigrab: fix capture window title contain non-ASCII chars

2021-04-12 Thread Jan Ekström
On Sat, Mar 20, 2021 at 5:34 PM <1160386...@qq.com> wrote:
>
> From: He Yang <1160386...@qq.com>
>
> Signed-off-by: He Yang <1160386...@qq.com>

Sorry for taking a while to respond, and thank you for the
contribution. I have verified that this conversion and FindWindowW
usage indeed fixes issues with non-ASCII window titles.

Before:
[gdigrab @ 01d1f24b2cc0] Can't find window 'ジャンキーナイトタウンオーケストラ _
すりぃ feat.鏡音レン-sm36109943.mp4 - mpv', aborting.
title=ジャンキーナイトタウンオーケストラ _ すりぃ feat.鏡音レン-sm36109943.mp4 - mpv: I/O error

After:
[gdigrab @ 017d298b2cc0] Found window ジャンキーナイトタウンオーケストラ _  すりぃ
feat.鏡音レン-sm36109943.mp4 - mpv, capturing 1920x1080x32 at (0,0)
Input #0, gdigrab, from 'title=ジャンキーナイトタウンオーケストラ _ すりぃ
feat.鏡音レン-sm36109943.mp4 - mpv':

Now, taking things step-by-step, first from the most clear things:
1. FFmpeg utilizes C99 features, but follows the rule that no
declarations should happen after non-declaring code within a
scope/context.
src/libavdevice/gdigrab.c: In function 'gdigrab_read_header':
src/libavdevice/gdigrab.c:249:9: warning: ISO C90 forbids mixed
declarations and code [-Wdeclaration-after-statement]
  249 | const wchar_t *name_w = NULL;
  | ^

-> Basically fixed by moving the new wchar_t as the first thing in the
scope of that if branch.

2. Mismatch between function and the calling code in `const`ness.
Const things are nice, but in this case the function takes in a
non-const pointer.

src/libavdevice/gdigrab.c:250:30: warning: passing argument 2 of
'utf8towchar' from incompatible pointer type
[-Wincompatible-pointer-types]
  250 | if(utf8towchar(name, &name_w)) {
  |  ^~~
  |  |
  |  const wchar_t ** {aka const short
unsigned int **}
In file included from src/libavformat/os_support.h:148,
 from src/libavformat/internal.h:28,
 from src/libavdevice/gdigrab.c:32:
src/libavutil/wchar_filename.h:27:68: note: expected 'wchar_t **' {aka
'short unsigned int **'} but argument is of type 'const wchar_t **'
{aka 'const short unsigned int **'}
   27 | static inline int utf8towchar(const char *filename_utf8,
wchar_t **filename_w)
  |
~~^~

-> Fixed by removing the const from the wchar_t pointer.

Thus we move to actual review:

1. The libavutil header should be explicitly #included. That makes it
easier to find the users of a header.
2. When utf8towchar returns nonzero, ret should probably be set to
AVERROR(errno). That way we are not re-guessing implementation
specifics of the function. (noticed by Martin)
3. Some whitespace would be good between the variable
declarations/setting, doing the conversion and finally the actual
window finding.
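
The two warnings and review point 2 can be sketched together. This is a minimal sketch, not the gdigrab code: widen_ascii() is a hypothetical stand-in with the same calling shape as utf8towchar() (nonzero on failure, errno set), lookup_window() is a made-up caller, and a plain negated errno stands in for AVERROR(errno).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for utf8towchar(): same calling shape, but it only
 * widens 7-bit ASCII. Returns 0 on success, nonzero with errno set on failure. */
static int widen_ascii(const char *in, wchar_t **out)
{
    size_t i, len = strlen(in);
    wchar_t *w = malloc((len + 1) * sizeof(*w));

    if (!w) {
        errno = ENOMEM;
        return -1;
    }
    for (i = 0; i < len; i++) {
        if ((unsigned char)in[i] > 0x7f) {
            free(w);
            errno = EINVAL;
            return -1;
        }
        w[i] = (wchar_t)in[i];
    }
    w[len] = L'\0';
    *out = w;
    return 0;
}

/* Made-up caller: returns the title length in wide characters,
 * or a negated errno value on failure. */
static int lookup_window(const char *name)
{
    /* Declarations come first in the scope, and the pointer is not const,
     * so &name_w matches the wchar_t ** parameter. */
    wchar_t *name_w = NULL;
    int ret;

    assert(name != NULL);

    if (widen_ascii(name, &name_w)) {
        ret = -errno;   /* report the callee's error instead of guessing one */
        goto end;
    }

    /* A real caller would pass name_w to FindWindowW() here. */
    ret = (int)wcslen(name_w);

end:
    free(name_w);
    return ret;
}
```

With this shape the caller does not need to know which failures the conversion function can produce; it only forwards whatever errno reports.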

As I had to go through these points for the review process, I
basically posted a version with these changes @
https://github.com/jeeb/ffmpeg/commits/gdigrab_unicode_fix . I also
took the liberty of rewording the commit message somewhat. If you
think these changes are acceptable, then unless something new is
noticed, I consider this LGTM.

Best regards,
Jan

[FFmpeg-devel] [PATCH] Added Closed caption support for cuviddec for preserving a53 data n GPU decoding

2021-04-12 Thread Dhanish Vijayan
Signed-off-by: Dhanish Vijayan 
---
 libavcodec/cuviddec.c | 199 ++
 1 file changed, 199 insertions(+)

diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
index ec57afdefe..3b07d0a874 100644
--- a/libavcodec/cuviddec.c
+++ b/libavcodec/cuviddec.c
@@ -46,6 +46,9 @@
 #define CUVID_HAS_AV1_SUPPORT
 #endif
 
+#define MAX_FRAME_COUNT 25
+#define A53_QUEUE_SIZE (MAX_FRAME_COUNT + 8)
+
 typedef struct CuvidContext
 {
 AVClass *avclass;
@@ -89,6 +92,11 @@ typedef struct CuvidContext
 cudaVideoCodec codec_type;
 cudaVideoChromaFormat chroma_format;
 
+uint8_t* a53_caption;
+int a53_caption_size;
+uint8_t* a53_caption_queue[A53_QUEUE_SIZE];
+int a53_caption_size_queue[A53_QUEUE_SIZE];
+
 CUVIDDECODECAPS caps8, caps10, caps12;
 
 CUVIDPARSERPARAMS cuparseinfo;
@@ -103,6 +111,8 @@ typedef struct CuvidParsedFrame
 CUVIDPARSERDISPINFO dispinfo;
 int second_field;
 int is_deinterlacing;
+uint8_t* a53_caption;
+int a53_caption_size;
 } CuvidParsedFrame;
 
 #define CHECK_CU(x) FF_CUDA_CHECK_DL(avctx, ctx->cudl, x)
@@ -338,6 +348,24 @@ static int CUDAAPI cuvid_handle_picture_decode(void 
*opaque, CUVIDPICPARAMS* pic
 
 ctx->key_frame[picparams->CurrPicIdx] = picparams->intra_pic_flag;
 
+if (ctx->a53_caption)
+{
+
+if (picparams->CurrPicIdx >= A53_QUEUE_SIZE)
+{
+av_log(avctx, AV_LOG_WARNING, "CurrPicIdx too big: %d\n", picparams->CurrPicIdx);
+av_freep(&ctx->a53_caption);
+}
+else
+{
+int pos = picparams->CurrPicIdx;
+av_freep(&ctx->a53_caption_queue[pos]);
+ctx->a53_caption_queue[pos] = ctx->a53_caption;
+ctx->a53_caption_size_queue[pos] = ctx->a53_caption_size;
+ctx->a53_caption = NULL;
+}
+}
+
 ctx->internal_error = 
CHECK_CU(ctx->cvdl->cuvidDecodePicture(ctx->cudecoder, picparams));
 if (ctx->internal_error < 0)
 return 0;
@@ -350,6 +378,20 @@ static int CUDAAPI cuvid_handle_picture_display(void 
*opaque, CUVIDPARSERDISPINF
 AVCodecContext *avctx = opaque;
 CuvidContext *ctx = avctx->priv_data;
 CuvidParsedFrame parsed_frame = { { 0 } };
+uint8_t* a53_caption = NULL;
+int a53_caption_size = 0;
+
+if (dispinfo->picture_index >= A53_QUEUE_SIZE)
+{
+av_log(avctx, AV_LOG_WARNING, "picture_index too big: %d\n", 
dispinfo->picture_index);
+}
+else
+{
+int pos = dispinfo->picture_index;
+a53_caption = ctx->a53_caption_queue[pos];
+a53_caption_size = ctx->a53_caption_size_queue[pos];
+ctx->a53_caption_queue[pos] = NULL;
+}
 
 parsed_frame.dispinfo = *dispinfo;
 ctx->internal_error = 0;
@@ -358,11 +400,17 @@ static int CUDAAPI cuvid_handle_picture_display(void 
*opaque, CUVIDPARSERDISPINF
 parsed_frame.dispinfo.progressive_frame = ctx->progressive_sequence;
 
 if (ctx->deint_mode_current == cudaVideoDeinterlaceMode_Weave) {
+parsed_frame.a53_caption = a53_caption;
+parsed_frame.a53_caption_size = a53_caption_size;
 av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
 } else {
 parsed_frame.is_deinterlacing = 1;
+parsed_frame.a53_caption = a53_caption;
+parsed_frame.a53_caption_size = a53_caption_size;
 av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
 if (!ctx->drop_second_field) {
+parsed_frame.a53_caption = NULL;
+parsed_frame.a53_caption_size = 0;
 parsed_frame.second_field = 1;
 av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
 }
@@ -382,6 +430,139 @@ static int cuvid_is_buffer_full(AVCodecContext *avctx)
 return (av_fifo_size(ctx->frame_queue) / sizeof(CuvidParsedFrame)) + delay 
>= ctx->nb_surfaces;
 }
 
+
+static void cuvid_mpeg_parse_a53(CuvidContext *ctx, const uint8_t* p, int 
buf_size)
+{
+const uint8_t* buf_end = p + buf_size;
+for(;;)
+{
+uint32_t start_code = -1;
+p = avpriv_find_start_code(p, buf_end, &start_code);
+if (start_code > 0x1ff)
+break;
+if (start_code != 0x1b2)
+continue;
+buf_size = buf_end - p;
+if (buf_size >= 6 &&
+p[0] == 'G' && p[1] == 'A' && p[2] == '9' && p[3] == '4' && p[4] 
== 3 && (p[5] & 0x40))
+{
+/* extract A53 Part 4 CC data */
+int cc_count = p[5] & 0x1f;
+if (cc_count > 0 && buf_size >= 7 + cc_count * 3)
+{
+av_freep(&ctx->a53_caption);
+ctx->a53_caption_size = cc_count * 3;
+ctx->a53_caption  = av_malloc(ctx->a53_caption_size);
+if (ctx->a53_caption)
+memcpy(ctx->a53_caption, p + 7, ctx->a53_caption_size);
+}
+}
+else if (buf_size >= 11 && p[0] == 'C' && p[1] 

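The GA94 branch of cuvid_mpeg_parse_a53(), as far as it is visible in the hunk above, can be sketched as a standalone helper. This is a sketch under assumptions, not the patch's code: extract_ga94_cc() is a hypothetical name, it does no start-code scanning, and it returns a pointer into the input instead of allocating a copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: p points just past a user-data start code (0x1b2) and
 * size is the number of bytes left. Returns a pointer to the caption triplets
 * and stores their byte count in *cc_size, or returns NULL if the payload is
 * not a usable GA94 caption block. */
static const uint8_t *extract_ga94_cc(const uint8_t *p, size_t size,
                                      size_t *cc_size)
{
    size_t cc_count;

    assert(p != NULL && cc_size != NULL);

    if (size < 6 ||
        p[0] != 'G' || p[1] != 'A' || p[2] != '9' || p[3] != '4' ||
        p[4] != 3 ||            /* user data type: caption data */
        !(p[5] & 0x40))         /* caption data must be flagged for processing */
        return NULL;

    cc_count = p[5] & 0x1f;     /* number of 3-byte caption triplets */
    if (cc_count == 0 || size < 7 + cc_count * 3)
        return NULL;

    *cc_size = cc_count * 3;
    return p + 7;               /* triplets start 7 bytes into the payload */
}
```

The size checks mirror the ones in the patch: 6 bytes are needed before the flags byte can be read, and 7 + cc_count * 3 bytes before the triplets can be copied.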
[FFmpeg-devel] [PATCH v2] lavc/aarch64: add pred16x16 10-bit functions

2021-04-12 Thread Mikhail Nitenko
Benchmarks:
pred16x16_dc_10_c: 124.0
pred16x16_dc_10_neon: 97.2
pred16x16_horizontal_10_c: 71.7
pred16x16_horizontal_10_neon: 66.2
pred16x16_top_dc_10_c: 90.7
pred16x16_top_dc_10_neon: 71.5
pred16x16_vertical_10_c: 64.7
pred16x16_vertical_10_neon: 61.7

Some functions work slower than C and are left commented out.

Signed-off-by: Mikhail Nitenko 
---
 libavcodec/aarch64/h264pred_init.c |  68 +
 libavcodec/aarch64/h264pred_neon.S | 117 +
 2 files changed, 155 insertions(+), 30 deletions(-)

diff --git a/libavcodec/aarch64/h264pred_init.c 
b/libavcodec/aarch64/h264pred_init.c
index b144376f90..325a86bfcd 100644
--- a/libavcodec/aarch64/h264pred_init.c
+++ b/libavcodec/aarch64/h264pred_init.c
@@ -45,42 +45,50 @@ void ff_pred8x8_0lt_dc_neon(uint8_t *src, ptrdiff_t stride);
 void ff_pred8x8_l00_dc_neon(uint8_t *src, ptrdiff_t stride);
 void ff_pred8x8_0l0_dc_neon(uint8_t *src, ptrdiff_t stride);
 
+void ff_pred16x16_top_dc_neon_10(uint8_t *src, ptrdiff_t stride);
+void ff_pred16x16_dc_neon_10(uint8_t *src, ptrdiff_t stride);
+void ff_pred16x16_hor_neon_10(uint8_t *src, ptrdiff_t stride);
+void ff_pred16x16_vert_neon_10(uint8_t *src, ptrdiff_t stride);
+
 static av_cold void h264_pred_init_neon(H264PredContext *h, int codec_id,
 const int bit_depth,
 const int chroma_format_idc)
 {
-const int high_depth = bit_depth > 8;
-
-if (high_depth)
-return;
-
-if (chroma_format_idc <= 1) {
-h->pred8x8[VERT_PRED8x8 ] = ff_pred8x8_vert_neon;
-h->pred8x8[HOR_PRED8x8  ] = ff_pred8x8_hor_neon;
-if (codec_id != AV_CODEC_ID_VP7 && codec_id != AV_CODEC_ID_VP8)
-h->pred8x8[PLANE_PRED8x8] = ff_pred8x8_plane_neon;
-h->pred8x8[DC_128_PRED8x8   ] = ff_pred8x8_128_dc_neon;
-if (codec_id != AV_CODEC_ID_RV40 && codec_id != AV_CODEC_ID_VP7 &&
-codec_id != AV_CODEC_ID_VP8) {
-h->pred8x8[DC_PRED8x8 ] = ff_pred8x8_dc_neon;
-h->pred8x8[LEFT_DC_PRED8x8] = ff_pred8x8_left_dc_neon;
-h->pred8x8[TOP_DC_PRED8x8 ] = ff_pred8x8_top_dc_neon;
-h->pred8x8[ALZHEIMER_DC_L0T_PRED8x8] = ff_pred8x8_l0t_dc_neon;
-h->pred8x8[ALZHEIMER_DC_0LT_PRED8x8] = ff_pred8x8_0lt_dc_neon;
-h->pred8x8[ALZHEIMER_DC_L00_PRED8x8] = ff_pred8x8_l00_dc_neon;
-h->pred8x8[ALZHEIMER_DC_0L0_PRED8x8] = ff_pred8x8_0l0_dc_neon;
+if (bit_depth == 8) {
+if (chroma_format_idc <= 1) {
+h->pred8x8[VERT_PRED8x8 ] = ff_pred8x8_vert_neon;
+h->pred8x8[HOR_PRED8x8  ] = ff_pred8x8_hor_neon;
+if (codec_id != AV_CODEC_ID_VP7 && codec_id != AV_CODEC_ID_VP8)
+h->pred8x8[PLANE_PRED8x8] = ff_pred8x8_plane_neon;
+h->pred8x8[DC_128_PRED8x8   ] = ff_pred8x8_128_dc_neon;
+if (codec_id != AV_CODEC_ID_RV40 && codec_id != AV_CODEC_ID_VP7 &&
+codec_id != AV_CODEC_ID_VP8) {
+h->pred8x8[DC_PRED8x8 ] = ff_pred8x8_dc_neon;
+h->pred8x8[LEFT_DC_PRED8x8] = ff_pred8x8_left_dc_neon;
+h->pred8x8[TOP_DC_PRED8x8 ] = ff_pred8x8_top_dc_neon;
+h->pred8x8[ALZHEIMER_DC_L0T_PRED8x8] = ff_pred8x8_l0t_dc_neon;
+h->pred8x8[ALZHEIMER_DC_0LT_PRED8x8] = ff_pred8x8_0lt_dc_neon;
+h->pred8x8[ALZHEIMER_DC_L00_PRED8x8] = ff_pred8x8_l00_dc_neon;
+h->pred8x8[ALZHEIMER_DC_0L0_PRED8x8] = ff_pred8x8_0l0_dc_neon;
+}
 }
-}
 
-h->pred16x16[DC_PRED8x8 ] = ff_pred16x16_dc_neon;
-h->pred16x16[VERT_PRED8x8   ] = ff_pred16x16_vert_neon;
-h->pred16x16[HOR_PRED8x8] = ff_pred16x16_hor_neon;
-h->pred16x16[LEFT_DC_PRED8x8] = ff_pred16x16_left_dc_neon;
-h->pred16x16[TOP_DC_PRED8x8 ] = ff_pred16x16_top_dc_neon;
-h->pred16x16[DC_128_PRED8x8 ] = ff_pred16x16_128_dc_neon;
-if (codec_id != AV_CODEC_ID_SVQ3 && codec_id != AV_CODEC_ID_RV40 &&
-codec_id != AV_CODEC_ID_VP7 && codec_id != AV_CODEC_ID_VP8)
-h->pred16x16[PLANE_PRED8x8  ] = ff_pred16x16_plane_neon;
+h->pred16x16[DC_PRED8x8 ] = ff_pred16x16_dc_neon;
+h->pred16x16[VERT_PRED8x8   ] = ff_pred16x16_vert_neon;
+h->pred16x16[HOR_PRED8x8] = ff_pred16x16_hor_neon;
+h->pred16x16[LEFT_DC_PRED8x8] = ff_pred16x16_left_dc_neon;
+h->pred16x16[TOP_DC_PRED8x8 ] = ff_pred16x16_top_dc_neon;
+h->pred16x16[DC_128_PRED8x8 ] = ff_pred16x16_128_dc_neon;
+if (codec_id != AV_CODEC_ID_SVQ3 && codec_id != AV_CODEC_ID_RV40 &&
+codec_id != AV_CODEC_ID_VP7 && codec_id != AV_CODEC_ID_VP8)
+h->pred16x16[PLANE_PRED8x8  ] = ff_pred16x16_plane_neon;
+}
+if (bit_depth == 10) {
+h->pred16x16[DC_PRED8x8 ] = ff_pred16x16_dc_neon_10;
+h->pred16x16[VERT_PRED8x8   ] = ff_pred16x16_vert_neon_10;
+h->pred16x16[HOR_PRED8x8] = 

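For readers unfamiliar with what pred16x16_dc computes at 10 bits, here is a scalar sketch. It assumes 16-bit sample storage, a stride given in bytes (as in the prototypes above), and available top and left neighbours; pred16x16_dc_ref() is a hypothetical name and this is not the C template code from FFmpeg.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch of 16x16 DC prediction on 16-bit samples (e.g. 10-bit video).
 * src8 points at the top-left sample of the block; stride is in bytes.
 * The row above and the column to the left of the block must be readable. */
static void pred16x16_dc_ref(uint8_t *src8, ptrdiff_t stride)
{
    uint16_t *src = (uint16_t *)src8;
    ptrdiff_t s = stride / (ptrdiff_t)sizeof(uint16_t);   /* stride in samples */
    unsigned sum = 0, dc;
    int x, y;

    assert(src8 != NULL && stride % (ptrdiff_t)sizeof(uint16_t) == 0);

    for (x = 0; x < 16; x++)
        sum += src[x - s];          /* row above the block */
    for (y = 0; y < 16; y++)
        sum += src[y * s - 1];      /* column left of the block */

    dc = (sum + 16) >> 5;           /* rounded mean of the 32 neighbours */

    for (y = 0; y < 16; y++)
        for (x = 0; x < 16; x++)
            src[y * s + x] = (uint16_t)dc;
}
```

A NEON version has to do the same sums and the same fill, only on wider samples than the 8-bit path, which is one reason the gains in the benchmarks above are modest.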
[FFmpeg-devel] [PATCH] avcodec/jpeglsenc: Remove redundant pixel format checks

2021-04-12 Thread Andreas Rheinhardt
This encoder has AVCodec.pix_fmts set, so ff_encode_preinit() already
checks for this.

Signed-off-by: Andreas Rheinhardt 
---
Will apply tomorrow unless there are objections.

 libavcodec/jpeglsenc.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/libavcodec/jpeglsenc.c b/libavcodec/jpeglsenc.c
index 2bb6b1407a..d03ce32f41 100644
--- a/libavcodec/jpeglsenc.c
+++ b/libavcodec/jpeglsenc.c
@@ -429,14 +429,6 @@ FF_DISABLE_DEPRECATION_WARNINGS
 FF_ENABLE_DEPRECATION_WARNINGS
 #endif
 
-if (ctx->pix_fmt != AV_PIX_FMT_GRAY8  &&
-ctx->pix_fmt != AV_PIX_FMT_GRAY16 &&
-ctx->pix_fmt != AV_PIX_FMT_RGB24  &&
-ctx->pix_fmt != AV_PIX_FMT_BGR24) {
-av_log(ctx, AV_LOG_ERROR,
-   "Only grayscale and RGB24/BGR24 images are supported\n");
-return -1;
-}
 return 0;
 }
 
-- 
2.27.0

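Why the removed check is redundant can be sketched as follows. This is a simplified model, not the code of ff_encode_preinit(): the enum, struct Codec, preinit_check() and jpegls_like are all hypothetical names, and the only property taken from FFmpeg is that a codec's supported-format list is terminated by a "none" value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced pixel format enum. */
enum PixFmt { FMT_NONE = -1, FMT_GRAY8, FMT_GRAY16, FMT_RGB24, FMT_BGR24,
              FMT_YUV420P };

/* Hypothetical codec descriptor carrying its list of supported formats. */
struct Codec {
    const enum PixFmt *pix_fmts;   /* terminated by FMT_NONE, may be NULL */
};

/* Generic check that runs before the encoder's own init function.
 * Returns 0 if the format is acceptable, -1 otherwise. */
static int preinit_check(const struct Codec *codec, enum PixFmt requested)
{
    const enum PixFmt *f;

    assert(codec != NULL);

    if (!codec->pix_fmts)
        return 0;                  /* no list: nothing to enforce here */
    for (f = codec->pix_fmts; *f != FMT_NONE; f++)
        if (*f == requested)
            return 0;
    return -1;
}

static const enum PixFmt jpegls_like_fmts[] = {
    FMT_BGR24, FMT_RGB24, FMT_GRAY8, FMT_GRAY16, FMT_NONE
};
static const struct Codec jpegls_like = { jpegls_like_fmts };
```

Once the generic step has rejected every format outside the list, a second comparison against the same four formats inside the encoder's init can never fail.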

Re: [FFmpeg-devel] [PATCH v3 2/5] avcodec/mips: Refine get_cabac_inline_mips.

2021-04-12 Thread 殷时友

> On Mar 31, 2021, at 10:39 PM, Michael Niedermayer  wrote:
> 
> On Tue, Mar 30, 2021 at 08:51:52PM +0800, Shiyou Yin wrote:
>> 1. Refined function get_cabac_inline_mips.
>> 2. Optimize function get_cabac_bypass and get_cabac_bypass_sign.
>> 
>> Speed of decoding h264: 4.89x ==> 5.05x(tested on 3A4000).
>> ---
>> libavcodec/mips/cabac.h | 131 
>> +---
>> 1 file changed, 102 insertions(+), 29 deletions(-)
> 
> This breaks fate with qemu mips
> 
> --- ffmpeg/tests/ref/fate/hevc-cabac-tudepth  2021-03-26 18:34:55.142789579 
> +0100
> +++ tests/data/fate/hevc-cabac-tudepth2021-03-31 16:36:50.613173111 
> +0200
> @@ -3,4 +3,4 @@
> #codec_id 0: rawvideo
> #dimensions 0: 64x64
> #sar 0: 0/1
> -0,  0,  0,1,12288, 0x0127a0d9
> +0,  0,  0,1,12288, 0xa330b3bd
> Test hevc-cabac-tudepth failed. Look at 
> tests/data/fate/hevc-cabac-tudepth.err for details.
> ffmpeg/tests/Makefile:255: recipe for target 'fate-hevc-cabac-tudepth' failed
> make: *** [fate-hevc-cabac-tudepth] Error 1
> 

This bug was caused by using 'lhu' to load two bytes of data in a big-endian
environment. It has been fixed in v4.
Please help merge the series.
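
The general hazard behind this kind of bug can be sketched in portable C. This is not the code from cabac.h: load_native16() and load_be16() are hypothetical helpers. A halfword load such as 'lhu' behaves like the native load, whose result depends on the CPU's byte order, while the explicit version gives the same value everywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Native 16-bit load: the result depends on the byte order of the CPU. */
static uint16_t load_native16(const uint8_t *p)
{
    uint16_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Explicit big-endian load: first byte is the most significant one,
 * regardless of the CPU's byte order. */
static uint16_t load_be16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the bytes 0x12 0x34, load_be16() always yields 0x1234, whereas load_native16() yields 0x3412 on a little-endian CPU. Assembly that post-processes a native load under the assumption of one byte order therefore produces wrong values on the other.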

BTW, I found another failing case, 'fate-sub2video_time_limited', when testing
origin/master with the cross compiler mips-linux-gnu-gcc-8 on debian10-x64 and
running fate with qemu-mips.
I will try to analyze it later.

My configuration: --samples=../../fate-suite/ --target-exec='/usr/bin/qemu-mips 
-cpu 74Kf -L /usr/mips-linux-gnu/' --cross-prefix=/usr/mips-linux-gnu/bin/ 
--cc=mips-linux-gnu-gcc-8 --arch=mips --target-os=linux --optflags='-O3 -g 
-static' --extra-ldflags='-static' --enable-cross-compile --enable-static 
--enable-gpl --disable-pthreads --disable-iconv --disable-mipsfpu

TEST    sub2video_time_limited
--- src/tests/ref/fate/sub2video_time_limited   2021-04-10 11:53:37.661350105 
+0800
+++ tests/data/fate/sub2video_time_limited  2021-04-12 23:18:29.355527385 
+0800
@@ -4,5 +4,5 @@
 #dimensions 0: 1920x1080
 #sar 0: 0/1
 0,  2,  2,1,  8294400, 0x
-0,  2,  2,1,  8294400, 0xa87c518f
-0, 10, 10,1,  8294400, 0xa87c518f
+0,  2,  2,1,  8294400, 0xea5a518f
+0, 10, 10,1,  8294400, 0xea5a518f
Test sub2video_time_limited failed. Look at 
tests/data/fate/sub2video_time_limited.err for details.
make: *** [src/tests/Makefile:256:fate-sub2video_time_limited] Error 1

Re: [FFmpeg-devel] [PATCH] avformat/rawenc: remove singlejpeg muxer

2021-04-12 Thread Gyan Doshi

Ping.

On 2021-04-10 20:00, Gyan Doshi wrote:

It was added in 51ac1f616f due to ticket #4218, in order to show a single
image via ffserver. With ffserver long gone, it serves no purpose.
---
  libavformat/Makefile |  1 -
  libavformat/allformats.c |  1 -
  libavformat/rawenc.c | 13 -
  3 files changed, 15 deletions(-)

diff --git a/libavformat/Makefile b/libavformat/Makefile
index 0f340f74a0..bc1ddfa81c 100644
--- a/libavformat/Makefile
+++ b/libavformat/Makefile
@@ -506,7 +506,6 @@ OBJS-$(CONFIG_SGA_DEMUXER)   += sga.o
  OBJS-$(CONFIG_SHORTEN_DEMUXER)   += shortendec.o rawdec.o
  OBJS-$(CONFIG_SIFF_DEMUXER)  += siff.o
  OBJS-$(CONFIG_SIMBIOSIS_IMX_DEMUXER) += imx.o
-OBJS-$(CONFIG_SINGLEJPEG_MUXER)  += rawenc.o
  OBJS-$(CONFIG_SLN_DEMUXER)   += pcmdec.o pcm.o
  OBJS-$(CONFIG_SMACKER_DEMUXER)   += smacker.o
  OBJS-$(CONFIG_SMJPEG_DEMUXER)+= smjpegdec.o smjpeg.o
diff --git a/libavformat/allformats.c b/libavformat/allformats.c
index a38fd1f583..fa093c7ac2 100644
--- a/libavformat/allformats.c
+++ b/libavformat/allformats.c
@@ -405,7 +405,6 @@ extern AVInputFormat  ff_sga_demuxer;
  extern AVInputFormat  ff_shorten_demuxer;
  extern AVInputFormat  ff_siff_demuxer;
  extern AVInputFormat  ff_simbiosis_imx_demuxer;
-extern AVOutputFormat ff_singlejpeg_muxer;
  extern AVInputFormat  ff_sln_demuxer;
  extern AVInputFormat  ff_smacker_demuxer;
  extern AVInputFormat  ff_smjpeg_demuxer;
diff --git a/libavformat/rawenc.c b/libavformat/rawenc.c
index caec297f4a..a43a7a6278 100644
--- a/libavformat/rawenc.c
+++ b/libavformat/rawenc.c
@@ -399,19 +399,6 @@ AVOutputFormat ff_mjpeg_muxer = {
  };
  #endif
  
-#if CONFIG_SINGLEJPEG_MUXER

-AVOutputFormat ff_singlejpeg_muxer = {
-.name  = "singlejpeg",
-.long_name = NULL_IF_CONFIG_SMALL("JPEG single image"),
-.mime_type = "image/jpeg",
-.audio_codec   = AV_CODEC_ID_NONE,
-.video_codec   = AV_CODEC_ID_MJPEG,
-.init  = force_one_stream,
-.write_packet  = ff_raw_write_packet,
-.flags = AVFMT_NOTIMESTAMPS,
-};
-#endif
-
  #if CONFIG_MLP_MUXER
  AVOutputFormat ff_mlp_muxer = {
  .name  = "mlp",



[FFmpeg-devel] [PATCH v4 5/5] mips: Fix potential illegal instruction error.

2021-04-12 Thread Shiyou Yin
MSA2 optimizations are attached to MSA macros in generic_macros_msa.h,
which makes a runtime check for them difficult. Removing this code
makes the build more robust. H264 1080p decoding: 5.13x==>5.12x.
---
 configure   |  7 +--
 libavutil/mips/generic_macros_msa.h | 37 -
 2 files changed, 1 insertion(+), 43 deletions(-)

diff --git a/configure b/configure
index d7a3f50..7b05612 100755
--- a/configure
+++ b/configure
@@ -451,7 +451,6 @@ Optimization options (experts only):
   --disable-mipsdspdisable MIPS DSP ASE R1 optimizations
   --disable-mipsdspr2  disable MIPS DSP ASE R2 optimizations
   --disable-msadisable MSA optimizations
-  --disable-msa2   disable MSA2 optimizations
   --disable-mipsfpudisable floating point MIPS optimizations
   --disable-mmidisable Loongson SIMD optimizations
   --disable-fast-unaligned consider unaligned accesses slow
@@ -2025,7 +2024,6 @@ ARCH_EXT_LIST_MIPS="
 mipsdsp
 mipsdspr2
 msa
-msa2
 "
 
 ARCH_EXT_LIST_LOONGSON="
@@ -2564,7 +2562,6 @@ mipsdsp_deps="mips"
 mipsdspr2_deps="mips"
 mmi_deps_any="loongson2 loongson3"
 msa_deps="mipsfpu"
-msa2_deps="msa"
 
 cpunop_deps="i686"
 x86_64_select="i686"
@@ -5907,9 +5904,8 @@ elif enabled mips; then
 enabled mipsdsp && check_inline_asm_flags mipsdsp '"addu.qb $t0, $t1, 
$t2"' '-mdsp'
 enabled mipsdspr2 && check_inline_asm_flags mipsdspr2 '"absq_s.qb $t0, 
$t1"' '-mdspr2'
 
-# MSA and MSA2 can be detected at runtime so we supply extra flags here
+# MSA can be detected at runtime so we supply extra flags here
 enabled mipsfpu && enabled msa && check_inline_asm msa '"addvi.b $w0, $w1, 
1"' '-mmsa' && append MSAFLAGS '-mmsa'
-enabled msa && enabled msa2 && check_inline_asm msa2 '"nxbits.any.b $w0, 
$w0"' '-mmsa2' && append MSAFLAGS '-mmsa2'
 
 # loongson2 have no switch cflag so we can only probe toolchain ability
 enabled loongson2 && check_inline_asm loongson2 '"dmult.g $8, $9, $10"' && 
disable loongson3
@@ -7340,7 +7336,6 @@ if enabled mips; then
 echo "MIPS DSP R1 enabled   ${mipsdsp-no}"
 echo "MIPS DSP R2 enabled   ${mipsdspr2-no}"
 echo "MIPS MSA enabled  ${msa-no}"
-echo "MIPS MSA2 enabled ${msa2-no}"
 echo "LOONGSON MMI enabled  ${mmi-no}"
 fi
 if enabled ppc; then
diff --git a/libavutil/mips/generic_macros_msa.h 
b/libavutil/mips/generic_macros_msa.h
index bb25e9f..1486f72 100644
--- a/libavutil/mips/generic_macros_msa.h
+++ b/libavutil/mips/generic_macros_msa.h
@@ -25,10 +25,6 @@
 #include 
 #include 
 
-#if HAVE_MSA2
-#include 
-#endif
-
 #define ALIGNMENT   16
 #define ALLOC_ALIGNED(align) __attribute__ ((aligned((align) << 1)))
 
@@ -1119,15 +1115,6 @@
  unsigned absolute diff values, even-odd pairs are added
  together to generate 8 halfword results.
 */
-#if HAVE_MSA2
-#define SAD_UB2_UH(in0, in1, ref0, ref1) \
-( {  \
-v8u16 sad_m = { 0 }; \
-sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in0, (v16u8) ref0); \
-sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in1, (v16u8) ref1); \
-sad_m;   \
-} )
-#else
 #define SAD_UB2_UH(in0, in1, ref0, ref1)\
 ( { \
 v16u8 diff0_m, diff1_m; \
@@ -1141,7 +1128,6 @@
 \
 sad_m;  \
 } )
-#endif // #if HAVE_MSA2
 
 /* Description : Insert specified word elements from input vectors to 1
  destination vector
@@ -2183,12 +2169,6 @@
  extracted and interleaved with same vector 'in0' to generate
  4 word elements keeping sign intact
 */
-#if HAVE_MSA2
-#define UNPCK_R_SH_SW(in, out)   \
-{\
-out = (v4i32) __builtin_msa2_w2x_lo_s_h((v8i16) in); \
-}
-#else
 #define UNPCK_R_SH_SW(in, out)   \
 {\
 v8i16 sign_m;\
@@ -2196,7 +2176,6 @@
 sign_m = __msa_clti_s_h((v8i16) in, 0);  \
 out = (v4i32) __msa_ilvr_h(sign_m, (v8i16) in);  \
 }
-#endif // #if HAVE_MSA2
 
 /* Description : Sign extend byte elements from input vector and return
  halfword results in pair of vectors
@@ -2209,13 +2188,6 @@
  Then interleaved left with same vector 'in0' to
  generate 8 signed halfword elements in 'out1'
 */
-#if HAVE_MSA2
-#define UNPCK_SB_SH(in, out0, out1)   \
-{

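The plain-MSA fallback that remains for UNPCK_R_SH_SW builds a sign mask with a compare against zero and interleaves it with the input. A scalar model of one lane is sketched below; widen_s16_bits() is a hypothetical name, and the placement of the mask in the upper halfword assumes the usual lane layout, so treat it as an illustration of the idea and not as a bit-exact description of the MSA instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane: sign-extend a 16-bit value to 32 bits by pairing
 * it with a mask that is all ones for negative inputs and zero otherwise.
 * The result is returned as a raw 32-bit pattern. */
static uint32_t widen_s16_bits(int16_t x)
{
    uint16_t sign = (uint16_t)(x < 0 ? 0xffff : 0x0000);  /* compare with 0 */
    uint32_t word = ((uint32_t)sign << 16) | (uint16_t)x; /* interleave     */

    assert((word & 0xffffu) == (uint16_t)x);  /* low half is the input as-is */
    return word;
}
```

The MSA2 variant removed by the patch did the same widening with a single builtin, which is why dropping it costs so little in the 1080p figure quoted above.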
[FFmpeg-devel] [PATCH v4 3/5] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.

2021-04-12 Thread Shiyou Yin
From: gxw 

Speed of decoding H264 1080P: 5.05x ==> 5.13x

Signed-off-by: Shiyou Yin 
---
 libavcodec/mips/Makefile|   3 +-
 libavcodec/mips/h264_deblock_msa.c  | 153 
 libavcodec/mips/h264dsp_init_mips.c |   2 +
 libavcodec/mips/h264dsp_mips.h  |   4 +
 4 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100644 libavcodec/mips/h264_deblock_msa.c

diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile
index 2be4d9b..81a73a4 100644
--- a/libavcodec/mips/Makefile
+++ b/libavcodec/mips/Makefile
@@ -57,7 +57,8 @@ MSA-OBJS-$(CONFIG_VP8_DECODER)+= 
mips/vp8_mc_msa.o \
  mips/vp8_lpf_msa.o
 MSA-OBJS-$(CONFIG_VP3DSP) += mips/vp3dsp_idct_msa.o
 MSA-OBJS-$(CONFIG_H264DSP)+= mips/h264dsp_msa.o\
- mips/h264idct_msa.o
+ mips/h264idct_msa.o   \
+ mips/h264_deblock_msa.o
 MSA-OBJS-$(CONFIG_H264QPEL)   += mips/h264qpel_msa.o
 MSA-OBJS-$(CONFIG_H264CHROMA) += mips/h264chroma_msa.o
 MSA-OBJS-$(CONFIG_H264PRED)   += mips/h264pred_msa.o
diff --git a/libavcodec/mips/h264_deblock_msa.c 
b/libavcodec/mips/h264_deblock_msa.c
new file mode 100644
index 000..4fed55c
--- /dev/null
+++ b/libavcodec/mips/h264_deblock_msa.c
@@ -0,0 +1,153 @@
+/*
+ * MIPS SIMD optimized H.264 deblocking code
+ *
+ * Copyright (c) 2020 Loongson Technology Corporation Limited
+ *Gu Xiwei 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "libavcodec/bit_depth_template.c"
+#include "h264dsp_mips.h"
+#include "libavutil/mips/generic_macros_msa.h"
+#include "libavcodec/mips/h264dsp_mips.h"
+
+#define h264_loop_filter_strength_iteration_msa(edges, step, mask_mv, dir, \
+d_idx, mask_dir)   \
+do {   \
+int b_idx = 0; \
+int step_x4 = step << 2; \
+int d_idx_12 = d_idx + 12; \
+int d_idx_52 = d_idx + 52; \
+int d_idx_x4 = d_idx << 2; \
+int d_idx_x4_48 = d_idx_x4 + 48; \
+int dir_x32  = dir * 32; \
+uint8_t *ref_t = (uint8_t*)ref; \
+uint8_t *mv_t  = (uint8_t*)mv; \
+uint8_t *nnz_t = (uint8_t*)nnz; \
+uint8_t *bS_t  = (uint8_t*)bS; \
+mask_mv <<= 3; \
+for (; b_idx < edges; b_idx += step) { \
+out &= mask_dir; \
+if (!(mask_mv & b_idx)) { \
+if (bidir) { \
+ref_2 = LD_SB(ref_t + d_idx_12); \
+ref_3 = LD_SB(ref_t + d_idx_52); \
+ref_0 = LD_SB(ref_t + 12); \
+ref_1 = LD_SB(ref_t + 52); \
+ref_2 = (v16i8)__msa_ilvr_w((v4i32)ref_3, (v4i32)ref_2); \
+ref_0 = (v16i8)__msa_ilvr_w((v4i32)ref_0, (v4i32)ref_0); \
+ref_1 = (v16i8)__msa_ilvr_w((v4i32)ref_1, (v4i32)ref_1); \
+ref_3 = (v16i8)__msa_shf_h((v8i16)ref_2, 0x4e); \
+ref_0 -= ref_2; \
+ref_1 -= ref_3; \
+ref_0 = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)ref_1); \
+\
+tmp_2 = LD_SH(mv_t + d_idx_x4_48);   \
+tmp_3 = LD_SH(mv_t + 48); \
+tmp_4 = LD_SH(mv_t + 208); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_0 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5); \
+tmp_0 += cnst_1; \
+tmp_0 = (v16i8)__msa_subs_u_b((v16u8)tmp_0, (v16u8)cnst_0);\
+tmp_0 = (v16i8)__msa_sat_s_h((v8i16)tmp_0, 7); \
+tmp_0 = __msa_pckev_b(tmp_0, tmp_0); \
+out   = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)tmp_0); \
+\
+tmp_2 = LD_SH(mv_t + 208 + d_idx_x4); \
+tmp_5 = tmp_2 - tmp_3; \
+tmp_6 = tmp_2 - tmp_4; \
+SAT_SH2_SH(tmp_5, tmp_6, 7); \
+tmp_1 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5); \
+tmp_1 += cnst_1; \
+tmp_1 = 

[FFmpeg-devel] [PATCH v4 4/5] avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msa

2021-04-12 Thread Shiyou Yin
From: gxw 

Use masks to avoid conditional branches. H264 4K decoding speed
improves by about 0.1 fps (tested on 3A4000).

Signed-off-by: Shiyou Yin 
---
 libavcodec/mips/h264dsp_msa.c | 465 --
 1 file changed, 171 insertions(+), 294 deletions(-)

diff --git a/libavcodec/mips/h264dsp_msa.c b/libavcodec/mips/h264dsp_msa.c
index a8c3f3c..9d815f8 100644
--- a/libavcodec/mips/h264dsp_msa.c
+++ b/libavcodec/mips/h264dsp_msa.c
@@ -1284,284 +1284,160 @@ static void 
avc_loopfilter_cb_or_cr_intra_edge_ver_msa(uint8_t *data_cb_or_cr,
 }
 }
 
-static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t *data,
-   uint8_t bs0, uint8_t bs1,
-   uint8_t bs2, uint8_t bs3,
-   uint8_t tc0, uint8_t tc1,
-   uint8_t tc2, uint8_t tc3,
-   uint8_t alpha_in,
-   uint8_t beta_in,
-   ptrdiff_t img_width)
+static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t* pPix, uint32_t 
iStride,
+   uint8_t iAlpha, uint8_t 
iBeta,
+   uint8_t* pTc)
 {
-v16u8 tmp_vec, bs = { 0 };
-
-tmp_vec = (v16u8) __msa_fill_b(bs0);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 0, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(bs1);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 1, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(bs2);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 2, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(bs3);
-bs = (v16u8) __msa_insve_w((v4i32) bs, 3, (v4i32) tmp_vec);
-
-if (!__msa_test_bz_v(bs)) {
-uint8_t *src = data - 4;
-v16u8 p3_org, p2_org, p1_org, p0_org, q0_org, q1_org, q2_org, q3_org;
-v16u8 p0_asub_q0, p1_asub_p0, q1_asub_q0, alpha, beta;
-v16u8 is_less_than, is_less_than_beta, is_less_than_alpha;
-v16u8 is_bs_greater_than0;
-v16u8 tc = { 0 };
-v16i8 zero = { 0 };
-
-tmp_vec = (v16u8) __msa_fill_b(tc0);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 0, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(tc1);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 1, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(tc2);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 2, (v4i32) tmp_vec);
-tmp_vec = (v16u8) __msa_fill_b(tc3);
-tc = (v16u8) __msa_insve_w((v4i32) tc, 3, (v4i32) tmp_vec);
-
-is_bs_greater_than0 = (zero < bs);
-
-{
-v16u8 row0, row1, row2, row3, row4, row5, row6, row7;
-v16u8 row8, row9, row10, row11, row12, row13, row14, row15;
-
-LD_UB8(src, img_width,
-   row0, row1, row2, row3, row4, row5, row6, row7);
-src += (8 * img_width);
-LD_UB8(src, img_width,
-   row8, row9, row10, row11, row12, row13, row14, row15);
-
-TRANSPOSE16x8_UB_UB(row0, row1, row2, row3, row4, row5, row6, row7,
-row8, row9, row10, row11,
-row12, row13, row14, row15,
-p3_org, p2_org, p1_org, p0_org,
-q0_org, q1_org, q2_org, q3_org);
-}
-
-p0_asub_q0 = __msa_asub_u_b(p0_org, q0_org);
-p1_asub_p0 = __msa_asub_u_b(p1_org, p0_org);
-q1_asub_q0 = __msa_asub_u_b(q1_org, q0_org);
-
-alpha = (v16u8) __msa_fill_b(alpha_in);
-beta = (v16u8) __msa_fill_b(beta_in);
-
-is_less_than_alpha = (p0_asub_q0 < alpha);
-is_less_than_beta = (p1_asub_p0 < beta);
-is_less_than = is_less_than_beta & is_less_than_alpha;
-is_less_than_beta = (q1_asub_q0 < beta);
-is_less_than = is_less_than_beta & is_less_than;
-is_less_than = is_less_than & is_bs_greater_than0;
-
-if (!__msa_test_bz_v(is_less_than)) {
-v16i8 negate_tc, sign_negate_tc;
-v16u8 p0, q0, p2_asub_p0, q2_asub_q0;
-v8i16 tc_r, tc_l, negate_tc_r, i16_negatetc_l;
-v8i16 p1_org_r, p0_org_r, q0_org_r, q1_org_r;
-v8i16 p1_org_l, p0_org_l, q0_org_l, q1_org_l;
-v8i16 p0_r, q0_r, p0_l, q0_l;
-
-negate_tc = zero - (v16i8) tc;
-sign_negate_tc = __msa_clti_s_b(negate_tc, 0);
-
-ILVRL_B2_SH(sign_negate_tc, negate_tc, negate_tc_r, 
i16_negatetc_l);
-
-UNPCK_UB_SH(tc, tc_r, tc_l);
-UNPCK_UB_SH(p1_org, p1_org_r, p1_org_l);
-UNPCK_UB_SH(p0_org, p0_org_r, p0_org_l);
-UNPCK_UB_SH(q0_org, q0_org_r, q0_org_l);
-
-p2_asub_p0 = __msa_asub_u_b(p2_org, p0_org);
-is_less_than_beta = (p2_asub_p0 < beta);
-is_less_than_beta = 

[FFmpeg-devel] [PATCH V4] [mips] Optimize H264 decoding for MIPS platform.

2021-04-12 Thread Shiyou Yin
v2: Fixed a build error in [PATCH 2/5].
v3: Added patch 4/5.
v4: Fixed a bug in 2/5 caused by the 'lhu' instruction in big-endian environments.

[PATCH v4 1/5] avcodec/mips: Restore the initialization sequence of
[PATCH v4 2/5] avcodec/mips: Refine get_cabac_inline_mips.
[PATCH v4 3/5] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.
[PATCH v4 4/5] avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msa
[PATCH v4 5/5] mips: Fix potential illegal instruction error.


[FFmpeg-devel] [PATCH v4 1/5] avcodec/mips: Restore the initialization sequence of MSA and MMI in ff_h264chroma_init_mips.

2021-04-12 Thread Shiyou Yin
The MSA optimization was refined in commits 93218c2 and ce0a52e.
It is now better than the MMI version.
Speed of decoding H264: 4.83x ==> 4.89x (tested on 3A4000).
---
 libavcodec/mips/h264chroma_init_mips.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/libavcodec/mips/h264chroma_init_mips.c b/libavcodec/mips/h264chroma_init_mips.c
index 6bb19d3..755cc04 100644
--- a/libavcodec/mips/h264chroma_init_mips.c
+++ b/libavcodec/mips/h264chroma_init_mips.c
@@ -28,7 +28,15 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 int cpu_flags = av_get_cpu_flags();
 int high_bit_depth = bit_depth > 8;
 
-/* MMI apears to be faster than MSA here */
+if (have_mmi(cpu_flags)) {
+if (!high_bit_depth) {
+c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
+c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
+c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
+c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
+}
+}
+
 if (have_msa(cpu_flags)) {
 if (!high_bit_depth) {
 c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_msa;
@@ -40,13 +48,4 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth)
 c->avg_h264_chroma_pixels_tab[2] = ff_avg_h264_chroma_mc2_msa;
 }
 }
-
-if (have_mmi(cpu_flags)) {
-if (!high_bit_depth) {
-c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi;
-c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi;
-c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi;
-c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi;
-}
-}
 }
-- 
2.1.0
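
The effect of the reordering can be sketched in plain C: whichever block runs last wins for the table slots that both blocks fill, so placing the MSA block after the MMI block makes MSA the choice when both are available. The flag values, the enum and all function names below are hypothetical stand-ins, not the real FFmpeg symbols.

```c
/* Hypothetical stand-ins for the real CPU flags. */
enum { FLAG_MMI = 1 << 0, FLAG_MSA = 1 << 1 };

/* Identifies which implementation a table slot points at. */
typedef enum { IMPL_C, IMPL_MMI, IMPL_MSA } Impl;

typedef Impl (*chroma_fn)(void);

static Impl mc8_c(void)   { return IMPL_C; }
static Impl mc8_mmi(void) { return IMPL_MMI; }
static Impl mc8_msa(void) { return IMPL_MSA; }

typedef struct ChromaTable {
    chroma_fn put_mc8;
} ChromaTable;

/* Same shape as the patched init function: MMI first, MSA second. */
static void chroma_init(ChromaTable *t, int cpu_flags)
{
    t->put_mc8 = mc8_c;              /* generic fallback */
    if (cpu_flags & FLAG_MMI)
        t->put_mc8 = mc8_mmi;
    if (cpu_flags & FLAG_MSA)
        t->put_mc8 = mc8_msa;        /* later assignment overrides MMI */
}

/* Convenience wrapper: report which implementation is selected. */
static Impl selected_impl(int cpu_flags)
{
    ChromaTable t;
    chroma_init(&t, cpu_flags);
    return t.put_mc8();
}
```

With both flags set, selected_impl() reports the MSA variant; with only the MMI flag set, the MMI variant is kept.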

[FFmpeg-devel] [PATCH v4 2/5] avcodec/mips: Refine get_cabac_inline_mips.

2021-04-12 Thread Shiyou Yin
1. Refine function get_cabac_inline_mips.
2. Optimize functions get_cabac_bypass and get_cabac_bypass_sign.

Speed of decoding H264: 4.89x ==> 5.05x (tested on 3A4000).
---
 libavcodec/mips/cabac.h | 140 ++--
 1 file changed, 112 insertions(+), 28 deletions(-)

diff --git a/libavcodec/mips/cabac.h b/libavcodec/mips/cabac.h
index 3d09e93..0648b9a 100644
--- a/libavcodec/mips/cabac.h
+++ b/libavcodec/mips/cabac.h
@@ -2,7 +2,8 @@
  * Loongson SIMD optimized h264chroma
  *
  * Copyright (c) 2018 Loongson Technology Corporation Limited
- * Copyright (c) 2018 Shiyou Yin 
+ * Contributed by Shiyou Yin 
+ *Gu Xiwei(guxiwei...@loongson.cn)
  *
  * This file is part of FFmpeg.
  *
@@ -25,18 +26,18 @@
 #define AVCODEC_MIPS_CABAC_H
 
 #include "libavcodec/cabac.h"
-#include "libavutil/mips/asmdefs.h"
+#include "libavutil/mips/mmiutils.h"
 #include "config.h"
 
 #define get_cabac_inline get_cabac_inline_mips
 static av_always_inline int get_cabac_inline_mips(CABACContext *c,
- uint8_t * const state){
+  uint8_t * const state){
 mips_reg tmp0, tmp1, tmp2, bit;
 
 __asm__ volatile (
 "lbu  %[bit],0(%[state])   \n\t"
 "and  %[tmp0],   %[c_range], 0xC0  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp0]   \n\t"
+PTR_SLL  "%[tmp0],   %[tmp0],0x01  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[bit]\n\t"
 /* tmp1: RangeLPS */
@@ -44,18 +45,11 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 
 PTR_SUBU "%[c_range],%[c_range], %[tmp1]   \n\t"
 PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
-PTR_SUBU "%[tmp0],   %[tmp0],%[c_low]  \n\t"
-
-/* tmp2: lps_mask */
-PTR_SRA  "%[tmp2],   %[tmp0],0x1F  \n\t"
-/* If tmp0 < 0, lps_mask ==  0x*/
-/* If tmp0 >= 0, lps_mask ==  0x*/
+"slt  %[tmp2],   %[tmp0],%[c_low]  \n\t"
 "beqz %[tmp2],   1f\n\t"
-PTR_SLL  "%[tmp0],   %[c_range], 0x11  \n\t"
+"move %[c_range],%[tmp1]   \n\t"
+"not  %[bit],%[bit]\n\t"
 PTR_SUBU "%[c_low],  %[c_low],   %[tmp0]   \n\t"
-PTR_SUBU "%[tmp0],   %[tmp1],%[c_range]\n\t"
-PTR_ADDU "%[c_range],%[c_range], %[tmp0]   \n\t"
-"xor  %[bit],%[bit], %[tmp2]   \n\t"
 
 "1:\n\t"
 /* tmp1: *state */
@@ -70,23 +64,21 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 PTR_SLL  "%[c_range],%[c_range], %[tmp2]   \n\t"
 PTR_SLL  "%[c_low],  %[c_low],   %[tmp2]   \n\t"
 
-"and  %[tmp0],   %[c_low],   %[cabac_mask] \n\t"
-"bnez %[tmp0],   1f\n\t"
-PTR_ADDIU"%[tmp0],   %[c_low],   -0x01 \n\t"
+"and  %[tmp1],   %[c_low],   %[cabac_mask] \n\t"
+"bnez %[tmp1],   1f\n\t"
+PTR_ADDIU"%[tmp0],   %[c_low],   -0X01 \n\t"
 "xor  %[tmp0],   %[c_low],   %[tmp0]   \n\t"
 PTR_SRA  "%[tmp0],   %[tmp0],0x0f  \n\t"
 PTR_ADDU "%[tmp0],   %[tmp0],%[tables] \n\t"
+/* tmp2: ff_h264_norm_shift[x >> (CABAC_BITS - 1)] */
 "lbu  %[tmp2],   %[norm_off](%[tmp0])  \n\t"
-#if CABAC_BITS == 16
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
-"lbu  %[tmp1],   1(%[c_bytestream])\n\t"
-PTR_SLL  "%[tmp0],   %[tmp0],0x09  \n\t"
-PTR_SLL  "%[tmp1],   %[tmp1],0x01  \n\t"
-PTR_ADDU "%[tmp0],   %[tmp0],%[tmp1]   \n\t"
+#if HAVE_BIGENDIAN
+"lhu  %[tmp0],   0(%[c_bytestream])\n\t"
 #else
-"lbu  %[tmp0],   0(%[c_bytestream])\n\t"
-PTR_SLL  "%[tmp0],   %[tmp0],0x01  \n\t"
+"lhu  %[tmp0],   0(%[c_bytestream])\n\t"
+"wsbh %[tmp0],   %[tmp0]   \n\t"
 #endif
+PTR_SLL  "%[tmp0],   %[tmp0],0x01  \n\t"
 PTR_SUBU "%[tmp0],   %[tmp0],%[cabac_mask] \n\t"
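
The bytestream refill in the hunk above can be checked in plain C. The removed sequence builds the value from two byte loads; the new one loads a halfword, swaps its bytes only on little-endian hosts (the `wsbh` step), then shifts left by one. The sketch below is an illustration with hypothetical function names, not code from the patch; it assumes the intended result is the two bytes read in big-endian order, shifted left by one.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise form, mirroring the removed lbu/lbu/sll/sll/addu sequence. */
static uint32_t refill_bytewise(const uint8_t *p)
{
    return ((uint32_t)p[0] << 9) + ((uint32_t)p[1] << 1);
}

/* Runtime endianness probe, standing in for HAVE_BIGENDIAN. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Halfword form: one 16-bit load, byte swap only on little-endian
 * hosts, then a single shift left by 1. */
static uint32_t refill_halfword(const uint8_t *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof(v));        /* plays the role of lhu */
    if (host_is_little_endian())
        v = (uint16_t)((v >> 8) | (v << 8));   /* plays the role of wsbh */
    return (uint32_t)v << 1;
}
```

Swapping unconditionally would give the wrong value on a big-endian host, which matches the v4 note about 'lhu' on BIGENDIAN environments.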
 
  

[FFmpeg-devel] [PATCH 5/5] avcodec/mpeg4videodec: update exported AVOptions in the user-facing context

2021-04-12 Thread James Almer
This prevents bogus values from being reported in frame-multithreaded
decoding scenarios.

Signed-off-by: James Almer 
---
 libavcodec/mpeg4videodec.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/libavcodec/mpeg4videodec.c b/libavcodec/mpeg4videodec.c
index 2c440a5026..de66fe8b83 100644
--- a/libavcodec/mpeg4videodec.c
+++ b/libavcodec/mpeg4videodec.c
@@ -3495,6 +3495,18 @@ static int mpeg4_update_thread_context(AVCodecContext *dst,
 
 return 0;
 }
+
+static int mpeg4_update_thread_context_for_user(AVCodecContext *dst,
+const AVCodecContext *src)
+{
+MpegEncContext *m = dst->priv_data;
+const MpegEncContext *m1 = src->priv_data;
+
+m->quarter_sample = m1->quarter_sample;
+m->divx_packed= m1->divx_packed;
+
+return 0;
+}
 #endif
 
 static av_cold void mpeg4_init_static(void)
@@ -3585,6 +3597,7 @@ AVCodec ff_mpeg4_decoder = {
 .pix_fmts  = ff_h263_hwaccel_pixfmt_list_420,
 .profiles  = NULL_IF_CONFIG_SMALL(ff_mpeg4_video_profiles),
 .update_thread_context = ONLY_IF_THREADS_ENABLED(mpeg4_update_thread_context),
+.update_thread_context_for_user = ONLY_IF_THREADS_ENABLED(mpeg4_update_thread_context_for_user),
 .priv_class = _class,
 .hw_configs= (const AVCodecHWConfigInternal *const []) {
 #if CONFIG_MPEG4_NVDEC_HWACCEL
-- 
2.31.1

Re: [FFmpeg-devel] [PATCH] aarch64: h264pred: Optimize the inner loop of existing 8 bit functions

2021-04-12 Thread Lynne
Apr 12, 2021, 10:07 by mar...@martin.st:

> Move the loop counter decrement further from the branch instruction,
> this hides the latency of the decrement.
>
> In loops that first load, then store (the horizontal prediction cases),
> do the decrement after the load (where the next instruction would
> stall a bit anyway, waiting for the result of the load).
>
> In loops that store twice using the same destination register,
> also do the decrement between the two stores (as the second store
> would need to wait for the updated destination register from the
> first instruction).
>
> In loops that store twice to two different destination registers,
> do the decrement before both stores, to do it as soon before the
> branch as possible.
>
> This gives minor (1-2 cycle) speedups in most cases (modulo measurement
> noise), but the horizontal prediction functions get a rather notable
> speedup on the Cortex A53.
>

LGTM

[FFmpeg-devel] [PATCH] libavutil/cpu: Fix definition of _GNU_SOURCE so it occurs before other includes

2021-04-12 Thread kevin . j . wheatley
From: Kevin Wheatley 

This fix moves the potential definition of _GNU_SOURCE prior to
any includes of system header files as required by the documentation
https://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html

This corrects the CPU_COUNT macro availability, resulting in
sched_getaffinity() being called on Linux systems. This then correctly
returns the number of CPUs when run under containers and in other cases
where processor affinity has been set up prior to running FFmpeg.
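
A minimal sketch of the ordering requirement, for illustration only: _GNU_SOURCE is defined before the first system header, so that glibc exposes CPU_COUNT(). The helper name is hypothetical, and the sysconf() fallback is an assumption added here to show how a missing CPU_COUNT can go unnoticed (the code still builds, it just ignores affinity).

```c
/* Must precede every system header, otherwise glibc hides CPU_COUNT(). */
#ifndef _GNU_SOURCE
# define _GNU_SOURCE
#endif
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: CPUs this process may run on. */
static int usable_cpu_count(void)
{
#if defined(CPU_COUNT)
    cpu_set_t set;

    CPU_ZERO(&set);
    /* pid 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(set), &set) == 0)
        return CPU_COUNT(&set);
#endif
    /* Assumed fallback: online CPUs, which ignores any affinity mask. */
    return (int)sysconf(_SC_NPROCESSORS_ONLN);
}
```

If a system header is included before the define, the CPU_COUNT branch is compiled out silently and the function reports all online CPUs, even inside a container restricted to a subset of them.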

Signed-off-by: Kevin J Wheatley 
---
 libavutil/cpu.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 8e3576a..1496c5d 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -16,6 +16,15 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#include "config.h"
+
+#if HAVE_SCHED_GETAFFINITY
+#ifndef _GNU_SOURCE
+# define _GNU_SOURCE
+#endif
+#include 
+#endif
+
 #include 
 #include 
 #include 
@@ -23,16 +32,9 @@
 #include "attributes.h"
 #include "cpu.h"
 #include "cpu_internal.h"
-#include "config.h"
 #include "opt.h"
 #include "common.h"
 
-#if HAVE_SCHED_GETAFFINITY
-#ifndef _GNU_SOURCE
-# define _GNU_SOURCE
-#endif
-#include 
-#endif
 #if HAVE_GETPROCESSAFFINITYMASK || HAVE_WINRT
 #include 
 #endif
-- 
1.8.5.6

[FFmpeg-devel] [PATCH] aarch64: h264pred: Optimize the inner loop of existing 8 bit functions

2021-04-12 Thread Martin Storsjö
Move the loop counter decrement further from the branch instruction;
this hides the latency of the decrement.

In loops that first load, then store (the horizontal prediction cases),
do the decrement after the load (where the next instruction would
stall a bit anyway, waiting for the result of the load).

In loops that store twice using the same destination register,
also do the decrement between the two stores (as the second store
would need to wait for the updated destination register from the
first instruction).

In loops that store twice to two different destination registers,
do the decrement before both stores, so that it happens as early
before the branch as possible.

This gives minor (1-2 cycle) speedups in most cases (modulo measurement
noise), but the horizontal prediction functions get a rather notable
speedup on the Cortex A53.

Before:                     Cortex A53     A72     A73
pred8x8_dc_8_neon:                60.7    46.2    39.2
pred8x8_dc_128_8_neon:            30.7    18.0    14.0
pred8x8_horizontal_8_neon:        42.2    29.2    18.5
pred8x8_left_dc_8_neon:           52.7    36.2    32.2
pred8x8_mad_cow_dc_0l0_8_neon:    48.2    27.7    25.7
pred8x8_mad_cow_dc_0lt_8_neon:    52.5    33.2    34.7
pred8x8_mad_cow_dc_l0t_8_neon:    52.5    31.7    33.2
pred8x8_mad_cow_dc_l00_8_neon:    43.2    27.0    25.5
pred8x8_plane_8_neon:            112.2    86.2    88.2
pred8x8_top_dc_8_neon:            40.7    23.0    21.2
pred8x8_vertical_8_neon:          27.2    15.5    14.0
pred16x16_dc_8_neon:              91.0    73.2    70.5
pred16x16_dc_128_8_neon:          43.0    34.7    30.7
pred16x16_horizontal_8_neon:      86.0    49.7    44.7
pred16x16_left_dc_8_neon:         87.0    67.2    67.5
pred16x16_plane_8_neon:          236.0   175.7   173.0
pred16x16_top_dc_8_neon:          53.2    39.0    41.7
pred16x16_vertical_8_neon:        41.7    29.7    31.0

After:
pred8x8_dc_8_neon:                59.0    46.7    42.5
pred8x8_dc_128_8_neon:            28.2    18.0    14.0
pred8x8_horizontal_8_neon:        34.2    29.2    18.5
pred8x8_left_dc_8_neon:           51.0    38.2    32.7
pred8x8_mad_cow_dc_0l0_8_neon:    46.7    28.2    26.2
pred8x8_mad_cow_dc_0lt_8_neon:    55.2    33.7    37.5
pred8x8_mad_cow_dc_l0t_8_neon:    51.2    31.7    37.2
pred8x8_mad_cow_dc_l00_8_neon:    41.7    27.5    26.0
pred8x8_plane_8_neon:            111.5    86.5    89.5
pred8x8_top_dc_8_neon:            39.0    23.2    21.0
pred8x8_vertical_8_neon:          27.2    16.0    14.0
pred16x16_dc_8_neon:              85.0    70.2    70.5
pred16x16_dc_128_8_neon:          42.0    30.0    30.7
pred16x16_horizontal_8_neon:      66.5    49.5    42.5
pred16x16_left_dc_8_neon:         81.0    66.5    67.5
pred16x16_plane_8_neon:          235.0   175.7   173.0
pred16x16_top_dc_8_neon:          52.0    39.0    41.7
pred16x16_vertical_8_neon:        40.2    33.2    31.0
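
To put the Cortex A53 gains for the horizontal prediction functions into numbers, the relative saving can be computed from the Before and After timings. A minimal sketch; the helper name is made up for illustration:

```c
/* Percentage of time saved when a timing drops from `before` to `after`. */
static double saved_percent(double before, double after)
{
    return (before - after) / before * 100.0;
}
```

With the A53 figures above, pred8x8_horizontal_8_neon goes from 42.2 to 34.2 (about 19% less) and pred16x16_horizontal_8_neon from 86.0 to 66.5 (about 22.7% less).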

Despite this, a number of these functions are still slower than
what e.g. GCC 7 generates. The following shows the relative speedup of
the NEON codepaths over the compiler-generated ones:

                            Cortex A53     A72     A73
pred8x8_dc_8_neon:                0.86    0.65    1.04
pred8x8_dc_128_8_neon:            0.59    0.44    0.62
pred8x8_horizontal_8_neon:        1.51    0.58    1.30
pred8x8_left_dc_8_neon:           0.72    0.56    0.89
pred8x8_mad_cow_dc_0l0_8_neon:    0.93    0.93    1.37
pred8x8_mad_cow_dc_0lt_8_neon:    1.37    1.41    1.68
pred8x8_mad_cow_dc_l0t_8_neon:    1.21    1.17    1.32
pred8x8_mad_cow_dc_l00_8_neon:    1.24    1.19    1.60
pred8x8_plane_8_neon:             3.36    3.58    3.76
pred8x8_top_dc_8_neon:            0.97    0.99    1.43
pred8x8_vertical_8_neon:          0.86    0.78    1.18
pred16x16_dc_8_neon:              1.20    1.06    1.49
pred16x16_dc_128_8_neon:          0.83    0.95    0.99
pred16x16_horizontal_8_neon:      1.78    0.96    1.59
pred16x16_left_dc_8_neon:         1.06    0.96    1.32
pred16x16_plane_8_neon:           5.78    6.49    7.19
pred16x16_top_dc_8_neon:          1.48    1.53    1.94
pred16x16_vertical_8_neon:        1.39    1.34    1.98

In particular, on Cortex A72, many of these functions are slower
than the compiler generated code, while they're more beneficial on
e.g. the Cortex A73.
---
 libavcodec/aarch64/h264pred_neon.S | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/libavcodec/aarch64/h264pred_neon.S b/libavcodec/aarch64/h264pred_neon.S
index 213b40b3e7..6fec33cf6a 100644
--- a/libavcodec/aarch64/h264pred_neon.S
+++ b/libavcodec/aarch64/h264pred_neon.S
@@ -81,8 +81,8 @@ function ff_pred16x16_dc_neon, export=1
 .L_pred16x16_dc_end:
 mov w3,  #8
 6:      st1             {v0.16b}, [x0], x1
-        st1             {v0.16b}, [x0], x1
         subs            w3,  w3,  #1
+        st1             {v0.16b}, [x0], x1
         b.ne            6b
 ret
 endfunc
@@ -91,8 +91,8 @@ function ff_pred16x16_hor_neon, export=1
 sub x2,  x0,  #1
 mov w3,  #16
 

Re: [FFmpeg-devel] [PATCH 2/2] lavc/qsvdec: export AVFilmGrainParams side data

2021-04-12 Thread Xiang, Haihao

Hi Mark / Zhong,

Could you please have a look at this patch when you get some time?

Thanks
Haihao


> When AV_CODEC_EXPORT_DATA_FILM_GRAIN is present, AV1 decoder should
> disable film grain application and export the corresponding side data
> ---
>  libavcodec/qsv_internal.h |  3 ++
>  libavcodec/qsvdec.c   | 88 +++
>  2 files changed, 91 insertions(+)
> 
> diff --git a/libavcodec/qsv_internal.h b/libavcodec/qsv_internal.h
> index 1d94d429e8..754581087d 100644
> --- a/libavcodec/qsv_internal.h
> +++ b/libavcodec/qsv_internal.h
> @@ -76,6 +76,9 @@ typedef struct QSVFrame {
>  mfxFrameSurface1 surface;
>  mfxEncodeCtrl enc_ctrl;
>  mfxExtDecodedFrameInfo dec_info;
> +#if QSV_VERSION_ATLEAST(1, 34)
> +mfxExtAV1FilmGrainParam av1_film_grain_param;
> +#endif
>  mfxExtBuffer *ext_param[QSV_MAX_FRAME_EXT_PARAMS];
>  int num_ext_params;
>  
> diff --git a/libavcodec/qsvdec.c b/libavcodec/qsvdec.c
> index 55cf9f35c5..e34441fc0b 100644
> --- a/libavcodec/qsvdec.c
> +++ b/libavcodec/qsvdec.c
> @@ -38,6 +38,7 @@
>  #include "libavutil/pixfmt.h"
>  #include "libavutil/time.h"
>  #include "libavutil/imgutils.h"
> +#include "libavutil/film_grain_params.h"
>  
>  #include "avcodec.h"
>  #include "internal.h"
> @@ -334,6 +335,11 @@ static int qsv_decode_header(AVCodecContext *avctx,
> QSVContext *q,
>  return ff_qsv_print_error(avctx, ret,
>  "Error decoding stream header");
>  
> +#if QSV_VERSION_ATLEAST(1, 34)
> +if (avctx->codec_id == AV_CODEC_ID_AV1)
> +param->mfx.FilmGrain = (avctx->export_side_data &
> AV_CODEC_EXPORT_DATA_FILM_GRAIN) ? 0 : param->mfx.FilmGrain;
> +#endif
> +
>  return 0;
>  }
>  
> @@ -373,6 +379,12 @@ static int alloc_frame(AVCodecContext *avctx, QSVContext
> *q, QSVFrame *frame)
>  frame->dec_info.Header.BufferId = MFX_EXTBUFF_DECODED_FRAME_INFO;
>  frame->dec_info.Header.BufferSz = sizeof(frame->dec_info);
>  ff_qsv_frame_add_ext_param(avctx, frame, (mfxExtBuffer *)&frame->dec_info);
> +#if QSV_VERSION_ATLEAST(1, 34)
> +frame->av1_film_grain_param.Header.BufferId =
> MFX_EXTBUFF_AV1_FILM_GRAIN_PARAM;
> +frame->av1_film_grain_param.Header.BufferSz = sizeof(frame-
> >av1_film_grain_param);
> +frame->av1_film_grain_param.FilmGrainFlags = 0;
> +ff_qsv_frame_add_ext_param(avctx, frame, (mfxExtBuffer *)&frame->av1_film_grain_param);
> +#endif
>  
>  frame->used = 1;
>  
> @@ -443,6 +455,73 @@ static QSVFrame *find_frame(QSVContext *q,
> mfxFrameSurface1 *surf)
>  return NULL;
>  }
>  
> +#if QSV_VERSION_ATLEAST(1, 34)
> +static int qsv_export_film_grain(AVCodecContext *avctx,
> mfxExtAV1FilmGrainParam *ext_param, AVFrame *frame)
> +{
> +AVFilmGrainParams *fgp;
> +AVFilmGrainAOMParams *aom;
> +int i;
> +
> +if (!(ext_param->FilmGrainFlags & MFX_FILM_GRAIN_APPLY))
> +return 0;
> +
> +fgp = av_film_grain_params_create_side_data(frame);
> +
> +if (!fgp)
> +return AVERROR(ENOMEM);
> +
> +fgp->type = AV_FILM_GRAIN_PARAMS_AV1;
> +fgp->seed = ext_param->GrainSeed;
> +aom = &fgp->codec.aom;
> +
> +aom->chroma_scaling_from_luma = !!(ext_param->FilmGrainFlags &
> MFX_FILM_GRAIN_CHROMA_SCALING_FROM_LUMA);
> +aom->scaling_shift = ext_param->GrainScalingMinus8 + 8;
> +aom->ar_coeff_lag = ext_param->ArCoeffLag;
> +aom->ar_coeff_shift = ext_param->ArCoeffShiftMinus6 + 6;
> +aom->grain_scale_shift = ext_param->GrainScaleShift;
> +aom->overlap_flag = !!(ext_param->FilmGrainFlags &
> MFX_FILM_GRAIN_OVERLAP);
> +aom->limit_output_range = !!(ext_param->FilmGrainFlags &
> MFX_FILM_GRAIN_CLIP_TO_RESTRICTED_RANGE);
> +
> +aom->num_y_points = ext_param->NumYPoints;
> +
> +for (i = 0; i < aom->num_y_points; i++) {
> +aom->y_points[i][0] = ext_param->PointY[i].Value;
> +aom->y_points[i][1] = ext_param->PointY[i].Scaling;
> +}
> +
> +aom->num_uv_points[0] = ext_param->NumCbPoints;
> +
> +for (i = 0; i < aom->num_uv_points[0]; i++) {
> +aom->uv_points[0][i][0] = ext_param->PointCb[i].Value;
> +aom->uv_points[0][i][1] = ext_param->PointCb[i].Scaling;
> +}
> +
> +aom->num_uv_points[1] = ext_param->NumCrPoints;
> +
> +for (i = 0; i < aom->num_uv_points[1]; i++) {
> +aom->uv_points[1][i][0] = ext_param->PointCr[i].Value;
> +aom->uv_points[1][i][1] = ext_param->PointCr[i].Scaling;
> +}
> +
> +for (i = 0; i < 24; i++)
> +aom->ar_coeffs_y[i] = ext_param->ArCoeffsYPlus128[i] - 128;
> +
> +for (i = 0; i < 25; i++) {
> +aom->ar_coeffs_uv[0][i] = ext_param->ArCoeffsCbPlus128[i] - 128;
> +aom->ar_coeffs_uv[1][i] = ext_param->ArCoeffsCrPlus128[i] - 128;
> +}
> +
> +aom->uv_mult[0] = ext_param->CbMult;
> +aom->uv_mult[1] = ext_param->CrMult;
> +aom->uv_mult_luma[0] = ext_param->CbLumaMult;
> +aom->uv_mult_luma[1] = ext_param->CrLumaMult;
> +aom->uv_offset[0] = ext_param->CbOffset;
> +

Re: [FFmpeg-devel] [PATCH 1/3] lavc/qsv: apply AVCodecContext AVOption -threads to QSV

2021-04-12 Thread Xiang, Haihao
On Sat, 2021-04-10 at 13:32 +0800, Linjie Fu wrote:
> Hi Haihao,
> 
> On Thu, Apr 8, 2021 at 3:10 PM Haihao Xiang  wrote:
> > 
> > By default the SDK creates a thread for each CPU when creating a mfx
> > session for decoding / encoding, which results in CPU overhead on a
> > multi CPU system. Actually creating 2 threads is a better choice for
> > most cases in practice.
> > 
> > This patch allows user to specify the number of threads created for a
> > mfx session via option -threads. If the number is not specified, 2
> > threads will be created by default.
> > 
> > Note the SDK requires at least 2 threads to avoid dead locks[1]
> > 
> > [1]
> > https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93
> > ---
> 
> Optional choice for users to specify the thread number looks reasonable to me,
> and decreasing the CPU overhead makes sense for HW encoding pipeline.
> 
> Also curious about what's the tradeoff of decreasing the thread number to 2.
> Would the performance or something else drop?

Thanks for the comment. MSDK threads are used to execute MSDK tasks. For a hw
decoding / encoding pipeline, these tasks are very light, so a few threads are
enough for them. I didn't see any performance drop in my testing after
applying this patch.
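
As a rough illustration of the rule described in the commit message (2 threads when -threads is not given, never fewer than 2), the selection could look like the following. This is a sketch only: the helper name is hypothetical, and raising a request of 1 to 2 is an assumption, since this thread does not say how the patch handles that case.

```c
/* Hypothetical helper: number of threads to request for an mfx session.
 * `requested` <= 0 stands for "option not specified". */
static int qsv_session_threads(int requested)
{
    if (requested <= 0)
        return 2;                          /* default from the commit message */
    /* Assumption: honor the SDK minimum of 2 instead of failing. */
    return requested < 2 ? 2 : requested;
}
```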

Regards
Haihao


> 
> - linjie