[FFmpeg-devel] [PATCH V8 3/3] lavfi: add filter dnn_detect for object detection
Below are the example steps to do object detection:

1. download and install l_openvino_toolkit_p_2021.1.110.tgz from
https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit/download.html
or get the source code (tag 2021.1), then build and install it.
2. export LD_LIBRARY_PATH with the openvino settings, for example:
.../deployment_tools/inference_engine/lib/intel64/:.../deployment_tools/inference_engine/external/tbb/lib/
3. rebuild ffmpeg from source code with configure options:
--enable-libopenvino
--extra-cflags='-I.../deployment_tools/inference_engine/include/'
--extra-ldflags='-L.../deployment_tools/inference_engine/lib/intel64'
4. download the model files and the test image
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/face-detection-adas-0001.bin
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/face-detection-adas-0001.xml
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/models/openvino/2021.1/face-detection-adas-0001.label
wget https://github.com/guoyejun/ffmpeg_dnn/raw/main/images/cici.jpg
5. run ffmpeg with:
./ffmpeg -i cici.jpg -vf dnn_detect=dnn_backend=openvino:model=face-detection-adas-0001.xml:input=data:output=detection_out:confidence=0.6:labels=face-detection-adas-0001.label,showinfo -f null -

We'll see the detection result as below:
[Parsed_showinfo_1 @ 0x560c21ecbe40] side data - detection bounding boxes:
[Parsed_showinfo_1 @ 0x560c21ecbe40] source: face-detection-adas-0001.xml
[Parsed_showinfo_1 @ 0x560c21ecbe40] index: 0, region: (1005, 813) -> (1086, 905), label: face, confidence: 10000/10000.
[Parsed_showinfo_1 @ 0x560c21ecbe40] index: 1, region: (888, 839) -> (967, 926), label: face, confidence: 6917/10000.

Two faces are detected, with confidence 100% and 69.17%.

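The confidence values printed by showinfo are rationals (numerator/denominator), so a value such as 6917/10000 corresponds to 69.17%. A minimal sketch of that conversion and of a threshold test like the one the `confidence` option implies; the `Rational` struct and both helper names are hypothetical stand-ins for illustration, not FFmpeg API:

```c
/* Hypothetical stand-in for a num/den pair such as the one showinfo prints. */
typedef struct Rational {
    int num;
    int den;
} Rational;

/* Convert a rational confidence to a percentage; yields 0 for a zero denominator. */
static double confidence_percent(Rational c)
{
    if (c.den == 0)
        return 0.0;
    return 100.0 * (double)c.num / (double)c.den;
}

/* Return 1 if the confidence reaches the threshold (for example 0.6), else 0. */
static int passes_threshold(Rational c, double threshold)
{
    return c.den != 0 && (double)c.num / (double)c.den >= threshold;
}
```

With a threshold of 0.6, as in the command line above, a detection at 6917/10000 is kept, while a threshold of 0.7 would drop it.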
Signed-off-by: Guo, Yejun --- configure | 1 + doc/filters.texi| 40 libavfilter/Makefile| 1 + libavfilter/allfilters.c| 1 + libavfilter/vf_dnn_detect.c | 421 5 files changed, 464 insertions(+) create mode 100644 libavfilter/vf_dnn_detect.c diff --git a/configure b/configure index d7a3f507e8..cc1013fb1d 100755 --- a/configure +++ b/configure @@ -3555,6 +3555,7 @@ derain_filter_select="dnn" deshake_filter_select="pixelutils" deshake_opencl_filter_deps="opencl" dilation_opencl_filter_deps="opencl" +dnn_detect_filter_select="dnn" dnn_processing_filter_select="dnn" drawtext_filter_deps="libfreetype" drawtext_filter_suggest="libfontconfig libfribidi" diff --git a/doc/filters.texi b/doc/filters.texi index 5e35fa6467..68f17dd563 100644 --- a/doc/filters.texi +++ b/doc/filters.texi @@ -10127,6 +10127,46 @@ ffmpeg -i INPUT -f lavfi -i nullsrc=hd720,geq='r=128+80*(sin(sqrt((X-W/2)*(X-W/2 @end example @end itemize +@section dnn_detect + +Do object detection with deep neural networks. + +The filter accepts the following options: + +@table @option +@item dnn_backend +Specify which DNN backend to use for model loading and execution. This option accepts +only openvino now, tensorflow backends will be added. + +@item model +Set path to model file specifying network architecture and its parameters. +Note that different backends use different file formats. + +@item input +Set the input name of the dnn network. + +@item output +Set the output name of the dnn network. + +@item confidence +Set the confidence threshold (default: 0.5). + +@item labels +Set path to label file specifying the mapping between label id and name. +Each label name is written in one line, tailing spaces and empty lines are skipped. +The first line is the name of label id 0 (usually it is 'background'), +and the second line is the name of label id 1, etc. +The label id is considered as name if the label file is not provided. 
+ +@item backend_configs +Set the configs to be passed into backend + +@item async +use DNN async execution if set (default: set), +roll back to sync execution if the backend does not support async. + +@end table + @anchor{dnn_processing} @section dnn_processing diff --git a/libavfilter/Makefile b/libavfilter/Makefile index b2c254ea67..b77f2276a4 100644 --- a/libavfilter/Makefile +++ b/libavfilter/Makefile @@ -245,6 +245,7 @@ OBJS-$(CONFIG_DILATION_FILTER) += vf_neighbor.o OBJS-$(CONFIG_DILATION_OPENCL_FILTER)+= vf_neighbor_opencl.o opencl.o \ opencl/neighbor.o OBJS-$(CONFIG_DISPLACE_FILTER) += vf_displace.o framesync.o +OBJS-$(CONFIG_DNN_DETECT_FILTER) += vf_dnn_detect.o OBJS-$(CONFIG_DNN_PROCESSING_FILTER) += vf_dnn_processing.o OBJS-$(CONFIG_DOUBLEWEAVE_FILTER)+= vf_weave.o OBJS-$(CONFIG_DRAWBOX_FILTER)+= vf_drawbox.o diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c index 0872c6e0f2..0d2bf7bbee 100644 --- a/libavfilter/allfilters.c +++ b/libavfilter/allfilters.c @@ -230,6 +230,7 @@ extern AVFilter
[FFmpeg-devel] [PATCH V8 2/3] lavfi: show side data of detection bounding boxes
--- libavfilter/f_sidedata.c | 2 ++ libavfilter/vf_showinfo.c | 29 + 2 files changed, 31 insertions(+) diff --git a/libavfilter/f_sidedata.c b/libavfilter/f_sidedata.c index 3757723375..6f25d2b311 100644 --- a/libavfilter/f_sidedata.c +++ b/libavfilter/f_sidedata.c @@ -71,6 +71,7 @@ static const AVOption filt_name##_options[] = { \ { "S12M_TIMECOD", "", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_S12M_TIMECODE }, 0, 0, FLAGS, "type" }, \ { "DYNAMIC_HDR_PLUS", "", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_DYNAMIC_HDR_PLUS }, 0, 0, FLAGS, "type" }, \ { "REGIONS_OF_INTEREST","", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_REGIONS_OF_INTEREST}, 0, 0, FLAGS, "type" }, \ +{ "DETECTION_BOUNDING_BOXES", "", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_DETECTION_BBOXES }, 0, 0, FLAGS, "type" }, \ { "SEI_UNREGISTERED", "", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_SEI_UNREGISTERED }, 0, 0, FLAGS, "type" }, \ { NULL } \ } @@ -100,6 +101,7 @@ static const AVOption filt_name##_options[] = { \ { "S12M_TIMECOD", "", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_S12M_TIMECODE }, 0, 0, FLAGS, "type" }, \ { "DYNAMIC_HDR_PLUS", "", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_DYNAMIC_HDR_PLUS }, 0, 0, FLAGS, "type" }, \ { "REGIONS_OF_INTEREST","", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_REGIONS_OF_INTEREST}, 0, 0, FLAGS, "type" }, \ +{ "DETECTION_BOUNDING_BOXES", "", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_DETECTION_BBOXES }, 0, 0, FLAGS, "type" }, \ { "SEI_UNREGISTERED", "", 0, AV_OPT_TYPE_CONST, {.i64 = AV_FRAME_DATA_SEI_UNREGISTERED }, 0, 0, FLAGS, "type" }, \ { NULL } \ } diff --git a/libavfilter/vf_showinfo.c b/libavfilter/vf_showinfo.c index 6208892005..ae6f6bb7b1 100644 --- a/libavfilter/vf_showinfo.c +++ b/libavfilter/vf_showinfo.c @@ -38,6 +38,7 @@ #include "libavutil/timecode.h" #include "libavutil/mastering_display_metadata.h" #include "libavutil/video_enc_params.h" +#include "libavutil/detection_bbox.h" #include "avfilter.h" #include "internal.h" @@ -153,6 +154,31 
@@ static void dump_roi(AVFilterContext *ctx, const AVFrameSideData *sd)
     }
 }
 
+static void dump_detection_bbox(AVFilterContext *ctx, const AVFrameSideData *sd)
+{
+    int nb_bboxes;
+    const AVDetectionBBoxHeader *header;
+    const AVDetectionBBox *bbox;
+
+    header = (const AVDetectionBBoxHeader *)sd->data;
+    nb_bboxes = header->nb_bboxes;
+    av_log(ctx, AV_LOG_INFO, "detection bounding boxes:\n");
+    av_log(ctx, AV_LOG_INFO, "source: %s\n", header->source);
+
+    for (int i = 0; i < nb_bboxes; i++) {
+        bbox = av_get_detection_bbox(header, i);
+        av_log(ctx, AV_LOG_INFO, "index: %d,\tregion: (%d, %d) -> (%d, %d), label: %s, confidence: %d/%d.\n",
+                                 i, bbox->x, bbox->y, bbox->x + bbox->w, bbox->y + bbox->h,
+                                 bbox->detect_label, bbox->detect_confidence.num, bbox->detect_confidence.den);
+        if (bbox->classify_count > 0) {
+            for (int j = 0; j < bbox->classify_count; j++) {
+                av_log(ctx, AV_LOG_INFO, "\t\tclassify: label: %s, confidence: %d/%d.\n",
+                       bbox->classify_labels[j], bbox->classify_confidences[j].num, bbox->classify_confidences[j].den);
+            }
+        }
+    }
+}
+
 static void dump_mastering_display(AVFilterContext *ctx, const AVFrameSideData *sd)
 {
     const AVMasteringDisplayMetadata *mastering_display;
@@ -494,6 +520,9 @@ static int filter_frame(AVFilterLink *inlink, AVFrame *frame)
         case AV_FRAME_DATA_REGIONS_OF_INTEREST:
             dump_roi(ctx, sd);
             break;
+        case AV_FRAME_DATA_DETECTION_BBOXES:
+            dump_detection_bbox(ctx, sd);
+            break;
         case AV_FRAME_DATA_MASTERING_DISPLAY_METADATA:
             dump_mastering_display(ctx, sd);
             break;
-- 
2.17.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH V8 1/3] lavu: add side data AV_FRAME_DATA_DETECTION_BBOXES for object detection/classification
--- doc/APIchanges | 2 + libavutil/Makefile | 2 + libavutil/detection_bbox.c | 73 + libavutil/detection_bbox.h | 107 + libavutil/frame.c | 1 + libavutil/frame.h | 6 +++ 6 files changed, 191 insertions(+) create mode 100644 libavutil/detection_bbox.c create mode 100644 libavutil/detection_bbox.h diff --git a/doc/APIchanges b/doc/APIchanges index 9dfcc97d5c..30bd235691 100644 --- a/doc/APIchanges +++ b/doc/APIchanges @@ -14,6 +14,8 @@ libavutil: 2017-10-21 API changes, most recent first: +2021-04-xx - xx - lavu 56.xx.100 - frame.h detection_bbox.h + Add AV_FRAME_DATA_DETECTION_BBOXES 2021-04-06 - xx - lavf 58.78.100 - avformat.h Add avformat_index_get_entries_count(), avformat_index_get_entry(), diff --git a/libavutil/Makefile b/libavutil/Makefile index 27bafe9e12..47efb718d2 100644 --- a/libavutil/Makefile +++ b/libavutil/Makefile @@ -21,6 +21,7 @@ HEADERS = adler32.h \ cpu.h \ crc.h \ des.h \ + detection_bbox.h \ dict.h\ display.h \ dovi_meta.h \ @@ -113,6 +114,7 @@ OBJS = adler32.o \ cpu.o\ crc.o\ des.o\ + detection_bbox.o \ dict.o \ display.o\ dovi_meta.o \ diff --git a/libavutil/detection_bbox.c b/libavutil/detection_bbox.c new file mode 100644 index 00..c54a30d9e5 --- /dev/null +++ b/libavutil/detection_bbox.c @@ -0,0 +1,73 @@ +/* + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "detection_bbox.h"
+
+AVDetectionBBoxHeader *av_detection_bbox_alloc(uint32_t nb_bboxes, size_t *out_size)
+{
+    size_t size;
+    struct {
+        AVDetectionBBoxHeader header;
+        AVDetectionBBox boxes[1];
+    } *ret;
+
+    size = sizeof(*ret);
+    if (nb_bboxes - 1 > (SIZE_MAX - size) / sizeof(*ret->boxes))
+        return NULL;
+    size += sizeof(*ret->boxes) * (nb_bboxes - 1);
+
+    ret = av_mallocz(size);
+    if (!ret)
+        return NULL;
+
+    ret->header.nb_bboxes = nb_bboxes;
+    ret->header.bbox_size = sizeof(*ret->boxes);
+    ret->header.bboxes_offset = (char *)&ret->boxes - (char *)&ret->header;
+
+    if (out_size)
+        *out_size = size;
+
+    return &ret->header;
+}
+
+AVDetectionBBoxHeader *av_detection_bbox_create_side_data(AVFrame *frame, uint32_t nb_bboxes)
+{
+    AVBufferRef *buf;
+    AVDetectionBBoxHeader *header;
+    size_t size;
+
+    header = av_detection_bbox_alloc(nb_bboxes, &size);
+    if (!header)
+        return NULL;
+    if (size > INT_MAX) {
+        av_freep(&header);
+        return NULL;
+    }
+    buf = av_buffer_create((uint8_t *)header, size, NULL, NULL, 0);
+    if (!buf) {
+        av_freep(&header);
+        return NULL;
+    }
+
+    if (!av_frame_new_side_data_from_buf(frame, AV_FRAME_DATA_DETECTION_BBOXES, buf)) {
+        av_buffer_unref(&buf);
+        return NULL;
+    }
+
+    return header;
+}
diff --git a/libavutil/detection_bbox.h b/libavutil/detection_bbox.h
new file mode 100644
index 0000000000..4ad05d3b95
--- /dev/null
+++ b/libavutil/detection_bbox.h
@@ -0,0 +1,107 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ *
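The allocation pattern in av_detection_bbox_alloc() (one buffer holding a header followed by N boxes, with the offset and stride stored in the header) can be sketched in isolation. This is only an illustration under stated assumptions: `BoxHeader`, `Box`, `box_alloc()` and `box_get()` are hypothetical names, the sketch uses `offsetof` instead of the `boxes[1]` struct from the patch, and it uses plain `calloc` instead of `av_mallocz`:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the header and per-box structs. */
typedef struct BoxHeader {
    uint32_t nb_boxes;     /* number of boxes following the header */
    size_t   boxes_offset; /* byte offset from the header to the first box */
    size_t   box_size;     /* size of one box, so callers can stride safely */
} BoxHeader;

typedef struct Box {
    int x, y, w, h;
} Box;

/* Layout used only to compute where the box array starts. */
struct BoxBlock {
    BoxHeader header;
    Box       boxes;
};

/* Allocate a zeroed header followed by nb_boxes boxes; NULL on overflow or OOM. */
static BoxHeader *box_alloc(uint32_t nb_boxes, size_t *out_size)
{
    const size_t offset = offsetof(struct BoxBlock, boxes);
    size_t size;
    BoxHeader *header;

    /* Reject counts whose total byte size would not fit in size_t. */
    if (nb_boxes > (SIZE_MAX - offset) / sizeof(Box))
        return NULL;
    size = offset + sizeof(Box) * nb_boxes;

    header = calloc(1, size);
    if (!header)
        return NULL;

    header->nb_boxes     = nb_boxes;
    header->box_size     = sizeof(Box);
    header->boxes_offset = offset;
    if (out_size)
        *out_size = size;
    return header;
}

/* Locate box idx through the stored offset and stride, not through a typed array. */
static Box *box_get(BoxHeader *header, uint32_t idx)
{
    if (idx >= header->nb_boxes)
        return NULL;
    return (Box *)((uint8_t *)header + header->boxes_offset +
                   (size_t)idx * header->box_size);
}
```

Storing the offset and the per-box size in the header lets a reader walk the boxes without knowing the producer's exact struct layout. Computing the size from `offsetof` also keeps a count of zero well defined, since no `nb_boxes - 1` term is involved.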
Re: [FFmpeg-devel] [PATCH] libavcodec/qsvenc: add mbbrc to hevc_qsv
On Tue, 2021-04-13 at 10:22 +0800, wenbin.c...@intel.com wrote:
> From: "Chen,Wenbin"
>
> Add mbbrc to hevc_qsv
> For detailed description, please see "mbbrc" part in:
> https://github.com/Intel-Media-SDK/MediaSDK/blob/master/doc/mediasdk-man.md#mfxextcodingoption2
>
> Signed-off-by: Wenbin Chen
> ---
>  libavcodec/qsvenc.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/qsvenc.c b/libavcodec/qsvenc.c
> index 566a5c8552..19e246a8fb 100644
> --- a/libavcodec/qsvenc.c
> +++ b/libavcodec/qsvenc.c
> @@ -701,8 +701,6 @@ FF_ENABLE_DEPRECATION_WARNINGS
>
>      if (q->bitrate_limit >= 0)
>          q->extco2.BitrateLimit = q->bitrate_limit ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
> -    if (q->mbbrc >= 0)
> -        q->extco2.MBBRC = q->mbbrc ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
>
>      if (q->max_frame_size >= 0)
>          q->extco2.MaxFrameSize = q->max_frame_size;
> @@ -755,6 +753,9 @@ FF_ENABLE_DEPRECATION_WARNINGS
>          q->extco2.MaxQPP = q->extco2.MaxQPB = q->extco2.MaxQPI;
>      }
>  #endif
> +    if (q->mbbrc >= 0)
> +        q->extco2.MBBRC = q->mbbrc ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
> +
>      q->extco2.Header.BufferId = MFX_EXTBUFF_CODING_OPTION2;
>      q->extco2.Header.BufferSz = sizeof(q->extco2);

LGTM, thanks!

-Haihao
[FFmpeg-devel] [PATCH] libavcodec/qsvenc: add mbbrc to hevc_qsv
From: "Chen,Wenbin"

Add mbbrc to hevc_qsv
For detailed description, please see "mbbrc" part in:
https://github.com/Intel-Media-SDK/MediaSDK/blob/master/doc/mediasdk-man.md#mfxextcodingoption2

Signed-off-by: Wenbin Chen
---
 libavcodec/qsvenc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/libavcodec/qsvenc.c b/libavcodec/qsvenc.c
index 566a5c8552..19e246a8fb 100644
--- a/libavcodec/qsvenc.c
+++ b/libavcodec/qsvenc.c
@@ -701,8 +701,6 @@ FF_ENABLE_DEPRECATION_WARNINGS
 
     if (q->bitrate_limit >= 0)
         q->extco2.BitrateLimit = q->bitrate_limit ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
-    if (q->mbbrc >= 0)
-        q->extco2.MBBRC = q->mbbrc ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
 
     if (q->max_frame_size >= 0)
         q->extco2.MaxFrameSize = q->max_frame_size;
@@ -755,6 +753,9 @@ FF_ENABLE_DEPRECATION_WARNINGS
         q->extco2.MaxQPP = q->extco2.MaxQPB = q->extco2.MaxQPI;
     }
 #endif
+    if (q->mbbrc >= 0)
+        q->extco2.MBBRC = q->mbbrc ? MFX_CODINGOPTION_ON : MFX_CODINGOPTION_OFF;
+
     q->extco2.Header.BufferId = MFX_EXTBUFF_CODING_OPTION2;
     q->extco2.Header.BufferSz = sizeof(q->extco2);
-- 
2.25.1
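The moved lines map a three-state user option onto the SDK's on/off constants: a negative value means "not set" and leaves the field alone, zero means off, anything else means on. A minimal sketch of that mapping, assuming hypothetical `OPTION_*` constants in place of the real `MFX_CODINGOPTION_*` values:

```c
/* Hypothetical stand-ins for the SDK's tri-state coding-option values. */
enum CodingOption {
    OPTION_UNKNOWN = 0,
    OPTION_ON,
    OPTION_OFF
};

/*
 * Map a user option (-1 = not set, 0 = off, nonzero = on) onto the tri-state.
 * When the option is not set, the current value is returned unchanged.
 */
static enum CodingOption map_tristate(int user_value, enum CodingOption current)
{
    if (user_value < 0)
        return current;
    return user_value ? OPTION_ON : OPTION_OFF;
}
```

Keeping "not set" distinct from "off" lets the encoder fall back to the SDK default unless the user asked for something explicitly.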
Re: [FFmpeg-devel] [PATCH] checkasm: add (private) kperf timing for macOS
Apr 13, 2021, 02:45 by j...@itanimul.li: > Signed-off-by: Josh Dekker > --- > configure| 2 + > tests/checkasm/Makefile | 1 + > tests/checkasm/checkasm.c| 19 - > tests/checkasm/checkasm.h| 10 ++- > tests/checkasm/macos_kperf.c | 143 +++ > tests/checkasm/macos_kperf.h | 23 ++ > 6 files changed, 195 insertions(+), 3 deletions(-) > create mode 100644 tests/checkasm/macos_kperf.c > create mode 100644 tests/checkasm/macos_kperf.h > > diff --git a/configure b/configure > index d7a3f507e8..a47e3dea67 100755 > --- a/configure > +++ b/configure > @@ -490,6 +490,7 @@ Developer options (useful when working on FFmpeg itself): > --ignore-tests=TESTS comma-separated list (without "fate-" prefix > in the name) of tests whose result is ignored > --enable-linux-perf enable Linux Performance Monitor API > + --enable-macos-kperf enable macOS kperf (private) API > --disable-large-testsdisable tests that use a large amount of memory > > NOTE: Object files are built at the place where configure is launched. > @@ -1949,6 +1950,7 @@ CONFIG_LIST=" > fontconfig > large_tests > linux_perf > +macos_kperf > memory_poisoning > neon_clobber_test > ossfuzz > diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile > index 1827a4e134..4abaef9c63 100644 > --- a/tests/checkasm/Makefile > +++ b/tests/checkasm/Makefile > @@ -58,6 +58,7 @@ CHECKASMOBJS-$(CONFIG_AVUTIL) += $(AVUTILOBJS) > CHECKASMOBJS-$(ARCH_AARCH64)+= aarch64/checkasm.o > CHECKASMOBJS-$(HAVE_ARMV5TE_EXTERNAL) += arm/checkasm.o > CHECKASMOBJS-$(HAVE_X86ASM) += x86/checkasm.o > +CHECKASMOBJS-$(CONFIG_MACOS_KPERF) += macos_kperf.o > > CHECKASMOBJS += $(CHECKASMOBJS-yes) checkasm.o > CHECKASMOBJS := $(sort $(CHECKASMOBJS:%=tests/checkasm/%)) > diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c > index 8338e8ff58..4c42040244 100644 > --- a/tests/checkasm/checkasm.c > +++ b/tests/checkasm/checkasm.c > @@ -26,6 +26,8 @@ > # ifndef _GNU_SOURCE > # define _GNU_SOURCE // for syscall (performance monitoring API) > # endif > 
+#elif CONFIG_MACOS_KPERF
> +#include "macos_kperf.h"
>  #endif
>
>  #include
> @@ -637,9 +639,20 @@ static int bench_init_linux(void)
>      }
>      return 0;
>  }
> -#endif
> +#elif CONFIG_MACOS_KPERF
> +static int bench_init_kperf(void)
> +{
> +    if (ff_kperf_init() || ff_kperf_setup())
> +        return -1;
>
> -#if !CONFIG_LINUX_PERF
> +    if (ff_kperf_cycles(NULL)) {
> +        fprintf(stderr, "checkasm must be run as root to use kperf on macOS\n");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +#else
>  static int bench_init_ffmpeg(void)
>  {
>  #ifdef AV_READ_TIME
> @@ -656,6 +669,8 @@ static int bench_init(void)
>  {
>  #if CONFIG_LINUX_PERF
>      int ret = bench_init_linux();
> +#elif CONFIG_MACOS_KPERF
> +    int ret = bench_init_kperf();
>  #else
>      int ret = bench_init_ffmpeg();
>  #endif
> diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
> index ef6645e3a2..4127081d74 100644
> --- a/tests/checkasm/checkasm.h
> +++ b/tests/checkasm/checkasm.h
> @@ -31,6 +31,8 @@
>  #include
>  #include
>  #include
> +#elif CONFIG_MACOS_KPERF
> +#include "macos_kperf.h"
>  #endif
>
>  #include "libavutil/avstring.h"
> @@ -224,7 +226,7 @@ typedef struct CheckasmPerf {
>      int iterations;
>  } CheckasmPerf;
>
> -#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF
> +#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF || CONFIG_MACOS_KPERF
>
>  #if CONFIG_LINUX_PERF
>  #define PERF_START(t) do { \
> @@ -235,6 +237,12 @@ typedef struct CheckasmPerf {
>      ioctl(sysfd, PERF_EVENT_IOC_DISABLE, 0);\
>      read(sysfd, &t, sizeof(t)); \
>  } while (0)
> +#elif CONFIG_MACOS_KPERF
> +#define PERF_START(t) do { \
> +    t = 0; \
> +    ff_kperf_cycles(&t);\
> +} while (0)
> +#define PERF_STOP(t) ff_kperf_cycles(&t)
>  #else
>  #define PERF_START(t) t = AV_READ_TIME()
>  #define PERF_STOP(t) t = AV_READ_TIME() - t
> diff --git a/tests/checkasm/macos_kperf.c b/tests/checkasm/macos_kperf.c
> new file mode 100644
> index 0000000000..e6ae316608
> --- /dev/null
> +++ b/tests/checkasm/macos_kperf.c
> @@ -0,0 +1,143 @@
> +/*
> + * This file is part of FFmpeg.
> + * > + * FFmpeg is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + * > + * You should have received a copy of
[FFmpeg-devel] [PATCH] checkasm: add (private) kperf timing for macOS
Signed-off-by: Josh Dekker --- configure| 2 + tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c| 19 - tests/checkasm/checkasm.h| 10 ++- tests/checkasm/macos_kperf.c | 143 +++ tests/checkasm/macos_kperf.h | 23 ++ 6 files changed, 195 insertions(+), 3 deletions(-) create mode 100644 tests/checkasm/macos_kperf.c create mode 100644 tests/checkasm/macos_kperf.h diff --git a/configure b/configure index d7a3f507e8..a47e3dea67 100755 --- a/configure +++ b/configure @@ -490,6 +490,7 @@ Developer options (useful when working on FFmpeg itself): --ignore-tests=TESTS comma-separated list (without "fate-" prefix in the name) of tests whose result is ignored --enable-linux-perf enable Linux Performance Monitor API + --enable-macos-kperf enable macOS kperf (private) API --disable-large-testsdisable tests that use a large amount of memory NOTE: Object files are built at the place where configure is launched. @@ -1949,6 +1950,7 @@ CONFIG_LIST=" fontconfig large_tests linux_perf +macos_kperf memory_poisoning neon_clobber_test ossfuzz diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile index 1827a4e134..4abaef9c63 100644 --- a/tests/checkasm/Makefile +++ b/tests/checkasm/Makefile @@ -58,6 +58,7 @@ CHECKASMOBJS-$(CONFIG_AVUTIL) += $(AVUTILOBJS) CHECKASMOBJS-$(ARCH_AARCH64)+= aarch64/checkasm.o CHECKASMOBJS-$(HAVE_ARMV5TE_EXTERNAL) += arm/checkasm.o CHECKASMOBJS-$(HAVE_X86ASM) += x86/checkasm.o +CHECKASMOBJS-$(CONFIG_MACOS_KPERF) += macos_kperf.o CHECKASMOBJS += $(CHECKASMOBJS-yes) checkasm.o CHECKASMOBJS := $(sort $(CHECKASMOBJS:%=tests/checkasm/%)) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 8338e8ff58..4c42040244 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -26,6 +26,8 @@ # ifndef _GNU_SOURCE # define _GNU_SOURCE // for syscall (performance monitoring API) # endif +#elif CONFIG_MACOS_KPERF +#include "macos_kperf.h" #endif #include @@ -637,9 +639,20 @@ static int bench_init_linux(void) } return 0; } -#endif 
+#elif CONFIG_MACOS_KPERF
+static int bench_init_kperf(void)
+{
+    if (ff_kperf_init() || ff_kperf_setup())
+        return -1;
 
-#if !CONFIG_LINUX_PERF
+    if (ff_kperf_cycles(NULL)) {
+        fprintf(stderr, "checkasm must be run as root to use kperf on macOS\n");
+        return -1;
+    }
+
+    return 0;
+}
+#else
 static int bench_init_ffmpeg(void)
 {
 #ifdef AV_READ_TIME
@@ -656,6 +669,8 @@ static int bench_init(void)
 {
 #if CONFIG_LINUX_PERF
     int ret = bench_init_linux();
+#elif CONFIG_MACOS_KPERF
+    int ret = bench_init_kperf();
 #else
     int ret = bench_init_ffmpeg();
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index ef6645e3a2..4127081d74 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -31,6 +31,8 @@
 #include
 #include
 #include
+#elif CONFIG_MACOS_KPERF
+#include "macos_kperf.h"
 #endif
 
 #include "libavutil/avstring.h"
@@ -224,7 +226,7 @@ typedef struct CheckasmPerf {
     int iterations;
 } CheckasmPerf;
 
-#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF
+#if defined(AV_READ_TIME) || CONFIG_LINUX_PERF || CONFIG_MACOS_KPERF
 
 #if CONFIG_LINUX_PERF
 #define PERF_START(t) do { \
@@ -235,6 +237,12 @@ typedef struct CheckasmPerf {
     ioctl(sysfd, PERF_EVENT_IOC_DISABLE, 0);\
     read(sysfd, &t, sizeof(t)); \
 } while (0)
+#elif CONFIG_MACOS_KPERF
+#define PERF_START(t) do { \
+    t = 0; \
+    ff_kperf_cycles(&t);\
+} while (0)
+#define PERF_STOP(t) ff_kperf_cycles(&t)
 #else
 #define PERF_START(t) t = AV_READ_TIME()
 #define PERF_STOP(t) t = AV_READ_TIME() - t
diff --git a/tests/checkasm/macos_kperf.c b/tests/checkasm/macos_kperf.c
new file mode 100644
index 0000000000..e6ae316608
--- /dev/null
+++ b/tests/checkasm/macos_kperf.c
@@ -0,0 +1,143 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with FFmpeg; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include "macos_kperf.h" +#include +#include +#include +
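The PERF_START/PERF_STOP pair in this patch resets the timer variable and then calls the same sampling function twice. A minimal sketch of how such a pair can yield an elapsed count, under the assumption (not confirmed by the visible part of the patch) that the sampling function stores "current counter minus *time" into *time, mirroring the `t = AV_READ_TIME() - t` fallback. `fake_counter`, `sample_cycles()` and the `SKETCH_*` macros are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical counter source standing in for a real cycle counter. */
static uint64_t fake_counter;

/* Assumed semantics: store (current counter - *time) into *time; 0 on success. */
static int sample_cycles(uint64_t *time)
{
    if (time)
        *time = fake_counter - *time;
    return 0;
}

/* First sample: t starts at 0, so t becomes the raw counter value. */
#define SKETCH_PERF_START(t) do { \
        (t) = 0;                  \
        sample_cycles(&(t));      \
    } while (0)

/* Second sample: t becomes (counter now) - (counter at start). */
#define SKETCH_PERF_STOP(t) sample_cycles(&(t))
```

Passing NULL only probes whether sampling works at all, which matches how bench_init_kperf() uses ff_kperf_cycles(NULL) as a permission check.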
Re: [FFmpeg-devel] [PATCH] avcodec/jpeglsenc: Remove redundant pixel format checks
James Almer:
> On 4/12/2021 2:07 PM, Andreas Rheinhardt wrote:
>> This encoder has AVCodec.pix_fmts set, so ff_encode_preinit() already
>> checks for this.
>>
>> Signed-off-by: Andreas Rheinhardt
>> ---
>> Will apply tomorrow unless there are objections.
>>
>>  libavcodec/jpeglsenc.c | 8 --------
>>  1 file changed, 8 deletions(-)
>>
>> diff --git a/libavcodec/jpeglsenc.c b/libavcodec/jpeglsenc.c
>> index 2bb6b1407a..d03ce32f41 100644
>> --- a/libavcodec/jpeglsenc.c
>> +++ b/libavcodec/jpeglsenc.c
>> @@ -429,14 +429,6 @@ FF_DISABLE_DEPRECATION_WARNINGS
>>  FF_ENABLE_DEPRECATION_WARNINGS
>>  #endif
>>
>> -    if (ctx->pix_fmt != AV_PIX_FMT_GRAY8 &&
>> -        ctx->pix_fmt != AV_PIX_FMT_GRAY16 &&
>> -        ctx->pix_fmt != AV_PIX_FMT_RGB24 &&
>> -        ctx->pix_fmt != AV_PIX_FMT_BGR24) {
>> -        av_log(ctx, AV_LOG_ERROR,
>> -               "Only grayscale and RGB24/BGR24 images are supported\n");
>> -        return -1;
>> -    }
>>      return 0;
>>  }
>
> nit: The only code left in this function after this patch will be gone
> after the bump, so maybe either wrap the entire function (and the
> AVCodec initializer) with the relevant check, or postpone applying this
> patch until after the bump so you can remove the whole thing in one go.
>

I am aware of that and my current plan is to just remove the whole init
function in the patch that removes the coded frame. I don't think it
makes much sense to touch the #ifs and even add new ones.

> LGTM regardless of the above.
Re: [FFmpeg-devel] [PATCH] Added Closed caption support for cuviddec for preserving a53 data n GPU decoding
On 4/12/2021 5:21 PM, Dhanish Vijayan wrote:
> Signed-off-by: Dhanish Vijayan
> ---
>  libavcodec/cuviddec.c | 199 ++
>  1 file changed, 199 insertions(+)
>
> diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
> index ec57afdefe..3b07d0a874 100644
> --- a/libavcodec/cuviddec.c
> +++ b/libavcodec/cuviddec.c
> @@ -46,6 +46,9 @@
>  #define CUVID_HAS_AV1_SUPPORT
>  #endif
>
> +#define MAX_FRAME_COUNT 25
> +#define A53_QUEUE_SIZE (MAX_FRAME_COUNT + 8)
> +
>  typedef struct CuvidContext
>  {
>      AVClass *avclass;
> @@ -89,6 +92,11 @@ typedef struct CuvidContext
>      cudaVideoCodec codec_type;
>      cudaVideoChromaFormat chroma_format;
>
> +    uint8_t* a53_caption;
> +    int a53_caption_size;
> +    uint8_t* a53_caption_queue[A53_QUEUE_SIZE];
> +    int a53_caption_size_queue[A53_QUEUE_SIZE];
> +
>      CUVIDDECODECAPS caps8, caps10, caps12;
>
>      CUVIDPARSERPARAMS cuparseinfo;
> @@ -103,6 +111,8 @@ typedef struct CuvidParsedFrame
>      CUVIDPARSERDISPINFO dispinfo;
>      int second_field;
>      int is_deinterlacing;
> +    uint8_t* a53_caption;
> +    int a53_caption_size;
>  } CuvidParsedFrame;
>
>  #define CHECK_CU(x) FF_CUDA_CHECK_DL(avctx, ctx->cudl, x)
> @@ -338,6 +348,24 @@ static int CUDAAPI cuvid_handle_picture_decode(void *opaque, CUVIDPICPARAMS* pic
>
>      ctx->key_frame[picparams->CurrPicIdx] = picparams->intra_pic_flag;
>
> +    if (ctx->a53_caption)
> +    {
> +
> +        if (picparams->CurrPicIdx >= A53_QUEUE_SIZE)
> +        {
> +            av_log(avctx, AV_LOG_WARNING, "CurrPicIdx too big: %d\n", picparams->CurrPicIdx);
> +            av_freep(&ctx->a53_caption);
> +        }
> +        else
> +        {
> +            int pos = picparams->CurrPicIdx;
> +            av_freep(&ctx->a53_caption_queue[pos]);
> +            ctx->a53_caption_queue[pos] = ctx->a53_caption;
> +            ctx->a53_caption_size_queue[pos] = ctx->a53_caption_size;
> +            ctx->a53_caption = NULL;
> +        }
> +    }
> +
>      ctx->internal_error = CHECK_CU(ctx->cvdl->cuvidDecodePicture(ctx->cudecoder, picparams));
>      if (ctx->internal_error < 0)
>          return 0;
> @@ -350,6 +378,20 @@ static int CUDAAPI cuvid_handle_picture_display(void *opaque, CUVIDPARSERDISPINF
>      AVCodecContext *avctx = opaque;
>      CuvidContext *ctx = avctx->priv_data;
>      CuvidParsedFrame parsed_frame = { { 0 } };
> +    uint8_t* a53_caption = NULL;
> +    int a53_caption_size = 0;
> +
> +    if (dispinfo->picture_index >= A53_QUEUE_SIZE)
> +    {
> +        av_log(avctx, AV_LOG_WARNING, "picture_index too big: %d\n", dispinfo->picture_index);
> +    }
> +    else
> +    {
> +        int pos = dispinfo->picture_index;
> +        a53_caption = ctx->a53_caption_queue[pos];
> +        a53_caption_size = ctx->a53_caption_size_queue[pos];
> +        ctx->a53_caption_queue[pos] = NULL;
> +    }
>
>      parsed_frame.dispinfo = *dispinfo;
>      ctx->internal_error = 0;
> @@ -358,11 +400,17 @@ static int CUDAAPI cuvid_handle_picture_display(void *opaque, CUVIDPARSERDISPINF
>      parsed_frame.dispinfo.progressive_frame = ctx->progressive_sequence;
>
>      if (ctx->deint_mode_current == cudaVideoDeinterlaceMode_Weave) {
> +        parsed_frame.a53_caption = a53_caption;
> +        parsed_frame.a53_caption_size = a53_caption_size;
>          av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
>      } else {
>          parsed_frame.is_deinterlacing = 1;
> +        parsed_frame.a53_caption = a53_caption;
> +        parsed_frame.a53_caption_size = a53_caption_size;
>          av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
>          if (!ctx->drop_second_field) {
> +            parsed_frame.a53_caption = NULL;
> +            parsed_frame.a53_caption_size = 0;
>              parsed_frame.second_field = 1;
>              av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
>          }
> @@ -382,6 +430,139 @@ static int cuvid_is_buffer_full(AVCodecContext *avctx)
>      return (av_fifo_size(ctx->frame_queue) / sizeof(CuvidParsedFrame)) + delay >= ctx->nb_surfaces;
>  }
>
> +
> +static void cuvid_mpeg_parse_a53(CuvidContext *ctx, const uint8_t* p, int buf_size)
> +{
> +    const uint8_t* buf_end = p + buf_size;
> +    for(;;)
> +    {
> +        uint32_t start_code = -1;
> +        p = avpriv_find_start_code(p, buf_end, &start_code);
> +        if (start_code > 0x1ff)
> +            break;
> +        if (start_code != 0x1b2)
> +            continue;
> +        buf_size = buf_end - p;
> +        if (buf_size >= 6 &&
> +            p[0] == 'G' && p[1] == 'A' && p[2] == '9' && p[3] == '4' && p[4] == 3 && (p[5] & 0x40))
> +        {
> +            /* extract A53 Part 4 CC data */
> +            int cc_count = p[5] & 0x1f;
> +            if (cc_count > 0 && buf_size >= 7 + cc_count * 3)
> +            {
> +                av_freep(&ctx->a53_caption);
> +                ctx->a53_caption_size = cc_count * 3;
> +                ctx->a53_caption = av_malloc(ctx->a53_caption_size);
> +                if (ctx->a53_caption)
> +                    memcpy(ctx->a53_caption, p + 7,
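The checks in cuvid_mpeg_parse_a53() can be isolated into a small bounds-checked helper. This is a sketch only: `find_a53_cc()` is a hypothetical name, it takes a pointer already positioned just past the user-data start code, and it reports the location of the caption bytes instead of copying them:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Check whether p points at "GA94" user data carrying cc_data and, if so,
 * report where the caption triplets start and how many bytes they span.
 * Returns 1 when captions were found, 0 otherwise.
 */
static int find_a53_cc(const uint8_t *p, size_t size,
                       const uint8_t **cc_data, size_t *cc_size)
{
    size_t cc_count;

    if (size < 6)
        return 0;
    if (p[0] != 'G' || p[1] != 'A' || p[2] != '9' || p[3] != '4')
        return 0;
    /* Same tests as the patch: type byte 3 and bit 0x40 of the flags byte. */
    if (p[4] != 3 || !(p[5] & 0x40))
        return 0;

    /* Low five bits of the flags byte: number of 3-byte caption triplets. */
    cc_count = p[5] & 0x1f;
    if (cc_count == 0 || size < 7 + cc_count * 3)
        return 0;

    /* Caption bytes start 7 bytes in, as in the patch's memcpy source. */
    *cc_data = p + 7;
    *cc_size = cc_count * 3;
    return 1;
}
```

Checking `size` against `7 + cc_count * 3` before touching the payload is what keeps a truncated packet from causing an out-of-bounds read.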
Re: [FFmpeg-devel] [PATCH] avcodec/jpeglsenc: Remove redundant pixel format checks
On 4/12/2021 2:07 PM, Andreas Rheinhardt wrote:
> This encoder has AVCodec.pix_fmts set, so ff_encode_preinit() already
> checks for this.
>
> Signed-off-by: Andreas Rheinhardt
> ---
> Will apply tomorrow unless there are objections.
>
>  libavcodec/jpeglsenc.c | 8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/libavcodec/jpeglsenc.c b/libavcodec/jpeglsenc.c
> index 2bb6b1407a..d03ce32f41 100644
> --- a/libavcodec/jpeglsenc.c
> +++ b/libavcodec/jpeglsenc.c
> @@ -429,14 +429,6 @@ FF_DISABLE_DEPRECATION_WARNINGS
>  FF_ENABLE_DEPRECATION_WARNINGS
>  #endif
>
> -    if (ctx->pix_fmt != AV_PIX_FMT_GRAY8 &&
> -        ctx->pix_fmt != AV_PIX_FMT_GRAY16 &&
> -        ctx->pix_fmt != AV_PIX_FMT_RGB24 &&
> -        ctx->pix_fmt != AV_PIX_FMT_BGR24) {
> -        av_log(ctx, AV_LOG_ERROR,
> -               "Only grayscale and RGB24/BGR24 images are supported\n");
> -        return -1;
> -    }
>      return 0;
>  }

nit: The only code left in this function after this patch will be gone
after the bump, so maybe either wrap the entire function (and the
AVCodec initializer) with the relevant check, or postpone applying this
patch until after the bump so you can remove the whole thing in one go.

LGTM regardless of the above.
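The reason the removed check is redundant is that a codec advertising a terminated list of supported pixel formats can have that list validated once, generically, before its init function runs. A minimal sketch of such a generic check, with a hypothetical `PixFmt` enum and hypothetical names throughout (this is not the code of ff_encode_preinit()):

```c
/* Hypothetical pixel-format IDs; FMT_NONE terminates a supported-format list. */
enum PixFmt {
    FMT_NONE = -1,
    FMT_GRAY8,
    FMT_GRAY16,
    FMT_RGB24,
    FMT_BGR24,
    FMT_YUV420P
};

/* Return 1 if fmt appears in the FMT_NONE-terminated list, else 0. */
static int format_supported(const enum PixFmt *list, enum PixFmt fmt)
{
    for (; *list != FMT_NONE; list++)
        if (*list == fmt)
            return 1;
    return 0;
}

/* A list an encoder could advertise once instead of re-checking in its init. */
static const enum PixFmt jpegls_like_fmts[] = {
    FMT_BGR24, FMT_RGB24, FMT_GRAY8, FMT_GRAY16, FMT_NONE
};
```

With the list as the single source of truth, a per-encoder `if (pix_fmt != ...)` chain can only repeat what the generic check already enforced.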
Re: [FFmpeg-devel] [PATCH] libavdevice/gdigrab: fix capture window title contain non-ASCII chars
On Sat, Mar 20, 2021 at 5:34 PM <1160386...@qq.com> wrote:
>
> From: He Yang <1160386...@qq.com>
>
> Signed-off-by: He Yang <1160386...@qq.com>

Sorry for taking a while to respond, and thank you for the contribution.

I have verified that this conversion and FindWindowW usage indeed fixes issues with non-ASCII window titles.

Before:

[gdigrab @ 01d1f24b2cc0] Can't find window 'ジャンキーナイトタウンオーケストラ _ すりぃ feat.鏡音レン-sm36109943.mp4 - mpv', aborting.
title=ジャンキーナイトタウンオーケストラ _ すりぃ feat.鏡音レン-sm36109943.mp4 - mpv: I/O error

After:

[gdigrab @ 017d298b2cc0] Found window ジャンキーナイトタウンオーケストラ _ すりぃ feat.鏡音レン-sm36109943.mp4 - mpv, capturing 1920x1080x32 at (0,0)
Input #0, gdigrab, from 'title=ジャンキーナイトタウンオーケストラ _ すりぃ feat.鏡音レン-sm36109943.mp4 - mpv':

Now, taking things step by step, starting with the clearest issues:

1. FFmpeg utilizes C99 features, but follows the rule that no declarations should happen after non-declaring code within a scope/context.

src/libavdevice/gdigrab.c: In function 'gdigrab_read_header':
src/libavdevice/gdigrab.c:249:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  249 |         const wchar_t *name_w = NULL;
      |         ^

-> Basically fixed by moving the new wchar_t declaration to be the first thing in the scope of that if branch.

2. Mismatch in `const`ness between the function and the calling code. Const things are nice, but in this case the function takes a non-const pointer.

src/libavdevice/gdigrab.c:250:30: warning: passing argument 2 of 'utf8towchar' from incompatible pointer type [-Wincompatible-pointer-types]
  250 |         if(utf8towchar(name, &name_w)) {
      |                              ^~~
      |                              |
      |                              const wchar_t ** {aka const short unsigned int **}
In file included from src/libavformat/os_support.h:148,
                 from src/libavformat/internal.h:28,
                 from src/libavdevice/gdigrab.c:32:
src/libavutil/wchar_filename.h:27:68: note: expected 'wchar_t **' {aka 'short unsigned int **'} but argument is of type 'const wchar_t **' {aka 'const short unsigned int **'}
   27 | static inline int utf8towchar(const char *filename_utf8, wchar_t **filename_w)
      |                                                          ~~^~

-> Fixed by removing the const from the wchar_t pointer.

Thus we move to the actual review:

1. The libavutil header should be explicitly #included. That way, users of the header are easier to find.

2. When utf8towchar returns nonzero, ret should probably be set to AVERROR(errno). That way we are not second-guessing implementation specifics of the function. (noticed by Martin)

3. Some whitespace would be good between the variable declarations/setting, the conversion, and finally the actual window finding.

As I had to go through these points for the review process, I basically posted a version with these changes @ https://github.com/jeeb/ffmpeg/commits/gdigrab_unicode_fix . I also took the liberty of rewording the commit message somewhat.

If you think these changes are acceptable, then unless something new is noticed, I consider this LGTM.

Best regards,
Jan
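The review points can be sketched in portable C. This is only an illustration, not the gdigrab code: `demo_utf8_to_wide` and `demo_lookup_title` are hypothetical names, the converter merely imitates the signature of `utf8towchar` (0 on success, nonzero with `errno` set), and it assumes `wchar_t` is wide enough for a full code point, which does not hold on Windows. It shows declarations placed before statements, a non-const `wchar_t *`, blank lines between setup, conversion and use, and the error derived from `errno`.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in shaped like utf8towchar(): returns 0 on success,
 * nonzero with errno set on failure. Minimal decoder: it does not reject
 * overlong encodings and assumes wchar_t can hold a full code point. */
static int demo_utf8_to_wide(const char *utf8, wchar_t **out)
{
    size_t len = strlen(utf8);
    size_t i = 0, n = 0;
    wchar_t *buf = malloc((len + 1) * sizeof(*buf));

    *out = NULL;
    if (!buf) {
        errno = ENOMEM;
        return -1;
    }

    while (i < len) {
        unsigned char c = (unsigned char)utf8[i++];
        unsigned long cp;
        int extra;

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else                         goto invalid;

        while (extra-- > 0) {
            /* The terminating NUL is not a continuation byte, so this
             * check also catches truncated sequences. */
            unsigned char cont = (unsigned char)utf8[i];

            if ((cont & 0xC0) != 0x80)
                goto invalid;
            cp = (cp << 6) | (cont & 0x3F);
            i++;
        }
        buf[n++] = (wchar_t)cp;
    }
    buf[n] = L'\0';
    *out = buf;
    return 0;

invalid:
    free(buf);
    errno = EINVAL;
    return -1;
}

/* Declarations first, non-const wide pointer, errno propagated. */
static int demo_lookup_title(const char *name, size_t *wide_len)
{
    wchar_t *name_w = NULL;
    int ret = 0;

    if (demo_utf8_to_wide(name, &name_w)) {
        ret = -errno; /* FFmpeg code would write AVERROR(errno) here */
        goto end;
    }

    /* gdigrab would pass name_w to FindWindowW() at this point. */
    *wide_len = wcslen(name_w);

end:
    free(name_w);
    return ret;
}
```

Taking the error from `errno` keeps the caller independent of why the conversion failed, which is the point of review item 2.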
[FFmpeg-devel] [PATCH] Added Closed caption support for cuviddec for preserving a53 data in GPU decoding
Signed-off-by: Dhanish Vijayan
---
 libavcodec/cuviddec.c | 199 ++
 1 file changed, 199 insertions(+)

diff --git a/libavcodec/cuviddec.c b/libavcodec/cuviddec.c
index ec57afdefe..3b07d0a874 100644
--- a/libavcodec/cuviddec.c
+++ b/libavcodec/cuviddec.c
@@ -46,6 +46,9 @@
 #define CUVID_HAS_AV1_SUPPORT
 #endif
+#define MAX_FRAME_COUNT 25
+#define A53_QUEUE_SIZE (MAX_FRAME_COUNT + 8)
+
 typedef struct CuvidContext
 {
     AVClass *avclass;
@@ -89,6 +92,11 @@ typedef struct CuvidContext
     cudaVideoCodec codec_type;
     cudaVideoChromaFormat chroma_format;
+    uint8_t* a53_caption;
+    int a53_caption_size;
+    uint8_t* a53_caption_queue[A53_QUEUE_SIZE];
+    int a53_caption_size_queue[A53_QUEUE_SIZE];
+
     CUVIDDECODECAPS caps8, caps10, caps12;
     CUVIDPARSERPARAMS cuparseinfo;
@@ -103,6 +111,8 @@ typedef struct CuvidParsedFrame
     CUVIDPARSERDISPINFO dispinfo;
     int second_field;
     int is_deinterlacing;
+    uint8_t* a53_caption;
+    int a53_caption_size;
 } CuvidParsedFrame;
 #define CHECK_CU(x) FF_CUDA_CHECK_DL(avctx, ctx->cudl, x)
@@ -338,6 +348,24 @@ static int CUDAAPI cuvid_handle_picture_decode(void *opaque, CUVIDPICPARAMS* pic
     ctx->key_frame[picparams->CurrPicIdx] = picparams->intra_pic_flag;
+    if (ctx->a53_caption)
+    {
+
+        if (picparams->CurrPicIdx >= A53_QUEUE_SIZE)
+        {
+            av_log(avctx, AV_LOG_WARNING, "CurrPicIdx too big: %d\n", picparams->CurrPicIdx);
+            av_freep(&ctx->a53_caption);
+        }
+        else
+        {
+            int pos = picparams->CurrPicIdx;
+            av_freep(&ctx->a53_caption_queue[pos]);
+            ctx->a53_caption_queue[pos] = ctx->a53_caption;
+            ctx->a53_caption_size_queue[pos] = ctx->a53_caption_size;
+            ctx->a53_caption = NULL;
+        }
+    }
+
     ctx->internal_error = CHECK_CU(ctx->cvdl->cuvidDecodePicture(ctx->cudecoder, picparams));
     if (ctx->internal_error < 0)
         return 0;
@@ -350,6 +378,20 @@ static int CUDAAPI cuvid_handle_picture_display(void *opaque, CUVIDPARSERDISPINF
     AVCodecContext *avctx = opaque;
     CuvidContext *ctx = avctx->priv_data;
     CuvidParsedFrame parsed_frame = { { 0 } };
+    uint8_t* a53_caption = NULL;
+    int a53_caption_size = 0;
+
+    if (dispinfo->picture_index >= A53_QUEUE_SIZE)
+    {
+        av_log(avctx, AV_LOG_WARNING, "picture_index too big: %d\n", dispinfo->picture_index);
+    }
+    else
+    {
+        int pos = dispinfo->picture_index;
+        a53_caption = ctx->a53_caption_queue[pos];
+        a53_caption_size = ctx->a53_caption_size_queue[pos];
+        ctx->a53_caption_queue[pos] = NULL;
+    }
     parsed_frame.dispinfo = *dispinfo;
     ctx->internal_error = 0;
@@ -358,11 +400,17 @@ static int CUDAAPI cuvid_handle_picture_display(void *opaque, CUVIDPARSERDISPINF
     parsed_frame.dispinfo.progressive_frame = ctx->progressive_sequence;
     if (ctx->deint_mode_current == cudaVideoDeinterlaceMode_Weave) {
+        parsed_frame.a53_caption = a53_caption;
+        parsed_frame.a53_caption_size = a53_caption_size;
         av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
     } else {
         parsed_frame.is_deinterlacing = 1;
+        parsed_frame.a53_caption = a53_caption;
+        parsed_frame.a53_caption_size = a53_caption_size;
         av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
         if (!ctx->drop_second_field) {
+            parsed_frame.a53_caption = NULL;
+            parsed_frame.a53_caption_size = 0;
             parsed_frame.second_field = 1;
             av_fifo_generic_write(ctx->frame_queue, &parsed_frame, sizeof(CuvidParsedFrame), NULL);
         }
@@ -382,6 +430,139 @@ static int cuvid_is_buffer_full(AVCodecContext *avctx)
     return (av_fifo_size(ctx->frame_queue) / sizeof(CuvidParsedFrame)) + delay >= ctx->nb_surfaces;
 }
+
+static void cuvid_mpeg_parse_a53(CuvidContext *ctx, const uint8_t* p, int buf_size)
+{
+    const uint8_t* buf_end = p + buf_size;
+    for(;;)
+    {
+        uint32_t start_code = -1;
+        p = avpriv_find_start_code(p, buf_end, &start_code);
+        if (start_code > 0x1ff)
+            break;
+        if (start_code != 0x1b2)
+            continue;
+        buf_size = buf_end - p;
+        if (buf_size >= 6 &&
+            p[0] == 'G' && p[1] == 'A' && p[2] == '9' && p[3] == '4' && p[4] == 3 && (p[5] & 0x40))
+        {
+            /* extract A53 Part 4 CC data */
+            int cc_count = p[5] & 0x1f;
+            if (cc_count > 0 && buf_size >= 7 + cc_count * 3)
+            {
+                av_freep(&ctx->a53_caption);
+                ctx->a53_caption_size = cc_count * 3;
+                ctx->a53_caption = av_malloc(ctx->a53_caption_size);
+                if (ctx->a53_caption)
+                    memcpy(ctx->a53_caption, p + 7, ctx->a53_caption_size);
+            }
+        }
+        else if (buf_size >= 11 && p[0] == 'C' && p[1]
[FFmpeg-devel] [PATCH v2] lavc/aarch64: add pred16x16 10-bit functions
Benchmarks: pred16x16_dc_10_c: 124.0 pred16x16_dc_10_neon: 97.2 pred16x16_horizontal_10_c: 71.7 pred16x16_horizontal_10_neon: 66.2 pred16x16_top_dc_10_c: 90.7 pred16x16_top_dc_10_neon: 71.5 pred16x16_vertical_10_c: 64.7 pred16x16_vertical_10_neon: 61.7 Some functions work slower than C and are left commented out. Signed-off-by: Mikhail Nitenko --- libavcodec/aarch64/h264pred_init.c | 68 + libavcodec/aarch64/h264pred_neon.S | 117 + 2 files changed, 155 insertions(+), 30 deletions(-) diff --git a/libavcodec/aarch64/h264pred_init.c b/libavcodec/aarch64/h264pred_init.c index b144376f90..325a86bfcd 100644 --- a/libavcodec/aarch64/h264pred_init.c +++ b/libavcodec/aarch64/h264pred_init.c @@ -45,42 +45,50 @@ void ff_pred8x8_0lt_dc_neon(uint8_t *src, ptrdiff_t stride); void ff_pred8x8_l00_dc_neon(uint8_t *src, ptrdiff_t stride); void ff_pred8x8_0l0_dc_neon(uint8_t *src, ptrdiff_t stride); +void ff_pred16x16_top_dc_neon_10(uint8_t *src, ptrdiff_t stride); +void ff_pred16x16_dc_neon_10(uint8_t *src, ptrdiff_t stride); +void ff_pred16x16_hor_neon_10(uint8_t *src, ptrdiff_t stride); +void ff_pred16x16_vert_neon_10(uint8_t *src, ptrdiff_t stride); + static av_cold void h264_pred_init_neon(H264PredContext *h, int codec_id, const int bit_depth, const int chroma_format_idc) { -const int high_depth = bit_depth > 8; - -if (high_depth) -return; - -if (chroma_format_idc <= 1) { -h->pred8x8[VERT_PRED8x8 ] = ff_pred8x8_vert_neon; -h->pred8x8[HOR_PRED8x8 ] = ff_pred8x8_hor_neon; -if (codec_id != AV_CODEC_ID_VP7 && codec_id != AV_CODEC_ID_VP8) -h->pred8x8[PLANE_PRED8x8] = ff_pred8x8_plane_neon; -h->pred8x8[DC_128_PRED8x8 ] = ff_pred8x8_128_dc_neon; -if (codec_id != AV_CODEC_ID_RV40 && codec_id != AV_CODEC_ID_VP7 && -codec_id != AV_CODEC_ID_VP8) { -h->pred8x8[DC_PRED8x8 ] = ff_pred8x8_dc_neon; -h->pred8x8[LEFT_DC_PRED8x8] = ff_pred8x8_left_dc_neon; -h->pred8x8[TOP_DC_PRED8x8 ] = ff_pred8x8_top_dc_neon; -h->pred8x8[ALZHEIMER_DC_L0T_PRED8x8] = ff_pred8x8_l0t_dc_neon; 
-h->pred8x8[ALZHEIMER_DC_0LT_PRED8x8] = ff_pred8x8_0lt_dc_neon; -h->pred8x8[ALZHEIMER_DC_L00_PRED8x8] = ff_pred8x8_l00_dc_neon; -h->pred8x8[ALZHEIMER_DC_0L0_PRED8x8] = ff_pred8x8_0l0_dc_neon; +if (bit_depth == 8) { +if (chroma_format_idc <= 1) { +h->pred8x8[VERT_PRED8x8 ] = ff_pred8x8_vert_neon; +h->pred8x8[HOR_PRED8x8 ] = ff_pred8x8_hor_neon; +if (codec_id != AV_CODEC_ID_VP7 && codec_id != AV_CODEC_ID_VP8) +h->pred8x8[PLANE_PRED8x8] = ff_pred8x8_plane_neon; +h->pred8x8[DC_128_PRED8x8 ] = ff_pred8x8_128_dc_neon; +if (codec_id != AV_CODEC_ID_RV40 && codec_id != AV_CODEC_ID_VP7 && +codec_id != AV_CODEC_ID_VP8) { +h->pred8x8[DC_PRED8x8 ] = ff_pred8x8_dc_neon; +h->pred8x8[LEFT_DC_PRED8x8] = ff_pred8x8_left_dc_neon; +h->pred8x8[TOP_DC_PRED8x8 ] = ff_pred8x8_top_dc_neon; +h->pred8x8[ALZHEIMER_DC_L0T_PRED8x8] = ff_pred8x8_l0t_dc_neon; +h->pred8x8[ALZHEIMER_DC_0LT_PRED8x8] = ff_pred8x8_0lt_dc_neon; +h->pred8x8[ALZHEIMER_DC_L00_PRED8x8] = ff_pred8x8_l00_dc_neon; +h->pred8x8[ALZHEIMER_DC_0L0_PRED8x8] = ff_pred8x8_0l0_dc_neon; +} } -} -h->pred16x16[DC_PRED8x8 ] = ff_pred16x16_dc_neon; -h->pred16x16[VERT_PRED8x8 ] = ff_pred16x16_vert_neon; -h->pred16x16[HOR_PRED8x8] = ff_pred16x16_hor_neon; -h->pred16x16[LEFT_DC_PRED8x8] = ff_pred16x16_left_dc_neon; -h->pred16x16[TOP_DC_PRED8x8 ] = ff_pred16x16_top_dc_neon; -h->pred16x16[DC_128_PRED8x8 ] = ff_pred16x16_128_dc_neon; -if (codec_id != AV_CODEC_ID_SVQ3 && codec_id != AV_CODEC_ID_RV40 && -codec_id != AV_CODEC_ID_VP7 && codec_id != AV_CODEC_ID_VP8) -h->pred16x16[PLANE_PRED8x8 ] = ff_pred16x16_plane_neon; +h->pred16x16[DC_PRED8x8 ] = ff_pred16x16_dc_neon; +h->pred16x16[VERT_PRED8x8 ] = ff_pred16x16_vert_neon; +h->pred16x16[HOR_PRED8x8] = ff_pred16x16_hor_neon; +h->pred16x16[LEFT_DC_PRED8x8] = ff_pred16x16_left_dc_neon; +h->pred16x16[TOP_DC_PRED8x8 ] = ff_pred16x16_top_dc_neon; +h->pred16x16[DC_128_PRED8x8 ] = ff_pred16x16_128_dc_neon; +if (codec_id != AV_CODEC_ID_SVQ3 && codec_id != AV_CODEC_ID_RV40 && +codec_id != AV_CODEC_ID_VP7 && 
codec_id != AV_CODEC_ID_VP8) +h->pred16x16[PLANE_PRED8x8 ] = ff_pred16x16_plane_neon; +} +if (bit_depth == 10) { +h->pred16x16[DC_PRED8x8 ] = ff_pred16x16_dc_neon_10; +h->pred16x16[VERT_PRED8x8 ] = ff_pred16x16_vert_neon_10; +h->pred16x16[HOR_PRED8x8] =
[FFmpeg-devel] [PATCH] avcodec/jpeglsenc: Remove redundant pixel format checks
This encoder has AVCodec.pix_fmts set, so ff_encode_preinit() already checks for this.

Signed-off-by: Andreas Rheinhardt
---
Will apply tomorrow unless there are objections.

 libavcodec/jpeglsenc.c | 8
 1 file changed, 8 deletions(-)

diff --git a/libavcodec/jpeglsenc.c b/libavcodec/jpeglsenc.c
index 2bb6b1407a..d03ce32f41 100644
--- a/libavcodec/jpeglsenc.c
+++ b/libavcodec/jpeglsenc.c
@@ -429,14 +429,6 @@ FF_DISABLE_DEPRECATION_WARNINGS
 FF_ENABLE_DEPRECATION_WARNINGS
 #endif
-    if (ctx->pix_fmt != AV_PIX_FMT_GRAY8 &&
-        ctx->pix_fmt != AV_PIX_FMT_GRAY16 &&
-        ctx->pix_fmt != AV_PIX_FMT_RGB24 &&
-        ctx->pix_fmt != AV_PIX_FMT_BGR24) {
-        av_log(ctx, AV_LOG_ERROR,
-               "Only grayscale and RGB24/BGR24 images are supported\n");
-        return -1;
-    }
     return 0;
 }
--
2.27.0
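Why a per-encoder check becomes redundant once a format list is declared can be sketched as follows. This is not the ff_encode_preinit() implementation: the enum, the list and `demo_fmt_supported` are hypothetical names, and the only assumption carried over is that the supported formats live in a sentinel-terminated array that generic code can scan before the encoder's own init runs.

```c
#include <stddef.h>

/* Hypothetical pixel format IDs; DEMO_FMT_NONE is the list terminator. */
enum DemoPixFmt {
    DEMO_FMT_NONE = -1,
    DEMO_FMT_GRAY8,
    DEMO_FMT_GRAY16,
    DEMO_FMT_RGB24,
    DEMO_FMT_BGR24,
    DEMO_FMT_YUV420P
};

/* The formats named in the removed check, declared once as data. */
static const enum DemoPixFmt demo_jpegls_fmts[] = {
    DEMO_FMT_BGR24, DEMO_FMT_RGB24, DEMO_FMT_GRAY8, DEMO_FMT_GRAY16,
    DEMO_FMT_NONE
};

/* Returns 1 if fmt appears in the sentinel-terminated list, else 0. */
static int demo_fmt_supported(const enum DemoPixFmt *list, enum DemoPixFmt fmt)
{
    size_t i;

    for (i = 0; list[i] != DEMO_FMT_NONE; i++)
        if (list[i] == fmt)
            return 1;
    return 0;
}
```

With a generic scan like this in the common init path, an encoder that repeats the same comparison in its own init can never see an unsupported format.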
Re: [FFmpeg-devel] [PATCH v3 2/5] avcodec/mips: Refine get_cabac_inline_mips.
> On March 31, 2021, at 10:39 PM, Michael Niedermayer wrote:
>
> On Tue, Mar 30, 2021 at 08:51:52PM +0800, Shiyou Yin wrote:
>> 1. Refined function get_cabac_inline_mips.
>> 2. Optimize function get_cabac_bypass and get_cabac_bypass_sign.
>>
>> Speed of decoding h264: 4.89x ==> 5.05x(tested on 3A4000).
>> ---
>> libavcodec/mips/cabac.h | 131 +---
>> 1 file changed, 102 insertions(+), 29 deletions(-)
>
> This breaks fate with qemu mips
>
> --- ffmpeg/tests/ref/fate/hevc-cabac-tudepth 2021-03-26 18:34:55.142789579 +0100
> +++ tests/data/fate/hevc-cabac-tudepth 2021-03-31 16:36:50.613173111 +0200
> @@ -3,4 +3,4 @@
> #codec_id 0: rawvideo
> #dimensions 0: 64x64
> #sar 0: 0/1
> -0, 0, 0,1,12288, 0x0127a0d9
> +0, 0, 0,1,12288, 0xa330b3bd
> Test hevc-cabac-tudepth failed. Look at tests/data/fate/hevc-cabac-tudepth.err for details.
> ffmpeg/tests/Makefile:255: recipe for target 'fate-hevc-cabac-tudepth' failed
> make: *** [fate-hevc-cabac-tudepth] Error 1
>

This bug is caused by using 'lhu' to load two bytes of data in a big-endian environment. It has been fixed in v4; please help to merge the series.

BTW, I found another failing case, 'fate-sub2video_time_limited', when testing origin/master with the cross compiler mips-linux-gnu-gcc-8 on debian10-x64 and running fate with qemu-mips. I will try to analyze it later.

My configuration:
--samples=../../fate-suite/ --target-exec='/usr/bin/qemu-mips -cpu 74Kf -L /usr/mips-linux-gnu/' --cross-prefix=/usr/mips-linux-gnu/bin/ --cc=mips-linux-gnu-gcc-8 --arch=mips --target-os=linux --optflags='-O3 -g -static' --extra-ldflags='-static' --enable-cross-compile --enable-static --enable-gpl --disable-pthreads --disable-iconv --disable-mipsfpu

TEST sub2video_time_limited
--- src/tests/ref/fate/sub2video_time_limited 2021-04-10 11:53:37.661350105 +0800
+++ tests/data/fate/sub2video_time_limited 2021-04-12 23:18:29.355527385 +0800
@@ -4,5 +4,5 @@
 #dimensions 0: 1920x1080
 #sar 0: 0/1
 0, 2, 2,1, 8294400, 0x
-0, 2, 2,1, 8294400, 0xa87c518f
-0, 10, 10,1, 8294400, 0xa87c518f
+0, 2, 2,1, 8294400, 0xea5a518f
+0, 10, 10,1, 8294400, 0xea5a518f
Test sub2video_time_limited failed. Look at tests/data/fate/sub2video_time_limited.err for details.
make: *** [src/tests/Makefile:256:fate-sub2video_time_limited] Error 1
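The endianness issue behind the 'lhu' bug can be illustrated in plain C. This is a sketch, not the cabac.h assembly, and all function names are hypothetical. A native 16-bit load returns the two bytes in host order, so code that wants the first byte of the stream in the high half has to swap on little-endian hosts only; composing the value byte by byte avoids the distinction altogether.

```c
#include <stdint.h>
#include <string.h>

/* Portable: builds the value from individual bytes, first byte most
 * significant, independent of host byte order. */
static uint16_t demo_load_be16_bytewise(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Runtime probe of the host byte order. */
static int demo_host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Native 16-bit load (comparable to a single halfword load), followed by
 * a byte swap only when the host is little-endian. */
static uint16_t demo_load_be16_native(const uint8_t *p)
{
    uint16_t v;

    memcpy(&v, p, sizeof(v));
    if (demo_host_is_little_endian())
        v = (uint16_t)((v >> 8) | (v << 8));
    return v;
}
```

Swapping unconditionally would produce the right value on one byte order and a corrupted bitstream word on the other, which is consistent with a failure that only appears under qemu-mips.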
Re: [FFmpeg-devel] [PATCH] avformat/rawenc: remove singlejpeg muxer
Ping. On 2021-04-10 20:00, Gyan Doshi wrote: It was added in 51ac1f616f due to ticket #4218, in order to show a single image via ffserver. With ffserver long gone, it serves no purpose. --- libavformat/Makefile | 1 - libavformat/allformats.c | 1 - libavformat/rawenc.c | 13 - 3 files changed, 15 deletions(-) diff --git a/libavformat/Makefile b/libavformat/Makefile index 0f340f74a0..bc1ddfa81c 100644 --- a/libavformat/Makefile +++ b/libavformat/Makefile @@ -506,7 +506,6 @@ OBJS-$(CONFIG_SGA_DEMUXER) += sga.o OBJS-$(CONFIG_SHORTEN_DEMUXER) += shortendec.o rawdec.o OBJS-$(CONFIG_SIFF_DEMUXER) += siff.o OBJS-$(CONFIG_SIMBIOSIS_IMX_DEMUXER) += imx.o -OBJS-$(CONFIG_SINGLEJPEG_MUXER) += rawenc.o OBJS-$(CONFIG_SLN_DEMUXER) += pcmdec.o pcm.o OBJS-$(CONFIG_SMACKER_DEMUXER) += smacker.o OBJS-$(CONFIG_SMJPEG_DEMUXER)+= smjpegdec.o smjpeg.o diff --git a/libavformat/allformats.c b/libavformat/allformats.c index a38fd1f583..fa093c7ac2 100644 --- a/libavformat/allformats.c +++ b/libavformat/allformats.c @@ -405,7 +405,6 @@ extern AVInputFormat ff_sga_demuxer; extern AVInputFormat ff_shorten_demuxer; extern AVInputFormat ff_siff_demuxer; extern AVInputFormat ff_simbiosis_imx_demuxer; -extern AVOutputFormat ff_singlejpeg_muxer; extern AVInputFormat ff_sln_demuxer; extern AVInputFormat ff_smacker_demuxer; extern AVInputFormat ff_smjpeg_demuxer; diff --git a/libavformat/rawenc.c b/libavformat/rawenc.c index caec297f4a..a43a7a6278 100644 --- a/libavformat/rawenc.c +++ b/libavformat/rawenc.c @@ -399,19 +399,6 @@ AVOutputFormat ff_mjpeg_muxer = { }; #endif -#if CONFIG_SINGLEJPEG_MUXER -AVOutputFormat ff_singlejpeg_muxer = { -.name = "singlejpeg", -.long_name = NULL_IF_CONFIG_SMALL("JPEG single image"), -.mime_type = "image/jpeg", -.audio_codec = AV_CODEC_ID_NONE, -.video_codec = AV_CODEC_ID_MJPEG, -.init = force_one_stream, -.write_packet = ff_raw_write_packet, -.flags = AVFMT_NOTIMESTAMPS, -}; -#endif - #if CONFIG_MLP_MUXER AVOutputFormat ff_mlp_muxer = { .name = "mlp", ___ ffmpeg-devel 
mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-devel] [PATCH v4 5/5] mips: Fix potential illegal instruction error.
MSA2 optimizations are attached to MSA macros in generic_macros_msa.h. It's difficult to do runtime check for them. Remove this part of code can make it more robust. H264 1080p decoding: 5.13x==>5.12x. --- configure | 7 +-- libavutil/mips/generic_macros_msa.h | 37 - 2 files changed, 1 insertion(+), 43 deletions(-) diff --git a/configure b/configure index d7a3f50..7b05612 100755 --- a/configure +++ b/configure @@ -451,7 +451,6 @@ Optimization options (experts only): --disable-mipsdspdisable MIPS DSP ASE R1 optimizations --disable-mipsdspr2 disable MIPS DSP ASE R2 optimizations --disable-msadisable MSA optimizations - --disable-msa2 disable MSA2 optimizations --disable-mipsfpudisable floating point MIPS optimizations --disable-mmidisable Loongson SIMD optimizations --disable-fast-unaligned consider unaligned accesses slow @@ -2025,7 +2024,6 @@ ARCH_EXT_LIST_MIPS=" mipsdsp mipsdspr2 msa -msa2 " ARCH_EXT_LIST_LOONGSON=" @@ -2564,7 +2562,6 @@ mipsdsp_deps="mips" mipsdspr2_deps="mips" mmi_deps_any="loongson2 loongson3" msa_deps="mipsfpu" -msa2_deps="msa" cpunop_deps="i686" x86_64_select="i686" @@ -5907,9 +5904,8 @@ elif enabled mips; then enabled mipsdsp && check_inline_asm_flags mipsdsp '"addu.qb $t0, $t1, $t2"' '-mdsp' enabled mipsdspr2 && check_inline_asm_flags mipsdspr2 '"absq_s.qb $t0, $t1"' '-mdspr2' -# MSA and MSA2 can be detected at runtime so we supply extra flags here +# MSA can be detected at runtime so we supply extra flags here enabled mipsfpu && enabled msa && check_inline_asm msa '"addvi.b $w0, $w1, 1"' '-mmsa' && append MSAFLAGS '-mmsa' -enabled msa && enabled msa2 && check_inline_asm msa2 '"nxbits.any.b $w0, $w0"' '-mmsa2' && append MSAFLAGS '-mmsa2' # loongson2 have no switch cflag so we can only probe toolchain ability enabled loongson2 && check_inline_asm loongson2 '"dmult.g $8, $9, $10"' && disable loongson3 @@ -7340,7 +7336,6 @@ if enabled mips; then echo "MIPS DSP R1 enabled ${mipsdsp-no}" echo "MIPS DSP R2 enabled ${mipsdspr2-no}" echo "MIPS MSA 
enabled ${msa-no}" -echo "MIPS MSA2 enabled ${msa2-no}" echo "LOONGSON MMI enabled ${mmi-no}" fi if enabled ppc; then diff --git a/libavutil/mips/generic_macros_msa.h b/libavutil/mips/generic_macros_msa.h index bb25e9f..1486f72 100644 --- a/libavutil/mips/generic_macros_msa.h +++ b/libavutil/mips/generic_macros_msa.h @@ -25,10 +25,6 @@ #include #include -#if HAVE_MSA2 -#include -#endif - #define ALIGNMENT 16 #define ALLOC_ALIGNED(align) __attribute__ ((aligned((align) << 1))) @@ -1119,15 +1115,6 @@ unsigned absolute diff values, even-odd pairs are added together to generate 8 halfword results. */ -#if HAVE_MSA2 -#define SAD_UB2_UH(in0, in1, ref0, ref1) \ -( { \ -v8u16 sad_m = { 0 }; \ -sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in0, (v16u8) ref0); \ -sad_m += __builtin_msa2_sad_adj2_u_w2x_b((v16u8) in1, (v16u8) ref1); \ -sad_m; \ -} ) -#else #define SAD_UB2_UH(in0, in1, ref0, ref1)\ ( { \ v16u8 diff0_m, diff1_m; \ @@ -1141,7 +1128,6 @@ \ sad_m; \ } ) -#endif // #if HAVE_MSA2 /* Description : Insert specified word elements from input vectors to 1 destination vector @@ -2183,12 +2169,6 @@ extracted and interleaved with same vector 'in0' to generate 4 word elements keeping sign intact */ -#if HAVE_MSA2 -#define UNPCK_R_SH_SW(in, out) \ -{\ -out = (v4i32) __builtin_msa2_w2x_lo_s_h((v8i16) in); \ -} -#else #define UNPCK_R_SH_SW(in, out) \ {\ v8i16 sign_m;\ @@ -2196,7 +2176,6 @@ sign_m = __msa_clti_s_h((v8i16) in, 0); \ out = (v4i32) __msa_ilvr_h(sign_m, (v8i16) in); \ } -#endif // #if HAVE_MSA2 /* Description : Sign extend byte elements from input vector and return halfword results in pair of vectors @@ -2209,13 +2188,6 @@ Then interleaved left with same vector 'in0' to generate 8 signed halfword elements in 'out1' */ -#if HAVE_MSA2 -#define UNPCK_SB_SH(in, out0, out1) \ -{
[FFmpeg-devel] [PATCH v4 3/5] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.
From: gxw Speed of decoding H264 1080P: 5.05x ==> 5.13x Signed-off-by: Shiyou Yin --- libavcodec/mips/Makefile| 3 +- libavcodec/mips/h264_deblock_msa.c | 153 libavcodec/mips/h264dsp_init_mips.c | 2 + libavcodec/mips/h264dsp_mips.h | 4 + 4 files changed, 161 insertions(+), 1 deletion(-) create mode 100644 libavcodec/mips/h264_deblock_msa.c diff --git a/libavcodec/mips/Makefile b/libavcodec/mips/Makefile index 2be4d9b..81a73a4 100644 --- a/libavcodec/mips/Makefile +++ b/libavcodec/mips/Makefile @@ -57,7 +57,8 @@ MSA-OBJS-$(CONFIG_VP8_DECODER)+= mips/vp8_mc_msa.o \ mips/vp8_lpf_msa.o MSA-OBJS-$(CONFIG_VP3DSP) += mips/vp3dsp_idct_msa.o MSA-OBJS-$(CONFIG_H264DSP)+= mips/h264dsp_msa.o\ - mips/h264idct_msa.o + mips/h264idct_msa.o \ + mips/h264_deblock_msa.o MSA-OBJS-$(CONFIG_H264QPEL) += mips/h264qpel_msa.o MSA-OBJS-$(CONFIG_H264CHROMA) += mips/h264chroma_msa.o MSA-OBJS-$(CONFIG_H264PRED) += mips/h264pred_msa.o diff --git a/libavcodec/mips/h264_deblock_msa.c b/libavcodec/mips/h264_deblock_msa.c new file mode 100644 index 000..4fed55c --- /dev/null +++ b/libavcodec/mips/h264_deblock_msa.c @@ -0,0 +1,153 @@ +/* + * MIPS SIMD optimized H.264 deblocking code + * + * Copyright (c) 2020 Loongson Technology Corporation Limited + *Gu Xiwei + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavcodec/bit_depth_template.c" +#include "h264dsp_mips.h" +#include "libavutil/mips/generic_macros_msa.h" +#include "libavcodec/mips/h264dsp_mips.h" + +#define h264_loop_filter_strength_iteration_msa(edges, step, mask_mv, dir, \ +d_idx, mask_dir) \ +do { \ +int b_idx = 0; \ +int step_x4 = step << 2; \ +int d_idx_12 = d_idx + 12; \ +int d_idx_52 = d_idx + 52; \ +int d_idx_x4 = d_idx << 2; \ +int d_idx_x4_48 = d_idx_x4 + 48; \ +int dir_x32 = dir * 32; \ +uint8_t *ref_t = (uint8_t*)ref; \ +uint8_t *mv_t = (uint8_t*)mv; \ +uint8_t *nnz_t = (uint8_t*)nnz; \ +uint8_t *bS_t = (uint8_t*)bS; \ +mask_mv <<= 3; \ +for (; b_idx < edges; b_idx += step) { \ +out &= mask_dir; \ +if (!(mask_mv & b_idx)) { \ +if (bidir) { \ +ref_2 = LD_SB(ref_t + d_idx_12); \ +ref_3 = LD_SB(ref_t + d_idx_52); \ +ref_0 = LD_SB(ref_t + 12); \ +ref_1 = LD_SB(ref_t + 52); \ +ref_2 = (v16i8)__msa_ilvr_w((v4i32)ref_3, (v4i32)ref_2); \ +ref_0 = (v16i8)__msa_ilvr_w((v4i32)ref_0, (v4i32)ref_0); \ +ref_1 = (v16i8)__msa_ilvr_w((v4i32)ref_1, (v4i32)ref_1); \ +ref_3 = (v16i8)__msa_shf_h((v8i16)ref_2, 0x4e); \ +ref_0 -= ref_2; \ +ref_1 -= ref_3; \ +ref_0 = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)ref_1); \ +\ +tmp_2 = LD_SH(mv_t + d_idx_x4_48); \ +tmp_3 = LD_SH(mv_t + 48); \ +tmp_4 = LD_SH(mv_t + 208); \ +tmp_5 = tmp_2 - tmp_3; \ +tmp_6 = tmp_2 - tmp_4; \ +SAT_SH2_SH(tmp_5, tmp_6, 7); \ +tmp_0 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5); \ +tmp_0 += cnst_1; \ +tmp_0 = (v16i8)__msa_subs_u_b((v16u8)tmp_0, (v16u8)cnst_0);\ +tmp_0 = (v16i8)__msa_sat_s_h((v8i16)tmp_0, 7); \ +tmp_0 = __msa_pckev_b(tmp_0, tmp_0); \ +out = (v16i8)__msa_or_v((v16u8)ref_0, (v16u8)tmp_0); \ +\ +tmp_2 = LD_SH(mv_t + 208 + d_idx_x4); \ +tmp_5 = tmp_2 - tmp_3; \ +tmp_6 = tmp_2 - tmp_4; \ +SAT_SH2_SH(tmp_5, 
tmp_6, 7); \ +tmp_1 = __msa_pckev_b((v16i8)tmp_6, (v16i8)tmp_5); \ +tmp_1 += cnst_1; \ +tmp_1 =
[FFmpeg-devel] [PATCH v4 4/5] avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msa
From: gxw Using mask to avoid judgment, H264 4K decoding speed improved about 0.1fps tested on 3A4000 Signed-off-by: Shiyou Yin --- libavcodec/mips/h264dsp_msa.c | 465 -- 1 file changed, 171 insertions(+), 294 deletions(-) diff --git a/libavcodec/mips/h264dsp_msa.c b/libavcodec/mips/h264dsp_msa.c index a8c3f3c..9d815f8 100644 --- a/libavcodec/mips/h264dsp_msa.c +++ b/libavcodec/mips/h264dsp_msa.c @@ -1284,284 +1284,160 @@ static void avc_loopfilter_cb_or_cr_intra_edge_ver_msa(uint8_t *data_cb_or_cr, } } -static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t *data, - uint8_t bs0, uint8_t bs1, - uint8_t bs2, uint8_t bs3, - uint8_t tc0, uint8_t tc1, - uint8_t tc2, uint8_t tc3, - uint8_t alpha_in, - uint8_t beta_in, - ptrdiff_t img_width) +static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t* pPix, uint32_t iStride, + uint8_t iAlpha, uint8_t iBeta, + uint8_t* pTc) { -v16u8 tmp_vec, bs = { 0 }; - -tmp_vec = (v16u8) __msa_fill_b(bs0); -bs = (v16u8) __msa_insve_w((v4i32) bs, 0, (v4i32) tmp_vec); -tmp_vec = (v16u8) __msa_fill_b(bs1); -bs = (v16u8) __msa_insve_w((v4i32) bs, 1, (v4i32) tmp_vec); -tmp_vec = (v16u8) __msa_fill_b(bs2); -bs = (v16u8) __msa_insve_w((v4i32) bs, 2, (v4i32) tmp_vec); -tmp_vec = (v16u8) __msa_fill_b(bs3); -bs = (v16u8) __msa_insve_w((v4i32) bs, 3, (v4i32) tmp_vec); - -if (!__msa_test_bz_v(bs)) { -uint8_t *src = data - 4; -v16u8 p3_org, p2_org, p1_org, p0_org, q0_org, q1_org, q2_org, q3_org; -v16u8 p0_asub_q0, p1_asub_p0, q1_asub_q0, alpha, beta; -v16u8 is_less_than, is_less_than_beta, is_less_than_alpha; -v16u8 is_bs_greater_than0; -v16u8 tc = { 0 }; -v16i8 zero = { 0 }; - -tmp_vec = (v16u8) __msa_fill_b(tc0); -tc = (v16u8) __msa_insve_w((v4i32) tc, 0, (v4i32) tmp_vec); -tmp_vec = (v16u8) __msa_fill_b(tc1); -tc = (v16u8) __msa_insve_w((v4i32) tc, 1, (v4i32) tmp_vec); -tmp_vec = (v16u8) __msa_fill_b(tc2); -tc = (v16u8) __msa_insve_w((v4i32) tc, 2, (v4i32) tmp_vec); -tmp_vec = (v16u8) __msa_fill_b(tc3); -tc = (v16u8) __msa_insve_w((v4i32) 
tc, 3, (v4i32) tmp_vec); - -is_bs_greater_than0 = (zero < bs); - -{ -v16u8 row0, row1, row2, row3, row4, row5, row6, row7; -v16u8 row8, row9, row10, row11, row12, row13, row14, row15; - -LD_UB8(src, img_width, - row0, row1, row2, row3, row4, row5, row6, row7); -src += (8 * img_width); -LD_UB8(src, img_width, - row8, row9, row10, row11, row12, row13, row14, row15); - -TRANSPOSE16x8_UB_UB(row0, row1, row2, row3, row4, row5, row6, row7, -row8, row9, row10, row11, -row12, row13, row14, row15, -p3_org, p2_org, p1_org, p0_org, -q0_org, q1_org, q2_org, q3_org); -} - -p0_asub_q0 = __msa_asub_u_b(p0_org, q0_org); -p1_asub_p0 = __msa_asub_u_b(p1_org, p0_org); -q1_asub_q0 = __msa_asub_u_b(q1_org, q0_org); - -alpha = (v16u8) __msa_fill_b(alpha_in); -beta = (v16u8) __msa_fill_b(beta_in); - -is_less_than_alpha = (p0_asub_q0 < alpha); -is_less_than_beta = (p1_asub_p0 < beta); -is_less_than = is_less_than_beta & is_less_than_alpha; -is_less_than_beta = (q1_asub_q0 < beta); -is_less_than = is_less_than_beta & is_less_than; -is_less_than = is_less_than & is_bs_greater_than0; - -if (!__msa_test_bz_v(is_less_than)) { -v16i8 negate_tc, sign_negate_tc; -v16u8 p0, q0, p2_asub_p0, q2_asub_q0; -v8i16 tc_r, tc_l, negate_tc_r, i16_negatetc_l; -v8i16 p1_org_r, p0_org_r, q0_org_r, q1_org_r; -v8i16 p1_org_l, p0_org_l, q0_org_l, q1_org_l; -v8i16 p0_r, q0_r, p0_l, q0_l; - -negate_tc = zero - (v16i8) tc; -sign_negate_tc = __msa_clti_s_b(negate_tc, 0); - -ILVRL_B2_SH(sign_negate_tc, negate_tc, negate_tc_r, i16_negatetc_l); - -UNPCK_UB_SH(tc, tc_r, tc_l); -UNPCK_UB_SH(p1_org, p1_org_r, p1_org_l); -UNPCK_UB_SH(p0_org, p0_org_r, p0_org_l); -UNPCK_UB_SH(q0_org, q0_org_r, q0_org_l); - -p2_asub_p0 = __msa_asub_u_b(p2_org, p0_org); -is_less_than_beta = (p2_asub_p0 < beta); -is_less_than_beta =
[FFmpeg-devel] [PATCH V4] [mips] Optimize H264 decoding for MIPS platform.
v2: Fixed a build error in [PATCH 2/5].
v3: add patch 4/5.
v4: Fix bug in 2/5 caused by instruction 'lhu' on BIGENDIAN environment.

[PATCH v4 1/5] avcodec/mips: Restore the initialization sequence of
[PATCH v4 2/5] avcodec/mips: Refine get_cabac_inline_mips.
[PATCH v4 3/5] avcodec/mips: Optimize function ff_h264_loop_filter_strength_msa.
[PATCH v4 4/5] avcodec/mips: Refine ff_h264_h_lpf_luma_inter_msa
[PATCH v4 5/5] mips: Fix potential illegal instruction error.
[FFmpeg-devel] [PATCH v4 1/5] avcodec/mips: Restore the initialization sequence of MSA and MMI in ff_h264chroma_init_mips.
The MSA optimization has been refined in commit 93218c2 and ce0a52e. It is better than MMI version now. Speed of decoding H264: 4.83x ==> 4.89x (tested on 3A4000). --- libavcodec/mips/h264chroma_init_mips.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/libavcodec/mips/h264chroma_init_mips.c b/libavcodec/mips/h264chroma_init_mips.c index 6bb19d3..755cc04 100644 --- a/libavcodec/mips/h264chroma_init_mips.c +++ b/libavcodec/mips/h264chroma_init_mips.c @@ -28,7 +28,15 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth) int cpu_flags = av_get_cpu_flags(); int high_bit_depth = bit_depth > 8; -/* MMI apears to be faster than MSA here */ +if (have_mmi(cpu_flags)) { +if (!high_bit_depth) { +c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi; +c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi; +c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi; +c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi; +} +} + if (have_msa(cpu_flags)) { if (!high_bit_depth) { c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_msa; @@ -40,13 +48,4 @@ av_cold void ff_h264chroma_init_mips(H264ChromaContext *c, int bit_depth) c->avg_h264_chroma_pixels_tab[2] = ff_avg_h264_chroma_mc2_msa; } } - -if (have_mmi(cpu_flags)) { -if (!high_bit_depth) { -c->put_h264_chroma_pixels_tab[0] = ff_put_h264_chroma_mc8_mmi; -c->avg_h264_chroma_pixels_tab[0] = ff_avg_h264_chroma_mc8_mmi; -c->put_h264_chroma_pixels_tab[1] = ff_put_h264_chroma_mc4_mmi; -c->avg_h264_chroma_pixels_tab[1] = ff_avg_h264_chroma_mc4_mmi; -} -} } -- 2.1.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
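The reason the order of the two blocks matters can be shown with a reduced sketch. The flag values, function names and context struct below are hypothetical and only mimic the shape of the init function: each block overwrites the same table entries, so whichever block runs last decides which implementation is used when both CPU features are present.

```c
/* Hypothetical CPU feature flags. */
#define DEMO_CPU_MMI 0x1
#define DEMO_CPU_MSA 0x2

typedef const char *(*demo_mc_fn)(void);

static const char *demo_mc8_c(void)   { return "c"; }
static const char *demo_mc8_mmi(void) { return "mmi"; }
static const char *demo_mc8_msa(void) { return "msa"; }

typedef struct DemoChromaContext {
    demo_mc_fn put_mc8;
} DemoChromaContext;

/* Later assignments override earlier ones, so the preferred
 * implementation is installed last. */
static void demo_chroma_init(DemoChromaContext *c, int cpu_flags)
{
    c->put_mc8 = demo_mc8_c;

    if (cpu_flags & DEMO_CPU_MMI)
        c->put_mc8 = demo_mc8_mmi;

    if (cpu_flags & DEMO_CPU_MSA)
        c->put_mc8 = demo_mc8_msa;
}
```

On a CPU reporting both features the MSA function ends up in the table, while an MMI-only CPU still gets the MMI version.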
[FFmpeg-devel] [PATCH v4 2/5] avcodec/mips: Refine get_cabac_inline_mips.
1. Refined function get_cabac_inline_mips.
2. Optimize function get_cabac_bypass and get_cabac_bypass_sign.

Speed of decoding h264: 4.89x ==> 5.05x (tested on 3A4000).
---
 libavcodec/mips/cabac.h | 140 ++--
 1 file changed, 112 insertions(+), 28 deletions(-)

diff --git a/libavcodec/mips/cabac.h b/libavcodec/mips/cabac.h
index 3d09e93..0648b9a 100644
--- a/libavcodec/mips/cabac.h
+++ b/libavcodec/mips/cabac.h
@@ -2,7 +2,8 @@
  * Loongson SIMD optimized h264chroma
  *
  * Copyright (c) 2018 Loongson Technology Corporation Limited
- * Copyright (c) 2018 Shiyou Yin
+ * Contributed by Shiyou Yin
+ *                Gu Xiwei(guxiwei...@loongson.cn)
  *
  * This file is part of FFmpeg.
  *
@@ -25,18 +26,18 @@
 #define AVCODEC_MIPS_CABAC_H
 
 #include "libavcodec/cabac.h"
-#include "libavutil/mips/asmdefs.h"
+#include "libavutil/mips/mmiutils.h"
 #include "config.h"
 
 #define get_cabac_inline get_cabac_inline_mips
 static av_always_inline int get_cabac_inline_mips(CABACContext *c,
-                                                   uint8_t * const state){
+                                                  uint8_t * const state){
     mips_reg tmp0, tmp1, tmp2, bit;
 
     __asm__ volatile (
         "lbu %[bit],0(%[state]) \n\t"
         "and %[tmp0], %[c_range], 0xC0 \n\t"
-        PTR_ADDU "%[tmp0], %[tmp0],%[tmp0] \n\t"
+        PTR_SLL "%[tmp0], %[tmp0],0x01 \n\t"
         PTR_ADDU "%[tmp0], %[tmp0],%[tables] \n\t"
         PTR_ADDU "%[tmp0], %[tmp0],%[bit]\n\t"
         /* tmp1: RangeLPS */
@@ -44,18 +45,11 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
 
         PTR_SUBU "%[c_range],%[c_range], %[tmp1] \n\t"
         PTR_SLL "%[tmp0], %[c_range], 0x11 \n\t"
-        PTR_SUBU "%[tmp0], %[tmp0],%[c_low] \n\t"
-
-        /* tmp2: lps_mask */
-        PTR_SRA "%[tmp2], %[tmp0],0x1F \n\t"
-        /* If tmp0 < 0, lps_mask == 0x*/
-        /* If tmp0 >= 0, lps_mask == 0x*/
+        "slt %[tmp2], %[tmp0],%[c_low] \n\t"
         "beqz %[tmp2], 1f\n\t"
-        PTR_SLL "%[tmp0], %[c_range], 0x11 \n\t"
+        "move %[c_range],%[tmp1] \n\t"
+        "not %[bit],%[bit]\n\t"
         PTR_SUBU "%[c_low], %[c_low], %[tmp0] \n\t"
-        PTR_SUBU "%[tmp0], %[tmp1],%[c_range]\n\t"
-        PTR_ADDU "%[c_range],%[c_range], %[tmp0] \n\t"
-        "xor %[bit],%[bit], %[tmp2] \n\t"
         "1:\n\t"
 
         /* tmp1: *state */
@@ -70,23 +64,21 @@ static av_always_inline int get_cabac_inline_mips(CABACContext *c,
         PTR_SLL "%[c_range],%[c_range], %[tmp2] \n\t"
         PTR_SLL "%[c_low], %[c_low], %[tmp2] \n\t"
 
-        "and %[tmp0], %[c_low], %[cabac_mask] \n\t"
-        "bnez %[tmp0], 1f\n\t"
-        PTR_ADDIU"%[tmp0], %[c_low], -0x01 \n\t"
+        "and %[tmp1], %[c_low], %[cabac_mask] \n\t"
+        "bnez %[tmp1], 1f\n\t"
+        PTR_ADDIU"%[tmp0], %[c_low], -0X01 \n\t"
         "xor %[tmp0], %[c_low], %[tmp0] \n\t"
         PTR_SRA "%[tmp0], %[tmp0],0x0f \n\t"
         PTR_ADDU "%[tmp0], %[tmp0],%[tables] \n\t"
+        /* tmp2: ff_h264_norm_shift[x >> (CABAC_BITS - 1)] */
         "lbu %[tmp2], %[norm_off](%[tmp0]) \n\t"
-#if CABAC_BITS == 16
-        "lbu %[tmp0], 0(%[c_bytestream])\n\t"
-        "lbu %[tmp1], 1(%[c_bytestream])\n\t"
-        PTR_SLL "%[tmp0], %[tmp0],0x09 \n\t"
-        PTR_SLL "%[tmp1], %[tmp1],0x01 \n\t"
-        PTR_ADDU "%[tmp0], %[tmp0],%[tmp1] \n\t"
+#if HAVE_BIGENDIAN
+        "lhu %[tmp0], 0(%[c_bytestream])\n\t"
 #else
-        "lbu %[tmp0], 0(%[c_bytestream])\n\t"
-        PTR_SLL "%[tmp0], %[tmp0],0x01 \n\t"
+        "lhu %[tmp0], 0(%[c_bytestream])\n\t"
+        "wsbh %[tmp0], %[tmp0] \n\t"
 #endif
+        PTR_SLL "%[tmp0], %[tmp0],0x01 \n\t"
         PTR_SUBU "%[tmp0], %[tmp0],%[cabac_mask] \n\t"
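The last hunk above replaces the CABAC_BITS == 16 refill sequence (two byte loads, shifts by 9 and 1, then an add) with one 16-bit load, a byte swap on little-endian hosts, and a single shift by 1. The following is only a minimal C sketch of why the two forms yield the same value; the function names are hypothetical and not part of the patch, and the bytes are combined explicitly rather than through an actual halfword load so that the sketch stays endian-neutral.

```c
#include <stdint.h>

/* Old sequence: load two bytes separately, shift by 9 and by 1, add.
 * Hypothetical helper name, for illustration only. */
static uint32_t refill_bytewise(const uint8_t *p)
{
    return ((uint32_t)p[0] << 9) + ((uint32_t)p[1] << 1);
}

/* New sequence: treat the two bytes as one big-endian 16-bit value
 * (what lhu, plus wsbh on little-endian, produces in the asm),
 * then shift once by 1. Hypothetical helper name. */
static uint32_t refill_halfword(const uint8_t *p)
{
    uint32_t be16 = ((uint32_t)p[0] << 8) | p[1];
    return be16 << 1;
}
```

Both helpers compute (p[0] << 9) + (p[1] << 1); the second just factors the common shift out, which is what lets the asm drop one load, one shift and the add.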
[FFmpeg-devel] [PATCH 5/5] avcodec/mpeg4videodec: update exported AVOptions in the user-facing context
This prevents bogus values from being reported in frame-multithreaded decoding
scenarios.

Signed-off-by: James Almer
---
 libavcodec/mpeg4videodec.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/libavcodec/mpeg4videodec.c b/libavcodec/mpeg4videodec.c
index 2c440a5026..de66fe8b83 100644
--- a/libavcodec/mpeg4videodec.c
+++ b/libavcodec/mpeg4videodec.c
@@ -3495,6 +3495,18 @@ static int mpeg4_update_thread_context(AVCodecContext *dst,
 
     return 0;
 }
+
+static int mpeg4_update_thread_context_for_user(AVCodecContext *dst,
+                                                const AVCodecContext *src)
+{
+    MpegEncContext *m = dst->priv_data;
+    const MpegEncContext *m1 = src->priv_data;
+
+    m->quarter_sample = m1->quarter_sample;
+    m->divx_packed    = m1->divx_packed;
+
+    return 0;
+}
 #endif
 
 static av_cold void mpeg4_init_static(void)
@@ -3585,6 +3597,7 @@ AVCodec ff_mpeg4_decoder = {
     .pix_fmts = ff_h263_hwaccel_pixfmt_list_420,
     .profiles = NULL_IF_CONFIG_SMALL(ff_mpeg4_video_profiles),
     .update_thread_context = ONLY_IF_THREADS_ENABLED(mpeg4_update_thread_context),
+    .update_thread_context_for_user = ONLY_IF_THREADS_ENABLED(mpeg4_update_thread_context_for_user),
     .priv_class = &mpeg4_class,
     .hw_configs = (const AVCodecHWConfigInternal *const []) {
 #if CONFIG_MPEG4_NVDEC_HWACCEL
-- 
2.31.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] aarch64: h264pred: Optimize the inner loop of existing 8 bit functions
Apr 12, 2021, 10:07 by mar...@martin.st:

> Move the loop counter decrement further from the branch instruction,
> this hides the latency of the decrement.
>
> In loops that first load, then store (the horizontal prediction cases),
> do the decrement after the load (where the next instruction would
> stall a bit anyway, waiting for the result of the load).
>
> In loops that store twice using the same destination register,
> also do the decrement between the two stores (as the second store
> would need to wait for the updated destination register from the
> first instruction).
>
> In loops that store twice to two different destination registers,
> do the decrement before both stores, to do it as soon before the
> branch as possible.
>
> This gives minor (1-2 cycle) speedups in most cases (modulo measurement
> noise), but the horizontal prediction functions get a rather notable
> speedup on the Cortex A53.
>

LGTM
[FFmpeg-devel] [PATCH] libavutil/cpu: Fix definition of _GNU_SOURCE so it occurs before other includes
From: Kevin Wheatley

This fix moves the potential definition of _GNU_SOURCE ahead of any includes
of system header files, as required by the documentation:

https://www.gnu.org/software/libc/manual/html_node/Feature-Test-Macros.html

This corrects the availability of the CPU_COUNT macro, so that
sched_getaffinity() is called on Linux systems. That in turn returns the
correct number of CPUs when running under containers and in other cases where
processor affinity has been set up before running FFmpeg.

Signed-off-by: Kevin J Wheatley
---
 libavutil/cpu.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/libavutil/cpu.c b/libavutil/cpu.c
index 8e3576a..1496c5d 100644
--- a/libavutil/cpu.c
+++ b/libavutil/cpu.c
@@ -16,6 +16,15 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
+#include "config.h"
+
+#if HAVE_SCHED_GETAFFINITY
+#ifndef _GNU_SOURCE
+# define _GNU_SOURCE
+#endif
+#include <sched.h>
+#endif
+
 #include <stddef.h>
 #include <stdint.h>
 #include <stdatomic.h>
@@ -23,16 +32,9 @@
 #include "attributes.h"
 #include "cpu.h"
 #include "cpu_internal.h"
-#include "config.h"
 #include "opt.h"
 #include "common.h"
 
-#if HAVE_SCHED_GETAFFINITY
-#ifndef _GNU_SOURCE
-# define _GNU_SOURCE
-#endif
-#include <sched.h>
-#endif
 #if HAVE_GETPROCESSAFFINITYMASK || HAVE_WINRT
 #include <windows.h>
 #endif
-- 
1.8.5.6
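For readers unfamiliar with the ordering requirement, here is a minimal standalone sketch, assuming Linux with glibc. It is not code from the patch: usable_cpu_count() is a hypothetical helper name. The point it illustrates is that _GNU_SOURCE has to be defined before the first system header is included, otherwise CPU_COUNT and sched_getaffinity() may not be declared.

```c
/* Must precede every system header; defining it later has no effect
 * once the feature-test macros have been evaluated. */
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: number of CPUs the calling thread may run on,
 * or -1 if the affinity mask cannot be read. */
static int usable_cpu_count(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Inside a container or under taskset, this count reflects the affinity mask rather than the number of CPUs installed in the machine, which is the behaviour the patch description refers to.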
[FFmpeg-devel] [PATCH] aarch64: h264pred: Optimize the inner loop of existing 8 bit functions
Move the loop counter decrement further from the branch instruction,
this hides the latency of the decrement.

In loops that first load, then store (the horizontal prediction cases),
do the decrement after the load (where the next instruction would
stall a bit anyway, waiting for the result of the load).

In loops that store twice using the same destination register,
also do the decrement between the two stores (as the second store
would need to wait for the updated destination register from the
first instruction).

In loops that store twice to two different destination registers,
do the decrement before both stores, to do it as soon before the
branch as possible.

This gives minor (1-2 cycle) speedups in most cases (modulo measurement
noise), but the horizontal prediction functions get a rather notable
speedup on the Cortex A53.

Before:                     Cortex A53     A72     A73
pred8x8_dc_8_neon:                60.7    46.2    39.2
pred8x8_dc_128_8_neon:            30.7    18.0    14.0
pred8x8_horizontal_8_neon:        42.2    29.2    18.5
pred8x8_left_dc_8_neon:           52.7    36.2    32.2
pred8x8_mad_cow_dc_0l0_8_neon:    48.2    27.7    25.7
pred8x8_mad_cow_dc_0lt_8_neon:    52.5    33.2    34.7
pred8x8_mad_cow_dc_l0t_8_neon:    52.5    31.7    33.2
pred8x8_mad_cow_dc_l00_8_neon:    43.2    27.0    25.5
pred8x8_plane_8_neon:            112.2    86.2    88.2
pred8x8_top_dc_8_neon:            40.7    23.0    21.2
pred8x8_vertical_8_neon:          27.2    15.5    14.0
pred16x16_dc_8_neon:              91.0    73.2    70.5
pred16x16_dc_128_8_neon:          43.0    34.7    30.7
pred16x16_horizontal_8_neon:      86.0    49.7    44.7
pred16x16_left_dc_8_neon:         87.0    67.2    67.5
pred16x16_plane_8_neon:          236.0   175.7   173.0
pred16x16_top_dc_8_neon:          53.2    39.0    41.7
pred16x16_vertical_8_neon:        41.7    29.7    31.0

After:
pred8x8_dc_8_neon:                59.0    46.7    42.5
pred8x8_dc_128_8_neon:            28.2    18.0    14.0
pred8x8_horizontal_8_neon:        34.2    29.2    18.5
pred8x8_left_dc_8_neon:           51.0    38.2    32.7
pred8x8_mad_cow_dc_0l0_8_neon:    46.7    28.2    26.2
pred8x8_mad_cow_dc_0lt_8_neon:    55.2    33.7    37.5
pred8x8_mad_cow_dc_l0t_8_neon:    51.2    31.7    37.2
pred8x8_mad_cow_dc_l00_8_neon:    41.7    27.5    26.0
pred8x8_plane_8_neon:            111.5    86.5    89.5
pred8x8_top_dc_8_neon:            39.0    23.2    21.0
pred8x8_vertical_8_neon:          27.2    16.0    14.0
pred16x16_dc_8_neon:              85.0    70.2    70.5
pred16x16_dc_128_8_neon:          42.0    30.0    30.7
pred16x16_horizontal_8_neon:      66.5    49.5    42.5
pred16x16_left_dc_8_neon:         81.0    66.5    67.5
pred16x16_plane_8_neon:          235.0   175.7   173.0
pred16x16_top_dc_8_neon:          52.0    39.0    41.7
pred16x16_vertical_8_neon:        40.2    33.2    31.0

Despite this, a number of these functions still are slower than
what e.g. GCC 7 generates - this shows the relative speedup of the
neon codepaths over the compiler generated ones:

                            Cortex A53     A72     A73
pred8x8_dc_8_neon:                0.86    0.65    1.04
pred8x8_dc_128_8_neon:            0.59    0.44    0.62
pred8x8_horizontal_8_neon:        1.51    0.58    1.30
pred8x8_left_dc_8_neon:           0.72    0.56    0.89
pred8x8_mad_cow_dc_0l0_8_neon:    0.93    0.93    1.37
pred8x8_mad_cow_dc_0lt_8_neon:    1.37    1.41    1.68
pred8x8_mad_cow_dc_l0t_8_neon:    1.21    1.17    1.32
pred8x8_mad_cow_dc_l00_8_neon:    1.24    1.19    1.60
pred8x8_plane_8_neon:             3.36    3.58    3.76
pred8x8_top_dc_8_neon:            0.97    0.99    1.43
pred8x8_vertical_8_neon:          0.86    0.78    1.18
pred16x16_dc_8_neon:              1.20    1.06    1.49
pred16x16_dc_128_8_neon:          0.83    0.95    0.99
pred16x16_horizontal_8_neon:      1.78    0.96    1.59
pred16x16_left_dc_8_neon:         1.06    0.96    1.32
pred16x16_plane_8_neon:           5.78    6.49    7.19
pred16x16_top_dc_8_neon:          1.48    1.53    1.94
pred16x16_vertical_8_neon:        1.39    1.34    1.98

In particular, on Cortex A72, many of these functions are slower
than the compiler generated code, while they're more beneficial on
e.g. the Cortex A73.
---
 libavcodec/aarch64/h264pred_neon.S | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/libavcodec/aarch64/h264pred_neon.S b/libavcodec/aarch64/h264pred_neon.S
index 213b40b3e7..6fec33cf6a 100644
--- a/libavcodec/aarch64/h264pred_neon.S
+++ b/libavcodec/aarch64/h264pred_neon.S
@@ -81,8 +81,8 @@ function ff_pred16x16_dc_neon, export=1
 .L_pred16x16_dc_end:
         mov             w3,  #8
 6:      st1             {v0.16b}, [x0], x1
-        st1             {v0.16b}, [x0], x1
         subs            w3,  w3,  #1
+        st1             {v0.16b}, [x0], x1
         b.ne            6b
         ret
 endfunc
@@ -91,8 +91,8 @@ function ff_pred16x16_hor_neon, export=1
         sub             x2,  x0,  #1
         mov             w3,  #16
Re: [FFmpeg-devel] [PATCH 2/2] lavc/qsvdec: export AVFilmGrainParams side data
Hi Mark / Zhong,

Could you please have a look at this patch when you get some time?

Thanks
Haihao

> When AV_CODEC_EXPORT_DATA_FILM_GRAIN is present, AV1 decoder should
> disable film grain application and export the corresponding side data
> ---
>  libavcodec/qsv_internal.h |  3 ++
>  libavcodec/qsvdec.c       | 88 +++
>  2 files changed, 91 insertions(+)
>
> diff --git a/libavcodec/qsv_internal.h b/libavcodec/qsv_internal.h
> index 1d94d429e8..754581087d 100644
> --- a/libavcodec/qsv_internal.h
> +++ b/libavcodec/qsv_internal.h
> @@ -76,6 +76,9 @@ typedef struct QSVFrame {
>      mfxFrameSurface1 surface;
>      mfxEncodeCtrl enc_ctrl;
>      mfxExtDecodedFrameInfo dec_info;
> +#if QSV_VERSION_ATLEAST(1, 34)
> +    mfxExtAV1FilmGrainParam av1_film_grain_param;
> +#endif
>      mfxExtBuffer *ext_param[QSV_MAX_FRAME_EXT_PARAMS];
>      int num_ext_params;
>
> diff --git a/libavcodec/qsvdec.c b/libavcodec/qsvdec.c
> index 55cf9f35c5..e34441fc0b 100644
> --- a/libavcodec/qsvdec.c
> +++ b/libavcodec/qsvdec.c
> @@ -38,6 +38,7 @@
>  #include "libavutil/pixfmt.h"
>  #include "libavutil/time.h"
>  #include "libavutil/imgutils.h"
> +#include "libavutil/film_grain_params.h"
>
>  #include "avcodec.h"
>  #include "internal.h"
> @@ -334,6 +335,11 @@ static int qsv_decode_header(AVCodecContext *avctx, QSVContext *q,
>          return ff_qsv_print_error(avctx, ret,
>                  "Error decoding stream header");
>
> +#if QSV_VERSION_ATLEAST(1, 34)
> +    if (avctx->codec_id == AV_CODEC_ID_AV1)
> +        param->mfx.FilmGrain = (avctx->export_side_data & AV_CODEC_EXPORT_DATA_FILM_GRAIN) ? 0 : param->mfx.FilmGrain;
> +#endif
> +
>      return 0;
>  }
>
> @@ -373,6 +379,12 @@ static int alloc_frame(AVCodecContext *avctx, QSVContext *q, QSVFrame *frame)
>      frame->dec_info.Header.BufferId = MFX_EXTBUFF_DECODED_FRAME_INFO;
>      frame->dec_info.Header.BufferSz = sizeof(frame->dec_info);
>      ff_qsv_frame_add_ext_param(avctx, frame, (mfxExtBuffer *)&frame->dec_info);
> +#if QSV_VERSION_ATLEAST(1, 34)
> +    frame->av1_film_grain_param.Header.BufferId = MFX_EXTBUFF_AV1_FILM_GRAIN_PARAM;
> +    frame->av1_film_grain_param.Header.BufferSz = sizeof(frame->av1_film_grain_param);
> +    frame->av1_film_grain_param.FilmGrainFlags = 0;
> +    ff_qsv_frame_add_ext_param(avctx, frame, (mfxExtBuffer *)&frame->av1_film_grain_param);
> +#endif
>
>      frame->used = 1;
>
> @@ -443,6 +455,73 @@ static QSVFrame *find_frame(QSVContext *q, mfxFrameSurface1 *surf)
>      return NULL;
>  }
>
> +#if QSV_VERSION_ATLEAST(1, 34)
> +static int qsv_export_film_grain(AVCodecContext *avctx, mfxExtAV1FilmGrainParam *ext_param, AVFrame *frame)
> +{
> +    AVFilmGrainParams *fgp;
> +    AVFilmGrainAOMParams *aom;
> +    int i;
> +
> +    if (!(ext_param->FilmGrainFlags & MFX_FILM_GRAIN_APPLY))
> +        return 0;
> +
> +    fgp = av_film_grain_params_create_side_data(frame);
> +
> +    if (!fgp)
> +        return AVERROR(ENOMEM);
> +
> +    fgp->type = AV_FILM_GRAIN_PARAMS_AV1;
> +    fgp->seed = ext_param->GrainSeed;
> +    aom = &fgp->codec.aom;
> +
> +    aom->chroma_scaling_from_luma = !!(ext_param->FilmGrainFlags & MFX_FILM_GRAIN_CHROMA_SCALING_FROM_LUMA);
> +    aom->scaling_shift = ext_param->GrainScalingMinus8 + 8;
> +    aom->ar_coeff_lag = ext_param->ArCoeffLag;
> +    aom->ar_coeff_shift = ext_param->ArCoeffShiftMinus6 + 6;
> +    aom->grain_scale_shift = ext_param->GrainScaleShift;
> +    aom->overlap_flag = !!(ext_param->FilmGrainFlags & MFX_FILM_GRAIN_OVERLAP);
> +    aom->limit_output_range = !!(ext_param->FilmGrainFlags & MFX_FILM_GRAIN_CLIP_TO_RESTRICTED_RANGE);
> +
> +    aom->num_y_points = ext_param->NumYPoints;
> +
> +    for (i = 0; i < aom->num_y_points; i++) {
> +        aom->y_points[i][0] = ext_param->PointY[i].Value;
> +        aom->y_points[i][1] = ext_param->PointY[i].Scaling;
> +    }
> +
> +    aom->num_uv_points[0] = ext_param->NumCbPoints;
> +
> +    for (i = 0; i < aom->num_uv_points[0]; i++) {
> +        aom->uv_points[0][i][0] = ext_param->PointCb[i].Value;
> +        aom->uv_points[0][i][1] = ext_param->PointCb[i].Scaling;
> +    }
> +
> +    aom->num_uv_points[1] = ext_param->NumCrPoints;
> +
> +    for (i = 0; i < aom->num_uv_points[1]; i++) {
> +        aom->uv_points[1][i][0] = ext_param->PointCr[i].Value;
> +        aom->uv_points[1][i][1] = ext_param->PointCr[i].Scaling;
> +    }
> +
> +    for (i = 0; i < 24; i++)
> +        aom->ar_coeffs_y[i] = ext_param->ArCoeffsYPlus128[i] - 128;
> +
> +    for (i = 0; i < 25; i++) {
> +        aom->ar_coeffs_uv[0][i] = ext_param->ArCoeffsCbPlus128[i] - 128;
> +        aom->ar_coeffs_uv[1][i] = ext_param->ArCoeffsCrPlus128[i] - 128;
> +    }
> +
> +    aom->uv_mult[0] = ext_param->CbMult;
> +    aom->uv_mult[1] = ext_param->CrMult;
> +    aom->uv_mult_luma[0] = ext_param->CbLumaMult;
> +    aom->uv_mult_luma[1] = ext_param->CrLumaMult;
> +    aom->uv_offset[0] = ext_param->CbOffset;
> +
Re: [FFmpeg-devel] [PATCH 1/3] lavc/qsv: apply AVCodecContext AVOption -threads to QSV
On Sat, 2021-04-10 at 13:32 +0800, Linjie Fu wrote:
> Hi Haihao,
>
> On Thu, Apr 8, 2021 at 3:10 PM Haihao Xiang wrote:
> >
> > By default the SDK creates a thread for each CPU when creating a mfx
> > session for decoding / encoding, which results in CPU overhead on a
> > multi CPU system. Actually creating 2 threads is a better choice for
> > most cases in practice.
> >
> > This patch allows user to specify the number of threads created for a
> > mfx session via option -threads. If the number is not specified, 2
> > threads will be created by default.
> >
> > Note the SDK requires at least 2 threads to avoid dead locks[1]
> >
> > [1]
> > https://github.com/Intel-Media-SDK/MediaSDK/blob/master/_studio/mfx_lib/scheduler/linux/src/mfx_scheduler_core_ischeduler.cpp#L90-L93
> > ---
>
> Optional choice for users to specify the thread number looks reasonable to me,
> and decreasing the CPU overhead makes sense for HW encoding pipeline.
>
> Also curious about what's the tradeoff of decreasing the thread number to 2.
> Would the performance or something else drop?

Thanks for the comment. MSDK threads are used to execute MSDK tasks. For a hw
decoding / encoding pipeline these tasks are very light, so a few threads are
enough for them. I didn't see any performance drop in my testing after
applying this patch.

Regards
Haihao

>
> - linjie