[FFmpeg-cvslog] [ffmpeg-web] branch master updated. c1acb1b web/security: Add more missing CVE#s
The branch, master has been updated via c1acb1b9bd2551a147fd422e96ed456da810aef3 (commit) from a4b40b1f993070377e98759e6db0a4d08a9649c5 (commit) - Log - commit c1acb1b9bd2551a147fd422e96ed456da810aef3 Author: Michael Niedermayer AuthorDate: Thu Nov 24 21:38:51 2022 +0100 Commit: Michael Niedermayer CommitDate: Thu Nov 24 21:38:51 2022 +0100 web/security: Add more missing CVE#s Signed-off-by: Michael Niedermayer diff --git a/src/security b/src/security index 270c455..aae87ee 100644 --- a/src/security +++ b/src/security @@ -572,6 +572,17 @@ CVE-2020-22046, 02fd294a333baaa55501eb0a26b86c99a80e4569 / 097c917c147661f5378da CVE-2020-22048, ee981f7ceb2c20dbfc5a2f5f27b0c44032eac798 / fddef964e8aa4a2c123e470db1436a082ff6bcf3, ticket/8303 +3.4.10 + +Fixes following vulnerabilities: + + +CVE-2020-20891, b239ccff7db0d418a74adcebfb1f2304f9a2f1f0 / 64a805883d7223c868a683f0030837d859edd2ab, ticket/8282 +CVE-2020-20892, 32a384519a57ad850789636c4c686091a53ce217 / 19587c9332f5be4f6bc6d7b2b8ef3fd21dfeaa01, ticket/8265 +CVE-2020-20896, b8197738d27f21583d9f83d7fa8c978d3a47af85 / dd01947397b98e94c3f2a79d5820aaf4594f4d3b, ticket/8273 +CVE-2020-20902, 04240e1d09e67c6e92189a96aeab96ef7428d942 / 2c78a76cb0443f8a12a5eadc3b58373aa2f4ab22, ticket/8176 + + 3.4.9 Fixes following vulnerabilities: @@ -587,6 +598,7 @@ CVE-2020-35965, 00115573e3030eff57847e1045ec18f0da5adb5c / 3e5959b3457f7f1856d99 CVE-2021-38114, e61b25e2557394e640a5aae901473785a4b23db5 / 7150f9575671f898382c370acae35f9087a30ba1 CVE-2021-38171, bc9e0b6cd2839acbac8da3232d715eb66857e453 / 9ffa49496d1aae4cbbb387aac28a9e061a6ab0a6 CVE-2021-38291, a4a3fd814aac900175ec4a2811cb5bf98c1ddad3 / e01d306c647b5827102260b885faa223b646d2d1, ticket/9312 +CVE-2020-23906, d46b698478f11ab85135b3cf0a7944c4dd62e37c / ec59dc73f0cc8930bf5dae389cd76d049d537ca7, ticket/8782 3.4.8 @@ -607,6 +619,10 @@ CVE-2019-12730, 59ac4182583e4791a7f98b79099916fd96beedfd / ed188f6dcdf0935c939ed CVE-2019-13390, cfa7c079f72b65bfe038af84d95d384a609d4f0a / 
aef24efb0c1e65097ab77a4bf9264189bdf3ace3 CVE-2019-17542, 4aaf644892843e3c68f4761725ab9435745f015c / 02f909dc24b1f05cfbba75077c7707b905e63cd2 CVE-2019-17539, c3b7afa4e917d748f0c3f8237b04ebdd99bdcacb / 8df6884832ec413cf032dfaa45c23b1c7876670c +CVE-2020-20448, 0c9ad1c746e3a8ccb7c6f292e10c8017c0a9dc3b / 8802e329c8317ca5ceb929df48a23eb0f9e852b2, ticket/7990 +CVE-2020-20448, e0d167051e93bad55a4c009399de1545aa07eeb5 / 55279d699fa64d8eb1185d8db04ab4ed92e8dea2, ticket/7990 +CVE-2020-20902, 4b4c26ca09b525168339df8697eb7f6bfe20345f / 0c61661a2cbe1b8b284c80ada1c2fdddf4992cad, ticket/8176 +CVE-2020-20902, f628f38f6e43c140167005593b447c47fd731a44 / 5f0acc5064ed501cb40d4aaccae2b3ce5c4552fd, ticket/8176 3.4.6 @@ -805,24 +821,6 @@ CVE-2017-7866, e371f031b942d73e02c090170975561fabd5c264 FFmpeg 3.2 -3.2.18 - -Fixes following vulnerabilities: - - -CVE-2020-20451, 0c949b6ebfcee1b23a5fe33a3bc8af167956ea1e / 21265f42ecb265debe9fec1dbfd0cb7de5a8aefb, ticket/8094 -CVE-2020-21041, 3d350ec7281cd0d357231fc2c99f44ebe425d586 / 5d9f44da460f781a1604d537d0555b78e29438ba, ticket/7989 -CVE-2020-22016, 02161c6ed194ddfa00fd2af7684a8099852bc3ce / 58aa0ed8f10753ee90f4a4a1f4f3da803cf7c145, ticket/8183 -CVE-2020-22020, 93ad1e4a9f17ac5145c2957bb270a454c8a0cefd / ce5274c1385d55892a692998923802023526b765, ticket/8239 -CVE-2020-22022, ea5d154845dfc1c6e550d197d7da79aee87c9f66 / 07050d7bdc32d82e53ee5bb727f5882323d00dba, ticket/8264 -CVE-2020-22025, ff73a50456b93e8d555603c093a3ebd193d0b097 / ccf4ab8c9aca0aee66bcc2914031a9c97ac0eeb8, ticket/8260 -CVE-2020-22031, 1a4d18820d551bedcfa03b7e8ca72df87d4b5cfa / 0e68e8c93f9068596484ec8ba725586860e06fc8, ticket/8243 -CVE-2020-22032, a19796a15ee0ab82e2b70d253d328111e9f916e0 / de598f82f8c3f8000e1948548e8088148e2b1f44, ticket/8275 -CVE-2020-22041, 4f566654e744c7810f4afdd91fe00fdd1ef46646 / 3488e0977c671568731afa12b811adce9d4d807f, ticket/8296 -CVE-2020-22044, 40dfd623632ed22bf3c98465ae3e68fcb1f31200 / 1d479300cbe0522c233b7d51148aea2b29bd29ad, ticket/8295 -CVE-2020-22046, 
1a541dc0c5e1279251c9ed4cd416005fcca6e748 / 097c917c147661f5378dae8fe3f7e46f43236426, ticket/8294 - - 3.2.18 Fixes following vulnerabilities: @@ -846,14 +844,29 @@ CVE-2020-22046, bbc9751da67286d27f379dbe3b52ee3b55b0503e / 097c917c147661f5378da CVE-2020-22048, 64d2e0b20066058cf1c6dc3c49adab6d18d66fcc / fddef964e8aa4a2c123e470db1436a082ff6bcf3, ticket/8303 +3.2.17 + +Fixes following vulnerabilities: + + +CVE-2020-20891, f8b4426c10aa65f4c04847a50ebfdcb8782a49b7 / 64a805883d7223c868a683f0030837d859edd2ab, ticket/8282 +CVE-2020-20892, 94e502e96b0870177e0af4c1e8718ac71475e374 / 19587c9332f5be4f6bc6d7b2b8ef3fd21dfeaa01, ticket/8265 +CVE-2020-20902, abf9627f70ed8467b1646d56205e61f965f11468 / 2c78a76cb0443f8a12a5eadc3b58373aa2f4ab22, ticket/8176 + + 3.2.16 Fixes following vulnerabilities: CVE-2019-17539,
[FFmpeg-cvslog] swscale/utils: Move functions to avoid forward declarations
ffmpeg | branch: master | Andreas Rheinhardt | Sat Nov 19 05:58:59 2022 +0100| [ff39dcb129806477e9a05c30dfdefb96f7fb0a25] | committer: Andreas Rheinhardt swscale/utils: Move functions to avoid forward declarations Signed-off-by: Andreas Rheinhardt > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=ff39dcb129806477e9a05c30dfdefb96f7fb0a25 --- libswscale/utils.c | 407 ++--- 1 file changed, 200 insertions(+), 207 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index fb788fc330..cdd89e4b58 100644 --- a/libswscale/utils.c +++ b/libswscale/utils.c @@ -59,13 +59,6 @@ #include "swscale.h" #include "swscale_internal.h" -static SwsVector *sws_getIdentityVec(void); -static void sws_addVec(SwsVector *a, SwsVector *b); -static void sws_shiftVec(SwsVector *a, int shift); -static void sws_printVec2(SwsVector *a, AVClass *log_ctx, int log_level); - -static void handle_formats(SwsContext *c); - typedef struct FormatEntry { uint8_t is_supported_in :1; uint8_t is_supported_out:1; @@ -926,6 +919,74 @@ static void fill_xyztables(struct SwsContext *c) } } +static int handle_jpeg(enum AVPixelFormat *format) +{ +switch (*format) { +case AV_PIX_FMT_YUVJ420P: +*format = AV_PIX_FMT_YUV420P; +return 1; +case AV_PIX_FMT_YUVJ411P: +*format = AV_PIX_FMT_YUV411P; +return 1; +case AV_PIX_FMT_YUVJ422P: +*format = AV_PIX_FMT_YUV422P; +return 1; +case AV_PIX_FMT_YUVJ444P: +*format = AV_PIX_FMT_YUV444P; +return 1; +case AV_PIX_FMT_YUVJ440P: +*format = AV_PIX_FMT_YUV440P; +return 1; +case AV_PIX_FMT_GRAY8: +case AV_PIX_FMT_YA8: +case AV_PIX_FMT_GRAY9LE: +case AV_PIX_FMT_GRAY9BE: +case AV_PIX_FMT_GRAY10LE: +case AV_PIX_FMT_GRAY10BE: +case AV_PIX_FMT_GRAY12LE: +case AV_PIX_FMT_GRAY12BE: +case AV_PIX_FMT_GRAY14LE: +case AV_PIX_FMT_GRAY14BE: +case AV_PIX_FMT_GRAY16LE: +case AV_PIX_FMT_GRAY16BE: +case AV_PIX_FMT_YA16BE: +case AV_PIX_FMT_YA16LE: +return 1; +default: +return 0; +} +} + +static int handle_0alpha(enum AVPixelFormat *format) +{ +switch (*format) { +case 
AV_PIX_FMT_0BGR: *format = AV_PIX_FMT_ABGR ; return 1; +case AV_PIX_FMT_BGR0: *format = AV_PIX_FMT_BGRA ; return 4; +case AV_PIX_FMT_0RGB: *format = AV_PIX_FMT_ARGB ; return 1; +case AV_PIX_FMT_RGB0: *format = AV_PIX_FMT_RGBA ; return 4; +default: return 0; +} +} + +static int handle_xyz(enum AVPixelFormat *format) +{ +switch (*format) { +case AV_PIX_FMT_XYZ12BE : *format = AV_PIX_FMT_RGB48BE; return 1; +case AV_PIX_FMT_XYZ12LE : *format = AV_PIX_FMT_RGB48LE; return 1; +default:return 0; +} +} + +static void handle_formats(SwsContext *c) +{ +c->src0Alpha |= handle_0alpha(&c->srcFormat); +c->dst0Alpha |= handle_0alpha(&c->dstFormat); +c->srcXYZ|= handle_xyz(&c->srcFormat); +c->dstXYZ|= handle_xyz(&c->dstFormat); +if (c->srcXYZ || c->dstXYZ) +fill_xyztables(c); +} + static int range_override_needed(enum AVPixelFormat format) { return !isYUV(format) && !isGray(format); @@ -1112,74 +1173,6 @@ int sws_getColorspaceDetails(struct SwsContext *c, int **inv_table, return 0; } -static int handle_jpeg(enum AVPixelFormat *format) -{ -switch (*format) { -case AV_PIX_FMT_YUVJ420P: -*format = AV_PIX_FMT_YUV420P; -return 1; -case AV_PIX_FMT_YUVJ411P: -*format = AV_PIX_FMT_YUV411P; -return 1; -case AV_PIX_FMT_YUVJ422P: -*format = AV_PIX_FMT_YUV422P; -return 1; -case AV_PIX_FMT_YUVJ444P: -*format = AV_PIX_FMT_YUV444P; -return 1; -case AV_PIX_FMT_YUVJ440P: -*format = AV_PIX_FMT_YUV440P; -return 1; -case AV_PIX_FMT_GRAY8: -case AV_PIX_FMT_YA8: -case AV_PIX_FMT_GRAY9LE: -case AV_PIX_FMT_GRAY9BE: -case AV_PIX_FMT_GRAY10LE: -case AV_PIX_FMT_GRAY10BE: -case AV_PIX_FMT_GRAY12LE: -case AV_PIX_FMT_GRAY12BE: -case AV_PIX_FMT_GRAY14LE: -case AV_PIX_FMT_GRAY14BE: -case AV_PIX_FMT_GRAY16LE: -case AV_PIX_FMT_GRAY16BE: -case AV_PIX_FMT_YA16BE: -case AV_PIX_FMT_YA16LE: -return 1; -default: -return 0; -} -} - -static int handle_0alpha(enum AVPixelFormat *format) -{ -switch (*format) { -case AV_PIX_FMT_0BGR: *format = AV_PIX_FMT_ABGR ; return 1; -case AV_PIX_FMT_BGR0: *format = AV_PIX_FMT_BGRA ; return 4; -case 
AV_PIX_FMT_0RGB: *format = AV_PIX_FMT_ARGB ; return 1; -case AV_PIX_FMT_RGB0: *format = AV_PIX_FMT_RGBA ; return 4; -default: return 0; -} -} - -static int handle_xyz(enum AVPixelFormat *format) -{ -switch (*format) { -case
[FFmpeg-cvslog] swscale/utils: Fix indentation
ffmpeg | branch: master | Andreas Rheinhardt | Mon Nov 21 00:50:29 2022 +0100| [1ff9c07fa696443d2d243e45d5794c8b19946a1b] | committer: Andreas Rheinhardt swscale/utils: Fix indentation Forgotten after c1eb3e7fecdc270e03a700d61ef941600a6af491. Signed-off-by: Andreas Rheinhardt > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=1ff9c07fa696443d2d243e45d5794c8b19946a1b --- libswscale/utils.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/libswscale/utils.c b/libswscale/utils.c index 5a728afd89..90734f66ef 100644 --- a/libswscale/utils.c +++ b/libswscale/utils.c @@ -1307,16 +1307,16 @@ static av_cold int sws_init_single_context(SwsContext *c, SwsFilter *srcFilter, if (!(unscaled && sws_isSupportedEndiannessConversion(srcFormat) && av_pix_fmt_swap_endianness(srcFormat) == dstFormat)) { -if (!sws_isSupportedInput(srcFormat)) { -av_log(c, AV_LOG_ERROR, "%s is not supported as input pixel format\n", - av_get_pix_fmt_name(srcFormat)); -return AVERROR(EINVAL); -} -if (!sws_isSupportedOutput(dstFormat)) { -av_log(c, AV_LOG_ERROR, "%s is not supported as output pixel format\n", - av_get_pix_fmt_name(dstFormat)); -return AVERROR(EINVAL); -} +if (!sws_isSupportedInput(srcFormat)) { +av_log(c, AV_LOG_ERROR, "%s is not supported as input pixel format\n", + av_get_pix_fmt_name(srcFormat)); +return AVERROR(EINVAL); +} +if (!sws_isSupportedOutput(dstFormat)) { +av_log(c, AV_LOG_ERROR, "%s is not supported as output pixel format\n", + av_get_pix_fmt_name(dstFormat)); +return AVERROR(EINVAL); +} } av_assert2(desc_src && desc_dst); ___ ffmpeg-cvslog mailing list ffmpeg-cvslog@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-cvslog To unsubscribe, visit link above, or email ffmpeg-cvslog-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-cvslog] swscale/utils: Derive range from YUVJ-pix-fmt only once
ffmpeg | branch: master | Andreas Rheinhardt | Mon Nov 21 00:03:01 2022 +0100| [b2d1a258162a619187bbb0a72f7e8eb94f91cfa4] | committer: Andreas Rheinhardt

swscale/utils: Derive range from YUVJ-pix-fmt only once

Currently, it is done once per slice-thread, leading to one warning
per slice-thread in case a YUVJ pixel format has been originally used.
This also fixes the anomaly that said parameters are only updated
for the user-facing context (whose values are retrievable via
av_opt_get()) if slice-threading is not in use.

Fixes ticket #9860.

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=b2d1a258162a619187bbb0a72f7e8eb94f91cfa4
---

 libswscale/utils.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/libswscale/utils.c b/libswscale/utils.c
index cdd89e4b58..5a728afd89 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -1277,8 +1277,7 @@ static av_cold int sws_init_single_context(SwsContext *c, SwsFilter *srcFilter,
     int dstH = c->dstH;
     int dst_stride = FFALIGN(dstW * sizeof(int16_t) + 66, 16);
     int flags, cpu_flags;
-    enum AVPixelFormat srcFormat = c->srcFormat;
-    enum AVPixelFormat dstFormat = c->dstFormat;
+    enum AVPixelFormat srcFormat, dstFormat;
     const AVPixFmtDescriptor *desc_src;
     const AVPixFmtDescriptor *desc_dst;
     int ret = 0;
@@ -1291,12 +1290,6 @@ static av_cold int sws_init_single_context(SwsContext *c, SwsFilter *srcFilter,
 
     unscaled = (srcW == dstW && srcH == dstH);
 
-    c->srcRange |= handle_jpeg(&c->srcFormat);
-    c->dstRange |= handle_jpeg(&c->dstFormat);
-
-    if(srcFormat!=c->srcFormat || dstFormat!=c->dstFormat)
-        av_log(c, AV_LOG_WARNING, "deprecated pixel format used, make sure you did set range correctly\n");
-
     if (!c->contrast && !c->saturation && !c->dstFormatBpp)
         sws_setColorspaceDetails(c, ff_yuv2rgb_coeffs[SWS_CS_DEFAULT], c->srcRange,
                                  ff_yuv2rgb_coeffs[SWS_CS_DEFAULT],
@@ -2034,6 +2027,7 @@ av_cold int sws_init_context(SwsContext *c, SwsFilter *srcFilter,
                              SwsFilter *dstFilter)
 {
     static AVOnce rgb2rgb_once = AV_ONCE_INIT;
+    enum AVPixelFormat src_format, dst_format;
     int ret;
 
     c->frame_src = av_frame_alloc();
@@ -2044,6 +2038,14 @@ av_cold int sws_init_context(SwsContext *c, SwsFilter *srcFilter,
     if (ff_thread_once(&rgb2rgb_once, ff_sws_rgb2rgb_init) != 0)
         return AVERROR_UNKNOWN;
 
+    src_format = c->srcFormat;
+    dst_format = c->dstFormat;
+    c->srcRange |= handle_jpeg(&c->srcFormat);
+    c->dstRange |= handle_jpeg(&c->dstFormat);
+
+    if (src_format != c->srcFormat || dst_format != c->dstFormat)
+        av_log(c, AV_LOG_WARNING, "deprecated pixel format used, make sure you did set range correctly\n");
+
     if (c->nb_threads != 1) {
         ret = context_init_threaded(c, srcFilter, dstFilter);
         if (ret < 0 || c->nb_threads > 1)
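The effect described in the commit message can be illustrated with a small standalone sketch. It does not use libswscale: Ctx, PixFmt, map_jpeg_format() and init_parent() are hypothetical names made up for this illustration, and the mapping covers a single format, whereas handle_jpeg() in the diff handles several YUVJ and gray formats. The way slices are filled in here is also a simplification, not the library's mechanism.

```c
#include <stdio.h>

/* Hypothetical stand-ins for the real libswscale types; only the shape matters. */
enum PixFmt { FMT_YUV420P, FMT_YUVJ420P, FMT_RGB24 };

typedef struct Ctx {
    enum PixFmt src_format;
    int         src_range;   /* 1 = full (JPEG) range */
} Ctx;

static int warnings_printed;

/* Map a deprecated full-range format to its regular counterpart.
 * Returns 1 if the format implied full range, 0 otherwise. */
static int map_jpeg_format(enum PixFmt *fmt)
{
    if (*fmt == FMT_YUVJ420P) {
        *fmt = FMT_YUV420P;
        return 1;
    }
    return 0;
}

/* Parent-level init: normalise once, warn once, then hand the
 * already-normalised values to every slice context. */
static void init_parent(Ctx *parent, Ctx *slices, int nb_slices)
{
    enum PixFmt orig = parent->src_format;

    parent->src_range |= map_jpeg_format(&parent->src_format);
    if (orig != parent->src_format) {
        fprintf(stderr, "deprecated pixel format used\n");
        warnings_printed++;
    }
    for (int i = 0; i < nb_slices; i++)
        slices[i] = *parent;
}
```

Because the mapping runs on the parent before any slice is set up, the warning count no longer depends on the number of slices, and the parent's own range field is updated in the threaded case as well.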
[FFmpeg-cvslog] swscale/utils: Avoid calling ff_thread_once() unnecessarily
ffmpeg | branch: master | Andreas Rheinhardt | Sat Nov 19 05:44:44 2022 +0100| [baccc1c5417f990ebfc1b6780e2dab255a72ee3c] | committer: Andreas Rheinhardt

swscale/utils: Avoid calling ff_thread_once() unnecessarily

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=baccc1c5417f990ebfc1b6780e2dab255a72ee3c
---

 libswscale/utils.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libswscale/utils.c b/libswscale/utils.c
index 053c6bb76b..fb788fc330 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -1340,13 +1340,10 @@ static av_cold int sws_init_single_context(SwsContext *c, SwsFilter *srcFilter,
     int ret = 0;
     enum AVPixelFormat tmpFmt;
     static const float float_mult = 1.0f / 255.0f;
-    static AVOnce rgb2rgb_once = AV_ONCE_INIT;
 
     cpu_flags = av_get_cpu_flags();
     flags = c->flags;
     emms_c();
-    if (ff_thread_once(&rgb2rgb_once, ff_sws_rgb2rgb_init) != 0)
-        return AVERROR_UNKNOWN;
 
     unscaled = (srcW == dstW && srcH == dstH);
 
@@ -2043,6 +2040,7 @@ fail: // FIXME replace things by appropriate error codes
 av_cold int sws_init_context(SwsContext *c, SwsFilter *srcFilter,
                              SwsFilter *dstFilter)
 {
+    static AVOnce rgb2rgb_once = AV_ONCE_INIT;
     int ret;
 
     c->frame_src = av_frame_alloc();
@@ -2050,6 +2048,9 @@ av_cold int sws_init_context(SwsContext *c, SwsFilter *srcFilter,
     if (!c->frame_src || !c->frame_dst)
         return AVERROR(ENOMEM);
 
+    if (ff_thread_once(&rgb2rgb_once, ff_sws_rgb2rgb_init) != 0)
+        return AVERROR_UNKNOWN;
+
     if (c->nb_threads != 1) {
         ret = context_init_threaded(c, srcFilter, dstFilter);
         if (ret < 0 || c->nb_threads > 1)
[FFmpeg-cvslog] swscale/utils: Don't allocate AVFrames for slice contexts
ffmpeg | branch: master | Andreas Rheinhardt | Sat Nov 19 05:42:29 2022 +0100| [8ee071122806724a00eecb6b1eff639890c4be48] | committer: Andreas Rheinhardt

swscale/utils: Don't allocate AVFrames for slice contexts

Only the parent context's AVFrames are ever used.

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=8ee071122806724a00eecb6b1eff639890c4be48
---

 libswscale/utils.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/libswscale/utils.c b/libswscale/utils.c
index c6fa07f752..053c6bb76b 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -1317,11 +1317,6 @@ static int context_init_threaded(SwsContext *c,
         }
     }
 
-    c->frame_src = av_frame_alloc();
-    c->frame_dst = av_frame_alloc();
-    if (!c->frame_src || !c->frame_dst)
-        return AVERROR(ENOMEM);
-
     return 0;
 }
 
@@ -1581,11 +1576,6 @@ static av_cold int sws_init_single_context(SwsContext *c, SwsFilter *srcFilter,
     if (!FF_ALLOCZ_TYPED_ARRAY(c->formatConvBuffer, FFALIGN(srcW * 2 + 78, 16) * 2))
         goto nomem;
 
-    c->frame_src = av_frame_alloc();
-    c->frame_dst = av_frame_alloc();
-    if (!c->frame_src || !c->frame_dst)
-        goto nomem;
-
     c->srcBpc = desc_src->comp[0].depth;
     if (c->srcBpc < 8)
         c->srcBpc = 8;
@@ -2055,6 +2045,11 @@ av_cold int sws_init_context(SwsContext *c, SwsFilter *srcFilter,
 {
     int ret;
 
+    c->frame_src = av_frame_alloc();
+    c->frame_dst = av_frame_alloc();
+    if (!c->frame_src || !c->frame_dst)
+        return AVERROR(ENOMEM);
+
     if (c->nb_threads != 1) {
         ret = context_init_threaded(c, srcFilter, dstFilter);
         if (ret < 0 || c->nb_threads > 1)
[FFmpeg-cvslog] swscale/utils: Factor initializing single slice context out
ffmpeg | branch: master | Andreas Rheinhardt | Sat Nov 19 04:58:45 2022 +0100| [64ed1d40df82949114ca5c4cbf33858ae94cc7f9] | committer: Andreas Rheinhardt

swscale/utils: Factor initializing single slice context out

Initializing slice threads currently uses the function
(sws_init_context()) that is also used for initializing
user-facing contexts with the only difference being that nb_threads
is set to one before initializing the slice contexts.

Yet sws_init_context() also initializes lots of stuff that is not
slice-dependent, i.e. (src|dst)Range. This currently only works
because the code sets these fields to the same values for all
slice contexts. This is not nice; even worse, it entails that
log messages are printed once per slice context (and therefore
fill the screen).

This commit lays the groundwork to fix this.

Signed-off-by: Andreas Rheinhardt

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=64ed1d40df82949114ca5c4cbf33858ae94cc7f9
---

 libswscale/utils.c | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/libswscale/utils.c b/libswscale/utils.c
index 85640a143f..c6fa07f752 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -1268,6 +1268,9 @@ static enum AVPixelFormat alphaless_fmt(enum AVPixelFormat fmt)
     }
 }
 
+static int sws_init_single_context(SwsContext *c, SwsFilter *srcFilter,
+                                   SwsFilter *dstFilter);
+
 static int context_init_threaded(SwsContext *c,
                                  SwsFilter *src_filter, SwsFilter *dst_filter)
 {
@@ -1301,7 +1304,7 @@ static int context_init_threaded(SwsContext *c,
 
         c->slice_ctx[i]->nb_threads = 1;
 
-        ret = sws_init_context(c->slice_ctx[i], src_filter, dst_filter);
+        ret = sws_init_single_context(c->slice_ctx[i], src_filter, dst_filter);
         if (ret < 0)
             return ret;
 
@@ -1322,8 +1325,8 @@ static int context_init_threaded(SwsContext *c,
     return 0;
 }
 
-av_cold int sws_init_context(SwsContext *c, SwsFilter *srcFilter,
-                             SwsFilter *dstFilter)
+static av_cold int sws_init_single_context(SwsContext *c, SwsFilter *srcFilter,
+                                           SwsFilter *dstFilter)
 {
     int i;
     int usesVFilter, usesHFilter;
@@ -1344,13 +1347,6 @@ av_cold int sws_init_context(SwsContext *c, SwsFilter *srcFilter,
     static const float float_mult = 1.0f / 255.0f;
     static AVOnce rgb2rgb_once = AV_ONCE_INIT;
 
-    if (c->nb_threads != 1) {
-        ret = context_init_threaded(c, srcFilter, dstFilter);
-        if (ret < 0 || c->nb_threads > 1)
-            return ret;
-        // threading disabled in this build, init as single-threaded
-    }
-
     cpu_flags = av_get_cpu_flags();
     flags = c->flags;
     emms_c();
@@ -2054,6 +2050,21 @@ fail: // FIXME replace things by appropriate error codes
     return ret;
 }
 
+av_cold int sws_init_context(SwsContext *c, SwsFilter *srcFilter,
+                             SwsFilter *dstFilter)
+{
+    int ret;
+
+    if (c->nb_threads != 1) {
+        ret = context_init_threaded(c, srcFilter, dstFilter);
+        if (ret < 0 || c->nb_threads > 1)
+            return ret;
+        // threading disabled in this build, init as single-threaded
+    }
+
+    return sws_init_single_context(c, srcFilter, dstFilter);
+}
+
 SwsContext *sws_alloc_set_opts(int srcW, int srcH, enum AVPixelFormat srcFormat,
                                int dstW, int dstH, enum AVPixelFormat dstFormat,
                                int flags, const double *param)
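The split this commit introduces can be sketched in isolation as follows. This is only an outline of the call structure under simplifying assumptions: Ctx, init_single(), init_threaded(), init_context() and the two counters are hypothetical names for this illustration, the "threading disabled in this build" fallback from the diff is left out, and the shared-setup counter merely marks the place where non-slice-dependent work could live once it is moved out of the per-slice path.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced context type. */
typedef struct Ctx {
    int         nb_threads;
    int         nb_slices;
    struct Ctx *slices;
    int         initialized;
} Ctx;

static int shared_setup_calls;  /* work done once per user-facing context */
static int single_inits;        /* work done once per (slice) context     */

/* Per-context initialisation; used for slices and for the
 * single-threaded case alike. */
static int init_single(Ctx *c)
{
    c->initialized = 1;
    single_inits++;
    return 0;
}

/* Initialise every slice context through the internal function,
 * never through the public entry point. */
static int init_threaded(Ctx *c)
{
    for (int i = 0; i < c->nb_slices; i++) {
        c->slices[i].nb_threads = 1;
        int ret = init_single(&c->slices[i]);
        if (ret < 0)
            return ret;
    }
    return 0;
}

/* Public entry point: slice-independent setup belongs here. */
static int init_context(Ctx *c)
{
    shared_setup_calls++;
    if (c->nb_threads != 1)
        return init_threaded(c);
    return init_single(c);
}
```

With this shape, anything placed in the public entry point runs once regardless of the slice count, which is the property the commit message is working towards.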
[FFmpeg-cvslog] [ffmpeg-web] branch master updated. a4b40b1 web/security: add missing CVE#s
The branch, master has been updated via a4b40b1f993070377e98759e6db0a4d08a9649c5 (commit) from 5c52853ae8867e2aad1a9f8256bfc0e00302e363 (commit) - Log - commit a4b40b1f993070377e98759e6db0a4d08a9649c5 Author: Michael Niedermayer AuthorDate: Thu Nov 24 17:59:01 2022 +0100 Commit: Michael Niedermayer CommitDate: Thu Nov 24 18:01:18 2022 +0100 web/security: add missing CVE#s diff --git a/src/security b/src/security index 44fa15e..270c455 100644 --- a/src/security +++ b/src/security @@ -11,8 +11,25 @@ Fixes following vulnerabilities: CVE-2022-2566, 6f53f0d09ea4c9c7f7354f018a87ef840315207d / c953baa084607dd1d84c3bfcce3cf6a87c3e6e05 +5.1 + +Fixes following vulnerabilities: + + +CVE-2022-1475, 757da974b21833529cc41bdcc9684c29660cdfa8, ticket/9651 + + + FFmpeg 5.0 +5.0.1 + +Fixes following vulnerabilities: + + +CVE-2022-1475, 95322e07673885ebcbb8fd54f30a9b8f17d5be6a / 757da974b21833529cc41bdcc9684c29660cdfa8, ticket/9651 + + 5.0 Fixes following vulnerabilities: @@ -32,6 +49,14 @@ CVE-2021-38291, e01d306c647b5827102260b885faa223b646d2d1 ticket/9312, FFmpeg 4.4 +4.4.2 + +Fixes following vulnerabilities: + + +CVE-2022-1475, e9e2ddbc6c78cc18b76093617f82c920e58a8d1f / 757da974b21833529cc41bdcc9684c29660cdfa8, ticket/9651 + + 4.4.1 Fixes following vulnerabilities: @@ -61,6 +86,7 @@ CVE-2020-20450, 5400e4a50c61e53e1bc50b3e77201649bbe9c510, ticket/7993 CVE-2020-21041, 5d9f44da460f781a1604d537d0555b78e29438ba, ticket/7989 CVE-2020-22038, 7c32e9cf93b712f8463573a59ed4e98fd10fa013, ticket/8285 CVE-2020-22042, 426c16d61a9b5056a157a1a2a057a4e4d13eef84, ticket/8267 +CVE-2020-23906, ec59dc73f0cc8930bf5dae389cd76d049d537ca7, ticket/8782 CVE-2020-24020, 584f396132aa19d21bb1e38ad9a5d428869290cb, ticket/8718 CVE-2021-30123, d6f293353c94c7ce200f6e0975ae3de49787f91f, ticket/8845, never affected a release CVE-2020-35964, 27a99e2c7d450fef15594671eef4465c8a166bd7 @@ -78,6 +104,13 @@ Fixes following vulnerabilities: CVE-2020-21041, 50cadf8dc52e94372a181dd60a527c55d1d155f5 / 
5d9f44da460f781a1604d537d0555b78e29438ba, ticket/7989 +4.3.4 + +Fixes following vulnerabilities: + + +CVE-2022-1475, fa2e4afe8d0a23fac37392ef6506cfc9841f8d3d / 757da974b21833529cc41bdcc9684c29660cdfa8, ticket/9651 + 4.3.3 @@ -113,6 +146,7 @@ Fixes following vulnerabilities: CVE-2020-13904, a3fdeb0c3a4ecabab2c2351b86fc92004526e9cc / b5e39880fb7269b1b3577cee288e06aa3dc1dfa2 CVE-2020-14212, dd273d359e45ab69398ac0dc41206d5f1a9371bf / 0b3bd001ac1745d9d008a2d195817df57d7d1d14 +CVE-2020-23906, be84216c53a4ed81573c82320e9c4a20e9b349d9 / ec59dc73f0cc8930bf5dae389cd76d049d537ca7, ticket/8782 4.3 @@ -128,6 +162,18 @@ CVE-2020-12284, 1812352d767ccf5431aa440123e2e260a4db2726 CVE-2020-20448, 55279d699fa64d8eb1185d8db04ab4ed92e8dea2 CVE-2020-20448, 8802e329c8317ca5ceb929df48a23eb0f9e852b2, ticket/7990 CVE-2020-20451, 21265f42ecb265debe9fec1dbfd0cb7de5a8aefb, ticket/8094 +CVE-2020-20891, 64a805883d7223c868a683f0030837d859edd2ab, ticket/8282 +CVE-2020-20892, 19587c9332f5be4f6bc6d7b2b8ef3fd21dfeaa01, ticket/8265 +CVE-2020-20896, dd01947397b98e94c3f2a79d5820aaf4594f4d3b, ticket/8273 +CVE-2020-20898, 99f8d32129dd233d4eb2efa44678a0bc44869f23, ticket/8263 +CVE-2021-38090, 99f8d32129dd233d4eb2efa44678a0bc44869f23, ticket/8263, duplicate CVE# +CVE-2021-38091, 99f8d32129dd233d4eb2efa44678a0bc44869f23, ticket/8263, duplicate CVE# +CVE-2021-38092, 99f8d32129dd233d4eb2efa44678a0bc44869f23, ticket/8263, duplicate CVE# +CVE-2021-38093, 99f8d32129dd233d4eb2efa44678a0bc44869f23, ticket/8263, duplicate CVE# +CVE-2021-38094, 99f8d32129dd233d4eb2efa44678a0bc44869f23, ticket/8263, duplicate CVE# +CVE-2020-20902, 0c61661a2cbe1b8b284c80ada1c2fdddf4992cad, ticket/8176 +CVE-2020-20902, 2c78a76cb0443f8a12a5eadc3b58373aa2f4ab22, ticket/8176 +CVE-2020-20902, 5f0acc5064ed501cb40d4aaccae2b3ce5c4552fd, ticket/8176 CVE-2020-22016, 58aa0ed8f10753ee90f4a4a1f4f3da803cf7c145, ticket/8183 CVE-2020-22017, d4d6b7b0355f3597cad3b8d12911790c73b5f96d, ticket/8309 CVE-2020-22020, ce5274c1385d55892a692998923802023526b765, 
ticket/8239 @@ -149,6 +195,7 @@ CVE-2020-22040, 1a0c584abc9709b1d11dbafef05d22e0937d7d19, ticket/8283 CVE-2020-22041, 3488e0977c671568731afa12b811adce9d4d807f, ticket/8296 CVE-2020-22043, b288a7eb3d963a175e177b6219c8271076ee8590, ticket/8284 CVE-2020-22044, 1d479300cbe0522c233b7d51148aea2b29bd29ad, ticket/8295 +CVE-2020-22046, 097c917c147661f5378dae8fe3f7e46f43236426, ticket/8294 FFmpeg 4.2 @@ -180,6 +227,9 @@ CVE-2020-22048, 7d4c2d90b3997542a2dece32a1234f3bc3629610 / fddef964e8aa4a2c123e4 Fixes following vulnerabilities: +CVE-2020-20891, 84fdfdf8595150c04b86febd1ef2eae3878c84b8 / 64a805883d7223c868a683f0030837d859edd2ab, ticket/8282 +CVE-2020-20892, 15900ff8e68f38404bd6d392d474d99f65cdbbf9 / 19587c9332f5be4f6bc6d7b2b8ef3fd21dfeaa01, ticket/8265 +CVE-2020-20896, c4629d8abe270ec5e5d79f7d18cd0b12cd5fd797 /
[FFmpeg-cvslog] lavu: bump minor and add APIchanges entry for lavu/tx DCT
ffmpeg | branch: master | Lynne | Sun Nov 20 21:17:30 2022 +0100| [e97368eba5b48a958d3b398780e56b12db92d1a1] | committer: Lynne

lavu: bump minor and add APIchanges entry for lavu/tx DCT

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=e97368eba5b48a958d3b398780e56b12db92d1a1
---

 doc/APIchanges | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/APIchanges b/doc/APIchanges
index 038ca865ec..ab7ce15fae 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -14,6 +14,9 @@ libavutil: 2021-04-27
 
 API changes, most recent first:
 
+2022-11-xx - xx - lavu 57.43.100 - tx.h
+  Add AV_TX_FLOAT_DCT, AV_TX_DOUBLE_DCT and AV_TX_INT32_DCT.
+
 2022-xx-xx - xx - lavu 57.42.100 - dict.h
   Add av_dict_iterate().
 
[FFmpeg-cvslog] binkaudio: convert to lavu/tx
ffmpeg | branch: master | Lynne | Sun Nov 20 04:57:00 2022 +0100| [ca8aaf24dfd28ceb4709fc518b3c95b7fce07dcc] | committer: Lynne binkaudio: convert to lavu/tx > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=ca8aaf24dfd28ceb4709fc518b3c95b7fce07dcc --- configure | 4 ++-- libavcodec/binkaudio.c | 42 +- 2 files changed, 23 insertions(+), 23 deletions(-) diff --git a/configure b/configure index 25cd712b16..5cb80bf1ea 100755 --- a/configure +++ b/configure @@ -2772,8 +2772,8 @@ asv2_encoder_select="aandcttables bswapdsp fdctdsp pixblockdsp" atrac1_decoder_select="sinewin" av1_decoder_select="av1_frame_split_bsf cbs_av1" bink_decoder_select="blockdsp hpeldsp" -binkaudio_dct_decoder_select="dct wma_freqs" -binkaudio_rdft_decoder_select="rdft wma_freqs" +binkaudio_dct_decoder_select="wma_freqs" +binkaudio_rdft_decoder_select="wma_freqs" cavs_decoder_select="blockdsp golomb h264chroma idctdsp qpeldsp videodsp" clearvideo_decoder_select="idctdsp" cllc_decoder_select="bswapdsp" diff --git a/libavcodec/binkaudio.c b/libavcodec/binkaudio.c index 43dca1f565..046bf93207 100644 --- a/libavcodec/binkaudio.c +++ b/libavcodec/binkaudio.c @@ -33,15 +33,14 @@ #include "libavutil/channel_layout.h" #include "libavutil/intfloat.h" #include "libavutil/mem_internal.h" +#include "libavutil/tx.h" #define BITSTREAM_READER_LE #include "avcodec.h" -#include "dct.h" #include "decode.h" #include "get_bits.h" #include "codec_internal.h" #include "internal.h" -#include "rdft.h" #include "wma_freqs.h" #define MAX_DCT_CHANNELS 6 @@ -63,10 +62,8 @@ typedef struct BinkAudioContext { float previous[MAX_DCT_CHANNELS][BINK_BLOCK_MAX_SIZE / 16]; ///< coeffs from previous audio block float quant_table[96]; AVPacket *pkt; -union { -RDFTContext rdft; -DCTContext dct; -} trans; +AVTXContext *tx; +av_tx_fn tx_fn; } BinkAudioContext; @@ -138,12 +135,15 @@ static av_cold int decode_init(AVCodecContext *avctx) s->first = 1; -if (CONFIG_BINKAUDIO_RDFT_DECODER && avctx->codec->id == 
AV_CODEC_ID_BINKAUDIO_RDFT) -ret = ff_rdft_init(&s->trans.rdft, frame_len_bits, DFT_C2R); -else if (CONFIG_BINKAUDIO_DCT_DECODER) -ret = ff_dct_init(&s->trans.dct, frame_len_bits, DCT_III); -else +if (CONFIG_BINKAUDIO_RDFT_DECODER && avctx->codec->id == AV_CODEC_ID_BINKAUDIO_RDFT) { +float scale = 0.5; +ret = av_tx_init(&s->tx, &s->tx_fn, AV_TX_FLOAT_RDFT, 1, 1 << frame_len_bits, &scale, 0); +} else if (CONFIG_BINKAUDIO_DCT_DECODER) { +float scale = 1.0 / (1 << frame_len_bits); +ret = av_tx_init(&s->tx, &s->tx_fn, AV_TX_FLOAT_DCT, 1, 1 << (frame_len_bits - 1), &scale, 0); +} else { av_assert0(0); +} if (ret < 0) return ret; @@ -177,13 +177,12 @@ static int decode_block(BinkAudioContext *s, float **out, int use_dct, float q, quant[25]; int width, coeff; GetBitContext *gb = &s->gb; +LOCAL_ALIGNED_32(float, coeffs, [4098]); if (use_dct) skip_bits(gb, 2); for (ch = 0; ch < channels; ch++) { -FFTSample *coeffs = out[ch + ch_offset]; - if (s->version_b) { if (get_bits_left(gb) < 64) return AVERROR_INVALIDDATA; @@ -251,10 +250,15 @@ static int decode_block(BinkAudioContext *s, float **out, int use_dct, if (CONFIG_BINKAUDIO_DCT_DECODER && use_dct) { coeffs[0] /= 0.5; -s->trans.dct.dct_calc(&s->trans.dct, coeffs); +s->tx_fn(s->tx, out[ch + ch_offset], coeffs, sizeof(float)); +} else if (CONFIG_BINKAUDIO_RDFT_DECODER) { +for (int i = 2; i < s->frame_len; i += 2) +coeffs[i + 1] *= -1; + +coeffs[s->frame_len + 0] = coeffs[1]; +coeffs[s->frame_len + 1] = coeffs[1] = 0; +s->tx_fn(s->tx, out[ch + ch_offset], coeffs, sizeof(AVComplexFloat)); } -else if (CONFIG_BINKAUDIO_RDFT_DECODER) -s->trans.rdft.rdft_calc(&s->trans.rdft, coeffs); } for (ch = 0; ch < channels; ch++) { @@ -278,11 +282,7 @@ static int decode_block(BinkAudioContext *s, float **out, int use_dct, static av_cold int decode_end(AVCodecContext *avctx) { BinkAudioContext * s = avctx->priv_data; -if (CONFIG_BINKAUDIO_RDFT_DECODER && avctx->codec->id == AV_CODEC_ID_BINKAUDIO_RDFT) -ff_rdft_end(&s->trans.rdft); -else if (CONFIG_BINKAUDIO_DCT_DECODER) 
-ff_dct_end(&s->trans.dct); - +av_tx_uninit(&s->tx); return 0; }
[FFmpeg-cvslog] lavu/tx: clarify stride for RDFT transforms
ffmpeg | branch: master | Lynne | Thu Nov 24 15:56:01 2022 +0100| [93c30bd6f0846898bb3e7172bb5de65f2d0f33ce] | committer: Lynne

lavu/tx: clarify stride for RDFT transforms

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=93c30bd6f0846898bb3e7172bb5de65f2d0f33ce
---

 libavutil/tx.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavutil/tx.h b/libavutil/tx.h
index cd772ad903..758f634b73 100644
--- a/libavutil/tx.h
+++ b/libavutil/tx.h
@@ -75,7 +75,9 @@ enum AVTXType {
      * the double variant, it's a 'double'. If scale is NULL, 1.0 will be used
      * as a default.
      *
-     * The stride parameter must be set to the size of a single sample in bytes.
+     * For forward transforms (R2C), stride must be the spacing between two
+     * samples in bytes. For inverse transforms, the stride must be set
+     * to the spacing between two complex values in bytes.
      *
      * The forward transform performs a real-to-complex DFT of N samples to
      * N/2+1 complex values.
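A minimal sketch of what the clarified comment implies for a caller working with float data, assuming densely packed buffers. It does not call libavutil: ComplexF is a local stand-in with two float members (an assumption made here to keep the example self-contained), and rdft_stride() and rdft_complex_count() are hypothetical helpers, not part of the tx API.

```c
#include <stddef.h>

/* Local stand-in for a complex float value: real part, then imaginary part. */
typedef struct ComplexF {
    float re, im;
} ComplexF;

/* Stride to pass for a packed float RDFT, following the documentation
 * text in the diff: forward (real-to-complex) transforms use the spacing
 * between two real samples, inverse transforms the spacing between two
 * complex values. */
static ptrdiff_t rdft_stride(int inverse)
{
    return inverse ? (ptrdiff_t)sizeof(ComplexF) : (ptrdiff_t)sizeof(float);
}

/* A forward transform of n_real samples yields n_real/2 + 1 complex values. */
static int rdft_complex_count(int n_real)
{
    return n_real / 2 + 1;
}
```

This matches the binkaudio conversion in this batch of commits, which passes sizeof(AVComplexFloat) as the stride for its inverse RDFT call.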
[FFmpeg-cvslog] lavu/tx: add DCT-III implementation
ffmpeg | branch: master | Lynne | Sun Nov 20 03:44:29 2022 +0100| [a56d7e0ca3be82cb5155ab0cf8206fc8b8d6861d] | committer: Lynne

lavu/tx: add DCT-III implementation

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=a56d7e0ca3be82cb5155ab0cf8206fc8b8d6861d
---

 libavutil/tx_template.c | 72 +++--
 1 file changed, 70 insertions(+), 2 deletions(-)

diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index 5d73809b58..1de92b9786 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -1737,6 +1737,11 @@ static av_cold int TX_NAME(ff_tx_dct_init)(AVTXContext *s,
     TXSample *tab;
     SCALE_TYPE rsc = *((SCALE_TYPE *)scale);

+    if (inv) {
+        len *= 2;
+        s->len *= 2;
+        rsc *= 0.5;
+    }

     if ((ret = ff_tx_init_subtx(s, TX_TYPE(RDFT), flags, NULL, len, inv, &rsc)))
         return ret;
@@ -1752,8 +1757,13 @@ static av_cold int TX_NAME(ff_tx_dct_init)(AVTXContext *s,
     for (int i = 0; i < len; i++)
         tab[i] = RESCALE(cos(i*freq)*(!inv + 1));

-    for (int i = 0; i < len/2; i++)
-        tab[len + i] = RESCALE(cos((len - 2*i - 1)*freq));
+    if (inv) {
+        for (int i = 0; i < len/2; i++)
+            tab[len + i] = RESCALE(0.5 / sin((2*i + 1)*freq));
+    } else {
+        for (int i = 0; i < len/2; i++)
+            tab[len + i] = RESCALE(cos((len - 2*i - 1)*freq));
+    }

     return 0;
 }
@@ -1818,6 +1828,49 @@ static void TX_NAME(ff_tx_dctII)(AVTXContext *s, void *_dst,
     dst[1] = next;
 }

+static void TX_NAME(ff_tx_dctIII)(AVTXContext *s, void *_dst,
+                                  void *_src, ptrdiff_t stride)
+{
+    TXSample *dst = _dst;
+    TXSample *src = _src;
+    const int len = s->len;
+    const int len2 = len >> 1;
+    const TXSample *exp = (void *)s->exp;
+#ifdef TX_INT32
+    int64_t tmp1, tmp2 = src[len - 1];
+    tmp2 = (2*tmp2 + 0x4000) >> 31;
+#else
+    TXSample tmp1, tmp2 = 2*src[len - 1];
+#endif
+
+    src[len] = tmp2;
+
+    for (int i = len - 2; i >= 2; i -= 2) {
+        TXSample val1 = src[i - 0];
+        TXSample val2 = src[i - 1] - src[i + 1];
+
+        CMUL(src[i + 1], src[i], exp[len - i], exp[i], val1, val2);
+    }
+
+    s->fn[0](&s->sub[0], dst, src, sizeof(float));
+
+    for (int i = 0; i < len2; i++) {
+        TXSample in1 = dst[i];
+        TXSample in2 = dst[len - i - 1];
+        TXSample c   = exp[len + i];
+
+        tmp1 = in1 + in2;
+        tmp2 = in1 - in2;
+        tmp2 *= c;
+#ifdef TX_INT32
+        tmp2 = (tmp2 + 0x4000) >> 31;
+#endif
+
+        dst[i]           = tmp1 + tmp2;
+        dst[len - i - 1] = tmp1 - tmp2;
+    }
+}
+
 static const FFTXCodelet TX_NAME(ff_tx_dctII_def) = {
     .name       = TX_NAME_STR("dctII"),
     .function   = TX_NAME(ff_tx_dctII),
@@ -1832,6 +1885,20 @@ static const FFTXCodelet TX_NAME(ff_tx_dctII_def) = {
     .prio       = FF_TX_PRIO_BASE,
 };

+static const FFTXCodelet TX_NAME(ff_tx_dctIII_def) = {
+    .name       = TX_NAME_STR("dctIII"),
+    .function   = TX_NAME(ff_tx_dctIII),
+    .type       = TX_TYPE(DCT),
+    .flags      = AV_TX_UNALIGNED | AV_TX_INPLACE |
+                  FF_TX_OUT_OF_PLACE | FF_TX_INVERSE_ONLY,
+    .factors    = { 2, TX_FACTOR_ANY },
+    .min_len    = 2,
+    .max_len    = TX_LEN_UNLIMITED,
+    .init       = TX_NAME(ff_tx_dct_init),
+    .cpu_flags  = FF_TX_CPU_FLAGS_ALL,
+    .prio       = FF_TX_PRIO_BASE,
+};
+
 int TX_TAB(ff_tx_mdct_gen_exp)(AVTXContext *s, int *pre_tab)
 {
     int off = 0;
@@ -1920,6 +1987,7 @@ const FFTXCodelet * const TX_NAME(ff_tx_codelet_list)[] = {
     &TX_NAME(ff_tx_rdft_r2c_def),
     &TX_NAME(ff_tx_rdft_c2r_def),
     &TX_NAME(ff_tx_dctII_def),
+    &TX_NAME(ff_tx_dctIII_def),

     NULL,
 };
[FFmpeg-cvslog] lavu/tx: fix last coefficient scaling for R2C transforms
ffmpeg | branch: master | Lynne | Sat Nov 19 14:16:30 2022 +0100| [43d285a40f11e15839b784c85bbbcc7fafd135b5] | committer: Lynne

lavu/tx: fix last coefficient scaling for R2C transforms

This was a typo.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=43d285a40f11e15839b784c85bbbcc7fafd135b5
---

 libavutil/tx_template.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index 59295e06ca..f33bcf85de 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -1622,7 +1622,7 @@ static av_cold int TX_NAME(ff_tx_rdft_init)(AVTXContext *s,
     m = (inv ? 2*s->scale_d : s->scale_d);

     *tab++ = RESCALE((inv ? 0.5 : 1.0) * m);
-    *tab++ = RESCALE(inv ? 0.5*m : 1.0);
+    *tab++ = RESCALE(inv ? 0.5*m : 1.0*m);
     *tab++ = RESCALE( m);
     *tab++ = RESCALE(-m);

[FFmpeg-cvslog] lavu/tx: generalize PFA FFTs
ffmpeg | branch: master | Lynne | Sat Nov 19 01:00:36 2022 +0100| [8547123f3b13da5a135994f4253686e67953be55] | committer: Lynne lavu/tx: generalize PFA FFTs This commit permits any stacking of FFTs of any size. > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=8547123f3b13da5a135994f4253686e67953be55 --- libavutil/tx_priv.h | 3 +- libavutil/tx_template.c | 217 +++- 2 files changed, 163 insertions(+), 57 deletions(-) diff --git a/libavutil/tx_priv.h b/libavutil/tx_priv.h index f11061d051..8a79cf0dd3 100644 --- a/libavutil/tx_priv.h +++ b/libavutil/tx_priv.h @@ -232,7 +232,8 @@ struct AVTXContext { intlen; /* Length of the transform */ intinv; /* If transform is inverse */ int *map; /* Lookup table(s) */ -TXComplex *exp; /* Any non-pre-baked multiplication factors needed */ +TXComplex *exp; /* Any non-pre-baked multiplication factors, + * or extra temporary buffer */ TXComplex *tmp; /* Temporary buffer, if needed */ AVTXContext *sub; /* Subtransform context(s), if needed */ diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c index 38ab517f66..59295e06ca 100644 --- a/libavutil/tx_template.c +++ b/libavutil/tx_template.c @@ -949,74 +949,182 @@ static av_cold int TX_NAME(ff_tx_fft_pfa_init)(AVTXContext *s, int len, int inv, const void *scale) { -int ret; -int sub_len = len / cd->factors[0]; -FFTXCodeletOptions sub_opts = { .invert_lookup = 0 }; +int ret, *tmp, ps = flags & FF_TX_PRESHUFFLE; +FFTXCodeletOptions sub_opts = { .map_dir = FF_TX_MAP_GATHER }; +size_t extra_tmp_len = 0; +int len_list[TX_MAX_DECOMPOSITIONS]; -flags &= ~FF_TX_OUT_OF_PLACE; /* We want the subtransform to be */ -flags |= AV_TX_INPLACE; /* in-place */ -flags |= FF_TX_PRESHUFFLE; /* This function handles the permute step */ +if ((ret = ff_tx_decompose_length(len_list, TX_TYPE(FFT), len, inv)) < 0) +return ret; -if ((ret = ff_tx_init_subtx(s, TX_TYPE(FFT), flags, _opts, -sub_len, inv, scale))) +/* Two iterations to test both orderings. 
*/ +for (int i = 0; i < ret; i++) { +int len1 = len_list[i]; +int len2 = len / len1; + +/* Our ptwo transforms don't support striding the output. */ +if (len2 & (len2 - 1)) +FFSWAP(int, len1, len2); + +ff_tx_clear_ctx(s); + +/* First transform */ +sub_opts.map_dir = FF_TX_MAP_GATHER; +flags &= ~AV_TX_INPLACE; +flags |= FF_TX_OUT_OF_PLACE; +flags |= FF_TX_PRESHUFFLE; /* This function handles the permute step */ +ret = ff_tx_init_subtx(s, TX_TYPE(FFT), flags, _opts, + len1, inv, scale); + +if (ret == AVERROR(ENOMEM)) { +return ret; +} else if (ret < 0) { /* Try again without a preshuffle flag */ +flags &= ~FF_TX_PRESHUFFLE; +ret = ff_tx_init_subtx(s, TX_TYPE(FFT), flags, _opts, + len1, inv, scale); +if (ret == AVERROR(ENOMEM)) +return ret; +else if (ret < 0) +continue; +} + +/* Second transform. */ +sub_opts.map_dir = FF_TX_MAP_SCATTER; +flags |= FF_TX_PRESHUFFLE; +retry: +flags &= ~FF_TX_OUT_OF_PLACE; +flags |= AV_TX_INPLACE; +ret = ff_tx_init_subtx(s, TX_TYPE(FFT), flags, _opts, + len2, inv, scale); + +if (ret == AVERROR(ENOMEM)) { +return ret; +} else if (ret < 0) { /* Try again with an out-of-place transform */ +flags |= FF_TX_OUT_OF_PLACE; +flags &= ~AV_TX_INPLACE; +ret = ff_tx_init_subtx(s, TX_TYPE(FFT), flags, _opts, + len2, inv, scale); +if (ret == AVERROR(ENOMEM)) { +return ret; +} else if (ret < 0) { +if (flags & FF_TX_PRESHUFFLE) { /* Retry again without a preshuf flag */ +flags &= ~FF_TX_PRESHUFFLE; +goto retry; +} else { +continue; +} +} +} + +/* Success */ +break; +} + +/* If nothing was sucessful, error out */ +if (ret < 0) return ret; /* Generate PFA map */ if ((ret = ff_tx_gen_compound_mapping(s, opts, 0, - cd->factors[0], sub_len))) + s->sub[0].len, s->sub[1].len))) return ret; if (!(s->tmp =
[FFmpeg-cvslog] lavu/tx: add DCT-II implementation
ffmpeg | branch: master | Lynne | Sat Nov 19 14:20:23 2022 +0100| [504b7bec1a7a46ffbfd0c605fdd984df36dc9871] | committer: Lynne

lavu/tx: add DCT-II implementation

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=504b7bec1a7a46ffbfd0c605fdd984df36dc9871
---

 libavutil/tx.h          |  14 +++
 libavutil/tx_template.c | 108
 2 files changed, 122 insertions(+)

diff --git a/libavutil/tx.h b/libavutil/tx.h
index 758f634b73..064edbc097 100644
--- a/libavutil/tx.h
+++ b/libavutil/tx.h
@@ -91,6 +91,20 @@ enum AVTXType {
     AV_TX_DOUBLE_RDFT = 7,
     AV_TX_INT32_RDFT  = 8,

+    /**
+     * Real to real (DCT) transforms.
+     *
+     * The forward transform is a DCT-II.
+     * The inverse transform is a DCT-III.
+     *
+     * The input array is always overwritten. DCT-III requires that the
+     * input be padded with 2 extra samples. Stride must be set to the
+     * spacing between two samples in bytes.
+     */
+    AV_TX_FLOAT_DCT  = 9,
+    AV_TX_DOUBLE_DCT = 10,
+    AV_TX_INT32_DCT  = 11,
+
     /* Not part of the API, do not use */
     AV_TX_NB,
 };
diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index f33bcf85de..5d73809b58 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -1725,6 +1725,113 @@ static const FFTXCodelet TX_NAME(ff_tx_rdft_c2r_def) = {
     .prio       = FF_TX_PRIO_BASE,
 };

+static av_cold int TX_NAME(ff_tx_dct_init)(AVTXContext *s,
+                                           const FFTXCodelet *cd,
+                                           uint64_t flags,
+                                           FFTXCodeletOptions *opts,
+                                           int len, int inv,
+                                           const void *scale)
+{
+    int ret;
+    double freq;
+    TXSample *tab;
+    SCALE_TYPE rsc = *((SCALE_TYPE *)scale);
+
+
+    if ((ret = ff_tx_init_subtx(s, TX_TYPE(RDFT), flags, NULL, len, inv, &rsc)))
+        return ret;
+
+    s->exp = av_malloc((len/2)*3*sizeof(TXSample));
+    if (!s->exp)
+        return AVERROR(ENOMEM);
+
+    tab = (TXSample *)s->exp;
+
+    freq = M_PI/(len*2);
+
+    for (int i = 0; i < len; i++)
+        tab[i] = RESCALE(cos(i*freq)*(!inv + 1));
+
+    for (int i = 0; i < len/2; i++)
+        tab[len + i] = RESCALE(cos((len - 2*i - 1)*freq));
+
+    return 0;
+}
+
+static void TX_NAME(ff_tx_dctII)(AVTXContext *s, void *_dst,
+                                 void *_src, ptrdiff_t stride)
+{
+    TXSample *dst = _dst;
+    TXSample *src = _src;
+    const int len = s->len;
+    const int len2 = len >> 1;
+    const TXSample *exp = (void *)s->exp;
+    TXSample next;
+#ifdef TX_INT32
+    int64_t tmp1, tmp2;
+#else
+    TXSample tmp1, tmp2;
+#endif
+
+    for (int i = 0; i < len2; i++) {
+        TXSample in1 = src[i];
+        TXSample in2 = src[len - i - 1];
+        TXSample s   = exp[len + i];
+
+#ifdef TX_INT32
+        tmp1 = in1 + in2;
+        tmp2 = in1 - in2;
+
+        tmp1 >>= 1;
+        tmp2 *= s;
+
+        tmp2 = (tmp2 + 0x4000) >> 31;
+#else
+        tmp1 = (in1 + in2)*0.5;
+        tmp2 = (in1 - in2)*s;
+#endif
+
+        src[i]           = tmp1 + tmp2;
+        src[len - i - 1] = tmp1 - tmp2;
+    }
+
+    s->fn[0](&s->sub[0], dst, src, sizeof(TXComplex));
+
+    next = dst[len];
+
+    for (int i = len - 2; i > 0; i -= 2) {
+        TXSample tmp;
+
+        CMUL(tmp, dst[i], exp[len - i], exp[i], dst[i + 0], dst[i + 1]);
+
+        dst[i + 1] = next;
+
+        next += tmp;
+    }
+
+#ifdef TX_INT32
+    tmp1 = ((int64_t)exp[0]) * ((int64_t)dst[0]);
+    dst[0] = (tmp1 + 0x4000) >> 31;
+#else
+    dst[0] = exp[0] * dst[0];
+#endif
+    dst[1] = next;
+}
+
+static const FFTXCodelet TX_NAME(ff_tx_dctII_def) = {
+    .name       = TX_NAME_STR("dctII"),
+    .function   = TX_NAME(ff_tx_dctII),
+    .type       = TX_TYPE(DCT),
+    .flags      = AV_TX_UNALIGNED | AV_TX_INPLACE |
+                  FF_TX_OUT_OF_PLACE | FF_TX_FORWARD_ONLY,
+    .factors    = { 2, TX_FACTOR_ANY },
+    .min_len    = 2,
+    .max_len    = TX_LEN_UNLIMITED,
+    .init       = TX_NAME(ff_tx_dct_init),
+    .cpu_flags  = FF_TX_CPU_FLAGS_ALL,
+    .prio       = FF_TX_PRIO_BASE,
+};
+
 int TX_TAB(ff_tx_mdct_gen_exp)(AVTXContext *s, int *pre_tab)
 {
     int off = 0;
@@ -1812,6 +1919,7 @@ const FFTXCodelet * const TX_NAME(ff_tx_codelet_list)[] = {
     &TX_NAME(ff_tx_mdct_inv_full_def),
     &TX_NAME(ff_tx_rdft_r2c_def),
     &TX_NAME(ff_tx_rdft_c2r_def),
+    &TX_NAME(ff_tx_dctII_def),

     NULL,
 };
[FFmpeg-cvslog] lavu/tx: refactor and separate codelet list and prio code
ffmpeg | branch: master | Lynne | Thu Nov 17 22:14:53 2022 +0100| [1c8d77a2bfa239621b63c4553c6221560b1ee298] | committer: Lynne lavu/tx: refactor and separate codelet list and prio code > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=1c8d77a2bfa239621b63c4553c6221560b1ee298 --- libavutil/tx.c | 125 ++--- 1 file changed, 66 insertions(+), 59 deletions(-) diff --git a/libavutil/tx.c b/libavutil/tx.c index 319392788f..ff81d235ba 100644 --- a/libavutil/tx.c +++ b/libavutil/tx.c @@ -300,6 +300,67 @@ static const FFTXCodelet * const ff_tx_null_list[] = { NULL, }; +/* Array of all compiled codelet lists. Order is irrelevant. */ +static const FFTXCodelet * const * const codelet_list[] = { +ff_tx_codelet_list_float_c, +ff_tx_codelet_list_double_c, +ff_tx_codelet_list_int32_c, +ff_tx_null_list, +#if HAVE_X86ASM +ff_tx_codelet_list_float_x86, +#endif +#if ARCH_AARCH64 +ff_tx_codelet_list_float_aarch64, +#endif +}; +static const int codelet_list_num = FF_ARRAY_ELEMS(codelet_list); + +static const int cpu_slow_mask = AV_CPU_FLAG_SSE2SLOW | AV_CPU_FLAG_SSE3SLOW | + AV_CPU_FLAG_ATOM | AV_CPU_FLAG_SSSE3SLOW | + AV_CPU_FLAG_AVXSLOW | AV_CPU_FLAG_SLOW_GATHER; + +static const int cpu_slow_penalties[][2] = { +{ AV_CPU_FLAG_SSE2SLOW,1 + 64 }, +{ AV_CPU_FLAG_SSE3SLOW,1 + 64 }, +{ AV_CPU_FLAG_SSSE3SLOW, 1 + 64 }, +{ AV_CPU_FLAG_ATOM,1 + 128 }, +{ AV_CPU_FLAG_AVXSLOW, 1 + 128 }, +{ AV_CPU_FLAG_SLOW_GATHER, 1 + 32 }, +}; + +static int get_codelet_prio(const FFTXCodelet *cd, int cpu_flags, int len) +{ +int prio = cd->prio; +int max_factor = 0; + +/* If the CPU has a SLOW flag, and the instruction is also flagged + * as being slow for such, reduce its priority */ +for (int i = 0; i < FF_ARRAY_ELEMS(cpu_slow_penalties); i++) { +if ((cpu_flags & cd->cpu_flags) & cpu_slow_penalties[i][0]) +prio -= cpu_slow_penalties[i][1]; +} + +/* Prioritize aligned-only codelets */ +if ((cd->flags & FF_TX_ALIGNED) && !(cd->flags & AV_TX_UNALIGNED)) +prio += 64; + +/* Codelets for specific lengths 
are generally faster */ +if ((len == cd->min_len) && (len == cd->max_len)) +prio += 64; + +/* Forward-only or inverse-only transforms are generally better */ +if ((cd->flags & (FF_TX_FORWARD_ONLY | FF_TX_INVERSE_ONLY))) +prio += 64; + +/* Larger factors are generally better */ +for (int i = 0; i < TX_MAX_SUB; i++) +max_factor = FFMAX(cd->factors[i], max_factor); +if (max_factor) +prio += 16*max_factor; + +return prio; +} + #if !CONFIG_SMALL static void print_flags(AVBPrint *bp, uint64_t f) { @@ -465,41 +526,15 @@ av_cold int ff_tx_init_subtx(AVTXContext *s, enum AVTXType type, AVTXContext *sub = NULL; TXCodeletMatch *cd_tmp, *cd_matches = NULL; unsigned int cd_matches_size = 0; +int codelet_list_idx = codelet_list_num; int nb_cd_matches = 0; #if !CONFIG_SMALL AVBPrint bp = { 0 }; #endif -/* Array of all compiled codelet lists. Order is irrelevant. */ -const FFTXCodelet * const * const codelet_list[] = { -ff_tx_codelet_list_float_c, -ff_tx_codelet_list_double_c, -ff_tx_codelet_list_int32_c, -ff_tx_null_list, -#if HAVE_X86ASM -ff_tx_codelet_list_float_x86, -#endif -#if ARCH_AARCH64 -ff_tx_codelet_list_float_aarch64, -#endif -}; -int codelet_list_num = FF_ARRAY_ELEMS(codelet_list); - /* We still accept functions marked with SLOW, even if the CPU is * marked with the same flag, but we give them lower priority. 
*/ const int cpu_flags = av_get_cpu_flags(); -const int slow_mask = AV_CPU_FLAG_SSE2SLOW | AV_CPU_FLAG_SSE3SLOW | - AV_CPU_FLAG_ATOM | AV_CPU_FLAG_SSSE3SLOW | - AV_CPU_FLAG_AVXSLOW | AV_CPU_FLAG_SLOW_GATHER; - -static const int slow_penalties[][2] = { -{ AV_CPU_FLAG_SSE2SLOW,1 + 64 }, -{ AV_CPU_FLAG_SSE3SLOW,1 + 64 }, -{ AV_CPU_FLAG_SSSE3SLOW, 1 + 64 }, -{ AV_CPU_FLAG_ATOM,1 + 128 }, -{ AV_CPU_FLAG_AVXSLOW, 1 + 128 }, -{ AV_CPU_FLAG_SLOW_GATHER, 1 + 32 }, -}; /* Flags the transform wants */ uint64_t req_flags = flags; @@ -519,13 +554,11 @@ av_cold int ff_tx_init_subtx(AVTXContext *s, enum AVTXType type, /* Loop through all codelets in all codelet lists to find matches * to the requirements */ -while (codelet_list_num--) { -const FFTXCodelet * const * list = codelet_list[codelet_list_num]; +while (codelet_list_idx--) { +const FFTXCodelet * const * list = codelet_list[codelet_list_idx]; const FFTXCodelet *cd = NULL; while ((cd = *list++)) { -int max_factor = 0; - /*
[FFmpeg-cvslog] lavu/tx: add length decomposition function
ffmpeg | branch: master | Lynne | Sat Nov 19 00:58:13 2022 +0100| [7f019e77586ab62d37856d09071fcdeef880bcd9] | committer: Lynne lavu/tx: add length decomposition function Rather than using a list of lengths supported, this goes a step beyond and uses all registered codelets to come up with a good decomposition. > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=7f019e77586ab62d37856d09071fcdeef880bcd9 --- libavutil/tx.c | 143 libavutil/tx_priv.h | 8 +++ 2 files changed, 151 insertions(+) diff --git a/libavutil/tx.c b/libavutil/tx.c index 8027e983ba..c692239656 100644 --- a/libavutil/tx.c +++ b/libavutil/tx.c @@ -17,6 +17,7 @@ */ #include "avassert.h" +#include "intmath.h" #include "cpu.h" #include "qsort.h" #include "bprint.h" @@ -395,6 +396,148 @@ static int get_codelet_prio(const FFTXCodelet *cd, int cpu_flags, int len) return prio; } +typedef struct FFTXLenDecomp { +int len; +int len2; +int prio; +const FFTXCodelet *cd; +} FFTXLenDecomp; + +static int cmp_decomp(FFTXLenDecomp *a, FFTXLenDecomp *b) +{ +return FFDIFFSIGN(b->prio, a->prio); +} + +int ff_tx_decompose_length(int dst[TX_MAX_DECOMPOSITIONS], enum AVTXType type, + int len, int inv) +{ +int nb_decomp = 0; +FFTXLenDecomp ld[TX_MAX_DECOMPOSITIONS]; +int codelet_list_idx = codelet_list_num; + +const int cpu_flags = av_get_cpu_flags(); + +/* Loop through all codelets in all codelet lists to find matches + * to the requirements */ +while (codelet_list_idx--) { +const FFTXCodelet * const * list = codelet_list[codelet_list_idx]; +const FFTXCodelet *cd = NULL; + +while ((cd = *list++)) { +int fl = len; +int skip = 0, prio; +int factors_product = 1, factors_mod = 0; + +if (nb_decomp >= TX_MAX_DECOMPOSITIONS) +goto sort; + +/* Check if the type matches */ +if (cd->type != TX_TYPE_ANY && type != cd->type) +continue; + +/* Check direction for non-orthogonal codelets */ +if (((cd->flags & FF_TX_FORWARD_ONLY) && inv) || +((cd->flags & (FF_TX_INVERSE_ONLY | AV_TX_FULL_IMDCT)) && !inv)) +continue; + +/* Check 
if the CPU supports the required ISA */ +if (cd->cpu_flags != FF_TX_CPU_FLAGS_ALL && +!(cpu_flags & (cd->cpu_flags & ~cpu_slow_mask))) +continue; + +for (int i = 0; i < TX_MAX_FACTORS; i++) { +if (!cd->factors[i] || (fl == 1)) +break; + +if (cd->factors[i] == TX_FACTOR_ANY) { +factors_mod++; +factors_product *= fl; +} else if (!(fl % cd->factors[i])) { +factors_mod++; +if (cd->factors[i] == 2) { +int b = ff_ctz(fl); +fl >>= b; +factors_product <<= b; +} else { +do { +fl /= cd->factors[i]; +factors_product *= cd->factors[i]; +} while (!(fl % cd->factors[i])); +} +} +} + +/* Disqualify if factor requirements are not satisfied or if trivial */ +if ((factors_mod < cd->nb_factors) || (len == factors_product)) +continue; + +if (av_gcd(factors_product, fl) != 1) +continue; + +/* Check if length is supported and factorization was successful */ +if ((factors_product < cd->min_len) || +(cd->max_len != TX_LEN_UNLIMITED && (factors_product > cd->max_len))) +continue; + +prio = get_codelet_prio(cd, cpu_flags, factors_product) * factors_product; + +/* Check for duplicates */ +for (int i = 0; i < nb_decomp; i++) { +if (factors_product == ld[i].len) { +/* Update priority if new one is higher */ +if (prio > ld[i].prio) +ld[i].prio = prio; +skip = 1; +break; +} +} + +/* Add decomposition if unique */ +if (!skip) { +ld[nb_decomp].cd = cd; +ld[nb_decomp].len = factors_product; +ld[nb_decomp].len2 = fl; +ld[nb_decomp].prio = prio; +nb_decomp++; +} +} +} + +if (!nb_decomp) +return AVERROR(EINVAL); + +sort: +AV_QSORT(ld, nb_decomp, FFTXLenDecomp, cmp_decomp); + +for (int i = 0; i < nb_decomp; i++) { +if (ld[i].cd->nb_factors > 1) +dst[i] = ld[i].len2; +else +dst[i] = ld[i].len; +} + +
[FFmpeg-cvslog] lavu/tx: refactor to explicitly track and convert lookup table order
ffmpeg | branch: master | Lynne | Sat Nov 19 00:47:45 2022 +0100| [87bae6b0189d5cb71b836890078f96a4d1abd277] | committer: Lynne lavu/tx: refactor to explicitly track and convert lookup table order Necessary for generalizing PFAs. > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=87bae6b0189d5cb71b836890078f96a4d1abd277 --- libavutil/aarch64/tx_float_init.c | 5 +- libavutil/tx.c| 109 +- libavutil/tx_priv.h | 52 -- libavutil/tx_template.c | 49 - libavutil/x86/tx_float_init.c | 46 5 files changed, 181 insertions(+), 80 deletions(-) diff --git a/libavutil/aarch64/tx_float_init.c b/libavutil/aarch64/tx_float_init.c index e7b73b4bf9..8300472c4c 100644 --- a/libavutil/aarch64/tx_float_init.c +++ b/libavutil/aarch64/tx_float_init.c @@ -37,12 +37,11 @@ static av_cold int neon_init(AVTXContext *s, const FFTXCodelet *cd, uint64_t flags, FFTXCodeletOptions *opts, int len, int inv, const void *scale) { -const int inv_lookup = opts ? opts->invert_lookup : 1; ff_tx_init_tabs_float(len); if (cd->max_len == 2) -return ff_tx_gen_ptwo_revtab(s, inv_lookup); +return ff_tx_gen_ptwo_revtab(s, opts); else -return ff_tx_gen_split_radix_parity_revtab(s, len, inv, inv_lookup, 8, 0); +return ff_tx_gen_split_radix_parity_revtab(s, len, inv, opts, 8, 0); } const FFTXCodelet * const ff_tx_codelet_list_float_aarch64[] = { diff --git a/libavutil/tx.c b/libavutil/tx.c index ff81d235ba..8027e983ba 100644 --- a/libavutil/tx.c +++ b/libavutil/tx.c @@ -39,11 +39,41 @@ static av_always_inline int mulinv(int n, int m) return 0; } +int ff_tx_gen_pfa_input_map(AVTXContext *s, FFTXCodeletOptions *opts, +int d1, int d2) +{ +const int sl = d1*d2; + +s->map = av_malloc(s->len*sizeof(*s->map)); +if (!s->map) +return AVERROR(ENOMEM); + +for (int k = 0; k < s->len; k += sl) { +if (s->inv || (opts && opts->map_dir == FF_TX_MAP_SCATTER)) { +for (int m = 0; m < d2; m++) +for (int n = 0; n < d1; n++) +s->map[k + ((m*d1 + n*d2) % (sl))] = m*d1 + n; +} else { +for (int m = 0; m < d2; m++) +for (int n = 0; n 
< d1; n++) +s->map[k + m*d1 + n] = (m*d1 + n*d2) % (sl); +} + +if (s->inv) +for (int w = 1; w <= ((sl) >> 1); w++) +FFSWAP(int, s->map[k + w], s->map[k + sl - w]); +} + +s->map_dir = opts ? opts->map_dir : FF_TX_MAP_GATHER; + +return 0; +} + /* Guaranteed to work for any n, m where gcd(n, m) == 1 */ -int ff_tx_gen_compound_mapping(AVTXContext *s, int n, int m) +int ff_tx_gen_compound_mapping(AVTXContext *s, FFTXCodeletOptions *opts, + int inv, int n, int m) { int *in_map, *out_map; -const int inv = s->inv; const int len = n*m;/* Will not be equal to s->len for MDCTs */ int m_inv, n_inv; @@ -61,14 +91,22 @@ int ff_tx_gen_compound_mapping(AVTXContext *s, int n, int m) out_map = s->map + len; /* Ruritanian map for input, CRT map for output, can be swapped */ -for (int j = 0; j < m; j++) { -for (int i = 0; i < n; i++) { -in_map[j*n + i] = (i*m + j*n) % len; -out_map[(i*m*m_inv + j*n*n_inv) % len] = i*m + j; +if (opts && opts->map_dir == FF_TX_MAP_SCATTER) { +for (int j = 0; j < m; j++) { +for (int i = 0; i < n; i++) { +in_map[(i*m + j*n) % len] = j*n + i; +out_map[(i*m*m_inv + j*n*n_inv) % len] = i*m + j; +} +} +} else { +for (int j = 0; j < m; j++) { +for (int i = 0; i < n; i++) { +in_map[j*n + i] = (i*m + j*n) % len; +out_map[(i*m*m_inv + j*n*n_inv) % len] = i*m + j; +} } } -/* Change transform direction by reversing all ACs */ if (inv) { for (int i = 0; i < m; i++) { int *in = _map[i*n + 1]; /* Skip the DC */ @@ -77,17 +115,7 @@ int ff_tx_gen_compound_mapping(AVTXContext *s, int n, int m) } } -/* Our 15-point transform is also a compound one, so embed its input map */ -if (n == 15) { -for (int k = 0; k < m; k++) { -int tmp[15]; -memcpy(tmp, _map[k*15], 15*sizeof(*tmp)); -for (int i = 0; i < 5; i++) { -for (int j = 0; j < 3; j++) -in_map[k*15 + i*3 + j] = tmp[(i*3 + j*5) % 15]; -} -} -} +s->map_dir = opts ? 
opts->map_dir : FF_TX_MAP_GATHER; return 0; } @@ -103,21 +131,23 @@ static inline int split_radix_permutation(int i, int len, int inv) return split_radix_permutation(i, len, inv) * 4 + 1 - 2*(!(i & len) ^ inv); } -int
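The index arithmetic in ff_tx_gen_compound_mapping() can be tried in isolation. The sketch below reproduces only the gather-direction, forward-transform branch shown in the diff (input map (i*m + j*n) % len, output map built from the modular inverses) for two coprime lengths n and m. gen_pfa_maps(), is_permutation() and the local gcd() are hypothetical helpers written for this note, and mulinv() is a simplified copy of the brute-force inverse search; the scatter branch and the inverse-transform reordering are left out.

```c
#include <assert.h>

/* Smallest x in [1, m) with (n*x) % m == 1, or 0 if none exists. */
static int mulinv(int n, int m)
{
    n %= m;
    for (int x = 1; x < m; x++)
        if ((n * x) % m == 1)
            return x;
    return 0;
}

static int gcd(int a, int b)
{
    while (b) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Fills in_map and out_map (n*m entries each) for coprime n, m >= 2.
 * Returns 0 on success, -1 if the lengths are not usable. */
static int gen_pfa_maps(int *in_map, int *out_map, int n, int m)
{
    const int len = n * m;
    int n_inv, m_inv;

    if (n < 2 || m < 2 || gcd(n, m) != 1)
        return -1;

    n_inv = mulinv(n, m); /* inverse of n modulo m */
    m_inv = mulinv(m, n); /* inverse of m modulo n */

    for (int j = 0; j < m; j++) {
        for (int i = 0; i < n; i++) {
            in_map[j*n + i] = (i*m + j*n) % len;
            out_map[(i*m*m_inv + j*n*n_inv) % len] = i*m + j;
        }
    }
    return 0;
}

/* Returns 1 if map holds every index in [0, len) exactly once. */
static int is_permutation(const int *map, int len)
{
    for (int v = 0; v < len; v++) {
        int hits = 0;
        for (int k = 0; k < len; k++)
            hits += map[k] == v;
        if (hits != 1)
            return 0;
    }
    return 1;
}
```

For n = 3 and m = 5 both maps come out as permutations of 0..14. With lengths that share a factor, such as 4 and 6, the modular inverses do not exist, which is why the sketch rejects them up front.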
[FFmpeg-cvslog] lavu/tx: add ff_tx_clear_ctx()
ffmpeg | branch: master | Lynne | Sat Oct 1 12:20:10 2022 +0200| [dd77e61182865e396195a19b1e6ec697717cef56] | committer: Lynne

lavu/tx: add ff_tx_clear_ctx()

This function allows implementations to clean up a context after
successfully initializing subcontexts.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=dd77e61182865e396195a19b1e6ec697717cef56
---

 libavutil/tx.c      | 27 +++
 libavutil/tx_priv.h |  3 +++
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/libavutil/tx.c b/libavutil/tx.c
index 246a7aa980..13fb54f916 100644
--- a/libavutil/tx.c
+++ b/libavutil/tx.c
@@ -225,24 +225,35 @@ int ff_tx_gen_split_radix_parity_revtab(AVTXContext *s, int len, int inv,
     return 0;
 }

-static void reset_ctx(AVTXContext *s)
+static void reset_ctx(AVTXContext *s, int free_sub)
 {
     if (!s)
         return;

     if (s->sub)
-        for (int i = 0; i < s->nb_sub; i++)
-            reset_ctx(&s->sub[i]);
+        for (int i = 0; i < TX_MAX_SUB; i++)
+            reset_ctx(&s->sub[i], free_sub + 1);

-    if (s->cd_self->uninit)
+    if (s->cd_self && s->cd_self->uninit)
         s->cd_self->uninit(s);

-    av_freep(&s->sub);
+    if (free_sub)
+        av_freep(&s->sub);
+
     av_freep(&s->map);
     av_freep(&s->exp);
     av_freep(&s->tmp);

-    memset(s, 0, sizeof(*s));
+    /* Nothing else needs to be reset, it gets overwritten if another
+     * ff_tx_init_subtx() call is made. */
+    s->nb_sub = 0;
+    s->opaque = NULL;
+    memset(s->fn, 0, sizeof(*s->fn));
+}
+
+void ff_tx_clear_ctx(AVTXContext *s)
+{
+    reset_ctx(s, 0);
 }

 av_cold void av_tx_uninit(AVTXContext **ctx)
@@ -250,7 +261,7 @@ av_cold void av_tx_uninit(AVTXContext **ctx)
     if (!(*ctx))
         return;

-    reset_ctx(*ctx);
+    reset_ctx(*ctx, 1);
     av_freep(ctx);
 }

@@ -635,7 +646,7 @@ av_cold int ff_tx_init_subtx(AVTXContext *s, enum AVTXType type,
             s->fn[s->nb_sub] = NULL;
             s->cd[s->nb_sub] = NULL;

-            reset_ctx(sctx);
+            reset_ctx(sctx, 0);
             if (ret == AVERROR(ENOMEM))
                 break;
         }
diff --git a/libavutil/tx_priv.h b/libavutil/tx_priv.h
index 56e78631ba..d9e38ba19b 100644
--- a/libavutil/tx_priv.h
+++ b/libavutil/tx_priv.h
@@ -240,6 +240,9 @@ int ff_tx_init_subtx(AVTXContext *s, enum AVTXType type,
                      uint64_t flags, FFTXCodeletOptions *opts,
                      int len, int inv, const void *scale);

+/* Clear the context by freeing all tables, maps and subtransforms. */
+void ff_tx_clear_ctx(AVTXContext *s);
+
 /*
  * Generates the PFA permutation table into AVTXContext->pfatab. The end table
  * is appended to the start table.
[FFmpeg-cvslog] lavu/tx: improve transform tree logging
ffmpeg | branch: master | Lynne | Sat Oct 1 12:35:14 2022 +0200| [958b3760b54245b934054f8aa72a608bdb2a48b8] | committer: Lynne lavu/tx: improve transform tree logging Now prints the actual codelet size used, as well as the number of allowed factors. > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=958b3760b54245b934054f8aa72a608bdb2a48b8 --- libavutil/tx.c | 56 +++- 1 file changed, 35 insertions(+), 21 deletions(-) diff --git a/libavutil/tx.c b/libavutil/tx.c index a1173f6137..319392788f 100644 --- a/libavutil/tx.c +++ b/libavutil/tx.c @@ -343,7 +343,7 @@ static void print_type(AVBPrint *bp, enum AVTXType type) "unknown"); } -static void print_cd_info(const FFTXCodelet *cd, int prio, int print_prio) +static void print_cd_info(const FFTXCodelet *cd, int prio, int len, int print_prio) { AVBPrint bp = { 0 }; av_bprint_init(, 0, AV_BPRINT_SIZE_AUTOMATIC); @@ -353,27 +353,41 @@ static void print_cd_info(const FFTXCodelet *cd, int prio, int print_prio) print_type(, cd->type); av_bprintf(, ", len: "); -if (cd->min_len != cd->max_len) -av_bprintf(, "[%i, ", cd->min_len); - -if (cd->max_len == TX_LEN_UNLIMITED) -av_bprintf(, "∞"); -else -av_bprintf(, "%i", cd->max_len); - -av_bprintf(, "%s, factors: [", cd->min_len != cd->max_len ? "]" : ""); -for (int i = 0; i < TX_MAX_SUB; i++) { -if (i && cd->factors[i]) -av_bprintf(, ", "); -if (cd->factors[i] == TX_FACTOR_ANY) -av_bprintf(, "any"); -else if (cd->factors[i]) -av_bprintf(, "%i", cd->factors[i]); +if (!len) { +if (cd->min_len != cd->max_len) +av_bprintf(, "[%i, ", cd->min_len); + +if (cd->max_len == TX_LEN_UNLIMITED) +av_bprintf(, "∞"); else -break; +av_bprintf(, "%i", cd->max_len); +} else { +av_bprintf(, "%i", len); } -av_bprintf(, "], "); +if (cd->factors[1]) { +av_bprintf(, "%s, factors", !len && cd->min_len != cd->max_len ? 
"]" : ""); +if (!cd->nb_factors) +av_bprintf(, ": ["); +else +av_bprintf(, "[%i]: [", cd->nb_factors); + +for (int i = 0; i < TX_MAX_FACTORS; i++) { +if (i && cd->factors[i]) +av_bprintf(, ", "); +if (cd->factors[i] == TX_FACTOR_ANY) +av_bprintf(, "any"); +else if (cd->factors[i]) +av_bprintf(, "%i", cd->factors[i]); +else +break; +} + +av_bprintf(, "], "); +} else { +av_bprintf(, "%s, factor: %i, ", + !len && cd->min_len != cd->max_len ? "]" : "", cd->factors[0]); +} print_flags(, cd->flags); if (print_prio) @@ -389,7 +403,7 @@ static void print_tx_structure(AVTXContext *s, int depth) for (int i = 0; i <= depth; i++) av_log(NULL, AV_LOG_DEBUG, ""); -print_cd_info(cd, cd->prio, 0); +print_cd_info(cd, cd->prio, s->len, 0); for (int i = 0; i < s->nb_sub; i++) print_tx_structure(>sub[i], depth + 1); @@ -604,7 +618,7 @@ av_cold int ff_tx_init_subtx(AVTXContext *s, enum AVTXType type, for (int i = 0; i < nb_cd_matches; i++) { av_log(NULL, AV_LOG_DEBUG, "%i: ", i + 1); -print_cd_info(cd_matches[i].cd, cd_matches[i].prio, 1); +print_cd_info(cd_matches[i].cd, cd_matches[i].prio, 0, 1); } #endif ___ ffmpeg-cvslog mailing list ffmpeg-cvslog@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-cvslog To unsubscribe, visit link above, or email ffmpeg-cvslog-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-cvslog] lavu/tx: allow codelets to specify a minimum number of matching factors
ffmpeg | branch: master | Lynne | Sat Oct 1 12:21:28 2022 +0200| [6ddd10c3e2d63d1ad1ea1034b0e3862107a27063] | committer: Lynne

lavu/tx: allow codelets to specify a minimum number of matching factors

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=6ddd10c3e2d63d1ad1ea1034b0e3862107a27063
---

 libavutil/tx.c | 30 +-
 libavutil/tx_priv.h | 11 +--
 libavutil/tx_template.c | 18 ++
 3 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/libavutil/tx.c b/libavutil/tx.c
index 13fb54f916..a1173f6137 100644
--- a/libavutil/tx.c
+++ b/libavutil/tx.c
@@ -409,42 +409,38 @@ static int cmp_matches(TXCodeletMatch *a, TXCodeletMatch *b)
 /* We want all factors to completely cover the length */
 static inline int check_cd_factors(const FFTXCodelet *cd, int len)
 {
-    int all_flag = 0;
+    int matches = 0, any_flag = 0;
-    for (int i = 0; i < TX_MAX_SUB; i++) {
+    for (int i = 0; i < TX_MAX_FACTORS; i++) {
         int factor = cd->factors[i];
-        /* Conditions satisfied */
-        if (len == 1)
-            return 1;
-
-        /* No more factors */
-        if (!factor) {
-            break;
-        } else if (factor == TX_FACTOR_ANY) {
-            all_flag = 1;
+        if (factor == TX_FACTOR_ANY) {
+            any_flag = 1;
+            matches++;
             continue;
-        }
-
-        if (factor == 2) { /* Fast path */
+        } else if (len <= 1 || !factor) {
+            break;
+        } else if (factor == 2) { /* Fast path */
             int bits_2 = ff_ctz(len);
             if (!bits_2)
-                return 0; /* Factor not supported */
+                continue; /* Factor not supported */
             len >>= bits_2;
+            matches++;
         } else {
             int res = len % factor;
             if (res)
-                return 0; /* Factor not supported */
+                continue; /* Factor not supported */
             while (!res) {
                 len /= factor;
                 res = len % factor;
             }
+            matches++;
         }
     }
-    return all_flag || (len == 1);
+    return (cd->nb_factors <= matches) && (any_flag || len == 1);
 }
 av_cold int ff_tx_init_subtx(AVTXContext *s, enum AVTXType type,
diff --git a/libavutil/tx_priv.h b/libavutil/tx_priv.h
index d9e38ba19b..80d045f6af 100644
--- a/libavutil/tx_priv.h
+++ b/libavutil/tx_priv.h
@@ -71,7 +71,8 @@ typedef void TXComplex;
     .function = TX_FN_NAME(fn, suffix), \
     .type = TX_TYPE(tx_type), \
     .flags = FF_TX_ALIGNED | FF_TX_OUT_OF_PLACE | cd_flags, \
-    .factors = { f1, f2 }, \
+    .factors = { (f1), (f2) }, \
+    .nb_factors = !!(f1) + !!(f2), \
     .min_len = len_min, \
     .max_len = len_max, \
     .init = init_fn, \
@@ -163,6 +164,9 @@ typedef struct FFTXCodeletOptions {
     invert the lookup direction for the map generated */
 } FFTXCodeletOptions;
+/* Maximum number of factors a codelet may have. Arbitrary. */
+#define TX_MAX_FACTORS 16
+
 /* Maximum amount of subtransform functions, subtransforms and factors. Arbitrary. */
 #define TX_MAX_SUB 4
@@ -175,13 +179,16 @@ typedef struct FFTXCodelet {
     uint64_t flags; /* A combination of AVTXFlags and codelet
                      * flags that describe its properties. */
-    int factors[TX_MAX_SUB]; /* Length factors */
+    int factors[TX_MAX_FACTORS]; /* Length factors. MUST be coprime. */
 #define TX_FACTOR_ANY -1 /* When used alone, signals that the codelet
                           * supports all factors. Otherwise, if other
                           * factors are present, it signals that whatever
                           * remains will be supported, as long as the
                           * other factors are a component of the length */
+    int nb_factors; /* Minimum number of factors that have to
+                     * be a modulo of the length. Must not be 0. */
+
     int min_len; /* Minimum length of transform, must be >= 1 */
     int max_len; /* Maximum length of transform */
 #define TX_LEN_UNLIMITED -1 /* Special length value to permit all lengths */
diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index 228209521b..c157719d73 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -518,6 +518,7 @@ static const FFTXCodelet TX_NAME(ff_tx_fft##n##_ns_def) =
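The factor-matching rule from the tx.c hunk can be tried outside of libavutil with a minimal sketch like the one below. It assumes a hypothetical FactorSpec struct standing in for the factors/nb_factors fields of FFTXCodelet, and it replaces the ff_ctz() fast path for factor 2 with the generic division loop, so it illustrates the rule rather than reproducing the committed function. Factors are assumed to be 0 (end of list), -1 (any) or at least 2.

```c
#include <stddef.h>

#define MAX_FACTORS 16   /* mirrors TX_MAX_FACTORS from the patch */
#define FACTOR_ANY  (-1) /* mirrors TX_FACTOR_ANY */

/* Hypothetical stand-in for the two FFTXCodelet fields the check reads. */
typedef struct FactorSpec {
    int factors[MAX_FACTORS]; /* zero-terminated list of coprime factors */
    int nb_factors;           /* minimum number of factors that must match */
} FactorSpec;

/* Returns 1 if a codelet with this factor spec can handle length len. */
static int factors_cover_len(const FactorSpec *spec, int len)
{
    int matches = 0, any_flag = 0;

    for (int i = 0; i < MAX_FACTORS; i++) {
        int factor = spec->factors[i];

        if (factor == FACTOR_ANY) {
            any_flag = 1;
            matches++;
            continue;
        }
        if (len <= 1 || !factor)
            break;    /* length fully covered, or end of the list */
        if (len % factor)
            continue; /* this factor does not divide len, try the next */

        while (!(len % factor)) /* strip every power of the factor */
            len /= factor;
        matches++;
    }

    return spec->nb_factors <= matches && (any_flag || len == 1);
}
```

Under these assumptions a spec of { 15, 2 } with nb_factors = 2 accepts lengths 60 and 120, but rejects 15 (only one factor matches) and 180 (a factor of 3 is left over), whereas { 15, FACTOR_ANY } would still accept 180.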
[FFmpeg-cvslog] x86/tx_float_init: properly specify the supported factors of 15xM FFTs
ffmpeg | branch: master | Lynne | Sat Oct 1 12:44:06 2022 +0200| [92100eee5b588e60b8fe3e14d35766e8bab2] | committer: Lynne

x86/tx_float_init: properly specify the supported factors of 15xM FFTs

Only powers of two are currently supported.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=92100eee5b588e60b8fe3e14d35766e8bab2
---

 libavutil/x86/tx_float_init.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavutil/x86/tx_float_init.c b/libavutil/x86/tx_float_init.c
index 523b43a689..97ee44defa 100644
--- a/libavutil/x86/tx_float_init.c
+++ b/libavutil/x86/tx_float_init.c
@@ -290,11 +290,11 @@ const FFTXCodelet * const ff_tx_codelet_list_float_x86[] = {
     TX_DEF(fft_sr_ns, FFT, 64, 131072, 2, 0, 384, b8_i2, avx2, AVX2,
            AV_TX_INPLACE | FF_TX_PRESHUFFLE, AV_CPU_FLAG_AVXSLOW | AV_CPU_FLAG_SLOW_GATHER),
-    TX_DEF(fft_pfa_15xM, FFT, 60, TX_LEN_UNLIMITED, 15, TX_FACTOR_ANY, 320, fft_pfa_init, avx2, AVX2,
+    TX_DEF(fft_pfa_15xM, FFT, 60, TX_LEN_UNLIMITED, 15, 2, 320, fft_pfa_init, avx2, AVX2,
            AV_TX_INPLACE, AV_CPU_FLAG_AVXSLOW | AV_CPU_FLAG_SLOW_GATHER),
-    TX_DEF(fft_pfa_15xM_asm, FFT, 60, TX_LEN_UNLIMITED, 15, TX_FACTOR_ANY, 384, fft_pfa_init, avx2, AVX2,
+    TX_DEF(fft_pfa_15xM_asm, FFT, 60, TX_LEN_UNLIMITED, 15, 2, 384, fft_pfa_init, avx2, AVX2,
            AV_TX_INPLACE | FF_TX_PRESHUFFLE | FF_TX_ASM_CALL, AV_CPU_FLAG_AVXSLOW | AV_CPU_FLAG_SLOW_GATHER),
-    TX_DEF(fft_pfa_15xM_ns, FFT, 60, TX_LEN_UNLIMITED, 15, TX_FACTOR_ANY, 384, fft_pfa_init, avx2, AVX2,
+    TX_DEF(fft_pfa_15xM_ns, FFT, 60, TX_LEN_UNLIMITED, 15, 2, 384, fft_pfa_init, avx2, AVX2,
            AV_TX_INPLACE | FF_TX_PRESHUFFLE, AV_CPU_FLAG_AVXSLOW | AV_CPU_FLAG_SLOW_GATHER),
     TX_DEF(mdct_inv, MDCT, 16, TX_LEN_UNLIMITED, 2, TX_FACTOR_ANY, 384, m_inv_init, avx2, AVX2,
___
ffmpeg-cvslog mailing list
ffmpeg-cvslog@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-cvslog

To unsubscribe, visit link above, or email
ffmpeg-cvslog-requ...@ffmpeg.org with subject "unsubscribe".
[FFmpeg-cvslog] x86/tx_float: implement striding in fft_15xM
ffmpeg | branch: master | Lynne | Fri Sep 30 11:00:44 2022 +0200| [fab97faf02118240c28695c1a6401e7bcc4b21a8] | committer: Lynne x86/tx_float: implement striding in fft_15xM > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=fab97faf02118240c28695c1a6401e7bcc4b21a8 --- libavutil/x86/tx_float.asm | 45 + 1 file changed, 29 insertions(+), 16 deletions(-) diff --git a/libavutil/x86/tx_float.asm b/libavutil/x86/tx_float.asm index 6f83555ce5..2ad84c2885 100644 --- a/libavutil/x86/tx_float.asm +++ b/libavutil/x86/tx_float.asm @@ -24,7 +24,7 @@ ; Intra-asm call convention: ; 320 bytes of stack available ; 14 GPRs available (last 4 must not be clobbered) -; Additionally, don't clobber ctx, in, out, len, lut +; Additionally, don't clobber ctx, in, out, stride, len, lut ; All vector regs available ; TODO: @@ -863,7 +863,7 @@ FFT4_FN inv, 1, 1 %macro FFT8_SSE_FN 1 INIT_XMM sse3 %if %1 -cglobal fft8_asm_float, 0, 0, 0, ctx, out, in, tmp +cglobal fft8_asm_float, 0, 0, 0, ctx, out, in, stride, tmp movaps m0, [inq + 0*mmsize] movaps m1, [inq + 1*mmsize] movaps m2, [inq + 2*mmsize] @@ -896,7 +896,7 @@ cglobal fft8_float, 4, 4, 6, ctx, out, in, tmp %endif %if %1 -cglobal fft8_ns_float, 4, 4, 6, ctx, out, in, tmp +cglobal fft8_ns_float, 4, 5, 6, ctx, out, in, stride, tmp call mangle(ff_tx_fft8_asm_float_sse3) RET %endif @@ -908,7 +908,7 @@ FFT8_SSE_FN 1 %macro FFT8_AVX_FN 1 INIT_YMM avx %if %1 -cglobal fft8_asm_float, 0, 0, 0, ctx, out, in, tmp +cglobal fft8_asm_float, 0, 0, 0, ctx, out, in, stride, tmp movaps m0, [inq + 0*mmsize] movaps m1, [inq + 1*mmsize] %else @@ -936,7 +936,7 @@ cglobal fft8_float, 4, 4, 4, ctx, out, in, tmp %endif %if %1 -cglobal fft8_ns_float, 4, 4, 4, ctx, out, in, tmp +cglobal fft8_ns_float, 4, 5, 4, ctx, out, in, stride, tmp call mangle(ff_tx_fft8_asm_float_avx) RET %endif @@ -948,7 +948,7 @@ FFT8_AVX_FN 1 %macro FFT16_FN 2 INIT_YMM %1 %if %2 -cglobal fft16_asm_float, 0, 0, 0, ctx, out, in, tmp +cglobal fft16_asm_float, 0, 0, 0, ctx, out, in, 
stride, tmp movaps m0, [inq + 0*mmsize] movaps m1, [inq + 1*mmsize] movaps m2, [inq + 2*mmsize] @@ -985,7 +985,7 @@ cglobal fft16_float, 4, 4, 8, ctx, out, in, tmp %endif %if %2 -cglobal fft16_ns_float, 4, 4, 8, ctx, out, in, tmp +cglobal fft16_ns_float, 4, 5, 8, ctx, out, in, stride, tmp call mangle(ff_tx_fft16_asm_float_ %+ %1) RET %endif @@ -999,7 +999,7 @@ FFT16_FN fma3, 1 %macro FFT32_FN 2 INIT_YMM %1 %if %2 -cglobal fft32_asm_float, 0, 0, 0, ctx, out, in, tmp +cglobal fft32_asm_float, 0, 0, 0, ctx, out, in, stride, tmp movaps m4, [inq + 4*mmsize] movaps m5, [inq + 5*mmsize] movaps m6, [inq + 6*mmsize] @@ -1069,7 +1069,7 @@ cglobal fft32_float, 4, 4, 16, ctx, out, in, tmp %endif %if %2 -cglobal fft32_ns_float, 4, 4, 16, ctx, out, in, tmp +cglobal fft32_ns_float, 4, 5, 16, ctx, out, in, stride, tmp call mangle(ff_tx_fft32_asm_float_ %+ %1) RET %endif @@ -1123,9 +1123,9 @@ ALIGN 16 %macro FFT_SPLIT_RADIX_FN 2 INIT_YMM %1 %if %2 -cglobal fft_sr_asm_float, 0, 0, 0, ctx, out, in, tmp, len, lut, itab, rtab, tgt, off +cglobal fft_sr_asm_float, 0, 0, 0, ctx, out, in, stride, len, lut, itab, rtab, tgt, tmp %else -cglobal fft_sr_float, 4, 10, 16, 272, ctx, out, in, tmp, len, lut, itab, rtab, tgt, off +cglobal fft_sr_float, 4, 10, 16, 272, ctx, out, in, stride, len, lut, itab, rtab, tgt, tmp movsxd lenq, dword [ctxq + AVTXContext.len] mov lutq, [ctxq + AVTXContext.map] %endif @@ -1391,12 +1391,15 @@ FFT_SPLIT_RADIX_DEF 131072 ; Final synthesis + deinterleaving code ;=== .deinterleave: +%if %2 +PUSH strideq +%endif mov tgtq, lenq imul tmpq, lenq, 2 -lea offq, [4*lenq + tmpq] +lea strideq, [4*lenq + tmpq] .synth_deinterleave: -SPLIT_RADIX_COMBINE_DEINTERLEAVE_FULL tmpq, offq +SPLIT_RADIX_COMBINE_DEINTERLEAVE_FULL tmpq, strideq add outq, 8*mmsize add rtabq, 4*mmsize sub itabq, 4*mmsize @@ -1404,6 +1407,7 @@ FFT_SPLIT_RADIX_DEF 131072 jg .synth_deinterleave %if %2 +POP strideq sub outq, tmpq neg tmpq lea inq, [inq + tmpq*4] @@ -1706,6 +1710,7 @@ cglobal mdct_inv_float, 4, 
14, 16, 320, ctx, out, in, stride, len, lut, exp, t1, jge .stride4_pre .transform: +mov strideq, 2*4 mov t4q, ctxq ; backup original context mov t5q, [ctxq + AVTXContext.fn] ; subtransform's jump point mov ctxq, [ctxq + AVTXContext.sub] @@ -1767,7 +1772,7 @@ IMDCT_FN avx2 %macro PFA_15_FN 2 INIT_YMM %1 %if %2 -cglobal fft_pfa_15xM_asm_float, 0, 0, 0, ctx, out, in, stride, len, lut, buf, map, tgt, tmp, \ +cglobal fft_pfa_15xM_asm_float, 0, 8, 0, ctx, out, in, stride, len, lut, buf, map, tgt, tmp, \ tgt5, stride3,
[FFmpeg-cvslog] x86/tx_float: optimize and macro out FFT15
ffmpeg | branch: master | Lynne | Tue Sep 27 04:47:46 2022 +0200| [877e575b5d44adc252d4434d2ec53232b2000956] | committer: Lynne x86/tx_float: optimize and macro out FFT15 > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=877e575b5d44adc252d4434d2ec53232b2000956 --- libavutil/x86/tx_float.asm | 277 +++-- 1 file changed, 143 insertions(+), 134 deletions(-) diff --git a/libavutil/x86/tx_float.asm b/libavutil/x86/tx_float.asm index 5ed0007530..0061829581 100644 --- a/libavutil/x86/tx_float.asm +++ b/libavutil/x86/tx_float.asm @@ -91,7 +91,7 @@ s16_perm: dd 0, 1, 2, 3, 1, 0, 3, 2 s15_perm: dd 0, 6, 5, 3, 2, 4, 7, 1 -mask_mmpp: dd NEG, NEG, NEG, NEG, NEG, NEG, POS, POS +mask_mmpp: dd NEG, NEG, POS, POS, NEG, NEG, NEG, NEG mask_pppm: dd NEG, NEG, NEG, NEG, POS, POS, POS, NEG mask_ppmpmmpm: dd POS, POS, NEG, POS, NEG, NEG, POS, NEG mask_mppmmpmp: dd NEG, POS, POS, NEG, NEG, POS, NEG, POS @@ -307,6 +307,132 @@ SECTION .text %undef perm %endmacro +; Single 15-point complex FFT +; Input: +; xm0 must contain in[0,1].reim +; m2 - in[3-6].reim +; m3 - in[7-11].reim +; m4 - in[12-15].reim +; xm5 must contain in[2].reimreim +; +; Output: +; m0, m1, m2 - ACs +; xm14 - out[0] +; xm15 - out[10, 5] +%macro FFT15 0 +shufps xm1, xm0, xm0, q3223 ; in[1].imrereim +shufps xm0, xm0, xm0, q1001 ; in[0].imrereim + +xorps xm1, xm11 +addps xm1, xm0 ; pc[0,1].imre + +shufps xm0, xm1, xm1, q3232 ; pc[1].reimreim +addps xm0, xm5 ; dc[0].reimreim + +mulps xm1, xm9 ; tab[0123]*pc[01] + +shufpd xm6, xm1, xm1, 01b; pc[1,0].reim +xorps xm1, xm11 +addps xm1, xm1, xm6 +addsubps xm1, xm5, xm1 ; dc[1,2].reim + +subps m7, m2, m3 ; q[0-3].imre +addps m6, m2, m3 ; q[4-7] +shufps m7, m7, m7, q2301 ; q[0-3].reim + +addps m5, m4, m6 ; y[0-3] + +vperm2f128 m14, m9, m9, 0x11 ; tab[23232323] +vbroadcastsd m15, xm9; tab[01010101] + +mulps m6, m14 +mulps m7, m15 + +subps m2, m6, m7 ; k[0-3] +addps m3, m6, m7 ; k[4-7] + +shufps m12, m11, m11, q3232 ; + +addsubps m6, m4, m2 ; k[0-3] +addsubps m7, m4, m3 ; 
k[4-7] + +; 15pt from here on +vpermpd m2, m5, q0123; y[3-0] +vpermpd m3, m6, q0123; k[3-0] +vpermpd m4, m7, q0123; k[7-4] + +xorps m5, m12 +xorps m6, m12 +xorps m7, m12 + +addps m2, m5 ; t[0-3] +addps m3, m6 ; t[4-7] +addps m4, m7 ; t[8-11] + +movlhps xm14, xm2; out[0] +unpcklpd xm15, xm3, xm4 ; out[10,5] +unpckhpd xm5, xm3, xm4 ; out[10,5] + +addps xm14, xm2 ; out[0] +addps xm15, xm5 ; out[10,5] +addps xm14, xm0 ; out[0] +addps xm15, xm1 ; out[10,5] + +shufps m12, m10, m10, q3232 ; tab5 4 5 4 5 8 9 8 9 +shufps m13, m10, m10, q1010 ; tab5 6 7 6 7 10 11 10 11 + +mulps m5, m2, m12; t[0-3] +mulps m6, m3, m12; t[4-7] +mulps m7, m4, m12; t[8-11] + +mulps m2, m13; r[0-3] +mulps m3, m13; r[4-7] +mulps m4, m13; r[8-11] + +shufps m5, m5, m5, q1032 ; t[1,0,3,2].reim +shufps m6, m6, m6, q1032 ; t[5,4,7,6].reim +shufps m7, m7, m7, q1032 ; t[9,8,11,10].reim + +vperm2f128 m13, m11, m11, 0x01 ; mmpp +shufps m12, m11, m11, q3232 ; + +xorps m5, m13 +xorps m6, m13 +xorps m7, m13 + +addps m2, m5 ; r[0,1,2,3] +addps m3, m6 ; r[4,5,6,7] +addps m4, m7 ; r[8,9,10,11] + +shufps m5, m2, m2, q2301 +shufps m6, m3, m3, q2301 +shufps m7, m4, m4, q2301 + +xorps m2, m12 +xorps m3, m12 +xorps m4, m12 + +vpermpd m5, m5, q0123 +vpermpd m6, m6, q0123 +vpermpd m7, m7, q0123 + +addps m5, m2 +addps m6, m3 +addps m7, m4 + +vpermps m5, m8, m5 +vpermps m6, m8, m6 +vpermps m7, m8, m7 + +vbroadcastsd m0, xm0 ; dc[0] +vpermpd m2, m1, q; dc[2] +vbroadcastsd m1, xm1 ; dc[1] + +addps m0, m5 +addps m1, m6 +addps m2, m7 +%endmacro + ; Cobmines m0...m8 (tx1[even, even, odd, odd], tx2,3[even], tx2,3[odd]) coeffs ; Uses all 16 of registers. ; Output is slightly permuted such that tx2,3's coefficients are interleaved @@ -1610,11 +1736,10 @@ cglobal fft_pfa_15xM_float, 4, 14, 16, 320, ctx, out, in, stride, len, lut, buf, imul stride3q, strideq, 3 imul stride5q, strideq, 5 -movaps m13, [mask_mmpp]
[FFmpeg-cvslog] lavu/tx: drop requirement of input == output for in-place transforms
ffmpeg | branch: master | Lynne | Thu Nov 17 20:08:50 2022 +0100| [d4e39cae2e250a6fb9ed3a3a5a93694f4d445165] | committer: Lynne

lavu/tx: drop requirement of input == output for in-place transforms

No longer necessary.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=d4e39cae2e250a6fb9ed3a3a5a93694f4d445165
---

 libavutil/tx.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libavutil/tx.h b/libavutil/tx.h
index 3de2f7231b..cd772ad903 100644
--- a/libavutil/tx.h
+++ b/libavutil/tx.h
@@ -115,9 +115,8 @@ typedef void (*av_tx_fn)(AVTXContext *s, void *out, void *in, ptrdiff_t stride);
  */
 enum AVTXFlags {
     /**
-     * Performs an in-place transformation on the input. The output argument
-     * of av_tn_fn() MUST match the input. May be unsupported or slower for some
-     * transform types.
+     * Allows for in-place transformations, where input == output.
+     * May be unsupported or slower for some transform types.
      */
     AV_TX_INPLACE = 1ULL << 0,
[FFmpeg-cvslog] lavu/tx: support out-of-place transforms in fft_inplace
ffmpeg | branch: master | Lynne | Thu Nov 17 20:06:43 2022 +0100| [fff3e1d8489ee83949f67faba8908755846a6f4f] | committer: Lynne

lavu/tx: support out-of-place transforms in fft_inplace

This makes testing easier, as a unified path can be used for in/out of place transforms.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=fff3e1d8489ee83949f67faba8908755846a6f4f
---

 libavutil/tx_template.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index 2a8afcb02a..5274133ec4 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -773,6 +773,7 @@ static void TX_NAME(ff_tx_fft)(AVTXContext *s, void *_dst,
 static void TX_NAME(ff_tx_fft_inplace)(AVTXContext *s, void *_dst,
                                        void *_src, ptrdiff_t stride)
 {
+    TXComplex *src = _src;
     TXComplex *dst = _dst;
     TXComplex tmp;
     const int *map = s->sub->map;
@@ -781,16 +782,16 @@ static void TX_NAME(ff_tx_fft_inplace)(AVTXContext *s, void *_dst,
     src_idx = *inplace_idx++;
     do {
-        tmp = dst[src_idx];
+        tmp = src[src_idx];
         dst_idx = map[src_idx];
         do {
-            FFSWAP(TXComplex, tmp, dst[dst_idx]);
+            FFSWAP(TXComplex, tmp, src[dst_idx]);
             dst_idx = map[dst_idx];
         } while (dst_idx != src_idx); /* Can be > as well, but was less predictable */
-        dst[dst_idx] = tmp;
+        src[dst_idx] = tmp;
     } while ((src_idx = *inplace_idx++));
-    s->fn[0](&s->sub[0], dst, dst, stride);
+    s->fn[0](&s->sub[0], dst, src, stride);
 }
 static const FFTXCodelet TX_NAME(ff_tx_fft_def) = {
@@ -810,13 +811,13 @@ static const FFTXCodelet TX_NAME(ff_tx_fft_inplace_def) = {
     .name = TX_NAME_STR("fft_inplace"),
     .function = TX_NAME(ff_tx_fft_inplace),
     .type = TX_TYPE(FFT),
-    .flags = AV_TX_UNALIGNED | AV_TX_INPLACE,
+    .flags = AV_TX_UNALIGNED | FF_TX_OUT_OF_PLACE | AV_TX_INPLACE,
     .factors[0] = TX_FACTOR_ANY,
     .min_len = 2,
     .max_len = TX_LEN_UNLIMITED,
     .init = TX_NAME(ff_tx_fft_init),
     .cpu_flags = FF_TX_CPU_FLAGS_ALL,
-    .prio = FF_TX_PRIO_BASE,
+    .prio = FF_TX_PRIO_BASE - 512,
 };
 static av_cold int TX_NAME(ff_tx_fft_init_naive_small)(AVTXContext *s,
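The swap loop in ff_tx_fft_inplace applies a permutation without a second buffer by walking each cycle of the map once. The sketch below shows the same cycle-following idea in isolation. It is an assumption-laden stand-in: the FFmpeg code reads its cycle start points from a precomputed, zero-terminated index list, whereas this version uses a hypothetical per-call "done" array to find them, and it works on plain floats rather than TXComplex.

```c
#include <stdlib.h>

/*
 * Moves a[i] to a[map[i]] for every i, in place.
 * map must be a permutation of 0..len-1.
 * Returns 0 on success, -1 if the bookkeeping array cannot be allocated.
 */
static int permute_in_place(float *a, const int *map, int len)
{
    unsigned char *done;

    if (len <= 0)
        return 0;
    if (!(done = calloc((size_t)len, 1)))
        return -1;

    for (int start = 0; start < len; start++) {
        float tmp;
        int dst;

        if (done[start])
            continue; /* already placed as part of an earlier cycle */

        tmp = a[start];
        dst = map[start];
        done[start] = 1;
        while (dst != start) {
            /* Drop the carried value at its destination, pick up the old one */
            float displaced = a[dst];
            a[dst] = tmp;
            tmp = displaced;
            done[dst] = 1;
            dst = map[dst];
        }
        a[start] = tmp; /* closes the cycle */
    }

    free(done);
    return 0;
}
```

With a = { 10, 11, 12, 13 } and map = { 2, 0, 3, 1 } the array ends up as { 11, 13, 10, 12 }, since every element is written to the slot its map entry names.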
[FFmpeg-cvslog] x86/tx_float: add a standalone 15-point AVX2 transform
ffmpeg | branch: master | Lynne | Wed Sep 28 06:46:57 2022 +0200| [cc1df4045eba7273b573ecb40380f000144d] | committer: Lynne x86/tx_float: add a standalone 15-point AVX2 transform Enables its use everywhere else in the framework. > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=cc1df4045eba7273b573ecb40380f000144d --- libavutil/x86/tx_float.asm| 63 +++ libavutil/x86/tx_float_init.c | 54 + 2 files changed, 117 insertions(+) diff --git a/libavutil/x86/tx_float.asm b/libavutil/x86/tx_float.asm index 0061829581..6f83555ce5 100644 --- a/libavutil/x86/tx_float.asm +++ b/libavutil/x86/tx_float.asm @@ -1515,6 +1515,69 @@ FFT_SPLIT_RADIX_FN avx2, 1 %endif %endif +%macro FFT15_FN 2 +INIT_YMM avx2 +cglobal fft15_ %+ %2, 4, 10, 16, ctx, out, in, stride, len, lut, tmp, tgt5, stride3, stride5 +mov lutq, [ctxq + AVTXContext.map] + +imul stride3q, strideq, 3 +imul stride5q, strideq, 5 + +movaps m11, [mask_mmpp] ; mmpp +movaps m10, [tab_53_float] ; tab5 +movaps xm9, [tab_53_float + 32] ; tab3 +vpermpd m9, m9, q1110; tab[23232323] +movaps m8, [s15_perm] + +%if %1 +movups xm0, [inq] +movddup xm5, [inq + 16] +movups m2, [inq + mmsize*0 + 24] +movups m3, [inq + mmsize*1 + 24] +movups m4, [inq + mmsize*2 + 24] +%else +LOAD64_LUT xm0, inq, lutq, 0, tmpq, m14, xm15 +LOAD64_LUT m2, inq, lutq, (mmsize/2)*0 + 12, tmpq, m6, m7 +LOAD64_LUT m3, inq, lutq, (mmsize/2)*1 + 12, tmpq, m14, m15 +LOAD64_LUT m4, inq, lutq, (mmsize/2)*2 + 12, tmpq, m6, m7 +mov tmpd, [lutq + 8] +movddup xm5, [inq + tmpq*8] +%endif + +FFT15 + +lea tgt5q, [outq + stride5q] +lea tmpq, [outq + stride5q*2] + +movhps [outq], xm14 ; out[0] +movhps [outq + stride5q*1], xm15 ; out[5] +movlps [outq + stride5q*2], xm15 ; out[10] + +vextractf128 xm3, m0, 1 +vextractf128 xm4, m1, 1 +vextractf128 xm5, m2, 1 + +movlps [outq + strideq*1], xm1 +movhps [outq + strideq*2], xm2 +movlps [outq + stride3q*1], xm3 +movhps [outq + strideq*4], xm4 +movlps [outq + stride3q*2], xm0 +movlps [outq + strideq*8], xm5 +movhps [outq + 
stride3q*4], xm0 +movhps [tgt5q + strideq*2], xm1 +movhps [tgt5q + strideq*4], xm3 +movlps [tmpq + strideq*1], xm2 +movlps [tmpq + stride3q*1], xm4 +movhps [tmpq + strideq*4], xm5 + +RET +%endmacro + +%if ARCH_X86_64 && HAVE_AVX2_EXTERNAL +FFT15_FN 0, float +FFT15_FN 1, ns_float +%endif + %macro IMDCT_FN 1 INIT_YMM %1 cglobal mdct_inv_float, 4, 14, 16, 320, ctx, out, in, stride, len, lut, exp, t1, t2, t3, \ diff --git a/libavutil/x86/tx_float_init.c b/libavutil/x86/tx_float_init.c index 8e2babb539..523b43a689 100644 --- a/libavutil/x86/tx_float_init.c +++ b/libavutil/x86/tx_float_init.c @@ -30,6 +30,8 @@ TX_DECL_FN(fft8, sse3) TX_DECL_FN(fft8_ns, sse3) TX_DECL_FN(fft8, avx) TX_DECL_FN(fft8_ns, avx) +TX_DECL_FN(fft15, avx2) +TX_DECL_FN(fft15_ns, avx2) TX_DECL_FN(fft16, avx) TX_DECL_FN(fft16_ns, avx) TX_DECL_FN(fft16, fma3) @@ -85,6 +87,53 @@ static av_cold int b ##basis## _i ##interleave(AVTXContext *s, \ DECL_INIT_FN(8, 0) DECL_INIT_FN(8, 2) +static av_cold int factor_init(AVTXContext *s, const FFTXCodelet *cd, + uint64_t flags, FFTXCodeletOptions *opts, + int len, int inv, const void *scale) +{ +TX_TAB(ff_tx_init_tabs)(len); + +s->map = av_malloc(len*sizeof(s->map)); +s->map[0] = 0; /* DC is always at the start */ +if (inv) /* Reversing the ACs flips the transform direction */ +for (int i = 1; i < len; i++) +s->map[i] = len - i; +else +for (int i = 1; i < len; i++) +s->map[i] = i; + +if (len == 15) { +int cnt = 0, tmp[15]; + +/* Our 15-point transform is actually a 5x3 PFA, so embed its input map. 
*/ +memcpy(tmp, s->map, 15*sizeof(*tmp)); +for (int i = 0; i < 5; i++) +for (int j = 0; j < 3; j++) +s->map[i*3 + j] = tmp[(i*3 + j*5) % 15]; + +/* Special 15-point assembly permutation */ +memcpy(tmp, s->map, 15*sizeof(*tmp)); +for (int i = 1; i < 15; i += 3) { +s->map[cnt] = tmp[i]; +cnt++; +} +for (int i = 2; i < 15; i += 3) { +s->map[cnt] = tmp[i]; +cnt++; +} +for (int i = 0; i < 15; i += 3) { +s->map[cnt] = tmp[i]; +cnt++; +} +memmove(>map[7], >map[6], 4*sizeof(int)); +memmove(>map[3], >map[1], 4*sizeof(int)); +s->map[1] = tmp[2]; +s->map[2] = tmp[0]; +} + +return 0; +} + static av_cold int m_inv_init(AVTXContext *s, const FFTXCodelet *cd, uint64_t flags, FFTXCodeletOptions *opts,
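The 5x3 prime-factor input map that factor_init embeds can be checked on its own. The sketch below covers only the forward direction, where the starting map is the identity so tmp[k] equals k, and it leaves out the later assembly-specific reshuffle. The function names are hypothetical and exist only for this illustration.

```c
/* Fills map[0..14] with the forward 5x3 prime-factor input order. */
static void pfa_5x3_input_map(int map[15])
{
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 3; j++)
            map[i*3 + j] = (i*3 + j*5) % 15;
}

/* Returns 1 if map holds each of 0..len-1 exactly once (len <= 32). */
static int is_permutation(const int *map, int len)
{
    unsigned seen = 0; /* one bit per index */

    for (int i = 0; i < len; i++) {
        if (map[i] < 0 || map[i] >= len || (seen & (1u << map[i])))
            return 0;
        seen |= 1u << map[i];
    }
    return 1;
}
```

Because 3 and 5 are coprime, every input index appears exactly once. The first row of the map is 0, 5, 10 and the last is 12, 2, 7.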
[FFmpeg-cvslog] lavu/tx: add fft_inplace_small transforms
ffmpeg | branch: master | Lynne | Thu Nov 17 20:10:45 2022 +0100| [68cabf875015610decda7e564dc5697f6c21f707] | committer: Lynne

lavu/tx: add fft_inplace_small transforms

This is much faster than the loop.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=68cabf875015610decda7e564dc5697f6c21f707
---

 libavutil/tx_template.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index 5274133ec4..747731a06d 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -754,20 +754,34 @@ static av_cold int TX_NAME(ff_tx_fft_init)(AVTXContext *s,
     return 0;
 }
+static av_cold int TX_NAME(ff_tx_fft_inplace_small_init)(AVTXContext *s,
+                                                         const FFTXCodelet *cd,
+                                                         uint64_t flags,
+                                                         FFTXCodeletOptions *opts,
+                                                         int len, int inv,
+                                                         const void *scale)
+{
+    if (!(s->tmp = av_malloc(len*sizeof(*s->tmp))))
+        return AVERROR(ENOMEM);
+    flags &= ~AV_TX_INPLACE;
+    return TX_NAME(ff_tx_fft_init)(s, cd, flags, opts, len, inv, scale);
+}
+
 static void TX_NAME(ff_tx_fft)(AVTXContext *s, void *_dst,
                                void *_src, ptrdiff_t stride)
 {
     TXComplex *src = _src;
-    TXComplex *dst = _dst;
+    TXComplex *dst1 = s->flags & AV_TX_INPLACE ? s->tmp : _dst;
+    TXComplex *dst2 = _dst;
     int *map = s->sub[0].map;
     int len = s->len;
     /* Compilers can't vectorize this anyway without assuming AVX2, which they
      * generally don't, at least without -march=native -mtune=native */
     for (int i = 0; i < len; i++)
-        dst[i] = src[map[i]];
+        dst1[i] = src[map[i]];
-    s->fn[0](&s->sub[0], dst, dst, stride);
+    s->fn[0](&s->sub[0], dst2, dst1, stride);
 }
 static void TX_NAME(ff_tx_fft_inplace)(AVTXContext *s, void *_dst,
@@ -807,6 +821,19 @@ static const FFTXCodelet TX_NAME(ff_tx_fft_def) = {
     .prio = FF_TX_PRIO_BASE,
 };
+static const FFTXCodelet TX_NAME(ff_tx_fft_inplace_small_def) = {
+    .name = TX_NAME_STR("fft_inplace_small"),
+    .function = TX_NAME(ff_tx_fft),
+    .type = TX_TYPE(FFT),
+    .flags = AV_TX_UNALIGNED | FF_TX_OUT_OF_PLACE | AV_TX_INPLACE,
+    .factors[0] = TX_FACTOR_ANY,
+    .min_len = 2,
+    .max_len = 65536,
+    .init = TX_NAME(ff_tx_fft_inplace_small_init),
+    .cpu_flags = FF_TX_CPU_FLAGS_ALL,
+    .prio = FF_TX_PRIO_BASE - 256,
+};
+
 static const FFTXCodelet TX_NAME(ff_tx_fft_inplace_def) = {
     .name = TX_NAME_STR("fft_inplace"),
     .function = TX_NAME(ff_tx_fft_inplace),
@@ -1638,6 +1665,7 @@ const FFTXCodelet * const TX_NAME(ff_tx_codelet_list)[] = {
     /* Standalone transforms */
     &TX_NAME(ff_tx_fft_def),
     &TX_NAME(ff_tx_fft_inplace_def),
+    &TX_NAME(ff_tx_fft_inplace_small_def),
     &TX_NAME(ff_tx_fft_pfa_3xM_def),
     &TX_NAME(ff_tx_fft_pfa_5xM_def),
     &TX_NAME(ff_tx_fft_pfa_7xM_def),
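The approach taken by fft_inplace_small, gathering through the map into a scratch buffer whenever input and output alias, can be sketched as follows. This is only an illustration under stated assumptions: the function name and the caller-provided scratch buffer are hypothetical, floats stand in for TXComplex, and a plain memcpy() takes the place of the sub-transform that reads the staged data in the real code.

```c
#include <string.h>

/*
 * Computes out[i] = in[map[i]] for 0 <= i < len.
 * When out and in are the same buffer, the gather is staged in scratch
 * (at least len elements) so that no input is overwritten before it is read.
 */
static void gather_via_scratch(float *out, const float *in, const int *map,
                               float *scratch, int len)
{
    float *stage = (out == in) ? scratch : out;

    for (int i = 0; i < len; i++)
        stage[i] = in[map[i]];

    /* Stand-in for the sub-transform, which consumes the staged data */
    if (stage != out)
        memcpy(out, stage, (size_t)len * sizeof(*out));
}
```

The cost is one extra buffer of len elements, which is presumably why the codelet in the patch caps max_len at 65536 and leaves longer in-place transforms to the swap loop.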
[FFmpeg-cvslog] lavu/tx: make C ptwo transforms in+out of place
ffmpeg | branch: master | Lynne | Thu Nov 17 20:03:09 2022 +0100| [d260796f119682274c83e2f1465f56f3e314c4a4] | committer: Lynne lavu/tx: make C ptwo transforms in+out of place We assume that _all_ in-place transforms can operate out of place, which isn't true, because the C ptwo transforms were always in-place (dst). > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=d260796f119682274c83e2f1465f56f3e314c4a4 --- libavutil/tx_template.c | 117 +--- 1 file changed, 61 insertions(+), 56 deletions(-) diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c index f53a241248..2a8afcb02a 100644 --- a/libavutil/tx_template.c +++ b/libavutil/tx_template.c @@ -611,8 +611,8 @@ static const FFTXCodelet TX_NAME(ff_tx_fft##n##_ns_def) = { \ .name = TX_NAME_STR("fft" #n "_ns"), \ .function = TX_NAME(ff_tx_fft##n##_ns), \ .type = TX_TYPE(FFT), \ -.flags = AV_TX_INPLACE | AV_TX_UNALIGNED | \ - FF_TX_PRESHUFFLE, \ +.flags = FF_TX_OUT_OF_PLACE | AV_TX_INPLACE | \ + AV_TX_UNALIGNED | FF_TX_PRESHUFFLE, \ .factors[0] = 2,\ .min_len= n,\ .max_len= n,\ @@ -621,70 +621,75 @@ static const FFTXCodelet TX_NAME(ff_tx_fft##n##_ns_def) = { \ .prio = FF_TX_PRIO_BASE, \ }; -#define DECL_SR_CODELET(n, n2, n4) \ -static void TX_NAME(ff_tx_fft##n##_ns)(AVTXContext *s, void *dst,\ -void *src, ptrdiff_t stride) \ -{\ -TXComplex *z = dst; \ -const TXSample *cos = TX_TAB(ff_tx_tab_##n); \ - \ -TX_NAME(ff_tx_fft##n2##_ns)(s, z,z,stride); \ -TX_NAME(ff_tx_fft##n4##_ns)(s, z + n4*2, z + n4*2, stride); \ -TX_NAME(ff_tx_fft##n4##_ns)(s, z + n4*3, z + n4*3, stride); \ -TX_NAME(ff_tx_fft_sr_combine)(z, cos, n4 >> 1); \ -}\ - \ +#define DECL_SR_CODELET(n, n2, n4)\ +static void TX_NAME(ff_tx_fft##n##_ns)(AVTXContext *s, void *_dst,\ +void *_src, ptrdiff_t stride) \ +{ \ +TXComplex *src = _src;\ +TXComplex *dst = _dst;\ +const TXSample *cos = TX_TAB(ff_tx_tab_##n); \ + \ +TX_NAME(ff_tx_fft##n2##_ns)(s, dst,src,stride); \ +TX_NAME(ff_tx_fft##n4##_ns)(s, dst + n4*2, src + n4*2, stride); \ 
+TX_NAME(ff_tx_fft##n4##_ns)(s, dst + n4*3, src + n4*3, stride); \ +TX_NAME(ff_tx_fft_sr_combine)(dst, cos, n4 >> 1); \ +} \ + \ DECL_SR_CODELET_DEF(n) -static void TX_NAME(ff_tx_fft2_ns)(AVTXContext *s, void *dst, - void *src, ptrdiff_t stride) +static void TX_NAME(ff_tx_fft2_ns)(AVTXContext *s, void *_dst, + void *_src, ptrdiff_t stride) { -TXComplex *z = dst; +TXComplex *src = _src; +TXComplex *dst = _dst; TXComplex tmp; -BF(tmp.re, z[0].re, z[0].re, z[1].re); -BF(tmp.im, z[0].im, z[0].im, z[1].im); -z[1] = tmp; +BF(tmp.re, dst[0].re, src[0].re, src[1].re); +BF(tmp.im, dst[0].im, src[0].im, src[1].im); +dst[1] = tmp; } -static void TX_NAME(ff_tx_fft4_ns)(AVTXContext *s, void *dst, - void *src, ptrdiff_t stride) +static void TX_NAME(ff_tx_fft4_ns)(AVTXContext *s, void *_dst, + void *_src, ptrdiff_t stride) { -TXComplex *z = dst; +TXComplex *src = _src; +TXComplex *dst = _dst; TXSample t1, t2, t3, t4, t5, t6, t7, t8; -BF(t3, t1, z[0].re, z[1].re); -BF(t8, t6, z[3].re, z[2].re); -BF(z[2].re, z[0].re, t1, t6); -BF(t4, t2, z[0].im, z[1].im); -BF(t7, t5, z[2].im, z[3].im); -BF(z[3].im, z[1].im, t4, t8); -BF(z[3].re, z[1].re, t3, t7); -BF(z[2].im, z[0].im, t2, t5); +BF(t3, t1, src[0].re, src[1].re); +BF(t8, t6, src[3].re, src[2].re); +BF(dst[2].re, dst[0].re, t1, t6); +BF(t4, t2, src[0].im, src[1].im); +
[FFmpeg-cvslog] lavu/tx: support output stride in naive transforms
ffmpeg | branch: master | Lynne | Wed Sep 28 12:59:08 2022 +0200| [fbe4fd992f4327fcf17b2a76a823c38945b0ea13] | committer: Lynne

lavu/tx: support output stride in naive transforms

Allows them to be used in general PFAs.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=fbe4fd992f4327fcf17b2a76a823c38945b0ea13
---

 libavutil/tx_template.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index 747731a06d..228209521b 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -880,6 +880,8 @@ static void TX_NAME(ff_tx_fft_naive)(AVTXContext *s, void *_dst, void *_src,
     const int n = s->len;
     double phase = s->inv ? 2.0*M_PI/n : -2.0*M_PI/n;
+    stride /= sizeof(*dst);
+
     for (int i = 0; i < n; i++) {
         TXComplex tmp = { 0 };
         for (int j = 0; j < n; j++) {
@@ -893,7 +895,7 @@ static void TX_NAME(ff_tx_fft_naive)(AVTXContext *s, void *_dst, void *_src,
             tmp.re += res.re;
             tmp.im += res.im;
         }
-        dst[i] = tmp;
+        dst[i*stride] = tmp;
     }
 }
@@ -904,6 +906,8 @@ static void TX_NAME(ff_tx_fft_naive_small)(AVTXContext *s, void *_dst, void *_sr
     TXComplex *dst = _dst;
     const int n = s->len;
+    stride /= sizeof(*dst);
+
     for (int i = 0; i < n; i++) {
         TXComplex tmp = { 0 };
         for (int j = 0; j < n; j++) {
@@ -913,7 +917,7 @@ static void TX_NAME(ff_tx_fft_naive_small)(AVTXContext *s, void *_dst, void *_sr
             tmp.re += res.re;
             tmp.im += res.im;
         }
-        dst[i] = tmp;
+        dst[i*stride] = tmp;
     }
 }
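The byte-to-element stride conversion is easy to get wrong, so here is a small standalone sketch of a table-driven O(n^2) DFT that writes its output with a stride. The Cplx type and the function name are hypothetical stand-ins, and the twiddle table is assumed to be supplied by the caller, indexed by the product i*j in the same way the naive_small codelet indexes s->exp.

```c
#include <stddef.h>

typedef struct Cplx { float re, im; } Cplx; /* stand-in for TXComplex */

/*
 * Table-driven O(n^2) DFT.
 * tab[k] must hold the twiddle factor for every product k = i*j with
 * 0 <= i, j < n. stride is the distance between outputs in bytes.
 */
static void dft_table_strided(Cplx *dst, const Cplx *src, const Cplx *tab,
                              int n, ptrdiff_t stride)
{
    stride /= (ptrdiff_t)sizeof(*dst); /* bytes -> elements */

    for (int i = 0; i < n; i++) {
        Cplx acc = { 0.0f, 0.0f };
        for (int j = 0; j < n; j++) {
            const Cplx w = tab[i*j];
            /* Complex multiply-accumulate: acc += src[j] * w */
            acc.re += src[j].re*w.re - src[j].im*w.im;
            acc.im += src[j].re*w.im + src[j].im*w.re;
        }
        dst[i*stride] = acc;
    }
}
```

Passing stride = sizeof(Cplx) gives the usual contiguous output. A larger stride leaves gaps between outputs, which is what lets a prime-factor decomposition interleave the results of several short transforms in one buffer.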
[FFmpeg-cvslog] lavu/tx: add naive_small FFT
ffmpeg | branch: master | Lynne | Sun Sep 25 08:19:17 2022 +0200| [37008dc4026c6a460c454a95f3f2766afbc702e3] | committer: Lynne

lavu/tx: add naive_small FFT

The same as naive but with precomputed tables. Makes it more useful for odd-factors we don't support yet.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=37008dc4026c6a460c454a95f3f2766afbc702e3
---

 libavutil/tx_template.c | 63 +++--
 1 file changed, 61 insertions(+), 2 deletions(-)

diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index d72281f09c..f53a241248 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -814,6 +814,31 @@ static const FFTXCodelet TX_NAME(ff_tx_fft_inplace_def) = {
     .prio = FF_TX_PRIO_BASE,
 };
+static av_cold int TX_NAME(ff_tx_fft_init_naive_small)(AVTXContext *s,
+                                                       const FFTXCodelet *cd,
+                                                       uint64_t flags,
+                                                       FFTXCodeletOptions *opts,
+                                                       int len, int inv,
+                                                       const void *scale)
+{
+    const double phase = s->inv ? 2.0*M_PI/len : -2.0*M_PI/len;
+
+    if (!(s->exp = av_malloc(len*len*sizeof(*s->exp))))
+        return AVERROR(ENOMEM);
+
+    for (int i = 0; i < len; i++) {
+        for (int j = 0; j < len; j++) {
+            const double factor = phase*i*j;
+            s->exp[i*j] = (TXComplex){
+                RESCALE(cos(factor)),
+                RESCALE(sin(factor)),
+            };
+        }
+    }
+
+    return 0;
+}
+
 static void TX_NAME(ff_tx_fft_naive)(AVTXContext *s, void *_dst, void *_src,
                                      ptrdiff_t stride)
 {
@@ -822,9 +847,9 @@ static void TX_NAME(ff_tx_fft_naive)(AVTXContext *s, void *_dst, void *_src,
     const int n = s->len;
     double phase = s->inv ? 2.0*M_PI/n : -2.0*M_PI/n;
-    for(int i = 0; i < n; i++) {
+    for (int i = 0; i < n; i++) {
         TXComplex tmp = { 0 };
-        for(int j = 0; j < n; j++) {
+        for (int j = 0; j < n; j++) {
             const double factor = phase*i*j;
             const TXComplex mult = {
                 RESCALE(cos(factor)),
@@ -839,6 +864,39 @@ static void TX_NAME(ff_tx_fft_naive)(AVTXContext *s, void *_dst, void *_src,
     }
 }
+static void TX_NAME(ff_tx_fft_naive_small)(AVTXContext *s, void *_dst, void *_src,
+                                           ptrdiff_t stride)
+{
+    TXComplex *src = _src;
+    TXComplex *dst = _dst;
+    const int n = s->len;
+
+    for (int i = 0; i < n; i++) {
+        TXComplex tmp = { 0 };
+        for (int j = 0; j < n; j++) {
+            TXComplex res;
+            const TXComplex mult = s->exp[i*j];
+            CMUL3(res, src[j], mult);
+            tmp.re += res.re;
+            tmp.im += res.im;
+        }
+        dst[i] = tmp;
+    }
+}
+
+static const FFTXCodelet TX_NAME(ff_tx_fft_naive_small_def) = {
+    .name = TX_NAME_STR("fft_naive_small"),
+    .function = TX_NAME(ff_tx_fft_naive_small),
+    .type = TX_TYPE(FFT),
+    .flags = AV_TX_UNALIGNED | FF_TX_OUT_OF_PLACE,
+    .factors[0] = TX_FACTOR_ANY,
+    .min_len = 2,
+    .max_len = 1024,
+    .init = TX_NAME(ff_tx_fft_init_naive_small),
+    .cpu_flags = FF_TX_CPU_FLAGS_ALL,
+    .prio = FF_TX_PRIO_MIN/2,
+};
+
 static const FFTXCodelet TX_NAME(ff_tx_fft_naive_def) = {
     .name = TX_NAME_STR("fft_naive"),
     .function = TX_NAME(ff_tx_fft_naive),
@@ -1580,6 +1638,7 @@ const FFTXCodelet * const TX_NAME(ff_tx_codelet_list)[] = {
     &TX_NAME(ff_tx_fft_pfa_9xM_def),
     &TX_NAME(ff_tx_fft_pfa_15xM_def),
     &TX_NAME(ff_tx_fft_naive_def),
+    &TX_NAME(ff_tx_fft_naive_small_def),
     &TX_NAME(ff_tx_mdct_fwd_def),
     &TX_NAME(ff_tx_mdct_inv_def),
     &TX_NAME(ff_tx_mdct_pfa_3xM_fwd_def),
[FFmpeg-cvslog] lavu/tx: generalize single-factor transforms
ffmpeg | branch: master | Lynne | Sat Sep 24 06:49:16 2022 +0200| [45bd4bf79f9b69ac4cec1bd00c433407b3aa7ae4] | committer: Lynne

lavu/tx: generalize single-factor transforms

Not that useful, but it gives us fast small odd-length transforms.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=45bd4bf79f9b69ac4cec1bd00c433407b3aa7ae4
---

 libavutil/tx.c | 4 ++--
 libavutil/tx_priv.h | 2 +-
 libavutil/tx_template.c | 46 +++---
 3 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/libavutil/tx.c b/libavutil/tx.c
index 556fcbb94b..246a7aa980 100644
--- a/libavutil/tx.c
+++ b/libavutil/tx.c
@@ -121,9 +121,9 @@ int ff_tx_gen_ptwo_revtab(AVTXContext *s, int invert_lookup)
     return 0;
 }
 
-int ff_tx_gen_ptwo_inplace_revtab_idx(AVTXContext *s)
+int ff_tx_gen_inplace_map(AVTXContext *s, int len)
 {
-    int *src_map, out_map_idx = 0, len = s->len;
+    int *src_map, out_map_idx = 0;
 
     if (!s->sub || !s->sub->map)
         return AVERROR(EINVAL);
diff --git a/libavutil/tx_priv.h b/libavutil/tx_priv.h
index fb61119009..56e78631ba 100644
--- a/libavutil/tx_priv.h
+++ b/libavutil/tx_priv.h
@@ -259,7 +259,7 @@ int ff_tx_gen_ptwo_revtab(AVTXContext *s, int invert_lookup);
  * specific order, allows the revtab to be done in-place. The sub-transform
  * and its map should already be initialized.
  */
-int ff_tx_gen_ptwo_inplace_revtab_idx(AVTXContext *s);
+int ff_tx_gen_inplace_map(AVTXContext *s, int len);
 
 /*
  * This generates a parity-based revtab of length len and direction inv.
diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index 6c8d0a1ebc..b547800447 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -650,12 +650,12 @@ DECL_SR_CODELET(32768,16384,8192)
 DECL_SR_CODELET(65536,32768,16384)
 DECL_SR_CODELET(131072,65536,32768)
 
-static av_cold int TX_NAME(ff_tx_fft_sr_init)(AVTXContext *s,
-                                              const FFTXCodelet *cd,
-                                              uint64_t flags,
-                                              FFTXCodeletOptions *opts,
-                                              int len, int inv,
-                                              const void *scale)
+static av_cold int TX_NAME(ff_tx_fft_init)(AVTXContext *s,
+                                           const FFTXCodelet *cd,
+                                           uint64_t flags,
+                                           FFTXCodeletOptions *opts,
+                                           int len, int inv,
+                                           const void *scale)
 {
     int ret;
     int is_inplace = !!(flags & AV_TX_INPLACE);
@@ -668,14 +668,14 @@ static av_cold int TX_NAME(ff_tx_fft_sr_init)(AVTXContext *s,
     if ((ret = ff_tx_init_subtx(s, TX_TYPE(FFT), flags, &sub_opts, len, inv, scale)))
         return ret;
 
-    if (is_inplace && (ret = ff_tx_gen_ptwo_inplace_revtab_idx(s)))
+    if (is_inplace && (ret = ff_tx_gen_inplace_map(s, len)))
         return ret;
 
     return 0;
 }
 
-static void TX_NAME(ff_tx_fft_sr)(AVTXContext *s, void *_dst,
-                                  void *_src, ptrdiff_t stride)
+static void TX_NAME(ff_tx_fft)(AVTXContext *s, void *_dst,
+                               void *_src, ptrdiff_t stride)
 {
     TXComplex *src = _src;
     TXComplex *dst = _dst;
@@ -690,8 +690,8 @@ static void TX_NAME(ff_tx_fft_sr)(AVTXContext *s, void *_dst,
     s->fn[0](&s->sub[0], dst, dst, stride);
 }
 
-static void TX_NAME(ff_tx_fft_sr_inplace)(AVTXContext *s, void *_dst,
-                                          void *_src, ptrdiff_t stride)
+static void TX_NAME(ff_tx_fft_inplace)(AVTXContext *s, void *_dst,
+                                       void *_src, ptrdiff_t stride)
 {
     TXComplex *dst = _dst;
     TXComplex tmp;
@@ -713,28 +713,28 @@ static void TX_NAME(ff_tx_fft_sr_inplace)(AVTXContext *s, void *_dst,
     s->fn[0](&s->sub[0], dst, dst, stride);
 }
 
-static const FFTXCodelet TX_NAME(ff_tx_fft_sr_def) = {
-    .name       = TX_NAME_STR("fft_sr"),
-    .function   = TX_NAME(ff_tx_fft_sr),
+static const FFTXCodelet TX_NAME(ff_tx_fft_def) = {
+    .name       = TX_NAME_STR("fft"),
+    .function   = TX_NAME(ff_tx_fft),
     .type       = TX_TYPE(FFT),
     .flags      = AV_TX_UNALIGNED | FF_TX_OUT_OF_PLACE,
-    .factors[0] = 2,
+    .factors[0] = TX_FACTOR_ANY,
     .min_len    = 2,
     .max_len    = TX_LEN_UNLIMITED,
-    .init       = TX_NAME(ff_tx_fft_sr_init),
+    .init       = TX_NAME(ff_tx_fft_init),
     .cpu_flags  = FF_TX_CPU_FLAGS_ALL,
     .prio       = FF_TX_PRIO_BASE,
 };
 
-static const FFTXCodelet TX_NAME(ff_tx_fft_sr_inplace_def) = {
-    .name       = TX_NAME_STR("fft_sr_inplace"),
-    .function   = TX_NAME(ff_tx_fft_sr_inplace),
+static const FFTXCodelet TX_NAME(ff_tx_fft_inplace_def) = {
+    .name       = TX_NAME_STR("fft_inplace"),
+    .function   = TX_NAME(ff_tx_fft_inplace),
     .type       = TX_TYPE(FFT),
[FFmpeg-cvslog] lavu/tx: make prime factor transforms truly in-place
ffmpeg | branch: master | Lynne | Sat Sep 24 06:47:21 2022 +0200| [79f11e24098d6392015656897bc7842c9d2aea43] | committer: Lynne

lavu/tx: make prime factor transforms truly in-place

They all overwrote in[0] and then used it as a DC.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=79f11e24098d6392015656897bc7842c9d2aea43
---

 libavutil/tx_template.c | 108 +---
 1 file changed, 56 insertions(+), 52 deletions(-)

diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index 666af5e496..6c8d0a1ebc 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -171,36 +171,37 @@ av_cold void TX_TAB(ff_tx_init_tabs)(int len)
 static av_always_inline void fft3(TXComplex *out, TXComplex *in,
                                   ptrdiff_t stride)
 {
-    TXComplex tmp[2];
+    TXComplex tmp[3];
     const TXSample *tab = TX_TAB(ff_tx_tab_53);
 #ifdef TX_INT32
     int64_t mtmp[4];
 #endif
 
-    BF(tmp[0].re, tmp[1].im, in[1].im, in[2].im);
-    BF(tmp[0].im, tmp[1].re, in[1].re, in[2].re);
+    tmp[0] = in[0];
+    BF(tmp[1].re, tmp[2].im, in[1].im, in[2].im);
+    BF(tmp[1].im, tmp[2].re, in[1].re, in[2].re);
 
-    out[0*stride].re = in[0].re + tmp[1].re;
-    out[0*stride].im = in[0].im + tmp[1].im;
+    out[0*stride].re = tmp[0].re + tmp[2].re;
+    out[0*stride].im = tmp[0].im + tmp[2].im;
 
 #ifdef TX_INT32
-    mtmp[0] = (int64_t)tab[ 8] * tmp[0].re;
-    mtmp[1] = (int64_t)tab[ 9] * tmp[0].im;
-    mtmp[2] = (int64_t)tab[10] * tmp[1].re;
-    mtmp[3] = (int64_t)tab[10] * tmp[1].im;
-    out[1*stride].re = in[0].re - (mtmp[2] + mtmp[0] + 0x4000 >> 31);
-    out[1*stride].im = in[0].im - (mtmp[3] - mtmp[1] + 0x4000 >> 31);
-    out[2*stride].re = in[0].re - (mtmp[2] - mtmp[0] + 0x4000 >> 31);
-    out[2*stride].im = in[0].im - (mtmp[3] + mtmp[1] + 0x4000 >> 31);
+    mtmp[0] = (int64_t)tab[ 8] * tmp[1].re;
+    mtmp[1] = (int64_t)tab[ 9] * tmp[1].im;
+    mtmp[2] = (int64_t)tab[10] * tmp[2].re;
+    mtmp[3] = (int64_t)tab[10] * tmp[2].im;
+    out[1*stride].re = tmp[0].re - (mtmp[2] + mtmp[0] + 0x4000 >> 31);
+    out[1*stride].im = tmp[0].im - (mtmp[3] - mtmp[1] + 0x4000 >> 31);
+    out[2*stride].re = tmp[0].re - (mtmp[2] - mtmp[0] + 0x4000 >> 31);
+    out[2*stride].im = tmp[0].im - (mtmp[3] + mtmp[1] + 0x4000 >> 31);
 #else
-    tmp[0].re = tab[ 8] * tmp[0].re;
-    tmp[0].im = tab[ 9] * tmp[0].im;
-    tmp[1].re = tab[10] * tmp[1].re;
-    tmp[1].im = tab[10] * tmp[1].im;
-    out[1*stride].re = in[0].re - tmp[1].re + tmp[0].re;
-    out[1*stride].im = in[0].im - tmp[1].im - tmp[0].im;
-    out[2*stride].re = in[0].re - tmp[1].re - tmp[0].re;
-    out[2*stride].im = in[0].im - tmp[1].im + tmp[0].im;
+    tmp[1].re = tab[ 8] * tmp[1].re;
+    tmp[1].im = tab[ 9] * tmp[1].im;
+    tmp[2].re = tab[10] * tmp[2].re;
+    tmp[2].im = tab[10] * tmp[2].im;
+    out[1*stride].re = tmp[0].re - tmp[2].re + tmp[1].re;
+    out[1*stride].im = tmp[0].im - tmp[2].im - tmp[1].im;
+    out[2*stride].re = tmp[0].re - tmp[2].re - tmp[1].re;
+    out[2*stride].im = tmp[0].im - tmp[2].im + tmp[1].im;
 #endif
 }
 
@@ -208,16 +209,17 @@ static av_always_inline void fft3(TXComplex *out, TXComplex *in,
 static av_always_inline void NAME(TXComplex *out, TXComplex *in,\
                                   ptrdiff_t stride) \
 { \
-    TXComplex z0[4], t[6]; \
+    TXComplex dc, z0[4], t[6]; \
     const TXSample *tab = TX_TAB(ff_tx_tab_53); \
  \
+    dc = in[0]; \
     BF(t[1].im, t[0].re, in[1].re, in[4].re); \
     BF(t[1].re, t[0].im, in[1].im, in[4].im); \
     BF(t[3].im, t[2].re, in[2].re, in[3].re); \
     BF(t[3].re, t[2].im, in[2].im, in[3].im); \
  \
-    out[D0*stride].re = in[0].re + t[0].re + t[2].re; \
-    out[D0*stride].im = in[0].im + t[0].im + t[2].im; \
+    out[D0*stride].re = dc.re + t[0].re + t[2].re; \
+    out[D0*stride].im = dc.im + t[0].im + t[2].im; \
  \
     SMUL(t[4].re, t[0].re, tab[0], tab[2], t[2].re, t[0].re); \
     SMUL(t[4].im, t[0].im, tab[0], tab[2], t[2].im, t[0].im); \
@@ -229,14 +231,14 @@ static av_always_inline void NAME(TXComplex *out, TXComplex *in,\
     BF(z0[2].re, z0[1].re, t[4].re, t[5].re); \
     BF(z0[2].im, z0[1].im, t[4].im, t[5].im); \
  \
-    out[D1*stride].re =
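The aliasing problem this commit fixes can be reproduced outside FFmpeg. The sketch below is a minimal illustration, not the library code: cplx, dft3_unsafe and dft3_safe are hypothetical names, the samples are plain doubles, and the stride and fixed-point paths of fft3 are left out. Both functions compute a forward 3-point DFT. The first reads in[0] again after out[0] has been stored; the second copies the DC term into a local first, which is what the patch does with tmp[0] and dc.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double re, im; } cplx; /* hypothetical stand-in for TXComplex */

static const double C3 = -0.5;                   /* cos(2*pi/3) */
static const double S3 = 0.86602540378443864676; /* sin(2*pi/3) */

/* Re-reads in[0] after out[0] was stored: wrong result when out == in. */
static void dft3_unsafe(cplx *out, const cplx *in)
{
    assert(out != NULL && in != NULL);

    const cplx sum  = { in[1].re + in[2].re, in[1].im + in[2].im };
    const cplx diff = { in[1].re - in[2].re, in[1].im - in[2].im };

    out[0].re = in[0].re + sum.re;
    out[0].im = in[0].im + sum.im;
    out[1].re = in[0].re + C3*sum.re + S3*diff.im; /* in[0] may be clobbered */
    out[1].im = in[0].im + C3*sum.im - S3*diff.re;
    out[2].re = in[0].re + C3*sum.re - S3*diff.im;
    out[2].im = in[0].im + C3*sum.im + S3*diff.re;
}

/* Copies the DC input before the first store, so out may alias in. */
static void dft3_safe(cplx *out, const cplx *in)
{
    assert(out != NULL && in != NULL);

    const cplx dc   = in[0];
    const cplx sum  = { in[1].re + in[2].re, in[1].im + in[2].im };
    const cplx diff = { in[1].re - in[2].re, in[1].im - in[2].im };

    out[0].re = dc.re + sum.re;
    out[0].im = dc.im + sum.im;
    out[1].re = dc.re + C3*sum.re + S3*diff.im;
    out[1].im = dc.im + C3*sum.im - S3*diff.re;
    out[2].re = dc.re + C3*sum.re - S3*diff.im;
    out[2].im = dc.im + C3*sum.im + S3*diff.re;
}
```

For the real input {1, 2, 3} transformed in place, dft3_safe leaves -1.5 in the real part of bin 1, while dft3_unsafe leaves 3.5, because it adds the already-written sum 6 where the original sample 1 was expected.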
[FFmpeg-cvslog] lavu/tx: list all odd-length FFT factors as regular codelets
ffmpeg | branch: master | Lynne | Sat Sep 24 06:50:17 2022 +0200| [e8a9b7b29877db9e3887562007df7a53325b67d1] | committer: Lynne

lavu/tx: list all odd-length FFT factors as regular codelets

Allows them to be picked just like any other transform.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=e8a9b7b29877db9e3887562007df7a53325b67d1
---

 libavutil/tx_template.c | 88 +
 1 file changed, 88 insertions(+)

diff --git a/libavutil/tx_template.c b/libavutil/tx_template.c
index b547800447..d72281f09c 100644
--- a/libavutil/tx_template.c
+++ b/libavutil/tx_template.c
@@ -472,6 +472,81 @@ static av_always_inline void fft15(TXComplex *out, TXComplex *in,
     fft5_m3(out, tmp + 10, stride);
 }
 
+static av_cold int TX_NAME(ff_tx_fft_factor_init)(AVTXContext *s,
+                                                  const FFTXCodelet *cd,
+                                                  uint64_t flags,
+                                                  FFTXCodeletOptions *opts,
+                                                  int len, int inv,
+                                                  const void *scale)
+{
+    TX_TAB(ff_tx_init_tabs)(len);
+
+    if (flags & FF_TX_PRESHUFFLE) {
+        s->map = av_malloc(len*sizeof(s->map));
+        s->map[0] = 0; /* DC is always at the start */
+        if (inv) /* Reversing the ACs flips the transform direction */
+            for (int i = 1; i < len; i++)
+                s->map[i] = len - i;
+        else
+            for (int i = 1; i < len; i++)
+                s->map[i] = i;
+    }
+
+    /* Our 15-point transform is actually a 5x3 PFA, so embed its input map. */
+    if (len == 15) {
+        int tmp[15];
+        memcpy(tmp, s->map, 15*sizeof(*tmp));
+        for (int i = 0; i < 5; i++) {
+            for (int j = 0; j < 3; j++)
+                s->map[i*3 + j] = tmp[(i*3 + j*5) % 15];
+        }
+    }
+
+    return 0;
+}
+
+#define DECL_FACTOR_S(n) \
+static void TX_NAME(ff_tx_fft##n)(AVTXContext *s, void *dst, \
+                                  void *src, ptrdiff_t stride) \
+{ \
+    fft##n((TXComplex *)dst, (TXComplex *)src, stride / sizeof(TXComplex)); \
+} \
+static const FFTXCodelet TX_NAME(ff_tx_fft##n##_ns_def) = { \
+    .name       = TX_NAME_STR("fft" #n "_ns"), \
+    .function   = TX_NAME(ff_tx_fft##n), \
+    .type       = TX_TYPE(FFT), \
+    .flags      = AV_TX_INPLACE | FF_TX_OUT_OF_PLACE | \
+                  AV_TX_UNALIGNED | FF_TX_PRESHUFFLE, \
+    .factors[0] = n, \
+    .min_len    = n, \
+    .max_len    = n, \
+    .init       = TX_NAME(ff_tx_fft_factor_init), \
+    .cpu_flags  = FF_TX_CPU_FLAGS_ALL, \
+    .prio       = FF_TX_PRIO_BASE, \
+};
+
+#define DECL_FACTOR_F(n) \
+DECL_FACTOR_S(n) \
+static const FFTXCodelet TX_NAME(ff_tx_fft##n##_fwd_def) = { \
+    .name       = TX_NAME_STR("fft" #n "_fwd"), \
+    .function   = TX_NAME(ff_tx_fft##n), \
+    .type       = TX_TYPE(FFT), \
+    .flags      = AV_TX_INPLACE | FF_TX_OUT_OF_PLACE | \
+                  AV_TX_UNALIGNED | FF_TX_FORWARD_ONLY, \
+    .factors[0] = n, \
+    .min_len    = n, \
+    .max_len    = n, \
+    .init       = TX_NAME(ff_tx_fft_factor_init), \
+    .cpu_flags  = FF_TX_CPU_FLAGS_ALL, \
+    .prio       = FF_TX_PRIO_BASE, \
+};
+
+DECL_FACTOR_F(3)
+DECL_FACTOR_F(5)
+DECL_FACTOR_F(7)
+DECL_FACTOR_F(9)
+DECL_FACTOR_S(15)
+
 #define BUTTERFLIES(a0, a1, a2, a3)\
     do { \
         r0=a0.re; \
@@ -1483,6 +1558,19 @@ const FFTXCodelet * const TX_NAME(ff_tx_codelet_list)[] = {
     &TX_NAME(ff_tx_fft65536_ns_def),
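The preshuffle map built in ff_tx_fft_factor_init can be tried in isolation. The following is a minimal sketch, assuming plain malloc in place of av_malloc; gen_factor_map and embed_pfa15 are hypothetical names, not FFmpeg API. The map keeps DC at index 0 and, for an inverse transform, reads the remaining inputs in reverse order. Since exp(-2*pi*i*(len - j)*k/len) equals exp(+2*pi*i*j*k/len), running a forward kernel on input gathered through this map yields the unnormalized inverse transform. Unlike the patch, the sketch sizes the allocation per element with sizeof(*map) and checks it for failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Identity map for a forward transform, AC entries reversed for an
 * inverse one. Returns NULL on allocation failure; caller frees. */
static int *gen_factor_map(int len, int inv)
{
    int *map;

    assert(len > 0);
    map = malloc(len * sizeof(*map));
    if (!map)
        return NULL;

    map[0] = 0; /* DC stays in place */
    for (int i = 1; i < len; i++)
        map[i] = inv ? len - i : i;

    return map;
}

/* Re-order a 15-entry map for a 5x3 prime-factor layout, using the
 * same index formula as the patch. */
static void embed_pfa15(int *map)
{
    int tmp[15];

    memcpy(tmp, map, sizeof(tmp));
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 3; j++)
            map[i*3 + j] = tmp[(i*3 + j*5) % 15];
}
```

gen_factor_map(5, 1) returns {0, 4, 3, 2, 1}. Applying embed_pfa15 to the forward 15-entry map makes it start with 0, 5, 10, 3, 8.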
[FFmpeg-cvslog] fate/aacenc: increase tolerance for ln-128k test
ffmpeg | branch: master | Lynne | Thu Nov 17 12:39:47 2022 +0100| [d556f6fa9bda60af411415a6aaa9fa26fe2ebf45] | committer: Lynne

fate/aacenc: increase tolerance for ln-128k test

The encoder is sensitive to changes in precision, and its test target
was a compromise. It was already close to failing on x87 FPUs.
ff_mdct_init used double precision entirely from the scale to computing
the MDCT exp tables. av_tx_init uses single-precision for the scale, with
a small input change which was enough to tip the test into failing on
x87 FPUs. Increase the fuzz factor in line with other AAC encoder tests
to fix.

> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=d556f6fa9bda60af411415a6aaa9fa26fe2ebf45
---

 tests/fate/aac.mak | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/fate/aac.mak b/tests/fate/aac.mak
index 1743428f54..4f8d1cdcea 100644
--- a/tests/fate/aac.mak
+++ b/tests/fate/aac.mak
@@ -174,7 +174,7 @@ fate-aac-ln-encode-128k: REF = $(SAMPLES)/audio-reference/luckynight_2ch_44kHz_s
 fate-aac-ln-encode-128k: CMP_SHIFT = -4096
 fate-aac-ln-encode-128k: CMP_TARGET = 622
 fate-aac-ln-encode-128k: SIZE_TOLERANCE = 3560
-fate-aac-ln-encode-128k: FUZZ = 5
+fate-aac-ln-encode-128k: FUZZ = 10
 
 FATE_AAC_ENCODE += fate-aac-pns-encode
 fate-aac-pns-encode: CMD = enc_dec_pcm adts wav s16le $(TARGET_SAMPLES)/audio-reference/luckynight_2ch_44kHz_s16.wav -c:a aac -aac_coder fast -aac_pns 1 -aac_is 0 -aac_ms 0 -aac_tns 0 -b:a 128k -cutoff 22050 -fflags +bitexact -flags +bitexact