Re: [FFmpeg-devel] [PATCH] checkasm/af_afir: relax the max allowed absolute difference
On 1/11/2019 12:17 PM, Derek Buitenhuis wrote: > On 11/01/2019 13:28, Hendrik Leppkes wrote: >> Because the computation accumulates more inaccuracy than FLT_EPSILON >> allows for. That value is really not of that great use. If you have >> two accurate numbers and do one calculation, it may work, but if you >> do a whole bunch of them, the error accumulates and eventually gets >> bigger than FLT_EPSILON. >> x86_32 floating point is for $reasons a tad bit less accurate than on >> x86_64, for example, resulting in the test failing. We have some other >> float tests that do (or used to) fail sporadically due to inaccuracy >> problems, which sometimes were fixed by similar means - or >> multiplying FLT_EPSILON to make it bigger. > > OK. > > Two things: > > 1. That should be in the commit messages. > 2. Would some multiple of FLT_EPSILON make more sense? Michael suggested 1000*FLT_EPSILON, but IMO that's too big and may hide errors in future implementations. The value I used is the smallest value I found that didn't fail after several runs; 6.1e-05, for example, fails. > > - Derek ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
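A minimal sketch of why an absolute tolerance of FLT_EPSILON is so tight. FLT_EPSILON is the gap between 1.0f and the next float, so for any result with a magnitude of 2 or more, a single rounding step already exceeds it. The helpers below are hypothetical (checkasm has its own float comparison helpers, e.g. float_near_abs_eps()), and next_up() assumes a positive, finite IEEE 754 single-precision input.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Absolute-tolerance comparison: returns 1 if |a - b| <= eps. */
static int near_abs_eps(float a, float b, float eps)
{
    float d = a - b;
    return (d < 0 ? -d : d) <= eps;
}

/* Next representable float above x (one ULP up).
 * Assumes x is positive and finite, so incrementing the bit pattern works. */
static float next_up(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof(bits));
    bits++;
    memcpy(&x, &bits, sizeof(bits));
    return x;
}
```

At 100.0f one ULP is about 7.6e-6, so two results that differ only in their last bit already fail a FLT_EPSILON check, while 1000*FLT_EPSILON (about 1.2e-4) accepts them. That illustrates both why a scaled tolerance is needed and why a very large one can mask real errors.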
[FFmpeg-devel] [PATCH] avcodec: fix some docstrings
Hello, This fixes an erroneous reference and missing links in the API documentation. Best regards, Nicolas Granger --- libavcodec/avcodec.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h index 4414853e84..64ba039be2 100644 --- a/libavcodec/avcodec.h +++ b/libavcodec/avcodec.h @@ -4892,7 +4892,8 @@ int avcodec_send_packet(AVCodecContext *avctx, const AVPacket *avpkt); * @param frame This will be set to a reference-counted video or audio * frame (depending on the decoder type) allocated by the * decoder. Note that the function will always call - * av_frame_unref(frame) before doing anything else. + * @ref av_frame_unref "av_frame_unref(frame)" before doing + * anything else. * * @return * 0: success, a frame was returned @@ -4948,7 +4949,8 @@ int avcodec_send_frame(AVCodecContext *avctx, const AVFrame *frame); * @param avctx codec context * @param avpkt This will be set to a reference-counted packet allocated by the * encoder. Note that the function will always call - * av_frame_unref(frame) before doing anything else. + * @ref av_packet_unref "av_packet_unref(avpkt)" before doing + * anything else. * @return 0 on success, otherwise negative error code: * AVERROR(EAGAIN): output is not available in the current state - user * must try to send input -- 2.20.1
Re: [FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX
On Sat, Jan 12, 2019 at 06:25:57PM +0200, Lauri Kasanen wrote: > On Sat, 12 Jan 2019 14:52:07 +0100 > Michael Niedermayer wrote: > > > On Sat, Jan 12, 2019 at 10:47:50AM +0200, Lauri Kasanen wrote: > > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt > > > yuv420p16be \ > > > -s 1920x1728 -f null -vframes 100 -v error -nostats - > > > > > > 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x. > > > Fate passes, each format tested with an image to video conversion. > > > > > > Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out > > > of the 16-bit function. This includes the vec_mulo/mule functions too, > > > not just vmuluwm. > > > > > > yuv420p9le > > > 12341 UNITS in planarX, 130976 runs, 96 skips > > > 73752 UNITS in planarX, 131066 runs, 6 skips > > > yuv420p9be > > > 12364 UNITS in planarX, 131025 runs, 47 skips > > > 73001 UNITS in planarX, 131055 runs, 17 skips > > > yuv420p10le > > > 12386 UNITS in planarX, 131042 runs, 30 skips > > > 72735 UNITS in planarX, 131062 runs, 10 skips > > > yuv420p10be > > > 12337 UNITS in planarX, 131045 runs, 27 skips > > > 72734 UNITS in planarX, 131057 runs, 15 skips > > > yuv420p12le > > > 12236 UNITS in planarX, 131058 runs, 14 skips > > > 73029 UNITS in planarX, 131062 runs, 10 skips > > > yuv420p12be > > > 12218 UNITS in planarX, 130973 runs, 99 skips > > > 72402 UNITS in planarX, 131069 runs, 3 skips > > > yuv420p14le > > > 12168 UNITS in planarX, 131067 runs, 5 skips > > > 72480 UNITS in planarX, 131069 runs, 3 skips > > > yuv420p14be > > > 12358 UNITS in planarX, 130948 runs,124 skips > > > 73772 UNITS in planarX, 131063 runs, 9 skips > > > yuv420p16le > > > 10439 UNITS in planarX, 130911 runs,161 skips > > > 157923 UNITS in planarX, 131068 runs, 4 skips > > > yuv420p16be > > > 10463 UNITS in planarX, 130874 runs,198 skips > > > 154405 UNITS in planarX, 131061 runs, 11 skips > > > > The number of skips in the benchmark is much larger on one > > side. 
That way the numbers become hard to compare as > > more cases are skipped on one side > > > > please adjust the parameters so the skip counts are comparable > > or redo the tests until the numbers are more similar > > thanks > > How do I do that? It's a VM, so there are going to be pauses no matter > what, when other VMs run. Or should I take the largest run count with > about the same skips? I would try to adjust TIMER_REPORT so that VM switches are either reliably skipped on both sides of the test or never skipped. The idea is to do the same to both, so there's no asymmetry from differently successful skips. thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Asymptotically faster algorithms should always be preferred if you have asymptotical amounts of data
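A rough sketch of the kind of outlier rejection that produces the "runs" and "skips" columns. It is modeled loosely on START_TIMER/STOP_TIMER in libavutil/timer.h, but the struct, the 8x-running-average threshold and the 2000-tick floor are assumptions from memory, not the actual macro. It shows why uneven skip counts matter: a sample is only averaged in if it is not much slower than the running mean, so the side that rejects more VM pauses reports a lower average.

```c
#include <stdint.h>

/* Hypothetical accumulator; thresholds are assumptions, not timer.h's. */
typedef struct TimerStats {
    uint64_t sum;   /* total ticks of accepted samples */
    int count;      /* accepted samples ("runs") */
    int skips;      /* rejected outliers ("skips") */
} TimerStats;

/* Accept a sample unless it is far above the running average. */
static void timer_add(TimerStats *t, uint64_t ticks)
{
    if (t->count < 2 || ticks < 8 * t->sum / t->count || ticks < 2000) {
        t->sum += ticks;
        t->count++;
    } else {
        t->skips++;   /* e.g. a VM pause: excluded from the average */
    }
}

/* Average over accepted samples only. */
static uint64_t timer_mean(const TimerStats *t)
{
    return t->count ? t->sum / t->count : 0;
}
```

Applying the same rule to both implementations (or disabling rejection entirely) is what keeps their averages comparable.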
Re: [FFmpeg-devel] [PATCH] doc/developer: require transparency about sponsorships.
On 1/12/19, Nicolas George wrote: > Hendrik Leppkes (12019-01-11): >> It's everyone's right to keep their finances private. Would I be forced >> to disclose my hourly wages and then determine how long I worked on a >> patch, just because I did it during my day job? That's not going to >> happen. >> >> To take a line from your post: >> Are you against privacy? > > I grant you these were cheap theatricals. But to answer your question > seriously: I am against absolute unconditional privacy, yes. Some things > deserve privacy, some things do not; I personally believe that economic > matters rather fall in the second category. > > In the particular instance you are evoking, the commit message could > just say "developed as part of my regular job at $company"; I consider that > enough disclosure for the purpose. And I wonder why you would want to > keep that much hidden. > >> Patches should generally be considered on their own merit. > > That is true. And patches should be reviewed and discussed until they > are of top quality. You know as well as I do that this is not what is > happening: there are too many patches and too little time available from > competent developers; as a result, some code of mediocre quality has > been pushed, and some committers have explicitly stated they would > bypass technical objections to their patches. And now it appears that > was the result of sponsorships... Citation needed.
[FFmpeg-devel] [PATCH 3/4] avcodec/prosumer: Reduce lut size
Signed-off-by: Michael Niedermayer --- libavcodec/prosumer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/prosumer.c b/libavcodec/prosumer.c index 3fa9986a38..9143bb1bf4 100644 --- a/libavcodec/prosumer.c +++ b/libavcodec/prosumer.c @@ -38,7 +38,7 @@ typedef struct ProSumerContext { unsigned stride; unsigned size; -uint32_t lut[0x1]; +uint32_t lut[0x2000]; uint8_t *initial_line; uint8_t *decbuffer; } ProSumerContext; -- 2.20.1
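A small check of why 0x2000 entries can be enough. It assumes, as in the decompress() hunk of patch 4/4 in this series, that the table is indexed as lut[2 * idx] with idx = a >> 20, and that `a` is a 32-bit unsigned value. The helper name is hypothetical. The largest even index is 0x1FFE, so even an odd neighbor at 2 * idx + 1 stays below 0x2000.

```c
#include <stdint.h>

/* Hypothetical helper mirroring "idx = a >> 20; b = lut[2 * idx];".
 * Assumes a is a 32-bit unsigned value. */
static uint32_t lut_even_index(uint32_t a)
{
    uint32_t idx = a >> 20;   /* top 12 bits: at most 0xFFF */
    return 2 * idx;           /* at most 0x1FFE */
}
```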
[FFmpeg-devel] [PATCH 2/4] avcodec/prosumer: Simplify code slightly in decompress()
Signed-off-by: Michael Niedermayer --- libavcodec/prosumer.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/libavcodec/prosumer.c b/libavcodec/prosumer.c index 505de71980..3fa9986a38 100644 --- a/libavcodec/prosumer.c +++ b/libavcodec/prosumer.c @@ -84,16 +84,13 @@ static int decompress(GetByteContext *gb, int size, PutByteContext *pb, const ui if (bytestream2_get_bytes_left(gb) <= 0) { if (!a) return 0; -cnt = 4; } else { -pos = bytestream2_tell(gb) ^ 2; -bytestream2_seek(gb, pos, SEEK_SET); +pos = bytestream2_tell(gb); +bytestream2_seek(gb, pos ^ 2, SEEK_SET); AV_WN16(&a, bytestream2_peek_le16(gb)); -pos = pos ^ 2; -bytestream2_seek(gb, pos, SEEK_SET); -bytestream2_skip(gb, 2); -cnt = 4; +bytestream2_seek(gb, pos + 2, SEEK_SET); } +cnt = 4; } c--; } @@ -117,12 +114,10 @@ static int decompress(GetByteContext *gb, int size, PutByteContext *pb, const ui } return 0; } -pos = bytestream2_tell(gb) ^ 2; -bytestream2_seek(gb, pos, SEEK_SET); +pos = bytestream2_tell(gb); +bytestream2_seek(gb, pos ^ 2, SEEK_SET); AV_WN16(&a, bytestream2_peek_le16(gb)); -pos = pos ^ 2; -bytestream2_seek(gb, pos, SEEK_SET); -bytestream2_skip(gb, 2); +bytestream2_seek(gb, pos + 2, SEEK_SET); cnt = 4; idx--; } -- 2.20.1
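To check that the rewrite in this patch keeps the same behavior, here is a sketch with a hypothetical minimal reader (no bounds checking, unlike the real GetByteContext helpers). Both sequences read the little-endian 16-bit word at position pos ^ 2 and leave the reader at pos + 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GetByteContext: a buffer and a position. */
typedef struct Reader {
    const uint8_t *buf;
    size_t pos;
} Reader;

/* Read a little-endian 16-bit value without advancing. */
static unsigned peek_le16(const Reader *r)
{
    return r->buf[r->pos] | (unsigned)r->buf[r->pos + 1] << 8;
}

/* Old sequence: seek to tell ^ 2, peek, xor back, seek, skip 2. */
static unsigned read_old(Reader *r)
{
    size_t pos = r->pos ^ 2;
    unsigned v;
    r->pos = pos;
    v = peek_le16(r);
    pos ^= 2;          /* back to the original position */
    r->pos = pos;
    r->pos += 2;       /* skip(2) */
    return v;
}

/* New sequence: remember tell, seek to pos ^ 2, peek, seek to pos + 2. */
static unsigned read_new(Reader *r)
{
    size_t pos = r->pos;
    unsigned v;
    r->pos = pos ^ 2;
    v = peek_le16(r);
    r->pos = pos + 2;
    return v;
}
```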
[FFmpeg-devel] [PATCH 4/4] avcodec/prosumer: Error out if decompress() stops reading data
If 0 is encountered in the LUT, decompress() will keep writing zero bytes but never read more data. Without a specification it is impossible to say whether this is invalid or a feature. None of the valid prosumer files tested cause a 0 to be read, so it is likely not an intended feature. Fixes: Timeout Fixes: 11266/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PROSUMER_fuzzer-5681827423977472 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer --- libavcodec/prosumer.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/libavcodec/prosumer.c b/libavcodec/prosumer.c index 9143bb1bf4..ce3cbdbb73 100644 --- a/libavcodec/prosumer.c +++ b/libavcodec/prosumer.c @@ -96,6 +96,8 @@ static int decompress(GetByteContext *gb, int size, PutByteContext *pb, const ui } idx = a >> 20; b = lut[2 * idx]; +if (!b) +return AVERROR_INVALIDDATA; continue; } idx = 2; @@ -154,8 +156,9 @@ static int decode_frame(AVCodecContext *avctx, void *data, memset(s->decbuffer, 0, s->size); bytestream2_init(&s->gb, avpkt->data, avpkt->size); bytestream2_init_writer(&s->pb, s->decbuffer, s->size); - -decompress(&s->gb, AV_RL32(avpkt->data + 28) >> 1, &s->pb, s->lut); +ret = decompress(&s->gb, AV_RL32(avpkt->data + 28) >> 1, &s->pb, s->lut); +if (ret < 0) +return ret; vertical_predict((uint32_t *)s->decbuffer, 0, (uint32_t *)s->initial_line, s->stride, 1); vertical_predict((uint32_t *)s->decbuffer, s->stride, (uint32_t *)s->decbuffer, s->stride, avctx->height - 1); -- 2.20.1
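The failure mode described here (output keeps growing while the input position never advances) can be illustrated with a toy decoder. Everything below is hypothetical and far simpler than prosumer's bit reader: each input byte looks up a step size in a table, and a zero step would otherwise loop until the output buffer is full without consuming any input. TOY_ERROR_INVALIDDATA stands in for AVERROR_INVALIDDATA.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ERROR_INVALIDDATA (-1)   /* stand-in for AVERROR_INVALIDDATA */

/* Toy decoder: copies one byte per iteration and advances the input by
 * lut[byte]. Returns the number of bytes written or a negative error. */
static int toy_decompress(const uint8_t *in, size_t in_size,
                          uint8_t *out, size_t out_size,
                          const uint8_t lut[256])
{
    size_t ipos = 0, opos = 0;
    while (opos < out_size && ipos < in_size) {
        uint8_t step = lut[in[ipos]];
        if (!step)                    /* would emit output without progress */
            return TOY_ERROR_INVALIDDATA;
        out[opos++] = in[ipos];
        ipos += step;
    }
    return (int)opos;
}
```

As in the patch, the caller then has to propagate a negative return instead of ignoring it.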
[FFmpeg-devel] [PATCH 1/4] avcodec/tiff: Check for 12bit gray fax
Fixes: Assertion failure Fixes: 11898/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_TIFF_fuzzer-5759794191794176 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer --- libavcodec/tiff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/tiff.c b/libavcodec/tiff.c index 570b3cbd01..5cf2a4417e 100644 --- a/libavcodec/tiff.c +++ b/libavcodec/tiff.c @@ -619,7 +619,7 @@ static int tiff_unpack_strip(TiffContext *s, AVFrame *p, uint8_t *dst, int strid if (s->compr == TIFF_CCITT_RLE || s->compr == TIFF_G3|| s->compr == TIFF_G4) { -if (is_yuv) +if (is_yuv || p->format == AV_PIX_FMT_GRAY12) return AVERROR_INVALIDDATA; return tiff_unpack_fax(s, dst, stride, src, size, width, lines); -- 2.20.1
Re: [FFmpeg-devel] [PATCH] doc/developer: require transparency about sponsorships.
Kyle Swanson (12019-01-11): > If someone sends a bad patch, we have no obligation to merge it. Except if they push it themselves after a few hours without review (or after being rude to somebody who made a review requiring more work). The disclosure (and review) requirement is especially important for people who have commit rights, but it would be unjust to apply it only to us. Regards, -- Nicolas George
Re: [FFmpeg-devel] [PATCH] doc/developer: require transparency about sponsorships.
Hendrik Leppkes (12019-01-11): > It's everyone's right to keep their finances private. Would I be forced > to disclose my hourly wages and then determine how long I worked on a > patch, just because I did it during my day job? That's not going to > happen. > > To take a line from your post: > Are you against privacy? I grant you these were cheap theatricals. But to answer your question seriously: I am against absolute unconditional privacy, yes. Some things deserve privacy, some things do not; I personally believe that economic matters rather fall in the second category. In the particular instance you are evoking, the commit message could just say "developed as part of my regular job at $company"; I consider that enough disclosure for the purpose. And I wonder why you would want to keep that much hidden. > Patches should generally be considered on their own merit. That is true. And patches should be reviewed and discussed until they are of top quality. You know as well as I do that this is not what is happening: there are too many patches and too little time available from competent developers; as a result, some code of mediocre quality has been pushed, and some committers have explicitly stated they would bypass technical objections to their patches. And now it appears that was the result of sponsorships... Regards, -- Nicolas George
Re: [FFmpeg-devel] [PATCH] avfilter: add maskfun filter
On 11/19/18, Paul B Mahol wrote: > Signed-off-by: Paul B Mahol > --- > doc/filters.texi | 27 > libavfilter/Makefile | 1 + > libavfilter/allfilters.c | 1 + > libavfilter/vf_maskfun.c | 279 +++ > 4 files changed, 308 insertions(+) > create mode 100644 libavfilter/vf_maskfun.c Will apply.
Re: [FFmpeg-devel] [PATCH]lavc: Allow very high bitrates in AVCPBProperties after next version bump
2019-01-11 1:07 GMT+01:00, James Almer : > On 1/10/2019 6:27 PM, Carl Eugen Hoyos wrote: >> Hi! >> >> I don't know how urgent this is and how easily this can be triggered >> with AVCPBProperties but we had issues with bitrates > INT32_MAX in the >> past, so looking at this code before realizing the qsv bitrate issue >> is Intel-related I thought this patch cannot hurt. >> >> Please comment, Carl Eugen > > Probably correct. The bitrate fields in AVCodecContext are all int64_t, and > AVCPBProperties fields are usually set to those. Patch applied. Thank you, Carl Eugen
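As a quick illustration of the range problem, assuming a 32-bit int: a 3 Gbit/s bitrate is representable in int64_t (the type of the AVCodecContext bitrate fields, per the reply above) but not in int. The helper is hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if a 64-bit bitrate can be stored in an int without overflow. */
static int bitrate_fits_int(int64_t bit_rate)
{
    return bit_rate >= INT_MIN && bit_rate <= INT_MAX;
}
```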
Re: [FFmpeg-devel] [PATCH]lavc/tiff: Support some CMYK samples
2019-01-11 16:36 GMT+01:00, Derek Buitenhuis : > On 11/01/2019 14:54, Carl Eugen Hoyos wrote: >> Attached patch that fixes the sample from ticket #3459 cannot be >> factorized with the code in mjpegdec (and psd), the representation is >> different. > > Is there a good reason this is RGB0 instead of RGB24? It would make the code much more complicated for no obvious benefit (imo). Once somebody implements 5-component cmyk in tiff, this can be changed as well. > Other than that, seems OK, if tested Patch applied, thank you! > (is there a FATE sample we can add?) Will look into a test. Thank you, Carl Eugen
Re: [FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX
On Sat, 12 Jan 2019 14:52:07 +0100 Michael Niedermayer wrote: > On Sat, Jan 12, 2019 at 10:47:50AM +0200, Lauri Kasanen wrote: > > ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt > > yuv420p16be \ > > -s 1920x1728 -f null -vframes 100 -v error -nostats - > > > > 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x. > > Fate passes, each format tested with an image to video conversion. > > > > Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out > > of the 16-bit function. This includes the vec_mulo/mule functions too, > > not just vmuluwm. > > > > yuv420p9le > > 12341 UNITS in planarX, 130976 runs, 96 skips > > 73752 UNITS in planarX, 131066 runs, 6 skips > > yuv420p9be > > 12364 UNITS in planarX, 131025 runs, 47 skips > > 73001 UNITS in planarX, 131055 runs, 17 skips > > yuv420p10le > > 12386 UNITS in planarX, 131042 runs, 30 skips > > 72735 UNITS in planarX, 131062 runs, 10 skips > > yuv420p10be > > 12337 UNITS in planarX, 131045 runs, 27 skips > > 72734 UNITS in planarX, 131057 runs, 15 skips > > yuv420p12le > > 12236 UNITS in planarX, 131058 runs, 14 skips > > 73029 UNITS in planarX, 131062 runs, 10 skips > > yuv420p12be > > 12218 UNITS in planarX, 130973 runs, 99 skips > > 72402 UNITS in planarX, 131069 runs, 3 skips > > yuv420p14le > > 12168 UNITS in planarX, 131067 runs, 5 skips > > 72480 UNITS in planarX, 131069 runs, 3 skips > > yuv420p14be > > 12358 UNITS in planarX, 130948 runs,124 skips > > 73772 UNITS in planarX, 131063 runs, 9 skips > > yuv420p16le > > 10439 UNITS in planarX, 130911 runs,161 skips > > 157923 UNITS in planarX, 131068 runs, 4 skips > > yuv420p16be > > 10463 UNITS in planarX, 130874 runs,198 skips > > 154405 UNITS in planarX, 131061 runs, 11 skips > > The number of skips in the benchmark is much larger on one > side. 
That way the numbers become hard to compare as > more cases are skipped on one side > > please adjust the parameters so the skip counts are comparable > or redo the tests until the numbers are more similar > thanks How do I do that? It's a VM, so there are going to be pauses no matter what, when other VMs run. Or should I take the largest run count with about the same skips? - Lauri
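A small sketch for reading the benchmark numbers, assuming the first line of each pair is the new VSX function and the second is the existing C code (which matches the stated ~6x and ~15x speedups). In every line runs + skips equals 131072 (2^17), so all reports are taken at the same total sample count and the skip counts can be compared directly as rates. The struct and helper names are hypothetical.

```c
#include <stdint.h>

/* One "N UNITS in planarX, R runs, S skips" report line. */
typedef struct BenchLine {
    uint64_t units;
    int runs;
    int skips;
} BenchLine;

/* Speedup of 'fast' over 'slow' as a ratio of the reported averages. */
static double speedup(BenchLine slow, BenchLine fast)
{
    return (double)slow.units / (double)fast.units;
}

/* Fraction of all samples that were discarded as outliers. */
static double skip_rate(BenchLine b)
{
    return (double)b.skips / (double)(b.runs + b.skips);
}
```

For yuv420p16le this gives roughly 15.1x, but the VSX line skipped 161 samples against 4 for the C line, which is exactly the asymmetry the review objects to.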
Re: [FFmpeg-devel] [PATCH]lavc/psd: Support CMYK images
2019-01-11 7:17 GMT+01:00, Peter Ross : > On Fri, Jan 11, 2019 at 03:23:49AM +0100, Carl Eugen Hoyos wrote: >> 2019-01-11 2:55 GMT+01:00, Carl Eugen Hoyos : >> > Hi! >> > >> > Attached patch fixes ticket #6797, please comment. >> >> New patch with 16bit support attached. >> >> Please comment, Carl Eugen > >> From 5f879539ee7fecd57bd3de9f7c6363d9b7779b5b Mon Sep 17 00:00:00 2001 >> From: Carl Eugen Hoyos >> Date: Fri, 11 Jan 2019 03:20:38 +0100 >> Subject: [PATCH] lavc/psd: Support CMYK images. >> >> Based on a05635e by Michael Niedermayer. >> >> Fixes ticket #6797. >> --- >> libavcodec/psd.c | 84 >> ++ >> 1 file changed, 84 insertions(+) >> >> diff --git a/libavcodec/psd.c b/libavcodec/psd.c > >> +if (s->channel_depth == 8) { >> +for (y = 0; y < s->height; y++) { >> +for (x = 0; x < s->width; x++) { >> +int k = src[3][x]; >> +int r = src[0][x] * k; >> +int g = src[1][x] * k; >> +int b = src[2][x] * k; >> +dst[0][x] = g * 257 >> 16; >> +dst[1][x] = b * 257 >> 16; >> +dst[2][x] = r * 257 >> 16; >> +} > > The same algorithm exists in libavcodec/mjpegdec.c, with alpha channel > support. > I guess it is trivial enough to be duplicated here. > > Otherwise looks good. Patch applied. Thanks everybody, Carl Eugen
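The 8-bit loop quoted above, restated for a single pixel as a hypothetical helper. As the formula implies, the stored channel values behave as inverted ink amounts (255 meaning no ink), so each color is the product of its channel with K, scaled back to 8 bits. The x * 257 >> 16 step approximates division by 255 with truncation, so a full-scale 255 * 255 product maps to 254 rather than 255.

```c
#include <stdint.h>

/* One pixel of the quoted 8-bit CMYK path. c, m, y, k correspond to
 * src[0..3][x]; output is in the G, B, R plane order used by the patch. */
static void cmyk_pixel_to_gbr(uint8_t c, uint8_t m, uint8_t y, uint8_t k,
                              uint8_t gbr[3])
{
    int r = c * k;
    int g = m * k;
    int b = y * k;
    /* x * 257 >> 16 approximates x / 255 for 0 <= x <= 255 * 255 */
    gbr[0] = g * 257 >> 16;
    gbr[1] = b * 257 >> 16;
    gbr[2] = r * 257 >> 16;
}
```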
Re: [FFmpeg-devel] [PATCH 1/2] avcodec/vc1: shuffle calculation of MV predictor candidates
2019-01-11 15:36 GMT+01:00, Jerome Borsboom : > The B predictor for 4-MV macroblocks is only out of bounds when > the A predictor is also out of bounds. Patch applied. Thank you, Carl Eugen
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/gdv: Optimize and factorize scaling loops
2019-01-12 16:46 GMT+01:00, Michael Niedermayer : > On Sat, Jan 12, 2019 at 04:07:42PM +0100, Carl Eugen Hoyos wrote: >> 2019-01-04 20:22 GMT+01:00, Michael Niedermayer : >> >> > +static void scaledown(uint8_t *dst, const uint8_t *src, int w) >> > +{ >> > +int x; >> > +for (x = 0; x < w - 7; x+=8) { >> > +dst[x + 0] = src[2*x + 0]; >> > +dst[x + 1] = src[2*x + 2]; >> > +dst[x + 2] = src[2*x + 4]; >> > +dst[x + 3] = src[2*x + 6]; >> > +dst[x + 4] = src[2*x + 8]; >> > +dst[x + 5] = src[2*x +10]; >> > +dst[x + 6] = src[2*x +12]; >> > +dst[x + 7] = src[2*x +14]; >> >> Could you add to the commit message the information >> which compiler is able to optimize this? >> (Assuming this is a reason for the speedup) > > If what you ask for is "which compiler turns this into SIMD", > I do not know, and I suspect mine does not, judging from the limited > increase in performance. > I think the speedup is primarily from simply unrolling the trivial loop. > > Is there something you want me to change in the commit message still? No, I am a little surprised that unrolling without SIMD makes a difference. Thank you for the explanation, Carl Eugen
Re: [FFmpeg-devel] [PATCH 2/2 v3] avcodec/vc1: fix decoding of old WMV3 format
2019-01-12 16:14 GMT+01:00, Jerome Borsboom : > The position of the second MV predictor candidate is slightly different > for the old WMV3 format indicated by RES_RTM_FLAG. This patch fixes > decoding of niceday.wmv on the samples server. > > Fixes: #6641 > > Signed-off-by: Jerome Borsboom > --- > This revision removes a spurious whitespace that was left behind. Patch applied. Thank you, Carl Eugen
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/gdv: Optimize and factorize scaling loops
On Sat, Jan 12, 2019 at 04:07:42PM +0100, Carl Eugen Hoyos wrote: > 2019-01-04 20:22 GMT+01:00, Michael Niedermayer : > > > +static void scaledown(uint8_t *dst, const uint8_t *src, int w) > > +{ > > +int x; > > +for (x = 0; x < w - 7; x+=8) { > > +dst[x + 0] = src[2*x + 0]; > > +dst[x + 1] = src[2*x + 2]; > > +dst[x + 2] = src[2*x + 4]; > > +dst[x + 3] = src[2*x + 6]; > > +dst[x + 4] = src[2*x + 8]; > > +dst[x + 5] = src[2*x +10]; > > +dst[x + 6] = src[2*x +12]; > > +dst[x + 7] = src[2*x +14]; > > Could you add to the commit message the information > which compiler is able to optimize this? > (Assuming this is a reason for the speedup) If what you ask for is "which compiler turns this into SIMD", I do not know, and I suspect mine does not, judging from the limited increase in performance. I think the speedup is primarily from simply unrolling the trivial loop. Is there something you want me to change in the commit message still? > > Sorry for the late comment, Carl Eugen -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Dictatorship: All citizens are under surveillance, all their steps and actions recorded, for the politicians to enforce control. Democracy: All politicians are under surveillance, all their steps and actions recorded, for the citizens to enforce control.
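A self-contained version of the unrolled loop quoted in this thread, next to a plain reference loop for comparison. The scalar tail for widths that are not a multiple of 8 is my assumption, since the quoted hunk ends before it. Per Michael's explanation, any gain would come from unrolling (presumably fewer loop-condition checks and index updates per byte), not necessarily from SIMD.

```c
#include <stdint.h>

/* Keep every second byte; unrolled by 8 as in the quoted patch. */
static void scaledown(uint8_t *dst, const uint8_t *src, int w)
{
    int x;
    for (x = 0; x < w - 7; x += 8) {
        dst[x + 0] = src[2*x +  0];
        dst[x + 1] = src[2*x +  2];
        dst[x + 2] = src[2*x +  4];
        dst[x + 3] = src[2*x +  6];
        dst[x + 4] = src[2*x +  8];
        dst[x + 5] = src[2*x + 10];
        dst[x + 6] = src[2*x + 12];
        dst[x + 7] = src[2*x + 14];
    }
    for (; x < w; x++)          /* assumed tail for the remaining pixels */
        dst[x] = src[2*x];
}

/* Straightforward reference version. */
static void scaledown_ref(uint8_t *dst, const uint8_t *src, int w)
{
    for (int x = 0; x < w; x++)
        dst[x] = src[2*x];
}
```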
Re: [FFmpeg-devel] [PATCH 2/2 v3] avcodec/vc1: fix decoding of old WMV3 format
The position of the second MV predictor candidate is slightly different for the old WMV3 format indicated by RES_RTM_FLAG. This patch fixes decoding of niceday.wmv on the samples server. Fixes: #6641 Signed-off-by: Jerome Borsboom --- This revision removes a spurious whitespace that was left behind. libavcodec/vc1.c | 5 ----- libavcodec/vc1_pred.c | 5 ++++- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/libavcodec/vc1.c b/libavcodec/vc1.c index 3581d87b57..e102b931d8 100644 --- a/libavcodec/vc1.c +++ b/libavcodec/vc1.c @@ -379,11 +379,6 @@ int ff_vc1_decode_sequence_header(AVCodecContext *avctx, VC1Context *v, GetBitCo } else { v->res_rtm_flag = get_bits1(gb); //reserved } -if (!v->res_rtm_flag) { -av_log(avctx, AV_LOG_ERROR, - "Old WMV3 version detected, some frames may be decoded incorrectly\n"); -//return -1; -} //TODO: figure out what they mean (always 0x402F) if (!v->res_fasttx) skip_bits(gb, 16); diff --git a/libavcodec/vc1_pred.c b/libavcodec/vc1_pred.c index 0b22d9916c..e1758a3817 100644 --- a/libavcodec/vc1_pred.c +++ b/libavcodec/vc1_pred.c @@ -275,7 +275,10 @@ void ff_vc1_pred_mv(VC1Context *v, int n, int dmv_x, int dmv_y, //in 4-MV mode different blocks have different B predictor position switch (n) { case 0: -off = (s->mb_x > 0) ? -1 : 1; +if (v->res_rtm_flag) +off = s->mb_x ? -1 : 1; +else +off = s->mb_x ? -1 : 2 * s->mb_width - wrap - 1; break; case 1: off = (s->mb_x == (s->mb_width - 1)) ? -1 : 1; -- 2.13.6
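The offset selection from the ff_vc1_pred_mv() hunk, pulled out into a hypothetical standalone function so the two cases are easy to compare. Here `wrap` is simply a parameter. The only behavioral difference is the mb_x == 0 case for old WMV3 streams (res_rtm_flag == 0).

```c
/* B-predictor offset for block 0 of a 4-MV macroblock, transcribed from
 * the patch; the function itself is hypothetical. */
static int block0_b_offset(int res_rtm_flag, int mb_x, int mb_width, int wrap)
{
    if (res_rtm_flag)
        return mb_x ? -1 : 1;                     /* current WMV3 */
    return mb_x ? -1 : 2 * mb_width - wrap - 1;   /* old WMV3 */
}
```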
Re: [FFmpeg-devel] Video codec design for very low-end decoder
2019-01-12 1:46 GMT+01:00, Ronald S. Bultje : > Hi, > > On Thu, Jan 10, 2019 at 2:41 PM Carl Eugen Hoyos wrote: > >> 2019-01-07 18:37 GMT+01:00, Ronald S. Bultje : >> > Hi, >> > >> > On Mon, Jan 7, 2019 at 12:22 PM Lauri Kasanen wrote: >> > >> >> On Mon, 7 Jan 2019 12:02:58 -0500 >> >> "Ronald S. Bultje" wrote: >> >> >> >> > Have you considered vp8? It may sound weird but this is basically >> >> > what >> >> vp8 >> >> > was great at: being really simple to decode. >> >> >> >> VP8 has a reputation of being slow, so I didn't consider it. Benchmarks >> >> show it as decoding slower than h264. >> >> >> > >> > It is faster than h264 when comparing ffh264 vs. ffvp8: >> > >> > https://blogs.gnome.org/rbultje/files/2014/02/sintel_decspeed.png >> >> Are the relations identical without asm optimizations? >> > > I believe so, yes. The theory behind it would be that per-symbol > probability adaptation in CABAC and bidirectional prediction are missing > in VP8, both of which incur a significant runtime overhead. Then, if you > start disabling tools (e.g. CABAC -> CAVLC) this difference would probably > diminish quite quickly. Thank you for the clarification! Carl Eugen
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/gdv: Optimize and factorize scaling loops
2019-01-04 20:22 GMT+01:00, Michael Niedermayer : > +static void scaledown(uint8_t *dst, const uint8_t *src, int w) > +{ > +int x; > +for (x = 0; x < w - 7; x+=8) { > +dst[x + 0] = src[2*x + 0]; > +dst[x + 1] = src[2*x + 2]; > +dst[x + 2] = src[2*x + 4]; > +dst[x + 3] = src[2*x + 6]; > +dst[x + 4] = src[2*x + 8]; > +dst[x + 5] = src[2*x +10]; > +dst[x + 6] = src[2*x +12]; > +dst[x + 7] = src[2*x +14]; Could you add to the commit message the information which compiler is able to optimize this? (Assuming this is a reason for the speedup) Sorry for the late comment, Carl Eugen
Re: [FFmpeg-devel] [PATCH] avcodec/tests/rangecoder: initialize array to avoid valgrind warning
On Fri, Jan 04, 2019 at 02:46:29AM +0100, Michael Niedermayer wrote: > Found-by: jamrial > Signed-off-by: Michael Niedermayer > --- > libavcodec/tests/rangecoder.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) I intend to apply this soon unless there are more comments; I did not understand the only comment :( [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB "I am not trying to be anyone's saviour, I'm trying to think about the future and not be sad" - Elon Musk
Re: [FFmpeg-devel] [PATCH 3/3] avcodec/h264_slice: Fix integer overflow in implicit_weight_table()
On Fri, Jan 04, 2019 at 08:22:36PM +0100, Michael Niedermayer wrote: > Fixes: signed integer overflow: 2 * 2132811760 cannot be represented in type > 'int' > Fixes: > 11156/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_H264_fuzzer-6237685933408256 > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-by: Michael Niedermayer > --- > libavcodec/h264_slice.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) will apply [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Concerning the gods, I have no means of knowing whether they exist or not or of what sort they may be, because of the obscurity of the subject, and the brevity of human life -- Protagoras
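The UBSan message quoted above can be reproduced in isolation: 2 * 2132811760 = 4265623520, which exceeds INT_MAX (2147483647 for a 32-bit int). Below is a generic sketch of the two usual remedies, widening before multiplying or checking the range first. It is not the actual change to implicit_weight_table(), which the message does not show.

```c
#include <limits.h>
#include <stdint.h>

/* Widen before multiplying so the product is computed in 64 bits. */
static int64_t mul2_wide(int v)
{
    return 2 * (int64_t)v;
}

/* Check, without undefined behavior, whether 2 * v fits in an int. */
static int mul2_fits_int(int v)
{
    return v <= INT_MAX / 2 && v >= INT_MIN / 2;
}
```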
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/gdv: Optimize and factorize scaling loops
On Fri, Jan 04, 2019 at 08:22:35PM +0100, Michael Niedermayer wrote: > Fixes: Timeout > Fixes: > 11067/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_GDV_fuzzer-5686623711264768 > > Before change: Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_GDV_fuzzer-5686623711264768 > in 34386 ms > After change: Executed > clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_GDV_fuzzer-5686623711264768 > in 24327 ms > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-by: Michael Niedermayer > --- > libavcodec/gdv.c | 87 +++- > 1 file changed, 64 insertions(+), 23 deletions(-) will apply [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB No human being will ever know the Truth, for even if they happen to say it by chance, they would not even know they had done so. -- Xenophanes
Re: [FFmpeg-devel] [PATCH 3/3] avcodec/exr: set layer_match in all branches
On Tue, Dec 25, 2018 at 11:15:22PM +0100, Michael Niedermayer wrote: > Otherwise it is left to the value from the previous iteration > > Signed-off-by: Michael Niedermayer > --- > libavcodec/exr.c | 1 + > 1 file changed, 1 insertion(+) will apply [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Asymptotically faster algorithms should always be preferred if you have asymptotical amounts of data
Re: [FFmpeg-devel] [PATCH 2/3] avcodec/exr: Check for duplicate channel index
On Tue, Dec 25, 2018 at 11:15:21PM +0100, Michael Niedermayer wrote: > Fixes: Out of memory > Fixes: > 11582/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_EXR_fuzzer-5730204559867904 > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-by: Michael Niedermayer > --- > libavcodec/exr.c | 5 + > 1 file changed, 5 insertions(+) will apply [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB When the tyrant has disposed of foreign enemies by conquest or treaty, and there is nothing more to fear from them, then he is always stirring up some war or other, in order that the people may require a leader. -- Plato
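A generic sketch of a duplicate-index check of the kind the subject line describes. The exr.c change itself is not shown in this message, so the function name, the 0-3 index range and the -1 "unused" convention are all assumptions made for illustration.

```c
/* Hypothetical: channel indices 0..3 (e.g. R, G, B, A) or -1 for unused. */
#define MAX_CHANNEL_INDEX 4

/* Returns 0 if every used index appears at most once, -1 otherwise. */
static int check_channel_indices(const int *index, int nb_channels)
{
    int seen[MAX_CHANNEL_INDEX] = {0};
    for (int i = 0; i < nb_channels; i++) {
        int c = index[i];
        if (c < 0)
            continue;                      /* channel not mapped */
        if (c >= MAX_CHANNEL_INDEX || seen[c])
            return -1;                     /* out of range or duplicate */
        seen[c] = 1;
    }
    return 0;
}
```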
Re: [FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX
On Sat, Jan 12, 2019 at 10:47:50AM +0200, Lauri Kasanen wrote:
> ./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
> -s 1920x1728 -f null -vframes 100 -v error -nostats -
>
> 9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
> Fate passes, each format tested with an image to video conversion.
>
> Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
> of the 16-bit function. This includes the vec_mulo/mule functions too,
> not just vmuluwm.
>
> yuv420p9le
> 12341 UNITS in planarX, 130976 runs, 96 skips
> 73752 UNITS in planarX, 131066 runs, 6 skips
> yuv420p9be
> 12364 UNITS in planarX, 131025 runs, 47 skips
> 73001 UNITS in planarX, 131055 runs, 17 skips
> yuv420p10le
> 12386 UNITS in planarX, 131042 runs, 30 skips
> 72735 UNITS in planarX, 131062 runs, 10 skips
> yuv420p10be
> 12337 UNITS in planarX, 131045 runs, 27 skips
> 72734 UNITS in planarX, 131057 runs, 15 skips
> yuv420p12le
> 12236 UNITS in planarX, 131058 runs, 14 skips
> 73029 UNITS in planarX, 131062 runs, 10 skips
> yuv420p12be
> 12218 UNITS in planarX, 130973 runs, 99 skips
> 72402 UNITS in planarX, 131069 runs, 3 skips
> yuv420p14le
> 12168 UNITS in planarX, 131067 runs, 5 skips
> 72480 UNITS in planarX, 131069 runs, 3 skips
> yuv420p14be
> 12358 UNITS in planarX, 130948 runs, 124 skips
> 73772 UNITS in planarX, 131063 runs, 9 skips
> yuv420p16le
> 10439 UNITS in planarX, 130911 runs, 161 skips
> 157923 UNITS in planarX, 131068 runs, 4 skips
> yuv420p16be
> 10463 UNITS in planarX, 130874 runs, 198 skips
> 154405 UNITS in planarX, 131061 runs, 11 skips

The number of skips in the benchmark is much larger on one side. That
makes the numbers hard to compare, as more cases are skipped on one side.
Please adjust the parameters so the skip counts are comparable, or redo
the tests until the numbers are more similar.

thanks

[...]
--
Michael

Modern terrorism, a quick summary: Need oil, start war with country that has oil, kill hundred thousand in war. Let country fall into chaos, be surprised about rise of fundamentalists. Drop more bombs, kill more people, be surprised about them taking revenge and drop even more bombs and strip your own citizens of their rights and freedoms. to be continued
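The per-format speedups in the quoted benchmark can be recomputed from the UNITS values. The sketch below assumes the faster line of each pair is the VSX path and the slower one the C reference (the lines are not labeled, but this matches the stated 6x and 15x), and that skipped runs are left out of each average, which is how libavutil's START_TIMER/STOP_TIMER appear to treat runs far slower than the running average. Every line reports after runs + skips = 131072 iterations, so the skip counts can be compared directly. The struct and function names are hypothetical.

```c
#include <assert.h>

/* One benchmark pair from the quoted table (faster line assumed VSX). */
struct bench_pair {
    const char *fmt;
    double vsx_units, c_units; /* averaged UNITS, skipped runs excluded */
    int vsx_skips, c_skips;
};

/* Speedup of the VSX path over the C path. */
static double speedup(const struct bench_pair *b)
{
    return b->c_units / b->vsx_units;
}

/* Share of all timed iterations that were discarded as outliers. */
static double skip_share(int skips, int total_iterations)
{
    return (double)skips / total_iterations;
}
```

For yuv420p16be this gives 154405 / 10463, roughly 14.8x, while the VSX side discarded 198 of 131072 runs (about 0.15%) against 11 (under 0.01%) for C. Under the assumption above, the discarded runs are the slow outliers, so the side with more skips gets a more flattering average, which is the asymmetry the review asks to remove.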
[FFmpeg-devel] [PATCH v5] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX
./ffmpeg_g -f rawvideo -pix_fmt rgb24 -s hd1080 -i /dev/zero -pix_fmt yuv420p16be \
-s 1920x1728 -f null -vframes 100 -v error -nostats -

9-14 bit funcs get about 6x speedup, 16-bit gets about 15x.
Fate passes, each format tested with an image to video conversion.

Only POWER8 includes 32-bit vector multiplies, so POWER7 is locked out
of the 16-bit function. This includes the vec_mulo/mule functions too,
not just vmuluwm.

yuv420p9le
12341 UNITS in planarX, 130976 runs, 96 skips
73752 UNITS in planarX, 131066 runs, 6 skips
yuv420p9be
12364 UNITS in planarX, 131025 runs, 47 skips
73001 UNITS in planarX, 131055 runs, 17 skips
yuv420p10le
12386 UNITS in planarX, 131042 runs, 30 skips
72735 UNITS in planarX, 131062 runs, 10 skips
yuv420p10be
12337 UNITS in planarX, 131045 runs, 27 skips
72734 UNITS in planarX, 131057 runs, 15 skips
yuv420p12le
12236 UNITS in planarX, 131058 runs, 14 skips
73029 UNITS in planarX, 131062 runs, 10 skips
yuv420p12be
12218 UNITS in planarX, 130973 runs, 99 skips
72402 UNITS in planarX, 131069 runs, 3 skips
yuv420p14le
12168 UNITS in planarX, 131067 runs, 5 skips
72480 UNITS in planarX, 131069 runs, 3 skips
yuv420p14be
12358 UNITS in planarX, 130948 runs, 124 skips
73772 UNITS in planarX, 131063 runs, 9 skips
yuv420p16le
10439 UNITS in planarX, 130911 runs, 161 skips
157923 UNITS in planarX, 131068 runs, 4 skips
yuv420p16be
10463 UNITS in planarX, 130874 runs, 198 skips
154405 UNITS in planarX, 131061 runs, 11 skips

Signed-off-by: Lauri Kasanen
---
 libswscale/ppc/swscale_ppc_template.c | 4 +-
 libswscale/ppc/swscale_vsx.c | 186 +-
 2 files changed, 184 insertions(+), 6 deletions(-)

v2: Separate macros so that yuv2plane1_16_vsx remains available for power7
v3: Remove accidental tabs, switch to HAVE_POWER8 from configure + runtime check
v4: #if HAVE_POWER8
v5: Get rid of the mul #if, turns out gcc vec_mul works

diff --git a/libswscale/ppc/swscale_ppc_template.c b/libswscale/ppc/swscale_ppc_template.c
index 00e4b99..11decab 100644
--- a/libswscale/ppc/swscale_ppc_template.c
+++ b/libswscale/ppc/swscale_ppc_template.c
@@ -21,7 +21,7 @@
  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
  */
 
-static void FUNC(yuv2planeX_16)(const int16_t *filter, int filterSize,
+static void FUNC(yuv2planeX_8_16)(const int16_t *filter, int filterSize,
                                 const int16_t **src, uint8_t *dest,
                                 const uint8_t *dither, int offset, int x)
 {
@@ -88,7 +88,7 @@ static void FUNC(yuv2planeX)(const int16_t *filter, int filterSize,
     yuv2planeX_u(filter, filterSize, src, dest, dst_u, dither, offset, 0);
 
     for (i = dst_u; i < dstW - 15; i += 16)
-        FUNC(yuv2planeX_16)(filter, filterSize, src, dest + i, dither,
+        FUNC(yuv2planeX_8_16)(filter, filterSize, src, dest + i, dither,
                               offset, i);
 
     yuv2planeX_u(filter, filterSize, src, dest, dstW, dither, offset, i);
diff --git a/libswscale/ppc/swscale_vsx.c b/libswscale/ppc/swscale_vsx.c
index 70da6ae..f6c7f1d 100644
--- a/libswscale/ppc/swscale_vsx.c
+++ b/libswscale/ppc/swscale_vsx.c
@@ -83,6 +83,8 @@
 #include "swscale_ppc_template.c"
 #undef FUNC
 
+#undef vzero
+
 #endif /* !HAVE_BIGENDIAN */
 
 static void yuv2plane1_8_u(const int16_t *src, uint8_t *dest, int dstW,
@@ -180,6 +182,76 @@ static void yuv2plane1_nbps_vsx(const int16_t *src, uint16_t *dest, int dstW,
     yuv2plane1_nbps_u(src, dest, dstW, big_endian, output_bits, i);
 }
 
+static void yuv2planeX_nbps_u(const int16_t *filter, int filterSize,
+                              const int16_t **src, uint16_t *dest, int dstW,
+                              int big_endian, int output_bits, int start)
+{
+    int i;
+    int shift = 11 + 16 - output_bits;
+
+    for (i = start; i < dstW; i++) {
+        int val = 1 << (shift - 1);
+        int j;
+
+        for (j = 0; j < filterSize; j++)
+            val += src[j][i] * filter[j];
+
+        output_pixel(&dest[i], val);
+    }
+}
+
+static void yuv2planeX_nbps_vsx(const int16_t *filter, int filterSize,
+                                const int16_t **src, uint16_t *dest, int dstW,
+                                int big_endian, int output_bits)
+{
+    const int dst_u = -(uintptr_t)dest & 7;
+    const int shift = 11 + 16 - output_bits;
+    const int add = (1 << (shift - 1));
+    const int clip = (1 << output_bits) - 1;
+    const uint16_t swap = big_endian ? 8 : 0;
+    const vector uint32_t vadd = (vector uint32_t) {add, add, add, add};
+    const vector uint32_t vshift = (vector uint32_t) {shift, shift, shift, shift};
+    const vector uint16_t vswap = (vector uint16_t) {swap, swap, swap, swap, swap, swap, swap, swap};
+    const vector uint16_t vlargest = (vector uint16_t) {clip, clip, clip, clip, clip, clip, clip,
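The scalar fallback added by this patch, yuv2planeX_nbps_u(), is small enough to check in isolation. Below is a hypothetical standalone restatement of it, assuming output_pixel() shifts the sum right by `shift`, clips it to `output_bits` bits (as av_clip_uintp2() does) and stores a 16-bit sample in the requested byte order. It also assumes the usual swscale scales: vertical filter taps summing to 4096 (1.0) and 15-bit intermediate samples, which is where 11 + 16 - output_bits comes from. The names are made up, and `dest` is a byte buffer so the endianness is explicit. A reference like this is what a VSX path's output could be compared against.

```c
#include <assert.h>
#include <stdint.h>

/* Clip v to [0, 2^bits - 1], mirroring what av_clip_uintp2() does. */
static int clip_uintp2(int v, int bits)
{
    const int max = (1 << bits) - 1;
    return v < 0 ? 0 : v > max ? max : v;
}

/* Vertical filter over filter_size source rows for dst_w pixels, written
 * as 16-bit samples of output_bits significant bits. */
static void planex_nbps_ref(const int16_t *filter, int filter_size,
                            const int16_t **src, uint8_t *dest, int dst_w,
                            int big_endian, int output_bits)
{
    /* assumed 12-bit filter scale + 15-bit samples - output bits */
    const int shift = 11 + 16 - output_bits;

    assert(output_bits >= 9 && output_bits <= 14); /* range this sketch covers */
    for (int i = 0; i < dst_w; i++) {
        int val = 1 << (shift - 1);          /* rounding bias */
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];    /* accumulate vertical taps */

        const unsigned px = (unsigned)clip_uintp2(val >> shift, output_bits);
        if (big_endian) {
            dest[2 * i]     = (uint8_t)(px >> 8);
            dest[2 * i + 1] = (uint8_t)(px & 0xff);
        } else {
            dest[2 * i]     = (uint8_t)(px & 0xff);
            dest[2 * i + 1] = (uint8_t)(px >> 8);
        }
    }
}
```

With a single tap of 4096 and 10-bit output (shift 17), a source value of 16384 becomes 512 and 32767 clips to 1023.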
Re: [FFmpeg-devel] [PATCH v4] libswscale/ppc: VSX-optimize 9-16 bit yuv2planeX
On Sat, 12 Jan 2019 01:03:09 +0100
Michael Niedermayer wrote:

> On Fri, Jan 11, 2019 at 11:16:20AM +0200, Lauri Kasanen wrote:
> > On Fri, 11 Jan 2019 09:56:15 +0100
> > Michael Niedermayer wrote:
> >
> > > > +#ifdef __GNUC__
> > > > +// GCC does not support vmuluwm yet. Bug open.
> > >
> > > this should probably be tested by configure similar to how other
> > > compiler limitations are tested
> >
> > We can't really test for it, because there is no standard name for it. I
> > don't know what name the gcc devs will pick for it, it could be vec_mul,
> > vec_vmuluwm or something different.
>
> the code contains a #if and a #else case
> so i thought there was something else than the __GNUC__ case and gcc
> would follow that

It's second-hand info from libsimdpp. I don't know where they got it.

However, I found out yesterday that gcc docs are wrong, and vec_mul for
gcc does use the correct instruction on power8. Respinning.

- Lauri
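The v3 changelog note of the v5 patch describes a compile-time HAVE_POWER8 check from configure combined with a runtime check. A hypothetical sketch of that two-level selection follows. HAVE_POWER8 defaults to 0 here so the example builds on any host, the runtime result is passed in as a plain int rather than read from a real CPU-detection call, and both "implementations" are scalar placeholders for what would be C code and vec_mul()-based POWER8 code.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the configure-generated macro; 0 unless the build
 * defines it, so this sketch compiles anywhere. */
#ifndef HAVE_POWER8
#define HAVE_POWER8 0
#endif

typedef uint32_t (*mul32_fn)(uint32_t a, uint32_t b);

/* Portable fallback: 32-bit multiply modulo 2^32, the operation the
 * POWER8 vmuluwm instruction performs per vector lane. */
static uint32_t mul32_c(uint32_t a, uint32_t b)
{
    return a * b;
}

#if HAVE_POWER8
/* Placeholder for a vec_mul()-based POWER8 path (hypothetical). */
static uint32_t mul32_power8(uint32_t a, uint32_t b)
{
    return a * b;
}
#endif

/* The compile-time gate decides whether the POWER8 code exists at all;
 * the runtime flag decides whether this CPU may run it. */
static mul32_fn select_mul32(int cpu_has_power8)
{
#if HAVE_POWER8
    if (cpu_has_power8)
        return mul32_power8;
#else
    (void)cpu_has_power8;
#endif
    return mul32_c;
}
```

Keeping both gates means a binary built with POWER8 support still falls back to the C path on a POWER7 machine, which matches the commit message's point that POWER7 is locked out of the 16-bit function.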