Re: [FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.
On Thu, Dec 04, 2014 at 03:28:32PM +0100, Benoit Fouet wrote: Hi, Le 03/12/2014 14:31, Benoit Fouet a écrit : Rely on the way memcpy is optimized for one's system instead of looping on a byte buffer for buffer copies to handle P frames. --- libavcodec/pngdec.c | 25 + 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/libavcodec/pngdec.c b/libavcodec/pngdec.c index 7e7b285..e6b7593 100644 --- a/libavcodec/pngdec.c +++ b/libavcodec/pngdec.c @@ -884,8 +884,7 @@ static int handle_p_frame_apng(AVCodecContext *avctx, PNGDecContext *s, ff_thread_await_progress(s-previous_picture, INT_MAX, 0); for (j = 0; j s-y_offset; j++) { -for (i = 0; i ls; i++) -pd[i] = pd_last[i]; +memcpy(pd, pd_last, ls); pd += s-image_linesize; pd_last += s-image_linesize; } @@ -907,8 +906,9 @@ static int handle_p_frame_apng(AVCodecContext *avctx, PNGDecContext *s, } for (j = s-y_offset; j s-y_offset + s-cur_h; j++) { -for (i = 0; i s-x_offset * s-bpp; i++) -pd[i] = pd_last[i]; +i = s-x_offset * s-bpp; +if (i) +memcpy(pd, pd_last, i); for (; i (s-x_offset + s-cur_w) * s-bpp; i += s-bpp) { uint8_t alpha = pd[i+ai]; @@ -930,26 +930,27 @@ static int handle_p_frame_apng(AVCodecContext *avctx, PNGDecContext *s, break; } } -for (; i ls; i++) -pd[i] = pd_last[i]; +if (ls - i) +memcpy(pd+i, pd_last+i, ls - i); pd += s-image_linesize; pd_last += s-image_linesize; pd_last_region += s-image_linesize; } } else { for (j = s-y_offset; j s-y_offset + s-cur_h; j++) { -for (i = 0; i s-x_offset * s-bpp; i++) -pd[i] = pd_last[i]; -for (i = (s-x_offset + s-cur_w) * s-bpp; i ls; i++) -pd[i] = pd_last[i]; +int end_offset = (s-x_offset + s-cur_w) * s-bpp; +int end_len= ls - end_offset; +if (s-x_offset) +memcpy(pd, pd_last, s-x_offset * s-bpp); +if (end_len) +memcpy(pd+end_offset, pd_last+end_offset, end_len); pd += s-image_linesize; pd_last += s-image_linesize; } } for (j = s-y_offset + s-cur_h; j s-height; j++) { -for (i = 0; i ls; i++) -pd[i] = pd_last[i]; +memcpy(pd, pd_last, ls); pd += s-image_linesize; pd_last += s-image_linesize; } Unless there is some more remarks on this one, I think it can be applied too. applied thx [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB Many things microsoft did are stupid, but not doing something just because microsoft did it is even more stupid. If everything ms did were stupid they would be bankrupt already. signature.asc Description: Digital signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.
Rely on the way memcpy is optimized for one's system instead of looping on a byte buffer for buffer copies to handle P frames. --- libavcodec/pngdec.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/libavcodec/pngdec.c b/libavcodec/pngdec.c index 7e7b285..8b004bd 100644 --- a/libavcodec/pngdec.c +++ b/libavcodec/pngdec.c @@ -844,15 +844,14 @@ static int decode_fctl_chunk(AVCodecContext *avctx, PNGDecContext *s, static void handle_p_frame_png(PNGDecContext *s, AVFrame *p) { -int i, j; +int j; uint8_t *pd = p-data[0]; uint8_t *pd_last = s-last_picture.f-data[0]; int ls = FFMIN(av_image_get_linesize(p-format, s-width, 0), s-width * s-bpp); ff_thread_await_progress(s-last_picture, INT_MAX, 0); for (j = 0; j s-height; j++) { -for (i = 0; i ls; i++) -pd[i] += pd_last[i]; +memcpy(pd, pd_last, ls); pd += s-image_linesize; pd_last += s-image_linesize; } @@ -884,8 +883,7 @@ static int handle_p_frame_apng(AVCodecContext *avctx, PNGDecContext *s, ff_thread_await_progress(s-previous_picture, INT_MAX, 0); for (j = 0; j s-y_offset; j++) { -for (i = 0; i ls; i++) -pd[i] = pd_last[i]; +memcpy(pd, pd_last, ls); pd += s-image_linesize; pd_last += s-image_linesize; } @@ -907,8 +905,9 @@ static int handle_p_frame_apng(AVCodecContext *avctx, PNGDecContext *s, } for (j = s-y_offset; j s-y_offset + s-cur_h; j++) { -for (i = 0; i s-x_offset * s-bpp; i++) -pd[i] = pd_last[i]; +i = s-x_offset * s-bpp; +if (i) +memcpy(pd, pd_last, i); for (; i (s-x_offset + s-cur_w) * s-bpp; i += s-bpp) { uint8_t alpha = pd[i+ai]; @@ -930,26 +929,27 @@ static int handle_p_frame_apng(AVCodecContext *avctx, PNGDecContext *s, break; } } -for (; i ls; i++) -pd[i] = pd_last[i]; +if (ls - i) +memcpy(pd+i, pd_last+i, ls - i); pd += s-image_linesize; pd_last += s-image_linesize; pd_last_region += s-image_linesize; } } else { for (j = s-y_offset; j s-y_offset + s-cur_h; j++) { -for (i = 0; i s-x_offset * s-bpp; i++) -pd[i] = pd_last[i]; -for (i = (s-x_offset + s-cur_w) * s-bpp; i ls; i++) -pd[i] = pd_last[i]; +int end_offset = (s-x_offset + s-cur_w) * s-bpp; +int end_len= ls - end_offset; +if (s-x_offset) +memcpy(pd, pd_last, s-x_offset * s-bpp); +if (end_len) +memcpy(pd+end_offset, pd_last+end_offset, end_len); pd += s-image_linesize; pd_last += s-image_linesize; } } for (j = s-y_offset + s-cur_h; j s-height; j++) { -for (i = 0; i ls; i++) -pd[i] = pd_last[i]; +memcpy(pd, pd_last, ls); pd += s-image_linesize; pd_last += s-image_linesize; } -- 2.2.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.
Hi, - Mail original - Rely on the way memcpy is optimized for one's system instead of looping on a byte buffer for buffer copies to handle P frames. --- libavcodec/pngdec.c | 30 +++--- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/libavcodec/pngdec.c b/libavcodec/pngdec.c index 7e7b285..8b004bd 100644 --- a/libavcodec/pngdec.c +++ b/libavcodec/pngdec.c @@ -844,15 +844,14 @@ static int decode_fctl_chunk(AVCodecContext *avctx, PNGDecContext *s, static void handle_p_frame_png(PNGDecContext *s, AVFrame *p) { -int i, j; +int j; uint8_t *pd = p-data[0]; uint8_t *pd_last = s-last_picture.f-data[0]; int ls = FFMIN(av_image_get_linesize(p-format, s-width, 0), s-width * s-bpp); ff_thread_await_progress(s-last_picture, INT_MAX, 0); for (j = 0; j s-height; j++) { -for (i = 0; i ls; i++) -pd[i] += pd_last[i]; +memcpy(pd, pd_last, ls); Ouch... Reverted locally... -- Ben ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.
Hi, Le 03/12/2014 14:51, Christophe Gisquet a écrit : Hi, 2014-12-03 14:16 GMT+01:00 Benoit Fouet benoit.fo...@free.fr: Rely on the way memcpy is optimized for one's system instead of looping on a byte buffer for buffer copies to handle P frames. Are there many compilers left that actually perform a call here instead of inlining code? Some people will smugly call that ricing, but it would be interesting to bench this for small and large images. As you mention P frames, that might be smaller widths here, where the overhead of checking the various aligments etc in a C lib memcpy is larger. When the overhead here is high, then it means that we are going to perform a lot on the frame itself, looping using byte buffers. I believe this is a good compromise (and also I had up to 20% better performances on the APNG samples I have). -- Ben ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.
Hi, On December 3, 2014 6:39:12 PM GMT+01:00, Christophe Gisquet christophe.gisq...@gmail.com wrote: Hi, 2014-12-03 18:32 GMT+01:00 Benoit Fouet benoit.fo...@free.fr: When the overhead here is high, then it means that we are going to perform a lot on the frame itself, looping using byte buffers. I believe this is a good compromise (and also I had up to 20% better performances on the APNG samples I have). I have just benchmarked on a few animated pngs, and it's systematically much better here. Cool, and also thanks for testing this! -- Ben ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel