Re: [FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.

2014-12-04 Thread Michael Niedermayer
On Thu, Dec 04, 2014 at 03:28:32PM +0100, Benoit Fouet wrote:
 Hi,
 
 Le 03/12/2014 14:31, Benoit Fouet a écrit :
 Rely on the way memcpy is optimized for one's system instead of looping
 on a byte buffer for buffer copies to handle P frames.
 ---
   libavcodec/pngdec.c | 25 +
   1 file changed, 13 insertions(+), 12 deletions(-)
 
 diff --git a/libavcodec/pngdec.c b/libavcodec/pngdec.c
 index 7e7b285..e6b7593 100644
 --- a/libavcodec/pngdec.c
 +++ b/libavcodec/pngdec.c
 @@ -884,8 +884,7 @@ static int handle_p_frame_apng(AVCodecContext *avctx, 
 PNGDecContext *s,
   ff_thread_await_progress(s-previous_picture, INT_MAX, 0);
   for (j = 0; j  s-y_offset; j++) {
 -for (i = 0; i  ls; i++)
 -pd[i] = pd_last[i];
 +memcpy(pd, pd_last, ls);
   pd  += s-image_linesize;
   pd_last += s-image_linesize;
   }
 @@ -907,8 +906,9 @@ static int handle_p_frame_apng(AVCodecContext *avctx, 
 PNGDecContext *s,
   }
   for (j = s-y_offset; j  s-y_offset + s-cur_h; j++) {
 -for (i = 0; i  s-x_offset * s-bpp; i++)
 -pd[i] = pd_last[i];
 +i = s-x_offset * s-bpp;
 +if (i)
 +memcpy(pd, pd_last, i);
   for (; i  (s-x_offset + s-cur_w) * s-bpp; i += s-bpp) {
   uint8_t alpha = pd[i+ai];
 @@ -930,26 +930,27 @@ static int handle_p_frame_apng(AVCodecContext *avctx, 
 PNGDecContext *s,
   break;
   }
   }
 -for (; i  ls; i++)
 -pd[i] = pd_last[i];
 +if (ls - i)
 +memcpy(pd+i, pd_last+i, ls - i);
   pd  += s-image_linesize;
   pd_last += s-image_linesize;
   pd_last_region += s-image_linesize;
   }
   } else {
   for (j = s-y_offset; j  s-y_offset + s-cur_h; j++) {
 -for (i = 0; i  s-x_offset * s-bpp; i++)
 -pd[i] = pd_last[i];
 -for (i = (s-x_offset + s-cur_w) * s-bpp; i  ls; i++)
 -pd[i] = pd_last[i];
 +int end_offset = (s-x_offset + s-cur_w) * s-bpp;
 +int end_len= ls - end_offset;
 +if (s-x_offset)
 +memcpy(pd, pd_last, s-x_offset * s-bpp);
 +if (end_len)
 +memcpy(pd+end_offset, pd_last+end_offset, end_len);
   pd  += s-image_linesize;
   pd_last += s-image_linesize;
   }
   }
   for (j = s-y_offset + s-cur_h; j  s-height; j++) {
 -for (i = 0; i  ls; i++)
 -pd[i] = pd_last[i];
 +memcpy(pd, pd_last, ls);
   pd  += s-image_linesize;
   pd_last += s-image_linesize;
   }
 
 Unless there is some more remarks on this one, I think it can be
 applied too.

applied

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many things microsoft did are stupid, but not doing something just because
microsoft did it is even more stupid. If everything ms did were stupid they
would be bankrupt already.


signature.asc
Description: Digital signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.

2014-12-03 Thread Benoit Fouet
Rely on the way memcpy is optimized for one's system instead of looping
on a byte buffer for buffer copies to handle P frames.
---
 libavcodec/pngdec.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/libavcodec/pngdec.c b/libavcodec/pngdec.c
index 7e7b285..8b004bd 100644
--- a/libavcodec/pngdec.c
+++ b/libavcodec/pngdec.c
@@ -844,15 +844,14 @@ static int decode_fctl_chunk(AVCodecContext *avctx, 
PNGDecContext *s,
 
 static void handle_p_frame_png(PNGDecContext *s, AVFrame *p)
 {
-int i, j;
+int j;
 uint8_t *pd  = p-data[0];
 uint8_t *pd_last = s-last_picture.f-data[0];
 int ls = FFMIN(av_image_get_linesize(p-format, s-width, 0), s-width * 
s-bpp);
 
 ff_thread_await_progress(s-last_picture, INT_MAX, 0);
 for (j = 0; j  s-height; j++) {
-for (i = 0; i  ls; i++)
-pd[i] += pd_last[i];
+memcpy(pd, pd_last, ls);
 pd  += s-image_linesize;
 pd_last += s-image_linesize;
 }
@@ -884,8 +883,7 @@ static int handle_p_frame_apng(AVCodecContext *avctx, 
PNGDecContext *s,
 ff_thread_await_progress(s-previous_picture, INT_MAX, 0);
 
 for (j = 0; j  s-y_offset; j++) {
-for (i = 0; i  ls; i++)
-pd[i] = pd_last[i];
+memcpy(pd, pd_last, ls);
 pd  += s-image_linesize;
 pd_last += s-image_linesize;
 }
@@ -907,8 +905,9 @@ static int handle_p_frame_apng(AVCodecContext *avctx, 
PNGDecContext *s,
 }
 
 for (j = s-y_offset; j  s-y_offset + s-cur_h; j++) {
-for (i = 0; i  s-x_offset * s-bpp; i++)
-pd[i] = pd_last[i];
+i = s-x_offset * s-bpp;
+if (i)
+memcpy(pd, pd_last, i);
 for (; i  (s-x_offset + s-cur_w) * s-bpp; i += s-bpp) {
 uint8_t alpha = pd[i+ai];
 
@@ -930,26 +929,27 @@ static int handle_p_frame_apng(AVCodecContext *avctx, 
PNGDecContext *s,
 break;
 }
 }
-for (; i  ls; i++)
-pd[i] = pd_last[i];
+if (ls - i)
+memcpy(pd+i, pd_last+i, ls - i);
 pd  += s-image_linesize;
 pd_last += s-image_linesize;
 pd_last_region += s-image_linesize;
 }
 } else {
 for (j = s-y_offset; j  s-y_offset + s-cur_h; j++) {
-for (i = 0; i  s-x_offset * s-bpp; i++)
-pd[i] = pd_last[i];
-for (i = (s-x_offset + s-cur_w) * s-bpp; i  ls; i++)
-pd[i] = pd_last[i];
+int end_offset = (s-x_offset + s-cur_w) * s-bpp;
+int end_len= ls - end_offset;
+if (s-x_offset)
+memcpy(pd, pd_last, s-x_offset * s-bpp);
+if (end_len)
+memcpy(pd+end_offset, pd_last+end_offset, end_len);
 pd  += s-image_linesize;
 pd_last += s-image_linesize;
 }
 }
 
 for (j = s-y_offset + s-cur_h; j  s-height; j++) {
-for (i = 0; i  ls; i++)
-pd[i] = pd_last[i];
+memcpy(pd, pd_last, ls);
 pd  += s-image_linesize;
 pd_last += s-image_linesize;
 }
-- 
2.2.0

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.

2014-12-03 Thread Benoit Fouet
Hi,

- Mail original -
 Rely on the way memcpy is optimized for one's system instead of
 looping
 on a byte buffer for buffer copies to handle P frames.
 ---
  libavcodec/pngdec.c | 30 +++---
  1 file changed, 15 insertions(+), 15 deletions(-)
 
 diff --git a/libavcodec/pngdec.c b/libavcodec/pngdec.c
 index 7e7b285..8b004bd 100644
 --- a/libavcodec/pngdec.c
 +++ b/libavcodec/pngdec.c
 @@ -844,15 +844,14 @@ static int decode_fctl_chunk(AVCodecContext
 *avctx, PNGDecContext *s,
  
  static void handle_p_frame_png(PNGDecContext *s, AVFrame *p)
  {
 -int i, j;
 +int j;
  uint8_t *pd  = p-data[0];
  uint8_t *pd_last = s-last_picture.f-data[0];
  int ls = FFMIN(av_image_get_linesize(p-format, s-width, 0),
  s-width * s-bpp);
  
  ff_thread_await_progress(s-last_picture, INT_MAX, 0);
  for (j = 0; j  s-height; j++) {
 -for (i = 0; i  ls; i++)
 -pd[i] += pd_last[i];
 +memcpy(pd, pd_last, ls);
 

Ouch... Reverted locally...

-- 
Ben
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.

2014-12-03 Thread Benoit Fouet

Hi,

Le 03/12/2014 14:51, Christophe Gisquet a écrit :

Hi,

2014-12-03 14:16 GMT+01:00 Benoit Fouet benoit.fo...@free.fr:

Rely on the way memcpy is optimized for one's system instead of looping
on a byte buffer for buffer copies to handle P frames.

Are there many compilers left that actually perform a call here
instead of inlining code? Some people will smugly call that ricing,
but it would be interesting to bench this for small and large images.
As you mention P frames, that might be smaller widths here, where
the overhead of checking the various aligments etc in a C lib memcpy
is larger.



When the overhead here is high, then it means that we are going to 
perform a lot on the frame itself, looping using byte buffers.
I believe this is a good compromise (and also I had up to 20% better 
performances on the APNG samples I have).


--
Ben
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 4/4] avcodec/pngdec: use memcpy instead of byte loops for P frames.

2014-12-03 Thread Benoit Fouet
Hi,

On December 3, 2014 6:39:12 PM GMT+01:00, Christophe Gisquet 
christophe.gisq...@gmail.com wrote:
Hi,

2014-12-03 18:32 GMT+01:00 Benoit Fouet benoit.fo...@free.fr:
 When the overhead here is high, then it means that we are going to
perform a
 lot on the frame itself, looping using byte buffers.
 I believe this is a good compromise (and also I had up to 20% better
 performances on the APNG samples I have).

I have just benchmarked on a few animated pngs, and it's
systematically much better here.


Cool, and also thanks for testing this!

-- 
Ben


___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel