Re: [FFmpeg-devel] avcodec/huffyuvenc : try to call dsp with aligned data, and remove code duplication

2017-12-09 Thread Martin Vignali
2017-12-02 18:59 GMT+01:00 Martin Vignali:

>> requiring FFMIN() to be evaluated per iteration could be slower
>> if the compiler fails to factor it out
>
> New patches attached:
>
> 001 : unchanged
> 002 : add "int min_width = FFMIN(w, 32)" at the start of the func
> 003 : add "int min_width = FFMIN(w, 8)" at the start of the func
>
> Passes the FATE tests for me.

Pushed, thanks.

Martin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] avcodec/huffyuvenc : try to call dsp with aligned data, and remove code duplication

2017-12-02 Thread Martin Vignali
> requiring FFMIN() to be evaluated per iteration could be slower
> if the compiler fails to factor it out

New patches attached:

001 : unchanged
002 : add "int min_width = FFMIN(w, 32)" at the start of the func
003 : add "int min_width = FFMIN(w, 8)" at the start of the func
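
For readers following along, here is a minimal sketch of what change 002 amounts to on the 8-bit path: the bound is computed once into min_width instead of evaluating FFMIN() in the loop condition. This is not the attached patch. MIN_OF is a local stand-in for FFMIN, diff_bytes_scalar is a hypothetical plain-C substitute for the diff_bytes dsp function, and sub_left_prediction8 is an illustrative name with the HYuvContext argument dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for FFmpeg's FFMIN so the sketch compiles on its own. */
#define MIN_OF(a, b) ((a) > (b) ? (b) : (a))

/* Hypothetical scalar substitute for the dsp routine:
 * dst[i] = src1[i] - src2[i] for i in [0, w). */
static void diff_bytes_scalar(uint8_t *dst, const uint8_t *src1,
                              const uint8_t *src2, int w)
{
    for (int i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]);
}

/* 8-bit left prediction with the loop bound computed once up front. */
static int sub_left_prediction8(uint8_t *dst, const uint8_t *src,
                                int w, int left)
{
    const int min_width = MIN_OF(w, 32); /* hoisted out of the loop condition */
    int i;

    assert(w > 0);

    for (i = 0; i < min_width; i++) { /* scalar loop before the bulk call */
        const int temp = src[i];
        dst[i] = (uint8_t)(temp - left);
        left   = temp;
    }
    if (w < 32)
        return left;

    /* Remaining bytes: each output is the difference to its left neighbour. */
    diff_bytes_scalar(dst + 32, src + 32, src + 31, w - 32);
    return src[w - 1];
}
```

With w < 32 only the scalar loop runs and the function returns the last source value it saw; otherwise the remaining w - 32 bytes go through the bulk call.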


Passes the FATE tests for me.

Martin


Attachments:
0001-avcodec-huffyuvenc-increase-scalar-loop-count.patch
0002-avcodec-huffyuvenc-remove-code-duplication-in.patch
0003-avcodec-huffyuvenc-sub_left_prediction_bgr32-call-ds.patch


Re: [FFmpeg-devel] avcodec/huffyuvenc : try to call dsp with aligned data, and remove code duplication

2017-12-01 Thread Michael Niedermayer
On Sun, Nov 26, 2017 at 07:07:41PM +0100, Martin Vignali wrote:
> Hello,
> 
> the attached patches
>
> 0001-avcodec-huffyuvenc-increase-scalar-loop-count
> and
> 0003-avcodec-huffyuvenc-sub_left_prediction_bgr32-call-ds
>
> increase the scalar loop count so that the aligned version of the dsp
> function is called in most cases, since diff_bytes and diff_bytes16
> have AVX2 versions.
>
> 0002-avcodec-huffyuvenc-remove-code-duplication-in
> removes some code duplication between the width < 32 case and the
> initial scalar loop.
>
> Passes the FATE tests for me (x86_64, macOS 10.12).
> 
> Martin

>  huffyuvenc.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 32eecc99e666808926e1dec4ff35c17a94f5f86e  0001-avcodec-huffyuvenc-increase-scalar-loop-count.patch
> From 9477be212247012ac386beeff009a2edb78abb31 Mon Sep 17 00:00:00 2001
> From: Martin Vignali 
> Date: Sun, 26 Nov 2017 19:01:29 +0100
> Subject: [PATCH 1/3] avcodec/huffyuvenc : increase scalar loop count
> 
> in order to try to call dsp in aligned mode
> (diff_int16 have AVX2 now)
> ---
>  libavcodec/huffyuvenc.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/libavcodec/huffyuvenc.c b/libavcodec/huffyuvenc.c
> index 89639b75df..4f3a28e033 100644
> --- a/libavcodec/huffyuvenc.c
> +++ b/libavcodec/huffyuvenc.c
> @@ -80,12 +80,12 @@ static inline int sub_left_prediction(HYuvContext *s, uint8_t *dst,
>              }
>              return left;
>          } else {
> -            for (i = 0; i < 16; i++) {
> +            for (i = 0; i < 32; i++) {
>                  const int temp = src16[i];
>                  dst16[i] = temp - left;
>                  left   = temp;
>              }
> -            s->hencdsp.diff_int16(dst16 + 16, src16 + 16, src16 + 15, s->n - 1, w - 16);
> +            s->hencdsp.diff_int16(dst16 + 32, src16 + 32, src16 + 31, s->n - 1, w - 32);
>              return src16[w-1];
>          }
>      }
> -- 
> 2.11.0 (Apple Git-81)
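
The alignment reasoning behind this patch can be checked with a few lines of C. The sketch below assumes the row pointers start on a 32-byte boundary, which the thread does not state; is_aligned and bulk_offset_bytes are hypothetical helpers, not FFmpeg functions. It shows the byte offset at which the dsp call starts once the scalar loop has consumed a given number of elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if p sits on an `align`-byte boundary.
 * `align` must be a power of two. The pointer-to-integer cast is
 * implementation-defined but behaves as expected on mainstream targets. */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Byte offset at which the bulk (dsp) call starts after the scalar loop
 * has handled `prologue` elements of `elem_size` bytes each. */
static size_t bulk_offset_bytes(size_t prologue, size_t elem_size)
{
    return prologue * elem_size;
}
```

Under that assumption, a 32-element prologue puts the dsp call 32 bytes into an 8-bit row and 64 bytes into a 16-bit row, both multiples of 32. If the row itself is not aligned, none of this holds.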
> 

>  huffyuvenc.c |   46 --
>  1 file changed, 16 insertions(+), 30 deletions(-)
> ba80747db2582141ec0faefc5ccd04fba65c7d72  0002-avcodec-huffyuvenc-remove-code-duplication-in.patch
> From 7fa991ae72c97f4d1f74789e543cf01dcb93adb9 Mon Sep 17 00:00:00 2001
> From: Martin Vignali 
> Date: Sun, 26 Nov 2017 19:02:10 +0100
> Subject: [PATCH 2/3] avcodec/huffyuvenc : remove code duplication in 
>  sub_left_prediction
> 
> start of the line (before dsp call), can be merge with width < 32 part
> ---
>  libavcodec/huffyuvenc.c | 46 --
>  1 file changed, 16 insertions(+), 30 deletions(-)
> 
> diff --git a/libavcodec/huffyuvenc.c b/libavcodec/huffyuvenc.c
> index 4f3a28e033..59da49212e 100644
> --- a/libavcodec/huffyuvenc.c
> +++ b/libavcodec/huffyuvenc.c
> @@ -53,41 +53,27 @@ static inline int sub_left_prediction(HYuvContext *s, uint8_t *dst,
>  {
>      int i;
>      if (s->bps <= 8) {
> -        if (w < 32) {
> -            for (i = 0; i < w; i++) {
> -                const int temp = src[i];
> -                dst[i] = temp - left;
> -                left   = temp;
> -            }
> -            return left;
> -        } else {
> -            for (i = 0; i < 32; i++) {
> -                const int temp = src[i];
> -                dst[i] = temp - left;
> -                left   = temp;
> -            }
> -            s->llvidencdsp.diff_bytes(dst + 32, src + 32, src + 31, w - 32);
> -            return src[w-1];
> +        for (i = 0; i < FFMIN(w, 32); i++) { /* scalar loop before dsp call */
> +            const int temp = src[i];
> +            dst[i] = temp - left;
> +            left   = temp;

requiring FFMIN() to be evaluated per iteration could be slower
if the compiler fails to factor it out

no other comments from me, the patches should be ok otherwise

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicin you will get
a job for life at the pharma industry.




Re: [FFmpeg-devel] avcodec/huffyuvenc : try to call dsp with aligned data, and remove code duplication

2017-12-01 Thread Martin Vignali
2017-11-26 19:07 GMT+01:00 Martin Vignali:

> Hello,
>
> the attached patches
>
> 0001-avcodec-huffyuvenc-increase-scalar-loop-count
> and
> 0003-avcodec-huffyuvenc-sub_left_prediction_bgr32-call-ds
>
> increase the scalar loop count so that the aligned version of the dsp
> function is called in most cases, since diff_bytes and diff_bytes16
> have AVX2 versions.
>
> 0002-avcodec-huffyuvenc-remove-code-duplication-in
> removes some code duplication between the width < 32 case and the
> initial scalar loop.
>
> Passes the FATE tests for me (x86_64, macOS 10.12).
>
> Martin
>

Ping.

Martin