On Sun, Nov 26, 2017 at 07:07:41PM +0100, Martin Vignali wrote:
> Hello,
>
> The attached patches
>
> 0001-avcodec-huffyuvenc-increase-scalar-loop-count
> and
> 0003-avcodec-huffyuvenc-sub_left_prediction_bgr32-call-ds
>
> increase the scalar loop count, now that diff_bytes and diff_bytes16 have
> AVX2 versions, so that the aligned version is called in most cases.
>
>
>
> 0002-avcodec-huffyuvenc-remove-code-duplication-in
> removes some code duplication between the width < 32 path and the initial
> scalar loop.
>
>
> FATE passes for me (x86_64, macOS 10.12).
>
> Martin
> huffyuvenc.c |4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 32eecc99e666808926e1dec4ff35c17a94f5f86e
> 0001-avcodec-huffyuvenc-increase-scalar-loop-count.patch
> From 9477be212247012ac386beeff009a2edb78abb31 Mon Sep 17 00:00:00 2001
> From: Martin Vignali
> Date: Sun, 26 Nov 2017 19:01:29 +0100
> Subject: [PATCH 1/3] avcodec/huffyuvenc : increase scalar loop count
>
> in order to try to call dsp in aligned mode
> (diff_int16 have AVX2 now)
> ---
> libavcodec/huffyuvenc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/huffyuvenc.c b/libavcodec/huffyuvenc.c
> index 89639b75df..4f3a28e033 100644
> --- a/libavcodec/huffyuvenc.c
> +++ b/libavcodec/huffyuvenc.c
> @@ -80,12 +80,12 @@ static inline int sub_left_prediction(HYuvContext *s, uint8_t *dst,
> }
> return left;
> } else {
> -for (i = 0; i < 16; i++) {
> +for (i = 0; i < 32; i++) {
> const int temp = src16[i];
> dst16[i] = temp - left;
> left = temp;
> }
> -s->hencdsp.diff_int16(dst16 + 16, src16 + 16, src16 + 15, s->n - 1, w - 16);
> +s->hencdsp.diff_int16(dst16 + 32, src16 + 32, src16 + 31, s->n - 1, w - 32);
> return src16[w-1];
> }
> }
> --
> 2.11.0 (Apple Git-81)
>
> huffyuvenc.c | 46 --
> 1 file changed, 16 insertions(+), 30 deletions(-)
> ba80747db2582141ec0faefc5ccd04fba65c7d72
> 0002-avcodec-huffyuvenc-remove-code-duplication-in.patch
> From 7fa991ae72c97f4d1f74789e543cf01dcb93adb9 Mon Sep 17 00:00:00 2001
> From: Martin Vignali
> Date: Sun, 26 Nov 2017 19:02:10 +0100
> Subject: [PATCH 2/3] avcodec/huffyuvenc : remove code duplication in
> sub_left_prediction
>
> start of the line (before dsp call), can be merge with width < 32 part
> ---
> libavcodec/huffyuvenc.c | 46 --
> 1 file changed, 16 insertions(+), 30 deletions(-)
>
> diff --git a/libavcodec/huffyuvenc.c b/libavcodec/huffyuvenc.c
> index 4f3a28e033..59da49212e 100644
> --- a/libavcodec/huffyuvenc.c
> +++ b/libavcodec/huffyuvenc.c
> @@ -53,41 +53,27 @@ static inline int sub_left_prediction(HYuvContext *s, uint8_t *dst,
> {
> int i;
> if (s->bps <= 8) {
> -if (w < 32) {
> -for (i = 0; i < w; i++) {
> -const int temp = src[i];
> -dst[i] = temp - left;
> -left = temp;
> -}
> -return left;
> -} else {
> -for (i = 0; i < 32; i++) {
> -const int temp = src[i];
> -dst[i] = temp - left;
> -left = temp;
> -}
> -s->llvidencdsp.diff_bytes(dst + 32, src + 32, src + 31, w - 32);
> -return src[w-1];
> +for (i = 0; i < FFMIN(w, 32); i++) { /* scalar loop before dsp call */
> +const int temp = src[i];
> +dst[i] = temp - left;
> +left = temp;

Requiring FFMIN() to be evaluated on every iteration could be slower
if the compiler fails to hoist it out of the loop.

No other comments from me; the patches should be OK otherwise.
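
To illustrate the FFMIN() point, one possible shape is to compute the bound
once before the loop. This is only a sketch: the helper name
scalar_left_prefix and the variable min_width are hypothetical, and FFMIN is
redefined locally so the snippet compiles on its own. It is not taken from
the patch.

```c
#include <stdint.h>

/* Local stand-in for FFmpeg's FFMIN so this sketch is self-contained. */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/*
 * Hypothetical helper: scalar left prediction over the first
 * min(w, 32) samples of a line. Returns the updated "left" value.
 */
static int scalar_left_prefix(uint8_t *dst, const uint8_t *src,
                              int w, int left)
{
    /* Evaluate the bound once, so the loop condition is a plain compare
     * regardless of what the compiler manages to hoist. */
    const int min_width = FFMIN(w, 32);
    int i;

    for (i = 0; i < min_width; i++) {
        const int temp = src[i];
        dst[i] = temp - left;   /* residual, wraps modulo 256 */
        left   = temp;
    }
    return left;
}
```

Since w does not change inside the loop, most optimizing compilers will
probably hoist the comparison anyway; the explicit local just removes the
dependency on that.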
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicin you will get
a job for life at the pharma industry.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel