Re: [FFmpeg-devel] [PATCH 2/2] avcodec/pngdec: Clean up on av_frame_ref() failure

2017-09-18 Thread James Almer
On 9/16/2017 9:42 PM, Michael Niedermayer wrote:
> Fixes: memleak
> Fixes: 3203/clusterfuzz-testcase-minimized-4514553595428864
> 
> Found-by: continuous fuzzing process 
> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/pngdec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavcodec/pngdec.c b/libavcodec/pngdec.c
> index dce8faf168..0d6612ccca 100644
> --- a/libavcodec/pngdec.c
> +++ b/libavcodec/pngdec.c
> @@ -1414,7 +1414,7 @@ static int decode_frame_png(AVCodecContext *avctx,
>  }
>  
>  if ((ret = av_frame_ref(data, s->picture.f)) < 0)
> -return ret;
> +goto the_end;
>  
>  *got_frame = 1;
>  

LGTM.
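
For readers skimming the archive, here is a minimal sketch of the cleanup pattern the patch switches to. It is not the pngdec code: `buf_alloc`, `buf_free`, `decode_one` and `fail_ref` are made-up names, and `fail_ref` merely simulates a failing av_frame_ref(). It only shows why a bare `return` leaks while `goto the_end` does not.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: number of buffers currently allocated. */
static int live_buffers = 0;

static void *buf_alloc(void)
{
    void *p = malloc(16);
    if (p)
        live_buffers++;
    return p;
}

static void buf_free(void **p)
{
    if (*p) {
        free(*p);
        *p = NULL;
        live_buffers--;
    }
}

/* fail_ref != 0 simulates the reference step failing. */
static int decode_one(int fail_ref)
{
    int ret = 0;
    void *tmp = buf_alloc();

    if (!tmp)
        return -1;      /* nothing allocated yet, a plain return is fine */

    if (fail_ref) {
        ret = -1;
        goto the_end;   /* "return ret;" here would skip buf_free() below */
    }

    /* ... success path would hand data to the caller here ... */

the_end:
    buf_free(&tmp);     /* shared cleanup for success and failure */
    assert(tmp == NULL);
    return ret;
}
```

Calling `decode_one(1)` and then checking `live_buffers` is a quick way to confirm that the failure path releases everything.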
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 3/3 v2] avfilter/interlace: add support for 10 and 12 bit

2017-09-18 Thread James Almer
On 9/18/2017 10:41 PM, Thomas Mundt wrote:
> I tried to set up a MIPS compiler for two days on Windows and Linux without
> success.
> Now I am trying it blind. This solution is based on the first suggestion James
> gave me on IRC.
> There might be room for improvement and an alternative solution with
> AV_RL16() / AV_WL16().
> I used av_le2ne16() because it will be ignored for little endian.
> 
> Regards,
> Thomas

> From a2be5859266b1a2f7048b81ced6770ab4b90a5a4 Mon Sep 17 00:00:00 2001
> From: Thomas Mundt 
> Date: Tue, 19 Sep 2017 00:25:25 +0200
> Subject: [PATCH 3/3 v2] avfilter/interlace: add support for 10 and 12 bit
> 
> Signed-off-by: Thomas Mundt 
> ---
>  libavfilter/interlace.h|  5 +-
>  libavfilter/tinterlace.h   |  5 +-
>  libavfilter/vf_interlace.c | 92 ++
>  libavfilter/vf_tinterlace.c| 73 ++--
>  libavfilter/x86/vf_interlace.asm   | 80 --
>  libavfilter/x86/vf_interlace_init.c| 51 ++
>  libavfilter/x86/vf_tinterlace_init.c   | 51 ++
>  tests/ref/fate/filter-pixfmts-tinterlace_cvlpf | 11 +++
>  tests/ref/fate/filter-pixfmts-tinterlace_merge | 11 +++
>  tests/ref/fate/filter-pixfmts-tinterlace_pad   | 11 +++
>  tests/ref/fate/filter-pixfmts-tinterlace_vlpf  | 11 +++
>  11 files changed, 345 insertions(+), 56 deletions(-)
> 
> diff --git a/libavfilter/interlace.h b/libavfilter/interlace.h
> index 2101b79..90a0198 100644
> --- a/libavfilter/interlace.h
> +++ b/libavfilter/interlace.h
> @@ -25,9 +25,11 @@
>  #ifndef AVFILTER_INTERLACE_H
>  #define AVFILTER_INTERLACE_H
>  
> +#include "libavutil/bswap.h"
>  #include "libavutil/common.h"
>  #include "libavutil/imgutils.h"
>  #include "libavutil/opt.h"
> +#include "libavutil/pixdesc.h"
>  
>  #include "avfilter.h"
>  #include "formats.h"
> @@ -55,8 +57,9 @@ typedef struct InterlaceContext {
>  enum ScanMode scan;// top or bottom field first scanning
>  int lowpass;   // enable or disable low pass filtering
>  AVFrame *cur, *next;   // the two frames from which the new one is 
> obtained
> +const AVPixFmtDescriptor *csp;
>  void (*lowpass_line)(uint8_t *dstp, ptrdiff_t linesize, const uint8_t 
> *srcp,
> - ptrdiff_t mref, ptrdiff_t pref);
> + ptrdiff_t mref, ptrdiff_t pref, int clip_max);
>  } InterlaceContext;
>  
>  void ff_interlace_init_x86(InterlaceContext *interlace);
> diff --git a/libavfilter/tinterlace.h b/libavfilter/tinterlace.h
> index cc13a6c..b5c39aa 100644
> --- a/libavfilter/tinterlace.h
> +++ b/libavfilter/tinterlace.h
> @@ -27,7 +27,9 @@
>  #ifndef AVFILTER_TINTERLACE_H
>  #define AVFILTER_TINTERLACE_H
>  
> +#include "libavutil/bswap.h"
>  #include "libavutil/opt.h"
> +#include "libavutil/pixdesc.h"
>  #include "drawutils.h"
>  #include "avfilter.h"
>  
> @@ -60,8 +62,9 @@ typedef struct TInterlaceContext {
>  int black_linesize[4];
>  FFDrawContext draw;
>  FFDrawColor color;
> +const AVPixFmtDescriptor *csp;
>  void (*lowpass_line)(uint8_t *dstp, ptrdiff_t width, const uint8_t *srcp,
> - ptrdiff_t mref, ptrdiff_t pref);
> + ptrdiff_t mref, ptrdiff_t pref, int clip_max);
>  } TInterlaceContext;
>  
>  void ff_tinterlace_init_x86(TInterlaceContext *interlace);
> diff --git a/libavfilter/vf_interlace.c b/libavfilter/vf_interlace.c
> index 55bf782..bfba054 100644
> --- a/libavfilter/vf_interlace.c
> +++ b/libavfilter/vf_interlace.c
> @@ -61,8 +61,8 @@ static const AVOption interlace_options[] = {
>  AVFILTER_DEFINE_CLASS(interlace);
>  
>  static void lowpass_line_c(uint8_t *dstp, ptrdiff_t linesize,
> -   const uint8_t *srcp,
> -   ptrdiff_t mref, ptrdiff_t pref)
> +   const uint8_t *srcp, ptrdiff_t mref,
> +   ptrdiff_t pref, int clip_max)
>  {
>  const uint8_t *srcp_above = srcp + mref;
>  const uint8_t *srcp_below = srcp + pref;
> @@ -75,9 +75,28 @@ static void lowpass_line_c(uint8_t *dstp, ptrdiff_t 
> linesize,
>  }
>  }
>  
> +static void lowpass_line_c_16(uint8_t *dst8, ptrdiff_t linesize,
> +  const uint8_t *src8, ptrdiff_t mref,
> +  ptrdiff_t pref, int clip_max)
> +{
> +uint16_t *dstp = (uint16_t *)dst8;
> +const uint16_t *srcp = (const uint16_t *)src8;
> +const uint16_t *srcp_above = srcp + mref / 2;
> +const uint16_t *srcp_below = srcp + pref / 2;
> +int i;
> +for (i = 0; i < linesize; i++) {
> +// this calculation is an integer representation of
> +// '0.5 * current + 0.25 * above + 0.25 * below'
> +// '1 +' is for rounding.
> +dstp[i] = av_le2ne16((1 + av_le2ne16(srcp[i]) + av_le2ne16(srcp[i])
> +   

[FFmpeg-devel] [PATCH 3/3 v2] avfilter/interlace: add support for 10 and 12 bit

2017-09-18 Thread Thomas Mundt
I tried to set up a MIPS compiler for two days on Windows and Linux without
success.
Now I am trying it blind. This solution is based on the first suggestion James
gave me on IRC.
There might be room for improvement and an alternative solution with
AV_RL16() / AV_WL16().
I used av_le2ne16() because it will be ignored for little endian.

Regards,
Thomas


0003-avfilter-interlace-add-support-for-10-and-12-bit.patch
Description: Binary data
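
A rough sketch of what the AV_RL16() / AV_WL16() alternative mentioned above could look like, assuming little-endian sample storage. `rl16`, `wl16` and `lowpass_line_le16` are local stand-ins written for this illustration, not the libavutil macros or the code from the patch. The arithmetic is the simple '0.5 * current + 0.25 * above + 0.25 * below' form with '1 +' for rounding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for AV_RL16()/AV_WL16(): byte-wise little-endian access,
 * so the result is the same on big- and little-endian hosts. */
static unsigned rl16(const uint8_t *p)
{
    return p[0] | ((unsigned)p[1] << 8);
}

static void wl16(uint8_t *p, unsigned v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
}

/* width is in samples; mref and pref are byte offsets from src to the
 * line above and the line below. */
static void lowpass_line_le16(uint8_t *dst, ptrdiff_t width,
                              const uint8_t *src,
                              ptrdiff_t mref, ptrdiff_t pref)
{
    ptrdiff_t i;

    assert(width >= 0);
    for (i = 0; i < width; i++) {
        unsigned cur   = rl16(src + 2 * i);
        unsigned above = rl16(src + mref + 2 * i);
        unsigned below = rl16(src + pref + 2 * i);

        /* (1 + 2 * cur + above + below) / 4, rounded as in the patch */
        wl16(dst + 2 * i, (1 + cur + cur + above + below) >> 2);
    }
}
```

The byte-wise form costs a little on little-endian hosts compared to a swap that compiles away there, which is the trade-off the message above alludes to.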


[FFmpeg-devel] [PATCH 2/3 v2] avfilter/tinterlace: use drawutils for pad mode

2017-09-18 Thread Thomas Mundt
Patch 1/3 has already been applied.
Patch attached.


0002-avfilter-tinterlace-use-drawutils-for-pad-mode.patch
Description: Binary data


[FFmpeg-devel] [PATCH 2/2] avcodec/ffv1dec: Fix integer overflow in read_quant_table()

2017-09-18 Thread Michael Niedermayer
Fixes: runtime error: signed integer overflow: 2147483647 + 1 cannot be 
represented in type 'int'
Fixes: 3361/clusterfuzz-testcase-minimized-5065842955911168

Found-by: continuous fuzzing process 
https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavcodec/ffv1dec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c
index b13ecd3eab..d2bfee784f 100644
--- a/libavcodec/ffv1dec.c
+++ b/libavcodec/ffv1dec.c
@@ -372,7 +372,7 @@ static int read_quant_table(RangeCoder *c, int16_t 
*quant_table, int scale)
 memset(state, 128, sizeof(state));
 
 for (v = 0; i < 128; v++) {
-unsigned len = get_symbol(c, state, 0) + 1;
+unsigned len = get_symbol(c, state, 0) + 1U;
 
 if (len > 128 - i || !len)
 return AVERROR_INVALIDDATA;
-- 
2.14.1
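
To illustrate why the `1U` matters, here is a small sketch of the length check in isolation. `run_length_ok` is a hypothetical helper, not a function from ffv1dec.c; `sym` plays the role of the value returned by get_symbol() and `i` the number of table entries already filled.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 and stores the run length if it fits into the remaining
 * table space, 0 otherwise. */
static int run_length_ok(int sym, int i, unsigned *len_out)
{
    /* sym + 1 would be signed overflow for sym == INT_MAX.
     * sym + 1U converts sym to unsigned first, so the sum wraps
     * instead of being undefined. */
    unsigned len = sym + 1U;

    assert(i >= 0 && i <= 128);
    if (len > 128U - i || !len)
        return 0;       /* too long, or wrapped around to zero */
    *len_out = len;
    return 1;
}
```

With this form, `sym == INT_MAX` yields a huge `len` that fails the range check, and `sym == -1` yields `len == 0`, which the `!len` test rejects.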



[FFmpeg-devel] [PATCH 1/2] avcodec/svq3: Fix overflow in svq3_add_idct_c()

2017-09-18 Thread Michael Niedermayer
Fixes: runtime error: signed integer overflow: 2147392585 + 524288 cannot be 
represented in type 'int'
Fixes: 3348/clusterfuzz-testcase-minimized-4809500517203968

Found-by: continuous fuzzing process 
https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer 
---
 libavcodec/svq3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/svq3.c b/libavcodec/svq3.c
index a766fa49ad..5cb5bd45b7 100644
--- a/libavcodec/svq3.c
+++ b/libavcodec/svq3.c
@@ -285,7 +285,7 @@ static void svq3_add_idct_c(uint8_t *dst, int16_t *block,
 const unsigned z1 = 13 * (block[i + 4 * 0] -  block[i + 4 * 2]);
 const unsigned z2 =  7 *  block[i + 4 * 1] - 17 * block[i + 4 * 3];
 const unsigned z3 = 17 *  block[i + 4 * 1] +  7 * block[i + 4 * 3];
-const int rr = (dc + 0x80000);
+const int rr = (dc + 0x80000u);
 
 dst[i + stride * 0] = av_clip_uint8(dst[i + stride * 0] + ((int)((z0 + 
z3) * qmul + rr) >> 20));
 dst[i + stride * 1] = av_clip_uint8(dst[i + stride * 1] + ((int)((z1 + 
z2) * qmul + rr) >> 20));
-- 
2.14.1
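
A short sketch of the arithmetic involved, under the assumption that the 524288 in the sanitizer report is the rounding bias for the `>> 20` (524288 is 0x80000, half of 1 << 20). `rounding_bias` and `scale_down` are hypothetical helpers for illustration, and `acc` stands in for the unsigned product terms in the IDCT.

```c
#include <assert.h>
#include <limits.h>

/* dc is converted to unsigned before the addition, so the sum wraps
 * modulo UINT_MAX + 1 instead of overflowing a signed int. */
static unsigned rounding_bias(int dc)
{
    return dc + 0x80000u;
}

/* Adds the bias and scales down by 2^20. */
static int scale_down(unsigned acc, int dc)
{
    unsigned sum = acc + rounding_bias(dc);

    /* This sketch only covers sums that fit in int; converting a larger
     * value to int is implementation-defined and not modelled here. */
    assert(sum <= INT_MAX);
    return (int)sum >> 20;
}
```

The value from the report, 2147392585, passes through `rounding_bias` without undefined behaviour and simply produces 2147916873 as an unsigned number.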



Re: [FFmpeg-devel] [PATCH 1/2] avcodec/wmv2dec: Check end of bitstream in parse_mb_skip() and ff_wmv2_decode_mb()

2017-09-18 Thread Paul B Mahol
On 9/18/17, Carl Eugen Hoyos  wrote:
> 2017-09-18 15:11 GMT+02:00 Ronald S. Bultje :
>> Hi Michael,
>>
>> On Sun, Sep 17, 2017 at 8:15 PM, Michael Niedermayer wrote:
>>
>>> I am happy to follow what the community prefers.
>>>
>>
>> Some don't like it, some don't care. I think everyone would be happy (and
>> thus the sum of happiness would increase) if you changed this to ff_dlog()
>> or something along those lines.
>
> I don't think this makes much sense for an error message shown
> on corrupted input.

Such messages just pollute codebase and binary.


Re: [FFmpeg-devel] [PATCH]lavfi/stereo3d: Set SAR for every output frame

2017-09-18 Thread Paul B Mahol
On 9/18/17, Carl Eugen Hoyos  wrote:
> Hi!
>
> Attached patch fixes ticket #6672.
>
> Please comment, Carl Eugen
>

ok


[FFmpeg-devel] [PATCH]lavfi/stereo3d: Set SAR for every output frame

2017-09-18 Thread Carl Eugen Hoyos
Hi!

Attached patch fixes ticket #6672.

Please comment, Carl Eugen
From c4d3ba1d69e7dd7a60ea7150eb1fc545a86697fe Mon Sep 17 00:00:00 2001
From: Carl Eugen Hoyos 
Date: Mon, 18 Sep 2017 23:10:06 +0200
Subject: [PATCH] lavfi/stereo3d: Set SAR for every output frame.

Fixes ticket #6672.
---
 libavfilter/vf_stereo3d.c |   22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/libavfilter/vf_stereo3d.c b/libavfilter/vf_stereo3d.c
index 3e23890..8b22f88 100644
--- a/libavfilter/vf_stereo3d.c
+++ b/libavfilter/vf_stereo3d.c
@@ -150,6 +150,7 @@ typedef struct Stereo3DContext {
 AVFrame *prev;
 int blanks;
 int in_off_left[4], in_off_right[4];
+AVRational aspect;
 Stereo3DDSPContext dsp;
 } Stereo3DContext;
 
@@ -359,11 +360,11 @@ static int config_output(AVFilterLink *outlink)
 AVFilterContext *ctx = outlink->src;
 AVFilterLink *inlink = ctx->inputs[0];
 Stereo3DContext *s = ctx->priv;
-AVRational aspect = inlink->sample_aspect_ratio;
 AVRational fps = inlink->frame_rate;
 AVRational tb = inlink->time_base;
 const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(outlink->format);
 int ret;
+s->aspect = inlink->sample_aspect_ratio;
 
 switch (s->in.format) {
 case INTERLEAVE_COLS_LR:
@@ -404,25 +405,25 @@ static int config_output(AVFilterLink *outlink)
 
 switch (s->in.format) {
 case SIDE_BY_SIDE_2_LR:
-aspect.num *= 2;
+s->aspect.num  *= 2;
 case SIDE_BY_SIDE_LR:
 s->width= inlink->w / 2;
 s->in.off_right = s->width;
 break;
 case SIDE_BY_SIDE_2_RL:
-aspect.num *= 2;
+s->aspect.num  *= 2;
 case SIDE_BY_SIDE_RL:
 s->width= inlink->w / 2;
 s->in.off_left  = s->width;
 break;
 case ABOVE_BELOW_2_LR:
-aspect.den *= 2;
+s->aspect.den  *= 2;
 case ABOVE_BELOW_LR:
 s->in.row_right =
 s->height   = inlink->h / 2;
 break;
 case ABOVE_BELOW_2_RL:
-aspect.den *= 2;
+s->aspect.den  *= 2;
 case ABOVE_BELOW_RL:
 s->in.row_left  =
 s->height   = inlink->h / 2;
@@ -486,19 +487,19 @@ static int config_output(AVFilterLink *outlink)
 break;
 }
 case SIDE_BY_SIDE_2_LR:
-aspect.den  *= 2;
+s->aspect.den   *= 2;
 case SIDE_BY_SIDE_LR:
 s->out.width = s->width * 2;
 s->out.off_right = s->width;
 break;
 case SIDE_BY_SIDE_2_RL:
-aspect.den  *= 2;
+s->aspect.den   *= 2;
 case SIDE_BY_SIDE_RL:
 s->out.width = s->width * 2;
 s->out.off_left  = s->width;
 break;
 case ABOVE_BELOW_2_LR:
-aspect.num  *= 2;
+s->aspect.num   *= 2;
 case ABOVE_BELOW_LR:
 s->out.height= s->height * 2;
 s->out.row_right = s->height;
@@ -514,7 +515,7 @@ static int config_output(AVFilterLink *outlink)
 s->out.row_right = s->height + s->blanks;
 break;
 case ABOVE_BELOW_2_RL:
-aspect.num  *= 2;
+s->aspect.num   *= 2;
 case ABOVE_BELOW_RL:
 s->out.height= s->height * 2;
 s->out.row_left  = s->height;
@@ -576,7 +577,7 @@ static int config_output(AVFilterLink *outlink)
 outlink->h = s->out.height;
 outlink->frame_rate = fps;
 outlink->time_base = tb;
-outlink->sample_aspect_ratio = aspect;
+outlink->sample_aspect_ratio = s->aspect;
 
 if ((ret = av_image_fill_linesizes(s->linesize, outlink->format, s->width)) < 0)
 return ret;
@@ -1075,6 +1076,7 @@ copy:
 av_frame_free(&s->prev);
 av_frame_free();
 }
+out->sample_aspect_ratio = s->aspect;
 return ff_filter_frame(outlink, out);
 }
 
-- 
1.7.10.4
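
A simplified sketch of the SAR bookkeeping the patch moves into the filter context. `Rational`, `Layout` and `output_sar` are stand-ins invented for this illustration (AVRational and the real format enum are not used), and only the half-resolution cases from the diff are modelled.

```c
#include <assert.h>

/* Stand-in for AVRational. */
typedef struct Rational {
    int num, den;
} Rational;

/* Hypothetical subset of layouts: full or half resolution per view. */
enum Layout { MONO, SBS_FULL, SBS_HALF, AB_FULL, AB_HALF };

static Rational output_sar(Rational sar, enum Layout in, enum Layout out)
{
    assert(sar.den != 0);

    /* Half-resolution input: undo the squeeze of each view. */
    if (in == SBS_HALF)
        sar.num *= 2;
    if (in == AB_HALF)
        sar.den *= 2;

    /* Half-resolution output: apply the squeeze to the packed frame. */
    if (out == SBS_HALF)
        sar.den *= 2;
    if (out == AB_HALF)
        sar.num *= 2;

    return sar;
}
```

In the patch the computed value is stored once in config_output() and then copied onto every output frame, which is the part that fixes the ticket.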



Re: [FFmpeg-devel] [PATCH 2/2] MAINTAINERS: modify the hlsenc description

2017-09-18 Thread Lou Logan
On Sun, Sep 17, 2017, at 06:08 PM, Steven Liu wrote:
> change the hlsenc from hls encryption to hlsenc
> 
> Suggested-by: Aman Gupta 
> Signed-off-by: Steven Liu 
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0b0f7fa1e4..81e1e4916a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -409,7 +409,7 @@ Muxers/Demuxers:
>gxf.c Reimar Doeffinger
>gxfenc.c  Baptiste Coudurier
>hls.c Anssi Hannula
> -  hls encryption (hlsenc.c) Christian Suloway, Steven Liu
> +  hlsenc.c  Christian Suloway, Steven Liu
>idcin.c   Mike Melanson
>idroqdec.cMike Melanson
>iff.c Jaikrishnan Menon
> -- 

LGTM


Re: [FFmpeg-devel] [PATCH] doc/faq: replace "Mo" with Bytes

2017-09-18 Thread Werner Robitza
On Mon, Sep 18, 2017 at 5:06 PM, Nicolas George  wrote:
>
> On the Day of Genius, year CCXXV, Werner Robitza wrote:
> > Replaces French "Mo" with "Bytes".
> >
> > Signed-off-by: Werner Robitza 
> > ---
> >  doc/faq.texi | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
>
> No. "Octet" originated from French but has been imported to English
> because "byte" causes a lot of confusion with "bit". RFCs and other
> texts where accuracy matters started adopting it long ago
> (although not all of them did so consistently, of course). With audio-video
> tools, the confusion with bits is quite frequent, which is a good
> reason to take all steps to avoid it.

Hum, okay. Didn't think this was a conscious decision. I frequently
see speakers of French using "Mo" without knowing that (most of) the
rest of the world barely has an idea what this is. And I do understand
that octets are used in networking and RFCs – but then as "octet" –,
and that you may want to avoid ambiguity.

But this is literally the only place in FFmpeg's documentation where
this abbreviation is used, with more than 50 mentions of "Byte" in
http://ffmpeg.org/ffmpeg-all.html alone. So… is "-probesize" doing
something special?


Re: [FFmpeg-devel] [PATCH] doc/faq: replace "Mo" with Bytes

2017-09-18 Thread Nicolas George
On the Day of Genius, year CCXXV, Werner Robitza wrote:
> Hum, okay. Didn't think this was a conscious decision. I frequently
> see speakers of French using "Mo" without knowing that (most of) the
> rest of the world barely has an idea what this is. And I do understand
> that octets are used in networking and RFCs – but then as "octet" –,
> and that you may want to avoid ambiguity.

I would not mind a patch that expands Mo into mega-octet.

Nor a patch to replace all occurrences of byte, for that matter.

Case in point: the Vim spell file for English knows mega-octet.
(Hum, not sure if it knows it or mega and octet separately.)

Regards,

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH]configure: Add -Wno-main

2017-09-18 Thread Nicolas George
On the Day of Genius, year CCXXV, Clement Boesch wrote:
> I'm with James on this one, it's easy and harmless to fix, so I think
> we should do that instead.

I am also for changing the variable names. But it should also be
reported to gcc, because this warning is utterly braindead for local
symbols.

Regards,

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH] doc/faq: replace "Mo" with Bytes

2017-09-18 Thread Nicolas George
On the Day of Genius, year CCXXV, Werner Robitza wrote:
> Replaces French "Mo" with "Bytes".
> 
> Signed-off-by: Werner Robitza 
> ---
>  doc/faq.texi | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

No. "Octet" originated from French but has been imported to English
because "byte" causes a lot of confusion with "bit". RFCs and other
texts where accuracy matters started adopting it long ago
(although not all of them did so consistently, of course). With audio-video
tools, the confusion with bits is quite frequent, which is a good
reason to take all steps to avoid it.

Regards,

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH]configure: Add -Wno-main

2017-09-18 Thread Carl Eugen Hoyos
2017-09-18 7:54 GMT+02:00 Clément Bœsch :
> On Mon, Sep 18, 2017 at 03:55:12AM +0200, Carl Eugen Hoyos wrote:
>> 2017-09-18 3:47 GMT+02:00 James Almer :
>> > On 9/17/2017 10:37 PM, Carl Eugen Hoyos wrote:
>> >> Hi!
>> >>
>> >> Attached patch fixes several warnings when compiling libavfilter with
>> >> current gcc.
>> >>
>> >> Please comment, Carl Eugen
>> >
>> > IMO, it would be better if we instead rename all the cases of "main"
>> > used across the codebase.
>>
>> No strong opinion here, I would prefer if the warning were silenced.
>
> I'm with James on this one, it's easy and harmless to fix, so I think
> we should do that instead.

Sorry for being unclear:
I would just like to silence the warnings currently shown when
compiling FFmpeg, I do not care how they are silenced, I
made a suggestion that I thought would have less impact and
less discussion. ("Where in the specification is it written that
a variable must not be called 'main'?")

Sorry, Carl Eugen


Re: [FFmpeg-devel] [PATCH] x86/exrdsp: optimize ff_reorder_pixels_avx2()

2017-09-18 Thread James Almer
On 9/18/2017 4:35 AM, Martin Vignali wrote:
> 2017-09-18 3:52 GMT+02:00 James Almer :
> 
>> From: Henrik Gramner 
>>
>> Tested with "checkasm --test=exrdsp -bench"
>>
>> Before:
>> reorder_pixels_c: 5187.8
>> reorder_pixels_sse2: 377.0
>> reorder_pixels_avx2: 331.3
>>
>> After:
>> reorder_pixels_c: 5181.5
>> reorder_pixels_sse2: 377.0
>> reorder_pixels_avx2: 313.8
>>
> I don't have the same result using a start/stop timer,
> but your testing approach is probably better than mine.

I also had a hard time noticing a difference with
start/stop_timer, and it's clear why, seeing how little difference this
change truly makes.

You can build the above tool with "make checkasm", and the executable
will be in the tests/checkasm folder. The results tend to be less
variable and it's better at detecting small differences between functions.

> And since you both think it's a better way to do it, it's ok for me!
> 
> Thanks
> 
> Martin


Re: [FFmpeg-devel] [PATCH] avfilter: add vmafmotion filter

2017-09-18 Thread Ashish Pratap Singh
Hi,

On Mon, Sep 18, 2017 at 6:46 AM, James Almer  wrote:

> On 9/15/2017 5:47 PM, Ashish Pratap Singh wrote:
> > From: Ashish Singh 
> >
> > Hi, this patch addresses the previous issues and changes it to a single
> > input filter.
> >
> > Signed-off-by: Ashish Singh 
> > ---
> >  Changelog   |   1 +
> >  doc/filters.texi|  14 ++
> >  libavfilter/Makefile|   1 +
> >  libavfilter/allfilters.c|   1 +
> >  libavfilter/vf_vmafmotion.c | 325 ++
> >  libavfilter/vmaf_motion.h   |  58 
> >  6 files changed, 400 insertions(+)
> >  create mode 100644 libavfilter/vf_vmafmotion.c
> >  create mode 100644 libavfilter/vmaf_motion.h
> >
> > diff --git a/Changelog b/Changelog
> > index ea48e81..574f46e 100644
> > --- a/Changelog
> > +++ b/Changelog
> > @@ -48,6 +48,7 @@ version :
> >  - convolve video filter
> >  - VP9 tile threading support
> >  - KMS screen grabber
> > +- vmafmotion video filter
> >
> >  version 3.3:
> >  - CrystalHD decoder moved to new decode API
> > diff --git a/doc/filters.texi b/doc/filters.texi
> > index 830de54..d996357 100644
> > --- a/doc/filters.texi
> > +++ b/doc/filters.texi
> > @@ -15570,6 +15570,20 @@ vignette='PI/4+random(1)*PI/50':eval=frame
> >
> >  @end itemize
> >
> > +@section vmafmotion
> > +
> > +Obtain the average vmaf motion score of a video.
> > +It is one of the component filters of VMAF.
> > +
> > +The obtained average motion score is printed through the logging system.
> > +
> > +In the below example the input file @file{ref.mpg} is being processed
> and score
> > +is computed.
> > +
> > +@example
> > +ffmpeg -i ref.mpg -lavfi vmafmotion -f null -
> > +@end example
> > +
> >  @section vstack
> >  Stack input videos vertically.
> >
> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > index 8aa974e..4289ee0 100644
> > --- a/libavfilter/Makefile
> > +++ b/libavfilter/Makefile
> > @@ -330,6 +330,7 @@ OBJS-$(CONFIG_VFLIP_FILTER)  +=
> vf_vflip.o
> >  OBJS-$(CONFIG_VIDSTABDETECT_FILTER)  += vidstabutils.o
> vf_vidstabdetect.o
> >  OBJS-$(CONFIG_VIDSTABTRANSFORM_FILTER)   += vidstabutils.o
> vf_vidstabtransform.o
> >  OBJS-$(CONFIG_VIGNETTE_FILTER)   += vf_vignette.o
> > +OBJS-$(CONFIG_VMAFMOTION_FILTER) += vf_vmafmotion.o
> framesync.o
> >  OBJS-$(CONFIG_VSTACK_FILTER) += vf_stack.o framesync.o
> >  OBJS-$(CONFIG_W3FDIF_FILTER) += vf_w3fdif.o
> >  OBJS-$(CONFIG_WAVEFORM_FILTER)   += vf_waveform.o
> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > index 63e8672..8ec54be 100644
> > --- a/libavfilter/allfilters.c
> > +++ b/libavfilter/allfilters.c
> > @@ -341,6 +341,7 @@ static void register_all(void)
> >  REGISTER_FILTER(VIDSTABDETECT,  vidstabdetect,  vf);
> >  REGISTER_FILTER(VIDSTABTRANSFORM, vidstabtransform, vf);
> >  REGISTER_FILTER(VIGNETTE,   vignette,   vf);
> > +REGISTER_FILTER(VMAFMOTION, vmafmotion, vf);
> >  REGISTER_FILTER(VSTACK, vstack, vf);
> >  REGISTER_FILTER(W3FDIF, w3fdif, vf);
> >  REGISTER_FILTER(WAVEFORM,   waveform,   vf);
> > diff --git a/libavfilter/vf_vmafmotion.c b/libavfilter/vf_vmafmotion.c
> > new file mode 100644
> > index 0000000..c31c37c
> > --- /dev/null
> > +++ b/libavfilter/vf_vmafmotion.c
> > @@ -0,0 +1,325 @@
> > +/*
> > + * Copyright (c) 2017 Ronald S. Bultje 
> > + * Copyright (c) 2017 Ashish Pratap Singh 
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
> 02110-1301 USA
> > + */
> > +
> > +/**
> > + * @file
> > + * Calculate VMAF Motion score.
> > + */
> > +
> > +#include "libavutil/opt.h"
> > +#include "libavutil/pixdesc.h"
> > +#include "avfilter.h"
> > +#include "drawutils.h"
> > +#include "formats.h"
> > +#include "internal.h"
> > +#include "vmaf_motion.h"
> > +
> > +#define vmafmotion_options NULL
>
> This is unused.
>
Ok, I'll remove it.

>
> > +#define BIT_SHIFT 10
> > +
> > +static const float FILTER_5[5] = {
> > +0.054488685,
> > +0.244201342,
> > +

Re: [FFmpeg-devel] [PATCH] doc/faq: replace "Mo" with Bytes

2017-09-18 Thread Werner Robitza
On Mon, Sep 18, 2017 at 3:45 PM, Ricardo Constantino  wrote:
> Why not MB instead? It's still more readable than Bytes/B.

Because -probesize takes the number of Bytes by default.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] doc/faq: replace "Mo" with Bytes

2017-09-18 Thread Ricardo Constantino
On 18 September 2017 at 14:27, Werner Robitza 
wrote:

> Replaces French "Mo" with "Bytes".
>
> Signed-off-by: Werner Robitza 
> ---
>  doc/faq.texi | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/doc/faq.texi b/doc/faq.texi
> index ff35c89..07af0d9 100644
> --- a/doc/faq.texi
> +++ b/doc/faq.texi
> @@ -450,8 +450,9 @@ work with streams that were detected during the
> initial scan; streams that
>  are detected later are ignored.
>
>  The size of the initial scan is controlled by two options:
> @code{probesize}
> -(default ~5 Mo) and @code{analyzeduration} (default 5,000,000 µs = 5 s).
> For
> -the subtitle stream to be detected, both values must be large enough.
> +(default 5000000 Bytes) and @code{analyzeduration} (default 5,000,000 µs =
> +5 s). For the subtitle stream to be detected, both values must be large
> +enough.
>

Why not MB instead? It's still more readable than Bytes/B.


>
>  @section Why was the @command{ffmpeg} @option{-sameq} option removed?
> What to use instead?
>


[FFmpeg-devel] [PATCH] doc/faq: replace "Mo" with Bytes

2017-09-18 Thread Werner Robitza
Replaces French "Mo" with "Bytes".

Signed-off-by: Werner Robitza 
---
 doc/faq.texi | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/faq.texi b/doc/faq.texi
index ff35c89..07af0d9 100644
--- a/doc/faq.texi
+++ b/doc/faq.texi
@@ -450,8 +450,9 @@ work with streams that were detected during the initial 
scan; streams that
 are detected later are ignored.
 
 The size of the initial scan is controlled by two options: @code{probesize}
-(default ~5 Mo) and @code{analyzeduration} (default 5,000,000 µs = 5 s). For
-the subtitle stream to be detected, both values must be large enough.
+(default 5000000 Bytes) and @code{analyzeduration} (default 5,000,000 µs =
+5 s). For the subtitle stream to be detected, both values must be large
+enough.
 
 @section Why was the @command{ffmpeg} @option{-sameq} option removed? What to 
use instead?
 
-- 
2.7.4



Re: [FFmpeg-devel] [PATCH 1/2] avcodec/wmv2dec: Check end of bitstream in parse_mb_skip() and ff_wmv2_decode_mb()

2017-09-18 Thread Ronald S. Bultje
Hi Michael,

On Sun, Sep 17, 2017 at 8:15 PM, Michael Niedermayer  wrote:

> I am happy to follow what the community prefers.
>

Some don't like it, some don't care. I think everyone would be happy (and
thus the sum of happiness would increase) if you changed this to ff_dlog()
or something along those lines.

You say you want to code, so why not take the path of least resistance and
move on? Is this just about being right? Or do you really believe it's
important to display an error message while fuzzing? Or do you have actual
evidence that this is an error path that will often occur in real-world
files and where the provided error message helps our users resolve the
issue that their valid (non-fuzzed, real-world) file is not playing back? I
don't understand.

Ronald


Re: [FFmpeg-devel] [PATCH] avcodec/mips: Fixed rnd_val variable to 6 in hevc uni mc msa functions

2017-09-18 Thread Manojkumar Bhosale
LGTM

-Original Message-
From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of 
kaustubh.ra...@imgtec.com
Sent: Monday, September 18, 2017 1:49 PM
To: ffmpeg-devel@ffmpeg.org
Cc: Kaustubh Raste
Subject: [FFmpeg-devel] [PATCH] avcodec/mips: Fixed rnd_val variable to 6 in 
hevc uni mc msa functions

From: Kaustubh Raste 

Signed-off-by: Kaustubh Raste 
---
 libavcodec/mips/hevc_mc_uni_msa.c |  372 +
 1 file changed, 133 insertions(+), 239 deletions(-)

diff --git a/libavcodec/mips/hevc_mc_uni_msa.c 
b/libavcodec/mips/hevc_mc_uni_msa.c
index 754fbdb..cf22e7f 100644
--- a/libavcodec/mips/hevc_mc_uni_msa.c
+++ b/libavcodec/mips/hevc_mc_uni_msa.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015 Manojkumar Bhosale (manojkumar.bhos...@imgtec.com)
+ * Copyright (c) 2015 - 2017 Manojkumar Bhosale (manojkumar.bhos...@imgtec.com)
  *
  * This file is part of FFmpeg.
  *
@@ -359,16 +359,14 @@ static const uint8_t mc_filt_mask_arr[16 * 3] = {
 
 static void common_hz_8t_4x4_msa(uint8_t *src, int32_t src_stride,
  uint8_t *dst, int32_t dst_stride,
- const int8_t *filter, uint8_t rnd_val)
+ const int8_t *filter)
 {
 v16u8 mask0, mask1, mask2, mask3, out;
 v16i8 src0, src1, src2, src3, filt0, filt1, filt2, filt3;
 v8i16 filt, out0, out1;
-v8i16 rnd_vec;
 
 mask0 = LD_UB(&mc_filt_mask_arr[16]);
 src -= 3;
-rnd_vec = __msa_fill_h(rnd_val);
 
 /* rearranging filter */
 filt = LD_SH(filter);
@@ -382,7 +380,7 @@ static void common_hz_8t_4x4_msa(uint8_t *src, int32_t 
src_stride,
 XORI_B4_128_SB(src0, src1, src2, src3);
 HORIZ_8TAP_4WID_4VECS_FILT(src0, src1, src2, src3, mask0, mask1, mask2,
mask3, filt0, filt1, filt2, filt3, out0, out1);
-SRAR_H2_SH(out0, out1, rnd_vec);
+SRARI_H2_SH(out0, out1, 6);
 SAT_SH2_SH(out0, out1, 7);
 out = PCKEV_XORI128_UB(out0, out1);
 ST4x4_UB(out, out, 0, 1, 2, 3, dst, dst_stride);
@@ -390,17 +388,15 @@ static void common_hz_8t_4x4_msa(uint8_t *src, int32_t 
src_stride,
 
 static void common_hz_8t_4x8_msa(uint8_t *src, int32_t src_stride,
  uint8_t *dst, int32_t dst_stride,
- const int8_t *filter, uint8_t rnd_val)
+ const int8_t *filter)
 {
 v16i8 filt0, filt1, filt2, filt3;
 v16i8 src0, src1, src2, src3;
 v16u8 mask0, mask1, mask2, mask3, out;
 v8i16 filt, out0, out1, out2, out3;
-v8i16 rnd_vec;
 
 mask0 = LD_UB(&mc_filt_mask_arr[16]);
 src -= 3;
-rnd_vec = __msa_fill_h(rnd_val);
 
 /* rearranging filter */
 filt = LD_SH(filter);
@@ -419,7 +415,7 @@ static void common_hz_8t_4x8_msa(uint8_t *src, int32_t 
src_stride,
 XORI_B4_128_SB(src0, src1, src2, src3);
 HORIZ_8TAP_4WID_4VECS_FILT(src0, src1, src2, src3, mask0, mask1, mask2,
mask3, filt0, filt1, filt2, filt3, out2, out3);
-SRAR_H4_SH(out0, out1, out2, out3, rnd_vec);
+SRARI_H4_SH(out0, out1, out2, out3, 6);
 SAT_SH4_SH(out0, out1, out2, out3, 7);
 out = PCKEV_XORI128_UB(out0, out1);
 ST4x4_UB(out, out, 0, 1, 2, 3, dst, dst_stride);
@@ -430,16 +426,14 @@ static void common_hz_8t_4x8_msa(uint8_t *src, int32_t 
src_stride,
 
 static void common_hz_8t_4x16_msa(uint8_t *src, int32_t src_stride,
   uint8_t *dst, int32_t dst_stride,
-  const int8_t *filter, uint8_t rnd_val)
+  const int8_t *filter)
 {
 v16u8 mask0, mask1, mask2, mask3, out;
 v16i8 src0, src1, src2, src3, filt0, filt1, filt2, filt3;
 v8i16 filt, out0, out1, out2, out3;
-v8i16 rnd_vec;
 
 mask0 = LD_UB(&mc_filt_mask_arr[16]);
 src -= 3;
-rnd_vec = __msa_fill_h(rnd_val);
 
 /* rearranging filter */
 filt = LD_SH(filter);
@@ -459,7 +453,7 @@ static void common_hz_8t_4x16_msa(uint8_t *src, int32_t src_stride,
 src += (4 * src_stride);
 HORIZ_8TAP_4WID_4VECS_FILT(src0, src1, src2, src3, mask0, mask1, mask2,
mask3, filt0, filt1, filt2, filt3, out2, out3);
-SRAR_H4_SH(out0, out1, out2, out3, rnd_vec);
+SRARI_H4_SH(out0, out1, out2, out3, 6);
 SAT_SH4_SH(out0, out1, out2, out3, 7);
 out = PCKEV_XORI128_UB(out0, out1);
 ST4x4_UB(out, out, 0, 1, 2, 3, dst, dst_stride);
@@ -479,7 +473,7 @@ static void common_hz_8t_4x16_msa(uint8_t *src, int32_t src_stride,
 HORIZ_8TAP_4WID_4VECS_FILT(src0, src1, src2, src3, mask0, mask1, mask2,
mask3, filt0, filt1, filt2, filt3, out2, out3);
 
-SRAR_H4_SH(out0, out1, out2, out3, rnd_vec);
+SRARI_H4_SH(out0, out1, out2, out3, 6);
 SAT_SH4_SH(out0, out1, out2, out3, 7);
 out = PCKEV_XORI128_UB(out0, out1);
 ST4x4_UB(out, out, 0, 1, 2, 3, 

Re: [FFmpeg-devel] [PATCH] avcodec/mips: preload data in hevc sao edge 0 degree filter msa functions

2017-09-18 Thread Manojkumar Bhosale
LGTM

-Original Message-
From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of kaustubh.ra...@imgtec.com
Sent: Monday, September 18, 2017 1:55 PM
To: ffmpeg-devel@ffmpeg.org
Cc: Kaustubh Raste
Subject: [FFmpeg-devel] [PATCH] avcodec/mips: preload data in hevc sao edge 0 degree filter msa functions

From: Kaustubh Raste 

Signed-off-by: Kaustubh Raste 
---
 libavcodec/mips/hevc_lpf_sao_msa.c |  232 +---
 1 file changed, 138 insertions(+), 94 deletions(-)

diff --git a/libavcodec/mips/hevc_lpf_sao_msa.c b/libavcodec/mips/hevc_lpf_sao_msa.c
index 1d77432..3472d32 100644
--- a/libavcodec/mips/hevc_lpf_sao_msa.c
+++ b/libavcodec/mips/hevc_lpf_sao_msa.c
@@ -1265,54 +1265,51 @@ static void hevc_sao_edge_filter_0degree_4width_msa(uint8_t *dst,
 int16_t *sao_offset_val,
 int32_t height)
 {
-int32_t h_cnt;
 uint32_t dst_val0, dst_val1;
-v8i16 edge_idx = { 1, 2, 0, 3, 4, 0, 0, 0 };
+v16u8 cmp_minus10, diff_minus10, diff_minus11, src_minus10, src_minus11;
+v16i8 edge_idx = { 1, 2, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+v16i8 sao_offset = LD_SB(sao_offset_val);
+v16i8 src_plus10, offset, src0, dst0;
 v16u8 const1 = (v16u8) __msa_ldi_b(1);
-v16u8 cmp_minus10, diff_minus10, cmp_minus11, diff_minus11;
-v16u8 src_minus10, src_minus11;
 v16i8 zero = { 0 };
-v16i8 src_zero0, src_zero1, src_plus10, src_plus11, dst0;
-v8i16 offset_mask0, offset_mask1;
-v8i16 sao_offset, src00, src01;
 
-sao_offset = LD_SH(sao_offset_val);
+sao_offset = __msa_pckev_b(sao_offset, sao_offset);
 src -= 1;
 
-for (h_cnt = (height >> 1); h_cnt--;) {
-LD_UB2(src, src_stride, src_minus10, src_minus11);
+/* load in advance */
+LD_UB2(src, src_stride, src_minus10, src_minus11);
+
+for (height -= 2; height; height -= 2) {
 src += (2 * src_stride);
 
-SLDI_B2_0_SB(src_minus10, src_minus11, src_zero0, src_zero1, 1);
-SLDI_B2_0_SB(src_minus10, src_minus11, src_plus10, src_plus11, 2);
-ILVR_B2_UB(src_plus10, src_minus10, src_plus11, src_minus11,
-   src_minus10, src_minus11);
-ILVR_B2_SB(src_zero0, src_zero0, src_zero1, src_zero1, src_zero0,
-   src_zero1);
+src_minus10 = (v16u8) __msa_pckev_d((v2i64) src_minus11,
+(v2i64) src_minus10);
 
-cmp_minus10 = ((v16u8) src_zero0 == src_minus10);
+src0 = (v16i8) __msa_sldi_b(zero, (v16i8) src_minus10, 1);
+src_plus10 = (v16i8) __msa_sldi_b(zero, (v16i8) src_minus10, 2);
+
+cmp_minus10 = ((v16u8) src0 == src_minus10);
 diff_minus10 = __msa_nor_v(cmp_minus10, cmp_minus10);
-cmp_minus10 = (src_minus10 < (v16u8) src_zero0);
+cmp_minus10 = (src_minus10 < (v16u8) src0);
 diff_minus10 = __msa_bmnz_v(diff_minus10, const1, cmp_minus10);
 
-cmp_minus11 = ((v16u8) src_zero1 == src_minus11);
-diff_minus11 = __msa_nor_v(cmp_minus11, cmp_minus11);
-cmp_minus11 = (src_minus11 < (v16u8) src_zero1);
-diff_minus11 = __msa_bmnz_v(diff_minus11, const1, cmp_minus11);
+cmp_minus10 = ((v16u8) src0 == (v16u8) src_plus10);
+diff_minus11 = __msa_nor_v(cmp_minus10, cmp_minus10);
+cmp_minus10 = ((v16u8) src_plus10 < (v16u8) src0);
+diff_minus11 = __msa_bmnz_v(diff_minus11, const1, cmp_minus10);
 
-offset_mask0 = (v8i16) (__msa_hadd_u_h(diff_minus10, diff_minus10) + 2);
-offset_mask1 = (v8i16) (__msa_hadd_u_h(diff_minus11, diff_minus11) + 2);
+offset = (v16i8) diff_minus10 + (v16i8) diff_minus11 + 2;
 
-VSHF_H2_SH(edge_idx, edge_idx, sao_offset, sao_offset, offset_mask0,
-   offset_mask0, offset_mask0, offset_mask0);
-VSHF_H2_SH(edge_idx, edge_idx, sao_offset, sao_offset, offset_mask1,
-   offset_mask1, offset_mask1, offset_mask1);
-ILVEV_B2_SH(src_zero0, zero, src_zero1, zero, src00, src01);
-ADD2(offset_mask0, src00, offset_mask1, src01, offset_mask0,
- offset_mask1);
-CLIP_SH2_0_255(offset_mask0, offset_mask1);
+/* load in advance */
+LD_UB2(src, src_stride, src_minus10, src_minus11);
+
+VSHF_B2_SB(edge_idx, edge_idx, sao_offset, sao_offset, offset, offset,
+   offset, offset);
+
+src0 = (v16i8) __msa_xori_b((v16u8) src0, 128);
+dst0 = __msa_adds_s_b(src0, offset);
+dst0 = (v16i8) __msa_xori_b((v16u8) dst0, 128);
 
-dst0 = __msa_pckev_b((v16i8) offset_mask1, (v16i8) offset_mask0);
 dst_val0 = __msa_copy_u_w((v4i32) dst0, 0);
 dst_val1 = __msa_copy_u_w((v4i32) dst0, 2);
 SW(dst_val0, dst);
@@ -1320,6 +1317,37 @@ static void hevc_sao_edge_filter_0degree_4width_msa(uint8_t 

Re: [FFmpeg-devel] [PATCH] configure: support static libnpp [v3]

2017-09-18 Thread Timo Rothenpieler

I don't like the complexity of this.

It also seems strange to me that you put cudart_static and other culibos 
in there, as well as $ldl, stdc++ and so on.
That would most likely link against shared cuda and static cuda at the 
same time, which seems highly redundant.


Also, no other library uses an additional configure switch to use its 
static version. I'm against adding one for libnpp.


Besides, I consider scale_npp a candidate for deprecation, in favor of 
scale_cuda, so I'm not too thrilled about introducing new and complex 
stuff for it.




___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH] avcodec/mips: Reduced conditional cases in avc inter lpf msa functions

2017-09-18 Thread kaustubh.raste
From: Kaustubh Raste 

Signed-off-by: Kaustubh Raste 
---
 libavcodec/mips/h264dsp_msa.c |  274 +
 1 file changed, 110 insertions(+), 164 deletions(-)

diff --git a/libavcodec/mips/h264dsp_msa.c b/libavcodec/mips/h264dsp_msa.c
index a17eacb..422703d 100644
--- a/libavcodec/mips/h264dsp_msa.c
+++ b/libavcodec/mips/h264dsp_msa.c
@@ -1250,21 +1250,7 @@ static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t *data,
uint8_t beta_in,
uint32_t img_width)
 {
-uint8_t *src;
-v16u8 beta, tmp_vec, bs = { 0 };
-v16u8 tc = { 0 };
-v16u8 is_less_than, is_less_than_beta;
-v16u8 p1, p0, q0, q1;
-v8i16 p0_r, q0_r, p1_r = { 0 };
-v8i16 q1_r = { 0 };
-v8i16 p0_l, q0_l, p1_l = { 0 };
-v8i16 q1_l = { 0 };
-v16u8 p3_org, p2_org, p1_org, p0_org, q0_org, q1_org, q2_org, q3_org;
-v8i16 p2_org_r, p1_org_r, p0_org_r, q0_org_r, q1_org_r, q2_org_r;
-v8i16 p2_org_l, p1_org_l, p0_org_l, q0_org_l, q1_org_l, q2_org_l;
-v8i16 tc_r, tc_l;
-v16i8 zero = { 0 };
-v16u8 is_bs_greater_than0;
+v16u8 tmp_vec, bs = { 0 };
 
 tmp_vec = (v16u8) __msa_fill_b(bs0);
 bs = (v16u8) __msa_insve_w((v4i32) bs, 0, (v4i32) tmp_vec);
@@ -1276,6 +1262,14 @@ static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t *data,
 bs = (v16u8) __msa_insve_w((v4i32) bs, 3, (v4i32) tmp_vec);
 
 if (!__msa_test_bz_v(bs)) {
+uint8_t *src = data - 4;
+v16u8 p3_org, p2_org, p1_org, p0_org, q0_org, q1_org, q2_org, q3_org;
+v16u8 p0_asub_q0, p1_asub_p0, q1_asub_q0, alpha, beta;
+v16u8 is_less_than, is_less_than_beta, is_less_than_alpha;
+v16u8 is_bs_greater_than0;
+v16u8 tc = { 0 };
+v16i8 zero = { 0 };
+
 tmp_vec = (v16u8) __msa_fill_b(tc0);
 tc = (v16u8) __msa_insve_w((v4i32) tc, 0, (v4i32) tmp_vec);
 tmp_vec = (v16u8) __msa_fill_b(tc1);
@@ -1291,9 +1285,6 @@ static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t *data,
 v16u8 row0, row1, row2, row3, row4, row5, row6, row7;
 v16u8 row8, row9, row10, row11, row12, row13, row14, row15;
 
-src = data;
-src -= 4;
-
 LD_UB8(src, img_width,
row0, row1, row2, row3, row4, row5, row6, row7);
 src += (8 * img_width);
@@ -1306,27 +1297,28 @@ static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t *data,
 p3_org, p2_org, p1_org, p0_org,
 q0_org, q1_org, q2_org, q3_org);
 }
-{
-v16u8 p0_asub_q0, p1_asub_p0, q1_asub_q0, alpha;
-v16u8 is_less_than_alpha;
-
-p0_asub_q0 = __msa_asub_u_b(p0_org, q0_org);
-p1_asub_p0 = __msa_asub_u_b(p1_org, p0_org);
-q1_asub_q0 = __msa_asub_u_b(q1_org, q0_org);
-
-alpha = (v16u8) __msa_fill_b(alpha_in);
-beta = (v16u8) __msa_fill_b(beta_in);
-
-is_less_than_alpha = (p0_asub_q0 < alpha);
-is_less_than_beta = (p1_asub_p0 < beta);
-is_less_than = is_less_than_beta & is_less_than_alpha;
-is_less_than_beta = (q1_asub_q0 < beta);
-is_less_than = is_less_than_beta & is_less_than;
-is_less_than = is_less_than & is_bs_greater_than0;
-}
+
+p0_asub_q0 = __msa_asub_u_b(p0_org, q0_org);
+p1_asub_p0 = __msa_asub_u_b(p1_org, p0_org);
+q1_asub_q0 = __msa_asub_u_b(q1_org, q0_org);
+
+alpha = (v16u8) __msa_fill_b(alpha_in);
+beta = (v16u8) __msa_fill_b(beta_in);
+
+is_less_than_alpha = (p0_asub_q0 < alpha);
+is_less_than_beta = (p1_asub_p0 < beta);
+is_less_than = is_less_than_beta & is_less_than_alpha;
+is_less_than_beta = (q1_asub_q0 < beta);
+is_less_than = is_less_than_beta & is_less_than;
+is_less_than = is_less_than & is_bs_greater_than0;
+
 if (!__msa_test_bz_v(is_less_than)) {
 v16i8 negate_tc, sign_negate_tc;
-v8i16 negate_tc_r, i16_negatetc_l;
+v16u8 p0, q0, p2_asub_p0, q2_asub_q0;
+v8i16 tc_r, tc_l, negate_tc_r, i16_negatetc_l;
+v8i16 p1_org_r, p0_org_r, q0_org_r, q1_org_r;
+v8i16 p1_org_l, p0_org_l, q0_org_l, q1_org_l;
+v8i16 p0_r, q0_r, p0_l, q0_l;
 
 negate_tc = zero - (v16i8) tc;
 sign_negate_tc = __msa_clti_s_b(negate_tc, 0);
@@ -1338,34 +1330,22 @@ static void avc_loopfilter_luma_inter_edge_ver_msa(uint8_t *data,
 UNPCK_UB_SH(p0_org, p0_org_r, p0_org_l);
 UNPCK_UB_SH(q0_org, q0_org_r, q0_org_l);
 
-{
-v16u8 p2_asub_p0;
-v16u8 is_less_than_beta_r, is_less_than_beta_l;
-
-p2_asub_p0 = __msa_asub_u_b(p2_org, p0_org);
-
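
For readers following the diff, the alpha/beta tests that this patch moves out of the nested block can be written per pixel in scalar C. This is only a rough sketch of the decision the vector code makes for all 16 lanes at once: the function name `luma_edge_needs_filter` is hypothetical and not part of h264dsp_msa.c, and the strength value `bs` stands in for one lane of the `bs` vector.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar helper (not in the patch): returns 1 when one
 * pixel position across the edge passes the same tests the MSA code
 * evaluates lane-wise via is_less_than_alpha / is_less_than_beta and
 * is_bs_greater_than0. */
static int luma_edge_needs_filter(uint8_t p1, uint8_t p0,
                                  uint8_t q0, uint8_t q1,
                                  int bs, int alpha, int beta)
{
    return bs > 0 &&
           abs(p0 - q0) < alpha &&   /* p0_asub_q0 < alpha */
           abs(p1 - p0) < beta  &&   /* p1_asub_p0 < beta  */
           abs(q1 - q0) < beta;      /* q1_asub_q0 < beta  */
}
```

When no position passes, nothing past this point has to run, which is presumably why the patch declares the remaining temporaries only inside the `if (!__msa_test_bz_v(is_less_than))` branch.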

[FFmpeg-devel] [PATCH] avcodec/mips: Unrolled loops and expanded functions in avc put mc 10 & 30 msa functions

2017-09-18 Thread kaustubh.raste
From: Kaustubh Raste 

Signed-off-by: Kaustubh Raste 
---
 libavcodec/mips/h264qpel_msa.c |  284 +++-
 1 file changed, 278 insertions(+), 6 deletions(-)

diff --git a/libavcodec/mips/h264qpel_msa.c b/libavcodec/mips/h264qpel_msa.c
index 05dffea..b7f6c3d 100644
--- a/libavcodec/mips/h264qpel_msa.c
+++ b/libavcodec/mips/h264qpel_msa.c
@@ -3065,37 +3065,309 @@ void ff_avg_h264_qpel4_mc00_msa(uint8_t *dst, const uint8_t *src,
 void ff_put_h264_qpel16_mc10_msa(uint8_t *dst, const uint8_t *src,
  ptrdiff_t stride)
 {
-avc_luma_hz_qrt_16w_msa(src - 2, stride, dst, stride, 16, 0);
+uint32_t loop_cnt;
+v16i8 dst0, dst1, dst2, dst3, src0, src1, src2, src3, src4, src5, src6;
+v16i8 mask0, mask1, mask2, mask3, mask4, mask5, src7, vec11;
+v16i8 vec0, vec1, vec2, vec3, vec4, vec5, vec6, vec7, vec8, vec9, vec10;
+v8i16 res0, res1, res2, res3, res4, res5, res6, res7;
+v16i8 minus5b = __msa_ldi_b(-5);
+v16i8 plus20b = __msa_ldi_b(20);
+
+LD_SB3(&luma_mask_arr[0], 16, mask0, mask1, mask2);
+mask3 = mask0 + 8;
+mask4 = mask1 + 8;
+mask5 = mask2 + 8;
+src -= 2;
+
+for (loop_cnt = 4; loop_cnt--;) {
+LD_SB2(src, 16, src0, src1);
+src += stride;
+LD_SB2(src, 16, src2, src3);
+src += stride;
+LD_SB2(src, 16, src4, src5);
+src += stride;
+LD_SB2(src, 16, src6, src7);
+src += stride;
+
+XORI_B8_128_SB(src0, src1, src2, src3, src4, src5, src6, src7);
+VSHF_B2_SB(src0, src0, src0, src1, mask0, mask3, vec0, vec3);
+VSHF_B2_SB(src2, src2, src2, src3, mask0, mask3, vec6, vec9);
+VSHF_B2_SB(src0, src0, src0, src1, mask1, mask4, vec1, vec4);
+VSHF_B2_SB(src2, src2, src2, src3, mask1, mask4, vec7, vec10);
+VSHF_B2_SB(src0, src0, src0, src1, mask2, mask5, vec2, vec5);
+VSHF_B2_SB(src2, src2, src2, src3, mask2, mask5, vec8, vec11);
+HADD_SB4_SH(vec0, vec3, vec6, vec9, res0, res1, res2, res3);
+DPADD_SB4_SH(vec1, vec4, vec7, vec10, minus5b, minus5b, minus5b,
+ minus5b, res0, res1, res2, res3);
+DPADD_SB4_SH(vec2, vec5, vec8, vec11, plus20b, plus20b, plus20b,
+ plus20b, res0, res1, res2, res3);
+VSHF_B2_SB(src4, src4, src4, src5, mask0, mask3, vec0, vec3);
+VSHF_B2_SB(src6, src6, src6, src7, mask0, mask3, vec6, vec9);
+VSHF_B2_SB(src4, src4, src4, src5, mask1, mask4, vec1, vec4);
+VSHF_B2_SB(src6, src6, src6, src7, mask1, mask4, vec7, vec10);
+VSHF_B2_SB(src4, src4, src4, src5, mask2, mask5, vec2, vec5);
+VSHF_B2_SB(src6, src6, src6, src7, mask2, mask5, vec8, vec11);
+HADD_SB4_SH(vec0, vec3, vec6, vec9, res4, res5, res6, res7);
+DPADD_SB4_SH(vec1, vec4, vec7, vec10, minus5b, minus5b, minus5b,
+ minus5b, res4, res5, res6, res7);
+DPADD_SB4_SH(vec2, vec5, vec8, vec11, plus20b, plus20b, plus20b,
+ plus20b, res4, res5, res6, res7);
+SLDI_B2_SB(src1, src3, src0, src2, src0, src2, 2);
+SLDI_B2_SB(src5, src7, src4, src6, src4, src6, 2);
+SRARI_H4_SH(res0, res1, res2, res3, 5);
+SRARI_H4_SH(res4, res5, res6, res7, 5);
+SAT_SH4_SH(res0, res1, res2, res3, 7);
+SAT_SH4_SH(res4, res5, res6, res7, 7);
+PCKEV_B2_SB(res1, res0, res3, res2, dst0, dst1);
+PCKEV_B2_SB(res5, res4, res7, res6, dst2, dst3);
+dst0 = __msa_aver_s_b(dst0, src0);
+dst1 = __msa_aver_s_b(dst1, src2);
+dst2 = __msa_aver_s_b(dst2, src4);
+dst3 = __msa_aver_s_b(dst3, src6);
+XORI_B4_128_SB(dst0, dst1, dst2, dst3);
+ST_SB4(dst0, dst1, dst2, dst3, dst, stride);
+dst += (4 * stride);
+}
 }
 
 void ff_put_h264_qpel16_mc30_msa(uint8_t *dst, const uint8_t *src,
  ptrdiff_t stride)
 {
-avc_luma_hz_qrt_16w_msa(src - 2, stride, dst, stride, 16, 1);
+uint32_t loop_cnt;
+v16i8 dst0, dst1, dst2, dst3, src0, src1, src2, src3, src4, src5, src6;
+v16i8 mask0, mask1, mask2, mask3, mask4, mask5, src7, vec11;
+v16i8 vec0, vec1, vec2, vec3, vec4, vec5, vec6, vec7, vec8, vec9, vec10;
+v8i16 res0, res1, res2, res3, res4, res5, res6, res7;
+v16i8 minus5b = __msa_ldi_b(-5);
+v16i8 plus20b = __msa_ldi_b(20);
+
+LD_SB3(&luma_mask_arr[0], 16, mask0, mask1, mask2);
+mask3 = mask0 + 8;
+mask4 = mask1 + 8;
+mask5 = mask2 + 8;
+src -= 2;
+
+for (loop_cnt = 4; loop_cnt--;) {
+LD_SB2(src, 16, src0, src1);
+src += stride;
+LD_SB2(src, 16, src2, src3);
+src += stride;
+LD_SB2(src, 16, src4, src5);
+src += stride;
+LD_SB2(src, 16, src6, src7);
+src += stride;
+
+XORI_B8_128_SB(src0, src1, src2, src3, src4, src5, src6, src7);
+VSHF_B2_SB(src0, src0, src0, src1, mask0, mask3, vec0, 
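
As a reading aid for the unrolled code above, here is a scalar sketch of what one output pixel of the mc10/mc30 cases appears to compute: the H.264 six-tap half-pel filter (taps 1, -5, 20, 20, -5, 1, matching the `minus5b` and `plus20b` constants), a rounding shift by 5, a clip, and a rounding average with the neighbouring integer pixel. The name `qpel_h_pixel` and the `quarter` parameter are assumptions made for illustration; they do not exist in h264qpel_msa.c.

```c
#include <stdint.h>

/* Clamp to the unsigned 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar model: src points at the integer pixel, so the
 * filter reads src[-2] .. src[3]. quarter = 0 models mc10 (average with
 * the left integer pixel), quarter = 1 models mc30 (average with the
 * right one). Assumes >> on a negative int is an arithmetic shift,
 * which holds on the usual compilers. */
static uint8_t qpel_h_pixel(const uint8_t *src, int quarter)
{
    int sum  = src[-2] - 5 * src[-1] + 20 * src[0]
             + 20 * src[1] - 5 * src[2] + src[3];
    int half = clip_u8((sum + 16) >> 5);             /* SRARI ..., 5 */
    return (uint8_t)((half + src[quarter] + 1) >> 1); /* rounding average */
}
```

On a flat row the taps sum to 32, so the shift by 5 returns the input value unchanged.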

[FFmpeg-devel] [PATCH] avcodec/mips: preload data in hevc sao edge 0 degree filter msa functions

2017-09-18 Thread kaustubh.raste
From: Kaustubh Raste 

Signed-off-by: Kaustubh Raste 
---
 libavcodec/mips/hevc_lpf_sao_msa.c |  232 +---
 1 file changed, 138 insertions(+), 94 deletions(-)

diff --git a/libavcodec/mips/hevc_lpf_sao_msa.c b/libavcodec/mips/hevc_lpf_sao_msa.c
index 1d77432..3472d32 100644
--- a/libavcodec/mips/hevc_lpf_sao_msa.c
+++ b/libavcodec/mips/hevc_lpf_sao_msa.c
@@ -1265,54 +1265,51 @@ static void hevc_sao_edge_filter_0degree_4width_msa(uint8_t *dst,
 int16_t *sao_offset_val,
 int32_t height)
 {
-int32_t h_cnt;
 uint32_t dst_val0, dst_val1;
-v8i16 edge_idx = { 1, 2, 0, 3, 4, 0, 0, 0 };
+v16u8 cmp_minus10, diff_minus10, diff_minus11, src_minus10, src_minus11;
+v16i8 edge_idx = { 1, 2, 0, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+v16i8 sao_offset = LD_SB(sao_offset_val);
+v16i8 src_plus10, offset, src0, dst0;
 v16u8 const1 = (v16u8) __msa_ldi_b(1);
-v16u8 cmp_minus10, diff_minus10, cmp_minus11, diff_minus11;
-v16u8 src_minus10, src_minus11;
 v16i8 zero = { 0 };
-v16i8 src_zero0, src_zero1, src_plus10, src_plus11, dst0;
-v8i16 offset_mask0, offset_mask1;
-v8i16 sao_offset, src00, src01;
 
-sao_offset = LD_SH(sao_offset_val);
+sao_offset = __msa_pckev_b(sao_offset, sao_offset);
 src -= 1;
 
-for (h_cnt = (height >> 1); h_cnt--;) {
-LD_UB2(src, src_stride, src_minus10, src_minus11);
+/* load in advance */
+LD_UB2(src, src_stride, src_minus10, src_minus11);
+
+for (height -= 2; height; height -= 2) {
 src += (2 * src_stride);
 
-SLDI_B2_0_SB(src_minus10, src_minus11, src_zero0, src_zero1, 1);
-SLDI_B2_0_SB(src_minus10, src_minus11, src_plus10, src_plus11, 2);
-ILVR_B2_UB(src_plus10, src_minus10, src_plus11, src_minus11,
-   src_minus10, src_minus11);
-ILVR_B2_SB(src_zero0, src_zero0, src_zero1, src_zero1, src_zero0,
-   src_zero1);
+src_minus10 = (v16u8) __msa_pckev_d((v2i64) src_minus11,
+(v2i64) src_minus10);
 
-cmp_minus10 = ((v16u8) src_zero0 == src_minus10);
+src0 = (v16i8) __msa_sldi_b(zero, (v16i8) src_minus10, 1);
+src_plus10 = (v16i8) __msa_sldi_b(zero, (v16i8) src_minus10, 2);
+
+cmp_minus10 = ((v16u8) src0 == src_minus10);
 diff_minus10 = __msa_nor_v(cmp_minus10, cmp_minus10);
-cmp_minus10 = (src_minus10 < (v16u8) src_zero0);
+cmp_minus10 = (src_minus10 < (v16u8) src0);
 diff_minus10 = __msa_bmnz_v(diff_minus10, const1, cmp_minus10);
 
-cmp_minus11 = ((v16u8) src_zero1 == src_minus11);
-diff_minus11 = __msa_nor_v(cmp_minus11, cmp_minus11);
-cmp_minus11 = (src_minus11 < (v16u8) src_zero1);
-diff_minus11 = __msa_bmnz_v(diff_minus11, const1, cmp_minus11);
+cmp_minus10 = ((v16u8) src0 == (v16u8) src_plus10);
+diff_minus11 = __msa_nor_v(cmp_minus10, cmp_minus10);
+cmp_minus10 = ((v16u8) src_plus10 < (v16u8) src0);
+diff_minus11 = __msa_bmnz_v(diff_minus11, const1, cmp_minus10);
 
-offset_mask0 = (v8i16) (__msa_hadd_u_h(diff_minus10, diff_minus10) + 2);
-offset_mask1 = (v8i16) (__msa_hadd_u_h(diff_minus11, diff_minus11) + 2);
+offset = (v16i8) diff_minus10 + (v16i8) diff_minus11 + 2;
 
-VSHF_H2_SH(edge_idx, edge_idx, sao_offset, sao_offset, offset_mask0,
-   offset_mask0, offset_mask0, offset_mask0);
-VSHF_H2_SH(edge_idx, edge_idx, sao_offset, sao_offset, offset_mask1,
-   offset_mask1, offset_mask1, offset_mask1);
-ILVEV_B2_SH(src_zero0, zero, src_zero1, zero, src00, src01);
-ADD2(offset_mask0, src00, offset_mask1, src01, offset_mask0,
- offset_mask1);
-CLIP_SH2_0_255(offset_mask0, offset_mask1);
+/* load in advance */
+LD_UB2(src, src_stride, src_minus10, src_minus11);
+
+VSHF_B2_SB(edge_idx, edge_idx, sao_offset, sao_offset, offset, offset,
+   offset, offset);
+
+src0 = (v16i8) __msa_xori_b((v16u8) src0, 128);
+dst0 = __msa_adds_s_b(src0, offset);
+dst0 = (v16i8) __msa_xori_b((v16u8) dst0, 128);
 
-dst0 = __msa_pckev_b((v16i8) offset_mask1, (v16i8) offset_mask0);
 dst_val0 = __msa_copy_u_w((v4i32) dst0, 0);
 dst_val1 = __msa_copy_u_w((v4i32) dst0, 2);
 SW(dst_val0, dst);
@@ -1320,6 +1317,37 @@ static void hevc_sao_edge_filter_0degree_4width_msa(uint8_t *dst,
 SW(dst_val1, dst);
 dst += dst_stride;
 }
+
+src_minus10 = (v16u8) __msa_pckev_d((v2i64) src_minus11,
+(v2i64) src_minus10);
+
+src0 = (v16i8) __msa_sldi_b(zero, (v16i8) src_minus10, 1);
+src_plus10 = (v16i8) __msa_sldi_b(zero, (v16i8) src_minus10, 2);
+
+  
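
For context, the per-pixel arithmetic behind the 0 degree (horizontal) SAO edge filter can be sketched in scalar C. This is an illustrative model, not code from the patch: `sao_edge_0deg_row` and `sign3` are hypothetical names, and the row is assumed to have one valid pixel on each side. The `edge_idx` table and the "+ 2" bias are taken from the diff above.

```c
#include <stdint.h>

/* Three-way compare: -1, 0 or 1. */
static int sign3(int a, int b)
{
    return (a > b) - (a < b);
}

/* Clamp to the unsigned 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar model of one row: compare each pixel with its
 * left and right neighbours, map the two signs to an edge class, and
 * add the offset for that class. src[-1] and src[width] must be
 * readable. */
static void sao_edge_0deg_row(uint8_t *dst, const uint8_t *src, int width,
                              const int16_t *sao_offset_val)
{
    static const uint8_t edge_idx[5] = { 1, 2, 0, 3, 4 };
    int x;

    for (x = 0; x < width; x++) {
        int diff0 = sign3(src[x], src[x - 1]);
        int diff1 = sign3(src[x], src[x + 1]);
        int idx   = edge_idx[2 + diff0 + diff1];

        dst[x] = clip_u8(src[x] + sao_offset_val[idx]);
    }
}
```

The patch itself does this 16 bytes at a time and issues the loads for the next two rows ("load in advance") before the shuffle and add of the current ones, so the final pair of rows is handled after the loop.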

[FFmpeg-devel] [PATCH] avcodec/mips: Fixed rnd_val variable to 6 in hevc uni mc msa functions

2017-09-18 Thread kaustubh.raste
From: Kaustubh Raste 

Signed-off-by: Kaustubh Raste 
---
 libavcodec/mips/hevc_mc_uni_msa.c |  372 +
 1 file changed, 133 insertions(+), 239 deletions(-)

diff --git a/libavcodec/mips/hevc_mc_uni_msa.c b/libavcodec/mips/hevc_mc_uni_msa.c
index 754fbdb..cf22e7f 100644
--- a/libavcodec/mips/hevc_mc_uni_msa.c
+++ b/libavcodec/mips/hevc_mc_uni_msa.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2015 Manojkumar Bhosale (manojkumar.bhos...@imgtec.com)
+ * Copyright (c) 2015 - 2017 Manojkumar Bhosale (manojkumar.bhos...@imgtec.com)
  *
  * This file is part of FFmpeg.
  *
@@ -359,16 +359,14 @@ static const uint8_t mc_filt_mask_arr[16 * 3] = {
 
 static void common_hz_8t_4x4_msa(uint8_t *src, int32_t src_stride,
  uint8_t *dst, int32_t dst_stride,
- const int8_t *filter, uint8_t rnd_val)
+ const int8_t *filter)
 {
 v16u8 mask0, mask1, mask2, mask3, out;
 v16i8 src0, src1, src2, src3, filt0, filt1, filt2, filt3;
 v8i16 filt, out0, out1;
-v8i16 rnd_vec;
 
 mask0 = LD_UB(&mc_filt_mask_arr[16]);
 src -= 3;
-rnd_vec = __msa_fill_h(rnd_val);
 
 /* rearranging filter */
 filt = LD_SH(filter);
@@ -382,7 +380,7 @@ static void common_hz_8t_4x4_msa(uint8_t *src, int32_t src_stride,
 XORI_B4_128_SB(src0, src1, src2, src3);
 HORIZ_8TAP_4WID_4VECS_FILT(src0, src1, src2, src3, mask0, mask1, mask2,
mask3, filt0, filt1, filt2, filt3, out0, out1);
-SRAR_H2_SH(out0, out1, rnd_vec);
+SRARI_H2_SH(out0, out1, 6);
 SAT_SH2_SH(out0, out1, 7);
 out = PCKEV_XORI128_UB(out0, out1);
 ST4x4_UB(out, out, 0, 1, 2, 3, dst, dst_stride);
@@ -390,17 +388,15 @@ static void common_hz_8t_4x4_msa(uint8_t *src, int32_t src_stride,
 
 static void common_hz_8t_4x8_msa(uint8_t *src, int32_t src_stride,
  uint8_t *dst, int32_t dst_stride,
- const int8_t *filter, uint8_t rnd_val)
+ const int8_t *filter)
 {
 v16i8 filt0, filt1, filt2, filt3;
 v16i8 src0, src1, src2, src3;
 v16u8 mask0, mask1, mask2, mask3, out;
 v8i16 filt, out0, out1, out2, out3;
-v8i16 rnd_vec;
 
 mask0 = LD_UB(&mc_filt_mask_arr[16]);
 src -= 3;
-rnd_vec = __msa_fill_h(rnd_val);
 
 /* rearranging filter */
 filt = LD_SH(filter);
@@ -419,7 +415,7 @@ static void common_hz_8t_4x8_msa(uint8_t *src, int32_t src_stride,
 XORI_B4_128_SB(src0, src1, src2, src3);
 HORIZ_8TAP_4WID_4VECS_FILT(src0, src1, src2, src3, mask0, mask1, mask2,
mask3, filt0, filt1, filt2, filt3, out2, out3);
-SRAR_H4_SH(out0, out1, out2, out3, rnd_vec);
+SRARI_H4_SH(out0, out1, out2, out3, 6);
 SAT_SH4_SH(out0, out1, out2, out3, 7);
 out = PCKEV_XORI128_UB(out0, out1);
 ST4x4_UB(out, out, 0, 1, 2, 3, dst, dst_stride);
@@ -430,16 +426,14 @@ static void common_hz_8t_4x8_msa(uint8_t *src, int32_t src_stride,
 
 static void common_hz_8t_4x16_msa(uint8_t *src, int32_t src_stride,
   uint8_t *dst, int32_t dst_stride,
-  const int8_t *filter, uint8_t rnd_val)
+  const int8_t *filter)
 {
 v16u8 mask0, mask1, mask2, mask3, out;
 v16i8 src0, src1, src2, src3, filt0, filt1, filt2, filt3;
 v8i16 filt, out0, out1, out2, out3;
-v8i16 rnd_vec;
 
 mask0 = LD_UB(&mc_filt_mask_arr[16]);
 src -= 3;
-rnd_vec = __msa_fill_h(rnd_val);
 
 /* rearranging filter */
 filt = LD_SH(filter);
@@ -459,7 +453,7 @@ static void common_hz_8t_4x16_msa(uint8_t *src, int32_t src_stride,
 src += (4 * src_stride);
 HORIZ_8TAP_4WID_4VECS_FILT(src0, src1, src2, src3, mask0, mask1, mask2,
mask3, filt0, filt1, filt2, filt3, out2, out3);
-SRAR_H4_SH(out0, out1, out2, out3, rnd_vec);
+SRARI_H4_SH(out0, out1, out2, out3, 6);
 SAT_SH4_SH(out0, out1, out2, out3, 7);
 out = PCKEV_XORI128_UB(out0, out1);
 ST4x4_UB(out, out, 0, 1, 2, 3, dst, dst_stride);
@@ -479,7 +473,7 @@ static void common_hz_8t_4x16_msa(uint8_t *src, int32_t src_stride,
 HORIZ_8TAP_4WID_4VECS_FILT(src0, src1, src2, src3, mask0, mask1, mask2,
mask3, filt0, filt1, filt2, filt3, out2, out3);
 
-SRAR_H4_SH(out0, out1, out2, out3, rnd_vec);
+SRARI_H4_SH(out0, out1, out2, out3, 6);
 SAT_SH4_SH(out0, out1, out2, out3, 7);
 out = PCKEV_XORI128_UB(out0, out1);
 ST4x4_UB(out, out, 0, 1, 2, 3, dst, dst_stride);
@@ -490,30 +484,27 @@ static void common_hz_8t_4x16_msa(uint8_t *src, int32_t src_stride,
 
 static void common_hz_8t_4w_msa(uint8_t *src, int32_t src_stride,
 uint8_t *dst, int32_t dst_stride,
-const int8_t *filter, int32_t height, uint8_t 
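
To make the `SRAR_H*` to `SRARI_H*(..., 6)` substitution concrete: both perform a rounding arithmetic shift right, the difference being that the shift amount is now an immediate instead of a vector filled from `rnd_val`. A scalar sketch of the per-element operation follows, under the assumption that the HEVC interpolation taps sum to 64, which is what makes 6 the natural constant. The names `round_shift` and `uni_mc_pixel` are hypothetical, and the clip to 0..255 stands in for the XOR-128 / saturate / pack sequence of the MSA code.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding arithmetic shift right: add half of the divisor, then
 * shift. Assumes >> on a negative value is an arithmetic shift, which
 * holds on the usual compilers. */
static int16_t round_shift(int32_t value, int shift)
{
    assert(shift >= 1 && shift <= 15);
    return (int16_t)((value + (1 << (shift - 1))) >> shift);
}

/* Clamp to the unsigned 8-bit pixel range. */
static uint8_t clip_u8(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical helper: turn one 8-tap filter accumulator into an
 * output pixel with the shift fixed at 6, as in the patched code. */
static uint8_t uni_mc_pixel(int32_t filter_sum)
{
    return clip_u8(round_shift(filter_sum, 6));
}
```

With the constant baked in, the `rnd_val` parameter and the `rnd_vec` fill become dead weight, which is what the signature changes in the diff remove.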

Re: [FFmpeg-devel] [PATCH] x86/exrdsp: optimize ff_reorder_pixels_avx2()

2017-09-18 Thread Martin Vignali
2017-09-18 3:52 GMT+02:00 James Almer :

> From: Henrik Gramner 
>
> Tested with "checkasm --test=exrdsp -bench"
>
> Before:
> reorder_pixels_c: 5187.8
> reorder_pixels_sse2: 377.0
> reorder_pixels_avx2: 331.3
>
> After:
> reorder_pixels_c: 5181.5
> reorder_pixels_sse2: 377.0
> reorder_pixels_avx2: 313.8
>
I don't get the same result using a start/stop timer,
but your testing approach is probably better than mine.
And since you both think this is the better way to do it, it's OK with me!

Thanks

Martin


Re: [FFmpeg-devel] [PATCH 2/2] fate: add mxf_dv25/dvcpro50 regression tests

2017-09-18 Thread Tobias Rapp

On 15.09.2017 22:43, Michael Niedermayer wrote:

On Thu, Sep 14, 2017 at 03:44:42PM +0200, Tobias Rapp wrote:

Signed-off-by: Tobias Rapp 
---
  tests/fate/avformat.mak  |  2 ++
  tests/fate/seek.mak  |  4 +++
  tests/lavf-regression.sh |  8 ++
  tests/ref/lavf/mxf_dv25  |  3 +++
  tests/ref/lavf/mxf_dvcpro50  |  3 +++
  tests/ref/seek/lavf-mxf_dv25 | 53 
  tests/ref/seek/lavf-mxf_dvcpro50 | 53 
  7 files changed, 126 insertions(+)
  create mode 100644 tests/ref/lavf/mxf_dv25
  create mode 100644 tests/ref/lavf/mxf_dvcpro50
  create mode 100644 tests/ref/seek/lavf-mxf_dv25
  create mode 100644 tests/ref/seek/lavf-mxf_dvcpro50


probably ok


Applied, thanks for the review.

Tobias



Re: [FFmpeg-devel] [PATCH 1/2] avformat/mxfenc: fix aspect ratio when writing 16:9 DV frames

2017-09-18 Thread Tobias Rapp

On 15.09.2017 20:44, Michael Niedermayer wrote:

On Thu, Sep 14, 2017 at 03:44:41PM +0200, Tobias Rapp wrote:

Signed-off-by: Tobias Rapp 
---
  libavformat/mxfenc.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/mxfenc.c b/libavformat/mxfenc.c
index 7289e0b..da4d7b4 100644
--- a/libavformat/mxfenc.c
+++ b/libavformat/mxfenc.c
@@ -1810,7 +1810,7 @@ static int mxf_parse_dv_frame(AVFormatContext *s, AVStream *st, AVPacket *pkt)
  stype= vs_pack[3] & 0x1f;
  pal  = (vs_pack[3] >> 5) & 0x1;
  
-if ((vs_pack[2] & 0x07) == 0x02)

+if ((vsc_pack[2] & 0x07) == 0x02)
  sc->aspect_ratio = (AVRational){ 16, 9 };
  else
  sc->aspect_ratio = (AVRational){ 4, 3 };
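
The one-character fix is easier to see in isolation: the aspect bits are read from `vsc_pack` rather than `vs_pack`. A minimal sketch of the corrected test, assuming only what the diff shows (the low three bits of byte 2 equal to 0x02 mean 16:9, anything else 4:3). The `Ratio` type is a stand-in for `AVRational` and `dv_aspect_from_vsc` is a hypothetical name, not a function in mxfenc.c.

```c
#include <stdint.h>

/* Stand-in for AVRational so the sketch compiles without FFmpeg. */
typedef struct Ratio {
    int num;
    int den;
} Ratio;

/* Hypothetical helper mirroring the patched condition: inspect the
 * low three bits of byte 2 of the pack passed in as vsc_pack. */
static Ratio dv_aspect_from_vsc(const uint8_t *vsc_pack)
{
    if ((vsc_pack[2] & 0x07) == 0x02)
        return (Ratio){ 16, 9 };
    return (Ratio){ 4, 3 };
}
```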


LGTM

thanks


Applied, thanks for the review.

Tobias
