date:20170612

[FFmpeg-devel] [PATCH] Add support for RockChip Media Process Platform

2017-06-12 Thread LongChair

From: LongChair 

This adds hardware decoding for H.264 / HEVC / VP8 using the Rockchip MPP API.
The decoder returns frames holding an av_drmprime struct in buf[3], which
allows DRM / dmabuf usage.
Tested on RK3288 (TinkerBoard) and RK3328.

Changes since patch v1:
 - Change AV_PIX_FMT_RKMPP to AV_PIX_FMT_DRMPRIME, as any decoder able to
   output drmprime structures could use that pixel format.
---
 Changelog              |   1 +
 configure              |  12 ++
 libavcodec/Makefile    |   5 +
 libavcodec/allcodecs.c |   6 +
 libavcodec/drmprime.h  |  17 ++
 libavcodec/rkmppdec.c  | 523 +
 libavutil/pixdesc.c    |   4 +
 libavutil/pixfmt.h     |   5 +
 8 files changed, 573 insertions(+)
 create mode 100644 libavcodec/drmprime.h
 create mode 100644 libavcodec/rkmppdec.c

diff --git a/Changelog b/Changelog
index 3533bdc..498e433 100644
--- a/Changelog
+++ b/Changelog
@@ -52,6 +52,7 @@ version 3.3:
 - Removed asyncts filter (use af_aresample instead)
 - Intel QSV-accelerated VP8 video decoding
 - VAAPI-accelerated deinterlacing
+- Addition of Rockchip MPP hardware decoding
 
 
 version 3.2:
diff --git a/configure b/configure
index 4ec8f21..883fc84 100755
--- a/configure
+++ b/configure
@@ -304,6 +304,7 @@ External library support:
   --disable-nvenc  disable Nvidia video encoding code [autodetect]
   --enable-omx enable OpenMAX IL code [no]
   --enable-omx-rpi enable OpenMAX IL code for Raspberry Pi [no]
+  --enable-rkmpp   enable Rockchip Media Process Platform code [no]
   --disable-vaapi  disable Video Acceleration API (mainly Unix/Intel) code [autodetect]
   --disable-vda    disable Apple Video Decode Acceleration code [autodetect]
   --disable-vdpau  disable Nvidia Video Decode and Presentation API for Unix code [autodetect]
@@ -1607,6 +1608,7 @@ HWACCEL_LIBRARY_LIST="
 libmfx
 mmal
 omx
+rkmpp
 "
 
 DOCUMENT_LIST="
@@ -2616,6 +2618,7 @@ h264_dxva2_hwaccel_select="h264_decoder"
 h264_mediacodec_hwaccel_deps="mediacodec"
 h264_mmal_hwaccel_deps="mmal"
 h264_qsv_hwaccel_deps="libmfx"
+h264_rkmpp_hwaccel_deps="rkmpp"
 h264_vaapi_hwaccel_deps="vaapi"
 h264_vaapi_hwaccel_select="h264_decoder"
 h264_vda_hwaccel_deps="vda"
@@ -2634,6 +2637,7 @@ hevc_mediacodec_hwaccel_deps="mediacodec"
 hevc_dxva2_hwaccel_deps="dxva2 DXVA_PicParams_HEVC"
 hevc_dxva2_hwaccel_select="hevc_decoder"
 hevc_qsv_hwaccel_deps="libmfx"
+hevc_rkmpp_hwaccel_deps="rkmpp"
 hevc_vaapi_hwaccel_deps="vaapi VAPictureParameterBufferHEVC"
 hevc_vaapi_hwaccel_select="hevc_decoder"
 hevc_vdpau_hwaccel_deps="vdpau VdpPictureInfoHEVC"
@@ -2696,6 +2700,7 @@ vp9_cuvid_hwaccel_deps="cuda cuvid"
 vp9_cuvid_hwaccel_select="vp9_cuvid_decoder"
 vp8_mediacodec_hwaccel_deps="mediacodec"
 vp8_qsv_hwaccel_deps="libmfx"
+vp8_rkmpp_hwaccel_deps="rkmpp"
 vp9_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_VP9"
 vp9_d3d11va_hwaccel_select="vp9_decoder"
 vp9_dxva2_hwaccel_deps="dxva2 DXVA_PicParams_VP9"
@@ -2736,6 +2741,8 @@ h264_qsv_decoder_deps="libmfx"
 h264_qsv_decoder_select="h264_mp4toannexb_bsf h264_parser qsvdec h264_qsv_hwaccel"
 h264_qsv_encoder_deps="libmfx"
 h264_qsv_encoder_select="qsvenc"
+h264_rkmpp_decoder_deps="rkmpp"
+h264_rkmpp_decoder_select="h264_mp4toannexb_bsf rkmpp h264_rkmpp_hwaccel"
 h264_vaapi_encoder_deps="VAEncPictureParameterBufferH264"
 h264_vaapi_encoder_select="vaapi_encode golomb"
 h264_vda_decoder_deps="vda"
@@ -2751,6 +2758,8 @@ hevc_qsv_decoder_deps="libmfx"
 hevc_qsv_decoder_select="hevc_mp4toannexb_bsf hevc_parser qsvdec hevc_qsv_hwaccel"
 hevc_qsv_encoder_deps="libmfx"
 hevc_qsv_encoder_select="hevcparse qsvenc"
+hevc_rkmpp_decoder_deps="rkmpp"
+hevc_rkmpp_decoder_select="hevc_mp4toannexb_bsf rkmpp hevc_rkmpp_hwaccel"
 hevc_vaapi_encoder_deps="VAEncPictureParameterBufferHEVC"
 hevc_vaapi_encoder_select="vaapi_encode golomb"
 mjpeg_cuvid_decoder_deps="cuda cuvid"
@@ -2789,6 +2798,8 @@ vp8_cuvid_decoder_deps="cuda cuvid"
 vp8_mediacodec_decoder_deps="mediacodec"
 vp8_qsv_decoder_deps="libmfx"
 vp8_qsv_decoder_select="qsvdec vp8_qsv_hwaccel vp8_parser"
+vp8_rkmpp_decoder_deps="rkmpp"
+vp8_rkmpp_decoder_select="rkmpp vp8_rkmpp_hwaccel"
 vp8_vaapi_encoder_deps="VAEncPictureParameterBufferVP8"
 vp8_vaapi_encoder_select="vaapi_encode"
 vp9_cuvid_decoder_deps="cuda cuvid"
@@ -5917,6 +5928,7 @@ enabled mmal  && { check_lib mmal 
interface/mmal/mmal.h mmal_port_co
  check_lib mmal interface/mmal/mmal.h 
mmal_port_connect -lmmal_core -lmmal_util -lmmal_vc_client -lbcm_host; } ||
die "ERROR: mmal not found" &&
check_func_headers interface/mmal/mmal.h 
"MMAL_PARAMETER_VIDEO_MAX_NUM_CALLBACKS"; }
+enabled rkmpp && { check_lib rkmpp rockchip/rk_mpi.h mpp_create "-lrockchip_mpp" || die "ERROR : Rockchip MPP was not found."; }
 enabled netcdf&& require_pkg_config netcdf netcdf.h nc_inq_libvers
 enabled

Re: [FFmpeg-devel] [PATCH] libavfilter/scale: Populate ow/oh when using 0 as w/h

2017-06-12 Thread Kevin Mark

On Mon, Jun 12, 2017 at 9:42 PM, Michael Niedermayer wrote:
> why is there a cast at all ?

The cast is there because if you run this:

ffmpeg -frames:v 5 -filter_complex
"sws_flags=+accurate_rnd+bitexact;testsrc=size=320x240
[main];testsrc=size=640x360 [ref];[main][ref]
scale2ref=0:print(ow/641) [main][ref];[ref] nullsink" -map "[main]"
-flags +bitexact -fflags +bitexact -f md5 -

For ow this works just fine without a cast: 0 == 0 is true, so we set
it to 640. But for oh, the print() shows that ow/641 is 0.998440.
Without the cast, the comparison 0.998440 == 0 is false, so res is kept
and then truncated from a double to an integer (eval_h = res), which
makes eval_h 0. That is exactly the behavior we're trying to correct.
Adding the cast resolves the issue: 0.998440 == 0 is false, but
(int) 0.998440 == 0 is true.
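
To make the truncation concrete, here is a minimal C sketch of the two
variants. The helper names and the input height of 360 are assumptions
made for illustration; this is not the actual vf_scale code.

```c
/* Hypothetical helpers modelling "eval_h = <expr>" from the patch. */

/* Without the cast: the comparison is done on the double. */
static int eval_dim_without_cast(double res, int in_dim)
{
    /* 0.998440 == 0 is false, so res itself is selected ... */
    double v = (res == 0) ? in_dim : res;
    /* ... and the conversion to int then truncates it to 0. */
    return (int) v;
}

/* With the cast: the comparison is done on the truncated value. */
static int eval_dim_with_cast(double res, int in_dim)
{
    /* (int) 0.998440 == 0 is true, so the input dimension is selected. */
    double v = ((int) res == 0) ? in_dim : res;
    return (int) v;
}
```

With res = 640.0 / 641 and an input height of 360, the first helper
yields 0 and the second yields 360.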

For the extra cast I was talking about consider this:

ffmpeg -frames:v 5 -filter_complex
"sws_flags=+accurate_rnd+bitexact;testsrc=size=320x240
[main];testsrc=size=640x360 [ref];[main][ref]
scale2ref=500/6:print(print(ow)*5) [main][ref];[ref] nullsink" -map
"[main]" -flags +bitexact -fflags +bitexact -f md5 -

That will print() 83.33 and then 416.67. A user might
(reasonably, in my opinion) expect that the ow value (or oh) is always
an integer. With the extra cast you'll see 83.00 and 415.00
printed. 83.33 truncates to 83 so no (noticeable) change for ow
but 416.67 does not truncate to 415 so this is an example of a
place where the lack of truncation for ow/oh does change the outcome.
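
The same effect in a small C sketch, assuming the stored ow is later
read back by an expression such as ow*5. The function names are
hypothetical and only model the assignment discussed here:

```c
/* Hypothetical model of the value that ends up in var_values[VAR_OW]. */
static double stored_ow(double res, int in_w, int extra_cast)
{
    if ((int) res == 0)
        return in_w;                 /* 0 means "use the input width" */
    /* With the extra cast the stored value is truncated to an integer. */
    return extra_cast ? (double) (int) res : res;
}

/* A later expression like "ow*5" reads the stored value back. */
static double oh_from_ow(double ow)
{
    return ow * 5;
}
```

For res = 500.0 / 6, the variant without the extra cast feeds 83.33...
into the second expression, while the variant with it feeds exactly 83.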

I hope this clears it up. Perhaps that code should just be entirely
refactored to be a little more clear?

Thanks,
Kevin
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] configure/libopenjpegdec.c/libopenjpegenc.c: Add support for LibOpenJPEG v2.2/git

2017-06-12 Thread Michael Bradshaw

On Mon, Jun 12, 2017 at 4:36 PM, Reino Wijnsma  wrote:

> This patch adds support for LibOpenJPEG v2.2/git. At the moment v2.1 is
> the highest version FFmpeg supports. I've successfully cross-compiled
> FFmpeg this way.


Are you sure you built ffmpeg using OpenJPEG v2.2? Because your patch is
missing the openjpeg_2_2_openjpeg_h entry in HEADERS_LIST in configure, so
you shouldn't be able to successfully build with OpenJPEG v2.2.


> From df61d7a295bec74c85d37042051e9dc1ef5cdbce Mon Sep 17 00:00:00 2001
> From: Reino17 
> Date: Tue, 13 Jun 2017 01:01:07 +0200
> Subject: [PATCH] Add support for LibOpenJPEG v2.2/git

> ---
>  configure   |  3 ++-
>  libavcodec/libopenjpegdec.c | 10 +++---
>  libavcodec/libopenjpegenc.c | 12 
>  3 files changed, 17 insertions(+), 8 deletions(-)

> diff --git a/configure b/configure
> index e3941f9..003d359 100755
> --- a/configure
> +++ b/configure
> @@ -5831,7 +5831,8 @@ enabled libopencv && { check_header
opencv2/core/core_c.h &&
>   require opencv opencv2/core/core_c.h
cvCreateImageHeader -lopencv_core -lopencv_imgproc; } ||
> require_pkg_config opencv opencv/cxcore.h
cvCreateImageHeader; }
>  enabled libopenh264   && require_pkg_config openh264
wels/codec_api.h WelsGetCodecVersion
> -enabled libopenjpeg   && { { check_lib libopenjpeg
openjpeg-2.1/openjpeg.h opj_version -lopenjp2 -DOPJ_STATIC && add_cppflags
-DOPJ_STATIC; } ||
> +enabled libopenjpeg   && { { check_lib libopenjpeg
openjpeg-2.2/openjpeg.h opj_version -lopenjp2 -DOPJ_STATIC && add_cppflags
-DOPJ_STATIC; } ||
> +   { check_lib libopenjpeg
openjpeg-2.1/openjpeg.h opj_version -lopenjp2 -DOPJ_STATIC && add_cppflags
-DOPJ_STATIC; } ||
> check_lib libopenjpeg
openjpeg-2.1/openjpeg.h opj_version -lopenjp2 ||
> { check_lib libopenjpeg
openjpeg-2.0/openjpeg.h opj_version -lopenjp2 -DOPJ_STATIC && add_cppflags
-DOPJ_STATIC; } ||
> { check_lib libopenjpeg
openjpeg-1.5/openjpeg.h opj_version -lopenjpeg -DOPJ_STATIC && add_cppflags
-DOPJ_STATIC; } ||
> diff --git a/libavcodec/libopenjpegdec.c b/libavcodec/libopenjpegdec.c
> index ce4e2b0..5ed9ce1 100644
> --- a/libavcodec/libopenjpegdec.c
> +++ b/libavcodec/libopenjpegdec.c
> @@ -34,7 +34,9 @@
>  #include "internal.h"
>  #include "thread.h"
>
> -#if HAVE_OPENJPEG_2_1_OPENJPEG_H
> +#if HAVE_OPENJPEG_2_2_OPENJPEG_H
> +#  include <openjpeg-2.2/openjpeg.h>
> +#elif HAVE_OPENJPEG_2_1_OPENJPEG_H
>  #  include <openjpeg-2.1/openjpeg.h>
>  #elif HAVE_OPENJPEG_2_0_OPENJPEG_H
>  #  include <openjpeg-2.0/openjpeg.h>
> @@ -44,7 +46,7 @@
>  #  include <openjpeg.h>
>  #endif
>
> -#if HAVE_OPENJPEG_2_1_OPENJPEG_H || HAVE_OPENJPEG_2_0_OPENJPEG_H
> +#if HAVE_OPENJPEG_2_2_OPENJPEG_H || HAVE_OPENJPEG_2_1_OPENJPEG_H ||
HAVE_OPENJPEG_2_0_OPENJPEG_H
>  #  define OPENJPEG_MAJOR_VERSION 2
>  #  define OPJ(x) OPJ_##x
>  #else
> @@ -429,7 +431,9 @@ static int libopenjpeg_decode_frame(AVCodecContext
*avctx,
>  opj_stream_set_read_function(stream, stream_read);
>  opj_stream_set_skip_function(stream, stream_skip);
>  opj_stream_set_seek_function(stream, stream_seek);
> -#if HAVE_OPENJPEG_2_1_OPENJPEG_H
> +#if HAVE_OPENJPEG_2_2_OPENJPEG_H
> +opj_stream_set_user_data(stream, &reader, NULL);
> +#elif HAVE_OPENJPEG_2_1_OPENJPEG_H
>  opj_stream_set_user_data(stream, &reader, NULL);

Please merge these two into just one #if branch. That is:

#if HAVE_OPENJPEG_2_2_OPENJPEG_H || HAVE_OPENJPEG_2_1_OPENJPEG_H
opj_stream_set_user_data(stream, &reader, NULL);
#elif HAVE_OPENJPEG_2_0_OPENJPEG_H
...

>  #elif HAVE_OPENJPEG_2_0_OPENJPEG_H
>  opj_stream_set_user_data(stream, &reader);
> diff --git a/libavcodec/libopenjpegenc.c b/libavcodec/libopenjpegenc.c
> index 4a12729..d3b9161 100644
> --- a/libavcodec/libopenjpegenc.c
> +++ b/libavcodec/libopenjpegenc.c
> @@ -32,7 +32,9 @@
>  #include "avcodec.h"
>  #include "internal.h"
>
> -#if HAVE_OPENJPEG_2_1_OPENJPEG_H
> +#if HAVE_OPENJPEG_2_2_OPENJPEG_H
> +#  include <openjpeg-2.2/openjpeg.h>
> +#elif HAVE_OPENJPEG_2_1_OPENJPEG_H
>  #  include <openjpeg-2.1/openjpeg.h>
>  #elif HAVE_OPENJPEG_2_0_OPENJPEG_H
>  #  include <openjpeg-2.0/openjpeg.h>
> @@ -42,7 +44,7 @@
>  #  include <openjpeg.h>
>  #endif
>
> -#if HAVE_OPENJPEG_2_1_OPENJPEG_H || HAVE_OPENJPEG_2_0_OPENJPEG_H
> +#if HAVE_OPENJPEG_2_2_OPENJPEG_H || HAVE_OPENJPEG_2_1_OPENJPEG_H ||
HAVE_OPENJPEG_2_0_OPENJPEG_H
>  #  define OPENJPEG_MAJOR_VERSION 2
>  #  define OPJ(x) OPJ_##x
>  #else
> @@ -305,7 +307,7 @@ static av_cold int
libopenjpeg_encode_init(AVCodecContext *avctx)
>
>  opj_set_default_encoder_parameters(&ctx->enc_params);
>
> -#if HAVE_OPENJPEG_2_1_OPENJPEG_H
> +#if HAVE_OPENJPEG_2_2_OPENJPEG_H || HAVE_OPENJPEG_2_1_OPENJPEG_H
>  switch (ctx->cinema_mode) {
>  case OPJ_CINEMA2K_24:
>  ctx->enc_params.rsiz = OPJ_PROFILE_CINEMA_2K;
> @@ -769,7 +771,9 @@ static int libopenjpeg_encode_frame(AVCodecContext
*avctx, AVPacket *pkt,
>  opj_stream_set_write_function(stream, stream_write);
>  opj_

Re: [FFmpeg-devel] [PATCH] libavfilter/scale: Populate ow/oh when using 0 as w/h

2017-06-12 Thread Michael Niedermayer

On Wed, Jun 07, 2017 at 03:54:26AM -0400, Kevin Mark wrote:
> I also have to wonder if it would be advantageous to add the cast on
> the right side as well. That way the var_values variables will have
> the proper truncated values on future evaluations. Open to comments on
> that.
> 
> On Wed, Jun 7, 2017 at 3:45 AM, Kevin Mark  wrote:
> > -eval_w = var_values[VAR_OUT_W] = var_values[VAR_OW] = res;
> > +eval_w = var_values[VAR_OUT_W] = var_values[VAR_OW] = (int) res == 0 ? 
> > inlink->w : res;
> 
> to perhaps:
> +eval_w = var_values[VAR_OUT_W] = var_values[VAR_OW] = (int) res
> == 0 ? inlink->w : (int) res;
> 
> Without that extra cast I assume the values in eval_w and
> var_values[VAR_OUT_W], var_values[VAR_OW] could be different. I doubt
> most users expect that those values could ever be non-integers which
> has implications for how you're writing your expression.

why is there a cast at all ?

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicin you will get
a job for life at the pharma industry.



Re: [FFmpeg-devel] [PATCH V4] lavc/golomb: Fix UE golomb overwrite issue.

2017-06-12 Thread Michael Niedermayer

On Fri, Jun 09, 2017 at 10:34:19AM +0800, Jun Zhao wrote:
> V4: Fix range check error in assert based on Mark's review.
> V3: Clean up the code logic based on Michael's review.
> V2: Add set_ue_golomb_long() to support 32-bit UE golomb and update the
> unit test.

>  golomb.h   |   17 -
>  put_bits.h |   35 +++
>  tests/golomb.c |   19 +++
>  3 files changed, 70 insertions(+), 1 deletion(-)
> 6bed99e213506530c7a58c6bffda43607a7be37c  
> 0001-lavc-golomb-Fix-UE-golomb-overwrite-issue.patch
> From fa3f59e5fcb2cddcc44b0e895bfa02caa491fee5 Mon Sep 17 00:00:00 2001
> From: Jun Zhao 
> Date: Fri, 2 Jun 2017 15:05:49 +0800
> Subject: [PATCH V4] lavc/golomb: Fix UE golomb overwrite issue.
> 
> put_bits() only supports writing up to 31 bits; writing 32 bits
> overwrites the bit buffer. Because the default assert level is 0,
> the av_assert2(n <= 31 && value < (1U << n)) in put_bits() is never
> triggered at runtime. Add set_ue_golomb_long() to support 32-bit
> UE golomb.
> 
> Signed-off-by: Jun Zhao 
> ---
>  libavcodec/golomb.h   | 17 -
>  libavcodec/put_bits.h | 35 +++
>  libavcodec/tests/golomb.c | 19 +++
>  3 files changed, 70 insertions(+), 1 deletion(-)

This should be 3 patches
1. changes to set_ue_golomb() comment
2. addition of put_bits64()
3. addition of set_ue_golomb_long()
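
To illustrate why a wider write is needed at all, here is a small
sketch (the helper names are hypothetical, not taken from golomb.h)
that computes the length of an unsigned Exp-Golomb codeword. A value v
is coded as the binary form of v + 1 preceded by one fewer zero bits
than that form has digits, so the codeword takes 2 * bits(v + 1) - 1
bits. Anything longer than 31 bits cannot go through a single
put_bits() call with the 31-bit limit described above.

```c
#include <stdint.h>

/* Number of significant bits in x (0 for x == 0). */
static int demo_bit_length(uint64_t x)
{
    int n = 0;
    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Total length in bits of the unsigned Exp-Golomb code for v. */
static int demo_ue_golomb_length(uint32_t v)
{
    /* v + 1 is computed in 64 bits so that large v cannot wrap. */
    return 2 * demo_bit_length((uint64_t) v + 1) - 1;
}

/* Returns 1 when the whole codeword fits in one write of at most 31 bits. */
static int demo_fits_single_write(uint32_t v)
{
    return demo_ue_golomb_length(v) <= 31;
}
```

Under this model 65534 is the largest value whose codeword still fits
in 31 bits, and values close to UINT32_MAX need up to 63 bits.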

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope



Re: [FFmpeg-devel] Add libtwolame cppflags in 'configure' to prevent the use of '--extra-cflags'

2017-06-12 Thread Reino Wijnsma

I just tried to build a shared LibTwoLAME, but it gave me errors.
Building a shared LibModplug proved more successful.
FFmpeg's 'configure' didn't have any problems with it and finished
without errors.

LibOpenJPEG was the reason for me to submit these patches, as it
already has added cppflags.

[FFmpeg-devel] [PATCH] x86/aacpsdsp: add ff_ps_hybrid_analysis_ileave_sse

2017-06-12 Thread James Almer

About 2x faster than the c version.

Signed-off-by: James Almer 
---
Depends on "[PATCH] x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4}"
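
For readers comparing against the C version: the scalar routine is
believed to look roughly like the sketch below. The function name
demo_hybrid_analysis_ileave is made up for this note, plain float
stands in for the template's sample type, and the parameter layout
follows the prototype added to aacpsdsp_init.c in this patch.

```c
/* Sketch of a scalar interleave: for every band i..63, gather the two
 * planes of L into interleaved pairs in out. */
static void demo_hybrid_analysis_ileave(float (*out)[32][2],
                                        float L[2][38][64],
                                        int i, int len)
{
    int j;

    for (; i < 64; i++) {
        for (j = 0; j < len; j++) {
            out[i][j][0] = L[0][j][i];
            out[i][j][1] = L[1][j][i];
        }
    }
}
```

The inner accesses to L stride by 64 floats per step of j, which is why
a transpose-based SIMD version has room to do better.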

 libavcodec/x86/aacpsdsp.asm    | 106 +
 libavcodec/x86/aacpsdsp_init.c |   3 ++
 2 files changed, 109 insertions(+)

diff --git a/libavcodec/x86/aacpsdsp.asm b/libavcodec/x86/aacpsdsp.asm
index cdcadefcdc..70a3d84780 100644
--- a/libavcodec/x86/aacpsdsp.asm
+++ b/libavcodec/x86/aacpsdsp.asm
@@ -172,6 +172,112 @@ align 16
 .ret:
 REP_RET
 
+;**
+;void ps_hybrid_analysis_ileave_sse(float out[2][38][64],
+;   float (*in)[32][2],
+;   int i, int len)
+;**
+INIT_XMM sse
+cglobal ps_hybrid_analysis_ileave, 3, 7, 5, out, in, i, len, in0, in1, tmp
+movsxdifnidn iq, id
+mov   lend, 32 << 3
+lea   inq, [inq+iq*4]
+mov   tmpd, id
+shl   tmpd, 8
+add   outq, tmpq
+mov   tmpd, 64
+sub   tmpd, id
+mov id, tmpd
+
+test  id, 1
+jne .loop4
+test  id, 2
+jne .loop8
+
+align 16
+.loop16:
+mov   in0q, inq
+mov   in1q, 38*64*4
+add   in1q, in0q
+mov   tmpd, lend
+
+.inner_loop16:
+movaps  m0, [in0q]
+movaps  m1, [in1q]
+movaps  m2, [in0q+lenq]
+movaps  m3, [in1q+lenq]
+TRANSPOSE4x4PS 0, 1, 2, 3, 4
+movaps  [outq], m0
+movaps [outq+lenq], m1
+movaps   [outq+lenq*2], m2
+movaps [outq+3*32*2*4], m3
+lea   in0q, [in0q+lenq*2]
+lea   in1q, [in1q+lenq*2]
+add   outq, mmsize
+sub   tmpd, mmsize
+jg .inner_loop16
+add   inq, 16
+add   outq, 3*32*2*4
+sub id, 4
+jg .loop16
+RET
+
+align 16
+.loop8:
+mov   in0q, inq
+mov   in1q, 38*64*4
+add   in1q, in0q
+mov   tmpd, lend
+
+.inner_loop8:
+movlps  m0, [in0q]
+movlps  m1, [in1q]
+movhps  m0, [in0q+lenq]
+movhps  m1, [in1q+lenq]
+SBUTTERFLYPS 0, 1, 2
+SBUTTERFLYPD 0, 1, 2
+movaps  [outq], m0
+movaps [outq+lenq], m1
+lea   in0q, [in0q+lenq*2]
+lea   in1q, [in1q+lenq*2]
+add   outq, mmsize
+sub   tmpd, mmsize
+jg .inner_loop8
+add   inq, 8
+add   outq, lenq
+sub id, 2
+jg .loop16
+RET
+
+align 16
+.loop4:
+mov   in0q, inq
+mov   in1q, 38*64*4
+add   in1q, in0q
+mov   tmpd, lend
+
+.inner_loop4:
+movss   m0, [in0q]
+movss   m1, [in1q]
+movss   m2, [in0q+lenq]
+movss   m3, [in1q+lenq]
+movlhps m0, m1
+movlhps m2, m3
+shufps  m0, m2, q2020
+movaps  [outq], m0
+lea   in0q, [in0q+lenq*2]
+lea   in1q, [in1q+lenq*2]
+add   outq, mmsize
+sub   tmpd, mmsize
+jg .inner_loop4
+add   inq, 4
+sub id, 1
+test  id, 2
+jne .loop8
+cmp id, 4
+jge .loop16
+RET
+
 ;***
 ;void ps_hybrid_synthesis_deint_sse4(float out[2][38][64],
 ;float (*in)[32][2],
diff --git a/libavcodec/x86/aacpsdsp_init.c b/libavcodec/x86/aacpsdsp_init.c
index 25e089c395..056e23e59e 100644
--- a/libavcodec/x86/aacpsdsp_init.c
+++ b/libavcodec/x86/aacpsdsp_init.c
@@ -44,6 +44,8 @@ void ff_ps_hybrid_synthesis_deint_sse(float out[2][38][64], float (*in)[32][2],
                                       int i, int len);
 void ff_ps_hybrid_synthesis_deint_sse4(float out[2][38][64], float (*in)[32][2],
                                        int i, int len);
+void ff_ps_hybrid_analysis_ileave_sse(float (*out)[32][2], float L[2][38][64],
+                                      int i, int len);
 
 av_cold void ff_psdsp_init_x86(PSDSPContext *s)
 {
@@ -52,6 +54,7 @@ av_cold void ff_psdsp_init_x86(PSDSPContext *s)
     if (EXTERNAL_SSE(cpu_flags)) {
         s->add_squares            = ff_ps_add_squares_sse;
         s->mul_pair_single        = ff_ps_mul_pair_single_sse;
+        s->hybrid_analysis_ileave = ff_ps_hybrid_analysis_ileave_sse;
         s->hybrid_synthesis_deint = ff_ps_hybrid_synthesis_deint_sse;
         s->hybrid_analysis        = ff_ps_hybrid_analysis_sse;
     }
-- 
2.13.0


Re: [FFmpeg-devel] Add libtwolame cppflags in 'configure' to prevent the use of '--extra-cflags'

2017-06-12 Thread Hendrik Leppkes

On Tue, Jun 13, 2017 at 1:41 AM, Reino Wijnsma  wrote:
> This patch adds libtwolame cppflags in 'configure' to prevent the use of
> '--extra-cflags'. This is all that's needed to successfully
> cross-compile FFmpeg with LibTwoLAME.
>

And what happens if someone were to actually compile with a shared
TwoLAME, instead of a static one?
The same can be asked about the other patches doing the same thing.
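
To make the concern concrete: headers of libraries that also ship as
Windows DLLs commonly pick the symbol decoration from a "static" macro,
along the lines of the sketch below. All DEMO_* names are hypothetical
and are not copied from twolame.h or any other real header.

```c
/* Hypothetical header logic modelled on a common Windows pattern. */
#if defined(_WIN32) && !defined(DEMO_STATIC)
#  define DEMO_API     __declspec(dllimport)   /* link against the DLL */
#  define DEMO_LINKAGE "dllimport"
#else
#  define DEMO_API                              /* static or non-Windows */
#  define DEMO_LINKAGE "plain extern"
#endif

/* Reports which branch the preprocessor selected for this build. */
static const char *demo_linkage(void)
{
    return DEMO_LINKAGE;
}
```

If configure always adds the static define, a build against the shared
library never takes the dllimport branch, which may or may not link
depending on the toolchain and on whether data symbols are exported.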

- Hendrik

[FFmpeg-devel] configure: Add libcaca cppflags in 'configure' to prevent the use of '--extra-cflags'

2017-06-12 Thread Reino Wijnsma

This patch adds libcaca cppflags in 'configure' to prevent the use of
'--extra-cflags'. This is all that's needed to successfully
cross-compile FFmpeg with Libcaca.
From ba27f7d90c5c3883bd302caf977837d970d46387 Mon Sep 17 00:00:00 2001
From: Reino17 
Date: Tue, 13 Jun 2017 01:19:07 +0200
Subject: [PATCH] Add libcaca cppflags in 'configure' to prevent the use of
 '--extra-cflags'

---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index e3941f9..d96dbf9 100755
--- a/configure
+++ b/configure
@@ -5795,7 +5795,7 @@ enabled libbs2b   && require_pkg_config libbs2b bs2b.h bs2b_open
 enabled libcelt   && require libcelt celt/celt.h celt_decode -lcelt0 &&
                      { check_lib libcelt celt/celt.h celt_decoder_create_custom -lcelt0 ||
                        die "ERROR: libcelt must be installed and version must be >= 0.11.0."; }
-enabled libcaca   && require_pkg_config caca caca.h caca_create_canvas
+enabled libcaca   && require_pkg_config caca caca.h caca_create_canvas -DCACA_STATIC && add_cppflags -DCACA_STATIC
 enabled libdc1394 && require_pkg_config libdc1394-2 dc1394/dc1394.h dc1394_new
 enabled libfdk_aac && { use_pkg_config fdk-aac "fdk-aac/aacenc_lib.h" aacEncOpen ||
                         { require libfdk_aac fdk-aac/aacenc_lib.h aacEncOpen -lfdk-aac &&
-- 
2.8.3


[FFmpeg-devel] configure: Add libmodplug cppflags in 'configure' to prevent the use of '--extra-cflags'

2017-06-12 Thread Reino Wijnsma

This patch adds libmodplug cppflags in 'configure' to prevent the use of
'--extra-cflags'. This is all that's needed to successfully
cross-compile FFmpeg with LibModplug.
From 7fe7113ccc79f46610b2a89f8ab17e94c348e568 Mon Sep 17 00:00:00 2001
From: Reino17 
Date: Tue, 13 Jun 2017 01:43:09 +0200
Subject: [PATCH] Add libmodplug cppflags in 'configure' to prevent the use of
 '--extra-cflags'

---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index e3941f9..6df1cfaa 100755
--- a/configure
+++ b/configure
@@ -5820,7 +5820,7 @@ enabled libkvazaar && require_pkg_config "kvazaar >= 0.8.1" kvazaar.h kvz
 # can find the libraries and headers through other means.
 enabled libmfx    && { use_pkg_config libmfx "mfx/mfxvideo.h" MFXInit ||
                        { require libmfx "mfx/mfxvideo.h" MFXInit -llibmfx && warn "using libmfx without pkg-config"; } }
-enabled libmodplug && require_pkg_config libmodplug libmodplug/modplug.h ModPlug_Load
+enabled libmodplug && require_pkg_config libmodplug libmodplug/modplug.h ModPlug_Load -DMODPLUG_STATIC && add_cppflags -DMODPLUG_STATIC
 enabled libmp3lame && require "libmp3lame >= 3.98.3" lame/lame.h lame_set_VBR_quality -lmp3lame
 enabled libmysofa && require libmysofa "mysofa.h" mysofa_load -lmysofa
 enabled libnpp    && require libnpp npp.h nppGetLibVersion -lnppi -lnppc
-- 
2.8.3


[FFmpeg-devel] Add libtwolame cppflags in 'configure' to prevent the use of '--extra-cflags'

2017-06-12 Thread Reino Wijnsma

This patch adds libtwolame cppflags in 'configure' to prevent the use of
'--extra-cflags'. This is all that's needed to successfully
cross-compile FFmpeg with LibTwoLAME.
From 9aacc77f0dcb67d375fd3bdf4d0a1dc8d55b1cf7 Mon Sep 17 00:00:00 2001
From: Reino17 
Date: Tue, 13 Jun 2017 01:10:53 +0200
Subject: [PATCH] Add libtwolame cppflags in 'configure' to prevent the use of
 '--extra-cflags'

---
 configure | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index e3941f9..e2c0cb9 100755
--- a/configure
+++ b/configure
@@ -5859,7 +5859,7 @@ enabled libssh    && require_pkg_config libssh libssh/sftp.h sftp_init
 enabled libspeex  && require_pkg_config speex speex/speex.h speex_decoder_init -lspeex
 enabled libtesseract  && require_pkg_config tesseract tesseract/capi.h TessBaseAPICreate
 enabled libtheora && require libtheora theora/theoraenc.h th_info_init -ltheoraenc -ltheoradec -logg
-enabled libtwolame && require libtwolame twolame.h twolame_init -ltwolame &&
+enabled libtwolame && require libtwolame twolame.h twolame_init -ltwolame -DLIBTWOLAME_STATIC && add_cppflags -DLIBTWOLAME_STATIC &&
                       { check_lib libtwolame twolame.h twolame_encode_buffer_float32_interleaved -ltwolame ||
                         die "ERROR: libtwolame must be installed and version must be >= 0.3.10"; }
 enabled libv4l2   && require_pkg_config libv4l2 libv4l2.h v4l2_ioctl
-- 
2.8.3


[FFmpeg-devel] configure/libopenjpegdec.c/libopenjpegenc.c: Add support for LibOpenJPEG v2.2/git

2017-06-12 Thread Reino Wijnsma

This patch adds support for LibOpenJPEG v2.2/git. At the moment v2.1 is
the highest version FFmpeg supports. I've successfully cross-compiled
FFmpeg this way.
From df61d7a295bec74c85d37042051e9dc1ef5cdbce Mon Sep 17 00:00:00 2001
From: Reino17 
Date: Tue, 13 Jun 2017 01:01:07 +0200
Subject: [PATCH] Add support for LibOpenJPEG v2.2/git

---
 configure   |  3 ++-
 libavcodec/libopenjpegdec.c | 10 +++---
 libavcodec/libopenjpegenc.c | 12 
 3 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/configure b/configure
index e3941f9..003d359 100755
--- a/configure
+++ b/configure
@@ -5831,7 +5831,8 @@ enabled libopencv && { check_header opencv2/core/core_c.h &&
                          require opencv opencv2/core/core_c.h cvCreateImageHeader -lopencv_core -lopencv_imgproc; } ||
                        require_pkg_config opencv opencv/cxcore.h cvCreateImageHeader; }
 enabled libopenh264   && require_pkg_config openh264 wels/codec_api.h WelsGetCodecVersion
-enabled libopenjpeg   && { { check_lib libopenjpeg openjpeg-2.1/openjpeg.h opj_version -lopenjp2 -DOPJ_STATIC && add_cppflags -DOPJ_STATIC; } ||
+enabled libopenjpeg   && { { check_lib libopenjpeg openjpeg-2.2/openjpeg.h opj_version -lopenjp2 -DOPJ_STATIC && add_cppflags -DOPJ_STATIC; } ||
+                           { check_lib libopenjpeg openjpeg-2.1/openjpeg.h opj_version -lopenjp2 -DOPJ_STATIC && add_cppflags -DOPJ_STATIC; } ||
                            check_lib libopenjpeg openjpeg-2.1/openjpeg.h opj_version -lopenjp2 ||
                            { check_lib libopenjpeg openjpeg-2.0/openjpeg.h opj_version -lopenjp2 -DOPJ_STATIC && add_cppflags -DOPJ_STATIC; } ||
                            { check_lib libopenjpeg openjpeg-1.5/openjpeg.h opj_version -lopenjpeg -DOPJ_STATIC && add_cppflags -DOPJ_STATIC; } ||
diff --git a/libavcodec/libopenjpegdec.c b/libavcodec/libopenjpegdec.c
index ce4e2b0..5ed9ce1 100644
--- a/libavcodec/libopenjpegdec.c
+++ b/libavcodec/libopenjpegdec.c
@@ -34,7 +34,9 @@
 #include "internal.h"
 #include "thread.h"
 
-#if HAVE_OPENJPEG_2_1_OPENJPEG_H
+#if HAVE_OPENJPEG_2_2_OPENJPEG_H
+#  include <openjpeg-2.2/openjpeg.h>
+#elif HAVE_OPENJPEG_2_1_OPENJPEG_H
 #  include <openjpeg-2.1/openjpeg.h>
 #elif HAVE_OPENJPEG_2_0_OPENJPEG_H
 #  include <openjpeg-2.0/openjpeg.h>
@@ -44,7 +46,7 @@
 #  include <openjpeg.h>
 #endif
 
-#if HAVE_OPENJPEG_2_1_OPENJPEG_H || HAVE_OPENJPEG_2_0_OPENJPEG_H
+#if HAVE_OPENJPEG_2_2_OPENJPEG_H || HAVE_OPENJPEG_2_1_OPENJPEG_H || HAVE_OPENJPEG_2_0_OPENJPEG_H
 #  define OPENJPEG_MAJOR_VERSION 2
 #  define OPJ(x) OPJ_##x
 #else
@@ -429,7 +431,9 @@ static int libopenjpeg_decode_frame(AVCodecContext *avctx,
 opj_stream_set_read_function(stream, stream_read);
 opj_stream_set_skip_function(stream, stream_skip);
 opj_stream_set_seek_function(stream, stream_seek);
-#if HAVE_OPENJPEG_2_1_OPENJPEG_H
+#if HAVE_OPENJPEG_2_2_OPENJPEG_H
+opj_stream_set_user_data(stream, &reader, NULL);
+#elif HAVE_OPENJPEG_2_1_OPENJPEG_H
 opj_stream_set_user_data(stream, &reader, NULL);
 #elif HAVE_OPENJPEG_2_0_OPENJPEG_H
 opj_stream_set_user_data(stream, &reader);
diff --git a/libavcodec/libopenjpegenc.c b/libavcodec/libopenjpegenc.c
index 4a12729..d3b9161 100644
--- a/libavcodec/libopenjpegenc.c
+++ b/libavcodec/libopenjpegenc.c
@@ -32,7 +32,9 @@
 #include "avcodec.h"
 #include "internal.h"
 
-#if HAVE_OPENJPEG_2_1_OPENJPEG_H
+#if HAVE_OPENJPEG_2_2_OPENJPEG_H
+#  include <openjpeg-2.2/openjpeg.h>
+#elif HAVE_OPENJPEG_2_1_OPENJPEG_H
 #  include <openjpeg-2.1/openjpeg.h>
 #elif HAVE_OPENJPEG_2_0_OPENJPEG_H
 #  include <openjpeg-2.0/openjpeg.h>
@@ -42,7 +44,7 @@
 #  include <openjpeg.h>
 #endif
 
-#if HAVE_OPENJPEG_2_1_OPENJPEG_H || HAVE_OPENJPEG_2_0_OPENJPEG_H
+#if HAVE_OPENJPEG_2_2_OPENJPEG_H || HAVE_OPENJPEG_2_1_OPENJPEG_H || HAVE_OPENJPEG_2_0_OPENJPEG_H
 #  define OPENJPEG_MAJOR_VERSION 2
 #  define OPJ(x) OPJ_##x
 #else
@@ -305,7 +307,7 @@ static av_cold int libopenjpeg_encode_init(AVCodecContext *avctx)
 
 opj_set_default_encoder_parameters(&ctx->enc_params);
 
-#if HAVE_OPENJPEG_2_1_OPENJPEG_H
+#if HAVE_OPENJPEG_2_2_OPENJPEG_H || HAVE_OPENJPEG_2_1_OPENJPEG_H
 switch (ctx->cinema_mode) {
 case OPJ_CINEMA2K_24:
 ctx->enc_params.rsiz = OPJ_PROFILE_CINEMA_2K;
@@ -769,7 +771,9 @@ static int libopenjpeg_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
 opj_stream_set_write_function(stream, stream_write);
 opj_stream_set_skip_function(stream, stream_skip);
 opj_stream_set_seek_function(stream, stream_seek);
-#if HAVE_OPENJPEG_2_1_OPENJPEG_H
+#if HAVE_OPENJPEG_2_2_OPENJPEG_H
+opj_stream_set_user_data(stream, &writer, NULL);
+#elif HAVE_OPENJPEG_2_1_OPENJPEG_H
 opj_stream_set_user_data(stream, &writer, NULL);
 #elif HAVE_OPENJPEG_2_0_OPENJPEG_H
 opj_stream_set_user_data(stream, &writer);
-- 
2.8.3


Re: [FFmpeg-devel] Sharing cuda context between transcode sessions to reduce initialization overhead

2017-06-12 Thread Mark Thompson



On 12/06/17 22:33, Hendrik Leppkes wrote:
> On 12.06.2017 at 10:38 PM, "Ganapathy Raman Kasi" wrote:
> 
> Hi,
> 
> 
> Currently, in the case of a 1 -> N transcode (1 SW decode -> N NVENC
> encodes) without the HW upload filter, we end up allocating multiple CUDA
> contexts for the N transcode sessions on the same underlying GPU device.
> This incurs the CUDA context initialization overhead (~100 ms per
> context creation on a 4th gen i5 with a GTX 1080 under Ubuntu 16.04). Also,
> in the case of M * (1->N) fully HW-accelerated transcodes we face the same
> issue: the CUDA context is not shared between the M transcode sessions.
> Sharing the context would greatly reduce the initialization time, which
> matters for short clip transcodes.
> 
> 
> I currently have a global array in avutil/hwcontext_cuda.c which keeps
> track of the CUDA contexts created and reuses existing contexts when a
> request to create a hwdevice ctx occurs. This is shared in the attached
> patch. Please check the approach and let me know if there is a
> better/cleaner way to do this. Thanks
> 
> 
> Global state in the libraries is something we absolutely try to stay away
> from, so this approach is not quite appropriate.
> 
> If you want to somehow share this, it should be in the ffmpeg command line
> tool somewhere, however we also try to reduce hardware specific magic in
> favor of abstractions

Using hwupload_cuda creates a new device out of nowhere in the middle of the 
graph, and you can't do anything to avoid that behaviour without nasty hackery 
in the libraries.  So, use generic hwupload instead, which will use a device 
provided by the user.

With the patch series just posted:

"ffmpeg ... -init_hw_device cuda=foo:bar -filter_hw_device foo -vf 
...hwupload... -c:v nvenc... -vf ...hwupload... -c:v nvenc..."

It doesn't currently solve cases which require multiple devices in a single 
graph, though - thoughts definitely welcome on how to do that.
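
As a rough illustration of "one user-provided device, referenced by
every consumer" versus a hidden global table, here is a self-contained
C sketch. Every Demo* name is hypothetical; it only models the
reference-counting idea and does not use the libavutil hwcontext API.

```c
#include <stdlib.h>

/* Stand-in for an expensive device context (for example a CUDA context). */
typedef struct DemoDevice {
    int refcount;
} DemoDevice;

/* Created once by the caller, never looked up from global state. */
static DemoDevice *demo_device_create(void)
{
    DemoDevice *d = malloc(sizeof(*d));
    if (d)
        d->refcount = 1;
    return d;
}

static DemoDevice *demo_device_ref(DemoDevice *d)
{
    d->refcount++;
    return d;
}

/* Drops one reference and clears the caller's pointer. */
static void demo_device_unref(DemoDevice **d)
{
    if (!d || !*d)
        return;
    if (--(*d)->refcount == 0)
        free(*d);
    *d = NULL;
}

/* Each upload/encode session holds its own reference to the shared device. */
typedef struct DemoSession {
    DemoDevice *device;
} DemoSession;

static void demo_session_open(DemoSession *s, DemoDevice *shared)
{
    s->device = demo_device_ref(shared);
}

static void demo_session_close(DemoSession *s)
{
    demo_device_unref(&s->device);
}
```

The caller owns the lifetime explicitly, so N sessions pay the creation
cost once and the library itself keeps no state between calls.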


- Mark

[FFmpeg-devel] [PATCH 22/24] vf_hwmap: Add device derivation

2017-06-12 Thread Mark Thompson

Also refactor a little and improve error messages to make failure
cases easier to understand.

(cherry picked from commit 38cb05f1c89cae1862b360d4e7e3f0cd2b5bbb67)
---
 libavfilter/vf_hwmap.c | 67 --
 1 file changed, 49 insertions(+), 18 deletions(-)

diff --git a/libavfilter/vf_hwmap.c b/libavfilter/vf_hwmap.c
index 654477c6f2..c0fb42a1bc 100644
--- a/libavfilter/vf_hwmap.c
+++ b/libavfilter/vf_hwmap.c
@@ -30,10 +30,10 @@
 typedef struct HWMapContext {
     const AVClass *class;
 
-    AVBufferRef   *hwdevice_ref;
     AVBufferRef   *hwframes_ref;
 
     int            mode;
+    char          *derive_device_type;
     int            map_backwards;
 } HWMapContext;
 
@@ -56,6 +56,7 @@ static int hwmap_config_output(AVFilterLink *outlink)
     HWMapContext      *ctx = avctx->priv;
     AVFilterLink   *inlink = avctx->inputs[0];
     AVHWFramesContext *hwfc;
+    AVBufferRef *device;
     const AVPixFmtDescriptor *desc;
     int err;
 
@@ -63,30 +64,58 @@ static int hwmap_config_output(AVFilterLink *outlink)
av_get_pix_fmt_name(inlink->format),
av_get_pix_fmt_name(outlink->format));
 
+av_buffer_unref(&ctx->hwframes_ref);
+
+device = avctx->hw_device_ctx;
+
 if (inlink->hw_frames_ctx) {
 hwfc = (AVHWFramesContext*)inlink->hw_frames_ctx->data;
 
+if (ctx->derive_device_type) {
+enum AVHWDeviceType type;
+
+type = av_hwdevice_find_type_by_name(ctx->derive_device_type);
+if (type == AV_HWDEVICE_TYPE_NONE) {
+av_log(avctx, AV_LOG_ERROR, "Invalid device type.\n");
+err = AVERROR(EINVAL);
+goto fail;
+}
+
+err = av_hwdevice_ctx_create_derived(&device, type,
+ hwfc->device_ref, 0);
+if (err < 0) {
+av_log(avctx, AV_LOG_ERROR, "Failed to create derived "
+   "device context: %d.\n", err);
+goto fail;
+}
+}
+
 desc = av_pix_fmt_desc_get(outlink->format);
-if (!desc)
-return AVERROR(EINVAL);
+if (!desc) {
+err = AVERROR(EINVAL);
+goto fail;
+}
 
 if (inlink->format == hwfc->format &&
 (desc->flags & AV_PIX_FMT_FLAG_HWACCEL)) {
 // Map between two hardware formats (including the case of
 // undoing an existing mapping).
 
-ctx->hwdevice_ref = av_buffer_ref(avctx->hw_device_ctx);
-if (!ctx->hwdevice_ref) {
-err = AVERROR(ENOMEM);
+if (!device) {
+av_log(avctx, AV_LOG_ERROR, "A device reference is "
+   "required to map to a hardware format.\n");
+err = AVERROR(EINVAL);
 goto fail;
 }
 
 err = av_hwframe_ctx_create_derived(&ctx->hwframes_ref,
 outlink->format,
-ctx->hwdevice_ref,
+device,
 inlink->hw_frames_ctx, 0);
-if (err < 0)
+if (err < 0) {
+av_log(avctx, AV_LOG_ERROR, "Failed to create derived "
+   "frames context: %d.\n", err);
 goto fail;
+}
 
 } else if ((outlink->format == hwfc->format &&
 inlink->format  == hwfc->sw_format) ||
@@ -94,8 +123,6 @@ static int hwmap_config_output(AVFilterLink *outlink)
 // Map from a hardware format to a software format, or
 // undo an existing such mapping.
 
-ctx->hwdevice_ref = NULL;
-
 ctx->hwframes_ref = av_buffer_ref(inlink->hw_frames_ctx);
 if (!ctx->hwframes_ref) {
 err = AVERROR(ENOMEM);
@@ -119,15 +146,17 @@ static int hwmap_config_output(AVFilterLink *outlink)
 // returns frames mapped from that to the previous link in
 // order to fill them without an additional copy.
 
-ctx->map_backwards = 1;
-
-ctx->hwdevice_ref = av_buffer_ref(avctx->hw_device_ctx);
-if (!ctx->hwdevice_ref) {
-err = AVERROR(ENOMEM);
+if (!device) {
+av_log(avctx, AV_LOG_ERROR, "A device reference is "
+   "required to create new frames with backwards "
+   "mapping.\n");
+err = AVERROR(EINVAL);
 goto fail;
 }
 
-ctx->hwframes_ref = av_hwframe_ctx_alloc(ctx->hwdevice_ref);
+ctx->map_backwards = 1;
+
+ctx->hwframes_ref = av_hwframe_ctx_alloc(device);
 if (!ctx->hwframes_ref) {
 err = AVERROR(ENOMEM);
 goto fail;
@@ -165,7 +194,6 @@ static int hwmap_config_output(AVFilterLink *outlink)
 
 fail:
 av_buffer_unref(&ctx->hwframes_ref);
-av_buffer_unref(&ctx->hwdevice_ref);
 return

[FFmpeg-devel] [PATCH 24/24] doc: Document hwupload, hwdownload and hwmap filters

2017-06-12 Thread Mark Thompson

(cherry picked from commit 66aa9b94dae217a0fc5acfb704490707629d95ed)
---
 doc/filters.texi | 98 
 1 file changed, 98 insertions(+)

diff --git a/doc/filters.texi b/doc/filters.texi
index 023096f4e0..db0bdfe254 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -9040,6 +9040,104 @@ A floating point number which specifies chroma temporal strength. It defaults to
 @var{luma_tmp}*@var{chroma_spatial}/@var{luma_spatial}.
 @end table
 
+@section hwdownload
+
+Download hardware frames to system memory.
+
+The input must be in hardware frames, and the output a non-hardware format.
+Not all formats will be supported on the output - it may be necessary to insert
+an additional @option{format} filter immediately following in the graph to get
+the output in a supported format.
+
+@section hwmap
+
+Map hardware frames to system memory or to another device.
+
+This filter has several different modes of operation; which one is used depends
+on the input and output formats:
+@itemize
+@item
+Hardware frame input, normal frame output
+
+Map the input frames to system memory and pass them to the output.  If the
+original hardware frame is later required (for example, after overlaying
+something else on part of it), the @option{hwmap} filter can be used again
+in the next mode to retrieve it.
+@item
+Normal frame input, hardware frame output
+
+If the input is actually a software-mapped hardware frame, then unmap it -
+that is, return the original hardware frame.
+
+Otherwise, a device must be provided.  Create new hardware surfaces on that
+device for the output, then map them back to the software format at the input
+and give those frames to the preceding filter.  This will then act like the
+@option{hwupload} filter, but may be able to avoid an additional copy when
+the input is already in a compatible format.
+@item
+Hardware frame input and output
+
+A device must be supplied for the output, either directly or with the
+@option{derive_device} option.  The input and output devices must be of
+different types and compatible - the exact meaning of this is
+system-dependent, but typically it means that they must refer to the same
+underlying hardware context (for example, refer to the same graphics card).
+
+If the input frames were originally created on the output device, then unmap
+to retrieve the original frames.
+
+Otherwise, map the frames to the output device - create new hardware frames
+on the output corresponding to the frames on the input.
+@end itemize
+
+The following additional parameters are accepted:
+
+@table @option
+@item mode
+Set the frame mapping mode.  Some combination of:
+@table @var
+@item read
+The mapped frame should be readable.
+@item write
+The mapped frame should be writeable.
+@item overwrite
+The mapping will always overwrite the entire frame.
+
+This may improve performance in some cases, as the original contents of the
+frame need not be loaded.
+@item direct
+The mapping must not involve any copying.
+
+Indirect mappings to copies of frames are created in some cases where either
+direct mapping is not possible or it would have unexpected properties.
+Setting this flag ensures that the mapping is direct and will fail if that is
+not possible.
+@end table
+Defaults to @var{read+write} if not specified.
+
+@item derive_device @var{type}
+Rather than using the device supplied at initialisation, derive a new
+device of type @var{type} from the device the input frames exist on.
+
+@item reverse
+In a hardware to hardware mapping, map in reverse - create frames in the sink
+and map them back to the source.  This may be necessary in some cases where
+a mapping in one direction is required but only the opposite direction is
+supported by the devices being used.
+
+This option is dangerous - it may break the preceding filter in undefined
+ways if there are any additional constraints on that filter's output.
+Do not use it without fully understanding the implications of its use.
+@end table
+
+@section hwupload
+
+Upload system memory frames to hardware surfaces.
+
+The device to upload to must be supplied when the filter is initialised.  If
+using ffmpeg, select the appropriate device with the @option{-filter_hw_device}
+option.
+
 @anchor{hwupload_cuda}
 @section hwupload_cuda
 
-- 
2.11.0
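
As an illustration of the mode option documented above, here is a minimal sketch of parsing a "read+write"-style string into a bitmask, assuming "+" as the separator and read+write as the default when nothing is given. The HWMAP_MODE_* names and values and parse_map_mode() are hypothetical; the filter itself relies on the AV_HWFRAME_MAP_* flags and the generic option parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative bit values, standing in for the AV_HWFRAME_MAP_* flags. */
enum {
    HWMAP_MODE_READ      = 1 << 0,
    HWMAP_MODE_WRITE     = 1 << 1,
    HWMAP_MODE_OVERWRITE = 1 << 2,
    HWMAP_MODE_DIRECT    = 1 << 3,
};

/* Parse a "+"-separated list of mode names.
 * NULL or "" selects the documented default, read+write.
 * Returns the bitmask, or -1 on an unknown or empty token. */
static int parse_map_mode(const char *str)
{
    static const struct { const char *name; int flag; } names[] = {
        { "read",      HWMAP_MODE_READ      },
        { "write",     HWMAP_MODE_WRITE     },
        { "overwrite", HWMAP_MODE_OVERWRITE },
        { "direct",    HWMAP_MODE_DIRECT    },
    };
    int mode = 0;

    if (!str || !*str)
        return HWMAP_MODE_READ | HWMAP_MODE_WRITE;

    for (;;) {
        size_t len = strcspn(str, "+");   /* length of the current token */
        size_t i;
        int found = 0;

        for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
            if (strlen(names[i].name) == len &&
                !strncmp(str, names[i].name, len)) {
                mode |= names[i].flag;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;

        str += len;
        if (!*str)
            break;
        str++;                            /* skip the '+' separator */
    }
    return mode;
}
```

A trailing "+" (as in "read+") is rejected because it leaves an empty token.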


[FFmpeg-devel] [PATCH 23/24] vf_hwmap: Add reverse mapping for hardware frames

2017-06-12 Thread Mark Thompson

This is something of a hack.  It allocates a new hwframe context for
the target format, then maps it back to the source link and overwrites
the input link hw_frames_ctx so that the previous filter will receive
the frames we want from ff_get_video_buffer().  It may fail if
the previous filter imposes any additional constraints on the frames
it wants to use as output.

(cherry picked from commit 81a4cb8e58636d4efd200c2b4fec786a7e948d8b)
---
 libavfilter/vf_hwmap.c | 68 --
 1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/libavfilter/vf_hwmap.c b/libavfilter/vf_hwmap.c
index c0fb42a1bc..c40ed4baf7 100644
--- a/libavfilter/vf_hwmap.c
+++ b/libavfilter/vf_hwmap.c
@@ -34,7 +34,7 @@ typedef struct HWMapContext {
 
 int mode;
 char  *derive_device_type;
-int map_backwards;
+int reverse;
 } HWMapContext;
 
 static int hwmap_query_formats(AVFilterContext *avctx)
@@ -96,7 +96,8 @@ static int hwmap_config_output(AVFilterLink *outlink)
 }
 
 if (inlink->format == hwfc->format &&
-(desc->flags & AV_PIX_FMT_FLAG_HWACCEL)) {
+(desc->flags & AV_PIX_FMT_FLAG_HWACCEL) &&
+!ctx->reverse) {
 // Map between two hardware formats (including the case of
 // undoing an existing mapping).
 
@@ -117,6 +118,56 @@ static int hwmap_config_output(AVFilterLink *outlink)
 goto fail;
 }
 
+} else if (inlink->format == hwfc->format &&
+   (desc->flags & AV_PIX_FMT_FLAG_HWACCEL) &&
+   ctx->reverse) {
+// Map between two hardware formats, but do it in reverse.
+// Make a new hwframe context for the target type, and then
+// overwrite the input hwframe context with a derived context
+// mapped from that back to the source type.
+AVBufferRef *source;
+AVHWFramesContext *frames;
+
+ctx->hwframes_ref = av_hwframe_ctx_alloc(device);
+if (!ctx->hwframes_ref) {
+err = AVERROR(ENOMEM);
+goto fail;
+}
+frames = (AVHWFramesContext*)ctx->hwframes_ref->data;
+
+frames->format= outlink->format;
+frames->sw_format = hwfc->sw_format;
+frames->width = hwfc->width;
+frames->height= hwfc->height;
+frames->initial_pool_size = 64;
+
+err = av_hwframe_ctx_init(ctx->hwframes_ref);
+if (err < 0) {
+av_log(avctx, AV_LOG_ERROR, "Failed to initialise "
+   "target frames context: %d.\n", err);
+goto fail;
+}
+
+err = av_hwframe_ctx_create_derived(&source,
+inlink->format,
+hwfc->device_ref,
+ctx->hwframes_ref,
+ctx->mode);
+if (err < 0) {
+av_log(avctx, AV_LOG_ERROR, "Failed to create "
+   "derived source frames context: %d.\n", err);
+goto fail;
+}
+
+// Here is the naughty bit.  This overwriting changes what
+// ff_get_video_buffer() in the previous filter returns -
+// it will now give a frame allocated here mapped back to
+// the format it expects.  If there were any additional
+// constraints on the output frames there then this may
+// break nastily.
+av_buffer_unref(&inlink->hw_frames_ctx);
+inlink->hw_frames_ctx = source;
+
 } else if ((outlink->format == hwfc->format &&
 inlink->format  == hwfc->sw_format) ||
inlink->format == hwfc->format) {
@@ -148,13 +199,13 @@ static int hwmap_config_output(AVFilterLink *outlink)
 
 if (!device) {
 av_log(avctx, AV_LOG_ERROR, "A device reference is "
-   "required to create new frames with backwards "
+   "required to create new frames with reverse "
"mapping.\n");
 err = AVERROR(EINVAL);
 goto fail;
 }
 
-ctx->map_backwards = 1;
+ctx->reverse = 1;
 
 ctx->hwframes_ref = av_hwframe_ctx_alloc(device);
 if (!ctx->hwframes_ref) {
@@ -171,7 +222,7 @@ static int hwmap_config_output(AVFilterLink *outlink)
 err = av_hwframe_ctx_init(ctx->hwframes_ref);
 if (err < 0) {
 av_log(avctx, AV_LOG_ERROR, "Failed to create frame "
-   "context for backward mapping: %d.\n", err);
+   "context for reverse mapping: %d.\n", err);
 goto fail;
 }
 
@@ -203,7 +254,7 @@ static AVFrame *hwmap_get_buffer(AVFilterLink *inlink, int w, int h)
 AVFilterL

[FFmpeg-devel] [PATCH 21/24] hwcontext: Improve allocation in derived contexts

2017-06-12 Thread Mark Thompson

Use the flags argument of av_hwframe_ctx_create_derived() to pass the
mapping flags which will be used on allocation.  Also, set the format
and hardware context on the allocated frame automatically - the user
should not be required to do this themselves.

(cherry picked from commit c5714b51aad41fef56dddac1d542e7fc6b984627)
---
 doc/APIchanges |  4 
 libavutil/hwcontext.c  | 14 +-
 libavutil/hwcontext.h  |  4 +++-
 libavutil/hwcontext_internal.h |  5 +
 libavutil/version.h|  2 +-
 5 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/doc/APIchanges b/doc/APIchanges
index 12c4877b9b..19776f830e 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -15,6 +15,10 @@ libavutil: 2015-08-28
 
 API changes, most recent first:
 
+2017-06-xx - xxx - lavu 55.66.100 - hwcontext.h
+  av_hwframe_ctx_create_derived() now takes some AV_HWFRAME_MAP_* combination
+  as its flags argument (which was previously unused).
+
 2017-06-xx - xxx - lavc 57.99.100 - avcodec.h
   Add AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH.
 
diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
index ba7ffd1951..4726986902 100644
--- a/libavutil/hwcontext.c
+++ b/libavutil/hwcontext.c
@@ -458,6 +458,11 @@ int av_hwframe_get_buffer(AVBufferRef *hwframe_ref, AVFrame *frame, int flags)
 // and map the frame immediately.
 AVFrame *src_frame;
 
+frame->format = ctx->format;
+frame->hw_frames_ctx = av_buffer_ref(hwframe_ref);
+if (!frame->hw_frames_ctx)
+return AVERROR(ENOMEM);
+
 src_frame = av_frame_alloc();
 if (!src_frame)
 return AVERROR(ENOMEM);
@@ -467,7 +472,8 @@ int av_hwframe_get_buffer(AVBufferRef *hwframe_ref, AVFrame *frame, int flags)
 if (ret < 0)
 return ret;
 
-ret = av_hwframe_map(frame, src_frame, 0);
+ret = av_hwframe_map(frame, src_frame,
+ ctx->internal->source_allocation_map_flags);
 if (ret) {
 av_log(ctx, AV_LOG_ERROR, "Failed to map frame into derived "
"frame context: %d.\n", ret);
@@ -819,6 +825,12 @@ int av_hwframe_ctx_create_derived(AVBufferRef **derived_frame_ctx,
 goto fail;
 }
 
+dst->internal->source_allocation_map_flags =
+flags & (AV_HWFRAME_MAP_READ  |
+ AV_HWFRAME_MAP_WRITE |
+ AV_HWFRAME_MAP_OVERWRITE |
+ AV_HWFRAME_MAP_DIRECT);
+
 ret = AVERROR(ENOSYS);
 if (src->internal->hw_type->frames_derive_from)
 ret = src->internal->hw_type->frames_derive_from(dst, src, flags);
diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
index 37e8831f6b..edf12cc631 100644
--- a/libavutil/hwcontext.h
+++ b/libavutil/hwcontext.h
@@ -566,7 +566,9 @@ int av_hwframe_map(AVFrame *dst, const AVFrame *src, int flags);
  *   AVHWFramesContext on.
  * @param source_frame_ctx   A reference to an existing AVHWFramesContext
  *   which will be mapped to the derived context.
- * @param flags  Currently unused; should be set to zero.
+ * @param flags  Some combination of AV_HWFRAME_MAP_* flags, defining the
+ *   mapping parameters to apply to frames which are allocated
+ *   in the derived device.
  * @return   Zero on success, negative AVERROR code on failure.
  */
 int av_hwframe_ctx_create_derived(AVBufferRef **derived_frame_ctx,
diff --git a/libavutil/hwcontext_internal.h b/libavutil/hwcontext_internal.h
index 0a0c4e86ce..68f78c0a1f 100644
--- a/libavutil/hwcontext_internal.h
+++ b/libavutil/hwcontext_internal.h
@@ -121,6 +121,11 @@ struct AVHWFramesInternal {
  * context it was derived from.
  */
 AVBufferRef *source_frames;
+/**
+ * Flags to apply to the mapping from the source to the derived
+ * frame context when trying to allocate in the derived context.
+ */
+int source_allocation_map_flags;
 };
 
 typedef struct HWMapDescriptor {
diff --git a/libavutil/version.h b/libavutil/version.h
index 322b683cf4..308d16f95b 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -80,7 +80,7 @@
 
 
 #define LIBAVUTIL_VERSION_MAJOR  55
-#define LIBAVUTIL_VERSION_MINOR  65
+#define LIBAVUTIL_VERSION_MINOR  66
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
-- 
2.11.0
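
A minimal sketch of the flag handling this patch describes: the flags given when the context is derived are masked to the recognised mapping bits and stored, then reused when a frame is allocated in the derived context. DerivedFrames, the MAP_* values and both helpers are hypothetical stand-ins; only the field name comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the AV_HWFRAME_MAP_* flags named in the patch;
 * the numeric values here are illustrative. */
enum {
    MAP_READ      = 1 << 0,
    MAP_WRITE     = 1 << 1,
    MAP_OVERWRITE = 1 << 2,
    MAP_DIRECT    = 1 << 3,
};
#define MAP_FLAG_MASK (MAP_READ | MAP_WRITE | MAP_OVERWRITE | MAP_DIRECT)

/* Hypothetical reduced frames context: only the field this patch adds. */
typedef struct DerivedFrames {
    int source_allocation_map_flags;
} DerivedFrames;

/* At derivation time: remember only the recognised mapping flags. */
static void derived_frames_set_flags(DerivedFrames *dst, int flags)
{
    assert(dst != NULL);
    dst->source_allocation_map_flags = flags & MAP_FLAG_MASK;
}

/* At allocation time: the flags the source-to-derived mapping will use. */
static int derived_frames_alloc_flags(const DerivedFrames *dst)
{
    assert(dst != NULL);
    return dst->source_allocation_map_flags;
}
```

Bits outside the mask are dropped at derivation time, so allocation never sees flags that the mapping code does not understand.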


[FFmpeg-devel] [PATCH 20/24] hwcontext_qsv: Implement mapping frames to the child device type

2017-06-12 Thread Mark Thompson

(cherry picked from commit e1c5d56b18b82e3fb42382b1b1f972e8b371fc38)
---
 libavutil/hwcontext_qsv.c | 88 +--
 1 file changed, 86 insertions(+), 2 deletions(-)

diff --git a/libavutil/hwcontext_qsv.c b/libavutil/hwcontext_qsv.c
index 8dbff88b0a..75057f7d52 100644
--- a/libavutil/hwcontext_qsv.c
+++ b/libavutil/hwcontext_qsv.c
@@ -577,13 +577,62 @@ static int qsv_transfer_get_formats(AVHWFramesContext *ctx,
 return 0;
 }
 
+static int qsv_frames_derive_from(AVHWFramesContext *dst_ctx,
+  AVHWFramesContext *src_ctx, int flags)
+{
+AVQSVFramesContext *src_hwctx = src_ctx->hwctx;
+int i;
+
+switch (dst_ctx->device_ctx->type) {
+#if CONFIG_VAAPI
+case AV_HWDEVICE_TYPE_VAAPI:
+{
+AVVAAPIFramesContext *dst_hwctx = dst_ctx->hwctx;
+dst_hwctx->surface_ids = av_mallocz_array(src_hwctx->nb_surfaces,
+  sizeof(*dst_hwctx->surface_ids));
+if (!dst_hwctx->surface_ids)
+return AVERROR(ENOMEM);
+for (i = 0; i < src_hwctx->nb_surfaces; i++)
+dst_hwctx->surface_ids[i] =
+*(VASurfaceID*)src_hwctx->surfaces[i].Data.MemId;
+dst_hwctx->nb_surfaces = src_hwctx->nb_surfaces;
+}
+break;
+#endif
+#if CONFIG_DXVA2
+case AV_HWDEVICE_TYPE_DXVA2:
+{
+AVDXVA2FramesContext *dst_hwctx = dst_ctx->hwctx;
+dst_hwctx->surfaces = av_mallocz_array(src_hwctx->nb_surfaces,
+   sizeof(*dst_hwctx->surfaces));
+if (!dst_hwctx->surfaces)
+return AVERROR(ENOMEM);
+for (i = 0; i < src_hwctx->nb_surfaces; i++)
+dst_hwctx->surfaces[i] =
+(IDirect3DSurface9*)src_hwctx->surfaces[i].Data.MemId;
+dst_hwctx->nb_surfaces = src_hwctx->nb_surfaces;
+if (src_hwctx->frame_type == MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET)
+dst_hwctx->surface_type = DXVA2_VideoDecoderRenderTarget;
+else
+dst_hwctx->surface_type = DXVA2_VideoProcessorRenderTarget;
+}
+break;
+#endif
+default:
+return AVERROR(ENOSYS);
+}
+
+return 0;
+}
+
 static int qsv_map_from(AVHWFramesContext *ctx,
 AVFrame *dst, const AVFrame *src, int flags)
 {
 QSVFramesContext *s = ctx->internal->priv;
 mfxFrameSurface1 *surf = (mfxFrameSurface1*)src->data[3];
 AVHWFramesContext *child_frames_ctx;
-
+const AVPixFmtDescriptor *desc;
+uint8_t *child_data;
 AVFrame *dummy;
 int ret = 0;
 
@@ -591,6 +640,40 @@ static int qsv_map_from(AVHWFramesContext *ctx,
 return AVERROR(ENOSYS);
 child_frames_ctx = (AVHWFramesContext*)s->child_frames_ref->data;
 
+switch (child_frames_ctx->device_ctx->type) {
+#if CONFIG_VAAPI
+case AV_HWDEVICE_TYPE_VAAPI:
+child_data = (uint8_t*)(intptr_t)*(VASurfaceID*)surf->Data.MemId;
+break;
+#endif
+#if CONFIG_DXVA2
+case AV_HWDEVICE_TYPE_DXVA2:
+child_data = surf->Data.MemId;
+break;
+#endif
+default:
+return AVERROR(ENOSYS);
+}
+
+if (dst->format == child_frames_ctx->format) {
+ret = ff_hwframe_map_create(s->child_frames_ref,
+dst, src, NULL, NULL);
+if (ret < 0)
+return ret;
+
+dst->width   = src->width;
+dst->height  = src->height;
+dst->data[3] = child_data;
+
+return 0;
+}
+
+desc = av_pix_fmt_desc_get(dst->format);
+if (desc && desc->flags & AV_PIX_FMT_FLAG_HWACCEL) {
+// This only supports mapping to software.
+return AVERROR(ENOSYS);
+}
+
 dummy = av_frame_alloc();
 if (!dummy)
 return AVERROR(ENOMEM);
@@ -603,7 +686,7 @@ static int qsv_map_from(AVHWFramesContext *ctx,
 dummy->format= child_frames_ctx->format;
 dummy->width = src->width;
 dummy->height= src->height;
-dummy->data[3]   = surf->Data.MemId;
+dummy->data[3]   = child_data;
 
 ret = av_hwframe_map(dst, dummy, flags);
 
@@ -1042,6 +1125,7 @@ const HWContextType ff_hwcontext_type_qsv = {
 .map_to = qsv_map_to,
 .map_from   = qsv_map_from,
 .frames_derive_to   = qsv_frames_derive_to,
+.frames_derive_from = qsv_frames_derive_from,
 
 .pix_fmts = (const enum AVPixelFormat[]){ AV_PIX_FMT_QSV, AV_PIX_FMT_NONE },
 };
-- 
2.11.0


[FFmpeg-devel] [PATCH 18/24] hwcontext: Add frame context mapping for nontrivial contexts

2017-06-12 Thread Mark Thompson

Some frames contexts are not usable without additional format-specific
state in hwctx.  This change adds new functions frames_derive_from and
frames_derive_to to initialise this state appropriately when deriving
a frames context which will require it to be set.

(cherry picked from commit 27978155bc661eec9f22bcf82c9cfc099cff4365)
---
 libavutil/hwcontext.c  | 9 -
 libavutil/hwcontext_internal.h | 5 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
index 7f9b1d33e3..ba7ffd1951 100644
--- a/libavutil/hwcontext.c
+++ b/libavutil/hwcontext.c
@@ -819,7 +819,14 @@ int av_hwframe_ctx_create_derived(AVBufferRef **derived_frame_ctx,
 goto fail;
 }
 
-ret = av_hwframe_ctx_init(dst_ref);
+ret = AVERROR(ENOSYS);
+if (src->internal->hw_type->frames_derive_from)
+ret = src->internal->hw_type->frames_derive_from(dst, src, flags);
+if (ret == AVERROR(ENOSYS) &&
+dst->internal->hw_type->frames_derive_to)
+ret = dst->internal->hw_type->frames_derive_to(dst, src, flags);
+if (ret == AVERROR(ENOSYS))
+ret = 0;
 if (ret)
 goto fail;
 
diff --git a/libavutil/hwcontext_internal.h b/libavutil/hwcontext_internal.h
index 6451c0e2c5..0a0c4e86ce 100644
--- a/libavutil/hwcontext_internal.h
+++ b/libavutil/hwcontext_internal.h
@@ -92,6 +92,11 @@ typedef struct HWContextType {
const AVFrame *src, int flags);
 int  (*map_from)(AVHWFramesContext *ctx, AVFrame *dst,
  const AVFrame *src, int flags);
+
+int  (*frames_derive_to)(AVHWFramesContext *dst_ctx,
+ AVHWFramesContext *src_ctx, int flags);
+int  (*frames_derive_from)(AVHWFramesContext *dst_ctx,
+   AVHWFramesContext *src_ctx, int flags);
 } HWContextType;
 
 struct AVHWDeviceInternal {
-- 
2.11.0
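
The fallback order in the hunk above can be sketched in isolation: try the source type's frames_derive_from, fall back to the destination type's frames_derive_to if that is unsupported, and treat "neither supports it" as "no extra state needed". FramesCtx, HWType, ERR() and the example_* hooks are simplified, hypothetical stand-ins for the real structures; ERR() assumes the negative-errno convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ERR(e) (-(e))   /* assumption: negative errno, as AVERROR() on POSIX */

typedef struct FramesCtx FramesCtx;
typedef int (*derive_fn)(FramesCtx *dst, FramesCtx *src, int flags);

/* Hypothetical reduced form of the per-type callback table. */
typedef struct HWType {
    derive_fn frames_derive_to;
    derive_fn frames_derive_from;
} HWType;

struct FramesCtx {
    const HWType *hw_type;
    int initialised;    /* set by a hook that fills in type-specific state */
};

/* Same decision order as the patched av_hwframe_ctx_create_derived(). */
static int derive_frames(FramesCtx *dst, FramesCtx *src, int flags)
{
    int ret = ERR(ENOSYS);

    assert(dst && src && dst->hw_type && src->hw_type);
    if (src->hw_type->frames_derive_from)
        ret = src->hw_type->frames_derive_from(dst, src, flags);
    if (ret == ERR(ENOSYS) && dst->hw_type->frames_derive_to)
        ret = dst->hw_type->frames_derive_to(dst, src, flags);
    if (ret == ERR(ENOSYS))
        ret = 0;        /* no hook applies: nothing extra to set up */
    return ret;
}

/* Example hooks used to exercise the dispatch; purely illustrative. */
static int example_derive_ok(FramesCtx *dst, FramesCtx *src, int flags)
{
    (void)src; (void)flags;
    dst->initialised = 1;
    return 0;
}

static int example_derive_unsupported(FramesCtx *dst, FramesCtx *src, int flags)
{
    (void)dst; (void)src; (void)flags;
    return ERR(ENOSYS);
}

static int example_derive_invalid(FramesCtx *dst, FramesCtx *src, int flags)
{
    (void)dst; (void)src; (void)flags;
    return ERR(EINVAL);
}
```

A real error from the first hook is returned as-is; only ENOSYS triggers the fallback.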


[FFmpeg-devel] [PATCH 19/24] hwcontext_qsv: Implement mapping frames from the child device type

2017-06-12 Thread Mark Thompson

Factorises out existing surface initialisation code to reuse.

(cherry picked from commit eaa5e0710496db50fc164806e5f49eaaccc83bb5)
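
For reference, the chroma format and shift selection inside the factored-out surface initialisation can be sketched on its own. ChromaFormat and both helper names are hypothetical stand-ins for the MFX_CHROMAFORMAT_* values and the inline code in the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the MFX_CHROMAFORMAT_* values. */
typedef enum ChromaFormat {
    CHROMA_YUV420,
    CHROMA_YUV422,
    CHROMA_YUV444
} ChromaFormat;

/* Pick the chroma format from the pixel format's log2 subsampling
 * factors, using the same branch order as the patch.  Formats with
 * only vertical subsampling land in the last branch. */
static ChromaFormat chroma_format_from_subsampling(int log2_chroma_w,
                                                   int log2_chroma_h)
{
    assert(log2_chroma_w >= 0 && log2_chroma_h >= 0);
    if (log2_chroma_w && log2_chroma_h)
        return CHROMA_YUV420;
    else if (log2_chroma_w)
        return CHROMA_YUV422;
    return CHROMA_YUV444;
}

/* Samples deeper than 8 bits are marked as shifted, as in the patch. */
static int surface_shift_from_depth(int depth)
{
    return depth > 8;
}
```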
---
 libavutil/hwcontext_qsv.c | 174 +-
 1 file changed, 142 insertions(+), 32 deletions(-)

diff --git a/libavutil/hwcontext_qsv.c b/libavutil/hwcontext_qsv.c
index 505a8e709d..8dbff88b0a 100644
--- a/libavutil/hwcontext_qsv.c
+++ b/libavutil/hwcontext_qsv.c
@@ -94,6 +94,16 @@ static const struct {
 { AV_PIX_FMT_PAL8, MFX_FOURCC_P8   },
 };
 
+static uint32_t qsv_fourcc_from_pix_fmt(enum AVPixelFormat pix_fmt)
+{
+int i;
+for (i = 0; i < FF_ARRAY_ELEMS(supported_pixel_formats); i++) {
+if (supported_pixel_formats[i].pix_fmt == pix_fmt)
+return supported_pixel_formats[i].fourcc;
+}
+return 0;
+}
+
 static int qsv_device_init(AVHWDeviceContext *ctx)
 {
 AVQSVDeviceContext *hwctx = ctx->hwctx;
@@ -272,18 +282,48 @@ fail:
 return ret;
 }
 
+static int qsv_init_surface(AVHWFramesContext *ctx, mfxFrameSurface1 *surf)
+{
+const AVPixFmtDescriptor *desc;
+uint32_t fourcc;
+
+desc = av_pix_fmt_desc_get(ctx->sw_format);
+if (!desc)
+return AVERROR(EINVAL);
+
+fourcc = qsv_fourcc_from_pix_fmt(ctx->sw_format);
+if (!fourcc)
+return AVERROR(EINVAL);
+
+surf->Info.BitDepthLuma   = desc->comp[0].depth;
+surf->Info.BitDepthChroma = desc->comp[0].depth;
+surf->Info.Shift  = desc->comp[0].depth > 8;
+
+if (desc->log2_chroma_w && desc->log2_chroma_h)
+surf->Info.ChromaFormat   = MFX_CHROMAFORMAT_YUV420;
+else if (desc->log2_chroma_w)
+surf->Info.ChromaFormat   = MFX_CHROMAFORMAT_YUV422;
+else
+surf->Info.ChromaFormat   = MFX_CHROMAFORMAT_YUV444;
+
+surf->Info.FourCC = fourcc;
+surf->Info.Width  = ctx->width;
+surf->Info.CropW  = ctx->width;
+surf->Info.Height = ctx->height;
+surf->Info.CropH  = ctx->height;
+surf->Info.FrameRateExtN  = 25;
+surf->Info.FrameRateExtD  = 1;
+
+return 0;
+}
+
 static int qsv_init_pool(AVHWFramesContext *ctx, uint32_t fourcc)
 {
 QSVFramesContext  *s = ctx->internal->priv;
 AVQSVFramesContext *frames_hwctx = ctx->hwctx;
-const AVPixFmtDescriptor *desc;
 
 int i, ret = 0;
 
-desc = av_pix_fmt_desc_get(ctx->sw_format);
-if (!desc)
-return AVERROR_BUG;
-
 if (ctx->initial_pool_size <= 0) {
 av_log(ctx, AV_LOG_ERROR, "QSV requires a fixed frame pool size\n");
 return AVERROR(EINVAL);
@@ -295,26 +335,9 @@ static int qsv_init_pool(AVHWFramesContext *ctx, uint32_t fourcc)
 return AVERROR(ENOMEM);
 
 for (i = 0; i < ctx->initial_pool_size; i++) {
-mfxFrameSurface1 *surf = &s->surfaces_internal[i];
-
-surf->Info.BitDepthLuma   = desc->comp[0].depth;
-surf->Info.BitDepthChroma = desc->comp[0].depth;
-surf->Info.Shift  = desc->comp[0].depth > 8;
-
-if (desc->log2_chroma_w && desc->log2_chroma_h)
-surf->Info.ChromaFormat   = MFX_CHROMAFORMAT_YUV420;
-else if (desc->log2_chroma_w)
-surf->Info.ChromaFormat   = MFX_CHROMAFORMAT_YUV422;
-else
-surf->Info.ChromaFormat   = MFX_CHROMAFORMAT_YUV444;
-
-surf->Info.FourCC = fourcc;
-surf->Info.Width  = ctx->width;
-surf->Info.CropW  = ctx->width;
-surf->Info.Height = ctx->height;
-surf->Info.CropH  = ctx->height;
-surf->Info.FrameRateExtN  = 25;
-surf->Info.FrameRateExtD  = 1;
+ret = qsv_init_surface(ctx, &s->surfaces_internal[i]);
+if (ret < 0)
+return ret;
 }
 
 if (!(frames_hwctx->frame_type & MFX_MEMTYPE_OPAQUE_FRAME)) {
@@ -466,15 +489,10 @@ static int qsv_frames_init(AVHWFramesContext *ctx)
 
 int opaque = !!(frames_hwctx->frame_type & MFX_MEMTYPE_OPAQUE_FRAME);
 
-uint32_t fourcc = 0;
+uint32_t fourcc;
 int i, ret;
 
-for (i = 0; i < FF_ARRAY_ELEMS(supported_pixel_formats); i++) {
-if (supported_pixel_formats[i].pix_fmt == ctx->sw_format) {
-fourcc = supported_pixel_formats[i].fourcc;
-break;
-}
-}
+fourcc = qsv_fourcc_from_pix_fmt(ctx->sw_format);
 if (!fourcc) {
 av_log(ctx, AV_LOG_ERROR, "Unsupported pixel format\n");
 return AVERROR(ENOSYS);
@@ -723,6 +741,96 @@ static int qsv_transfer_data_to(AVHWFramesContext *ctx, AVFrame *dst,
 return 0;
 }
 
+static int qsv_frames_derive_to(AVHWFramesContext *dst_ctx,
+AVHWFramesContext *src_ctx, int flags)
+{
+QSVFramesContext *s = dst_ctx->internal->priv;
+AVQSVFramesContext *dst_hwctx = dst_ctx->hwctx;
+int i;
+
+switch (src_ctx->device_ctx->type) {
+#if CONFIG_VAAPI
+case AV_HWDEVICE_TYPE_VAAPI:
+{
+AVVAAPIFramesContext *src_hwctx = src_ctx->hwctx;
+s->

[FFmpeg-devel] [PATCH 08/24] ffmpeg: Document the -init_hw_device option

2017-06-12 Thread Mark Thompson

(cherry picked from commit 303fadf5963e01b8edf4ba2701e45f7e9e586aeb)
---
 doc/ffmpeg.texi | 85 +++--
 1 file changed, 58 insertions(+), 27 deletions(-)

diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index dcc0cfb341..db7f05a3e0 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -715,6 +715,56 @@ would be more efficient.
 When doing stream copy, copy also non-key frames found at the
 beginning.
 
+@item -init_hw_device @var{type}[=@var{name}][:@var{device}[,@var{key=value}...]]
+Initialise a new hardware device of type @var{type} called @var{name}, using the
+given device parameters.
+If no name is specified it will receive a default name of the form "@var{type}%d".
+
+The meaning of @var{device} and the following arguments depends on the
+device type:
+@table @option
+
+@item cuda
+@var{device} is the number of the CUDA device.
+
+@item dxva2
+@var{device} is the number of the Direct3D 9 display adapter.
+
+@item vaapi
+@var{device} is either an X11 display name or a DRM render node.
+If not specified, it will attempt to open the default X11 display (@emph{$DISPLAY})
+and then the first DRM render node (@emph{/dev/dri/renderD128}).
+
+@item vdpau
+@var{device} is an X11 display name.
+If not specified, it will attempt to open the default X11 display (@emph{$DISPLAY}).
+
+@item qsv
+@var{device} selects a value in @samp{MFX_IMPL_*}. Allowed values are:
+@table @option
+@item auto
+@item sw
+@item hw
+@item auto_any
+@item hw_any
+@item hw2
+@item hw3
+@item hw4
+@end table
+If not specified, @samp{auto_any} is used.
+(Note that it may be easier to achieve the desired result for QSV by creating the
+platform-appropriate subdevice (@samp{dxva2} or @samp{vaapi}) and then deriving a
+QSV device from that.)
+
+@end table
+
+@item -init_hw_device @var{type}[=@var{name}]@@@var{source}
+Initialise a new hardware device of type @var{type} called @var{name},
+deriving it from the existing device with the name @var{source}.
+
+@item -init_hw_device list
+List all hardware device types supported in this build of ffmpeg.
+
 @item -hwaccel[:@var{stream_specifier}] @var{hwaccel} (@emph{input,per-stream})
 Use hardware acceleration to decode the matching stream(s). The allowed values
 of @var{hwaccel} are:
@@ -734,6 +784,9 @@ Use VDPAU (Video Decode and Presentation API for Unix) hardware acceleration.
 @item dxva2
 Use DXVA2 (DirectX Video Acceleration) hardware acceleration.
 
+@item vaapi
+Use VAAPI (Video Acceleration API) hardware acceleration.
+
 @item qsv
 Use the Intel QuickSync Video acceleration for video transcoding.
 
@@ -757,33 +810,11 @@ useful for testing.
 @item -hwaccel_device[:@var{stream_specifier}] @var{hwaccel_device} (@emph{input,per-stream})
 Select a device to use for hardware acceleration.
 
-This option only makes sense when the @option{-hwaccel} option is also
-specified. Its exact meaning depends on the specific hardware acceleration
-method chosen.
-
-@table @option
-@item vdpau
-For VDPAU, this option specifies the X11 display/screen to use. If this option
-is not specified, the value of the @var{DISPLAY} environment variable is used
-
-@item dxva2
-For DXVA2, this option should contain the number of the display adapter to use.
-If this option is not specified, the default adapter is used.
-
-@item qsv
-For QSV, this option corresponds to the values of MFX_IMPL_* . Allowed values
-are:
-@table @option
-@item auto
-@item sw
-@item hw
-@item auto_any
-@item hw_any
-@item hw2
-@item hw3
-@item hw4
-@end table
-@end table
+This option only makes sense when the @option{-hwaccel} option is also specified.
+It can either refer to an existing device created with @option{-init_hw_device}
+by name, or it can create a new device as if
+@samp{-init_hw_device} @var{type}:@var{hwaccel_device}
+were called immediately before.
 
 @item -hwaccels
 List all hardware acceleration methods supported in this build of ffmpeg.
-- 
2.11.0
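
To make the two argument forms documented above concrete, here is a rough sketch of splitting an -init_hw_device argument into its parts. It is not the parser in ffmpeg; HWDeviceSpec, copy_span() and parse_hw_device_spec() are hypothetical, the buffer sizes are arbitrary, and neither the default "type%d" name nor the "list" form is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result structure; unused parts are left empty. */
typedef struct HWDeviceSpec {
    char type[32];
    char name[32];
    char device[64];    /* only for type[=name][:device[,key=value...]] */
    char options[128];  /* raw key=value list, not split further */
    char source[32];    /* only for type[=name]@source */
} HWDeviceSpec;

/* Copy len bytes into a fixed buffer, failing if it does not fit. */
static int copy_span(char *dst, size_t size, const char *src, size_t len)
{
    if (len >= size)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Returns 0 on success, -1 on a malformed or oversized argument. */
static int parse_hw_device_spec(const char *arg, HWDeviceSpec *spec)
{
    size_t len;

    assert(arg && spec);
    memset(spec, 0, sizeof(*spec));

    /* type: everything up to the first '=', ':' or '@' */
    len = strcspn(arg, "=:@");
    if (len == 0 || copy_span(spec->type, sizeof(spec->type), arg, len) < 0)
        return -1;
    arg += len;

    /* optional =name */
    if (*arg == '=') {
        arg++;
        len = strcspn(arg, ":@");
        if (len == 0 ||
            copy_span(spec->name, sizeof(spec->name), arg, len) < 0)
            return -1;
        arg += len;
    }

    if (*arg == '@') {
        /* derived form: the rest names the source device */
        arg++;
        len = strlen(arg);
        if (len == 0 ||
            copy_span(spec->source, sizeof(spec->source), arg, len) < 0)
            return -1;
    } else if (*arg == ':') {
        /* device string, then an optional comma-separated option list */
        arg++;
        len = strcspn(arg, ",");
        if (copy_span(spec->device, sizeof(spec->device), arg, len) < 0)
            return -1;
        arg += len;
        if (*arg == ',' &&
            copy_span(spec->options, sizeof(spec->options),
                      arg + 1, strlen(arg + 1)) < 0)
            return -1;
    }
    return 0;
}
```

With this split, "cuda=foo:bar" yields type "cuda", name "foo" and device "bar", while "qsv=q@foo" yields a device named "q" derived from the existing device "foo".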


[FFmpeg-devel] [PATCH 17/24] hwcontext_qsv: Support derivation from child devices

2017-06-12 Thread Mark Thompson

(cherry picked from commit aa51bb3d2756ed912ee40645efccf5f4a9609696)
---
 libavutil/hwcontext_qsv.c | 113 ++
 1 file changed, 84 insertions(+), 29 deletions(-)

diff --git a/libavutil/hwcontext_qsv.c b/libavutil/hwcontext_qsv.c
index 5550ffe143..505a8e709d 100644
--- a/libavutil/hwcontext_qsv.c
+++ b/libavutil/hwcontext_qsv.c
@@ -792,21 +792,96 @@ static mfxIMPL choose_implementation(const char *device)
 return impl;
 }
 
-static int qsv_device_create(AVHWDeviceContext *ctx, const char *device,
- AVDictionary *opts, int flags)
+static int qsv_device_derive_from_child(AVHWDeviceContext *ctx,
+mfxIMPL implementation,
+AVHWDeviceContext *child_device_ctx,
+int flags)
 {
 AVQSVDeviceContext *hwctx = ctx->hwctx;
-QSVDevicePriv *priv;
-enum AVHWDeviceType child_device_type;
-AVDictionaryEntry *e;
+QSVDeviceContext   *s = ctx->internal->priv;
 
 mfxVersion    ver = { { 3, 1 } };
-mfxIMPL   impl;
 mfxHDL        handle;
 mfxHandleType handle_type;
 mfxStatus err;
 int ret;
 
+switch (child_device_ctx->type) {
+#if CONFIG_VAAPI
+case AV_HWDEVICE_TYPE_VAAPI:
+{
+AVVAAPIDeviceContext *child_device_hwctx = child_device_ctx->hwctx;
+handle_type = MFX_HANDLE_VA_DISPLAY;
+handle = (mfxHDL)child_device_hwctx->display;
+}
+break;
+#endif
+#if CONFIG_DXVA2
+case AV_HWDEVICE_TYPE_DXVA2:
+{
+AVDXVA2DeviceContext *child_device_hwctx = child_device_ctx->hwctx;
+handle_type = MFX_HANDLE_D3D9_DEVICE_MANAGER;
+handle = (mfxHDL)child_device_hwctx->devmgr;
+}
+break;
+#endif
+default:
+ret = AVERROR(ENOSYS);
+goto fail;
+}
+
+err = MFXInit(implementation, &ver, &hwctx->session);
+if (err != MFX_ERR_NONE) {
+av_log(ctx, AV_LOG_ERROR, "Error initializing an MFX session: "
+   "%d.\n", err);
+ret = AVERROR_UNKNOWN;
+goto fail;
+}
+
+err = MFXVideoCORE_SetHandle(hwctx->session, handle_type, handle);
+if (err != MFX_ERR_NONE) {
+av_log(ctx, AV_LOG_ERROR, "Error setting child device handle: "
+   "%d\n", err);
+ret = AVERROR_UNKNOWN;
+goto fail;
+}
+
+ret = qsv_device_init(ctx);
+if (ret < 0)
+goto fail;
+if (s->handle_type != handle_type) {
+av_log(ctx, AV_LOG_ERROR, "Error in child device handle setup: "
+   "type mismatch (%d != %d).\n", s->handle_type, handle_type);
+ret = AVERROR_UNKNOWN;
+goto fail;
+}
+
+return 0;
+
+fail:
+if (hwctx->session)
+MFXClose(hwctx->session);
+return ret;
+}
+
+static int qsv_device_derive(AVHWDeviceContext *ctx,
+ AVHWDeviceContext *child_device_ctx, int flags)
+{
+return qsv_device_derive_from_child(ctx, MFX_IMPL_HARDWARE_ANY,
+child_device_ctx, flags);
+}
+
+static int qsv_device_create(AVHWDeviceContext *ctx, const char *device,
+ AVDictionary *opts, int flags)
+{
+QSVDevicePriv *priv;
+enum AVHWDeviceType child_device_type;
+AVHWDeviceContext *child_device;
+AVDictionaryEntry *e;
+
+mfxIMPL impl;
+int ret;
+
 priv = av_mallocz(sizeof(*priv));
 if (!priv)
 return AVERROR(ENOMEM);
@@ -830,32 +905,11 @@ static int qsv_device_create(AVHWDeviceContext *ctx, const char *device,
 if (ret < 0)
 return ret;
 
-{
-AVHWDeviceContext  *child_device_ctx = (AVHWDeviceContext*)priv->child_device_ctx->data;
-#if CONFIG_VAAPI
-AVVAAPIDeviceContext *child_device_hwctx = child_device_ctx->hwctx;
-handle_type = MFX_HANDLE_VA_DISPLAY;
-handle = (mfxHDL)child_device_hwctx->display;
-#elif CONFIG_DXVA2
-AVDXVA2DeviceContext *child_device_hwctx = child_device_ctx->hwctx;
-handle_type = MFX_HANDLE_D3D9_DEVICE_MANAGER;
-handle = (mfxHDL)child_device_hwctx->devmgr;
-#endif
-}
+child_device = (AVHWDeviceContext*)priv->child_device_ctx->data;
 
 impl = choose_implementation(device);
 
-err = MFXInit(impl, &ver, &hwctx->session);
-if (err != MFX_ERR_NONE) {
-av_log(ctx, AV_LOG_ERROR, "Error initializing an MFX session\n");
-return AVERROR_UNKNOWN;
-}
-
-err = MFXVideoCORE_SetHandle(hwctx->session, handle_type, handle);
-if (err != MFX_ERR_NONE)
-return AVERROR_UNKNOWN;
-
-return 0;
+return qsv_device_derive_from_child(ctx, impl, child_device, 0);
 }
 
 const HWContextType ff_hwcontext_type_qsv = {
@@ -868,6 +922,7 @@ const HWContextType ff_hwcontext_type_qsv = {
 .frames_priv_size   = sizeof(QSVFramesContext),
 
 .device_create  = qsv_device_create,
+.devic

[FFmpeg-devel] [PATCH 10/24] qsv: Add ability to create a session from a device

2017-06-12 Thread Mark Thompson

(cherry picked from commit 4936a48b1e6fc2147599541f8b25f43a8a9d1f16)
---
 libavcodec/qsv.c  | 49 ---
 libavcodec/qsv_internal.h |  9 ++---
 libavcodec/qsvdec.c   |  6 +++---
 libavcodec/qsvenc.c   |  6 +++---
 4 files changed, 46 insertions(+), 24 deletions(-)

diff --git a/libavcodec/qsv.c b/libavcodec/qsv.c
index 1284419741..b9e2cd990d 100644
--- a/libavcodec/qsv.c
+++ b/libavcodec/qsv.c
@@ -535,27 +535,16 @@ static mfxStatus qsv_frame_get_hdl(mfxHDL pthis, mfxMemId mid, mfxHDL *hdl)
 return MFX_ERR_NONE;
 }
 
-int ff_qsv_init_session_hwcontext(AVCodecContext *avctx, mfxSession *psession,
-  QSVFramesContext *qsv_frames_ctx,
-  const char *load_plugins, int opaque)
+int ff_qsv_init_session_device(AVCodecContext *avctx, mfxSession *psession,
+   AVBufferRef *device_ref, const char *load_plugins)
 {
 static const mfxHandleType handle_types[] = {
 MFX_HANDLE_VA_DISPLAY,
 MFX_HANDLE_D3D9_DEVICE_MANAGER,
 MFX_HANDLE_D3D11_DEVICE,
 };
-mfxFrameAllocator frame_allocator = {
-.pthis  = qsv_frames_ctx,
-.Alloc  = qsv_frame_alloc,
-.Lock   = qsv_frame_lock,
-.Unlock = qsv_frame_unlock,
-.GetHDL = qsv_frame_get_hdl,
-.Free   = qsv_frame_free,
-};
-
-AVHWFramesContext *frames_ctx = (AVHWFramesContext*)qsv_frames_ctx->hw_frames_ctx->data;
-AVQSVFramesContext *frames_hwctx = frames_ctx->hwctx;
-AVQSVDeviceContext *device_hwctx = frames_ctx->device_ctx->hwctx;
+AVHWDeviceContext *device_ctx = (AVHWDeviceContext*)device_ref->data;
+AVQSVDeviceContext *device_hwctx = device_ctx->hwctx;
 mfxSession parent_session = device_hwctx->session;
 
 mfxSession session;
@@ -605,6 +594,36 @@ int ff_qsv_init_session_hwcontext(AVCodecContext *avctx, mfxSession *psession,
 return ret;
 }
 
+*psession = session;
+return 0;
+}
+
+int ff_qsv_init_session_frames(AVCodecContext *avctx, mfxSession *psession,
+   QSVFramesContext *qsv_frames_ctx,
+   const char *load_plugins, int opaque)
+{
+mfxFrameAllocator frame_allocator = {
+.pthis  = qsv_frames_ctx,
+.Alloc  = qsv_frame_alloc,
+.Lock   = qsv_frame_lock,
+.Unlock = qsv_frame_unlock,
+.GetHDL = qsv_frame_get_hdl,
+.Free   = qsv_frame_free,
+};
+
+AVHWFramesContext *frames_ctx = (AVHWFramesContext*)qsv_frames_ctx->hw_frames_ctx->data;
+AVQSVFramesContext *frames_hwctx = frames_ctx->hwctx;
+
+mfxSession session;
+mfxStatus err;
+
+int ret;
+
+ret = ff_qsv_init_session_device(avctx, &session,
+ frames_ctx->device_ref, load_plugins);
+if (ret < 0)
+return ret;
+
 if (!opaque) {
 qsv_frames_ctx->logctx = avctx;
 
diff --git a/libavcodec/qsv_internal.h b/libavcodec/qsv_internal.h
index 814db08e6c..c0305508dd 100644
--- a/libavcodec/qsv_internal.h
+++ b/libavcodec/qsv_internal.h
@@ -90,9 +90,12 @@ int ff_qsv_map_pixfmt(enum AVPixelFormat format, uint32_t *fourcc);
 int ff_qsv_init_internal_session(AVCodecContext *avctx, mfxSession *session,
  const char *load_plugins);
 
-int ff_qsv_init_session_hwcontext(AVCodecContext *avctx, mfxSession *session,
-  QSVFramesContext *qsv_frames_ctx,
-  const char *load_plugins, int opaque);
+int ff_qsv_init_session_device(AVCodecContext *avctx, mfxSession *psession,
+   AVBufferRef *device_ref, const char *load_plugins);
+
+int ff_qsv_init_session_frames(AVCodecContext *avctx, mfxSession *session,
+   QSVFramesContext *qsv_frames_ctx,
+   const char *load_plugins, int opaque);
 
 int ff_qsv_find_surface_idx(QSVFramesContext *ctx, QSVFrame *frame);
 
diff --git a/libavcodec/qsvdec.c b/libavcodec/qsvdec.c
index d7664ce581..74866b57ff 100644
--- a/libavcodec/qsvdec.c
+++ b/libavcodec/qsvdec.c
@@ -59,9 +59,9 @@ static int qsv_init_session(AVCodecContext *avctx, QSVContext *q, mfxSession ses
 if (!q->frames_ctx.hw_frames_ctx)
 return AVERROR(ENOMEM);
 
-ret = ff_qsv_init_session_hwcontext(avctx, &q->internal_session,
-&q->frames_ctx, q->load_plugins,
-q->iopattern == MFX_IOPATTERN_OUT_OPAQUE_MEMORY);
+ret = ff_qsv_init_session_frames(avctx, &q->internal_session,
+ &q->frames_ctx, q->load_plugins,
+ q->iopattern == MFX_IOPATTERN_OUT_OPAQUE_MEMORY);
 if (ret < 0) {
 av_buffer_unref(&q->frames_ctx.hw_frames_ctx);
 return ret;
diff --git a/libavco

[FFmpeg-devel] [PATCH 15/24] vaapi: Add external control of allow-profile-mismatch

2017-06-12 Thread Mark Thompson

Uses the just-added AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH flag.

(cherry picked from commit 7acb90333a187b0e847b66f9d3511245423dc0ce)
---
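Note (illustration only, not part of the commit): a minimal sketch of the
decision this patch moves from a hard-coded local to the user-supplied flags.
The flag value is the one defined in patch 14/24; choose_profile() and its
parameters are hypothetical stand-ins for the locals in
vaapi_decode_make_config(), not FFmpeg API.

```c
#include <stdio.h>

/* Value taken from the avcodec.h hunk in patch 14/24. */
#define AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH (1 << 2)

/*
 * Hypothetical helper: returns the profile to decode with, or -1 when
 * hardware decoding should be refused.  exact_match and alt_profile mirror
 * the local variables of the same name in vaapi_decode_make_config().
 */
static int choose_profile(int hwaccel_flags, int exact_match,
                          int profile, int alt_profile)
{
    if (exact_match)
        return profile;
    if (hwaccel_flags & AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH) {
        /* The caller opted in: fall back to the compatible profile. */
        fprintf(stderr, "profile %d not supported, trying %d\n",
                profile, alt_profile);
        return alt_profile;
    }
    return -1;
}
```

With the flag clear, a non-exact match is rejected as before; with it set,
the compatible profile is tried at the caller's own risk, as the warning in
the avcodec.h documentation says.
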
 libavcodec/vaapi_decode.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/libavcodec/vaapi_decode.c b/libavcodec/vaapi_decode.c
index b63fb94fc1..cf58aae4c6 100644
--- a/libavcodec/vaapi_decode.c
+++ b/libavcodec/vaapi_decode.c
@@ -286,14 +286,6 @@ static int vaapi_decode_make_config(AVCodecContext *avctx)
 int profile_count, exact_match, alt_profile;
 const AVPixFmtDescriptor *sw_desc, *desc;
 
-// Allowing a profile mismatch can be useful because streams may
-// over-declare their required capabilities - in particular, many
-// H.264 baseline profile streams (notably some of those in FATE)
-// only use the feature set of constrained baseline.  This flag
-// would have to be be set by some external means in order to
-// actually be useful.  (AV_HWACCEL_FLAG_IGNORE_PROFILE?)
-int allow_profile_mismatch = 0;
-
 codec_desc = avcodec_descriptor_get(avctx->codec_id);
 if (!codec_desc) {
 err = AVERROR(EINVAL);
@@ -348,7 +340,8 @@ static int vaapi_decode_make_config(AVCodecContext *avctx)
 goto fail;
 }
 if (!exact_match) {
-if (allow_profile_mismatch) {
+if (avctx->hwaccel_flags &
+AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH) {
 av_log(avctx, AV_LOG_VERBOSE, "Codec %s profile %d not "
"supported for hardware decode.\n",
codec_desc->name, avctx->profile);
-- 
2.11.0

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH 14/24] lavc: Add flag to allow profile mismatch with hardware decoding

2017-06-12 Thread Mark Thompson

(cherry picked from commit 64a5260c695dd8051509d3270295fd64eac56587)
---
 doc/APIchanges   |  3 +++
 libavcodec/avcodec.h | 14 ++
 libavcodec/version.h |  2 +-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/doc/APIchanges b/doc/APIchanges
index 5b2203f2b4..12c4877b9b 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -15,6 +15,9 @@ libavutil: 2015-08-28
 
 API changes, most recent first:
 
+2017-06-xx - xxx - lavc 57.99.100 - avcodec.h
+  Add AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH.
+
 2017-06-xx - xxx - lavu 55.65.100 - hwcontext.h
   Add AV_HWDEVICE_TYPE_NONE, av_hwdevice_find_type_by_name(),
   av_hwdevice_get_type_name() and av_hwdevice_iterate_types().
diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index dcdcfe00ae..39be8cf717 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -4002,6 +4002,20 @@ typedef struct AVHWAccel {
 #define AV_HWACCEL_FLAG_ALLOW_HIGH_DEPTH (1 << 1)
 
 /**
+ * Hardware acceleration should still be attempted for decoding when the
+ * codec profile does not match the reported capabilities of the hardware.
+ *
+ * For example, this can be used to try to decode baseline profile H.264
+ * streams in hardware - it will often succeed, because many streams marked
+ * as baseline profile actually conform to constrained baseline profile.
+ *
+ * @warning If the stream is actually not supported then the behaviour is
+ *  undefined, and may include returning entirely incorrect output
+ *  while indicating success.
+ */
+#define AV_HWACCEL_FLAG_ALLOW_PROFILE_MISMATCH (1 << 2)
+
+/**
  * @}
  */
 
diff --git a/libavcodec/version.h b/libavcodec/version.h
index c93487273a..a44a88832d 100644
--- a/libavcodec/version.h
+++ b/libavcodec/version.h
@@ -28,7 +28,7 @@
 #include "libavutil/version.h"
 
 #define LIBAVCODEC_VERSION_MAJOR  57
-#define LIBAVCODEC_VERSION_MINOR  98
+#define LIBAVCODEC_VERSION_MINOR  99
 #define LIBAVCODEC_VERSION_MICRO 100
 
 #define LIBAVCODEC_VERSION_INT  AV_VERSION_INT(LIBAVCODEC_VERSION_MAJOR, \
-- 
2.11.0

[FFmpeg-devel] [PATCH 16/24] ffmpeg: Support setting the hardware device to use when filtering

2017-06-12 Thread Mark Thompson

This only supports one device globally, but more can be used by
passing them with input streams in hw_frames_ctx or by deriving new
devices inside a filter graph with hwmap.

(cherry picked from commit e669db76108de8d7a36c2274c99da82cc94d1dd1)
---
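Note (illustration only, not part of the commit): a small self-contained
model of the selection rule described above.  Device, devices[],
device_by_name(), set_filter_device() and device_for_filters() are all
hypothetical names; they only imitate what opt_filter_hw_device() and
configure_filtergraph() do in this patch, under the assumption that devices
were already created by -init_hw_device.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ffmpeg's HWDevice; only the name matters here. */
typedef struct Device {
    const char *name;
} Device;

static Device  devices[] = { { "va" }, { "qs" } }; /* as if made by -init_hw_device */
static Device *filter_device;                      /* set by -filter_hw_device */
static Device *global_device = &devices[0];        /* stand-in for hw_device_ctx */

static Device *device_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++)
        if (!strcmp(devices[i].name, name))
            return &devices[i];
    return NULL;
}

/* Imitates opt_filter_hw_device(): only one device, and it must exist. */
static int set_filter_device(const char *name)
{
    if (filter_device)
        return -1;
    filter_device = device_by_name(name);
    return filter_device ? 0 : -1;
}

/* Imitates the choice in configure_filtergraph(): explicit device wins. */
static Device *device_for_filters(void)
{
    return filter_device ? filter_device : global_device;
}
```

The point of the fallback is that existing command lines which only set the
global device keep working, while an explicit filter device takes priority
once it is given.
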
 doc/ffmpeg.texi | 11 +++
 ffmpeg.h|  1 +
 ffmpeg_filter.c | 10 --
 ffmpeg_opt.c| 17 +
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/doc/ffmpeg.texi b/doc/ffmpeg.texi
index db7f05a3e0..4616a4239e 100644
--- a/doc/ffmpeg.texi
+++ b/doc/ffmpeg.texi
@@ -765,6 +765,17 @@ deriving it from the existing device with the name @var{source}.
 @item -init_hw_device list
 List all hardware device types supported in this build of ffmpeg.
 
+@item -filter_hw_device @var{name}
+Pass the hardware device called @var{name} to all filters in any filter graph.
+This can be used to set the device to upload to with the @code{hwupload} filter,
+or the device to map to with the @code{hwmap} filter.  Other filters may also
+make use of this parameter when they require a hardware device.  Note that this
+is typically only required when the input is not already in hardware frames -
+when it is, filters will derive the device they require from the context of the
+frames they receive as input.
+
+This is a global setting, so all filters will receive the same device.
+
 @item -hwaccel[:@var{stream_specifier}] @var{hwaccel} (@emph{input,per-stream})
 Use hardware acceleration to decode the matching stream(s). The allowed values
 of @var{hwaccel} are:
diff --git a/ffmpeg.h b/ffmpeg.h
index fbb9172d74..c3854bcb4a 100644
--- a/ffmpeg.h
+++ b/ffmpeg.h
@@ -628,6 +628,7 @@ extern AVBufferRef *hw_device_ctx;
 #if CONFIG_QSV
 extern char *qsv_device;
 #endif
+extern HWDevice *filter_hw_device;
 
 
 void term_init(void);
diff --git a/ffmpeg_filter.c b/ffmpeg_filter.c
index 817f48f473..aacc185059 100644
--- a/ffmpeg_filter.c
+++ b/ffmpeg_filter.c
@@ -1046,9 +1046,15 @@ int configure_filtergraph(FilterGraph *fg)
 if ((ret = avfilter_graph_parse2(fg->graph, graph_desc, &inputs, &outputs)) < 0)
 goto fail;
 
-if (hw_device_ctx) {
+if (filter_hw_device || hw_device_ctx) {
+AVBufferRef *device = filter_hw_device ? filter_hw_device->device_ref
+   : hw_device_ctx;
 for (i = 0; i < fg->graph->nb_filters; i++) {
-fg->graph->filters[i]->hw_device_ctx = av_buffer_ref(hw_device_ctx);
+fg->graph->filters[i]->hw_device_ctx = av_buffer_ref(device);
+if (!fg->graph->filters[i]->hw_device_ctx) {
+ret = AVERROR(ENOMEM);
+goto fail;
+}
 }
 }
 
diff --git a/ffmpeg_opt.c b/ffmpeg_opt.c
index 1facc82f44..90c31c0f58 100644
--- a/ffmpeg_opt.c
+++ b/ffmpeg_opt.c
@@ -98,6 +98,7 @@ const HWAccel hwaccels[] = {
 };
 int hwaccel_lax_profile_check = 0;
 AVBufferRef *hw_device_ctx;
+HWDevice *filter_hw_device;
 
 char *vstats_filename;
 char *sdp_filename;
@@ -497,6 +498,20 @@ static int opt_init_hw_device(void *optctx, const char *opt, const char *arg)
 }
 }
 
+static int opt_filter_hw_device(void *optctx, const char *opt, const char *arg)
+{
+if (filter_hw_device) {
+av_log(NULL, AV_LOG_ERROR, "Only one filter device can be used.\n");
+return AVERROR(EINVAL);
+}
+filter_hw_device = hw_device_get_by_name(arg);
+if (!filter_hw_device) {
+av_log(NULL, AV_LOG_ERROR, "Invalid filter device %s.\n", arg);
+return AVERROR(EINVAL);
+}
+return 0;
+}
+
 /**
  * Parse a metadata specifier passed as 'arg' parameter.
  * @param arg  metadata string to parse
@@ -3710,6 +3725,8 @@ const OptionDef options[] = {
 
 { "init_hw_device", HAS_ARG | OPT_EXPERT, { .func_arg = opt_init_hw_device },
 "initialise hardware device", "args" },
+{ "filter_hw_device", HAS_ARG | OPT_EXPERT, { .func_arg = opt_filter_hw_device },
+"set hardware device used when filtering", "device" },
 
 { NULL, },
 };
-- 
2.11.0

[FFmpeg-devel] [PATCH 09/24] vf_deinterlace_vaapi: Add support for field rate output

2017-06-12 Thread Mark Thompson

In order to work correctly with the i965 driver, this also fixes the
direction of forward/backward references - forward references are
intended to be those from the past to the current frame, not from the
current frame to the future.

(cherry picked from commit 9aa251c98ce60e5ee83156e5292547a7671ced3a)
---
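Note (illustration only, not part of the commit): a simplified model of the
corrected reference direction.  Frames are plain ints and the queue holds the
oldest frame at index 0; RefSet, pick_references() and queue_depth() are
hypothetical names that only follow the index arithmetic used in
deint_vaapi_filter_frame() and deint_vaapi_build_filter_params() after this
patch.

```c
#define MAX_REFERENCES 8

/* Hypothetical, simplified model: frames are ints, oldest at index 0. */
typedef struct RefSet {
    int current;
    int forward[MAX_REFERENCES];   /* past frames, nearest first */
    int backward[MAX_REFERENCES];  /* future frames, nearest first */
} RefSet;

static RefSet pick_references(const int *queue, int nb_forward, int nb_backward)
{
    RefSet r = { 0 };
    int cur = nb_forward;          /* same role as current_frame_index */
    int i;

    r.current = queue[cur];
    for (i = 0; i < nb_forward; i++)
        r.forward[i] = queue[cur - i - 1];   /* walk back into the past */
    for (i = 0; i < nb_backward; i++)
        r.backward[i] = queue[cur + i + 1];  /* walk ahead into the future */
    return r;
}

/* Queue length, including the extra frame kept for field-rate timestamps. */
static int queue_depth(int nb_forward, int nb_backward, int field_rate)
{
    int extra_delay = field_rate == 2 && nb_backward == 0;
    return nb_backward + nb_forward + extra_delay + 1;
}
```

For a queue of {10, 11, 12, 13} with two forward and one backward reference,
the current frame is 12, the forward references are 11 then 10, and the
backward reference is 13.
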
 libavfilter/vf_deinterlace_vaapi.c | 289 +
 1 file changed, 166 insertions(+), 123 deletions(-)

diff --git a/libavfilter/vf_deinterlace_vaapi.c b/libavfilter/vf_deinterlace_vaapi.c
index 5e7f7cf1c2..838eb89c90 100644
--- a/libavfilter/vf_deinterlace_vaapi.c
+++ b/libavfilter/vf_deinterlace_vaapi.c
@@ -22,6 +22,7 @@
 #include 
 
 #include "libavutil/avassert.h"
+#include "libavutil/common.h"
 #include "libavutil/hwcontext.h"
 #include "libavutil/hwcontext_vaapi.h"
 #include "libavutil/mem.h"
@@ -42,6 +43,8 @@ typedef struct DeintVAAPIContext {
 AVBufferRef   *device_ref;
 
 int mode;
+int field_rate;
+int auto_enable;
 
 int valid_ids;
 VAConfigID va_config;
@@ -63,6 +66,7 @@ typedef struct DeintVAAPIContext {
 int queue_depth;
 int queue_count;
 AVFrame   *frame_queue[MAX_REFERENCES];
+int extra_delay_for_timestamps;
 
 VABufferID filter_buffer;
 } DeintVAAPIContext;
@@ -211,8 +215,12 @@ static int deint_vaapi_build_filter_params(AVFilterContext *avctx)
 return AVERROR(EIO);
 }
 
+ctx->extra_delay_for_timestamps = ctx->field_rate == 2 &&
+ctx->pipeline_caps.num_backward_references == 0;
+
 ctx->queue_depth = ctx->pipeline_caps.num_backward_references +
-   ctx->pipeline_caps.num_forward_references + 1;
+   ctx->pipeline_caps.num_forward_references +
+   ctx->extra_delay_for_timestamps + 1;
 if (ctx->queue_depth > MAX_REFERENCES) {
 av_log(avctx, AV_LOG_ERROR, "Pipeline requires too many "
"references (%u forward, %u back).\n",
@@ -227,6 +235,7 @@ static int deint_vaapi_build_filter_params(AVFilterContext *avctx)
 static int deint_vaapi_config_output(AVFilterLink *outlink)
 {
 AVFilterContext*avctx = outlink->src;
+AVFilterLink  *inlink = avctx->inputs[0];
 DeintVAAPIContext*ctx = avctx->priv;
 AVVAAPIHWConfig *hwconfig = NULL;
 AVHWFramesConstraints *constraints = NULL;
@@ -326,8 +335,13 @@ static int deint_vaapi_config_output(AVFilterLink *outlink)
 if (err < 0)
 goto fail;
 
-outlink->w = ctx->output_width;
-outlink->h = ctx->output_height;
+outlink->w = inlink->w;
+outlink->h = inlink->h;
+
+outlink->time_base  = av_mul_q(inlink->time_base,
+   (AVRational) { 1, ctx->field_rate });
+outlink->frame_rate = av_mul_q(inlink->frame_rate,
+   (AVRational) { ctx->field_rate, 1 });
 
 outlink->hw_frames_ctx = av_buffer_ref(ctx->output_frames_ref);
 if (!outlink->hw_frames_ctx) {
@@ -375,7 +389,7 @@ static int deint_vaapi_filter_frame(AVFilterLink *inlink, AVFrame *input_frame)
 VABufferID params_id;
 VAStatus vas;
 void *filter_params_addr = NULL;
-int err, i;
+int err, i, field, current_frame_index;
 
 av_log(avctx, AV_LOG_DEBUG, "Filter input: %s, %ux%u (%"PRId64").\n",
av_get_pix_fmt_name(input_frame->format),
@@ -394,17 +408,16 @@ static int deint_vaapi_filter_frame(AVFilterLink *inlink, AVFrame *input_frame)
 ctx->frame_queue[i] = input_frame;
 }
 
-input_frame =
-ctx->frame_queue[ctx->pipeline_caps.num_backward_references];
+current_frame_index = ctx->pipeline_caps.num_forward_references;
+
+input_frame = ctx->frame_queue[current_frame_index];
 input_surface = (VASurfaceID)(uintptr_t)input_frame->data[3];
-for (i = 0; i < ctx->pipeline_caps.num_backward_references; i++)
-backward_references[i] = (VASurfaceID)(uintptr_t)
-ctx->frame_queue[ctx->pipeline_caps.num_backward_references -
- i - 1]->data[3];
 for (i = 0; i < ctx->pipeline_caps.num_forward_references; i++)
 forward_references[i] = (VASurfaceID)(uintptr_t)
-ctx->frame_queue[ctx->pipeline_caps.num_backward_references +
- i + 1]->data[3];
+ctx->frame_queue[current_frame_index - i - 1]->data[3];
+for (i = 0; i < ctx->pipeline_caps.num_backward_references; i++)
+backward_references[i] = (VASurfaceID)(uintptr_t)
+ctx->frame_queue[current_frame_index + i + 1]->data[3];
 
 av_log(avctx, AV_LOG_DEBUG, "Using surface %#x for "
"deinterlace input.\n", input_surface);
@@ -417,129 +430,148 @@ static int deint_vaapi_filter_frame(AVFilterLink *inlink, AVFrame *input_frame)
 av_log(avctx, AV_LOG_DEBUG, " %#x", forward_references[i]);
 av_log(avctx, AV_LOG_DE

[FFmpeg-devel] [PATCH 13/24] vaapi_encode: Use gop_size consistently in RC parameters

2017-06-12 Thread Mark Thompson

The non-H.26[45] codecs already use this form.  Since we don't
currently generate I frames for codecs which support them separately
to IDR, the p_per_i variable is set to infinity by default so that it
doesn't interfere with any other calculation.  (All the code for I
frames still exists, and it works for H.264 if set manually.)

(cherry picked from commit 6af014f4028238b4c50f1731b3369a41d65fa9c4)
---
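Note (illustration only, not part of the commit): a worked comparison of the
removed derivation against the new one.  old_intra_period() reproduces the
arithmetic deleted from vaapi_encode.c and the H.264/H.265 sequence
parameters; both function names are hypothetical.

```c
/* Old derivation, as removed by this patch. */
static int old_intra_period(int gop_size, int max_b_frames)
{
    int p_per_i = (gop_size - 1 + max_b_frames) / (max_b_frames + 1);
    int b_per_p = max_b_frames;

    return p_per_i * (b_per_p + 1);
}

/* New behaviour: the configured GOP size is passed through unchanged. */
static int new_intra_period(int gop_size)
{
    return gop_size;
}
```

The integer division means the old value only equals gop_size for some
inputs: gop_size 12 with 2 B-frames gives 12, but gop_size 10 with 2 B-frames
gives 9, and gop_size 120 with no B-frames gives 119.  Using gop_size
directly removes that discrepancy.
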
 libavcodec/vaapi_encode.c  | 3 +--
 libavcodec/vaapi_encode_h264.c | 4 ++--
 libavcodec/vaapi_encode_h265.c | 4 ++--
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index 7aaf263d25..2de5f76cab 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -1435,8 +1435,7 @@ av_cold int ff_vaapi_encode_init(AVCodecContext *avctx)
 ctx->output_order = - ctx->output_delay - 1;
 
 // Currently we never generate I frames, only IDR.
-ctx->p_per_i = ((avctx->gop_size - 1 + avctx->max_b_frames) /
-(avctx->max_b_frames + 1));
+ctx->p_per_i = INT_MAX;
 ctx->b_per_p = avctx->max_b_frames;
 
 if (ctx->codec->sequence_params_size > 0) {
diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c
index 92e29554ed..f9fcd805a4 100644
--- a/libavcodec/vaapi_encode_h264.c
+++ b/libavcodec/vaapi_encode_h264.c
@@ -905,8 +905,8 @@ static int vaapi_encode_h264_init_sequence_params(AVCodecContext *avctx)
 mseq->nal_hrd_parameters_present_flag = 0;
 }
 
-vseq->intra_period = ctx->p_per_i * (ctx->b_per_p + 1);
-vseq->intra_idr_period = vseq->intra_period;
+vseq->intra_period = avctx->gop_size;
+vseq->intra_idr_period = avctx->gop_size;
 vseq->ip_period= ctx->b_per_p + 1;
 }
 
diff --git a/libavcodec/vaapi_encode_h265.c b/libavcodec/vaapi_encode_h265.c
index 6e008b7b9c..1d648a6d87 100644
--- a/libavcodec/vaapi_encode_h265.c
+++ b/libavcodec/vaapi_encode_h265.c
@@ -832,8 +832,8 @@ static int vaapi_encode_h265_init_sequence_params(AVCodecContext *avctx)
 vseq->vui_time_scale= avctx->time_base.den;
 }
 
-vseq->intra_period = ctx->p_per_i * (ctx->b_per_p + 1);
-vseq->intra_idr_period = vseq->intra_period;
+vseq->intra_period = avctx->gop_size;
+vseq->intra_idr_period = avctx->gop_size;
 vseq->ip_period= ctx->b_per_p + 1;
 }
 
-- 
2.11.0

[FFmpeg-devel] [PATCH 12/24] qsvenc: Allow use of hw_device_ctx to make the internal session

2017-06-12 Thread Mark Thompson

(cherry picked from commit 3d197514e613ccd9eab43180c0a7c8b09a307606)
---
 libavcodec/qsvenc.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libavcodec/qsvenc.c b/libavcodec/qsvenc.c
index 64227cea6e..5eb506fb76 100644
--- a/libavcodec/qsvenc.c
+++ b/libavcodec/qsvenc.c
@@ -700,6 +700,13 @@ static int qsvenc_init_session(AVCodecContext *avctx, QSVEncContext *q)
 }
 
 q->session = q->internal_session;
+} else if (avctx->hw_device_ctx) {
+ret = ff_qsv_init_session_device(avctx, &q->internal_session,
+ avctx->hw_device_ctx, q->load_plugins);
+if (ret < 0)
+return ret;
+
+q->session = q->internal_session;
 } else {
 ret = ff_qsv_init_internal_session(avctx, &q->internal_session,
q->load_plugins);
-- 
2.11.0

[FFmpeg-devel] [PATCH 07/24] ffmpeg: Enable generic hwaccel support for VDPAU

2017-06-12 Thread Mark Thompson

(cherry picked from commit aa6b2e081c504cb99f5e2e0ceb45295ef24bdac2)
---
 Makefile   |   1 -
 ffmpeg.h   |   1 -
 ffmpeg_opt.c   |   4 +-
 ffmpeg_vdpau.c | 159 -
 4 files changed, 2 insertions(+), 163 deletions(-)
 delete mode 100644 ffmpeg_vdpau.c

diff --git a/Makefile b/Makefile
index 26f9d93d85..ea90ec8b44 100644
--- a/Makefile
+++ b/Makefile
@@ -39,7 +39,6 @@ OBJS-ffmpeg-$(CONFIG_VDA) += ffmpeg_videotoolbox.o
 endif
 OBJS-ffmpeg-$(CONFIG_CUVID)   += ffmpeg_cuvid.o
 OBJS-ffmpeg-$(HAVE_DXVA2_LIB) += ffmpeg_dxva2.o
-OBJS-ffmpeg-$(HAVE_VDPAU_X11) += ffmpeg_vdpau.o
 OBJS-ffserver += ffserver_config.o
 
 TESTTOOLS   = audiogen videogen rotozoom tiny_psnr tiny_ssim base64 audiomatch
diff --git a/ffmpeg.h b/ffmpeg.h
index 231d362f5f..fbb9172d74 100644
--- a/ffmpeg.h
+++ b/ffmpeg.h
@@ -660,7 +660,6 @@ int ifilter_parameters_from_frame(InputFilter *ifilter, const AVFrame *frame);
 
 int ffmpeg_parse_options(int argc, char **argv);
 
-int vdpau_init(AVCodecContext *s);
 int dxva2_init(AVCodecContext *s);
 int vda_init(AVCodecContext *s);
 int videotoolbox_init(AVCodecContext *s);
diff --git a/ffmpeg_opt.c b/ffmpeg_opt.c
index 51671e0dd4..1facc82f44 100644
--- a/ffmpeg_opt.c
+++ b/ffmpeg_opt.c
@@ -67,8 +67,8 @@
 
 const HWAccel hwaccels[] = {
 #if HAVE_VDPAU_X11
-{ "vdpau", vdpau_init, HWACCEL_VDPAU, AV_PIX_FMT_VDPAU,
-  AV_HWDEVICE_TYPE_NONE },
+{ "vdpau", hwaccel_decode_init, HWACCEL_VDPAU, AV_PIX_FMT_VDPAU,
+  AV_HWDEVICE_TYPE_VDPAU },
 #endif
 #if HAVE_DXVA2_LIB
 { "dxva2", dxva2_init, HWACCEL_DXVA2, AV_PIX_FMT_DXVA2_VLD,
diff --git a/ffmpeg_vdpau.c b/ffmpeg_vdpau.c
deleted file mode 100644
index 7d4fbf8a37..00
--- a/ffmpeg_vdpau.c
+++ /dev/null
@@ -1,159 +0,0 @@
-/*
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include 
-
-#include "ffmpeg.h"
-
-#include "libavcodec/vdpau.h"
-
-#include "libavutil/buffer.h"
-#include "libavutil/frame.h"
-#include "libavutil/hwcontext.h"
-#include "libavutil/hwcontext_vdpau.h"
-#include "libavutil/pixfmt.h"
-
-typedef struct VDPAUContext {
-AVBufferRef *hw_frames_ctx;
-AVFrame *tmp_frame;
-} VDPAUContext;
-
-static void vdpau_uninit(AVCodecContext *s)
-{
-InputStream  *ist = s->opaque;
-VDPAUContext *ctx = ist->hwaccel_ctx;
-
-ist->hwaccel_uninit= NULL;
-ist->hwaccel_get_buffer= NULL;
-ist->hwaccel_retrieve_data = NULL;
-
-av_buffer_unref(&ctx->hw_frames_ctx);
-av_frame_free(&ctx->tmp_frame);
-
-av_freep(&ist->hwaccel_ctx);
-av_freep(&s->hwaccel_context);
-}
-
-static int vdpau_get_buffer(AVCodecContext *s, AVFrame *frame, int flags)
-{
-InputStream *ist = s->opaque;
-VDPAUContext*ctx = ist->hwaccel_ctx;
-
-return av_hwframe_get_buffer(ctx->hw_frames_ctx, frame, 0);
-}
-
-static int vdpau_retrieve_data(AVCodecContext *s, AVFrame *frame)
-{
-InputStream*ist = s->opaque;
-VDPAUContext   *ctx = ist->hwaccel_ctx;
-int ret;
-
-ret = av_hwframe_transfer_data(ctx->tmp_frame, frame, 0);
-if (ret < 0)
-return ret;
-
-ret = av_frame_copy_props(ctx->tmp_frame, frame);
-if (ret < 0) {
-av_frame_unref(ctx->tmp_frame);
-return ret;
-}
-
-av_frame_unref(frame);
-av_frame_move_ref(frame, ctx->tmp_frame);
-
-return 0;
-}
-
-static int vdpau_alloc(AVCodecContext *s)
-{
-InputStream  *ist = s->opaque;
-int loglevel = (ist->hwaccel_id == HWACCEL_AUTO) ? AV_LOG_VERBOSE : AV_LOG_ERROR;
-VDPAUContext *ctx;
-int ret;
-
-AVBufferRef  *device_ref = NULL;
-AVHWDeviceContext*device_ctx;
-AVVDPAUDeviceContext *device_hwctx;
-AVHWFramesContext*frames_ctx;
-
-ctx = av_mallocz(sizeof(*ctx));
-if (!ctx)
-return AVERROR(ENOMEM);
-
-ist->hwaccel_ctx   = ctx;
-ist->hwaccel_uninit= vdpau_uninit;
-ist->hwaccel_get_buffer= vdpau_get_buffer;
-ist->hwaccel_retrieve_data = vdpau_retrieve_data;
-
-ctx->tmp_frame = av_frame_alloc();
-if (!ctx->tmp_frame)
-goto fail;
-
-ret = av_hwdevice_ctx_create(&device_ref, AV_HWDEVICE_TYPE_VDPAU,
- ist->hwaccel_device, NULL, 0);
-if (re

[FFmpeg-devel] [PATCH 06/24] ffmpeg: Enable generic hwaccel support for VAAPI

2017-06-12 Thread Mark Thompson

(cherry picked from commit 62a1ef9f26c654a3e988aa465c4ac1d776c4c356)
---
 Makefile   |   1 -
 ffmpeg.h   |   2 -
 ffmpeg_opt.c   |  20 -
 ffmpeg_vaapi.c | 233 -
 4 files changed, 16 insertions(+), 240 deletions(-)
 delete mode 100644 ffmpeg_vaapi.c

diff --git a/Makefile b/Makefile
index 913a890a78..26f9d93d85 100644
--- a/Makefile
+++ b/Makefile
@@ -34,7 +34,6 @@ $(foreach prog,$(AVBASENAMES),$(eval OBJS-$(prog)-$(CONFIG_OPENCL) += cmdutils_o
 OBJS-ffmpeg   += ffmpeg_opt.o ffmpeg_filter.o ffmpeg_hw.o
 OBJS-ffmpeg-$(CONFIG_VIDEOTOOLBOX) += ffmpeg_videotoolbox.o
 OBJS-ffmpeg-$(CONFIG_LIBMFX)  += ffmpeg_qsv.o
-OBJS-ffmpeg-$(CONFIG_VAAPI)   += ffmpeg_vaapi.o
 ifndef CONFIG_VIDEOTOOLBOX
 OBJS-ffmpeg-$(CONFIG_VDA) += ffmpeg_videotoolbox.o
 endif
diff --git a/ffmpeg.h b/ffmpeg.h
index 5c115cf9a3..231d362f5f 100644
--- a/ffmpeg.h
+++ b/ffmpeg.h
@@ -665,8 +665,6 @@ int dxva2_init(AVCodecContext *s);
 int vda_init(AVCodecContext *s);
 int videotoolbox_init(AVCodecContext *s);
 int qsv_init(AVCodecContext *s);
-int vaapi_decode_init(AVCodecContext *avctx);
-int vaapi_device_init(const char *device);
 int cuvid_init(AVCodecContext *s);
 
 HWDevice *hw_device_get_by_name(const char *name);
diff --git a/ffmpeg_opt.c b/ffmpeg_opt.c
index 6755e09e47..51671e0dd4 100644
--- a/ffmpeg_opt.c
+++ b/ffmpeg_opt.c
@@ -87,8 +87,8 @@ const HWAccel hwaccels[] = {
   AV_HWDEVICE_TYPE_NONE },
 #endif
 #if CONFIG_VAAPI
-{ "vaapi", vaapi_decode_init, HWACCEL_VAAPI, AV_PIX_FMT_VAAPI,
-  AV_HWDEVICE_TYPE_NONE },
+{ "vaapi", hwaccel_decode_init, HWACCEL_VAAPI, AV_PIX_FMT_VAAPI,
+  AV_HWDEVICE_TYPE_VAAPI },
 #endif
 #if CONFIG_CUVID
 { "cuvid", cuvid_init, HWACCEL_CUVID, AV_PIX_FMT_CUDA,
@@ -462,10 +462,22 @@ static int opt_sdp_file(void *optctx, const char *opt, const char *arg)
 #if CONFIG_VAAPI
 static int opt_vaapi_device(void *optctx, const char *opt, const char *arg)
 {
+HWDevice *dev;
+const char *prefix = "vaapi:";
+char *tmp;
 int err;
-err = vaapi_device_init(arg);
+tmp = av_malloc(strlen(prefix) + strlen(arg) + 1);
+if (!tmp)
+return AVERROR(ENOMEM);
+strcpy(tmp, prefix);
+strcat(tmp, arg);
+err = hw_device_init_from_string(tmp, &dev);
+av_free(tmp);
 if (err < 0)
-exit_program(1);
+return err;
+hw_device_ctx = av_buffer_ref(dev->device_ref);
+if (!hw_device_ctx)
+return AVERROR(ENOMEM);
 return 0;
 }
 #endif
diff --git a/ffmpeg_vaapi.c b/ffmpeg_vaapi.c
deleted file mode 100644
index d011cacef7..00
--- a/ffmpeg_vaapi.c
+++ /dev/null
@@ -1,233 +0,0 @@
-/*
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "config.h"
-
-#include "libavutil/avassert.h"
-#include "libavutil/frame.h"
-#include "libavutil/hwcontext.h"
-#include "libavutil/log.h"
-
-#include "ffmpeg.h"
-
-
-static AVClass vaapi_class = {
-.class_name = "vaapi",
-.item_name  = av_default_item_name,
-.version= LIBAVUTIL_VERSION_INT,
-};
-
-#define DEFAULT_SURFACES 20
-
-typedef struct VAAPIDecoderContext {
-const AVClass *class;
-
-AVBufferRef   *device_ref;
-AVHWDeviceContext *device;
-AVBufferRef   *frames_ref;
-AVHWFramesContext *frames;
-
-// The output need not have the same format, width and height as the
-// decoded frames - the copy for non-direct-mapped access is actually
-// a whole vpp instance which can do arbitrary scaling and format
-// conversion.
-enum AVPixelFormat output_format;
-} VAAPIDecoderContext;
-
-
-static int vaapi_get_buffer(AVCodecContext *avctx, AVFrame *frame, int flags)
-{
-InputStream *ist = avctx->opaque;
-VAAPIDecoderContext *ctx = ist->hwaccel_ctx;
-int err;
-
-err = av_hwframe_get_buffer(ctx->frames_ref, frame, 0);
-if (err < 0) {
-av_log(ctx, AV_LOG_ERROR, "Failed to allocate decoder surface.\n");
-} else {
-av_log(ctx, AV_LOG_DEBUG, "Decoder given surface %#x.\n",
-   (unsigned int)(uintptr_t)frame->data[3]);
-}
-return err;
-}
-
-static int vaapi_retrieve_data(AVCodecContext *avctx, AVFrame *input)
-{
-InputStream *ist = avctx->opaque;
-VAAPIDecoderContext *ctx = ist->hw

[FFmpeg-devel] [PATCH 11/24] qsvdec: Allow use of hw_device_ctx to make the internal session

2017-06-12 Thread Mark Thompson

(cherry picked from commit 8848ba0bd6b035af77d4f13aa0d8d9806fe8)
---
 libavcodec/qsvdec.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/libavcodec/qsvdec.c b/libavcodec/qsvdec.c
index 74866b57ff..c00817f1d9 100644
--- a/libavcodec/qsvdec.c
+++ b/libavcodec/qsvdec.c
@@ -42,7 +42,7 @@
 #include "qsvdec.h"
 
 static int qsv_init_session(AVCodecContext *avctx, QSVContext *q, mfxSession session,
-AVBufferRef *hw_frames_ref)
+AVBufferRef *hw_frames_ref, AVBufferRef *hw_device_ref)
 {
 int ret;
 
@@ -68,6 +68,18 @@ static int qsv_init_session(AVCodecContext *avctx, QSVContext *q, mfxSession ses
 }
 
 q->session = q->internal_session;
+} else if (hw_device_ref) {
+if (q->internal_session) {
+MFXClose(q->internal_session);
+q->internal_session = NULL;
+}
+
+ret = ff_qsv_init_session_device(avctx, &q->internal_session,
+ hw_device_ref, q->load_plugins);
+if (ret < 0)
+return ret;
+
+q->session = q->internal_session;
 } else {
 if (!q->internal_session) {
 ret = ff_qsv_init_internal_session(avctx, &q->internal_session,
@@ -133,7 +145,7 @@ static int qsv_decode_init(AVCodecContext *avctx, 
QSVContext *q)
 iopattern = MFX_IOPATTERN_OUT_SYSTEM_MEMORY;
 q->iopattern = iopattern;
 
-ret = qsv_init_session(avctx, q, session, avctx->hw_frames_ctx);
+ret = qsv_init_session(avctx, q, session, avctx->hw_frames_ctx, 
avctx->hw_device_ctx);
 if (ret < 0) {
 av_log(avctx, AV_LOG_ERROR, "Error initializing an MFX session\n");
 return ret;
-- 
2.11.0

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH 05/24] ffmpeg: Generic device setup

2017-06-12 Thread Mark Thompson

Not yet enabled for any hwaccels.

(cherry picked from commit d2e6dd32a445b5744a51d090c0822dbd7e434592)
(cherry picked from commit 9203aac22874c7259e155b7d00f1f33bb1355129)
---
 Makefile |   2 +-
 ffmpeg.c |  18 +++
 ffmpeg.h |  17 +++
 ffmpeg_hw.c  | 387 +++
 ffmpeg_opt.c |  39 --
 5 files changed, 455 insertions(+), 8 deletions(-)
 create mode 100644 ffmpeg_hw.c

diff --git a/Makefile b/Makefile
index a2df8b9d8d..913a890a78 100644
--- a/Makefile
+++ b/Makefile
@@ -31,7 +31,7 @@ ALLAVPROGS_G = $(AVBASENAMES:%=%$(PROGSSUF)_g$(EXESUF))
 $(foreach prog,$(AVBASENAMES),$(eval OBJS-$(prog) += cmdutils.o))
 $(foreach prog,$(AVBASENAMES),$(eval OBJS-$(prog)-$(CONFIG_OPENCL) += 
cmdutils_opencl.o))
 
-OBJS-ffmpeg   += ffmpeg_opt.o ffmpeg_filter.o
+OBJS-ffmpeg   += ffmpeg_opt.o ffmpeg_filter.o ffmpeg_hw.o
 OBJS-ffmpeg-$(CONFIG_VIDEOTOOLBOX) += ffmpeg_videotoolbox.o
 OBJS-ffmpeg-$(CONFIG_LIBMFX)  += ffmpeg_qsv.o
 OBJS-ffmpeg-$(CONFIG_VAAPI)   += ffmpeg_vaapi.o
diff --git a/ffmpeg.c b/ffmpeg.c
index cd19594f8b..6170bd453c 100644
--- a/ffmpeg.c
+++ b/ffmpeg.c
@@ -2884,6 +2884,15 @@ static int init_input_stream(int ist_index, char *error, 
int error_len)
 
 if (!av_dict_get(ist->decoder_opts, "threads", NULL, 0))
 av_dict_set(&ist->decoder_opts, "threads", "auto", 0);
+
+ret = hw_device_setup_for_decode(ist);
+if (ret < 0) {
+snprintf(error, error_len, "Device setup failed for "
+ "decoder on input stream #%d:%d : %s",
+ ist->file_index, ist->st->index, av_err2str(ret));
+return ret;
+}
+
 if ((ret = avcodec_open2(ist->dec_ctx, codec, &ist->decoder_opts)) < 
0) {
 if (ret == AVERROR_EXPERIMENTAL)
 abort_codec_experimental(codec, 0);
@@ -3441,6 +3450,14 @@ static int init_output_stream(OutputStream *ost, char 
*error, int error_len)
 ost->enc_ctx->hw_frames_ctx = 
av_buffer_ref(av_buffersink_get_hw_frames_ctx(ost->filter->filter));
 if (!ost->enc_ctx->hw_frames_ctx)
 return AVERROR(ENOMEM);
+} else {
+ret = hw_device_setup_for_encode(ost);
+if (ret < 0) {
+snprintf(error, error_len, "Device setup failed for "
+ "encoder on output stream #%d:%d : %s",
+ ost->file_index, ost->index, av_err2str(ret));
+return ret;
+}
 }
 
 if ((ret = avcodec_open2(ost->enc_ctx, codec, &ost->encoder_opts)) < 
0) {
@@ -4643,6 +4660,7 @@ static int transcode(void)
 }
 
 av_buffer_unref(&hw_device_ctx);
+hw_device_free_all();
 
 /* finished ! */
 ret = 0;
diff --git a/ffmpeg.h b/ffmpeg.h
index a806445e0d..5c115cf9a3 100644
--- a/ffmpeg.h
+++ b/ffmpeg.h
@@ -42,6 +42,7 @@
 #include "libavutil/dict.h"
 #include "libavutil/eval.h"
 #include "libavutil/fifo.h"
+#include "libavutil/hwcontext.h"
 #include "libavutil/pixfmt.h"
 #include "libavutil/rational.h"
 #include "libavutil/threadmessage.h"
@@ -74,8 +75,15 @@ typedef struct HWAccel {
 int (*init)(AVCodecContext *s);
 enum HWAccelID id;
 enum AVPixelFormat pix_fmt;
+enum AVHWDeviceType device_type;
 } HWAccel;
 
+typedef struct HWDevice {
+char *name;
+enum AVHWDeviceType type;
+AVBufferRef *device_ref;
+} HWDevice;
+
 /* select an input stream for an output stream */
 typedef struct StreamMap {
 int disabled;   /* 1 is this mapping is disabled by a negative map 
*/
@@ -661,4 +669,13 @@ int vaapi_decode_init(AVCodecContext *avctx);
 int vaapi_device_init(const char *device);
 int cuvid_init(AVCodecContext *s);
 
+HWDevice *hw_device_get_by_name(const char *name);
+int hw_device_init_from_string(const char *arg, HWDevice **dev);
+void hw_device_free_all(void);
+
+int hw_device_setup_for_decode(InputStream *ist);
+int hw_device_setup_for_encode(OutputStream *ost);
+
+int hwaccel_decode_init(AVCodecContext *avctx);
+
 #endif /* FFMPEG_H */
diff --git a/ffmpeg_hw.c b/ffmpeg_hw.c
new file mode 100644
index 00..3acf8b4532
--- /dev/null
+++ b/ffmpeg_hw.c
@@ -0,0 +1,387 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 021

[FFmpeg-devel] [PATCH 03/24] hwcontext: Add device derivation

2017-06-12 Thread Mark Thompson

Creates a new device context from another of a different type which
refers to the same underlying hardware.

(cherry picked from commit b266ad56fe0e4ce5bb70118ba2e2b1dabfaf76ce)
---
 doc/APIchanges |  3 ++
 libavutil/hwcontext.c  | 65 ++
 libavutil/hwcontext.h  | 26 +
 libavutil/hwcontext_internal.h |  8 ++
 libavutil/version.h|  2 +-
 5 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/doc/APIchanges b/doc/APIchanges
index 67a6142401..a6889f3930 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -15,6 +15,9 @@ libavutil: 2015-08-28
 
 API changes, most recent first:
 
+2017-06-xx - xxx - lavu 55.64.100 - hwcontext.h
+  Add av_hwdevice_ctx_create_derived().
+
 2017-05-15 - xx - lavc 57.96.100 - avcodec.h
   VideoToolbox hardware-accelerated decoding now supports the new hwaccel API,
   which can create the decoder context and allocate hardware frames 
automatically.
diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
index 8d50a32b84..86d290d322 100644
--- a/libavutil/hwcontext.c
+++ b/libavutil/hwcontext.c
@@ -68,6 +68,8 @@ static void hwdevice_ctx_free(void *opaque, uint8_t *data)
 if (ctx->free)
 ctx->free(ctx);
 
+av_buffer_unref(&ctx->internal->source_device);
+
 av_freep(&ctx->hwctx);
 av_freep(&ctx->internal->priv);
 av_freep(&ctx->internal);
@@ -538,6 +540,69 @@ fail:
 return ret;
 }
 
+int av_hwdevice_ctx_create_derived(AVBufferRef **dst_ref_ptr,
+   enum AVHWDeviceType type,
+   AVBufferRef *src_ref, int flags)
+{
+AVBufferRef *dst_ref = NULL, *tmp_ref;
+AVHWDeviceContext *dst_ctx, *tmp_ctx;
+int ret = 0;
+
+tmp_ref = src_ref;
+while (tmp_ref) {
+tmp_ctx = (AVHWDeviceContext*)tmp_ref->data;
+if (tmp_ctx->type == type) {
+dst_ref = av_buffer_ref(tmp_ref);
+if (!dst_ref) {
+ret = AVERROR(ENOMEM);
+goto fail;
+}
+goto done;
+}
+tmp_ref = tmp_ctx->internal->source_device;
+}
+
+dst_ref = av_hwdevice_ctx_alloc(type);
+if (!dst_ref) {
+ret = AVERROR(ENOMEM);
+goto fail;
+}
+dst_ctx = (AVHWDeviceContext*)dst_ref->data;
+
+tmp_ref = src_ref;
+while (tmp_ref) {
+tmp_ctx = (AVHWDeviceContext*)tmp_ref->data;
+if (dst_ctx->internal->hw_type->device_derive) {
+ret = dst_ctx->internal->hw_type->device_derive(dst_ctx,
+tmp_ctx,
+flags);
+if (ret == 0) {
+dst_ctx->internal->source_device = av_buffer_ref(src_ref);
+if (!dst_ctx->internal->source_device) {
+ret = AVERROR(ENOMEM);
+goto fail;
+}
+goto done;
+}
+if (ret != AVERROR(ENOSYS))
+goto fail;
+}
+tmp_ref = tmp_ctx->internal->source_device;
+}
+
+ret = AVERROR(ENOSYS);
+goto fail;
+
+done:
+*dst_ref_ptr = dst_ref;
+return 0;
+
+fail:
+av_buffer_unref(&dst_ref);
+*dst_ref_ptr = NULL;
+return ret;
+}
+
 static void ff_hwframe_unmap(void *opaque, uint8_t *data)
 {
 HWMapDescriptor *hwmap = (HWMapDescriptor*)data;
diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
index cfc6ad0e28..782dbf22e1 100644
--- a/libavutil/hwcontext.h
+++ b/libavutil/hwcontext.h
@@ -271,6 +271,32 @@ int av_hwdevice_ctx_create(AVBufferRef **device_ctx, enum 
AVHWDeviceType type,
const char *device, AVDictionary *opts, int flags);
 
 /**
+ * Create a new device of the specified type from an existing device.
+ *
+ * If the source device is a device of the target type or was originally
+ * derived from such a device (possibly through one or more intermediate
+ * devices of other types), then this will return a reference to the
+ * existing device of the same type as is requested.
+ *
+ * Otherwise, it will attempt to derive a new device from the given source
+ * device.  If direct derivation to the new type is not implemented, it will
+ * attempt the same derivation from each ancestor of the source device in
+ * turn looking for an implemented derivation method.
+ *
+ * @param dst_ctx On success, a reference to the newly-created
+ *AVHWDeviceContext.
+ * @param typeThe type of the new device to create.
+ * @param src_ctx A reference to an existing AVHWDeviceContext which will be
+ *used to create the new device.
+ * @param flags   Currently unused; should be set to zero.
+ * @returnZero on success, a negative AVERROR code on failure.
+ */
+int av_hwdevice_ctx_create_derived(AVBufferRef **dst_ctx,
+   enum AVHWDeviceType type,
+

[FFmpeg-devel] [PATCH 04/24] hwcontext: Make it easier to work with device types

2017-06-12 Thread Mark Thompson

Adds functions to convert to/from strings and a function to iterate
over all supported device types.  Also adds a new invalid type
AV_HWDEVICE_TYPE_NONE, which acts as a sentinel value.

(cherry picked from commit b7487f4f3c39b4b202e1ea7bb2de13902f2dee45)
---
 doc/APIchanges|  4 
 libavutil/hwcontext.c | 42 ++
 libavutil/hwcontext.h | 28 
 libavutil/version.h   |  2 +-
 4 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/doc/APIchanges b/doc/APIchanges
index a6889f3930..5b2203f2b4 100644
--- a/doc/APIchanges
+++ b/doc/APIchanges
@@ -15,6 +15,10 @@ libavutil: 2015-08-28
 
 API changes, most recent first:
 
+2017-06-xx - xxx - lavu 55.65.100 - hwcontext.h
+  Add AV_HWDEVICE_TYPE_NONE, av_hwdevice_find_type_by_name(),
+  av_hwdevice_get_type_name() and av_hwdevice_iterate_types().
+
 2017-06-xx - xxx - lavu 55.64.100 - hwcontext.h
   Add av_hwdevice_ctx_create_derived().
 
diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c
index 86d290d322..7f9b1d33e3 100644
--- a/libavutil/hwcontext.c
+++ b/libavutil/hwcontext.c
@@ -50,6 +50,48 @@ static const HWContextType *hw_table[] = {
 NULL,
 };
 
+const char *hw_type_names[] = {
+[AV_HWDEVICE_TYPE_CUDA]   = "cuda",
+[AV_HWDEVICE_TYPE_DXVA2]  = "dxva2",
+[AV_HWDEVICE_TYPE_QSV]= "qsv",
+[AV_HWDEVICE_TYPE_VAAPI]  = "vaapi",
+[AV_HWDEVICE_TYPE_VDPAU]  = "vdpau",
+[AV_HWDEVICE_TYPE_VIDEOTOOLBOX] = "videotoolbox",
+};
+
+enum AVHWDeviceType av_hwdevice_find_type_by_name(const char *name)
+{
+int type;
+for (type = 0; type < FF_ARRAY_ELEMS(hw_type_names); type++) {
+if (hw_type_names[type] && !strcmp(hw_type_names[type], name))
+return type;
+}
+return AV_HWDEVICE_TYPE_NONE;
+}
+
+const char *av_hwdevice_get_type_name(enum AVHWDeviceType type)
+{
+if (type >= 0 && type < FF_ARRAY_ELEMS(hw_type_names))
+return hw_type_names[type];
+else
+return NULL;
+}
+
+enum AVHWDeviceType av_hwdevice_iterate_types(enum AVHWDeviceType prev)
+{
+enum AVHWDeviceType next;
+int i, set = 0;
+for (i = 0; hw_table[i]; i++) {
+if (prev != AV_HWDEVICE_TYPE_NONE && hw_table[i]->type <= prev)
+continue;
+if (!set || hw_table[i]->type < next) {
+next = hw_table[i]->type;
+set = 1;
+}
+}
+return set ? next : AV_HWDEVICE_TYPE_NONE;
+}
+
 static const AVClass hwdevice_ctx_class = {
 .class_name = "AVHWDeviceContext",
 .item_name  = av_default_item_name,
diff --git a/libavutil/hwcontext.h b/libavutil/hwcontext.h
index 782dbf22e1..37e8831f6b 100644
--- a/libavutil/hwcontext.h
+++ b/libavutil/hwcontext.h
@@ -31,6 +31,7 @@ enum AVHWDeviceType {
 AV_HWDEVICE_TYPE_DXVA2,
 AV_HWDEVICE_TYPE_QSV,
 AV_HWDEVICE_TYPE_VIDEOTOOLBOX,
+AV_HWDEVICE_TYPE_NONE,
 };
 
 typedef struct AVHWDeviceInternal AVHWDeviceInternal;
@@ -224,6 +225,33 @@ typedef struct AVHWFramesContext {
 } AVHWFramesContext;
 
 /**
+ * Look up an AVHWDeviceType by name.
+ *
+ * @param name String name of the device type (case-insensitive).
+ * @return The type from enum AVHWDeviceType, or AV_HWDEVICE_TYPE_NONE if
+ * not found.
+ */
+enum AVHWDeviceType av_hwdevice_find_type_by_name(const char *name);
+
+/** Get the string name of an AVHWDeviceType.
+ *
+ * @param type Type from enum AVHWDeviceType.
+ * @return Pointer to a static string containing the name, or NULL if the type
+ * is not valid.
+ */
+const char *av_hwdevice_get_type_name(enum AVHWDeviceType type);
+
+/**
+ * Iterate over supported device types.
+ *
+ * @param type AV_HWDEVICE_TYPE_NONE initially, then the previous type
+ * returned by this function in subsequent iterations.
+ * @return The next usable device type from enum AVHWDeviceType, or
+ * AV_HWDEVICE_TYPE_NONE if there are no more.
+ */
+enum AVHWDeviceType av_hwdevice_iterate_types(enum AVHWDeviceType prev);
+
+/**
  * Allocate an AVHWDeviceContext for a given hardware type.
  *
  * @param type the type of the hardware device to allocate.
diff --git a/libavutil/version.h b/libavutil/version.h
index dd8d2407da..322b683cf4 100644
--- a/libavutil/version.h
+++ b/libavutil/version.h
@@ -80,7 +80,7 @@
 
 
 #define LIBAVUTIL_VERSION_MAJOR  55
-#define LIBAVUTIL_VERSION_MINOR  64
+#define LIBAVUTIL_VERSION_MINOR  65
 #define LIBAVUTIL_VERSION_MICRO 100
 
 #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
-- 
2.11.0
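
The three helpers added by this patch can be modelled in isolation. The sketch below is a toy version, not the libavutil code: `DevType`, `type_names`, `supported` and the function names are hypothetical stand-ins. Like the patch, it compares names with strcmp(), so the lookup is case-sensitive, which is worth keeping in mind when reading the "(case-insensitive)" note in the header comment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device types; TYPE_NONE is the sentinel, placed last. */
enum DevType { TYPE_CUDA, TYPE_VAAPI, TYPE_QSV, TYPE_NONE };

static const char *const type_names[] = {
    [TYPE_CUDA]  = "cuda",
    [TYPE_VAAPI] = "vaapi",
    [TYPE_QSV]   = "qsv",
};

#define N_NAMES (sizeof(type_names) / sizeof(type_names[0]))

/* Name -> type; returns the sentinel when the name is unknown. */
static enum DevType find_type_by_name(const char *name)
{
    for (size_t i = 0; i < N_NAMES; i++)
        if (type_names[i] && !strcmp(type_names[i], name))
            return (enum DevType)i;
    return TYPE_NONE;
}

/* Type -> name; returns NULL for the sentinel or any out-of-range value. */
static const char *get_type_name(enum DevType type)
{
    return (size_t)type < N_NAMES ? type_names[type] : NULL;
}

/* Types "compiled in" for this toy build, deliberately unsorted. */
static const enum DevType supported[] = { TYPE_QSV, TYPE_CUDA };

/* Return the smallest supported type greater than prev, or the sentinel
 * when there is none.  Passing the sentinel starts the iteration. */
static enum DevType iterate_types(enum DevType prev)
{
    enum DevType next = TYPE_NONE;
    for (size_t i = 0; i < sizeof(supported) / sizeof(supported[0]); i++) {
        enum DevType t = supported[i];
        if (prev != TYPE_NONE && t <= prev)
            continue;
        if (next == TYPE_NONE || t < next)
            next = t;
    }
    return next;
}
```

A caller would loop with `for (t = iterate_types(TYPE_NONE); t != TYPE_NONE; t = iterate_types(t))`, which visits the supported types in increasing enum order regardless of table order.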


[FFmpeg-devel] [PATCH 02/24] vaapi_encode: Discard output buffer if picture submission fails

2017-06-12 Thread Mark Thompson

Previously this was leaking, though it actually hit an assert making
sure that the buffer had already been cleared when freeing the picture.

(cherry picked from commit 17aeee5832b9188b570c3d3de4197e4cdc54c634)
---
 libavcodec/vaapi_encode.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavcodec/vaapi_encode.c b/libavcodec/vaapi_encode.c
index 7e9c00f51d..7aaf263d25 100644
--- a/libavcodec/vaapi_encode.c
+++ b/libavcodec/vaapi_encode.c
@@ -428,6 +428,8 @@ fail:
 fail_at_end:
 av_freep(&pic->codec_picture_params);
 av_frame_free(&pic->recon_image);
+av_buffer_unref(&pic->output_buffer_ref);
+pic->output_buffer = VA_INVALID_ID;
 return err;
 }
 
-- 
2.11.0


[FFmpeg-devel] [PATCH 00/24] Generic hardware device setup and miscellaneous related merges

2017-06-12 Thread Mark Thompson

This merges a set of stuff from libav to do with hardware codecs/processing.

The two most interesting features of this are:

* Generic hardware device setup.  This finishes the uniform structure for 
hardware device setup which has been in progress for a while, finally deleting 
several of the ffmpeg_X.c hardware specific files.  Initially this is working 
for VAAPI and VDPAU, with partial support for QSV.  A following series by wm4 
(start from 
)
 will add DXVA2/D3D11 support as well.

* Mapping between hardware APIs.  Initially this supports VAAPI/DXVA2 and QSV; 
OpenCL integration with those is to follow.  The main use of this at the moment 
is to allow use of the lavc decoder via a platform hwaccel and hence avoid the 
nastiness of the specific  *_qsv decoders (for example: "./ffmpeg_g -y -hwaccel 
vaapi -hwaccel_output_format vaapi -i in.mp4 -an -vf 
'hwmap=derive_device=qsv,format=qsv' -c:v h264_qsv -b 5M -maxrate 5M 
-look_ahead 0 out.mp4", and similarly with DXVA2).
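
The derive_device step relies on the device derivation added in patch 03, which first looks for an existing device of the requested type among the source device and its ancestors before creating a new one. A minimal sketch of that lookup follows, under stated assumptions: `Device`, `DevType` and both functions are hypothetical stand-ins for the libavutil types, and the toy derivation always succeeds, whereas the real one can fail with ENOSYS.

```c
#include <stddef.h>

/* Hypothetical stand-ins for enum AVHWDeviceType and AVHWDeviceContext. */
enum DevType { DEV_VAAPI, DEV_QSV, DEV_DXVA2 };

typedef struct Device {
    enum DevType   type;
    struct Device *source;   /* device this one was derived from, or NULL */
} Device;

/* Walk from dev back through its ancestors and return the first device of
 * the wanted type, or NULL if the chain contains none. */
static Device *find_in_chain(Device *dev, enum DevType wanted)
{
    for (; dev; dev = dev->source)
        if (dev->type == wanted)
            return dev;
    return NULL;
}

/* Toy derivation: reuse an existing device of the wanted type if the chain
 * has one, otherwise initialise *fresh as a new device derived from src.
 * Returns the device to use. */
static Device *derive_device(Device *src, enum DevType wanted, Device *fresh)
{
    Device *found = find_in_chain(src, wanted);
    if (found)
        return found;
    fresh->type   = wanted;
    fresh->source = src;
    return fresh;
}
```

With this model, deriving QSV from a VAAPI device creates a new device that remembers its source, and asking that QSV device for VAAPI again hands back the original VAAPI device rather than a second one.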

Other oddments:
* Support for the VAAPI driver which wraps VDPAU.
* Field rate output for the VAAPI deinterlacer.
* hw_device_ctx support for QSV codecs using software frames (fixes some 
current silly failure cases when using multiple independent instances together).
* Profile mismatch option for hwaccels (primarily to allow hardware decoding of 
H.264 constrained baseline profile streams which erroneously fail to set 
constraint_set1_flag).
* Documentation for the hardware frame movement filters (hwupload, hwdownload, 
hwmap).

VP9 VAAPI encode support would be here, but is not included because it depends 
on the vp9_raw_reorder BSF, which is only written with the bitstream API rather 
than with get_bits.  I know that was skipped earlier, but has there been any 
more discussion on merging that?  Would it be easiest to just convert the BSF?

Thanks,

- Mark

[FFmpeg-devel] [PATCH 01/24] hwcontext_vaapi: Try to support the VDPAU wrapper

2017-06-12 Thread Mark Thompson

The driver is somewhat bitrotten (not updated for years) but is still
usable for decoding with this change.  To support it, this adds a new
driver quirk to indicate no support at all for surface attributes.

Based on a patch by wm4 .

(cherry picked from commit e791b915c774408fbc0ec9e7270b021899e08ccc)
---
 libavutil/hwcontext_vaapi.c | 79 ++---
 libavutil/hwcontext_vaapi.h |  7 
 2 files changed, 52 insertions(+), 34 deletions(-)

diff --git a/libavutil/hwcontext_vaapi.c b/libavutil/hwcontext_vaapi.c
index 3b50e95615..3970726d30 100644
--- a/libavutil/hwcontext_vaapi.c
+++ b/libavutil/hwcontext_vaapi.c
@@ -155,7 +155,8 @@ static int vaapi_frames_get_constraints(AVHWDeviceContext 
*hwdev,
 unsigned int fourcc;
 int err, i, j, attr_count, pix_fmt_count;
 
-if (config) {
+if (config &&
+!(hwctx->driver_quirks & AV_VAAPI_DRIVER_QUIRK_SURFACE_ATTRIBUTES)) {
 attr_count = 0;
 vas = vaQuerySurfaceAttributes(hwctx->display, config->config_id,
0, &attr_count);
@@ -273,6 +274,11 @@ static const struct {
 "ubit",
 AV_VAAPI_DRIVER_QUIRK_ATTRIB_MEMTYPE,
 },
+{
+"VDPAU wrapper",
+"Splitted-Desktop Systems VDPAU backend for VA-API",
+AV_VAAPI_DRIVER_QUIRK_SURFACE_ATTRIBUTES,
+},
 };
 
 static int vaapi_device_init(AVHWDeviceContext *hwdev)
@@ -451,43 +457,48 @@ static int vaapi_frames_init(AVHWFramesContext *hwfc)
 }
 
 if (!hwfc->pool) {
-int need_memory_type = !(hwctx->driver_quirks & 
AV_VAAPI_DRIVER_QUIRK_ATTRIB_MEMTYPE);
-int need_pixel_format = 1;
-for (i = 0; i < avfc->nb_attributes; i++) {
-if (ctx->attributes[i].type == VASurfaceAttribMemoryType)
-need_memory_type  = 0;
-if (ctx->attributes[i].type == VASurfaceAttribPixelFormat)
-need_pixel_format = 0;
-}
-ctx->nb_attributes =
-avfc->nb_attributes + need_memory_type + need_pixel_format;
+if (!(hwctx->driver_quirks & 
AV_VAAPI_DRIVER_QUIRK_SURFACE_ATTRIBUTES)) {
+int need_memory_type = !(hwctx->driver_quirks & 
AV_VAAPI_DRIVER_QUIRK_ATTRIB_MEMTYPE);
+int need_pixel_format = 1;
+for (i = 0; i < avfc->nb_attributes; i++) {
+if (ctx->attributes[i].type == VASurfaceAttribMemoryType)
+need_memory_type  = 0;
+if (ctx->attributes[i].type == VASurfaceAttribPixelFormat)
+need_pixel_format = 0;
+}
+ctx->nb_attributes =
+avfc->nb_attributes + need_memory_type + need_pixel_format;
 
-ctx->attributes = av_malloc(ctx->nb_attributes *
+ctx->attributes = av_malloc(ctx->nb_attributes *
 sizeof(*ctx->attributes));
-if (!ctx->attributes) {
-err = AVERROR(ENOMEM);
-goto fail;
-}
+if (!ctx->attributes) {
+err = AVERROR(ENOMEM);
+goto fail;
+}
 
-for (i = 0; i < avfc->nb_attributes; i++)
-ctx->attributes[i] = avfc->attributes[i];
-if (need_memory_type) {
-ctx->attributes[i++] = (VASurfaceAttrib) {
-.type  = VASurfaceAttribMemoryType,
-.flags = VA_SURFACE_ATTRIB_SETTABLE,
-.value.type= VAGenericValueTypeInteger,
-.value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_VA,
-};
-}
-if (need_pixel_format) {
-ctx->attributes[i++] = (VASurfaceAttrib) {
-.type  = VASurfaceAttribPixelFormat,
-.flags = VA_SURFACE_ATTRIB_SETTABLE,
-.value.type= VAGenericValueTypeInteger,
-.value.value.i = fourcc,
-};
+for (i = 0; i < avfc->nb_attributes; i++)
+ctx->attributes[i] = avfc->attributes[i];
+if (need_memory_type) {
+ctx->attributes[i++] = (VASurfaceAttrib) {
+.type  = VASurfaceAttribMemoryType,
+.flags = VA_SURFACE_ATTRIB_SETTABLE,
+.value.type= VAGenericValueTypeInteger,
+.value.value.i = VA_SURFACE_ATTRIB_MEM_TYPE_VA,
+};
+}
+if (need_pixel_format) {
+ctx->attributes[i++] = (VASurfaceAttrib) {
+.type  = VASurfaceAttribPixelFormat,
+.flags = VA_SURFACE_ATTRIB_SETTABLE,
+.value.type= VAGenericValueTypeInteger,
+.value.value.i = fourcc,
+};
+}
+av_assert0(i == ctx->nb_attributes);
+} else {
+ctx->attributes = NULL;
+ctx->nb_attributes = 0;
 }
-av_assert0(i == ctx->nb_attri

Re: [FFmpeg-devel] [PATCH 01/11] avfilter/unsharp: fix uninitialized pointer read

2017-06-12 Thread Michael Niedermayer

On Sun, Jun 11, 2017 at 04:05:43PM +0200, Timo Rothenpieler wrote:
> Fixes CID 1396855
> ---
>  libavfilter/unsharp_opencl.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

LGTM

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

There will always be a question for which you do not know the correct answer.



Re: [FFmpeg-devel] Sharing cuda context between transcode sessions to reduce initialization overhead

2017-06-12 Thread Timo Rothenpieler


Global state in the libraries is something we absolutely try to stay away
from, so this approach is not quite appropriate.

If you want to somehow share this, it should be in the ffmpeg command line
tool somewhere, however we also try to reduce hardware specific magic in
favor of abstractions.


This shouldn't need any hardware-specific magic, just setting the 
hw_device_ctx on all nvenc instances, which could be done in an entirely 
hardware-independent way quite easily.
For example, set the hw_device_ctx wherever applicable if "-hwaccel something" 
is set.


I wonder though, what happens in the case that one explicitly wants to 
use multiple GPUs? I guess in that case using an explicit hwupload_cuda 
might be a workaround?
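
As a rough illustration of the sharing idea discussed here, the sketch below keeps one reference-counted context per GPU index in a table owned by the application rather than in library-global state. It is a toy model only: `DeviceCtx`, `DeviceCache`, `device_acquire` and `device_release` are hypothetical names, and no real CUDA or libavutil calls are made.

```c
#include <stddef.h>

#define MAX_DEVICES 8

/* Hypothetical stand-in for a reference-counted device context. */
typedef struct DeviceCtx {
    int index;      /* GPU index this context belongs to */
    int refcount;   /* 0 means the slot is unused */
    int creations;  /* how many times the expensive setup ran */
} DeviceCtx;

/* Owned by the application, not by a library. */
typedef struct DeviceCache {
    DeviceCtx slots[MAX_DEVICES];
} DeviceCache;

/* Return the context for the given GPU, creating it on first use. */
static DeviceCtx *device_acquire(DeviceCache *cache, int index)
{
    DeviceCtx *ctx;

    if (index < 0 || index >= MAX_DEVICES)
        return NULL;
    ctx = &cache->slots[index];
    if (ctx->refcount == 0) {
        /* The expensive context creation would happen here, once. */
        ctx->index = index;
        ctx->creations++;
    }
    ctx->refcount++;
    return ctx;
}

/* Drop one reference; at zero the real context would be destroyed. */
static void device_release(DeviceCtx *ctx)
{
    if (ctx && ctx->refcount > 0)
        ctx->refcount--;
}
```

Each encoder instance on the same GPU would take its own reference to the same context, while an encoder on a different GPU index gets a separate one.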


Re: [FFmpeg-devel] [PATCH 07/11] avfilter/vf_signature: fix memory leaks in error cases

2017-06-12 Thread Michael Niedermayer

On Sun, Jun 11, 2017 at 04:05:49PM +0200, Timo Rothenpieler wrote:
> Fixes CIDs 1403234 and 1403235
> ---
>  libavfilter/vf_signature.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)

LGTM

thx

PS: I won't review the remaining signature fix, as it's not obvious
to me if the condition is possible; maybe mail the author if he doesn't
spot the patch

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras



Re: [FFmpeg-devel] [PATCH 0/6] sse2/avx functions for 8-bit simple idct

2017-06-12 Thread James Darnley

On 2017-06-12 18:57, Michael Niedermayer wrote:
> On Mon, Jun 12, 2017 at 03:36:06PM +0200, James Darnley wrote:
>> Rounding contributed by Ronald S. Bultje
>> ---
>>  libavcodec/tests/x86/dct.c   |  2 ++
>>  libavcodec/x86/idctdsp_init.c| 19 +++
>>  libavcodec/x86/simple_idct.h |  3 +++
>>  libavcodec/x86/simple_idct10.asm |  8 
>>  4 files changed, 32 insertions(+)
>
> this (3) and the patches 1 and 2 break te idct
>
> ./ffplay ~/videos/matrixbench_mpeg2.mpg
> looks pretty bad

If that would happen to be the FATE sample
mpeg2/matrixbench_mpeg2.lq1.mpg then I see that too.

As I said on IRC I was able to partly remedy it by moving patch 6 up.

On 2017-06-12 19:00, Michael Niedermayer wrote:
> On Mon, Jun 12, 2017 at 03:36:03PM +0200, James Darnley wrote:
>> I think I have reached the final state for these patches.  There has been 
>> little
>> change to the 1st, 3rd, 4th, and 5th.
>>
>> The 2nd adds an option to explicitly control what the macro does after the 
>> IDCT.
>> This allows the small optimisation for 8-bit of not storing the data back to 
>> the
>> source block.
>>
> 
>> The 6th lets the IDCT use the slightly different coefficients to get exact
>> output compared with the MMX original.  This is rather messy but I think it 
>> is
>> slightly better than trying to alter the code macro.  A word diff looks much
>> cleaner than the line diff git uses by default.
> 
> i see no changes in fate
> is the changed code not tested in any fate test (in which case please
> add one) but quite possibly i also misundestand as i didnt look at
> what exactly is slihtly different

While I am not completely sure, I think that is because the simplemmx IDCT
doesn't produce the same output as the C version, so it is avoided in FATE tests.
The new functions are chosen with these conditions:

> +if (ARCH_X86_64 &&
> +!high_bit_depth &&
> +avctx->lowres == 0 &&
> +(avctx->idct_algo == FF_IDCT_AUTO ||
> +avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
> +avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {

The old MMX code is chosen the same way, minus the x86-64 part.

Whereas the others (the 10- and 12-bit) have "avctx->idct_algo ==
FF_IDCT_SIMPLE" at the end.

Further to this, doing sed -i 's/idct simple/idct simplemmx/g' across
the files which match makes fate fail.  I looked at where that matched
and it appeared to be mostly command line options in the testing files.

That was done on master so I think that proves my original thought.

Ultimately I have no idea what is going on here.  I've tested decoding
the sample mentioned above before my inline to external conversion of
the MMX.  I've done that immediately after that commit.  I've done that
on the current master.  I get the same result for all of those.

Going to my patches.  I can get the same result with -cpuflags none.
Surely -cpuflags mmx+mmxext would then be the same as above.

I'm quitting for the night before I flame out.


Re: [FFmpeg-devel] [PATCH 08/11] avfilter/vf_signature: use av_strlcpy instead of strcpy

2017-06-12 Thread Michael Niedermayer

On Sun, Jun 11, 2017 at 04:05:50PM +0200, Timo Rothenpieler wrote:
> Fixes CID 1403236
> ---
>  libavfilter/vf_signature.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/libavfilter/vf_signature.c b/libavfilter/vf_signature.c
> index f0078ba1a6..c20b0bfabb 100644
> --- a/libavfilter/vf_signature.c
> +++ b/libavfilter/vf_signature.c
> @@ -576,7 +576,7 @@ static int export(AVFilterContext *ctx, StreamContext 
> *sc, int input)
>  /* error already handled */
>  av_assert0(av_get_frame_filename(filename, sizeof(filename), 
> sic->filename, input) == 0);
>  } else {
> -strcpy(filename, sic->filename);
> +av_strlcpy(filename, sic->filename, sizeof(filename));

missing error handling if it's truncated,
but better than before, so patch ok

thx
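
The truncation check mentioned above could look roughly like the sketch below. It assumes av_strlcpy() follows the usual strlcpy convention of returning the length of the source string; `copy_bounded` is a local stand-in written for this example and `copy_filename` is a hypothetical helper, neither is part of the patch.

```c
#include <stddef.h>
#include <string.h>

/* BSD strlcpy-style copy: always NUL-terminates when size > 0 and returns
 * strlen(src), so a return value >= size means the copy was truncated. */
static size_t copy_bounded(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Returns 0 on success, -1 if src did not fit into dst. */
static int copy_filename(char *dst, size_t size, const char *src)
{
    return copy_bounded(dst, src, size) >= size ? -1 : 0;
}
```

In the filter, the -1 case would presumably map to an AVERROR code instead of silently writing to a shortened filename.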

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin



Re: [FFmpeg-devel] [PATCH] avcodec/htmlsubtitles: Replace very slow redundant sscanf() calls by cleaner and faster code

2017-06-12 Thread Michael Niedermayer

On Mon, Jun 12, 2017 at 11:35:14AM +0200, wm4 wrote:
> On Sun, 11 Jun 2017 17:58:45 +0200
> Michael Niedermayer  wrote:
> 
> > This reduces the worst case from O(n²) to O(n) time
> > 
> > Fixes Timeout
> > Fixes: 2127/clusterfuzz-testcase-minimized-6595787859427328
> > 
> > Signed-off-by: Michael Niedermayer 
> > ---
> >  libavcodec/htmlsubtitles.c | 20 +++-
> >  1 file changed, 15 insertions(+), 5 deletions(-)
> > 
> > diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
> > index 16295daa0c..70311c66d5 100644
> > --- a/libavcodec/htmlsubtitles.c
> > +++ b/libavcodec/htmlsubtitles.c
> > @@ -56,6 +56,7 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, 
> > const char *in)
> >  char *param, buffer[128], tmp[128];
> >  int len, tag_close, sptr = 1, line_start = 1, an = 0, end = 0;
> >  SrtStack stack[16];
> > +int closing_brace_missing = 0;
> >  
> >  stack[0].tag[0] = 0;
> >  strcpy(stack[0].param[PARAM_SIZE],  "{\\fs}");
> > @@ -83,11 +84,20 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, 
> > const char *in)
> >  and all microdvd like styles such as {Y:xxx} */
> >  len = 0;
> >  an += sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0;
> > -if ((an != 1 && (len = 0, sscanf(in, "{\\%*[^}]}%n", &len) >= 
> > 0 && len > 0)) ||
> > -(len = 0, sscanf(in, "{%*1[CcFfoPSsYy]:%*[^}]}%n", &len) 
> > >= 0 && len > 0)) {
> > -in += len - 1;
> > -} else
> > -av_bprint_chars(dst, *in, 1);
> > +
> > +if (!closing_brace_missing) {
> > +if (   (an != 1 && in[1] == '\\')
> > +|| (in[1] && strchr("CcFfoPSsYy", in[1]) && in[2] == 
> > ':')) {
> > +char *bracep = strchr(in+2, '}');
> > +if (bracep) {
> > +in = bracep;
> > +break;
> > +} else
> > +closing_brace_missing = 1;
> > +}
> > +}
> > +
> > +av_bprint_chars(dst, *in, 1);
> >  break;
> >  case '<':
> >  tag_close = in[1] == '/';
> 
> IMO better than before - now anyone can understand this code, and it's
> faster. I'm not maintainer of this though.
> 

> Would it be possible to move this switch case to a separate function?
> ff_htmlmarkup_to_ass is a bit too big.

patch posted which moves the code on top of this patch

about the "an +=" stuff, I tried to find some description of the
subtitle format that triggers it but I had no luck. So I tried not to
change its behavior ...


[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Let us carefully observe those good qualities wherein our enemies excel us
and endeavor to excel them, by avoiding what is faulty, and imitating what
is excellent in them. -- Plutarch



[FFmpeg-devel] [PATCH] avcodec/htmlsubtitles: Factor open brace handling into its own function

2017-06-12 Thread Michael Niedermayer

Suggested-by: wm4
Signed-off-by: Michael Niedermayer 
---
 libavcodec/htmlsubtitles.c | 44 ++--
 1 file changed, 26 insertions(+), 18 deletions(-)

diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
index 70311c66d5..be5c9316ca 100644
--- a/libavcodec/htmlsubtitles.c
+++ b/libavcodec/htmlsubtitles.c
@@ -51,6 +51,30 @@ static void rstrip_spaces_buf(AVBPrint *buf)
 buf->str[--buf->len] = 0;
 }
 
+/* skip all {\xxx} substrings except for {\an%d}
+   and all microdvd like styles such as {Y:xxx} */
+static void handle_open_brace(AVBPrint *dst, const char **inp, int *an, int 
*closing_brace_missing)
+{
+int len = 0;
+const char *in = *inp;
+
+*an += sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0;
+
+if (!*closing_brace_missing) {
+if (   (*an != 1 && in[1] == '\\')
+|| (in[1] && strchr("CcFfoPSsYy", in[1]) && in[2] == ':')) {
+char *bracep = strchr(in+2, '}');
+if (bracep) {
+*inp = bracep;
+return;
+} else
+*closing_brace_missing = 1;
+}
+}
+
+av_bprint_chars(dst, *in, 1);
+}
+
 int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
 {
 char *param, buffer[128], tmp[128];
@@ -80,24 +104,8 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
 if (!line_start)
 av_bprint_chars(dst, *in, 1);
 break;
-case '{':/* skip all {\xxx} substrings except for {\an%d}
-and all microdvd like styles such as {Y:xxx} */
-len = 0;
-an += sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0;
-
-if (!closing_brace_missing) {
-if (   (an != 1 && in[1] == '\\')
-|| (in[1] && strchr("CcFfoPSsYy", in[1]) && in[2] == ':')) {
-char *bracep = strchr(in+2, '}');
-if (bracep) {
-in = bracep;
-break;
-} else
-closing_brace_missing = 1;
-}
-}
-
-av_bprint_chars(dst, *in, 1);
+case '{':
+handle_open_brace(dst, &in, &an, &closing_brace_missing);
 break;
 case '<':
 tag_close = in[1] == '/';
-- 
2.13.0
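
The "{\an%d}" test moved by this patch relies on sscanf()'s "%n" directive, which is only reached when every directive before it has matched. A minimal sketch of that idiom in isolation follows; the helper name an_tag_length() is hypothetical and not part of the patch, and it returns the matched length rather than incrementing a counter as the patch does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not in the patch): returns the length of a leading
 * "{\an<digit>}" tag in `in`, or 0 if `in` does not start with one. */
static int an_tag_length(const char *in)
{
    int len = 0;

    assert(in != NULL);
    /* "%*1u" consumes a single digit without storing it (the '*' suppresses
     * assignment, the '1' limits the field width). "%n" then records how
     * many characters were consumed, but only if the closing '}' matched. */
    if (sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0)
        return len;
    return 0;
}
```

Because the conversion is suppressed, sscanf() itself returns 0 even on a full match, so the "len > 0" check is what actually signals success.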


Re: [FFmpeg-devel] Sharing cuda context between transcode sessions to reduce initialization overhead

2017-06-12 Thread Hendrik Leppkes

On 12.06.2017 at 10:38 PM, "Ganapathy Raman Kasi" wrote:

> Hi,
>
> Currently, when doing a 1 -> N transcode (1 SW decode -> N NVENC
> encodes) without the HW upload filter, we end up allocating multiple CUDA
> contexts for the N transcode sessions on the same underlying GPU device.
> Each one incurs the CUDA context initialization overhead (~100 ms per
> context creation on a 4th-gen i5 with a GTX 1080 under Ubuntu 16.04). The
> same issue arises with M * (1->N) fully HW-accelerated transcodes, where
> the CUDA context is not shared between the M transcode sessions. Sharing
> the context would greatly reduce the initialization time, which matters
> for short-clip transcodes.
>
> I currently have a global array in avutil/hwcontext_cuda.c which keeps
> track of the CUDA contexts created and reuses an existing context when a
> request to create a hwdevice ctx occurs. This is shared in the attached
> patch. Please check the approach and let me know if there is a
> better/cleaner way to do this. Thanks


Global state in the libraries is something we absolutely try to stay away
from, so this approach is not quite appropriate.

If you want to somehow share this, it should be in the ffmpeg command line
tool somewhere, however we also try to reduce hardware specific magic in
favor of abstractions.

- Hendrik
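
One way to read the suggestion above is that the cache of per-device contexts should be an object owned by the calling application rather than a static array inside the library. The sketch below illustrates only that ownership pattern in plain C; DeviceCtx, DeviceCache and both functions are hypothetical stand-ins, not FFmpeg or CUDA APIs, and no locking is shown.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEVICES 8

/* Stand-in for an expensive per-GPU context (hypothetical type). */
typedef struct DeviceCtx {
    int device_index;
    int refcount;
} DeviceCtx;

/* Cache owned by the caller, not by the library: no global state. */
typedef struct DeviceCache {
    DeviceCtx *slots[MAX_DEVICES];
    int creations;              /* number of expensive creations so far */
} DeviceCache;

/* Returns a new reference to the context for device `idx`,
 * creating the context on first use. Returns NULL on failure. */
static DeviceCtx *device_cache_get(DeviceCache *cache, int idx)
{
    assert(cache != NULL);
    if (idx < 0 || idx >= MAX_DEVICES)
        return NULL;
    if (!cache->slots[idx]) {
        DeviceCtx *ctx = calloc(1, sizeof(*ctx));
        if (!ctx)
            return NULL;
        ctx->device_index = idx;
        cache->slots[idx] = ctx;
        cache->creations++;
    }
    cache->slots[idx]->refcount++;
    return cache->slots[idx];
}

/* Drops one reference; the context is freed when the last user is gone. */
static void device_cache_release(DeviceCache *cache, DeviceCtx **ctx)
{
    assert(cache != NULL && ctx != NULL);
    if (!*ctx)
        return;
    if (--(*ctx)->refcount == 0) {
        cache->slots[(*ctx)->device_index] = NULL;
        free(*ctx);
    }
    *ctx = NULL;
}
```

With this shape, N sessions on the same device pay for one creation, and the lifetime of the cache is decided by whoever declares the DeviceCache variable.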

[FFmpeg-devel] Sharing cuda context between transcode sessions to reduce initialization overhead

2017-06-12 Thread Ganapathy Raman Kasi

Hi,


Currently, when doing a 1 -> N transcode (1 SW decode -> N NVENC encodes)
without the HW upload filter, we end up allocating multiple CUDA contexts for
the N transcode sessions on the same underlying GPU device. Each one incurs the
CUDA context initialization overhead (~100 ms per context creation on a 4th-gen
i5 with a GTX 1080 under Ubuntu 16.04). The same issue arises with M * (1->N)
fully HW-accelerated transcodes, where the CUDA context is not shared between
the M transcode sessions. Sharing the context would greatly reduce the
initialization time, which matters for short-clip transcodes.


I currently have a global array in avutil/hwcontext_cuda.c which keeps track of
the CUDA contexts created and reuses an existing context when a request to
create a hwdevice ctx occurs. This is shared in the attached patch. Please check
the approach and let me know if there is a better/cleaner way to do this. Thanks


Regards

Ganapathy


From 9e828c7cd943b964ccf4cc8d1059fcef014b24a3 Mon Sep 17 00:00:00 2001
From: Ganapathy Kasi 
Date: Mon, 12 Jun 2017 13:14:36 -0700
Subject: [PATCH] Share cuda context across multiple transcode sessions for the
 same gpu

Cuda context is allocated per decode/scale/encode session. If there are multiple
transcodes in same process, many cuda contexts are allocated for the underlying
same gpu device, which has an initialization perf overhead. Sharing the cuda
context per device fixes the issue. Also nvenc is directly using the cuda
interface to create the cuda context instead of using the av_hwdevice interface.
---
 libavcodec/nvenc.c | 33 ++---
 libavcodec/nvenc.h |  3 ++-
 libavutil/hwcontext_cuda.c | 40 ++--
 3 files changed, 46 insertions(+), 30 deletions(-)

diff --git a/libavcodec/nvenc.c b/libavcodec/nvenc.c
index f79b9a5..d5b6978 100644
--- a/libavcodec/nvenc.c
+++ b/libavcodec/nvenc.c
@@ -326,10 +326,14 @@ static av_cold int nvenc_check_device(AVCodecContext *avctx, int idx)
 NvencDynLoadFunctions *dl_fn = &ctx->nvenc_dload_funcs;
 NV_ENCODE_API_FUNCTION_LIST *p_nvenc = &dl_fn->nvenc_funcs;
 char name[128] = { 0};
+char device_str[20];
 int major, minor, ret;
 CUresult cu_res;
 CUdevice cu_device;
 CUcontext dummy;
+AVHWDeviceContext *device_ctx;
+AVCUDADeviceContext *device_hwctx;
+
 int loglevel = AV_LOG_VERBOSE;
 
 if (ctx->device == LIST_DEVICES)
@@ -364,19 +368,19 @@ static av_cold int nvenc_check_device(AVCodecContext *avctx, int idx)
 if (ctx->device != idx && ctx->device != ANY_DEVICE)
 return -1;
 
-cu_res = dl_fn->cuda_dl->cuCtxCreate(&ctx->cu_context_internal, 0, cu_device);
-if (cu_res != CUDA_SUCCESS) {
-av_log(avctx, AV_LOG_FATAL, "Failed creating CUDA context for NVENC: 0x%x\n", (int)cu_res);
+if (ctx->device == ANY_DEVICE)
+ctx->device = 0;
+
+sprintf(device_str, "%d", ctx->device);
+
+ret = av_hwdevice_ctx_create(&ctx->hwdevice, AV_HWDEVICE_TYPE_CUDA, device_str, NULL, 0);
+if (ret < 0)
 goto fail;
-}
 
-ctx->cu_context = ctx->cu_context_internal;
+device_ctx = (AVHWDeviceContext *)ctx->hwdevice->data;
+device_hwctx = device_ctx->hwctx;
 
-cu_res = dl_fn->cuda_dl->cuCtxPopCurrent(&dummy);
-if (cu_res != CUDA_SUCCESS) {
-av_log(avctx, AV_LOG_FATAL, "Failed popping CUDA context: 0x%x\n", (int)cu_res);
-goto fail2;
-}
+ctx->cu_context = device_hwctx->cuda_ctx;
 
 if ((ret = nvenc_open_session(avctx)) < 0)
 goto fail2;
@@ -408,8 +412,8 @@ fail3:
 }
 
 fail2:
-dl_fn->cuda_dl->cuCtxDestroy(ctx->cu_context_internal);
-ctx->cu_context_internal = NULL;
+av_buffer_unref(&ctx->hwdevice);
+ctx->cu_context = NULL;
 
 fail:
 return AVERROR(ENOSYS);
@@ -1374,9 +1378,8 @@ av_cold int ff_nvenc_encode_close(AVCodecContext *avctx)
 return AVERROR_EXTERNAL;
 }
 
-if (ctx->cu_context_internal)
-dl_fn->cuda_dl->cuCtxDestroy(ctx->cu_context_internal);
-ctx->cu_context = ctx->cu_context_internal = NULL;
+av_buffer_unref(&ctx->hwdevice);
+ctx->cu_context = NULL;
 
 nvenc_free_functions(&dl_fn->nvenc_dl);
 cuda_free_functions(&dl_fn->cuda_dl);
diff --git a/libavcodec/nvenc.h b/libavcodec/nvenc.h
index 2e24604..327c914 100644
--- a/libavcodec/nvenc.h
+++ b/libavcodec/nvenc.h
@@ -106,7 +106,6 @@ typedef struct NvencContext
 NV_ENC_INITIALIZE_PARAMS init_encode_params;
 NV_ENC_C

Re: [FFmpeg-devel] [PATCH] fate: fix source test for sofa2wavs

2017-06-12 Thread Paul B Mahol

On 6/12/17, Michael Bradshaw  wrote:
> On Mon, Jun 12, 2017 at 11:14 AM, James Almer  wrote:
>> On 6/12/2017 3:07 PM, Paul B Mahol wrote:
>>> OK, please apply.
>>
>> Wouldn't it be better/proper to add the license header to the file
>> instead?
>
> Yeah. Paul, as the proper copyright holder of the sofa2wavs.c file,
> could you add the license header?

Done.

Re: [FFmpeg-devel] [PATCH] fate: add test for -time_base option

2017-06-12 Thread Michael Bradshaw

On Sat, Jun 10, 2017 at 8:19 AM, James Almer  wrote:
> Is mxf as output needed for this?

mxf was one of the (relatively few) muxers I knew of that
would utilize the provided time base. It's not strictly needed.

> If not, the framemd5() or framecrc()
> fate functions (which use the muxers of the same name) would be a better
> test. Those report the output timebase in a quick and easily readable way.

Thanks for pointing that out! I didn't know they reported the time
base in addition to the MD5 of the frames. I've attached a patch that
changes the fate test to use framemd5. Please review.

0001-fate-use-framemd5-for-time_base-testing.patch
Description: Binary data

Re: [FFmpeg-devel] [PATCH] fate: fix source test for sofa2wavs

2017-06-12 Thread Michael Bradshaw

On Mon, Jun 12, 2017 at 11:14 AM, James Almer  wrote:
> On 6/12/2017 3:07 PM, Paul B Mahol wrote:
>> OK, please apply.
>
> Wouldn't it be better/proper to add the license header to the file instead?

Yeah. Paul, as the proper copyright holder of the sofa2wavs.c file,
could you add the license header?

Re: [FFmpeg-devel] [PATCH] fate: fix source test for sofa2wavs

2017-06-12 Thread James Almer

On 6/12/2017 3:07 PM, Paul B Mahol wrote:
> On 6/12/17, Michael Bradshaw  wrote:
>> commit 1a30bf60be9243830b68e8fe2e20539f08a85926 added the sofa2wavs
>> tool, which breaks fate:
>> $ make fate V=1
>> TESTsource
>> ./tests/fate-run.sh fate-source "samples" "" "/Users/mjbshaw/ffmpeg"
>> 'runlocal fate/source-check.sh' '' '' '' '1' '' '' '' '' '' '' '' ''
>> ''
>> ./tests/fate/source-check.sh ./tests
>> --- ./tests/ref/fate/source 2017-06-12 10:34:46.0 -0700
>> +++ tests/data/fate/source 2017-06-12 10:48:27.0 -0700
>> @@ -12,6 +12,7 @@
>>  libavformat/log2_tab.c
>>  libswresample/log2_tab.c
>>  libswscale/log2_tab.c
>> +tools/sofa2wavs.c
>>  tools/uncoded_frame.c
>>  tools/yuvcmp.c
>>  Headers without standard inclusion guards:
>> Test source failed. Look at tests/data/fate/source.err for details.
>> make: *** [fate-source] Error 1
>>
>> Attached patch updates fate to include the sofa2wavs tool. Please review.
> 
> OK, please apply.

Wouldn't it be better/proper to add the license header to the file instead?

[FFmpeg-devel] [PATCH] x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse, sse4}

2017-06-12 Thread James Almer

About 2x faster than the c version.

Signed-off-by: James Almer 
---
 libavcodec/x86/aacpsdsp.asm| 123 +
 libavcodec/x86/aacpsdsp_init.c |   8 +++
 libavutil/x86/x86util.asm  |  15 +++--
 3 files changed, 140 insertions(+), 6 deletions(-)

diff --git a/libavcodec/x86/aacpsdsp.asm b/libavcodec/x86/aacpsdsp.asm
index f7f22f274c..cdcadefcdc 100644
--- a/libavcodec/x86/aacpsdsp.asm
+++ b/libavcodec/x86/aacpsdsp.asm
@@ -172,6 +172,129 @@ align 16
 .ret:
 REP_RET
 
+;***
+;void ps_hybrid_synthesis_deint_sse4(float out[2][38][64],
+;float (*in)[32][2],
+;int i, int len)
+;***
+%macro HYBRID_SYNTHESIS_DEINT 0
+cglobal ps_hybrid_synthesis_deint, 3, 7, 5, out, in, i, len, out0, out1, tmp
+%if cpuflag(sse4)
+%define MOVH movsd
+%else
+%define MOVH movlps
+%endif
+movsxdifnidniq, id
+mov   lend, 32 << 3
+lea   outq, [outq+iq*4]
+mov   tmpd, id
+shl   tmpd, 8
+addinq, tmpq
+mov   tmpd, 64
+sub   tmpd, id
+mov id, tmpd
+
+testid, 1
+jne .loop4
+testid, 2
+jne .loop8
+
+align 16
+.loop16:
+mov  out0q, outq
+mov  out1q, 38*64*4
+add  out1q, out0q
+mov   tmpd, lend
+
+.inner_loop16:
+movaps  m0, [inq]
+movaps  m1, [inq+lenq]
+movaps  m2, [inq+lenq*2]
+movaps  m3, [inq+3*32*2*4]
+TRANSPOSE4x4PS 0, 1, 2, 3, 4
+movaps [out0q], m0
+movaps [out1q], m1
+movaps[out0q+lenq], m2
+movaps[out1q+lenq], m3
+lea  out0q, [out0q+lenq*2]
+lea  out1q, [out1q+lenq*2]
+addinq, mmsize
+sub   tmpd, mmsize
+jg .inner_loop16
+add   outq, 16
+addinq, 3*32*2*4
+sub id, 4
+jg .loop16
+RET
+
+align 16
+.loop8:
+mov  out0q, outq
+mov  out1q, 38*64*4
+add  out1q, out0q
+mov   tmpd, lend
+
+.inner_loop8:
+movaps  m0, [inq]
+movaps  m1, [inq+lenq]
+SBUTTERFLYPS 0, 1, 2
+SBUTTERFLYPD 0, 1, 2
+MOVH   [out0q], m0
+MOVH   [out1q], m1
+movhps[out0q+lenq], m0
+movhps[out1q+lenq], m1
+lea  out0q, [out0q+lenq*2]
+lea  out1q, [out1q+lenq*2]
+addinq, mmsize
+sub   tmpd, mmsize
+jg .inner_loop8
+add   outq, 8
+addinq, lenq
+sub id, 2
+jg .loop16
+RET
+
+align 16
+.loop4:
+mov  out0q, outq
+mov  out1q, 38*64*4
+add  out1q, out0q
+mov   tmpd, lend
+
+.inner_loop4:
+movaps  m0, [inq]
+movss  [out0q], m0
+%if cpuflag(sse4)
+extractps  [out1q], m0, 1
+extractps [out0q+lenq], m0, 2
+extractps [out1q+lenq], m0, 3
+%else
+movhlps m1, m0
+movss [out0q+lenq], m1
+shufps  m0, m0, 0xb1
+movss  [out1q], m0
+movhlps m1, m0
+movss [out1q+lenq], m1
+%endif
+lea  out0q, [out0q+lenq*2]
+lea  out1q, [out1q+lenq*2]
+addinq, mmsize
+sub   tmpd, mmsize
+jg .inner_loop4
+add   outq, 4
+sub id, 1
+testid, 2
+jne .loop8
+cmp id, 4
+jge .loop16
+RET
+%endmacro
+
+INIT_XMM sse
+HYBRID_SYNTHESIS_DEINT
+INIT_XMM sse4
+HYBRID_SYNTHESIS_DEINT
+
 ;***
 ;void ff_ps_hybrid_analysis_(float (*out)[2], float (*in)[2],
 ; const float (*filter)[8][2],
diff --git a/libavcodec/x86/aacpsdsp_init.c b/libavcodec/x86/aacpsdsp_init.c
index 767ae6588e..25e089c395 100644
--- a/libavcodec/x86/aacpsdsp_init.c
+++ b/libavcodec/x86/aacpsdsp_init.c
@@ -40,6 +40,10 @@ void ff_ps_stereo_interpolate_sse3(float (*l)[2], float (*r)[2],
 void ff_ps_stereo_interpolate_ipdopd_sse3(float (*l)[2], float (*r)[2],
   float h[2][4], float h_step[2][4],
   int len);
+void ff_ps_hybrid_synthesis_deint_sse(float out[2][38][64], float (*in)[32][2],
+  int i, int len);
+void ff_ps_hybrid_synthesis_deint_sse4(float out[2][38][64], float (*in)[32][2],
+   int i, int len);
 
 av_cold void ff_psdsp_init_x86(PSDSPContext *s)
 {
@@ -48,6 +52,7 @@ av_cold voi
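
For readers following the assembly, the operation it vectorizes can be sketched in scalar C. This is a reading of the prototype and the address arithmetic in the patch (in advances by i*32*2*4 bytes, out by i*4 bytes, and the second output plane sits 38*64*4 bytes after the first), not a copy of FFmpeg's C implementation; the function name is hypothetical.

```c
#include <assert.h>

/* Scalar sketch: for each band from i up to 63 and each sample n in
 * [0, len), split the interleaved (re, im) pair in[i][n] into the two
 * planes out[0] and out[1]. */
static void hybrid_synthesis_deint_ref(float out[2][38][64],
                                       float (*in)[32][2],
                                       int i, int len)
{
    assert(i >= 0 && len >= 0 && len <= 32);
    for (; i < 64; i++) {
        for (int n = 0; n < len; n++) {
            out[0][n][i] = in[i][n][0];   /* real part      */
            out[1][n][i] = in[i][n][1];   /* imaginary part */
        }
    }
}
```

Seen this way, the .loop16 path is a 4x4 transpose over four consecutive bands, while .loop8 and .loop4 handle two bands and one band respectively.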

Re: [FFmpeg-devel] [PATCH] fate: fix source test for sofa2wavs

2017-06-12 Thread Paul B Mahol

On 6/12/17, Michael Bradshaw  wrote:
> commit 1a30bf60be9243830b68e8fe2e20539f08a85926 added the sofa2wavs
> tool, which breaks fate:
> $ make fate V=1
> TESTsource
> ./tests/fate-run.sh fate-source "samples" "" "/Users/mjbshaw/ffmpeg"
> 'runlocal fate/source-check.sh' '' '' '' '1' '' '' '' '' '' '' '' ''
> ''
> ./tests/fate/source-check.sh ./tests
> --- ./tests/ref/fate/source 2017-06-12 10:34:46.0 -0700
> +++ tests/data/fate/source 2017-06-12 10:48:27.0 -0700
> @@ -12,6 +12,7 @@
>  libavformat/log2_tab.c
>  libswresample/log2_tab.c
>  libswscale/log2_tab.c
> +tools/sofa2wavs.c
>  tools/uncoded_frame.c
>  tools/yuvcmp.c
>  Headers without standard inclusion guards:
> Test source failed. Look at tests/data/fate/source.err for details.
> make: *** [fate-source] Error 1
>
> Attached patch updates fate to include the sofa2wavs tool. Please review.

OK, please apply.

[FFmpeg-devel] [PATCH] fate: fix source test for sofa2wavs

2017-06-12 Thread Michael Bradshaw

commit 1a30bf60be9243830b68e8fe2e20539f08a85926 added the sofa2wavs
tool, which breaks fate:
$ make fate V=1
TESTsource
./tests/fate-run.sh fate-source "samples" "" "/Users/mjbshaw/ffmpeg"
'runlocal fate/source-check.sh' '' '' '' '1' '' '' '' '' '' '' '' ''
''
./tests/fate/source-check.sh ./tests
--- ./tests/ref/fate/source 2017-06-12 10:34:46.0 -0700
+++ tests/data/fate/source 2017-06-12 10:48:27.0 -0700
@@ -12,6 +12,7 @@
 libavformat/log2_tab.c
 libswresample/log2_tab.c
 libswscale/log2_tab.c
+tools/sofa2wavs.c
 tools/uncoded_frame.c
 tools/yuvcmp.c
 Headers without standard inclusion guards:
Test source failed. Look at tests/data/fate/source.err for details.
make: *** [fate-source] Error 1

Attached patch updates fate to include the sofa2wavs tool. Please review.

Thanks,

--Michael


0001-fate-fix-source-test-for-sofa2wavs.patch
Description: Binary data

Re: [FFmpeg-devel] [PATCH 0/6] sse2/avx functions for 8-bit simple idct

2017-06-12 Thread Michael Niedermayer

On Mon, Jun 12, 2017 at 03:36:03PM +0200, James Darnley wrote:
> I think I have reached the final state for these patches.  There has been
> little change to the 1st, 3rd, 4th, and 5th.
>
> The 2nd adds an option to explicitly control what the macro does after the
> IDCT.  This allows the small optimisation for 8-bit of not storing the data
> back to the source block.
> 

> The 6th lets the IDCT use the slightly different coefficients to get exact
> output compared with the MMX original.  This is rather messy but I think it is
> slightly better than trying to alter the code macro.  A word diff looks much
> cleaner than the line diff git uses by default.

I see no changes in FATE.
Is the changed code not tested in any FATE test? (If so, please
add one.) Quite possibly I also misunderstand, as I didn't look at
what exactly is slightly different.

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The bravest are surely those who have the clearest vision
of what is before them, glory and danger alike, and yet
notwithstanding go out to meet it. -- Thucydides



Re: [FFmpeg-devel] [PATCH 3/6] avcodec/x86: add x86-64 8-bit simple_idct function

2017-06-12 Thread Michael Niedermayer

On Mon, Jun 12, 2017 at 03:36:06PM +0200, James Darnley wrote:
> Rounding contributed by Ronald S. Bultje
> ---
>  libavcodec/tests/x86/dct.c   |  2 ++
>  libavcodec/x86/idctdsp_init.c| 19 +++
>  libavcodec/x86/simple_idct.h |  3 +++
>  libavcodec/x86/simple_idct10.asm |  8 
>  4 files changed, 32 insertions(+)

This patch (3) and patches 1 and 2 break the IDCT:

./ffplay ~/videos/matrixbench_mpeg2.mpg
looks pretty bad.

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato



Re: [FFmpeg-devel] [PATCH] avcodec/vp9: ipred_dr_16x16_16 avx2 implementation

2017-06-12 Thread Ronald S. Bultje

Hi,

On Sat, Jun 10, 2017 at 6:01 AM, Ilia Valiakhmetov wrote:

> Signed-off-by: Ilia Valiakhmetov 
> ---
>  libavcodec/x86/vp9dsp_init_16bpp.c|  2 ++
>  libavcodec/x86/vp9intrapred_16bpp.asm | 56 +++
>  2 files changed, 58 insertions(+)
>
> diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c b/libavcodec/x86/vp9dsp_init_16bpp.c
> index d1b8fcd..8d1aa13 100644
> --- a/libavcodec/x86/vp9dsp_init_16bpp.c
> +++ b/libavcodec/x86/vp9dsp_init_16bpp.c
> @@ -52,6 +52,7 @@ decl_ipred_fns(dc,  16, mmxext, sse2);
>  decl_ipred_fns(dc_top,  16, mmxext, sse2);
>  decl_ipred_fns(dc_left, 16, mmxext, sse2);
>  decl_ipred_fn(dl,   16, 16, avx2);
> +decl_ipred_fn(dr,   16, 16, avx2);
>  decl_ipred_fn(dl,   32, 16, avx2);
>
>  #define decl_ipred_dir_funcs(type) \
> @@ -136,6 +137,7 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext *dsp)
>  init_fpel_func(1, 1,  64, avg, _16, avx2);
>  init_fpel_func(0, 1, 128, avg, _16, avx2);
>  init_ipred_func(dl, DIAG_DOWN_LEFT, 16, 16, avx2);
> +init_ipred_func(dr, DIAG_DOWN_RIGHT, 16, 16, avx2);
>  init_ipred_func(dl, DIAG_DOWN_LEFT, 32, 16, avx2);
>  }
>
> diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm b/libavcodec/x86/vp9intrapred_16bpp.asm
> index 92333bc..764f704 100644
> --- a/libavcodec/x86/vp9intrapred_16bpp.asm
> +++ b/libavcodec/x86/vp9intrapred_16bpp.asm
> @@ -1170,6 +1170,62 @@ DR_FUNCS 2
>  INIT_XMM avx
>  DR_FUNCS 2
>
> +%if HAVE_AVX2_EXTERNAL
> +INIT_YMM avx2
> +cglobal vp9_ipred_dr_16x16_16, 4, 5, 6, dst, stride, l, a
> +movam0, [lq]   ; klmnopqrstuvwxyz
> +movum1, [aq-2] ; *abcdefghijklmno
> +movam2, [aq]   ; abcdefghijklmnop
> +vperm2i128  m4, m2, m2, q2001  ; ijklmnop
> +vpalignrm5, m4, m2, 2  ; bcdefghijklmnop.
> +vperm2i128  m3, m0, m1, q0201  ; stuvwxyz*abcdefg
> +LOWPASS  1,  2,  5 ; ABCDEFGHIJKLMNO.
> +vpalignrm4, m3, m0, 2  ; lmnopqrstuvwxyz*
> +vpalignrm5, m3, m0, 4  ; mnopqrstuvwxyz*a
> +LOWPASS  0,  4,  5 ; LMNOPQRSTUVWXYZ#
> +vperm2i128  m5, m0, m1, q0201  ; TUVWXYZ#ABCDEFGH
> +DEFINE_ARGS dst, stride, stride3, stride5, dst3
> +lea  dst3q, [dstq+strideq*4]
> +lea   stride3q, [strideq*3]
> +lea   stride5q, [stride3q+strideq*2]
> +
> +vpalignrm3, m5, m0, 2
> +vpalignrm4, m1, m5, 2
> +mova[dst3q+stride5q*2], m3 ; 14
> +mova[ dstq+stride3q*2], m4 ; 6
> +vpalignrm3, m5, m0, 4
> +vpalignrm4, m1, m5, 4
> +sub  dst3q, strideq
> +mova[dst3q+stride5q*2], m3 ; 13
> +mova[dst3q+strideq*2 ], m4 ; 5
> +mova[dst3q+stride3q*4], m0 ; 15
> +vpalignrm3, m5, m0, 6
> +vpalignrm4, m1, m5, 6
> +mova [dstq+stride3q*4], m3 ; 12
> +mova [dst3q+strideq*1], m4 ; 4
> +vpalignrm3, m5, m0, 8
> +vpalignrm4, m1, m5, 8
> +mova [dst3q+strideq*8], m3 ; 11
> +mova [dst3q+strideq*0], m4 ; 3
> +vpalignrm3, m5, m0, 10
> +vpalignrm4, m1, m5, 10
> +mova [dstq+stride5q*2], m3 ; 10
> +mova [dstq+strideq*2 ], m4 ; 2
> +vpalignrm3, m5, m0, 12
> +vpalignrm4, m1, m5, 12
> +mova[dst3q+stride3q*2], m3 ; 9
> +mova [dstq+strideq*1 ], m4 ; 1
> +vpalignrm3, m5, m0, 14
> +vpalignrm4, m1, m5, 14
> +mova  [dstq+strideq*8], m3 ; 8
> +mova  [dstq+strideq*0], m4 ; 0
> +sub   dstq, strideq
> +mova [dst3q+strideq*4], m5 ; 7
> +mova [ dstq+strideq*0], m1 ; -1
> +RET
> +%endif
> +
> +
>  %macro VL_FUNCS 1 ; stack_mem_for_32x32_32bit_function
>  cglobal vp9_ipred_vl_4x4_16, 2, 4, 3, dst, stride, l, a
>  movifnidn   aq, amp
> --
> 2.8.3
>

Pushed, thanks.

Ronald

[FFmpeg-devel] [PATCH 3/6] avcodec/x86: add x86-64 8-bit simple_idct function

2017-06-12 Thread James Darnley

Rounding contributed by Ronald S. Bultje
---
 libavcodec/tests/x86/dct.c   |  2 ++
 libavcodec/x86/idctdsp_init.c| 19 +++
 libavcodec/x86/simple_idct.h |  3 +++
 libavcodec/x86/simple_idct10.asm |  8 
 4 files changed, 32 insertions(+)

diff --git a/libavcodec/tests/x86/dct.c b/libavcodec/tests/x86/dct.c
index 34f5b8767b..317d973f9f 100644
--- a/libavcodec/tests/x86/dct.c
+++ b/libavcodec/tests/x86/dct.c
@@ -88,10 +88,12 @@ static const struct algo idct_tab_arch[] = {
 #if HAVE_YASM
 #if ARCH_X86_64
 #if HAVE_SSE2_EXTERNAL
+{ "SIMPLE8-SSE2",   ff_simple_idct8_sse2,  FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_SSE2},
 { "SIMPLE10-SSE2",  ff_simple_idct10_sse2, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_SSE2},
 { "SIMPLE12-SSE2",  ff_simple_idct12_sse2, FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_SSE2, 1 },
 #endif
 #if HAVE_AVX_EXTERNAL
+{ "SIMPLE8-AVX",ff_simple_idct8_avx,   FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_AVX},
 { "SIMPLE10-AVX",   ff_simple_idct10_avx,  FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_AVX},
 { "SIMPLE12-AVX",   ff_simple_idct12_avx,  FF_IDCT_PERM_TRANSPOSE, AV_CPU_FLAG_AVX,  1 },
 #endif
diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index f1c915aa00..4b2145e478 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -94,9 +94,28 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
 c->idct_add  = ff_simple_idct_add_sse2;
 c->perm_type = FF_IDCT_PERM_SIMPLE;
 }
+
+if (ARCH_X86_64 &&
+!high_bit_depth &&
+avctx->lowres == 0 &&
+(avctx->idct_algo == FF_IDCT_AUTO ||
+avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
+c->idct  = ff_simple_idct8_sse2;
+c->perm_type = FF_IDCT_PERM_TRANSPOSE;
+}
 }
 
 if (ARCH_X86_64 && avctx->lowres == 0) {
+if (EXTERNAL_AVX(cpu_flags) &&
+!high_bit_depth &&
+(avctx->idct_algo == FF_IDCT_AUTO ||
+avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
+avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
+c->idct  = ff_simple_idct8_avx;
+c->perm_type = FF_IDCT_PERM_TRANSPOSE;
+}
+
 if (avctx->bits_per_raw_sample == 10 &&
 (avctx->idct_algo == FF_IDCT_AUTO ||
  avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index d17ef6a462..d17a855312 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -29,6 +29,9 @@ void ff_simple_idct_put_mmx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 
+void ff_simple_idct8_sse2(int16_t *block);
+void ff_simple_idct8_avx(int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);
 
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index 1a5a2eae9b..168b6a08e0 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -33,9 +33,11 @@ cextern pw_2
 cextern pw_16
 cextern pw_1023
 cextern pw_4095
+pd_round_11: times 4 dd 1<<(11-1)
 pd_round_12: times 4 dd 1<<(12-1)
 pd_round_15: times 4 dd 1<<(15-1)
 pd_round_19: times 4 dd 1<<(19-1)
+pd_round_20: times 4 dd 1<<(20-1)
 
 %macro CONST_DEC  3
 const %1
@@ -50,6 +52,8 @@ times 4 dw %2, %3
 %define W6sh2  8867 ; W6 = 35468 =  8867<<2
 %define W7sh2  4520 ; W7 = 18081 =  4520<<2 + 1
 
+pw_round_20_div_w4: times 8 dw ((1 << (20 - 1)) / W4sh2)
+
 CONST_DEC  w4_plus_w2,   W4sh2, +W2sh2
 CONST_DEC  w4_min_w2,W4sh2, -W2sh2
 CONST_DEC  w4_plus_w6,   W4sh2, +W6sh2
@@ -68,6 +72,10 @@ CONST_DEC  w7_min_w5,W7sh2, -W5sh2
 SECTION .text
 
 %macro idct_fn 0
+cglobal simple_idct8, 1, 1, 16, block
+IDCT_FN"", 11, pw_round_20_div_w4, 20, "store"
+RET
+
 cglobal simple_idct10, 1, 1, 16, block
 IDCT_FN"", 12, "", 19, "store"
 RET
-- 
2.13.0
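
The new constants follow the usual rounded-shift pattern: pd_round_11 is half of 1 << 11, added before shifting right by 11. The name pw_round_20_div_w4 suggests the column bias is pre-divided by W4 so that it can be added to the DC coefficient before the multiply; that is one plausible reading and is not confirmed by the hunks shown here. A small C sketch of both ideas, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Rounded right shift: add half of the divisor before shifting, which is
 * what a bias of 1 << (shift - 1), such as pd_round_11, provides.
 * Assumes arithmetic right shift for negative values, as on mainstream
 * compilers. */
static int32_t round_shift(int32_t x, int shift)
{
    assert(shift >= 1 && shift <= 30);
    return (x + (INT32_C(1) << (shift - 1))) >> shift;
}

/* Bias folded into the input: adding bias / w to the coefficient before
 * multiplying by w equals adding bias after the multiply, provided w
 * divides bias exactly. */
static int32_t dc_term_with_folded_bias(int32_t dc, int32_t w, int32_t bias)
{
    assert(w != 0 && bias % w == 0);
    return (dc + bias / w) * w;
}
```

For example, if W4sh2 were 16384 (an assumed value; its definition is not part of the quoted hunks), (1 << 19) / W4sh2 would be 32, and dc_term_with_folded_bias(5, 16384, 1 << 19) would equal 5 * 16384 + (1 << 19).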


[FFmpeg-devel] [PATCH 2/6] avcodec/x86: modify simple_idct10 macros to add an action parameter

2017-06-12 Thread James Darnley

---
 libavcodec/x86/proresdsp.asm  |  2 +-
 libavcodec/x86/simple_idct10.asm  |  8 +++
 libavcodec/x86/simple_idct10_template.asm | 37 +--
 3 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/x86/proresdsp.asm
index 8318a81c5e..3be0ff7757 100644
--- a/libavcodec/x86/proresdsp.asm
+++ b/libavcodec/x86/proresdsp.asm
@@ -52,7 +52,7 @@ SECTION .text
 
 %macro idct_fn 0
 cglobal prores_idct_put_10, 4, 4, 15, pixels, lsize, block, qmat
-IDCT_FNpw_1, 15, pw_88, 18, pw_4, pw_1019, r3
+IDCT_FNpw_1, 15, pw_88, 18, "put", pw_4, pw_1019, r3
 RET
 %endmacro
 
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index 7cfd33eaa3..1a5a2eae9b 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -69,24 +69,24 @@ SECTION .text
 
 %macro idct_fn 0
 cglobal simple_idct10, 1, 1, 16, block
-IDCT_FN"", 12, "", 19
+IDCT_FN"", 12, "", 19, "store"
 RET
 
 cglobal simple_idct10_put, 3, 3, 16, pixels, lsize, block
-IDCT_FN"", 12, "", 19, 0, pw_1023
+IDCT_FN"", 12, "", 19, "put", 0, pw_1023
 RET
 
 cglobal simple_idct12, 1, 1, 16, block
 ; coeffs are already 15bits, adding the offset would cause
 ; overflow in the input
-IDCT_FN"", 15, pw_2, 16
+IDCT_FN"", 15, pw_2, 16, "store"
 RET
 
 cglobal simple_idct12_put, 3, 3, 16, pixels, lsize, block
 ; range isn't known, so the C simple_idct range is used
 ; Also, using a bias on input overflows, so use the bias
 ; on output of the first butterfly instead
-IDCT_FN"", 15, pw_2, 16, 0, pw_4095
+IDCT_FN"", 15, pw_2, 16, "put", 0, pw_4095
 RET
 %endmacro
 
diff --git a/libavcodec/x86/simple_idct10_template.asm b/libavcodec/x86/simple_idct10_template.asm
index 3f398985a5..8367011dfd 100644
--- a/libavcodec/x86/simple_idct10_template.asm
+++ b/libavcodec/x86/simple_idct10_template.asm
@@ -218,11 +218,12 @@
 ; %2 = row bias macro
 ; %3 = column shift
 ; %4 = column bias macro
-; %5 = min pixel value
-; %6 = max pixel value
-; %7 = qmat (for prores)
+; %5 = final action (nothing, "store", "put", "add")
+; %6 = min pixel value
+; %7 = max pixel value
+; %8 = qmat (for prores)
 
-%macro IDCT_FN 4-7
+%macro IDCT_FN 4-8
 ; for (i = 0; i < 8; i++)
 ; idctRowCondDC(block + i*8);
 movam10,[blockq+ 0]; { row[0] }[0-7]
@@ -230,13 +231,13 @@
 movam13,[blockq+64]; { row[4] }[0-7]
 movam12,[blockq+96]; { row[6] }[0-7]
 
-%if %0 == 7
-pmullw  m10,[%7+ 0]
-pmullw  m8, [%7+32]
-pmullw  m13,[%7+64]
-pmullw  m12,[%7+96]
+%if %0 == 8
+pmullw  m10,[%8+ 0]
+pmullw  m8, [%8+32]
+pmullw  m13,[%8+64]
+pmullw  m12,[%8+96]
 
-IDCT_1D %1, %2, %7
+IDCT_1D %1, %2, %8
 %else
 IDCT_1D %1, %2
 %endif
@@ -257,7 +258,8 @@
 IDCT_1D %3, %4
 
 ; clip/store
-%if %0 == 4
+%if %0 >= 5
+%ifidn %5,"store"
 ; No clamping, means pure idct
 mova  [blockq+  0], m8
 mova  [blockq+ 16], m0
@@ -267,13 +269,13 @@
 mova  [blockq+ 80], m11
 mova  [blockq+ 96], m9
 mova  [blockq+112], m10
-%else
-%ifidn %5, 0
+%elifidn %5,"put"
+%ifidn %6, 0
 pxorm3, m3
 %else
-movam3, [%5]
-%endif
-movam5, [%6]
+movam3, [%6]
+%endif ; ifidn %6, 0
+movam5, [%7]
 pmaxsw  m8,  m3
 pmaxsw  m0,  m3
 pmaxsw  m1,  m3
@@ -301,7 +303,8 @@
 mova  [r0+r1  ], m11
 mova  [r0+r1*2], m9
 mova  [r0+r2  ], m10
-%endif
+%endif ; %5 action
+%endif; if %0 >= 5
 %endmacro
 
 %endif
-- 
2.13.0


[FFmpeg-devel] [PATCH 4/6] avcodec/x86: add x86-64 8-bit simple_idct put function

2017-06-12 Thread James Darnley

---
 libavcodec/x86/idctdsp_init.c|  2 ++
 libavcodec/x86/simple_idct.h |  3 +++
 libavcodec/x86/simple_idct10.asm | 23 +++
 3 files changed, 28 insertions(+)

diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 4b2145e478..1826d01e0e 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -102,6 +102,7 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
 avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
 avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
 c->idct  = ff_simple_idct8_sse2;
+c->idct_put  = ff_simple_idct8_put_sse2;
 c->perm_type = FF_IDCT_PERM_TRANSPOSE;
 }
 }
@@ -113,6 +114,7 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
 avctx->idct_algo == FF_IDCT_SIMPLEAUTO ||
 avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
 c->idct  = ff_simple_idct8_avx;
+c->idct_put  = ff_simple_idct8_put_avx;
 c->perm_type = FF_IDCT_PERM_TRANSPOSE;
 }
 
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index d17a855312..b559f8527c 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -32,6 +32,9 @@ void ff_simple_idct_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block)
 void ff_simple_idct8_sse2(int16_t *block);
 void ff_simple_idct8_avx(int16_t *block);
 
+void ff_simple_idct8_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+void ff_simple_idct8_put_avx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);
 
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index 168b6a08e0..f31fb5cfa5 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -71,11 +71,34 @@ CONST_DEC  w7_min_w5,W7sh2, -W5sh2
 
 SECTION .text
 
+%macro STORE_HI_LO 12
+movq   %1, %9
+movq   %3, %10
+movq   %5, %11
+movq   %7, %12
+movhps %2, %9
+movhps %4, %10
+movhps %6, %11
+movhps %8, %12
+%endmacro
+
 %macro idct_fn 0
 cglobal simple_idct8, 1, 1, 16, block
 IDCT_FN"", 11, pw_round_20_div_w4, 20, "store"
 RET
 
+; TODO: optimise by not writing the final data to the block.
+cglobal simple_idct8_put, 3, 4, 16, pixels, lsize, block
+IDCT_FN"", 11, pw_round_20_div_w4, 20
+lea   r3, [3*lsizeq]
+lea   r2, [pixelsq + r3]
+packuswb  m8, m0
+packuswb  m1, m2
+packuswb  m4, m11
+packuswb  m9, m10
+STORE_HI_LO PASS8ROWS(pixelsq, r2, lsizeq, r3), m8, m1, m4, m9
+RET
+
 cglobal simple_idct10, 1, 1, 16, block
 IDCT_FN"", 12, "", 19, "store"
 RET
-- 
2.13.0

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH 5/6] avcodec/x86: add x86-64 8-bit simple_idct add function

2017-06-12 Thread James Darnley

---
 libavcodec/x86/idctdsp_init.c|  2 ++
 libavcodec/x86/simple_idct.h |  3 ++
 libavcodec/x86/simple_idct10.asm | 61 
 3 files changed, 66 insertions(+)

diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c
index 1826d01e0e..9da60d1a1e 100644
--- a/libavcodec/x86/idctdsp_init.c
+++ b/libavcodec/x86/idctdsp_init.c
@@ -103,6 +103,7 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
 avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
 c->idct  = ff_simple_idct8_sse2;
 c->idct_put  = ff_simple_idct8_put_sse2;
+c->idct_add  = ff_simple_idct8_add_sse2;
 c->perm_type = FF_IDCT_PERM_TRANSPOSE;
 }
 }
@@ -115,6 +116,7 @@ av_cold void ff_idctdsp_init_x86(IDCTDSPContext *c, AVCodecContext *avctx,
 avctx->idct_algo == FF_IDCT_SIMPLEMMX)) {
 c->idct  = ff_simple_idct8_avx;
 c->idct_put  = ff_simple_idct8_put_avx;
+c->idct_add  = ff_simple_idct8_add_avx;
 c->perm_type = FF_IDCT_PERM_TRANSPOSE;
 }
 
diff --git a/libavcodec/x86/simple_idct.h b/libavcodec/x86/simple_idct.h
index b559f8527c..9b64cfe9bc 100644
--- a/libavcodec/x86/simple_idct.h
+++ b/libavcodec/x86/simple_idct.h
@@ -35,6 +35,9 @@ void ff_simple_idct8_avx(int16_t *block);
 void ff_simple_idct8_put_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 void ff_simple_idct8_put_avx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
 
+void ff_simple_idct8_add_sse2(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+void ff_simple_idct8_add_avx(uint8_t *dest, ptrdiff_t line_size, int16_t *block);
+
 void ff_simple_idct10_sse2(int16_t *block);
 void ff_simple_idct10_avx(int16_t *block);
 
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index f31fb5cfa5..29e18fe6a6 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -82,6 +82,31 @@ SECTION .text
 movhps %8, %12
 %endmacro
 
+%macro LOAD_ZXBW_8 16
+pmovzxbw %1, %9
+pmovzxbw %2, %10
+pmovzxbw %3, %11
+pmovzxbw %4, %12
+pmovzxbw %5, %13
+pmovzxbw %6, %14
+pmovzxbw %7, %15
+pmovzxbw %8, %16
+%endmacro
+
+%macro LOAD_ZXBW_4 9
+movh %1, %5
+movh %2, %6
+movh %3, %7
+movh %4, %8
+punpcklbw %1, %9
+punpcklbw %2, %9
+punpcklbw %3, %9
+punpcklbw %4, %9
+%endmacro
+
+%define PASS4ROWS(base, stride, stride3) \
+[base], [base + stride], [base + 2*stride], [base + stride3]
+
 %macro idct_fn 0
 cglobal simple_idct8, 1, 1, 16, block
 IDCT_FN"", 11, pw_round_20_div_w4, 20, "store"
@@ -99,6 +124,42 @@ cglobal simple_idct8_put, 3, 4, 16, pixels, lsize, block
 STORE_HI_LO PASS8ROWS(pixelsq, r2, lsizeq, r3), m8, m1, m4, m9
 RET
 
+; TODO: optimise by not writing the final data to the block.
+cglobal simple_idct8_add, 3, 4, 16, pixels, lsize, block
+IDCT_FN"", 11, pw_round_20_div_w4, 20
+lea r2, [3*lsizeq]
+lea r3, [pixelsq + r2]
+%if cpuflag(sse4)
+LOAD_ZXBW_8 m3, m5, m6, m7, m12, m13, m14, m15, PASS8ROWS(pixelsq, r3, lsizeq, r2)
+paddsw m8, m3
+paddsw m0, m5
+paddsw m1, m6
+paddsw m2, m7
+paddsw m4, m12
+paddsw m11, m13
+paddsw m9, m14
+paddsw m10, m15
+%else
+pxor m12, m12
+LOAD_ZXBW_4 m3, m5, m6, m7, PASS4ROWS(pixelsq, lsizeq, r2), m12
+paddsw m8, m3
+paddsw m0, m5
+paddsw m1, m6
+paddsw m2, m7
+lea pixelsq, [pixelsq + 4*lsizeq]
+LOAD_ZXBW_4 m3, m5, m6, m7, PASS4ROWS(pixelsq, lsizeq, r2), m12
+paddsw m4, m3
+paddsw m11, m5
+paddsw m9, m6
+paddsw m10, m7
+%endif
+packuswb  m8, m0
+packuswb  m1, m2
+packuswb  m4, m11
+packuswb  m9, m10
+STORE_HI_LO PASS8ROWS(pixelsq, r3, lsizeq, r2), m8, m1, m4, m9
+RET
+
 cglobal simple_idct10, 1, 1, 16, block
 IDCT_FN"", 12, "", 19, "store"
 RET
-- 
2.13.0
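
For readers less familiar with the SIMD instructions, the add step that follows the IDCT in simple_idct8_add can be modelled in scalar C roughly as below. This is only a sketch: idct8_add_model and the one-lane paddsw/packuswb helpers are names invented here, not FFmpeg functions, and it assumes the eight result registers hold rows 0 to 7 of the block in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One lane of paddsw: signed 16-bit addition with saturation. */
static int16_t paddsw(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* One lane of packuswb: signed 16-bit to unsigned 8-bit with saturation. */
static uint8_t packuswb(int16_t v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* Hypothetical scalar model: add an 8x8 block of IDCT output to the
 * destination pixels and clamp the result to 0..255. */
static void idct8_add_model(uint8_t *dest, ptrdiff_t line_size,
                            const int16_t *block)
{
    assert(dest && block);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            /* pmovzxbw (or punpcklbw with a zero register) widens the pixel */
            int16_t pixel = dest[x];
            dest[x] = packuswb(paddsw(block[y * 8 + x], pixel));
        }
        dest += line_size;
    }
}
```

With a destination pixel of 250, a residual of 10 clamps to 255 and a residual of -300 clamps to 0, which is the behaviour the packuswb at the end of the asm provides.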


[FFmpeg-devel] [PATCH 6/6] avcodec/x86: allow 8-bit simple_idct to use slightly different coefficients

2017-06-12 Thread James Darnley

This makes it exact to the old MMX one, as reported by libavcodec/tests/dct.
---
 libavcodec/x86/proresdsp.asm  | 18 +
 libavcodec/x86/simple_idct10.asm  | 33 ++-
 libavcodec/x86/simple_idct10_template.asm | 19 ++
 3 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/x86/proresdsp.asm
index 3be0ff7757..65c9fad51c 100644
--- a/libavcodec/x86/proresdsp.asm
+++ b/libavcodec/x86/proresdsp.asm
@@ -33,14 +33,14 @@ cextern pw_1
 cextern pw_4
 cextern pw_1019
 ; Below are defined in simple_idct10.asm built from selecting idctdsp
-cextern w4_plus_w2
-cextern w4_min_w2
-cextern w4_plus_w6
-cextern w4_min_w6
-cextern w1_plus_w3
-cextern w3_min_w1
-cextern w7_plus_w3
-cextern w3_min_w7
+cextern w4_plus_w2_hi
+cextern w4_min_w2_hi
+cextern w4_plus_w6_hi
+cextern w4_min_w6_hi
+cextern w1_plus_w3_hi
+cextern w3_min_w1_hi
+cextern w7_plus_w3_hi
+cextern w3_min_w7_hi
 cextern w1_plus_w5
 cextern w5_min_w1
 cextern w5_plus_w7
@@ -50,6 +50,8 @@ cextern w7_min_w5
 
 SECTION .text
 
+define_constants _hi
+
 %macro idct_fn 0
 cglobal prores_idct_put_10, 4, 4, 15, pixels, lsize, block, qmat
 IDCT_FNpw_1, 15, pw_88, 18, "put", pw_4, pw_1019, r3
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index 29e18fe6a6..5ec1e6b23d 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -48,24 +48,34 @@ times 4 dw %2, %3
 %define W2sh2 21407 ; W2 = 85627 = 21407<<2 - 1
 %define W3sh2 19265 ; W3 = 77062 = 19265<<2 + 2
 %define W4sh2 16384 ; W4 = 65535 = 16384<<2 - 1
+%define W3sh2_lo 19266
+%define W4sh2_lo 16383
 %define W5sh2 12873 ; W5 = 51491 = 12873<<2 - 1
 %define W6sh2  8867 ; W6 = 35468 =  8867<<2
 %define W7sh2  4520 ; W7 = 18081 =  4520<<2 + 1
 
-pw_round_20_div_w4: times 8 dw ((1 << (20 - 1)) / W4sh2)
+pw_round_20_div_w4: times 8 dw ((1 << (20 - 1)) / W4sh2_lo)
 
-CONST_DEC  w4_plus_w2,   W4sh2, +W2sh2
-CONST_DEC  w4_min_w2,W4sh2, -W2sh2
-CONST_DEC  w4_plus_w6,   W4sh2, +W6sh2
-CONST_DEC  w4_min_w6,W4sh2, -W6sh2
-CONST_DEC  w1_plus_w3,   W1sh2, +W3sh2
-CONST_DEC  w3_min_w1,W3sh2, -W1sh2
-CONST_DEC  w7_plus_w3,   W7sh2, +W3sh2
-CONST_DEC  w3_min_w7,W3sh2, -W7sh2
+CONST_DEC  w4_plus_w2_hi,   W4sh2, +W2sh2
+CONST_DEC  w4_min_w2_hi,W4sh2, -W2sh2
+CONST_DEC  w4_plus_w6_hi,   W4sh2, +W6sh2
+CONST_DEC  w4_min_w6_hi,W4sh2, -W6sh2
+CONST_DEC  w1_plus_w3_hi,   W1sh2, +W3sh2
+CONST_DEC  w3_min_w1_hi,W3sh2, -W1sh2
+CONST_DEC  w7_plus_w3_hi,   W7sh2, +W3sh2
+CONST_DEC  w3_min_w7_hi,W3sh2, -W7sh2
 CONST_DEC  w1_plus_w5,   W1sh2, +W5sh2
 CONST_DEC  w5_min_w1,W5sh2, -W1sh2
 CONST_DEC  w5_plus_w7,   W5sh2, +W7sh2
 CONST_DEC  w7_min_w5,W7sh2, -W5sh2
+CONST_DEC  w4_plus_w2_lo,   W4sh2_lo, +W2sh2
+CONST_DEC  w4_min_w2_lo,W4sh2_lo, -W2sh2
+CONST_DEC  w4_plus_w6_lo,   W4sh2_lo, +W6sh2
+CONST_DEC  w4_min_w6_lo,W4sh2_lo, -W6sh2
+CONST_DEC  w1_plus_w3_lo,   W1sh2,+W3sh2_lo
+CONST_DEC  w3_min_w1_lo,W3sh2_lo, -W1sh2
+CONST_DEC  w7_plus_w3_lo,   W7sh2,+W3sh2_lo
+CONST_DEC  w3_min_w7_lo,W3sh2_lo, -W7sh2
 
 %include "libavcodec/x86/simple_idct10_template.asm"
 
@@ -108,6 +118,9 @@ SECTION .text
 [base], [base + stride], [base + 2*stride], [base + stride3]
 
 %macro idct_fn 0
+
+define_constants _lo
+
 cglobal simple_idct8, 1, 1, 16, block
 IDCT_FN"", 11, pw_round_20_div_w4, 20, "store"
 RET
@@ -160,6 +173,8 @@ cglobal simple_idct8_add, 3, 4, 16, pixels, lsize, block
 STORE_HI_LO PASS8ROWS(pixelsq, r3, lsizeq, r2), m8, m1, m4, m9
 RET
 
+define_constants _hi
+
 cglobal simple_idct10, 1, 1, 16, block
 IDCT_FN"", 12, "", 19, "store"
 RET
diff --git a/libavcodec/x86/simple_idct10_template.asm b/libavcodec/x86/simple_idct10_template.asm
index 8367011dfd..d8ea0bcc6b 100644
--- a/libavcodec/x86/simple_idct10_template.asm
+++ b/libavcodec/x86/simple_idct10_template.asm
@@ -26,6 +26,25 @@
 
 %if ARCH_X86_64
 
+%macro define_constants 1
+%undef w4_plus_w2
+%undef w4_min_w2
+%undef w4_plus_w6
+%undef w4_min_w6
+%undef w1_plus_w3
+%undef w3_min_w1
+%undef w7_plus_w3
+%undef w3_min_w7
+%define w4_plus_w2 w4_plus_w2%1
+%define w4_min_w2  w4_min_w2%1
+%define w4_plus_w6 w4_plus_w6%1
+%define w4_min_w6  w4_min_w6%1
+%define w1_plus_w3 w1_plus_w3%1
+%define w3_min_w1  w3_min_w1%1
+%define w7_plus_w3 w7_plus_w3%1
+%define w3_min_w7  w3_min_w7%1
+%endmacro
+
 ; interleave data while maintaining source
 ; %1=type, %2=dstlo, %3=dsthi, %4=src, %5=interleave
 %macro SBUTTERFLY3 5
-- 
2.13.0


[FFmpeg-devel] [PATCH 1/6] avcodec/x86: cleanup simple_idct10

2017-06-12 Thread James Darnley

Use named arguments for the functions so we can remove a define.  The
stride/linesize argument is now ptrdiff_t type so we no longer need to
sign extend the register.
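
To illustrate why the sign extension was needed (a sketch, not FFmpeg code): as far as I know, with an int stride the x86-64 calling conventions leave the upper 32 bits of the argument register unspecified, so the asm had to run movsxd before using the value in address arithmetic. The helper names below are made up for the example, and the register is modelled as a uint64_t whose upper half happens to be zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a 64-bit register whose low 32 bits carry an int argument.
 * The upper 32 bits are zero here; the caller could leave anything there. */
static uint64_t reg_from_int(int32_t v)
{
    return (uint64_t)(uint32_t)v;
}

/* What "movsxd r1, r1d" does: sign-extend the low 32 bits to 64 bits.
 * The narrowing cast relies on the usual two's complement behaviour. */
static int64_t movsxd(uint64_t reg)
{
    return (int64_t)(int32_t)(uint32_t)reg;
}

/* With ptrdiff_t the caller already passes a full-width signed value. */
static int64_t reg_from_ptrdiff(ptrdiff_t v)
{
    assert(sizeof(ptrdiff_t) == sizeof(int64_t)); /* sketch assumes a 64-bit target */
    return (int64_t)v;
}
```

A stride of -16 passed as int shows up as 0xFFFFFFF0 in this model, which would address far past the buffer if used directly; after movsxd, or when passed as ptrdiff_t, it is -16 as intended.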
---
 libavcodec/x86/proresdsp.asm  |  2 +-
 libavcodec/x86/simple_idct10.asm  |  8 ++--
 libavcodec/x86/simple_idct10_template.asm | 80 ++-
 3 files changed, 41 insertions(+), 49 deletions(-)

diff --git a/libavcodec/x86/proresdsp.asm b/libavcodec/x86/proresdsp.asm
index 16fc262aeb..8318a81c5e 100644
--- a/libavcodec/x86/proresdsp.asm
+++ b/libavcodec/x86/proresdsp.asm
@@ -51,7 +51,7 @@ cextern w7_min_w5
 SECTION .text
 
 %macro idct_fn 0
-cglobal prores_idct_put_10, 4, 4, 15
+cglobal prores_idct_put_10, 4, 4, 15, pixels, lsize, block, qmat
 IDCT_FNpw_1, 15, pw_88, 18, pw_4, pw_1019, r3
 RET
 %endmacro
diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm
index 5dee533de0..7cfd33eaa3 100644
--- a/libavcodec/x86/simple_idct10.asm
+++ b/libavcodec/x86/simple_idct10.asm
@@ -68,21 +68,21 @@ CONST_DEC  w7_min_w5,W7sh2, -W5sh2
 SECTION .text
 
 %macro idct_fn 0
-cglobal simple_idct10, 1, 1, 16
+cglobal simple_idct10, 1, 1, 16, block
 IDCT_FN"", 12, "", 19
 RET
 
-cglobal simple_idct10_put, 3, 3, 16
+cglobal simple_idct10_put, 3, 3, 16, pixels, lsize, block
 IDCT_FN"", 12, "", 19, 0, pw_1023
 RET
 
-cglobal simple_idct12, 1, 1, 16
+cglobal simple_idct12, 1, 1, 16, block
 ; coeffs are already 15bits, adding the offset would cause
 ; overflow in the input
 IDCT_FN"", 15, pw_2, 16
 RET
 
-cglobal simple_idct12_put, 3, 3, 16
+cglobal simple_idct12_put, 3, 3, 16, pixels, lsize, block
 ; range isn't known, so the C simple_idct range is used
 ; Also, using a bias on input overflows, so use the bias
 ; on output of the first butterfly instead
diff --git a/libavcodec/x86/simple_idct10_template.asm b/libavcodec/x86/simple_idct10_template.asm
index 9d323d99b3..3f398985a5 100644
--- a/libavcodec/x86/simple_idct10_template.asm
+++ b/libavcodec/x86/simple_idct10_template.asm
@@ -115,18 +115,18 @@
 psubd   m3,  m9; a1[4-7] intermediate
 
 ; load/store
-mova   [COEFFS+  0], m0
-mova   [COEFFS+ 32], m2
-mova   [COEFFS+ 64], m4
-mova   [COEFFS+ 96], m6
-movam10,[COEFFS+ 16]   ; { row[1] }[0-7]
-movam8, [COEFFS+ 48]   ; { row[3] }[0-7]
-movam13,[COEFFS+ 80]   ; { row[5] }[0-7]
-movam14,[COEFFS+112]   ; { row[7] }[0-7]
-mova   [COEFFS+ 16], m1
-mova   [COEFFS+ 48], m3
-mova   [COEFFS+ 80], m5
-mova   [COEFFS+112], m7
+mova   [blockq+  0], m0
+mova   [blockq+ 32], m2
+mova   [blockq+ 64], m4
+mova   [blockq+ 96], m6
+movam10,[blockq+ 16]   ; { row[1] }[0-7]
+movam8, [blockq+ 48]   ; { row[3] }[0-7]
+movam13,[blockq+ 80]   ; { row[5] }[0-7]
+movam14,[blockq+112]   ; { row[7] }[0-7]
+mova   [blockq+ 16], m1
+mova   [blockq+ 48], m3
+mova   [blockq+ 80], m5
+mova   [blockq+112], m7
 %if %0 == 3
 pmullw  m10,[%3+ 16]
 pmullw  m8, [%3+ 48]
@@ -197,17 +197,17 @@
 ; row[5] = (a2 - b2) >> 15;
 ; row[3] = (a3 + b3) >> 15;
 ; row[4] = (a3 - b3) >> 15;
-movam8, [COEFFS+ 0]; a0[0-3]
-movam9, [COEFFS+16]; a0[4-7]
+movam8, [blockq+ 0]; a0[0-3]
+movam9, [blockq+16]; a0[4-7]
 SUMSUB_SHPK m8,  m9,  m10, m11, m0,  m1,  %2
-movam0, [COEFFS+32]; a1[0-3]
-movam1, [COEFFS+48]; a1[4-7]
+movam0, [blockq+32]; a1[0-3]
+movam1, [blockq+48]; a1[4-7]
 SUMSUB_SHPK m0,  m1,  m9,  m11, m2,  m3,  %2
-movam1, [COEFFS+64]; a2[0-3]
-movam2, [COEFFS+80]; a2[4-7]
+movam1, [blockq+64]; a2[0-3]
+movam2, [blockq+80]; a2[4-7]
 SUMSUB_SHPK m1,  m2,  m11, m3,  m4,  m5,  %2
-movam2, [COEFFS+96]; a3[0-3]
-movam3, [COEFFS+112]   ; a3[4-7]
+movam2, [blockq+96]; a3[0-3]
+movam3, [blockq+112]   ; a3[4-7]
 SUMSUB_SHPK m2,  m3,  m4,  m5,  m6,  m7,  %2
 %endmacro
 
@@ -223,20 +223,12 @@
 ; %7 = qmat (for prores)
 
 %macro IDCT_FN 4-7
-%if %0 == 4
-; No clamping, means pure idct
-%xdefine COEFFS r0
-%else
-movsxd  r1,  r1d
-%xdefine COEFFS r2
-%endif
-
 ; for (i = 0; i < 8; i++)
 ; idctRowCondDC(block + i*8);
-movam10,[COEFFS+ 0]; { row[0] }[0-7]
-movam8, [COEFFS+32]; { row[2] }[0-7]
-movam13,[COEFFS+64]; { row[4] }[0-7]
-movam12,[COEFFS+96]; { row[6] }[0-7]
+movam10,[blockq+ 0]; { row[0] }[0-7]
+movam8, [blockq+32]; { row[2] }[0-7]
+movam13,[blockq+64]; {

[FFmpeg-devel] [PATCH 0/6] sse2/avx functions for 8-bit simple idct

2017-06-12 Thread James Darnley

I think I have reached the final state for these patches.  There has been little
change to the 1st, 3rd, 4th, and 5th.

The 2nd adds an option to explicitly control what the macro does after the IDCT.
This allows the small optimisation for 8-bit of not storing the data back to the
source block.

The 6th lets the IDCT use the slightly different coefficients to get exact
output compared with the MMX original.  This is rather messy but I think it is
slightly better than trying to alter the code macro.  A word diff looks much
cleaner than the line diff git uses by default.
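
A rough C model of what the two constant sets look like may help when reviewing the 6th patch. The values are copied from the simple_idct10.asm hunk; only W3 (19265 vs 19266) and W4 (16384 vs 16383) differ between the _hi and _lo sets. The function names are invented for this sketch, and it assumes the tables are consumed by pmaddwd on interleaved rows, which the a,b,a,b layout of CONST_DEC suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted simple_idct10.asm hunk. */
enum {
    W2sh2    = 21407,
    W3sh2    = 19265, W3sh2_lo = 19266,
    W4sh2    = 16384, W4sh2_lo = 16383
};

/* CONST_DEC name, a, b emits "times 4 dw a, b": the words a,b,a,b,a,b,a,b. */
static void const_dec(int16_t table[8], int16_t a, int16_t b)
{
    for (int i = 0; i < 4; i++) {
        table[2 * i]     = a;
        table[2 * i + 1] = b;
    }
}

/* One 32-bit lane of pmaddwd: multiply two adjacent word pairs and add. */
static int32_t pmaddwd_lane(const int16_t x[2], const int16_t c[2])
{
    return (int32_t)x[0] * c[0] + (int32_t)x[1] * c[1];
}

/* Hypothetical stand-in for "define_constants _hi" / "define_constants _lo":
 * pick which table the generic name w4_plus_w2 refers to. */
static const int16_t *select_w4_plus_w2(const int16_t hi[8],
                                        const int16_t lo[8], int use_lo)
{
    assert(hi && lo);
    return use_lo ? lo : hi;
}
```

For an input pair (100, 50) the two w4_plus_w2 variants differ by exactly 100 in the pmaddwd result, i.e. by the first input times the one-unit change in W4.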

If people would kindly give their opinion on the 2nd and 6th patches in
particular I would greatly appreciate it.

Performance gain decoding an MPEG2 HD sample over the old MMX:
 - Yorkfield: 210 to 224 fps
 - Haswell:   387 to 426 fps
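
As a quick check on those figures, the relative gain can be worked out as below. The function name is just for illustration.

```c
#include <assert.h>

/* Percentage gain going from `before` to `after` frames per second. */
static double percent_gain(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

That gives roughly 6.7% on Yorkfield (14/210) and roughly 10.1% on Haswell (39/387).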

Would anyone like me to get some timer figures for the functions themselves?

James Darnley (6):
  avcodec/x86: cleanup simple_idct10
  avcodec/x86: modify simple_idct10 macros to add an action parameter
  avcodec/x86: add x86-64 8-bit simple_idct function
  avcodec/x86: add x86-64 8-bit simple_idct put function
  avcodec/x86: add x86-64 8-bit simple_idct add function
  avcodec/x86: allow 8-bit simple_idct to use slightly different
coefficients

 libavcodec/tests/x86/dct.c|   2 +
 libavcodec/x86/idctdsp_init.c |  23 +
 libavcodec/x86/proresdsp.asm  |  22 ++---
 libavcodec/x86/simple_idct.h  |   9 ++
 libavcodec/x86/simple_idct10.asm  | 139 ++
 libavcodec/x86/simple_idct10_template.asm | 136 -
 6 files changed, 244 insertions(+), 87 deletions(-)

-- 
2.13.0


Re: [FFmpeg-devel] [PATCH] libavformat/avio: fix retry_transfer_wrapper return value on error

2017-06-12 Thread Michael Niedermayer

On Fri, Jun 09, 2017 at 03:51:30PM +0200, Daniel Kucera wrote:
> Signed-off-by: Daniel Kucera 
> ---
>  libavformat/avio.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)

This causes FATE to loop infinitely,
for example in fate-filter-volume.

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The bravest are surely those who have the clearest vision
of what is before them, glory and danger alike, and yet
notwithstanding go out to meet it. -- Thucydides



Re: [FFmpeg-devel] [PATCH] avcodec/htmlsubtitles: Replace very slow redundant sscanf() calls by cleaner and faster code

2017-06-12 Thread wm4

On Sun, 11 Jun 2017 17:58:45 +0200
Michael Niedermayer  wrote:

> This reduces the worst case from O(n²) to O(n) time
> 
> Fixes Timeout
> Fixes: 2127/clusterfuzz-testcase-minimized-6595787859427328
> 
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/htmlsubtitles.c | 20 +++-
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/libavcodec/htmlsubtitles.c b/libavcodec/htmlsubtitles.c
> index 16295daa0c..70311c66d5 100644
> --- a/libavcodec/htmlsubtitles.c
> +++ b/libavcodec/htmlsubtitles.c
> @@ -56,6 +56,7 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
>  char *param, buffer[128], tmp[128];
>  int len, tag_close, sptr = 1, line_start = 1, an = 0, end = 0;
>  SrtStack stack[16];
> +int closing_brace_missing = 0;
>  
>  stack[0].tag[0] = 0;
>  strcpy(stack[0].param[PARAM_SIZE],  "{\\fs}");
> @@ -83,11 +84,20 @@ int ff_htmlmarkup_to_ass(void *log_ctx, AVBPrint *dst, const char *in)
>  and all microdvd like styles such as {Y:xxx} */
>  len = 0;
>  an += sscanf(in, "{\\an%*1u}%n", &len) >= 0 && len > 0;
> -if ((an != 1 && (len = 0, sscanf(in, "{\\%*[^}]}%n", &len) >= 0 && len > 0)) ||
> -(len = 0, sscanf(in, "{%*1[CcFfoPSsYy]:%*[^}]}%n", &len) >= 0 && len > 0)) {
> -in += len - 1;
> -} else
> -av_bprint_chars(dst, *in, 1);
> +
> +if (!closing_brace_missing) {
> +if (   (an != 1 && in[1] == '\\')
> +|| (in[1] && strchr("CcFfoPSsYy", in[1]) && in[2] == ':')) {
> +char *bracep = strchr(in+2, '}');
> +if (bracep) {
> +in = bracep;
> +break;
> +} else
> +closing_brace_missing = 1;
> +}
> +}
> +
> +av_bprint_chars(dst, *in, 1);
>  break;
>  case '<':
>  tag_close = in[1] == '/';

IMO better than before - now anyone can understand this code, and it's
faster. I'm not the maintainer of this, though.

Would it be possible to move this switch case to a separate function?
ff_htmlmarkup_to_ass is a bit too big.
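
Something along these lines is what I have in mind. This is a minimal sketch only: skip_brace_tag is a hypothetical name, it mirrors the logic added by the patch, and the \an detection that updates `an` is assumed to stay in the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch.
 * `in` points at a '{'. Returns a pointer to the closing '}' when the
 * braces enclose an ASS override ({\...}) or a MicroDVD-like style
 * ({Y:...}) that should be dropped, or NULL when the '{' should be
 * copied to the output as an ordinary character.
 */
static const char *skip_brace_tag(const char *in, int an,
                                  int *closing_brace_missing)
{
    assert(in && *in == '{' && closing_brace_missing);

    if (*closing_brace_missing)
        return NULL;

    if ((an != 1 && in[1] == '\\') ||
        (in[1] && strchr("CcFfoPSsYy", in[1]) && in[2] == ':')) {
        const char *bracep = strchr(in + 2, '}');
        if (bracep)
            return bracep;
        /* No '}' anywhere ahead: remember that so later '{' characters
         * do not trigger another scan of the rest of the string. */
        *closing_brace_missing = 1;
    }
    return NULL;
}
```

The '{' case in ff_htmlmarkup_to_ass would then set `in` to the returned pointer when it is non-NULL and otherwise fall through to av_bprint_chars(dst, *in, 1), as the patch does now.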

Btw. as far as ASS tag stripping is concerned, this seems to ignore
drawings, but maybe they're typically not used in SRT.