Re: [libav-devel] [PATCH] checkasm: aarch64: Specify alignment for the register_init const array
On 5/4/17 8:46 PM, Martin Storsjö wrote:
> Loads from this don't strictly require alignment, but specify it
> just for consistency with the arm version.
> ---
>  tests/checkasm/aarch64/checkasm.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tests/checkasm/aarch64/checkasm.S b/tests/checkasm/aarch64/checkasm.S
> index bc5ed9ea09..327dfc0802 100644
> --- a/tests/checkasm/aarch64/checkasm.S
> +++ b/tests/checkasm/aarch64/checkasm.S
> @@ -22,7 +22,7 @@
>
>  #include "libavutil/aarch64/asm.S"
>
> -const register_init
> +const register_init, align=4
>          .quad 0x21f86d66c8ca00ce
>          .quad 0x75b6ba21077c48ad
>          .quad 0xed56bb2dcb3c7736
>

Sounds like a good idea.

___
libav-devel mailing list
libav-devel@libav.org
https://lists.libav.org/mailman/listinfo/libav-devel
[libav-devel] [PATCH] checkasm: aarch64: Specify alignment for the register_init const array
Loads from this don't strictly require alignment, but specify it
just for consistency with the arm version.
---
 tests/checkasm/aarch64/checkasm.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/checkasm/aarch64/checkasm.S b/tests/checkasm/aarch64/checkasm.S
index bc5ed9ea09..327dfc0802 100644
--- a/tests/checkasm/aarch64/checkasm.S
+++ b/tests/checkasm/aarch64/checkasm.S
@@ -22,7 +22,7 @@

 #include "libavutil/aarch64/asm.S"

-const register_init
+const register_init, align=4
         .quad 0x21f86d66c8ca00ce
         .quad 0x75b6ba21077c48ad
         .quad 0xed56bb2dcb3c7736
-- 
2.11.0 (Apple Git-81)
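For readers less familiar with the assembler side, here is a rough C analogue of what the patch asks for. It assumes that the align argument of the const macro ends up in a power-of-two .align directive, so that align=4 means 16-byte alignment; that is an assumption about asm.S, not something shown in the patch. The names register_init_sketch and is_aligned are made up for this sketch, and only the three constants visible in the diff context are reproduced.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical C counterpart of the assembly table. Assumes align=4 is a
 * power-of-two exponent, i.e. 2^4 = 16 bytes. Only the first three
 * constants from the diff context are listed. */
static alignas(16) const uint64_t register_init_sketch[] = {
    0x21f86d66c8ca00ceULL,
    0x75b6ba21077c48adULL,
    0xed56bb2dcb3c7736ULL,
};

/* Returns nonzero if p is a multiple of alignment (a power of two). */
static int is_aligned(const void *p, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

As the commit message says, the loads work without this; the explicit alignment only makes the table match the arm version.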
Re: [libav-devel] [PATCH 2/2] hevc: Add NEON 32x32 IDCT
On Thu, 4 May 2017, Alexandra Hájková wrote:

> ---
>  libavcodec/arm/hevc_idct.S        | 311 +++---
>  libavcodec/arm/hevcdsp_init_arm.c |   4 +
>  2 files changed, 294 insertions(+), 21 deletions(-)

My main issues with it have been taken care of, so I don't see it as too bad any longer. I haven't read the code in detail, but the overall structure is more or less sound at least, so I'm ok with it going in for now.

The speedup vs C code is 8-18x, so it's clearly worthwhile at least.

Will push.

// Martin
Re: [libav-devel] [PATCH 1/2] hevc: 16x16 NEON idct: Use the right element size for stores.
On Thu, 4 May 2017, Alexandra Hájková wrote:

> This doesn't change the actual behaviour of the code but improves readability.
> ---
>  libavcodec/arm/hevc_idct.S | 16
>  1 file changed, 8 insertions(+), 8 deletions(-)

OK

// Martin
[libav-devel] [GASPP PATCH] Support converting more instructions to their thumb equivalent
---
These are used for supporting building x264 for windows/arm with
msvc/armasm (currently in the x264 sandbox repo).
---
 gas-preprocessor.pl | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gas-preprocessor.pl b/gas-preprocessor.pl
index 35d201d..afdfc9e 100755
--- a/gas-preprocessor.pl
+++ b/gas-preprocessor.pl
@@ -951,6 +951,20 @@ sub handle_serialized_line {
         $line =~ s/stm(?:db|fd)\s+sp!\s*,\s*\{([^,-]+)\}/str $1, [sp, #-4]!/g;
         $line =~ s/ldm(?:ia|fd)?\s+sp!\s*,\s*\{([^,-]+)\}/ldr $1, [sp], #4/g;

+        # Convert muls into mul+cmp
+        $line =~ s/muls\s+(\w+),\s*(\w+)\,\s*(\w+)/mul $1, $2, $3\n\tcmp $1, #0/g;
+
+        # Convert "and r0, sp, #xx" into "mov r0, sp", "and r0, r0, #xx"
+        $line =~ s/and\s+(\w+),\s*(sp|r13)\,\s*#(\w+)/mov $1, $2\n\tand $1, $1, #$3/g;
+
+        # Convert "ldr r0, [r0, r1, lsl #6]" where the shift is >3 (which
+        # can't be handled in thumb) into "add r0, r0, r1, lsl #6",
+        # "ldr r0, [r0]", for the special case where the same address is
+        # used as base and target for the ldr.
+        if ($line =~ /(ldr[bh]?)\s+(\w+),\s*\[\2,\s*(\w+),\s*lsl\s*#(\w+)\]/ and $4 > 3) {
+            $line =~ s/(ldr[bh]?)\s+(\w+),\s*\[\2,\s*(\w+),\s*lsl\s*#(\w+)\]/add $2, $2, $3, lsl #$4\n\t$1 $2, [$2]/;
+        }
+
         $line =~ s/\.arm/.thumb/x;
     }

-- 
2.11.0 (Apple Git-81)
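The ldr rewrite in the patch above is only safe because base and destination are the same register. A small C model may make that clearer. This is just a sketch: memory is a byte array, registers are plain integers, and the function names ldr_shifted and add_then_ldr are invented for illustration, not taken from gas-preprocessor.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of "ldr rd, [rn, rm, lsl #shift]":
 * load a 32-bit word from address rn + (rm << shift). */
static uint32_t ldr_shifted(const uint8_t *mem, uint32_t rn, uint32_t rm,
                            unsigned shift)
{
    uint32_t value;
    assert(shift < 32);
    memcpy(&value, mem + rn + (rm << shift), sizeof(value));
    return value;
}

/* Model of the replacement pair
 *     add rn, rn, rm, lsl #shift
 *     ldr rn, [rn]
 * The add destroys the old base value. That is harmless here only because
 * the following load overwrites the same register anyway. */
static uint32_t add_then_ldr(const uint8_t *mem, uint32_t rn, uint32_t rm,
                             unsigned shift)
{
    uint32_t value;
    assert(shift < 32);
    rn = rn + (rm << shift);                /* add rn, rn, rm, lsl #shift */
    memcpy(&value, mem + rn, sizeof(value)); /* ldr rn, [rn] */
    return value;
}
```

If the destination were a different register, the add would clobber a base register that the surrounding code may still need, which is presumably why the regex requires the \2 backreference.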
Re: [libav-devel] [PATCH 1/5] lavu: add new D3D11 pixfmt and hwcontext
On Thu, 4 May 2017 08:44:04 +0200 wm4 wrote:

> +    AV_PIX_FMT_D3D11,     ///< HW decoding through Direct3D11 via new API, Picture.data[0] contains a ID3D11Texture2D pointer, and data[1] contains the texture array index of the frame as intptr_t if the ID3D11Texture2D is an array texture (or 0 if it's a normal texture)
> +
>      AV_PIX_FMT_NB,        ///< number of pixel formats, DO NOT USE THIS if you want to link with shared libav*

By the way, there is probably no strict need for a new pixfmt.

With the "new" D3D11 hwaccel, we need it to carry different objects (a texture handle instead of a decoder view). So the semantics change which would warrant a new pixfmt. On the other hand, there's no hard technical reason to use a new pixfmt. We could just change the definition of AV_PIX_FMT_D3D11VA_VLD to depend on which API is used.

Pro (for new, separate AV_PIX_FMT_D3D11):
- cleaner
- avoids confusion
- chance that the old API is deprecated, and AV_PIX_FMT_D3D11VA_VLD is removed, also removing the problem

Contra:
- libavcodec dxva2 code needs tons of changes to deal with both d3d11 formats
- separate AVHWAccels needed just because of the pixfmt

Which should it be? Or maybe I'm missing something big here due to sleep deprivation.
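To illustrate the data layout described in the quoted pixfmt comment (texture pointer in data[0], array index stored as intptr_t in data[1]), here is a minimal sketch. SketchTexture and SketchFrame are stand-ins invented for this example; they are not the real ID3D11Texture2D or AVFrame types, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ID3D11Texture2D (hypothetical, for illustration only). */
typedef struct SketchTexture { int is_array; } SketchTexture;

/* Stand-in for the frame's data pointers (hypothetical). */
typedef struct SketchFrame { uint8_t *data[4]; } SketchFrame;

/* Store the texture in data[0] and the array slice index in data[1].
 * The index is smuggled through a pointer-sized field; the integer/pointer
 * round trip is implementation-defined in C but works on common platforms. */
static void sketch_frame_set(SketchFrame *f, SketchTexture *tex, intptr_t index)
{
    assert(index >= 0);
    f->data[0] = (uint8_t *)tex;
    f->data[1] = (uint8_t *)index; /* 0 for a non-array texture */
}

static SketchTexture *sketch_frame_texture(const SketchFrame *f)
{
    return (SketchTexture *)f->data[0];
}

static intptr_t sketch_frame_index(const SketchFrame *f)
{
    return (intptr_t)f->data[1];
}
```

The point of the sketch is that nothing in this layout refers to a decoder, which matches the motivation given for carrying a texture handle instead of a decoder view.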
Re: [libav-devel] [PATCH 4/5] dxva: move d3d11 locking/unlocking to functions
On Thu, 4 May 2017 11:15:42 +0200 Hendrik Leppkes wrote:

> On Thu, May 4, 2017 at 8:44 AM, wm4 wrote:
> > I want to make it non-mandatory to set a mutex in the D3D11 device
> > context, and replacing it with user callbacks seems like the best
> > solution. This is preparation for it. Also makes the code slightly more
> > readable.
> >
>
> With recent frame-mt hwaccel changes, a user that needs this locking
> could just do it externally around the decode function. Maybe we
> should just get rid of it in the "new" d3d11 hwaccel?

Yeah, I'm not sure how this should be handled, or if this sort of locking is even required.

Note that for sane refcounting and hwframes functionality in a multithreaded setting there would need to be a locking mechanism in the hwcontext - but only if this kind of locking is needed at all.

It's not clear to me whether it's needed. It could be cargo-cult.
Re: [libav-devel] [PATCH] arm: Check for the .arch directive in configure
On 5/4/17 10:45 AM, Martin Storsjö wrote:
> is used), as suggested by Janne on irc.

Looks fine.
Re: [libav-devel] [PATCH 4/5] dxva: move d3d11 locking/unlocking to functions
On Thu, May 4, 2017 at 8:44 AM, wm4 wrote:
> I want to make it non-mandatory to set a mutex in the D3D11 device
> context, and replacing it with user callbacks seems like the best
> solution. This is preparation for it. Also makes the code slightly more
> readable.
>

With recent frame-mt hwaccel changes, a user that needs this locking could just do it externally around the decode function. Maybe we should just get rid of it in the "new" d3d11 hwaccel?

- Hendrik
[libav-devel] [PATCH 2/2] hevc: Add NEON 32x32 IDCT
--- libavcodec/arm/hevc_idct.S| 311 +++--- libavcodec/arm/hevcdsp_init_arm.c | 4 + 2 files changed, 294 insertions(+), 21 deletions(-) diff --git a/libavcodec/arm/hevc_idct.S b/libavcodec/arm/hevc_idct.S index eeb81e3..79799b2 100644 --- a/libavcodec/arm/hevc_idct.S +++ b/libavcodec/arm/hevc_idct.S @@ -28,6 +28,10 @@ const trans, align=4 .short 89, 75, 50, 18 .short 90, 87, 80, 70 .short 57, 43, 25, 9 +.short 90, 90, 88, 85 +.short 82, 78, 73, 67 +.short 61, 54, 46, 38 +.short 31, 22, 13, 4 endconst .macro clip10 in1, in2, c1, c2 @@ -509,7 +513,7 @@ endfunc vsub.s32\tmp_m, \e, \o .endm -.macro tr16_8x4 in0, in1, in2, in3, in4, in5, in6, in7 +.macro tr16_8x4 in0, in1, in2, in3, in4, in5, in6, in7, offset tr_4x4_8\in0, \in2, \in4, \in6, q8, q9, q10, q11, q12, q13, q14, q15 vmull.s16 q12, \in1, \in0[0] @@ -535,7 +539,7 @@ endfunc butterfly q9, q13, q1, q6 butterfly q10, q14, q2, q5 butterfly q11, q15, q3, q4 -add r4, sp, #512 +add r4, sp, #\offset vst1.s32{q0-q1}, [r4, :128]! vst1.s32{q2-q3}, [r4, :128]! vst1.s32{q4-q5}, [r4, :128]! 
@@ -575,15 +579,15 @@ endfunc vsub.s32\in6, \in6, \in7 .endm -.macro store16 in0, in1, in2, in3, in4, in5, in6, in7 +.macro store16 in0, in1, in2, in3, in4, in5, in6, in7, rx vst1.s16\in0, [r1, :64], r2 -vst1.s16\in1, [r3, :64], r4 +vst1.s16\in1, [r3, :64], \rx vst1.s16\in2, [r1, :64], r2 -vst1.s16\in3, [r3, :64], r4 +vst1.s16\in3, [r3, :64], \rx vst1.s16\in4, [r1, :64], r2 -vst1.s16\in5, [r3, :64], r4 +vst1.s16\in5, [r3, :64], \rx vst1.s16\in6, [r1, :64], r2 -vst1.s16\in7, [r3, :64], r4 +vst1.s16\in7, [r3, :64], \rx .endm .macro scale out0, out1, out2, out3, out4, out5, out6, out7, in0, in1, in2, in3, in4, in5, in6, in7, shift @@ -597,19 +601,35 @@ endfunc vqrshrn.s32 \out7, \in7, \shift .endm -.macro tr_16x4 name, shift +@stores in1, in2, in4, in6 ascending from off1 and +@stores in1, in3, in5, in7 descending from off2 +.macro store_to_stack off1, off2, in0, in2, in4, in6, in7, in5, in3, in1 +add r1, sp, #\off1 +add r3, sp, #\off2 +mov r2, #-16 +vst1.s32{\in0}, [r1, :128]! +vst1.s32{\in1}, [r3, :128], r2 +vst1.s32{\in2}, [r1, :128]! +vst1.s32{\in3}, [r3, :128], r2 +vst1.s32{\in4}, [r1, :128]! 
+vst1.s32{\in5}, [r3, :128], r2 +vst1.s32{\in6}, [r1, :128] +vst1.s32{\in7}, [r3, :128] +.endm + +.macro tr_16x4 name, shift, offset, step function func_tr_16x4_\name mov r1, r5 -add r3, r5, #64 -mov r2, #128 +add r3, r5, #(\step * 64) +mov r2, #(\step * 128) load16 d0, d1, d2, d3, d4, d5, d6, d7 movrel r1, trans -tr16_8x4d0, d1, d2, d3, d4, d5, d6, d7 +tr16_8x4d0, d1, d2, d3, d4, d5, d6, d7, \offset -add r1, r5, #32 -add r3, r5, #(64 + 32) -mov r2, #128 +add r1, r5, #(\step * 32) +add r3, r5, #(\step * 3 *32) +mov r2, #(\step * 128) load16 d8, d9, d2, d3, d4, d5, d6, d7 movrel r1, trans + 16 vld1.s16{q0}, [r1, :128] @@ -630,11 +650,12 @@ function func_tr_16x4_\name add_member d6, d1[2], d0[3], d0[0], d0[2], d1[1], d1[3], d1[0], d0[1], +, -, +, -, +, +, -, + add_member d7, d1[3], d1[2], d1[1], d1[0], d0[3], d0[2], d0[1], d0[0], +, -, +, -, +, -, +, - -add r4, sp, #512 +add r4, sp, #\offset vld1.s32{q0-q1}, [r4, :128]! vld1.s32{q2-q3}, [r4, :128]! butterfly16 q0, q5, q1, q6, q2, q7, q3, q8 +.if \shift > 0 scale d26, d27, d28, d29, d30, d31, d16, d17, q4, q0, q5, q1, q6, q2, q7, q3, \shift transpose8_4x4 d26, d28, d30, d16 transpose8_4x4 d17, d31, d29, d27 @@ -642,12 +663,16 @@ function func_tr_16x4_\name add r3, r6, #(24 +3*32) mov r2, #32 mov r4, #-32 -store16 d26, d27, d28, d29, d30, d31, d16, d17 +store16 d26, d27, d28, d29, d30, d31, d16, d17, r4 +.else +store_to_stack \offset, (\offset + 240), q4, q5, q6, q7, q3, q2, q1, q0 +.endif -add
[libav-devel] [PATCH 1/2] hevc: 16x16 NEON idct: Use the right element size for stores.
This doesn't change the actual behaviour of the code but improves
readability.
---
 libavcodec/arm/hevc_idct.S | 16
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/libavcodec/arm/hevc_idct.S b/libavcodec/arm/hevc_idct.S
index fac5758..eeb81e3 100644
--- a/libavcodec/arm/hevc_idct.S
+++ b/libavcodec/arm/hevc_idct.S
@@ -536,10 +536,10 @@ endfunc
         butterfly       q10, q14, q2, q5
         butterfly       q11, q15, q3, q4
         add             r4, sp, #512
-        vst1.s16        {q0-q1}, [r4, :128]!
-        vst1.s16        {q2-q3}, [r4, :128]!
-        vst1.s16        {q4-q5}, [r4, :128]!
-        vst1.s16        {q6-q7}, [r4, :128]
+        vst1.s32        {q0-q1}, [r4, :128]!
+        vst1.s32        {q2-q3}, [r4, :128]!
+        vst1.s32        {q4-q5}, [r4, :128]!
+        vst1.s32        {q6-q7}, [r4, :128]
 .endm

 .macro load16 in0, in1, in2, in3, in4, in5, in6, in7
@@ -631,8 +631,8 @@ function func_tr_16x4_\name
         add_member      d7, d1[3], d1[2], d1[1], d1[0], d0[3], d0[2], d0[1], d0[0], +, -, +, -, +, -, +, -

         add             r4, sp, #512
-        vld1.s16        {q0-q1}, [r4, :128]!
-        vld1.s16        {q2-q3}, [r4, :128]!
+        vld1.s32        {q0-q1}, [r4, :128]!
+        vld1.s32        {q2-q3}, [r4, :128]!

         butterfly16     q0, q5, q1, q6, q2, q7, q3, q8
         scale           d26, d27, d28, d29, d30, d31, d16, d17, q4, q0, q5, q1, q6, q2, q7, q3, \shift
@@ -645,8 +645,8 @@ function func_tr_16x4_\name
         store16         d26, d27, d28, d29, d30, d31, d16, d17

         add             r4, sp, #576
-        vld1.s16        {q0-q1}, [r4, :128]!
-        vld1.s16        {q2-q3}, [r4, :128]
+        vld1.s32        {q0-q1}, [r4, :128]!
+        vld1.s32        {q2-q3}, [r4, :128]
         butterfly16     q0, q9, q1, q10, q2, q11, q3, q12
         scale           d26, d27, d28, d29, d30, d31, d8, d9, q4, q0, q9, q1, q10, q2, q11, q3, \shift
         transpose8_4x4  d26, d28, d30, d8
-- 
2.10.2
[libav-devel] [PATCH] arm: Check for the .arch directive in configure
When targeting windows, the .arch directive isn't available.

So far, when building for windows, we've always used gas-preprocessor, both when using msvc's armasm and when using clang. Lately, clang/llvm has implemented the last missing piece (altmacro support) for building our assembly without gas-preprocessor.

This means that we now build for arm/windows with clang without any extra compatibility layer.
---
Updated to use a plain ifdef guard around the block instead of introducing a line prefix as for e.g. ELF (since this is the only place where .arch is used), as suggested by Janne on irc.
---
 configure           | 4
 libavutil/arm/asm.S | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/configure b/configure
index c7d0363..029ae9e 100755
--- a/configure
+++ b/configure
@@ -1661,6 +1661,7 @@ SYSTEM_FUNCS="
 "

 TOOLCHAIN_FEATURES="
+    as_arch_directive
     as_dn_directive
     as_fpu_directive
     as_func
@@ -4372,6 +4373,9 @@ EOF

     check_inline_asm asm_mod_q '"add r0, %Q0, %R0" :: "r"((long long)0)'

+check_as
[libav-devel] [PATCH] arm: Check for the .arch directive in configure
When targeting windows, the .arch directive isn't available.

So far, when building for windows, we've always used gas-preprocessor, both when using msvc's armasm and when using clang. Lately, clang/llvm has implemented the last missing piece (altmacro support) for building our assembly without gas-preprocessor.

This means that we now build for arm/windows with clang without any extra compatibility layer.
---
 configure           |  4
 libavutil/arm/asm.S | 14 ++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/configure b/configure
index c7d0363..029ae9e 100755
--- a/configure
+++ b/configure
@@ -1661,6 +1661,7 @@ SYSTEM_FUNCS="
 "

 TOOLCHAIN_FEATURES="
+    as_arch_directive
     as_dn_directive
     as_fpu_directive
     as_func
@@ -4372,6 +4373,9 @@ EOF

     check_inline_asm asm_mod_q '"add r0, %Q0, %R0" :: "r"((long long)0)'

+check_as
[libav-devel] [PATCH 4/5] dxva: move d3d11 locking/unlocking to functions
I want to make it non-mandatory to set a mutex in the D3D11 device
context, and replacing it with user callbacks seems like the best
solution. This is preparation for it. Also makes the code slightly more
readable.
---
And yes, only because INVALID_HANDLE_VALUE != NULL
---
 libavcodec/dxva2.c | 46 --
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/libavcodec/dxva2.c b/libavcodec/dxva2.c
index 1cb79fe294..0d4effd228 100644
--- a/libavcodec/dxva2.c
+++ b/libavcodec/dxva2.c
@@ -29,6 +29,28 @@
 #include "avcodec.h"
 #include "dxva2_internal.h"

+static void ff_dxva2_lock(AVCodecContext *avctx)
+{
+#if CONFIG_D3D11VA
+    if (ff_dxva2_is_d3d11(avctx)) {
+        AVDXVAContext *ctx = DXVA_CONTEXT(avctx);
+        if (D3D11VA_CONTEXT(ctx)->context_mutex != INVALID_HANDLE_VALUE)
+            WaitForSingleObjectEx(D3D11VA_CONTEXT(ctx)->context_mutex, INFINITE, FALSE);
+    }
+#endif
+}
+
+static void ff_dxva2_unlock(AVCodecContext *avctx)
+{
+#if CONFIG_D3D11VA
+    if (ff_dxva2_is_d3d11(avctx)) {
+        AVDXVAContext *ctx = DXVA_CONTEXT(avctx);
+        if (D3D11VA_CONTEXT(ctx)->context_mutex != INVALID_HANDLE_VALUE)
+            ReleaseMutex(D3D11VA_CONTEXT(ctx)->context_mutex);
+    }
+#endif
+}
+
 static void *get_surface(const AVFrame *frame)
 {
     return frame->data[3];
@@ -153,14 +175,12 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame,
     unsigned type;

     do {
+        ff_dxva2_lock(avctx);
 #if CONFIG_D3D11VA
-        if (ff_dxva2_is_d3d11(avctx)) {
-            if (D3D11VA_CONTEXT(ctx)->context_mutex != INVALID_HANDLE_VALUE)
-                WaitForSingleObjectEx(D3D11VA_CONTEXT(ctx)->context_mutex, INFINITE, FALSE);
+        if (ff_dxva2_is_d3d11(avctx))
             hr = ID3D11VideoContext_DecoderBeginFrame(D3D11VA_CONTEXT(ctx)->video_context, D3D11VA_CONTEXT(ctx)->decoder,
                                                       get_surface(frame),
                                                       0, NULL);
-        }
 #endif
 #if CONFIG_DXVA2
         if (avctx->pix_fmt == AV_PIX_FMT_DXVA2_VLD)
@@ -170,21 +190,13 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame,
 #endif
         if (hr != E_PENDING || ++runs > 50)
             break;
-#if CONFIG_D3D11VA
-        if (ff_dxva2_is_d3d11(avctx))
-            if (D3D11VA_CONTEXT(ctx)->context_mutex != INVALID_HANDLE_VALUE)
-                ReleaseMutex(D3D11VA_CONTEXT(ctx)->context_mutex);
-#endif
+        ff_dxva2_unlock(avctx);
         av_usleep(2000);
     } while(1);

     if (FAILED(hr)) {
         av_log(avctx, AV_LOG_ERROR, "Failed to begin frame: 0x%x\n", hr);
-#if CONFIG_D3D11VA
-        if (ff_dxva2_is_d3d11(avctx))
-            if (D3D11VA_CONTEXT(ctx)->context_mutex != INVALID_HANDLE_VALUE)
-                ReleaseMutex(D3D11VA_CONTEXT(ctx)->context_mutex);
-#endif
+        ff_dxva2_unlock(avctx);
         return -1;
     }

@@ -284,16 +296,14 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame,

 end:
 #if CONFIG_D3D11VA
-    if (ff_dxva2_is_d3d11(avctx)) {
+    if (ff_dxva2_is_d3d11(avctx))
         hr = ID3D11VideoContext_DecoderEndFrame(D3D11VA_CONTEXT(ctx)->video_context, D3D11VA_CONTEXT(ctx)->decoder);
-        if (D3D11VA_CONTEXT(ctx)->context_mutex != INVALID_HANDLE_VALUE)
-            ReleaseMutex(D3D11VA_CONTEXT(ctx)->context_mutex);
-    }
 #endif
 #if CONFIG_DXVA2
     if (avctx->pix_fmt == AV_PIX_FMT_DXVA2_VLD)
         hr = IDirectXVideoDecoder_EndFrame(DXVA2_CONTEXT(ctx)->decoder, NULL);
 #endif
+    ff_dxva2_unlock(avctx);
     if (FAILED(hr)) {
         av_log(avctx, AV_LOG_ERROR, "Failed to end frame: 0x%x\n", hr);
         result = -1;
-- 
2.11.0
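The commit message mentions replacing the mandatory mutex with user callbacks. This patch does not show what that would look like, so the following is only a guess at the general shape, with every name (SketchDeviceContext, sketch_lock, sketch_locked_call and the counting helpers) invented for illustration. Locking is optional: if no callbacks are set, the helpers do nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device context with optional user-supplied lock callbacks. */
typedef struct SketchDeviceContext {
    void (*lock)(void *lock_ctx);
    void (*unlock)(void *lock_ctx);
    void *lock_ctx;
} SketchDeviceContext;

static void sketch_lock(SketchDeviceContext *dev)
{
    if (dev->lock)
        dev->lock(dev->lock_ctx);
}

static void sketch_unlock(SketchDeviceContext *dev)
{
    if (dev->unlock)
        dev->unlock(dev->lock_ctx);
}

/* Run fn(arg) with the device locked, if the user provided callbacks. */
static int sketch_locked_call(SketchDeviceContext *dev,
                              int (*fn)(void *), void *arg)
{
    int ret;
    /* Either both callbacks are set or neither is. */
    assert((dev->lock == NULL) == (dev->unlock == NULL));
    sketch_lock(dev);
    ret = fn(arg);
    sketch_unlock(dev);
    return ret;
}

/* Toy "lock" that only counts, to show the call order without threads. */
typedef struct SketchCounter { int depth; int acquisitions; } SketchCounter;

static void sketch_counter_lock(void *p)
{
    SketchCounter *c = p;
    c->depth++;
    c->acquisitions++;
}

static void sketch_counter_unlock(void *p)
{
    SketchCounter *c = p;
    c->depth--;
}

/* Reports how many times the toy lock is currently held. */
static int sketch_observe_depth(void *p)
{
    const SketchCounter *c = p;
    return c->depth;
}
```

A real user would plug a mutex into the two callbacks; the counter here only stands in for one so that the example stays free of platform threading APIs.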
[libav-devel] [PATCH 0/5] New D3D hwaccel API stuff
Radically rebased, and omits a few in-between commits that are unnecessary for the end result. avconv_dxva2.c should probably also be deleted, but for now it'd only inflate the diff.

As part of the rebase I've also removed Steve Lhomme as author name - let me know whether I should set his name back on the two relevant commits (first and last one), or how this should be correctly handled.

As far as I'm concerned, this is pretty much finished. Please review or merge.

wm4 (5):
  lavu: add new D3D11 pixfmt and hwcontext
  lavc: set avctx->hwaccel before init
  dxva: preparations for new hwaccel API
  dxva: move d3d11 locking/unlocking to functions
  dxva: add support for new dxva2 and d3d11 hwaccel APIs

 Changelog                      |   1 +
 avtools/avconv.h               |   2 +
 avtools/avconv_opt.c           |   8 +-
 configure                      |  18 +-
 doc/APIchanges                 |   9 +
 libavcodec/allcodecs.c         |   5 +
 libavcodec/decode.c            |   4 +-
 libavcodec/dxva2.c             | 723 +++--
 libavcodec/dxva2_h264.c        |  36 +-
 libavcodec/dxva2_hevc.c        |  32 +-
 libavcodec/dxva2_internal.h    |  63 +++-
 libavcodec/dxva2_mpeg2.c       |  32 +-
 libavcodec/dxva2_vc1.c         |  54 ++-
 libavcodec/h264_slice.c        |   3 +-
 libavcodec/hevcdec.c           |   3 +-
 libavcodec/mpeg12dec.c         |   1 +
 libavcodec/vc1dec.c            |   1 +
 libavcodec/version.h           |   4 +-
 libavutil/Makefile             |   3 +
 libavutil/hwcontext.c          |   4 +
 libavutil/hwcontext.h          |   1 +
 libavutil/hwcontext_d3d11va.c  | 488 +++
 libavutil/hwcontext_d3d11va.h  | 158 +
 libavutil/hwcontext_dxva2.h    |   3 +
 libavutil/hwcontext_internal.h |   1 +
 libavutil/pixdesc.c            |   4 +
 libavutil/pixfmt.h             |   4 +-
 libavutil/version.h            |   4 +-
 28 files changed, 1597 insertions(+), 72 deletions(-)
 create mode 100644 libavutil/hwcontext_d3d11va.c
 create mode 100644 libavutil/hwcontext_d3d11va.h

-- 
2.11.0
[libav-devel] [PATCH 2/5] lavc: set avctx->hwaccel before init
So a hwaccel can access avctx->hwaccel in init for whatever reason. This
is for the new d3d hwaccel API. We could create separate entrypoints for
each of the 3 hwaccel types (dxva2, d3d11va, new d3d11va), but this
seems nicer.
---
 libavcodec/decode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavcodec/decode.c b/libavcodec/decode.c
index 8aa27095b6..f7cb05851d 100644
--- a/libavcodec/decode.c
+++ b/libavcodec/decode.c
@@ -740,16 +740,16 @@ static int setup_hwaccel(AVCodecContext *avctx,
             return AVERROR(ENOMEM);
     }

+    avctx->hwaccel = hwa;
     if (hwa->init) {
         ret = hwa->init(avctx);
         if (ret < 0) {
             av_freep(&avctx->internal->hwaccel_priv_data);
+            avctx->hwaccel = NULL;
             return ret;
         }
     }

-    avctx->hwaccel = hwa;
-
     return 0;
 }

-- 
2.11.0
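The pattern in this patch (publish the pointer before calling init, roll it back if init fails) can be shown in isolation. The sketch below uses made-up types and names (SketchCodecContext, SketchHWAccel, sketch_setup_hwaccel and the two init helpers); it mirrors the control flow of the diff, not the real libavcodec structures.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchCodecContext SketchCodecContext;

/* Hypothetical, stripped-down hwaccel descriptor. */
typedef struct SketchHWAccel {
    const char *name;
    int (*init)(SketchCodecContext *ctx);
} SketchHWAccel;

struct SketchCodecContext {
    const SketchHWAccel *hwaccel;
    int init_saw_hwaccel; /* set by the demo init callback below */
};

/* Set ctx->hwaccel before init so init can inspect it; undo on failure. */
static int sketch_setup_hwaccel(SketchCodecContext *ctx,
                                const SketchHWAccel *hwa)
{
    assert(ctx && hwa);
    ctx->hwaccel = hwa;
    if (hwa->init) {
        int ret = hwa->init(ctx);
        if (ret < 0) {
            ctx->hwaccel = NULL; /* do not leave a half-initialized accel */
            return ret;
        }
    }
    return 0;
}

/* Demo init that records whether the pointer was visible during init. */
static int sketch_init_ok(SketchCodecContext *ctx)
{
    ctx->init_saw_hwaccel = ctx->hwaccel != NULL;
    return 0;
}

/* Demo init that always fails. */
static int sketch_init_fail(SketchCodecContext *ctx)
{
    (void)ctx;
    return -1;
}
```

With one shared init function, the callback can look at ctx->hwaccel to tell which variant it is initializing, which seems to be what the commit message means by avoiding separate entrypoints.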
[libav-devel] [PATCH 3/5] dxva: preparations for new hwaccel API
The actual hwaccel code will need to access an internal context instead of avctx->hwaccel_context, so add a new DXVA_CONTEXT() macro, that will dispatch between the "old" external and the new internal context. Also, the new API requires a new D3D11 pixfmt, so all places which check for the pixfmt need to be adjusted. Introduce a ff_dxva2_is_d3d11() function, which does the check. --- libavcodec/dxva2.c | 33 + libavcodec/dxva2_h264.c | 14 +++--- libavcodec/dxva2_hevc.c | 10 +- libavcodec/dxva2_internal.h | 22 +- libavcodec/dxva2_mpeg2.c| 10 +- libavcodec/dxva2_vc1.c | 10 +- 6 files changed, 56 insertions(+), 43 deletions(-) diff --git a/libavcodec/dxva2.c b/libavcodec/dxva2.c index b0452b6a9a..1cb79fe294 100644 --- a/libavcodec/dxva2.c +++ b/libavcodec/dxva2.c @@ -71,7 +71,7 @@ int ff_dxva2_commit_buffer(AVCodecContext *avctx, HRESULT hr; #if CONFIG_D3D11VA -if (avctx->pix_fmt == AV_PIX_FMT_D3D11VA_VLD) +if (ff_dxva2_is_d3d11(avctx)) hr = ID3D11VideoContext_GetDecoderBuffer(D3D11VA_CONTEXT(ctx)->video_context, D3D11VA_CONTEXT(ctx)->decoder, type, @@ -91,7 +91,7 @@ int ff_dxva2_commit_buffer(AVCodecContext *avctx, memcpy(dxva_data, data, size); #if CONFIG_D3D11VA -if (avctx->pix_fmt == AV_PIX_FMT_D3D11VA_VLD) { +if (ff_dxva2_is_d3d11(avctx)) { D3D11_VIDEO_DECODER_BUFFER_DESC *dsc11 = dsc; memset(dsc11, 0, sizeof(*dsc11)); dsc11->BufferType = type; @@ -116,7 +116,7 @@ int ff_dxva2_commit_buffer(AVCodecContext *avctx, } #if CONFIG_D3D11VA -if (avctx->pix_fmt == AV_PIX_FMT_D3D11VA_VLD) +if (ff_dxva2_is_d3d11(avctx)) hr = ID3D11VideoContext_ReleaseDecoderBuffer(D3D11VA_CONTEXT(ctx)->video_context, D3D11VA_CONTEXT(ctx)->decoder, type); #endif #if CONFIG_DXVA2 @@ -139,7 +139,7 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame, DECODER_BUFFER_DESC *bs, DECODER_BUFFER_DESC *slice)) { -AVDXVAContext *ctx = avctx->hwaccel_context; +AVDXVAContext *ctx = DXVA_CONTEXT(avctx); unsigned buffer_count = 0; #if CONFIG_D3D11VA D3D11_VIDEO_DECODER_BUFFER_DESC 
buffer11[4]; @@ -154,7 +154,7 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame, do { #if CONFIG_D3D11VA -if (avctx->pix_fmt == AV_PIX_FMT_D3D11VA_VLD) { +if (ff_dxva2_is_d3d11(avctx)) { if (D3D11VA_CONTEXT(ctx)->context_mutex != INVALID_HANDLE_VALUE) WaitForSingleObjectEx(D3D11VA_CONTEXT(ctx)->context_mutex, INFINITE, FALSE); hr = ID3D11VideoContext_DecoderBeginFrame(D3D11VA_CONTEXT(ctx)->video_context, D3D11VA_CONTEXT(ctx)->decoder, @@ -171,7 +171,7 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame, if (hr != E_PENDING || ++runs > 50) break; #if CONFIG_D3D11VA -if (avctx->pix_fmt == AV_PIX_FMT_D3D11VA_VLD) +if (ff_dxva2_is_d3d11(avctx)) if (D3D11VA_CONTEXT(ctx)->context_mutex != INVALID_HANDLE_VALUE) ReleaseMutex(D3D11VA_CONTEXT(ctx)->context_mutex); #endif @@ -181,7 +181,7 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame, if (FAILED(hr)) { av_log(avctx, AV_LOG_ERROR, "Failed to begin frame: 0x%x\n", hr); #if CONFIG_D3D11VA -if (avctx->pix_fmt == AV_PIX_FMT_D3D11VA_VLD) +if (ff_dxva2_is_d3d11(avctx)) if (D3D11VA_CONTEXT(ctx)->context_mutex != INVALID_HANDLE_VALUE) ReleaseMutex(D3D11VA_CONTEXT(ctx)->context_mutex); #endif @@ -189,7 +189,7 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame, } #if CONFIG_D3D11VA -if (avctx->pix_fmt == AV_PIX_FMT_D3D11VA_VLD) { +if (ff_dxva2_is_d3d11(avctx)) { buffer = [buffer_count]; type = D3D11_VIDEO_DECODER_BUFFER_PICTURE_PARAMETERS; } @@ -212,7 +212,7 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame, if (qm_size > 0) { #if CONFIG_D3D11VA -if (avctx->pix_fmt == AV_PIX_FMT_D3D11VA_VLD) { +if (ff_dxva2_is_d3d11(avctx)) { buffer = [buffer_count]; type = D3D11_VIDEO_DECODER_BUFFER_INVERSE_QUANTIZATION_MATRIX; } @@ -235,7 +235,7 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame, } #if CONFIG_D3D11VA -if (avctx->pix_fmt == AV_PIX_FMT_D3D11VA_VLD) { +if (ff_dxva2_is_d3d11(avctx)) { buffer = [buffer_count + 0]; 
buffer_slice = [buffer_count + 1]; } @@ -262,7 +262,7 @@ int ff_dxva2_common_end_frame(AVCodecContext *avctx, AVFrame *frame, assert(buffer_count == 1 + (qm_size
[libav-devel] [PATCH 5/5] dxva: add support for new dxva2 and d3d11 hwaccel APIs
This also adds support to avconv (which is trivial due to the new hwaccel API being generic enough). For now, this keeps avconv_dxva2.c as "dxva2-old", although it doesn't work as avconv.c can't handle multiple hwaccels with the same pixfmt. The new decoder setup code in dxva2.c is significantly based on work by Steve Lhomme, but with heavy changes/rewrites. --- Changelog | 1 + avtools/avconv.h| 2 + avtools/avconv_opt.c| 8 +- configure | 12 +- doc/APIchanges | 6 + libavcodec/allcodecs.c | 5 + libavcodec/dxva2.c | 654 +++- libavcodec/dxva2_h264.c | 22 ++ libavcodec/dxva2_hevc.c | 22 ++ libavcodec/dxva2_internal.h | 43 ++- libavcodec/dxva2_mpeg2.c| 22 ++ libavcodec/dxva2_vc1.c | 44 +++ libavcodec/h264_slice.c | 3 +- libavcodec/hevcdec.c| 3 +- libavcodec/mpeg12dec.c | 1 + libavcodec/vc1dec.c | 1 + libavcodec/version.h| 4 +- libavutil/hwcontext_dxva2.h | 3 + 18 files changed, 844 insertions(+), 12 deletions(-) diff --git a/Changelog b/Changelog index 6fd30fddb9..e44df54c93 100644 --- a/Changelog +++ b/Changelog @@ -15,6 +15,7 @@ version : - VP9 superframe split/merge bitstream filters - FM Screen Capture Codec decoder - ClearVideo decoder (I-frames only) +- support for decoding through D3D11VA in avconv version 12: diff --git a/avtools/avconv.h b/avtools/avconv.h index 3354c50444..fe2bb313b7 100644 --- a/avtools/avconv.h +++ b/avtools/avconv.h @@ -54,9 +54,11 @@ enum HWAccelID { HWACCEL_AUTO, HWACCEL_VDPAU, HWACCEL_DXVA2, +HWACCEL_DXVA2_OLD, HWACCEL_VDA, HWACCEL_QSV, HWACCEL_VAAPI, +HWACCEL_D3D11VA, }; typedef struct HWAccel { diff --git a/avtools/avconv_opt.c b/avtools/avconv_opt.c index 9839a2269e..e2599bd4d8 100644 --- a/avtools/avconv_opt.c +++ b/avtools/avconv_opt.c @@ -60,8 +60,14 @@ const HWAccel hwaccels[] = { { "vdpau", hwaccel_decode_init, HWACCEL_VDPAU, AV_PIX_FMT_VDPAU, AV_HWDEVICE_TYPE_VDPAU }, #endif +#if HAVE_D3D11VA_LIB +{ "d3d11va", hwaccel_decode_init, HWACCEL_D3D11VA, AV_PIX_FMT_D3D11, + AV_HWDEVICE_TYPE_D3D11VA }, +#endif #if HAVE_DXVA2_LIB -{ 
"dxva2", dxva2_init, HWACCEL_DXVA2, AV_PIX_FMT_DXVA2_VLD, +{ "dxva2", hwaccel_decode_init, HWACCEL_DXVA2, AV_PIX_FMT_DXVA2_VLD, + AV_HWDEVICE_TYPE_DXVA2}, +{ "dxva2-old", dxva2_init, HWACCEL_DXVA2_OLD, AV_PIX_FMT_DXVA2_VLD, AV_HWDEVICE_TYPE_NONE }, #endif #if CONFIG_VDA diff --git a/configure b/configure index c3ccf69730..2183d23bde 100755 --- a/configure +++ b/configure @@ -2168,7 +2168,7 @@ zmbv_encoder_deps="zlib" # hardware accelerators d3d11va_deps="d3d11_h dxva_h ID3D11VideoDecoder" d3d11va_lib_deps="d3d11va" -dxva2_deps="dxva2api_h DXVA2_ConfigPictureDecode" +dxva2_deps="dxva2api_h DXVA2_ConfigPictureDecode ole32" dxva2_lib_deps="dxva2" vda_deps="VideoDecodeAcceleration_VDADecoder_h blocks_extension pthreads" vda_extralibs="-framework CoreFoundation -framework VideoDecodeAcceleration -framework QuartzCore" @@ -2177,6 +2177,8 @@ h263_vaapi_hwaccel_deps="vaapi" h263_vaapi_hwaccel_select="h263_decoder" h264_d3d11va_hwaccel_deps="d3d11va" h264_d3d11va_hwaccel_select="h264_decoder" +h264_d3d11va2_hwaccel_deps="d3d11va" +h264_d3d11va2_hwaccel_select="h264_decoder" h264_dxva2_hwaccel_deps="dxva2" h264_dxva2_hwaccel_select="h264_decoder" h264_mmal_hwaccel_deps="mmal" @@ -2191,6 +2193,8 @@ h264_vdpau_hwaccel_deps="vdpau" h264_vdpau_hwaccel_select="h264_decoder" hevc_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_HEVC" hevc_d3d11va_hwaccel_select="hevc_decoder" +hevc_d3d11va2_hwaccel_deps="d3d11va DXVA_PicParams_HEVC" +hevc_d3d11va2_hwaccel_select="hevc_decoder" hevc_dxva2_hwaccel_deps="dxva2 DXVA_PicParams_HEVC" hevc_dxva2_hwaccel_select="hevc_decoder" hevc_qsv_hwaccel_deps="libmfx" @@ -2202,6 +2206,8 @@ mpeg1_vdpau_hwaccel_deps="vdpau" mpeg1_vdpau_hwaccel_select="mpeg1video_decoder" mpeg2_d3d11va_hwaccel_deps="d3d11va" mpeg2_d3d11va_hwaccel_select="mpeg2video_decoder" +mpeg2_d3d11va2_hwaccel_deps="d3d11va" +mpeg2_d3d11va2_hwaccel_select="mpeg2video_decoder" mpeg2_dxva2_hwaccel_deps="dxva2" mpeg2_dxva2_hwaccel_select="mpeg2video_decoder" mpeg2_mmal_hwaccel_deps="mmal" 
@@ -2216,6 +,8 @@ mpeg4_vdpau_hwaccel_deps="vdpau" mpeg4_vdpau_hwaccel_select="mpeg4_decoder" vc1_d3d11va_hwaccel_deps="d3d11va" vc1_d3d11va_hwaccel_select="vc1_decoder" +vc1_d3d11va2_hwaccel_deps="d3d11va" +vc1_d3d11va2_hwaccel_select="vc1_decoder" vc1_dxva2_hwaccel_deps="dxva2" vc1_dxva2_hwaccel_select="vc1_decoder" vc1_mmal_hwaccel_deps="mmal" @@ -2228,6 +2236,7 @@ vp8_qsv_hwaccel_deps="libmfx" vp8_vaapi_hwaccel_deps="vaapi VAPictureParameterBufferVP8" vp8_vaapi_hwaccel_select="vp8_decoder" wmv3_d3d11va_hwaccel_select="vc1_d3d11va_hwaccel" +wmv3_d3d11va2_hwaccel_select="vc1_d3d11va2_hwaccel"
[libav-devel] [PATCH 1/5] lavu: add new D3D11 pixfmt and hwcontext
To be used with the new d3d11 hwaccel decode API. With the new hwaccel API, we don't want surfaces to depend on the decoder (other than the required dimension and format). The old D3D11VA pixfmt uses ID3D11VideoDecoderOutputView pointers, which include the decoder configuration, and thus is incompatible with the new hwaccel API. This patch introduces AV_PIX_FMT_D3D11, which uses ID3D11Texture2D and an index. It's simpler and compatible with the new hwaccel API. The introduced hwcontext supports only the new pixfmt. Significantly based on work by Steve Lhomme, but with heavy changes/rewrites. --- Somewhat sketchy: if initial_pool_size is set, the pool is assumed to be static. --- configure | 6 + doc/APIchanges | 3 + libavutil/Makefile | 3 + libavutil/hwcontext.c | 4 + libavutil/hwcontext.h | 1 + libavutil/hwcontext_d3d11va.c | 488 + libavutil/hwcontext_d3d11va.h | 158 + libavutil/hwcontext_internal.h | 1 + libavutil/pixdesc.c| 4 + libavutil/pixfmt.h | 4 +- libavutil/version.h| 4 +- 11 files changed, 673 insertions(+), 3 deletions(-) create mode 100644 libavutil/hwcontext_d3d11va.c create mode 100644 libavutil/hwcontext_d3d11va.h diff --git a/configure b/configure index 6f696c9ab5..c3ccf69730 100755 --- a/configure +++ b/configure @@ -1712,6 +1712,7 @@ HAVE_LIST=" $THREADS_LIST $TOOLCHAIN_FEATURES $TYPES_LIST +d3d11va_lib dos_paths dxva2_lib libc_msvcrt @@ -2166,6 +2167,7 @@ zmbv_encoder_deps="zlib" # hardware accelerators d3d11va_deps="d3d11_h dxva_h ID3D11VideoDecoder" +d3d11va_lib_deps="d3d11va" dxva2_deps="dxva2api_h DXVA2_ConfigPictureDecode" dxva2_lib_deps="dxva2" vda_deps="VideoDecodeAcceleration_VDADecoder_h blocks_extension pthreads" @@ -4861,6 +4863,10 @@ if enabled libxcb; then check_pkg_config libxcb_xfixes xcb-xfixes xcb/xfixes.h xcb_xfixes_get_cursor_image fi +enabled d3d11va && +check_type "windows.h d3d11.h" ID3D11VideoDevice && +enable d3d11va_lib + enabled dxva2 && check_lib dxva2_lib windows.h CoTaskMemFree -lole32 diff --git a/doc/APIchanges 
b/doc/APIchanges index a251c4ca82..a81e41833d 100644 --- a/doc/APIchanges +++ b/doc/APIchanges @@ -13,6 +13,9 @@ libavutil: 2017-03-23 API changes, most recent first: +2017-xx-xx - xxx - lavu 56.2.0 - hwcontext.h + Add AV_HWDEVICE_TYPE_D3D11VA and AV_PIX_FMT_D3D11. + 2017-04-30 - xxx - lavu 56.1.1 - hwcontext.h av_hwframe_ctx_create_derived() now takes some AV_HWFRAME_MAP_* combination as its flags argument (which was previously unused). diff --git a/libavutil/Makefile b/libavutil/Makefile index 60e180c79d..6fb24db678 100644 --- a/libavutil/Makefile +++ b/libavutil/Makefile @@ -27,6 +27,7 @@ HEADERS = adler32.h \ hmac.h\ hwcontext.h \ hwcontext_cuda.h \ + hwcontext_d3d11va.h \ hwcontext_dxva2.h \ hwcontext_qsv.h \ hwcontext_vaapi.h \ @@ -112,6 +113,7 @@ OBJS = adler32.o \ xtea.o \ OBJS-$(CONFIG_CUDA) += hwcontext_cuda.o +OBJS-$(CONFIG_D3D11VA) += hwcontext_d3d11va.o OBJS-$(CONFIG_DXVA2)+= hwcontext_dxva2.o OBJS-$(CONFIG_LIBMFX) += hwcontext_qsv.o OBJS-$(CONFIG_LZO) += lzo.o @@ -121,6 +123,7 @@ OBJS-$(CONFIG_VDPAU)+= hwcontext_vdpau.o OBJS += $(COMPAT_OBJS:%=../compat/%) SKIPHEADERS-$(CONFIG_CUDA) += hwcontext_cuda.h +SKIPHEADERS-$(CONFIG_D3D11VA) += hwcontext_d3d11va.h SKIPHEADERS-$(CONFIG_DXVA2)+= hwcontext_dxva2.h SKIPHEADERS-$(CONFIG_LIBMFX) += hwcontext_qsv.h SKIPHEADERS-$(CONFIG_VAAPI)+= hwcontext_vaapi.h diff --git a/libavutil/hwcontext.c b/libavutil/hwcontext.c index 360b01205c..d82df56abf 100644 --- a/libavutil/hwcontext.c +++ b/libavutil/hwcontext.c @@ -32,6 +32,9 @@ static const HWContextType * const hw_table[] = { #if CONFIG_CUDA _hwcontext_type_cuda, #endif +#if CONFIG_D3D11VA +_hwcontext_type_d3d11va, +#endif #if CONFIG_DXVA2 _hwcontext_type_dxva2, #endif @@ -50,6 +53,7 @@ static const HWContextType * const hw_table[] = { const char *hw_type_names[] = { [AV_HWDEVICE_TYPE_CUDA] = "cuda", [AV_HWDEVICE_TYPE_DXVA2] = "dxva2", +[AV_HWDEVICE_TYPE_D3D11VA] =