Re: [FFmpeg-devel] have some major changes for nvenc support
On 1/8/16, Andrey Turkin wrote:
> In my opinion this proliferation of various filters which do the same thing in different ways is a configuration headache. There's CPU filters: one for scaling/format conversion, one for padding, one for cropping, like 5 different filters for deinterlacing. And now there would be nvresize for scaling, and we gotta add CUDA-based nvpad and nvdeint if we want to do some transcoding. Maybe add overlaying too - so nvoverlay. Then there is OpenCL which can do everything - so 4 more filters for that. And there is quicksync which I think can do those things, so there would be qsvscale and qsvdeint. And there is the D3D11 video processor which can do those things too (simultaneously in fact), so there's gotta be d3d11vascaledeintpadcropoverlay. And then we've got a whole bunch of video filters which can only do their job on a specific hwaccel platform. Want to try a different hwaccel? Rewrite the damn filter string. Want to do something generic that can be used across different platforms? Can't be done.
> Maybe it's just my wishful thinking, but I was wondering for some time if there can be one "smart" filter to do one specific thing?

Yes, a smart wrapper might be nice. I'm all for different ways to resize, though. I used to wonder why there wasn't a filter option like -filter_complex "convert_to_opengl; opengl_resize=1000x1000; opengl_rotate=90; convert_from_opengl;" - that type of thing, I think based on similar functionality GStreamer has.

Cheers!
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
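For illustration, the wished-for syntax might look something like the command below. Note that none of these filters exist in FFmpeg; the names (convert_to_opengl, opengl_resize, opengl_rotate, convert_from_opengl) are purely hypothetical, sketched from the message above.

```shell
# Hypothetical command line - these GPU filters do not exist in FFmpeg.
# The idea: upload once, do all GPU work, download once at the end.
ffmpeg -i input.mp4 -filter_complex \
  "convert_to_opengl,opengl_resize=1000x1000,opengl_rotate=90,convert_from_opengl" \
  -c:v libx264 output.mp4
```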
Re: [FFmpeg-devel] have some major changes for nvenc support
2016-01-08 20:25 GMT+03:00 Michael Niedermayer:
> Also slightly orthogonal but if you have 4 filters each written for a different hwaccel you can write a generic filter that passes its stuff to the appropriate one.
> If there's not much shareable code between hw-specific filters then this might be cleaner than throwing all hw-specific code in one filter directly.
> Such separate filters could use the existing conditional compilation code, could be used directly if the user wants to force a specific hw type, ...

So we'd have a default filter and then specific ones just in case anyone needs them. That is a great idea. I'm not sure which way is cleaner - to have a proxy filter, or to lump everything together in one filter and add extra filter definitions to force a specific hwaccel. It would work either way.

What I'm looking for in the end, though, is a more "intelligent" ffmpeg with respect to hardware acceleration. I'd much rather have it automagically use hwaccel wherever possible, in the optimal way, and not have to describe every detail. For example, this thread started because NVidia had a specific scenario where they had to offload scaling to CUDA - and they had to concoct something arguably messy to make it happen. There's a very specialized filter with many outputs, you have to use -filter_complex, and you have to map filter outputs to output files, etc. It would be awesome if they could just do ffmpeg -i input -s 1920x1080 -c:v nvenc_h264 hd1080.mp4 -s 1280x720 -c:v nvenc_h264 hd720.mp4 ..., and it'd do the scaling on the GPU without explicit instructions.

In my opinion (which might be totally wrong), it would take 4 changes to make that happen:
- make ffmpeg use avfilter for scaling - i.e. connect a scale filter to the filterchain output (unless it already does);
- add another pixel format for CUDA, add support for it to scale, and add it as an input format for nvenc_h264;
- adjust the pixel format negotiation logic in avfilter to make sure it would select the CUDA pixel format for scale (maybe just preferring hwaccel formats over software ones would work?);
- add an interop filter to perform CPU-GPU and GPU-CPU data transfers - i.e. convert between hwaccel and the corresponding software formats; avfilter would insert it in appropriate places when negotiating pixel formats (I think it does something similar with scale). This might be tricky - e.g. in my example a single interop filter needs to be added and its output has to be shared between all the scale filters. If, say, there were 2 GPUs used in encoding then there would have to be 2 interop filters. On the plus side, all existing host->device copying code in the encoders could be thrown out (or rather moved into that filter), as could the existing device->host copying code in ffmpeg_*.c. It would also make writing new hwaccel-enabled filters easier.

Actually there is one more thing to do - filters would somehow have to share hwaccel contexts with their neighbour filters, as well as with the filterchain inputs and outputs.
Re: [FFmpeg-devel] have some major changes for nvenc support
On Fri, Jan 08, 2016 at 03:04:26PM +0300, Andrey Turkin wrote:
> In my opinion this proliferation of various filters which do the same thing in different ways is a configuration headache. There's CPU filters: one for scaling/format conversion, one for padding, one for cropping, like 5 different filters for deinterlacing. And now there would be nvresize for scaling, and we gotta add CUDA-based nvpad and nvdeint if we want to do some transcoding. Maybe add overlaying too - so nvoverlay. Then there is OpenCL which can do everything - so 4 more filters for that. And there is quicksync which I think can do those things, so there would be qsvscale and qsvdeint. And there is the D3D11 video processor which can do those things too (simultaneously in fact), so there's gotta be d3d11vascaledeintpadcropoverlay. And then we've got a whole bunch of video filters which can only do their job on a specific hwaccel platform. Want to try a different hwaccel? Rewrite the damn filter string. Want to do something generic that can be used across different platforms? Can't be done.
> Maybe it's just my wishful thinking, but I was wondering for some time if there can be one "smart" filter to do one specific thing? Say, a single deinterlace filter which can automatically use whichever hwaccel was used on its input (or whichever will be used on its output)? We've already got pixel formats which describe a particular hwaccel - can't filters decide which code path to use based on that? And it can have a configuration

Also slightly orthogonal, but if you have 4 filters each written for a different hwaccel you can write a generic filter that passes its stuff to the appropriate one.

If there's not much shareable code between hw-specific filters then this might be cleaner than throwing all hw-specific code in one filter directly.

Such separate filters could use the existing conditional compilation code, could be used directly if the user wants to force a specific hw type, ...

[...]
--
Michael
GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
I am the wisest man alive, for I know one thing, and that is that I know nothing. -- Socrates
Re: [FFmpeg-devel] have some major changes for nvenc support
In my opinion this proliferation of various filters which do the same thing in different ways is a configuration headache. There's CPU filters: one for scaling/format conversion, one for padding, one for cropping, like 5 different filters for deinterlacing. And now there would be nvresize for scaling, and we gotta add CUDA-based nvpad and nvdeint if we want to do some transcoding. Maybe add overlaying too - so nvoverlay. Then there is OpenCL which can do everything - so 4 more filters for that. And there is quicksync which I think can do those things, so there would be qsvscale and qsvdeint. And there is the D3D11 video processor which can do those things too (simultaneously in fact), so there's gotta be d3d11vascaledeintpadcropoverlay. And then we've got a whole bunch of video filters which can only do their job on a specific hwaccel platform. Want to try a different hwaccel? Rewrite the damn filter string. Want to do something generic that can be used across different platforms? Can't be done.

Maybe it's just my wishful thinking, but I was wondering for some time if there can be one "smart" filter to do one specific thing? Say, a single deinterlace filter which can automatically use whichever hwaccel was used on its input (or whichever will be used on its output)? We've already got pixel formats which describe a particular hwaccel - can't filters decide which code path to use based on that? And it could have a configuration option to choose which CPU-based fallback to use (in fact that option could be used to tweak the GPU-based algorithm on platforms which support it).

The same goes for encoders - can't there be "just" an h264 encoder? This one's tough though - you might have a dxva decoder, CUDA filters and an nvenc encoder, and you probably want to keep them all on the same GPU. Or not - maybe you do want to decode on qsv and encode on nvenc, or maybe vice versa. Probably the single-GPU case is more common, so it should try to use the same GPU that was used for decoding and/or filtering, and allow overriding that with encoder options (like nvenc allows specifying the CUDA device to use). Interop will be a pain though (obviously we'd want to avoid device-host frame transfers). I'm trying to share d3d11va-decoded frames (an NV12 texture) with OpenCL or CUDA right now, and I've been at it for days with no luck whatsoever. My last resort now is to write a shader to "draw" (in fact just copy pixels around) the NV12 texture onto another texture in a more "common" format which can be used by OpenCL... There's got to be some easier way (other than using cuvid to decode the video), right?

Regards,
Andrey

2016-01-08 11:29 GMT+03:00 Roger Pack:
> On 11/5/15, wm4 wrote:
> > On Thu, 5 Nov 2015 16:23:04 +0800 Agatha Hu wrote:
> >> 2) We use the AVFrame::opaque field to store a customized ffnvinfo structure to prevent expensive CPU<->GPU transfers. Without it, the workflow would be: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->copy to CPU AVFrame-->copy to GPU-->NVENC encoding. And now it becomes: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->NVENC encoding. Our strategy is to check whether AVFrame::opaque is not null AND its first 128 bytes match some particular GUID. If so, AVFrame::opaque is a valid ffnvinfo structure and we read the GPU address directly from it instead of copying data from the AVFrame.
> > Please no, not another hack that makes the hw decoding API situation worse. Do this properly and coordinate with Gwenole Beauchesne, who plans improvements in this direction.
> Which part are you referring to (though I'll admit putting some stuff in libavutil seems a bit suspect).
> It would be nice to have the nvresize filter available anyway, and it looks like it mostly just deals with private context variables.
> Cheers!
> -roger-
Re: [FFmpeg-devel] have some major changes for nvenc support
On 11/5/15, wm4 wrote:
> On Thu, 5 Nov 2015 16:23:04 +0800 Agatha Hu wrote:
>> 2) We use the AVFrame::opaque field to store a customized ffnvinfo structure to prevent expensive CPU<->GPU transfers. Without it, the workflow would be: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->copy to CPU AVFrame-->copy to GPU-->NVENC encoding. And now it becomes: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->NVENC encoding. Our strategy is to check whether AVFrame::opaque is not null AND its first 128 bytes match some particular GUID. If so, AVFrame::opaque is a valid ffnvinfo structure and we read the GPU address directly from it instead of copying data from the AVFrame.
> Please no, not another hack that makes the hw decoding API situation worse. Do this properly and coordinate with Gwenole Beauchesne, who plans improvements in this direction.

Which part are you referring to (though I'll admit putting some stuff in libavutil seems a bit suspect)?

It would be nice to have the nvresize filter available anyway, and it looks like it mostly just deals with private context variables.

Cheers!
-roger-
Re: [FFmpeg-devel] have some major changes for nvenc support
On 2015/11/5 18:31, wm4 wrote:
> On Thu, 5 Nov 2015 16:23:04 +0800 Agatha Hu wrote:
>> 2) We use the AVFrame::opaque field to store a customized ffnvinfo structure to prevent expensive CPU<->GPU transfers. Without it, the workflow would be: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->copy to CPU AVFrame-->copy to GPU-->NVENC encoding. And now it becomes: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->NVENC encoding. Our strategy is to check whether AVFrame::opaque is not null AND its first 128 bytes match some particular GUID. If so, AVFrame::opaque is a valid ffnvinfo structure and we read the GPU address directly from it instead of copying data from the AVFrame.
> Please no, not another hack that makes the hw decoding API situation worse. Do this properly and coordinate with Gwenole Beauchesne, who plans improvements in this direction.

We will try to catch up with Gwenole Beauchesne's work - is the related AVHWAccelFrame available for use now?

Agatha Hu
Re: [FFmpeg-devel] have some major changes for nvenc support
On Thu, 5 Nov 2015 16:23:04 +0800 Agatha Hu wrote:
> 2) We use the AVFrame::opaque field to store a customized ffnvinfo structure to prevent expensive CPU<->GPU transfers. Without it, the workflow would be: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->copy to CPU AVFrame-->copy to GPU-->NVENC encoding. And now it becomes: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->NVENC encoding. Our strategy is to check whether AVFrame::opaque is not null AND its first 128 bytes match some particular GUID. If so, AVFrame::opaque is a valid ffnvinfo structure and we read the GPU address directly from it instead of copying data from the AVFrame.

Please no, not another hack that makes the hw decoding API situation worse. Do this properly and coordinate with Gwenole Beauchesne, who plans improvements in this direction.
[FFmpeg-devel] have some major changes for nvenc support
Hi,

Recently Nvidia did some work on improving nvenc performance; it includes lots of changes, so I attach the patch instead of sending it directly. Here are the explanations:

1) The first main change is adding an nvresize filter (1:N - one input, multiple outputs) to do hardware resizing, because during our internal 1:N encoding tests we found swscale becomes the bottleneck. So we use a CUDA kernel instead.

2) We use the AVFrame::opaque field to store a customized ffnvinfo structure to prevent expensive CPU<->GPU transfers. Without it, the workflow would be: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->copy to CPU AVFrame-->copy to GPU-->NVENC encoding. And now it becomes: CPU AVFrame input-->copy to GPU-->do CUDA resizing-->NVENC encoding. Our strategy is to check whether AVFrame::opaque is not null AND its first 128 bytes match some particular GUID. If so, AVFrame::opaque is a valid ffnvinfo structure and we read the GPU address directly from it instead of copying data from the AVFrame. The nvresize filter has a -readback parameter: if it's set to 0, the resized result won't be copied back to the CPU, mostly for the case where it's connected to an NVENC encoder. If it's set to 1, the resized result will still be copied back to the AVFrame so that it stays compatible with other components.

3) Because we are using CUDA addresses now, the input buffer becomes CUDA external memory. We replaced NvEncCreateInputBuffer with cuMemAllocPitch+NvEncRegisterInputBuffer, and NvEncLock/UnlockInputBuffer with NvEncMap/UnmapInputBuffer.

4) And because of using CUDA input, this exposed some driver bugs, e.g. nvenc generates corrupted chroma plane data if the buffer format is YUV420P. A bug-fixed driver will soon be released, but considering backwards compatibility we decided to convert YUV420P to NV12 explicitly with a CUDA kernel in nvenc.c. Even in the bug-fixed driver there is still a YUV420P->NV12 conversion kernel; the only difference is that that kernel is provided along with the driver, whereas here we do it within nvenc.c.
For the same reason, YUV444P support is removed temporarily; there's a bug for CUDA input. Once the fix is released, we should enable the support again. We chose to keep supporting YUV420P because it's much more popular than YUV444P.

5) Lastly, we moved most of the CUDA typedefs/functions/helpers to cudautils.h/c.

A typical use case is:

ffmpeg -y -i $1 $2 $3 -filter_complex \
 nvresize=5:s=hd1080\|hd720\|hd480\|wvga\|cif:readback=0[out0][out1][out2][out3][out4] \
 -map [out0] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v 200M -bufsize 200M -maxrate 200M -refs 1 -bf 2 $1_1080p.mp4 \
 -map [out1] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v 100M -bufsize 100M -maxrate 100M -refs 1 -bf 2 $1_720p.mp4 \
 -map [out2] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v 50M -bufsize 50M -maxrate 50M -refs 1 -bf 2 $1_480p.mp4 \
 -map [out3] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v 25M -bufsize 25M -maxrate 25M -refs 1 -bf 2 $1_wvga.mp4 \
 -map [out4] -an -vcodec nvenc_h264 -preset slow -profile:v main -async 1 -b:v 10M -bufsize 10M -maxrate 10M -refs 1 -bf 2 $1_cif.mp4

Thanks
Agatha Hu

>From 4bb843a47cbcef9c0383efb7e573f0f8eadb65d6 Mon Sep 17 00:00:00 2001
From: Ganapathy Kasi
Date: Wed, 4 Nov 2015 22:22:35 -0800
Subject: [PATCH] combined: cuda resize,yuv420 fix,remove yuv444,add AQ

---
 libavcodec/Makefile           |    2 +-
 libavcodec/nvenc.c            |  435 ++-
 libavcodec/nvenc_ptx.c        |  240 +++
 libavfilter/Makefile          |    2 +
 libavfilter/allfilters.c      |    1 +
 libavfilter/vf_nvresize.c     |  669 ++
 libavfilter/vf_nvresize_ptx.c |  659 +
 libavutil/Makefile            |    2 +
 libavutil/cudautils.c         |  288 ++
 libavutil/cudautils.h         |  216 ++
 10 files changed, 2241 insertions(+), 273 deletions(-)
 create mode 100644 libavcodec/nvenc_ptx.c
 create mode 100644 libavfilter/vf_nvresize.c
 create mode 100644 libavfilter/vf_nvresize_ptx.c
 create mode 100644 libavutil/cudautils.c
 create mode 100644 libavutil/cudautils.h

diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 67fb72a..45ac476 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -98,7 +98,7 @@ OBJS-$(CONFIG_MPEGVIDEOENC)            += mpegvideo_enc.o mpeg12data.o \
                                           motion_est.o ratecontrol.o \
                                           mpegvideoencdsp.o
 OBJS-$(CONFIG_MSS34DSP)                += mss34dsp.o
-OBJS-$(CONFIG_NVENC)                   += nvenc.o
+OBJS-$(CONFIG_NVENC)                   += nvenc.o nvenc_ptx.o
 OBJS-$(CONFIG_PIXBLOCKDSP)             += pixblockdsp.o
 OBJS-$(CONFIG_QPELDSP)                 += qpeldsp.o
 OBJS-$(CONFIG_QSV)                     += qsv.o
diff --git a/libavcodec/nvenc.c b/