Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 04.09.2017 at 14:59, Yogender Gupta wrote:
> Taken care of all comments except the documentation.

applied

> Will send out a separate patch for both the CUDA filters documentation.

ok
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 13 September 2017 at 19:12, Timo Rothenpieler wrote:
>> I did object and have NAK'd the patch currently.
>> Make the submitter submit a new version themselves and I'll approve it.
>
> There was a new version submitted already, I just replied to the wrong one.

Should be fine then.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
> I did object and have NAK'd the patch currently.
> Make the submitter submit a new version themselves and I'll approve it.

There was a new version submitted already, I just replied to the wrong one.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 13 September 2017 at 14:43, Timo Rothenpieler wrote:
> Will apply with some minor changes (some unused variables are still left,
> and some cleanup is missing) soon if nobody objects.

I did object and have NAK'd the patch currently.
Make the submitter submit a new version themselves and I'll approve it.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
Will apply with some minor changes (some unused variables are still left, and some cleanup is missing) soon if nobody objects.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
> It would be nice, yes, but I'm not sure there is actually that much need with the current setup. The use-cases for the two as currently written don't really overlap - CUDA is useful in the cases you describe with (possibly multiple) high-power GPUs trying to squeeze as much performance as possible out of a system to run many streams, while my OpenCL stuff is intended to be useful on random low-power devices where doing more stuff on the GPU can make the difference between managing real-time or not on a small number of streams.
>
> On this filter in particular, I find thumbnail a slightly weird choice to want to write a GPU version of, but if it works in essentially the same way as the software filter and someone has a use-case for it then sure. (* Not that I've actually read it, I'm not familiar with CUDA at all.)
>
> An "N times speedup" metric or comparison with some CPU implementation with SIMD is essentially irrelevant, because that isn't the point - even when slower than the CPU implementation there can still be value in it not running on the CPU (this probably won't happen with CUDA because it only runs on high-power devices, but it is certainly possible on mobile devices with OpenCL).
>
> - Mark

Creating a bunch of thumbnails without major system impact is probably useful for big video websites that want to save time and system resources. While definitely not an everyday use case, I can see applications for it.

Some other filters I intend to write/add CUDA versions of are chromakey and despill. And, for those first ones to be useful at all, the overlay filter as well.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 11/09/17 10:18, Timo Rothenpieler wrote:
> I also really don't see the concern with adding CUDA versions of already existing filters.
> They are not included in any standard build, and require both non-free and the cuda-sdk to be even built in the first place.
>
> For their specific use case of a fully hardware-accelerated transcode and filter pipeline they clearly offer benefits. Specially when the final encode is to be done with nvenc and/or when operating on huge frames (4K or maybe even bigger) using the GPU has clear benefits and I doubt any SIMD will be able to compensate for it.
>
> Another scenario where a 100% GPU pipeline becomes essential is when you are processing _a lot_ of streams on one machine. You can freely put more GPUs in and gain more VMEM and Cores to work with, without interfering with the others.
> If there is a single CPU based filter anywhere in that chain you will very quickly be bottlenecked by it and the copying to and from sysmem.
>
> Concerning the OpenCL infrastructure that was just posted to the list:
> It would indeed be nice if there was a way to map CUDA frames to OpenCL, and the other way around. But I am not aware of any interoperability there and Nvidia has more than big enough of a market share on server and cloud GPUs (see for example AWS) to make adding CUDA based filters worthwhile.

It would be nice, yes, but I'm not sure there is actually that much need with the current setup. The use-cases for the two as currently written don't really overlap - CUDA is useful in the cases you describe with (possibly multiple) high-power GPUs trying to squeeze as much performance as possible out of a system to run many streams, while my OpenCL stuff is intended to be useful on random low-power devices where doing more stuff on the GPU can make the difference between managing real-time or not on a small number of streams.

On this filter in particular, I find thumbnail a slightly weird choice to want to write a GPU version of, but if it works in essentially the same way as the software filter and someone has a use-case for it then sure. (* Not that I've actually read it, I'm not familiar with CUDA at all.)

An "N times speedup" metric or comparison with some CPU implementation with SIMD is essentially irrelevant, because that isn't the point - even when slower than the CPU implementation there can still be value in it not running on the CPU (this probably won't happen with CUDA because it only runs on high-power devices, but it is certainly possible on mobile devices with OpenCL).

- Mark
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 11.09.2017 at 07:40, Yogender Gupta wrote:
>> Only 3 to 4 times? This is easily doable with SIMD.
>
> The problem is not with the thumbnail filter at all. The problem is doing the transfers from vidmem to sysmem or vice-versa. You will observe if we use a transcoder pipeline with and without hwaccel cuvid (using hw encoder/decoders in both cases), the one with hwaccel runs much faster. If we add more transfers by using a CPU based filter, it will only degrade the performance further.
>
> The CUDA thumbnail filter can work directly on the video memory without requiring an additional vidmem to sysmem transfer.
>
> Thanks,
> Yogender

I also really don't see the concern with adding CUDA versions of already existing filters.
They are not included in any standard build, and require both non-free and the cuda-sdk to even be built in the first place.

For their specific use case of a fully hardware-accelerated transcode and filter pipeline they clearly offer benefits. Especially when the final encode is to be done with nvenc and/or when operating on huge frames (4K or maybe even bigger), using the GPU has clear benefits and I doubt any SIMD will be able to compensate for it.

Another scenario where a 100% GPU pipeline becomes essential is when you are processing _a lot_ of streams on one machine. You can freely put more GPUs in and gain more VMEM and Cores to work with, without interfering with the others.
If there is a single CPU based filter anywhere in that chain you will very quickly be bottlenecked by it and the copying to and from sysmem.

Concerning the OpenCL infrastructure that was just posted to the list:
It would indeed be nice if there was a way to map CUDA frames to OpenCL, and the other way around. But I am not aware of any interoperability there and Nvidia has more than big enough of a market share on server and cloud GPUs (see for example AWS) to make adding CUDA based filters worthwhile.
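A rough back-of-the-envelope sketch of the copy overhead discussed above, in C. The helper names are hypothetical, and the numbers are illustrative assumptions (tightly packed 8-bit NV12 frames, one download plus one upload per frame), not measurements from this thread:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one 8-bit NV12 frame: a full-resolution luma plane plus an
 * interleaved chroma plane subsampled 2x2, i.e. 1.5 bytes per pixel.
 * Assumption: even dimensions and no row padding (real hardware surfaces
 * are usually pitched, so this is a lower bound). */
static uint64_t nv12_frame_bytes(uint32_t width, uint32_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    return (uint64_t)width * height * 3 / 2;
}

/* Bytes per second crossing the bus if every frame is downloaded to system
 * memory for a CPU filter and then uploaded again for the hardware encoder. */
static uint64_t round_trip_bytes_per_sec(uint32_t width, uint32_t height,
                                         uint32_t frames_per_sec)
{
    return 2 * nv12_frame_bytes(width, height) * frames_per_sec;
}
```

As an example, a 24 fps source processed at roughly 12x real time is about 288 frames per second; round_trip_bytes_per_sec(1920, 1080, 288) then comes to 1,791,590,400 bytes, i.e. around 1.8 GB/s of bus traffic for a single 1080p stream under these assumptions. With many streams on one machine this adds up quickly, which is the bottleneck the message above describes.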
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
>> Only 3 to 4 times? This is easily doable with SIMD.

The problem is not with the thumbnail filter at all. The problem is doing the transfers from vidmem to sysmem or vice versa. If you compare a transcoder pipeline with and without hwaccel cuvid (using hw encoder/decoders in both cases), you will observe that the one with hwaccel runs much faster. If we add more transfers by using a CPU based filter, it will only degrade the performance further.

The CUDA thumbnail filter can work directly on the video memory without requiring an additional vidmem to sysmem transfer.

Thanks,
Yogender

-Original Message-
From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of Rostislav Pehlivanov
Sent: Monday, September 11, 2017 10:56 AM
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

On 11 September 2017 at 05:59, Yogender Gupta <ygu...@nvidia.com> wrote:
> The following commands run show that using the Cuda thumbnail filter
> on the hw pipeline could be 3x-4x faster.

Only 3 to 4 times? This is easily doable with SIMD.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 11 September 2017 at 05:59, Yogender Gupta wrote:
> I believe there were concerns on pushing the CUDA thumbnail filter and that is possible to get similar performance using the normal thumbnail filter. The CUDA thumbnail filter is useful for generating thumbnails on the hwaccel cuvid pipeline, as it can directly operate on the video memory and give significantly higher performance, owing to the fact that there are no sysmem to vidmem copies as well as the fact that the encoding and CUDA HW being separate, the CUDA thumbnail filter may not degrade the encode performance at all.
>
> The following commands run show that using the Cuda thumbnail filter on the hw pipeline could be 3x-4x faster.

Only 3 to 4 times? This is easily doable with SIMD.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
I believe there were concerns about pushing the CUDA thumbnail filter, and that it is possible to get similar performance using the normal thumbnail filter. The CUDA thumbnail filter is useful for generating thumbnails on the hwaccel cuvid pipeline, as it can operate directly on the video memory and give significantly higher performance. This is because there are no sysmem to vidmem copies, and because the encoding and CUDA HW are separate, so the CUDA thumbnail filter may not degrade the encode performance at all.

Running the following commands shows that using the CUDA thumbnail filter on the hw pipeline could be 3x-4x faster.

E:\>ffmpeg -vsync 0 -y -hwaccel cuvid -c:v h264_cuvid -i amazing.264 -filter_complex [0:v]split=2[in0][in1];[in0]thumbnail_cuda=600,hwdownload,format=nv12[out0];[in1]scale_npp=1920:1080[out1] -map [out0] thumb%03d.jpg -map [out1] -c:v h264_nvenc out.264 2> hw.txt

E:\>ffmpeg -vsync 0 -y -c:v h264_cuvid -i amazing.264 -filter_complex [0:v]split=2[in0][in1];[in0]thumbnail[out0];[in1]scale[out1] -map [out0] thumb%03d.jpg -map [out1] -c:v h264_nvenc out.264 2> sw.txt

Thanks,
Yogender

ffmpeg version 3.3.git Copyright (c) 2000-2017 the FFmpeg developers
  built with Microsoft (R) C/C++ Optimizing Compiler Version 18.00.40629 for x64
  configuration: --enable-nonfree --disable-shared --enable-nvenc --enable-cuda --enable-cuvid --enable-libnpp --enable-cuda-sdk --enable-libnpp --extra-cflags=-Ilocal/include --extra-cflags=-I../nv_sdk --extra-ldflags='-libpath:../nv_sdk' --toolchain=msvc
  libavutil      55. 74.100 / 55. 74.100
  libavcodec     57.104.101 / 57.104.101
  libavformat    57. 81.100 / 57. 81.100
  libavdevice    57.  8.100 / 57.  8.100
  libavfilter     6.101.100 /  6.101.100
  libswscale      4.  7.103 /  4.  7.103
  libswresample   2.  8.100 /  2.  8.100
[h264 @ 0057E4367AE0] Stream #0: not enough frames to estimate rate; consider increasing probesize
Input #0, h264, from 'amazing.264':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1072 [SAR 134:135 DAR 16:9], 24 fps, 24 tbr, 1200k tbn, 48 tbc
Stream mapping:
  Stream #0:0 (h264_cuvid) -> split
  format -> Stream #0:0 (mjpeg)
  scale_npp -> Stream #1:0 (h264_nvenc)
Press [q] to stop, [?] for help
[swscaler @ 0057E43BF400] deprecated pixel format used, make sure you did set range correctly
Output #0, image2, to 'thumb%03d.jpg':
  Metadata:
    encoder         : Lavf57.81.100
    Stream #0:0: Video: mjpeg, yuvj420p(pc), 1920x1072 [SAR 134:135 DAR 16:9], q=2-31, 200 kb/s, 24 fps, 24 tbn, 24 tbc
    Metadata:
      encoder         : Lavc57.104.101 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/20 buffer size: 0 vbv_delay: -1
Output #1, h264, to 'out.264':
  Metadata:
    encoder         : Lavf57.81.100
    Stream #1:0: Video: h264 (h264_nvenc) (Main), cuda, 1920x1080 [SAR 1:1 DAR 16:9], q=-1--1, 2000 kb/s, 24 fps, 24 tbn, 24 tbc
    Metadata:
      encoder         : Lavc57.104.101 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/200 buffer size: 400 vbv_delay: -1
frame=0 fps=0.0 q=0.0 q=25.0 size=N/A time=00:00:06.12 bitrate=N/A speed=12.2x
frame=0 fps=0.0 q=0.0 q=25.0 size=N/A time=00:00:12.91 bitrate=N/A speed=12.9x
frame=0 fps=0.0 q=0.0 q=29.0 size=N/A time=00:00:18.58 bitrate=N/A speed=12.3x
frame=0 fps=0.0 q=0.0 q=27.0 size=N/A time=00:00:24.20 bitrate=N/A speed= 12x
[Parsed_thumbnail_cuda_1 @ 0057E6D88900] frame id #150 (pts_time=6.249990) selected from a set of 600 images
frame=1 fps=0.4 q=3.0 q=25.0 size=N/A time=00:00:30.54 bitrate=N/A speed=12.2x
frame=1 fps=0.3 q=3.0 q=28.0 size=N/A time=00:00:37.37 bitrate=N/A speed=12.4x
frame=1 fps=0.3 q=3.0 q=29.0 size=N/A time=00:00:44.00 bitrate=N/A speed=12.5x
[Parsed_thumbnail_cuda_1 @ 0057E6D88900] frame id #461 (pts_time=44.208262) selected from a set of 600 images
frame=2 fps=0.5 q=1.6 q=29.0 size=N/A time=00:00:50.25 bitrate=N/A speed=12.5x
frame=2 fps=0.4 q=1.6 q=31.0 size=N/A time=00:00:56.91 bitrate=N/A speed=12.6x
frame=2 fps=0.4 q=1.6 q=30.0 size=N/A time=00:01:03.41 bitrate=N/A speed=12.6x
frame=2 fps=0.4 q=1.6 q=25.0 size=N/A time=00:01:09.83 bitrate=N/A speed=12.6x
[Parsed_thumbnail_cuda_1 @ 0057E6D88900] frame id #216 (pts_time=58.06) selected from a set of 600 images
frame=3 fps=0.5 q=1.6 q=27.0 size=N/A time=00:01:15.91 bitrate=N/A speed=12.6x
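For context on the "selected from a set of 600 images" lines in the log: a thumbnail filter of this kind picks one representative frame per batch. The C sketch below shows one plausible way to do that, by choosing the frame whose histogram is closest to the batch average. This is an assumption about the general approach, not code from the patch or from libavfilter, and the function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_BINS 256

/* Build a 256-bin histogram of one 8-bit plane (hypothetical helper). */
static void plane_histogram(const uint8_t *data, size_t width, size_t height,
                            ptrdiff_t stride, uint32_t hist[HIST_BINS])
{
    for (size_t b = 0; b < HIST_BINS; b++)
        hist[b] = 0;
    for (size_t y = 0; y < height; y++) {
        const uint8_t *row = data + (ptrdiff_t)y * stride;
        for (size_t x = 0; x < width; x++)
            hist[row[x]]++;
    }
}

/* hists holds n histograms back to back (n * HIST_BINS entries).
 * Returns the index of the histogram with the smallest sum of squared
 * errors against the per-bin average of the whole batch. Ties go to the
 * earliest frame. */
static size_t pick_representative(const uint32_t *hists, size_t n)
{
    assert(n > 0);

    double avg[HIST_BINS];
    for (size_t b = 0; b < HIST_BINS; b++) {
        double sum = 0.0;
        for (size_t i = 0; i < n; i++)
            sum += hists[i * HIST_BINS + b];
        avg[b] = sum / (double)n;
    }

    size_t best = 0;
    double best_err = -1.0; /* negative means "no candidate yet" */
    for (size_t i = 0; i < n; i++) {
        double err = 0.0;
        for (size_t b = 0; b < HIST_BINS; b++) {
            double d = (double)hists[i * HIST_BINS + b] - avg[b];
            err += d * d;
        }
        if (best_err < 0.0 || err < best_err) {
            best_err = err;
            best = i;
        }
    }
    return best;
}
```

In a GPU version, only the histogram step needs to touch the pixels, so it is the natural part to run on the device; the per-batch comparison works on a few hundred small arrays and is cheap wherever it runs.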
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On Mon, 4 Sep 2017 20:41:19 +0100, Rostislav Pehlivanov wrote:
>> Unlike Vulkan, OpenCL is rather stable and widely supported. Vulkan was apparently made for games (including stability requirements), and supported only with newer HW. In fact, OpenCL is literally the portable equivalent to Cuda. So it would be the logical choice.
>
> Vulkan was definitely not made for games only, it was made to be general purpose.

It certainly feels like it. So much around it is geared towards game dev.

> As far as I know some vendors are replacing their OpenCL implementations by a Vulkan shim.

They probably could implement a Vulkan OpenCL backend, but even then they'll provide a shader compiler as part of the OpenCL API, which is superior to Vulkan again.

> Some vendors also have had a history of deliberately handicapping alternative compute APIs so their native ones perform better. Vulkan eliminates all that.

Then suggest better hardware with vendors which don't do that nonsense.

It remains to be seen whether Vulkan is really suitable for anything but things centered around rasterization. The lack of a standard shader compiler is definitely an issue. (Are you going to depend on vendor extensions? On some shitty 3rd party compilers, like the half-broken glslang? Or check in shader binaries into git?)

> Also using Vulkan eliminates the need for an OpenCL/Vulkan interop for users using Vulkan. There's no other logical choice but Vulkan.

Using OpenCL wouldn't even require any interop with that much.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 4 September 2017 at 19:44, wm4 wrote:
>> OpenCL does no presenting, so interops there would remove CUDA's point. However, Vulkan is general purpose so interops must exist in order to avoid copying when presenting. OpenGL got a CUDA interop for this very reason.
>
> That doesn't matter for this filter. I'm fairly sure OpenCL got interop too, although I've never tried it.
>
>> Hence, since a Vulkan interop will soon exist, I object to this patch. I see no reason to add more vendor exclusive code when a generic solution will appear and we could use that. Unless someone manages to convince me otherwise.
>
> Unlike Vulkan, OpenCL is rather stable and widely supported. Vulkan was apparently made for games (including stability requirements), and supported only with newer HW. In fact, OpenCL is literally the portable equivalent to Cuda. So it would be the logical choice.

Vulkan was definitely not made for games only, it was made to be general purpose. As far as I know some vendors are replacing their OpenCL implementations by a Vulkan shim. Some vendors also have had a history of deliberately handicapping alternative compute APIs so their native ones perform better. Vulkan eliminates all that. Also using Vulkan eliminates the need for an OpenCL/Vulkan interop for users using Vulkan. There's no other logical choice but Vulkan.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On Mon, 4 Sep 2017 19:07:02 +0100, Rostislav Pehlivanov wrote:
>> You could say the same about OpenCL. Too bad NVIDIA keep pushing their dumb vendor specific APIs.
>
> OpenCL does no presenting, so interops there would remove CUDA's point. However, Vulkan is general purpose so interops must exist in order to avoid copying when presenting. OpenGL got a CUDA interop for this very reason.

That doesn't matter for this filter. I'm fairly sure OpenCL got interop too, although I've never tried it.

> Hence, since a Vulkan interop will soon exist, I object to this patch. I see no reason to add more vendor exclusive code when a generic solution will appear and we could use that. Unless someone manages to convince me otherwise.

Unlike Vulkan, OpenCL is rather stable and widely supported. Vulkan was apparently made for games (including stability requirements), and supported only with newer HW. In fact, OpenCL is literally the portable equivalent to Cuda. So it would be the logical choice.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 4 September 2017 at 18:18, wm4 wrote:
>> I think they should provide a Vulkan interop so we could drop all CUDA filters and instead treat all filter GPU acceleration in a generic way. It's just a matter of months before one exists, I bet.
>
> You could say the same about OpenCL. Too bad NVIDIA keep pushing their dumb vendor specific APIs.

OpenCL does no presenting, so interops there would remove CUDA's point. However, Vulkan is general purpose so interops must exist in order to avoid copying when presenting. OpenGL got a CUDA interop for this very reason.

Hence, since a Vulkan interop will soon exist, I object to this patch. I see no reason to add more vendor exclusive code when a generic solution will appear and we could use that. Unless someone manages to convince me otherwise.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On Mon, 4 Sep 2017 18:03:51 +0100, Rostislav Pehlivanov wrote:
>> The point is to do it on CUDA frames without copying them to system ram first.
>
> I think they should provide a Vulkan interop so we could drop all CUDA filters and instead treat all filter GPU acceleration in a generic way. It's just a matter of months before one exists, I bet.

You could say the same about OpenCL. Too bad NVIDIA keep pushing their dumb vendor specific APIs.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 4 September 2017 at 17:25, Timo Rothenpieler wrote:
>> We have av_pixelutils_sad_fn which does SAD and has SIMD, there's no point in reinventing the wheel.
>>
>> I also don't see why this needs to be implemented with CUDA. You're not even doing the SAD in CUDA. I bet it'll be just as fast if not faster in C (unless you cheat somehow).
>
> The point is to do it on CUDA frames without copying them to system ram first.

I think they should provide a Vulkan interop so we could drop all CUDA filters and instead treat all filter GPU acceleration in a generic way. It's just a matter of months before one exists, I bet.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
> We have av_pixelutils_sad_fn which does SAD and has SIMD, there's no point
> in reinventing the wheel.
>
> I also don't see why this needs to be implemented with CUDA. You're not
> even doing the SAD in CUDA. I bet it'll be just as fast if not faster in C
> (unless you cheat somehow).

The point is to do it on CUDA frames without copying them to system RAM first.
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 30 August 2017 at 05:19, Yogender Gupta wrote:
> Attached is a CUDA version of the thumbnail filter, this helps accelerate
> thumbnail generation significantly, when using the GPU pipeline.
>
> Regards,
> Yogender

We have av_pixelutils_sad_fn which does SAD and has SIMD, there's no point in reinventing the wheel.

I also don't see why this needs to be implemented with CUDA. You're not even doing the SAD in CUDA. I bet it'll be just as fast if not faster in C (unless you cheat somehow).
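For readers unfamiliar with the term, SAD (sum of absolute differences) can be sketched in plain C as below. This is only a scalar reference under stated assumptions: `sad_block` and its `w`/`h` parameters are hypothetical names, not part of libavutil, and the patch under review is not shown in this thread. The (pointer, stride) argument pairs are chosen to resemble how the `av_pixelutils_sad_fn` function-pointer type mentioned above takes its inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute differences between two
 * w x h blocks of 8-bit samples. Each block has its own stride (the number
 * of bytes between the starts of consecutive rows). */
int sad_block(const uint8_t *src1, ptrdiff_t stride1,
              const uint8_t *src2, ptrdiff_t stride2,
              int w, int h)
{
    int sum = 0;

    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            sum += abs(src1[x] - src2[x]);
        /* Advance both pointers to the next row. */
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}
```

A SIMD version processes many samples of a row per instruction, which is why reusing an optimized routine is suggested over writing a new loop.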
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
I have taken care of all comments except the documentation. I will send out a separate patch with documentation for both CUDA filters.

Regards,
Yogender

Attachment: 0001-thumbnail_cuda-CUDA-Thumbnail-Filter.patch
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
Also missing a dep on cuda_sdk in configure.
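A sketch of what such a dependency entry might look like, assuming the filter is registered as `thumbnail_cuda` and follows configure's usual `<component>_deps` naming convention. The exact line in the applied patch is not shown in this thread.

```shell
# Assumed configure entry: only enable the filter when cuda_sdk is available.
thumbnail_cuda_filter_deps="cuda_sdk"
```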
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 30.08.2017 at 06:19, Yogender Gupta wrote:
> Attached is a CUDA version of the thumbnail filter, this helps accelerate
> thumbnail generation significantly, when using the GPU pipeline.
>
> Regards,
> Yogender

After having a look at the code:

The filter is using a global "CUdeviceptr data;" variable (which isn't even static). This is generally not acceptable. It makes it impossible to use the filter more than once in parallel. All state should be in the filter context.

Also, the allocated module and device memory are never freed. uninit should unload the module, free the memory, and do any other necessary cleanup.

Otherwise the code seems reasonable to me. Would still like to have someone else review it though.
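The two review points above (per-instance state instead of a global, and cleanup in uninit) can be illustrated with a minimal, CUDA-free sketch. Everything here is hypothetical: `ThumbCtx`, `thumb_init` and `thumb_uninit` are illustration names, and a host buffer from `calloc` stands in for the device memory. In the real filter the context would presumably hold the `CUmodule` and `CUdeviceptr`, and uninit would release them with the corresponding CUDA driver calls such as `cuModuleUnload` and `cuMemFree`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-instance state. Nothing lives in a global, so two
 * filter instances never share or overwrite each other's buffer. */
typedef struct ThumbCtx {
    int   *hist;       /* stand-in for the device-memory allocation */
    size_t hist_size;  /* number of entries in hist */
} ThumbCtx;

/* Allocate the resources owned by this instance. Returns 0 on success. */
int thumb_init(ThumbCtx *s, size_t hist_size)
{
    assert(s && hist_size > 0);
    s->hist = calloc(hist_size, sizeof(*s->hist));
    if (!s->hist)
        return -1;
    s->hist_size = hist_size;
    return 0;
}

/* Release everything init acquired and reset the fields, so that a
 * second uninit call on the same context is harmless. */
void thumb_uninit(ThumbCtx *s)
{
    free(s->hist);
    s->hist = NULL;
    s->hist_size = 0;
}
```

With this layout, two instances can be initialized side by side and torn down independently, which is what running the filter more than once in parallel requires.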
Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter
On 30.08.2017 at 06:19, Yogender Gupta wrote:
> Attached is a CUDA version of the thumbnail filter, this helps accelerate
> thumbnail generation significantly, when using the GPU pipeline.

Without having done a full review on the code yet:

A new filter needs a libavfilter version bump (not 100% sure if minor or micro, but I think it was a minor bump).

Also, the filter is missing documentation in doc/filters.texi. Which, as I just realized, is also true for scale_cuda.

Will have a closer look at the code later. If someone else could also have a look, that would be greatly appreciated.

Regards,
Timo