Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-22 Thread Timo Rothenpieler

On 04.09.2017 at 14:59, Yogender Gupta wrote:

Taken care of all comments except the documentation.


applied


Will send out a separate patch for both the CUDA filters documentation.


ok



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-13 Thread Rostislav Pehlivanov
On 13 September 2017 at 19:12, Timo Rothenpieler 
wrote:

> I did object and have NAK'd the patch currently.
>> Make the submitter submit a new version themselves and I'll approve it.
>>
>
> There was a new version submitted already, I just replied to the wrong one.
>
Should be fine then.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-13 Thread Timo Rothenpieler

I did object and have NAK'd the patch currently.
Make the submitter submit a new version themselves and I'll approve it.


There was a new version submitted already, I just replied to the wrong one.





Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-13 Thread Rostislav Pehlivanov
On 13 September 2017 at 14:43, Timo Rothenpieler 
wrote:

> Will apply with some minor changes (some unused variables are still left,
> and some cleanup is missing) soon if nobody objects.
>
I did object and have NAK'd the patch currently.
Make the submitter submit a new version themselves and I'll approve it.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-13 Thread Timo Rothenpieler
Will apply with some minor changes (some unused variables are still 
left, and some cleanup is missing) soon if nobody objects.






Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-11 Thread Timo Rothenpieler

It would be nice, yes, but I'm not sure there is actually that much need with 
the current setup.  The use-cases for the two as currently written don't really 
overlap - CUDA is useful in the cases you describe with (possibly multiple) 
high-power GPUs trying to squeeze as much performance as possible out of a 
system to run many streams, while my OpenCL stuff is intended to be useful on 
random low-power devices where doing more stuff on the GPU can make the 
difference between managing real-time or not on a small number of streams.

On this filter in particular, I find thumbnail a slightly weird choice to want to write a 
GPU version of, but if it works in essentially the same way as the software filter and 
someone has a use-case for it then sure.  (* Not that I've actually read it, I'm not 
familiar with CUDA at all.)  An "N times speedup" metric or comparison with 
some CPU implementation with SIMD is essentially irrelevant, because that isn't the point 
- even when slower than the CPU implementation there can still be value in it not running 
on the CPU (this probably won't happen with CUDA because it only runs on high-power 
devices, but it is certainly possible on mobile devices with OpenCL).

- Mark


Creating a bunch of thumbnails without major system impact is probably 
useful for big video websites that want to save time and system 
resources. While definitely not an everyday use case, I can see 
applications for it.


Some other filters I intend to write/add CUDA versions of are chromakey 
and despill. And, for those first ones to be useful at all, the overlay 
filter as well.






Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-11 Thread Mark Thompson
On 11/09/17 10:18, Timo Rothenpieler wrote:
> On 11.09.2017 at 07:40, Yogender Gupta wrote:
>>> Only 3 to 4 times? This is easily doable with SIMD.
>>
>> The problem is not with the thumbnail filter at all. The problem is doing 
>> the transfers from vidmem to sysmem or vice-versa. You will observe if we 
>> use a transcoder pipeline with and without hwaccel cuvid (using hw 
>> encoder/decoders in both cases), the one with hwaccel runs much faster. If 
>> we add more transfers by using a CPU based filter, it will only degrade the 
>> performance further.
>>
>> The CUDA thumbnail filter can work directly on the video memory without 
>> requiring an additional vidmem to sysmem transfer.
>>
>> Thanks,
>> Yogender
>>
> 
> I also really don't see the concern with adding CUDA versions of already 
> existing filters.
> They are not included in any standard build, and require both non-free and 
> the cuda-sdk to be even built in the first place.
> 
> For their specific use case of a fully hardware-accelerated transcode and 
> filter pipeline they clearly offer benefits. Especially when the final encode 
> is to be done with nvenc and/or when operating on huge frames (4K or maybe 
> even bigger), using the GPU has clear benefits and I doubt any SIMD will be 
> able to compensate for it.
> 
> Another scenario where a 100% GPU pipeline becomes essential is when you are 
> processing _a lot_ of streams on one machine. You can freely put more GPUs in 
> and gain more VMEM and Cores to work with, without interfering with the 
> others.
> If there is a single CPU based filter anywhere in that chain you will very 
> quickly be bottlenecked by it and the copying to and from sysmem.
> 
> Concerning the OpenCL infrastructure that was just posted to the list:
> It would indeed be nice if there was a way to map CUDA frames to OpenCL, and 
> the other way around. But I am not aware of any interoperability there, and 
> Nvidia has a big enough market share on server and cloud GPUs (see for 
> example AWS) to make adding CUDA-based filters worthwhile.
> 

It would be nice, yes, but I'm not sure there is actually that much need with 
the current setup.  The use-cases for the two as currently written don't really 
overlap - CUDA is useful in the cases you describe with (possibly multiple) 
high-power GPUs trying to squeeze as much performance as possible out of a 
system to run many streams, while my OpenCL stuff is intended to be useful on 
random low-power devices where doing more stuff on the GPU can make the 
difference between managing real-time or not on a small number of streams.

On this filter in particular, I find thumbnail a slightly weird choice to want 
to write a GPU version of, but if it works in essentially the same way as the 
software filter and someone has a use-case for it then sure.  (* Not that I've 
actually read it, I'm not familiar with CUDA at all.)  An "N times speedup" 
metric or comparison with some CPU implementation with SIMD is essentially 
irrelevant, because that isn't the point - even when slower than the CPU 
implementation there can still be value in it not running on the CPU (this 
probably won't happen with CUDA because it only runs on high-power devices, but 
it is certainly possible on mobile devices with OpenCL).

- Mark


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-11 Thread Timo Rothenpieler

On 11.09.2017 at 07:40, Yogender Gupta wrote:

Only 3 to 4 times? This is easily doable with SIMD.


The problem is not with the thumbnail filter at all. The problem is doing the 
transfers from vidmem to sysmem or vice-versa. You will observe if we use a 
transcoder pipeline with and without hwaccel cuvid (using hw encoder/decoders 
in both cases), the one with hwaccel runs much faster. If we add more transfers 
by using a CPU based filter, it will only degrade the performance further.

The CUDA thumbnail filter can work directly on the video memory without 
requiring an additional vidmem to sysmem transfer.

Thanks,
Yogender



I also really don't see the concern with adding CUDA versions of already 
existing filters.
They are not included in any standard build, and require both non-free 
and the cuda-sdk to be even built in the first place.


For their specific use case of a fully hardware-accelerated transcode 
and filter pipeline they clearly offer benefits. Especially when the 
final encode is to be done with nvenc and/or when operating on huge 
frames (4K or maybe even bigger), using the GPU has clear benefits and I 
doubt any SIMD will be able to compensate for it.


Another scenario where a 100% GPU pipeline becomes essential is when you 
are processing _a lot_ of streams on one machine. You can freely put 
more GPUs in and gain more VMEM and Cores to work with, without 
interfering with the others.
If there is a single CPU based filter anywhere in that chain you will 
very quickly be bottlenecked by it and the copying to and from sysmem.


Concerning the OpenCL infrastructure that was just posted to the list:
It would indeed be nice if there was a way to map CUDA frames to OpenCL, 
and the other way around. But I am not aware of any interoperability 
there, and Nvidia has a big enough market share on server and cloud 
GPUs (see for example AWS) to make adding CUDA-based filters 
worthwhile.






Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-10 Thread Yogender Gupta
>> Only 3 to 4 times? This is easily doable with SIMD.

The problem is not with the thumbnail filter at all. The problem is the 
transfers from vidmem to sysmem or vice versa. If you compare a transcoder 
pipeline with and without hwaccel cuvid (using hw encoders/decoders in both 
cases), you will observe that the one with hwaccel runs much faster. Adding 
more transfers by using a CPU-based filter will only degrade performance further.

The CUDA thumbnail filter can work directly on the video memory without 
requiring an additional vidmem to sysmem transfer.
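
To put rough numbers on the transfer argument, here is a small sketch. It assumes NV12 frames (12 bits per pixel) at the 1920x1080, 24 fps seen in this thread, and one download plus one upload per frame when a CPU filter sits between cuvid and nvenc. The function names are made up for illustration, and the model ignores bus speed and synchronization stalls, which likely matter more than raw byte counts.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes in one NV12 frame: a full-size luma plane plus a half-size interleaved chroma plane."""
    luma = width * height
    chroma = width * height // 2
    return luma + chroma


def copy_traffic_bytes_per_second(width, height, fps, copies_per_frame):
    """Bytes moved between vidmem and sysmem each second for one stream."""
    return nv12_frame_bytes(width, height) * fps * copies_per_frame


frame = nv12_frame_bytes(1920, 1080)
# Assumption: a CPU filter between GPU decode and GPU encode costs one download and one upload per frame.
cpu_filter = copy_traffic_bytes_per_second(1920, 1080, 24, copies_per_frame=2)
# A filter that stays in video memory adds no copies for the filtering step itself.
gpu_filter = copy_traffic_bytes_per_second(1920, 1080, 24, copies_per_frame=0)

print(f"{frame} bytes per frame")
print(f"{cpu_filter / 1e6:.1f} MB/s copied with a CPU filter, {gpu_filter} with a GPU filter")
```

The per-stream figure is small on its own; the point made in the thread is that it multiplies with the number of streams and comes with synchronization cost on every copy.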

Thanks,
Yogender



-Original Message-
From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf Of 
Rostislav Pehlivanov
Sent: Monday, September 11, 2017 10:56 AM
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

On 11 September 2017 at 05:59, Yogender Gupta <ygu...@nvidia.com> wrote:

> I believe there were concerns on pushing the CUDA thumbnail filter and 
> that is possible to get similar performance using the normal thumbnail 
> filter. The CUDA thumbnail filter is useful for generating thumbnails 
> on the hwaccel cuvid pipeline, as it can directly operate on the video 
> memory and give significantly higher performance, owing to the fact 
> that there are no sysmem to vidmem copies as well as the fact that the 
> encoding and CUDA HW being separate, the CUDA thumbnail filter may not 
> degrade the encode performance at all.
>
> The following commands run show that using the Cuda thumbnail filter 
> on the hw pipeline could be 3x-4x faster.
>
> E:\>ffmpeg -vsync 0 -y -hwaccel cuvid -c:v h264_cuvid -i amazing.264 
> -filter_complex 
> [0:v]split=2[in0][in1];[in0]thumbnail_cuda=600,hwdownload,
> format=nv12[out0];[in1]scale_npp=1920:1080
> [out1] -map [out0] thumb%03d.jpg -map [out1] -c:v h264_nvenc out.264 
> 2> hw.txt
>
> E:\>ffmpeg -vsync 0 -y -c:v h264_cuvid -i amazing.264 -filter_complex 
> [0:v]split=2[in0][in1];[in0]thumbnail[out0];[in1]scale[out1] -map 
> [out0] thumb%03d.jpg -map [out1] -c:v h264_nvenc
> out.264 2> sw.txt
>
> Thanks,
> Yogender
>
Only 3 to 4 times? This is easily doable with SIMD.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-10 Thread Rostislav Pehlivanov
On 11 September 2017 at 05:59, Yogender Gupta  wrote:

> I believe there were concerns on pushing the CUDA thumbnail filter and
> that is possible to get similar performance using the normal thumbnail
> filter. The CUDA thumbnail filter is useful for generating thumbnails on
> the hwaccel cuvid pipeline, as it can directly operate on the video memory
> and give significantly higher performance, owing to the fact that there are
> no sysmem to vidmem copies as well as the fact that the encoding and CUDA
> HW being separate, the CUDA thumbnail filter may not degrade the encode
> performance at all.
>
> The following commands run show that using the Cuda thumbnail filter on
> the hw pipeline could be 3x-4x faster.
>
> E:\>ffmpeg -vsync 0 -y -hwaccel cuvid -c:v h264_cuvid -i amazing.264
> -filter_complex [0:v]split=2[in0][in1];[in0]thumbnail_cuda=600,hwdownload,
> format=nv12[out0];[in1]scale_npp=1920:1080
> [out1] -map [out0] thumb%03d.jpg -map [out1] -c:v h264_nvenc out.264 2>
> hw.txt
>
> E:\>ffmpeg -vsync 0 -y -c:v h264_cuvid -i amazing.264 -filter_complex
> [0:v]split=2[in0][in1];[in0]thumbnail[out0];[in1]scale[out1] -map [out0]
> thumb%03d.jpg -map [out1] -c:v h264_nvenc
> out.264 2> sw.txt
>
> Thanks,
> Yogender
>
Only 3 to 4 times? This is easily doable with SIMD.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-10 Thread Yogender Gupta
I believe there were concerns about pushing the CUDA thumbnail filter, and a 
view that it is possible to get similar performance using the normal thumbnail 
filter. The CUDA thumbnail filter is useful for generating thumbnails on the 
hwaccel cuvid pipeline, as it can operate directly on video memory and give 
significantly higher performance: there are no sysmem to vidmem copies and, 
because the encoding and CUDA hardware are separate, the CUDA thumbnail filter 
may not degrade the encode performance at all.

Running the following commands shows that using the CUDA thumbnail filter on 
the hw pipeline could be 3x-4x faster.

E:\>ffmpeg -vsync 0 -y -hwaccel cuvid -c:v h264_cuvid -i amazing.264 -filter_complex [0:v]split=2[in0][in1];[in0]thumbnail_cuda=600,hwdownload,format=nv12[out0];[in1]scale_npp=1920:1080[out1] -map [out0] thumb%03d.jpg -map [out1] -c:v h264_nvenc out.264 2> hw.txt

E:\>ffmpeg -vsync 0 -y -c:v h264_cuvid -i amazing.264 -filter_complex [0:v]split=2[in0][in1];[in0]thumbnail[out0];[in1]scale[out1] -map [out0] thumb%03d.jpg -map [out1] -c:v h264_nvenc out.264 2> sw.txt
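
Since the filter graph above is easy to break when it gets wrapped by a mail client, here is a sketch of assembling the hardware variant programmatically. It uses only the options and filter names that appear in the command in this message; the helper name and its parameters are hypothetical, and whether these options work depends on your FFmpeg build.

```python
import shlex


def build_hw_thumbnail_command(src, batch=600, size="1920:1080"):
    """Build the argument list for the GPU-only variant of the command quoted above."""
    filter_graph = ";".join([
        "[0:v]split=2[in0][in1]",
        # Thumbnail branch: pick one frame per batch on the GPU, then download only that frame.
        f"[in0]thumbnail_cuda={batch},hwdownload,format=nv12[out0]",
        # Transcode branch: scale on the GPU and hand the frame on to nvenc.
        f"[in1]scale_npp={size}[out1]",
    ])
    return [
        "ffmpeg", "-vsync", "0", "-y",
        "-hwaccel", "cuvid", "-c:v", "h264_cuvid", "-i", src,
        "-filter_complex", filter_graph,
        "-map", "[out0]", "thumb%03d.jpg",
        "-map", "[out1]", "-c:v", "h264_nvenc", "out.264",
    ]


argv = build_hw_thumbnail_command("amazing.264")
print(shlex.join(argv))
```

Passing the result as a list (for example to `subprocess.run`) keeps the graph a single argument, so no shell quoting of the brackets and semicolons is needed.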

Thanks,
Yogender

---
This email message is for the sole use of the intended recipient(s) and may contain
confidential information.  Any unauthorized review, use, disclosure or distribution
is prohibited.  If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
---
ffmpeg version 3.3.git Copyright (c) 2000-2017 the FFmpeg developers
  built with Microsoft (R) C/C++ Optimizing Compiler Version 18.00.40629 for x64
  configuration: --enable-nonfree --disable-shared --enable-nvenc --enable-cuda 
--enable-cuvid --enable-libnpp --enable-cuda-sdk --enable-libnpp 
--extra-cflags=-Ilocal/include --extra-cflags=-I../nv_sdk 
--extra-ldflags='-libpath:../nv_sdk' --toolchain=msvc
  libavutil      55. 74.100 / 55. 74.100
  libavcodec     57.104.101 / 57.104.101
  libavformat    57. 81.100 / 57. 81.100
  libavdevice    57.  8.100 / 57.  8.100
  libavfilter     6.101.100 /  6.101.100
  libswscale      4.  7.103 /  4.  7.103
  libswresample   2.  8.100 /  2.  8.100
[h264 @ 0057E4367AE0] Stream #0: not enough frames to estimate rate; 
consider increasing probesize
Input #0, h264, from 'amazing.264':
  Duration: N/A, bitrate: N/A
Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1072 [SAR 
134:135 DAR 16:9], 24 fps, 24 tbr, 1200k tbn, 48 tbc
Stream mapping:
  Stream #0:0 (h264_cuvid) -> split
  format -> Stream #0:0 (mjpeg)
  scale_npp -> Stream #1:0 (h264_nvenc)
Press [q] to stop, [?] for help
[swscaler @ 0057E43BF400] deprecated pixel format used, make sure you did 
set range correctly
Output #0, image2, to 'thumb%03d.jpg':
  Metadata:
encoder : Lavf57.81.100
Stream #0:0: Video: mjpeg, yuvj420p(pc), 1920x1072 [SAR 134:135 DAR 16:9], 
q=2-31, 200 kb/s, 24 fps, 24 tbn, 24 tbc
Metadata:
  encoder : Lavc57.104.101 mjpeg
Side data:
  cpb: bitrate max/min/avg: 0/0/20 buffer size: 0 vbv_delay: -1
Output #1, h264, to 'out.264':
  Metadata:
encoder : Lavf57.81.100
Stream #1:0: Video: h264 (h264_nvenc) (Main), cuda, 1920x1080 [SAR 1:1 DAR 
16:9], q=-1--1, 2000 kb/s, 24 fps, 24 tbn, 24 tbc
Metadata:
  encoder : Lavc57.104.101 h264_nvenc
Side data:
  cpb: bitrate max/min/avg: 0/0/200 buffer size: 400 vbv_delay: -1
frame=0 fps=0.0 q=0.0 q=25.0 size=N/A time=00:00:06.12 bitrate=N/A speed=12.2x
frame=0 fps=0.0 q=0.0 q=25.0 size=N/A time=00:00:12.91 bitrate=N/A speed=12.9x
frame=0 fps=0.0 q=0.0 q=29.0 size=N/A time=00:00:18.58 bitrate=N/A speed=12.3x
frame=0 fps=0.0 q=0.0 q=27.0 size=N/A time=00:00:24.20 bitrate=N/A speed=  12x
[Parsed_thumbnail_cuda_1 @ 0057E6D88900] frame id #150 (pts_time=6.249990) selected from a set of 600 images
frame=1 fps=0.4 q=3.0 q=25.0 size=N/A time=00:00:30.54 bitrate=N/A speed=12.2x
frame=1 fps=0.3 q=3.0 q=28.0 size=N/A time=00:00:37.37 bitrate=N/A speed=12.4x
frame=1 fps=0.3 q=3.0 q=29.0 size=N/A time=00:00:44.00 bitrate=N/A speed=12.5x
[Parsed_thumbnail_cuda_1 @ 0057E6D88900] frame id #461 (pts_time=44.208262) selected from a set of 600 images
frame=2 fps=0.5 q=1.6 q=29.0 size=N/A time=00:00:50.25 bitrate=N/A speed=12.5x
frame=2 fps=0.4 q=1.6 q=31.0 size=N/A time=00:00:56.91 bitrate=N/A speed=12.6x
frame=2 fps=0.4 q=1.6 q=30.0 size=N/A time=00:01:03.41 bitrate=N/A speed=12.6x
frame=2 fps=0.4 q=1.6 q=25.0 size=N/A time=00:01:09.83 bitrate=N/A speed=12.6x
[Parsed_thumbnail_cuda_1 @ 0057E6D88900] frame id #216 (pts_time=58.06) selected from a set of 600 images
frame=3 fps=0.5 q=1.6 q=27.0 size=N/A time=00:01:15.91 bitrate=N/A speed=12.6x
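
For context on the "selected from a set of 600 images" lines in the log: as far as I understand the software thumbnail filter, it builds a histogram per frame, averages the histograms over the batch, and picks the frame closest to that average. The sketch below illustrates that idea only. It is an assumption about the approach, not the code from this patch, and the function names and sample data are hypothetical.

```python
def histogram(pixels, bins=256):
    """Count how often each 8-bit value occurs in a flat list of samples."""
    counts = [0] * bins
    for value in pixels:
        counts[value] += 1
    return counts


def select_thumbnail(frames):
    """Return the index of the frame whose histogram is closest to the batch average."""
    hists = [histogram(frame) for frame in frames]
    n = len(hists)
    average = [sum(h[i] for h in hists) / n for i in range(len(hists[0]))]

    def squared_error(hist):
        # Sum of squared differences between this histogram and the batch average.
        return sum((count - avg) ** 2 for count, avg in zip(hist, average))

    errors = [squared_error(h) for h in hists]
    return errors.index(min(errors))


# Three tiny "frames" of four 8-bit samples each (made-up data).
batch = [
    [0, 0, 0, 0],          # all black: an outlier
    [0, 128, 128, 255],    # mixed content
    [255, 255, 255, 255],  # all white: an outlier
]
print(select_thumbnail(batch))
```

With a batch size of 600, as in `thumbnail_cuda=600`, one such selection would be made per 600 input frames.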

Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-04 Thread wm4
On Mon, 4 Sep 2017 20:41:19 +0100
Rostislav Pehlivanov  wrote:

> On 4 September 2017 at 19:44, wm4  wrote:
> 
> > On Mon, 4 Sep 2017 19:07:02 +0100
> > Rostislav Pehlivanov  wrote:
> >  
> > > On 4 September 2017 at 18:18, wm4  wrote:
> > >  
> > > > On Mon, 4 Sep 2017 18:03:51 +0100
> > > > Rostislav Pehlivanov  wrote:
> > > >  
> > > > > On 4 September 2017 at 17:25, Timo Rothenpieler <t...@rothenpieler.org>
> > > > > wrote:
> > > > >
> > > > > >> We have av_pixelutils_sad_fn which does SAD and has SIMD, there's
> > > > > >> no point in reinventing the wheel.
> > > > > >>
> > > > > >> I also don't see why this needs to be implemented with CUDA.
> > > > > >> You're not even doing the SAD in CUDA. I bet it'll be just as fast
> > > > > >> if not faster in C (unless you cheat somehow).
> > > > > >>
> > > > > >
> > > > > > The point is to do it on CUDA frames without copying them to
> > > > > > system ram first.
> > > > > >
> > > > > I think they should provide a Vulkan interop so we could drop all
> > > > > CUDA filters and instead treat all filter GPU acceleration in a
> > > > > generic way. It's just a matter of months before one exists, I bet.
> > > >
> > > > You could say the same about OpenCL. Too bad NVIDIA keep pushing their
> > > > dumb vendor specific APIs.
> > >
> > > OpenCL does no presenting, so interops there would remove CUDA's point.
> > > However, Vulkan is general purpose so interops must exist in order to
> > > avoid copying when presenting. OpenGL got a CUDA interop for this very
> > > reason.
> >
> > That doesn't matter for this filter. I'm fairly sure OpenCL got interop
> > too, although I've never tried it.
> >  
> > > Hence, since a Vulkan interop will soon exist, I object to this patch. I
> > > see no reason to add more vendor-exclusive code when a generic solution
> > > will appear and we could use that. Unless someone manages to convince me
> > > otherwise.
> >
> > Unlike Vulkan, OpenCL is rather stable and widely supported. Vulkan was
> > apparently made for games (including stability requirements), and
> > supported only with newer HW. In fact, OpenCL is literally the portable
> > equivalent to Cuda. So it would be the logical choice.
> 
> Vulkan was definitely not made for games only, it was made to be general
> purpose.

It certainly feels like it. So much around it is geared towards game
dev.

> As far as I know some vendors are replacing their OpenCL
> implementations by a Vulkan shim.

They probably could implement a Vulkan OpenCL backend, but even then
they'll provide a shader compiler as part of the OpenCL API, which is
superior to Vulkan again.

> Some vendors also have had a history of
> deliberately handicapping alternative compute APIs so their native ones
> perform better. Vulkan eliminates all that.

Then suggest better hardware with vendors which don't do that nonsense.

It remains to be seen whether Vulkan is really suitable for anything
but things centered around rasterization. The lack of a standard shader
compiler is definitely an issue. (Are you going to depend on vendor
extensions? On some shitty 3rd party compilers, like the half-broken
glslang? Or check in shader binaries into git?)

> Also using Vulkan eliminates the
> need for an OpenCL/Vulkan interop for users using Vulkan. There's no other
> logical choice but Vulkan.

Using OpenCL wouldn't even require any interop with that much.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-04 Thread Rostislav Pehlivanov
On 4 September 2017 at 19:44, wm4  wrote:

> On Mon, 4 Sep 2017 19:07:02 +0100
> Rostislav Pehlivanov  wrote:
>
> > On 4 September 2017 at 18:18, wm4  wrote:
> >
> > > On Mon, 4 Sep 2017 18:03:51 +0100
> > > Rostislav Pehlivanov  wrote:
> > >
> > > > On 4 September 2017 at 17:25, Timo Rothenpieler <t...@rothenpieler.org>
> > > > wrote:
> > > >
> > > > >> We have av_pixelutils_sad_fn which does SAD and has SIMD, there's
> > > > >> no point in reinventing the wheel.
> > > > >>
> > > > >> I also don't see why this needs to be implemented with CUDA.
> > > > >> You're not even doing the SAD in CUDA. I bet it'll be just as fast
> > > > >> if not faster in C (unless you cheat somehow).
> > > > >>
> > > > >
> > > > > The point is to do it on CUDA frames without copying them to
> > > > > system ram first.
> > > > >
> > > > I think they should provide a Vulkan interop so we could drop all
> > > > CUDA filters and instead treat all filter GPU acceleration in a
> > > > generic way. It's just a matter of months before one exists, I bet.
> > >
> > > You could say the same about OpenCL. Too bad NVIDIA keep pushing their
> > > dumb vendor specific APIs.
> >
> > OpenCL does no presenting, so interops there would remove CUDA's point.
> > However, Vulkan is general purpose so interops must exist in order to
> > avoid copying when presenting. OpenGL got a CUDA interop for this very
> > reason.
>
> That doesn't matter for this filter. I'm fairly sure OpenCL got interop
> too, although I've never tried it.
>
> > Hence, since a Vulkan interop will soon exist, I object to this patch. I
> > see no reason to add more vendor-exclusive code when a generic solution
> > will appear and we could use that. Unless someone manages to convince me
> > otherwise.
>
> Unlike Vulkan, OpenCL is rather stable and widely supported. Vulkan was
> apparently made for games (including stability requirements), and
> supported only with newer HW. In fact, OpenCL is literally the portable
> equivalent to Cuda. So it would be the logical choice.

Vulkan was definitely not made for games only; it was made to be general
purpose. As far as I know, some vendors are replacing their OpenCL
implementations with a Vulkan shim. Some vendors also have a history of
deliberately handicapping alternative compute APIs so their native ones
perform better. Vulkan eliminates all that. Also, using Vulkan eliminates the
need for an OpenCL/Vulkan interop for users using Vulkan. There's no other
logical choice but Vulkan.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-04 Thread wm4
On Mon, 4 Sep 2017 19:07:02 +0100
Rostislav Pehlivanov  wrote:

> On 4 September 2017 at 18:18, wm4  wrote:
> 
> > On Mon, 4 Sep 2017 18:03:51 +0100
> > Rostislav Pehlivanov  wrote:
> >  
> > > On 4 September 2017 at 17:25, Timo Rothenpieler 
> > > wrote:
> > >  
> > > >> We have av_pixelutils_sad_fn which does SAD and has SIMD, there's no
> > > >> point in reinventing the wheel.
> > > >>
> > > >> I also don't see why this needs to be implemented with CUDA. You're
> > > >> not even doing the SAD in CUDA. I bet it'll be just as fast if not
> > > >> faster in C (unless you cheat somehow).
> > > >>
> > > >
> > > > The point is to do it on CUDA frames without copying them to system ram
> > > > first.
> > > >
> > > I think they should provide a Vulkan interop so we could drop all CUDA
> > > filters and instead treat all filter GPU acceleration in a generic way.
> > > It's just a matter of months before one exists, I bet.
> >
> > You could say the same about OpenCL. Too bad NVIDIA keep pushing their
> > dumb vendor specific APIs.
> 
> OpenCL does no presenting, so interops there would remove CUDA's point.
> However, Vulkan is general purpose so interops must exist in order to avoid
> copying when presenting. OpenGL got a CUDA interop for this very reason.

That doesn't matter for this filter. I'm fairly sure OpenCL got interop
too, although I've never tried it.

> Hence, since a Vulkan interop will soon exist, I object to this patch. I
> see no reason to add more vendor-exclusive code when a generic solution
> will appear and we could use that. Unless someone manages to convince me
> otherwise.

Unlike Vulkan, OpenCL is rather stable and widely supported. Vulkan was
apparently made for games (including stability requirements), and
supported only with newer HW. In fact, OpenCL is literally the portable
equivalent to Cuda. So it would be the logical choice.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-04 Thread Rostislav Pehlivanov
On 4 September 2017 at 18:18, wm4  wrote:

> On Mon, 4 Sep 2017 18:03:51 +0100
> Rostislav Pehlivanov  wrote:
>
> > On 4 September 2017 at 17:25, Timo Rothenpieler 
> > wrote:
> >
> > >> We have av_pixelutils_sad_fn which does SAD and has SIMD, there's no
> > >> point in reinventing the wheel.
> > >>
> > >> I also don't see why this needs to be implemented with CUDA. You're
> > >> not even doing the SAD in CUDA. I bet it'll be just as fast if not
> > >> faster in C (unless you cheat somehow).
> > >>
> > >
> > > The point is to do it on CUDA frames without copying them to system ram
> > > first.
> > >
> > I think they should provide a Vulkan interop so we could drop all CUDA
> > filters and instead treat all filter GPU acceleration in a generic way.
> > It's just a matter of months before one exists, I bet.
>
> You could say the same about OpenCL. Too bad NVIDIA keep pushing their
> dumb vendor specific APIs.

OpenCL does no presenting, so interops there would remove CUDA's point.
However, Vulkan is general purpose so interops must exist in order to avoid
copying when presenting. OpenGL got a CUDA interop for this very reason.

Hence, since a Vulkan interop will soon exist, I object to this patch. I
see no reason to add more vendor-exclusive code when a generic solution
will appear and we could use that. Unless someone manages to convince me
otherwise.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-04 Thread wm4
On Mon, 4 Sep 2017 18:03:51 +0100
Rostislav Pehlivanov  wrote:

> On 4 September 2017 at 17:25, Timo Rothenpieler 
> wrote:
> 
> >> We have av_pixelutils_sad_fn which does SAD and has SIMD, there's no point
> >> in reinventing the wheel.
> >>
> >> I also don't see why this needs to be implemented with CUDA. You're not
> >> even doing the SAD in CUDA. I bet it'll be just as fast if not faster in C
> >> (unless you cheat somehow).
> >>  
> >
> > The point is to do it on CUDA frames without copying them to system ram
> > first.
> >
> I think they should provide a Vulkan interop so we could drop all CUDA
> filters and instead treat all filter GPU acceleration in a generic way. It's
> just a matter of months before one exists, I bet.

You could say the same about OpenCL. Too bad NVIDIA keep pushing their
dumb vendor-specific APIs.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-04 Thread Rostislav Pehlivanov
On 4 September 2017 at 17:25, Timo Rothenpieler 
wrote:

>> We have av_pixelutils_sad_fn which does SAD and has SIMD, there's no point
>> in reinventing the wheel.
>>
>> I also don't see why this needs to be implemented with CUDA. You're not
>> even doing the SAD in CUDA. I bet it'll be just as fast if not faster in C
>> (unless you cheat somehow).
>>
>
> The point is to do it on CUDA frames without copying them to system ram
> first.
>
I think they should provide a Vulkan interop so we could drop all CUDA
filters and instead treat all filter GPU acceleration in a generic way. It's
just a matter of months before one exists, I bet.


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-04 Thread Timo Rothenpieler

> We have av_pixelutils_sad_fn which does SAD and has SIMD, there's no point
> in reinventing the wheel.
>
> I also don't see why this needs to be implemented with CUDA. You're not
> even doing the SAD in CUDA. I bet it'll be just as fast if not faster in C
> (unless you cheat somehow).


The point is to do it on CUDA frames without copying them to system RAM
first.






Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-04 Thread Rostislav Pehlivanov
On 30 August 2017 at 05:19, Yogender Gupta  wrote:

> Attached is a CUDA version of the thumbnail filter. It significantly
> accelerates thumbnail generation when using the GPU pipeline.
>
> Regards,
> Yogender
>
We have av_pixelutils_sad_fn which does SAD and has SIMD, there's no point
in reinventing the wheel.

I also don't see why this needs to be implemented with CUDA. You're not
even doing the SAD in CUDA. I bet it'll be just as fast if not faster in C
(unless you cheat somehow).
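
For readers unfamiliar with the term, SAD is the sum of absolute differences between two pixel blocks. The following is only a minimal scalar sketch of that computation, not the libavutil implementation: `sad_block` and its explicit `w`/`h` parameters are hypothetical names chosen for illustration. In libavutil, a function pointer of type av_pixelutils_sad_fn is obtained from av_pixelutils_get_sad_fn() for a given block size, and that path uses SIMD where available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference: sum of absolute differences over a
 * w x h block. Each row starts `stride` bytes after the previous one. */
static int sad_block(const uint8_t *src1, ptrdiff_t stride1,
                     const uint8_t *src2, ptrdiff_t stride2,
                     int w, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            sum += abs(src1[x] - src2[x]);  /* uint8_t promotes to int */
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}
```

Two identical blocks give a SAD of 0, and the value grows as the blocks diverge.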


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-04 Thread Yogender Gupta
I have addressed all comments except the documentation.

I will send a separate patch that documents both CUDA filters.

Regards,
Yogender



0001-thumbnail_cuda-CUDA-Thumbnail-Filter.patch
Description: 0001-thumbnail_cuda-CUDA-Thumbnail-Filter.patch


Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-01 Thread Timo Rothenpieler

The filter is also missing a dependency on cuda_sdk in configure.





Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-09-01 Thread Timo Rothenpieler

On 30.08.2017 at 06:19, Yogender Gupta wrote:

> Attached is a CUDA version of the thumbnail filter. It significantly
> accelerates thumbnail generation when using the GPU pipeline.
>
> Regards,
> Yogender


After having a look at the code:

The filter uses a global "CUdeviceptr data;" variable (which isn't even
static).
This is generally not acceptable: it makes it impossible to run more than
one instance of the filter in parallel. All state should live in the filter
context.
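
As a rough illustration of that point, here is a minimal sketch of keeping the buffer in per-instance state instead of a global. It is not the patch's code: `DevicePtr`, `ThumbnailState` and both functions are hypothetical stand-ins so that it compiles without the CUDA headers, and calloc()/free() take the place of device allocation. In a real filter the struct would be the private context reached through ctx->priv, sized by priv_size in the AVFilter definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for CUdeviceptr (assumption, avoids the CUDA headers). */
typedef uintptr_t DevicePtr;

/* Hypothetical per-instance state replacing the global variable. */
typedef struct ThumbnailState {
    DevicePtr data;       /* buffer that used to be the global "data" */
    size_t    data_size;
} ThumbnailState;

/* Each filter instance gets its own buffer; nothing is shared. */
static int thumbnail_state_init(ThumbnailState *s, size_t size)
{
    void *buf = calloc(1, size);  /* stand-in for a device allocation */

    if (!buf)
        return -1;
    s->data      = (DevicePtr)buf;
    s->data_size = size;
    return 0;
}

static void thumbnail_state_uninit(ThumbnailState *s)
{
    free((void *)s->data);
    s->data      = 0;
    s->data_size = 0;
}
```

With this layout, two instances initialised side by side hold distinct buffers and cannot overwrite each other's data.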


Also, the allocated module and device memory are never freed.
uninit should unload the module, free the memory, and do any other
necessary cleanup.
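
A minimal sketch of what such cleanup could look like, under stated assumptions: `FakeModule`, `FilterState`, `filter_init` and `filter_uninit` are hypothetical names, and plain heap allocations stand in for the CUDA module and device buffer. In the real filter the two release steps would be cuMemFree() and cuModuleUnload() from the CUDA driver API. The sketch is written so that uninit tolerates a partially completed init and a repeated call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loaded CUDA module. */
typedef struct FakeModule { int loaded; } FakeModule;

typedef struct FilterState {
    FakeModule *module;  /* would be a CUmodule */
    void       *data;    /* would be a CUdeviceptr */
} FilterState;

static int filter_init(FilterState *s, size_t size)
{
    s->module = calloc(1, sizeof(*s->module));
    if (!s->module)
        return -1;
    s->module->loaded = 1;

    s->data = malloc(size);
    if (!s->data)
        return -1;  /* the caller is still expected to run filter_uninit() */
    return 0;
}

/* Releases in reverse order of acquisition and resets the pointers,
 * so a second call (or a call after a failed init) is harmless. */
static void filter_uninit(FilterState *s)
{
    if (s->data) {
        free(s->data);    /* real code: cuMemFree() */
        s->data = NULL;
    }
    if (s->module) {
        free(s->module);  /* real code: cuModuleUnload() */
        s->module = NULL;
    }
}
```

Resetting each pointer after release is what makes the function idempotent.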



Otherwise the code seems reasonable to me. Would still like to have 
someone else review it though.






Re: [FFmpeg-devel] [Patch] CUDA Thumbnail Filter

2017-08-30 Thread Timo Rothenpieler

On 30.08.2017 at 06:19, Yogender Gupta wrote:

> Attached is a CUDA version of the thumbnail filter. It significantly
> accelerates thumbnail generation when using the GPU pipeline.


Without having done a full review of the code yet:
A new filter needs a libavfilter version bump (not 100% sure if minor or
micro, but I think it was a minor bump).

Also, the filter is missing documentation in doc/filters.texi, which, as I
just realized, is also true for scale_cuda.

Will have a closer look at the code later.
If someone else could also have a look, that would be greatly appreciated.



Regards,
Timo


