Re: [FFmpeg-devel] [PATCH] libavfilter: temporarily remove DNN framework and vf_sr filter
2018-07-26 23:19 GMT-03:00 Ronald S. Bultje :
> Hi,
>
> On Thu, Jul 26, 2018 at 10:04 PM, Pedro Arthur wrote:
>
>> If you compare NN weights with quantization tables they are pretty
>> similar
>
> https://chromium.googlesource.com/webm/libvpx/+/3b9c19aaa7b8830a896c5f578a3ce6c6a7953947%5E%21/#F0
>
> So, that one tiny single function is how VP9/AV1 quant tables are generated.
>
> Or, the HEVC/H264 ones, they are even simpler: exp2(qp/6).
>
> Are NN weights a single, one-line (10-character) expression? Please
> elaborate. Why isn't that 10-character function documented anywhere?

I think you missed the point. I wrote "can be obtained from a training process over a dataset so it achieves better results (quality/compression)".
Taking VP9 as an example: sure, the coefficients are obtained from 'poly3', but the real data are the polynomial coefficients. Does anyone ask where these polynomial coefficients came from, whether they are reproducible, etc.?
Your comparison does not seem fair to me.

> Ronald
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH 3/3] lavfi/motion_estimation: use pixelutils API for sad.
On Mon, Jul 23, 2018 at 2:33 AM Marton Balint wrote:
> On Tue, 17 Jul 2018, myp...@gmail.com wrote:
>> On Sun, Jul 15, 2018 at 1:03 AM Michael Niedermayer wrote:
>>> On Sat, Jul 14, 2018 at 12:04:46PM +0200, Marton Balint wrote:
>>>> On Sat, 14 Jul 2018, Michael Niedermayer wrote:
>>>>> On Fri, Jul 13, 2018 at 10:51:00AM +0200, Marton Balint wrote:
>>>>>> On Thu, 12 Jul 2018, myp...@gmail.com wrote:
>>>>>>> On Thu, Jul 12, 2018 at 12:43 AM Marton Balint wrote:
>>>>>>>> On Wed, 11 Jul 2018, Jun Zhao wrote:
>>>>>>>>> use pixelutils API for sad in motion estimation.
>>>>>>>>
>>>>>>>> Does it make sense to improve this code? I thought a superior and faster
>>>>>>>> approach was a result of 2017 GSOC task:
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1Hyh_rxP1KGsVkg7i7yU8Bcv92z0LIL4r-axpoKfvMFk/edit
>>>>>>>>
>>>>>>>> Maybe that code should be merged back, and any further optimalization
>>>>>>>> should be done based on that code, no?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Marton
>>>>>>>
>>>>>>> Hi, Marton:
>>>>>>>
>>>>>>> Yes, now I try to improve the minterpolate, and after use perf
>>>>>>> profiing the commands:
>>>>>>>
>>>>>>> ./ffmpeg -i a.ts -filter_complex
>>>>>>> "minterpolate=mi_mode=mci:mc_mode=aobmc:vsbmc=1" -f null /dev/null
>>>>>>>
>>>>>>> I found the hotspot is:
>>>>>>> - get_sbad_ob
>>>>>>> - get_sbad
>>>>>>> - get_sad_ob
>>>>>>> - bilateral_obmc
>>>>>>> - set_frame_data
>>>>>>>
>>>>>>> So, as my plan, I will try to use sse2/avx2 Scatter/Gather, optimized
>>>>>>> sad function (use pixelutils API)
>>>>>>> in get_sbad_ob / get_sbad / get_sad_ob first, for set_frame_data
>>>>>>> case, maybe need to use Scatter/Gather SIMD instruction.
>>>>>>
>>>>>> That is great, all I am saying we should avoid diverging the two brances
>>>>>> (FFmpeg branch, and GSOC 2017 branch), and try to merge back GSOC2017 if it
>>>>>> can be done with reasonable amount of work before optimizing code, otherwise
>>>>>> the GSOC2017 branch will rot and we will lose the result of the GSOC task.
>>>>>>
>>>>>>> But if some guys have done some improve task in this case, I think
>>>>>>> based on the pre-existing work is the better way.
>>>>>>
>>>>>> Michael was the mentor, maybe he can chip in on what should be done here.
>>>>>
>>>>> talk with the author/student who wrote the code, not me :)
>>>>
>>>> Well, his not active here,
>>>
>>> yes but last i heared from him, he was interrested in continuing this project
>>> i think ive not heared much from him after that but i now see that there is a
>>> small commit in his repo from 2018 so he is not completely inactive.
>>> I think you should talk with him
>>>
>>>> and the question is if his work is ready for
>>>> mainline inclusion or not, and if he has done enough valuable work during
>>>> GSOC that its worth working on mainlining it.
>>>
>>> He certainly did valuable work. Looking now at the ML, it seems the more or
>>> less last thing on the ML was the RFC/Discussion thread about libmotion.
>>> In that everyone wanted to dictate the design, and all that was contradicting
>>> each other.
>>> If you want to work on unifying this entangled bikeshed ball of conflicting
>>> oppinions, that surely is very welcome. Important is that it ends in something
>>> that is practical and high quality.
>>> Personally i think the author should be given more authority in the design.
>>> But again, please talk with the author of this code
>>> I dont remember everything in as much detail about this ...
>>>
>>> also ive added him to the CC
>>>
>>> Thanks
>>
>> Now the minterpolate/libmotion auther didn't give a feedback or
>> sugesstion, so I will update patch 1/2 (just add SSE2/AVX2 sad_32x32)
>> with some perf data and hold on the patch 3 about minterpolate, any
>> other comments?
>
> I checked the "libmotion" series, and it seems they are in
> debug/development state and the commits are not clean, so some heavy
> refactoring is needed before applying them anyway.
>
> Do what you prefer, snow codec based motion compenstaion is an additional
> algorithm to the existing code anyway as far as I see.

From my point of view, I prefer to improve the current minterpolate filter first; then we can try to refactor the "libmotion" series. Done is better than perfect. Any other comments or suggestions?
Re: [FFmpeg-devel] [PATCH] libavfilter: temporarily remove DNN framework and vf_sr filter
Hi,

On Thu, Jul 26, 2018 at 10:04 PM, Pedro Arthur wrote:

> If you compare NN weights with quantization tables they are pretty
> similar

https://chromium.googlesource.com/webm/libvpx/+/3b9c19aaa7b8830a896c5f578a3ce6c6a7953947%5E%21/#F0

So, that one tiny single function is how VP9/AV1 quant tables are generated.

Or, the HEVC/H264 ones, they are even simpler: exp2(qp/6).

Are NN weights a single, one-line (10-character) expression? Please elaborate. Why isn't that 10-character function documented anywhere?

Ronald
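To make the exp2(qp/6) remark concrete, here is a minimal C sketch of how such a scale can be regenerated from a handful of constants. It is only an illustration of the expression quoted above, not code from FFmpeg, libvpx, or any codec specification; the function name qscale_exp2 is hypothetical, and the six constants are simply 2^(k/6) written out so that the example does not depend on libm.

```c
/* Hypothetical helper (not FFmpeg or libvpx code): evaluates
 * exp2(qp / 6.0) for qp >= 0. The fractional part of the exponent
 * comes from six precomputed values of 2^(k/6); the integer part
 * is a plain power of two, so the value doubles every 6 steps. */
double qscale_exp2(int qp)
{
    static const double frac[6] = {
        1.0,               /* 2^(0/6) */
        1.122462048309373, /* 2^(1/6) */
        1.259921049894873, /* 2^(2/6) */
        1.414213562373095, /* 2^(3/6) */
        1.587401051968199, /* 2^(4/6) */
        1.781797436280679, /* 2^(5/6) */
    };
    return frac[qp % 6] * (double)(1ULL << (qp / 6));
}
```

The point of the sketch is only that the whole table follows from a closed-form rule, which is the property being contrasted with trained weights in this thread.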
Re: [FFmpeg-devel] [PATCH] libavfilter: temporarily remove DNN framework and vf_sr filter
Hi,

I'm surprised by this patch; no concern was raised during the patch review process.

2018-07-26 16:26 GMT-03:00 Rostislav Pehlivanov :
> As discussed recently, the vf_sr filter and the DNN framework have an
> issue: unreproducable weights and questionable license, as well as
> overall unfitting coding style to the rest of the project.

I am not aware of these discussions, could you provide a reference?
I also don't understand why the coding style is unfitting (again, no concern was raised).

>
> The vf_sr filter in particular has weights embedded which weight the
> libavfilter binary by a bit and cannot currently be reproduced.
> There's an overall consensus that NN filters should accept external
> weights only, as the nnedi filter currently does.
>
> So, temporarily remove both until the coding style issues have been
> fixed with the framework and the filter has been modified to accept
> external weights.

What are these issues, so we can fix them?

>
> Also, there's a discussion by the Debian folks as to whether to treat
> pretrained NNs as non-free[0], hence its not just our project that's
> affected by the questionable license of distributing pretrained NN
> weights.
>
> Due to the weight of the patch (more than 1mb!) I've uploaded it to
> https://0x0.st/sVEH.patch if anyone wants to test it. The change stat
> is printed below.
>
> [0]: https://lwn.net/Articles/760142/

I took the time to read the whole discussion and in my opinion it is flawed, except for the storage requirement, for which Sergey has already worked on a patch to reduce the stored data.

I think that before any discussion it should be clear what the ffmpeg policy on adding data is, and what is considered data, and the policy should be consistent.

I'll try to address the topics in the above discussion.

First, what is data? Is it expected that all stored data should be easily reproducible? I guess not; what is the point in storing data that is easily reproducible?
The whole of humanity is built on previously stored knowledge, namely data. Do we require that each time someone uses some form of knowledge they reproduce it, that is, prove everything they are using? The answer is no. The whole point of storing data is that someone once worked hard to prove it works, and from then on we just use it and trust that it works. That does not mean it is impossible to prove everything.
I think the above fits NN weights as data perfectly.

The next point is reproducibility. One should be reasonable: it is hard to reproduce stored NN data bit by bit, but it is totally doable to achieve similar results by following the same training methodology.

Then there is the question of what is open source; once again one should be reasonable. The NN model is publicly available, everything is documented, and the math machinery is also widely available and documented. There is also a repository containing everything one needs to train the NN and achieve similar results, and the training data is public too.
The training software is open source, and the user could also use any machine learning framework of their choice to perform the training since the model is documented; nothing locks one to specific software or hardware.
I can't see what is not open.

Do we impose all the requirements imposed on NN weights on all other data stored in ffmpeg? I guess not; once more, one should be consistent.
If you compare NN weights with quantization tables they are pretty similar: both can be obtained from a training process over a dataset so it achieves better results (quality/compression). Are quantization tables evil? I don't think so.
It seems people are afraid of NNs just because we gave them a fancy name, while they are just tables of data as we have always used in our code.
>
> Signed-off-by: Rostislav Pehlivanov
>
> Rostislav Pehlivanov (1):
>   libavfilter: temporarily remove DNN framework and vf_sr filter
>
>  Changelog                        |     1 -
>  configure                        |     8 -
>  libavfilter/Makefile             |     3 -
>  libavfilter/allfilters.c         |     1 -
>  libavfilter/dnn_backend_native.c |   495 --
>  libavfilter/dnn_backend_native.h |    40 -
>  libavfilter/dnn_backend_tf.c     |   325 -
>  libavfilter/dnn_backend_tf.h     |    40 -
>  libavfilter/dnn_espcn.h          | 12637 -
>  libavfilter/dnn_interface.c      |    60 -
>  libavfilter/dnn_interface.h      |    63 -
>  libavfilter/dnn_srcnn.h          |  4957 ---
>  libavfilter/vf_sr.c              |   354 -
>  13 files changed, 18984 deletions(-)
>  delete mode 100644 libavfilter/dnn_backend_native.c
>  delete mode 100644 libavfilter/dnn_backend_native.h
>  delete mode 100644 libavfilter/dnn_backend_tf.c
>  delete mode 100644 libavfilter/dnn_backend_tf.h
>  delete mode 100644 libavfilter/dnn_espcn.h
>  delete mode 100644 libavfilter/dnn_interface.c
>  delete mode 100644 libavfilter/dnn_interface.h
>  delete mode 100644 libavfilter/dnn_srcnn.h
>  delete mode 100644 lib
Re: [FFmpeg-devel] [PATCH] avformat/movenc: implicitly enable negative CTS offsets for ismv
On Thu, Jul 26, 2018 at 02:51:38AM +0300, Jan Ekström wrote:
> ISMV lacks any sort of edit list support, as well as tfxd is
> effectively the PTS of the fragment for most intents and purposes.
>
> Thus, if b-frames are requested without negative CTS offsets you
> end up with N frames' worth of delay (tfxd PTS plus the CTS offset
> of the first sample). Negative CTS offsets enable the first sample
> to have CTS=DTS, and thus a/v desync due to b-frame reorder delay
> is avoided.
> ---
>  doc/muxers.texi       | 2 ++
>  libavformat/movenc.c  | 2 +-
>  tests/ref/fate/movenc | 4 ++--
>  3 files changed, 5 insertions(+), 3 deletions(-)

breaks fate

TEST    lavf-ismv
--- ./tests/ref/lavf/ismv 2018-07-20 13:20:28.137581113 +0200
+++ tests/data/fate/lavf-ismv 2018-07-27 00:29:48.709348455 +0200
@@ -1,9 +1,9 @@
-a9ccbb4cd1436d222ef4425567b4e03d *./tests/data/lavf/lavf.ismv
+96053075a3f60d271131fe2d0765c267 *./tests/data/lavf/lavf.ismv
 312542 ./tests/data/lavf/lavf.ismv
 ./tests/data/lavf/lavf.ismv CRC=0x9d9a638a
-440d85f9fd5b9f63c2676638782b5c15 *./tests/data/lavf/lavf.ismv
+7022701b4c693bc4ffe1e9f96dd82a02 *./tests/data/lavf/lavf.ismv
 321448 ./tests/data/lavf/lavf.ismv
 ./tests/data/lavf/lavf.ismv CRC=0xe8130120
-a9ccbb4cd1436d222ef4425567b4e03d *./tests/data/lavf/lavf.ismv
+96053075a3f60d271131fe2d0765c267 *./tests/data/lavf/lavf.ismv
 312542 ./tests/data/lavf/lavf.ismv
 ./tests/data/lavf/lavf.ismv CRC=0x9d9a638a
Test lavf-ismv failed. Look at tests/data/fate/lavf-ismv.err for details.
make: *** [fate-lavf-ismv] Error 1

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

You can kill me, but you cannot change the truth.
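The timestamp arithmetic described in the quoted commit message can be sketched as follows. This is a hedged illustration of the reasoning only, not movenc code: the function cts_offset and its parameters are hypothetical, timestamps are in whole frame durations, and first_offset stands for pts - dts of the first sample (the b-frame reorder delay).

```c
#include <stdint.h>

/* Hypothetical illustration (not libavformat/movenc.c).
 * Without negative offsets, each sample stores pts - dts, so the first
 * sample is presented first_offset frames after its decode time.
 * With negative offsets allowed, every offset is lowered by first_offset,
 * so the first sample gets CTS == DTS and later samples may go negative. */
int64_t cts_offset(int64_t pts, int64_t dts, int64_t first_offset,
                   int allow_negative)
{
    int64_t offset = pts - dts;
    return allow_negative ? offset - first_offset : offset;
}
```

For a stream decoded in I, P, B order with dts 0, 1, 2 and pts 1, 3, 2, the stored offsets would be 1, 2, 0 in the unsigned scheme and 0, 1, -1 in the signed one.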
[FFmpeg-devel] [PATCH] lavfi/nlmeans: fixup aarch64 assembly with clang
Clang is more strict about some things.
---
 libavfilter/aarch64/vf_nlmeans_neon.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavfilter/aarch64/vf_nlmeans_neon.S b/libavfilter/aarch64/vf_nlmeans_neon.S
index 6308a428db..ac16157bbd 100644
--- a/libavfilter/aarch64/vf_nlmeans_neon.S
+++ b/libavfilter/aarch64/vf_nlmeans_neon.S
@@ -22,7 +22,7 @@
 // acc_sum_store(ABCD) = {X+A, X+A+B, X+A+B+C, X+A+B+C+D}
 .macro acc_sum_store x, xb
-        dup             v24.4S, v24.4S[3]       // ...X ->
+        dup             v24.4s, v24.s[3]        // ...X ->
         ext             v25.16B, v26.16B, \xb, #12      // ext(,ABCD,12)=0ABC
         add             v24.4S, v24.4S, \x      // +ABCD={X+A,X+B,X+C,X+D}
         add             v24.4S, v24.4S, v25.4S  // {X+A,X+B+A,X+C+B,X+D+C} (+0ABC)
@@ -37,7 +37,7 @@ function ff_compute_safe_ssd_integral_image_neon, export=1
         movi            v26.4S, #0              // used as zero for the "rotations" in acc_sum_store
         sub             x3, x3, w6, UXTW        // s1 padding (s1_linesize - w)
         sub             x5, x5, w6, UXTW        // s2 padding (s2_linesize - w)
-        sub             x9, x0, x1, UXTW #2     // dst_top
+        sub             x9, x0, w1, UXTW #2     // dst_top
         sub             x1, x1, w6, UXTW        // dst padding (dst_linesize_32 - w)
         lsl             x1, x1, #2              // dst padding expressed in bytes
 1:      mov             w10, w6                 // width copy for each line
--
2.17.1
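For readers who do not read NEON, the comment on the acc_sum_store macro in the patch context can be modelled in scalar C. This is a sketch of what that comment states, under the assumption that X is the last lane of the previous result; the name acc_sum_store_ref is hypothetical and this is not code from the FFmpeg tree.

```c
#include <stdint.h>

/* Scalar model of the macro comment:
 *   acc_sum_store(ABCD) = {X+A, X+A+B, X+A+B+C, X+A+B+C+D}
 * acc holds the previous four results on entry (only acc[3], the "X"
 * lane, is read) and the new four results on return. */
void acc_sum_store_ref(uint32_t acc[4], const uint32_t in[4])
{
    uint32_t x = acc[3];        /* carry in the last lane of the previous step */
    for (int i = 0; i < 4; i++) {
        x += in[i];             /* running sum across the four lanes */
        acc[i] = x;
    }
}
```

Calling it repeatedly over consecutive groups of four inputs yields a running prefix sum along a row, which is the role the comment describes for the vector version.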
Re: [FFmpeg-devel] [PATCH] libavfilter: temporarily remove DNN framework and vf_sr filter
2018-07-26 21:26 GMT+02:00, Rostislav Pehlivanov :

> There's an overall consensus that NN filters should accept
> external weights only

Do you mean an overall consensus on irc?

I ask because the patch in question was sent several times for review, and I don't remember a comment concerning internal weights. When the issue was brought up on the mailing list, at least one developer defended the internal weights iirc.

(No opinion here.)

Carl Eugen
Re: [FFmpeg-devel] [PATCH] libavfilter: temporarily remove DNN framework and vf_sr filter
On 7/26/18, Thilo Borgmann wrote:
> Hi,
>
> Am 26.07.18 um 21:26 schrieb Rostislav Pehlivanov:
>> As discussed recently, the vf_sr filter and the DNN framework have an
>> issue: unreproducable weights and questionable license, as well as
>> overall unfitting coding style to the rest of the project.
>>
>> The vf_sr filter in particular has weights embedded which weight the
>> libavfilter binary by a bit and cannot currently be reproduced.
>> There's an overall consensus that NN filters should accept external
>> weights only, as the nnedi filter currently does.
>>
>> So, temporarily remove both until the coding style issues have been
>> fixed with the framework and the filter has been modified to accept
>> external weights.
>>
>> Also, there's a discussion by the Debian folks as to whether to treat
>> pretrained NNs as non-free[0], hence its not just our project that's
>> affected by the questionable license of distributing pretrained NN
>> weights.
>
> I personally don't have a good feeling with pre-trained NNs in the codebase,
> too. However, I do not care much about what solution we take, but Mina's
> GSoC project also depends on the NN code that comes with this and therefore
> I'd encourage everyone to make up their mind to find a suitable solution
> sometime soonish.
>
> Maybe for the time-being, we might only accept such code reading in
> externally provided NNs and/or the ability to train using FFmpeg itself. (Or
> ask one of our kind users with compute power to generate some for us)

IIRC the mentioned filter already supports external files. It just has an internal one too.
Re: [FFmpeg-devel] [PATCH] libavfilter: temporarily remove DNN framework and vf_sr filter
Hi, Am 26.07.18 um 21:26 schrieb Rostislav Pehlivanov: > As discussed recently, the vf_sr filter and the DNN framework have an > issue: unreproducable weights and questionable license, as well as > overall unfitting coding style to the rest of the project. > > The vf_sr filter in particular has weights embedded which weight the > libavfilter binary by a bit and cannot currently be reproduced. > There's an overall consensus that NN filters should accept external > weights only, as the nnedi filter currently does. > > So, temporarily remove both until the coding style issues have been > fixed with the framework and the filter has been modified to accept > external weights. > > Also, there's a discussion by the Debian folks as to whether to treat > pretrained NNs as non-free[0], hence its not just our project that's > affected by the questionable license of distributing pretrained NN > weights. I personally don't have a good feeling with pre-trained NNs in the codebase, too. However, I do not care much about what solution we take, but Mina's GSoC project also depends on the NN code that comes with this and therefore I'd encourage everyone to make up their mind to find a suitable solution sometime soonish. Maybe for the time-being, we might only accept such code reading in externally provided NNs and/or the ability to train using FFmpeg itself. (Or ask one of our kind users with compute power to generate some for us) > Due to the weight of the patch (more than 1mb!) I've uploaded it to > https://0x0.st/sVEH.patch if anyone wants to test it. The change stat > is printed below. 
> > [0]: https://lwn.net/Articles/760142/ > > Signed-off-by: Rostislav Pehlivanov > > Rostislav Pehlivanov (1): > libavfilter: temporarily remove DNN framework and vf_sr filter > > Changelog| 1 - > configure| 8 - > libavfilter/Makefile | 3 - > libavfilter/allfilters.c | 1 - > libavfilter/dnn_backend_native.c | 495 -- > libavfilter/dnn_backend_native.h |40 - > libavfilter/dnn_backend_tf.c | 325 - > libavfilter/dnn_backend_tf.h |40 - > libavfilter/dnn_espcn.h | 12637 - > libavfilter/dnn_interface.c |60 - > libavfilter/dnn_interface.h |63 - > libavfilter/dnn_srcnn.h | 4957 --- > libavfilter/vf_sr.c | 354 - > 13 files changed, 18984 deletions(-) > delete mode 100644 libavfilter/dnn_backend_native.c > delete mode 100644 libavfilter/dnn_backend_native.h > delete mode 100644 libavfilter/dnn_backend_tf.c > delete mode 100644 libavfilter/dnn_backend_tf.h > delete mode 100644 libavfilter/dnn_espcn.h > delete mode 100644 libavfilter/dnn_interface.c > delete mode 100644 libavfilter/dnn_interface.h > delete mode 100644 libavfilter/dnn_srcnn.h > delete mode 100644 libavfilter/vf_sr.c > -Thilo ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
[FFmpeg-devel] [PATCH] libavfilter: temporarily remove DNN framework and vf_sr filter
As discussed recently, the vf_sr filter and the DNN framework have an
issue: unreproducable weights and questionable license, as well as
overall unfitting coding style to the rest of the project.

The vf_sr filter in particular has weights embedded which weight the
libavfilter binary by a bit and cannot currently be reproduced.
There's an overall consensus that NN filters should accept external
weights only, as the nnedi filter currently does.

So, temporarily remove both until the coding style issues have been
fixed with the framework and the filter has been modified to accept
external weights.

Also, there's a discussion by the Debian folks as to whether to treat
pretrained NNs as non-free[0], hence its not just our project that's
affected by the questionable license of distributing pretrained NN
weights.

Due to the weight of the patch (more than 1mb!) I've uploaded it to
https://0x0.st/sVEH.patch if anyone wants to test it. The change stat
is printed below.

[0]: https://lwn.net/Articles/760142/

Signed-off-by: Rostislav Pehlivanov

Rostislav Pehlivanov (1):
  libavfilter: temporarily remove DNN framework and vf_sr filter

 Changelog                        |     1 -
 configure                        |     8 -
 libavfilter/Makefile             |     3 -
 libavfilter/allfilters.c         |     1 -
 libavfilter/dnn_backend_native.c |   495 --
 libavfilter/dnn_backend_native.h |    40 -
 libavfilter/dnn_backend_tf.c     |   325 -
 libavfilter/dnn_backend_tf.h     |    40 -
 libavfilter/dnn_espcn.h          | 12637 -
 libavfilter/dnn_interface.c      |    60 -
 libavfilter/dnn_interface.h      |    63 -
 libavfilter/dnn_srcnn.h          |  4957 ---
 libavfilter/vf_sr.c              |   354 -
 13 files changed, 18984 deletions(-)
 delete mode 100644 libavfilter/dnn_backend_native.c
 delete mode 100644 libavfilter/dnn_backend_native.h
 delete mode 100644 libavfilter/dnn_backend_tf.c
 delete mode 100644 libavfilter/dnn_backend_tf.h
 delete mode 100644 libavfilter/dnn_espcn.h
 delete mode 100644 libavfilter/dnn_interface.c
 delete mode 100644 libavfilter/dnn_interface.h
 delete mode 100644 libavfilter/dnn_srcnn.h
 delete mode 100644 libavfilter/vf_sr.c
--
2.18.0
Re: [FFmpeg-devel] [PATCH 1/2] lavc/amfenc: moving amf common code (library and context) to lavu/hwcontext_amf from amfenc to be reused in other amf components
Hello.

This is a reminder. Could you please review the patch? If it is OK, could you apply it? It was published 2 weeks ago and it is required for further updates.

Thanks,
Alexander
Re: [FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions
On 26 July 2018 at 12:28, James Darnley wrote:

> +
> +%macro HAAR_HORIZONTAL 0
> +
> +cglobal horizontal_compose_haar_10bit, 3, 6+ARCH_X86_64, 4, b, temp_, w, x, b2
> +DECLARE_REG_TMP 2,5
> +%if ARCH_X86_64
> +%define tail r6d
> +%else
> +%define tail dword wm
> +%endif
> +

You can remove this whole bit, the init function only gets called if ARCH_X86_64 is true.
Re: [FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions
On 26 July 2018 at 12:28, James Darnley wrote: > Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the > relevant transform. > C:119fps > SSE2: 204fps > AVX: 206fps > AVX2: 221fps > > timer measurements, haar horizontal compose: > sse2: 3.68x faster (45143 vs. 12279 decicycles) compared with C > avx: 3.68x faster (45143 vs. 12275 decicycles) compared with C > avx2: 5.16x faster (45143 vs. 8742 decicycles) compared with C > haar vertical compose: > sse2: 1.64x faster (31792 vs. 19377 decicycles) compared with C > avx: 1.58x faster (31792 vs. 20090 decicycles) compared with C > avx2: 1.66x faster (31792 vs. 19157 decicycles) compared with C > --- > libavcodec/dirac_dwt.c| 7 +- > libavcodec/dirac_dwt.h| 1 + > libavcodec/x86/Makefile | 6 +- > libavcodec/x86/dirac_dwt_10bit.asm| 160 ++ > libavcodec/x86/dirac_dwt_init_10bit.c | 76 > 5 files changed, 247 insertions(+), 3 deletions(-) > create mode 100644 libavcodec/x86/dirac_dwt_10bit.asm > create mode 100644 libavcodec/x86/dirac_dwt_init_10bit.c > > diff --git a/libavcodec/dirac_dwt.c b/libavcodec/dirac_dwt.c > index cc08f8865a..86bee5bb9b 100644 > --- a/libavcodec/dirac_dwt.c > +++ b/libavcodec/dirac_dwt.c > @@ -59,8 +59,13 @@ int ff_spatial_idwt_init(DWTContext *d, DWTPlane *p, > enum dwt_type type, > return AVERROR_INVALIDDATA; > } > > -if (ARCH_X86 && bit_depth == 8) > +#if ARCH_X86 > +if (bit_depth == 8) > ff_spatial_idwt_init_x86(d, type); > +else if (bit_depth == 10) > +ff_spatial_idwt_init_10bit_x86(d, type); > +#endif > + > return 0; > } > > diff --git a/libavcodec/dirac_dwt.h b/libavcodec/dirac_dwt.h > index 994dc21d70..1ad7b9a821 100644 > --- a/libavcodec/dirac_dwt.h > +++ b/libavcodec/dirac_dwt.h > @@ -88,6 +88,7 @@ enum dwt_type { > int ff_spatial_idwt_init(DWTContext *d, DWTPlane *p, enum dwt_type type, > int decomposition_count, int bit_depth); > void ff_spatial_idwt_init_x86(DWTContext *d, enum dwt_type type); > +void ff_spatial_idwt_init_10bit_x86(DWTContext *d, enum dwt_type type); > > 
void ff_spatial_idwt_slice2(DWTContext *d, int y); > > diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile > index 2350c8bbee..590d83c167 100644 > --- a/libavcodec/x86/Makefile > +++ b/libavcodec/x86/Makefile > @@ -7,7 +7,8 @@ OBJS-$(CONFIG_BLOCKDSP)+= > x86/blockdsp_init.o > OBJS-$(CONFIG_BSWAPDSP)+= x86/bswapdsp_init.o > OBJS-$(CONFIG_DCT) += x86/dct_init.o > OBJS-$(CONFIG_DIRAC_DECODER) += x86/diracdsp_init.o \ > - x86/dirac_dwt_init.o > + x86/dirac_dwt_init.o \ > + x86/dirac_dwt_init_10bit.o > OBJS-$(CONFIG_FDCTDSP) += x86/fdctdsp_init.o > OBJS-$(CONFIG_FFT) += x86/fft_init.o > OBJS-$(CONFIG_FLACDSP) += x86/flacdsp_init.o > @@ -153,7 +154,8 @@ X86ASM-OBJS-$(CONFIG_APNG_DECODER) += x86/pngdsp.o > X86ASM-OBJS-$(CONFIG_CAVS_DECODER) += x86/cavsidct.o > X86ASM-OBJS-$(CONFIG_DCA_DECODER) += x86/dcadsp.o x86/synth_filter.o > X86ASM-OBJS-$(CONFIG_DIRAC_DECODER)+= x86/diracdsp.o\ > - x86/dirac_dwt.o > + x86/dirac_dwt.o \ > + x86/dirac_dwt_10bit.o > X86ASM-OBJS-$(CONFIG_DNXHD_ENCODER)+= x86/dnxhdenc.o > X86ASM-OBJS-$(CONFIG_EXR_DECODER) += x86/exrdsp.o > X86ASM-OBJS-$(CONFIG_FLAC_DECODER) += x86/flacdsp.o > diff --git a/libavcodec/x86/dirac_dwt_10bit.asm > b/libavcodec/x86/dirac_dwt_10bit.asm > new file mode 100644 > index 00..baea91329e > --- /dev/null > +++ b/libavcodec/x86/dirac_dwt_10bit.asm > @@ -0,0 +1,160 @@ > +;** > > +;* x86 optimized discrete 10-bit wavelet trasnform > +;* Copyright (c) 2018 James Darnley > +;* > +;* This file is part of FFmpeg. > +;* > +;* FFmpeg is free software; you can redistribute it and/or > +;* modify it under the terms of the GNU Lesser General Public > +;* License as published by the Free Software Foundation; either > +;* version 2.1 of the License, or (at your option) any later version. > +;* > +;* FFmpeg is distributed in the hope that it will be useful, > +;* but WITHOUT ANY WARRANTY; without even the implied warranty of > +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the GNU
> +;* Lesser General Public License for more details.
> +;*
> +;* You should have received a copy of the GNU Lesser General Public
> +;* License along with FFmpeg; if not, write to the Free Software
> +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> +;*
Re: [FFmpeg-devel] [PATCH] Support for Ambisonics and OpusProjection* API.
On Thu, Jul 26, 2018 at 4:15 PM, Rostislav Pehlivanov wrote:

> Hey,
>
> As of now, the ambisonics API is enabled by default in libopus. We still
> don't have a way to signal ambisonics yet.
> We still have plenty of bits left in libavutil/channel_layout.h to signal
> many orders of ambisonics but some people have had opinions against
> extending that API. We could instead extend AVMatrixEncoding but I don't
> think that's entirely appropriate.
> What opinions do people have on this?
>

I had been working on a new API that would encompass ambisonic ordering (see https://github.com/kodabb/libav/commit/98d9b0a7b28525b29e40ae4c564e51e7c94449eb). The downside is that it requires updating the whole channel layout API (see https://github.com/kodabb/libav/commit/c023b553e6ad6da5af6d0d4ff067ff844b2fcfac ).

I got it mostly working but ran into issues during backward compatibility, and didn't have time to debug and fix it. If anyone wants to finish the set, backport it, and add the missing lswr part it would be easy work. I'm available to help in the process just to get this completed. The full branch is available at https://github.com/kodabb/libav/commits/chl (I hope this will be a mature discussion even though the patches belong to another tree).

--
Vittorio
Re: [FFmpeg-devel] [PATCH] Support for Ambisonics and OpusProjection* API.
Hey,

As of now, the ambisonics API is enabled by default in libopus. We still don't have a way to signal ambisonics yet.
We still have plenty of bits left in libavutil/channel_layout.h to signal many orders of ambisonics but some people have had opinions against extending that API. We could instead extend AVMatrixEncoding but I don't think that's entirely appropriate.
What opinions do people have on this?
[FFmpeg-devel] [PATCH 0/1] libavformat/dashenc: Fix relative URI of HLS master playlist
When using the DASH muxer to produce a segmented media stream, enabling the "hls_playlist" setting also yields HLS-compatible master and playlist manifest files. However, the relative URI of the master playlist is not formed as expected, since an extra slash precedes the file name, i.e., ::///PATH//master.m3u8 is generated instead of the expected ::///PATH/master.m3u8.

This patch just removes the extra slash preceding the file name. As can be seen at line 341, media playlists are properly produced without the extra preceding slash. The resulting URI of the master.m3u8 file is now consistent with the other assets yielded by the muxer.

Antonio Morell (1):
  libavformat/dashenc: Fix relative URI of HLS master playlist

 libavformat/dashenc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.15.2 (Apple Git-101.1)
[FFmpeg-devel] [PATCH 1/1] libavformat/dashenc: Fix relative URI of HLS master playlist
---
 libavformat/dashenc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavformat/dashenc.c b/libavformat/dashenc.c
index a9b8b1d4f6..ae57fd5493 100644
--- a/libavformat/dashenc.c
+++ b/libavformat/dashenc.c
@@ -868,7 +868,7 @@ static int write_manifest(AVFormatContext *s, int final)
         int max_audio_bitrate = 0;

         if (*c->dirname)
-            snprintf(filename_hls, sizeof(filename_hls), "%s/master.m3u8", c->dirname);
+            snprintf(filename_hls, sizeof(filename_hls), "%smaster.m3u8", c->dirname);
         else
             snprintf(filename_hls, sizeof(filename_hls), "master.m3u8");

--
2.15.2 (Apple Git-101.1)
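The effect of the one-character change can be tried in isolation with a small standalone sketch. It assumes, as the patch implies but this thread does not show, that a non-empty dirname already ends with a slash; build_master_path is a hypothetical helper, not a function in dashenc.c.

```c
#include <stdio.h>

/* Hypothetical standalone helper mirroring the patched snprintf() calls.
 * Assumption: a non-empty dirname already carries its trailing '/',
 * so appending "/master.m3u8" would produce a doubled slash. */
int build_master_path(char *dst, size_t dst_size, const char *dirname)
{
    if (*dirname)
        return snprintf(dst, dst_size, "%smaster.m3u8", dirname);
    return snprintf(dst, dst_size, "master.m3u8");
}
```

With dirname set to "out/" this writes "out/master.m3u8", whereas the pre-patch format string would have written "out//master.m3u8".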
Re: [FFmpeg-devel] About the maintainer of mips
>> I hered from the previous mantainer for mips that he was no longer part of
>> mips company,and as a result, my patch was still pending review.
>> Will ffmpeg community asign new mantainer for mips?
>
>No, you have to send a patch that changes the maintainership to you,
>see MAINTAINERS in the main directory.

Thank you very much for your reply.
I send a patch to add myself to mips section today : )

Name: Yin Shiyou (殷时友)
Phone: 153 0560 8910
Email: yinshiyou...@loongson.cn
[FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the
relevant transform.
C:    119fps
SSE2: 204fps
AVX:  206fps
AVX2: 221fps

timer measurements, haar horizontal compose:
sse2: 3.68x faster (45143 vs. 12279 decicycles) compared with C
avx:  3.68x faster (45143 vs. 12275 decicycles) compared with C
avx2: 5.16x faster (45143 vs. 8742 decicycles) compared with C

haar vertical compose:
sse2: 1.64x faster (31792 vs. 19377 decicycles) compared with C
avx:  1.58x faster (31792 vs. 20090 decicycles) compared with C
avx2: 1.66x faster (31792 vs. 19157 decicycles) compared with C
---
 libavcodec/dirac_dwt.c                |   7 +-
 libavcodec/dirac_dwt.h                |   1 +
 libavcodec/x86/Makefile               |   6 +-
 libavcodec/x86/dirac_dwt_10bit.asm    | 160 ++
 libavcodec/x86/dirac_dwt_init_10bit.c |  76
 5 files changed, 247 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/x86/dirac_dwt_10bit.asm
 create mode 100644 libavcodec/x86/dirac_dwt_init_10bit.c

diff --git a/libavcodec/dirac_dwt.c b/libavcodec/dirac_dwt.c
index cc08f8865a..86bee5bb9b 100644
--- a/libavcodec/dirac_dwt.c
+++ b/libavcodec/dirac_dwt.c
@@ -59,8 +59,13 @@ int ff_spatial_idwt_init(DWTContext *d, DWTPlane *p, enum dwt_type type,
         return AVERROR_INVALIDDATA;
     }
 
-    if (ARCH_X86 && bit_depth == 8)
+#if ARCH_X86
+    if (bit_depth == 8)
         ff_spatial_idwt_init_x86(d, type);
+    else if (bit_depth == 10)
+        ff_spatial_idwt_init_10bit_x86(d, type);
+#endif
+
     return 0;
 }
 
diff --git a/libavcodec/dirac_dwt.h b/libavcodec/dirac_dwt.h
index 994dc21d70..1ad7b9a821 100644
--- a/libavcodec/dirac_dwt.h
+++ b/libavcodec/dirac_dwt.h
@@ -88,6 +88,7 @@ enum dwt_type {
 int ff_spatial_idwt_init(DWTContext *d, DWTPlane *p, enum dwt_type type,
                          int decomposition_count, int bit_depth);
 void ff_spatial_idwt_init_x86(DWTContext *d, enum dwt_type type);
+void ff_spatial_idwt_init_10bit_x86(DWTContext *d, enum dwt_type type);
 
 void ff_spatial_idwt_slice2(DWTContext *d, int y);
 
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 2350c8bbee..590d83c167 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -7,7 +7,8 @@ OBJS-$(CONFIG_BLOCKDSP)+= x86/blockdsp_init.o
 OBJS-$(CONFIG_BSWAPDSP)                += x86/bswapdsp_init.o
 OBJS-$(CONFIG_DCT)                     += x86/dct_init.o
 OBJS-$(CONFIG_DIRAC_DECODER)           += x86/diracdsp_init.o \
-                                          x86/dirac_dwt_init.o
+                                          x86/dirac_dwt_init.o \
+                                          x86/dirac_dwt_init_10bit.o
 OBJS-$(CONFIG_FDCTDSP)                 += x86/fdctdsp_init.o
 OBJS-$(CONFIG_FFT)                     += x86/fft_init.o
 OBJS-$(CONFIG_FLACDSP)                 += x86/flacdsp_init.o
@@ -153,7 +154,8 @@ X86ASM-OBJS-$(CONFIG_APNG_DECODER) += x86/pngdsp.o
 X86ASM-OBJS-$(CONFIG_CAVS_DECODER)     += x86/cavsidct.o
 X86ASM-OBJS-$(CONFIG_DCA_DECODER)      += x86/dcadsp.o x86/synth_filter.o
 X86ASM-OBJS-$(CONFIG_DIRAC_DECODER)    += x86/diracdsp.o \
-                                          x86/dirac_dwt.o
+                                          x86/dirac_dwt.o \
+                                          x86/dirac_dwt_10bit.o
 X86ASM-OBJS-$(CONFIG_DNXHD_ENCODER)    += x86/dnxhdenc.o
 X86ASM-OBJS-$(CONFIG_EXR_DECODER)      += x86/exrdsp.o
 X86ASM-OBJS-$(CONFIG_FLAC_DECODER)     += x86/flacdsp.o
diff --git a/libavcodec/x86/dirac_dwt_10bit.asm b/libavcodec/x86/dirac_dwt_10bit.asm
new file mode 100644
index 0000000000..baea91329e
--- /dev/null
+++ b/libavcodec/x86/dirac_dwt_10bit.asm
@@ -0,0 +1,160 @@
+;**
+;* x86 optimized discrete 10-bit wavelet transform
+;* Copyright (c) 2018 James Darnley
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;**
+
+%include "libavutil/x86/x86util.asm"
+
+SECTION_RODATA
+
+cextern pd_1
+
+SECTION .text
+
+%macro HAAR_VERTICAL 0
+
+cglobal vertical_compose_haar_10bit, 3, 6, 4, b0, b1, w
+DECLARE_REG_TMP 4
[FFmpeg-devel] [PATCH 0/3 v2] x86 SIMD for dirac 10-bit wavelet transforms
I will ask the same question as last time. Is the AVX worth it in Haar?

Also I am surprised that the AVX2 doesn't have a bigger difference on some
of the vertical transforms.

James Darnley (3):
  diracdec: add 10-bit Haar SIMD functions
  diracdec: add 10-bit Legall 5,3 (5_3) SIMD functions
  diracdec: add 10-bit Deslauriers-Dubuc 9,7 (9_7) vertical high-pass
    function

 libavcodec/dirac_dwt.c                |   7 +-
 libavcodec/dirac_dwt.h                |   1 +
 libavcodec/x86/Makefile               |   6 +-
 libavcodec/x86/dirac_dwt_10bit.asm    | 302 ++
 libavcodec/x86/dirac_dwt_init_10bit.c | 118 ++
 5 files changed, 431 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/x86/dirac_dwt_10bit.asm
 create mode 100644 libavcodec/x86/dirac_dwt_init_10bit.c

-- 
2.18.0
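For the AVX question, the decicycle counts quoted in the Haar patch (1/3)
can be compared directly. The sketch below only does the division; the
function name speedup() is made up for illustration, and the numbers in the
comments are the ones from that patch's commit message.

```c
/* Ratio of reference cycles to optimised cycles; values above 1.0 mean
 * the optimised routine is faster than the reference. */
double speedup(double ref_cycles, double opt_cycles)
{
    return ref_cycles / opt_cycles;
}

/*
 * Haar horizontal compose, from the 1/3 commit message:
 *   speedup(45143, 12279)  SSE2 vs. C     -> about 3.68
 *   speedup(12279, 12275)  AVX  vs. SSE2  -> about 1.0003
 * Haar vertical compose:
 *   speedup(19377, 20090)  AVX  vs. SSE2  -> about 0.96 (AVX slower)
 */
```

On these figures AVX is within measurement noise of SSE2 for the horizontal
pass and slightly behind it for the vertical pass, which is consistent with
the doubt raised above.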
[FFmpeg-devel] [PATCH 3/3] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the
relevant transform.
C:    84fps
SSE2: 111fps
AVX2: 115fps

dd97 vertical hi
sse2: 2.77x faster (31773 vs. 11457 decicycles) compared with C
avx2: 3.83x faster (31773 vs. 8297 decicycles) compared with C
---
 libavcodec/x86/dirac_dwt_10bit.asm    | 39 +++
 libavcodec/x86/dirac_dwt_init_10bit.c | 29
 2 files changed, 68 insertions(+)

diff --git a/libavcodec/x86/dirac_dwt_10bit.asm b/libavcodec/x86/dirac_dwt_10bit.asm
index 0295e6f554..2ed77fe3b0 100644
--- a/libavcodec/x86/dirac_dwt_10bit.asm
+++ b/libavcodec/x86/dirac_dwt_10bit.asm
@@ -25,6 +25,7 @@ SECTION_RODATA 32
 
 cextern pd_1
 pd_2: times 8 dd 2
+pd_8: times 8 dd 8
 
 SECTION .text
 
@@ -246,7 +247,44 @@ RET
 
 %endmacro
 
+%macro DD97_VERTICAL_HI 0
+
+cglobal dd97_vertical_hi, 6, 6, 8, b0, b1, b2, b3, b4, w
+    mova m7, [pd_8]
+    shl wd, 2
+    add b0q, wq
+    add b1q, wq
+    add b2q, wq
+    add b3q, wq
+    add b4q, wq
+    neg wq
+
+    ALIGN 16
+    .loop:
+        mova m0, [b0q + wq]
+        mova m1, [b1q + wq]
+        mova m2, [b2q + wq]
+        mova m3, [b3q + wq]
+        mova m4, [b4q + wq]
+        pslld m5, m1, 3
+        pslld m6, m3, 3
+        paddd m5, m1
+        paddd m6, m3
+        psubd m5, m0
+        psubd m6, m4
+        paddd m5, m7
+        paddd m5, m6
+        psrad m5, 4
+        paddd m2, m5
+        mova [b2q + wq], m2
+        add wq, mmsize
+    jl .loop
+RET
+
+%endmacro
+
 INIT_XMM sse2
+DD97_VERTICAL_HI
 HAAR_HORIZONTAL
 HAAR_VERTICAL
 LEGALL53_VERTICAL_HI
@@ -257,6 +295,7 @@ HAAR_HORIZONTAL
 HAAR_VERTICAL
 
 INIT_YMM avx2
+DD97_VERTICAL_HI
 HAAR_HORIZONTAL
 HAAR_VERTICAL
 LEGALL53_VERTICAL_HI
diff --git a/libavcodec/x86/dirac_dwt_init_10bit.c b/libavcodec/x86/dirac_dwt_init_10bit.c
index d1234efac5..a9ac603bc5 100644
--- a/libavcodec/x86/dirac_dwt_init_10bit.c
+++ b/libavcodec/x86/dirac_dwt_init_10bit.c
@@ -23,6 +23,9 @@
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/dirac_dwt.h"
 
+void ff_dd97_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2, int32_t *b3, int32_t *b4, int width);
+void ff_dd97_vertical_hi_avx2(int32_t *b0, int32_t *b1, int32_t *b2, int32_t *b3, int32_t *b4, int width);
+
 void ff_legall53_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2, int width);
 void ff_legall53_vertical_lo_sse2(int32_t *b0, int32_t *b1, int32_t *b2, int width);
 void ff_legall53_vertical_hi_avx2(int32_t *b0, int32_t *b1, int32_t *b2, int width);
@@ -36,6 +39,24 @@ void ff_vertical_compose_haar_10bit_sse2(int32_t *b0, int32_t *b1, int width_ali
 void ff_vertical_compose_haar_10bit_avx(int32_t *b0, int32_t *b1, int width_align);
 void ff_vertical_compose_haar_10bit_avx2(int32_t *b0, int32_t *b1, int width_align);
 
+static void dd97_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2,
+                                  int32_t *b3, int32_t *b4, int width)
+{
+    int i = width & ~3;
+    ff_dd97_vertical_hi_sse2(b0, b1, b2, b3, b4, i);
+    for(; ivertical_compose_h0 = (void*)dd97_vertical_hi_sse2;
+            d->vertical_compose_l0 = (void*)ff_legall53_vertical_lo_sse2;
+            break;
         case DWT_DIRAC_LEGALL5_3:
             d->vertical_compose_h0 = (void*)ff_legall53_vertical_hi_sse2;
             d->vertical_compose_l0 = (void*)ff_legall53_vertical_lo_sse2;
@@ -71,6 +96,10 @@ av_cold void ff_spatial_idwt_init_10bit_x86(DWTContext *d, enum dwt_type type)
     if (EXTERNAL_AVX2(cpu_flags)) {
         switch (type) {
+        case DWT_DIRAC_DD9_7:
+            d->vertical_compose_h0 = (void*)dd97_vertical_hi_avx2;
+            d->vertical_compose_l0 = (void*)ff_legall53_vertical_lo_avx2;
+            break;
         case DWT_DIRAC_LEGALL5_3:
             d->vertical_compose_h0 = (void*)ff_legall53_vertical_hi_avx2;
             d->vertical_compose_l0 = (void*)ff_legall53_vertical_lo_avx2;
-- 
2.18.0
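For readers who do not want to trace the SIMD loop, here is a scalar sketch
of what DD97_VERTICAL_HI computes per coefficient, read off the instruction
sequence above (shift left by 3 plus add is a multiply by 9, pd_8 is the
rounding term, psrad is an arithmetic shift by 4). The name
dd97_vertical_hi_ref() is made up for this sketch and is not the C routine
from dirac_dwt; it also assumes that >> on a negative int32_t is an
arithmetic shift, as it is with the usual compilers.

```c
#include <stdint.h>

/* Scalar model of the SIMD loop:
 *   b2[i] += (9*b1[i] + 9*b3[i] - b0[i] - b4[i] + 8) >> 4
 * Assumption: right shift of a negative value is arithmetic (like psrad). */
void dd97_vertical_hi_ref(const int32_t *b0, const int32_t *b1, int32_t *b2,
                          const int32_t *b3, const int32_t *b4, int width)
{
    for (int i = 0; i < width; i++) {
        int32_t t = 9 * b1[i] + 9 * b3[i] - b0[i] - b4[i] + 8;
        b2[i] += t >> 4;
    }
}
```

A loop like this could also serve for the tail elements that the wrappers
handle after the vector part, since the assembly itself has no scalar
remainder loop.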
[FFmpeg-devel] [PATCH 2/3] diracdec: add 10-bit Legall 5, 3 (5_3) SIMD functions
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the
relevant transform.
C:    94fps
SSE2: 118fps
AVX2: 121fps

legall vertical hi
sse2: 3.86x faster (20201 vs. 5231 decicycles) compared with C
avx2: 6.70x faster (20201 vs. 3014 decicycles) compared with C

legall vertical lo
sse2: 1.50x faster (28345 vs. 18908 decicycles) compared with C
avx2: 1.63x faster (28345 vs. 17361 decicycles) compared with C
---
 libavcodec/x86/dirac_dwt_10bit.asm    | 105 +-
 libavcodec/x86/dirac_dwt_init_10bit.c |  13
 2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/libavcodec/x86/dirac_dwt_10bit.asm b/libavcodec/x86/dirac_dwt_10bit.asm
index baea91329e..0295e6f554 100644
--- a/libavcodec/x86/dirac_dwt_10bit.asm
+++ b/libavcodec/x86/dirac_dwt_10bit.asm
@@ -21,9 +21,10 @@
 
 %include "libavutil/x86/x86util.asm"
 
-SECTION_RODATA
+SECTION_RODATA 32
 
 cextern pd_1
+pd_2: times 8 dd 2
 
 SECTION .text
 
@@ -147,9 +148,109 @@ REP_RET
 
 %endmacro
 
+%macro LEGALL53_VERTICAL_LO 0
+
+cglobal legall53_vertical_lo, 4, 6, 4, b0, b1, b2, w
+    DECLARE_REG_TMP 3,4,5
+
+    mova m3, [pd_2]
+    mov t2d, wd
+    and wd, ~(mmsize/4 - 1)
+    shl wd, 2
+    add b0q, wq
+    add b1q, wq
+    add b2q, wq
+    neg wq
+
+    ALIGN 16
+    .loop:
+        mova m0, [b0q + wq]
+        mova m1, [b1q + wq]
+        mova m2, [b2q + wq]
+        paddd m0, m2
+        paddd m0, m3
+        psrad m0, 2
+        psubd m1, m0
+        mova [b1q + wq], m1
+        add wq, mmsize
+    jl .loop
+
+    and t2d, mmsize/4 - 1
+    jz .end
+    .loop_scalar:
+        mov t0d, [b0q]
+        mov t1d, [b1q]
+        add t0d, [b2q]
+        add t0d, 2
+        sar t0d, 2
+        sub t1d, t0d
+        mov [b1q], t1d
+
+        add b0q, 4
+        add b1q, 4
+        add b2q, 4
+        sub t2d, 1
+        jg .loop_scalar
+
+    .end:
+RET
+
+%endmacro
+
+%macro LEGALL53_VERTICAL_HI 0
+
+cglobal legall53_vertical_hi, 4, 6, 4, b0, b1, b2, w
+    DECLARE_REG_TMP 3,4,5
+
+    mova m3, [pd_1]
+    mov t2d, wd
+    and wd, ~(mmsize/4 - 1)
+    shl wd, 2
+    add b0q, wq
+    add b1q, wq
+    add b2q, wq
+    neg wq
+
+    ALIGN 16
+    .loop:
+        mova m0, [b0q + wq]
+        mova m1, [b1q + wq]
+        mova m2, [b2q + wq]
+        paddd m0, m2
+        paddd m0, m3
+        psrad m0, 1
+        paddd m1, m0
+        mova [b1q + wq], m1
+        add wq, mmsize
+    jl .loop
+
+    and t2d, mmsize/4 - 1
+    jz .end
+    .loop_scalar:
+        mov t0d, [b0q]
+        mov t1d, [b1q]
+        add t0d, [b2q]
+        add t0d, 1
+        sar t0d, 1
+        add t1d, t0d
+        mov [b1q], t1d
+
+        add b0q, 4
+        add b1q, 4
+        add b2q, 4
+        sub t2d, 1
+        jg .loop_scalar
+
+    .end:
+RET
+
+%endmacro
+
 INIT_XMM sse2
 HAAR_HORIZONTAL
 HAAR_VERTICAL
+LEGALL53_VERTICAL_HI
+LEGALL53_VERTICAL_LO
 
 INIT_XMM avx
 HAAR_HORIZONTAL
@@ -158,3 +259,5 @@ HAAR_VERTICAL
 INIT_YMM avx2
 HAAR_HORIZONTAL
 HAAR_VERTICAL
+LEGALL53_VERTICAL_HI
+LEGALL53_VERTICAL_LO
diff --git a/libavcodec/x86/dirac_dwt_init_10bit.c b/libavcodec/x86/dirac_dwt_init_10bit.c
index 289862d728..d1234efac5 100644
--- a/libavcodec/x86/dirac_dwt_init_10bit.c
+++ b/libavcodec/x86/dirac_dwt_init_10bit.c
@@ -23,6 +23,11 @@
 #include "libavutil/x86/cpu.h"
 #include "libavcodec/dirac_dwt.h"
 
+void ff_legall53_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2, int width);
+void ff_legall53_vertical_lo_sse2(int32_t *b0, int32_t *b1, int32_t *b2, int width);
+void ff_legall53_vertical_hi_avx2(int32_t *b0, int32_t *b1, int32_t *b2, int width);
+void ff_legall53_vertical_lo_avx2(int32_t *b0, int32_t *b1, int32_t *b2, int width);
+
 void ff_horizontal_compose_haar_10bit_sse2(int32_t *b0, int32_t *b1, int width_align);
 void ff_horizontal_compose_haar_10bit_avx(int32_t *b0, int32_t *b1, int width_align);
 void ff_horizontal_compose_haar_10bit_avx2(int32_t *b0, int32_t *b1, int width_align);
@@ -38,6 +43,10 @@ av_cold void ff_spatial_idwt_init_10bit_x86(DWTContext *d, enum dwt_type type)
     if (EXTERNAL_SSE2(cpu_flags)) {
         switch (type) {
+        case DWT_DIRAC_LEGALL5_3:
+            d->vertical_compose_h0 = (void*)ff_legall53_vertical_hi_sse2;
+            d->vertical_compose_l0 = (void*)ff_legall53_vertical_lo_sse2;
+            break;
         case DWT_DIRAC_HAAR0:
             d->vertical_compose = (void*)ff_vertical_compose_haar_10bit_sse2;
             break;
@@ -62,6 +71,10 @@ av_cold void ff_spatial_idwt_init_10bit_x86(DWTContext *d, enum dwt_type type)
     if (EXTERNAL_AVX2(cpu_flags)) {
         switch (type) {
+        case DWT_DIRAC_LEGALL5_3:
+            d->vertical_compose_h0 = (void*)ff_legall53_vertical_hi_avx2;
+            d->vertical_compose_l0 = (void*)ff_legall53_vertical_lo_avx2;
+            break;
         case DWT_DIRAC_HAAR0:
             d->vertical_compose = (void*)ff_vertical_compose_haar_10b
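As a reading aid, the two LeGall 5/3 macros above reduce to the following
scalar arithmetic, which is also what their .loop_scalar tails do. The
function names ending in _ref and the helper vector_part() are made up for
this sketch; it again assumes that >> on a negative int32_t behaves like
the sar/psrad instructions.

```c
#include <stdint.h>

/* Number of elements handled by the vector loop for a given lane count
 * (mmsize/4: 4 lanes for SSE2, 8 for AVX2); the rest go to the scalar tail.
 * lanes must be a power of two. */
int vector_part(int width, int lanes)
{
    return width & ~(lanes - 1);
}

/* Low-pass step: b1[i] -= (b0[i] + b2[i] + 2) >> 2 */
void legall53_vertical_lo_ref(const int32_t *b0, int32_t *b1,
                              const int32_t *b2, int width)
{
    for (int i = 0; i < width; i++)
        b1[i] -= (b0[i] + b2[i] + 2) >> 2;
}

/* High-pass step: b1[i] += (b0[i] + b2[i] + 1) >> 1 */
void legall53_vertical_hi_ref(const int32_t *b0, int32_t *b1,
                              const int32_t *b2, int width)
{
    for (int i = 0; i < width; i++)
        b1[i] += (b0[i] + b2[i] + 1) >> 1;
}
```

For a width of 13, vector_part() gives 12 elements to the SSE2 loop and 8 to
the AVX2 loop, leaving 1 and 5 elements respectively for the scalar tail.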
[FFmpeg-devel] [PATCH] avformat/librtmp: fix returning EOF from Read/Write
---
 libavformat/librtmp.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libavformat/librtmp.c b/libavformat/librtmp.c
index f3cfa9a8e2..43013e46e0 100644
--- a/libavformat/librtmp.c
+++ b/libavformat/librtmp.c
@@ -261,7 +261,10 @@ static int rtmp_write(URLContext *s, const uint8_t *buf, int size)
     LibRTMPContext *ctx = s->priv_data;
     RTMP *r = &ctx->rtmp;
 
-    return RTMP_Write(r, buf, size);
+    int ret = RTMP_Write(r, buf, size);
+    if (!ret)
+        return AVERROR_EOF;
+    return ret;
 }
 
 static int rtmp_read(URLContext *s, uint8_t *buf, int size)
@@ -269,7 +272,10 @@ static int rtmp_read(URLContext *s, uint8_t *buf, int size)
     LibRTMPContext *ctx = s->priv_data;
     RTMP *r = &ctx->rtmp;
 
-    return RTMP_Read(r, buf, size);
+    int ret = RTMP_Read(r, buf, size);
+    if (!ret)
+        return AVERROR_EOF;
+    return ret;
 }
 
 static int rtmp_read_pause(URLContext *s, int pause)
-- 
2.18.0
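The pattern in both hunks is the same: a return value of 0 from the library
call is turned into an explicit end-of-file error, and everything else is
passed through. A standalone sketch of just that mapping, without FFmpeg or
librtmp headers, could look like this. SKETCH_AVERROR_EOF and
map_zero_to_eof() are stand-ins invented for the example; in FFmpeg the
real constant is AVERROR_EOF from libavutil/error.h.

```c
/* Stand-in for AVERROR_EOF so the sketch builds without FFmpeg headers.
 * Only its sign (negative) matters for the example. */
#define SKETCH_AVERROR_EOF (-541478725)

/* Map "0 bytes transferred" to an EOF error; keep byte counts (> 0) and
 * other negative error codes unchanged. */
int map_zero_to_eof(int ret)
{
    if (!ret)
        return SKETCH_AVERROR_EOF;
    return ret;
}
```

A caller that loops on the read or write callback can then treat any
negative value as termination instead of spinning on repeated zero returns.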
[FFmpeg-devel] [PATCH 2/2] lavc/encode: fix frame_number double-counted
Encoder frame_number may be double-counted if some frames are cached and
then flushed.

Take the qsv encoder (some frames are cached first for asynchronism) as an
example:

./ffmpeg -loglevel verbose -hwaccel qsv -c:v h264_qsv -i in.mp4 -vframes 100 -c:v h264_qsv out.mp4

The frame_number passed to the encoder is double-counted and larger than the
accurate value. Libx264 encoding with B frames can also reproduce it.

Signed-off-by: Zhong Li
---
 libavcodec/encode.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libavcodec/encode.c b/libavcodec/encode.c
index d976151..98c44c3 100644
--- a/libavcodec/encode.c
+++ b/libavcodec/encode.c
@@ -235,8 +235,8 @@ int attribute_align_arg avcodec_encode_audio2(AVCodecContext *avctx,
             if (ret >= 0)
                 avpkt->data = avpkt->buf->data;
         }
-
-        avctx->frame_number++;
+        if (frame)
+            avctx->frame_number++;
     }
 
     if (ret < 0 || !*got_packet_ptr) {
@@ -333,7 +333,8 @@ int attribute_align_arg avcodec_encode_video2(AVCodecContext *avctx,
             avpkt->data = avpkt->buf->data;
         }
 
-        avctx->frame_number++;
+        if (frame)
+            avctx->frame_number++;
     }
 
     if (ret < 0 || !*got_packet_ptr)
-- 
2.7.4
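The counting problem can be shown with a deliberately simplified model: the
counter is bumped on every successful encode call, and flush calls (frame ==
NULL) are also successful calls. The function below is a hypothetical
stand-in, not code from encode.c, and it ignores every other condition the
real functions check.

```c
#include <stddef.h>

/* Simplified model: n_frames calls with a real frame, followed by
 * n_flush calls with frame == NULL that drain cached packets.
 * only_real_frames = 0 models the old behaviour (count every call),
 * only_real_frames = 1 models the patched "if (frame)" guard. */
int count_frame_number(int n_frames, int n_flush, int only_real_frames)
{
    int frame_number = 0;
    int dummy_frame = 0;  /* placeholder payload for a non-NULL frame */

    for (int i = 0; i < n_frames + n_flush; i++) {
        const int *frame = i < n_frames ? &dummy_frame : NULL;
        if (!only_real_frames || frame)
            frame_number++;
    }
    return frame_number;
}
```

With 100 frames and, say, 4 flush calls, the old behaviour reports 104 while
the guarded version reports 100.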
[FFmpeg-devel] [PATCH 1/2] lavc/qsvenc: expose qp of encoded frames
Requirement from ticket #7254.
Currently only H264 is supported by MSDK.

Signed-off-by: Zhong Li
---
 libavcodec/qsvenc.c      | 43 +++
 libavcodec/qsvenc.h      |  2 ++
 libavcodec/qsvenc_h264.c |  5 +
 3 files changed, 50 insertions(+)

diff --git a/libavcodec/qsvenc.c b/libavcodec/qsvenc.c
index 8096945..1294ed2 100644
--- a/libavcodec/qsvenc.c
+++ b/libavcodec/qsvenc.c
@@ -1139,6 +1139,10 @@ static int encode_frame(AVCodecContext *avctx, QSVEncContext *q,
 {
     AVPacket new_pkt = { 0 };
     mfxBitstream *bs;
+#if QSV_VERSION_ATLEAST(1, 26)
+    mfxExtAVCEncodedFrameInfo *enc_info;
+    mfxExtBuffer **enc_buf;
+#endif
 
     mfxFrameSurface1 *surf = NULL;
     mfxSyncPoint *sync = NULL;
@@ -1172,6 +1176,22 @@ static int encode_frame(AVCodecContext *avctx, QSVEncContext *q,
     bs->Data = new_pkt.data;
     bs->MaxLength = new_pkt.size;
 
+#if QSV_VERSION_ATLEAST(1, 26)
+    if (avctx->codec_id == AV_CODEC_ID_H264) {
+        enc_info = av_mallocz(sizeof(*enc_info));
+        if (!enc_info)
+            return AVERROR(ENOMEM);
+
+        enc_info->Header.BufferId = MFX_EXTBUFF_ENCODED_FRAME_INFO;
+        enc_info->Header.BufferSz = sizeof (*enc_info);
+        bs->NumExtParam = 1;
+        enc_buf = av_mallocz(sizeof(mfxExtBuffer *));
+        enc_buf[0] = (mfxExtBuffer *)enc_info;
+
+        bs->ExtParam = enc_buf;
+    }
+#endif
+
     if (q->set_encode_ctrl_cb) {
         q->set_encode_ctrl_cb(avctx, frame, &qsv_frame->enc_ctrl);
     }
@@ -1179,6 +1199,10 @@ static int encode_frame(AVCodecContext *avctx, QSVEncContext *q,
     sync = av_mallocz(sizeof(*sync));
     if (!sync) {
         av_freep(&bs);
+ #if QSV_VERSION_ATLEAST(1, 26)
+        if (avctx->codec_id == AV_CODEC_ID_H264)
+            av_freep(&enc_info);
+ #endif
         av_packet_unref(&new_pkt);
         return AVERROR(ENOMEM);
     }
@@ -1195,6 +1219,10 @@ static int encode_frame(AVCodecContext *avctx, QSVEncContext *q,
     if (ret < 0) {
         av_packet_unref(&new_pkt);
         av_freep(&bs);
+#if QSV_VERSION_ATLEAST(1, 26)
+        if (avctx->codec_id == AV_CODEC_ID_H264)
+            av_freep(&enc_info);
+#endif
         av_freep(&sync);
         return (ret == MFX_ERR_MORE_DATA) ?
                0 : ff_qsv_print_error(avctx, ret, "Error during encoding");
@@ -1211,6 +1239,10 @@ static int encode_frame(AVCodecContext *avctx, QSVEncContext *q,
         av_freep(&sync);
         av_packet_unref(&new_pkt);
         av_freep(&bs);
+#if QSV_VERSION_ATLEAST(1, 26)
+        if (avctx->codec_id == AV_CODEC_ID_H264)
+            av_freep(&enc_info);
+#endif
     }
 
     return 0;
@@ -1230,6 +1262,9 @@ int ff_qsv_encode(AVCodecContext *avctx, QSVEncContext *q,
         AVPacket new_pkt;
         mfxBitstream *bs;
         mfxSyncPoint *sync;
+#if QSV_VERSION_ATLEAST(1, 26)
+        mfxExtAVCEncodedFrameInfo *enc_info;
+#endif
 
         av_fifo_generic_read(q->async_fifo, &new_pkt, sizeof(new_pkt), NULL);
         av_fifo_generic_read(q->async_fifo, &sync,sizeof(sync),NULL);
@@ -1258,6 +1293,14 @@ FF_DISABLE_DEPRECATION_WARNINGS
 FF_ENABLE_DEPRECATION_WARNINGS
 #endif
 
+#if QSV_VERSION_ATLEAST(1, 26)
+        if (avctx->codec_id == AV_CODEC_ID_H264) {
+            enc_info = (mfxExtAVCEncodedFrameInfo *)(*bs->ExtParam);
+            av_log(avctx, AV_LOG_DEBUG, "QP is %d\n", enc_info->QP);
+            q->sum_frame_qp += enc_info->QP;
+            av_freep(&enc_info);
+        }
+#endif
         av_freep(&bs);
         av_freep(&sync);
 
diff --git a/libavcodec/qsvenc.h b/libavcodec/qsvenc.h
index b2d6355..3784a82 100644
--- a/libavcodec/qsvenc.h
+++ b/libavcodec/qsvenc.h
@@ -102,6 +102,8 @@ typedef struct QSVEncContext {
     int width_align;
     int height_align;
 
+    int sum_frame_qp;
+
     mfxVideoParam param;
     mfxFrameAllocRequest req;
 
diff --git a/libavcodec/qsvenc_h264.c b/libavcodec/qsvenc_h264.c
index 5c262e5..b87bef6 100644
--- a/libavcodec/qsvenc_h264.c
+++ b/libavcodec/qsvenc_h264.c
@@ -95,6 +95,11 @@ static av_cold int qsv_enc_close(AVCodecContext *avctx)
 {
     QSVH264EncContext *q = avctx->priv_data;
 
+#if QSV_VERSION_ATLEAST(1, 26)
+    av_log(avctx, AV_LOG_VERBOSE, "encoded %d frames, average qp is %.2f\n",
+           avctx->frame_number,(double)q->qsv.sum_frame_qp / avctx->frame_number);
+#endif
+
     return ff_qsv_enc_close(avctx, &q->qsv);
 }
 
-- 
2.7.4
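The summary line printed in qsv_enc_close() divides the accumulated QP by
frame_number. A small sketch of that averaging step follows; the helper
average_qp() is hypothetical, and the zero-frame guard is an addition for
illustration that is not part of the patch.

```c
/* Average QP over the encoded frames.
 * The frames > 0 guard is an assumption added for this sketch; the patch
 * itself divides unconditionally. */
double average_qp(int sum_frame_qp, int frame_number)
{
    if (frame_number <= 0)
        return 0.0;
    return (double)sum_frame_qp / frame_number;
}
```

Whether closing an encoder that produced no frames should print 0.00 or
skip the message entirely is a design choice left to the patch author.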
[FFmpeg-devel] [PATCH 1/2] tests/audiogen: raise channel count limit to 12
Signed-off-by: Tobias Rapp
---
 tests/audiogen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/audiogen.c b/tests/audiogen.c
index 8d596b5..c43bb70 100644
--- a/tests/audiogen.c
+++ b/tests/audiogen.c
@@ -26,7 +26,7 @@
 #include
 #include
 
-#define MAX_CHANNELS 8
+#define MAX_CHANNELS 12
 
 static unsigned int myrnd(unsigned int *seed_ptr, int n)
 {
-- 
2.7.4
[FFmpeg-devel] [PATCH 2/2] fate: add tests for audio channel up-/downmixing with pan filter
Add tests for upmixing and downmixing with audio channel counts that have a
corresponding default layout, and also tests where there is no default
layout.

Update the existing "stereo4" test so it actually outputs stereo like the
other stereo tests. Rename the previous "stereo4" test to "upmix1".

Signed-off-by: Tobias Rapp
---
 tests/fate/filter-audio.mak                        | 22 +++-
 tests/ref/fate/filter-pan-downmix1                 | 26 ++
 tests/ref/fate/filter-pan-downmix2                 | 26 ++
 tests/ref/fate/filter-pan-stereo4                  | 42 +++---
 .../fate/{filter-pan-stereo4 => filter-pan-upmix1} |  0
 tests/ref/fate/filter-pan-upmix2                   | 26 ++
 6 files changed, 120 insertions(+), 22 deletions(-)
 create mode 100644 tests/ref/fate/filter-pan-downmix1
 create mode 100644 tests/ref/fate/filter-pan-downmix2
 copy tests/ref/fate/{filter-pan-stereo4 => filter-pan-upmix1} (100%)
 create mode 100644 tests/ref/fate/filter-pan-upmix2

diff --git a/tests/fate/filter-audio.mak b/tests/fate/filter-audio.mak
index 6125a37..473b8ae 100644
--- a/tests/fate/filter-audio.mak
+++ b/tests/fate/filter-audio.mak
@@ -156,7 +156,27 @@ fate-filter-pan-stereo3: CMD = framecrc -ss 3.14 -i $(SRC) -frames:a 20 -filter:
 FATE_AFILTER-$(call FILTERDEMDECENCMUX, PAN, WAV, PCM_S16LE, PCM_S16LE, WAV) += fate-filter-pan-stereo4
 fate-filter-pan-stereo4: tests/data/asynth-44100-2.wav
 fate-filter-pan-stereo4: SRC = $(TARGET_PATH)/tests/data/asynth-44100-2.wav
-fate-filter-pan-stereo4: CMD = framecrc -ss 3.14 -guess_layout_max 0 -i $(SRC) -frames:a 20 -filter:a "pan=4C|c0=c0-0.5*c1|c1=c1+0.5*c0|c2=0*c0|c3=0*c0"
+fate-filter-pan-stereo4: CMD = framecrc -ss 3.14 -guess_layout_max 0 -i $(SRC) -frames:a 20 -filter:a "pan=2C|c0=c0-0.5*c1|c1=c1+0.5*c0"
+
+FATE_AFILTER-$(call FILTERDEMDECENCMUX, PAN, WAV, PCM_S16LE, PCM_S16LE, WAV) += fate-filter-pan-upmix1
+fate-filter-pan-upmix1: tests/data/asynth-44100-2.wav
+fate-filter-pan-upmix1: SRC = $(TARGET_PATH)/tests/data/asynth-44100-2.wav
+fate-filter-pan-upmix1: CMD = framecrc -ss 3.14 -guess_layout_max 0 -i $(SRC) -frames:a 20 -filter:a "pan=4C|c0=c0-0.5*c1|c1=c1+0.5*c0|c2=0*c0|c3=0*c0"
+
+FATE_AFILTER-$(call FILTERDEMDECENCMUX, PAN, WAV, PCM_S16LE, PCM_S16LE, WAV) += fate-filter-pan-upmix2
+fate-filter-pan-upmix2: tests/data/asynth-44100-4.wav
+fate-filter-pan-upmix2: SRC = $(TARGET_PATH)/tests/data/asynth-44100-4.wav
+fate-filter-pan-upmix2: CMD = framecrc -ss 3.14 -i $(SRC) -frames:a 20 -filter:a "pan=9C|c0=c0-c1|c1=c2+c3|c2=c0+c1|c3=c2-c3|c4=c1-c0|c5=c3+c2|c6=c1+c0|c7=c3-c2|c8=c0-c3"
+
+FATE_AFILTER-$(call FILTERDEMDECENCMUX, PAN, WAV, PCM_S16LE, PCM_S16LE, WAV) += fate-filter-pan-downmix1
+fate-filter-pan-downmix1: tests/data/asynth-44100-4.wav
+fate-filter-pan-downmix1: SRC = $(TARGET_PATH)/tests/data/asynth-44100-4.wav
+fate-filter-pan-downmix1: CMD = framecrc -ss 3.14 -i $(SRC) -frames:a 20 -filter:a "pan=2c|FL