Re: [FFmpeg-devel] [PATCH] Avoid integer to float point domain crossing penalties
Anyone interested in reviewing this patch?

Thanks
-Adrian

On Mon, 24 Jun 2019 at 13:57, wrote:

> From: Adrian Tong
>
> On internal benchmark, I see a noisy-level difference (more likely to be
> an improvement) in ff_h264_decode_mb_cabac which calls this function.
> ---
>  libavutil/x86/intreadwrite.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
> index 4061d19231..df0bf45ae1 100644
> --- a/libavutil/x86/intreadwrite.h
> +++ b/libavutil/x86/intreadwrite.h
> @@ -68,8 +68,8 @@ static av_always_inline void AV_COPY128(void *d, const void *s)
>  {
>      struct v {uint64_t v[2];};
>
> -    __asm__("movaps   %1, %%xmm0  \n\t"
> -            "movaps   %%xmm0, %0  \n\t"
> +    __asm__("movdqa   %1, %%xmm0  \n\t"
> +            "movdqa   %%xmm0, %0  \n\t"
>              : "=m"(*(struct v*)d)
>              : "m" (*(const struct v*)s)
>              : "xmm0");
> --
> 2.20.1 (Apple Git-117)

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
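For readers unfamiliar with the term in the subject line: movaps is a floating-point-typed SSE move and movdqa is the integer-typed SSE2 move of the same 128 bits, and some x86 microarchitectures add a small bypass delay when data moves between the two execution domains. The sketch below shows an integer-typed 16-byte copy written with intrinsics rather than inline asm. It is not FFmpeg code: copy128_aligned is a hypothetical name, the memcpy fallback is an assumption added for portability, and the compiler remains free to pick the actual move instruction, so this does not force movdqa the way the inline asm in the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not FFmpeg's AV_COPY128): copy 16 bytes between
 * two 16-byte-aligned buffers using integer-typed vector operations. */
static inline void copy128_aligned(void *dst, const void *src)
{
    /* Aligned 128-bit moves fault on misaligned addresses, so check first. */
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
#if defined(__SSE2__)
    /* The data is typed as __m128i (integer) for both the load and store. */
    __m128i v = _mm_load_si128((const __m128i *)src);
    _mm_store_si128((__m128i *)dst, v);
#else
    /* Assumed fallback for targets without SSE2. */
    memcpy(dst, src, 16);
#endif
}
```

Both pointers must be 16-byte aligned, matching the requirement of the aligned moves in the patch.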
Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time
On Mon, 10 Jun 2019 at 23:02, Lauri Kasanen wrote:

> On Mon, 10 Jun 2019 17:42:00 -0700
> Adrian Tong wrote:
>
> > I have been trying to implement yuv420_to_bgr24 using SSE2 instruction. I
> > ran into the case where the output of C implemented yuv420_to_bgr24 has
> > slightly different resulting bgr24 image from MMX implemented
> > yuv420_to_bgr24. Is this expected behavior ?
>
> Yes, some of the MMX implementations choose speed over accuracy, I ran
> to that myself when doing PPC versions. For a SSE version, if an
> accurate version is fast enough, please try to match the C version.
> Otherwise try to match MMX.
>
> - Lauri

Thanks for confirming, Lauri.

I am reading the MMX code for YUV420 to BGR24 and am a little confused by it. In particular, we shift left by 3 bits (multiply by 8) for better precision. How does this increase precision? Also, the conversion formula has floating-point coefficients, but we are doing integer multiplication here.

https://github.com/FFmpeg/FFmpeg/blob/master/libswscale/x86/yuv2rgb_template.c#L92

Thanks
- Adrian
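On the two questions above, here is a minimal sketch of the general fixed-point technique, not a transcription of the linked template. It assumes coefficients stored as 16-bit integers scaled by 2^13 and a multiply that keeps only the high 16 bits of the 32-bit product, which is what the x86 pmulhw instruction computes per lane. Under those assumptions the implicit division by 2^16 is cancelled exactly by 2^13 (coefficient scale) times 2^3 (input pre-shift). The value 1.164 is the commonly quoted BT.601 luma scale and is used here only for illustration; all names are hypothetical. A larger coefficient scale is not available in this scheme: a coefficient such as 2.018 times 2^14 is about 33063, which exceeds the int16 maximum of 32767, while 2.018 times 2^13 is about 16531 and fits.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 16x16 multiply keeping the high 16 bits of the 32-bit product.
 * Right-shifting a negative value is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
static int16_t mulhi16(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 16);
}

/* Illustrative coefficient: round(1.164 * 2^13) = 9535. */
enum { Y_COEFF_Q13 = 9535 };

/* Scale a luma sample with the input pre-multiplied by 8 (shift left by 3):
 * ((y - 16) * 2^3 * c * 2^13) >> 16  ==  floor((y - 16) * c). */
static int scale_with_preshift(int y)
{
    assert(y >= 0 && y <= 255);
    return mulhi16((int16_t)((y - 16) * 8), Y_COEFF_Q13);
}

/* Same multiply without the pre-shift: the result is about 8 times too
 * small, so three bits of the answer are discarded by the >> 16. */
static int scale_without_preshift(int y)
{
    assert(y >= 0 && y <= 255);
    return mulhi16((int16_t)(y - 16), Y_COEFF_Q13);
}
```

For y = 235 the exact product (235 - 16) * 1.164 is about 254.9; the pre-shifted version yields 254, while the version without the shift yields 31.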
Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time
On Sat, 8 Jun 2019 at 09:42, Adrian Tong wrote:

> On Sat, 8 Jun 2019 at 09:38, Lauri Kasanen wrote:
>
>> On Sat, 8 Jun 2019 06:51:51 -0700
>> Adrian Tong wrote:
>>
>> > Hi Lauri.
>> >
>> > Thanks for the reply, any reason why this has not been implemented before ?
>> > it seems to me that this would be a pretty important/hot function.
>>
>> Just the usual, nobody has had the interest. There are other places too
>> where the only x86 accel is mmx.
>>
>> - Lauri
>
> I see. Thank you. I will see what I can do.
> -Adrian

I have been trying to implement yuv420_to_bgr24 using SSE2 instructions. The C implementation of yuv420_to_bgr24 produces a slightly different bgr24 image from the MMX implementation. Is this expected behavior?

Thanks
-Adrian
Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time
On Sat, 8 Jun 2019 at 09:38, Lauri Kasanen wrote:

> On Sat, 8 Jun 2019 06:51:51 -0700
> Adrian Tong wrote:
>
> > Hi Lauri.
> >
> > Thanks for the reply, any reason why this has not been implemented before ?
> > it seems to me that this would be a pretty important/hot function.
>
> Just the usual, nobody has had the interest. There are other places too
> where the only x86 accel is mmx.
>
> - Lauri

I see. Thank you. I will see what I can do.

-Adrian
Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time
On Fri, 7 Jun 2019 at 23:20, Lauri Kasanen wrote:

> On Fri, 7 Jun 2019 08:38:35 -0700
> Adrian Tong wrote:
>
> > Hi
> >
> > I have a workload which spends a significant amount of time (~10%) in
> > the yuv420_bgr24_mmxext function in FFmpeg.
> >
> > I looked at the assembly and profile and see MMX (64 bit) registers are
> > used. I wonder whether we can have a SSE2 version which has a register bit
> > width of 128.
> >
> > I am very interested in implementing such support if it is possible.
>
> I'm not well versed in x86 vectors, so I can't say if SSE2 is enough or
> some other SSE version would be needed, but certainly YUV to RGB
> conversion can be done faster than with MMX. Please do send a patch.
>
> - Lauri

Hi Lauri.

Thanks for the reply. Is there any reason why this has not been implemented before? It seems to me that this would be a pretty important/hot function.

-Adrian
[FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time
Hi

I have a workload which spends a significant amount of time (~10%) in the yuv420_bgr24_mmxext function in FFmpeg.

I looked at the assembly and the profile and see that MMX (64-bit) registers are used. I wonder whether we can have an SSE2 version, which has a register width of 128 bits.

I am very interested in implementing such support if it is possible.

Thanks
-Adrian
Re: [FFmpeg-devel] Build ffmpeg with clang lto
Hi Carl

This is the error message I get. It seems like libavdevice.a is LLVM bitcode.

Thanks
-Adrian

./configure --cc=clang --enable-lto && make -j28

AR libavcodec/libavcodec.a
LD ffmpeg_g
LD ffprobe_g
/usr/bin/ld: skipping incompatible libavdevice/libavdevice.a when searching for -lavdevice
/usr/bin/ld: cannot find -lavdevice
/usr/bin/ld: skipping incompatible libavfilter/libavfilter.a when searching for -lavfilter
/usr/bin/ld: cannot find -lavfilter
/usr/bin/ld: skipping incompatible libavformat/libavformat.a when searching for /usr/bin/ld-lavformat : skipping incompatible libavdevice/libavdevice.a when searching for -lavdevice
/usr/bin/ld: cannot find -lavdevice
/usr/bin/ld: skipping incompatible libavfilter/libavfilter.a when searching for -lavfilter
/usr/bin/ld: cannot find -lavfilter
/usr/bin/ld: skipping incompatible libavformat/libavformat.a when searching for -lavformat
/usr/bin/ld: skipping incompatible libavcodec/libavcodec.a when searching for -lavcodec
/usr/bin/ld: skipping incompatible libavcodec/libavcodec.a when searching for -lavcodec
/usr/bin/ld: skipping incompatible libswresample/libswresample.a when searching for -lswresample
/usr/bin/ld: skipping incompatible libswscale/libswscale.a when searching for -lswscale
/usr/bin/ld: skipping incompatible libswresample/libswresample.a when searching for -lswresample
/usr/bin/ld: skipping incompatible libavutil/libavutil.a when searching for -lavutil
/usr/bin/ld: skipping incompatible libswscale/libswscale.a when searching for -lswscale
/usr/bin/ld: skipping incompatible libavutil/libavutil.a when searching for -lavutil
clang: error: linker command failed with exit code 1 (use -v to see invocation)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Makefile:111: recipe for target 'ffprobe_g' failed

On Sat, 1 Jun 2019 at 09:55, Carl Eugen Hoyos wrote:

> On Sat, 1 Jun 2019 at 17:45, Adrian Tong wrote:
>
> > Anyone has experience compiling ffmpeg with clang LTO before ? I tried
> > ./configure --cc=clang --cxx=clang++ --enable-lto and it did not work.
>
> cxx should never be needed.
>
> "did not work" is not a useful problem description...
> (clang is definitely supported)
>
> Carl Eugen
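One way to check the guess above that the archives contain LLVM bitcode is to look at the first four bytes of an object file extracted from the archive. The sketch below tests a buffer for the raw LLVM bitcode magic ('B', 'C', 0xC0, 0xDE) or the bitcode wrapper magic 0x0B17C0DE stored little-endian. The function name is hypothetical, and the check applies to an extracted member, not to the .a file itself, which begins with the ar archive header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf starts with an LLVM bitcode
 * magic number (raw or wrapper form), 0 otherwise. */
static int looks_like_llvm_bitcode(const uint8_t *buf, size_t len)
{
    static const uint8_t raw_magic[4]     = { 'B', 'C', 0xC0, 0xDE };
    static const uint8_t wrapper_magic[4] = { 0xDE, 0xC0, 0x17, 0x0B };

    assert(buf != NULL);
    if (len < 4)
        return 0;  /* too short to hold either magic number */
    return memcmp(buf, raw_magic, 4) == 0 ||
           memcmp(buf, wrapper_magic, 4) == 0;
}
```

A native ELF object starts with 0x7F 'E' 'L' 'F' instead, so it is rejected by this check.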
[FFmpeg-devel] Build ffmpeg with clang lto
Hi

Does anyone have experience compiling ffmpeg with clang LTO? I tried ./configure --cc=clang --cxx=clang++ --enable-lto and it did not work.

Thanks
-Adrian