Re: [FFmpeg-devel] [PATCH] Avoid integer to float point domain crossing penalties

2019-06-27 Thread Adrian Tong
Is anyone interested in reviewing this patch?

Thanks
-Adrian

On Mon, 24 Jun 2019 at 13:57,  wrote:

> From: Adrian Tong 
>
> On an internal benchmark, I see a noise-level difference (more likely to be
> an improvement) in ff_h264_decode_mb_cabac, which calls this function.
> ---
>  libavutil/x86/intreadwrite.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavutil/x86/intreadwrite.h b/libavutil/x86/intreadwrite.h
> index 4061d19231..df0bf45ae1 100644
> --- a/libavutil/x86/intreadwrite.h
> +++ b/libavutil/x86/intreadwrite.h
> @@ -68,8 +68,8 @@ static av_always_inline void AV_COPY128(void *d, const void *s)
>  {
>  struct v {uint64_t v[2];};
>
> -__asm__("movaps   %1, %%xmm0  \n\t"
> -"movaps   %%xmm0, %0  \n\t"
> +__asm__("movdqa   %1, %%xmm0  \n\t"
> +"movdqa   %%xmm0, %0  \n\t"
>  : "=m"(*(struct v*)d)
>  : "m" (*(const struct v*)s)
>  : "xmm0");
> --
> 2.20.1 (Apple Git-117)
>
>
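For readers more used to intrinsics than inline assembly, the copy in the patch above can be sketched with the SSE2 integer load/store intrinsics. This is only an illustration under stated assumptions, not FFmpeg code: the name copy128_int and the memcpy fallback for non-SSE2 builds are made up here. Unlike the inline asm in the patch, which pins the instruction, an intrinsic only expresses intent, and a compiler may still emit either movdqa or movaps for it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not part of libavutil): copy 16 bytes between two
 * 16-byte-aligned buffers using the integer-domain load/store. */
static void copy128_int(void *d, const void *s)
{
    /* The aligned load/store forms fault on misaligned addresses. */
    assert(((uintptr_t)d & 15) == 0 && ((uintptr_t)s & 15) == 0);
#if defined(__SSE2__)
    /* _mm_load_si128 / _mm_store_si128 are the intrinsics documented
     * as corresponding to movdqa. */
    __m128i v = _mm_load_si128((const __m128i *)s);
    _mm_store_si128((__m128i *)d, v);
#else
    /* Assumed fallback so the sketch still builds without SSE2. */
    memcpy(d, s, 16);
#endif
}
```

To see what the compiler actually chose, inspect the generated assembly (for example with -S); the result can differ between compilers and optimization levels.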
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-11 Thread Adrian Tong
On Mon, 10 Jun 2019 at 23:02, Lauri Kasanen  wrote:

> On Mon, 10 Jun 2019 17:42:00 -0700
> Adrian Tong  wrote:
>
> > I have been trying to implement yuv420_to_bgr24 using SSE2 instructions. I
> > ran into a case where the output of the C implementation of yuv420_to_bgr24
> > differs slightly from the bgr24 image produced by the MMX implementation.
> > Is this expected behavior?
>
> Yes, some of the MMX implementations choose speed over accuracy. I ran
> into that myself when doing PPC versions. For an SSE version, if an
> accurate version is fast enough, please try to match the C version.
> Otherwise try to match MMX.
>

Thanks for confirming, Lauri. I am reading the MMX code for YUV420 to
BGR24 and I am a little confused by it. In particular, we shift left by 3
bits (multiply by 8) for better precision. How does this increase precision?

Also, the conversion formula has floating-point coefficients, but we are
doing integer multiplication here.

https://github.com/FFmpeg/FFmpeg/blob/master/libswscale/x86/yuv2rgb_template.c#L92

Thanks
- Adrian
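
One possible reading of the two questions above, as a sketch and not a description of the actual swscale code: the real-valued coefficients are stored as 16-bit fixed-point integers (assumed here to be scaled by 2^13), and a pmulhw-style multiply keeps only the high 16 bits of each 32-bit product. Shifting the 8-bit sample left by 3 first makes the two scale factors add up to 2^16, so the high half is already sample * coefficient; without the shift, three bits of the result are discarded. The helper names and the coefficient 1.164 are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of a pmulhw-style multiply: high 16 bits of a signed 16x16
 * product. Right-shifting a negative value is implementation-defined in
 * C; the sketch only uses non-negative operands. */
static int16_t mulhi16(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 16);
}

/* Convert a real coefficient to Q13 fixed point: round(c * 2^13). */
static int16_t to_q13(double c)
{
    double scaled = c * 8192.0;
    assert(scaled > -32768.0 && scaled < 32767.0);
    return (int16_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* Sample pre-shifted by 3: (x * 2^3) * (c * 2^13) >> 16 == x * c. */
static int16_t scale_preshift(uint8_t x, int16_t coeff_q13)
{
    return mulhi16((int16_t)(x << 3), coeff_q13);
}

/* No pre-shift: x * (c * 2^13) >> 16 == (x * c) / 8, so the low three
 * bits of the result are lost. */
static int16_t scale_no_preshift(uint8_t x, int16_t coeff_q13)
{
    return mulhi16((int16_t)x, coeff_q13);
}
```

With a sample of 100 and a coefficient of 1.164, scale_preshift gives 116 (the real product is 116.4), while scale_no_preshift gives 14, which scales back up to only 112.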


>
> - Lauri

Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-10 Thread Adrian Tong
On Sat, 8 Jun 2019 at 09:42, Adrian Tong  wrote:

>
>
> On Sat, 8 Jun 2019 at 09:38, Lauri Kasanen  wrote:
>
>> On Sat, 8 Jun 2019 06:51:51 -0700
>> Adrian Tong  wrote:
>>
>> > Hi Lauri.
>> >
>> > Thanks for the reply. Is there any reason why this has not been
>> > implemented before? It seems to me that this would be a pretty
>> > important/hot function.
>>
>> Just the usual, nobody has had the interest. There are other places too
>> where the only x86 accel is mmx.
>>
>> - Lauri
>>
>
> I see. Thank you. I will see what I can do.
> -Adrian
>

I have been trying to implement yuv420_to_bgr24 using SSE2 instructions. I
ran into a case where the output of the C implementation of yuv420_to_bgr24
differs slightly from the bgr24 image produced by the MMX implementation.
Is this expected behavior?

Thanks
-Adrian


Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-08 Thread Adrian Tong
On Sat, 8 Jun 2019 at 09:38, Lauri Kasanen  wrote:

> On Sat, 8 Jun 2019 06:51:51 -0700
> Adrian Tong  wrote:
>
> > Hi Lauri.
> >
> > Thanks for the reply. Is there any reason why this has not been
> > implemented before? It seems to me that this would be a pretty
> > important/hot function.
>
> Just the usual, nobody has had the interest. There are other places too
> where the only x86 accel is mmx.
>
> - Lauri
>

I see. Thank you. I will see what I can do.
-Adrian


Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-08 Thread Adrian Tong
On Fri, 7 Jun 2019 at 23:20, Lauri Kasanen  wrote:

> On Fri, 7 Jun 2019 08:38:35 -0700
> Adrian Tong  wrote:
>
> > Hi
> >
> > I have a workload which spends a significant amount of time (~10%) in
> > the yuv420_bgr24_mmxext function in FFmpeg.
> >
> > I looked at the assembly and the profile and see that MMX (64-bit)
> > registers are used. I wonder whether we can have an SSE2 version, which
> > has a register width of 128 bits.
> >
> > I am very interested in implementing such support if it is possible.
>
> I'm not well versed in x86 vectors, so I can't say if SSE2 is enough or
> some other SSE version would be needed, but certainly YUV to RGB
> conversion can be done faster than with MMX. Please do send a patch.
>
> - Lauri
>

Hi Lauri.

Thanks for the reply. Is there any reason why this has not been implemented
before? It seems to me that this would be a pretty important/hot function.

-Adrian


[FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-07 Thread Adrian Tong
Hi

I have a workload which spends a significant amount of time (~10%) in
the yuv420_bgr24_mmxext function in FFmpeg.

I looked at the assembly and the profile and see that MMX (64-bit)
registers are used. I wonder whether we can have an SSE2 version, which
has a register width of 128 bits.

I am very interested in implementing such support if it is possible.

Thanks
-Adrian

Re: [FFmpeg-devel] Build ffmpeg with clang lto

2019-06-02 Thread Adrian Tong
Hi Carl

This is the error message I get. It seems that libavdevice.a contains LLVM
bitcode.

Thanks
-Adrian

./configure --cc=clang --enable-lto && make -j28

AR libavcodec/libavcodec.a
LD ffmpeg_g
LD ffprobe_g
/usr/bin/ld: skipping incompatible libavdevice/libavdevice.a when searching for -lavdevice
/usr/bin/ld: cannot find -lavdevice
/usr/bin/ld: skipping incompatible libavfilter/libavfilter.a when searching for -lavfilter
/usr/bin/ld: cannot find -lavfilter
/usr/bin/ld: skipping incompatible libavformat/libavformat.a when searching for -lavformat
/usr/bin/ld: skipping incompatible libavdevice/libavdevice.a when searching for -lavdevice
/usr/bin/ld: cannot find -lavdevice
/usr/bin/ld: skipping incompatible libavfilter/libavfilter.a when searching for -lavfilter
/usr/bin/ld: cannot find -lavfilter
/usr/bin/ld: skipping incompatible libavformat/libavformat.a when searching for -lavformat
/usr/bin/ld: skipping incompatible libavcodec/libavcodec.a when searching for -lavcodec
/usr/bin/ld: skipping incompatible libavcodec/libavcodec.a when searching for -lavcodec
/usr/bin/ld: skipping incompatible libswresample/libswresample.a when searching for -lswresample
/usr/bin/ld: skipping incompatible libswscale/libswscale.a when searching for -lswscale
/usr/bin/ld: skipping incompatible libswresample/libswresample.a when searching for -lswresample
/usr/bin/ld: skipping incompatible libavutil/libavutil.a when searching for -lavutil
/usr/bin/ld: skipping incompatible libswscale/libswscale.a when searching for -lswscale
/usr/bin/ld: skipping incompatible libavutil/libavutil.a when searching for -lavutil
clang: error: linker command failed with exit code 1 (use -v to see invocation)
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Makefile:111: recipe for target 'ffprobe_g' failed

On Sat, 1 Jun 2019 at 09:55, Carl Eugen Hoyos  wrote:

> On Sat, 1 Jun 2019 at 17:45, Adrian Tong wrote:
>
> > Does anyone have experience compiling ffmpeg with clang LTO? I tried
> > ./configure --cc=clang --cxx=clang++ --enable-lto and it did not work.
>
> cxx should never be needed.
>
> "did not work" is not a useful problem description...
> (clang is definitely supported)
>
> Carl Eugen

[FFmpeg-devel] Build ffmpeg with clang lto

2019-06-01 Thread Adrian Tong
Hi

Does anyone have experience compiling ffmpeg with clang LTO? I tried
./configure --cc=clang --cxx=clang++ --enable-lto and it did not work.

Thanks
-Adrian