Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-11 Thread Adrian Tong
On Mon, 10 Jun 2019 at 23:02, Lauri Kasanen  wrote:

> On Mon, 10 Jun 2019 17:42:00 -0700
> Adrian Tong  wrote:
>
> > I have been trying to implement yuv420_to_bgr24 using SSE2 instructions. I
> > ran into a case where the C implementation of yuv420_to_bgr24 produces a
> > slightly different bgr24 image from the MMX implementation of
> > yuv420_to_bgr24. Is this expected behavior?
>
> Yes, some of the MMX implementations choose speed over accuracy; I ran
> into that myself when doing the PPC versions. For an SSE version, if an
> accurate version is fast enough, please try to match the C version.
> Otherwise try to match MMX.
>

Thanks for confirming, Lauri. I am reading the MMX code for YUV420 to
BGR24 and am a little confused by it. In particular, it shifts left by 3
bits (multiplies by 8) for better precision. How does this increase
precision?

Also, the conversion formula has floating-point coefficients, but the code
does integer multiplication here.

https://github.com/FFmpeg/FFmpeg/blob/master/libswscale/x86/yuv2rgb_template.c#L92
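
One way to picture the two questions is the scalar sketch below. It is
not the FFmpeg code. It assumes the multiply is a "keep the high 16 bits"
multiply (as the x86 pmulhw instruction does per 16-bit lane) and that the
real coefficients are stored as integers scaled by 2^13. The names
coeff_q13, mulhi16, scale_no_shift and scale_with_shift are hypothetical.
Under those assumptions the left shift by 3 and the 2^13 scale together
cancel the implicit >> 16, so three more bits of the result survive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn a real coefficient into a 16-bit integer
 * scaled by 2^13 (an assumed scale, chosen so the value fits a lane). */
static int16_t coeff_q13(double c)
{
    assert(c > -4.0 && c < 4.0);   /* 4.0 * 8192 would overflow int16_t */
    return (int16_t)(c * 8192.0 + (c < 0 ? -0.5 : 0.5));
}

/* Scalar model of a signed 16x16 multiply that keeps only the high
 * 16 bits of the 32-bit product. Relies on >> being an arithmetic
 * shift for negative values, which common compilers provide. */
static int16_t mulhi16(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> 16);
}

/* Without the pre-shift: (x * c * 2^13) >> 16 is about x * c / 8,
 * so the low three bits of the result are lost. */
static int16_t scale_no_shift(int16_t x, double c)
{
    return mulhi16(x, coeff_q13(c));
}

/* With the pre-shift: ((x << 3) * c * 2^13) >> 16 is about x * c. */
static int16_t scale_with_shift(int16_t x, double c)
{
    return mulhi16((int16_t)(x * 8), coeff_q13(c));
}
```

With x = 100 and c = 1.596, scale_with_shift returns 159 (the real
product is 159.6), while scale_no_shift returns 19, which is only 152
once multiplied back by 8.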

Thanks
- Adrian


>
> - Lauri
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-11 Thread Lauri Kasanen
On Mon, 10 Jun 2019 17:42:00 -0700
Adrian Tong  wrote:

> I have been trying to implement yuv420_to_bgr24 using SSE2 instructions. I
> ran into a case where the C implementation of yuv420_to_bgr24 produces a
> slightly different bgr24 image from the MMX implementation of
> yuv420_to_bgr24. Is this expected behavior?

Yes, some of the MMX implementations choose speed over accuracy; I ran
into that myself when doing the PPC versions. For an SSE version, if an
accurate version is fast enough, please try to match the C version.
Otherwise try to match MMX.
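
A minimal sketch of how "matching" could be measured, assuming both
outputs are available as equally sized packed byte buffers. The function
name max_abs_diff is hypothetical and this is not part of libswscale. A
result of 0 means the two outputs are bit-exact.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns the largest absolute per-byte difference between two
 * equally sized buffers, e.g. two packed BGR24 frames of n bytes. */
static int max_abs_diff(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int worst = 0;
    for (size_t i = 0; i < n; i++) {
        int d = abs((int)a[i] - (int)b[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```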

- Lauri

Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-10 Thread Adrian Tong
On Sat, 8 Jun 2019 at 09:42, Adrian Tong  wrote:

>
>
> On Sat, 8 Jun 2019 at 09:38, Lauri Kasanen  wrote:
>
>> On Sat, 8 Jun 2019 06:51:51 -0700
>> Adrian Tong  wrote:
>>
>> > Hi Lauri.
>> >
>> > Thanks for the reply. Any reason why this has not been implemented before?
>> > It seems to me that this would be a pretty important/hot function.
>>
>> Just the usual, nobody has had the interest. There are other places too
>> where the only x86 accel is mmx.
>>
>> - Lauri
>>
>
> I see. Thank you. I will see what I can do.
> -Adrian
>

I have been trying to implement yuv420_to_bgr24 using SSE2 instructions. I
ran into a case where the C implementation of yuv420_to_bgr24 produces a
slightly different bgr24 image from the MMX implementation of
yuv420_to_bgr24. Is this expected behavior?

Thanks
-Adrian


Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-08 Thread Adrian Tong
On Sat, 8 Jun 2019 at 09:38, Lauri Kasanen  wrote:

> On Sat, 8 Jun 2019 06:51:51 -0700
> Adrian Tong  wrote:
>
> > Hi Lauri.
> >
> > Thanks for the reply. Any reason why this has not been implemented before?
> > It seems to me that this would be a pretty important/hot function.
>
> Just the usual, nobody has had the interest. There are other places too
> where the only x86 accel is mmx.
>
> - Lauri
>

I see. Thank you. I will see what I can do.
-Adrian


Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-08 Thread Lauri Kasanen
On Sat, 8 Jun 2019 06:51:51 -0700
Adrian Tong  wrote:

> Hi Lauri.
>
> Thanks for the reply. Any reason why this has not been implemented before?
> It seems to me that this would be a pretty important/hot function.

Just the usual, nobody has had the interest. There are other places too
where the only x86 accel is mmx.

- Lauri

Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-08 Thread Adrian Tong
On Fri, 7 Jun 2019 at 23:20, Lauri Kasanen  wrote:

> On Fri, 7 Jun 2019 08:38:35 -0700
> Adrian Tong  wrote:
>
> > Hi
> >
> > I have a workload which spends a significant amount of time (~10%) in
> > the yuv420_bgr24_mmxext function in FFmpeg.
> >
> > I looked at the assembly and the profile and see that MMX (64-bit)
> > registers are used. I wonder whether we could have an SSE2 version that
> > uses 128-bit registers.
> >
> > I am very interested in implementing such support if it is possible.
>
> I'm not well versed in x86 vectors, so I can't say if SSE2 is enough or
> some other SSE version would be needed, but certainly YUV to RGB
> conversion can be done faster than with MMX. Please do send a patch.
>
> - Lauri
>

Hi Lauri.

Thanks for the reply. Any reason why this has not been implemented before?
It seems to me that this would be a pretty important/hot function.

-Adrian


Re: [FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-08 Thread Lauri Kasanen
On Fri, 7 Jun 2019 08:38:35 -0700
Adrian Tong  wrote:

> Hi
>
> I have a workload which spends a significant amount of time (~10%) in
> the yuv420_bgr24_mmxext function in FFmpeg.
>
> I looked at the assembly and the profile and see that MMX (64-bit)
> registers are used. I wonder whether we could have an SSE2 version that
> uses 128-bit registers.
>
> I am very interested in implementing such support if it is possible.

I'm not well versed in x86 vectors, so I can't say if SSE2 is enough or
some other SSE version would be needed, but certainly YUV to RGB
conversion can be done faster than with MMX. Please do send a patch.

- Lauri

[FFmpeg-devel] yuv420_bgr24_mmxext conversion taking significant time

2019-06-07 Thread Adrian Tong
Hi

I have a workload which spends a significant amount of time (~10%) in
the yuv420_bgr24_mmxext function in FFmpeg.

I looked at the assembly and the profile and see that MMX (64-bit)
registers are used. I wonder whether we could have an SSE2 version that
uses 128-bit registers.
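
To illustrate the register-width point: a 64-bit MMX register holds four
16-bit lanes, while a 128-bit SSE2 register holds eight, so one
instruction covers twice as many values. The sketch below is only an
illustration, not a proposed patch. The function name mulhi_lanes is
hypothetical, and the choice of a high-half 16-bit multiply as the
example operation is an assumption. It falls back to a scalar loop when
SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Multiplies n signed 16-bit values pairwise and keeps the high
 * 16 bits of each 32-bit product. n must be a multiple of 8. */
static void mulhi_lanes(int16_t *dst, const int16_t *a,
                        const int16_t *b, int n)
{
    assert(n % 8 == 0);
#if defined(__SSE2__)
    /* Eight 16-bit lanes per 128-bit register, unaligned access. */
    for (int i = 0; i < n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_mulhi_epi16(va, vb));
    }
#else
    /* Scalar fallback with the same per-lane result; relies on >>
     * being an arithmetic shift for negative values. */
    for (int i = 0; i < n; i++)
        dst[i] = (int16_t)(((int32_t)a[i] * b[i]) >> 16);
#endif
}
```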

I am very interested in implementing such support if it is possible.

Thanks
-Adrian