Re: [FFmpeg-devel] Memcpy Operation Duration

2016-10-18 Thread Ali KIZIL
On Oct 18, 2016 11:08 PM, "Ronald S. Bultje"  wrote:
>
> Hi Ali,
>
> On Tue, Oct 18, 2016 at 3:57 PM, Ali KIZIL  wrote:
>
> > 2016-10-18 22:44 GMT+03:00 Sven C. Dack :
> >
> > > On 18/10/16 20:26, Ali KIZIL wrote:
> > >
> > >> Hi Everyone,
> > >>
> > >> Today, I was analyzing memcpy duration in FFmpeg. I noticed that it
> > >> takes longer than an optimized SSE, SSE2, MMX, MMX2, AVX or
> > >> AVX2 based memcpy operation.
> > >>
> > >> I tried an FFmpeg build compiled with march=corei7-avx2; it does not
> > >> change the duration of the memcpy operation.
> > >> I also followed
> > >> https://trac.ffmpeg.org/wiki/CompilationGuide#PerformanceTips
> > >> with the same result. In addition, I tried gcc 6.2 in case gcc was
> > >> not selecting the correct flag. Same result again.
> > >>
> > >> These memcpy operations affect the decoding (and probably encoding)
> > >> fps rates.
> > >>
> > >> In a case of unscaled uyvy422 to p010 3840x2160 conversion in
> > >> rawvideo, the fps rate increased from 44 fps to 52 fps on a Xeon E5
> > >> 2630 v4.
> > >>
> > >> Am I missing anything when compiling FFmpeg for AVX2 or other
> > >> optimisation flags, or does FFmpeg need a fix to direct some (or all)
> > >> memcpy operations to an inherited memcpy operation which can decide
> > >> the flag for optimisation?
> > >> Or is there no such need and I am on the wrong path?
> > >>
> > >> (As a side note, FFmpeg works performance on i7 Extreme cores
> > >> compared to Xeon v4 processors.)
> > >
> > > Could be it's gcc's built-in version. It's been said that libc is
> > > occasionally better at it than gcc's built-in version.
> > >
> > > Use -fno-builtin-memcpy and see what difference it makes.
> >
> > I see, tomorrow morning I will give it a try.
> > Thank you for the good idea. If it increases performance, maybe it
> > would be a good idea to make it a configure option.
>
>
> configure has --extra-cflags=.. and --extra-ldflags=.. options to add
> custom CC CLI arguments.
>
> Ronald
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Hi Ronald,

Yes, I used the extra flags to pass march=native or march=corei7-avx2.
Tomorrow, I will try the -fno-builtin-memcpy option via --extra-cflags. I
will update the topic.

Thank you,


Re: [FFmpeg-devel] Memcpy Operation Duration

2016-10-18 Thread Ronald S. Bultje
Hi Ali,

On Tue, Oct 18, 2016 at 3:57 PM, Ali KIZIL  wrote:

> 2016-10-18 22:44 GMT+03:00 Sven C. Dack :
>
> > On 18/10/16 20:26, Ali KIZIL wrote:
> >
> >> Hi Everyone,
> >>
> >> Today, I was analyzing memcpy duration in FFmpeg. I noticed that it
> >> takes longer than an optimized SSE, SSE2, MMX, MMX2, AVX or
> >> AVX2 based memcpy operation.
> >>
> >> I tried an FFmpeg build compiled with march=corei7-avx2; it does not
> >> change the duration of the memcpy operation.
> >> I also followed
> >> https://trac.ffmpeg.org/wiki/CompilationGuide#PerformanceTips
> >> with the same result. In addition, I tried gcc 6.2 in case gcc was
> >> not selecting the correct flag. Same result again.
> >>
> >> These memcpy operations affect the decoding (and probably encoding)
> >> fps rates.
> >>
> >> In a case of unscaled uyvy422 to p010 3840x2160 conversion in
> >> rawvideo, the fps rate increased from 44 fps to 52 fps on a Xeon E5
> >> 2630 v4.
> >>
> >> Am I missing anything when compiling FFmpeg for AVX2 or other
> >> optimisation flags, or does FFmpeg need a fix to direct some (or all)
> >> memcpy operations to an inherited memcpy operation which can decide
> >> the flag for optimisation?
> >> Or is there no such need and I am on the wrong path?
> >>
> >> (As a side note, FFmpeg works performance on i7 Extreme cores
> >> compared to Xeon v4 processors.)
> >
> > Could be it's gcc's built-in version. It's been said that libc is
> > occasionally better at it than gcc's built-in version.
> >
> > Use -fno-builtin-memcpy and see what difference it makes.
>
> I see, tomorrow morning I will give it a try.
> Thank you for the good idea. If it increases performance, maybe it
> would be a good idea to make it a configure option.


configure has --extra-cflags=.. and --extra-ldflags=.. options to add
custom CC CLI arguments.

Ronald


Re: [FFmpeg-devel] Memcpy Operation Duration

2016-10-18 Thread Ali KIZIL
2016-10-18 22:44 GMT+03:00 Sven C. Dack :

> On 18/10/16 20:26, Ali KIZIL wrote:
>
>> Hi Everyone,
>>
>> Today, I was analyzing memcpy duration in FFmpeg. I noticed that it
>> takes longer than an optimized SSE, SSE2, MMX, MMX2, AVX or
>> AVX2 based memcpy operation.
>>
>> I tried an FFmpeg build compiled with march=corei7-avx2; it does not
>> change the duration of the memcpy operation.
>> I also followed
>> https://trac.ffmpeg.org/wiki/CompilationGuide#PerformanceTips
>> with the same result. In addition, I tried gcc 6.2 in case gcc was
>> not selecting the correct flag. Same result again.
>>
>> These memcpy operations affect the decoding (and probably encoding)
>> fps rates.
>>
>> In a case of unscaled uyvy422 to p010 3840x2160 conversion in
>> rawvideo, the fps rate increased from 44 fps to 52 fps on a Xeon E5
>> 2630 v4.
>>
>> Am I missing anything when compiling FFmpeg for AVX2 or other
>> optimisation flags, or does FFmpeg need a fix to direct some (or all)
>> memcpy operations to an inherited memcpy operation which can decide
>> the flag for optimisation?
>> Or is there no such need and I am on the wrong path?
>>
>> (As a side note, FFmpeg works performance on i7 Extreme cores compared to
>> Xeon v4 processors.)
>>
>> Kind Regards,
>> Ali KIZIL
>>
>
> Could be it's gcc's built-in version. It's been said that libc is
> occasionally better at it than gcc's built-in version.
>
> Use -fno-builtin-memcpy and see what difference it makes.
>
>


I see, tomorrow morning I will give it a try.
Thank you for the good idea. If it increases performance, maybe it would be
a good idea to make it a configure option.

Kind Regards,


Re: [FFmpeg-devel] Memcpy Operation Duration

2016-10-18 Thread Sven C. Dack

On 18/10/16 20:26, Ali KIZIL wrote:

> Hi Everyone,
>
> Today, I was analyzing memcpy duration in FFmpeg. I noticed that it
> takes longer than an optimized SSE, SSE2, MMX, MMX2, AVX or
> AVX2 based memcpy operation.
>
> I tried an FFmpeg build compiled with march=corei7-avx2; it does not
> change the duration of the memcpy operation.
> I also followed
> https://trac.ffmpeg.org/wiki/CompilationGuide#PerformanceTips
> with the same result. In addition, I tried gcc 6.2 in case gcc was
> not selecting the correct flag. Same result again.
>
> These memcpy operations affect the decoding (and probably encoding)
> fps rates.
>
> In a case of unscaled uyvy422 to p010 3840x2160 conversion in
> rawvideo, the fps rate increased from 44 fps to 52 fps on a Xeon E5
> 2630 v4.
>
> Am I missing anything when compiling FFmpeg for AVX2 or other
> optimisation flags, or does FFmpeg need a fix to direct some (or all)
> memcpy operations to an inherited memcpy operation which can decide
> the flag for optimisation?
> Or is there no such need and I am on the wrong path?
>
> (As a side note, FFmpeg works performance on i7 Extreme cores compared to
> Xeon v4 processors.)
>
> Kind Regards,
> Ali KIZIL


Could be it's gcc's built-in version. It's been said that libc is occasionally 
better at it than gcc's built-in version.


Use -fno-builtin-memcpy and see what difference it makes.




[FFmpeg-devel] Memcpy Operation Duration

2016-10-18 Thread Ali KIZIL
Hi Everyone,

Today, I was analyzing memcpy duration in FFmpeg. I noticed that it
takes longer than an optimized SSE, SSE2, MMX, MMX2, AVX or
AVX2 based memcpy operation.

I tried an FFmpeg build compiled with march=corei7-avx2; it does not
change the duration of the memcpy operation.
I also followed
https://trac.ffmpeg.org/wiki/CompilationGuide#PerformanceTips
with the same result. In addition, I tried gcc 6.2 in case gcc was
not selecting the correct flag. Same result again.

These memcpy operations affect the decoding (and probably encoding)
fps rates.

In a case of unscaled uyvy422 to p010 3840x2160 conversion in
rawvideo, the fps rate increased from 44 fps to 52 fps on a Xeon E5
2630 v4.

Am I missing anything when compiling FFmpeg for AVX2 or other
optimisation flags, or does FFmpeg need a fix to direct some (or all)
memcpy operations to an inherited memcpy operation which can decide
the flag for optimisation?
Or is there no such need and I am on the wrong path?

(As a side note, FFmpeg works performance on i7 Extreme cores compared to
Xeon v4 processors.)

Kind Regards,
Ali KIZIL