Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-22 Thread Ganesh Ajjanagadde
On Thu, Oct 22, 2015 at 4:51 PM, Carl Eugen Hoyos  wrote:
> Ganesh Ajjanagadde writes:
>
>> To put an end to a long and tortuous thread, and
>> due to the lack of relevant outstanding objections,
>> pushed.
>
> To sum it up:
> Several developers have explained to you that the
> numbers you posted show that FFmpeg is now either
> slower or equally fast, you have pushed with a
> commit message that claims that your patch makes
> FFmpeg faster.

I was far more nuanced because I know there are people like you who do
not understand such things. Seems like I failed at that anyway. I am
sorry, but I can only repeat the first two lines:
"It is well known that fabs and fabsf are at least as fast and sometimes
faster than the FFABS macro, at least on the gcc+glibc combination."

That statement is completely accurate. I did not claim "within
FFmpeg", since like I said usually there is no difference.

Where the "slower" comes from, god knows.

> How do you call such a claim without any base?

Look at the asm, etc. Clement has already shown the difference.
"Without any base" is a completely incorrect and illogical statement.
Paul is clearly convinced enough to start using it in his own filters.

> Carl Eugen
>
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-22 Thread Ganesh Ajjanagadde
On Fri, Oct 16, 2015 at 7:53 AM, Ganesh Ajjanagadde  wrote:
> On Fri, Oct 16, 2015 at 7:30 AM, Michael Niedermayer
>  wrote:
>> On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote:
>>> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes  
>>> wrote:
>>> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos  
>>> > wrote:
>>> >> Ganesh Ajjanagadde writes:
>>> >>
>>> >>> What? My numbers actually show that the new code may be faster -
>>> >>
>>> >> No, you are misunderstanding the numbers you posted.
>>> >> (Or I misunderstand them but nobody said so yet.)
>>> >>
>>> >> Highest runs are most relevant, skips have to be
>>> >> avoided (afaik).
>>> >>
>>> >> [...]
>>> >>
>>> >>> If you continue to post such stuff that has no basis, I might actually
>>> >>> get tempted into finding out for which floating point values the new
>>> >>> code is significantly faster, craft a relevant audio file, and post it
>>> >>> showing a huge performance difference - my random numbers benchmark
>>> >>> shows there must exist such values.
>>> >>
>>> >> Please do so!
>>> >>
>>> >>> > The more important question is if you can see the same
>>> >>> > changes in the disassembly of af_astats.o as what
>>> >>> > ubitux posted here for a short test function?
>>> >>>
>>> >>> I do. He uses clang/gcc, so do I.
>>> >>
>>> >> Sorry, my understanding fails here (I am not a native speaker):
>>> >> You did look at the disassembly of af_astats.o and there is
>>> >> inlined code instead of a function call?
>>> >>
>>> >>> The reason (irrelevant) is that both
>>> >>> of us run Arch.
>>> >>>
>>> >>> What is "more relevant" is if _you_ can see the changes
>>> >>> on some non Linux platform.
>>> >>
>>> >> If you could show that it is faster on any platform
>>> >> I would already be happy!
>>> >>
>>> >
>>> > A more important check would be that its not significantly slower on
>>> > any other platform. Just because one compiler/glibc combination
>>> > manages to produce an efficient inlined function doesn't necessarily
>>> > mean that some other compiler or libc couldn't produce a full function
>>> > call with all the overhead that comes with it, becoming significantly
>>> > slower.
>>>
>>> As I point out, all a libc implementer needs to do to be on par with
>>> the macro is to add the inline keyword. This was added in c99. If said
>>> libc does not, then it is fundamentally broken from a performance
>>> perspective. A beginning programmer can do that in a couple of
>>> minutes. Fix upstream and complain to them if it does not inline.
>>
>> I don't know how the latest compilers handle "inline", but a few years
>> ago gcc was rather dumb about inlining, and I think it's not easy for
>> a compiler to actually not be "dumb".
>>
>> A compiler cannot inline everything that has the inline keyword;
>> it would lead (for some source code) to an explosion in size and
>> compile time.
>> And a good compiler will want to inline some functions even if they
>> do not have the inline keyword.
>> Also, it's not easy for a compiler to know what to
>> inline and what not; there could be 10 functions a1(), a2(), a3(), ...
>> each calling the previous 10 times ...
>> The way gcc handled this (in the past, AFAIK at least) is to have
>> various complicated thresholds that limit the amount of inlining.
>> The big annoyance with this (years ago at least) was that if you
>> forced a function to be inlined, gcc would then stop
>> inlining something else, and you ended up either forcing every single
>> function you needed inlined or having to tune the thresholds.
>>
>> It would be interesting to check if replacing FFABS by fabs causes
>> any big changes to inlining behavior (maybe that can be done by
>> comparing the list of symbols in the object files, as fully inlined
>> functions wouldn't show up, but maybe there are other ways).
>>
>> Anyway, I am not against using fabs() for float/double FFABS().
>> I just think some assumptions in this thread are possibly too
>> optimistic, but it's quite possible these replacements are all fine
>> and the changes in inlining, if any, have no performance impact.
>
> I myself am not "optimistic" in the sense that I think most of the
> time this will have zero change. All I am saying is that in cases
> where there is a difference, it will likely be in favor of fabs, etc
> and not the macro due to reasons I mentioned in the long commit
> message I posted.
>
>>
>> Also, if a *abs is implemented using a branch (as in: if it's positive,
>> jump over a negate instruction), then branch prediction can play a
>> significant role in performance; that is, random values would be a lot
>> slower than the same values ordered.
>
> Maybe this is why I get such a large difference between fabs and FFABS
> in favor of fabs - I just keep random numbers with no ordering. If
> true, this is definitely in fabs's favor.
>
>> a good implementation should not use a 

Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-22 Thread Carl Eugen Hoyos
Ganesh Ajjanagadde writes:

> To put an end to a long and tortuous thread, and 
> due to the lack of relevant outstanding objections, 
> pushed.

To sum it up:
Several developers have explained to you that the 
numbers you posted show that FFmpeg is now either 
slower or equally fast, you have pushed with a 
commit message that claims that your patch makes 
FFmpeg faster.
How do you call such a claim without any base?
Carl Eugen



Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-16 Thread Ganesh Ajjanagadde
On Fri, Oct 16, 2015 at 7:30 AM, Michael Niedermayer
 wrote:
> On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote:
>> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes  wrote:
>> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos  
>> > wrote:
>> >> Ganesh Ajjanagadde writes:
>> >>
>> >>> What? My numbers actually show that the new code may be faster -
>> >>
>> >> No, you are misunderstanding the numbers you posted.
>> >> (Or I misunderstand them but nobody said so yet.)
>> >>
>> >> Highest runs are most relevant, skips have to be
>> >> avoided (afaik).
>> >>
>> >> [...]
>> >>
>> >>> If you continue to post such stuff that has no basis, I might actually
>> >>> get tempted into finding out for which floating point values the new
>> >>> code is significantly faster, craft a relevant audio file, and post it
>> >>> showing a huge performance difference - my random numbers benchmark
>> >>> shows there must exist such values.
>> >>
>> >> Please do so!
>> >>
>> >>> > The more important question is if you can see the same
>> >>> > changes in the disassembly of af_astats.o as what
>> >>> > ubitux posted here for a short test function?
>> >>>
>> >>> I do. He uses clang/gcc, so do I.
>> >>
>> >> Sorry, my understanding fails here (I am not a native speaker):
>> >> You did look at the disassembly of af_astats.o and there is
>> >> inlined code instead of a function call?
>> >>
>> >>> The reason (irrelevant) is that both
>> >>> of us run Arch.
>> >>>
>> >>> What is "more relevant" is if _you_ can see the changes
>> >>> on some non Linux platform.
>> >>
>> >> If you could show that it is faster on any platform
>> >> I would already be happy!
>> >>
>> >
>> > A more important check would be that its not significantly slower on
>> > any other platform. Just because one compiler/glibc combination
>> > manages to produce an efficient inlined function doesn't necessarily
>> > mean that some other compiler or libc couldn't produce a full function
>> > call with all the overhead that comes with it, becoming significantly
>> > slower.
>>
>> As I point out, all a libc implementer needs to do to be on par with
>> the macro is to add the inline keyword. This was added in c99. If said
>> libc does not, then it is fundamentally broken from a performance
>> perspective. A beginning programmer can do that in a couple of
>> minutes. Fix upstream and complain to them if it does not inline.
>
> I don't know how the latest compilers handle "inline", but a few years
> ago gcc was rather dumb about inlining, and I think it's not easy for
> a compiler to actually not be "dumb".
>
> A compiler cannot inline everything that has the inline keyword;
> it would lead (for some source code) to an explosion in size and
> compile time.
> And a good compiler will want to inline some functions even if they
> do not have the inline keyword.
> Also, it's not easy for a compiler to know what to
> inline and what not; there could be 10 functions a1(), a2(), a3(), ...
> each calling the previous 10 times ...
> The way gcc handled this (in the past, AFAIK at least) is to have
> various complicated thresholds that limit the amount of inlining.
> The big annoyance with this (years ago at least) was that if you
> forced a function to be inlined, gcc would then stop
> inlining something else, and you ended up either forcing every single
> function you needed inlined or having to tune the thresholds.
>
> It would be interesting to check if replacing FFABS by fabs causes
> any big changes to inlining behavior (maybe that can be done by
> comparing the list of symbols in the object files, as fully inlined
> functions wouldn't show up, but maybe there are other ways).
>
> Anyway, I am not against using fabs() for float/double FFABS().
> I just think some assumptions in this thread are possibly too
> optimistic, but it's quite possible these replacements are all fine
> and the changes in inlining, if any, have no performance impact.

I myself am not "optimistic" in the sense that I think most of the
time this will have zero change. All I am saying is that in cases
where there is a difference, it will likely be in favor of fabs, etc
and not the macro due to reasons I mentioned in the long commit
message I posted.

>
> Also, if a *abs is implemented using a branch (as in: if it's positive,
> jump over a negate instruction), then branch prediction can play a
> significant role in performance; that is, random values would be a lot
> slower than the same values ordered.

Maybe this is why I get such a large difference between fabs and FFABS
in favor of fabs - I just keep random numbers with no ordering. If
true, this is definitely in fabs's favor.

> A good implementation should not use a branch though: abs for floats
> and doubles is basically just clearing the sign bit. Platforms should
> have a dedicated instruction for that, or in some cases an integer
> AND/OR could maybe even be used.

Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-16 Thread Michael Niedermayer
On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote:
> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes  wrote:
> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos  wrote:
> >> Ganesh Ajjanagadde writes:
> >>
> >>> What? My numbers actually show that the new code may be faster -
> >>
> >> No, you are misunderstanding the numbers you posted.
> >> (Or I misunderstand them but nobody said so yet.)
> >>
> >> Highest runs are most relevant, skips have to be
> >> avoided (afaik).
> >>
> >> [...]
> >>
> >>> If you continue to post such stuff that has no basis, I might actually
> >>> get tempted into finding out for which floating point values the new
> >>> code is significantly faster, craft a relevant audio file, and post it
> >>> showing a huge performance difference - my random numbers benchmark
> >>> shows there must exist such values.
> >>
> >> Please do so!
> >>
> >>> > The more important question is if you can see the same
> >>> > changes in the disassembly of af_astats.o as what
> >>> > ubitux posted here for a short test function?
> >>>
> >>> I do. He uses clang/gcc, so do I.
> >>
> >> Sorry, my understanding fails here (I am not a native speaker):
> >> You did look at the disassembly of af_astats.o and there is
> >> inlined code instead of a function call?
> >>
> >>> The reason (irrelevant) is that both
> >>> of us run Arch.
> >>>
> >>> What is "more relevant" is if _you_ can see the changes
> >>> on some non Linux platform.
> >>
> >> If you could show that it is faster on any platform
> >> I would already be happy!
> >>
> >
> > A more important check would be that its not significantly slower on
> > any other platform. Just because one compiler/glibc combination
> > manages to produce an efficient inlined function doesn't necessarily
> > mean that some other compiler or libc couldn't produce a full function
> > call with all the overhead that comes with it, becoming significantly
> > slower.
> 
> As I point out, all a libc implementer needs to do to be on par with
> the macro is to add the inline keyword. This was added in c99. If said
> libc does not, then it is fundamentally broken from a performance
> perspective. A beginning programmer can do that in a couple of
> minutes. Fix upstream and complain to them if it does not inline.

I don't know how the latest compilers handle "inline", but a few years
ago gcc was rather dumb about inlining, and I think it's not easy for
a compiler to actually not be "dumb".

A compiler cannot inline everything that has the inline keyword;
it would lead (for some source code) to an explosion in size and
compile time.
And a good compiler will want to inline some functions even if they
do not have the inline keyword.
Also, it's not easy for a compiler to know what to
inline and what not; there could be 10 functions a1(), a2(), a3(), ...
each calling the previous 10 times ...
The way gcc handled this (in the past, AFAIK at least) is to have
various complicated thresholds that limit the amount of inlining.
The big annoyance with this (years ago at least) was that if you
forced a function to be inlined, gcc would then stop
inlining something else, and you ended up either forcing every single
function you needed inlined or having to tune the thresholds.

It would be interesting to check if replacing FFABS by fabs causes
any big changes to inlining behavior (maybe that can be done by
comparing the list of symbols in the object files, as fully inlined
functions wouldn't show up, but maybe there are other ways).

Anyway, I am not against using fabs() for float/double FFABS().
I just think some assumptions in this thread are possibly too
optimistic, but it's quite possible these replacements are all fine
and the changes in inlining, if any, have no performance impact.

Also, if a *abs is implemented using a branch (as in: if it's positive,
jump over a negate instruction), then branch prediction can play a
significant role in performance; that is, random values would be a lot
slower than the same values ordered.
A good implementation should not use a branch though: abs for floats
and doubles is basically just clearing the sign bit. Platforms should
have a dedicated instruction for that, or in some cases an integer
AND/OR could maybe even be used.

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope




Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-15 Thread Hendrik Leppkes
On Thu, Oct 15, 2015 at 12:53 PM, Ganesh Ajjanagadde  wrote:
> On Thu, Oct 15, 2015 at 6:50 AM, Hendrik Leppkes  wrote:
>> On Thu, Oct 15, 2015 at 12:41 PM, Ganesh Ajjanagadde  
>> wrote:
>>> On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes  
>>> wrote:
>>>> On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde
>>>> wrote:
>>>>> On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes
>>>>> wrote:
>>>>>> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde
>>>>>> wrote:
>>>>>>> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver
>>>>>>> wrote:
>>>>>>>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde
>>>>>>>>> wrote:
>>>>>>>>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>>>>>>>>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>>>>>>>>> >> [...]
>>>>>>>>> >>> What about fmax/FFMAX?
>>>>>>>>> >>
>>>>>>>>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>>>>>>>>> >> looks glibc specific
>>>>>>>>>
>>>>>>>>> Seems they are actually ISO:
>>>>>>>>> http://en.cppreference.com/w/c/numeric/math/fmax
>>>>>>>>>
>>>>>>>>> Can someone check availability on all of our platforms of interest
>>>>>>>>> (e.g Microsoft)?
>>>>>>>>>
>>>>>>>>
>>>>>>>> fmax and fmin are only available on msvc using 2013 or newer. Currently the
>>>>>>>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
>>>>>>>> the C99 to C89 converter.
>>>>>>>
>>>>>>> And does that converter handle fmin, fmax, fmaxf, etc?
>>>>>>> Does it need patches?
>>>>>>> Bottom line: are they safe to use at the moment?
>>>>>>>
>>>>>>
>>>>>> No, they are not.
>>>>>>
>>>>>> One thing I don't understand - why are we bothering with something
>>>>>> that at best comes out as "same speed" from tests performed? (low
>>>>>> number of runs are irrelevant as they are not statistically
>>>>>> significant).
>>>>>
>>>>> Because if you actually bothered to run my random numbers benchmark
>>>>> instead of posting with no basis claiming "statistical
>>>>> insignificance", or for that matter bothered to actually check
>>>>> the libc link, or even looked at Clement's asm test - you would
>>>>> finally understand.
>>>>>
>>>>> Also, what needs to be done to get fmax, fmin, etc into the converter?
>>>>>
>>>>
>>>> The converter doesn't provide any functions, just alters the syntax if
>>>> needed. Functions not available cannot be fixed that way, sorry.
>>>
>>> Thanks for clarifying. I am still confused: how do we have llabs then?
>>> Per MSDN, this was not present in MSVC 2012, and was added in MSVC
>>> 2013 (looks like a similar case to fabs, fabsf).
>>>
>>
>> Docs appear to be wrong in that particular case. It happens sometimes
>> that functions are available but didn't get added to the docs.
>
> But with respect to fmin, fmax, etc - they were not available in 2012,
> and the docs are right? Are you sure, and have you tested?
>

Yes those are definitely not available.


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-15 Thread Ganesh Ajjanagadde
On Thu, Oct 15, 2015 at 6:50 AM, Hendrik Leppkes  wrote:
> On Thu, Oct 15, 2015 at 12:41 PM, Ganesh Ajjanagadde  wrote:
>> On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes  wrote:
>>> On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde  
>>> wrote:
>>>> On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes
>>>> wrote:
>>>>> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde
>>>>> wrote:
>>>>>> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver
>>>>>> wrote:
>>>>>>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde
>>>>>>>> wrote:
>>>>>>>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>>>>>>>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>>>>>>>> >> [...]
>>>>>>>> >>> What about fmax/FFMAX?
>>>>>>>> >>
>>>>>>>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>>>>>>>> >> looks glibc specific
>>>>>>>>
>>>>>>>> Seems they are actually ISO:
>>>>>>>> http://en.cppreference.com/w/c/numeric/math/fmax
>>>>>>>>
>>>>>>>> Can someone check availability on all of our platforms of interest
>>>>>>>> (e.g Microsoft)?
>>>>>>>>
>>>>>>>
>>>>>>> fmax and fmin are only available on msvc using 2013 or newer. Currently the
>>>>>>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
>>>>>>> the C99 to C89 converter.
>>>>>>
>>>>>> And does that converter handle fmin, fmax, fmaxf, etc?
>>>>>> Does it need patches?
>>>>>> Bottom line: are they safe to use at the moment?
>>>>>>
>>>>>
>>>>> No, they are not.
>>>>>
>>>>> One thing I don't understand - why are we bothering with something
>>>>> that at best comes out as "same speed" from tests performed? (low
>>>>> number of runs are irrelevant as they are not statistically
>>>>> significant).
>>>>
>>>> Because if you actually bothered to run my random numbers benchmark
>>>> instead of posting with no basis claiming "statistical
>>>> insignificance", or for that matter bothered to actually check
>>>> the libc link, or even looked at Clement's asm test - you would
>>>> finally understand.
>>>>
>>>> Also, what needs to be done to get fmax, fmin, etc into the converter?
>>>>
>>>
>>> The converter doesn't provide any functions, just alters the syntax if
>>> needed. Functions not available cannot be fixed that way, sorry.
>>
>> Thanks for clarifying. I am still confused: how do we have llabs then?
>> Per MSDN, this was not present in MSVC 2012, and was added in MSVC
>> 2013 (looks like a similar case to fabs, fabsf).
>>
>
> Docs appear to be wrong in that particular case. It happens sometimes
> that functions are available but didn't get added to the docs.

But with respect to fmin, fmax, etc - they were not available in 2012,
and the docs are right? Are you sure, and have you tested?

>
> - Hendrik


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-15 Thread Ganesh Ajjanagadde
On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes  wrote:
> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde  wrote:
>> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver  wrote:
>>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde  wrote:
>>>
>>>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde
>>>> wrote:
>>>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>>>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>>>> >> [...]
>>>> >>> What about fmax/FFMAX?
>>>> >>
>>>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>>>> >> looks glibc specific
>>>>
>>>> Seems they are actually ISO:
>>>> http://en.cppreference.com/w/c/numeric/math/fmax
>>>>
>>>> Can someone check availability on all of our platforms of interest
>>>> (e.g Microsoft)?
>>>>
>>>
>>> fmax and fmin are only available on msvc using 2013 or newer. Currently the
>>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
>>> the C99 to C89 converter.
>>
>> And does that converter handle fmin, fmax, fmaxf, etc?
>> Does it need patches?
>> Bottom line: are they safe to use at the moment?
>>
>
> No, they are not.
>
> One thing I don't understand - why are we bothering with something
> that at best comes out as "same speed" from tests performed? (low
> number of runs are irrelevant as they are not statistically
> significant).

Because if you actually bothered to run my random numbers benchmark
instead of posting with no basis claiming "statistical
insignificance", or for that matter bothered to actually check
the libc link, or even looked at Clement's asm test - you would
finally understand.

Also, what needs to be done to get fmax, fmin, etc into the converter?



Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-15 Thread Ganesh Ajjanagadde
On Thu, Oct 15, 2015 at 6:41 AM, Ganesh Ajjanagadde  wrote:
> On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes  wrote:
>> On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde  
>> wrote:
>>> On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes  
>>> wrote:
>>>> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde
>>>> wrote:
>>>>> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote:
>>>>>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote:
>>>>>>
>>>>>>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde
>>>>>>> wrote:
>>>>>>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>>>>>>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>>>>>>> >> [...]
>>>>>>> >>> What about fmax/FFMAX?
>>>>>>> >>
>>>>>>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>>>>>>> >> looks glibc specific
>>>>>>>
>>>>>>> Seems they are actually ISO:
>>>>>>> http://en.cppreference.com/w/c/numeric/math/fmax
>>>>>>>
>>>>>>> Can someone check availability on all of our platforms of interest
>>>>>>> (e.g Microsoft)?
>>>>>>>
>>>>>>
>>>>>> fmax and fmin are only available on msvc using 2013 or newer. Currently the
>>>>>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
>>>>>> the C99 to C89 converter.
>>>>>
>>>>> And does that converter handle fmin, fmax, fmaxf, etc?
>>>>> Does it need patches?
>>>>> Bottom line: are they safe to use at the moment?
>>>>>
>>>>
>>>> No, they are not.
>>>>
>>>> One thing I don't understand - why are we bothering with something
>>>> that at best comes out as "same speed" from tests performed? (low
>>>> number of runs are irrelevant as they are not statistically
>>>> significant).
>>>
>>> Because if you actually bothered to run my random numbers benchmark
>>> instead of posting with no basis claiming "statistical
>>> insignificance", or for that matter bothered to actually check
>>> the libc link, or even looked at Clement's asm test - you would
>>> finally understand.
>>>
>>> Also, what needs to be done to get fmax, fmin, etc into the converter?
>>>
>>
>> The converter doesn't provide any functions, just alters the syntax if
>> needed. Functions not available cannot be fixed that way, sorry.
>
> Thanks for clarifying. I am still confused: how do we have llabs then?
> Per MSDN, this was not present in MSVC 2012, and was added in MSVC
> 2013 (looks like a similar case to fabs, fabsf).

sorry, fabs, fabsf -> fmin, fmax etc.

>
>>
>> - Hendrik


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-15 Thread Hendrik Leppkes
On Thu, Oct 15, 2015 at 12:41 PM, Ganesh Ajjanagadde  wrote:
> On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes  wrote:
>> On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde  
>> wrote:
>>> On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes  
>>> wrote:
>>>> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde
>>>> wrote:
>>>>> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote:
>>>>>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote:
>>>>>>
>>>>>>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde
>>>>>>> wrote:
>>>>>>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>>>>>>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>>>>>>> >> [...]
>>>>>>> >>> What about fmax/FFMAX?
>>>>>>> >>
>>>>>>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>>>>>>> >> looks glibc specific
>>>>>>>
>>>>>>> Seems they are actually ISO:
>>>>>>> http://en.cppreference.com/w/c/numeric/math/fmax
>>>>>>>
>>>>>>> Can someone check availability on all of our platforms of interest
>>>>>>> (e.g Microsoft)?
>>>>>>>
>>>>>>
>>>>>> fmax and fmin are only available on msvc using 2013 or newer. Currently the
>>>>>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
>>>>>> the C99 to C89 converter.
>>>>>
>>>>> And does that converter handle fmin, fmax, fmaxf, etc?
>>>>> Does it need patches?
>>>>> Bottom line: are they safe to use at the moment?
>>>>>
>>>>
>>>> No, they are not.
>>>>
>>>> One thing I don't understand - why are we bothering with something
>>>> that at best comes out as "same speed" from tests performed? (low
>>>> number of runs are irrelevant as they are not statistically
>>>> significant).
>>>
>>> Because if you actually bothered to run my random numbers benchmark
>>> instead of posting with no basis claiming "statistical
>>> insignificance", or for that matter bothered to actually check
>>> the libc link, or even looked at Clement's asm test - you would
>>> finally understand.
>>>
>>> Also, what needs to be done to get fmax, fmin, etc into the converter?
>>>
>>
>> The converter doesn't provide any functions, just alters the syntax if
>> needed. Functions not available cannot be fixed that way, sorry.
>
> Thanks for clarifying. I am still confused: how do we have llabs then?
> Per MSDN, this was not present in MSVC 2012, and was added in MSVC
> 2013 (looks like a similar case to fabs, fabsf).
>

Docs appear to be wrong in that particular case. It happens sometimes
that functions are available but didn't get added to the docs.

- Hendrik


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-15 Thread Ganesh Ajjanagadde
On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes  wrote:
> On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos  wrote:
>> Ganesh Ajjanagadde writes:
>>
>>> What? My numbers actually show that the new code may be faster -
>>
>> No, you are misunderstanding the numbers you posted.
>> (Or I misunderstand them but nobody said so yet.)
>>
>> Highest runs are most relevant, skips have to be
>> avoided (afaik).
>>
>> [...]
>>
>>> If you continue to post such stuff that has no basis, I might actually
>>> get tempted into finding out for which floating point values the new
>>> code is significantly faster, craft a relevant audio file, and post it
>>> showing a huge performance difference - my random numbers benchmark
>>> shows there must exist such values.
>>
>> Please do so!
>>
>>> > The more important question is if you can see the same
>>> > changes in the disassembly of af_astats.o as what
>>> > ubitux posted here for a short test function?
>>>
>>> I do. He uses clang/gcc, so do I.
>>
>> Sorry, my understanding fails here (I am not a native speaker):
>> You did look at the disassembly of af_astats.o and there is
>> inlined code instead of a function call?
>>
>>> The reason (irrelevant) is that both
>>> of us run Arch.
>>>
>>> What is "more relevant" is if _you_ can see the changes
>>> on some non Linux platform.
>>
>> If you could show that it is faster on any platform
>> I would already be happy!
>>
>
> A more important check would be that its not significantly slower on
> any other platform. Just because one compiler/glibc combination
> manages to produce an efficient inlined function doesn't necessarily
> mean that some other compiler or libc couldn't produce a full function
> call with all the overhead that comes with it, becoming significantly
> slower.

As I pointed out, all a libc implementer needs to do to be on par with
the macro is to add the inline keyword, which was added in C99. If said
libc does not, then it is fundamentally broken from a performance
perspective. A beginning programmer can do that in a couple of
minutes. Fix it upstream, or complain to them, if it does not inline.

You seem to have an alternative platform: you (and others who have
such platforms) are welcome to try and find out, file bugs (if need
be) with Microsoft, etc.



Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-15 Thread Hendrik Leppkes
On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde  wrote:
> On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes  wrote:
>> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde  
>> wrote:
>>> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver  wrote:
>>>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote:
>>>>
>>>>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde
>>>>> wrote:
>>>>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>>>>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>>>>> >> [...]
>>>>> >>> What about fmax/FFMAX?
>>>>> >>
>>>>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>>>>> >> looks glibc specific
>>>>>
>>>>> Seems they are actually ISO:
>>>>> http://en.cppreference.com/w/c/numeric/math/fmax
>>>>>
>>>>> Can someone check availability on all of our platforms of interest
>>>>> (e.g Microsoft)?
>>>>>
>>>>
>>>> fmax and fmin are only available on msvc using 2013 or newer. Currently the
>>>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
>>>> the C99 to C89 converter.
>>>
>>> And does that converter handle fmin, fmax, fmaxf, etc?
>>> Does it need patches?
>>> Bottom line: are they safe to use at the moment?
>>>
>>
>> No, they are not.
>>
>> One thing I don't understand - why are we bothering with something
>> that at best comes out as "same speed" from tests performed? (low
>> number of runs are irrelevant as they are not statistically
>> significant).
>
> Because if you actually bothered to run my random numbers benchmark
> instead of posting with no basis claiming "statistical
> insignificance", or for that matter bothered to actually check
> the libc link, or even looked at Clement's asm test - you would
> finally understand.
>
> Also, what needs to be done to get fmax, fmin, etc into the converter?
>

The converter doesn't provide any functions, just alters the syntax if
needed. Functions not available cannot be fixed that way, sorry.

- Hendrik


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-15 Thread Ganesh Ajjanagadde
On Thu, Oct 15, 2015 at 6:57 AM, Hendrik Leppkes  wrote:
> On Thu, Oct 15, 2015 at 12:53 PM, Ganesh Ajjanagadde  wrote:
>> On Thu, Oct 15, 2015 at 6:50 AM, Hendrik Leppkes  wrote:
>>> On Thu, Oct 15, 2015 at 12:41 PM, Ganesh Ajjanagadde  
>>> wrote:
>>>> On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes wrote:
>>>>> On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde wrote:
>>>>>> On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes wrote:
>>>>>>> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde wrote:
>>>>>>>> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote:
>>>>>>>>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde wrote:
>>>>>>>>>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>>>>>>>>>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>>>>>>>>>> >> [...]
>>>>>>>>>> >>> What about fmax/FFMAX?
>>>>>>>>>> >>
>>>>>>>>>> >> Feel free to try that out (it looks OT regarding the patch), but
>>>>>>>>>> >> fmax() looks glibc specific
>>>>>>>>>>
>>>>>>>>>> Seems they are actually ISO:
>>>>>>>>>> http://en.cppreference.com/w/c/numeric/math/fmax
>>>>>>>>>>
>>>>>>>>>> Can someone check availability on all of our platforms of interest
>>>>>>>>>> (e.g Microsoft)?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> fmax and fmin are only available on msvc using 2013 or newer.
>>>>>>>>> Currently the only msvc version without fmax/fmin that FFmpeg
>>>>>>>>> supports is 2012 which uses the C99 to C89 converter.
>>>>>>>>
>>>>>>>> And does that converter handle fmin, fmax, fmaxf, etc?
>>>>>>>> Does it need patches?
>>>>>>>> Bottom line: are they safe to use at the moment?
>>>>>>>>
>>>>>>>
>>>>>>> No, they are not.
>>>>>>>
>>>>>>> One thing I don't understand - why are we bothering with something
>>>>>>> that at best comes out as "same speed" from tests performed? (low
>>>>>>> number of runs are irrelevant as they are not statistically
>>>>>>> significant).
>>>>>>
>>>>>> Because if you actually bothered to run my random numbers benchmark
>>>>>> instead of posting with no basis claiming "statistical
>>>>>> insignificance", or for that matter bothered to actually check
>>>>>> the libc link, or even looked at Clement's asm test - you would
>>>>>> finally understand.
>>>>>>
>>>>>> Also, what needs to be done to get fmax, fmin, etc into the converter?
>>>>>>
>>>>>
>>>>> The converter doesn't provide any functions, just alters the syntax if
>>>>> needed. Functions not available cannot be fixed that way, sorry.
>>>>
>>>> Thanks for clarifying. I am still confused: how do we have llabs then?
>>>> Per MSDN, this was not present in MSVC 2012, and was added in MSVC
>>>> 2013 (looks like a similar case to fabs, fabsf).
>>>>
>>>
>>> Docs appear to be wrong in that particular case. It happens sometimes
>>> that functions are available but didn't get added to the docs.
>>
>> But with respect to fmin, fmax, etc - they were not available in 2012,
>> and the docs are right? Are you sure, and have you tested?
>>
>
> Yes those are definitely not available.

Thanks for checking. So you mentioned we can't add fmin/fmax with the
C99-to-C89 converter. I noticed we have a libavutil/libm.h: couldn't a
"portable" fmin/fmax etc. go there?

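To make the question concrete, a fallback of this kind could look roughly like the sketch below. It is only an illustration and not code from the thread or from libavutil: HAVE_FMAX and sketch_fmax are hypothetical names, and the NaN handling follows the C99 rule that a NaN argument is treated as missing data.

```c
#include <assert.h>
#include <math.h>

/* HAVE_FMAX is a hypothetical configure-style switch for this sketch;
 * set it to 1 when the platform's libm already provides fmax(). */
#ifndef HAVE_FMAX
#define HAVE_FMAX 0
#endif

#if HAVE_FMAX
#define sketch_fmax fmax
#else
/* Fallback: if exactly one argument is NaN, return the other one,
 * otherwise return the larger value. (x != x) is true only for NaN,
 * so no isnan() is required from the platform. */
static inline double sketch_fmax(double a, double b)
{
    if (a != a)
        return b;
    if (b != b)
        return a;
    return a > b ? a : b;
}
#endif

/* Small self-check for the sketch. */
static void sketch_fmax_check(void)
{
    assert(sketch_fmax(1.0, 2.0) == 2.0);
    assert(sketch_fmax(NAN, 3.0) == 3.0);
}
```

Whether such a fallback belongs in libm.h, and how the switch would be detected at configure time, is left open here.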


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-15 Thread Ganesh Ajjanagadde
On Wed, Oct 14, 2015 at 6:54 AM, Carl Eugen Hoyos  wrote:
> Hendrik Leppkes  gmail.com> writes:
>
>> One thing I don't understand - why are we bothering
>> with something that at best comes out as "same speed"
>> from tests performed? (low number of runs are
>> irrelevant as they are not statistically significant).
>
> Since the patches apparently save a function call,
> the question is imo: Why is this not measurable?
> Shouldn't a function call always have a clear impact?

Unless I am completely off with asm (entirely possible, it is not my
area of interest) - there is no "call" for either and both are being
inlined. The difference is that fabs() and FFABS are being optimized
differently.
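
As an illustration of the two shapes the compiler can end up with, here is a minimal sketch, assuming double is an IEEE-754 binary64 value. The helper names are hypothetical and not part of FFmpeg or glibc; the first mirrors the sign-bit masking seen in the fabs() disassembly posted in this thread, the second is simply the macro.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Same definition as the macro quoted in this thread. */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Hypothetical helper: clear the sign bit, no comparison and no branch.
 * Assumes IEEE-754 binary64 layout for double. */
static inline double sketch_fabs_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof(bits));   /* well-defined type punning */
    bits &= ~(UINT64_C(1) << 63);      /* drop the sign bit */
    memcpy(&x, &bits, sizeof(bits));
    return x;
}

/* Hypothetical helper: the compare-and-negate form the macro expands to. */
static inline double sketch_fabs_branch(double x)
{
    return FFABS(x);
}

/* Small self-check: both forms agree with fabs() on ordinary values. */
static void sketch_fabs_check(void)
{
    assert(sketch_fabs_bits(-2.5) == fabs(-2.5));
    assert(sketch_fabs_branch(-2.5) == fabs(-2.5));
}
```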

>
> Carl Eugen
>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-14 Thread Carl Eugen Hoyos
Ganesh Ajjanagadde  mit.edu> writes:

> >> I think the general case, it'd be nice to figure out
> >> why Carl's results are slightly different from yours
> >
> > Why do you think they are different at all?
> > Did you look at the tables?
> 
> They are different, and our conclusions are different 
> (in a slight way). Carl claims that the old code and 
> new code are mostly the same in speed, but in the 
> cases where they differ, the old code is faster.

No.

I wrote that both the numbers you posted and the numbers 
I posted show no proof that the new code is faster.
(Contrary to my numbers, your numbers show that the old 
code may be faster but that is irrelevant.)

The more important question is if you can see the same 
changes in the disassembly of af_astats.o as what 
ubitux posted here for a short test function?

Carl Eugen



Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-14 Thread Ganesh Ajjanagadde
On Wed, Oct 14, 2015 at 2:40 AM, Carl Eugen Hoyos  wrote:
> Ganesh Ajjanagadde  mit.edu> writes:
>
>> >> I think the general case, it'd be nice to figure out
>> >> why Carl's results are slightly different from yours
>> >
>> > Why do you think they are different at all?
>> > Did you look at the tables?
>>
>> They are different, and our conclusions are different
>> (in a slight way). Carl claims that the old code and
>> new code are mostly the same in speed, but in the
>> cases where they differ, the old code is faster.
>
> No.
>
> I wrote that both the numbers you posted and the numbers
> I posted show no proof that the new code is faster.

You posted a first reply with all manner of stuff saying essentially
that "I believe the old code may be faster". You have not withdrawn
that claim, and neither have I withdrawn mine.

> (Contrary to my numbers, your numbers show that the old
> code may be faster but that is irrelevant.)

What? My numbers actually show that the new code may be faster -
again: cycle times in the best case are identical, in the worst case
they favor the new code. My random number benchmark is also clearly in
favor of the new code. How this is "irrelevant" is beyond me. Also,
please don't spin my numbers into something they are not: this is
distracting the thread. Clement and Paul have already started moving
to using the function, others are free to see the numbers themselves.
Why you are trying to derail the benchmarks I posted is beyond me.

If you continue to post such stuff that has no basis, I might actually
get tempted into finding out for which floating point values the new
code is significantly faster, craft a relevant audio file, and post it
showing a huge performance difference - my random numbers benchmark
shows there must exist such values.

>
> The more important question is if you can see the same
> changes in the disassembly of af_astats.o as what
> ubitux posted here for a short test function?

I do. He uses clang/gcc, so do I. The reason (irrelevant) is that both
of us run Arch.

What is "more relevant" is if _you_ can see the changes on some non
Linux platform.

>
> Carl Eugen
>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-14 Thread Ganesh Ajjanagadde
On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver  wrote:
> On 14 October 2015 at 09:46, Ganesh Ajjanagadde  wrote:
>
>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde 
>> wrote:
>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch  wrote:
>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>> >> [...]
>> >>> What about fmax/FFMAX?
>> >>
>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>> >> looks glibc specific
>>
>> Seems they are actually ISO:
>> http://en.cppreference.com/w/c/numeric/math/fmax
>>
>> Can someone check availability on all of our platforms of interest
>> (e.g Microsoft)?
>>
>
> fmax and fmin are only available on msvc using 2013 or newer. Currently the
> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
> the C99 to C89 converter.

And does that converter handle fmin, fmax, fmaxf, etc?
Does it need patches?
Bottom line: are they safe to use at the moment?



Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-14 Thread Hendrik Leppkes
On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde  wrote:
> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver  wrote:
>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde  wrote:
>>
>>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde 
>>> wrote:
>>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch  wrote:
>>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>>> >> [...]
>>> >>> What about fmax/FFMAX?
>>> >>
>>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>>> >> looks glibc specific
>>>
>>> Seems they are actually ISO:
>>> http://en.cppreference.com/w/c/numeric/math/fmax
>>>
>>> Can someone check availability on all of our platforms of interest
>>> (e.g Microsoft)?
>>>
>>
>> fmax and fmin are only available on msvc using 2013 or newer. Currently the
>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
>> the C99 to C89 converter.
>
> And does that converter handle fmin, fmax, fmaxf, etc?
> Does it need patches?
> Bottom line: are they safe to use at the moment?
>

No, they are not.

One thing I don't understand - why are we bothering with something
that at best comes out as "same speed" from tests performed? (low
number of runs are irrelevant as they are not statistically
significant).


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-14 Thread Carl Eugen Hoyos
Ganesh Ajjanagadde  mit.edu> writes:

> What? My numbers actually show that the new code may be faster -

No, you are misunderstanding the numbers you posted.
(Or I misunderstand them but nobody said so yet.)

Highest runs are most relevant, skips have to be 
avoided (afaik).

[...]

> If you continue to post such stuff that has no basis, I might actually
> get tempted into finding out for which floating point values the new
> code is significantly faster, craft a relevant audio file, and post it
> showing a huge performance difference - my random numbers benchmark
> shows there must exist such values.

Please do so!

> > The more important question is if you can see the same
> > changes in the disassembly of af_astats.o as what
> > ubitux posted here for a short test function?
> 
> I do. He uses clang/gcc, so do I.

Sorry, my understanding fails here (I am not a native speaker):
You did look at the disassembly of af_astats.o and there is 
inlined code instead of a function call?

> The reason (irrelevant) is that both
> of us run Arch.
> 
> What is "more relevant" is if _you_ can see the changes 
> on some non Linux platform.

If you could show that it is faster on any platform 
I would already be happy!

Carl Eugen



Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-14 Thread Ganesh Ajjanagadde
On Wed, Oct 14, 2015 at 6:49 AM, Carl Eugen Hoyos  wrote:
> Ganesh Ajjanagadde  mit.edu> writes:
>
>> What? My numbers actually show that the new code may be faster -
>
> No, you are misunderstanding the numbers you posted.
> (Or I misunderstand them but nobody said so yet.)
>
> Highest runs are most relevant, skips have to be
> avoided (afaik).

Usually yes, but for such a small function, even the low runs should
be important.
Explain to me why I get consistently lower numbers with the new code
(on low runs) - if you are inclined to believe that they are
irrelevant, then why do I see a consistent trend there and not simply
"noise"?

>
> [...]
>
>> If you continue to post such stuff that has no basis, I might actually
>> get tempted into finding out for which floating point values the new
>> code is significantly faster, craft a relevant audio file, and post it
>> showing a huge performance difference - my random numbers benchmark
>> shows there must exist such values.
>
> Please do so!
>
>> > The more important question is if you can see the same
>> > changes in the disassembly of af_astats.o as what
>> > ubitux posted here for a short test function?
>>
>> I do. He uses clang/gcc, so do I.
>
> Sorry, my understanding fails here (I am not a native speaker):
> You did look at the disassembly of af_astats.o and there is
> inlined code instead of a function call?
>
>> The reason (irrelevant) is that both
>> of us run Arch.
>>
>> What is "more relevant" is if _you_ can see the changes
>> on some non Linux platform.
>
> If you could show that it is faster on any platform
> I would already be happy!

I already have with my random number benchmark. The original glibc
link I posted (which you essentially dismissed as irrelevant) also
shows why they switched away from their macros to fabs, fabsf, etc.

>
> Carl Eugen
>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-13 Thread Carl Eugen Hoyos
Ronald S. Bultje  gmail.com> writes:

> I think the general case, it'd be nice to figure out 
> why Carl's results are slightly different from yours

Why do you think they are different at all?
Did you look at the tables?

Carl Eugen



Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-13 Thread Ganesh Ajjanagadde
On Tue, Oct 13, 2015 at 11:15 PM, Carl Eugen Hoyos  wrote:
> Ronald S. Bultje  gmail.com> writes:
>
>> I think the general case, it'd be nice to figure out
>> why Carl's results are slightly different from yours
>
> Why do you think they are different at all?
> Did you look at the tables?

They are different, and our conclusions are different (in a slight way).
Carl claims that the old code and new code are mostly the same in
speed, but in the cases where they differ, the old code is faster.
I claim the opposite: they are mostly the same, but in the cases where
they differ, the new code is faster.

>
> Carl Eugen
>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-13 Thread Clément Bœsch
On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote:
> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde  wrote:
> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos  wrote:
> >> Ganesh Ajjanagadde  mit.edu> writes:
> >>
> >>> Bench from libavfilter/astats on a 15 min clip.
> >>
> >> I believe that your test would indicate that the
> >> old variant is faster or that no result can be
> >> given which is what my tests show.
> 
> Also, how you can possibly believe that the old variant is faster is
> beyond me given the astonishing amount of work by Intel, Red Hat, and
> others to create the absolutely best performing libc.
> 
> Just have a look at
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281,
> it gives an idea of the extreme lengths they go to.
> 

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD

[/tmp]☭ cat a.c
#include <math.h>
#include <stdlib.h>

#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

double f1d(double x) { return fabs(x); }
double f2d(double x) { return FFABS(x); }

int f1i(int x) { return abs(x); }
int f2i(int x) { return FFABS(x); }
[/tmp]☭ gcc -O2 -c a.c && objdump -d -Mintel a.o

a.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <f1d>:
   0:   f2 0f 10 0d 00 00 00    movsd  xmm1,QWORD PTR [rip+0x0]        # 8 <f1d+0x8>
   7:   00
   8:   66 0f 54 c1             andpd  xmm0,xmm1
   c:   c3                      ret
   d:   0f 1f 00                nop    DWORD PTR [rax]

0000000000000010 <f2d>:
  10:   66 0f 2e 05 00 00 00    ucomisd xmm0,QWORD PTR [rip+0x0]        # 18 <f2d+0x8>
  17:   00
  18:   72 06                   jb     20 <f2d+0x10>
  1a:   f3 c3                   repz ret
  1c:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]
  20:   f2 0f 10 0d 00 00 00    movsd  xmm1,QWORD PTR [rip+0x0]        # 28 <f2d+0x18>
  27:   00
  28:   66 0f 57 c1             xorpd  xmm0,xmm1
  2c:   c3                      ret
  2d:   0f 1f 00                nop    DWORD PTR [rax]

0000000000000030 <f1i>:
  30:   89 fa                   mov    edx,edi
  32:   89 f8                   mov    eax,edi
  34:   c1 fa 1f                sar    edx,0x1f
  37:   31 d0                   xor    eax,edx
  39:   29 d0                   sub    eax,edx
  3b:   c3                      ret
  3c:   0f 1f 40 00             nop    DWORD PTR [rax+0x0]

0000000000000040 <f2i>:
  40:   89 fa                   mov    edx,edi
  42:   89 f8                   mov    eax,edi
  44:   c1 fa 1f                sar    edx,0x1f
  47:   31 d0                   xor    eax,edx
  49:   29 d0                   sub    eax,edx
  4b:   c3                      ret
[/tmp]☭ 

So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is
essentially identical to FFABS().

I have similar results with clang (3.7.0).

Conclusion: using fabs() looks better with at least recent versions of clang
and GCC on x86-64 (but may introduce slight behaviour changes?)

To be more rigorous, it would be interesting to compare on different arch &
compilers, but changing FFABS() with fabs() sounds OK to me.

-- 
Clément B.




Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-13 Thread Paul B Mahol
On 10/13/15, Clement Boesch  wrote:
> On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote:
>> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde 
>> wrote:
>> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos 
>> > wrote:
>> >> Ganesh Ajjanagadde  mit.edu> writes:
>> >>
>> >>> Bench from libavfilter/astats on a 15 min clip.
>> >>
>> >> I believe that your test would indicate that the
>> >> old variant is faster or that no result can be
>> >> given which is what my tests show.
>>
>> Also, how you can possibly believe that the old variant is faster is
>> beyond me given the astonishing amount of work by Intel, Red Hat, and
>> others to create the absolutely best performing libc.
>>
>> Just have a look at
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281,
>> it gives an idea of the extreme lengths they go to.
>>
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD
>
> [...]
>
> So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is
> essentially identical to FFABS().
>
> I have similar results with clang (3.7.0).
>
> Conclusion: using fabs() looks better with at least recent versions of
> clang
> and GCC on x86-64 (but may introduce slight behaviour changes?)
>
> To be more rigorous, it would be interesting to compare on different arch &
> compilers, but changing FFABS() with fabs() sounds OK to me.

What about fmax/FFMAX?
> --
> Clement B.
>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-13 Thread Clément Bœsch
On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
[...]
> What about fmax/FFMAX?

Feel free to try that out (it looks OT regarding the patch), but fmax()
looks glibc specific

-- 
Clément B.




Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-13 Thread Ganesh Ajjanagadde
On Tue, Oct 13, 2015 at 2:45 AM, Clément Bœsch  wrote:
> On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote:
>> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde  
>> wrote:
>> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos  
>> > wrote:
>> >> Ganesh Ajjanagadde  mit.edu> writes:
>> >>
>> >>> Bench from libavfilter/astats on a 15 min clip.
>> >>
>> >> I believe that your test would indicate that the
>> >> old variant is faster or that no result can be
>> >> given which is what my tests show.
>>
>> Also, how you can possibly believe that the old variant is faster is
>> beyond me given the astonishing amount of work by Intel, Red Hat, and
>> others to create the absolutely best performing libc.
>>
>> Just have a look at
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281,
>> it gives an idea of the extreme lengths they go to.
>>
>
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD
>
> [...]
>
> So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is
> essentially identical to FFABS().

Yes, on integers they are identical. Differences come on floating
point, which is the point of my patch. Thanks for showing the asm.

>
> I have similar results with clang (3.7.0).
>
> Conclusion: using fabs() looks better with at least recent versions of clang
> and GCC on x86-64 (but may introduce slight behaviour changes?)

There might be some behavior changes (floating point is not exact,
etc), but at least they are governed by the ISO C document. FATE still
passes.
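
One concrete case where the two differ is negative zero. The following is a minimal sketch, assuming IEEE-754 semantics and no -ffast-math; the helper names are hypothetical. Since -0.0 >= 0 is true, the macro returns its argument unchanged, sign bit included, whereas fabs() always clears the sign bit.

```c
#include <assert.h>
#include <math.h>

/* Same definition as the macro quoted in this thread. */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Hypothetical helpers: return 1 if the result has its sign bit set. */
static inline int sketch_ffabs_sign(double x)
{
    return signbit(FFABS(x)) != 0;
}

static inline int sketch_fabs_sign(double x)
{
    return signbit(fabs(x)) != 0;
}

/* Small self-check: the results differ only in the sign of zero. */
static void sketch_sign_check(void)
{
    assert(sketch_ffabs_sign(-0.0) == 1); /* macro keeps -0.0 */
    assert(sketch_fabs_sign(-0.0) == 0);  /* fabs() yields +0.0 */
    assert(FFABS(-1.5) == fabs(-1.5));    /* ordinary values agree */
}
```

Both results compare equal as numbers (-0.0 == 0.0), so this only matters to code that inspects the sign bit or divides by the result.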

>
> To be more rigorous, it would be interesting to compare on different arch &
> compilers, but changing FFABS() with fabs() sounds OK to me.
>
> --
> Clément B.
>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-13 Thread Ganesh Ajjanagadde
On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde  wrote:
> On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch  wrote:
>> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>> [...]
>>> What about fmax/FFMAX?
>>
>> Feel free to try that out (it looks OT regarding the patch), but fmax()
>> looks glibc specific

Seems they are actually ISO:
http://en.cppreference.com/w/c/numeric/math/fmax

Can someone check availability on all of our platforms of interest
(e.g Microsoft)?

>
> Maybe (long term) we can use an av_fabs, av_fabsf, av_fmin/av_fmax (or
> ff_, avpriv_, etc) that pick out the right thing for different
> configurations. It will need something split between configure/header
> guards. I am willing to do this, once everyone is convinced.
>
>>
>> --
>> Clément B.
>>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-13 Thread Ganesh Ajjanagadde
On Tue, Oct 13, 2015 at 8:58 PM, Ronald S. Bultje  wrote:
> Hi,
>
> On Tue, Oct 13, 2015 at 8:09 PM, Ganesh Ajjanagadde 
> wrote:
>
>> On Tue, Oct 13, 2015 at 2:45 AM, Clément Bœsch  wrote:
>> > On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote:
>> >> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde 
>> wrote:
>> >> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos 
>> wrote:
>> >> >> Ganesh Ajjanagadde  mit.edu> writes:
>> >> >>
>> >> >>> Bench from libavfilter/astats on a 15 min clip.
>> >> >>
>> >> >> I believe that your test would indicate that the
>> >> >> old variant is faster or that no result can be
>> >> >> given which is what my tests show.
>> >>
>> >> Also, how you can possibly believe that the old variant is faster is
>> >> beyond me given the astonishing amount of work by Intel, Red Hat, and
>> >> others to create the absolutely best performing libc.
>> >>
>> >> Just have a look at
>> >>
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281
>> ,
>> >> it gives an idea of the extreme lengths they go to.
>> >>
>> >
>> >
>> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD
>> >
>> > [...]
>> >
>> > So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is
>> > essentially identical to FFABS().
>> >
>> > I have similar results with clang (3.7.0).
>> >
>> > Conclusion: using fabs() looks better with at least recent versions of
>> clang
>> > and GCC on x86-64 (but may introduce slight behaviour changes?)
>> >
>> > To be more rigorous, it would be interesting to compare on different
>> arch &
>> > compilers, but changing FFABS() with fabs() sounds OK to me.
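
For readers following the disassembly quoted above: the fabs() path is a
single AND with a constant mask, while the FFABS() path on doubles compares
and branches. The integer variants both compile to a shift/xor/sub sequence.
Below is a minimal sketch of those two branch-free idioms in portable C. The
function names abs_by_mask and abs_by_shift are made up for illustration and
are not part of FFmpeg or glibc. The sketch assumes a 64-bit IEEE 754 double,
a 32-bit int, and an arithmetic right shift for negative values.

```c
#include <stdint.h>
#include <string.h>

/* Branch-free |x| for a double: clear bit 63, the IEEE 754 binary64 sign
 * bit. This mirrors the load-mask-then-andpd sequence seen for fabs().
 * Assumes double is a 64-bit IEEE 754 type. */
double abs_by_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* well-defined type punning */
    bits &= UINT64_C(0x7fffffffffffffff);    /* drop the sign bit only */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Branch-free |x| for an int, mirroring the sar/xor/sub sequence.
 * Assumes a 32-bit int and an arithmetic right shift of negative values
 * (implementation-defined in C, but the usual behaviour on x86-64). */
int abs_by_shift(int x)
{
    int mask = x >> 31;          /* 0 for x >= 0, -1 for x < 0 */
    return (x ^ mask) - mask;    /* x unchanged, or ~x + 1 == -x */
}
```

Neither function is meant as a replacement for fabs() or abs(); they only
spell out what the compiler already emits in the listing.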
>>
>> I noticed that this is being applied piecemeal, and some of it has been
>> pushed. Does that mean I am free to push (with the reduced commit
>> message) as well?
>
>
> You'll notice that Paul did it for the filters he maintains. I'm fine with
> you doing this for any code I maintain (no further review required). You
> can find maintainers for each piece of code in git log or MAINTAINERS. It
> sounds like Paul is fine with this also. For the general case, I think it'd
> be nice to figure out why Carl's results are slightly different from yours
> (or maybe it's noise?). If we can resolve that, I don't think there are any
> further outstanding objections, right?

No other outstanding objections - the only serious concern is
availability (which is a non-issue since we were already using fabs,
fabsf sporadically). Carl's issues should be either noise, or a
bad/slow libc fabs implementation. Hence I asked him for his
config.

I will give respective maintainers a week for slowly adding this
stuff. To reiterate, I have not touched avcodec as it is mostly
integer math anyway - if someone could point out some "hotspots" in
avcodec with this issue, that would be great.

>
> Also, 

Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-12 Thread Ganesh Ajjanagadde
On Mon, Oct 12, 2015 at 7:59 AM, Ganesh Ajjanagadde  wrote:
> On Mon, Oct 12, 2015 at 7:46 AM, Carl Eugen Hoyos  wrote:
>> Ganesh Ajjanagadde  mit.edu> writes:
>>
>>> It is well known that fabs and fabsf are at least as fast and usually
>>> faster than the FFABS macro, at least on the gcc+glibc combination.
>>
>> I wasn't aware of this.
>> And I believe we support other compilers and other
>> libc implementations.
>
> Indeed, which is why performance comparisons are welcome. I argue
> below why any sane configuration should not regress performance wise.
> This is also "relevant information" in my view.
>
>>
>>> For instance, see the reference:
>>> http://patchwork.sourceware.org/patch/6735/.
>>> This was a patch to glibc in order to remove their usages. Given their
>>> general performance obsession (more than FFmpeg in many cases), they
>>> have ensured that fabs and fabsf never perform worse than FFABS.
>>
>> Ok but is this really related?
>
> The reference is, the comment may not be, I was slightly annoyed at
> FFABS usage when libc provides them on all our platforms, and wanted a
> justification that would appeal to the FFmpeg crowd, namely performance
> to move away from them.
>
>>
>>> I have tested on x86-64 Haswell with GCC 5.2 - even with no strict IEEE
>>> mode enabled, and just the standard -O3 optimizations, there is a
>>> performance benefit.
>>
>> This is the only relevant information imo.
>> Please provide (very, very short) information
>> on what you tested.
>
> Random integers, same style as before. I have not posted numbers,
> since my numbers are anyway meaningless: I lack non
> x86-64+(gcc/clang)+glibc configurations.
> As for that being the only relevant message, I do intend to shorten
> the message. The long stuff was simply my own personal motivation to
> make people understand why I did this stuff. Otherwise, I would have
> sent a separate message anyway in the patch thread, let me know what
> style you prefer.
>
>>
>> Since you mention libc so often: Does the patch
>> work on win*, aix and other strange platforms?
>
> Why not, any standard, conformant fabs/fabsf should. Again, I lack the
> configurations and am just a university student with a single laptop.
> fabs and fabsf are already being used elsewhere. If anything, they
> are far better specified on IEEE 754 than FFABS - behavior with NaN,
> Inf, etc.
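
One concrete instance of the "better specified" point is signed zero. The
following is a minimal sketch, not code from the patch: the two helper names
are hypothetical, and the FFABS definition is the one quoted elsewhere in
this thread. Because -0.0 >= 0 is true, the macro returns its argument
unchanged, sign bit included, whereas fabs() clears the sign bit.

```c
#include <math.h>

#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Returns 1 if FFABS leaves the sign bit set on negative zero.
 * -0.0 >= 0 compares true, so the macro takes the "(a)" arm. */
int ffabs_keeps_negative_zero(void)
{
    volatile double z = -0.0;   /* volatile discourages constant folding */
    double r = FFABS(z);
    return signbit(r) != 0;
}

/* Returns 1 if fabs() yields +0.0 for an input of -0.0. */
int fabs_clears_negative_zero(void)
{
    volatile double z = -0.0;
    return signbit(fabs(z)) == 0;
}
```

Both helpers should return 1 on an IEEE 754 platform built without
value-changing options such as -ffast-math. This is the kind of "slight
behaviour change" a switch from the macro to fabs() can introduce.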

Bench from libavfilter/astats on a 15 min clip. Of course the
difference is slight, but nonetheless it exists. The best case is the
same, but look at the difference in the worst cases (as was mentioned
in the glibc link I gave, I suspect some trickery for subnormal
floats/Inf/0.0). By the way, I can show results skewing even more
heavily in favor of fabs by using "random" floating point numbers,
random in the sense of being a random 64 bit pattern (same style as my
old crude bench - fill a large array, and test). There, believe it or
not, I was getting a nearly 1.5-2x improvement.

Anyway, here it is:
old:
   4230 decicycles in abs,       1 runs,      0 skips
   2520 decicycles in abs,       2 runs,      0 skips
   1635 decicycles in abs,       4 runs,      0 skips
    967 decicycles in abs,       8 runs,      0 skips
    635 decicycles in abs,      16 runs,      0 skips
    473 decicycles in abs,      32 runs,      0 skips
    389 decicycles in abs,      64 runs,      0 skips
    350 decicycles in abs,     128 runs,      0 skips
    331 decicycles in abs,     256 runs,      0 skips
    321 decicycles in abs,     512 runs,      0 skips
    319 decicycles in abs,    1024 runs,      0 skips
    318 decicycles in abs,    2048 runs,      0 skips
    315 decicycles in abs,    4096 runs,      0 skips
    317 decicycles in abs,    8192 runs,      0 skips
    335 decicycles in abs,   16384 runs,      0 skips
    335 decicycles in abs,   32768 runs,      0 skips
    333 decicycles in abs,   65536 runs,      0 skips
    342 decicycles in abs,  131072 runs,      0 skips
    340 decicycles in abs,  262144 runs,      0 skips
    345 decicycles in abs,  524285 runs,      3 skips
    348 decicycles in abs, 1048565 runs,     11 skips
    351 decicycles in abs, 2097129 runs,     23 skips
    352 decicycles in abs, 4194252 runs,     52 skips
    350 decicycles in abs, 8388498 runs,    110 skips
    351 decicycles in abs,16776993 runs,    223 skips
    352 decicycles in abs,33553999 runs,    433 skips
    351 decicycles in abs,67108036 runs,    828 skips
new:
   3540 decicycles in abs,       1 runs,      0 skips
   2160 decicycles in abs,       2 runs,      0 skips
   1447 decicycles in abs,       4 runs,      0 skips
    881 decicycles in abs,       8 runs,      0 skips
    594 decicycles in abs,      16 runs,      0 skips
    455 decicycles in abs,      32 runs,      0 skips
    382 decicycles in abs,      64 runs,      0 skips
    361 decicycles in abs,     128 runs,      0 skips
356 decicycles in abs, 

Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-12 Thread Carl Eugen Hoyos
Ganesh Ajjanagadde  mit.edu> writes:
> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
> > Ganesh Ajjanagadde  mit.edu> writes:
> >
> >> Bench from libavfilter/astats on a 15 min clip.
> >
> > I believe that your test would indicate that the
> > old variant is faster or that no result can be
> > given which is what my tests show.
> 
> Look at the bench and the numbers again, I have 
> provided it above.

Ok:
old:
    389 decicycles in abs,      64 runs,      0 skips
    350 decicycles in abs,     128 runs,      0 skips
    331 decicycles in abs,     256 runs,      0 skips
    321 decicycles in abs,     512 runs,      0 skips
    319 decicycles in abs,    1024 runs,      0 skips
    318 decicycles in abs,    2048 runs,      0 skips
    315 decicycles in abs,    4096 runs,      0 skips
    317 decicycles in abs,    8192 runs,      0 skips
    335 decicycles in abs,   16384 runs,      0 skips
    335 decicycles in abs,   32768 runs,      0 skips

new:
    382 decicycles in abs,      64 runs,      0 skips
    361 decicycles in abs,     128 runs,      0 skips
    356 decicycles in abs,     256 runs,      0 skips
    334 decicycles in abs,     512 runs,      0 skips
    322 decicycles in abs,    1024 runs,      0 skips
    317 decicycles in abs,    2048 runs,      0 skips
    315 decicycles in abs,    4096 runs,      0 skips
    341 decicycles in abs,    8192 runs,      0 skips
    363 decicycles in abs,   16383 runs,      1 skips
    342 decicycles in abs,   32767 runs,      1 skips
Numbers with high skips or low runs are not so 
relevant afaik.

> They are essentially identical in the best case 
> (most number of runs), the new variant is faster in 
> the worst case.

I would say the opposite is true but we can certainly 
agree that there is no proof that one is faster.

> You have not provided a bench proving otherwise.

old:
user    0m20.338s
user    0m20.408s
user    0m20.287s
user    0m20.365s
user    0m20.208s
new:
user    0m20.197s
user    0m20.577s
user    0m20.434s
user    0m20.322s
user    0m20.356s
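
A quick way to judge whether two such sets of timings differ is to compare
the gap between their means with the run-to-run spread. The sketch below is
illustrative only: the helper names mean_of and sample_variance_of are made
up, and the arrays simply copy the ten user times listed above.

```c
#include <stddef.h>

/* User times in seconds, copied from the two lists above. */
static const double old_user[] = { 20.338, 20.408, 20.287, 20.365, 20.208 };
static const double new_user[] = { 20.197, 20.577, 20.434, 20.322, 20.356 };

/* Arithmetic mean of n values (n must be at least 1). */
double mean_of(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Unbiased sample variance of n values (n must be at least 2). */
double sample_variance_of(const double *v, size_t n)
{
    double m = mean_of(v, n);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (v[i] - m) * (v[i] - m);
    return acc / (double)(n - 1);
}
```

By this arithmetic the means come out at roughly 20.32 s (old) and 20.38 s
(new), a gap of about 0.06 s, while the standard deviations (square roots of
the variances) are roughly 0.08 s and 0.14 s. With five runs each, a gap
smaller than the spread does not support a claim in either direction.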

> > I am not sure if it makes sense to apply a patch
> > that is meant to improve speed if this improvement
> > can't be shown.
> 
> I believe I have shown it above clearly.

Imo, you have shown clearly that neither variant can 
be shown to be faster.

Carl Eugen



Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-12 Thread Ganesh Ajjanagadde
On Mon, Oct 12, 2015 at 4:57 PM, Ganesh Ajjanagadde  wrote:
> On Mon, Oct 12, 2015 at 7:59 AM, Ganesh Ajjanagadde  wrote:
>> On Mon, Oct 12, 2015 at 7:46 AM, Carl Eugen Hoyos  wrote:
>>> Ganesh Ajjanagadde  mit.edu> writes:
>>>
>>>> It is well known that fabs and fabsf are at least as fast and usually
>>>> faster than the FFABS macro, at least on the gcc+glibc combination.
>>>
>>> I wasn't aware of this.
>>> And I believe we support other compilers and other
>>> libc implementations.
>>
>> Indeed, which is why performance comparisons are welcome. I argue
>> below why any sane configuration should not regress performance wise.
>> This is also "relevant information" in my view.
>>
>>>
>>>> For instance, see the reference:
>>>> http://patchwork.sourceware.org/patch/6735/.
>>>> This was a patch to glibc in order to remove their usages. Given their
>>>> general performance obsession (more than FFmpeg in many cases), they
>>>> have ensured that fabs and fabsf never perform worse than FFABS.
>>>
>>> Ok but is this really related?
>>
>> The reference is, the comment may not be, I was slightly annoyed at
>> FFABS usage when libc provides them on all our platforms, and wanted a
>> justification that would appeal to the FFmpeg crowd, namely performance
>> to move away from them.
>>
>>>
>>>> I have tested on x86-64 Haswell with GCC 5.2 - even with no strict IEEE
>>>> mode enabled, and just the standard -O3 optimizations, there is a
>>>> performance benefit.
>>>
>>> This is the only relevant information imo.
>>> Please provide (very, very short) information
>>> on what you tested.
>>
>> Random integers, same style as before. I have not posted numbers,
>> since my numbers are anyway meaningless: I lack non
>> x86-64+(gcc/clang)+glibc configurations.
>> As for that being the only relevant message, I do intend to shorten
>> the message. The long stuff was simply my own personal motivation to
>> make people understand why I did this stuff. Otherwise, I would have
>> sent a separate message anyway in the patch thread, let me know what
>> style you prefer.
>>
>>>
>>> Since you mention libc so often: Does the patch
>>> work on win*, aix and other strange platforms?
>>
>> Why not, any standard, conformant fabs/fabsf should. Again, I lack the
>> configurations and am just a university student with a single laptop.
>> fabs and fabsf are already being used elsewhere. If anything, they
>> are far better specified on IEEE 754 than FFABS - behavior with NaN,
>> Inf, etc.
>
> Bench from libavfilter/astats on a 15 min clip. Of course the
> difference is slight, but nonetheless it exists. The best case is the
> same, but look at the difference in the worst cases (as was mentioned
> in the glibc link I gave, I suspect some trickery for subnormal
> floats/Inf/0.0). By the way, I can show results skewing even more
> heavily in favor of fabs by using "random" floating point numbers,
> random in the sense of being a random 64 bit pattern (same style as my
> old crude bench - fill a large array, and test). There, believe it or
> not, I was getting a nearly 1.5-2x improvement.
>
> Anyway, here it is:
> old:
>    4230 decicycles in abs,       1 runs,      0 skips
>    2520 decicycles in abs,       2 runs,      0 skips
>    1635 decicycles in abs,       4 runs,      0 skips
>     967 decicycles in abs,       8 runs,      0 skips
>     635 decicycles in abs,      16 runs,      0 skips
>     473 decicycles in abs,      32 runs,      0 skips
>     389 decicycles in abs,      64 runs,      0 skips
>     350 decicycles in abs,     128 runs,      0 skips
>     331 decicycles in abs,     256 runs,      0 skips
>     321 decicycles in abs,     512 runs,      0 skips
>     319 decicycles in abs,    1024 runs,      0 skips
>     318 decicycles in abs,    2048 runs,      0 skips
>     315 decicycles in abs,    4096 runs,      0 skips
>     317 decicycles in abs,    8192 runs,      0 skips
>     335 decicycles in abs,   16384 runs,      0 skips
>     335 decicycles in abs,   32768 runs,      0 skips
>     333 decicycles in abs,   65536 runs,      0 skips
>     342 decicycles in abs,  131072 runs,      0 skips
>     340 decicycles in abs,  262144 runs,      0 skips
>     345 decicycles in abs,  524285 runs,      3 skips
>     348 decicycles in abs, 1048565 runs,     11 skips
>     351 decicycles in abs, 2097129 runs,     23 skips
>     352 decicycles in abs, 4194252 runs,     52 skips
>     350 decicycles in abs, 8388498 runs,    110 skips
>     351 decicycles in abs,16776993 runs,    223 skips
>     352 decicycles in abs,33553999 runs,    433 skips
>     351 decicycles in abs,67108036 runs,    828 skips
> new:
>    3540 decicycles in abs,       1 runs,      0 skips
>    2160 decicycles in abs,       2 runs,      0 skips
>    1447 decicycles in abs,       4 runs,      0 skips
>     881 decicycles in abs,       8 runs,      0 skips
> 594 

Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-12 Thread Ganesh Ajjanagadde
On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos  wrote:
> Ganesh Ajjanagadde  mit.edu> writes:
>
>> Bench from libavfilter/astats on a 15 min clip.
>
> I believe that your test would indicate that the
> old variant is faster or that no result can be
> given which is what my tests show.

Look at the bench and the numbers again, I have provided it above.
They are essentially identical in the best case (most number of runs),
the new variant is faster in the worst case. You have not provided a
bench proving otherwise.

>
> I am not sure if it makes sense to apply a patch
> that is meant to improve speed if this improvement
> can't be shown.

I believe I have shown it above clearly.

>
> Carl Eugen
>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-12 Thread Ganesh Ajjanagadde
On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde  wrote:
> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos  wrote:
>> Ganesh Ajjanagadde  mit.edu> writes:
>>
>>> Bench from libavfilter/astats on a 15 min clip.
>>
>> I believe that your test would indicate that the
>> old variant is faster or that no result can be
>> given which is what my tests show.

Also, how you can possibly believe that the old variant is faster is
beyond me given the astonishing amount of work by Intel, Red Hat, and
others to create the absolutely best performing libc.

Just have a look at
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281,
it gives an idea of the extreme lengths they go to.

>
> Look at the bench and the numbers again, I have provided it above.
> They are essentially identical in the best case (most number of runs),
> the new variant is faster in the worst case. You have not provided a
> bench proving otherwise.
>
>>
>> I am not sure if it makes sense to apply a patch
>> that is meant to improve speed if this improvement
>> can't be shown.
>
> I believe I have shown it above clearly.
>
>>
>> Carl Eugen
>>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-12 Thread Ganesh Ajjanagadde
On Tue, Oct 13, 2015 at 12:44 AM, Carl Eugen Hoyos  wrote:
> Ganesh Ajjanagadde  mit.edu> writes:
>> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
>> > Ganesh Ajjanagadde  mit.edu> writes:
>> >
>> >> Bench from libavfilter/astats on a 15 min clip.
>> >
>> > I believe that your test would indicate that the
>> > old variant is faster or that no result can be
>> > given which is what my tests show.
>>
>> Look at the bench and the numbers again, I have
>> provided it above.
>
> Ok:
> old:
>     389 decicycles in abs,      64 runs,      0 skips
>     350 decicycles in abs,     128 runs,      0 skips
>     331 decicycles in abs,     256 runs,      0 skips
>     321 decicycles in abs,     512 runs,      0 skips
>     319 decicycles in abs,    1024 runs,      0 skips
>     318 decicycles in abs,    2048 runs,      0 skips
>     315 decicycles in abs,    4096 runs,      0 skips
>     317 decicycles in abs,    8192 runs,      0 skips
>     335 decicycles in abs,   16384 runs,      0 skips
>     335 decicycles in abs,   32768 runs,      0 skips
>
> new:
>     382 decicycles in abs,      64 runs,      0 skips
>     361 decicycles in abs,     128 runs,      0 skips
>     356 decicycles in abs,     256 runs,      0 skips
>     334 decicycles in abs,     512 runs,      0 skips
>     322 decicycles in abs,    1024 runs,      0 skips
>     317 decicycles in abs,    2048 runs,      0 skips
>     315 decicycles in abs,    4096 runs,      0 skips
>     341 decicycles in abs,    8192 runs,      0 skips
>     363 decicycles in abs,   16383 runs,      1 skips
>     342 decicycles in abs,   32767 runs,      1 skips
> Numbers with high skips or low runs are not so
> relevant afaik.

Not so relevant, but as I said: it is still better.

>
>> They are essentially identical in the best case
>> (most number of runs), the new variant is faster in
>> the worst case.
>
> I would say the opposite is true but we can certainly
> agree that there is no proof that one is faster.

Do a random float test, the difference is more pronounced.

>
>> You have not provided a bench proving otherwise.
>
> old:
> user    0m20.338s
> user    0m20.408s
> user    0m20.287s
> user    0m20.365s
> user    0m20.208s
> new:
> user    0m20.197s
> user    0m20.577s
> user    0m20.434s
> user    0m20.322s
> user    0m20.356s

The difference here is imo too small to say anything. My point is
precisely this: on most inputs, there is no difference. On bad (worst
case) inputs, using fabs instead of the macro is far superior. The
random float bench proves this. Translating that to some audio file
should be easy: I suspect placing most samples near a silence value
(0) does this.
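
For anyone wanting to reproduce the "random 64 bit pattern" input described
earlier in the thread, here is a minimal sketch. It is not the author's
bench: the xorshift64 generator and the names fill_random_bits and
count_sign_bits are assumptions made for illustration. The point is that
the sign bit of each element is unpredictable, which is where a
compare-and-branch abs would be expected to suffer from mispredictions and a
mask-based fabs would not.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One xorshift64 step (Marsaglia); the state must be nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Fill dst with doubles whose 64 bits are pseudo-random, so the sign and
 * exponent vary from element to element. Some patterns decode to NaN or
 * Inf, which is intentional for this kind of stress input. */
void fill_random_bits(double *dst, size_t n, uint64_t seed)
{
    uint64_t state = seed ? seed : 1;   /* avoid the all-zero state */
    for (size_t i = 0; i < n; i++) {
        uint64_t bits = xorshift64(&state);
        memcpy(&dst[i], &bits, sizeof bits);
    }
}

/* Count elements whose sign bit is set (works for NaN as well). */
size_t count_sign_bits(const double *src, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &src[i], sizeof bits);
        count += (size_t)(bits >> 63);
    }
    return count;
}
```

A timing loop over such an array can then compare fabs() against FFABS().
Whether the gap shows up on real audio depends on how often the sign of
consecutive samples flips, which this sketch does not measure.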

>
>> > I am not sure if it makes sense to apply a patch
>> > that is meant to improve speed if this improvement
>> > can't be shown.
>>
>> I believe I have shown it above clearly.
>
> Imo, you have shown clearly that neither variant can
> be shown to be faster.
>
> Carl Eugen
>


Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS

2015-10-12 Thread Ganesh Ajjanagadde
On Tue, Oct 13, 2015 at 1:03 AM, Ganesh Ajjanagadde  wrote:
> On Tue, Oct 13, 2015 at 12:44 AM, Carl Eugen Hoyos  wrote:
>> Ganesh Ajjanagadde  mit.edu> writes:
>>> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
>>> > Ganesh Ajjanagadde  mit.edu> writes:
>>> >
>>> >> Bench from libavfilter/astats on a 15 min clip.
>>> >
>>> > I believe that your test would indicate that the
>>> > old variant is faster or that no result can be
>>> > given which is what my tests show.
>>>
>>> Look at the bench and the numbers again, I have
>>> provided it above.
>>
>> Ok:
>> old:
>>     389 decicycles in abs,      64 runs,      0 skips
>>     350 decicycles in abs,     128 runs,      0 skips
>>     331 decicycles in abs,     256 runs,      0 skips
>>     321 decicycles in abs,     512 runs,      0 skips
>>     319 decicycles in abs,    1024 runs,      0 skips
>>     318 decicycles in abs,    2048 runs,      0 skips
>>     315 decicycles in abs,    4096 runs,      0 skips
>>     317 decicycles in abs,    8192 runs,      0 skips
>>     335 decicycles in abs,   16384 runs,      0 skips
>>     335 decicycles in abs,   32768 runs,      0 skips
>>
>> new:
>>     382 decicycles in abs,      64 runs,      0 skips
>>     361 decicycles in abs,     128 runs,      0 skips
>>     356 decicycles in abs,     256 runs,      0 skips
>>     334 decicycles in abs,     512 runs,      0 skips
>>     322 decicycles in abs,    1024 runs,      0 skips
>>     317 decicycles in abs,    2048 runs,      0 skips
>>     315 decicycles in abs,    4096 runs,      0 skips
>>     341 decicycles in abs,    8192 runs,      0 skips
>>     363 decicycles in abs,   16383 runs,      1 skips
>>     342 decicycles in abs,   32767 runs,      1 skips
>> Numbers with high skips or low runs are not so
>> relevant afaik.
>
> Not so relevant, but as I said: it is still better.
>
>>
>>> They are essentially identical in the best case
>>> (most number of runs), the new variant is faster in
>>> the worst case.
>>
>> I would say the opposite is true but we can certainly
>> agree that there is no proof that one is faster.
>
> Do a random float test, the difference is more pronounced.

Simple bench for all abs stuff:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include <time.h>

#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))
#define NUM_TRIALS 10
#define NUM_ITER 10

static float f[NUM_TRIALS];
static double g[NUM_TRIALS];
static int i[NUM_TRIALS];
static long long ll[NUM_TRIALS];

int main(void) {
    int c, d;
    clock_t start, end;
    double time;
    float abs_f;
    double abs_d;
    int abs_i;
    long long abs_ll;

    for (c = 0; c < NUM_TRIALS; ++c) {
        ll[c] = random();
        i[c] = rand();
        f[c] = (float)rand()/(float)(RAND_MAX/FLT_MAX);
        g[c] = (double)random()/(double)(RAND_MAX/DBL_MAX);
    }

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = fabsf(f[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabsf: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = FFABS(f[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = fabs(g[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabs: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = FFABS(g[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = abs(i[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("abs: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = FFABS(i[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = llabs(ll[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("llabs: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = FFABS(ll[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    return 0;
}

>
>>
>>> You have not provided a bench proving otherwise.
>>
>> old:
>> user    0m20.338s
>> user    0m20.408s
>> user    0m20.287s
>> user    0m20.365s
>> user    0m20.208s
>> new:
>> user    0m20.197s
>> user    0m20.577s
>> user    0m20.434s
>> user    0m20.322s
>> user    0m20.356s

Am also