Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Thu, Oct 22, 2015 at 4:51 PM, Carl Eugen Hoyos wrote:
> Ganesh Ajjanagadde mit.edu> writes:
>
>> To put an end to a long and tortuous thread, and
>> due to the lack of relevant outstanding objections,
>> pushed.
>
> To sum it up:
> Several developers have explained to you that the
> numbers you posted show that FFmpeg is now either
> slower or equally fast, you have pushed with a
> commit message that claims that your patch makes
> FFmpeg faster.

I was far more nuanced because I know there are people like you who do
not understand such things. Seems like I failed at that anyway. I am
sorry, but I can only repeat the first two lines:
"It is well known that fabs and fabsf are at least as fast and
sometimes faster than the FFABS macro, at least on the gcc+glibc
combination."
That statement is completely accurate. I did not claim "within
FFmpeg", since like I said usually there is no difference. Where the
"slower" comes from, god knows.

> How do you call such a claim without any base?

Look at the asm, etc etc. Clement has already shown the difference.
"Without any base" is a completely incorrect and illogical statement.
Paul is clearly convinced enough to start using it in his own filters.

> Carl Eugen

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
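For readers joining the thread here, the two forms under discussion can be sketched roughly as below. The FFABS definition is written as I recall it from libavutil/common.h and should be checked against the tree; abs_macro and abs_libm are hypothetical wrapper names used only for illustration, not FFmpeg API.

```c
#include <math.h>

/* FFABS as recalled from libavutil/common.h (assumption: verify against the tree). */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Hypothetical wrappers, for illustration only. */
static double abs_macro(double x)
{
    /* Compare-and-select: the compiler may emit a branch or a conditional move. */
    return FFABS(x);
}

static double abs_libm(double x)
{
    /* Standard C; gcc and clang usually treat fabs as a builtin. */
    return fabs(x);
}
```

Besides any speed question, the two are not bit-identical: for an input of -0.0 the comparison (-0.0 >= 0) is true, so abs_macro returns -0.0 with the sign bit still set, while abs_libm returns +0.0. Whether that matters for a given filter is a separate question.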
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Fri, Oct 16, 2015 at 7:53 AM, Ganesh Ajjanagaddewrote: > On Fri, Oct 16, 2015 at 7:30 AM, Michael Niedermayer > wrote: >> On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote: >>> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes >>> wrote: >>> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos >>> > wrote: >>> >> Ganesh Ajjanagadde mit.edu> writes: >>> >> >>> >>> What? My numbers actually show that the new code may be faster - >>> >> >>> >> No, you are misunderstanding the numbers you posted. >>> >> (Or I misunderstand them but nobody said so yet.) >>> >> >>> >> Highest runs are most relevant, skips have to be >>> >> avoided (afaik). >>> >> >>> >> [...] >>> >> >>> >>> If you continue to post such stuff that has no basis, I might actually >>> >>> get tempted into finding out for which floating point values the new >>> >>> code is significantly faster, craft a relevant audio file, and post it >>> >>> showing a huge performance difference - my random numbers benchmark >>> >>> shows there must exist such values. >>> >> >>> >> Please do so! >>> >> >>> >>> > The more important question is if you can see the same >>> >>> > changes in the disassembly of af_astats.o as what >>> >>> > ubitux posted here for a short test function? >>> >>> >>> >>> I do. He uses clang/gcc, so do I. >>> >> >>> >> Sorry, my understanding fails here (I am not a native speaker): >>> >> You did look at the disassembly of af_astats.o and there is >>> >> inlined code instead of a function call? >>> >> >>> >>> The reason (irrelevant) is that both >>> >>> of us run Arch. >>> >>> >>> >>> What is "more relevant" is if _you_ can see the changes >>> >>> on some non Linux platform. >>> >> >>> >> If you could show that it is faster on any platform >>> >> I would already be happy! >>> >> >>> > >>> > A more important check would be that its not significantly slower on >>> > any other platform. 
Just because one compiler/glibc combination >>> > manages to produce an efficient inlined function doesn't necessarily >>> > mean that some other compiler or libc couldn't produce a full function >>> > call with all the overhead that comes with it, becoming significantly >>> > slower. >>> >>> As I point out, all a libc implementer needs to do to be on par with >>> the macro is to add the inline keyword. This was added in c99. If said >>> libc does not, then it is fundamentally broken from a performance >>> perspective. A beginning programmer can do that in a couple of >>> minutes. Fix upstream and complain to them if it does not inline. >> >> I dont know how the latest compilers handle "inline" but a few years >> ago gcc was rather dumb about inlining, and i think its not easy for >> a compiler to be actually not "dumb" >> >> A compiler cannot inline everything that has the inline keyword, >> it would lead (for some source code) to an explosion on size and >> compile time. >> and a good compiler will want to inline some functions even if they >> do not have the inline keyword >> Also its not easy to know for a compiler what to >> inline and what not, there could be 10 functions a1(),a2(), a3(), ... >> each calling the previous 10 times ... >> the way gcc handled this (in the past and AFAIK at least) is to have >> various complicated thresholds that limit the amount of inlining. 
>> The big annoyance with this (years ago at least) was that if you >> forced a function to be inlined by "force" gcc would then stop >> inlining something else and you ended up either forcing every single >> function you needed inlined or would have had to tune the thresholds >> >> it would be interresting to check if replacing FFABS by fabs causes >> any big changes to inlining behavior (maybe that can be done by >> comparing the list of symbols in the object files as fully inlined >> functions s´wouldnt show up but maybe there are other ways) >> >> anyway iam not against using fabs() for float/double FFABS() >> i just think some assumtations in this thread are possibly too >> optimistic, but its quite possible these replacements are all fine >> and the changes in inlining if any have no performance impact > > I myself am not "optimistic" in the sense that I think most of the > time this will have zero change. All I am saying is that in cases > where there is a difference, it will likely be in favor of fabs, etc > and not the macro due to reasons I mentioned in the long commit > message I posted. > >> >> also if a *abs is implemented by using a branch (as in if its positive >> jump over a negate instruction) then branch prediction can play a >> sigificant role in performance, that is random values would be alot >> slower than the same values ordered > > Maybe this is why I get such a large difference between fabs and FFABS > in favor of fabs - I just keep random numbers with no ordering. If > true, this is definitely in fabs's favor. > >> a good implementation should not use a
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
Ganesh Ajjanagadde mit.edu> writes:

> To put an end to a long and tortuous thread, and
> due to the lack of relevant outstanding objections,
> pushed.

To sum it up:
Several developers have explained to you that the
numbers you posted show that FFmpeg is now either
slower or equally fast, you have pushed with a
commit message that claims that your patch makes
FFmpeg faster.
How do you call such a claim without any base?

Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Fri, Oct 16, 2015 at 7:30 AM, Michael Niedermayerwrote: > On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote: >> On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes wrote: >> > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos >> > wrote: >> >> Ganesh Ajjanagadde mit.edu> writes: >> >> >> >>> What? My numbers actually show that the new code may be faster - >> >> >> >> No, you are misunderstanding the numbers you posted. >> >> (Or I misunderstand them but nobody said so yet.) >> >> >> >> Highest runs are most relevant, skips have to be >> >> avoided (afaik). >> >> >> >> [...] >> >> >> >>> If you continue to post such stuff that has no basis, I might actually >> >>> get tempted into finding out for which floating point values the new >> >>> code is significantly faster, craft a relevant audio file, and post it >> >>> showing a huge performance difference - my random numbers benchmark >> >>> shows there must exist such values. >> >> >> >> Please do so! >> >> >> >>> > The more important question is if you can see the same >> >>> > changes in the disassembly of af_astats.o as what >> >>> > ubitux posted here for a short test function? >> >>> >> >>> I do. He uses clang/gcc, so do I. >> >> >> >> Sorry, my understanding fails here (I am not a native speaker): >> >> You did look at the disassembly of af_astats.o and there is >> >> inlined code instead of a function call? >> >> >> >>> The reason (irrelevant) is that both >> >>> of us run Arch. >> >>> >> >>> What is "more relevant" is if _you_ can see the changes >> >>> on some non Linux platform. >> >> >> >> If you could show that it is faster on any platform >> >> I would already be happy! >> >> >> > >> > A more important check would be that its not significantly slower on >> > any other platform. 
Just because one compiler/glibc combination >> > manages to produce an efficient inlined function doesn't necessarily >> > mean that some other compiler or libc couldn't produce a full function >> > call with all the overhead that comes with it, becoming significantly >> > slower. >> >> As I point out, all a libc implementer needs to do to be on par with >> the macro is to add the inline keyword. This was added in c99. If said >> libc does not, then it is fundamentally broken from a performance >> perspective. A beginning programmer can do that in a couple of >> minutes. Fix upstream and complain to them if it does not inline. > > I dont know how the latest compilers handle "inline" but a few years > ago gcc was rather dumb about inlining, and i think its not easy for > a compiler to be actually not "dumb" > > A compiler cannot inline everything that has the inline keyword, > it would lead (for some source code) to an explosion on size and > compile time. > and a good compiler will want to inline some functions even if they > do not have the inline keyword > Also its not easy to know for a compiler what to > inline and what not, there could be 10 functions a1(),a2(), a3(), ... > each calling the previous 10 times ... > the way gcc handled this (in the past and AFAIK at least) is to have > various complicated thresholds that limit the amount of inlining. 
> The big annoyance with this (years ago at least) was that if you > forced a function to be inlined by "force" gcc would then stop > inlining something else and you ended up either forcing every single > function you needed inlined or would have had to tune the thresholds > > it would be interresting to check if replacing FFABS by fabs causes > any big changes to inlining behavior (maybe that can be done by > comparing the list of symbols in the object files as fully inlined > functions s´wouldnt show up but maybe there are other ways) > > anyway iam not against using fabs() for float/double FFABS() > i just think some assumtations in this thread are possibly too > optimistic, but its quite possible these replacements are all fine > and the changes in inlining if any have no performance impact I myself am not "optimistic" in the sense that I think most of the time this will have zero change. All I am saying is that in cases where there is a difference, it will likely be in favor of fabs, etc and not the macro due to reasons I mentioned in the long commit message I posted. > > also if a *abs is implemented by using a branch (as in if its positive > jump over a negate instruction) then branch prediction can play a > sigificant role in performance, that is random values would be alot > slower than the same values ordered Maybe this is why I get such a large difference between fabs and FFABS in favor of fabs - I just keep random numbers with no ordering. If true, this is definitely in fabs's favor. > a good implementation should not use a branch though, abs for floats > and doubles is just setting the sign bit basically, platforms should > have a dedicated instruction for that or in some cases a integer > and/or could maybe even be used
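The branch-prediction point above (random values versus the same values ordered) could be checked with something like the following. This is only a sketch under assumptions: fill_random, sum_ffabs and seconds_for are hypothetical helper names, the FFABS definition is as recalled from libavutil/common.h, and an optimizing compiler may well turn the macro into a branchless sequence, in which case no difference would be expected.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* FFABS as recalled from libavutil/common.h (assumption). */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* qsort comparator for floats, ascending. */
static int cmp_float(const void *a, const void *b)
{
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Fill v with pseudo-random integers in [-1000, 1000], stored as floats,
 * so that sums of absolute values are exact regardless of order. */
static void fill_random(float *v, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; i++)
        v[i] = (float)(rand() % 2001 - 1000);
}

/* Sum of absolute values using the macro form. */
static double sum_ffabs(const float *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += FFABS(v[i]);
    return s;
}

/* CPU seconds spent on `reps` passes over v; the accumulated sum is stored
 * in *sum_out so the work cannot be discarded by the optimizer. */
static double seconds_for(const float *v, size_t n, int reps, double *sum_out)
{
    clock_t t0 = clock();
    double s = 0.0;
    for (int r = 0; r < reps; r++)
        s += sum_ffabs(v, n);
    *sum_out = s;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

A possible use: call fill_random on a large array, time it with seconds_for, then qsort the same array with cmp_float and time it again. The two sums should be equal; any timing gap would hint at a mispredicted branch, but the disassembly should be checked before drawing conclusions.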
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Thu, Oct 15, 2015 at 06:38:10AM -0400, Ganesh Ajjanagadde wrote: > On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkeswrote: > > On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos wrote: > >> Ganesh Ajjanagadde mit.edu> writes: > >> > >>> What? My numbers actually show that the new code may be faster - > >> > >> No, you are misunderstanding the numbers you posted. > >> (Or I misunderstand them but nobody said so yet.) > >> > >> Highest runs are most relevant, skips have to be > >> avoided (afaik). > >> > >> [...] > >> > >>> If you continue to post such stuff that has no basis, I might actually > >>> get tempted into finding out for which floating point values the new > >>> code is significantly faster, craft a relevant audio file, and post it > >>> showing a huge performance difference - my random numbers benchmark > >>> shows there must exist such values. > >> > >> Please do so! > >> > >>> > The more important question is if you can see the same > >>> > changes in the disassembly of af_astats.o as what > >>> > ubitux posted here for a short test function? > >>> > >>> I do. He uses clang/gcc, so do I. > >> > >> Sorry, my understanding fails here (I am not a native speaker): > >> You did look at the disassembly of af_astats.o and there is > >> inlined code instead of a function call? > >> > >>> The reason (irrelevant) is that both > >>> of us run Arch. > >>> > >>> What is "more relevant" is if _you_ can see the changes > >>> on some non Linux platform. > >> > >> If you could show that it is faster on any platform > >> I would already be happy! > >> > > > > A more important check would be that its not significantly slower on > > any other platform. Just because one compiler/glibc combination > > manages to produce an efficient inlined function doesn't necessarily > > mean that some other compiler or libc couldn't produce a full function > > call with all the overhead that comes with it, becoming significantly > > slower. 
>
> As I point out, all a libc implementer needs to do to be on par with
> the macro is to add the inline keyword. This was added in c99. If said
> libc does not, then it is fundamentally broken from a performance
> perspective. A beginning programmer can do that in a couple of
> minutes. Fix upstream and complain to them if it does not inline.

I don't know how the latest compilers handle "inline", but a few years
ago gcc was rather dumb about inlining, and I think it's not easy for
a compiler to actually not be "dumb".

A compiler cannot inline everything that has the inline keyword;
it would lead (for some source code) to an explosion in size and
compile time.
And a good compiler will want to inline some functions even if they
do not have the inline keyword.
Also, it's not easy for a compiler to know what to
inline and what not: there could be 10 functions a1(), a2(), a3(), ...
each calling the previous 10 times ...
The way gcc handled this (in the past and AFAIK at least) is to have
various complicated thresholds that limit the amount of inlining.
The big annoyance with this (years ago at least) was that if you
forced a function to be inlined by "force", gcc would then stop
inlining something else, and you ended up either forcing every single
function you needed inlined or having to tune the thresholds.

It would be interesting to check if replacing FFABS by fabs causes
any big changes to inlining behavior (maybe that can be done by
comparing the list of symbols in the object files, as fully inlined
functions wouldn't show up, but maybe there are other ways).

Anyway, I am not against using fabs() for float/double FFABS().
I just think some assumptions in this thread are possibly too
optimistic, but it's quite possible these replacements are all fine
and the changes in inlining, if any, have no performance impact.

Also, if a *abs is implemented by using a branch (as in: if it's positive,
jump over a negate instruction), then branch prediction can play a
significant role in performance; that is, random values would be a lot
slower than the same values ordered.
A good implementation should not use a branch though. abs for floats
and doubles is just clearing the sign bit, basically; platforms should
have a dedicated instruction for that, or in some cases an integer
AND/OR could maybe even be used.

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Why not whip the teacher when the pupil misbehaves? -- Diogenes of Sinope
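The remark about the sign bit can be illustrated with a small sketch. It assumes float is an IEEE 754 binary32 value (true on the usual platforms, but not guaranteed by the C standard), and abs_float_bits / abs_float_branch are hypothetical names, not anything from FFmpeg or libc.

```c
#include <stdint.h>
#include <string.h>

/* Branch-free absolute value: clear the top (sign) bit with an integer AND.
 * Assumes IEEE 754 binary32 floats. */
static float abs_float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* well-defined way to reinterpret the bits */
    bits &= UINT32_C(0x7FFFFFFF);     /* clear the sign bit, keep exponent and mantissa */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* The branchy form described above: skip the negate when x is not negative. */
static float abs_float_branch(float x)
{
    if (x < 0)
        x = -x;
    return x;
}
```

On x86 with SSE, compilers commonly implement fabsf as a single AND with a constant mask, which is essentially what abs_float_bits spells out by hand; whether the branchy form compiles to an actual jump depends on the compiler and flags.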
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Thu, Oct 15, 2015 at 12:53 PM, Ganesh Ajjanagaddewrote: > On Thu, Oct 15, 2015 at 6:50 AM, Hendrik Leppkes wrote: >> On Thu, Oct 15, 2015 at 12:41 PM, Ganesh Ajjanagadde >> wrote: >>> On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes >>> wrote: On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde wrote: > On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes > wrote: >> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde >> wrote: >>> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver >>> wrote: On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote: > On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde > wrote: > > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote: > >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote: > >> [...] > >>> What about fmax/FFMAX? > >> > >> Feel free to try that out (it looks OT regarding the patch), but > >> fmax() > >> looks glibc specific > > Seems they are actually ISO: > http://en.cppreference.com/w/c/numeric/math/fmax > > Can someone check availability on all of our platforms of interest > (e.g Microsoft)? > fmax and fmin are only available on msvc using 2013 or newer. Currently the only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses the C99 to C89 converter. >>> >>> And does that converter handle fmin, fmax, fmaxf, etc? >>> Does it need patches? >>> Bottom line: are they safe to use at the moment? >>> >> >> No, they are not. >> >> One thing I don't understand - why are we bothering with something >> that at best comes out as "same speed" from tests performed? (low >> number of runs are irrelevant as they are not statistically >> significant). > > Because if you actually bothered to run my random numbers benchmark > instead of posting with no basis claiming "statistical > insignificance", or for that matter matter bothered to actually check > the libc link, or even looked at Clement's asm test - you would > finally understand. > > Also, what needs to be done to get fmax, fmin, etc into the converter? 
> The converter doesn't provide any functions, just alters the syntax if needed. Functions not available cannot be fixed that way, sorry. >>> >>> Thanks for clarifying. I am still confused: how do we have llabs then? >>> Per MSDN, this was not present in MSVC 2012, and was added in MSVC >>> 2013 (looks like a similar case to fabs, fabsf). >>> >> >> Docs appear to be wrong in that particular case. It happens sometimes >> that functions are available but didn't get added to the docs. > > But with respect to fmin, fmax, etc - they were not available in 2012, > and the docs are right? Are you sure, and have you tested? > Yes those are definitely not available. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Thu, Oct 15, 2015 at 6:50 AM, Hendrik Leppkeswrote: > On Thu, Oct 15, 2015 at 12:41 PM, Ganesh Ajjanagadde wrote: >> On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes wrote: >>> On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde >>> wrote: On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes wrote: > On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde > wrote: >> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver >> wrote: >>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde >>> wrote: >>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde wrote: > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote: >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote: >> [...] >>> What about fmax/FFMAX? >> >> Feel free to try that out (it looks OT regarding the patch), but >> fmax() >> looks glibc specific Seems they are actually ISO: http://en.cppreference.com/w/c/numeric/math/fmax Can someone check availability on all of our platforms of interest (e.g Microsoft)? >>> >>> fmax and fmin are only available on msvc using 2013 or newer. Currently >>> the >>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which >>> uses >>> the C99 to C89 converter. >> >> And does that converter handle fmin, fmax, fmaxf, etc? >> Does it need patches? >> Bottom line: are they safe to use at the moment? >> > > No, they are not. > > One thing I don't understand - why are we bothering with something > that at best comes out as "same speed" from tests performed? (low > number of runs are irrelevant as they are not statistically > significant). Because if you actually bothered to run my random numbers benchmark instead of posting with no basis claiming "statistical insignificance", or for that matter matter bothered to actually check the libc link, or even looked at Clement's asm test - you would finally understand. Also, what needs to be done to get fmax, fmin, etc into the converter? >>> >>> The converter doesn't provide any functions, just alters the syntax if >>> needed. 
Functions not available cannot be fixed that way, sorry. >> >> Thanks for clarifying. I am still confused: how do we have llabs then? >> Per MSDN, this was not present in MSVC 2012, and was added in MSVC >> 2013 (looks like a similar case to fabs, fabsf). >> > > Docs appear to be wrong in that particular case. It happens sometimes > that functions are available but didn't get added to the docs. But with respect to fmin, fmax, etc - they were not available in 2012, and the docs are right? Are you sure, and have you tested? > > - Hendrik > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkeswrote: > On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde wrote: >> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote: >>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote: >>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde wrote: > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote: >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote: >> [...] >>> What about fmax/FFMAX? >> >> Feel free to try that out (it looks OT regarding the patch), but fmax() >> looks glibc specific Seems they are actually ISO: http://en.cppreference.com/w/c/numeric/math/fmax Can someone check availability on all of our platforms of interest (e.g Microsoft)? >>> >>> fmax and fmin are only available on msvc using 2013 or newer. Currently the >>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses >>> the C99 to C89 converter. >> >> And does that converter handle fmin, fmax, fmaxf, etc? >> Does it need patches? >> Bottom line: are they safe to use at the moment? >> > > No, they are not. > > One thing I don't understand - why are we bothering with something > that at best comes out as "same speed" from tests performed? (low > number of runs are irrelevant as they are not statistically > significant). Because if you actually bothered to run my random numbers benchmark instead of posting with no basis claiming "statistical insignificance", or for that matter matter bothered to actually check the libc link, or even looked at Clement's asm test - you would finally understand. Also, what needs to be done to get fmax, fmin, etc into the converter? > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Thu, Oct 15, 2015 at 6:41 AM, Ganesh Ajjanagaddewrote: > On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes wrote: >> On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde >> wrote: >>> On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes >>> wrote: On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde wrote: > On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote: >> On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote: >> >>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde >>> wrote: >>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote: >>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote: >>> >> [...] >>> >>> What about fmax/FFMAX? >>> >> >>> >> Feel free to try that out (it looks OT regarding the patch), but >>> >> fmax() >>> >> looks glibc specific >>> >>> Seems they are actually ISO: >>> http://en.cppreference.com/w/c/numeric/math/fmax >>> >>> Can someone check availability on all of our platforms of interest >>> (e.g Microsoft)? >>> >> >> fmax and fmin are only available on msvc using 2013 or newer. Currently >> the >> only msvc version without fmax/fmin that FFmpeg supports is 2012 which >> uses >> the C99 to C89 converter. > > And does that converter handle fmin, fmax, fmaxf, etc? > Does it need patches? > Bottom line: are they safe to use at the moment? > No, they are not. One thing I don't understand - why are we bothering with something that at best comes out as "same speed" from tests performed? (low number of runs are irrelevant as they are not statistically significant). >>> >>> Because if you actually bothered to run my random numbers benchmark >>> instead of posting with no basis claiming "statistical >>> insignificance", or for that matter matter bothered to actually check >>> the libc link, or even looked at Clement's asm test - you would >>> finally understand. >>> >>> Also, what needs to be done to get fmax, fmin, etc into the converter? 
>>> >> >> The converter doesn't provide any functions, just alters the syntax if >> needed. Functions not available cannot be fixed that way, sorry. > > Thanks for clarifying. I am still confused: how do we have llabs then? > Per MSDN, this was not present in MSVC 2012, and was added in MSVC > 2013 (looks like a similar case to fabs, fabsf). sorry, fabs, fabsf -> fmin, fmax etc. > >> >> - Hendrik >> ___ >> ffmpeg-devel mailing list >> ffmpeg-devel@ffmpeg.org >> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Thu, Oct 15, 2015 at 12:41 PM, Ganesh Ajjanagaddewrote: > On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes wrote: >> On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde >> wrote: >>> On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes >>> wrote: On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde wrote: > On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote: >> On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote: >> >>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde >>> wrote: >>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote: >>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote: >>> >> [...] >>> >>> What about fmax/FFMAX? >>> >> >>> >> Feel free to try that out (it looks OT regarding the patch), but >>> >> fmax() >>> >> looks glibc specific >>> >>> Seems they are actually ISO: >>> http://en.cppreference.com/w/c/numeric/math/fmax >>> >>> Can someone check availability on all of our platforms of interest >>> (e.g Microsoft)? >>> >> >> fmax and fmin are only available on msvc using 2013 or newer. Currently >> the >> only msvc version without fmax/fmin that FFmpeg supports is 2012 which >> uses >> the C99 to C89 converter. > > And does that converter handle fmin, fmax, fmaxf, etc? > Does it need patches? > Bottom line: are they safe to use at the moment? > No, they are not. One thing I don't understand - why are we bothering with something that at best comes out as "same speed" from tests performed? (low number of runs are irrelevant as they are not statistically significant). >>> >>> Because if you actually bothered to run my random numbers benchmark >>> instead of posting with no basis claiming "statistical >>> insignificance", or for that matter matter bothered to actually check >>> the libc link, or even looked at Clement's asm test - you would >>> finally understand. >>> >>> Also, what needs to be done to get fmax, fmin, etc into the converter? 
>>> >> >> The converter doesn't provide any functions, just alters the syntax if >> needed. Functions not available cannot be fixed that way, sorry. > > Thanks for clarifying. I am still confused: how do we have llabs then? > Per MSDN, this was not present in MSVC 2012, and was added in MSVC > 2013 (looks like a similar case to fabs, fabsf). > Docs appear to be wrong in that particular case. It happens sometimes that functions are available but didn't get added to the docs. - Hendrik ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Wed, Oct 14, 2015 at 6:53 AM, Hendrik Leppkes wrote:
> On Wed, Oct 14, 2015 at 12:49 PM, Carl Eugen Hoyos wrote:
>> Ganesh Ajjanagadde mit.edu> writes:
>>
>>> What? My numbers actually show that the new code may be faster -
>>
>> No, you are misunderstanding the numbers you posted.
>> (Or I misunderstand them but nobody said so yet.)
>>
>> Highest runs are most relevant, skips have to be
>> avoided (afaik).
>>
>> [...]
>>
>>> If you continue to post such stuff that has no basis, I might actually
>>> get tempted into finding out for which floating point values the new
>>> code is significantly faster, craft a relevant audio file, and post it
>>> showing a huge performance difference - my random numbers benchmark
>>> shows there must exist such values.
>>
>> Please do so!
>>
>>> > The more important question is if you can see the same
>>> > changes in the disassembly of af_astats.o as what
>>> > ubitux posted here for a short test function?
>>>
>>> I do. He uses clang/gcc, so do I.
>>
>> Sorry, my understanding fails here (I am not a native speaker):
>> You did look at the disassembly of af_astats.o and there is
>> inlined code instead of a function call?
>>
>>> The reason (irrelevant) is that both
>>> of us run Arch.
>>>
>>> What is "more relevant" is if _you_ can see the changes
>>> on some non Linux platform.
>>
>> If you could show that it is faster on any platform
>> I would already be happy!
>>
>
> A more important check would be that it's not significantly slower on
> any other platform. Just because one compiler/glibc combination
> manages to produce an efficient inlined function doesn't necessarily
> mean that some other compiler or libc couldn't produce a full function
> call with all the overhead that comes with it, becoming significantly
> slower.

As I point out, all a libc implementer needs to do to be on par with
the macro is to add the inline keyword. This was added in C99. If said
libc does not, then it is fundamentally broken from a performance
perspective. A beginning programmer can do that in a couple of
minutes. Fix upstream and complain to them if it does not inline.

You seem to have an alternative platform: you (and others who have
such platforms) are welcome to try and find out, file bugs (if need
be) with Microsoft, etc.
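What "on par with the macro" means at the source level can be sketched as follows. This is an illustration under assumptions, not how any particular libc implements fabs: abs_inline, next_sample and calls are hypothetical names, and the FFABS definition is as recalled from libavutil/common.h.

```c
/* FFABS as recalled from libavutil/common.h (assumption). */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* A C99 inline function doing the same comparison as the macro.
 * The keyword is only a hint; the compiler decides whether to inline. */
static inline double abs_inline(double x)
{
    return x >= 0 ? x : -x;
}

/* Hypothetical helper that counts how often it is evaluated. */
static int calls;

static double next_sample(void)
{
    calls++;
    return -1.0;
}
```

One visible difference is argument evaluation: FFABS(next_sample()) expands to an expression that calls next_sample() twice (once for the comparison, once for the selected operand), whereas abs_inline(next_sample()) calls it once. For plain variables, as in the filters touched by the patch, this does not matter, but it is one reason a function can be preferable to a macro even when the generated code is the same.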
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagaddewrote: > On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes wrote: >> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde >> wrote: >>> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote: On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote: > On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde > wrote: > > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote: > >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote: > >> [...] > >>> What about fmax/FFMAX? > >> > >> Feel free to try that out (it looks OT regarding the patch), but fmax() > >> looks glibc specific > > Seems they are actually ISO: > http://en.cppreference.com/w/c/numeric/math/fmax > > Can someone check availability on all of our platforms of interest > (e.g Microsoft)? > fmax and fmin are only available on msvc using 2013 or newer. Currently the only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses the C99 to C89 converter. >>> >>> And does that converter handle fmin, fmax, fmaxf, etc? >>> Does it need patches? >>> Bottom line: are they safe to use at the moment? >>> >> >> No, they are not. >> >> One thing I don't understand - why are we bothering with something >> that at best comes out as "same speed" from tests performed? (low >> number of runs are irrelevant as they are not statistically >> significant). > > Because if you actually bothered to run my random numbers benchmark > instead of posting with no basis claiming "statistical > insignificance", or for that matter matter bothered to actually check > the libc link, or even looked at Clement's asm test - you would > finally understand. > > Also, what needs to be done to get fmax, fmin, etc into the converter? > The converter doesn't provide any functions, just alters the syntax if needed. Functions not available cannot be fixed that way, sorry. - Hendrik ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Thu, Oct 15, 2015 at 6:57 AM, Hendrik Leppkeswrote: > On Thu, Oct 15, 2015 at 12:53 PM, Ganesh Ajjanagadde wrote: >> On Thu, Oct 15, 2015 at 6:50 AM, Hendrik Leppkes wrote: >>> On Thu, Oct 15, 2015 at 12:41 PM, Ganesh Ajjanagadde >>> wrote: On Thu, Oct 15, 2015 at 6:37 AM, Hendrik Leppkes wrote: > On Thu, Oct 15, 2015 at 12:34 PM, Ganesh Ajjanagadde > wrote: >> On Wed, Oct 14, 2015 at 6:50 AM, Hendrik Leppkes >> wrote: >>> On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde >>> wrote: On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote: > On 14 October 2015 at 09:46, Ganesh Ajjanagadde > wrote: > >> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde >> >> wrote: >> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote: >> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote: >> >> [...] >> >>> What about fmax/FFMAX? >> >> >> >> Feel free to try that out (it looks OT regarding the patch), but >> >> fmax() >> >> looks glibc specific >> >> Seems they are actually ISO: >> http://en.cppreference.com/w/c/numeric/math/fmax >> >> Can someone check availability on all of our platforms of interest >> (e.g Microsoft)? >> > > fmax and fmin are only available on msvc using 2013 or newer. > Currently the > only msvc version without fmax/fmin that FFmpeg supports is 2012 > which uses > the C99 to C89 converter. And does that converter handle fmin, fmax, fmaxf, etc? Does it need patches? Bottom line: are they safe to use at the moment? >>> >>> No, they are not. >>> >>> One thing I don't understand - why are we bothering with something >>> that at best comes out as "same speed" from tests performed? (low >>> number of runs are irrelevant as they are not statistically >>> significant). 
>> >> Because if you actually bothered to run my random numbers benchmark >> instead of posting with no basis claiming "statistical >> insignificance", or for that matter matter bothered to actually check >> the libc link, or even looked at Clement's asm test - you would >> finally understand. >> >> Also, what needs to be done to get fmax, fmin, etc into the converter? >> > > The converter doesn't provide any functions, just alters the syntax if > needed. Functions not available cannot be fixed that way, sorry. Thanks for clarifying. I am still confused: how do we have llabs then? Per MSDN, this was not present in MSVC 2012, and was added in MSVC 2013 (looks like a similar case to fabs, fabsf). >>> >>> Docs appear to be wrong in that particular case. It happens sometimes >>> that functions are available but didn't get added to the docs. >> >> But with respect to fmin, fmax, etc - they were not available in 2012, >> and the docs are right? Are you sure, and have you tested? >> > > Yes those are definitely not available. Thanks for the check. So you mentioned we can't add fmin/fmax with the c99-89 converter. I noticed we have a libavutil/libm.h: couldn't a "portable" fmin/fmax etc go there? > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
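A portable fallback of the kind asked about above could look roughly like the following. This is only a sketch: the names `sketch_fmax` and `sketch_fmin` are hypothetical and are not taken from libavutil/libm.h, and a real version would presumably be enabled by a configure check only where the libc lacks the functions. It is written in C89 and detects NaN with `x != x` rather than C99 `isnan`, and it follows the ISO C rule that when exactly one argument is NaN the other argument is returned.

```c
/* Hypothetical C89 fallbacks for platforms whose libc lacks fmax()/fmin().
 * The names are illustrative only and do not exist in FFmpeg. */

static double sketch_fmax(double x, double y)
{
    if (x != x)             /* x is NaN: return the other argument */
        return y;
    if (y != y)             /* y is NaN: return the other argument */
        return x;
    return x > y ? x : y;
}

static double sketch_fmin(double x, double y)
{
    if (x != x)             /* x is NaN: return the other argument */
        return y;
    if (y != y)             /* y is NaN: return the other argument */
        return x;
    return x < y ? x : y;
}
```

Unlike a plain FFMAX-style macro, these evaluate each argument once and give a defined result when one input is NaN.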
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Wed, Oct 14, 2015 at 6:54 AM, Carl Eugen Hoyos wrote:
> Hendrik Leppkes writes:
>
>> One thing I don't understand - why are we bothering
>> with something that at best comes out as "same speed"
>> from tests performed? (low number of runs are
>> irrelevant as they are not statistically significant).
>
> Since the patches apparently save a function call,
> the question is imo: Why is this not measurable?
> Shouldn't a function call always have a clear impact?

Unless I am completely off with asm (entirely possible, it is not my area of interest) - there is no "call" for either and both are being inlined. The difference is that fabs() and FFABS are being optimized differently.
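For illustration, the single `andpd` that gcc emits for fabs() in the disassembly posted in this thread amounts to clearing the sign bit, with no comparison and no branch, whereas the FFABS macro compiles to a compare, a jump and an `xorpd`. A rough C equivalent of the branch-free form, assuming an IEEE 754 binary64 `double` (the helper name `abs_by_mask` is made up for this sketch):

```c
#include <stdint.h>
#include <string.h>

/* Clear the sign bit of an IEEE 754 binary64 value.
 * Assumes double is 64 bits wide with the sign in the top bit. */
static double abs_by_mask(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof(bits));   /* reinterpret the bytes without aliasing issues */
    bits &= ~(UINT64_C(1) << 63);      /* drop the sign bit */
    memcpy(&x, &bits, sizeof(x));
    return x;
}
```

This is only meant to show why no data-dependent branch is needed; it is not a suggestion to replace fabs() with hand-written code.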
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
Ganesh Ajjanagadde writes:

> >> I think the general case, it'd be nice to figure out
> >> why Carl's results are slightly different from yours
> >
> > Why do you think they are different at all?
> > Did you look at the tables?
>
> They are different, and our conclusions are different
> (in a slight way). Carl claims that the old code and
> new code are mostly the same in speed, but in the
> cases where they differ, the old code is faster.

No.

I wrote that both the numbers you posted and the numbers I posted show no proof that the new code is faster.
(Contrary to my numbers, your numbers show that the old code may be faster but that is irrelevant.)

The more important question is if you can see the same changes in the disassembly of af_astats.o as what ubitux posted here for a short test function?

Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Wed, Oct 14, 2015 at 2:40 AM, Carl Eugen Hoyoswrote: > Ganesh Ajjanagadde mit.edu> writes: > >> >> I think the general case, it'd be nice to figure out >> >> why Carl's results are slightly different from yours >> > >> > Why do you think they are different at all? >> > Did you look at the tables? >> >> They are different, and our conclusions are different >> (in a slight way). Carl claims that the old code and >> new code are mostly the same in speed, but in the >> cases where they differ, the old code is faster. > > No. > > I wrote that both the numbers you posted and the numbers > I posted show no proof that the new code is faster. You posted a first reply with all manner of stuff saying essentially that "I believe the old code may be faster". You have not withdrawn that claim, and neither have I withdrawn mine. > (Contrary to my numbers, your numbers show that the old > code may be faster but that is irrelevant.) What? My numbers actually show that the new code may be faster - again: cycle times in the best case are identical, in the worst case they favor the new code. My random number benchmark is also clearly in favor of the new code. How this is "irrelevant" is beyond me. Also, please don't spin my numbers into something they are not: this is distracting the thread. Clement and Paul have already started moving to using the function, others are free to see the numbers themselves. Why you are trying to derail the benchmarks I posted is beyond me. If you continue to post such stuff that has no basis, I might actually get tempted into finding out for which floating point values the new code is significantly faster, craft a relevant audio file, and post it showing a huge performance difference - my random numbers benchmark shows there must exist such values. > > The more important question is if you can see the same > changes in the disassembly of af_astats.o as what > ubitux posted here for a short test function? I do. He uses clang/gcc, so do I. 
The reason (irrelevant) is that both of us run Arch.

What is "more relevant" is if _you_ can see the changes on some non Linux platform.

>
> Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote:
> On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote:
>
>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde wrote:
>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>> >> [...]
>> >>> What about fmax/FFMAX?
>> >>
>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>> >> looks glibc specific
>>
>> Seems they are actually ISO:
>> http://en.cppreference.com/w/c/numeric/math/fmax
>>
>> Can someone check availability on all of our platforms of interest
>> (e.g Microsoft)?
>>
>
> fmax and fmin are only available on msvc using 2013 or newer. Currently the
> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
> the C99 to C89 converter.

And does that converter handle fmin, fmax, fmaxf, etc?
Does it need patches?
Bottom line: are they safe to use at the moment?
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Wed, Oct 14, 2015 at 12:37 PM, Ganesh Ajjanagadde wrote:
> On Wed, Oct 14, 2015 at 5:01 AM, Matt Oliver wrote:
>> On 14 October 2015 at 09:46, Ganesh Ajjanagadde wrote:
>>
>>> On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde wrote:
>>> > On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>>> >> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>>> >> [...]
>>> >>> What about fmax/FFMAX?
>>> >>
>>> >> Feel free to try that out (it looks OT regarding the patch), but fmax()
>>> >> looks glibc specific
>>>
>>> Seems they are actually ISO:
>>> http://en.cppreference.com/w/c/numeric/math/fmax
>>>
>>> Can someone check availability on all of our platforms of interest
>>> (e.g Microsoft)?
>>>
>>
>> fmax and fmin are only available on msvc using 2013 or newer. Currently the
>> only msvc version without fmax/fmin that FFmpeg supports is 2012 which uses
>> the C99 to C89 converter.
>
> And does that converter handle fmin, fmax, fmaxf, etc?
> Does it need patches?
> Bottom line: are they safe to use at the moment?
>

No, they are not.

One thing I don't understand - why are we bothering with something that at best comes out as "same speed" from tests performed? (low number of runs are irrelevant as they are not statistically significant).
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
Ganesh Ajjanagadde writes:

> What? My numbers actually show that the new code may be faster -

No, you are misunderstanding the numbers you posted.
(Or I misunderstand them but nobody said so yet.)

Highest runs are most relevant, skips have to be avoided (afaik).

[...]

> If you continue to post such stuff that has no basis, I might actually
> get tempted into finding out for which floating point values the new
> code is significantly faster, craft a relevant audio file, and post it
> showing a huge performance difference - my random numbers benchmark
> shows there must exist such values.

Please do so!

> > The more important question is if you can see the same
> > changes in the disassembly of af_astats.o as what
> > ubitux posted here for a short test function?
>
> I do. He uses clang/gcc, so do I.

Sorry, my understanding fails here (I am not a native speaker):
You did look at the disassembly of af_astats.o and there is inlined code instead of a function call?

> The reason (irrelevant) is that both
> of us run Arch.
>
> What is "more relevant" is if _you_ can see the changes
> on some non Linux platform.

If you could show that it is faster on any platform I would already be happy!

Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Wed, Oct 14, 2015 at 6:49 AM, Carl Eugen Hoyoswrote: > Ganesh Ajjanagadde mit.edu> writes: > >> What? My numbers actually show that the new code may be faster - > > No, you are misunderstanding the numbers you posted. > (Or I misunderstand them but nobody said so yet.) > > Highest runs are most relevant, skips have to be > avoided (afaik). Usually yes, but for such a small function, even the low runs should be important. Explain to me why I get consistently lower numbers with the new code (on low runs) - if you are inclined to believe that they are irrelevant, then why do I see a consistent trend there and not simply "noise"? > > [...] > >> If you continue to post such stuff that has no basis, I might actually >> get tempted into finding out for which floating point values the new >> code is significantly faster, craft a relevant audio file, and post it >> showing a huge performance difference - my random numbers benchmark >> shows there must exist such values. > > Please do so! > >> > The more important question is if you can see the same >> > changes in the disassembly of af_astats.o as what >> > ubitux posted here for a short test function? >> >> I do. He uses clang/gcc, so do I. > > Sorry, my understanding fails here (I am not a native speaker): > You did look at the disassembly of af_astats.o and there is > inlined code instead of a function call? > >> The reason (irrelevant) is that both >> of us run Arch. >> >> What is "more relevant" is if _you_ can see the changes >> on some non Linux platform. > > If you could show that it is faster on any platform > I would already be happy! I already have with my random number benchmark. The original glibc link I posted (which you essentially dismissed as irrelevant) also shows why they switched away from their macros to fabs, fabsf, etc. 
>
> Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
Ronald S. Bultje writes:

> I think the general case, it'd be nice to figure out
> why Carl's results are slightly different from yours

Why do you think they are different at all?
Did you look at the tables?

Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 11:15 PM, Carl Eugen Hoyos wrote:
> Ronald S. Bultje writes:
>
>> I think the general case, it'd be nice to figure out
>> why Carl's results are slightly different from yours
>
> Why do you think they are different at all?
> Did you look at the tables?

They are different, and our conclusions are different (in a slight way). Carl claims that the old code and new code are mostly the same in speed, but in the cases where they differ, the old code is faster. I claim the opposite: they are mostly the same, but in the cases where they differ, the new code is faster.

>
> Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote:
> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde wrote:
> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
> >> Ganesh Ajjanagadde writes:
> >>
> >>> Bench from libavfilter/astats on a 15 min clip.
> >>
> >> I believe that your test would indicate that the
> >> old variant is faster or that no result can be
> >> given which is what my tests show.
>
> Also, how you can possibly believe that the old variant is faster is
> beyond me given the astonishing amount of work by Intel, Red Hat, and
> others to create the absolutely best performing libc.
>
> Just have a look at
> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281,
> it gives an idea of the extreme lengths they go to.
>

https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD

[/tmp]☭ cat a.c
#include <math.h>
#include <stdlib.h>

#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

double f1d(double x) { return fabs(x); }
double f2d(double x) { return FFABS(x); }

int f1i(int x) { return abs(x); }
int f2i(int x) { return FFABS(x); }
[/tmp]☭ gcc -O2 -c a.c && objdump -d -Mintel a.o

a.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <f1d>:
   0:   f2 0f 10 0d 00 00 00    movsd   xmm1,QWORD PTR [rip+0x0]        # 8 <f1d+0x8>
   7:   00
   8:   66 0f 54 c1             andpd   xmm0,xmm1
   c:   c3                      ret
   d:   0f 1f 00                nop     DWORD PTR [rax]

0000000000000010 <f2d>:
  10:   66 0f 2e 05 00 00 00    ucomisd xmm0,QWORD PTR [rip+0x0]        # 18 <f2d+0x8>
  17:   00
  18:   72 06                   jb      20 <f2d+0x10>
  1a:   f3 c3                   repz ret
  1c:   0f 1f 40 00             nop     DWORD PTR [rax+0x0]
  20:   f2 0f 10 0d 00 00 00    movsd   xmm1,QWORD PTR [rip+0x0]        # 28 <f2d+0x18>
  27:   00
  28:   66 0f 57 c1             xorpd   xmm0,xmm1
  2c:   c3                      ret
  2d:   0f 1f 00                nop     DWORD PTR [rax]

0000000000000030 <f1i>:
  30:   89 fa                   mov     edx,edi
  32:   89 f8                   mov     eax,edi
  34:   c1 fa 1f                sar     edx,0x1f
  37:   31 d0                   xor     eax,edx
  39:   29 d0                   sub     eax,edx
  3b:   c3                      ret
  3c:   0f 1f 40 00             nop     DWORD PTR [rax+0x0]

0000000000000040 <f2i>:
  40:   89 fa                   mov     edx,edi
  42:   89 f8                   mov     eax,edi
  44:   c1 fa 1f                sar     edx,0x1f
  47:   31 d0                   xor     eax,edx
  49:   29 d0                   sub     eax,edx
  4b:   c3                      ret
[/tmp]☭

So fabs() is inlined by the
compiler (gcc 5.2.0 here), while abs() is essentially identical to FFABS().

I have similar results with clang (3.7.0).

Conclusion: using fabs() looks better with at least recent versions of clang and GCC on x86-64 (but may introduce slight behaviour changes?)

To be more rigorous, it would be interesting to compare on different arch & compilers, but changing FFABS() with fabs() sounds OK to me.

--
Clément B.
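One concrete candidate for the "slight behaviour changes" mentioned above is signed zero. The sketch below assumes IEEE 754 arithmetic and a build without -ffast-math; the two helper function names are invented for the example. fabs() always clears the sign bit, while the macro tests `(a) >= 0`, which is true for -0.0, and so returns -0.0 unchanged.

```c
#include <math.h>

/* The macro as quoted in this thread. */
#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))

/* Returns 1 if fabs(-0.0) still has its sign bit set, 0 otherwise. */
static int fabs_result_is_negative_zero(void)
{
    volatile double z = -0.0;   /* volatile: keep the compiler from folding the call */
    return signbit(fabs(z)) != 0;
}

/* Returns 1 if FFABS(-0.0) still has its sign bit set, 0 otherwise. */
static int ffabs_result_is_negative_zero(void)
{
    volatile double z = -0.0;
    double r = FFABS(z);        /* -0.0 >= 0 is true, so r is z itself */
    return signbit(r) != 0;
}
```

Under those assumptions the first helper returns 0 and the second returns 1. The two results still compare equal with `==`, so the difference only shows up in code that inspects the sign bit or divides by the result.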
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On 10/13/15, Clement Boeschwrote: > On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote: >> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde >> wrote: >> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos >> > wrote: >> >> Ganesh Ajjanagadde mit.edu> writes: >> >> >> >>> Bench from libavfilter/astats on a 15 min clip. >> >> >> >> I believe that your test would indicate that the >> >> old variant is faster or that no result can be >> >> given which is what my tests show. >> >> Also, how you can possibly believe that the old variant is faster is >> beyond me given the astonishing amount of work by Intel, Red Hat, and >> others to create the absolutely best performing libc. >> >> Just have a look at >> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281, >> it gives an idea of the extreme lengths they go to. >> > > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD > > [/tmp]* cat a.c > #include > #include > > #define FFABS(a) ((a) >= 0 ? 
(a) : (-(a))) > > double f1d(double x) { return fabs(x); } > double f2d(double x) { return FFABS(x); } > > int f1i(int x) { return abs(x); } > int f2i(int x) { return FFABS(x); } > [/tmp]* gcc -O2 -c a.c && objdump -d -Mintel a.o > > a.o: file format elf64-x86-64 > > > Disassembly of section .text: > > : >0: f2 0f 10 0d 00 00 00movsd xmm1,QWORD PTR [rip+0x0]# 8 > >7: 00 >8: 66 0f 54 c1 andpd xmm0,xmm1 >c: c3 ret >d: 0f 1f 00nopDWORD PTR [rax] > > 0010 : > 10: 66 0f 2e 05 00 00 00ucomisd xmm0,QWORD PTR [rip+0x0]# 18 > > 17: 00 > 18: 72 06 jb 20 > 1a: f3 c3 repz ret > 1c: 0f 1f 40 00 nopDWORD PTR [rax+0x0] > 20: f2 0f 10 0d 00 00 00movsd xmm1,QWORD PTR [rip+0x0]# 28 > > 27: 00 > 28: 66 0f 57 c1 xorpd xmm0,xmm1 > 2c: c3 ret > 2d: 0f 1f 00nopDWORD PTR [rax] > > 0030 : > 30: 89 fa movedx,edi > 32: 89 f8 moveax,edi > 34: c1 fa 1fsaredx,0x1f > 37: 31 d0 xoreax,edx > 39: 29 d0 subeax,edx > 3b: c3 ret > 3c: 0f 1f 40 00 nopDWORD PTR [rax+0x0] > > 0040 : > 40: 89 fa movedx,edi > 42: 89 f8 moveax,edi > 44: c1 fa 1fsaredx,0x1f > 47: 31 d0 xoreax,edx > 49: 29 d0 subeax,edx > 4b: c3 ret > [/tmp]* > > So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is > essentially identical to FFABS(). > > I have similar results with clang (3.7.0). > > Conclusion: using fabs() looks better with at least recent versions of > clang > and GCC on x86-64 (but may introduce slight behaviour changes?) > > To be more rigorous, it would be interesting to compare on different arch & > compilers, but changing FFABS() with fabs() sounds OK to me. What about fmax/FFMAX? > -- > Clement B. > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
[...]
> What about fmax/FFMAX?

Feel free to try that out (it looks OT regarding the patch), but fmax() looks glibc specific

--
Clément B.
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 2:45 AM, Clément Bœschwrote: > On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote: >> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde >> wrote: >> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos >> > wrote: >> >> Ganesh Ajjanagadde mit.edu> writes: >> >> >> >>> Bench from libavfilter/astats on a 15 min clip. >> >> >> >> I believe that your test would indicate that the >> >> old variant is faster or that no result can be >> >> given which is what my tests show. >> >> Also, how you can possibly believe that the old variant is faster is >> beyond me given the astonishing amount of work by Intel, Red Hat, and >> others to create the absolutely best performing libc. >> >> Just have a look at >> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281, >> it gives an idea of the extreme lengths they go to. >> > > https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD > > [/tmp]☭ cat a.c > #include > #include > > #define FFABS(a) ((a) >= 0 ? 
(a) : (-(a))) > > double f1d(double x) { return fabs(x); } > double f2d(double x) { return FFABS(x); } > > int f1i(int x) { return abs(x); } > int f2i(int x) { return FFABS(x); } > [/tmp]☭ gcc -O2 -c a.c && objdump -d -Mintel a.o > > a.o: file format elf64-x86-64 > > > Disassembly of section .text: > > : >0: f2 0f 10 0d 00 00 00movsd xmm1,QWORD PTR [rip+0x0]# 8 > >7: 00 >8: 66 0f 54 c1 andpd xmm0,xmm1 >c: c3 ret >d: 0f 1f 00nopDWORD PTR [rax] > > 0010 : > 10: 66 0f 2e 05 00 00 00ucomisd xmm0,QWORD PTR [rip+0x0]# 18 > > 17: 00 > 18: 72 06 jb 20 > 1a: f3 c3 repz ret > 1c: 0f 1f 40 00 nopDWORD PTR [rax+0x0] > 20: f2 0f 10 0d 00 00 00movsd xmm1,QWORD PTR [rip+0x0]# 28 > > 27: 00 > 28: 66 0f 57 c1 xorpd xmm0,xmm1 > 2c: c3 ret > 2d: 0f 1f 00nopDWORD PTR [rax] > > 0030 : > 30: 89 fa movedx,edi > 32: 89 f8 moveax,edi > 34: c1 fa 1fsaredx,0x1f > 37: 31 d0 xoreax,edx > 39: 29 d0 subeax,edx > 3b: c3 ret > 3c: 0f 1f 40 00 nopDWORD PTR [rax+0x0] > > 0040 : > 40: 89 fa movedx,edi > 42: 89 f8 moveax,edi > 44: c1 fa 1fsaredx,0x1f > 47: 31 d0 xoreax,edx > 49: 29 d0 subeax,edx > 4b: c3 ret > [/tmp]☭ > > So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is > essentially identical to FFABS(). Yes, on integers they are identical. Differences come on floating point, which is the point of my patch. Thanks for showing the asm. > > I have similar results with clang (3.7.0). > > Conclusion: using fabs() looks better with at least recent versions of clang > and GCC on x86-64 (but may introduce slight behaviour changes?) There might be some behavior changes (floating point is not exact, etc), but at least they are governed by the ISO C document. FATE still passes. > > To be more rigorous, it would be interesting to compare on different arch & > compilers, but changing FFABS() with fabs() sounds OK to me. > > -- > Clément B. 
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 9:12 AM, Ganesh Ajjanagadde wrote:
> On Tue, Oct 13, 2015 at 4:02 AM, Clément Bœsch wrote:
>> On Tue, Oct 13, 2015 at 09:25:03AM +0200, Paul B Mahol wrote:
>> [...]
>>> What about fmax/FFMAX?
>>
>> Feel free to try that out (it looks OT regarding the patch), but fmax()
>> looks glibc specific

Seems they are actually ISO:
http://en.cppreference.com/w/c/numeric/math/fmax

Can someone check availability on all of our platforms of interest (e.g. Microsoft)?

>
> Maybe (long term) we can use an av_fabs, av_fabsf, av_fmin/av_fmax (or
> ff_, avpriv_, etc) that pick out the right thing for different
> configurations. It will need something split between configure/header
> guards. I am willing to do this, once everyone is convinced.
>
>>
>> --
>> Clément B.
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 8:58 PM, Ronald S. Bultjewrote: > Hi, > > On Tue, Oct 13, 2015 at 8:09 PM, Ganesh Ajjanagadde > wrote: > >> On Tue, Oct 13, 2015 at 2:45 AM, Clément Bœsch wrote: >> > On Tue, Oct 13, 2015 at 12:31:10AM -0400, Ganesh Ajjanagadde wrote: >> >> On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde >> wrote: >> >> > On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos >> wrote: >> >> >> Ganesh Ajjanagadde mit.edu> writes: >> >> >> >> >> >>> Bench from libavfilter/astats on a 15 min clip. >> >> >> >> >> >> I believe that your test would indicate that the >> >> >> old variant is faster or that no result can be >> >> >> given which is what my tests show. >> >> >> >> Also, how you can possibly believe that the old variant is faster is >> >> beyond me given the astonishing amount of work by Intel, Red Hat, and >> >> others to create the absolutely best performing libc. >> >> >> >> Just have a look at >> >> >> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281 >> , >> >> it gives an idea of the extreme lengths they go to. >> >> >> > >> > >> https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_fabs.c;hb=HEAD >> > >> > [/tmp]☭ cat a.c >> > #include >> > #include >> > >> > #define FFABS(a) ((a) >= 0 ? 
(a) : (-(a))) >> > >> > double f1d(double x) { return fabs(x); } >> > double f2d(double x) { return FFABS(x); } >> > >> > int f1i(int x) { return abs(x); } >> > int f2i(int x) { return FFABS(x); } >> > [/tmp]☭ gcc -O2 -c a.c && objdump -d -Mintel a.o >> > >> > a.o: file format elf64-x86-64 >> > >> > >> > Disassembly of section .text: >> > >> > : >> >0: f2 0f 10 0d 00 00 00movsd xmm1,QWORD PTR [rip+0x0]# >> 8 >> >7: 00 >> >8: 66 0f 54 c1 andpd xmm0,xmm1 >> >c: c3 ret >> >d: 0f 1f 00nopDWORD PTR [rax] >> > >> > 0010 : >> > 10: 66 0f 2e 05 00 00 00ucomisd xmm0,QWORD PTR [rip+0x0] >> # 18 >> > 17: 00 >> > 18: 72 06 jb 20 >> > 1a: f3 c3 repz ret >> > 1c: 0f 1f 40 00 nopDWORD PTR [rax+0x0] >> > 20: f2 0f 10 0d 00 00 00movsd xmm1,QWORD PTR [rip+0x0]# >> 28 >> > 27: 00 >> > 28: 66 0f 57 c1 xorpd xmm0,xmm1 >> > 2c: c3 ret >> > 2d: 0f 1f 00nopDWORD PTR [rax] >> > >> > 0030 : >> > 30: 89 fa movedx,edi >> > 32: 89 f8 moveax,edi >> > 34: c1 fa 1fsaredx,0x1f >> > 37: 31 d0 xoreax,edx >> > 39: 29 d0 subeax,edx >> > 3b: c3 ret >> > 3c: 0f 1f 40 00 nopDWORD PTR [rax+0x0] >> > >> > 0040 : >> > 40: 89 fa movedx,edi >> > 42: 89 f8 moveax,edi >> > 44: c1 fa 1fsaredx,0x1f >> > 47: 31 d0 xoreax,edx >> > 49: 29 d0 subeax,edx >> > 4b: c3 ret >> > [/tmp]☭ >> > >> > So fabs() is inlined by the compiler (gcc 5.2.0 here), while abs() is >> > essentially identical to FFABS(). >> > >> > I have similar results with clang (3.7.0). >> > >> > Conclusion: using fabs() looks better with at least recent versions of >> clang >> > and GCC on x86-64 (but may introduce slight behaviour changes?) >> > >> > To be more rigorous, it would be interesting to compare on different >> arch & >> > compilers, but changing FFABS() with fabs() sounds OK to me. >> >> I noticed that is being applied piecemeal, and some of it has been >> pushed. Does that mean I am free to push (with the reduced commit >> message) as well? > > > You'll notice that Paul did it for the filters he maintains. 
I'm fine with
> you doing this for any code I maintain (no further review required). You
> can find maintainers for each piece of code in git log or MAINTAINERS. It
> sounds like Paul is fine with this also. I think the general case, it'd be
> nice to figure out why Carl's results are slightly different from yours (or
> maybe it's noise?). If we can resolve that, I don't think there's any
> further outstanding objections, right?

No other outstanding objections - the only serious concern is availability (which is a non-issue since we were already using fabs, fabsf sporadically). Carl's issues should be either noise, or a bad/slow libc fabs implementation. Hence I asked him for his config.

I will give the respective maintainers a week to add this gradually. To reiterate, I have not touched avcodec as it is mostly integer math anyway - if someone could point out some "hotspots" in avcodec with this issue, that would be great.

>
> Also,
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Mon, Oct 12, 2015 at 7:59 AM, Ganesh Ajjanagadde wrote:
> On Mon, Oct 12, 2015 at 7:46 AM, Carl Eugen Hoyos wrote:
>> Ganesh Ajjanagadde writes:
>>
>>> It is well known that fabs and fabsf are at least as fast and usually
>>> faster than the FFABS macro, at least on the gcc+glibc combination.
>>
>> I wasn't aware of this.
>> And I believe we support other compilers and other
>> libc implementations.
>
> Indeed, which is why performance comparisons are welcome. I argue
> below why any sane configuration should not regress performance wise.
> This is also "relevant information" in my view.
>
>>
>>> For instance, see the reference:
>>> http://patchwork.sourceware.org/patch/6735/.
>>> This was a patch to glibc in order to remove their usages. Given their
>>> general performance obsession (more than FFmpeg in many cases), they
>>> have ensured that fabs and fabsf never perform worse than FFABS.
>>
>> Ok but is this really related?
>
> The reference is, the comment may not be, I was slightly annoyed at
> FFABS usage when libc provides them on all our platforms, and wanted a
> justification that would appeal to the FFmpeg crowd, namely performance
> to move away from them.
>
>>
>>> I have tested on x86-64 Haswell with GCC 5.2 - even with no strict IEEE
>>> mode enabled, and just the standard -O3 optimizations, there is a
>>> performance benefit.
>>
>> This is the only relevant information imo.
>> Please provide (very, very short) information
>> on what you tested.
>
> Random integers, same style as before. I have not posted numbers,
> since my numbers are anyway meaningless: I lack non
> x86-64+(gcc/clang)+glibc configurations.
> As for that being the only relevant message, I do intend to shorten
> the message. The long stuff was simply my own personal motivation to
> make people understand why I did this stuff. Otherwise, I would have
> sent a separate message anyway in the patch thread, let me know what
> style you prefer.
>
>>
>> Since you mention libc so often: Does the patch
>> work on win*, aix and other strange platforms?
>
> Why not, any standard, conformant fabs/fabsf should. Again, I lack the
> configurations and am just a university student with a single laptop.
> fabs and fabsf are already being used elsewhere. If anything, they
> are far better specified on IEEE 754 than FFABS - behavior with NaN,
> Inf, etc.

Bench from libavfilter/astats on a 15 min clip. Of course the
difference is slight, but nonetheless it exists. The best case is the
same, but look at the difference in the worst cases (as was mentioned
in the glibc link I gave, I suspect some trickery for subnormal
floats/Inf/0.0). By the way, I can show results skewing even more
heavily in favor of fabs by using "random" floating point numbers,
random in the sense of being a random 64 bit pattern (same style as my
old crude bench - fill a large array, and test). There, believe it or
not, I was getting a nearly 1.5-2x improvement.

Anyway, here it is:
old:
   4230 decicycles in abs,       1 runs,      0 skips
   2520 decicycles in abs,       2 runs,      0 skips
   1635 decicycles in abs,       4 runs,      0 skips
    967 decicycles in abs,       8 runs,      0 skips
    635 decicycles in abs,      16 runs,      0 skips
    473 decicycles in abs,      32 runs,      0 skips
    389 decicycles in abs,      64 runs,      0 skips
    350 decicycles in abs,     128 runs,      0 skips
    331 decicycles in abs,     256 runs,      0 skips
    321 decicycles in abs,     512 runs,      0 skips
    319 decicycles in abs,    1024 runs,      0 skips
    318 decicycles in abs,    2048 runs,      0 skips
    315 decicycles in abs,    4096 runs,      0 skips
    317 decicycles in abs,    8192 runs,      0 skips
    335 decicycles in abs,   16384 runs,      0 skips
    335 decicycles in abs,   32768 runs,      0 skips
    333 decicycles in abs,   65536 runs,      0 skips
    342 decicycles in abs,  131072 runs,      0 skips
    340 decicycles in abs,  262144 runs,      0 skips
    345 decicycles in abs,  524285 runs,      3 skips
    348 decicycles in abs, 1048565 runs,     11 skips
    351 decicycles in abs, 2097129 runs,     23 skips
    352 decicycles in abs, 4194252 runs,     52 skips
    350 decicycles in abs, 8388498 runs,    110 skips
    351 decicycles in abs,16776993 runs,    223 skips
    352 decicycles in abs,33553999 runs,    433 skips
    351 decicycles in abs,67108036 runs,    828 skips

new:
   3540 decicycles in abs,       1 runs,      0 skips
   2160 decicycles in abs,       2 runs,      0 skips
   1447 decicycles in abs,       4 runs,      0 skips
    881 decicycles in abs,       8 runs,      0 skips
    594 decicycles in abs,      16 runs,      0 skips
    455 decicycles in abs,      32 runs,      0 skips
    382 decicycles in abs,      64 runs,      0 skips
    361 decicycles in abs,     128 runs,      0 skips
    356 decicycles in abs,
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
Ganesh Ajjanagadde writes:
> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
> > Ganesh Ajjanagadde writes:
> >
> >> Bench from libavfilter/astats on a 15 min clip.
> >
> > I believe that your test would indicate that the
> > old variant is faster or that no result can be
> > given which is what my tests show.
>
> Look at the bench and the numbers again, I have
> provided it above.

Ok:
old:
    389 decicycles in abs,      64 runs,      0 skips
    350 decicycles in abs,     128 runs,      0 skips
    331 decicycles in abs,     256 runs,      0 skips
    321 decicycles in abs,     512 runs,      0 skips
    319 decicycles in abs,    1024 runs,      0 skips
    318 decicycles in abs,    2048 runs,      0 skips
    315 decicycles in abs,    4096 runs,      0 skips
    317 decicycles in abs,    8192 runs,      0 skips
    335 decicycles in abs,   16384 runs,      0 skips
    335 decicycles in abs,   32768 runs,      0 skips

new:
    382 decicycles in abs,      64 runs,      0 skips
    361 decicycles in abs,     128 runs,      0 skips
    356 decicycles in abs,     256 runs,      0 skips
    334 decicycles in abs,     512 runs,      0 skips
    322 decicycles in abs,    1024 runs,      0 skips
    317 decicycles in abs,    2048 runs,      0 skips
    315 decicycles in abs,    4096 runs,      0 skips
    341 decicycles in abs,    8192 runs,      0 skips
    363 decicycles in abs,   16383 runs,      1 skips
    342 decicycles in abs,   32767 runs,      1 skips

Numbers with high skips or low runs are not so
relevant afaik.

> They are essentially identical in the best case
> (most number of runs), the new variant is faster in
> the worst case.

I would say the opposite is true but we can certainly
agree that there is no proof that one is faster.

> You have not provided a bench proving otherwise.

old:
user    0m20.338s
user    0m20.408s
user    0m20.287s
user    0m20.365s
user    0m20.208s

new:
user    0m20.197s
user    0m20.577s
user    0m20.434s
user    0m20.322s
user    0m20.356s

> > I am not sure if it makes sense to apply a patch
> > that is meant to improve speed if this improvement
> > can't be shown.
>
> I believe I have shown it above clearly.

Imo, you have shown clearly that neither variant can
be shown to be faster.
Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Mon, Oct 12, 2015 at 4:57 PM, Ganesh Ajjanagaddewrote: > On Mon, Oct 12, 2015 at 7:59 AM, Ganesh Ajjanagadde wrote: >> On Mon, Oct 12, 2015 at 7:46 AM, Carl Eugen Hoyos wrote: >>> Ganesh Ajjanagadde mit.edu> writes: >>> It is well known that fabs and fabsf are at least as fast and usually faster than the FFABS macro, at least on the gcc+glibc combination. >>> >>> I wasn't aware of this. >>> And I believe we support other compilers and other >>> libc implementations. >> >> Indeed, which is why performance comparisons are welcome. I argue >> below why any sane configuration should not regress performance wise. >> This is also "relevant information" in my view. >> >>> For instance, see the reference: http://patchwork.sourceware.org/patch/6735/. This was a patch to glibc in order to remove their usages. Given their general performance obsession (more than FFmpeg in many cases), they have ensured that fabs and fabsf never peform worse than FFABS. >>> >>> Ok but is this really related? >> >> The reference is, the comment may not be, I was slightly annoyed at >> FFABS usage when libc provides them on all our platforms, and wanted a >> justification that would appeal to the FFmpeg crowd, namely peformance >> to move away from them. >> >>> I have tested on x86-64 Haswell with GCC 5.2 - even with no strict IEEE mode enabled, and just the standard -O3 optimizations, there is a performance benefit. >>> >>> This is the only relevant information imo. >>> Please provide (very, very short) information >>> on what you tested. >> >> Random integers, same style as before. I have not posted numbers, >> since my numbers are anyway meaningless: I lack non >> x86-64+(gcc/clang)+glibc configurations. >> As for that being the only relevant message, I do intend to shorten >> the message. The long stuff was simply my own personal motivation to >> make people understand why I did this stuff. 
Otherwise, I would have >> sent a separate message anyway in the patch thread, let me know what >> style you prefer. >> >>> >>> Since you mention libc so often: Does the patch >>> work on win*, aix and other strange platforms? >> >> Why not, any standard, conformant fabs/fabsf should. Again, I lack the >> configurations and am just a university student with a single laptop. >> fabs and fabsf are already being used elsewhere. Inf anything, they >> are far better specified on IEEE 754 than FFABS - behavior with NaN, >> Inf, etc. > > Bench from libavfilter/astats on a 15 min clip. Of course the > difference is slight, but nonetheless it exists. The best case is the > same, but look at the difference in the worst cases (as was mentioned > in the glibc link I gave, I suspect some trickery for subnormal > floats/Inf/0.0). By the way, I can show results skewing even more > heavily in favor of fabs by using "random" floating point numbers, > random in the sense of being a random 64 bit pattern (same style as my > old crude bench - fill a large array, and test). There, believe it or > not, I was getting a nearly 1.5-2x improvement. 
> > Anyway, here it is: > old: >4230 decicycles in abs, 1 runs, 0 skips >2520 decicycles in abs, 2 runs, 0 skips >1635 decicycles in abs, 4 runs, 0 skips > 967 decicycles in abs, 8 runs, 0 skips > 635 decicycles in abs, 16 runs, 0 skips > 473 decicycles in abs, 32 runs, 0 skips > 389 decicycles in abs, 64 runs, 0 skips > 350 decicycles in abs, 128 runs, 0 skips > 331 decicycles in abs, 256 runs, 0 skips > 321 decicycles in abs, 512 runs, 0 skips > 319 decicycles in abs,1024 runs, 0 skips > 318 decicycles in abs,2048 runs, 0 skips > 315 decicycles in abs,4096 runs, 0 skips > 317 decicycles in abs,8192 runs, 0 skips > 335 decicycles in abs, 16384 runs, 0 skips > 335 decicycles in abs, 32768 runs, 0 skips > 333 decicycles in abs, 65536 runs, 0 skips > 342 decicycles in abs, 131072 runs, 0 skips > 340 decicycles in abs, 262144 runs, 0 skips > 345 decicycles in abs, 524285 runs, 3 skips > 348 decicycles in abs, 1048565 runs, 11 skips > 351 decicycles in abs, 2097129 runs, 23 skipsbitrate=N/A > 352 decicycles in abs, 4194252 runs, 52 skipsbitrate=N/A > 350 decicycles in abs, 8388498 runs,110 skipsbitrate=N/A > 351 decicycles in abs,16776993 runs,223 skipsbitrate=N/A > 352 decicycles in abs,33553999 runs,433 skipsbitrate=N/A > 351 decicycles in abs,67108036 runs,828 skips > new: >3540 decicycles in abs, 1 runs, 0 skips >2160 decicycles in abs, 2 runs, 0 skips >1447 decicycles in abs, 4 runs, 0 skips > 881 decicycles in abs, 8 runs, 0 skips > 594
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
> Ganesh Ajjanagadde writes:
>
>> Bench from libavfilter/astats on a 15 min clip.
>
> I believe that your test would indicate that the
> old variant is faster or that no result can be
> given which is what my tests show.

Look at the bench and the numbers again, I have provided it above.
They are essentially identical in the best case (most number of runs),
the new variant is faster in the worst case. You have not provided a
bench proving otherwise.

>
> I am not sure if it makes sense to apply a patch
> that is meant to improve speed if this improvement
> can't be shown.

I believe I have shown it above clearly.

>
> Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 12:26 AM, Ganesh Ajjanagadde wrote:
> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
>> Ganesh Ajjanagadde writes:
>>
>>> Bench from libavfilter/astats on a 15 min clip.
>>
>> I believe that your test would indicate that the
>> old variant is faster or that no result can be
>> given which is what my tests show.

Also, how you can possibly believe that the old variant is faster is
beyond me given the astonishing amount of work by Intel, Red Hat, and
others to create the absolutely best performing libc. Just have a look
at
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281,
it gives an idea of the extreme lengths they go to.

>
> Look at the bench and the numbers again, I have provided it above.
> They are essentially identical in the best case (most number of runs),
> the new variant is faster in the worst case. You have not provided a
> bench proving otherwise.
>
>>
>> I am not sure if it makes sense to apply a patch
>> that is meant to improve speed if this improvement
>> can't be shown.
>
> I believe I have shown it above clearly.
>
>>
>> Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 12:44 AM, Carl Eugen Hoyos wrote:
> Ganesh Ajjanagadde writes:
>> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
>> > Ganesh Ajjanagadde writes:
>> >
>> >> Bench from libavfilter/astats on a 15 min clip.
>> >
>> > I believe that your test would indicate that the
>> > old variant is faster or that no result can be
>> > given which is what my tests show.
>>
>> Look at the bench and the numbers again, I have
>> provided it above.
>
> Ok:
> old:
>     389 decicycles in abs,      64 runs,      0 skips
>     350 decicycles in abs,     128 runs,      0 skips
>     331 decicycles in abs,     256 runs,      0 skips
>     321 decicycles in abs,     512 runs,      0 skips
>     319 decicycles in abs,    1024 runs,      0 skips
>     318 decicycles in abs,    2048 runs,      0 skips
>     315 decicycles in abs,    4096 runs,      0 skips
>     317 decicycles in abs,    8192 runs,      0 skips
>     335 decicycles in abs,   16384 runs,      0 skips
>     335 decicycles in abs,   32768 runs,      0 skips
>
> new:
>     382 decicycles in abs,      64 runs,      0 skips
>     361 decicycles in abs,     128 runs,      0 skips
>     356 decicycles in abs,     256 runs,      0 skips
>     334 decicycles in abs,     512 runs,      0 skips
>     322 decicycles in abs,    1024 runs,      0 skips
>     317 decicycles in abs,    2048 runs,      0 skips
>     315 decicycles in abs,    4096 runs,      0 skips
>     341 decicycles in abs,    8192 runs,      0 skips
>     363 decicycles in abs,   16383 runs,      1 skips
>     342 decicycles in abs,   32767 runs,      1 skips
> Numbers with high skips or low runs are not so
> relevant afaik.

Not so relevant, but as I said: it is still better.

>
>> They are essentially identical in the best case
>> (most number of runs), the new variant is faster in
>> the worst case.
>
> I would say the opposite is true but we can certainly
> agree that there is no proof that one is faster.

Do a random float test, the difference is more pronounced.

>
>> You have not provided a bench proving otherwise.
>
> old:
> user    0m20.338s
> user    0m20.408s
> user    0m20.287s
> user    0m20.365s
> user    0m20.208s
> new:
> user    0m20.197s
> user    0m20.577s
> user    0m20.434s
> user    0m20.322s
> user    0m20.356s

The difference here is imo too small to say anything. My point is
precisely this: on most inputs, there is no difference. On bad (worst
case) inputs, using fabs instead of the macro is far superior. The
random float bench proves this. Translating that to some audio file
should be easy: I suspect placing most samples near a silence value (0)
does this.

>
>> > I am not sure if it makes sense to apply a patch
>> > that is meant to improve speed if this improvement
>> > can't be shown.
>>
>> I believe I have shown it above clearly.
>
> Imo, you have shown clearly that neither variant can
> be shown to be faster.
>
> Carl Eugen
Re: [FFmpeg-devel] [PATCH] avfilter, swresample, swscale: use fabs, fabsf instead of FFABS
On Tue, Oct 13, 2015 at 1:03 AM, Ganesh Ajjanagadde wrote:
> On Tue, Oct 13, 2015 at 12:44 AM, Carl Eugen Hoyos wrote:
>> Ganesh Ajjanagadde writes:
>>> On Tue, Oct 13, 2015 at 12:16 AM, Carl Eugen Hoyos wrote:
>>> > Ganesh Ajjanagadde writes:
>>> >
>>> >> Bench from libavfilter/astats on a 15 min clip.
>>> >
>>> > I believe that your test would indicate that the
>>> > old variant is faster or that no result can be
>>> > given which is what my tests show.
>>>
>>> Look at the bench and the numbers again, I have
>>> provided it above.
>>
>> Ok:
>> old:
>>     389 decicycles in abs,      64 runs,      0 skips
>>     350 decicycles in abs,     128 runs,      0 skips
>>     331 decicycles in abs,     256 runs,      0 skips
>>     321 decicycles in abs,     512 runs,      0 skips
>>     319 decicycles in abs,    1024 runs,      0 skips
>>     318 decicycles in abs,    2048 runs,      0 skips
>>     315 decicycles in abs,    4096 runs,      0 skips
>>     317 decicycles in abs,    8192 runs,      0 skips
>>     335 decicycles in abs,   16384 runs,      0 skips
>>     335 decicycles in abs,   32768 runs,      0 skips
>>
>> new:
>>     382 decicycles in abs,      64 runs,      0 skips
>>     361 decicycles in abs,     128 runs,      0 skips
>>     356 decicycles in abs,     256 runs,      0 skips
>>     334 decicycles in abs,     512 runs,      0 skips
>>     322 decicycles in abs,    1024 runs,      0 skips
>>     317 decicycles in abs,    2048 runs,      0 skips
>>     315 decicycles in abs,    4096 runs,      0 skips
>>     341 decicycles in abs,    8192 runs,      0 skips
>>     363 decicycles in abs,   16383 runs,      1 skips
>>     342 decicycles in abs,   32767 runs,      1 skips
>> Numbers with high skips or low runs are not so
>> relevant afaik.
>
> Not so relevant, but as I said: it is still better.
>
>>
>>> They are essentially identical in the best case
>>> (most number of runs), the new variant is faster in
>>> the worst case.
>>
>> I would say the opposite is true but we can certainly
>> agree that there is no proof that one is faster.
>
> Do a random float test, the difference is more pronounced.
Simple bench for all abs stuff:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include <time.h>

#define FFABS(a) ((a) >= 0 ? (a) : (-(a)))
#define NUM_TRIALS 10
#define NUM_ITER 10

/* One input array per type under test. */
static float f[NUM_TRIALS];
static double g[NUM_TRIALS];
static int i[NUM_TRIALS];
static long long ll[NUM_TRIALS];

int main(void)
{
    int c, d;
    clock_t start, end;
    double time;
    float abs_f;
    double abs_d;
    int abs_i;
    long long abs_ll;

    /* Fill the arrays with pseudo-random values. */
    for (c = 0; c < NUM_TRIALS; ++c) {
        ll[c] = random();
        i[c] = rand();
        f[c] = (float)rand()/(float)(RAND_MAX/FLT_MAX);
        g[c] = (double)random()/(double)(RAND_MAX/DBL_MAX);
    }

    /* float: fabsf vs FFABS */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = fabsf(f[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabsf: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            f[c] = FFABS(f[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    /* double: fabs vs FFABS */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = fabs(g[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("fabs: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            g[c] = FFABS(g[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    /* int: abs vs FFABS */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = abs(i[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("abs: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            i[c] = FFABS(i[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    /* long long: llabs vs FFABS */
    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = llabs(ll[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("llabs: %lf\n", time);

    start = clock();
    for (d = 0; d < NUM_ITER; ++d)
        for (c = 0; c < NUM_TRIALS; ++c)
            ll[c] = FFABS(ll[c]);
    end = clock();
    time = ((double) (end - start)) / CLOCKS_PER_SEC;
    printf("FFABS: %lf\n", time);

    return 0;
}

>
>>
>>> You have not provided a bench proving otherwise.
>>
>> old:
>> user    0m20.338s
>> user    0m20.408s
>> user    0m20.287s
>> user    0m20.365s
>> user    0m20.208s
>> new:
>> user    0m20.197s
>> user    0m20.577s
>> user    0m20.434s
>> user    0m20.322s
>> user    0m20.356s

Am also