Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats
Hi,

On Sun, 2008-09-14 at 11:30 +0200, Geert Jordaens wrote:

> Introducing the restrict qualifier will require some more checks by the programmer, and enabling the *-fstrict-aliasing* flag and the *-Wstrict-aliasing* warning would be advisable.

I don't think enabling the strict-aliasing rules is advisable in general. It is too easy to break the rules, and this might lead to bugs.

Sven
___
Gegl-developer mailing list
Gegl-developer@lists.XCF.Berkeley.EDU
https://lists.XCF.Berkeley.EDU/mailman/listinfo/gegl-developer
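To illustrate how easily the aliasing rules can be broken: a common idiom reads the bits of a float through a pointer cast to an integer type, which the C aliasing rules do not permit, while copying the bytes with memcpy is well defined. What follows is only a minimal sketch, assuming a 32-bit IEEE 754 float; the name float_bits is hypothetical and is not part of GEGL or GLib.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the bit pattern of a float.
 * Assumes float is a 32-bit IEEE 754 value. */
static uint32_t
float_bits (float x)
{
  uint32_t bits;

  /* Tempting, but not allowed by the aliasing rules:
   *   bits = *(uint32_t *) &x;
   * The compiler may assume that a uint32_t lvalue never refers to a
   * float object, so with -fstrict-aliasing this read can misbehave. */

  /* Well-defined alternative: copy the object representation. */
  memcpy (&bits, &x, sizeof bits);
  return bits;
}
```

With the stated assumption, float_bits (1.0f) yields 0x3F800000. Compilers typically turn a small fixed-size memcpy like this one into a plain register move, so the safe form need not cost anything.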
Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats
Sven Neumann wrote:

> Hi,
>
> I've filed an enhancement request for G_GNUC_RESTRICT:
> http://bugzilla.gnome.org/show_bug.cgi?id=552098
>
> We should however not wait for this to be included in GLib. As GLib 2.18 has just been released, it will take a while before 2.20 hits the road.
>
> Sven

Introducing the restrict qualifier will require some more checks by the programmer, and enabling the *-fstrict-aliasing* flag and the *-Wstrict-aliasing* warning would be advisable.

Two good articles on the use of the restrict qualifier:
http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html
and
http://developers.sun.com/solaris/articles/cc_restrict.html
Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats
Hi,

On Fri, 2008-09-12 at 19:45 -0400, Nicolas Robidoux wrote:

> There is another C99/gcc built-in with the potential to speed up code a lot: the restrict keyword. See:
> http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html

It looks like the restrict keyword could easily be wrapped in a macro that evaluates to restrict on compilers that support it and to nothing on compilers that lack support for it. So even if we decide that it is too early to use C99 features, we could still use restrict. We just need to add a configure check for it. We could even suggest that it be added to GLib as G_GNUC_RESTRICT.

Sven
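The macro idea described in this message could look roughly like the sketch below. The macro name SKETCH_RESTRICT and the function scale_into are hypothetical, chosen only for illustration; this is not the definition proposed for G_GNUC_RESTRICT, and it uses preprocessor tests where the message suggests a configure check (Autoconf's AC_C_RESTRICT is one existing check of that kind).

```c
#include <stddef.h>

/* Hypothetical wrapper: expands to a restrict keyword where one is
 * known to be available, and to nothing elsewhere. */
#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define SKETCH_RESTRICT restrict
#elif defined (__GNUC__)
#  define SKETCH_RESTRICT __restrict__
#else
#  define SKETCH_RESTRICT
#endif

/* The qualifiers promise the compiler that dst and src do not overlap,
 * so it need not reload src[i] after each store to dst[i].  Keeping
 * that promise is the caller's responsibility. */
static void
scale_into (float       *SKETCH_RESTRICT dst,
            const float *SKETCH_RESTRICT src,
            float        factor,
            size_t       n)
{
  size_t i;

  for (i = 0; i < n; i++)
    dst[i] = factor * src[i];
}
```

On a compiler without any restrict support the macro vanishes and the function still compiles, only without the no-overlap promise.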
Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats
Hi,

regarding the use of C99 features, this is a pointer to the last time this question came up among the GLib developers:

http://mail.gnome.org/archives/gtk-devel-list/2008-June/msg00020.html

The thread linked from this mail might have some interesting arguments that we should consider.

Sven
Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats
Hi,

I've filed an enhancement request for G_GNUC_RESTRICT:

http://bugzilla.gnome.org/show_bug.cgi?id=552098

We should however not wait for this to be included in GLib. As GLib 2.18 has just been released, it will take a while before 2.20 hits the road.

Sven
Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats
Hello Sven:

Thanks for your answer. Simplifies my life a lot.

--

There is another C99/gcc built-in with the potential to speed up code a lot: the restrict keyword. See:
http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html

I'll build two versions of the gegl-sampler-yafr code (one of which I'll masquerade as gegl-sampler-cubic so I can run both without recompiling) and run careful benchmarks this weekend. One version will stay away from restrict and C99 math intrinsics; the other will not (on the first pass, I may not go as far as making explicit calls to fma, although my code is structured in the hope that the compiler recognizes fused multiply-adds when appropriate).

I don't quite understand the issues of writing C++ code using C99 features (this is why knowing that they are gcc built-ins is useful, provided one knows that gcc will be the compiler). Maybe I'll take inspiration from http://www.ddj.com/cpp/184401653

Nicolas Robidoux
Laurentian University/Universite Laurentienne
Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats
I just completed a quick and dirty benchmark comparing the use of arithmetic branching using C99/gcc intrinsics within the yafr sampler code to the use of the standard C if-then-else.

These tests were performed on a Thinkpad t60p with Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz with 2025MiB memory running 2.6.24-19-generic #1 SMP by way of a pretty standard Ubuntu 8.04.

Warning: There seems to be something wrong with math.h with the current version of gcc, as suggested by some recent bug postings. For example, according to the gcc documentation, I should not have to prefix fminf with __builtin_. Consequently, it could be that the benchmark results will soon be made irrelevant.

Second warning: If my memory is good, Intel chips have a good and fast implementation of the ? : branching construct (having to do with selecting which register to copy into another), as well as good branch prediction. My code without intrinsics is structured to take advantage of this.

Third warning: I have not optimized by looking at the assembler output of gcc, and have done no optimization of the arithmetic branching version of the code. In particular, I have not used fmaf, even though my code is peppered with opportunities to use it (this may not be a big deal: apparently, gcc attempts to spot opportunities to use fused multiply-add).

-- quick description of the test: --

I ran a bunch of consecutive scalings (times 20) of a digital photograph with initial dimensions 200x133, driving the gegl scale through an xml file analogous to the ones in gegl/docs/gallery, alternating between the version with branching and the version using arithmetic branching with intrinsics, and throwing in four scalings with the gegl stock linear.
- Differences between the two versions of the code: -

16 code segments resembling the following (note the ?: this is the version with branching):

    const gfloat prem_squared = prem * prem;
    const gfloat deux_squared = deux * deux;
    const gfloat troi_squared = troi * troi;
    const gfloat prem_times_deux = prem * deux;
    const gfloat deux_times_troi = deux * troi;
    const gfloat deux_squared_minus_prem_squared =
      deux_squared - prem_squared;
    const gfloat troi_squared_minus_deux_squared =
      troi_squared - deux_squared;
    /* A positive difference of squares means the first value is the
       smaller one in absolute value. */
    const gfloat prem_vs_deux =
      deux_squared_minus_prem_squared > (gfloat) 0. ? prem : deux;
    const gfloat deux_vs_troi =
      troi_squared_minus_deux_squared > (gfloat) 0. ? deux : troi;
    /* A positive product means the two values have the same sign. */
    const gfloat my__up =
      prem_times_deux > (gfloat) 0. ? prem_vs_deux : (gfloat) 0.;
    const gfloat my_dow =
      deux_times_troi > (gfloat) 0. ? deux_vs_troi : (gfloat) 0.;

were replaced by (this is the version with arithmetic branching):

    const gfloat abs_prem = fabsf( prem );
    const gfloat abs_deux = fabsf( deux );
    const gfloat abs_troi = fabsf( troi );
    const gfloat prem_vs_deux = __builtin_fminf( abs_prem, abs_deux );
    const gfloat deux_vs_troi = __builtin_fminf( abs_deux, abs_troi );
    /* copysignf( 1, x ) is +1 or -1, matching the sign of x. */
    const gfloat sign_prem = copysignf( (gfloat) 1., prem );
    const gfloat sign_deux = copysignf( (gfloat) 1., deux );
    const gfloat sign_troi = copysignf( (gfloat) 1., troi );
    /* The sign product plus one is 2 when the signs match and 0 when
       they differ. */
    const gfloat my__up =
      ( sign_prem * sign_deux + (gfloat) 1. ) * prem_vs_deux;
    const gfloat my_dow =
      ( sign_deux * sign_troi + (gfloat) 1. ) * deux_vs_troi;

Basically, what the code snippets do is this: If prem and deux have the same sign, put the smallest one (in absolute value) in my__up. Otherwise, set my__up to zero. Do likewise with deux, troi and my_dow.

The above two code snippets represent the best ways of performing this that I could figure out.

=== Overall conclusion: ===

Arithmetic branching (without other improvements) does not appear to be worth the trouble.
Average timings:

stock gegl linear scale: 47.50
  = ( 47.474 + 47.581 + 47.345 + 47.595 ) / 4

gegl yafr with ? branching and no use of intrinsics: 52.58
  = ( 52.422 + 52.479 + 52.748 + 52.501 + 52.680 + 52.623 + 52.537 + 52.518
    + 52.576 + 52.487 + 52.542 + 52.485 + 52.645 + 52.810 + 52.667 + 52.554 ) / 16

gegl yafr performing arithmetic branching with fabsf, copysignf and fminf: 52.70
  = ( 52.568 + 52.447 + 52.763 + 52.524 + 52.772 + 52.652 + 52.524 + 52.765
    + 52.596 + 52.850 + 52.733 + 52.799 + 52.627 + 52.897 + 52.871 + 52.866 ) / 16

As you can see, the ? version is slightly faster overall. Probably not in a significant way, but this certainly does not suggest that arithmetic branching is worth the hassle.

Nicolas Robidoux
Laurentian University/Universite Laurentienne