Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats

2008-09-16 Thread Sven Neumann
Hi,

On Sun, 2008-09-14 at 11:30 +0200, Geert Jordaens wrote:

> Introducing the qualifier restrict will have some more checks to be done
> by the programmer and enabling the *-fstrict-aliasing* flag and the
> warning *-Wstrict-aliasing* would be advisable.

I don't think enabling the strict-aliasing rules is advisable in
general. It is too easy to break the rules and this might lead to bugs.
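
To illustrate how easily the rules get broken, here is a minimal sketch. It is not GEGL code: the function name float_bits is made up, and the expected bit patterns assume a 32-bit IEEE 754 float. The commented-out cast is the kind of innocent-looking line that violates the aliasing rules, while the memcpy version is well defined.

```c
#include <stdint.h>
#include <string.h>

/* Returns the raw bit pattern of a float (assumes sizeof (float) == 4).
 *
 * The tempting one-liner
 *
 *     return *(uint32_t *) &f;
 *
 * reads a float object through a uint32_t lvalue, which breaks the
 * strict-aliasing rules; with -fstrict-aliasing the compiler may assume
 * the two types never share memory and optimize accordingly.
 * Copying the bytes is always allowed. */
uint32_t
float_bits (float f)
{
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

Compilers typically turn the memcpy into a single register move, so the safe form costs nothing at run time.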


Sven


___
Gegl-developer mailing list
Gegl-developer@lists.XCF.Berkeley.EDU
https://lists.XCF.Berkeley.EDU/mailman/listinfo/gegl-developer


Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats

2008-09-14 Thread Geert Jordaens
Sven Neumann wrote:
> Hi,
>
> I've filed an enhancement request for G_GNUC_RESTRICT:
>
>  http://bugzilla.gnome.org/show_bug.cgi?id=552098
>
> We should however not wait for this to be included in GLib. As GLib 2.18
> has just been released, it will take a while before 2.20 hits the road.
>
>
> Sven


Introducing the restrict qualifier puts more checks on the programmer,
so enabling the *-fstrict-aliasing* flag and the *-Wstrict-aliasing*
warning would be advisable.

Two good articles on the use of the restrict qualifier:

http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html
and
http://developers.sun.com/solaris/articles/cc_restrict.html



Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats

2008-09-13 Thread Sven Neumann
Hi,

On Fri, 2008-09-12 at 19:45 -0400, Nicolas Robidoux wrote:

> There is another C99/gcc built-in with the potential to speed up code a lot:
> the restrict keyword.
>
> See:
>
> http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html

It looks like the restrict keyword could be easily wrapped into a macro
that evaluates to restrict on compilers that support it and to nothing on
compilers where support for it is missing. So if we should decide that
it is too early for using C99 features, we could still use restrict.
We just need to add a configure check for it. We could even suggest that
it is added to GLib as G_GNUC_RESTRICT.
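
A rough sketch of what such a wrapper could look like. The macro name GEGL_RESTRICT and the function scale_add are hypothetical, and the preprocessor tests stand in for the configure check, which would be the more robust way to detect support.

```c
#include <stddef.h>

/* Hypothetical wrapper macro; a real build would more likely define it
 * from the result of a configure check. */
#if defined (__GNUC__)
#  define GEGL_RESTRICT __restrict__
#elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define GEGL_RESTRICT restrict
#else
#  define GEGL_RESTRICT  /* expands to nothing */
#endif

/* Computes dst[i] += scale * src[i].  The qualifier is a promise by the
 * caller that dst and src do not overlap, so the compiler does not have
 * to reload src after every store to dst. */
void
scale_add (float       *GEGL_RESTRICT dst,
           const float *GEGL_RESTRICT src,
           float                      scale,
           size_t                     n)
{
  size_t i;

  for (i = 0; i < n; i++)
    dst[i] += scale * src[i];
}
```

On a compiler without support the macro disappears and the code still compiles; only the optimization hint is lost. Calling scale_add with overlapping buffers is undefined wherever the qualifier is active.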


Sven




Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats

2008-09-13 Thread Sven Neumann
Hi,

regarding the use of C99 features, this is a pointer to the last time
this question came up among the GLib developers:

 http://mail.gnome.org/archives/gtk-devel-list/2008-June/msg00020.html

The thread linked from this mail might have some interesting arguments
that we should consider.


Sven




Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats

2008-09-13 Thread Sven Neumann
Hi,

I've filed an enhancement request for G_GNUC_RESTRICT:

 http://bugzilla.gnome.org/show_bug.cgi?id=552098

We should however not wait for this to be included in GLib. As GLib 2.18
has just been released, it will take a while before 2.20 hits the road.


Sven




Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats

2008-09-12 Thread Nicolas Robidoux

Hello Sven:

Thanks for your answer. Simplifies my life a lot.

--

There is another C99/gcc built-in with the potential to speed up code a lot: 
the restrict keyword.

See:

http://www.cellperformance.com/mike_acton/2006/05/demystifying_the_restrict_keyw.html

I'll build two versions of the gegl-sampler-yafr code (one of which I'll 
masquerade as gegl-sampler-cubic so I can run both without recompiling) and run 
careful benchmarks this weekend. One version will stay away from restrict and 
c99 math intrinsics, the other will not (first pass, I may not go as far as 
making explicit calls to fma, although my code is structured in the hope that 
the compiler recognizes fused multiply-adds when appropriate).

I don't quite understand the issues with writing C++ code that uses C99
features (this is why knowing that they are gcc built-ins is useful,
provided one knows that gcc will be the compiler).

Maybe I'll take inspiration from

http://www.ddj.com/cpp/184401653

Nicolas Robidoux
Laurentian University/Universite Laurentienne



Re: [Gegl-developer] Question about the use of C99/gcc built-in math intrinsics within GEGL on gfloats

2008-09-12 Thread Nicolas Robidoux

I just completed a quick and dirty benchmark comparing arithmetic
branching with C99/gcc intrinsics in the yafr sampler code against
the standard C if/then/else.

These tests were performed on a Thinkpad t60p with Intel(R) Core(TM)2
CPU T7200 @ 2.00GHz with 2025MiB memory running 2.6.24-19-generic #1
SMP by way of a pretty standard Ubuntu 8.04.

Warning: There seems to be something wrong with math.h with the
current version of gcc, as suggested by some recent bug postings. For
example, according to the gcc documentation, I should not have to
prefix fminf with __builtin_. Consequently, it could be that the
benchmark results will soon be made irrelevant.

Second warning: If my memory is good, Intel chips have a good and fast
implementation of the ? : branching construct (having to do with
selecting which register to copy into another), as well as good branch
prediction. My code without intrinsics is structured to take advantage
of this.

Third warning: I have not optimized looking at the assembler output of
gcc, and have done no optimization of the arithmetic branching
version of the code. In particular, I have not used fmaf, even though
my code is peppered with opportunities to use it (this may not be a big
deal: apparently, gcc attempts to spot opportunities to use fused
multiply-add).

--
quick description of the test:
--

I ran a bunch of consecutive scalings (times 20) of a digital
photograph with initial dimensions 200x133, driving the gegl scale
through an xml file analogous to the ones in gegl/docs/gallery,
alternating between the "with branching" and "arithmetic branching
with intrinsics" versions, and throwing in four scalings with the gegl
stock linear.

-
Differences between the two versions of the code:
-

16 code segments resembling the following (note the ?: operator; this
is the version with branching):

  const gfloat prem_squared = prem * prem;
  const gfloat deux_squared = deux * deux;
  const gfloat troi_squared = troi * troi;
  const gfloat prem_times_deux = prem * deux;
  const gfloat deux_times_troi = deux * troi;
  const gfloat deux_squared_minus_prem_squared = deux_squared - prem_squared;
  const gfloat troi_squared_minus_deux_squared = troi_squared - deux_squared;
  /* pick whichever of each pair is smaller in absolute value */
  const gfloat prem_vs_deux =
    deux_squared_minus_prem_squared > (gfloat) 0. ? prem : deux;
  const gfloat deux_vs_troi =
    troi_squared_minus_deux_squared > (gfloat) 0. ? deux : troi;
  /* keep it only if the two values have the same sign */
  const gfloat my__up =
    prem_times_deux > (gfloat) 0. ? prem_vs_deux : (gfloat) 0.;
  const gfloat my_dow =
    deux_times_troi > (gfloat) 0. ? deux_vs_troi : (gfloat) 0.;

were replaced by (this is the version with arithmetic branching):

  const gfloat abs_prem = fabsf( prem );
  const gfloat abs_deux = fabsf( deux );
  const gfloat abs_troi = fabsf( troi );
  const gfloat prem_vs_deux = __builtin_fminf( abs_prem, abs_deux );
  const gfloat deux_vs_troi = __builtin_fminf( abs_deux, abs_troi );
  /* copysignf( x, y ) returns |x| with the sign of y: +1 or -1 here */
  const gfloat sign_prem = copysignf( (gfloat) 1., prem );
  const gfloat sign_deux = copysignf( (gfloat) 1., deux );
  const gfloat sign_troi = copysignf( (gfloat) 1., troi );
  /* the factor in parentheses is 2 if the signs agree, 0 if they differ */
  const gfloat my__up =
    ( sign_prem * sign_deux + (gfloat) 1. ) * prem_vs_deux;
  const gfloat my_dow =
    ( sign_deux * sign_troi + (gfloat) 1. ) * deux_vs_troi;

Basically, what the code snippets do is this:

If prem and deux have the same sign, put the smaller one (in absolute
value) in my__up. Otherwise, set my__up to zero. Do likewise with
deux, troi and my_dow. The above two code snippets represent the best
ways of performing this that I could figure.

===
Overall conclusion:
===

Arithmetic branching (without other improvements) does not appear to
be worth the trouble.


Average timings:


stock gegl linear scale:

47.50 = ( 47.474 + 47.581 + 47.345 + 47.595 ) / 4

gegl yafr with ? branching and no use of intrinsics:

52.58 = 
( 52.422 + 52.479 + 52.748 + 52.501 + 52.680 + 52.623 + 52.537 +
52.518 + 52.576 + 52.487 + 52.542 + 52.485 + 52.645 + 52.810 + 52.667
+ 52.554 ) / 16

gegl yafr performing arithmetic branching with fabsf, copysignf and fminf:

52.70 = ( 52.568 + 52.447 + 52.763 + 52.524 + 52.772 + 52.652 + 52.524
+ 52.765 + 52.596 + 52.850 + 52.733 + 52.799 + 52.627 + 52.897 +
52.871 + 52.866 ) / 16

As you can see, the ?: version is slightly faster overall (52.58
versus 52.70, a difference of about 0.2%). The difference is probably
not significant, but it certainly does not suggest that arithmetic
branching is worth the hassle.

Nicolas Robidoux
Laurentian University/Universite Laurentienne
