Re: [PD-dev] double precision Pd: .patch files, tests and benchmarks

Hans-Christoph Steiner Tue, 04 Oct 2011 06:42:38 -0700


On Oct 4, 2011, at 5:38 AM, IOhannes m zmoelnig wrote:


On 2011-10-04 09:06, katja wrote:


Yesterday I forgot to mention why it should definitely not be built
with -O0 (unless for debug purposes): PD_BIGORSMALL is defined an


ah yes, this was indeed my fault.
since i don't feel comfortable with editing m_pd.h to get a different
build, i used CFLAGS="-DPD_FLOAT_PRECISION=64", which undid any
optimization flags (which by default are "-O6", which i find a bit
overdone; and "-g" is not set at all...)
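
To make the define concrete, here is a minimal sketch of how a compile-time switch such as -DPD_FLOAT_PRECISION=64 could select the numeric type. The names t_float_sketch and float_sketch_bytes are hypothetical and only for illustration; the actual m_pd.h in the patches may be organised differently.

```c
#include <stddef.h>

/* Default to single precision unless the build passes
   -DPD_FLOAT_PRECISION=64 to the preprocessor. */
#ifndef PD_FLOAT_PRECISION
#define PD_FLOAT_PRECISION 32
#endif

#if PD_FLOAT_PRECISION == 32
typedef float t_float_sketch;   /* hypothetical name, not Pd's t_float */
#elif PD_FLOAT_PRECISION == 64
typedef double t_float_sketch;
#else
#error "PD_FLOAT_PRECISION must be 32 or 64"
#endif

/* Size in bytes of the selected type: 4 for single, 8 for double
   on platforms with IEEE 754 float and double. */
static size_t float_sketch_bytes(void)
{
    return sizeof(t_float_sketch);
}
```

Since the switch is a preprocessor define, it only changes which branch of the #if is compiled; it says nothing about optimization, which is why the optimization flags have to survive separately.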

the proper way is to use CPPFLAGS="-DPD_FLOAT_PRECISION=64", which
results in:

osc-delay-perftest with 400 instances:
debian           : 31%
original         : 29%
single           : 22%
single(O0)       : 64%
single(O2)       : 25%
single(O2+loop)  : 22%
single(pentium3) : 24%
single(pentium4) : 22%
single(prescott) : 22%
single(core2)    : 22%
single(core2+sse): 22%
double           : 25%
double(O0)       : 86%
double(O2)       : 27%
double(O2+loop)  : 26%
double(pentium3) : 25%
double(pentium4) : 24%
double(prescott) : 24%
double(core2)    : 24%
double(core2+sse): 25%

osc-delay-perftest with 1200 instances:
debian           : 94%
original         : 81%
single           : 65%
single(O2)       : 72%
single(O0)       : ++%
single(O2+loop)  : 66%
single(pentium3) : 70%
single(pentium4) : 66%
single(prescott) : 65%
single(core2)    : 59%
single(core2+sse): 64%
double           : 77%
double(O0)       : ++%
double(O2)       : 82%
double(O2+loop)  : 77%
double(pentium3) : 79%
double(pentium4) : 75%
double(prescott) : 75%
double(core2)    : 71%
double(core2+sse): 75%

which is more in line with katja's measurements.

this is (again) on an i5 650 @ 3.2GHz running in 32bit mode.

optimization flags (as far as they can be reconstructed :-)):
debian: "-g -O2" (this is what is dictated by debian policy)
original: "-O6 -funroll-loops -fomit-frame-pointer"  (seems to be the
default)
single/double: ->original
(O0): -O0
(O2): -g -O2
(O2+loop): -g -O2 -funroll-loops -fomit-frame-pointer
(prescott): ->original + "-march=prescott"
(core2): ->original + "-march=core2"
(core2+sse): ->original + "-march=core2 -mfpmath=sse -msse2"


so it seems like the biggest performance boost is given (on the tested
platform) by compiling with "-g -O2 -funroll-loops
-fomit-frame-pointer" (which is cool because i think this can even make
it into debian, the way it is)

inline function (like it was already suggested by IOhannes a while
ago), but at -O0 nothing will be inlined. A benchmark howto would be
useful indeed.
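
As an illustration of the inlining point, below is a sketch of a "too big or too small" check written as a static inline function. It assumes 32-bit IEEE 754 floats and tests the two highest exponent bits; the name bigorsmall_sketch is hypothetical, and this is not claimed to be the code from the patches.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 when |f| < 2^-63 or |f| >= 2^65 (this includes zero,
   denormals, infinities and NaNs), and 0 otherwise.
   Assumption: float is a 32-bit IEEE 754 single. */
static inline int bigorsmall_sketch(float f)
{
    uint32_t bits;
    uint32_t top;

    memcpy(&bits, &f, sizeof bits);   /* well-defined type punning */
    top = bits & 0x60000000u;         /* two highest exponent bits */
    return top == 0u || top == 0x60000000u;
}
```

With optimization enabled the compiler can fold this into the caller, so the check costs a few integer operations per sample. At -O0 every use becomes a real function call, which would be consistent with the large (O0) numbers in the tables above.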

well, i usually just cram lots of the same object into a subpatch (until
i get approximately 80% in the slowest environment, in order to not max
out the CPU and get unknown side-effects), and measure it with the
built-in load-meter (for loads <100% it behaves quite the same as top).
nothing very dramatic.

Nice tests, thanks for that. I would be interested to see the effects
of auto-vectorization on these numbers. Have you tried that? If the
test patch doesn't include objects that have loops vectorized, it
won't make a difference.
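
To show what kind of loop is a candidate, here is a sketch of a block-wise signal addition in the style of a DSP perform routine. The function name add_block_sketch is hypothetical and this is not code from Pd. With GCC, auto-vectorization of such a loop is controlled by -ftree-vectorize (enabled at -O3); whether it actually triggers depends on the compiler version and target flags.

```c
#include <stddef.h>

/* Add two signal blocks sample by sample. The 'restrict' qualifiers
   (C99) promise the compiler that the buffers do not overlap, which
   removes one common obstacle to vectorizing the loop. */
static void add_block_sketch(const float *restrict in1,
                             const float *restrict in2,
                             float *restrict out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = in1[i] + in2[i];
}
```

A loop like this has independent iterations and a simple body. Loops with feedback from one sample to the next are much harder for the compiler to vectorize, so a test patch built only from such objects would show little change.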


.hc


----------------------------------------------------------------------------

If you are not part of the solution, you are part of the problem.



_______________________________________________
Pd-dev mailing list
[email protected]
http://lists.puredata.info/listinfo/pd-dev
