-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2011-10-04 09:06, katja wrote:
>
> Yesterday I forgot to mention why it should definitely not be built
> with -O0 (unless for debug purposes): PD_BIGORSMALL is defined an

ah yes, this was indeed my fault.
since i don't feel comfortable with editing m_pd.h to get a different build, i used CFLAGS="-DPD_FLOAT_PRECISION=64", which undid any optimization flags (which by default are "-O6", which i find a bit overdone; and "-g" is not set at all...)

the proper way is to use CPPFLAGS="-DPD_FLOAT_PRECISION=64", which results in:

osc-delay-perftest with 400 instances:
 debian            : 31%
 original          : 29%
 single            : 22%
 single(O0)        : 64%
 single(O2)        : 25%
 single(O2+loop)   : 22%
 single(pentium3)  : 24%
 single(pentium4)  : 22%
 single(prescott)  : 22%
 single(core2)     : 22%
 single(core2+sse) : 22%
 double            : 25%
 double(O0)        : 86%
 double(O2)        : 27%
 double(O2+loop)   : 26%
 double(pentium3)  : 25%
 double(pentium4)  : 24%
 double(prescott)  : 24%
 double(core2)     : 24%
 double(core2+sse) : 25%

osc-delay-perftest with 1200 instances:
 debian            : 94%
 original          : 81%
 single            : 65%
 single(O2)        : 72%
 single(O0)        : ++%
 single(O2+loop)   : 66%
 single(pentium3)  : 70%
 single(pentium4)  : 66%
 single(prescott)  : 65%
 single(core2)     : 59%
 single(core2+sse) : 64%
 double            : 77%
 double(O0)        : ++%
 double(O2)        : 82%
 double(O2+loop)   : 77%
 double(pentium3)  : 79%
 double(pentium4)  : 75%
 double(prescott)  : 75%
 double(core2)     : 71%
 double(core2+sse) : 75%

which is more in line with katja's measurements.
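
To put a number on the single/double gap in the 1200-instance run, the relative extra load of each double build over its single counterpart can be worked out directly from the figures above. The following is only a small sketch of that arithmetic: the dictionary names, the helper `double_overhead`, and the label "default" for the unsuffixed "single"/"double" rows are made up for illustration, and the O0 builds are skipped because their load is only reported as "++%".

```python
# Loads in percent CPU, copied from the 1200-instance list above.
# "default" is a hypothetical label for the plain "single"/"double" rows.
# The O0 builds are left out because their load is only given as "++%".
single = {"default": 65, "O2": 72, "O2+loop": 66, "pentium3": 70,
          "pentium4": 66, "prescott": 65, "core2": 59, "core2+sse": 64}
double = {"default": 77, "O2": 82, "O2+loop": 77, "pentium3": 79,
          "pentium4": 75, "prescott": 75, "core2": 71, "core2+sse": 75}


def double_overhead(single_load, double_load):
    """Extra load of the double build relative to the single build, in percent."""
    return 100.0 * (double_load - single_load) / single_load


overheads = {name: double_overhead(single[name], double[name]) for name in single}

for name, pct in overheads.items():
    print(f"{name:<10} single {single[name]:3d}%  double {double[name]:3d}%  "
          f"overhead {pct:5.1f}%")
```

With these figures the double builds come out roughly 13 to 20 percent more expensive than the matching single builds, on this one machine and this one test patch.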

all of this was measured (again) on an i5 650 @ 3.2GHz running in 32bit mode.

optimization flags (as far as they can be reconstructed :-))
 debian        : "-g -O2" (this is what is dictated by debian policy)
 original      : "-O6 -funroll-loops -fomit-frame-pointer" (seems to be the default)
 single/double : ->original
 (O0)          : -O0
 (O2)          : -g -O2
 (O2+loop)     : -g -O2 -funroll-loops -fomit-frame-pointer
 (prescott)    : ->original + "-march=prescott"
 (core2)       : ->original + "-march=core2"
 (core2+sse)   : ->original + "-march=core2 -mfpmath=sse -msse2"

so it seems like the biggest performance boost is given (on the tested platform) by compiling with "-g -O2 -funroll-loops -fomit-frame-pointer" (which is cool because i think this can even make it into debian, the way it is)

> inline function (like it was already suggested by IOhannes a while
> ago), but at -O0 nothing will be inlined. A benchmark howto would be
> useful indeed.

well, i usually just cram lots of the same object into a subpatch (until i get approximately 80% in the slowest environment, in order to not max out the CPU and get unknown side-effects), and measure it with the built-in load-meter (for loads <100% it behaves quite the same as top)

nothing very dramatic.

fgmasdr
IOhannes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk6K1AQACgkQkX2Xpv6ydvTGgwCfSp1ytXru2AtPqCQx2O1BZ3Zc
A2QAoNS7ki9euvd4XKaRMhtc0grI2D9V
=EwUX
-----END PGP SIGNATURE-----
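
The "fill the subpatch until the slowest build sits at about 80%" approach described in the message can be given a rough starting point by interpolating between two measured loads. This is only a sketch under a loud assumption: that load grows linearly with the number of instances between the two measurements, which the message does not claim. The function name `instances_for_target` and its parameters are hypothetical; the input numbers are the "debian" build's 31% at 400 instances and 94% at 1200 instances.

```python
def instances_for_target(n_low, load_low, n_high, load_high, target_load):
    """Estimate the instance count that yields target_load.

    Assumes (as a simplification) that load is linear in the number of
    instances between the two measured points.
    """
    slope = (load_high - load_low) / (n_high - n_low)  # percent per instance
    return n_low + (target_load - load_low) / slope


# "debian" build from the message: 31% at 400 instances, 94% at 1200.
estimate = instances_for_target(400, 31, 1200, 94, 80)
print(f"about {round(estimate)} instances for ~80% load")
```

Under that assumption the slowest build would reach the 80% target at a little over 1000 instances; the actual count still has to be confirmed with the load-meter.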
_______________________________________________ Pd-dev mailing list [email protected] http://lists.puredata.info/listinfo/pd-dev
