Thanks for this write-up and all that testing, it's definitely very helpful. So in the end, you're talking about Pd-extended on Debian only? It sounds like your tests show that 0.43 was not slower on Mac OS X.

It does look like the Debian-i386 builds don't have optimization turned on. You can look at the build log to see exactly how it was built:

http://autobuild.puredata.info/auto-build/2012-12-07/logs/2012-12-07_06.27.52_linux_debian-squeeze-i386_pd-extended.txt

cc -I"/home/pd/auto-build/pd-extended/pd/include/pd" -DPD -DVERSION='"1.2.1"' -fPIC -DPD -DHAVE_G_CANVAS_H -I/home/pd/auto-build/pd-extended/pd/src -Wall -W -ggdb -I/home/pd/auto-build/pd-extended/externals/Gem -I/home/pd/auto-build/pd-extended/externals/pdp/include -DUNIX -Dunix -DDL_OPEN -fPIC -g -fno-inline-functions -fno-omit-frame-pointer -DDEBUG_SOUNDFILE -Wstrict-aliasing=2 -o "freeverb~.o" -c "freeverb~.c"

If you want to mess with the flags, try adding things to OPT_CFLAGS in packages/linux_make/Makefile; that should affect almost all of the build. If you just want to test freeverb, you can do this:

cd externals/freeverb
make OPT_CFLAGS="-O6 -msse -msse2 -mfpmath=sse -ftree-vectorize -ftree-vectorizer-verbose=1"

Or things like that... I'd be very interested to hear about profiling results of using these flags. I only did a little profiling when I stuck those in.

.hc

On Dec 7, 2012, at 4:36 PM, katja wrote:

> Finally I have some clue what's wrong with Pd-E 0.43 for GNU/Linux, or
> for Debian Squeeze at least. Sorry that it took me so long to sit down
> and sort it out.
>
> The problem is still there, with version 0.43.4: my live performance
> setups run with almost double CPU load, when compared to 0.42. Now I
> also tested with some comprehensive patches which are known to be pure
> vanilla, like Martin Brinkmann's 'chaosmonster'. Remarkably, these
> patches do not show an increased CPU load. Therefore I guessed that it
> must be in external classes.
>
> I tried using callgrind and kcachegrind (thanks for the hint Jamie).
> Though callgrind makes Pd choke completely (while recording the
> complete call history of a process instead of taking samples), the
> output gave a clue.
> Freeverb~ was shown to make a couple hundred
> function calls within the perform loop. Functions which are written as
> 'inline' in the C file. An isolated freeverb~ instance turned out to
> do 10% CPU load. Admittedly, this computer (1.8 GHz core duo 2006) is
> not the latest. But freeverb~ normally does some 1% per instance.
>
> So, freeverb~ is the messenger; without it I might not have noticed
> any problem. But what is the message? Is Pd-E 0.43 compiled without
> optimization? I searched for more inline functions in external libs,
> and found one in bsaylor/svf~. In this case again, the executable
> implements it as a call. The core code however is almost certainly
> compiled with properly inlined functions. There's one frequently
> called inline function in the API (PD_BIGORSMALL, which used to be a
> macro in the past). If this would be compiled as a call, a patch like
> 'chaosmonster' would definitely show performance loss.
>
> Note that I'm talking about debian binaries so far, more precisely
> Pd-E 0.43.4 for debian squeeze, as downloaded from puredata.info
> downloads page. In contrast, I checked freeverb~ in the distribution
> for OSX i386, and here the inlining was done properly.
>
> Another difference between those distributions: SSE instructions are
> used for OSX, not for debian. Simple operations like addition and
> multiplication of floats are done on the FPU in debian, while xmm
> registers are used with OSX. This also means that things like abs()
> and ifnan() are function calls for debian, while they could be simple
> instructions on the xmm registers. (Instructions can be viewed by
> disassembling executables with command objdump -d <file>.)
>
> My conclusion from these observations: at least some Pd 0.43 externals
> for debian squeeze are compiled with -O0 for some reason (don't know
> about other Linuxes). How come? The template makefile (also used for
> freeverb~) has optimization -O6.
> The root makefile for the packages
> has certain optimization flags as well. Are they somehow conflicting,
> producing an undefined result? Not for OSX, apparently. But for debian
> something goes wrong. The build system stuff is really over my head,
> hopefully someone else has a better overview to find the exact cause.
>
> Katja
>
>
> On 5/6/12, Jamie Bullock <[email protected]> wrote:
>>
>> Hi Katja,
>>
>>
>> On 5 May 2012, at 20:43, katja <[email protected]> wrote:
>>
>>>
>>> I've tried to use Oprofile on Debian, but this gives me a kernel
>>> failure soon as I start sampling. Does anyone know of a fine
>>> performance profiler for GNU/Linux?
>>>
>>> Katja
>>>
>>
>> You might want to try callgrind + kcachegrind...
>>
>> http://www.slac.stanford.edu/BFROOT/www/Computing/Optimization/genprof.html
>>
>> best,
>>
>> Jamie
>>
>> --
>> http://www.jamiebullock.com
>>
>
> _______________________________________________
> [email protected] mailing list
> UNSUBSCRIBE and account-management ->
> http://lists.puredata.info/listinfo/pd-list
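
P.S. For anyone who wants to check other externals in the build log the same way, here is a rough sketch of a helper that reports which -O flag a compile line ends up with. The function name opt_level is made up for this mail, and the first sample line is an abbreviated copy of the freeverb~ line from the log above. It relies on two things gcc documents: with several -O options the last one wins, and with none the default is -O0.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Pd-extended build system):
# print the last -O<level> flag in a compiler command line,
# or "none" when the line has no -O flag at all.
opt_level() (
    set -f              # no filename globbing while splitting the line
    last="none"
    for word in $1; do  # deliberate word splitting on whitespace
        case "$word" in
            -O*) last="$word" ;;   # uppercase -O only; "-o file" is skipped
        esac
    done
    printf '%s\n' "$last"
)

# Abbreviated freeverb~ compile line from the Debian build log:
opt_level 'cc -fPIC -Wall -W -ggdb -g -fno-inline-functions -o freeverb~.o -c freeverb~.c'
# prints: none

# With conflicting flags, gcc uses the last one:
opt_level 'cc -O2 -g -O0 -c example.c'
# prints: -O0
```

A result of "none" on the freeverb~ line would match what Katja saw in the disassembly, and the -fno-inline-functions on that same line is worth a look too.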
