Thank you.

On 24 January 2013 09:14, katja <[email protected]> wrote:
> 'Undenormalized' Pd build for Raspberry Pi is temporarily parked here
> for testing purposes (will be removed when Miller's release is fixed
> in this sense):
>
> www.katjaas.nl/temp/pd-0.44-0-normalized.tar.gz
>
> This is a locally installed Pd, like Miller's distribution. You can
> start it from command line with the full path to
> pd-0.44-0-normalized/bin/pd. It's not a .deb, so it can't be installed
> under supervision of package manager.
>
> Katja
>
>
> On Wed, Jan 23, 2013 at 9:15 PM, Julian Brooks <[email protected]> wrote:
> > Hey Katja,
> >
> > Would you mind sharing the 'normalised' Pd-0.44.0 for RPi please.
> >
> > Cheers,
> >
> > Julian
> >
> >
> >
> > On 23 January 2013 18:23, katja <[email protected]> wrote:
> >>
> >> Now I recompiled the Pd-0.44.0 release on Raspberry Pi (took me a few
> >> hours, not only because Pi is so slow) with PD_BIGORSMALL enabled for
> >> arm in m_pd.h. Using bigorsmalltest.pd from my previous mail I
> >> verified that the macro is implemented indeed.
> >>
> >> Martin Brinkmann's patch chaosmonster1
> >> (http://www.martin-brinkmann.de) gives a beautiful illustration of the
> >> improvement. This patch is full of filters and delay lines. At its
> >> initial settings, there is no subnormals problem. But if you set the
> >> bottom slider to the right, it gets silent. With Pd-0.44-0 release,
> >> CPU load explodes. With the 'normalized' Pd, nothing special happens.
> >>
> >> And indeed, the PD_BIGORSMALL conditional checks come for free: with
> >> initial settings of the chaosmonster1, performance is equivalent in
> >> both Pd's. Cool! Hopefully this is similar on armv7.
> >>
> >> Katja
> >>
> >>
> >>
> >> On Wed, Jan 23, 2013 at 5:01 PM, Hans-Christoph Steiner <[email protected]>
> >> wrote:
> >> >
> >> > hey Katya,
> >> >
> >> > This also sounds like good evidence for your idea of writing C code
> >> > that modern compilers optimize well. Using unions for aliasing
> >> > allows the compiler to do all the new tricks, then writing loops
> >> > that auto-vectorize gives us the real benefits. Also, I think we
> >> > can see some gains by using memcpy() since on modern libc version,
> >> > those are highly optimized for the given CPU, dynamically choosing
> >> > the routines based on what instructions are available. memcpy will
> >> > use things like SSSE2 if it's available.
> >> >
> >> > .hc
> >> >
> >> > On 01/23/2013 07:47 AM, katja wrote:
> >> >> Finally some good news on this topic. Earlier I stated that 'big or
> >> >> small tests' are expensive for the Pi, but that is not by definition
> >> >> the case. There must have been other conditions blurring my
> >> >> impression. I've now done a systematic test where other influences
> >> >> are ruled out. A test class [lopass~] with exactly the same routine
> >> >> as [lop~] was made, but compiled with PD_BIGORSMALL() macro enabled.
> >> >> It was verified that [lopass~] is not affected by denormals.
> >> >> Performance comparison of [lop~] and [lopass~] shows that both
> >> >> objects cause equivalent CPU load. Meaning, Raspberry Pi gives the
> >> >> 'big or small checks' for free! At least in the case of this simple
> >> >> filter. Please try attached bigorsmalltest.zip on the Pi to see if
> >> >> I'm not dreaming.
> >> >>
> >> >> While I was at the topic anyway, I also tried a big or small test
> >> >> with union instead of direct type aliasing. It has the advantage
> >> >> that the compiler can apply strict aliasing rules. This test with
> >> >> unions did not cause extra CPU load either on the Pi. If you want to
> >> >> verify this result, enable the call to bigorsmall() instead of
> >> >> PD_BIGORSMALL in lopass~.c and recompile.
> >> >>
> >> >> The fact that these tests do not cause extra CPU load indicates that
> >> >> they are done in parallel with other instructions. Float and int
> >> >> registers are apparently strictly separated on armv6, there's no such
> >> >> thing as Intel's xmm registers or armv7's NEON. As it happens, the
> >> >> big or small tests are done on ints, aliases of the floats that must
> >> >> be tested. Initially I assumed that the transport of floats from vfp
> >> >> to the arm integer processor would be expensive, but if the
> >> >> instructions are done simultaneously it may be an advantage instead.
> >> >> Another thing is that ARM implements branch predication instead of
> >> >> branch prediction. Those terms look almost the same but the routines
> >> >> are very different. Predication is when instructions for both
> >> >> branches are executed, and the wrong result is simply discarded
> >> >> later.
> >> >>
> >> >> Conclusions from the limited test with [lop~] and [lopass~] do not
> >> >> mean that all sorts of conditional checks are cheap on the Pi, or on
> >> >> ARM in general. If PD_BIGORSMALL is enabled for RPi using
> >> >> compile-time definition __arm__, it will also hold for armv7, but it
> >> >> may have very different result there. At the moment I have no access
> >> >> yet to an armv7 device. Maybe someone can recompile test class
> >> >> [lopass~] and do the tests on Beagleboard or Cubieboard? Otherwise I
> >> >> may be able to do it on my friend's PengPod when that has arrived.
> >> >>
> >> >> Katja
> >> >>
> >> >>
> >> >> On Tue, Jan 22, 2013 at 8:54 PM, Miller Puckette <[email protected]>
> >> >> wrote:
> >> >>> thanks - I'd better try this and find out what's going on :)
> >> >>>
> >> >>> M
> >> >>>
> >> >>> On Mon, Jan 21, 2013 at 11:54:29AM +0100, katja wrote:
> >> >>>> Tried the 0.44.0 build from your website. It has the same issue
> >> >>>> with subnormal values. My test patch is with [lop~]. If inf or nan
> >> >>>> is fed into [lop~], these 'values' keep circulating in the object,
> >> >>>> it can no longer process normal signal values.
> >> >>>>
> >> >>>> I also tried my reverb stuff with specific compiler options for
> >> >>>> Pi's processor:
> >> >>>>
> >> >>>> -march=armv6zk
> >> >>>> -mcpu=arm1176jzf-s
> >> >>>> -mtune=arm1176jzf-s
> >> >>>>
> >> >>>> With these options, gcc should be able to decide that RunFast mode
> >> >>>> is permitted. But even in combination with -ffast-math (which in
> >> >>>> turn sets -funsafe-math-optimizations and -fno-trapping-math
> >> >>>> amongst others), denormals are still there. I'm literally out of
> >> >>>> options for the moment. Sorry for not having better news.
> >> >>>>
> >> >>>> Katja
> >> >>>>
> >> >>>>
> >>
> >> _______________________________________________
> >> [email protected] mailing list
> >> UNSUBSCRIBE and account-management ->
> >> http://lists.puredata.info/listinfo/pd-list
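
For anyone following the thread, the 'big or small' test with a union instead of a pointer cast can be sketched in C roughly as below. This is only a sketch under assumptions, not the PD_BIGORSMALL macro from m_pd.h and not the code in lopass~.c: the names big_or_small and lowpass_step, the mask 0x60000000 (the two top exponent bits of a 32-bit float) and the one-pole filter form are illustrative choices of mine.

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero when the two top exponent bits of a
   32-bit float are both clear (very small values, including zero and
   subnormals) or both set (very large values, including inf and nan). */
static int big_or_small(float f)
{
    union { float f; uint32_t u; } pun;   /* union instead of pointer cast */
    uint32_t bits;

    pun.f = f;
    bits = pun.u & 0x60000000u;           /* keep exponent bits 30 and 29 */
    return bits == 0u || bits == 0x60000000u;
}

/* Hypothetical one-pole lowpass step:
   y[n] = y[n-1] + coef * (x[n] - y[n-1]).
   The new state is replaced by zero when it is out of the useful range, so
   neither subnormals nor inf/nan can keep circulating in the feedback. */
static float lowpass_step(float state, float in, float coef)
{
    float out = state + coef * (in - state);
    return big_or_small(out) ? 0.0f : out;
}
```

Calling lowpass_step() repeatedly with silent input lets the state decay until the test trips and the state is set to exactly zero, so it never lingers in the subnormal range. A huge or infinite input sample resets the state to zero in the same way instead of getting stuck in the filter.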
