Re: [PD] Raspberry Pi does denormals

Julian Brooks Wed, 23 Jan 2013 12:15:55 -0800

Hey Katja,

Would you mind sharing the 'normalised' Pd-0.44.0 build for RPi, please?


Cheers,

Julian



On 23 January 2013 18:23, katja <[email protected]> wrote:

> Now I recompiled the Pd-0.44.0 release on Raspberry Pi (took me a few
> hours, not only because Pi is so slow) with PD_BIGORSMALL enabled for
> arm in m_pd.h. Using bigorsmalltest.pd from my previous mail I
> verified that the macro is indeed in effect.
>
> Martin Brinkmann's patch chaosmonster1
> (http://www.martin-brinkmann.de) gives a beautiful illustration of the
> improvement. This patch is full of filters and delay lines. At its
> initial settings, there is no subnormals problem. But if you set the
> bottom slider to the right, it gets silent. With the Pd-0.44.0 release,
> CPU load explodes. With the 'normalized' Pd, nothing special happens.
>
> And indeed, the PD_BIGORSMALL conditional checks come for free: with
> the initial settings of chaosmonster1, performance is equivalent in
> both Pd builds. Cool! Hopefully this is similar on armv7.
>
> Katja
>
>
>
> On Wed, Jan 23, 2013 at 5:01 PM, Hans-Christoph Steiner <[email protected]> wrote:
> >
> > hey Katja,
> >
> > This also sounds like good evidence for your idea of writing C code that
> > modern compilers optimize well.  Using unions for aliasing allows the
> > compiler to do all the new tricks, then writing loops that auto-vectorize
> > gives us the real benefits.  Also, I think we can see some gains by using
> > memcpy(), since in modern libc versions it is highly optimized for the
> > given CPU, dynamically choosing the routines based on what instructions
> > are available. memcpy will use things like SSE2 or SSSE3 if they are
> > available.
> >
> > .hc
> >
> > On 01/23/2013 07:47 AM, katja wrote:
> >> Finally some good news on this topic. Earlier I stated that 'big or
> >> small tests' are expensive for the Pi, but that is not by definition
> >> the case. There must have been other conditions blurring my
> >> impression. I've now done a systematic test where other influences are
> >> ruled out. A test class [lopass~] with exactly the same routine as
> >> [lop~] was made, but compiled with PD_BIGORSMALL() macro enabled. It
> >> was verified that [lopass~] is not affected by denormals. Performance
> >> comparison of [lop~] and [lopass~] shows that both objects cause
> >> equivalent CPU load. Meaning, Raspberry Pi gives the 'big or small
> >> checks' for free! At least in the case of this simple filter. Please
> >> try attached bigorsmalltest.zip on the Pi to see if I'm not dreaming.
> >>
> >> While I was at the topic anyway, I also tried a big or small test with
> >> union instead of direct type aliasing. It has the advantage that the
> >> compiler can apply strict aliasing rules. This test with unions did
> >> not cause extra CPU load either on the Pi. If you want to verify this
> >> result, enable the call to bigorsmall() instead of PD_BIGORSMALL in
> >> lopass~.c and recompile.
> >>
> >> The fact that these tests do not cause extra CPU load indicates that
> >> they are done in parallel with other instructions. Float and int
> >> registers are apparently strictly separated on armv6, there's no such
> >> thing as Intel's xmm registers or armv7's NEON. As it happens, the
> >> big or small tests are done on ints, aliases of the floats that must
> >> be tested. Initially I assumed that the transport of floats from vfp
> >> to the arm integer processor would be expensive, but if the
> >> instructions are done simultaneously it may be an advantage instead.
> >> Another thing is that ARM supports predication (conditional execution)
> >> as an alternative to branching. Predication and branch prediction look
> >> almost the same as terms, but the mechanisms are very different. With
> >> predication, instructions for both paths are issued, and those whose
> >> condition fails simply have no effect.
> >>
> >> Conclusions from the limited test with [lop~] and [lopass~] do not
> >> mean that all sorts of conditional checks are cheap on the Pi, or on
> >> ARM in general. If PD_BIGORSMALL is enabled for RPi using the compile-time
> >> definition __arm__, it will also apply to armv7, but it may give very
> >> different results there. At the moment I have no access yet to an armv7
> >> device. Maybe someone can recompile test class [lopass~] and do the
> >> tests on Beagleboard or Cubieboard? Otherwise I may be able to do it
> >> on my friend's PengPod when that has arrived.
> >>
> >> Katja
> >>
> >>
> >> On Tue, Jan 22, 2013 at 8:54 PM, Miller Puckette <[email protected]> wrote:
> >>> thanks - I'd better try this and find out what's going on :)
> >>>
> >>> M
> >>>
> >>> On Mon, Jan 21, 2013 at 11:54:29AM +0100, katja wrote:
> >>>> Tried the 0.44.0 build from your website. It has the same issue with
> >>>> subnormal values. My test patch is with [lop~]. If inf or nan is fed
> >>>> into [lop~], these 'values' keep circulating in the object, and it can
> >>>> no longer process normal signal values.
> >>>>
> >>>> I also tried my reverb stuff with specific compiler options for Pi's
> >>>> processor:
> >>>>
> >>>> -march=armv6zk
> >>>> -mcpu=arm1176jzf-s
> >>>> -mtune=arm1176jzf-s
> >>>>
> >>>> With these options, gcc should be able to decide that RunFast mode is
> >>>> permitted. But even in combination with -ffast-math (which in turn
> >>>> sets -funsafe-math-optimizations and -fno-trapping-math amongst
> >>>> others), denormals are still there. I'm literally out of options for
> >>>> the moment. Sorry for not having better news.
> >>>>
> >>>> Katja
> >>>>
> >>>>
>
> _______________________________________________
> [email protected] mailing list
> UNSUBSCRIBE and account-management ->
> http://lists.puredata.info/listinfo/pd-list
>
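
For readers following along, the union-based big-or-small test described in
the quoted mail, applied to a [lop~]-style one-pole filter, can be sketched
roughly as below. This is an illustration only, not the code from m_pd.h or
lopass~.c: the names bigorsmall_sketch() and lop_sketch(), the 0x60000000
exponent mask, and the choice to check the filter state once per block are
all assumptions made for the example. It also assumes 32-bit IEEE 754 floats.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: view the same 32 bits as float or as int. */
typedef union {
    float f;
    uint32_t u;
} float_bits;

/* Returns 1 when the two top exponent bits are equal: both clear means
 * |f| < 2^-63 (zero and subnormals included), both set means
 * |f| >= 2^65 (inf and NaN included). Returns 0 otherwise. */
static int bigorsmall_sketch(float f)
{
    float_bits pun;
    uint32_t top;
    assert(sizeof(float) == sizeof(uint32_t)); /* assumption made explicit */
    pun.f = f;                  /* write as float, read back as integer */
    top = pun.u & 0x60000000u;  /* bits 30 and 29 of the word */
    return (top == 0u) | (top == 0x60000000u);
}

/* One-pole lowpass over a block of n samples. *state carries the feedback
 * value between blocks and is zeroed when it is out of the normal range. */
static void lop_sketch(const float *in, float *out, int n,
                       float coef, float *state)
{
    float last = *state;
    float feedback = 1.0f - coef;
    int i;
    for (i = 0; i < n; i++)
        last = out[i] = coef * in[i] + feedback * last;
    if (bigorsmall_sketch(last))
        last = 0.0f;
    *state = last;
}
```

With silent input and coef = 0.5, a leftover state of 1e-30 is reset to 0 at
the end of the block instead of being left to decay into the subnormal range.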

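Hans-Christoph's two suggestions in the quoted mail, loops the compiler can
auto-vectorize and memcpy() for block copies, might look like the sketch
below. The function names are made up for illustration. Whether a given
compiler actually vectorizes the loop depends on its version and flags (with
gcc, something like -O3, or -O2 plus -ftree-vectorize), so treat that as
something to check rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* restrict promises that in and out do not overlap, which removes the
 * aliasing doubt that often keeps a compiler from vectorizing a loop. */
static void scale_block_sketch(const float *restrict in,
                               float *restrict out,
                               size_t n, float gain)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i] * gain;
}

/* Copy n samples with memcpy(). src and dst must not overlap; memmove()
 * would be the call to use if they might. */
static void copy_block_sketch(const float *src, float *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    memcpy(dst, src, n * sizeof *src);
}
```

The simple indexed loop with a fixed trip count and no branches in the body
is the shape that vectorizers tend to handle best.
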