Hans, the info about NEON is relevant for ARMv7 boards (Beagleboard, Cubieboard, PengPod...), but the Raspberry Pi doesn't have NEON. Its float processing is done on the vfpv2 coprocessor. As far as I can see, vfpv2 has hardly any SIMD instructions (except for moving data between ARM and vfp). It is said to process a maximum of 8 single-precision floats in parallel, but the Raspberry Pi shows no sign of profiting from data alignment, at least not when the code is compiled with gcc.
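Since gcc seems to offer no switch for the 'RunFast mode' mentioned in the original message, one possible workaround is to set the relevant bits in the VFP status register (FPSCR) directly. The following is only a minimal sketch, assuming an ARM build with hardware float enabled: the helper name `vfp_enable_runfast` and the macro names are made up for illustration, and the bit positions (FZ = bit 24, DN = bit 25, trap enables = bits 8-12 and 15) are taken from ARM's VFP documentation, not from anything tested in this thread. On any other target the function does nothing and returns 0.

```c
#include <stdint.h>

/* FPSCR bit positions as documented for ARM VFP (assumed, not verified on a Pi). */
#define VFP_FPSCR_FZ        (1u << 24)   /* flush-to-zero: denormals become 0 */
#define VFP_FPSCR_DN        (1u << 25)   /* default NaN mode */
#define VFP_FPSCR_TRAP_MASK 0x00009F00u  /* exception trap enable bits 8-12 and 15 */

/* Hypothetical helper: tries to put the VFP into a RunFast-style
 * configuration. Returns 1 if the FPSCR was written, 0 if this build
 * does not target ARM hardware floating point. */
int vfp_enable_runfast(void)
{
#if defined(__arm__) && !defined(__SOFTFP__)
    uint32_t fpscr;

    /* Read the current floating-point status and control register. */
    __asm__ volatile("vmrs %0, fpscr" : "=r"(fpscr));

    fpscr |= VFP_FPSCR_FZ | VFP_FPSCR_DN;  /* enable flush-to-zero and default NaN */
    fpscr &= ~VFP_FPSCR_TRAP_MASK;         /* disable all exception traps */

    /* Write the modified value back. */
    __asm__ volatile("vmsr fpscr, %0" : : "r"(fpscr));
    return 1;
#else
    /* Not an ARM hard-float build: nothing to configure. */
    return 0;
#endif
}
```

The FPSCR is part of the floating-point state of the running thread, so a call like this would presumably have to be made from the thread that does the DSP, before any audio processing starts.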
Katja

On Sun, Jan 20, 2013 at 5:12 PM, Hans-Christoph Steiner <[email protected]> wrote:
>
> I think this is what you want, from 'man gcc'. It's interesting to note that
> the NEON mode, which provides SIMD, also does not do denormals:
>
> -mfpu=name
> -mfpe=number
> -mfp=number
>     This specifies what floating point hardware (or hardware emulation) is
>     available on the target. Permissible names are: fpa, fpe2, fpe3, maverick,
>     vfp, vfpv3, vfpv3-fp16, vfpv3-d16, vfpv3-d16-fp16, vfpv3xd, vfpv3xd-fp16,
>     neon, neon-fp16, vfpv4, vfpv4-d16, fpv4-sp-d16 and neon-vfpv4. -mfp and
>     -mfpe are synonyms for -mfpu=fpenumber, for compatibility with older
>     versions of GCC.
>
>     If -msoft-float is specified this specifies the format of floating point
>     values.
>
>     If the selected floating-point hardware includes the NEON extension (e.g.
>     -mfpu=neon), note that floating-point operations will not be used by GCC's
>     auto-vectorization pass unless -funsafe-math-optimizations is also
>     specified. This is because NEON hardware does not fully implement the IEEE
>     754 standard for floating-point arithmetic (in particular denormal values
>     are treated as zero), so the use of NEON instructions may lead to a loss of
>     precision.
>
> .hc
>
> On 01/20/2013 06:54 AM, katja wrote:
>> I was assuming, or maybe just hoping? that Raspberry Pi (and ARM
>> devices in general) would not suffer from Denormal's disease like
>> Intel processors do. But guess what: Pi's float coprocessor is IEEE
>> 754 compliant and does all denormals by default (can check with
>> attached denorm-test.pd). Bummer! As if one would use an ARM device to
>> calculate the size of a Majorana particle, rather than doing simple
>> dsp. Do we really need to enable PD_BIGORSMALL() checks for this poor
>> little processor? There seems to be something called 'RunFast mode'
>> for Pi's float processor vfpv2, but I see no way how to enable this
>> via gcc. Option -ffast-math is allowed but doesn't do the trick. Can't
>> find an option to set vfpv2 specifically, in gcc docs.
>>
>> Katja
>>
>> _______________________________________________
>> [email protected] mailing list
>> UNSUBSCRIBE and account-management ->
>> http://lists.puredata.info/listinfo/pd-list
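If the hardware cannot be told to drop denormals, the fallback discussed above is a software check on values that recirculate in feedback paths. The following is a minimal sketch of that idea, assuming 32-bit IEEE 754 floats. The function name `flush_subnormal` is made up for illustration, and this is not Pd's own PD_BIGORSMALL macro, only a simpler test on the exponent field.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 0.0f when f is zero or subnormal,
 * otherwise returns f unchanged. Assumes 32-bit IEEE 754 floats. */
float flush_subnormal(float f)
{
    uint32_t bits;

    /* Copy the raw bit pattern without violating aliasing rules. */
    memcpy(&bits, &f, sizeof bits);

    /* An all-zero exponent field means the value is zero or subnormal.
     * Note that -0.0f is also returned as +0.0f. */
    if ((bits & 0x7F800000u) == 0)
        return 0.0f;

    return f;
}
```

In a recursive filter this would wrap the state update, for example `state = flush_subnormal(in + 0.5f * state);`, so that a decaying tail snaps to zero instead of lingering in the denormal range. The extra test costs a few integer instructions per sample, which is the price the original question is hoping to avoid.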
