Hi,

I was investigating a performance issue with Neon intrinsics and realized this needed to happen.

Patch 1/3 does this. I've special-cased -ffast-math for the _f32 intrinsics to prevent the auto-vectorizer from coming along and vectorizing addv2sf- and addv4sf-type operations, which we don't want to happen by default. Patch 1/3 causes apparent "regressions" in the rather ineffective Neon intrinsics tests that we currently carry, which will hopefully soon be replaced by Christophe Lyon's rewrite that is under review. These "regressions" are at -O0 with the vbic and vorn intrinsics, which no longer get combined into single vbic/vorn instructions; well, so be it. On the whole, I deem this patch stack safe to go in if necessary.
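
A minimal sketch of the pattern described above, assuming GCC or Clang vector extensions so that it compiles on any host. The types sketch_f32x2 and sketch_s32x2 and the helper sketch_builtin_vadd_f32 are hypothetical stand-ins for float32x2_t, int32x2_t and an opaque target builtin; this is an illustration, not the actual arm_neon.h code. With -ffast-math (which defines __FAST_MATH__) the _f32 add is written as a plain vector '+'; otherwise it stays behind the opaque call. The vbic/vorn functions show the GNU C expressions that, at -O0, are no longer combined into a single instruction.

```c
#include <stdint.h>

/* Hypothetical stand-ins for float32x2_t and int32x2_t, built with the
   GCC/Clang vector_size extension so the sketch is not ARM-specific. */
typedef float   sketch_f32x2 __attribute__ ((vector_size (8)));
typedef int32_t sketch_s32x2 __attribute__ ((vector_size (8)));

/* Hypothetical opaque helper standing in for a target builtin.
   noinline keeps it from being treated as a plain vector '+'. */
__attribute__ ((noinline)) static sketch_f32x2
sketch_builtin_vadd_f32 (sketch_f32x2 a, sketch_f32x2 b)
{
  return (sketch_f32x2) { a[0] + b[0], a[1] + b[1] };
}

/* The _f32 special case: generic vector arithmetic only under -ffast-math. */
static inline sketch_f32x2
sketch_vadd_f32 (sketch_f32x2 a, sketch_f32x2 b)
{
#ifdef __FAST_MATH__
  return a + b;                            /* visible to the optimizers */
#else
  return sketch_builtin_vadd_f32 (a, b);   /* kept opaque by default */
#endif
}

/* vbic: a AND NOT b, written in GNU C rather than via a builtin. */
static inline sketch_s32x2
sketch_vbic_s32 (sketch_s32x2 a, sketch_s32x2 b)
{
  return a & ~b;
}

/* vorn: a OR NOT b, written in GNU C rather than via a builtin. */
static inline sketch_s32x2
sketch_vorn_s32 (sketch_s32x2 a, sketch_s32x2 b)
{
  return a | ~b;
}
```

Both branches of sketch_vadd_f32 compute the same values; the difference is only in what the compiler is allowed to see and transform.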

This left us in the happy position of being able to delete code, but I was worried about LTO streaming, because these "builtins" are streamed out in the LTO object code format. Since we make no promises about LTO compatibility across releases, the deletion is safe; even so, I structured the dead code elimination as a separate Patch 2/3. It will be committed separately in case folks want to backport Patch 1/3 on its own and want to assure their users of LTO compatibility within a release branch (if that even works :) ).

Patch 3/3 removes the ML that generates the Neon intrinsics and their documentation, and updates the comments in the files to show that these are now hand-crafted rather than auto-generated. We've had these generators for many years now, and I think it's time we got rid of them. Not everyone groks ML, and it doesn't help that only one or two folks can reliably get this right every time. Given these bottlenecks, and given that the intrinsics are pretty stable now, there's no point in retaining the generator interface. The only bits left are neon-schedgen.ml, neon.ml and neon-testgen.ml. I think we can safely remove neon-testgen.ml once Christophe's testsuite is done, and we'll probably just have to carry neon-schedgen.ml / neon.ml, as they still generate the Neon scheduling descriptions for both Cortex-A8 and Cortex-A9.

The patch stack was recently caught up in the C++ type info mess. I've tested it with a cross arm-linux-gnueabihf testsuite run, and it looks OK modulo the issues mentioned for Patch 1/3. I've deliberately resisted deleting the entire gcc.target/arm/neon directory and neon-testgen.ml, in the hope that Christophe's testsuite will do the honours at that point :). Given that we're in stage 1 and that I think we're getting somewhere with clyon's testsuite, I feel it is reasonably practical to just carry the noise from these extra failures in the meantime. Christophe and I will test-drive his testsuite work with these patches to see how the conversion process works and whether there are any issues with the patches.

If there are issues, I'm happy to hear about them.

I will apply this to trunk in a couple of days if there are no regressions with clyon's testsuite for these intrinsics.


regards
Ramana
--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.
