[Patch ARM 0/3] Neon intrinsics TLC - Replace intrinsics with GNU C implementations where possible and remove dead code.

Ramana Radhakrishnan Mon, 28 Apr 2014 03:42:13 -0700

Hi,

I was investigating a performance issue with Neon intrinsics and realized this needed to happen.

Patch 1/3 does this. I've special-cased the -ffast-math case for the _f32 intrinsics to prevent the auto-vectorizer from coming along and vectorizing addv2sf and addv4sf type operations, which we don't want to happen by default. Patch 1/3 causes apparent "regressions" in the rather ineffective neon intrinsics tests that we currently carry, soon hopefully to be replaced by Christophe Lyon's rewrite that is being reviewed. On the whole I deem this patch stack to be safe to go in if necessary. These "regressions" are for -O0 with the vbic and vorn intrinsics, which don't now get combined and, well, so be it.
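For readers unfamiliar with the idea, here is a minimal sketch of what "replacing an intrinsic with a GNU C implementation" can look like. It is not taken from the patch: the `sketch_` type and function names are hypothetical stand-ins for what arm_neon.h defines, chosen so the example builds with GCC or Clang on any target. It only relies on the GNU C vector extension (`vector_size`), where ordinary operators act element-wise.

```c
#include <stdint.h>

/* Hypothetical stand-in for the arm_neon.h int32x2_t type: two 32-bit
   lanes in an 8-byte vector, declared with the GNU C vector extension. */
typedef int32_t sketch_int32x2_t __attribute__((vector_size(8)));

/* Element-wise add written with a plain C operator instead of a call to
   a target builtin, so the compiler sees an ordinary addition. */
static inline sketch_int32x2_t
sketch_vadd_s32(sketch_int32x2_t a, sketch_int32x2_t b)
{
    return a + b;
}

/* Bit clear: a AND NOT b. With optimization, the NOT and the AND can be
   combined into one instruction; without it they may stay separate. */
static inline sketch_int32x2_t
sketch_vbic_s32(sketch_int32x2_t a, sketch_int32x2_t b)
{
    return a & ~b;
}

/* OR-NOT: a OR NOT b, expressed the same way. */
static inline sketch_int32x2_t
sketch_vorn_s32(sketch_int32x2_t a, sketch_int32x2_t b)
{
    return a | ~b;
}
```

Written this way, the bit-clear and or-not forms are two separate operations until the optimizers merge them, which is consistent with the -O0 behaviour described above for vbic and vorn.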

This then left us in the happy position of being able to delete code, but I was worried about LTO streaming as these "builtins" are essentially streamed out in LTO object code format. However, since we make no promises about LTO compatibility across releases, that's safe, but I structured the dead code elimination as Patch 2/3. This will be committed separately in case folks want to backport Patch 1/3 separately and want to assure their users of LTO compatibility within a release branch (if that even works :) ).

Patch 3/3 removes the ML to generate Neon intrinsics and the documentation, and updates the comments in the files to show that these are now hand-crafted rather than auto-generated. We've had these for many years now and I think it's time we got rid of this. Not everyone groks ML and it doesn't help that only one or two folks can actually do this properly every time. Instead of having these bottlenecks, and given the fact that the intrinsics are pretty stable now, there's no point in retaining the generator interface. I'd rather get rid of them. The only bits left are neon-schedgen.ml, neon.ml and neon-testgen.ml. I think we can safely remove neon-testgen.ml once Christophe's testsuite is done, and we'll probably just have to carry neon-schedgen.ml / neon.ml as it still generates the neon descriptions for both a8 and a9.

The patch stack was caught up in the C++ type info mess recently and I've tested this on a cross arm-linux-gnueabihf testsuite run and it looks ok modulo the issues mentioned for Patch 1/3. I've deliberately resisted deleting the entire gcc.target/arm/neon and neon-testgen.ml in the hope that Christophe's testsuite will do the honours at that point :). Given we're in stage 1 and that I think we're getting somewhere with clyon's testsuite, I feel it is reasonably practical to just carry the noise with these extra failures. Christophe and I will test-drive his testsuite work in this space with these patches to see how the conversion process works and if there are any issues with these patches.


If there are issues I'm happy to hear about them.

Will apply to trunk in a couple of days if no regressions with clyon's testsuite for these intrinsics.



regards
Ramana
--
Ramana Radhakrishnan
Principal Engineer
ARM Ltd.
