Hi All,

This patch series adds support for SLP vectorization of complex instructions [1].

These instructions exist only in their vector forms and require the compiler to
recognize two statements in parallel.  Complex operations usually require a
permute because the real and imaginary parts are stored interleaved, but these
vector instructions expect this layout, so the compiler no longer needs to
generate a permute.
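
To illustrate the layout issue, here is a minimal C sketch (not part of the
patch; the helper name is hypothetical) of the de-interleaving permute that a
generic vectorizer would otherwise need before it can work lane-wise on the
real and imaginary parts:

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.  Complex data is stored
   interleaved as {re0, im0, re1, im1, ...}; this splits it into separate
   real and imaginary arrays, which is the kind of permute a generic
   vectorizer would otherwise have to emit.  */
void
deinterleave (const float *z, float *re, float *im, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      re[i] = z[2 * i];      /* even elements hold the real parts */
      im[i] = z[2 * i + 1];  /* odd elements hold the imaginary parts */
    }
}
```

The complex vector instructions consume the interleaved form directly, so this
step is not needed when they are used.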

For this reason the pass also re-orders the loads in the SLP tree such that they
become contiguous and no longer need the permutes.  The Basic Blocks are left
untouched such that the scalar loop will still correctly issue permutes.

The instructions also support rotations in the Argand plane; as such, the
operands have to be re-ordered to coincide with their load group.
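
As a rough sketch of what a rotated complex addition looks like once written
out on interleaved data (function names are hypothetical and this is my
reading of the semantics in [1], not code from the patch): rotating the second
operand by 90 degrees corresponds to a + i*b, and by 270 degrees to a - i*b.
Note how each pair of statements reads the real and imaginary elements of b in
swapped order, which is why the operands have to line up with their load
group.

```c
#include <stddef.h>

/* c = a + i*b: second operand rotated by 90 degrees.
   All arrays are interleaved {re, im, re, im, ...} with n complex elements.  */
void
cadd_rot90 (float *c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i]     - b[2 * i + 1];  /* re = a.re - b.im */
      c[2 * i + 1] = a[2 * i + 1] + b[2 * i];      /* im = a.im + b.re */
    }
}

/* c = a - i*b: second operand rotated by 270 degrees.  */
void
cadd_rot270 (float *c, const float *a, const float *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      c[2 * i]     = a[2 * i]     + b[2 * i + 1];  /* re = a.re + b.im */
      c[2 * i + 1] = a[2 * i + 1] - b[2 * i];      /* im = a.im - b.re */
    }
}
```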

For now, this patch only adds support for complex addition with rotation and
complex FMLA with rotations of 0 and 180 degrees.  However, the intention is to
add support for complex subtraction and complex multiplication in the future.
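
For reference, the scalar shapes I would expect for the two supported FMLA
rotations are sketched below.  This is an illustration under my reading of the
instruction semantics in [1]; the function names are hypothetical and the
exact sequences the pass matches may differ.

```c
#include <stddef.h>

/* Rotation 0: accumulate the products of n.re with both parts of m.
   acc, n and m are interleaved {re, im, ...} arrays of len complex elements.  */
void
cfmla_rot0 (float *acc, const float *n, const float *m, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      acc[2 * i]     += n[2 * i] * m[2 * i];      /* re += n.re * m.re */
      acc[2 * i + 1] += n[2 * i] * m[2 * i + 1];  /* im += n.re * m.im */
    }
}

/* Rotation 180: the same products, subtracted instead of added.  */
void
cfmla_rot180 (float *acc, const float *n, const float *m, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      acc[2 * i]     -= n[2 * i] * m[2 * i];      /* re -= n.re * m.re */
      acc[2 * i + 1] -= n[2 * i] * m[2 * i + 1];  /* im -= n.re * m.im */
    }
}
```

Each rotation on its own is only a partial product; a full complex
multiply-accumulate also needs the terms involving n.im, which these two
rotations do not cover.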

The operations rely on GCC's early lowering of complex numbers into real and
imaginary pairs, so the pass simply recognizes any instruction sequence matching
the requested operations.
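
To make the lowering concrete, the sketch below pairs a source-level complex
multiply-accumulate with a hand-written version on real and imaginary parts.
The function names are hypothetical, and the second function only
approximates what the lowered code looks like; GCC's actual GIMPLE differs in
detail, and by default also handles NaN and infinity cases unless options such
as -fcx-limited-range are in effect.

```c
#include <complex.h>

/* Source-level form using C99 complex types.  */
float complex
cmla_source (float complex acc, float complex a, float complex b)
{
  return acc + a * b;
}

/* Hand-lowered equivalent on separate real and imaginary parts.  A pattern
   matcher working after lowering sees statements of this shape rather than
   a single complex operation.  */
void
cmla_lowered (float *acc_re, float *acc_im,
              float a_re, float a_im, float b_re, float b_im)
{
  *acc_re += a_re * b_re - a_im * b_im;
  *acc_im += a_re * b_im + a_im * b_re;
}
```

Because the matching happens on the lowered form, code written by hand in the
second style would presumably be recognized as well.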

To be safe, the pass backs off when it is not sure it can support the operation
or when it finds something it does not understand.

The hit rates of such patterns in SPEC CPU 2006 are as follows:

Unsupported due to type casts: 28
Successfully matched and created: 43
Aborted due to unknown instruction in sequence: 354
Total times pattern matched: 403

This shows that this patch and the future enhancements are worthwhile.  On
AArch64 the code is about 2-3x smaller when the new instructions are used.

[1] https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

Thanks,
Tamar
