I'm sorry, but I'm not quite following what you're expecting versus what
you're seeing. For example, if I compile that with --target=avx2, I see a series of
three multiplies followed by FMA instructions, which seems about as good as
it gets.
Is it that you're expecting that if you have code that does

Interesting, it's effectively exploiting functional programming to extract
parallelism. I have always wondered why there are no implementations of
functional languages that are specifically targeted at extracting SIMD-level
parallelism. Or perhaps I'm just not aware of such languages. Though