https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43147
Andrew Pinski changed:
What      |Removed          |Added
----------------------------------------
Component |rtl-optimization |target
--- Comment #11 from Andrew Pinski ---
We produce:
Trying 5, 7 -> 11:
5: r86:V4SF=[`*.LC0']
REG_EQUAL const_vector
7: r85:V4SF=vec_select(vec_concat(r86:V4SF,r86:V4SF),parallel)
REG_DEAD r86:V4SF
REG_EQUAL const_vector
11: r88:V4SF=vec_select(vec_concat(r85:V4SF,r85:V4SF),parallel)
REG_DEAD r85:V4SF
REG_EQUAL const_vector
Failed to match this instruction:
(set (reg:V4SF 88)
(const_vector:V4SF [
(const_double:SF 2.0e+0 [0x0.8p+2])
(const_double:SF 1.0e+0 [0x0.8p+1])
(const_double:SF 4.0e+0 [0x0.8p+3])
(const_double:SF 3.0e+0 [0x0.cp+2])
]))
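This means the vec_selects are merging at the RTL level just fine: combine
folds the three insns into a single const_vector set, and it is only that
resulting instruction that fails to match.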
Anyway, if the target expanded __builtin_ia32_shufps to VEC_PERM_EXPR, this
would have been optimized at the GIMPLE level. So this is a target issue.
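
For illustration, here is a minimal sketch of the GIMPLE-level route, using GCC's generic vector extension instead of the x86 builtin. __builtin_shuffle is represented as VEC_PERM_EXPR in GIMPLE, so two back-to-back shuffles with constant masks are visible to the GIMPLE optimizers. The function name and the two masks are hypothetical: they are chosen only so that the composed result matches the {2, 1, 4, 3} constant in the dump above, and are not taken from the testcase in this bug.

```c
/* GCC generic vector types (GNU extension; __builtin_shuffle is GCC-specific). */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Hypothetical example: apply two constant permutations in a row.
   Each __builtin_shuffle call becomes a VEC_PERM_EXPR in GIMPLE. */
static v4sf shuffle_twice(v4sf v)
{
    const v4si reverse     = {3, 2, 1, 0};  /* {a,b,c,d} -> {d,c,b,a} */
    const v4si swap_halves = {2, 3, 0, 1};  /* {a,b,c,d} -> {c,d,a,b} */

    v4sf t = __builtin_shuffle(v, reverse);
    return __builtin_shuffle(t, swap_halves);
}
```

Called on {1, 2, 3, 4}, this returns {2, 1, 4, 3}. With optimization enabled one would expect the two permutations to be composed before expansion to RTL, though that depends on the GCC version; comparing the -fdump-tree-optimized output against the combine dump above is a quick way to check.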