[Bug tree-optimization/113677] Missing `VEC_PERM_EXPR <{a, CST}, CST, {0, 1, 2, ...}>` optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113677

--- Comment #5 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #4)
> Note it is not just about constants either.

That is the same as what is mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94301#c2 even :).
[Bug tree-optimization/113677] Missing `VEC_PERM_EXPR <{a, CST}, CST, {0, 1, 2, ...}>` optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113677

--- Comment #4 from Andrew Pinski ---
Note it is not just about constants either.
Take:
```
#define vect64 __attribute__((vector_size(8) ))
#define vect128 __attribute__((vector_size(16) ))

vect128 unsigned int f(vect64 unsigned int a, vect64 unsigned int b)
{
  vect64 unsigned int zero={0, 0};
  return __builtin_shufflevector (a, b, 0, 1, 2, 3);
}
```

We get:
```
  _1 = {a_3(D), { 0, 0 }};
  _2 = {b_4(D), { 0, 0 }};
  _5 = VEC_PERM_EXPR <_1, _2, { 0, 1, 4, 5 }>;
```

Which obviously could be simplified to just:
`_5 = {a_3(D), b_4(D)};`
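
As a rough illustration of why the two forms above are equivalent, here is a scalar model in plain C. It is only a sketch: `vec_perm4` and `perm_equals_concat` are names made up for this example, not GCC internals, and the model handles only in-range selectors (lanes 0..3 come from the first operand, lanes 4..7 from the second).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 4-lane VEC_PERM_EXPR (hypothetical helper):
   lane i of the result is read from the 8-lane concatenation of
   v0 and v1 at position sel[i].  */
static void vec_perm4(const uint32_t v0[4], const uint32_t v1[4],
                      const unsigned sel[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        assert(sel[i] < 8);  /* this sketch models in-range selectors only */
        out[i] = sel[i] < 4 ? v0[sel[i]] : v1[sel[i] - 4];
    }
}

/* Returns 1 when the padded permute and the direct {a, b}
   constructor produce the same four lanes.  */
int perm_equals_concat(const uint32_t a[2], const uint32_t b[2])
{
    const uint32_t v1[4] = { a[0], a[1], 0, 0 };  /* _1 = {a, { 0, 0 }} */
    const uint32_t v2[4] = { b[0], b[1], 0, 0 };  /* _2 = {b, { 0, 0 }} */
    const unsigned sel[4] = { 0, 1, 4, 5 };
    uint32_t permuted[4];

    vec_perm4(v1, v2, sel, permuted);

    const uint32_t direct[4] = { a[0], a[1], b[0], b[1] };  /* {a, b} */
    return memcmp(permuted, direct, sizeof direct) == 0;
}
```

The selector never reads lanes 2, 3, 6 or 7, so the zero padding in `_1` and `_2` is dead and the permute collapses to a plain concatenation of `a` and `b`.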
[Bug tree-optimization/113677] Missing `VEC_PERM_EXPR <{a, CST}, CST, {0, 1, 2, ...}>` optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113677

Richard Biener changed:

           What    |Removed     |Added
----------------------------------------------
   Last reconfirmed|            |2024-01-31
             Status|UNCONFIRMED |NEW
     Ever confirmed|0           |1

--- Comment #3 from Richard Biener ---
Yeah, most of the code in forwprop/match doesn't deal with the "new" permutes where the result isn't the same length as the inputs.
[Bug tree-optimization/113677] Missing `VEC_PERM_EXPR <{a, CST}, CST, {0, 1, 2, ...}>` optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113677

Andrew Pinski changed:

           What    |Removed     |Added
----------------------------------------------
             Target|x86_64      |x86_64 aarch64

--- Comment #2 from Andrew Pinski ---
Here is another example, using 64/128 on aarch64:
```
#define vect64 __attribute__((vector_size(8) ))
#define vect128 __attribute__((vector_size(16) ))

vect128 unsigned int f(vect64 unsigned int a)
{
  vect64 unsigned int zero={0, 0};
  return __builtin_shufflevector (a, zero, 0, 1, 2, 3);
}
```

We get:
```
f:
        movi    v31.4s, 0
        fmov    d0, d0
        zip1    v0.2d, v0.2d, v31.2d
```

This should just produce the `fmov` for little-endian and `mov/ins` for big-endian.

Note for this part of the issue the aarch64 back-end represents zip using UNSPEC where it could use VEC_CONCAT instead. And it would do the correct thing there ...
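
To make the expected result concrete, the following plain C sketch models the source-level shuffle on arrays and checks that shuffling `a` with an all-zero vector is the same as widening `a` with zeros in the upper lanes, which is why a single `fmov d0, d0` (a write to a D register clears the upper 64 bits of the vector register) would be enough on little-endian. The helper names `shuffle_2x2_to_4` and `shuffle_with_zero_is_zero_extend` are invented for this example, and the model ignores the "don't care" index -1 that `__builtin_shufflevector` also accepts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model (hypothetical helper) of __builtin_shufflevector on two
   2-lane inputs producing 4 lanes: each index selects from the 4-lane
   concatenation of x and y.  */
static void shuffle_2x2_to_4(const uint32_t x[2], const uint32_t y[2],
                             const int idx[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        assert(idx[i] >= 0 && idx[i] < 4);  /* -1 is not modelled here */
        out[i] = idx[i] < 2 ? x[idx[i]] : y[idx[i] - 2];
    }
}

/* Returns 1 when shuffle(a, zero, 0, 1, 2, 3) equals a widened with
   zeros in the two upper lanes.  */
int shuffle_with_zero_is_zero_extend(const uint32_t a[2])
{
    const uint32_t zero[2] = { 0, 0 };
    const int idx[4] = { 0, 1, 2, 3 };
    uint32_t shuffled[4];

    shuffle_2x2_to_4(a, zero, idx, shuffled);

    const uint32_t widened[4] = { a[0], a[1], 0, 0 };
    return memcmp(shuffled, widened, sizeof widened) == 0;
}
```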
[Bug tree-optimization/113677] Missing `VEC_PERM_EXPR <{a, CST}, CST, {0, 1, 2, ...}>` optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113677

Andrew Pinski changed:

           What    |Removed     |Added
----------------------------------------------
             Target|            |x86_64

--- Comment #1 from Andrew Pinski ---
I should note I noticed this while working on adding V4QI support for aarch64, but it is definitely a generic issue.