[Bug target/102055] full 128byte swap using __builtin_shuffle should produce rev64 followed by ext

2024-01-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102055

--- Comment #2 from Andrew Pinski  ---
Whether ldr/tbl or rev64/ext is the better choice is debatable, and it depends on
whether the code is inside a loop. Inside a loop with enough free registers, the
index vector loaded by ldr can stay in a register, leaving a single TBL. That is
better on many (though not all) microarchitectures, since TBL has a latency
similar to that of rev64.

Though I should note that clang/LLVM implements it as rev64/ext.

E.g.:
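```
#define vector __attribute__((vector_size(16)))

/* Reverse all 16 bytes with a single shuffle. */
vector char g(vector char a)
{
  return __builtin_shufflevector (a, a, 15, 14, 13, 12, 11, 10, 9, 8,
                                  7, 6, 5, 4, 3, 2, 1, 0);
}

/* Same result in two steps: reverse the bytes within each 64-bit half
   (what rev64 does), then swap the two 64-bit halves (ext #8). */
vector char g1(vector char a)
{
  vector char t = __builtin_shufflevector (a, a, 7, 6, 5, 4, 3, 2, 1, 0,
                                           15, 14, 13, 12, 11, 10, 9, 8);
  vector long long t1 = (vector long long)t;
  t1 = __builtin_shufflevector (t1, t1, 1, 0);
  return (vector char)t1;
}
```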

For both functions, clang/LLVM produces:
```
rev64   v0.16b, v0.16b
ext     v0.16b, v0.16b, v0.16b, #8
```

2021-08-25 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

   What    |Removed |Added
   See Also|        |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102056

--- Comment #1 from Andrew Pinski  ---
Note that if PR 102056 is implemented, both of these will become the same as g.