https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102055
--- Comment #2 from Andrew Pinski ---
Whether ldr/tbl or rev64/ext is the better choice is debatable and depends on
whether we are inside a loop. Inside a loop with enough free registers, the
load of the index vector can be hoisted, so TBL is better on many (though not
all) micro-arches, as it has similar latency to rev64.
Though I should note that clang/LLVM implements it as rev64/ext.
E.g.:
```
#define vector __attribute__((vector_size(16)))

/* Full 16-byte reverse expressed as a single shuffle. */
vector char g(vector char a)
{
  return __builtin_shufflevector (a, a, 15,14,13,12,11,10,9,8,
                                  7,6,5,4,3,2,1,0);
}

/* Same result: reverse the bytes within each 64-bit half, then swap the halves. */
vector char g1(vector char a)
{
  vector char t = __builtin_shufflevector (a, a, 7,6,5,4,3,2,1,0,
                                           15,14,13,12,11,10,9,8);
  vector long long t1 = (vector long long)t;
  t1 = __builtin_shufflevector (t1, t1, 1, 0);
  return (vector char)t1;
}
```
Produces:
```
rev64 v0.16b, v0.16b
ext v0.16b, v0.16b, v0.16b, #8
```
For both.
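
As a sanity check on why the two sequences are interchangeable, here is a rough scalar model in plain C. The helper names (`model_rev64_16b`, `model_ext_16b`, `model_tbl_16b`, `reverse_sequences_agree`) are made up for illustration and are not GCC or ACLE functions. The model only captures the byte-level semantics of the instructions as I understand them, and says nothing about latency or throughput.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of "rev64 vd.16b, vn.16b": reverse the bytes within each 64-bit half. */
static void model_rev64_16b(uint8_t dst[16], const uint8_t src[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = src[(i & 8) | (7 - (i & 7))];
    memcpy(dst, tmp, 16);
}

/* Model of "ext vd.16b, vn.16b, vm.16b, #n": 16 consecutive bytes starting
   at byte n of the concatenation vm:vn (vn is the low half). */
static void model_ext_16b(uint8_t dst[16], const uint8_t lo[16],
                          const uint8_t hi[16], unsigned n)
{
    uint8_t cat[32];
    assert(n < 16);
    memcpy(cat, lo, 16);
    memcpy(cat + 16, hi, 16);
    memcpy(dst, cat + n, 16);
}

/* Model of single-register "tbl vd.16b, {vn.16b}, vm.16b":
   out-of-range indices produce 0. */
static void model_tbl_16b(uint8_t dst[16], const uint8_t table[16],
                          const uint8_t idx[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = idx[i] < 16 ? table[idx[i]] : 0;
    memcpy(dst, tmp, 16);
}

/* Returns 1 if rev64+ext and tbl (indices 15..0) both yield the
   full byte reverse of `in`, 0 otherwise. */
int reverse_sequences_agree(const uint8_t in[16])
{
    uint8_t a[16], b[16], idx[16];

    model_rev64_16b(a, in);       /* rev64 v0.16b, v0.16b */
    model_ext_16b(a, a, a, 8);    /* ext v0.16b, v0.16b, v0.16b, #8 */

    for (int i = 0; i < 16; i++)
        idx[i] = (uint8_t)(15 - i);   /* the constant that ldr would load */
    model_tbl_16b(b, in, idx);    /* tbl v0.16b, {v0.16b}, v1.16b */

    for (int i = 0; i < 16; i++)
        if (a[i] != in[15 - i] || b[i] != in[15 - i])
            return 0;
    return 1;
}
```

The `idx` array stands in for the constant that ldr has to fetch, which is the part that can be hoisted out of a loop when a register is free to hold it.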