https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113537

            Bug ID: 113537
           Summary: ext should be used more for __builtin_shufflevector
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64

Take the following testcase:
```
#define vector4 __attribute__((vector_size(4)))
#define vector8 __attribute__((vector_size(8)))
#define vector16 __attribute__((vector_size(16)))


vector8 char f3(vector16 char a)
{
  return __builtin_shufflevector  (a, a, 1, 2, 3, 4, 5, 6, 7, 8);
}

vector8 char f2(vector16 char a)
{
  return __builtin_shufflevector  (a, a, 1, 2, 3, 4, 5, 6, 7, 0);
}
```

Currently GCC produces:
```
f3:
        adrp    x0, .LC0
        ldr     q31, [x0, #:lo12:.LC0]
        tbl     v0.16b, {v0.16b}, v31.16b
        ret
f2:
        adrp    x0, .LC1
        ldr     q31, [x0, #:lo12:.LC1]
        tbl     v0.16b, {v0.16b}, v31.16b
        ret
```

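But both shuffles select consecutive bytes starting at index 1 (f2 wraps around within the low 8 bytes), so these should be optimized to just: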
```
f3:
        ext     v0.16b, v0.16b, v0.16b, #1
        ret
f2:
        ext     v0.8b, v0.8b, v0.8b, #1
        ret
```
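
As a sanity check on the claim above, here is a minimal scalar sketch that models both the shuffles and the proposed `ext` forms and compares the results. The helper names (`ext_model`, `shuffle_model`, `f3_matches_ext`, `f2_matches_ext`) are hypothetical and not part of GCC. The `ext` model assumes the usual AArch64 semantics: the result is bytes `imm` and up of the first source, followed by the low `imm` bytes of the second source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of EXT Vd, Vn, Vm, #imm on `width` bytes
   (8 for the .8b form, 16 for the .16b form): the result is bytes
   imm..width-1 of Vn followed by bytes 0..imm-1 of Vm.  */
static void
ext_model (uint8_t *dst, const uint8_t *vn, const uint8_t *vm,
           unsigned width, unsigned imm)
{
  uint8_t tmp[16];
  assert (width <= 16 && imm < width);
  for (unsigned i = 0; i < width; i++)
    {
      unsigned j = i + imm;
      tmp[i] = j < width ? vn[j] : vm[j - width];
    }
  memcpy (dst, tmp, width);
}

/* Hypothetical scalar model of an 8-element
   __builtin_shufflevector (a, a, ...) on 16-byte inputs.  Indices
   0..15 select from the first operand and 16..31 from the second;
   both operands are `a` here, so the index is reduced modulo 16.  */
static void
shuffle_model (uint8_t *dst, const uint8_t *a, const int idx[8])
{
  for (unsigned i = 0; i < 8; i++)
    dst[i] = a[idx[i] % 16];
}

/* Fill the input with distinct byte values so mismatches are visible.  */
static void
fill_input (uint8_t a[16])
{
  for (unsigned i = 0; i < 16; i++)
    a[i] = (uint8_t) (0xA0 + i);
}

/* f3: the low 8 bytes of a 16-byte EXT #1 against indices 1..8.  */
int
f3_matches_ext (void)
{
  static const int idx[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  uint8_t a[16], want[8], got[16];
  fill_input (a);
  shuffle_model (want, a, idx);
  ext_model (got, a, a, 16, 1);
  return memcmp (want, got, 8) == 0;
}

/* f2: an 8-byte EXT #1 on the low half against indices 1..7, 0.  */
int
f2_matches_ext (void)
{
  static const int idx[8] = { 1, 2, 3, 4, 5, 6, 7, 0 };
  uint8_t a[16], want[8], got[8];
  fill_input (a);
  shuffle_model (want, a, idx);
  ext_model (got, a, a, 8, 1);
  return memcmp (want, got, 8) == 0;
}
```

The two cases differ only in width. f3 reads byte 8, so it needs the 16-byte form and keeps the low half of the result. f2 wraps back to byte 0, so the 8-byte form rotating the low half is enough.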
