[Bug target/40771] generated code is ~25% slower when autovectorization is enabled

pinskia at gcc dot gnu.org via Gcc-bugs Wed, 03 Apr 2024 16:36:48 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40771


--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
AArch64 vectorization looks decent too:
```
        dup     v31.8h, w0
        adrp    x2, .LC0
        adrp    x0, .LC1
        adrp    x1, .LANCHOR0
        ldr     q30, [x2, #:lo12:.LC0]
        ldr     q29, [x0, #:lo12:.LC1]
        add     v30.8h, v31.8h, v30.8h
        add     v29.8h, v31.8h, v29.8h
        uzp2    v29.16b, v30.16b, v29.16b
        str     q29, [x1, #:lo12:.LANCHOR0]
```

The only improvement that can be made there is with SVE: those `ldr` loads
could be `index` instructions instead, but that is PR 113328.
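
For readers who do not read AArch64 assembly, here is a rough scalar model in C
of what the sequence above appears to compute. It is only a sketch under
assumptions: the contents of `.LC0`/`.LC1` are not shown in this comment, so
treating them as a linear series (`i * step`) is a guess motivated by the remark
that SVE `index` could generate them. The names `fill_table`, `table` and `step`
are made up for illustration and are not taken from the PR's testcase.

```c
#include <stdint.h>

/* Stands in for the 16 bytes stored at .LANCHOR0 (hypothetical name). */
static uint8_t table[16];

/*
 * Scalar model of the vector sequence:
 *   dup       broadcast the low 16 bits of the argument to every lane
 *   ldr/add   add a per-lane constant (ASSUMED here to be i * step)
 *   uzp2      keep the odd bytes, i.e. the high byte of each 16-bit lane
 *             on little-endian
 *   str       store the 16 resulting bytes
 */
static void fill_table(uint16_t x, uint16_t step)
{
    for (int i = 0; i < 16; i++) {
        /* 16-bit add with wraparound, as in `add v.8h`. */
        uint16_t lane = (uint16_t)(x + i * step);
        /* High byte of the 16-bit lane, as selected by `uzp2 v.16b`. */
        table[i] = (uint8_t)(lane >> 8);
    }
}
```

With an assumed `step` of 0x100 and `x` of 0, each `table[i]` would simply be
`i`; the point is only that the whole loop collapses to two vector adds and one
permute.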
