https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693

            Bug ID: 101693
           Summary: Terrible SIMD register allocation with a tight loop
                    operating on 8 registers.
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ts.tomeksopel at gmail dot com
  Target Milestone: ---

There are already a few issues about unnecessary register spilling, but this
one also exhibits a lot of unnecessary shuffling of values between registers.

See https://godbolt.org/z/da76fY1n7 and
https://www.reddit.com/r/cpp_questions/comments/oui5tc/simd_what_to_do_when_your_compiler_forgets_how_to/

The gist is that there is a tight loop, executed a constant number of times
(~64), that accumulates into 8 ymm registers; only those 8 registers are used
outside the loop. Before the loop they are zeroed, and after the loop a
horizontal addition is performed. GCC generates suboptimal code, whereas clang
gets it right. GCC seems to emit unnecessary moves in the pattern
a -> b, vpdpbusd into b, b -> a. All GCC versions on godbolt >= 8.1 seem to
exhibit the issue, including trunk.
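
For readers without the godbolt link at hand, the shape of the loop can be
sketched roughly as follows. This is not the reporter's code: it is a portable
scalar model written for illustration, and the names Vec, Bytes, dpbusd and
accumulate, as well as the data layout, are assumptions. A real version would
presumably keep each accumulator in a ymm register and use an intrinsic such as
_mm256_dpbusd_epi32 in place of the dpbusd helper.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for a 256-bit register: 8 x int32 lanes, or 32 bytes.
using Vec = std::array<std::int32_t, 8>;
using Bytes = std::array<std::uint8_t, 32>;

constexpr std::size_t kAccumulators = 8;  // the 8 ymm accumulators
constexpr std::size_t kIterations = 64;   // constant trip count (~64 in the report)

// Scalar model of vpdpbusd: for each 32-bit lane, multiply 4 unsigned bytes
// of `a` by the 4 corresponding signed bytes of `b` and add the sum to `acc`.
// (The real instruction does not saturate; neither does this model.)
inline Vec dpbusd(Vec acc, const Bytes& a, const Bytes& b) {
    for (std::size_t lane = 0; lane < 8; ++lane) {
        for (std::size_t k = 0; k < 4; ++k) {
            const std::int32_t u = a[4 * lane + k];                            // unsigned byte
            const std::int32_t s = static_cast<std::int8_t>(b[4 * lane + k]);  // signed byte
            acc[lane] += u * s;
        }
    }
    return acc;
}

// Assumed layout: one input block per iteration, and one weight block per
// (iteration, accumulator) pair.
inline std::array<std::int32_t, kAccumulators> accumulate(
    const std::array<Bytes, kIterations>& input,
    const std::array<Bytes, kIterations * kAccumulators>& weights) {
    // Before the loop: all 8 accumulators are zeroed.
    std::array<Vec, kAccumulators> acc{};

    // The tight loop: each accumulator is updated in place, so ideally each
    // one stays in the same register for the whole loop.
    for (std::size_t i = 0; i < kIterations; ++i) {
        for (std::size_t j = 0; j < kAccumulators; ++j) {
            acc[j] = dpbusd(acc[j], input[i], weights[i * kAccumulators + j]);
        }
    }

    // After the loop: horizontal addition of each accumulator's lanes.
    std::array<std::int32_t, kAccumulators> out{};
    for (std::size_t j = 0; j < kAccumulators; ++j) {
        for (std::size_t lane = 0; lane < 8; ++lane) {
            out[j] += acc[j][lane];
        }
    }
    return out;
}
```

With the inner loop unrolled, the expected code is 8 vpdpbusd instructions per
iteration, each reading and writing its own accumulator register, with no
register-to-register moves in between.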
