[Bug rtl-optimization/101693] New: Terrible SIMD register allocation with a tight loop operating on 8 registers.

ts.tomeksopel at gmail dot com via Gcc-bugs Fri, 30 Jul 2021 07:04:30 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101693


            Bug ID: 101693
           Summary: Terrible SIMD register allocation with a tight loop
                    operating on 8 registers.
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ts.tomeksopel at gmail dot com
  Target Milestone: ---

There are a few existing reports about unnecessary register spilling, but this
case also exhibits a lot of unnecessary shuffling of values between registers.

See https://godbolt.org/z/da76fY1n7 and
https://www.reddit.com/r/cpp_questions/comments/oui5tc/simd_what_to_do_when_your_compiler_forgets_how_to/

The gist is that there is a tight loop, executed a constant number of times
(~64), that accumulates into 8 ymm registers, and only those 8 registers are
used outside the loop. Before the loop they are zeroed, and after the loop a
horizontal addition is performed. GCC generates suboptimal code, whereas clang
gets it right. GCC seems to emit unnecessary moves in the pattern a -> b,
vpdpbusd into b, then b -> a. All GCC versions on godbolt from 8.1 onward,
including trunk, seem to exhibit the issue.
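The godbolt source is not reproduced in this report, so the following is only a hypothetical C sketch of the loop shape described above. The names `dpbusd_model`, `kernel_model`, `kernel_vnni` and the constants `ACCS`, `LANES`, `ITERS`, `STRIDE` are illustrative, and the input layout (32 fresh bytes of each operand per accumulator per iteration) is an assumption. The scalar part models what `vpdpbusd` computes so it runs anywhere. The intrinsics version, using `_mm256_dpbusd_epi32`, is compiled only when AVX-512 VNNI and VL are enabled.

```c
#include <stddef.h>
#include <stdint.h>

#define ACCS 8    /* accumulators: 8 ymm registers in the report */
#define LANES 8   /* 32-bit lanes per 256-bit register */
#define ITERS 64  /* constant trip count (~64 in the report) */
#define STRIDE 32 /* assumed bytes consumed per accumulator per iteration */

/* Scalar model of one 256-bit vpdpbusd: for each 32-bit lane i, multiply
   four unsigned bytes of a by the four matching signed bytes of b and add
   the products to acc[i]. Assumes inputs small enough not to overflow. */
static void dpbusd_model(int32_t acc[LANES], const uint8_t *a, const int8_t *b) {
    for (int i = 0; i < LANES; i++) {
        int32_t sum = 0;
        for (int k = 0; k < 4; k++)
            sum += (int32_t)a[4 * i + k] * (int32_t)b[4 * i + k];
        acc[i] += sum;
    }
}

/* Hypothetical kernel: zero ACCS accumulators, update each one once per
   iteration of a fixed-count loop, then reduce everything horizontally.
   a and b must each hold ITERS * ACCS * STRIDE bytes. */
static int32_t kernel_model(const uint8_t *a, const int8_t *b) {
    int32_t acc[ACCS][LANES] = {{0}};
    for (int it = 0; it < ITERS; it++)
        for (int r = 0; r < ACCS; r++) {
            size_t off = ((size_t)it * ACCS + (size_t)r) * STRIDE;
            dpbusd_model(acc[r], a + off, b + off);
        }
    int32_t total = 0;
    for (int r = 0; r < ACCS; r++)
        for (int i = 0; i < LANES; i++)
            total += acc[r][i];
    return total;
}

#if defined(__AVX512VNNI__) && defined(__AVX512VL__)
#include <immintrin.h>

/* Same shape with the real instruction: only the eight accumulators need
   to stay in vector registers across the loop. */
static int32_t kernel_vnni(const uint8_t *a, const int8_t *b) {
    __m256i acc[ACCS];
    for (int r = 0; r < ACCS; r++)
        acc[r] = _mm256_setzero_si256();
    for (int it = 0; it < ITERS; it++)
        for (int r = 0; r < ACCS; r++) {
            size_t off = ((size_t)it * ACCS + (size_t)r) * STRIDE;
            __m256i va = _mm256_loadu_si256((const __m256i *)(a + off));
            __m256i vb = _mm256_loadu_si256((const __m256i *)(b + off));
            acc[r] = _mm256_dpbusd_epi32(acc[r], va, vb); /* vpdpbusd */
        }
    int32_t lanes[LANES], total = 0;
    for (int r = 0; r < ACCS; r++) {
        _mm256_storeu_si256((__m256i *)lanes, acc[r]);
        for (int i = 0; i < LANES; i++)
            total += lanes[i];
    }
    return total;
}
#endif
```

Building with something like `gcc -O2 -mavx512vnni -mavx512vl -S` and reading the loop body of `kernel_vnni` is one way to look for register-to-register copies around each `vpdpbusd`; `kernel_model` gives a reference result for the same inputs.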
