https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97756
Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to work|                            |14.0
            Summary|[11/12/13/14/15 Regression] |[11/12/13 Regression]
                   |Inefficient handling of     |Inefficient handling of
                   |128-bit arguments           |128-bit arguments

--- Comment #17 from Roger Sayle <roger at nextmovesoftware dot com> ---
I believe this issue is now fixed on mainline (i.e. for both GCC 14 and GCC 15).
Firstly, many thanks to Jakub for correcting the error in my patch.

We now generate optimal code sequences for the code in comments #3 and #5, and
generate fewer instructions than described in the original description.

The final remaining issue is that with -O3 GCC still uses more instructions than
clang and icc (see Thomas' comments #12 and #13). The good news is that this is
intentional: compiling with -Os (to optimize for size) generates the same number
of instructions as clang and icc [in fact, using icc -Os generates larger
code!?]. So when optimizing for performance, GCC is taking the opportunity to
use more (cheap) instructions to execute faster (or that's the theory).
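
For anyone wanting to reproduce the -O3 versus -Os comparison, something along these lines should do as a starting point. This is only a sketch: the names `u128`, `make_u128` and `add128` are made up for illustration, and this is not the testcase from the original description or from comments #3 and #5. It assumes a 64-bit target where GCC and clang provide `__int128`.

```c
/* Hypothetical illustration only; NOT the testcase from this bug report.
   Assumes a 64-bit target where the __int128 extension is available. */
typedef unsigned __int128 u128;

/* Build a 128-bit value from two 64-bit halves. */
u128 make_u128(unsigned long long hi, unsigned long long lo)
{
    return ((u128)hi << 64) | lo;
}

/* Takes two 128-bit arguments, each passed in a pair of registers
   on x86-64, and returns a 128-bit result. */
u128 add128(u128 x, u128 y)
{
    return x + y;
}
```

Compiling the file once with `gcc -O3 -S` and once with `gcc -Os -S` and comparing the two assembly outputs should show whether the instruction counts differ; the exact sequences will depend on the compiler version and target.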