[Bug middle-end/105032] Compiling inline ASM x86 causing GCC stuck in an endless loop with 100% CPU usage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #15 from Andrey Vihrov ---
*** Bug 101438 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
   Target Milestone|---                         |11.3
             Status|NEW                         |RESOLVED

--- Comment #14 from H.J. Lu ---
Fixed for GCC 12 and GCC 11.3.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #13 from CVS Commits ---
The releases/gcc-11 branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:5f587c81bc558942d2988f5e2965a72471f5c202

commit r11-9754-g5f587c81bc558942d2988f5e2965a72471f5c202
Author: Vladimir N. Makarov
Date:   Fri Apr 1 09:48:57 2022 -0400

    [PR105032] LRA: modify loop condition to find reload insns for hard reg splitting

    When trying to split hard reg live range to assign hard reg to a reload
    pseudo, LRA searches for reload insns of the reload pseudo assuming a
    specific order of the reload insns.  This order is violated if reload
    involved in inheritance transformation.  In such case, the loop used for
    reload insn searching can become infinite.  The patch fixes this.

    gcc/ChangeLog:

            PR middle-end/105032
            * lra-assigns.c (find_reload_regno_insns): Modify loop condition.

    gcc/testsuite/ChangeLog:

            PR middle-end/105032
            * gcc.target/i386/pr105032.c: New.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #12 from Vladimir Makarov ---
The GCC 11 branch needs a slightly different patch.  I'll commit a modified
patch to the gcc-11 branch on Friday.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #11 from CVS Commits ---
The master branch has been updated by Vladimir Makarov:

https://gcc.gnu.org/g:22b0476a814a4759bb68f38b9415624a0fe52a7d

commit r12-7924-g22b0476a814a4759bb68f38b9415624a0fe52a7d
Author: Vladimir N. Makarov
Date:   Wed Mar 30 13:03:44 2022 -0400

    [PR105032] LRA: modify loop condition to find reload insns for hard reg splitting

    When trying to split hard reg live range to assign hard reg to a reload
    pseudo, LRA searches for reload insns of the reload pseudo assuming a
    specific order of the reload insns.  This order is violated if reload
    involved in inheritance transformation.  In such case, the loop used for
    reload insn searching can become infinite.  The patch fixes this.

    gcc/ChangeLog:

            PR middle-end/105032
            * lra-assigns.cc (find_reload_regno_insns): Modify loop condition.

    gcc/testsuite/ChangeLog:

            PR middle-end/105032
            * gcc.target/i386/pr105032.c: New.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #10 from Vladimir Makarov ---
I've also reproduced the bug on trunk.

The loop in question assumes a specific order for the reload insns.  In this
case the order of the insns involving the reload pseudo is violated because
the pseudo is also used for inheritance.

We can change the loop condition to guarantee that the loop finishes
regardless of the reload insn order.  This might result in failure of hard reg
live range splitting for the pseudo.  Permitting hard reg splitting for a
reload pseudo involved in inheritance is questionable, both for correct LRA
operation and for the efficiency of the generated code, so it makes no sense
to me to support it.

The patch will be pushed to trunk as soon as testing finishes.
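For readers following along, here is a minimal standalone sketch of the loop quoted in comment #2, not GCC code. `struct insn`, the `uses_regno` flag (standing in for the `insn_bitmap` test), `STEP_LIMIT`, and the function names are all hypothetical. The `stop_after_second` variant is only one condition that matches the description above (stop scanning forward once `second_insn` is set). It is an assumption and may differ from the committed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for rtx_insn: a doubly linked node plus a flag that
   plays the role of the insn_bitmap test in the real loop.  */
struct insn {
    struct insn *prev, *next;
    bool uses_regno;
};

enum { STEP_LIMIT = 1000 };  /* arbitrary cap so the demo itself cannot hang */

/* Returns the number of iterations, or -1 if STEP_LIMIT was exceeded.  */
int scan(struct insn *start, int insns_num, bool stop_after_second)
{
    assert(start != NULL);
    struct insn *prev = start->prev, *next = start->next;
    struct insn *first = NULL, *second = NULL;
    int steps = 0;

    while (insns_num != 1
           && (prev != NULL
               || (next != NULL && (!stop_after_second || second == NULL)))) {
        if (++steps > STEP_LIMIT)
            return -1;
        if (prev != NULL) {
            if (prev->uses_regno) {
                first = prev;
                insns_num--;
            }
            prev = prev->prev;
        }
        if (next != NULL && second == NULL) {
            if (!next->uses_regno)
                next = next->next;
            else {
                second = next;
                insns_num--;
            }
        }
    }
    (void) first;
    return steps;
}

/* Both other insns come after the start insn: the order the loop does not
   expect.  This mirrors the gdb state (prev_insn == NULL, second_insn set,
   insns_num == 2).  */
int scan_reordered_case(bool stop_after_second)
{
    struct insn start = { NULL, NULL, true };
    struct insn a = { NULL, NULL, true };
    struct insn b = { NULL, NULL, true };
    start.next = &a; a.prev = &start;
    a.next = &b;     b.prev = &a;
    return scan(&start, 3, stop_after_second);
}

/* One insn before and one after the start insn: the expected order.  */
int scan_ordered_case(bool stop_after_second)
{
    struct insn p = { NULL, NULL, true };
    struct insn start = { NULL, NULL, true };
    struct insn n = { NULL, NULL, true };
    p.next = &start; start.prev = &p;
    start.next = &n; n.prev = &start;
    return scan(&start, 3, stop_after_second);
}
```

With the original condition, `scan_reordered_case(false)` runs into the step cap, because once `second` is set and `prev` is exhausted nothing in the body changes any more. With the extra `second == NULL` test the loop exits, at the price of not finding all insns, which corresponds to giving up on the split for that pseudo.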
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #9 from Vladimir Makarov ---
Cycling is the worst thing that can happen to a compiler (even a crash is
better).  This is my highest-priority PR right now.

I cannot yet say why the loop does not finish.  It should, as it works only
on reload pseudos.  I'll investigate further.  In any case, I hope to fix it
this week.

Sorry for the inconvenience.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #8 from H.J. Lu ---
(In reply to Jakub Jelinek from comment #7)
> Anyway, with the "b" etc. constraints (which is a good idea to use on x86
> when it has single register constraints for those but can't be used on other
> arches which do not have such constraints) you just trigger a slightly
> different path in the RA, but the problem remains roughly the same: you
> force the use of 6 registers as inputs plus one memory address, esp is the
> stack pointer, ebp could be the frame pointer, and it is an open question
> whether you need another register for the address of the memory input.

Since it is very unreliable, the inlined syscall using an asm statement has
been removed from glibc.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #7 from Jakub Jelinek ---
Consider e.g.

    void bar (int *);
    int
    foo (int *a, int *b, int *c, int *d)
    {
      for (int i = 0; i < 1024; i++)
        a[i] = a[i] * b[i] + (c[i] - d[i]);
      bar (a);
      return 42;
    }

with -m32 -O3 -mavx -mstackrealign.
It needs to dynamically realign the stack because the user asked for it (so
that it can use 256-bit aligned stack slots, and callers don't guarantee that
alignment), so it needs a frame pointer (%ebp), a stack pointer (%esp) and a
DRAP (%ecx in this case).  Especially if you e.g. also use VLAs or alloca in
the function.  %ebp-based addressing is for the automatic vars in the
function's stack frame; the stack pointer can be at a variable offset from it
and is used for outgoing arguments to function calls, for push/pop, and for
alloca/VLAs; and the DRAP is used to access function arguments, which aren't
at a fixed offset from the frame pointer either.

Anyway, with the "b" etc. constraints (which is a good idea to use on x86
when it has single register constraints for those but can't be used on other
arches which do not have such constraints) you just trigger a slightly
different path in the RA, but the problem remains roughly the same: you
force the use of 6 registers as inputs plus one memory address, esp is the
stack pointer, ebp could be the frame pointer, and it is an open question
whether you need another register for the address of the memory input.
A way to free one input would be to store 2 arguments into an array, use the
whole array as one memory input, and only inside the inline asm load them
into the right registers.
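A rough sketch of that last idea, as a variant rather than exactly what is described above: instead of an "m" operand, the array's address is pinned to %eax explicitly, so the asm needs six register inputs and no separately addressed memory operand. The name `my_syscall6_packed`, the choice to pack `num` and `arg6`, and the non-i386 fallback through libc `syscall()` are all assumptions for illustration. The i386 path is Linux-specific and returns the raw kernel value, while the fallback follows the libc errno convention.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper, not from the thread or from glibc.  */
long my_syscall6_packed(long num, long a1, long a2, long a3,
                        long a4, long a5, long a6)
{
#if defined(__i386__)
    /* Pack the two values that do not get a register of their own.  */
    long args[2] = { num, a6 };
    long ret = (long) args;      /* %eax carries the array address in */

    __asm__ volatile (
        "pushl %%ebp\n\t"            /* save the (possible) frame pointer */
        "movl 4(%%eax), %%ebp\n\t"   /* args[1] -> %ebp (sixth argument)  */
        "movl (%%eax), %%eax\n\t"    /* args[0] -> %eax (syscall number)  */
        "int $0x80\n\t"
        "popl %%ebp\n\t"
        : "+a"(ret)
        : "b"(a1), "c"(a2), "d"(a3), "S"(a4), "D"(a5)
        : "memory", "cc");
    return ret;
#else
    /* Other targets: defer to libc so the sketch still builds and runs.  */
    return syscall(num, a1, a2, a3, a4, a5, a6);
#endif
}
```

Because the loads go through %eax and not through %esp or %ebp, the `pushl` does not invalidate any operand address. The "memory" clobber is what is relied on here to make sure `args` has been written before the asm runs.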
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #6 from Ammar Faizi ---
(In reply to Jakub Jelinek from comment #4)
> If this is a macro that users should use in arbitrary user code, there is
> another problem: if something in the function is vectorized using AVX or
> later, or if -mstackrealign is used, another register is needed for the
> stack realignment (the DRAP register).

I don't really understand the stack realignment part.  So I have a question:
what is the other register here?  Is it %ebp?

If we have %ebp as the frame pointer, can't the compiler just use it for the
realignment?  I am not sure what the DRAP register really means.  I googled
it, but nothing relevant showed up.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #5 from Ammar Faizi ---
(In reply to Jakub Jelinek from comment #3)
> This has been hanging or ICEing on and off since forever.
> E.g. even r105000 ICEs, r20 works, r21 ICEs, r10-5912 works, r11-1
> hangs, so does current trunk.
> The first revision after r10-5912 to start hanging was
> r10-6326-gbcf3fa7cf5a3d024b507.
> Note, without optimizations, the inline asm is on or beyond the border of
> what can be handled: it uses 6 of the 8 GPRs the arch has, and the remaining
> two are the stack pointer and, when not optimizing or when a frame pointer
> is needed for whatever reason, the frame pointer.  The asm also has a memory
> input.  So, it fully depends on optimization (which isn't done with -O0
> generally) that the address of the
> _arg6 variable can be expressed as offset(%esp) or offset(%ebp).  If it is
> not (and -O0 asks for no optimizations), then there are no registers left
> with which to describe the input.

Interestingly, changing the my_syscall6() macro to this one works nicely.

#define my_syscall6(num, arg1, arg2, arg3, arg4, arg5, arg6)	\
({								\
	long _eax  = (long)(num);				\
	long _arg6 = (long)(arg6); /* Always in memory */	\
	__asm__ volatile (					\
		"pushl %[_arg6]\n\t"				\
		"pushl %%ebp\n\t"				\
		"movl 4(%%esp),%%ebp\n\t"			\
		"int $0x80\n\t"					\
		"popl %%ebp\n\t"				\
		"addl $4,%%esp\n\t"				\
		: "+a"(_eax)		/* %eax */		\
		: "b"(arg1),		/* %ebx */		\
		  "c"(arg2),		/* %ecx */		\
		  "d"(arg3),		/* %edx */		\
		  "S"(arg4),		/* %esi */		\
		  "D"(arg5),		/* %edi */		\
		  [_arg6]"m"(_arg6)	/* memory */		\
		: "memory", "cc"				\
	);							\
	_eax;							\
})

Link: https://godbolt.org/z/hdsffvr1d

What could possibly be wrong here?  I am not sure what the behavioral
difference is between this macro and the one posted previously.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #4 from Jakub Jelinek ---
If this is a macro that users should use in arbitrary user code, there is
another problem: if something in the function is vectorized using AVX or
later, or if -mstackrealign is used, another register is needed for the
stack realignment (the DRAP register).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-03-23
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
                 CC|                            |jakub at gcc dot gnu.org,
                   |                            |vmakarov at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek ---
This has been hanging or ICEing on and off since forever.
E.g. even r105000 ICEs, r20 works, r21 ICEs, r10-5912 works, r11-1
hangs, so does current trunk.
The first revision after r10-5912 to start hanging was
r10-6326-gbcf3fa7cf5a3d024b507.
Note, without optimizations, the inline asm is on or beyond the border of
what can be handled: it uses 6 of the 8 GPRs the arch has, and the remaining
two are the stack pointer and, when not optimizing or when a frame pointer
is needed for whatever reason, the frame pointer.  The asm also has a memory
input.  So, it fully depends on optimization (which isn't done with -O0
generally) that the address of the
_arg6 variable can be expressed as offset(%esp) or offset(%ebp).  If it is
not (and -O0 asks for no optimizations), then there are no registers left
with which to describe the input.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

Hongtao.liu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |crazylht at gmail dot com

--- Comment #2 from Hongtao.liu ---
stuck in this loop

1731          for (prev_insn = PREV_INSN (start_insn),
1732                 next_insn = NEXT_INSN (start_insn);
1733               insns_num != 1 && (prev_insn != NULL || next_insn != NULL); )
1734            {
1735              if (prev_insn != NULL)
1736                {
1737                  if (bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
(gdb)
1738                                    INSN_UID (prev_insn)))
1739                    {
1740                      first_insn = prev_insn;
1741                      insns_num--;
1742                    }
1743                  prev_insn = PREV_INSN (prev_insn);
1744                }
1745              if (next_insn != NULL && second_insn == NULL)
1746                {
1747                  if (! bitmap_bit_p (&lra_reg_info[regno].insn_bitmap,
(gdb)
1748                                      INSN_UID (next_insn)))
1749                    next_insn = NEXT_INSN (next_insn);
1750                  else
1751                    {
1752                      second_insn = next_insn;
1753                      insns_num--;
1754                    }
1755                }
1756            }

(gdb) p second_insn
$5 = (rtx_insn *) 0x7fffea2f9980
(gdb) p prev_insn
$6 = (rtx_insn *) 0x0
(gdb) p next_insn
$7 = (rtx_insn *) 0x7fffea2f9980
(gdb) p second_insn
$8 = (rtx_insn *) 0x7fffea2f9980
(gdb) p insns_num
$9 = 2
(gdb) f
#0  find_reload_regno_insns (regno=91, start=@0x7fffd308: 0xcc2968 ::release()+68>, finish=@0x7fffd300: 0x7fffd320) at gcc/lra-assigns.cc:1733
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105032

--- Comment #1 from Andrew Pinski ---
GCC 6.4 used to ICE:
: In function 'void* __sys_mmap(void*, size_t, int, int, int, off_t)':
:39:1: error: unable to find a register to spill
 }
 ^
:39:1: error: this is the insn:
(insn 21 20 37 2 (set (reg:SI 102 [orig:96 offset.10_18 ] [96])
        (mem/c:SI (plus:SI (reg/f:SI 16 argp)
                (const_int 28 [0x1c])) [1 offset+0 S4 A32])) :37 86 {*movsi_internal}
     (expr_list:REG_DEAD (reg:SI 16 argp)
        (nil)))
:39: confused by earlier errors, bailing out