[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415 --- Comment #5 from Vladimir Makarov --- After some considerations, I've decided to fix it in the scheduler. Such approach solves the problem for all targets and schedulers, still permitting live range shrinkage (important for space optimizations) and scheduling for all option cases. I see the only downside in adding additional dependencies for stack pointer modifications. But such insns have very small latency time and increase critical paths only quite a bit. I'll push the patch today or tomorrow.
[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415 --- Comment #4 from Vladimir Makarov --- (In reply to Jakub Jelinek from comment #3) > BTW, with additional -mno-red-zone there is still movement of these insns, > The problem is even bigger. Live range splitting uses a standard insn dependency calculation of the scheduler. The scheduler can not recognize that there should be dependencies between insns 60/66 and 65/71: 60: {sp:DI=sp:DI-0x40;clobber flags:CC;} 65: {r262:DI=0;r161:DI=r132:DI<<0x2+r129:DI;r162:DI=r132:DI<<0x2+r140:DI;[r129:DI]=[r140:DI];use r132:DI;} 66: {sp:DI=r129:DI-0x40;clobber flags:CC;} 71: {r263:DI=0;r165:DI=r132:DI<<0x2+r136:DI;r166:DI=r132:DI<<0x2+r128:DI;[r136:DI]=[r128:DI];use r132:DI;} Therefore it moves insns 65 and 71 before insn 60 during live range shrinkage. It is wrong when there is no red zone but, even worse, split3 inserts the following 446: [--sp:DI]=0x10 447: cx:DI=[sp:DI++] between insns 65 and 71 before insn 60 which makes code wrong (rewriting memory updated by insn 65) even if there is a red zone. The analogous problem could occur in sched1 or/and sched2 when we don't use live range shrinkage. There is no way that the scheduler find and create specific deps 60->65, 66->65 (too much analysis is required to find insn 65 or 71 works with the stack) I see two possible solutions: 1. prohibit sched1, sched2, and live range shrinkage when accumulating args is used 2. create deps between any stack modification insns and memory modification insns The first one is easier and affects only one target (although the same problem can be on other targets). Still probably the same should be done for selective scheduler. The second one is the safest approach solving problems on all targets but may affect performance of other targets. The fix can require more time to implement. I'll think a bit about the possible fixes and inform you tomorrow.
[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415 Jeffrey A. Law changed: What|Removed |Added Priority|P3 |P1 CC||law at gcc dot gnu.org
[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415 --- Comment #3 from Jakub Jelinek --- BTW, with additional -mno-red-zone there is still movement of these insns, though they leaq128(%rbx), %rsp ! level 0 movq%r13, %rsi movl%r10d, %edx rep stosl andq$0, -424(%rbp) movq%r12, %rdi leaq-112(%rbp), %rax movq$1, -432(%rbp) pushq $16 popq%rcx rep movsl ! fill in second argument - at %rsp - 64 pushq $16 ! again overwrite last 8 bytes of the second argument movq%rbx, %rdi leaq-240(%rbp), %rsi popq%rcx movq%r12, %rsp! level 64 movq%rbx, %rsp! level 128 rep movsl ! fill in first argument xchgq %rax, %rdi callbar1 Ah, and it is sched2 pass which moves the movq %r12, %rsp and movq %rbx, %rsp instructions before the second rep movsl.
[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415 Jakub Jelinek changed: What|Removed |Added CC||sayle at gcc dot gnu.org, ||uros at gcc dot gnu.org, ||vmakarov at gcc dot gnu.org --- Comment #2 from Jakub Jelinek --- Seems it is lr_shrinkage pass where things go wrong. In bar0, all the 3 calls the function makes have 2 64-byte arguments (plus 64-byte return passed by hidden reference). -Oz apparently uses -mno-accumulate-outgoing-args, allocates some stack and then have (at least initially) before each of the 3 calls two spots which decrement %rsp by 64 and after each call one spot which increments %rsp by 128. csa merges the first %rsp -= 64 with the previous much larger %rsp decrement and in some cases the %rsp += 128 with following %rsp -= 64. The function sets %r12 to be the REG_ARGS_SIZE 64 %rsp level and %rbx to be the REG_ARGS_SIZE 128 %rsp level, i.e. %rbx = %r12 - 64. In asmcons pass, we still have before the call to bar1 %rsp -= 64; rep_movsi (fills in the second argument); %rsp -= 64; rep_movsi (fills in the first argument). But then lr_shrinkage pass moves both the stack decrements after all the rep_movsi calls, so we then have: (insn 60 71 66 2 (parallel [ (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int -64 [0xffc0]))) (clobber (reg:CC 17 flags)) ]) "pr114415.c":27:8 272 {*adddi_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_ARGS_SIZE (const_int 64 [0x40]) (nil (insn 66 60 72 2 (parallel [ (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 129) (const_int -64 [0xffc0]))) (clobber (reg:CC 17 flags)) ]) "pr114415.c":27:8 272 {*adddi_1} (expr_list:REG_DEAD (reg/f:DI 129) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_ARGS_SIZE (const_int 128 [0x80]) (nil) and just set of %rdi and call to bar1. Now, as x86-64 has red zone, this in theory could still work fine (dunno if backend is supposed to have some barriers which prevent moving the rep movsl insns across it, unfortunately the fact that it uses some %rsp based address is not visible directly in the insn due to CSE), but then comes peephole2 ;; With -Oz, transform mov $imm,reg to the shorter push $imm; pop reg. and converts the (set (reg:DI %rcx) (const_int 16)) insns to push/pop pair, which when the stack pointer is higher than it should have been causes clobbering of the value.
[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415 Jakub Jelinek changed: What|Removed |Added Summary|wrong code with -Oz |[13/14 Regression] wrong |-fno-dce|code with -Oz -fno-dce |-fno-forward-propagate |-fno-forward-propagate |-flive-range-shrinkage |-flive-range-shrinkage |-fweb |-fweb since r13-1826 Target Milestone|--- |13.3 Last reconfirmed||2024-03-25 Status|UNCONFIRMED |NEW Keywords|needs-bisection | Ever confirmed|0 |1 CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek --- Started with r13-1826-g16aafa3194d4851a07cc204f56a5f0618f77e5d7