[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826

2024-04-04 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415

--- Comment #5 from Vladimir Makarov  ---
After some considerations, I've decided to fix it in the scheduler.

Such approach solves the problem for all targets and schedulers, still
permitting live range shrinkage (important for space optimizations) and
scheduling for all option cases.

I see the only downside in adding additional dependencies for stack pointer
modifications.  But such insns have very small latency time and increase
critical paths only quite a bit.

I'll push the patch today or tomorrow.

[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826

2024-04-03 Thread vmakarov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415

--- Comment #4 from Vladimir Makarov  ---
(In reply to Jakub Jelinek from comment #3)
> BTW, with additional -mno-red-zone there is still movement of these insns,
> 

The problem is even bigger.  Live range splitting uses a standard insn
dependency calculation of the scheduler.  The scheduler can not recognize that
there should be dependencies between insns 60/66 and 65/71:

   60: {sp:DI=sp:DI-0x40;clobber flags:CC;} 
   65:
{r262:DI=0;r161:DI=r132:DI<<0x2+r129:DI;r162:DI=r132:DI<<0x2+r140:DI;[r129:DI]=[r140:DI];use
r132:DI;}   
   66: {sp:DI=r129:DI-0x40;clobber flags:CC;}   
   71:
{r263:DI=0;r165:DI=r132:DI<<0x2+r136:DI;r166:DI=r132:DI<<0x2+r128:DI;[r136:DI]=[r128:DI];use
r132:DI;}   

Therefore it moves insns 65 and 71 before insn 60 during live range shrinkage. 
It is wrong when there is no red zone but, even worse, split3 inserts the
following

  446: [--sp:DI]=0x10
  447: cx:DI=[sp:DI++]

between insns 65 and 71 before insn 60 which makes code wrong (rewriting memory
updated by insn 65) even if there is a red zone.

The analogous problem could occur in sched1 or/and sched2 when we don't use
live range shrinkage.

There is no way that the scheduler find and create specific deps 60->65, 66->65
(too much analysis is required to find insn 65 or 71 works with the stack)

I see two possible solutions:
  1. prohibit sched1, sched2, and live range shrinkage when accumulating args
is used
  2. create deps between any stack modification insns and memory modification
insns

The first one is easier and affects only one target (although the same problem
can be on other targets).  Still probably the same should be done for selective
scheduler.

The second one is the safest approach solving problems on all targets but may
affect performance of other targets.  The fix can require more time to
implement.

I'll think a bit about the possible fixes and inform you tomorrow.

[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826

2024-03-29 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P1
 CC||law at gcc dot gnu.org

[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826

2024-03-25 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415

--- Comment #3 from Jakub Jelinek  ---
BTW, with additional -mno-red-zone there is still movement of these insns,
though they   
leaq128(%rbx), %rsp   ! level 0
movq%r13, %rsi
movl%r10d, %edx
rep stosl
andq$0, -424(%rbp)
movq%r12, %rdi
leaq-112(%rbp), %rax
movq$1, -432(%rbp)
pushq   $16
popq%rcx
rep movsl ! fill in second argument - at %rsp - 64
pushq   $16   ! again overwrite last 8 bytes of the
second argument
movq%rbx, %rdi
leaq-240(%rbp), %rsi
popq%rcx
movq%r12, %rsp! level 64
movq%rbx, %rsp! level 128
rep movsl ! fill in first argument
xchgq   %rax, %rdi
callbar1

Ah, and it is sched2 pass which moves the movq %r12, %rsp and movq %rbx, %rsp
instructions before the second rep movsl.

[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826

2024-03-25 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415

Jakub Jelinek  changed:

   What|Removed |Added

 CC||sayle at gcc dot gnu.org,
   ||uros at gcc dot gnu.org,
   ||vmakarov at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Seems it is lr_shrinkage pass where things go wrong.
In bar0, all the 3 calls the function makes have 2 64-byte arguments (plus
64-byte return passed by hidden reference).  -Oz apparently uses
-mno-accumulate-outgoing-args, allocates some stack and then have (at least
initially) before each of the 3 calls two spots which decrement %rsp by 64 and
after each call one spot which increments %rsp by 128.
csa merges the first %rsp -= 64 with the previous much larger %rsp decrement
and in some cases the %rsp += 128 with following %rsp -= 64.
The function sets %r12 to be the REG_ARGS_SIZE 64 %rsp level and %rbx to be the
REG_ARGS_SIZE 128 %rsp level, i.e. %rbx = %r12 - 64.
In asmcons pass, we still have before the call to bar1 %rsp -= 64; rep_movsi
(fills in the second argument); %rsp -= 64; rep_movsi (fills in the first
argument).
But then lr_shrinkage pass moves both the stack decrements after all the
rep_movsi calls, so we then have:
(insn 60 71 66 2 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int -64 [0xffc0])))
(clobber (reg:CC 17 flags))
]) "pr114415.c":27:8 272 {*adddi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_ARGS_SIZE (const_int 64 [0x40])
(nil
(insn 66 60 72 2 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 129)
(const_int -64 [0xffc0])))
(clobber (reg:CC 17 flags))
]) "pr114415.c":27:8 272 {*adddi_1}
 (expr_list:REG_DEAD (reg/f:DI 129)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_ARGS_SIZE (const_int 128 [0x80])
(nil)
and just set of %rdi and call to bar1.

Now, as x86-64 has red zone, this in theory could still work fine (dunno if
backend is supposed to have some barriers which prevent moving the rep movsl
insns across it, unfortunately the fact that it uses some %rsp based address is
not visible directly in the insn due to CSE), but then comes peephole2
;; With -Oz, transform mov $imm,reg to the shorter push $imm; pop reg.
and converts the (set (reg:DI %rcx) (const_int 16)) insns to push/pop pair,
which when
the stack pointer is higher than it should have been causes clobbering of the
value.

[Bug target/114415] [13/14 Regression] wrong code with -Oz -fno-dce -fno-forward-propagate -flive-range-shrinkage -fweb since r13-1826

2024-03-25 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114415

Jakub Jelinek  changed:

   What|Removed |Added

Summary|wrong code with -Oz |[13/14 Regression] wrong
   |-fno-dce|code with -Oz -fno-dce
   |-fno-forward-propagate  |-fno-forward-propagate
   |-flive-range-shrinkage  |-flive-range-shrinkage
   |-fweb   |-fweb since r13-1826
   Target Milestone|--- |13.3
   Last reconfirmed||2024-03-25
 Status|UNCONFIRMED |NEW
   Keywords|needs-bisection |
 Ever confirmed|0   |1
 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
Started with r13-1826-g16aafa3194d4851a07cc204f56a5f0618f77e5d7