[Bug target/17108] Store with update not generated for a simple loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17108

--- Comment #9 from Segher Boessenkool ---
Author: segher
Date: Wed Apr 17 09:45:57 2019
New Revision: 270407

URL: https://gcc.gnu.org/viewcvs?rev=270407&root=gcc&view=rev
Log:
rs6000: Improve the load/store-with-update patterns (PR17108)

Many of these patterns only worked in 32-bit mode, and some only
worked in 64-bit mode.  This patch makes these use Pmode, fixing the
PR.  On the other hand, the stack updates have to use the same mode
for the stack pointer as for the value stored, so let's simplify that
a bit.

Many of these patterns pass the wrong mode to avoiding_indexed_address_p
(it should be the mode of the datum accessed, not the mode of the
pointer).

Finally, I merge some patterns into one (using iterators).

        PR target/17108
        * config/rs6000/rs6000.c (rs6000_split_multireg_move): Adjust pattern
        name.
        (rs6000_emit_allocate_stack_1): Simplify condition.  Adjust pattern
        name.
        * config/rs6000/rs6000.md (bits): Add entries for SF and DF.
        (*movdi_update1): Use Pmode.
        (movdi__update): Fix argument to avoiding_indexed_address_p.
        (movdi__update_stack): Rename to ...
        (movdi_update_stack): ... this.  Fix comment.  Change condition.  Don't
        use Pmode.
        (*movsi_update1): Use Pmode.
        (*movsi_update2): Use Pmode.
        (movsi_update): Rename to ...
        (movsi__update): ... this.  Use Pmode.
        (movsi_update_stack): Fix condition.
        (*movhi_update1): Use Pmode.  Fix argument to
        avoiding_indexed_address_p.
        (*movhi_update2): Ditto.
        (*movhi_update3): Ditto.
        (*movhi_update4): Ditto.
        (*movqi_update1): Ditto.
        (*movqi_update2): Ditto.
        (*movqi_update3): Ditto.
        (*movsf_update1, *movdf_update1): Merge, rename to...
        (*mov_update1): This.  Use Pmode.  Fix argument to
        avoiding_indexed_address_p.  Add "size" attribute.
        (*movsf_update2, *movdf_update2): Merge, rename to...
        (*mov_update2): This.  Ditto.
        (*movsf_update3): Use Pmode.  Fix argument to
        avoiding_indexed_address_p.
        (*movsf_update4): Ditto.
        (allocate_stack): Simplify condition.  Adjust pattern names.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/rs6000.c
    trunk/gcc/config/rs6000/rs6000.md

[Bug target/17108] Store with update not generated for a simple loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=17108

Segher Boessenkool changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |segher at gcc dot gnu.org

--- Comment #8 from Segher Boessenkool ---
We currently generate (for -O2 -m64, -O3 unrolls it completely, see comment 7)

        li 9,8
        mtctr 9
        .p2align 4,,15
.L2:
        stfs 1,0(3)
        addi 3,3,4
        bdnz .L2
        blr

and for -m32 we get

        li 9,8
        addi 3,3,-4
        mtctr 9
        .p2align 4,,15
.L2:
        stfsu 1,4(3)
        bdnz .L2
        blr

The difference is partly the selected -mcpu=, but that doesn't explain it
completely.

The gimple passes (probably ivopts) have decided to do a pre_inc here; all
differences are at RTL level.  Except for -mcpu=power9 they didn't.

A case where it works as expected, -O2 -m32 -mcpu=power4, the auto_inc_dec
pass does not help (this is caused by rtx_cost issues):

starting bb 3
   11: [r122:SI]=r127:SF
   11: [r122:SI]=r127:SF
found mem(11) *(r[122]+0)
   10: r122:SI=r122:SI+0x4
   10: r122:SI=r122:SI+0x4
found pre inc(10) r[122]+=4
   11: [r122:SI]=r127:SF
found mem(11) *(r[122]+0)
trying SIMPLE_PRE_INC
cost failure old=16 new=408

(I have a patch for that).

but then combine comes along and does

Trying 10 -> 11:
   10: r122:SI=r122:SI+0x4
   11: [r122:SI]=r127:SF
Successfully matched this instruction:
(parallel [
        (set (mem:SF (plus:SI (reg:SI 122 [ ivtmp.10 ])
                    (const_int 4 [0x4])) [1 MEM[base: _17, offset: 0B]+0 S4 A32])
            (reg/v:SF 127 [ d ]))
        (set (reg:SI 122 [ ivtmp.10 ])
            (plus:SI (reg:SI 122 [ ivtmp.10 ])
                (const_int 4 [0x4])))
    ])
allowing combination of insns 10 and 11
original costs 4 + 4 = 8
replacement cost 4

-m64 however says

Trying 10 -> 11:
   10: r122:DI=r122:DI+0x4
   11: [r122:DI]=r127:SF
Failed to match this instruction:
(parallel [
        (set (mem:SF (plus:DI (reg:DI 122 [ ivtmp.11 ])
                    (const_int 4 [0x4])) [1 MEM[base: _17, offset: 0B]+0 S4 A32])
            (reg/v:SF 127 [ d ]))
        (set (reg:DI 122 [ ivtmp.11 ])
            (plus:DI (reg:DI 122 [ ivtmp.11 ])
                (const_int 4 [0x4])))
    ])

Oh dear, we do not have the float load/store-with-update instructions for
-m64.

On all modern 64-bit CPUs these are cracked, so they execute the same as the separate addi and store instructions, but it costs code space. And if we do not want them we should make them more expensive, not just pretend the insns do not exist :-)
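
For readers without the bug's attachment at hand: the thread never quotes the
test source, so the following is only an assumed reconstruction, inferred from
the assembly above (eight stores of f1 through r3).  The names foo_separate and
foo_update are hypothetical and exist only to model the two loop shapes in
plain C; they are not GCC output or GCC internals.

```c
#include <stddef.h>

/* Assumed shape of the test loop: one float stored into 8 consecutive slots. */
void foo(float *data, float d)
{
    for (int i = 0; i < 8; i++)
        data[i] = d;
}

/* Models the -m64 code: store at offset 0, then a separate pointer add
   (stfs 1,0(3) followed by addi 3,3,4). */
void foo_separate(float *data, float d)
{
    float *p = data;
    for (int n = 8; n != 0; n--) {
        *p = d;       /* stfs 1,0(3) */
        p = p + 1;    /* addi 3,3,4  */
    }
}

/* Models the -m32 code: the address is advanced first and the store uses
   the updated address, all in one step (stfsu 1,4(3)).  The generated code
   biases the pointer by -4 before the loop; here an index starting at -1
   plays that role so that no out-of-bounds pointer is formed in C. */
void foo_update(float *data, float d)
{
    ptrdiff_t i = -1;     /* stands in for addi 3,3,-4 */
    for (int n = 8; n != 0; n--) {
        ++i;              /* update ...               */
        data[i] = d;      /* ... then store: stfsu    */
    }
}
```

All three functions write the same eight elements; the only difference is
whether the address update is a separate step or folded into the store, which
is the distinction the combine dumps above are about.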
[Bug target/17108] Store with update not generated for a simple loop
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17108

Paul H. Hargrove changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |PHHargrove at lbl dot gov

--- Comment #7 from Paul H. Hargrove 2012-08-13 23:19:45 UTC ---
FWIW:

With GCC 4.8.0 20120809 the test code is fully unrolled at -O3:

.L.foo:
        stfs 1,0(3)
        stfs 1,4(3)
        stfs 1,8(3)
        stfs 1,12(3)
        stfs 1,16(3)
        stfs 1,20(3)
        stfs 1,24(3)
        stfs 1,28(3)
        blr

However, at -O2 the code shown in comment #6 is still being generated (except
with different register allocations).  This is true even when "-mupdate" is
passed to explicitly enable store/update instructions.

[Bug target/17108] Store with update not generated for a simple loop
--- Comment #6 from pinskia at gcc dot gnu dot org 2008-01-25 22:35 ---
No this is still not fixed, we get now on the trunk as of yesterday:

foo:
        li 0,8
        li 9,0
        mtctr 0
        .p2align 3,,7
.L2:
        stfsx 1,3,9
        addi 9,9,4
        bdnz .L2
        blr

In fact I think this is worse, than what was done before but oh well.  We still
don't generate the store with update instruction.

-- 

pinskia at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
   Last reconfirmed|2007-07-02 21:40:02         |2008-01-25 22:35:16
               date|                            |
            Summary|Missed opportunity for      |Store with update not
                   |strength reduction          |generated for a simple loop

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17108