[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 Richard Earnshaw changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED Target Milestone|--- |6.5 --- Comment #9 from Richard Earnshaw --- Fixed on trunk gcc-6 & gcc-7
[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 --- Comment #7 from Richard Earnshaw --- Author: rearnsha Date: Thu Oct 19 13:14:55 2017 New Revision: 253891 URL: https://gcc.gnu.org/viewcvs?rev=253891=gcc=rev Log: [ARM] PR 82445 - suppress 32-bit aligned ldrd/strd peepholing with -mno-unaligned-access Peephole patterns exist in the arm backend to spot load/store operations to adjacent memory operations in order to convert them into ldrd/strd instructions. However, when we have strict alignment enforced, then we can only do this if the accesses are known to be 64-bit aligned; this is unlikely to be the case for most loads. The patch adds some alignment checking to the code that validates the addresses for use in the peephole patterns. This should also fix incorrect generation of ldrd/strd with unaligned accesses that could previously have occurred on ARMv5e where all such operations must be 64-bit aligned. I've added some new tests as well. In doing so I discovered that the ldrd/strd peephole tests could never fail since they would match the source file name in the scanned assembly as well as any instructions of the intended type. I've fixed those by tightening the scan results slightly. gcc: * config/arm/arm.c (align_ok_ldrd_strd): New function. (mem_ok_for_ldrd_strd): New parameter align. Extract the alignment of the mem into it. (gen_operands_ldrd_strd): Validate the alignment of the accesses. testsuite: * gcc.target/arm/peep-ldrd-1.c: Tighten test scan pattern. * gcc.target/arm/peep-strd-1.c: Likewise. * gcc.target/arm/peep-ldrd-2.c: New test. * gcc.target/arm/peep-strd-2.c: New test. Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/peep-ldrd-2.c branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/peep-strd-2.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/arm/arm.c branches/gcc-7-branch/gcc/testsuite/ChangeLog branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c branches/gcc-7-branch/gcc/testsuite/gcc.target/arm/peep-strd-1.c
[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 --- Comment #8 from Richard Earnshaw --- Author: rearnsha Date: Thu Oct 19 13:16:42 2017 New Revision: 253892 URL: https://gcc.gnu.org/viewcvs?rev=253892=gcc=rev Log: [ARM] PR 82445 - suppress 32-bit aligned ldrd/strd peepholing with -mno-unaligned-access Peephole patterns exist in the arm backend to spot load/store operations to adjacent memory operations in order to convert them into ldrd/strd instructions. However, when we have strict alignment enforced, then we can only do this if the accesses are known to be 64-bit aligned; this is unlikely to be the case for most loads. The patch adds some alignment checking to the code that validates the addresses for use in the peephole patterns. This should also fix incorrect generation of ldrd/strd with unaligned accesses that could previously have occurred on ARMv5e where all such operations must be 64-bit aligned. I've added some new tests as well. In doing so I discovered that the ldrd/strd peephole tests could never fail since they would match the source file name in the scanned assembly as well as any instructions of the intended type. I've fixed those by tightening the scan results slightly. gcc: * config/arm/arm.c (align_ok_ldrd_strd): New function. (mem_ok_for_ldrd_strd): New parameter align. Extract the alignment of the mem into it. (gen_operands_ldrd_strd): Validate the alignment of the accesses. testsuite: * gcc.target/arm/peep-ldrd-1.c: Tighten test scan pattern. * gcc.target/arm/peep-strd-1.c: Likewise. * gcc.target/arm/peep-ldrd-2.c: New test. * gcc.target/arm/peep-strd-2.c: New test. Added: branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/peep-ldrd-2.c branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/peep-strd-2.c Modified: branches/gcc-6-branch/gcc/ChangeLog branches/gcc-6-branch/gcc/config/arm/arm.c branches/gcc-6-branch/gcc/testsuite/ChangeLog branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c branches/gcc-6-branch/gcc/testsuite/gcc.target/arm/peep-strd-1.c
[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 --- Comment #6 from Richard Earnshaw --- Author: rearnsha Date: Thu Oct 19 13:10:42 2017 New Revision: 253890 URL: https://gcc.gnu.org/viewcvs?rev=253890=gcc=rev Log: [ARM] PR 82445 - suppress 32-bit aligned ldrd/strd peepholing with -mno-unaligned-access Peephole patterns exist in the arm backend to spot load/store operations to adjacent memory operations in order to convert them into ldrd/strd instructions. However, when we have strict alignment enforced, then we can only do this if the accesses are known to be 64-bit aligned; this is unlikely to be the case for most loads. The patch adds some alignment checking to the code that validates the addresses for use in the peephole patterns. This should also fix incorrect generation of ldrd/strd with unaligned accesses that could previously have occurred on ARMv5e where all such operations must be 64-bit aligned. I've added some new tests as well. In doing so I discovered that the ldrd/strd peephole tests could never fail since they would match the source file name in the scanned assembly as well as any instructions of the intended type. I've fixed those by tightening the scan results slightly. gcc: * config/arm/arm.c (align_ok_ldrd_strd): New function. (mem_ok_for_ldrd_strd): New parameter align. Extract the alignment of the mem into it. (gen_operands_ldrd_strd): Validate the alignment of the accesses. testsuite: * gcc.target/arm/peep-ldrd-1.c: Tighten test scan pattern. * gcc.target/arm/peep-strd-1.c: Likewise. * gcc.target/arm/peep-ldrd-2.c: New test. * gcc.target/arm/peep-strd-2.c: New test. Added: trunk/gcc/testsuite/gcc.target/arm/peep-ldrd-2.c trunk/gcc/testsuite/gcc.target/arm/peep-strd-2.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/arm/peep-ldrd-1.c trunk/gcc/testsuite/gcc.target/arm/peep-strd-1.c
[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 Petr Cvek changed: What|Removed |Added CC||petrcvekcz at gmail dot com --- Comment #5 from Petr Cvek --- Get the same bug with: gcc version 7.1.0 (crosstool-NG crosstool-ng-1.23.0-88-gfae66ae3) Bug free version: gcc version 6.3.0 (crosstool-NG crosstool-ng-1.22.0-452-gc7b1e295) Testing program: just main() with "helloworld" calling the example code (strd.c) compiled as object file. march/mtune combinations which are OK: -march=armv5te -mtune=xscale -march=armv5te -mtune=iwmmxt -march=iwmmxt -march=armv5t (actually less opcodes than variants above) -march=armv5 march/mtune combinations which generate an unaligned access in STRD (even with -mno-unaligned-access): -march=armv5te (tested on a real HW, generating alignment exceptions in /proc/cpu/alignment) -march=armv5tej (not existing on gcc 6.3.0) -march=armv5te -mtune=arm1020e -march=armv5te -mtune=arm926ej-s
[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 Richard Earnshaw changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |rearnsha at gcc dot gnu.org
[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 --- Comment #4 from Richard Earnshaw --- looks like gen_operands_ldrd_strd should be checking for this and failing if the alignment is not suitable for the target architecture.
[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 Richard Biener changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2017-10-06 Ever confirmed|0 |1 --- Comment #3 from Richard Biener --- To answer myself, yes, -fno-store-merging fixes it. RTL expansion is ok: ;; MEM[(void *)unknown_glyph.0_1 + 4B] = 1048584; (insn 7 6 8 (set (reg:SI 112) (const_int 1048584 [0x18])) "strd.c":50 -1 (nil)) (insn 8 7 0 (set (mem:SI (plus:SI (reg/f:SI 110 [ unknown_glyph.0_1 ]) (const_int 4 [0x4])) [0 MEM[(void *)unknown_glyph.0_1 + 4B]+0 S4 A32]) (reg:SI 112)) "strd.c":50 -1 (nil)) ;; MEM[(void *)unknown_glyph.0_1 + 8B] = 4294770688; (insn 9 8 10 (set (reg:SI 113) (const_int -196608 [0xfffd])) "strd.c":51 -1 (nil)) (insn 10 9 0 (set (mem:SI (plus:SI (reg/f:SI 110 [ unknown_glyph.0_1 ]) (const_int 8 [0x8])) [0 MEM[(void *)unknown_glyph.0_1 + 8B]+0 S4 A32]) (reg:SI 113)) "strd.c":51 -1 (nil)) ;; MEM[(void *)unknown_glyph.0_1 + 12B] = 8; (insn 11 10 12 (set (reg:SI 115) (const_int 8 [0x8])) "strd.c":53 -1 (nil)) (insn 12 11 13 (set (reg:HI 114) (subreg:HI (reg:SI 115) 0)) "strd.c":53 -1 (expr_list:REG_EQUAL (const_int 8 [0x8]) (nil))) (insn 13 12 0 (set (mem:HI (plus:SI (reg/f:SI 110 [ unknown_glyph.0_1 ]) (const_int 12 [0xc])) [0 MEM[(void *)unknown_glyph.0_1 + 12B]+0 S2 A32]) (reg:HI 114)) "strd.c":53 -1 (nil)) The DI move is introduced by peephole2: Splitting with gen_peephole2_12 which is ldrdstrd.md: (define_peephole2 ; strd [(set (match_operand:SI 2 "memory_operand" "") (match_operand:SI 0 "arm_general_register_operand" "")) (set (match_operand:SI 3 "memory_operand" "") (match_operand:SI 1 "arm_general_register_operand" ""))] "TARGET_LDRD" [(const_int 0)] { if (!gen_operands_ldrd_strd (operands, false, false, false)) FAIL; else if (TARGET_ARM) { /* In ARM state, the destination registers of LDRD/STRD must be consecutive. We emit DImode access. */ operands[0] = gen_rtx_REG (DImode, REGNO (operands[0])); operands[2] = adjust_address (operands[2], DImode, 0); /* Emit [(set (match_dup 2) (match_dup 0))] */ emit_insn (gen_rtx_SET (operands[2], operands[0])); DONE; } else if (TARGET_THUMB2) { /* Emit the pattern: [(parallel [(set (match_dup 2) (match_dup 0)) (set (match_dup 3) (match_dup 1))])] */ rtx t1 = gen_rtx_SET (operands[2], operands[0]); rtx t2 = gen_rtx_SET (operands[3], operands[1]); emit_insn (gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, t1, t2))); DONE; } }) but this doesn't at all consider alignment of the MEMs it merges.
[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 --- Comment #2 from Alexander Graf --- (In reply to Richard Biener from comment #1) > Does -fno-store-merging fix it? Yes, but it generates worse code than -march=armv5 (which does not support STRD) does: -march=armv6 -fno-store-merging: > : >0: e59f3028ldr r3, [pc, #40] ; 30>4: e3a02008mov r2, #8 >8: e3a0c010mov ip, #16 >c: e3a0mov r0, #0 > 10: e3e01002mvn r1, #2 > 14: e5933000ldr r3, [r3] > 18: e1c3c0b6strhip, [r3, #6] > 1c: e1c300b8strhr0, [r3, #8] > 20: e1c310bastrhr1, [r3, #10] > 24: e1c320b4strhr2, [r3, #4] > 28: e1c320bcstrhr2, [r3, #12] > 2c: e12fff1ebx lr > 30: .word 0x -march=armv5: > : >0: e59f3018ldr r3, [pc, #24] ; 20 >4: e3a02008mov r2, #8 >8: e59f0014ldr r0, [pc, #20] ; 24 >c: e59f1014ldr r1, [pc, #20] ; 28 > 10: e5933000ldr r3, [r3] > 14: e9830003stmib r3, {r0, r1} > 18: e1c320bcstrhr2, [r3, #12] > 1c: e12fff1ebx lr > 20: .word 0x > 24: 0018.word 0x0018 > 28: fffd.word 0xfffd
[Bug target/82445] ARM target generates unaligned STRD instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82445 Richard Biener changed: What|Removed |Added Keywords||wrong-code Target|ARM |arm --- Comment #1 from Richard Biener --- Does -fno-store-merging fix it? store-merging currently does unsigned HOST_WIDE_INT align = split_store->align; tree offset_type = get_alias_type_for_stmts (split_store->orig_stmts); location_t loc = get_location_for_stmts (split_store->orig_stmts); tree int_type = build_nonstandard_integer_type (try_size, UNSIGNED); int_type = build_aligned_type (int_type, align); tree dest = fold_build2 (MEM_REF, int_type, addr, build_int_cst (offset_type, try_pos)); and expects the target to cope with this. But IIRC RTL expansion only ever "honors" this if either it has a movmisalign optab for the mode or it is SLOW_UNALIGNED_ACCESS (ok, arm seems to be SLOW_UNALIGNED_ACCESS). So this means we go the bitfield store way (in the end this means store-merging lacks proper costing for strict-align targets). That may end up using all sorts of patterns, like insv and friends. Not sure what happens in the end.