[Bug c++/97399] g++ 9.3 cannot compile SFINAE code with separated declaration and definition, g++ 7.3 compiles
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97399 --- Comment #1 from Renlin Li --- Created attachment 49363 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49363&action=edit test case 2
[Bug c++/97399] New: g++ 9.3 cannot compile SFINAE code with separated declaration and definition, g++ 7.3 compiles
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97399

Bug ID: 97399
Summary: g++ 9.3 cannot compile SFINAE code with separated declaration and definition, g++ 7.3 compiles
Product: gcc
Version: 9.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: renlin at gcc dot gnu.org
Target Milestone: ---

Created attachment 49362
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49362&action=edit
test case 1

For gcc_1.c++:
gcc 7.3 compiles this code.
clang 7 compiles this code.
gcc 9.3 fails to compile it with the following message.
I am not sure whether this is an issue in gcc or in clang.

```
:29:16: error: no declaration matches 'constexpr enable_if_t<((tmp*)this)->is_integral(), bool> tmp::func(E, E) const'
   29 | constexpr auto tmp::func(E f_lhs, E f_rhs)
      |                ^~~
:18:27: note: candidate is: 'template static constexpr enable_if_t<((tmp*)this)->is_integral(), bool> tmp::func(E, E)'
   18 |     static constexpr auto func(E f_lhs, E f_rhs)
      |                           ^~~~
:12:8: note: 'struct tmp' defined here
   12 | struct tmp
```

Meanwhile, for gcc_2.c++, gcc compiles without any issue, while clang gives the following error message:

```
:27:28: error: template parameter redefines default argument
template (), bool>>
^
:17:32: note: previous default template argument defined here
template (), bool>>
```

This does not seem to be a new issue, and the report might be a duplicate.
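The attachments are not reproduced in this message, so the following is only a minimal sketch of the pattern under discussion, not the attached test case: a static member function template whose SFINAE condition sits in the return type, declared inside the class and defined outside it. The names `tmp`, `func`, `f_lhs` and `f_rhs` are borrowed from the diagnostic above. Using `std::is_integral<E>::value` instead of a member `is_integral()` call is an assumption, chosen so that the signature does not involve `this`; whether that sidesteps the gcc 9.3 failure has not been verified here.

```cpp
#include <type_traits>

struct tmp
{
    // Declaration: participates in overload resolution only for integral E.
    template <typename E>
    static constexpr auto func(E f_lhs, E f_rhs)
        -> std::enable_if_t<std::is_integral<E>::value, bool>;
};

// Out-of-class definition: the return type is spelled exactly as in the
// declaration so that the two can be matched against each other.
template <typename E>
constexpr auto tmp::func(E f_lhs, E f_rhs)
    -> std::enable_if_t<std::is_integral<E>::value, bool>
{
    return f_lhs == f_rhs;
}
```

Calling `tmp::func(1, 1)` selects the template with `E = int`; a call with two `double` arguments would be rejected because the `enable_if_t` condition is false.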
[Bug middle-end/84877] Local stack copy of BLKmode parameter on the stack is not aligned when the requested alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Renlin Li ---
Mark it as fixed.
[Bug middle-end/84877] Local stack copy of BLKmode parameter on the stack is not aligned when the requested alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877

--- Comment #4 from Renlin Li ---
Author: renlin
Date: Wed Nov 21 14:29:19 2018
New Revision: 266345

URL: https://gcc.gnu.org/viewcvs?rev=266345&root=gcc&view=rev
Log:
[PATCH][PR84877]Dynamically align the address for local parameter copy on the stack when required alignment is larger than MAX_SUPPORTED_STACK_ALIGNMENT

As described in PR84877 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877), the local copy of a parameter on the stack is not aligned.

For BLKmode parameters, a local copy is saved on the stack. There are three cases:
1) arguments passed partially on the stack, partially via registers.
2) arguments passed fully on the stack.
3) arguments passed via registers.

After the change here, in all three cases, the stack slot for the local parameter copy is aligned according to the data type. The stack slot is the DECL_RTL of the parameter. All later references in the function refer to this RTL.

To populate the local copy on the stack:
For cases 1) and 2), there are operations to move data from the caller's stack (from the incoming rtl) into the callee's stack.
For case 3), the registers are saved directly into the stack slot.

In all cases, the destination address is properly aligned. But for cases 1) and 2), the source address is not aligned according to the type. How the arguments are prepared is defined by the PCS. The block move operation is carried out by emit_block_move (). As far as I can see, it will use the smaller of the source and destination alignments. This looks fine as long as we don't use instructions which require a strictly larger alignment than the address actually has.

The change here only affects receiving parameters. The function assign_stack_local_1 is called in various places. Usually, the caller will constrain the ALIGN parameter, for example via the STACK_SLOT_ALIGNMENT macro. assign_parm_setup_block will call assign_stack_local () with the alignment from the parameter type, which in this case could be larger than MAX_SUPPORTED_STACK_ALIGNMENT.

The alignment operation for the parameter copy on the stack is similar to that for stack variables. First, enough space is reserved on the stack. The size is fixed at compile time. Instructions are emitted to dynamically get an aligned address at runtime within this piece of memory.

This will unavoidably increase stack usage. However, the cost depends on how many over-aligned parameters are passed by value.

gcc/

2018-11-21  Renlin Li

    PR middle-end/84877
    * explow.h (get_dynamic_stack_size): Declare it as external.
    * explow.c (record_new_stack_level): Remove function static attribute.
    * function.c (assign_stack_local_1): Dynamically align the stack slot
    addr for parameter copy on the stack.

gcc/testsuite/

2018-11-21  Renlin Li

    PR middle-end/84877
    * gcc.dg/pr84877.c: New.

Added:
    trunk/gcc/testsuite/gcc.dg/pr84877.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/explow.c
    trunk/gcc/explow.h
    trunk/gcc/function.c
    trunk/gcc/testsuite/ChangeLog
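The "reserve enough space, then compute an aligned address at runtime" step described in the log can be sketched in isolation. This is not the code added to function.c; `align_up`, `reserved_size` and `aligned_slot` are hypothetical helpers, and the sketch assumes the alignment is a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Round addr up to the next multiple of align (align must be a power of two).
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::uintptr_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

// Number of bytes to reserve so that an aligned region of `size` bytes always
// fits, whatever the start address turns out to be at runtime.
constexpr std::size_t reserved_size(std::size_t size, std::size_t align)
{
    return size + align - 1;
}

// Pick the aligned address inside a buffer of reserved_size (size, align) bytes.
inline unsigned char *aligned_slot(unsigned char *buf, std::size_t align)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(buf);
    return buf + (align_up(addr, align) - addr);
}
```

The extra `align - 1` bytes per slot are the additional stack usage the log refers to: for an 8-byte struct with 16-byte alignment, 23 bytes are reserved under this scheme.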
[Bug target/87815] ICE in DSE with -march=armv8-a+sve while trying to replace load with previously stored value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87815

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Renlin Li ---
Fixed by r266033.
[Bug target/87815] ICE in DSE with -march=armv8-a+sve while trying to replace load with previously stored value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87815

--- Comment #1 from Renlin Li ---
Author: renlin
Date: Mon Nov 12 16:47:24 2018
New Revision: 266033

URL: https://gcc.gnu.org/viewcvs?rev=266033&root=gcc&view=rev
Log:
[PR87815]Don't generate shift sequence for load replacement in DSE when the mode size is not compile-time constant

The patch adds a check that the gap is a compile-time constant.

This matters when dse decides to replace a load with the previously stored value. The problem is that the shift sequence cannot accept a mode operand whose size is not a compile-time constant.

gcc/

2018-11-12  Renlin Li

    PR target/87815
    * dse.c (get_stored_val): Add check for compile-time
    constantness of gap.

gcc/testsuite/

2018-11-12  Renlin Li

    PR target/87815
    * gcc.target/aarch64/sve/pr87815.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/aarch64/sve/pr87815.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/dse.c
    trunk/gcc/testsuite/ChangeLog
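The shape of such a guard can be illustrated with a toy model. The `PolySize` type below is hypothetical and only mimics the idea of a size of the form `base + per_vector * N`, with `N` unknown at compile time; it is not GCC's `poly_int`, and `try_shift_bits` is not the code in dse.c.

```cpp
// Toy stand-in for a size that may depend on a runtime vector length:
// value = base + per_vector * N, where N is unknown at compile time.
struct PolySize
{
    long base;
    long per_vector;

    // True when the value does not depend on N.
    bool is_constant() const { return per_vector == 0; }
};

// Compute the shift amount in bits for a byte gap. Returns false, leaving
// *bits untouched, when the gap is not a compile-time constant, in which
// case the caller should give up on the load replacement.
inline bool try_shift_bits(PolySize gap, long *bits)
{
    if (!gap.is_constant())
        return false;
    *bits = gap.base * 8;
    return true;
}
```

A constant gap of 2 bytes yields a 16-bit shift, while a gap that scales with the vector length is rejected before any shift sequence is built.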
[Bug middle-end/87899] [9 regression]r264897 cause mis-compiled native arm-linux-gnueabihf toolchain
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899 --- Comment #6 from Renlin Li --- Created attachment 44975 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44975&action=edit IRA dump
[Bug middle-end/87899] [9 regression]r264897 cause mis-compiled native arm-linux-gnueabihf toolchain
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899

--- Comment #5 from Renlin Li ---
Created attachment 44974
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44974&action=edit
IRA dump

The code you want to check is the following in the ira pass:

    insn 10905: r1 = r2040
    insn 208:   use and update r1 with pre_modify
    insn 191:   use pseudo r2040
[Bug middle-end/87899] [9 regression]r264897 cause mis-compiled native arm-linux-gnueabihf toolchain
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899

--- Comment #3 from Renlin Li ---
(In reply to Renlin Li from comment #1)
> In tree-loop-distribution.c, in the distribute_loop function, I got the
> following code snippets.
>
>  30386: 0103cff4     4 OBJECT  LOCAL  DEFAULT   25 _ZL23bb_top_order_index_s
>  30387: 0103cff8     4 OBJECT  LOCAL  DEFAULT   25 _ZL18bb_top_order_index
>  30388: 0103cffc     4 OBJECT  LOCAL  DEFAULT   25 _ZL10ddrs_table
>  30389: 0103d000     4 OBJECT  LOCAL  DEFAULT   25 _ZL9loop_nest
>  30390: 0103d004     4 OBJECT  LOCAL  DEFAULT   25 _ZL12datarefs_vec
>
> r1 = 0x103cff4, which points to the local anchor area.
> r4 is the dynamically allocated hash_table pointer which is supposed to be
> stored into ddrs_table, i.e. 0103cffc.
>
>    0x61a346 , control_dependences*, int*, bool*)+90>:  strb  r7, [r2, #0]
>    0x61a348 , control_dependences*, int*, bool*)+92>:  str.w r7, [r8]
> 1=>0x61a34c , control_dependences*, int*, bool*)+96>:  str.w r7, [r1, #12]!
>    0x61a350 , control_dependences*, int*, bool*)+100>: mov   r5, r1
> 2=>0x61a352 , control_dependences*, int*, bool*)+102>: str   r4, [r1, #8]
>    0x61a354 , control_dependences*, int*, bool*)+104>: str   r0, [r4, #0]
>    0x61a356 , control_dependences*, int*, bool*)+106>: mov   r0, r9
>
> However, r1 is changed by the previous pre-indexed store at 0x61a34c (marked
> as 1).
> This makes the later store put the pointer in the wrong position.
> Later, when accessing ddrs_table, it got a null pointer, eventually resulting
> in the ICE observed here.
>
> The full assembly is attached.

Before the change:

    0x0061a746 <+26>:   bl    0xc86134
    0x0061a74a <+30>:   movw  r2, #57316 ; 0xdfe4
    0x0061a74e <+34>:   movt  r2, #259 ; 0x103
    0x0061a752 <+38>:   str   r2, [sp, #28]
    0x0061a754 <+40>:   mov   r4, r0
    0x0061a756 <+42>:   movw  r0, #389 ; 0x185
    0x0061a75a <+46>:   str   r7, [r4, #8]
    0x0061a75c <+48>:   str   r7, [r4, #12]
    0x0061a75e <+50>:   strd  r7, r7, [r4, #16]
    0x0061a762 <+54>:   strh  r7, [r4, #28]
    0x0061a764 <+56>:   bl    0xc2bc50
    0x0061a768 <+60>:   movw  r3, #8452 ; 0x2104
    0x0061a76c <+64>:   movt  r3, #242 ; 0xf2
    0x0061a770 <+68>:   lsls  r2, r0, #4
    0x0061a772 <+70>:   mov   r5, r0
    0x0061a774 <+72>:   mov   r0, r4
    0x0061a776 <+74>:   ldr   r6, [r3, r2]
    0x0061a778 <+76>:   mov   r1, r6
    0x0061a77a <+78>:   bl    0x61d1b4 ::alloc_entries(unsigned int) const>
    0x0061a77e <+82>:   ldr.w r12, [sp, #28]
    0x0061a782 <+86>:   ldr   r2, [sp, #296] ; 0x128
    0x0061a784 <+88>:   str   r5, [r4, #24]
    0x0061a786 <+90>:   mov   r1, r12
    0x0061a788 <+92>:   str   r6, [r4, #4]
    0x0061a78a <+94>:   strb  r7, [r2, #0]
    0x0061a78c <+96>:   mov   r5, r12
    0x0061a78e <+98>:   str.w r7, [r8]
    0x0061a792 <+102>:  str.w r7, [r1, #12]!
    0x0061a796 <+106>:  str.w r4, [r12, #8]

We can see that r4 is stored to [r12+8], which does not use the updated r1 above.
[Bug middle-end/87899] [9 regression]r264897 cause mis-compiled native arm-linux-gnueabihf toolchain
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899 --- Comment #2 from Renlin Li --- Created attachment 44965 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44965&action=edit disassembly of distribute_loop disassembly of wrongly compiled distribute_loop function
[Bug middle-end/87899] [9 regression]r264897 cause mis-compiled native arm-linux-gnueabihf toolchain
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899

--- Comment #1 from Renlin Li ---
In tree-loop-distribution.c, in the distribute_loop function, I got the following code snippets.

     30386: 0103cff4     4 OBJECT  LOCAL  DEFAULT   25 _ZL23bb_top_order_index_s
     30387: 0103cff8     4 OBJECT  LOCAL  DEFAULT   25 _ZL18bb_top_order_index
     30388: 0103cffc     4 OBJECT  LOCAL  DEFAULT   25 _ZL10ddrs_table
     30389: 0103d000     4 OBJECT  LOCAL  DEFAULT   25 _ZL9loop_nest
     30390: 0103d004     4 OBJECT  LOCAL  DEFAULT   25 _ZL12datarefs_vec

r1 = 0x103cff4, which points to the local anchor area.
r4 is the dynamically allocated hash_table pointer which is supposed to be stored into ddrs_table, i.e. 0103cffc.

       0x61a346 , control_dependences*, int*, bool*)+90>:  strb  r7, [r2, #0]
       0x61a348 , control_dependences*, int*, bool*)+92>:  str.w r7, [r8]
    1=>0x61a34c , control_dependences*, int*, bool*)+96>:  str.w r7, [r1, #12]!
       0x61a350 , control_dependences*, int*, bool*)+100>: mov   r5, r1
    2=>0x61a352 , control_dependences*, int*, bool*)+102>: str   r4, [r1, #8]
       0x61a354 , control_dependences*, int*, bool*)+104>: str   r0, [r4, #0]
       0x61a356 , control_dependences*, int*, bool*)+106>: mov   r0, r9

However, r1 is changed by the previous pre-indexed store at 0x61a34c (marked as 1).
This makes the later store (marked as 2) put the pointer in the wrong position.
Later, when accessing ddrs_table, it got a null pointer, eventually resulting in the ICE observed here.

The full assembly is attached.
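The address arithmetic behind this can be checked with a small model. The sketch below only simulates the two addressing modes on plain integers; `Reg`, `store_pre_indexed` and `store_offset` are hypothetical names, and the addresses used with it are the ones quoted above.

```cpp
#include <cstdint>

// A register holding a 32-bit address.
using Reg = std::uint32_t;

// Models `str rX, [rn, #imm]!`: the base register is updated first, and the
// store goes to the updated address, which is returned.
inline std::uint32_t store_pre_indexed(Reg &rn, std::uint32_t imm)
{
    rn += imm;
    return rn;
}

// Models `str rX, [rn, #imm]`: the base register is left unchanged and the
// store goes to rn + imm.
inline std::uint32_t store_offset(const Reg &rn, std::uint32_t imm)
{
    return rn + imm;
}
```

With r1 = 0x103cff4, the intended target r1 + 8 is 0x103cffc (ddrs_table). After the pre-indexed store with #12, r1 becomes 0x103d000, so the following store with offset #8 lands at 0x103d008 instead.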
[Bug middle-end/87899] New: [9 regression]r264897 cause mis-compiled native arm-linux-gnueabihf toolchain
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87899

Bug ID: 87899
Summary: [9 regression]r264897 cause mis-compiled native arm-linux-gnueabihf toolchain
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: renlin at gcc dot gnu.org
Target Milestone: ---

Since r264897, the native arm-linux-gnueabihf toolchain has been mis-compiled.
Somehow, it survives bootstrap.
It ICEs when compiling a lot of test cases. They fail with similar messages.

For example:

./gcc/cc1 ~/gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c -O3
 test main
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> Streaming LTO
Assembling functions:
 test
during GIMPLE pass: ldist
gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c: In function ‘test’:
gcc/./gcc/testsuite/gcc.c-torture/execute/pr36034-1.c:9:1: internal compiler error: Segmentation fault
    9 | test (void)
      | ^~~~
0x5c3a37 crash_signal
    ../../gcc/gcc/toplev.c:325
0x63ef6b inchash::hash::add(void const*, unsigned int)
    ../../gcc/gcc/inchash.h:100
0x63ef6b inchash::hash::add_ptr(void const*)
    ../../gcc/gcc/inchash.h:94
0x63ef6b ddr_hasher::hash(data_dependence_relation const*)
    ../../gcc/gcc/tree-loop-distribution.c:143
0x63ef6b hash_table::find_slot(data_dependence_relation* const&, insert_option)
    ../../gcc/gcc/hash-table.h:414
0x63ef6b get_data_dependence
    ../../gcc/gcc/tree-loop-distribution.c:1184
0x63f2bd data_dep_in_cycle_p
    ../../gcc/gcc/tree-loop-distribution.c:1210
0x63f2bd update_type_for_merge
    ../../gcc/gcc/tree-loop-distribution.c:1255
0x64064b build_rdg_partition_for_vertex
    ../../gcc/gcc/tree-loop-distribution.c:1302
0x64064b rdg_build_partitions
    ../../gcc/gcc/tree-loop-distribution.c:1754
0x64064b distribute_loop
    ../../gcc/gcc/tree-loop-distribution.c:2795
0x642299 execute
    ../../gcc/gcc/tree-loop-distribution.c:3133
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[Bug target/87815] ICE in DSE with -march=armv8-a+sve while trying to replace load with previously stored value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87815

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |ice-on-valid-code
             Target|                            |aarch64-none-elf
            Version|8.0                         |9.0
   Target Milestone|---                         |9.0
      Known to fail|                            |9.0
[Bug target/87815] ICE in DSE with -march=armv8-a+sve while trying to replace load with previously stored value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87815

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2018-10-30
     Ever confirmed|0                           |1
[Bug target/87815] New: ICE in DSE with -march=armv8-a+sve while trying to replace load with previously stored value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87815

Bug ID: 87815
Summary: ICE in DSE with -march=armv8-a+sve while trying to replace load with previously stored value
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: renlin at gcc dot gnu.org
Target Milestone: ---

The following test case ICEs with -march=armv8.2-a+sve at both -O3 and -Ofast:

    int a, b, d;
    short e;
    void f() {
      for (int i = 0; i < 8; i++) {
        e = b >= 2 ?: a >> b;
        d = e && b;
      }
    }

test.c: In function 'f':
test.c:8:1: internal compiler error: in smallest_mode_for_size, at stor-layout.c:355
    8 | }
      | ^
0x1048b4a smallest_mode_for_size(poly_int<2u, unsigned long>, mode_class)
    src/gcc/gcc/stor-layout.c:355
0xa1a14e smallest_int_mode_for_size(poly_int<2u, unsigned long>)
    src/gcc/gcc/machmode.h:838
0x1a93f86 find_shift_sequence
    src/gcc/gcc/dse.c:1704
0x1a9497b get_stored_val
    src/gcc/gcc/dse.c:1850
0x1a94dae replace_read
    src/gcc/gcc/dse.c:1955
0x1a958db check_mem_read_rtx
    src/gcc/gcc/dse.c:2187
0x1a95dfc check_mem_read_use
    src/gcc/gcc/dse.c:2293
0xfd0fd9 note_uses(rtx_def**, void (*)(rtx_def**, void*), void*)
    src/gcc/gcc/rtlanal.c:2005
0x1a9660d scan_insn
    src/gcc/gcc/dse.c:2401
0x1a972f3 dse_step1
    src/gcc/gcc/dse.c:2659
0x1a9968b rest_of_handle_dse
    src/gcc/gcc/dse.c:3576
0x1a9981e execute
    src/gcc/gcc/dse.c:3634
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[Bug target/87563] [9 regression ] ICE with -march=armv8-a+sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87563

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Renlin Li ---
Fix committed as r265172. Closing it.
[Bug target/87563] [9 regression ] ICE with -march=armv8-a+sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87563

--- Comment #4 from Renlin Li ---
Author: renlin
Date: Mon Oct 15 16:49:05 2018
New Revision: 265172

URL: https://gcc.gnu.org/viewcvs?rev=265172&root=gcc&view=rev
Log:
[PR87563][AARCH64-SVE]: Don't keep ifcvt loop when COND_ ifn could not be vectorized.

ifcvt will create a versioned loop and will permissively generate scalar COND_ ifns in it.

If, in the loop vectorize pass, a COND_ ifn cannot be vectorized, the if-converted loop should be abandoned when the target doesn't support such an ifn.

gcc/

2018-10-12  Renlin Li

    PR target/87563
    * tree-vectorizer.c (try_vectorize_loop_1): Don't use
    if-conversioned loop when it contains ifn with types not
    supported by backend.
    * internal-fn.c (expand_direct_optab_fn): Add an assert.
    (direct_internal_fn_supported_p): New helper function.
    * internal-fn.h (direct_internal_fn_supported_p): Declare.

gcc/testsuite/

2018-10-12  Renlin Li

    PR target/87563
    * gcc.target/aarch64/sve/pr87563.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/aarch64/sve/pr87563.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/internal-fn.c
    trunk/gcc/internal-fn.h
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vectorizer.c
[Bug tree-optimization/87562] [9 Regression] ICE in in linemap_position_for_line_and_column, at libcpp/line-map.c:848
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87562

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |renlin at gcc dot gnu.org

--- Comment #2 from Renlin Li ---
(In reply to David Malcolm from comment #1)
> linemap_position_for_line_and_column(line_maps*, line_map_ordinary const*,
> unsigned int, unsigned int) at libcpp/line-map.c:848
> is:
>   linemap_assert (ORDINARY_MAP_STARTING_LINE_NUMBER (ord_map) <= line);
>
> I wonder if I introduced this in r264887 with the changes to input.c
> (macro-handling and concatenated strings), which touched the function in the
> next frame.
>
> I'll see if I can reproduce it.

Hi David,

I checked, and the ICE does start with r264887.
[Bug target/87563] [9 regression ] ICE with -march=armv8-a+sve
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87563

Renlin Li changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
                 CC|                              |renlin at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org |renlin at gcc dot gnu.org
[Bug middle-end/84877] New: Local stack copy of BLKmode parameter on the stack is not aligned when the requested alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877

Bug ID: 84877
Summary: Local stack copy of BLKmode parameter on the stack is not aligned when the requested alignment exceeds MAX_SUPPORTED_STACK_ALIGNMENT
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: renlin at gcc dot gnu.org
Target Milestone: ---

For a test case like this,

    #include <stdint.h>

    struct U {
      uint32_t M0;
      uint32_t M1;
    } __attribute((aligned(16)));

    void tmp (struct U *);

    void foo(struct U P0)
    {
      struct U P1 = P0;
      tmp (&P1);
    }

    void bar(struct U P0)
    {
      tmp (&P0);
    }

The required alignment of a BLKmode parameter is truncated to MAX_SUPPORTED_STACK_ALIGNMENT when it exceeds that limit.

On the other hand, the compiler will try to dynamically align the stack slot for a local variable.

For example, with the arm-gcc toolchain, the function foo () will pass a 16-byte aligned address to tmp (). However, P0 is temporarily stored on the stack at an unaligned address.

Function bar () will pass an unaligned address, which is the address of the local stack copy of P0.

A warning could be emitted when the alignment cannot be fulfilled, or the slot could be dynamically aligned, though that will waste stack space.
[Bug target/83370] [AARCH64]Tailcall register may be corrupted by epilogue code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83370

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Renlin Li ---
(In reply to Richard Earnshaw from comment #3)
> Doesn't this need backporting?

Yes, it is needed. The same problem happens in gcc-6 and gcc-7.
The backport has been approved and committed now.
[Bug target/83370] [AARCH64]Tailcall register may be corrupted by epilogue code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83370

--- Comment #5 from Renlin Li ---
Author: renlin
Date: Thu Feb 1 21:33:05 2018
New Revision: 257315

URL: https://gcc.gnu.org/viewcvs?rev=257315&root=gcc&view=rev
Log:
[PR83370][AARCH64]Use tighter register constraint for sibcall patterns.

gcc/

backport from mainline
2018-02-01  Renlin Li

    PR target/83370
    * config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
    TAILCALL_ADDR_REGS.
    (aarch64_register_move_cost): Likewise.
    * config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to
    TAILCALL_ADDR_REGS.
    (REG_CLASS_NAMES): Likewise.
    (REG_CLASS_CONTENTS): Rename CALLER_SAVE_REGS to
    TAILCALL_ADDR_REGS. Remove IP registers.
    * config/aarch64/aarch64.md (Ucs): Update register constraint.

gcc/testsuite/

backport from mainline
2018-02-01  Richard Sandiford

    PR target/83370
    * gcc.target/aarch64/pr83370.c: New.

Added:
    branches/gcc-6-branch/gcc/testsuite/gcc.target/aarch64/pr83370.c
Modified:
    branches/gcc-6-branch/gcc/ChangeLog
    branches/gcc-6-branch/gcc/config/aarch64/aarch64.c
    branches/gcc-6-branch/gcc/config/aarch64/aarch64.h
    branches/gcc-6-branch/gcc/config/aarch64/constraints.md
    branches/gcc-6-branch/gcc/testsuite/ChangeLog
[Bug target/83370] [AARCH64]Tailcall register may be corrupted by epilogue code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83370

--- Comment #4 from Renlin Li ---
Author: renlin
Date: Thu Feb 1 21:09:06 2018
New Revision: 257314

URL: https://gcc.gnu.org/viewcvs?rev=257314&root=gcc&view=rev
Log:
[PR83370][AARCH64]Use tighter register constraint for sibcall patterns.

gcc/

backport from mainline
2018-02-01  Renlin Li

    PR target/83370
    * config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
    TAILCALL_ADDR_REGS.
    (aarch64_register_move_cost): Likewise.
    * config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to
    TAILCALL_ADDR_REGS.
    (REG_CLASS_NAMES): Likewise.
    (REG_CLASS_CONTENTS): Rename CALLER_SAVE_REGS to
    TAILCALL_ADDR_REGS. Remove IP registers.
    * config/aarch64/aarch64.md (Ucs): Update register constraint.

gcc/testsuite/

backport from mainline
2018-02-01  Richard Sandiford

    PR target/83370
    * gcc.target/aarch64/pr83370.c: New.

Added:
    branches/gcc-7-branch/gcc/testsuite/gcc.target/aarch64/pr83370.c
Modified:
    branches/gcc-7-branch/gcc/ChangeLog
    branches/gcc-7-branch/gcc/config/aarch64/aarch64.c
    branches/gcc-7-branch/gcc/config/aarch64/aarch64.h
    branches/gcc-7-branch/gcc/config/aarch64/constraints.md
    branches/gcc-7-branch/gcc/testsuite/ChangeLog
[Bug target/83370] [AARCH64]Tailcall register may be corrupted by epilogue code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83370

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Renlin Li ---
The fix has been committed to trunk.
[Bug target/83370] [AARCH64]Tailcall register may be corrupted by epilogue code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83370

--- Comment #1 from Renlin Li ---
Author: renlin
Date: Thu Feb 1 13:02:24 2018
New Revision: 257294

URL: https://gcc.gnu.org/viewcvs?rev=257294&root=gcc&view=rev
Log:
[PR83370][AARCH64]Use tighter register constraint for sibcall patterns.

In the aarch64 backend, the ip0/ip1 registers are used in the prologue/epilogue as temporary registers.

When the compiler performs sibcall optimization, it may use the ip0/ip1 registers to hold the address for an indirect function call. However, those two registers might be clobbered by the epilogue code, which makes the final sibcall instruction invalid.

The patch here renames the register class CALLER_SAVE_REGS to TAILCALL_ADDR_REGS to reflect its usage, and removes the IP registers from this class.

gcc/

2018-02-01  Renlin Li

    PR target/83370
    * config/aarch64/aarch64.c (aarch64_class_max_nregs): Handle
    TAILCALL_ADDR_REGS.
    (aarch64_register_move_cost): Likewise.
    * config/aarch64/aarch64.h (reg_class): Rename CALLER_SAVE_REGS to
    TAILCALL_ADDR_REGS.
    (REG_CLASS_NAMES): Likewise.
    (REG_CLASS_CONTENTS): Rename CALLER_SAVE_REGS to
    TAILCALL_ADDR_REGS. Remove IP registers.
    * config/aarch64/aarch64.md (Ucs): Update register constraint.

gcc/testsuite/

2018-02-01  Richard Sandiford

    PR target/83370
    * gcc.target/aarch64/pr83370.c: New.

Added:
    trunk/gcc/testsuite/gcc.target/aarch64/pr83370.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/aarch64/aarch64.c
    trunk/gcc/config/aarch64/aarch64.h
    trunk/gcc/config/aarch64/constraints.md
    trunk/gcc/testsuite/ChangeLog
[Bug target/83370] New: [AARCH64]Tailcall register may be corrupted by epilogue code
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83370

Bug ID: 83370
Summary: [AARCH64]Tailcall register may be corrupted by epilogue code
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: renlin at gcc dot gnu.org
Target Milestone: ---

The following example generates incorrect code:

    void (*f)();
    int xx;

    void tailcall (int i)
    {
      int arr[5000];
      xx = arr[i];
      f();
    }

When built with -O2 -ffixed-x0 -ffixed-x1 -ffixed-x2 -ffixed-x3 -ffixed-x4 -ffixed-x5 -ffixed-x6 -ffixed-x7 -ffixed-x8 -ffixed-x9 -ffixed-x10 -ffixed-x11 -ffixed-x12 -ffixed-x13 -ffixed-x14 -ffixed-x15 -ffixed-x17 -ffixed-x18

    tailcall:
        mov     x16, 20016
        sub     sp, sp, x16
        adrp    x16, .LANCHOR0
        stp     x19, x30, [sp]
        add     x19, sp, 16
        ldr     s0, [x19, w0, sxtw 2]
        ldp     x19, x30, [sp]
        str     s0, [x16, #:lo12:.LANCHOR0]
        mov     x16, 20016
        add     sp, sp, x16
        br      x16     // oops

So the issue is that nothing in the tail call instruction prevents it from using IP0/IP1, which are used as temporaries in the epilogue. We use the temporary for frames of 4-64KB, so this issue is more likely today (previously the temporary was used only in frames larger than 16MBytes).

The problem appears to be that while we have explicit clobbers in a tailcall, they are after the call, not before it:

    (call_insn/j 16 12 17 2 (parallel [
                (call (mem:DI (reg/f:DI 84 [ f ]) [0 *f.0_2 S8 A8])
                    (const_int 0 [0]))
                (return)
            ]) "tailcall.c":13 42 {*sibcall_insn}
         (expr_list:REG_DEAD (reg/f:DI 84 [ f ])
            (expr_list:REG_CALL_DECL (nil)
                (nil)))
        (expr_list (clobber (reg:DI 17 x17))
            (expr_list (clobber (reg:DI 16 x16))
                (nil

This issue affects gcc-5, gcc-6, gcc-7 and current trunk.
[Bug lto/81351] [8 regression] Many LTO testcases FAIL
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81351

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |renlin at gcc dot gnu.org

--- Comment #5 from Renlin Li ---
Similar failures happen on aarch64-linux-gnu and arm-linux-gnueabihf.
[Bug testsuite/81179] [8 regression] gcc.dg/vect/pr65947-9.c and gcc.dg/vect/pr65947-14.c fail starting with r249553
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81179

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |renlin at gcc dot gnu.org

--- Comment #2 from Renlin Li ---
The same failures are observed on all arm and aarch64 targets.

FAIL: gcc.dg/vect/pr65947-9.c -flto -ffat-lto-objects scan-tree-dump vect "loop size is greater than data size"
FAIL: gcc.dg/vect/pr65947-9.c scan-tree-dump vect "loop size is greater than data size"
FAIL: gcc.dg/vect/pr65947-14.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/pr65947-14.c execution test
[Bug c++/81067] [8 regression] g++.dg/template/nontype10.C FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81067

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |renlin at gcc dot gnu.org

--- Comment #1 from Renlin Li ---
I confirm I noticed the same regressions on arm targets.
[Bug tree-optimization/80948] [8 regression] test case gcc.dg/torture/pr68017.c fails with ICE starting with r248771
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80948

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |renlin at gcc dot gnu.org

--- Comment #3 from Renlin Li ---
I saw this ICE on arm and aarch64 targets as well.
[Bug tree-optimization/78529] [7 Regression] gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529 --- Comment #25 from Renlin Li --- Created attachment 40474 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40474&action=edit reduced objdump assembler file
[Bug tree-optimization/78529] [7 Regression] gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529 --- Comment #24 from Renlin Li --- Created attachment 40473 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40473&action=edit memset.c
[Bug tree-optimization/78529] [7 Regression] gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529 --- Comment #23 from Renlin Li --- Created attachment 40472 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40472&action=edit test case
[Bug tree-optimization/78529] [7 Regression] gcc.c-torture/execute/builtins/strcat-chk.c failed with lto/O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78529

Renlin Li changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |renlin at gcc dot gnu.org

--- Comment #22 from Renlin Li ---
(In reply to James Greenhalgh from comment #19)
> That would be an error:
>
> /tmp/ccpefK3l.ltrans0.ltrans.o: In function `memset':
> :(.text+0x4a0): multiple definition of `memset'
> .../aarch64-none-elf/lib/libc.a(lib_a-memset.o):
> .../newlib/libc/machine/aarch64/memset.S:90: first defined here
>
> Were it not for the flag added to resolve PR55994
> -Wl,--allow-multiple-definition .
>
> So, in my opinion, the testcase is broken and could always have failed in
> this way. The combination of register allocation, LTO and order the linker
> sees symbols explains why this is hard to reproduce.

I had exactly the same errors and issues today.
I reduced it to a minimal test case. Please check the new attachment.

The build command line is:
aarch64-none-elf-gcc -O2 -specs=aem-ve.specs -Wl,--allow-multiple-definition -lm -flto main.c memset.c -o new.exe

The expected output should be "A A A 2"

    80001038 :
    80001038: a9bf7bfd  stp   x29, x30, [sp,#-16]!
    8000103c: 9123      adrp  x3, 80025000 <__global_locale+0x68>
    80001040: 52800044  mov   w4, #0x2    // #2
    80001044: 91060060  add   x0, x3, #0x180
    80001048: 910003fd  mov   x29, sp
    8000104c: b9018064  str   w4, [x3,#384]
    80001050: d2800402  mov   x2, #0x20   // #32
    80001054: 52800821  mov   w1, #0x41   // #65
    80001058: 91002000  add   x0, x0, #0x8
    # At this function entry, x4 is not saved, because LTO thinks the local
    # memset implementation will not clobber it. However, the libc version of
    # memset is linked into the final binary. The implementation there will
    # clobber x4. This causes the run-time data corruption shown here.
    8000105c: 94000a39  bl    80003940
    80001060: a8c17bfd  ldp   x29, x30, [sp],#16
    80001064: 52800823  mov   w3, #0x41   // #65
    80001068: 9080      adrp  x0, 80011000 <__swbuf_r+0x70>
    8000106c: 2a0303e2  mov   w2, w3
    80001070: 2a0303e1  mov   w1, w3
    80001074: 91152000  add   x0, x0, #0x548
    80001078: 140015c0  b     80006778
    8000107c: .inst 0x ; undefined

This is mentioned above, but allow me to ask again:

"aarch64-none-elf-gcc -O2 main.c memset.c -o new.o -specs=aem-ve.specs -lm -flto" gives the "multiple definition of `memset'" error, while
"aarch64-none-elf-gcc -O2 main.c memset.c -o new.o -specs=aem-ve.specs -lm" does not.

Should they behave the same? Adding "-Wl,--allow-multiple-definition" does fix this error. But why is it the test case that is broken, rather than the lto pass?
[Bug c++/71913] [5/6/7 Regression] Missing copy elision with operator new
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71913 --- Comment #11 from Renlin Li --- (In reply to Christophe Lyon from comment #10) > I've noticed that something similar to what Renlin suggested was committed > to trunk as r238728. > > Could this testcase fix be backported to the release branches too? Yes, the failure can still be observed on the 4.9 and 5 branches. It would be good to backport the fix to those branches.
[Bug middle-end/64971] [5 Regression] gcc.c-torture/compile/pr37433.c ICEs with -mabi=ilp32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64971 --- Comment #17 from Renlin Li --- Author: renlin Date: Tue Aug 9 17:20:14 2016 New Revision: 239300 URL: https://gcc.gnu.org/viewcvs?rev=239300&root=gcc&view=rev Log: [PATCH][PR64971]Convert function pointer to Pmode when emit call. gcc/ 2016-08-04 Renlin Li PR middle-end/64971 * calls.c (prepare_call_address): Convert funexp to Pmode when necessary. * config/aarch64/aarch64.md (sibcall): Remove fix for PR 64971. (sibcall_value): Likewise. Modified: trunk/gcc/ChangeLog trunk/gcc/calls.c trunk/gcc/config/aarch64/aarch64.md
[Bug fortran/71961] [7 Regression] 178.galgel in SPEC CPU 2000 is miscompiled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71961 --- Comment #15 from Renlin Li --- The change r238497 has been reverted as r238815. I confirmed that, after the revert, the 178.galgel miscompare is fixed in the aarch64-linux environment. PR 71902 has been reopened as well.
[Bug fortran/71902] [5/6 Regression] Unneeded temporary on reallocatable character assignment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71902 --- Comment #6 from Renlin Li --- Author: renlin Date: Thu Jul 28 11:21:53 2016 New Revision: 238815 URL: https://gcc.gnu.org/viewcvs?rev=238815&root=gcc&view=rev Log: [PATCH] Revert Revert r238497 because of PR 71961. This patch reverts the change for PR 71902 since it causes 178.gagel miscompile in spec2000 as reported in PR 71961 which was observed in x86_64, aarch64, powerpc64. gcc/fortran/ChangeLog: 2016-07-28 Renlin Li Revert 2016-07-19 Thomas Koenig PR fortran/71902 * dependency.c (gfc_check_dependency): Use dep_ref. Handle case if identical is true and two array element references differ. (gfc_dep_resovler): Move most of the code to dep_ref. (dep_ref): New function. * frontend-passes.c (realloc_string_callback): Name temporary variable "realloc_string". gcc/testsuite/ChangeLog: 2016-07-28 Renlin Li Revert 2016-07-19 Thomas Koenig PR fortran/71902 * gfortran.dg/dependency_47.f90: New test. Removed: trunk/gcc/testsuite/gfortran.dg/dependency_47.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/dependency.c trunk/gcc/fortran/frontend-passes.c trunk/gcc/testsuite/ChangeLog
[Bug fortran/71961] [7 Regression] 178.galgel in SPEC CPU 2000 is miscompiled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71961 --- Comment #14 from Renlin Li --- Author: renlin Date: Thu Jul 28 11:21:53 2016 New Revision: 238815 URL: https://gcc.gnu.org/viewcvs?rev=238815&root=gcc&view=rev Log: [PATCH] Revert Revert r238497 because of PR 71961. This patch reverts the change for PR 71902 since it causes 178.gagel miscompile in spec2000 as reported in PR 71961 which was observed in x86_64, aarch64, powerpc64. gcc/fortran/ChangeLog: 2016-07-28 Renlin Li Revert 2016-07-19 Thomas Koenig PR fortran/71902 * dependency.c (gfc_check_dependency): Use dep_ref. Handle case if identical is true and two array element references differ. (gfc_dep_resovler): Move most of the code to dep_ref. (dep_ref): New function. * frontend-passes.c (realloc_string_callback): Name temporary variable "realloc_string". gcc/testsuite/ChangeLog: 2016-07-28 Renlin Li Revert 2016-07-19 Thomas Koenig PR fortran/71902 * gfortran.dg/dependency_47.f90: New test. Removed: trunk/gcc/testsuite/gfortran.dg/dependency_47.f90 Modified: trunk/gcc/fortran/ChangeLog trunk/gcc/fortran/dependency.c trunk/gcc/fortran/frontend-passes.c trunk/gcc/testsuite/ChangeLog
[Bug c++/71913] [5/6/7 Regression] Missing copy elision with operator new
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71913 Renlin Li changed: What|Removed |Added CC||renlin at gcc dot gnu.org --- Comment #9 from Renlin Li --- g++.dg/init/elide5.C fails on targets whose SIZE_TYPE is not "long unsigned int". testsuite/g++.dg/init/elide5.C:4:42: error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter [-fpermissive] I have checked: for most 32-bit architectures or ABIs, SIZE_TYPE is "unsigned int"; arm is one of them. To make this test case portable, __SIZE_TYPE__ should be used as the type of the first parameter of operator new, instead of "unsigned long". > void* operator new(unsigned long, void* p) { return p; }
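The portability point above can be sketched as follows. This is an illustration only, not the committed testsuite change: `Widget` and `construct_in` are hypothetical names, and the placement form is declared as a class member so that the sketch cannot clash with the global placement `operator new` from `<new>`. `__SIZE_TYPE__` is the macro that GCC (and Clang) predefine for the type of `sizeof`.

```cpp
#include <cassert>

// Hypothetical type used only for this sketch.
struct Widget {
  int value;
  explicit Widget(int v) : value(v) {}

  // Placement form written with __SIZE_TYPE__ rather than a hard-coded
  // "unsigned long", so the first parameter matches size_t on targets
  // where it is "unsigned int" as well as on those where it is
  // "long unsigned int".
  static void* operator new(__SIZE_TYPE__, void* p) { return p; }
};

// __SIZE_TYPE__ has the same size as the result type of sizeof.
static_assert(sizeof(__SIZE_TYPE__) == sizeof(sizeof(0)),
              "__SIZE_TYPE__ should match the type of sizeof");

// Construct a Widget inside caller-provided, suitably aligned storage.
inline Widget* construct_in(void* storage, int v) {
  return new (storage) Widget(v);
}
```

A caller would pass a buffer declared with `alignas(Widget)` and at least `sizeof(Widget)` bytes; the returned pointer refers to that same storage.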
[Bug fortran/71961] [7 Regression] 178.galgel in SPEC CPU 2000 is miscompiled
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71961 Renlin Li changed: What|Removed |Added CC||renlin at gcc dot gnu.org --- Comment #2 from Renlin Li --- The miscompare of 178.galgel is observed in aarch64-linux as well.
[Bug rtl-optimization/70030] [LRA]ICE when reload insn with output scratch operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70030 --- Comment #8 from Renlin Li --- (In reply to Vladimir Makarov from comment #6) > Created attachment 38033 [details] > A patch > > Here is the patch which might solve the problem. Hi Vladimir, Do you plan to check this patch in? Thanks!
[Bug middle-end/71625] missing strlen optimization on different array initialization style
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71625 --- Comment #9 from Renlin Li --- (In reply to nsz from comment #8) > (In reply to Jakub Jelinek from comment #6) > > (In reply to Marc Glisse from comment #1) > > > Or we could do like clang and improve alias analysis. We should know that > > > array doesn't escape and thus that hallo() cannot write to it. > > > > The strlen pass uses the alias oracle, so the question is why it thinks the > > call might affect the array. > > the optimization fails with > > const char array[] = "abc"; > > too (which is why i thought it was about pure strlen depending on global > state > other than the argument.. static const array works though). char *array = "abc"; works; however, this places the string literal in a read-only section.
[Bug middle-end/71625] New: missing strlen optimization on different array initialization style
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71625 Bug ID: 71625 Summary: missing strlen optimization on different array initialization style Product: gcc Version: tree-ssa Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: renlin at gcc dot gnu.org Target Milestone: --- Hi, The following two functions should both return 3. Currently, foo () is optimized to return a constant. bar (), however, still contains a call to strlen, which is sub-optimal. int foo () { char array[] = "abc"; return __builtin_strlen (array); } int bar () { char array[] = {'a', 'b', 'c', '\0'}; return __builtin_strlen (array); } Clang 3.8 produces optimal code for both cases. In addition, I have another case here: int hallo (); int dummy () { char array[] = "abc"; return hallo () + __builtin_strlen (array); } Here the __builtin_strlen call is not folded into a constant as it is in foo () above. Presumably, gcc is too conservative about what hallo () can do. If a pure attribute is added to hallo (), gcc generates optimal code. Clang 3.8 gives optimal code in this case as well.
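For reference, the three cases from the report can be put into one compilable unit. The functions are the reporter's; the `pure` attribute on `hallo ()` is the workaround mentioned in the report, and the body given to `hallo ()` here is purely an assumption so that the sketch links (in the report it is an external function). The sketch only checks the returned values; whether the `strlen` calls are folded has to be inspected in the generated assembly (for example with `-O2 -S`).

```cpp
#include <cassert>

// String-literal initialisation: per the report, GCC folds this to 3.
int foo() {
  char array[] = "abc";
  return __builtin_strlen(array);
}

// Brace initialisation with an explicit terminator: same value, but per
// the report a call to strlen was still emitted.
int bar() {
  char array[] = {'a', 'b', 'c', '\0'};
  return __builtin_strlen(array);
}

// Marked pure, as the report suggests, so the compiler may assume the
// call does not write to memory.
__attribute__((pure)) int hallo();

int dummy() {
  char array[] = "abc";
  return hallo() + __builtin_strlen(array);
}

// Assumed body, for illustration only.
int hallo() { return 1; }
```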
[Bug rtl-optimization/70030] [LRA]ICE when reload insn with output scratch operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70030 --- Comment #7 from Renlin Li --- (In reply to Vladimir Makarov from comment #6) > Created attachment 38033 [details] > A patch > > Here is the patch which might solve the problem. Hi Vladimir, sorry for the late reply. I am just back from holiday. Thanks for the patch. I have tested that it fixes the ICE reported here! Scratch registers in reload instructions are now replaced by pseudo registers, just as in other instructions fed into LRA. I have also run regression tests and a bootstrap check. It's all good for the aarch64-none-linux-gnu toolchain. Are you going to post it?
[Bug rtl-optimization/70030] [LRA]ICE when reload insn with output scratch operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70030 --- Comment #5 from Renlin Li --- (In reply to Vladimir Makarov from comment #3) > (In reply to Ramana Radhakrishnan from comment #2) > > Waiting. > > Actually, I have a candidate patch to deal with scratches created during > LRA. But I can not test it as I have no "local change to gcc", a test case > and used option set. > > In any case, if this problem is solved by other means (e.g. using another > patterns), we should probably close the bug. Yes, it's possible to circumvent this bug by slightly adjusting the patterns. For example, instead of relying on LRA to create a pseudo (by using match_scratch), pseudo registers can be created explicitly during the expand stage and used as a normal early-clobber register operand in the complex pattern. However, the problem described here is still there. If it's okay for you to share your change, I am quite happy to test it.
[Bug rtl-optimization/70030] New: [LRA]ICE when reload insn with output scratch operand
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70030

Bug ID: 70030
Summary: [LRA]ICE when reload insn with output scratch operand
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: renlin at gcc dot gnu.org
Target Milestone: ---

The ICE is triggered when building a linux toolchain with a local change to the gcc aarch64 backend.

vfprintf.c: In function ‘_IO_vfwprintf’:
vfprintf.c:1689:1: internal compiler error: in lra_set_insn_recog_data, at lra.c:964
 }
 ^
0x952998 lra_set_insn_recog_data(rtx_insn*)
        src/gcc/gcc/lra.c:962
0x9537b6 lra_get_insn_recog_data
        src/gcc/gcc/lra-int.h:486
0x9537b6 lra_update_insn_regno_info
        src/gcc/gcc/lra.c:1584
0x9537b6 lra_update_insn_regno_info
        src/gcc/gcc/lra.c:1574
0x953a82 lra_push_insn_1
        src/gcc/gcc/lra.c:1649
0x953a82 lra_push_insn(rtx_insn*)
        src/gcc/gcc/lra.c:1657
0x953cb7 push_insns
        gcc/gcc/lra.c:1700
0x954191 lra_process_new_insns(rtx_insn*, rtx_insn*, rtx_insn*, char const*)
        gcc/gcc/lra.c:1754
0x9670e5 curr_insn_transform
        src/gcc/gcc/lra-constraints.c:3962
0x968866 lra_constraints(bool)
        src/gcc/gcc/lra-constraints.c:4450
0x954cb2 lra(_IO_FILE*)
        src/gcc/gcc/lra.c:2277
0x90cfa9 do_reload
        src/gcc/gcc/ira.c:5395
0x90cfa9 execute
        src/gcc/gcc/ira.c:5566
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

The situation is as follows. To make insn_1 strict, LRA generates a new insn, insn_1_reload. In insn_1_reload there is a scratch operand of this form:

clobber (match_scratch:MODE x "=r")

It is written this way to reserve a pseudo register which will be used as a temporary within the pattern. When LRA tries to reload insn_1_reload in a later iteration, a new pseudo register (say RXX) is created to replace this scratch operand in place.

Additionally, a new insn is generated and inserted after insn_1_reload to finish the reload. It has this form:

(set scratch, RXX)

This instruction is invalid: no target implements this kind of pattern, so LRA ICEs.

(1)   if (get_reload_reg (type, mode, old, goal_alt[i],
                          loc != curr_id->operand_loc[i], "", &new_reg)
          && type != OP_OUT)
        {
          push_to_sequence (before);
          lra_emit_move (new_reg, old);
          before = get_insns ();
          end_sequence ();
        }
(2)   *loc = new_reg;
      if (type != OP_IN
          && find_reg_note (curr_insn, REG_UNUSED, old) == NULL_RTX)
        {
          start_sequence ();
(3)       lra_emit_move (type == OP_INOUT ? copy_rtx (old) : old, new_reg);
          emit_insn (after);
          after = get_insns ();
          end_sequence ();
          *loc = new_reg;
        }

(1) a reload pseudo register is generated: RXX
(2) the original operand is replaced in place: (clobber RXX)
(3) an insn is inserted to set the output operand: (set scratch, RXX)
[Bug target/63634] Compiler generated R_AARCH64_TLSLE_ADD_TPREL_HI12/LO12 pair overflowed by large TP offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63634 Renlin Li changed: What|Removed |Added Status|ASSIGNED|RESOLVED CC||renlin at gcc dot gnu.org Resolution|--- |FIXED --- Comment #1 from Renlin Li --- r227215 [AArch64][TLSLE][3/3] Implement local executable mode for all memory model r227213 [AArch64][TLSLE][2/3] Rename SYMBOL_TLSLE to SYMBOL_TLSLE24 r227212 [AArch64][TLSLE][1/3] Add the option "-mtls-size" These three patches implement TLS local executable mode for all memory models. I have double-checked: if -mtls-size is specified properly, the correct access sequence and relocations are emitted. For example, in this case -mtls-size=32 should generate a movz/movk pair to give a 32-bit TP offset. So I will close this ticket now.
[Bug target/64152] internal compiler error: in gen_add2_insn
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64152 Renlin Li changed: What|Removed |Added CC||renlin at gcc dot gnu.org --- Comment #3 from Renlin Li --- The -mno-lra option has been deprecated since GCC 5 for the arm/aarch64 backends, so this ICE has not manifested since then. I came across an insn canonicalization problem which I think is similar to this one, so I spent some time understanding what's going on in both cases. For the record, below is what I found. To reload the following insn, (set (reg:DI 5 x5) (plus:DI (reg/f:DI 31 sp) (mem/u/c:DI (symbol_ref/u:DI ("*.LC201") [flags 0x2]) [0 S8 A64]))) two insns are generated by reload: (set (reg:DI 5 x5) (mem/u/c:DI (symbol_ref/u:DI ("*.LC201" (set (reg:DI 5 x5) (plus:DI (reg:DI 5 x5) (reg/f:DI 31 sp))) {*adddi3_aarch64} The second insn here is not a strict rtx, because the rtx pattern defined in the backend doesn't allow the third operand to be the SP register. However, at this stage, the rtx pattern is required to be strict. So this reload is rejected, forcing the reload pass to try other possibilities. This eventually leads to the ICEs observed here. I have checked that there is no insn canonicalization rule for this scenario. Either the target should provide a more relaxed add pattern, or the reload pass could try swapping the source operands of this commutative operator.
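The second suggestion (swapping the sources of the commutative PLUS) can be pictured with a toy model. Nothing below is GCC code: `PlusInsn`, `matches_strict` and `swap_sources` are hypothetical names, and the sketch assumes, as the comment implies, that the add pattern rejects SP only as its third operand and would accept it as the first source.

```cpp
#include <cassert>

// Toy model only; these names are hypothetical and not GCC internals.
constexpr int SP_REGNUM = 31;  // sp is hard register 31 in the RTL above

struct PlusInsn {
  int dest;
  int src1;
  int src2;
};

// Stand-in for the strict pattern check: per the comment, the add pattern
// does not accept SP as its third operand (the second source).
constexpr bool matches_strict(const PlusInsn& insn) {
  return insn.src2 != SP_REGNUM;
}

// PLUS is commutative, so exchanging the sources keeps the same value.
constexpr PlusInsn swap_sources(const PlusInsn& insn) {
  return PlusInsn{insn.dest, insn.src2, insn.src1};
}

// The second insn produced by reload in the comment: x5 = x5 + sp.
constexpr PlusInsn reloaded{5, 5, SP_REGNUM};
static_assert(!matches_strict(reloaded), "SP as third operand is rejected");
static_assert(matches_strict(swap_sources(reloaded)),
              "the swapped form passes the toy check");
```

In this model the swapped insn computes the same sum but no longer has SP in the position the pattern rejects, which is the effect a canonicalization rule of this kind would aim for.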
[Bug rtl-optimization/64895] RA picks the wrong register for -fipa-ra
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64895

Renlin Li changed:

What|Removed |Added
CC||renlin at gcc dot gnu.org

--- Comment #12 from Renlin Li ---
The same happens for aarch64.

> [hjl@gnu-tools-1 gcc]$ cat /tmp/x.c
> static int __attribute__((noinline))
> bar (int x)
> {
>   if (x > 4)
>     return bar (x - 3);
>   return 0;
> }
>
> int __attribute__((noinline))
> foo (int y)
> {
>   return y + bar (y);
> }
>

There is actually another problem here. In this particular case, bar() is a static function which is not exported. Although the -fpic option is provided, pic_offset_table_rtx is not used at all in function foo(). In this case, pic_offset_table_rtx may not be initialized at all.

The target hook TARGET_INIT_PIC_REG can be improved to initialize the pic register only when necessary. On the other hand, if pic_offset_table_rtx is not used at all, lra_risky_transformations_p should not be true. Does that make sense?

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index a78edd8..d4a950f 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -4221,7 +4221,8 @@ lra_constraints (bool first_p)
 	     lra_constraint_iter);
   changed_p = false;
   if (pic_offset_table_rtx
-      && REGNO (pic_offset_table_rtx) >= FIRST_PSEUDO_REGISTER)
+      && (i = REGNO (pic_offset_table_rtx)) >= FIRST_PSEUDO_REGISTER
+      && lra_reg_info[i].nrefs > 0)
     lra_risky_transformations_p = true;
   else
     lra_risky_transformations_p = false;
[Bug target/69008] gcc emits unneeded memory access when passing trivial structs by value (ARM)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69008

Renlin Li changed:

What|Removed |Added
CC||renlin at gcc dot gnu.org

--- Comment #2 from Renlin Li ---
This relates to strict alignment on the aarch32 target. The structure is treated as BLKmode and is stored on the stack first. I believe this could be optimized by the DSE pass, which would forward the value directly to the ADD operation and eliminate the store. However, it seems unable to recognize the opportunity here.

For example, take the following modified test case:

struct Trivial { short i1; short i2; };
int foo(Trivial t) { return t.i1 + t.i2; }

Expand emits the following code, which still stores the structure to the stack first. However, DSE can optimize it and remove insn 2.

(insn 2 4 3 2 (set (mem/c:SI (plus:SI (reg/f:SI 105 virtual-stack-vars) (const_int -4 [0xfffc])) [1 S4 A32]) (reg:SI 0 r0)) test.c:7 -1 (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SI 116) (sign_extend:SI (mem/c:HI (plus:SI (reg/f:SI 105 virtual-stack-vars) (const_int -4 [0xfffc])) [2 t.i1+0 S2 A32]))) test.c:8 -1 (nil))
(insn 7 6 8 2 (set (reg:SI 117) (sign_extend:SI (mem/c:HI (plus:SI (reg/f:SI 105 virtual-stack-vars) (const_int -2 [0xfffe])) [2 t.i2+0 S2 A16]))) test.c:8 -1 (nil))
(insn 8 7 9 2 (set (reg:SI 115) (plus:SI (reg:SI 116) (reg:SI 117))) test.c:8 -1 (nil))

On the other hand, if the original test case is compiled with -mabi=apcs-gnu, it produces exactly the same code as clang does. "-mabi=apcs-gnu" changes the target BIGGEST_ALIGNMENT macro to 32. In this case, the structure is treated as scalar DImode and is no longer stored on the stack. Expand emits different code from the very beginning.

(insn 6 3 7 2 (set (reg:SI 114) (plus:SI (subreg:SI (reg/v:DI 113 [ t ]) 0) (subreg:SI (reg/v:DI 113 [ t ]) 4))) new.c:8 -1 (nil))
(insn 7 6 11 2 (set (reg:SI 112 [ ]) (reg:SI 114)) new.c:8 -1 (nil))
(insn 11 7 12 2 (set (reg/i:SI 0 r0) (reg:SI 112 [ ])) new.c:9 -1 (nil))
(insn 12 11 0 2 (use (reg/i:SI 0 r0)) new.c:9 -1 (nil))
[Bug target/69082] Final link fails on ARM using lto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69082 Renlin Li changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #15 from Renlin Li --- Patch backported. It should be fixed then. I will mark it as resolved.
[Bug target/69082] Final link fails on ARM using lto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69082 --- Comment #14 from Renlin Li --- Author: renlin Date: Tue Jan 12 17:32:18 2016 New Revision: 232284 URL: https://gcc.gnu.org/viewcvs?rev=232284&root=gcc&view=rev Log: [Backport][PR69082][ARM]Backport "[PATCH][ARM]Tighten the conditions for arm_movw, arm_movt". gcc/ 2016-01-12 Renlin Li PR target/69082 Backport from mainline. 2015-08-24 Renlin Li * config/arm/arm-protos.h (arm_valid_symbolic_address_p): Declare. * config/arm/arm.c (arm_valid_symbolic_address_p): Define. * config/arm/arm.md (arm_movt): Use arm_valid_symbolic_address_p. * config/arm/constraints.md ("j"): Add check for high code. Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/config/arm/arm-protos.h branches/gcc-4_9-branch/gcc/config/arm/arm.c branches/gcc-4_9-branch/gcc/config/arm/arm.md branches/gcc-4_9-branch/gcc/config/arm/constraints.md
[Bug target/69082] Final link fails on ARM using lto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69082 --- Comment #13 from Renlin Li --- This problem can be reproduced using gcc 4.9.3 (r225077), and can be fixed by r227129. However, with the latest code on the 4.9 branch, this bug can no longer be triggered. I did a quick bisect and found that r231177 masked the error: it changes the register allocation result. Presumably the problem is still there, as the initial patch was made to fix exactly the same problem observed on trunk. arm-none-linux-gnueabihf tested without any new failures. I will send a backport patch to the mailing list.
[Bug target/69082] Final link fails on ARM using lto
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69082 Renlin Li changed: What|Removed |Added Status|NEW |ASSIGNED Assignee|unassigned at gcc dot gnu.org |renlin at gcc dot gnu.org --- Comment #12 from Renlin Li --- (In reply to Richard Earnshaw from comment #11) > Looks like > https://gcc.gnu.org/ml/gcc-cvs/2015-08/msg00665.html > > would be an appropriate fix for this. I verified that this patch fixes the problem described here. I will do a full regression test first. If nothing is broken, I will send a backport patch for the 4.9 branch.
[Bug rtl-optimization/67477] [6 Regression] ICE in cselib_record_set, at cselib.c:2388
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67477 --- Comment #7 from Renlin Li --- (In reply to Jakub Jelinek from comment #4) > The ICE has been on > (insn 105 746 971 5 (parallel [ > (set (reg:V16QI 60 d22 [720]) > (unspec:V16QI [ > (reg:V16QI 60 d22 [720]) > (reg:V16QI 60 d22 [720]) > ] UNSPEC_VTRN1)) > (set (reg:V16QI 60 d22 [720]) > (unspec:V16QI [ > (reg:V16QI 60 d22 [720]) > (reg:V16QI 60 d22 [720]) > ] UNSPEC_VTRN2)) > ]) pr67477.c:63 1972 {*neon_vtrnv16qi_insn} > (nil)) > which was clearly invalid RTL, multiple sets of the same register. The insn > was still ok in the *.ira dump and broken in *.reload dump. > (define_insn "*neon_vtrn_insn" > [(set (match_operand:VDQW 0 "s_register_operand" "=w") > (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") > (match_operand:VDQW 3 "s_register_operand" "2")] > UNSPEC_VTRN1)) >(set (match_operand:VDQW 2 "s_register_operand" "=w") > (unspec:VDQW [(match_dup 1) (match_dup 3)] > UNSPEC_VTRN2))] > "TARGET_NEON" > "vtrn.\t%0, %2" > [(set_attr "type" "neon_permute")] > doesn't look like a target bug that would allow 2 same set destinations. That's exactly what I have observed. r228662 fixes that by adding an early-clobber modifier to the operand, so that the register allocator can assign a different register.
[Bug target/67383] reload_cse_simplify_operands fails on ARMV7-M
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67383 Renlin Li changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #7 from Renlin Li --- Backport committed as r231177. It should fix the ICE in this particular case. However, this is not the whole story. I just found another problem. In the test case, there is a code structure like this: uint64_t callee (int a, int b, int c, int d); uint64_t caller (int a, int b, int c, int d) { uint64_t res; /* single BB contains complicated data processing which requires register pair */ res = callee (tmp, b, c, d); return res; } The CES pass in this case extends the hard register live range across the whole BB up to the call to callee. As a result, r1, r2, r3 are excluded from the allocatable registers. There are places in CES which prevent extending a hard register's live range, for example for hard registers which fulfil small_register_classes_for_mode_p() or class_likely_spilled_p(). However, argument registers belong to neither of them. I tried to stop CES from extending the argument registers' live range. However, the scheduler later jumps in and re-orders the instructions to reduce pseudo register pressure, which in effect extends the argument registers' live range again.
[Bug rtl-optimization/66556] Wrong code-generation for armv7-a big-endian at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66556 Renlin Li changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #5 from Renlin Li --- resolved
[Bug target/66776] [AArch64] zero-extend version of csel not matching properly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66776 Renlin Li changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution|--- |FIXED --- Comment #3 from Renlin Li --- resolved.
[Bug target/68286] [6 Regression] ICE: in wide_int_to_tree, at tree.c:1468
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68286 Renlin Li changed: What|Removed |Added Target|powerpc64le-unknown-linux-g |powerpc64le-unknown-linux-g |nu |nu, ||arm-none-linux-gnueabihf CC||renlin at gcc dot gnu.org --- Comment #3 from Renlin Li --- The same issue happens on the arm-none-linux-gnueabihf toolchain.
[Bug tree-optimization/67794] [6 regression] internal compiler error: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67794 --- Comment #11 from renlin at gcc dot gnu.org --- > > Hi Martin, > > After the backport patch to branch 5, aarch-none-elf fails to build because > of the following ICEs. > I mean "aarch64-none-elf" here, sorry for the typo.
[Bug tree-optimization/67794] [6 regression] internal compiler error: Segmentation fault
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67794 renlin at gcc dot gnu.org changed: What|Removed |Added CC||renlin at gcc dot gnu.org --- Comment #10 from renlin at gcc dot gnu.org --- (In reply to Martin Jambor from comment #9) > Author: jamborm > Date: Mon Oct 26 14:36:43 2015 > New Revision: 229367 > > URL: https://gcc.gnu.org/viewcvs?rev=229367&root=gcc&view=rev > Log: > Also remap SSA_NAMEs of PARM_DECLs in IPA-SRA > > 2015-10-26 Martin Jambor > > PR tree-optimization/67794 > * tree-sra.c (replace_removed_params_ssa_names): Do not distinguish > between types of statements but accept original definitions as a > parameter. > (ipa_sra_modify_function_body): Use FOR_EACH_SSA_DEF_OPERAND to > iterate over definitions. > > testsuite/ > * gcc.dg/ipa/ipa-sra-10.c: New test. > * gcc.dg/torture/pr67794.c: Likewise. > > > Added: > branches/gcc-5-branch/gcc/testsuite/gcc.dg/ipa/ipa-sra-10.c > branches/gcc-5-branch/gcc/testsuite/gcc.dg/torture/pr67794.c > Modified: > branches/gcc-5-branch/gcc/ChangeLog > branches/gcc-5-branch/gcc/testsuite/ChangeLog > branches/gcc-5-branch/gcc/tree-sra.c Hi Martin, After the backport patch to branch 5, aarch-none-elf fails to build because of the following ICEs. 
gcc/gcc/tree-sra.c: In function ‘tree_node* replace_removed_params_ssa_names(tree, gimple_statement_base**, ipa_parm_adjustment_vec)’: gcc/gcc/tree-sra.c:4609:39: error: cannot convert ‘gimple_statement_base**’ to ‘gimple’ for argument ‘2’ to ‘tree_node* make_ssa_name(tree, gimple)’ gcc/gcc/tree-sra.c: In function ‘bool ipa_sra_modify_function_body(ipa_parm_adjustment_vec)’: gcc/gcc/tree-sra.c:4703:73: error: cannot convert ‘gphi*’ to ‘gimple_statement_base**’ for argument ‘2’ to ‘tree_node* replace_removed_params_ssa_names(tree, gimple_statement_base**, ipa_parm_adjustment_vec)’ gcc/gcc/tree-sra.c:4772:23: error: cannot convert ‘gimple’ to ‘gimple_statement_base**’ for argument ‘2’ to ‘tree_node* replace_removed_params_ssa_names(tree, gimple_statement_base**, ipa_parm_adjustment_vec)’
[Bug target/67383] reload_cse_simplify_operands fails on ARMV7-M
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67383 renlin at gcc dot gnu.org changed: What|Removed |Added CC||renlin at gcc dot gnu.org --- Comment #5 from renlin at gcc dot gnu.org --- (In reply to Vladimir Makarov from comment #4) > I've tried to reproduce it on gcc-4.9 branch as of today but failed. The > problem with constraints and overlapped hard regs was probably fixed by > backported patches. > > Still I have another problem: > > ../lib/mm/mm.c: In function ‘chunk_node’: > ../lib/mm/mm.c:430:1: internal compiler error: in assign_by_spills, at > lra-assigns.c:1357 > 0x853dd5 assign_by_spills > > /home/cygnus/vmakarov/build1/gcc-4.9-branch/gcc/gcc/lra-assigns.c:1357 > 0x854617 lra_assign() > > /home/cygnus/vmakarov/build1/gcc-4.9-branch/gcc/gcc/lra-assigns.c:1503 > 0x84de9c lra(_IO_FILE*) > /home/cygnus/vmakarov/build1/gcc-4.9-branch/gcc/gcc/lra.c:2388 > 0x80ca16 do_reload > /home/cygnus/vmakarov/build1/gcc-4.9-branch/gcc/gcc/ira.c:5474 > 0x80ca16 rest_of_handle_reload > /home/cygnus/vmakarov/build1/gcc-4.9-branch/gcc/gcc/ira.c:5615 > 0x80ca16 execute > /home/cygnus/vmakarov/build1/gcc-4.9-branch/gcc/gcc/ira.c:5644 > Please submit a full bug report, > with preprocessed source if appropriate. > Please include the complete backtrace with any bug report. > See <http://gcc.gnu.org/bugs.html> for instructions. > > The problem is in assigning a hard reg to reload pseudo 442 for insns > > Choosing alt 0 in insn 153: (0) =&r (1) %0 (2) r {*arm_adddi3} > Creating newreg=441, assigning class GENERAL_REGS to r441 > Creating newreg=442 from oldreg=268, assigning class GENERAL_REGS to > r442 > 153: {r441:DI=r441:DI+r442:DI;clobber cc:CC;} > REG_DEAD r268:DI > REG_UNUSED cc:CC > REG_EQUIV [sp:SI+0x10] > Inserting insn reload before: > 642: r441:DI=[sp:SI+0x8] > 644: r442:DI=r268:DI > Inserting insn reload after: > 643: [sp:SI+0x10]=r441:DI > > We canot use hard reg 0, 1, 2 as they live through insn 153: > > ... 
> 153: {r272:DI=r268:DI+r159:DI;clobber cc:CC;} > REG_DEAD r268:DI > REG_UNUSED cc:CC > REG_EQUIV [sp:SI+0x10] > ... > 159: call [`debug_printf'] argc:0x20 > REG_DEAD r1:SI > REG_DEAD r0:SI > REG_DEAD r2:DI > > Hard reg 7 (FP), 9 (thread), 10 (pic), 13 (sp), 15 (pc) are fixed. So > we have only one hole for DI value containing 2 regs (4, 5) and pair > (4,5) is assigned to 441 and there are no regs for 442. In this particular case, hard register 12 is free, and hard register 11 can be spilled to accommodate this DImode pseudo register. However, the target hook HARD_REGNO_MODE_OK rejects register pairs that start at an odd register number (11 in this case), so find_hard_regno_for() fails. I have found that r209615 relaxes the target hook: in thumb2 mode, any register pair is allowed. I have verified that it fixes this ICE. I will do a full regression test first; if there are no new issues, I will backport it to the 4.9 branch.
[Bug rtl-optimization/67715] [6 Regression][ARM] ICE in cselib.c during reload_cse_regs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67715 renlin at gcc dot gnu.org changed: What|Removed |Added CC||renlin at gcc dot gnu.org --- Comment #2 from renlin at gcc dot gnu.org --- I have checked that this ICE has been fixed by the target patch here: https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00609.html It's exactly the same type of error. The patch has already been committed on trunk as r228662.
[Bug target/66776] [AArch64] zero-extend version of csel not matching properly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66776 --- Comment #1 from renlin at gcc dot gnu.org --- Author: renlin Date: Fri Oct 2 11:55:04 2015 New Revision: 228384 URL: https://gcc.gnu.org/viewcvs?rev=228384&root=gcc&view=rev Log: [PATCH][AARCH64][PR66776]Add cmovdi_insn_uxtw pattern. gcc/ 2015-10-02 Renlin Li PR target/66776 * config/aarch64/aarch64.md (cmovdi_insn_uxtw): New pattern. gcc/testsuite/ 2015-10-02 Renlin Li PR target/66776 * gcc.target/aarch64/pr66776.c: New. Added: trunk/gcc/testsuite/gcc.target/aarch64/pr66776.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64.md trunk/gcc/testsuite/ChangeLog
[Bug target/66776] [AArch64] zero-extend version of csel not matching properly
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66776 renlin at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2015-10-01 Ever confirmed|0 |1
[Bug rtl-optimization/67028] combine bug. Different assumptions about subreg in different places.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67028 --- Comment #2 from renlin at gcc dot gnu.org --- (In reply to Segher Boessenkool from comment #1) > I have a hard time reproducing this. Could you show the generated > assembler code, and say why you think it is a combine bug? This is my generated asm with this command "cc1 -O3 -march=armv7-a test.c" stmfd sp!, {r4, lr} mov r1, #0 movw r0, #:lower16:.LC0 movt r0, #:upper16:.LC0 bl printf mov r0, #0 ldmfd sp!, {r4, pc} In simplify_comparison(), for the following rtx pattern, (and:M1 (subreg:M2 X 0) (const_int C1)) the code will try to permute the SUBREG and AND when WORD_REGISTER_OPERATIONS is defined and the subreg is paradoxical. There is an assumption here: the upper bits of the subreg should all be zeros. However, this is not always true. In this particular test case, the AND operation, which ensures the higher bits are zero, is removed. The register here has two CONST_INT values in an if-then-else pattern. When this if-then-else pattern is simplified further, subreg is applied to those two CONST_INT values. In simplify_immed_subreg, a CONST_INT is always sign-extended to the larger mode. The different assumptions cause the wrong code generation. What's more, the gcc internals documentation says: "subregs of subregs are not supported" However, a "subreg of subreg" pattern will be generated by the combine pass and simplified by simplify_subreg(). For example: subreg:SI (subreg:HI reg:SI r10) > reg:SI r10
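The mismatch between the two assumptions can be shown with plain integer arithmetic. This is only an analogy, not GCC code: the 16-bit and 32-bit widths stand in for the unnamed modes M2 and M1 and are chosen purely for illustration, and the three helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 16-bit pattern on the assumption that the upper bits are zero,
// which is what the SUBREG/AND permutation relies on.
constexpr std::uint32_t widen_assuming_zero(std::uint16_t narrow) {
  return narrow;
}

// Widen the same pattern by sign extension, which is what the comment
// says happens to a CONST_INT when a subreg of it is simplified.
constexpr std::uint32_t widen_sign_extended(std::uint16_t narrow) {
  return static_cast<std::uint32_t>(
      static_cast<std::int32_t>(static_cast<std::int16_t>(narrow)));
}

// True when both views give the same result after the AND with `mask`.
constexpr bool views_agree(std::uint16_t narrow, std::uint32_t mask) {
  return (widen_assuming_zero(narrow) & mask) ==
         (widen_sign_extended(narrow) & mask);
}
```

With the top narrow bit set, the two views differ as soon as the mask reaches above the narrow width, so code that mixes the two assumptions can compute different values for the same expression.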
[Bug rtl-optimization/67028] New: combine bug. Different assumptions about subreg in different places.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67028 Bug ID: 67028 Summary: combine bug. Different assumptions about subreg in different places. Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: renlin at gcc dot gnu.org Target Milestone: --- Created attachment 36067 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36067&action=edit test case This is a combine bug that manifests on the arm target. A test case is attached. The expected output of the test case is: checksum = 1 However, with the following command line: arm-none-eabi-gcc -march=armv7-a -O3 test.c -specs=rdimon.specs -o a.out the output is: checksum = 0 Wrong code is generated at -O2, -O3 and -Os. -O0 and -O1 work fine.
[Bug rtl-optimization/66556] Wrong code-generation for armv7-a big-endian at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66556 --- Comment #4 from renlin at gcc dot gnu.org --- Author: renlin Date: Wed Jul 15 15:13:36 2015 New Revision: 225835 URL: https://gcc.gnu.org/viewcvs?rev=225835&root=gcc&view=rev Log: [PATCH]Fix PR66556. Don't drop side-effect in simplify_const_relational_operation function. gcc/ Backport from mainline. 2015-07-13 Renlin Li PR rtl/66556 * simplify-rtx.c (simplify_const_relational_operation): Add side_effects_p checks. gcc/testsuite/ Backport from mainline. 2015-07-13 Renlin Li PR rtl/66556 * gcc.c-torture/execute/pr66556.c: New. Added: branches/gcc-5-branch/gcc/testsuite/gcc.c-torture/execute/pr66556.c Modified: branches/gcc-5-branch/gcc/ChangeLog branches/gcc-5-branch/gcc/simplify-rtx.c branches/gcc-5-branch/gcc/testsuite/ChangeLog
[Bug rtl-optimization/66556] Wrong code-generation for armv7-a big-endian at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66556 --- Comment #3 from renlin at gcc dot gnu.org --- Author: renlin Date: Mon Jul 13 08:29:46 2015 New Revision: 225729 URL: https://gcc.gnu.org/viewcvs?rev=225729&root=gcc&view=rev Log: [PATCH]Fix PR66556. Don't drop side-effect in simplify_const_relational_operation function. gcc/ 2015-07-13 Renlin Li PR rtl/66556 * simplify-rtx.c (simplify_const_relational_operation): Add side_effects_p checks. gcc/testsuite/ 2015-07-13 Renlin Li PR rtl/66556 * gcc.c-torture/execute/pr66556.c: New. Added: trunk/gcc/testsuite/gcc.c-torture/execute/pr66556.c Modified: trunk/gcc/ChangeLog trunk/gcc/simplify-rtx.c trunk/gcc/testsuite/ChangeLog
[Bug rtl-optimization/66556] Wrong code-generation for armv7-a big-endian at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66556 --- Comment #1 from renlin at gcc dot gnu.org --- (insn 22 94 24 4 (set (reg:SI 140 [ g+2 ]) (zero_extend:SI (mem/c:HI (post_modify:SI (reg/f:SI 156) (plus:SI (reg/f:SI 156) (const_int 20 [0x14]))) [5 g+4 S2 A32]))) test.c:36 159 {*arm_zero_extendhisi2_v6} (expr_list:REG_INC (reg/f:SI 156) (expr_list:REG_EQUAL (zero_extend:SI (mem/c:HI (const:SI (plus:SI (symbol_ref:SI ("*.LANCHOR0") [flags 0x182]) (const_int 256 [0x100]))) [5 g+4 S2 A32])) (nil (insn 24 22 25 4 (set (subreg:SI (reg:HI 141 [ D.4259 ]) 0) (zero_extract:SI (reg:SI 140 [ g+2 ]) (const_int 15 [0xf]) (const_int 1 [0x1]))) test.c:36 138 {extzv_t2} (expr_list:REG_DEAD (reg:SI 140 [ g+2 ]) (nil))) (insn 25 24 27 4 (set (reg:SI 142 [ D.4255 ]) (zero_extend:SI (reg:HI 141 [ D.4259 ]))) test.c:36 159 {*arm_zero_extendhisi2_v6} (expr_list:REG_DEAD (reg:HI 141 [ D.4259 ]) (nil))) (insn 33 32 34 4 (set (reg:CC 100 cc) (compare:CC (reg:SI 142 [ D.4255 ]) (reg:SI 150 [ D.4255 ]))) test.c:36 188 {*arm_cmpsi_insn} (expr_list:REG_DEAD (reg:SI 150 [ D.4255 ]) (expr_list:REG_DEAD (reg:SI 142 [ D.4255 ]) (nil (insn 34 33 36 4 (set (reg:SI 152) (ltu:SI (reg:CC 100 cc) (const_int 0 [0]))) test.c:36 198 {*mov_scc} (expr_list:REG_DEAD (reg:CC 100 cc) (nil))) In the combine pass, the above insns are simplified and combined, and insns 22, 24, 25 and 33 are marked as deleted. However, the side effect of insn 22, the post_modify of reg 156, is not preserved. (insn 43 41 45 4 (set (mem/c:HI (plus:SI (reg/f:SI 156) (const_int 8 [0x8])) [4 MEM[(short int *)&i + 8B]+0 S2 A16]) So for insn 43, the data is stored in the wrong place.
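To make the consequence of the lost post_modify concrete, here is a minimal sketch in plain C rather than RTL. The helper names are hypothetical; the step of 20 and the store offset of 8 are taken from the const_int values in the dump above, and everything else is an assumption for illustration. It models a load that advances its base register as a side effect, and shows where a later base-relative store lands depending on whether that side effect survives.

```c
#include <assert.h>
#include <stddef.h>

enum { LOAD_STEP = 20, STORE_OFFSET = 8 };

/* Hypothetical model of a load with a post_modify address:
   read from *addr, then advance *addr by `step` bytes. */
static unsigned char load_post_modify(const unsigned char **addr, ptrdiff_t step)
{
    assert(addr != NULL && *addr != NULL);
    unsigned char value = **addr;
    *addr += step;              /* the side effect that must survive */
    return value;
}

/* Byte offset (from the start of the buffer) that a later store at
   base + STORE_OFFSET would hit.  keep_side_effect == 0 models the
   reported bug: the dead load is removed together with its base update. */
static ptrdiff_t later_store_offset(int keep_side_effect)
{
    static const unsigned char buffer[64] = {0};
    const unsigned char *base = buffer;

    if (keep_side_effect)
        (void)load_post_modify(&base, LOAD_STEP);  /* value unused, update kept */

    return (base + STORE_OFFSET) - buffer;
}
```

With the side effect kept, the later store lands at offset 28; with it dropped, at offset 8, which corresponds to the "stored in the wrong place" symptom described for insn 43.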
[Bug rtl-optimization/66556] Wrong code-generation for armv7-a big-endian at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66556 renlin at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2015-06-16 Ever confirmed|0 |1
[Bug rtl-optimization/66556] New: Wrong code-generation for armv7-a big-endian at -Os
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66556 Bug ID: 66556 Summary: Wrong code-generation for armv7-a big-endian at -Os Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: renlin at gcc dot gnu.org Target Milestone: --- Created attachment 35789 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35789&action=edit test case The test case is attached. Toolchains built from the latest trunk and from the gcc-5 branch generate wrong code with the following command line: arm-none-eabi-gcc -march=armv7-a -mbig-endian -Os test.c -o test.out The correct output should be: checksum = ff However, the result is: checksum = 7 The test case is compiled correctly at -O1, which gives the right execution result. The test case works fine for little-endian at any optimization level.
[Bug target/65326] LRA missing a Thumb optimization.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65326 renlin at gcc dot gnu.org changed: What|Removed |Added CC||renlin at gcc dot gnu.org --- Comment #1 from renlin at gcc dot gnu.org --- In this specific case, thumb_legitimize_address will generate the ldr r0, [r9, r10] pattern (after IRA). However, this pattern only allows LO_REGS. During reload, r9 and r10 will be spilled into LO_REGS; that is where those two mov instructions come from. (In reply to Matthew Wahab from comment #0) > Created attachment 34964 [details] > Testcase showing change in behaviour. > > The ARM backend no longer supports -mno-lra so only the LRA is available. > This > has also removed the Thumb mode optimization introduced in > https://gcc.gnu.org/ml/gcc-patches/2005-08/msg01140.html to fix PR 23436. > > This turns sequences like > mov r3, r9 > mov r2, r10 > ldr r0, [r3, r2] > into > mov r3, r9 > add r3, r3, r10 > ldr r0, [r3] > which saves a register. > > Attached is a contrived test case. Compiling with gcc-4.9 with -mthumb > -mno-lra > (at -O1 and higher) produces the second (better) sequence. Compiling with > gcc-4.9 or gcc-trunk with -mthumb (at -O1 and higher) produces the first > sequence. The sequences appear after the 'nop' > > gcc-4.9 is > arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors) 4.9.3 20141119 > (release) [ARM/embedded-4_9-branch revision 218278] > > trunk is: > arm-none-eabi-gcc (unknown) 5.0.0 20150217 (experimental)
[Bug target/65459] SLOW_UNALIGNED_ACCESS unconditionally set to 1 for ARM targets
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65459 renlin at gcc dot gnu.org changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |renlin at gcc dot gnu.org --- Comment #3 from renlin at gcc dot gnu.org --- confirmed and assign it to myself
[Bug tree-optimization/46038] Vectorizer generates misaligned address for vld1 qn, [rn:alignment]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46038 renlin at gcc dot gnu.org changed: What|Removed |Added CC||renlin at gcc dot gnu.org --- Comment #1 from renlin at gcc dot gnu.org --- I cannot reproduce the fault in 4.9 or trunk.
[Bug libstdc++/64467] [5 Regression] 28_regex/traits/char/isctype.cc and wchar_t/isctype.cc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64467 --- Comment #7 from renlin at gcc dot gnu.org --- Author: renlin Date: Wed Feb 4 09:24:56 2015 New Revision: 220392 URL: https://gcc.gnu.org/viewcvs?rev=220392&root=gcc&view=rev Log: [PATCH][libstdc++][Testsuite] isctype test fails for newlib. libstdc++-v3/ 2015-02-02 Matthew Wahab PR libstdc++/64467 * testsuite/28_regex/traits/char/isctype.cc (test01): Add newlib special case for '\n'. * testsuite/28_regex/traits/wchar_t/isctype.cc (test01): Likewise. Modified: trunk/libstdc++-v3/ChangeLog trunk/libstdc++-v3/testsuite/28_regex/traits/char/isctype.cc trunk/libstdc++-v3/testsuite/28_regex/traits/wchar_t/isctype.cc
[Bug target/64149] -mno-lra bitrots, suggest to remove for GCC 5
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64149 --- Comment #7 from renlin at gcc dot gnu.org --- Author: renlin Date: Tue Jan 20 10:26:18 2015 New Revision: 219884 URL: https://gcc.gnu.org/viewcvs?rev=219884&root=gcc&view=rev Log: [ARM] PR 64149: Remove -mlra/-mno-lra option for ARM. gcc/ 2015-01-20 Matthew Wahab PR target/64149 * config/arm/arm.opt: Remove lra option and arm_lra_flag variable. * config/arm/arm.h (MODE_BASE_REG_CLASS): Remove use of arm_lra_flag, replace the conditional with its true branch. * config/arm/arm.c (TARGET_LRA_P): Set to hook_bool_void_true. (arm_lra_p): Remove. gcc/testsuite/ 2015-01-20 Matthew Wahab PR target/64149 * gcc.target/arm/thumb1-far-jump-3.c: Remove. Removed: trunk/gcc/testsuite/gcc.target/arm/thumb1-far-jump-3.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/arm/arm.c trunk/gcc/config/arm/arm.h trunk/gcc/config/arm/arm.opt trunk/gcc/testsuite/ChangeLog
[Bug target/61413] __ARM_SIZEOF_WCHAR_T is constant 32 -- should be 4 or 2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61413 renlin at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED CC||renlin at gcc dot gnu.org Resolution|--- |FIXED --- Comment #5 from renlin at gcc dot gnu.org --- backport to branch 4.8 & 4.9.
[Bug target/61413] __ARM_SIZEOF_WCHAR_T is constant 32 -- should be 4 or 2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61413 --- Comment #4 from renlin at gcc dot gnu.org --- Author: renlin Date: Wed Jan 14 11:02:24 2015 New Revision: 219587 URL: https://gcc.gnu.org/viewcvs?rev=219587&root=gcc&view=rev Log: [ARM]Fix definition of __ARM_SIZEOF_WCHAR_T. Backport from mainline: 2014-08-12 Ramana Radhakrishnan PR target/61413 * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Fix definition of __ARM_SIZEOF_WCHAR_T. Modified: branches/gcc-4_8-branch/gcc/ChangeLog branches/gcc-4_8-branch/gcc/config/arm/arm.h
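As the bug title says, __ARM_SIZEOF_WCHAR_T is expected to hold the size of wchar_t in bytes (4 or 2), not its width in bits (32). A minimal sketch of a check for this follows; the function names are hypothetical, and the check passes trivially when the compiler or target does not define the macro.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when __ARM_SIZEOF_WCHAR_T matches sizeof(wchar_t) in bytes,
   or when the macro is not defined for this target. Returns 0 on a
   mismatch, e.g. a bit width of 32 where a byte size of 4 is expected. */
static int arm_sizeof_wchar_t_is_consistent(void)
{
#ifdef __ARM_SIZEOF_WCHAR_T
    return (size_t)__ARM_SIZEOF_WCHAR_T == sizeof(wchar_t);
#else
    return 1;   /* macro not provided by this compiler/target */
#endif
}

/* Usage: fail loudly at run time if the macro and the type disagree. */
static void check_wchar_macro(void)
{
    assert(arm_sizeof_wchar_t_is_consistent());
}
```

On a toolchain with the defect described in this PR, the comparison would be 32 against 4 (or 2 with -fshort-wchar) and the check would report a mismatch.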
[Bug target/63424] [4.9 regression] Octave -O3 build: internal compiler error: in prepare_cmp_insn, at optabs.c:4237
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63424 renlin at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED CC||renlin at gcc dot gnu.org Resolution|--- |FIXED --- Comment #7 from renlin at gcc dot gnu.org --- backport to 4.9
[Bug target/63424] [4.9 regression] Octave -O3 build: internal compiler error: in prepare_cmp_insn, at optabs.c:4237
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63424 --- Comment #6 from renlin at gcc dot gnu.org --- Author: renlin Date: Tue Jan 13 16:25:00 2015 New Revision: 219540 URL: https://gcc.gnu.org/viewcvs?rev=219540&root=gcc&view=rev Log: [AArch64] Implement v2di3 pattern Backport from mainline 2014-11-19 Renlin Li gcc/: PR target/63424 * config/aarch64/aarch64-simd.md (v2di3): New. gcc/testsuite/: PR target/63424 * gcc.target/aarch64/pr63424.c: New test. Added: branches/gcc-4_9-branch/gcc/testsuite/gcc.target/aarch64/pr63424.c Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/config/aarch64/aarch64-simd.md branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
[Bug ipa/64551] Segfault in target_opts_for_fn (from ipa_icf::sem_function::equals_private)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64551 renlin at gcc dot gnu.org changed: What|Removed |Added CC||hp at gcc dot gnu.org --- Comment #2 from renlin at gcc dot gnu.org --- *** Bug 64552 has been marked as a duplicate of this bug. ***
[Bug middle-end/64552] Build broken for cris-elf and others
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64552 renlin at gcc dot gnu.org changed: What|Removed |Added Status|NEW |RESOLVED CC||renlin at gcc dot gnu.org Resolution|--- |DUPLICATE --- Comment #1 from renlin at gcc dot gnu.org --- presumably a duplicate of 64551 *** This bug has been marked as a duplicate of bug 64551 ***
[Bug ipa/64551] Segfault in target_opts_for_fn (from ipa_icf::sem_function::equals_private)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64551 renlin at gcc dot gnu.org changed: What|Removed |Added Target|alpha-linux-gnu |alpha-linux-gnu, ||arm-none-linux-gnueabi CC||renlin at gcc dot gnu.org --- Comment #1 from renlin at gcc dot gnu.org --- I observed the same issue on arm-none-linux-gnueabi target
[Bug target/61413] __ARM_SIZEOF_WCHAR_T is constant 32 -- should be 4 or 2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61413 --- Comment #3 from renlin at gcc dot gnu.org --- Author: renlin Date: Fri Jan 9 13:55:16 2015 New Revision: 219386 URL: https://gcc.gnu.org/viewcvs?rev=219386&root=gcc&view=rev Log: [ARM]Fix definition of __ARM_SIZEOF_WCHAR_T. Backport from mainline: 2014-08-12 Ramana Radhakrishnan PR target/61413 * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Fix definition of __ARM_SIZEOF_WCHAR_T. Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/config/arm/arm.h
[Bug middle-end/63762] [ARM]GCC generates UNPREDICTABLE STR with Rn = Rt when hard-float abi is used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63762 --- Comment #8 from renlin at gcc dot gnu.org --- Author: renlin Date: Wed Dec 3 11:13:50 2014 New Revision: 218306 URL: https://gcc.gnu.org/viewcvs?rev=218306&root=gcc&view=rev Log: Backported from mainline gcc/ 2014-12-03 Renlin Li PR middle-end/63762 PR target/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ 2014-12-03 Renlin Li H.J. Lu PR middle-end/63762 PR target/63661 * gcc.dg/pr63762.c: New test. * gcc.target/i386/pr63661.c: New test. Added: branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/pr63762.c branches/gcc-4_9-branch/gcc/testsuite/gcc.target/i386/pr63661.c Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/ira.c branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
[Bug target/63661] [4.9 Regression] -O2 miscompiles with -mtune=nehalem or corei7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63661 --- Comment #29 from renlin at gcc dot gnu.org --- Author: renlin Date: Wed Dec 3 11:13:50 2014 New Revision: 218306 URL: https://gcc.gnu.org/viewcvs?rev=218306&root=gcc&view=rev Log: Backported from mainline gcc/ 2014-12-03 Renlin Li PR middle-end/63762 PR target/63661 * ira.c (ira): Update preferred class. gcc/testsuite/ 2014-12-03 Renlin Li H.J. Lu PR middle-end/63762 PR target/63661 * gcc.dg/pr63762.c: New test. * gcc.target/i386/pr63661.c: New test. Added: branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/pr63762.c branches/gcc-4_9-branch/gcc/testsuite/gcc.target/i386/pr63661.c Modified: branches/gcc-4_9-branch/gcc/ChangeLog branches/gcc-4_9-branch/gcc/ira.c branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
[Bug target/63661] [4.9/5 Regression] -O2 miscompiles with -mtune=nehalem or corei7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63661 --- Comment #25 from renlin at gcc dot gnu.org --- Author: renlin Date: Fri Nov 28 11:18:47 2014 New Revision: 218144 URL: https://gcc.gnu.org/viewcvs?rev=218144&root=gcc&view=rev Log: Use native tune. nehalem is not able to trigger the issue in trunk any more. 2014-11-28 Renlin Li PR target/63661 * gcc.target/i386/pr63661.c: Use native tune. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/i386/pr63661.c
[Bug target/63661] [4.9/5 Regression] -O2 miscompiles with -mtune=nehalem or corei7
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63661 --- Comment #24 from renlin at gcc dot gnu.org --- Author: renlin Date: Fri Nov 28 11:01:27 2014 New Revision: 218143 URL: https://gcc.gnu.org/viewcvs?rev=218143&root=gcc&view=rev Log: Add testcase for PR63661. 2014-11-28 Renlin Li PR target/63661 * gcc.target/i386/pr63661.c: New test. Added: trunk/gcc/testsuite/gcc.target/i386/pr63661.c Modified: trunk/gcc/testsuite/ChangeLog
[Bug target/63424] Octave -O3 build: internal compiler error: in prepare_cmp_insn, at optabs.c:4237
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63424 --- Comment #4 from renlin at gcc dot gnu.org --- Author: renlin Date: Wed Nov 19 16:34:38 2014 New Revision: 217786 URL: https://gcc.gnu.org/viewcvs?rev=217786&root=gcc&view=rev Log: [AArch64] Implement v2di3 pattern gcc/: PR target/63424 * config/aarch64/aarch64-simd.md (v2di3): New. gcc/testsuite/: PR target/63424 * gcc.target/aarch64/pr63424.c: New test. Added: trunk/gcc/testsuite/gcc.target/aarch64/pr63424.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/aarch64/aarch64-simd.md trunk/gcc/testsuite/ChangeLog
[Bug middle-end/63762] [ARM]GCC generates UNPREDICTABLE STR with Rn = Rt when hard-float abi is used
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63762 --- Comment #5 from renlin at gcc dot gnu.org --- Author: renlin Date: Wed Nov 19 15:15:51 2014 New Revision: 217783 URL: https://gcc.gnu.org/viewcvs?rev=217783&root=gcc&view=rev Log: 2014-11-19 Renlin Li PR middle-end/63762 * ira.c (ira): Update preferred class. * gcc.dg/pr63762.c: New test. Added: trunk/gcc/testsuite/gcc.dg/pr63762.c Modified: trunk/gcc/ChangeLog trunk/gcc/ira.c trunk/gcc/testsuite/ChangeLog