[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 Jakub Jelinek changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #12 from Jakub Jelinek --- Fixed.
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #11 from Martin Jambor --- Author: jamborm Date: Mon Mar 18 11:31:52 2019 New Revision: 269762 URL: https://gcc.gnu.org/viewcvs?rev=269762=gcc=rev Log: Add forgotten requeing in propagate_subaccesses_across_link 2019-03-18 Martin Jambor PR tree-optimization/89546 * tree-sra.c (propagate_subaccesses_across_link): Requeue new_acc if any propagation to its children took place. testsuite/ * gcc.dg/tree-ssa/pr89546.c: New test. Added: branches/gcc-8-branch/gcc/testsuite/gcc.dg/tree-ssa/pr89546.c Modified: branches/gcc-8-branch/gcc/ChangeLog branches/gcc-8-branch/gcc/testsuite/ChangeLog branches/gcc-8-branch/gcc/tree-sra.c
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #10 from Martin Jambor --- Author: jamborm Date: Mon Mar 18 11:28:01 2019 New Revision: 269761 URL: https://gcc.gnu.org/viewcvs?rev=269761=gcc=rev Log: Add forgotten requeing in propagate_subaccesses_across_link 2019-03-18 Martin Jambor PR tree-optimization/89546 * tree-sra.c (propagate_subaccesses_across_link): Requeue new_acc if any propagation to its children took place. testsuite/ * gcc.dg/tree-ssa/pr89546.c: New test. Added: trunk/gcc/testsuite/gcc.dg/tree-ssa/pr89546.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-sra.c
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #9 from Martin Jambor --- I have proposed the patch on the mailing list: https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00708.html
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #8 from Martin Jambor --- Created attachment 45964 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45964=edit x86_64 testcase It took me four or five evenings and is quite fragile, but finally I have an x86_64-linux testcase that triggers the miscompilation (so this really should be P1). The correct fix is the one from my previous comment, I'll prepare a proper patch tomorrow.
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #7 from Martin Jambor --- Right, I must have been looking at output of a patched compiler. Anyway, I believe I tracked it down to a bug in how SRA propagates write flag (since r247604). In one specific case, an access structure is not being requeued and the propagation stops prematurely. The following should fix it (I'll get back to this on Monday, let's see if I can construct a saner testcase): diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c index eeef31ba496..077bd111344 100644 --- a/gcc/tree-sra.c +++ b/gcc/tree-sra.c @@ -2734,8 +2734,12 @@ propagate_subaccesses_across_link (struct access *lacc, struct access *racc) rchild->grp_hint = 1; new_acc->grp_hint |= new_acc->grp_read; - if (rchild->first_child) - ret |= propagate_subaccesses_across_link (new_acc, rchild); + if (rchild->first_child + && propagate_subaccesses_across_link (new_acc, rchild)) + { + ret = 1; + add_access_to_work_queue (new_acc); + } } else {
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #6 from Jakub Jelinek --- And in the *.sra dump, I really don't see any way how it could be two: MEM[(struct &) clique 22 base 1] ={v} {CLOBBER}; ... n1D.7146 ={v} {CLOBBER}; ... MEM[(struct &) clique 27 base 1] ={v} {CLOBBER}; SR.161_6 = SR.150_36; MEM[(struct tupleD.6437 *) + 8B] = SR.161_6; n1$tail$head$payload_91 = MEM[(struct tupleD.6437 *) + 4B]; ... n1D.7698 ={v} {CLOBBER}; and no other stmts referencing n1. So, we have the whole var undefined, then we store an int at offset 8 bytes into it and read an int from offset 4 into it. That is uninitialized load which as #c5 tried to prove wasn't there before late_intra_sra.
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #5 from Jakub Jelinek --- You mean -fno-strict-aliasing? I guess many options change the behavior that it doesn't trigger anymore. In the #c3 dump, I really don't see how aliasing could matter though, all are MEM_REFs with addresses of particular automatic vars, there are no pointers involved. I've instrumented the compiler a little bit: --- gcc/gimple-pretty-print.c.jj2019-01-01 12:37:15.700998856 +0100 +++ gcc/gimple-pretty-print.c 2019-03-01 17:44:46.673862737 +0100 @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3. #include "stringpool.h" #include "attribs.h" #include "asan.h" +#include "tree-dfa.h" #define INDENT(SPACE) \ do { int i; for (i = 0; i < SPACE; i++) pp_space (buffer); } while (0) @@ -575,6 +576,7 @@ dump_ternary_rhs (pretty_printer *buffer } } +volatile bool assign_extra; /* Dump the gimple assignment GS. BUFFER, SPC and FLAGS are as in pp_gimple_stmt_1. */ @@ -634,6 +636,35 @@ dump_gimple_assign (pretty_printer *buff gcc_unreachable (); if (!(flags & TDF_RHS_ONLY)) pp_semicolon (buffer); + + if (assign_extra && gimple_num_ops (gs) == 2) + { + bool any = false; + for (int i = 0; i < 2; i++) + { + tree op = gimple_op (gs, i); + if (handled_component_p (op) + || TREE_CODE (op) == MEM_REF + || TREE_CODE (op) == TARGET_MEM_REF + || DECL_P (op)) + { + poly_int64 off, size, max_size; + bool dummy; + get_ref_base_and_extent (op, , , _size, ); + if (!any) + { + pp_string (buffer, "//"); + any = true; + } + pp_string (buffer, i ? " rhs1 " : " lhs "); + pp_wide_integer (buffer, off); + pp_space (buffer); + pp_wide_integer (buffer, size); + pp_space (buffer); + pp_wide_integer (buffer, max_size); + } + } + } } } and at the start of late_intra_sra after set assign_extra = 1 debug_bb_n (2) looks like below. Extra comments added where the value 2 propagates through: [local count: 1073741824]: MEM[(struct &)] ={v} {CLOBBER};// lhs 0 32 32 MEM[(struct &) + 4] ={v} {CLOBBER};// lhs 32 32 32 MEM[(struct &) + 8] ={v} {CLOBBER};// lhs 64 96 96 MEM[(struct &) + 8] ={v} {CLOBBER};// lhs 64 32 32 MEM[(struct &) + 12] ={v} {CLOBBER};// lhs 96 32 32 MEM[(struct &) + 16] ={v} {CLOBBER};// lhs 128 16 16 MEM[(struct &)] ={v} {CLOBBER};// lhs 0 128 128 D.9933 ={v} {CLOBBER};// lhs 0 96 96 MEM[(struct &)] ={v} {CLOBBER};// lhs 0 128 128 D.9940 ={v} {CLOBBER};// lhs 0 96 96 D.9941 ={v} {CLOBBER};// lhs 0 96 96 D.9942 ={v} {CLOBBER};// lhs 0 64 64 D.9936 ={v} {CLOBBER};// lhs 0 128 128 MEM[(struct &)] ={v} {CLOBBER};// lhs 0 128 128 D.9937 ={v} {CLOBBER};// lhs 0 128 128 n1 ={v} {CLOBBER};// lhs 0 128 128 MEM[(struct &)] ={v} {CLOBBER};// lhs 0 32 32 MEM[(struct &) + 4] ={v} {CLOBBER};// lhs 32 32 32 MEM[(struct &) + 8] ={v} {CLOBBER};// lhs 64 96 96 MEM[(struct &) + 8] ={v} {CLOBBER};// lhs 64 32 32 MEM[(struct &) + 12] ={v} {CLOBBER};// lhs 96 32 32 MEM[(struct &) + 16] ={v} {CLOBBER};// lhs 128 16 16 MEM[(struct tuple &) + 4].head.payload = 2;// lhs 32 32 32 // MEM[+4] == 2 MEM[(struct tuple &) + 8].head.payload = 3;// lhs 64 32 32 D.9957.head = MEM[(const struct type_n &) + 4];// lhs 0 32 32 rhs1 32 32 32 // MEM[+0] == 2 MEM[(struct &)] ={v} {CLOBBER};// lhs 0 96 96 MEM[(struct tuple *)].head = MEM[(const struct type_n &)];// lhs 0 32 32 rhs1 0 32 32 // MEM[+0] == 2 D.9957 ={v} {CLOBBER};// lhs 0 64 64 MEM[(struct tuple *) + 4B] = 4;// lhs 32 32 32 D.9960 = D.9959;// lhs 0 96 96 rhs1 0 96 96 // MEM[+0] == 2 MEM[(struct &)] ={v} {CLOBBER};// lhs 0 128 128 D.9953.head = MEM[(const struct type_n &) + 8];// lhs 0 32 32 rhs1 64 32 32 D.9953.tail = MEM[(const struct tuple &)];// lhs 32 96 96 rhs1 0 96 96 // MEM[+4] == 2 D.9960 ={v} {CLOBBER};// lhs 0 96 96 D.9959 ={v} {CLOBBER};// lhs 0 96 96 D.9952 = D.9953;// lhs 0 128 128 rhs1 0 128 128 // MEM[+4] == 2 MEM[(struct &)] ={v} {CLOBBER};// lhs 0 160 160 MEM[(struct tuple *)].tail = MEM[(const struct tuple &)];// lhs 32 128 128 rhs1 0 128 128 // MEM[+8] == 2 D.9952 ={v} {CLOBBER};// lhs 0 128 128 D.9953 ={v} {CLOBBER};// lhs 0 128 128 D.9954 ={v} {CLOBBER};// lhs 0 64 64 D.9947 = MEM[(const struct tuple &) + 8];// lhs 0 96 96 rhs1 64 96 96 // MEM[+0] == 2 MEM[(struct tuple *)].tail = MEM[(const struct tuple &)];// lhs 32 96 96 rhs1 0 96 96 // MEM[+4] == 2 SR.150_36 = MEM[(const struct tuple &) + 4];// rhs1 32 32 32 D.9947 ={v} {CLOBBER};// lhs 0 96 96 MEM[(struct &)] ={v} {CLOBBER};// lhs 0 96 96 D.9949.head = MEM[(const struct type_n &) + 4];// lhs 0 32 32 rhs1 32 32 32 // MEM[+0] == 2
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #4 from Martin Jambor --- (In reply to Jakub Jelinek from comment #3) > ...But in the sra pass dump that possibility is gone: I am still double checking because it is easy to make a mistake but I have seen a (potential) path in the sra dump, just not in the optimized dump. Also, I believe the testcase passes with -fstrict-aliasing (can you please check?), which would hint at some problems with aliasing... (of course, those might be created by SRA too).
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 Richard Biener changed: What|Removed |Added Keywords||wrong-code Target||arm Priority|P3 |P2
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #3 from Jakub Jelinek --- The gimple dumps aren't exactly readable with so many different tuple/type etc. types, where it is unclear what exact offset is something being stored at. That said, in cplxlower1 I still see a possibility to get there the desired value of 2: MEM[(struct tupleD.7416 &) + 4].headD.7482.payloadD.6716 = 2; ... D.9878.headD.7889 = MEM[(const struct type_nD.6356 &) + 4]; MEM[(struct &)] ={v} {CLOBBER}; D.9880.headD.7099 = MEM[(const struct type_nD.6356 &)]; D.9878 ={v} {CLOBBER}; MEM[(struct tupleD.6387 *) + 4B] = 4; D.9881 = D.9880; MEM[(struct &)] ={v} {CLOBBER}; D.9874.headD.7096 = MEM[(const struct type_nD.6355 &) + 8]; D.9874.tailD.7097 = D.9881; D.9881 ={v} {CLOBBER}; D.9880 ={v} {CLOBBER}; D.9873 = D.9874; MEM[(struct &)] ={v} {CLOBBER}; D.9865.tailD.7802 = D.9873; D.9873 ={v} {CLOBBER}; D.9874 ={v} {CLOBBER}; D.9875 ={v} {CLOBBER}; D.9868 = MEM[(const struct tupleD.6387 &) + 8]; D.9869.tailD.8063 = D.9868; SR.34_37 = MEM[(struct tupleD.6387 *) + 4B]; D.9868 ={v} {CLOBBER}; MEM[(struct &)] ={v} {CLOBBER}; D.9870.headD.7099 = MEM[(const struct type_nD.6356 &) + 4]; MEM[(struct &)] ={v} {CLOBBER}; MEM[(struct tupleD.6387 *) + 4B] = SR.34_37; n1D.7646.tailD.7097 = D.9870; D.9870 ={v} {CLOBBER}; D.9869 ={v} {CLOBBER}; D.9865 ={v} {CLOBBER}; _2 = n1D.7646.tailD.7097.headD.7099.payloadD.6716; if (_2 != 2) But in the sra pass dump that possibility is gone: MEM[(struct &)] ={v} {CLOBBER}; SR.41_6 = SR.34_37; MEM[(struct tupleD.6387 *) + 8B] = SR.41_6; n1$tail$head$payload_90 = MEM[(struct tupleD.6387 *) + 4B]; ... _2 = n1$tail$head$payload_90; if (_2 != 2)
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 --- Comment #2 from Jakub Jelinek --- When get_tail is esra optimized the old way, main is: int n1$tail$head$payload; struct type D.9885; struct tuple D.9880; struct type D.9879; struct type D.9878; struct type D.9875; [local count: 1073741825]: MEM[(struct tuple *)] = 3; MEM[(struct tuple *) + 4B] = 2; MEM[(struct tuple *) + 8B] = 4; D.9875.tail = D.9885; D.9885 ={v} {CLOBBER}; D.9878 = MEM[(const struct tuple &) + 8]; D.9879.tail = D.9878; D.9878 ={v} {CLOBBER}; D.9880.head = MEM[(const struct type_n &) + 4]; n1$tail$head$payload_32 = MEM[(struct tuple *)]; D.9880 ={v} {CLOBBER}; D.9879 ={v} {CLOBBER}; D.9875 ={v} {CLOBBER}; if (n1$tail$head$payload_32 != 2) goto ; [0.00%] else goto ; [99.96%] [count: 0]: __builtin_abort (); [local count: 1072883002]: return 0; and the testcase works. If it is esra optimized the new way, I get: int n1$tail$head$payload; struct type n1; [local count: 1073741825]: MEM[(struct &)] ={v} {CLOBBER}; n1$tail$head$payload_90 = MEM[(struct tuple *) + 4B]; if (n1$tail$head$payload_90 != 2) goto ; [0.00%] else goto ; [99.96%] [count: 0]: __builtin_abort (); [local count: 1072883002]: n1 ={v} {CLOBBER}; return 0; which is undefined, so unless the testcase is UB (but passes e.g. on x86_64-linux too, valgrind is quiet on it, -fsanitize=address,undefined too), something went wrong during GIMPLE optimizations.
[Bug tree-optimization/89546] [8/9 Regression] Suspected arm flint miscompilation starting with r255510
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89546 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |NEW Last reconfirmed||2019-03-01 CC||jamborm at gcc dot gnu.org Target Milestone|--- |8.4 Ever confirmed|0 |1 --- Comment #1 from Jakub Jelinek --- I've tried to bisect it further, which exact SRA change makes a difference, using: --- gcc/tree-sra.c.jj 2019-03-01 00:21:16.431273877 +0100 +++ gcc/tree-sra.c 2019-03-01 13:09:22.437370185 +0100 @@ -650,12 +650,22 @@ pop_access_from_work_queue (void) return access; } +static bool old_rev; +static int cntxx; /* Allocate necessary structures. */ static void sra_initialize (void) { +if (getenv ("SRA") != NULL) +{ + int xx = atoi (getenv ("SRA")); + if (cntxx++ < xx) +old_rev = true; + else +old_rev = false; +} candidate_bitmap = BITMAP_ALLOC (NULL); candidates = new hash_table (vec_safe_length (cfun->local_decls) / 2); @@ -1156,6 +1166,8 @@ contains_vce_or_bfcref_p (const_tree ref ref = TREE_OPERAND (ref, 0); } +if (old_rev) return false; + if (TREE_CODE (ref) != MEM_REF || TREE_CODE (TREE_OPERAND (ref, 0)) != ADDR_EXPR) return false; @@ -1367,7 +1379,7 @@ build_accesses_from_assign (gimple *stmt if (should_scalarize_away_bitmap && !gimple_has_volatile_ops (stmt) && !is_gimple_reg_type (racc->type)) { - if (contains_vce_or_bfcref_p (rhs)) + if (!old_rev && contains_vce_or_bfcref_p (rhs)) bitmap_set_bit (cannot_scalarize_away_bitmap, DECL_UID (racc->base)); else patch applied on top of r255510, so that some SRA invocations are done the r255509 way and others r255510 way and it seems it is esra pass in get_tail that matters whether the testcase aborts or succeeds.