[Bug ipa/114321] New: [11 regression] ipa/modref: incorrect result with O2 since r11-3308
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114321 Bug ID: 114321 Summary: [11 regression] ipa/modref: incorrect result with O2 since r11-3308 Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: yinyuefengyi at gmail dot com Target Milestone: ---

https://godbolt.org/z/hz4E1q4dK

The code is somewhat flawed: a uint64_t pointer is passed to a function and modified through a uint32_t pointer. Even so, the function call is removed by the fre1 pass:

ipa-modref: call stmt MurmurHash3_x86_32 (_8, _2, 123456, );
ipa-modref: call to void MurmurHash3_x86_32(const void*, int, uint32_t, uint64_t*)/554 does not clobber ref: ret alias sets: 46->46
Setting value number of _10 to 0 (changed)
Value numbering stmt = ret ={v} {CLOBBER(eol)};
Setting value number of .MEM_11 to .MEM_11 (changed)
Value numbering stmt = return _10;
marking outgoing edge 2 -> 1 executable
RPO iteration over 1 blocks visited 1 blocks in total discovering 1 executable blocks iterating 1.0 times, a block was visited max. 1 times
RPO tracked 9 values available at 3 locations and 9 lattice elements
Replaced MEM[(const struct basic_string *)trace_id_6(D)]._M_dataplus._M_p with _7 in all uses of _8 = MEM[(const struct basic_string *)trace_id_6(D)]._M_dataplus._M_p;
Replaced ret with 0 in all uses of _10 = ret;
Removing dead stmt _10 = ret;
Removing dead stmt _8 = MEM[(const struct basic_string *)trace_id_6(D)]._M_dataplus._M_p;

I am not sure whether this is valid, but it worked before GCC 11. Compiling with -fno-ipa-modref or -fno-strict-aliasing makes it work. Could you please take a look?
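Below is a minimal C sketch of a portable way to write a 32-bit result into a zero-initialised uint64_t, which is the pattern in the report. The names toy_hash32 and hash_of are hypothetical, and the hash is a toy, not MurmurHash3. Writing through a uint32_t * into a uint64_t object breaks the effective-type (strict aliasing) rules. That is plausibly why ipa-modref, which reasons with alias sets, concludes that the call does not clobber ret. Copying the bytes with memcpy is well-defined, so the compiler has to assume the object changed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit hash with the same signature shape as in the report.
   Toy mixing only; this is NOT MurmurHash3. */
static void toy_hash32(const void *key, int len, uint32_t seed, uint64_t *out)
{
    const unsigned char *p = key;
    uint32_t h = seed;
    for (int i = 0; i < len; i++)
        h = h * 31u + p[i];

    /* Problematic pattern (type punning): *(uint32_t *)out = h;
       memcpy writes the 4 bytes without violating aliasing rules, so the
       caller's uint64_t is known to be (possibly) modified. */
    memcpy(out, &h, sizeof h);
}

/* Caller shaped like the report: a zero-initialised uint64_t result. */
static uint64_t hash_of(const char *s)
{
    uint64_t ret = 0;
    toy_hash32(s, (int)strlen(s), 123456u, &ret);
    return ret;
}
```

Another option is to give the callee a uint32_t *out parameter and widen the value in the caller. That keeps every access at its declared type.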
[Bug middle-end/88781] [meta-bug] bogus/missing -Wstringop-truncation warnings
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88781 Bug 88781 depends on bug 110151, which changed state.

Bug 110151 Summary: warning: 'strncpy' output truncated copying 10 bytes from a string of length 26 [-Wstringop-truncation]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110151

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE
[Bug tree-optimization/107473] Unexpected warning / error with strncpy
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107473 Xionghu Luo (luoxhu at gcc dot gnu.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yinyuefengyi at gmail dot com

--- Comment #2 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
*** Bug 110151 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/110151] warning: 'strncpy' output truncated copying 10 bytes from a string of length 26 [-Wstringop-truncation]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110151 Xionghu Luo (luoxhu at gcc dot gnu.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
duplicate.

*** This bug has been marked as a duplicate of bug 107473 ***
[Bug tree-optimization/110151] New: warning: 'strncpy' output truncated copying 10 bytes from a string of length 26 [-Wstringop-truncation]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110151 Bug ID: 110151 Summary: warning: 'strncpy' output truncated copying 10 bytes from a string of length 26 [-Wstringop-truncation] Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: yinyuefengyi at gmail dot com Target Milestone: ---

For the two cases below (https://godbolt.org/z/5rbMTeqW9), are these false-positive warnings? They seem unnecessary. In foo1, memset has already cleared the memory. In foo2, 'dest[11] = '\0';' is not the statement immediately after strncpy, but it does set the last element to NUL.

#include <stdio.h>
#include <string.h>

int foo1 ()
{
  char src[40];
  char dest[12];
  memset(dest, '\0', sizeof(dest));
  strcpy(src, "This is tutorialspoint.com");
  strncpy(dest, src, 10);
  printf("%s", dest);
  return(0);
}

char a;
int foo2 ()
{
  char src[40];
  char dest[12];
  strcpy(src, "This is tutorialspoint.com");
  strncpy(dest, src, 10);
  a = dest[0];
  dest[11] = '\0';
  printf("%s", dest);
  return(0);
}
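One way to avoid depending on which statement follows strncpy is a small helper that always terminates the destination. Note that in foo2, strncpy(dest, src, 10) writes only dest[0] through dest[9]. dest[10] is therefore still uninitialised when printf reads the string, and the store to dest[11] does not terminate the copied prefix. Below is a minimal sketch; copy_bounded is a hypothetical name, and the helper truncates silently, which may or may not be what the caller wants.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most n - 1 characters of src into dest
   (n is the size of dest) and always NUL-terminate the result. */
static char *copy_bounded(char *dest, size_t n, const char *src)
{
    if (n == 0)
        return dest;
    strncpy(dest, src, n - 1);   /* may leave dest unterminated... */
    dest[n - 1] = '\0';          /* ...so terminate it right away */
    return dest;
}
```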
[Bug c/110048] New: undefined reference when build with O0
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110048 Bug ID: 110048 Summary: undefined reference when build with O0 Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: yinyuefengyi at gmail dot com Target Milestone: ---

The case below has failed to link with -O0 since GCC 5.1. Is it a regression? Clang, however, always fails to link it. The case links successfully with -O1 and above, or with 'inline' removed. https://godbolt.org/z/9PEhWrov8

inline void foo(void) { }
int main(void) { foo(); }
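GCC 5 made gnu11 the default C dialect, so C99/C11 inline semantics apply. Under those rules, a definition whose file-scope declarations all use inline without extern is only an inline definition, and no external definition of foo is emitted. A call that is not inlined, as at -O0, then has nothing to link against. That would explain why the failure starts with GCC 5.1 and why Clang, which follows the same rules, fails too. Below is a minimal sketch of the two standard fixes; the function names twice and thrice are hypothetical.

```c
/* Fix 1: keep the plain inline definition (as in the report) and add one
   file-scope declaration without 'inline' in exactly one translation unit.
   Per C99/C11 6.7.4 this makes the definition an external definition, so
   a non-inlined call (e.g. at -O0) has a symbol to link against. */
inline int twice(int x) { return 2 * x; }
extern int twice(int x);

/* Fix 2: 'static inline' gives the function internal linkage, so the
   compiler emits a local copy whenever a call is not inlined. */
static inline int thrice(int x) { return 3 * x; }
```

Compiling with -fgnu89-inline (or -std=gnu89) restores the pre-GCC-5 GNU inline semantics. Under those semantics a plain inline definition is also emitted as an external definition.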
[Bug middle-end/109821] vect: Different output with -O2 -ftree-loop-vectorize compared to -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821 --- Comment #2 from Xionghu Luo (luoxhu at gcc dot gnu.org) --- (In reply to Andrew Pinski from comment #1) > Two issues which make this undefined. First the unaligned macros still use > aligned types which gcc uses for alignment of the pointer type. Thanks Andrew :), and the second issue is?
[Bug c++/109821] New: vect: Different output with -O2 -ftree-loop-vectorize compared to -O2
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821 Bug ID: 109821 Summary: vect: Different output with -O2 -ftree-loop-vectorize compared to -O2 Product: gcc Version: 12.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: yinyuefengyi at gmail dot com Target Milestone: ---

This test code generates special byte patterns that differ from what memcpy or memmove would produce. It gives different results with -O2 -ftree-loop-vectorize than with -O2. Is this a vectorizer bug? The vectorizer may lack a check that the gap op - src is larger than the vector mode size; here it should only vectorize if op - src > 16.

copy.cpp:
#include #include #include
#define UNALIGNED_LOAD64(_p) (*reinterpret_cast(_p))
#define UNALIGNED_STORE64(_p, _val) (*reinterpret_cast(_p) = (_val))

__attribute__((__noinline__)) static void IncrementalCopyFastPath(const char* src, char* op, int len) {
  while (op - src < 8) {
    UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
    len -= op - src;
    op += op - src;
  }
  while (len > 0) {
    UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
    src += 8;
    op += 8;
    len -= 8;
  }
}

int main () {
  char src[] = "123456789abcdefghijklmnopqrstu";
  char *op = src+12;
  char * dst = op;
  IncrementalCopyFastPath (src, op, 36);
  int i = 0;
  while (i < 36) {printf("%x ", *(dst+i)), i++;}
  printf("\n");
  return 0;
}

$ gcc copy.cpp -O2 -o a.out.good
$ ./a.out.good
30 31 32 33 34 35 36 37 38 39 61 62 30 31 32 33 34 35 36 37 38 39 61 62 30 31 32 33 34 35 36 37 38 39 61 62
$ gcc copy.cpp -O2 -ftree-loop-vectorize -o a.out.bad
$ ./a.out.bad
30 31 32 33 34 35 36 37 38 39 61 62 63 64 65 66 34 35 36 37 38 39 61 62 63 64 65 66 73 74 75 76 38 39 61 62

gimple after t.vect:
IncrementalCopyFastPath.constprop (const char * src, char * op)
{
  ...
  [local count: 118111600]:
  _4 = src_8(D) + 8;
  if (_4 != op_9(D))   // <= should the check be op_9 > src_8 + 16 here?
    goto ; [80.00%]
  else
    goto ; [20.00%]
  ...
}
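Andrew's reply in comment #1 points out that the unaligned macros still dereference aligned types. If GCC may assume both pointers are 8-byte aligned, then op - src must be a multiple of 8. Under that assumption the versioning check src + 8 != op would plausibly be sufficient, and the gap of 12 in this test breaks it. Below is a minimal sketch using the usual memcpy idiom for unaligned access. The function names are hypothetical. It keeps the algorithm from the report and adds a byte-at-a-time reference to compare against.

```c
#include <stdint.h>
#include <string.h>

/* Unaligned 8-byte load/store via memcpy: no alignment is implied by
   the char pointers, and no aliasing rules are broken. */
static uint64_t load64_unaligned(const char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void store64_unaligned(char *p, uint64_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Same algorithm as IncrementalCopyFastPath in the report, with the casts
   replaced.  Requires op > src.  Like the original, it can write several
   bytes past op + len, so the caller must leave slack in the buffer. */
static void incremental_copy_memcpy(const char *src, char *op, int len)
{
    while (op - src < 8) {           /* grow the gap until it is >= 8 */
        store64_unaligned(op, load64_unaligned(src));
        len -= (int)(op - src);
        op += op - src;
    }
    while (len > 0) {                /* now each 8-byte load precedes op */
        store64_unaligned(op, load64_unaligned(src));
        src += 8;
        op += 8;
        len -= 8;
    }
}

/* Byte-at-a-time reference: a forward copy that repeats the
   (op - src)-byte pattern, which the fast path is meant to emulate. */
static void incremental_copy_bytes(const char *src, char *op, int len)
{
    while (len-- > 0)
        *op++ = *src++;
}
```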
[Bug gcov-profile/93680] [GCOV] "do-while" structure in case statement leads to incorrect code coverage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93680 --- Comment #5 from Xionghu Luo (luoxhu at gcc dot gnu.org) --- Patch posted: https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616123.html
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #37 from Xionghu Luo (luoxhu at gcc dot gnu.org) --- https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614932.html
[Bug ipa/107769] [12/13 Regression] -flto with -Os/-O2/-O3 emitted code with gcc 12.x segfaults via mutated global in .rodata since r12-2887-ga6da2cddcf0e959d
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107769 Xionghu Luo (luoxhu at gcc dot gnu.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yinyuefengyi at gmail dot com

--- Comment #5 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
For the case in comment #1: g__r_1 is a global variable that is modified in function hh. ipa-prop, however, thinks it is only loaded through the reference and never modified, and so removes the references in gcc/ipa-prop.cc:propagate_controlled_uses.

.wpa.081i.cp:
g__r_1/6 (g__r_1)
  Type: variable definition analyzed
  Visibility: semantic_interposition prevailing_def_ironly
  References:
  Referring: main/7 (addr) kk.constprop.0/16 (addr) kk.part.0.constprop.0/17 (read)
  Read from file: /tmp/cc3peQfe.o
  Availability: available
  Varpool flags: initialized

.wpa.085i.inline:
ipa-prop: Address IPA constant will reach a load so adding LOAD reference from main/7 to g__r_1/6.
ipa-prop: Removed a reference from main/7 to g__r_1/6.
ipa-prop: Removing cloning-created reference from kk.constprop/16 to g__r_1/6.
...
g__r_1/6 (g__r_1)
  Type: variable definition analyzed
  Visibility: semantic_interposition prevailing_def_ironly
  References:
  Referring: main/7 (read) main/7 (read) kk.part.0.constprop.0/17 (read)
  Read from file: /tmp/cc3peQfe.o
  Availability: available
  Varpool flags: initialized

It seems to be a bug exposed by r12-2887-ga6da2cddcf0e959d. It may actually be caused by r12-2523-g13586172d0b70c, though, since that revision fails to identify globals that are not read-only...
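Below is a minimal C sketch of the shape described in this comment: a global whose address is passed to a callee, which reads it and then passes the pointer on to a function that writes through it. It is a hypothetical reduction with invented names, not the testcase from comment #1. If IPA wrongly concluded that the global is only read, the variable could be treated as read-only and placed in .rodata, and the store would then fault as in the bug summary. Building it with -flto -O2 may exercise similar IPA paths.

```c
/* Hypothetical global that is written only through a pointer in a callee. */
static int g_value = 1;

/* Store through the pointer: the modification IPA must not miss. */
static void bump(int *p)
{
    *p += 1;
}

/* Reads through the pointer, then hands it to a function that writes it. */
static int read_then_bump(int *p)
{
    int before = *p;
    bump(p);
    return before;
}

/* Only the address of g_value escapes to the callees. */
int run_example(void)
{
    int before = read_then_bump(&g_value);
    return before * 10 + g_value;   /* first call: 1 * 10 + 2 == 12 */
}
```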
[Bug gcov-profile/93680] [GCOV] "do-while" structure in case statement leads to incorrect code coverage
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93680 Xionghu Luo (luoxhu at gcc dot gnu.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yinyuefengyi at gmail dot com

--- Comment #4 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
Checking the difference between the two switch cases: both call split_edge to generate an empty latch bb after the loop.

profile.cc:
          /* Edge with goto locus might get wrong coverage info unless
             it is the only edge out of BB.
             Don't do that when the locuses match, so
             if (blah) goto something;
             is not computed twice.  */
          if (last
              && gimple_has_location (last)
              && !RESERVED_LOCATION_P (e->goto_locus)
              && !single_succ_p (bb)
              && (LOCATION_FILE (e->goto_locus)
                  != LOCATION_FILE (gimple_location (last))
                  || (LOCATION_LINE (e->goto_locus)
                      != LOCATION_LINE (gimple_location (last)))))
            {
              basic_block new_bb = split_edge (e);
              edge ne = single_succ_edge (new_bb);
              ne->goto_locus = e->goto_locus;
            }

However, the second case fails to find an edge from dest_prev to dest when edge_in forms a self loop (edge_in->src == edge_in->dest):

p_6 = 0; q_7 = 0; switch (s_8(D)) [INV], case 0: [INV], case 1: [INV]> : # n_1 = PHI # p_3 = PHI : p_13 = p_3 + 1; n_14 = n_1 + -1; if (n_14 != 0) goto ; [INV] else goto ; [INV] : goto ; [100.00%] : _15 = p_13; goto ; [INV] : : # n_2 = PHI # p_4 = PHI : p_10 = p_4 + 1; n_11 = n_2 + -1; if (n_11 != 0) goto ; [INV] else goto ; [INV] : _12 = p_10; goto ; [INV]

Note that the two loops have different latch bb locations. So, should we add a check like the following, so that for a self loop the loop bb itself is returned as after_bb?
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index a9fcc7fd050..6fa1d83d366 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -3009,7 +3009,7 @@ split_edge_bb_loc (edge edge_in)
   if (dest_prev)
     {
       edge e = find_edge (dest_prev, dest);
-      if (e && !(e->flags & EDGE_COMPLEX))
+      if ((e && !(e->flags & EDGE_COMPLEX)) || edge_in->src == edge_in->dest)
        return edge_in->src;
     }
   return dest_prev;

With the fix, small.c.069i.profile:

p_6 = 0; q_7 = 0; switch (s_8(D)) [INV], case 0: [INV], case 1: [INV]> : # n_1 = PHI # p_3 = PHI : p_13 = p_3 + 1; n_14 = n_1 + -1; if (n_14 != 0) goto ; [INV] else goto ; [INV] : goto ; [100.00%] : _15 = p_13; goto ; [INV] : # n_2 = PHI # p_4 = PHI : p_10 = p_4 + 1; n_11 = n_2 + -1; if (n_11 != 0) goto ; [INV] else goto ; [INV] : goto ; [100.00%] : _12 = p_10; goto ; [INV]

cat small.c.gcov:
        -:    0:Source:small.c
        -:    0:Graph:small.gcno
        -:    0:Data:small.gcda
        -:    0:Runs:1
        2:    1:int f(int s, int n)
        -:    2:{
        2:    3:  int p = 0;
        2:    4:  int q = 0;
        -:    5:
        2:    6:  switch (s)
        -:    7:{
        5:    8:case 0:
        5:    9:  do { p++; } while (--n);
        1:   10:  return p;
        -:   11:
        5:   12:case 1:
        5:   13:  do { p++; } while (--n);
        1:   14:  return p;
        -:   15:}
        -:   16:
        #:   17:  return 0;
        -:   18:}
        -:   19:
        1:   20:int main() { f(0, 5); f(1, 5);}

Is this a reasonable fix? If so, I could prepare a patch and send it to the mailing list for review.
[Bug gcov-profile/97923] [GCOV]Wrong code coverage for multiple expressions with Logical OR Operator at multiple lines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97923 --- Comment #6 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
The change below fixes the incorrect location:

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 96845154a92..2dc8608dedf 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3915,7 +3915,8 @@ shortcut_cond_r (tree pred, tree *true_label_p, tree *false_label_p,
       false_label_p = _label;

       /* Keep the original source location on the first 'if'.  */
-      t = shortcut_cond_r (TREE_OPERAND (pred, 0), NULL, false_label_p, locus);
+      tree op0 = TREE_OPERAND (pred, 0);
+      t = shortcut_cond_r (op0, NULL, false_label_p, EXPR_LOCATION (op0));
       append_to_statement_list (t, );

       /* Set the source location of the && on the second 'if'.  */
@@ -3938,7 +3939,8 @@ shortcut_cond_r (tree pred, tree *true_label_p, tree *false_label_p,
       true_label_p = _label;

       /* Keep the original source location on the first 'if'.  */
-      t = shortcut_cond_r (TREE_OPERAND (pred, 0), true_label_p, NULL, locus);
+      tree op0 = TREE_OPERAND (pred, 0);
+      t = shortcut_cond_r (op0, true_label_p, NULL, EXPR_LOCATION (op0));
       append_to_statement_list (t, );

This produces the expected block line info and coverage:

gcov-dump test.gcno -lp:
test.gcno:  583:0145:  35:LINES
test.gcno:  595:  block 2:`test.c':1, 3   <= changed from 5 to 3
test.gcno:  626:0145:  31:LINES
test.gcno:  638:  block 3:`test.c':3
test.gcno:  665:0145:  31:LINES
test.gcno:  677:  block 4:`test.c':4
test.gcno:  704:0145:  31:LINES
test.gcno:  716:  block 5:`test.c':4
test.gcno:  743:0145:  31:LINES
test.gcno:  755:  block 6:`test.c':5
test.gcno:  782:0145:  31:LINES
test.gcno:  794:  block 7:`test.c':5
test.gcno:  821:0145:  31:LINES
test.gcno:  833:  block 8:`test.c':5
test.gcno:  860:0145:  31:LINES
test.gcno:  872:  block 9:`test.c':5
test.gcno:  899:0145:  31:LINES
test.gcno:  911:  block 10:`test.c':5
test.gcno:  938:0145:  31:LINES
test.gcno:  950:  block 11:`test.c':5

cat test.c.gcov:
        -:    0:Source:test.c
        -:    0:Graph:test.gcno
        -:    0:Data:test.gcda
        -:    0:Runs:1
        1:    1:int foo(char c)
        -:    2:{
       1*:    3:  return ((c >= 'A' && c <= 'Z')
       1*:    4:          || (c >= 'a' && c <= 'z')
       1*:    5:          || (c >= '0' && c <= '0'));
        -:    6:}
        -:    7:
        1:    8:int main() { return foo('0'); }
[Bug gcov-profile/97923] [GCOV]Wrong code coverage for multiple expressions with Logical OR Operator at multiple lines
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97923 Xionghu Luo (luoxhu at gcc dot gnu.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yinyuefengyi at gmail dot com

--- Comment #5 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
gcov-dump test.gcno -lp
test.gcno:  583:0145:  35:LINES
test.gcno:  595:  block 2:`test.c':1, 5
test.gcno:  626:0145:  31:LINES
test.gcno:  638:  block 3:`test.c':3
test.gcno:  665:0145:  31:LINES
test.gcno:  677:  block 4:`test.c':4
test.gcno:  704:0145:  31:LINES
test.gcno:  716:  block 5:`test.c':4
test.gcno:  743:0145:  31:LINES
test.gcno:  755:  block 6:`test.c':5
test.gcno:  782:0145:  31:LINES
test.gcno:  794:  block 7:`test.c':5
test.gcno:  821:0145:  31:LINES
test.gcno:  833:  block 8:`test.c':5
test.gcno:  860:0145:  31:LINES
test.gcno:  872:  block 9:`test.c':5
test.gcno:  899:0145:  31:LINES
test.gcno:  911:  block 10:`test.c':5
test.gcno:  938:0145:  31:LINES
test.gcno:  950:  block 11:`test.c':5

It seems that the location of block 2 is incorrect. The gcno shows block 2 on lines 1 and 5, but it actually belongs to lines 1 and 3, since block 2 maps only to the source code c >= 'A'?

int foo (char c) { int iftmp.0; int D.2744; int iftmp.0_1; int iftmp.0_3; int iftmp.0_4; int _5; : if (c_2(D) > 64) goto ; [INV] else goto ; [INV] : if (c_2(D) <= 90) goto ; [INV] else goto ; [INV] : if (c_2(D) > 96) goto ; [INV] else goto ; [INV]
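It can help to write out the control flow that the gimplifier builds for the || / && chain. The GIMPLE dump above has one conditional jump per comparison, and c_2(D) > 64 is c >= 'A', the operand on line 3. Each comparison gets its own block, which is why each block should carry the line of its own operand. Below is a hand-written C sketch of that shape, using the hypothetical name foo_lowered; it illustrates the structure and is not GCC output.

```c
/* The function from the report. */
static int foo(char c)
{
    return ((c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '0'));
}

/* One conditional jump per comparison, roughly one basic block each.
   The first test (c >= 'A', i.e. c > 64) comes from line 3 of test.c. */
static int foo_lowered(char c)
{
    if (c >= 'A') goto upper_end; else goto lower;
upper_end:
    if (c <= 'Z') goto yes; else goto lower;
lower:
    if (c >= 'a') goto lower_end; else goto digit;
lower_end:
    if (c <= 'z') goto yes; else goto digit;
digit:
    if (c >= '0') goto digit_end; else goto no;
digit_end:
    if (c <= '0') goto yes; else goto no;
yes:
    return 1;
no:
    return 0;
}
```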
[Bug tree-optimization/108351] [13 Regression] Dead Code Elimination Regression at -O3 since r13-4240-gfeeb0d68f1c708
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108351 --- Comment #4 from Xionghu Luo (luoxhu at gcc dot gnu.org) --- > early inline pass inlines the two calls with C front-end but fails to inline > them with C++ front-end due to "growth 8 exceeds --param > early-inlining-insns divided by number of calls". > Swap C and C++ here...
[Bug tree-optimization/108351] [13 Regression] Dead Code Elimination Regression at -O3 since r13-4240-gfeeb0d68f1c708
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108351 Xionghu Luo (luoxhu at gcc dot gnu.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yinyuefengyi at gmail dot com

--- Comment #3 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
(In reply to Andrew Pinski from comment #2)
> I noticed that with the C++ front-end early inline inlines f into main but
> with the C front-end it does not ...

The C++ front end generates different call statements from the C front end:

< D.3747 = f (7, 7);
< D.3748 = f (9, 7);
---
> f (7, 7);
> f (9, 7);

An extra return_cost is then added for the C++ front end:

< D.3747 = f (7, 7);
< freq:1.00 size: 4 time: 13
< D.3748 = f (9, 7);
< freq:1.00 size: 4 time: 13
---
> f (7, 7);
> freq:1.00 size: 3 time: 12
> f (9, 7);
> freq:1.00 size: 3 time: 12

The early inline pass inlines the two calls with the C front end. It fails to inline them with the C++ front end because "growth 8 exceeds --param early-inlining-insns divided by number of calls":

gcc/ipa-inline.cc:747: growth * (n + 1) > early_inlining_insns
gcc/opts.cc:687: { OPT_LEVELS_3_PLUS, OPT__param_early_inlining_insns_, NULL, 14 },

I tried reverting r13-4240-gfeeb0d68f1c708, but the two calls still exist; r13-4686-g095a13eda2caf6 also has to be reverted. r13-4686 mentions that IPA-SRA detects whether parameters can be removed. Apparently that does not work well now if constants are not propagated before IPA-SRA. It seems IPA-SRA only removes locally_unused parameters, but 'm' is not locally used and the constants from the caller are not visible to IPA-SRA?
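The numbers quoted above are enough to check the rejection. The sketch below models only the quoted expression from gcc/ipa-inline.cc. In it, n stands for the number of calls referred to by the dump message, and 14 is the -O3 default of --param early-inlining-insns from gcc/opts.cc. The function name is hypothetical, and the real decision applies other checks as well.

```c
#include <stdbool.h>

/* Hypothetical model of the quoted test: when it is true, the call is
   not early-inlined. */
static bool exceeds_early_inline_limit(int growth, int n, int early_inlining_insns)
{
    return growth * (n + 1) > early_inlining_insns;
}
```

With growth 8 and n = 1, 8 * 2 = 16 > 14, so the call is rejected. With growth 7, 7 * 2 = 14 is not greater than 14. A one-unit difference in per-call size, such as 4 versus 3 above, can therefore plausibly tip the decision.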
[Bug rtl-optimization/106707] [13 Regression] ICE: in cselib_record_set, at cselib.cc:2687 with -Oz -g -fno-cprop-registers -fno-dce since r13-1945-gfc6ef90173478521
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106707 --- Comment #4 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
Maybe guard the pattern like this:

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 58fcc382fa2..2a9d70da6d0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3045,6 +3045,7 @@ (define_peephole2
   "optimize_size > 1
    && (REGNO (operands[0]) == AX_REG
        || REGNO (operands[1]) == AX_REG)
+   && REGNO (operands[0]) != REGNO (operands[1])
    && optimize_insn_for_size_p ()
    && peep2_reg_dead_p (1, operands[1])"
   [(parallel [(set (match_dup 0) (match_dup 1))
[Bug lto/100010] [10/11/12/13 Regression] ICE in lto_output_node, at lto-cgraph.c:447 (-fdevirtualize-at-ltrans) since r6-6384-gceda2c69d5219719
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100010 --- Comment #8 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
At the ICE point, node->clone_of is set, but the local clone_of is NULL:

(gdb) p clone_of
$114 = (cgraph_node *) 0x0
(gdb) p node->clone_of
$115 = (cgraph_node *) 0x76664bb0
(gdb) pnode node->clone_of
_ZN12ErrorHandler8decorateERK6String/0 (decorate)
  Type: function
  Visibility: semantic_interposition virtual
  next sharing asm name: 0
  References:
  Referring:
  Availability: not_available
  Function flags:
  Called by:
  Calls:
(gdb) pnode node
_ZN12ErrorHandler8decorateERK6String/0 (decorate)
  Type: function definition analyzed
  Visibility: semantic_interposition virtual
  previous sharing asm name: 0
  References:
  Referring:
  Read from file: a-pr10010.o
  Function decorate/0 is inline copy in decorate/1
  Clone of _ZN12ErrorHandler8decorateERK6String/0
  Availability: local
  Unit id: 1
  Function flags: count:1073741824 (estimated locally) local
  Called by: _ZN20LandmarkErrorHandler8decorateERK6String/1 (inlined) (1073741824 (estimated locally),1.00 per call)
  Calls: _ZN6StringD1Ev/37 (1073741824 (estimated locally),1.00 per call) _ZN6StringC1Ec/38 (1073741824 (estimated locally),1.00 per call) (can throw external)
   Polymorphic indirect call of type struct ErrorHandler token:0(1073741824 (estimated locally),1.00 per call) (can throw external) of param:0 (vptr maybe changed)
   num speculative call targets: 0
     Outer type (dynamic):struct ErrorHandler (or a derived type) offset 0

This simple change could fix it, but I am not sure whether it is correct.
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 6d9c36ea8b6..44a33a2af23 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -448,7 +448,7 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
   if (clone_of && !lto_symtab_encoder_encode_body_p (encoder, ultimate_clone_of))
     clone_of = NULL;

-  if (tag == LTO_symtab_analyzed_node)
+  if (tag == LTO_symtab_analyzed_node && !flag_ltrans_devirtualize)
     gcc_assert (clone_of || !node->clone_of);
   if (!clone_of)
     streamer_write_hwi_stream (ob->main_stream, LCC_NOT_FOUND);
[Bug ipa/91771] Optimization fails to inline final override.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91771 Xionghu Luo (luoxhu at gcc dot gnu.org) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yinyuefengyi at gmail dot com

--- Comment #4 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
Just curious about the 021t.ssa dump...

int f (struct Derived & d) { struct Base * _1; int _5; int _6; : _1 = _2(D)->D.2395; _5 = Base::foo (_1, 40); _6 = _5; return _6; }

d_2 is a reference to an instance of type "struct Derived". Is the conversion of "_1" to type "struct Base *" here therefore unnecessary? Also note that the early inline pass inlined Base::foo into f but failed to devirtualize the virtual call inside it. Would it be possible to devirtualize the call if "struct Derived * _1" were produced in the ssa pass?
[Bug ipa/101839] [10/11/12/13 Regression] Hang in C++ code with -fdevirtualize
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839 --- Comment #8 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
The relationship is:

A                 A::type
|-- BA            BA::type
|   `-- CBA       CBA::type
`-- CA            CA::type

Classes CA and CBA are final, and the functions CA::type and BA::type are also final. In function possible_polymorphic_call_targets, the "DECL_FINAL_P (target)" check for "target" BA::type is not accurate enough. There may be classes like CBA, derived from BA and with instances, for which the walk must continue recursively in possible_polymorphic_call_targets_1 to record_target_from_binfo.

  if (target)
    {
      /* In the case we get complete method, we don't need
         to walk derivations.  */
      if (DECL_FINAL_P (target))
        context.maybe_derived_type = false;
    }

So, should we fix this with the change below, which stops walking derivations only when the target is final and its class outer_type->type is also final?

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 412ca14f66b..77f9b268e86 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -3188,7 +3188,9 @@ possible_polymorphic_call_targets (tree otr_type,

       /* In the case we get complete method, we don't need
          to walk derivations.  */
-      if (target && DECL_FINAL_P (target))
+      if (target && TREE_CODE (target) == FUNCTION_DECL && DECL_FINAL_P (target)
+          && RECORD_OR_UNION_TYPE_P (outer_type->type)
+          && TYPE_FINAL_P (outer_type->type))
         context.speculative_maybe_derived_type = false;
       if (type_possibly_instantiated_p (speculative_outer_type->type))
         maybe_record_node (nodes, target, , can_refer, _complete);
@@ -3233,7 +3235,9 @@ possible_polymorphic_call_targets (tree otr_type,
     {
       /* In the case we get complete method, we don't need
          to walk derivations.  */
-      if (DECL_FINAL_P (target))
+      if (TREE_CODE (target) == FUNCTION_DECL && DECL_FINAL_P (target)
+          && RECORD_OR_UNION_TYPE_P (outer_type->type)
+          && TYPE_FINAL_P (outer_type->type))
         context.maybe_derived_type = false;
     }
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #32 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
Thanks for all the information! It made me realize that "native RTL should be endian-independent". Big-endian and little-endian platforms should therefore generate the same (vec_select (vec_concat (R0 R1) [0 4 1 5])) for altivec_vmrghw. The combine pass can then do the "nested vec_select" optimization correctly, and the endianness handling is left to the final assembly generation. That is the benefit of removing the UNSPECs. My culprit patch did change the LE representation; sorry for the silly mistake... The fix patch is attached. If it is reasonable, I will continue refining it and send it to the mailing list.
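If RTL element numbering is endian-independent, as this comment concludes, then folding a nested vec_select over a vec_concat reduces to index arithmetic. Below is a minimal C sketch with hypothetical names; it is a model, not simplify-rtx code. It reproduces the fold R150[3] -> R141[3] from the combine log in comment #19, where selector [2 6 3 7] is applied to (vec_concat r146 r141).

```c
/* Reference to one element of one vec_concat operand. */
struct elt_ref {
    int op;   /* 0 = first vec_concat operand, 1 = second */
    int idx;  /* element index within that operand */
};

/* Element i of (vec_concat a b), where a and b each have n elements:
   a[i] for i < n, otherwise b[i - n], independent of target endianness. */
static struct elt_ref concat_elt(int i, int n)
{
    struct elt_ref r;
    r.op = i < n ? 0 : 1;
    r.idx = i < n ? i : i - n;
    return r;
}

/* Fold (vec_select (vec_select (vec_concat a b) sel) [k]) to one element. */
static struct elt_ref fold_nested_select(const int *sel, int k, int n)
{
    return concat_elt(sel[k], n);
}
```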
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #31 from Xionghu Luo (luoxhu at gcc dot gnu.org) --- Created attachment 53408 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53408=edit 0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removing-
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #20 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
As a further check, I manually edited the generated assembly, changing the source register and the index of each vspltw, to verify:

luoxhu@gcc135 build $ diff q.bad.s q.good.s -U12
--- q.bad.s     2022-08-03 06:30:08.298451116 +
+++ q.good.s    2022-08-03 06:30:52.887250451 +
@@ -18,31 +18,31 @@
         addi 2,2,.TOC.-.LCF0@l
         .localentry _Z3fooPhPjDv4_jS1_S1_S1_,.-_Z3fooPhPjDv4_jS1_S1_S1_
         mflr %r0
         std %r0,16(%r1)
         std %r30,-16(%r1)
         std %r31,-8(%r1)
         stdu %r1,-112(%r1)
         .cfi_def_cfa_offset 112
         .cfi_offset 65, 16
         .cfi_offset 30, -16
         .cfi_offset 31, -8
         mr %r30,%r3
-        vspltw %v0,%v2,0
+        vspltw %v0,%v5,3
         mfvsrwz %r7,%vs32
-        vspltw %v0,%v3,0
+        vspltw %v0,%v4,3
         mfvsrwz %r6,%vs32
-        vspltw %v0,%v4,0
+        vspltw %v0,%v3,3
         mfvsrwz %r5,%vs32
-        vspltw %v0,%v5,0
+        vspltw %v0,%v2,3
         mfvsrwz %r31,%vs32
         rldicl %r7,%r7,0,32
         rldicl %r6,%r6,0,32
         rldicl %r5,%r5,0,32
         rldicl %r4,%r31,0,32
         addis %r3,%r2,.LC0@toc@ha
         addi %r3,%r3,.LC0@toc@l
         bl printf
         nop
         stb %r31,0(%r30)
         addi %r1,%r1,112
         .cfi_def_cfa_offset 0
luoxhu@gcc135 build $ gcc q.good.s -o q.good
luoxhu@gcc135 build $ ./q.good
B0: 41fcef98, 91648e8b,7dca18c6,61707865

This means that both the register and the index are used incorrectly in the LE nested vec_select optimization.
[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069 --- Comment #19 from Xionghu Luo (luoxhu at gcc dot gnu.org) ---
(In reply to Xionghu Luo (luo...@gcc.gnu.org) from comment #15)
> In combine: vec_select(vec_concat and the followed vec_select are combined
> to a single extract instruction, which seems reasonable for both LE and BE?
>
> R146: 0 1 2 3
> R141: 4 5 6 7
> R150: 2 6 3 7    // vec_select(vec_concat(r146:V4SI,r141:V4SI),[2 6 3 7])
> R151: R150[3]    // vec_select(r150:V4SI,3)
>
> =>
>
> R151: R141[3]    // vec_select(r141:V4SI,3)
>
>
>
> Trying 21 -> 24:
>    21: r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
>       REG_DEAD r146:V4SI
>       REG_DEAD r141:V4SI
>    24: {r151:SI=vec_select(r150:V4SI,parallel);clobber scratch;}
> Failed to match this instruction:
> (parallel [
>         (set (reg:SI 151)
>             (vec_select:SI (reg:V4SI 141)
>                 (parallel [
>                         (const_int 3 [0x3])
>                     ])))
>         (clobber (scratch:SI))
>         (set (reg:V4SI 150)
>             (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
>                     (reg:V4SI 141))
>                 (parallel [
>                         (const_int 2 [0x2])
>                         (const_int 6 [0x6])
>                         (const_int 3 [0x3])
>                         (const_int 7 [0x7])
>                     ])))
>     ])
> Failed to match this instruction:
> (parallel [
>         (set (reg:SI 151)
>             (vec_select:SI (reg:V4SI 141)
>                 (parallel [
>                         (const_int 3 [0x3])
>                     ])))
>         (set (reg:V4SI 150)
>             (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
>                     (reg:V4SI 141))
>                 (parallel [
>                         (const_int 2 [0x2])
>                         (const_int 6 [0x6])
>                         (const_int 3 [0x3])
>                         (const_int 7 [0x7])
>                     ])))
>     ])
> Successfully matched this instruction:
> (set (reg:V4SI 150)
>     (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
>             (reg:V4SI 141))
>         (parallel [
>                 (const_int 2 [0x2])
>                 (const_int 6 [0x6])
>                 (const_int 3 [0x3])
>                 (const_int 7 [0x7])
>             ])))
> Successfully matched this instruction:
> (set (reg:SI 151)
>     (vec_select:SI (reg:V4SI 141)
>         (parallel [
>                 (const_int 3 [0x3])
>             ])))
> allowing combination of insns 21 and 24
> original costs 4 + 4 = 8
> replacement costs 4 + 4 = 8
> modifying insn i2    21:
>       r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
>       REG_DEAD r146:V4SI
> deferring rescan insn with uid = 21.
> modifying insn i3    24: {r151:SI=vec_select(r141:V4SI,parallel);clobber scratch;}
>       REG_DEAD r141:V4SI
> deferring rescan insn with uid = 24.
>
>
> I guess the previous unspec implementation bypassed the LE + LE swap check,
> so now in split2, we should generate vextuwlx instead of vextuwrx on little
> endian?

This nested vec_select+vec_select+vec_concat optimization was introduced by Uros in simplify-rtx.c for PR32661. Unfortunately, it only works for Power BE platforms. Disabling that piece of code makes the case work, because the nested vec_select optimizations are then not combined...

For Power LE, first:

Trying 21 -> 24:
R146: 3 2 1 0
R141: 7 6 5 4
R150: 7 3 6 2    // vec_select(vec_concat(r146:V4SI,r141:V4SI),[2 6 3 7])
R151: R150[3]    // vec_select(r150:V4SI,3)

=>

currently:
R151: R141[3]    // vec_select(r141:V4SI,3)

but it should be:
R151: R146[3]    // vec_select(r146:V4SI,3)

That is, the current result is:
R151: R150[3] -> R141[3]
R153: R150[2] -> R146[3]
R155: R150[1] -> R141[2]
R157: R150[0] -> R146[2]

whereas after the first nested vec_select optimization it should be:
R151: R150[3] -> R146[3]
R153: R150[2] -> R141[3]
R155: R150[1] -> R146[2]
R157: R150[0] -> R141[2]

Some little-endian checks and a swap (swapping op00 and op01) could achieve this.

Second, there is another "nested vec_select" optimization that produces R151 = R165[3]:

Trying 21 -> 26:
...
R146 R165 R163 [7 3 6 2]
R151: R146[3] => R165[3] (this is wrong!)

Here R162, R163, R164 and R165 hold the input values R0, R1, R2 and R3, so the vsx_extract_v4si_di_p9 index should be "0" instead of "3".
The correct result should be:
R151: R165[0]
R153: R164[0]
R155: R163[0]
R157: R162[0]

(insn 44 2 4 2 (set (reg:V4SI 162)
        (reg:V4SI 66 2 [ R0 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
     (expr_list:REG_DEAD (reg:V4SI 66 2 [ R0 ])
        (nil)))
(note 4 44 45 2 NOTE_INSN_DELETED)
(insn 45 4 5 2 (set (reg:V4SI 163)
        (reg:V4SI 67 3 [ R1 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
     (expr_list:REG_DEAD (reg:V4SI 67 3 [ R1 ])
        (nil)))
(note 5 45 46 2 NOTE_INSN_DELETED)
(insn 46 5 6 2 (set (reg:V4SI 164)
        (reg:V4SI 68 4 [ R2 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}