[Bug ipa/114321] New: [11 regression] ipa/modref: incorrect result with O2 since r11-3308

2024-03-12 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114321

Bug ID: 114321
   Summary: [11 regression] ipa/modref: incorrect result with O2
since r11-3308
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yinyuefengyi at gmail dot com
  Target Milestone: ---

https://godbolt.org/z/hz4E1q4dK

Though the code is somewhat flawed (a uint64_t pointer is passed to a function
and modified through a uint32_t pointer), the function call is removed by the
fre1 pass:


ipa-modref: call stmt MurmurHash3_x86_32 (_8, _2, 123456, &ret);
ipa-modref: call to void MurmurHash3_x86_32(const void*, int, uint32_t,
uint64_t*)/554 does not clobber ref: ret alias sets: 46->46
Setting value number of _10 to 0 (changed)
Value numbering stmt = ret ={v} {CLOBBER(eol)};
Setting value number of .MEM_11 to .MEM_11 (changed)
Value numbering stmt = return _10;
marking outgoing edge 2 -> 1 executable
RPO iteration over 1 blocks visited 1 blocks in total discovering 1 executable
blocks iterating 1.0 times, a block was visited max. 1 times
RPO tracked 9 values available at 3 locations and 9 lattice elements
Replaced MEM[(const struct basic_string *)trace_id_6(D)]._M_dataplus._M_p with
_7 in all uses of _8 = MEM[(const struct basic_string
*)trace_id_6(D)]._M_dataplus._M_p;
Replaced ret with 0 in all uses of _10 = ret;
Removing dead stmt _10 = ret;
Removing dead stmt _8 = MEM[(const struct basic_string
*)trace_id_6(D)]._M_dataplus._M_p;


I am not sure whether this code is valid; it worked before GCC 11.  Disabling
the optimization with -fno-ipa-modref or -fno-strict-aliasing makes it work.
Could you please take a look?
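
One way to sidestep the type mismatch described above is to give the result object the same type the callee writes, and to widen it by value afterwards. The sketch below is only an illustration: hash32_into and hash_widened are hypothetical names (the real MurmurHash3_x86_32 source is only available through the Godbolt link), and the multiplication is arbitrary filler rather than the actual hash.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a routine that produces a 32-bit hash.
// Writing the result with memcpy keeps the store valid regardless of the
// declared type of the object that `out` points to.
void hash32_into(std::uint32_t value, void *out) {
    assert(out != nullptr);
    std::uint32_t h = value * 2654435761u;  // arbitrary mixing, illustration only
    std::memcpy(out, &h, sizeof h);
}

// Caller that avoids the mismatch: the result object has the type the callee
// writes, and the widening to 64 bits happens by value, not through a pointer.
std::uint64_t hash_widened(std::uint32_t value) {
    std::uint32_t h = 0;
    hash32_into(value, &h);
    return h;
}
```

With this shape there is no store through a uint32_t lvalue into a uint64_t object, so the result does not depend on -fno-strict-aliasing.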

[Bug middle-end/88781] [meta-bug] bogus/missing -Wstringop-truncation warnings

2023-06-07 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88781
Bug 88781 depends on bug 110151, which changed state.

Bug 110151 Summary: warning: 'strncpy' output truncated copying 10 bytes from a 
string of length 26 [-Wstringop-truncation]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110151

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

[Bug tree-optimization/107473] Unexpected warning / error with strncpy

2023-06-07 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107473

Xionghu Luo (luoxhu at gcc dot gnu.org)  changed:

   What|Removed |Added

 CC||yinyuefengyi at gmail dot com

--- Comment #2 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
*** Bug 110151 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/110151] warning: 'strncpy' output truncated copying 10 bytes from a string of length 26 [-Wstringop-truncation]

2023-06-07 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110151

Xionghu Luo (luoxhu at gcc dot gnu.org)  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
duplicate.

*** This bug has been marked as a duplicate of bug 107473 ***

[Bug tree-optimization/110151] New: warning: 'strncpy' output truncated copying 10 bytes from a string of length 26 [-Wstringop-truncation]

2023-06-06 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110151

Bug ID: 110151
   Summary: warning: 'strncpy' output truncated copying 10 bytes
from a string of length 26 [-Wstringop-truncation]
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yinyuefengyi at gmail dot com
  Target Milestone: ---

For the two cases below (https://godbolt.org/z/5rbMTeqW9), are these false
positives? The warnings seem unnecessary because:

for foo1:
 memset has already cleared the memory;

for foo2:
 although 'dest[11] = '\0';' is not the statement immediately after strncpy
(the next_stmt), it does set the last element to NUL.

#include <stdio.h>
#include <string.h>

int foo1 () {
    char src[40];
    char dest[12];

    memset(dest, '\0', sizeof(dest));
    strcpy(src, "This is tutorialspoint.com");
    strncpy(dest, src, 10);

    printf("%s", dest);
    return(0);
}

char a;
int foo2 () {
    char src[40];
    char dest[12];

    strcpy(src, "This is tutorialspoint.com");
    strncpy(dest, src, 10);
    a = dest[0];
    dest[11] = '\0';

    printf("%s", dest);
    return(0);
}
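
For comparison, here is a minimal sketch of the form in which the terminator is written directly after the copy, which is the situation the report says the warning looks for. copy_truncated is a hypothetical helper, not something from the report, and whether a given GCC version stays silent on it is not claimed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies at most dst_size - 1 bytes from src and always terminates dst.
void copy_truncated(char *dst, std::size_t dst_size, const char *src) {
    assert(dst != nullptr && src != nullptr && dst_size > 0);
    std::strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';  // terminator written right after the copy
}
```

Unlike foo2, every byte of the destination is defined afterwards, because strncpy pads with NUL when the source is shorter and the last byte is set explicitly.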

[Bug c/110048] New: undefined reference when build with O0

2023-05-31 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110048

Bug ID: 110048
   Summary: undefined reference when build with O0
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yinyuefengyi at gmail dot com
  Target Milestone: ---

The case below has failed to link at -O0 since GCC 5.1. Is it a regression?
Clang, though, has always failed to link it...
The case links successfully with -O1 and above, or with 'inline' removed.

https://godbolt.org/z/9PEhWrov8

inline void foo(void)
{
}

int main(void)
{
  foo();
}

[Bug middle-end/109821] vect: Different output with -O2 -ftree-loop-vectorize compared to -O2

2023-05-11 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821

--- Comment #2 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
(In reply to Andrew Pinski from comment #1)
> Two issues which make this undefined. First the unaligned macros still use
> aligned types which gcc uses for alignment of the pointer type.

Thanks Andrew :), and the second issue is?
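
Regarding the first issue (the macros dereference a pointer to an aligned 64-bit type), a common alternative is to go through memcpy, which makes no alignment assumption about the address. This is a sketch only; unaligned_load64 and unaligned_store64 are hypothetical replacements for the macros in the test case, not code from the report.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads 8 bytes from an arbitrarily aligned address.
inline std::uint64_t unaligned_load64(const void *p) {
    assert(p != nullptr);
    std::uint64_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Writes 8 bytes to an arbitrarily aligned address.
inline void unaligned_store64(void *p, std::uint64_t v) {
    assert(p != nullptr);
    std::memcpy(p, &v, sizeof v);
}
```

Compilers typically turn such fixed-size memcpy calls into a single load or store on targets that allow unaligned access, so the helpers need not cost more than the cast-based macros.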

[Bug c++/109821] New: vect: Different output with -O2 -ftree-loop-vectorize compared to -O2

2023-05-11 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109821

Bug ID: 109821
   Summary: vect: Different output with -O2 -ftree-loop-vectorize
compared to -O2
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yinyuefengyi at gmail dot com
  Target Milestone: ---

This test code aims to generate special copy patterns that differ from memcpy
or memmove. It produces different results with -O2 -ftree-loop-vectorize than
with -O2. Is this a vectorizer bug, namely a missing check that the gap between
op and src is larger than the vector mode size (that is, only vectorize if
op - src > 16)?

copy.cpp:

#include <stdint.h>
#include <stdio.h>

#define UNALIGNED_LOAD64(_p) (*reinterpret_cast<const uint64_t *>(_p))
#define UNALIGNED_STORE64(_p, _val) (*reinterpret_cast<uint64_t *>(_p) = (_val))

__attribute__((__noinline__))
static void IncrementalCopyFastPath(const char* src, char* op, int len) {
    while (op - src < 8) {
        UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
        len -= op - src;
        op += op - src;
    }
    while (len > 0) {
        UNALIGNED_STORE64(op, UNALIGNED_LOAD64(src));
        src += 8;
        op += 8;
        len -= 8;
    }
}

int main ()
{
  char src[] = "123456789abcdefghijklmnopqrstu";
  char *op = src+12;
  char * dst = op;
  IncrementalCopyFastPath (src, op, 36);
  int i = 0;
  while (i < 36)
    {printf("%x ", *(dst+i)), i++;}
  printf("\n");
  return 0;
}


$ gcc copy.cpp -O2 -o a.out.good
$ ./a.out.good
30 31 32 33 34 35 36 37 38 39 61 62 30 31 32 33 34 35 36 37 38 39 61 62 30 31
32 33 34 35 36 37 38 39 61 62
$ gcc copy.cpp -O2 -ftree-loop-vectorize  -o a.out.bad
$ ./a.out.bad
30 31 32 33 34 35 36 37 38 39 61 62 63 64 65 66 34 35 36 37 38 39 61 62 63 64
65 66 73 74 75 76 38 39 61 62


gimple after t.vect:

IncrementalCopyFastPath.constprop (const char * src, char * op)
{
...
   [local count: 118111600]:
  _4 = src_8(D) + 8;
  if (_4 != op_9(D))// <=  the check should be op_9 > src_8 + 16 here?
goto ; [80.00%]
  else
goto ; [20.00%]
...
}

[Bug gcov-profile/93680] [GCOV] "do-while" structure in case statement leads to incorrect code coverage

2023-05-05 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93680

--- Comment #5 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616123.html

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2023-03-30 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #37 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614932.html

[Bug ipa/107769] [12/13 Regression] -flto with -Os/-O2/-O3 emitted code with gcc 12.x segfaults via mutated global in .rodata since r12-2887-ga6da2cddcf0e959d

2023-03-29 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107769

Xionghu Luo (luoxhu at gcc dot gnu.org)  changed:

   What|Removed |Added

 CC||yinyuefengyi at gmail dot com

--- Comment #5 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
For the case in c#1:
g__r_1 is a global variable that is modified in function hh, but ipa-prop thinks
it is only loaded by reference and never changed, and so removes the references
in gcc/ipa-prop.cc:propagate_controlled_uses.


.wpa.081i.cp:

g__r_1/6 (g__r_1)
  Type: variable definition analyzed
  Visibility: semantic_interposition prevailing_def_ironly
  References:
  Referring: main/7 (addr) kk.constprop.0/16 (addr) kk.part.0.constprop.0/17
(read)
  Read from file: /tmp/cc3peQfe.o
  Availability: available
  Varpool flags: initialized


.wpa.085i.inline:
ipa-prop: Address IPA constant will reach a load so adding LOAD reference from
main/7 to g__r_1/6.
ipa-prop: Removed a reference from main/7 to g__r_1/6.
ipa-prop: Removing cloning-created reference from kk.constprop/16 to g__r_1/6.
...
g__r_1/6 (g__r_1)
  Type: variable definition analyzed
  Visibility: semantic_interposition prevailing_def_ironly
  References:
  Referring: main/7 (read) main/7 (read) kk.part.0.constprop.0/17 (read)
  Read from file: /tmp/cc3peQfe.o
  Availability: available
  Varpool flags: initialized


This seems to be a bug exposed by r12-2887-ga6da2cddcf0e959d, but it may actually
be caused by r12-2523-g13586172d0b70c, since it fails to identify globals that
are not read-only...

[Bug gcov-profile/93680] [GCOV] "do-while" structure in case statement leads to incorrect code coverage

2023-02-28 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93680

Xionghu Luo (luoxhu at gcc dot gnu.org)  changed:

   What|Removed |Added

 CC||yinyuefengyi at gmail dot com

--- Comment #4 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
Checking the difference between the two switch cases: both call split_edge to
generate an empty latch bb after the loop:

profile.cc:
  /* Edge with goto locus might get wrong coverage info unless
     it is the only edge out of BB.
     Don't do that when the locuses match, so
     if (blah) goto something;
     is not computed twice.  */
  if (last
      && gimple_has_location (last)
      && !RESERVED_LOCATION_P (e->goto_locus)
      && !single_succ_p (bb)
      && (LOCATION_FILE (e->goto_locus)
          != LOCATION_FILE (gimple_location (last))
          || (LOCATION_LINE (e->goto_locus)
              != LOCATION_LINE (gimple_location (last)))))
    {
      basic_block new_bb = split_edge (e);
      edge ne = single_succ_edge (new_bb);
      ne->goto_locus = e->goto_locus;
    }

but the second case fails to find an edge from dest_prev to dest when edge_in
forms a self loop (edge_in->src == edge_in->dest):


   :
  p_6 = 0;
  q_7 = 0;
  switch (s_8(D))  [INV], case 0:  [INV], case 1:  [INV]>

   :
  # n_1 = PHI 
  # p_3 = PHI 
:
  p_13 = p_3 + 1;
  n_14 = n_1 + -1;
  if (n_14 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  goto ; [100.00%]

   :
  _15 = p_13;
  goto ; [INV]

   :

   :
  # n_2 = PHI 
  # p_4 = PHI 
:
  p_10 = p_4 + 1;
  n_11 = n_2 + -1;
  if (n_11 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  _12 = p_10;
  goto ; [INV]


Note that the two loops have different latch bb locations.

So should we add a check like this for the self-loop case, returning the loop bb
itself as after_bb?

diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index a9fcc7fd050..6fa1d83d366 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -3009,7 +3009,7 @@ split_edge_bb_loc (edge edge_in)
   if (dest_prev)
 {
   edge e = find_edge (dest_prev, dest);
-  if (e && !(e->flags & EDGE_COMPLEX))
+  if ((e && !(e->flags & EDGE_COMPLEX)) || edge_in->src == edge_in->dest)
return edge_in->src;
 }
   return dest_prev;

With the fix, small.c.069i.profile:

   :
  p_6 = 0;
  q_7 = 0;
  switch (s_8(D))  [INV], case 0:  [INV], case 1:  [INV]>

   :
  # n_1 = PHI 
  # p_3 = PHI 
:
  p_13 = p_3 + 1;
  n_14 = n_1 + -1;
  if (n_14 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  goto ; [100.00%]

   :
  _15 = p_13;
  goto ; [INV]

   :
  # n_2 = PHI 
  # p_4 = PHI 
:
  p_10 = p_4 + 1;
  n_11 = n_2 + -1;
  if (n_11 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  goto ; [100.00%]

   :
  _12 = p_10;
  goto ; [INV]



cat small.c.gcov:
-:0:Source:small.c
-:0:Graph:small.gcno
-:0:Data:small.gcda
-:0:Runs:1
2:1:int f(int s, int n)
-:2:{
2:3:  int p = 0;
2:4:  int q = 0;
-:5:
2:6:  switch (s)
-:7:{
5:8:case 0:
5:9:  do { p++; } while (--n);
1:   10:  return p;
-:   11:
5:   12:case 1:
5:   13:  do { p++; } while (--n);
1:   14:  return p;
-:   15:}
-:   16:
#:   17:  return 0;
-:   18:}
-:   19:
1:   20:int main() { f(0, 5); f(1, 5);}


Is this a reasonable fix? If so, I could prepare a patch and send it to the
mailing list for review...

[Bug gcov-profile/97923] [GCOV]Wrong code coverage for multiple expressions with Logical OR Operator at multiple lines

2023-02-28 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97923

--- Comment #6 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
The changes below could fix the incorrect location:

diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
index 96845154a92..2dc8608dedf 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -3915,7 +3915,8 @@ shortcut_cond_r (tree pred, tree *true_label_p, tree *false_label_p,
        false_label_p = &local_label;

       /* Keep the original source location on the first 'if'.  */
-      t = shortcut_cond_r (TREE_OPERAND (pred, 0), NULL, false_label_p, locus);
+      tree op0 = TREE_OPERAND (pred, 0);
+      t = shortcut_cond_r (op0, NULL, false_label_p, EXPR_LOCATION (op0));
       append_to_statement_list (t, &expr);

       /* Set the source location of the && on the second 'if'.  */
@@ -3938,7 +3939,8 @@ shortcut_cond_r (tree pred, tree *true_label_p, tree *false_label_p,
        true_label_p = &local_label;

       /* Keep the original source location on the first 'if'.  */
-      t = shortcut_cond_r (TREE_OPERAND (pred, 0), true_label_p, NULL, locus);
+      tree op0 = TREE_OPERAND (pred, 0);
+      t = shortcut_cond_r (op0, true_label_p, NULL, EXPR_LOCATION (op0));
       append_to_statement_list (t, &expr);


That produces the expected block line info and coverage:

gcov-dump test.gcno -lp:
test.gcno:  583:0145:  35:LINES
test.gcno:  595:  block 2:`test.c':1, 3   <= change from 5 to 3
test.gcno:  626:0145:  31:LINES
test.gcno:  638:  block 3:`test.c':3
test.gcno:  665:0145:  31:LINES
test.gcno:  677:  block 4:`test.c':4
test.gcno:  704:0145:  31:LINES
test.gcno:  716:  block 5:`test.c':4
test.gcno:  743:0145:  31:LINES
test.gcno:  755:  block 6:`test.c':5
test.gcno:  782:0145:  31:LINES
test.gcno:  794:  block 7:`test.c':5
test.gcno:  821:0145:  31:LINES
test.gcno:  833:  block 8:`test.c':5
test.gcno:  860:0145:  31:LINES
test.gcno:  872:  block 9:`test.c':5
test.gcno:  899:0145:  31:LINES
test.gcno:  911:  block 10:`test.c':5
test.gcno:  938:0145:  31:LINES
test.gcno:  950:  block 11:`test.c':5

cat test.c.gcov:
-:0:Source:test.c
-:0:Graph:test.gcno
-:0:Data:test.gcda
-:0:Runs:1
1:1:int foo(char c)
-:2:{
   1*:3:  return ((c >= 'A' && c <= 'Z')
   1*:4:  || (c >= 'a' && c <= 'z')
   1*:5:  || (c >= '0' && c <= '0'));
-:6:}
-:7:
1:8:int main() { return foo('0'); }

[Bug gcov-profile/97923] [GCOV]Wrong code coverage for multiple expressions with Logical OR Operator at multiple lines

2023-02-28 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97923

Xionghu Luo (luoxhu at gcc dot gnu.org)  changed:

   What|Removed |Added

 CC||yinyuefengyi at gmail dot com

--- Comment #5 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
gcov-dump test.gcno -lp
test.gcno:  583:0145:  35:LINES
test.gcno:  595:  block 2:`test.c':1, 5
test.gcno:  626:0145:  31:LINES
test.gcno:  638:  block 3:`test.c':3
test.gcno:  665:0145:  31:LINES
test.gcno:  677:  block 4:`test.c':4
test.gcno:  704:0145:  31:LINES
test.gcno:  716:  block 5:`test.c':4
test.gcno:  743:0145:  31:LINES
test.gcno:  755:  block 6:`test.c':5
test.gcno:  782:0145:  31:LINES
test.gcno:  794:  block 7:`test.c':5
test.gcno:  821:0145:  31:LINES
test.gcno:  833:  block 8:`test.c':5
test.gcno:  860:0145:  31:LINES
test.gcno:  872:  block 9:`test.c':5
test.gcno:  899:0145:  31:LINES
test.gcno:  911:  block 10:`test.c':5
test.gcno:  938:0145:  31:LINES
test.gcno:  950:  block 11:`test.c':5

It seems that the location of block 2 is incorrect: the gcno shows block 2
as located at line 1 and line 5, but it is actually located at line 1 and line
3, since block 2 maps only to the source code c >= 'A'?

int foo (char c)
{
  int iftmp.0;
  int D.2744;
  int iftmp.0_1;
  int iftmp.0_3;
  int iftmp.0_4;
  int _5;

   :
  if (c_2(D) > 64)
goto ; [INV]
  else
goto ; [INV]

   :
  if (c_2(D) <= 90)
goto ; [INV]
  else
goto ; [INV]

   :
  if (c_2(D) > 96)
goto ; [INV]
  else
goto ; [INV]

[Bug tree-optimization/108351] [13 Regression] Dead Code Elimination Regression at -O3 since r13-4240-gfeeb0d68f1c708

2023-02-15 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108351

--- Comment #4 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---

> early inline pass inlines the two calls with C front-end but fails to inline
> them with C++ front-end due to "growth 8 exceeds --param
> early-inlining-insns  divided by number of calls". 
> 

Swap C and C++ here...

[Bug tree-optimization/108351] [13 Regression] Dead Code Elimination Regression at -O3 since r13-4240-gfeeb0d68f1c708

2023-02-15 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108351

Xionghu Luo (luoxhu at gcc dot gnu.org)  changed:

   What|Removed |Added

 CC||yinyuefengyi at gmail dot com

--- Comment #3 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
(In reply to Andrew Pinski from comment #2)
> I noticed that with the C++ front-end early inline inlines f into main but
> with the C front-end it does not ...

The C++ front end generates different call instructions from the C front end:

<   D.3747 = f (7, 7);
<   D.3748 = f (9, 7);
---
>   f (7, 7);
>   f (9, 7);


then an extra return_cost is added for the C++ front end:

<   D.3747 = f (7, 7);
<   freq:1.00 size:  4 time: 13
<   D.3748 = f (9, 7);
<   freq:1.00 size:  4 time: 13
---
>   f (7, 7);
>   freq:1.00 size:  3 time: 12
>   f (9, 7);
>   freq:1.00 size:  3 time: 12


early inline pass inlines the two calls with C front-end but fails to inline
them with C++ front-end due to "growth 8 exceeds --param early-inlining-insns 
divided by number of calls". 

gcc/ipa-inline.cc:747:  growth * (n + 1) > early_inlining_insns

gcc/opts.cc:687:  { OPT_LEVELS_3_PLUS, OPT__param_early_inlining_insns_, NULL, 14 },
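
To make the arithmetic concrete, the comparison quoted from ipa-inline.cc can be written as a small predicate. This is only a sketch of that one comparison, not of the inliner's full decision. The growth of 8 and the limit of 14 come from the text above; n = 1 and the alternative growth of 6 are assumptions chosen for illustration.

```cpp
#include <cassert>

// Mirrors the quoted comparison: the candidate is rejected when
// growth * (n + 1) exceeds the early-inlining budget.
bool exceeds_early_inlining_budget(int growth, int n, int early_inlining_insns) {
    assert(n >= 0 && early_inlining_insns >= 0);
    return growth * (n + 1) > early_inlining_insns;
}
```

With growth 8, n = 1 and a budget of 14, the left side is 16 and the candidate is rejected. With an assumed growth of 6 it would be 12 and pass, which shows how a cost difference of one or two units per call can flip the decision.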


I tried reverting commit r13-4240-gfeeb0d68f1c708, but the two calls still exist;
commit r13-4686-g095a13eda2caf6 also needs to be reverted.
(r13-4686 mentions that IPA-SRA detects whether parameters could be removed.
Obviously that does not work well now if constants are not propagated before
IPA-SRA. It seems that IPA-SRA only removes locally_unused parameters, but
'm' is not locally used and the constants from the caller are not visible in
IPA-SRA?)

[Bug rtl-optimization/106707] [13 Regression] ICE: in cselib_record_set, at cselib.cc:2687 with -Oz -g -fno-cprop-registers -fno-dce since r13-1945-gfc6ef90173478521

2022-08-22 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106707

--- Comment #4 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
Maybe guard the pattern with...

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 58fcc382fa2..2a9d70da6d0 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -3045,6 +3045,7 @@ (define_peephole2
  "optimize_size > 1
   && (REGNO (operands[0]) == AX_REG
   || REGNO (operands[1]) == AX_REG)
+  && REGNO(operands[0]) != REGNO(operands[1])
   && optimize_insn_for_size_p ()
   && peep2_reg_dead_p (1, operands[1])"
   [(parallel [(set (match_dup 0) (match_dup 1))

[Bug lto/100010] [10/11/12/13 Regression] ICE in lto_output_node, at lto-cgraph.c:447 (-fdevirtualize-at-ltrans) since r6-6384-gceda2c69d5219719

2022-08-16 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100010

--- Comment #8 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
At the ICE point, node->clone_of has value, but clone_of is NULL:

(gdb) p clone_of
$114 = (cgraph_node *) 0x0
(gdb) p node->clone_of
$115 = (cgraph_node *) 0x76664bb0
(gdb) pnode node->clone_of
_ZN12ErrorHandler8decorateERK6String/0 (decorate)
  Type: function
  Visibility: semantic_interposition virtual
  next sharing asm name: 0
  References:
  Referring:
  Availability: not_available
  Function flags:
  Called by:
  Calls:
(gdb) pnode node
_ZN12ErrorHandler8decorateERK6String/0 (decorate)
  Type: function definition analyzed
  Visibility: semantic_interposition virtual
  previous sharing asm name: 0
  References:
  Referring:
  Read from file: a-pr10010.o
  Function decorate/0 is inline copy in decorate/1
  Clone of _ZN12ErrorHandler8decorateERK6String/0
  Availability: local
  Unit id: 1
  Function flags: count:1073741824 (estimated locally) local
  Called by: _ZN20LandmarkErrorHandler8decorateERK6String/1 (inlined)
(1073741824 (estimated locally),1.00 per call)
  Calls: _ZN6StringD1Ev/37 (1073741824 (estimated locally),1.00 per call)
_ZN6StringC1Ec/38 (1073741824 (estimated loca
lly),1.00 per call) (can throw external)
   Polymorphic indirect call of type struct ErrorHandler token:0(1073741824
(estimated locally),1.00 per call) (can thr
ow external) of param:0 (vptr maybe changed) num speculative call targets: 0
Outer type (dynamic):struct ErrorHandler (or a derived type) offset 0


This simple change could fix it, but I am not sure whether it is correct.

diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 6d9c36ea8b6..44a33a2af23 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -448,7 +448,7 @@ lto_output_node (struct lto_simple_output_block *ob, struct
cgraph_node *node,
   if (clone_of && !lto_symtab_encoder_encode_body_p (encoder,
ultimate_clone_of))
 clone_of = NULL;

-  if (tag == LTO_symtab_analyzed_node)
+  if (tag == LTO_symtab_analyzed_node && !flag_ltrans_devirtualize)
 gcc_assert (clone_of || !node->clone_of);
   if (!clone_of)
 streamer_write_hwi_stream (ob->main_stream, LCC_NOT_FOUND);

[Bug ipa/91771] Optimization fails to inline final override.

2022-08-12 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91771

Xionghu Luo (luoxhu at gcc dot gnu.org)  changed:

   What|Removed |Added

 CC||yinyuefengyi at gmail dot com

--- Comment #4 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
Just curious about the 021t.ssa dump...


int f (struct Derived & d)
{
  struct Base * _1;  
  int _5;
  int _6;

   :
  _1 = &d_2(D)->D.2395;
  _5 = Base::foo (_1, 40);
  _6 = _5;
  return _6;

}


d_2 is a reference to an instance of type "struct Derived", so is promoting
"_1" to type "struct Base *" an unnecessary type promotion?  Another thing to
note is that the early inline pass inlined Base::foo into f but failed to
devirtualize the virtual call in it. Would it be possible to devirtualize the
call if "struct Derived * _1" were produced in the ssa pass?

[Bug ipa/101839] [10/11/12/13 Regression] Hang in C++ code with -fdevirtualize

2022-08-09 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101839

--- Comment #8 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
The relationship is:

A              A::type
|
+-- BA         BA::type
|   |
|   +-- CBA    CBA::type
|
+-- CA         CA::type

Classes CA and CBA are final, and functions CA::type and BA::type are also final.
In function possible_polymorphic_call_targets, for "target" BA::type, the
"DECL_FINAL_P (target)" check is then not accurate enough: there may be classes
such as CBA that derive from BA and have instances, so the walk needs to
continue recursively in possible_polymorphic_call_targets_1 to reach
record_target_from_binfo.

  if (target)
{
  /* In the case we get complete method, we don't need 
 to walk derivations.  */
  if (DECL_FINAL_P (target))
context.maybe_derived_type = false;
}

So fix this with the change below, which stops walking derivations only when
target is final and its class outer_type->type is also final?

diff --git a/gcc/ipa-devirt.cc b/gcc/ipa-devirt.cc
index 412ca14f66b..77f9b268e86 100644
--- a/gcc/ipa-devirt.cc
+++ b/gcc/ipa-devirt.cc
@@ -3188,7 +3188,9 @@ possible_polymorphic_call_targets (tree otr_type,

   /* In the case we get complete method, we don't need
 to walk derivations.  */
-  if (target && DECL_FINAL_P (target))
+  if (target && TREE_CODE (target) == FUNCTION_DECL && DECL_FINAL_P (target)
+      && RECORD_OR_UNION_TYPE_P (outer_type->type)
+      && TYPE_FINAL_P (outer_type->type))
     context.speculative_maybe_derived_type = false;
   if (type_possibly_instantiated_p (speculative_outer_type->type))
     maybe_record_node (nodes, target, &inserted, can_refer, &speculation_complete);
@@ -3233,7 +3235,9 @@ possible_polymorphic_call_targets (tree otr_type,
{
  /* In the case we get complete method, we don't need
 to walk derivations.  */
- if (DECL_FINAL_P (target))
+ if (TREE_CODE (target) == FUNCTION_DECL && DECL_FINAL_P (target)
+ && RECORD_OR_UNION_TYPE_P (outer_type->type)
+ && TYPE_FINAL_P (outer_type->type))
context.maybe_derived_type = false;
}
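
The situation can be sketched at source level as follows. This is a hypothetical reduction, not the test case from the bug: only the class names and the final markers follow the description above, while the member bodies, the payload field, and the helper functions are invented for illustration. In this sketch CBA inherits BA::type rather than overriding it, since a final method cannot be overridden.

```cpp
#include <cassert>
#include <typeinfo>

struct A {
    virtual ~A() = default;
    virtual int type() const { return 0; }
};
struct BA : A {
    int type() const final { return 1; }  // method is final, class is not
};
struct CA final : A {
    int type() const final { return 2; }  // method and class are final
};
struct CBA final : BA {
    int payload = 42;                      // derived from BA, inherits BA::type
};

// Virtual call through the base; the target may be the final BA::type.
int call_type(const A &a) { return a.type(); }

// True only when the dynamic type is exactly BA, not a class derived from it.
bool dynamic_type_is_BA(const A &a) { return typeid(a) == typeid(BA); }

// The call resolves to BA::type even though the object is a CBA.
void demo() {
    CBA c;
    assert(call_type(c) == 1);
    assert(!dynamic_type_is_BA(c));
}
```

Knowing that the target method is final pins down which function is called, but not that the object's type is BA itself, which is the distinction the extra TYPE_FINAL_P check is meant to capture.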

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-04 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #32 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
Thanks for all the information! It made me realize that "native RTL should be
endian-independent". So both big-endian and little-endian platforms should
generate the same (vec_select (vec_concat (R0 R1) [0 4 1 5])) for altivec_vmrghw;
the combine pass can then do the correct "nested vec_select" optimization, and
the endian check is left to ASM generation at the end. That is the benefit of
removing the UNSPECs.  My culprit patch did change the LE representation, sorry
for the stupid mistake...

Attached is the fix patch.  If it is reasonable, I will continue refining it and
send it to the mailing list.

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-04 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #31 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
Created attachment 53408
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53408&action=edit
0001-rs6000-Fix-incorrect-RTL-for-Power-LE-when-removing-

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #20 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
As another reference, I manually changed the generated assembly, modifying the
source register and index of vspltw, to verify:

luoxhu@gcc135 build $ diff q.bad.s q.good.s -U12
--- q.bad.s 2022-08-03 06:30:08.298451116 +
+++ q.good.s2022-08-03 06:30:52.887250451 +
@@ -18,31 +18,31 @@
addi 2,2,.TOC.-.LCF0@l
.localentry _Z3fooPhPjDv4_jS1_S1_S1_,.-_Z3fooPhPjDv4_jS1_S1_S1_
mflr %r0
std %r0,16(%r1)
std %r30,-16(%r1)
std %r31,-8(%r1)
stdu %r1,-112(%r1)
.cfi_def_cfa_offset 112
.cfi_offset 65, 16
.cfi_offset 30, -16
.cfi_offset 31, -8
mr %r30,%r3
-   vspltw %v0,%v2,0
+   vspltw %v0,%v5,3
mfvsrwz %r7,%vs32
-   vspltw %v0,%v3,0
+   vspltw %v0,%v4,3
mfvsrwz %r6,%vs32
-   vspltw %v0,%v4,0
+   vspltw %v0,%v3,3
mfvsrwz %r5,%vs32
-   vspltw %v0,%v5,0
+   vspltw %v0,%v2,3
mfvsrwz %r31,%vs32
rldicl %r7,%r7,0,32
rldicl %r6,%r6,0,32
rldicl %r5,%r5,0,32
rldicl %r4,%r31,0,32
addis %r3,%r2,.LC0@toc@ha
addi %r3,%r3,.LC0@toc@l
bl printf
nop
stb %r31,0(%r30)
addi %r1,%r1,112
.cfi_def_cfa_offset 0

luoxhu@gcc135 build $ gcc q.good.s -o q.good
luoxhu@gcc135 build $ ./q.good
B0: 41fcef98, 91648e8b,7dca18c6,61707865

Which means both register and index are incorrectly used in LE nested
vec_select optimization.

[Bug target/106069] [12/13 Regression] wrong code with -O -fno-tree-forwprop -maltivec on ppc64le

2022-08-03 Thread yinyuefengyi at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #19 from Xionghu Luo (luoxhu at gcc dot gnu.org)  ---
(In reply to Xionghu Luo (luo...@gcc.gnu.org) from comment #15)
> In combine: vec_select(vec_concat and the followed vec_select are combined
> to a single extract instruction, which seems reasonable for both LE and BE?
> 
> R146:   0 1 2 3
> R141:   4 5 6 7
> R150:   2 6 3 7// vec_select(vec_concat(r146:V4SI,r141:V4SI),[2 6 3 7])
> R151:   R150[3]// vec_select(r150:V4SI,3)
> 
> => 
> 
> R151:   R141[3]   //  vec_select(r141:V4SI,3)
> 
>   
> 
> Trying 21 -> 24:
>21: r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
>   REG_DEAD r146:V4SI
>   REG_DEAD r141:V4SI
>24: {r151:SI=vec_select(r150:V4SI,parallel);clobber scratch;}
> Failed to match this instruction:
> (parallel [
> (set (reg:SI 151)
> (vec_select:SI (reg:V4SI 141)
> (parallel [
> (const_int 3 [0x3])
> ])))
> (clobber (scratch:SI))
> (set (reg:V4SI 150)
> (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
> (reg:V4SI 141))
> (parallel [
> (const_int 2 [0x2])
> (const_int 6 [0x6])
> (const_int 3 [0x3])
> (const_int 7 [0x7])
> ])))
> ])
> Failed to match this instruction:
> (parallel [
> (set (reg:SI 151)
> (vec_select:SI (reg:V4SI 141)
> (parallel [
> (const_int 3 [0x3])
> ])))
> (set (reg:V4SI 150)
> (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
> (reg:V4SI 141))
> (parallel [
> (const_int 2 [0x2])
> (const_int 6 [0x6])
> (const_int 3 [0x3])
> (const_int 7 [0x7])
> ])))
> ])
> Successfully matched this instruction:
> (set (reg:V4SI 150)
> (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
> (reg:V4SI 141))
> (parallel [
> (const_int 2 [0x2])
> (const_int 6 [0x6])
> (const_int 3 [0x3])
> (const_int 7 [0x7])
> ])))
> Successfully matched this instruction:
> (set (reg:SI 151)
> (vec_select:SI (reg:V4SI 141)
> (parallel [
> (const_int 3 [0x3])
> ])))
> allowing combination of insns 21 and 24
> original costs 4 + 4 = 8
> replacement costs 4 + 4 = 8
> modifying insn i221:
> r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
>   REG_DEAD r146:V4SI
> deferring rescan insn with uid = 21.
> modifying insn i324: {r151:SI=vec_select(r141:V4SI,parallel);clobber
> scratch;}
>   REG_DEAD r141:V4SI
> deferring rescan insn with uid = 24.
> 
> 
> I guess the previous unspec implementation bypassed the LE + LE swap check,
> so now in split2, we should generate vextuwlx instead of vextuwrx on little
> endian?


This nested vec_select+vec_select+vec_concat optimization was introduced by Uros
in simplify-rtx.c for PR32661. Unfortunately it only works for Power BE
platforms; disabling that piece of code works because the nested vec_select
optimizations are then no longer combined...
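
The element numbering in the quoted big-endian example can be checked with a small model of vec_concat and vec_select. This is a sketch of the abstract RTL semantics only, with element 0 first. The function names mirror the RTL codes, but Lane and fold_nested_select are hypothetical helpers and not the simplify-rtx.c code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<int, 4>;
using V8 = std::array<int, 8>;

// vec_concat: the elements of a followed by the elements of b.
V8 vec_concat(const V4 &a, const V4 &b) {
    V8 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        r[i] = a[i];
        r[i + 4] = b[i];
    }
    return r;
}

// vec_select with a four-element selector into an eight-element vector.
V4 vec_select(const V8 &v, const V4 &sel) {
    V4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        assert(sel[i] >= 0 && sel[i] < 8);
        r[i] = v[static_cast<std::size_t>(sel[i])];
    }
    return r;
}

// Result of folding vec_select(vec_select(vec_concat(a, b), sel), k):
// which operand is read (0 = a, 1 = b) and which lane of it.
struct Lane {
    int operand;
    int index;
};

Lane fold_nested_select(const V4 &sel, std::size_t k) {
    assert(k < 4);
    return Lane{sel[k] / 4, sel[k] % 4};
}
```

With the selector [2 6 3 7], lanes 3, 2, 1, 0 of R150 fold to R141[3], R146[3], R141[2], R146[2], matching the quoted R151: R141[3] result.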

For Power LE, firstly:

Trying 21 -> 24:

 R146:   3 2 1 0
 R141:   7 6 5 4
 R150:   7 3 6 2// vec_select(vec_concat(r146:V4SI,r141:V4SI),[2 6 3 7])
 R151:   R150[3]// vec_select(r150:V4SI,3)

 => 

currently:
 R151:   R141[3]   //  vec_select(r141:V4SI,3)

But it should be:
 R151:   R146[3]   //  vec_select(r146:V4SI,3)

Which means current:

R151: R150[3] R141[3]
R153: R150[2] R146[3]
R155: R150[1] R141[2]
R157: R150[0] R146[2]

Should be optimized to after the first nested vec_select optimization:

R151: R150[3] R146[3]
R153: R150[2] R141[3]
R155: R150[1] R146[2]
R157: R150[0] R141[2]

A little-endian check and swap (swapping op00 and op01) could achieve this
result.  But secondly, there is another "nested vec_select" optimization, which
produces R151=R165[3]:

Trying 21 -> 26:
...

R146 R165 R163 [7 3 6 2]
R151: R146[3]   =>  R165[3]  (this is wrong!)

Here R162, R163, R164, R165 are the input values R0 R1 R2 R3, and the
vsx_extract_v4si_di_p9 index should be "0" instead of "3".

The correct result should be:

R151: R165[0]
R153: R164[0]
R155: R163[0]
R157: R162[0]


(insn 44 2 4 2 (set (reg:V4SI 162)
(reg:V4SI 66 2 [ R0 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
 (expr_list:REG_DEAD (reg:V4SI 66 2 [ R0 ])
(nil)))
(note 4 44 45 2 NOTE_INSN_DELETED)
(insn 45 4 5 2 (set (reg:V4SI 163)
(reg:V4SI 67 3 [ R1 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
 (expr_list:REG_DEAD (reg:V4SI 67 3 [ R1 ])
(nil)))
(note 5 45 46 2 NOTE_INSN_DELETED)
(insn 46 5 6 2 (set (reg:V4SI 164)
(reg:V4SI 68 4 [ R2 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}