[Bug c++/105297] [12 Regression] new modules 'xtreme' test cases FAILs

2022-04-21 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105297

--- Comment #15 from Jiu Fu Guo  ---
(In reply to Patrick Palka from comment #13)
> (In reply to Jiu Fu Guo from comment #11)
> > (In reply to Patrick Palka from comment #10)
> > > 
> > > Interestingly that doesn't seem to make a difference.  What seems to 
> > > matter
> > > is whether the constexpr function modifies the CONSTRUCTOR that it 
> > > returns:
> > > 
> > > constexpr auto foo() {
> > >   struct S { int d; } t = {};
> > >   t.d = 0; // doesn't ICE if this line is commented out
> > >   return t;
> > > }
> > > 
> > > template<class>
> > > int bar() {
> > >   constexpr auto t = foo();
> > >   return 0;
> > > }
> > 
> > Right, it is weird. Some PRs on Xtreme-* failure (including ICE) were also
> > reported before. e.g. PR100052, PR101853, PR99910.  As commented in those
> > PRs, these may be random failures, and changes in headers that could expose
> > the ICE.
> > I'm also wondering if this may be an issue hidden inside somewhere (GC?).
> 
> In this case I suspect it's just a bug in the modules code, I opened
> PR105322 to track it.

Oh, thanks!  This failure seems to be only about the modules code handling
'struct members across functions'.

[Bug c++/105297] [12 Regression] new modules 'xtreme' test cases FAILs

2022-04-21 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105297

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #11 from Jiu Fu Guo  ---
(In reply to Patrick Palka from comment #10)
> 
> Interestingly that doesn't seem to make a difference.  What seems to matter
> is whether the constexpr function modifies the CONSTRUCTOR that it returns:
> 
> constexpr auto foo() {
>   struct S { int d; } t = {};
>   t.d = 0; // doesn't ICE if this line is commented out
>   return t;
> }
> 
> template<class>
> int bar() {
>   constexpr auto t = foo();
>   return 0;
> }

Right, it is weird. Some PRs on xtreme-* failures (including ICEs) were also
reported before, e.g. PR100052, PR101853, PR99910.  As commented in those PRs,
these may be random failures, and changes in headers could expose the ICE.
I'm also wondering if this may be an issue hidden somewhere else (GC?).

[Bug go/105315] go build fail on ppc: has no member named 'gregs'; did you mean 'regs'?

2022-04-21 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105315

--- Comment #4 from Jiu Fu Guo  ---
(In reply to Ian Lance Taylor from comment #3)
> Thanks, should be fixed now, I hope.

I tested it: the 'go' build passes on that machine now. Thanks!

[Bug go/105315] go build fail on ppc: has no member named 'gregs'; did you mean 'regs'?

2022-04-19 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105315

--- Comment #1 from Jiu Fu Guo  ---
The failing compile command is the one using -m32:

/home/guojiufu/gcc/build/gcc-mainline-base/./gcc/xgcc
-B/home/guojiufu/gcc/build/gcc-mainline-base/./gcc/
-B/home/guojiufu/gcc/install/gcc-mainline-base/powerpc64-unknown-linux-gnu/bin/
-B/home/guojiufu/gcc/install/gcc-mainline-base/powerpc64-unknown-linux-gnu/lib/
-isystem
/home/guojiufu/gcc/install/gcc-mainline-base/powerpc64-unknown-linux-gnu/include
-isystem
/home/guojiufu/gcc/install/gcc-mainline-base/powerpc64-unknown-linux-gnu/sys-include
-m32 -DHAVE_CONFIG_H -I. -I/home/guojiufu/gcc/gcc-mainline-base/libgo -I
/home/guojiufu/gcc/gcc-mainline-base/libgo/runtime
-I/home/guojiufu/gcc/gcc-mainline-base/libgo/../libffi/include
-I../libffi/include -pthread -L../libatomic/.libs -fexceptions
-fnon-call-exceptions -Wall -Wextra -Wwrite-strings -Wcast-qual -Werror
-D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I
/home/guojiufu/gcc/gcc-mainline-base/libgo/../libgcc -I
/home/guojiufu/gcc/gcc-mainline-base/libgo/../libbacktrace -I
../../../gcc/include -g -O2 -m32 -MT runtime/go-signal.lo -MD -MP -MF
runtime/.deps/go-signal.Tpo -c
/home/guojiufu/gcc/gcc-mainline-base/libgo/runtime/go-signal.c  -fPIC -DPIC -o
runtime/.libs/go-signal.o

[Bug go/105315] New: go build fail on ppc: has no member named 'gregs'; did you mean 'regs'?

2022-04-19 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105315

Bug ID: 105315
   Summary: go build fail on ppc: has no member named 'gregs'; did
you mean 'regs'?
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: go
  Assignee: ian at airs dot com
  Reporter: guojiufu at gcc dot gnu.org
CC: cmang at google dot com
  Target Milestone: ---

On a P8 BE machine, I encountered a build failure.

libgo/runtime/go-signal.c:236:59: error: 'union uc_regs_ptr' has no member
named 'gregs'; did you mean 'regs'?
  236 | ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];


The machine is:
Architecture:  ppc64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Big Endian
Model name:POWER8 (architected), altivec supported

Commands to reproduce:
$GCC_SRC/configure --enable-languages=c,c++,go --with-cpu=native
--with-long-double-128 --prefix=~/install/gcc-mainline-base
make -j 30


The commit "r12-8168 af27d545dc6132dcd67d1ee854372ea9cfd2a225" may be what
causes this issue.

[Bug rtl-optimization/85409] [9/10/11/12 Regression] ICE in alloc_succs_info, at sel-sched-ir.c:4730

2022-04-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85409

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #9 from Jiu Fu Guo  ---
The latest trunk has fixed this issue. As far as I can check, the fix is r11-321
998fbe9f1f7e5ef53ca79fbd28f8a3875a477baa.  This fix handles the debug_insn
which was also mentioned in comment 4.

Just wondering if we need to backport it to gcc10/9.

[Bug rtl-optimization/105023] new test case g++.dg/other/pr104989.C ICEs

2022-04-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105023

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #8 from Jiu Fu Guo  ---
Marking as fixed.

[Bug rtl-optimization/105023] new test case g++.dg/other/pr104989.C ICEs

2022-04-07 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105023

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #7 from Jiu Fu Guo  ---
This has already been fixed by r12-7833
(41d1f11f5f693a2a06c65c9467a28dfeb02aed85).

With this patch, the call is:
(call_insn/u/c 23 22 0 (parallel [
(call (mem:SI (symbol_ref:DI ("_Z1cz") [flags 0x3]  ) [0 c S4 A8])
(const_int 2305843009213693952 [0x2000000000000000]))
(use (const_int 0 [0]))
(clobber (reg:DI 96 lr))
]) "/home/guojiufu/temp/pr104989.C":8:16 -1
 (expr_list:REG_CALL_DECL (symbol_ref:DI ("_Z1cz") [flags 0x3] 
)
(expr_list:REG_EH_REGION (const_int 0 [0])
(nil)))
(expr_list (use (reg:DI 2 %r2))
(expr_list (use (reg:DI 10 %r10))
(expr_list (use (reg:DI 9 %r9))
(expr_list (use (reg:DI 8 %r8))
(expr_list (use (reg:DI 7 %r7))
(expr_list (use (reg:DI 6 %r6))
(expr_list (use (reg:DI 5 %r5))
(expr_list (use (reg:DI 4 %r4))
(expr_list (use (reg:DI 3 %r3))
(expr_list:BLK (use (mem:BLK (reg/f:DI
114 virtual-outgoing-args) [0  S2305843009213693952 A128]))
(nil

[Bug c++/100052] [11/12 regression] ICE in compiling g++.dg/modules/xtreme-header-3_b.C after r11-8118

2022-04-01 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100052

--- Comment #10 from Jiu Fu Guo  ---
Meanwhile, should we keep this open for a while to see if this issue occurs again?

[Bug c++/100052] [11/12 regression] ICE in compiling g++.dg/modules/xtreme-header-3_b.C after r11-8118

2022-03-31 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100052

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #9 from Jiu Fu Guo  ---
On the latest trunk, these failures seem to have disappeared again.

./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-header-3_b.C -std=c++17
(test for excess errors)
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-header-3_b.C -std=c++2a
(test for excess errors)
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-header-3_b.C -std=c++2b
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-header-3_b.C -std=c++17
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-header-3_b.C -std=c++2a
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-header-3_b.C -std=c++2b
(test for excess errors)

[Bug c++/99910] [11/12 Regression] g++.dg/modules/xtreme-header-2_b.C ICE

2022-03-31 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99910

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #10 from Jiu Fu Guo  ---
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-header-2_b.C -std=c++17
(test for excess errors)
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-header-2_b.C -std=c++2a
(test for excess errors)
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-header-2_b.C -std=c++2b
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-header-2_b.C -std=c++17
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-header-2_b.C -std=c++2a
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-header-2_b.C -std=c++2b
(test for excess errors)
.
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-tr1_b.C -std=c++17
(test for excess errors)
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-tr1_b.C -std=c++2a
(test for excess errors)
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-tr1_b.C -std=c++2b
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-tr1_b.C -std=c++17
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-tr1_b.C -std=c++2a
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-tr1_b.C -std=c++2b
(test for excess errors)

These tests pass with the latest trunk. (I tested on ppc64le.)

[Bug c++/101853] [12 Regression] g++.dg/modules/xtreme-header-5_b.C ICE

2022-03-31 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101853

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #10 from Jiu Fu Guo  ---
On trunk, this appears to be fixed:
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-header-5_b.C -std=c++17
(test for excess errors)
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-header-5_b.C -std=c++2a
(test for excess errors)
./gcc/testsuite/g++/g++.sum:PASS: g++.dg/modules/xtreme-header-5_b.C -std=c++2b
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-header-5_b.C -std=c++17
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-header-5_b.C -std=c++2a
(test for excess errors)
./gcc/testsuite/g++/g++.log:PASS: g++.dg/modules/xtreme-header-5_b.C -std=c++2b
(test for excess errors)

> grep -r xtreme-. > ~/22.4.1aaf3a5993ae.log
grep -i FAIL ~/22.4.1aaf3a5993ae.log |wc
  0   0   0
grep -i PASS ~/22.4.1aaf3a5993ae.log |wc
    360    2232   41904

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-31 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #14 from Jiu Fu Guo  ---
(In reply to Richard Biener from comment #13)
...
> Does the following fix the runtime error?  The RTL after DSE seems to be OK.
> 
> diff --git a/gcc/gimple-expr.cc b/gcc/gimple-expr.cc
> index f9a650b5daf..5faaf43eaf5 100644
> --- a/gcc/gimple-expr.cc
> +++ b/gcc/gimple-expr.cc
> @@ -910,7 +910,8 @@ mark_addressable (tree x)
>  x = TREE_OPERAND (x, 0);
>while (handled_component_p (x))
>  x = TREE_OPERAND (x, 0);
> -  if (TREE_CODE (x) == MEM_REF
> +  if ((TREE_CODE (x) == MEM_REF
> +   || TREE_CODE (x) == TARGET_MEM_REF)
>&& TREE_CODE (TREE_OPERAND (x, 0)) == ADDR_EXPR)
>  x = TREE_OPERAND (TREE_OPERAND (x, 0), 0);
>if (!VAR_P (x)

Hi Richard!
Thanks a lot, this is great!  This fix works, and it also passes bootstrap on
ppc64/ppc64le and x86_64.

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-30 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #12 from Jiu Fu Guo  ---
In dse.cc, "may_be_aliased" affects "can_escape" and then affects
"kill_on_calls".

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-30 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #11 from Jiu Fu Guo  ---
I found one difference between trunk and r12-656:

On trunk:
tree expr = MEM_EXPR (mem); 
   where mem is 
(mem/f/c:DI (plus:DI (reg/f:DI 110 sfp)
 (const_int 32 [0x20])) [3 GOTMP.2[0].x.__values+0 S8 A128])
and then expr is GOTMP.2[0].x.__values

base = get_base_address (expr);   base is "GOTMP.2"
"may_be_aliased (base)" returns false. 

On r12-656: "may_be_aliased (base)" returns true.

may_be_aliased checks TREE_ADDRESSABLE, whose value also differs between
trunk and r12-656.

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-30 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #10 from Jiu Fu Guo  ---
Created attachment 52718
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52718&action=edit
m.go sub1.go

Based on Ian's code, the code below also reproduces this issue.
package sub1

func TestBits(callback func(interface{})) {
	for _, test := range []struct {
		x, y, want []int
	}{
		{[]int{}, nil, nil},
		{[]int{0}, nil, nil},
	} {
		p := test.x
		callback(p)
	}
}

---
> go1 sub1.go -quiet -O2 -o sub1.s

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-30 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #9 from Jiu Fu Guo  ---
(In reply to Ian Lance Taylor from comment #8)
...
> 
> package main
> 
> func main() {
>   for _, test := range []struct {
>   x, y, want []int
>   }{
>   {[]int{}, []int{}, nil},
>   {[]int{0}, []int{0}, []int{0}},
>   } {
>   p := test.x
>   F(p)
>   }
> }
> 
> func F(v interface{}) {
>  recover()
>  println(cap(v.([]int)))
> }
> 
> This can be compiled (though not run) using a cross-compiler without
> building libgo.
> 
> The code coming into 280r.dse1 seems to be indexing from the end of the
> array.  I see
> 
> code_label 96 126 55 4 118 (nil) [0 uses])
> (note 55 96 56 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
> (insn 56 55 57 4 (set (reg:DI 144)
> (mult:DI (reg:DI 121 [ ivtmp_47 ])
> (const_int -72 [0xffffffffffffffb8]))) "foo.go":4:2 154 {muldi3}
>  (nil))
> (insn 57 56 59 4 (set (reg/f:DI 145)
> (plus:DI (reg/f:DI 173)
> (reg:DI 144))) "foo.go":4:2 66 {*adddi3}
>  (expr_list:REG_DEAD (reg/f:DI 173)
> (expr_list:REG_DEAD (reg:DI 144)
> (nil
> 
> where earlier I see
> 
> (insn 17 16 19 2 (set (mem/f/c:DI (plus:DI (reg/f:DI 110 sfp)
> (const_int 32 [0x20])) [8 GOTMP.5[0].x.__values+0 S8 A128])
> (reg/f:DI 117 [ _11 ])) "foo.go":4:23 670 {*movdi_internal64}
>  (expr_list:REG_DEAD (reg/f:DI 117 [ _11 ])
> (nil)))
> 
> and
> 
> (insn 120 4 121 2 (set (reg/f:DI 173)
> (plus:DI (reg/f:DI 110 sfp)
> (const_int 32 [0x20]))) 66 {*adddi3}
>  (nil))
> 
> So register 173 is  although insn 120 doesn't indicate that.  Then
> the 280r.dse1 pass drops out all the assignments to GOTMP.5, presumably
> because it doesn't understand that register 173 points to it.

Hi Ian!

Thanks for your great help!

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #7 from Jiu Fu Guo  ---
I tried to remove 'fmt' from the narrowed code, but it is still in the code :)

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #6 from Jiu Fu Guo  ---
---bits_test.go
package big

import (
	"fmt"
	"testing"
)

type Bits []int

func TestMulBits(t *testing.T) {
	for _, test := range []struct {
		x, y, want Bits
	}{

		{Bits{}, Bits{}, nil},
		{Bits{0}, Bits{0}, Bits{0}},
	} {
		p := test.x
		fmt.Printf("%v", p)
	}
}

---

Hi Richard!

The dumps are attached.  Thanks.
One interesting thing: after r12-656, there seem to be no changes in dse.cc
related to this issue.

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #5 from Jiu Fu Guo  ---
Created attachment 52709
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52709&action=edit
280r.dse1

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #4 from Jiu Fu Guo  ---
Created attachment 52708
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52708&action=edit
279r.cse2

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-28 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #2 from Jiu Fu Guo  ---
starting to process insn 14
  v:  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115,
116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131,
132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147,
148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163,
164, 165, 166, 167, 168, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203,
204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216
i = 32, index = 1
i = 33, index = 2
i = 34, index = 3
i = 35, index = 4
i = 36, index = 5
i = 37, index = 6
i = 38, index = 7
i = 39, index = 8
deferring deletion of insn with uid = 14.

[Bug rtl-optimization/105091] RTL dse1 remove stack mem storing incorrectly

2022-03-28 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

--- Comment #1 from Jiu Fu Guo  ---
Checking the dumps from dse1, some stack memory stores are deleted
incorrectly.

   12: %3:DI=call [`runtime.newobject'] argc:0
  REG_CALL_DECL `runtime.newobject'
   13: r117:DI=%3:DI
  REG_DEAD %3:DI
   14: [sfp:DI+0x20]=r117:DI
  REG_DEAD r117:DI
dse1 removes instruction 14.

[Bug rtl-optimization/105091] New: RTL dse1 remove stack mem storing incorrectly

2022-03-28 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105091

Bug ID: 105091
   Summary: RTL dse1 remove stack mem storing incorrectly
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

The code is narrowed from math/big, which passed early in r12 (r12-656).
With -fdisable-rtl-dse1, the case also passes.

_testmain.go
package main

import "fmt"
import "testing"
import "testing/internal/testdeps"
import __os__ "os"

type Bits []int

func TestMulBits(t *testing.T) {
	for _, test := range []struct {
		x, y, want Bits
	}{

		{Bits{}, Bits{}, nil},
		{Bits{0}, Bits{0}, Bits{0}},
	} {
		p := test.x
		fmt.Printf("%v", p)
	}
}

var tests = []testing.InternalTest {
	{"TestMulBits", TestMulBits},
}
var benchmarks = []testing.InternalBenchmark{
}
var fuzzTargets = []testing.InternalFuzzTarget{
}
var examples = []testing.InternalExample{
}

func main() {
	m := testing.MainStart(testdeps.TestDeps{}, tests, benchmarks,
		fuzzTargets, examples)

	__os__.Exit(m.Run())
}
-

> gccgo -O2 _testmain.go && ./a.out -test.v
=== RUN   TestMulBits
[35185086168544 32199672319005300 268454424 268599296 35185086167968 15 32
35185086167968 15 35184405205936 824635867296 15 35184393891632 0
35185086168112]--- FAIL: TestMulBits (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference
[recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x20d08280]

goroutine 17 [running]:
testing.tRunner..func3
/home/guojiufu/gcc/gcc-mainline-base/libgo/go/testing/testing.go:1425
testing.tRunner..func1
/home/guojiufu/gcc/gcc-mainline-base/libgo/go/testing/testing.go:1342
panic
/home/guojiufu/gcc/gcc-mainline-base/libgo/go/runtime/panic.go:714
reflect.typedmemmove
/home/guojiufu/gcc/gcc-mainline-base/libgo/go/runtime/mbarrier.go:197
reflect.packEface
/home/guojiufu/gcc/gcc-mainline-base/libgo/go/reflect/value.go:123
reflect.valueInterface
/home/guojiufu/gcc/gcc-mainline-base/libgo/go/reflect/value.go:930
reflect.Value.Interface
/home/guojiufu/gcc/gcc-mainline-base/libgo/go/reflect/value.go:890
fmt.pp.printValue
/home/guojiufu/gcc/gcc-mainline-base/libgo/go/fmt/print.go:722



> gccgo -O2 _testmain.go -fdisable-rtl-dse1  && ./a.out -test.v
go1: note: disable pass rtl-dse1 for functions in the range of [0, 4294967295]
=== RUN   TestMulBits
[][0]--- PASS: TestMulBits (0.00s)
PASS

> gccgo -v
Using built-in specs.
COLLECT_GCC=/home/guojiufu/gcc/install/gcc-mainline-base-debug/bin/gccgo
COLLECT_LTO_WRAPPER=/home/guojiufu/gcc/install/gcc-mainline-base-debug/libexec/gcc/powerpc64le-unknown-linux-gnu/12.0.1/lto-wrapper
Target: powerpc64le-unknown-linux-gnu
Configured with: /home/guojiufu/gcc/gcc-mainline-base/configure
--enable-languages=c,c++,go --with-cpu=native --enable-checking
--with-long-double-128
--prefix=/home/guojiufu/gcc/install/gcc-mainline-base-debug --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.0.1 20220318 (experimental) (GCC)


I encountered this on ppc64le and could not reproduce it on x86_64.

[Bug preprocessor/101168] gnu++14 complains about altivec types defined with using keyword in the same file with preprocessor macros

2022-03-17 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101168

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #1 from Jiu Fu Guo  ---
This issue can apparently also be reproduced with at10.0 (gcc 6.4.1).

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2022-03-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743

--- Comment #5 from Jiu Fu Guo  ---
It would also be OK for a constant that has only 16 significant bits in the
middle, e.g. 0x09876000ULL: we can rotate the constant to 0x9876.
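
As an illustration of the rotation idea, the C sketch below searches for a left-rotation count that moves all set bits of a 64-bit constant into the low 16 bits. The helper names (rotl64, find_rot16) are hypothetical and are not GCC functions; this only models the arithmetic, not how the rs6000 backend would implement it.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits (0 <= n < 64). */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    assert(n < 64);
    return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* Return the smallest left-rotation count that makes the constant fit
   in an unsigned 16-bit immediate, or -1 if no rotation does.
   Hypothetical helper, for illustration only. */
static int find_rot16(uint64_t c)
{
    for (unsigned n = 0; n < 64; n++)
        if (rotl64(c, n) <= 0xFFFF)
            return (int)n;
    return -1;
}
```

When find_rot16(c) returns n >= 0, the test "x == c" can be replaced by "rotl64(x, n) == rotl64(c, n)", because rotation is a bijection on 64-bit values.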

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2022-03-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743

Jiu Fu Guo  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |guojiufu at gcc dot gnu.org

--- Comment #4 from Jiu Fu Guo  ---
rot(a)==rot(b_)==> a==rot(b')

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2022-03-14 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #3 from Jiu Fu Guo  ---
For "in == 0x8000000000000000LL", it would also be OK to use:
rotldi %r9,%r3,16
cmpldi %cr0,%r9,32768

And it would be similar for "in == 0x8000FFFFFFFFFFFFLL" (highest bit and
low 48 bits are all 1):
rotldi %r9,%r3,16
cmpdi %cr0,%r9,-32768
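
The two rotate-and-compare sequences above can be modelled in C as a sanity check. This is a sketch, not backend code: the function names are hypothetical, and the 64-bit constants are the ones implied by the immediates (rotating 32768 right by 16 bits gives only the highest bit set; rotating -32768 right by 16 bits gives the highest bit plus the low 48 bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 64-bit value left by n bits (0 <= n < 64). */
static uint64_t rotl64(uint64_t x, unsigned n)
{
    assert(n < 64);
    return n == 0 ? x : (x << n) | (x >> (64 - n));
}

/* Models: rotldi r9,r3,16 ; cmpldi cr0,r9,32768 (equality result). */
static bool eq_high_bit_only(uint64_t in)
{
    return rotl64(in, 16) == UINT64_C(32768);
}

/* Models: rotldi r9,r3,16 ; cmpdi cr0,r9,-32768 (equality result).
   The immediate is sign-extended to 64 bits before the compare. */
static bool eq_high_bit_and_low48(uint64_t in)
{
    return rotl64(in, 16) == (uint64_t)INT64_C(-32768);
}
```

For equality, it does not matter whether the compare is signed or unsigned; the choice only decides which 16-bit immediates can be encoded.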

[Bug target/104525] timeout on signed overflow at O0 fwrapv

2022-02-14 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104525

--- Comment #1 from Jiu Fu Guo  ---
With "-fsigned-char", the case runs OK.
When 'char' is treated as 'unsigned', "c != -4" is true for every value of 'c',
so the loop never terminates.
So, this PR is invalid.
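
A small C sketch of why the loop behaves differently: with an unsigned 8-bit 'c', the promoted value is always in 0..255 and can never equal -4, while a signed 8-bit 'c' that wraps modulo 256 does reach -4. The function names and the step cap are hypothetical; the signed case is modelled with an int and explicit wrapping so that the example does not rely on implementation-defined conversions.

```c
#include <assert.h>
#include <stdbool.h>

/* Run "for (c = 0; c != -4; c -= 3)" with c held in an unsigned char.
   Returns true if the loop condition becomes false within max_steps. */
static bool loop_exits_with_unsigned_char(int max_steps)
{
    assert(max_steps > 0);
    unsigned char c = 0;
    for (int i = 0; i < max_steps; i++) {
        if (!(c != -4))          /* c promotes to an int in 0..255 */
            return true;
        c = (unsigned char)(c - 3);
    }
    return false;
}

/* Same loop with a signed 8-bit c, modelled as an int that wraps
   modulo 256 into the range -128..127. */
static bool loop_exits_with_signed_char(int max_steps)
{
    assert(max_steps > 0);
    int c = 0;
    for (int i = 0; i < max_steps; i++) {
        if (c == -4)
            return true;
        c -= 3;
        if (c < -128)
            c += 256;            /* 8-bit wraparound */
    }
    return false;
}
```

With the signed model the loop leaves after fewer than 256 steps; with the unsigned one it never leaves, no matter how large the step cap is.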

[Bug target/104525] New: timeout on signed overflow at O0 fwrapv

2022-02-14 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104525

Bug ID: 104525
   Summary: timeout on signed overflow at O0 fwrapv
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

When checking the case in PR104521, a timeout occurs for this case on ppc64le.
The timeout also occurs with -O0 -fwrapv (without -fwrapv, the signed overflow
would be UB). This issue seems to have existed for a long time (since gcc6).

> cat small.c
char a, b, c;
int main() {
  unsigned char d = 1;
  while (1) {
    if (c >= a) {
      for (c = 0; c != -4; c -= 3) {
        while (!d)
          b = 0;
        continue;
      }
    }
    d = ~a;
    if (!d)
      continue;
    return 0;
  }
}

> gcc -fwrapv -O0  small.c ; timeout -s 9 5 ./a.out
Killed

[Bug tree-optimization/104519] [12 Regression] wrong code at -Os on x86_64-linux-gnu and char as induction variable

2022-02-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104519

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #4 from Jiu Fu Guo  ---
After a quick bisect and test, this PR and PR104521 seem to be caused by
r12-7052-g0898049ad9bf6c46e510b18aaafca4946802749f.

[Bug rtl-optimization/68212] Loop unroller breaks basic block frequencies

2022-02-06 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212

--- Comment #10 from Jiu Fu Guo  ---
I made an attempt for GCC11:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574421.html.
The patch could mitigate the BB-count mismatch issue for loops. In theory,
the patch makes sense. But it also introduces mismatched BB counts outside
the loop in some cases.
In tests on spec2017, the patch helps performance on some benchmarks, while
it introduces regressions on others.  So, I did not pursue pushing that
patch.

[Bug tree-optimization/102087] [12 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: in determine_exit_conditions, at tree-ssa-loop-manip.c:1049 since r12-3136-g3673dcf6d6baeb67

2022-01-19 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102087

--- Comment #22 from Jiu Fu Guo  ---
(In reply to Martin Liška from comment #21)
> > _12 = Gif_ClipImage_gfi_1.0_1 + -1;
> > during GIMPLE pass: aprefetch
> > bug760.c:3:1: internal compiler error: verify_gimple failed
> > 0xde2f5a verify_gimple_in_cfg(function*, bool)
> > 
> > Flag -O3 -march=bdver2 required.
> 
> Confirmed.

Hi David and Martin, thanks.  I can reproduce the failure.
I am drafting a patch and will send it out for review after regtest.

[Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them

2022-01-09 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281

--- Comment #22 from Jiu Fu Guo  ---
On power10, loading a constant needs only 1 instruction, like:
pld 9,.LC0@pcrel

And, in my tests, it seems nearly as fast as using 1 instruction to build the const.

[Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them

2022-01-06 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281

--- Comment #21 from Jiu Fu Guo  ---
I also ran a test on powerpc with -m32.  In that test, there seems to be no
significant benefit in loading from 'rodata' vs. building constants with
instructions.

lis %r7,0x410
ori %r7,%r7,0x103c
lis %r6,0x710
ori %r6,%r6,0xe005

lis %r12,.LC3@ha
la %r12,.LC3@l(%r12)
lwz %r3,0(%r12)
lwz %r4,4(%r12)

[Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them

2022-01-04 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281

--- Comment #20 from Jiu Fu Guo  ---
Created attachment 52114
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52114&action=edit
testcases

With these test cases, I invoked 'foo' in each case 1,000,000,000 times to
compare the runtime:
Building the constant with 1 insn is fastest.
Next fastest is building the const with 2 instructions, loading from rodata, or
loading from the toc.
Building the const with 3 instructions is slower than loading from rodata, and
building the const with 5 insns is slowest.

[Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them

2022-01-03 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281

--- Comment #19 from Jiu Fu Guo  ---
(In reply to Segher Boessenkool from comment #18)
Thanks for the clarification!
> Yes, it is slow.  Five sequential dependent integer instructions instead of
> one load instruction.  Depending on how you benchmark this you possibly won't
Yes, it depends on how the cases are benchmarked.  There are some factors that
affect the runtime.  This is really the point!
In the above cases, the few std(s) and the one spill of r31 all affect
the runtime and would hide the cost of the const-building instructions.
Focusing on the sequence that builds a const, the 5-insn sequence is a lot
slower than the 1-insn sequence.

[Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them

2021-12-30 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281

--- Comment #17 from Jiu Fu Guo  ---
One thing I'm wondering is whether it is really 'slow' to use instructions to
build the const (even with 5 insns).

For example, there seems to be no big difference in runtime between the two
pieces of code below on a real machine.
1.

foo:
.LFB0:
.cfi_startproc
std %r31,-8(%r1)
.cfi_offset 31, -8
li %r12,2
li %r31,1
li %r0,3
li %r11,4
std %r31,0(%r3)
std %r12,0(%r4)
std %r0,0(%r5)
std %r11,0(%r6)
std %r31,0(%r7)
std %r12,0(%r8)
ld %r31,-8(%r1)
std %r0,0(%r9)
std %r11,0(%r10)
.cfi_restore 31
blr


2
foo:
.LFB0:
.cfi_startproc
std 31,-8(1)
.cfi_offset 31, -8
li 11,0
li 31,0
li 12,0
ori 11,11,0x8000
ori 31,31,0x8000
ori 12,12,0x8000
sldi 11,11,32
sldi 31,31,32
sldi 12,12,32
oris 11,11,0x410
oris 31,31,0x410
oris 12,12,0x410
ori 11,11,0x1
ori 31,31,0x3
ori 12,12,0x5
li 0,0
std 11,0(3)
std 31,0(4)
li 3,0
li 4,0
std 12,0(5)
li 5,0
ori 0,0,0x8000
ld 31,-8(1)
ori 3,3,0x8000
ori 4,4,0x8000
ori 5,5,0x8000
sldi 0,0,32
sldi 3,3,32
sldi 4,4,32
sldi 5,5,32
oris 0,0,0x410
oris 3,3,0x410
oris 4,4,0x410
oris 5,5,0x410
ori 0,0,0x7
addi 11,11,5
ori 3,3,0xa
ori 4,4,0xe
ori 5,5,0xc
std 0,0(6)
std 11,0(7)
std 3,0(8)
std 4,0(9)
std 5,0(10)
.cfi_restore 31
   blr

[Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them

2021-12-30 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281

--- Comment #16 from Jiu Fu Guo  ---
Thanks, Alan!
I saw your patches in this PR. They would help us get the sequence we have in
mind. And as you said in the comments, fixing the insn and rtl costs is a big
problem.

[Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them

2021-12-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281

--- Comment #14 from Jiu Fu Guo  ---
For a constant like 0x0000800004100001, which needs 5 insns, the 'expand' pass
prefers to keep it in memory, while the cse1 pass replaces the load with the
constant again.

expand:
7: r119:DI=[unspec[`*.LC0',%r2:DI] 47]
  REG_EQUAL 0x800004100001
8: [r117:DI]=r119:DI

cse1:
7: r119:DI=0x800004100001
  REG_EQUAL 0x800004100001
8: [r117:DI]=r119:DI

This is because:
expand_assignment invokes force_const_mem/gen_const_mem under the condition:
(num_insns_constant (operands[1], mode) > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2))

At cse1, when the costs of 'fold_const' and 'src' are compared, 'fold_const'
is selected because
'preferable (src_folded_cost, src_folded_regcost, src_cost, src_regcost) <= 0'

src:
(mem/u/c:DI (unspec:DI [
(symbol_ref/u:DI ("*.LC0") [flags 0x82])
(reg:DI 2 2)
] UNSPEC_TOCREL) [2  S8 A8])
fold_const:
(const_int 140737556512769 [0x800004100001])

One option would be to keep the data in memory (.rodata) by adjusting the
cost of the constant.
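
The expand-time condition quoted above is a simple threshold on the number of
insns needed to build the constant.  A minimal sketch, assuming a hypothetical
helper `prefer_const_pool` that only mirrors the quoted comparison (it is not a
GCC function, and the real code has more conditions around it):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the quoted condition:
   num_insns_constant (...) > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2).
   Returns true when the constant would be forced into memory.  */
bool prefer_const_pool(int num_insns, bool cmodel_small)
{
    assert(num_insns >= 1);              /* building a constant takes >= 1 insn */
    int threshold = cmodel_small ? 2 : 3;
    return num_insns > threshold;
}
```

With this reading, a 5-insn constant goes to memory at expand under either
code model; cse1 then undoes that when the folded constant looks cheaper.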

[Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them

2021-12-21 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281

--- Comment #11 from Jiu Fu Guo  ---
For the const that Bill mentioned in comment 9, 0x0000800004100001, the code
sequence still contains a few instructions, e.g.:
li %r11,0
ori %r11,%r11,0x8000
sldi %r11,%r11,32
oris %r11,%r11,0x410
ori %r11,%r11,0x1
std %r11,0(%r3)
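
To make the arithmetic of this sequence explicit, here is a small sketch that
replays the li/ori/sldi/oris/ori steps in C.  The function name
`build_const_5insn` is only illustrative; the per-instruction comments assume
the usual Power ISA meaning of each mnemonic.

```c
#include <assert.h>
#include <stdint.h>

/* Replays the five instructions that build the constant in a register.  */
uint64_t build_const_5insn(void)
{
    uint64_t r = 0;                 /* li   r,0        */
    r |= 0x8000u;                   /* ori  r,r,0x8000 */
    r <<= 32;                       /* sldi r,r,32     */
    r |= (uint64_t)0x410 << 16;     /* oris r,r,0x410  */
    r |= 0x1u;                      /* ori  r,r,0x1    */
    return r;
}
```

The result is 0x0000800004100001, i.e. 140737556512769 in decimal.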

[Bug rtl-optimization/63281] powerpc64le creates 64 bit constants from scratch instead of loading them

2021-12-21 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63281

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org

--- Comment #10 from Jiu Fu Guo  ---
With the latest trunk (AT14 is similar), the generated code looks like this:

-O
lis %r9,0x8123
ori %r9,%r9,0x4567
rldimi %r9,%r9,32,0
std %r9,0(%r10)

Or 
-O3
lis %r11,0x1234
lis %r31,0x2345
lis %r12,0x3456
ori %r11,%r11,0x5678
ori %r31,%r31,0x6781
ori %r12,%r12,0x7812
rldimi %r11,%r11,32,0
rldimi %r31,%r31,32,0
rldimi %r12,%r12,32,0
...

This code seems better than the previous one.
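
For reference, a sketch of what the lis/ori/rldimi triple computes, written in
C.  The helper name `build_const_rldimi` is hypothetical, and the emulation
assumes the usual Power ISA semantics: lis sign-extends the shifted immediate,
and "rldimi r,r,32,0" copies the low 32 bits of the register into its high 32
bits.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates: lis r,hi ; ori r,r,lo ; rldimi r,r,32,0.  */
uint64_t build_const_rldimi(uint16_t hi, uint16_t lo)
{
    /* lis: hi << 16, sign-extended to 64 bits.  */
    uint64_t r = (uint64_t)hi << 16;
    if (hi & 0x8000u)
        r |= 0xFFFFFFFF00000000ULL;

    /* ori: OR in the low 16 bits.  */
    r |= lo;

    /* rldimi r,r,32,0: rotate left by 32 and insert the rotated value
       into the upper 32 bits, keeping the lower 32 bits unchanged.  */
    uint64_t rot = (r << 32) | (r >> 32);
    return (rot & 0xFFFFFFFF00000000ULL) | (r & 0x00000000FFFFFFFFULL);
}
```

So three insns are enough whenever the two 32-bit halves of the constant are
equal, which is the pattern in both listings above.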

[Bug target/102069] [12 regression] New test case gcc.dg/vect/pr101145_3.c in r12-3136 fails on power 7

2021-12-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102069

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jiu Fu Guo  ---
Updated the case in the trunk with the requirement of vec_char_add.

[Bug testsuite/102946] [12 Regression] gcc.dg/vect/pr101145_1.c etc. FAIL

2021-12-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102946

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #7 from Jiu Fu Guo  ---
The cases have been updated in trunk to run only on targets that support the
required features (e.g. vec_char_add).

[Bug tree-optimization/102087] [12 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: in determine_exit_conditions, at tree-ssa-loop-manip.c:1049 since r12-3136-g3673dcf6d6baeb67

2021-12-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102087

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #19 from Jiu Fu Guo  ---
So, mark this as resolved.

[Bug tree-optimization/102131] [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu since r12-3136

2021-11-15 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102131

--- Comment #8 from Jiu Fu Guo  ---
(In reply to Jakub Jelinek from comment #7)
> Any further progress on this?

Thanks, Jakub!

There is a patch that may cover more cases (PR102636/PR100740.. and other cases
where 'iv0.step - iv1.step > 0'), but it seems complex. 
https://gcc.gnu.org/pipermail/gcc-patches/2021-October/582766.html

We may need a better fix.

BR,
Jiufu

[Bug testsuite/102946] [12 Regression] gcc.dg/vect/pr101145_1.c etc. FAIL

2021-10-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102946

--- Comment #6 from Jiu Fu Guo  ---
Hi Rainer and Richard,
Thanks for working on this PR.

The intention of these test cases (pr101145*) is to test if the number 
of iterations can be calculated for the loop with the 'until wrap' 
condition.
So, I'm thinking we may be able to update the cases like:
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Symbolic number of iterations is" 2
"vect" } } */

[Bug tree-optimization/102087] [12 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: in determine_exit_conditions, at tree-ssa-loop-manip.c:1049 since r12-3136-g3673dcf6d6baeb67

2021-10-08 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102087

--- Comment #16 from Jiu Fu Guo  ---
Thanks David, Richard,

~/gcc/install/gcc-mainline-base-debug/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/home/guojiufu/gcc/install/gcc-mainline-base-debug/bin/gcc
COLLECT_LTO_WRAPPER=/home/guojiufu/gcc/install/gcc-mainline-base-debug/libexec/gcc/x86_64-pc-linux-gnu/12.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /home/guojiufu/gcc/gcc-mainline-base/configure
--prefix=/home/guojiufu/gcc/install/gcc-mainline-base-debug --disable-bootstrap
--disable-multilib --disable-werror --with-pkgversion=29c92857039d0a10
--enable-checking=df,extra,fold,rtl,yes --enable-languages=c,c++,fortran :
(reconfigured) /home/guojiufu/gcc/gcc-mainline-base/configure
--prefix=/home/guojiufu/gcc/install/gcc-mainline-base-debug --disable-bootstrap
--disable-multilib --disable-werror --with-pkgversion=29c92857039d0a10
--enable-checking=df,extra,fold,rtl,yes --enable-languages=c,c++,fortran,lto
--no-create --no-recursion
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 12.0.0 20210922 (experimental) (29c92857039d0a10) 

~/gcc/install/gcc-mainline-base-debug/bin/gcc -c -O3 -w  ~/temp/t.c
-march=opteron
~/gcc/install/gcc-mainline-base-debug/bin/gcc -c -O3 -w -march=bdver2
~/temp/t.c

cat ~/temp/t.c
char **Gif_ClipImage_gfi_0;
int Gif_ClipImage_y, Gif_ClipImage_shift;
void Gif_ClipImage(void) {
  for (; Gif_ClipImage_y >= Gif_ClipImage_shift; Gif_ClipImage_y++)
    Gif_ClipImage_gfi_0[Gif_ClipImage_shift] =
        Gif_ClipImage_gfi_0[Gif_ClipImage_y];
}

I built both 3087d1b0a2c and the latest trunk; the case passes with both.

[Bug tree-optimization/102364] [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu since r12-3136-g3673dcf6d6baeb67

2021-09-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102364

--- Comment #3 from Jiu Fu Guo  ---
We may be able to mark this as a duplicate of PR100740/PR102131.

[Bug tree-optimization/102364] [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu since r12-3136-g3673dcf6d6baeb67

2021-09-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102364

--- Comment #2 from Jiu Fu Guo  ---
This is also a case where two IVs are combined into an inaccurate step:

"{3,+,1} < {11,+,2}" was transformed to "{3,+,-1} < {11,+,0}".
The new condition is not the same as the original one.

[Bug tree-optimization/102131] [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu

2021-09-02 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102131

--- Comment #6 from Jiu Fu Guo  ---
Drafted a patch as below.  With this patch, those cases can pass.

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7af92d1c893..a400c42919b 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1866,6 +1866,24 @@ number_of_iterations_cond (class loop *loop,
  || !iv0->no_overflow || !iv1->no_overflow))
return false;

+  /* GT/GE has been transformed to LT/LE already.
+   cmp_code could be LT, LE or NE
+
+   For LE/LT transform
+   {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step}
+   to
+   {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}
+   Negative iv0.step - iv1.step means decreasing until wrap,
+   then the transform is not accurate.
+
+   For example:
+   {1, +, 1} <= {4, +, 3}
+   is not same with
+   {1, +, -2} <= {4, +, 0}
+   */
+  if ((code == LE_EXPR || code == LT_EXPR) && tree_int_cst_sign_bit
(step))
+   return false;
+
   iv0->step = step;
   if (!POINTER_TYPE_P (type))
iv0->no_overflow = false;

[Bug tree-optimization/102131] [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu

2021-09-01 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102131

--- Comment #5 from Jiu Fu Guo  ---
(In reply to bin cheng from comment #4)
> (In reply to Jiu Fu Guo from comment #3)
> > The issue may come from 'iv0 cmp iv1' transform:
> > 
> >    if (c<b)
> > -->if (c>=b) in-loop
> > -->if (b<=c) in-loop
> > 
> >   c: {4, +, 3}
> >   b: {1, +, 1}
> > 
> >   if ({1, +, 1} <= {4, +, 3})
> >   ==> if ({1,+,-2} <= {4,+,0})  here, error occur
> >   ==> if ({1,+,-2} < {5,+,0}) le-->lt
> 
> So this duplicates to PR100740?  Thanks

Yes, in theory, these PRs are related to the inaccurate conversion of
"{b1,s1} LT/LE {b2,s2}" to "{b1,s1-s2} LT/LE {b2,0}".
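
A small sketch of why the conversion is inaccurate, assuming 32-bit unsigned
wrapping arithmetic (the types in the actual PRs may differ).  The helper
`first_mismatch_le` is made up for illustration: it evaluates both forms of the
LE condition iteration by iteration and reports where they first disagree.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the first iteration i (0 <= i < limit) at which
     {b1,+,s1} <= {b2,+,s2}      (original)
   and
     {b1,+,s1-s2} <= {b2,+,0}    (transformed)
   give different answers, or -1 if they agree for all i < limit.  */
int first_mismatch_le(uint32_t b1, uint32_t s1,
                      uint32_t b2, uint32_t s2, int limit)
{
    for (int i = 0; i < limit; i++) {
        uint32_t lhs   = b1 + s1 * (uint32_t)i;
        uint32_t rhs   = b2 + s2 * (uint32_t)i;
        uint32_t lhs_t = b1 + (s1 - s2) * (uint32_t)i;  /* wraps if s1 < s2 */
        int orig  = lhs <= rhs;
        int xform = lhs_t <= b2;
        if (orig != xform)
            return i;
    }
    return -1;
}
```

For {1,+,1} <= {4,+,3} the two forms already disagree at i = 1, because
1 + (-2) wraps to a huge unsigned value.  When s1 >= s2 and nothing wraps,
e.g. {0,+,2} <= {10,+,1}, the forms agree.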

[Bug tree-optimization/102131] [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu

2021-08-31 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102131

--- Comment #3 from Jiu Fu Guo  ---
The issue may come from 'iv0 cmp iv1' transform:

   if (c<b)
-->if (c>=b) in-loop
-->if (b<=c) in-loop

  c: {4, +, 3}
  b: {1, +, 1}

  if ({1, +, 1} <= {4, +, 3})
  ==> if ({1,+,-2} <= {4,+,0})  the error occurs here
  ==> if ({1,+,-2} < {5,+,0}) le-->lt

[Bug tree-optimization/102131] [12 Regression] wrong code at -O1 and above on x86_64-linux-gnu

2021-08-31 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102131

--- Comment #2 from Jiu Fu Guo  ---
Thank you!

For this case, there are two exits, and different niters (numbers of
iterations) are calculated through them.  This kind of case is not handled
well.

In the ivcanon pass, the edge on the condition was removed incorrectly.

int a;
int main() {
  unsigned b = 0;
  int c = 1;
  for (;b < 3; b++) {
    if (c < b)
      __builtin_abort ();
    c+=3;
  }
  return 0;
}

[Bug tree-optimization/102087] [12 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: in determine_exit_conditions, at tree-ssa-loop-manip.c:1049 since r12-3136-g3673dcf6d6baeb67

2021-08-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102087

--- Comment #10 from Jiu Fu Guo  ---
Drafted a patch:

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7af92d1c893..5c77c8b7d51 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap (class loop *, tree type,
affine_iv *iv0,
 affine_iv *iv1, class tree_niter_desc *niter)
 {
   tree niter_type = unsigned_type_for (type);
-  tree step, num, assumptions, may_be_zero;
+  tree step, num, assumptions, may_be_zero, span;
   wide_int high, low, max, min;

   may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, iv1->base,
iv0->base);
@@ -1513,6 +1513,8 @@ number_of_iterations_until_wrap (class loop *, tree type,
affine_iv *iv0,
low = wi::to_wide (iv0->base);
   else
low = min;
+
+  niter->control = *iv1;
 }
   /* {base, -C} < n.  */
   else if (tree_int_cst_sign_bit (iv0->step) && integer_zerop (iv1->step))
@@ -1533,6 +1535,8 @@ number_of_iterations_until_wrap (class loop *, tree type,
affine_iv *iv0,
high = wi::to_wide (iv1->base);
   else
high = max;
+
+  niter->control = *iv0;
 }
   else
 return false;
@@ -1556,6 +1560,14 @@ number_of_iterations_until_wrap (class loop *, tree
type, affine_iv *iv0,
  niter->assumptions, assumptions);

   niter->control.no_overflow = false;
+  niter->control.base = fold_build2 (MINUS_EXPR, niter_type,
+niter->control.base, niter->control.step);
+  span = fold_build2 (MULT_EXPR, niter_type, niter->niter,
+ fold_convert (niter_type, niter->control.step));
+  niter->bound = fold_build2 (PLUS_EXPR, niter_type, span,
+ fold_convert (niter_type, niter->control.base));
+  niter->bound = fold_convert (type, niter->bound);
+  niter->cmp = NE_EXPR;

   return true;
 }


This patch supports the case even when the 'bound' is not a const.

[Bug tree-optimization/102087] [12 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: in determine_exit_conditions, at tree-ssa-loop-manip.c:1049 since r12-3136-g3673dcf6d6baeb67

2021-08-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102087

--- Comment #7 from Jiu Fu Guo  ---
If the step is +-1, or if the 'iv base' is constant, the 'bound' would be
calculated as a const.

Otherwise, the 'bound' may be something like: "(max - base) / step * step +
base".  In this case, computing 'bound' may add runtime cost.
I'm wondering if it is beneficial to support this kind of case.
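
As a worked example of the expression above, here is a sketch for an unsigned
32-bit IV, with max taken as UINT32_MAX.  The function name
`last_value_before_wrap` is only illustrative; it is the quoted formula applied
directly, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Last value the IV {base, +, step} reaches before the next
   increment would wrap: (max - base) / step * step + base.  */
uint32_t last_value_before_wrap(uint32_t base, uint32_t step)
{
    assert(step != 0);
    return (UINT32_MAX - base) / step * step + base;
}
```

For base 8 and step 16 this gives UINT32_MAX - 7; for step 1 it is always
UINT32_MAX, which is why the bound folds to a constant in that case.  With a
non-constant base and a step other than +-1, the division and multiplication
have to be done at run time.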

[Bug tree-optimization/102087] [12 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: in determine_exit_conditions, at tree-ssa-loop-manip.c:1049 since r12-3136-g3673dcf6d6baeb67

2021-08-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102087

--- Comment #6 from Jiu Fu Guo  ---
Drafting patch to calculate three items: control, bound and cmp.

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7af92d1c893..c6e4b24fd83 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap (class loop *, tree type,
affine_iv *iv0,
 affine_iv *iv1, class tree_niter_desc *niter)
 {
   tree niter_type = unsigned_type_for (type);
-  tree step, num, assumptions, may_be_zero;
+  tree step, num, assumptions, may_be_zero, span;
   wide_int high, low, max, min;

   may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, iv1->base,
iv0->base);
@@ -1513,6 +1513,8 @@ number_of_iterations_until_wrap (class loop *, tree type,
affine_iv *iv0,
low = wi::to_wide (iv0->base);
   else
low = min;
+
+  niter->control = *iv1;
 }
   /* {base, -C} < n.  */
   else if (tree_int_cst_sign_bit (iv0->step) && integer_zerop (iv1->step))
@@ -1533,6 +1535,8 @@ number_of_iterations_until_wrap (class loop *, tree type,
affine_iv *iv0,
high = wi::to_wide (iv1->base);
   else
high = max;
+
+  niter->control = *iv0;
 }
   else
 return false;
@@ -1556,6 +1560,14 @@ number_of_iterations_until_wrap (class loop *, tree
type, affine_iv *iv0,
  niter->assumptions, assumptions);

   niter->control.no_overflow = false;
+  tree niter_m1 = fold_build2 (MINUS_EXPR, niter_type, niter->niter,
+ build_int_cst (niter_type, 1));
+  span = fold_build2 (MULT_EXPR, niter_type, niter_m1,
+ fold_convert (niter_type, niter->control.step));
+  niter->bound = fold_build2 (PLUS_EXPR, niter_type, span,
+ fold_convert (niter_type, niter->control.base));
+  niter->bound = fold_convert (type, niter->bound);
+  niter->cmp = NE_EXPR;

   return true;
 }


However, this code may generate a complicated niter->bound if the step is not +-1.

[Bug tree-optimization/102072] New test case gcc.dg/vect/pr101145_3.c in r12-3136 fails on armeb

2021-08-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102072

Jiu Fu Guo  changed:

   What|Removed |Added

 CC||zhendong.su at inf dot ethz.ch

--- Comment #8 from Jiu Fu Guo  ---
*** Bug 102087 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/102087] [12 Regression] ICE on valid code at -O3 on x86_64-linux-gnu: in determine_exit_conditions, at tree-ssa-loop-manip.c:1049 since r12-3136-g3673dcf6d6baeb67

2021-08-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102087

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Jiu Fu Guo  ---
Yes, it is a duplicate of PR102072.

*** This bug has been marked as a duplicate of bug 102072 ***

[Bug tree-optimization/102072] New test case gcc.dg/vect/pr101145_3.c in r12-3136 fails on armeb

2021-08-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102072

--- Comment #7 from Jiu Fu Guo  ---
For this case, the generated code is:
  l_12 = l_25 + 1;
  if (l_12 > n_13(D))

Here: cmp is ">", bound is "n_13", and "iv(base=l_xx, step=1)".
This hits the assert in determine_exit_conditions.

For members of tree_niter_desc, the comments(as below) align with the asserts
in determine_exit_conditions. 

  /* The simplified shape of the exit condition.  The loop exits if 
 CONTROL CMP BOUND is false, where CMP is one of NE_EXPR,   
 LT_EXPR, or GT_EXPR, and step of CONTROL is positive if CMP is 
 LE_EXPR and negative if CMP is GE_EXPR.  This information is used  
 by loop unrolling.  */
  affine_iv control;
  tree bound;
  enum tree_code cmp;


In the current code, "control, bound and cmp" are set in number_of_iterations_lt:
if (integer_nonzerop (iv0->step))
  {  
niter->control = *iv0;
niter->cmp = LT_EXPR;
niter->bound = iv1->base;
  }
else
  {
niter->control = *iv1;
niter->cmp = GT_EXPR;
niter->bound = iv0->base;
  }
This code may need to be refined for the case "step until wrap condition".

[Bug tree-optimization/102072] New test case gcc.dg/vect/pr101145_3.c in r12-3136 fails on armeb

2021-08-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102072

--- Comment #6 from Jiu Fu Guo  ---
(In reply to Richard Earnshaw from comment #5)
> (In reply to Jiu Fu Guo from comment #4)
> 
> > I did not find arm big-endian yet, I'm trying to reproduce this issue on
> > other targets...
> 
> For testing purposes you should be able to build a standard arm-eabi config
> and then compile the testcase with -mbig-endian.

Thanks a lot!
Configured with --target=arm-none-eabi, I could reproduce the ICE with
-mcpu=cortex-a9.

[Bug tree-optimization/102072] New test case gcc.dg/vect/pr101145_3.c in r12-3136 fails on armeb

2021-08-26 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102072

--- Comment #4 from Jiu Fu Guo  ---
Code around tree-ssa-loop-manip.c:1049 (in determine_exit_conditions) is:
  else if (cmp == GT_EXPR)
{
  gcc_assert (tree_int_cst_sign_bit (step));
}

which seems to check that 'step' is negative if 'cmp' is ">".
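
The invariant behind that assert can be sketched as a small predicate.  The
enum and the function `exit_shape_ok` are hypothetical stand-ins, not GCC
code; the LT branch is an assumption based on the comment on tree_niter_desc
(non-negative step for "<"), and no sign requirement is modelled for NE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for NE_EXPR, LT_EXPR and GT_EXPR.  */
enum cmp_kind { CMP_NE, CMP_LT, CMP_GT };

/* True when the sign of the control IV's step is consistent with CMP.  */
bool exit_shape_ok(enum cmp_kind cmp, long step)
{
    switch (cmp) {
    case CMP_LT: return step >= 0;  /* assumed: sign bit must be clear */
    case CMP_GT: return step < 0;   /* mirrors the gcc_assert above    */
    case CMP_NE: return true;       /* no sign requirement modelled    */
    }
    return false;
}
```

An until-wrap loop that ends up described as ">" with a step of +1 fails this
check.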

code around pr101145.c:6 :
  5  unsigned __attribute__ ((noinline))
  6  foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
  7  {
  8    while (n < ++l)
  9      *a++ = *b++ + 1;
 10    return l;
 11  }


I did not find arm big-endian yet, I'm trying to reproduce this issue on other
targets...

[Bug target/102069] [12 regression] New test case gcc.dg/vect/pr101145_3.c in r12-3136 fails on power 7

2021-08-26 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102069

--- Comment #1 from Jiu Fu Guo  ---
Thanks, Segher!

The test case could be updated.  The patch supports calculating the number of
iterations for the special condition (step to min/max), so we may just update
the case to check whether the "number of iterations" is there,
like:
diff --git a/gcc/testsuite/gcc.dg/vect/pr101145_3.c
b/gcc/testsuite/gcc.dg/vect/pr101145_3.c
index 99289afec0b..819e134c6e6 100644
--- a/gcc/testsuite/gcc.dg/vect/pr101145_3.c
+++ b/gcc/testsuite/gcc.dg/vect/pr101145_3.c
@@ -10,4 +10,4 @@

 #include "pr101145.inc"

-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" } } */  
+/* { dg-final { scan-tree-dump-times "Symbolic number of iterations is" 2
"vect" } } */

[Bug target/61837] missed loop invariant expression optimization

2021-08-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61837

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Jiu Fu Guo  ---
The code looks like below with trunk and options  -O2 -mcpu=power8 -S
-fno-unroll-loops

.L2:
ble %cr7,.L7
mtctr %r5
addi %r10,%r4,-1
mr %r9,%r3
.p2align 5
.L4:
lbzu %r8,1(%r10)
cmpw %cr0,%r8,%r7
bne %cr0,.L3
stw %r6,0(%r9)
.L3:
addi %r9,%r9,4
bdnz .L4
.L7:
addi %r6,%r6,88
addi %r7,%r7,1
cmpwi %cr0,%r6,
bne %cr0,.L2
blr

Just mark this PR as resolved.

[Bug debug/101669] error reading variable from debug information when compiling with -O2

2021-07-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101669

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |WONTFIX
 Status|NEW |RESOLVED

--- Comment #8 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #6)
> Hmmm, maybe https://sourceware.org/bugzilla/show_bug.cgi?id=27999

Thanks, Andrew.

Great! With the latest gdb from the trunk, it is ok. 

(gdb) b sub
Breakpoint 1 at 0xa58: file /home/guojiufu/temp/gdb.f90, line 9.
(gdb) b 17
Breakpoint 2 at 0xafc: file /home/guojiufu/temp/gdb.f90, line 17.
(gdb) r
Starting program: /home/guojiufu/temp/gdb/binutils-gdb/arg1.exe

Breakpoint 1, 0x00010a58 in sub (a=..., n=10) at
/home/guojiufu/temp/gdb.f90:9
9   subroutine sub (a, n)
(gdb) c
Continuing.

Breakpoint 2, sub (a=..., n=10) at /home/guojiufu/temp/gdb.f90:17
17write (*,*) a
(gdb) p a
$1 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
(gdb) 

So, it seems not to be an issue in GCC, and it works fine with the latest
binutils.

Thanks all!

[Bug debug/101669] error reading variable from debug information when compiling with -O2

2021-07-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101669

--- Comment #5 from Jiu Fu Guo  ---
(In reply to Andrew Pinski from comment #4)
> What version of gdb are you using?

Tried gdb 8.1/8.3/9.2 on ppc64le.
In gdb, the msg "error reading variable: dwarf2_find_location_expression:"
appears in the stop message when stopping at the breakpoint 'sub':

Breakpoint 1, 0x00010a58 in sub (a=, n=10) at
/home/guojiufu/temp/gdb.f90:7


readelf --debug-dump arg1.exe |grep readelf
readelf: Error: Invalid location list entry type 8
readelf: Warning: Hole and overlap detection requires adjacent view lists and
loclists.


readelf -v
GNU readelf (GNU Binutils for Ubuntu) 2.34
readelf 2.30 can also get the Error/Warning msg.


gfortran -O2 -g ~/temp/gdb.f90 -gdwarf-5 -o arg1.exe

On x86 it is similar: readelf 2.30/gdb 8.1 can reproduce the msg on my side.

Before GCC 11, -gdwarf-5 is needed to reproduce, since this DWARF version only
became the default in GCC 11.

[Bug debug/101669] error reading variable from debug information when compiling with -O2

2021-07-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101669

--- Comment #2 from Jiu Fu Guo  ---
Similar to what Richard said: tested with gdb, and with -gdwarf-4 on trunk the
msg also disappears.

[Bug debug/101669] New: error reading variable from debug information when compiling with -O2

2021-07-29 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101669

Bug ID: 101669
   Summary: error reading variable from debug information when
compiling with -O2
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

For below case:
--gdb.f90
  integer :: a(10), b(12)
  call sub (a, 10)
  call sub (b, 12)
  write (*,*) a, b
end

subroutine sub (a, n)
  integer :: a(n), n
  integer(kind=8) nl, i
  nl = n

  do i = 1, nl
a(i) = i
  end do
  write (*,*) a  
end subroutine

At -O2, compiling with the command "gfortran -O2 -g ~/temp/gdb.f90 -o arg1.exe"
and then debugging the result with gdb gives this log:
a=

$ gdb arg1.exe
...
(gdb) b sub
Breakpoint 1 at 0x400710: file /home/guojiufu/temp/gdb.f90, line 7.
(gdb) r
Starting program: /home/guojiufu/gcc/build/gcc-mainline-base/arg1.exe 

Breakpoint 1, sub (a=, 
n=10) at /home/guojiufu/temp/gdb.f90:7
7   subroutine sub (a, n)

This msg does not occur when compiling with -O3 -g.

This issue can be reproduced on x86/ppc64le with the latest trunk, and also
occur in gcc11.

[Bug target/67288] [9/10/11/12 regression] non optimal simple function (useless additional shift/remove/shift/add)

2021-07-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67288

Jiu Fu Guo  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED
 CC||guojiufu at gcc dot gnu.org

--- Comment #22 from Jiu Fu Guo  ---
(In reply to Segher Boessenkool from comment #4)
> It's not fixed.  On trunk we get:
> 
> ===
> flush_dcache_range:
> rlwinm 3,3,0,0,27
> addi 4,4,15
> subf 4,3,4
> srwi. 9,4,4
> beq 0,.L1
> slwi 9,9,4
> addi 9,9,-16
> srwi 9,9,4
> addi 9,9,1
> mtctr 9
> .p2align 4,,15
> .L3:
> dcbf 0, 3
> addi 3,3,16
> bdnz .L3
> sync
> .L1:
> blr
> ===
> 
> (-m32, edited a bit).
> 
> The slwi/addi/srwi/addi is unnecessary.

With the latest trunk (which contains
https://gcc.gnu.org/g:8a15faa730f99100f6f3ed12663563356ec5a2c0)
The asm code is:

.cfi_startproc
rldicr %r3,%r3,0,59
addi %r9,%r4,15
subf %r9,%r3,%r9
srwi %r9,%r9,4
cmpwi %cr0,%r9,0
beqlr %cr0
rldicl %r9,%r9,0,32
mtctr %r9
.p2align 4,,15
.L3:
dcbf 0, %r3
addi %r3,%r3,16
bdnz .L3
sync
blr

[Bug tree-optimization/101291] turns infinite loop into finite

2021-07-02 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101291

--- Comment #7 from Jiu Fu Guo  ---
When ivopts generates doloop.xxx, the gimple looks like:

   [local count: 21023864]:
  _38 = val_4(D) - start_3(D);
  _29 = _38 / 16;
  doloop.15_35 = _29 + 1;

   [local count: 191126041]:
  # cnt_17 = PHI <0(13), cnt_19(10)>
  # doloop.15_28 = PHI 
  cnt_19 = cnt_17 + 1;
  doloop.15_23 = doloop.15_28 - 1;
  if (doloop.15_23 != 0)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 170102176]:
  goto ; [100.00%]


Before that:
   [local count: 21023864]:

   [local count: 191126041]:
  # cnt_17 = PHI <0(13), cnt_19(10)>
  # i_18 = PHI 
  cnt_19 = cnt_17 + 1;
  i_20 = i_18 + 16;
  if (val_4(D) >= i_20)
goto ; [89.00%]
  else
goto ; [11.00%]

   [local count: 170102176]:
  goto ; [100.00%]

   [local count: 118111600]:
  # cnt_11 = PHI 
  return cnt_11;


---
In number_of_iterations_exit_assumptions, there is code: 
   if (!integer_zerop (niter->assumptions)
&& loop_constraint_set_p (loop, LOOP_C_FINITE))
  niter->assumptions = boolean_true_node;

Here niter->assumptions is reset to true, and doloop.xx is then generated as
if niter were always valid.

[Bug tree-optimization/101291] New: turns infinite loop into finite

2021-07-02 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101291

Bug ID: 101291
   Summary: turns infinite loop into finite
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: guojiufu at gcc dot gnu.org
  Target Milestone: ---

The code below should run forever, but it terminates quickly.

#include <limits.h>
__attribute__ ((noinline))
unsigned foo(unsigned val, unsigned start)
{
  unsigned cnt = 0;
  for (unsigned i = start; i <= val; i+=16)
    cnt++;
  return cnt;
}

int main()
{
  return foo (UINT_MAX-7, 8);
}
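
Why this loop never exits: starting from 8 with step 16, 'i' reaches exactly
UINT_MAX-7, the condition still holds, and the next increment wraps back to 8.
A scaled-down sketch with an 8-bit IV shows the same effect.  The helper
`loop_never_exits_u8` is made up for illustration and uses val = 255-7 as the
8-bit analogue of UINT_MAX-7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models: for (uint8_t i = start; i <= val; i += step) ...
   Returns true if the loop can never exit, i.e. the IV revisits a
   value while the condition is still true.  */
bool loop_never_exits_u8(uint8_t val, uint8_t start, uint8_t step)
{
    bool seen[256] = { false };
    uint8_t i = start;
    while (i <= val) {
        if (seen[i])
            return true;        /* same state again: infinite loop */
        seen[i] = true;
        i = (uint8_t)(i + step);  /* wraps modulo 256 */
    }
    return false;
}
```

With val = 248 the IV walks 8, 24, ..., 248 and then wraps to 8 again; with
val = 240 it steps past the bound at 248 and the loop is finite.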

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-30 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145

--- Comment #8 from Jiu Fu Guo  ---
Referring to the code of adjust_cond_for_loop_until_wrap, I added code for the
non-const cases.  The code was first added at the beginning of
adjust_cond_for_loop_until_wrap; to set may_be_zero and no_overflow, it was
finally moved to number_of_iterations_lt.

The patch was submitted as: 
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574110.html

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145

--- Comment #6 from Jiu Fu Guo  ---
> As tests, for below loop, adjust_cond_for_loop_until_wrap return false:
> 
> foo (int *__restrict__ a, int *__restrict__ b, unsigned i)
> {
>   while (++i > 100)
> *a++ = *b++ + 1;
> }
For the above code, niter may still be zero, e.g. "i < 100" at the start.
For the code below, niter can be determined as a constant at compile time.
> 
> For below code, adjust_cond_for_loop_until_wrap returns true:
>   i = UINT_MAX - 200;
>   while (++i > 100)
> *a++ = *b++ + 1;

For the code below, niter may also be zero, e.g. "UINT_MAX - 100 < n".
   i = UINT_MAX - 200
   while (++i > n)
 *a++ = *b++ + 1;
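
To illustrate the until-wrap iteration count and the may-be-zero case, here is
a sketch that uses an 8-bit unsigned IV so that every input can be checked
exhaustively.  All function names are hypothetical; the closed form is my
reading of the loop "while (++i > n)", not the expression GCC builds.

```c
#include <assert.h>
#include <stdint.h>

/* Direct simulation of: while (++i > n) body;  */
unsigned simulate_until_wrap(uint8_t i, uint8_t n)
{
    unsigned iters = 0;
    while (++i > n)     /* i wraps from 255 to 0, which ends the loop */
        iters++;
    return iters;
}

/* Closed form: zero iterations if the first tested value is <= n,
   otherwise one iteration for each value from first up to 255.  */
unsigned closed_form_until_wrap(uint8_t i, uint8_t n)
{
    uint8_t first = (uint8_t)(i + 1u);
    if (first <= n)
        return 0;       /* the may_be_zero case */
    return 256u - first;
}

/* Counts the (i, n) pairs on which the two versions disagree.  */
unsigned count_mismatches(void)
{
    unsigned bad = 0;
    for (unsigned i = 0; i < 256; i++)
        for (unsigned n = 0; n < 256; n++)
            if (simulate_until_wrap((uint8_t)i, (uint8_t)n)
                != closed_form_until_wrap((uint8_t)i, (uint8_t)n))
                bad++;
    return bad;
}
```

The count depends on n only through the zero-iteration test, which is why a
non-constant n mainly needs the may_be_zero condition.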

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145

--- Comment #5 from Jiu Fu Guo  ---
(In reply to bin cheng from comment #4)
> (In reply to Jiu Fu Guo from comment #3)
> > Yes, while the code in adjust_cond_for_loop_until_wrap seems somehow tricky:
> > 
> >   /* Only support simple cases for the moment.  */
> >   if (TREE_CODE (iv0->base) != INTEGER_CST
> >   || TREE_CODE (iv1->base) != INTEGER_CST)
> > return false;
> > 
> > This code requires both sides are constant.
> Actually it requires an IV with constant base.

I also feel that the intention of this function may only require one side to be
constant for IV0 CODE IV1.
As a test, for the loop below, adjust_cond_for_loop_until_wrap returns false:

foo (int *__restrict__ a, int *__restrict__ b, unsigned i)
{
  while (++i > 100)
*a++ = *b++ + 1;
}

For the code below, adjust_cond_for_loop_until_wrap returns true:
  i = UINT_MAX - 200;
  while (++i > 100)
*a++ = *b++ + 1;

[Bug tree-optimization/101145] niter analysis fails for until-wrap condition

2021-06-24 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101145

--- Comment #3 from Jiu Fu Guo  ---
Yes, while the code in adjust_cond_for_loop_until_wrap seems somewhat tricky:

  /* Only support simple cases for the moment.  */
  if (TREE_CODE (iv0->base) != INTEGER_CST
  || TREE_CODE (iv1->base) != INTEGER_CST)
return false;

This code requires both sides to be constant.

[Bug rtl-optimization/100622] Conversion to smaller unsigned type in loop

2021-06-08 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100622

Jiu Fu Guo  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
 CC||guojiufu at gcc dot gnu.org

--- Comment #6 from Jiu Fu Guo  ---
I tested this: the issue has been fixed on the trunk by r12-1202.

[Bug target/59371] [9/10/11/12 Regression] Performance regression in GCC 4.8/9/10/11/12 and later versions.

2021-05-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59371

--- Comment #28 from Jiu Fu Guo  ---
If the code is changed as below, so that 'i' does not start from '0' and the
compare code is '!=', then wrap/overflow on 'i' may happen, and optimizations
(e.g. vectorization) are not applied.
The patch below tries to optimize this kind of loop.
https://gcc.gnu.org/pipermail/gcc-patches/2021-May/570424.html

int foo (int *p, unsigned short u_n, unsigned short i)
{
  int x = 0;
  for (; i != u_n; i++) {
    x = x + p[i];
  }
  return x;
}
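
Even though 'i' may wrap here, the trip count of a "!=" loop with step 1 still
has a simple closed form.  A sketch, assuming a 16-bit unsigned short; the
function names are illustrative only and this is not the code from the patch.

```c
#include <assert.h>

/* Direct simulation of: for (; i != u_n; i++) ...  */
unsigned simulate_ne(unsigned short i, unsigned short u_n)
{
    unsigned iters = 0;
    for (; i != u_n; i++)   /* i wraps around at the type's maximum */
        iters++;
    return iters;
}

/* Closed form: the difference taken modulo the width of the type.  */
unsigned closed_form_ne(unsigned short i, unsigned short u_n)
{
    return (unsigned short)(u_n - i);
}
```

For example, starting at 65530 with u_n = 5, the loop runs 11 times: six
iterations up to 65535, then five more after the wrap.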

[Bug target/59371] [9/10/11/12 Regression] Performance regression in GCC 4.8/9/10/11/12 and later versions.

2021-05-16 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59371

Jiu Fu Guo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojiufu at gcc dot gnu.org

--- Comment #27 from Jiu Fu Guo  ---
At -O2, since a few optimizations (e.g. some loop-based optimizations) are not
enabled, the code is not optimized much.

At -O3, GCC can now vectorize it, whereas GCC 4.8 did not vectorize the code.
I guess the performance pain may be mitigated.

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537

--- Comment #15 from Jiu Fu Guo  ---
(In reply to Jiu Fu Guo from comment #9)
> Yes,
> 
> diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
> index 5d9dbb5d068..32637a44af1 100644
> --- a/gcc/go/go-gcc.cc
> +++ b/gcc/go/go-gcc.cc
> @@ -1680,6 +1680,7 @@ Gcc_backend::address_expression(Bexpression* bexpr,
> Location location)
>if (expr == error_mark_node)
>  return this->error_expression();
> 
> +  TREE_ADDRESSABLE(expr) = 1;
>tree ret = build_fold_addr_expr_loc(location.gcc_location(), expr);
>return this->make_expression(ret);
>  }
> 
> Could pass bootstrap.

So, this patch would pass bootstrap and regtest.

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-13 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537

--- Comment #14 from Jiu Fu Guo  ---
Update/correct info:
With bootstrap-O3, the message is "error: method 'foo' is ambiguous".
Otherwise, it is "error: type has no method 'foo'".

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537

--- Comment #13 from Jiu Fu Guo  ---
(In reply to Ian Lance Taylor from comment #12)
> A change to go-gcc.cc should not change any of the error messages emitted by
> the Go frontend.  It should not change the way that issue4458.go is handled.
> Those errors messages are emitted long before any of the code in go-gcc.cc
> is called.  I'm not sure what is happening.

It is interesting.  I reran on trunk (without the patch), and the message is
"error: type has no method 'foo'".
With the patch, the message is "error: method 'foo' is ambiguous".

At expressions.cc:14655:
  if (!is_ambiguous)
    go_error_at(location, "type has no method %<%s%>",
                Gogo::message_name(name).c_str());
  else
    go_error_at(location, "method %<%s%> is ambiguous",
                Gogo::message_name(name).c_str());

is_ambiguous is set here:
  if (nt != NULL)
    method = nt->method_function(name, &is_ambiguous);
  else if (st != NULL)
    method = st->method_function(name, &is_ambiguous);

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537

--- Comment #11 from Jiu Fu Guo  ---
I ran a quick regression test on the patch:
issue4458.go, which passed before, fails with this patch.
The compile message changed from "error: method expression requires named type
or pointer to named type" to "error: method 'foo' is ambiguous".

I'm not sure whether this message change is expected.

issue4458.go:
package main

type T struct{}

func (T) foo() {}

func main() {
	av := T{}
	pav := &av
	(**T).foo(&pav) // ERROR "no method .*foo|requires named type or pointer to named"
}

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537

--- Comment #9 from Jiu Fu Guo  ---
Yes,

diff --git a/gcc/go/go-gcc.cc b/gcc/go/go-gcc.cc
index 5d9dbb5d068..32637a44af1 100644
--- a/gcc/go/go-gcc.cc
+++ b/gcc/go/go-gcc.cc
@@ -1680,6 +1680,7 @@ Gcc_backend::address_expression(Bexpression* bexpr, Location location)
   if (expr == error_mark_node)
 return this->error_expression();

+  TREE_ADDRESSABLE(expr) = 1;
   tree ret = build_fold_addr_expr_loc(location.gcc_location(), expr);
   return this->make_expression(ret);
 }

With this change, bootstrap passes.

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537

--- Comment #6 from Jiu Fu Guo  ---
As Richard mentioned, one caller does mark the object addressable: the one
for 'label' (Gcc_backend::label_address).

I'm wondering whether all the other callers of build_fold_addr_expr_loc also
need to mark the object addressable.

[Bug go/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-12 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537

--- Comment #5 from Jiu Fu Guo  ---
Set a breakpoint at tree-ssa.c:1013, on
error ("address taken, but ADDRESSABLE bit not set"):

          if ((VAR_P (base)
               || TREE_CODE (base) == PARM_DECL
               || TREE_CODE (base) == RESULT_DECL)
              && !TREE_ADDRESSABLE (base))
            {
B             error ("address taken, but ADDRESSABLE bit not set");
              err = true;
            }

we can see base:
p base
$1 = 

Then, breaking in ggc_internal_alloc (b ggc-page.c:1455 if result == 0x20a285e0),
we can see the stack:
Unary_expression::do_get_backend (expressions.cc:5322)
-> Gcc_backend::implicit_variable (go-gcc.cc:29239)
-> build_decl -> make_node -> ... -> ggc_internal_cleared_alloc

Gcc_backend::implicit_variable:
  tree decl = build_decl(BUILTINS_LOCATION, VAR_DECL, ...

Unary_expression::do_get_backend (expressions.cc:5322):
  gogo->backend()->implicit_variable(var_name, "", btype, true, true, false, 0);
where var_name is go..C479


Finally, breaking at build_fold_addr_expr_loc if t == 0x20a285e0 shows:
Gcc_backend::address_expression (go-gcc.cc:1683) --> build_fold_addr_expr_loc
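
The situation traced above can be mimicked with a toy model. This is only a sketch: the `toy_*` types and functions are invented for illustration and are not GCC's tree API. It shows an address being taken without the flag set, which a verifier in the style of the tree-ssa.c check then rejects, and the same construction with the flag set first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a declaration node; NOT GCC's tree type.  */
struct toy_decl
{
  const char *name;
  bool addressable;   /* Plays the role of the ADDRESSABLE bit.  */
};

/* Toy stand-in for an address-of expression.  */
struct toy_addr_expr
{
  struct toy_decl *base;
};

/* Take the address without touching the flag.  */
struct toy_addr_expr
toy_build_addr (struct toy_decl *decl)
{
  struct toy_addr_expr e = { decl };
  return e;
}

/* Take the address and mark the declaration addressable first.  */
struct toy_addr_expr
toy_build_addr_marked (struct toy_decl *decl)
{
  decl->addressable = true;
  return toy_build_addr (decl);
}

/* Toy verifier: true means "address taken, but ADDRESSABLE bit not set".  */
bool
toy_verify_addr (const struct toy_addr_expr *e)
{
  return e->base != NULL && !e->base->addressable;
}

/* The unmarked variant trips the verifier; the marked one does not.  */
void
toy_addr_self_check (void)
{
  struct toy_decl a = { "a", false };
  struct toy_decl b = { "b", false };
  struct toy_addr_expr ea = toy_build_addr (&a);
  struct toy_addr_expr eb = toy_build_addr_marked (&b);
  assert (toy_verify_addr (&ea));
  assert (!toy_verify_addr (&eb));
}
```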

[Bug middle-end/100537] Bootstrap-O3 and bootstrap-debug fail on 32-bit ARM after gcc-12-657-ga076632e274a

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100537

Jiu Fu Guo  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojiufu at gcc dot gnu.org

--- Comment #3 from Jiu Fu Guo  ---
This issue also occurs on ppc64le when --with-build-config=bootstrap-O3.

As mentioned in PR100513:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513#c21

[Bug ipa/100513] [10/11 Regression] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3 since r11-6411-gae99b315ba5b9e1c

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #24 from Jiu Fu Guo  ---
(In reply to rguent...@suse.de from comment #22)
> On Tue, 11 May 2021, guojiufu at gcc dot gnu.org wrote:
> 
cut..
> > Makefile:3001: recipe for target 'syscall.lo' failed
> 
> Yes, this was reported by Maxim as well, independent of this
> patch.  It's caused by sth else.

Thanks ;)

[Bug ipa/100513] [10/11 Regression] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3 since r11-6411-gae99b315ba5b9e1c

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #23 from Jiu Fu Guo  ---
Created attachment 50791
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50791&action=edit
the command to build syscall.o

[Bug ipa/100513] [10/11 Regression] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3 since r11-6411-gae99b315ba5b9e1c

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #21 from Jiu Fu Guo  ---
When building Go on trunk with the patch, an error occurs:
In function 'syscall.forkExec':
go1: error: address taken, but ADDRESSABLE bit not set
PHI argument

for PHI node
err$__object_77 = PHI 
during GIMPLE pass: fre
go1: internal compiler error: verify_ssa failed
mv -f .deps/tsan_rtl_report.Tpo .deps/tsan_rtl_report.Plo
0x10fde5cf verify_ssa(bool, bool)
/home/guojiufu/gcc/gcc-mainline-test/gcc/tree-ssa.c:1214
0x10aff4ef execute_function_todo
/home/guojiufu/gcc/gcc-mainline-test/gcc/passes.c:2049
0x10b011c3 do_per_function
/home/guojiufu/gcc/gcc-mainline-test/gcc/passes.c:1687
0x10b011c3 execute_todo
/home/guojiufu/gcc/gcc-mainline-test/gcc/passes.c:2096
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Makefile:3001: recipe for target 'syscall.lo' failed

[Bug ipa/100513] [10/11 Regression] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3 since r11-6411-gae99b315ba5b9e1c

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #20 from Jiu Fu Guo  ---
Yes, with the patch, bootstrap-O3 passes on ppc64le too.

Thanks!

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-11 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #7 from Jiu Fu Guo  ---
A similar issue was also reported on x86 before:
https://gcc.gnu.org/pipermail/gcc-testresults/2021-April/677996.html
However, when I ran bootstrap-O3 on one x86 machine, it passed.

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #6 from Jiu Fu Guo  ---
cut..
> 0xa5a5a5a5a5a5a5a means the location has been GC'ed already; either from
> ggc_free or from a previous ggc_collect.
> What you can try is run with the following options:
> --param ggc-min-expand=1 --param ggc-min-heapsize=1
> Which will cause ggc_collect to run the garbage collection almost every time
> (setting it to 0 will run it every time but it is much much slower) and
> reduce the testcase that way.

Hi Andrew, thanks.

With --param ggc-min-expand=1 --param ggc-min-heapsize=1, the error can be
reproduced too.

> 
> Also it would be a good idea to attach the preprocessed source and the exact
> command line xgcc is involved and which stage is this at too and the full
> configure command line used.

I will continue to reduce the testcase; the .ii file is still large.

> 
> I highly doubt it is related to PR 99447 which is about a stack overflow
> while doing the garbage collection walk.
Yes, it is possible! The stack depth is large: at least 500 levels.

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #5 from Jiu Fu Guo  ---
The build command is:
configure --enable-languages=c,c++,fortran,objc,obj-c++,go --with-cpu=native
--disable-multilib --with-long-double-128 --prefix=$HOME/xx
--with-build-config=bootstrap-O3
make -j

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #4 from Jiu Fu Guo  ---
Created attachment 50787
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=50787&action=edit
t.ii

/home/guojiufu/gcc/build/gcc-mainline-test/./prev-gcc/xg++
-B/home/guojiufu/gcc/build/gcc-mainline-test/./prev-gcc/   -c   -O3  -o
tree-cfg.o 
t.ii

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #2 from Jiu Fu Guo  ---
A similar bug was fixed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99447, but this may be a
different issue.

[Bug ipa/100513] ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

--- Comment #1 from Jiu Fu Guo  ---
The error is raised after the IPA "inlining" pass, when doing ggc_collect at
stage 2.

At this code:
xlimit = ((*xlimit).next);

The value of xlimit becomes 0xa5a5a5a5a5a5a5a5 before the crash.  The 0xa5
bytes may come from poison_pages.


With "-fno-inline", the crash disappears.
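
To show why a repeated 0xa5 byte produces exactly that value, here is a minimal sketch. It assumes, as suggested above, that the freed memory was filled byte by byte with 0xa5; the `toy_*` helpers are hypothetical and are not the ggc-page.c code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill SIZE bytes at P with 0xa5, mimicking a poisoning pass over freed
   memory.  Hypothetical helper; not the ggc-page.c implementation.  */
void
toy_poison (void *p, size_t size)
{
  memset (p, 0xa5, size);
}

/* Read a poisoned 8-byte slot back as an integer.  Every byte is 0xa5, so
   the result does not depend on endianness.  */
uint64_t
toy_poisoned_word (void)
{
  uint64_t word = 0;
  toy_poison (&word, sizeof word);
  return word;
}

/* The pattern matches the xlimit value seen before the crash.  */
void
toy_poison_self_check (void)
{
  assert (toy_poisoned_word () == UINT64_C (0xa5a5a5a5a5a5a5a5));
}
```

On a 64-bit target, a pointer field such as `next` that lives in poisoned memory would read back with this same bit pattern, which is consistent with the object having been freed before the walk reached it.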

[Bug ipa/100513] New: ICE: Segmentation fault (in lookup_page_table_entry) for bootstrap-O3

2021-05-10 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100513

            Bug ID: 100513
           Summary: ICE: Segmentation fault (in lookup_page_table_entry)
                    for bootstrap-O3
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ipa
          Assignee: unassigned at gcc dot gnu.org
          Reporter: guojiufu at gcc dot gnu.org
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

With --with-build-config=bootstrap-O3, I encounter an ICE in bootstrap on
ppc64le:

/home/guojiufu/gcc/gcc-mainline-test/gcc/tree-cfg.c: In function 'tree_node*
get_cases_for_edge(edge, gswitch*)':
/home/guojiufu/gcc/gcc-mainline-test/gcc/tree-cfg.c:1339:1: internal compiler
error: Segmentation fault
 1339 | }
   | ^
0x116b928b crash_signal
/home/guojiufu/gcc/gcc-mainline-test/gcc/toplev.c:327
0x10c6f55c lookup_page_table_entry
/home/guojiufu/gcc/gcc-mainline-test/gcc/ggc-page.c:630
0x10c71057 ggc_set_mark(void const*)
/home/guojiufu/gcc/gcc-mainline-test/gcc/ggc-page.c:1544
0x11175ddf gt_ggc_mx_basic_block_def(void*)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1518
0x11174a5f gt_ggc_mx_gimple(void*)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1238
0x10af4757 gt_ggc_mx_lang_tree_node(void*)
./gt-cp-tree.h:483
0x11174a2b gt_ggc_mx_gimple(void*)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1235
0x111756f3 gt_ggc_mx_cgraph_edge(void*)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1403
0x111756d3 gt_ggc_mx_cgraph_edge(void*)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1402
0x111751e7 gt_ggc_mx_symtab_node(void*)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1348
0x10af3c37 gt_ggc_mx_lang_tree_node(void*)
./gt-cp-tree.h:346
0x11176ff3 gt_ggc_mx(tree_node*&)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1790
0x10b03e7b void gt_ggc_mx(vec*)
/home/guojiufu/gcc/gcc-mainline-test/gcc/vec.h:1353
0x11177477 gt_ggc_mx_vec_tree_va_gc_(void*)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1868
0x10af1f53 gt_ggc_mx_lang_type(void*)
./gt-cp-tree.h:36
0x10af4457 gt_ggc_mx_lang_tree_node(void*)
./gt-cp-tree.h:440
0x10af32ef gt_ggc_mx_lang_tree_node(void*)
./gt-cp-tree.h:263
0x11176ff3 gt_ggc_mx(tree_node*&)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1790
0x10b03e7b void gt_ggc_mx(vec*)
/home/guojiufu/gcc/gcc-mainline-test/gcc/vec.h:1353
0x11177477 gt_ggc_mx_vec_tree_va_gc_(void*)
/home/guojiufu/gcc/build/gcc-mainline-test/gcc/gtype-desc.c:1868
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
Makefile:1142: recipe for target 'tree-cfg.o' failed

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-27 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813

--- Comment #8 from Jiu Fu Guo  ---
For the code in comment 4, it is optimized because there is some range info
for "_2 = l_m_34 + _54;" where _54 > 0.
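
One possible reading of why such range info helps, which the comment itself does not spell out: when n > 0 and m > 0, the 32-bit unsigned sum n + l_m cannot wrap. The sketch below checks that arithmetic with invented `toy_*` helpers and fixed-width types, so it assumes 32-bit int and unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of n + l_m for l_m in [0, m), computed in 64 bits so the
   sum itself cannot wrap.  Requires n > 0 and m > 0.  Hypothetical helper.  */
uint64_t
toy_max_index (int32_t n, int32_t m)
{
  assert (n > 0 && m > 0);
  return (uint64_t) n + (uint64_t) (m - 1);
}

/* Nonzero when every index n + l_m fits in 32-bit unsigned without
   wrapping.  */
int
toy_index_fits_u32 (int32_t n, int32_t m)
{
  return toy_max_index (n, m) <= UINT32_MAX;
}

/* Worst case: 2147483647 + 2147483646 = 4294967293, still below 2^32.  */
void
toy_range_self_check (void)
{
  assert (toy_max_index (INT32_MAX, INT32_MAX) == UINT64_C (4294967293));
  assert (toy_index_fits_u32 (INT32_MAX, INT32_MAX));
}
```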

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-26 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813

--- Comment #7 from Jiu Fu Guo  ---
(In reply to Richard Biener from comment #6)
> (In reply to Andrew Pinski from comment #5)
> > (In reply to Jiu Fu Guo from comment #0)
> > > For the below code:
> > > ---t.c
> > > void
> > > foo (const double* __restrict__ A, const double* __restrict__ B, double*
> > > __restrict__ C,
> > >  int n, int k, int m)
> > > {
> > >   for (unsigned int l_m = 0; l_m < m; l_m++)
> > > C[n + l_m] += A[k + l_m] * B[k];
> > > }
> > 
> > Try using unsigned long instead of unsigned int.
> > I think this is the same as PR 61247.
> 
> Yes, I think we've seen plenty examples in the past where conversions in
> the SCEV chain prevent analysis.

Yes. Thanks for your comments and suggestions!

And for this code (unsigned int), I'm wondering whether we really need a
runtime scev/overflow check before vectorizing it, to guard
`n+m<4294967295 && m<4294967295`.
Without this guard, I'm wondering whether the optimization is correct for the
code in comment 4.
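
As a sketch of what such a runtime guard could look like (hypothetical helper names; evaluating the sum in 64 bits is an assumption made here so that the check itself cannot wrap), together with the wrap it is meant to rule out:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guard modelled on `n+m<4294967295 && m<4294967295`, with the
   sum evaluated in 64 bits.  */
int
toy_no_wrap_guard (uint32_t n, uint32_t m)
{
  return (uint64_t) n + (uint64_t) m < UINT64_C (4294967295)
         && m < UINT32_C (4294967295);
}

/* The 32-bit index as the loop computes it; the result wraps modulo 2^32.  */
uint32_t
toy_index (uint32_t n, uint32_t l_m)
{
  return n + l_m;
}

/* With n close to 2^32, the index sequence jumps from 4294967295 back to 0,
   so the accesses are not contiguous and the guard rejects the case.  */
void
toy_guard_self_check (void)
{
  assert (toy_no_wrap_guard (10, 20));
  assert (!toy_no_wrap_guard (UINT32_C (4294967290), 10));
  assert (toy_index (UINT32_C (4294967290), 5) == UINT32_C (4294967295));
  assert (toy_index (UINT32_C (4294967290), 6) == 0);
}
```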

[Bug tree-optimization/98813] loop is sub-optimized if index is unsigned int with offset

2021-01-25 Thread guojiufu at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98813

--- Comment #4 from Jiu Fu Guo  ---
Thanks, Richard!

One interesting thing: the code below is vectorized:

void
foo (const double *__restrict__ A, const double *__restrict__ B,
 double *__restrict__ C, int n, int k, int m)
{
  if (n > 0 && m > 0 && k > 0)
    for (unsigned int l_m = 0; l_m < m; l_m++)
      C[n + l_m] += A[k + l_m] * B[k];
}
