[Bug tree-optimization/115693] 8 std::byte std::array comparison potential missed optimization

2024-09-22 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115693

Levy Hsu  changed:

   What|Removed |Added

 CC||admin at levyhsu dot com

--- Comment #7 from Levy Hsu  ---
Looks like the problem is solved:
https://godbolt.org/z/Y7bn77bjG

#include 
#include 
#include 

bool compare1(const std::array &p, std::array r)
{
return p == r;
}

bool compare2(const std::array &p, std::array r)
{
return p == r;
}

// same assembly if you use char instead of byte
bool compare3(const std::array &p, std::array r)
{
return std::bit_cast(p) == std::bit_cast(r);
}

Compiled with trunk and the --std=c++20 -O3 flags, this produces:

compare1(std::array const&, std::array):
        cmp     QWORD PTR [rdi], rsi
        sete    al
        ret
compare2(std::array const&, std::array):
        cmp     QWORD PTR [rdi], rsi
        sete    al
        ret
compare3(std::array const&, std::array):
        cmp     QWORD PTR [rdi], rsi
        sete    al
        ret

[Bug c++/116064] New: [15 Regression] SPEC 2017 523.xalancbmk_r failed to build

2024-07-23 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116064

Bug ID: 116064
   Summary: [15 Regression] SPEC 2017 523.xalancbmk_r failed to
build
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: admin at levyhsu dot com
  Target Milestone: ---

On x86 CascadeLake/IceLake/Sapphire Rapids/Zen4/Zen3, compile with:
-march=native -Ofast -funroll-loops -flto
or
-mtune=generic -O2 -march=x86-64-v3

On Aarch64 Neoverse-N1/Graviton 3, compile with:
-march=native -Ofast -funroll-loops -flto
or
-O2

Bisected to r15-2117-g313afcfdabeab3

In file included from XalanXMLSerializerFactory.cpp:30:
xalanc/XMLSupport/XalanOtherEncodingWriter.hpp: In member function 'void
xalanc_1_10::XalanOtherEncodingWriter::writeSafe(const xalanc_1_10::XalanDOMChar*,
xalanc_1_10::XalanFormatterWriter::size_type)':
xalanc/XMLSupport/XalanOtherEncodingWriter.hpp:319:30: error: 'class
xalanc_1_10::XalanOtherEncodingWriter' has no member
named 'm_isPresentable'
  319 | if(this->m_isPresentable(value))
  |  ^~~
xalanc/XMLSupport/XalanOtherEncodingWriter.hpp:325:31: error: 'class
xalanc_1_10::XalanOtherEncodingWriter' has no member
named 'writeNumberedEntityReference'
  325 | this->writeNumberedEntityReference(value);
  |   ^~~~
specmake: *** [**hidden_path**/speccpu2017/benchspec/Makefile.defaults:366:
XalanXMLSerializerFactory.o] Error 1

[Bug target/115889] [15 Regression] FAIL: gcc.dg/vect/vect-vfa-03.c execution test with -march=znver4 --param vect-partial-vector-usage=1 since r15-1368-g6d0b7b69d14302

2024-07-13 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115889

--- Comment #7 from Levy Hsu  ---
It appears that with vect-partial-vector-usage=2, the short int mode V32HI falls
into vpermt2_sepcial_bf16_shuffle_, which was originally intended only for
bf16. Will investigate.

[Bug tree-optimization/115256] [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7

2024-05-28 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

--- Comment #3 from Levy Hsu  ---
FYI, we tried several combinations. -funroll-loops did not cause the issue;
link-time optimization (-flto) appears to be the trigger, since the benchmark
passes with [-march=native -Ofast -funroll-loops].

However, compiling with -flto makes it harder to minimize a test case, so we're
still not clear on what exactly the issue is.

[Bug tree-optimization/115256] New: [15 Regression] 502.gcc_r Run failed with '-march=native -Ofast -funroll-loops -flto' since r15-571-g1e0ae1f52741f7

2024-05-28 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115256

Bug ID: 115256
   Summary: [15 Regression] 502.gcc_r Run failed with
'-march=native -Ofast -funroll-loops -flto' since
r15-571-g1e0ae1f52741f7
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: admin at levyhsu dot com
  Target Milestone: ---

Bisected to r15-571-g1e0ae1f52741f7
(1e0ae1f52741f7e0133661659ed2d210f939a398)

tree-optimization/79958 - make DSE track multiple paths
DSE currently gives up when the path we analyze forks.  This leads
to multiple missed dead store elimination PRs.  The following fixes
this by recursing for each path and maintaining the visited bitmap
to avoid visiting CFG re-merges multiple times.  The overall cost
is still limited by the same bound, it's just more likely we'll hit
the limit now.  The patch doesn't try to deal with byte tracking
once a path forks but drops info on the floor and only handling
fully dead stores in that case.

PR tree-optimization/79958
PR tree-optimization/109087
PR tree-optimization/100314
PR tree-optimization/114774
* tree-ssa-dse.cc (dse_classify_store): New forwarder.
(dse_classify_store): Add arguments cnt and visited, recurse
to track multiple paths when we end up with multiple defs.

* gcc.dg/tree-ssa/ssa-dse-48.c: New testcase.
* gcc.dg/tree-ssa/ssa-dse-49.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-50.c: Likewise.
* gcc.dg/tree-ssa/ssa-dse-51.c: Likewise.
* gcc.dg/graphite/pr80906.c: Avoid DSE of last data reference
in loop.
* g++.dg/ipa/devirt-24.C: Adjust for extra DSE.
* g++.dg/warn/Wuninitialized-pr107919-1.C: Use more important
-O2 optimization level, -O1 regresses.

Observed on:
Ice Lake
Cascade Lake
AlderLake
Zen3 Server/Client
Also fails on AArch64 (not bisected)
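As an illustration of the kind of case the commit message above describes, here is a hypothetical example. It is not one of the new ssa-dse-48.c to ssa-dse-51.c testcases. The first store is fully dead because both arms of the branch overwrite it before any read. Proving this requires following both paths after the fork, which is what the commit adds. I have not checked which revisions actually remove it.

```cpp
// Hypothetical illustration: a store whose killing stores sit on two
// different paths after a branch.
void store_on_both_paths(int *p, bool c)
{
    *p = 1;      // dead: overwritten on every path below, never read
    if (c)
        *p = 2;
    else
        *p = 3;
}
```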

[Bug target/115146] [15 Regression] Incorrect 8-byte vectorization: psrlw/psraw confusion

2024-05-19 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115146

--- Comment #11 from Levy Hsu  ---
(In reply to Uroš Bizjak from comment #8)
> (In reply to Levy Hsu from comment #5)
> > case E_V16QImode:
> >   mode = V8HImode;
> >   gen_shr = gen_vlshrv8hi3;
> >   gen_shl = gen_vashlv8hi3;
> 
> Hm, why vector-by-vector shift here? Should there be a call to gen_lshrv8hi3
> and gen_ashlv8hi3 instead?

Yes Uros, it looks like a misuse; it should be
gen_lshrv8hi3 and gen_ashlv8hi3.

gen_vlshrv8hi3 and gen_vashlv8hi3 accidentally generated "correct" psrlw and
psllw
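The following sketch shows why the vector-by-vector patterns could still produce correct code. It uses GCC's generic vector extensions rather than the i386 expanders. In GCC's naming, vashl/vlshr take a vector of per-lane shift counts as operand 2, while ashl/lshr take a single scalar count. When every lane's count is the same, both forms compute identical results. The type and function names are hypothetical.

```cpp
// Eight 16-bit unsigned lanes, i.e. the V8HImode shape.
typedef unsigned short v8hu __attribute__((vector_size(16)));

// Shift every lane by one scalar count (the lshrv8hi3 form).
v8hu shr_by_scalar(v8hu x)
{
    return x >> 8;
}

// Shift each lane by its own count taken from a vector (the vlshrv8hi3 form);
// here all counts happen to be equal.
v8hu shr_by_vector(v8hu x)
{
    const v8hu counts = {8, 8, 8, 8, 8, 8, 8, 8};
    return x >> counts;
}
```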

[Bug target/115146] [15 Regression] Incorrect 8-byte vectorization: psrlw/psraw confusion

2024-05-18 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115146

--- Comment #7 from Levy Hsu  ---
Created attachment 58236
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58236&action=edit
[PR]115146

[Bug target/115146] [15 Regression] Incorrect 8-byte vectorization: psrlw/psraw confusion

2024-05-18 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115146

--- Comment #5 from Levy Hsu  ---
switch (d->vmode)
  {
  case E_V8QImode:
    if (!TARGET_MMX_WITH_SSE)
      return false;
    mode = V4HImode;
    gen_shr = gen_ashrv4hi3;  /* should be gen_lshrv4hi3 */
    gen_shl = gen_ashlv4hi3;
    gen_or = gen_iorv4hi3;
    break;
  case E_V16QImode:
    mode = V8HImode;
    gen_shr = gen_vlshrv8hi3;
    gen_shl = gen_vashlv8hi3;
    gen_or = gen_iorv8hi3;
    break;
  default:
    return false;
  }

Obviously, under V8QImode it should be gen_lshrv4hi3 instead of gen_ashrv4hi3.

I mistakenly used gen_ashrv4hi3 because of the similar naming and failed to
notice it. gen_lshrv4hi3 is the logical shift that is needed.

Will send a patch soon
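Here is a scalar model of one 16-bit lane that shows concretely why the arithmetic shift is wrong. It assumes the permutation swaps the two bytes of each word using a right shift, a left shift and an OR, which is my reading of the snippet above. The helper names are hypothetical. With a logical right shift (psrlw), the vacated high byte is zero, so the OR simply combines the two halves. With an arithmetic right shift (psraw), that byte is filled with copies of the sign bit, which corrupts the result whenever the top bit of the word is set.

```cpp
#include <cstdint>

// Byte swap of one lane using a logical right shift (models psrlw).
std::uint16_t swap_bytes_logical(std::uint16_t w)
{
    return static_cast<std::uint16_t>((w >> 8) | (w << 8));
}

// Same sequence, but the right shift is arithmetic (models psraw):
// the high byte of the shifted value is sign-filled.
std::uint16_t swap_bytes_arithmetic(std::uint16_t w)
{
    const std::int16_t s = static_cast<std::int16_t>(w);
    const std::uint16_t shr = static_cast<std::uint16_t>(s >> 8);
    return static_cast<std::uint16_t>(shr | (w << 8));
}
```

For 0x1234 both helpers return 0x3412. For 0x8001, the logical version returns 0x0180, but the arithmetic version returns 0xFF80. The signed conversions and the right shift of a negative value are well defined from C++20 onward.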

[Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one

2024-05-18 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563

--- Comment #12 from Levy Hsu  ---
switch (d->vmode)
  {
  case E_V8QImode:
    if (!TARGET_MMX_WITH_SSE)
      return false;
    mode = V4HImode;
    gen_shr = gen_ashrv4hi3;  /* should be gen_lshrv4hi3 */
    gen_shl = gen_ashlv4hi3;
    gen_or = gen_iorv4hi3;
    break;
  case E_V16QImode:
    mode = V8HImode;
    gen_shr = gen_vlshrv8hi3;
    gen_shl = gen_vashlv8hi3;
    gen_or = gen_iorv8hi3;
    break;
  default:
    return false;
  }

Obviously, under V8QImode it should be gen_lshrv4hi3 instead of gen_ashrv4hi3.

I mistakenly used gen_ashrv4hi3 because of the similar naming and failed to
notice it. gen_lshrv4hi3 is the logical shift that is needed.

Will send a patch soon

[Bug tree-optimization/111858] [14 Regression] ICE: in vectorizable_simd_clone_call, at tree-vect-stmts.cc:4263

2023-10-18 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111858

Levy Hsu  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED

[Bug tree-optimization/111858] [14 Regression] ICE: in vectorizable_simd_clone_call, at tree-vect-stmts.cc:4263

2023-10-18 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111858

--- Comment #2 from Levy Hsu  ---
Checked the parent commit and confirmed r14-4682-g323209cd73bf1d fixed the ICE.
Thanks~

[Bug tree-optimization/111858] New: [14 Regression] ICE: in vectorizable_simd_clone_call, at tree-vect-stmts.cc:4263

2023-10-17 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111858

Bug ID: 111858
   Summary: [14 Regression] ICE: in vectorizable_simd_clone_call,
at tree-vect-stmts.cc:4263
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: admin at levyhsu dot com
  Target Milestone: ---

This bug appears to be caused by commit r14-4628-g63eaccd114393f
(63eaccd114393f4692976bb78b30148e6d77a89e)

When compiling spec2017 527.cam4_r with 
-march=native -Ofast -funroll-loops -flto
on Sapphire Rapids or
-march=sapphirerapids on other host:

radsw.fppized.f90: In function 'raddedmx.isra':
radsw.fppized.f90:2017:19: internal compiler error: in
vectorizable_simd_clone_call, at tree-vect-stmts.cc:4263
 2017 | subroutine raddedmx(coszrs  ,ndayc   ,abh2o   , &
  |   ^
0x8d3fad vectorizable_simd_clone_call
../../gcc/tree-vect-stmts.cc:4263
0x1e4693e vect_analyze_stmt(vec_info*, _stmt_vec_info*, bool*, _slp_tree*,
_slp_instance*, vec*)
../../gcc/tree-vect-stmts.cc:12672
0x116255c vect_analyze_loop_operations
../../gcc/tree-vect-loop.cc:2087
0x116255c vect_analyze_loop_2
../../gcc/tree-vect-loop.cc:2903
0x1162ecf vect_analyze_loop_1
../../gcc/tree-vect-loop.cc:3339
0x11635d3 vect_analyze_loop(loop*, vec_info_shared*)
../../gcc/tree-vect-loop.cc:3493
0x11a1981 try_vectorize_loop_1
../../gcc/tree-vectorizer.cc:1064
0x11a1981 try_vectorize_loop
../../gcc/tree-vectorizer.cc:1180
0x11a21e4 execute
../../gcc/tree-vectorizer.cc:1296
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: gfortran returned 1 exit status
compilation terminated.

This matches the changes that commit made in vectorizable_simd_clone_call.

[Bug tree-optimization/106315] [13 Regression] 7.8% increased codesize on specfp 507.cactuBSSN_r

2022-07-24 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106315

--- Comment #4 from Levy Hsu  ---
I cross-compared the sizes of all the .o files listed in make.out.

Files whose size differs by more than 2%:

Size1    Size2    File
19464    20760    PUGH/SetupPGV.o
324675   402929   LocalReduce/CountFunctions.o
372967   408964   LocalReduce/Norm4Functions.o
378371   434948   LocalReduce/Norm1Functions.o
370431   442340   LocalReduce/Norm2Functions.o
373212   460212   LocalReduce/SumFunctions.o
373858   452466   LocalReduce/AvgFunctions.o
379238   437511   LocalReduce/NormInfFunctions.o
374379   384654   LocalReduce/MaxFunctions.o
377728   387170   LocalReduce/MinFunctions.o
379068   395071   LocalReduce/Norm3Functions.o
7136496  7640664  cactusBSSN_r

Not sure whether these are all caused by the same header file.
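The size cross-comparison above can be sketched as follows. The struct, the function names and the symmetric 2% threshold handling are illustrative assumptions. The sample figures in the usage note come from the list above.

```cpp
#include <string>
#include <vector>

// One object file measured in two builds (Size1 and Size2 in the list).
struct ObjSize {
    std::string file;
    long size1;
    long size2;
};

// Relative growth of size2 over size1, in percent.
double growth_pct(const ObjSize &o)
{
    return 100.0 * static_cast<double>(o.size2 - o.size1) / static_cast<double>(o.size1);
}

// Keep files whose size changed by more than threshold_pct in either direction.
std::vector<ObjSize> changed_more_than(const std::vector<ObjSize> &all, double threshold_pct)
{
    std::vector<ObjSize> out;
    for (const ObjSize &o : all) {
        const double g = growth_pct(o);
        if (g > threshold_pct || g < -threshold_pct)
            out.push_back(o);
    }
    return out;
}
```

For example, PUGH/SetupPGV.o (19464 to 20760) grows by about 6.7%, and the final cactusBSSN_r binary (7136496 to 7640664) grows by about 7.1%.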

[Bug tree-optimization/106315] [13 Regression] 7.8% increased codesize on specfp 507.cactuBSSN_r

2022-07-20 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106315

--- Comment #2 from Levy Hsu  ---
Checked the non-LTO build (-march=native -Ofast -funroll-loops): code size
increased by 7.1%.

[Bug tree-optimization/106315] New: 7.8% increased codesize on specfp 507.cactuBSSN_r

2022-07-15 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106315

Bug ID: 106315
   Summary: 7.8% increased codesize on specfp 507.cactuBSSN_r
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: admin at levyhsu dot com
  Target Milestone: ---

When compiled with march_native_ofast_lto (-march=native -Ofast -funroll-loops
-flto) on IceLake, CascadeLake, SkylakeW and Zen3 Server/Client,
r13-1268-g8c99e307b20c50 results in a 7.8%-7.9% code size increase.

On AArch64, code size looks OK.

[Bug target/106222] New: x86 Better code squence for __builtin_shuffle

2022-07-06 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106222

Bug ID: 106222
   Summary: x86 Better code squence for __builtin_shuffle
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: admin at levyhsu dot com
  Target Milestone: ---

Related bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54346

After merging two VEC_PERM_EXPRs in match.pd, I found that two
__builtin_shuffle calls actually generate better code than a single
__builtin_shuffle:

https://godbolt.org/z/xE9xd9ExT
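Merging two VEC_PERM_EXPRs amounts to composing their masks: applying mask m1 and then m2 is the same as one shuffle with mask m[i] = m1[m2[i]]. The sketch below demonstrates that identity with GCC vector extensions. The masks are arbitrary examples, not the ones from the Compiler Explorer link. It makes no claim about which form produces better x86 code.

```cpp
// Four 32-bit int lanes (GCC vector extension).
typedef int v4si __attribute__((vector_size(16)));

// Two single-input shuffles applied back to back.
v4si two_shuffles(v4si x)
{
    const v4si m1 = {1, 0, 3, 2};
    const v4si m2 = {2, 3, 0, 1};
    return __builtin_shuffle(__builtin_shuffle(x, m1), m2);
}

// The same permutation folded into one mask: m[i] = m1[m2[i]].
v4si one_shuffle(v4si x)
{
    const v4si m = {3, 2, 1, 0};
    return __builtin_shuffle(x, m);
}
```

With x = {10, 20, 30, 40}, both functions return {40, 30, 20, 10}.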

[Bug tree-optimization/105643] [13 Regression] Code-Size regression for specrate 538.imagick_r

2022-05-19 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105643

--- Comment #6 from Levy Hsu  ---
Created attachment 52997
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52997&action=edit
Vec different seems to related SetPixelPacket

[Bug tree-optimization/105643] [13 Regression] Code-Size regression for specrate 538.imagick_r

2022-05-19 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105643

--- Comment #5 from Levy Hsu  ---
Created attachment 52996
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52996&action=edit
-fopt-info-vec after this commit

[Bug tree-optimization/105643] [13 Regression] Code-Size regression for specrate 538.imagick_r

2022-05-19 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105643

--- Comment #4 from Levy Hsu  ---
Created attachment 52995
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52995&action=edit
-fopt-info-vec before that commit

[Bug tree-optimization/105643] [13 Regression] Code-Size regression for specrate 538.imagick_r

2022-05-18 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105643

--- Comment #3 from Levy Hsu  ---
I forgot to mention that build time also increased by 128% on the Intel
platforms above, with no performance improvement observed.

I'll check the objdump output and see what happens.

[Bug tree-optimization/105643] New: [13 Regression] Code-Size regression for specrate 538.imagick_r

2022-05-18 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105643

Bug ID: 105643
   Summary: [13 Regression] Code-Size regression for specrate
538.imagick_r
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: admin at levyhsu dot com
  Target Milestone: ---

We found that on Intel platforms, commit r13-128-g938a02a589dc22
(938a02a589dc22cef65bba2b131fc9e4874baddb) results in a 53.7% code size increase
for specrate 538.imagick_r on SkylakeW, Cascade Lake and IceLake when compiled
with march_native_ofast_lto (-march=native -Ofast -funroll-loops -flto).

On Zen3 Server/Client, code size looks alright.

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-25 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

--- Comment #14 from Levy Hsu  ---
Hi Avieira and Richard

I checked the data for the last half month and you are right that no real
regression was caused. Thank you all for the detailed explanations.

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-24 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

--- Comment #11 from Levy Hsu  ---
Hi Avieira

The baseline was the commit immediately before
(ffc7f200adbdf47f14b3594d9b21855c19cf797a). I'm having some issues with VTune
locally, so I can't yet say which function, or whether the front end or back
end, was affected. objdump shows some differences in the binary, but the diff
is too long to dig into. I'll fix VTune and see if I can get some results.

[Bug regression/103997] [12 Regression] gcc.target/i386/pr88531-??.c scan-assembler-times FAILs

2022-01-23 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103997

Levy Hsu  changed:

   What|Removed |Added

 CC||admin at levyhsu dot com

--- Comment #9 from Levy Hsu  ---
Compared to the commit immediately before (ffc7f200adbdf47f14b3594d9b21855c19cf797a),
commit r12-6740-gf4ca0a53be18dfc7162fd5dcc1e73c4203805e14 causes regressions on:

AlderLake (12900K) Multi-Core
548.exchange2_r -3.48%

Skylake Workstation(7920x) Single Core
538.imagick_r -2.29%
549.fotonik3d_r -3.81%

With label march_native_ofast_lto and 5 iterations
-march=native -Ofast -funroll-loops -flto

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2022-01-20 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 104058, which changed state.

Bug 104058 Summary: [12 Regression] 6-7% x264_r regression with -march=native 
-Ofast -funroll-loops -flto on x86 since 
r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104058

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2022-01-20 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 104058, which changed state.

Bug 104058 Summary: [12 Regression] 6-7% x264_r regression with -march=native 
-Ofast -funroll-loops -flto on x86 since 
r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104058

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/104058] [12 Regression] 6-7% x264_r regression with -march=native -Ofast -funroll-loops -flto on x86 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af

2022-01-20 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104058

Levy Hsu  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|WAITING |RESOLVED

--- Comment #4 from Levy Hsu  ---
SkyLake
+6.79%

CascadeLake
+7.56%

Zen2
+5.77%

Looks like it's resolved. We'll keep tracking it in the weekly report.
Thanks

[Bug tree-optimization/104058] New: [12 Regression] 6-7% x264_r regression with -march=native -Ofast -funroll-loops -flto on x86 since r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af

2022-01-16 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104058

Bug ID: 104058
   Summary: [12 Regression] 6-7% x264_r regression with
-march=native -Ofast -funroll-loops -flto on x86 since
r12-6420-gd3ff7420e941931d32ce2e332e7968fe67ba20af
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: admin at levyhsu dot com
  Target Milestone: ---

We observed a regression on 525.x264_r with commit
d3ff7420e941931d32ce2e332e7968fe67ba20af

On IceLake(8358):
-7.27%

On Zen3(7763):
-6.67%

On Zen3(5800x):
-6.45%

The regression on Zen 3 can also be found in
https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=22984&plot.0=475.377.0

[Bug tree-optimization/103223] [12 regression] Access attribute dropped when ipa-sra is applied

2021-11-21 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103223

Levy  changed:

   What|Removed |Added

 CC||admin at levyhsu dot com

--- Comment #10 from Levy  ---
Hi Jan

Just wanted to provide a status report: commit
ecdf414bd89e6ba251f6b3f494407139b4dbae0e seems to cause a regression of about
50% when running multi-copy 548.exchange2_r with march_native_ofast_lto on
SPEC2017:

Xeon(R) Platinum 8358 (IceLake) (64C 128T 512G):
BenchMarks       Copies  RunTime1  RunTime2  Rate1  Rate2  Compare
548.exchange2_r  128     479       913       700    367    -47.57%

Xeon(R) Gold 6252 (CascadeLake) (48C 96T 192G):
BenchMarks       Copies  RunTime1  RunTime2  Rate1  Rate2  Compare
548.exchange2_r  96      643       1240      391    203    -48.08%

Best
Levy
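The Compare column is the relative change in rate, (Rate2 - Rate1) / Rate1. For SPECrate, the rate is the number of copies times the benchmark's reference time, divided by the run time. The rows above are consistent with a reference time of about 2620 seconds for 548.exchange2_r. That value is my inference from the numbers, not something stated in the report. Here is a small sketch with hypothetical function names:

```cpp
// Relative rate change in percent, as in the Compare column.
double compare_pct(double rate1, double rate2)
{
    return 100.0 * (rate2 - rate1) / rate1;
}

// SPECrate-style rate: copies * reference_time / run_time.
double spec_rate(int copies, double reference_time_s, double run_time_s)
{
    return copies * reference_time_s / run_time_s;
}
```

For example, compare_pct(700, 367) is about -47.57 and compare_pct(391, 203) is about -48.08, matching the tables.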

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-12-21 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #57 from Levy  ---
Thank you JiaWei for the CoreMark-Pro result.

Personally, I agree with Jim: changing the split behaviour of try_combine in
combine.c could affect other platforms. In theory, we could guard the fix with
a platform flag and a UNITS_PER_WORD check, or just skip over a ZERO_EXTEND or
SIGN_EXTEND before the general_operand check, but that seems inconvenient.

We probably need more testing of all the patches to see the differences in code
size and speed. Maybe we can decide how to proceed once the EEMBC results come
out.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-12-15 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49543|0   |1
is obsolete||
  Attachment #49575|0   |1
is obsolete||
  Attachment #49757|0   |1
is obsolete||

--- Comment #49 from Levy  ---
Created attachment 49767
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49767&action=edit
Auto-extend Patch

Combined all three patches.

               = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |    0 /     0 |    0 /     0 |            - |

(May require some work on coding style)

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-12-14 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #48 from Levy  ---
Created attachment 49757
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49757&action=edit
Initial V1 patch on combine.c

Three patches together: 


               = Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |          gcc |          g++ |     gfortran |
 rv64imafdc/  lp64d/ medlow |    0 /     0 |    0 /     0 |            - |

I'll merge all three patches and fix the debug/coding style/efficiency issues,
with explanations, later this week.

Looks like it's fixed from my side.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-12-08 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #47 from Levy  ---
Where the insns are merged, in combine.c (pass_combine):

rest_of_handle_combine()
calls:
combine_instructions()
calls:
create_log_links(), which creates the links of the insns (768 & 32/36/40/44)
for both the patched and unpatched versions with log_links().

Then in combine_instructions(), for_each_log_link(), try_combine() calls
combine_validate_cost()
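As far as I understand combine.c, the accept/reject decision in combine_validate_cost() comes down to comparing two sums. One is the summed cost of the original insns; the other is the summed cost of the replacement patterns. The combination is rejected when the old cost is known (greater than zero) and the new cost is higher. The sketch below is a simplified model with hypothetical names, not the real function, which works on rtx insns and has further special cases.

```cpp
// Hypothetical model: costs of the up-to-four insns being combined
// (0 = absent) and of the replacement patterns.
struct CombineCosts {
    int i0, i1, i2, i3;
    int newpat, newi2pat, newotherpat;
};

int old_cost(const CombineCosts &c) { return c.i0 + c.i1 + c.i2 + c.i3; }

int new_cost(const CombineCosts &c) { return c.newpat + c.newi2pat + c.newotherpat; }

// Simplified rule: reject only if the old total is known and the new one is higher.
bool combination_accepted(const CombineCosts &c)
{
    const int o = old_cost(c);
    const int n = new_cost(c);
    const bool reject = o > 0 && n > o;
    return !reject;
}
```

With i2 = 4 and i3 = 16 (old cost 20), a new cost of 20 is accepted. This is presumably what the GO! lines in the dumps below indicate. The 16/4 split of the new cost used in the check is an assumption, since the dumps only print the total.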

In combine_validate_cost(), for the patched version:

OLD===846930886===OLD
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
 (expr_list:REG_DEAD (reg:DI 96)
(nil)))
i3 & Cost 16:
(insn 6 27 8 2 (set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
 (nil))
Old Cost 20:



NEW===846930886===NEW
New_Cost: 20
i0 & Cost 0:
(nil)
i1 & Cost 0:
(nil)
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
 (expr_list:REG_DEAD (reg:DI 96)
(nil)))
i3 & Cost 16:
(insn 6 27 8 2 (set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
 (nil))
newpat:
(set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0 S4
A32])))
newi2pat:
(set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300])))
newotherpat:
(nil)
GO!---


OLD===1681692777===OLD
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
 (nil))
i3 & Cost 16:
(insn 8 6 10 2 (set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
 (nil))
Old Cost 20:



NEW===1681692777===NEW
New_Cost: 20
i0 & Cost 0:
(nil)
i1 & Cost 0:
(nil)
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
 (nil))
i3 & Cost 16:
(insn 8 6 10 2 (set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
 (nil))
newpat:
(set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0 S4
A32])))
newi2pat:
(set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300])))
newotherpat:
(nil)
GO!---


OLD===1714636915===OLD
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
 (nil))
i3 & Cost 16:
(insn 12 10 14 2 (set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}
 (nil))
Old Cost 20:



NEW===1714636915===NEW
New_Cost: 20
i0 & Cost 0:
(nil)
i1 & Cost 0:
(nil)
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
 (nil))
i3 & Cost 16:
(insn 12 10 14 2 (set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}
 (nil))
newpat:
(set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0 S4
A32])))
newi2pat:
(set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300])))
newotherpat:
(nil)
GO!---


OLD===1957747793===OLD
i2 & Cost 4:
(insn 27 3 6 2 (set (reg/f:DI 92)
(plus:DI (reg:DI 96)
(const_int 768 [0x300]))) "array_test.c":7:5 4 {adddi3}
 (nil))
i3 & Cost 16:
(insn 16 14 18 2 (set (reg:DI 90 [ MEM[(int *)ar

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-30 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #46 from Levy  ---
Looking at gcc/passes.def and gcc/config/riscv/riscv-passes.def:

pass_shorten_memrefs is inserted after NEXT_PASS (pass_rtl_store_motion);

  NEXT_PASS (pass_rtl_store_motion);
(pass_shorten_memrefs)
  NEXT_PASS (pass_cse_after_global_opts);
  NEXT_PASS (pass_rtl_ifcvt);
  NEXT_PASS (pass_reginfo_init);
  /* Perform loop optimizations.  It might be better to do them a bit
  sooner, but we want the profile feedback to work more
  efficiently.  */
  NEXT_PASS (pass_loop2);
  PUSH_INSERT_PASSES_WITHIN (pass_loop2)
NEXT_PASS (pass_rtl_loop_init);
NEXT_PASS (pass_rtl_move_loop_invariants);
NEXT_PASS (pass_rtl_unroll_loops);
NEXT_PASS (pass_rtl_doloop);
NEXT_PASS (pass_rtl_loop_done);
  POP_INSERT_PASSES ()
  NEXT_PASS (pass_lower_subreg2);
  NEXT_PASS (pass_web);
  NEXT_PASS (pass_rtl_cprop);
  NEXT_PASS (pass_cse2);
  NEXT_PASS (pass_rtl_dse1);
  NEXT_PASS (pass_rtl_fwprop_addr);
  NEXT_PASS (pass_inc_dec);
  NEXT_PASS (pass_initialize_regs);
  NEXT_PASS (pass_ud_rtl_dce);
  NEXT_PASS (pass_combine);
  NEXT_PASS (pass_if_after_combine);
  NEXT_PASS (pass_jump_after_combine);
  NEXT_PASS (pass_partition_blocks);
  NEXT_PASS (pass_outof_cfg_layout_mode);
  NEXT_PASS (pass_split_all_insns);
  NEXT_PASS (pass_lower_subreg3);
  NEXT_PASS (pass_df_initialize_no_opt);
  NEXT_PASS (pass_stack_ptr_mod);
  NEXT_PASS (pass_mode_switching);
  NEXT_PASS (pass_match_asm_constraints);
  NEXT_PASS (pass_sms);
  NEXT_PASS (pass_live_range_shrinkage);
  NEXT_PASS (pass_sched);
  NEXT_PASS (pass_early_remat);
  NEXT_PASS (pass_ira);
  NEXT_PASS (pass_reload);
  NEXT_PASS (pass_postreload);
  PUSH_INSERT_PASSES_WITHIN (pass_postreload)
NEXT_PASS (pass_postreload_cse);
NEXT_PASS (pass_gcse2);
NEXT_PASS (pass_split_after_reload);
..

After some debugging, it seems that either:

1. The address cost info was calculated between pass_combine and
pass_shorten_memrefs for the patched version, and the insns were then merged in
the combine pass. The patched version fails to be recognized the way the
unpatched one is because of the sign/zero extend followed by a subreg. This can
be verified by adding the -fdisable-rtl-combine option when compiling; also,
address_cost was never called during that whole time.

2. The 4 insns were determined (or rather, fixed) before pass_rtl_fwprop_addr.
For the patched version, I saw forward_propagate_and_simplify() called 4 extra
times, going all the way through
propagate_rtx()->propagate_rtx_1()->should_replace_address()->address_cost() in
fwprop.c.

I've also tested pass_postreload, as mentioned by Jim, and
new_address_profitable_p(), but neither seems to be the right place.

I need some time to examine and trace the passes between pass_shorten_memrefs
and pass_rtl_fwprop_addr.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-22 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #45 from Levy  ---
I've basically ruled out rtlanal.c and fwprop.c, and I'm looking back at
riscv.c. Since address_cost() is called by the hook function
new_address_profitable_p(), maybe some place that uses this hook would provide
more info.

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-22 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #44 from Levy  ---
should_replace_address() in fwprop.c looks really interesting:

/* OLD is a memory address.  Return whether it is good to use NEW instead,
   for a memory access in the given MODE.  */
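For reference, here is my reading of the decision logic in should_replace_address(), written as a plain C++ model. The struct and function names are hypothetical; the real function takes rtx operands and queries target cost hooks, and this is a simplified recollection rather than the actual source. An identical or invalid new address is never preferred. Register-to-register copy propagation is always allowed. Otherwise, the cheaper address wins, and on a tie the address with the higher set_src_cost is preferred.

```cpp
// Hypothetical stand-in for what should_replace_address() inspects.
struct AddrCandidate {
    bool is_reg;       // address is a plain register
    int address_cost;  // as returned by address_cost()
    int src_cost;      // as returned by set_src_cost()
};

// Simplified decision following the rules described above.
bool prefer_new_address(const AddrCandidate &old_addr, const AddrCandidate &new_addr,
                        bool identical, bool new_is_valid)
{
    if (identical || !new_is_valid)
        return false;
    if (old_addr.is_reg && new_addr.is_reg)
        return true;  // copy propagation is always fine
    int gain = old_addr.address_cost - new_addr.address_cost;
    if (gain == 0)
        gain = new_addr.src_cost - old_addr.src_cost;  // tie-break
    return gain > 0;
}
```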

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-22 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #43 from Levy  ---
Thanks for pointing out the hook, Jim.

For both the patched and unpatched versions, so far I've traced through:

try_replace_in_use()
to
reload_combine_recognize_const_pattern()
to
reload_combine()
to
reload_cse_regs()
to
pass_postreload_cse::execute()

in postreload.c

---
For reload_combine(), I printed each insn at the front of the loop (line 1302):

for (insn = get_last_insn (); insn; insn = prev)
  {
    debug_rtx (insn);
    ...
  }
---
The unpatched GCC shows:

(insn 13 11 14 2 (set (reg:DI 10 a0)
(sign_extend:DI (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
(const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
 (nil))
(insn 11 10 13 2 (set (reg:SI 14 a4 [83])
(plus:SI (reg:SI 14 a4 [orig:84 MEM[(int *)array_5(D) + 808B] ] [84])
(reg:SI 10 a0 [80]))) "array_test.c":8:5 3 {addsi3}
 (nil))
(insn 10 8 11 2 (set (reg:DI 14 a4)
(sign_extend:DI (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
(const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}
 (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
(const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4
A32])
(nil)))
(insn 8 7 10 2 (set (reg:SI 10 a0 [80])
(plus:SI (reg:SI 10 a0 [orig:81 MEM[(int *)array_5(D) + 800B] ] [81])
(reg:SI 14 a4 [orig:82 MEM[(int *)array_5(D) + 804B] ] [82])))
"array_test.c":7:5 3 {addsi3}
 (nil))
(insn 7 6 8 2 (set (reg:DI 14 a4)
(sign_extend:DI (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
(const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
 (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
(const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4
A32])
(nil)))
(insn 6 23 7 2 (set (reg:DI 10 a0)
(sign_extend:DI (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
(const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
 (expr_list:REG_EQUIV (mem:SI (plus:DI (reg/f:DI 15 a5 [88])
(const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4
A32])
(nil)))
---
The patched one shows already-merged results:

(insn 16 14 18 2 (set (reg:DI 10 a0 [orig:90 MEM[(int *)array_5(D) + 812B] ]
[90])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
(const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
 (nil))
(insn 14 12 16 2 (set (reg:SI 15 a5 [85])
(plus:SI (reg:SI 15 a5 [80])
(reg:SI 14 a4 [orig:87 MEM[(int *)array_5(D) + 808B] ] [87])))
"array_test.c":8:5 3 {addsi3}
 (nil))
(insn 12 10 14 2 (set (reg:DI 14 a4 [orig:87 MEM[(int *)array_5(D) + 808B] ]
[87])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
(const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}
 (nil))
(insn 10 8 12 2 (set (reg:SI 15 a5 [80])
(plus:SI (reg:SI 15 a5 [orig:84 MEM[(int *)array_5(D) + 804B] ] [84])
(reg:SI 14 a4 [orig:82 MEM[(int *)array_5(D) + 800B] ] [82])))
"array_test.c":7:5 3 {addsi3}
 (nil))
(insn 8 6 10 2 (set (reg:DI 15 a5 [orig:84 MEM[(int *)array_5(D) + 804B] ]
[84])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
(const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
 (nil))
(insn 6 27 8 2 (set (reg:DI 14 a4 [orig:82 MEM[(int *)array_5(D) + 800B] ]
[82])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 10 a0 [96])
(const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}
 (nil))
---
Before reload_combine(), reload_cse_regs() calls reload_cse_regs_1(), which
"detects no-op moves where we happened to assign two different pseudo-registers
to the same hard register",

and pass_postreload_cse::execute() calls reload_cse_regs().

pass_postreload_cse::execute() looks like the entry point for postreload.c.
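The reload_cse_regs_1() behaviour quoted above can be pictured with a toy model. Once every pseudo has a hard register, a move whose source and destination pseudos ended up in the same hard register copies a register onto itself and can be deleted. struct toy_move, the hard_reg_of table and the two helpers are hypothetical names used only for illustration; postreload.c works on real insns and is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical move insn: copy pseudo SRC_PSEUDO into DEST_PSEUDO.
   Pseudo numbers index the HARD_REG_OF allocation table.  */
struct toy_move
{
  int dest_pseudo;
  int src_pseudo;
};

/* Return true if MOVE became a no-op after allocation, i.e. both pseudos
   were assigned the same hard register.  */
static bool
toy_noop_move_p (const struct toy_move *move, const int *hard_reg_of)
{
  return hard_reg_of[move->dest_pseudo] == hard_reg_of[move->src_pseudo];
}

/* Compact MOVES in place, dropping no-op moves; return the new count.  */
static size_t
toy_delete_noop_moves (struct toy_move *moves, size_t n,
                       const int *hard_reg_of)
{
  assert (moves != NULL || n == 0);
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (!toy_noop_move_p (&moves[i], hard_reg_of))
      moves[kept++] = moves[i];
  return kept;
}
```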

To confirm that the problem is not in postreload.c, I put:
--
  rtx_insn *insn, *next;
  for (insn = get_insns (); insn; insn = next)
    {
      debug_rtx (insn);
      next = NEXT_INSN (insn);
    }
--
in pass_postreload_cse::execute (function *fun)

right after:

if (!dbg_cnt (postreload_cse))
return 0;
-
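Both debugging loops above walk the same doubly linked insn chain from opposite ends. The one in reload_combine() starts at get_last_insn() and follows the previous link. The one added to pass_postreload_cse::execute() starts at get_insns() and follows NEXT_INSN (). The toy C model below uses a hypothetical struct toy_insn in place of rtx_insn to show that the two visiting orders are exact reverses of each other. Reading the neighbouring pointer into a local before moving on is what would let a loop body delete the current insn safely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rtx_insn: just a uid and the two links.  */
struct toy_insn
{
  int uid;
  struct toy_insn *prev;
  struct toy_insn *next;
};

/* Link the N insns stored in ARRAY into a doubly linked chain.  */
static void
toy_link_chain (struct toy_insn *array, size_t n)
{
  assert (n == 0 || array != NULL);
  for (size_t i = 0; i < n; i++)
    {
      array[i].prev = i > 0 ? &array[i - 1] : NULL;
      array[i].next = i + 1 < n ? &array[i + 1] : NULL;
    }
}

/* Forward walk from FIRST (like get_insns / NEXT_INSN), recording uids
   in OUT.  NEXT is read before the insn is "processed".  */
static size_t
toy_walk_forward (struct toy_insn *first, int *out)
{
  size_t count = 0;
  struct toy_insn *insn, *next;
  for (insn = first; insn; insn = next)
    {
      next = insn->next;
      out[count++] = insn->uid;
    }
  return count;
}

/* Backward walk from LAST (like get_last_insn / PREV_INSN).  */
static size_t
toy_walk_backward (struct toy_insn *last, int *out)
{
  size_t count = 0;
  struct toy_insn *insn, *prev;
  for (insn = last; insn; insn = prev)
    {
      prev = insn->prev;
      out[count++] = insn->uid;
    }
  return count;
}
```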

[Bug target/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-19 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #41 from Levy  ---
When I put the same debug_rtx (addr) on the first line of riscv_address_cost():

The unpatched one outputs:
(plus:DI (reg/f:DI 15 a5 [88])
(const_int 32 [0x20]))
(plus:DI (reg:DI 10 a0 [92])
(const_int 800 [0x320]))
(plus:DI (reg/f:DI 15 a5 [88])
(const_int 36 [0x24]))
(plus:DI (reg:DI 10 a0 [92])
(const_int 804 [0x324]))
...

The patched one outputs nothing. To me, this means that in the riscv backend,
something like:
(sign_extend:DI (mem:SI (plus:DI (reg:DI 93)

is never passed to riscv_address_cost(), unlike:
(mem:SI (plus:DI (reg:DI 89)

so whether riscv_mshorten_memrefs is set or not doesn't really matter here.
Then I traced back to where address_cost() is called:

1. address_cost() is called by riscv_new_address_profitable_p() in riscv.c.
2. riscv_new_address_profitable_p() is called by attempt_change() in
sched-deps.c.
Since I'm not that familiar with struct mem_inc_info, this part of the trace could
be wrong:
3. attempt_change() is called by find_inc() in sched-deps.c.
(Still tracing)

--
Also, since Arm can handle sign/zero extend with subreg under -Os, I did a quick
search in arm.c.

In arm_address_cost(), the rtx is passed as x, not addr (where addr may be
XEXP (mem, 0)).

So GET_CODE (x) can be used to determine whether it has a
MEM/LABEL_REF/SYMBOL_REF... at the front. The cost can then vary from 0 all the way
up to 10.

This is also worth some investigation, to see how -Os works on the Arm side.
--
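Here is a toy sketch of what switching on the rtx code allows. In the patched build the sign_extend form never reached riscv_address_cost(), whereas a hook that sees the whole expression could look through a SIGN_EXTEND/ZERO_EXTEND and cost what it wraps. The enum, struct toy_rtx and all cost values are hypothetical and only chosen to fall in the 0 to 10 range mentioned above. They are not what arm_address_cost() actually returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of rtx codes.  */
enum toy_rtx_code
{
  TOY_REG,
  TOY_PLUS,
  TOY_MEM,
  TOY_LABEL_REF,
  TOY_SYMBOL_REF,
  TOY_SIGN_EXTEND,
  TOY_ZERO_EXTEND
};

/* Hypothetical expression node: a code and at most one operand
   (enough to model an extend wrapping something).  */
struct toy_rtx
{
  enum toy_rtx_code code;
  const struct toy_rtx *op0;
};

/* Invented costs in the 0..10 range, chosen only to show the dispatch.  */
static int
toy_address_cost (const struct toy_rtx *x)
{
  assert (x != NULL);
  switch (x->code)
    {
    case TOY_REG:
      return 0;
    case TOY_PLUS:
      return 2;
    case TOY_MEM:
    case TOY_LABEL_REF:
    case TOY_SYMBOL_REF:
      return 10;
    case TOY_SIGN_EXTEND:
    case TOY_ZERO_EXTEND:
      /* Look through the extension and cost what it wraps.  */
      return x->op0 ? toy_address_cost (x->op0) : 10;
    }
  return 10;
}
```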

I need some time to work on it.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-17 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #39 from Levy  ---
I checked all passes from 250r.shorten_memrefs to 270r.ce2.

In 269r.combine, I saw that the following combinations merged the replaced address:
---
modifying insn i3 27: r92:DI=r96:DI+0x300
  REG_DEAD r96:DI
deferring rescan insn with uid = 27.
(!)allowing combination of insns 27 and 6
original costs 4 + 16 = 20
replacement costs 4 + 16 = 20
modifying insn i2 27: r92:DI=r96:DI+0x300
deferring rescan insn with uid = 27.
modifying insn i3 6: r82:DI=sign_extend([r96:DI+0x320])
  REG_DEAD r96:DI
deferring rescan insn with uid = 6.
(!)allowing combination of insns 27 and 8
original costs 4 + 16 = 20
replacement costs 4 + 16 = 20
modifying insn i2 27: r92:DI=r96:DI+0x300
deferring rescan insn with uid = 27.
modifying insn i3 8: r84:DI=sign_extend([r96:DI+0x324])
  REG_DEAD r96:DI
deferring rescan insn with uid = 8.
(!)allowing combination of insns 27 and 12
original costs 4 + 16 = 20
replacement costs 4 + 16 = 20
modifying insn i2 27: r92:DI=r96:DI+0x300
deferring rescan insn with uid = 27.
modifying insn i3 12: r87:DI=sign_extend([r96:DI+0x328])
  REG_DEAD r96:DI
deferring rescan insn with uid = 12.
(!)allowing combination of insns 27 and 16
original costs 4 + 16 = 20
replacement cost 16
deferring deletion of insn with uid = 27.
modifying insn i3 16: r90:DI=sign_extend([r96:DI+0x32c])
  REG_DEAD r96:DI
deferring rescan insn with uid = 16.
allowing combination of insns 18 and 19
---
In 268r.ud_dce, insn 27 is the same in both versions (except for the memory address):

(insn 27 26 28 2 (set (reg:DI 10 a0)
(lo_sum:DI (reg/f:DI 85)
(symbol_ref/f:DI ("*.LC0") [flags 0x82]  ))) "array_test.c":21:5 133 {*lowdi}
 (expr_list:REG_DEAD (reg/f:DI 85)
(expr_list:REG_EQUAL (symbol_ref/f:DI ("*.LC0") [flags 0x82]  )
(nil
---
In 270r.combine (patched):

(note 27 3 6 2 NOTE_INSN_DELETED)

and the following insns, which used 768 + 32/36/40/44, were put back as:

(insn 6 27 8 2 (set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 800 [0x320])) [1 MEM[(int *)array_5(D) + 800B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}


(insn 8 6 10 2 (set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 804 [0x324])) [1 MEM[(int *)array_5(D) + 804B]+0
S4 A32]))) "array_test.c":7:5 90 {extendsidi2}


(insn 12 10 14 2 (set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 808 [0x328])) [1 MEM[(int *)array_5(D) + 808B]+0
S4 A32]))) "array_test.c":8:5 90 {extendsidi2}

(insn 16 14 18 2 (set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 96)
(const_int 812 [0x32c])) [1 MEM[(int *)array_5(D) + 812B]+0
S4 A32]))) "array_test.c":9:5 90 {extendsidi2}
 (expr_list:REG_DEAD (reg:DI 96)
(nil)))

Maybe combine.c needs some modification?
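Here is a minimal sketch of why the equal costs in the dump let combine undo shorten_memrefs. It assumes the rule I read in combine_validate_cost() in combine.c: a combination is rejected only when the old cost is known (non-zero) and the replacement is strictly more expensive. toy_combine_allowed_p() is a hypothetical stand-in, not the combine.c function. The test values are taken from the dump above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of combine's acceptance test.  A cost of 0 means
   "unknown"; otherwise the combination is rejected only when the
   replacement is strictly more expensive than the insns it replaces.  */
static bool
toy_combine_allowed_p (int old_cost, int new_cost)
{
  assert (old_cost >= 0 && new_cost >= 0);
  bool reject = old_cost > 0 && new_cost > old_cost;
  return !reject;
}
```

Under this rule a tie is always accepted. For combine to keep the shortened addresses, the cost model would have to rate the merged form as strictly more expensive, which it does not do here.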

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-17 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #38 from Levy  ---
Created attachment 49575
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49575&action=edit
riscv-shorten-memrefs_V1.patch

I made a few small changes in get_si_mem_base_reg() and transform().
Now, for the same C input, array_test.c:

#include <stdio.h>

int load1r (int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return a;
}

int main ()
{
  int arr[300] = {0};
  arr[200] = 15;
  arr[201] = -33;
  arr[202] = 7;
  arr[203] = -999;
  int b = load1r (arr);
  printf ("Result: %d\n", b);
  return 0;
}

For load1r, when I put a debug_rtx (pat) after:

(unpatched) XEXP (pat, i) = replace_equiv_address (mem, addr);
or
(patched) XEXP (XEXP (pat, i), 0) = replace_equiv_address (XEXP (mem, 0), addr);



The unpatched GCC produces the following insns:
-
(set (reg:SI 81 [ MEM[(int *)array_5(D) + 800B] ])
(mem:SI (plus:DI (reg:DI 88)
(const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4 A32]))
(set (reg:SI 82 [ MEM[(int *)array_5(D) + 804B] ])
(mem:SI (plus:DI (reg:DI 89)
(const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4 A32]))
(set (reg:SI 84 [ MEM[(int *)array_5(D) + 808B] ])
(mem:SI (plus:DI (reg:DI 90)
(const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4 A32]))
(set (reg:SI 86 [ MEM[(int *)array_5(D) + 812B] ])
(mem:SI (plus:DI (reg:DI 91)
(const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4 A32]))
-


The patched GCC produces the following insns:
-
(set (reg:DI 82 [ MEM[(int *)array_5(D) + 800B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 92)
(const_int 32 [0x20])) [1 MEM[(int *)array_5(D) + 800B]+0 S4
A32])))
(set (reg:DI 84 [ MEM[(int *)array_5(D) + 804B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 93)
(const_int 36 [0x24])) [1 MEM[(int *)array_5(D) + 804B]+0 S4
A32])))
(set (reg:DI 87 [ MEM[(int *)array_5(D) + 808B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 94)
(const_int 40 [0x28])) [1 MEM[(int *)array_5(D) + 808B]+0 S4
A32])))
(set (reg:DI 90 [ MEM[(int *)array_5(D) + 812B] ])
(sign_extend:DI (mem:SI (plus:DI (reg:DI 95)
(const_int 44 [0x2c])) [1 MEM[(int *)array_5(D) + 812B]+0 S4
A32])))
-
The patched one looks OK to me, but the final assembly code still shows
no change (both under -Os).

I'm not quite sure where the problem is; I'll have a look around
array_test.c.250r.shorten_memrefs tomorrow.

Please ignore the coding style, as it's just a debug patch.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-15 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #36 from Levy  ---
It seems get_si_mem_base_reg() is called repeatedly, via FOR_BB_INSNS, from both
pass_shorten_memrefs::analyze and pass_shorten_memrefs::transform.

Correct me if I'm wrong:
It seems we need some data structure (a linked list should work) to store the
zero/sign extend of each insn when we strip it off, like this:

if (GET_CODE (mem) == ZERO_EXTEND
    || GET_CODE (mem) == SIGN_EXTEND)
  mem = XEXP (mem, 0);

Then in pass_shorten_memrefs::transform(), each time get_si_mem_base_reg() is
called, we check whether we need to put it back.
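One way to avoid a separate list is sketched below on a hypothetical toy expression type rather than real rtx. Instead of returning the stripped MEM, the helper returns a pointer to the operand slot that holds it. transform() could then write the rewritten MEM back through that slot, and any ZERO_EXTEND/SIGN_EXTEND wrapper stays in place without being remembered anywhere. toy_expr, toy_mem_slot() and toy_replace_mem() are invented names. The replacement node plays the role that replace_equiv_address() plays in the real pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression codes and node.  */
enum toy_code { TOY_MEM, TOY_ZERO_EXTEND, TOY_SIGN_EXTEND };

struct toy_expr
{
  enum toy_code code;
  struct toy_expr *inner;  /* Operand of an extend; NULL for a MEM.  */
  int base_reg;            /* Address base register (MEM only).  */
  long offset;             /* Address offset (MEM only).  */
};

/* Return the slot holding the MEM under *SLOT, looking through one
   ZERO_EXTEND or SIGN_EXTEND.  */
static struct toy_expr **
toy_mem_slot (struct toy_expr **slot)
{
  struct toy_expr *x = *slot;
  if (x->code == TOY_ZERO_EXTEND || x->code == TOY_SIGN_EXTEND)
    return &x->inner;
  return slot;
}

/* Replace the MEM under *SLOT with NEW_MEM.  Because the write goes
   through the inner slot, an extend wrapper around the MEM survives.  */
static void
toy_replace_mem (struct toy_expr **slot, struct toy_expr *new_mem)
{
  struct toy_expr **mem_slot = toy_mem_slot (slot);
  assert ((*mem_slot)->code == TOY_MEM && new_mem->code == TOY_MEM);
  *mem_slot = new_mem;
}
```

The wrapper node is never copied or rebuilt; only its operand pointer changes.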

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-11 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #34 from Levy  ---
(In reply to Jim Wilson from comment #33)
> I did say that I'm willing to fix code style issues.  All major software
> projects I have worked with have coding style conventions.  It makes it
> easier to maintain a large software base when everything is using the same
> coding style.  We do have info to help you follow the GNU coding style.  See
> https://gcc.gnu.org/wiki/FormattingCodeForGCC
> which has emacs and vim settings to get the right code style.

No problem, and thank you, Jim. I'll try to catch up with the coding style.
What about the combine issue and shorten-memrefs?
Shall we fix them here or somewhere else?

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-10 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49542|0   |1
is obsolete||

--- Comment #32 from Levy  ---
Created attachment 49543
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49543&action=edit
QI/HI/SImode auto Zero/Sign-extend

Added one missing space at the end of the comment.
Added one tab before each brace.
Replaced all tabs with spaces (come on).

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-10 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49536|0   |1
is obsolete||

--- Comment #31 from Levy  ---
Created attachment 49542
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49542&action=edit
QI/HI/SImode auto Zero/Sign-extend

Much appreciated, Jim.

The new patch should solve the format issue by adding the same tabs on each
line.

In the meantime, I'll try to patch pass_shorten_memrefs::analyze() in
riscv-shorten-memrefs.c.

Any idea on the combiner issue?

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-10 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49534|0   |1
is obsolete||

--- Comment #29 from Levy  ---
Created attachment 49536
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49536&action=edit
QI/HI/SImode auto Zero/Sign-extend

Finally, gen_extend_insn() seems to work with auto-extension, based on Jim's
and Kito's idea.

It's just 10 lines of code at the beginning of riscv_legitimize_move() in
riscv-gcc/gcc/config/riscv.c:

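  if (GET_MODE_CLASS (mode) == MODE_INT
      && GET_MODE_SIZE (mode) < UNITS_PER_WORD
      && can_create_pseudo_p ()
      && MEM_P (src))
    {
      /* Load into a fresh word_mode pseudo, extending the narrow value.  */
      rtx temp_reg = gen_reg_rtx (word_mode);
      /* Nonzero means zero-extend (the unsignedp argument).  */
      int zero_sign_extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
      emit_insn (gen_extend_insn (temp_reg, src, word_mode, mode,
                                  zero_sign_extend));
      /* Move the low part of the extended value into DEST.  */
      riscv_emit_move (dest, gen_lowpart (mode, temp_reg));
      return true;
    }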

I haven't run make report-gcc yet, but it already passes a 2-stage make.
I'll test it later.
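Why extend-then-lowpart is a safe way to load a narrow value can be checked in plain C, modelling rv64 with a 64-bit register. Extending a narrow load (either way) and then taking the low part gives back the original bits. After a zero-extending load the upper bits are already zero, so a later andi with 0xff would be a no-op. The helpers are hypothetical. toy_load_sign_extends_p() encodes my assumption about rv64 (sign-extend 32-bit loads, zero-extend narrower ones) rather than the actual LOAD_EXTEND_OP definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Load SIZE bytes (1, 2 or 4) from P into a 64-bit "register",
   zero-extending or sign-extending as requested.  */
static uint64_t
toy_load_extend (const void *p, int size, bool sign_extend)
{
  switch (size)
    {
    case 1:
      {
        uint8_t v;
        memcpy (&v, p, 1);
        return sign_extend ? (uint64_t) (int64_t) (int8_t) v : v;
      }
    case 2:
      {
        uint16_t v;
        memcpy (&v, p, 2);
        return sign_extend ? (uint64_t) (int64_t) (int16_t) v : v;
      }
    case 4:
      {
        uint32_t v;
        memcpy (&v, p, 4);
        return sign_extend ? (uint64_t) (int64_t) (int32_t) v : v;
      }
    default:
      assert (!"unsupported size");
      return 0;
    }
}

/* Assumed rv64 policy: sign-extend 32-bit loads, zero-extend narrower.  */
static bool
toy_load_sign_extends_p (int size)
{
  return size == 4;
}

/* Take the low SIZE bytes of a register value, like gen_lowpart.  */
static uint64_t
toy_lowpart (uint64_t reg, int size)
{
  return size == 8 ? reg : reg & ((UINT64_C (1) << (8 * size)) - 1);
}
```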

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49533|0   |1
is obsolete||

--- Comment #28 from Levy  ---
Created attachment 49534
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49534&action=edit
V4 patch for QI/HI/SI/DI zero/sign-extend for RV32/64/128

As suggested by Kito, the V4 patch moves gen_extend_insn_auto() to riscv.c and
declares it in riscv-protos.h, since it only concerns the RISC-V backend.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #27 from Levy  ---
(In reply to Kito Cheng from comment #25)
> Seem like you have add code to gcc/optabs.h and gcc/optabs.c, however those
> functions are RISC-V specific, so I would suggest you put in riscv.c and
> riscv-protos.h.

No problem, I'll make a new patch.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49532|0   |1
is obsolete||

--- Comment #26 from Levy  ---
Created attachment 49533
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49533&action=edit
QI/HI/SI/DI zero/sign-extend for RV32/64/128

Bug fix for unimplemented code.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49524|0   |1
is obsolete||

--- Comment #24 from Levy  ---
Created attachment 49532
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49532&action=edit
QI/HI/SI/DI zero/sign-extend for RV32/64/128

Rewrote the third proposed patch.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

  Attachment #49470|0   |1
is obsolete||
  Attachment #49500|0   |1
is obsolete||

--- Comment #23 from Levy  ---
Created attachment 49524
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49524&action=edit
Third test patch

While I'm waiting for a solution, I've reused my second patch to make a new
one.
The third test patch adds one extra function, gen_extend_insn_auto(), in optabs.c/h,
which is then called by riscv_legitimize_move().
Now it can emit sign/zero-extend insns automatically.

I still haven't solved the shorten-memrefs issue, so it still raises 2 errors
in make report-gcc, and the -Os option (size optimization) may not behave as expected.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-09 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #22 from Levy  ---
Under the condition

if (GET_MODE_CLASS (mode) == MODE_INT
    && GET_MODE_SIZE (mode) < UNITS_PER_WORD
    && can_create_pseudo_p ()
    && MEM_P (src))

with the variables:

rtx temp_reg;
int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);

I've tried the combination of:

gen_extend_insn (temp_reg, force_reg (mode, src), word_mode, mode, extend);
gen_extend_insn (temp_reg, force_reg (word_mode, src), word_mode, word_mode,
extend);
gen_extend_insn (temp_reg, src, word_mode, mode, extend);

with:
riscv_emit_move(dest, gen_lowpart (mode, temp_reg));
riscv_emit_move(dest, force_reg(mode, temp_reg));

and then returning true.

All of them raised a segfault during make gcc.

For example, at the beginning of riscv_legitimize_move():

  if (GET_MODE_CLASS (mode) == MODE_INT
      && GET_MODE_SIZE (mode) < UNITS_PER_WORD
      && can_create_pseudo_p ()
      && MEM_P (src))
    {
      /* As posted: temp_reg is never initialized, and the insn returned
         by gen_extend_insn () is never passed to emit_insn ().  */
      rtx temp_reg;
      int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
      gen_extend_insn (temp_reg, force_reg (mode, src), word_mode, mode, extend);
      riscv_emit_move (dest, force_reg (mode, temp_reg));
      return true;
    }

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-06 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #19 from Levy  ---
I also tested the code without the int extend variable, just a zero-extend with
unsignedp set to 1. Still a segfault.

  if (GET_MODE_CLASS (mode) == MODE_INT
      && GET_MODE_SIZE (mode) < UNITS_PER_WORD
      && can_create_pseudo_p ()
      && MEM_P (src))
    {
      rtx temp_reg = force_reg (word_mode, convert_to_mode (word_mode, src, 1));
      riscv_emit_move (dest, temp_reg);
      return true;
    }

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-06 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #18 from Levy  ---
  if (GET_MODE_CLASS (mode) == MODE_INT
      && GET_MODE_SIZE (mode) < UNITS_PER_WORD
      && can_create_pseudo_p ()
      && MEM_P (src))
    {
      int extend = (LOAD_EXTEND_OP (mode) == ZERO_EXTEND);
      rtx temp_reg = force_reg (word_mode,
                                convert_to_mode (word_mode, src, extend));
      riscv_emit_move (dest, temp_reg);
      return true;
    }

I tried inserting this code at the beginning of riscv_legitimize_move(), but it seems
convert_to_mode() raised a segfault during make.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-03 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #12 from Levy  ---
(In reply to Kito Cheng from comment #11)
> > Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c
> 
> Seems like shorten_memrefs pass didn't handle zero_extend and sign_extend
> with memory.
> 
> You can take a look into
> riscv-shorten-memrefs.c:pass_shorten_memrefs::analyze and add logic to
> handle zero_extend and sign_extend.
> 
> 
> > With one instruction less, the patched one seems right and even faster to 
> > me. However we still need a test on sign extend and check performance
> 
> shorten_memrefs is optimize for size, the idea is transform several load
> instructions with large immediate to a load immediate instruction and load
> instructions with small immediate, to use C-extensions instruction as
> possible.
> 
> so the instruction count seems increased, but the code size is smaller.

Thank you, Cheng. I'll have a look.
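Kito's point can be checked with rough arithmetic on the two load1r listings from comment #10. This assumes the C extension is enabled and that the assembler compresses every instruction that has an RVC form. Under that assumption:

- c.lw needs rd and rs1 in x8-x15 and an offset that is a multiple of 4 in 0..124.
- c.addw needs rd == rs1, with both operands in x8-x15.
- ret becomes c.jr.
- lwu and the three-operand addi with 768 have no compressed form.

The small C model below encodes only those rules. It is not an assembler, and I have not assembled these listings. All names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mini instruction set covering only the load1r listings.  */
enum toy_op { OP_ADDI, OP_LW, OP_LWU, OP_ADDW, OP_RET };

struct toy_insn
{
  enum toy_op op;
  int rd, rs1, rs2;   /* Register numbers (a0 = x10, a4 = x14, a5 = x15).  */
  long imm;           /* Immediate or load offset.  */
};

/* Registers encodable in the 3-bit RVC register fields: x8..x15.  */
static bool
toy_rvc_reg_p (int r)
{
  return r >= 8 && r <= 15;
}

/* Size in bytes under the simplified rules described above.  */
static int
toy_insn_size (const struct toy_insn *i)
{
  switch (i->op)
    {
    case OP_LW:   /* c.lw: rd', rs1', offset 0..124, multiple of 4.  */
      return (toy_rvc_reg_p (i->rd) && toy_rvc_reg_p (i->rs1)
              && i->imm >= 0 && i->imm <= 124 && i->imm % 4 == 0) ? 2 : 4;
    case OP_ADDW: /* c.addw: rd' == rs1', plus rs2'.  */
      return (i->rd == i->rs1 && toy_rvc_reg_p (i->rd)
              && toy_rvc_reg_p (i->rs2)) ? 2 : 4;
    case OP_ADDI: /* c.addi: rd == rs1 != x0, nonzero 6-bit signed imm.  */
      return (i->rd == i->rs1 && i->rd != 0 && i->imm != 0
              && i->imm >= -32 && i->imm <= 31) ? 2 : 4;
    case OP_RET:  /* ret is jalr x0, 0(ra), which fits c.jr ra.  */
      return 2;
    case OP_LWU:  /* No compressed form.  */
      return 4;
    }
  return 4;
}

/* Total size of N instructions.  */
static int
toy_code_size (const struct toy_insn *insns, size_t n)
{
  int total = 0;
  for (size_t k = 0; k < n; k++)
    total += toy_insn_size (&insns[k]);
  return total;
}
```

Under these assumptions, the shortened version comes to 20 bytes against 26 for the patched one. It has one more instruction, but the code is smaller.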

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-11-03 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #10 from Levy  ---
Created attachment 49500
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49500&action=edit
Optimization Patch for QI/HImode(32bit) and QI/HI/SImode(64bit)

Proposing a second patch for QI/HImode (32-bit) and QI/HI/SImode (64-bit),
covering both zero-extend and subreg.

Tested with make report-gcc
Two failed cases: shorten-memrefs-5.c & shorten-memrefs-6.c

Both failed because of the DejaGnu rule:
/* { dg-final { scan-assembler "load1r:\n\taddi" } } */

But look at the assembly code for the same input in both files:

int load1r (int *array)
{
  int a = 0;
  a += array[200];
  a += array[201];
  a += array[202];
  a += array[203];
  return a;
}

The current GCC RISC-V port produces:
load1r:
addi a5,a0,768
lw  a4,36(a5)
lw  a0,32(a5)
addw a0,a0,a4
lw  a4,40(a5)
addw a4,a4,a0
lw  a0,44(a5)
addw a0,a0,a4
ret
The patched port produces:
load1r:
lwu a4,800(a0)
lwu a5,804(a0)
addw a5,a5,a4
lwu a4,808(a0)
lwu a0,812(a0)
addw a5,a5,a4
addw a0,a5,a0
ret
With one fewer instruction, the patched version seems correct and even faster to me.
However, we still need to test sign extension and check performance.

Test cases and a performance evaluation will be provided later (hopefully).
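For reference, the 768 + 32/36/40/44 split in the current output can be reproduced by masking each offset with the largest c.lw offset, 124 (0x7c). The bits inside the mask stay in the load, and the rest goes into the shared base. The mask value is an assumption that happens to match these listings and dumps; I have not checked it against riscv-shorten-memrefs.c. toy_split_offset() is a hypothetical helper.

```c
#include <assert.h>

/* Largest c.lw/c.sw offset: 5-bit field scaled by 4, i.e. 0..124.  */
#define TOY_CLW_MAX_OFFSET 124L   /* 0x7c */

/* Split OFFSET into BASE + SMALL, where SMALL keeps only the bits that
   fit a compressed load and BASE is what must be added to the pointer
   once (e.g. with a single addi).  */
static void
toy_split_offset (long offset, long *base, long *small)
{
  assert (offset >= 0 && offset % 4 == 0);
  *small = offset & TOY_CLW_MAX_OFFSET;
  *base = offset & ~TOY_CLW_MAX_OFFSET;
}
```

All four loads then share one base (a0 + 768, materialized once), and each remaining offset fits c.lw.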

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-30 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #9 from Levy  ---
Thanks, Jim. See you on Monday.

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-30 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

--- Comment #8 from Levy  ---
Created attachment 49470
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49470&action=edit
optimization fix for BUG 97417

I'm proposing a temporary patch here in case someone needs it; I'll submit a full
patch with a test case later.

The following code was added to riscv_legitimize_move () in
riscv-gcc/gcc/config/riscv/riscv.c:

  if (mode == QImode && MEM_P (src) && REG_P (dest) && can_create_pseudo_p ())
    {
      rtx temp_reg;

      if (TARGET_64BIT)
        {
          temp_reg = gen_reg_rtx (DImode);
          emit_insn (gen_zero_extendqidi2 (temp_reg, src));
        }
      else
        {
          temp_reg = gen_reg_rtx (SImode);
          emit_insn (gen_zero_extendqisi2 (temp_reg, src));
        }

      riscv_emit_move (dest, gen_lowpart (QImode, temp_reg));
      return true;
    }

Tested with make report-gcc; I haven't done the regression/performance tests yet:

= Summary of gcc testsuite =
                            | # of unexpected case / # of unique unexpected case
                            |      gcc |      g++ | gfortran |
 rv64imafdc/  lp64d/ medlow |    0 / 0 |    0 / 0 |        - |

[Bug other/97417] RISC-V Unnecessary andi instruction when loading volatile bool

2020-10-28 Thread admin at levyhsu dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97417

Levy  changed:

   What|Removed |Added

 CC||admin at levyhsu dot com

--- Comment #6 from Levy  ---
Hi Jim

Levy from StarFive. 

Adding the following code to the head of riscv_legitimize_move(), as suggested in your
reply, seems to solve the problem:

  if (mode == QImode && MEM_P (src) && REG_P (dest))
    {
      rtx temp_reg;
      if (TARGET_64BIT)
        {
          temp_reg = gen_reg_rtx (DImode);
          emit_insn (gen_zero_extendqidi2 (temp_reg, src));
        }
      else
        {
          temp_reg = gen_reg_rtx (SImode);
          emit_insn (gen_zero_extendqisi2 (temp_reg, src));
        }

      riscv_emit_move (dest, gen_rtx_SUBREG (QImode, temp_reg, 0));
      return true;
    }

The same foo.c now produces:
foo:
lui a5,%hi(active)
lbu a5,%lo(active)(a5)
li  a0,42
bne a5,zero,.L6
ret
.L6:
li  a0,-42
ret
.size   foo, .-foo
.ident  "GCC: (GNU) 10.2.0"

I'm not sure if I'm doing it right, especially for 64-bit DImode, because I've only
been working on GCC for a month. I wonder if you have time after Monday's compiler
meeting so we can discuss movsi, movhi and MEM-to-MEM copies.