[Bug tree-optimization/115208] [15 Regression] Memory consumption get extremely high after r15-807-gfae5e6a4dfcf92

2024-05-26 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115208

Haochen Jiang  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Haochen Jiang  ---
It also solved the issue for me. Thx!

[Bug middle-end/115208] [15 Regression] Memory consumption get extremely high after r15-809

2024-05-24 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115208

--- Comment #1 from Haochen Jiang  ---
I forgot to mention that the memory consumption figures were collected on an
x86_64 target so that the test could finish. Therefore, we can debug on x86_64.

[Bug middle-end/115208] New: [15 Regression] Memory consumption get extremely high after r15-809

2024-05-24 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115208

Bug ID: 115208
   Summary: [15 Regression] Memory consumption get extremely high
after r15-809
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

I am currently seeing some testcase failures on the i686 target.

New failures:
FAIL: gcc.target/i386/avx-1.c (test for excess errors)
FAIL: gcc.target/i386/sse-13.c (test for excess errors)
FAIL: gcc.target/i386/sse-23.c (test for excess errors)
FAIL: gcc.target/i386/sse-24.c (test for excess errors)
FAIL: gcc.target/i386/sse-25.c (test for excess errors)
FAIL: gcc.target/i386/sse-26.c (test for excess errors)

From my investigation, the failures started after the commit series r15-797 to
r15-809.

After those commits, the memory consumption got very high.

I ran with:
make check-gcc RUNTESTFLAGS="i386.exp=avx-1.c --target_board='unix{-m32}'"

Before those commits, the memory usage:

Memory usage summary: heap total: 245591, heap peak: 205651, stack peak: 13824
        total calls   total memory   failed calls
 malloc|        402         238998              0
realloc|         14           5416              0  (nomove:7, dec:0, free:0)
 calloc|          6           1177              0
   free|        224          57705

After those commits, the memory usage:

Memory usage summary: heap total: 17691252434, heap peak: 7691866921, stack peak: 51056
        total calls   total memory   failed calls
 malloc|   69614075    15426859093              0
realloc|     260731       17538362              0  (nomove:88951, dec:19349, free:0)
 calloc|   11132073     2246854979              0
   free|   81105599     9997315310


Since the i686 target has very limited memory, the tests crash on that target.
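
As a rough cross-check of the two summaries above, the following Python sketch
(not part of the original report) recomputes the growth of the heap peak and
compares it with the 4 GiB address space of a 32-bit process. All figures are
copied from the report; the constant and function names are only illustrative.
It reads the malloc row of the second table as 69614075 calls and 15426859093
bytes, which is the reading under which the malloc, realloc and calloc totals
add up to the reported heap total. The figures were collected on x86_64, so
carrying them over to -m32 is an assumption.

```python
# Figures copied from the two "Memory usage summary" blocks in the report.
HEAP_PEAK_BEFORE = 205_651
HEAP_PEAK_AFTER = 7_691_866_921
HEAP_TOTAL_AFTER = 17_691_252_434

# Per-function byte totals of the "after" table (malloc row read as
# 69614075 calls / 15426859093 bytes; this split is an assumption).
TOTALS_AFTER = {
    "malloc": 15_426_859_093,
    "realloc": 17_538_362,
    "calloc": 2_246_854_979,
}

GIB = 1 << 30
ADDRESS_SPACE_32BIT = 4 * GIB  # hard upper bound for any 32-bit process


def growth_factor(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


def rows_match_heap_total(totals: dict, heap_total: int) -> bool:
    """Check that the allocation rows add up to the reported heap total."""
    return sum(totals.values()) == heap_total


if __name__ == "__main__":
    factor = growth_factor(HEAP_PEAK_BEFORE, HEAP_PEAK_AFTER)
    print(f"heap peak grew about {factor:,.0f}x")
    print(f"heap peak after: {HEAP_PEAK_AFTER / GIB:.2f} GiB")
    print("rows add up:", rows_match_heap_total(TOTALS_AFTER, HEAP_TOTAL_AFTER))
    print("exceeds 32-bit address space:", HEAP_PEAK_AFTER > ADDRESS_SPACE_32BIT)
```

With these inputs the heap peak after the commits is about 7.16 GiB, which is
more than a 32-bit process can address at all.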

[Bug target/115025] [14/15 regression] prime computation performance regression, x86, between gcc-14 and gcc-13 on skylake platform

2024-05-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115025

Haochen Jiang  changed:

   What|Removed |Added

 CC||jh at suse dot cz

--- Comment #6 from Haochen Jiang  ---
From my investigation, there are two commits related to this PR. Both of them
are related to the copy header pass (ch2).

This is the dump before the ch2 pass for that loop.

   [local count: 109475452]:
  _4 = n_1(D) % 5;
  if (_4 == 0)
goto ; [3.66%]
  else
goto ; [96.34%]

   [local count: 105468650]:
  _24 = n_1(D) % 7;
  if (_24 == 0)
goto ; [3.66%]
  else
goto ; [96.34%]


The first is r14-2675. After this commit, the ch2 pass refuses to duplicate
bb 9 and bb 10, which it previously duplicated, for the following reason. This
caused half of the total regression.

"Not duplicating bb 9: condition based on non-IV loop variant."

The other is r14-2709. After this commit, the ch2 pass tried to duplicate both
bb 9 and bb 10, but eventually the pass did not. However, the commit
contributed the other half of the regression.

I am going to dig deeper.

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-21 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069

--- Comment #22 from Haochen Jiang  ---
Fixed in GCC14 and GCC15

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069

--- Comment #19 from Haochen Jiang  ---
(In reply to Haochen Jiang from comment #18)
> SPEC

The SPEC binaries all seem to be the same for me, so there is no surprise.

I suppose we should go with the patch from Uros and just focus on this problem.

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069

--- Comment #18 from Haochen Jiang  ---
SPEC

[Bug target/115024] [14/15 regression] 128 bit division performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform

2024-05-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115024

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #5 from Haochen Jiang  ---
From my test, trunk only has a <1% regression, if I calculated correctly.

[haochenj@shgcc101 ~]$ ./13.exe
1240.97 div128 ops per sec
[haochenj@shgcc101 ~]$ ./13.exe
1235.78 div128 ops per sec
[haochenj@shgcc101 ~]$ ./13.exe
1236.95 div128 ops per sec

[haochenj@shgcc101 ~]$ ./trunk.exe
1228.43 div128 ops per sec
[haochenj@shgcc101 ~]$ ./trunk.exe
1227.11 div128 ops per sec
[haochenj@shgcc101 ~]$ ./trunk.exe
1225.42 div128 ops per sec
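
The "<1%" figure can be rechecked from the six runs above. The following is a
minimal Python sketch, not part of the original comment; it assumes the mean
of the three runs is a fair summary for each compiler, and the names are only
illustrative.

```python
from statistics import mean

# Throughput samples (div128 ops per sec) copied from the runs above.
GCC13_RUNS = [1240.97, 1235.78, 1236.95]
TRUNK_RUNS = [1228.43, 1227.11, 1225.42]


def regression_percent(baseline, candidate):
    """Relative throughput loss of `candidate` versus `baseline`, in percent."""
    return (1.0 - mean(candidate) / mean(baseline)) * 100.0


if __name__ == "__main__":
    loss = regression_percent(GCC13_RUNS, TRUNK_RUNS)
    print(f"trunk vs GCC13: {loss:.2f}% slower")
```

Under that assumption the loss comes out a little below 0.9%, which agrees
with the "<1%" stated in the comment.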

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069

--- Comment #15 from Haochen Jiang  ---
I am doing it this way. I suppose it should be the same as Comment 8.

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index a6132911e6a..1e8334877d6 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -24323,8 +24323,8 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
   bool uns_p = code != ASHIFTRT;

-  if ((qimode == V16QImode && !TARGET_AVX2)
-  || (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
+  if (!TARGET_AVX512BW
+  || (qimode == V32QImode && !TARGET_EVEX512)
   /* There are no V64HImode instructions.  */
   || qimode == V64QImode)

Should we also run SPEC with -O2 -mtune=generic -march=x86-64-v3 to see if
there is any surprise?
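
To see which cases the changed condition affects, the two versions of the `if`
in the hunk above can be modelled as plain boolean functions. This is only a
Python sketch under stated assumptions: it covers just the three modes named
in the hunk, treats the TARGET_* flags as independent booleans (which real
option handling may not allow), and says nothing about what the truncated `if`
body does. The function names are hypothetical.

```python
from itertools import product

MODES = ("V16QI", "V32QI", "V64QI")  # only the modes named in the hunk


def condition_old(qimode, avx2, avx512bw, evex512):
    """The removed condition, transcribed from the '-' lines of the hunk."""
    return ((qimode == "V16QI" and not avx2)
            or (qimode == "V32QI" and (not avx512bw or not evex512))
            or qimode == "V64QI")


def condition_new(qimode, avx2, avx512bw, evex512):
    """The added condition, transcribed from the '+' lines of the hunk."""
    return (not avx512bw
            or (qimode == "V32QI" and not evex512)
            or qimode == "V64QI")


def differing_cases():
    """List every (mode, avx2, avx512bw, evex512) where the two disagree."""
    return [(mode, avx2, bw, evex)
            for mode in MODES
            for avx2, bw, evex in product((False, True), repeat=3)
            if condition_old(mode, avx2, bw, evex)
            != condition_new(mode, avx2, bw, evex)]


if __name__ == "__main__":
    for case in differing_cases():
        print(case)
```

In this model the two conditions only disagree for V16QI, and only when the
AVX2 and AVX512BW flags differ, for example V16QI with AVX2 but without
AVX512BW.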

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-19 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069

--- Comment #12 from Haochen Jiang  ---
(In reply to Hongtao Liu from comment #11)
> (In reply to Haochen Jiang from comment #10)
> > A patch like Comment 8 could definitely solve the problem. But I need to
> > test more benchmarks to see if there is surprise.
> > 
> > But, yes, as Uros said in Comment 9, maybe there is a chance we could do it
> > better.
> 
> Could you add "arch=skylake-avx512" to target_clones and try disable whole
> ix86_expand_vecop_qihi2 to see if there's any performance improvement?
> For x86, cross-lane permutation(truncation) is not very efficient(3-4 cycles
> for both vpermq and vpmovwb).

When I disable/enable ix86_expand_vecop_qihi2 with arch=skylake-avx512 on
trunk, there is no performance regression compared to GCC13 + avx2.

It seems that the regression only happens with GCC14 + avx2.

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-17 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069

--- Comment #10 from Haochen Jiang  ---
A patch like Comment 8 could definitely solve the problem. But I need to test
more benchmarks to see if there is any surprise.

But, yes, as Uros said in Comment 9, maybe there is a chance we could do it
better.

[Bug target/115069] [14/15 regression] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-17 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069

--- Comment #6 from Haochen Jiang  ---
(In reply to Hongtao Liu from comment #5)
> (In reply to Krzysztof Kanas from comment #4)
> > I bisected the issue and it seems that commit
> > 0368fc54bc11f15bfa0ed9913fd0017815dfaa5d introduces regression.
> 
> I guess the real guilty commit is 
> 
> commit 52ff3f7b863da1011b73c0ab3b11f6c78b6451c7
> Author: Uros Bizjak 
> Date:   Thu May 25 19:40:26 2023 +0200
>  
> i386: Use 2x-wider modes when emulating QImode vector instructions
>  
> Rewrite ix86_expand_vecop_qihi2 to expand fo 2x-wider (e.g. V16QI ->
> V16HImode)
> instructions when available.  Currently, the compiler generates following
> assembly for V16QImode multiplication (-mavx2):

Yes, since 0368fc54bc11f15bfa0ed9913fd0017815dfaa5d only fixed a typo in that
patch.

Original thread: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619745.html

[Bug target/115025] [14/15 regression] prime computation performance regression, x86, between gcc-14 and gcc-13 on skylake platform

2024-05-16 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115025

--- Comment #5 from Haochen Jiang  ---
My guess is that for the prime-checking loop:

for (i = 5; i < max; i += 6)
  if ((n % i == 0) || (n % (i + 2) == 0))
    return 0;

GCC13 peels the first iteration, which is (n % 5 == 0) || (n % 7 == 0), out
of the loop and uses imul+cmp instead of div for it.

However, on current trunk, it remains a div and will be slower.

BTW, there is also a codegen regression which does not cause a perf regression.
On current trunk, the sqrt BB is not merged together. It increases code size but
has no perf impact.
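
The transformation described in this comment can be sketched at source level.
The following Python model (not taken from the PR) mirrors the C loop and a
hand-peeled variant in which the first iteration tests the constant divisors
5 and 7. It assumes non-negative n, for which Python's % agrees with C's, and
the `return 1` for "no divisor found" is an assumption, since the comment only
shows the `return 0` path. The function names are hypothetical.

```python
def no_divisor_loop(n, limit):
    """Transcription of the loop from the comment (limit plays the role of max)."""
    i = 5
    while i < limit:
        if n % i == 0 or n % (i + 2) == 0:
            return 0
        i += 6
    return 1  # assumed result when no divisor is found


def no_divisor_peeled(n, limit):
    """Same loop with the first iteration peeled out by hand."""
    if 5 < limit:
        # Divisors are now the constants 5 and 7, so a compiler can
        # replace the division by a multiply-and-compare sequence.
        if n % 5 == 0 or n % 7 == 0:
            return 0
    i = 11
    while i < limit:
        if n % i == 0 or n % (i + 2) == 0:
            return 0
        i += 6
    return 1


if __name__ == "__main__":
    for n in (35, 121, 127):
        print(n, no_divisor_loop(n, 100), no_divisor_peeled(n, 100))
```

Both functions return the same value for every input; the peeled form only
exposes the constant divisors of the first iteration to the compiler.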

[Bug target/115028] [15 regression] gcc.target/i386/pr101950-2.c FAILs

2024-05-15 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115028

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #5 from Haochen Jiang  ---
I suppose Richard is already aware of the issue.
See: https://gcc.gnu.org/pipermail/gcc-regression/2024-May/079828.html

[Bug target/115069] 8 bit integer vector performance regression, x86, between gcc-14 and gcc-13 using avx2 target clones on skylake platform

2024-05-15 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115069

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #3 from Haochen Jiang  ---
From my investigation, GCC14 generates some perm instructions, which cause a
data dependency.

I am going to bisect which commit caused this issue.

[Bug target/115071] performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake

2024-05-15 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115071

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #3 from Haochen Jiang  ---
I could not reproduce the regression. For me, it is:
[haochenj@shgcc101 pr115071]$ ./13.exe
duration: 7.09 seconds, count = 1119566602
[haochenj@shgcc101 pr115071]$ ./trunk.exe
duration: 4.97 seconds, count = 1119566602

[Bug target/115002] [14/15 regression] wide integer vector performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform

2024-05-14 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115002

--- Comment #5 from Haochen Jiang  ---
It seems to be mainly caused by the code size increase in GCC14, since the
ratio by which the number of retired instructions increased is similar to the
regression.

Also, just as in PR114987, I tried GCC11, and it seems to get better
performance than GCC13.

[Bug target/114987] [14/15 Regression] floating point vector regression, x86, between gcc 14 and gcc-13 using -O3 and target clones on skylake platforms

2024-05-10 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987

--- Comment #7 from Haochen Jiang  ---
Furthermore, when I build with GCC11, the codegen is much better:

vaddps   0xc0(%rsp),%ymm5,%ymm2
vaddps   0xe0(%rsp),%ymm4,%ymm1
vmovaps  %ymm2,0x80(%rsp)
vmovdqa  0x90(%rsp),%xmm6
vmovaps  %ymm1,0xa0(%rsp)
vmovdqa  0xb0(%rsp),%xmm7
vmovdqa  %xmm2,0xc0(%rsp)
vmovdqa  %xmm6,0xd0(%rsp)
vmovdqa  %xmm1,0xe0(%rsp)
vmovdqa  %xmm7,0xf0(%rsp)
sub  $0x1,%eax
jne  401e00 

It seems we might have two separate issues behind this regression.

[Bug target/114987] [14/15 Regression] floating point vector regression, x86, between gcc 14 and gcc-13 using -O3 and target clones on skylake platforms

2024-05-10 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114987

--- Comment #5 from Haochen Jiang  ---
What I have found is that the regression between the binaries built with GCC13
and GCC14 shows up on Cascadelake and Skylake.

But when I copied the binaries to Icelake, it does not. It seems Icelake might
fix this with micro-tuning.

I tried moving "vmovdqa %xmm1,0xd0(%rsp)" before "vmovdqa %xmm0,0xe0(%rsp)"
and rebuilt the binary, which recovers half of the regression.

[Bug target/110621] x86_64: Test gcc.target/i386/pr105354-2.c fails with -fstack-protector

2024-04-26 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110621

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #3 from Haochen Jiang  ---
Fixed in GCC13 and GCC14.

[Bug testsuite/109596] [14 Regression] Lots of guality testcase fails on x86_64 after r14-162-gcda246f8b421ba

2024-04-13 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596

--- Comment #18 from Haochen Jiang  ---
(In reply to Andrew Pinski from comment #16)
> (In reply to Carlos Eduardo Seo from comment #15)
> > I see some failures after this patch on aarch64-linux-gnu:
> > 
> > FAIL: gcc.dg/guality/pr54693-2.c -O2 -flto -fuse-linker-plugin
> > -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 21 x == 10 - i
> > FAIL: gcc.dg/guality/pr54693-2.c -O2 -flto -fuse-linker-plugin
> > -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 21 y == 20 - 2 * i
> > FAIL: gcc.dg/guality/pr54693-2.c -O2 -flto -fuse-linker-plugin
> > -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 21 z == 30 - 3 * i
> > 
> > Could you please take a look?
> 
> I suspect it is similar to what was already discussed here: 
> 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649347.html

Yes. As Richard said, the FAILs are exactly the same as before Honza's patch,
so I suppose they are expected.

[Bug tree-optimization/114238] [14 regression] Multiple 554.roms_r run-time regressions (4%-20%) since r14-9193-ga0b1798042d033

2024-03-12 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114238

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #2 from Haochen Jiang  ---
(In reply to Richard Biener from comment #1)
> r14-9391-g018ddc86b92851 fixed this on Zen2 for me as well.

The commit fixed the regression on SPR for me as well.

[Bug target/113656] [x86] ICE in simplify_const_unary_operation, at simplify-rtx.cc:1954 with new -mavx10.1

2024-01-30 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113656

Haochen Jiang  changed:

   What|Removed |Added

 CC||crazylht at gmail dot com

--- Comment #4 from Haochen Jiang  ---
From my bisect, it seems that the guilty commit is gcc-14-1707-ge52be6034fa.

[Bug target/113656] [x86] ICE in simplify_const_unary_operation, at simplify-rtx.cc:1954 with new -mavx10.1

2024-01-29 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113656

--- Comment #3 from Haochen Jiang  ---
(In reply to Haochen Jiang from comment #2)
> Actually it is caused by option -funsafe-math-optimizations but not
> -mavx10.1.
> 
> Before my commit, while using option:
> 
> -frounding-math -O3 -mavx512fp16 -mavx512vl -funsafe-math-optimizations
> 
> It will also report ICE. In GCC13.2, it won't, which means it is introduced
> in GCC14.
> 
> You got that bisect result since it is when avx10.1 first introduced.
> -mavx10.1 will enable -mavx512fp16 and -mavx512vl.
> 
> When we eliminate -funsafe-math-optimizations, it will be ok.
> 
> Also if we are only using -mavx512vl, everything is ok. Seems like something
> got disabled under -mavx512fp16.

What I mean by "disabled" here is actually "not enabled" when using
-funsafe-math-optimizations with -mavx512fp16.

> 
> Need more bisect with option: -frounding-math -O3 -mavx512fp16 -mavx512vl
> -funsafe-math-optimizations to find out why.

[Bug target/113656] [x86] ICE in simplify_const_unary_operation, at simplify-rtx.cc:1954 with new -mavx10.1

2024-01-29 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113656

--- Comment #2 from Haochen Jiang  ---
Actually, it is caused by the option -funsafe-math-optimizations, not by
-mavx10.1.

Before my commit, using the options:

-frounding-math -O3 -mavx512fp16 -mavx512vl -funsafe-math-optimizations

also reports an ICE. GCC13.2 does not, which means the ICE was introduced in
GCC14.

You got that bisect result since that is when avx10.1 was first introduced.
-mavx10.1 enables -mavx512fp16 and -mavx512vl.

When we drop -funsafe-math-optimizations, it is ok.

Also, if we only use -mavx512vl, everything is ok. It seems like something
got disabled under -mavx512fp16.

More bisecting with the options -frounding-math -O3 -mavx512fp16 -mavx512vl
-funsafe-math-optimizations is needed to find out why.

[Bug target/113656] [x86] ICE in simplify_const_unary_operation, at simplify-rtx.cc:1954 with new -mavx10.1

2024-01-29 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113656

--- Comment #1 from Haochen Jiang  ---
At first glance, it seems that the op here is wrongly interpreted.
I am investigating why.

[Bug target/113534] New: printf might report segmentation fault under -mabi=ms

2024-01-21 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113534

Bug ID: 113534
   Summary: printf might report segmentation fault under -mabi=ms
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

A reproducer:

$ /export/users/haochenj/env/build_no_bootstrap_master/gcc/xgcc
-B/export/users/haochenj/env/build_no_bootstrap_master/gcc/
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.target/i386/pr80969-4a.c
-m64 -DDEBUG -fdiagnostics-plain-output -Ofast -mabi=ms -mavx512f -lm -o
./pr80969-4a.exe
$ ./pr80969-4a.exe
Segmentation fault (core dumped)

After debugging it, I found that the core dump happens at the
"printf ("PASSED\n");" in avx-check.h.

We got:

Program received signal SIGSEGV, Segmentation fault.
0x77db887c in __strlen_evex () from /lib64/libc.so.6

It seems that the seg fault will only happen under -mabi=ms. If we eliminate
-mabi=ms, no segmentation fault is detected.

[Bug target/113288] [i386] Missing #define for -mavx10.1-256 and -mavx10.1-512

2024-01-11 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113288

--- Comment #6 from Haochen Jiang  ---
Fixed on trunk.

[Bug target/113288] [i386] Missing #define for -mavx10.1-256 and -mavx10.1-512

2024-01-08 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113288

--- Comment #3 from Haochen Jiang  ---
Adding them is quite straightforward. But I am not quite sure how the whole
libgomp patch works.

Is the patch attempting to check whether there is a perfect match for each ISA
detected from the hardware? If that is the case, we need them to be added. BTW,
under this scenario, there is no need to add an if clause for the macros
__EVEX512__ and __EVEX256__ in that patch since those two are not true ISAs.

[Bug target/113288] [i386] Missing #define for -mavx10.1-256 and -mavx10.1-512

2024-01-08 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113288

--- Comment #1 from Haochen Jiang  ---
(In reply to Tobias Burnus from comment #0)
> As noted in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642025.html
> 
> There is not #define for -mavx10.1-256 and -mavx10.1-512
> 
> By contrast, there is one for, e.g.,
> 
> __AVX10_512BIT__ and "avx10-max-512bit"
> __AVX10_1__ and "avx10.1"

I think neither of these two is on current trunk either; they belong to the
previous design, which was obsoleted in the end. Let me see if we need something
like that.

> __AMX_FP16__ and -mamx-fp16
> etc.

[Bug target/113133] [14 Regression] ICE: SIGSEGV in mark_label_nuses(rtx_def*) (emit-rtl.cc:3896) with -O -fno-tree-ter -mavx512f -march=barcelona

2024-01-01 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113133

--- Comment #11 from Haochen Jiang  ---
I just checked the code and pattern. I suppose simply removing it is reasonable
here. We should only allow x/ymm16+ for scalar instructions, but not for this
pattern.

[Bug target/113133] [14 Regression] ICE: SIGSEGV in mark_label_nuses(rtx_def*) (emit-rtl.cc:3896) with -O -fno-tree-ter -mavx512f -march=barcelona

2023-12-28 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113133

--- Comment #7 from Haochen Jiang  ---
(In reply to Uroš Bizjak from comment #1)
> Created attachment 56962 [details]
> Proposed patch
> 
> Patch in testing.
> 
> lowpart_subreg can't handle:
> 
> lowpart_subreg (V4SFmode, operands[0], DFmode);
> 
> and
> 
> lowpart_subreg (V2DFmode, operands[0], SFmode);
> 
> subreg conversions and will return NULL_RTX for these cases.

I suppose the patch here is ok at least from my initial test.

[Bug target/113133] [14 Regression] ICE: SIGSEGV in mark_label_nuses(rtx_def*) (emit-rtl.cc:3896) with -O -fno-tree-ter -mavx512f -march=barcelona

2023-12-28 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113133

--- Comment #6 from Haochen Jiang  ---
Aha, I see what happened. x/ymm16+ are usable for AVX512F w/o AVX512VL and that
is why I added that to allow them.

Let me find a way to see if we can fix this.

[Bug target/113133] [14 Regression] ICE: SIGSEGV in mark_label_nuses(rtx_def*) (emit-rtl.cc:3896) with -O -fno-tree-ter -mavx512f -march=barcelona

2023-12-28 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113133

--- Comment #5 from Haochen Jiang  ---
(In reply to Uroš Bizjak from comment #3)
> This patch also fixes the failure:
> 
> --cut here--
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index ca6dbf42a6d..cdb9ddc4eb3 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -5210,7 +5210,7 @@ (define_split
> && optimize_insn_for_speed_p ()
> && reload_completed
> && (!EXT_REX_SSE_REG_P (operands[0])
> -   || TARGET_AVX512VL || TARGET_EVEX512)"
> +   || TARGET_AVX512VL)"
> [(set (match_dup 2)
>  (float_extend:V2DF
>(vec_select:V2SF
> --cut here--

Hmm, it looks weird that I added EVEX512 next to AVX512VL. I am checking why I
did that.

[Bug target/112675] [14 Regression] r14-5385-g0a140730c97087 caused regression on testcases for i386

2023-11-26 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112675

Haochen Jiang  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Haochen Jiang  ---
Aha, it has been fixed on trunk. I will close that.

Thanks!

[Bug target/112675] New: [14 Regression] r14-5385-g0a140730c97087 caused regression on testcases

2023-11-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112675

Bug ID: 112675
   Summary: [14 Regression] r14-5385-g0a140730c97087 caused
regression on testcases
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

As shown in gcc-regression:

https://gcc.gnu.org/pipermail/gcc-regression/2023-November/078504.html

The guilty commit for some regressions is r14-5385-g0a140730c97087.

An easy reproducer would be:

make check RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/fp-int-convert-timode.c
--target_board='unix{-m64\ -march=cascadelake,-m32\
-march=cascadelake,-m32,-m64}'"
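
The quoting in the command above is easy to get wrong by hand. The following
Python sketch (not part of the original report) assembles the same command
line from a list of option sets, assuming the convention visible above: the
options of one variant are joined with an escaped space and the variants are
separated by commas. The helper names are hypothetical.

```python
def target_board(variants):
    """Build the --target_board argument from a list of option lists.

    Options within one variant are joined with a backslash-escaped space,
    variants are joined with commas, as in the command in the report.
    """
    joined = ["\\ ".join(options) for options in variants]
    return "--target_board='unix{" + ",".join(joined) + "}'"


def make_check_command(exp_file, test, variants, target="check"):
    """Assemble the make invocation for one test under several variants."""
    flags = f"{exp_file}={test} {target_board(variants)}"
    return f'make {target} RUNTESTFLAGS="{flags}"'


if __name__ == "__main__":
    print(make_check_command(
        "dg-torture.exp",
        "gcc.dg/torture/fp-int-convert-timode.c",
        [["-m64", "-march=cascadelake"],
         ["-m32", "-march=cascadelake"],
         ["-m32"],
         ["-m64"]],
    ))
```

Run as a script, this prints the reproducer command from the report on a
single line.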

[Bug target/112643] [14 regression] including x86intrin.h is broken for -march=native (which adds -mno-avx10.1-256 )

2023-11-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #24 from Haochen Jiang  ---
Patch aims to fix that:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637865.html

[Bug target/112643] [14 regression] including x86intrin.h is broken for -march=native (which adds -mno-avx10.1-256 )

2023-11-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #23 from Haochen Jiang  ---
I have root caused the issue and also discovered some other minor problems
unrelated to this PR but hard to discover.

I will write a patch to fix all of them.

[Bug target/112643] [14 regression] including x86intrin.h is broken for -march=native (which adds -mno-avx10.1-256 )

2023-11-21 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #22 from Haochen Jiang  ---
A quick workaround would be to not append -mno-avx10.1-xxx to -march=native,
and according to my experiment that works. However, I am looking for a better
way to do that.

The real problem seems to be that the AVX10 and AVX512 option handling in the
override part messes up the flags when both have explicit "no" options, which
in the end breaks the pragma push in avx512vp2intersect, since it is the only
AVX512 ISA outside AVX10.1 except for the Xeon Phi ones.

[Bug target/112643] [14 regression] including x86intrin.h is broken for -march=native (which adds -mno-avx10.1-256 )

2023-11-21 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #21 from Haochen Jiang  ---
(In reply to Andrew Pinski from comment #20)
> The use of __builtin_ia32_2intersectd128 in avx512vp2intersectvlintrin.h has:
> #pragma GCC target("avx512vp2intersect,avx512vl,no-evex512")
> 
> While i386-builtin.def does:
> BDESC (0, OPTION_MASK_ISA2_AVX512VP2INTERSECT, CODE_FOR_nothing,
> "__builtin_ia32_2intersectd128", IX86_BUILTIN_2INTERSECTD128, UNKNOWN, (int)
> VOID_FTYPE_PUQI_PUQI_V4SI_V4SI)

This is a known issue I figured out yesterday, but it should not cause the
problem since it actually relaxes the conditions. It will cause an ICE when
calling the builtins directly.

The reason why I cannot reproduce the regression seems to be mainly that the
machines I built on all have AVX512, while none of the CPUs mentioned here do,
which leads to different behavior in -march and option override handling.

I am rebuilding on an AVX-only machine to reproduce.

[Bug bootstrap/112643] [14 regression] failure to build libitm with --disable-bootstrap after r14-5607-g2f8f7ee2db82a3

2023-11-21 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #16 from Haochen Jiang  ---
Well I still could not reproduce that. Need some more investigation if they are
the same case.

[Bug bootstrap/112643] [14 regression] failure to build libitm with --disable-bootstrap after r14-5607-g2f8f7ee2db82a3

2023-11-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #14 from Haochen Jiang  ---
Intel(R) Core(TM) i5-8250U and AMD Ryzen 7 PRO 6850U both have AVX.

I am trying to reproduce that on building trunk with GCC 13.

[Bug bootstrap/112643] [14 regression] failure to build libitm with --disable-bootstrap after r14-5607-g2f8f7ee2db82a3

2023-11-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #10 from Haochen Jiang  ---
(In reply to Andrew Pinski from comment #7)
> I suspect the common theme here is enable-default-pie .
> 
> In the case of the original report was built with a compiler that had
> enabled and --disable-bootstrap got it.

I added --enable-default-pie to the configure options and it still does not
reproduce the failure.

I suspect it is caused by the GCC version used to compile GCC.

[Bug bootstrap/112643] Failure to build libitm with --disable-bootstrap after r14-5607-g2f8f7ee2db82a3

2023-11-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112643

--- Comment #4 from Haochen Jiang  ---
It is weird since everything passed even under bootstrap.

Could you provide the exact options you build GCC with --disable-bootstrap for
me to reproduce?

I suppose all of them are '--enable-libsanitizer' '--disable-bootstrap'
'--enable-valgrind-annotations' '--with-system-zlib' '--prefix=/opt/gcc/14'
'CC=/usr/bin/gcc-13' 'CFLAGS=-g -O2 -march=native' 'CXX=/usr/bin/g++-13'
'CXXFLAGS=-g -O2 -march=native' '--enable-languages=c,c++,fortran,lto'

[Bug target/112547] 9% exec time regression of 462.libquantum SPEC on AMD zen4 CPU

2023-11-16 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112547

--- Comment #7 from Haochen Jiang  ---
I got the same binary with and without my commit using those options, if
nothing went wrong.

Seems we need more investigation.

[Bug target/112547] 9% exec time regression of 462.libquantum SPEC on AMD zen4 CPU

2023-11-15 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112547

--- Comment #4 from Haochen Jiang  ---
I checked the znver3 plot on the site, and it seems that no regression occurs
there.

Since znver4 enables AVX512, that is the reason for my earlier guess.

Could you also provide the options you ran with? I could not find where they
are hidden on the site. Thx!

[Bug target/112547] 9% exec time regression of 462.libquantum SPEC on AMD zen4 CPU

2023-11-15 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112547

--- Comment #3 from Haochen Jiang  ---
(In reply to Haochen Jiang from comment #2)
> It is weird since I did not touch the tune.
> 
> Need a bisect to check that but I do not have a zen4 machine.
> 
> Could you try with this commit g:459866eaeec151e72aecd670695f014f4ec48588 to
> see if the regression still occurs?
> 
> If that still occurs, a guess might be zmm vectorization is not enabled
> corrected under some scenario.

Sorry, that should be "if the regression disappears", not "still occurs".

[Bug target/112547] 9% exec time regression of 462.libquantum SPEC on AMD zen4 CPU

2023-11-15 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112547

--- Comment #2 from Haochen Jiang  ---
It is weird since I did not touch the tuning.

Need a bisect to check that but I do not have a zen4 machine.

Could you try with this commit g:459866eaeec151e72aecd670695f014f4ec48588 to
see if the regression still occurs?

If that still occurs, a guess might be zmm vectorization is not enabled
correctly under some scenario.

[Bug target/112435] [14 regression] GCC generates assembly which gas rejects with AVX when building ncnn (Error: unsupported instruction `vblendps') since r14-96-gc2dac2e5fbbcdd

2023-11-07 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112435

--- Comment #12 from Haochen Jiang  ---
It seems we should prevent the optimization in that commit from applying to
registers x/ymm16+.

[Bug target/111907] ICE: in curr_insn_transform, at lra-constraints.cc:4294 unable to generate reloads for: {*andnottf3} with -mavx512f -mno-evex512

2023-11-07 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111907

--- Comment #8 from Haochen Jiang  ---
Should be fixed on trunk now.

[Bug target/112374] [14 Regression] `--with-arch=skylake-avx512 --with-cpu=skylake-avx512` causes a comparison failure

2023-11-07 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112374

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #5 from Haochen Jiang  ---
It seems that the failure is still not fixed after the latest patch in PR112361.

https://gcc.gnu.org/pipermail/gcc-regression/2023-November/078446.html

[Bug target/111907] ICE: in curr_insn_transform, at lra-constraints.cc:4294 unable to generate reloads for: {*andnottf3} with -mavx512f -mno-evex512

2023-11-06 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111907

--- Comment #6 from Haochen Jiang  ---
Proposed patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635410.html

[Bug target/111889] [14 Regression] 128/256 intrins could not be used with only specifying "no-evex512, avx512vl" in function attribute

2023-11-05 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889

Haochen Jiang  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Haochen Jiang  ---
Should be fixed after patches

[Bug target/111907] ICE: in curr_insn_transform, at lra-constraints.cc:4294 unable to generate reloads for: {*andnottf3} with -mavx512f -mno-evex512

2023-11-05 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111907

--- Comment #5 from Haochen Jiang  ---
BTW, it should be disabled since it would previously use zmm.

foo(_Float128, _Float128):
        push    rbp
        mov     rbp, rsp
        vmovdqa XMMWORD PTR [rbp-16], xmm0
        vmovdqa XMMWORD PTR [rbp-32], xmm1
        vmovdqa xmm1, XMMWORD PTR [rbp-16]
        vmovdqa xmm2, XMMWORD PTR [rbp-32]
        vmovdqa xmm0, XMMWORD PTR .LC0[rip]
        vpandnq zmm1, zmm0, zmm1
        vpand   xmm0, xmm0, xmm2
        vpor    xmm0, xmm1, xmm0
        pop     rbp
        ret

A straightforward solution might be to use the xmm version here.

[Bug target/111907] ICE: in curr_insn_transform, at lra-constraints.cc:4294 unable to generate reloads for: {*andnottf3} with -mavx512f -mno-evex512

2023-11-05 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111907

--- Comment #4 from Haochen Jiang  ---
I guess it is caused by "*andnot<mode>3", not confirmed yet.
The isa for the last constraint changed to avx512f_512, which disables the
pattern under -mavx512f -mno-evex512.
Let me find a solution for that.

[Bug target/111889] [14 Regression] 128/256 intrins could not be used with only specifying "no-evex512, avx512vl" in function attribute

2023-10-23 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889

--- Comment #5 from Haochen Jiang  ---
It is actually a legacy issue from this:

$ cat 2.c
#include <immintrin.h>


__attribute__ ((target ("no-avx2")))
void foo ()
{
return _mm_empty ();
}

$ x86_64-pc-linux-gnu-gcc -O2 -mavx512f 2.c

It will also fail.

The main cause is that the caller's target is higher than the callee's.

Previously this did not cause problems, since we considered the behavior
reasonable and nobody would write code in such a pattern.

But since we will introduce avx10.x-256/512 options and function attributes in
the near future, the problem might become visible.

[Bug target/111772] ICE on gfortran.dg/transpose_conjg_1.f90 in regrename.cc

2023-10-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111772

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #1 from Haochen Jiang  ---
Should also be fixed on trunk now that PR111753 is fixed.

[Bug target/111753] [14 Regression] ICE: in extract_constrain_insn, at recog.cc:2692 insn does not satisfy its constraints: {*movsf_internal} with -O2 -mavx512bw -fno-tree-ter starting with r14-4499

2023-10-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111753

--- Comment #6 from Haochen Jiang  ---
Fixed on trunk.

[Bug target/111889] [14 Regression] 128/256 intrins could not be used with only specifying "no-evex512, avx512vl" in function attribute

2023-10-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889

--- Comment #3 from Haochen Jiang  ---
My proposal for this problem is to also push "no-evex512" when defining 128/256
intrins. But I am not sure if there will be some potential problems.

Currently working on an experiment on that.

[Bug target/111889] [14 Regression] 128/256 intrins could not be used with only specifying "no-evex512, avx512vl" in function attribute

2023-10-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889

--- Comment #2 from Haochen Jiang  ---
Here is the Godbolt example of that:

https://godbolt.org/z/b3n8h4rb1

[Bug target/111889] [14 Regression] 128/256 intrins could not be used with only specifying "no-evex512, avx512vl" in function attribute

2023-10-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889

--- Comment #1 from Haochen Jiang  ---
(In reply to Haochen Jiang from comment #0)
> Created attachment 56155 [details]
> Simple testcase
> 
> With this simple testcase and command like this:
> 
> x86_64-pc-linux-gnu-gcc -O2 -march=x86-64 1.c
> 
> We will finally get:
> 
> error: inlining failed in call to ‘always_inline’ ‘_mm256_mask_mov_pd’:
> target specific option mismatch
> 
> But if we use the command like this:
> 
> x86_64-pc-linux-gnu-gcc -O2 -march=x86-64 -mno-evex512 1.c

Oops, I missed a sentence here. I mean there is no issue if we specify
-mno-evex512 on the command line.

> 
> It seems that the default handle for evex512 with avx512 will finally let
> the compiler wrongly suppose that 128/256 bit intrins need evex512 feature
> when co-operating with function attribute, but actually it does not.

[Bug target/111889] New: [14 Regression] 128/256 intrins could not be used with only specifying "no-evex512, avx512vl" in function attribute

2023-10-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111889

Bug ID: 111889
   Summary: [14 Regression] 128/256 intrins could not be used with
only specifying "no-evex512, avx512vl" in function
attribute
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

Created attachment 56155
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56155&action=edit
Simple testcase

With this simple testcase and command like this:

x86_64-pc-linux-gnu-gcc -O2 -march=x86-64 1.c

We will finally get:

error: inlining failed in call to ‘always_inline’ ‘_mm256_mask_mov_pd’: target
specific option mismatch

But if we use the command like this:

x86_64-pc-linux-gnu-gcc -O2 -march=x86-64 -mno-evex512 1.c

It seems that the default handling of evex512 with avx512 eventually makes the
compiler wrongly assume that 128/256-bit intrins need the evex512 feature when
combined with a function attribute, but they actually do not.

[Bug target/111753] [14 Regression] ICE: in extract_constrain_insn, at recog.cc:2692 insn does not satisfy its constraints: {*movsf_internal} with -O2 -mavx512bw -fno-tree-ter starting with r14-4499

2023-10-20 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111753

--- Comment #4 from Haochen Jiang  ---
Proposed patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633677.html

[Bug target/111753] [14 Regression] ICE: in extract_constrain_insn, at recog.cc:2692 insn does not satisfy its constraints: {*movsf_internal} with -O2 -mavx512bw -fno-tree-ter starting with r14-4499

2023-10-18 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111753

--- Comment #3 from Haochen Jiang  ---
It seems to be caused by my change to the behavior when trying to use x/ymm16+
without avx512vl specified.

Working on a solution for that.
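
As a rough illustration of the constraint involved, here is a simplified model in C. It is not GCC code: the function name and parameters are made up for this sketch, it only covers packed 128/256/512-bit operations, it assumes AVX is available for ymm0-15, and it ignores scalar and move special cases.

```c
#include <stdbool.h>

/* Simplified model, not GCC code: can SSE register number REGNO be used
   for a packed operation of WIDTH_BITS with the given ISA features?  */
bool
toy_sse_reg_ok (int regno, int width_bits, bool avx512f, bool avx512vl)
{
  if (regno < 0 || regno > 31)
    return false;
  if (width_bits == 512)
    return avx512f;             /* zmm registers always need AVX512F */
  if (regno < 16)
    return true;                /* xmm0-15 / ymm0-15: legacy SSE or VEX */
  /* xmm16-31 / ymm16-31 are only reachable with an EVEX encoding, and
     128/256-bit packed EVEX operations need AVX512VL on top of AVX512F.  */
  return avx512f && avx512vl;
}
```

Under this model, a 128-bit operation on register 16 with AVX512F but without AVX512VL is rejected, which is the kind of case discussed here.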

[Bug target/111051] [14 Regression] highway-1.0.6 fails to build as gcc-14.0.0/lib/gcc/x86_64-unknown-linux-gnu/14.0.0/include/avxintrin.h:1238:1: error: inlining failed in call to 'always_inline' '__

2023-08-18 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111051

--- Comment #3 from Haochen Jiang  ---
See patch:

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627829.html

[Bug target/111051] [14 Regression] highway-1.0.6 fails to build as gcc-14.0.0/lib/gcc/x86_64-unknown-linux-gnu/14.0.0/include/avxintrin.h:1238:1: error: inlining failed in call to 'always_inline' '__

2023-08-17 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111051

--- Comment #2 from Haochen Jiang  ---
It is caused by the removal of the pragma: when immintrin.h is included, there
is no AVX support, which makes _mm256_setzero_pd invisible.

Adding an AVX2 pragma instead of removing the pragma should solve the problem.

I am working on a patch for that.

[Bug target/110083] New: [14 Regression] ICEs for testcase on fp-int-convert*timode after r14-1466-g3635e8c67e1

2023-06-01 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110083

Bug ID: 110083
   Summary: [14 Regression] ICEs for testcase on
fp-int-convert*timode after r14-1466-g3635e8c67e1
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

We currently get the following testcase regressions:

https://gcc.gnu.org/pipermail/gcc-regression/2023-June/077808.html

I checked my bisect script, at least for fp-int-convert-timode.c, it points to
commit r14-1466-g3635e8c67e1 when using -march=cascadelake.

Actually, it will cause ICEs with -msse4 and any higher ISA set.

I reproduce it by:

/export/users/haochenj/env/build_no_bootstrap_master/gcc/xgcc
-B/export/users/haochenj/env/build_no_bootstrap_master/gcc/
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode.c
-m64 -msse4 -fdiagnostics-plain-output -O2 -lm -o ./fp-int-convert-timode.exe

with backtrace:

during RTL pass: cse2
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode.c:
In function ‘main’:
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.dg/torture/fp-int-convert-timode.c:22:1:
internal compiler error: in as_a, at machmode.h:381
0xcc405f scalar_mode as_a(machine_mode)
/export/users/haochenj/src/gcc/master/./gcc/machmode.h:381
0xf9eb33 wi::int_traits
>::get_precision(std::pair const&)
/export/users/haochenj/src/gcc/master/./gcc/rtl.h:2282
0xfb7fce unsigned int wi::get_precision
>(std::pair const&)
/export/users/haochenj/src/gcc/master/./gcc/wide-int.h:1795
0xfb21aa wide_int_ref_storage::wide_int_ref_storage
>(std::pair const&)
/export/users/haochenj/src/gcc/master/./gcc/wide-int.h:1029
0xfa7c7c generic_wide_int
>::generic_wide_int >(std::pair const&)
/export/users/haochenj/src/gcc/master/./gcc/wide-int.h:787
0x103f74b poly_int<1u, generic_wide_int >
>::poly_int >(std::pair const&)
/export/users/haochenj/src/gcc/master/./gcc/poly-int.h:670
0x103ef15 wi::to_poly_wide(rtx_def const*, machine_mode)
/export/users/haochenj/src/gcc/master/./gcc/rtl.h:2382
0x1558c20 neg_poly_int_rtx
/export/users/haochenj/src/gcc/master/./gcc/simplify-rtx.cc:57
0x156521c simplify_context::simplify_binary_operation_1(rtx_code, machine_mode,
rtx_def*, rtx_def*, rtx_def*, rtx_def*)
/export/users/haochenj/src/gcc/master/./gcc/simplify-rtx.cc:3171
0x1562b50 simplify_context::simplify_binary_operation(rtx_code, machine_mode,
rtx_def*, rtx_def*)
/export/users/haochenj/src/gcc/master/./gcc/simplify-rtx.cc:2641
0xfcbd31 simplify_binary_operation(rtx_code, machine_mode, rtx_def*, rtx_def*)
/export/users/haochenj/src/gcc/master/./gcc/rtl.h:3485
0x15730be simplify_const_relational_operation(rtx_code, machine_mode, rtx_def*,
rtx_def*)
/export/users/haochenj/src/gcc/master/./gcc/simplify-rtx.cc:6173
0x1571a01 simplify_context::simplify_relational_operation(rtx_code,
machine_mode, machine_mode, rtx_def*, rtx_def*)
/export/users/haochenj/src/gcc/master/./gcc/simplify-rtx.cc:5759
0xf0607d simplify_relational_operation(rtx_code, machine_mode, machine_mode,
rtx_def*, rtx_def*)
/export/users/haochenj/src/gcc/master/./gcc/rtl.h:3500
0x29418dd fold_rtx
/export/users/haochenj/src/gcc/master/./gcc/cse.cc:3487
0x2940e75 fold_rtx
/export/users/haochenj/src/gcc/master/./gcc/cse.cc:3227
0x29443e9 cse_insn
/export/users/haochenj/src/gcc/master/./gcc/cse.cc:4667
0x29498ce cse_extended_basic_block
/export/users/haochenj/src/gcc/master/./gcc/cse.cc:6566
0x2949ddd cse_main
/export/users/haochenj/src/gcc/master/./gcc/cse.cc:6711
0x294c0c6 rest_of_handle_cse2
/export/users/haochenj/src/gcc/master/./gcc/cse.cc:7609

[Bug target/109807] [14 Regression] sse2-mmx-pmaddwd.c met ICE after commit r14-666-g608e7f3ab47 with march=cascadelake

2023-05-11 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109807

--- Comment #4 from Haochen Jiang  ---
(In reply to Uroš Bizjak from comment #2)
> (In reply to Haochen Jiang from comment #1)
> > I further checked the reason, V2SI should never dropped into that function
> > because we have no pattern under V2SI.
> > 
> > I suppose it is because -march=cascadelake will open SSE4.1, with the new
> > pattern, it wrongly dropped into that.
> > 
> > -m32 will not ICE since TARGET_MMX_WITH_SSE need 64 bit and won't enable the
> > new pattern.
> 
> V2SI mul was introduced in r14-493 (AKA partial fix for PR109690).

I see. So we might need to add a cost for that, right?

[Bug target/109807] [14 Regression] sse2-mmx-pmaddwd.c met ICE after commit r14-666-g608e7f3ab47 with march=cascadelake

2023-05-11 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109807

--- Comment #1 from Haochen Jiang  ---
I further checked the reason: V2SI should never reach that function because
we have no pattern for V2SI.

I suppose it is because -march=cascadelake enables SSE4.1, and with the new
pattern V2SI wrongly reaches that function.

-m32 will not ICE since TARGET_MMX_WITH_SSE needs 64-bit and won't enable the
new pattern.

[Bug target/109807] New: [14 Regression] sse2-mmx-pmaddwd.c met ICE after commit gcc-14-666-g608e7f3ab47 with march=cascadelake

2023-05-11 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109807

Bug ID: 109807
   Summary: [14 Regression] sse2-mmx-pmaddwd.c met ICE after
commit gcc-14-666-g608e7f3ab47 with march=cascadelake
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

After that commit, compiling the testcase with this command:

/export/users/haochenj/env/build_no_bootstrap_master/gcc/xgcc
-B/export/users/haochenj/env/build_no_bootstrap_master/gcc/
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.target/i386/sse2-mmx-pmaddwd.c
-m64 -march=cascadelake -fdiagnostics-plain-output -O2 -fno-strict-aliasing
-msse2 -mno-mmx -lm -o ./sse2-mmx-pmaddwd.exe

produces an ICE:

In file included from
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.target/i386/sse2-mmx-pmaddwd.c:5:
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.target/i386/sse2-check.h:
In function ‘do_test’:
/export/users/haochenj/src/gcc/master/gcc/testsuite/gcc.target/i386/sse2-check.h:10:1:
internal compiler error: in ix86_widen_mult_cost, at config/i386/i386.cc:20442
0x1a57e23 ix86_widen_mult_cost
/export/users/haochenj/src/gcc/master/./gcc/config/i386/i386.cc:20442
0x1a62835 ix86_vector_costs::add_stmt_cost(int, vect_cost_for_stmt,
_stmt_vec_info*, _slp_tree*, tree_node*, int, vect_cost_model_location)
/export/users/haochenj/src/gcc/master/./gcc/config/i386/i386.cc:23479
0x18dc317 add_stmt_cost(vector_costs*, int, vect_cost_for_stmt,
_stmt_vec_info*, _slp_tree*, tree_node*, int, vect_cost_model_location)
/export/users/haochenj/src/gcc/master/./gcc/tree-vectorizer.h:1779
0x190fa68 add_stmt_cost(vector_costs*, stmt_info_for_cost*)
/export/users/haochenj/src/gcc/master/./gcc/tree-vectorizer.h:1801
0x190567f vect_bb_vectorization_profitable_p
/export/users/haochenj/src/gcc/master/./gcc/tree-vect-slp.cc:6928
0x1907bec vect_slp_region
/export/users/haochenj/src/gcc/master/./gcc/tree-vect-slp.cc:7441
0x19087dc vect_slp_bbs
/export/users/haochenj/src/gcc/master/./gcc/tree-vect-slp.cc:7611
0x1908d6c vect_slp_function(function*)
/export/users/haochenj/src/gcc/master/./gcc/tree-vect-slp.cc:7712
0x192d9b5 execute
/export/users/haochenj/src/gcc/master/./gcc/tree-vectorizer.cc:1529
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

I took a quick look, it seems that V2SImode is not handled in function
ix86_widen_mult_cost.
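
To illustrate what "not handled" means here, the following is a toy model in C, not the code from i386.cc. The enum, the function names and the cost numbers are all made up for this sketch; it only assumes that the real function dispatches on the vector mode and presumably asserts on a mode it does not know, which is what the ICE suggests.

```c
/* Toy model only: the enum, the function names and the cost numbers are
   made up for illustration and are not taken from i386.cc.  */
enum toy_mode { TOY_V8HI, TOY_V4SI, TOY_V2SI };

#define TOY_UNHANDLED (-1)  /* stands in for the internal compiler error */

/* Cost lookup that only knows the 128-bit modes.  */
int
toy_widen_mult_cost_old (enum toy_mode mode)
{
  switch (mode)
    {
    case TOY_V8HI: return 2;
    case TOY_V4SI: return 3;
    default:       return TOY_UNHANDLED;  /* V2SI ends up here */
    }
}

/* Same lookup with an explicit entry for the 64-bit V2SI mode.  */
int
toy_widen_mult_cost_new (enum toy_mode mode)
{
  switch (mode)
    {
    case TOY_V8HI: return 2;
    case TOY_V2SI:
    case TOY_V4SI: return 3;
    default:       return TOY_UNHANDLED;
    }
}
```

Whether the right fix is to add a cost entry like this or to keep V2SI from reaching the function at all is a separate question that this sketch does not answer.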

[Bug testsuite/109596] New: [14 Regression] Lots of testcases fails on x86_64 after r14-162-gcda246f8b421ba

2023-04-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109596

Bug ID: 109596
   Summary: [14 Regression] Lots of testcases fails on x86_64
after r14-162-gcda246f8b421ba
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

Currently, we can see all the following testcases fail on x86_64:

https://gcc.gnu.org/pipermail/gcc-regression/2023-April/077654.html

From my bisect script, I can confirm that at least for pr90074 and pr90716, the
failure is caused by commit r14-162-gcda246f8b421ba.

It seems that we might need to modify those testcases.

Reproduce by:

make check RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr90716.c
--target_board='unix{-m64\ -march=cascadelake,-m32\
-march=cascadelake,-m32,-m64}'"

[Bug target/109549] New: [14 Regression] cmov6.c test fail after commit r14-53-g675b1a7f113adb1d737adaf78b4fd90be7a0ed1a

2023-04-19 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109549

Bug ID: 109549
   Summary: [14 Regression] cmov6.c test fail after commit
r14-53-g675b1a7f113adb1d737adaf78b4fd90be7a0ed1a
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

After commit r14-53-g675b1a7f113adb1d737adaf78b4fd90be7a0ed1a with the fix of
removing @gol fix, we got a scan asm fail for gcc.target/i386/cmov6.c.

We can reproduce that by command:

make check-gcc RUNTESTFLAGS="i386.exp=cmov6.c --target_board='unix{-m32,}'"

The previous codegen is:

foo:
.LFB0:
        .cfi_startproc
        movl    4(%esp), %eax
        movl    $20, %edx
        testl   %eax, %eax
        movl    $10, %eax
        cmovne  8(%esp), %edx
        cmove   12(%esp), %eax
        movl    %edx, 8(%esp)
        movl    %eax, 4(%esp)
        jmp     bar
        .cfi_endproc

The current codegen is:

foo:
.LFB0:
        .cfi_startproc
        movl    4(%esp), %ecx
        movl    8(%esp), %edx
        movl    12(%esp), %eax
        testl   %ecx, %ecx
        je      .L3
        movl    $10, %eax
        movl    %edx, 8(%esp)
        movl    %eax, 4(%esp)
        jmp     bar
        .p2align 4,,7
        .p2align 3
.L3:
        movl    $20, %edx
        movl    %eax, 4(%esp)
        movl    %edx, 8(%esp)
        jmp     bar
        .cfi_endproc
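
For readers who do not have the test at hand, both listings correspond to C code of roughly the following shape. This is reconstructed from the assembly above, so the actual source of cmov6.c may differ; the body of bar is invented here only so that the sketch links and can be run.

```c
/* Reconstructed from the assembly listings; the real cmov6.c may differ.
   bar is given a made-up body only so the sketch is self-contained.  */
int
bar (int a, int b)
{
  return a * 100 + b;
}

int
foo (int c, int d, int e)
{
  int a, b;

  if (c)
    {
      a = 10;   /* constant for the first argument */
      b = d;    /* second argument comes from memory */
    }
  else
    {
      a = e;    /* first argument comes from memory */
      b = 20;   /* constant for the second argument */
    }
  return bar (a, b);   /* tail call, matching "jmp bar" */
}
```

The old codegen selects both values with a cmovne/cmove pair after a single test, while the new codegen branches on c and duplicates the tail call.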

BTW, I saw Andrew's patch fixing cmov:

https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616088.html

Is that related?

[Bug middle-end/109118] New: [13 Regression] gcc.dg/mla_1.c failed on target w/o __Uint32x4_t support

2023-03-13 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109118

Bug ID: 109118
   Summary: [13 Regression] gcc.dg/mla_1.c failed on target w/o
__Uint32x4_t support
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

For targets without __Uint32x4_t support, like i386, running the testsuite
gives an error like this:

error: unknown type name '__Uint32x4_t'; did you mean '__uint128_t'

We can reproduce by:

make check RUNTESTFLAGS="dg.exp=gcc.dg/mla_1.c --target_board='unix{-m64\
-march=cascadelake,-m32\ -march=cascadelake,-m32,-m64}'"

It should be a simple fix. But I am not sure whether the testcase aims to test
the middle-end or is aarch64-specific.

If we want to test the middle-end, we should not use the type __Uint32x4_t in
the testcase.

If it is just an aarch64-specific test, I suppose we can move the target
selector aarch64*-*-* to dg-do compile so that it is skipped for other backends.
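
For the middle-end option, a minimal sketch of a target-independent replacement type, using GCC's generic vector extension. The typedef name u32x4 and the function below are made up for illustration and are not the content of mla_1.c; the multiply-accumulate body is only a guess based on the test name.

```c
/* Generic GCC vector type: four 32-bit unsigned lanes, 16 bytes in total.
   The name u32x4 is made up for this sketch.  */
typedef unsigned int u32x4 __attribute__ ((vector_size (16)));

/* Lane-wise multiply-accumulate: acc + a * b.  */
u32x4
mla (u32x4 acc, u32x4 a, u32x4 b)
{
  return acc + a * b;
}
```

A type declared this way does not depend on any aarch64-specific type name, so the "unknown type name" error above would not occur on i386.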

Christina, what is your opinion?

[Bug testsuite/108898] [13 Regression] Test introduced by r13-6278-g3da77f217c8b2089ecba3eb201e727c3fcdcd19d failed on i386

2023-02-23 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108898

--- Comment #2 from Haochen Jiang  ---
(In reply to Andrew Stubbs from comment #1)
> I tested it on i686-pc-linux-gnu before I posted the patch, and it was
> working then. Can you be more specific what configuration you were testing,
> please?

For fail
FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2

We can reproduce by:

$ cd {build_dir}/gcc && make check
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18f.c
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18e.c
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check
RUNTESTFLAGS="vect.exp=gcc.dg/vect/vect-simd-clone-18e.c
--target_board='unix{-m32\ -march=cascadelake}'"

[Bug testsuite/108899] [13 Regression] ERROR: can't rename to "saved-unsupported": command already exists on i386

2023-02-23 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108899

--- Comment #9 from Haochen Jiang  ---
(In reply to Jakub Jelinek from comment #8)
> Should be fixed now.

Sorry for the late reply.

Yes, it is fixed for me now. Thx a lot!

[Bug testsuite/108899] New: [13 Regression] ERROR: can't rename to "saved-unsupported": command already exists on i386

2023-02-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108899

Bug ID: 108899
   Summary: [13 Regression] ERROR: can't rename to
"saved-unsupported": command already exists on i386
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

We currently got these errors on i386 when running tests.

ERROR: can't rename to "saved-unsupported": command already exists
ERROR: in testcase
/export/gnu/import/git/gcc-test-master-intel64-native/src-master/gcc/testsuite/g++.dg/modules/modules.exp

I checked the trunk and I suppose it is probably caused by
r13-6288-g5344482c4d3ae0618fa8f5ed38f8309db43fdb82.

This commit changed modules.exp and renamed unsupported to saved-unsupported.

Feel free to correct me if I am wrong.

[Bug testsuite/108898] New: [13 Regression] Test introduced by r13-6278-g3da77f217c8b2089ecba3eb201e727c3fcdcd19d failed on i386

2023-02-22 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108898

Bug ID: 108898
   Summary: [13 Regression] Test introduced by
r13-6278-g3da77f217c8b2089ecba3eb201e727c3fcdcd19d
failed on i386
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

r13-6278-g3da77f217c8b2089ecba3eb201e727c3fcdcd19d introduced
gcc.dg/vect/vect-simd-clone-1{6,7,8}{,b,c,d,e,f}.c.

My bisect script showed it caused these FAIL:

FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18e.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2

And I suppose the other FAILs are also related to this commit:

FAIL: gcc.dg/vect/vect-simd-clone-16.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16e.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16e.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-16f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17e.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17e.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 3
FAIL: gcc.dg/vect/vect-simd-clone-17f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-17f.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2
FAIL: gcc.dg/vect/vect-simd-clone-18.c scan-tree-dump-times vect "[\\n\\r]
[^\\n]* = foo\\.simdclone" 2

[Bug libfortran/108056] [12/13 Regression] backward compatibility issue between 11 and 12

2022-12-15 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108056

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #15 from Haochen Jiang  ---
Hi Tobias,

My script shows that this commit causes the following testcase failures:
(It is still running and you might get an email from gcc-regression afterwards)

FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 19)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 19)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 19)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 25)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 25)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 25)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 31)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 31)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 31)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 34)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 34)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 34)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 37)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 37)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 37)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 40)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 40)
FAIL: libgomp.fortran/allocate-4.f90   -O   (test for errors, line 40)
FAIL: libgomp.fortran/allocate-4.f90   -O  (test for excess errors)
FAIL: libgomp.fortran/allocate-4.f90   -O  (test for excess errors)
FAIL: libgomp.fortran/allocate-4.f90   -O  (test for excess errors)

Apologies for not being able to debug that, since I am not familiar with
Fortran. Could you help to see why they fail, or whether we could just ignore
them?

[Bug fortran/107669] [13 Regression] commit r13-3931 causes lots of testcase failure

2022-11-15 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107669

Haochen Jiang  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Haochen Jiang  ---
Solved on trunk.

[Bug fortran/107669] New: [13 Regression] commit r13-3931-59a63247992eb13153b82c4902aadf111460eac2 causes lots of testcase failure

2022-11-13 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107669

Bug ID: 107669
   Summary: [13 Regression] commit
r13-3931-59a63247992eb13153b82c4902aadf111460eac2
causes lots of testcase failure
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

After commit r13-3931-59a63247992eb13153b82c4902aadf111460eac2, we got lots of
failures in the following libgomp.fortran testcases.

libgomp.fortran/is_device_ptr-2.f90
libgomp.fortran/optional-map.f90
libgomp.fortran/use_device_addr-1.f90
libgomp.fortran/use_device_addr-2.f90
libgomp.fortran/use_device_ptr-optional-2.f90
libgomp.fortran/use_device_ptr-optional-3.f90
libgomp.oacc-fortran/optional-data-copyin-by-value.f90

They all hit an ICE like this: internal compiler error: in
gfc_omp_check_optional_argument, at fortran/trans-openmp.cc:137.

You might also get an email sent by my script later.

[Bug target/106180] [13 Regression] ICE in extract_insn, at recog.cc:2791 since r13-1418-g73f942c08deef3

2022-07-06 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106180

--- Comment #9 from Haochen Jiang  ---
(In reply to Haochen Jiang from comment #8)
> Created attachment 53269 [details]
> This patch aims to handle memory issue when unpacking in cvtps2pd (version 2)
> 
> I have just finished fully testing this patch. It now uses adjust_address_nv
> to reduce function calls.

What I mean by fully tested is regtested on x86_64-pc-linux-gnu.

[Bug target/106180] [13 Regression] ICE in extract_insn, at recog.cc:2791 since r13-1418-g73f942c08deef3

2022-07-06 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106180

--- Comment #8 from Haochen Jiang  ---
Created attachment 53269
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53269&action=edit
This patch aims to handle memory issue when unpacking in cvtps2pd (version 2)

I have just finished fully testing this patch. It now uses adjust_address_nv
to reduce function calls.

[Bug target/106180] [13 Regression] ICE in extract_insn, at recog.cc:2791 since r13-1418-g73f942c08deef3

2022-07-06 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106180

--- Comment #7 from Haochen Jiang  ---
(In reply to Uroš Bizjak from comment #6)
> Comment on attachment 53261 [details]
> This patch aims to handle memory issue when unpacking in cvtps2pd
> 
> >@@ -9270,7 +9270,15 @@
> >   (vec_select:V2SF
> > (match_operand:V4SF 1 "vector_operand")
> > (parallel [(const_int 0) (const_int 1)]]
> >-  "TARGET_SSE2")
> >+  "TARGET_SSE2"
> >+{
> >+  if (MEM_P (operands[1]))
> >+{
> >+  operands[1] = gen_lowpart (V2SFmode, operands[1]);
> >+  emit_insn (gen_sse2_cvtps2pd_1 (operands[0], operands[1]));
> >+  DONE;
> >+}
> >+})
> 
> Does adjust_address_nv work here instead of gen_lowpart?
> 
> Uros.

I just did a quick test of that, and it seems to work.

I will send out the patch once it is fully tested.
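
For readers following the hunk quoted above, here is a minimal scalar sketch of what the low-half unpack computes. The function name unpack_lo_v4sf_model is hypothetical and this is not GCC code; it only models the semantics. The memory form of cvtps2pd reads 64 bits, i.e. two floats, which is presumably why the patch narrows a memory operand to V2SFmode instead of leaving it as a full V4SF access.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model: widen elements 0 and 1 of a float vector
   to double.  Only src[0] and src[1] (8 bytes) are read, mirroring a
   cvtps2pd whose source is a 64-bit memory operand.  */
static void
unpack_lo_v4sf_model (const float *src, double dst[2])
{
  assert (src != NULL && dst != NULL);
  dst[0] = (double) src[0];
  dst[1] = (double) src[1];
}
```

Because the model touches only the first two elements, it is safe to call on a buffer that holds just two floats, which is the property the narrowed memory operand expresses.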

[Bug target/106180] [13 Regression] ICE in extract_insn, at recog.cc:2791 since r13-1418-g73f942c08deef3

2022-07-05 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106180

--- Comment #4 from Haochen Jiang  ---
Created attachment 53261
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53261&action=edit
This patch aims to handle memory issue when unpacking in cvtps2pd

I am trying to fix this ICE with this patch, which I regtested on
x86_64-pc-linux-gnu.
I believe it is logically correct, although it is a little more complicated
than the previous pattern.
BTW, I plan to fix all similar patterns if this patch is OK.

[Bug target/43618] Incorrect sse2_cvtX2Y pattern

2022-07-04 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43618

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #3 from Haochen Jiang  ---
First, fix the incorrect corresponding insn in the machine description.
This makes the result correct when loop = 2.
For loop = 8, a further fix is needed in vectorization.

[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest

2022-05-12 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

--- Comment #8 from Haochen Jiang  ---
Fixed for GCC 13.

[Bug target/104371] [x86] Failure to use optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest

2022-03-31 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104371

Haochen Jiang  changed:

   What|Removed |Added

 CC||haochen.jiang at intel dot com

--- Comment #6 from Haochen Jiang  ---
Created attachment 52723
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52723&action=edit
This patch aims to optimize pxor+pcmpeqb+pmovmskb+cmp 0xFFFF pattern to ptest

I fixed that with this patch. Regtested on x86_64-pc-linux-gnu.

The patch is currently on hold for Stage 1 of GCC 13.

If this is OK, could you help me mark this bug as blocking PR105073? Thx.
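
As a rough illustration of why the two instruction sequences are interchangeable, here is a portable byte-level model. It assumes that the pxor in the title only supplies the zero vector that pcmpeqb compares against; that reading, the vec128 type, and both function names are assumptions for illustration, not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector as 16 bytes.  */
typedef struct { uint8_t b[16]; } vec128;

/* Models pcmpeqb against zero, pmovmskb, then cmp 0xFFFF:
   mask bit i is set when byte i equals zero, and the vector is
   all-zero exactly when all 16 mask bits are set.  */
static int
is_zero_via_movmsk (vec128 x)
{
  unsigned mask = 0;
  for (int i = 0; i < 16; i++)
    if (x.b[i] == 0)
      mask |= 1u << i;
  return mask == 0xFFFF;
}

/* Models ptest x, x: ZF is set when (x AND x) is all zeros.  */
static int
is_zero_via_ptest (vec128 x)
{
  for (int i = 0; i < 16; i++)
    if ((x.b[i] & x.b[i]) != 0)
      return 0;
  return 1;
}
```

Both functions answer the same question, whether every byte is zero, so a single ptest can stand in for the longer sequence under the stated assumption.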

[Bug fortran/102826] Glibc "--disable-mathvec" configure option fail to disable traces to libmvec

2021-10-18 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102826

--- Comment #3 from haochen.jiang at intel dot com ---
(In reply to Andrew Pinski from comment #2)
> math-vector-fortran.h comes from glibc so this is a glibc bug and not a GCC
> bug.
> installed header files from glibc should match --disable-mathvec .

From my perspective, there may be a bug in glibc.

However, the gfortran configuration in GCC should not unconditionally add the
finclude part for F951; at the least, there should be an option to control
it. I believe this is also a bug.

[Bug fortran/102826] New: Glibc "--disable-mathvec" configure option fail to disable traces to libmvec

2021-10-18 Thread haochen.jiang at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102826

Bug ID: 102826
   Summary: Glibc "--disable-mathvec" configure option fail to
disable traces to libmvec
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: haochen.jiang at intel dot com
  Target Milestone: ---

When I build Glibc with "--disable-mathvec" and run through this simple
testcase: 

program test_overloaded_intrinsic
  real(4) :: x4(3200), y4(3200)
  real(8) :: x8(3200), y8(3200)

  y4 = sin(x4)
  print *, y4
end

It reports an error: undefined reference to `_ZGVeN8v_sin'

The link is looking for libmvec functions, but because libmvec is disabled,
libmvec.so is never generated.
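
To make the undefined symbol easier to read, here is a small sketch that decodes names of the form _ZGV<isa><mask><lanes><params>_<name> used by the x86 vector function ABI. The struct and function names are hypothetical, and the parser only handles simple cases such as the one in this report.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoded form of a vector-ABI symbol name.  */
struct vec_abi_name
{
  char isa;                 /* 'b' SSE, 'c' AVX, 'd' AVX2, 'e' AVX-512 */
  int masked;               /* 1 for 'M' (masked), 0 for 'N' (unmasked) */
  int lanes;                /* number of vector lanes */
  const char *scalar_name;  /* points into the input string */
};

/* Returns 0 on success, -1 if SYM does not look like
   _ZGV<isa><mask><lanes><params>_<name>.  */
static int
decode_vec_abi_name (const char *sym, struct vec_abi_name *out)
{
  if (strncmp (sym, "_ZGV", 4) != 0)
    return -1;
  const char *p = sym + 4;
  if (*p == '\0' || strchr ("bcde", *p) == NULL)
    return -1;
  out->isa = *p++;
  if (*p != 'N' && *p != 'M')
    return -1;
  out->masked = (*p++ == 'M');
  if (!isdigit ((unsigned char) *p))
    return -1;
  char *end;
  out->lanes = (int) strtol (p, &end, 10);
  /* Skip the parameter descriptors up to the separating underscore.  */
  const char *us = strchr (end, '_');
  if (us == NULL || us[1] == '\0')
    return -1;
  out->scalar_name = us + 1;
  return 0;
}
```

Applied to `_ZGVeN8v_sin`, this yields ISA 'e' (AVX-512), unmasked, 8 lanes, scalar function sin, i.e. a vectorized sin variant of the kind libmvec provides.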

Looking through the GCC sources, in gcc/config/gnu-user.h at line 156, we have:

#undef TARGET_F951_OPTIONS
#define TARGET_F951_OPTIONS "%{!nostdinc:\
  %:fortran-preinclude-file(-fpre-include= math-vector-fortran.h finclude%s/)}"

This may be the cause of the error, because when I empty
math-vector-fortran.h, the testcase passes.