[Bug libfortran/114646] libgfortran still doesn't define GTHREAD_USE_WEAK to 0 for newer glibc

2024-04-09 Thread skpgkp2 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114646

--- Comment #17 from Sunil Pandey  ---
(In reply to H.J. Lu from comment #10)
> Created attachment 57906 [details]
> A patch
> 
> I am testing this.

This patch resolved my static testing issue.

[Bug target/88035] missing _mm512_reduce_round_pd() et al

2021-07-21 Thread skpgkp2 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88035

--- Comment #3 from Sunil Pandey  ---
I added _mm512_reduce_round_pd() and a bunch of other missing intrinsics last
year.


commit 93103603fd66a9fcf3ea2d8b52657e4b2496f544
Author: Sunil K Pandey 
Date:   Wed Oct 14 11:36:39 2020 -0700

x86: Add missing intrinsics [PR95483]

Tested on x86-64.

gcc/ChangeLog:


$ git grep mm512_reduce_round_pd
gcc/ChangeLog-2020: (_mm512_reduce_round_pd): Ditto.
gcc/config/i386/avx512dqintrin.h:_mm512_reduce_round_pd (__m512d __A, int __B, const int __R)
gcc/config/i386/avx512dqintrin.h:#define _mm512_reduce_round_pd(A, B, R) \
gcc/testsuite/gcc.target/i386/avx512dq-vreducepd-3.c:  xx1 = _mm512_reduce_round_pd(xx1, IMM, _MM_FROUND_NO_EXC);

$ git grep mm_*reduce_round
gcc/ChangeLog-2020: * config/i386/avx512dqintrin.h (_mm_reduce_round_sd): New intrinsics.
gcc/ChangeLog-2020: (_mm_reduce_round_ss): Ditto.
gcc/config/i386/avx512dqintrin.h:_mm_reduce_round_sd (__m128d __A, __m128d __B, int __C, const int __R)
gcc/config/i386/avx512dqintrin.h:_mm_reduce_round_ss (__m128 __A, __m128 __B, int __C, const int __R)
gcc/config/i386/avx512dqintrin.h:#define _mm_reduce_round_sd(A, B, C, R) \
gcc/config/i386/avx512dqintrin.h:#define _mm_reduce_round_ss(A, B, C, R) \
gcc/testsuite/gcc.target/i386/avx512dq-vreducesd-1.c:  xx1 = _mm_reduce_round_sd (xx1, xx2, IMM, _MM_FROUND_NO_EXC);
gcc/testsuite/gcc.target/i386/avx512dq-vreducesd-2.c:  res4.x = _mm_reduce_round_sd (s1.x, s2.x, IMM,_MM_FROUND_TO_NEAREST_INT
gcc/testsuite/gcc.target/i386/avx512dq-vreducess-1.c:  xx1 = _mm_reduce_round_ss (xx1, xx2, IMM, _MM_FROUND_NO_EXC);
gcc/testsuite/gcc.target/i386/avx512dq-vreducess-2.c:  res4.x = _mm_reduce_round_ss (s1.x, s2.x, IMM, _MM_FROUND_TO_NEAREST_INT

[Bug testsuite/101114] new test case libgomp.c/../libgomp.c-c++-common/struct-elem-5.c fails after its introduction in r12-1565

2021-06-22 Thread skpgkp2 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101114

Sunil Pandey  changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com

--- Comment #1 from Sunil Pandey  ---
Also saw the same issue on the x86 target.

On Linux/x86_64,

275c736e732d29934e4d22e8f030d5aae8c12a52 is the first bad commit
commit 275c736e732d29934e4d22e8f030d5aae8c12a52
Author: Chung-Lin Tang 
Date:   Thu Jun 17 21:33:32 2021 +0800

libgomp: Structure element mapping for OpenMP 5.0

caused

FAIL: libgomp.c/../libgomp.c-c++-common/struct-elem-5.c execution test


with GCC configured with

../../gcc/configure
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-1565/usr
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/struct-elem-5.c
--target_board='unix{-m32}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/struct-elem-5.c
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/struct-elem-5.c
--target_board='unix{-m64}'"
$ cd {build_dir}/x86_64-linux/libgomp/testsuite && make check
RUNTESTFLAGS="c++.exp=libgomp.c-c++-common/struct-elem-5.c
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me
at skpgkp2 at gmail dot com)

[Bug target/97054] New: [r10-3559 Regression] Runtime segfault with attached test code

2020-09-14 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97054

Bug ID: 97054
   Summary: [r10-3559 Regression] Runtime segfault with attached
test code
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: skpgkp2 at gmail dot com
CC: crazylht at gmail dot com, hjl.tools at gmail dot com
  Target Milestone: ---

Created attachment 49218
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49218&action=edit
reproducer test case.

Test case attached.

How to reproduce:

$g++ -fno-strict-aliasing -msse4.2 -mfpmath=sse  -gdwarf-2 -Wall
-Wwrite-strings -fPIC -Wformat-security -fstack-protector-strong -O2
-Wfatal-errors  -Wformat -Werror -Wundef  repro.cc && ./a.out
Segmentation fault (core dumped)

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /local/skpandey/gccwork/toolwork/a.out 

Program received signal SIGSEGV, Segmentation fault.
0x004011b0 in p2_ep_REBIND_IPC () at repro.cc:55
55  cur_pro->pc_RIP.i64 = code_lin_to_log(cur_pro,
int2linaddr(cur_pro, ipc));
(gdb) disass
Dump of assembler code for function p2_ep_REBIND_IPC():
   0x00401180 <+0>:     push   %r15
   0x00401182 <+2>:     push   %r12
   0x00401184 <+4>:     mov    %rbp,%r12
   0x00401187 <+7>:     mov    %r12,%rdi
   0x0040118a <+10>:    sub    $0x18,%rsp
   0x0040118e <+14>:    mov    $0x4040a0,%r15
   0x00401195 <+21>:    mov    0x10(%rbp),%rbp
   0x00401199 <+25>:    mov    (%r15),%rsi
   0x0040119c <+28>:    mov    %rbp,0x8(%rsp)
   0x004011a1 <+33>:    mov    %rsi,0x30(%r12)
   0x004011a6 <+38>:    mov    %rsi,0x8(%r12)
   0x004011ab <+43>:    callq  0x401150
=> 0x004011b0 <+48>:    movq   $0x0,0x10(%rbp)
   0x004011b8 <+56>:    mov    %rbp,%rdi
   0x004011bb <+59>:    callq  0x401160
   0x004011c0 <+64>:    mov    %rbp,%rdi
   0x004011c3 <+67>:    mov    0x8(%rsp),%rbp
   0x004011c8 <+72>:    mov    %rbp,%rsi
   0x004011cb <+75>:    callq  0x401170
   0x004011d0 <+80>:    addq   $0x4,(%r15)
   0x004011d4 <+84>:    xor    %edx,%edx
   0x004011d6 <+86>:    mov    %rax,0x30(%r12)
   0x004011db <+91>:    subl   $0x1,0x4(%rbp)
   0x004011df <+95>:    mov    0x4(%rbp),%eax
   0x004011e2 <+98>:    test   %eax,%eax
   0x004011e4 <+100>:   movsbl 0x0(%rbp),%eax
   0x004011e8 <+104>:   setle  %dl
   0x004011eb <+107>:   or     %eax,%edx
   0x004011ed <+109>:   jne    0x4011f5
   0x004011ef <+111>:   mov    (%r15),%rax
   0x004011f2 <+114>:   mov    (%rax),%r13d
   0x004011f5 <+117>:   add    $0x18,%rsp
   0x004011f9 <+121>:   xor    %eax,%eax
   0x004011fb <+123>:   pop    %r12
   0x004011fd <+125>:   pop    %r15
   0x004011ff <+127>:   retq
End of assembler dump.



Configured with: ../../gcc/configure
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r10-3559/usr
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld
--with-fpmath=sse --disable-libsanitizer --enable-languages=c,c++,fortran
--enable-cet --without-isl --enable-libmpx --disable-bootstrap

1bcb4c4faa4bd6b1c917c75b100d618faf9e628c is the first bad commit
commit 1bcb4c4faa4bd6b1c917c75b100d618faf9e628c
Author: Richard Sandiford 
Date:   Wed Oct 2 07:37:10 2019 +

[LRA] Don't make eliminable registers live (PR91957)

One effect of https://gcc.gnu.org/ml/gcc-patches/2019-09/msg00802.html
was to strengthen the sanity check in lra_assigns so that it checks
whether reg_renumber is consistent with the whole conflict set.
This duly tripped on csky for a pseudo that had been allocated
to the eliminated frame pointer.  (csky doesn't have a separate
hard frame pointer.)

lra-lives uses:

/* Set of hard regs (except eliminable ones) currently live.  */
static HARD_REG_SET hard_regs_live;

to track the set of live directly-referenced hard registers, and it
correctly implements the exclusion when setting up the initial set:

  hard_regs_live &= ~eliminable_regset;

But later calls to make_hard_regno_live and make_hard_regno_dead
would process eliminable registers like other registers, recording
conflicts for them and potentially making them live.  (Note that
after r266086, make_hard_regno_dead adds conflicts for registers
that are already marked dead.)  I think this would 

[Bug target/97018] [11 Regression] FAIL: gcc.target/i386/l_fma_float_1.c scan-assembler-times vfnmsub[123]+ss 32 on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-11 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97018

--- Comment #5 from Sunil Pandey  ---
(In reply to Richard Biener from comment #1)
> Do they PASS on the GCC 10 branch?

The GCC 10 branch has the same issue. The same patch should be applied to GCC 10 too.

[Bug regression/97018] New: [r11 Regression] FAIL: gcc.target/i386/l_fma_float_1.c scan-assembler-times vfnmsub[123]+ss 32 on Linux/x86_64 (-m64 -march=cascadelake)

2020-09-11 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97018

Bug ID: 97018
   Summary: [r11 Regression] FAIL: gcc.target/i386/l_fma_float_1.c
scan-assembler-times vfnmsub[123]+ss 32 on
Linux/x86_64 (-m64 -march=cascadelake)
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: regression
  Assignee: unassigned at gcc dot gnu.org
  Reporter: skpgkp2 at gmail dot com
CC: hjl.tools at gmail dot com
  Target Milestone: ---

The FMA test fails on x86 when compiled with -march=cascadelake.

Regression links:

https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073111.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073112.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073113.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073114.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073115.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073116.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073117.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073118.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073119.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073120.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073121.html
https://gcc.gnu.org/pipermail/gcc-regression/2020-August/073122.html

[Bug target/95237] LOCAL_DECL_ALIGNMENT shrinks alignment, FAIL gcc.target/i386/pr69454-2.c

2020-07-22 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95237

--- Comment #23 from Sunil Pandey  ---
(In reply to David Binderman from comment #20)
> This bug has prevented the successful compilation of the local
> Linux kernel for just over a month now.
> 
> If I can assist with any testing, please let me know.

Please look at bug report 96192, where I also posted a proposed patch.
It would be great if you could apply the patch to the latest GCC sources and
check whether the kernel builds and boots. If it still fails, posting a test
case would be helpful.

[Bug middle-end/96192] tree-inline.c(copy_decl_for_dup_finish) should preserve decl alignment in copy

2020-07-22 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96192

Sunil Pandey  changed:

   What|Removed |Added

 CC||skpgkp2 at gmail dot com

--- Comment #4 from Sunil Pandey  ---
Created attachment 48913
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48913&action=edit
This patch lowers the alignment of parm and result decls. It also preserves decl
alignment during inlining.

[Bug middle-end/96192] tree-inline.c(copy_decl_for_dup_finish) should preserve decl alignment in copy

2020-07-14 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96192

--- Comment #3 from Sunil Pandey  ---
(In reply to Richard Biener from comment #1)
> Hmm, but there's no local variable to copy here?  Are you refering to the
> result decl from b we materialize in c?  This would be the same case
> as for example switch conversion adding a 'long long' variable, so the
> issue would be more wide-spread as you think - for example IPA SRA might
> choose to pass an aggregate by its components thus with an aggregate with
> two long long members you should see similar issues.

Yes, the long long result decl. Its incoming alignment is 4 (lowered by the target
hook), but the copy gets the default alignment of 8.
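
The mismatch can be sketched in a few lines of C. This is only a toy model, assuming the check in adjust-alignment.c has the shape `gcc_assert (align >= DECL_ALIGN (var))` as shown in the PR95237 GDB dump. The names decl_model, hook_alignment, copy_with_default and copy_preserving are hypothetical and are not GCC functions. Alignments are in bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a decl; only its alignment (in bits) is modeled.  */
struct decl_model {
  unsigned align_bits;
};

/* Toy target hook: lowers a 64-bit alignment to 32 bits, as assumed for
   -m32 -mpreferred-stack-boundary=2.  Other alignments are returned as-is.  */
unsigned
hook_alignment (unsigned align_bits)
{
  assert (align_bits != 0);
  return align_bits == 64 ? 32 : align_bits;
}

/* A copy that ignores the original decl and takes the type's default.  */
struct decl_model
copy_with_default (unsigned type_default_bits)
{
  struct decl_model copy = { type_default_bits };
  return copy;
}

/* A copy that carries the original decl's alignment over.  */
struct decl_model
copy_preserving (struct decl_model orig)
{
  struct decl_model copy = { orig.align_bits };
  return copy;
}

/* Models the pass's check: the hook result must not be smaller than the
   alignment the decl already has.  */
bool
alignment_check_passes (struct decl_model d)
{
  return hook_alignment (d.align_bits) >= d.align_bits;
}
```

In this model a decl already lowered to 32 bits passes the check when its alignment is preserved, while a copy that falls back to the 64-bit default fails it, which matches the ICE reported here.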

[Bug middle-end/96192] New: tree-inline.c(copy_decl_for_dup_finish) should preserve decl alignment in copy

2020-07-13 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96192

Bug ID: 96192
   Summary: tree-inline.c(copy_decl_for_dup_finish) should
preserve decl alignment in copy
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: skpgkp2 at gmail dot com
CC: hjl.tools at gmail dot com
  Target Milestone: ---
Target: x86_64-*-* i?86-*-*

The function copy_decl_for_dup_finish in tree-inline.c should preserve local decl
alignment in the copy.

The issue appears when a local decl's alignment is lowered by the target hook:
during inlining, the copy may get a different alignment than the original decl.

This blocks PR95237.

Test case:

$ cat foo.c
int a;

long long 
b (void)
{
}

void
c (void)
{
  if (b())
    a = 1;
}

$gcc -m32 -mpreferred-stack-boundary=2 -Os -c foo.c
during GIMPLE pass: adjust_alignment
foo.c: In function ‘c’:
foo.c:12:1: internal compiler error: in execute, at adjust-alignment.c:74
   12 | c (void)
  | ^
0x20bc351 execute
/local/skpandey/gccwork/gccwork/pr95237_1/gcc/adjust-alignment.c:74
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug target/95885] New: LOCAL_DECL_ALIGNMENT macro documentation is incorrect.

2020-06-24 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95885

Bug ID: 95885
   Summary: LOCAL_DECL_ALIGNMENT macro documentation is incorrect.
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: skpgkp2 at gmail dot com
  Target Milestone: ---
Target: x86_64-*-* i?86-*-*

LOCAL_DECL_ALIGNMENT Macro documentation

@defmac LOCAL_DECL_ALIGNMENT (@var{decl})
If defined, a C expression to compute the alignment for a local
variable @var{decl}.

If this macro is not defined, then
@code{LOCAL_ALIGNMENT (TREE_TYPE (@var{decl}), DECL_ALIGN (@var{decl}))}
is used.

One use of this macro is to increase alignment of medium-size data to
make it all fit in fewer cache lines.

If the value of this macro has a type, it should be an unsigned type.
@end defmac

This macro not only increases alignment but can also decrease it (for example
with -m32 -mpreferred-stack-boundary=2), depending on conditions.
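
A rough sketch of a hook that moves alignment in both directions, assuming a much simplified model of the x86 behaviour. The struct names, the function local_decl_alignment_model and the exact conditions are illustrative assumptions, not the code in the i386 backend. Alignments and the stack boundary are in bits, so -mpreferred-stack-boundary=2 is modeled as 32.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a local variable.  */
struct local_info {
  unsigned type_align_bits;   /* natural alignment of the type */
  unsigned size_bytes;        /* size of the object */
  bool is_array;
};

/* Hypothetical description of the target configuration.  */
struct target_info {
  bool is_64bit;
  unsigned preferred_stack_boundary_bits;
};

unsigned
local_decl_alignment_model (struct local_info v, struct target_info t)
{
  assert (v.type_align_bits != 0);
  unsigned align = v.type_align_bits;

  /* Lowering case: a 64-bit aligned object on a 32-bit target whose
     preferred stack boundary is below 64 bits drops to 32 bits.  */
  if (!t.is_64bit && align == 64 && t.preferred_stack_boundary_bits < 64)
    return 32;

  /* Raising case (assumed for illustration): larger arrays on a 64-bit
     target are given 128-bit alignment.  */
  if (t.is_64bit && v.is_array && v.size_bytes >= 16 && align < 128)
    return 128;

  return align;
}
```

Under these assumptions a long long local gets 32 bits with -m32 -mpreferred-stack-boundary=2 and keeps 64 bits on a 64-bit target, so documentation that mentions only increases is incomplete.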

[Bug target/95237] LOCAL_DECL_ALIGNMENT shrinks alignment, FAIL gcc.target/i386/pr69454-2.c

2020-06-19 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95237

--- Comment #18 from Sunil Pandey  ---
Another test case, triggered with the -Os option.

$ cat foo.i
int a;
long long b() {}
int c() {
  if (b())
    a = 1;
}


$gcc -m32  -mpreferred-stack-boundary=2 -Os   -c  foo.i
during GIMPLE pass: adjust_alignment
foo.i: In function ‘c’:
foo.i:3:5: internal compiler error: in execute, at adjust-alignment.c:74
3 | int c() {
  | ^
0x2091411 execute
   
/local/skpandey/gccwork/pr95237/gitlab/gcc.orig/gcc/adjust-alignment.c:74
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug target/95237] LOCAL_DECL_ALIGNMENT shrinks alignment, FAIL gcc.target/i386/pr69454-2.c

2020-06-19 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95237

--- Comment #17 from Sunil Pandey  ---
$ cat foo.c
long long c(long long x) {}
int a() { long long b = c(b); }

$ gcc -m32 -mpreferred-stack-boundary=2 -c foo.c
during GIMPLE pass: adjust_alignment
foo.c: In function ‘a’:
foo.c:2:5: internal compiler error: in execute, at adjust-alignment.c:74
2 | int a() { long long b = c(b); }
  | ^
0x79d34f execute
   
/local/skpandey/gccwork/pr95237/gitlab/gcc.orig/gcc/adjust-alignment.c:74
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug target/95748] New: Long long function parameter should be aligned to 32 bit on x86.

2020-06-18 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95748

Bug ID: 95748
   Summary: Long long function parameter should be aligned to 32
bit on x86.
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: skpgkp2 at gmail dot com
CC: hjl.tools at gmail dot com
  Target Milestone: ---

Long long function parameter should be aligned to 32 bit on x86 target.

$ cat paramtest.c
void foo(int x, long long p)
{
    if (__alignof__(p) != 4)
        __builtin_abort();
}
int bar()
{
    foo(4, 5);
    return 0;
}
int main()
{
    return bar();
}

$ gcc -m32 paramtest.c
$ ./a.out
Aborted (core dumped)

[Bug target/95237] LOCAL_DECL_ALIGNMENT shrinks alignment, FAIL gcc.target/i386/pr69454-2.c

2020-06-02 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95237

--- Comment #14 from Sunil Pandey  ---
Created attachment 48662
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48662&action=edit
Add target hook to skip alignment check for long long on x86 with -m32 and
-mpreferred-stack-boundary=2

Bootstrap and regression tested on x86_64.

[Bug target/95237] LOCAL_DECL_ALIGNMENT shrinks alignment, FAIL gcc.target/i386/pr69454-2.c

2020-06-01 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95237

--- Comment #4 from Sunil Pandey  ---
This test case and many other regressions on x86 are caused by the following
change set, r11-508.

Good

$
/local/skpandey/gccwork/pr95237/tools-build/gcc-debug-r11-507/release/usr/gcc-11.0.0-x86-64/bin/gcc
-m32 -mpreferred-stack-boundary=2 foo.c
bash-5.0$ echo $?
0

Bad
===
bash-5.0$
/local/skpandey/gccwork/pr95237/tools-build/gcc-debug-r11-508/release/usr/gcc-11.0.0-x86-64/bin/gcc
-m32 -mpreferred-stack-boundary=2 foo.c
during GIMPLE pass: adjust_alignment
foo.c: In function ‘main’:
foo.c:7:5: internal compiler error: in execute, at adjust-alignment.c:73
7 | int main()
  | ^~~~
0x206a091 execute
/local/skpandey/gccwork/pr95237/gcc/gcc/adjust-alignment.c:73
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$git show dfa4fcdba374ed44d4aa1a22b2738f3f5c5b37af
commit dfa4fcdba374ed44d4aa1a22b2738f3f5c5b37af
Author: Kito Cheng 
Date:   Tue Apr 14 14:53:19 2020 +0800

Fix alignment for local variable [PR90811]

 - The alignment for local variable was adjust during
estimate_stack_frame_size,
   however it seems wrong spot to adjust that, expand phase will adjust
that
   but it little too late to some gimple optimization, which rely on
certain
   target hooks need to check alignment, forwprop is an example for
   that, result of simplify_builtin_call rely on the alignment on some
   target like ARM or RISC-V.

 - Exclude static local var and hard register var in the process of
   alignment adjustment.

 - This patch fix gfortran.dg/pr45636.f90 for arm and riscv.

 - Regression test on riscv32/riscv64 and x86_64-linux-gnu, no new fail
   introduced.

gcc/ChangeLog

PR target/90811
* Makefile.in (OBJS): Add adjust-alignment.o.
* adjust-alignment.c (pass_data_adjust_alignment): New.
(pass_adjust_alignment): New.
(pass_adjust_alignment::execute): New.
(make_pass_adjust_alignment): New.
* tree-pass.h (make_pass_adjust_alignment): New.
* passes.def: Add pass_adjust_alignment.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9eee988e12c..fdaa94ae8b9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2020-05-20  Kito Cheng  
+
+   PR target/90811
+   * Makefile.in (OBJS): Add adjust-alignment.o.
+   * adjust-alignment.c (pass_data_adjust_alignment): New.
+   (pass_adjust_alignment): New.
+   (pass_adjust_alignment::execute): New.
+   (make_pass_adjust_alignment): New.
+   * tree-pass.h (make_pass_adjust_alignment): New.
+   * passes.def: Add pass_adjust_alignment.


GDB dump:
=

The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program:
/local/skpandey/gccwork/pr95237/tools-build/gcc-debug-r11-508/release/usr/gcc-11.0.0-x86-64/libexec/gcc/x86_64-pc-linux-gnu/11.0.0/cc1
-quiet -imultilib 32 -iprefix
/local/skpandey/gccwork/pr95237/tools-build/gcc-debug-r11-508/release/usr/gcc-11.0.0-x86-64/bin/../lib/gcc/x86_64-pc-linux-gnu/11.0.0/
foo.c -quiet -dumpbase foo.c -m32 -mpreferred-stack-boundary=2 -mtune=generic
-march=x86-64 -auxbase foo -o /tmp/ccEnTcu1.s
  File "/usr/share/gdb/auto-load/usr/lib64/libisl.so.15.1.1-gdb.py", line 67
print "No isl printer for this type"
   ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("No isl
printer for this type")?

Breakpoint 1, (anonymous namespace)::pass_adjust_alignment::execute
(this=0x32fbb10, fun=0x7fffea9340b0)
at /local/skpandey/gccwork/pr95237/gcc/gcc/adjust-alignment.c:73
73        gcc_assert (align >= DECL_ALIGN (var));
(gdb) c
Continuing.

Breakpoint 1, (anonymous namespace)::pass_adjust_alignment::execute
(this=0x32fbb10, fun=0x7fffea9340b0)
at /local/skpandey/gccwork/pr95237/gcc/gcc/adjust-alignment.c:73
73        gcc_assert (align >= DECL_ALIGN (var));
(gdb) n
during GIMPLE pass: adjust_alignment
foo.c: In function ‘main’:
foo.c:7:5: internal compiler error: in execute, at adjust-alignment.c:73
7 | int main()
  | ^~~~
0x206a091 execute
/local/skpandey/gccwork/pr95237/gcc/gcc/adjust-alignment.c:73
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[Inferior 1 (process 4132225) exited with code 04]
(gdb) 

Can we just exclude the adjust-alignment pass for x86?

[Bug target/92807] gcc generate extra move for the snippet code along with lea instruction.

2019-12-09 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92807

--- Comment #5 from Sunil Pandey  ---
Patch link: https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00450.html

[Bug target/92807] gcc generate extra move for the snippet code along with lea instruction.

2019-12-04 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92807

Sunil Pandey  changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com,
   ||skpgkp2 at gmail dot com
Summary|gcc generate extra move for |gcc generate extra move for
   |the snippet code|the snippet code along with
   ||lea instruction.

--- Comment #1 from Sunil Pandey  ---
$ cat t1.c
#include <stdint.h>
uint32_t abs2( uint32_t a )
{
    uint32_t s = ((a>>15)&0x10001)*0xffff;
    return (a+s)^s;
}

$ gcc --version
gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


$ gcc -Ofast -S t1.c -o t1.s.gcc.9
$ cat t1.s.gcc.9
        .file   "t1.c"
        .text
        .p2align 4
        .globl  abs2
        .type   abs2, @function
abs2:
.LFB0:
        .cfi_startproc
        movl    %edi, %edx
        shrl    $15, %edx
        andl    $65537, %edx
        movl    %edx, %eax
        sall    $16, %eax
        subl    %edx, %eax
        movl    %eax, %edx
        leal    (%rdi,%rax), %eax
        xorl    %edx, %eax
        ret
        .cfi_endproc
.LFE0:
        .size   abs2, .-abs2
        .ident  "GCC: (GNU) 9.2.1 20190827 (Red Hat 9.2.1-1)"
        .section        .note.GNU-stack,"",@progbits


The Intel compiler generates add instead of lea, and one fewer mov instruction,
for the same code.

$ icc -Ofast -S t1.c -o t1.s.icc

$ cat t1.s.icc

.L_2__routine_start_abs2_0:
# -- Begin  abs2
        .text
# mark_begin;
        .align    16,0x90
        .globl abs2
# --- abs2(uint32_t)
abs2:
# parameter 1: %edi
..B1.1:                         # Preds ..B1.0
                                # Execution count [1.00e+00]
        .cfi_startproc
..___tag_value_abs2.1:
..L2:
                                                          #3.1
        movl      %edi, %edx                              #4.23
        shrl      $15, %edx                               #4.23
        andl      $65537, %edx                            #4.27
        movl      %edx, %eax                              #4.36
        shll      $16, %eax                               #4.36
        subl      %edx, %eax                              #4.36
        addl      %eax, %edi                              #5.19
        xorl      %edi, %eax                              #5.22
        ret                                               #5.22
        .align    16,0x90
                                # LOE
        .cfi_endproc
# mark_end;
        .type   abs2,@function
        .size   abs2,.-abs2
..LNabs2.0:
        .data
# -- End  abs2
        .data
        .section .note.GNU-stack, ""
# End
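
Both listings compute the same value and differ only in the final add/lea and the extra mov. A small sketch of that equivalence in C, assuming the multiplier in the test case is 0xffff, which is what the shift-by-16-and-subtract sequence in both listings suggests. The name abs2_shift_sub is hypothetical and exists only for this comparison.

```c
#include <assert.h>
#include <stdint.h>

/* The function from the report, with the multiplier assumed to be 0xffff.  */
uint32_t
abs2 (uint32_t a)
{
  uint32_t s = ((a >> 15) & 0x10001) * 0xffff;
  return (a + s) ^ s;
}

/* The same computation written step by step, following the instruction
   sequence in the listings.  Arithmetic is modulo 2^32 throughout.  */
uint32_t
abs2_shift_sub (uint32_t a)
{
  uint32_t t = (a >> 15) & 0x10001;   /* shrl $15; andl $65537 */
  uint32_t s = (t << 16) - t;         /* sall/shll $16; subl: t * 0xffff */
  return (a + s) ^ s;                 /* leal or addl, then xorl */
}
```

Since t * 0xffff equals (t << 16) - t in 32-bit unsigned arithmetic, the two functions agree for every input. Whether the sum a + s is formed with lea into a fresh register or with add in place does not change the result, only the number of moves needed.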

[Bug target/92807] New: gcc generate extra move for the snippet code

2019-12-04 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92807

Bug ID: 92807
   Summary: gcc generate extra move for the snippet code
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: skpgkp2 at gmail dot com
  Target Milestone: ---

[Bug middle-end/91512] [10 Regression] Fortran compile time regression.

2019-09-06 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512

--- Comment #20 from Sunil Pandey  ---
Created attachment 46851
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46851&action=edit
Compile time regression reproducer.

The attached reproducer shows a ~28X compile-time regression after the commit. See the
command line in the readme.txt file.

[Bug middle-end/91512] [10 Regression] Fortran compile time regression.

2019-08-25 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512

--- Comment #18 from Sunil Pandey  ---
(In reply to Thomas Koenig from comment #17)
> Simply passing on a huge number of arguments is not enough to trigger this.
> 
> Here's a perl script to generate test cases:
> 
> while ($n=shift)
> {
> open FOO, ">foo-$n.f90";
> 
> print FOO <<EOF;
> module foo
>   implicit none
> contains
> EOF
> 
> print FOO "subroutine foo_proc";
> for ($i=0; $i<$n; $i++)
> {
> push (@var, "a" . sprintf("%3.3d", $i));
> }
> 
> @call = ();
> push (@call, "(");
> 
> for ($i=0; $i<$n-1; $i++)
> {
> push (@call, "&\n  ") if ($i%10 == 0);
> push (@call, $var[$i] . ", ");
> }
> push (@call, $var[$n-1],")\n");
> print FOO @call;
> 
> for ($i=0; $i<$n; $i++)
> {
> print FOO "  real, dimension(:,:) :: $var[$i]\n";
> }
> 
> print FOO "  call bar";
> print FOO @call;
> print FOO "  end subroutine\n";
> print FOO "end module\n";
> }
> 
> Running this script with
> 
> for a in 50 100 200 500 1000; do perl gener.pl $a; echo -n "$a ";
> /usr/bin/time -f "%e %M" gfortran -c -O2 foo-$a.f90; done
> 
> gave me
> 
> 50 3.21 272668
> 100 8.44 581860
> 200 20.15 1046780
> 500 52.32 1208684
> 1000 167.43 3493456
> 
> so the CPU time does not come close to what is reported here.
> Memory use is quite high, though.
> 
> What is the memory footprint of the compilation? Is your machine possibly
> starting to swap?

My system has plenty of memory. I don't think it's a swapping issue. Here is
the memory profile before and after the commit.

Before commit:
==

$ /usr/bin/time -f "%e %M"
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release.a4ba5c3ec624008e899a8bcb687359db25140c23/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O3 -march=skylake -funroll-loops
-fconvert=big-endian module_first_rk_step_part1.fppized.f90
41.88 214612
$ /usr/bin/time -f "%e %M"
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release.a4ba5c3ec624008e899a8bcb687359db25140c23/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O3 -march=skylake
-fconvert=big-endian module_first_rk_step_part1.fppized.f90
40.88 214716
$ /usr/bin/time -f "%e %M"
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release.a4ba5c3ec624008e899a8bcb687359db25140c23/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O2 -march=skylake
-fconvert=big-endian module_first_rk_step_part1.fppized.f90
40.38 214652

After commit:
=

$ /usr/bin/time -f "%e %M"
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O3 -march=skylake -funroll-loops
-fconvert=big-endian module_first_rk_step_part1.fppized.f90
1548.42 10111860
$ /usr/bin/time -f "%e %M"
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O3 -march=skylake
-fconvert=big-endian module_first_rk_step_part1.fppized.f90
1088.74 2924072
$ /usr/bin/time -f "%e %M"
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O2 -march=skylake
-fconvert=big-endian module_first_rk_step_part1.fppized.f90
544.56 3129568
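
The ratios behind these measurements can be worked out with a few lines of C. This is only a convenience sketch: the figures are copied from the runs above, the struct and function names are hypothetical, and it assumes %e is elapsed seconds and %M is maximum resident set size in KB, as in GNU time.

```c
#include <assert.h>
#include <stdio.h>

/* One row per command line above: elapsed seconds and peak memory in KB,
   before and after the commit.  */
struct timing {
  const char *flags;
  double before_s, after_s;
  long before_kb, after_kb;
};

static const struct timing runs[] = {
  { "-O3 -funroll-loops", 41.88, 1548.42, 214612, 10111860 },
  { "-O3",                40.88, 1088.74, 214716, 2924072 },
  { "-O2",                40.38,  544.56, 214652, 3129568 },
};

/* Compile-time ratio, after divided by before.  */
double
slowdown (struct timing t)
{
  assert (t.before_s > 0.0);
  return t.after_s / t.before_s;
}

/* Peak-memory ratio, after divided by before.  */
double
memory_growth (struct timing t)
{
  assert (t.before_kb > 0);
  return (double) t.after_kb / (double) t.before_kb;
}

/* Prints one line per optimization level.  */
void
print_ratios (void)
{
  for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
    printf ("%-20s time x%.1f  memory x%.1f\n", runs[i].flags,
            slowdown (runs[i]), memory_growth (runs[i]));
}
```

With these numbers the compile-time slowdown is roughly 37x, 27x and 13x for the three command lines, and peak memory grows by more than 13x in every case.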

[Bug middle-end/91512] [10 Regression] Fortran compile time regression.

2019-08-23 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512

--- Comment #16 from Sunil Pandey  ---
(In reply to rguent...@suse.de from comment #15)
> On Thu, 22 Aug 2019, skpgkp2 at gmail dot com wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512
> > 
> > --- Comment #14 from Sunil Pandey  ---
> > (In reply to Richard Biener from comment #7)
> > > (In reply to Sunil Pandey from comment #4)
> > > > Actually it is spec cpu 2017 521.wrf benchmark getting this problem 
> > > > while
> > > > compiling. Compilation taking forever, you can see while compiling file
> > > > module_first_rk_step_part1.fppized.f90 as a representative.
> > > 
> > > Note this file contains a single function which (besides USEing quite a
> > > number
> > > of modules...) has only function calls involving a lot of parameters
> > > effectively forwarding parameters from the function.  Thus
> > > 
> > > SUBROUTINE foo (psim, ..., ims, ime, jms, jme)
> > > REAL,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psim
> > > call sub1 (PSIM=psim, ...)
> > > call sub2 (PSIM=psim, ...)
> > > END SUBROUTINE
> > > 
> > > with a _lot_ of arrays being passed through.  A simple testcase like
> > > 
> > > SUBROUTINE sub1 (psim, ims, ime, jms, jme)
> > > REAL,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psim
> > > END SUBROUTINE
> > > SUBROUTINE foo (psim, ims, ime, jms, jme)
> > > REAL,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psim
> > > call sub1 (psim, ims, ime, jms, jme)
> > > END SUBROUTINE
> > > 
> > > doesn't show any extra loops generated though, so I'm not sure what to
> > > look after.
> > 
> > It seems very hard to create a small test case which reproduce the long 
> > compile
> > time problem. Unfortunately, I'm not allowed to upload spec source file. 
> > Also
> > it's very big with lots of module dependency. Assuming you have spec 2017
> > sources,
> > 
> > Here is unmodified command line, which show compile time problem.
> > 
> > Spec build dir: 
> > ===
> > 
> > /local/skpandey/gccwork/specx5/cpu2017/benchspec/CPU/521.wrf_r/build/build_base_gcc-10.0.0-x86-64.
> > 
> > Before the commit in question:
> > ==
> > 
> > Takes 41 seconds to compile the unmodified file with -O2 -march=skylake
> > 
> > $ time
> > /local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release.a4ba5c3ec624008e899a8bcb687359db25140c23/usr/gcc-10.0.0-x86-64/bin/gfortran
> >  -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include 
> > -I./inc
> > -fno-unsafe-math-optimizations -mfpmath=sse -O2 -march=skylake 
> > -funroll-loops
> > -fconvert=big-endian module_first_rk_step_part1.fppized.f90
> > 
> > real    0m41.295s
> > user    0m41.031s
> > sys     0m0.204s
> > 
> > After the commit in question:
> > =============================
> > 
> > It takes about 12 minutes with -O2 -march=skylake
> > 
> > $ time
> > /local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release/usr/gcc-10.0.0-x86-64/bin/gfortran
> >  -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include 
> > -I./inc
> > -fno-unsafe-math-optimizations -mfpmath=sse -O2 -march=skylake 
> > -funroll-loops
> > -fconvert=big-endian module_first_rk_step_part1.fppized.f90
> > 
> > real    11m59.498s
> > user    11m53.304s
> > sys     0m4.835s
> > 
> > 
> > With higher optimization levels such as -O3 or -Ofast, it takes even longer
> > and I have to kill it.
> 
> Does it help to omit -funroll-loops?

Omitting -funroll-loops helps a bit, but not much.

$ time
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O2 -march=skylake
-fconvert=big-endian module_first_rk_step_part1.fppized.f90

real    9m4.806s
user    9m2.180s
sys     0m1.620s
$ time
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O3 -march=skylake
-fconvert=big-endian module_first_rk_step_part1.fppized.f90

real    18m7.810s
user    18m4.395s
sys     0m1.498s
$ time
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O3 -march=skylake -funroll-loops
-fconvert=big-endian module_first_rk_step_part1.fppized.f90

real    25m47.889s
user    25m40.571s
sys     0m4.639s
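
To put the wall-clock numbers above side by side, here is a small sketch that converts the "real" readings to seconds and divides them by the 41-second baseline. The helper names (to_seconds, slowdown, print_slowdowns) are made up for this illustration. Only the -O2 -funroll-loops run uses the same flags as the baseline, so the other ratios mix flag sets and are indicative at best.

```c
#include <stdio.h>

/* Convert a `time`-style reading (minutes, seconds) to seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Ratio of a run's wall time to the baseline's wall time. */
static double slowdown(double baseline_s, double run_s)
{
    return run_s / baseline_s;
}

/* Print the slowdown for each "real" time quoted in this thread. */
static void print_slowdowns(void)
{
    /* Baseline: -O2 -funroll-loops, before the commit in question. */
    const double before = to_seconds(0, 41.295);

    /* Runs with the compiler built after the commit in question. */
    static const struct {
        const char *flags;
        int minutes;
        double seconds;
    } after[] = {
        { "-O2 -funroll-loops", 11, 59.498 },
        { "-O2",                 9,  4.806 },
        { "-O3",                18,  7.810 },
        { "-O3 -funroll-loops", 25, 47.889 },
    };

    for (size_t i = 0; i < sizeof after / sizeof after[0]; i++) {
        double secs = to_seconds(after[i].minutes, after[i].seconds);
        printf("%-20s %8.1f s  %5.1fx\n",
               after[i].flags, secs, slowdown(before, secs));
    }
}
```

For the like-for-like -O2 -funroll-loops case this works out to roughly a 17x slowdown.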

[Bug middle-end/91512] [10 Regression] Fortran compile time regression.

2019-08-22 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512

--- Comment #14 from Sunil Pandey  ---
(In reply to Richard Biener from comment #7)
> (In reply to Sunil Pandey from comment #4)
> > Actually, it is the SPEC CPU 2017 521.wrf benchmark that hits this problem
> > while compiling. Compilation takes forever; the file
> > module_first_rk_step_part1.fppized.f90 is a representative example.
> 
> Note this file contains a single function which (besides USEing quite a
> number
> of modules...) has only function calls involving a lot of parameters
> effectively forwarding parameters from the function.  Thus
> 
> SUBROUTINE foo (psim, ..., ims, ime, jms, jme)
> REAL,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psim
> call sub1 (PSIM=psim, ...)
> call sub2 (PSIM=psim, ...)
> END SUBROUTINE
> 
> with a _lot_ of arrays being passed through.  A simple testcase like
> 
> SUBROUTINE sub1 (psim, ims, ime, jms, jme)
> REAL,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psim
> END SUBROUTINE
> SUBROUTINE foo (psim, ims, ime, jms, jme)
> REAL,DIMENSION(ims:ime,jms:jme), INTENT(INOUT) :: psim
> call sub1 (psim, ims, ime, jms, jme)
> END SUBROUTINE
> 
> doesn't show any extra loops generated though, so I'm not sure what to
> look after.

It seems very hard to create a small test case that reproduces the long
compile-time problem. Unfortunately, I'm not allowed to upload the SPEC source
file. It is also very big, with lots of module dependencies. Assuming you have
the SPEC CPU 2017 sources, here is the unmodified command line that shows the
compile-time problem.

Spec build dir:
===============

/local/skpandey/gccwork/specx5/cpu2017/benchspec/CPU/521.wrf_r/build/build_base_gcc-10.0.0-x86-64.

Before the commit in question:
==============================

It takes 41 seconds to compile the unmodified file with -O2 -march=skylake

$ time
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release.a4ba5c3ec624008e899a8bcb687359db25140c23/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O2 -march=skylake -funroll-loops
-fconvert=big-endian module_first_rk_step_part1.fppized.f90

real    0m41.295s
user    0m41.031s
sys     0m0.204s

After the commit in question:
=============================

It takes about 12 minutes with -O2 -march=skylake

$ time
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o -I. -I./netcdf/include -I./inc
-fno-unsafe-math-optimizations -mfpmath=sse -O2 -march=skylake -funroll-loops
-fconvert=big-endian module_first_rk_step_part1.fppized.f90

real    11m59.498s
user    11m53.304s
sys     0m4.835s


With higher optimization levels such as -O3 or -Ofast, it takes even longer
and I have to kill it.

[Bug fortran/91512] [10 Regression] Fortran compile time regression.

2019-08-21 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512

--- Comment #4 from Sunil Pandey  ---
(In reply to Thomas Koenig from comment #3)
> (In reply to Sunil Pandey from comment #2)
> 
> >  phase opt and generate   :  47.72 ( 97%)   0.24 ( 77%)  48.04 ( 96%)  118205 kB ( 89%)
> 
> So, phase_opt_and_generate appears to be something strange.
> 
> What the patch did was to change the way for subarrays that are
> packed because they are passed to an old-style argument, like this:
> 
> module x
>   implicit none
> contains
>   subroutine bar(a, n)
>     integer, intent(in) :: n
>     integer, intent(in), dimension(n) :: a
>     print *,a
>   end subroutine bar
> end module x
> 
> program main
>   use x
>   implicit none
>   integer, parameter :: n = 10
>   integer, dimension(n) :: a
>   integer :: i
>   a = [(i,i=1,n)]
>   call bar(a(n:1:-1),n)
> end program main
> 
> After the patch, the packing is done in the front end, which
> means that the optimizers can see through it and possibly inline
> the procedures, or do other optimizations.  If these optimizations
> hit some quadratic (or worse) behavior, this could lead to long
> compilation times.
> 
> I would assume that your code has a lot of places where code like
> the above occurs, is that the case?

Actually, it is the SPEC CPU 2017 521.wrf benchmark that hits this problem
while compiling. Compilation takes forever; the file
module_first_rk_step_part1.fppized.f90 is a representative example.
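
For readers who want a concrete picture of the "packing" described in the quoted comment, here is a rough C analogue. It is only a sketch of the idea, not what gfortran actually generates, and the helper names (weighted_sum, pack_section, call_on_section) are hypothetical. A section such as a(n:1:-1) is not laid out contiguously in the order the callee expects, so it is copied into a contiguous temporary before being handed to a procedure that takes a plain dimension(n) array.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for bar(a, n): expects n contiguous ints. It is order-sensitive
   so the effect of passing a reversed section shows up in the result. */
static long weighted_sum(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += (long)(i + 1) * a[i];
    return s;
}

/* "Pack": copy n elements, starting at base[start] and stepping by stride,
   into a newly allocated contiguous buffer. Returns NULL on failure. */
static int *pack_section(const int *base, ptrdiff_t start,
                         ptrdiff_t stride, size_t n)
{
    int *tmp = malloc((n ? n : 1) * sizeof *tmp);
    if (tmp == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        tmp[i] = base[start + (ptrdiff_t)i * stride];
    return tmp;
}

/* Call the contiguous-only callee on a strided section.
   Returns 0 and stores the result in *out, or -1 if allocation fails. */
static int call_on_section(const int *base, ptrdiff_t start,
                           ptrdiff_t stride, size_t n, long *out)
{
    int *tmp = pack_section(base, start, stride, n);
    if (tmp == NULL)
        return -1;
    *out = weighted_sum(tmp, n);
    free(tmp); /* the argument is read-only here, so no copy back is needed */
    return 0;
}
```

With a holding 1..10, call_on_section(a, 9, -1, 10, &r) mirrors call bar(a(n:1:-1), n) from the quoted example. When the copy loop is visible to the optimizers rather than hidden behind a library call, every such call site adds loop code for them to analyze, which is the effect the quoted comment suspects.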

[Bug fortran/91512] Fortran compile time regression.

2019-08-21 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512

--- Comment #2 from Sunil Pandey  ---
Before commit time report:
==========================

$
/local/skpandey/gccwork/gcc_trunk/tools-build/gcc-debug/release.a4ba5c3ec624008e899a8bcb687359db25140c23/usr/gcc-10.0.0-x86-64/bin/gfortran
 -m64 -c -o module_first_rk_step_part1.fppized.o  
-fno-unsafe-math-optimizations -mfpmath=sse -Ofast -march=native -funroll-loops
 -fconvert=big-endian foo.f90 -ftime-report

Time variable                      usr           sys          wall             GGC
 phase setup              :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)     182 kB (  1%)
 phase parsing            :   1.67 ( 54%)   0.07 ( 88%)   1.74 ( 54%)   10534 kB ( 72%)
 phase opt and generate   :   1.45 ( 46%)   0.01 ( 12%)   1.46 ( 45%)    3928 kB ( 27%)
 garbage collection       :   0.02 (  1%)   0.00 (  0%)   0.02 (  1%)       0 kB (  0%)
 callgraph construction   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)     121 kB (  1%)
 callgraph optimization   :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 ipa function summary     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)      11 kB (  0%)
 cfg cleanup              :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)      34 kB (  0%)
 CFG verifier             :   0.05 (  2%)   0.00 (  0%)   0.10 (  3%)       0 kB (  0%)
 trivially dead code      :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 df scan insns            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 df reaching defs         :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 df live regs             :   0.01 (  0%)   0.01 ( 12%)   0.00 (  0%)       0 kB (  0%)
 df live regs             :   0.03 (  1%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 df must-initialized regs :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 df reg dead/unused notes :   0.02 (  1%)   0.00 (  0%)   0.01 (  0%)      72 kB (  0%)
 register information     :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 alias stmt walking       :   0.01 (  0%)   0.00 (  0%)   0.02 (  1%)      13 kB (  0%)
 parser (global)          :   1.67 ( 54%)   0.07 ( 88%)   1.74 ( 54%)   10534 kB ( 72%)
 tree gimplify            :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     430 kB (  3%)
 tree CFG cleanup         :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 tree VRP                 :   0.03 (  1%)   0.00 (  0%)   0.02 (  1%)      72 kB (  0%)
 tree Early VRP           :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      17 kB (  0%)
 tree copy propagation    :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 tree PTA                 :   0.03 (  1%)   0.00 (  0%)   0.03 (  1%)     113 kB (  1%)
 dominator optimization   :   0.04 (  1%)   0.00 (  0%)   0.03 (  1%)      62 kB (  0%)
 tree CCP                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      17 kB (  0%)
 tree PRE                 :   0.03 (  1%)   0.00 (  0%)   0.04 (  1%)      79 kB (  1%)
 tree FRE                 :   0.03 (  1%)   0.00 (  0%)   0.01 (  0%)      17 kB (  0%)
 tree forward propagate   :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       1 kB (  0%)
 tree phiprop             :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)       0 kB (  0%)
 tree aggressive DCE      :   0.02 (  1%)   0.00 (  0%)   0.00 (  0%)      10 kB (  0%)
 tree DSE                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       1 kB (  0%)
 complete unrolling       :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      13 kB (  0%)
 tree slp vectorization   :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)      53 kB (  0%)
 tree SSA verifier        :   0.22 (  7%)   0.00 (  0%)   0.15 (  5%)       0 kB (  0%)
 tree STMT verifier       :   0.17 (  5%)   0.00 (  0%)   0.23 (  7%)       0 kB (  0%)
 dominance computation    :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)       0 kB (  0%)
 expand                   :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)     497 kB (  3%)
 CSE                      :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)      15 kB (  0%)
 dead store elim1         :   0.02 (  1%)   0.00 (  0%)   0.01 (  0%)      55 kB (  0%)
 dead store elim2         :   0.10 (  3%)   0.00 (  0%)   0.10 (  3%)      77 kB (  1%)
 loop init                :   0.02 (  1%)   0.00 (  0%)   0.03 (  1%)      15 kB (  0%)
 loop unrolling           :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)      94 kB (  1%)
 CPROP                    :   0.01 (  0%)   0.00 (  0%)   0.02 (  1%)      69 kB (  0%)
 PRE                      :

[Bug fortran/91512] New: Fortran compile time regression.

2019-08-21 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91512

Bug ID: 91512
   Summary: Fortran compile time regression.
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: skpgkp2 at gmail dot com
CC: crazylht at gmail dot com, hjl.tools at gmail dot com,
tkoenig at gcc dot gnu.org
  Target Milestone: ---

We have a big Fortran file that used to take 30 seconds to compile; it now
takes more than 30 minutes. I'm trying to create a small reproducer.

The compile-time regression is caused by this change:

commit 396260630b60675a92bee5577333c4794565aee5 (HEAD, refs/bisect/bad)
Author: tkoenig 
Date:   Sun May 19 10:21:06 2019 +

2019-05-19  Thomas Koenig  

PR fortran/88821
* expr.c (gfc_is_simply_contiguous): Return true for
an EXPR_ARRAY.
* trans-array.c (is_pointer): New function.
(gfc_conv_array_parameter): Call gfc_conv_subref_array_arg
when not optimizing and not optimizing for size if the formal
arg is passed by reference.
* trans-expr.c (gfc_conv_subref_array_arg): Add arguments
fsym, proc_name and sym.  Add run-time warning for temporary
array creation.  Wrap argument if passing on an optional
argument to an optional argument.
* trans.h (gfc_conv_subref_array_arg): Add optional arguments
fsym, proc_name and sym to prototype.

2019-05-19  Thomas Koenig  

PR fortran/88821
* gfortran.dg/alloc_comp_auto_array_3.f90: Add -O0 to dg-options
to make sure the test for internal_pack is retained.
* gfortran.dg/assumed_type_2.f90: Split compile and run time
tests into this and
* gfortran.dg/assumed_type_2a.f90: New file.
* gfortran.dg/c_loc_test_22.f90: Likewise.
* gfortran.dg/contiguous_3.f90: Likewise.
* gfortran.dg/internal_pack_11.f90: Likewise.
* gfortran.dg/internal_pack_12.f90: Likewise.
* gfortran.dg/internal_pack_16.f90: Likewise.
* gfortran.dg/internal_pack_17.f90: Likewise.
* gfortran.dg/internal_pack_18.f90: Likewise.
* gfortran.dg/internal_pack_4.f90: Likewise.
* gfortran.dg/internal_pack_5.f90: Add -O0 to dg-options
to make sure the test for internal_pack is retained.
* gfortran.dg/internal_pack_6.f90: Split compile and run time
tests into this and
* gfortran.dg/internal_pack_6a.f90: New file.
* gfortran.dg/internal_pack_8.f90: Likewise.
* gfortran.dg/missing_optional_dummy_6: Split compile and run time
tests into this and
* gfortran.dg/missing_optional_dummy_6a.f90: New file.
* gfortran.dg/no_arg_check_2.f90: Split compile and run time tests
into this and
* gfortran.dg/no_arg_check_2a.f90: New file.
* gfortran.dg/typebound_assignment_5.f90: Split compile and run
time
tests into this and
* gfortran.dg/typebound_assignment_5a.f90: New file.
* gfortran.dg/typebound_assignment_6.f90: Split compile and run
time
tests into this and
* gfortran.dg/typebound_assignment_6a.f90: New file.
* gfortran.dg/internal_pack_19.f90: New file.
* gfortran.dg/internal_pack_20.f90: New file.
* gfortran.dg/internal_pack_21.f90: New file.



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@271377
138bc75d-0d04-0410-961f-82ee72b054a4

[Bug rtl-optimization/91503] New: ICE ira-build.i:17:1: error: shared rtx

2019-08-20 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91503

Bug ID: 91503
   Summary: ICE ira-build.i:17:1: error: shared rtx
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: skpgkp2 at gmail dot com
CC: crazylht at gmail dot com, hjl.tools at gmail dot com
  Target Milestone: ---

ICE when compiling with the GCC options
"-O3 -mtune-ctrl=^inter_unit_moves_to_vec -msse4.1":

$ cat ira-build.i
typedef struct a *b;
struct a {
  b c}
;
d
  ;
e( f,  g) {
  int h, i;
  b j;
  i = g + 1;
  for (j = f; j ;
   j = j->c) 
h = e( 1);
if (h > i)
  i = h;
  return i;
}
static k() {
  d = e();
}
l() {
  k();
}


$ gcc -S  -O3  -mtune-ctrl="^inter_unit_moves_to_vec"  ira-build.i  -msse4.1 -w
ira-build.i: In function ‘l’:
ira-build.i:23:1: error: invalid rtl sharing found in the insn
   23 | }
  | ^
(insn 57 56 15 4 (set (subreg:V4SI (reg:SI 96) 0)
(vec_merge:V4SI (vec_duplicate:V4SI (mem/c:SI (plus:DI (reg/f:DI 19
frame)
(const_int -4 [0xfffc])) [0  S4 A32]))
(const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(const_int 1 [0x1]))) "ira-build.i":13:9 -1
 (nil))
ira-build.i:23:1: error: shared rtx
(mem/c:SI (plus:DI (reg/f:DI 19 frame)
(const_int -4 [0xfffc])) [0  S4 A32])
during RTL pass: stv
ira-build.i:23:1: internal compiler error: internal consistency failure
0xa21114 verify_rtx_sharing
/local/gccwork/gcc_trunk/gcc/gcc/emit-rtl.c:2927
0xa20feb verify_rtx_sharing
/local/gccwork/gcc_trunk/gcc/gcc/emit-rtl.c:2942
0xa20feb verify_rtx_sharing
/local/gccwork/gcc_trunk/gcc/gcc/emit-rtl.c:2942
0xa20feb verify_rtx_sharing
/local/gccwork/gcc_trunk/gcc/gcc/emit-rtl.c:2942
0xa2143f verify_insn_sharing
/local/gccwork/gcc_trunk/gcc/gcc/emit-rtl.c:3013
0xa24d67 verify_rtl_sharing()
/local/gccwork/gcc_trunk/gcc/gcc/emit-rtl.c:3036
0xcb2ae9 execute_function_todo
/local/gccwork/gcc_trunk/gcc/gcc/passes.c:2004
0xcb382e execute_todo
/local/gccwork/gcc_trunk/gcc/gcc/passes.c:2037
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug rtl-optimization/91460] New: gcc -mpreferred-vector-width=256 is slower than -mpreferred-vector-width=128 for some loops

2019-08-15 Thread skpgkp2 at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91460

Bug ID: 91460
   Summary: gcc -mpreferred-vector-width=256 is slower than
-mpreferred-vector-width=128 for some loops
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: skpgkp2 at gmail dot com
CC: hjl.tools at gmail dot com
  Target Milestone: ---

 1 static inline void pixel_avg( uint8_t *dst,  int i_dst_stride,
 2                               uint8_t *src1, int i_src1_stride,
 3                               uint8_t *src2, int i_src2_stride,
 4                               int i_width, int i_height )
 5 {
 6     for( int y = 0; y < i_height; y++ )
 7     {
 8         for( int x = 0; x < i_width; x++ )
 9             dst[x] = ( src1[x] + src2[x] + 1 ) >> 1;
10         dst  += i_dst_stride;
11         src1 += i_src1_stride;
12         src2 += i_src2_stride;
13     }
14 }

If the above code is in a hot loop and the i_width value is between 16 and 32,
-mprefer-vector-width=128 can provide a ~6% performance improvement compared
to -mprefer-vector-width=256.

The i_width value must be at least 16 to trigger 128-bit vectorization at line 8.

The i_width value must be at least 32 to trigger 256-bit vectorization at line 8.
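
A minimal driver sketch for trying this out is given below. It assumes fixed-size static buffers and a checksum so the work is not optimized away; the names pixel_avg_checksum, MAX_W and MAX_H are hypothetical and not part of the original code. It is not a rigorous benchmark, and the figures above came from a real hot loop, so this driver may not show the same difference.

```c
#include <stdint.h>

enum { MAX_W = 32, MAX_H = 16 }; /* hypothetical buffer limits for this sketch */

static inline void pixel_avg(uint8_t *dst, int i_dst_stride,
                             uint8_t *src1, int i_src1_stride,
                             uint8_t *src2, int i_src2_stride,
                             int i_width, int i_height)
{
    for (int y = 0; y < i_height; y++) {
        for (int x = 0; x < i_width; x++)
            dst[x] = (src1[x] + src2[x] + 1) >> 1;
        dst  += i_dst_stride;
        src1 += i_src1_stride;
        src2 += i_src2_stride;
    }
}

/* Run pixel_avg `repeats` times on deterministic data and return the sum of
   all output pixels. Returns 0 if the requested size does not fit. */
static uint64_t pixel_avg_checksum(int i_width, int i_height, int repeats)
{
    static uint8_t src1[MAX_W * MAX_H], src2[MAX_W * MAX_H], dst[MAX_W * MAX_H];

    if (i_width < 1 || i_width > MAX_W || i_height < 1 || i_height > MAX_H)
        return 0;

    /* Fill the inputs with simple wrapping patterns. */
    for (int i = 0; i < MAX_W * MAX_H; i++) {
        src1[i] = (uint8_t)(i * 7);
        src2[i] = (uint8_t)(i * 13 + 1);
    }

    uint64_t sum = 0;
    for (int r = 0; r < repeats; r++) {
        pixel_avg(dst, MAX_W, src1, MAX_W, src2, MAX_W, i_width, i_height);
        for (int y = 0; y < i_height; y++)
            for (int x = 0; x < i_width; x++)
                sum += dst[y * MAX_W + x];
    }
    return sum;
}
```

One way to use it: call pixel_avg_checksum with an i_width between 16 and 32 (say 24) and a large repeat count from a small main, build it once with -O3 -mprefer-vector-width=128 and once with -O3 -mprefer-vector-width=256, and time both binaries. The target must support 256-bit vectors for the second build to differ; the excerpt above does not state which -march was used.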