from:"kugan at gcc dot gnu.org"

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #18 from kugan at gcc dot gnu.org ---
Also, can we set INT_MAX when there is no explicit safelen specified in OMP?
Something like:

--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -6975,14 +6975,11 @@ lower_rec_input_clauses (tree clauses, gimple_seq
*ilist, gimple_seq *dlist,
 {
   tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
OMP_CLAUSE_SAFELEN);
-  poly_uint64 safe_len;
-  if (c == NULL_TREE
- || (poly_int_tree_p (OMP_CLAUSE_SAFELEN_EXPR (c), &safe_len)
- && maybe_gt (safe_len, sctx.max_vf)))
+  if (c == NULL_TREE)
{
  c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
  OMP_CLAUSE_SAFELEN_EXPR (c) = build_int_cst (integer_type_node,
-  sctx.max_vf);
+  INT_MAX);
  OMP_CLAUSE_CHAIN (c) = gimple_omp_for_clauses (ctx->stmt);
  gimple_omp_for_set_clauses (ctx->stmt, c);
}
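
To make the intent of the INT_MAX value concrete, here is a rough stand-alone model of how a safelen value caps the vectorizer's maximum vectorization factor. The function name cap_max_vf_by_safelen and its exact shape are assumptions made for illustration; it paraphrases the idea and is not copied from the GCC sources.

```cpp
#include <climits>

// Hypothetical model, not GCC code.  A safelen below 2 gives no licence to
// ignore an unknown dependence; otherwise safelen caps the maximum VF.
// Returns true when the dependence may be ignored under the safelen promise.
inline bool cap_max_vf_by_safelen (int safelen, unsigned int *max_vf)
{
  if (safelen < 2)
    return false;
  if (static_cast<unsigned int> (safelen) < *max_vf)
    *max_vf = static_cast<unsigned int> (safelen);
  return true;
}
```

In this model a safelen of INT_MAX never lowers max_vf, which is the "effectively infinity" behaviour, while a safelen of 16 clamps any larger max_vf down to 16.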

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #12 from kugan at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #11)
> (In reply to kugan from comment #9)
> > Looking at the options, looks to me that making loop->safelen a poly_int is
> > the way to go. (In reply to Jakub Jelinek from comment #4)
> > > The OpenMP safelen clause argument is a scalar integer, so using poly_int
> > > for something that must be an int doesn't make sense.
> > > Though, the above testcase actually doesn't use safelen clause, so safelen
> > > is there effectively infinity.
> > Thanks. I was looking at this to see if there is a way to handle this
> > differently. Looks to me that making loop->safelen a poly_int is the way to
> > handle at least the case when omp safelen clause is not provided.
> 
> Why?
> Then it just is INT_MAX value, which is a magic value that says that it is
> infinity.
> No need to say it is a poly_int infinity.

For this test case, omp_max_vf gets [16, 16] from the backend, which then
becomes 16. If we kept it as a poly_int, wouldn't it pass the
maybe_lt (max_vf, min_vf) check after applying safelen?

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #10 from kugan at gcc dot gnu.org ---
Created attachment 57946
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57946&action=edit
patch

patch to make loop->safelen a poly_int

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-15 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

--- Comment #9 from kugan at gcc dot gnu.org ---
Looking at the options, it looks to me that making loop->safelen a poly_int is
the way to go.
(In reply to Jakub Jelinek from comment #4)
> The OpenMP safelen clause argument is a scalar integer, so using poly_int
> for something that must be an int doesn't make sense.
> Though, the above testcase actually doesn't use safelen clause, so safelen
> is there effectively infinity.
Thanks. I was looking at this to see if there is a way to handle this
differently. Looks to me that making loop->safelen a poly_int is the way to
handle at least the case when omp safelen clause is not provided. I am
interested in looking into this. Any suggestions? Here is a completely untested
diff that makes loop->safelen a poly_int.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-04-10 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 114653, which changed state.

Bug 114653 Summary: Not vectorizing the loop with openmp reduction.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

[Bug tree-optimization/114635] OpenMP reductions fail dependency analysis

2024-04-10 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114635

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #8 from kugan at gcc dot gnu.org ---
*** Bug 114653 has been marked as a duplicate of this bug. ***

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-10 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #6 from kugan at gcc dot gnu.org ---
Duplicate

*** This bug has been marked as a duplicate of bug 114635 ***

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-10 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #5 from kugan at gcc dot gnu.org ---
The ddr for the pair:
  ref_a: _57 = D.4803[_20];
  ref_b: D.4803[_20] = _ifc__174;

We get DDR_ARE_DEPENDENT (ddr) == chrec_dont_know, hence apply_safelen () is
used.

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #4 from kugan at gcc dot gnu.org ---
This particular loop has loop->safelen set to 16. Does this mean this can never
be loop vectorized for VLA?

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #3 from kugan at gcc dot gnu.org ---
For SVE mode in vect_analyze_loop_2, we have

(gdb) p min_vf
$15 = {coeffs = {4, 4}}
(gdb) p max_vf
$16 = 16

Thus maybe_lt (max_vf, min_vf) is true (16 can be smaller than 4 + 4x), which
results in "bad data dependence".
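
As a rough illustration of what a "maybe" comparison means for these values, the sketch below models a two-coefficient poly value c0 + c1 * x with an unknown runtime x >= 0. The type poly2 and the two helper names are hypothetical; GCC's real poly_int templates are more general than this toy.

```cpp
// Toy model of a value c0 + c1 * x, where x >= 0 is only known at run time.
struct poly2
{
  unsigned long c0;
  unsigned long c1;
};

// True if a < b.c0 + b.c1 * x holds for at least one x >= 0.
inline bool maybe_lt_const_poly (unsigned long a, poly2 b)
{
  return a < b.c0 || b.c1 > 0;
}

// True if a < b.c0 + b.c1 * x holds for every x >= 0 (the minimum is at x = 0).
inline bool known_lt_const_poly (unsigned long a, poly2 b)
{
  return a < b.c0;
}
```

With max_vf = 16 and min_vf = {4, 4}, the "maybe" form holds because 16 < 4 + 4x once x >= 4, even though the "known" form does not. A constant max_vf of 16 therefore cannot be shown to be at least as large as a variable-length min_vf.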

[Bug middle-end/114653] Not vectorizing the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

--- Comment #2 from kugan at gcc dot gnu.org ---
Thanks. I see the following in the log:
test.cpp:33:53: missed:   not vectorized: relevant stmt not supported: _54 =
.MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed:  bad operation or unsupported loop bound.
test.cpp:22:19: note:  * Analysis  failed with vector mode V4SF


test.cpp:22:19: note:   === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed:  bad data dependence.
test.cpp:22:19: note:  * Analysis  failed with vector mode VNx16QI

test.cpp:33:53: missed:   not vectorized: relevant stmt not supported: _54 =
.MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed:  bad operation or unsupported loop bound.
test.cpp:22:19: note:  * Analysis  failed with vector mode V8QI

test.cpp:22:19: note:   === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed:  bad data dependence.
test.cpp:22:19: note:  * Analysis  failed with vector mode VNx8QI

test.cpp:33:53: missed:   not vectorized: relevant stmt not supported: _54 =
.MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed:  bad operation or unsupported loop bound.
test.cpp:22:19: note:  * Analysis  failed with vector mode V4HI

test.cpp:22:19: note:   === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed:  bad data dependence.
test.cpp:22:19: note:  * Analysis  failed with vector mode VNx4QI

test.cpp:33:53: missed:   not vectorized: relevant stmt not supported: _54 =
.MASK_LOAD (_53, 32B, _171);
test.cpp:22:19: missed:  bad operation or unsupported loop bound.
test.cpp:22:19: note:  * Analysis  failed with vector mode V2SI

test.cpp:22:19: note:   worklist: examine stmt: _57 = D.4803[_20];
test.cpp:22:19: note:   === vect_analyze_data_ref_dependences ===
test.cpp:22:19: missed:  bad data dependence.
test.cpp:22:19: note:  * Analysis  failed with vector mode VNx2QI
test.cpp:22:19: missed: couldn't vectorize loop
test.cpp:22:19: missed: bad data dependence.

[Bug middle-end/114653] New: Not vectoring the loop with openmp reduction.

2024-04-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114653

Bug ID: 114653
   Summary: Not vectoring the loop with openmp reduction.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57910
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57910&action=edit
testcase

The main loop in the attached test case is not vectorized with -fopenmp. It
gets vectorized with -fopenmp-simd.

With -fopenmp, the reduction variables lax, lay and laz get assigned to an
array, and the data reference analysis for these accesses seems to fail. See:

offset from base address: (ssizetype) ((sizetype) _20 * 4)
constant offset from base address: 0
step: 0
base alignment: 16
base misalignment: 0
offset alignment: 4
step alignment: 128
base_object: D.4806[_20]
Creating dr for D.4808[_20]
analyze_innermost: Applying pattern match.pd:219, generic-match-1.cc:3190
test.cpp:37:9: missed:  failed: evolution of offset is not affine.


command used: 
 test.cpp -Ofast -fopenmp -mcpu=neoverse-v2
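
For readers without access to the attachment, the following is a guess at the general loop shape being discussed: three scalar accumulators reduced across an OpenMP loop. It is not the attached test.cpp; the struct, the function name and the exact pragma are assumptions, and only the names lax, lay and laz come from the report.

```cpp
#include <cstddef>

struct accel
{
  float x, y, z;
};

// Hypothetical stand-in for the reported loop, not the real test case.
// Without -fopenmp or -fopenmp-simd the pragma is ignored and the loop is
// compiled as ordinary serial code.
inline accel sum_accel (const float *ax, const float *ay, const float *az,
                        std::size_t n)
{
  float lax = 0.0f, lay = 0.0f, laz = 0.0f;
#pragma omp parallel for simd reduction(+ : lax, lay, laz)
  for (std::size_t i = 0; i < n; ++i)
    {
      lax += ax[i];
      lay += ay[i];
      laz += az[i];
    }
  return accel{lax, lay, laz};
}
```

Compiling such a function with -fopt-info-vec-missed under both -fopenmp and -fopenmp-simd is one way to compare the vectorizer's decisions, although whether this simplified shape reproduces the failure is not something the report confirms.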


gcc -v:
Using built-in specs.
COLLECT_GCC=/home/kvivekananda/install/bin/gcc
COLLECT_LTO_WRAPPER=/home/kvivekananda/install/libexec/gcc/aarch64-unknown-linux-gnu/14.0.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc/configure --enable-multiarch=yes
--enable-languages=c,c++,fortran,lto --disable-bootstrap
--prefix=/home/kvivekananda/install
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.0.1 20240314 (experimental) (GCC)

[Bug middle-end/111683] [11/12/13/14 Regression] Incorrect answer when using SSE2 intrinsics with -O3 since r7-3163-g973625a04b3d9351f2485e37f7d3382af2aed87e

2024-03-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111683

--- Comment #5 from kugan at gcc dot gnu.org ---
Both -O3 -fno-tree-vectorize and -O3 -fno-tree-vrp work. I looked at the evrp
dump and it is not doing anything suspicious. It looks like the range_info
usage in the vectoriser is causing the problem.

[Bug libgomp/113698] GNU OpenMP with OMP_PROC_BIND alters thread affinity in a way that negatively affects performance

2024-02-09 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698

--- Comment #4 from kugan at gcc dot gnu.org ---
Thanks for looking into this. The main reason we were seeing the performance
issue turned out to be the glibc malloc issue in
https://sourceware.org/bugzilla/show_bug.cgi?id=30945

[Bug libgomp/113698] New: GNU OpenMP with OMP_PROC_BIND alters thread affinity in a way that negatively affects performance

2024-01-31 Thread kugan at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113698

Bug ID: 113698
   Summary: GNU OpenMP with OMP_PROC_BIND alters thread affinity
in a way that negatively affects performance
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Created attachment 57275
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57275&action=edit
testcase

When OMP_PROC_BIND=true, it seems gomp sets the affinity even before main()
starts. In particular, the main thread gets affinity 0x1 (i.e. it is pinned to
the first core). For the attached testcase, I get:

$ OMP_NUM_THREADS=72 ./a.out
[main thread affinity right after main()]. tid:ae511020
aff:...
duration: 402.949 msec

$ OMP_PROC_BIND=true OMP_NUM_THREADS=72 ./a.out
[main thread affinity right after main()]. tid:fffdded50020
aff:...0001
duration: 7879.59 msec

$ OMP_PROC_BIND=true OMP_NUM_THREADS=72 ./a.out
[main thread affinity right after main()]. tid:ae54c020
aff:...0001
duration: 311219 msec
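
The affinity reported above can be inspected from inside a program with the Linux sched_getaffinity call. The helper below is a minimal sketch, assuming Linux with glibc; the name allowed_cpu_count is hypothetical and this is not the attached repro.c.

```cpp
#include <sched.h>

// Linux/glibc only.  Returns how many CPUs the calling thread may run on,
// or -1 if the affinity mask could not be read.
inline int allowed_cpu_count ()
{
  cpu_set_t set;
  CPU_ZERO (&set);
  if (sched_getaffinity (0, sizeof (set), &set) != 0)
    return -1;
  return CPU_COUNT (&set);
}
```

Calling it first thing in main() should show the difference: a thread pinned to a single core, as in the 0x1 mask above, would be expected to report 1.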

Compiler options used:
gcc -O0 -fopenmp repro.c

gcc -v:


Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/11/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-11
--program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-libquadmath --disable-libquadmath-support --enable-plugin
--enable-default-pie --with-system-zlib --enable-libphobos-checking=release
--with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch
--enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release
--build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04)

[Bug driver/47785] GCC with -flto does not pass -Wa options to the assembler

2019-10-22 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=47785

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #14 from kugan at gcc dot gnu.org ---
A patch for this is posted at
https://gcc.gnu.org/ml/gcc-patches/2019-10/msg01471.html

[Bug ipa/91468] Suspicious codes in ipa-prop.c and ipa-cp.c

2019-08-26 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91468

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #2 from kugan at gcc dot gnu.org ---
(In reply to Martin Jambor from comment #1)
> (In reply to Feng Xue from comment #0)

> > 
> > In function update_jump_functions_after_inlining(),
> > 
> >   if (dst->type == IPA_JF_ANCESTOR)
> > {
> >   ..
> > 
> >   if (src->type == IPA_JF_PASS_THROUGH
> >   && src->value.pass_through.operation == NOP_EXPR)
> > {
> >..
> > }
> >   else if (src->type == IPA_JF_PASS_THROUGH
> >&& TREE_CODE_CLASS (src->value.pass_through.operation) == 
> > tcc_unary)
> > {
> >   dst->value.ancestor.formal_id = src->value.pass_through.formal_id;
> >   dst->value.ancestor.agg_preserved = false;
> > }
> >   ..   
> > }
> > 
> > If we suppose pass_through operation is "negate_expr" (while it is not a
> > reasonable operation on pointer type), the code might be incorrect. It's
> > better to specify expected unary operations here.
> 
> Kugan, you added this in 2016 and unfortunately I think it is wrong.
> Are there any unary operations we could possibly want to handle?
> In any event, the information that there was an arithmetic function in
> the path of the parameter would be completely lost if the code ever
> executed.  (Which I don't think it ever does, I think it would take
> crazy code that employs LTO to pass an integer to a pointer parameter
> to trigger).
> 
> So I plan to remove the whole if.
> 

Yes, I think this is a mistake and should go. Thanks for doing that.

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-06-17 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #21 from kugan at gcc dot gnu.org ---
(In reply to Christophe Lyon from comment #20)
> Hi Kugan,
> 
> The new test fails with -mabi=ilp32:
> FAIL: gcc.target/aarch64/pr88834.c scan-assembler-times \\tld2w\\t{z[0-9]+.s
> - z[0-9]+.s}, p[0-7]/z, \\[x[0-9]+, x[0-9]+, lsl 2\\]\\n 2
> FAIL: gcc.target/aarch64/pr88834.c scan-assembler-times \\tst2w\\t{z[0-9]+.s
> - z[0-9]+.s}, p[0-7], \\[x[0-9]+, x[0-9]+, lsl 2\\]\\n 1

Thanks Christophe. In the back-end, when we use ILP32, we don't accept SImode
operands like:

(plus:SI (mult:SI (reg:SI 91)
(const_int 4 [0x4]))
(reg:SI 90))

whereas we would accept Pmode. My question is: should we care about ILP32 for
SVE? If so, we need to fix this. Otherwise, we can run the test for LP64 only.

[Bug target/88838] [SVE] Use 32-bit WHILELO in LP64 mode

2019-06-12 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88838

--- Comment #6 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Thu Jun 13 03:34:28 2019
New Revision: 272233

URL: https://gcc.gnu.org/viewcvs?rev=272233&root=gcc&view=rev
Log:

gcc/ChangeLog:

2019-06-13  Kugan Vivekanandarajah  

PR target/88838
* tree-vect-loop-manip.c (vect_set_loop_masks_directly): If the
compare_type is not with Pmode size, we will create an IV with
Pmode size with truncated use (i.e. converted to the correct type).
* tree-vect-loop.c (vect_verify_full_masking): Find IV type.
(vect_iv_limit_for_full_masking): New. Factored out of
vect_set_loop_condition_masked.
* tree-vectorizer.h (LOOP_VINFO_MASK_IV_TYPE): New.
(vect_iv_limit_for_full_masking): Declare.

gcc/testsuite/ChangeLog:

2019-06-13  Kugan Vivekanandarajah  

PR target/88838
* gcc.target/aarch64/pr88838.c: New test.
* gcc.target/aarch64/sve/while_1.c: Adjust.

Added:
trunk/gcc/testsuite/gcc.target/aarch64/pr88838.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/aarch64/sve/while_1.c
trunk/gcc/tree-vect-loop-manip.c
trunk/gcc/tree-vect-loop.c
trunk/gcc/tree-vectorizer.h

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-06-12 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #19 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Thu Jun 13 03:18:54 2019
New Revision: 272232

URL: https://gcc.gnu.org/viewcvs?rev=272232&root=gcc&view=rev
Log:

gcc/ChangeLog:

2019-06-13  Kugan Vivekanandarajah  

PR target/88834
* tree-ssa-loop-ivopts.c (get_mem_type_for_internal_fn): Handle
IFN_MASK_LOAD_LANES and IFN_MASK_STORE_LANES.
(get_alias_ptr_type_for_ptr_address): Likewise.
(add_iv_candidate_for_use): Add scaled index candidate if useful.
* tree-ssa-address.c (preferred_mem_scale_factor): New.
* config/aarch64/aarch64.c (aarch64_classify_address): Relax
allow_reg_index_p.

gcc/testsuite/ChangeLog:

2019-06-13  Kugan Vivekanandarajah  

PR target/88834
* gcc.target/aarch64/pr88834.c: New test.
* gcc.target/aarch64/sve/struct_vect_1.c: Adjust.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_15.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_16.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_17.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_7.c: Likewise.


Added:
trunk/gcc/testsuite/gcc.target/aarch64/pr88834.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/aarch64/aarch64.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_1.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_14.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_15.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_16.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_17.c
trunk/gcc/testsuite/gcc.target/aarch64/sve/struct_vect_7.c
trunk/gcc/tree-ssa-address.c
trunk/gcc/tree-ssa-address.h
trunk/gcc/tree-ssa-loop-ivopts.c

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-09 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #17 from kugan at gcc dot gnu.org ---
(In reply to Wilco from comment #16)
> (In reply to kugan from comment #15)
> > (In reply to Wilco from comment #11)
> > > There is also something odd with the way the loop iterates, this doesn't
> > > look right:
> > > 
> > > whilelo p0.s, x3, x4
> > > incw x3
> > > ptest   p1, p0.b
> > > bne .L3
> > 
> > I am not sure I understand this. I tried with qemu using an execution
> > testcase and it seems to work.
> > 
> > whilelo p0.s, x4, x5
> > incw x4
> > ptest   p1, p0.b
> > bne .L3
> > In my case I have the above (register allocation difference only). incw is
> > correct considering two vector word registers? Am I missing something here?
> 
> I'm talking about the completely redundant ptest, where does that come from?

It is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88836

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #15 from kugan at gcc dot gnu.org ---
(In reply to Wilco from comment #11)
> There is also something odd with the way the loop iterates, this doesn't
> look right:
> 
> whilelo p0.s, x3, x4
> incw x3
> ptest   p1, p0.b
> bne .L3

I am not sure I understand this. I tried with qemu using an execution testcase
and it seems to work.

whilelo p0.s, x4, x5
incw x4
ptest   p1, p0.b
bne .L3
In my case I have the above (register allocation difference only). incw is
correct considering two vector word registers? Am I missing something here?

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #14 from kugan at gcc dot gnu.org ---
Created attachment 46104
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46104&action=edit
testcase

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #46040|0                           |1
        is obsolete|                            |

--- Comment #13 from kugan at gcc dot gnu.org ---
Created attachment 46103
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46103&action=edit
ivopt changes alone

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-04-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #12 from kugan at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #10)
> (In reply to kugan from comment #9)
> > Created attachment 46040 [details]
> > patch
> 
> Wasn't sure whether this patch was WIP or the final version
> for review, but we need to do something more generic than
> dividing by 4.  I think the test will still fail with "int"
> changed to "short" for example.
> 
> I also don't think the new candidate should be tied to the
> mask/load store functions.  Maybe one approach would be to
> check when adding a zero-based candidate for a use in:
> 
>   /* Record common candidate with initial value zero.  */
>   basetype = TREE_TYPE (iv->base);
>   if (POINTER_TYPE_P (basetype))
> basetype = sizetype;
>   record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> 
> whether the use actually benefits from this unscaled iv.
> If the use is USE_REF_ADDRESS, we could compare the cost
> of an address with an unscaled index with the cost of an address
> with a scaled index.  I think the natural scale value to try
> would be GET_MODE_INNER (TYPE_MODE (mem_type)).

Thanks for the comments. I agree this is the right place. But I am not sure
that checking the cost at this point is what IV-opt generally does. In general,
IV-opt adds candidates which can be helpful and later decides the optimal set.

If we are to use get_computation_cost to see the costs, we have to create an
iv_cand and then discard it. Since we are adding only one candidate, and only
for SVE-like targets, I am thinking that it is OK. If you still prefer to check
the cost, I will change that.

Attached are the patch (only the ivopt changes) and the testcase.
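
The difference between an unscaled (byte offset) candidate and a scaled index candidate can be pictured with plain pointer arithmetic. This is only an illustrative sketch with hypothetical helper names; it says nothing about how IV-opt costs the two forms.

```cpp
#include <cstddef>
#include <cstdint>

// Address formed from a byte-offset induction variable: base + offset.
inline const std::int32_t *addr_unscaled (const std::int32_t *base,
                                          std::size_t byte_offset)
{
  return reinterpret_cast<const std::int32_t *> (
    reinterpret_cast<const char *> (base) + byte_offset);
}

// Address formed from an element index scaled by the element size, which for
// 4-byte elements corresponds to the "[base, index, lsl 2]" addressing form.
inline const std::int32_t *addr_scaled (const std::int32_t *base,
                                        std::size_t index)
{
  return base + index;
}
```

Both forms reach the same element when byte_offset equals index times the element size, which is why the element size is the natural scale to try for the extra candidate.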

[Bug rtl-optimization/89862] LTO bootstrap fails for ARM

2019-03-29 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89862

--- Comment #4 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Sat Mar 30 04:28:51 2019
New Revision: 270031

URL: https://gcc.gnu.org/viewcvs?rev=270031&root=gcc&view=rev
Log:

2019-03-29  Kugan Vivekanandarajah  

Backport from mainline
2019-03-29  Kugan Vivekanandarajah  
Eric Botcazou  

PR rtl-optimization/89862
* rtl.h (word_register_operation_p): Exclude CONST_INT from operations
that operates on the full registers for WORD_REGISTER_OPERATIONS
architectures.


Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/rtl.h

[Bug rtl-optimization/89862] LTO bootstrap fails for ARM

2019-03-29 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89862

--- Comment #3 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Sat Mar 30 04:24:22 2019
New Revision: 270030

URL: https://gcc.gnu.org/viewcvs?rev=270030&root=gcc&view=rev
Log:

2019-03-29  Kugan Vivekanandarajah  
Eric Botcazou  

PR rtl-optimization/89862
* rtl.h (word_register_operation_p): Exclude CONST_INT from operations
that operates on the full registers for WORD_REGISTER_OPERATIONS
architectures.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/rtl.h

[Bug rtl-optimization/89862] LTO bootstrap fails for ARM

2019-03-28 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89862

--- Comment #2 from kugan at gcc dot gnu.org ---
(In reply to Eric Botcazou from comment #1)
> Can you try this instead?
> 
> Index: rtl.h
> ===
> --- rtl.h   (revision 269886)
> +++ rtl.h   (working copy)
> @@ -4401,6 +4401,7 @@ word_register_operation_p (const_rtx x)
>  {
>switch (GET_CODE (x))
>  {
> +case CONST_INT:
>  case ROTATE:
>  case ROTATERT:
>  case SIGN_EXTRACT:
Thanks for looking into it. Disallowing all the CONST_INT works for me. I have
verified that lto-bootstrap works with the above changes. I will test for
regression and post it to gcc-patches.

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-03-27 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #45686|0                           |1
        is obsolete|                            |

--- Comment #9 from kugan at gcc dot gnu.org ---
Created attachment 46040
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46040&action=edit
patch

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-03-27 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #8 from kugan at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #7)
> Thanks for looking at this.
> 
> (In reply to kugan from comment #6)
> > cmp w3, 0
> > ble .L1
> > sub w3, w3, #1
> > mov x4, 0
> > cntw x5
> > ptrue   p1.s, all
> > lsr w3, w3, 1
> > add w3, w3, 1
> > whilelo p0.s, xzr, x3
> > .p2align 3,,7
> > .L3:
> > ld2w {z4.s - z5.s}, p0/z, [x1, x4, lsl 2]
> > ld2w {z2.s - z3.s}, p0/z, [x2, x4, lsl 2]
> > add z0.s, z4.s, z2.s
> > sub z1.s, z5.s, z3.s
> > st2w {z0.s - z1.s}, p0, [x0, x4, lsl 2]
> > whilelo p0.s, x5, x3
> > incb x4, all, mul #2
> > incw x5
> > ptest   p1, p0.b
> > bne .L3
> > .L1:
> > ret
> > .cfi_endproc
> 
> This doesn't look right.  x4 is an index, so it should be
> incremented by the number of words in two vectors, rather than
> the number of bytes in two vectors.

Thanks for the comments. Fixed with the attached patch; it now generates:

f:
.LFB0:
.cfi_startproc
cmp w3, 0
ble .L1
sub w5, w3, #1
cntw x4
mov x3, 0
ptrue   p1.s, all
lsr w5, w5, 1
add w5, w5, 1
whilelo p0.s, xzr, x5
.p2align 3,,7
.L3:
ld2w {z4.s - z5.s}, p0/z, [x1, x3, lsl 2]
ld2w {z2.s - z3.s}, p0/z, [x2, x3, lsl 2]
add z0.s, z4.s, z2.s
sub z1.s, z5.s, z3.s
st2w {z0.s - z1.s}, p0, [x0, x3, lsl 2]
whilelo p0.s, x4, x5
inch x3
incw x4
ptest   p1, p0.b
bne .L3
.L1:
ret
.cfi_endproc
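
The arithmetic behind replacing "incb x4, all, mul #2" with "inch x3" can be checked with a small model of the SVE element counts. The helpers below are an illustrative sketch that takes the vector length in bytes as a parameter; the function names mirror the instruction mnemonics but are otherwise hypothetical.

```cpp
// Element counts for one SVE vector of vl_bytes bytes (a multiple of 16).
constexpr unsigned long cntb (unsigned long vl_bytes) { return vl_bytes; }
constexpr unsigned long cnth (unsigned long vl_bytes) { return vl_bytes / 2; }
constexpr unsigned long cntw (unsigned long vl_bytes) { return vl_bytes / 4; }

// Number of 32-bit words covered by one LD2W or ST2W, i.e. two full vectors.
constexpr unsigned long words_in_two_vectors (unsigned long vl_bytes)
{
  return 2 * cntw (vl_bytes);
}

// The word index must advance by the halfword count (INCH), not by twice the
// byte count (INCB ..., MUL #2), which would be four times too large.
static_assert (words_in_two_vectors (16) == cnth (16), "128-bit vectors");
static_assert (2 * cntb (16) == 4 * words_in_two_vectors (16), "incb overshoot");
```

For a 128-bit vector this gives 8 words per LD2W, matching an INCH step of 8, whereas the earlier INCB form would have added 32 to an index that is already scaled by "lsl 2".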

[Bug rtl-optimization/89862] New: LTO bootstrap fails for ARM

2019-03-27 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89862

Bug ID: 89862
   Summary: LTO bootstrap fails for ARM
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

Created attachment 46039
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46039&action=edit
patch

With the commit:
commit 67c18bce7054934528ff5930cca283b4ac967dca
Author: ebotcazou 
Date:   Wed Jan 31 10:03:06 2018 +

    PR rtl-optimization/84071
    * combine.c (record_dead_and_set_regs_1): Record the source unmodified
    for a paradoxical SUBREG on a WORD_REGISTER_OPERATIONS target.

LTO bootstrap fails for arm (possibly for other WORD_REGISTER_OPERATIONS
targets).

There is an internal compiler error: in operator+=, at profile-count.h:792. It
looks like the profile_count is set incorrectly.

Commit 67c18bce7054934528ff5930cca283b4ac967dca skips generating gen_lowpart
for
(set (subreg:SI (reg:QI 1434) 0)
(const_int 224 [0xe0])) and the like. This seems to be the reason for the
error.

The attached patch fixes this. Does this look reasonable?

[Bug target/88838] [SVE] Use 32-bit WHILELO in LP64 mode

2019-03-20 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88838

--- Comment #5 from kugan at gcc dot gnu.org ---
Created attachment 46000
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46000&action=edit
RFC patch

An RFC patch that fixes this, for review.

[Bug target/88836] [SVE] Redundant PTEST in loop test

2019-02-21 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88836

--- Comment #2 from kugan at gcc dot gnu.org ---
Created attachment 45795
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45795&action=edit
RFC patch

AFAIK, we need to:
1. Change the whilelo pattern in the backend
2. Change RTL CSE
- Add support for VEC_DUPLICATE
- When handling a PARALLEL rtx, we may kill a CSE defined in the first set so
that it doesn't reach

The attached patch fixes this. With the patch I now have:
.LFB0:
.cfi_startproc
cmp w3, 0
ble .L1
sub w4, w3, #1
cntw x3
lsr w4, w4, 1
add w4, w4, 1
whilelo p0.s, xzr, x4
.p2align 3,,7
.L3:
ld2w {z4.s - z5.s}, p0/z, [x1]
ld2w {z2.s - z3.s}, p0/z, [x2]
add z0.s, z4.s, z2.s
sub z1.s, z5.s, z3.s
st2w {z0.s - z1.s}, p0, [x0]
incb x1, all, mul #2
whilelo p0.s, x3, x4
incb x0, all, mul #2
incw x3
incb x2, all, mul #2
bne .L3
.L1:
ret
.cfi_endproc

[Bug target/88838] [SVE] Use 32-bit WHILELO in LP64 mode

2019-02-21 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88838

--- Comment #4 from kugan at gcc dot gnu.org ---
(In reply to kugan from comment #3)
> Created attachment 45794 [details]
> RFC patch

Oops, wrong place. It should be for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88836

[Bug target/88838] [SVE] Use 32-bit WHILELO in LP64 mode

2019-02-21 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88838

--- Comment #3 from kugan at gcc dot gnu.org ---
Created attachment 45794
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45794&action=edit
RFC patch

[Bug target/88838] [SVE] Use 32-bit WHILELO in LP64 mode

2019-02-21 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88838

kugan at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kugan at gcc dot gnu.org

--- Comment #2 from kugan at gcc dot gnu.org ---
AFAIK, we need to:
1. Change the whilelo pattern in the backend
2. Change RTL CSE
- Add support for VEC_DUPLICATE
- When handling a PARALLEL rtx, we may kill a CSE defined in the first set so
that it doesn't reach

The attached patch fixes this. With the patch I now have:
.LFB0:
.cfi_startproc
cmp w3, 0
ble .L1
sub w4, w3, #1
cntw x3
lsr w4, w4, 1
add w4, w4, 1
whilelo p0.s, xzr, x4
.p2align 3,,7
.L3:
ld2w {z4.s - z5.s}, p0/z, [x1]
ld2w {z2.s - z3.s}, p0/z, [x2]
add z0.s, z4.s, z2.s
sub z1.s, z5.s, z3.s
st2w {z0.s - z1.s}, p0, [x0]
incb x1, all, mul #2
whilelo p0.s, x3, x4
incb x0, all, mul #2
incw x3
incb x2, all, mul #2
bne .L3
.L1:
ret
.cfi_endproc

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-12 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #6 from kugan at gcc dot gnu.org ---

> 
> Note the difference in mode for aarch64_classify_address. Not sure if this
> is because of the way my patch changes ivopt.

Yes, it was my mistake in iv-use. With the attached patch, I now get:
cmp w3, 0
ble .L1
sub w3, w3, #1
mov x4, 0
cntw x5
ptrue   p1.s, all
lsr w3, w3, 1
add w3, w3, 1
whilelo p0.s, xzr, x3
.p2align 3,,7
.L3:
ld2w {z4.s - z5.s}, p0/z, [x1, x4, lsl 2]
ld2w {z2.s - z3.s}, p0/z, [x2, x4, lsl 2]
add z0.s, z4.s, z2.s
sub z1.s, z5.s, z3.s
st2w {z0.s - z1.s}, p0, [x0, x4, lsl 2]
whilelo p0.s, x5, x3
incb x4, all, mul #2
incw x5
ptest   p1, p0.b
bne .L3
.L1:
ret
.cfi_endproc

I will post the patch for review after stage-1 opens. In the meantime, any
review is appreciated, especially of the part where iv-use is set up and of
get_alias_ptr_type_for_ptr_address.

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-12 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

kugan at gcc dot gnu.org changed:

   What|Removed |Added

  Attachment #45661|0   |1
is obsolete||

--- Comment #5 from kugan at gcc dot gnu.org ---
Created attachment 45686
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45686&action=edit
ivopt patch v2

[Bug tree-optimization/89296] New: tree copy-header masking uninitialized warning

2019-02-11 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89296

Bug ID: 89296
   Summary: tree copy-header masking uninitialized warning
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

void test_func(void) {
  int loop;  // uninitialized and "garbage"
  while (!loop) {
   loop = get_a_value();  // <- must be for this test
   printk("...");
  }
}

from Linaro bug report https://bugs.linaro.org/show_bug.cgi?id=4134
-fno-tree-ch gets the required warning

diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index c876d62..d405d00 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -393,7 +393,7 @@ ch_base::copy_headers (function *fun)
{
  gimple *stmt = gsi_stmt (bsi);
  if (gimple_code (stmt) == GIMPLE_COND)
-   gimple_set_no_warning (stmt, true);
+   ;//gimple_set_no_warning (stmt, true);
  else if (is_gimple_assign (stmt))
{
  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);

also gets the required warning. Looking into it.
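For context, loop header copying turns a while loop into a guarded do-while, so the loop test exists twice afterwards. The copy of the GIMPLE_COND is the statement marked no-warning in the diff above. A minimal sketch of the two shapes in plain C, using an initialized variable and hypothetical function names (this is not the testcase from the report):

```c
/* Original shape: the test is evaluated at the top of the loop.  */
static int
count_while (int n)
{
  int iters = 0;

  while (n > 0)
    {
      n /= 2;
      iters++;
    }
  return iters;
}

/* Shape after header copying: the test is duplicated as a guard in
   front of the loop and the loop itself becomes a do-while.  */
static int
count_header_copied (int n)
{
  int iters = 0;

  if (n > 0)			/* copied header test */
    do
      {
	n /= 2;
	iters++;
      }
    while (n > 0);
  return iters;
}
```

Both functions return the same value for every input; only the placement of the test differs.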

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-11 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #4 from kugan at gcc dot gnu.org ---
Created attachment 45661
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45661&action=edit
ivopt patch v1

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-11 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

--- Comment #3 from kugan at gcc dot gnu.org ---
I added iv-use for MASKED_LOAD_LANE and the result is
cmp w3, 0
ble .L1
sub w5, w3, #1
mov x4, 0
lsr w5, w5, 1
add w5, w5, 1
whilelo p0.s, xzr, x5
.p2align 3,,7
.L3:
lsl x3, x4, 3
incw x4
add x7, x1, x3
add x6, x2, x3
ld2w {z4.s - z5.s}, p0/z, [x7]
ld2w {z2.s - z3.s}, p0/z, [x6]
add x3, x0, x3
add z0.s, z4.s, z2.s
sub z1.s, z5.s, z3.s
st2w {z0.s - z1.s}, p0, [x3]
whilelo p0.s, x4, x5
bne .L3
.L1:
ret

The base plus scaled index addressing mode is not used. This is because of a
difference between ivopt and cfgexpand.

When called from ivopt:
Breakpoint 4, aarch64_classify_address (info=0x7fffcba0, x=0x76c44f30,
mode=E_DImode, strict_p=false, type=ADDR_QUERY_M)
at
/home/kugan/work/abe/snapshots/gcc.git~origin~aarch64~sve-acle-branch/gcc/config/aarch64/aarch64.c:5689
5689    {
(gdb) p debug_rtx (x)
(plus:DI (mult:DI (reg:DI 91)
(const_int 8 [0x8]))
(reg:DI 90))

it accepts it.

When in cfgexpand:
Breakpoint 5, aarch64_classify_address (info=0x7fffcca0, x=0x76c5b840,
mode=E_VNx8SImode, strict_p=false, type=ADDR_QUERY_M)
at
/home/kugan/work/abe/snapshots/gcc.git~origin~aarch64~sve-acle-branch/gcc/config/aarch64/aarch64.c:5689
5689    {
(gdb) p debug_rtx (x)
(plus:DI (mult:DI (reg:DI 92 [ ivtmp_28 ])
(const_int 8 [0x8]))
(reg/v/f:DI 110 [ y ]))


This is not accepted because of aarch64_classify_index (info, op1, mode,
strict_p) failing (as it should).

Note the difference in mode for aarch64_classify_address. Not sure if this is
because of the way my patch changes ivopt.
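The source loop is not quoted in these comments. Judging from the assembly (two interleaved loads, an add on the even lanes, a sub on the odd lanes, one interleaved store, and a trip count of (n - 1) / 2 + 1), it is presumably of the following shape. The function name, parameter names and types are assumptions, not taken from the bug report:

```c
/* Hypothetical reconstruction: even elements are added, odd elements
   are subtracted, which maps onto LD2W/ST2W pairs when vectorized.  */
void
interleaved_add_sub (int *restrict x, int *restrict y, int *restrict z,
		     int n)
{
  for (int i = 0; i < n; i += 2)
    {
      x[i] = y[i] + z[i];
      x[i + 1] = y[i + 1] - z[i + 1];
    }
}
```

With base plus scaled index addressing, all three arrays can share one index register, as in the later assembly with [x1, x4, lsl 2].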

[Bug target/88834] [SVE] Poor addressing mode choices for LD2 and ST2

2019-02-03 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88834

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kugan at gcc dot gnu.org

--- Comment #2 from kugan at gcc dot gnu.org ---
I'll assign it to myself unless it is being looked at by someone else.

[Bug sanitizer/88333] [9 Regression] ice in asan_emit_stack_protection, at asan.c:1574

2018-12-06 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88333

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kugan at gcc dot gnu.org

--- Comment #7 from kugan at gcc dot gnu.org ---
*** Bug 88350 has been marked as a duplicate of this bug. ***

[Bug sanitizer/88350] Linux kernel build ICE with allyesconfig for aarch64

2018-12-06 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88350

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from kugan at gcc dot gnu.org ---
Duplicate

*** This bug has been marked as a duplicate of bug 88333 ***

[Bug sanitizer/88350] Linux kernel build ICE with allyesconfig for aarch64

2018-12-06 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88350

kugan at gcc dot gnu.org changed:

   What|Removed |Added

  Alias|PR88333 |

--- Comment #2 from kugan at gcc dot gnu.org ---
Dup of PR88333 and fixed.

[Bug sanitizer/88350] New: Linux kernel build ICE with allyesconfig for aarch64

2018-12-04 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88350

Bug ID: 88350
   Summary: Linux kernel build ICE with allyesconfig for aarch64
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: sanitizer
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
CC: dodji at gcc dot gnu.org, dvyukov at gcc dot gnu.org,
jakub at gcc dot gnu.org, kcc at gcc dot gnu.org, marxin at 
gcc dot gnu.org
  Target Milestone: ---

When the Linux kernel is built (allyesconfig) with trunk, the build fails with an ICE:


++ make
CC=/home/tcwg-buildslave/workspace/tcwg_kernel-bisect-gnu_0/bin/aarch64-cc
ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- HOSTCC=gcc -j32 -s -k
:1335:2: warning: #warning syscall rseq not implemented [-Wcpp]
*** WARNING *** there are active plugins, do not report this as a bug unless
you can reproduce it without enabling any plugins.
Event| Plugins
PLUGIN_FINISH_TYPE   | randomize_layout_plugin structleak_plugin
PLUGIN_FINISH_DECL   | randomize_layout_plugin
PLUGIN_ATTRIBUTES| randomize_layout_plugin
latent_entropy_plugin structleak_plugin
PLUGIN_START_UNIT| latent_entropy_plugin
PLUGIN_ALL_IPA_PASSES_START  | randomize_layout_plugin
during RTL pass: expand
arch/arm64/mm/flush.c: In function '__sync_icache_dcache':
arch/arm64/mm/flush.c:61:6: internal compiler error: in
asan_emit_stack_protection, at asan.c:1574
   61 | void __sync_icache_dcache(pte_t pte)
  |  ^~~~


Full build Log can be found in:
https://ci.linaro.org/job/tcwg_kernel-bisect-gnu-master-aarch64-stable-allyesconfig/11/artifact/artifacts/build-1d89613e77d7db420b13ce3ad8b98f07aaf474e8/console.log


The commit that seems to trigger this is:
Author: marxin 
Date:   Fri Nov 30 14:25:15 2018 +

Make red zone size more flexible for stack variables (PR sanitizer/81715).

2018-11-30  Martin Liska  

PR sanitizer/81715
* asan.c (asan_shadow_cst): Remove, partially transform
into flush_redzone_payload.
(RZ_BUFFER_SIZE): New.
(struct asan_redzone_buffer): New.
(asan_redzone_buffer::emit_redzone_byte): Likewise.
(asan_redzone_buffer::flush_redzone_payload): Likewise.
(asan_redzone_buffer::flush_if_full): Likewise.
(asan_emit_stack_protection): Use asan_redzone_buffer class
that is responsible for proper aligned stores and flushing
of shadow memory payload.
* asan.h (ASAN_MIN_RED_ZONE_SIZE): New.
(asan_var_and_redzone_size): Likewise.
* cfgexpand.c (expand_stack_vars): Use smaller alignment
(ASAN_MIN_RED_ZONE_SIZE) in order to make shadow memory
for automatic variables more compact.
2018-11-30  Martin Liska  

PR sanitizer/81715
* c-c++-common/asan/asan-stack-small.c: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@24
138bc75d-0d04-0410-961f-82ee72b054a4

[Bug rtl-optimization/88212] New: IRA Register Coalescing not working for the testcase

2018-11-26 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88212

Bug ID: 88212
   Summary: IRA Register Coalescing not working for the testcase
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

When compiling the following on aarch64 with -O2:
#include <arm_neon.h>
void g(int32_t *p, int32x2x2_t val, int x)
{
 vst2_lane_s32(p,val,0);
}

generates:
.cfi_startproc
mov v2.8b, v0.8b
mov v3.8b, v1.8b
st2 {v2.s - v3.s}[0], [x0]
ret

clang produces:
st2 { v0.s, v1.s }[0], [x0]
ret

Essentially the problem is that access to part-registers doesn't get
coalesced, so IRA generates moves which aren't actually required.

[Bug target/86677] popcount builtin detection is breaking some kernel build

2018-11-12 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86677

--- Comment #13 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Mon Nov 12 23:43:56 2018
New Revision: 266039

URL: https://gcc.gnu.org/viewcvs?rev=266039&root=gcc&view=rev
Log:
gcc/ChangeLog:

2018-11-13  Kugan Vivekanandarajah  

PR middle-end/86677
PR middle-end/87528
* tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN
POPCOUNT
as expensive when backend does not define it.

gcc/testsuite/ChangeLog:

2018-11-13  Kugan Vivekanandarajah  

PR middle-end/86677
PR middle-end/87528
* g++.dg/tree-ssa/pr86544.C: Run only for target supporting popcount
pattern.
* gcc.dg/tree-ssa/popcount.c: Likewise.
* gcc.dg/tree-ssa/popcount2.c: Likewise.
* gcc.dg/tree-ssa/popcount3.c: Likewise.
* gcc.target/aarch64/popcount4.c: New test.
* lib/target-supports.exp (check_effective_target_popcountl): New.


Added:
trunk/gcc/testsuite/gcc.target/aarch64/popcount4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
trunk/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
trunk/gcc/testsuite/lib/target-supports.exp
trunk/gcc/tree-scalar-evolution.c

[Bug middle-end/87528] Popcount changes caused 531.deepsjeng_r run-time regression on Skylake

2018-11-12 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87528

--- Comment #7 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Mon Nov 12 23:43:56 2018
New Revision: 266039

URL: https://gcc.gnu.org/viewcvs?rev=266039&root=gcc&view=rev
Log:
gcc/ChangeLog:

2018-11-13  Kugan Vivekanandarajah  

PR middle-end/86677
PR middle-end/87528
* tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN
POPCOUNT
as expensive when backend does not define it.

gcc/testsuite/ChangeLog:

2018-11-13  Kugan Vivekanandarajah  

PR middle-end/86677
PR middle-end/87528
* g++.dg/tree-ssa/pr86544.C: Run only for target supporting popcount
pattern.
* gcc.dg/tree-ssa/popcount.c: Likewise.
* gcc.dg/tree-ssa/popcount2.c: Likewise.
* gcc.dg/tree-ssa/popcount3.c: Likewise.
* gcc.target/aarch64/popcount4.c: New test.
* lib/target-supports.exp (check_effective_target_popcountl): New.


Added:
trunk/gcc/testsuite/gcc.target/aarch64/popcount4.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
trunk/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
trunk/gcc/testsuite/lib/target-supports.exp
trunk/gcc/tree-scalar-evolution.c

[Bug c++/87469] [9 Regression] ice in record_estimate, at tree-ssa-loop-niter.c:3271

2018-10-29 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87469

--- Comment #5 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Mon Oct 29 22:02:45 2018
New Revision: 265605

URL: https://gcc.gnu.org/viewcvs?rev=265605&root=gcc&view=rev
Log:
gcc/testsuite/ChangeLog:

2018-10-29  Kugan Vivekanandarajah  

PR middle-end/87469
* g++.dg/pr87469.C: New test.

gcc/ChangeLog:

2018-10-29  Kugan Vivekanandarajah  

PR middle-end/87469
* tree-ssa-loop-niter.c (number_of_iterations_popcount): Fix niter
max value.



Added:
trunk/gcc/testsuite/g++.dg/pr87469.C
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-loop-niter.c

[Bug c++/87469] [9 Regression] ice in record_estimate, at tree-ssa-loop-niter.c:3271

2018-10-17 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87469

--- Comment #4 from kugan at gcc dot gnu.org ---
In the loop here, the value defined in the loop (e) is used outside the loop,
hence this should not be detected as popcount (AFAIK). I will have a look at
fixing this.

[Bug target/87253] New: Python test_ctypes fails when built with gcc 8.2

2018-09-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87253

Bug ID: 87253
   Summary: Python test_ctypes fails when built with gcc 8.2
   Product: gcc
   Version: 8.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

Python-2.7.15

Steps to reproduce error
In Python src directory:
./configure
make
./python Lib/test/regrtest.py -v test_ctypes

==
FAIL: test_struct_by_value (ctypes.test.test_win32.Structures)
--
Traceback (most recent call last):
  File
"/home/kugan.vivekanandarajah/Python-2.7.15/Lib/ctypes/test/test_win32.py",
line 113, in test_struct_by_value
self.assertEqual(ret.left, left.value)
AssertionError: -200 != 10



gdb ./python
b ReturnRect
r Lib/test/regrtest.py -v test_ctypes

(gdb) p cp
$9 = {x = 15, y = 25}
(gdb) p fp
$10 = {x = 548534164448, y = 9890688}

cp and fp are passed the same value (pt), as can be seen below:

vi /home/kugan.vivekanandarajah/Python-2.7.15/Lib/ctypes/test/test_win32.py
+112

pt = POINT(15, 25)
...
ReturnRect = dll.ReturnRect
ReturnRect.argtypes = [c_int, RECT, POINTER(RECT), POINT, RECT,
  POINTER(RECT), POINT, RECT]


ret = ReturnRect(i, rect, pointer(rect), pt, rect,
 byref(rect), pt, rect)


gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/kugan.vivekanandarajah/install/usr/local/bin/../libexec/gcc/aarch64-unknown-linux-gnu/8.2.1/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc/configure --disable-bootstrap
Thread model: posix
gcc version 8.2.1 20180907 (GCC)

[Bug target/86677] popcount builtin detection is breaking some kernel build

2018-07-26 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86677

--- Comment #2 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #1)
> The kernel simply has to provide __popcount{s,d}i2 like it provides other
> libgcc functions if it chooses to not link against libgcc.

Yes, I created this bug just so that I can point it to the kernel people. I
will raise it with the kernel people internally and see what I can do. Thanks.
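For illustration, a freestanding bit-counting helper of the kind a project that does not link against libgcc would have to supply could look like the sketch below. The names sketch_popcount32 and sketch_popcount64 are hypothetical; a real replacement would have to use the libgcc symbol names and the ABI of the target, which this sketch does not attempt:

```c
#include <stdint.h>

/* Branch-free population count of a 32-bit value (SWAR technique):
   sum bits in pairs, then nibbles, then bytes.  */
static int
sketch_popcount32 (uint32_t x)
{
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x + (x >> 4)) & 0x0f0f0f0fu;
  /* The multiply accumulates the four byte counts in the top byte.  */
  return (int) ((x * 0x01010101u) >> 24);
}

/* 64-bit variant built from two 32-bit halves.  */
static int
sketch_popcount64 (uint64_t x)
{
  return sketch_popcount32 ((uint32_t) x)
	 + sketch_popcount32 ((uint32_t) (x >> 32));
}
```

Neither function uses floating-point or SIMD registers, which is the constraint that matters when building with -mgeneral-regs-only.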

[Bug target/86677] New: popcount builtin detection is breaking some kernel build

2018-07-25 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86677

Bug ID: 86677
   Summary: popcount builtin detection is breaking some kernel
build
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

Popcount builtin detection breaks the Linux kernel build for arm/aarch64 (and
possibly other targets) when the backend does not provide appropriate popcount
patterns.

For aarch64, this happens because the kernel is built with -mgeneral-regs-only.

Also discussed in:
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00489.html

[Bug tree-optimization/86544] Popcount detection generates different code on C and C++

2018-07-18 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544

--- Comment #4 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Wed Jul 18 22:11:24 2018
New Revision: 262864

URL: https://gcc.gnu.org/viewcvs?rev=262864&root=gcc&view=rev
Log:
gcc/ChangeLog:

2018-07-18  Kugan Vivekanandarajah  

PR middle-end/86544
* tree-ssa-phiopt.c (cond_removal_in_popcount_pattern): Handle
comparison with EQ_EXPR
in last stmt.

gcc/testsuite/ChangeLog:

2018-07-18  Kugan Vivekanandarajah  

PR middle-end/86544
* g++.dg/tree-ssa/pr86544.C: New test.


Added:
trunk/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-phiopt.c

[Bug tree-optimization/86544] Popcount detection generates different code on C and C++

2018-07-17 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544

--- Comment #2 from kugan at gcc dot gnu.org ---
Patch posted at https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00975.html

[Bug tree-optimization/86544] Popcount detection generates different code on C and C++

2018-07-17 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86544

--- Comment #1 from kugan at gcc dot gnu.org ---
(In reply to ktkachov from comment #0)
> Great to see that GCC now detects the popcount loop in PR 82479!
> I am seeing some curious differences between gcc and g++ though.
> int
> pc (unsigned long long b)
> {
> int c = 0;
> 
> while (b) {
> b &= b - 1;
> c++;
> }
> 
> return c;
> }
> 
> If compiled with gcc -O3 on aarch64 this gives:
> pc:
> fmov d0, x0
> cnt v0.8b, v0.8b
> addv b0, v0.8b
> umov w0, v0.b[0]
> ret
> 
> whereas if compiled with g++ -O3 it gives:
> _Z2pcy:
> .LFB0:
> .cfi_startproc
> fmov d0, x0
> cmp x0, 0
> cnt v0.8b, v0.8b
> addv b0, v0.8b
> umov w0, v0.b[0]
> and x0, x0, 255
> csel w0, w0, wzr, ne
> ret
> 
> which is suboptimal. It seems that phiopt3 manages to optimise the C version
> better. The GIMPLE dumps just before the phiopt pass are:
> For the C (good version):
> 
>   int c;
>   int _7;
> 
>[local count: 118111601]:
>   if (b_4(D) != 0)
> goto ; [89.00%]
>   else
> goto ; [11.00%]
> 
>[local count: 105119324]:
>   _7 = __builtin_popcountl (b_4(D));
> 
>[local count: 118111601]:
>   # c_12 = PHI <0(2), _7(3)>
>   return c_12;
> 
> 
> For the C++ (bad version):
> 
>   int c;
>   int _7;
> 
>[local count: 118111601]:
>   if (b_4(D) == 0)
> goto ; [11.00%]
>   else
> goto ; [89.00%]
> 
>[local count: 105119324]:
>   _7 = __builtin_popcountl (b_4(D));
> 
>[local count: 118111601]:
>   # c_12 = PHI <0(2), _7(3)>
>   return c_12;
> 
> As you can see the order of the gotos and the jump conditions is inverted.
> 
> It seems to me that the two are equivalent and GCC could be doing a better
> job of optimising.
> 
> Can we improve phiopt to handle this more effectively?

Thanks for the test case. I will look at it.
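The two GIMPLE shapes quoted above can be written out in C as follows. This is only a sketch with hypothetical function names. It relies on the fact that __builtin_popcountl (0) is 0, which is why the guard is redundant whichever way the comparison is written:

```c
/* Guard written with != as in the C dump.  */
static int
pc_guarded_ne (unsigned long b)
{
  int c = 0;

  if (b != 0)
    c = __builtin_popcountl (b);
  return c;
}

/* Guard written with == and swapped arms as in the C++ dump.  */
static int
pc_guarded_eq (unsigned long b)
{
  int c;

  if (b == 0)
    c = 0;
  else
    c = __builtin_popcountl (b);
  return c;
}

/* What both forms can be reduced to once the guard is removed.  */
static int
pc_unguarded (unsigned long b)
{
  return __builtin_popcountl (b);
}
```

All three functions agree for every input, including zero, so the NE_EXPR and EQ_EXPR forms deserve the same treatment.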

[Bug tree-optimization/86489] ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO

2018-07-12 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86489

--- Comment #7 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Fri Jul 13 05:25:47 2018
New Revision: 262622

URL: https://gcc.gnu.org/viewcvs?rev=262622&root=gcc&view=rev
Log:
gcc/ChangeLog:

2018-07-13  Kugan Vivekanandarajah  
Richard Biener  

PR middle-end/86489
* tree-ssa-loop-niter.c (number_of_iterations_popcount): Check
that the loop latch destination where phi is defined.

gcc/testsuite/ChangeLog:

2018-07-13  Kugan Vivekanandarajah  

PR middle-end/86489
* gcc.dg/pr86489.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/pr86489.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-loop-niter.c

[Bug tree-optimization/86489] ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO

2018-07-12 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86489

--- Comment #3 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
>   gimple *phi = SSA_NAME_DEF_STMT (b_11);
>   if (gimple_code (phi) != GIMPLE_PHI
>   || (gimple_assign_lhs (and_stmt)
>   != gimple_phi_arg_def (phi, loop_latch_edge (loop)->dest_idx)))
> return false;
> 
> this may fail if the PHI in question is not the correct one in which case
> it may not have the argument at the latch dest_idx.  Try first verifying
> that the loop latch destination is indeed gimple_bb (phi).

Yes, thanks for spotting that. I am testing the following patch:

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index f6fa2f7..fbdf838 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2555,6 +2555,7 @@ number_of_iterations_popcount (loop_p loop, edge exit,
... = PHI .  */
   gimple *phi = SSA_NAME_DEF_STMT (b_11);
   if (gimple_code (phi) != GIMPLE_PHI
+  || (gimple_bb (phi) != loop_latch_edge (loop)->dest)
   || (gimple_assign_lhs (and_stmt)
  != gimple_phi_arg_def (phi, loop_latch_edge (loop)->dest_idx)))
 return false;

Is checking that there is an argument at the latch dest_idx (the argument
count of the PHI) still necessary?

[Bug tree-optimization/86489] ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO

2018-07-11 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86489

--- Comment #1 from kugan at gcc dot gnu.org ---
Sorry about the breakage. I am trying to reproduce it on x86-64. Please let me
know if you have a testcase.

[Bug middle-end/82479] missing popcount builtin detection

2018-06-16 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82479

--- Comment #13 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Sat Jun 16 21:39:31 2018
New Revision: 261682

URL: https://gcc.gnu.org/viewcvs?rev=261682&root=gcc&view=rev
Log:
gcc/ChangeLog:

2018-06-16  Kugan Vivekanandarajah  

PR middle-end/82479
* ipa-fnsummary.c (will_be_nonconstant_expr_predicate): Handle
CALL_EXPR.
* tree-scalar-evolution.c (interpret_expr): Likewise.
(expression_expensive_p): Likewise.
* tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.
* tree-ssa-loop-niter.c (number_of_iterations_popcount): New.
(number_of_iterations_exit_assumptions): Use
number_of_iterations_popcount.
(ssa_defined_by_minus_one_stmt_p): New.

gcc/testsuite/ChangeLog:

2018-06-16  Kugan Vivekanandarajah  

PR middle-end/82479
* gcc.dg/tree-ssa/popcount.c: New test.
* gcc.dg/tree-ssa/popcount2.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-fnsummary.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-scalar-evolution.c
trunk/gcc/tree-ssa-loop-ivopts.c
trunk/gcc/tree-ssa-loop-niter.c

[Bug tree-optimization/64946] [AArch64] gcc.target/aarch64/vect-abs-compile.c - "abs" vectorization fails for char/short types

2018-06-16 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64946

--- Comment #24 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Sat Jun 16 21:34:29 2018
New Revision: 261681

URL: https://gcc.gnu.org/viewcvs?rev=261681&root=gcc&view=rev
Log:
gcc/ChangeLog:

2018-06-16  Kugan Vivekanandarajah  

PR middle-end/64946
* cfgexpand.c (expand_debug_expr): Handle ABSU_EXPR.
* config/i386/i386.c (ix86_add_stmt_cost): Likewise.
* dojump.c (do_jump): Likewise.
* expr.c (expand_expr_real_2): Check operand type's sign.
* fold-const.c (const_unop): Handle ABSU_EXPR.
(fold_abs_const): Likewise.
* gimple-pretty-print.c (dump_unary_rhs): Likewise.
* gimple-ssa-backprop.c (backprop::process_assign_use): Likewise.
(strip_sign_op_1): Likewise.
* match.pd: Add new pattern to generate ABSU_EXPR.
* optabs-tree.c (optab_for_tree_code): Handle ABSU_EXPR.
* tree-cfg.c (verify_gimple_assign_unary): Likewise.
* tree-eh.c (operation_could_trap_helper_p): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-vect-patterns.c (vect_recog_sad_pattern): Likewise.
* tree.def (ABSU_EXPR): New.

gcc/c-family/ChangeLog:

2018-06-16  Kugan Vivekanandarajah  

* c-common.c (c_common_truthvalue_conversion): Handle ABSU_EXPR.

gcc/c/ChangeLog:

2018-06-16  Kugan Vivekanandarajah  

* c-typeck.c (build_unary_op): Handle ABSU_EXPR;
* gimple-parser.c (c_parser_gimple_statement): Likewise.
(c_parser_gimple_unary_expression): Likewise.

gcc/cp/ChangeLog:

2018-06-16  Kugan Vivekanandarajah  

* constexpr.c (potential_constant_expression_1): Handle ABSU_EXPR.
* cp-gimplify.c (cp_fold): Likewise.

gcc/testsuite/ChangeLog:

2018-06-16  Kugan Vivekanandarajah  

PR middle-end/64946
* gcc.dg/absu.c: New test.
* gcc.dg/gimplefe-29.c: New test.
* gcc.target/aarch64/pr64946.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/absu.c
trunk/gcc/testsuite/gcc.dg/gimplefe-29.c
trunk/gcc/testsuite/gcc.target/aarch64/pr64946.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/c-family/ChangeLog
trunk/gcc/c-family/c-common.c
trunk/gcc/c/ChangeLog
trunk/gcc/c/c-typeck.c
trunk/gcc/c/gimple-parser.c
trunk/gcc/cfgexpand.c
trunk/gcc/config/i386/i386.c
trunk/gcc/cp/ChangeLog
trunk/gcc/cp/constexpr.c
trunk/gcc/cp/cp-gimplify.c
trunk/gcc/dojump.c
trunk/gcc/expr.c
trunk/gcc/fold-const.c
trunk/gcc/gimple-pretty-print.c
trunk/gcc/gimple-ssa-backprop.c
trunk/gcc/match.pd
trunk/gcc/optabs-tree.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-cfg.c
trunk/gcc/tree-eh.c
trunk/gcc/tree-inline.c
trunk/gcc/tree-pretty-print.c
trunk/gcc/tree-vect-patterns.c
trunk/gcc/tree.def

[Bug fortran/78387] OpenMP segfault/stack size exceeded writing to internal file

2017-10-15 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78387

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kugan at gcc dot gnu.org

--- Comment #17 from kugan at gcc dot gnu.org ---
*** Bug 82555 has been marked as a duplicate of this bug. ***

[Bug libfortran/82555] SPECcpu201 Wrf_s deadlock

2017-10-15 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82555

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #6 from kugan at gcc dot gnu.org ---


*** This bug has been marked as a duplicate of bug 78387 ***

[Bug libfortran/82555] SPECcpu201 Wrf_s deadlock

2017-10-14 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82555

--- Comment #5 from kugan at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #4)
> Actually PR 78387 seems exactly this issue.  Please test with a newer
> version of gfortran.

Thanks Andrew. Looks like this is the issue. So far, current trunk is
continuing without error.

[Bug libgomp/82555] SPECcpu201 Wrf_s deadlock

2017-10-14 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82555

--- Comment #1 from kugan at gcc dot gnu.org ---
My gcc is slightly old. 
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/home/kugan.vivekanandarajah/install/test/usr/local/bin/../libexec/gcc/aarch64-unknown-linux-gnu/8.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: ../gcc-exp2/configure : (reconfigured) ../gcc-exp2/configure
--enable-languages=c,c++,fortran,lto,objc --no-create --no-recursion
Thread model: posix
gcc version 8.0.0 20170822 (experimental) (GCC)

I will try with the latest version.

[Bug libgomp/82555] New: SPECcpu201 Wrf_s deadlock

2017-10-14 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82555

Bug ID: 82555
   Summary: SPECcpu201 Wrf_s deadlock
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libgomp
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Wrf_s hangs or deadlocks when run on 48 threads (cores). It doesn't always
happen; I have to run with --iterations=111 and it will eventually happen,
sometimes in the 2nd iteration and sometimes much later.

I attached the process to gdb and the back trace is:
(gdb) bt
#0  0x01019924 in __lll_lock_wait (futex=futex@entry=0x2c3b1e0
<_gfortrani_unit_lock>, private=0) at lowlevellock.c:43
#1  0x01012cbc in __pthread_mutex_lock (mutex=0x2c3b1e0
<_gfortrani_unit_lock>) at pthread_mutex_lock.c:80
#2  0x00fd20ac in __gthread_mutex_lock (__mutex=0x2c3b1e0
<_gfortrani_unit_lock>) at ../libgcc/gthr-default.h:748
#3  _gfortrani_close_units () at ../../../gcc-exp2/libgfortran/io/unit.c:835
#4  0x0103950c in __libc_csu_fini ()
#5  0x0103f068 in __run_exit_handlers ()
#6  0x0103f0b0 in exit ()
#7  0x00fc6e60 in _gfortrani_exit_error (status=1, status@entry=3) at
../../../gcc-exp2/libgfortran/runtime/error.c:196
#8  0x00fc7314 in _gfortrani_internal_error
(cmp=cmp@entry=0xcdf23d00, 
message=message@entry=0x11548a8 "stash_internal_unit(): Stack Size
Exceeded") at ../../../gcc-exp2/libgfortran/runtime/error.c:422
#9  0x00fd1a84 in _gfortrani_stash_internal_unit (dtp=0xcdf23d00)
at ../../../gcc-exp2/libgfortran/io/unit.c:549
#10 0x00fd0f6c in _gfortran_st_write_done (dtp=0xcdf23d00) at
../../../gcc-exp2/libgfortran/io/transfer.c:4168
#11 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#12 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#13 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#14 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#15 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#16 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#17 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#18 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#19 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#20 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#21 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#22 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#23 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#24 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#25 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#26 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#27 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#28 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#29 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#30 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#31 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#32 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#33 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#34 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#35 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#36 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#37 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#38 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#39 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#40 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#41 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#42 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#43 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#44 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#45 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()
#46 0x00db933c in __module_ra_rrtm_MOD_rrtmlwrad ()

I am running this on AArch64, but I don't think this is an AArch64-specific
issue. Is anyone else seeing this?

[Bug middle-end/82479] missing popcount builtin detection

2017-10-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82479

--- Comment #4 from kugan at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #2)
> Confirmed. How useful this optimization is questionable.

This code is part of spec2017/deepsjeng. There is some gain if we can detect it.

> 
> Gcc has __builtin_popcount which can be used.

I agree.

[Bug middle-end/82479] missing popcount builtin detection

2017-10-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82479

--- Comment #1 from kugan at gcc dot gnu.org ---
gcc trunk generates:
PopCount:
mov w2, 0
cbz x0, .L1
.p2align 3
.L3:
sub x1, x0, #1
add w2, w2, 1
ands x0, x0, x1
bne .L3
.L1:
mov w0, w2
ret

[Bug middle-end/82479] New: missing popcount builtin detection

2017-10-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82479

Bug ID: 82479
   Summary: missing popcount builtin detection
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

gcc does not have support for detecting the popcount builtin idiom. As a
result, gcc generates bad code for

int PopCount (long b) {
    int c = 0;

    while (b) {
        b &= b - 1;
        c++;
    }
    return c;
}

clang seems to do that and generates (for aarch64):

_Z8PopCounty:
fmov d0, x0
cnt  v0.8b, v0.8b
uaddlv  h0, v0.8b
fmov w0, s0
ret
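
For readers who want to check the equivalence the report relies on, here is a minimal sketch in C. It assumes a GCC-compatible compiler (for __builtin_popcountl) and uses unsigned long instead of the report's long, so that b - 1 never overflows a signed type. The helper names popcount_loop, popcount_builtin and check_popcount_samples are made up for illustration and do not come from the bug report.

```c
#include <assert.h>

/* Loop form from the report, switched to unsigned long (an assumption made
   here so that b - 1 never overflows a signed type).  */
static int popcount_loop (unsigned long b)
{
  int c = 0;
  while (b)
    {
      b &= b - 1;  /* clear the lowest set bit */
      c++;
    }
  return c;
}

/* Same result through the GCC/clang builtin.  */
static int popcount_builtin (unsigned long b)
{
  return __builtin_popcountl (b);
}

/* Hypothetical helper: check that both forms agree on a few inputs.  */
static void check_popcount_samples (void)
{
  static const unsigned long samples[] = { 0UL, 1UL, 0xffUL, 0x80000001UL };
  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (popcount_loop (samples[i]) == popcount_builtin (samples[i]));
}
```

The loop runs once per set bit, so recognising it as a population count lets the compiler replace a data-dependent loop with a short fixed instruction sequence on targets that have one.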

[Bug tree-optimization/81558] Loop not vectorized

2017-07-26 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

--- Comment #2 from kugan at gcc dot gnu.org ---

> Does LLVM do a runtime alias check here?  For foo1 GCC adds a runtime alias
> check
> (BB vectorization cannot version for aliasing).

Yes. LLVM does not seem to be unrolling the inner loop. As you said, when
disabling cunrolli it works. cunroll pass will unroll after loop vectorisation.
Can anything be done with the heuristics for this case? Thanks.

[Bug middle-end/81558] New: Loop not vectorized

2017-07-26 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

Bug ID: 81558
   Summary: Loop not vectorized
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kugan at gcc dot gnu.org
  Target Milestone: ---

For the testcase:

struct I
{
  int opix_x;
  int opix_y;
};

//#define R 
#define R __restrict__
extern struct I * R img;
extern unsigned short ** R imgY_org;
extern unsigned short orig_blocks[256];

void foo1 (int n)
{
  int x = 1, y = 1;
  unsigned short *orgptr=orig_blocks;
  // Vectorized
  for (y = 0; y < img->opix_y; y++)
    for (x = 0; x < img->opix_x; x++)
      *orgptr++ = imgY_org [y][x];
}

void foo2 (int n)
{
  int x = 1, y = 1;
  unsigned short *orgptr=orig_blocks;
  // Not vectorized
  for (y = img->opix_y; y < img->opix_y+16; y++)
    for (x = img->opix_x; x < img->opix_x+16; x++)
      *orgptr++ = imgY_org [y][x];
}

Loop in foo2 is not vectorized.

In the *.156t.vect, I see:
Creating dr for *_40
analyze_innermost: failed: evolution of base is not affine.
base_address: 
offset from base address: 
constant offset from base address: 
step: 
aligned to: 
base_object: *_40


LLVM seems to be able to vectorize this.

[Bug tree-optimization/80612] [7/8 Regression] ICE in get_range_info, at tree-ssanames.c:375

2017-05-03 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80612

--- Comment #5 from kugan at gcc dot gnu.org ---
(In reply to Marek Polacek from comment #4)
> This should fix it:
> 
> --- a/gcc/calls.c
> +++ b/gcc/calls.c
> @@ -1270,7 +1270,7 @@ get_size_range (tree exp, tree range[2])
>  
>wide_int min, max;
>enum value_range_type range_type
> -= (TREE_CODE (exp) == SSA_NAME
> += ((TREE_CODE (exp) == SSA_NAME && INTEGRAL_TYPE_P (TREE_TYPE (exp)))
> ? get_range_info (exp, &min, &max) : VR_VARYING);
>  
>if (range_type == VR_VARYING)

Looked at the other uses of get_range_info too. There are uses of this in
gcc/gimple-ssa-warn-alloca.c without the check for INTEGRAL_TYPE_P but I think
it is intentional.

[Bug lto/78140] [7 Regression] libxul -flto uses 1GB more memory than gcc-6

2017-01-22 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78140

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kugan at gcc dot gnu.org

--- Comment #26 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #20)
> Look at tree-ssanames.c:range_info_def for "tricks" (make them variable
> size):
> 
> /* Value range information for SSA_NAMEs representing non-pointer variables.
> */
> 
> struct GTY ((variable_size)) range_info_def {
>   /* Minimum, maximum and nonzero bits.  */
>   TRAILING_WIDE_INT_ACCESSOR (min, ints, 0)
>   TRAILING_WIDE_INT_ACCESSOR (max, ints, 1)
>   TRAILING_WIDE_INT_ACCESSOR (nonzero_bits, ints, 2)
>   trailing_wide_ints <3> ints;
> };

I am working on a patch to change ipa vrp based on the above.
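
As a rough illustration of the "variable size" trick quoted above, the following plain C sketch stores three values (min, max, nonzero bits) back to back in one trailing array whose length depends on the precision. It is only an analogy built on a C flexible array member, not GCC's trailing_wide_ints implementation; struct var_range and its helper functions are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical analogue of a variable-size range record.  */
struct var_range
{
  unsigned int len;        /* words used per stored value */
  unsigned long words[];   /* min, max, nonzero bits, back to back */
};

/* Allocate a record sized for values of PRECISION bits.  */
static struct var_range *var_range_alloc (unsigned int precision)
{
  assert (precision > 0);
  unsigned int word_bits = sizeof (unsigned long) * CHAR_BIT;
  unsigned int len = (precision + word_bits - 1) / word_bits;
  struct var_range *r = calloc (1, sizeof *r + 3 * len * sizeof r->words[0]);
  if (r)
    r->len = len;
  return r;
}

/* Accessors: each value occupies LEN consecutive words.  */
static unsigned long *var_range_min (struct var_range *r)
{
  return r->words;
}

static unsigned long *var_range_max (struct var_range *r)
{
  return r->words + r->len;
}

static unsigned long *var_range_nonzero_bits (struct var_range *r)
{
  return r->words + 2 * r->len;
}
```

The point of the layout is that a record for a narrow type pays for only three words, while wider types get proportionally more, instead of every record reserving space for the widest supported integer.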

[Bug tree-optimization/78721] [7 Regression] ICE on valid code at -O2 and -O3 on x86_64-linux-gnu: in set_value_range, at tree-vrp.c:371

2016-12-09 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78721

--- Comment #4 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Fri Dec  9 19:47:10 2016
New Revision: 243501

URL: https://gcc.gnu.org/viewcvs?rev=243501&root=gcc&view=rev
Log:
gcc/testsuite/ChangeLog:

2016-12-09  Kugan Vivekanandarajah  <kug...@linaro.org>

PR ipa/78721
* gcc.dg/pr78721.c: New test.

gcc/ChangeLog:

2016-12-09  Kugan Vivekanandarajah  <kug...@linaro.org>

PR ipa/78721
* ipa-cp.c (propagate_vr_accross_jump_function): drop_tree_overflow
after fold_convert.


Added:
trunk/gcc/testsuite/gcc.dg/pr78721.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-cp.c
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/78721] [7 Regression] ICE on valid code at -O2 and -O3 on x86_64-linux-gnu: in set_value_range, at tree-vrp.c:371

2016-12-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78721

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kugan at gcc dot gnu.org

--- Comment #3 from kugan at gcc dot gnu.org ---
Created attachment 40280
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=40280&action=edit
untested patch

[Bug tree-optimization/77862] [7 Regression] ice in add_equivalence

2016-12-07 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77862

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from kugan at gcc dot gnu.org ---
Fixed in trunk.

[Bug tree-optimization/72835] [7 Regression] Incorrect arithmetic optimization involving bitfield arguments

2016-11-21 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72835

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from kugan at gcc dot gnu.org ---
Fixed in trunk.

[Bug tree-optimization/71408] [7 Regression] wrong code at -Os and above on x86_64-linux-gnu

2016-11-21 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71408

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from kugan at gcc dot gnu.org ---
Fixed in trunk.

[Bug tree-optimization/40921] missed optimization: x + (-y * z * z) => x - y * z * z

2016-11-21 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40921

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||kugan at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #6 from kugan at gcc dot gnu.org ---
Fixed in trunk.

[Bug ipa/78296] [7 regression] test case gcc.dg/ipa/vrp7.c fails starting with r242032

2016-11-17 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78296

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from kugan at gcc dot gnu.org ---
Fixed with r242368.

[Bug c/78365] [7 Regression] ICE in determine_value_range, at tree-ssa-loop-niter.c:413

2016-11-17 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78365

--- Comment #6 from kugan at gcc dot gnu.org ---
(In reply to Richard Biener from comment #5)
> IPA has to deal with argument mismatches (I think I've said this elsewhere).

As I understand it, this is along the lines of what you found earlier, but it
is a different issue. I posted a patch at
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01878.html for review.

[Bug c/78365] [7 Regression] ICE in determine_value_range, at tree-ssa-loop-niter.c:413

2016-11-15 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78365

--- Comment #4 from kugan at gcc dot gnu.org ---
bug320.c also has the same issue:

static void finddpos (coord *,int,int,int,int);

bug320.c +10093 has:
static void
finddpos(cc, xl,yl,xh,yh)
coord *cc;
xchar xl,yl,xh,yh;

[Bug c/78365] [7 Regression] ICE in determine_value_range, at tree-ssa-loop-niter.c:413

2016-11-15 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78365

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kugan at gcc dot gnu.org

--- Comment #3 from kugan at gcc dot gnu.org ---
Reduced testcase looks invalid:

a, b, c;
char d;
static fn1(int *, int);
fn1(cc, yh) int *cc;
char yh;
{
  char y;
  a = fn2(c - b + 1);
  for (; y <= yh; y++)
    ;
}
fn3() {
  fn1(fn3, 1);
  fn1(fn3, d - 1);
}


static fn1(int *, int); is the prototype
and then we have

fn1(cc, yh) int *cc;
char yh;

The second argument is now char. I think the FE should reject this.

[Bug ipa/78258] [7 Regression] ICE in compare_values_warnv, at tree-vrp.c:1218

2016-11-14 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78258

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #7 from kugan at gcc dot gnu.org ---
Duplicate and fixed.

*** This bug has been marked as a duplicate of bug 78121 ***

[Bug tree-optimization/78121] [7 Regression] ice in set_value_range

2016-11-14 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||gerhard.steinmetz.fortran@t
   ||-online.de

--- Comment #9 from kugan at gcc dot gnu.org ---
*** Bug 78258 has been marked as a duplicate of this bug. ***

[Bug ipa/78258] [7 Regression] ICE in compare_values_warnv, at tree-vrp.c:1218

2016-11-13 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78258

--- Comment #5 from kugan at gcc dot gnu.org ---
Looks like a dup of PR78121, which is fixed. z1.f90 is now working.

[Bug ipa/78296] [7 regression] test case gcc.dg/ipa/vrp7.c fails starting with r242032

2016-11-11 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78296

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||amker at gcc dot gnu.org

--- Comment #2 from kugan at gcc dot gnu.org ---
*** Bug 78316 has been marked as a duplicate of this bug. ***

[Bug ipa/78316] FAIL: gcc.dg/ipa/vrp7.c scan-ipa-dump-times cp "Setting value range of param 0 \\[-10, 9\\]" 1

2016-11-11 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78316

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||kugan at gcc dot gnu.org
 Resolution|--- |DUPLICATE

--- Comment #3 from kugan at gcc dot gnu.org ---
duplicate.

*** This bug has been marked as a duplicate of bug 78296 ***

[Bug ipa/78296] [7 regression] test case gcc.dg/ipa/vrp7.c fails starting with r242032

2016-11-10 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78296

kugan at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |kugan at gcc dot gnu.org

--- Comment #1 from kugan at gcc dot gnu.org ---
(In reply to Bill Seurer from comment #0)
> spawn /home/seurer/gcc/build/gcc-test2/gcc/xgcc
> -B/home/seurer/gcc/build/gcc-test2/gcc/
> /home/seurer/gcc/gcc-test2/gcc/testsuite/gcc.dg/ipa/vrp7.c
> -fno-diagnostics-show-caret -fdiagnostics-color=never -O2
> -fdump-ipa-cp-details -S -o vrp7.s
> PASS: gcc.dg/ipa/vrp7.c (test for excess errors)
> FAIL: gcc.dg/ipa/vrp7.c scan-ipa-dump-times cp "Setting value range of param
> 0 \\[-10, 9\\]" 1
> 
>   === gcc Summary ===
> 
> # of expected passes  1
> # of unexpected failures  1

Thanks for the report. This is expected, as I have reverted r241990, which does
this optimization. I will repost r241990 as soon as I have fixed the bootstrap
comparison issue.

[Bug ipa/78268] [7 Regression] internal compiler error: Segmentation fault

2016-11-09 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78268

--- Comment #1 from kugan at gcc dot gnu.org ---
(In reply to Markus Trippelsdorf from comment #0)
> Either r241990 or r241989 causes a new ICE during Firefox build:
> 
> /home/trippels/gecko-dev/rdf/base/rdfutil.cpp:111:1: internal compiler
> error: Segmentation fault
>  }
>  ^
> 0x10b6b1d3 crash_signal
> ../../gcc/gcc/toplev.c:338
> 0x108308dc unshare_expr_without_location(tree_node*)
> ../../gcc/gcc/gimplify.c:978
> 0x10903163 ipa_set_jf_arith_pass_through
> ../../gcc/gcc/ipa-prop.c:468
> 0x10903163 update_jump_functions_after_inlining
> ../../gcc/gcc/ipa-prop.c:2645
> 0x10915f8b propagate_info_to_inlined_callees
> ../../gcc/gcc/ipa-prop.c:3409
> 0x10917c1f ipa_propagate_indirect_call_infos(cgraph_edge*, vec<cgraph_edge*,
> va_heap, vl_ptr>*)
> ../../gcc/gcc/ipa-prop.c:3561
> 0x113c401b inline_call(cgraph_edge*, bool, vec<cgraph_edge*, va_heap,
> vl_ptr>*, int*, bool, bool*)
> ../../gcc/gcc/ipa-inline-transform.c:447
> 0x113b973b inline_small_functions
> ../../gcc/gcc/ipa-inline.c:2029
> 0x113b973b ipa_inline
> ../../gcc/gcc/ipa-inline.c:2439
> 0x113b973b execute
> ../../gcc/gcc/ipa-inline.c:2850
> 
> 
> Reducing.

Sorry about the breakage. Can you please attach the preprocessed source file to
reproduce this?

[Bug tree-optimization/78121] [7 Regression] ice in set_value_range

2016-11-08 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121

--- Comment #7 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Wed Nov  9 01:41:26 2016
New Revision: 241989

URL: https://gcc.gnu.org/viewcvs?rev=241989&root=gcc&view=rev
Log:
Fix ice in set_value_range
gcc/ChangeLog:

2016-11-09  Kugan Vivekanandarajah  <kug...@linaro.org>

PR ipa/78121
* ipa-cp.c (propagate_vr_accross_jump_function): Pass param type.
Also fold constant passed as argument while computing value range.
(propagate_constants_accross_call): Pass param type.
* ipa-prop.c: export ipa_get_callee_param_type.
* ipa-prop.h: export ipa_get_callee_param_type.

gcc/testsuite/ChangeLog:

2016-11-09  Kugan Vivekanandarajah  <kug...@linaro.org>

PR ipa/78121
* gcc.dg/ipa/pr78121.c: New test.



Added:
trunk/gcc/testsuite/gcc.dg/ipa/pr78121.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-cp.c
trunk/gcc/ipa-prop.c
trunk/gcc/ipa-prop.h
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/78121] [7 Regression] ice in set_value_range

2016-11-05 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121

--- Comment #6 from kugan at gcc dot gnu.org ---
(In reply to David Binderman from comment #5)
> (In reply to kugan from comment #4)
> > Created attachment 39904 [details]
> > untested patch
> > 
> > testing this patch
> 
> patch any good ?

Posted at https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02309.html
waiting for Honza's approval.

[Bug tree-optimization/78121] [7 Regression] ice in set_value_range

2016-10-26 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121

--- Comment #4 from kugan at gcc dot gnu.org ---
Created attachment 39904
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39904&action=edit
untested patch

testing this patch

[Bug tree-optimization/78121] [7 Regression] ice in set_value_range

2016-10-26 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78121

kugan at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kugan at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |kugan at gcc dot gnu.org

--- Comment #3 from kugan at gcc dot gnu.org ---
Looks like an ipa-vrp issue. I will have a look.

[Bug tree-optimization/77921] [7 Regression] tree-ssanames.c miscompiled during PGO bootstrap

2016-10-10 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77921

--- Comment #4 from kugan at gcc dot gnu.org ---
Sorry about the breakage. I will try to reproduce it.

(In reply to Markus Trippelsdorf from comment #1)
> gcc version 7.0.0 20161007 was fine
Are you saying that this issue has gone latent? 20161007 should have
early-vrp and ipa-vrp.

[Bug tree-optimization/77862] [7 Regression] ice in add_equivalence

2016-10-06 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77862

--- Comment #5 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Thu Oct  6 19:58:46 2016
New Revision: 240842

URL: https://gcc.gnu.org/viewcvs?rev=240842&root=gcc&view=rev
Log:
Fix PR77862
gcc/testsuite/ChangeLog:

2016-10-06  Kugan Vivekanandarajah  <kug...@linaro.org>

PR tree-optimization/77862
* gcc.dg/pr77862.c: New test.

gcc/ChangeLog:

2016-10-06  Kugan Vivekanandarajah  <kug...@linaro.org>

PR tree-optimization/77862
* tree-vrp.c (add_equivalence): Use get_value_range so that
num_vr_values is checked before accessing vr_values.



Added:
trunk/gcc/testsuite/gcc.dg/pr77862.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vrp.c

[Bug tree-optimization/77862] [7 Regression] ice in add_equivalence

2016-10-05 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77862

--- Comment #4 from kugan at gcc dot gnu.org ---
patch posted for review at:
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00349.html

[Bug tree-optimization/77677] [7 Regression] ICE at -O1 and above in both 32-bit and 64-bit modes on x86_64-linux-gnu (internal compiler error: in set_value_range, at tree-vrp.c:361)

2016-09-26 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77677

--- Comment #11 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Tue Sep 27 03:41:14 2016
New Revision: 240517

URL: https://gcc.gnu.org/viewcvs?rev=240517&root=gcc&view=rev
Log:
Fix ipa-vrp convert value_range

gcc/ChangeLog:

2016-09-27  Kugan Vivekanandarajah  <kug...@linaro.org>

PR ipa/77677
* ipa-prop.c (ipa_compute_jump_functions_for_edge): Use
extract_range_from_unary_expr to convert value_range.
* tree-vrp.c (extract_range_from_unary_expr_1): Rename to.
(extract_range_from_unary_expr): This.
* tree-vrp.h (extract_range_from_unary_expr): Declare.

gcc/testsuite/ChangeLog:

2016-09-27  Kugan Vivekanandarajah  <kug...@linaro.org>

PR ipa/77677
* gcc.dg/torture/pr77677-2.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/torture/pr77677-2.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ipa-prop.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vrp.c
trunk/gcc/tree-vrp.h

[Bug tree-optimization/77719] [7 Regression] ICE in pp_string, at pretty-print.c:955

2016-09-26 Thread kugan at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77719

--- Comment #7 from kugan at gcc dot gnu.org ---
Author: kugan
Date: Mon Sep 26 18:16:23 2016
New Revision: 240505

URL: https://gcc.gnu.org/viewcvs?rev=240505&root=gcc&view=rev
Log:
Fix PR77719
gcc/testsuite/ChangeLog:

2016-09-26  Kugan Vivekanandarajah  <kug...@linaro.org>

PR middle-end/77719
* gfortran.dg/pr77719.f90: New test.

gcc/ChangeLog:

2016-09-26  Kugan Vivekanandarajah  <kug...@linaro.org>

PR middle-end/77719
* tree-ssa-reassoc.c (make_new_ssa_for_def): Use gimple_get_lhs to get
lhs instead of gimple_assign_lhs as stmt can be builtins too.



Added:
trunk/gcc/testsuite/gfortran.dg/pr77719.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-reassoc.c
