Re: [PATCH] Fix ICE with i?86 stv (PR target/70110)

2016-03-07 Thread Uros Bizjak
On Tue, Mar 8, 2016 at 12:41 AM, Jakub Jelinek  wrote:
> Hi!
>
> What the STV pass does is clearly wrong, it meant to grab a sequence
> of instructions out of a sequence and emit it somewhere else, but
> calls end_sequence too late, so the sequence bookkepping vars get
> corrupted.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2016-03-08  Jakub Jelinek  
>
> PR target/70110
> * config/i386/i386.c (scalar_chain::make_vector_copies,
> scalar_chain::convert_reg): Call end_sequence in between
> get_insns and emit_conversion_insns rather than after both
> calls.
>
> * gcc.dg/pr70110.c: New test.

Uh, OK.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2016-03-04 21:27:48.0 +0100
> +++ gcc/config/i386/i386.c  2016-03-07 10:03:44.465981310 +0100
> @@ -3272,8 +3272,9 @@ scalar_chain::make_vector_copies (unsign
> gen_rtx_SUBREG (SImode, reg, 4));
> emit_move_insn (vreg, tmp);
>   }
> -   emit_conversion_insns (get_insns (), insn);
> +   rtx_insn *seq = get_insns ();
> end_sequence ();
> +   emit_conversion_insns (seq, insn);
>
> if (dump_file)
>   fprintf (dump_file,
> @@ -3348,8 +3349,9 @@ scalar_chain::convert_reg (unsigned regn
>   emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
>   adjust_address (tmp, SImode, 4));
> }
> - emit_conversion_insns (get_insns (), insn);
> + rtx_insn *seq = get_insns ();
>   end_sequence ();
> + emit_conversion_insns (seq, insn);
>
>   if (dump_file)
> fprintf (dump_file,
> --- gcc/testsuite/gcc.dg/pr70110.c.jj   2016-03-07 10:16:22.211631464 +0100
> +++ gcc/testsuite/gcc.dg/pr70110.c  2016-03-07 10:16:05.0 +0100
> @@ -0,0 +1,39 @@
> +/* PR target/70110 */
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
> +
> +int a, c, d, f, h;
> +long long b;
> +
> +static inline void
> +foo (void)
> +{
> +  if (a)
> +foo ();
> +  b = c;
> +}
> +
> +static inline void
> +bar (int p)
> +{
> +  if (p)
> +f = 0;
> +  b |= c;
> +}
> +
> +void
> +baz (int g, int i)
> +{
> +  for (b = d; (d = 1) != 0; )
> +{
> +  if (a)
> +   foo ();
> +  b |= c;
> +  bar (h);
> +  bar (g);
> +  bar (h);
> +  bar (i);
> +  bar (h);
> +}
> +}
>
> Jakub


Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops

2016-03-07 Thread Andrew Pinski
On Mon, Mar 7, 2016 at 8:12 PM, Yangfei (Felix)  wrote:
>> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix)  
>> wrote:
>> > Hi,
>> >
>> > As discussed in LKML:
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html, 
>> the
>> cost of changing a cache line
>> > from shared to exclusive state can be significant on aarch64 cores,
>> especially when this is triggered by an exclusive store, since it may
>> > result in having to retry the transaction.
>> > This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
>> cache lines for write prior to ldxr/stxr loops generated by the ll/sc atomic
>> routines.
>> > Bootstrapped on AArch64 server, is it OK?
>>
>>
>> I don't think this is a good thing in general.  For an example on ThunderX, 
>> the
>> prefetch just adds a cycle for no benefit.  This really depends on the
>> micro-architecture of the core and how LDXR/STXR are
>> implemented.   So after this patch, it will slow down ThunderX.
>>
>> Thanks,
>> Andrew Pinski
>>
>
> Hi Andrew,
>
>I am not quite clear about the ThunderX micro-arch.  But, Yes, I agree it 
> depends on the micro-architecture of the core.
>As the mentioned kernel patch is merged upstream, I think the added 
> prefetch instruction in atomic routines is good for most of AArch64 cores in 
> the market.
>If it does nothing good for ThunderX, then how about adding some checking 
> here?  I mean disabling the the generation of the prfm if we are tuning for 
> ThunderX.

No it is not just not do any good, it actually causes worse
performance for ThunderX.  How about only doing it for the
micro-architecture where it helps and also not do it for generic since
it hurts ThunderX so much.

Thanks,
Andrew

>
> Thanks,
> Felix


Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops

2016-03-07 Thread Yangfei (Felix)
> On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix)  wrote:
> > Hi,
> >
> > As discussed in LKML:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html, 
> the
> cost of changing a cache line
> > from shared to exclusive state can be significant on aarch64 cores,
> especially when this is triggered by an exclusive store, since it may
> > result in having to retry the transaction.
> > This patch makes use of the "prfm PSTL1STRM" instruction to prefetch
> cache lines for write prior to ldxr/stxr loops generated by the ll/sc atomic
> routines.
> > Bootstrapped on AArch64 server, is it OK?
> 
> 
> I don't think this is a good thing in general.  For an example on ThunderX, 
> the
> prefetch just adds a cycle for no benefit.  This really depends on the
> micro-architecture of the core and how LDXR/STXR are
> implemented.   So after this patch, it will slow down ThunderX.
> 
> Thanks,
> Andrew Pinski
> 

Hi Andrew,

   I am not quite clear about the ThunderX micro-arch.  But, Yes, I agree it 
depends on the micro-architecture of the core.  
   As the mentioned kernel patch is merged upstream, I think the added prefetch 
instruction in atomic routines is good for most of AArch64 cores in the market. 
 
   If it does nothing good for ThunderX, then how about adding some checking 
here?  I mean disabling the the generation of the prfm if we are tuning for 
ThunderX.  
   
Thanks,
Felix


Re: [PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops

2016-03-07 Thread Andrew Pinski
On Mon, Mar 7, 2016 at 7:27 PM, Yangfei (Felix)  wrote:
> Hi,
>
> As discussed in LKML: 
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html, 
> the cost of changing a cache line
> from shared to exclusive state can be significant on aarch64 cores, 
> especially when this is triggered by an exclusive store, since it may
> result in having to retry the transaction.
> This patch makes use of the "prfm PSTL1STRM" instruction to prefetch 
> cache lines for write prior to ldxr/stxr loops generated by the ll/sc atomic 
> routines.
> Bootstrapped on AArch64 server, is it OK?


I don't think this is a good thing in general.  For an example on
ThunderX, the prefetch just adds a cycle for no benefit.  This really
depends on the micro-architecture of the core and how LDXR/STXR are
implemented.   So after this patch, it will slow down ThunderX.

Thanks,
Andrew Pinski

>
> Thanks,
> Felix


[PATCH, AArch64] atomics: prefetch the destination for write prior to ldxr/stxr loops

2016-03-07 Thread Yangfei (Felix)
Hi,

As discussed in LKML: 
http://lists.infradead.org/pipermail/linux-arm-kernel/2015-July/355996.html, 
the cost of changing a cache line
from shared to exclusive state can be significant on aarch64 cores, 
especially when this is triggered by an exclusive store, since it may
result in having to retry the transaction. 
This patch makes use of the "prfm PSTL1STRM" instruction to prefetch cache 
lines for write prior to ldxr/stxr loops generated by the ll/sc atomic 
routines. 
Bootstrapped on AArch64 server, is it OK? 

Thanks,
Felix


aarch64-atomics-v0.diff
Description: aarch64-atomics-v0.diff


Re: [Patch, fortran] PRs 70031 and 69524 - submodule tweaks

2016-03-07 Thread Jerry DeLisle
On 03/06/2016 10:18 AM, Paul Richard Thomas wrote:
> Dear All,
> 
> These are two rather trivial modifications to permit, 'module' to
> appear at any position in the list of prefixes in the procedure
> declaration and to allow module procedures to appear within a module
> contains section. I was rather astonished at this latter since it does
> seem to be rather contrary to having an module interface declaration
> for the same procedure. However, from the Fortran 2008 standard:
> 
> C1247 (R1225) MODULE shall appear only in the function-stmt or
> subroutine-stmt of a module subprogram or of a nonabstract interface
> body that is declared in the scoping unit of a module or submodule.
> 
> Whilst I was about it, I prevented an ICE from occurring following the
> error generated by decl.c(copy_prefix), when prefixes in the interface
> are repeated in the procedure declaration. I have not included a test
> for this, since I am not convinced that repeating the prefixes is
> strictly speaking an error. In fact, it would make a lot of sense to
> repeat the interface declaration completely in the submodule
> declaration. I will investigate further before committing. The fix is
> even more trivial than preventing the ICE.
> 
> Since the patch is entirely permissive, it will not prevent correct
> code from compiling. In this sense, it is safe for stage 4.
> 
> Bootstrapped and regtested on FC21/x86_64. OK for trunk?
> 
> Best regards
> 
> Paul
> 
OK Paul, thanks for work.

Jerry


[PATCH] Fix ICE with i?86 stv (PR target/70110)

2016-03-07 Thread Jakub Jelinek
Hi!

What the STV pass does is clearly wrong, it meant to grab a sequence
of instructions out of a sequence and emit it somewhere else, but
calls end_sequence too late, so the sequence bookkepping vars get
corrupted.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2016-03-08  Jakub Jelinek  

PR target/70110
* config/i386/i386.c (scalar_chain::make_vector_copies,
scalar_chain::convert_reg): Call end_sequence in between
get_insns and emit_conversion_insns rather than after both
calls.

* gcc.dg/pr70110.c: New test.

--- gcc/config/i386/i386.c.jj   2016-03-04 21:27:48.0 +0100
+++ gcc/config/i386/i386.c  2016-03-07 10:03:44.465981310 +0100
@@ -3272,8 +3272,9 @@ scalar_chain::make_vector_copies (unsign
gen_rtx_SUBREG (SImode, reg, 4));
emit_move_insn (vreg, tmp);
  }
-   emit_conversion_insns (get_insns (), insn);
+   rtx_insn *seq = get_insns ();
end_sequence ();
+   emit_conversion_insns (seq, insn);
 
if (dump_file)
  fprintf (dump_file,
@@ -3348,8 +3349,9 @@ scalar_chain::convert_reg (unsigned regn
  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
  adjust_address (tmp, SImode, 4));
}
- emit_conversion_insns (get_insns (), insn);
+ rtx_insn *seq = get_insns ();
  end_sequence ();
+ emit_conversion_insns (seq, insn);
 
  if (dump_file)
fprintf (dump_file,
--- gcc/testsuite/gcc.dg/pr70110.c.jj   2016-03-07 10:16:22.211631464 +0100
+++ gcc/testsuite/gcc.dg/pr70110.c  2016-03-07 10:16:05.0 +0100
@@ -0,0 +1,39 @@
+/* PR target/70110 */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+
+int a, c, d, f, h;
+long long b;
+
+static inline void
+foo (void)
+{
+  if (a) 
+foo ();
+  b = c;
+}
+
+static inline void 
+bar (int p)
+{
+  if (p)
+f = 0; 
+  b |= c; 
+} 
+
+void
+baz (int g, int i)
+{
+  for (b = d; (d = 1) != 0; )
+{
+  if (a)
+   foo ();
+  b |= c;
+  bar (h); 
+  bar (g); 
+  bar (h); 
+  bar (i); 
+  bar (h);
+}
+}

Jakub


Re: IRA costs tweaks, PR 56069

2016-03-07 Thread Richard Sandiford
Bernd Schmidt  writes:
> On 03/02/2016 10:53 PM, Vladimir Makarov wrote:
>>> 2. update_costs_from_allocno records a cost update not just for the
>>> initial allocno, but for each of the visited ones. I can sort of see
>>> an argument for doing that (let's say if you assign an allocno in the
>>> middle of a copy chain you'd want the tail end of the chain to be
>>> reset), but in practice I don't think the present algorithm can work
>>> at all. In the case of an allocno in the middle of a copy chain the
>>> restore would progress in both directions, and in any case it looks
>>> like this approach can end up double-counting things when restoring
>>> costs.
>>>
>> It is just a heuristic.  Richard Sandiford proposed this update
>> approach.  Before it we had only updates of allocnos directly connected
>> to allocno in question.  Richard's approach helped to improve code in
>> some cases.  If something works better we should use.  The bechmarking
>> is the best criterium.
>
> Ccing Richard in case he has comments.

TBH I don't remember anything about this now.  Is it:

  https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00541.html

?  I think that was just tweaking the traversal order in an existing
cost update, rather than adding a new one.

You might be talking about a different patch though, sorry.

Richard


Re: [hsa testsuite 5/5] New directory for HSA-specific C testcases

2016-03-07 Thread Jakub Jelinek
On Mon, Feb 29, 2016 at 05:31:53PM +0100, Martin Jambor wrote:
> Is the patch OK for trunk?

Ok, thanks.
> 2016-03-03  Martin Jambor  
> 
>   * testsuite/lib/libgomp.exp
>   (check_effective_target_hsa_offloading_selected_nocache): New.
>   (check_effective_target_hsa_offloading_selected): Likewise.
>   * testsuite/libgomp.hsa.c/c.exp: Likewise.
>   * testsuite/libgomp.hsa.c/alloca-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/bitfield-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/builtins-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/complex-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/formal-actual-args-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/function-call-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/get-level-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/gridify-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/gridify-2.c: Likewise.
>   * testsuite/libgomp.hsa.c/gridify-3.c: Likewise.
>   * testsuite/libgomp.hsa.c/gridify-4.c: Likewise.
>   * testsuite/libgomp.hsa.c/memory-operations-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/pr69568.c: Likewise.
>   * testsuite/libgomp.hsa.c/rotate-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/switch-1.c: Likewise.
>   * testsuite/libgomp.hsa.c/switch-branch-1.c: Likewise.

Jakub


Re: [hsa testsuite 4/5] Adjust libgomp tests that do not work on host fallback

2016-03-07 Thread Jakub Jelinek
On Mon, Feb 29, 2016 at 05:30:05PM +0100, Martin Jambor wrote:
> Tested both with and without HSA (enabled or present).  OK for trunk?

Ok.  These tests are from the Examples 4.0.x document, so we don't want to
diverge too much on them.

> 2016-02-12  Martin Jambor  
> 
> libgomp/
>   * testsuite/libgomp.c/examples-4/async_target-2.c: Only run on
>   non-shared memory accelerators.
>   * testsuite/libgomp.c/examples-4/device-1.c: Likewise.
>   * testsuite/libgomp.c/examples-4/target-5.c: Likewise.
>   * testsuite/libgomp.c/examples-4/target_data-6.c: Likewise.
>   * testsuite/libgomp.c/examples-4/target_data-7.c: Likewise.
>   * testsuite/libgomp.fortran/examples-4/async_target-2.f90: Likewise.
>   * testsuite/libgomp.fortran/examples-4/device-1.f90: Likewise.
>   * testsuite/libgomp.fortran/examples-4/target-5.f90: Likewise.
>   * testsuite/libgomp.fortran/examples-4/target_data-6.f90: Likewise.
>   * testsuite/libgomp.fortran/examples-4/target_data-7.f90: Likewise.

Jakub


Re: [hsa testsuite 3/5] Suppress hsa warnings in libgomp tests

2016-03-07 Thread Jakub Jelinek
On Mon, Feb 29, 2016 at 05:22:36PM +0100, Martin Jambor wrote:
> 2016-03-04  Martin Jambor  
> 
>   * testsuite/lib/libgomp.exp (libgomp_init): Append -Wno-hsa to
>   ALWAYS_CFLAGS.

Ok.

Jakub


Re: [hsa testsuite 2/5] Suppress hsa warnings in compiler gomp tests

2016-03-07 Thread Jakub Jelinek
On Mon, Feb 29, 2016 at 05:02:34PM +0100, Martin Jambor wrote:
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00381.html
> 
> 
> 2016-03-04  Martin Jambor  
> 
>   * c-c++-common/gomp/clauses-1.c: Remove dg-options.
>   * c-c++-common/gomp/if-1.c: Likewise.
>   * c-c++-common/gomp/pr61486-2.c: Likewise.
>   * c-c++-common/gomp/target-teams-1.c: Moved dg-options except -fopenmp
>   to dg-additional-options.
>   * g++.dg/gomp/gomp.exp: Pass -Wno-hsa to all tests.
>   * g++/gomp/target-teams-1.c: Likewise.
>   * gcc.dg/gomp/gomp.exp: Likewise.
>   * gcc.dg/gomp/pr68128-2.c: Moved dg-options except -fopenmp to
>   dg-additional-options.
>   * gfortran.dg/gomp/gomp.exp: Likewise.
>   * gfortran.dg/gomp/target1.f90: Remove dg-options.
>   * gfortran.dg/gomp/target2.f90: Moved dg-options except -fopenmp to
>   dg-additional-options.
>   * gfortran.dg/gomp/target3.f90: Remove dg-options.

Ok.

Jakub


Re: [hsa testsuite 1/5] Gridification tests

2016-03-07 Thread Jakub Jelinek
On Mon, Feb 29, 2016 at 04:55:44PM +0100, Martin Jambor wrote:
> the patch below adds a DejaGNU effective target predicate (is that the
> correct dejagnu term?) offload_hsa so that selected tests can be run
> only if the hsa offloading is enabled.  I hope it is fairly standard
> stuff.  Additionally, it adds one C/C++ and one Fortran testsuite to
> check that gridification happens.
> 
> Tested, both with and without HSA enabled.  OK for trunk?

Ok.
> 2016-02-10  Martin Jambor  
> 
>   * target-supports.exp (check_effective_target_offload_hsa): New.
>   * c-c++-common/gomp/gridify-1.c: New test.
> * gfortran.dg/gomp/gridify-1.f90: Likewise.

Jakub


[hsa testsuite 2/5] Suppress hsa warnings in compiler gomp tests

2016-03-07 Thread Martin Jambor
Hi,

as Jakub requested, this patch deals with HSA "excess errors" in the
gomp compiler testsuite by passing -Wno-hsa to all of them.  After
discussing this in the thread about similar libgomp tests[1] (which
are however handled differently), Jakub expressed preference for
passing the option in default_extra_flags rather than flags so that
names of the tests do not change.

This however requires that the failing tests which use dg-options must
be adjusted.  There is 9 of them, most of them have just superfluous
-fopenmp in them which can be removed because that is the default and
the rest is handled by turning dg-options into dg-additional-options.

OK for trunk?

Thanks,

Martin

[1] https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00381.html


2016-03-04  Martin Jambor  

* c-c++-common/gomp/clauses-1.c: Remove dg-options.
* c-c++-common/gomp/if-1.c: Likewise.
* c-c++-common/gomp/pr61486-2.c: Likewise.
* c-c++-common/gomp/target-teams-1.c: Moved dg-options except -fopenmp
to dg-additional-options.
* g++.dg/gomp/gomp.exp: Pass -Wno-hsa to all tests.
* g++/gomp/target-teams-1.c: Likewise.
* gcc.dg/gomp/gomp.exp: Likewise.
* gcc.dg/gomp/pr68128-2.c: Moved dg-options except -fopenmp to
dg-additional-options.
* gfortran.dg/gomp/gomp.exp: Likewise.
* gfortran.dg/gomp/target1.f90: Remove dg-options.
* gfortran.dg/gomp/target2.f90: Moved dg-options except -fopenmp to
dg-additional-options.
* gfortran.dg/gomp/target3.f90: Remove dg-options.
---
 gcc/testsuite/c-c++-common/gomp/clauses-1.c  | 1 -
 gcc/testsuite/c-c++-common/gomp/if-1.c   | 1 -
 gcc/testsuite/c-c++-common/gomp/pr61486-2.c  | 1 -
 gcc/testsuite/c-c++-common/gomp/target-teams-1.c | 2 +-
 gcc/testsuite/g++.dg/gomp/gomp.exp   | 2 +-
 gcc/testsuite/g++.dg/gomp/target-teams-1.C   | 2 +-
 gcc/testsuite/gcc.dg/gomp/gomp.exp   | 2 +-
 gcc/testsuite/gcc.dg/gomp/pr68128-2.c| 2 +-
 gcc/testsuite/gfortran.dg/gomp/gomp.exp  | 2 +-
 gcc/testsuite/gfortran.dg/gomp/target1.f90   | 1 -
 gcc/testsuite/gfortran.dg/gomp/target2.f90   | 2 +-
 gcc/testsuite/gfortran.dg/gomp/target3.f90   | 1 -
 12 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/clauses-1.c 
b/gcc/testsuite/c-c++-common/gomp/clauses-1.c
index 2d1c352..91aed39 100644
--- a/gcc/testsuite/c-c++-common/gomp/clauses-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/clauses-1.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
 /* { dg-additional-options "-std=c99" { target c } } */
 
 int t;
diff --git a/gcc/testsuite/c-c++-common/gomp/if-1.c 
b/gcc/testsuite/c-c++-common/gomp/if-1.c
index 4ba708c..3a9b538 100644
--- a/gcc/testsuite/c-c++-common/gomp/if-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/if-1.c
@@ -1,5 +1,4 @@
 /* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
 
 void
 foo (int a, int b, int *p, int *q)
diff --git a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c 
b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c
index db97143..4a68023 100644
--- a/gcc/testsuite/c-c++-common/gomp/pr61486-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/pr61486-2.c
@@ -1,6 +1,5 @@
 /* PR middle-end/61486 */
 /* { dg-do compile } */
-/* { dg-options "-fopenmp" } */
 /* { dg-require-effective-target alloca } */
 
 #pragma omp declare target
diff --git a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c 
b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
index 0a707c2..51b8d48 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-teams-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fopenmp -fdump-tree-gimple" } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
 
 int v = 6;
 void bar (int);
diff --git a/gcc/testsuite/g++.dg/gomp/gomp.exp 
b/gcc/testsuite/g++.dg/gomp/gomp.exp
index 7365389..d26596c 100644
--- a/gcc/testsuite/g++.dg/gomp/gomp.exp
+++ b/gcc/testsuite/g++.dg/gomp/gomp.exp
@@ -29,7 +29,7 @@ dg-init
 # Main loop.
 g++-dg-runtest [lsort [concat \
[find $srcdir/$subdir *.C] \
-   [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp"
+   [find $srcdir/c-c++-common/gomp *.c]]] "" "-fopenmp -Wno-hsa"
 
 # All done.
 dg-finish
diff --git a/gcc/testsuite/g++.dg/gomp/target-teams-1.C 
b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
index 0a97de0..f78a608 100644
--- a/gcc/testsuite/g++.dg/gomp/target-teams-1.C
+++ b/gcc/testsuite/g++.dg/gomp/target-teams-1.C
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-fopenmp -fdump-tree-gimple" }
+// { dg-additional-options "-fdump-tree-gimple" }
 
 int v = 6;
 void bar (int);
diff --git a/gcc/testsuite/gcc.dg/gomp/gomp.exp 
b/gcc/testsuite/gcc.dg/gomp/gomp.exp
index 78623fc..b6b5932 100644
--- a/gcc/testsuite/gcc.dg/gomp/gomp.exp
+++ b/gcc/testsuite/gcc.dg/gomp/gomp.exp
@@ -31,7 +31,7 @@ dg-init
 # Main loop.
 dg-runtest [lsort 

[hsa testsuite 5/5] New directory for HSA-specific C testcases

2016-03-07 Thread Martin Jambor
Hi,

we would like a place to have some HSA-specific tests, which would
only run not only when HSA is enabled at configuration time but also
when HSA hardware is present and used for offloading.

I have proposed the first version of this patch as
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01817.html and got some
seedback from Mike Stump in
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00086.html.  I hope I
have incorporated his suggestions.  As I wrote in the cover letter, it
is likely I'll propose similar C++ and Fortran directories in the
future.

Is the patch OK for trunk?

Thanks,

Martin


2016-03-03  Martin Jambor  

* testsuite/lib/libgomp.exp
(check_effective_target_hsa_offloading_selected_nocache): New.
(check_effective_target_hsa_offloading_selected): Likewise.
* testsuite/libgomp.hsa.c/c.exp: Likewise.
* testsuite/libgomp.hsa.c/alloca-1.c: Likewise.
* testsuite/libgomp.hsa.c/bitfield-1.c: Likewise.
* testsuite/libgomp.hsa.c/builtins-1.c: Likewise.
* testsuite/libgomp.hsa.c/complex-1.c: Likewise.
* testsuite/libgomp.hsa.c/formal-actual-args-1.c: Likewise.
* testsuite/libgomp.hsa.c/function-call-1.c: Likewise.
* testsuite/libgomp.hsa.c/get-level-1.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-1.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-2.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-3.c: Likewise.
* testsuite/libgomp.hsa.c/gridify-4.c: Likewise.
* testsuite/libgomp.hsa.c/memory-operations-1.c: Likewise.
* testsuite/libgomp.hsa.c/pr69568.c: Likewise.
* testsuite/libgomp.hsa.c/rotate-1.c: Likewise.
* testsuite/libgomp.hsa.c/switch-1.c: Likewise.
* testsuite/libgomp.hsa.c/switch-branch-1.c: Likewise.
---
 libgomp/testsuite/lib/libgomp.exp  |  53 +++
 libgomp/testsuite/libgomp.hsa.c/alloca-1.c |  25 
 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c   | 160 +
 libgomp/testsuite/libgomp.hsa.c/builtins-1.c   |  97 +
 libgomp/testsuite/libgomp.hsa.c/c.exp  |  42 ++
 libgomp/testsuite/libgomp.hsa.c/complex-1.c|  65 +
 .../testsuite/libgomp.hsa.c/formal-actual-args-1.c |  83 +++
 libgomp/testsuite/libgomp.hsa.c/function-call-1.c  |  50 +++
 libgomp/testsuite/libgomp.hsa.c/get-level-1.c  |  26 
 libgomp/testsuite/libgomp.hsa.c/gridify-1.c|  26 
 libgomp/testsuite/libgomp.hsa.c/gridify-2.c|  26 
 libgomp/testsuite/libgomp.hsa.c/gridify-3.c|  39 +
 libgomp/testsuite/libgomp.hsa.c/gridify-4.c|  45 ++
 .../testsuite/libgomp.hsa.c/memory-operations-1.c  |  92 
 libgomp/testsuite/libgomp.hsa.c/pr69568.c  |  41 ++
 libgomp/testsuite/libgomp.hsa.c/rotate-1.c |  39 +
 libgomp/testsuite/libgomp.hsa.c/switch-1.c | 145 +++
 libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c  | 116 +++
 18 files changed, 1170 insertions(+)
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/alloca-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/bitfield-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/builtins-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/c.exp
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/complex-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/formal-actual-args-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/function-call-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/get-level-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-2.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-3.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/gridify-4.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/memory-operations-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/pr69568.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/rotate-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-1.c
 create mode 100644 libgomp/testsuite/libgomp.hsa.c/switch-branch-1.c

diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index bbc2c26..0d5b6d4 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -395,3 +395,56 @@ proc check_effective_target_openacc_host_selected { } {
 }
 return 0;
 }
+
+# Return 1 if the selected OMP device is actually a HSA device
+
+proc check_effective_target_hsa_offloading_selected_nocache {} {
+global tool
+
+set src {
+   int main () {
+   int v = 1;
+   #pragma omp target map(from:v)
+   v = 0;
+   return v;
+   }
+}
+
+set result [eval [list check_compile hsa_offloading_src executable $src] 
""]
+set lines [lindex $result 0]
+set output [lindex $result 1]
+
+set ok 0
+if { 

[hsa testsuite 4/5] Adjust libgomp tests that do not work on host fallback

2016-03-07 Thread Martin Jambor
Hi,

this patch avoids run-time failures in libgomp testsuite that
curtrently happen when HSA offloading is actually used.  All of these
tests require the offload_device effective target which the patch
changes to offload_device_nonshared_as one.

For some tests, such as libgomp.c/examples-4/device-1.c this is
clearly just the correct thing to do because the test explicitely
checks that changes that happen in a target construct and are not
"mapped" back are not observable on the host.

However, the majority of the tests has a different problem.  If a test
for some reason is not compiled into HSAIL (usually because it would
require the dynamic parallelism path which is disabled or because it
calls abort from within target which HSA so far cannot handle), the
host fallback is called, even though the test actually is not supposed
to be called on it.  Such problematic tests then call
omp_is_initial_device to verify they are not running on the host and
decide to fail when they figure out they are.

Changing the effective target only to devices with non-shared memory
probably isn't the really correct fix.  We basically want to disable
the host fallback for them regardeless of address spaces but I cannot
think of a simple and generic way of doing that.  However, all
testcases for non-shared memory devices were written with disallowed
fallback in mind and so this soulution also gives the desired result.
Perhaps we need something better for the long term, any suggestions
are welcome.

Tested both with and without HSA (enabled or present).  OK for trunk?

Thanks,

Martin

2016-02-12  Martin Jambor  

libgomp/
* testsuite/libgomp.c/examples-4/async_target-2.c: Only run on
non-shared memory accelerators.
* testsuite/libgomp.c/examples-4/device-1.c: Likewise.
* testsuite/libgomp.c/examples-4/target-5.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-6.c: Likewise.
* testsuite/libgomp.c/examples-4/target_data-7.c: Likewise.
* testsuite/libgomp.fortran/examples-4/async_target-2.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/device-1.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target-5.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-6.f90: Likewise.
* testsuite/libgomp.fortran/examples-4/target_data-7.f90: Likewise.
---
 libgomp/testsuite/libgomp.c/examples-4/async_target-2.c | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/device-1.c   | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/target-5.c   | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/target_data-6.c  | 2 +-
 libgomp/testsuite/libgomp.c/examples-4/target_data-7.c  | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/async_target-2.f90 | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/device-1.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/target-5.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/target_data-6.f90  | 2 +-
 libgomp/testsuite/libgomp.fortran/examples-4/target_data-7.f90  | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c 
b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c
index ce63328..0c76f8e 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/async_target-2.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/device-1.c 
b/libgomp/testsuite/libgomp.c/examples-4/device-1.c
index dad8572..46aa160 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/device-1.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/device-1.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/target-5.c 
b/libgomp/testsuite/libgomp.c/examples-4/target-5.c
index 1853fba..1c14bae 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/target-5.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/target-5.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c 
b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c
index affeb49..57c7c0c 100644
--- a/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c
+++ b/libgomp/testsuite/libgomp.c/examples-4/target_data-6.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target offload_device } */
+/* { dg-require-effective-target offload_device_nonshared_as } */
 
 #include 
 #include 
diff --git 

[hsa testsuite 3/5] Suppress hsa warnings in libgomp tests

2016-03-07 Thread Martin Jambor
Hi,

just like with the compiler gomp testsuite, we need to add -Wno-hsa to
options when compiling libgomp testcases in order not to have "excess
errors" failures when HSA is enabled.  There are quite many of such
testcases on the trunk because I have disabled the dynamic parallelism
way of executing stuff.

The patch below adds the option to all libgomp testsuite compilations,
so that people who are not interested in HSA do not need to care.  The
patch has been tested both with and without HSA enabled.  OK for
trunk?

Thanks,

Martin


2016-03-04  Martin Jambor  

* testsuite/lib/libgomp.exp (libgomp_init): Append -Wno-hsa to
ALWAYS_CFLAGS.
---
 libgomp/testsuite/lib/libgomp.exp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index 154a447..bbc2c26 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -237,6 +237,9 @@ proc libgomp_init { args } {
 # Disable caret
 lappend ALWAYS_CFLAGS "additional_flags=-fno-diagnostics-show-caret"
 
+# Disable HSA warnings by default.
+lappend ALWAYS_CFLAGS "additional_flags=-Wno-hsa"
+
 # Disable color diagnostics
 lappend ALWAYS_CFLAGS "additional_flags=-fdiagnostics-color=never"
 
-- 
2.7.1



[hsa testsuite 1/5] Gridification tests

2016-03-07 Thread Martin Jambor
Hi,

the patch below adds a DejaGNU effective target predicate (is that the
correct dejagnu term?) offload_hsa so that selected tests can be run
only if the hsa offloading is enabled.  I hope it is fairly standard
stuff.  Additionally, it adds one C/C++ and one Fortran testsuite to
check that gridification happens.

Tested, both with and without HSA enabled.  OK for trunk?

Thanks,

Martin

2016-02-10  Martin Jambor  

* target-supports.exp (check_effective_target_offload_hsa): New.
* c-c++-common/gomp/gridify-1.c: New test.
* gfortran.dg/gomp/gridify-1.f90: Likewise.
---
 gcc/testsuite/c-c++-common/gomp/gridify-1.c  | 54 
 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 | 16 +
 gcc/testsuite/lib/target-supports.exp|  8 +
 3 files changed, 78 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/gridify-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/gridify-1.f90

diff --git a/gcc/testsuite/c-c++-common/gomp/gridify-1.c 
b/gcc/testsuite/c-c++-common/gomp/gridify-1.c
new file mode 100644
index 000..ba7a866
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/gridify-1.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target offload_hsa } */
+/* { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+void
+foo1 (int n, int *a, int workgroup_size)
+{
+  int i;
+#pragma omp target
+#pragma omp teams thread_limit(workgroup_size)
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i)
+for (i = 0; i < n; i++)
+  a[i]++;
+}
+
+void
+foo2 (int j, int n, int *a)
+{
+  int i;
+#pragma omp target teams
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) 
firstprivate(j)
+for (i = j + 1; i < n; i++)
+  a[i] = i;
+}
+
+void
+foo3 (int j, int n, int *a)
+{
+  int i;
+#pragma omp target teams
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) 
firstprivate(j)
+  for (i = j + 1; i < n; i += 3)
+a[i] = i;
+}
+
+void
+foo4 (int j, int n, int *a)
+{
+#pragma omp parallel
+  {
+#pragma omp single
+{
+  int i;
+#pragma omp target
+#pragma omp teams
+#pragma omp distribute parallel for shared(a) firstprivate(n) private(i) 
firstprivate(j)
+  for (i = j + 1; i < n; i += 3)
+   a[i] = i;
+}
+  }
+}
+
+
+/* { dg-final { scan-tree-dump-times "Target construct will be turned into a 
gridified GPGPU kernel" 4 "omplower" } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90 
b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90
new file mode 100644
index 000..00ff7f5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/gomp/gridify-1.f90
@@ -0,0 +1,16 @@
+! { dg-do compile }
+! { dg-require-effective-target offload_hsa }
+! { dg-options "-fopenmp -fdump-tree-omplower-details" } */
+
+subroutine vector_square(n, a, b)
+  integer i, n, b(n), a(n)
+!$omp target teams
+!$omp distribute parallel do
+  do i=1,n
+  b(i) = a(i) * a(i)
+  enddo
+!$omp end distribute parallel do
+!$omp end target teams
+end subroutine vector_square
+
+! { dg-final { scan-tree-dump "Target construct will be turned into a 
gridified GPGPU kernel" "omplower" } }
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 0b4252f..fac4c3c 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6936,3 +6936,11 @@ proc check_effective_target_offload_nvptx { } {
int main () {return 0;}
 } "-foffload=nvptx-none" ]
 }
+
+# Return 1 if the compiler has been configured with hsa offloading.
+
+proc check_effective_target_offload_hsa { } {
+return [check_no_compiler_messages offload_hsa assembly {
+   int main () {return 0;}
+} "-foffload=hsa" ]
+}
-- 
2.7.1



[hsa testsuite 0/5] Re-post of all pending patches adjusting testsuite for HSA

2016-03-07 Thread Martin Jambor
Hi,

in order to consolidate things, I have decided to re-post all "hsa
testsuite" patches under this thread.  With the patches applied, we do
no not get any spurious failures because of hsa warnings or libgomp
testcases failing because they are run on the host fallback.
Moreover, the first patch adds a simple dump-scan compile-time
gridification tests and the last patch adds a special directory for
run-time C tests of hsa which are run only when HSA devices are
actually selected for offloading.  In the future, I'll likely propose
similar C++ and Fortran directories.

All patches were tested by running the whole testsuite on patched
trunk:

  - that was configured for all languages except go but not configured
for HSA,

  - that was configured for all languages except go and also for HSA
offloading, but an HSA device was not present on the machine, and

  - running the whole suite after configuring trunk for C, C++ and
Fortran on a computer with an HSA APU,

and subsequently comparing generated .sum files with unpatched trunk.

Thanks for any feedback (and approvals ;-),

Martin


[PATCH][PR tree-optimization/69740] Request loop fixups when needed

2016-03-07 Thread Jeff Law


This is an updated patch to address BZ69740.  When the patch was 
originally committed, there was a flurry of regressions related to the 
loop optimizers.


Essentially the loop optimizers would change the CFG and we'd 
conservatively set LOOPS_NEED_FIXUP.  At the end of the given loop 
optimizer pass we'd call check_verify_loop_structure which would assert 
no loops need fixing up.  Opps.


Richi noted that 1. The loop optimizers shouldn't be making changes to 
the CFG which result in needing loop fixups and 2. 
check_verify_loop_structure will verify that the loops don't need fixups 
and if they do it will complain.


Thus we can have check_verify_loop_structure clear LOOPS_NEED_FIXUP and 
let the normal verification run.  This has the additional advantage that 
we don't do unnecessary loop fixup passes.


Bootstrapped and regression tested on x86_64-linux-gnu.  Note that the 
tests for the regressions caused when the prior version of this patch 
are already in the testsuite.  Installed on the trunk.


I'm going to hold off backporting for a bit this time.


Thanks,
Jeff
commit cbf273cb7415a5a2e3a09aa81f466526a97e7d17
Author: law 
Date:   Mon Mar 7 17:01:54 2016 +

PR tree-optimization/69740
* cfghooks.c (remove_edge): Request loop fixups if we delete
an edge that might turn an irreducible loop into a natural
loop.
* cfgloop.h (check_verify_loop_structure): Clear LOOPS_NEED_FIXUP.
Move after definition of loops_state_clear.

PR tree-optimization/69740
* gcc.c-torture/compile/pr69740-1.c: New test.
* gcc.c-torture/compile/pr69740-2.c: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@234036 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index bab6530..9f83f36 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,13 @@
+2016-02-26  Richard Biener  
+Jeff Law  
+
+   PR tree-optimization/69740
+   * cfghooks.c (remove_edge): Request loop fixups if we delete
+   an edge that might turn an irreducible loop into a natural
+   loop.
+   * cfgloop.h (check_verify_loop_structure): Clear LOOPS_NEED_FIXUP.
+   Move after definition of loops_state_clear.
+
 2016-03-07  Bin Cheng  
 
PR rtl-optimization/69052
diff --git a/gcc/cfghooks.c b/gcc/cfghooks.c
index bbb1017..06c05d1 100644
--- a/gcc/cfghooks.c
+++ b/gcc/cfghooks.c
@@ -408,7 +408,20 @@ void
 remove_edge (edge e)
 {
   if (current_loops != NULL)
-rescan_loop_exit (e, false, true);
+{
+  rescan_loop_exit (e, false, true);
+
+  /* Removal of an edge inside an irreducible region or which leads
+to an irreducible region can turn the region into a natural loop.
+In that case, ask for the loop structure fixups.
+
+FIXME: Note that LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS is not always
+set, so always ask for fixups when removing an edge in that case.  */
+  if (!loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS)
+ || (e->flags & EDGE_IRREDUCIBLE_LOOP)
+ || (e->dest->flags & BB_IRREDUCIBLE_LOOP))
+   loops_state_set (LOOPS_NEED_FIXUP);
+}
 
   /* This is probably not needed, but it doesn't hurt.  */
   /* FIXME: This should be called via a remove_edge hook.  */
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 882861c..54e738f 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -314,16 +314,6 @@ extern void delete_loop (struct loop *);
 
 extern void verify_loop_structure (void);
 
-/* Check loop structure invariants, if internal consistency checks are
-   enabled.  */
-
-static inline void
-checking_verify_loop_structure (void)
-{
-  if (flag_checking)
-verify_loop_structure ();
-}
-
 /* Loop analysis.  */
 extern bool just_once_each_iteration_p (const struct loop *, 
const_basic_block);
 gcov_type expected_loop_iterations_unbounded (const struct loop *);
@@ -546,6 +536,28 @@ loops_state_clear (unsigned flags)
   loops_state_clear (cfun, flags);
 }
 
+/* Check loop structure invariants, if internal consistency checks are
+   enabled.  */
+
+static inline void
+checking_verify_loop_structure (void)
+{
+  /* VERIFY_LOOP_STRUCTURE essentially asserts that no loops need fixups.
+
+ The loop optimizers should never make changes to the CFG which
+ require loop fixups.  But the low level CFG manipulation code may
+ set the flag conservatively.
+
+ Go ahead and clear the flag here.  That avoids the assert inside
+ VERIFY_LOOP_STRUCTURE, and if there is an inconsistency in the loop
+ structures VERIFY_LOOP_STRUCTURE will detect it.
+
+ This also avoid the compile time cost of excessive fixups.  */
+  loops_state_clear (LOOPS_NEED_FIXUP);
+  if (flag_checking)
+verify_loop_structure ();
+}
+
 /* Loop iterators.  */
 
 /* Flags for loop iteration.  */
diff --git 

Re: Fix postreload_combine miscompilation (PR 69941)

2016-03-07 Thread Jeff Law

On 03/07/2016 05:37 AM, Bernd Schmidt wrote:

On 03/05/2016 06:27 AM, Jeff Law wrote:

PR rtl-optimization/69941
* postreload.c (reload_combine_recognize_pattern): Ensure all
uses of
the reg share its mode.

testsuite/
PR rtl-optimization/69941
* gcc.dg/torture/pr69941.c: New test.

OK.


For branches as well?

Yes.

jeff



[hsa] Consodlidate GTY roots for trees used during expansion to HSA

2016-03-07 Thread Martin Jambor
Hi,

when testing the experimental hsa branch, where dynamic parallelism is
not disabled and get_hsa_kernel_dispatch_offset is executed quite a
bit more frequently, I have come across hsa_kernel_dispatch_type being
freed by gcc even though it is marked with a GTY flag.  The reason is
that the file hsa-gen.c is not listed in GTFILES in Makefile.in (and
this it does not have and does not include its gcc header file).  This
was the only intended GTY root in this file but there is another one
in hsa-brig.c for lists of statements to put into a static
constructors and destructors.

Rather than adding these files to GTFILES, I have decided to move the
tree roots to hsa.c which is already there and GTY roots are really
only needed for these few rather special occasions.  The patch below,
which does just that, passed bootstrap and testing and HSA testing on
both trunk and the branch.  I will commit it in a few moments.

Thanks,

Martin


2016-03-02  Martin Jambor  

* hsa.h (hsa_get_ctor_statements): Declare.
(hsa_get_dtor_statements): Likewise.
(hsa_get_kernel_dispatch_type): Likewise.
* hsa.c (hsa_get_ctor_statements): New function.
(hsa_get_dtor_statements): Likewise.
(hsa_get_kernel_dispatch_type): Likewise.
* hsa-brig.c (hsa_cdtor_statements): Removed.
(hsa_output_libgomp_mapping): Use hsa_get_ctor_statements and
hsa_get_dtor_statements.
* hsa-gen.c (hsa_kernel_dispatch_type): Removed.
(get_hsa_kernel_dispatch_offset): Use hsa_get_kernel_dispatch_type.
---
 gcc/hsa-brig.c | 14 ++
 gcc/hsa-gen.c  | 13 ++---
 gcc/hsa.c  | 25 +
 gcc/hsa.h  |  3 +++
 4 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c
index 61cfd8b..2a301be 100644
--- a/gcc/hsa-brig.c
+++ b/gcc/hsa-brig.c
@@ -2006,8 +2006,6 @@ hsa_brig_emit_omp_symbols (void)
   emit_directive_variable (hsa_num_threads);
 }
 
-static GTY(()) tree hsa_cdtor_statements[2];
-
 /* Create and return __hsa_global_variables symbol that contains
all informations consumed by libgomp to link global variables
with their string names used by an HSA kernel.  */
@@ -2408,6 +2406,7 @@ hsa_output_libgomp_mapping (tree brig_decl)
 = builtin_decl_explicit (BUILT_IN_GOMP_OFFLOAD_REGISTER);
   gcc_checking_assert (offload_register);
 
+  tree *hsa_ctor_stmts = hsa_get_ctor_statements ();
   append_to_statement_list
 (build_call_expr (offload_register, 4,
  build_int_cstu (unsigned_type_node,
@@ -2416,15 +2415,15 @@ hsa_output_libgomp_mapping (tree brig_decl)
  build_fold_addr_expr (hsa_libgomp_host_table),
  build_int_cst (integer_type_node, GOMP_DEVICE_HSA),
  build_fold_addr_expr (hsa_img_descriptor)),
- _cdtor_statements[0]);
+ hsa_ctor_stmts);
 
-  cgraph_build_static_cdtor ('I', hsa_cdtor_statements[0],
-DEFAULT_INIT_PRIORITY);
+  cgraph_build_static_cdtor ('I', *hsa_ctor_stmts, DEFAULT_INIT_PRIORITY);
 
   tree offload_unregister
 = builtin_decl_explicit (BUILT_IN_GOMP_OFFLOAD_UNREGISTER);
   gcc_checking_assert (offload_unregister);
 
+  tree *hsa_dtor_stmts = hsa_get_dtor_statements ();
   append_to_statement_list
 (build_call_expr (offload_unregister, 4,
  build_int_cstu (unsigned_type_node,
@@ -2433,9 +2432,8 @@ hsa_output_libgomp_mapping (tree brig_decl)
  build_fold_addr_expr (hsa_libgomp_host_table),
  build_int_cst (integer_type_node, GOMP_DEVICE_HSA),
  build_fold_addr_expr (hsa_img_descriptor)),
- _cdtor_statements[1]);
-  cgraph_build_static_cdtor ('D', hsa_cdtor_statements[1],
-DEFAULT_INIT_PRIORITY);
+ hsa_dtor_stmts);
+  cgraph_build_static_cdtor ('D', *hsa_dtor_stmts, DEFAULT_INIT_PRIORITY);
 }
 
 /* Emit the brig module we have compiled to a section in the final assembly and
diff --git a/gcc/hsa-gen.c b/gcc/hsa-gen.c
index d7d39f0..fc59fa5 100644
--- a/gcc/hsa-gen.c
+++ b/gcc/hsa-gen.c
@@ -3772,20 +3772,19 @@ gen_set_num_threads (tree value, hsa_bb *hbb)
   hbb->append_insn (basic);
 }
 
-static GTY (()) tree hsa_kernel_dispatch_type = NULL;
-
 /* Return byte offset of a FIELD_NAME in GOMP_hsa_kernel_dispatch which
is defined in plugin-hsa.c.  */
 
 static HOST_WIDE_INT
 get_hsa_kernel_dispatch_offset (const char *field_name)
 {
-  if (hsa_kernel_dispatch_type == NULL)
+  tree *hsa_kernel_dispatch_type = hsa_get_kernel_dispatch_type ();
+  if (*hsa_kernel_dispatch_type == NULL)
 {
   /* Collection of information needed for a dispatch of a kernel from a
 kernel.  Keep in sync with libgomp's plugin-hsa.c.  */
 
-  hsa_kernel_dispatch_type = make_node (RECORD_TYPE);
+  *hsa_kernel_dispatch_type = make_node (RECORD_TYPE);
   tree id_f1 = build_decl (BUILTINS_LOCATION, FIELD_DECL,

Re: [PATCH][ARM] Add deprecation warning on pre-v4t architecture revisions

2016-03-07 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00046.html

Thanks,
Kyrill

On 01/03/16 16:17, Kyrill Tkachov wrote:

Hi all,

For GCC 6 we want to deprecate architecture revisions prior to ARMv4T.
This patch implements this by documenting the deprecation in invoke.texi and 
adding
a warning whenever the user specifies an -march or -mcpu option that selects 
such
an architecture revision.

Bootstrapped and tested on arm.

Ok for trunk?

Thanks,
Kyrill

P.S. I'll add a note to changes.html to that effect separately.

2016-03-01  Kyrylo Tkachov  

* config/arm/arm.c (arm_option_override): Warn on pre-ARMv4T
architecture revisions.
* doc/invoke.texi (ARM Options): Add note on deprecation of pre-ARMv4T
architecture revisions.

2016-03-01  Kyrylo Tkachov  

* gcc.target/arm/ftest-armv4-arm.c: Add dg-warning for deprecation
warning.
* gcc.target/arm/pr62554.c: Likewise.
* gcc.target/arm/pr69610-1.c: Likewise.
* gcc.target/arm/pr69610-2.c: Likewise.




Re: [PATCH, fortran, v3] Use Levenshtein spelling suggestions in Fortran FE

2016-03-07 Thread David Malcolm
On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote:
[...]

> diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
> index 405bae0..72ed311 100644
> --- a/gcc/fortran/misc.c
> +++ b/gcc/fortran/misc.c
[...]

> @@ -274,3 +275,41 @@ get_c_kind(const char *c_kind_name,teropKind_tki
> nds_table[])
>  
>return ISOCBINDING_INVALID;
>  }
> +
> +
> +/* For a given name TYPO, determine the best candidate from
> CANDIDATES
> +   perusing Levenshtein distance.  Frees CANDIDATES before
> returning.  */
> +
> +const char *
> +gfc_closest_fuzzy_match (const char *typo, char **candidates)
> +{
> +  /* Determine closest match.  */
> +  const char *best = NULL;
> +  char **cand = candidates;
> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +
> +  while (cand && *cand)
> +{
> +  edit_distance_t dist = levenshtein_distance (typo, *cand);
> +  if (dist < best_distance)
> + {
> +best_distance = dist;
> +best = *cand;
> + }
> +  cand++;
> +}
> +  /* If more than half of the letters were misspelled, the
> suggestion is
> + likely to be meaningless.  */
> +  if (best)
> +{
> +  unsigned int cutoff = MAX (strlen (typo), strlen (best)) / 2;
> +
> +  if (best_distance > cutoff)
> + {
> +   XDELETEVEC (candidates);
> +   return NULL;
> + }
> +  XDELETEVEC (candidates);
> +}
> +  return best;
> +}

FWIW, there are two overloaded variants of levenshtein_distance in
gcc/spellcheck.h, the first of which takes a pair of strlen values;
your patch uses the second one:

extern edit_distance_t
levenshtein_distance (const char *s, int len_s,
  const char *t, int len_t);

extern edit_distance_t
levenshtein_distance (const char *s, const char *t);

So one minor tweak you may want to consider here is to calculate
  strlen (typo)
once at the top of gfc_closest_fuzzy_match, and then pass it in to the
4-arg variant of levenshtein_distance, which would avoid recalculating
strlen (typo) for every candidate.

I can't comment on the rest of the patch (I'm not a Fortran expert),
though it seems sane to 

Hope this is constructive
Dave


Re: [RFC] PR69195, Reload confused by invalid reg equivs

2016-03-07 Thread Alan Modra
On Fri, Mar 04, 2016 at 06:39:54PM +0100, Bernd Schmidt wrote:
> I've managed to reproduce this, and I think your analysis of the problem is
> correct. So the patch is probably ok from the point of you of "will it
> work". I can probably be convinced to approve it as-is, but I wonder if
> you'd be prepared to try something else first.
> 
> This whole area in IRA is turning into spaghetti a little bit. I would
> prefer that we schedule a new small optimization pass beforehand (either as
> a real pass or just a function call) that does only the label-substituting
> trick from update_equiv_regs, then removes dead code as necessary. When we
> then get to IRA and setting up equiv regs, we should no longer get to a
> point where we need to reverify equivalences or update regstat.

I had the same thought myself, but wondered what I'd be getting into
so didn't mention the idea.  ;-)  I'll have a go.  (Not sure if I'd
trust me to get it right though, for stage 4.)

Something for later:  It seems to my untrained eye that more than just
the label substitution should be separated out.  All of the latter
half of update_equiv_regs doing global combine and moving insns could
be better separated from the rest of ira.  It is to some extent, by
use of reg_equiv[] vs. ira_reg_equiv[], but shares the REG_EQUIV
notes.  That means the REG_EQUIV notes created by ira need to take
into account ira, lra and reload behaviour and are thus rather more
restrictive than needed for the mini-combine pass.  eg. See the
comment about calls in validate_equiv_mem.

Incidentally, an all-langs bootstrap and regression test of gcc
triggers delete_unreachable_blocks just once for x86_64 and never for
powerpc64le.  The file is testsuite/gfortran.dg/pr46755.f, which of
course isn't that surprising since the pr46755 fix added the
delete_unreachable_blocks call.

-- 
Alan Modra
Australia Development Lab, IBM


[PING^2][PATCH, PR69607] Mark offload symbols as global in lto

2016-03-07 Thread Tom de Vries

On 24/02/16 14:37, Tom de Vries wrote:

On 17/02/16 16:48, Tom de Vries wrote:

On 17/02/16 13:30, Jakub Jelinek wrote:

On Wed, Feb 17, 2016 at 01:02:17PM +0100, Tom de Vries wrote:

Mark offload symbols as global in lto


I'm really not familiar with that part of LTO, so I'm CCing Honza and
Richard here.

2016-02-08  Tom de Vries  

PR lto/69607
* lto-partition.c (promote_offload_tables): New function.
* lto-partition.h (promote_offload_tables):  Declare.


Just one space instead of two after :


* lto.c (do_whole_program_analysis): call promote_offload_tables.


Capital C in Call.



Done.


diff --git a/libgomp/testsuite/libgomp.c/target-37.c
b/libgomp/testsuite/libgomp.c/target-37.c
new file mode 100644
index 000..1edb21e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-37.c
@@ -0,0 +1,98 @@
+/* { dg-do run { target lto } } */
+/* { dg-additional-sources "target-38.c" } */
+/* { dg-additional-options "-flto -flto-partition=1to1
-fno-toplevel-reorder" } */
+
+extern
+#ifdef __cplusplus
+"C"
+#endif
+void abort (void);


Why the C++ stuff in there?  Do you intend to include the testcase
also in libgomp.c++?


No, that's just there because I started both target-37.c and target-38.c
by copying target-1.c.


If not, it is not needed.


Removed.


Otherwise, the tests LGTM.



Updated patch attached.





Ping^2.

This patch fixes an ICE when compiling an openmp program using -flto 
-partition=1to1 -fno-toplevel-reorder.


[ Original submission here:
 https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00547.html
   Latest patch with updated testcases:
 https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01196.html
]

OK for trunk, stage1 (or stage4, if that's appropriate)?

Thanks,
- Tom




0001-Mark-offload-symbols-as-global-in-lto.patch


Mark offload symbols as global in lto

2016-02-17  Tom de Vries  

PR lto/69607
* lto-partition.c (promote_offload_tables): New function.
* lto-partition.h (promote_offload_tables): Declare.
* lto.c (do_whole_program_analysis): Call promote_offload_tables.

* testsuite/libgomp.c/target-36.c: New test.
* testsuite/libgomp.c/target-37.c: New test.
* testsuite/libgomp.c/target-38.c: New test.

---
  gcc/lto/lto-partition.c | 28 ++
  gcc/lto/lto-partition.h |  1 +
  gcc/lto/lto.c   |  2 +
  libgomp/testsuite/libgomp.c/target-36.c |  4 ++
  libgomp/testsuite/libgomp.c/target-37.c | 94
+
  libgomp/testsuite/libgomp.c/target-38.c | 91
+++
  6 files changed, 220 insertions(+)

diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 9eb63c2..56598d4 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "ipa-prop.h"
  #include "ipa-inline.h"
  #include "lto-partition.h"
+#include "omp-low.h"

  vec ltrans_partitions;

@@ -1003,6 +1004,33 @@ promote_symbol (symtab_node *node)
  "Promoting as hidden: %s\n", node->name ());
  }

+/* Promote the symbols in the offload tables.  */
+
+void
+promote_offload_tables (void)
+{
+  if (vec_safe_is_empty (offload_funcs) && vec_safe_is_empty
(offload_vars))
+return;
+
+  for (unsigned i = 0; i < vec_safe_length (offload_funcs); i++)
+{
+  tree fn_decl = (*offload_funcs)[i];
+  cgraph_node *node = cgraph_node::get (fn_decl);
+  if (node->externally_visible)
+continue;
+  promote_symbol (node);
+}
+
+  for (unsigned i = 0; i < vec_safe_length (offload_vars); i++)
+{
+  tree var_decl = (*offload_vars)[i];
+  varpool_node *node = varpool_node::get (var_decl);
+  if (node->externally_visible)
+continue;
+  promote_symbol (node);
+}
+}
+
  /* Return true if NODE needs named section even if it won't land in
the partition
 symbol table.
 FIXME: we should really not use named sections for inline clones
and master
diff --git a/gcc/lto/lto-partition.h b/gcc/lto/lto-partition.h
index 31e3764..1a38126 100644
--- a/gcc/lto/lto-partition.h
+++ b/gcc/lto/lto-partition.h
@@ -36,6 +36,7 @@ extern vec ltrans_partitions;
  void lto_1_to_1_map (void);
  void lto_max_map (void);
  void lto_balanced_map (int);
+extern void promote_offload_tables (void);
  void lto_promote_cross_file_statics (void);
  void free_ltrans_partitions (void);
  void lto_promote_statics_nonwpa (void);
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 9dd513f..2736c5c 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -3138,6 +3138,8 @@ do_whole_program_analysis (void)
   to globals with hidden visibility because they are accessed
from multiple
   partitions.  */
lto_promote_cross_file_statics ();
+  /* Promote all the offload symbols.  */
+  promote_offload_tables ();
timevar_pop (TV_WHOPR_PARTITIONING);

timevar_stop (TV_PHASE_OPT_GEN);
diff --git 

[PATCH] Fix PR70115

2016-03-07 Thread Richard Biener

This fixes the PR by removing an incomplete duplicate implementation.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard

2016-03-07  Richard Biener  

PR tree-optimization/70115
* tree-ssa-loop-ivcanon.c (propagate_into_all_uses): Remove.
(propagate_constants_for_unrolling): Use replace_uses_by.

* gcc.dg/torture/pr70115.c: New testcase.

Index: gcc/tree-ssa-loop-ivcanon.c
===
*** gcc/tree-ssa-loop-ivcanon.c (revision 233964)
--- gcc/tree-ssa-loop-ivcanon.c (working copy)
*** canonicalize_induction_variables (void)
*** 1164,1201 
return 0;
  }
  
- /* Propagate VAL into all uses of SSA_NAME.  */
- 
- static void
- propagate_into_all_uses (tree ssa_name, tree val)
- {
-   imm_use_iterator iter;
-   gimple *use_stmt;
- 
-   FOR_EACH_IMM_USE_STMT (use_stmt, iter, ssa_name)
- {
-   gimple_stmt_iterator use_stmt_gsi = gsi_for_stmt (use_stmt);
-   use_operand_p use;
- 
-   FOR_EACH_IMM_USE_ON_STMT (use, iter)
-   SET_USE (use, val);
- 
-   if (is_gimple_assign (use_stmt)
- && get_gimple_rhs_class (gimple_assign_rhs_code (use_stmt))
-== GIMPLE_SINGLE_RHS)
-   {
- tree rhs = gimple_assign_rhs1 (use_stmt);
- 
- if (TREE_CODE (rhs) == ADDR_EXPR)
-   recompute_tree_invariant_for_addr_expr (rhs);
-   }
- 
-   fold_stmt_inplace (_stmt_gsi);
-   update_stmt (use_stmt);
-   maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt);
- }
- }
- 
  /* Propagate constant SSA_NAMEs defined in basic block BB.  */
  
  static void
--- 1164,1169 
*** propagate_constants_for_unrolling (basic
*** 1212,1218 
  && gimple_phi_num_args (phi) == 1
  && TREE_CODE (arg) == INTEGER_CST)
{
! propagate_into_all_uses (result, arg);
  gsi_remove (, true);
  release_ssa_name (result);
}
--- 1180,1186 
  && gimple_phi_num_args (phi) == 1
  && TREE_CODE (arg) == INTEGER_CST)
{
! replace_uses_by (result, arg);
  gsi_remove (, true);
  release_ssa_name (result);
}
*** propagate_constants_for_unrolling (basic
*** 1231,1237 
  && (lhs = gimple_assign_lhs (stmt), TREE_CODE (lhs) == SSA_NAME)
  && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
{
! propagate_into_all_uses (lhs, gimple_assign_rhs1 (stmt));
  gsi_remove (, true);
  release_ssa_name (lhs);
}
--- 1199,1205 
  && (lhs = gimple_assign_lhs (stmt), TREE_CODE (lhs) == SSA_NAME)
  && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
{
! replace_uses_by (lhs, gimple_assign_rhs1 (stmt));
  gsi_remove (, true);
  release_ssa_name (lhs);
}
Index: gcc/testsuite/gcc.dg/torture/pr70115.c
===
*** gcc/testsuite/gcc.dg/torture/pr70115.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr70115.c  (working copy)
***
*** 0 
--- 1,20 
+ /* { dg-do compile } */
+ 
+ typedef int size_t;
+ char a;
+ int main()
+ {
+   size_t b, c;
+   for (;;)
+ {
+   b = 0;
+   for (; c;)
+   ;
+   for (; b < sizeof(long); b++)
+   ;
+   for (; b < c; b++)
+   a++;
+   for (; c < b; c++)
+   ;
+ }
+ }


Re: [PATCH] Fix PR c++/66786 (ICE with nested lambdas in variable template)

2016-03-07 Thread Jason Merrill

On 03/06/2016 09:30 AM, Patrick Palka wrote:

On Sun, Mar 6, 2016 at 1:42 AM, Jason Merrill  wrote:

On 02/08/2016 12:19 AM, Patrick Palka wrote:


Here, we are calling template_class_depth on a FIELD_DECL corresponding
to a lambda that is used inside variable template.  template_class_depth
however does not see that this FIELD_DECL is used inside a variable
template binding because its chain of DECL_CONTEXTs does not include the
corresponding VAR_DECL.  So template_class_depth returns the wrong
template nesting level which causes its callers to malfunction.  In
particular we strip a template argument level in
tsubst_copy [FIELD_DECL] when we shouldn't have.

This patch makes template_class_depth look at a lambda type's
LAMBDA_TYPE_EXTRA_SCOPE field instead of its TYPE_CONTEXT, so that it
can iterate into an enclosing variable template, if applicable.

Tested on x86_64-pc-linux gnu, no new regressions.  Also tested against
Boost.  Is this OK to commit?



This is breaking several lambda testcases with -fconcepts on;
LAMBDA_TYPE_EXTRA_SCOPE can also be a FIELD_DECL or PARM_DECL.  Let's just
check DECL_P.


Sorry about that.

In the case of LAMBDA_TYPE_EXTRA_SCOPE being a PARM_DECL, could there
be a chance that this PARM_DECL has a non-null DECL_LANG_SPECIFIC? If
so it looks like we could still later ICE in get_template_info (called
from template_class_depth) when we try to get at its
DECL_TEMPLATE_INFO, an accessor that cannot be used on PARM_DECLs.  So
I wonder if maybe get_template_info() should also be adjusted to
handle the case of being given a PARM_DECL which may have a non-null
DECL_LANG_SPECIFIC:


Looks good.

Jason




Re: [PATCH][ARM][4.9 Backport] PR target/69875 Fix atomic_loaddi expansion

2016-03-07 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01627.html

Thanks,
Kyrill
On 24/02/16 11:23, Kyrill Tkachov wrote:

Hi all,

This is the GCC 4.9 backport of 
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01338.html.
The differences are that TARGET_HAVE_LPAE has to be defined in arm.h in a 
different way because
the ARM_FSET_HAS_CPU1 mechanism doesn't exist on this branch. Also, due to the 
location of insn_flags
and the various FL_* (on the 4.9 branch they're defined locally in arm.c rather 
than in arm-protos.h)
I chose to define TARGET_HAVE_LPAE in terms of hardware divide instruction 
availability. This should be
an equivalent definition.

Also, the scan-assembler tests that check for the DMB instruction are updated 
to check for
"dmb sy" rather than "dmb ish", because the memory barrier instruction changed 
on trunk for GCC 6.

Bootstrapped and tested on the GCC 4.9 branch on arm-none-linux-gnueabihf.

Ok for the branch after the trunk patch has had a few days to bake?

Thanks,
Kyrill

2016-02-24  Kyrylo Tkachov  

PR target/69875
* config/arm/arm.h (TARGET_HAVE_LPAE): Define.
* config/arm/unspecs.md (VUNSPEC_LDRD_ATOMIC): New value.
* config/arm/sync.md (arm_atomic_loaddi2_ldrd): New pattern.
(atomic_loaddi_1): Delete.
(atomic_loaddi): Rewrite expander using the above changes.

2016-02-24  Kyrylo Tkachov  

PR target/69875
* gcc.target/arm/atomic_loaddi_acquire.x: New file.
* gcc.target/arm/atomic_loaddi_relaxed.x: Likewise.
* gcc.target/arm/atomic_loaddi_seq_cst.x: Likewise.
* gcc.target/arm/atomic_loaddi_1.c: New test.
* gcc.target/arm/atomic_loaddi_2.c: Likewise.
* gcc.target/arm/atomic_loaddi_3.c: Likewise.
* gcc.target/arm/atomic_loaddi_4.c: Likewise.
* gcc.target/arm/atomic_loaddi_5.c: Likewise.
* gcc.target/arm/atomic_loaddi_6.c: Likewise.
* gcc.target/arm/atomic_loaddi_7.c: Likewise.
* gcc.target/arm/atomic_loaddi_8.c: Likewise.
* gcc.target/arm/atomic_loaddi_9.c: Likewise.




Re: [PATCH][ARM][5 Backport] PR target/69875 Fix atomic_loaddi expansion

2016-03-07 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01626.html

Thanks,
Kyrill
On 24/02/16 11:23, Kyrill Tkachov wrote:

Hi all,

This is the GCC 5 backport of 
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01338.html.
The differences are that TARGET_HAVE_LPAE has to be defined in arm.h in a 
different way because
the ARM_FSET_HAS_CPU1 mechanism doesn't exist on this branch.
Also, the scan-assembler tests that check for the DMB instruction are updated 
to check for
"dmb sy" rather than "dmb ish", because the memory barrier instruction changed 
on trunk for GCC 6.

Bootstrapped and tested on the GCC 5 branch on arm-none-linux-gnueabihf.

Ok for the branch after the trunk patch has had a few days to bake?

Thanks,
Kyrill

2016-02-24  Kyrylo Tkachov  

PR target/69875
* config/arm/arm.h (TARGET_HAVE_LPAE): Define.
* config/arm/unspecs.md (VUNSPEC_LDRD_ATOMIC): New value.
* config/arm/sync.md (arm_atomic_loaddi2_ldrd): New pattern.
(atomic_loaddi_1): Delete.
(atomic_loaddi): Rewrite expander using the above changes.

2016-02-24  Kyrylo Tkachov  

PR target/69875
* gcc.target/arm/atomic_loaddi_acquire.x: New file.
* gcc.target/arm/atomic_loaddi_relaxed.x: Likewise.
* gcc.target/arm/atomic_loaddi_seq_cst.x: Likewise.
* gcc.target/arm/atomic_loaddi_1.c: New test.
* gcc.target/arm/atomic_loaddi_2.c: Likewise.
* gcc.target/arm/atomic_loaddi_3.c: Likewise.
* gcc.target/arm/atomic_loaddi_4.c: Likewise.
* gcc.target/arm/atomic_loaddi_5.c: Likewise.
* gcc.target/arm/atomic_loaddi_6.c: Likewise.
* gcc.target/arm/atomic_loaddi_7.c: Likewise.
* gcc.target/arm/atomic_loaddi_8.c: Likewise.
* gcc.target/arm/atomic_loaddi_9.c: Likewise.




Re: [C PATCH] Prevent -Wunused-value warning with __atomic_fetch_* (PR c/69407)

2016-03-07 Thread Marek Polacek
On Fri, Mar 04, 2016 at 07:17:46PM +0100, Uros Bizjak wrote:
> Hello!
> 
> > This is not a regression but I thought I'd post this anyway.  Martin 
> > reported
> > that we generate -Wunused-value warnings on the attached testcase, which
> > arguable doesn't make sense.  Setting TREE_USED suppresses the warning.  
> > Since
> > we already compute 'fetch_op' I used that.  (This warning doesn't trigger 
> > e.g.
> > for __atomic_load/store/compare.)
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk or gcc7?
> >
> > 2016-03-04  Marek Polacek  
> >
> > PR c/69407
> > * c-common.c (resolve_overloaded_builtin): Set TREE_USED for the fetch
> > operations.
> >
> > * gcc.dg/atomic-op-6.c: New test.
> 
> You can probably revert my workaround [1] that suppressed these
> warnings in libsupc++/guard.cc.

Ah, thanks for the heads-up, I'll do that once I get the patch in.
 
> [1] https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00023.html

Marek


Re: [C PATCH] Prevent -Wunused-value warning with __atomic_fetch_* (PR c/69407)

2016-03-07 Thread Marek Polacek
On Fri, Mar 04, 2016 at 07:19:56PM +0100, Jakub Jelinek wrote:
> > --- gcc/c-family/c-common.c
> > +++ gcc/c-family/c-common.c
> > @@ -11443,6 +11443,10 @@ resolve_overloaded_builtin (location_t loc, tree 
> > function,
> > && orig_code != BUILT_IN_ATOMIC_STORE_N)
> >   result = sync_resolve_return (first_param, result, orig_format);
> >  
> > +   if (fetch_op)
> > + /* Prevent -Wunused-value warning.  */
> > + TREE_USED (result) = true;
> > +
> 
> Can't sync_resolve_return return error_mark_node?
> If it could, perhaps it would be saver to move this a few lines above the
> sync_resolve_return call, where we've already checked that result is not
> error_mark_node.

That seemed right but now I remember why I put setting TREE_USED here.  Setting
TREE_USED on the result of build_function_call_vec (a CALL_EXPR) wouldn't
suppress the warning, it needs to be set on the result of sync_resolve_return
(a CONVERT_EXPR).  Setting TREE_USED on error_mark_node doesn't ICE so I didn't
check for that as I think it should be harmless.
So I'm keeping the patch as-is, please let me know you still want to see some
adjustments.

Marek


Re: RFA: PR 70044: Catch a second call to aarch64_override_options_after_change

2016-03-07 Thread Nick Clifton
Hi Kyrill,

> This is missing a second hunk from the patch you attached in the PR that I 
> think is necessary
> for this to work (setting to x_flag_omit_frame_pointer)...

Doh!  Silly me - there was a snafu restoring the patch after I had reverted it 
in order to
check that the pre- and post- patch gcc test results were the same.

> Note that this patch would expose a bug in 
> common/config/aarch64/aarch64-common.c
> where there's a thinko in the handling of OPT_momit_leaf_frame_pointer.
> That's my bad and I'll propose a patch for it soon.

OK.

> Also, is there a way to create a testcase for the testuite?
> i.e. is there a simple way to scan the assembly generated after the final LTO 
> processing
> for the presence of the frame pointer?

Originally I thought not.  But then I found scan-lto-assembler in 
testsuite/lib/scanasm.exp
and that made everything simple.

So attached is a revised patch with the missing second hunk restored and a 
testcase added.
(Which I have checked and confirmed that it does fail without the patch and it 
does pass
with the patch applied).

OK to apply ?

Cheers
  Nick

gcc/ChangeLog as before...

gcc/testsuite/ChangeLog
2016-03-07  Nick Clifton  

PR target/70044
* gcc.target/aarch64/pr70044.c: New test.


aarch64.c.patch.2
Description: Unix manual page


Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE

2016-03-07 Thread Christophe Lyon
On 7 March 2016 at 05:43, Ramana Radhakrishnan
 wrote:
> On Wed, Feb 17, 2016 at 10:20 AM, Christophe Lyon
>  wrote:
>> On 17 February 2016 at 11:05, Kyrill Tkachov
>>  wrote:
>>>
>>> On 17/02/16 10:03, Christophe Lyon wrote:

 On 15 February 2016 at 12:32, Kyrill Tkachov
  wrote:
>
> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>
>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson 
>> wrote:
>>>
>>> This is my suggested fix for PR 65932, which is a linux kernel
>>> miscompile with gcc-5.1.
>>>
>>> The problem here is caused by a chain of events.  The first is that
>>> the relatively new eipa_sra pass creates fake parameters that behave
>>> slightly differently than normal parameters.  The second is that the
>>> optimizer creates phi nodes that copy local variables to fake
>>> parameters and/or vice versa.  The third is that the ouf-of-ssa pass
>>> assumes that it can emit simple move instructions for these phi nodes.
>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>> forces QImode and HImode to unsigned, but a
>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>> short parameters have different in register representations than local
>>> variables, and require a conversion when copying between them, a
>>> conversion that the out-of-ssa pass can't easily emit.
>>>
>>> Ultimately, I think this is a problem in the arm backend.  It should
>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>> short local variables.  I also think that we should merge the
>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>> prevent this from happening again.
>>>
>>> I see four general problems with the current ARM PROMOTE_MODE
>>> definition.
>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>> instruction was added.  It is a lose for armv6 and later.
>>> 2) Unsigned short was only faster for targets that don't support
>>> unaligned accesses.  Support for these targets was removed a while
>>> ago, and this PROMODE_MODE hunk should have been removed at the same
>>> time.  It was accidentally left behind.
>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was
>>> converted to a function, the PROMOTE_MODE code was copied without the
>>> UNSIGNEDP changes.  Thus it is only an accident that
>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>> changes to resolve the difference are safe.
>>> 4) There is a general principle that you should only change signedness
>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>> in extra conversion instructions that make code slower.  The mips64
>>> hardware for instance requires that 32-bit values be sign-extended
>>> regardless of type, and instructions may trap if this is not true.
>>> However, it has a set of 32-bit instructions that operate on these
>>> values, and hence no conversions are required.  There is no similar
>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>> sign-extend inside a loop, as the sign-extend is required for a
>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>
>> Given Kyrill's testing with the patch and the reasonably detailed
>> check of the effects of code generation changes - The arm.h hunk is ok
>> - I do think we should make this explicit in the documentation that
>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>> better still maybe put in a checking assert for the same in the
>> mid-end but that could be the subject of a follow-up patch.
>>
>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>> the testsuite fallout separately.
>
> Hi all,
>
> I'd like to backport the arm.h from this ( r233130) to the GCC 5
> branch. As the CSE patch from my series had some fallout on x86_64
> due to a deficiency in the AVX patterns that is too invasive to fix
> at this stage (and presumably backport), I'd like to just backport
> this arm.h fix and adjust the tests to XFAIL the fallout that comes
> with not applying the CSE patch. The attached patch does that.
>
> The code quality fallout on code outside the testsuite is not
> that gread. The SPEC benchmarks are not affected by not applying
> the CSE change, and only a single sequence in a popular embedded
> benchmark
> shows some degradation for -mtune=cortex-a9 in the same way as 

Re: Fix postreload_combine miscompilation (PR 69941)

2016-03-07 Thread Bernd Schmidt

On 03/05/2016 06:27 AM, Jeff Law wrote:

PR rtl-optimization/69941
* postreload.c (reload_combine_recognize_pattern): Ensure all uses of
the reg share its mode.

testsuite/
PR rtl-optimization/69941
* gcc.dg/torture/pr69941.c: New test.

OK.


For branches as well?


Bernd



Re: [C++] Add -fnull-this-pointer

2016-03-07 Thread Richard Biener
On Mon, Mar 7, 2016 at 1:14 PM, Markus Trippelsdorf
 wrote:
> On 2016.03.07 at 13:03 +0100, Jakub Jelinek wrote:
>> On Mon, Mar 07, 2016 at 12:57:20PM +0100, Richard Biener wrote:
>> > (no strong opinion about that - but for example the recent -flifetime-dse
>> > strengthening fallout is similar)
>>
>> -flifetime-dse change is different, we default to -flifetime-dse=2, but have
>> a way for broken packages to workaround their bugs, so that situation is the
>> same as with -fno-delete-null-pointer-checks.
>
> If the situation was the same there would be no -flifetime-dse=2 flag at
> all. Affected users would be told to use -fno-lifetime-dse for broken
> code, just like they are told to use -fno-delete-null-pointer-checks.

Yes, I said I would have told them that.  But appearantly Jason had a
different opinion...

Richard.

> --
> Markus


Re: [C++] Add -fnull-this-pointer

2016-03-07 Thread Markus Trippelsdorf
On 2016.03.07 at 13:03 +0100, Jakub Jelinek wrote:
> On Mon, Mar 07, 2016 at 12:57:20PM +0100, Richard Biener wrote:
> > (no strong opinion about that - but for example the recent -flifetime-dse
> > strengthening fallout is similar)
> 
> -flifetime-dse change is different, we default to -flifetime-dse=2, but have
> a way for broken packages to workaround their bugs, so that situation is the
> same as with -fno-delete-null-pointer-checks.

If the situation was the same there would be no -flifetime-dse=2 flag at
all. Affected users would be told to use -fno-lifetime-dse for broken
code, just like they are told to use -fno-delete-null-pointer-checks.

-- 
Markus


Re: [C++] Add -fnull-this-pointer

2016-03-07 Thread Richard Biener
On Mon, Mar 7, 2016 at 1:03 PM, Jakub Jelinek  wrote:
> On Mon, Mar 07, 2016 at 12:57:20PM +0100, Richard Biener wrote:
>> > Honza, can you please repost the patch. Richard said on IRC that he may
>> > reconsider his rejection after all.
>> >
>> > I've tested the patch on my Gentoo test machine and it fixes segfaults
>> > in LLVM, QT, Chromium, Kdevelop.
>> >
>> > If the patch gets accepted, changes.html and porting_to.html need to be
>> > updated to reflect the new flag.
>>
>> Note that I'll only re-consider if we want to make this flag enabled by 
>> default.
>
> By enabled by default you mean what exactly?  That by default this will not
> be treated as non-NULL anymore?

Yes.

> I'm against that, while there are large
> codebases that are coded in "C++", there are plenty of properly written
> packages and by deferring this optimization we'd just delay the fixing of
> firefox etc.

True.  Then I'm against a new flag.

>> (no strong opinion about that - but for example the recent -flifetime-dse
>> strengthening fallout is similar)
>
> -flifetime-dse change is different, we default to -flifetime-dse=2, but have
> a way for broken packages to workaround their bugs, so that situation is the
> same as with -fno-delete-null-pointer-checks.

Well, if anybody would have asked I'd have said we don't want that fine-grained
control either there - people could have just used -fno-lifetime-dse.  Unless
we want to change default behavior back to what GCC 5 did, but see above.

Richard.

> Jakub


Re: [C++] Add -fnull-this-pointer

2016-03-07 Thread Jakub Jelinek
On Mon, Mar 07, 2016 at 12:57:20PM +0100, Richard Biener wrote:
> > Honza, can you please repost the patch. Richard said on IRC that he may
> > reconsider his rejection after all.
> >
> > I've tested the patch on my Gentoo test machine and it fixes segfaults
> > in LLVM, QT, Chromium, Kdevelop.
> >
> > If the patch gets accepted, changes.html and porting_to.html need to be
> > updated to reflect the new flag.
> 
> Note that I'll only re-consider if we want to make this flag enabled by 
> default.

By enabled by default you mean what exactly?  That by default this will not
be treated as non-NULL anymore?  I'm against that, while there are large
codebases that are coded in "C++", there are plenty of properly written
packages and by deferring this optimization we'd just delay the fixing of
firefox etc.

> (no strong opinion about that - but for example the recent -flifetime-dse
> strengthening fallout is similar)

-flifetime-dse change is different, we default to -flifetime-dse=2, but have
a way for broken packages to workaround their bugs, so that situation is the
same as with -fno-delete-null-pointer-checks.

Jakub


Re: [committed, libffi] Match upstream soname

2016-03-07 Thread Richard Henderson

On 03/06/2016 07:24 PM, Matthias Klose wrote:

On 05.03.2016 19:28, Richard Henderson wrote:

When I went to apply my symbol versioning patch to upstream,
I discovered that upstream had already bumped their soname to
6.4.0, beyond the bump that I'd applied to gcc to 5.0.0.

So I bumped upstream to 7.0.0, including the symbol versioning,
and this adjusts gcc to match.


Looking at the libffi upstream git repo, I can't find this change upstream, and
neither mentioned on the libffi mailing list. What am I missing?


That must of libffi's workflow is via github, I think.

  https://github.com/atgreen/libffi/pull/230


r~


Re: [PATCH] Skip ubsan internal fns in tail-merge

2016-03-07 Thread Tom de Vries

On 07/03/16 12:32, Jakub Jelinek wrote:

On Mon, Mar 07, 2016 at 12:26:10PM +0100, Tom de Vries wrote:

OK for stage4 trunk if bootstrap and reg-test succeeds?


Ok, with a minor nits.


2016-03-07  Tom de Vries  

PR tree-optimization/70116
* tree-ssa-tail-merge.c (merge_stmts_p): New function, factored out
of ...
(find_duplicate): ... here.  Move merge_stmts_p test to after
gimple_equal_p test.
(merge_stmts_p): Return false for ubsan/asan internal functions with
different gimple_location.


Listing the same function twice is weird, and for a single line short test
copied from elsewhere, especially if it originally was used twice and now
just once, I think it is not worth mentioning.  I'd just write:
(merge_stmts_p): New function.
(find_duplicate): Use it.  Don't test is_tm_ending here.



The only problem I have with that is that the ChangeLog entry doesn't 
mention ubsan/usan internal fns. I've resolved that by elaborating on 
the new function.



+   case IFN_UBSAN_NULL:
+   case IFN_UBSAN_BOUNDS:
+   case IFN_UBSAN_VPTR:
+   case IFN_UBSAN_CHECK_ADD:
+   case IFN_UBSAN_CHECK_SUB:
+   case IFN_UBSAN_CHECK_MUL:
+   case IFN_UBSAN_OBJECT_SIZE:
+   case IFN_ASAN_CHECK:


This deserves a comment, you should mention what has been mentioned in the
thread, that for these internal functions gimple_location is like another
artificial parameter, or explain what it would mean if it attempted to merge
these stmts with different locations.

+ return gimple_location (stmt1) == gimple_location (stmt2);


Done.

[ And I fixed a missing closing brace error. ]

Currently bootstrapping/reg-testing.

Thanks,
- Tom

Skip ubsan/asan internal fns with different location in tail-merge

2016-03-07  Tom de Vries  

	PR tree-optimization/70116
	* tree-ssa-tail-merge.c	(merge_stmts_p): New function, handling
	is_tm_ending stmts and ubsan/asan internal functions.
	(find_duplicate): Use it.  Don't test is_tm_ending here.

---
 gcc/tree-ssa-tail-merge.c | 44 ++--
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index 5d32790..e95879f 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -1207,6 +1207,42 @@ gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator *gsi, tree *vuse,
 }
 }
 
+/* Return true if equal (in the sense of gimple_equal_p) statements STMT1 and
+   STMT2 are allowed to be merged.  */
+
+static bool
+merge_stmts_p (gimple *stmt1, gimple *stmt2)
+{
+  /* What could be better than this here is to blacklist the bb
+ containing the stmt, when encountering the stmt f.i. in
+ same_succ_hash.  */
+  if (is_tm_ending (stmt1))
+return false;
+
+  if (is_gimple_call (stmt1)
+  && gimple_call_internal_p (stmt1))
+switch (gimple_call_internal_fn (stmt1))
+  {
+  case IFN_UBSAN_NULL:
+  case IFN_UBSAN_BOUNDS:
+  case IFN_UBSAN_VPTR:
+  case IFN_UBSAN_CHECK_ADD:
+  case IFN_UBSAN_CHECK_SUB:
+  case IFN_UBSAN_CHECK_MUL:
+  case IFN_UBSAN_OBJECT_SIZE:
+  case IFN_ASAN_CHECK:
+	/* For these internal functions, gimple_location is an implicit
+	   parameter, which will be used explicitly after expansion.
+	   Merging these statements may cause confusing line numbers in
+	   sanitizer messages.  */
+	return gimple_location (stmt1) == gimple_location (stmt2);
+  default:
+	break;
+  }
+
+  return true;
+}
+
 /* Determines whether BB1 and BB2 (members of same_succ) are duplicates.  If so,
clusters them.  */
 
@@ -1226,14 +1262,10 @@ find_duplicate (same_succ *same_succ, basic_block bb1, basic_block bb2)
   gimple *stmt1 = gsi_stmt (gsi1);
   gimple *stmt2 = gsi_stmt (gsi2);
 
-  /* What could be better than this here is to blacklist the bb
-	 containing the stmt, when encountering the stmt f.i. in
-	 same_succ_hash.  */
-  if (is_tm_ending (stmt1)
-	  || is_tm_ending (stmt2))
+  if (!gimple_equal_p (same_succ, stmt1, stmt2))
 	return;
 
-  if (!gimple_equal_p (same_succ, stmt1, stmt2))
+  if (!merge_stmts_p (stmt1, stmt2))
 	return;
 
   gsi_prev_nondebug ();


Re: [C++] Add -fnull-this-pointer

2016-03-07 Thread Richard Biener
On Mon, Mar 7, 2016 at 12:35 PM, Markus Trippelsdorf
 wrote:
> On 2016.01.19 at 13:11 +0100, Jan Hubicka wrote:
>> according to Trevor, the assumption about THIS pointer being non-NULL breaks
>> several bigger C++ packages (definitly including Firefox, but I believe
>> kdevelop was mentioned, too).  This patch makes the feature to be controlable
>> by a dedicated flag.  I am not sure about the default. We now have ubsan 
>> check
>> for the bug so I would hope the codebases to be updated soon, but it did not
>> happen for Firefox for quite a while despite the fact that Martin Liska 
>> reported
>> it.
>>
>> This patch defaults to -fno-null-this-pointer, but I would be OK with 
>> changing
>> the default and setting it on only in GCC 6. Main point of the patch is to
>> avoid need of those packages to be built with -fno-delete-null-pointer-checks
>> (which still subsumes the flag).
>>
>> The patch is bit inconsistent, becuase C++ FE wil still assume that this 
>> pointer
>> is non-NULL when expanding multiple inheritance accesses.  We did this from
>> very beginning. I do not know FE enough to see if it is easy to change the
>> behaviour here or if it is desired.
>>
>> Bootstrapped/regtsted x86_64-linux.
>>
>> Honza
>>
>>   * c-family/c.opt (fnull-this-pointer): New flag.
>>   (nonnull_arg_p): Honnor flag_null_this_pointer.
>> Index: tree.c
>> ===
>> --- tree.c(revision 232553)
>> +++ tree.c(working copy)
>> @@ -14016,6 +14022,7 @@ nonnull_arg_p (const_tree arg)
>>/* THIS argument of method is always non-NULL.  */
>>if (TREE_CODE (TREE_TYPE (cfun->decl)) == METHOD_TYPE
>>&& arg == DECL_ARGUMENTS (cfun->decl)
>> +  && flag_null_this_pointer
>
> This should be:
> +  && !flag_null_this_pointer
>
> Honza, can you please repost the patch. Richard said on IRC that he may
> reconsider his rejection after all.
>
> I've tested the patch on my Gentoo test machine and it fixes segfaults
> in LLVM, QT, Chromium, Kdevelop.
>
> If the patch gets accepted, changes.html and porting_to.html need to be
> updated to reflect the new flag.

Note that I'll only re-consider if we want to make this flag enabled by default.
(no strong opinion about that - but for example the recent -flifetime-dse
strengthening fallout is similar)

If it's supposed to be a flag people need to use explicitely I still stand with
my suggestion to force them using -fno-delete-null-pointer-checks.

Richard.

> --
> Markus


Re: [PATCH PR69052]Set higher precedence for CONST_WIDE_INT than CONST_INT when canonicalizing addr expr

2016-03-07 Thread Richard Biener
On Mon, Mar 7, 2016 at 10:27 AM, Bin.Cheng  wrote:
> On Fri, Mar 4, 2016 at 6:06 PM, Richard Biener
>  wrote:
>>
>> On March 4, 2016 5:35:13 PM GMT+01:00, "Bin.Cheng"  
>> wrote:
>> >On Fri, Mar 4, 2016 at 11:57 AM, Richard Biener
>> > wrote:
>> >> On Fri, Mar 4, 2016 at 12:07 PM, Bin Cheng  wrote:
>> >>> Hi,
>> >>> A address canonicalization interface was introduced by my original
>> >patch to PR69052.  The interface sorts sub-parts in an address
>> >expression based on precedences defined by function
>> >commutative_operand_precedence.  It also assumes that all CONST_INT
>> >sub-parts are at the end of vector after sorting.  But this is not
>> >always true because commutative_operand_precedence sets the same
>> >precedence to both CONST_INT and CONST_WIDE_INT.  The patch could be
>> >broken in cases which have CONST_WIDE_INT sorted after CONST_INT.  Even
>> >though I don't think we can really run into such complicated case, I
>> >worked out a patch forcing CONST_INT to lower precedence than
>> >CONST_WIDE_INT, so that for sure it will be sorted after all other
>> >kinds sub-parts.
>> >>>
>> >>> This is an obvious change.  Bootstrap on x86_64, bootstrap
>> >on AArch64.  Is it OK for this stage?
>> >>
>> >> I believe the obvious change would be to change
>> >> commutative_operand_precedence to pre-CONST_WIDE_INT behavior, namely
>> >> giving CONST_WIDE_INT the same precedence as CONST_DOUBLE.
>> >Yes, but I am not sure what else this change will affect, so I made
>> >the smallest change in the original code.
>>
>> I don't like going with this kind of weirdo changes out of caution.  If you 
>> think it's too dangerous the push it back to gcc 7 and consider backporting 
>> for 6.2
>
> Hi,
> Attachment is the patch setting precedence of CONST_WIDE_INT to the
> same value as CONST_DOUBLE.  Bootstrap and test on x86_64 and AArch64,
> no regressions.

Ok.

Richard.

> Thanks,
> bin
>
> 2016-03-03  Bin Cheng  
>
> PR rtl-optimization/69052
> * rtlanal.c (commutative_operand_precedence): Set higher precedence
> to CONST_WIDE_INT.


[PATCH] Fix PR70109

2016-03-07 Thread Richard Biener

The old gcc.dg/vect/O3-pr36098.c now FAILs on target that support
general permutation.  It was added for a miscompile but only tested
that we don't vectorize - now we do, but correctly.

I believe we've recently got a duplicate but proper runtime testcase.

Tested on x86_64-unknown-linux-gnu with various flags, committed.

Richard.

2016-03-07  Richard Biener  

PR testsuite/70109
* gcc.dg/vect/O3-pr36098.c: New testcase.

Index: gcc/testsuite/gcc.dg/vect/O3-pr36098.c
===
--- gcc/testsuite/gcc.dg/vect/O3-pr36098.c  (revision 233964)
+++ gcc/testsuite/gcc.dg/vect/O3-pr36098.c  (working copy)
@@ -17,4 +17,8 @@ void foo (int ncons, t_sortblock *sb, in
  iatom[m]=sb[i].iatom[m];
 }
 
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } 
} */
+/* The testcase was originally added for correctness reasons but now we
+   can vectorize it correctly if the target supports the permutations
+   required.  */
+
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { 
target { ! vect_perm } } } } */


Re: [C++] Add -fnull-this-pointer

2016-03-07 Thread Markus Trippelsdorf
On 2016.01.19 at 13:11 +0100, Jan Hubicka wrote:
> according to Trevor, the assumption about THIS pointer being non-NULL breaks
> several bigger C++ packages (definitly including Firefox, but I believe
> kdevelop was mentioned, too).  This patch makes the feature to be controlable
> by a dedicated flag.  I am not sure about the default. We now have ubsan check
> for the bug so I would hope the codebases to be updated soon, but it did not
> happen for Firefox for quite a while despite the fact that Martin Liska 
> reported
> it.
> 
> This patch defaults to -fno-null-this-pointer, but I would be OK with changing
> the default and setting it on only in GCC 6. Main point of the patch is to
> avoid need of those packages to be built with -fno-delete-null-pointer-checks
> (which still subsumes the flag).
> 
> The patch is bit inconsistent, becuase C++ FE wil still assume that this 
> pointer
> is non-NULL when expanding multiple inheritance accesses.  We did this from
> very beginning. I do not know FE enough to see if it is easy to change the
> behaviour here or if it is desired.
> 
> Bootstrapped/regtsted x86_64-linux.
> 
> Honza
> 
>   * c-family/c.opt (fnull-this-pointer): New flag.
>   (nonnull_arg_p): Honnor flag_null_this_pointer.
> Index: tree.c
> ===
> --- tree.c(revision 232553)
> +++ tree.c(working copy)
> @@ -14016,6 +14022,7 @@ nonnull_arg_p (const_tree arg)
>/* THIS argument of method is always non-NULL.  */
>if (TREE_CODE (TREE_TYPE (cfun->decl)) == METHOD_TYPE
>&& arg == DECL_ARGUMENTS (cfun->decl)
> +  && flag_null_this_pointer

This should be:
+  && !flag_null_this_pointer

Honza, can you please repost the patch. Richard said on IRC that he may
reconsider his rejection after all.

I've tested the patch on my Gentoo test machine and it fixes segfaults
in LLVM, QT, Chromium, Kdevelop.

If the patch gets accepted, changes.html and porting_to.html need to be
updated to reflect the new flag.

-- 
Markus


Re: [PATCH] Skip ubsan internal fns in tail-merge

2016-03-07 Thread Jakub Jelinek
On Mon, Mar 07, 2016 at 12:26:10PM +0100, Tom de Vries wrote:
> OK for stage4 trunk if bootstrap and reg-test succeeds?

Ok, with a minor nits.

> 2016-03-07  Tom de Vries  
> 
>   PR tree-optimization/70116
>   * tree-ssa-tail-merge.c (merge_stmts_p): New function, factored out
>   of ...
>   (find_duplicate): ... here.  Move merge_stmts_p test to after
>   gimple_equal_p test.
>   (merge_stmts_p): Return false for ubsan/asan internal functions with
>   different gimple_location.

Listing the same function twice is weird, and for a single line short test
copied from elsewhere, especially if it originally was used twice and now
just once, I think it is not worth mentioning.  I'd just write:
(merge_stmts_p): New function.
(find_duplicate): Use it.  Don't test is_tm_ending here.

> + case IFN_UBSAN_NULL:
> + case IFN_UBSAN_BOUNDS:
> + case IFN_UBSAN_VPTR:
> + case IFN_UBSAN_CHECK_ADD:
> + case IFN_UBSAN_CHECK_SUB:
> + case IFN_UBSAN_CHECK_MUL:
> + case IFN_UBSAN_OBJECT_SIZE:
> + case IFN_ASAN_CHECK:

This deserves a comment, you should mention what has been mentioned in the
thread, that for these internal functions gimple_location is like another
artificial parameter, or explain what it would mean if it attempted to merge
these stmts with different locations.
> +   return gimple_location (stmt1) == gimple_location (stmt2);

Jakub


Re: [PATCH] Skip ubsan internal fns in tail-merge

2016-03-07 Thread Tom de Vries

On 07/03/16 11:01, Jakub Jelinek wrote:

On Mon, Mar 07, 2016 at 10:49:14AM +0100, Tom de Vries wrote:

This patch implements that.

Should the asan internal function (ASAN_CHECK) be handled in the same way,
as suggested here ( https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00776.html
)?

OK for stage4 trunk if bootstrap and reg-test succeeds?


Please certainly add ASAN_CHECK to the list.


Done.


Perhaps more precise would be to allow merging them, but only if
gimple_location (stmt1) == gimple_location (stmt2).



Done.


Also, I wonder why you need to check both stmts individually, wouldn't it be
better (both for the is_tm_ending and the ubsan/asan internal calls) to
first call gimple_equal_p and only if it says the two are equal, check
is_tm_ending (only on one stmt, if they are equal, then I hope is_tm_ending
is necessarily equal for both), and similarly for the ubsan/asan ifns
compare the gimple_location?



Done.

OK for stage4 trunk if bootstrap and reg-test succeeds?

Thanks,
- Tom
Skip ubsan/asan internal fns with different location in tail-merge

2016-03-07  Tom de Vries  

	PR tree-optimization/70116
	* tree-ssa-tail-merge.c (merge_stmts_p): New function, factored out
	of ...
	(find_duplicate): ... here.  Move merge_stmts_p test to after
	gimple_equal_p test.
	(merge_stmts_p): Return false for ubsan/asan internal functions with
	different gimple_location.

---
 gcc/tree-ssa-tail-merge.c | 41 +++--
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index 5d32790..13ea4fe 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -1207,6 +1207,39 @@ gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator *gsi, tree *vuse,
 }
 }
 
+/* Return true if equal (in the sense of gimple_equal_p) statements STMT1 and
+   STMT2 are allowed to be merged.  */
+
+static bool
+merge_stmts_p (gimple *stmt1, gimple *stmt2)
+{
+  /* What could be better than this here is to blacklist the bb
+ containing the stmt, when encountering the stmt f.i. in
+ same_succ_hash.  */
+  if (is_tm_ending (stmt1))
+return false;
+
+  if (is_gimple_call (stmt1)
+  && gimple_call_internal_p (stmt1))
+{
+  switch (gimple_call_internal_fn (stmt1))
+	{
+	case IFN_UBSAN_NULL:
+	case IFN_UBSAN_BOUNDS:
+	case IFN_UBSAN_VPTR:
+	case IFN_UBSAN_CHECK_ADD:
+	case IFN_UBSAN_CHECK_SUB:
+	case IFN_UBSAN_CHECK_MUL:
+	case IFN_UBSAN_OBJECT_SIZE:
+	case IFN_ASAN_CHECK:
+	  return gimple_location (stmt1) == gimple_location (stmt2);
+	default:
+	  break;
+	}
+
+  return true;
+}
+
 /* Determines whether BB1 and BB2 (members of same_succ) are duplicates.  If so,
clusters them.  */
 
@@ -1226,14 +1259,10 @@ find_duplicate (same_succ *same_succ, basic_block bb1, basic_block bb2)
   gimple *stmt1 = gsi_stmt (gsi1);
   gimple *stmt2 = gsi_stmt (gsi2);
 
-  /* What could be better than this here is to blacklist the bb
-	 containing the stmt, when encountering the stmt f.i. in
-	 same_succ_hash.  */
-  if (is_tm_ending (stmt1)
-	  || is_tm_ending (stmt2))
+  if (!gimple_equal_p (same_succ, stmt1, stmt2))
 	return;
 
-  if (!gimple_equal_p (same_succ, stmt1, stmt2))
+  if (!merge_stmts_p (stmt1, stmt2))
 	return;
 
   gsi_prev_nondebug ();


[PATCH] Add -funconstrained-commons to work around PR/69368 (and others) in SPEC2006 (was: Re: [PATCH] Add -funknown-commons ...)

2016-03-07 Thread Alan Lawrence
On 04/03/16 13:27, Richard Biener wrote:
> I think to make it work with LTO you need to mark it 'Optimization'.
> Also it's about
> arrays so maybe
>
> 'Assume common declarations may be overridden with ones with a larger
> trailing array'
>
> also if we document it here we should eventually document it in invoke.texi.
>
> Not sure if "unknown commons" is a good term, maybe "unconstrained
> commons" instead?

All done; I doubt there is really a good word, unconstrained seems as good as
any. I've reused much the same wording in invoke.texi, unless you think there
is more to add.

On 04/03/16 13:33, Jakub Jelinek wrote:
> Also, isn't the *.opt description line supposed to end with a full stop?

Ah, yes, thanks.

Is this version OK for trunk?

gcc/ChangeLog:

DATE  Alan Lawrence  
  Jakub Jelinek  

* common.opt (funconstrained-commons, flag_unconstrained_commons): New.
* tree.c (array_at_struct_end_p): Do not limit to size of decl for
DECL_COMMONS if flag_unconstrained_commons is set.
* tree-dfa.c (get_ref_base_and_extent): Likewise.

gcc/testsuite/ChangeLog:

* gfortran.dg/unconstrained_commons.f: New.
---
 gcc/common.opt|  5 +
 gcc/doc/invoke.texi   |  8 +++-
 gcc/testsuite/gfortran.dg/unconstrained_commons.f | 20 
 gcc/tree-dfa.c| 15 ++-
 gcc/tree.c|  6 --
 5 files changed, 50 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/unconstrained_commons.f

diff --git a/gcc/common.opt b/gcc/common.opt
index 520fa9c..bbf79ef 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2451,6 +2451,11 @@ fsplit-paths
 Common Report Var(flag_split_paths) Init(0) Optimization
 Split paths leading to loop backedges.
 
+funconstrained-commons
+Common Var(flag_unconstrained_commons) Optimization
+Assume common declarations may be overridden with ones with a larger
+trailing array.
+
 funit-at-a-time
 Common Report Var(flag_unit_at_a_time) Init(1)
 Compile whole compilation unit at a time.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0a2a6f4..68933a1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -407,7 +407,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-parallelize-loops=@var{n} -ftree-pre -ftree-partial-pre -ftree-pta @gol
 -ftree-reassoc -ftree-sink -ftree-slsr -ftree-sra @gol
 -ftree-switch-conversion -ftree-tail-merge -ftree-ter @gol
--ftree-vectorize -ftree-vrp @gol
+-ftree-vectorize -ftree-vrp -funconstrained-commons @gol
 -funit-at-a-time -funroll-all-loops -funroll-loops @gol
 -funsafe-loop-optimizations -funsafe-math-optimizations -funswitch-loops @gol
 -fipa-ra -fvariable-expansion-in-unroller -fvect-cost-model -fvpt @gol
@@ -6659,6 +6659,12 @@ the loop optimizer itself cannot prove that these 
assumptions are valid.
 If you use @option{-Wunsafe-loop-optimizations}, the compiler warns you
 if it finds this kind of loop.
 
+@item -funconstrained-commons
+@opindex funconstrained-commons
+This option tells the compiler that variables declared in common blocks
+(e.g. Fortran) may later be overridden with longer trailing arrays. This
+prevents certain optimizations that depend on knowing the array bounds.
+
 @item -fcrossjumping
 @opindex fcrossjumping
 Perform cross-jumping transformation.
diff --git a/gcc/testsuite/gfortran.dg/unconstrained_commons.f 
b/gcc/testsuite/gfortran.dg/unconstrained_commons.f
new file mode 100644
index 000..f9fc471
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/unconstrained_commons.f
@@ -0,0 +1,20 @@
+! { dg-do compile }
+! { dg-options "-O3 -funconstrained-commons -fdump-tree-dom2-details" }
+
+! Test for PR69368: a single-element array in a common block, which will be
+! overridden with a larger size at link time (contrary to language spec).
+! Dominator opts considers accesses to differently-computed elements of X as
+! equivalent, unless -funconstrained-commons is passed in.
+  SUBROUTINE FOO
+  IMPLICIT DOUBLE PRECISION (X)
+  INTEGER J
+  COMMON /MYCOMMON / X(1)
+  DO 10 J=1,1024
+ X(J+1)=X(J+7)
+  10  CONTINUE
+  RETURN
+  END
+! { dg-final { scan-tree-dump-not "FIND" "dom2" } }
+! We should retain both a read and write of mycommon.x.
+! { dg-final { scan-tree-dump-times "  _\[0-9\]+ = 
mycommon\\.x\\\[_\[0-9\]+\\\];" 1 "dom2" } }
+! { dg-final { scan-tree-dump-times "  mycommon\\.x\\\[_\[0-9\]+\\\] = 
_\[0-9\]+;" 1 "dom2" } }
diff --git a/gcc/tree-dfa.c b/gcc/tree-dfa.c
index 0e98056..f133abc 100644
--- a/gcc/tree-dfa.c
+++ b/gcc/tree-dfa.c
@@ -612,9 +612,22 @@ get_ref_base_and_extent (tree exp, HOST_WIDE_INT *poffset,
 
   if (DECL_P (exp))
 {
+  if (flag_unconstrained_commons
+ && TREE_CODE (exp) == VAR_DECL && DECL_COMMON (exp))
+   {
+ tree sz_tree = TYPE_SIZE (TREE_TYPE (exp));
+

Re: [Patch, fortran] PR68241 - [meta-bug] Deferred-length character

2016-03-07 Thread Paul Richard Thomas
Dear All,

I had promised to get the 5-branch up to date in respect of deferred
character patches after then had been in place on trunk for "a few
weeks". Well, I got pulled away by PR69423 and have only now come back
to the earlier patch.

The attached patch corresponds to trunk revisions 232450 and 233589.
They did not apply cleanly 5-branch in one or two places but it was no
big deal to put them right.

Bootstrapped and regtested on FC21/x86_64 - OK for 5-branch?

Best regards

Paul

2016-03-07  Paul Thomas  

Backport from trunk.
PR fortran/69423
* trans-decl.c (create_function_arglist): Deferred character
length functions, with and without declared results, address
the passed reference type as '.result' and the local string
length as '..result'.
(gfc_null_and_pass_deferred_len): Helper function to null and
return deferred string lengths, as needed.
(gfc_trans_deferred_vars): Call it, thereby reducing repeated
code, add call for deferred arrays and reroute pointer function
results. Avoid using 'tmp' for anything other that a temporary
tree by introducing 'type_of_array' for the arrayspec type.

2016-03-07  Paul Thomas  

Backport from trunk.
PR fortran/64324
* resolve.c (check_uop_procedure): Prevent deferred length
characters from being trapped by assumed length error.

Backport from trunk.
PR fortran/49630
PR fortran/54070
PR fortran/60593
PR fortran/60795
PR fortran/61147
PR fortran/64324
* trans-array.c (gfc_conv_scalarized_array_ref): Pass decl for
function as well as variable expressions.
(gfc_array_init_size): Add 'expr' as an argument. Use this to
correctly set the descriptor dtype for deferred characters.
(gfc_array_allocate): Add 'expr' to the call to
'gfc_array_init_size'.
* trans.c (gfc_build_array_ref): Expand logic for setting span
to include indirect references to character lengths.
* trans-decl.c (gfc_get_symbol_decl): Ensure that deferred
result char lengths that are PARM_DECLs are indirectly
referenced both for directly passed and by reference.
(create_function_arglist): If the length type is a pointer type
then store the length as the 'passed_length' and make the char
length an indirect reference to it.
(gfc_trans_deferred_vars): If a character length has escaped
being set as an indirect reference, return it via the 'passed
length'.
* trans-expr.c (gfc_conv_procedure_call): The length of
deferred character length results is set TREE_STATIC and set to
zero.
(gfc_trans_assignment_1): Do not fix the rse string_length if
it is a variable, a parameter or an indirect reference. Add the
code to trap assignment of scalars to unallocated arrays.
* trans-stmt.c (gfc_trans_allocate): Remove 'def_str_len' and
all references to it. Instead, replicate the code to obtain a
explicitly defined string length and provide a value before
array allocation so that the dtype is correctly set.
trans-types.c (gfc_get_character_type): If the character length
is a pointer, use the indirect reference.

2016-03-07  Paul Thomas  

Backport from trunk.
PR fortran/69423
* gfortran.dg/deferred_character_15.f90 : New test.

2016-03-07  Paul Thomas  

Backport from trunk.
PR fortran/49630
* gfortran.dg/deferred_character_13.f90: New test for the fix
of comment 3 of the PR.

Backport from trunk.
PR fortran/54070
* gfortran.dg/deferred_character_8.f90: New test
* gfortran.dg/allocate_error_5.f90: New test

Backport from trunk.
PR fortran/60593
* gfortran.dg/deferred_character_10.f90: New test

Backport from trunk.
PR fortran/60795
* gfortran.dg/deferred_character_14.f90: New test

Backport from trunk.
PR fortran/61147
* gfortran.dg/deferred_character_11.f90: New test

Backport from trunk.
PR fortran/64324
* gfortran.dg/deferred_character_9.f90: New test





-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 232481)
--- gcc/fortran/resolve.c   (working copy)
*** check_uop_procedure (gfc_symbol *sym, lo
*** 14904,14912 
  }
  
if (sym->ts.type == BT_CHARACTER
!   && !(sym->ts.u.cl && sym->ts.u.cl->length)
!   && !(sym->result && sym->result->ts.u.cl
!  && sym->result->ts.u.cl->length))
  {
gfc_error ("User operator procedure %qs at %L cannot be assumed "
 "character length", sym->name, );
--- 14904,14912 
  }
  
if (sym->ts.type == BT_CHARACTER
!   && !((sym->ts.u.cl && sym->ts.u.cl->length) || sym->ts.deferred)
!   && !(sym->result && ((sym->result->ts.u.cl
!  && 

Re: [PATCH] Skip ubsan internal fns in tail-merge

2016-03-07 Thread Jakub Jelinek
On Mon, Mar 07, 2016 at 10:49:14AM +0100, Tom de Vries wrote:
> This patch implements that.
> 
> Should the asan internal function (ASAN_CHECK) be handled in the same way,
> as suggested here ( https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00776.html
> )?
> 
> OK for stage4 trunk if bootstrap and reg-test succeeds?

Please certainly add ASAN_CHECK to the list.
Perhaps more precise would be to allow merging them, but only if
gimple_location (stmt1) == gimple_location (stmt2).

Also, I wonder why you need to check both stmts individually, wouldn't it be
better (both for the is_tm_ending and the ubsan/asan internal calls) to
first call gimple_equal_p and only if it says the two are equal, check
is_tm_ending (only on one stmt, if they are equal, then I hope is_tm_ending
is necessarily equal for both), and similarly for the ubsan/asan ifns
compare the gimple_location?

> Skip ubsan internal fns in tail-merge
> 
> 2016-03-07  Tom de Vries  
> 
>   * tree-ssa-tail-merge.c (merge_stmt_p): New function, factored out
>   of ...
>   (find_duplicate): ... here.
>   (merge_stmt_p): Return false for ubsan functions.

Jakub


[PATCH] Skip ubsan internal fns in tail-merge

2016-03-07 Thread Tom de Vries

[ was: Re: [PATCH 3/3] Fix ubsan tests by disabling of an optimization. ]

On 10/07/15 22:11, Jeff Law wrote:

On 07/10/2015 02:19 AM, Richard Biener wrote:


But the warning on the "bogus" line will still be warranted, so user
goes and
fixes it.

But when the user gets the "bogus" line, he may look at the code and
determine that the reported line can't possibly be executed -- so they
get confused, assume the warning is bogus and ignore it.



  Then tail-merge no longer applies and he gets the warning on the

other warranted line.

That assumes that we'd get to both paths.  We may not.  The paths may be
totally independent.



IIUC, the conclusion of this discussion (starting at 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00758.html ) is that we 
want to disable tail-merge for ubsan internal functions?


This patch implements that.

Should the asan internal function (ASAN_CHECK) be handled in the same 
way, as suggested here ( 
https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00776.html )?


OK for stage4 trunk if bootstrap and reg-test succeeds?

Thanks,
- Tom

Skip ubsan internal fns in tail-merge

2016-03-07  Tom de Vries  

	* tree-ssa-tail-merge.c (merge_stmt_p): New function, factored out
	of ...
	(find_duplicate): ... here.
	(merge_stmt_p): Return false for ubsan functions.

---
 gcc/tree-ssa-tail-merge.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-tail-merge.c b/gcc/tree-ssa-tail-merge.c
index 5d32790..7f6e922 100644
--- a/gcc/tree-ssa-tail-merge.c
+++ b/gcc/tree-ssa-tail-merge.c
@@ -1207,6 +1207,33 @@ gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator *gsi, tree *vuse,
 }
 }
 
+/* Return true if STMT is allowed to be merged.  */
+
+static bool
+merge_stmt_p (gimple *stmt)
+{
+  if (is_tm_ending (stmt))
+return false;
+
+  if (is_gimple_call (stmt)
+  && gimple_call_internal_p (stmt))
+switch (gimple_call_internal_fn (stmt))
+  {
+  case IFN_UBSAN_NULL:
+  case IFN_UBSAN_BOUNDS:
+  case IFN_UBSAN_VPTR:
+  case IFN_UBSAN_CHECK_ADD:
+  case IFN_UBSAN_CHECK_SUB:
+  case IFN_UBSAN_CHECK_MUL:
+  case IFN_UBSAN_OBJECT_SIZE:
+	return false;
+  default:
+	break;
+  }
+
+  return true;
+}
+
 /* Determines whether BB1 and BB2 (members of same_succ) are duplicates.  If so,
clusters them.  */
 
@@ -1229,8 +1256,8 @@ find_duplicate (same_succ *same_succ, basic_block bb1, basic_block bb2)
   /* What could be better than this here is to blacklist the bb
 	 containing the stmt, when encountering the stmt f.i. in
 	 same_succ_hash.  */
-  if (is_tm_ending (stmt1)
-	  || is_tm_ending (stmt2))
+  if (!merge_stmt_p (stmt1)
+	  || !merge_stmt_p (stmt2))
 	return;
 
   if (!gimple_equal_p (same_succ, stmt1, stmt2))


Only assume 4-byte stack alignment on 32-bit Solaris/x86 (PR target/62281)

2016-03-07 Thread Rainer Orth
There have been several PRs that all boil down to the fact that SSE
insns segfault since one of their operands isn't sufficiently aligned.
Initially, it seemed that only Solaris 9/x86 wass affected, and the issue
was fixed on the 4.9 branch (the last supported on Solaris 9) by
enabling -mstackrealign by default:

Only assume 4-byte stack alignment on Solaris 9/x86 (PR libgomp/60107)
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg00662.html

Now PRs target/61949 and target/62281 show the same issue on 32-bit
Solaris 10/x86, and I've received confirmation that Solaris indeed only
guarantees 4-byte stack alignment as required by the i386 psABI.  Given
that, the following patch generalizes the patch above.

Bootstrapped without regressions on i386-pc-solaris2.1[012].  I'm
applying it to mainline now, although I don't yet understand why the
testcases only fail on Solaris 10.  Given that this is a wrong-code bug,
I'm going to apply it to the gcc-5 and gcc-4.9 branches within a week.

Rainer


2016-02-29  Rainer Orth  

PR target/62281
* config/i386/sol2.h (STACK_REALIGN_DEFAULT): Define.

# HG changeset patch
# Parent  fe2aecc6db984d0da6866189364e4fa9c5021495
Only assume 4-byte stack alignment on Solaris/x86 (PR target/62281)

diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h
--- a/gcc/config/i386/sol2.h
+++ b/gcc/config/i386/sol2.h
@@ -21,6 +21,11 @@ along with GCC; see the file COPYING3.  
 #define SUBTARGET_OPTIMIZATION_OPTIONS\
   { OPT_LEVELS_1_PLUS, OPT_momit_leaf_frame_pointer, NULL, 1 }
 
+/* 32-bit Solaris/x86 only guarantees 4-byte stack alignment as required by
+   the i386 psABI, so realign it as necessary for SSE instructions.  */
+#undef STACK_REALIGN_DEFAULT
+#define STACK_REALIGN_DEFAULT (TARGET_64BIT ? 0 : 1)
+
 /* Old versions of the Solaris assembler can not handle the difference of
labels in different sections, so force DW_EH_PE_datarel if so.  */
 #ifndef HAVE_AS_IX86_DIFF_SECT_DELTA

2016-02-29  Rainer Orth  

Backport from mainline
PR target/62281
* config/i386/sol2.h (STACK_REALIGN_DEFAULT): Define.

Revert:
2014-02-11  Rainer Orth  

PR libgomp/60107
* config/i386/sol2-9.h: New file.
* config.gcc (i[34567]86-*-solaris2* | x86_64-*-solaris2.1[0-9]*,
*-*-solaris2.9*): Use it.

# HG changeset patch
# Parent  20345b38cd619b11e097531603ee5eb0d7f56835
Only assume 4-byte stack alignment on Solaris/x86 (PR target/62281)

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1542,9 +1542,6 @@ i[34567]86-*-solaris2* | x86_64-*-solari
 	esac
 	with_tune_32=${with_tune_32:-generic}
 	case ${target} in
-	*-*-solaris2.9*)
-		tm_file="${tm_file} i386/sol2-9.h"
-		;;
 	*-*-solaris2.1[0-9]*)
 		tm_file="${tm_file} i386/x86-64.h i386/sol2-bi.h sol2-bi.h"
 		tm_defines="${tm_defines} TARGET_BI_ARCH=1"
diff --git a/gcc/config/i386/sol2-9.h b/gcc/config/i386/sol2-9.h
deleted file mode 100644
--- a/gcc/config/i386/sol2-9.h
+++ /dev/null
@@ -1,23 +0,0 @@
-/* Target definitions for GCC for Intel 80386 running Solaris 9
-   Copyright (C) 2014 Free Software Foundation, Inc.
-
-This file is part of GCC.
-
-GCC is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 3, or (at your option)
-any later version.
-
-GCC is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
-
-You should have received a copy of the GNU General Public License
-along with GCC; see the file COPYING3.  If not see
-.  */
-
-/* Solaris 9 only guarantees 4-byte stack alignment as required by the i386
-   psABI, so realign it as necessary for SSE instructions.  */
-#undef STACK_REALIGN_DEFAULT
-#define STACK_REALIGN_DEFAULT 1
diff --git a/gcc/config/i386/sol2.h b/gcc/config/i386/sol2.h
--- a/gcc/config/i386/sol2.h
+++ b/gcc/config/i386/sol2.h
@@ -25,6 +25,11 @@ along with GCC; see the file COPYING3.  
 #define TARGET_SUBTARGET_DEFAULT \
 	(MASK_80387 | MASK_IEEE_FP | MASK_FLOAT_RETURNS | MASK_VECT8_RETURNS)
 
+/* 32-bit Solaris/x86 only guarantees 4-byte stack alignment as required by
+   the i386 psABI, so realign it as necessary for SSE instructions.  */
+#undef STACK_REALIGN_DEFAULT
+#define STACK_REALIGN_DEFAULT (TARGET_64BIT ? 0 : 1)
+
 /* Old versions of the Solaris assembler can not handle the difference of
labels in different sections, so force DW_EH_PE_datarel.  */
 #undef ASM_PREFERRED_EH_DATA_FORMAT

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH PR69052]Set higher precedence for CONST_WIDE_INT than CONST_INT when canonicalizing addr expr

2016-03-07 Thread Bin.Cheng
On Fri, Mar 4, 2016 at 6:06 PM, Richard Biener
 wrote:
>
> On March 4, 2016 5:35:13 PM GMT+01:00, "Bin.Cheng"  
> wrote:
> >On Fri, Mar 4, 2016 at 11:57 AM, Richard Biener
> > wrote:
> >> On Fri, Mar 4, 2016 at 12:07 PM, Bin Cheng  wrote:
> >>> Hi,
> >>> A address canonicalization interface was introduced by my original
> >patch to PR69052.  The interface sorts sub-parts in an address
> >expression based on precedences defined by function
> >commutative_operand_precedence.  It also assumes that all CONST_INT
> >sub-parts are at the end of vector after sorting.  But this is not
> >always true because commutative_operand_precedence sets the same
> >precedence to both CONST_INT and CONST_WIDE_INT.  The patch could be
> >broken in cases which have CONST_WIDE_INT sorted after CONST_INT.  Even
> >though I don't think we can really run into such complicated case, I
> >worked out a patch forcing CONST_INT to lower precedence than
> >CONST_WIDE_INT, so that for sure it will be sorted after all other
> >kinds sub-parts.
> >>>
> >>> This is an obvious change.  Bootstrap on x86_64, bootstrap
> >on AArch64.  Is it OK for this stage?
> >>
> >> I believe the obvious change would be to change
> >> commutative_operand_precedence to pre-CONST_WIDE_INT behavior, namely
> >> giving CONST_WIDE_INT the same precedence as CONST_DOUBLE.
> >Yes, but I am not sure what else this change will affect, so I made
> >the smallest change in the original code.
>
> I don't like going with this kind of weirdo changes out of caution.  If you 
> think it's too dangerous the push it back to gcc 7 and consider backporting 
> for 6.2

Hi,
Attachment is the patch setting precedence of CONST_WIDE_INT to the
same value as CONST_DOUBLE.  Bootstrap and test on x86_64 and AArch64,
no regressions.

Thanks,
bin

2016-03-03  Bin Cheng  

PR rtl-optimization/69052
* rtlanal.c (commutative_operand_precedence): Set higher precedence
to CONST_WIDE_INT.
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 642611f..bacc5f2 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3358,7 +3358,7 @@ commutative_operand_precedence (rtx op)
   if (code == CONST_INT)
 return -8;
   if (code == CONST_WIDE_INT)
-return -8;
+return -7;
   if (code == CONST_DOUBLE)
 return -7;
   if (code == CONST_FIXED)


Re: [PATCH][ARM,testsuite] fix pragma_cpp_fma testcase

2016-03-07 Thread Kyrill Tkachov


On 06/03/16 13:58, Christophe Lyon wrote:

Hi,

In commit r233654, Christian introduced a new test: pragma_cpp_fma.

Unfortunately, this test fails when gcc is configured --with-fpu >=
neonvfpv4: __ARM_FEATURE_FMA is still defined after the last
pop_options.

To address this, I propose to simply force fpu=vfp via a pragma at the
beginning of the test, like we do in several other similar tests.

OK?


Ok.
Thanks,
Kyrill




Christophe.




[RFC][ARM,AArch64] Adding crypto Advsimd intrinsics tests

2016-03-07 Thread Christophe Lyon
Hi,

While preparing the cleanup of neon-testgen.ml, I'm adding the missing
tests to gcc.target/aarch64/advsimd-intrinsics.

All the *_p64 and *_p128 are currently missing, and I am wondering
what's the best option. I can think of:
1- Update existing tests using #ifdef __ARM_FEATURE_CRYPTO
2- Update existing tests without #ifdef, but adding effective_target
arm_crypto_ok
3- Create dedicated tests, either grouping alll p64/p128 in one single
source, or splitting them in as many source files as there are
intrinsics.

1- means that we would test different things depending on how GCC is
configured (--with-fpu)
2- means that we would not be able to test the subset which does not
require crypto if for some reason we cannot force the right effective
target
3- might be a bit more confusing as several places cover the same intrinsics.

Thoughts?

Thanks,

Christophe.


[Ada] Fix handling of bitpacked arrays in ASIS mode

2016-03-07 Thread Eric Botcazou
This isn't a regression but the problem only affects ASIS mode and the fix 
cannot do any harm in regular mode.  It brings the support of bitpacked array 
types on par with that of other array types.

Tested on x86_64-suse-linux, applied on the mainline.


2016-03-07  Eric Botcazou  

* gcc-interface/trans.c (statement_node_p): New predicate.
(gnat_to_gnu): Invoke it to detect statement nodes.  In ASIS mode, do
not return dummy results for expressions attached to packed array
implementation types.

-- 
Eric BotcazouIndex: gcc-interface/trans.c
===
--- gcc-interface/trans.c	(revision 233957)
+++ gcc-interface/trans.c	(working copy)
@@ -5715,6 +5715,28 @@ unchecked_conversion_nop (Node_Id gnat_n
   return false;
 }
 
+/* Return true if GNAT_NODE represents a statement.  */
+
+static bool
+statement_node_p (Node_Id gnat_node)
+{
+  const Node_Kind kind = Nkind (gnat_node);
+
+  if (kind == N_Label)
+return true;
+
+  if (IN (kind, N_Statement_Other_Than_Procedure_Call))
+return true;
+
+  if (kind == N_Procedure_Call_Statement)
+return true;
+
+  if (IN (kind, N_Raise_xxx_Error) && Ekind (Etype (gnat_node)) == E_Void)
+return true;
+
+  return false;
+}
+
 /* This function is the driver of the GNAT to GCC tree transformation process.
It is the entry point of the tree transformer.  GNAT_NODE is the root of
some GNAT tree.  Return the root of the corresponding GCC tree.  If this
@@ -5738,15 +5760,23 @@ gnat_to_gnu (Node_Id gnat_node)
   error_gnat_node = gnat_node;
   Sloc_to_locus (Sloc (gnat_node), _location);
 
-  /* If this node is a statement and we are only annotating types, return an
- empty statement list.  */
-  if (type_annotate_only && IN (kind, N_Statement_Other_Than_Procedure_Call))
+  /* If we are only annotating types and this node is a statement, return
+ an empty statement list.  */
+  if (type_annotate_only && statement_node_p (gnat_node))
 return alloc_stmt_list ();
 
-  /* If this node is a non-static subexpression and we are only annotating
- types, make this into a NULL_EXPR.  */
+  /* If we are only annotating types and this node is a subexpression, return
+ a NULL_EXPR, but filter out nodes appearing in the expressions attached
+ to packed array implementation types.  */
   if (type_annotate_only
   && IN (kind, N_Subexpr)
+  && !(((IN (kind, N_Op) && kind != N_Op_Expon)
+	|| kind == N_Type_Conversion)
+	   && Is_Integer_Type (Etype (gnat_node)))
+  && !(kind == N_Attribute_Reference
+	   && Get_Attribute_Id (Attribute_Name (gnat_node)) == Attr_Length
+	   && Ekind (Etype (Prefix (gnat_node))) == E_Array_Subtype
+	   && !Is_Constr_Subt_For_U_Nominal (Etype (Prefix (gnat_node
   && kind != N_Expanded_Name
   && kind != N_Identifier
   && !Compile_Time_Known_Value (gnat_node))
@@ -5754,13 +5784,9 @@ gnat_to_gnu (Node_Id gnat_node)
 		   build_call_raise (CE_Range_Check_Failed, gnat_node,
  N_Raise_Constraint_Error));
 
-  if ((IN (kind, N_Statement_Other_Than_Procedure_Call)
-   && kind != N_Null_Statement)
-  || kind == N_Procedure_Call_Statement
-  || kind == N_Label
-  || kind == N_Implicit_Label_Declaration
+  if ((statement_node_p (gnat_node) && kind != N_Null_Statement)
   || kind == N_Handled_Sequence_Of_Statements
-  || (IN (kind, N_Raise_xxx_Error) && Ekind (Etype (gnat_node)) == E_Void))
+  || kind == N_Implicit_Label_Declaration)
 {
   tree current_elab_proc = get_elaboration_procedure ();
 
@@ -5780,7 +5806,8 @@ gnat_to_gnu (Node_Id gnat_node)
 	 spurious errors on dummy (empty) sequences created by the front-end
 	 for package bodies in some cases.  */
   if (current_function_decl == current_elab_proc
-	  && kind != N_Handled_Sequence_Of_Statements)
+	  && kind != N_Handled_Sequence_Of_Statements
+	  && kind != N_Implicit_Label_Declaration)
 	Check_Elaboration_Code_Allowed (gnat_node);
 }
 


Re: [AArch64] Disable pcrelative_literal_loads with fix-cortex-a53-843419

2016-03-07 Thread Christophe Lyon
On 7 March 2016 at 05:49, Ramana Radhakrishnan
 wrote:
> On Tue, Jan 26, 2016 at 2:42 PM, Christophe Lyon
>  wrote:
>> Hi,
>>
>> This is a followup to PR63304.
>>
>> As discussed in bugzilla, this patch disables pcrelative_literal_loads
>> when -mfix-cortex-a53-843419 (or its default configure option) is
>> used.
>>
>> I copied the behavior of -mfix-cortex-a53-835769 (e.g. in
>> aarch64_can_inline_p), and I have tested by building the Linux kernel
>> using -mfix-cortex-a53-843419 and checked that
>> R_AARCH64_ADR_PREL_PG_HI21 relocations are not emitted anymore (under
>> CONFIG_ARM64_ERRATUM_843419).
>>
>> For reference, this is motivated by:
>> https://bugs.linaro.org/show_bug.cgi?id=1994
>
> I think we need a bug report for this in the GCC bugzilla targeting
> GCC 6 and have a reference to that in addition to PR63304 as this is
> broken behaviour in gcc trunk with respect to the erratum fixup.
OK, I have opened PR 70113.

>
> There are 2 solutions to this - one is to put out address constants in
> the large memory model with the mov / movk instruction sequences but
> then that wouldn't work with older released kernels and require
> changes to the kernel loader and the other being this where we turn on
> PC relative literal loads for the erratum fixup. I personally think
> that this solution is probably sensible during stage4 but I cannot
> approve changes to the AArch64 backend.
>
> The only question I have about this implementation in the sources
> based on a cursory look at the patch is whether this ends up working
> reliably with LTO ?
>
I haven't tested this case. Do you have any testcase in mind to convince
yourself & other maintainers?
Building the Linux kernel with LTO?


Thanks,

Christophe.

>
> regards
> Ramana
>
>> and further details on Launchpad:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1533009
>>
>> OK for trunk?
>>
>> Thanks,
>>
>> Christophe.


Re: [PATCH, i386]: Fix PR 70024, wrong code with -fPIC and -mred-zone on i686

2016-03-07 Thread Uros Bizjak
On Mon, Mar 7, 2016 at 9:34 AM, Jakub Jelinek  wrote:
> On Sun, Mar 06, 2016 at 10:39:19PM +0100, Uros Bizjak wrote:
>> Attached patch prevents red-zone with -fPIC for leaf functions on
>> i686. The idea is to mark when set_got is expanded as a call to a pc
>> thunk, and use this flag to prevent red-zone in the current function.
>>
>> Please note that using red-zone with i686 violates all known ABIs, but
>> some future ABI can trip on this limitation. The failure mode is quite
>> tricky, so IMO it warrants a couple of lines as a safety net.
>>
>> The testcase is not suitable for the testsuite (it uses -mred-zone for
>> i686 target), but I have checked that the testcase works OK when
>> compiled with the patched compiler even when -mred-zone is used
>> together with -fPIC.
>>
>> Jakub, HJ -  what do you think?
>
> LGTM.  Have you checked other possibilities of red-zone violations from the
> compiler generated stuff?  I mean -p, -mfentry, TLS (both GNU and GNU2)?
> Signals/interrupts and inline asm is something that the user is responsible
> if he enables -mred-zone, but the above probably are not.

Yes. TLS disables redzone via similar mechanism, where
ix86_current_function_calls_tls_descriptor flag disables redzone.
mcount and __fentry__ are called before redzone is used, so it looks
that they work OK.

Uros.


Re: [PATCH, i386]: Fix PR 70024, wrong code with -fPIC and -mred-zone on i686

2016-03-07 Thread Jakub Jelinek
On Sun, Mar 06, 2016 at 10:39:19PM +0100, Uros Bizjak wrote:
> Attached patch prevents red-zone with -fPIC for leaf functions on
> i686. The idea is to mark when set_got is expanded as a call to a pc
> thunk, and use this flag to prevent red-zone in the current function.
> 
> Please note that using red-zone with i686 violates all known ABIs, but
> some future ABI can trip on this limitation. The failure mode is quite
> tricky, so IMO it warrants a couple of lines as a safety net.
> 
> The testcase is not suitable for the testsuite (it uses -mred-zone for
> i686 target), but I have checked that the testcase works OK when
> compiled with the patched compiler even when -mred-zone is used
> together with -fPIC.
> 
> Jakub, HJ -  what do you think?

LGTM.  Have you checked other possibilities of red-zone violations from the
compiler generated stuff?  I mean -p, -mfentry, TLS (both GNU and GNU2)?
Signals/interrupts and inline asm is something that the user is responsible
if he enables -mred-zone, but the above probably are not.

Jakub


Re: [PATCH, i386]: Fix PR 70024, wrong code with -fPIC and -mred-zone on i686

2016-03-07 Thread Uros Bizjak
On Mon, Mar 7, 2016 at 12:10 AM, H.J. Lu  wrote:
> On Sun, Mar 6, 2016 at 1:39 PM, Uros Bizjak  wrote:
>> Hello!
>>
>> Attached patch prevents red-zone with -fPIC for leaf functions on
>> i686. The idea is to mark when set_got is expanded as a call to a pc
>> thunk, and use this flag to prevent red-zone in the current function.
>>
>> Please note that using red-zone with i686 violates all known ABIs, but
>> some future ABI can trip on this limitation. The failure mode is quite
>> tricky, so IMO it warrants a couple of lines as a safety net.
>>
>> The testcase is not suitable for the testsuite (it uses -mred-zone for
>> i686 target), but I have checked that the testcase works OK when
>> compiled with the patched compiler even when -mred-zone is used
>> together with -fPIC.
>>
>> Jakub, HJ -  what do you think?
>>
>> 2016-03-01  Uros Bizjak  
>>
>> PR target/70064
>> * config/i386/i386.h (machine_function): Add
>> pc_thunk_call_expanded flag.
>> (ix86_pc_thunk_call_expanded): New define.
>> * config/i386/i386.md (set_got, set_got_labelled): New expanders.
>> (*set_got): Rename insn pattern from set_got.
>> (*set_got_labelled): Rename inst pattern from set_got_labelled.
>> * config/i386/i386.c (ix86_compute_frame_layout): Use
>> ix86_pc_thunk_call_expanded to prevent red-zone.
>>
>> The patch is bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
>>
>
> I think red zone should only be allowed for x86-64 psABI.

Is there a hardware limitation in the processor that would disallow
red-zone for all x86_32 targets? As far as the documentation goes, the
existence of redzone depends on signal and interrupt handlers. x86_32
linux in particular doesn't provide this area, but there is no reason
some other OS will.

In light of this, I think that limiting redzone to x86_64 would be an
artificial limitation.

Uros.


[PING][PATCH, 12/16] Handle acc loop directive

2016-03-07 Thread Tom de Vries

On 29/02/16 04:26, Tom de Vries wrote:

On 22-02-16 11:57, Jakub Jelinek wrote:

On Mon, Feb 22, 2016 at 11:54:46AM +0100, Tom de Vries wrote:

Following up on your suggestion to implement this during
gimplification, I
wrote attached patch.

I'll put it through some openacc testing and add testcases. Is this
approach
acceptable for stage4?


LGTM.


Hi,

I ran into trouble during testing of this patch, with ignoring the
private clause on the loop directive.

This openacc testcase compiles atm without a problem:
...
int
main (void)
{
   int j;
#pragma acc kernels default(none)
   {
#pragma acc loop private (j)
 for (unsigned i = 0; i < 1000; ++i)
   {
 j;
   }
   }
}
...

But when compiling with the patch, and ignoring the private clause, we
run into this error:
...
test.c: In function ‘main’:
test.c:10:2: error: ‘j’ not specified in enclosing OpenACC ‘kernels’
construct
   j;
   ^
test.c:5:9: note: enclosing OpenACC ‘kernels’ construct
  #pragma acc kernels default(none)
...

So I updated the patch to ignore all but the private clause on the loop
directive during gimplification, and moved the sequential expansion of
the omp-for construct from gimplify to omp-lower.

Bootstrapped and reg-tested on x86_64.

Build for nvidia accelerator and reg-tested goacc.exp and libgomp
testsuite.

Updated patch still ok for stage4?



Ping. ( Submitted here: 
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01903.html )


Thanks,
- Tom


0001-Ignore-acc-loop-directive-in-kernels-region.patch


Ignore acc loop directive in kernels region

2016-02-29  Tom de Vries  

* gimplify.c (gimplify_ctx_in_oacc_kernels_region): New function.
(gimplify_omp_for): Ignore all but private clause on loop directive in
kernels region.
* omp-low.c (lower_omp_for_seq): New function.
(lower_omp_for): Use lower_omp_for_seq in kernels region.  Don't
generate omp continue/return.

* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Same.
* c-c++-common/goacc/kernels-loop-2-acc-loop.c: Same.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Same.
* c-c++-common/goacc/kernels-loop-acc-loop.c: Same.
* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Same.
* c-c++-common/goacc/combined-directives.c: Update test.
* c-c++-common/goacc/loop-private-1.c: Same.
* gfortran.dg/goacc/combined-directives.f90: Same.
* gfortran.dg/goacc/gang-static.f95: Same.
* gfortran.dg/goacc/reduction-2.f95: Same.

---
  gcc/gimplify.c | 41 ++
  gcc/omp-low.c  | 93 --
  .../c-c++-common/goacc/combined-directives.c   | 16 ++--
  .../goacc/kernels-acc-loop-reduction.c | 24 ++
  .../goacc/kernels-acc-loop-smaller-equal.c | 22 +
  .../c-c++-common/goacc/kernels-loop-2-acc-loop.c   | 17 
  .../c-c++-common/goacc/kernels-loop-3-acc-loop.c   | 14 
  .../c-c++-common/goacc/kernels-loop-acc-loop.c | 14 
  .../c-c++-common/goacc/kernels-loop-n-acc-loop.c   | 14 
  gcc/testsuite/c-c++-common/goacc/loop-private-1.c  |  2 +-
  .../gfortran.dg/goacc/combined-directives.f90  | 16 ++--
  gcc/testsuite/gfortran.dg/goacc/gang-static.f95|  4 +-
  gcc/testsuite/gfortran.dg/goacc/reduction-2.f95|  3 +-
  13 files changed, 252 insertions(+), 28 deletions(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 7be6bd7..4b82305 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8364,6 +8364,20 @@ find_combined_omp_for (tree *tp, int *walk_subtrees, 
void *)
return NULL_TREE;
  }

+/* Return true if CTX is (part of) an oacc kernels region.  */
+
+static bool
+gimplify_ctx_in_oacc_kernels_region (gimplify_omp_ctx *ctx)
+{
+  for (;ctx != NULL; ctx = ctx->outer_context)
+{
+  if (ctx->region_type == ORT_ACC_KERNELS)
+   return true;
+}
+
+  return false;
+}
+
  /* Gimplify the gross structure of an OMP_FOR statement.  */

  static enum gimplify_status
@@ -8403,6 +8417,33 @@ gimplify_omp_for (tree *expr_p, gimple_seq *pre_p)
gcc_unreachable ();
  }

+  /* Skip loop clauses not handled in kernels region.  */
+  if (gimplify_ctx_in_oacc_kernels_region (gimplify_omp_ctxp))
+{
+  tree *prev_ptr = _FOR_CLAUSES (for_stmt);
+
+  while (tree probe = *prev_ptr)
+   {
+ tree *next_ptr = _CLAUSE_CHAIN (probe);
+
+ bool keep_clause;
+ switch (OMP_CLAUSE_CODE (probe))
+   {
+   case OMP_CLAUSE_PRIVATE:
+ keep_clause = true;
+ break;
+   default:
+ keep_clause = false;
+ break;
+   }
+
+ if (keep_clause)
+   prev_ptr = next_ptr;
+ else
+   *prev_ptr = *next_ptr;
+   }
+}
+
/* Set OMP_CLAUSE_LINEAR_NO_COPYIN flag on explicit linear
   

[Ada] Fix ICE on renaming of downcast of dereference

2016-03-07 Thread Eric Botcazou
This is a regression present on the mainline: the compiler aborts on the 
renaming of a type conversion of the dereference of an access value, if the 
renaming is declared at library level and used in a subprogram.

Fixed thusly, tested on x86_64-suse-linux, applied on the mainline.


2016-03-07  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity) : Always mark
the expression of a renaming manually in case #3.


2016-03-07  Eric Botcazou  

* gnat.dg/renaming9.ad[sb]: New testcase.

-- 
Eric BotcazouIndex: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 233957)
+++ gcc-interface/decl.c	(working copy)
@@ -1058,12 +1058,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		  = elaborate_reference (gnu_expr, gnat_entity, definition,
 	 );
 
-		/* If we are not defining the entity, the expression will not
-		   be attached through DECL_INITIAL so it needs to be marked
-		   manually because it will likely be shared.  Likewise for a
-		   dereference as it will be folded by the ADDR_EXPR below.  */
-		if ((!definition || TREE_CODE (renamed_obj) == INDIRECT_REF)
-		&& global_bindings_p ())
+		/* The expression needs to be marked manually because it will
+		   likely be shared, even for a definition since the ADDR_EXPR
+		   built below can cause the first few nodes to be folded.  */
+		if (global_bindings_p ())
 		  MARK_VISITED (renamed_obj);
 
 		if (type_annotate_only
-- { dg-do compile }

package body Renaming9 is

  procedure Proc is
  begin
Obj.I := 0;
  end;

begin
  Obj.I := 0;
end Renaming9;
package Renaming9 is

  pragma Elaborate_Body;

  type Object is tagged null record;

  type Pointer is access all Object'Class;

  type Derived is new Object with record
 I : Integer;
  end record;

  Ptr : Pointer := new Derived;
  Obj : Derived renames Derived (Ptr.all);

end Renaming9;