Re: [PATCH, rs6000] Fix PR target/71698, ICE in reload copying TDmode values to GPR regs
On Thu, Jun 30, 2016 at 05:55:04PM -0500, Peter Bergner wrote:
> We currently don't allow TDmode values to use direct moves, since they
> live in register pairs and the most significant word is always in the
> even-numbered register which does not match subreg ordering in little
> endian mode.  The following patch fixes PR71698 by disallowing reload
> from using direct moves for TDmode values.
>
> This passed bootstrap and regtesting with no regressions.  Ok for trunk?

Yes, this is okay.  Thanks!

> This is also broken on the FSF 6 branch, so is this ok there too after
> bootstrap and regtesting there?

Okay.


Segher
[PATCH], Fix PowerPC PR target/71720 (bad vec_init of V4SF on power9)
Aaron Sawdey was running tests on the Power9 simulator, and he noticed that the
XXSPLTW to splat the SFmode value into the vector accessed the wrong element.
When I added the more general code for vector splat for power9, I called an
insn that reversed the element numbers on little endian, when I should have
called the direct function, since the XSCVDPSPN instruction leaves the
converted value in real element 0 in the hardware.

I did a bootstrap and make check with no regressions, can I check this into
the trunk?  And can I back port the patch to GCC 6.2 after a burn-in period?

[gcc]
2016-06-30  Michael Meissner

	PR target/71720
	* config/rs6000/vsx.md (vsx_splat_v4sf_internal): When splitting
	the insns, use vsx_xxspltw_v4sf_direct which does not check for
	little endian.

[gcc/testsuite]
2016-06-30  Michael Meissner

	PR target/71720
	* gcc.target/powerpc/pr71720.c: New test.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md	(revision 237895)
+++ gcc/config/rs6000/vsx.md	(working copy)
@@ -2512,9 +2512,8 @@ (define_insn_and_split "*vsx_splat_v4sf_
   [(set (match_dup 0)
	(unspec:V4SF [(match_dup 1)] UNSPEC_VSX_CVDPSPN))
    (set (match_dup 0)
-	(vec_duplicate:V4SF
-	 (vec_select:SF (match_dup 0)
-			(parallel [(const_int 0)]]
+	(unspec:V4SF [(match_dup 0)
+		      (const_int 0)] UNSPEC_VSX_XXSPLTW))]
  ""
  [(set_attr "type" "vecload,vecperm,mftgpr")
   (set_attr "length" "4,8,4")])

Index: gcc/testsuite/gcc.target/powerpc/pr71720.c
===
--- gcc/testsuite/gcc.target/powerpc/pr71720.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr71720.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mcpu=power9 -O2 -mupper-regs-di" } */
+
+/* Verify that we generate xxspltw ,,0 for V4SFmode splat.  */
+
+vector float
+splat_v4sf (float f)
+{
+  return (vector float) { f, f, f, f };
+}
+
+/* { dg-final { scan-assembler "xscvdpspn " } } */
+/* { dg-final { scan-assembler "xxspltw .*,.*,0" } } */
Re: RFC: pass to warn on questionable uses of alloca().
On 06/30/2016 06:01 AM, Aldy Hernandez wrote:
> On 06/18/2016 07:55 PM, Martin Sebor wrote:
>> On 06/16/2016 02:32 AM, Aldy Hernandez wrote:
>>> p.s. The pass currently warns on all uses of VLAs.  I'm not completely
>>> sold on this idea, so perhaps we could remove it, or gate it with a flag.
>>
>> I also think that VLA diagnostics would be better controlled by
>> a separate option, and emit a different diagnostic (one that mentions
>> VLA rather than alloca).  Although again, and for VLAs even more so
>> than for alloca, providing an option to have GCC use dynamic
>> allocation would be an even more robust solution than issuing
>> warnings.  IIRC, this was the early implementation of VLAs in GCC,
>> so there is a precedent for it.  (Though this seems complementary
>> to the warnings.)
>>
>> In addition, I'm of the opinion that potentially unbounded VLA
>> allocation should be checked at runtime and made to trap on size
>> overflow in C and throw an exception in C++ (e.g., for int a[A][B],
>> when A * B * sizeof (int) exceeds SIZE_MAX / 2 or some
>> runtime-configurable limit).  My C++ patch for bug 69517 does just
>> that (it needs to be resubmitted with the runtime configuration
>> limit added).
>>
>> For a choice of VLA warning options, there already is -Wvla, though
>> it simply points out every VLA definition regardless of its size.
>> It also issues a diagnostic that's questionable in C99 and later
>> modes (that "ISO C90 forbids variable length array" is only relevant
>> in C90; what matters when -Wvla is specified in C99 and C11 modes is
>> that a VLA has been seen).  To keep things consistent with -Walloca,
>> perhaps -Wvla could be extended to take an optional argument to
>> specify the VLA threshold.  (So that -Wvla=4096 would only diagnose
>> VLA definitions of 4k bytes or more, or unknown size.)
>
> Hmmm...I kinda wanted a sane default for -Walloca, and keeping -Walloca
> and -Wvla consistent would mean I don't get that.
>
> Currently, -Walloca would mean "warn on unbounded uses of alloca where
> n > DEFAULT", whereas -Walloca=0 would mean "warn on all uses of
> alloca".  The problem is that -Wvla means "warn on ALL uses".
>
> One consistent option would be to change the definition to:
>
> -Walloca: warn on all uses of alloca (strict mode).
> -Walloca=N: warn on unbounded uses of alloca or bounded uses of n > N.
> -Wvla: warn on all uses of VLAs (as currently implemented).
> -Wvla=N: warn on unbounded uses of VLA or bounded uses of n > N.
>
> This means we get no defaults, and the user must make the limit
> explicit, but at least we're consistent with the current use of -Wvla.
>
> How about we just use -Walloca (and no separate variants for -Wvla=??),
> but we document that -Walloca will also flag VLAs (and explain why).
> Also, we make sure the error messages say "variable-length array" not
> "alloca" in the VLA case.  This way we can have a default, and perhaps
> a more consistent flag:
>
> -Walloca: warn on all unbounded uses of alloca/vlas and bounded uses
>   where n > DEFAULT.
> -Walloca=0: warn on all uses of alloca/vlas (strict mode).
> -Walloca=N: warn on all unbounded uses of alloca/vlas and bounded uses
>   where n > N.
> -Wvla: untouched; as is currently implemented.
>
> Would this be acceptable, or are y'all really against one -Walloca
> flag to rule it all?

Since they are distinct features my preference is to keep alloca and
VLAs under separate warnings, even if the options that control each
aren't completely consistent with one another.  If you think it's
preferable to leave -Wvla unchanged then adding a new warning for the
VLA size checking might be a compromise to consider.  Alternatively,
if you would prefer to have one option control both, then using a
different name for the option might work.

Incidentally, while checking the manual for an inspiration for
a suitable name I came across -Wlarger-than= that sounds like it
should do exactly what -Walloca/-Wvla-too-big do, except it only
looks at constant sizes.  Maybe it should be made smarter?

Martin
Re: [PATCH], Fix PowerPC PR target/71720 (bad vec_init of V4SF on power9)
On Thu, Jun 30, 2016 at 08:10:41PM -0400, Michael Meissner wrote:
> Aaron Sawdey was running tests on the Power9 simulator, and he noticed
> that the XXSPLTW to splat the SFmode value into the vector accessed the
> wrong element.  When I added the more general code for vector splat for
> power9, I called an insn that reversed the element numbers on little
> endian, when I should have called the direct function, since the
> XSCVDPSPN instruction leaves the converted value in real element 0 in
> the hardware.

The changelog could be clearer (I first thought you attached the wrong
patch :-) )  Instead of "use vsx_xxspltw_v4sf_direct" you could say e.g.
"emit code matching vsx_xxspltw_v4sf_direct".

Okay for trunk with the comment clarified, and for 6 later.


Segher

> 	* config/rs6000/vsx.md (vsx_splat_v4sf_internal): When splitting
> 	the insns, use vsx_xxspltw_v4sf_direct which does not check for
> 	little endian.

> Index: gcc/config/rs6000/vsx.md
> ===
> --- gcc/config/rs6000/vsx.md	(revision 237895)
> +++ gcc/config/rs6000/vsx.md	(working copy)
> @@ -2512,9 +2512,8 @@ (define_insn_and_split "*vsx_splat_v4sf_
>    [(set (match_dup 0)
> 	(unspec:V4SF [(match_dup 1)] UNSPEC_VSX_CVDPSPN))
>     (set (match_dup 0)
> -	(vec_duplicate:V4SF
> -	 (vec_select:SF (match_dup 0)
> -			(parallel [(const_int 0)]]
> +	(unspec:V4SF [(match_dup 0)
> +		      (const_int 0)] UNSPEC_VSX_XXSPLTW))]
>    ""
>    [(set_attr "type" "vecload,vecperm,mftgpr")
>     (set_attr "length" "4,8,4")])
[PATCH, rs6000] Fix PR target/71698, ICE in reload copying TDmode values to GPR regs
We currently don't allow TDmode values to use direct moves, since they
live in register pairs and the most significant word is always in the
even-numbered register which does not match subreg ordering in little
endian mode.  The following patch fixes PR71698 by disallowing reload
from using direct moves for TDmode values.

This passed bootstrap and regtesting with no regressions.  Ok for trunk?

This is also broken on the FSF 6 branch, so is this ok there too after
bootstrap and regtesting there?

Peter

gcc/
	PR target/71698
	* config/rs6000/rs6000.c (rs6000_secondary_reload_simple_move):
	Disallow TDmode values.

gcc/testsuite/
	PR target/71698
	* gcc.target/powerpc/pr71698.c: New test.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 237893)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -19194,7 +19194,8 @@ rs6000_secondary_reload_simple_move (enu
    simple move insns are issued.  At present, 32-bit integers are not allowed
    in FPR/VSX registers.  Single precision binary floating is not a simple
    move because we need to convert to the single precision memory layout.
-   The 4-byte SDmode can be moved.  */
+   The 4-byte SDmode can be moved.  TDmode values are disallowed since they
+   need special direct move handling, which we do not support yet.  */

  size = GET_MODE_SIZE (mode);
  if (TARGET_DIRECT_MOVE
      && ((mode == SDmode) || (TARGET_POWERPC64 && size == 8))
@@ -19202,7 +19203,7 @@ rs6000_secondary_reload_simple_move (enu
	  || (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)))
    return true;

-  else if (TARGET_DIRECT_MOVE_128 && size == 16
+  else if (TARGET_DIRECT_MOVE_128 && size == 16 && mode != TDmode
	   && ((to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
	       || (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)))
    return true;

Index: gcc/testsuite/gcc.target/powerpc/pr71698.c
===
--- gcc/testsuite/gcc.target/powerpc/pr71698.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr71698.c	(working copy)
@@ -0,0 +1,13 @@
+/* Test for a reload ICE arising from trying to direct move a TDmode value.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target dfp } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-O1 -mcpu=power9 -mno-lra" } */
+
+extern void testvad128 (int n, ...);
+void
+testitd128 (_Decimal128 g01d128)
+{
+  testvad128 (1, g01d128);
+}
Re: [PATCH] c/71552 - Confusing error for incorrect struct initialization
On 06/20/2016 08:52 AM, Joseph Myers wrote:
> On Sat, 18 Jun 2016, Martin Sebor wrote:
>> The attached patch slightly changes the order in which initializers
>> are checked for type compatibility to issue the same error for static
>> initializers of incompatible types as for automatic objects, rather
>> than rejecting the former for their lack of constness first.
>
> OK, presuming the patch has passed the usual testing.

Thanks.  I committed it in r237829.

The reporter wants to know if the patch can also be backported to 5
and/or 6.  Should I go ahead?

Martin
Re: [PATCHv2, testsuite] Enable some existing __float128 tests for powerpc
On Thu, 30 Jun 2016, Bill Schmidt wrote:
> # Return 1 if the target supports any special run-time requirements
> # for __float128 or _Float128,
> # 0 otherwise.
>
> proc check_effective_target_base_quadfloat_support { } {
>     if { [istarget powerpc*-*-*] } {
>         return [check_vsx_hw_available]
>     }
>     return 1
> }

Yes, that seems about right - can be combined with either tests on
target names for __float128, or with the compile test for _Float128
(provided that dg-add-options float128 - which is preferred to
hardcoding the particular options in the individual tests - can be
made to take effect before the compile test for _Float128 gets run).

--
Joseph S. Myers
jos...@codesourcery.com
Re: [gomp4] Un-parallelized OpenACC kernels constructs with nvptx offloading: "avoid offloading"
Hi!

On Thu, 21 Jan 2016 22:54:26 +0100, I wrote:
> Committed to gomp-4_0-branch in r232709:
>
> commit 41a76d233e714fd7b79dc1f40823f607c38306ba
> Author: tschwinge
> Date:   Thu Jan 21 21:52:50 2016 +
>
>     Un-parallelized OpenACC kernels constructs with nvptx offloading:
>     "avoid offloading"

In gomp-4_0-branch r237895, I committed the following patch to 'only
trigger the "avoid offloading" mechanism for -O2 and higher', resolving
the confusing case that for -O0 and -O1 we'd not emit the diagnostic
but still trigger the "avoid offloading" mechanism.

commit 68ce05b476b68b50c2ed341ae6a77279850edbb1
Author: tschwinge
Date:   Thu Jun 30 20:46:07 2016 +

    Only trigger the "avoid offloading" mechanism for -O2 and higher

    gcc/
    * config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Only trigger
    the "avoid offloading" mechanism for -O2 and higher.
    libgomp/
    * testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c: Update.
    * testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Update.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@237895 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                    |  5 +
 gcc/config/nvptx/nvptx.c                              | 19 ++-
 libgomp/ChangeLog.gomp                                |  6 ++
 .../libgomp.oacc-c-c++-common/avoid-offloading-1.c    | 10 +-
 .../libgomp.oacc-fortran/avoid-offloading-1.f         | 12 +++-
 5 files changed, 41 insertions(+), 11 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 9bc1fbe..8c88119 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2016-06-30  Thomas Schwinge
+
+	* config/nvptx/nvptx.c (nvptx_goacc_validate_dims): Only trigger
+	the "avoid offloading" mechanism for -O2 and higher.
+
 2016-06-10  James Norris

 	Backport from mainline r236098.
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index 6d787b0..09a5a62 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -4152,8 +4152,13 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
   /* Detect if a function is unsuitable for offloading.  */
   if (!flag_offload_force && decl)
     {
+      /* Trigger the "avoid offloading" mechanism if a OpenACC kernels
+	 construct could not be parallelized, but only do that for -O2 and
+	 higher, as otherwise we're not expecting any parallelization to
+	 happen.  */
       tree oacc_function_attr = get_oacc_fn_attrib (decl);
-      if (oacc_function_attr
+      if (optimize >= 2
+	  && oacc_function_attr
 	  && oacc_fn_attrib_kernels_p (oacc_function_attr))
 	{
 	  bool avoid_offloading_p = true;
@@ -4167,14 +4172,10 @@ nvptx_goacc_validate_dims (tree decl, int dims[], int fn_level)
 	    }
 	  if (avoid_offloading_p)
 	    {
-	      /* OpenACC kernels constructs will never be parallelized for
-		 optimization levels smaller than -O2; avoid the diagnostic in
-		 this case.  */
-	      if (optimize >= 2)
-		warning_at (DECL_SOURCE_LOCATION (decl), 0,
-			    "OpenACC kernels construct will be executed "
-			    "sequentially; will by default avoid offloading "
-			    "to prevent data copy penalty");
+	      warning_at (DECL_SOURCE_LOCATION (decl), 0,
+			  "OpenACC kernels construct will be executed"
+			  " sequentially; will by default avoid offloading to"
+			  " prevent data copy penalty");
 	      DECL_ATTRIBUTES (decl)
 		= tree_cons (get_identifier ("omp avoid offloading"),
 			     NULL_TREE, DECL_ATTRIBUTES (decl));
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index af4e0d5..07fe8b7 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,9 @@
+2016-06-30  Thomas Schwinge
+
+	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c:
+	Update.
+	* testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Update.
+
 2016-06-10  Thomas Schwinge

 	PR middle-end/71373
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
index 8f50ba3..d5fff2d 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
@@ -15,7 +15,15 @@ int main(void)
   if (x != 33)
     __builtin_abort();

-#if defined ACC_DEVICE_TYPE_host || defined ACC_DEVICE_TYPE_nvidia
+#if defined
Re: [PATCHv2, testsuite] Enable some existing __float128 tests for powerpc
On Thu, 2016-06-30 at 17:55 +, Joseph Myers wrote:
> On Thu, 30 Jun 2016, Bill Schmidt wrote:
>
>> +# Return 1 if the target supports __float128 at run time,
>> +# 0 otherwise.
>> +
>> +proc check_effective_target___float128_runnable { } {
>
> I'd think you should have an effective-target for this that's shared
> between _Float128 and __float128, possibly one that gets anded with
> the one for the relevant type being available at all.

Well, as is it can be used for either _Float128 or __float128, so I
suppose we could just change the name.
check_effective_target_quadfloat_runnable?

To provide something minimal to and with the effective-target
__float128 check would get the somewhat silly:

# Return 1 if the target supports any special run-time requirements
# for __float128 or _Float128,
# 0 otherwise.

proc check_effective_target_base_quadfloat_support { } {
    if { [istarget powerpc*-*-*] } {
        return [check_vsx_hw_available]
    }
    return 1
}

so I think what I have (with a new name) is probably better.

>> +# Return 1 if the *-*-*gnu* target supports __float128 at run time,
>> +# 0 otherwise.
>
> *-*-*gnu* here is actually a proxy for feenableexcept trap enablement
> being available; if you're adding an effective-target, I'd think it
> should be one for feenableexcept with an actual test of function
> availability, which gets anded with the others.

Hm, ok.  Perhaps for now I will just ditch changing this test and let
that be a follow-on opportunity for me or someone else when we aren't
under quite such a time crunch.

Thanks,
Bill
Re: [RFC] __atomic_compare_exchange* optimizations (PR middle-end/66867, alternative)
On Fri, Jun 24, 2016 at 6:35 AM, Jakub Jelinek wrote:
> On Thu, Jun 23, 2016 at 05:23:21PM +0200, Jakub Jelinek wrote:
>> Thinking about this again, there could be another option - keep
>> __atomic_compare_exchange_N in the IL, but under certain conditions
>> (similar to what the patch uses in
>> fold_builtin_atomic_compare_exchange) for these builtins ignore
>> &var on the second argument, and if we actually turn var into
>> non-addressable, convert the builtin call similarly to what
>> fold_builtin_atomic_compare_exchange does in the patch (except the
>> store would be non-conditional then; the gimple-fold.c part wouldn't
>> be needed then).
>
> Here is this second approach implemented.  Generally it gives slight
> code size savings over the other patch on the testcases I've posted
> (and even the other patch used to produce generally smaller code than
> vanilla).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-06-24  Jakub Jelinek
>
> 	PR middle-end/66867
> 	* builtins.c (expand_ifn_atomic_compare_exchange_into_call,
> 	expand_ifn_atomic_compare_exchange): New functions.
> 	* internal-fn.c (expand_ATOMIC_COMPARE_EXCHANGE): New function.
> 	* tree.h (build_call_expr_internal_loc): Rename to ...
> 	(build_call_expr_internal_loc_array): ... this.  Fix up type of
> 	last argument.
> 	* internal-fn.def (ATOMIC_COMPARE_EXCHANGE): New internal fn.
> 	* predict.c (expr_expected_value_1): Handle IMAGPART_EXPR of
> 	ATOMIC_COMPARE_EXCHANGE result.
> 	* builtins.h (expand_ifn_atomic_compare_exchange): New prototype.
> 	* gimple-fold.h (optimize_atomic_compare_exchange_p,
> 	fold_builtin_atomic_compare_exchange): New prototypes.
> 	* gimple-fold.c (optimize_atomic_compare_exchange_p,
> 	fold_builtin_atomic_compare_exchange): New functions.
> 	* tree-ssa.c (execute_update_addresses_taken): If
> 	optimize_atomic_compare_exchange_p, ignore &var in 2nd argument
> 	of call when finding addressable vars, and if such var becomes
> 	non-addressable, call fold_builtin_atomic_compare_exchange.

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71716

H.J.
Re: [BUILDROBOT] Selftest failed for i686-wrs-vxworks
On Thu, 2016-06-30 at 08:38 -0400, Nathan Sidwell wrote:
> Jan-Benedict,
>
>> I haven't given it any additional manual testing so far.  It's
>> pre-installation though.  Maybe I'd just set WIND_BASE to some
>> arbitrary value, just to make xgcc pass it's initial start-up test
>> so that it can continue with self-testing?  Or shall we set some
>> value in gcc/Makefile.in for running the self-test?
>
> As I recall, WIND_BASE is expected to point at a vxworks install to
> pick up header files.  It is an error not to have it set (silently
> skipping it leads to user confusion).
>
> If that's irrelevant for this testing environment, then setting it to
> something (probably just "", but safer might be
> "/These.are.not.the.dirs.you.are.looking.for") should be fine.

Sorry about the breakage.

The error message appears to affect a few other targets within
gcc/Makefile.in.  For example:

$ make s-macro_list
echo | ./xgcc -B./ -B/usr/local/i686-wrs-vxworks/bin/ -isystem /usr/local/i686-wrs-vxworks/include -isystem /usr/local/i686-wrs-vxworks/sys-include -L/home/david/archive/huge/gcc-git-all-configs/selftest-multi-mk/i686-wrs-vxworks/gcc/../ld -E -dM - | \
  sed -n -e 's/^#define \([^_][a-zA-Z0-9_]*\).*/\1/p' \
	 -e 's/^#define \(_[^_A-Z][a-zA-Z0-9_]*\).*/\1/p' | \
  sort -u > tmp-macro_list
xgcc: fatal error: environment variable ‘WIND_BASE’ not defined
compilation terminated.
/bin/sh /home/david/coding-3/gcc-git-unittests/src/gcc/../move-if-change tmp-macro_list macro_list
cmp: EOF on tmp-macro_list
echo timestamp > s-macro_list

However the above issue doesn't fail the build:

$ echo $?
0

It can be trivially reproduced like this:

$ ./xgcc -B. -E -
xgcc: fatal error: environment variable ‘WIND_BASE’ not defined
compilation terminated.

It happens due to evaluating this spec function:

  "getenv(WIND_BASE /target/h)"

from this within gcc/config/vxworks.h:

/* Since we provide a default -isystem, expand -isystem on the command
   line early.  */
#undef VXWORKS_ADDITIONAL_CPP_SPEC
#define VXWORKS_ADDITIONAL_CPP_SPEC		\
 "%{!nostdinc:					\
    %{isystem*} -idirafter			\
    %{mrtp: %:getenv(WIND_USR /h)		\
      ;:    %:getenv(WIND_BASE /target/h)}}"

Hence it appears that passing "-nostdinc" as a param will avoid the
error:

$ echo | ./xgcc -B. -E - -nostdinc
# 1 ""
# 1 ""
# 1 ""
# 1 ""
$ echo $?
0

Presumably if you're explicitly building for vxworks you have a
vxworks install, so there is a meaningful value to set WIND_BASE to,
whereas if you don't have a vxworks install (and are merely building
everything as a smoketest), you presumably only want to build the
"gcc" subdir, since AFAIK you can't run the driver.

So there are at least 2 ways of fixing this:

(a) add "-nostdinc" when running the selftests, i.e. to the
    invocations of GCC_FOR_TARGET in the "s-selftest" and
    "selftest-gdb" clauses of gcc/Makefile.in.  I've verified that
    this fixes the issue for --target=i686-wrs-vxworks.

(b) set WIND_BASE to a dummy value in contrib/config-list.mk (if not
    already set) so that the vxworks targets are able to at least
    build all of "gcc" without needing a vxworks install.

Thoughts?
Re: Improve insert/emplace robustness to self insertion
On 29/06/2016 23:30, Jonathan Wakely wrote:
> On 29/06/16 21:36 +0100, Jonathan Wakely wrote:
>> On 29/06/16 21:43 +0200, François Dumont wrote:
>>> As asked here is now a patch to only fix the robustness issue.  The
>>> consequence is that it is reverting the latest optimization as,
>>> without smart algo, we always need to do a copy to protect against
>>> insertion of values contained in the vector as shown by new tests.
>>
>> I don't understand.  There is no problem with insert(), only with
>> emplace(), so why do both get worse?
>>
>> Also, the problem is with emplacing from an lvalue, so why do the
>> number of operations change for test02 and test03, which are for
>> xvalues and rvalues?  I haven't analyzed your patch, but the results
>> seem wrong.  We should not have to do any more work to insert
>> rvalues.  What am I missing?
>>
>> It seems to me that the minimal fix would be:
>>
>> --- a/libstdc++-v3/include/bits/vector.tcc
>> +++ b/libstdc++-v3/include/bits/vector.tcc
>> @@ -312,7 +312,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
>>        else
>>          _M_insert_aux(begin() + (__position - cbegin()),
>> -                      std::forward<_Args>(__args)...);
>> +                      _Tp(std::forward<_Args>(__args)...));
>>        return iterator(this->_M_impl._M_start + __n);
>>      }
>
> This causes regressions in the insert_vs_emplace.cc test because we
> insert rvalues using emplace:

This is also why my patch is impacting insert from xvalue or rvalue.

>     iterator insert(const_iterator __position, value_type&& __x)
>     { return emplace(__position, std::move(__x)); }
>
> That's suboptimal, since in the general case we need an extra
> construction for emplacing, but we know that we don't need to do that
> when inserting rvalues.

Why not?  I realized with your remarks that I was missing some tests
in the new self_insert.cc.  The ones to insert an rvalue coming from
the vector itself.  In the attached patch there is those 2 tests, do
you agree with expected behavior?  For the moment it doesn't check
that the source value has been indeed moved cause it doesn't, I will
update it once it does.
My patch also reorganize a little bit code to avoid redundant checks to find out if reallocation is necessary or not. I was also thinking about reusing _M_realloc_insert_aux within _M_fill_insert when reallocation is needed. So the correct fix would be to implement inserting rvalues without using emplace. The attached patch is a smaller change, and doesn't change the number of operations for insertions, only for emplacing lvalues. Ok, go ahead, I will try to rebase mine from yours. François diff --git a/libstdc++-v3/include/bits/stl_vector.h b/libstdc++-v3/include/bits/stl_vector.h index eaafa22..c37880a 100644 --- a/libstdc++-v3/include/bits/stl_vector.h +++ b/libstdc++-v3/include/bits/stl_vector.h @@ -949,7 +949,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #if __cplusplus >= 201103L _M_emplace_back_aux(__x); #else - _M_insert_aux(end(), __x); + _M_realloc_insert_aux(end(), __x); #endif } @@ -1435,21 +1435,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #if __cplusplus < 201103L void _M_insert_aux(iterator __position, const value_type& __x); -#else - template - static void - _S_insert_aux_assign(iterator __pos, _Args&&... __args) - { *__pos = _Tp(std::forward<_Args>(__args)...); } - - static void - _S_insert_aux_assign(iterator __pos, _Tp&& __arg) - { *__pos = std::move(__arg); } + void + _M_realloc_insert_aux(iterator __position, const value_type& __x); +#else template void _M_insert_aux(iterator __position, _Args&&... __args); template +void +_M_realloc_insert_aux(iterator __position, _Args&&... __args); + + template void _M_emplace_back_aux(_Args&&... 
__args); #endif diff --git a/libstdc++-v3/include/bits/vector.tcc b/libstdc++-v3/include/bits/vector.tcc index 604be7b..9061593 100644 --- a/libstdc++-v3/include/bits/vector.tcc +++ b/libstdc++-v3/include/bits/vector.tcc @@ -112,27 +112,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #endif { const size_type __n = __position - begin(); - if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage - && __position == end()) - { - _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x); - ++this->_M_impl._M_finish; - } + if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) + if (__position == end()) + { + _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x); + ++this->_M_impl._M_finish; + } + else +#if __cplusplus >= 201103L + _M_insert_aux(begin() + (__position - cbegin()), __x); +#else + _M_insert_aux(__position, __x); +#endif else - { #if __cplusplus >= 201103L - const auto __pos = begin() + (__position - cbegin()); - if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) - { - _Tp __x_copy = __x; - _M_insert_aux(__pos, std::move(__x_copy)); - } - else - _M_insert_aux(__pos, __x); +
Re: [PATCH] Fix tree-ssa-strlen.c ICE (PR tree-optimization/71707)
On June 30, 2016 7:38:05 PM GMT+02:00, Jakub Jelinek wrote:
> Hi!
>
> I thought for ADDR_EXPRs we couldn't reach this, but we actually can,
> when there is an SSA_NAME with POINTER_PLUS_EXPR of ADDR_EXPR and
> constant.
>
> The following patch handles it.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2016-06-30  Jakub Jelinek
>
> 	PR tree-optimization/71707
> 	* tree-ssa-strlen.c (get_stridx_plus_constant): Handle already
> 	present strinfo even for ADDR_EXPR ptr.
>
> 	* gcc.dg/strlenopt-29.c: New test.
>
> --- gcc/tree-ssa-strlen.c.jj	2016-06-29 16:10:27.0 +0200
> +++ gcc/tree-ssa-strlen.c	2016-06-30 10:28:30.605430361 +0200
> @@ -677,8 +677,14 @@ get_stridx_plus_constant (strinfo *bases
>      {
>        if (r == 0)
>  	{
> -	  gcc_assert (TREE_CODE (ptr) == SSA_NAME);
> -	  ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)] = si->idx;
> +	  if (TREE_CODE (ptr) == SSA_NAME)
> +	    ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)] = si->idx;
> +	  else
> +	    {
> +	      int *pidx = addr_stridxptr (TREE_OPERAND (ptr, 0));
> +	      if (pidx != NULL && *pidx == 0)
> +		*pidx = si->idx;
> +	    }
>  	  return si->idx;
>  	}
>        break;
> --- gcc/testsuite/gcc.dg/strlenopt-29.c.jj	2016-06-30 10:36:34.673371348 +0200
> +++ gcc/testsuite/gcc.dg/strlenopt-29.c	2016-06-30 10:38:21.731032244 +0200
> @@ -0,0 +1,27 @@
> +/* PR tree-optimization/71707 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-strlen" } */
> +
> +#include "strlenopt.h"
> +
> +char a[32];
> +size_t b;
> +
> +__attribute__((noinline, noclone)) char *
> +foo (void)
> +{
> +  char *p = memcpy (a, "a", 2) + 1;
> +  memcpy (&a[1], "b", 2);
> +  b = strlen (a) + strlen (&a[1]) + strlen (p);
> +  return p;
> +}
> +
> +int
> +main ()
> +{
> +  if (foo () != &a[1] || b != 4)
> +    abort ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */

	Jakub
[PATCH 1/2] C: convert return type of lookup_name_fuzzy from tree to const char *
The followup patch implements lookup_name_fuzzy for the C++ frontend. It's cleaner for that implementation to return a const char *, so this patch updates the implementation in the C frontend to have the same return type. Successfully bootstrapped on x86_64-pc-linux-gnu (in combination with the followup patch). OK for trunk? gcc/c-family/ChangeLog: * c-common.h (lookup_name_fuzzy): Convert return type from tree to const char *. gcc/c/ChangeLog: * c-decl.c (implicit_decl_warning): Update for conversion of return type of lookup_name_fuzzy to const char *. (undeclared_variable): Likewise. (lookup_name_fuzzy): Convert return type from tree to const char *. * c-parser.c (c_parser_declaration_or_fndef): Update for conversion of return type of lookup_name_fuzzy to const char *. (c_parser_parameter_declaration): Likewise. gcc/ChangeLog: * gcc-rich-location.c (gcc_rich_location::add_fixit_misspelled_id): New overload, taking a const char *. * gcc-rich-location.h (gcc_rich_location::add_fixit_misspelled_id): Likewise. --- gcc/c-family/c-common.h | 2 +- gcc/c/c-decl.c | 22 +- gcc/c/c-parser.c| 10 +- gcc/gcc-rich-location.c | 13 + gcc/gcc-rich-location.h | 2 ++ 5 files changed, 34 insertions(+), 15 deletions(-) diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h index 3ad5400..44d98d1 100644 --- a/gcc/c-family/c-common.h +++ b/gcc/c-family/c-common.h @@ -997,7 +997,7 @@ enum lookup_name_fuzzy_kind { /* Any name. 
 */
  FUZZY_LOOKUP_NAME
};
-extern tree lookup_name_fuzzy (tree, enum lookup_name_fuzzy_kind);
+extern const char *lookup_name_fuzzy (tree, enum lookup_name_fuzzy_kind);
 extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
 extern bool vector_types_convertible_p (const_tree t1, const_tree t2,
					bool emit_lax_note);
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 8b966fe..09f7f79 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -3088,7 +3088,7 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl)
   if (warn_implicit_function_declaration)
     {
       bool warned;
-      tree hint = NULL_TREE;
+      const char *hint = NULL;
       if (!olddecl)
	hint = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
@@ -3099,7 +3099,7 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl)
	  richloc.add_fixit_misspelled_id (loc, hint);
	  warned = pedwarn_at_rich_loc
	    (&richloc, OPT_Wimplicit_function_declaration,
-	     "implicit declaration of function %qE; did you mean %qE?",
+	     "implicit declaration of function %qE; did you mean %qs?",
	     id, hint);
	}
      else
@@ -3112,7 +3112,7 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl)
	  richloc.add_fixit_misspelled_id (loc, hint);
	  warned = warning_at_rich_loc
	    (&richloc, OPT_Wimplicit_function_declaration,
-	     G_("implicit declaration of function %qE; did you mean %qE?"),
+	     G_("implicit declaration of function %qE; did you mean %qs?"),
	     id, hint);
	}
      else
@@ -3433,14 +3433,14 @@ undeclared_variable (location_t loc, tree id)
   if (current_function_decl == 0)
     {
-      tree guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
+      const char *guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
       if (guessed_id)
	{
	  gcc_rich_location richloc (loc);
	  richloc.add_fixit_misspelled_id (loc, guessed_id);
	  error_at_rich_loc (&richloc,
			     "%qE undeclared here (not in a function);"
-			     " did you mean %qE?",
+			     " did you mean %qs?",
			     id, guessed_id);
	}
      else
@@ -3451,7 +3451,7 @@ undeclared_variable (location_t loc, tree id)
     {
       if (!objc_diagnose_private_ivar (id))
	{
-	  tree guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
+	  const char *guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
	  if (guessed_id)
	    {
	      gcc_rich_location richloc (loc);
@@ -3459,7 +3459,7 @@ undeclared_variable (location_t loc, tree id)
	      error_at_rich_loc (&richloc,
				 "%qE undeclared (first use in this function);"
-				 " did you mean %qE?",
+				 " did you mean %qs?",
				 id, guessed_id);
	    }
	  else
@@ -4010,7 +4010,7 @@ find_closest_macro_cpp_cb (cpp_reader *, cpp_hashnode *hashnode,
    It also looks for start_typename keywords, to detect "singed" vs "signed"
    typos.  */
-tree
+const char *
 lookup_name_fuzzy (tree name, enum lookup_name_fuzzy_kind kind)
 {
   gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
@@ -4079,7 +4079,11 @@ lookup_name_fuzzy (tree name,
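The machinery behind these %qs hints is, in essence, a best-match search over the known identifiers by edit distance, with a cutoff so that wildly different names are never offered. A minimal language-neutral sketch of that idea (not the actual GCC implementation; the cutoff rule here is an illustrative assumption):

```python
def edit_distance(a, b):
    # Classic Levenshtein dynamic program: minimum number of
    # single-character insertions, deletions, and substitutions
    # needed to turn string a into string b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

def lookup_name_fuzzy(name, candidates):
    # Keep the first candidate with the strictly smallest distance
    # (ties favor earlier candidates), then apply a closeness cutoff
    # so unrelated names are not suggested.
    best, best_dist = None, None
    for c in candidates:
        d = edit_distance(name, c)
        if best_dist is None or d < best_dist:
            best, best_dist = c, d
    if best is None:
        return None
    cutoff = max(len(name), len(best)) // 2
    return best if best_dist <= cutoff else None
```

With this sketch, the "singed" vs "signed" case from the patch description produces a suggestion, while a completely unrelated identifier produces none.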
[PATCH 2/2] C++ FE: handle misspelled identifiers and typenames
This is a port of the C frontend's r237714 [1] to the C++ frontend:
  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01052.html
offering spelling suggestions for misspelled identifiers, macro names, and
some keywords (e.g. "singed" vs "signed", aka PR c/70339).

Unlike the C frontend, there doesn't seem to be an easy way to distinguish
between cases where we're expecting a typename vs a variable name, so some
of the logic is a little different.  Examples of suggestions can be seen in
the test case.

Successfully bootstrapped on x86_64-pc-linux-gnu (in combination with the
prior patch); adds 240 PASS results to g++.sum.

OK for trunk?

Dave

[1] aka 8469aece13814deddf2cd80538d33c2d0a8d60d9 in the git mirror

gcc/c/ChangeLog:
	PR c/70339
	* c-decl.c (struct edit_distance_traits): Move to spellcheck-tree.h.
	(best_macro_match): Likewise, converting from a typedef to a
	subclass.
	(find_closest_macro_cpp_cb): Move to spellcheck-tree.c.
	(lookup_name_fuzzy): Update for change of best_macro_match to a
	subclass with a ctor that calls cpp_forall_identifiers.

gcc/cp/ChangeLog:
	PR c/70339
	* name-lookup.c: Include gcc-rich-location.h, spellcheck-tree.h,
	and parser.h.
	(suggest_alternatives_for): If no candidates are found, try
	lookup_name_fuzzy and report if it finds a suggestion.
	(consider_binding_level): New function.
	(consider_binding_levels): New function.
	(lookup_name_fuzzy): New function.
	* parser.c: Include gcc-rich-location.h.
	(cp_lexer_next_token_is_decl_specifier_keyword): Move most of
	logic into...
	(cp_keyword_starts_decl_specifier_p): ...this new function.
	(cp_parser_diagnose_invalid_type_name): When issuing "does not
	name a type" errors, attempt to make a suggestion using
	lookup_name_fuzzy.
	* parser.h (cp_keyword_starts_decl_specifier_p): New prototype.
	* search.c (lookup_field_fuzzy_info::fuzzy_lookup_field): Don't
	attempt to access TYPE_FIELDS within a TYPE_PACK_EXPANSION.
gcc/ChangeLog:
	PR c/70339
	* diagnostic-show-locus.c (diagnostic_show_locus): If this is the
	same location as last time, don't skip if we have fix-it hints.
	Clarify the skipping logic by converting it from one "if" clause
	to repeated "if" clauses.
	* spellcheck-tree.c: Include "cpplib.h".
	(find_closest_macro_cpp_cb): Move here from c/c-decl.c.
	(best_macro_match::best_macro_match): New constructor.
	* spellcheck-tree.h (struct edit_distance_traits): Move here from
	c/c-decl.c.
	(class best_macro_match): Move here from c/c-decl.c, converting
	from a typedef to a subclass, gaining a ctor.

gcc/testsuite/ChangeLog:
	PR c/70339
	* g++.dg/spellcheck-identifiers.C: New test case, based on
	gcc.dg/spellcheck-identifiers.c.
	* g++.dg/spellcheck-typenames.C: New test case, based on
	gcc.dg/spellcheck-typenames.c.
---
 gcc/c/c-decl.c                                |  42 +
 gcc/cp/name-lookup.c                          | 140 +-
 gcc/cp/parser.c                               |  43 +++--
 gcc/cp/parser.h                               |   1 +
 gcc/cp/search.c                               |   4 +
 gcc/diagnostic-show-locus.c                   |  15 +-
 gcc/spellcheck-tree.c                         |  31
 gcc/spellcheck-tree.h                         |  26 +++
 gcc/testsuite/g++.dg/spellcheck-identifiers.C | 255 ++
 gcc/testsuite/g++.dg/spellcheck-typenames.C   |  84 +
 10 files changed, 584 insertions(+), 57 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-identifiers.C
 create mode 100644 gcc/testsuite/g++.dg/spellcheck-typenames.C

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 09f7f79..dc14485 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -3955,45 +3955,6 @@ lookup_name_in_scope (tree name, struct c_scope *scope)
   return NULL_TREE;
 }

-/* Specialization of edit_distance_traits for preprocessor macros.  */
-
-template <>
-struct edit_distance_traits<cpp_hashnode *>
-{
-  static size_t get_length (cpp_hashnode *hashnode)
-  {
-    return hashnode->ident.len;
-  }
-
-  static const char *get_string (cpp_hashnode *hashnode)
-  {
-    return (const char *)hashnode->ident.str;
-  }
-};
-
-/* Specialization of best_match<> for finding the closest preprocessor
-   macro to a given identifier.  */
-
-typedef best_match<tree, cpp_hashnode *> best_macro_match;
-
-/* A callback for cpp_forall_identifiers, for use by lookup_name_fuzzy.
-   Process HASHNODE and update the best_macro_match instance pointed to
-   by USER_DATA.  */
-
-static int
-find_closest_macro_cpp_cb (cpp_reader *, cpp_hashnode *hashnode,
-			   void *user_data)
-{
-  if (hashnode->type != NT_MACRO)
-    return 1;
-
-  best_macro_match *bmm = (best_macro_match *)user_data;
-
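find_closest_macro_cpp_cb illustrates the pattern being moved here: an opaque walker (cpp_forall_identifiers) plus a callback that filters out non-macro hash nodes and folds each remaining candidate into a running best match threaded through the user_data pointer. A rough sketch of that pattern (the distance function below is an illustrative stand-in, not GCC's edit-distance code):

```python
class BestMatch:
    """Running best candidate, by a caller-supplied distance function."""
    def __init__(self, goal, distance):
        self.goal, self.distance = goal, distance
        self.best, self.best_dist = None, None

    def consider(self, candidate):
        d = self.distance(self.goal, candidate)
        if self.best_dist is None or d < self.best_dist:
            self.best, self.best_dist = candidate, d

def forall_identifiers(table, callback, user_data):
    # Stand-in for cpp_forall_identifiers: visit every hash-table entry,
    # passing the opaque user_data through to the callback; a falsy
    # return value stops the walk.
    for node in table:
        if not callback(node, user_data):
            break

def find_closest_macro_cb(node, bmm):
    kind, name = node
    if kind != "macro":          # mirrors the NT_MACRO filter
        return True
    bmm.consider(name)
    return True
```

A toy distance (count of mismatched aligned characters plus the length difference) is enough to exercise the accumulator.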
[RFC] Change order of Fortran BLOCK_VARS (PR fortran/71687)
Hi!

While usually the order of vars in block scope isn't that significant,
e.g. on the following testcase with -fopenmp -fstack-arrays it does matter
if we have local declaration of a temporary and then local declaration of
a VLA that uses that other temporary as the TYPE_SIZE_UNIT, or the other
way around (we ICE if we first see the VLA and firstprivatize the variable,
only to later on discover it is a local var).  pushdecl is called on them
in the right order (scalar temporary first, then VLA that uses it), but
the names chain is LIFO.  Actually, with gfc_merge_block_scope it seems we
end up in quite randomish order.

So, is it intentional that pushdecl / getdecls acts like a LIFO?  Though,
it seems user vars are pushdecled in the reverse order of declarations,
but gfc_add_decl_to_function is called in the user declared order, so
perhaps for the following patch we'd also need to
  decl = nreverse (saved_function_decls);
in gfc_generate_function_code and similarly in gfc_process_block_locals
  decl = nreverse (saved_local_decls);

What do you think about this?  Note, nreverse is fairly cheap, it is
linear in the chain length.

2016-06-30  Jakub Jelinek

	PR fortran/71687
	* f95-lang.c (struct binding_level): Add reversed field.
	(clear_binding_level): Adjust initializer.
	(getdecls): If reversed is clear, set it and nreverse the names
	chain before returning it.
	(poplevel): Use getdecls.

	* gfortran.dg/gomp/pr71687.f90: New test.

--- gcc/fortran/f95-lang.c.jj	2016-06-29 10:46:33.0 +0200
+++ gcc/fortran/f95-lang.c	2016-06-30 17:11:44.350240191 +0200
@@ -286,6 +286,9 @@ binding_level {
   tree blocks;
   /* The binding level containing this one (the enclosing binding level).  */
   struct binding_level *level_chain;
+  /* True if nreverse has been already called on names; if false, names
+     are ordered from newest declaration to oldest one.  */
+  bool reversed;
 };

 /* The binding level currently in effect.  */
@@ -296,7 +299,7 @@ static GTY(()) struct binding_level *cur
 static GTY(()) struct binding_level *global_binding_level;

 /* Binding level structures are initialized by copying this one.  */
-static struct binding_level clear_binding_level = { NULL, NULL, NULL };
+static struct binding_level clear_binding_level = { NULL, NULL, NULL, false };

 /* Return true if we are in the global binding level.  */
@@ -310,6 +313,11 @@ global_bindings_p (void)
 tree
 getdecls (void)
 {
+  if (!current_binding_level->reversed)
+    {
+      current_binding_level->reversed = true;
+      current_binding_level->names = nreverse (current_binding_level->names);
+    }
   return current_binding_level->names;
 }
@@ -347,7 +355,7 @@ poplevel (int keep, int functionbody)
    binding level that we are about to exit and which is returned by this
    routine.  */
   tree block_node = NULL_TREE;
-  tree decl_chain = current_binding_level->names;
+  tree decl_chain = getdecls ();
   tree subblock_chain = current_binding_level->blocks;
   tree subblock_node;

--- gcc/testsuite/gfortran.dg/gomp/pr71687.f90.jj	2016-06-30 17:12:27.279702475 +0200
+++ gcc/testsuite/gfortran.dg/gomp/pr71687.f90	2016-06-30 17:12:53.480374296 +0200
@@ -0,0 +1,11 @@
+! PR fortran/71687
+! { dg-do compile }
+! { dg-additional-options "-fstack-arrays -O2" }
+
+subroutine s (n, x)
+  integer :: n
+  real :: x(n)
+!$omp parallel
+  x(1:n) = x(n:1:-1)
+!$omp end parallel
+end

	Jakub
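The LIFO behaviour under discussion, and why nreverse is cheap, can be modelled with a singly linked chain (a simplified sketch, not the GCC tree chain itself): pushdecl prepends, so walking the chain yields newest-first order, and an in-place reversal restores declaration order in one linear pass.

```python
class Decl:
    def __init__(self, name):
        self.name, self.chain = name, None

def pushdecl(names, decl):
    # Like the binding-level names chain: prepend, so the chain is LIFO.
    decl.chain = names
    return decl

def nreverse(head):
    # In-place reversal, linear in the chain length.
    prev = None
    while head is not None:
        head.chain, prev, head = prev, head, head.chain
    return prev

def chain_names(head):
    out = []
    while head is not None:
        out.append(head.name)
        head = head.chain
    return out
```

Pushing a scalar temporary and then the VLA that uses it leaves the VLA first on the chain, which is exactly the order the ICE scenario above describes; nreverse recovers the declared order.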
Re: [PATCHv2, testsuite] Enable some existing __float128 tests for powerpc
On Thu, 30 Jun 2016, Bill Schmidt wrote: > +# Return 1 if the target supports __float128 at run time, > +# 0 otherwise. > + > +proc check_effective_target___float128_runnable { } { I'd think you should have an effective-target for this that's shared between _Float128 and __float128, possibly one that gets anded with the one for the relevant type being available at all. > +# Return 1 if the *-*-*gnu* target supports __float128 at run time, > +# 0 otherwise. *-*-*gnu* here is actually a proxy for feenableexcept trap enablement being available; if you're adding an effective-target, I'd think it should be one for feenableexcept with an actual test of function availability, which gets anded with the others. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH, rs6000] Enable some existing __float128 tests for powerpc64*
On Thu, 30 Jun 2016, Bill Schmidt wrote: > On Wed, 2016-06-29 at 16:37 +, Joseph Myers wrote: > > On Tue, 28 Jun 2016, Bill Schmidt wrote: > > > > > -/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* } } */ > > > +/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* powerpc64*-*-* > > > } } */ > > > /* { dg-options "-pedantic" } */ > > > +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc64*-*-* } > > > } */ > > > > Rather than duplicating powerpc64 references everywhere, wouldn't it be > > better to add an effective-target keyword __float128, meaning that > > __float128 is available? Along with { dg-add-options float128 }. > > Sure, I can do that. I guess I need to stay away from the name > check_effective_target_float128, though, as in your pending patch that > will mean availability of _Float128. Do you have a naming preference? My suggestion is check_effective_target___float128, so __float128 is the effective-target keyword. (You don't need separate naming for the dg-add-options case, since the same options will work for both __float128 and _Float128.) -- Joseph S. Myers jos...@codesourcery.com
[committed] Fix ICE with !$omp target update (PR fortran/71705)
Hi!

The following patch fixes an ICE on target update construct clauses.
While the testcase is (at runtime) invalid in OpenMP 4.0, it is valid in
4.5, and we shouldn't ICE on it anyway.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued for 6.2 backport.

2016-06-30  Jakub Jelinek

	PR fortran/71705
	* trans-openmp.c (gfc_trans_omp_clauses): Set TREE_ADDRESSABLE on
	decls in to/from clauses.

	* gfortran.dg/gomp/pr71705.f90: New test.

--- gcc/fortran/trans-openmp.c.jj	2016-06-03 21:25:17.0 +0200
+++ gcc/fortran/trans-openmp.c	2016-06-30 12:39:15.253931614 +0200
@@ -2182,6 +2182,8 @@ gfc_trans_omp_clauses (stmtblock_t *bloc
	    tree decl = gfc_get_symbol_decl (n->sym);
	    if (gfc_omp_privatize_by_reference (decl))
	      decl = build_fold_indirect_ref (decl);
+	    else if (DECL_P (decl))
+	      TREE_ADDRESSABLE (decl) = 1;
	    if (GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (decl)))
	      {
		tree type = TREE_TYPE (decl);

--- gcc/testsuite/gfortran.dg/gomp/pr71705.f90.jj	2016-06-30 12:50:49.076186209 +0200
+++ gcc/testsuite/gfortran.dg/gomp/pr71705.f90	2016-06-30 12:48:06.0 +0200
@@ -0,0 +1,7 @@
+! PR fortran/71705
+! { dg-do compile }
+
+  real :: x
+  x = 0.0
+  !$omp target update to(x)
+end

	Jakub
[committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)
Hi!

The Fortran parser apparently relies, in functions that still have an
undecided kind of the result, on the artificial ST_GET_FCN_CHARACTERISTICS
statement being returned before any executable statements in the function.
In normal statements that is ensured through decode_statement calling
decode_specification_statement, which parses just a subset of statements,
but for OpenMP we need to do something similar.  If we figure out we want
only the case_omp_decl statements, for any other we just try to gfc_match
the keyword, and if we match it, it means we'd be about to return an OpenMP
executable statement, so instead we return ST_GET_FCN_CHARACTERISTICS.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued for 6.2 backport.

Cesar, note OpenACC will need something similar (though,
decode_acc_statement uses just the match macro, so you'll need another one
for the executable statements).

2016-06-30  Jakub Jelinek

	PR fortran/71704
	* parse.c (matchs, matcho): Move right before decode_omp_directive.
	If spec_only, only gfc_match the keyword and if successful, goto
	do_spec_only.
	(matchds, matchdo): Define.
	(decode_omp_directive): Add spec_only local var and set it.
	Use matchds or matchdo macros instead of matchs or matcho for
	declare target, declare simd, declare reduction and threadprivate
	directives.  Return ST_GET_FCN_CHARACTERISTICS if a non-declarative
	directive could be matched.
	(next_statement): For ST_GET_FCN_CHARACTERISTICS restore
	gfc_current_locus from old_locus even if there is no label.

	* gfortran.dg/gomp/pr71704.f90: New test.

--- gcc/fortran/parse.c.jj	2016-06-01 14:20:51.0 +0200
+++ gcc/fortran/parse.c	2016-06-30 15:32:20.003410398 +0200
@@ -589,28 +589,6 @@ decode_statement (void)
   return ST_NONE;
 }

-/* Like match, but set a flag simd_matched if keyword matched.
*/ -#define matchs(keyword, subr, st) \ -do { \ - if (match_word_omp_simd (keyword, subr, _locus, \ - _matched) == MATCH_YES) \ - return st; \ - else \ - undo_new_statement (); \ -} while (0); - -/* Like match, but don't match anything if not -fopenmp. */ -#define matcho(keyword, subr, st) \ -do { \ - if (!flag_openmp)\ - ; \ - else if (match_word (keyword, subr, _locus) \ - == MATCH_YES)\ - return st; \ - else \ - undo_new_statement (); \ -} while (0); - static gfc_statement decode_oacc_directive (void) { @@ -702,12 +680,63 @@ decode_oacc_directive (void) return ST_NONE; } +/* Like match, but set a flag simd_matched if keyword matched + and if spec_only, goto do_spec_only without actually matching. */ +#define matchs(keyword, subr, st) \ +do { \ + if (spec_only && gfc_match (keyword) == MATCH_YES) \ + goto do_spec_only; \ + if (match_word_omp_simd (keyword, subr, _locus, \ + _matched) == MATCH_YES) \ + return st; \ + else \ + undo_new_statement (); \ +} while (0); + +/* Like match, but don't match anything if not -fopenmp + and if spec_only, goto do_spec_only without actually matching. */ +#define matcho(keyword, subr, st) \ +do { \ + if (!flag_openmp)\ + ; \ + else if (spec_only && gfc_match (keyword) == MATCH_YES) \ + goto do_spec_only; \ + else if (match_word (keyword, subr, _locus) \ + == MATCH_YES)\ + return st; \ + else \ + undo_new_statement (); \ +} while (0); + +/* Like match, but set a flag simd_matched if keyword matched. */ +#define matchds(keyword, subr, st)
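The effect of the matcho/matchds split can be modelled simply: the directive keyword is always matched tentatively first, and in spec-only mode a match on anything non-declarative bails out with the ST_GET_FCN_CHARACTERISTICS sentinel instead of consuming an executable statement. A simplified model of that control flow (the directive tables below are illustrative, not the full OpenMP set):

```python
DECLARATIVE = {"declare target", "declare simd", "declare reduction",
               "threadprivate"}
EXECUTABLE = {"parallel", "target update", "task"}

def decode_omp_directive(line, spec_only):
    # Try longer keywords first so e.g. "declare simd" is not shadowed
    # by a shorter prefix.
    for kw in sorted(DECLARATIVE | EXECUTABLE, key=len, reverse=True):
        if line.startswith(kw):
            if spec_only and kw not in DECLARATIVE:
                # We'd be about to return an executable OpenMP statement
                # inside the specification part: bail out instead.
                return "ST_GET_FCN_CHARACTERISTICS"
            return "ST_OMP_" + kw.replace(" ", "_").upper()
    return "ST_NONE"
```

Declarative directives still parse normally in spec-only mode, matching the intent of the new matchds/matchdo macros.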
[PATCH] Fix tree-ssa-strlen.c ICE (PR tree-optimization/71707)
Hi!

I thought for ADDR_EXPRs we couldn't reach this, but we actually can, when
there is an SSA_NAME with POINTER_PLUS_EXPR of ADDR_EXPR and constant.
The following patch handles it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-06-30  Jakub Jelinek

	PR tree-optimization/71707
	* tree-ssa-strlen.c (get_stridx_plus_constant): Handle already
	present strinfo even for ADDR_EXPR ptr.

	* gcc.dg/strlenopt-29.c: New test.

--- gcc/tree-ssa-strlen.c.jj	2016-06-29 16:10:27.0 +0200
+++ gcc/tree-ssa-strlen.c	2016-06-30 10:28:30.605430361 +0200
@@ -677,8 +677,14 @@ get_stridx_plus_constant (strinfo *bases
       {
	if (r == 0)
	  {
-	    gcc_assert (TREE_CODE (ptr) == SSA_NAME);
-	    ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)] = si->idx;
+	    if (TREE_CODE (ptr) == SSA_NAME)
+	      ssa_ver_to_stridx[SSA_NAME_VERSION (ptr)] = si->idx;
+	    else
+	      {
+		int *pidx = addr_stridxptr (TREE_OPERAND (ptr, 0));
+		if (pidx != NULL && *pidx == 0)
+		  *pidx = si->idx;
+	      }
	    return si->idx;
	  }
	break;

--- gcc/testsuite/gcc.dg/strlenopt-29.c.jj	2016-06-30 10:36:34.673371348 +0200
+++ gcc/testsuite/gcc.dg/strlenopt-29.c	2016-06-30 10:38:21.731032244 +0200
@@ -0,0 +1,27 @@
+/* PR tree-optimization/71707 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-strlen" } */
+
+#include "strlenopt.h"
+
+char a[32];
+size_t b;
+
+__attribute__((noinline, noclone)) char *
+foo (void)
+{
+  char *p = memcpy (a, "a", 2) + 1;
+  memcpy (&a[1], "b", 2);
+  b = strlen (a) + strlen (&a[1]) + strlen (p);
+  return p;
+}
+
+int
+main ()
+{
+  if (foo () != &a[1] || b != 4)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "strlen \\(" 0 "strlen" } } */

	Jakub
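The shape of the fix, recording the string index in an address-keyed table when ptr is an ADDR_EXPR and in the SSA-version table otherwise, can be sketched with two maps (a simplified model, not the actual tree-ssa-strlen.c data structures):

```python
class StrlenTracker:
    # Two lookup tables: one keyed by SSA name, one keyed by an
    # (object, byte-offset) pair standing in for an ADDR_EXPR.
    def __init__(self):
        self.ssa_to_idx = {}
        self.addr_to_idx = {}

    def record(self, ptr, idx):
        kind, key = ptr
        if kind == "ssa":
            self.ssa_to_idx[key] = idx
        else:
            # Like "if (pidx != NULL && *pidx == 0)": don't overwrite
            # an index that is already present.
            self.addr_to_idx.setdefault(key, idx)

    def lookup(self, ptr):
        kind, key = ptr
        table = self.ssa_to_idx if kind == "ssa" else self.addr_to_idx
        return table.get(key)
```

The testcase's `&a[1]` is exactly the address-keyed case: the pointer is a constant offset into a known object, not an SSA name.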
[PATCHv2, testsuite] Enable some existing __float128 tests for powerpc
Hi,

My previous patch added to the specific target list for the tests, and
Joseph suggested it would be better to use an effective target check.
This version does this.  To avoid a name clash with Joseph's pending
patch, the effective target names include __float128 rather than float128.

I discovered that most of the tests work even for -m32 on big-endian,
except those that depend on int128 support, so I've loosened the 64-bit
restriction.

I used the term "runnable" rather than the more common "hw_available" for
the dg-run tests, to avoid a point of potential confusion.
__float128_hw_available for PowerPC implies hardware support for the
individual instructions, but we want to run these tests even when those
are not present.  However, we need VSX hardware support because of ABI
restrictions for __float128.

Tested on powerpc64[le]-unknown-linux-gnu.  Is this ok for trunk, and
eventual backport to gcc-6-branch?

Thanks,
Bill

2016-06-30  Bill Schmidt

	* gcc.dg/const-float128-ped.c: Require __float128 effective
	target, and specify additional options for powerpc*-*-*.
	* gcc.dg/const-float128.c: Likewise.
	* gcc.dg/torture/float128-cmp-invalid.c: Require
	__float128_runnable effective target, and specify additional
	options for powerpc*-*-*.
	* gcc.dg/torture/float128-div-underflow.c: Likewise.
	* gcc.dg/torture/float128-exact-underflow.c: Require
	__float128_runnable_gnu effective target, and specify additional
	options for powerpc*-*-*.
	* gcc.dg/torture/float128-extend-nan.c: Require
	__float128_runnable effective target, and specify additional
	options for powerpc*-*-*.
	* gcc.dg/torture/float128-nan.c: Likewise.
	* gcc.dg/torture/fp-int-convert-float128-timode-2.c: Likewise.
	* gcc.dg/torture/fp-int-convert-float128-timode-3.c: Likewise.
	* gcc.dg/torture/fp-int-convert-float128-timode.c: Likewise.
	* lib/target-supports.exp (check_effective_target___float128): New.
	(check_effective_target___float128_runnable): New.
	(check_effective_target___float128_runnable_gnu): New.
Index: gcc/testsuite/gcc.dg/const-float128-ped.c === --- gcc/testsuite/gcc.dg/const-float128-ped.c (revision 237802) +++ gcc/testsuite/gcc.dg/const-float128-ped.c (working copy) @@ -1,5 +1,7 @@ /* Test 'q' suffix with -pedantic on __float128 type constants. */ -/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-do compile } */ +/* { dg-require-effective-target __float128 } */ /* { dg-options "-pedantic" } */ +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc*-*-* } } */ __float128 a = 123.456789q; /* { dg-warning "non-standard suffix on floating constant" } */ Index: gcc/testsuite/gcc.dg/const-float128.c === --- gcc/testsuite/gcc.dg/const-float128.c (revision 237802) +++ gcc/testsuite/gcc.dg/const-float128.c (working copy) @@ -1,6 +1,8 @@ /* Test 'q' and 'Q' suffixes on __float128 type constants. */ -/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-do compile } */ +/* { dg-require-effective-target __float128 } */ /* { dg-options "" } */ +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc*-*-* } } */ __float128 a = 123.456789q; __float128 b = 123.456789Q; Index: gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c === --- gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c (revision 237802) +++ gcc/testsuite/gcc.dg/torture/float128-cmp-invalid.c (working copy) @@ -1,7 +1,9 @@ /* Test for "invalid" exceptions from __float128 comparisons. 
*/ -/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* } } */ +/* { dg-do run } */ /* { dg-options "" } */ +/* { dg-require-effective-target __float128_runnable } */ /* { dg-require-effective-target fenv_exceptions } */ +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc*-*-* } } */ #include #include Index: gcc/testsuite/gcc.dg/torture/float128-div-underflow.c === --- gcc/testsuite/gcc.dg/torture/float128-div-underflow.c (revision 237802) +++ gcc/testsuite/gcc.dg/torture/float128-div-underflow.c (working copy) @@ -1,7 +1,9 @@ /* Test for spurious underflow from __float128 division. */ -/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* } } */ +/* { dg-do run } */ /* { dg-options "" } */ +/* { dg-require-effective-target __float128_runnable } */ /* { dg-require-effective-target fenv_exceptions } */ +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc*-*-* } } */ #include #include Index: gcc/testsuite/gcc.dg/torture/float128-exact-underflow.c === ---
[lra] Cleanup the use of offmemok and don't count spilling cost for it
On 29/06/16 22:31, Jiong Wang wrote: Andreas Krebbel writes: On 06/28/2016 04:16 PM, Jiong Wang wrote: ... So my first impression is TARGET_LEGITIMATE_ADDRESS_P on s390 do need a fix here. The following draft patch fix this, my fix may be in correct as normally we will allow illegal constant offset if it's associated with virtual frame register before virtual register elimination, I don't know if that should be considered here. Also I don't know if such check should be constrainted under some architecture features. --- a/gcc/config/s390/s390.c +++ b/gcc/config/s390/s390.c @@ -4374,6 +4374,11 @@ s390_legitimate_address_p (machine_mode mode, rtx addr, bool strict) || REGNO_REG_CLASS (REGNO (ad.indx)) == ADDR_REGS)) return false; } + + if (ad.disp + && !s390_short_displacement (ad.disp)) +return false; + return true; } Whether short displacement is required or not depends on the instruction. We always relied on reload to take care of this. legitimate_address_p needs to accept the union of all valid adresses on S/390. That change probably would cause a severe performance regression. Meanwhile I feel LRA might be too strict here, as we can't assume all address legitimized before lra, right? we still need to handle reload illegal address. While now, a reload of set (mem (reg + reg + offset)) regA (The offset is out of range that the inner address doesn't pass constraint check.) into regB <- reg + reg + offset (set (mem (regB)) regA) This is exactly what is supposed to happen in that case. All addresses which might be invalid to be used directly in an insn can be handled with load address or load address relative long (+ secondary reload for odd addresses). is treate as spill after r237277, so I feel the following check in lra-constraints.c also needs a relax? if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0)) Comments? I am testing a fix which teach LRA don't add spill cost if it's "offmemok". 
Here is the patch.

From my understanding, "offmemok" is used to represent a memory operand
whose address we want to reload, and searching its reference locations
seems to confirm my understanding, as it's always used together with a
MEM_P check.

So this patch does the following modifications:

  * Only set offmemok to true if MEM_P is also true, as otherwise
    offmemok is not used.
  * Remove the redundant MEM_P check where it was used together with
    offmemok.
  * Avoid the addition of spilling cost if offmemok is true, as an
    address calculation reload is not spilling.

bootstrap & gcc/g++ regression OK on x86_64/aarch64/arm.

OK for trunk?

2016-06-30  Jiong Wang

gcc
	* lra-constraints.c (process_alt_operands): Only set "offmemok"
	for MEM_P.  Remove redundant MEM_P check if it's used together
	with "offmemok" check.  Don't add spilling cost for "offmemok".
	(curr_insn_transform): Remove redundant MEM_P check.

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index bf08dce..4546f1e 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -1967,7 +1967,6 @@ process_alt_operands (int only_alternative)
		   combination, because we can't reload it.  */
		if (curr_alt_offmemok[m]
-		    && MEM_P (*curr_id->operand_loc[m])
		    && curr_alt[m] == NO_REGS && ! curr_alt_win[m])
		  continue;
	      }
@@ -2089,7 +2088,8 @@ process_alt_operands (int only_alternative)
		      || equiv_substition_p[nop])
		    badop = false;
		  constmemok = true;
-		  offmemok = true;
+		  if (MEM_P (op))
+		    offmemok = true;
		  break;

		case CT_ADDRESS:
@@ -2436,8 +2436,7 @@ process_alt_operands (int only_alternative)
		      reject += LRA_MAX_REJECT;
		  }

-	      if (! (MEM_P (op) && offmemok)
-		  && ! (const_to_mem && constmemok))
+	      if (!offmemok && ! (const_to_mem && constmemok))
		{
		  /* We prefer to reload pseudos over reloading other
		     things, since such reloads may be able to be
@@ -2488,7 +2487,9 @@ process_alt_operands (int only_alternative)
		 Code below increases the reject for both pseudo and
		 non-pseudo spill.  */
-	      if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
+	      if (no_regs_p
+		  && !offmemok
+		  && !(REG_P (op) && hard_regno[nop] < 0))
		{
		  if (lra_dump_file != NULL)
		    fprintf
@@ -3909,7 +3910,7 @@ curr_insn_transform (bool check_only_p)
	 appearing where an offsettable address will do.  It also may
	 be a case when the address should be special in other words
	 not a general one (e.g. it needs no index reg).  */
-      if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i] && MEM_P (op))
+      if (goal_alt_matched[i][0] == -1 && goal_alt_offmemok[i])
	{
	  enum reg_class rclass;
	  rtx *loc = &XEXP (op, 0);
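The distinction this patch draws, that an address reload is not a spill, can be illustrated with a toy legitimizer: when a displacement does not fit the machine's addressing mode, only the address computation moves into a fresh register, while the operand itself stays in memory. (The displacement limit below is an arbitrary stand-in for illustration, not the s390 short-displacement rule.)

```python
SHORT_DISP_MAX = 4095  # assumed displacement limit, illustration only

def reload_address(base, index, disp, fresh_reg):
    # Returns (memory_operand, extra_insns).  If the displacement fits,
    # the memory operand is legitimate as-is.  Otherwise compute the
    # full address into fresh_reg and use mem(fresh_reg): an address
    # reload, which should not be costed like spilling the operand.
    if 0 <= disp <= SHORT_DISP_MAX:
        return ("mem", base, index, disp), []
    insns = [(fresh_reg, "add", base, index, disp)]
    return ("mem", fresh_reg, None, 0), insns
```

In both outcomes the data never leaves memory, which is the argument for not adding spill cost when offmemok holds.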
Re: [PATCH][ARM] Delete thumb_reload_in_hi
Ping.  Looking back at this it's probably obvious, but since I had already
asked for approval...

Thanks,
Kyrill

On 17/06/16 15:38, Kyrill Tkachov wrote:
> Ping.
> https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00821.html
>
> Thanks,
> Kyrill
>
> On 10/06/16 15:55, Kyrill Tkachov wrote:
>> Hi all,
>>
>> This function just ICEs and isn't actually called from anywhere.
>> It was introduced back in 2000 as part of a large merge introducing
>> Thumb support and was aborting even then.  I don't think having it
>> around is of any benefit.
>>
>> Tested on arm-none-eabi.
>>
>> Ok for trunk?
>>
>> Thanks,
>> Kyrill
>>
>> 2016-06-10  Kyrylo Tkachov
>>
>>     * config/arm/arm.c (thumb_reload_in_hi): Delete.
>>     * config/arm/arm-protos.h (thumb_reload_in_hi): Delete prototype.
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
This is the full test failure summary. I will compare with the previous commit tomorrow. The asan failure and the uninit-19 failure are interesting. uninit-19 should not have failed, but I have no idea what's going on with asan. I'll have a closer look tomorrow and look for other failing tests. cat <<'EOF' | Native configuration is x86_64-apple-darwin15.3.0 === g++ tests === Running target unix FAIL: g++.dg/debug/dwarf2/imported-decl-2.C -std=gnu++98 scan-assembler-times ascii "0".*ascii "0".*DIE .0x[0-9a-f]*. DW_TAG_imported_declaration 1 FAIL: g++.dg/debug/dwarf2/imported-decl-2.C -std=gnu++11 scan-assembler-times ascii "0".*ascii "0".*DIE .0x[0-9a-f]*. DW_TAG_imported_declaration 1 FAIL: g++.dg/debug/dwarf2/imported-decl-2.C -std=gnu++14 scan-assembler-times ascii "0".*ascii "0".*DIE .0x[0-9a-f]*. DW_TAG_imported_declaration 1 FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 92) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 93) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 94) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 95) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 96) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 98) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 99) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 100) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 101) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++11 (test for errors, line 102) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 92) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 93) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 94) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 95) FAIL: 
g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 96) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 98) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 99) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 100) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 101) FAIL: g++.dg/cpp0x/constexpr-array-ptr10.C -std=c++14 (test for errors, line 102) FAIL: g++.dg/ext/sync-4.C -std=gnu++98 execution test FAIL: g++.dg/ext/sync-4.C -std=gnu++11 execution test FAIL: g++.dg/ext/sync-4.C -std=gnu++14 execution test FAIL: g++.dg/ipa/pr67056.C scan-ipa-dump cp "Speculative outer type:struct CompositeClass" FAIL: c-c++-common/pr60226.c -std=gnu++98 (test for excess errors) FAIL: c-c++-common/pr60226.c -std=gnu++11 (test for excess errors) FAIL: c-c++-common/pr60226.c -std=gnu++14 (test for excess errors) FAIL: g++.dg/tree-prof/pr57451.C compilation, -fprofile-generate -D_PROFILE_GENERATE UNRESOLVED: g++.dg/tree-prof/pr57451.C execution, -fprofile-generate -D_PROFILE_GENERATE UNRESOLVED: g++.dg/tree-prof/pr57451.C compilation, -fprofile-use -D_PROFILE_USE UNRESOLVED: g++.dg/tree-prof/pr57451.C execution,-fprofile-use -D_PROFILE_USE FAIL: g++.dg/tree-prof/pr63581.C compilation, -fprofile-generate -D_PROFILE_GENERATE UNRESOLVED: g++.dg/tree-prof/pr63581.C execution, -fprofile-generate -D_PROFILE_GENERATE UNRESOLVED: g++.dg/tree-prof/pr63581.C compilation, -fprofile-use -D_PROFILE_USE UNRESOLVED: g++.dg/tree-prof/pr63581.C execution,-fprofile-use -D_PROFILE_USE === g++ Summary === # of expected passes98650 # of unexpected failures32 # of expected failures314 # of unresolved testcases6 # of unsupported tests4169 /Users/manishearth/dev/gcc-build/gcc/testsuite/g++/../../xg++ version 7.0.0 20160630 (experimental) (GCC) === gcc tests === Running target unix FAIL: c-c++-common/asan/strncpy-overflow-1.c -O0 execution test FAIL: 
gcc.dg/debug/dwarf2/prod-options.c scan-assembler DW_AT_producer: "GNU C FAIL: gcc.dg/darwin-minversion-1.c (test for excess errors) FAIL: gcc.dg/darwin-minversion-2.c (test for excess errors) FAIL: gcc.dg/darwin-version-1.c (test for excess errors) FAIL: gcc.dg/framework-1.c (test for excess errors) FAIL: gcc.dg/uninit-19.c (test for warnings, line 22) FAIL: gcc.dg/uninit-19.c (test for excess errors) FAIL: c-c++-common/pr60226.c -Wc++-compat (test for excess errors) FAIL: gcc.dg/graphite/scop-19.c scan-tree-dump-times graphite "number of SCoPs: 0" 1 FAIL: gcc.dg/torture/pr68264.c -O0 execution test FAIL: gcc.dg/torture/pr68264.c -O1 execution test FAIL: gcc.dg/torture/pr68264.c -O2 execution test FAIL: gcc.dg/torture/p
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, 30 Jun 2016, Richard Biener wrote: points-to analysis already has the constraint that POINTER_PLUS_EXPR cannot leave the object op0 points to. Of course currently nothing uses the fact whether points-to computes pointed-to as nothing (aka NULL) - so the argument may be moot. Anyway, one of my points to the original patch was that POINTER_PLUS_EXPR handling should be clearly separate from PLUS_EXPR and that we have flag_delete_null_pointer_checks to allow targest to declare that 0 is a valid object pointer (and thus you can do 4 + -4 and reach NULL). Thanks. So the tricky point is that we are not allowed to transform g into f below: char*f(char*p){return p+4;} char*g(char*p){return (char*)((intptr_t)p+4);} That makes sense and seems much easier to guarantee than I feared, nice. (on the other hand, only RTL is able to simplify (long)p+4-(long)(p+4)) -- Marc Glisse
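The "4 + -4 and reach NULL" remark can be made concrete with modular arithmetic: once a pointer has been laundered through an integer type, the addition is free to wrap and legitimately land on address 0, which is exactly why g above cannot be rewritten as f (whose POINTER_PLUS_EXPR is assumed to stay within the pointed-to object). A sketch assuming a flat 64-bit address space:

```python
MASK = (1 << 64) - 1   # assume 64-bit addresses, illustration only

def int_plus(p, n):
    # Models (char *)((intptr_t)p + n): plain modular integer
    # arithmetic, with no object-boundary assumptions at all.
    return (p + n) & MASK
```

On targets where 0 is a valid object pointer (the flag_delete_null_pointer_checks case mentioned above), even the pointer form may reach NULL, which is why that flag gates the transformation.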
Re: [PATCH][AArch64] Increase code alignment
Evandro Menezes wrote: On 06/29/16 07:59, James Greenhalgh wrote: > On Tue, Jun 21, 2016 at 02:39:23PM +0100, Wilco Dijkstra wrote: >> ping >> >> >> From: Wilco Dijkstra >> Sent: 03 June 2016 11:51 >> To: GCC Patches >> Cc: nd; philipp.toms...@theobroma-systems.com; pins...@gmail.com; >> jim.wil...@linaro.org; benedikt.hu...@theobroma-systems.com; Evandro Menezes >> Subject: [PATCH][AArch64] Increase code alignment >> >> Increase loop alignment on Cortex cores to 8 and set function alignment to >> 16. This makes things consistent across big.LITTLE cores, improves >> performance of benchmarks with tight loops and reduces performance variations >> due to small changes in code layout. It looks almost all AArch64 cores agree >> on alignment of 16 for function, and 8 for loops and branches, so we should >> change -mcpu=generic as well if there is no disagreement - feedback welcome. >> >> OK for commit? > Hi Wilco, > > Sorry for the delay. > > This patch is OK for trunk. > > I hope we can continue the discussion as to whether there is a set of > values for -mcpu=generic that better suits the range of cores we now > support. > > After Wilco's patch, and using the values in the proposed vulcan and > qdf24xx structures, and with the comments from this thread, the state of > the tuning structures will be: > > -mcpu= : function-jump-loop alignment > > cortex-a35: 16-8-8 > cortex-a53: 16-8-8 > cortex-a57: 16-8-8 > cortex-a72: 16-8-8 > cortex-a73: 16-8-4 (Though I'm guessing this is just an ordering issue on > when Kyrill copied the cost table. Kyrill/Wilco do you > want to spin an update for the Cortex-A73 alignment? > Consider it preapproved if you do) > exynos-m1 : 4-4-4 (2: 16-4-16, 3: 8-4-4) > thunderx : 8-8-8 (But 16-8-8 acceptable/maybe better) > xgene1 :16-8-16 > qdf24xx :16-8-16 > vulcan :16-8-16 > > Generic is currently set to 8-8-4, which doesn't look representative of > these individual cores at all. 
> > Running down that list, I see a very compelling case for 16 for function > alignment (from comments in this thread, it is second choice, but not too > bad for exynos-m1, thunderx is maybe better at 16, but should not be > worse). I also see a close to unanimous case for 8 for jump alignment, > though I note that in Evandro's experiments 8 never came out as "best" > for exynos-m1. Did you get a feel in your experiments for what the > performance penalty of aligning jumps to 8 would be? That generic is > currently set to 8 seems like a good tie-breaker if the performance impact > for exynos-m1 would be minimal, as it gives no change from the current > behaviour. > > For loop alignment I see a split straight down the middle between 8 > and 16 (exynos-m1/cortex-a73 are the outliers at 4, but second place for > exynos-m1 was 16-4-16, and cortex-a73 might just be an artifact of when the > table was copied). > > From that, and as a starting point for discussion, I'd suggest that > 16-8-8 or 16-8-16 are most representative from the core support that has > been added to GCC. > > Wilco, the Cortex-A cores tip the balance in favour of 16-8-8. As you've > most recently looked at the cost of setting loop alignment to 16, do you > have any comments on why you chose to standardize on 16-8-8 rather than > 16-8-16, or what the cost would be of 16-8-16? > > I've widened the CC list, and I'd appreciate any comments on what we all > think will be the right alignment parameters for -mcpu=generic. This is a very good summary. However, I think that it should also consider the effects on code size. Evidently, an alignment of 16 has the greatest probability of increasing code size, whereas an alignment of 4 doesn't increase it. Likewise, what is aligned also matters, for each of them have a different frequency of instances. Arguably, the function alignment has the least effect in code size, followed by jumps and then loops, based on typical frequencies. 
In specific cases, the weights may move somewhat, of course.

As I stated in my previous reply, I also tracked the code size when experimenting with different alignments. It was clear from my data that the jump alignment was the most critical to code size (~3% on average), followed by loop alignment (<1%) and function alignment (<1%). Therefore, I'd argue against setting the alignment for jumps to 16. The case can be made for an alignment of 8 for them, where the cost to code size is more modest.

From a performance perspective, generic alignments at 16-8-8 would rank 4th on Exynos M1, with a negligible code size penalty (<1%) over the current 8-4-4.

--

Hi Evandro,

Agreed, the codesize impact is important as well, so there is a balance to be made. Looking at the codesize cost, branch alignment has the largest cost but also the least benefit. It isn't helped by GCC's very simplistic way of aligning - our goal is to improve fetching of hot code,
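For reference, the function/jump/loop tuples being debated map directly onto GCC's generic alignment options, so candidate -mcpu=generic settings can be compared for size and speed without rebuilding the backend tuning tables. A sketch of such an experiment (the -falign-* flags are real GCC options; `bench.c` and the output names are placeholders):

```shell
# Compare the two candidate generic tunings, 16-8-8 vs 16-8-16,
# by overriding function/jump/loop alignment explicitly.
gcc -O2 -falign-functions=16 -falign-jumps=8 -falign-loops=8  -o bench-16-8-8  bench.c
gcc -O2 -falign-functions=16 -falign-jumps=8 -falign-loops=16 -o bench-16-8-16 bench.c
size bench-16-8-8 bench-16-8-16   # code-size cost of the loop-alignment choice
```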
Re: [patch,avr]: ad PR71151: Make test cases pass on smaller targets.
On Jun 23, 2016, at 5:21 AM, Senthil Kumar Selvaraj wrote:
>
> 2. Even if (1) is fixed, the custom section (.foo) is not mapped to
> any output section or region in the linker script. The linker can
> error out only if the contents overflow a region.

If the segment isn't loaded, then its size can't matter any? If it is loaded, and it goes onto a machine with a finite size, then the loader can (should) complain if it can't fit, and/or the linker can know that size, as it is a fixed finite size. The size would usually be part of a board support package and can include a memory segment with that size.
Re: [patch,avr]: ad PR71151: Make test cases pass on smaller targets.
On Jun 23, 2016, at 4:00 AM, Georg-Johann Lay wrote:
> Binutils don't produce a message

That's unfortunate.

> so there is nothing to scan for. Hacking binutils is beyond my scope.

That's fine.

> avrtest behaves just as if the program under test would call abort.

That's unfortunate. Would be nice if they said, can't load program, or some other unique thing we could check for.

> There are at least 2 other AVR simulators; dunno how they would handle the
> situation.

That doesn't matter, as the tool chain has to be ported to each. Of course, if they can self agree on common things, that would be best. If they do, then the work on our side is 1/2 the work.

> I don't see how an a-posteriori test could be independent of simulator,
> independent of board descriptions and all that stuff.

I can explain in detail, if you want. If the program loader always says 'program load failed, program too big' when that is true, then dejagnu can look for this string and always know that the test case is unsupported; it doesn't matter the board file, the platform, the OS, the chip architecture or the time of day. The reason is the program loader must know onto what it is loading.

Likewise, when linking, the board file can have details, like this target has 96KB of ram, and some linkers, given that detail, can create a memory segment that is that size, and as the linker fills that segment, it can print, segment overflowed. Binutils usually does this for free, _if_ you give it memory segments. On virtual memory systems, we never do, as there is little concept of memory overflow at static link time; but, for fixed memory systems, it is perfectly reasonable to have memory segments that have this size, if one wants. A program loader that can't complain in some fashion is, well, not a nice program.

For binutils, you can have:

MEMORY
{
  SRAM (rwx) : ORIGIN = 0x, LENGTH = 128K
}

to define SRAM as being 128KB, for example.
The linker script can then put things into sram, and when it overflows, it will print an error. By doing this, one never has to worry about load failures.

> Is this one ok?

Ok.
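To make the idea above concrete, a sketch of a complete linker-script fragment (the region name and LENGTH follow the quoted example; the ORIGIN value and section list are illustrative, since the original message elides the origin): once output sections are assigned to a sized region, GNU ld reports an error such as "region `SRAM' overflowed by N bytes" at static link time, which is exactly the diagnosable failure the discussion is after.

```ld
MEMORY
{
  SRAM (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS
{
  /* Anything placed in SRAM counts against the 128K budget;
     overflow becomes a hard link-time error instead of a
     silent load failure in the simulator.  */
  .data : { *(.data*) } > SRAM
  .bss  : { *(.bss*)  } > SRAM
}
```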
Re: [PATCH, rs6000] Enable some existing __float128 tests for powerpc64*
On Wed, 2016-06-29 at 16:37 +, Joseph Myers wrote: > On Tue, 28 Jun 2016, Bill Schmidt wrote: > > > -/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* } } */ > > +/* { dg-do compile { target ia64-*-* i?86-*-* x86_64-*-* powerpc64*-*-* } > > } */ > > /* { dg-options "-pedantic" } */ > > +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc64*-*-* } } > > */ > > Rather than duplicating powerpc64 references everywhere, wouldn't it be > better to add an effective-target keyword __float128, meaning that > __float128 is available? Along with { dg-add-options float128 }. Sure, I can do that. I guess I need to stay away from the name check_effective_target_float128, though, as in your pending patch that will mean availability of _Float128. Do you have a naming preference? > > Also: powerpc-* targets with -m64 should always be handled in tests > identically to powerpc64 targets, while powerpc64 targets with -m32 should > presumably not run these tests. That is, testing for powerpc64*-*-* is > actually (always, for any test, not just these ones) incorrect in both > directions (just as it's always incorrect for a test to list just one of > i?86-*-* and x86_64-*-*, rather than listing both and then adding any > restrictions required to 32-bit or 64-bit). > > > -/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* } } */ > > +/* { dg-do run { target i?86-*-* x86_64-*-* ia64-*-* powerpc64*-*-* } } */ > > /* { dg-options "" } */ > > /* { dg-require-effective-target fenv_exceptions } */ > > +/* { dg-additional-options "-mfloat128 -mvsx" { target powerpc64*-*-* } } > > */ > > Also: for execution tests, if you add extra options, you also need to > restrict the test to running when the hardware actually supports the > required features. > All good points, will fix in next revision. Thanks! Bill
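To sketch what Joseph's suggestion would look like in a converted test header (the effective-target keyword spelling is still open, per the _Float128 naming concern in this exchange; the dg-add-options implementation would then supply -mfloat128 -mvsx on PowerPC in one central place):

```
/* { dg-do compile } */
/* { dg-require-effective-target __float128 } */
/* { dg-add-options float128 } */
/* { dg-options "-pedantic" } */
```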
Re: [PATCH PR70729] The second part of patch.
Richard,

Could you please review an additional simple fix for 70729 - we need to nullify the safelen field of loops containing simduid intrinsics like GOMP_SIMD_LANE (introduced e.g. for private variables). I checked that this fix cures the regression, which I missed since an AVX2 machine is required for libgomp.fortran/examples-4/simd-2.f90. Regression testing and bootstrapping did not show any new failures.

Is it OK for trunk?

Patch:

diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 2669813..9fbd183 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table *htab)
       gcc_assert (TREE_CODE (arg) == SSA_NAME);
       simduid_to_vf *p = NULL, data;
       data.simduid = DECL_UID (SSA_NAME_VAR (arg));
+      /* Need to nullify loop safelen field since its value is not
+	 valid after transformation.  */
+      if (bb->loop_father && bb->loop_father->safelen > 0)
+	bb->loop_father->safelen = 0;
       if (htab)
	 {
	   p = htab->find ();

ChangeLog:

2016-06-30  Yuri Rumyantsev

	PR tree-optimization/70729
	* tree-vectorizer.c (adjust_simduid_builtins): Nullify safelen
	field of loop since it may not be valid after transformation.

2016-06-30 17:28 GMT+03:00 Jakub Jelinek :
> On Thu, Jun 30, 2016 at 04:25:16PM +0200, Thomas Schwinge wrote:
>> Hi!
>>
>> On Thu, 30 Jun 2016 16:48:25 +0300, Yuri Rumyantsev wrote:
>> > Thanks for your help.
>> > Could you please look at the following simple patch which cures >> > regression - we need to nullify loop safelen field in >> > adjust_simduid_builtins: >> > >> > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c >> > index 2669813..f70380c 100644 >> > --- a/gcc/tree-vectorizer.c >> > +++ b/gcc/tree-vectorizer.c >> > @@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table >> > *htab) >> > gcc_assert (TREE_CODE (arg) == SSA_NAME); >> > simduid_to_vf *p = NULL, data; >> > data.simduid = DECL_UID (SSA_NAME_VAR (arg)); >> > + /* Need to nullify safelen fielf of loop since it's vale is not >> > +valid after transformation. */ > > s/fielf/field/ > s/vale/value/ > >> > + if (bb->loop_father && bb->loop_father->safelen > 0) >> > + bb->loop_father->safelen = 0; >> > if (htab) >> > { >> > p = htab->find (); > > Jakub
Re: [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
On 07/06/16 17:56, Kyrill Tkachov wrote:
> Hi all,
>
> This patch addresses a deficiency we have in handling vector
> lane-to-lane moves in the AArch64 backend. Generally we can use the
> INS (element) instruction but, as a user complains in
> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html we don't. James
> had a patch adding an appropriate combine pattern some time ago
> (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it
> never got applied.
>
> This patch is a rebase of that patch that adds the necessary
> vec_merge+vec_duplicate+vec_select combine pattern.
> I chose to use a define_insn rather than the define_insn_and_split in
> that patch that just deletes the instruction when the source and
> destination registers are the same, as I think it's not the combine
> pattern's job to delete the redundant instruction but rather some
> other pass's job. Also, I was not able to create a testcase where it
> would make a difference.
>
> Also, this patch doesn't reimplement the vcopy*lane* intrinsics from
> inline assembly to a vget_lane+vset_lane combo.
> This can be done as a separate patch on top of this one.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Also tested on aarch64_be-none-elf.
>
> Ok for trunk?

OK.
R.

> Thanks,
> Kyrill
>
> 2016-06-07  James Greenhalgh
> 	    Kyrylo Tkachov
>
> 	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane):
> 	New define_insn.
> 	(*aarch64_simd_vec_copy_lane_): Likewise.
>
> 2016-06-07  James Greenhalgh
> 	    Kyrylo Tkachov
>
> 	* gcc.target/aarch64/vget_set_lane_1.c: New test.
> > aarch64-ins-vec.patch > > > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index > 6ea35bf487eaa47dd78742e3eae7507b6875ba1a..5600e5bd0a94fd7efd704a4b13d95d993fd5b62f > 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -555,6 +555,49 @@ (define_insn "aarch64_simd_vec_set" >[(set_attr "type" "neon_from_gp, neon_ins, neon_load1_1reg")] > ) > > +(define_insn "*aarch64_simd_vec_copy_lane" > + [(set (match_operand:VALL 0 "register_operand" "=w") > + (vec_merge:VALL > + (vec_duplicate:VALL > + (vec_select: > + (match_operand:VALL 3 "register_operand" "w") > + (parallel > + [(match_operand:SI 4 "immediate_operand" "i")]))) > + (match_operand:VALL 1 "register_operand" "0") > + (match_operand:SI 2 "immediate_operand" "i")))] > + "TARGET_SIMD" > + { > +int elt = ENDIAN_LANE_N (mode, exact_log2 (INTVAL (operands[2]))); > +operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt); > +operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); > + > +return "ins\t%0.[%p2], %3.[%4]"; > + } > + [(set_attr "type" "neon_ins")] > +) > + > +(define_insn "*aarch64_simd_vec_copy_lane_" > + [(set (match_operand:VALL 0 "register_operand" "=w") > + (vec_merge:VALL > + (vec_duplicate:VALL > + (vec_select: > + (match_operand: 3 "register_operand" "w") > + (parallel > + [(match_operand:SI 4 "immediate_operand" "i")]))) > + (match_operand:VALL 1 "register_operand" "0") > + (match_operand:SI 2 "immediate_operand" "i")))] > + "TARGET_SIMD" > + { > +int elt = ENDIAN_LANE_N (mode, exact_log2 (INTVAL (operands[2]))); > +operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt); > +operands[4] = GEN_INT (ENDIAN_LANE_N (mode, > +INTVAL (operands[4]))); > + > +return "ins\t%0.[%p2], %3.[%4]"; > + } > + [(set_attr "type" "neon_ins")] > +) > + > (define_insn "aarch64_simd_lshr" > [(set (match_operand:VDQ_I 0 "register_operand" "=w") > (lshiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w") > diff --git 
a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c > b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c > new file mode 100644 > index > ..07a77de319206c5c6dad1c0d2d9bcc998583f9c1 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c > @@ -0,0 +1,72 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > + > +#include "arm_neon.h" > + > +#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2) \ > +TYPE1 __attribute__((noinline,noclone)) \ > +test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b) > \ > +{\ > + return vset##Q1##_lane_##SUFFIX (vget##Q2##_lane_##SUFFIX (b, INDEX2),\ > + a, INDEX1); \ > +} > + > +BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6) > +BUILD_TEST (int8x8_t, int8x8_t, , , s8, 7, 6) > +BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6) > +/* { dg-final
Re: [PATCH 3/4] Add support to run auto-vectorization tests for multiple effective targets
On May 5, 2016, at 8:14 AM, Robert Suchanek wrote:
>
> I'm resending this patch as it has been rebased and updated. I reverted a
> change to the check_effective_target_vect_call_lrint procedure because it
> does not use the cached result.

Ok. Please ensure that the compilation flag is mixed into the test case name so that as you iterate over them, the test case names are unique.
[PATCH][ARM] -mpure-code option for ARM
Hello, This patch adds the -mpure-code option for ARM. This option ensures functions are put into sections that contain only code and no data. To ensure this throughout compilation we give these sections the ARM processor-specific ELF section attribute "SHF_ARM_PURECODE". This option is only supported for non-pic code for armv7-m targets. This patch introduces a new target hook 'TARGET_ASM_ELF_FLAGS_NUMERIC'. This target hook enables a target to use the numeric value for elf section attributes rather than their alphabetical representation. If TARGET_ASM_ELF_FLAGS_NUMERIC returns TRUE, the existing 'default_elf_asm_named_section', will print the numeric value of the section attributes for the current section. This target hook has two parameters: unsigned int FLAGS, the input parameter that tells the function the current section's attributes; unsigned int *NUM, used to pass down the numerical representation of the section's attributes. The default implementation for TARGET_ASM_ELF_FLAGS_NUMERIC will return false, so existing behavior is not changed. Bootstrapped and tested for arm-none-linux-gnueabihf. Further tested for arm-none-eabi with a Cortex-M3 target. gcc/ChangeLog: 2016-06-30 Andre VieiraTerry Guo * target.def (elf_flags_numeric): New target hook. * targhooks.h (default_asm_elf_flags_numeric): New. * varasm.c (default_asm_elf_flags_numeric): New. (default_elf_asm_named_section): Use new target hook. * config/arm/arm.opt (mpure-code): New. * config/arm/arm.h (SECTION_ARM_PURECODE): New. * config/arm/arm.c (arm_asm_init_sections): Add section attribute to default text section if -mpure-code. (arm_option_check_internal): Diagnose use of option with non supported targets and/or options. (arm_asm_elf_flags_numeric): New. (arm_function_section): New. (arm_elf_section_type_flags): New. * config/arm/elf.h (JUMP_TABLES_IN_TEXT_SECTION): Disable for -mpure-code. * gcc/doc/texi (TARGET_ASM_ELF_FLAGS_NUMERIC): New. 
* gcc/doc/texi.in (TARGET_ASM_ELF_FLAGS_NUMERIC): Likewise. gcc/testsuite/ChangeLog: 2016-06-30 Andre Vieira Terry Guo * gcc.target/arm/pure-code/ffunction-sections.c: New. * gcc.target/arm/pure-code/no-literal-pool.c: New. * gcc.target/arm/pure-code/pure-code.exp: New. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index f0cdd669191689bc5dcf3a7c2b60da5a2d201e3f..d10605cee0e6e0e07bbb4e1910d30c91443f8d17 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -2263,4 +2263,8 @@ extern const char *host_detect_local_cpu (int argc, const char **argv); /* For switching between functions with different target attributes. */ #define SWITCHABLE_TARGET 1 +/* Define SECTION_ARM_PURECODE as the ARM specific section attribute + representation for SHF_ARM_PURECODE in GCC. */ +#define SECTION_ARM_PURECODE SECTION_MACH_DEP + #endif /* ! GCC_ARM_H */ >From 6f3e37973e0c4dbd393325addf42265c42726240 Mon Sep 17 00:00:00 2001 From: Andre Simoes Dias Vieira Date: Tue, 14 Jun 2016 11:17:12 +0100 Subject: [PATCH] -mpure-code for ARM --- gcc/config/arm/arm.c | 150 - gcc/config/arm/arm.h | 4 + gcc/config/arm/arm.opt | 6 + gcc/config/arm/elf.h | 3 +- gcc/doc/invoke.texi| 11 +- gcc/doc/tm.texi| 6 + gcc/doc/tm.texi.in | 2 + gcc/target.def | 10 ++ gcc/targhooks.h| 2 + .../gcc.target/arm/pure-code/ffunction-sections.c | 17 +++ .../gcc.target/arm/pure-code/no-literal-pool.c | 73 ++ .../gcc.target/arm/pure-code/pure-code.exp | 54 gcc/varasm.c | 57 +--- 13 files changed, 366 insertions(+), 29 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/ffunction-sections.c create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool.c create mode 100644 gcc/testsuite/gcc.target/arm/pure-code/pure-code.exp diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index f60955438d6f1cc5d996e7eacd4b453213044181..f5cb301a0efb23dc66e4d220b4a8f32e3670e744 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -214,8 +214,8 @@ static bool 
arm_return_in_memory (const_tree, const_tree); static void arm_unwind_emit (FILE *, rtx_insn *); static bool arm_output_ttype (rtx); static void arm_asm_emit_except_personality (rtx); -static void arm_asm_init_sections (void); #endif +static void arm_asm_init_sections (void); static rtx
Re: [PATCH PR70729] The second part of patch.
On Thu, Jun 30, 2016 at 04:25:16PM +0200, Thomas Schwinge wrote: > Hi! > > On Thu, 30 Jun 2016 16:48:25 +0300, Yuri Rumyantsev> wrote: > > Thanks for your help. > > Could you please look at the following simple patch which cures > > regression - we need to nullify loop safelen field in > > adjust_simduid_builtins: > > > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c > > index 2669813..f70380c 100644 > > --- a/gcc/tree-vectorizer.c > > +++ b/gcc/tree-vectorizer.c > > @@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table > > *htab) > > gcc_assert (TREE_CODE (arg) == SSA_NAME); > > simduid_to_vf *p = NULL, data; > > data.simduid = DECL_UID (SSA_NAME_VAR (arg)); > > + /* Need to nullify safelen fielf of loop since it's vale is not > > +valid after transformation. */ s/fielf/field/ s/vale/value/ > > + if (bb->loop_father && bb->loop_father->safelen > 0) > > + bb->loop_father->safelen = 0; > > if (htab) > > { > > p = htab->find (); Jakub
Re: [PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
Ping.

Thanks,
Kyrill

On 22/06/16 11:07, Kyrill Tkachov wrote:

Ping. Richard, Marcus, do you have any feedback on this?
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00502.html

Thanks,
Kyrill

On 14/06/16 10:36, James Greenhalgh wrote:

On Tue, Jun 07, 2016 at 05:56:47PM +0100, Kyrill Tkachov wrote:

Hi all,

This patch addresses a deficiency we have in handling vector lane-to-lane moves in the AArch64 backend. Generally we can use the INS (element) instruction but, as a user complains in https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html we don't. James had a patch adding an appropriate combine pattern some time ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never got applied.

This patch is a rebase of that patch that adds the necessary vec_merge+vec_duplicate+vec_select combine pattern. I chose to use a define_insn rather than the define_insn_and_split in that patch that just deletes the instruction when the source and destination registers are the same, as I think it's not the combine pattern's job to delete the redundant instruction but rather some other pass's job. Also, I was not able to create a testcase where it would make a difference.

Also, this patch doesn't reimplement the vcopy*lane* intrinsics from inline assembly to a vget_lane+vset_lane combo. This can be done as a separate patch on top of this one.

Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf.

Ok for trunk?

This looks OK to me, but as it is based on my code I probably can't approve it within the spirit of the write access policies (I only have localized review permission). Best wait for Richard/Marcus or a global reviewer to take a look.

Thanks,
Kyrill

2016-06-07  James Greenhalgh
	    Kyrylo Tkachov

	* config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane):
	New define_insn.
	(*aarch64_simd_vec_copy_lane_): Likewise.

Watch your ChangeLog formatting.
Thanks, James 2016-06-07 James Greenhalgh Kyrylo Tkachov * gcc.target/aarch64/vget_set_lane_1.c: New test.
Re: [PATCH PR70729] The second part of patch.
Hi! On Thu, 30 Jun 2016 16:48:25 +0300, Yuri Rumyantsevwrote: > Thanks for your help. > Could you please look at the following simple patch which cures > regression - we need to nullify loop safelen field in > adjust_simduid_builtins: > > diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c > index 2669813..f70380c 100644 > --- a/gcc/tree-vectorizer.c > +++ b/gcc/tree-vectorizer.c > @@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table *htab) > gcc_assert (TREE_CODE (arg) == SSA_NAME); > simduid_to_vf *p = NULL, data; > data.simduid = DECL_UID (SSA_NAME_VAR (arg)); > + /* Need to nullify safelen fielf of loop since it's vale is not > +valid after transformation. */ > + if (bb->loop_father && bb->loop_father->safelen > 0) > + bb->loop_father->safelen = 0; > if (htab) > { > p = htab->find (); Quickly testing that by manually compiling libgomp.fortran/examples-4/simd-2.f90 with -O2, I see that the dump files generated by -fdump-tree-all are now back to what they were before your change, and when manually running that executable, it exits normally, and the following valgrind diagnostics no longer show up (that I saw after your change): ==5942== Use of uninitialised value of size 8 ==5942==at 0x40083D: __simd2_mod_MOD_work (simd-2.f90:18) ==5942==by 0x4008D4: MAIN__ (simd-2.f90:68) ==5942==by 0x4005DC: main (simd-2.f90:58) ==5942== ==5942== Use of uninitialised value of size 8 ==5942==at 0x400845: __simd2_mod_MOD_work (simd-2.f90:18) ==5942==by 0x4008D4: MAIN__ (simd-2.f90:68) ==5942==by 0x4005DC: main (simd-2.f90:58) ==5942== ==5942== Invalid read of size 8 ==5942==at 0x400845: __simd2_mod_MOD_work (simd-2.f90:18) ==5942==by 0x4008D4: MAIN__ (simd-2.f90:68) ==5942==by 0x4005DC: main (simd-2.f90:58) ==5942== Address 0x5d3 is not stack'd, malloc'd or (recently) free'd (Address 0x5d3 looks highly suspicious.) 
In case that's relevant: $ sed -e '/^$/Q' < /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 58 model name : Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz stepping: 9 microcode : 0x1b cpu MHz : 1744.875 cache size : 6144 KB physical id : 0 siblings: 8 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt bugs: bogomips: 5382.58 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: Grüße Thomas > 2016-06-30 15:21 GMT+03:00 Jakub Jelinek : > > On Thu, Jun 30, 2016 at 03:16:51PM +0300, Yuri Rumyantsev wrote: > >> Hi Grüße. 
> >> > >> Could you please tell me how to reproduce your regression - did not > >> see any new failures in my local area: > >> > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 (test for excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 execution test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > >> -funroll-loops -fpeel-loops -ftracer -finline-functions (test for > >> excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > >> -funroll-loops -fpeel-loops -ftracer -finline-functions execution > >> test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess > >> errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess errors) > >> PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution test > >> > >> I used HASWELL machine for it since test requires target avx. > > > > It fails everywhere for me, without offloading configured, both 32-bit and > > 64-bit. > > $ /home/jakub/src/gcc/obj20/gcc/xgcc -B/home/jakub/src/gcc/obj20/gcc/ >
Re: [Patch AArch64] Fix PR target/63874
On 31/03/16 14:11, Ramana Radhakrishnan wrote: > Hi, > > In this PR we have a situation where we aren't really detecting > weak references vs weak definitions. If one has a weak definition > that binds locally there's no reason not to put out PC relative > relocations. > > However if you have a genuine weak reference that is > known not to bind locally it makes very little sense > to put out an entry into the literal pool which doesn't always > work with DSOs and shared objects. > > Tested aarch64-none-linux-gnu bootstrap and regression test with no > regressions > > This is not a regression and given what we've seen recently with protected > symbols and binds_locally_p I'd rather this were queued for GCC 7. > > Ok ? > Yes, this looks fine. Sorry for the delay replying. R. > regards > Ramana > > gcc/ > > * config/aarch64/aarch64.c (aarch64_classify_symbol): Typo in comment fixed. > Only force to memory if it is a weak external reference. > > > gcc/testsuite > > * gcc.target/aarch64/pr63874.c: New test. > > > pr63874.txt > > > commit e41d4bd6abbee99628909d4af612504844dee640 > Author: Ramana Radhakrishnan> Date: Thu Mar 31 13:47:33 2016 +0100 > > fix PR63874 > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index cf1239d..6782316 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -9387,15 +9387,18 @@ aarch64_classify_symbol (rtx x, rtx offset) >switch (aarch64_cmodel) > { > case AARCH64_CMODEL_TINY: > - /* When we retreive symbol + offset address, we have to make sure > + /* When we retrieve symbol + offset address, we have to make sure >the offset does not cause overflow of the final address. But >we have no way of knowing the address of symbol at compile time >so we can't accurately say if the distance between the PC and >symbol + offset is outside the addressible range of +/-1M in the >TINY code model. 
So we rely on images not being greater than >1M and cap the offset at 1M and anything beyond 1M will have to > - be loaded using an alternative mechanism. */ > - if (SYMBOL_REF_WEAK (x) > + be loaded using an alternative mechanism. Furthermore if the > + symbol is a weak reference to something that isn't known to > + resolve to a symbol in this module, then force to memory. */ > + if ((SYMBOL_REF_WEAK (x) > +&& !aarch64_symbol_binds_local_p (x)) > || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575) > return SYMBOL_FORCE_TO_MEM; > return SYMBOL_TINY_ABSOLUTE; > @@ -9403,7 +9406,8 @@ aarch64_classify_symbol (rtx x, rtx offset) > case AARCH64_CMODEL_SMALL: > /* Same reasoning as the tiny code model, but the offset cap here is >4G. */ > - if (SYMBOL_REF_WEAK (x) > + if ((SYMBOL_REF_WEAK (x) > +&& !aarch64_symbol_binds_local_p (x)) > || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263), > HOST_WIDE_INT_C (4294967264))) > return SYMBOL_FORCE_TO_MEM; > diff --git a/gcc/testsuite/gcc.target/aarch64/pr63874.c > b/gcc/testsuite/gcc.target/aarch64/pr63874.c > new file mode 100644 > index 000..1a745a0 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr63874.c > @@ -0,0 +1,22 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2" } */ > +/* { dg-skip-if "Not applicable for mcmodel=large" { aarch64*-*-* } { > "-mcmodel=large" } { "" } } */ > + > +extern void __attribute__((weak)) foo_weakref (void); > +void __attribute__((weak, noinline)) bar (void) > +{ > + return; > +} > +void (*f) (void); > +void (*g) (void); > + > +int > +main (void) > +{ > + f = _weakref; > + g = > + return 0; > +} > + > +/* { dg-final { scan-assembler-not "adr*foo_weakref" } } */ > +/* { dg-final { scan-assembler-not "\\.(word|xword)\tbar" } } */ >
Re: [PATCH, libstdc++/testsuite, ping2] 29_atomics/atomic/65913.cc: require atomic-builtins rather than specific target
On Wednesday 29 June 2016 21:58:40 Jonathan Wakely wrote: > On 29/06/16 13:49 -0700, Mike Stump wrote: > >Please include the libstdc++ list, they don't all read the other list. > > And the documentation clearly says (in two places) that all libstdc++ > patches must go to the libstdc++ list. Oops, my apologies; I did not realize that libstdc++ is a separate component from GCC. I'll keep that in mind for next time. Committed now. Best regards, Thomas
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
I'm getting a failure on ./gcc/testsuite/c-c++-common/asan/strncpy-overflow-1.c with the latest patch. Unsure if it's related; we're not doing any pointer arithmetic there. I'll test it again on trunk without the patch and see what happens. -Manish On Thu, Jun 30, 2016 at 7:18 PM, Richard Bienerwrote: > On Thu, Jun 30, 2016 at 3:38 PM, Marc Glisse wrote: >> On Thu, 30 Jun 2016, Manish Goregaokar wrote: >> >>> Alright, the following patch was tested and it works >>> >>> diff --git a/gcc/fold-const.c b/gcc/fold-const.c >>> index 3b9500d..0d82018 100644 >>> --- a/gcc/fold-const.c >>> +++ b/gcc/fold-const.c >>> @@ -13199,6 +13199,9 @@ tree_binary_nonzero_warnv_p (enum tree_code code, >>> switch (code) >>> { >>> case POINTER_PLUS_EXPR: >>> + return flag_delete_null_pointer_checks >>> +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) >>> +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); >>> case PLUS_EXPR: >>> if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >>> { >> >> >> So, what prevents us from deciding that p+(0-p) is nonzero when p is? Not >> sure if it is strictly legal in all languages, but I wouldn't be surprised >> if there was at least one language where optimizations could lead to such >> expressions. >> >> I think this is an exciting optimization, but I was always too scared to try >> anything like this. > > ;) > > points-to analysis already has the constraint that POINTER_PLUS_EXPR cannot > leave the object op0 points to. Of course currently nothing uses the > fact whether > points-to computes pointed-to as nothing (aka NULL) - so the argument > may be moot. > > Anyway, one of my points to the original patch was that POINTER_PLUS_EXPR > handling should be clearly separate from PLUS_EXPR and that we have > flag_delete_null_pointer_checks to allow targest to declare that 0 is a valid > object pointer (and thus you can do 4 + -4 and reach NULL). > > Richard. > >> -- >> Marc Glisse
[PATCH][RFC] Fix PR71632, remove parts of TER
The following patch fixes PR71632 by removing delayed expansion of TERed defs. Instead it adds code to apply the scheduling effect to the GIMPLE IL (so you also get better interleaved GIMPLE stmt / generated RTL dumps in .expand). This removes the quadratic re-expansion of TERed RHS as seen in the testcase. It affects initial RTL generation in following ways: 1) it may expand stmts with unused LHS 2) it may expand dead code when use stmts are expanded and those use get_gimple_for_ssa, expanding its operands (again). 3) it no longer automatically tries to combine with defs doing memory loads (in case we TERed the load and expansion of the RHS resulted in a MEM). 4) the depth-first, left-to-right "expansion" of ops might not be a 100% match of what expand currently does I expect that 1) and 2) are mostly a non-issue (and not worse than the effect seen in the PR). I also expect that 3) is "quickly" recovered by fwprop or combine. While I'm quite sure 3) exists I'm not 100% sure we never return a non-constant/reg for SSA name expansion (considering expand modifiers like EXPAND_SUM). Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. I think this is also an important step towards eventually doing a meaningful GIMPLE-level scheduling (for register pressure, also to remove this awkward code-placement code from reassoc for example). Without this TER would disrupt the result quite a bit (so the scheduling part of TER would be disabled when scheduling on GIMPLE). Comments? I'll throw this on one of our spec testers tonight any testing on non-x86 archs appreciated. Thanks, Richard. PS: avoid_deep_ter_for_debug could go away as well I think. 2016-06-30 Richard BienerPR middle-end/71632 * expr.c (expand_expr_real_1): Do not expand TERed SSA name def rhs. * cfgexpand.c (expand_gimple_basic_block): Expand defs of TERed SSA names. Move debug stmt use handling ... * tree-outof-ssa.c (move_uses_r): ... here. New helper for ... (rewrite_trees): ... 
code to apply the scheduling effect of TER to the GIMPLE IL. (remove_ssa_form): Move rewrite_trees call. * gcc.dg/torture/pr71632.c: New testcase. Index: gcc/expr.c === *** gcc/expr.c (revision 237873) --- gcc/expr.c (working copy) *** expand_expr_real_1 (tree exp, rtx target *** 9695,9704 LAST_VIRTUAL_REGISTER + 1); } ! g = get_gimple_for_ssa_name (exp); ! /* For EXPAND_INITIALIZER try harder to get something simpler. */ ! if (g == NULL ! && modifier == EXPAND_INITIALIZER && !SSA_NAME_IS_DEFAULT_DEF (exp) && (optimize || !SSA_NAME_VAR (exp) || DECL_IGNORED_P (SSA_NAME_VAR (exp))) --- 9677,9685 LAST_VIRTUAL_REGISTER + 1); } ! g = NULL; ! /* For EXPAND_INITIALIZER try to get something simpler. */ ! if (modifier == EXPAND_INITIALIZER && !SSA_NAME_IS_DEFAULT_DEF (exp) && (optimize || !SSA_NAME_VAR (exp) || DECL_IGNORED_P (SSA_NAME_VAR (exp))) Index: gcc/cfgexpand.c === *** gcc/cfgexpand.c (revision 237873) --- gcc/cfgexpand.c (working copy) *** expand_gimple_basic_block (basic_block b *** 5510,5611 basic_block new_bb; stmt = gsi_stmt (gsi); - - /* If this statement is a non-debug one, and we generate debug -insns, then this one might be the last real use of a TERed -SSA_NAME, but where there are still some debug uses further -down. Expanding the current SSA name in such further debug -uses by their RHS might lead to wrong debug info, as coalescing -might make the operands of such RHS be placed into the same -pseudo as something else. Like so: - a_1 = a_0 + 1; // Assume a_1 is TERed and a_0 is dead - use(a_1); - a_2 = ... -#DEBUG ... => a_1 -As a_0 and a_2 don't overlap in lifetime, assume they are coalesced. -If we now would expand a_1 by it's RHS (a_0 + 1) in the debug use, -the write to a_2 would actually have clobbered the place which -formerly held a_0. - -So, instead of that, we recognize the situation, and generate -debug temporaries at the last real use of TERed SSA names: - a_1 = a_0 + 1; -#DEBUG #D1 => a_1 - use(a_1); - a_2 = ... -#DEBUG ... 
=> #D1 -*/ - if (MAY_HAVE_DEBUG_INSNS - && SA.values - && !is_gimple_debug (stmt)) - { - ssa_op_iter iter; - tree op; - gimple *def; - - location_t sloc = curr_insn_location (); - - /* Look for SSA names that have their
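For context, here is a minimal sketch of the kind of code the removed comment describes: a_1 has a single real use, so TER lets expansion emit a_0 + 1 directly inside that use, and the deleted block existed to keep later debug uses of such TERed names correct.

```c
/* Shape of the comment's example: a_1 = a_0 + 1 is used exactly once,
   so out-of-ssa/expand may forward the addition into the use site.  */
int ter_shape (int a0)
{
  int a1 = a0 + 1;      /* single-use def: a TER candidate */
  return a1 * 2;        /* the only real use of a1 */
}
```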
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, Jun 30, 2016 at 3:38 PM, Marc Glissewrote: > On Thu, 30 Jun 2016, Manish Goregaokar wrote: > >> Alright, the following patch was tested and it works >> >> diff --git a/gcc/fold-const.c b/gcc/fold-const.c >> index 3b9500d..0d82018 100644 >> --- a/gcc/fold-const.c >> +++ b/gcc/fold-const.c >> @@ -13199,6 +13199,9 @@ tree_binary_nonzero_warnv_p (enum tree_code code, >> switch (code) >> { >> case POINTER_PLUS_EXPR: >> + return flag_delete_null_pointer_checks >> +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) >> +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); >> case PLUS_EXPR: >> if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >> { > > > So, what prevents us from deciding that p+(0-p) is nonzero when p is? Not > sure if it is strictly legal in all languages, but I wouldn't be surprised > if there was at least one language where optimizations could lead to such > expressions. > > I think this is an exciting optimization, but I was always too scared to try > anything like this. ;) points-to analysis already has the constraint that POINTER_PLUS_EXPR cannot leave the object op0 points to. Of course currently nothing uses the fact whether points-to computes pointed-to as nothing (aka NULL) - so the argument may be moot. Anyway, one of my points to the original patch was that POINTER_PLUS_EXPR handling should be clearly separate from PLUS_EXPR and that we have flag_delete_null_pointer_checks to allow targest to declare that 0 is a valid object pointer (and thus you can do 4 + -4 and reach NULL). Richard. > -- > Marc Glisse
Re: [PATCH PR70729] The second part of patch.
Hi Jakub, Thanks for your help. Could you please look at the following simple patch which cures the regression - we need to nullify the loop safelen field in adjust_simduid_builtins: diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c index 2669813..f70380c 100644 --- a/gcc/tree-vectorizer.c +++ b/gcc/tree-vectorizer.c @@ -204,6 +204,10 @@ adjust_simduid_builtins (hash_table *htab) gcc_assert (TREE_CODE (arg) == SSA_NAME); simduid_to_vf *p = NULL, data; data.simduid = DECL_UID (SSA_NAME_VAR (arg)); + /* Need to nullify safelen field of loop since its value is not +valid after transformation. */ + if (bb->loop_father && bb->loop_father->safelen > 0) + bb->loop_father->safelen = 0; if (htab) { p = htab->find (&data); Thanks. Yuri. 2016-06-30 15:21 GMT+03:00 Jakub Jelinek: > On Thu, Jun 30, 2016 at 03:16:51PM +0300, Yuri Rumyantsev wrote: >> Hi Grüße. >> >> Could you please tell me how to reproduce your regression - did not >> see any new failures in my local area: >> >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 (test for excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 execution test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer >> -funroll-loops -fpeel-loops -ftracer -finline-functions (test for >> excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer >> -funroll-loops -fpeel-loops -ftracer -finline-functions execution >> test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess >> errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test >> PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess errors) >> PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution
test >> >> I used HASWELL machine for it since test requires target avx. > > It fails everywhere for me, without offloading configured, both 32-bit and > 64-bit. > $ /home/jakub/src/gcc/obj20/gcc/xgcc -B/home/jakub/src/gcc/obj20/gcc/ > ../../../../libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90 > -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/ > -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/.libs > -I/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp > -I../../../../libgomp/testsuite/../../include > -I../../../../libgomp/testsuite/.. -fmessage-length=0 > -fno-diagnostics-show-caret -Wno-hsa -fdiagnostics-color=never -fopenmp > -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ >-O2 -msse2 -mavx > -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs > > -fintrinsic-modules-path=/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp >-L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/.libs > -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ > > -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs > -lgfortran -lm -o ./simd-2.exe -Wl,-rpath,../.libs/ -march=haswell > $ ./simd-2.exe > > Program received signal SIGSEGV: Segmentation fault - invalid memory > reference. > > Backtrace for this error: > #0 0x7F7548CDBD58 > #1 0x7F7548CDAEE0 > #2 0x3405234A4F > #3 0x400849 in __simd2_mod_MOD_work > #4 0x4008D4 in MAIN__ at simd-2.f90:? > Segmentation fault (core dumped) > > With/without -march=haswell, -mtune=haswell etc. > > Jakub
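For reference, the safelen value being discussed comes from loops such as the following minimal sketch; the clause asserts that executing up to the given number of iterations in lockstep is safe, a promise that no longer describes the loop once it has been vectorized, which is why the fix clears loop->safelen.

```c
/* '#pragma omp simd safelen(8)' tells the compiler that running up to
   8 consecutive iterations in lockstep is safe, so it may vectorize
   with vectors of up to 8 elements without dependence analysis.  */
void add_arrays (int *restrict a, const int *restrict b, int n)
{
  #pragma omp simd safelen(8)
  for (int i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}
```

Without -fopenmp the pragma is simply ignored and the loop behaves as a plain scalar loop.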
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, 30 Jun 2016, Manish Goregaokar wrote: Alright, the following patch was tested and it works diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 3b9500d..0d82018 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -13199,6 +13199,9 @@ tree_binary_nonzero_warnv_p (enum tree_code code, switch (code) { case POINTER_PLUS_EXPR: + return flag_delete_null_pointer_checks +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); case PLUS_EXPR: if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) { So, what prevents us from deciding that p+(0-p) is nonzero when p is? Not sure if it is strictly legal in all languages, but I wouldn't be surprised if there was at least one language where optimizations could lead to such expressions. I think this is an exciting optimization, but I was always too scared to try anything like this. -- Marc Glisse
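Marc's p + (0 - p) worry is easy to make concrete. The sketch below models it with uintptr_t arithmetic (well-defined modular arithmetic, standing in for sizetype) rather than real pointer arithmetic, which would be out of bounds: both operands of the addition can be individually nonzero while the sum is exactly 0, so "either operand nonzero" alone does not justify a nonzero result; hence the additional reliance on flag_delete_null_pointer_checks.

```c
#include <stdint.h>

/* Integer model of p + (0 - p): op0 is a nonzero "address" and op1 is
   its modular negation (unsigned wraparound, like sizetype), yet the
   POINTER_PLUS-style sum lands exactly on 0.  */
uintptr_t pointer_plus_model (uintptr_t p)
{
  uintptr_t offset = 0 - p;   /* nonzero whenever p is nonzero */
  return p + offset;          /* always 0, even for nonzero p */
}
```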
Re: [PATCH 2/2] gcc/genrecog: Don't warn for missing mode on special predicates
* Richard Sandiford[2016-06-15 19:07:56 +0100]: > Andrew Burgess writes: > > In md.texi it says: > > > > Predicates written with @code{define_special_predicate} do not get any > > automatic mode checks, and are treated as having special mode handling > > by @command{genrecog}. > > > > However, in genrecog, when validating a SET pattern, if either the > > source or destination is missing a mode then a warning is given, even if > > there's a predicate defined with define_special_predicate. > > > > This commit silences the warning for special predicates. > > > > gcc/ChangeLog: > > > > * genrecog.c (validate_pattern): Don't warn about missing mode for > > define_special_predicate predicates. > > Acked-by: Andrew Burgess > > --- > > gcc/ChangeLog | 5 + > > gcc/genrecog.c | 22 +++--- > > 2 files changed, 24 insertions(+), 3 deletions(-) > > > > diff --git a/gcc/genrecog.c b/gcc/genrecog.c > > index a9f5a4a..7596552 100644 > > --- a/gcc/genrecog.c > > +++ b/gcc/genrecog.c > > @@ -674,9 +674,25 @@ validate_pattern (rtx pattern, md_rtx_info *info, rtx > > set, int set_code) > > && !CONST_WIDE_INT_P (src) > > && GET_CODE (src) != CALL) > > { > > - const char *which; > > - which = (dmode == VOIDmode ? "destination" : "source"); > > - message_at (info->loc, "warning: %s missing a mode?", which); > > + const char *which_msg; > > + rtx which; > > + const char *pred_name; > > + const struct pred_data *pred; > > + > > + which_msg = (dmode == VOIDmode ? "destination" : "source"); > > + which = (dmode == VOIDmode ? dest : src); > > + pred_name = XSTR (which, 1); > > + if (pred_name[0] != 0) > > + { > > + pred = lookup_predicate (pred_name); > > + if (!pred) > > + error_at (info->loc, "unknown predicate '%s'", pred_name); > > + } > > + else > > + pred = 0; > > + if (!pred || !pred->special) > > + message_at (info->loc, "warning: %s missing a mode?", > > + which_msg); > > There's no guarantee at this point that "which" is a match_operand. 
> Also, I think the earlier: > > /* The operands of a SET must have the same mode unless one > is VOIDmode. */ > else if (dmode != VOIDmode && smode != VOIDmode && dmode != smode) > error_at (info->loc, "mode mismatch in set: %smode vs %smode", > GET_MODE_NAME (dmode), GET_MODE_NAME (smode)); > > should be skipped for special predicates too. > > How about generalising: > > /* The mode of an ADDRESS_OPERAND is the mode of the memory > reference, not the mode of the address. */ > if (GET_CODE (src) == MATCH_OPERAND > && ! strcmp (XSTR (src, 1), "address_operand")) > ; > > to: > > if (special_predicate_operand_p (src) > || special_predicate_operand_p (dest)) > ; > > with a new special_predicate_operand_p helper? I don't think we should > duplicate the "unknown predicate" error here; the helper can just return > false for unknown predicates. Thanks for taking the time to review and provide feedback. Sorry it has take a while to get around to this patch again. I've updated the patch inline with your feedback. How's this? Thanks, Andrew --- gcc/genrecog: Don't warn for missing mode on special predicates In md.texi it says: Predicates written with @code{define_special_predicate} do not get any automatic mode checks, and are treated as having special mode handling by @command{genrecog}. In genrecog, when validating a SET pattern, there is already a special case for 'address_operand' which is a special predicate, however, other special predicates fall through to the code which checks for incorrect use of VOIDmode. This commit adds a new function for detecting special predicates, and then generalises the check in validate_pattern so that mode checking is skipped for all special predicates. gcc/ChangeLog: * genrecog.c (special_predicate_operand_p): New function. (predicate_name): Move function. (validate_pattern): Don't warn about missing mode for all define_special_predicate predicates. 
--- gcc/ChangeLog | 7 +++ gcc/genrecog.c | 50 +++--- 2 files changed, 42 insertions(+), 15 deletions(-) diff --git a/gcc/genrecog.c b/gcc/genrecog.c index a9f5a4a..7c56225 100644 --- a/gcc/genrecog.c +++ b/gcc/genrecog.c @@ -463,6 +463,38 @@ constraints_supported_in_insn_p (rtx insn) || GET_CODE (insn) == DEFINE_PEEPHOLE2); } +/* Return the name of the predicate matched by MATCH_RTX. */ + +static const char * +predicate_name (rtx match_rtx) +{ + if (GET_CODE (match_rtx) == MATCH_SCRATCH) +return "scratch_operand"; + else +return XSTR (match_rtx, 1);
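The control flow the patch aims for can be modelled in a few lines. This is a toy table, not GCC's actual predicate machinery, and the predicate entries are illustrative: look the predicate up by name, treat unknown predicates as non-special so no duplicate error is emitted, and skip the mode warning when the predicate is special.

```c
#include <string.h>

/* Toy stand-ins for genrecog's pred_data records.  */
struct pred_data { const char *name; int special; };

static const struct pred_data preds[] = {
  { "register_operand", 0 },
  { "address_operand",  1 },   /* the case the old code special-cased */
};

static const struct pred_data *
lookup_predicate (const char *name)
{
  for (size_t i = 0; i < sizeof preds / sizeof preds[0]; i++)
    if (strcmp (preds[i].name, name) == 0)
      return &preds[i];
  return NULL;
}

/* Analogue of the suggested special_predicate_operand_p helper:
   unknown predicates simply yield false, no error duplicated here.  */
int special_predicate_p (const char *name)
{
  const struct pred_data *p = lookup_predicate (name);
  return p != NULL && p->special;
}
```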
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
Alright, the following patch was tested and it works diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 3b9500d..0d82018 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -13199,6 +13199,9 @@ tree_binary_nonzero_warnv_p (enum tree_code code, switch (code) { case POINTER_PLUS_EXPR: + return flag_delete_null_pointer_checks +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); case PLUS_EXPR: if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) { -- 2.8.3 -Manish On Thu, Jun 30, 2016 at 6:45 PM, Richard Bienerwrote: > On Thu, Jun 30, 2016 at 2:03 PM, Manish Goregaokar wrote: >> What about ptr + intptr_t? I guess we should check nonnegative for op1 then? > > op1 will always be nonnegative as it is forced to sizetype type (which > is unsigned). > > Richard. > >> -Manish >> >> >> On Thu, Jun 30, 2016 at 5:25 PM, Richard Biener >> wrote: >>> On Thu, Jun 30, 2016 at 1:17 PM, Manish Goregaokar >>> wrote: gcc/ChangeLog: PR c/71699 * fold-const.c (tree_binary_nonzero_warnv_p): Allow pointer addition to also be considered nonzero. --- gcc/fold-const.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 3b9500d..eda713e 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, { case POINTER_PLUS_EXPR: case PLUS_EXPR: - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) + || POINTER_TYPE_P (type)) { + /* Pointers are always nonnegative, check integers. */ + if (ANY_INTEGRAL_TYPE_P (type)) + { + sub_strict_overflow_p = false; + if (!tree_expr_nonnegative_warnv_p (op0, + _strict_overflow_p) + || !tree_expr_nonnegative_warnv_p (op1, + _strict_overflow_p)) +return false; + } /* With the presence of negative values it is hard to say something. 
*/ - sub_strict_overflow_p = false; - if (!tree_expr_nonnegative_warnv_p (op0, - _strict_overflow_p) - || !tree_expr_nonnegative_warnv_p (op1, - _strict_overflow_p)) -return false; + >>> >>> Hmm, note that op1 of POINTER_PLUS_EXPR _is_ an integer that needs to be >>> treated >>> as signed. >>> >>> POINTER_PLUS_EXPR is special in other ways in that iff >>> flag_delete_null_pointer_checks >>> is set we can assume that ptr + non-zero is never zero. Thus simply do >>> >>>case POINTER_PLUS_EXPR: >>> return flag_delete_null_pointer_checks >>>&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) >>> || tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); >>> >>> OTOH ptr + -(uintptr_t)ptr is valid from a POINTER_PLUS_EXPR >>> perspective as GCC does >>> not have a special tree code for pointer subtraction (but IIRC it uses >>> MINUS_EXPR on integers >>> for this). >>> >>> Richard. >>> >>> /* One of operands must be positive and the other non-negative. */ /* We don't set *STRICT_OVERFLOW_P here: even if this value overflows, on a twos-complement machine the sum of two -- 2.8.3
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, Jun 30, 2016 at 2:03 PM, Manish Goregaokarwrote: > What about ptr + intptr_t? I guess we should check nonnegative for op1 then? op1 will always be nonnegative as it is forced to sizetype type (which is unsigned). Richard. > -Manish > > > On Thu, Jun 30, 2016 at 5:25 PM, Richard Biener > wrote: >> On Thu, Jun 30, 2016 at 1:17 PM, Manish Goregaokar >> wrote: >>> gcc/ChangeLog: >>> PR c/71699 >>> * fold-const.c (tree_binary_nonzero_warnv_p): Allow >>> pointer addition to also be considered nonzero. >>> --- >>> gcc/fold-const.c | 20 +--- >>> 1 file changed, 13 insertions(+), 7 deletions(-) >>> >>> diff --git a/gcc/fold-const.c b/gcc/fold-const.c >>> index 3b9500d..eda713e 100644 >>> --- a/gcc/fold-const.c >>> +++ b/gcc/fold-const.c >>> @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, >>> { >>> case POINTER_PLUS_EXPR: >>> case PLUS_EXPR: >>> - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >>> + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >>> + || POINTER_TYPE_P (type)) >>> { >>> + /* Pointers are always nonnegative, check integers. */ >>> + if (ANY_INTEGRAL_TYPE_P (type)) >>> + { >>> + sub_strict_overflow_p = false; >>> + if (!tree_expr_nonnegative_warnv_p (op0, >>> + _strict_overflow_p) >>> + || !tree_expr_nonnegative_warnv_p (op1, >>> + _strict_overflow_p)) >>> +return false; >>> + } >>>/* With the presence of negative values it is hard >>> to say something. */ >>> - sub_strict_overflow_p = false; >>> - if (!tree_expr_nonnegative_warnv_p (op0, >>> - _strict_overflow_p) >>> - || !tree_expr_nonnegative_warnv_p (op1, >>> - _strict_overflow_p)) >>> -return false; >>> + >> >> Hmm, note that op1 of POINTER_PLUS_EXPR _is_ an integer that needs to be >> treated >> as signed. >> >> POINTER_PLUS_EXPR is special in other ways in that iff >> flag_delete_null_pointer_checks >> is set we can assume that ptr + non-zero is never zero. 
Thus simply do >> >>case POINTER_PLUS_EXPR: >> return flag_delete_null_pointer_checks >>&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) >> || tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); >> >> OTOH ptr + -(uintptr_t)ptr is valid from a POINTER_PLUS_EXPR >> perspective as GCC does >> not have a special tree code for pointer subtraction (but IIRC it uses >> MINUS_EXPR on integers >> for this). >> >> Richard. >> >> >>>/* One of operands must be positive and the other non-negative. */ >>>/* We don't set *STRICT_OVERFLOW_P here: even if this value >>> overflows, on a twos-complement machine the sum of two >>> -- >>> 2.8.3
Re: Determine more IVs to be non-overflowing
Hi, I sent this email before, but it seems it never left my machine. Sorry for possible duplicates. This is an updated version which does not overflow (by capping nit), but still uses widest_int for the calculations. It seems more readable to do so than using wi::add/wi::mult/wi::lt and overflow checks, but I can definitely update the patch to do it, too. Bootstrapped/regtested x86_64-linux, OK? Honza * tree-scalar-evolution.h (iv_can_overflow_p): Declare. * tree-scalar-evolution.c (iv_can_overflow_p): New function. (simple_iv): Use it. * tree-ssa-loop-niter.c (nowrap_type_p): Use ANY_INTEGRAL_TYPE_P * gcc.dg/tree-ssa/scev-14.c: New testcase. Index: testsuite/gcc.dg/tree-ssa/scev-14.c === --- testsuite/gcc.dg/tree-ssa/scev-14.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/scev-14.c (working copy) @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */ +int a[100]; +void t(unsigned int n) +{ + unsigned int i; + for (i=0; iwidest_int::from (type_max, sgn) + || (widest_int::from (type_min, sgn) + > (widest_int::from (base_min, base_sgn) ++ widest_int::from (step_min, step_sgn) * (nit + 1 +return true; + return false; +} + /* Checks whether use of OP in USE_LOOP behaves as a simple affine iv with respect to WRTO_LOOP and returns its base and step in IV if possible (see analyze_scalar_evolution_in_loop for more details on USE_LOOP @@ -3375,8 +3430,12 @@ simple_iv (struct loop *wrto_loop, struc if (tree_contains_chrecs (iv->base, NULL)) return false; - iv->no_overflow = (!folded_casts && ANY_INTEGRAL_TYPE_P (type) -&& TYPE_OVERFLOW_UNDEFINED (type)); + iv->no_overflow = !folded_casts && nowrap_type_p (type); + + if (!iv->no_overflow + && !iv_can_overflow_p (wrto_loop, type, iv->base, iv->step)) +iv->no_overflow = true; + /* Try to simplify iv base: Index: tree-scalar-evolution.h === --- tree-scalar-evolution.h (revision 237856) +++ tree-scalar-evolution.h (working copy) @@ -38,6 +38,7 @@ extern unsigned int scev_const_prop (voi extern
bool expression_expensive_p (tree); extern bool simple_iv (struct loop *, struct loop *, tree, struct affine_iv *, bool); +extern bool iv_can_overflow_p (struct loop *, tree, tree, tree); extern tree compute_overall_effect_of_inner_loop (struct loop *, tree); /* Returns the basic block preceding LOOP, or the CFG entry block when Index:
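The numeric core of iv_can_overflow_p can be sketched independently of GCC's types. In this simplified model, __int128 plays the role of widest_int and nit is assumed to be the already-capped iteration count: widen base and step, compute the final value, and check it against the type's bounds.

```c
#include <stdint.h>

/* Does the IV's final value  base + step * nit  stay inside
   [type_min, type_max]?  Computed in a wider type so the check itself
   cannot overflow, mirroring the patch's use of widest_int.  */
int iv_stays_in_range (int64_t base, int64_t step, uint64_t nit,
                       int64_t type_min, int64_t type_max)
{
  __int128 last = (__int128) base + (__int128) step * (__int128) nit;
  return last >= type_min && last <= type_max;
}
```

For the scev-14.c loop, an unsigned int IV starting at 0 with step 1 stays in range for any iteration count below the type's maximum, which is what lets simple_iv mark it no_overflow.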
Re: [BUILDROBOT] Selftest failed for i686-wrs-vxworks
Jan-Benedict, I haven't given it any additional manual testing so far. It's pre-installation though. Maybe I'd just set WIND_BASE to some arbitrary value, just to make xgcc pass it's initial start-up test so that it can continue with self-testing? Or shall we set some value in gcc/Makefile.in for running the self-test? As I recall, WIND_BASE is expected to point at a vxworks install to pick up header files. It is an error not to have it set (silently skipping it leads to user confusion). If that's irrelevant for this testing environment, then setting it to something (probably just "", but safer might be "/These.are.not.the.dirs.you.are.looking.for") should be fine. nathan
Re: [PATCH PR70729] The second part of patch.
On Thu, Jun 30, 2016 at 03:16:51PM +0300, Yuri Rumyantsev wrote: > Hi Grüße. > > Could you please tell me how to reproduce your regression - did not > see any new failures in my local area: > > PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions (test for > excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution > test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution test > > I used HASWELL machine for it since test requires target avx. It fails everywhere for me, without offloading configured, both 32-bit and 64-bit. $ /home/jakub/src/gcc/obj20/gcc/xgcc -B/home/jakub/src/gcc/obj20/gcc/ ../../../../libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90 -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/ -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/.libs -I/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp -I../../../../libgomp/testsuite/../../include -I../../../../libgomp/testsuite/.. 
-fmessage-length=0 -fno-diagnostics-show-caret -Wno-hsa -fdiagnostics-color=never -fopenmp -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ -O2 -msse2 -mavx -B/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs -fintrinsic-modules-path=/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/.libs -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libquadmath/.libs/ -L/home/jakub/src/gcc/obj20/x86_64-pc-linux-gnu/./libgomp/../libgfortran/.libs -lgfortran -lm -o ./simd-2.exe -Wl,-rpath,../.libs/ -march=haswell $ ./simd-2.exe Program received signal SIGSEGV: Segmentation fault - invalid memory reference. Backtrace for this error: #0 0x7F7548CDBD58 #1 0x7F7548CDAEE0 #2 0x3405234A4F #3 0x400849 in __simd2_mod_MOD_work #4 0x4008D4 in MAIN__ at simd-2.f90:? Segmentation fault (core dumped) With/without -march=haswell, -mtune=haswell etc. Jakub
Re: [PATCH PR70729] The second part of patch.
Hi Grüße. Could you please tell me how to reproduce your regression - did not see any new failures in my local area: PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O0 execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess errors) PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution test I used HASWELL machine for it since test requires target avx. 2016-06-29 19:19 GMT+03:00 Thomas Schwinge: > Hi! > > On Tue, 28 Jun 2016 18:50:37 +0300, Yuri Rumyantsev > wrote: >> Here is the second part of patch to improve loop invariant code motion >> for loop marked with pragma omp simd. >> >> Bootstrapping and regression testing did not show any new failures. > > That got committed in r237844. In my testing, that causes (or exposes?) 
> the following regressions (both with offloading enabled and disabled): > > PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 (test for excess > errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -O1 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 (test for excess > errors) > -PASS: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test > +FAIL: libgomp.fortran/examples-4/simd-2.f90 -O2 execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess > errors) > -PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > +FAIL: libgomp.fortran/examples-4/simd-2.f90 -O3 -fomit-frame-pointer > -funroll-loops -fpeel-loops -ftracer -finline-functions execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g (test for excess > errors) > -PASS: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test > +FAIL: libgomp.fortran/examples-4/simd-2.f90 -O3 -g execution test > PASS: libgomp.fortran/examples-4/simd-2.f90 -Os (test for excess > errors) > PASS: libgomp.fortran/examples-4/simd-2.f90 -Os execution test > PASS: libgomp.fortran/examples-4/simd-3.f90 -O0 (test for excess > errors) > > These are segmentation faults, for example: > > Program received signal SIGSEGV: Segmentation fault - invalid memory > reference. > > Backtrace for this error: > #0 0x7fb65eeb52ef in ??? 
> #1 0x400845 in __simd2_mod_MOD_add2 > at > [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:18 > #2 0x400845 in __simd2_mod_MOD_work > at > [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:29 > #3 0x4008d4 in MAIN__ > at > [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:68 > #4 0x4005dc in main > at > [...]/source-gcc/libgomp/testsuite/libgomp.fortran/examples-4/simd-2.f90:58 > Segmentation fault (core dumped) > >> PR tree-optimization/70729 >> * tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider memory reference as >> independent in loops having positive safelen value. >> * tree-vect-loop.c (vect_transform_loop): Clear-up safelen value since >> it may be not valid after vectorization. > > > Grüße > Thomas
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
What about ptr + intptr_t? I guess we should check nonnegative for op1 then? -Manish On Thu, Jun 30, 2016 at 5:25 PM, Richard Biener wrote: > On Thu, Jun 30, 2016 at 1:17 PM, Manish Goregaokar wrote: >> gcc/ChangeLog: >> PR c/71699 >> * fold-const.c (tree_binary_nonzero_warnv_p): Allow >> pointer addition to also be considered nonzero. >> --- >> gcc/fold-const.c | 20 +--- >> 1 file changed, 13 insertions(+), 7 deletions(-) >> >> diff --git a/gcc/fold-const.c b/gcc/fold-const.c >> index 3b9500d..eda713e 100644 >> --- a/gcc/fold-const.c >> +++ b/gcc/fold-const.c >> @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, >> { >> case POINTER_PLUS_EXPR: >> case PLUS_EXPR: >> - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >> + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) >> + || POINTER_TYPE_P (type)) >> { >> + /* Pointers are always nonnegative, check integers. */ >> + if (ANY_INTEGRAL_TYPE_P (type)) >> + { >> + sub_strict_overflow_p = false; >> + if (!tree_expr_nonnegative_warnv_p (op0, >> + &sub_strict_overflow_p) >> + || !tree_expr_nonnegative_warnv_p (op1, >> + &sub_strict_overflow_p)) >> +return false; >> + } >>/* With the presence of negative values it is hard >> to say something. */ >> - sub_strict_overflow_p = false; >> - if (!tree_expr_nonnegative_warnv_p (op0, >> - &sub_strict_overflow_p) >> - || !tree_expr_nonnegative_warnv_p (op1, >> - &sub_strict_overflow_p)) >> -return false; >> + > > Hmm, note that op1 of POINTER_PLUS_EXPR _is_ an integer that needs to be > treated > as signed. > > POINTER_PLUS_EXPR is special in other ways in that iff > flag_delete_null_pointer_checks > is set we can assume that ptr + non-zero is never zero. 
Thus simply do > >case POINTER_PLUS_EXPR: > return flag_delete_null_pointer_checks >&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) > || tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); > > OTOH ptr + -(uintptr_t)ptr is valid from a POINTER_PLUS_EXPR > perspective as GCC does > not have a special tree code for pointer subtraction (but IIRC it uses > MINUS_EXPR on integers > for this). > > Richard. > > >>/* One of operands must be positive and the other non-negative. */ >>/* We don't set *STRICT_OVERFLOW_P here: even if this value >> overflows, on a twos-complement machine the sum of two >> -- >> 2.8.3
Re: RFC: pass to warn on questionable uses of alloca().
On 06/18/2016 07:55 PM, Martin Sebor wrote: On 06/16/2016 02:32 AM, Aldy Hernandez wrote: p.s. The pass currently warns on all uses of VLAs. I'm not completely sold on this idea, so perhaps we could remove it, or gate it with a flag. I also think that VLA diagnostics would be better controlled by a separate option, and emit a different diagnostic (one that mentions VLA rather than alloca). Although again, and for VLAs even more so than for alloca, providing an option to have GCC use dynamic allocation, would be an even more robust solution than issuing warnings. IIRC, this was the early implementation of VLAs in GCC so there is a precedent for it. (Though this seems complementary to the warnings.) In addition, I'm of the opinion that potentially unbounded VLA allocation should be checked at runtime and made trap on size overflow in C and throw an exception in C++ (e.g., for int a [A][B], when A * B * sizeof (int) exceeds SIZE_MAX / 2 or some runtime-configurable limit). My C++ patch for bug 69517 does just that (it needs to be resubmitted with the runtime configuration limit added). For a choice of VLA warning options, there already is -Wvla, though it simply points out every VLA definition regardless of its size. It also issues a diagnostic that's questionable in C99 and later modes (that "ISO C90 forbids variable length array" is only relevant in C90; what matters when -Wvla is specified in C99 and C11 modes is that a VLA has been seen). To keep things consistent with -Walloca, perhaps -Wvla could be extended to take an optional argument to specify the VLA threshold. (So that -Wvla=4096 would only diagnose VLA definitions of 4k bytes or more, or unknown size.) Hmmm...I kinda wanted a sane default for -Walloca, and keeping -Walloca and -Wvla consistent would mean I don't get that. Currently, -Walloca would mean "warn on unbounded uses of alloca where n > DEFAULT", whereas -Walloca=0 would mean "warn on all uses of alloca". 
The problem is that -Wvla means "warn on ALL uses". One consistent option would be to change the definition to: -Walloca: warn on all uses of alloca (strict mode). -Walloca=N: warn on unbounded uses of alloca or bounded uses of n > N. -Wvla: warn on all uses of VLAs (as currently implemented). -Wvla=N: warn on unbounded uses of VLA or bounded uses of n > N. This means we get no defaults, and the user must make the limit explicit, but at least we're consistent with the current use of -Wvla. How about we just use -Walloca (and no separate variants for -Wvla=??), but we document that -Walloca will also flag VLAs (and explain why). Also, we make sure the error messages say "variable-length array" not "alloca" in the VLA case. This way we can have a default, and perhaps a more consistent flag: -Walloca: warn on all unbounded uses of alloca/vlas and bounded uses where n > DEFAULT. -Walloca=0: warn on all uses of alloca/vlas (strict mode). -Walloca=N: warn on all unbounded uses of alloca/vlas and bounded uses where n > N. -Wvla: untouched; as is currently implemented. Would this be acceptable, or are y'all really against one -Walloca flag to rule it all? Aldy
Re: [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost
On 06/21/2016 05:50 PM, James Greenhalgh wrote: For the default implementation, if the parameters are not set, I just multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short of ideas on how best to form the default implementation. Yeah, this does seem kind of arbitrary. It looks especially odd considering that BRANCH_COST is likely to already vary between the size/speed cases. What's wrong with just multiplying through by CNI(1)? I'm not sure we want params for this; targets should just eventually upgrade their cost models. The new default causes some changes in generated conditional move sequences for x86_64. Whether these changes are for the better or not I can't say. How about arm/aarch64? I think some benchmark results might be good to have. Bernhard already pointed out some issues with the patch; I'll omit these. +(max_noce_ifcvt_seq_cost, + "This hook should return a value in the same units as\n\ +@code{TARGET_RTX_COSTS}, giving the maximum acceptable cost for\n\ +a sequence generated by the RTL if-conversion pass when conditional\n\ +execution is not available. There's still the issue that we're also replacing instructions when doing if-conversion. Let's say in this case, /* Convert "if (test) x = a; else x = b", for A and B constant. Also allow A = y + c1, B = y + c2, with a common y between A and B. */ we're removing two assignments for the purposes of optimizing for size, and one assignment when considering optimization for speed. This needs to factor into the cost calculations somehow if we want to do it properly. I think we can leave the hook as-is, but maybe add documentation to the effect of "The caller should increase the limit by the cost of whatever instructions are removed in the transformation." +/* Default implementation of TARGET_RTX_BRANCH_COST. */ Wrong name for the hook. 
+ +unsigned int +default_max_noce_ifcvt_seq_cost (bool speed_p, edge e) +{ + bool predictable_p = predictable_edge_p (e); + /* For size, some targets like to set a BRANCH_COST of zero to disable + ifcvt, continue to allow that. Then multiply through by + COSTS_N_INSNS (1) so we're in a comparable base. */ + + if (!speed_p) +return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (1); Blank line before the comment would be more readable. + enum compiler_param param = predictable_p + ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST + : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST; When splitting expressions across multiple lines, wrap in parens so that emacs formats them automatically. Bernd
Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
On Thu, Jun 30, 2016 at 1:17 PM, Manish Goregaokar wrote: > gcc/ChangeLog: > PR c/71699 > * fold-const.c (tree_binary_nonzero_warnv_p): Allow > pointer addition to also be considered nonzero. > --- > gcc/fold-const.c | 20 +--- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/gcc/fold-const.c b/gcc/fold-const.c > index 3b9500d..eda713e 100644 > --- a/gcc/fold-const.c > +++ b/gcc/fold-const.c > @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, > { > case POINTER_PLUS_EXPR: > case PLUS_EXPR: > - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) > + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) > + || POINTER_TYPE_P (type)) > { > + /* Pointers are always nonnegative, check integers. */ > + if (ANY_INTEGRAL_TYPE_P (type)) > + { > + sub_strict_overflow_p = false; > + if (!tree_expr_nonnegative_warnv_p (op0, > + &sub_strict_overflow_p) > + || !tree_expr_nonnegative_warnv_p (op1, > + &sub_strict_overflow_p)) > +return false; > + } >/* With the presence of negative values it is hard > to say something. */ > - sub_strict_overflow_p = false; > - if (!tree_expr_nonnegative_warnv_p (op0, > - &sub_strict_overflow_p) > - || !tree_expr_nonnegative_warnv_p (op1, > - &sub_strict_overflow_p)) > -return false; > + Hmm, note that op1 of POINTER_PLUS_EXPR _is_ an integer that needs to be treated as signed. POINTER_PLUS_EXPR is special in other ways in that iff flag_delete_null_pointer_checks is set we can assume that ptr + non-zero is never zero. Thus simply do case POINTER_PLUS_EXPR: return flag_delete_null_pointer_checks && (tree_expr_nonzero_warnv_p (op0, strict_overflow_p) || tree_expr_nonzero_warnv_p (op1, strict_overflow_p)); OTOH ptr + -(uintptr_t)ptr is valid from a POINTER_PLUS_EXPR perspective as GCC does not have a special tree code for pointer subtraction (but IIRC it uses MINUS_EXPR on integers for this). Richard. >/* One of operands must be positive and the other non-negative. 
*/ >/* We don't set *STRICT_OVERFLOW_P here: even if this value > overflows, on a twos-complement machine the sum of two > -- > 2.8.3
[BUILDROBOT] Selftest failed for i686-wrs-vxworks
Hi Nathan! The recent self-testing fails for i686-wrs-vxworks: /home/jbglaw/build-configlist_mk/i686-wrs-vxworks/build-gcc/mk/i686-wrs-vxworks/./gcc/xgcc -B/home/jbglaw/build-configlist_mk/i686-wrs-vxworks/build-gcc/mk/i686-wrs-vxworks/./gcc/ -xc -S -c /dev/null -fself-test xgcc: fatal error: environment variable ‘WIND_BASE’ not defined compilation terminated. Makefile:1877: recipe for target 's-selftest' failed make[2]: *** [s-selftest] Error 1 That's straight out of config-list.mk, see e.g. build http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=567929 I haven't given it any additional manual testing so far. It's pre-installation though. Maybe I'd just set WIND_BASE to some arbitrary value, just to make xgcc pass its initial start-up test so that it can continue with self-testing? Or shall we set some value in gcc/Makefile.in for running the self-test? MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of: 17:44 <@uschebit> Evangelist ist doch ein Vertriebler the second : für unverkäufliche Produkte, oder? (#korsett, 20120821)
Re: [PATCH, www] Mention download_prerequisites in installation docs
Improved wording after a chat with `redi` on IRC. gcc/ChangeLog: * doc/install.texi: Mention download_prerequisites --- gcc/doc/install.texi | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi index 9248f0d..de43e36 100644 --- a/gcc/doc/install.texi +++ b/gcc/doc/install.texi @@ -351,7 +351,10 @@ versions may work in some cases, but it's safer to use the exact versions documented. We appreciate bug reports about problems with newer versions, though. If your OS vendor provides packages for the support libraries then using those packages may be the simplest way to -install the libraries. +install the libraries. Alternatively, running +@code{contrib/download_prerequisites} from the root directory of the source +tree will fetch these libraries and place them in the appropriate locations +for the build. @table @asis @item GNU Multiple Precision Library (GMP) version 4.3.2 (or later) -- 2.8.3 -Manish On Thu, Jun 30, 2016 at 12:37 PM, Manish Goregaokar wrote: > 2016-06-30 Manish Goregaokar > > gcc/ChangeLog: > * doc/install.texi: Mention download_prerequisites > --- > gcc/doc/install.texi | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi > index 9248f0d..0070fdf 100644 > --- a/gcc/doc/install.texi > +++ b/gcc/doc/install.texi > @@ -351,7 +351,9 @@ versions may work in some cases, but it's safer to > use the exact > versions documented. We appreciate bug reports about problems with > newer versions, though. If your OS vendor provides packages for the > support libraries then using those packages may be the simplest way to > -install the libraries. > +install the libraries. Alternatively, running > +@code{contrib/download_prerequisites} from the repository root will fetch > +these libraries and place them in the appropriate locations for build. > > @table @asis > @item GNU Multiple Precision Library (GMP) version 4.3.2 (or later) > -- > 2.8.3
[PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks
gcc/ChangeLog: PR c/71699 * fold-const.c (tree_binary_nonzero_warnv_p): Allow pointer addition to also be considered nonzero. --- gcc/fold-const.c | 20 +--- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/gcc/fold-const.c b/gcc/fold-const.c index 3b9500d..eda713e 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -13200,16 +13200,22 @@ tree_binary_nonzero_warnv_p (enum tree_code code, { case POINTER_PLUS_EXPR: case PLUS_EXPR: - if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) + if ((ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type)) + || POINTER_TYPE_P (type)) { + /* Pointers are always nonnegative, check integers. */ + if (ANY_INTEGRAL_TYPE_P (type)) + { + sub_strict_overflow_p = false; + if (!tree_expr_nonnegative_warnv_p (op0, + &sub_strict_overflow_p) + || !tree_expr_nonnegative_warnv_p (op1, + &sub_strict_overflow_p)) +return false; + } /* With the presence of negative values it is hard to say something. */ - sub_strict_overflow_p = false; - if (!tree_expr_nonnegative_warnv_p (op0, - &sub_strict_overflow_p) - || !tree_expr_nonnegative_warnv_p (op1, - &sub_strict_overflow_p)) -return false; + /* One of operands must be positive and the other non-negative. */ /* We don't set *STRICT_OVERFLOW_P here: even if this value overflows, on a twos-complement machine the sum of two -- 2.8.3
[PATCH] [ARC] Various small miscellaneous fixes.
Small patches batch. Ok to apply? Claudiu gcc/ 2016-05-09 Claudiu Zissulescu * config/arc/arc.c (arc_process_double_reg_moves): Change. * config/arc/arc.md (movsi_insn): Disable unsupported move instructions for ARCv2 cores. (movdi): Use prepare_move_operands. (movsf, movdf): Use move_dest_operand predicate. (arc_process_double_reg_moves): Change. * config/arc/constraints.md (Chs): Enable when barrel shifter is present. * config/arc/fpu.md (divsf3): Change. * config/arc/fpx.md (dexcl_3op_peep2_insn): Dx data register is also a destination. (dexcl_3op_peep2_insn_nores): Likewise. * config/arc/arc.h (SHIFT_COUNT_TRUNCATED): Define to one. (LINK_COMMAND_SPEC): Remove. --- gcc/config/arc/arc.c | 5 + gcc/config/arc/arc.h | 27 +++ gcc/config/arc/arc.md | 35 +++ gcc/config/arc/constraints.md | 3 ++- gcc/config/arc/fpu.md | 4 +++- gcc/config/arc/fpx.md | 26 -- 6 files changed, 40 insertions(+), 60 deletions(-) diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c index 985df81..0830af3 100644 --- a/gcc/config/arc/arc.c +++ b/gcc/config/arc/arc.c @@ -9025,10 +9025,7 @@ arc_process_double_reg_moves (rtx *operands) rtx srcLow = simplify_gen_subreg (SImode, src, DFmode, TARGET_BIG_ENDIAN ? 4 : 0); - emit_insn (gen_rtx_UNSPEC_VOLATILE (Pmode, - gen_rtvec (3, dest, srcHigh, srcLow), - VUNSPEC_ARC_DEXCL_NORES)); - + emit_insn (gen_dexcl_2op (dest, srcHigh, srcLow)); } else gcc_unreachable (); diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h index f1705d5..f8f195d 100644 --- a/gcc/config/arc/arc.h +++ b/gcc/config/arc/arc.h @@ -124,24 +124,6 @@ extern const char *arc_cpu_to_as (int argc, const char **argv); %{!marclinux*: %{pg|p|profile:-marclinux_prof;: -marclinux}} \ %{!z:-z max-page-size=0x2000 -z common-page-size=0x2000} \ %{shared:-shared}" -/* Like the standard LINK_COMMAND_SPEC, but add %G when building - a shared library with -nostdlib, so that the hidden functions of libgcc - will be incorporated. 
- N.B., we don't want a plain -lgcc, as this would lead to re-exporting - non-hidden functions, so we have to consider libgcc_s.so.* first, which in - turn should be wrapped with --as-needed. */ -#define LINK_COMMAND_SPEC "\ -%{!fsyntax-only:%{!c:%{!M:%{!MM:%{!E:%{!S:\ -%(linker) %l " LINK_PIE_SPEC "%X %{o*} %{A} %{d} %{e*} %{m} %{N} %{n} %{r}\ -%{s} %{t} %{u*} %{x} %{z} %{Z} %{!A:%{!nostdlib:%{!nostartfiles:%S}}}\ -%{static:} %{L*} %(mfwrap) %(link_libgcc) %o\ -%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*:%*} 1):\ - %:include(libgomp.spec)%(link_gomp)}\ -%(mflib)\ -%{fprofile-arcs|fprofile-generate|coverage:-lgcov}\ -%{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\ -%{!A:%{!nostdlib:%{!nostartfiles:%E}}} %{T*} }}" - #else #define LINK_SPEC "%{mbig-endian:-EB} %{EB} %{EL}\ %{pg|p:-marcelf_prof;mA7|mARC700|mcpu=arc700|mcpu=ARC700: -marcelf}" @@ -1563,13 +1545,10 @@ extern int arc_return_address_regs[4]; /* Undo the effects of the movmem pattern presence on STORE_BY_PIECES_P . */ #define MOVE_RATIO(SPEED) ((SPEED) ? 15 : 3) -/* Define this to be nonzero if shift instructions ignore all but the low-order - few bits. Changed from 1 to 0 for rotate pattern testcases - (e.g. 20020226-1.c). This change truncates the upper 27 bits of a word - while rotating a word. Came to notice through a combine phase - optimization viz. a << (32-b) is equivalent to a << (-b). +/* Define this to be nonzero if shift instructions ignore all but the + low-order few bits. */ -#define SHIFT_COUNT_TRUNCATED 0 +#define SHIFT_COUNT_TRUNCATED 1 /* Value is 1 if truncating an integer of INPREC bits to OUTPREC bits is done just by pretending it is already truncated. */ diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md index 5accf4a..c86fc02 100644 --- a/gcc/config/arc/arc.md +++ b/gcc/config/arc/arc.md @@ -699,9 +699,9 @@ ; the iscompact attribute allows the epilogue expander to know for which ; insns it should lengthen the return insn. ; N.B. 
operand 1 of alternative 7 expands into pcl,symbol@gotpc . -(define_insn "*movsi_insn" ; 0 1 23 4 5 6 7 8 9 10 11 1213 14 15 16 17 18 19 202122 23 2425 26 27 28 29 - [(set (match_operand:SI 0 "move_dest_operand" "=Rcq,Rcq#q,w, h, w,w, w, w, w, w,???w, ?w, w,Rcq#q, w,Rcq, S, Us<,RcqRck,!*x, r,!*Rsd,!*Rcd,r,Ucm, Usd,m,???m,VUsc,VUsc") - (match_operand:SI 1 "move_src_operand" " cL,
Re: [PATCH 0/9] remove some manual memory management
On 06/29/2016 02:26 PM, tbsaunde+...@tbsaunde.org wrote: patches individually bootstrapped and regtested on x86_64-linux-gnu, ok? I think these all look sensible. ChangeLogs ought to have slightly more information than "Adjust" in some cases, especially when you're changing function arguments. Bernd
[PATCH 1/2] [ARC] Update INSN_LENGTH_ALIGNMENT.
Update the INSN_LENGTH_ALIGNMENT macro to handle jump tables placed in program memory. Ok to apply? Claudiu gcc/ 2016-06-20 Claudiu Zissulescu * config/arc/arc.h (INSN_LENGTH_ALIGNMENT): Change. --- gcc/config/arc/arc.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h index 3376ad8..d2adf4d 100644 --- a/gcc/config/arc/arc.h +++ b/gcc/config/arc/arc.h @@ -1519,10 +1519,10 @@ extern int arc_return_address_regs[4]; #define ASM_OUTPUT_BEFORE_CASE_LABEL(FILE, PREFIX, NUM, TABLE) \ ASM_OUTPUT_ALIGN ((FILE), ADDR_VEC_ALIGN (TABLE)); -#define INSN_LENGTH_ALIGNMENT(INSN) \ - ((JUMP_P (INSN) \ +#define INSN_LENGTH_ALIGNMENT(INSN) \ + ((JUMP_TABLE_DATA_P (INSN) \ && GET_CODE (PATTERN (INSN)) == ADDR_DIFF_VEC \ -&& GET_MODE (PATTERN (INSN)) == QImode) \ +&& GET_MODE (PATTERN (INSN)) == QImode) \ ? 0 : length_unit_log) /* Define if operations between registers always perform the operation -- 1.9.1
[PATCH 2/2] [ARC] Fix mul32x16 patterns.
The mululw instructions are wrongly used in the patterns; fix them. Ok to apply? Claudiu gcc/ 2016-06-28 Claudiu Zissulescu * config/arc/arc.md (umul_600): Change. (umul64_600): Likewise. --- gcc/config/arc/arc.md | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md index 2464a19..41b8eed 100644 --- a/gcc/config/arc/arc.md +++ b/gcc/config/arc/arc.md @@ -1937,13 +1937,11 @@ (const_int 0 (clobber (match_operand:SI 3 "acc1_operand" ""))] "TARGET_MULMAC_32BY16_SET" - "@mululw 0, %0, %1 -mululw 0, %0, %1 -mululw%? 0, %1, %0" + "mululw 0, %0, %1" [(set_attr "length" "4,4,8") - (set_attr "type" "mulmac_600, mulmac_600, mulmac_600") - (set_attr "predicable" "no, no, yes") - (set_attr "cond" "nocond, canuse_limm, canuse")]) + (set_attr "type" "mulmac_600") + (set_attr "predicable" "no") + (set_attr "cond" "nocond")]) (define_insn "mac_600" [(set (match_operand:SI 2 "acc2_operand" "") @@ -2372,13 +2370,11 @@ (const_int 0 ] "TARGET_MULMAC_32BY16_SET" - "@mululw 0, %0, %1 -mululw 0, %0, %1 -mululw%? 0, %1, %0" + "mululw 0, %0, %1" [(set_attr "length" "4,4,8") (set_attr "type" "mulmac_600") - (set_attr "predicable" "no,no,yes") - (set_attr "cond" "nocond, canuse_limm, canuse")]) + (set_attr "predicable" "no") + (set_attr "cond" "nocond")]) (define_insn "umac64_600" -- 1.9.1
Re: [Patch AArch64] Fix PR target/63874
On Thu, May 5, 2016 at 8:45 AM, Ramana Radhakrishnan wrote: > On Thu, Mar 31, 2016 at 2:11 PM, Ramana Radhakrishnan > wrote: >> Hi, >> >> In this PR we have a situation where we aren't really detecting >> weak references vs weak definitions. If one has a weak definition >> that binds locally there's no reason not to put out PC relative >> relocations. >> >> However if you have a genuine weak reference that is >> known not to bind locally it makes very little sense >> to put out an entry into the literal pool which doesn't always >> work with DSOs and shared objects. >> >> Tested aarch64-none-linux-gnu bootstrap and regression test with no >> regressions >> >> This is not a regression and given what we've seen recently with protected >> symbols and binds_locally_p I'd rather this were queued for GCC 7. >> >> Ok ? > > Ping ^ 2. > > https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01680.html Ping ^3 https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01680.html Ramana > > regards > Ramana >> >> regards >> Ramana >> >> gcc/ >> >> * config/aarch64/aarch64.c (aarch64_classify_symbol): Typo in comment fixed. >> Only force to memory if it is a weak external reference. >> >> >> gcc/testsuite >> >> * gcc.target/aarch64/pr63874.c: New test.
Re: [PATCH] gcc/arc: Avoid add_s REG,REG,pcl on ARCv2 targets
* Claudiu Zissulescu [2016-06-30 07:36:32 +]: > I think the compactsi makes no sense for ARCv2 without the add_s > reg, reg,PCL instruction. Moreover, I have proposed a patch about > this issue months ago, See: > https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01462.html. However, > our maintainer is missing in action. OK, thanks for the link, I don't see a need for my patch then. Let's hope we get some of your outstanding patches reviewed soon. Thanks, Andrew
Fix PR fortran/71688
Hi, PR 71688 is about an ICE in cgraphunit.c caused by the fact that the Fortran FE creates two separate call-graph nodes for a single function decl; if you are interested, complete backtraces leading to the point of creating them are in bugzilla. The intuitive fix, changing one of these points so that it calls cgraph_node::get_create rather than cgraph_node::create, works, and given the comment just before the line it also seems like the correct thing to do: /* Register this function with cgraph just far enough to get it added to our parent's nested function list. If there are static coarrays in this function, the nested _caf_init function has already called cgraph_create_node, which also created the cgraph node for this function. */ It is interesting that the bug lurked so long there. I have bootstrapped and tested the patch below on x86_64-linux; is it OK for trunk and (after a while) for all active release branches? Thanks, Martin 2016-06-29 Martin Jambor PR fortran/71688 * trans-decl.c (gfc_generate_function_code): Use get_create rather than create to get a call graph node. testsuite/ gfortran.dg/pr71688.f90: New test. diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c index 2f5e434..0e68736 100644 --- a/gcc/fortran/trans-decl.c +++ b/gcc/fortran/trans-decl.c @@ -6336,7 +6336,7 @@ gfc_generate_function_code (gfc_namespace * ns) function has already called cgraph_create_node, which also created the cgraph node for this function. */ if (!has_coarray_vars || flag_coarray != GFC_FCOARRAY_LIB) - (void) cgraph_node::create (fndecl); + (void) cgraph_node::get_create (fndecl); } else cgraph_node::finalize_function (fndecl, true); diff --git a/gcc/testsuite/gfortran.dg/pr71688.f90 b/gcc/testsuite/gfortran.dg/pr71688.f90 new file mode 100644 index 000..dbb6d18 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr71688.f90 @@ -0,0 +1,13 @@ +! { dg-do compile } +! 
{ dg-options "-fcoarray=lib" } + +program p + call s +contains + subroutine s + real :: x[*] = 1 + block + end block + x = 2 + end +end
Re: [PATCH] Mark -fstack-protect as optimization flag (PR middle-end/71585)
On 06/29/2016 01:00 PM, Richard Biener wrote: > Ok. Might want to backport the inline_call hunk to all affected branches. > > Richard. Done in GCC-6 branch. Older branches are not affected.
Re: [PATCH][AArch64] Canonicalize Cortex core tunings
On 29/06/16 18:03, Wilco Dijkstra wrote: > This patch sets the branch cost to the same most optimal setting for all > Cortex cores, > reducing codesize and improving performance due to using more CSEL > instructions. > Set the autoprefetcher model in Cortex-A72 to weak like the others. Enable > AES fusion > in Cortex-A35. As a result generated code is now more similar as well as > more optimal > across Cortex cores. > > Regress passes, OK for commit? > OK. R. > ChangeLog: > 2016-06-29 Wilco Dijkstra > > * config/aarch64/aarch64.c (cortexa35_tunings): > Enable AES fusion. Use cortexa57_branch_cost. > (cortexa53_tunings): Use cortexa57_branch_cost. > (cortexa72_tunings): Use cortexa57_branch_cost. > Use AUTOPREFETCHER_WEAK. > (cortexa73_tunings): Use cortexa57_branch_cost. > --- > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index > 7617f9fb273d17653df95eaf22b3cbeae3862230..5a20e175e72b2848499f07fcdf1856b3a7817e49 > 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -448,11 +448,11 @@ static const struct tune_params cortexa35_tunings = >_addrcost_table, >_regmove_cost, >_vector_cost, > - &generic_branch_cost, > + &cortexa57_branch_cost, >_approx_modes, >4, /* memmov_cost */ >1, /* issue_rate */ > - (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD > + (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD > | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops */ >8, /* function_align. */ >8, /* jump_align. 
*/ > @@ -474,7 +474,7 @@ static const struct tune_params cortexa53_tunings = >_addrcost_table, >_regmove_cost, >_vector_cost, > - &generic_branch_cost, > + &cortexa57_branch_cost, >_approx_modes, >4, /* memmov_cost */ >2, /* issue_rate */ > @@ -526,7 +526,7 @@ static const struct tune_params cortexa72_tunings = >_addrcost_table, >_regmove_cost, >_vector_cost, > - &generic_branch_cost, > + &cortexa57_branch_cost, >_approx_modes, >4, /* memmov_cost */ >3, /* issue_rate */ > @@ -542,7 +542,7 @@ static const struct tune_params cortexa72_tunings = >2, /* min_div_recip_mul_df. */ >0, /* max_case_values. */ >0, /* cache_line_size. */ > - tune_params::AUTOPREFETCHER_OFF, /* autoprefetcher_model. */ > + tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ >(AARCH64_EXTRA_TUNE_NONE) /* tune_flags. */ > }; > > @@ -552,7 +552,7 @@ static const struct tune_params cortexa73_tunings = >_addrcost_table, >_regmove_cost, >_vector_cost, > - &generic_branch_cost, > + &cortexa57_branch_cost, >_approx_modes, >4, /* memmov_cost. */ >2, /* issue_rate. */ >
Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant
On 28/06/16 08:54, Richard Biener wrote: On Thu, 16 Jun 2016, Kyrill Tkachov wrote: On 15/06/16 22:53, Marc Glisse wrote: On Wed, 15 Jun 2016, Kyrill Tkachov wrote: This is a respin of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following feedback. I've changed the code to cast the operand to an unsigned type before applying the multiplication algorithm and cast it back to the signed type at the end. Whether to perform the cast is now determined by the function cast_mult_synth_to_unsigned in which I've implemented the cases that Marc mentioned in [1]. Please do let me know if there are any other cases that need to be handled. Ah, I never meant those cases as an exhaustive list, I was just looking for examples showing that the transformation was unsafe, and those 2 came to mind: - x*15 -> x*16-x the second one can overflow even when the first one doesn't. - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's redundant with the negate_variant check?) On the other hand, as long as we remain in the 'positive' operations, turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2 cannot cause spurious overflows. But I didn't look at the algorithm closely enough to characterize the safe cases. Now if you have done it, that's good :-) Otherwise, we might want to err on the side of caution. I'll be honest, I didn't give it much thought beyond convincing myself that the two cases you listed are legitimate. Looking at expand_mult_const in expmed.c can be helpful (where it updates val_so_far for checking purposes) to see the different algorithm cases. I think the only steps that could cause overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the final step is negate_variant, which are what you listed (and covered in this patch). richi is away on PTO for the time being though, so we have some time to convince ourselves :) ;) I think the easiest way would be to always use unsigned arithmetic. 
While VRP doesn't do anything on vector operations we still have some match.pd patterns that rely on correct overflow behavior and those may be enabled for vector types as well. That's fine with me. Here's the patch that performs the casts to unsigned and back when the input type doesn't wrap on overflow. Bootstrapped and tested on arm, aarch64, x86_64. How's this? Thanks, Kyrill 2016-06-30 Kyrylo Tkachov PR target/65951 * tree-vect-patterns.c: Include mult-synthesis.h (target_supports_mult_synth_alg): New function. (apply_binop_and_append_stmt): Likewise. (vect_synth_mult_by_constant): Likewise. (target_has_vecop_for_code): Likewise. (vect_recog_mult_pattern): Use above functions to synthesize vector multiplication by integer constants. 2016-06-30 Kyrylo Tkachov * gcc.dg/vect/vect-mult-const-pattern-1.c: New test. * gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise. diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c new file mode 100644 index ..e5dba82d7fa955a6a37a0eabf980127e464ac77b --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-1.c @@ -0,0 +1,41 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ + +#include <stdarg.h> +#include "tree-vect.h" + +#define N 256 + +__attribute__ ((noinline)) void +foo (long long *arr) +{ + for (int i = 0; i < N; i++) +arr[i] *= 123; +} + +int +main (void) +{ + check_vect (); + long long data[N]; + int i; + + for (i = 0; i < N; i++) +{ + data[i] = i; + __asm__ volatile (""); +} + + foo (data); + for (i = 0; i < N; i++) +{ + if (data[i] / 123 != i) + __builtin_abort (); + __asm__ volatile (""); +} + + return 0; +} + +/* { dg-final { scan-tree-dump-times "vect_recog_mult_pattern: detected" 2 "vect" { target aarch64*-*-* } } } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c new file mode 100644 index ..83019c96910b866e364a7c2e00261a1ded13cb53 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-mult-const-pattern-2.c @@ -0,0 +1,41 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_shift } */ + +#include +#include "tree-vect.h" + +#define N 256 + +__attribute__ ((noinline)) void +foo (long long *arr) +{ + for (int i = 0; i < N; i++) +arr[i] *= -19594LL; +} + +int +main (void) +{ + check_vect (); + long long data[N]; + int i; + + for (i = 0; i < N; i++) +{ + data[i] = i; + __asm__ volatile (""); +} + + foo (data); + for (i = 0; i < N; i++) +{ + if (data[i] / -19594LL != i) + __builtin_abort (); +
Re: [patch] Get rid of stack trampolines for nested functions
On Thu, Jun 30, 2016 at 12:08 AM, Eric Botcazou wrote:
> Hi,
>
> this patch implements generic support for the elimination of stack trampolines
> and, consequently, of the need to make the stack executable when pointers to
> nested functions are used.  That's done on a per-language and per-target basis
> (i.e. there is 1 language hook and 1 target hook to parameterize it) and there
> are no changes whatsoever in code generation if both are not turned on (and
> the patch implements a -ftrampolines option to let the user override them).
>
> The idea is based on the fact that, for targets using function descriptors as
> per their ABI like IA-64, AIX or VMS platforms, stack trampolines "degenerate"
> into descriptors built at run time on the stack and thus made up of data only,
> which in turn means that the stack doesn't need to be made executable.

As you are not able to handle all trampolines that way I always wondered if
we can allocate space for the trampolines statically for those cases we can
compute do not happen in a recursive path (or avoid them if we do not need a
static chain anyway, not sure if we optimize for this case currently).

Richard.

> This descriptor-based scheme is implemented generically for nested functions,
> i.e. the nested function lowering pass builds generic descriptors instead of
> trampolines on the stack when encountering pointers to nested functions, which
> means that there are 2 kinds of pointers to functions and therefore a run-time
> identification mechanism is needed for indirect calls to distinguish them.
>
> Because of that, enabling the support breaks binary compatibility (for code
> manipulating pointers to nested functions).  That's OK for Ada, and nested
> functions are first-class citizens in the language anyway so we really need
> this, but not for C, so for example Ada doesn't use it at the interface with C
> (when objects have "convention C" in Ada parlance).
>
> This was bootstrapped/regtested on x86_64-suse-linux but AdaCore has been
> using it on native platforms (Linux, Windows, Solaris, etc) for years.
>
> OK for the mainline?
>
>
> 2016-06-29  Eric Botcazou
>
>	PR ada/37139
>	PR ada/67205
>	* common.opt (-ftrampolines): New option.
>	* doc/invoke.texi (Code Gen Options): Document it.
>	* doc/tm.texi.in (Trampolines): Add TARGET_CUSTOM_FUNCTION_DESCRIPTORS.
>	* doc/tm.texi: Regenerate.
>	* builtins.def: Add init_descriptor and adjust_descriptor.
>	* builtins.c (expand_builtin_init_trampoline): Do not issue a warning
>	on platforms with descriptors.
>	(expand_builtin_init_descriptor): New function.
>	(expand_builtin_adjust_descriptor): Likewise.
>	(expand_builtin) : New case.
>	: Likewise.
>	* calls.c (prepare_call_address): Remove SIBCALLP parameter and add
>	FLAGS parameter.  Deal with indirect calls by descriptor and adjust.
>	Set STATIC_CHAIN_REG_P on the static chain register, if any.
>	(call_expr_flags): Set ECF_BY_DESCRIPTOR for calls by descriptor.
>	(expand_call): Likewise.  Move around call to prepare_call_address
>	and pass all flags to it.
>	* cfgexpand.c (expand_call_stmt): Reinstate CALL_EXPR_BY_DESCRIPTOR.
>	* gimple.h (enum gf_mask): New GF_CALL_BY_DESCRIPTOR value.
>	(gimple_call_set_by_descriptor): New setter.
>	(gimple_call_by_descriptor_p): New getter.
>	* gimple.c (gimple_build_call_from_tree): Set CALL_EXPR_BY_DESCRIPTOR.
>	(gimple_call_flags): Deal with GF_CALL_BY_DESCRIPTOR.
>	* langhooks.h (struct lang_hooks): Add custom_function_descriptors.
>	* langhooks-def.h (LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS): Define.
>	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_CUSTOM_FUNCTION_DESCRIPTORS.
>	* rtl.h (STATIC_CHAIN_REG_P): New macro.
>	* rtlanal.c (find_first_parameter_load): Skip static chain registers.
>	* target.def (custom_function_descriptors): New POD hook.
>	* tree.h (FUNC_ADDR_BY_DESCRIPTOR): New flag on ADDR_EXPR.
>	(CALL_EXPR_BY_DESCRIPTOR): New flag on CALL_EXPR.
>	* tree-core.h (ECF_BY_DESCRIPTOR): New mask.
>	Document FUNC_ADDR_BY_DESCRIPTOR and CALL_EXPR_BY_DESCRIPTOR.
>	* tree.c (make_node_stat) : Set function alignment to
>	DEFAULT_FUNCTION_ALIGNMENT instead of FUNCTION_BOUNDARY.
>	(build_common_builtin_nodes): Initialize init_descriptor and
>	adjust_descriptor.
>	* tree-nested.c: Include target.h.
>	(struct nesting_info): Add 'any_descr_created' field.
>	(get_descriptor_type): New function.
>	(lookup_element_for_decl): New function extracted from...
>	(create_field_for_decl): Likewise.
>	(lookup_tramp_for_decl): ...here.  Adjust.
>	(lookup_descr_for_decl): New function.
>	(convert_tramp_reference_op): Deal with
RE: [PATCH] gcc/arc: Avoid add_s REG,REG,pcl on ARCv2 targets
I think the compactsi makes no sense for ARCv2 without the add_s reg,reg,pcl
instruction.  Moreover, I have proposed a patch about this issue months ago,
see: https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01462.html.  However, our
maintainer is missing in action.

Cheers,
Claudiu

> -----Original Message-----
> From: Andrew Burgess [mailto:andrew.burg...@embecosm.com]
> Sent: Thursday, June 30, 2016 1:43 AM
> To: gcc-patches@gcc.gnu.org; g...@amylaar.uk
> Cc: claudiu.zissule...@synopsys.com; Andrew Burgess
> Subject: [PATCH] gcc/arc: Avoid add_s REG,REG,pcl on ARCv2 targets
>
> The ARCv2 targets don't support add_s REG,REG,pcl like earlier ARC
> targets did.  This instruction format is used when generating case jump
> tables.
>
> This commit updates the code so that ARCv2 targets avoid the unsupported
> instruction.  This does mean the code is slightly longer on ARCv2, so
> the instruction length has changed accordingly.
>
> gcc/ChangeLog:
>
>	* config/arc/arc.md (casesi_compact_jump): Avoid generating add_s
>	REG,REG,pcl on ARCv2 targets.
>
> gcc/testsuite/ChangeLog:
>
>	* gcc.target/arc/switch-1.c: New file.
> ---
>  gcc/ChangeLog                           |  5
>  gcc/config/arc/arc.md                   | 50 ++---
>  gcc/testsuite/ChangeLog                 |  4 +++
>  gcc/testsuite/gcc.target/arc/switch-1.c | 42 +++
>  4 files changed, 79 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/switch-1.c
>
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 1102c53..0f39d49 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -3950,6 +3950,7 @@
>    int unalign = arc_get_unalign ();
>    rtx xop[3];
>    const char *s;
> +  int vec_offset = TARGET_V2 ? 12 : 10;
>
>    xop[0] = operands[0];
>    xop[2] = operands[2];
> @@ -3962,11 +3963,11 @@
>       2 of these are for alignment, and are anticipated in the length
>       of the ADDR_DIFF_VEC.  */
>    if (unalign && !satisfies_constraint_Rcq (xop[0]))
> -     s = \"add2 %2,pcl,%0\n\tld_s %2,[%2,12]\";
> +     s = \"add2 %2,pcl,%0\n\tld_s %2,[%2,12]\"; /* Length = 6 bytes.  */
>    else if (unalign)
> -     s = \"add_s %2,%0,2\n\tld.as %2,[pcl,%2]\";
> +     s = \"add_s %2,%0,2\n\tld.as %2,[pcl,%2]\"; /* Length = 6 bytes.  */
>    else
> -     s = \"add %2,%0,2\n\tld.as %2,[pcl,%2]\";
> +     s = \"add %2,%0,2\n\tld.as %2,[pcl,%2]\"; /* Length = 8 bytes.  */
>    arc_clear_unalign ();
>    break;
>  case HImode:
> @@ -3974,26 +3975,26 @@
>      {
>	 if (satisfies_constraint_Rcq (xop[0]))
>	   {
> -	     s = \"add_s %2,%0,%1\n\tld%_.as %2,[pcl,%2]\";
> -	     xop[1] = GEN_INT ((10 - unalign) / 2U);
> +	     s = \"add_s %2,%0,%1\n\tld%_.as %2,[pcl,%2]\"; /* Length = 6 bytes.  */
> +	     xop[1] = GEN_INT ((vec_offset - unalign) / 2U);
>	   }
>	 else
>	   {
> -	     s = \"add1 %2,pcl,%0\n\tld%__s %2,[%2,%1]\";
> -	     xop[1] = GEN_INT (10 + unalign);
> +	     s = \"add1 %2,pcl,%0\n\tld%__s %2,[%2,%1]\"; /* Length = 6 bytes.  */
> +	     xop[1] = GEN_INT (vec_offset + unalign);
>	   }
>      }
>    else
>      {
>	 if (satisfies_constraint_Rcq (xop[0]))
>	   {
> -	     s = \"add_s %2,%0,%1\n\tld%_.x.as %2,[pcl,%2]\";
> -	     xop[1] = GEN_INT ((10 - unalign) / 2U);
> +	     s = \"add_s %2,%0,%1\n\tld%_.x.as %2,[pcl,%2]\"; /* Length = 6 bytes.  */
> +	     xop[1] = GEN_INT ((vec_offset - unalign) / 2U);
>	   }
>	 else
>	   {
> -	     s = \"add1 %2,pcl,%0\n\tld%__s.x %2,[%2,%1]\";
> -	     xop[1] = GEN_INT (10 + unalign);
> +	     s = \"add1 %2,pcl,%0\n\tld%__s.x %2,[%2,%1]\"; /* Length = 6 bytes.  */
> +	     xop[1] = GEN_INT (vec_offset + unalign);
>	   }
>      }
>    arc_toggle_unalign ();
> @@ -4001,17 +4002,18 @@
>  case QImode:
>    if (ADDR_DIFF_VEC_FLAGS (diff_vec).offset_unsigned)
>      {
> -	if ((rtx_equal_p (xop[2], xop[0])
> -	     || find_reg_note (insn, REG_DEAD, xop[0]))
> +	if (!TARGET_V2
> +	    && (rtx_equal_p (xop[2], xop[0])
> +		|| find_reg_note (insn, REG_DEAD, xop[0]))
>	    && satisfies_constraint_Rcq (xop[0]))
>	  {
> -	    s = \"add_s %0,%0,pcl\n\tldb_s %2,[%0,%1]\";
> -	    xop[1] = GEN_INT (8 + unalign);
> +	    s = \"add %0,%0,pcl\n\tldb_s %2,[%0,%1]\"; /* Length = 6 bytes.  */
> +	    xop[1] = GEN_INT ((vec_offset - 2) + unalign);
>	  }
>	else
>	  {
> -	    s = \"add %2,%0,pcl\n\tldb_s %2,[%2,%1]\";
> -	    xop[1] = GEN_INT (10 + unalign);
> +	    s = \"add %2,%0,pcl\n\tldb_s %2,[%2,%1]\"; /* Length = 6 bytes.  */
> +	    xop[1] = GEN_INT (vec_offset + unalign);
>
Re: [PATCH] Fix fold_binary RROTATE_EXPR opts (PR middle-end/71693)
On Wed, 29 Jun 2016, Jakub Jelinek wrote:
> Hi!
>
> This issue is latent on the trunk/6.2, fails with checking in 5.x/4.9.
> As usually in fold_*_loc, arg0/arg1 is STRIP_NOPS op0/op1, so might have
> incompatible type, so we need to convert it to the right type first
> before optimizing.
>
> Bootstrapped/regtested on x86_64-linux/i686-linux, ok everywhere?

Ok.

Thanks,
Richard.

> 2016-06-29  Jakub Jelinek
>
>	PR middle-end/71693
>	* fold-const.c (fold_binary_loc) : Cast
>	TREE_OPERAND (arg0, 0) and TREE_OPERAND (arg0, 1) to type
>	first when permuting bitwise operation with rotate.  Cast
>	TREE_OPERAND (arg0, 0) to type when cancelling two rotations.
>
>	* gcc.c-torture/compile/pr71693.c: New test.
>
> --- gcc/fold-const.c.jj	2016-06-14 12:17:20.530282919 +0200
> +++ gcc/fold-const.c	2016-06-29 16:39:15.175562715 +0200
> @@ -10294,11 +10294,15 @@ fold_binary_loc (location_t loc,
>	      || TREE_CODE (arg0) == BIT_IOR_EXPR
>	      || TREE_CODE (arg0) == BIT_XOR_EXPR)
>	     && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST)
> -	return fold_build2_loc (loc, TREE_CODE (arg0), type,
> -				fold_build2_loc (loc, code, type,
> -						 TREE_OPERAND (arg0, 0), arg1),
> -				fold_build2_loc (loc, code, type,
> -						 TREE_OPERAND (arg0, 1), arg1));
> +	{
> +	  tree arg00 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
> +	  tree arg01 = fold_convert_loc (loc, type, TREE_OPERAND (arg0, 1));
> +	  return fold_build2_loc (loc, TREE_CODE (arg0), type,
> +				  fold_build2_loc (loc, code, type,
> +						   arg00, arg1),
> +				  fold_build2_loc (loc, code, type,
> +						   arg01, arg1));
> +	}
>
>      /* Two consecutive rotates adding up to the some integer
>	  multiple of the precision of the type can be ignored.  */
> @@ -10307,7 +10311,7 @@ fold_binary_loc (location_t loc,
>	     && TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
>	     && wi::umod_trunc (wi::add (arg1, TREE_OPERAND (arg0, 1)),
>				prec) == 0)
> -	return TREE_OPERAND (arg0, 0);
> +	return fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0));
>
>      return NULL_TREE;
>
> --- gcc/testsuite/gcc.c-torture/compile/pr71693.c.jj	2016-06-29 16:39:15.175562715 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/pr71693.c	2016-06-29 16:39:15.175562715 +0200
> @@ -0,0 +1,10 @@
> +/* PR middle-end/71693 */
> +
> +unsigned short v;
> +
> +void
> +foo (int x)
> +{
> +  v = ((((unsigned short) (0x0001 | (x & 0x0070) | 0x0100) & 0x00ffU) << 8)
> +       | (((unsigned short) (0x0001 | (x & 0x0070) | 0x0100) >> 8) & 0x00ffU));
> +}
>
>	Jakub

-- 
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nuernberg)
[PATCH, www] Mention download_prerequisites in installation docs
2016-06-30  Manish Goregaokar

gcc/ChangeLog:

	* doc/install.texi: Mention download_prerequisites.
---
 gcc/doc/install.texi | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 9248f0d..0070fdf 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -351,7 +351,9 @@ versions may work in some cases, but it's safer to use the exact versions
 documented.  We appreciate bug reports about problems with newer
 versions, though.  If your OS vendor provides packages for the support
 libraries then using those packages may be the simplest way to
-install the libraries.
+install the libraries.  Alternatively, running
+@code{contrib/download_prerequisites} from the repository root will fetch
+these libraries and place them in the appropriate locations for build.

 @table @asis
 @item GNU Multiple Precision Library (GMP) version 4.3.2 (or later)
-- 
2.8.3
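Typical usage of the script the new text mentions might look like the following (the checkout directory and configure flags are illustrative, not part of the patch):

```shell
# From a fresh GCC source checkout: fetch GMP, MPFR and MPC and link
# them into the tree so the top-level configure picks them up
# automatically during the build.
cd gcc                                # repository root (illustrative path)
./contrib/download_prerequisites
mkdir ../build && cd ../build
../gcc/configure --prefix="$HOME/gcc-install"   # illustrative flags
make && make install
```
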
Fix for PR71696 in Libiberty Demangler (6)
Hi,

The attached patch fixes the stack overflow in the demangler due to cycles in
the references of “remembered” mangled types
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71696).

The methods demangle_signature and do_arg in cplus-dem.c allow “remembering”
mangled type names that can later be referenced and will also be demangled.
The method demangle_args demangles those types following any references.  So,
if there is a cycle in the referencing (or, in the simplest case, a
self-reference), the method enters infinite recursion.

The patch tracks the mangled types that are currently being demangled in a
new variable called work->proctypevec.  If a referenced type is currently
being demangled, the demangling is marked as not successful.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.  Test case added
to libiberty/testsuite/demangle-expected and checked that PR71696 is
resolved.

Index: libiberty/ChangeLog
===================================================================
--- libiberty/ChangeLog (revision 237852)
+++ libiberty/ChangeLog (working copy)
@@ -1,3 +1,21 @@
+2016-06-30  Marcel Böhme
+
+	* cplus-dem.c: Prevent infinite recursion when there is a cycle
+	in the referencing of remembered mangled types.
+	(work_stuff): New stack to keep track of the remembered mangled
+	types that are currently being processed.
+	(push_processed_type): New method to push currently processed
+	remembered type onto the stack.
+	(pop_processed_type): New method to pop currently processed
+	remembered type from the stack.
+	(work_stuff_copy_to_from): Copy values of new variables.
+	(delete_non_B_K_work_stuff): Free stack memory.
+	(demangle_args): Push/Pop currently processed remembered type.
+	(do_type): Do not demangle a cyclic reference and push/pop
+	referenced remembered type.
+
+	* testsuite/demangle-expected: Add tests.
+
 2016-05-31  Alan Modra

	* xmemdup.c (xmemdup): Use xmalloc rather than xcalloc.
Index: libiberty/cplus-dem.c
===================================================================
--- libiberty/cplus-dem.c (revision 237852)
+++ libiberty/cplus-dem.c (working copy)
@@ -144,6 +144,9 @@ struct work_stuff
   string* previous_argument; /* The last function argument demangled.  */
   int nrepeats; /* The number of times to repeat the previous argument.  */
+  int *proctypevec; /* Indices of currently processed remembered typevecs.  */
+  int proctypevec_size;
+  int nproctypes;
 };

 #define PRINT_ANSI_QUALIFIERS (work -> options & DMGL_ANSI)
@@ -436,6 +439,10 @@
 static void remember_type (struct work_stuff *, const char *, int);

+static void push_processed_type (struct work_stuff *, int);
+
+static void pop_processed_type (struct work_stuff *);
+
 static void remember_Btype (struct work_stuff *, const char *, int, int);

 static int register_Btype (struct work_stuff *);
@@ -1302,6 +1309,13 @@ work_stuff_copy_to_from (struct work_stuff *to, st
       memcpy (to->btypevec[i], from->btypevec[i], len);
     }

+  if (from->proctypevec)
+    {
+      to->proctypevec = XNEWVEC (int, from->proctypevec_size);
+      memcpy (to->proctypevec, from->proctypevec,
+	      from->proctypevec_size * sizeof (int));
+    }
+
   if (from->ntmpl_args)
     to->tmpl_argvec = XNEWVEC (char *, from->ntmpl_args);

@@ -1336,6 +1350,12 @@ delete_non_B_K_work_stuff (struct work_stuff *work
       work -> typevec = NULL;
       work -> typevec_size = 0;
     }
+  if (work -> proctypevec != NULL)
+    {
+      free (work -> proctypevec);
+      work -> proctypevec = NULL;
+      work -> proctypevec_size = 0;
+    }
   if (work->tmpl_argvec)
     {
       int i;
@@ -3554,6 +3574,8 @@ static int
 do_type (struct work_stuff *work, const char **mangled, string *result)
 {
   int n;
+  int i;
+  int is_proctypevec;
   int done;
   int success;
   string decl;
@@ -3566,6 +3588,7 @@ do_type (struct work_stuff *work, const char **man
   done = 0;
   success = 1;
+  is_proctypevec = 0;
   while (success && !done)
     {
       int member;
@@ -3626,7 +3649,14 @@ do_type (struct work_stuff *work, const char **man
	      success = 0;
	    }
	  else
-	    {
+	    for (i = 0; i < work -> nproctypes; i++)
+	      if (work -> proctypevec [i] == n)
+		success = 0;
+
+	  if (success)
+	    {
+	      is_proctypevec = 1;
+	      push_processed_type (work, n);
	      remembered_type = work -> typevec[n];
	      mangled = &remembered_type;
	    }
@@ -3849,6 +3879,9 @@ do_type (struct work_stuff *work, const char **man
   string_delete (result);
   string_delete (&decl);

+  if (is_proctypevec)
+    pop_processed_type (work);
+
   if (success)
     /* Assume an integral type, if we're not
Re: [AArch64] - feedback on the approach followed for vulcan scheduler
On Wed, Jun 29, 2016 at 9:59 AM, Virendra Pathak wrote:
> Hi gcc-patches group,
>
> I am working on adding vulcan.md (machine description) for the vulcan cpu
> in the aarch64 port.  However, before proposing the final patch, I would like
> the basic approach to be reviewed by you all (as changes are there in
> aarch64.md).
>
> In vulcan, a (load/store) instruction could be scheduled to cpu units in
> different ways based on the addressing mode (e.g. load, or load+integer).
> So the requirement is to identify the addressing mode of a (load/store)
> instruction's operand while scheduling.
>
> For this purpose, a new attribute "addr_type" has been added in the
> aarch64.md file.  This helps in identifying which operands of a (load/store)
> instruction should be considered for finding the addressing mode.
>
> vulcan.md, while scheduling, calls a new function aarch64_mem_type_p
> in aarch64.c (via match_test) to decide the scheduling option based
> on the addressing mode.
>
> I have copied the code snippet below (the complete patch is attached with
> this mail).
>
> Kindly review and give your feedback/comments.
> Also if you think there could be a better alternative way, kindly suggest.

Since this is only about post and pre increment being split into two
micro-ops, I suspect something like what the rs6000 back-end does is better:

;; Does this instruction use update addressing?
;; This is used for load and store insns.  See the comments for "indexed".
(define_attr "update" "no,yes"
  (if_then_else (ior (match_operand 0 "update_address_mem")
		     (match_operand 1 "update_address_mem"))
		(const_string "yes")
		(const_string "no")))

Where update_address_mem is defined as:

;; Return 1 if the operand is a MEM with an update-form address.  This may
;; also include update-indexed form.
(define_special_predicate "update_address_mem"
  (match_test "(MEM_P (op)
		&& (GET_CODE (XEXP (op, 0)) == PRE_INC
		    || GET_CODE (XEXP (op, 0)) == PRE_DEC
		    || GET_CODE (XEXP (op, 0)) == PRE_MODIFY))"))

Note you need to include the POST ones for AARCH64 but it should be similar
enough.  And then you just check the attr update.

Thanks,
Andrew

>
> Thanks in advance for your time.
>
>
> FILE - gcc/config/aarch64/aarch64-protos.h
>
> /* Mask bits to use for aarch64_mem_type_p.  Unshifted/shifted index
>    register variants are separated for scheduling purposes because the
>    distinction matters on some cores.  */
> #define AARCH64_ADDR_REG_IMM       0x01
> #define AARCH64_ADDR_REG_WB        0x02
> #define AARCH64_ADDR_REG_REG       0x04
> #define AARCH64_ADDR_REG_SHIFT     0x08
> #define AARCH64_ADDR_REG_EXT       0x10
> #define AARCH64_ADDR_REG_SHIFT_EXT 0x20
> #define AARCH64_ADDR_LO_SUM        0x40
> #define AARCH64_ADDR_SYMBOLIC      0x80
>
>
> FILE - gcc/config/aarch64/aarch64.md
>
> (define_attr "addr_type" "none,op0,op1,op0addr,op1addr,lo_sum,wb"
>   (const_string "none"))
>
> (define_insn "*mov_aarch64"
>   [(set (match_operand:SHORT 0 "nonimmediate_operand" "=r,r, *w,r,*w, m, m, r,*w,*w")
>	  (match_operand:SHORT 1 "general_operand" " r,M,D,m, m,rZ,*w,*w, r,*w"))]
>   "(register_operand (operands[0], mode)
>     || aarch64_reg_or_zero (operands[1], mode))"
> {
>    switch (which_alternative)
>      {
>      case 0:
>	 return "mov\t%w0, %w1";
>      case 1:
>	 return "mov\t%w0, %1";
>      case 2:
>	 return aarch64_output_scalar_simd_mov_immediate (operands[1], mode);
>      case 3:
>	 return "ldr\t%w0, %1";
>      case 4:
>	 return "ldr\t%0, %1";
>      case 5:
>	 return "str\t%w1, %0";
>      case 6:
>	 return "str\t%1, %0";
>      case 7:
>	 return "umov\t%w0, %1.[0]";
>      case 8:
>	 return "dup\t%0., %w1";
>      case 9:
>	 return "dup\t%0, %1.[0]";
>      default:
>	 gcc_unreachable ();
>      }
> }
>   [(set_attr "type" "mov_reg,mov_imm,neon_move,load1,load1,store1,store1,\
>		       neon_to_gp,neon_from_gp,neon_dup")
>    (set_attr "simd" "*,*,yes,*,*,*,*,yes,yes,yes")
>    (set_attr "addr_type" "*,*,*,op1,op1,op0,op0,*,*,*")]
> )
>
>
> FILE - gcc/config/aarch64/vulcan.md
>
> ;; Integer loads and stores.
>
> (define_insn_reservation "vulcan_load_basic" 4
>   (and (eq_attr "tune" "vulcan")
>	 (eq_attr "type" "load1")
>	 (match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_SYMBOLIC
>						 | AARCH64_ADDR_REG_IMM
>						 | AARCH64_ADDR_LO_SUM)"))
>   "vulcan_ls01")
>
> (define_insn_reservation "vulcan_load_automod" 4
>   (and (eq_attr "tune" "vulcan")
>	 (eq_attr "type" "load1")
>	 (match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_REG_WB)"))
>   "vulcan_ls01,vulcan_i012")
>
> FILE -
"*,*,*,op1,op1,op0,op0,*,*,*")] > ) > > > > > FILE - gcc/config/aarch64/vulcan.md > ;; Integer loads and stores. > > (define_insn_reservation "vulcan_load_basic" 4 > (and (eq_attr "tune" "vulcan") >(eq_attr "type" "load1") >(match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_SYMBOLIC > | AARCH64_ADDR_REG_IMM > | AARCH64_ADDR_LO_SUM)")) > "vulcan_ls01") > > (define_insn_reservation "vulcan_load_automod" 4 > (and (eq_attr "tune" "vulcan") >(eq_attr "type" "load1") >(match_test "aarch64_mem_type_p (insn, AARCH64_ADDR_REG_WB)")) > "vulcan_ls01,vulcan_i012") > > > > > FILE -