Re: [PATCH, testsuite]: Fix PR 65944, FAIL: g++.dg/lto/pr65276: undefined reference to std2::exception::~exception()

2016-02-01 Thread Michael Daniels
On Thu, 2015-06-18 at 11:05 +0200, Uros Bizjak wrote:
> On Thu, Jun 18, 2015 at 10:59 AM, James Greenhalgh
>  wrote:
> 
> >> > Hello!
> >> >
> >> > Following patch fixes:
> >> >
> >> > cp_lto_pr65276_1.o: In function 
> >> > `std2::runtime_error::~runtime_error()':^M
> >> > pr65276_1.C:(.text._ZN4std213runtime_errorD2Ev[_ZN4std213runtime_errorD5Ev]+0x8):
> >> > undefined reference to `std2::exception::~exception()'^M
> >> > cp_lto_pr65276_1.o: In function 
> >> > `std2::runtime_error::~runtime_error()':^M
> >> > pr65276_1.C:(.text._ZN4std213runtime_errorD0Ev[_ZN4std213runtime_errorD5Ev]+0xc):
> >> > undefined reference to `std2::exception::~exception()'^M
> >> > cp_lto_pr65276_1.o:(.rodata._ZTVN4std29exceptionE[_ZTVN4std29exceptionE]+0x10):
> >> > undefined reference to `std2::exception::~exception()'^M
> >> > cp_lto_pr65276_1.o:(.rodata._ZTVN4std29exceptionE[_ZTVN4std29exceptionE]+0x18):
> >> > undefined reference to `std2::exception::~exception()'^M
> >> > collect2: error: ld returned 1 exit status^M
> >> >
> >> > link error with g++.dg/lto/pr65276 testcase.
> >> >
> >> > 2015-06-16  Uros Bizjak  
> >> >
> >> > PR testsuite/65944
> >> > * g++.dg/lto/pr65276_0.C: Add std2::exception::~exception() function.
> >> >
> >> > Tested on x86_64-linux-gnu CentOS 5.11 (where linking failed) and
> >> > Fedora 22 (where linking didn't fail).
> >> >
> >> > OK for mainline and gcc-5 branch?
> >>
> >> Now committed as obvious.
> >
> > This patch causes failures in arm-none-linux-gnueabihf testing:
> >
> > PASS->FAIL: g++.dg/lto/pr65276 cp_lto_pr65276_0.o-cp_lto_pr65276_1.o link, 
> > -flto -O0 -std=c++11
> >
> > .../arm-none-linux-gnueabihf/obj/gcc4/gcc/testsuite/g++2/../../xg++ 
> > -B.../arm-none-linux-gnueabihf/obj/gcc4/gcc/testsuite/g++2/../../ 
> > cp_lto_pr65276_0.o cp_lto_pr65276_1.o -fno-diagnostics-show-caret 
> > -fdiagnostics-color=never -nostdinc++ 
> > -I.../arm-none-linux-gnueabihf/obj/gcc4/arm-none-linux-gnueabihf/libstdc++-v3/include/arm-none-linux-gnueabihf
> >  
> > -I.../arm-none-linux-gnueabihf/obj/gcc4/arm-none-linux-gnueabihf/libstdc++-v3/include
> >  -I.../gcc/libstdc++-v3/libsupc++ -I.../gcc/libstdc++-v3/include/backward 
> > -I.../gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -flto -O0 
> > -std=c++11 
> > -L.../arm-none-linux-gnueabihf/obj/gcc4/arm-none-linux-gnueabihf/./libstdc++-v3/src/.libs
> >  
> > -B.../arm-none-linux-gnueabihf/obj/gcc4/arm-none-linux-gnueabihf/./libstdc++-v3/src/.libs
> >  
> > -L.../arm-none-linux-gnueabihf/obj/gcc4/arm-none-linux-gnueabihf/./libstdc++-v3/src/.libs
> >  -o g++-dg-lto-pr65276-01.exe
> > cp_lto_pr65276_1.o (symbol from plugin): In function `typeinfo for 
> > std2::runtime_error':
> > (.text+0x0): multiple definition of `typeinfo name for std2::exception'
> > cp_lto_pr65276_0.o (symbol from plugin):(.text+0x0): first defined here
> > cp_lto_pr65276_1.o (symbol from plugin): In function `typeinfo for 
> > std2::runtime_error':
> > (.text+0x0): multiple definition of `typeinfo for std2::exception'
> > cp_lto_pr65276_0.o (symbol from plugin):(.text+0x0): first defined here
> > collect2: error: ld returned 1 exit status
> > compiler exited with status 1
> > output is:
> > cp_lto_pr65276_1.o (symbol from plugin): In function `typeinfo for 
> > std2::runtime_error':
> > (.text+0x0): multiple definition of `typeinfo name for std2::exception'
> > cp_lto_pr65276_0.o (symbol from plugin):(.text+0x0): first defined here
> > cp_lto_pr65276_1.o (symbol from plugin): In function `typeinfo for 
> > std2::runtime_error':
> > (.text+0x0): multiple definition of `typeinfo for std2::exception'
> > cp_lto_pr65276_0.o (symbol from plugin):(.text+0x0): first defined here
> > collect2: error: ld returned 1 exit status
> 
> I discussed this patch privately with Jon, where he suggested that the
> approach is OK. The patch also works for me on both, CentOS 5.11 and
> Fedora 22.
> 
> I'm out of ideas why this doesn't work on your system. Can you investigate it?
> 
> Uros.

I am seeing something similar to this with arm targets, and I don't
think it's a problem with the test. From my limited understanding of the
ARM C++ ABI, there should not be any typeinfo definitions in the object
for pr65276_1.C. I am seeing strong typeinfo definitions in the objects
for both pr65276_0.C and pr65276_1.C though, which is what's causing the
error me.


RE: [Patch,microblaze]: Better register allocation to minimize the spill and fetch.

2016-02-01 Thread Ajit Kumar Agarwal


-Original Message-
From: Mike Stump [mailto:mikest...@comcast.net] 
Sent: Tuesday, February 02, 2016 12:12 AM
To: Ajit Kumar Agarwal
Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; 
Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Better register allocation to minimize the 
spill and fetch.

On Jan 29, 2016, at 2:31 AM, Ajit Kumar Agarwal  
wrote:
> 
> This patch improves the allocation of registers in the given function.

>>Is it just me, or, would it be even better to change the abi and make 
>>MB_ABI_ASM_TEMP_REGNUM be allocated by the register allocator?

Yes, it would be even better to make r18 (MB_ABI_ASM_TEMP_REGNUM) to be 
allocated by the register allocator
for the given function. Currently r18 is marked as FIXED_REGISTERS and cannot 
be allocated to the given function only
used for temporaries where the liveness is limited. R18 is used in some of the 
shift patterns and also for the conditional
branches temporaries.

I have made some of the ABI changes making r18 as not FIXED_REGISTERS in my 
local branch and the given function can
be allocated with r18.This will reduce the spill and fetch to a great extent 
and I am seeing good amount of gains with this 
change for Mibench/EEMBC benchmarks.

Since the ABI specifies  r18 is reserved for assembler temporaries, lot of 
kernel driven code and glibc code  uses
 r18 as a temporaries for inline assembly. Changing the ABI requires lot of 
changes in the kernel specific code for
Microblaze where r18 is used as a temporary without storing and fetching after 
the scope is gone.

Currently we don't have  plans to change the ABI that might break the existing 
Kernel and GLIBC code. Changing
the ABI require a good amount of work to be done in Kernel and GLIBC code for 
Microblaze.

Thanks & Regards
Ajit


Re: [PATCH][2/2] Improve array-bound warnings and VRP

2016-02-01 Thread H.J. Lu
On Mon, Jan 26, 2015 at 7:06 AM, Richard Biener  wrote:
>
> This is the 2nd thing I came up with after looking at PR64277.
> VRP does a poor job computing value-ranges of unrolled loop IVs
> thus a very simple thing to do is to factor in previous VRP results
> by intersecting what VRP2 computes with recorded SSA name range infos
> (that also makes errors in those more likely to pop up... :/).
>
> This reduces the number of array bound warnings I get from the testcase
> but doesn't fix it completely.  It also ends up with saner value-ranges.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu (with one
> libstdc++ runtime failure which I am now checking if caused by the patch,
> and thus likely an existing latent issue with SSA name range info).
>
> Ok for trunk?  Or should I delay this to GCC 6?
>
> Thanks,
> Richard.
>
> 2015-01-26  Richard Biener  
>
> PR tree-optimization/64277
> * tree-vrp.c (update_value_range): Intersect the range with
> old recorded SSA name range information.
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69606

H.J.


Re: [PATCH] Fix compile/memory hog in the combiner (PR rtl-optimization/69592)

2016-02-01 Thread Bernd Schmidt

On 02/01/2016 09:34 PM, Jakub Jelinek wrote:

On the following testcase we completely uselessly consume about 5.5GB
of RAM and lots of compile time.  The problem is the code to avoid
exponential behavior of nonzero_bits/num_sign_bit_copies on binary
arithmetics rtxes, which causes us to recurse even when handling
of those rtxes is going to ignore those arguments.
So, this patch limits those only to the cases where we are going
to recurse on both arguments, for rtxes where we don't look at arguments
at all or where we only recurse on a single arguments it doesn't make
sense.  On the testcase, one of the rtxes where the new predicates
return false but ARITHMETIC_P is true, is COMPARE, in particular
(compare:CCC (plus (x) (y)) (x)), where it is known even without
looking at the operands that only one bit is possibly non-zero and
number of sign bit copies is always 1.  But without the patch we
needlessly recurse on x, which is set by another similar operation etc.


Hmm, so the code we have to eliminate performance problems is itself 
causing them?
I don't see any code handling COMPARE in nonzero_bits1, only the various 
EQ/NE/etc. codes.



+static inline bool
+nonzero_bits_binary_arith_p (const_rtx x)
+{
+  if (!ARITHMETIC_P (x))
+return false;
+  switch (GET_CODE (x))
+{
+case AND:
+case XOR:
+case IOR:
+case UMIN:
+case UMAX:
+case SMIN:
+case SMAX:
+case PLUS:
+case MINUS:
+case MULT:
+case DIV:
+case UDIV:
+case MOD:
+case UMOD:
+  return true;
+default:
+  return false;
+}


I think I have a slight preference for listing the cases where we know 
we can avoid the exponential behaviour workaround - i.e. just test for 
compares and return false for them. Otherwise someone might add another 
of the arithmetic codes to nonzero_bits without noticing they have to 
adjust this function as well.



Bernd



Re: [PATCH] Fix up _Pragma GCC diagnostics regressions (PR preprocessor/69543, PR c/69558)

2016-02-01 Thread Jeff Law

On 01/29/2016 10:57 PM, David Malcolm wrote:

On Fri, 2016-01-29 at 20:50 +0100, Jakub Jelinek wrote:

Hi!

This patch reverts one tiny change from r228049 changes (which hasn't
been
mentioned in the ChangeLog or patch description).  We definitely need
to
revisit this for GCC 7, but stage4 is probably not the right time for
that,
and the patch fixes e.g. tons of warnings (or with -Werror errors on
including pretty much all glib2 headers).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-01-29  Jakub Jelinek  

PR preprocessor/69543
PR c/69558
* c-pragma.c (handle_pragma_diagnostic): Pass input_location
instead of loc to control_warning_option.

* gcc.dg/pr69543.c: New test.
* gcc.dg/pr69558.c: New test.


This touches c-family; shouldn't the new tests be in c-c++-common,
rather than gcc.dg? (presumably we need to ensure that the glib2
headers are sane from C++ also)

I've been attempting to fix these by fixing linemap_compare_locations,
but I don't have that approach working, so fwiw I don't object to this
patch.
Then let's go ahead with this patch.  If you come up with something 
cleaner with the linemap_compare_locations, then we can revert the 
c-pragma part of this change (keeping the tests, of course).


jeff



Re: [PATCH] Fix compile/memory hog in the combiner (PR rtl-optimization/69592)

2016-02-01 Thread Jeff Law

On 02/01/2016 01:34 PM, Jakub Jelinek wrote:

Hi!

On the following testcase we completely uselessly consume about 5.5GB
of RAM and lots of compile time.  The problem is the code to avoid
exponential behavior of nonzero_bits/num_sign_bit_copies on binary
arithmetics rtxes, which causes us to recurse even when handling
of those rtxes is going to ignore those arguments.
So, this patch limits those only to the cases where we are going
to recurse on both arguments, for rtxes where we don't look at arguments
at all or where we only recurse on a single arguments it doesn't make
sense.  On the testcase, one of the rtxes where the new predicates
return false but ARITHMETIC_P is true, is COMPARE, in particular
(compare:CCC (plus (x) (y)) (x)), where it is known even without
looking at the operands that only one bit is possibly non-zero and
number of sign bit copies is always 1.  But without the patch we
needlessly recurse on x, which is set by another similar operation etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-02-01  Jakub Jelinek  

PR rtl-optimization/69592
* rtlanal.c (nonzero_bits_binary_arith_p): New inline function.
(cached_nonzero_bits): Use it instead of ARITHMETIC_P.
(num_sign_bit_copies_binary_arith_p): New inline function.
(cached_num_sign_bit_copies): Use it instead of ARITHMETIC_P.

* gcc.dg/pr69592.c: New test.
I do wonder if we want a pointer back from nonzero_bits1 to 
nonzero_bits_binary_arith_p to keep them in sync.  Similarly for the 
sign_bits version.  Presumably before the appropriate SWITCH statement 
in those functions.


OK with that change.

jeff


Re: [PATCH, testsuite]: Use -gdwarf-2 for g++.dg/other/anon5.C

2016-02-01 Thread Jeff Law

On 02/01/2016 03:05 PM, Uros Bizjak wrote:

Hello!

This test case fails for targets, where dwarf-4 is unsupported (e.g.
CentOS 5.11) with

/usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
handles version 2 information.

We can use -gdwarf-2 here, and still get correct linker error message
on Cent OS 5.11 and Fedora 23.

2016-02-01  Uros Bizjak  

 * g++.dg/other/anon5.C (dg-opetions): Use -gdwarf-2 instead of -g.

OK.
jeff


[Patch, fortran] PR66089 fix elemental dependency mishandling

2016-02-01 Thread Mikael Morin

Hello,

this is about the case

   c(:) = elemental_func(c(1), ...)

where as a result of a trunk change, only a reference to c(1) is saved 
to a temporary variable, instead of its value.


The fix tries to save the amount of copying as much as possible by 
detecting the above case.  Technically through the usage of a new field 
needs_temporary.


The patch is a variant of the one that has been on bugzilla for months.
The main difference is the usage of gfc_expr_is_variable instead of the 
check for expr_type == EXPR_VARIABLE (the former includes 
pointer-returning functions as well).


Regression-tested on x86_64-unknown-linux-gnu.  OK for trunk?
Mikael

2016-02-01  Mikael Morin  

PR fortran/66089
* trans-expr.c (expr_is_variable, gfc_expr_is_variable): Rename
the former to the latter and make it non-static.  Update callers.
* gfortran.h (gfc_expr_is_variable): New declaration.
(struct gfc_ss_info): Add field needs_temporary.
* trans-array.c (gfc_scalar_elemental_arg_saved_as_argument):
Tighten the condition on aggregate expressions with a check
that the expression is a variable and doesn't need a temporary.
(gfc_conv_resolve_dependency): Add intermediary reference variable.
Set the needs_temporary field.

2016-02-01  Mikael Morin  

PR fortran/66089
* gfortran.dg/elemental_dependency_6.f90: New.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index eeb688c..2ff2833 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -2464,10 +2464,12 @@ gfc_scalar_elemental_arg_saved_as_reference (gfc_ss_info * ss_info)
 return true;
 
   /* If the expression is a data reference of aggregate type,
+ and the data reference is not used on the left hand side,
  avoid a copy by saving a reference to the content.  */
-  if (ss_info->expr->expr_type == EXPR_VARIABLE
+  if (!ss_info->data.scalar.needs_temporary
   && (ss_info->expr->ts.type == BT_DERIVED
-	  || ss_info->expr->ts.type == BT_CLASS))
+	  || ss_info->expr->ts.type == BT_CLASS)
+  && gfc_expr_is_variable (ss_info->expr))
 return true;
 
   /* Otherwise the expression is evaluated to a temporary variable before the
@@ -4461,6 +4463,7 @@ gfc_conv_resolve_dependencies (gfc_loopinfo * loop, gfc_ss * dest,
   gfc_ss *ss;
   gfc_ref *lref;
   gfc_ref *rref;
+  gfc_ss_info *ss_info;
   gfc_expr *dest_expr;
   gfc_expr *ss_expr;
   int nDepend = 0;
@@ -4471,15 +4474,16 @@ gfc_conv_resolve_dependencies (gfc_loopinfo * loop, gfc_ss * dest,
 
   for (ss = rss; ss != gfc_ss_terminator; ss = ss->next)
 {
-  ss_expr = ss->info->expr;
+  ss_info = ss->info;
+  ss_expr = ss_info->expr;
 
-  if (ss->info->array_outer_dependency)
+  if (ss_info->array_outer_dependency)
 	{
 	  nDepend = 1;
 	  break;
 	}
 
-  if (ss->info->type != GFC_SS_SECTION)
+  if (ss_info->type != GFC_SS_SECTION)
 	{
 	  if (flag_realloc_lhs
 	  && dest_expr != ss_expr
@@ -4494,6 +4498,10 @@ gfc_conv_resolve_dependencies (gfc_loopinfo * loop, gfc_ss * dest,
 
 	nDepend = gfc_check_dependency (dest_expr, ss_expr, false);
 
+	  if (ss_info->type == GFC_SS_REFERENCE
+	  && gfc_check_dependency (dest_expr, ss_expr, false))
+	ss_info->data.scalar.needs_temporary = 1;
+
 	  continue;
 	}
 
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index c5ae4c5..9f5bece 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -8825,8 +8825,8 @@ gfc_trans_array_constructor_copy (gfc_expr * expr1, gfc_expr * expr2)
 
 /* Tells whether the expression is to be treated as a variable reference.  */
 
-static bool
-expr_is_variable (gfc_expr *expr)
+bool
+gfc_expr_is_variable (gfc_expr *expr)
 {
   gfc_expr *arg;
   gfc_component *comp;
@@ -8839,7 +8839,7 @@ expr_is_variable (gfc_expr *expr)
   if (arg)
 {
   gcc_assert (expr->value.function.isym->id == GFC_ISYM_TRANSPOSE);
-  return expr_is_variable (arg);
+  return gfc_expr_is_variable (arg);
 }
 
   /* A data-pointer-returning function should be considered as a variable
@@ -9320,7 +9320,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
  must have its components deallocated afterwards.  */
   scalar_to_array = (expr2->ts.type == BT_DERIVED
 		   && expr2->ts.u.derived->attr.alloc_comp
-		   && !expr_is_variable (expr2)
+		   && !gfc_expr_is_variable (expr2)
 		   && expr1->rank && !expr2->rank);
   scalar_to_array |= (expr1->ts.type == BT_DERIVED
 && expr1->rank
@@ -9364,7 +9364,7 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,
 }
 
   tmp = gfc_trans_scalar_assign (&lse, &rse, expr1->ts,
- expr_is_variable (expr2) || scalar_to_array
+ gfc_expr_is_variable (expr2) || scalar_to_array
  || expr2->expr_type == EXPR_ARRAY,
  !(l_is_temp || init_flag) && dealloc);
   gfc_add_expr_to_block (&body, tmp);
diff --git a/gcc/f

[PATCH, testsuite]: Use -gdwarf-2 for g++.dg/other/anon5.C

2016-02-01 Thread Uros Bizjak
Hello!

This test case fails for targets, where dwarf-4 is unsupported (e.g.
CentOS 5.11) with

/usr/bin/ld: Dwarf Error: found dwarf version '4', this reader only
handles version 2 information.

We can use -gdwarf-2 here, and still get correct linker error message
on Cent OS 5.11 and Fedora 23.

2016-02-01  Uros Bizjak  

* g++.dg/other/anon5.C (dg-opetions): Use -gdwarf-2 instead of -g.

Tested on x86_64-linux-gnu, CentOS 5.11 and Fedora 23.

OK for mainline?

Uros.
Index: g++.dg/other/anon5.C
===
--- g++.dg/other/anon5.C(revision 233041)
+++ g++.dg/other/anon5.C(working copy)
@@ -1,6 +1,6 @@
 // PR c++/34094
 // { dg-do link { target { ! { *-*-darwin* *-*-hpux* *-*-solaris2.* } } } }
-// { dg-options "-g" }
+// { dg-options "-gdwarf-2" }
 // Ignore additional message on powerpc-ibm-aix
 // { dg-prune-output "obtain more information" } */
 // Ignore additional messages on Linux/x86 with PIE


[PATCH] [PR tree-optimization/68580] Throttle size of PHI nodes we'll walk in FSM threader

2016-02-01 Thread Jeff Law


As discussed in the BZ, the FSM code walks backwards through PHI nodes 
trying to find constant values for key SSA_NAMEs.


When the size of the PHI node gets very large, those walks get 
exponentially more expensive.  Experiments have shown that there are 
limited cases with PHI nodes with > 100 arguments.  The testcase has 
PHIs with > 1000 arguments.


This patch creates a new PARAM with the limit set at a maximum of 100 
arguments in a PHI node before the FSM threader gives up.


That's enough to make compilation of the 68580 testcase reasonable, but 
I'm going to drop it to P4 priority rather than close.


I believe fsm_find_thread_paths is in need of a rewrite and probably 
accounts for the remaining small compile-time regression and thus I'm 
keeping the BZ open.


The fundamental problem with fsm_find_thread_path tries to see if there 
is a path between two nodes in DAG subset of the CFG.  However, it does 
this *for every predecessor* of the final node in a thread path.


ISTM there ought to be a way to pass in the final block in the jump 
thread path (rather than its preds), then use some backtracking as we 
pop up nodes in the DAG to search for alternate paths to avoid lots of 
extra work.  My experiments have shown this accounts for a fair amount 
of the remaining compile-time regression in the referenced testcase.



So this patch is a bit of a band-aid.  I may try to get to that rewrite 
before we close stage4 at which point re-evaluation of this PARAM will 
be needed.


Bootstrapped and regression tested on x86_64 linux.  Installing on the 
trunk.  Note the patch has to do some reindentation which makes it look 
far larger than it really is.  A diff -b output is also attached if 
anyone wants to see the meat of the change independent of the whitespace 
changes.


Jeff

commit c3a8a3d7108f6ad45692e3350b8e21edd4965d7a
Author: Jeff Law 
Date:   Mon Feb 1 14:46:57 2016 -0700

PR tree-optimization/68580
* params.def (FSM_MAXIMUM_PHI_ARGUMENTS): New param.
* tree-ssa-threadbackward.c
(fsm_find_control_statement_thread_paths): Do not try to walk
through large PHI nodes.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a551069..60e229f 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2016-02-01  Jeff Law  
+
+   PR tree-optimization/68580
+   * params.def (FSM_MAXIMUM_PHI_ARGUMENTS): New param.
+   * tree-ssa-threadbackward.c
+   (fsm_find_control_statement_thread_paths): Do not try to walk
+   through large PHI nodes.
+
 2016-02-01  Jakub Jelinek  
 
* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Return false
diff --git a/gcc/params.def b/gcc/params.def
index 0722ad7..c0494fa 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1150,6 +1150,11 @@ DEFPARAM (PARAM_FSM_SCALE_PATH_STMTS,
  "Scale factor to apply to the number of statements in a threading 
path when comparing to the number of (scaled) blocks.",
  2, 1, 10)
 
+DEFPARAM (PARAM_FSM_MAXIMUM_PHI_ARGUMENTS,
+ "fsm-maximum-phi-arguments",
+ "Maximum number of arguments a PHI may have before the FSM threader 
will not try to thread through its block.",
+ 100, 1, 99)
+
 DEFPARAM (PARAM_FSM_SCALE_PATH_BLOCKS,
  "fsm-scale-path-blocks",
  "Scale factor to apply to the number of blocks in a threading path 
when comparing to the number of (scaled) statements.",
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 735009c..55dbcad 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -114,7 +114,9 @@ fsm_find_control_statement_thread_paths (tree name,
  eventually one of the phi arguments will be an integer constant.  In the
  future, this could be extended to also handle simple assignments of
  arithmetic operations.  */
-  if (gimple_code (def_stmt) != GIMPLE_PHI)
+  if (gimple_code (def_stmt) != GIMPLE_PHI
+  || (gimple_phi_num_args (def_stmt)
+ >= (unsigned) PARAM_VALUE (PARAM_FSM_MAXIMUM_PHI_ARGUMENTS)))
 return;
 
   /* Avoid infinite recursion.  */
@@ -200,247 +202,253 @@ fsm_find_control_statement_thread_paths (tree name,
 
   /* Iterate over the arguments of PHI.  */
   unsigned int i;
-  for (i = 0; i < gimple_phi_num_args (phi); i++)
+  if (gimple_phi_num_args (phi)
+  < (unsigned) PARAM_VALUE (PARAM_FSM_MAXIMUM_PHI_ARGUMENTS))
 {
-  tree arg = gimple_phi_arg_def (phi, i);
-  basic_block bbi = gimple_phi_arg_edge (phi, i)->src;
-
-  /* Skip edges pointing outside the current loop.  */
-  if (!arg || var_bb->loop_father != bbi->loop_father)
-   continue;
-
-  if (TREE_CODE (arg) == SSA_NAME)
+  for (i = 0; i < gimple_phi_num_args (phi); i++)
{
- vec_safe_push (path, bbi);
- /* Recursively follow SSA_NAMEs looking for a constant definition.  */
- fsm_find_control_statement_thread_paths (arg, visited_bbs, path,
-  

Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-02-01 Thread Steve Ellcey
On Sat, 2016-01-30 at 11:06 +, Richard Sandiford wrote:
> I'm not sure this patch is safe.  The received wisdom used to be that
> ABIs should be defined in terms of types, not modes, since types
> represent the source code while modes are an internal GCC concept
> that could change over time (although in practice the barrier for
> that is pretty high).
> 
> The patch that introduced the line being patched was part of a series
> that fixed ABI incompatibilities with MIPSpro, and IIRC some of the
> incompatibilties really were due to us looking at modes when we should
> have been looking at types.

My main reason for looking at mode instead of type for non-aggregates
was because of argument promotion.  When I looked at a function with a
char argument, the type was char but the mode was SI.  Now I could still
use the alignment of char (1 byte) since it would get extended to 4
bytes in mips_function_arg_boundary by:

  if (alignment < PARM_BOUNDARY)
alignment = PARM_BOUNDARY;

But it still seemed a bit 'wrong' to me.  Maybe something in the
front/middle ends should be updating the type to match any argument
promotion that is being done?

Steve Ellcey
sell...@imgtec.com



[PATCH] Fix compile/memory hog in the combiner (PR rtl-optimization/69592)

2016-02-01 Thread Jakub Jelinek
Hi!

On the following testcase we completely uselessly consume about 5.5GB
of RAM and lots of compile time.  The problem is the code to avoid
exponential behavior of nonzero_bits/num_sign_bit_copies on binary
arithmetics rtxes, which causes us to recurse even when handling
of those rtxes is going to ignore those arguments.
So, this patch limits those only to the cases where we are going
to recurse on both arguments, for rtxes where we don't look at arguments
at all or where we only recurse on a single arguments it doesn't make
sense.  On the testcase, one of the rtxes where the new predicates
return false but ARITHMETIC_P is true, is COMPARE, in particular
(compare:CCC (plus (x) (y)) (x)), where it is known even without
looking at the operands that only one bit is possibly non-zero and
number of sign bit copies is always 1.  But without the patch we
needlessly recurse on x, which is set by another similar operation etc.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-02-01  Jakub Jelinek  

PR rtl-optimization/69592
* rtlanal.c (nonzero_bits_binary_arith_p): New inline function.
(cached_nonzero_bits): Use it instead of ARITHMETIC_P.
(num_sign_bit_copies_binary_arith_p): New inline function.
(cached_num_sign_bit_copies): Use it instead of ARITHMETIC_P.

* gcc.dg/pr69592.c: New test.

--- gcc/rtlanal.c.jj2016-01-21 21:27:57.0 +0100
+++ gcc/rtlanal.c   2016-02-01 18:53:06.130934333 +0100
@@ -4163,6 +4163,36 @@ num_sign_bit_copies (const_rtx x, machin
   return cached_num_sign_bit_copies (x, mode, NULL_RTX, VOIDmode, 0);
 }
 
+/* Return true if nonzero_bits1 might recurse into both operands
+   of X.  */
+
+static inline bool
+nonzero_bits_binary_arith_p (const_rtx x)
+{
+  if (!ARITHMETIC_P (x))
+return false;
+  switch (GET_CODE (x))
+{
+case AND:
+case XOR:
+case IOR:
+case UMIN:
+case UMAX:
+case SMIN:
+case SMAX:
+case PLUS:
+case MINUS:
+case MULT:
+case DIV:
+case UDIV:
+case MOD:
+case UMOD:
+  return true;
+default:
+  return false;
+}
+}
+
 /* The function cached_nonzero_bits is a wrapper around nonzero_bits1.
It avoids exponential behavior in nonzero_bits1 when X has
identical subexpressions on the first or the second level.  */
@@ -4179,7 +4209,7 @@ cached_nonzero_bits (const_rtx x, machin
  nonzero_bits1 on X with the subexpressions as KNOWN_X and the
  precomputed value for the subexpression as KNOWN_RET.  */
 
-  if (ARITHMETIC_P (x))
+  if (nonzero_bits_binary_arith_p (x))
 {
   rtx x0 = XEXP (x, 0);
   rtx x1 = XEXP (x, 1);
@@ -4191,13 +4221,13 @@ cached_nonzero_bits (const_rtx x, machin
   known_mode, known_ret));
 
   /* Check the second level.  */
-  if (ARITHMETIC_P (x0)
+  if (nonzero_bits_binary_arith_p (x0)
  && (x1 == XEXP (x0, 0) || x1 == XEXP (x0, 1)))
return nonzero_bits1 (x, mode, x1, mode,
  cached_nonzero_bits (x1, mode, known_x,
   known_mode, known_ret));
 
-  if (ARITHMETIC_P (x1)
+  if (nonzero_bits_binary_arith_p (x1)
  && (x0 == XEXP (x1, 0) || x0 == XEXP (x1, 1)))
return nonzero_bits1 (x, mode, x0, mode,
  cached_nonzero_bits (x0, mode, known_x,
@@ -4672,6 +4702,33 @@ nonzero_bits1 (const_rtx x, machine_mode
 #undef cached_num_sign_bit_copies
 
 
+/* Return true if num_sign_bit_copies1 might recurse into both operands
+   of X.  */
+
+static inline bool
+num_sign_bit_copies_binary_arith_p (const_rtx x)
+{
+  if (!ARITHMETIC_P (x))
+return false;
+  switch (GET_CODE (x))
+{
+case IOR:
+case AND:
+case XOR:
+case SMIN:
+case SMAX:
+case UMIN:
+case UMAX:
+case PLUS:
+case MINUS:
+case MULT:
+  return true;
+default:
+  return false;
+}
+}
+
+
 /* The function cached_num_sign_bit_copies is a wrapper around
num_sign_bit_copies1.  It avoids exponential behavior in
num_sign_bit_copies1 when X has identical subexpressions on the
@@ -4689,7 +4746,7 @@ cached_num_sign_bit_copies (const_rtx x,
  num_sign_bit_copies1 on X with the subexpressions as KNOWN_X and
  the precomputed value for the subexpression as KNOWN_RET.  */
 
-  if (ARITHMETIC_P (x))
+  if (num_sign_bit_copies_binary_arith_p (x))
 {
   rtx x0 = XEXP (x, 0);
   rtx x1 = XEXP (x, 1);
@@ -4703,7 +4760,7 @@ cached_num_sign_bit_copies (const_rtx x,
known_ret));
 
   /* Check the second level.  */
-  if (ARITHMETIC_P (x0)
+  if (num_sign_bit_copies_binary_arith_p (x0)
  && (x1 == XEXP (x0, 0) || x1 == XEXP (x0, 1)))
return
  num_sign_bit_copies1 (x, mode, x1, mode,
@@ -4711,7 +4768,7 @@ cached_num_sign_bit_copies (const_rtx x,
   

Re: [PATCH] Ensure noce_convert_multiple_sets handles only multiple sets (PR rtl-optimization/69570)

2016-02-01 Thread Richard Biener
On February 1, 2016 9:26:38 PM GMT+01:00, Jakub Jelinek  
wrote:
>On Mon, Feb 01, 2016 at 09:39:19AM +0100, Steven Bosscher wrote:
>> Browny points for opting out of the loop over all insns in the basic
>> block when count > limit.
>
>Like this?
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

>2016-02-01  Jakub Jelinek  
>
>   * ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Return false
>   when count is incremented above limit, don't analyze further
>   insns afterwards.
>
>--- gcc/ifcvt.c.jj 2016-02-01 09:46:00.0 +0100
>+++ gcc/ifcvt.c2016-02-01 12:33:28.932281244 +0100
>@@ -3286,15 +3286,13 @@ bb_ok_for_noce_convert_multiple_sets (ba
>   if (!can_conditionally_move_p (GET_MODE (dest)))
>   return false;
> 
>-  ++count;
>+  /* FORNOW: Our cost model is a count of the number of
>instructions we
>+   would if-convert.  This is suboptimal, and should be improved as
>part
>+   of a wider rework of branch_cost.  */
>+  if (++count > limit)
>+  return false;
> }
> 
>-  /* FORNOW: Our cost model is a count of the number of instructions
>we
>- would if-convert.  This is suboptimal, and should be improved as
>part
>- of a wider rework of branch_cost.  */
>-  if (count > limit)
>-return false;
>-
>   return count > 1;
> }
> 
>
>
>   Jakub




Re: [PATCH] Ensure noce_convert_multiple_sets handles only multiple sets (PR rtl-optimization/69570)

2016-02-01 Thread Jakub Jelinek
On Mon, Feb 01, 2016 at 09:39:19AM +0100, Steven Bosscher wrote:
> Browny points for opting out of the loop over all insns in the basic
> block when count > limit.

Like this?
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-02-01  Jakub Jelinek  

* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Return false
when count is incremented above limit, don't analyze further
insns afterwards.

--- gcc/ifcvt.c.jj  2016-02-01 09:46:00.0 +0100
+++ gcc/ifcvt.c 2016-02-01 12:33:28.932281244 +0100
@@ -3286,15 +3286,13 @@ bb_ok_for_noce_convert_multiple_sets (ba
   if (!can_conditionally_move_p (GET_MODE (dest)))
return false;
 
-  ++count;
+  /* FORNOW: Our cost model is a count of the number of instructions we
+would if-convert.  This is suboptimal, and should be improved as part
+of a wider rework of branch_cost.  */
+  if (++count > limit)
+   return false;
 }
 
-  /* FORNOW: Our cost model is a count of the number of instructions we
- would if-convert.  This is suboptimal, and should be improved as part
- of a wider rework of branch_cost.  */
-  if (count > limit)
-return false;
-
   return count > 1;
 }
 


Jakub


Re: [PATCH] Fix wide_int unsigned division (PR tree-optimization/69546, take 2)

2016-02-01 Thread Jakub Jelinek
On Sat, Jan 30, 2016 at 02:04:45PM +, Richard Sandiford wrote:
> Not sure what to call it.  Maybe canonize_uhwi?  Like canonize, except
> that it takes a uhwi instead of a length.
> 
> > Can that be done as a follow-up?  Certainly it would need
> > to take the uhwi to store, pointer to the array of hwis, and precision.
> 
> Yeah, guess it can wait.

So like this?
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-02-01  Jakub Jelinek  

* wide-int.cc (canonize_uhwi): New function.
(wi::divmod_internal): Use it.

--- gcc/wide-int.cc.jj  2016-01-30 19:03:35.0 +0100
+++ gcc/wide-int.cc 2016-02-01 12:28:23.501519292 +0100
@@ -118,6 +118,20 @@ canonize (HOST_WIDE_INT *val, unsigned i
   return 1;
 }
 
+/* VAL[0] is unsigned result of operation.  Canonize it by adding
+   another 0 block if needed, and return number of blocks needed.  */
+
+static inline unsigned int
+canonize_uhwi (HOST_WIDE_INT *val, unsigned int precision)
+{
+  if (val[0] < 0 && precision > HOST_BITS_PER_WIDE_INT)
+{
+  val[1] = 0;
+  return 2;
+}
+  return 1;
+}
+
 /*
  * Conversion routines in and out of wide_int.
  */
@@ -1793,25 +1807,12 @@ wi::divmod_internal (HOST_WIDE_INT *quot
   if (quotient)
{
  quotient[0] = o0 / o1;
- if (o1 == 1
- && (HOST_WIDE_INT) o0 < 0
- && dividend_prec > HOST_BITS_PER_WIDE_INT)
-   {
- quotient[1] = 0;
- quotient_len = 2;
-   }
+ quotient_len = canonize_uhwi (quotient, dividend_prec);
}
   if (remainder)
{
  remainder[0] = o0 % o1;
- if ((HOST_WIDE_INT) remainder[0] < 0
- && dividend_prec > HOST_BITS_PER_WIDE_INT)
-   {
- remainder[1] = 0;
- *remainder_len = 2;
-   }
- else
-   *remainder_len = 1;
+ *remainder_len = canonize_uhwi (remainder, dividend_prec);
}
   return quotient_len;
 }

Jakub


Re: genattrab.c generate switch

2016-02-01 Thread Jesper Broge Jørgensen


On 01/02/16 20:19, Patrick Palka wrote:

On Tue, Jan 12, 2016 at 7:53 PM, Jesper Broge Jørgensen
 wrote:

Hello

genattrab.c can generate if statements that have very deep bracket nesting
causing clang to produce errors (when target=arm-none-eabi) as explained at
https://gcc.gnu.org/ml/gcc/2014-05/msg00032.html
At the above link it was suggested that genattrab.c generated a switch
statement instead. I have made a patch that does just that.=

Have you considered first implementing the other suggestion in that
thread -- to avoid emitting the redundant parentheses in a consecutive
chain ||s and &&s? That kind of fix would be simpler and would fix the
compilation issue all the same, right?
I think it makes more sense to have the special case logic as close to 
where you know that you have to handle it eg. in write_attr_set() 
instead of in the more general purpos write_test_expr() (which is the 
where the parentheses is inserted). Also creating a different flow 
control structure (writing a switch instead of an if statement) makes it 
clear that we are handling an edge case. Also most of the logic needed 
is in the new function check_attr_set_switch() that is used to check if 
the expression is indeed a chain of || operators which would have to be 
done either way, so not much simplification would be gained.
write_test_expr() also makes the promise (in form of an inline comment) 
that parentheses are inserted to avoid worrying about operator 
precedence, having a special case that does not insert parentheses would 
break that promise.
Lastly i would also guess that a switch statement could be better 
optimized by the compiler but one would have to inspect the assembly 
output which i have not done so that is guess-work.




Re: [PATCH] Fix PR64748

2016-02-01 Thread Jakub Jelinek
On Mon, Feb 01, 2016 at 01:41:50PM -0600, James Norris wrote:
> The attached patch resolves c/PR64748. The patch
> adds the use of parm's with the deviceptr clause.
> 
> Question
> 
> As there is VAR_P (), could there be a PARM_P ()?

Not for GCC 6.x, for 7 it is possible.

> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -10760,7 +10760,7 @@ c_parser_oacc_data_clause_deviceptr (c_parser 
> *parser, tree list)
>c_parser_omp_var_list_parens() should construct a list of
>locations to go along with the var list.  */
>  
> -  if (!VAR_P (v))
> +  if (!VAR_P (v) && !(TREE_CODE (v) == PARM_DECL))

Please don't write !(x == y) but x != y.

> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -30087,7 +30087,7 @@ cp_parser_oacc_data_clause_deviceptr (cp_parser 
> *parser, tree list)
>c_parser_omp_var_list_parens should construct a list of
>locations to go along with the var list.  */
>  
> -  if (!VAR_P (v))
> +  if (!VAR_P (v) && !(TREE_CODE (v) == PARM_DECL))
>   error_at (loc, "%qD is not a variable", v);
>else if (TREE_TYPE (v) == error_mark_node)
>   ;

For C++, all this diagnostics is premature, if processing_template_decl
you really often don't know what the type will be, not sure if you always
know at least if it is a VAR_DECL, PARM_DECL or something else.  I bet you
can easily ICE with the current POINTER_TYPE_P (TREE_TYPE (v)) check as
in templates the type can be NULL, or it could be some lang type and only
later on become POINTER_TYPE, etc.
For C++ the diagnostics need to be done during finish_omp_clauses or so, not
earlier.

Jakub


Re: [PATCH] Don't change stack_alignment_needed for __tls_get_addr

2016-02-01 Thread Uros Bizjak
On Mon, Feb 1, 2016 at 6:30 PM, H.J. Lu  wrote:
> On Wed, Jan 27, 2016 at 11:49 AM, Uros Bizjak  wrote:
>> On Wed, Jan 27, 2016 at 8:25 PM, H.J. Lu  wrote:
>>>
>>> __tls_get_addr must be called with 16-byte aligned stack, which is
>>> guaranted by setting preferred_stack_boundary to 128 bits.  There
>>> is no need to change stack_alignment_needed for __tls_get_addr.
>>>
>>> Tested on x86-64.  OK for trunk?
>>
>> You know the purpose of these flags better than I, so - OK.
>>
>> Thanks,
>> Uros.
>>
>>>
>>> H.J.
>>> --
>>> PR target/68986
>>> * config/i386/i386.c (ix86_update_stack_boundary): Don't
>>> change stack_alignment_needed for __tls_get_addr call.
>
> Here is the backport for GCC 5.  Ok for gcc-5-branch?

OK.

Thanks,
Uros.


[PATCH] Fix PR64748

2016-02-01 Thread James Norris

Hi,

The attached patch resolves c/PR64748. The patch
adds the use of parm's with the deviceptr clause.

Question

As there is VAR_P (), could there be a PARM_P ()?
Or would that obscure something I'm not aware of?

Regtested and bootstrapped on x86_64.

Thanks,
Jim

 ChangeLog entries...

gcc/c/
PR c/64748
* c-parser.c (c_parser_oacc_data_clause_deviceptr): Allow parms.

gcc/cp/
PR c/64748
* parser.c (cp_parser_oacc_data_clause_deviceptr): Allow parms.

gcc/testsuite/
PR c/64748
* c-c++-common/goacc/deviceptr-1.c: Add tests.

diff --git a/gcc/c/ChangeLog b/gcc/c/ChangeLog
index 5341f04..f2d114c 100644
--- a/gcc/c/ChangeLog
+++ b/gcc/c/ChangeLog
@@ -1,3 +1,8 @@
+2016-02-XX  James Norris  
+
+	PR c/64748
+	* c-parser.c (c_parser_oacc_data_clause_deviceptr): Allow parms.
+
 2016-01-27  Jakub Jelinek  
 
 	PR debug/66869
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index eede3a7..f61f559 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -10760,7 +10760,7 @@ c_parser_oacc_data_clause_deviceptr (c_parser *parser, tree list)
 	 c_parser_omp_var_list_parens() should construct a list of
 	 locations to go along with the var list.  */
 
-  if (!VAR_P (v))
+  if (!VAR_P (v) && !(TREE_CODE (v) == PARM_DECL))
 	error_at (loc, "%qD is not a variable", v);
   else if (TREE_TYPE (v) == error_mark_node)
 	;
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 3b5c9d5..b11b859 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,8 @@
+2016-02-XX  James Norris  
+
+	PR c/64748
+	* parser.c (cp_parser_oacc_data_clause_deviceptr): Allow parms.
+
 2016-01-29  Jakub Jelinek  
 
 	PR debug/66869
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d03b0c9..de96b44 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -30087,7 +30087,7 @@ cp_parser_oacc_data_clause_deviceptr (cp_parser *parser, tree list)
 	 c_parser_omp_var_list_parens should construct a list of
 	 locations to go along with the var list.  */
 
-  if (!VAR_P (v))
+  if (!VAR_P (v) && !(TREE_CODE (v) == PARM_DECL))
 	error_at (loc, "%qD is not a variable", v);
   else if (TREE_TYPE (v) == error_mark_node)
 	;
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 150ebc8..db281cd 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2016-02-XX  James Norris  
+
+	PR c/64748
+	* c-c++-common/goacc/deviceptr-1.c: Add tests.
+
 2016-01-29  Jakub Jelinek  
 
 	PR target/69551
diff --git a/gcc/testsuite/c-c++-common/goacc/deviceptr-1.c b/gcc/testsuite/c-c++-common/goacc/deviceptr-1.c
index 546fa82..6edbdb1 100644
--- a/gcc/testsuite/c-c++-common/goacc/deviceptr-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/deviceptr-1.c
@@ -84,3 +84,21 @@ fun4 (void)
 #pragma acc parallel deviceptr(s2_p)
   s2_p = 0;
 }
+
+void
+func5 (float *fp)
+{
+
+#pragma acc data deviceptr (fp)
+{ }
+
+}
+
+void
+func6 (float fp)
+{
+
+#pragma acc data deviceptr (fp)	/* { dg-error "is not a pointer variable" } */
+{ }
+
+}


Re: genattrab.c generate switch

2016-02-01 Thread Patrick Palka
On Tue, Jan 12, 2016 at 7:53 PM, Jesper Broge Jørgensen
 wrote:
> Hello
>
> genattrab.c can generate if statements that have very deep bracket nesting
> causing clang to produce errors (when target=arm-none-eabi) as explained at
> https://gcc.gnu.org/ml/gcc/2014-05/msg00032.html
> At the above link it was suggested that genattrab.c generated a switch
> statement instead. I have made a patch that does just that.=

Have you considered first implementing the other suggestion in that
thread -- to avoid emitting the redundant parentheses in a consecutive
chain ||s and &&s? That kind of fix would be simpler and would fix the
compilation issue all the same, right?


Re: Default compute dimensions

2016-02-01 Thread Nathan Sidwell

On 02/01/16 13:42, Jakub Jelinek wrote:


Your patch broke bootstrap on ILP32 hosts, I'm testing following fix.
Supporting unsigned values from 0x8000U to 0xU only on LP64
hosts and not on ILP64 hosts sounds really weird, I think it is better
to only support 1 to 0x7fffU.


yes, I must have missed that first cast when changing my mind over 
signed/unsigned.  thanks!


nathan


Re: [Patch,microblaze]: Better register allocation to minimize the spill and fetch.

2016-02-01 Thread Mike Stump
On Jan 29, 2016, at 2:31 AM, Ajit Kumar Agarwal  
wrote:
> 
> This patch improves the allocation of registers in the given function.

Is it just me, or, would it be even better to change the abi and make 
MB_ABI_ASM_TEMP_REGNUM be allocated by the register allocator?

Re: Default compute dimensions

2016-02-01 Thread H.J. Lu
On Mon, Feb 1, 2016 at 8:15 AM, Nathan Sidwell  wrote:
> On 02/01/16 10:32, Jakub Jelinek wrote:
>>
>> On Mon, Feb 01, 2016 at 09:15:05AM -0500, Nathan Sidwell wrote:
>>>
>>> On 01/29/16 10:18, Jakub Jelinek wrote:

 On Thu, Jan 28, 2016 at 10:38:51AM -0500, Nathan Sidwell wrote:
>
> This patch adds default compute dimension handling.  Users rarely
> specify
> compute dimensions, expecting the toolchain to DTRT.  More savvy users
> would
> like to specify global defaults.  This patch permits both.


 Isn't it better to be able to override the defaults on the library side?
 I mean, when when somebody is compiling the code, often he doesn't know
 the
 exact properties of the hw it will be run on, if he does, I think it is
 better to specify them explicitly in the code.
>>>
>>>
>>> I realized that it's actually not possible to markup the code in this
>>> way,
>>> as an 'intermediate' user.  One can exercise complete control by saying
>>> exactly the axis/axes over which a loop is to be partitioned, and then
>>> specify the geometry.  But one cannot use the 'auto' feature and have the
>>> compiler choose an axis without also relying on the compiler choosing a
>>> size
>>> for that axis.  As I already said,  IMHO being able to specify a
>>> compile-time size is useful.
>>
>>
>> Ok, I won't fight against it.  But please make sure it can be overridden
>> on
>> the library side too.
>
>
> Absolutely, thanks!
>

This breaks bootstrap on x86:

../../src-trunk/gcc/omp-low.c: In function ‘void
oacc_parse_default_dims(const char*)’:
../../src-trunk/gcc/omp-low.c:20288:47: warning: comparison between
signed and unsigned integer expressions [-Wsign-compare]
if (errno || val <= 0 || (unsigned)val != val)
   ^

-- 
H.J.


Re: Default compute dimensions

2016-02-01 Thread Jakub Jelinek
On Mon, Feb 01, 2016 at 11:15:13AM -0500, Nathan Sidwell wrote:
> On 02/01/16 10:32, Jakub Jelinek wrote:
> >On Mon, Feb 01, 2016 at 09:15:05AM -0500, Nathan Sidwell wrote:
> >>On 01/29/16 10:18, Jakub Jelinek wrote:
> >>>On Thu, Jan 28, 2016 at 10:38:51AM -0500, Nathan Sidwell wrote:
> This patch adds default compute dimension handling.  Users rarely specify
> compute dimensions, expecting the toolchain to DTRT.  More savvy users 
> would
> like to specify global defaults.  This patch permits both.
> >>>
> >>>Isn't it better to be able to override the defaults on the library side?
> >>>I mean, when when somebody is compiling the code, often he doesn't know the
> >>>exact properties of the hw it will be run on, if he does, I think it is
> >>>better to specify them explicitly in the code.
> >>
> >>I realized that it's actually not possible to markup the code in this way,
> >>as an 'intermediate' user.  One can exercise complete control by saying
> >>exactly the axis/axes over which a loop is to be partitioned, and then
> >>specify the geometry.  But one cannot use the 'auto' feature and have the
> >>compiler choose an axis without also relying on the compiler choosing a size
> >>for that axis.  As I already said,  IMHO being able to specify a
> >>compile-time size is useful.
> >
> >Ok, I won't fight against it.  But please make sure it can be overridden on
> >the library side too.
> 
> Absolutely, thanks!

Your patch broke bootstrap on ILP32 hosts, I'm testing following fix.
Supporting unsigned values from 0x8000U to 0xU only on LP64
hosts and not on ILP64 hosts sounds really weird, I think it is better
to only support 1 to 0x7fffU.

2016-02-01  Jakub Jelinek  

* omp-low.c (oacc_parse_default_dims): Avoid
-Wsign-compare warning, make sure value fits into int
rather than just unsigned int.

--- gcc/omp-low.c.jj2016-02-01 19:08:51.0 +0100
+++ gcc/omp-low.c   2016-02-01 19:36:57.751641369 +0100
@@ -20285,10 +20285,10 @@ oacc_parse_default_dims (const char *dim
 
  errno = 0;
  val = strtol (pos, CONST_CAST (char **, &eptr), 10);
- if (errno || val <= 0 || (unsigned)val != val)
+ if (errno || val <= 0 || (int) val != val)
goto malformed;
  pos = eptr;
- oacc_default_dims[ix] = (int)val;
+ oacc_default_dims[ix] = (int) val;
}
}
   if (*pos)


Jakub


Re: [PATCH] [graphite] document that isl-0.16 is supported

2016-02-01 Thread Mike Stump
On Jan 29, 2016, at 8:10 AM, Sebastian Pop  wrote:
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 062f42c..3df7974 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -383,7 +383,7 @@ installed but it is not in your default library search 
> path, the
> @option{--with-mpc} configure option should be used.  See also
> @option{--with-mpc-lib} and @option{--with-mpc-include}.
> 
> -@item isl Library version 0.15 or 0.14.
> +@item isl Library version 0.16, 0.15, or 0.14.

So, they should commit to compatibility with apis vended, and if they do, I 
think we should say 0.14 or later.  This doesn’t mean that we won’t need fixes 
from time to time or that they will always do this, but generally that is true. 
 If we (they) deviate from this, _then_ we document the exceptions.  If release 
after release goes out, with all newer versions not working, then the list of 
known good versions is best; but, I’d say that something is terribly broken if 
we had to do that.

Re: [PATCH] nvptx: do not use alternative spelling of unsigned comparisons

2016-02-01 Thread Nathan Sidwell

On 02/01/16 11:55, Alexander Monakov wrote:

Hello!

The following patch fixes subtle breakage on NVPTX when unsigned comparisons
would be sometimes mistranslated by PTX JIT.  The new test demonstrates that
by using the %nctaid.x (number of blocks) register, but a comparison against
a value in constant memory can also trigger that (I could easily reproduce
that with NVCC, but not with GCC yet).  OK for trunk?

When emitting unsigned comparisons, nvptx backend uses alternative spellings
lo/ls/hs/hi for relational operators, instead of the usual lt/le/ge/gt,
respectively.  There's no nominal difference, but in practice ptxas and PTX
JIT can mistranslate the former in some circumstances, e.g. when the first
comparison operand is a load from constant memory (perhaps implicitely like
%nctaid.x).


Ok.

nathan


Re: genattrab.c generate switch

2016-02-01 Thread Jesper Broge Jørgensen



On 19/01/16 13:18, Bernd Schmidt wrote:

On 01/18/2016 11:44 PM, Jesper Broge Jørgensen wrote:

I found a formatting tool called uncrustify that comes with a gnu style
config
https://github.com/bengardner/uncrustify/blob/master/etc/gnu-indent.cfg
that needed a few tweaks to format code that looked what is already in
gcc/genattrtab.c

The tweaks was:

indent_with_tabs = 2 // instead of 0
sp_func_def_paren = add // instead of remove
sp_func_proto_paren  = add // instead of remove
sp_func_call_paren = add // instead of remove

So now the code should be correctly formatted.


Best to get that right when editing, though. emacs defaults to GNU 
style and other editors can also be tweaked.



Do i send in a new patch or just respond to the old one with the new
changes?


Usually best to send updated patches (as text/plain attachment to 
avoid word-wrapping and other whitespace damage).



I have also followed instructions at
https://gcc.gnu.org/ml/gcc/2003-06/txt00010.txt to get copyright
assignment though i have not yet received a reply.


Ok, we'll have to wait for that.


Bernd

I have finally received the copyright assignment it is currently on its 
way over the atlantic via snail mail.
In the meantime i have attached the the refomatted patch here hoping it 
could be reviewed while we wait.
gcc/ChangeLog:

2016-01-19  Jesper Broge Jørgensen  

* genattrtab.c (check_attr_set_switch): New function
(write_attr_set): Write a switch instead of if condition if possible


diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index 2caf8f6..8e7f9e6 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -4113,6 +4113,103 @@ eliminate_known_true (rtx known_true, rtx exp, int 
insn_code, int insn_index)
   return exp;
 }
 
+/* Check if exp contains a series of IOR conditions on the same attr_name.
+   If it does it can be turned into a switch statement and returns true.
+   If write_cases is true it will write the cases of the switch to outf.  */
+
+static int
+check_attr_set_switch (FILE *outf, rtx exp, unsigned int attrs_cached,
+  int write_cases, int indent)
+{
+  if (GET_CODE (exp) != IOR)
+return 0;
+  if (GET_CODE (XEXP (exp, 0)) != EQ_ATTR)
+return 0;
+
+  rtx next = exp;
+  int ior_depth = 0;
+  int is_first = 1;
+
+  const char *attr_name_cmp = XSTR (XEXP (exp, 0), 0);
+
+  while (1)
+{
+  rtx op1 = XEXP (next, 0);
+  rtx op2 = XEXP (next, 1);
+
+  if (GET_CODE (op1) != EQ_ATTR)
+   return 0;
+
+  const char *attr_name = XSTR (op1, 0);
+  const char *cmp_val = XSTR (op1, 1);
+
+  /* pointer compare is enough.  */
+  if (attr_name_cmp != attr_name)
+   return 0;
+
+  if (write_cases)
+   {
+ struct attr_desc *attr = find_attr (&attr_name, 0);
+ gcc_assert (attr);
+ if (is_first)
+   {
+ fprintf (outf, "(");
+ is_first = 0;
+ int i;
+ for (i = 0; i < cached_attr_count; i++)
+   if (attr->name == cached_attrs[i])
+ break;
+
+ if (i < cached_attr_count && (attrs_cached & (1U << i)) != 0)
+   fprintf (outf, "cached_%s", attr->name);
+ else if (i < cached_attr_count &&
+  (attrs_to_cache & (1U << i)) != 0)
+   fprintf (outf, "(cached_%s = get_attr_%s (insn))", attr->name,
+attr->name);
+ else
+   fprintf (outf, "get_attr_%s (insn)", attr->name);
+ fprintf (outf, ")\n");
+ write_indent (outf, indent);
+ fprintf (outf, "{\n");
+   }
+ write_indent (outf, indent);
+ fprintf (outf, "case ");
+ write_attr_valueq (outf, attr, cmp_val);
+ fprintf (outf, ":\n");
+   }
+
+  const int code = GET_CODE (op2);
+  if (code != IOR)
+   {
+ if (code == EQ_ATTR)
+   {
+ const char *attr_name = XSTR (op2, 0);
+ const char *cmp_val = XSTR (op2, 1);
+
+ if (attr_name == alternative_name)
+   return 0;
+
+ struct attr_desc *attr = find_attr (&attr_name, 0);
+ gcc_assert (attr);
+
+ if (attr->is_const)
+   return 0;
+ else if (write_cases)
+   {
+ write_indent (outf, indent);
+ fprintf (outf, "case ");
+ write_attr_valueq (outf, attr, cmp_val);
+ fprintf (outf, ":\n");
+   }
+   }
+ break;
+   }
+  next = op2;
+  ior_depth++;
+}
+  return ior_depth > 2;
+}
+
 /* Write out a series of tests and assignment statements to perform tests and
sets of an attribute value.  We are passed an indentation amount and prefix
and suffix strings to write around each attribute value (e.g., "return"
@@ -4123,6 +4220,7 @@ write_attr_set (FILE *outf, struct attr_desc *attr, int 
indent, rtx value,
  

Re: [RS6000] lqarx and stqcx. registers

2016-02-01 Thread David Edelsohn
On Sun, Jan 31, 2016 at 5:28 PM, Alan Modra  wrote:
> lqarx RT and stqcx. RS are valid only with even numbered gprs.  The
> predicate to enforce this happens to allow a loophole, closed by this
> patch.
>
> This pattern created by combine:
> Trying 8 -> 9:
> Successfully matched this instruction:
> (set (subreg:PTI (reg:TI 155 [ D.2357 ]) 0)
> (unspec_volatile:PTI [
> (mem/v:TI (reg/v/f:DI 157 [ mptr ]) [-1  S16 A128])
> ] UNSPECV_LL))
>
> is seen by reload as needing to reload pseudo 155 in TI mode, which
> has no requirement that the reg be even.  Apparently, nothing checks
> the predicate again after reload.
>
> We only see this problem on gcc-5 and gcc-4.9, because on gcc-6 we
> don't define WORD_REGISTER_OPERATIONS and combine happens to have a
> bug in simplify_set that prevents it creating the problem subregs.
> See https://gcc.gnu.org/ml/gcc-patches/2016-01/msg02377.html
>
> Bootstrapped and regression tested powerpc64-linux biarch on master
> both with and without the combine bug, and on gcc-5.  OK for master
> and active branches?
>
> gcc/
> PR target/69548
> * config/rs6000/predicates.md (quad_int_reg_operand): Don't
> allow subregs.
> gcc/testsuite/
> * gcc.target/powerpc/pr69548.c: New test.

Okay.

Thanks, David


Re: [RS6000] ABI_V4 init of toc section

2016-02-01 Thread David Edelsohn
On Fri, Jan 29, 2016 at 11:38 AM, Alan Modra  wrote:
> Since 4c4a180d, LTO has turned off flag_pic when linking a fixed
> position executable.  This results in flag_pic being zero in
> rs6000_file_start, and no definition of ".LCTOC1".
>
> However, when we get to actually emitting code, flag_pic may be on
> again, and references made to ".LCTOC1".  How flag_pic comes to be
> enabled again is quite a story.  It goes like this..  If a function is
> compiled with -fPIC then sysv4.h SUBTARGET_OVERRIDE_OPTIONS will set
> TARGET_RELOCATABLE.  Conversely, if TARGET_RELOCATABLE is set and
> flag_pic is zero, then SUBTARGET_OVERRIDE_OPTIONS will set flag_pic=2.
> It also happens that TARGET_RELOCATABLE is a bit in rs6000_isa_flags,
> which is handled by rs6000_function_specific_save and
> rs6000_function_specific_restore.  That last fact means lto streaming
> keeps track of the state of TARGET_RELOCATABLE for functions, and when
> options are restored for a given function we'll set flag_pic=2 if the
> function was originally compiled with -fPIC.  That's bad because it
> defeats the purpose of the 4c4a180d lto change, resulting in worse
> optimization of ppc32 executables.  What's more, we don't seem to turn
> off flag_pic once it is on.
>
> We should really untangle the flag_pic/TARGET_RELOCATABLE mess, but
> that change is probably a little dangerous for stage4.  Instead, this
> patch removes the toc symbol initialization from file_start and does
> so when the first item is emitted to the toc, or after the function
> epilogue in the cases where we emit code to initialize a toc pointer
> but don't actually use it (-O0 mostly, I think).
>
> Bootstrapped and regression tested powerpc64-linux biarch with all
> languages enabled.  OK to apply?
>
> PR target/68662
> * config/rs6000/rs6000.c (need_toc_init): New var, set it
> whenever toc_label_name used.
> (rs6000_file_start): Don't set up toc section here,
> (rs6000_output_function_epilogue): do so here instead,
> (rs6000_xcoff_file_start): and here.
> * config/rs6000/rs6000.md (load_toc_aix_si): Set need_toc_init.
> (load_toc_aix_di): Likewise.

This is okay as an interim fix for GCC 6.

Thanks, David


[openacc] reference-typed data mappings

2016-02-01 Thread Cesar Philippidis
This patch fixes a couple of bugs preventing c++ reference-typed
variables from working in openacc data clauses. These fixes include:

 * Teach the gimplifier to filter out pointer data mappings for
   OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA and OACC_UPDATE regions.
   Along with using a firsptrivate mapping for the array base pointers
   in OACC_DATA, OACC_PARALLEL and OACC_KERNELS regions.

 * Make the data mapping errors emitted by the c and c++ front ends
   more consistent with openacc by reporting data mapping errors, not
   omp-specific map errors.

 * Add some light checking for duplicate reference mappings in c++. The
   c++ FE still fails to detect duplicate component refs, but that's not
   working in openacc at the moment, anyway.

Jakub, the latter issue also affects openmp. I've added a simple openmp
test case, but it could probably be more extensive. Can you add more
test coverage or tell me what should be included?

Is this patch ok for trunk?

Cesar
2016-02-01  Cesar Philippidis  

	gcc/c/
	* c-typeck.c (c_finish_omp_clauses): Report OMP_CLAUSE_MAP errors
	as data clause errors for OpenACC.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Update prototype.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_omp_all_clauses): Likewise.
	(cp_parser_omp_for_loop): Likewise.
	(cp_omp_split_clauses): Likewise.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	(cp_parser_omp_declare_target): Likewise.
	(cp_parser_cilk_for): Likewise.
	* pt.c (tsubst_attribute): Update call to tsubst_omp_clauses.
	(tsubst_omp_clauses): New is_oacc argument.  Use it when calling
	finish_omp_clauses.
	(tsubst_omp_for_iterator): Update call to finish_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add is_oacc argument.  Report
	OMP_CLAUSE_MAP errors as data clause errors for OpenACC.  Check for
	duplicate reference mappings.  Exclude "omp declare simd"-isms when
	processing OpenACC clauses.
	(finish_omp_for): Update call to finish_omp_clauses.

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Consider OACC_{DATA, PARALLEL,
	KERNELS} when setting target_firstprivatize_array_bases.  Consider
	OACC_{DATA, ENTER_DATA, EXIT_DATA, UPDATE} when filtering out pointer
	mappings.  Also filter out GOMP_MAP_POINTER.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Adjust test.
	* c-c++-common/goacc/declare-2.c: Likewise.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/pcopy.c: Likewise.
	* c-c++-common/goacc/pcopyin.c: Likewise.
	* c-c++-common/goacc/pcopyout.c: Likewise.
	* c-c++-common/goacc/pcreate.c: Likewise.
	* c-c++-common/goacc/present-1.c: Likewise.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/goacc/reduction-1.C: New test.
	* g++.dg/goacc/update.C: New test.
	* g++.dg/gomp/template-data.C: New test.

	libgomp/
	* testsuite/libgomp.c++/non-scalar-data.C: New test.
	* testsuite/libgomp.oacc-c++/data-references.C: New test.
	* testsuite/libgomp.oacc-c++/data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/non-scalar-data-templates.C: New test.
	* testsuite/libgomp.oacc-c++/update-reference.C: New test.
	* testsuite/libgomp.oacc-c++/update-template.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Adjust test.
	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/nested-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/present-2.c: Likewise.

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 65925cb..4d93005 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -13157,8 +13157,10 @@ c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd)
 	{
 	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
 		error ("%qD appears more than once in motion clauses", t);
-	  else
+	  else if (is_omp)
 		error ("%qD appears more than once in map clauses", t);
+	  else
+		error ("%qD appears more than once in data clauses", t);
 	  remove = true;
 	}
 	  else if (bitmap_bit_p (&generic_head, DECL_UID (t))
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 0aeee57..baf3495 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6415,7 +6415,7 @@ extern tree omp_reduction_id			(enum tree_code, tree, tree);
 extern tree cp_remove_omp_priv_cleanup_stmt	(tree *, int *, void *);
 extern void cp_check_omp_declare_reduction	(tree);
 extern void finish_omp_declare_simd_methods	(tree);
-extern tree finish_omp_clauses			(tree, bool, bool = false);
+extern tree finish_omp_clauses			(tree, bool, bool, bool = false);
 extern tree push_omp_privatization_clauses	(bool);
 extern void pop_omp_privatization_clauses	(tree);
 extern void save_omp_privatization_clauses	(vec &);
diff --git a/gcc

Re: [PATCH] Don't change stack_alignment_needed for __tls_get_addr

2016-02-01 Thread H.J. Lu
On Wed, Jan 27, 2016 at 11:49 AM, Uros Bizjak  wrote:
> On Wed, Jan 27, 2016 at 8:25 PM, H.J. Lu  wrote:
>>
>> __tls_get_addr must be called with 16-byte aligned stack, which is
>> guaranted by setting preferred_stack_boundary to 128 bits.  There
>> is no need to change stack_alignment_needed for __tls_get_addr.
>>
>> Tested on x86-64.  OK for trunk?
>
> You know the purpose of these flags better than I, so - OK.
>
> Thanks,
> Uros.
>
>>
>> H.J.
>> --
>> PR target/68986
>> * config/i386/i386.c (ix86_update_stack_boundary): Don't
>> change stack_alignment_needed for __tls_get_addr call.

Here is the backport for GCC 5.  Ok for gcc-5-branch?


-- 
H.J.
From 3c93d02be1e4b41d9116da6262cf083a65439280 Mon Sep 17 00:00:00 2001
From: hjl 
Date: Tue, 26 Jan 2016 12:51:07 +
Subject: [PATCH] Update preferred stack boundary in ix86_update_stack_boundary

__tls_get_addr must be called with 16-byte aligned stack, which is
guaranted by setting preferred_stack_boundary to 128 bits.  Preferred
stack boundary adjustment for __tls_get_addr should be done in
ix86_update_stack_boundary, not ix86_compute_frame_layout Also
there is no need to over-align stack for __tls_get_addr and function
with __tls_get_addr call isn't a leaf function.

gcc/

	Backport from mainline
	PR target/68986
	* config/i386/i386.c (ix86_compute_frame_layout): Move stack
	alignment adjustment to ...
	(ix86_update_stack_boundary): Here.  Don't over-align stack nor
	change stack_alignment_needed for __tls_get_addr.
	(ix86_finalize_stack_realign_flags): Use stack_alignment_needed
	if __tls_get_addr is called.

gcc/testsuite/

	Backport from mainline
	PR target/68986
	* gcc.target/i386/pr68986-1.c: New test.
	* gcc.target/i386/pr68986-2.c: Likewise.
	* gcc.target/i386/pr68986-3.c: Likewise.
---
 gcc/config/i386/i386.c| 26 ++
 gcc/testsuite/gcc.target/i386/pr68986-1.c | 11 +++
 gcc/testsuite/gcc.target/i386/pr68986-2.c | 13 +
 gcc/testsuite/gcc.target/i386/pr68986-3.c | 13 +
 4 files changed, 47 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-3.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 504e8b8..a99d53b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10131,18 +10131,6 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
   crtl->preferred_stack_boundary = 128;
   crtl->stack_alignment_needed = 128;
 }
-  /* preferred_stack_boundary is never updated for call
- expanded from tls descriptor. Update it here. We don't update it in
- expand stage because according to the comments before
- ix86_current_function_calls_tls_descriptor, tls calls may be optimized
- away.  */
-  else if (ix86_current_function_calls_tls_descriptor
-	   && crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
-{
-  crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
-  if (crtl->stack_alignment_needed < PREFERRED_STACK_BOUNDARY)
-	crtl->stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
-}
 
   stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
   preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
@@ -10816,6 +10804,11 @@ ix86_update_stack_boundary (void)
   && cfun->stdarg
   && crtl->stack_alignment_estimated < 128)
 crtl->stack_alignment_estimated = 128;
+
+  /* __tls_get_addr needs to be called with 16-byte aligned stack.  */
+  if (ix86_tls_descriptor_calls_expanded_in_cfun
+  && crtl->preferred_stack_boundary < 128)
+crtl->preferred_stack_boundary = 128;
 }
 
 /* Handle the TARGET_GET_DRAP_RTX hook.  Return NULL if no DRAP is
@@ -11275,10 +11268,11 @@ ix86_finalize_stack_realign_flags (void)
   unsigned int incoming_stack_boundary
 = (crtl->parm_stack_boundary > ix86_incoming_stack_boundary
? crtl->parm_stack_boundary : ix86_incoming_stack_boundary);
-  unsigned int stack_realign = (incoming_stack_boundary
-< (crtl->is_leaf
-   ? crtl->max_used_stack_slot_alignment
-   : crtl->stack_alignment_needed));
+  unsigned int stack_realign
+= (incoming_stack_boundary
+   < (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
+	  ? crtl->max_used_stack_slot_alignment
+	  : crtl->stack_alignment_needed));
 
   if (crtl->stack_realign_finalized)
 {
diff --git a/gcc/testsuite/gcc.target/i386/pr68986-1.c b/gcc/testsuite/gcc.target/i386/pr68986-1.c
new file mode 100644
index 000..998f34f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr68986-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-require-effective-target fpic } */
+/* { dg-options "-fPIC -mno-accumulate-outgoing-args -mpreferred-stack-boundary=5 -mincoming-stack-boundary=4" } */
+
+extern __thread int msg

Re: [PATCH 2/9] S/390: Add disabled insn attribute

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:
> On 02/01/2016 02:45 PM, Ulrich Weigand wrote:
> > So I'm wondering what the difference is between this and simply
> > overriding the default implementation of "enabled" per-insn?
> > 
> > So instead of adding
> > (set_attr "disabled" "0,1")])
> > to an insn, you might simply add instead:
> > (set_attr "enabled" "*,0")])
> Not sure but wouldn't this mean that the value of the enabled attribute would 
> then depend on the
> order of the set_attr for "enabled" and "cpu_facility" since one is defined 
> on the value of the other?!

I don't think the order matches; genattrtab seems to first read in all .md
files and only then optimize and then emit definitions of all the get_attr_...
functions.  The eq_attr uses in the "enabled" definition would only be
evaluated at that point.

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 2/9] S/390: Add disabled insn attribute

2016-02-01 Thread Andreas Krebbel
On 02/01/2016 02:45 PM, Ulrich Weigand wrote:
> Andreas Krebbel wrote:
> 
>> With this patch a `disabled' attribute is defined which can be used to
>> explicitly disable an alternative.  This comes handy when defining the
>> substitutions later and while adding it anyway I've used it for the
>> existing cases as well.
> 
>> +; Insn attribute with boolean value
>> +;  1 to disable an insn alternative
>> +;  0 or * for an active insn alternative
>> +;  an active alternative can still be disabled due to lacking cpu
>> +;  facility support
>> +(define_attr "disabled" "" (const_int 0))
>> +
>>  (define_attr "enabled" ""
>> -  (cond [(eq_attr "cpu_facility" "standard")
>> +  (cond [(eq_attr "disabled" "1")
>> + (const_int 0)
>> +
>> + (eq_attr "cpu_facility" "standard")
>>   (const_int 1)
>>  
>>   (and (eq_attr "cpu_facility" "ieee")
> 
> So I'm wondering what the difference is between this and simply
> overriding the default implementation of "enabled" per-insn?
> 
> So instead of adding
> (set_attr "disabled" "0,1")])
> to an insn, you might simply add instead:
> (set_attr "enabled" "*,0")])
Not sure but wouldn't this mean that the value of the enabled attribute would 
then depend on the
order of the set_attr for "enabled" and "cpu_facility" since one is defined on 
the value of the other?!

-Andreas-



Re: [PATCH] [ARM] PR68532: Fix VUZP and VZIP recognition on big endian

2016-02-01 Thread Kyrill Tkachov


On 16/12/15 17:44, Charles Baylis wrote:

Hi


Hi Charles,
sorry for the delay on this one.



This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP
and VZIP on armeb-* targets. It also fixes the definition of the
vuzpq_* and vzipq_*  NEON intrinsics which use incorrect lane
specifiers in the use of __builtin_shuffle().

The problem with arm_neon.h can be seen by temporarily altering
arm_expand_vec_perm_const_1() to unconditionally return false. If this
is done, the vuzp/vzip tests in the advsimd execution tests will fail.
With these patches, this is no longer the case.

The problem is caused by the weird mapping of architectural lane order
to gcc lane order in big endian. For 64 bit vectors, the order is
simply reversed, but 128 bit vectors are treated as 2 64 bit vectors
where the lane ordering is reversed inside those. This is due to the
memory ordering defined by the EABI. There is a large comment in
gcc/config/arm.c above output_move_neon() which describes this in more
detail.

The arm_evpc_neon_vuzp() and  arm_evpc_neon_vzip() functions do not
allow for this lane order, instead treating the lane order as simply
reversed in 128 bit vectors. These patches fix this. I have included a
test case for vuzp, but I don't have one for vzip.

Tested with make check on arm-unknown-linux-gnueabihf with no regressions
Tested with make check on armeb-unknown-linux-gnueabihf. Some
gcc.dg/vect tests fail due to no longer being vectorized. I haven't
analysed these, but it is expected since vuzp is not usable for the
shuffle patterns for which it was previously used. There are also a
few new PASSes.


Indeed I see the new passes on armeb-none-eabi.
However, the new FAILs that I see are ICEs, not just vectorisation failures,
so they need to be looked at.

The ICEs that I see are:
FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (internal compiler error)

The backtrace looks like:
0x81c9eb expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
expand_modifier)
$SRC/gcc/expr.c:9239
0x8044cc expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
$SRC/gcc/expr.c:9562
0x80a851 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, 
rtx_def**, bool)
$SRC/gcc/expr.c:7947
0x811bf0 store_expr_with_bounds(tree_node*, rtx_def*, int, bool, bool, 
tree_node*)
$SRC/gcc/expr.c:5406
0x814a7f expand_assignment(tree_node*, tree_node*, bool)
$SRC/gcc/expr.c:5175
0x709da5 expand_gimple_stmt_1
$SRC/gcc/cfgexpand.c:3606
0x709da5 expand_gimple_stmt
$SRC/gcc/cfgexpand.c:3702
0x70c3a6 expand_gimple_basic_block
$SRC/gcc/cfgexpand.c:5708
0x70fd58 execute
$SRC/gcc/cfgexpand.c:6323
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See  for instructions.

Seems that the code in expr.c asserts that expand_vec_perm returned a non-NULL 
result.

I'll look at the patches in more detail, but in the meantime I notice that 
there are some
GNU style issues that should be resolved, like starting comments with a capital 
letter,
two spaces after full stop, two spaces between full stop and close comment, as 
well as some
lines over 80 characters. The check_GNU_style.sh script in the contrib/ 
directory can help
catch some (if not all) of these.

Also, can you please send any follow-up versions of the two patches as separate 
emails,
so that we can more easily keep track of what's comment goes to which patch.

Thanks,
Kyrill



Patch 1 (vuzp):

gcc/ChangeLog:

2015-12-15  Charles Baylis  

 * config/arm/arm.c (arm_neon_endian_lane_map): New function.
 (arm_neon_vector_pair_endian_lane_map): New function.
 (arm_evpc_neon_vuzp): Allow for big endian lane order.
 * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
 endian.
 (vuzpq_s16): Likewise.
 (vuzpq_s32): Likewise.
 (vuzpq_f32): Likewise.
 (vuzpq_u8): Likewise.
 (vuzpq_u16): Likewise.
 (vuzpq_u32): Likewise.
 (vuzpq_p8): Likewise.
 (vuzpq_p16): Likewise.

gcc/testsuite/ChangeLog:

2015-12-15  Charles Baylis  

 * gcc.c-torture/execute/pr68532.c: New test.


Patch 2 (vzip)

gcc/ChangeLog:

2015-12-15  Charles Baylis  

 * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
 order.
 * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
 endian.
 (vzipq_s16): Likewise.
 (vzipq_s32): Likewise.
 (vzipq_f32): Likewise.
 (vzipq_u8): Likewise.
 (vzipq_u16): Likewise.
 (vzipq_u32): Likewise.
 (vzipq_p8): Likewise.
 (vzipq_p16): Likewise.




Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:
> > Hmm.  In theory reload should always respect both constraints and 
> > predicates.
> > But I guess it can't hurt to disable the alternative just to be safe; it is
> > indeed an uncommon case to have the constraint accept more than the 
> > predicate.
> > (Also a PLUS with two CONST_INT operands is not canonical RTL anyway.)
> 
> Does this mean you would prefer to convert the insn condition into a 
> predicate and change it with
> subst_attr or should I leave it in the insn condition?

If we can transform the operand predicate, that would certainly be the best.
I.e. where the original pattern has:
  (match_operand:SI 2 "nonmemory_operand" "a,n")
have the replaced pattern use instead:
  (plus:SI (match_operand:SI 2 "register_operand" "a")
   (match_operand 3 "const_int_operand" "n"))

if that is possible via the subst mechanism ...

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Andreas Krebbel
On 02/01/2016 05:58 PM, Ulrich Weigand wrote:
> Andreas Krebbel wrote:
>> On 02/01/2016 02:30 PM, Ulrich Weigand wrote:
>>> This seems to add just a single alternative.  Is that OK if it gets
>>> substituted into a pattern that used multiple alternatives otherwise?
>> Yes.  This is supposed to work.  The new constraint will be duplicated until 
>> it matches the number
>> of alternatives in the original insn.
> 
> OK, that makes sense.   But now I'm wondering about this bit of
> the ashiftrt patch:
> 
> +; FIXME: The number of alternatives is doubled here to match the fix
> +; number of 4 in the subst pattern for the (clobber (match_scratch...
> +; The right fix should be to support match_scratch in the output
> +; pattern of a define_subst.
> 
> +(define_subst "cconly_subst"
> +  [(set (match_operand:DSI 0 ""   "")
> +(match_operand:DSI 1 "" ""))
> +   (clobber (reg:CC CC_REGNUM))]
> +  "s390_match_ccmode(insn, CCSmode)"
> +  [(set (reg CC_REGNUM)
> + (compare (match_dup 1) (const_int 0)))
> +   (clobber (match_scratch:DSI 0 "=d,d,d,d"))])
> 
> I guess this is where the number 4 comes from?  But if the alternatives
> are duplicated, shouldn't it then work to simply use:
>(clobber (match_scratch:DSI 0 "=d"))])

As the comment tries (and fails) to say the duplication of constraints 
currently does not seem to
work for match_scratch - only for match_operand.  This is just an ugly way to 
work around this for
now. The real fix is to enable match_scratch as well.

 +; In the subst pattern we have to disable the alternative where op2 is
 +; already a constant.  This attribute is supposed to be used in the
 +; "disabled" insn attribute to achieve this.
 +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1")
>>>
>>> Is this necessary given that the subst adds a REG_P (operands[2])
>>> condition?
>> I wasn't sure about this.  How does reload/lra behave when the constraints 
>> allow more than the insn
>> condition? I expected that reload might remember that one of the 
>> alternatives might take a constant
>> as well and then tries as part of reload inheritance and such things to 
>> actually exploit this?! Just
>> disabling the alternative in order to prevent reload from recording stuff 
>> for that alternative
>> appeared safer to me.
>>
>> On the other hand it is not ideal to have an operands[x] check in the insn 
>> condition in the first
>> place. Especially when using define_substs where the operands get renumbered 
>> this might add hard to
>> find bugs. Perhaps I should try replacing the insn predicate with a 
>> subst_attr instead? Is having a
>> constraint which allows more than a predicate any better than having the 
>> same with the insn condition?
> 
> Hmm.  In theory reload should always respect both constraints and predicates.
> But I guess it can't hurt to disable the alternative just to be safe; it is
> indeed an uncommon case to have the constraint accept more than the predicate.
> (Also a PLUS with two CONST_INT operands is not canonical RTL anyway.)

Does this mean you would prefer to convert the insn condition into a predicate 
and change it with
subst_attr or should I leave it in the insn condition?

-Andreas-

> 
> Bye,
> Ulrich
> 



Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:
> On 02/01/2016 02:30 PM, Ulrich Weigand wrote:
> > This seems to add just a single alternative.  Is that OK if it gets
> > substituted into a pattern that used multiple alternatives otherwise?
> Yes.  This is supposed to work.  The new constraint will be duplicated until 
> it matches the number
> of alternatives in the original insn.

OK, that makes sense.   But now I'm wondering about this bit of
the ashiftrt patch:

+; FIXME: The number of alternatives is doubled here to match the fix
+; number of 4 in the subst pattern for the (clobber (match_scratch...
+; The right fix should be to support match_scratch in the output
+; pattern of a define_subst.

+(define_subst "cconly_subst"
+  [(set (match_operand:DSI 0 ""   "")
+(match_operand:DSI 1 "" ""))
+   (clobber (reg:CC CC_REGNUM))]
+  "s390_match_ccmode(insn, CCSmode)"
+  [(set (reg CC_REGNUM)
+   (compare (match_dup 1) (const_int 0)))
+   (clobber (match_scratch:DSI 0 "=d,d,d,d"))])

I guess this is where the number 4 comes from?  But if the alternatives
are duplicated, shouldn't it then work to simply use:
   (clobber (match_scratch:DSI 0 "=d"))])


> >> +; In the subst pattern we have to disable the alternative where op2 is
> >> +; already a constant.  This attribute is supposed to be used in the
> >> +; "disabled" insn attribute to achieve this.
> >> +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1")
> > 
> > Is this necessary given that the subst adds a REG_P (operands[2])
> > condition?
> I wasn't sure about this.  How does reload/lra behave when the constraints 
> allow more than the insn
> condition? I expected that reload might remember that one of the alternatives 
> might take a constant
> as well and then tries as part of reload inheritance and such things to 
> actually exploit this?! Just
> disabling the alternative in order to prevent reload from recording stuff for 
> that alternative
> appeared safer to me.
> 
> On the other hand it is not ideal to have an operands[x] check in the insn 
> condition in the first
> place. Especially when using define_substs where the operands get renumbered 
> this might add hard to
> find bugs. Perhaps I should try replacing the insn predicate with a 
> subst_attr instead? Is having a
> constraint which allows more than a predicate any better than having the same 
> with the insn condition?

Hmm.  In theory reload should always respect both constraints and predicates.
But I guess it can't hurt to disable the alternative just to be safe; it is
indeed an uncommon case to have the constraint accept more than the predicate.
(Also a PLUS with two CONST_INT operands is not canonical RTL anyway.)

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH] nvptx: do not use alternative spelling of unsigned comparisons

2016-02-01 Thread Alexander Monakov
Hello!

The following patch fixes subtle breakage on NVPTX when unsigned comparisons
would be sometimes mistranslated by PTX JIT.  The new test demonstrates that
by using the %nctaid.x (number of blocks) register, but a comparison against
a value in constant memory can also trigger that (I could easily reproduce
that with NVCC, but not with GCC yet).  OK for trunk?

When emitting unsigned comparisons, nvptx backend uses alternative spellings
lo/ls/hs/hi for relational operators, instead of the usual lt/le/ge/gt,
respectively.  There's no nominal difference, but in practice ptxas and PTX
JIT can mistranslate the former in some circumstances, e.g. when the first
comparison operand is a load from constant memory (perhaps implicitely like
%nctaid.x).

This was reported to NVIDIA, but the workaround on GCC side is to simply
use the same common spellings as for signed comparisons, so let's do that.

Alexander

gcc/ChangeLog:
* config/nvptx/nvptx.c (nvptx_print_operand): Treat LEU, GEU, LTU, GTU
like LE, GE, LT, GT when emitting relational operator.

gcc/testsuite/ChangeLog:
* gcc.target/nvptx/unsigned-cmp.c: New test.
---

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 27c797b..efd0f8e 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2161,28 +2161,20 @@ nvptx_print_operand (FILE *file, rtx x, int code)
fputs (".ne", file);
  break;
case LE:
+   case LEU:
  fputs (".le", file);
  break;
case GE:
+   case GEU:
  fputs (".ge", file);
  break;
case LT:
+   case LTU:
  fputs (".lt", file);
  break;
case GT:
- fputs (".gt", file);
- break;
-   case LEU:
- fputs (".ls", file);
- break;
-   case GEU:
- fputs (".hs", file);
- break;
-   case LTU:
- fputs (".lo", file);
- break;
case GTU:
- fputs (".hi", file);
+ fputs (".gt", file);
  break;
case LTGT:
  fputs (".ne", file);
diff --git a/gcc/testsuite/gcc.target/nvptx/unsigned-cmp.c 
b/gcc/testsuite/gcc.target/nvptx/unsigned-cmp.c
new file mode 100644
index 000..a0cf5c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/unsigned-cmp.c
@@ -0,0 +1,50 @@
+/* { dg-options "-O2" } */
+/* { dg-do run } */
+
+/* nvptx backend used to emit lo/ls/hs/hi suffixes on unsigned comparison
+   insns instead of the more common lt/le/ge/gt, but ptxas and PTX JIT
+   miscompile 'ls' and 'hi' under some circumstances, such as when the first
+   source operand expands to a constant memory load, as demonstrated below.
+   Reported as NVIDIA bug ID 1725195 (tracker is not public).  */
+
+/* Define this to observe PTX translation breakage.  */
+//#define EMIT_BROKEN_ASM 1
+
+/* Or define this to get expected codegen.  */
+//#define EMIT_WORKING_ASM 1
+
+static __attribute__((noinline,noclone)) int ls(unsigned a)
+{
+unsigned v;
+/* %nctaid.x is always 1 in gcc testing.  */
+asm ("mov.u32 %0, %%nctaid.x;" : "=r"(v));
+#if defined(EMIT_BROKEN_ASM)
+asm ("set.u32.ls.u32 %0, %1, %0;" : "+r"(a) : "r"(v));
+#elif defined(EMIT_WORKING_ASM)
+asm ("set.u32.le.u32 %0, %1, %0;" : "+r"(a) : "r"(v));
+#else
+a = v <= a ? -1 : 0;
+#endif
+return a;
+}
+static __attribute__((noinline,noclone)) int hi(unsigned a)
+{
+unsigned v;
+asm ("mov.u32 %0, %%nctaid.x;" : "=r"(v));
+#if defined(EMIT_BROKEN_ASM)
+asm ("set.u32.hi.u32 %0, %1, %0;" : "+r"(a) : "r"(v));
+#elif defined(EMIT_WORKING_ASM)
+asm ("set.u32.gt.u32 %0, %1, %0;" : "+r"(a) : "r"(v));
+#else
+a = v > a ? -1 : 0;
+#endif
+return a;
+}
+int main()
+{
+int i;
+for (i=0; i<3; i++)
+if (ls(i) != -(1 <= i) || hi(i) != -(1 > i))
+__builtin_abort();
+return 0;
+}


[PATCH] libstdc++: S/390: Add missing baseline_symbols.txt for s390x/-m31.

2016-02-01 Thread Dominik Vogt
The attached patch copies the existing 
libstdc++-v3/config/abi/post/s390-linux-gnu/baseline_symbols.txt
to .../s390x-linux-gnu/32/baseline_symbols.txt.  This fixes the
abi test failure on s390x with -m31.

The patch is gzip'ed because it's large.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
libstdc++-v3/ChangeLog

* config/abi/post/s390x-linux-gnu/32/baseline_symbols.txt (FUNC): New
file.


0001-libstdc-S-390-Add-missing-baseline_symbols.txt-for-s.patch.gz
Description: Binary data


Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Andreas Krebbel
On 02/01/2016 02:30 PM, Ulrich Weigand wrote:
> Andreas Krebbel wrote:
> 
>> +; Accept single register and immediate operands usable as shift
>> +; counts.
>> +(define_predicate "addrreg_or_constint_operand"
>> +  (match_code "reg, subreg, const_int")
> 
> I'm wondering whether this is even necessary.
> 
>> +{
>> +  if (GET_MODE (op) != VOIDmode
>> +  && GET_MODE_CLASS (GET_MODE (op)) != MODE_INT)
>> +return false;
> 
> The predicate always seems to be used with a mode, so any extra
> mode checks here seem redundant.
> 
>> +  while (op && GET_CODE (op) == SUBREG)
>> +op = SUBREG_REG (op);
>> +
>> +  if (REG_P (op)
>> +  && (REGNO (op) >= FIRST_PSEUDO_REGISTER || ADDR_REG_P (op)))
>> +return true;
>> +
>> +  if (CONST_INT_P (op))
>> +return true;
> 
> This looks somewhat copied from shift_count_or_setmem_operand, where
> we have a comment:
>   /* Don't allow any non-base hard registers.  Doing so without
>  confusing reload and/or regrename would be tricky, and doesn't
>  buy us much anyway.  */
> 
> With the new set up, I don't believe there is any issue with confusing
> reload -- it's no more complex than any other pattern now.  So it
> should be fine to just accept any register.
> 
> In fact, I'm starting to wonder whether it wouldn't be just fine to
> simply using nonmemory_operand instead of this special predicate
> (the only issue might be that we could get CONST_WIDE_INT, but it
> should be simple to handle those).
> 
>> +(define_expand "rotl3"
>> +  [(set (match_operand:GPR 0 "register_operand" "")
>> +(rotate:GPR (match_operand:GPR 1 "register_operand" "")
>> +(match_operand:SI 2 "shift_count_or_setmem_operand" "")))]
> 
> Similarly, I think the expander no longer needs to use the special predicate
> shift_count_or_setmem_operandm but could just use nonmemory_operand.
> [ This will reject the PLUS that shift_count_or_setmem_operand accepted,
> but I don't believe you ever *could* get the PLUS in the expander anyway.
> It will only be moved in by combine, so as long as the insn accepts it,
> everything should be good.  ]
Ok. I'll try to get rid of the special predicates.

>> +(define_insn "*rotl3"
>> +  [(set (match_operand:GPR 0 "register_operand"   
>> "=d,d")
>> +(rotate:GPR (match_operand:GPR 1 "register_operand""d,d")
>> +(match_operand:SI  2 "addrreg_or_constint_operand" "a,n")))]
>> +  "TARGET_CPU_ZARCH"
>> +  "@
>> +   rll\t%0,%1,(%2)
>> +   rll\t%0,%1,%Y2"
> 
> So it looks like %Y is now always used for just a constant.  Maybe the
> implementation of %Y should be adjusted (once the patch series is complete)?
> Also, if we use just nonmemory_operand, %Y should be updated to accept
> CONST_WIDE_INT just like e.g. %x does.
Ok.

> 
>> +; This expands an register/immediate operand to a register+immediate
>> +; operand to draw advantage of the address style operand format
>> +; providing a addition for free.
>> +(define_subst "addr_style_op_subst"
>> +  [(set (match_operand:DSI 0 "" "")
>> +(SUBST:DSI (match_operand:DSI 1 "" "")
>> +   (match_operand:SI 2 "" "")))]
>> +  "REG_P (operands[2])"
>> +  [(set (match_dup 0)
>> +(SUBST:DSI (match_dup 1)
>> +   (plus:SI (match_dup 2)
>> +(match_operand 3 "const_int_operand" "n"])
> 
> This seems to add just a single alternative.  Is that OK if it gets
> substituted into a pattern that used multiple alternatives otherwise?
Yes.  This is supposed to work.  The new constraint will be duplicated until it 
matches the number
of alternatives in the original insn.

>> +; In the subst pattern we have to disable the alternative where op2 is
>> +; already a constant.  This attribute is supposed to be used in the
>> +; "disabled" insn attribute to achieve this.
>> +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1")
> 
> Is this necessary given that the subst adds a REG_P (operands[2])
> condition?
I wasn't sure about this.  How does reload/lra behave when the constraints 
allow more than the insn
condition? I expected that reload might remember that one of the alternatives 
might take a constant
as well and then tries as part of reload inheritance and such things to 
actually exploit this?! Just
disabling the alternative in order to prevent reload from recording stuff for 
that alternative
appeared safer to me.

On the other hand it is not ideal to have an operands[x] check in the insn 
condition in the first
place. Especially when using define_substs where the operands get renumbered 
this might add hard to
find bugs. Perhaps I should try replacing the insn predicate with a subst_attr 
instead? Is having a
constraint which allows more than a predicate any better than having the same 
with the insn condition?

-Andreas-



Re: [committed] Fix declaration of vsscanf on hpux

2016-02-01 Thread Bruce Korb

On 01/31/16 17:03, John David Anglin wrote:

The attached hack fixes missing const from the first argument of the 
declararation for vsscanf on hpux.


Approved for all active dev branches.


Re: Default compute dimensions

2016-02-01 Thread Nathan Sidwell

On 02/01/16 10:32, Jakub Jelinek wrote:

On Mon, Feb 01, 2016 at 09:15:05AM -0500, Nathan Sidwell wrote:

On 01/29/16 10:18, Jakub Jelinek wrote:

On Thu, Jan 28, 2016 at 10:38:51AM -0500, Nathan Sidwell wrote:

This patch adds default compute dimension handling.  Users rarely specify
compute dimensions, expecting the toolchain to DTRT.  More savvy users would
like to specify global defaults.  This patch permits both.


Isn't it better to be able to override the defaults on the library side?
I mean, when when somebody is compiling the code, often he doesn't know the
exact properties of the hw it will be run on, if he does, I think it is
better to specify them explicitly in the code.


I realized that it's actually not possible to markup the code in this way,
as an 'intermediate' user.  One can exercise complete control by saying
exactly the axis/axes over which a loop is to be partitioned, and then
specify the geometry.  But one cannot use the 'auto' feature and have the
compiler choose an axis without also relying on the compiler choosing a size
for that axis.  As I already said,  IMHO being able to specify a
compile-time size is useful.


Ok, I won't fight against it.  But please make sure it can be overridden on
the library side too.


Absolutely, thanks!

nathan



Re: [PATCH] Fix use of declare'd vars by routine procedures.

2016-02-01 Thread Jakub Jelinek
On Mon, Feb 01, 2016 at 10:00:51AM -0600, James Norris wrote:
> +2016-01-XX  James Norris  

It is February now ;)

> @@ -8223,7 +8247,7 @@ gimplify_oacc_declare (tree *expr_p, gimple_seq *pre_p)
> if (oacc_declare_returns == NULL)
>   oacc_declare_returns = new hash_map;
>  
> -   oacc_declare_returns->put (decl, c);
> + oacc_declare_returns->put (decl, c);
>   }
>   }

This snippet looks wrong, I bet -Wmisleading-indentation will flag that (or 
should).

Ok for trunk with the bogus snippet removed.

Jakub


Re: [PATCH] Fix use of declare'd vars by routine procedures.

2016-02-01 Thread James Norris

Hi!

On 01/29/2016 02:31 AM, Jakub Jelinek wrote:

On Thu, Jan 28, 2016 at 12:26:38PM -0600, James Norris wrote:

I think the attached change is what you had in mind with
regard to doing the check at gimplification time.


Nope, this is still a wrong location for that.
If you look at the next line after the block you've added, you'll see
if (gimplify_omp_ctxp && omp_notice_variable (gimplify_omp_ctxp, decl, true))
And that function fairly early calls is_global_var (decl).  So you know
already gimplify_omp_ctxp && is_global_var (decl), just put the rest into
that block.


When said you said "rest into that block", I assumed you meant
the block in omp_notice_variable () that has the function call
to is_global_var () as the if condition.


The TREE_CODE (decl) == VAR_DECL check is VAR_P (decl).


Fixed.


What do you want to achieve with


+  && ((TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
+   || (!TREE_STATIC (decl) && DECL_EXTERNAL (decl


?  is_global_var already guarantees you that it is either TREE_STATIC
or DECL_EXTERNAL, why is that not good enough?


As you pointed out above, the check is not necessary.




[snip snip]

+  /* Validate variable for use within routine function.  */
+  if (gimplify_omp_ctxp && gimplify_omp_ctxp->region_type == ORT_TARGET
+  && get_oacc_fn_attrib (current_function_decl)


If you only care about the implicit target region of acc routine, I think
you also want to check that gimplify_omp_ctxp->outer_context == NULL.


Check added.




[snip snip]


And I'm really confused by this error message.  If you are complaining that
the variable is not listed in acc declare clauses, why don't you say that?
What does the error have to do with its storage class?
Also, splitting one error into two is weird, usually there would be one
error message and perhaps inform after it.


Error messages re-written to make better sense.

Thanks!

Jim

= ChangeLog entries

gcc/
* gimplify.c (omp_notice_variable): Add variable usage check.
gcc/testsuite/
* c-c++-common/goacc/routine-5.c: Add tests.



diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9f99842..41cd7a7 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2016-01-XX  James Norris  
+
+	* gimplify.c (omp_notice_variable): Add usage check.
+
 2016-01-29  Jakub Jelinek  
 
 	PR target/69551
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 32bc1fd..5833605 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6087,9 +6087,9 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
   if (ctx->region_type == ORT_NONE)
 return lang_hooks.decls.omp_disregard_value_expr (decl, false);
 
-  /* Threadprivate variables are predetermined.  */
   if (is_global_var (decl))
 {
+  /* Threadprivate variables are predetermined.  */
   if (DECL_THREAD_LOCAL_P (decl))
 	return omp_notice_threadprivate_variable (ctx, decl, NULL_TREE);
 
@@ -6100,6 +6100,30 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code)
 	  if (value && DECL_P (value) && DECL_THREAD_LOCAL_P (value))
 	return omp_notice_threadprivate_variable (ctx, decl, value);
 	}
+
+  if (gimplify_omp_ctxp->outer_context == NULL
+	  && VAR_P (decl)
+	  && get_oacc_fn_attrib (current_function_decl))
+	{
+	  location_t loc = DECL_SOURCE_LOCATION (decl);
+
+	  if (lookup_attribute ("omp declare target link",
+DECL_ATTRIBUTES (decl)))
+	{
+	  error_at (loc,
+			"%qE with % clause used in % function",
+			DECL_NAME (decl));
+	  return false;
+	}
+	  else if (!lookup_attribute ("omp declare target",
+  DECL_ATTRIBUTES (decl)))
+	{
+	  error_at (loc,
+			"%qE requires a % directive for use "
+			"in a % function", DECL_NAME (decl));
+	  return false;
+	}
+	}
 }
 
   n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
@@ -8223,7 +8247,7 @@ gimplify_oacc_declare (tree *expr_p, gimple_seq *pre_p)
 	  if (oacc_declare_returns == NULL)
 		oacc_declare_returns = new hash_map;
 
-	  oacc_declare_returns->put (decl, c);
+		oacc_declare_returns->put (decl, c);
 	}
 	}
 
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 150ebc8..61003c2 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2016-01-XX  James Norris  
+
+	* c-c++-common/goacc/routine-5.c: Add tests.
+
 2016-01-29  Jakub Jelinek  
 
 	PR target/69551
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-5.c b/gcc/testsuite/c-c++-common/goacc/routine-5.c
index ccda097..c34838f 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-5.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-5.c
@@ -45,3 +45,97 @@ using namespace g;
 #pragma acc routine (a) /* { dg-error "does not refer to" } */
   
 #pragma acc routine (c) /* { dg-error "does not refer to" } */
+
+int vb1;		/* { dg-error "directive for use" } */
+extern int vb2;		/* { dg-error "directive for use" } */
+static int vb3;		/* { dg-error "directive 

Re: [PATCH] [ARM] PR68532: Fix VUZP and VZIP recognition on big endian

2016-02-01 Thread Charles Baylis
ping^2

On 13 January 2016 at 13:37, Charles Baylis  wrote:
> ping
>
> On 16 December 2015 at 17:44, Charles Baylis  
> wrote:
>> Hi
>>
>> This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP
>> and VZIP on armeb-* targets. It also fixes the definition of the
>> vuzpq_* and vzipq_*  NEON intrinsics which use incorrect lane
>> specifiers in the use of __builtin_shuffle().
>>
>> The problem with arm_neon.h can be seen by temporarily altering
>> arm_expand_vec_perm_const_1() to unconditionally return false. If this
>> is done, the vuzp/vzip tests in the advsimd execution tests will fail.
>> With these patches, this is no longer the case.
>>
>> The problem is caused by the weird mapping of architectural lane order
>> to gcc lane order in big endian. For 64 bit vectors, the order is
>> simply reversed, but 128 bit vectors are treated as 2 64 bit vectors
>> where the lane ordering is reversed inside those. This is due to the
>> memory ordering defined by the EABI. There is a large comment in
>> gcc/config/arm.c above output_move_neon() which describes this in more
>> detail.
>>
>> The arm_evpc_neon_vuzp() and  arm_evpc_neon_vzip() functions do not
>> allow for this lane order, instead treating the lane order as simply
>> reversed in 128 bit vectors. These patches fix this. I have included a
>> test case for vuzp, but I don't have one for vzip.
>>
>> Tested with make check on arm-unknown-linux-gnueabihf with no regressions
>> Tested with make check on armeb-unknown-linux-gnueabihf. Some
>> gcc.dg/vect tests fail due to no longer being vectorized. I haven't
>> analysed these, but it is expected since vuzp is not usable for the
>> shuffle patterns for which it was previously used. There are also a
>> few new PASSes.
>>
>>
>> Patch 1 (vuzp):
>>
>> gcc/ChangeLog:
>>
>> 2015-12-15  Charles Baylis  
>>
>> * config/arm/arm.c (arm_neon_endian_lane_map): New function.
>> (arm_neon_vector_pair_endian_lane_map): New function.
>> (arm_evpc_neon_vuzp): Allow for big endian lane order.
>> * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
>> endian.
>> (vuzpq_s16): Likewise.
>> (vuzpq_s32): Likewise.
>> (vuzpq_f32): Likewise.
>> (vuzpq_u8): Likewise.
>> (vuzpq_u16): Likewise.
>> (vuzpq_u32): Likewise.
>> (vuzpq_p8): Likewise.
>> (vuzpq_p16): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2015-12-15  Charles Baylis  
>>
>> * gcc.c-torture/execute/pr68532.c: New test.
>>
>>
>> Patch 2 (vzip)
>>
>> gcc/ChangeLog:
>>
>> 2015-12-15  Charles Baylis  
>>
>> * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
>> order.
>> * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
>> endian.
>> (vzipq_s16): Likewise.
>> (vzipq_s32): Likewise.
>> (vzipq_f32): Likewise.
>> (vzipq_u8): Likewise.
>> (vzipq_u16): Likewise.
>> (vzipq_u32): Likewise.
>> (vzipq_p8): Likewise.
>> (vzipq_p16): Likewise.


[PATCH] Fix PR69574

2016-02-01 Thread Richard Biener

I've bootstrapped and tested the followgn safe patch on 
x86_64-unknown-linux-gnu.

Applied.

Richard.

2016-02-01  Richard Biener  

PR tree-optimization/69574
* tree-chrec.c (hide_evolution_in_other_loops_than_loop): Instead
of asserting return chrec_dont_know.

* gcc.dg/torture/pr69574.c: New testcase.

Index: gcc/tree-chrec.c
===
*** gcc/tree-chrec.c(revision 233033)
--- gcc/tree-chrec.c(working copy)
*** hide_evolution_in_other_loops_than_loop
*** 728,739 
/* There is no evolution in this loop.  */
return initial_condition (chrec);
  
else
!   {
! gcc_assert (flow_loop_nested_p (loop, chloop));
! return hide_evolution_in_other_loops_than_loop (CHREC_LEFT (chrec),
! loop_num);
!   }
  
  default:
return chrec;
--- 728,739 
/* There is no evolution in this loop.  */
return initial_condition (chrec);
  
+   else if (flow_loop_nested_p (loop, chloop))
+   return hide_evolution_in_other_loops_than_loop (CHREC_LEFT (chrec),
+   loop_num);
+ 
else
!   return chrec_dont_know;
  
  default:
return chrec;
Index: gcc/testsuite/gcc.dg/torture/pr69574.c
===
*** gcc/testsuite/gcc.dg/torture/pr69574.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr69574.c  (working copy)
***
*** 0 
--- 1,15 
+ /* { dg-do compile } */
+ 
+ typedef unsigned mytype;
+ 
+ struct S {
+ mytype *pu;
+ };
+ 
+ mytype f(struct S *e)
+ {
+   mytype x;
+   if(&x != e->pu)
+ __builtin_memcpy(&x, e->pu, sizeof(unsigned));
+   return x;
+ }



[PATCH] Fix PR69556

2016-02-01 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-02-01  Richard Biener  

PR middle-end/69556
* match.pd: Guard (C1/X)*C2 -> (C1*C2)/X with single_use.

* gcc.dg/tree-ssa/recip-8.c: New testcase.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 233033)
+++ gcc/match.pd(working copy)
@@ -445,8 +445,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
- (mult (rdiv:s REAL_CST@0 @1) REAL_CST@2)
-  (if (flag_associative_math)
+ (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
+  (if (flag_associative_math
+   && single_use (@3))
(with
 { tree tem = const_binop (MULT_EXPR, type, @0, @2); }
 (if (tem)
Index: gcc/testsuite/gcc.dg/tree-ssa/recip-8.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/recip-8.c (revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/recip-8.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
+
+double bar (double, double, double, double, double);
+
+double
+foo (double a)
+{
+  return bar (1.0/a, 2.0/a, 4.0/a, 8.0/a, 16.0/a);
+}
+
+/* { dg-final { scan-tree-dump-times "/" 1 "optimized"  } } */


Re: Default compute dimensions

2016-02-01 Thread Jakub Jelinek
On Mon, Feb 01, 2016 at 09:15:05AM -0500, Nathan Sidwell wrote:
> On 01/29/16 10:18, Jakub Jelinek wrote:
> >On Thu, Jan 28, 2016 at 10:38:51AM -0500, Nathan Sidwell wrote:
> >>This patch adds default compute dimension handling.  Users rarely specify
> >>compute dimensions, expecting the toolchain to DTRT.  More savvy users would
> >>like to specify global defaults.  This patch permits both.
> >
> >Isn't it better to be able to override the defaults on the library side?
> >I mean, when when somebody is compiling the code, often he doesn't know the
> >exact properties of the hw it will be run on, if he does, I think it is
> >better to specify them explicitly in the code.
> 
> I realized that it's actually not possible to markup the code in this way,
> as an 'intermediate' user.  One can exercise complete control by saying
> exactly the axis/axes over which a loop is to be partitioned, and then
> specify the geometry.  But one cannot use the 'auto' feature and have the
> compiler choose an axis without also relying on the compiler choosing a size
> for that axis.  As I already said,  IMHO being able to specify a
> compile-time size is useful.

Ok, I won't fight against it.  But please make sure it can be overridden on
the library side too.

Jakub


Re: Default compute dimensions

2016-02-01 Thread Nathan Sidwell

On 01/29/16 10:18, Jakub Jelinek wrote:

On Thu, Jan 28, 2016 at 10:38:51AM -0500, Nathan Sidwell wrote:

This patch adds default compute dimension handling.  Users rarely specify
compute dimensions, expecting the toolchain to DTRT.  More savvy users would
like to specify global defaults.  This patch permits both.


Isn't it better to be able to override the defaults on the library side?
I mean, when when somebody is compiling the code, often he doesn't know the
exact properties of the hw it will be run on, if he does, I think it is
better to specify them explicitly in the code.


I realized that it's actually not possible to markup the code in this way, as an 
'intermediate' user.  One can exercise complete control by saying exactly the 
axis/axes over which a loop is to be partitioned, and then specify the geometry. 
 But one cannot use the 'auto' feature and have the compiler choose an axis 
without also relying on the compiler choosing a size for that axis.  As I 
already said,  IMHO being able to specify a compile-time size is useful.



nathan



Re: [AArch64] Remove AARCH64_EXTRA_TUNE_RECIP_SQRT from Cortex-A57 tuning

2016-02-01 Thread James Greenhalgh
On Mon, Jan 25, 2016 at 11:20:46AM +, James Greenhalgh wrote:
> On Mon, Jan 11, 2016 at 12:04:43PM +, James Greenhalgh wrote:
> > 
> > Hi,
> > 
> > I've seen a couple of large performance issues caused by expanding
> > the high-precision reciprocal square root for Cortex-A57, so I'd like
> > to turn it off by default.
> > 
> > This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from
> > Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for
> > some private microbenchmark kernels which stress the divide/sqrt/multiply
> > units. It therefore seems to me to be the correct choice to make across
> > a number of workloads.
> > 
> > Bootstrapped and tested on aarch64-none-linux-gnu with no issues.
> > 
> > OK?
> 
> *Ping*

*pingx2*

Thanks,
James

> > ---
> > 2015-12-11  James Greenhalgh  
> > 
> > * config/aarch64/aarch64.c (cortexa57_tunings): Remove
> > AARCH64_EXTRA_TUNE_RECIP_SQRT.
> > 
> 
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 1d5d898..999c9fc 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings =
> >0,   /* max_case_values.  */
> >0,   /* cache_line_size.  */
> >tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
> > -  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS
> > -   | AARCH64_EXTRA_TUNE_RECIP_SQRT)/* tune_flags.  */
> > +  (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags.  */
> >  };
> >  
> >  static const struct tune_params cortexa72_tunings =
> 


Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt

2016-02-01 Thread James Greenhalgh
On Mon, Jan 25, 2016 at 11:21:25AM +, James Greenhalgh wrote:
> On Mon, Jan 11, 2016 at 11:53:39AM +, James Greenhalgh wrote:
> > 
> > Hi,
> > 
> > I'd like to switch the logic around in aarch64.c such that
> > -mlow-precision-recip-sqrt causes us to always emit the low-precision
> > software expansion for reciprocal square root. I have two reasons to do
> > this; first is consistency across -mcpu targets, second is enabling more
> > -mcpu targets to use the flag for peak tuning.
> > 
> > I don't much like that the precision we use for -mlow-precision-recip-sqrt
> > differs between cores (and possibly compiler revisions). Yes, we're
> > under -ffast-math but I take this flag to mean the user explicitly wants the
> > low-precision expansion, and we should not diverge from that based on an
> > internal decision as to what is optimal for performance in the
> > high-precision case. I'd prefer to keep things as predictable as possible,
> > and here that means always emitting the low-precision expansion when asked.
> > 
> > Judging by the comments in the thread proposing the reciprocal square
> > root optimisation, this will benefit all cores currently supported by GCC.
> > To be clear, we would still not expand in the high-precision case for any
> > cores which do not explicitly ask for it. Currently that is Cortex-A57
> > and xgene, though I will be proposing a patch to remove Cortex-A57 from
> > that list shortly.
> > 
> > Which gives my second motivation for this patch. -mlow-precision-recip-sqrt
> > is intended as a tuning flag for situations where performance is more
> > important than precision, but the current logic requires setting an
> > internal flag which also changes the performance characteristics where
> > high-precision is needed. This conflates two decisions the target might
> > want to make, and reduces the applicability of an option targets might
> > want to enable for performance. In particular, I'd still like to see
> > -mlow-precision-recip-sqrt continue to emit the cheaper, low-precision
> > sequence for floats under Cortex-A57.
> > 
> > Based on that reasoning, this patch makes the appropriate change to the
> > logic. I've checked with the current -mcpu values to ensure that behaviour
> > without -mlow-precision-recip-sqrt does not change, and that behaviour
> > with -mlow-precision-recip-sqrt is to emit the low precision sequences.
> > 
> > I've also put this through bootstrap and test on aarch64-none-linux-gnu
> > with no issues.
> > 
> > OK?
> 
> *Ping*

*Pingx2*

Thanks,
James

> 
> Thanks,
> James
> 
> > 2015-12-10  James Greenhalgh  
> > 
> > * config/aarch64/aarch64.c (use_rsqrt_p): Always use software
> > reciprocal sqrt for -mlow-precision-recip-sqrt.
> > 
> 
> > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > index 9142ac0..1d5d898 100644
> > --- a/gcc/config/aarch64/aarch64.c
> > +++ b/gcc/config/aarch64/aarch64.c
> > @@ -7485,8 +7485,9 @@ use_rsqrt_p (void)
> >  {
> >return (!flag_trapping_math
> >   && flag_unsafe_math_optimizations
> > - && (aarch64_tune_params.extra_tuning_flags
> > - & AARCH64_EXTRA_TUNE_RECIP_SQRT));
> > + && ((aarch64_tune_params.extra_tuning_flags
> > +  & AARCH64_EXTRA_TUNE_RECIP_SQRT)
> > + || flag_mrecip_low_precision_sqrt));
> >  }
> >  
> >  /* Function to decide when to use
> 


[PATCH] [ARC] Add single/double IEEE precission FPU support.

2016-02-01 Thread Claudiu Zissulescu
In this patch, we add support for the new FPU instructions available with
ARC V2 processors.  The new FPU instructions covers both single and
double precision IEEE formats. While the single precision is available
for both ARC EM and ARC HS processors, the double precision is only
available for ARC HS. ARC EM will make use of the double precision assist
instructions which are in fact FPX double instructions.  The double
floating point precision instructions are making use of the odd-even
register pairs to hold 64-bit datums, exactly like in the case of ldd/std
instructions.

Additional to the mods required by FPU instructions to be supported by
GCC, we forced all the 64 bit datum to use odd-even register pairs (HS
only), as we observed a better usage of the ldd/std, and less generated
move instructions.  A byproduct of this optimization, is a new ABI, which
places the 64-bit arguments into odd-even register pairs.  This behavior
can be selected using -mabi option.

Feedback is welcomed,
Claudiu

gcc/
2016-02-01  Claudiu Zissulescu  

* config/arc/arc-modes.def (CC_FPU, CC_FPUE, CC_FPU_UNEQ): New
modes.
* config/arc/arc-opts.h (FPU_SP, FPU_SF, FPU_SC, FPU_SD, FPU_DP)
(FPU_DF, FPU_DC, FPU_DD, FXP_DP): Define.
(arc_abi_type): New enum.
* config/arc/arc-protos.h (arc_init_cumulative_args): Declare.
* config/arc/arc.c (TARGET_STRICT_ARGUMENT_NAMING): Define.
(arc_init): Check FPU options.
(get_arc_condition_code): Handle new CC_FPU* modes.
(arc_select_cc_mode): Likewise.
(arc_conditional_register_usage): Allow 64 bit datum into even-odd
register pair only. Allow access for ARCv2 accumulator.
(gen_compare_reg): Whenever we have FPU support use FPU compare
instructions.
(arc_setup_incoming_varargs): Handle even-odd register pair (ARC
HS only).
(arc_strict_argument_naming): New function.
(arc_init_cumulative_args): Likewise.
(arc_hard_regno_nregs): Likewise.
(arc_function_args_impl): Likewise.
(arc_arg_partial_bytes): Use arc_function_args_impl function.
(arc_function_arg): Likewise.
(arc_function_arg_advance): Likewise.
(arc_reorg): Don't generate brcc insns when FPU compare
instructions are involved.
* config/arc/arc.h (TARGET_DPFP): Add TARGET_FP_DPAX condition.
(TARGET_OPTFPE): Add condition when ARC EM can use optimized
floating point emulation.
(ACC_REG_FIRST, ACC_REG_LAST, ACCL_REGNO, ACCH_REGNO): Define.
(CUMULATIVE_ARGS): New structure.
(INIT_CUMULATIVE_ARGS): Use arc_init_cumulative_args.
(REVERSE_CONDITION): Add new CC_FPU* modes.
(TARGET_HARD_FLOAT, TARGET_FP_SINGLE, TARGET_FP_DOUBLE)
(TARGET_FP_SFUZED, TARGET_FP_DFUZED, TARGET_FP_SCONV)
(TARGET_FP_DCONV, TARGET_FP_SSQRT, TARGET_FP_DSQRT)
(TARGET_FP_DPAX): Define.
* config/arc/arc.md (ARCV2_ACC): New constant.
(type): New fpu type attribute.
(SDF): Conditional iterator.
(cstore, cbranch): Change expand condition.
(addsf3, subsf3, mulsf3, adddf3, subdf3, muldf3): New expands,
handles FPU/FPX cases as well.
* config/arc/arc.opt (mfpu, mabi): New options.
* config/arc/fpx.md (addsf3_fpx, subsf3_fpx, mulsf3_fpx):
Renamed.
(adddf3, muldf3, subdf3): Removed.
* config/arc/predicates.md (proper_comparison_operator): Recognize
CC_FPU* modes.
* config/arc/fpu.md: New file.
* doc/invoke.texi (ARC Options): Document mabi and mfpu options.
---
 gcc/config/arc/arc-modes.def |   5 +
 gcc/config/arc/arc-opts.h|  27 ++
 gcc/config/arc/arc-protos.h  |   1 +
 gcc/config/arc/arc.c | 398 -
 gcc/config/arc/arc.h |  70 --
 gcc/config/arc/arc.md| 150 ++-
 gcc/config/arc/arc.opt   |  56 +
 gcc/config/arc/fpu.md| 580 +++
 gcc/config/arc/fpx.md|  64 +
 gcc/config/arc/predicates.md |  10 +
 gcc/doc/invoke.texi  |  91 ++-
 11 files changed, 1310 insertions(+), 142 deletions(-)
 create mode 100644 gcc/config/arc/fpu.md

diff --git a/gcc/config/arc/arc-modes.def b/gcc/config/arc/arc-modes.def
index b64a596..1f4bf95 100644
--- a/gcc/config/arc/arc-modes.def
+++ b/gcc/config/arc/arc-modes.def
@@ -35,3 +35,8 @@ CC_MODE (CC_FPX);
 VECTOR_MODES (INT, 4);/*V4QI V2HI */
 VECTOR_MODES (INT, 8);/*   V8QI V4HI V2SI */
 VECTOR_MODES (INT, 16);   /* V16QI V8HI V4SI V2DI */
+
+/* FPU conditon flags. */
+CC_MODE (CC_FPU);
+CC_MODE (CC_FPUE);
+CC_MODE (CC_FPU_UNEQ);
diff --git a/gcc/config/arc/arc-opts.h b/gcc/config/arc/arc-opts.h
index 0f12885..be628e5 100644
--- a/gcc/config/arc/arc-opts.h
+++ b/gcc/config/arc/arc-opts.h
@@ -27,3 +27,30 @@ enum processor_type
   PROCESSOR_ARCEM,
   PROCESSOR_ARCHS
 };
+
+/* Single precisio

Re: [PATCH, RFC] New memory usage statistics infrastructure

2016-02-01 Thread Martin Liška
On 01/29/2016 05:24 PM, Patrick Palka wrote:
> On Thu, May 28, 2015 at 1:07 PM, Jeff Law  wrote:
>> On 05/28/2015 06:29 AM, Martin Liška wrote:
>>

>>>
>>> Hello.
>>>
>>> Thank you for pointing about missing copyright.
>>> Following patch adds that.
>>>
>>> Ready for trunk?
>>
>> Yes.
>> jeff
>>
> 
> It looks like this patch was never committed.  gcc/mem-stats.h and
> gcc/mem-stats-traits.h don't yet have copyright headers.
> 

Hi.

Thanks for pointing out, I've just installed the patch (w/ adding 2016 year to 
copyright header).

Martin


[Patch, avr] Restore default value of PARAM_ALLOW_STORE_DATA_RACES to 1

2016-02-01 Thread Senthil Kumar Selvaraj

Hi,

  This patch sets PARAM_ALLOW_STORE_DATA_RACES to 1 (the default until
  a year back), to avoid code size regressions in trunk (and probably
  5.x )for the AVR target.

  Consider the following piece of code
  
volatile int z;
void foo(int x)
{
static char i;
for (i=0; i< 4; ++i)
{
if (x > 2)
z = 1;
else
z = 2;
}
}

Unmodified gcc trunk generates this

movw r20,r24
sts i.1495,__zero_reg__
ldi r25,0
ldi r18,0
ldi r22,lo8(2)
ldi r23,0
ldi r30,lo8(1)
ldi r31,0
.L2:
cpi r25,lo8(4)
brne .L5
cpse r18,__zero_reg__
sts i.1495,r25
.L1:
ret
.L5:
cpi r20,3
cpc r21,__zero_reg__
brlt .L3
sts z+1,r31
sts z,r30
.L4:
subi r25,lo8(-(1))
ldi r18,lo8(1)
rjmp .L2
.L3:
sts z+1,r23
sts z,r22
rjmp .L4
.size   foo, .-foo
.local  i.1495
.comm   i.1495,1,1
.comm   z,2,1
.ident  "GCC: (GNU) 6.0.0 20160201 (experimental)"

Note the usage of an extra reg (r18) that is used as a flag to
record loop entry (in .L4), and the conditional store of r25 to i in .L2.

In 4.x, there is no extra reg usage - only a single unconditional set of
i to r25 at the end of the loop.

Digging into the code, I found that LIM checks
PARAM_ALLOW_STORE_DATA_RACES and introduces the flag to avoid store data
races - see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52558. The
default value of the param was set to zero a year and a half back - see
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg01548.html.

For AVR, I guess assuming any store can cause a data race is too
pessimistic for the general case. Globals shared with interrupts will
need special handling for atomic access anyway, so I thought we should
revert the default back to allow store data races.

If this is ok, could someone commit please? I don't have commit access.

Regards
Senthil

gcc/ChangeLog

2016-02-01  Senthil Kumar Selvaraj  

* config/avr/avr.c (avr_option_override): Set
PARAM_ALLOW_STORE_DATA_RACES to 1.


diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index e557772..a7728e3 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -43,6 +43,7 @@
 #include "expr.h"
 #include "langhooks.h"
 #include "cfgrtl.h"
+#include "params.h"
 #include "builtins.h"
 #include "context.h"
 #include "tree-pass.h"
@@ -410,6 +411,15 @@ avr_option_override (void)
   if (avr_strict_X)
 flag_caller_saves = 0;
 
+  /* Allow optimizer to introduce store data races. This used to be the
+ default - it was changed because bigger targets did not see any
+ performance decrease. For the AVR though, disallowing data races
+ introduces additional code in LIM and increases reg pressure.  */
+
+  maybe_set_param_value (PARAM_ALLOW_STORE_DATA_RACES, 1,
+  global_options.x_param_values,
+  global_options_set.x_param_values);
+
   /* Unwind tables currently require a frame pointer for correctness,
  see toplev.c:process_options().  */
 




Re: [PATCH 9/9] S/390: Disallow SImode in s390_decompose_address

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

>   * config/s390/s390.c (s390_decompose_address): Don't accept SImode
>   anymore.

Great!  Very nice to finally get rid of this.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 2/9] S/390: Add disabled insn attribute

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

> With this patch a `disabled' attribute is defined which can be used to
> explicitly disable an alternative.  This comes handy when defining the
> substitutions later and while adding it anyway I've used it for the
> existing cases as well.

> +; Insn attribute with boolean value
> +;  1 to disable an insn alternative
> +;  0 or * for an active insn alternative
> +;  an active alternative can still be disabled due to lacking cpu
> +;  facility support
> +(define_attr "disabled" "" (const_int 0))
> +
>  (define_attr "enabled" ""
> -  (cond [(eq_attr "cpu_facility" "standard")
> +  (cond [(eq_attr "disabled" "1")
> +  (const_int 0)
> +
> +  (eq_attr "cpu_facility" "standard")
>(const_int 1)
>  
>   (and (eq_attr "cpu_facility" "ieee")

So I'm wondering what the difference is between this and simply
overriding the default implementation of "enabled" per-insn?

So instead of adding
(set_attr "disabled" "0,1")])
to an insn, you might simply add instead:
(set_attr "enabled" "*,0")])

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 8/9] S/390: Use define_subst for the setmem patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebel:

> While trying to get rid of the Y constraint in the setmem patterns I
> noticed that for these patterns it isn't even a problem since these
> always only use the constraint with a Pmode match_operand.  But while
> being at it I've tried to fold some of the patterns a bit.

Hmm, I'm wondering whether it's worth it to use a subst pattern that
is only ever used in one single instance?  (Specifically I'm wondering
about the 31z subst.  The and subst is used multiple times.)


> +; Turn the DImode in the 31 bit pattern into TImode to enforce
> +; register pair usage even with -mzarch.
> +; The subreg offset is adjusted accordingly.
> +(define_subst "setmem_31z_subst"
> +  [(clobber (match_operand:DI  0 "register_operand" ""))
> +   (set (mem:BLK (subreg:SI (match_operand:DI  3 "register_operand" "") 
> 0))
> +(unspec:BLK [(match_operand:SI 2 
> "shift_count_or_setmem_operand" "")
> +  (subreg:SI (match_operand:DI  4 "register_operand" "")
> + 0)]

Is this right?  Shouldn't it be subreg offset 4 originally?

> +  UNSPEC_REPLICATE_BYTE))
> +   (use (match_operand:DI  1 "register_operand" ""))
> +   (clobber (reg:CC CC_REGNUM))]
> +""
> +  [(clobber (match_operand:TI  0 "register_operand" ""))
> +   (set (mem:BLK (subreg:SI (match_operand:TI  3 "register_operand" "") 
> 4))
> +(unspec:BLK [(match_operand:SI 2 
> "shift_count_or_setmem_operand" "")
> +  (subreg:SI (match_operand:TI  4 "register_operand" "")
> + 12)]
> + UNSPEC_REPLICATE_BYTE))
> +   (use (match_operand:TI  1 "register_operand" ""))
> +   (clobber (reg:CC CC_REGNUM))])

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



RE: [Patch, MIPS] Patch for PR 68400, a mips16 bug

2016-02-01 Thread Andrew Bennett


> -Original Message-
> From: Matthew Fortune
> Sent: 30 January 2016 16:46
> To: Richard Sandiford; Steve Ellcey
> Cc: gcc-patches@gcc.gnu.org; c...@codesourcery.com; Andrew Bennett
> Subject: RE: [Patch, MIPS] Patch for PR 68400, a mips16 bug
> 
> Richard Sandiford  writes:
> > "Steve Ellcey "  writes:
> > > Here is a patch for PR6400.  The problem is that and_operands_ok was
> checking
> > > one operand to see if it was a memory_operand but MIPS16 addressing is
> more
> > > restrictive than what the general memory_operand allows.  The fix was to
> > > call mips_classify_address if TARGET_MIPS16 is set because it will do a
> > > more complete mips16 addressing check and reject operands that do not meet
> > > the more restrictive requirements.
> > >
> > > I ran the GCC testsuite with no regressions and have included a test case
> as
> > > part of this patch.
> > >
> > > OK to checkin?
> > >
> > > Steve Ellcey
> > > sell...@imgtec.com
> > >
> > >
> > > 2016-01-26  Steve Ellcey  
> > >
> > >   PR target/68400
> > >   * config/mips/mips.c (and_operands_ok): Add MIPS16 check.
> > >
> > >
> > >
> > > diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> > > index dd54d6a..adeafa3 100644
> > > --- a/gcc/config/mips/mips.c
> > > +++ b/gcc/config/mips/mips.c
> > > @@ -8006,9 +8006,18 @@ mask_low_and_shift_p (machine_mode mode, rtx mask,
> rtx shift, int
> > maxlen)
> > >  bool
> > >  and_operands_ok (machine_mode mode, rtx op1, rtx op2)
> > >  {
> > > -  return (memory_operand (op1, mode)
> > > -   ? and_load_operand (op2, mode)
> > > -   : and_reg_operand (op2, mode));
> > > +
> > > +  if (memory_operand (op1, mode))
> > > +{
> > > +  if (TARGET_MIPS16) {
> > > + struct mips_address_info addr;
> > > + if (!mips_classify_address (&addr, op1, mode, false))
> > > +   return false;
> > > +  }
> >
> > Nit: brace formatting.
> >
> > It looks like the patch only works by accident.  The code above
> > is passing the MEM, rather than the address inside the MEM, to
> > mips_classify_address.  Since (mem (mem ...)) is never valid on MIPS,
> > the effect is to disable the memory alternatives of *and3_mips16
> > unconditionally.
> >
> > The addresses that occur in the testcase are valid as far as
> > mips_classify_address is concerned.  FWIW, there shouldn't be any
> > difference between the addresses that memory_operand accepts and the
> > addresses that mips_classify_address accepts.
> >
> > In theory, the "W" constraints in *and3_mips16 are supposed to
> > tell the target-independent code that this instruction cannot handle
> > constant addresses or stack-based addresses.  That seems to be working
> > correctly during RA for the testcase.  The problem comes in regcprop,
> > which ends up creating a second stack pointer rtx distinct from
> > stack_pointer_rtx:
> >
> > (reg/f:SI 29 $sp [375])
> >
> > (Note the ORIGINAL_REGNO of 375.)  This then defeats the test in
> > mips_stack_address_p:
> >
> > bool
> > mips_stack_address_p (rtx x, machine_mode mode)
> > {
> >   struct mips_address_info addr;
> >
> >   return (mips_classify_address (&addr, x, mode, false)
> >   && addr.type == ADDRESS_REG
> >   && addr.reg == stack_pointer_rtx);
> > }
> >
> > Change the == to rtx_equal_p and the test passes.  I don't think that's
> > the correct fix though -- the fix is to stop a second stack pointer rtx
> > from being created.
> 
> Agreed, though I'm inclined to say do both. We actually hit this
> same issue while testing some 4.9.2 based tools recently but I hadn't
> got confirmation from Andrew (cc'd) whether it was definitely the same
> issue. Andrew fixed this by updating all tests against stack_pointer_rtx
> to compare register numbers instead (but rtx_equal_p is better still).
> 
> I think it is worthwhile looking in more detail why a new stack pointer
> rtx appears. Andrew did look briefly at the time but I don't recall his
> conclusions...

I will do some digging to find the testcase that caused the issue on the 4.9.2 
tools.  
In the meantime below is the initial patch I created to fix this issue.  I am 
currently 
in the middle of preparing it for submission and it should be on the mailing 
list in 
the next few days.  I will take the rtx_equal_p comment on board and rework the 
code 
to use this function rather than doing a register comparison.


Regards,



Andrew


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index dd54d6a..1c49f55 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -2435,7 +2435,7 @@ mips_stack_address_p (rtx x, machine_mode mode)
 
   return (mips_classify_address (&addr, x, mode, false)
  && addr.type == ADDRESS_REG
- && addr.reg == stack_pointer_rtx);
+ && REGNO (addr.reg) == STACK_POINTER_REGNUM);
 }
 
 /* Return true if ADDR matches the pattern for the LWXS load scaled indexed
@@ -2498,7 +2498,8 @@ mips16_unextended_reference_p (machine_mode mode, rtx 
base,
 {
   if (mode != BLKmode && offset % GET_MODE_

Re: [PATCH 7/9] S/390: Get rid of Y constraint in vector.md.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

> +(define_insn "*vec_extract_plus"
> +  [(set (match_operand:  0 
> "nonimmediate_operand" "=d,QR")
> + (unspec: [(match_operand:V   1 "register_operand"  
> "v, v")
> +(plus:SI (match_operand:SI 2 "register_operand"  
> "a, I")

The second alternative can never match here, can it?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 6/9] S/390: Get rid of Y constraint in tabort.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

>  (define_insn "*tabort_1"
> -  [(unspec_volatile [(match_operand:SI 0 "shift_count_or_setmem_operand" 
> "Y")]
> +  [(unspec_volatile [(match_operand:SI 0 "addrreg_or_constint_operand" 
> "a,n")]
>   UNSPECV_TABORT)]
>"TARGET_HTM && operands != NULL"
> -  "tabort\t%Y0"
> +  "@
> +   tabort\t0(%0)
> +   tabort\t%0"
> +  [(set_attr "op_type" "S")])
> +
> +(define_insn "*tabort_1_plus"
> +  [(unspec_volatile [(plus:SI (match_operand:SI 0 "register_operand"  "a")
> +   (match_operand:SI 1 "const_int_operand" "J"))]
> + UNSPECV_TABORT)]
> +  "TARGET_HTM && operands != NULL"
> +  "tabort\t%1(%0)"

This seems dangerous: const_int_operand may match a constant that does
not fit into the "J" constraint, which would lead to an abort in reload.

What is the semantics for the abort code anyway?  It is supposed to be
automatically truncated or not?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 5/9] S/390: Get rid of Y constraint in arithmetic right shift patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

> +; This is like the addr_style_op substitution above but with a CC clobber.
> +(define_subst "addr_style_op_cc_subst"

> +; This is like the masked_op substitution but with a CC clobber.
> +(define_subst "masked_op_cc_subst"

A bit unfortunate that these need to be duplicated.  Does the subst always
have to match the full pattern, or can it match and operate on just one
element of a PARALLEL?

> +; This adds an explicit CC reg set to an operation while keeping the
> +; set for the operation result as well.
> +(define_subst "setcc_subst"

> +; This adds an explicit CC reg set to an operation while dropping the
> +; result of the operation.
> +(define_subst "cconly_subst"

These are nice!  It would seem they could be applied to simplify many
of the non-shift patterns too, right?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



Re: [PATCH 3/9] S/390: Get rid of Y constraint in rotate patterns.

2016-02-01 Thread Ulrich Weigand
Andreas Krebbel wrote:

> +; Accept single register and immediate operands usable as shift
> +; counts.
> +(define_predicate "addrreg_or_constint_operand"
> +  (match_code "reg, subreg, const_int")

I'm wondering whether this is even necessary.

> +{
> +  if (GET_MODE (op) != VOIDmode
> +  && GET_MODE_CLASS (GET_MODE (op)) != MODE_INT)
> +return false;

The predicate always seems to be used with a mode, so any extra
mode checks here seem redundant.

> +  while (op && GET_CODE (op) == SUBREG)
> +op = SUBREG_REG (op);
> +
> +  if (REG_P (op)
> +  && (REGNO (op) >= FIRST_PSEUDO_REGISTER || ADDR_REG_P (op)))
> +return true;
> +
> +  if (CONST_INT_P (op))
> +return true;

This looks somewhat copied from shift_count_or_setmem_operand, where
we have a comment:
  /* Don't allow any non-base hard registers.  Doing so without
 confusing reload and/or regrename would be tricky, and doesn't
 buy us much anyway.  */

With the new set up, I don't believe there is any issue with confusing
reload -- it's no more complex than any other pattern now.  So it
should be fine to just accept any register.

In fact, I'm starting to wonder whether it wouldn't be just fine to
simply using nonmemory_operand instead of this special predicate
(the only issue might be that we could get CONST_WIDE_INT, but it
should be simple to handle those).

> +(define_expand "rotl3"
> +  [(set (match_operand:GPR 0 "register_operand" "")
> +(rotate:GPR (match_operand:GPR 1 "register_operand" "")
> + (match_operand:SI 2 "shift_count_or_setmem_operand" "")))]

Similarly, I think the expander no longer needs to use the special predicate
shift_count_or_setmem_operandm but could just use nonmemory_operand.
[ This will reject the PLUS that shift_count_or_setmem_operand accepted,
but I don't believe you ever *could* get the PLUS in the expander anyway.
It will only be moved in by combine, so as long as the insn accepts it,
everything should be good.  ]

> +(define_insn "*rotl3"
> +  [(set (match_operand:GPR 0 "register_operand"   "=d,d")
> + (rotate:GPR (match_operand:GPR 1 "register_operand""d,d")
> + (match_operand:SI  2 "addrreg_or_constint_operand" "a,n")))]
> +  "TARGET_CPU_ZARCH"
> +  "@
> +   rll\t%0,%1,(%2)
> +   rll\t%0,%1,%Y2"

So it looks like %Y is now always used for just a constant.  Maybe the
implementation of %Y should be adjusted (once the patch series is complete)?
Also, if we use just nonmemory_operand, %Y should be updated to accept
CONST_WIDE_INT just like e.g. %x does.

> +; This expands an register/immediate operand to a register+immediate
> +; operand to draw advantage of the address style operand format
> +; providing a addition for free.
> +(define_subst "addr_style_op_subst"
> +  [(set (match_operand:DSI 0 "" "")
> +(SUBST:DSI (match_operand:DSI 1 "" "")
> +(match_operand:SI 2 "" "")))]
> +  "REG_P (operands[2])"
> +  [(set (match_dup 0)
> +(SUBST:DSI (match_dup 1)
> +(plus:SI (match_dup 2)
> + (match_operand 3 "const_int_operand" "n"])

This seems to add just a single alternative.  Is that OK if it gets
substituted into a pattern that used multiple alternatives otherwise?

> +; In the subst pattern we have to disable the alternative where op2 is
> +; already a constant.  This attribute is supposed to be used in the
> +; "disabled" insn attribute to achieve this.
> +(define_subst_attr "addr_style_op_disable" "addr_style_op_subst" "0" "1")

Is this necessary given that the subst adds a REG_P (operands[2])
condition?

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH] libitm: Introduce target macro TARGET_BEGIN_TRANSACTION_ATTRIBUTE.

2016-02-01 Thread Dominik Vogt
The attached patch adds the a target specific attribute via the
new target macro TARGET_BEGIN_TRANSACTION_ATTRIBUTE to the
function begin_transaction().  S/390 uses this to set the
soft-float target attribute which is needed to fix a crash with
-m31.

As there seems to be no place in libitm to document internal macros like
USE_HTM_FASTPATH or the new macro, I've put the documentation in a
comment where the macro is used.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
libitm/ChangeLog

* config/s390/target.h (TARGET_BEGIN_TRANSACTION_ATTRIBUTE): Define
function attribute to disable floating point in begin_transaction() on
S/390.
* beginend.cc (begin_transaction): Use
TARGET_BEGIN_TRANSACTION_ATTRIBUTE.
>From a5c8087753ab1f42082b9ab44d443005ed2c53ba Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 1 Feb 2016 14:07:55 +0100
Subject: [PATCH] libitm: Introduce target macro
 TARGET_BEGIN_TRANSACTION_ATTRIBUTE.

The macro can be used to add attributes to the function begin_transaction.  Set
it for S/390 to switch of harware floating point.
---
 libitm/beginend.cc  | 6 ++
 libitm/config/s390/target.h | 3 +++
 2 files changed, 9 insertions(+)

diff --git a/libitm/beginend.cc b/libitm/beginend.cc
index 1a258ad..20b5547 100644
--- a/libitm/beginend.cc
+++ b/libitm/beginend.cc
@@ -149,6 +149,12 @@ choose_code_path(uint32_t prop, abi_dispatch *disp)
 return a_runInstrumentedCode;
 }
 
+#ifdef TARGET_BEGIN_TRANSACTION_ATTRIBUTE
+/* This macro can be used to define target specific attributes for this
+   function.  For example, S/390 requires floating point to be disabled in
+   begin_transaction.  */
+TARGET_BEGIN_TRANSACTION_ATTRIBUTE
+#endif
 uint32_t
 GTM::gtm_thread::begin_transaction (uint32_t prop, const gtm_jmpbuf *jb)
 {
diff --git a/libitm/config/s390/target.h b/libitm/config/s390/target.h
index 44819a4..c9d5203 100644
--- a/libitm/config/s390/target.h
+++ b/libitm/config/s390/target.h
@@ -69,6 +69,9 @@ cpu_relax (void)
 /* Number of retries for transient failures.  */
 #define _HTM_ITM_RETRIES 10
 #define USE_HTM_FASTPATH
+#define TARGET_BEGIN_TRANSACTION_ATTRIBUTE \
+  __attribute__ ((target("soft-float")))
+
 
 static inline bool
 htm_available ()
-- 
2.3.0



AW: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Bernd Edlinger
Hi,

On 02/01/2016 13:18, Bernd Schmidt wrote:
> On 01/29/2016 08:42 PM, Bernd Edlinger wrote:
> > Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
> > and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
> > but wait, in the if statement we know, that x==1234, so everything
> > turns in one magic constant, and we have a totally new constant offset
> > from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
> > Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
> > trap we think we do not need the if at all => BANG.
> 
> What are you trying to say here? As far as I can tell this isn't a
> problem with my proposed solution (set MEM_NOTRAP_P for valid SFP+x
> offsets).

First, I think there are cases when x is still a pseudo before reload, and after
reload, some constant propagation manages to replace x with a constant value.
Then it will not help to know what rtx_addr_can_trap_p was before reload.

Second, I do also think that it may be helpful to annotate the RTX during
reload.  MEM_NOTRAP_P is already used, so maybe something new
like a spare bit would be necessary.

But maybe a single bit will not be enough, what I would need, would be to
somehow annotate the stack_pointer_rtx, that it has a known constant offset
to the initial stack pointer value.  The problem with that is, that 
stack_pointer_rtx
is a shared pointer, and it is therefore not possible to modify it, and attach
some context to it.


Bernd.

[PATCH] S/390: Correct documentation typos.

2016-02-01 Thread Dominik Vogt
The attached patch corrects a few typos in the S/390 specific
documentation.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* doc/extend.texi: S/390: Correct some typos.
>From 418db8fbac09ecfb8f70f7f2a5c8e4ce52cc160a Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Mon, 1 Feb 2016 12:39:12 +0100
Subject: [PATCH] S/390: Correct documentation typos.

---
 gcc/doc/extend.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index a1c36f5..e68c1ee 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -16710,7 +16710,7 @@ Generates the @code{wait} machine instruction.
 @subsection S/390 System z Built-in Functions
 @deftypefn {Built-in Function} int __builtin_tbegin (void*)
 Generates the @code{tbegin} machine instruction starting a
-non-constraint hardware transaction.  If the parameter is non-NULL the
+non-constrained hardware transaction.  If the parameter is non-NULL the
 memory area is used to store the transaction diagnostic buffer and
 will be passed as first operand to @code{tbegin}.  This buffer can be
 defined using the @code{struct __htm_tdb} C struct defined in
@@ -16723,7 +16723,7 @@ instruction by definition overwrites the content of all FPRs.  The
 compiler will generate code which saves and restores the FPRs.  For
 soft-float code it is recommended to used the @code{*_nofloat}
 variant.  In order to prevent a TDB from being written it is required
-to pass an constant zero value as parameter.  Passing the zero value
+to pass a constant zero value as parameter.  Passing a zero value
 through a variable is not sufficient.  Although modifications of
 access registers inside the transaction will not trigger an
 transaction abort it is not supported to actually modify them.  Access
@@ -16784,7 +16784,7 @@ handler code.
 @end deftypefn
 
 @deftypefn {Built-in Function} void __builtin_tbeginc (void)
-Generates the @code{tbeginc} machine instruction starting a constraint
+Generates the @code{tbeginc} machine instruction starting a constrained
 hardware transaction.  The second operand is set to @code{0xff08}.
 @end deftypefn
 
-- 
2.3.0



Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Richard Biener
On Mon, Feb 1, 2016 at 1:51 PM, Bernd Schmidt  wrote:
> On 02/01/2016 01:49 PM, Richard Biener wrote:
>
>> What prevents motion of those across a stack adjustment (thus a place
>> they are _not_ valid?)
>
>
> If the address is SP-based, dependencies on the address register. If you're
> thinking prologue stack adjustments, ports where this could be an issue emit
> suitable barrier insns.

Ok.  Just I remember we can't mark non-trapping memory references in general
as they might be non-trapping only conditional, aka if (p != NULL) *p = 1;

I suppose stack accesses are special enough where issues like that can't pop up?
I can think of

foo (int i)
{
 char *p = alloca(10);
 if (i < 10)
   p[i] = 1;
}

or stuff like that.  But I guess your proposal is to handle the simple
non-variable-indexed
cases and leave the rest as always conservatively trapping.

Richard.

>
> Bernd
>


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Bernd Schmidt

On 02/01/2016 01:49 PM, Richard Biener wrote:


What prevents motion of those across a stack adjustment (thus a place
they are _not_ valid?)


If the address is SP-based, dependencies on the address register. If 
you're thinking prologue stack adjustments, ports where this could be an 
issue emit suitable barrier insns.



Bernd



Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Richard Biener
On Mon, Feb 1, 2016 at 1:18 PM, Bernd Schmidt  wrote:
> On 01/29/2016 08:42 PM, Bernd Edlinger wrote:
>>
>> On 29.01.2016 16:47 Bernd Schmidt wrote:
>>>
>>>
>>> Yes. What is the problem with that? If we have (plus sfp const_int) at
>>> any point before reload, we can check whether that offset is inside
>>> frame_size. If it isn't or if the offset isn't known, it could trap.
>>>
>>>
>>
>> Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
>> and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
>> but wait, in the if statement we know, that x==1234, so everything
>> turns in one magic constant, and we have a totally new constant offset
>> from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
>> Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
>> trap we think we do not need the if at all => BANG.
>
>
> What are you trying to say here? As far as I can tell this isn't a problem
> with my proposed solution (set MEM_NOTRAP_P for valid SFP+x offsets).

What prevents motion of those across a stack adjustment (thus a place
they are _not_ valid?)

Richard.

>
> Bernd


[PATCH] Fix PR69579

2016-02-01 Thread Richard Biener

The following fixes PR69579.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-02-01  Richard Biener  

PR tree-optimization/69579
* tree-ssa-loop-ivcanon.c (propagate_constants_for_unrolling):
Do not propagate through abnormal PHI results.

* gcc.dg/setjmp-6.c: New testcase.

Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c (revision 232976)
+++ gcc/tree-ssa-loop-ivcanon.c (working copy)
@@ -1208,7 +1208,9 @@ propagate_constants_for_unrolling (basic
   tree result = gimple_phi_result (phi);
   tree arg = gimple_phi_arg_def (phi, 0);
 
-  if (gimple_phi_num_args (phi) == 1 && TREE_CODE (arg) == INTEGER_CST)
+  if (! SSA_NAME_OCCURS_IN_ABNORMAL_PHI (result)
+ && gimple_phi_num_args (phi) == 1
+ && TREE_CODE (arg) == INTEGER_CST)
{
  propagate_into_all_uses (result, arg);
  gsi_remove (&gsi, true);
Index: gcc/testsuite/gcc.dg/setjmp-6.c
===
--- gcc/testsuite/gcc.dg/setjmp-6.c (revision 0)
+++ gcc/testsuite/gcc.dg/setjmp-6.c (working copy)
@@ -0,0 +1,25 @@
+/* PR69569 */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+#include 
+
+jmp_buf buf;
+
+struct str {
+int Count;
+};
+int fun2(struct str *p1)
+{
+int i = 1;
+while (1) {
+   setjmp(buf);
+   break;
+}
+for (; i;) {
+   i = 0;
+   for (; i < (p1 ? p1->Count : 1); i++)
+ fun2(p1);
+}
+return 1;
+}


Re: [Patch, fortran, pr67451, gcc-5, v1] [5/6 Regression] ICE with sourced allocation from coarray

2016-02-01 Thread Andre Vehreschild
Oh, well, now with attachments. I am sorry.

- Andre

On Mon, 1 Feb 2016 13:20:24 +0100
Andre Vehreschild  wrote:

> Hi all,
> 
> here is the backport of the patch for pr67451 for gcc-5. Because the
> structure of the allocate() in trunk is quite different the patch looks
> somewhat different, too, but essentially does the same.
> 
> Bootstrapped and regtests ok on x86_64-linux-gnu/F23.
> 
> Ok for gcc-5-branch?
> 
> Here is the link to the mainline patch:
> https://gcc.gnu.org/ml/fortran/2016-01/msg00093.html
> 
> Regards,
>   Andre
> 
> On Fri, 29 Jan 2016 19:17:24 +0100
> Andre Vehreschild  wrote:
> 
> > Hi all,
> > 
> > attached is a patch to fix a regression in current gfortran when a
> > coarray is used in the source=-expression of an allocate(). The ICE was
> > caused by the class information, i.e., _vptr and so on, not at the
> > expected place. The patch fixes this.
> > 
> > The patch also fixes pr69418, which I will flag as a duplicate in a
> > second.
> > 
> > Bootstrapped and regtested ok on x86_64-linux-gnu/F23.
> > 
> > Ok for trunk?
> > 
> > Backport to gcc-5 is pending, albeit more difficult, because the
> > allocate() implementation on 5 is not as advanced the one in 6.
> > 
> > Regards,
> > Andre  
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 4c368a8..0daa631 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -1019,6 +1019,7 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems, bool unlimited)
   tree fcn;
   tree fcn_type;
   tree from_data;
+  tree from_class_base = NULL;
   tree from_len;
   tree to_data;
   tree to_len;
@@ -1035,21 +1036,41 @@ gfc_copy_class_to_class (tree from, tree to, tree nelems, bool unlimited)
   from_len = to_len = NULL_TREE;
 
   if (from != NULL_TREE)
-fcn = gfc_class_vtab_copy_get (from);
+{
+  /* Check that from is a class.  When the class is part of a coarray,
+	 then from is a common pointer and is to be used as is.  */
+  tmp = POINTER_TYPE_P (TREE_TYPE (from)) && !DECL_P (from)
+	  ? TREE_OPERAND (from, 0) : from;
+  if (GFC_CLASS_TYPE_P (TREE_TYPE (tmp))
+	  || (DECL_P (tmp) && GFC_DECL_CLASS (tmp)))
+	{
+	  from_class_base = from;
+	  from_data = gfc_class_data_get (from_class_base);
+	}
+  else
+	{
+	  /* For arrays two component_refs can be present.  */
+	  if (TREE_CODE (tmp) == COMPONENT_REF)
+	tmp = TREE_OPERAND (tmp, 0);
+	  if (TREE_CODE (tmp) == COMPONENT_REF)
+	tmp = TREE_OPERAND (tmp, 0);
+	  from_class_base = tmp;
+	  from_data = from;
+	}
+  fcn = gfc_class_vtab_copy_get (from_class_base);
+}
   else
-fcn = gfc_class_vtab_copy_get (to);
+{
+  fcn = gfc_class_vtab_copy_get (to);
+  from_data = gfc_class_vtab_def_init_get (to);
+}
 
   fcn_type = TREE_TYPE (TREE_TYPE (fcn));
 
-  if (from != NULL_TREE)
-  from_data = gfc_class_data_get (from);
-  else
-from_data = gfc_class_vtab_def_init_get (to);
-
   if (unlimited)
 {
-  if (from != NULL_TREE && unlimited)
-	from_len = gfc_class_len_get (from);
+  if (from_class_base != NULL_TREE)
+	from_len = gfc_class_len_get (from_class_base);
   else
 	from_len = integer_zero_node;
 }
diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 0be92cd..9c1f920 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -5180,7 +5180,7 @@ gfc_trans_allocate (gfc_code * code)
  _vptr, _len and element_size for expr3.  */
   if (code->expr3)
 {
-  bool vtab_needed = false;
+  bool vtab_needed = false, is_coarray = gfc_is_coarray (code->expr3);
   /* expr3_tmp gets the tree when code->expr3.mold is set, i.e.,
 	 the expression is only needed to get the _vptr, _len a.s.o.  */
   tree expr3_tmp = NULL_TREE;
@@ -5245,7 +5245,8 @@ gfc_trans_allocate (gfc_code * code)
 		{
 		  tree var;
 
-		  tmp = build_fold_indirect_ref_loc (input_location,
+		  tmp = is_coarray ? se.expr
+  : build_fold_indirect_ref_loc (input_location,
 		 se.expr);
 
 		  /* We need a regular (non-UID) symbol here, therefore give a
@@ -5297,6 +5298,16 @@ gfc_trans_allocate (gfc_code * code)
 	  else if (expr3_tmp != NULL_TREE
 		   && (VAR_P (expr3_tmp) ||!code->expr3->ref))
 	tmp = gfc_class_vptr_get (expr3_tmp);
+	  else if (is_coarray && expr3 != NULL_TREE)
+	{
+	  /* Get the ref to coarray's data.  May be wrapped in a
+		 NOP_EXPR.  */
+	  tmp = POINTER_TYPE_P (TREE_TYPE (expr3)) ? TREE_OPERAND (expr3, 0)
+		   : tmp;
+	  /* Get to the base variable, i.e., strip _data.data.  */
+	  tmp = TREE_OPERAND (TREE_OPERAND (tmp, 0), 0);
+	  tmp = gfc_class_vptr_get (tmp);
+	}
 	  else
 	{
 	  rhs = gfc_find_and_cut_at_last_class_ref (code->expr3);
diff --git a/gcc/testsuite/gfortran.dg/coarray_allocate_2.f08 b/gcc/testsuite/gfortran.dg/coarray_allocate_2.f08
new file mode 100644
index 000..7a712a9
--- /dev/null
+++ b/gcc/testsu

Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-02-01 Thread Eric Botcazou
> Actually I'd say it's a frontend bug then setting DECL_ARG_TYPE to the
> wrong type.  Quoting:
> 
> /* For a PARM_DECL, records the data type used to pass the argument,
>which may be different from the type seen in the program.  */
> #define DECL_ARG_TYPE(NODE) (PARM_DECL_CHECK (NODE)->decl_common.initial)
> 
> if the type doesn't end up coming from DECL_ARG_TYPE but the value
> then I'd agree - we should always look at the main variant here.  Expansion
> is quite twisty here so catching all cases is going to be interesting as
> well as evaluating ABI side-effects - that's much easier to determine when
> you change one target at a time...

FWIW SPARC 64-bit is very likely affected too, but I'm not sure we would want 
to change the ABI (it's a GCC extension after all).  IMO the underlying issue 
is to allow variants with different alignments (for strict-alignment targets 
at least), this seems fundamentally broken to me.

-- 
Eric Botcazou


Re: [Patch, fortran, pr67451, gcc-5, v1] [5/6 Regression] ICE with sourced allocation from coarray

2016-02-01 Thread Andre Vehreschild
Hi all,

here is the backport of the patch for pr67451 for gcc-5. Because the
structure of the allocate() in trunk is quite different the patch looks
somewhat different, too, but essentially does the same.

Bootstrapped and regtests ok on x86_64-linux-gnu/F23.

Ok for gcc-5-branch?

Here is the link to the mainline patch:
https://gcc.gnu.org/ml/fortran/2016-01/msg00093.html

Regards,
Andre

On Fri, 29 Jan 2016 19:17:24 +0100
Andre Vehreschild  wrote:

> Hi all,
> 
> attached is a patch to fix a regression in current gfortran when a
> coarray is used in the source=-expression of an allocate(). The ICE was
> caused by the class information, i.e., _vptr and so on, not at the
> expected place. The patch fixes this.
> 
> The patch also fixes pr69418, which I will flag as a duplicate in a
> second.
> 
> Bootstrapped and regtested ok on x86_64-linux-gnu/F23.
> 
> Ok for trunk?
> 
> Backport to gcc-5 is pending, albeit more difficult, because the
> allocate() implementation on 5 is not as advanced the one in 6.
> 
> Regards,
>   Andre


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: Is it OK for rtx_addr_can_trap_p_1 to attempt to compute the frame layout? (was Re: [PATCH] Skip re-computing the mips frame info after reload completed)

2016-02-01 Thread Bernd Schmidt

On 01/29/2016 08:42 PM, Bernd Edlinger wrote:

On 29.01.2016 16:47 Bernd Schmidt wrote:


Yes. What is the problem with that? If we have (plus sfp const_int) at
any point before reload, we can check whether that offset is inside
frame_size. If it isn't or if the offset isn't known, it could trap.




Usually we have "if (x==1234) { read MEM[FP+x]; }", so wo don't know,
and then after reload: "if (x==1234) { read MEM[SP+x+sp_fp_offset]; }"
but wait, in the if statement we know, that x==1234, so everything
turns in one magic constant, and we have a totally new constant offset
from the SP register "if (x==1234) { read MEM[SP+1234+sp_fp_offset]; }".
Now if rtx_addr_can_trap_p(MEM[SP+1234+sp_fp_offset]) says it cannot
trap we think we do not need the if at all => BANG.


What are you trying to say here? As far as I can tell this isn't a 
problem with my proposed solution (set MEM_NOTRAP_P for valid SFP+x 
offsets).



Bernd


Re: [Patch, MIPS] Fix PR target/68273, passing args in wrong regs

2016-02-01 Thread Richard Biener
On Sat, Jan 30, 2016 at 12:06 PM, Richard Sandiford
 wrote:
> I'm not sure this patch is safe.  The received wisdom used to be that
> ABIs should be defined in terms of types, not modes, since types
> represent the source code while modes are an internal GCC concept
> that could change over time (although in practice the barrier for
> that is pretty high).
>
> The patch that introduced the line being patched was part of a series
> that fixed ABI incompatibilities with MIPSpro, and IIRC some of the
> incompatibilties really were due to us looking at modes when we should
> have been looking at types.
>
> So...
>
> "Steve Ellcey "  writes:
>> This is a patch for PR 68273, where MIPS is passing arguments in the wrong
>> registers.  The problem is that when passing an argument by value,
>> mips_function_arg_boundary was looking at the type of the argument (and
>> not just the mode), and if that argument was a variable with extra alignment
>> info (say an integer variable with 8 byte alignment), MIPS was aligning the
>> argument on an 8 byte boundary instead of a 4 byte boundary the way it 
>> should.
>>
>> Since we are passing values (not variables), the alignment of the variable
>> that the value is copied from should not affect the alignment of the value
>> being passed.
>
> ...to me this suggests we might have a middle-end bug.  The argument
> seems to be that the alignment of the type being passed in cannot
> reasonably be used to determine the ABI for semantic reasons
> (rvalues don't have meaningful alignment).  Doesn't that mean that
> we're passing the wrong type to the hook?  The point of the hook
> is to say how an ABI handles a particular type, so if we have a type
> variant that the ABI can't reasonably handle directly for language
> reasons, shouldn't we be passing the main variant instead?

Actually I'd say it's a frontend bug then setting DECL_ARG_TYPE to the
wrong type.  Quoting:

/* For a PARM_DECL, records the data type used to pass the argument,
   which may be different from the type seen in the program.  */
#define DECL_ARG_TYPE(NODE) (PARM_DECL_CHECK (NODE)->decl_common.initial)

if the type doesn't end up coming from DECL_ARG_TYPE but the value
then I'd agree - we should always look at the main variant here.  Expansion is
quite twisty here so catching all cases is going to be interesting as well as
evaluating ABI side-effects - that's much easier to determine when you change
one target at a time...

Richard.

>> This patch fixes the problem and it could change what registers arguments
>> are passed in, which means it could affect backwards compatibility with
>> older programs.  But since the current behaviour is not compliant with the
>> MIPS ABI and does not match what LLVM does, I think we have to make this
>> change.  For the most part this will only affect arguments which are copied
>> from variables that have non-standard alignments set by the aligned 
>> attribute,
>> however the SRA optimization pass can create aligned variables as it splits
>> aggregates apart and that was what triggered this bug report.
>>
>> This is basically the same bug as the ARM bug PR 65956 and the fix is
>> pretty much the same too.
>>
>> Rather than create MIPS specific tests that check the use of specific
>> registers I created two tests to put in gcc.c-torture/execute that
>> were failing before because GCC on MIPS was not consistent in where
>> arguments were passed and which now work with this patch.
>>
>> Tested with mips-mti-linux-gnu and no regressions.  OK to checkin?
>
> Given the argument about LLVM, I think it would also be worth running
> the compat testsuite with clang as the alt compiler both before and
> after the patch to see whether we regress.
>
>> 2016-01-29  Steve Ellcey  
>>
>>   PR target/68273
>>   * config/mips/mips.c (mips_function_arg_boundary): Fix argument
>>   alignment.
>>
>>
>>
>> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
>> index dd54d6a..ecce3cd 100644
>> --- a/gcc/config/mips/mips.c
>> +++ b/gcc/config/mips/mips.c
>> @@ -5643,8 +5643,9 @@ static unsigned int
>>  mips_function_arg_boundary (machine_mode mode, const_tree type)
>>  {
>>unsigned int alignment;
>> -
>> -  alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode);
>> +  alignment = type && mode == BLKmode
>> +   ? TYPE_ALIGN (TYPE_MAIN_VARIANT (type))
>> +   : GET_MODE_ALIGNMENT (mode);
>>if (alignment < PARM_BOUNDARY)
>>  alignment = PARM_BOUNDARY;
>>if (alignment > STACK_BOUNDARY)
>
> We need to be careful of examples like:
>
>   struct __attribute__ ((aligned (8))) s { _Complex float x; };
>   void foo (struct s *ptr, struct s val) { *ptr = val; }
>
> "x" gets SCmode, which has an alignment of 4.  And it's OK for TYPE_MODE
> to have a smaller alignment than the type -- it's just not allowed to
> have a larger alignment (and even that restriction only applies because
> this is a STRICT_ALIGNMENT target).  So the structure itself inherits
> 

Re: [PATCH PR67921]Convert pointer expr to proper type before negating it

2016-02-01 Thread Richard Biener
On Fri, Jan 29, 2016 at 5:21 PM, Bin Cheng  wrote:
> Hi,
> Function fold_binary_loc calls split_tree to split a tree into constant, 
> literal and variable parts.  Function split_tree deals with minus_expr by 
> negating different parts into NEXGATE_EXPR.  Since tree exprs fed to 
> split_tree are with NOP conversions stripped, this could result in illegal 
> expr for pointer expressions.  Given below example as described by PR67921:
>   op0:  (4 - (sizetype) &c)
>   code: MINUS_EXPR
>   op1:  (sizetype)b
>
> fold_binary_loc calls split_tree for both op0 and op1 and gets below from the 
> function calls (it also flips the code):
>   op0:  4, -(sizetype)&c
>   code: PLUS_EXPR
>   op1:  -b
>
> Here we generate NEGATIVE_EXPR of pointer variable (b) which is illegal.  If 
> "-b" can not be canceled by following call to associate_trees, it will be 
> passed along in IR resulting in ICE somewhere.
>
> This patch fixes it by converting pointer expression to proper type before 
> negating it.  Note the proper type is the outer type stripped in 
> fold_binary_loc before calling split_tree.  I also included a test which is 
> heavily reduced from the original ffmpeg code in the original PR.  
> Considering it's stage4, I restricted the patch to the smallest change.  As a 
> matter of fact, we may need to do the same thing for signed int types because 
> -TYPE_MIN is undefined.  Unfortunately, I failed to create a test in this 
> case.
>
> Bootstrap and test on x64_64, is it OK?

Ok.

Thanks,
Richard.

> 2016-01-27  Bin Cheng  
>
> PR tree-optimization/67921
> * fold-const.c (split_tree): New parameters.  Convert pointer
> type variable part to proper type before negating.
> (fold_binary_loc): Pass new arguments to split_tree.
>
> gcc/testsuite/ChangeLog
> 2016-01-27  Bin Cheng  
>
> PR tree-optimization/67921
> * c-c++-common/ubsan/pr67921.c: New test.
>
>


Re: [PATCH] fix #69573 - FAIL: gcc.dg/pr61053.c

2016-02-01 Thread Marek Polacek
On Sat, Jan 30, 2016 at 07:11:14PM -0700, Martin Sebor wrote:
> While doing some research into the aligned attribute I came across
> c/61053 that had been reopened after the regression test added with
> the fix had started to fail on a couple of targets.  Since the bug
> is fixed and the test failure is due to the added test I resolved
> it as fixed and opened a new one for the test failure (to avoid
> giving the impression there's still a problem with _Alignas).
 
Right.  This is a testsuite-only issue; I've never gotten to fixing it.

> The attached patch should fix these outstanding test failures on
> the two targets (x86_64-apple-darwin15.3.0 and x86_64-w64-mingw32).
> I don't have access to either but I verified it on x86_64 by
> running the test with the following:
> 
> RUNTESTFLAGS='--target_board=unix/-m32/-m128bit-long-double
> dg.exp=pr61053.c'
 
Thanks for looking into this.  I guess I'm ok with this as long as the
test still passes on all non-weirdo targets ;).

Marek


[patch] libstdc++/69581 Don't define guard macros when doing #include_next in math.h and stdlib.h

2016-02-01 Thread Jonathan Wakely

The new  and  wrapper headers should not define
their guard macros when being included in "pass-through" mode, i.e.
when _GLIBCXX_INCLUDE_NEXT_C_HEADERS is defined. This patch allows the
wrapper to pass-through as many layers of wrapper headers as needed
until the real C library header is found.

Tested powerpc64-linux, committed to trunk.

commit c501dcd83400039b74f79f338580f25e148556ac
Author: Jonathan Wakely 
Date:   Mon Feb 1 10:53:35 2016 +

Don't define guard macros when doing #include_next in math.h and stdlib.h

2016-02-01  Bernd Edlinger  

	PR libstdc++/69581
	* include/c_compatibility/math.h: Move header guards.
	* include/c_compatibility/stdlib.h: Likewise.

diff --git a/libstdc++-v3/include/c_compatibility/math.h b/libstdc++-v3/include/c_compatibility/math.h
index 243e631..1f579ee 100644
--- a/libstdc++-v3/include/c_compatibility/math.h
+++ b/libstdc++-v3/include/c_compatibility/math.h
@@ -26,13 +26,13 @@
  *  This is a Standard C++ Library header.
  */
 
+#if !defined __cplusplus || defined _GLIBCXX_INCLUDE_NEXT_C_HEADERS
+# include_next 
+#else
 
 #ifndef _GLIBCXX_MATH_H
 #define _GLIBCXX_MATH_H 1
 
-#if !defined __cplusplus || defined _GLIBCXX_INCLUDE_NEXT_C_HEADERS
-# include_next 
-#else
 # include 
 
 using std::abs;
@@ -177,7 +177,5 @@ using std::sph_neumannl;
 using std::sph_neumann;
 #endif // __STDCPP_WANT_MATH_SPEC_FUNCS__
 
-#endif // __cplusplus
-
 #endif // _GLIBCXX_MATH_H
-
+#endif // __cplusplus
diff --git a/libstdc++-v3/include/c_compatibility/stdlib.h b/libstdc++-v3/include/c_compatibility/stdlib.h
index 31e7e5f..747ad76 100644
--- a/libstdc++-v3/include/c_compatibility/stdlib.h
+++ b/libstdc++-v3/include/c_compatibility/stdlib.h
@@ -26,12 +26,13 @@
  *  This is a Standard C++ Library header.
  */
 
-#ifndef _GLIBCXX_STDLIB_H
-#define _GLIBCXX_STDLIB_H 1
-
 #if !defined __cplusplus || defined _GLIBCXX_INCLUDE_NEXT_C_HEADERS
 # include_next 
 #else
+
+#ifndef _GLIBCXX_STDLIB_H
+#define _GLIBCXX_STDLIB_H 1
+
 # include 
 
 using std::abort;
@@ -81,5 +82,5 @@ using std::wctomb;
 #endif // _GLIBCXX_USE_WCHAR_T
 #endif
 
-#endif
-#endif
+#endif // _GLIBCXX_STDLIB_H
+#endif // __cplusplus


Re: [PATCH][ARM][0/4] Fixing PR target/65932

2016-02-01 Thread Kyrill Tkachov

Ping on the series:
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01717.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01716.html (already approved by 
Richi)
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01715.html

as well as
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
as described in the cover letter

Thanks,
Kyrill

On 22/01/16 09:52, Kyrill Tkachov wrote:

Hi all,

PR target/65932 is a wrong-code bug affecting arm and has manifested itself
when compiling the Linux kernel, so it's something that we really
ought to fix. The problem stems from the fact that PROMOTE_MODE and
TARGET_PROMOTE_FUNCTION_MODE don't match up on arm.
PROMOTE_MODE also marks the promotion as unsigned, whereas the
TARGET_PROMOTE_FUNCTION_MODE doesn't. This can lead to short variables
being wrongly zero-extended instead of sign-extended.  This also occurs
in PR target/67714.

Jim Wilson tried a few approaches and from the discussion
on the PR and on the ML 
(https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00814.html)
the preferred approach is to make PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE
match up. Changing TARGET_PROMOTE_FUNCTION_MODE to zero-extend would be an ABI
change so we don't want to do that.  Changing PROMOTE_MODE to not zero-extend
fixes both PR 65932 and 67714.  So Jim's patch is the first patch in this
series.

It has been posted at https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02132.html
and this series is based on top of the arm.h hunk of that patch.

There have been some concerns about the codegen quality fallout from Jim's
patch, which is what the remaining patches in this series address:

* 3 new arm testsuite failures: gcc.target/arm/wmul-[123].c.
wmul-1.c and wmul-2.c are cases where we no longer generate
sign-extend+multiply (and accumulate) instructions but instead generate
the normal full-width multiplies (the operands are sign-extended from
preceeding sign-extending loads).  This is a regression for some targets
on which the sign-extending form is faster. Patches 2 and 3 address this.
gcc.target/arm/wmul-3.c is a test where we actually end up generating
better code and so the testcase just needs to be adjusted.
Patch 4 deals with that.

* Sign-extending rather than zero-extending short values means we make more
use of the ldrsb and ldrsh arm instructions rather than the zero-extending
ldrb and ldrh.  On Thumb1 targets ldrsb and ldrsh have more limited addressing
modes (only REG + REG), which could hurt code size. However, the change also
means that we can now merge sequences of load-zero-extend followed by a 
sign-extend
into a single load-sign-extend.
So we'd turn a (ldrh ; sxth) into an (ldrsh).

I've built a few popular embedded benchmarks for a Cortex-M0 target (Thumb1) 
and looked
at the generated assembly for -Os and -O2. I did see both effects. That is,
ldrh instructions with an immediate being turned into two instructions:
ldrhr4, [r4, #12]
->
movsr5, #12
ldrshr4, [r4, r5]

But I also observed the beneficial effect. Various register allocation
perturbations also featured in the changes
Overall, for every codebase I've looked at both -Os and -O2 the new code was
slightly smaller. Probably not enough to call this an outright win, but
definitely not a regression overall.

As for ARM and Thumb2 states, this series doesn't have a major impact.
SPEC2006 bencharks are slightly reduced in size (but nothing to write home
about) and there are no code quality regressions. And even the regressions
caught by the testsuite in the wmul-[12].c tests don't actually manifest
in practice in SPEC, but they are fixed anyway.

The series contains one target-independent change to CSE in patch 3 that
I'll explain there.

The series has been bootstrapped and tested on arm, aarch64 and x86_64.
Is this ok for trunk together with Jim's arm.h hunk from
g ?

P.S. This also fixes PR middlen-end/67295 which is a testsuite regression on 
arm.
Refer to the bugzilla entry there for the analysis of how this issue affects
that PR.

Thanks,
Kyrill





Remove couple of dead lines in reload_cse_simplify

2016-02-01 Thread Eric Botcazou
They are not flagged by the compiler, albeit obviously dead, but static 
analyzers will probably not miss them.

Tested on x86_64-suse-linux, applied on the mainline.


2016-02-01  Eric Botcazou  

* postreload.c (reload_cse_simplify): Remove dead code.

-- 
Eric BotcazouIndex: postreload.c
===
--- postreload.c	(revision 232976)
+++ postreload.c	(working copy)
@@ -106,10 +106,6 @@ reload_cse_simplify (rtx_insn *insn, rtx
 
   if (!count && reload_cse_noop_set_p (body))
 	{
-	  rtx value = SET_DEST (body);
-	  if (REG_P (value)
-	  && ! REG_FUNCTION_VALUE_P (value))
-	value = 0;
 	  if (check_for_inc_dec (insn))
 	delete_insn_and_edges (insn);
 	  /* We're done with this insn.  */


RE: [Patch,microblaze]: Better register allocation to minimize the spill and fetch.

2016-02-01 Thread Ajit Kumar Agarwal


-Original Message-
From: Michael Eager [mailto:ea...@eagerm.com] 
Sent: Friday, January 29, 2016 11:33 PM
To: Ajit Kumar Agarwal; GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: Re: [Patch,microblaze]: Better register allocation to minimize the 
spill and fetch.

On 01/29/2016 02:31 AM, Ajit Kumar Agarwal wrote:
>
> This patch improves the allocation of registers in the given function. 
> The allocation is optimized for the conditional branches. The 
> temporary register used in the conditional branches to store the 
> comparison results and use of temporary in the conditional branch is 
> optimized. Such temporary registers are allocated with a fixed register r18.
>
> Currently such temporaries are allocated with a free registers in the 
> given function. Due to this one of the free register is reserved for 
> the temporaries and given function is left with a few registers. This 
> is unoptimized with respect to microblaze. In Microblaze r18 is marked as 
> fixed and cannot be allocated to pseudos'
> in the given function. Instead r18 can be used as a temporary for the 
> conditional branches with compare and branch. Use of r18 as a 
> temporary for conditional branches will save one of the free registers 
> to be allocated. The free registers can be used for other pseudos' and hence 
> the better register allocation.
>
> The usage of r18 as above reduces the spill and fetch because of the 
> availability of one of the free registers to other pseudos instead of 
> being used for conditional temporaries.
>
> The advantage of the above is that the scope of the temporaries is 
> limited to the conditional branches and hence the usage of r18 as 
> temporary for such conditional branches is optimized and preserve the 
> functionality of the function.
>
> Regtested for Microblaze target.
>
> Performance runs are done with Mibench/EEMBC benchmarks.
>
> Following gains are achieved.
>
> Benchmarks  Gains
>
> automotive_qsort1   1.630730524%
> network_dijkstra   1.527506256%
> office_stringsearch   1 1.81356288%
> security_rijndael_d   3.26129357%
> basefp01_lite  4.465120185%
> a2time01_lite  1.893862857%
> cjpeg_lite  3.286496675%
> djpeg_lite 3.120150612%
> qos_lite 2.63964381%
> office_ispell 1.531340405%
>
> Code Size improvements:
>
> Reduction in number of instructions for Mibench  :  12927.
> Reduction in number of instructions for EEMBC :   212.
>
> ChangeLog:
> 2016-01-29  Ajit Agarwal  
>
>   * config/microblaze/microblaze.c
>   (microblaze_expand_conditional_branch): Use of MB_ABI_ASM_TEMP_REGNUM
>   for temporary conditional branch.
>   (microblaze_expand_conditional_branch_reg): Use of 
> MB_ABI_ASM_TEMP_REGNUM
>   for temporary conditional branch.
>   (microblaze_expand_conditional_branch_sf): Use of MB_ABI_ASM_TEMP_REGNUM
>   for temporary conditional branch.

>>You can combine these ChangeLog entries:

 >> * config/microblaze/microblaze.c
  >>(microblaze_expand_conditional_branch, 
microblaze_expand_conditional_branch_reg,
>>microblaze_expand_conditional_branch_sf): Use MB_ABI_ASM_TEMP_REGNUM 
for temp reg.

Attached is the patch with update in the changeLog.  Modified changeLog is 
given below.

ChangeLog:
2016-01-29  Ajit Agarwal  

* config/microblaze/microblaze.c
(microblaze_expand_conditional_branch, 
microblaze_expand_conditional_branch_reg,
microblaze_expand_conditional_branch_sf): Use MB_ABI_ASM_TEMP_REGNUM 
for temp reg.

Thanks & Regards
Ajit

>>Otherwise, OK.

>
> Signed-off-by:Ajit Agarwal ajit...@xilinx.com.
> ---
>   gcc/config/microblaze/microblaze.c |6 +++---
>   1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/microblaze/microblaze.c 
> b/gcc/config/microblaze/microblaze.c
> index baff67a..b4277ad 100644
> --- a/gcc/config/microblaze/microblaze.c
> +++ b/gcc/config/microblaze/microblaze.c
> @@ -3402,7 +3402,7 @@ microblaze_expand_conditional_branch (machine_mode 
> mode, rtx operands[])
> rtx cmp_op0 = operands[1];
> rtx cmp_op1 = operands[2];
> rtx label1 = operands[3];
> -  rtx comp_reg = gen_reg_rtx (SImode);
> +  rtx comp_reg =  gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);
> rtx condition;
>
> gcc_assert ((GET_CODE (cmp_op0) == REG) || (GET_CODE (cmp_op0) == 
> SUBREG)); @@ -3439,7 +3439,7 @@ microblaze_expand_conditional_branch_reg 
> (enum machine_mode mode,
> rtx cmp_op0 = operands[1];
> rtx cmp_op1 = operands[2];
> rtx label1 = operands[3];
> -  rtx comp_reg = gen_reg_rtx (SImode);
> +  rtx comp_reg =  gen_rtx_REG (SImode, MB_ABI_ASM_TEMP_REGNUM);
> rtx condition;
>
> gcc_assert ((GET_CODE (cmp_op0) == REG) @@ -3483,7 +3483,7 @@ 
> microblaze_expand_conditional_branch_sf (rtx operands[])
> rtx condition;
> rtx cmp_op0 = XEXP (operands[0], 0);
>  

Re: [PATCH] Ensure noce_convert_multiple_sets handles only multiple sets (PR rtl-optimization/69570)

2016-02-01 Thread Steven Bosscher
On Mon, Feb 1, 2016 at 9:32 AM, Jakub Jelinek  wrote:
> Bootstrapped/regtested on 
> {x86_64,i686,ppc64,ppc64le,s390,s390x,aarch64}-linux,
> ok for trunk?

OK.

Browny points for opting out of the loop over all insns in the basic
block when count > limit.

Ciao!
Steven


[PATCH] Ensure noce_convert_multiple_sets handles only multiple sets (PR rtl-optimization/69570)

2016-02-01 Thread Jakub Jelinek
Hi!

While looking at this PR (which is most likely a reg-stack or RA bug
triggered by the ifcvt noce_convert_multiple_sets additions), I've noticed
that despite the "multiple_sets" in the name it actually attempts to handle
not just multiple sets, but also the single set case, which is already
handled by various calls later on noce_process_if_block.  Additionally,
it handles it worse than those calls we had for years, because it always
creates temporary pseudos to store result into and then at the end copy back
to the desired register.  If there is just a single set, this temporary is
unnecessary and unfortunately negatively affects RA (get larger code with
more spills/fills in *.reload/postreload).

So, I'd like to change noce_convert_multiple_sets to really apply to
multiple sets only.  While it makes the issue latent again (and I'll try to
analyze it), IMHO it is the right step forward.

Bootstrapped/regtested on {x86_64,i686,ppc64,ppc64le,s390,s390x,aarch64}-linux,
ok for trunk?

2016-02-01  Jakub Jelinek  

PR rtl-optimization/69570
* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Return true only
if there is more than one set, not if there is a single set.

* g++.dg/opt/pr69570.C: New test.

--- gcc/ifcvt.c.jj  2016-01-21 17:53:32.0 +0100
+++ gcc/ifcvt.c 2016-01-31 13:47:34.171323086 +0100
@@ -3295,7 +3295,7 @@ bb_ok_for_noce_convert_multiple_sets (ba
   if (count > limit)
 return false;
 
-  return count > 0;
+  return count > 1;
 }
 
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
--- gcc/testsuite/g++.dg/opt/pr69570.C.jj   2016-01-31 22:49:03.747216450 
+0100
+++ gcc/testsuite/g++.dg/opt/pr69570.C  2016-01-31 22:49:18.861009011 +0100
@@ -0,0 +1,70 @@
+// PR rtl-optimization/69570
+// { dg-do run }
+// { dg-options "-O2" }
+// { dg-additional-options "-fpic" { target fpic } }
+// { dg-additional-options "-march=i686" { target ia32 } }
+
+template  inline const T &
+min (const T &a, const T &b)
+{
+  if (b < a)
+return b;
+  return a;
+}
+
+template  inline const T &
+max (const T &a, const T &b)
+{
+  if (a < b)
+return b;
+  return a;
+}
+
+static inline void
+foo (unsigned x, unsigned y, unsigned z, double &h, double &s, double &l)
+{
+  double r = x / 255.0;
+  double g = y / 255.0;
+  double b = z / 255.0;
+  double m = max (r, max (g, b));
+  double n = min (r, min (g, b));
+  double d = m - n;
+  double e = m + n;
+  h = 0.0, s = 0.0, l = e / 2.0;
+  if (d > 0.0)
+{
+  s = l > 0.5 ? d / (2.0 - e) : d / e;
+  if (m == r && m != g)
+h = (g - b) / d + (g < b ? 6.0 : 0.0);
+  if (m == g && m != b)
+h = (b - r) / d + 2.0;
+  if (m == b && m != r)
+h = (r - g) / d + 4.0;
+  h /= 6.0;
+}
+}
+
+__attribute__ ((noinline, noclone))
+void bar (unsigned x[3], double y[3])
+{
+  double h, s, l;
+  foo (x[0], x[1], x[2], h, s, l);
+  y[0] = h;
+  y[1] = s;
+  y[2] = l;
+}
+
+int
+main ()
+{
+  unsigned x[3] = { 0, 128, 0 };
+  double y[3];
+
+  bar (x, y);
+  if (__builtin_fabs (y[0] - 0.3) > 0.001
+  || __builtin_fabs (y[1] - 1) > 0.001
+  || __builtin_fabs (y[2] - 0.25098) > 0.001)
+__builtin_abort ();
+
+  return 0;
+}

Jakub