Re: PR 68432: Add a target hook to control size/speed optab choices

2015-12-01 Thread Bernd Schmidt

On 12/01/2015 10:15 PM, Richard Sandiford wrote:

[This is a less invasive fix for the PR, without any changes to
  the .md attribute handling]


As a minimal fix I like this much better. I'll ok it under the condition 
that you have verified in all ports that size/speed issues are the only 
reasons for expanders that can be used for internal functions to FAIL. 
(I've now attempted to do this and didn't find anything relevant to the 
current set of internal functions, you can rely on that if you're 
willing to take the blame if it turns out not to have been exhaustive. 
umin and abs are potential candidates for trouble in the future.)


Also please make a followup patch to update the documentation in md.texi 
regarding when FAIL is allowed.



Bernd


Re: [PATCH] fix PR65726

2015-12-01 Thread Jeff Law

On 12/01/2015 02:50 PM, Andreas Tobler wrote:

On 30.11.15 23:30, Jeff Law wrote:

On 11/26/2015 11:49 AM, Andreas Tobler wrote:

Hi all,

the attached patch fixes the build issue from this ticket if bootstrap
is disabled.

Tested on x86_64-*-linux* and on x86_64-*-freebsd* with gcc and clang.

Ok for trunk?

And 5.3?

Thanks,
Andreas

2015-11-26  Andreas Tobler  

  PR libffi/65726
  * Makefile.def (lang_env_dependencies): Make libffi depend
  on cxx.
  * Makefile.in: Regenerate.


OK.



Thanks!

Committed to trunk. I'll wait till gcc5 opens again and then I'll commit to
gcc5 and gcc49, ok?

Seems reasonable.
jeff


Re: [patch] RFC asan support for i?86/x86_64-*freebsd*

2015-12-01 Thread Andreas Tobler

Hi!

On 01.12.15 13:22, Uros Bizjak wrote:


2015-11-29  Andreas Tobler  

* config/i386/i386.h: Define two new macros:
SUBTARGET_SHADOW_OFFSET_64 and SUBTARGET_SHADOW_OFFSET_32.
* config/i386/i386.c (ix86_asan_shadow_offset): Use these macros.
* config/i386/darwin.h: Override the SUBTARGET_SHADOW_OFFSET_64
macro.
* config/i386/freebsd.h: Override the SUBTARGET_SHADOW_OFFSET_64
and the SUBTARGET_SHADOW_OFFSET_32 macro.
* config/freebsd.h (LIBASAN_EARLY_SPEC): Define.
(LIBTSAN_EARLY_SPEC): Likewise.
(LIBLSAN_EARLY_SPEC): Likewise.


IMO, there is no compelling reason for _64 and _32 subtargets split,
especially since it depends on TARGET_LP64, not on the usual
TARGET_64BIT. Due to this, I'd rather introduce only
TARGET_SHADOW_OFFSET, like:

#define TARGET_SHADOW_OFFSET \
   (TARGET_LP64 ? HOST_WIDE_INT_C (0x7fff8000) : HOST_WIDE_INT_1 << 29)

(and similar for other targets).
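
For readers unfamiliar with what the offset is used for, here is a minimal sketch, assuming the usual AddressSanitizer mapping in which one shadow byte covers an 8-byte granule of application memory. The two offsets are the ones quoted above; the helper names shadow_offset and shadow_address are hypothetical and do not appear in GCC.

```cpp
#include <cstdint>

// Offsets quoted in the thread; everything else here is illustrative.
constexpr std::uint64_t shadow_offset_lp64 = 0x7fff8000ULL;  // HOST_WIDE_INT_C (0x7fff8000)
constexpr std::uint64_t shadow_offset_32 = 1ULL << 29;       // HOST_WIDE_INT_1 << 29

// Hypothetical helper mirroring the single-macro selection suggested above.
constexpr std::uint64_t
shadow_offset (bool lp64)
{
  return lp64 ? shadow_offset_lp64 : shadow_offset_32;
}

// Map an application address to its shadow byte: shift by 3 (8-byte
// granules), then add the target-specific offset.
constexpr std::uint64_t
shadow_address (std::uint64_t addr, bool lp64)
{
  return (addr >> 3) + shadow_offset (lp64);
}
```

With a single offset selected by TARGET_LP64, the rest of the address computation stays target-independent, which is why one macro is enough.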


Thank you for the feedback. I put your suggestion into my local tree and 
the results are equal. Thanks, diff reduced :)


Andreas



Re: [RFA] Compact EH Patch

2015-12-01 Thread Jason Merrill

On 11/25/2015 11:58 AM, Moore, Catherine wrote:




-----Original Message-----
From: Richard Henderson [mailto:r...@redhat.com]
Sent: Friday, September 18, 2015 3:25 PM
To: Moore, Catherine; gcc-patches@gcc.gnu.org
Cc: ja...@redhat.com; Matthew Fortune
Subject: Re: [RFA] Compact EH Patch


Index: libgcc/libgcc-std.ver.in
===
--- libgcc/libgcc-std.ver.in (revision 226409)
+++ libgcc/libgcc-std.ver.in (working copy)
@@ -1918,6 +1918,7 @@ GCC_4.6.0 {
__morestack_current_segment
__morestack_initial_sp
__splitstack_find
+  _Unwind_GetEhEncoding
  }

  %inherit GCC_4.7.0 GCC_4.6.0
@@ -1938,3 +1939,8 @@ GCC_4.7.0 {
  %inherit GCC_4.8.0 GCC_4.7.0
  GCC_4.8.0 {
  }
+
+%inherit GCC_4.8.0 GCC_4.7.0
+GCC_4.8.0 {
+  __register_frame_info_header_bases
+}


You can't push new symbols into old versions.  These have to go into the
version for the current gcc.


Index: libstdc++-v3/config/abi/pre/gnu.ver
===
--- libstdc++-v3/config/abi/pre/gnu.ver (revision 226409)
+++ libstdc++-v3/config/abi/pre/gnu.ver (working copy)
@@ -1909,6 +1909,7 @@ CXXABI_1.3 {
  __gxx_personality_v0;
  __gxx_personality_sj0;
  __gxx_personality_seh0;
+__gnu_compact_pr2;
  __dynamic_cast;

  # *_type_info classes, ctor and dtor
Index: libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
===
--- libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver (revision 226409)
+++ libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver (working copy)
@@ -200,6 +200,7 @@ CXXABI_2.0 {
  __cxa_vec_new;
  __gxx_personality_v0;
  __gxx_personality_sj0;
+__gnu_compact_pr2;
  __dynamic_cast;

  # std::exception_ptr


Likewise.


I'm getting ready to post the updates to this patch -- hopefully, I can still 
get it in GCC 6.0.
I'm not sure how to tell what the current CXXABI is for these two files.  
Should it be CXXABI_2.0 for both of these?


Jonathan, can you answer this question?

Jason



Re: PR68577: Handle narrowing for vector popcount, etc.

2015-12-01 Thread Richard Sandiford
Richard Biener  writes:
> On Tue, Dec 1, 2015 at 10:14 AM, Richard Sandiford
>  wrote:
>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> index 3b078da..af86bce 100644
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -2122,6 +2122,40 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
>>return true;
>>  }
>>
>> +/* Return true if vector type VECTYPE_OUT has integer elements and
>> +   if we can narrow two integer vectors with the same shape as
>> +   VECTYPE_IN to VECTYPE_OUT in a single step.  On success,
>> +   return the binary pack code in *CONVERT_CODE and the types
>> +   of the input vectors in *CONVERT_FROM.  */
>> +
>> +static bool
>> +simple_integer_narrowing (tree vectype_out, tree vectype_in,
>> + tree_code *convert_code, tree *convert_from)
>> +{
>> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_out)))
>> +return false;
>> +
>> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_in)))
>> +{
>> +  unsigned int bits
>> +   = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype_in)));
>> +  tree scalar_type = build_nonstandard_integer_type (bits, 0);
>> +  vectype_in = get_same_sized_vectype (scalar_type, vectype_in);
>> +}
>> +
>
> any reason for supporting non-integer types on the input?  It seems to me
> you are doing this for the lrint case?  If so isn't the "question" wrong and
> you should pass the integer type the IFN returns as vectype_in instead?
>
> That said, this conversion doesn't seem to belong to simple_integer_narrowing.
>
> The patch is ok with simply removing it.

OK, thanks, here's what I applied after retesting.

Richard


gcc/
PR tree-optimization/68577
* tree-vect-stmts.c (simple_integer_narrowing): New function.
(vectorizable_call): Restrict internal function handling
to NONE and NARROW cases, using simple_integer_narrowing
to test for the latter.  Add cost of narrowing operation
and insert it where necessary.

gcc/testsuite/
PR tree-optimization/68577
* gcc.dg/vect/pr68577.c: New test.
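
As a rough picture of the single-step narrowing that the ChangeLog refers to, the sketch below models a binary pack on plain arrays: two input vectors of wide integer elements are truncated and concatenated into one output vector with twice as many narrow elements. This is a toy model under that assumption, not the vectorizer code; the name pack_trunc and the element types are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy model of a single-step integer narrowing: two vectors of N wide
// elements become one vector of 2*N narrow elements.
template <std::size_t N>
std::array<std::int32_t, 2 * N>
pack_trunc (const std::array<std::int64_t, N> &lo,
            const std::array<std::int64_t, N> &hi)
{
  std::array<std::int32_t, 2 * N> out{};
  for (std::size_t i = 0; i < N; ++i)
    {
      // Elements of the first input fill the low half of the result,
      // elements of the second input fill the high half.
      out[i] = static_cast<std::int32_t> (lo[i]);
      out[N + i] = static_cast<std::int32_t> (hi[i]);
    }
  return out;
}
```

Because each pack consumes two input vectors, one narrowing operation is needed for every two vector copies, which matches the ncopies / 2 cost added in the patch.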

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2015-12-01 14:53:39.689404993 +
+++ gcc/tree-vect-stmts.c   2015-12-01 20:50:22.288498596 +
@@ -2140,6 +2140,31 @@ vectorizable_mask_load_store (gimple *st
   return true;
 }
 
+/* Return true if vector types VECTYPE_IN and VECTYPE_OUT have
+   integer elements and if we can narrow VECTYPE_IN to VECTYPE_OUT
+   in a single step.  On success, store the binary pack code in
+   *CONVERT_CODE.  */
+
+static bool
+simple_integer_narrowing (tree vectype_out, tree vectype_in,
+ tree_code *convert_code)
+{
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_out))
+  || !INTEGRAL_TYPE_P (TREE_TYPE (vectype_in)))
+return false;
+
+  tree_code code;
+  int multi_step_cvt = 0;
+  auto_vec <tree, 8> interm_types;
+  if (!supportable_narrowing_operation (NOP_EXPR, vectype_out, vectype_in,
+   &code, &multi_step_cvt,
+   &interm_types)
+  || multi_step_cvt)
+return false;
+
+  *convert_code = code;
+  return true;
+}
 
 /* Function vectorizable_call.
 
@@ -2306,7 +2331,12 @@ vectorizable_call (gimple *gs, gimple_st
   tree callee = gimple_call_fndecl (stmt);
 
   /* First try using an internal function.  */
-  if (cfn != CFN_LAST)
+  tree_code convert_code = ERROR_MARK;
+  if (cfn != CFN_LAST
+  && (modifier == NONE
+ || (modifier == NARROW
+ && simple_integer_narrowing (vectype_out, vectype_in,
+  &convert_code))))
 ifn = vectorizable_internal_function (cfn, callee, vectype_out,
  vectype_in);
 
@@ -2346,7 +2376,7 @@ vectorizable_call (gimple *gs, gimple_st
 
   if (slp_node || PURE_SLP_STMT (stmt_info))
 ncopies = 1;
-  else if (modifier == NARROW)
+  else if (modifier == NARROW && ifn == IFN_LAST)
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
   else
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
@@ -2362,6 +2392,10 @@ vectorizable_call (gimple *gs, gimple_st
 dump_printf_loc (MSG_NOTE, vect_location, "=== vectorizable_call ==="
  "\n");
   vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
+  if (ifn != IFN_LAST && modifier == NARROW && !slp_node)
+   add_stmt_cost (stmt_info->vinfo->target_cost_data, ncopies / 2,
+  vec_promote_demote, stmt_info, 0, vect_body);
+
   return true;
 }
 
@@ -2375,9 +2409,9 @@ vectorizable_call (gimple *gs, gimple_st
   vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
   prev_stmt_info = NULL;
-  switch (modifier)
+  if (modifier == NONE || ifn != IFN_LAST)
 {
-case NONE:
+  tree 

Re: [PATCH][PR tree-optimization/67816] Fix jump threading when DOM removes conditionals in jump threading path

2015-12-01 Thread Jeff Law

On 10/09/2015 09:45 AM, Jeff Law wrote:

Yes, but as you remove jump threading paths you could leave the CFG
change to
cfg-cleanup anyway?  To get better behavior wrt loop fixup at least?

So go ahead and detect, remove the threading paths, but leave final
fixup to cfg-cleanup.  I can certainly try that.

It'd actually be a good thing to experiment with regardless -- I've
speculated that removing the edges in DOM allows DOM to do a better job,
but never did the instrumentation to find out for sure.  Deferring the
final cleanup like you've suggested ought to give me most of what I'd
want to see if there's really any good secondary effects of cleaning up
the edges in DOM.
So I started looking at this in response to 68619, where this approach 
does indeed solve the problem.


Essentially DOM's optimization of those edges results in two irreducible 
loops becoming reducible.  The loop analysis code then complains because 
we don't have proper loop structures for the new natural loops.


Deferring to cfg_cleanup works because if cfg_cleanup does anything, it 
sets LOOPS_NEED_FIXUP (which we were trying to avoid in DOM).  So it 
seems that the gyrations we often do to avoid LOOPS_NEED_FIXUP are 
probably not all that valuable in the end.  Anyway...



There's some fallout which I'm still exploring.  For example, we have 
cases where removal of the edge by DOM results in removal of a PHI 
argument in the target, which in turn results in the PHI becoming a 
degenerate which we can then propagate away.  I have a possible solution 
for this that I'm playing with.
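
To make the fallout concrete, here is a small sketch of the PHI effect described above, on a toy data structure rather than GCC's gimple: removing an incoming edge drops the matching PHI argument, and if every remaining argument is the same value the PHI is degenerate and its result can be replaced by that value. All names (toy_phi, remove_edge_arg, degenerate_value) are hypothetical.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Toy PHI node: one (incoming edge id, value id) pair per predecessor.
struct toy_phi
{
  std::vector<std::pair<int, int> > args;
};

// Removing an incoming edge also removes its PHI argument.
void
remove_edge_arg (toy_phi &phi, int edge)
{
  phi.args.erase (std::remove_if (phi.args.begin (), phi.args.end (),
                                  [edge] (const std::pair<int, int> &a)
                                  { return a.first == edge; }),
                  phi.args.end ());
}

// Return true and store the common value in *VALUE if all remaining
// arguments agree, i.e. the PHI has become degenerate.
bool
degenerate_value (const toy_phi &phi, int *value)
{
  if (phi.args.empty ())
    return false;
  int v = phi.args.front ().second;
  for (const std::pair<int, int> &a : phi.args)
    if (a.second != v)
      return false;
  *value = v;
  return true;
}
```

If the edge removal is deferred to a later cleanup pass, the argument stays in place during DOM and the degenerate case is not visible at that point, which is the missed propagation mentioned above.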


I suspect the right thing to do is to continue down this path.

Jeff




RE: [RFA] Compact EH Patch

2015-12-01 Thread Moore, Catherine
Ping?

> -----Original Message-----
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Moore, Catherine
> Sent: Wednesday, November 25, 2015 11:58 AM
> To: Richard Henderson; gcc-patches@gcc.gnu.org
> Cc: ja...@redhat.com; Matthew Fortune
> Subject: RE: [RFA] Compact EH Patch
> 
> 
> 
> > -----Original Message-----
> > From: Richard Henderson [mailto:r...@redhat.com]
> > Sent: Friday, September 18, 2015 3:25 PM
> > To: Moore, Catherine; gcc-patches@gcc.gnu.org
> > Cc: ja...@redhat.com; Matthew Fortune
> > Subject: Re: [RFA] Compact EH Patch
> >
> > > Index: libgcc/libgcc-std.ver.in
> > > ===
> > > --- libgcc/libgcc-std.ver.in  (revision 226409)
> > > +++ libgcc/libgcc-std.ver.in  (working copy)
> > > @@ -1918,6 +1918,7 @@ GCC_4.6.0 {
> > >__morestack_current_segment
> > >__morestack_initial_sp
> > >__splitstack_find
> > > +  _Unwind_GetEhEncoding
> > >  }
> > >
> > >  %inherit GCC_4.7.0 GCC_4.6.0
> > > @@ -1938,3 +1939,8 @@ GCC_4.7.0 {
> > >  %inherit GCC_4.8.0 GCC_4.7.0
> > >  GCC_4.8.0 {
> > >  }
> > > +
> > > +%inherit GCC_4.8.0 GCC_4.7.0
> > > +GCC_4.8.0 {
> > > +  __register_frame_info_header_bases
> > > +}
> >
> > You can't push new symbols into old versions.  These have to go into
> > the version for the current gcc.
> >
> > > Index: libstdc++-v3/config/abi/pre/gnu.ver
> > > ===
> > > --- libstdc++-v3/config/abi/pre/gnu.ver   (revision 226409)
> > > +++ libstdc++-v3/config/abi/pre/gnu.ver   (working copy)
> > > @@ -1909,6 +1909,7 @@ CXXABI_1.3 {
> > >  __gxx_personality_v0;
> > >  __gxx_personality_sj0;
> > >  __gxx_personality_seh0;
> > > +__gnu_compact_pr2;
> > >  __dynamic_cast;
> > >
> > >  # *_type_info classes, ctor and dtor
> > > Index: libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver
> > > ===
> > > --- libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver (revision 226409)
> > > +++ libstdc++-v3/config/abi/pre/gnu-versioned-namespace.ver (working copy)
> > > @@ -200,6 +200,7 @@ CXXABI_2.0 {
> > >  __cxa_vec_new;
> > >  __gxx_personality_v0;
> > >  __gxx_personality_sj0;
> > > +__gnu_compact_pr2;
> > >  __dynamic_cast;
> > >
> > >  # std::exception_ptr
> >
> > Likewise.
> >
> I'm getting ready to post the updates to this patch -- hopefully, I can still
> get it in GCC 6.0.
> I'm not sure how to tell what the current CXXABI is for these two files.
> Should it be CXXABI_2.0 for both of these?
> Thanks,
> Catherine


[C PATCH] Fix up location used in get_parm_info diagnostics (PR c/68533)

2015-12-01 Thread Jakub Jelinek
Hi!

get_parm_info right now uses input_location as the diagnostics locus, but as
can be seen on the testcase, that is a pretty random location at that point,
often the type of the last parameter.

This patch changes it to use the locus from the binding info.
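
The difference can be sketched with a toy model, assuming only that each binding records where it was declared and that a separate global follows the parser position. None of these names exist in GCC; toy_input_location merely plays the role of input_location.

```cpp
#include <string>

typedef int toy_location;   // stand-in for GCC's location_t

// Hypothetical global playing the role of input_location: it follows
// the parser, not the declaration being diagnosed.
static toy_location toy_input_location = 0;

// Toy binding that remembers where the declaration was written.
struct toy_binding { std::string name; toy_location locus; };

// Toy diagnostic record.
struct toy_diag { toy_location where; std::string text; };

// Old behaviour: report wherever the parser currently is.
toy_diag
warn_at_input_location (const toy_binding &b)
{
  return toy_diag { toy_input_location,
                    b.name + " declared inside parameter list" };
}

// New behaviour: report at the binding's own locus.
toy_diag
warn_at_binding (const toy_binding &b)
{
  return toy_diag { b.locus, b.name + " declared inside parameter list" };
}
```

By the time the whole parameter list has been parsed, the global position has moved past the declaration, so only the second form points at the offending declaration.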

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-01  Jakub Jelinek  

PR c/68533
* c-decl.c (get_parm_info): Use b->locus instead of input_location
for diagnostics.

* gcc.dg/pr68533.c: New test.

--- gcc/c/c-decl.c.jj   2015-11-30 13:40:35.0 +0100
+++ gcc/c/c-decl.c  2015-12-01 16:14:40.466462666 +0100
@@ -6913,11 +6913,11 @@ get_parm_info (bool ellipsis, tree expr)
 {
   if (TYPE_QUALS (TREE_TYPE (b->decl)) != TYPE_UNQUALIFIED
  || C_DECL_REGISTER (b->decl))
-   error ("%<void%> as only parameter may not be qualified");
+   error_at (b->locus, "%<void%> as only parameter may not be qualified");
 
   /* There cannot be an ellipsis.  */
   if (ellipsis)
-   error ("%<void%> must be the only parameter");
+   error_at (b->locus, "%<void%> must be the only parameter");
 
   arg_info->types = void_list_node;
   return arg_info;
@@ -6946,13 +6946,14 @@ get_parm_info (bool ellipsis, tree expr)
 
  /* Check for forward decls that never got their actual decl.  */
  if (TREE_ASM_WRITTEN (decl))
-   error ("parameter %q+D has just a forward declaration", decl);
+   error_at (b->locus,
+ "parameter %q+D has just a forward declaration", decl);
  /* Check for (..., void, ...) and issue an error.  */
  else if (VOID_TYPE_P (type) && !DECL_NAME (decl))
{
  if (!gave_void_only_once_err)
{
- error ("%<void%> must be the only parameter");
+ error_at (b->locus, "%<void%> must be the only parameter");
  gave_void_only_once_err = true;
}
}
@@ -6991,13 +6992,13 @@ get_parm_info (bool ellipsis, tree expr)
{
  if (b->id)
/* The %s will be one of 'struct', 'union', or 'enum'.  */
-   warning_at (input_location, 0,
+   warning_at (b->locus, 0,
"%<%s %E%> declared inside parameter list"
" will not be visible outside of this definition or"
" declaration", keyword, b->id);
  else
/* The %s will be one of 'struct', 'union', or 'enum'.  */
-   warning_at (input_location, 0,
+   warning_at (b->locus, 0,
"anonymous %s declared inside parameter list"
" will not be visible outside of this definition or"
" declaration", keyword);
--- gcc/testsuite/gcc.dg/pr68533.c.jj   2015-12-01 18:47:55.864178841 +0100
+++ gcc/testsuite/gcc.dg/pr68533.c  2015-12-01 18:48:56.0 +0100
@@ -0,0 +1,68 @@
+/* PR c/68533 */
+/* { dg-do compile } */
+/* { dg-options "" } */
+
+struct T { int t; };
+
+void
+f1 (
+  struct S *   /* { dg-warning "declared inside parameter list will not be visible outside of this definition or declaration" } */
+  x,
+  struct T *
+  y
+   )
+{
+  y->t = 4;
+}
+
+void
+f2 (
+  struct {int s;} * /* { dg-warning "anonymous struct declared inside parameter list will not be visible outside of this definition or declaration" } */
+  x,
+  struct T *
+  y
+   )
+{
+  y->t = 5;
+}
+
+void
+f3 (
+  const void
+   )   /* { dg-error "'void' as only parameter may not be qualified" } */
+{
+}
+
+void
+f4 (
+   void,   /* { dg-error "'void' must be the only parameter" } */
+   ...
+   )
+{
+}
+
+void
+f5 (
+   int
+   x;  /* { dg-error "parameter 'x' has just a forward declaration" } */
+   int y
+   )
+{
+}
+
+void
+f6 (
+   int
+   x,
+   void
+   )   /* { dg-error "'void' must be the only parameter" } */
+{
+}
+
+void
+f7 (
+   void,   /* { dg-error "'void' must be the only parameter" } */
+   int y
+   )
+{
+}

Jakub


PR 68432: Add a target hook to control size/speed optab choices

2015-12-01 Thread Richard Sandiford
[This is a less invasive fix for the PR, without any changes to
 the .md attribute handling]

The problem in the PR is that some i386 optabs FAIL when
optimising for size rather than speed.  The gimple level generally
needs access to this information before calling the generator,
so this patch adds a new hook to say whether an optab should
be used when optimising for size or speed.  It also has a "both"
option for cases where we want code that is optimised for both
size and speed.

I've passed the optab to the target hook because I think in most
cases that's more useful than the instruction code.  We could pass
both if there's a use for it though.

At the moment the match-and-simplify code doesn't have direct access
to the target block, so for now I've used "both" there.
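
A minimal sketch of the idea, assuming a toy optab enumeration: the enum optimization_type matches the one the patch adds to coretypes.h, while toy_optab, toy_optab_supported_p and toy_pick_type are hypothetical stand-ins and not the hook's real signature.

```cpp
// Mirrors the enum added to coretypes.h by the patch.
enum optimization_type
{
  OPTIMIZE_FOR_SPEED,
  OPTIMIZE_FOR_BOTH,
  OPTIMIZE_FOR_SIZE
};

// Hypothetical optabs, for illustration only.
enum toy_optab { toy_add_optab, toy_asin_optab };

// Toy hook: a bulky inline expansion is only worth using when speed is
// the sole priority; everything else is always acceptable.
bool
toy_optab_supported_p (toy_optab op, optimization_type opt_type)
{
  switch (op)
    {
    case toy_asin_optab:
      return opt_type == OPTIMIZE_FOR_SPEED;
    default:
      return true;
    }
}

// Callers that do not know which block they are optimizing (a null
// pointer here) fall back to the conservative "both" setting.
optimization_type
toy_pick_type (const bool *block_optimized_for_size)
{
  if (!block_optimized_for_size)
    return OPTIMIZE_FOR_BOTH;
  return *block_optimized_for_size ? OPTIMIZE_FOR_SIZE : OPTIMIZE_FOR_SPEED;
}
```

The point of asking before generating code is that the caller gets a yes/no answer up front, instead of discovering at expand time that the pattern FAILs.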

Tested on x86_64-linux-gnu and powerpc64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/68432
* coretypes.h (optimization_type): New enum.
* doc/tm.texi.in (TARGET_OPTAB_SUPPORTED_P): New hook.
* doc/tm.texi: Regenerate.
* target.def (optab_supported_p): New hook.
* targhooks.h (default_optab_supported_p): Declare.
* targhooks.c (default_optab_supported_p): New function.
* predict.h (function_optimization_type): Declare.
(bb_optimization_type): Likewise.
* predict.c (function_optimization_type): New function.
(bb_optimization_type): Likewise.
* optabs-query.h (convert_optab_handler): Define an overload
that takes an optimization type.
(direct_optab_handler): Likewise.
* optabs-query.c (convert_optab_handler): Likewise.
(direct_optab_handler): Likewise.
* internal-fn.h (direct_internal_fn_supported_p): Take an
optimization_type argument.
* internal-fn.c (direct_optab_supported_p): Likewise.
(multi_vector_optab_supported_p): Likewise.
(direct_internal_fn_supported_p): Likewise.
* builtins.c (replacement_internal_fn): Update call to
direct_internal_fn_supported_p.
* gimple-match-head.c (build_call_internal): Likewise.
* tree-vect-patterns.c (vect_recog_pow_pattern): Likewise.
* tree-vect-stmts.c (vectorizable_internal_function): Likewise.
* tree.c (maybe_build_call_expr_loc): Likewise.
* config/i386/i386.c (ix86_optab_supported_p): New function.
(TARGET_OPTAB_SUPPORTED_P): Define.
* config/i386/i386.md (asinxf2): Remove optimize_insn_for_size_p check.
(asin2, acosxf2, acos2, log1pxf2, log1p2)
(expNcorexf3, expxf2, exp2, exp10xf2, exp102, exp2xf2)
(exp22, expm1xf2, expm12, ldexpxf3, ldexp3)
(scalbxf3, scalb3, rint2, round2)
(xf2, 2): Likewise.

gcc/testsuite/
* gcc.target/i386/pr68432-1.c: New test.
* gcc.target/i386/pr68432-2.c: Likewise.
* gcc.target/i386/pr68432-3.c: Likewise.

Index: gcc/coretypes.h
===
--- gcc/coretypes.h 2015-11-17 18:37:24.589012971 +
+++ gcc/coretypes.h 2015-12-01 21:11:45.093955446 +
@@ -200,6 +200,18 @@ enum node_frequency {
   NODE_FREQUENCY_HOT
 };
 
+/* Ways of optimizing code.  */
+enum optimization_type {
+  /* Prioritize speed over size.  */
+  OPTIMIZE_FOR_SPEED,
+
+  /* Only do things that are good for both size and speed.  */
+  OPTIMIZE_FOR_BOTH,
+
+  /* Prioritize size over speed.  */
+  OPTIMIZE_FOR_SIZE
+};
+
 /* Possible initialization status of a variable.   When requested
by the user, this information is tracked and recorded in the DWARF
debug information, along with the variable's location.  */
Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in  2015-11-17 18:55:03.697205226 +
+++ gcc/doc/tm.texi.in  2015-12-01 21:11:45.101955355 +
@@ -4746,6 +4746,8 @@ Define this macro if a non-short-circuit
 @code{BRANCH_COST} is greater than or equal to the value 2.
 @end defmac
 
+@hook TARGET_OPTAB_SUPPORTED_P
+
 @hook TARGET_RTX_COSTS
 
 @hook TARGET_ADDRESS_COST
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2015-12-01 09:29:35.764236262 +
+++ gcc/doc/tm.texi 2015-12-01 21:11:45.097955400 +
@@ -6425,6 +6425,20 @@ Define this macro if a non-short-circuit
 @code{BRANCH_COST} is greater than or equal to the value 2.
 @end defmac
 
+@deftypefn {Target Hook} bool TARGET_OPTAB_SUPPORTED_P (int @var{op}, machine_mode @var{mode1}, machine_mode @var{mode2}, optimization_type @var{opt_type})
+Return true if the optimizers should use optab @var{op} with
+modes @var{mode1} and @var{mode2} for optimization type @var{opt_type}.
+The optab is known to have an associated @file{.md} instruction
+whose C condition is true.  @var{mode2} is only meaningful for conversion
+optabs; for direct optabs it is a copy of @var{mode1}.
+
+For example, when called with 

[PTX] uninitialized decls

2015-12-01 Thread Nathan Sidwell

This patch removes some more code duplication. ASM_OUTPUT_ALIGNED_DECL_COMMON &
ASM_OUTPUT_ALIGNED_DECL_LOCAL had virtually identical definitions, so I forwarded 
them both to a new helper function.  I noticed that:


(a) a common decl could use .weak, which is closer to common semantics than a 
regular visible decl.


(b) local decls were being exported with a .visible

While there, I introduced 2 new helper functions to emit the linker marker 
comments and adjusted code to use those two helpers.


nathan
2015-12-01  Nathan Sidwell  

	gcc/
	* config/nvptx/nvptx-protos.h (nvptx_output_aligned_decl): Declare.
	* config/nvptx/nvptx.h (ASM_OUTPUT_ALIGNED_DECL_COMMON,
	ASM_OUTPUT_ALIGNED_DECL_LOCAL): Forward to nvptx_output_aligned_decl.
	* config/nvptx/nvptx.c (write_fn_marker, write_var_marker): New.
	(write_fn_proto, write_fn_proto_from_insn): Call write_fn_marker.
	(init_output_initializer): Call write_var_marker.
	(nvptx_output_aligned_decl): New.
	(nvptx_assemble_undefined_decl, nvptx_file_end): Call write_var_marker.

	gcc/testsuite/
	* gcc.target/nvptx/uninit-decl.c: New.

Index: config/nvptx/nvptx-protos.h
===
--- config/nvptx/nvptx-protos.h	(revision 231126)
+++ config/nvptx/nvptx-protos.h	(working copy)
@@ -24,11 +24,13 @@
 extern void nvptx_declare_function_name (FILE *, const char *, const_tree decl);
 extern void nvptx_declare_object_name (FILE *file, const char *name,
    const_tree decl);
+extern void nvptx_output_aligned_decl (FILE *file, const char *name,
+   const_tree decl,
+   HOST_WIDE_INT size, unsigned align);
 extern void nvptx_function_end (FILE *);
 extern void nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT);
 extern void nvptx_output_ascii (FILE *, const char *, unsigned HOST_WIDE_INT);
 extern void nvptx_register_pragmas (void);
-extern const char *nvptx_section_for_decl (const_tree);
 
 #ifdef RTX_CODE
 extern void nvptx_expand_oacc_fork (unsigned);
Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231126)
+++ config/nvptx/nvptx.c	(working copy)
@@ -366,6 +366,31 @@ write_as_kernel (tree attrs)
 	  || lookup_attribute ("omp target entrypoint", attrs) != NULL_TREE);
 }
 
+/* Emit a linker marker for a function decl or defn.  */
+
+static void
+write_fn_marker (std::stringstream &s, bool is_defn, bool globalize,
+		 const char *name)
+{
+  s << "\n// BEGIN";
+  if (globalize)
+s << " GLOBAL";
+  s << " FUNCTION " << (is_defn ? "DEF: " : "DECL: ");
+  s << name << "\n";
+}
+
+/* Emit a linker marker for a variable decl or defn.  */
+
+static void
+write_var_marker (FILE *file, bool is_defn, bool globalize, const char *name)
+{
+  fprintf (file, "\n// BEGIN%s VAR %s: ",
+	   globalize ? " GLOBAL" : "",
+	   is_defn ? "DEF" : "DECL");
+  assemble_name_raw (file, name);
+  fputs ("\n", file);
+}
+
 /* Write a .func or .kernel declaration or definition along with
a helper comment for use by ld.  S is the stream to write to, DECL
the decl for the function with name NAME.   For definitions, emit
@@ -386,11 +411,7 @@ write_fn_proto (std::stringstream &s, bo
 	name++;
 }
 
-  /* Emit the linker marker.  */
-  s << "\n// BEGIN";
-  if (TREE_PUBLIC (decl))
-s << " GLOBAL";
-  s << " FUNCTION " << (is_defn ? "DEF" : "DECL") << ": " << name << "\n";
+  write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
 
   /* PTX declaration.  */
   if (DECL_EXTERNAL (decl))
@@ -500,7 +521,7 @@ write_fn_proto_from_insn (std::stringstr
   else
 {
   name = nvptx_name_replacement (name);
-  s << "\n// BEGIN GLOBAL FUNCTION DECL: " << name << "\n";
+  write_fn_marker (s, false, true, name);
   s << "\t.extern .func ";
 }
 
@@ -1638,9 +1659,7 @@ static void
 init_output_initializer (FILE *file, const char *name, const_tree type,
 			 bool is_public)
 {
-  fprintf (file, "\n// BEGIN%s VAR DEF: ", is_public ? " GLOBAL" : "");
-  assemble_name_raw (file, name);
-  fputc ('\n', file);
+  write_var_marker (file, true, is_public, name);
 
   if (TREE_CODE (type) == ARRAY_TYPE)
 type = TREE_TYPE (type);
@@ -1658,6 +1677,27 @@ init_output_initializer (FILE *file, con
   object_finished = false;
 }
 
+/* Output an uninitialized common or file-scope variable.  */
+
+void
+nvptx_output_aligned_decl (FILE *file, const char *name,
+			   const_tree decl, HOST_WIDE_INT size, unsigned align)
+{
+  write_var_marker (file, true, TREE_PUBLIC (decl), name);
+
+  /* If this is public, it is common.  The nearest thing we have to
+ common is weak.  */
+  if (TREE_PUBLIC (decl))
+fprintf (file, ".weak ");
+
+  const char *sec = nvptx_section_for_decl (decl);
+  fprintf (file, "%s.align %d .b8 ", sec, align / BITS_PER_UNIT);
+  assemble_name (file, name);
+  if (size > 0)
+fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC"]", size);
+  fprintf (file, ";\n");
+}
+
 /* Implement 

Re: -fstrict-aliasing fixes 1/5: propagate -fno-strict-aliasing in the inliner

2015-12-01 Thread Bernhard Reutner-Fischer
On December 1, 2015 12:05:39 AM GMT+01:00, Jan Hubicka  wrote:
>Hi,
>this is the first patch in the broken up series.  It adds the logic into
>ipa-inline-transform to drop the flag when inlining.  I do it always
>until
>we find a way to make early optimizations safe WRT this transform.
>
>The testcase triggers with GCC 5.0/4.9 too, older compilers pass if
>-fstrict-aliasing is used at linktime and fails otherwise.
>
>Bootstrapped/regtested x86_64-linux, will commit it after re-testing on
>Firefox.
>
>Honza
>
>   * ipa-inline-transform.c (inline_call): Drop -fstrict-aliasing when
>   inlining -fno-strict-aliasing into -fstrict-aliasing body.
>   * gcc.dg/lto/alias-1_0.c: New testcase.
>   * gcc.dg/lto/alias-1_1.c: New testcase.
>Index: ipa-inline-transform.c
>===
>--- ipa-inline-transform.c (revision 231081)
>+++ ipa-inline-transform.c (working copy)
>@@ -322,6 +322,21 @@ inline_call (struct cgraph_edge *e, bool
>   if (DECL_FUNCTION_PERSONALITY (callee->decl))
> DECL_FUNCTION_PERSONALITY (to->decl)
>   = DECL_FUNCTION_PERSONALITY (callee->decl);
>+  if (!opt_for_fn (callee->decl, flag_strict_aliasing)
>+  && opt_for_fn (to->decl, flag_strict_aliasing))

Just curious why you don't handle the other way round?

>+{
>+  struct gcc_options opts = global_options;
>+
>+  cl_optimization_restore (&opts,
>+   TREE_OPTIMIZATION (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)));
>+  opts.x_flag_strict_aliasing = false;
>+  if (dump_file)
>+  fprintf (dump_file, "Dropping flag_strict_aliasing on %s:%i\n",
>+   to->name (), to->order);

ISTR to have seen %s/%i for printing name and order in IPA, no?

>+  build_optimization_node (&opts);
>+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)
>+   = build_optimization_node (&opts);
>+}
> 
>/* If aliases are involved, redirect edge to the actual destination and
>  possibly remove the aliases.  */
>Index: testsuite/gcc.dg/lto/alias-1_0.c
>===
>--- testsuite/gcc.dg/lto/alias-1_0.c   (revision 0)
>+++ testsuite/gcc.dg/lto/alias-1_0.c   (revision 0)
>@@ -0,0 +1,23 @@
>+/* { dg-lto-do run } */
>+/* { dg-lto-options { { -O2 -flto } } } */
>+int val;
>+
>+__attribute__ ((used))
>+int *ptr = &val;
>+__attribute__ ((used))
>+float *ptr2 = (void *)&val;
>+
>+extern void typefun(float val);
>+
>+void link_error (void);

Unused and unneeded forward decl?

Thanks,
>+
>+int
>+main()
>+{ 
>+  *ptr=1;
>+  typefun (0);
>+  if (*ptr)
>+__builtin_abort ();
>+  return 0;
>+}
>+
>Index: testsuite/gcc.dg/lto/alias-1_1.c
>===
>--- testsuite/gcc.dg/lto/alias-1_1.c   (revision 0)
>+++ testsuite/gcc.dg/lto/alias-1_1.c   (revision 0)
>@@ -0,0 +1,7 @@
>+/* { dg-options "-fno-strict-aliasing" } */
>+extern float *ptr2;
>+void
>+typefun (float val)
>+{ 
>+  *ptr2=val;
>+}




Re: [patch] add ELFv2 check to FreeBSD PowerPC64

2015-12-01 Thread Andreas Tobler

On 29.11.15 22:15, Andreas Tobler wrote:

Hi all,

I'd like to commit this patch to trunk.

It is FreeBSD only.

If nobody objects I'll commit it within two days.

Thanks,

Andreas

2015-11-29  Andreas Tobler  

* config/rs6000/freebsd64.h (ELFv2_ABI_CHECK): Add new macro.
(SUBSUBTARGET_OVERRIDE_OPTIONS): Use it to decide whether to set
rs6000_current_abi to ABI_AIX or ABI_ELFv2.




Committed to trunk. I'll wait till gcc5 opens again and then I'll 
backport to gcc5 and gcc49.


Thanks,
Andreas


Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread Steve Kargl
On Tue, Dec 01, 2015 at 12:58:28PM -0500, David Malcolm wrote:
> On Tue, 2015-12-01 at 18:51 +0100, Bernhard Reutner-Fischer wrote:
> > As said, we could as well use a list of candidates with NULL as record 
> > marker.
> > Implementation cosmetics. Steve seems to not be thrilled by the
> > overall idea in the first place, so unless there is clear support by
> > somebody else i won't pursue this any further, it's not that i'm bored
> > or ran out of stuff i should do.. ;)
> 
> (FWIW I liked the idea, but I'm not a Fortran person so my opinion
> counts much less than Steve's)
> 

Your opinion is as valid as mine.

My only concern is code maintenance.  Injection of C++ (or any
other language) into C code seems to add possible complications
when something needs to be fixed or changed to accommodate a new
Fortran feature.

-- 
Steve


Re: RFD: annotate iterator patterns with expanded forms

2015-12-01 Thread Bernd Schmidt

On 12/01/2015 04:31 PM, Bernd Schmidt wrote:

On 12/01/2015 04:23 PM, Jakub Jelinek wrote:

With the comments in the *.md file I'd worry about them getting out of
date,
or people feeling they have to edit them manually (rather than being
regenerated or whatever).


I suppose we could have a Makefile rule that checks for out-of-date
comments (by redoing the annotation and running diff). That would also
alleviate the second worry.

I'd much prefer the original source files to be searchable, because if I
want to make modifications, I can't make them in tmp-mddump.md and going
back and forth between two files is just inconvenient.


The automatic Makefile approach might look something like this. The 
effect is similar to what happens when you edit tm.texi.in, except the 
build would not be interrupted every time, only when you modify the 
iterator expansion of a pattern. There's a new rtx code which can be put 
into a machine description to enable this feature.


This could be further tweaked to make (enable_auto_annotate) 
push/poppable; I could imagine a world where we'd want it enabled for 
i386.md but not for sse.md. Another tweak might be to have every line 
marked as "--GEN--" both for clarity and for robustifying the part of 
the script that removes the previous version of the annotations.


Thoughts?


Bernd


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index bee2879..ad4101d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2189,12 +2189,20 @@ s-conditions: $(MD_DEPS) build/genconditions$(build_exeext)
 	$(SHELL) $(srcdir)/../move-if-change tmp-condmd.c build/gencondmd.c
 	$(STAMP) s-conditions
 
-insn-conditions.md: s-condmd; @true
+insn-conditions.md: s-condmd s-annotations; @true
 s-condmd: build/gencondmd$(build_exeext)
 	$(RUN_GEN) build/gencondmd$(build_exeext) > tmp-cond.md
 	$(SHELL) $(srcdir)/../move-if-change tmp-cond.md insn-conditions.md
 	$(STAMP) s-condmd
 
+build/genannotation.ed: s-annotations; @true
+s-annotations: $(MD_DEPS) build/genannotations$(build_exeext)
+	$(RUN_GEN) build/genannotations$(build_exeext) $(md_file) \
+		> tmp-annotate.ed
+	$(SHELL) $(srcdir)/../move-if-change tmp-annotate.ed \
+		build/genannotation.ed
+	$(SHELL) $(srcdir)/annotate-md.sh build/genannotation.ed
+	$(STAMP) s-annotations
 
 # These files are generated by running the same generator more than
 # once with different options, so they have custom rules.  The
@@ -2577,6 +2585,9 @@ build/gengtype.o: $(BCONFIG_H)
 
 CFLAGS-errors.o += -DHOST_GENERATOR_FILE
 
+build/genannotations.o : genannotations.c $(RTL_BASE_H) $(BCONFIG_H)	\
+  $(SYSTEM_H) coretypes.h $(GTM_H) errors.h $(READ_MD_H) gensupport.h	\
+  $(OBSTACK_H)
 build/genmddeps.o: genmddeps.c $(BCONFIG_H) $(SYSTEM_H) coretypes.h	\
   errors.h $(READ_MD_H)
 build/genmodes.o : genmodes.c $(BCONFIG_H) $(SYSTEM_H) errors.h		\
@@ -2608,8 +2619,10 @@ build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H)	\
 # even if GCC is being compiled to run on some other machine.
 
 # All these programs use the RTL reader ($(BUILD_RTL)).
-genprogrtl = attr attr-common attrtab automata codes conditions config emit \
-	 extract flags mddump opinit output peep preds recog target-def
+genprogrtl = annotations attr attr-common attrtab automata codes conditions \
+	 config emit extract flags mddump opinit output peep preds recog \
+	 target-def
+
 $(genprogrtl:%=build/gen%$(build_exeext)): $(BUILD_RTL)
 
 # All these programs use the MD reader ($(BUILD_MD)).
diff --git a/gcc/annotate-md.sh b/gcc/annotate-md.sh
new file mode 100644
index 000..7c21eec
--- /dev/null
+++ b/gcc/annotate-md.sh
@@ -0,0 +1,46 @@
+#! /bin/sh -x
+
+# Copyright (C) 2015 Free Software Foundation, Inc.
+# This file is part of GCC.
+
+# GCC is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3, or (at your option)
+# any later version.
+
+# GCC is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# Take a file generated by genannotations.c and verify the annotations
+# in the .md files.
+
+mkdir -p annotated || exit 1
+fail=false
+FILES=`cat $1 |sed s,':.*$,,'|sort|uniq`
+for f in $FILES; do
+fn=`basename $f`
+sed '/.GEN. Expands.to/,/^[^;]/s/^;;/;; AUTOGEN-DELETE/' < $f > annotated/tmpfile
+grep $f $1 |sed 's,^[^:]*:,,'| sort -r -n |sed 's,\\n,\n,g' >tmp-annotate
+echo "w" >>tmp-annotate
+echo "q" >>tmp-annotate
+ed annotated/tmpfile < tmp-annotate >/dev/null 2>&1
+grep -v AUTOGEN-DELETE < annotated/tmpfile >annotated/$fn
+if ! cmp $f annotated/$fn >/dev/null 2>&1 ; then
+	

Re: [PATCH] fix PR65726

2015-12-01 Thread Andreas Tobler

On 30.11.15 23:30, Jeff Law wrote:

On 11/26/2015 11:49 AM, Andreas Tobler wrote:

Hi all,

the attached patch fixes the build issue from this ticket if bootstrap
is disabled.

Tested on x86_64-*-linux* and on x86_64-*-freebsd* with gcc and clang.

Ok for trunk?

And 5.3?

Thanks,
Andreas

2015-11-26  Andreas Tobler  

  PR libffi/65726
  * Makefile.def (lang_env_dependencies): Make libffi depend
  on cxx.
  * Makefile.in: Regenerate.


OK.



Thanks!

Committed to trunk. I'll wait till gcc5 opens again and then I commit to 
gcc5 and gcc49, ok?


Andreas



Re: [PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-01 Thread Jeff Law

On 12/01/2015 11:33 AM, Alan Lawrence wrote:


I was not able to reduce the testcase below about 30k characters, with e.g.
#define T_VOID 0
 T_VOID 
producing the ICE, but manually changing to
 0 
preventing the ICE; as did running the preprocessor as a separate step, or a
wide variety of options (e.g. -fdump-tree-alias).

Which is almost always an indication that there's a memory corruption, 
or uninitialized memory read or something similar.





In the end I traced this to loop_unswitch reading stale values from the edge
redirect map, which is keyed on 'edge' (a pointer to struct edge_def); the map
entries had been left there by pass_dominator (on a different function), and by
"chance" the edge *pointers* were the same as to some current edge_defs (even
though they pointed to structures created by different allocations, the first
of which had since been freed). Hence the fragility of the testcase and
environment.

Right.  So the question I have is how/why did DOM leave anything in the 
map.   And if DOM is fixed to not leave stuff lying around, can we then 
assert that nothing is ever left in those maps between passes?  There's 
certainly no good reason I'm aware of why DOM would leave things in this 
state.
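
To make the failure mode above concrete, here is a minimal sketch of a side table keyed on a pointer that outlives the object it was keyed on.  Everything in it (struct edge, map_put, map_get, edge_alloc, stale_lookup) is hypothetical and only stands in for the real edge redirect map; the recycled storage slot is an assumption used to make the address reuse deterministic, whereas in the report it happened by chance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an edge; this is not GCC's struct edge_def.  */
struct edge { int src, dest; };

/* A tiny side table keyed on the edge *pointer*.  */
#define MAP_SIZE 8
static struct { const struct edge *key; int value; } map[MAP_SIZE];
static size_t map_used;

static void map_put (const struct edge *e, int value)
{
  assert (map_used < MAP_SIZE);
  map[map_used].key = e;
  map[map_used].value = value;
  map_used++;
}

/* Return the value stored for E, or -1 if there is no entry.  */
static int map_get (const struct edge *e)
{
  for (size_t i = 0; i < map_used; i++)
    if (map[i].key == e)
      return map[i].value;
  return -1;
}

static void map_clear (void) { map_used = 0; }

/* One storage slot that is handed out again and again, so two
   unrelated edges are guaranteed to live at the same address.  */
static struct edge pool_slot;

static struct edge *edge_alloc (int src, int dest)
{
  pool_slot.src = src;
  pool_slot.dest = dest;
  return &pool_slot;
}

/* Simulate two passes.  The first leaves an entry behind; the second
   looks up a brand-new edge that happens to reuse the old address.
   Returns what the second pass finds in the map.  */
static int stale_lookup (int clear_between_passes)
{
  map_clear ();
  struct edge *old_edge = edge_alloc (1, 2);
  map_put (old_edge, 42);
  /* old_edge is logically dead from here on; its slot is recycled below.  */
  if (clear_between_passes)
    map_clear ();
  struct edge *new_edge = edge_alloc (7, 8);
  return map_get (new_edge);
}
```

Without the clear, the second pass sees the value 42 that belongs to an edge that no longer exists; with the clear it sees no entry at all, which is the behaviour an assert between passes could check for.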


Jeff


Re: Incorrect code due to indirect tail call of varargs function with hard float ABI

2015-12-01 Thread Kugan

>>
>> gcc/ChangeLog:
>>
>> 2015-11-18  Kugan Vivekanandarajah  
>>
>>  PR target/68390
>>  * config/arm/arm.c (arm_function_ok_for_sibcall): Get function type
>>  for indirect function call.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2015-11-18  Kugan Vivekanandarajah  
>>
>>  PR target/68390
>>  * gcc.target/arm/PR68390.c: New test.
>>
> 
> s/PR/pr in the test name and put this in gcc.c-torture/execute instead - 
> there is nothing ARM specific about the test. Tests in gcc.target/arm should 
> really only be architecture specific. This isn't.
> 
>>
>>
>>
>> p.txt
>>
>>
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index a379121..0dae7da 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -6680,8 +6680,13 @@ arm_function_ok_for_sibcall (tree decl, tree exp)
>>   a VFP register but then need to transfer it to a core
>>   register.  */
>>rtx a, b;
>> +  tree fn_decl = decl;
> 
> Call it decl_or_type instead - it's really that ... 
> 
>>  
>> -  a = arm_function_value (TREE_TYPE (exp), decl, false);
>> +  /* If it is an indirect function pointer, get the function type.  */
>> +  if (!decl)
>> +fn_decl = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
>> +
> 
> This is probably just my mail client - but please watch out for indentation.
> 
>> +  a = arm_function_value (TREE_TYPE (exp), fn_decl, false);
>>b = arm_function_value (TREE_TYPE (DECL_RESULT (cfun->decl)),
>>cfun->decl, false);
>>if (!rtx_equal_p (a, b))
> 
> 
> OK with those changes.
> 
> Ramana
> 


Hi Ramana,

This issue also remains in the 4.9 and 5.0 branches. Is it OK to backport
this to the release branches?

Thanks,
Kugan


[google gcc-4_9] Fix bad LIPO profile produced by gcov-tool

2015-12-01 Thread Rong Xu
Hi,

This patch fixes an issue when using gcov-tool to merge LIPO profiles
after compressing the module information. We should not decompress
the string, as the compressed string should be written directly to the
profile later. Tested with some LIPO profiles.

Thanks,

-Rong
2015-12-01  Rong Xu  

* gcov-dump.c (tag_module_info): Dump string information.
* gcov-io.c (gcov_read_module_info): record combined_len
  and don't uncompress in gcov-tool.

Index: gcov-dump.c
===
--- gcov-dump.c (revision 231134)
+++ gcov-dump.c (working copy)
@@ -588,6 +588,11 @@ tag_module_info (const char *filename ATTRIBUTE_UN
 {
   if (!mod_info->is_primary)
printf ("%s\n", mod_info->source_filename);
+  unsigned short compressed_size = mod_info->combined_strlen;
+  unsigned short uncompressed_size = mod_info->combined_strlen>>16;
+  printf ("compressed_ strlen=%d uncompressed_strlen=%d String:\n",
+  compressed_size,uncompressed_size);
+  printf ("%s\n", mod_info->saved_cc1_strings);
 }
   else
 {
Index: gcov-io.c
===
--- gcov-io.c   (revision 231134)
+++ gcov-io.c   (working copy)
@@ -835,16 +835,18 @@ gcov_read_module_info (struct gcov_module_info *mo
   len -= (src_filename_len + 1);
 
   saved_compressed_len = (unsigned long) gcov_read_unsigned ();
-  saved_uncompressed_len  = saved_compressed_len >> 16;
-  saved_compressed_len &= 0x;
+  mod_info->combined_strlen = saved_compressed_len;
   tag_len = gcov_read_unsigned ();
   len -= (tag_len + 2);
   gcc_assert (!len);
   compressed_array = (char *) xmalloc (tag_len * sizeof (gcov_unsigned_t));
-  uncompressed_array = (char *) xmalloc (saved_uncompressed_len);
   for (i = 0; i < tag_len; i++)
 ((gcov_unsigned_t *) compressed_array)[i] = gcov_read_unsigned ();
 
+#if !defined (IN_GCOV_TOOL)
+  saved_uncompressed_len  = saved_compressed_len >> 16;
+  saved_compressed_len &= 0x;
+  uncompressed_array = (char *) xmalloc (saved_uncompressed_len);
   result_len = saved_uncompressed_len;
   uncompress ((Bytef *)uncompressed_array, &result_len,
   (const Bytef *)compressed_array, saved_compressed_len);
@@ -851,6 +853,9 @@ gcov_read_module_info (struct gcov_module_info *mo
   gcc_assert (result_len == saved_uncompressed_len);
   mod_info->saved_cc1_strings = uncompressed_array;
   free (compressed_array);
+#else /* IN_GCOV_TOOL: we don't need to uncompress. It's a pass through.  */
+  mod_info->saved_cc1_strings = compressed_array;
+#endif
 }
 #endif
 


Re: Gimple loop splitting v2

2015-12-01 Thread Jeff Law

On 12/01/2015 09:46 AM, Michael Matz wrote:

Hi,

So, okay for trunk?

-ENOPATCH

Jeff



Re: [PATCH] Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-01 Thread Tom de Vries

On 01/12/15 15:44, Jakub Jelinek wrote:

On Tue, Dec 01, 2015 at 03:25:42PM +0100, Tom de Vries wrote:

Handle BUILT_IN_GOACC_PARALLEL in ipa-pta

2015-12-01  Tom de Vries  

* tree-ssa-structalias.c (find_func_aliases_for_builtin_call)
(find_func_clobbers, ipa_pta_execute): Handle BUILT_IN_GOACC_PARALLEL.


Isn't this cheating though?  The kernel will be called with those addresses
only if doing host fallback


Let's take a look at goacc/kernels-alias-ipa-pta.c:
...
unsigned int a[N];
unsigned int b[N];
unsigned int c[N];

#pragma acc kernels pcopyout (a, b, c)
{
  a[0] = 0;
  b[0] = 1;
  c[0] = a[0];
}
...

If we execute on the host, the a, b and c used in the kernels region 
will be the a, b and c declared outside the region.


If we execute on a non-shared mem accelerator, the a, b and c used in 
the kernels region will be copies of a, b and c in the accelerator 
memory: a.1, b.1 and c.1.


This patch tells ipa-pta (which has no notion of a.1, b.1 and c.1) that 
we're using declared a, b, and c, in the kernels region, while on the 
accelerator we're really using a.1, b.1 and c.1, so in that sense it's 
cheating.


However, given that declared a, b and c are disjunct, we know that their 
copies will be disjunct, so by pretending that declared a, b and c are 
used in the kernels region, we get conclusions which are also valid when 
we use a.1, b.1 and c.1 instead in the kernels region.


So, for this patch to be incorrect we have to find an example where 
ipa-pta finds that two memory references are not aliasing, while on the 
accelerator those memory references are really aliasing. AFAICT there 
are no such examples.
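
As a rough illustration of the argument above, the sketch below models "copy to accelerator memory" with plain malloc and memcpy and checks that copies of disjoint host objects are themselves disjoint.  This is only an analogy under that assumption: ranges_overlap and copies_stay_disjoint are made-up helpers, and nothing here touches OpenACC or ipa-pta.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N 16

/* True if the byte ranges [p, p+pn) and [q, q+qn) share any byte.  */
static bool ranges_overlap (const void *p, size_t pn,
                            const void *q, size_t qn)
{
  uintptr_t a = (uintptr_t) p, b = (uintptr_t) q;
  return a < b + qn && b < a + pn;
}

/* Copy two disjoint "host" arrays into separately allocated "device"
   buffers (a.1 and b.1 in the discussion) and report whether both the
   originals and the copies are disjoint.  */
static bool copies_stay_disjoint (void)
{
  unsigned int a[N] = { 0 };
  unsigned int b[N] = { 0 };
  unsigned int *a1 = malloc (sizeof a);
  unsigned int *b1 = malloc (sizeof b);
  if (!a1 || !b1)
    {
      free (a1);
      free (b1);
      return false;
    }
  memcpy (a1, a, sizeof a);
  memcpy (b1, b, sizeof b);

  bool host_disjoint = !ranges_overlap (a, sizeof a, b, sizeof b);
  bool copy_disjoint = !ranges_overlap (a1, sizeof a, b1, sizeof b);

  free (a1);
  free (b1);
  return host_disjoint && copy_disjoint;
}
```

A "no alias" conclusion drawn for a and b therefore still holds for the copies; the problematic case would be the reverse, where distinct host objects end up sharing storage on the device.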



(and for GOMP_target_ext even not for that
always - firstprivate vars will have the addresses replaced by addresses of
alloca-ed copies of those objects).


I don't think firstprivate vars are a problem; I think the opposite would be 
a problem: merging vars on the accelerator which are disjunct on the host.



I haven't studied in detail what exactly IPA-PTA does, so maybe it is good
enough to pretend that.


AFAIU, it's good enough, because the points-to information is only used 
to prove non-aliases.


Does this explanation address your concern?

Thanks,
- Tom


[committed] Tighten runtime initialization check in __canonicalize_funcptr_for_compare on hppa-linux

2015-12-01 Thread John David Anglin
The attached change fixes a startup issue of emacs24 on Debian hppa-linux.  The
emacs24 build does some tricky stuff to preinitialize values, so that standard
static initialization check is skipped.  However, the global offset table has
moved in the final executable and emacs24 crashes.

The attached change changes the check to one against the runtime global offset
table address.  If its location has changed, new values to call fixup in the
dynamic linker are computed.

The offset order to find the address of _dl_fixup is changed as the template is
now everywhere.
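
The idea of the new check can be sketched outside of fptr.c as follows: instead of testing "has this been initialized at all", remember which base address the cached value was computed for and recompute when the current address differs.  All names below (get_fixup, compute, init_base, table_a, table_b) are invented for the illustration, and the "+ 8" computation is a placeholder rather than the real fixup lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Address the cached value was computed for; NULL means "never".  */
static const void *init_base;
static uintptr_t cached;
static int init_count;

/* Placeholder for the expensive, address-dependent computation.  */
static uintptr_t compute (const void *base)
{
  init_count++;
  return (uintptr_t) base + 8;
}

/* Return the cached value, recomputing it whenever the base address
   differs from the one seen at the previous initialization.  A plain
   "if (!cached)" test would keep a value computed for a stale base.
   Note that a NULL base is never treated as a change.  */
static uintptr_t get_fixup (const void *current_base)
{
  if (init_base != current_base)
    {
      cached = compute (current_base);
      init_base = current_base;
    }
  return cached;
}

/* Two distinct tables standing in for the build-time and run-time
   locations of the global offset table.  */
static int table_a[4];
static int table_b[4];
```

Calling get_fixup (table_a) twice computes once; a later call with table_b recomputes, which mirrors the case where preinitialized state was saved for one GOT location and the executable then runs with another.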

Tested with no observed regressions on trunk.  Installed to trunk and 4.9.

Dave
--
John David Anglin   dave.ang...@bell.net


2015-12-01  John David Anglin  

* config/pa/fptr.c (__canonicalize_funcptr_for_compare): Initialize
fixup values if saved GOT address doesn't match runtime address.
(fixup_branch_offset): Reorder list.

Index: config/pa/fptr.c
===
--- config/pa/fptr.c(revision 231043)
+++ config/pa/fptr.c(working copy)
@@ -40,7 +40,7 @@
the template should it be necessary to change the current branch
position.  */
 #define NOFFSETS 2
-static int fixup_branch_offset[NOFFSETS] = { 32, -4 };
+static int fixup_branch_offset[NOFFSETS] = { -4, 32 };
 
 #define GET_FIELD(X, FROM, TO) \
   ((X) >> (31 - (TO)) & ((1 << ((TO) - (FROM) + 1)) - 1))
@@ -66,6 +66,7 @@
 {
   static unsigned int fixup_plabel[2];
   static fixup_t fixup;
+  static unsigned int *init_fixup;
   unsigned int *plabel, *got;
 
   /* -1 and page 0 are special.  -1 is used in crtend to mark the end of
@@ -88,9 +89,11 @@
 return plabel[0];
 
   /* Initialize our plabel for calling fixup if we haven't done so already.
- This code needs to be thread safe but we don't have to be too careful
- as the result is invariant.  */
-  if (!fixup)
+ We can't rely on static initialization so we check that any previous
+ initialization was done for the current got address.  This code needs
+ to be thread safe but we don't have to be too careful as the result
+ is invariant.  */
+  if (init_fixup != got)
 {
   int i;
   unsigned int *iptr;
@@ -121,6 +124,9 @@
   fixup_plabel[0] = (unsigned int) iptr + 8;  /* address of fixup */
   fixup_plabel[1] = got[-1]; /* ltp for fixup */
   fixup = (fixup_t) ((int) fixup_plabel | 3);
+
+  /* Save address of the global offset table.  */
+  init_fixup = got;
 }
 
   /* Call fixup to resolve the function address.  got[1] contains the


Re: [gomp-nvptx 9/9] adjust SIMD loop lowering for SIMT targets

2015-12-01 Thread Alexander Monakov
Apologies -- a last-minute attempt to clean up and enhance broke this patch;
fixed version below.  The main difference is checking whether we're
transforming a loop that might be executed on the target: checking
decl->offloadable isn't enough, because target region outlining might not have
happened yet; in that case, we need to walk the region tree upwards to check
if any containing region is a target region.
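
For readers who do not have omp-low.c in front of them, the upward walk can be sketched in isolation like this.  struct region and enum region_kind are simplified stand-ins invented for the example, not the real struct omp_region, and the sample regions exist only to exercise the function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical region tree node.  */
enum region_kind { REGION_PARALLEL, REGION_FOR, REGION_SIMD, REGION_TARGET };

struct region
{
  enum region_kind kind;
  struct region *outer;   /* enclosing region, or NULL at the top */
};

/* Return true if code in REGION may run on the offload target: either
   the containing function is already marked offloadable, or some
   enclosing region (including REGION itself) is a target region that
   has not been outlined yet.  */
static bool inside_target_region (const struct region *region,
                                  bool fn_offloadable)
{
  bool offloaded = fn_offloadable;
  for (const struct region *r = region; !offloaded && r; r = r->outer)
    offloaded = r->kind == REGION_TARGET;
  return offloaded;
}

/* Sample trees: target > parallel > simd, and a lone host simd.  */
static struct region sample_target = { REGION_TARGET, NULL };
static struct region sample_parallel = { REGION_PARALLEL, &sample_target };
static struct region sample_simd = { REGION_SIMD, &sample_parallel };
static struct region host_simd = { REGION_SIMD, NULL };
```

The loop stops as soon as a target region is found, and it is skipped entirely when the function flag already answers the question.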

Alexander

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index a3c4a90..3189e96 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -142,6 +142,28 @@ expand_ANNOTATE (gcall *)
   gcc_unreachable ();
 }
 
+/* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
+   without SIMT execution this should be expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_LANE (gcall *stmt)
+{
+  tree lhs = gimple_call_lhs (stmt);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  /* FIXME: use a separate pattern for OpenMP?  */
+  gcc_assert (targetm.have_oacc_dim_pos ());
+  emit_insn (targetm.gen_oacc_dim_pos (target, const2_rtx));
+}
+
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_SIMT_VF (gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* This should get expanded in adjust_simduid_builtins.  */
 
 static void
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 1cb14a8..66c7422 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -41,6 +41,8 @@ along with GCC; see the file COPYING3.  If not see
 
 DEF_INTERNAL_FN (LOAD_LANES, ECF_CONST | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (STORE_LANES, ECF_CONST | ECF_LEAF, NULL)
+DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index cc0435e..0478b2a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -10173,7 +10173,7 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
  OMP_CLAUSE_SAFELEN);
   tree simduid = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
  OMP_CLAUSE__SIMDUID_);
-  tree n1, n2;
+  tree n1, n2, step, simt_lane;
 
   type = TREE_TYPE (fd->loop.v);
   entry_bb = region->entry;
@@ -10218,12 +10218,36 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
 
   n1 = fd->loop.n1;
   n2 = fd->loop.n2;
+  step = fd->loop.step;
+  bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
+  for (struct omp_region *reg = region; !offloaded && reg; reg = reg->outer)
+offloaded = reg->type == GIMPLE_OMP_TARGET;
+  bool do_simt_transform
+= offloaded && !broken_loop && !safelen && !simduid && !(fd->collapse > 1);
+  if (do_simt_transform)
+{
+  simt_lane
+   = build_call_expr_internal_loc (UNKNOWN_LOCATION, IFN_GOMP_SIMT_LANE,
+   integer_type_node, 0);
+  simt_lane = fold_convert (TREE_TYPE (step), simt_lane);
+  simt_lane = fold_build2 (MULT_EXPR, TREE_TYPE (step), step, simt_lane);
+  cfun->curr_properties &= ~PROP_gimple_lomp_dev;
+}
+
   if (gimple_omp_for_combined_into_p (fd->for_stmt))
 {
   tree innerc = find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
 OMP_CLAUSE__LOOPTEMP_);
   gcc_assert (innerc);
   n1 = OMP_CLAUSE_DECL (innerc);
+  if (do_simt_transform)
+   {
+ n1 = fold_convert (type, n1);
+ if (POINTER_TYPE_P (type))
+   n1 = fold_build_pointer_plus (n1, simt_lane);
+ else
+   n1 = fold_build2 (PLUS_EXPR, type, n1, fold_convert (type, 
simt_lane));
+   }
   innerc = find_omp_clause (OMP_CLAUSE_CHAIN (innerc),
OMP_CLAUSE__LOOPTEMP_);
   gcc_assert (innerc);
@@ -10239,8 +10263,15 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
 }
   else
 {
-  expand_omp_build_assign (&gsi, fd->loop.v,
-  fold_convert (type, fd->loop.n1));
+  if (do_simt_transform)
+   {
+ n1 = fold_convert (type, n1);
+ if (POINTER_TYPE_P (type))
+   n1 = fold_build_pointer_plus (n1, simt_lane);
+ else
+   n1 = fold_build2 (PLUS_EXPR, type, n1, fold_convert (type, 
simt_lane));
+   }
+  expand_omp_build_assign (&gsi, fd->loop.v, fold_convert (type, n1));
   if (fd->collapse > 1)
for (i = 0; i < fd->collapse; i++)
  {
@@ -10262,10 +10293,18 @@ expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
   stmt = gsi_stmt (gsi);
   gcc_assert (gimple_code (stmt) == GIMPLE_OMP_CONTINUE);
 
+  if (do_simt_transform)

Re: [gomp-nvptx 6/9] nvptx libgcc: rewrite in C

2015-12-01 Thread Alexander Monakov
On Wed, 2 Dec 2015, Bernd Schmidt wrote:

> What exactly is the problem with having asm files? I'm asking because this...

Wrappers for malloc and free need different code under -muniform-simt.

> 
> On 12/01/2015 04:28 PM, Alexander Monakov wrote:
> > +/* __shared__ char *__nvptx_stacks[32];  */
> > +asm ("// BEGIN GLOBAL VAR DEF: __nvptx_stacks");
> > +asm (".visible .shared .u64 __nvptx_stacks[32];");
> > +
> > +/* __shared__ unsigned __nvptx_uni[32];  */
> > +asm ("// BEGIN GLOBAL VAR DEF: __nvptx_uni");
> > +asm (".visible .shared .u32 __nvptx_uni[32];");
> 
> ... doesn't look great to me. This is better done in assembly directly IMO.

Hm.  I can convert it to asm, but then if/when I start using attribute-based
shared memory, I'd have to move it back to C again, I think.

Thanks.
Alexander


Re: [gomp-nvptx 6/9] nvptx libgcc: rewrite in C

2015-12-01 Thread Bernd Schmidt
What exactly is the problem with having asm files? I'm asking because 
this...


On 12/01/2015 04:28 PM, Alexander Monakov wrote:

+/* __shared__ char *__nvptx_stacks[32];  */
+asm ("// BEGIN GLOBAL VAR DEF: __nvptx_stacks");
+asm (".visible .shared .u64 __nvptx_stacks[32];");
+
+/* __shared__ unsigned __nvptx_uni[32];  */
+asm ("// BEGIN GLOBAL VAR DEF: __nvptx_uni");
+asm (".visible .shared .u32 __nvptx_uni[32];");


... doesn't look great to me. This is better done in assembly directly IMO.


Bernd


Go patch committed: Fix array dimension handling on 32-bit host

2015-12-01 Thread Ian Lance Taylor
The Go frontend code that handled array dimensions when generating
reflection and mangling assumed that an array dimension would fit in
an unsigned long.  That is of course not true when a 32-bit host is
cross-compiling to a 64-bit target.  This patch fixes the problem.
This was reported as GCC PR 65717.  Bootstrapped and ran Go tests on
x86_64-pc-linux-gnu, and also on a 32-bit Solaris host crossing to a
64-bit Solaris target.  Committed to mainline.  Could be committed to
GCC 5 branch but I'm not sure whether the branch is open yet.
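
The truncation being fixed is easy to reproduce in a few lines of C.  The sketch below uses uint32_t as a stand-in for unsigned long on a 32-bit host and uint64_t as a stand-in for an arbitrary-precision value; the actual patch uses GMP's mpz_t and mpz_get_str instead, so format_narrow and format_wide are purely illustrative helpers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static char narrow_buf[32];
static char wide_buf[32];

/* Format LENGTH after squeezing it through a 32-bit integer, as code
   assuming "fits in unsigned long" would do on a 32-bit host.  */
static const char *format_narrow (uint64_t length)
{
  uint32_t val = (uint32_t) length;   /* high bits are silently lost */
  snprintf (narrow_buf, sizeof narrow_buf, "%" PRIu32, val);
  return narrow_buf;
}

/* Format LENGTH at full width.  */
static const char *format_wide (uint64_t length)
{
  snprintf (wide_buf, sizeof wide_buf, "%" PRIu64, length);
  return wide_buf;
}
```

An array dimension of 2^32 is valid for a 64-bit target, but comes out as "0" through the narrow path, which would corrupt both the reflection string and the mangled name.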

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 231095)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-81eb6a3f425b2158c67ee32c0cc973a72ce9d6be
+c375f3bf470f94220149b486c947bb3eb57cde7d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 231095)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -6398,22 +6398,21 @@ Array_type::do_reflection(Gogo* gogo, st
   if (this->length_ != NULL)
 {
   Numeric_constant nc;
-  unsigned long val;
-  if (!this->length_->numeric_constant_value()
- || nc.to_unsigned_long() != Numeric_constant::NC_UL_VALID)
+  if (!this->length_->numeric_constant_value())
{
- if (!this->issued_length_error_)
-   {
- error_at(this->length_->location(), "invalid array length");
- this->issued_length_error_ = true;
-   }
+ go_assert(saw_errors());
+ return;
}
-  else
+  mpz_t val;
+  if (!nc.to_int())
{
- char buf[50];
- snprintf(buf, sizeof buf, "%lu", val);
- ret->append(buf);
+ go_assert(saw_errors());
+ return;
}
+  char* s = mpz_get_str(NULL, 10, val);
+  ret->append(s);
+  free(s);
+  mpz_clear(val);
 }
   ret->push_back(']');
 
@@ -6544,22 +6543,21 @@ Array_type::do_mangled_name(Gogo* gogo,
   if (this->length_ != NULL)
 {
   Numeric_constant nc;
-  unsigned long val;
-  if (!this->length_->numeric_constant_value()
- || nc.to_unsigned_long() != Numeric_constant::NC_UL_VALID)
+  if (!this->length_->numeric_constant_value())
{
- if (!this->issued_length_error_)
-   {
- error_at(this->length_->location(), "invalid array length");
- this->issued_length_error_ = true;
-   }
+ go_assert(saw_errors());
+ return;
}
-  else
+  mpz_t val;
+  if (!nc.to_int())
{
- char buf[50];
- snprintf(buf, sizeof buf, "%lu", val);
- ret->append(buf);
+ go_assert(saw_errors());
+ return;
}
+  char *s = mpz_get_str(NULL, 10, val);
+  ret->append(s);
+  free(s);
+  mpz_clear(val);
 }
   ret->push_back('e');
 }


Re: [PATCH] rs6000: Optimise SImode cstore on 64-bit

2015-12-01 Thread David Edelsohn
On Tue, Dec 1, 2015 at 8:55 PM, Segher Boessenkool
 wrote:
> On 64-bit we can do comparisons of 32-bit values by extending those
> values to 64-bit, subtracting them, and then getting the high bit of
> the result.  For registers this is always cheaper than using the carry
> bit sequence; and if the comparison involves a constant, this is cheaper
> than the sequence we previously generated in half of the cases (and the
> same cost in the other cases).
>
> After this, the only sequence left that is using the mfcr insn is the
> one doing signed comparison of Pmode registers.
>
> Testing in progress.  Okay for trunk if that succeeds?

Okay.

Thanks, David


Re: -fstrict-aliasing fixes 1/5: propagate -fno-strict-aliasing in the inliner

2015-12-01 Thread Jan Hubicka
> > * ipa-inline-transform.c (inline_call): Drop -fstrict-aliasing when
> > inlining -fno-strict-aliasing into -fstrict-aliasing body.
> > * gcc.dg/lto/alias-1_0.c: New testcase.
> > * gcc.dg/lto/alias-1_1.c: New testcase.
> >Index: ipa-inline-transform.c
> >===
> >--- ipa-inline-transform.c   (revision 231081)
> >+++ ipa-inline-transform.c   (working copy)
> >@@ -322,6 +322,21 @@ inline_call (struct cgraph_edge *e, bool
> >   if (DECL_FUNCTION_PERSONALITY (callee->decl))
> > DECL_FUNCTION_PERSONALITY (to->decl)
> >   = DECL_FUNCTION_PERSONALITY (callee->decl);
> >+  if (!opt_for_fn (callee->decl, flag_strict_aliasing)
> >+  && opt_for_fn (to->decl, flag_strict_aliasing))
> 
> Just curious why you don't handle the other way round?

After inlining, opt_for_fn of CALLEE will be ignored and will
become opt_for_fn of TO. Turning flag_strict_aliasing code to
!flag_strict_aliasing is safe, but not the other way around.
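
Put as a one-line rule (a sketch only; merged_strict_aliasing is a made-up name and not part of the patch), the combined body may keep strict aliasing only if both sides were compiled with it:

```c
#include <assert.h>
#include <stdbool.h>

/* Strict-aliasing setting for the caller body after inlining.  Code
   written for -fno-strict-aliasing may contain type punning, so it
   must not be optimized under strict aliasing rules; the reverse
   direction only gives up optimization and is always safe.  */
static bool merged_strict_aliasing (bool to_strict, bool callee_strict)
{
  return to_strict && callee_strict;
}
```

This is why only the "callee is !flag_strict_aliasing, caller is flag_strict_aliasing" combination needs any action: in every other combination the caller's existing setting already equals the merged one.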
> 
> >+{
> >+  struct gcc_options opts = global_options;
> >+
> >+  cl_optimization_restore (&opts,
> >+ TREE_OPTIMIZATION (DECL_FUNCTION_SPECIFIC_OPTIMIZATION (to->decl)));
> >+  opts.x_flag_strict_aliasing = false;
> >+  if (dump_file)
> >+fprintf (dump_file, "Dropping flag_strict_aliasing on %s:%i\n",
> >+ to->name (), to->order);
> 
> ISTR to have seen %s/%i for printing name and order in IPA, no?

Hmm, right, will update it.
> >+void link_error (void);
> 
> Unused and unneeded forward decl?

Yep, I originally wanted to check that we optimize out the type punned code
(we can) but we don't seem to be able to do so.   It is just a testcase and extra
declaration is harmless I guess.

Thanks!
Honza


Re: [google gcc-4_9] Fix bad LIPO profile produced by gcov-tool

2015-12-01 Thread Xinliang David Li
So that field was never inited/used before?

Also saved_compressed_len = (unsigned long) gcov_read_unsigned ();

should use unsigned not unsigned long type.

On Tue, Dec 1, 2015 at 4:53 PM, Rong Xu  wrote:
> This is only needed for gcov-tool as we need to rewrite the
> module-info to the profile (this is only used in decompress)
>
> The transitional compiler path does not need it because the string is
> already decompressed. It only needs to use the strings.
>
> gcov-dump in theory does not need it either if it only dumps the
> strings. But now I added the printing of both lengths. So now it is
> also needed.
>
> On Tue, Dec 1, 2015 at 4:46 PM, Xinliang David Li  wrote:
>> Not sure about this line:
>>
>> mod_info->combined_strlen = saved_compressed_len;
>>
>> This did not exist for the compiler path before.
>>
>> On Tue, Dec 1, 2015 at 4:34 PM, Rong Xu  wrote:
>>>
>>> Hi,
>>>
>>> This patch fixes an issue when using gcov-tool to merge LIPO profiles
>>> after compressing the module information. We should not decompress
>>> the string, as the compressed string should be written directly to the
>>> profile later. Tested with some LIPO profiles.
>>>
>>> Thanks,
>>>
>>> -Rong
>>
>>


Re: [PATCH] rs6000: Optimise SImode cstore on 64-bit

2015-12-01 Thread Alan Modra
On Wed, Dec 02, 2015 at 01:55:17AM +, Segher Boessenkool wrote:
> +  emit_insn (gen_subdi3 (tmp, op1, op2));
> +  emit_insn (gen_lshrdi3 (tmp2, tmp, GEN_INT (63)));
> +  emit_insn (gen_anddi3 (tmp3, tmp2, const1_rtx));

Why the AND?  The top 63 bits are already clear.

-- 
Alan Modra
Australia Development Lab, IBM


[PATCH] rs6000: Optimise SImode cstore on 64-bit

2015-12-01 Thread Segher Boessenkool
On 64-bit we can do comparisons of 32-bit values by extending those
values to 64-bit, subtracting them, and then getting the high bit of
the result.  For registers this is always cheaper than using the carry
bit sequence; and if the comparison involves a constant, this is cheaper
than the sequence we previously generated in half of the cases (and the
same cost in the other cases).
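
In plain C the trick looks roughly like the sketch below.  The function names are invented for the illustration and this is not what the expander emits; it only shows why the high bit of the 64-bit difference is the comparison result.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned a < b: zero-extend both to 64 bits and subtract.  The
   difference lies strictly between -2^32 and 2^32, so bit 63 is set
   exactly when a < b.  The shift already leaves only 0 or 1.  */
static uint32_t ltu32 (uint32_t a, uint32_t b)
{
  uint64_t diff = (uint64_t) a - (uint64_t) b;
  return (uint32_t) (diff >> 63);
}

/* Signed a < b: same idea with sign extension.  The true difference
   fits in 64 bits, so no overflow can flip the sign bit.  */
static uint32_t lts32 (int32_t a, int32_t b)
{
  uint64_t diff = (uint64_t) (int64_t) a - (uint64_t) (int64_t) b;
  return (uint32_t) (diff >> 63);
}

/* a >= b is the complement of a < b: flip the low bit.  */
static uint32_t geu32 (uint32_t a, uint32_t b)
{
  return ltu32 (a, b) ^ 1;
}

/* a > b is b < a: swap the operands.  */
static uint32_t gts32 (int32_t a, int32_t b)
{
  return lts32 (b, a);
}
```

GT and LE are handled by swapping the operands and GE and LE by the final XOR, so all four orderings reduce to one subtract and one shift plus at most one extra instruction.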

After this, the only sequence left that is using the mfcr insn is the
one doing signed comparison of Pmode registers.

Testing in progress.  Okay for trunk if that succeeds?


Segher


2015-12-01  Segher Boessenkool  

* config/rs6000/rs6000.md (cstore_si_as_di): New expander.
(cstore4): Use it.

---
 gcc/config/rs6000/rs6000.md | 52 +
 1 file changed, 52 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a500d67..a599372 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -10564,6 +10564,53 @@ (define_expand "cstore4_unsigned"
   DONE;
 })
 
+(define_expand "cstore_si_as_di"
+  [(use (match_operator 1 "unsigned_comparison_operator"
+ [(match_operand:SI 2 "gpc_reg_operand")
+  (match_operand:SI 3 "reg_or_short_operand")]))
+   (clobber (match_operand:SI 0 "register_operand"))]
+  ""
+{
+  int uns_flag = unsigned_comparison_operator (operands[1], VOIDmode) ? 1 : 0;
+  enum rtx_code cond_code = signed_condition (GET_CODE (operands[1]));
+
+  rtx op1 = gen_reg_rtx (DImode);
+  rtx op2 = gen_reg_rtx (DImode);
+  convert_move (op1, operands[2], uns_flag);
+  convert_move (op2, operands[3], uns_flag);
+
+  if (cond_code == GT || cond_code == LE)
+{
+  cond_code = swap_condition (cond_code);
+  std::swap (op1, op2);
+}
+
+  rtx tmp = gen_reg_rtx (DImode);
+  rtx tmp2 = gen_reg_rtx (DImode);
+  rtx tmp3 = gen_reg_rtx (DImode);
+  emit_insn (gen_subdi3 (tmp, op1, op2));
+  emit_insn (gen_lshrdi3 (tmp2, tmp, GEN_INT (63)));
+  emit_insn (gen_anddi3 (tmp3, tmp2, const1_rtx));
+
+  rtx tmp4;
+  switch (cond_code)
+{
+default:
+  gcc_unreachable ();
+case LT:
+  tmp4 = tmp3;
+  break;
+case GE:
+  tmp4 = gen_reg_rtx (DImode);
+  emit_insn (gen_xordi3 (tmp4, tmp3, const1_rtx));
+  break;
+}
+
+  convert_move (operands[0], tmp4, 1);
+
+  DONE;
+})
+
 (define_expand "cstore4_signed_imm"
   [(use (match_operator 1 "signed_comparison_operator"
  [(match_operand:GPR 2 "gpc_reg_operand")
@@ -10688,6 +10735,11 @@ (define_expand "cstore4"
 emit_insn (gen_cstore4_unsigned (operands[0], operands[1],
   operands[2], operands[3]));
 
+  /* For comparisons smaller than Pmode we can cheaply do things in Pmode.  */
+  else if (mode == SImode && Pmode == DImode)
+emit_insn (gen_cstore_si_as_di (operands[0], operands[1],
+   operands[2], operands[3]));
+
   /* For signed comparisons against a constant, we can do some simple
  bit-twiddling.  */
   else if (signed_comparison_operator (operands[1], VOIDmode)
-- 
1.9.3



Re: [google gcc-4_9] Fix bad LIPO profile produced by gcov-tool

2015-12-01 Thread Rong Xu
This is only needed for gcov-tool as we need to rewrite the
module-info to the profile (this is only used in decompress)

The transitional compiler path does not need it because the string is
already decompressed. It only needs to use the strings.

gcov-dump in theory does not need it either if it only dumps the
strings. But now I added the printing of both lengths. So now it is
also needed.
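
For reference, the two lengths share one 32-bit word.  The sketch below shows the layout as it can be read off the gcov-dump hunk (low half taken by truncation to unsigned short, high half by a shift of 16); pack_lengths and the accessor names are hypothetical and not functions from gcov-io.c.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the two lengths into one word: uncompressed length in the
   upper 16 bits, compressed length in the lower 16 bits.  */
static uint32_t pack_lengths (uint16_t compressed, uint16_t uncompressed)
{
  return ((uint32_t) uncompressed << 16) | compressed;
}

/* Lower half: what gcov-dump prints as the compressed length.  */
static uint16_t compressed_len (uint32_t combined)
{
  return (uint16_t) combined;
}

/* Upper half: what gcov-dump prints as the uncompressed length.  */
static uint16_t uncompressed_len (uint32_t combined)
{
  return (uint16_t) (combined >> 16);
}
```

Keeping the combined word around lets gcov-tool write the module info back unchanged, while a consumer that needs the text can still split it and size the decompression buffer.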

On Tue, Dec 1, 2015 at 4:46 PM, Xinliang David Li  wrote:
> Not sure about this line:
>
> mod_info->combined_strlen = saved_compressed_len;
>
> This did not exist for the compiler path before.
>
> On Tue, Dec 1, 2015 at 4:34 PM, Rong Xu  wrote:
>>
>> Hi,
>>
>> This patch fixes an issue when using gcov-tool to merge LIPO profiles
>> after compressing the module information. We should not decompress
>> the string, as the compressed string should be written directly to the
>> profile later. Tested with some LIPO profiles.
>>
>> Thanks,
>>
>> -Rong
>
>


[PTX] simplify arg advance

2015-12-01 Thread Nathan Sidwell

arg_advance doesn't need to consider TImode.  Those are always passed by 
reference.

nathan
2015-12-01  Nathan Sidwell  

	* config/nvptx/nvptx.c (nvptx_function_arg_advance): Don't
	consider mode.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 231120)
+++ config/nvptx/nvptx.c	(working copy)
@@ -975,15 +975,13 @@ nvptx_function_incoming_arg (cumulative_
 /* Implement TARGET_FUNCTION_ARG_ADVANCE.  */
 
 static void
-nvptx_function_arg_advance (cumulative_args_t cum_v, machine_mode mode,
-			const_tree type ATTRIBUTE_UNUSED,
-			bool named ATTRIBUTE_UNUSED)
+nvptx_function_arg_advance (cumulative_args_t cum_v,
+			machine_mode ARG_UNUSED (mode),
+			const_tree ARG_UNUSED (type),
+			bool ARG_UNUSED (named))
 {
   CUMULATIVE_ARGS *cum = get_cumulative_args (cum_v);
-  if (mode == TImode)
-cum->count += 2;
-  else
-cum->count++;
+  cum->count++;
 }
 
 /* Handle the TARGET_STRICT_ARGUMENT_NAMING target hook.


Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread David Malcolm
On Tue, 2015-12-01 at 13:55 +0100, Bernhard Reutner-Fischer wrote:
> gcc/fortran/ChangeLog
> 
> 2015-11-29  Bernhard Reutner-Fischer  
> 
>   * gfortran.h (gfc_lookup_function_fuzzy): New declaration.
>   * resolve.c: Include spellcheck.h.
>   (lookup_function_fuzzy_find_candidates): New static function.
>   (lookup_uop_fuzzy_find_candidates): Likewise.
>   (lookup_uop_fuzzy): Likewise.
>   (resolve_operator) : Call lookup_uop_fuzzy.
>   (gfc_lookup_function_fuzzy): New definition.
>   (resolve_unknown_f): Call gfc_lookup_function_fuzzy.
>   * interface.c (check_interface0): Likewise.
>   * symbol.c: Include spellcheck.h.
>   (lookup_symbol_fuzzy_find_candidates): New static function.
>   (lookup_symbol_fuzzy): Likewise.
>   (gfc_set_default_type): Call lookup_symbol_fuzzy.
>   (lookup_component_fuzzy_find_candidates): New static function.
>   (lookup_component_fuzzy): Likewise.
>   (gfc_find_component): Call lookup_component_fuzzy.
> 
> gcc/testsuite/ChangeLog
> 
> 2015-11-29  Bernhard Reutner-Fischer  
> 
>   * gfortran.dg/spellcheck-operator.f90: New testcase.
>   * gfortran.dg/spellcheck-procedure.f90: New testcase.
>   * gfortran.dg/spellcheck-structure.f90: New testcase.
> 
> ---
> 
> David Malcolm's nice Levenshtein distance spelling check helpers
> were used in some parts of other frontends. This proposed patch adds
> some spelling corrections to the fortran frontend.
> 
> Suggestions are printed if we can find a suitable name, currently
> using a very simple cutoff factor:
> /* If more than half of the letters were misspelled, the suggestion is
>likely to be meaningless.  */
> cutoff = MAX (strlen (typo), strlen (best_guess)) / 2;
> which effectively skips names with less than 4 characters.
> For e.g. structures, one could try to be much smarter in an attempt to
> also provide suggestions for single-letter members/components.
> 
> This patch covers (at least partly):
> - user-defined operators
> - structures (types and their components)
> - functions
> - symbols (variables)
> 
> I do not immediately see how to handle subroutines. Ideas?
> 
> If anybody has a testcase where a spelling-suggestion would make sense
> then please pass it along so we maybe can add support for GCC-7.
> 
> Signed-off-by: Bernhard Reutner-Fischer 
> ---
>  gcc/fortran/gfortran.h |   1 +
>  gcc/fortran/interface.c|  16 ++-
>  gcc/fortran/resolve.c  | 135 
> -
>  gcc/fortran/symbol.c   | 129 +++-
>  gcc/testsuite/gfortran.dg/spellcheck-operator.f90  |  30 +
>  gcc/testsuite/gfortran.dg/spellcheck-procedure.f90 |  41 +++
>  gcc/testsuite/gfortran.dg/spellcheck-structure.f90 |  35 ++
>  7 files changed, 376 insertions(+), 11 deletions(-)
>  create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-operator.f90
>  create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-procedure.f90
>  create mode 100644 gcc/testsuite/gfortran.dg/spellcheck-structure.f90
> 
> diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
> index 5487c93..cbfd592 100644
> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -3060,6 +3060,7 @@ bool gfc_type_is_extensible (gfc_symbol *);
>  bool gfc_resolve_intrinsic (gfc_symbol *, locus *);
>  bool gfc_explicit_interface_required (gfc_symbol *, char *, int);
>  extern int gfc_do_concurrent_flag;
> +const char* gfc_lookup_function_fuzzy (const char *, gfc_symtree *);
>  
> 
>  /* array.c */
> diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
> index 30cc522..19f800f 100644
> --- a/gcc/fortran/interface.c
> +++ b/gcc/fortran/interface.c
> @@ -1590,10 +1590,18 @@ check_interface0 (gfc_interface *p, const char 
> *interface_name)
> if (p->sym->attr.external)
>   gfc_error ("Procedure %qs in %s at %L has no explicit interface",
>  p->sym->name, interface_name, &p->sym->declared_at);
> -   else
> - gfc_error ("Procedure %qs in %s at %L is neither function nor "
> -"subroutine", p->sym->name, interface_name,
> -   &p->sym->declared_at);
> +   else {
> + const char *guessed
> +   = gfc_lookup_function_fuzzy (p->sym->name, p->sym->ns->sym_root);
> + if (guessed)
> +   gfc_error ("Procedure %qs in %s at %L is neither function nor "
> +  "subroutine; did you mean %qs?", p->sym->name,
> + interface_name, &p->sym->declared_at, guessed);
> + else
> +   gfc_error ("Procedure %qs in %s at %L is neither function nor "
> +  "subroutine", p->sym->name, interface_name,
> + &p->sym->declared_at);
> +   }
> return 1;
>   }
>  
> diff --git a/gcc/fortran/resolve.c 

[PING v2][PATCH][4.9] Backport fix for PR sanitizer/64820.

2015-12-01 Thread Maxim Ostapenko

On 25/11/15 12:14, Maxim Ostapenko wrote:
I would like to ping the patch: 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02174.html.




Ping.

-Maxim


Gimple loop splitting v2

2015-12-01 Thread Michael Matz
Hi,

On Mon, 16 Nov 2015, Jeff Law wrote:

> OK, if you want to keep them, then have a consistent way to turn them 
> on/off for future debugging.  if0/if1 doesn't provide much of a clue to 
> someone else what to turn on/off if they need to debug this stuff.

> > > I don't see any negative tests -- ie tests that should not be split 
> > > due to boundary conditions.  Do you have any from development?
> > 
> > Good point, I had some but only ones where I was able to extend the 
> > splitters to cover them.  I'll think of some that really shouldn't be 
> > split.
> If you've got them, certainly add them.  Though I realize they may get 
> lost over time.

Actually, thinking a bit more about this, I don't have any that wouldn't 
be merely restrictions in the implementation that couldn't be lifted in 
the future (e.g. unequal step sizes), so I've added no additional ones.

> But in that case, the immediate dominator of pre2 & join is still the 
> initial if statement.  So I think we're OK.  That was the conclusion I 
> was starting to come to yesterday, having the ascii art makes it pretty 
> clear.  I'm just not good at conceptualizing a CFG.  I have to see it 
> explicitly and then everything seems so clear and simple.

So, this second version should reflect the review.  I've moved everything 
to a new file, split the long function into several logically separate 
ones, and even included ascii art in the comments :)  The testcase got a 
comment about what to #define for debugging.  I've included the pass to 
-O3 or alternatively if profile-use is on, similar to funswitch-loops.  
I've also added a proper -fsplit-loops option.

There are two functional changes in v2: a bugfix to not try splitting a 
non-iterating loop (irritatingly such a loop returns true from 
number_of_iterations_exit, but with an ERROR_MARK comparator), and a 
limitation to avoid combinatorial explosion in artificial testcases: once 
we have done a splitting, we don't do any in that loop's parents (we may 
still do splitting in siblings or children of siblings).
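
As a rough illustration of the kind of loop this is about (a hand-written sketch, not output of the pass; the names sum_unsplit and sum_split are made up for the example), a loop whose body branches on a condition that is monotonic in the induction variable can be rewritten as two consecutive loops without the branch:

```c
#include <assert.h>

/* Original form: the condition i < m is tested on every iteration.  */
static long
sum_unsplit (const int *a, const int *b, int n, int m)
{
  long s = 0;
  for (int i = 0; i < n; i++)
    {
      if (i < m)
	s += a[i];
      else
	s += b[i];
    }
  return s;
}

/* Split form: the first loop covers the iterations where i < m held,
   the second loop covers the rest; neither loop contains the branch.  */
static long
sum_split (const int *a, const int *b, int n, int m)
{
  assert (n >= 0);

  /* Clamp the split point into [0, n] so both loops stay in bounds.  */
  int mid = m < n ? m : n;
  if (mid < 0)
    mid = 0;

  long s = 0;
  int i = 0;
  for (; i < mid; i++)
    s += a[i];
  for (; i < n; i++)
    s += b[i];
  return s;
}
```

Both functions are meant to compute the same value for any m, including m <= 0 and m >= n, where one of the two split loops simply does not iterate.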

I've also done some measurements: first, bootstrap time is unaffected, and 
regstrapping succeeds without regressions when I activate the pass by 
default.  Then SPECcpu2006: build times are unaffected, everything builds 
and works also with -fsplit-loops, performance is mostly unaffected, base 
is -Ofast -funroll-loops -fpeel-loops, peak adds -fsplit-loops.

                                  Estimated                       Estimated
                  Base     Base      Base         Peak     Peak      Peak
Benchmarks        Ref.   Run Time    Ratio        Ref.   Run Time    Ratio
--------------  ------  ---------  ---------    ------  ---------  ---------
400.perlbench     9770        325       30.1 *    9770        323       30.3 *
401.bzip2         9650        382       25.2 *    9650        382       25.3 *
403.gcc           8050        242       33.3 *    8050        241       33.4 *
429.mcf           9120        311       29.3 *    9120        311       29.3 *
445.gobmk        10490        392       26.8 *   10490        391       26.8 *
456.hmmer         9330        345       27.0 *    9330        342       27.3 *
458.sjeng        12100        422       28.7 *   12100        420       28.8 *
462.libquantum   20720        308       67.3 *   20720        308       67.3 *
464.h264ref      22130        423       52.3 *   22130        423       52.3 *
471.omnetpp       6250        273       22.9 *    6250        273       22.9 *
473.astar         7020        311       22.6 *    7020        311       22.6 *
483.xalancbmk     6900        191       36.2 *    6900        190       36.2 *
 Est. SPECint_base2006                  31.7
 Est. SPECint2006                                                     31.7

                                  Estimated                       Estimated
                  Base     Base      Base         Peak     Peak      Peak
Benchmarks        Ref.   Run Time    Ratio        Ref.   Run Time    Ratio
--------------  ------  ---------  ---------    ------  ---------  ---------
410.bwaves       13590        235       57.7 *   13590        235       57.8 *
416.gamess                                NR                            NR
433.milc          9180        347       26.5 *    9180        345       26.6 *
434.zeusmp        9100        269       33.9 *    9100        268       33.9 *
435.gromacs       7140        260       27.4 *    7140        262       27.3 *
436.cactusADM    11950        237       50.5 *   11950        240       49.9 *
437.leslie3d      9400        228       41.3 *    9400        228       41.2 *
444.namd          8020        312       25.7 *    8020        311       25.7 *
447.dealII       11440        254       45.0 *   11440        254       45.0 *
450.soplex        8340        201       41.4 *    8340        202       41.4 *
453.povray                                NR                            NR
454.calculix 

Re: [PATCH] Derive interface buffers from max name length

2015-12-01 Thread Bernhard Reutner-Fischer
On 1 December 2015 at 15:52, Janne Blomqvist  wrote:
> On Tue, Dec 1, 2015 at 2:54 PM, Bernhard Reutner-Fischer
>  wrote:
>> These three functions used a hardcoded buffer of 100 but would be better
>> off to base off GFC_MAX_SYMBOL_LEN which denotes the maximum length of a
>> name in any of our supported standards (63 as of f2003 ff.).
>
> Please use xasprintf() instead (and free the result, of course). One
> of my backburner projects is to get rid of these static symbol
> buffers, and use dynamic buffers (or the symbol table) instead. We
> IIRC already have some ugly hacks by using hashing to get around
> GFC_MAX_SYMBOL_LEN when handling mangled symbols. Your patch doesn't
> make the situation worse per se, but if you're going to fix it, let's
> do it properly.

I see.

/scratch/src/gcc-6.0.mine/gcc/fortran$ git grep
"^[[:space:]]*char[[:space:]][[:space:]]*[^[;[:space:]]*\[" | wc -l
142
/scratch/src/gcc-6.0.mine/gcc/fortran$ git grep "xasprintf" | wc -l
32

What about memory fragmentation when switching to heap-based allocation?
Or is there consensus that these are in the noise compared to other
parts of the compiler?

BTW:
$ git grep APO
io.c:  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE", NULL };
io.c:  static const char *delim[] = { "APOSTROPHE", "QUOTE", "NONE", NULL };


> Ok for GCC 7 stage1 with these changes. I don't think it's worth
> putting it into GCC 6 at this point anymore, unless this is actually
> fixing some bugs that are visible to users?

Not visible, no, can wait easily.
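
For reference, the heap-based approach suggested above could look roughly like the following. This is only a sketch: sketch_xasprintf is a hypothetical stand-in written with standard C calls, not libiberty's xasprintf itself, and it assumes a C99 vsnprintf that reports the required length when given a NULL buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for xasprintf: format into a freshly allocated
   buffer of exactly the required size, aborting on any failure.  The
   caller owns the result and must free it.  */
static char *
sketch_xasprintf (const char *fmt, ...)
{
  va_list ap, ap2;
  int len;
  char *buf;

  assert (fmt != NULL);

  va_start (ap, fmt);
  va_copy (ap2, ap);

  /* First pass: only measure how many characters are needed.  */
  len = vsnprintf (NULL, 0, fmt, ap);
  va_end (ap);
  if (len < 0)
    abort ();

  buf = malloc ((size_t) len + 1);
  if (buf == NULL)
    abort ();

  /* Second pass: actually write the string, including the terminator.  */
  vsnprintf (buf, (size_t) len + 1, fmt, ap2);
  va_end (ap2);
  return buf;
}
```

Compared with a fixed char buffer sized from GFC_MAX_SYMBOL_LEN, the result can never be truncated, at the cost of one allocation and one free per use.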


Re: [UPC 01/22] front-end changes

2015-12-01 Thread David Malcolm
On Tue, 2015-12-01 at 00:19 -0800, Gary Funck wrote:
> On 12/01/15 09:12:44, Eric Botcazou wrote:
> > > All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> > > bootstrapped; no test suite regressions were introduced,
> > > relative to the GCC trunk.
> > 
> > That's not all languages though, Ada and Java are missing.
> 
> OK. I'll bootstrap and run tests on those as well, and
> report back in a day/two.

There's also the jit, which looks like a language to the rest of gcc.



Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Ilya Verbin
On Tue, Dec 01, 2015 at 14:15:59 +0100, Jakub Jelinek wrote:
> On Tue, Dec 01, 2015 at 11:48:51AM +0300, Ilya Verbin wrote:
> > > On 01 Dec 2015, at 11:18, Jakub Jelinek  wrote:
> > >> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> > >> Ok, but it doesn't solve the issue with doing it for the executable, 
> > >> because
> > >> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized 
> > >> device.
> > > 
> > > ?? You mean that the
> > > devicep->unload_image_func (devicep->target_id, version, target_data);
> > > call deinitializes the device or something else (I mean, if there is some
> > > other tgt, then it had to be initialized)?
> > 
> > No, I mean that it can be deinitialized from plugin's __run_exit_handlers 
> > (see my last mail with the patch).
> 
> Then the bug is that you have too many atexit registered handlers that
> perform some finalization, better would be to have a single one that
> performs everything in order.
> 
> Anyway, the other option is in the atexit handlers (liboffloadmic and/or the
> intelmic plugin) to set some flag and ignore free_func calls when the flag
> is set or something like that.
> 
> Note library destructors can also use OpenMP code in them, similarly C++
> dtors etc., so when you at some point finalize certain device, you should
> arrange for newer events on the device to be ignored and new offloadings to
> go to host fallback.

So I guess the decision to do host fallback should be made in resolve_device,
rather than in plugins (in free_func and all others).  Is this patch OK?
make check-target-libgomp pass both using emul and hw, offloading from dlopened
libs also works fine.
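
The idea, reduced to a self-contained sketch (all sketch_* names and the tiny device table are made up for illustration and are not the libgomp data structures): one teardown routine sets a flag, and device lookup returns NULL once the flag is set so that callers fall back to the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced device descriptor.  */
struct sketch_device
{
  int id;
  bool is_initialized;
};

#define SKETCH_NUM_DEVICES 2

static struct sketch_device sketch_devices[SKETCH_NUM_DEVICES]
  = { { 0, true }, { 1, true } };

/* True once the offloading runtime has been torn down.  */
static bool sketch_finalized;

/* Return the descriptor for DEVICE_ID, or NULL to request host fallback.
   After finalization every request gets NULL.  */
static struct sketch_device *
sketch_resolve_device (int device_id)
{
  if (sketch_finalized)
    return NULL;

  if (device_id < 0 || device_id >= SKETCH_NUM_DEVICES)
    return NULL;

  struct sketch_device *dev = &sketch_devices[device_id];
  assert (dev->id == device_id);
  return dev;
}

/* Single teardown routine, intended to be registered once with atexit.
   The flag is set first so that later requests are ignored.  */
static void
sketch_target_fini (void)
{
  sketch_finalized = true;

  for (int i = 0; i < SKETCH_NUM_DEVICES; i++)
    sketch_devices[i].is_initialized = false;
}
```

Anything that runs after the teardown routine, such as a destructor containing offloaded code, would then see NULL from the lookup and take the host path instead of touching a finalized device.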


libgomp/
* target.c (finalized): New static variable.
(resolve_device): Do nothing when finalized is true.
(GOMP_offload_register_ver): Likewise.
(GOMP_offload_unregister_ver): Likewise.
(gomp_target_fini): New static function.
(gomp_target_init): Call gomp_target_fini at exit.
liboffloadmic/
* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
(register_main_image): Do not call unregister_main_image at exit.
(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.


diff --git a/libgomp/target.c b/libgomp/target.c
index cf9d0e6..320178e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -78,6 +78,10 @@ static int num_devices;
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;
 
+/* True when offloading runtime is finalized.  */
+static bool finalized;
+
+
 /* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
 
 static void *
@@ -108,6 +112,9 @@ gomp_get_num_devices (void)
 static struct gomp_device_descr *
 resolve_device (int device_id)
 {
+  if (finalized)
+return NULL;
+
   if (device_id == GOMP_DEVICE_ICV)
 {
   struct gomp_task_icv *icv = gomp_icv (false);
@@ -1095,6 +1102,9 @@ GOMP_offload_register_ver (unsigned version, const void 
*host_table,
 {
   int i;
 
+  if (finalized)
+return;
+
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
 gomp_fatal ("Library too old for offload (version %u < %u)",
GOMP_VERSION, GOMP_VERSION_LIB (version));
@@ -1143,6 +1153,9 @@ GOMP_offload_unregister_ver (unsigned version, const void 
*host_table,
 {
   int i;
 
+  if (finalized)
+return;
+
   gomp_mutex_lock (&register_lock);
 
   /* Unload image from all initialized devices.  */
@@ -2282,6 +2295,24 @@ gomp_load_plugin_for_device (struct gomp_device_descr 
*device,
   return 0;
 }
 
+/* This function finalizes the runtime needed for offloading and all initialized
+   devices.  */
+
+static void
+gomp_target_fini (void)
+{
+  finalized = true;
+
+  int i;
+  for (i = 0; i < num_devices; i++)
+{
+  struct gomp_device_descr *devicep = &devices[i];
+  gomp_mutex_lock (&devicep->lock);
+  gomp_fini_device (devicep);
+  gomp_mutex_unlock (&devicep->lock);
+}
+}
+
 /* This function initializes the runtime needed for offloading.
It parses the list of offload targets and tries to load the plugins for
these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -2387,6 +2418,9 @@ gomp_target_init (void)
   if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
goacc_register (&devices[i]);
 }
+
+  if (atexit (gomp_target_fini) != 0)
+gomp_fatal ("atexit failed");
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp 
b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index f8c1725..68f7b2c 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -231,12 +231,6 @@ offload (const char *file, uint64_t line, int device, 
const char *name,
 }
 
 static void
-unregister_main_image ()
-{
-  __offload_unregister_image (&main_target_image);
-}
-
-static void
 register_main_image ()
 {
   /* Do not check the return value, because old 

Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread Bernhard Reutner-Fischer
On 1 December 2015 at 17:41, Steve Kargl
 wrote:
> On Tue, Dec 01, 2015 at 05:12:57PM +0100, Bernhard Reutner-Fischer wrote:
>> On 1 December 2015 at 16:01, Steve Kargl
>>  wrote:
>> > On Tue, Dec 01, 2015 at 01:55:01PM +0100, Bernhard Reutner-Fischer wrote:
>> >>
>> >> David Malcolm's nice Levenshtein distance spelling check helpers
>> >> were used in some parts of other frontends. This proposed patch adds
>> >> some spelling corrections to the fortran frontend.
>>
>> > What problem are you trying to solve here?  The patch looks like
>>
>> The idea is to improve the programmer experience when writing code.
>> See the testcases enclosed in the patch. I consider this a feature :)
>
> Opinions differ.  I consider it unnecessary bloat.

Fair enough.
I fully agree that it's bloat.

The compiler is so tremendously bloated by now anyway that i consider
these couple of kilobytes to have a nice bloat/user friendliness
factor, overall ;)
I can imagine that people code their fortran programs in an IDE (the
bloated variant of an editor, mine is ~20518 bytes of text, no data,
no bss) and IDEs will sooner or later support fixit-hints. Even the
console/terminal users might enjoy saving themselves a cycle of opening a
different file, looking up the type/module/etc name and then going
back to the source-file to correct their typo. *I* would welcome that
sometimes for sure :)

>> > unneeded complexity with the result of injecting C++ idioms into
>> > the Fortran FE.
>>
>> What C++ idioms are you referring to? The autovec?
>> AFAIU the light use of C++ in GCC is deemed OK. I see usage of
>> std::swap and std::map in the FE, not to mention the wide-int uses
>> (wi::). Thus we don't have to realloc/strcat but can use vectors to
>> the same effect, just as other frontends, including the C frontend,
>> do.
>> I take it you remember that we had to change all "try" to something
>> C++ friendly. If the Fortran FE meant to opt-out of being compiled
>> with a C++ compiler in the first place, why were all the C++ clashes
>> rewritten, back then? :)
>
> Yes, I know there are other C++ (mis)features within the
> Fortran FE especially in the trans-*.c files.  Those are
> accepted (by some) as necessary evils to interface with
> the ME.  Your patch injects C++ into otherwise perfectly
> fine C code, which makes it more difficult for those with
> no or very limited C++ knowledge to maintain the gfortran.

So you're in favour of using realloc and strcat, ok. I can use that.
Let me see if ipa-icf can replace all the identical tails of the
lookup_*_fuzzy into a common helper.
Shouldn't rely on LTO anyway nor ipa-icf i suppose.

>
> There are currently 806 open bug reports for gfortran.
> AFAIK, your patch does not address any of those bug reports.

I admit i didn't look..

> The continued push to inject C++ into the Fortran FE will
> have the (un)intentional consequence of forcing at least one
> active gfortran contributor to stop.

That was not my intention for sure.

cheers,


Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread Bernhard Reutner-Fischer
On 1 December 2015 at 18:28, David Malcolm  wrote:
> On Tue, 2015-12-01 at 13:55 +0100, Bernhard Reutner-Fischer wrote:


>> +/* Lookup function FN fuzzily, taking names in FUN into account.  */
>> +
>> +const char*
>> +gfc_lookup_function_fuzzy (const char *fn, gfc_symtree *fun)
>> +{
>> +  auto_vec <const char *> candidates;
>> +  lookup_function_fuzzy_find_candidates (fun, &candidates);
>> +
>> +  /* Determine closest match.  */
>> +  int i;
>> +  const char *name, *best = NULL;
>> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
>> +
>> +  FOR_EACH_VEC_ELT (candidates, i, name)
>> +{
>> +  edit_distance_t dist = levenshtein_distance (fn, name);
>> +  if (dist < best_distance)
>> + {
>> +   best_distance = dist;
>> +   best = name;
>> + }
>> +}
>> +  /* If more than half of the letters were misspelled, the suggestion is
>> + likely to be meaningless.  */
>> +  if (best)
>> +{
>> +  unsigned int cutoff = MAX (strlen (fn), strlen (best)) / 2;
>> +  if (best_distance > cutoff)
>> + return NULL;
>> +}
>> +  return best;
>> +}
>
>
> Caveat: I'm not very familiar with the Fortran FE, so take the following
> with a pinch of salt.
>
> If I'm reading things right, here, and in various other places, you're
> building a vec of const char *, and then seeing which one of those
> candidates is the best match for another const char *.
>
> You could simplify things by adding a helper function to spellcheck.h,
> akin to this one:
>
> extern tree
> find_closest_identifier (tree target, const auto_vec<tree> *candidates);

I was hoping for ipa-icf to fix that up on my behalf. I'll try to see
if it does. Short of that: yes, should do that.

>
> This would reduce the amount of duplication in the patch (and slightly
> reduce the amount of C++).

As said, we could as well use a list of candidates with NULL as record marker.
Implementation cosmetics. Steve seems to not be thrilled by the
overall idea in the first place, so unless there is clear support by
somebody else i won't pursue this any further, it's not that i'm bored
or ran out of stuff i should do.. ;)
>
> [are there IDENTIFIER nodes in the Fortran FE, or is it all const char
> *? this would avoid some strlen calls]

Right, but in the Fortran FE these are const char*.

thanks for your comments!
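
A plain-C sketch of the shared helper discussed above, using a NULL-terminated candidate list instead of a vector. It is only an illustration: the sketch_* names are made up, the distance routine is a textbook Levenshtein implementation rather than the one in spellcheck.c, and only the cutoff rule is taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Textbook Levenshtein distance between S and T, using two rows of the
   dynamic-programming table.  */
static size_t
sketch_levenshtein (const char *s, const char *t)
{
  size_t ls = strlen (s), lt = strlen (t);
  size_t *prev = malloc ((lt + 1) * sizeof *prev);
  size_t *cur = malloc ((lt + 1) * sizeof *cur);
  if (prev == NULL || cur == NULL)
    abort ();

  /* Distance from the empty prefix of S to each prefix of T.  */
  for (size_t j = 0; j <= lt; j++)
    prev[j] = j;

  for (size_t i = 1; i <= ls; i++)
    {
      cur[0] = i;
      for (size_t j = 1; j <= lt; j++)
	{
	  size_t subst = prev[j - 1] + (s[i - 1] != t[j - 1]);
	  size_t del = prev[j] + 1;
	  size_t ins = cur[j - 1] + 1;
	  size_t best = subst < del ? subst : del;
	  cur[j] = best < ins ? best : ins;
	}
      /* The row just computed becomes the previous row.  */
      size_t *tmp = prev;
      prev = cur;
      cur = tmp;
    }

  size_t result = prev[lt];
  free (prev);
  free (cur);
  return result;
}

/* Return the entry of the NULL-terminated array CANDIDATES closest to
   TYPO, or NULL if there is no candidate or if more than half of the
   letters would have to change.  */
static const char *
sketch_find_closest (const char *typo, const char *const *candidates)
{
  const char *best = NULL;
  size_t best_distance = (size_t) -1;

  assert (typo != NULL && candidates != NULL);

  for (; *candidates != NULL; candidates++)
    {
      size_t dist = sketch_levenshtein (typo, *candidates);
      if (dist < best_distance)
	{
	  best_distance = dist;
	  best = *candidates;
	}
    }

  if (best != NULL)
    {
      size_t a = strlen (typo), b = strlen (best);
      size_t cutoff = (a > b ? a : b) / 2;
      if (best_distance > cutoff)
	return NULL;
    }
  return best;
}
```

With one such helper, each lookup_*_fuzzy function would only have to collect its candidate names; the distance loop and the cutoff would live in a single place.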


Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread David Malcolm
On Tue, 2015-12-01 at 18:51 +0100, Bernhard Reutner-Fischer wrote:
> On 1 December 2015 at 18:28, David Malcolm  wrote:
> > On Tue, 2015-12-01 at 13:55 +0100, Bernhard Reutner-Fischer wrote:
> 
> 
> >> +/* Lookup function FN fuzzily, taking names in FUN into account.  */
> >> +
> >> +const char*
> >> +gfc_lookup_function_fuzzy (const char *fn, gfc_symtree *fun)
> >> +{
> >> +  auto_vec <const char *> candidates;
> >> +  lookup_function_fuzzy_find_candidates (fun, &candidates);
> >> +
> >> +  /* Determine closest match.  */
> >> +  int i;
> >> +  const char *name, *best = NULL;
> >> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> >> +
> >> +  FOR_EACH_VEC_ELT (candidates, i, name)
> >> +{
> >> +  edit_distance_t dist = levenshtein_distance (fn, name);
> >> +  if (dist < best_distance)
> >> + {
> >> +   best_distance = dist;
> >> +   best = name;
> >> + }
> >> +}
> >> +  /* If more than half of the letters were misspelled, the suggestion is
> >> + likely to be meaningless.  */
> >> +  if (best)
> >> +{
> >> +  unsigned int cutoff = MAX (strlen (fn), strlen (best)) / 2;
> >> +  if (best_distance > cutoff)
> >> + return NULL;
> >> +}
> >> +  return best;
> >> +}
> >
> >
> > Caveat: I'm not very familiar with the Fortran FE, so take the following
> > with a pinch of salt.
> >
> > If I'm reading things right, here, and in various other places, you're
> > building a vec of const char *, and then seeing which one of those
> > candidates is the best match for another const char *.
> >
> > You could simplify things by adding a helper function to spellcheck.h,
> > akin to this one:
> >
> > extern tree
> > find_closest_identifier (tree target, const auto_vec<tree> *candidates);
> 
> I was hoping for ipa-icf to fix that up on my behalf. I'll try to see
> if it does. Short of that: yes, should do that.

I was more thinking about code readability; don't rely on ipa-icf - fix
it in the source.

> > This would reduce the amount of duplication in the patch (and slightly
> > reduce the amount of C++).
> 
> As said, we could as well use a list of candidates with NULL as record marker.
> Implementation cosmetics. Steve seems to not be thrilled by the
> overall idea in the first place, so unless there is clear support by
> somebody else i won't pursue this any further, it's not that i'm bored
> or ran out of stuff i should do.. ;)

(FWIW I liked the idea, but I'm not a Fortran person so my opinion
counts much less than Steve's)

> > [are there IDENTIFIER nodes in the Fortran FE, or is it all const char
> > *? this would avoid some strlen calls]
> 
> Right, but in the Fortran FE these are const char*.
> 
> thanks for your comments!




RE: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread VandeVondele Joost
So, I have tested the patch, it seems to work well.

I would really like to see this feature in the compiler, I'm sure it will help 
people developing Fortran code.

I already have an enhancement request, catching the name of 'Keyword argument' :

> cat test.f90
MODULE test
CONTAINS
  SUBROUTINE foo(bar)
 INTEGER :: bar
  END SUBROUTINE
END MODULE
USE test 
CALL foo(baz=1)
END



Re: [PATCH] Fix declaration of pthread-structs in s-osinte-rtems.ads (ada/68169)

2015-12-01 Thread Jeff Law

On 12/01/2015 12:56 PM, Jan Sommer wrote:

Am Monday 30 November 2015, 16:19:30 schrieb Jeff Law:

On 11/30/2015 03:06 PM, Jan Sommer wrote:

Could someone with write access please commit the patch?
The paperwork with the FSF has gone through. If something else is missing, 
please tell me.
I won't be available next week.

I'm not sure what you built your patches against, but I can't apply them
to the trunk.  Can you resend a patch as a diff against the trunk?

Often I can fix things by hand, but this is Ada and I'd be much more
likely to botch something.


I updated the patches again. They should now fit with the heads of the 
respective branches again.
Maybe the Changelog will be out of synch again.
The patches are for the following branches:
ada-68169_4.9.diff   -->  gcc-4_9-branch
ada-68169_5.x.diff  -->   gcc-5-branch
ada-68169_trunk.diff --> trunk

Let me know if they apply this time. I used svn diff to create them and used 
patch -p0 to test if they apply locally.

Thanks.  I've committed this to the trunk based on Joel's comments.

The gcc-5 branch is frozen for the upcoming release and gcc-4.9 is 
regression/doc fixes only.  It'll be up to the release managers whether 
or not to backport to those branches.


Thanks.

Jeff


Re: [PATCH] Fix PR68029

2015-12-01 Thread Jeff Law

On 11/27/2015 06:11 AM, Jiří Engelthaler wrote:

2015-11-27 13:49 GMT+01:00 Bernd Schmidt :

On 11/27/2015 01:30 PM, Jiří Engelthaler wrote:


Sorry for international characters in my name. It should be

Jiri Engelthaler

2015-11-27 13:29 GMT+01:00 Engelthaler Jiří :



There is precedent for non-ASCII characters in ChangeLogs. Grep for Rafael
Ávila de Espíndola. But I think there should be two spaces before the email
address.


You are right - two spaces.


 PR driver/68029
 * opts-common.c (prune_options): fdiagnostics_color ignored
 if it was as first parameter



This should read "Don't ignore -fdiagnostics-color if it is the first
parameter." Full sentences with punctuation.


Changelog modified.

Thank you for recommendation, this is my first patch to GCC.
I did a successful bootstrap & regression test on x86_64-linux-gnu with 
your patch and installed it on the trunk.


Thanks,
Jeff


Re: [PR68001, CilkPlus] Fix for PR68001

2015-12-01 Thread Tom de Vries

On 30/11/15 21:43, Zamyatin, Igor wrote:


FAIL: obj-c++.dg/property/dotsyntax-11.mm -fgnu-runtime  (test for errors,
line 51)
FAIL: obj-c++.dg/property/dotsyntax-11.mm -fgnu-runtime  (test for errors,
line 56)
FAIL: obj-c++.dg/property/dotsyntax-11.mm -fgnu-runtime  (test for errors,
line 59)

Andreas.


Here is the patch that properly limits GS_ERROR exit only in case of error in 
cilk spawn detection.



Please add PR objc++/68511 to the ChangeLog entry.

Thanks,
- Tom


Bootstrapped and regtested on x86_64, ok for trunk?

Thanks,
Igor

cp/Changelog

2015-11-27  Igor Zamyatin  

PR c++/68001
* cp-gimplify.c (cp_gimplify_expr): Limit GS_ERROR only in case of
error in cilk spawn detection.



diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index 09ee5ff..3dbbd7f 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -559,6 +559,7 @@ int
  cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
  {
int saved_stmts_are_full_exprs_p = 0;
+  bool is_spawn_detected = true;
enum tree_code code = TREE_CODE (*expr_p);
enum gimplify_status ret;

@@ -614,12 +615,12 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p)
 25979.  */
  case INIT_EXPR:
if (fn_contains_cilk_spawn_p (cfun)
- && cilk_detect_spawn_and_unwrap (expr_p))
+ && (is_spawn_detected = cilk_detect_spawn_and_unwrap (expr_p)))
{
  cilk_cp_gimplify_call_params_in_spawned_fn (expr_p, pre_p, post_p);
  return (enum gimplify_status) gimplify_cilk_spawn (expr_p);
}
-  if (seen_error ())
+  if (!is_spawn_detected && seen_error ())
return GS_ERROR;

cp_gimplify_init_expr (expr_p);







[ping] pending patches

2015-12-01 Thread Eric Botcazou
IA-64 (stack checking improvement):
  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01604.html

MIPS (stack checking improvement):
  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01602.html

Aarch64 (stack checking implementation):
  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01988.html

DWARF-2 (debug info for Scalar_Storage_Order attribute):
  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01659.html

C++ (PR 68290: internal error with concepts):
  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03301.html

Thanks in advance.

-- 
Eric Botcazou


Re: [C PATCH] Fix up location used in get_parm_info diagnostics (PR c/68533)

2015-12-01 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 08:57:39PM -0700, Jeff Law wrote:
> On 12/01/2015 01:34 PM, Jakub Jelinek wrote:
> >Hi!
> >
> >get_parm_info right now uses input_location as the diagnostics locus, but as
> >can be seen on the testcase, that is pretty random location at that point,
> >often the type of the last parameter.
> >
> >This patch changes it to use the locus from the binding info.
> >
> >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> >2015-12-01  Jakub Jelinek  
> >
> > PR c/68533
> > * c-decl.c (get_parm_info): Use b->locus instead of input_location
> > for diagnostics.
> >
> > * gcc.dg/pr68533.c: New test.
> I think the change itself is fine.  My question is whether or not the C++
> front-end gets this right.  ISTM we ought to be running the test on both the
> C & C++ front-ends.  The C++ front-end may emit different messages, but we
> ought to be able to account for that and ensure that we're getting them on
> the right lines.

The warning does not exist at all in the C++ FE, it has instead
just an error on declaring anonymous types (not others) among parameters.
C++ does not have forward parameter declarations.  And for the void
among arguments diagnostics it uses slightly different locations and
completely different wording.  So I'm afraid there is basically nothing in
common between C and C++ FEs in this area, so a shared testcase does not
make sense.

Jakub


Re: [C PATCH] Fix up location used in get_parm_info diagnostics (PR c/68533)

2015-12-01 Thread Jeff Law

On 12/02/2015 12:16 AM, Jakub Jelinek wrote:

On Tue, Dec 01, 2015 at 08:57:39PM -0700, Jeff Law wrote:

On 12/01/2015 01:34 PM, Jakub Jelinek wrote:

Hi!

get_parm_info right now uses input_location as the diagnostics locus, but as
can be seen on the testcase, that is pretty random location at that point,
often the type of the last parameter.

This patch changes it to use the locus from the binding info.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-01  Jakub Jelinek  

PR c/68533
* c-decl.c (get_parm_info): Use b->locus instead of input_location
for diagnostics.

* gcc.dg/pr68533.c: New test.

I think the change itself is fine.  My question is whether or not the C++
front-end gets this right.  ISTM we ought to be running the test on both the
C & C++ front-ends.  The C++ front-end may emit different messages, but we
ought to be able to account for that and ensure that we're getting them on
the right lines.


The warning does not exist at all in the C++ FE, it has instead
just an error on declaring anonymous types (not others) among parameters.
C++ does not have forward parameter declarations.  And for the void
among arguments diagnostics it uses slightly different locations and
completely different wording.  So I'm afraid there is basically nothing in
common between C and C++ FEs in this area, so a shared testcase does not
make sense.

In that case, it's fine as-is for the trunk.

Thanks for checking on the C++ side.

Jeff


Re: [PATCH] rs6000: Optimise SImode cstore on 64-bit

2015-12-01 Thread Segher Boessenkool
On Wed, Dec 02, 2015 at 01:50:46PM +1030, Alan Modra wrote:
> On Wed, Dec 02, 2015 at 01:55:17AM +, Segher Boessenkool wrote:
> > +  emit_insn (gen_subdi3 (tmp, op1, op2));
> > +  emit_insn (gen_lshrdi3 (tmp2, tmp, GEN_INT (63)));
> > +  emit_insn (gen_anddi3 (tmp3, tmp2, const1_rtx));
> 
> Why the AND?  The top 63 bits are already clear.

Ha, yes.  Thanks.  In a previous version I shifted by less, in which
case GCC is smart enough to make it 63 anyway.  63 is always correct
as well, and simpler because you don't need the AND.  But I forgot
to take it out :-)


Segher
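
A minimal stand-alone sketch of why the AND is redundant (plain C++, not the
rs6000 expander; the helper name is made up for illustration). It assumes the
comparison is a signed less-than and that both 32-bit operands have been
sign-extended to 64 bits, so the subtraction cannot overflow:

```cpp
#include <cstdint>

// Hypothetical helper, not GCC code: signed "a < b" for 32-bit operands,
// computed branch-free in 64-bit arithmetic.
std::uint64_t lt_via_shift(std::int32_t a, std::int32_t b)
{
    // Widen first; the 64-bit difference of two 32-bit values cannot overflow.
    std::int64_t diff = static_cast<std::int64_t>(a) - static_cast<std::int64_t>(b);
    // A logical right shift by 63 leaves only the sign bit, so the result is
    // already 0 or 1 and a following "& 1" would be a no-op.
    return static_cast<std::uint64_t>(diff) >> 63;
}
```

With a smaller shift count the upper bits would still need masking, which is
presumably where the AND came from in the earlier version.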


Re: [C PATCH] Fix up location used in get_parm_info diagnostics (PR c/68533)

2015-12-01 Thread Jeff Law

On 12/01/2015 01:34 PM, Jakub Jelinek wrote:

Hi!

get_parm_info right now uses input_location as the diagnostics locus, but as
can be seen on the testcase, that is a pretty random location at that point,
often the type of the last parameter.

This patch changes it to use the locus from the binding info.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2015-12-01  Jakub Jelinek  

PR c/68533
* c-decl.c (get_parm_info): Use b->locus instead of input_location
for diagnostics.

* gcc.dg/pr68533.c: New test.
I think the change itself is fine.  My question is whether or not the 
C++ front-end gets this right.  ISTM we ought to be running the test on 
both the C & C++ front-ends.  The C++ front-end may emit different 
messages, but we ought to be able to account for that and ensure that 
we're getting them on the right lines.


Jeff




Re: [PATCH 1/2] destroy values as well as keys when removing them from hash maps

2015-12-01 Thread Trevor Saunders
On Tue, Dec 01, 2015 at 07:43:35PM +, Richard Sandiford wrote:
> tbsaunde+...@tbsaunde.org writes:
> > -template 
> > +template 
> >  template 
> >  inline void
> > -simple_hashmap_traits ::remove (T )
> > +simple_hashmap_traits ::remove (T )
> >  {
> >H::remove (entry.m_key);
> > +  entry.m_value.~Value ();
> >  }
> 
> This is just repeating my IRC comment really, but doesn't this mean that
> we're calling the destructor on an object that was never constructed?
> I.e. nothing ever calls placement new on the entry, the m_key, or the
> m_value.

I believe you are correct that placement new is not called.  I'd say it's
a bug waiting to happen given that the usage of auto_vec seems to
demonstrate that people expect objects to be initialized and destroyed.
However for now all values are either POD, or auto_vec and in either
case the current 0 initialization has the same effect as the
constructor.  So there may be a theoretical problem with how we
initialize values that will become real when somebody adds a constructor
that doesn't just 0 initialize.  So it should probably be improved at
some point, but it doesn't seem necessary to mess with it at this point
instead of next stage 1.

Trev

> 
> Thanks,
> Richard
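
For reference, a minimal sketch of the pairing being discussed: an explicit
destructor call is only balanced if the object was first constructed in place.
This is not the hash_map code; Counted and Slot are hypothetical types invented
for the illustration, and Counted's constructor deliberately does more than
zero-fill:

```cpp
#include <new>

// Hypothetical value type whose constructor does more than zero-initialize.
struct Counted
{
    static int live;   // number of currently constructed objects
    int payload;
    Counted() : payload(42) { ++live; }
    ~Counted() { --live; }
};
int Counted::live = 0;

// Hypothetical slot, standing in for the raw storage of a map entry's value.
struct Slot
{
    alignas(Counted) unsigned char storage[sizeof(Counted)];
    Counted *obj = nullptr;
};

// Construct the value in place before first use ...
void construct(Slot &slot) { slot.obj = new (slot.storage) Counted(); }

// ... so that the explicit destructor call on removal is balanced.
void destroy(Slot &slot)
{
    slot.obj->~Counted();
    slot.obj = nullptr;
}
```

If the slot were only zero-filled instead of constructed, payload would be 0
and the live count would go negative on removal, which is the kind of mismatch
that stays invisible as long as every value type is POD-like.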


[gomp4] backport fortran gang parsing updates

2015-12-01 Thread Cesar Philippidis
This patch carries over fortran gang parsing updates I recently applied
to trunk to gomp-4_0-branch. Most of it was straightforward, but I did
take the opportunity to clean up struct gfc_omp_clauses by eliminating
some unnecessary bits for num_gangs, num_workers, vector_length and
tile. Besides that, this patch does diverge from trunk a little because
of gomp4's preliminary support for device_type.

Tom, while I was working on the combined loop splitter, I noticed that
reductions in combined constructs were still being associated with the
parallel/kernels constructs. I've updated it to match the behavior in
your C FE patch, i.e., reductions in combined constructs are only
associated with the split acc loop. Let me know if that causes any
problems for you. This is how trunk behaves now.

Another random note. I'm not sure why gfortran ICEs when I associate the
private clauses in combined constructs with the acc loop. That code was
ifdef'ed out in gfc_filter_oacc_combined_clauses. I'll investigate this
later.

I've applied this patch to gomp-4_0-branch.

Cesar
2015-12-01  Cesar Philippidis  

	gcc/fortran/
	* dump-parse-tree.c (show_omp_clauses): Handle gang_static_expr
	and gang_num_expr.
	* gfortran.h (struct gfc_omp_clauses): Remove gang_expr, num_gangs,
	num_workers, vector_length and tile.  Add gang_static_expr and
	gang_num_expr.
	* openmp.c (gfc_free_omp_clauses): Handle gang_static_expr,
	gang_num_expr.  Eliminate gang_expr.
	(match_oacc_clause_gang): Update to allow both num and static arguments
	in the same gang clauses.
	(gfc_match_omp_clauses): Remove reference to c->{vector_length,
	num_gangs, num_workers, tile}.
	(resolve_omp_clauses): Update calls to resolve_oacc_positive_int_expr.
	(resolve_oacc_params_in_parallel): Add const char arg argument to
	make the error messages more descriptive.
	(resolve_oacc_loop_blocks): Update calls to
	resolve_oacc_params_in_parallel.
	* trans-openmp.c (gfc_trans_omp_clauses_1): Update how the gang
	clause is lowered.
	(gfc_filter_oacc_combined_clauses): Handle gang_num_expr and
	gang_static_expr.  Also remove OMP_LIST_REDUCTION from the outer
	construct clauses.

	gcc/testsuite/
	* gfortran.dg/goacc/gang-static.f95: Add static num coverage.
	* gfortran.dg/goacc/loop-2.f95: Likewise.
	* gfortran.dg/goacc/loop-6.f95: Likewise.
	* gfortran.dg/goacc/loop-7.f95: New file.

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index a816cde..4f38a09 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1146,10 +1146,24 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
   if (omp_clauses->gang)
 {
   fputs (" GANG", dumpfile);
-  if (omp_clauses->gang_expr)
+  if (omp_clauses->gang_num_expr || omp_clauses->gang_static_expr)
 	{
 	  fputc ('(', dumpfile);
-	  show_expr (omp_clauses->gang_expr);
+	  if (omp_clauses->gang_num_expr)
+	{
+	  fprintf (dumpfile, "num:");
+	  show_expr (omp_clauses->gang_num_expr);
+	}
+	  if (omp_clauses->gang_num_expr && omp_clauses->gang_static)
+	fputc (',', dumpfile);
+	  if (omp_clauses->gang_static)
+	{
+	  fprintf (dumpfile, "static:");
+	  if (omp_clauses->gang_static_expr)
+		show_expr (omp_clauses->gang_static_expr);
+	  else
+		fputc ('*', dumpfile);
+	}
 	  fputc (')', dumpfile);
 	}
 }
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index dd186b5..26f4c8a 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -1229,7 +1229,8 @@ typedef struct gfc_omp_clauses
 
   /* OpenACC. */
   struct gfc_expr *async_expr;
-  struct gfc_expr *gang_expr;
+  struct gfc_expr *gang_static_expr;
+  struct gfc_expr *gang_num_expr;
   struct gfc_expr *worker_expr;
   struct gfc_expr *vector_expr;
   struct gfc_expr *num_gangs_expr;
@@ -1242,7 +1243,6 @@ typedef struct gfc_omp_clauses
   gfc_expr_list *tile_list;
   unsigned async:1, gang:1, worker:1, vector:1, seq:1, independent:1;
   unsigned wait:1, par_auto:1, gang_static:1, nohost:1, acc_collapse:1, bind:1;
-  unsigned num_gangs:1, num_workers:1, vector_length:1, tile:1;
   locus loc;
 
 }
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index c6db847..b354d70 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -77,7 +77,8 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   gfc_free_expr (c->thread_limit);
   gfc_free_expr (c->dist_chunk_size);
   gfc_free_expr (c->async_expr);
-  gfc_free_expr (c->gang_expr);
+  gfc_free_expr (c->gang_num_expr);
+  gfc_free_expr (c->gang_static_expr);
   gfc_free_expr (c->worker_expr);
   gfc_free_expr (c->vector_expr);
   gfc_free_expr (c->num_gangs_expr);
@@ -396,21 +397,41 @@ cleanup:
 static match
 match_oacc_clause_gang (gfc_omp_clauses *cp)
 {
-  if (gfc_match_char ('(') != MATCH_YES)
+  match ret = MATCH_YES;
+
+  if (gfc_match (" ( ") != MATCH_YES)
 return MATCH_NO;
-  if (gfc_match (" num :") == MATCH_YES)
-{
-  cp->gang_static = false;
-  return gfc_match (" %e )", 

Re: When not optimizing do not compute RTX memory attributes

2015-12-01 Thread Jan Hubicka
> On Tue, 1 Dec 2015, Jan Hubicka wrote:
> 
> > Hi,
> > memory attributes are currently optimized and attached to RTL even when not
> > optimizing. This is obviously just a wasted effort.
> 
> Huh, are you sure?  What about globals used from different optimize
> contexts?

Hmm, you are right - we will get worse code quality.  The code won't ICE
because MEM_ATTRS can legally be NULL - get_mem_attrs will then supply default
one for given mode, but we will miss code quality.  I will look into disabling
mem attrs for non-globals only.

It would be nice to get rid of those global persistent RTLs (and make DECL_RTL
to be function local hash)

Honza


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 08:29:27PM +0300, Ilya Verbin wrote:
> libgomp/
>   * target.c (finalized): New static variable.
>   (resolve_device): Do nothing when finalized is true.
>   (GOMP_offload_register_ver): Likewise.
>   (GOMP_offload_unregister_ver): Likewise.
>   (gomp_target_fini): New static function.
>   (gomp_target_init): Call gomp_target_fini at exit.
> liboffloadmic/
>   * plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
>   (register_main_image): Do not call unregister_main_image at exit.
>   (GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.
> 
> diff --git a/libgomp/target.c b/libgomp/target.c
> index cf9d0e6..320178e 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -78,6 +78,10 @@ static int num_devices;
>  /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
>  static int num_devices_openmp;
>  
> +/* True when offloading runtime is finalized.  */
> +static bool finalized;


> +
> +
>  /* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
>  
>  static void *
> @@ -108,6 +112,9 @@ gomp_get_num_devices (void)
>  static struct gomp_device_descr *
>  resolve_device (int device_id)
>  {
> +  if (finalized)
> +return NULL;
> +

This is racy, tsan would tell you so.
Instead of a global var, I'd just change the devicep->is_initialized 
field from bool into a 3 state field (perhaps enum), with states
uninitialized, initialized, finalized, and then say in resolve_device,

  gomp_mutex_lock (&devices[device_id].lock);
  if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
    gomp_init_device (&devices[device_id]);
  else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
    {
      gomp_mutex_unlock (&devices[device_id].lock);
      return NULL;
    }
  gomp_mutex_unlock (&devices[device_id].lock);

Though, of course, that is incomplete, because resolve_device takes one
lock, gomp_get_target_fn_addr another one, gomp_map_vars yet another one.
So I think either we want to rewrite the locking, such that say
resolve_device returns a locked device and then you perform stuff on the
locked device (disadvantage is that gomp_map_vars will call gomp_malloc
with the lock held, which can take some time to allocate the memory),
or there needs to be the possibility that gomp_map_vars rechecks if the
device has not been finalized after taking the lock and returns to the
caller if the device has been finalized in between resolve_device and
gomp_map_vars.

Jakub
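
A rough sketch of the second option (recheck after taking the lock), written
as stand-alone C++ rather than libgomp C. Every name here is hypothetical and
the "initialization" is a placeholder; it only shows the three-state field and
the recheck in the mapping step:

```cpp
#include <mutex>

// All names below are hypothetical; this is not libgomp code.
enum class DeviceState { Uninitialized, Initialized, Finalized };

struct Device
{
    std::mutex lock;
    DeviceState state = DeviceState::Uninitialized;
    int mappings = 0;
};

// Returns nullptr once the device has been finalized; initializes lazily.
Device *resolve_device(Device &dev)
{
    std::lock_guard<std::mutex> guard(dev.lock);
    if (dev.state == DeviceState::Finalized)
        return nullptr;
    if (dev.state == DeviceState::Uninitialized)
        dev.state = DeviceState::Initialized;   // stands in for real init work
    return &dev;
}

// Rechecks the state after taking the lock, because the device may have been
// finalized between resolve_device() and this call.
bool map_vars(Device &dev)
{
    std::lock_guard<std::mutex> guard(dev.lock);
    if (dev.state != DeviceState::Initialized)
        return false;
    ++dev.mappings;
    return true;
}

// Marks the device as finalized under the same lock.
void finalize_device(Device &dev)
{
    std::lock_guard<std::mutex> guard(dev.lock);
    dev.state = DeviceState::Finalized;
}
```

The state is only ever read or written with the per-device lock held, so there
is no unsynchronized global flag for tsan to complain about; the cost is that
every entry point has to cope with a "finalized in between" result.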


[PATCH] Empty redirect_edge_var_map after each pass and function

2015-12-01 Thread Alan Lawrence
This follows on from discussion at
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03392.html
To recap: Starting in r229479 and continuing until at least 229711, compiling
polynom.c from spec2000 on aarch64-none-linux-gnu, with options
-O3 -mcpu=cortex-a53 -ffast-math (on both cross, native bootstrapped, and native
--disable-bootstrap compilers), produced a verify_gimple ICE after unswitch:

../spec2000/benchspec/CINT2000/254.gap/src/polynom.c: In function 
'NormalizeCoeffsListx':
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: 
incompatible types in PHI argument 0
 TypHandle NormalizeCoeffsListx ( hdC )
   ^
long int

int

../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: location 
references block not in block tree
l1_279 = PHI <1(28), l1_299(33)>
../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: error: invalid PHI 
argument

../spec2000/benchspec/CINT2000/254.gap/src/polynom.c:358:11: internal compiler 
error: tree check: expected class 'type', have 'declaration' (namespace_decl) 
in useless_type_conversion_p, at gimple-expr.c:84
0xd110ef tree_class_check_failed(tree_node const*, tree_code_class, char 
const*, int, char const*)
../../gcc-fsf/gcc/tree.c:9643
0x82561b tree_class_check
../../gcc-fsf/gcc/tree.h:3042
0x82561b useless_type_conversion_p(tree_node*, tree_node*)
../../gcc-fsf/gcc/gimple-expr.c:84
0xaca043 verify_gimple_phi
../../gcc-fsf/gcc/tree-cfg.c:4673
0xaca043 verify_gimple_in_cfg(function*, bool)
../../gcc-fsf/gcc/tree-cfg.c:4967
0x9c2e0b execute_function_todo
../../gcc-fsf/gcc/passes.c:1967
0x9c360b do_per_function
../../gcc-fsf/gcc/passes.c:1659
0x9c3807 execute_todo
../../gcc-fsf/gcc/passes.c:2022

I was not able to reduce the testcase below about 30k characters, with e.g.
#define T_VOID 0
 T_VOID 
producing the ICE, but manually changing to
 0 
preventing the ICE; as did running the preprocessor as a separate step, or a
wide variety of options (e.g. -fdump-tree-alias).

In the end I traced this to loop_unswitch reading stale values from the edge
redirect map, which is keyed on 'edge' (a pointer to struct edge_def); the map
entries had been left there by pass_dominator (on a different function), and by
"chance" the edge *pointers* were the same as to some current edge_defs (even
though they pointed to structures created by different allocations, the first
of which had since been freed). Hence the fragility of the testcase and
environment.

While the ICE is prevented merely by adding a call to
redirect_edge_var_map_destroy at the end of pass_dominator::execute, given the
fragility of the bug, difficulty of reducing the testcase, and the low overhead
of emptying an already-empty map, I believe the right fix is to empty the map
as often as we can correctly do so, hence this patch - based substantially on
Richard's comments in PR/68117.
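
To make the failure mode concrete, here is a small stand-alone model of a side
table keyed on object addresses. It is not the GCC code: Edge, side_table and
the helpers are hypothetical, and address reuse is forced deterministically by
constructing two objects in the same storage rather than relying on the
allocator:

```cpp
#include <map>
#include <new>

// Hypothetical stand-ins for GCC's edge structure and the redirect map.
struct Edge { int dest_index; };

// Side table keyed on the address of an Edge.
static std::map<const Edge *, int> side_table;

void remember(const Edge *e, int value) { side_table[e] = value; }

// Returns a pointer to the recorded value, or nullptr if there is none.
const int *lookup(const Edge *e)
{
    auto it = side_table.find(e);
    return it == side_table.end() ? nullptr : &it->second;
}

// Called at a pass or function boundary: drop all entries, keep the table.
void empty_side_table() { side_table.clear(); }

// Constructs two unrelated Edge objects in the same storage, one after the
// other, and reports whether the second one sees the first one's entry.
bool stale_hit_without_emptying()
{
    alignas(Edge) unsigned char storage[sizeof(Edge)];
    Edge *first = new (storage) Edge{1};
    remember(first, 111);
    first->~Edge();                        // the first edge is gone ...
    Edge *second = new (storage) Edge{2};  // ... a new one reuses its address
    bool stale = lookup(second) != nullptr;
    second->~Edge();
    return stale;
}
```

Emptying the table between the two lifetimes makes the second lookup miss,
which is the effect the patch aims for at pass and function boundaries.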

Bootstrapped + check-gcc + check-g++ on x86_64 linux, based on r231105; I've
also built SPEC2000 on aarch64-none-linux-gnu by applying this patch (roughly)
onto the previously-failing r229711, which also passes aarch64 bootstrap, and
a more recent bootstrap on aarch64 is ongoing. Assuming/if no regressions 
there...

Is this ok for trunk?

This could also be a candidate for the 5.3 release; backporting depends only on
the (fairly trivial) r230357.

gcc/ChangeLog:

  Alan Lawrence  
Richard Biener  

* cfgexpand.c (pass_expand::execute): Replace call to
redirect_edge_var_map_destroy with redirect_edge_var_map_empty.
* tree-ssa.c (delete_tree_ssa): Likewise.
* function.c (set_cfun): Call redirect_edge_var_map_empty.
* passes.c (execute_one_ipa_transform_pass, execute_one_pass): Likewise.
* tree-ssa.h (redirect_edge_var_map_destroy): Remove.
(redirect_edge_var_map_empty): New.
* tree-ssa.c (redirect_edge_var_map_destroy): Remove.
(redirect_edge_var_map_empty): New.

---
 gcc/cfgexpand.c | 2 +-
 gcc/function.c  | 2 ++
 gcc/passes.c| 2 ++
 gcc/tree-ssa.c  | 8 
 gcc/tree-ssa.h  | 2 +-
 5 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1990e10..ede1b82 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6291,7 +6291,7 @@ pass_expand::execute (function *fun)
   expand_phi_nodes ();
 
   /* Release any stale SSA redirection data.  */
-  redirect_edge_var_map_destroy ();
+  redirect_edge_var_map_empty ();
 
   /* Register rtl specific functions for cfg.  */
   rtl_register_cfg_hooks ();
diff --git a/gcc/function.c b/gcc/function.c
index 515d7c0..e452865 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -75,6 +75,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
 #include "tree-dfa.h"
+#include "tree-ssa.h"
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -4798,6 +4799,7 

[GOOGLE] add more type in lipo type compare

2015-12-01 Thread Dehao Chen
The following patch can fix an ICE when compiling with LIPO. OK for google-4_9?

Thanks,
Dehao

Index: gcc/l-ipo.c
===
--- gcc/l-ipo.c (revision 225685)
+++ gcc/l-ipo.c (working copy)
@@ -731,6 +731,7 @@ lipo_cmp_type (tree t1, tree t2)
 case NULLPTR_TYPE:
   return 1;
 case TEMPLATE_TYPE_PARM:
+case TEMPLATE_TEMPLATE_PARM:
   return 1;
 default:
   gcc_unreachable ();


Re: [PING v2][PATCH][4.9] Backport fix for PR sanitizer/64820.

2015-12-01 Thread Joakim Tjernlund
On Tue, 2015-12-01 at 20:23 +0300, Maxim Ostapenko wrote:
> On 25/11/15 12:14, Maxim Ostapenko wrote:
> > I would like to ping the patch: 
> > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02174.html.
> > 
> 
> Ping.

Yes, please add :)

Re: [RFA] [PATCH] Fix invalid redundant extension elimination for rl78 port

2015-12-01 Thread Richard Sandiford
Jeff Law  writes:
> @@ -1080,6 +1070,18 @@ add_removable_extension (const_rtx expr, rtx_insn 
> *insn,
> }
> }
>  
> +  /* Fourth, if the extended version occupies more registers than the
> +  original and the source of the extension is the same hard register
> +  as the destination of the extension, then we can not eliminate
> +  the extension without deep analysis, so just punt.
> +
> +  We allow this when the registers are different because the
> +  code in combine_reaching_defs will handle that case correctly.  */
> +  if ((HARD_REGNO_NREGS (REGNO (dest), mode)
> +!= HARD_REGNO_NREGS (REGNO (reg), GET_MODE (reg)))
> +   && REGNO (dest) == REGNO (reg))
> + return;
> +
>/* Then add the candidate to the list and insert the reaching 
> definitions
>   into the definition map.  */
>ext_cand e = {expr, code, mode, insn};

I might be wrong, but the check looks specific to little-endian.  Would
it make sense to use reg_overlap_mentioned_p instead of the REGNO check?

Thanks,
Richard
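
A toy model of the difference between the two checks, in case it helps. This
is not GCC's reg_overlap_mentioned_p; it just treats a hard-register value as
the half-open range [regno, regno + nregs), and all names are made up:

```cpp
// Hypothetical model, not GCC code: a value in hard registers occupies the
// half-open range [regno, regno + nregs).
struct HardRegRange
{
    unsigned regno;
    unsigned nregs;
};

// True when the two values share at least one hard register.
bool ranges_overlap(HardRegRange a, HardRegRange b)
{
    return a.regno < b.regno + b.nregs && b.regno < a.regno + a.nregs;
}

// The check from the quoted hunk, restated on the model: only identical
// starting registers with differing register counts are rejected.
bool same_regno_different_size(HardRegRange dest, HardRegRange src)
{
    return dest.nregs != src.nregs && dest.regno == src.regno;
}
```

On this model a narrow source in register 1 and a wide destination in
registers 0-1 overlap but have different REGNOs, so only the overlap test
rejects the pair. Whether a big-endian port can actually produce that
arrangement here is the open question; the sketch only shows that the two
predicates differ.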


[PATCH] PR c/68637: Rebuild array with the updated function pointer type

2015-12-01 Thread H.J. Lu
When we apply a function attribute to an array of function pointers, we
need to rebuild the array with the updated function pointer type.

gcc/

PR c/68637
* attribs.c (decl_attributes): Rebuild array with the updated
function pointer type.

gcc/testsuite/

PR c/68637
* gcc.target/i386/pr68637.c: New test.
---
 gcc/attribs.c   | 18 +-
 gcc/testsuite/gcc.target/i386/pr68637.c | 10 ++
 2 files changed, 27 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68637.c

diff --git a/gcc/attribs.c b/gcc/attribs.c
index affb21d..0be5ebf 100644
--- a/gcc/attribs.c
+++ b/gcc/attribs.c
@@ -494,9 +494,18 @@ decl_attributes (tree *node, tree attributes, int flags)
  flags &= ~(int) ATTR_FLAG_TYPE_IN_PLACE;
}
 
+  tree array_type = (TREE_CODE (*anode) == ARRAY_TYPE
+? *anode
+: NULL_TREE);
+
   if (spec->function_type_required && TREE_CODE (*anode) != FUNCTION_TYPE
  && TREE_CODE (*anode) != METHOD_TYPE)
{
+ /* We need to rebuid array with the updated function pointer
+type later. */
+ if (array_type)
+   *anode = TREE_TYPE (*anode);
+
  if (TREE_CODE (*anode) == POINTER_TYPE
  && (TREE_CODE (TREE_TYPE (*anode)) == FUNCTION_TYPE
  || TREE_CODE (TREE_TYPE (*anode)) == METHOD_TYPE))
@@ -617,7 +626,14 @@ decl_attributes (tree *node, tree attributes, int flags)
  if (fn_ptr_quals)
fn_ptr_tmp = build_qualified_type (fn_ptr_tmp, fn_ptr_quals);
  if (DECL_P (*node))
-   TREE_TYPE (*node) = fn_ptr_tmp;
+   {
+ if (array_type)
+   TREE_TYPE (*node)
+ = build_array_type (fn_ptr_tmp,
+ TYPE_DOMAIN (array_type));
+ else
+   TREE_TYPE (*node) = fn_ptr_tmp;
+   }
  else
{
  gcc_assert (TREE_CODE (*node) == POINTER_TYPE);
diff --git a/gcc/testsuite/gcc.target/i386/pr68637.c 
b/gcc/testsuite/gcc.target/i386/pr68637.c
new file mode 100644
index 000..c6fc6ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr68637.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2 -Werror " } */
+
+extern void (*bar[10]) (int, int) __attribute__ ((regparm (2)));
+
+void
+xxx (int i)
+{
+  bar[i] (1, 2);
+}
-- 
2.5.0



Re: [PATCH 1/2] destroy values as well as keys when removing them from hash maps

2015-12-01 Thread Richard Sandiford
tbsaunde+...@tbsaunde.org writes:
> -template 
> +template 
>  template 
>  inline void
> -simple_hashmap_traits ::remove (T )
> +simple_hashmap_traits ::remove (T )
>  {
>H::remove (entry.m_key);
> +  entry.m_value.~Value ();
>  }

This is just repeating my IRC comment really, but doesn't this mean that
we're calling the destructor on an object that was never constructed?
I.e. nothing ever calls placement new on the entry, the m_key, or the
m_value.

Thanks,
Richard


Re: [PATCH] RFC: Use Levenshtein spelling suggestions in Fortran FE

2015-12-01 Thread Steve Kargl
On Tue, Dec 01, 2015 at 06:34:57PM +0100, Bernhard Reutner-Fischer wrote:
> On 1 December 2015 at 17:41, Steve Kargl
> >
> > Yes, I know there are other C++ (mis)features within the
> > Fortran FE especially in the trans-*.c files.  Those are
> > accepted (by some) as necessary evils to interface with
> > the ME.  Your patch injects C++ into otherwise perfectly
> > fine C code, which makes it more difficult for those with
> > no or very limited C++ knowledge to maintain the gfortran.
> 
> So you're in favour of using realloc and strcat, ok. I can use that.
> Let me see if ipa-icf can replace all the identical tails of the
> lookup_*_fuzzy into a common helper.
> Shouldn't rely on LTO anyway nor ipa-icf i suppose.

Yes, I would prefer it, but certainly won't demand it.
There are other Fortran contributors/maintainers.  They
may prefer your approach, so give them time to speak up.

-- 
Steve


Re: [PATCH] Fix declaration of pthread-structs in s-osinte-rtems.ads (ada/68169)

2015-12-01 Thread Jan Sommer
Am Monday 30 November 2015, 16:19:30 schrieb Jeff Law:
> On 11/30/2015 03:06 PM, Jan Sommer wrote:
> > Could someone with write access please commit the patch?
> > The paperwork with the FSF has gone through. If something else is missing, 
> > please tell me.
> > I won't be available next week.
> I'm not sure what you built your patches against, but I can't apply them 
> to the trunk.  Can you resend a patch as a diff against the trunk?
> 
> Often I can fix things by hand, but this is Ada and I'd be much more 
> likely to botch something.

I updated the patches again. They should now fit with the heads of the 
respective branches again.
Maybe the Changelog will be out of synch again.
The patches are for the following branches:
ada-68169_4.9.diff   -->  gcc-4_9-branch
ada-68169_5.x.diff  -->   gcc-5-branch
ada-68169_trunk.diff --> trunk

Let me know if they apply this time. I used svn diff to create them and used 
patch -p0 to test if they apply locally.

Thank you,

  Jan

> 
> 
> jeff
> 
> 
Index: gcc/ada/ChangeLog
===
--- gcc/ada/ChangeLog	(Revision 231125)
+++ gcc/ada/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,9 @@
+2015-12-01  Jan Sommer 
+
+	* s-oscons-tmplt.c: Generate pthread constants for RTEMS
+	* s-osinte-rtems.ads: Declare pthread structs as opaque types in Ada
+	Fixes PR ada/68169
+
 2015-11-29  Matthias Klose  
 
 	PR ada/68564
Index: gcc/ada/s-oscons-tmplt.c
===
--- gcc/ada/s-oscons-tmplt.c	(Revision 231125)
+++ gcc/ada/s-oscons-tmplt.c	(Arbeitskopie)
@@ -154,7 +154,7 @@ pragma Style_Checks ("M32766");
 # include <_types.h>
 #endif
 
-#ifdef __linux__
+#if defined (__linux__) || defined (__rtems__)
 # include 
 # include 
 #endif
@@ -1441,7 +1441,8 @@ CND(CLOCK_THREAD_CPUTIME_ID, "Thread CPU clock")
 CNS(CLOCK_RT_Ada, "")
 #endif
 
-#if defined (__APPLE__) || defined (__linux__) || defined (DUMMY)
+#if defined (__APPLE__) || defined (__linux__) || defined (__rtems__) || \
+  defined (DUMMY)
 /*
 
--  Sizes of pthread data types
@@ -1484,7 +1485,7 @@ CND(PTHREAD_RWLOCKATTR_SIZE, "pthread_rwlockattr_t
 CND(PTHREAD_RWLOCK_SIZE, "pthread_rwlock_t")
 CND(PTHREAD_ONCE_SIZE,   "pthread_once_t")
 
-#endif /* __APPLE__ || __linux__ */
+#endif /* __APPLE__ || __linux__ || __rtems__*/
 
 /*
 
Index: gcc/ada/s-osinte-rtems.ads
===
--- gcc/ada/s-osinte-rtems.ads	(Revision 231125)
+++ gcc/ada/s-osinte-rtems.ads	(Arbeitskopie)
@@ -51,6 +51,8 @@
 --  It is designed to be a bottom-level (leaf) package.
 
 with Interfaces.C;
+with System.OS_Constants;
+
 package System.OS_Interface is
pragma Preelaborate;
 
@@ -60,6 +62,7 @@ package System.OS_Interface is
subtype rtems_id   is Interfaces.C.unsigned;
 
subtype intis Interfaces.C.int;
+   subtype char   is Interfaces.C.char;
subtype short  is Interfaces.C.short;
subtype long   is Interfaces.C.long;
subtype unsigned   is Interfaces.C.unsigned;
@@ -68,7 +71,6 @@ package System.OS_Interface is
subtype unsigned_char  is Interfaces.C.unsigned_char;
subtype plain_char is Interfaces.C.plain_char;
subtype size_t is Interfaces.C.size_t;
-
---
-- Errno --
---
@@ -76,11 +78,11 @@ package System.OS_Interface is
function errno return int;
pragma Import (C, errno, "__get_errno");
 
-   EAGAIN: constant := 11;
-   EINTR : constant := 4;
-   EINVAL: constant := 22;
-   ENOMEM: constant := 12;
-   ETIMEDOUT : constant := 116;
+   EAGAIN: constant := System.OS_Constants.EAGAIN;
+   EINTR : constant := System.OS_Constants.EINTR;
+   EINVAL: constant := System.OS_Constants.EINVAL;
+   ENOMEM: constant := System.OS_Constants.ENOMEM;
+   ETIMEDOUT : constant := System.OS_Constants.ETIMEDOUT;
 
-
-- Signals --
@@ -448,6 +450,7 @@ package System.OS_Interface is
   ss_low_priority : int;
   ss_replenish_period : timespec;
   ss_initial_budget   : timespec;
+  sched_ss_max_repl   : int;
end record;
pragma Convention (C, struct_sched_param);
 
@@ -621,43 +624,34 @@ private
end record;
pragma Convention (C, timespec);
 
-   CLOCK_REALTIME :  constant clockid_t := 1;
-   CLOCK_MONOTONIC : constant clockid_t := 4;
+   CLOCK_REALTIME :  constant clockid_t := System.OS_Constants.CLOCK_REALTIME;
+   CLOCK_MONOTONIC : constant clockid_t := System.OS_Constants.CLOCK_MONOTONIC;
 
+   subtype char_array is Interfaces.C.char_array;
+
type pthread_attr_t is record
-  is_initialized  : int;
-  stackaddr   : System.Address;
-  stacksize   : int;
-  contentionscope : int;
-  inheritsched: int;
-  schedpolicy : int;
-  schedparam  : struct_sched_param;
-  cputime_clocked_allowed : int;
-  

Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Ilya Verbin

> On 01 Dec 2015, at 11:18, Jakub Jelinek  wrote:
> 
>> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
>> Ok, but it doesn't solve the issue with doing it for the executable, because
>> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
> 
> ?? You mean that the
> devicep->unload_image_func (devicep->target_id, version, target_data);
> call deinitializes the device or something else (I mean, if there is some
> other tgt, then it had to be initialized)?

No, I mean that it can be deinitialized from plugin's __run_exit_handlers (see 
my last mail with the patch).

  -- Ilya

[Patch,microblaze]: Instruction prefetch optimization for microblaze.

2015-12-01 Thread Ajit Kumar Agarwal
This patch adds an instruction prefetch optimization for Microblaze.

Regression tested for the Microblaze target.

The "wic" Microblaze instruction is the instruction prefetch instruction.
The optimization generates the iprefetch instruction on the fall-through
path at call sites. It is enabled with the Microblaze target flag
-mxl-prefetch. The flag exists because the "wic" instruction has to be
selected in the reconfigurable design, and it is not selected by default.

ChangeLog:
2015-12-01  Ajit Agarwal  

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com


Thanks & Regards
Ajit


iprefetch.patch
Description: iprefetch.patch


[PATCH] Fix PR68590

2015-12-01 Thread Richard Biener

The following avoids PR68590 by merging two match.pd patterns.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2015-12-01  Richard Biener  

PR middle-end/68590
* match.pd: Merge (eq @0 @0) and (ge/le @0 @0) patterns.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 231065)
+++ gcc/match.pd(working copy)
@@ -1828,15 +1828,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  
 /* Simplify comparison of something with itself.  For IEEE
floating-point, we can only do some of these simplifications.  */
-(simplify
- (eq @0 @0)
- (if (! FLOAT_TYPE_P (TREE_TYPE (@0))
-  || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (@0
-  { constant_boolean_node (true, type); }))
-(for cmp (ge le)
+(for cmp (eq ge le)
  (simplify
   (cmp @0 @0)
-  (eq @0 @0)))
+  (if (! FLOAT_TYPE_P (TREE_TYPE (@0))
+   || ! HONOR_NANS (TYPE_MODE (TREE_TYPE (@0
+   { constant_boolean_node (true, type); }
+   (if (cmp != EQ_EXPR)
+(eq @0 @0)
 (for cmp (ne gt lt)
  (simplify
   (cmp @0 @0)
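
For context on the FLOAT_TYPE_P / HONOR_NANS guard, a small stand-alone check
(ordinary C++ compiled without -ffast-math, unrelated to the match.pd
machinery; the helper names are made up) of why a self-comparison can only be
folded to true when NaNs are not in play:

```cpp
#include <limits>

// x == x, x >= x and x <= x are all false when x is a NaN, so they can only
// be folded to "true" when NaNs cannot occur or need not be honored.
bool self_eq(double x) { return x == x; }
bool self_ge(double x) { return x >= x; }
bool self_le(double x) { return x <= x; }

// For integers a comparison with itself is always true.
bool self_ge_int(int x) { return x >= x; }

// Convenience wrapper so callers do not need to spell out numeric_limits.
double quiet_nan() { return std::numeric_limits<double>::quiet_NaN(); }
```

When NaNs are honored, x >= x and x <= x give the same answer as x == x for
every input, which is why the ge/le forms can still be canonicalized to eq.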


RE: [Patch,microblaze]: Instruction prefetch optimization for microblaze.

2015-12-01 Thread Ajit Kumar Agarwal
Moreover, this patch has been tested on hardware with the Mibench/EEMBC
benchmarks for the Microblaze target. The reconfigurable design was built
with the "wic" instruction prefetch instruction selected, and the benchmarks
were compiled with the -mxl-prefetch flag.

Thanks & Regards
Ajit
-Original Message-
From: Ajit Kumar Agarwal 
Sent: Tuesday, December 01, 2015 2:19 PM
To: GCC Patches
Cc: Vinod Kathail; Shail Aditya Gupta; Vidhumouli Hunsigida; Nagaraju Mekala
Subject: [Patch,microblaze]: Instruction prefetch optimization for microblaze.

This patch adds an instruction prefetch optimization for Microblaze.

Regression tested for the Microblaze target.

The "wic" Microblaze instruction is the instruction prefetch instruction.
The optimization generates the iprefetch instruction on the fall-through
path at call sites. It is enabled with the Microblaze target flag
-mxl-prefetch. The flag exists because the "wic" instruction has to be
selected in the reconfigurable design, and it is not selected by default.

ChangeLog:
2015-12-01  Ajit Agarwal  

* config/microblaze/microblaze.c
(get_branch_target): New.
(insert_wic_for_ilb_runout): New.
(insert_wic): New.
(microblaze_machine_dependent_reorg): New.
(TARGET_MACHINE_DEPENDENT_REORG): Define macro.
* config/microblaze/microblaze.md
(UNSPEC_IPREFETCH): Define.
(iprefetch): New pattern
* config/microblaze/microblaze.opt
(mxl-prefetch): New flag.

Signed-off-by:Ajit Agarwal ajit...@xilinx.com


Thanks & Regards
Ajit


PR68577: Handle narrowing for vector popcount, etc.

2015-12-01 Thread Richard Sandiford
This patch adds support for simple cases where a vector internal
function returns wider results than the scalar equivalent.  It punts
on other cases.
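
As a rough illustration of the narrowing case (a scalar model only, not the vectorizer's code), consider popcount on 64-bit inputs stored to 32-bit outputs. The sketch assumes vectors of two 64-bit lanes and four 32-bit lanes; the function names are hypothetical. The vector popcount produces results as wide as its inputs, so two result vectors have to be packed into one narrower vector in a single step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a vector popcount on two 64-bit lanes: like the
   vector internal function, it yields results as wide as its inputs.  */
void
popcount_v2di (const uint64_t in[2], uint64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (uint64_t) __builtin_popcountll (in[i]);
}

/* Scalar model of a single-step pack: truncate two vectors of two
   64-bit lanes into one vector of four 32-bit lanes.  */
void
pack_v2di_to_v4si (const uint64_t lo[2], const uint64_t hi[2],
                   uint32_t out[4])
{
  out[0] = (uint32_t) lo[0];
  out[1] = (uint32_t) lo[1];
  out[2] = (uint32_t) hi[0];
  out[3] = (uint32_t) hi[1];
}

/* x[i] = popcount (y[i]) for N elements, four results at a time:
   two wide popcounts followed by one pack.  */
void
popcount_narrow (const uint64_t *y, uint32_t *x, size_t n)
{
  assert (n % 4 == 0);
  for (size_t i = 0; i < n; i += 4)
    {
      uint64_t lo[2], hi[2];
      popcount_v2di (y + i, lo);
      popcount_v2di (y + i + 2, hi);
      pack_v2di_to_v4si (lo, hi, x + i);
    }
}
```

In this model each output vector costs two popcount operations plus one pack, which is the kind of extra narrowing step the patch accounts for in the cost model.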

Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/68577
* tree-vect-stmts.c (simple_integer_narrowing): New function.
(vectorizable_call): Restrict internal function handling
to NONE and NARROW cases, using simple_integer_narrowing
to test for the latter.  Add cost of narrowing operation
and insert it where necessary.

gcc/testsuite/
PR tree-optimization/68577
* gcc.dg/vect/pr68577.c: New test.

diff --git a/gcc/testsuite/gcc.dg/vect/pr68577.c 
b/gcc/testsuite/gcc.dg/vect/pr68577.c
new file mode 100644
index 000..999c1c8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr68577.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+
+int a, b;
+
+void
+__sched_cpucount (void)
+{
+  while (b)
+{
+  long l = b++;
+  a += __builtin_popcountl(l);
+}
+}
+
+void
+slp_test (int *x, long *y)
+{
+  for (int i = 0; i < 512; i += 4)
+{
+  x[i] = __builtin_popcountl(y[i]);
+  x[i + 1] = __builtin_popcountl(y[i + 1]);
+  x[i + 2] = __builtin_popcountl(y[i + 2]);
+  x[i + 3] = __builtin_popcountl(y[i + 3]);
+}
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3b078da..af86bce 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2122,6 +2122,40 @@ vectorizable_mask_load_store (gimple *stmt, 
gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* Return true if vector type VECTYPE_OUT has integer elements and
+   if we can narrow two integer vectors with the same shape as
+   VECTYPE_IN to VECTYPE_OUT in a single step.  On success,
+   return the binary pack code in *CONVERT_CODE and the types
+   of the input vectors in *CONVERT_FROM.  */
+
+static bool
+simple_integer_narrowing (tree vectype_out, tree vectype_in,
+ tree_code *convert_code, tree *convert_from)
+{
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_out)))
+return false;
+
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_in)))
+{
+  unsigned int bits
+   = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype_in)));
+  tree scalar_type = build_nonstandard_integer_type (bits, 0);
+  vectype_in = get_same_sized_vectype (scalar_type, vectype_in);
+}
+
+  tree_code code;
+  int multi_step_cvt = 0;
+  auto_vec  interm_types;
+  if (!supportable_narrowing_operation (NOP_EXPR, vectype_out, vectype_in,
+   &code, &multi_step_cvt,
+   &interm_types)
+  || multi_step_cvt)
+return false;
+
+  *convert_code = code;
+  *convert_from = vectype_in;
+  return true;
+}
 
 /* Function vectorizable_call.
 
@@ -2288,7 +2322,13 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
*gsi, gimple **vec_stmt,
   tree callee = gimple_call_fndecl (stmt);
 
   /* First try using an internal function.  */
-  if (cfn != CFN_LAST)
+  tree_code convert_code = ERROR_MARK;
+  tree convert_from = NULL_TREE;
+  if (cfn != CFN_LAST
+  && (modifier == NONE
+ || (modifier == NARROW
+ && simple_integer_narrowing (vectype_out, vectype_in,
+  &convert_code, &convert_from))))
 ifn = vectorizable_internal_function (cfn, callee, vectype_out,
  vectype_in);
 
@@ -2328,7 +2368,7 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, 
gimple **vec_stmt,
 
   if (slp_node || PURE_SLP_STMT (stmt_info))
 ncopies = 1;
-  else if (modifier == NARROW)
+  else if (modifier == NARROW && ifn == IFN_LAST)
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
   else
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
@@ -2344,6 +2384,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
*gsi, gimple **vec_stmt,
 dump_printf_loc (MSG_NOTE, vect_location, "=== vectorizable_call ==="
  "\n");
   vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
+  if (ifn != IFN_LAST && modifier == NARROW && !slp_node)
+   add_stmt_cost (stmt_info->vinfo->target_cost_data, ncopies / 2,
+  vec_promote_demote, stmt_info, 0, vect_body);
+
   return true;
 }
 
@@ -2357,9 +2401,9 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, 
gimple **vec_stmt,
   vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
 
   prev_stmt_info = NULL;
-  switch (modifier)
+  if (modifier == NONE || ifn != IFN_LAST)
 {
-case NONE:
+  tree prev_res = NULL_TREE;
   for (j = 0; j < ncopies; ++j)
{
  /* Build argument list for the vectorized call.  */
@@ -2387,12 +2431,30 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
*gsi, gimple **vec_stmt,
  vec vec_oprndsk = vec_defs[k];
  vargs[k] = vec_oprndsk[i];

PR68474: Fix tree-call-cdce.c:use_internal_fn

2015-12-01 Thread Richard Sandiford
We'd call gen_shrink_wrap_conditions for functions that it can't handle
but edom_only_function can.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
PR tree-optimization/68474
* tree-call-cdce.c (use_internal_fn): Protect call to
gen_shrink_wrap_conditions.

gcc/testsuite/
PR tree-optimization/68474
* gcc.dg/pr68474.c: New test.

diff --git a/gcc/testsuite/gcc.dg/pr68474.c b/gcc/testsuite/gcc.dg/pr68474.c
new file mode 100644
index 000..8ad7def
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68474.c
@@ -0,0 +1,7 @@
+/* { dg-options "-O -funsafe-math-optimizations" } */
+
+long double
+foo (long double d1, long double d2)
+{
+  return d1 || __builtin_significandl (d2);
+}
diff --git a/gcc/tree-call-cdce.c b/gcc/tree-call-cdce.c
index 75ef180..4123130 100644
--- a/gcc/tree-call-cdce.c
+++ b/gcc/tree-call-cdce.c
@@ -959,7 +959,8 @@ use_internal_fn (gcall *call)
 {
   unsigned nconds = 0;
   auto_vec conds;
-  gen_shrink_wrap_conditions (call, conds, &nconds);
+  if (can_test_argument_range (call))
+gen_shrink_wrap_conditions (call, conds, &nconds);
   if (nconds == 0 && !edom_only_function (call))
 return false;
 



Re: [openacc] fortran loop clauses and splitting

2015-12-01 Thread Jakub Jelinek
On Mon, Nov 30, 2015 at 10:00:06AM -0800, Cesar Philippidis wrote:
> This patch contains the following bug fixes:
> 
>  * Teaches gfortran to accept both num and static gang arguments inside
>same clause. E.g. gang(num:10, static:30). Currently, gfortran only
>allows one of those arguments to appear in a gang clause.
> 
>  * Make the diagnostics reported by resolve_oacc_positive_int_expr more
>accurate for worker and vector clauses.
> 
>  * Updates how combined loops are split to account for the renamed gang
>clause members in gfc_omp_clauses.  Also corrected a bug that Tom
>discovered in the c front end where combined reductions were being
>attached to kernels and parallel constructs. Now, they are only
>associated with the split acc loop.
> 
> Is this OK for trunk?

Ok, thanks.

Jakub


[Patch AArch64] Fix typo in aarch64_builtin_reciprocal.

2015-12-01 Thread Ramana Radhakrishnan
The patch to restructure builtin_reciprocals missed out an obvious ')'. 
Adjusted thusly and applied as obvious to trunk.

regards
Ramana


2015-12-01  Ramana Radhakrishnan  

* config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Fix typo.


Re: [Patch AArch64] Fix typo in aarch64_builtin_reciprocal.

2015-12-01 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 08:58:53AM +, Ramana Radhakrishnan wrote:
> The patch to restructure builtin_reciprocals missed out an obvious ')'. 
> Adjusted thusly and applied as obvious to trunk.

Sorry for that.  Could you please also handle the gimple_call_internal_p
case, so that it actually returns the aarch64 builtin decls if
it is an internal SQRT call with the right modes?  See the i386 and rs6000
builtins.  I haven't done that for aarch64 because it uses a helper function
defined somewhere else, so I wasn't sure how you want it to look.
> 
> 2015-12-01  Ramana Radhakrishnan  
> 
> * config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Fix typo.

Jakub


Re: [gomp] Move openacc vector& worker single handling to RTL

2015-12-01 Thread Thomas Schwinge
Hi!

On Thu, 09 Jul 2015 20:25:22 -0400, Nathan Sidwell  wrote:
> This is the patch I committed.  [...]

> 2015-07-09  Nathan Sidwell  

>   * omp-low.c (omp_region): [...]
>   (enclosing_target_region, required_predication_mask,
>   generate_vector_broadcast, generate_oacc_broadcast,
>   make_predication_test, predicate_bb, find_predicatable_bbs,
>   predicate_omp_regions): Delete.
>   [...]

This removed all usage of bb_region_map.  Now cleaned up in
gomp-4_0-branch r231102:

commit ff7e1eb4e855aa16d14ae047172269bc7192a069
Author: tschwinge 
Date:   Tue Dec 1 09:04:33 2015 +

gcc/omp-low.c: Remove bb_region_map

gcc/
* omp-low.c (bb_region_map): Remove.  Adjust all users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@231102 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  4 
 gcc/omp-low.c  | 42 +-
 2 files changed, 21 insertions(+), 25 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index 0e4f371..4842164 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,3 +1,7 @@
+2015-12-01  Thomas Schwinge  
+
+   * omp-low.c (bb_region_map): Remove.  Adjust all users.
+
 2015-11-30  Cesar Philippidis  
 
* tree-nested.c (convert_nonlocal_omp_clauses): Handle optional
diff --git gcc/omp-low.c gcc/omp-low.c
index 1b52f6b..a1e7a14 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -13356,9 +13356,6 @@ expand_omp (struct omp_region *region)
 }
 }
 
-/* Map each basic block to an omp_region.  */
-static hash_map *bb_region_map;
-
 static void
 find_omp_for_region_data (struct omp_region *region, gomp_for *stmt)
 {
@@ -13394,8 +13391,6 @@ build_omp_regions_1 (basic_block bb, struct omp_region 
*parent,
   gimple *stmt;
   basic_block son;
 
-  bb_region_map->put (bb, parent);
-
   gsi = gsi_last_bb (bb);
   if (!gsi_end_p (gsi) && is_gimple_omp (gsi_stmt (gsi)))
 {
@@ -13536,31 +13531,28 @@ build_omp_regions (void)
 static unsigned int
 execute_expand_omp (void)
 {
-  bb_region_map = new hash_map;
-
   build_omp_regions ();
 
-  if (root_omp_region)
+  if (!root_omp_region)
+return 0;
+
+  if (dump_file)
 {
-  if (dump_file)
-   {
- fprintf (dump_file, "\nOMP region tree\n\n");
- dump_omp_region (dump_file, root_omp_region, 0);
- fprintf (dump_file, "\n");
-   }
-
-  remove_exit_barriers (root_omp_region);
-
-  expand_omp (root_omp_region);
-
-  if (flag_checking && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
-   verify_loop_structure ();
-  cleanup_tree_cfg ();
-
-  free_omp_regions ();
+  fprintf (dump_file, "\nOMP region tree\n\n");
+  dump_omp_region (dump_file, root_omp_region, 0);
+  fprintf (dump_file, "\n");
 }
 
-  delete bb_region_map;
+  remove_exit_barriers (root_omp_region);
+
+  expand_omp (root_omp_region);
+
+  if (flag_checking && !loops_state_satisfies_p (LOOPS_NEED_FIXUP))
+verify_loop_structure ();
+  cleanup_tree_cfg ();
+
+  free_omp_regions ();
+
   return 0;
 }
 


Grüße
 Thomas


signature.asc
Description: PGP signature


Re: [PATCH testsuite ARM] : Update armv6 unaligned macro tests

2015-12-01 Thread Kyrill Tkachov

Hi Christian,

On 30/11/15 10:16, Christian Bruel wrote:

Hi Kyrill,

Your fix (https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01392.html) exposed new 
FAILs in the macro tests in ftest-armv6[kz]-thumb.c.

From what I understood, only ARMv6T2 will have TARGET_32BIT set, and set 
unaligned_access as tested in ftest-armv6t2-thumb.c.
It seems that the other ftest-armv6-thumb tests should be updated to reflect 
your fix.



Yes, thanks for catching this.
Ok.

Kyrill


Tested for arm-none-eabi.







[committed] Improve error reporting from genattrtab.c

2015-12-01 Thread Richard Sandiford
The errors reported by check_attr_value weren't very helpful because
they always used the location of the define(_enum)_attr, even if the
error was in a define_insn.  Also, the errors reported by
check_attr_test didn't say which attribute was faulty.

Although not technically a bug fix, it was really useful in writing
the patch for PR68432.

Tested on a variety of targets and applied.

Richard


gcc/
* genattrtab.c (check_attr_test): Take an attr_desc instead of
an is_const flag.  Put the file_location argument first.
Update recursive calls.  Improve error messages.
(check_attr_value): Take a file location and use it instead
of attr->loc.  Improve error messages.  Update calls to
check_attr_test.
(check_defs): Update call to check_attr_value.
(make_canonical): Likewise.
(gen_attr): Likewise.
(main): Likewise.
(gen_insn_reserv): Update call to check_attr_test.

diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index 32b837c..2caf8f6 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -729,9 +729,8 @@ attr_copy_rtx (rtx orig)
   return copy;
 }
 
-/* Given a test expression for an attribute, ensure it is validly formed.
-   IS_CONST indicates whether the expression is constant for each compiler
-   run (a constant expression may not test any particular insn).
+/* Given a test expression EXP for attribute ATTR, ensure it is validly
+   formed.  LOC is the location of the .md construct that contains EXP.
 
Convert (eq_attr "att" "a1,a2") to (ior (eq_attr ... ) (eq_attrq ..))
and (eq_attr "att" "!a1") to (not (eq_attr "att" "a1")).  Do the latter
@@ -744,9 +743,8 @@ attr_copy_rtx (rtx orig)
Return the new expression, if any.  */
 
 static rtx
-check_attr_test (rtx exp, int is_const, file_location loc)
+check_attr_test (file_location loc, rtx exp, attr_desc *attr)
 {
-  struct attr_desc *attr;
   struct attr_value *av;
   const char *name_ptr, *p;
   rtx orexp, newexp;
@@ -756,26 +754,27 @@ check_attr_test (rtx exp, int is_const, file_location loc)
 case EQ_ATTR:
   /* Handle negation test.  */
   if (XSTR (exp, 1)[0] == '!')
-   return check_attr_test (attr_rtx (NOT,
+   return check_attr_test (loc,
+   attr_rtx (NOT,
  attr_eq (XSTR (exp, 0),
 &XSTR (exp, 1)[1])),
-   is_const, loc);
+   attr);
 
   else if (n_comma_elts (XSTR (exp, 1)) == 1)
{
- attr = find_attr (&XSTR (exp, 0), 0);
- if (attr == NULL)
+ attr_desc *attr2 = find_attr (&XSTR (exp, 0), 0);
+ if (attr2 == NULL)
{
  if (! strcmp (XSTR (exp, 0), "alternative"))
return mk_attr_alt (((uint64_t) 1) << atoi (XSTR (exp, 1)));
  else
-   fatal_at (loc, "unknown attribute `%s' in EQ_ATTR",
- XSTR (exp, 0));
+   fatal_at (loc, "unknown attribute `%s' in definition of"
+ " attribute `%s'", XSTR (exp, 0), attr->name);
}
 
- if (is_const && ! attr->is_const)
-   fatal_at (loc, "constant expression uses insn attribute `%s'"
- " in EQ_ATTR", XSTR (exp, 0));
+ if (attr->is_const && ! attr2->is_const)
+   fatal_at (loc, "constant attribute `%s' cannot test non-constant"
+ " attribute `%s'", attr->name, attr2->name);
 
  /* Copy this just to make it permanent,
 so expressions using it can be permanent too.  */
@@ -784,26 +783,26 @@ check_attr_test (rtx exp, int is_const, file_location loc)
  /* It shouldn't be possible to simplify the value given to a
 constant attribute, so don't expand this until it's time to
 write the test expression.  */
- if (attr->is_const)
+ if (attr2->is_const)
ATTR_IND_SIMPLIFIED_P (exp) = 1;
 
- if (attr->is_numeric)
+ if (attr2->is_numeric)
{
  for (p = XSTR (exp, 1); *p; p++)
if (! ISDIGIT (*p))
  fatal_at (loc, "attribute `%s' takes only numeric values",
-   XSTR (exp, 0));
+   attr2->name);
}
  else
{
- for (av = attr->first_value; av; av = av->next)
+ for (av = attr2->first_value; av; av = av->next)
if (GET_CODE (av->value) == CONST_STRING
&& ! strcmp (XSTR (exp, 1), XSTR (av->value, 0)))
  break;
 
  if (av == NULL)
-   fatal_at (loc, "unknown value `%s' for `%s' attribute",
- XSTR (exp, 1), XSTR (exp, 0));
+   fatal_at (loc, "unknown value `%s' for attribute `%s'",
+ XSTR (exp, 1), attr2->name);
}
}
  

[PATCH, PR middle-end/68595] Fix invariant boolean vector generation

2015-12-01 Thread Ilya Enkovich
Hi,

This patch fixes a way invariant boolean vector is generated.  It makes sure 
boolean vector consists of 0 and -1 values.  Bootstrapped and tested on 
x86_64-unknown-linux-gnu.  OK for trunk?
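
To show why the 0 / -1 convention matters, here is a minimal standalone sketch (not GCC code; the lane count and function names are assumptions for illustration). A scalar boolean is broadcast as all-zeros or all-ones, and a lane-wise select then relies on exactly that representation.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4

/* Broadcast a scalar boolean into LANES mask elements: all-zeros for
   false, all-ones (-1) for true.  */
void
broadcast_bool (int val, int32_t mask[LANES])
{
  int32_t elt = val ? -1 : 0;
  for (int i = 0; i < LANES; i++)
    mask[i] = elt;
}

/* Lane-wise select that is only correct for 0 / -1 masks: a mask
   element of 1 would keep a single bit of A instead of all of it.  */
void
select_lanes (const int32_t mask[LANES], const int32_t a[LANES],
              const int32_t b[LANES], int32_t out[LANES])
{
  for (int i = 0; i < LANES; i++)
    {
      assert (mask[i] == 0 || mask[i] == -1);
      out[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
    }
}
```

Building the mask directly from the scalar value 1 would trip the assertion in select_lanes, which is the kind of mismatch the cast in the patch avoids.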

Thanks,
Ilya
--
gcc/

2015-12-01  Ilya Enkovich  

PR middle-end/68595
* tree-vect-stmts.c (vect_init_vector): Cast boolean
scalars to a proper value before building a vector.

gcc/testsuite/

2015-12-01  Ilya Enkovich  

PR middle-end/68595
* gcc.dg/pr68595.c: New test.


diff --git a/gcc/testsuite/gcc.dg/pr68595.c b/gcc/testsuite/gcc.dg/pr68595.c
new file mode 100644
index 000..179c6c3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68595.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+int a, b;
+char c;
+void fn1() {
+  b = 30;
+  for (; b <= 32; b++) {
+c = -17;
+for (; c <= 56; c++)
+  a -= 0 == (c || b);
+  }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 3b078da..5bb2289 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1300,7 +1300,25 @@ vect_init_vector (gimple *stmt, tree val, tree type, 
gimple_stmt_iterator *gsi)
 {
   if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
{
- if (CONSTANT_CLASS_P (val))
+ /* Scalar boolean value should be transformed into
+all zeros or all ones value before building a vector.  */
+ if (VECTOR_BOOLEAN_TYPE_P (type))
+   {
+ tree true_val = build_all_ones_cst (TREE_TYPE (type));
+ tree false_val = build_zero_cst (TREE_TYPE (type));
+
+ if (CONSTANT_CLASS_P (val))
+   val = integer_zerop (val) ? false_val : true_val;
+ else
+   {
+ new_temp = make_ssa_name (TREE_TYPE (type));
+ init_stmt = gimple_build_assign (new_temp, COND_EXPR,
+  val, true_val, false_val);
+ vect_init_vector_1 (stmt, init_stmt, gsi);
+ val = new_temp;
+   }
+   }
+ else if (CONSTANT_CLASS_P (val))
val = fold_convert (TREE_TYPE (type), val);
  else
{


Re: [Patch AArch64] Fix typo in aarch64_builtin_reciprocal.

2015-12-01 Thread Ramana Radhakrishnan


On 01/12/15 09:04, Jakub Jelinek wrote:
> On Tue, Dec 01, 2015 at 08:58:53AM +, Ramana Radhakrishnan wrote:
>> The patch to restructure builtin_reciprocals missed out an obvious ')'. 
>> Adjusted thusly and applied as obvious to trunk.
> 
> Sorry for that.  Could you please also handle the gimple_call_internal_p
> case, so that it actually returns the aarch64 builtin decls if
> it is an internal SQRT call with the right modes?  See the i386 and rs6000
> builtins.  I haven't done that for aarch64 because it uses a helper function
> defined somewhere else, so I wasn't sure how you want it to look.

Thanks for pointing this out. James - can you please take a look ?  I don't 
think I'll have the time to get to this today.

I just realized my patch wasn't attached to the previous mail - here it is FTR.

regards
Ramana


>>
>> 2015-12-01  Ramana Radhakrishnan  
>>
>> * config/aarch64/aarch64.c (aarch64_builtin_reciprocal): Fix typo.
> 
>   Jakub
> 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b150283..88dbe15 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7112,7 +7112,7 @@ aarch64_builtin_reciprocal (gcall *call)
   & AARCH64_EXTRA_TUNE_RECIP_SQRT))
 return NULL_TREE;
 
-  if (gimple_call_internal_p (call)
+  if (gimple_call_internal_p (call))
 return NULL_TREE;
 
   tree fndecl = gimple_call_fndecl (call);


Re: S/390: Fix warnings in "*setmem_long..." patterns.

2015-12-01 Thread Dominik Vogt
On Mon, Nov 30, 2015 at 06:11:33PM +0100, Ulrich Weigand wrote:
> On 11/30/2015 04:11 PM, Dominik Vogt wrote:
> > The attached patch fixes some warnings generated by the setmem...
> > patterns in s390.md during build and adds test cases for the
> > patterns.  The patch is to be added on top of the movstr patch:
> > https://gcc.gnu.org/ml/gcc-patches/2015-11/msg03485.html
> > 
> > The test cases validate that the patterns are actually used, but
> > at the moment the setmem_long_and pattern is never actually used
> > and thus the test case would fail.  So I've split the patch in two
> > (both attached to this message) to activate this part of the test
> > once we've fixed that.
> > 
> > The patch has passed the SPEC2006 testsuite without any measurable
> > changes in performance.
> 
> What would you think about something like the following?
> 
> (define_insn "*setmem_long"
>   [(clobber (match_operand: 0 "register_operand" "=d"))
>(set (mem:BLK (subreg:P (match_operand: 3 "register_operand" "0") 0))
> (unspec:BLK [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")
>  (subreg:P (match_dup 3) 1)] UNSPEC_REPLICATE_BYTE))
>(use (match_operand: 1 "register_operand" "d"))
>(clobber (reg:CC CC_REGNUM))]

New patch attached (patch 1.5 and ChangeLog are the same).  I've
swapped the operands 1 and 3 so that the numbering is the same as
before.  I think there are still a couple of problems with the
patched code:

1.

The new pattern has "(use (match_operand 3))" where the old one
just had match_dup (which did not express that a register pair was
required).  The expander function now requires a fourth, unused
argument that I don't know how to get rid of.

  emit_insn (gen_setmem_long_di (dst, convert_to_mode (Pmode, len, 1),
  val, NULL_RTX));
   

2.

I think the pattern should express that the register pair with the
destination address and length gets clobbered by the mvcle
instruction, and I'm not sure whether it's necessary to tell GCC
explicitly that the register pair with the source address and
length gets zeroed.

> [ Not sure if we'd need an extra (use (match_dup 3)) any more. ]
> 
> B.t.w. this is certainly wrong and cannot be generated by common code:
> (and:BLK (unspec:BLK
> [(match_operand:P 2 "shift_count_or_setmem_operand" "Y")]
> UNSPEC_P_TO_BLK)
>(match_operand 4 "const_int_operand" "n"))
> (This explains why the pattern would never match.)

It never matched before this change either.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt
IBM Germany
gcc/ChangeLog

* config/s390/s390.c (s390_expand_setmem): Use new expanders.
* config/s390/s390.md ("*setmem_long")
("*setmem_long_and", "*setmem_long_31z"): Fix warnings.
("setmem_long_"): New expanders.
("setmem_long"): Removed.

gcc/testsuite/ChangeLog

* gcc.target/s390/md/setmem_long-1.c: New test.
* gcc.target/s390/md/setmem_long-2.c: New test.
From 0e1bc4be3466b0f07b1d5c1334e3717802a7db82 Mon Sep 17 00:00:00 2001
From: Dominik Vogt 
Date: Wed, 4 Nov 2015 03:16:24 +0100
Subject: [PATCH 1/1.5] S/390: Fix warnings in "*setmem_long..." patterns.

---
 gcc/config/s390/s390.c   |  7 +++-
 gcc/config/s390/s390.md  | 51 ++--
 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c | 20 ++
 gcc/testsuite/gcc.target/s390/md/setmem_long-2.c | 20 ++
 4 files changed, 75 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/md/setmem_long-1.c
 create mode 100644 gcc/testsuite/gcc.target/s390/md/setmem_long-2.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 40ee2f7..df7af91 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -5178,7 +5178,12 @@ s390_expand_setmem (rtx dst, rtx len, rtx val)
   else if (TARGET_MVCLE)
 {
   val = force_not_mem (convert_modes (Pmode, QImode, val, 1));
-  emit_insn (gen_setmem_long (dst, convert_to_mode (Pmode, len, 1), val));
+  if (TARGET_64BIT)
+	emit_insn (gen_setmem_long_di (dst, convert_to_mode (Pmode, len, 1),
+   val, NULL_RTX));
+  else
+	emit_insn (gen_setmem_long_si (dst, convert_to_mode (Pmode, len, 1),
+   val, NULL_RTX));
 }
 
   else
diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
index 75e9af7..e093fd3 100644
--- a/gcc/config/s390/s390.md
+++ b/gcc/config/s390/s390.md
@@ -70,6 +70,9 @@
; Copy CC as is into the lower 2 bits of an integer register
UNSPEC_CC_TO_INT
 
+   ; Convert Pmode to BLKmode
+   UNSPEC_REPLICATE_BYTE
+
; GOT/PLT and lt-relative accesses
UNSPEC_LTREL_OFFSET
UNSPEC_LTREL_BASE
@@ -3281,13 +3284,13 @@
 
 ; Initialize a block of arbitrary length with (operands[2] % 256).
 
-(define_expand "setmem_long"
+(define_expand "setmem_long_"
   

Re: [PATCH AArch64]Handle REG+REG+CONST and REG+NON_REG+CONST in legitimize address

2015-12-01 Thread Richard Earnshaw
On 01/12/15 03:19, Bin.Cheng wrote:
> On Tue, Nov 24, 2015 at 6:18 PM, Richard Earnshaw
>  wrote:
>> On 24/11/15 09:56, Richard Earnshaw wrote:
>>> On 24/11/15 02:51, Bin.Cheng wrote:
>> The aarch64's problem is we don't define addptr3 pattern, and we don't
 have direct insn pattern describing the "x + y << z".  According to
 gcc internal:

 ‘addptrm3’
 Like addm3 but is guaranteed to only be used for address calculations.
 The expanded code is not allowed to clobber the condition code. It
 only needs to be defined if addm3 sets the condition code.
>>
>> addm3 on aarch64 does not set the condition codes, so by this rule we
>> shouldn't need to define this pattern.
 Hi Richard,
 I think that rule has a prerequisite that backend needs to support
 register shifted addition in addm3 pattern.
>>>
>>> addm3 is a named pattern and its format is well defined.  It does not
>>> take a shifted operand and never has.
>>>
 Apparently for AArch64,
 addm3 only supports "reg+reg" or "reg+imm".  Also we don't really
 "does not set the condition codes" actually, because both
 "adds_shift_imm_*" and "adds_mul_imm_*" do set the condition flags.
>>>
>>> You appear to be confusing named patterns (used by expand) with
>>> recognizers.  Anyway, we have
>>>
>>> (define_insn "*add__"
>>>   [(set (match_operand:GPI 0 "register_operand" "=r")
>>> (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 "register_operand" "r")
>>>   (match_operand:QI 2
>>> "aarch64_shift_imm_" "n"))
>>>   (match_operand:GPI 3 "register_operand" "r")))]
>>>
>>> Which is a non-flag setting add with shifted operand.
>>>
 Either way I think it is another backend issue, so do you approve that
 I commit this patch now?
>>>
>>> Not yet.  I think there's something fundamental amiss here.
>>>
>>> BTW, it looks to me as though addptr3 should have exactly the same
>>> operand rules as add3 (documentation reads "like add3"), so a
>>> shifted operand shouldn't be supported there either.  If that isn't the
>>> case then that should be clearly called out in the documentation.
>>>
>>> R.
>>>
>>
>> PS.
>>
>> I presume you are aware of the canonicalization rules for add?  That is,
>> for a shift-and-add operation, the shift operand must appear first.  Ie.
>>
>> (plus (shift (op, op)), op)
>>
>> not
>>
>> (plus (op, (shift (op, op))
> 
> Hi Richard,
> Thanks for the comments.  I realized that the not-recognized insn
> issue is because the original patch built non-canonical expressions.
> When reloading an address expression, LRA generates a non-canonical
> register scaled insn, which can't be recognized by the aarch64 backend.
> 
> Here is the updated patch using the canonical form pattern; it passes
> bootstrap and regression tests.  Well, the ivo failure still exists,
> but it was analyzed in the original message.
> 
> Is this patch OK?
> 
> As for Jiong's concern about the additional extension instruction, I
> think this only applies to atomic load store instructions.  For
> general load store, AArch64 supports zext/sext in register scaling
> addressing mode, so the additional instruction can be forward propagated
> into the memory reference.  The problem for atomic load store is that
> AArch64 only supports direct register addressing mode.  After LRA reloads
> the address expression out of the memory reference, there is no
> combine/fwprop optimizer to merge instructions.  The problem is that
> atomic_store's predicate doesn't match its constraint.  The predicate used
> for atomic_store is memory_operand, while all other atomic patterns
> use aarch64_sync_memory_operand.  I think this might be a typo.  With
> this change, expand will not generate addressing modes requiring reload
> anymore.  I will test another patch fixing this.
> 
> Thanks,
> bin

Some comments inline.

>>
>> R.
>>
>> aarch64_legitimize_addr-20151128.txt
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index 3fe2f0f..5b3e3c4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -4757,13 +4757,65 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x  
>> */, machine_mode mode)
>>   We try to pick as large a range for the offset as possible to
>>   maximize the chance of a CSE.  However, for aligned addresses
>>   we limit the range to 4k so that structures with different sized
>> - elements are likely to use the same base.  */
>> + elements are likely to use the same base.  We need to be careful
>> + not split CONST for some forms address expressions, otherwise it

not to split a CONST for some forms of address expression,

>> + will generate sub-optimal code.  */
>>  
>>if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1)))
>>  {
>>HOST_WIDE_INT offset = INTVAL (XEXP (x, 1));
>>HOST_WIDE_INT base_offset;
>>  
>> +  if (GET_CODE (XEXP (x, 0)) 

Re: [UPC 01/22] front-end changes

2015-12-01 Thread Gary Funck
On 12/01/15 09:12:44, Eric Botcazou wrote:
> > All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> > bootstrapped; no test suite regressions were introduced,
> > relative to the GCC trunk.
> 
> That's not all languages though, Ada and Java are missing.

OK. I'll bootstrap and run tests on those as well, and
report back in a day or two.

thanks,
- Gary


Re: [gomp4.5] Handle #pragma omp declare target link

2015-12-01 Thread Jakub Jelinek
On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> Ok, but it doesn't solve the issue with doing it for the executable, because
> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.

?? You mean that the
devicep->unload_image_func (devicep->target_id, version, target_data);
call deinitializes the device or something else (I mean, if there is some
other tgt, then it had to be initialized)?
If it is just that order, I wonder if you can't just move the
unload_image_func call after the splay_tree_remove loops (or even after the
node freeing call).

Jakub


Re: [UPC 01/22] front-end changes

2015-12-01 Thread Eric Botcazou
> All languages (c, c++, fortran, go, lto, objc, obj-c++) have been
> bootstrapped; no test suite regressions were introduced,
> relative to the GCC trunk.

That's not all languages though, Ada and Java are missing.

-- 
Eric Botcazou


Re: [OpenACC 0/7] host_data construct

2015-12-01 Thread Jakub Jelinek
On Mon, Nov 30, 2015 at 07:30:34PM +, Julian Brown wrote:
> Julian Brown  
> Cesar Philippidis  
> James Norris  
> 
> gcc/
> * c-family/c-pragma.c (oacc_pragmas): Add PRAGMA_OACC_HOST_DATA.
> * c-family/c-pragma.h (pragma_kind): Add PRAGMA_OACC_HOST_DATA.

c-family/, c/ and cp/ subdirectories have their own ChangeLog, so you need
to split the entry into multiple ChangeLog files and remove the directory
prefixes.

> @@ -6120,6 +6121,9 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree 
> decl, bool in_code)
>(splay_tree_key) decl);
> if (n2)
>   {
> +   if (octx->region_type == ORT_ACC_HOST_DATA)
> + error ("variable %qE declared in enclosing "
> +"host_data region", DECL_NAME (decl));

%<host_data%> instead?
> nflags |= GOVD_MAP;
> goto found_outer;
>   }
> @@ -6418,6 +6422,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>case OMP_TARGET_DATA:
>case OMP_TARGET_ENTER_DATA:
>case OMP_TARGET_EXIT_DATA:
> +  case OACC_HOST_DATA:
>   ctx->target_firstprivatize_array_bases = true;
>default:
>   break;
> @@ -6683,6 +6688,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>   case OMP_TARGET_DATA:
>   case OMP_TARGET_ENTER_DATA:
>   case OMP_TARGET_EXIT_DATA:
> + case OACC_HOST_DATA:
> if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER
> || (OMP_CLAUSE_MAP_KIND (c)
> == GOMP_MAP_FIRSTPRIVATE_REFERENCE))
> @@ -6695,6 +6701,22 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>   }
> if (remove)
>   break;
> +   if (DECL_P (decl) && outer_ctx && (region_type & ORT_ACC))
> + {
> +   struct gimplify_omp_ctx *octx;
> +   for (octx = outer_ctx; octx; octx = octx->outer_context)
> + {
> +   if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
> + break;

Wouldn't it be better to do
if (octx->region_type != ORT_ACC_HOST_DATA)
  continue;
here, thus only lookup if you really want to use it?

> +   splay_tree_node n2
> + = splay_tree_lookup (octx->variables,
> +  (splay_tree_key) decl);
> +   if (n2 && octx->region_type == ORT_ACC_HOST_DATA)

and remove the && ... part from the condition?

> + error_at (OMP_CLAUSE_LOCATION (c), "variable %qE "
> +   "declared in enclosing host_data region",
> +   DECL_NAME (decl));
> + }
> + }
> if (OMP_CLAUSE_SIZE (c) == NULL_TREE)
>   OMP_CLAUSE_SIZE (c) = DECL_P (decl) ? DECL_SIZE_UNIT (decl)
> : TYPE_SIZE_UNIT (TREE_TYPE (decl));

Ok with those changes.

Jakub
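
For readers following the review, the loop shape suggested above (skip non-host_data contexts first, then look the variable up) can be sketched in isolation.  This is only a toy model: the region constants, struct omp_ctx, lookup and count_host_data_conflicts are hypothetical stand-ins, not GCC's gimplify_omp_ctx or its splay trees.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative region kinds; the bit values are assumptions, not GCC's.  */
enum
{
  ORT_TARGET_DATA = 1,
  ORT_TARGET = 2,
  ORT_ACC = 4,
  ORT_ACC_HOST_DATA = ORT_ACC | ORT_TARGET_DATA | 8
};

/* Hypothetical stand-in for a gimplification context.  */
struct omp_ctx
{
  unsigned region_type;
  const int *vars;		/* Declarations recorded in this context.  */
  size_t nvars;
  struct omp_ctx *outer_context;
};

/* Counts how often the (comparatively expensive) lookup runs.  */
static unsigned lookup_count;

static int
lookup (const struct omp_ctx *ctx, int decl)
{
  lookup_count++;
  for (size_t i = 0; i < ctx->nvars; i++)
    if (ctx->vars[i] == decl)
      return 1;
  return 0;
}

/* Return the number of enclosing host_data regions that already
   declare DECL; each one would produce a diagnostic.  */
static int
count_host_data_conflicts (const struct omp_ctx *outer_ctx, int decl)
{
  int errors = 0;
  for (const struct omp_ctx *octx = outer_ctx; octx;
       octx = octx->outer_context)
    {
      if (!(octx->region_type & (ORT_TARGET_DATA | ORT_TARGET)))
	break;
      /* Only host_data regions matter, so skip the lookup elsewhere.  */
      if (octx->region_type != ORT_ACC_HOST_DATA)
	continue;
      if (lookup (octx, decl))
	errors++;
    }
  return errors;
}
```

With the early continue, the lookup runs only for host_data contexts, so the condition after it no longer needs to test the region type.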


Re: -fstrict-aliasing fixes 2/5: drop alias set 0 streaming

2015-12-01 Thread Richard Biener
On Tue, 1 Dec 2015, Jan Hubicka wrote:

> Hi,
> this patch disables the streaming of alias 0 flag and adds a comment why.
> 
> Bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

> Honza
> 
>   * lto-streamer-out.c (hash_tree): Do not stream TYPE_ALIAS_SET.
>   * tree-streamer-out.c (pack_ts_type_common_value_fields): Do not
>   stream TYPE_ALIAS_SET.
>   * tree-streamer-in.c (unpack_ts_type_common_value_fields): Do not
>   stream TYPE_ALIAS_SET.
> 
>   * lto.c (compare_tree_sccs_1): Do not compare TYPE_ALIAS_SET.
> 
> Index: lto-streamer-out.c
> ===
> --- lto-streamer-out.c(revision 231081)
> +++ lto-streamer-out.c(working copy)
> @@ -1109,10 +1109,6 @@ hash_tree (struct streamer_tree_cache_d
>hstate.commit_flag ();
>hstate.add_int (TYPE_PRECISION (t));
>hstate.add_int (TYPE_ALIGN (t));
> -  hstate.add_int ((TYPE_ALIAS_SET (t) == 0
> -  || (!in_lto_p
> -  && get_alias_set (t) == 0))
> - ? 0 : -1);
>  }
>  
>if (CODE_CONTAINS_STRUCT (code, TS_TRANSLATION_UNIT_DECL))
> Index: lto/lto.c
> ===
> --- lto/lto.c (revision 231081)
> +++ lto/lto.c (working copy)
> @@ -1166,7 +1166,9 @@ compare_tree_sccs_1 (tree t1, tree t2, t
>compare_values (TYPE_READONLY);
>compare_values (TYPE_PRECISION);
>compare_values (TYPE_ALIGN);
> -  compare_values (TYPE_ALIAS_SET);
> +  /* Do not compare TYPE_ALIAS_SET.  Doing so introduces ordering issues
> + with calls to get_alias_set which may initialize it for streamed
> +  in types.  */
>  }
>  
>/* We don't want to compare locations, so there is nothing do compare
> Index: tree-streamer-out.c
> ===
> --- tree-streamer-out.c   (revision 231081)
> +++ tree-streamer-out.c   (working copy)
> @@ -317,13 +317,9 @@ pack_ts_type_common_value_fields (struct
>bp_pack_value (bp, TYPE_RESTRICT (expr), 1);
>bp_pack_value (bp, TYPE_USER_ALIGN (expr), 1);
>bp_pack_value (bp, TYPE_READONLY (expr), 1);
> -  /* Make sure to preserve the fact whether the frontend would assign
> - alias-set zero to this type.  Do that only for main variants, because
> - type variants alias sets are never computed.
> - FIXME:  This does not work for pre-streamed builtin types.  */
> -  bp_pack_value (bp, (TYPE_ALIAS_SET (expr) == 0
> -   || (!in_lto_p && TYPE_MAIN_VARIANT (expr) == expr
> -   && get_alias_set (expr) == 0)), 1);
> +  /* We used to stream TYPE_ALIAS_SET == 0 information to let frontends mark
> + types that are opaque for TBAA.  This however did not work as intended,
> + because TYPE_ALIAS_SET == 0 was regularly lost in canonical type 
> merging.  */
>if (RECORD_OR_UNION_TYPE_P (expr))
>  {
>bp_pack_value (bp, TYPE_TRANSPARENT_AGGR (expr), 1);
> Index: tree-streamer-in.c
> ===
> --- tree-streamer-in.c(revision 231081)
> +++ tree-streamer-in.c(working copy)
> @@ -366,7 +366,6 @@ unpack_ts_type_common_value_fields (stru
>TYPE_RESTRICT (expr) = (unsigned) bp_unpack_value (bp, 1);
>TYPE_USER_ALIGN (expr) = (unsigned) bp_unpack_value (bp, 1);
>TYPE_READONLY (expr) = (unsigned) bp_unpack_value (bp, 1);
> -  TYPE_ALIAS_SET (expr) = bp_unpack_value (bp, 1) ? 0 : -1;
>if (RECORD_OR_UNION_TYPE_P (expr))
>  {
>TYPE_TRANSPARENT_AGGR (expr) = (unsigned) bp_unpack_value (bp, 1);
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: -fstrict-aliasing fixes 3/5: Do not ignore -fstrict-aliasing changes when parsing optimization attribute

2015-12-01 Thread Richard Biener
On Tue, 1 Dec 2015, Jan Hubicka wrote:

> Hi,
> this is the third part, which enables us to change -fstrict-aliasing using the
> optimize attribute.  This ought to work safely now because the inliner
> propagates the flag.

Ok.

Thanks,
Richard.

> Bootstrapped/regtested x86_64-linux.
> 
> Honza
> 
>   * gcc.c-torture/execute/alias-1.c: New testcase.
>   * c-common.c: Do not silently ignore -fstrict-aliasing changes.
> Index: testsuite/gcc.c-torture/execute/alias-1.c
> ===
> --- testsuite/gcc.c-torture/execute/alias-1.c (revision 0)
> +++ testsuite/gcc.c-torture/execute/alias-1.c (revision 0)
> @@ -0,0 +1,19 @@
> +int val;
> +
> +int *ptr = &val;
> +float *ptr2 = &val;
> +
> +__attribute__((optimize ("-fno-strict-aliasing")))
> +typepun ()
> +{
> +  *ptr2=0;
> +}
> +
> +main()
> +{
> +  *ptr=1;
> +  typepun ();
> +  if (*ptr)
> +__builtin_abort ();
> +}
> +
> Index: c-family/c-common.c
> ===
> --- c-family/c-common.c   (revision 231097)
> +++ c-family/c-common.c   (working copy)
> @@ -9988,7 +9988,6 @@ parse_optimize_options (tree args, bool
>bool ret = true;
>unsigned opt_argc;
>unsigned i;
> -  int saved_flag_strict_aliasing;
>const char **opt_argv;
>struct cl_decoded_option *decoded_options;
>unsigned int decoded_options_count;
> @@ -10081,8 +10080,6 @@ parse_optimize_options (tree args, bool
>for (i = 1; i < opt_argc; i++)
>  opt_argv[i] = (*optimize_args)[i];
>  
> -  saved_flag_strict_aliasing = flag_strict_aliasing;
> -
>/* Now parse the options.  */
>decode_cmdline_options_to_array_default_mask (opt_argc, opt_argv,
>   &decoded_options,
> @@ -10093,9 +10090,6 @@ parse_optimize_options (tree args, bool
>  
>targetm.override_options_after_change();
>  
> -  /* Don't allow changing -fstrict-aliasing.  */
> -  flag_strict_aliasing = saved_flag_strict_aliasing;
> -
>optimize_args->truncate (0);
>return ret;
>  }
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
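
As a standalone illustration of what the change above enables, here is a minimal sketch modelled on the new testcase: one function is compiled with -fno-strict-aliasing via the optimize attribute while the rest of the file keeps the default.  It is GCC-specific, assumes float and int have the same size, and run_alias_check is a hypothetical helper name.  Other compilers may ignore the attribute, in which case the type-punned store is not guaranteed to be seen.

```c
#include <assert.h>

/* The punning below only makes sense if both types have the same size.  */
_Static_assert (sizeof (float) == sizeof (int), "float and int differ in size");

static int val;
static int *ptr = &val;
static float *ptr2 = (float *) &val;

/* Only this function is compiled with -fno-strict-aliasing, so the
   store through a float pointer is treated as possibly modifying VAL.  */
__attribute__ ((optimize ("-fno-strict-aliasing")))
void
typepun (void)
{
  *ptr2 = 0;
}

/* Hypothetical driver: returns the value of VAL seen after the
   type-punned store.  */
int
run_alias_check (void)
{
  *ptr = 1;
  typepun ();
  return *ptr;
}
```

Before this patch the option was silently reset after parsing the attribute, so the attribute had no effect on aliasing assumptions.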


Re: When not optimizing do not compute RTX memory attributes

2015-12-01 Thread Richard Biener
On Tue, 1 Dec 2015, Jan Hubicka wrote:

> Hi,
> memory attributes are currently optimized and attached to RTL even when not
> optimizing. This is obviously just a wasted effort.

Huh, are you sure?  What about globals used from different optimize
contexts?

> Bootstrapped/regtested x86_64-linux, OK?

I don't think so.  Did you bootstrap with BOOT_CFLAGS="-O0 -g"?

Richard.

> Honza
>   * emit-rtl.c (set_mem_attrs, set_mem_attributes_minus_bitpos):
>   Do not compute memory attributes when not optimizing.
> 
> Index: emit-rtl.c
> ===
> --- emit-rtl.c(revision 231081)
> +++ emit-rtl.c(working copy)
> @@ -336,7 +336,8 @@ static void
>  set_mem_attrs (rtx mem, mem_attrs *attrs)
>  {
>/* If everything is the default, we can just clear the attributes.  */
> -  if (mem_attrs_eq_p (attrs, mode_mem_attrs[(int) GET_MODE (mem)]))
> +  if (!optimize
> +  || mem_attrs_eq_p (attrs, mode_mem_attrs[(int) GET_MODE (mem)]))
>  {
>MEM_ATTRS (mem) = 0;
>return;
> @@ -1749,6 +1750,9 @@ set_mem_attributes_minus_bitpos (rtx ref
>struct mem_attrs attrs, *defattrs, *refattrs;
>addr_space_t as;
>  
> +  if (!optimize)
> +return;
> +
>/* It can happen that type_for_mode was given a mode for which there
>   is no language-level type.  In which case it returns NULL, which
>   we can see here.  */
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: PR68577: Handle narrowing for vector popcount, etc.

2015-12-01 Thread Richard Biener
On Tue, Dec 1, 2015 at 10:14 AM, Richard Sandiford
 wrote:
> This patch adds support for simple cases where a vector internal
> function returns wider results than the scalar equivalent.  It punts
> on other cases.
>
> Tested on powerpc64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Thanks,
> Richard
>
>
> gcc/
> PR tree-optimization/68577
> * tree-vect-stmts.c (simple_integer_narrowing): New function.
> (vectorizable_call): Restrict internal function handling
> to NONE and NARROW cases, using simple_integer_narrowing
> to test for the latter.  Add cost of narrowing operation
> and insert it where necessary.
>
> gcc/testsuite/
> PR tree-optimization/68577
> * gcc.dg/vect/pr68577.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr68577.c 
> b/gcc/testsuite/gcc.dg/vect/pr68577.c
> new file mode 100644
> index 000..999c1c8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr68577.c
> @@ -0,0 +1,25 @@
> +/* { dg-do compile } */
> +
> +int a, b;
> +
> +void
> +__sched_cpucount (void)
> +{
> +  while (b)
> +{
> +  long l = b++;
> +  a += __builtin_popcountl(l);
> +}
> +}
> +
> +void
> +slp_test (int *x, long *y)
> +{
> +  for (int i = 0; i < 512; i += 4)
> +{
> +  x[i] = __builtin_popcountl(y[i]);
> +  x[i + 1] = __builtin_popcountl(y[i + 1]);
> +  x[i + 2] = __builtin_popcountl(y[i + 2]);
> +  x[i + 3] = __builtin_popcountl(y[i + 3]);
> +}
> +}
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 3b078da..af86bce 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -2122,6 +2122,40 @@ vectorizable_mask_load_store (gimple *stmt, 
> gimple_stmt_iterator *gsi,
>return true;
>  }
>
> +/* Return true if vector type VECTYPE_OUT has integer elements and
> +   if we can narrow two integer vectors with the same shape as
> +   VECTYPE_IN to VECTYPE_OUT in a single step.  On success,
> +   return the binary pack code in *CONVERT_CODE and the types
> +   of the input vectors in *CONVERT_FROM.  */
> +
> +static bool
> +simple_integer_narrowing (tree vectype_out, tree vectype_in,
> + tree_code *convert_code, tree *convert_from)
> +{
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_out)))
> +return false;
> +
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (vectype_in)))
> +{
> +  unsigned int bits
> +   = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (vectype_in)));
> +  tree scalar_type = build_nonstandard_integer_type (bits, 0);
> +  vectype_in = get_same_sized_vectype (scalar_type, vectype_in);
> +}
> +

any reason for supporting non-integer types on the input?  It seems to me
you are doing this for the lrint case?  If so isn't the "question" wrong and
you should pass the integer type the IFN returns as vectype_in instead?

That said, this conversion doesn't seem to belong to simple_integer_narrowing.

The patch is ok with simply removing it.

Thanks,
Richard.

> +  tree_code code;
> +  int multi_step_cvt = 0;
> +  auto_vec  interm_types;
> +  if (!supportable_narrowing_operation (NOP_EXPR, vectype_out, vectype_in,
> +   &code, &multi_step_cvt,
> +   &interm_types)
> +  || multi_step_cvt)
> +return false;
> +
> +  *convert_code = code;
> +  *convert_from = vectype_in;
> +  return true;
> +}
>
>  /* Function vectorizable_call.
>
> @@ -2288,7 +2322,13 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
> *gsi, gimple **vec_stmt,
>tree callee = gimple_call_fndecl (stmt);
>
>/* First try using an internal function.  */
> -  if (cfn != CFN_LAST)
> +  tree_code convert_code = ERROR_MARK;
> +  tree convert_from = NULL_TREE;
> +  if (cfn != CFN_LAST
> +  && (modifier == NONE
> + || (modifier == NARROW
> + && simple_integer_narrowing (vectype_out, vectype_in,
> +  &convert_code, &convert_from))))
>  ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>   vectype_in);
>
> @@ -2328,7 +2368,7 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
> *gsi, gimple **vec_stmt,
>
>if (slp_node || PURE_SLP_STMT (stmt_info))
>  ncopies = 1;
> -  else if (modifier == NARROW)
> +  else if (modifier == NARROW && ifn == IFN_LAST)
>  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
>else
>  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
> @@ -2344,6 +2384,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator 
> *gsi, gimple **vec_stmt,
>  dump_printf_loc (MSG_NOTE, vect_location, "=== vectorizable_call ==="
>   "\n");
>vect_model_simple_cost (stmt_info, ncopies, dt, NULL, NULL);
> +  if (ifn != IFN_LAST && modifier == NARROW && !slp_node)
> +   add_stmt_cost (stmt_info->vinfo->target_cost_data, ncopies / 2,
> +  
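
To make the narrowing case discussed in this thread concrete, here is a scalar model of it, written as a sketch rather than as what the vectorizer emits.  A vector popcount on 64-bit lanes yields 64-bit results, while the scalar builtin returns int, so two wide result vectors are packed into one vector of 32-bit lanes in a single step.  The lane counts and the helper names vec_popcount64, vec_pack_trunc and popcount_narrow are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LANES64 2		/* 64-bit lanes per model vector.  */
#define LANES32 4		/* 32-bit lanes per model vector.  */

/* Model of a vector popcount: each result lane is as wide as its input.  */
static void
vec_popcount64 (const uint64_t in[LANES64], uint64_t out[LANES64])
{
  for (int i = 0; i < LANES64; i++)
    out[i] = (uint64_t) __builtin_popcountll (in[i]);
}

/* Model of a single-step pack: two vectors of 64-bit lanes are
   truncated into one vector of 32-bit lanes.  */
static void
vec_pack_trunc (const uint64_t lo[LANES64], const uint64_t hi[LANES64],
		int32_t out[LANES32])
{
  for (int i = 0; i < LANES64; i++)
    {
      out[i] = (int32_t) lo[i];
      out[i + LANES64] = (int32_t) hi[i];
    }
}

/* Popcount four 64-bit inputs and deliver four int results.  */
static void
popcount_narrow (const uint64_t in[LANES32], int32_t out[LANES32])
{
  uint64_t wide_lo[LANES64], wide_hi[LANES64];
  vec_popcount64 (in, wide_lo);
  vec_popcount64 (in + LANES64, wide_hi);
  vec_pack_trunc (wide_lo, wide_hi, out);
}
```

The extra pack step is what the patch accounts for when it adds the cost of the narrowing operation.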

Re: [PATCH, PR middle-end/68595] Fix invariant boolean vector generation

2015-12-01 Thread Richard Biener
On Tue, Dec 1, 2015 at 10:44 AM, Ilya Enkovich  wrote:
> Hi,
>
> This patch fixes the way an invariant boolean vector is generated.  It makes sure 
> the boolean vector consists of 0 and -1 values.  Bootstrapped and tested on 
> x86_64-unknown-linux-gnu.  OK for trunk?

Ok.

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2015-12-01  Ilya Enkovich  
>
> PR middle-end/68595
> * tree-vect-stmts.c (vect_init_vector): Cast boolean
> scalars to a proper value before building a vector.
>
> gcc/testsuite/
>
> 2015-12-01  Ilya Enkovich  
>
> PR middle-end/68595
> * gcc.dg/pr68595.c: New test.
>
>
> diff --git a/gcc/testsuite/gcc.dg/pr68595.c b/gcc/testsuite/gcc.dg/pr68595.c
> new file mode 100644
> index 000..179c6c3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68595.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +int a, b;
> +char c;
> +void fn1() {
> +  b = 30;
> +  for (; b <= 32; b++) {
> +c = -17;
> +for (; c <= 56; c++)
> +  a -= 0 == (c || b);
> +  }
> +}
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 3b078da..5bb2289 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1300,7 +1300,25 @@ vect_init_vector (gimple *stmt, tree val, tree type, 
> gimple_stmt_iterator *gsi)
>  {
>if (!types_compatible_p (TREE_TYPE (type), TREE_TYPE (val)))
> {
> - if (CONSTANT_CLASS_P (val))
> + /* Scalar boolean value should be transformed into
> +all zeros or all ones value before building a vector.  */
> + if (VECTOR_BOOLEAN_TYPE_P (type))
> +   {
> + tree true_val = build_all_ones_cst (TREE_TYPE (type));
> + tree false_val = build_zero_cst (TREE_TYPE (type));
> +
> + if (CONSTANT_CLASS_P (val))
> +   val = integer_zerop (val) ? false_val : true_val;
> + else
> +   {
> + new_temp = make_ssa_name (TREE_TYPE (type));
> + init_stmt = gimple_build_assign (new_temp, COND_EXPR,
> +  val, true_val, false_val);
> + vect_init_vector_1 (stmt, init_stmt, gsi);
> + val = new_temp;
> +   }
> +   }
> + else if (CONSTANT_CLASS_P (val))
> val = fold_convert (TREE_TYPE (type), val);
>   else
> {
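
The reason for insisting on 0 and -1 can be shown with a small scalar model.  This is a sketch under assumptions: 32-bit mask elements, a four-element vector, and hypothetical helpers bool_to_mask_element, init_bool_vector and mask_select that are not part of GCC.

```c
#include <assert.h>
#include <stdint.h>

#define NUNITS 4

/* Turn a scalar truth value into a mask element: all ones for true,
   all zeros for false.  */
static int32_t
bool_to_mask_element (int val)
{
  return val ? -1 : 0;
}

/* Build an invariant boolean vector from a scalar boolean.  */
static void
init_bool_vector (int val, int32_t vec[NUNITS])
{
  int32_t elt = bool_to_mask_element (val);
  for (int i = 0; i < NUNITS; i++)
    vec[i] = elt;
}

/* Bitwise select, as mask-based vector code typically does it:
   pick A where MASK bits are set, B elsewhere.  */
static int32_t
mask_select (int32_t mask, int32_t a, int32_t b)
{
  return (a & mask) | (b & ~mask);
}
```

A raw scalar true value of 1 used as a mask would select only the lowest bit of A, which is why the scalar is converted before the vector is built.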


Re: PR68474: Fix tree-call-cdce.c:use_internal_fn

2015-12-01 Thread Richard Biener
On Tue, Dec 1, 2015 at 10:24 AM, Richard Sandiford
 wrote:
> We'd call gen_shrink_wrap_conditions for functions that it can't handle
> but edom_only_function can.
>
> Tested on x86_64-linux-gnu.  OK to install?

Ok.

Richard.

> Thanks,
> Richard
>
>
> gcc/
> PR tree-optimization/68474
> * tree-call-cdce.c (use_internal_fn): Protect call to
> gen_shrink_wrap_conditions.
>
> gcc/testsuite/
> PR tree-optimization/68474
> * gcc.dg/pr68474.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/pr68474.c b/gcc/testsuite/gcc.dg/pr68474.c
> new file mode 100644
> index 000..8ad7def
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr68474.c
> @@ -0,0 +1,7 @@
> +/* { dg-options "-O -funsafe-math-optimizations" } */
> +
> +long double
> +foo (long double d1, long double d2)
> +{
> +  return d1 || __builtin_significandl (d2);
> +}
> diff --git a/gcc/tree-call-cdce.c b/gcc/tree-call-cdce.c
> index 75ef180..4123130 100644
> --- a/gcc/tree-call-cdce.c
> +++ b/gcc/tree-call-cdce.c
> @@ -959,7 +959,8 @@ use_internal_fn (gcall *call)
>  {
>unsigned nconds = 0;
>auto_vec conds;
> -  gen_shrink_wrap_conditions (call, conds, &nconds);
> +  if (can_test_argument_range (call))
> +gen_shrink_wrap_conditions (call, conds, &nconds);
>if (nconds == 0 && !edom_only_function (call))
>  return false;
>
>


Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-01 Thread Andi Kleen
Bernd Schmidt  writes:

> I'm worried we'll end up carrying
> something around as a burden that is of no practical use (considering
> we already support the more widespread OpenMP).

I'm not an expert on UPC, but from glancing over the description it
seems to target a distributed message passing programming model,
which is very different from OpenMP. I don't think any of the existing
parallelization models in gcc (OpenMP, cilk) support that niche.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


[PATCH] Add testcase for tree-optimization/67916

2015-12-01 Thread Marek Polacek
This PR was fixed in r228767 (or went latent?), but this testcase has never
been added.

Tested on x86_64-linux, ok for trunk?

2015-12-01  Marek Polacek  

PR tree-optimization/67916
* gcc.dg/torture/pr67916.c: New test.

diff --git gcc/testsuite/gcc.dg/torture/pr67916.c 
gcc/testsuite/gcc.dg/torture/pr67916.c
index e69de29..88541f9 100644
--- gcc/testsuite/gcc.dg/torture/pr67916.c
+++ gcc/testsuite/gcc.dg/torture/pr67916.c
@@ -0,0 +1,46 @@
+/* PR tree-optimization/67916 */
+/* { dg-do run } */
+
+int a[6], b = 1, d, e;
+long long c;
+static int f = 1;
+
+void
+fn1 (int p1)
+{
+  b = (b >> 1) & (1 ^ a[(1 ^ p1) & 5]);
+}
+
+void
+fn2 ()
+{
+  b = (b >> 1) & (1 ^ a[(b ^ 1) & 1]);
+  fn1 (c >> 1 & 5);
+  fn1 (c >> 2 & 5);
+  fn1 (c >> 4 & 5);
+  fn1 (c >> 8 & 5);
+}
+
+int
+main ()
+{
+  int i, j;
+  for (; d;)
+{
+  for (; e;)
+   fn2 ();
+  f = 0;
+}
+  for (i = 0; i < 8; i++)
+{
+  if (f)
+   i = 9;
+  for (j = 0; j < 7; j++)
+   fn2 ();
+}
+
+  if (b != 0)
+__builtin_abort ();
+
+  return 0;
+}

Marek


[gomp-nvptx 2/9] nvptx backend: new "uniform SIMT" codegen variant

2015-12-01 Thread Alexander Monakov
This patch introduces a code generation variant for NVPTX that I'm using for
SIMD work in OpenMP offloading.  Let me try to explain the idea behind it...

In place of SIMD vectorization, NVPTX is using SIMT (single
instruction/multiple threads) execution: groups of 32 threads execute the same
instruction, with some threads possibly masked off if under a divergent branch.
So we are mapping OpenMP threads to such thread groups ("warps"), and hardware
threads are then mapped to OpenMP SIMD lanes.

We need to reach heads of SIMD regions with all hw threads active, because
there's no way to "resurrect" them once masked off: they need to follow the
same control flow, and reach the SIMD region entry with the same local state
(registers, and stack too for OpenACC).

The approach in OpenACC is to, outside of "vector" loops, 1) make threads 1-31
"slaves" which just follow branches without any computation -- that requires
extra jumps and broadcasting branch predicates, -- and 2) broadcast register
state and stack state from master to slaves when entering "vector" regions.

I'm taking a different approach.  I want to execute all insns in all warp
members, while ensuring that the effect (on global and local state) is the same
as if any single thread was executing that instruction.  Most instructions
automatically satisfy that: if threads have the same state, then executing an
arithmetic instruction, normal memory load/store, etc. keeps local state the
same in all threads.

The two exception insn categories are atomics and calls.  For calls, we can
demand recursively that they uphold this execution model, until we reach
runtime-provided "syscalls": malloc/free/vprintf.  Those we can handle like
atomics.

To handle atomics, we
  1) execute the atomic conditionally only in one warp member -- so its side
  effect happens once;
  2) copy the register that was set from that warp member to others -- so
  local state is kept synchronized:

atom.op dest, ...

becomes

/* pred = (current_lane == 0);  */
@pred atom.op dest, ...
shuffle.idx dest, dest, /*srclane=*/0

So the overhead is one shuffle insn following each atomic, plus predicate
setup in the prologue.

OK, so the above handles execution out of SIMD regions nicely, but then we'd
also need to run code inside of SIMD regions, where we need to turn off this
synching effect.  Turns out we can keep atomics decorated almost like before:

@pred atom.op dest, ...
shuffle.idx dest, dest, master_lane

and compute 'pred' and 'master_lane' accordingly: outside of SIMD regions we
need (master_lane == 0 && pred == (current_lane == 0)), and inside we need
(master_lane == current_lane && pred == true) (so that shuffle is no-op, and
predicate is 'true' for all lanes).  Then, pred = (current_lane ==
master_lane) works in both cases, and we just need to set up master_lane
accordingly: master_lane = current_lane & mask, where mask is all-0 outside of
SIMD regions, and all-1 inside.  To store these per-warp masks, I've
introduced another shared memory array, __nvptx_uni.
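
The predicate and shuffle scheme described above can be checked with a small host-side simulation.  This is a sketch only: a sequential loop stands in for the 32 lanes of a warp, an ordinary int stands in for the memory location updated by the atomic, and struct warp and uniform_atomic_add are hypothetical names.  On real hardware the order in which lanes perform their atomics is not defined by this model.

```c
#include <assert.h>

#define WARP_SIZE 32

/* Simulated destination register, one copy per lane.  */
struct warp
{
  int dest[WARP_SIZE];
};

/* Simulate
     @pred atom.add dest, [counter], 1
     shuffle.idx dest, dest, master_lane
   for every lane.  MASK is all-0 outside SIMD regions and all-1
   inside them.  */
static void
uniform_atomic_add (struct warp *w, int *counter, unsigned mask)
{
  int after_atomic[WARP_SIZE];
  unsigned master[WARP_SIZE];

  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    {
      master[lane] = lane & mask;
      int pred = (lane == master[lane]);
      after_atomic[lane] = w->dest[lane];
      if (pred)
	{
	  /* Fetch-and-add: the old value lands in the register.  */
	  after_atomic[lane] = *counter;
	  *counter += 1;
	}
    }

  /* Shuffle: every lane copies the register of its master lane.  */
  for (unsigned lane = 0; lane < WARP_SIZE; lane++)
    w->dest[lane] = after_atomic[master[lane]];
}
```

With mask == 0 only lane 0 performs the atomic and all lanes end up with its result; with mask == ~0u every lane is its own master, so each performs the atomic and the shuffle is a no-op.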

* config/nvptx/nvptx.c (need_unisimt_decl): New variable.  Set it...
(nvptx_init_unisimt_predicate): ...here (new function) and use it...
(nvptx_file_end): ...here to emit declaration of __nvptx_uni array.
(nvptx_declare_function_name): Call nvptx_init_unisimt_predicate.
(nvptx_get_unisimt_master): New helper function.
(nvptx_get_unisimt_predicate): Ditto.
(nvptx_call_insn_is_syscall_p): Ditto.
(nvptx_unisimt_handle_set): Ditto.
(nvptx_reorg_uniform_simt): New.  Transform code for -muniform-simt.
(nvptx_get_axis_predicate): New helper function, factored out from...
(nvptx_single): ...here.
(nvptx_reorg): Call nvptx_reorg_uniform_simt.
* config/nvptx/nvptx.h (TARGET_CPU_CPP_BUILTINS): Define
__nvptx_unisimt__ when -muniform-simt option is active.
(struct machine_function): Add unisimt_master, unisimt_predicate
rtx fields.
* config/nvptx/nvptx.md (divergent): New attribute.
(atomic_compare_and_swap_1): Mark as divergent.
(atomic_exchange): Ditto.
(atomic_fetch_add): Ditto.
(atomic_fetch_addsf): Ditto.
(atomic_fetch_): Ditto.
* config/nvptx/nvptx.opt (muniform-simt): New option.
* doc/invoke.texi (-muniform-simt): Document.
---
 gcc/config/nvptx/nvptx.c   | 138 ++---
 gcc/config/nvptx/nvptx.h   |   4 ++
 gcc/config/nvptx/nvptx.md  |  18 --
 gcc/config/nvptx/nvptx.opt |   4 ++
 gcc/doc/invoke.texi|  14 +
 5 files changed, 165 insertions(+), 13 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2dad3e2..9209b47 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -117,6 +117,9 @@ static GTY(()) rtx worker_red_sym;
 /* True if any function references __nvptx_stacks.  */
 static bool need_softstack_decl;
 
+/* 

Re: RFC: Merge the GUPC branch into the GCC 6.0 trunk

2015-12-01 Thread Richard Biener
On Tue, 1 Dec 2015, Andi Kleen wrote:

> Bernd Schmidt  writes:
> 
> > I'm worried we'll end up carrying
> > something around as a burden that is of no practical use (considering
> > we already support the more widespread OpenMP).
> 
> I'm not an expert on UPC, but from glancing over the description it
> seems to target a distributed message passing programming model,
> which is very different from OpenMP. I don't think any of the existing
> parallelization models in gcc (OpenMP, cilk) support that niche.

Fortran CoArrays do though.  Ok, slightly irrelevant...

Btw, I don't think we should talk about "no practical use" given
we took openACC.

Richard.


[gomp-nvptx 8/9] libgomp: update gomp_nvptx_main for -mgomp

2015-12-01 Thread Alexander Monakov
Here's how I've updated gomp_nvptx_main to set up shared memory arrays
__nvptx_stacks and __nvptx_uni for -mgomp.  Since it makes sense only for the
-mgomp multilib, I've wrapped the whole file in an #ifdef that checks the
corresponding built-in macros.

Reaching those shared memory arrays is awkward.  I cannot declare them with
toplevel asms because the compiler implicitly declares them too, and ptxas
does not handle duplicate declarations.  Ideally I'd like to be able to say:

extern char *__shared __nvptx_stacks[32];

Bernd, is your position on exposing shared memory as first-class address space
on NVPTX subject to change?  Do you remember what middle-end issues you've
encountered when trying that?

* config/nvptx/team.c (gomp_nvptx_main): Rename to...
(gomp_nvptx_main_1): ... this and mark noinline.
(gomp_nvptx_main): Wrap the above, set up __nvptx_uni and
__nvptx_stacks.
---
 libgomp/config/nvptx/team.c | 37 +
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/libgomp/config/nvptx/team.c b/libgomp/config/nvptx/team.c
index 88d1d34..deb0860 100644
--- a/libgomp/config/nvptx/team.c
+++ b/libgomp/config/nvptx/team.c
@@ -24,6 +24,8 @@
 
 /* This file handles the maintainence of threads on NVPTX.  */
 
+#if defined __nvptx_softstack && defined __nvptx_unisimt__
+
 #include "libgomp.h"
 #include 
 
@@ -31,15 +33,9 @@ struct gomp_thread *nvptx_thrs;
 
 static void gomp_thread_start (struct gomp_thread_pool *);
 
-void
-gomp_nvptx_main (void (*fn) (void *), void *fn_data)
+static void __attribute__((noinline))
+gomp_nvptx_main_1 (void (*fn) (void *), void *fn_data, int ntids, int tid)
 {
-  int ntids, tid, laneid;
-  asm ("mov.u32 %0, %%laneid;" : "=r" (laneid));
-  if (laneid)
-return;
-  asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
-  asm ("mov.u32 %0, %%ntid.y;" : "=r"(ntids));
   if (tid == 0)
 {
   gomp_global_icv.nthreads_var = ntids;
@@ -72,6 +68,30 @@ gomp_nvptx_main (void (*fn) (void *), void *fn_data)
 }
 }
 
+void
+gomp_nvptx_main (void (*fn) (void *), void *fn_data)
+{
+  int tid, ntids;
+  asm ("mov.u32 %0, %%tid.y;" : "=r" (tid));
+  asm ("mov.u32 %0, %%ntid.y;" : "=r"(ntids));
+  char *stacks = 0;
+  int *__nvptx_uni;
+  asm ("cvta.shared.u64 %0, __nvptx_uni;" : "=r" (__nvptx_uni));
+  __nvptx_uni[tid] = 0;
+  if (tid == 0)
+{
+  size_t stacksize = 131072;
+  stacks = gomp_malloc (stacksize * ntids);
+  char **__nvptx_stacks = 0;
+  asm ("cvta.shared.u64 %0, __nvptx_stacks;" : "=r" (__nvptx_stacks));
+  for (int i = 0; i < ntids; i++)
+   __nvptx_stacks[i] = stacks + stacksize * (i + 1);
+}
+  asm ("bar.sync 0;");
+  gomp_nvptx_main_1 (fn, fn_data, ntids, tid);
+  free (stacks);
+}
+
 /* This function is a pthread_create entry point.  This contains the idle
loop in which a thread waits to be called up to become part of a team.  */
 
@@ -160,3 +180,4 @@ gomp_team_start (void (*fn) (void *), void *data, unsigned 
nthreads,
 }
 
 #include "../../team.c"
+#endif
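
The stack setup in the patch above carves one allocation into equal per-warp slices and stores, for each warp, a pointer one past the end of its slice.  That suggests the soft stacks grow downward, though the patch does not say so explicitly.  A plain C sketch of the same arithmetic, with malloc standing in for gomp_malloc and setup_stacks as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocate NTIDS stacks of STACKSIZE bytes each in one block and store
   the initial stack pointer of stack I in TOPS[I].  Returns the block
   to pass to free later, or NULL on allocation failure.  */
static char *
setup_stacks (char **tops, int ntids, size_t stacksize)
{
  char *stacks = malloc (stacksize * (size_t) ntids);
  if (!stacks)
    return NULL;
  for (int i = 0; i < ntids; i++)
    /* One past the end of slice I, i.e. the top of a descending stack.  */
    tops[i] = stacks + stacksize * (size_t) (i + 1);
  return stacks;
}
```

Keeping the base pointer separate from the per-warp tops is what lets a single free release all stacks at the end.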


[gomp-nvptx 5/9] new target hook: TARGET_SIMT_VF

2015-12-01 Thread Alexander Monakov
This patch adds a new target hook and implements it in a straightforward
manner on NVPTX to indicate that the target is running in SIMT fashion with 32
threads in a synchronous group ("warp").  For use in OpenMP transforms.
---
 gcc/config/nvptx/nvptx.c | 12 
 gcc/doc/tm.texi  |  4 
 gcc/doc/tm.texi.in   |  2 ++
 gcc/target.def   | 12 
 4 files changed, 30 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 48ee96e..eb3b67e 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3684,10 +3684,19 @@ nvptx_expand_builtin (tree exp, rtx target, rtx 
ARG_UNUSED (subtarget),
 }
 }
 
+
 /* Define dimension sizes for known hardware.  */
 #define PTX_VECTOR_LENGTH 32
 #define PTX_WORKER_LENGTH 32
 
+/* Implement TARGET_SIMT_VF target hook: number of threads in a warp.  */
+
+static int
+nvptx_simt_vf ()
+{
+  return PTX_VECTOR_LENGTH;
+}
+
 /* Validate compute dimensions of an OpenACC offload or routine, fill
in non-unity defaults.  FN_LEVEL indicates the level at which a
routine might spawn a loop.  It is negative for non-routines.  */
@@ -4258,6 +4267,9 @@ nvptx_goacc_reduction (gcall *call)
 #undef  TARGET_BUILTIN_DECL
 #define TARGET_BUILTIN_DECL nvptx_builtin_decl
 
+#undef TARGET_SIMT_VF
+#define TARGET_SIMT_VF nvptx_simt_vf
+
 #undef TARGET_GOACC_VALIDATE_DIMS
 #define TARGET_GOACC_VALIDATE_DIMS nvptx_goacc_validate_dims
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f394db7..e54944d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5765,6 +5765,10 @@ usable.  In that case, the smaller the number is, the 
more desirable it is
 to use it.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_SIMT_VF (void)
+Return number of threads in SIMT thread group on the target.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_GOACC_VALIDATE_DIMS (tree @var{decl}, int 
*@var{dims}, int @var{fn_level})
 This hook should check the launch dimensions provided for an OpenACC
 compute region, or routine.  Defaulted values are represented as -1
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d188c57..44ba697c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4260,6 +4260,8 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_SIMD_CLONE_USABLE
 
+@hook TARGET_SIMT_VF
+
 @hook TARGET_GOACC_VALIDATE_DIMS
 
 @hook TARGET_GOACC_DIM_LIMIT
diff --git a/gcc/target.def b/gcc/target.def
index c7ec292..f5a03d6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1639,6 +1639,18 @@ int, (struct cgraph_node *), NULL)
 
 HOOK_VECTOR_END (simd_clone)
 
+/* Functions relating to OpenMP SIMT vectorization transform.  */
+#undef HOOK_PREFIX
+#define HOOK_PREFIX "TARGET_SIMT_"
+HOOK_VECTOR (TARGET_SIMT, simt)
+
+DEFHOOK
+(vf,
+"Return number of threads in SIMT thread group on the target.",
+int, (void), NULL)
+
+HOOK_VECTOR_END (simt)
+
 /* Functions relating to openacc.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_GOACC_"
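
How a caller might consume a hook like this can be sketched outside of GCC.  Everything here is hypothetical: struct simt_hooks, nvptx_like_simt_vf and effective_simt_vf are illustration-only names, and treating a missing hook as a group size of 1 is an assumption, not something the patch specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a target hook vector with a single hook.  */
struct simt_hooks
{
  int (*vf) (void);		/* NULL when the target has no SIMT groups.  */
};

/* A target that runs 32 threads in lockstep, like the NVPTX hook above.  */
static int
nvptx_like_simt_vf (void)
{
  return 32;
}

/* Assumed policy: targets that leave the hook undefined behave as if
   each thread were its own group.  */
static int
effective_simt_vf (const struct simt_hooks *hooks)
{
  return hooks->vf ? hooks->vf () : 1;
}
```

A transform could then branch on whether the effective value is greater than 1 to decide if SIMT-specific lowering is needed.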


[gomp-nvptx 0/9] Codegen bits for NVPTX OpenMP SIMD

2015-12-01 Thread Alexander Monakov
Hello!

This patch series shows how I'm approaching OpenMP SIMD for NVPTX.  It looks
good both in check-c testing and libgomp testing, including new target-3x.c
cases (but for-5.c fails to run with resource exhaustion, maybe it should be
split for NVPTX -- will investigate more later).

The previously posted patch to handle 'omp_data_o' is no longer necessary with
soft-stacks.

Looking forward to your comments.

Alexander

  nvptx backend: allow emitting COND_EXEC insns
  nvptx backend: new "uniform SIMT" codegen variant
  nvptx backend: add two more identifier maps
  nvptx backend: add -mgomp option and multilib
  new target hook: TARGET_SIMT_VF
  nvptx libgcc: rewrite in C
  nvptx mkoffload: pass -mgomp for OpenMP offloading
  libgomp: update gomp_nvptx_main for -mgomp
  adjust SIMD loop lowering for SIMT targets

 gcc/config/nvptx/mkoffload.c   |   7 ++
 gcc/config/nvptx/nvptx.c   | 181 -
 gcc/config/nvptx/nvptx.h   |   4 +
 gcc/config/nvptx/nvptx.md  |  61 +
 gcc/config/nvptx/nvptx.opt |   8 ++
 gcc/config/nvptx/t-nvptx   |   2 +
 gcc/doc/invoke.texi|  19 
 gcc/doc/tm.texi|   4 +
 gcc/doc/tm.texi.in |   2 +
 gcc/internal-fn.c  |  22 +
 gcc/internal-fn.def|   2 +
 gcc/omp-low.c  | 138 ++--
 gcc/passes.def |   1 +
 gcc/target.def |  12 +++
 gcc/tree-pass.h|   2 +
 libgcc/config/nvptx/crt0.c |  61 +
 libgcc/config/nvptx/crt0.s |  54 ---
 libgcc/config/nvptx/free.asm   |  50 --
 libgcc/config/nvptx/free.c |  34 +++
 libgcc/config/nvptx/malloc.asm |  55 ---
 libgcc/config/nvptx/malloc.c   |  35 +++
 libgcc/config/nvptx/nvptx-malloc.h |   5 +
 libgcc/config/nvptx/realloc.c  |   2 +
 libgcc/config/nvptx/stacks.c   |  30 ++
 libgcc/config/nvptx/t-nvptx|  11 ++-
 libgomp/config/nvptx/team.c|  37 ++--
 26 files changed, 622 insertions(+), 217 deletions(-)
 create mode 100644 libgcc/config/nvptx/crt0.c
 delete mode 100644 libgcc/config/nvptx/crt0.s
 delete mode 100644 libgcc/config/nvptx/free.asm
 create mode 100644 libgcc/config/nvptx/free.c
 delete mode 100644 libgcc/config/nvptx/malloc.asm
 create mode 100644 libgcc/config/nvptx/malloc.c
 create mode 100644 libgcc/config/nvptx/stacks.c



Re: [PATCH] Add testcase for tree-optimization/67916

2015-12-01 Thread Richard Biener
On Tue, Dec 1, 2015 at 4:18 PM, Marek Polacek  wrote:
> This PR was fixed in r228767 (or went latent?), but this testcase has never
> been added.
>
> Tested on x86_64-linux, ok for trunk?

Ok.

Richard.

> 2015-12-01  Marek Polacek  
>
> PR tree-optimization/67916
> * gcc.dg/torture/pr67916.c: New test.
>
> diff --git gcc/testsuite/gcc.dg/torture/pr67916.c 
> gcc/testsuite/gcc.dg/torture/pr67916.c
> index e69de29..88541f9 100644
> --- gcc/testsuite/gcc.dg/torture/pr67916.c
> +++ gcc/testsuite/gcc.dg/torture/pr67916.c
> @@ -0,0 +1,46 @@
> +/* PR tree-optimization/67916 */
> +/* { dg-do run } */
> +
> +int a[6], b = 1, d, e;
> +long long c;
> +static int f = 1;
> +
> +void
> +fn1 (int p1)
> +{
> +  b = (b >> 1) & (1 ^ a[(1 ^ p1) & 5]);
> +}
> +
> +void
> +fn2 ()
> +{
> +  b = (b >> 1) & (1 ^ a[(b ^ 1) & 1]);
> +  fn1 (c >> 1 & 5);
> +  fn1 (c >> 2 & 5);
> +  fn1 (c >> 4 & 5);
> +  fn1 (c >> 8 & 5);
> +}
> +
> +int
> +main ()
> +{
> +  int i, j;
> +  for (; d;)
> +{
> +  for (; e;)
> +   fn2 ();
> +  f = 0;
> +}
> +  for (i = 0; i < 8; i++)
> +{
> +  if (f)
> +   i = 9;
> +  for (j = 0; j < 7; j++)
> +   fn2 ();
> +}
> +
> +  if (b != 0)
> +__builtin_abort ();
> +
> +  return 0;
> +}
>
> Marek


[gomp-nvptx 7/9] nvptx mkoffload: pass -mgomp for OpenMP offloading

2015-12-01 Thread Alexander Monakov
This patch wires up use of the alternative -mgomp multilib for OpenMP
offloading via nvptx mkoffload.  It makes OpenACC and OpenMP incompatible for
simultaneous offloading compilation, so I've added a diagnostic for that.
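
As a rough illustration of the check described above, here is a minimal stand-alone sketch in C.  The helper name exactly_one_offload_model and its out-parameter are hypothetical and do not appear in mkoffload.c; only the two option strings are taken from the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of mkoffload.c: scan argv[1..argc-1]
   and report whether exactly one of -fopenacc / -fopenmp was given.
   *USE_OPENMP is set to true when -fopenmp was seen.  */
static bool
exactly_one_offload_model (int argc, char **argv, bool *use_openmp)
{
  bool fopenmp = false;
  bool fopenacc = false;

  for (int i = 1; i < argc; i++)
    {
      if (strcmp (argv[i], "-fopenmp") == 0)
        fopenmp = true;
      else if (strcmp (argv[i], "-fopenacc") == 0)
        fopenacc = true;
    }

  *use_openmp = fopenmp;
  /* For two bools, != acts as exclusive-or: true only when exactly
     one of the flags is set.  */
  return fopenacc != fopenmp;
}
```

A caller would issue the diagnostic when this returns false, and add -mgomp to the offload compiler's arguments only when *use_openmp is true.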

* config/nvptx/mkoffload.c (main): Check that either OpenACC or OpenMP
is selected.  Pass -mgomp to offload compiler in OpenMP case.
---
 gcc/config/nvptx/mkoffload.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.c b/gcc/config/nvptx/mkoffload.c
index 7aa6f09..9a5d36d 100644
--- a/gcc/config/nvptx/mkoffload.c
+++ b/gcc/config/nvptx/mkoffload.c
@@ -460,6 +460,7 @@ main (int argc, char **argv)
 
   /* Scan the argument vector.  */
   bool fopenmp = false;
+  bool fopenacc = false;
   for (int i = 1; i < argc; i++)
 {
 #define STR "-foffload-abi="
@@ -476,11 +477,15 @@ main (int argc, char **argv)
 #undef STR
   else if (strcmp (argv[i], "-fopenmp") == 0)
fopenmp = true;
+  else if (strcmp (argv[i], "-fopenacc") == 0)
+   fopenacc = true;
   else if (strcmp (argv[i], "-save-temps") == 0)
save_temps = true;
   else if (strcmp (argv[i], "-v") == 0)
verbose = true;
 }
+  if (!(fopenacc ^ fopenmp))
+fatal_error (input_location, "either -fopenacc or -fopenmp must be set");
 
   struct obstack argv_obstack;
   obstack_init (&argv_obstack);
@@ -501,6 +506,8 @@ main (int argc, char **argv)
 default:
   gcc_unreachable ();
 }
+  if (fopenmp)
+    obstack_ptr_grow (&argv_obstack, "-mgomp");
 
   for (int ix = 1; ix != argc; ix++)
 {


[gomp-nvptx 4/9] nvptx backend: add -mgomp option and multilib

2015-12-01 Thread Alexander Monakov
Since OpenMP offloading requires both soft-stacks and "uniform SIMT", both
non-traditional codegen variants, I'm building a multilib variant with those
enabled.  This patch adds option -mgomp which enables -msoft-stack plus
-muniform-simt, and builds a multilib with it.
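
To illustrate how one option can imply two others through the target flag word, here is a small sketch in C.  The SKETCH_MASK_* names and their bit values are made up for this example; the real MASK_* macros are generated from nvptx.opt and their values are not shown in the patch.

```c
/* Illustrative bit assignments only; these are assumptions for the
   sketch, not the values GCC generates from nvptx.opt.  */
enum
{
  SKETCH_MASK_SOFT_STACK = 1 << 0,
  SKETCH_MASK_UNIFORM_SIMT = 1 << 1,
  SKETCH_MASK_GOMP = 1 << 2
};

/* Hypothetical stand-in for the option-override step: when the GOMP
   bit is set, also set the soft-stack and uniform-SIMT bits.  Other
   bits are left untouched.  */
static int
sketch_apply_gomp (int target_flags)
{
  if (target_flags & SKETCH_MASK_GOMP)
    target_flags |= SKETCH_MASK_SOFT_STACK | SKETCH_MASK_UNIFORM_SIMT;
  return target_flags;
}
```

With this shape, code that tests the soft-stack or uniform-SIMT bit needs no separate knowledge of the umbrella option.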

* config/nvptx/nvptx.c (nvptx_option_override): Handle TARGET_GOMP.
* config/nvptx/nvptx.opt (mgomp): New option.
* config/nvptx/t-nvptx (MULTILIB_OPTIONS): New.
* doc/invoke.texi (mgomp): Document.
---
 gcc/config/nvptx/nvptx.c   | 3 +++
 gcc/config/nvptx/nvptx.opt | 4 
 gcc/config/nvptx/t-nvptx   | 2 ++
 gcc/doc/invoke.texi| 5 +
 4 files changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 3bd3cf7..48ee96e 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -153,6 +153,9 @@ nvptx_option_override (void)
 
   worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, worker_red_name);
   worker_red_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+
+  if (TARGET_GOMP)
+target_flags |= MASK_SOFT_STACK | MASK_UNIFORM_SIMT;
 }
 
 /* Return the mode to be used when declaring a ptx object for OBJ.
diff --git a/gcc/config/nvptx/nvptx.opt b/gcc/config/nvptx/nvptx.opt
index 47e811e..8826659 100644
--- a/gcc/config/nvptx/nvptx.opt
+++ b/gcc/config/nvptx/nvptx.opt
@@ -36,3 +36,7 @@ Use custom stacks instead of local memory for automatic storage.
 muniform-simt
 Target Report Mask(UNIFORM_SIMT)
 Generate code that executes all threads in a warp as if one was active.
+
+mgomp
+Target Report Mask(GOMP)
+Generate code for OpenMP offloading: enables -msoft-stack and -muniform-simt.
diff --git a/gcc/config/nvptx/t-nvptx b/gcc/config/nvptx/t-nvptx
index e2580c9..6c1010d 100644
--- a/gcc/config/nvptx/t-nvptx
+++ b/gcc/config/nvptx/t-nvptx
@@ -8,3 +8,5 @@ ALL_HOST_OBJS += mkoffload.o
 mkoffload$(exeext): mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBDEPS)
+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
  mkoffload.o collect-utils.o libcommon-target.a $(LIBIBERTY) $(LIBS)
+
+MULTILIB_OPTIONS = mgomp
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 46cd2e9..7e7f3b4 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -18956,6 +18956,11 @@ all-ones bitmasks for each warp, indicating current mode (0 outside of SIMD
 regions).  Each thread can bitwise-and the bitmask at position @code{tid.y}
 with current lane index to compute the master lane index.
 
+@item -mgomp
+@opindex mgomp
+Generate code for use in OpenMP offloading: enables @option{-msoft-stack} and
+@option{-muniform-simt} options, and selects corresponding multilib variant.
+
 @end table
 
 @node PDP-11 Options


Re: RFD: annotate iterator patterns with expanded forms

2015-12-01 Thread Jakub Jelinek
On Tue, Dec 01, 2015 at 04:14:21PM +0100, Bernd Schmidt wrote:
> One problem I have whenever I try to edit i386.md is that I can't find the
> patterns I'm looking for. Let's say I'm looking for lshrsi3, but there's no
> pattern by this name, what I'm looking for is "<shift_insn><mode>3". Even
> worse are things like "*xordi_2", which has just "*<code><mode>_2" and can't
> reasonably be searched for.

For this purpose there is the
  make mddump
goal, which generates tmp-mddump.md in the object directory with expanded
iterators, where you can search for whatever you want.
With comments in the *.md file, I'd worry about them getting out of date,
or about people feeling they have to edit them manually (rather than
regenerating them).

Jakub


RFD: annotate iterator patterns with expanded forms

2015-12-01 Thread Bernd Schmidt
One problem I have whenever I try to edit i386.md is that I can't find 
the patterns I'm looking for. Let's say I'm looking for lshrsi3, but 
there's no pattern by this name, what I'm looking for is 
"<shift_insn><mode>3". Even worse are things like "*xordi_2", which has 
just "*<code><mode>_2" and can't reasonably be searched for.


I've made a little proof-of-concept patch which makes gensupport 
generate ed scripts that can be applied to machine descriptions after 
some post processing. I'm attaching that patch, and the effect of the 
annotations on i386.md.
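
For readers who want the gist of the annotation layout without reading the gensupport hunk, here is a stand-alone sketch in C of the same wrapping rule: names are joined with ", " and a new ";; " line is started before a name would reach column 78.  The function names append and format_expansions are hypothetical, and the sketch writes to a caller-supplied buffer rather than emitting ed commands on stderr as the patch does.

```c
#include <string.h>

/* Append S to the NUL-terminated string in OUT without writing past
   OUT_SIZE bytes; excess text is silently dropped.  */
static void
append (char *out, size_t out_size, const char *s)
{
  size_t used = strlen (out);
  if (used + 1 < out_size)
    strncat (out, s, out_size - used - 1);
}

/* Hypothetical helper: format the N pattern names in NAMES as an
   "Expands to:" comment block, wrapping the way the patch does.  */
static void
format_expansions (char *out, size_t out_size,
                   const char *const *names, size_t n)
{
  const char *sep = "";
  size_t len = 3;       /* Width of the ";; " prefix.  */
  size_t sep_len = 0;

  if (out_size == 0)
    return;
  out[0] = '\0';
  append (out, out_size, ";; Expands to:\n;; ");
  for (size_t i = 0; i < n; i++)
    {
      size_t this_len = strlen (names[i]);
      if (len + this_len + sep_len >= 78)
        {
          /* Start a fresh comment line; no separator before the
             first name on it.  */
          append (out, out_size, "\n;; ");
          len = 3;
          sep = "";
          sep_len = 0;
        }
      append (out, out_size, sep);
      append (out, out_size, names[i]);
      len += this_len + sep_len;
      sep = ", ";
      sep_len = 2;
    }
  append (out, out_size, "\n");
}
```

Note that, as in the patch, the separator is dropped at a line break, so a wrapped line does not end in a trailing comma.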


What should I do with this? Would people like to see a fully automatic method of 
updating machine descriptions? Should we just generate them once for the 
most difficult files such as i386.md and apply them? Or do people find 
the additional comments to be visual clutter (the i386.md ones are 
brief, but the avx patterns in sse.md would end up with pretty long lists)?



Bernd
diff --git a/gcc/gensupport.c b/gcc/gensupport.c
index 484ead2..4daaef9 100644
--- a/gcc/gensupport.c
+++ b/gcc/gensupport.c
@@ -2236,6 +2236,32 @@ rtx_handle_directive (file_location loc, const char *rtx_name)
 
   rtx x;
   unsigned int i;
+  if (subrtxs.length () > 1
+  && (GET_CODE (subrtxs[0]) == DEFINE_INSN
+	  || GET_CODE (subrtxs[0]) == DEFINE_EXPAND))
+{
+  const char *p = "";
+  fprintf (stderr, "%s:%d\\ni\\n;; Expands to:\\n;; ", loc.filename, loc.lineno);
+  int len = 3;
+  int p_len = 0;
+  FOR_EACH_VEC_ELT (subrtxs, i, x)
+	{
+	  int this_len = strlen (XSTR (x, 0));
+	  if (len + this_len + p_len >= 78)
+	{
+	  fprintf (stderr, "\\n;; ");
+	  len = 3;
+	  p = "";
+	  p_len = 0;
+	}
+	  fprintf (stderr, "%s%s", p, XSTR (x, 0));
+	  len += this_len + p_len;
+	  p = ", ";
+	  p_len = 2;
+	}
+  fprintf (stderr, "\\n.\n");
+}
+
   FOR_EACH_VEC_ELT (subrtxs, i, x)
 process_rtx (x, loc);
 }
--- ../../git/gcc/config/i386/i386.md	2015-11-30 14:34:27.995459571 +0100
+++ ./i386.md	2015-12-01 15:58:59.817779596 +0100
@@ -1199,6 +1199,8 @@
 
 ;; Compare and branch/compare and store instructions.
 
+;; Expands to:
+;; cbranchqi4, cbranchhi4, cbranchsi4, cbranchdi4, cbranchti4
 (define_expand "cbranch<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
@@ -1217,6 +1219,8 @@
   DONE;
 })
 
+;; Expands to:
+;; cstoreqi4, cstorehi4, cstoresi4, cstoredi4
 (define_expand "cstore<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:SWIM 2 "nonimmediate_operand")
@@ -1233,11 +1237,15 @@
   DONE;
 })
 
+;; Expands to:
+;; cmpsi_1, cmpdi_1
 (define_expand "cmp<mode>_1"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:SWI48 0 "nonimmediate_operand")
 		(match_operand:SWI48 1 "")))])
 
+;; Expands to:
+;; *cmpqi_ccno_1, *cmphi_ccno_1, *cmpsi_ccno_1, *cmpdi_ccno_1
 (define_insn "*cmp<mode>_ccno_1"
   [(set (reg FLAGS_REG)
 	(compare (match_operand:SWI 0 "nonimmediate_operand" ",?m")
@@ -1251,6 +1259,8 @@
(set_attr "modrm_class" "op0,unknown")
(set_attr "mode" "")])
 
+;; Expands to:
+;; *cmpqi_1, *cmphi_1, *cmpsi_1, *cmpdi_1
 (define_insn "*cmp<mode>_1"
   [(set (reg FLAGS_REG)
 	(compare (match_operand:SWI 0 "nonimmediate_operand" "m,")
@@ -1260,6 +1270,8 @@
   [(set_attr "type" "icmp")
(set_attr "mode" "")])
 
+;; Expands to:
+;; *cmpqi_minus_1, *cmphi_minus_1, *cmpsi_minus_1, *cmpdi_minus_1
 (define_insn "*cmp<mode>_minus_1"
   [(set (reg FLAGS_REG)
 	(compare
@@ -1382,6 +1394,8 @@
   DONE;
 })
 
+;; Expands to:
+;; cbranchsf4, cbranchdf4
 (define_expand "cbranch<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:MODEF 1 "cmp_fp_expander_operand")
@@ -1399,6 +1413,8 @@
   DONE;
 })
 
+;; Expands to:
+;; cstoresf4, cstoredf4
 (define_expand "cstore<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:MODEF 2 "cmp_fp_expander_operand")
@@ -1450,6 +1466,8 @@
 ;; We may not use "#" to split and emit these, since the REG_DEAD notes
 ;; used to manage the reg stack popping would not be preserved.
 
+;; Expands to:
+;; *cmpsf_0_i387, *cmpdf_0_i387, *cmpxf_0_i387
 (define_insn "*cmp<mode>_0_i387"
   [(set (match_operand:HI 0 "register_operand" "=a")
 	(unspec:HI
@@ -1516,6 +1534,8 @@
(set_attr "unit" "i387")
(set_attr "mode" "XF")])
 
+;; Expands to:
+;; *cmpsf_i387, *cmpdf_i387
 (define_insn "*cmp<mode>_i387"
   [(set (match_operand:HI 0 "register_operand" "=a")
 	(unspec:HI
@@ -1549,6 +1569,8 @@
(set_attr "unit" "i387")
(set_attr "mode" "")])
 
+;; Expands to:
+;; *cmpusf_i387, *cmpudf_i387, *cmpuxf_i387
 (define_insn "*cmpu<mode>_i387"
   [(set (match_operand:HI 0 "register_operand" "=a")
 	(unspec:HI
@@ -1582,6 +1604,9 @@
(set_attr "unit" "i387")
(set_attr "mode" "")])
 
+;; Expands to:
+;; *cmpsf_hi_i387, *cmpdf_hi_i387, *cmpxf_hi_i387, *cmpsf_si_i387
+;; *cmpdf_si_i387, *cmpxf_si_i387
 (define_insn "*cmp__i387"
   [(set (match_operand:HI 0 "register_operand" "=a")
 	(unspec:HI
@@ -1666,6 +1691,8 @@
 (define_mode_iterator FPCMP [CCFP CCFPU])
 (define_mode_attr unord [(CCFP "") (CCFPU "u")])
 
+;; 
