[committed] Fix asan typo (PR sanitizer/70712)

2016-04-22 Thread Jakub Jelinek
Hi!

As the testcase below shows, -fsanitize=address has issues with
correspondence of actual layout of vars in the stack (or heap) frame and
what regions are considered to be pads (and which kind), if there are
variables aligned to more than ASAN_RED_ZONE_SIZE (32 bytes).

The bug is a typo: the intent of the align_base call in expand_stack_vars
is to match what alloc_stack_frame_space is doing (assuming frame_phase of
0), but that function has:
  if (FRAME_GROWS_DOWNWARD)
    {
      ... align_base (..., false) ...
    }
  else
    {
      ... align_base (..., true) ...
    }
so we need to pass !FRAME_GROWS_DOWNWARD instead of FRAME_GROWS_DOWNWARD.
Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk as
obvious, will backport to branches after 6.1 goes out.
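
To make the direction issue concrete, here is a minimal standalone sketch
of the rounding involved.  It assumes align_base rounds down when its last
argument is false and up when it is true; int64_t/uint64_t stand in for
HOST_WIDE_INT, and the function is a model for illustration, not a copy of
the cfgexpand.c source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round BASE to a multiple of ALIGN, which must be a power of two:
   upwards when ALIGN_UP is true, downwards otherwise.  The fixed-width
   types are an assumption standing in for HOST_WIDE_INT.  */
int64_t
align_base (int64_t base, uint64_t align, bool align_up)
{
  if (align_up)
    base += (int64_t) align - 1;
  /* -align is the mask with the low log2(align) bits cleared.  */
  return (int64_t) ((uint64_t) base & -align);
}
```

With a downward-growing frame the offsets are negative, so for an offset of
-40 and a 64-byte alignment, rounding down gives -64 while rounding up gives
0.  Passing the wrong flag therefore yields a different base than the one
alloc_stack_frame_space computes, which is the mismatch described above.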

2016-04-23  Jakub Jelinek  

PR sanitizer/70712
* cfgexpand.c (expand_stack_vars): Fix typo.

* c-c++-common/asan/pr70712.c: New test.

--- gcc/cfgexpand.c.jj  2016-04-22 18:21:52.0 +0200
+++ gcc/cfgexpand.c 2016-04-22 18:35:02.129325661 +0200
@@ -1137,7 +1137,7 @@ expand_stack_vars (bool (*pred) (size_t)
  HOST_WIDE_INT prev_offset
= align_base (frame_offset,
  MAX (alignb, ASAN_RED_ZONE_SIZE),
- FRAME_GROWS_DOWNWARD);
+ !FRAME_GROWS_DOWNWARD);
  tree repr_decl = NULL_TREE;
  offset
= alloc_stack_frame_space (stack_vars[i].size
--- gcc/testsuite/c-c++-common/asan/pr70712.c.jj 2016-04-22 18:30:45.246786590 +0200
+++ gcc/testsuite/c-c++-common/asan/pr70712.c   2016-04-22 18:30:30.0 +0200
@@ -0,0 +1,32 @@
+/* PR sanitizer/70712 */
+/* { dg-do run } */
+
+struct __attribute__((aligned (64))) S
+{
+  char s[4];
+};
+
+struct T
+{
+  char t[8];
+  char u[480];
+
+};
+
+__attribute__((noinline, noclone)) void
+foo (struct T *p, struct S *q)
+{
+  __builtin_memset (p->t, '\0', sizeof (p->t));
+  __builtin_memset (p->u, '\0', sizeof (p->u));
+  __builtin_memset (q->s, '\0', sizeof (q->s));
+}
+
+int
+main ()
+{
+  struct S s;
+  struct T t;
+  foo (&t, &s);
+  asm volatile ("" : : "r" (&s), "r" (&t) : "memory");
+  return 0;
+}

Jakub


Go patch committed: expose the runtime code in a Call_expression

2016-04-22 Thread Ian Lance Taylor
This patch by Chris Manghane exposes the runtime function code in a
Call_expression, in the cases where the function call is to a runtime
function.  This isn't useful by itself but is a prerequisite for
future work.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.
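
As a rough illustration of the lookup the patch adds, the sketch below
maps a name to a code and uses the count of functions as the "not a
runtime function" sentinel.  It is written in C rather than the frontend's
C++, and the enum values and table entries are hypothetical placeholders,
not the real contents of the gofrontend runtime table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical runtime codes.  NUMBER_OF_FUNCTIONS doubles as the
   sentinel meaning "no runtime code", as in the patch.  */
enum runtime_function
{
  RT_NEW,
  RT_CLOSE,
  RT_MAPDELETE,
  NUMBER_OF_FUNCTIONS
};

/* Hypothetical internal names, indexed by code.  */
static const char *const runtime_names[NUMBER_OF_FUNCTIONS] =
{
  "__go_new",
  "__go_builtin_close",
  "__go_map_delete"
};

/* Return the code for NAME, or NUMBER_OF_FUNCTIONS if there is none.  */
enum runtime_function
name_to_code (const char *name)
{
  /* Source-level aliases are checked first.  */
  if (strcmp (name, "new") == 0)
    return RT_NEW;
  if (strcmp (name, "close") == 0)
    return RT_CLOSE;
  if (strcmp (name, "delete") == 0)
    return RT_MAPDELETE;

  /* Otherwise look through the known internal names.  */
  for (size_t i = 0; i < (size_t) NUMBER_OF_FUNCTIONS; i++)
    if (strcmp (runtime_names[i], name) == 0)
      return (enum runtime_function) i;

  return NUMBER_OF_FUNCTIONS;
}
```

A caller can then test for a runtime function the same way
is_runtime_function does in the patch, by comparing the result against
NUMBER_OF_FUNCTIONS.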

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 235380)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-97b358f525584e45fa2e3d83fc7d3a091900927a
+944c3ca6ac7c204585fd73936894fe05de535b94
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 234304)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -1141,7 +1141,13 @@ Expression*
 Expression::make_func_reference(Named_object* function, Expression* closure,
Location location)
 {
-  return new Func_expression(function, closure, location);
+  Func_expression* fe = new Func_expression(function, closure, location);
+
+  // Detect references to builtin functions and set the runtime code if
+  // appropriate.
+  if (function->is_function_declaration())
+fe->set_runtime_code(Runtime::name_to_code(function->name()));
+  return fe;
 }
 
 // Class Func_descriptor_expression.
Index: gcc/go/gofrontend/expressions.h
===
--- gcc/go/gofrontend/expressions.h (revision 234304)
+++ gcc/go/gofrontend/expressions.h (working copy)
@@ -11,6 +11,7 @@
 #include 
 
 #include "operator.h"
+#include "runtime.h"
 
 class Gogo;
 class Translate_context;
@@ -2149,7 +2150,8 @@ class Func_expression : public Expressio
   Func_expression(Named_object* function, Expression* closure,
  Location location)
 : Expression(EXPRESSION_FUNC_REFERENCE, location),
-  function_(function), closure_(closure)
+  function_(function), closure_(closure),
+  runtime_code_(Runtime::NUMBER_OF_FUNCTIONS)
   { }
 
   // Return the object associated with the function.
@@ -2163,6 +2165,23 @@ class Func_expression : public Expressio
   closure()
   { return this->closure_; }
 
+  // Return whether this is a reference to a runtime function.
+  bool
+  is_runtime_function() const
+  { return this->runtime_code_ != Runtime::NUMBER_OF_FUNCTIONS; }
+
+  // Return the runtime code for this function expression.
+  // Returns Runtime::NUMBER_OF_FUNCTIONS if this is not a reference to a
+  // runtime function.
+  Runtime::Function
+  runtime_code() const
+  { return this->runtime_code_; }
+
+  // Set the runtime code for this function expression.
+  void
+  set_runtime_code(Runtime::Function code)
+  { this->runtime_code_ = code; }
+
   // Return a backend expression for the code of a function.
   static Bexpression*
   get_code_pointer(Gogo*, Named_object* function, Location loc);
@@ -2204,6 +2223,8 @@ class Func_expression : public Expressio
   // be a struct holding pointers to all the variables referenced by
   // this function and defined in enclosing functions.
   Expression* closure_;
+  // The runtime code for the referenced function.
+  Runtime::Function runtime_code_;
 };
 
 // A function descriptor.  A function descriptor is a struct with a
Index: gcc/go/gofrontend/runtime.cc
===
--- gcc/go/gofrontend/runtime.cc(revision 234304)
+++ gcc/go/gofrontend/runtime.cc(working copy)
@@ -402,3 +402,39 @@ Runtime::map_iteration_type()
Linemap::predeclared_location());
   return Type::make_array_type(runtime_function_type(RFT_POINTER), iexpr);
 }
+
+
+// Get the runtime code for a named builtin function.  This is used as a helper
+// when creating function references for call expressions.  Every reference to
+// a builtin runtime function should have the associated runtime code.  If the
+// name is ambiguous and can refer to many runtime codes, return
+// NUMBER_OF_FUNCTIONS.
+
+Runtime::Function
+Runtime::name_to_code(const std::string& name)
+{
+  Function code = Runtime::NUMBER_OF_FUNCTIONS;
+
+  // Aliases seen in function declaration code.
+  // TODO(cmang): Add other aliases.
+  if (name == "new")
+code = Runtime::NEW;
+  else if (name == "close")
+code = Runtime::CLOSE;
+  else if (name == "copy")
+code = Runtime::COPY;
+  else if (name == "append")
+code = Runtime::APPEND;
+  else if (name == "delete")
+code = Runtime::MAPDELETE;
+  else
+{
+  // Look through the known names for a match.
+  for (size_t i = 0; i < Runtime::NUMBER_OF_FUNCTIONS; i++)
+   {
+ if (strcmp(runtime_functions[i].name, name.c_str()) == 0)
+   code = static_cast<Runtime::Function>(i);
+   }
+}
+  return code;
+}
Index: 

Re: [Patch, regex, libstdc++/70745] Fix match_not_bow and match_not_eow

2016-04-22 Thread Tim Shen
On Fri, Apr 22, 2016 at 3:23 AM, Jonathan Wakely wrote:
> OK for trunk.
>
> It is a small, safe change, so OK for the branches (6/5/4.9) too, but
> let's wait a short while to make sure nobody finds any problems on
> trunk (and gcc-6-branch is frozen for release right now anyway).

Committed to trunk as r235382.

Thanks!


-- 
Regards,
Tim Shen


libgo patch committed: Commit final version of pkg-config support

2016-04-22 Thread Ian Lance Taylor
This patch updates gccgo to the final version of the gccgo pkg-config
support committed to the gc repository.  The patch to gc is
https://golang.org/cl/18790, by Michael Hudson-Doyle.  This fixes
https://golang.org/issue/11739.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

I will commit this to the GCC 6 branch when that is open for bug fixes.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 234958)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-ff29ea8e4e69eb94958aef4388da09a61b2b52b6
+97b358f525584e45fa2e3d83fc7d3a091900927a
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/build.go
===
--- libgo/go/cmd/go/build.go(revision 234304)
+++ libgo/go/cmd/go/build.go(working copy)
@@ -2647,9 +2647,18 @@ func (tools gccgoToolchain) ld(b *builde
if err != nil {
return err
}
+   const ldflagsPrefix = "_CGO_LDFLAGS="
for _, line := range strings.Split(string(flags), "\n") {
-   if strings.HasPrefix(line, "_CGO_LDFLAGS=") {
-   cgoldflags = append(cgoldflags, strings.Fields(line[13:])...)
+   if strings.HasPrefix(line, ldflagsPrefix) {
+   newFlags := strings.Fields(line[len(ldflagsPrefix):])
+   for _, flag := range newFlags {
+   // Every _cgo_flags file has -g and -O2 in _CGO_LDFLAGS
+   // but they don't mean anything to the linker so filter
+   // them out.
+   if flag != "-g" && !strings.HasPrefix(flag, "-O") {
+   cgoldflags = append(cgoldflags, flag)
+   }
+   }
}
}
return nil
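
The filter condition in the build.go hunk above can be restated as a small
predicate.  The sketch is in C only for illustration (the real code is the
Go shown in the diff), and keep_ldflag is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if FLAG should be passed on to the linker.  Mirrors the
   condition in the patch: drop exactly "-g" and anything that starts
   with "-O".  */
bool
keep_ldflag (const char *flag)
{
  return strcmp (flag, "-g") != 0 && strncmp (flag, "-O", 2) != 0;
}
```

Note that only the exact string "-g" is dropped, so a flag such as "-ggdb"
would still be kept under this condition.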


Re: [PATCH] Fix missed DSE opportunity with operator delete.

2016-04-22 Thread Mikhail Maltsev
On 04/20/2016 05:12 PM, Richard Biener wrote:
> You have
> 
> +static tree
> +handle_free_attribute (tree *node, tree name, tree /*args*/, int /*flags*/,
> +  bool *no_add_attrs)
> +{
> +  tree decl = *node;
> +  if (TREE_CODE (decl) == FUNCTION_DECL
> +  && type_num_arguments (TREE_TYPE (decl)) != 0
> +  && POINTER_TYPE_P (TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (decl)))))
> +DECL_ALLOC_FN_KIND (decl) = ALLOC_FN_FREE;
> +  else
> +{
> +  warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wattributes,
> + "%qE attribute ignored", name);
> +  *no_add_attrs = true;
> +}
> 
> so one can happily apply the attribute to
> 
>  void foo (void *, void *);
> 
> but then
> 
> @@ -2117,6 +2127,13 @@ call_may_clobber_ref_p_1 (gcall *call, ao_ref *ref)
>   /* Fallthru to general call handling.  */;
>}
> 
> +  if (callee != NULL_TREE
> +  && (flags_from_decl_or_type (callee) & ECF_FREE) != 0)
> +{
> +  tree ptr = gimple_call_arg (call, 0);
> +  return ptr_deref_may_alias_ref_p_1 (ptr, ref);
> +}
> 
> will ignore the 2nd argument.  I think it's better to ignore the attribute
> if type_num_arguments () != 1.

Actually, the C++ standard ([basic.stc.dynamic]/2) defines the following 4
deallocation functions implicitly:

void operator delete(void*);
void operator delete[](void*);
void operator delete(void*, std::size_t) noexcept;
void operator delete[](void*, std::size_t) noexcept;

And the standard library also has:

void operator delete(void*, const std::nothrow_t&);
void operator delete[](void*, const std::nothrow_t&);
void operator delete(void*, std::size_t, const std::nothrow_t&);
void operator delete[](void*, std::size_t, const std::nothrow_t&);

IIUC, 'delete(void*, std::size_t)' is used by default in C++14
(https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01266.html). How should we handle
this?

-- 
Regards,
Mikhail Maltsev


Re: An abridged "Writing C" for the gcc web pages

2016-04-22 Thread Sandra Loosemore

On 04/22/2016 10:42 AM, paul_kon...@dell.com wrote:
>
>> On Apr 22, 2016, at 12:21 PM, Bernd Schmidt wrote:
>>
>> (Apologies if you get this twice, the mailing list didn't like the
>> html attachment in the first attempt).
>>
>> We frequently get malformatted patches, and it's been brought to my
>> attention that some people don't even make the effort to read the
>> GNU coding standards before trying to contribute code. TL;DR seems
>> to be the excuse, and while I find that attitude inappropriate, we
>> could probably improve the situation by spelling out the most basic
>> rules in an abridged document on our webpages. Below is a draft I
>> came up with. Thoughts?
>
> Would you expect people to conform to the abridged version or the
> full standard?  If the full standard, then publishing an abridged
> version is not a good idea, it will just cause confusion.  Let the
> full standard be the rule, make people read it, and if they didn't
> bother that's their problem.


I agree; let's not have two documents that can conflict or get out of 
sync with each other, unless you can figure out how to extract the 
abridged document automatically from the full version.


I think it's fine to have something on the web pages explaining that all 
contributions must follow the GNU coding standards (with a link) since 
code that follows the same formatting conventions throughout is easier 
to read, and that (in particular) patches must match the style of the 
surrounding code.


-Sandra


Re: [PATCH] Verify __builtin_unreachable and __builtin_trap are not called with arguments

2016-04-22 Thread Martin Jambor
Hi,

On Fri, Apr 22, 2016 at 09:24:31PM +0200, Richard Biener wrote:
> On April 22, 2016 7:04:31 PM GMT+02:00, Martin Jambor  wrote:
> >Hi,
> >
> >this patch adds verification that __builtin_unreachable and
> >__builtin_trap are not called with arguments.  The problem with calls
> >to them with arguments is that functions like gimple_call_builtin_p
> >return false on them, because they return true only when
> >gimple_builtin_call_types_compatible_p does.  One manifestation of
> >that was PR 61591 where undefined behavior sanitizer did not replace
> >such calls with its thing as it should, but there might be others.
> >
> >I have included __builtin_trap in the verification because they often
> >seem to be handled together but can either remove it or add more
> >builtins if people think it better.  I concede it is a bit arbitrary.
> >
> >Honza said he has seen __builtin_unreachable calls with parameters in
> >LTO builds of Firefox, so it seems this might actually trigger, but I
> >also think we do not want such calls in the IL.
> >
> >I have bootstrapped and tested this on x86_64-linux (with all
> >languages and Ada) and have also run a C, C++ and Fortran LTO
> >bootstrap with the patch on the same architecture.  OK for trunk?
> 
> Shouldn't we simply error in the FEs for this given the builtins
> essentially have a prototype?  That is, error for non-matching args
> for the __builtin_ variant of _all_ builtins (treat them as
> prototyped)?
> 

We do that.  It is just that at times we produce a call to
__builtin_unreachable internally.  The only instance I know of is IPA
figuring out a call cannot happen in a legal program (for example
because it would lead to a call of abstract virtual functions) but
perhaps there are other places where we do it.

I thought we have fixed the issue of IPA leaving behind arguments in
the calls to __builtin_unreachable it produced and this verification
would simply make sure the bug does not come back but Honza's
observation suggests that it still sometimes happens.
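
For readers who do not have the verifier in their head, the check can be
modelled roughly as below.  The toy_ types are hypothetical stand-ins for
gcall and the built-in function codes, not GCC's real data structures; only
the convention of returning true on error is taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the built-in codes and a call statement.  */
enum toy_builtin
{
  TOY_BUILT_IN_UNREACHABLE,
  TOY_BUILT_IN_TRAP,
  TOY_BUILT_IN_MEMSET
};

struct toy_call
{
  enum toy_builtin code;
  unsigned num_args;
};

/* Return true if CALL is malformed, i.e. it is a call to the unreachable
   or trap built-in that carries any actual arguments.  Other built-ins
   are not checked here.  */
bool
toy_verify_call (const struct toy_call *call)
{
  switch (call->code)
    {
    case TOY_BUILT_IN_UNREACHABLE:
    case TOY_BUILT_IN_TRAP:
      return call->num_args > 0;
    default:
      return false;
    }
}
```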

Martin

> Richard.
> 
> >Thanks,
> >
> >Martin
> >
> >
> >2016-04-20  Martin Jambor  
> >
> > * tree-cfg.c (verify_gimple_call): Check that calls to
> > __builtin_unreachable or __builtin_trap do not have actual arguments.
> >---
> > gcc/tree-cfg.c | 20 
> > 1 file changed, 20 insertions(+)
> >
> >diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> >index 04e46fd..3385164 100644
> >--- a/gcc/tree-cfg.c
> >+++ b/gcc/tree-cfg.c
> >@@ -3414,6 +3414,26 @@ verify_gimple_call (gcall *stmt)
> >   return true;
> > }
> > 
> >+  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> >+{
> >+  switch (DECL_FUNCTION_CODE (fndecl))
> >+{
> >+case BUILT_IN_UNREACHABLE:
> >+case BUILT_IN_TRAP:
> >+  if (gimple_call_num_args (stmt) > 0)
> >+{
> >+  /* Built-in unreachable with parameters might not be caught by
> >+ undefined behavior sanitizer. */
> >+  error ("__builtin_unreachable or __builtin_trap call with "
> >+ "arguments");
> >+  return true;
> >+}
> >+  break;
> >+default:
> >+  break;
> >+}
> >+}
> >+
> >   /* ???  The C frontend passes unpromoted arguments in case it
> >  didn't see a function declaration before the call.  So for now
> >  leave the call arguments mostly unverified.  Once we gimplify
> 
> 


Re: C++ PATCH for c++/70744 (wrong-code with x ?: y extension)

2016-04-22 Thread Jason Merrill
On Fri, Apr 22, 2016 at 2:12 PM, Marek Polacek  wrote:
> On Fri, Apr 22, 2016 at 09:43:31AM -0400, Jason Merrill wrote:
>> On 04/22/2016 07:50 AM, Marek Polacek wrote:
>> >This PR shows that we generate wrong code with the x ?: y extension in case
>> >the first operand contains either predecrement or preincrement.  The problem
>> >is that we don't emit SAVE_EXPR, thus the operand is evaluated twice, which
>> >it should not be.
>> >
>> >While ++i or --i can be lvalues in C++, i++ or i-- can not.  The code in
>> >build_conditional_expr_1 has:
>> >  4635   /* Make sure that lvalues remain lvalues.  See g++.oliva/ext1.C.  */
>> >  4636   if (real_lvalue_p (arg1))
>> >  4637 arg2 = arg1 = stabilize_reference (arg1);
>> >  4638   else
>> >  4639 arg2 = arg1 = save_expr (arg1);
>> >so for ++i/--i we call stabilize_reference, but that doesn't know anything
>> >about PREINCREMENT_EXPR or PREDECREMENT_EXPR and just returns the same
>> >expression, so SAVE_EXPR isn't created.
>> >
>> >I think one fix would be to teach stabilize_reference what to do with those,
>> >similarly to how we handle COMPOUND_EXPR there.
>>
>> Your change will turn the expression into an rvalue, so that isn't enough.
>
> Oops.
>
>> We need to turn the expression into some sort of _REF before passing it to
>> stabilize_reference, perhaps by factoring out the cast-to-reference code
>> from force_paren_expr.  This should probably be part of a
>> cp_stabilize_reference function that replaces all uses of
>> stabilize_reference in the front end.
>
> Thanks, this magic seems to work.  So something like the following?  I didn't
> put the cast-to-reference code into its separate function because it didn't
> seem convenient, but I can work on that, too, if you prefer.

> +cp_stabilize_reference (tree ref)
> +{
> +  if (TREE_CODE (ref) == PREINCREMENT_EXPR
> +  || TREE_CODE (ref) == PREDECREMENT_EXPR)

I think we want to do this for anything stabilize_reference doesn't
handle specifically, not just pre..crement.

Jason


Re: [PATCH] Verify __builtin_unreachable and __builtin_trap are not called with arguments

2016-04-22 Thread Richard Biener
On April 22, 2016 7:04:31 PM GMT+02:00, Martin Jambor  wrote:
>Hi,
>
>this patch adds verification that __builtin_unreachable and
>__builtin_trap are not called with arguments.  The problem with calls
>to them with arguments is that functions like gimple_call_builtin_p
>return false on them, because they return true only when
>gimple_builtin_call_types_compatible_p does.  One manifestation of
>that was PR 61591 where undefined behavior sanitizer did not replace
>such calls with its thing as it should, but there might be others.
>
>I have included __builtin_trap in the verification because they often
>seem to be handled together but can either remove it or add more
>builtins if people think it better.  I concede it is a bit arbitrary.
>
>Honza said he has seen __builtin_unreachable calls with parameters in
>LTO builds of Firefox, so it seems this might actually trigger, but I
>also think we do not want such calls in the IL.
>
>I have bootstrapped and tested this on x86_64-linux (with all
>languages and Ada) and have also run a C, C++ and Fortran LTO
>bootstrap with the patch on the same architecture.  OK for trunk?

Shouldn't we simply error in the FEs for this given the builtins essentially
have a prototype?  That is, error for non-matching args for the __builtin_
variant of _all_ builtins (treat them as prototyped)?

Richard.

>Thanks,
>
>Martin
>
>
>2016-04-20  Martin Jambor  
>
>   * tree-cfg.c (verify_gimple_call): Check that calls to
>   __builtin_unreachable or __builtin_trap do not have actual arguments.
>---
> gcc/tree-cfg.c | 20 
> 1 file changed, 20 insertions(+)
>
>diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>index 04e46fd..3385164 100644
>--- a/gcc/tree-cfg.c
>+++ b/gcc/tree-cfg.c
>@@ -3414,6 +3414,26 @@ verify_gimple_call (gcall *stmt)
>   return true;
> }
> 
>+  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
>+{
>+  switch (DECL_FUNCTION_CODE (fndecl))
>+  {
>+  case BUILT_IN_UNREACHABLE:
>+  case BUILT_IN_TRAP:
>+if (gimple_call_num_args (stmt) > 0)
>+  {
>+/* Built-in unreachable with parameters might not be caught by
>+   undefined behavior sanitizer. */
>+error ("__builtin_unreachable or __builtin_trap call with "
>+   "arguments");
>+return true;
>+  }
>+break;
>+  default:
>+break;
>+  }
>+}
>+
>   /* ???  The C frontend passes unpromoted arguments in case it
>  didn't see a function declaration before the call.  So for now
>  leave the call arguments mostly unverified.  Once we gimplify




Re: [PATCH PR70715]Expand simple operations in IV.base and check if it's the control_IV

2016-04-22 Thread Christophe Lyon
On 22 April 2016 at 17:38, Bin.Cheng  wrote:
> On Fri, Apr 22, 2016 at 4:26 PM, Christophe Lyon
>  wrote:
>> On 21 April 2016 at 11:03, Richard Biener  wrote:
>>> On Wed, Apr 20, 2016 at 5:08 PM, Bin Cheng  wrote:
>>>> Hi,
>>>> As reported in PR70715, GCC failed to prove no-overflows of IV(&p[n]) for
>>>> simple example like:
>>>> int
>>>> foo (char *p, unsigned n)
>>>> {
>>>>   while(n--)
>>>>     {
>>>>       p[n]='A';
>>>>     }
>>>>   return 0;
>>>> }
>>>> Actually, code has already been added to handle loops of this form when
>>>> fixing PR68529.  Problem with this case is loop niter analyzer records
>>>> control_IV with its base expanded by calling expand_simple_operations.
>>>> This patch simply adds code expanding BASE before we check its equality
>>>> against control_IV.base.  In the long run, we might want to remove the use
>>>> of expand_simple_operations.
>>>>
>>>> Bootstrap and test on x86_64.  Is it OK?
>>>
>>
>> Hi Bin,
>>
>> On ARM and AArch64 bare-metal toolchains, this causes
>>
>> FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized "__builtin_fmin"
> Hi Christophe,
> As Kyrill pointed out, it doesn't look likely.  The case doesn't even
> have a loop for the patch to apply?
>

Ha, you are right, sorry for the noise.

I've had a lot of infrastructure problems these days with many build
failures, introducing noise in the results :(

Christophe

> Thanks,
> bin
>>
>> Christophe
>>
>>> Ok.
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> bin
>>>>
>>>>
>>>> 2016-04-20  Bin Cheng
>>>>
>>>> PR tree-optimization/70715
>>>> * tree-ssa-loop-niter.c (loop_exits_before_overflow): Check equality
>>>> after expanding BASE using expand_simple_operations.


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-22 Thread H.J. Lu
On Fri, Apr 22, 2016 at 11:57 AM, Uros Bizjak  wrote:
> On Fri, Apr 22, 2016 at 8:20 PM, H.J. Lu  wrote:
>> On Fri, Apr 22, 2016 at 10:29 AM, Uros Bizjak  wrote:
>>> On Fri, Apr 22, 2016 at 7:10 PM, Uros Bizjak  wrote:
>>>> On Fri, Apr 22, 2016 at 4:19 PM, H.J. Lu  wrote:
>>>>> On Fri, Apr 22, 2016 at 5:11 AM, Uros Bizjak  wrote:
>>>>>> On Thu, Apr 21, 2016 at 10:58 PM, H.J. Lu  wrote:
>>>>>>
>>>>>>> Here is the updated patch with my standard_sse_constant_p change and
>>>>>>> your SSE/AVX pattern change.  I didn't include your
>>>>>>> standard_sse_constant_opcode since it didn't compile nor is needed
>>>>>>> for this purpose.
>>>>>>
>>>>>> H.J.,
>>>>>>
>>>>>> please test the attached patch that finally rewrites and improves SSE
>>>>>> constants handling.
>>>>>>
>>>>>> This is what I want to commit, a follow-up patch will further clean
>>>>>> standard_sse_constant_opcode wrt TARGET_AVX512VL.
>>>>>>
>>>>>
>>>>> It doesn't address my problem which is "Allow all 1s of integer as
>>>>> standard SSE constants".  The key here is "integer".  I'd like to use
>>>>> SSE/AVX store TI/OI/XI integers with -1.
>>>>
>>>> Yes, my patch *should* work for this. Please note that
>>>> all_ones_operand should catch all cases your additional patch adds.
>>>>
>>>> ;; Return true if operand is a (vector) constant with all bits set.
>>>> (define_predicate "all_ones_operand"
>>>>   (match_code "const_int,const_wide_int,const_vector")
>>>> {
>>>>   if (op == constm1_rtx)
>>>>     return true;
>>>>
>>>>   if (mode == VOIDmode)
>>>>     mode = GET_MODE (op);
>>>>   return op == CONSTM1_RTX (mode);
>>>> })
>>>>
>>>>
>>>> Can you please investigate, what is wrong with all_ones_operand so it
>>>> doesn't accept all (-1) operands?
>>>
>>> Does following work:
>>>
>>> ;; Return true if operand is a (vector) constant with all bits set.
>>> (define_predicate "all_ones_operand"
>>>   (match_code "const_int,const_wide_int,const_vector")
>>> {
>>>   if (op == constm1_rtx)
>>> return true;
>>>
>>>   if (CONST_INT_P (op))
>>> return INTVAL (op) == HOST_WIDE_INT_M1;
>>>
>>>   if (mode == VOIDmode)
>>> mode = GET_MODE (op);
>>>   return op == CONSTM1_RTX (mode);
>>> })
>>>
>>
>> No.  I need a predicate, all_ones_operand or all_zeros_operand,
>> i.e., standard_sse_constant_p (op, mode) != 0.
>
> The predicate (standard_sse_constant_p) still works this way, but you
> have to provide non-VOID mode in case modeless (-1) is passed. Please
> note that VOID mode with modeless (-1) will ICE by design, since
> standard_sse_constant_p is not able to determine if insn is supported
> by target ISA.
>

This works:

/* Return 1 if X is all bits 0 and 2 if X is all bits 1
   in supported SSE/AVX vector mode.  */

int
standard_sse_constant_p (rtx x, machine_mode pred_mode)
{
  machine_mode mode;

  if (!TARGET_SSE)
    return 0;

  if (const0_operand (x, VOIDmode))
    return 1;

  mode = GET_MODE (x);

  /* VOIDmode integer constant, infer mode from the predicate.  */
  if (mode == VOIDmode)
    mode = pred_mode;
  if (all_ones_operand (x, mode))
 no "else"  ^ mode instead of VOIDmode

This works.
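
To spell out the return convention discussed in this thread (0 for neither,
1 for all bits zero, 2 for all bits one), here is a standalone model that
works on raw bytes instead of RTL.  classify_constant is a hypothetical
helper for illustration; it knows nothing about machine modes or whether
the target ISA supports the constant.

```c
#include <stddef.h>

/* Return 1 if all SIZE bytes at BYTES are 0x00, 2 if all are 0xff, and
   0 otherwise (including for an empty buffer).  */
int
classify_constant (const unsigned char *bytes, size_t size)
{
  int all_zeros = 1;
  int all_ones = 1;

  if (size == 0)
    return 0;

  for (size_t i = 0; i < size; i++)
    {
      if (bytes[i] != 0x00)
        all_zeros = 0;
      if (bytes[i] != 0xff)
        all_ones = 0;
    }

  if (all_zeros)
    return 1;
  if (all_ones)
    return 2;
  return 0;
}
```

A test of the form "classify_constant (...) != 0" then corresponds to the
"all zeros or all ones" predicate asked for earlier in the thread.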

-- 
H.J.
From 3009ef64f7d77f8b7382de1204143a1e2b9e9213 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 22 Apr 2016 06:30:59 -0700
Subject: [PATCH] Allow all 1s of integer as standard SSE constants

---
 gcc/config/i386/constraints.md |   7 ++-
 gcc/config/i386/i386-protos.h  |   2 +-
 gcc/config/i386/i386.c | 136 +
 gcc/config/i386/i386.md|  51 ++--
 gcc/config/i386/predicates.md  |  42 ++---
 gcc/config/i386/sse.md |   4 +-
 6 files changed, 141 insertions(+), 101 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index afdc546..c02c321 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -186,7 +186,9 @@
 
 (define_constraint "BC"
   "@internal SSE constant operand."
-  (match_test "standard_sse_constant_p (op)"))
+  (and (match_test "TARGET_SSE")
+   (ior (match_operand 0 "const0_operand")
+	(match_operand 0 "all_ones_operand"))))
 
 ;; Integer constant constraints.
 (define_constraint "I"
@@ -239,7 +241,8 @@
 ;; This can theoretically be any mode's CONST0_RTX.
 (define_constraint "C"
   "SSE constant zero operand."
-  (match_test "standard_sse_constant_p (op) == 1"))
+  (and (match_test "TARGET_SSE")
+   (match_operand 0 "const0_operand")))
 
 ;; Constant-or-symbol-reference constraints.
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ff47bc1..93b5e1e 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -50,7 +50,7 @@ extern bool ix86_using_red_zone (void);
 extern int standard_80387_constant_p (rtx);
 extern const char 

Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-22 Thread Uros Bizjak
On Fri, Apr 22, 2016 at 8:20 PM, H.J. Lu  wrote:
> On Fri, Apr 22, 2016 at 10:29 AM, Uros Bizjak  wrote:
>> On Fri, Apr 22, 2016 at 7:10 PM, Uros Bizjak  wrote:
>>> On Fri, Apr 22, 2016 at 4:19 PM, H.J. Lu  wrote:
>>>> On Fri, Apr 22, 2016 at 5:11 AM, Uros Bizjak  wrote:
>>>>> On Thu, Apr 21, 2016 at 10:58 PM, H.J. Lu  wrote:
>>>>>
>>>>>> Here is the updated patch with my standard_sse_constant_p change and
>>>>>> your SSE/AVX pattern change.  I didn't include your
>>>>>> standard_sse_constant_opcode since it didn't compile nor is needed
>>>>>> for this purpose.
>>>>>
>>>>> H.J.,
>>>>>
>>>>> please test the attached patch that finally rewrites and improves SSE
>>>>> constants handling.
>>>>>
>>>>> This is what I want to commit, a follow-up patch will further clean
>>>>> standard_sse_constant_opcode wrt TARGET_AVX512VL.
>>>>>
>>>>
>>>> It doesn't address my problem which is "Allow all 1s of integer as
>>>> standard SSE constants".  The key here is "integer".  I'd like to use
>>>> SSE/AVX store TI/OI/XI integers with -1.
>>>
>>> Yes, my patch *should* work for this. Please note that
>>> all_ones_operand should catch all cases your additional patch adds.
>>>
>>> ;; Return true if operand is a (vector) constant with all bits set.
>>> (define_predicate "all_ones_operand"
>>>   (match_code "const_int,const_wide_int,const_vector")
>>> {
>>>   if (op == constm1_rtx)
>>> return true;
>>>
>>>   if (mode == VOIDmode)
>>> mode = GET_MODE (op);
>>>   return op == CONSTM1_RTX (mode);
>>> })
>>>
>>>
>>> Can you please investigate, what is wrong with all_ones_operand so it
>>> doesn't accept all (-1) operands?
>>
>> Does following work:
>>
>> ;; Return true if operand is a (vector) constant with all bits set.
>> (define_predicate "all_ones_operand"
>>   (match_code "const_int,const_wide_int,const_vector")
>> {
>>   if (op == constm1_rtx)
>> return true;
>>
>>   if (CONST_INT_P (op))
>> return INTVAL (op) == HOST_WIDE_INT_M1;
>>
>>   if (mode == VOIDmode)
>> mode = GET_MODE (op);
>>   return op == CONSTM1_RTX (mode);
>> })
>>
>
> No.  I need a predicate, all_ones_operand or all_zeros_operand,
> i.e., standard_sse_constant_p (op, mode) != 0.

The predicate (standard_sse_constant_p) still works this way, but you
have to provide non-VOID mode in case modeless (-1) is passed. Please
note that VOID mode with modeless (-1) will ICE by design, since
standard_sse_constant_p is not able to determine if insn is supported
by target ISA.

Uros.


Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-22 Thread H.J. Lu
On Fri, Apr 22, 2016 at 10:29 AM, Uros Bizjak  wrote:
> On Fri, Apr 22, 2016 at 7:10 PM, Uros Bizjak  wrote:
>> On Fri, Apr 22, 2016 at 4:19 PM, H.J. Lu  wrote:
>>> On Fri, Apr 22, 2016 at 5:11 AM, Uros Bizjak  wrote:
>>>> On Thu, Apr 21, 2016 at 10:58 PM, H.J. Lu  wrote:
>>>>
>>>>> Here is the updated patch with my standard_sse_constant_p change and
>>>>> your SSE/AVX pattern change.  I didn't include your
>>>>> standard_sse_constant_opcode since it didn't compile nor is needed
>>>>> for this purpose.
>>>>
>>>> H.J.,
>>>>
>>>> please test the attached patch that finally rewrites and improves SSE
>>>> constants handling.
>>>>
>>>> This is what I want to commit, a follow-up patch will further clean
>>>> standard_sse_constant_opcode wrt TARGET_AVX512VL.
>>>>
>>>
>>> It doesn't address my problem which is "Allow all 1s of integer as
>>> standard SSE constants".  The key here is "integer".  I'd like to use
>>> SSE/AVX store TI/OI/XI integers with -1.
>>
>> Yes, my patch *should* work for this. Please note that
>> all_ones_operand should catch all cases your additional patch adds.
>>
>> ;; Return true if operand is a (vector) constant with all bits set.
>> (define_predicate "all_ones_operand"
>>   (match_code "const_int,const_wide_int,const_vector")
>> {
>>   if (op == constm1_rtx)
>> return true;
>>
>>   if (mode == VOIDmode)
>> mode = GET_MODE (op);
>>   return op == CONSTM1_RTX (mode);
>> })
>>
>>
>> Can you please investigate, what is wrong with all_ones_operand so it
>> doesn't accept all (-1) operands?
>
> Does following work:
>
> ;; Return true if operand is a (vector) constant with all bits set.
> (define_predicate "all_ones_operand"
>   (match_code "const_int,const_wide_int,const_vector")
> {
>   if (op == constm1_rtx)
> return true;
>
>   if (CONST_INT_P (op))
> return INTVAL (op) == HOST_WIDE_INT_M1;
>
>   if (mode == VOIDmode)
> mode = GET_MODE (op);
>   return op == CONSTM1_RTX (mode);
> })
>

No.  I need a predicate, all_ones_operand or all_zeros_operand,
i.e., standard_sse_constant_p (op, mode) != 0.


-- 
H.J.


Re: C++ PATCH for c++/70744 (wrong-code with x ?: y extension)

2016-04-22 Thread Marek Polacek
On Fri, Apr 22, 2016 at 09:43:31AM -0400, Jason Merrill wrote:
> On 04/22/2016 07:50 AM, Marek Polacek wrote:
> >This PR shows that we generate wrong code with the x ?: y extension in case
> >the first operand contains either predecrement or preincrement.  The problem
> >is that we don't emit SAVE_EXPR, thus the operand is evaluated twice, which
> >it should not be.
> >
> >While ++i or --i can be lvalues in C++, i++ or i-- can not.  The code in
> >build_conditional_expr_1 has:
> >  4635   /* Make sure that lvalues remain lvalues.  See g++.oliva/ext1.C.  */
> >  4636   if (real_lvalue_p (arg1))
> >  4637 arg2 = arg1 = stabilize_reference (arg1);
> >  4638   else
> >  4639 arg2 = arg1 = save_expr (arg1);
> >so for ++i/--i we call stabilize_reference, but that doesn't know anything
> >about PREINCREMENT_EXPR or PREDECREMENT_EXPR and just returns the same
> >expression, so SAVE_EXPR isn't created.
> >
> >I think one fix would be to teach stabilize_reference what to do with those,
> >similarly to how we handle COMPOUND_EXPR there.
> 
> Your change will turn the expression into an rvalue, so that isn't enough.

Oops.

> We need to turn the expression into some sort of _REF before passing it to
> stabilize_reference, perhaps by factoring out the cast-to-reference code
> from force_paren_expr.  This should probably be part of a
> cp_stabilize_reference function that replaces all uses of
> stabilize_reference in the front end.

Thanks, this magic seems to work.  So something like the following?  I didn't
put the cast-to-reference code into its separate function because it didn't
seem convenient, but I can work on that, too, if you prefer.
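
For reference, the behaviour the extension is supposed to have can be sketched
in plain C.  This is only an illustration of the expected semantics (the bug
itself is in the C++ front end); the helper names are made up and are not part
of the patch.

```c
#include <assert.h>

/* Hypothetical helper: applies the GNU "x ?: y" extension to a
   pre-incremented operand.  The first operand must be evaluated exactly
   once, so *counter ends up incremented by one, not by two.  */
int
preinc_or_default (int *counter, int fallback)
{
  assert (counter != 0);
  return ++*counter ?: fallback;
}

/* Same as above, but with pre-decrement as the first operand.  */
int
predec_or_default (int *counter, int fallback)
{
  assert (counter != 0);
  return --*counter ?: fallback;
}
```

With the operand evaluated once, a counter starting at 0 is 1 after
preinc_or_default, and the result is that same value rather than the fallback.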

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-22  Marek Polacek  

PR c++/70744
* call.c (build_conditional_expr_1): Call cp_stabilize_reference
instead of stabilize_reference.
(build_over_call): Likewise.
* cp-tree.h (cp_stabilize_reference): Declare.
* tree.c (cp_stabilize_reference): New function.
* typeck.c (cp_build_unary_op): Call cp_stabilize_reference instead of
stabilize_reference.
(unary_complex_lvalue): Likewise.
(cp_build_modify_expr): Likewise.

* g++.dg/ext/cond2.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index 11f2d42..476e806 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -4634,7 +4634,7 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree arg2, tree arg3,
 
   /* Make sure that lvalues remain lvalues.  See g++.oliva/ext1.C.  */
   if (real_lvalue_p (arg1))
-   arg2 = arg1 = stabilize_reference (arg1);
+   arg2 = arg1 = cp_stabilize_reference (arg1);
   else
arg2 = arg1 = save_expr (arg1);
 }
@@ -7644,8 +7644,9 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
   || (TREE_CODE (arg) == TARGET_EXPR
   && !unsafe_copy_elision_p (fa, arg)))
{
- tree to = stabilize_reference (cp_build_indirect_ref (fa, RO_NULL,
-   complain));
+ tree to = cp_stabilize_reference (cp_build_indirect_ref (fa,
+  RO_NULL,
+  complain));
 
  val = build2 (INIT_EXPR, DECL_CONTEXT (fn), to, arg);
  return val;
@@ -7655,7 +7656,7 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
   && trivial_fn_p (fn)
   && !DECL_DELETED_FN (fn))
 {
-  tree to = stabilize_reference
+  tree to = cp_stabilize_reference
(cp_build_indirect_ref (argarray[0], RO_NULL, complain));
   tree type = TREE_TYPE (to);
   tree as_base = CLASSTYPE_AS_BASE (type);
diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index ec92718..0e46ae1 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -6494,6 +6494,7 @@ extern cp_lvalue_kind real_lvalue_p   (const_tree);
 extern cp_lvalue_kind lvalue_kind  (const_tree);
 extern bool lvalue_or_rvalue_with_address_p(const_tree);
 extern bool xvalue_p   (const_tree);
+extern tree cp_stabilize_reference (tree);
 extern bool builtin_valid_in_constant_expr_p(const_tree);
 extern tree build_min  (enum tree_code, tree, ...);
 extern tree build_min_nt_loc   (location_t, enum tree_code,
diff --git gcc/cp/tree.c gcc/cp/tree.c
index 112c8c7..44e3893 100644
--- gcc/cp/tree.c
+++ gcc/cp/tree.c
@@ -296,6 +296,31 @@ xvalue_p (const_tree ref)
   return (lvalue_kind (ref) == clk_rvalueref);
 }
 
+/* C++-specific version of stabilize_reference.  For preincrement and
+   predecrement expressions we need to turn the expression into some
+   sort of _REF before passing it to stabilize_reference.  */
+
+tree
+cp_stabilize_reference (tree ref)

Re: An abridged "Writing C" for the gcc web pages

2016-04-22 Thread Mikhail Maltsev
On 04/22/2016 07:21 PM, Bernd Schmidt wrote:
> (Apologies if you get this twice, the mailing list didn't like the html
> attachment in the first attempt).
> 
> We frequently get malformatted patches, and it's been brought to my attention
> that some people don't even make the effort to read the GNU coding standards
> before trying to contribute code. TL;DR seems to be the excuse, and while I
> find that attitude inappropriate, we could probably improve the situation by
> spelling out the most basic rules in an abridged document on our webpages.
> Below is a draft I came up with. Thoughts?
> 
Probably contrib/clang-format and https://gcc.gnu.org/wiki/FormattingCodeForGCC
are also worth mentioning.

-- 
Regards,
Mikhail Maltsev


Re: [PATCH 2/2] (header usage fix) include c++ headers in system.h

2016-04-22 Thread Pedro Alves
On 04/22/2016 11:02 AM, Szabolcs Nagy wrote:
> Some gcc source files include standard headers after
> "system.h" but those headers may declare and use poisoned
> symbols, 

Couldn't gcc simply allow use of poisoned symbols in
system headers?

It sounds like it'd avoid these odd contortions.

> they also cannot be included before "system.h"
> because they might depend on macro definitions from there,
> so they must be included in system.h.
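
For context, a minimal sketch of how poisoning interacts with include order,
assuming a GCC-compatible preprocessor.  The names checked_alloc and
copy_string are made up for illustration; this is not GCC's system.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Wrapper defined before the poison directive, so its own use of
   malloc is still accepted.  */
void *
checked_alloc (size_t size)
{
  assert (size > 0);
  void *p = malloc (size);
  if (p == NULL)
    abort ();
  return p;
}

/* From here on, any new appearance of the identifier malloc is a hard
   error, which is why headers that mention it must come first.  */
#pragma GCC poison malloc

/* Code after the directive has to go through the wrapper.  */
char *
copy_string (const char *s)
{
  size_t len = strlen (s) + 1;
  char *copy = checked_alloc (len);
  memcpy (copy, s, len);
  return copy;
}
```

Moving the #include lines below the pragma is the situation described above:
a header that mentions a poisoned identifier would then fail to compile.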

Thanks,
Pedro Alves



Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-22 Thread Uros Bizjak
On Fri, Apr 22, 2016 at 7:10 PM, Uros Bizjak  wrote:
> On Fri, Apr 22, 2016 at 4:19 PM, H.J. Lu  wrote:
>> On Fri, Apr 22, 2016 at 5:11 AM, Uros Bizjak  wrote:
>>> On Thu, Apr 21, 2016 at 10:58 PM, H.J. Lu  wrote:
>>>
>>>> Here is the updated patch with my standard_sse_constant_p change and
>>>> your SSE/AVX pattern change.  I didn't include your
>>>> standard_sse_constant_opcode since it didn't compile nor is needed
>>>> for this purpose.
>>>
>>> H.J.,
>>>
>>> please test the attached patch that finally rewrites and improves SSE
>>> constants handling.
>>>
>>> This is what I want to commit, a follow-up patch will further clean
>>> standard_sse_constant_opcode wrt TARGET_AVX512VL.
>>>
>>
>> It doesn't address my problem which is "Allow all 1s of integer as
>> standard SSE constants".  The key here is "integer".  I'd like to use
>> SSE/AVX store TI/OI/XI integers with -1.
>
> Yes, my patch *should* work for this. Please note that
> all_ones_operand should catch all cases your additional patch adds.
>
> ;; Return true if operand is a (vector) constant with all bits set.
> (define_predicate "all_ones_operand"
>   (match_code "const_int,const_wide_int,const_vector")
> {
>   if (op == constm1_rtx)
> return true;
>
>   if (mode == VOIDmode)
> mode = GET_MODE (op);
>   return op == CONSTM1_RTX (mode);
> })
>
>
> Can you please investigate, what is wrong with all_ones_operand so it
> doesn't accept all (-1) operands?

Does the following work:

;; Return true if operand is a (vector) constant with all bits set.
(define_predicate "all_ones_operand"
  (match_code "const_int,const_wide_int,const_vector")
{
  if (op == constm1_rtx)
return true;

  if (CONST_INT_P (op))
return INTVAL (op) == HOST_WIDE_INT_M1;

  if (mode == VOIDmode)
mode = GET_MODE (op);
  return op == CONSTM1_RTX (mode);
})
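
The property the predicate is after can be sketched outside of GCC as a plain C
check.  The word-array representation below is a hypothetical stand-in for a
wide integer constant, not GCC's actual rtx layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a wide integer constant: N 64-bit words,
   least significant first.  Returns 1 only if every bit in every word
   is set, i.e. the value is -1 in that width.  */
int
all_bits_set_p (const uint64_t *words, size_t n)
{
  assert (words != NULL && n > 0);
  for (size_t i = 0; i < n; i++)
    if (words[i] != UINT64_MAX)
      return 0;
  return 1;
}
```

In this model a 128-bit (TImode-sized) -1 is two all-ones words, and a single
cleared bit in any word makes the check fail.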


Uros.


Re: [PATCH][doc] Update documentation of AArch64 options

2016-04-22 Thread Wilco Dijkstra
Sandra Loosemore wrote:
>
> Can you please change all the incorrectly hyphenated "32-bit" and
> "64-bit" uses in this section to "32 bits" and "64 bits" respectively?
> ("n-bit" should only be hyphenated when it is used as an adjective
> phrase immediately before the noun it modifies.)

No problem, all cases in the AArch64 section have been fixed. 

> The new text seems repetitive and awkward to me.  How about something like:
> 
> Avoid generating memory accesses that may not be aligned on a natural
> object boundary as described in the architecture specification.

Thanks, I've used that.

> +@option{-funsafe-math-optimizations} is used as well.  Enabling this reduces
> +precision of reciprocal square root results to about 16 bits for
> +single-precision and to 32 bits for double-precision.

Fixed (an earlier version had "floating point" at the end, but that seemed
superfluous).

>>   @item -mpc-relative-literal-loads
>>   @opindex mpcrelativeliteralloads
>
> What happened to that @opindex entry?  :-(

Fixed (and mlow-precision-recip-sqrt too)

How about a @itemindex that automatically does the right thing?

>> +Enable PC relative literal loads.  With this option literal pools are

Fixed, new version below:


2016-04-22  Wilco Dijkstra  

gcc/
* gcc/doc/invoke.texi (AArch64 Options): Update.
--

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 8ec6b092be55f1ac629df447c3aeb8ca100508dc..0d7eb566442c57f73083c3b79f3769d5910df388 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -12827,9 +12827,9 @@ These options are defined for AArch64 implementations:
 @item -mabi=@var{name}
 @opindex mabi
 Generate code for the specified data model.  Permissible values
-are @samp{ilp32} for SysV-like data model where int, long int and pointer
-are 32-bit, and @samp{lp64} for SysV-like data model where int is 32-bit,
-but long int and pointer are 64-bit.
+are @samp{ilp32} for SysV-like data model where int, long int and pointers
+are 32 bits, and @samp{lp64} for SysV-like data model where int is 32 bits,
+but long int and pointers are 64 bits.
 
 The default depends on the specific target configuration.  Note that
 the LP64 and ILP32 ABIs are not link-compatible; you must compile your
@@ -12854,25 +12854,24 @@ Generate little-endian code.  This is the default when GCC is configured for an
 @item -mcmodel=tiny
 @opindex mcmodel=tiny
 Generate code for the tiny code model.  The program and its statically defined
-symbols must be within 1GB of each other.  Pointers are 64 bits.  Programs can
-be statically or dynamically linked.  This model is not fully implemented and
-mostly treated as @samp{small}.
+symbols must be within 1MB of each other.  Programs can be statically or
+dynamically linked.
 
 @item -mcmodel=small
 @opindex mcmodel=small
 Generate code for the small code model.  The program and its statically defined
-symbols must be within 4GB of each other.  Pointers are 64 bits.  Programs can
-be statically or dynamically linked.  This is the default code model.
+symbols must be within 4GB of each other.  Programs can be statically or
+dynamically linked.  This is the default code model.
 
 @item -mcmodel=large
 @opindex mcmodel=large
 Generate code for the large code model.  This makes no assumptions about
-addresses and sizes of sections.  Pointers are 64 bits.  Programs can be
-statically linked only.
+addresses and sizes of sections.  Programs can be statically linked only.
 
 @item -mstrict-align
 @opindex mstrict-align
-Do not assume that unaligned memory references are handled by the system.
+Avoid generating memory accesses that may not be aligned on a natural object
+boundary as described in the architecture specification.
 
 @item -momit-leaf-frame-pointer
 @itemx -mno-omit-leaf-frame-pointer
@@ -12894,7 +12893,7 @@ of TLS variables.
 @item -mtls-size=@var{size}
 @opindex mtls-size
 Specify bit size of immediate TLS offsets.  Valid values are 12, 24, 32, 48.
-This option depends on binutils higher than 2.25.
+This option requires binutils 2.26 or newer.
 
 @item -mfix-cortex-a53-835769
 @itemx -mno-fix-cortex-a53-835769
@@ -12914,12 +12913,13 @@ corresponding flag to the linker.
 
 @item -mlow-precision-recip-sqrt
 @item -mno-low-precision-recip-sqrt
-@opindex -mlow-precision-recip-sqrt
-@opindex -mno-low-precision-recip-sqrt
-When calculating the reciprocal square root approximation,
-uses one less step than otherwise, thus reducing latency and precision.
-This is only relevant if @option{-ffast-math} enables the reciprocal square root
-approximation, which in turn depends on the target processor.
+@opindex mlow-precision-recip-sqrt
+@opindex mno-low-precision-recip-sqrt
+Enable or disable reciprocal square root approximation.
+This option only has an effect if @option{-ffast-math} or
+@option{-funsafe-math-optimizations} is used as well.  Enabling this reduces
+precision of reciprocal square root results to about 16 bits for
+single precision and to 32 

Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-22 Thread Uros Bizjak
On Fri, Apr 22, 2016 at 4:19 PM, H.J. Lu  wrote:
> On Fri, Apr 22, 2016 at 5:11 AM, Uros Bizjak  wrote:
>> On Thu, Apr 21, 2016 at 10:58 PM, H.J. Lu  wrote:
>>
>>> Here is the updated patch with my standard_sse_constant_p change and
>>> your SSE/AVX pattern change.  I didn't include your
>>> standard_sse_constant_opcode since it didn't compile nor is needed
>>> for this purpose.
>>
>> H.J.,
>>
>> please test the attached patch that finally rewrites and improves SSE
>> constants handling.
>>
>> This is what I want to commit, a follow-up patch will further clean
>> standard_sse_constant_opcode wrt TARGET_AVX512VL.
>>
>
> It doesn't address my problem which is "Allow all 1s of integer as
> standard SSE constants".  The key here is "integer".  I'd like to use
> SSE/AVX store TI/OI/XI integers with -1.

Yes, my patch *should* work for this. Please note that
all_ones_operand should catch all cases your additional patch adds.

;; Return true if operand is a (vector) constant with all bits set.
(define_predicate "all_ones_operand"
  (match_code "const_int,const_wide_int,const_vector")
{
  if (op == constm1_rtx)
return true;

  if (mode == VOIDmode)
mode = GET_MODE (op);
  return op == CONSTM1_RTX (mode);
})


Can you please investigate, what is wrong with all_ones_operand so it
doesn't accept all (-1) operands?

Uros.


[PATCH] Verify __builtin_unreachable and __builtin_trap are not called with arguments

2016-04-22 Thread Martin Jambor
Hi,

this patch adds verification that __builtin_unreachable and
__builtin_trap are not called with arguments.  The problem with calls
to them with arguments is that functions like gimple_call_builtin_p
return false on them, because they return true only when
gimple_builtin_call_types_compatible_p does.  One manifestation of
that was PR 61591 where undefined behavior sanitizer did not replace
such calls with its thing as it should, but there might be others.

I have included __builtin_trap in the verification because the two often
seem to be handled together, but I can either remove it or add more
builtins if people think that is better.  I concede it is a bit arbitrary.

Honza said he has seen __builtin_unreachable calls with parameters in
LTO builds of Firefox, so it seems this might actually trigger, but I
also think we do not want such calls in the IL.

I have bootstrapped and tested this on x86_64-linux (with all
languages and Ada) and have also run a C, C++ and Fortran LTO
bootstrap with the patch on the same architecture.  OK for trunk?
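
As a rough illustration of the check, here is a simplified standalone model in
C.  The call_stmt structure and the enum are hypothetical; GCC's real gcall and
fndecl data structures are far richer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified set of callee kinds.  */
enum builtin_code
{
  NOT_BUILTIN,
  BUILTIN_UNREACHABLE,
  BUILTIN_TRAP,
  BUILTIN_MEMSET
};

/* Hypothetical model of a call statement.  */
struct call_stmt
{
  enum builtin_code code;
  unsigned num_args;
};

/* Return true if the call is malformed, following the convention of
   verify_gimple_call, which returns true on error.  */
bool
call_has_bogus_args (const struct call_stmt *stmt)
{
  assert (stmt != NULL);
  switch (stmt->code)
    {
    case BUILTIN_UNREACHABLE:
    case BUILTIN_TRAP:
      /* These builtins take no arguments at all.  */
      return stmt->num_args > 0;
    default:
      return false;
    }
}
```

Only the two argument-less builtins are checked; every other call is left
alone, which matches the default case of the switch in the patch.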

Thanks,

Martin


2016-04-20  Martin Jambor  

* tree-cfg.c (verify_gimple_call): Check that calls to
__builtin_unreachable or __builtin_trap do not have actual arguments.
---
 gcc/tree-cfg.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 04e46fd..3385164 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3414,6 +3414,26 @@ verify_gimple_call (gcall *stmt)
   return true;
 }
 
+  if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+{
+  switch (DECL_FUNCTION_CODE (fndecl))
+   {
+   case BUILT_IN_UNREACHABLE:
+   case BUILT_IN_TRAP:
+ if (gimple_call_num_args (stmt) > 0)
+   {
+ /* Built-in unreachable with parameters might not be caught by
+undefined behavior sanitizer.  */
+ error ("__builtin_unreachable or __builtin_trap call with "
+"arguments");
+ return true;
+   }
+ break;
+   default:
+ break;
+   }
+}
+
   /* ???  The C frontend passes unpromoted arguments in case it
  didn't see a function declaration before the call.  So for now
  leave the call arguments mostly unverified.  Once we gimplify
-- 
2.8.1



Re: An abridged "Writing C" for the gcc web pages

2016-04-22 Thread Jason Merrill

On 04/22/2016 12:42 PM, paul_kon...@dell.com wrote:
>> On Apr 22, 2016, at 12:21 PM, Bernd Schmidt  wrote:
>>
>> (Apologies if you get this twice, the mailing list didn't like the html
>> attachment in the first attempt).
>>
>> We frequently get malformatted patches, and it's been brought to my attention
>> that some people don't even make the effort to read the GNU coding standards
>> before trying to contribute code. TL;DR seems to be the excuse, and while I
>> find that attitude inappropriate, we could probably improve the situation by
>> spelling out the most basic rules in an abridged document on our webpages.
>> Below is a draft I came up with. Thoughts?
>
> Would you expect people to conform to the abridged version or the full
> standard?  If the full standard, then publishing an abridged version is not a
> good idea, it will just cause confusion.  Let the full standard be the rule,
> make people read it, and if they didn't bother that's their problem.


And this isn't strictly an abridged version, as it contains information 
that is not part of the GNU standard.



> +The format should be text/plain so that mail clients such as
> +thunderbird can display and quote it, without forcing potential
> +reviewers to take extra steps to save it and open it elsewhere before
> +being able to look at it.


I note that this patch itself is sent as text/x-patch, which thunderbird 
handles fine.  And apparently so does gmail, if you use the .diff 
extension; the .patch extension that I have tended to use doesn't work 
properly, so I guess I'll switch.



> +All leading whitespace should be replaced with tab characters as
> +much as possible, but tab characters should not be used in any other
> +circumstances.


Some headers also use tabs to separate the parameter-list from the 
function name.


Jason



Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-22 Thread H.J. Lu
On Fri, Apr 22, 2016 at 7:50 AM, H.J. Lu  wrote:
> On Fri, Apr 22, 2016 at 7:19 AM, H.J. Lu  wrote:
>> On Fri, Apr 22, 2016 at 5:11 AM, Uros Bizjak  wrote:
>>> On Thu, Apr 21, 2016 at 10:58 PM, H.J. Lu  wrote:
>>>
>>>> Here is the updated patch with my standard_sse_constant_p change and
>>>> your SSE/AVX pattern change.  I didn't include your
>>>> standard_sse_constant_opcode since it didn't compile nor is needed
>>>> for this purpose.
>>>
>>> H.J.,
>>>
>>> please test the attached patch that finally rewrites and improves SSE
>>> constants handling.
>>>
>>> This is what I want to commit, a follow-up patch will further clean
>>> standard_sse_constant_opcode wrt TARGET_AVX512VL.
>>>
>>
>> It doesn't address my problem which is "Allow all 1s of integer as
>> standard SSE constants".  The key here is "integer".  I'd like to use
>> SSE/AVX store TI/OI/XI integers with -1.
>>
>> --
>> H.J.
>
> I am testing this on top of yours:
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 27c3bbd..677aa71 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -11136,7 +11136,20 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
>/* VOIDmode integer constant, infer mode from the predicate.  */
>if (mode == VOIDmode)
>  mode = pred_mode;
> -
> +  if (CONST_INT_P (x))
> +{
> +  /* If mode != VOIDmode, standard_sse_constant_p must be called:
> + 1. On TImode with SSE2.
> + 2. On OImode with AVX2.
> + 3. On XImode with AVX512F.
> +   */
> +  if ((HOST_WIDE_INT) INTVAL (x) == HOST_WIDE_INT_M1
> +  && (mode == VOIDmode
> +  || (mode == TImode && TARGET_SSE2)
> +  || (mode == OImode && TARGET_AVX2)
> +  || (mode == XImode && TARGET_AVX512F)))
> + return 2;
> +}
>else if (all_ones_operand (x, VOIDmode))
>  switch (GET_MODE_SIZE (mode))
>{
>
>

This one works for me.
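
The mode/ISA pairing in the CONST_INT branch above can be modelled in isolation
roughly as follows.  This is only a sketch: the enum, the flags structure and
the function name are hypothetical, and the real function also handles
TARGET_SSE, zero constants and vector modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the integer modes mentioned in the patch.  */
enum int_mode { MODE_VOID, MODE_TI, MODE_OI, MODE_XI };

/* Hypothetical stand-in for the TARGET_* ISA flags.  */
struct target_flags
{
  bool sse2;
  bool avx2;
  bool avx512f;
};

/* Return 2 if VALUE is -1 and MODE can hold all ones on this target,
   0 otherwise.  The value 2 follows the "all bits 1" return code of
   standard_sse_constant_p.  */
int
all_ones_int_constant_p (int64_t value, enum int_mode mode,
			 const struct target_flags *isa)
{
  assert (isa != NULL);
  if (value != -1)
    return 0;
  if (mode == MODE_VOID
      || (mode == MODE_TI && isa->sse2)
      || (mode == MODE_OI && isa->avx2)
      || (mode == MODE_XI && isa->avx512f))
    return 2;
  return 0;
}
```

An OImode -1 is thus rejected on an SSE2-only target and accepted once AVX2 is
available.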


-- 
H.J.
From fb8c4f9ae238e738d2de5c1f60d741ac01382978 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 22 Apr 2016 06:30:59 -0700
Subject: [PATCH] Allow all 1s of integer as standard SSE constants

Since all 1s in TImode is a standard SSE2 constant, all 1s in OImode is a
standard AVX2 constant and all 1s in XImode is a standard AVX512F constant,
pass mode to standard_sse_constant_p and standard_sse_constant_opcode
to check whether all 1s is available for the target.

---
 gcc/config/i386/constraints.md |   7 +-
 gcc/config/i386/i386-protos.h  |   2 +-
 gcc/config/i386/i386.c | 149 ++---
 gcc/config/i386/i386.md|  51 --
 gcc/config/i386/predicates.md  |  42 ++--
 gcc/config/i386/sse.md |   4 +-
 6 files changed, 154 insertions(+), 101 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index afdc546..c02c321 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -186,7 +186,9 @@
 
 (define_constraint "BC"
   "@internal SSE constant operand."
-  (match_test "standard_sse_constant_p (op)"))
+  (and (match_test "TARGET_SSE")
+   (ior (match_operand 0 "const0_operand")
+	(match_operand 0 "all_ones_operand"))))
 
 ;; Integer constant constraints.
 (define_constraint "I"
@@ -239,7 +241,8 @@
 ;; This can theoretically be any mode's CONST0_RTX.
 (define_constraint "C"
   "SSE constant zero operand."
-  (match_test "standard_sse_constant_p (op) == 1"))
+  (and (match_test "TARGET_SSE")
+   (match_operand 0 "const0_operand")))
 
 ;; Constant-or-symbol-reference constraints.
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ff47bc1..93b5e1e 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -50,7 +50,7 @@ extern bool ix86_using_red_zone (void);
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
 extern rtx standard_80387_constant_rtx (int);
-extern int standard_sse_constant_p (rtx);
+extern int standard_sse_constant_p (rtx, machine_mode);
 extern const char *standard_sse_constant_opcode (rtx_insn *, rtx);
 extern bool symbolic_reference_mentioned_p (rtx);
 extern bool extended_reg_mentioned_p (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6379313..4f805e9 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10762,42 +10762,57 @@ standard_80387_constant_rtx (int idx)
    XFmode);
 }
 
-/* Return 1 if X is all 0s and 2 if x is all 1s
+/* Return 1 if X is all bits 0 and 2 if X is all bits 1
in supported SSE/AVX vector mode.  */
 
 int
-standard_sse_constant_p (rtx x)
+standard_sse_constant_p (rtx x, machine_mode pred_mode)
 {
   machine_mode mode;
 
   if (!TARGET_SSE)
 return 0;
 
-  mode = GET_MODE (x);
-  
-  if (x == const0_rtx || x == CONST0_RTX (mode))
+  if (const0_operand (x, VOIDmode))
 return 1;
-  if (vector_all_ones_operand (x, 

Re: An abridged "Writing C" for the gcc web pages

2016-04-22 Thread Paul_Koning

> On Apr 22, 2016, at 12:21 PM, Bernd Schmidt  wrote:
> 
> (Apologies if you get this twice, the mailing list didn't like the html 
> attachment in the first attempt).
> 
> We frequently get malformatted patches, and it's been brought to my attention 
> that some people don't even make the effort to read the GNU coding standards 
> before trying to contribute code. TL;DR seems to be the excuse, and while I 
> find that attitude inappropriate, we could probably improve the situation by 
> spelling out the most basic rules in an abridged document on our webpages. 
> Below is a draft I came up with. Thoughts?

Would you expect people to conform to the abridged version or the full 
standard?  If the full standard, then publishing an abridged version is not a 
good idea, it will just cause confusion.  Let the full standard be the rule, 
make people read it, and if they didn't bother that's their problem.

paul



An abridged "Writing C" for the gcc web pages

2016-04-22 Thread Bernd Schmidt
(Apologies if you get this twice, the mailing list didn't like the html 
attachment in the first attempt).


We frequently get malformatted patches, and it's been brought to my 
attention that some people don't even make the effort to read the GNU 
coding standards before trying to contribute code. TL;DR seems to be the 
excuse, and while I find that attitude inappropriate, we could probably 
improve the situation by spelling out the most basic rules in an 
abridged document on our webpages. Below is a draft I came up with. 
Thoughts?



Bernd

Index: htdocs/contribute.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/contribute.html,v
retrieving revision 1.87
diff -d -u -r1.87 contribute.html
--- htdocs/contribute.html	9 Apr 2015 21:49:31 -	1.87
+++ htdocs/contribute.html	22 Apr 2016 16:19:12 -
@@ -61,7 +61,9 @@
 
 All contributions must conform to the http://www.gnu.org/prep/standards_toc.html;>GNU Coding
-Standards.  There are also some .  An abridged version of
+the C formatting guidelines is available for purposes of getting
+started.  There are also some additional coding conventions for
 GCC; these include documentation and testsuite requirements as
 well as requirements on code formatting.
--- /dev/null	2016-04-22 12:54:13.474856906 +0200
+++ htdocs/writing-c.html	2016-04-22 18:11:47.222034004 +0200
@@ -0,0 +1,263 @@
+
+
+
+
+Getting Started Writing C for the GCC project
+
+
+
+
+Introduction
+
+This guide is intended as a heavily abridged version of the C
+coding guidelines in the full http://www.gnu.org/prep/standards_toc.html;>GNU Coding
+Standards, which may be too voluminous for someone just looking to
+contribute a short patch to GCC.  Here, the focus is mainly on the
+very basic formatting rules and how they apply to GCC.  Anyone who
+wishes to become a long-time contributor should still read the full
+document.  There is also https://gcc.gnu.org/codingconventions.html;>an additional
+document going into considerably more detail about less common
+cases that may arise.
+
+We care very strongly about ensuring all code in GCC is formatted
+the same way, as inconsistencies can be very distracting and make it
+harder to modify the source.  As always, there are exceptions to the
+rules, but they should be rare and only made for good reasons.
+
+When we speak about C formatting guidelines in this document, we
+intend them to be applied to C++ code as well.  Obviously there are
+certain situations where the use of C++ features creates new problems,
+and we have not settled on complete solutions in all cases.  The most
+important rule is this: when in doubt about anything, carefully
+examine existing code in the area you are changing (or analogous code
+in other files) and try to follow the prevalent style.
+
+Tools
+
+We strongly recommend using an editor which understands and
+supports the GNU formatting rules to some degree.  GNU emacs is
+designed for this purpose and supports GNU style as the default for C
+formatting, and we recommend it for this reason.  However, there are
+also contributors making good use of other editors such as vim.  Using
+more exotic tools, such as Eclipse, or Windows-based editors, may
+prove problematic.
+
+GCC has a check_GNU_style.sh script in the
+contrib/ directory to verify the formatting in a patch
+file.  Some caution is advisable however, as machines are generally
+not very good at applying common sense.
+
+To make sure that a correctly formatted patch doesn't get destroyed
+by email software, it is usually best to send it as an attachment.
+The format should be text/plain so that mail clients such as
+thunderbird can display and quote it, without forcing potential
+reviewers to take extra steps to save it and open it elsewhere before
+being able to look at it.
+
+Formatting Your Source Code
+
+Formatting Basics
+
+Please keep the length of source lines to 79 characters or less, for
+maximum readability in the widest range of environments.
+
+Tab characters are interpreted as jumps to stops separated by eight
+spaces.  Note that editors on certain systems (such as Microsoft
+Windows) may follow different rules by default, so pay special
+attention if you are using such systems.
+
+All leading whitespace should be replaced with tab characters as
+much as possible, but tab characters should not be used in any other
+circumstances. There should never be trailing whitespace, and no
+carriage-return characters at the line end.  Blank lines should just
+consist of a newline character.
+
+There should be a space before open-parentheses and after commas.
+We also use spaces around binary operators.
+
+Avoid unnecessary parentheses and braces, except in special cases that
+are described elsewhere in this document.  The following example
+shows how not to do it:
+  if ((a == 0) && (b == 2))
+{
+  return (z);
+}
+
+
+Instead, this should be written as follows:
+  if (a == 0 && b == 2)
+

[PATCH][AArch64] Improve aarch64_case_values_threshold setting

2016-04-22 Thread Wilco Dijkstra
GCC expands switch statements in a very simplistic way and tries to use a table
expansion even when it is a bad idea for performance or code size.
GCC typically emits extremely sparse tables that contain mostly default entries
(something which currently cannot be tuned by backends).  Additionally, the
computation of the minimum/maximum label offsets is too simplistic, so the
tables are often twice as large as necessary.

The cost of a table switch is significant due to the setup overhead, the table
lookup (which, the table being sparse and large, adds unnecessary cache misses)
and the hard-to-predict indirect jump.  Therefore it is best to avoid using a
table unless there are many real case labels.

This patch fixes that by changing the default aarch64_case_values_threshold so
that, unless optimizing for size, tables are only used for more than 16 cases
when the per-CPU tuning is not set.  On SPEC2006 this improves the switch-heavy
benchmarks GCC and perlbench both in performance (1-2%) and in size
(0.5-1% smaller).

OK for trunk?
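
The selection logic in the patch below can be sketched as a standalone C
function.  The parameter names are made up, and the "optimize_level > 2" test
is an assumption taken from the comment's "with -O3" wording, since that part
of the condition is not visible in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the threshold choice: the result is the smallest number of
   case values for which a jump table is used.  TUNED_MAX is the
   per-core tuning value (0 if unset) and GENERIC_DEFAULT stands in for
   default_case_values_threshold ().  */
unsigned int
case_values_threshold_sketch (int optimize_level, bool optimize_size,
			      unsigned int tuned_max,
			      unsigned int generic_default)
{
  assert (generic_default > 0);
  /* Assumed condition: per-core tuning only applies at -O3.  */
  if (optimize_level > 2 && tuned_max != 0)
    return tuned_max;
  /* 17 means tables are used only for more than 16 cases.  */
  return optimize_size ? generic_default : 17;
}
```

When optimizing for size the generic default is kept; in every other case
without per-core tuning the threshold is 17.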

ChangeLog:
2016-04-22  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.c (aarch64_case_values_threshold):
Return a better case_values_threshold when optimizing.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0620f1e..a240635 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3546,7 +3546,12 @@ aarch64_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
   return aarch64_tls_referenced_p (x);
 }
 
-/* Implement TARGET_CASE_VALUES_THRESHOLD.  */
+/* Implement TARGET_CASE_VALUES_THRESHOLD.
+   The expansion for a table switch is quite expensive due to the number
+   of instructions, the table lookup and hard to predict indirect jump.
+   When optimizing for speed, with -O3 use the per-core tuning if set,
+   otherwise use tables for > 16 cases as a tradeoff between size and
+   performance.  */
 
 static unsigned int
 aarch64_case_values_threshold (void)
@@ -3557,7 +3562,7 @@ aarch64_case_values_threshold (void)
   && selected_cpu->tune->max_case_values != 0)
 return selected_cpu->tune->max_case_values;
   else
-return default_case_values_threshold ();
+return optimize_size ? default_case_values_threshold () : 17;
 }




Re: C++ PATCH for range-for tweak

2016-04-22 Thread Jason Merrill

On 03/14/2016 10:57 PM, Jason Merrill wrote:
> On 03/14/2016 05:30 PM, Florian Weimer wrote:
>> * Jason Merrill:
>>
>>>  P08184R0: Generalizing the Range-Based For Loop
>>
>> How can one resolve this reference?  It's obviously not a PR number in
>> GCC Bugzilla.
>>
>> I found this after some searching:
>>
>> But it lacks the additional “8”.
>
> Oops, typo.  Fixed, along with adjusting the feature-test macro.


If the change is limited to C++17, the adjusted macro should be as well.
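
The resulting mapping from dialect to macro value can be sketched as a small C
function.  The enum and function name are hypothetical, and treating the macro
as undefined before C++11 is an assumption about the enclosing condition, which
the hunk below does not show.

```c
#include <assert.h>

/* Hypothetical stand-in for GCC's cxx_dialect values.  */
enum cxx_dialect_sketch { CXX98, CXX11, CXX14, CXX1Z };

/* Return the value __cpp_range_based_for would have after the patch,
   or 0 if the macro is not defined at all.  */
long
range_based_for_macro_value (enum cxx_dialect_sketch dialect)
{
  assert (dialect <= CXX1Z);
  if (dialect == CXX98)
    return 0;			/* Assumed: not defined before C++11.  */
  /* The generalized loop (201603) is only advertised for C++1z.  */
  return dialect <= CXX14 ? 200907L : 201603L;
}
```

C++11 and C++14 keep the original 200907 value; only the C++1z dialect reports
201603.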


commit f27fe6112b177226b112416457fa27be7cda394c
Author: Jason Merrill 
Date:   Tue Apr 19 15:13:42 2016 -0400

	* c-cppbuiltin.c (c_cpp_builtins): Fix __cpp_range_based_for.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 94523b8..408ad47 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -841,7 +841,8 @@ c_cpp_builtins (cpp_reader *pfile)
 	  cpp_define (pfile, "__cpp_lambdas=200907");
 	  if (cxx_dialect == cxx11)
 	cpp_define (pfile, "__cpp_constexpr=200704");
-	  cpp_define (pfile, "__cpp_range_based_for=201603");
+	  if (cxx_dialect <= cxx14)
+	cpp_define (pfile, "__cpp_range_based_for=200907");
 	  if (cxx_dialect <= cxx14)
 	cpp_define (pfile, "__cpp_static_assert=200410");
 	  cpp_define (pfile, "__cpp_decltype=200707");
@@ -877,6 +878,7 @@ c_cpp_builtins (cpp_reader *pfile)
 	  cpp_define (pfile, "__cpp_nested_namespace_definitions=201411");
 	  cpp_define (pfile, "__cpp_fold_expressions=201603");
 	  cpp_define (pfile, "__cpp_nontype_template_args=201411");
+	  cpp_define (pfile, "__cpp_range_based_for=201603");
 	}
   if (flag_concepts)
 	/* Use a value smaller than the 201507 specified in
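
For readers who want to see what the macro value buys user code, here is a hedged sketch. It assumes that the 201603 value signals support for a range whose begin() and end() return different types; Counter, CountTo, Range and sum_below are illustrative names that do not come from the patch.

```cpp
// End marker with a type distinct from the iterator type.
struct CountTo {
  int limit;
};

// Minimal iterator: dereference, pre-increment, compare against the marker.
struct Counter {
  int value;
  int operator*() const { return value; }
  Counter &operator++() { ++value; return *this; }
  bool operator!=(CountTo s) const { return value != s.limit; }
};

struct Range {
  int limit;
  Counter begin() const { return Counter{0}; }
  CountTo end() const { return CountTo{limit}; }
};

// Sums 0 .. n-1, using range-for only when the macro advertises the
// generalized form, and an explicit loop otherwise.
int sum_below(int n) {
  int total = 0;
#if defined(__cpp_range_based_for) && __cpp_range_based_for >= 201603
  for (int v : Range{n})
    total += v;
#else
  Range r{n};
  for (Counter it = r.begin(); it != r.end(); ++it)
    total += *it;
#endif
  return total;
}
```

Both branches compute the same result, so the macro only selects the spelling of the loop.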


Re: C, C++: New warning for memset without multiply by elt size

2016-04-22 Thread Bernd Schmidt

On 04/22/2016 05:31 PM, Jason Merrill wrote:

On Fri, Apr 22, 2016 at 11:24 AM, Bernd Schmidt  wrote:

On 04/22/2016 03:57 PM, Jason Merrill wrote:

This looks good, but can we move the code into c-common rather than
duplicate it?


Probably not without a fair amount of surgery, since the cdw_ and ds_ codes
are private to each frontend.


I don't see any cdw_ or ds_ in this patch.


Ah sorry, that was the other patch. I'm somewhat jetlagged. I'll look 
into this; I do remember there being some differences in the C and C++ 
parts, but it may just be the bit where it looks past a CONST_DECL for C++.



Bernd




Re: Compile libcilkrts with -funwind-tables (PR target/60290)

2016-04-22 Thread Jeff Law

On 04/06/2016 05:12 AM, Rainer Orth wrote:

I've finally gotten around to analyzing this testsuite failure on 32-bit
Solaris/x86:

FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -O1 -fcilkplus execution test
FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -O3 -fcilkplus execution test
FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -g -O2 -fcilkplus execution test
FAIL: g++.dg/cilk-plus/CK/catch_exc.cc  -g -fcilkplus execution test

The testcase aborts like this:

Thread 2 received signal SIGABRT, Aborted.
[Switching to Thread 1 (LWP 1)]
0xfe3ba3c5 in __lwp_sigqueue () from /lib/libc.so.1
(gdb) where
#0  0xfe3ba3c5 in __lwp_sigqueue () from /lib/libc.so.1
#1  0xfe3b2d4f in thr_kill () from /lib/libc.so.1
#2  0xfe2f64da in raise () from /lib/libc.so.1
#3  0xfe2c93ee in abort () from /lib/libc.so.1
#4  0xfe525b37 in _Unwind_Resume (exc=0x80a75a0)
 at /vol/gcc/src/hg/trunk/local/libgcc/unwind.inc:234
#5  0xfe783b85 in __cilkrts_gcc_rethrow (sf=0xfeffdb00)
 at /vol/gcc/src/hg/trunk/local/libcilkrts/runtime/except-gcc.cpp:589
#6  0xfe77f0ea in __cilkrts_rethrow (sf=0xfeffdb00)
 at /vol/gcc/src/hg/trunk/local/libcilkrts/runtime/cilk-abi.c:548
#7  0x080513a3 in my_test ()
 at /vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc:38
#8  0x080515bd in main ()
 at /vol/gcc/src/hg/trunk/local/gcc/testsuite/g++.dg/cilk-plus/CK/catch_exc.cc:62

The gcc_assert in _Unwind_Resume triggers since
_Unwind_RaiseException_Phase2 returned _URC_FATAL_PHASE2_ERROR.  I found
that x86_fallback_frame_state had been invoked for this pc:

0xfe77218e <__cilkrts_rethrow+30>:	add    $0x10,%esp

and returned _URC_END_OF_STACK, which is totally unexpected since
_Unwind_Find_FDE should have found it.  __cilkrts_rethrow is defined in
libcilkrts/cilk-abi.o, but in the 32-bit case EH info is missing:

32-bit:

ro@fuego 339 > elfdump -u .libs/libcilkrts.so|grep rethrow
0x18def  0x3558c  __cilkrts_gcc_rethrow
   [0x11fc]  initloc:   0x18def [ sdata4 pcrel ]  __cilkrts_gcc_rethrow

64-bit:

ro@fuego 341 > elfdump -u amd64/libcilkrts/.libs/libcilkrts.so|grep rethrow
0x1c639  0x2010  __cilkrts_rethrow
0x2388b  0x5510  __cilkrts_gcc_rethrow
[0x488]  initloc:   0x1c639 [ sdata4 pcrel ]  __cilkrts_rethrow
   [0x3988]  initloc:   0x2388b [ sdata4 pcrel ]  __cilkrts_gcc_rethrow

I traced this to -funwind-tables being set on 32-bit Linux/x86, while it
is unset on 32-bit Solaris/x86 due to

i386/i386.c (ix86_option_override_internal):

   if (opts->x_flag_asynchronous_unwind_tables == 2)
opts->x_flag_asynchronous_unwind_tables = !USE_IX86_FRAME_POINTER;

where i386/sol2.h has

#define USE_IX86_FRAME_POINTER 1

while the default is 0.

As expected, compiling libcilkrts with -funwind-tables (which is a no-op
on Linux/x86, Linux/x86_64, and Solaris/amd64) makes the failure go
away.

I'm uncertain if this is ok for mainline at this stage or has to wait
for gcc-7.  Once it goes into mainline, it's probably worth a backport
to all active release branches.

Thoughts?

Rainer


2016-04-04  Rainer Orth  

PR target/60290
* Makefile.am (GENERAL_FLAGS): Add -funwind-tables.
* Makefile.in: Regenerate.

OK.  Thanks for tracking this down, including verification that the
difference between Solaris and Linux is the latter having
-funwind-tables on by default.


jeff



[PATCH][AArch64] Adjust SIMD integer preference

2016-04-22 Thread Wilco Dijkstra
SIMD operations like combine prefer to have their operands in FP registers,
so increase the cost of integer registers slightly to avoid unnecessary int<->FP
moves. This improves register allocation of scalar SIMD operations.

OK for trunk?

ChangeLog:
2016-04-22  Wilco Dijkstra  

* gcc/config/aarch64/aarch64-simd.md (aarch64_combinez):
Add ? to integer variant.
(aarch64_combinez_be): Likewise.

--

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index e1f5682165cd22ca7d31643b8f4e7f631d99c2d8..d3830838867eec2098b71eb46b7343d0155acf7e 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2645,7 +2645,7 @@
 (define_insn "*aarch64_combinez"
   [(set (match_operand: 0 "register_operand" "=w,w,w")
 (vec_concat:
-  (match_operand:VD_BHSI 1 "general_operand" "w,r,m")
+  (match_operand:VD_BHSI 1 "general_operand" "w,?r,m")
   (match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")))]
   "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "@
@@ -2661,7 +2661,7 @@
   [(set (match_operand: 0 "register_operand" "=w,w,w")
 (vec_concat:
   (match_operand:VD_BHSI 2 "aarch64_simd_imm_zero" "Dz,Dz,Dz")
-  (match_operand:VD_BHSI 1 "general_operand" "w,r,m")))]
+  (match_operand:VD_BHSI 1 "general_operand" "w,?r,m")))]
   "TARGET_SIMD && BYTES_BIG_ENDIAN"
   "@
mov\\t%0.8b, %1.8b



Re: [PATCH PR70715]Expand simple operations in IV.base and check if it's the control_IV

2016-04-22 Thread Bin.Cheng
On Fri, Apr 22, 2016 at 4:26 PM, Christophe Lyon
 wrote:
> On 21 April 2016 at 11:03, Richard Biener  wrote:
>> On Wed, Apr 20, 2016 at 5:08 PM, Bin Cheng  wrote:
>>> Hi,
>>> As reported in PR70715, GCC failed to prove no-overflows of IV([n]) for 
>>> simple example like:
>>> int
>>> foo (char *p, unsigned n)
>>> {
>>>   while(n--)
>>> {
>>>   p[n]='A';
>>> }
>>>   return 0;
>>> }
>>> Actually, code has already been added to handle this form of loop when fixing
>>> PR68529.  Problem with this case is loop niter analyzer records control_IV 
>>> with its base expanded by calling expand_simple_operations.  This patch 
>>> simply adds code expanding BASE before we check its equality against 
>>> control_IV.base.  In the long run, we might want to remove the use of 
>>> expand_simple_operations.
>>>
>>> Bootstrap and test on x86_64.  Is it OK?
>>
>
> Hi Bin,
>
> On ARM and AArch64 bare-metal toolchains, this causes
>
> FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized "__builtin_fmin"
Hi Christophe,
As Kyrill pointed out, it doesn't look likely.  The case doesn't even
have a loop for the patch to apply?

Thanks,
bin
>
> Christophe
>
>> Ok.
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>>
>>>
>>> 2016-04-20  Bin Cheng  
>>>
>>> PR tree-optimization/70715
>>> * tree-ssa-loop-niter.c (loop_exits_before_overflow): Check equality
>>> after expanding BASE using expand_simple_operations.
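
The no-overflow claim for the loop in PR70715 can be checked by hand. The sketch below only illustrates the source-level behaviour, not the niter analysis itself, and visited_indices is a hypothetical helper: it records every index that the body of "while (n--)" sees. The decrement wraps only on the final test, after which the body is no longer entered, so the recorded indices run from n-1 down to 0.

```cpp
#include <vector>

// Records each value of n observed inside the loop body from the PR.
std::vector<unsigned> visited_indices(unsigned n) {
  std::vector<unsigned> seen;
  while (n--)          // test uses the old value, body sees old value - 1
    seen.push_back(n); // stands in for p[n] = 'A'
  return seen;
}
```

For a start value of 3 the body observes 2, 1 and 0, and for 0 the body never runs.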


Re: C, C++: New warning for memset without multiply by elt size

2016-04-22 Thread Jason Merrill
On Fri, Apr 22, 2016 at 11:24 AM, Bernd Schmidt  wrote:
> On 04/22/2016 03:57 PM, Jason Merrill wrote:
>> This looks good, but can we move the code into c-common rather than
>> duplicate it?
>
> Probably not without a fair amount of surgery, since the cdw_ and ds_ codes
> are private to each frontend.

I don't see any cdw_ or ds_ in this patch.

Jason


Re: [testsuite] gcc-dg: handle all return values when shouldfail is set

2016-04-22 Thread Jeff Law

On 04/13/2016 02:21 AM, Christophe Lyon wrote:

Hi,

While investigating stability problems when running GCC validations,
I fixed DejaGnu to properly handle cases where it cannot parse the
testcase output:
http://lists.gnu.org/archive/html/dejagnu/2016-04/msg8.html

This means that such cases now return "unresolved" which confuses
${tool}_load in gcc-dg.exp, as it currently only handles "pass" and "fail".

The attached small patch fixes this.

This is probably for stage 1 only I guess.

OK?

OK.

Jeff


Re: [PATCH PR70715]Expand simple operations in IV.base and check if it's the control_IV

2016-04-22 Thread Kyrill Tkachov

Hi Christophe,

On 22/04/16 16:26, Christophe Lyon wrote:

On 21 April 2016 at 11:03, Richard Biener  wrote:

On Wed, Apr 20, 2016 at 5:08 PM, Bin Cheng  wrote:

Hi,
As reported in PR70715, GCC failed to prove no-overflows of IV([n]) for 
simple example like:
int
foo (char *p, unsigned n)
{
   while(n--)
 {
   p[n]='A';
 }
   return 0;
}
Actually, code has already been added to handle this form of loop when fixing
PR68529.  Problem with this case is loop niter analyzer records control_IV with 
its base expanded by calling expand_simple_operations.  This patch simply adds 
code expanding BASE before we check its equality against control_IV.base.  In 
the long run, we might want to remove the use of expand_simple_operations.

Bootstrap and test on x86_64.  Is it OK?

Hi Bin,

On ARM and AArch64 bare-metal toolchains, this causes

FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized "__builtin_fmin"


Are you sure it's Bin's patch? I've seen that test fail on aarch64-none-elf
since it was added recently.
See https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01231.html

Kyrill


Christophe


Ok.

Richard.


Thanks,
bin


2016-04-20  Bin Cheng  

 PR tree-optimization/70715
 * tree-ssa-loop-niter.c (loop_exits_before_overflow): Check equality
 after expanding BASE using expand_simple_operations.




Re: [PATCH PR70715]Expand simple operations in IV.base and check if it's the control_IV

2016-04-22 Thread Christophe Lyon
On 21 April 2016 at 11:03, Richard Biener  wrote:
> On Wed, Apr 20, 2016 at 5:08 PM, Bin Cheng  wrote:
>> Hi,
>> As reported in PR70715, GCC failed to prove no-overflows of IV([n]) for 
>> simple example like:
>> int
>> foo (char *p, unsigned n)
>> {
>>   while(n--)
>> {
>>   p[n]='A';
>> }
>>   return 0;
>> }
>> Actually, code has already been added to handle this form of loop when fixing
>> PR68529.  Problem with this case is loop niter analyzer records control_IV 
>> with its base expanded by calling expand_simple_operations.  This patch 
>> simply adds code expanding BASE before we check its equality against 
>> control_IV.base.  In the long run, we might want to remove the use of 
>> expand_simple_operations.
>>
>> Bootstrap and test on x86_64.  Is it OK?
>

Hi Bin,

On ARM and AArch64 bare-metal toolchains, this causes

FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized "__builtin_fmin"

Christophe

> Ok.
>
> Richard.
>
>> Thanks,
>> bin
>>
>>
>> 2016-04-20  Bin Cheng  
>>
>> PR tree-optimization/70715
>> * tree-ssa-loop-niter.c (loop_exits_before_overflow): Check equality
>> after expanding BASE using expand_simple_operations.


Re: C, C++: New warning for memset without multiply by elt size

2016-04-22 Thread Bernd Schmidt

On 04/22/2016 03:57 PM, Jason Merrill wrote:

This looks good, but can we move the code into c-common rather than
duplicate it?


Probably not without a fair amount of surgery, since the cdw_ and ds_ 
codes are private to each frontend.



Bernd




Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-22 Thread H.J. Lu
On Fri, Apr 22, 2016 at 7:19 AM, H.J. Lu  wrote:
> On Fri, Apr 22, 2016 at 5:11 AM, Uros Bizjak  wrote:
>> On Thu, Apr 21, 2016 at 10:58 PM, H.J. Lu  wrote:
>>
>>> Here is the updated patch with my standard_sse_constant_p change and
>>> your SSE/AVX pattern change.  I didn't include your
>>> standard_sse_constant_opcode since it didn't compile nor is needed
>>> for this purpose.
>>
>> H.J.,
>>
>> please test the attached patch that finally rewrites and improves SSE
>> constants handling.
>>
>> This is what I want to commit, a follow-up patch will further clean
>> standard_sse_constant_opcode wrt TARGET_AVX512VL.
>>
>
> It doesn't address my problem which is "Allow all 1s of integer as
> standard SSE constants".  The key here is "integer".  I'd like to use
> SSE/AVX store TI/OI/XI integers with -1.
>
> --
> H.J.

I am testing this on top of yours:

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 27c3bbd..677aa71 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11136,7 +11136,20 @@ standard_sse_constant_p (rtx x, machine_mode pred_mode)
   /* VOIDmode integer constant, infer mode from the predicate.  */
   if (mode == VOIDmode)
 mode = pred_mode;
-
+  if (CONST_INT_P (x))
+{
+  /* If mode != VOIDmode, standard_sse_constant_p must be called:
+ 1. On TImode with SSE2.
+ 2. On OImode with AVX2.
+ 3. On XImode with AVX512F.
+   */
+  if ((HOST_WIDE_INT) INTVAL (x) == HOST_WIDE_INT_M1
+  && (mode == VOIDmode
+  || (mode == TImode && TARGET_SSE2)
+  || (mode == OImode && TARGET_AVX2)
+  || (mode == XImode && TARGET_AVX512F)))
+ return 2;
+}
   else if (all_ones_operand (x, VOIDmode))
 switch (GET_MODE_SIZE (mode))
   {



-- 
H.J.


[PATCH][AArch64] print_operand should not fallthrough from register operand into generic operand

2016-04-22 Thread Wilco Dijkstra
Some patterns are using '%w2' for immediate operands, which means that a zero
immediate is actually emitted as 'wzr' or 'xzr'. This not only changes an
immediate operand into a register operand but may emit illegal instructions
from legal RTL (e.g. ORR x0, SP, xzr rather than ORR x0, SP, 0).

Remove the fallthrough in aarch64_print_operand from the 'w' and 'x' case into
the '0' case that created this issue. Modify a few patterns to use '%2' rather
than '%w2' for an immediate or memory operand so they now print correctly
without the fallthrough.
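
The difference between the two modifiers can be illustrated with a toy model. This is a sketch under stated assumptions, not aarch64_print_operand: Operand, print_w and print_plain are hypothetical, the model ignores the stack pointer and floating-point zero, and the error string merely imitates the output_operand_lossage message added by the patch.

```cpp
#include <string>

// Hypothetical operand: either a general register or an integer immediate.
struct Operand {
  bool is_reg;
  int regno;
  long imm;
};

// Model of '%w' after the patch: registers and a zero immediate are
// accepted, anything else is rejected instead of falling through.
std::string print_w(const Operand &op) {
  if (op.is_reg)
    return "w" + std::to_string(op.regno);
  if (op.imm == 0)
    return "wzr";
  return "<invalid operand for '%w'>";
}

// Model of the plain '%N' form: an immediate stays an immediate.
std::string print_plain(const Operand &op) {
  if (op.is_reg)
    return "x" + std::to_string(op.regno);
  return std::to_string(op.imm);
}
```

In this model a zero immediate prints as "wzr" through print_w but as "0" through print_plain, which is why the immediate alternatives in the patterns switch from '%w2' to '%2'.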

OK for trunk?

(note this requires https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01265.html to
be committed first)

ChangeLog:
2016-04-22  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.md
(add3_compareC_cconly_imm): Remove use of %w for immediate.
(add3_compareC_imm): Likewise.
(si3_uxtw): Split into register and immediate variants.
(andsi3_compare0_uxtw): Likewise.
(and3_compare0): Likewise.
(and3nr_compare0): Likewise.
(stack_protect_test_): Don't use %x for memory operands.
* config/aarch64/aarch64.c (aarch64_print_operand):
Remove fallthrough from 'w' and 'x' case into '0' case.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 881dc52e2de03231abb217a9ce22cbb1cc44bc6c..bcef50825c8315c39e29dbe57c387ea2a4fe445d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4608,7 +4608,8 @@ aarch64_print_operand (FILE *f, rtx x, int code)
  break;
}
 
-  /* Fall through */
+  output_operand_lossage ("invalid operand for '%%%c'", code);
+  return;
 
 case 0:
   /* Print a normal operand, if it's a general register, then we
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3e474bb0939c5786a181b67173c62ada73c4bd82..60a20366d16fb1d4eccb43ac32cfc1f0e6096cd0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1779,7 +1779,7 @@
   "aarch64_zero_extend_const_eq (mode, operands[2],
 mode, operands[1])"
   "@
-  cmn\\t%0, %1
+  cmn\\t%0, %1
   cmp\\t%0, #%n1"
   [(set_attr "type" "alus_imm")]
 )
@@ -1811,7 +1811,7 @@
   "aarch64_zero_extend_const_eq (mode, operands[3],
  mode, operands[2])"
   "@
-  adds\\t%0, %1, %2
+  adds\\t%0, %1, %2
   subs\\t%0, %1, #%n2"
   [(set_attr "type" "alus_imm")]
 )
@@ -3418,7 +3418,9 @@
  (LOGICAL:SI (match_operand:SI 1 "register_operand" "%r,r")
 (match_operand:SI 2 "aarch64_logical_operand" "r,K"]
   ""
-  "\\t%w0, %w1, %w2"
+  "@
+   \\t%w0, %w1, %w2
+   \\t%w0, %w1, %2"
   [(set_attr "type" "logic_reg,logic_imm")]
 )
 
@@ -3431,7 +3433,9 @@
(set (match_operand:GPI 0 "register_operand" "=r,r")
(and:GPI (match_dup 1) (match_dup 2)))]
   ""
-  "ands\\t%0, %1, %2"
+  "@
+   ands\\t%0, %1, %2
+   ands\\t%0, %1, %2"
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
@@ -3445,7 +3449,9 @@
(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI (and:SI (match_dup 1) (match_dup 2]
   ""
-  "ands\\t%w0, %w1, %w2"
+  "@
+   ands\\t%w0, %w1, %w2
+   ands\\t%w0, %w1, %2"
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
@@ -3799,7 +3805,9 @@
  (match_operand:GPI 1 "aarch64_logical_operand" "r,"))
 (const_int 0)))]
   ""
-  "tst\\t%0, %1"
+  "@
+   tst\\t%0, %1
+   tst\\t%0, %1"
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
@@ -5187,7 +5195,7 @@
 UNSPEC_SP_TEST))
(clobber (match_scratch:PTR 3 "="))]
   ""
-  "ldr\t%3, %x1\;ldr\t%0, %x2\;eor\t%0, %3, %0"
+  "ldr\t%3, %1\;ldr\t%0, %2\;eor\t%0, %3, %0"
   [(set_attr "length" "12")
(set_attr "type" "multiple")])
 



Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-22 Thread H.J. Lu
On Fri, Apr 22, 2016 at 5:11 AM, Uros Bizjak  wrote:
> On Thu, Apr 21, 2016 at 10:58 PM, H.J. Lu  wrote:
>
>> Here is the updated patch with my standard_sse_constant_p change and
>> your SSE/AVX pattern change.  I didn't include your
>> standard_sse_constant_opcode since it didn't compile nor is needed
>> for this purpose.
>
> H.J.,
>
> please test the attached patch that finally rewrites and improves SSE
> constants handling.
>
> This is what I want to commit, a follow-up patch will further clean
> standard_sse_constant_opcode wrt TARGET_AVX512VL.
>

It doesn't address my problem which is "Allow all 1s of integer as
standard SSE constants".  The key here is "integer".  I'd like to use
SSE/AVX to store TI/OI/XI integers with -1.

-- 
H.J.


Re: [PATCH 2/2] (header usage fix) include c++ headers in system.h

2016-04-22 Thread David Edelsohn
On Fri, Apr 22, 2016 at 6:02 AM, Szabolcs Nagy  wrote:
> Some gcc source files include standard headers after
> "system.h" but those headers may declare and use poisoned
> symbols, they also cannot be included before "system.h"
> because they might depend on macro definitions from there,
> so they must be included in system.h.
>
> This patch fixes the use of <list>, <map>, <set>, <vector>
> and <algorithm> headers, by using appropriate
> INCLUDE_{LIST, MAP, SET, VECTOR, ALGORITHM} macros.
> (Note that there are some other system header uses which
> did not get fixed.)
>
> Build tested on aarch64-*-gnu, sh-*-musl, x86_64-*-musl and
> bootstrapped x86_64-*-gnu (together with PATCH 1/2).
>
> is this ok for AIX?

It should be okay on AIX.

> OK for trunk?
>
> This would be nice to fix in gcc-6 too, because at least
> with musl libc the bootstrap is broken.
>
> gcc/ChangeLog:
>
> 2016-04-22  Szabolcs Nagy  
>
> * system.h (list, map, set, vector): Include conditionally.
> * auto-profile.c (INCLUDE_MAP, INCLUDE_SET): Define.
> * graphite-isl-ast-to-gimple.c (INCLUDE_MAP): Define.
> * ipa-icf.c (INCLUDE_LIST): Define.
> * config/aarch64/cortex-a57-fma-steering.c (INCLUDE_LIST): Define.
> * config/sh/sh.c (INCLUDE_VECTOR): Define.
> * config/sh/sh_treg_combine.cc (INCLUDE_ALGORITHM): Define.
> (INCLUDE_LIST, INCLUDE_VECTOR): Define.
> * cp/logic.cc (INCLUDE_LIST): Define.
> * fortran/trans-common.c (INCLUDE_MAP): Define.


[PATCH][AArch64] Fix shift attributes

2016-04-22 Thread Wilco Dijkstra
This patch fixes the attributes of integer immediate shifts which were
incorrectly modelled as register controlled shifts. Also change EXTR attribute
to being a rotate.

OK for trunk?

ChangeLog:
2016-04-22  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.md (aarch64_ashl_sisd_or_int_3):
Split integer shifts into shift_reg and bfm.
(aarch64_lshr_sisd_or_int_3): Likewise.
(aarch64_ashr_sisd_or_int_3): Likewise.
(ror3_insn): Likewise.
(si3_insn_uxtw): Likewise.
(3_insn): Change to rotate_imm.
(extr5_insn_alt): Likewise.
(extrsi5_insn_uxtw): Likewise.
(extrsi5_insn_uxtw_alt): Likewise.

---
 gcc/config/aarch64/aarch64.md | 61 ---
 1 file changed, 34 insertions(+), 27 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index af52722..85c8bc9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3701,33 +3701,35 @@
 
 ;; Logical left shift using SISD or Integer instruction
 (define_insn "*aarch64_ashl_sisd_or_int_3"
-  [(set (match_operand:GPI 0 "register_operand" "=r,w,w")
+  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,w")
 (ashift:GPI
-  (match_operand:GPI 1 "register_operand" "r,w,w")
-  (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "rUs,Us,w")))]
+  (match_operand:GPI 1 "register_operand" "r,r,w,w")
+  (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "Us,r,Us,w")))]
   ""
   "@
+   lsl\t%0, %1, %2
lsl\t%0, %1, %2
shl\t%0, %1, %2
ushl\t%0, %1, %2"
-  [(set_attr "simd" "no,yes,yes")
-   (set_attr "type" "shift_reg,neon_shift_imm, neon_shift_reg")]
+  [(set_attr "simd" "no,no,yes,yes")
+   (set_attr "type" "bfm,shift_reg,neon_shift_imm, neon_shift_reg")]
 )
 
 ;; Logical right shift using SISD or Integer instruction
 (define_insn "*aarch64_lshr_sisd_or_int_3"
-  [(set (match_operand:GPI 0 "register_operand" "=r,w,,")
+  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,,")
 (lshiftrt:GPI
-  (match_operand:GPI 1 "register_operand" "r,w,w,w")
-  (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "rUs,Us,w,0")))]
+  (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
+  (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "Us,r,Us,w,0")))]
   ""
   "@
+   lsr\t%0, %1, %2
lsr\t%0, %1, %2
ushr\t%0, %1, %2
#
#"
-  [(set_attr "simd" "no,yes,yes,yes")
-   (set_attr "type" "shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
+  [(set_attr "simd" "no,no,yes,yes,yes")
+   (set_attr "type" "bfm,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
 )
 
 (define_split
@@ -3762,18 +3764,19 @@
 
 ;; Arithmetic right shift using SISD or Integer instruction
 (define_insn "*aarch64_ashr_sisd_or_int_3"
-  [(set (match_operand:GPI 0 "register_operand" "=r,w,,")
+  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,,")
 (ashiftrt:GPI
-  (match_operand:GPI 1 "register_operand" "r,w,w,w")
-  (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" "rUs,Us,w,0")))]
+  (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
+  (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" "Us,r,Us,w,0")))]
   ""
   "@
+   asr\t%0, %1, %2
asr\t%0, %1, %2
sshr\t%0, %1, %2
#
#"
-  [(set_attr "simd" "no,yes,yes,yes")
-   (set_attr "type" "shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
+  [(set_attr "simd" "no,no,yes,yes,yes")
+   (set_attr "type" "bfm,shift_reg,neon_shift_imm,neon_shift_reg,neon_shift_reg")]
 )
 
 (define_split
@@ -3865,21 +3868,25 @@
   [(set (match_operand:GPI 0 "register_operand" "=r,r")
  (rotatert:GPI
(match_operand:GPI 1 "register_operand" "r,r")
-   (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "r,Us")))]
+   (match_operand:QI 2 "aarch64_reg_or_shift_imm_" "Us,r")))]
   ""
-  "ror\\t%0, %1, %2"
-  [(set_attr "type" "shift_reg, rotate_imm")]
+  "@
+   ror\\t%0, %1, %2
+   ror\\t%0, %1, %2"
+  [(set_attr "type" "rotate_imm,shift_reg")]
 )
 
 ;; zero_extend version of above
 (define_insn "*si3_insn_uxtw"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=r,r")
(zero_extend:DI (SHIFT:SI
-(match_operand:SI 1 "register_operand" "r")
-(match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "rUss"]
+(match_operand:SI 1 "register_operand" "r,r")
+(match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"]
   ""
-  "\\t%w0, %w1, %w2"
-  [(set_attr "type" "shift_reg")]
+  "@
+   \\t%w0, %w1, %2
+   \\t%w0, %w1, %w2"
+  [(set_attr "type" "bfm,shift_reg")]
 )
 
 (define_insn "*3_insn"
@@ -3903,7 +3910,7 @@
   "UINTVAL (operands[3]) < GET_MODE_BITSIZE (mode) &&
   (UINTVAL (operands[3]) + UINTVAL (operands[4]) == GET_MODE_BITSIZE (mode))"
   "extr\\t%0, %1, %2, %4"
-  [(set_attr "type" "shift_imm")]
+  [(set_attr "type" 

[patch] fix openacc data clause errors in c/c++ for PR70688

2016-04-22 Thread Cesar Philippidis
This patch makes the c and c++ FEs aware that acc data clauses are not
omp maps. I've also taught those FEs that reduction clauses are not data
clauses in openacc, which fixes PR70688.

I don't really like how *finish_omp_clauses has default arguments for
declare_simd, is_cilk and not is_oacc, but I went with that because it
seems like the thing to do here. Maybe it would be better if all of
those arguments were consolidated into a single bitmask or enum?
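
One possible shape for the consolidation suggested above is a set of bit flags. The sketch below is hypothetical and is not part of the patch: ClauseContext, its enumerators, DecodedContext and decode are invented names, and the decoded struct only mirrors the four bool parameters that c_finish_omp_clauses takes after this change.

```cpp
// Hypothetical flag set replacing the trailing bool parameters.
enum ClauseContext : unsigned {
  CLAUSE_CTX_NONE = 0,
  CLAUSE_CTX_OMP = 1u << 0,
  CLAUSE_CTX_DECLARE_SIMD = 1u << 1,
  CLAUSE_CTX_CILK = 1u << 2,
  CLAUSE_CTX_OACC = 1u << 3
};

// The four booleans as they appear in the current signature.
struct DecodedContext {
  bool is_omp;
  bool declare_simd;
  bool is_cilk;
  bool is_oacc;
};

// Expands a flag word back into the individual booleans.
DecodedContext decode(unsigned flags) {
  DecodedContext d;
  d.is_omp = (flags & CLAUSE_CTX_OMP) != 0;
  d.declare_simd = (flags & CLAUSE_CTX_DECLARE_SIMD) != 0;
  d.is_cilk = (flags & CLAUSE_CTX_CILK) != 0;
  d.is_oacc = (flags & CLAUSE_CTX_OACC) != 0;
  return d;
}
```

With such flags, a call that currently passes (clauses, false, false, false, true) would pass a single CLAUSE_CTX_OACC value, which reads more clearly at the call site than a run of positional booleans.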

Is this OK for trunk and possibly gcc-6.2?

Cesar
2016-04-21  Cesar Philippidis  

	gcc/c/
	PR c/70688
	* c-parser.c (c_parser_oacc_all_clauses): Update call to
	c_finish_omp_clauses.
	(c_parser_oacc_cache): Likewise.
	(c_parser_oacc_loop): Likewise.
	* c-tree.h (c_finish_omp_clauses): Add bool argument.
	* c-typeck.c (c_finish_omp_clauses): Add bool is_oacc argument.
	Use it to report OpenACC-specific diagnostics.

	gcc/cp/
	* cp-tree.h (finish_omp_clauses): Add bool argument.
	* parser.c (cp_parser_oacc_all_clauses): Update call to
	finish_omp_clauses.
	(cp_parser_oacc_cache): Likewise.
	(cp_parser_oacc_loop): Likewise.
	* pt.c (tsubst_attribute): Add bool is_oacc arguments.  Propagate it
	to finish_omp_clauses. 
	(tsubst_omp_clauses): Update calls to tsubst_omp_clauses.
	(tsubst_expr): Update calls to tsubst_omp_clauses.
	* semantics.c (finish_omp_clauses): Add bool is_oacc argument.
	Use it to report OpenACC-specific diagnostics.

	gcc/testsuite/
	* c-c++-common/goacc/data-clause-duplicate-1.c: Update expected errors.
	* c-c++-common/goacc/deviceptr-1.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/reduction-5.c: Likewise.
	* c-c++-common/goacc/pr70688.c: New test.
	* g++.dg/goacc/data-1.C: New test.
	* g++.dg/goacc/data-2.C: New test.
	* g++.dg/gomp/template-data.C: New test.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index bdd669d..11ba146 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -13177,7 +13177,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
   c_parser_skip_to_pragma_eol (parser);
 
   if (finish_p)
-return c_finish_omp_clauses (clauses, false);
+return c_finish_omp_clauses (clauses, false, false, false, true);
 
   return clauses;
 }
@@ -13497,7 +13497,7 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
   tree stmt, clauses;
 
   clauses = c_parser_omp_var_list_parens (parser, OMP_CLAUSE__CACHE_, NULL);
-  clauses = c_finish_omp_clauses (clauses, false);
+  clauses = c_finish_omp_clauses (clauses, false, false, false, true);
 
   c_parser_skip_to_pragma_eol (parser);
 
@@ -13831,9 +13831,9 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
 {
   clauses = c_oacc_split_loop_clauses (clauses, cclauses);
   if (*cclauses)
-	*cclauses = c_finish_omp_clauses (*cclauses, false);
+	*cclauses = c_finish_omp_clauses (*cclauses, false, false, false, true);
   if (clauses)
-	clauses = c_finish_omp_clauses (clauses, false);
+	clauses = c_finish_omp_clauses (clauses, false, false, false, true);
 }
 
   tree block = c_begin_compound_stmt (true);
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 4633182..0fa8a69 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -661,7 +661,8 @@ extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern void c_finish_omp_cancel (location_t, tree);
 extern void c_finish_omp_cancellation_point (location_t, tree);
-extern tree c_finish_omp_clauses (tree, bool, bool = false, bool = false);
+extern tree c_finish_omp_clauses (tree, bool, bool = false, bool = false,
+  bool = false);
 extern tree c_build_va_arg (location_t, tree, location_t, tree);
 extern tree c_finish_transaction (location_t, tree, int);
 extern bool c_tree_equal (tree, tree);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 58c2139..6731fd0 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -12497,10 +12497,10 @@ c_find_omp_placeholder_r (tree *tp, int *, void *data)
 
 tree
 c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd,
-		  bool is_cilk)
+		  bool is_cilk, bool is_oacc)
 {
   bitmap_head generic_head, firstprivate_head, lastprivate_head;
-  bitmap_head aligned_head, map_head, map_field_head;
+  bitmap_head aligned_head, map_head, map_field_head, oacc_reduction_head;
   tree c, t, type, *pc;
   tree simdlen = NULL_TREE, safelen = NULL_TREE;
   bool branch_seen = false;
@@ -12517,6 +12517,7 @@ c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd,
   bitmap_initialize (_head, _default_obstack);
   bitmap_initialize (_head, _default_obstack);
   bitmap_initialize (_field_head, _default_obstack);
+  bitmap_initialize (_reduction_head, _default_obstack);
 
   for (pc = , c = clauses; c ; c = *pc)
 {
@@ -12854,6 +12855,16 @@ c_finish_omp_clauses (tree clauses, bool is_omp, bool declare_simd,
 			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 	  remove = true;
 	}
+	  else if (is_oacc 

Re: [PATCH] add support for placing variables in shared memory

2016-04-22 Thread Alexander Monakov
On Fri, 22 Apr 2016, Nathan Sidwell wrote:
> On 04/21/16 10:25, Alexander Monakov wrote:
> > On Thu, 21 Apr 2016, Nathan Sidwell wrote:
> > > What is the rationale for a new attribute, rather than leveraging the
> > > existing section(".shared") machinery?
> >
> > Section switching does not work at all on NVPTX in GCC at present.  PTX
> > assembly has no notion of different data sections, so the backend does not
> > advertise section switching capability to the middle end.
> 
> Correct.  How is that relevant?  Look at DECL_SECTION_NAME in
> encode_section_info.

Middle end rejects the "section" attribute:

echo 'int v __attribute__((section("foo")));' |
  x86_64-pc-linux-gnu-accel-nvptx-none-gcc -xc - -o /dev/null
:1:5: error: section attributes are not supported for this target

> > I avoided using 'static' because it applies to external declarations as
> > well.
> > Other backends use "%qE attribute not allowed with auto storage class"; I'll
> > be happy to switch to that for consistency.
> 
> 
> Why can it not be applied to external declarations?  Doesn't '.extern .shared
> whatever' work?

It can and it does; the point was to avoid ambiguity; the precise variable
class would be 'variables with static storage duration' (doesn't matter with
external or internal linkage, defined or declared), but due to possible
confusion with variables declared with 'static' keyword, it's useful to avoid
that wording.

> Why can it not apply to  variables of auto storage?  I.e. function scope,
> function lifetime?  That would seem to be a useful property.

Because PTX does not support auto storage semantics for .shared data.  It's
statically allocated at link time.

> What happens if an initializer is present, is it silently ignored?

GCC accepts and reemits it in assembly output (if non-zero), and ptxas rejects
it ("syntax error").

Alexander


Re: C, C++: New warning for memset without multiply by elt size

2016-04-22 Thread Jason Merrill
This looks good, but can we move the code into c-common rather than
duplicate it?

Jason


On Fri, Apr 22, 2016 at 9:30 AM, Bernd Schmidt  wrote:
> We had this problem in the C frontend until very recently:
>
>   int array[some_count];
>   memset (array, 0, some_count);
>
> which forgets to multiply by sizeof int. The following patch implements a
> new warning option for this.
>
> Bootstrapped and tested (a while ago, will retest) on x86_64-linux. Ok for
> trunk?
>
>
> Bernd
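
The effect of the missing multiplication is easy to reproduce. The following sketch is an illustration, not the testcase from the patch: zeroed_elements is a hypothetical helper that pre-fills an int array with nonzero bytes, clears it with either the correct byte count or the bare element count, and reports how many elements ended up zero.

```cpp
#include <cstddef>
#include <cstring>

const std::size_t some_count = 8;

// Returns the number of array elements that are zero after the memset.
std::size_t zeroed_elements(bool multiply_by_elt_size) {
  int array[some_count];
  std::memset(array, 0xff, sizeof array); // make every element nonzero

  // The bug pattern passes the element count where a byte count is expected.
  std::size_t nbytes =
      multiply_by_elt_size ? some_count * sizeof(int) : some_count;
  std::memset(array, 0, nbytes);

  std::size_t zeroed = 0;
  for (std::size_t i = 0; i < some_count; ++i)
    if (array[i] == 0)
      ++zeroed;
  return zeroed;
}
```

With the multiplication all some_count elements are cleared; without it only some_count / sizeof(int) of them are, and the rest of the array keeps its old contents.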


Re: C, C++: Fix PR 69733 (bad location for ignored qualifiers warning)

2016-04-22 Thread Jason Merrill
The C++ change is OK.

Jason


[PATCH, committed] Fortran cleanup-submodules

2016-04-22 Thread Dominique d'Humières
I have committed the following as obvious at revision r235367. Is it OK to
backport it to the gcc-6 branch?

Dominique

Index: gcc/testsuite/ChangeLog
===
--- gcc/testsuite/ChangeLog (revision 235364)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2016-04-22  Dominique d'Humieres  
+
+   * gfortran.dg/submodule_14.f08: Add cleanup-submodules.
+   * gfortran.dg/submodule_15.f08: Likewise.
+
 2016-04-22  Richard Biener  
 
PR tree-optimization/70740
Index: gcc/testsuite/gfortran.dg/submodule_14.f08
===
--- gcc/testsuite/gfortran.dg/submodule_14.f08  (revision 235364)
+++ gcc/testsuite/gfortran.dg/submodule_14.f08  (working copy)
@@ -46,4 +46,4 @@
   x = 10
   if (fcn1 (x) .ne. 0) call abort
 end
-
+! { dg-final { cleanup-submodules "test@testson" } }
Index: gcc/testsuite/gfortran.dg/submodule_15.f08
===
--- gcc/testsuite/gfortran.dg/submodule_15.f08  (revision 235364)
+++ gcc/testsuite/gfortran.dg/submodule_15.f08  (working copy)
@@ -56,3 +56,4 @@
   incr = 1
   if (a3(i) .ne. 11) call abort
 end
+! { dg-final { cleanup-submodules "a@a_son" } }



Re: C++ PATCH for c++/70744 (wrong-code with x ?: y extension)

2016-04-22 Thread Jason Merrill

On 04/22/2016 07:50 AM, Marek Polacek wrote:
> This PR shows that we generate wrong code with the x ?: y extension in case the
> first operand contains either predecrement or preincrement.  The problem is
> that we don't emit SAVE_EXPR, thus the operand is evaluated twice, which it
> should not be.
>
> While ++i or --i can be lvalues in C++, i++ or i-- can not.  The code in
> build_conditional_expr_1 has:
>   /* Make sure that lvalues remain lvalues.  See g++.oliva/ext1.C.  */
>   if (real_lvalue_p (arg1))
>     arg2 = arg1 = stabilize_reference (arg1);
>   else
>     arg2 = arg1 = save_expr (arg1);
> so for ++i/--i we call stabilize_reference, but that doesn't know anything
> about PREINCREMENT_EXPR or PREDECREMENT_EXPR and just returns the same
> expression, so SAVE_EXPR isn't created.
>
> I think one fix would be to teach stabilize_reference what to do with those,
> similarly to how we handle COMPOUND_EXPR there.


Your change will turn the expression into an rvalue, so that isn't 
enough.  We need to turn the expression into some sort of _REF before 
passing it to stabilize_reference, perhaps by factoring out the 
cast-to-reference code from force_paren_expr.  This should probably be 
part of a cp_stabilize_reference function that replaces all uses of 
stabilize_reference in the front end.


Jason



Re: [testsuite] gcc-dg: handle all return values when shouldfail is set

2016-04-22 Thread Christophe Lyon
ping?

On 13 April 2016 at 10:21, Christophe Lyon  wrote:
> Hi,
>
> While investigating stability problems when running GCC validations,
> I fixed DejaGnu to properly handle cases where it cannot parse the
> testcase output:
> http://lists.gnu.org/archive/html/dejagnu/2016-04/msg8.html
>
> This means that such cases now return "unresolved" which confuses
> ${tool}_load in gcc-dg.exp, as it currently only handles "pass" and "fail".
>
> The attached small patch fixes this.
>
> This is probably for stage 1 only I guess.
>
> OK?
>
> Christophe.


C, C++: New warning for memset without multiply by elt size

2016-04-22 Thread Bernd Schmidt

We had this problem in the C frontend until very recently:

  int array[some_count];
  memset (array, 0, some_count);

which forgets to multiply by sizeof int. The following patch implements 
a new warning option for this.
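
To make the failure mode concrete, here is a minimal sketch (not taken from the patch; the function names are made up for illustration) that contrasts the two calls.  It assumes nothing beyond the standard library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Count how many leading elements of A are zero.
static std::size_t zeroed_elements(const int *a, std::size_t n)
{
  assert(a != nullptr);
  std::size_t i = 0;
  while (i < n && a[i] == 0)
    ++i;
  return i;
}

// Buggy form: the length is the element count, so only 8 *bytes* are cleared.
static std::size_t clear_with_count()
{
  int array[8];
  for (int &e : array)
    e = -1;
  std::memset(array, 0, 8);             // forgets "* sizeof (int)"
  return zeroed_elements(array, 8);
}

// Intended form: the length is the size of the whole array in bytes.
static std::size_t clear_with_sizeof()
{
  int array[8];
  for (int &e : array)
    e = -1;
  std::memset(array, 0, sizeof array);
  return zeroed_elements(array, 8);
}
```

The first function leaves most of the array untouched whenever sizeof (int) is greater than 1, which is the situation the proposed warning is meant to flag.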


Bootstrapped and tested (a while ago, will retest) on x86_64-linux. Ok 
for trunk?



Bernd
	* doc/invoke.texi (Warning Options): Add -Wmemset-elt-size.
	(-Wmemset-elt-size): New item.
c-family/
	* c.opt (Wmemset-elt-size): New option.
c/
	* c-parser.c (c_parser_postfix_expression_after_primary): Warn for
	memset with element count rather than array size.
cp/
	* parser.c (cp_parser_postfix_expression): Warn for
	memset with element count rather than array size.
testsuite/
	* c-c++-common/memset-array.c: New test.

Index: gcc/c/c-parser.c
===
--- gcc/c/c-parser.c	(revision 234183)
+++ gcc/c/c-parser.c	(working copy)
@@ -8240,18 +8240,46 @@ c_parser_postfix_expression_after_primar
 	  expr.value, exprlist,
 	  sizeof_arg,
 	  sizeof_ptr_memacc_comptypes);
-	  if (warn_memset_transposed_args
-	  && TREE_CODE (expr.value) == FUNCTION_DECL
+	  if (TREE_CODE (expr.value) == FUNCTION_DECL
 	  && DECL_BUILT_IN_CLASS (expr.value) == BUILT_IN_NORMAL
 	  && DECL_FUNCTION_CODE (expr.value) == BUILT_IN_MEMSET
-	  && vec_safe_length (exprlist) == 3
-	  && integer_zerop ((*exprlist)[2])
-	  && (literal_zero_mask & (1 << 2)) != 0
-	  && (!integer_zerop ((*exprlist)[1])
-		  || (literal_zero_mask & (1 << 1)) == 0))
-	warning_at (expr_loc, OPT_Wmemset_transposed_args,
-			"%<memset%> used with constant zero length parameter; "
-			"this could be due to transposed parameters");
+	  && vec_safe_length (exprlist) == 3)
+	{
+	  if (warn_memset_transposed_args
+		  && integer_zerop ((*exprlist)[2])
+		  && (literal_zero_mask & (1 << 2)) != 0
+		  && (!integer_zerop ((*exprlist)[1])
+		  || (literal_zero_mask & (1 << 1)) == 0))
+		warning_at (expr_loc, OPT_Wmemset_transposed_args,
+			"%<memset%> used with constant zero length "
+			"parameter; this could be due to transposed "
+			"parameters");
+	  if (warn_memset_elt_size
+		  && TREE_CODE ((*exprlist)[2]) == INTEGER_CST)
+		{
+		  tree ptr = (*exprlist)[0];
+		  STRIP_NOPS (ptr);
+		  if (TREE_CODE (ptr) == ADDR_EXPR)
+		ptr = TREE_OPERAND (ptr, 0);
+		  tree type = TREE_TYPE (ptr);
+		  if (TREE_CODE (type) == ARRAY_TYPE)
+		{
+		  tree elt_type = TREE_TYPE (type);
+		  tree domain = TYPE_DOMAIN (type);
+		  if (!integer_onep (TYPE_SIZE_UNIT (elt_type))
+			  && TYPE_MAXVAL (domain)
+			  && TYPE_MINVAL (domain)
+			  && integer_zerop (TYPE_MINVAL (domain))
+			  && integer_onep (fold_build2 (MINUS_EXPR, domain,
+			(*exprlist)[2],
+			TYPE_MAXVAL (domain))))
+		  warning_at (expr_loc, OPT_Wmemset_elt_size,
+  "%<memset%> used with length equal to "
+  "number of elements without multiplication "
+  "with element size");
+		}
+		}
+	}
 
 	  start = expr.get_start ();
 	  finish = parser->tokens_buf[0].get_finish ();
Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c	(revision 234183)
+++ gcc/cp/parser.c	(working copy)
@@ -6829,20 +6829,48 @@ cp_parser_postfix_expression (cp_parser
 		  }
 	  }
 
-	if (warn_memset_transposed_args)
+	if (TREE_CODE (postfix_expression) == FUNCTION_DECL
+		&& DECL_BUILT_IN_CLASS (postfix_expression) == BUILT_IN_NORMAL
+		&& DECL_FUNCTION_CODE (postfix_expression) == BUILT_IN_MEMSET
+		&& vec_safe_length (args) == 3)
 	  {
-		if (TREE_CODE (postfix_expression) == FUNCTION_DECL
-		&& DECL_BUILT_IN_CLASS (postfix_expression) == BUILT_IN_NORMAL
-		&& DECL_FUNCTION_CODE (postfix_expression) == BUILT_IN_MEMSET
-		&& vec_safe_length (args) == 3
-		&& TREE_CODE ((*args)[2]) == INTEGER_CST
-		&& integer_zerop ((*args)[2])
-		&& !(TREE_CODE ((*args)[1]) == INTEGER_CST
-			 && integer_zerop ((*args)[1])))
+		tree arg1 = (*args)[1];
+		tree arg2 = (*args)[2];
+		if (TREE_CODE (arg2) == CONST_DECL)
+		  arg2 = DECL_INITIAL (arg2);
+
+		if (warn_memset_transposed_args
+		&& integer_zerop (arg2)
+		&& !(TREE_CODE (arg1) == INTEGER_CST
+			 && integer_zerop (arg1)))
 		  warning (OPT_Wmemset_transposed_args,
 			   "%<memset%> used with constant zero length "
 			   "parameter; this could be due to transposed "
 			   "parameters");
+		if (warn_memset_elt_size && TREE_CODE (arg2) == INTEGER_CST)
+		  {
+		tree ptr = (*args)[0];
+		STRIP_NOPS (ptr);
+		if (TREE_CODE (ptr) == ADDR_EXPR)
+		  ptr = TREE_OPERAND (ptr, 0);
+		tree type = TREE_TYPE (ptr);
+		if (TREE_CODE (type) == ARRAY_TYPE)
+		  {
+			tree elt_type = TREE_TYPE (type);
+			tree domain = TYPE_DOMAIN (type);
+			if (!integer_onep (TYPE_SIZE_UNIT (elt_type))
+			&& TYPE_MAXVAL (domain)
+			&& TYPE_MINVAL (domain)
+			&& integer_zerop 

C, C++: Fix PR 69733 (bad location for ignored qualifiers warning)

2016-04-22 Thread Bernd Schmidt

The PR is for a C++ function of the form

const double val() const { ... }

where the warning location is at the second const (by accident, in 
reality it's just past the function's declarator), while the first const 
is the one that we are warning about.


This patch adds some logic to the C and C++ frontends to look for the 
qualifier, or a typedef name, and point the warning there. C needs a 
little more work because the ignored qualifier could also be an address 
space.
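
The location choice described above can be sketched outside the compiler as follows.  This is only an illustration: `loc_t`, `UNKNOWN_LOC` and both function names are stand-ins invented here, not GCC's types, and it assumes (as the patch appears to) that a smaller location value means an earlier position in the source:

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned int loc_t;        // hypothetical stand-in for location_t
const loc_t UNKNOWN_LOC = 0;       // hypothetical stand-in for UNKNOWN_LOCATION

// Return the smallest known location among locs[idx[0]] .. locs[idx[n-1]],
// or UNKNOWN_LOC if none of the selected entries is known.
static loc_t smallest_known_location(const loc_t *locs, const int *idx,
                                     std::size_t n)
{
  assert(locs != nullptr && idx != nullptr);
  loc_t best = UNKNOWN_LOC;
  for (std::size_t i = 0; i < n; ++i)
    {
      loc_t cand = locs[idx[i]];
      if (cand != UNKNOWN_LOC && (best == UNKNOWN_LOC || cand < best))
        best = cand;
    }
  return best;
}

// Fallback chain: qualifier location, then typedef name, then the
// declarator location that was used before.
static loc_t pick_warning_location(loc_t quals_loc, loc_t typedef_loc,
                                   loc_t declarator_loc)
{
  if (quals_loc != UNKNOWN_LOC)
    return quals_loc;
  if (typedef_loc != UNKNOWN_LOC)
    return typedef_loc;
  return declarator_loc;
}
```

With this ordering the caret lands on the first qualifier that was actually written, and only falls back to the old position when nothing better is recorded.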


Bootstrapped and tested on x86_64-linux (a while ago, will retest). Ok 
for trunk?



Bernd
c/
	PR c++/69733
	* c-decl.c (smallest_type_quals_location): New static function.
	(grokdeclarator): Try to find the correct location for an ignored
	qualifier.
cp/
	PR c++/69733
	* decl.c (grokdeclarator): Try to find the correct location for an
	ignored qualifier.
testsuite/
	PR c++/69733
	* c-c++-common/pr69733.c: New test.
	* gcc.target/i386/pr69733.c: New test.

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c	(revision 234183)
+++ gcc/c/c-decl.c	(working copy)
@@ -5291,6 +5291,27 @@ warn_defaults_to (location_t location, i
   va_end (ap);
 }
 
+/* Returns the smallest location != UNKNOWN_LOCATION in LOCATIONS,
+   considering only those c_declspec_words found in LIST, which
+   must be terminated by cdw_number_of_elements.  */
+
+static location_t
+smallest_type_quals_location (const location_t* locations,
+			  c_declspec_word *list)
+{
+  location_t loc = UNKNOWN_LOCATION;
+  while (*list != cdw_number_of_elements)
+{
+  location_t newloc = locations[*list];
+  if (loc == UNKNOWN_LOCATION
+	  || (newloc != UNKNOWN_LOCATION && newloc < loc))
+	loc = newloc;
+  list++;
+}
+
+  return loc;
+}
+
 /* Given declspecs and a declarator,
determine the name and type of the object declared
and construct a ..._DECL node for it.
@@ -6101,6 +6122,18 @@ grokdeclarator (const struct c_declarato
 	   qualify the return type, not the function type.  */
 	if (type_quals)
 	  {
+		enum c_declspec_word ignored_quals_list[] =
+		  {
+		cdw_const, cdw_volatile, cdw_restrict, cdw_address_space,
+		cdw_number_of_elements
+		  };
+		location_t specs_loc
+		= smallest_type_quals_location (declspecs->locations,
+		ignored_quals_list);
+		if (specs_loc == UNKNOWN_LOCATION)
+		  specs_loc = declspecs->locations[cdw_typedef];
+		if (specs_loc == UNKNOWN_LOCATION)
+		  specs_loc = loc;
 		/* Type qualifiers on a function return type are
 		   normally permitted by the standard but have no
 		   effect, so give a warning at -Wreturn-type.
@@ -6108,10 +6141,10 @@ grokdeclarator (const struct c_declarato
 		   function definitions in ISO C; GCC used to used
 		   them for noreturn functions.  */
 		if (VOID_TYPE_P (type) && really_funcdef)
-		  pedwarn (loc, 0,
+		  pedwarn (specs_loc, 0,
 			   "function definition has qualified void return type");
 		else
-		  warning_at (loc, OPT_Wignored_qualifiers,
+		  warning_at (specs_loc, OPT_Wignored_qualifiers,
 			   "type qualifiers ignored on function return type");
 
 		type = c_build_qualified_type (type, type_quals);
Index: gcc/cp/decl.c
===
--- gcc/cp/decl.c	(revision 234183)
+++ gcc/cp/decl.c	(working copy)
@@ -10010,8 +10010,15 @@ grokdeclarator (const cp_declarator *dec
 	if (type_quals != TYPE_UNQUALIFIED)
 	  {
 		if (SCALAR_TYPE_P (type) || VOID_TYPE_P (type))
-		  warning (OPT_Wignored_qualifiers,
-			   "type qualifiers ignored on function return type");
+		  {
+		location_t loc;
+		loc = smallest_type_quals_location (type_quals,
+			declspecs->locations);
+		if (loc == UNKNOWN_LOCATION)
+		  loc = declspecs->locations[ds_type_spec];
+		warning_at (loc, OPT_Wignored_qualifiers, "type "
+"qualifiers ignored on function return type");
+		  }
 		/* We now know that the TYPE_QUALS don't apply to the
 		   decl, but to its return type.  */
 		type_quals = TYPE_UNQUALIFIED;
Index: gcc/testsuite/c-c++-common/pr69733.c
===
--- gcc/testsuite/c-c++-common/pr69733.c	(revision 0)
+++ gcc/testsuite/c-c++-common/pr69733.c	(working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-W -fdiagnostics-show-caret" } */
+
+typedef const double cd;
+double val;
+
+const double val0() {return val;} /* { dg-warning "qualifiers ignored" } */
+/* { dg-begin-multiline-output "" }
+ const double val0() {return val;}
+ ^
+{ dg-end-multiline-output "" } */
+
+volatile double val1() {return val;} /* { dg-warning "qualifiers ignored" } */
+/* { dg-begin-multiline-output "" }
+ volatile double val1() {return val;}
+ ^~~~
+{ dg-end-multiline-output "" } */
+
+cd val2() {return val;} /* { dg-warning "qualifiers ignored" } */
+/* { dg-begin-multiline-output "" }
+ cd val2() {return val;}
+ ^~
+{ dg-end-multiline-output "" } */
+
Index: 

[PATCH][AArch64] Improve aarch64_modes_tieable_p

2016-04-22 Thread Wilco Dijkstra
Improve modes_tieable by returning true in more cases: allow scalar access
within vectors without requiring an extra move. Removing these moves helps
the register allocator in deciding whether to use integer or FP registers on
operations that can be done on both. This saves about 100 instructions in the
gcc.target/aarch64 tests.

A typical example:

orr v1.8b, v0.8b, v1.8b
fmov x0, d0
fmov x1, d1
add x0, x1, x0
ins v0.d[0], x0
orr v0.8b, v1.8b, v0.8b

after:

orr v1.8b, v0.8b, v1.8b
add d0, d1, d0
orr v0.8b, v1.8b, v0.8b

OK for trunk?

ChangeLog:
2016-04-22  Wilco Dijkstra  

* gcc/config/aarch64/aarch64.c (aarch64_modes_tieable_p):
Allow scalar/single vector modes to be tieable.

---
 gcc/config/aarch64/aarch64.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index abc864c..6e921f0 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12294,7 +12294,14 @@ aarch64_reverse_mask (enum machine_mode mode)
   return force_reg (V16QImode, mask);
 }
 
-/* Implement MODES_TIEABLE_P.  */
+/* Implement MODES_TIEABLE_P.  In principle we should always return true.
+   However due to issues with register allocation it is preferable to avoid
+   tieing integer scalar and FP scalar modes.  Executing integer operations
+   in general registers is better than treating them as scalar vector
+   operations.  This reduces latency and avoids redundant int<->FP moves.
+   So tie modes if they are either the same class, or vector modes with
+   other vector modes, vector structs or any scalar mode.
+*/
 
 bool
 aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
@@ -12305,9 +12312,12 @@ aarch64_modes_tieable_p (machine_mode mode1, machine_mode mode2)
   /* We specifically want to allow elements of "structure" modes to
  be tieable to the structure.  This more general condition allows
  other rarer situations too.  */
-  if (TARGET_SIMD
-  && aarch64_vector_mode_p (mode1)
-  && aarch64_vector_mode_p (mode2))
+  if (aarch64_vector_mode_p (mode1) && aarch64_vector_mode_p (mode2))
+return true;
+
+  /* Also allow any scalar modes with vectors.  */
+  if (aarch64_vector_mode_supported_p (mode1)
+  || aarch64_vector_mode_supported_p (mode2))
 return true;
 
   return false;
-- 
1.9.1



Re: [PATCH] add support for placing variables in shared memory

2016-04-22 Thread Nathan Sidwell

On 04/21/16 10:25, Alexander Monakov wrote:
> On Thu, 21 Apr 2016, Nathan Sidwell wrote:
>> On 04/20/16 12:58, Alexander Monakov wrote:
>>> Allow using __attribute__((shared)) to place static variables in '.shared'
>>> memory space.
>>
>> What is the rationale for a new attribute, rather than leveraging the existing
>> section(".shared") machinery?
>
> Section switching does not work at all on NVPTX in GCC at present.  PTX
> assembly has no notion of different data sections, so the backend does not
> advertise section switching capability to the middle end.

Correct.  How is that relevant?  Look at DECL_SECTION_NAME in
encode_section_info.

> CUDA C does it via attributes too, and there's no point in diverging
> gratuitously I think.

Also correct.  It seems you're trying to make the compiler look like CUDA rather
than fit new features into existing idioms.

> I avoided using 'static' because it applies to external declarations as well.
> Other backends use "%qE attribute not allowed with auto storage class"; I'll
> be happy to switch to that for consistency.

Why can it not be applied to external declarations?  Doesn't '.extern .shared
whatever' work?

Why can it not apply to variables of auto storage?  I.e. function scope,
function lifetime?  That would seem to be a useful property.

What happens if an initializer is present, is it silently ignored?  Thinking
further, the uninitialized behaviour might be a reason for not going with
section(".shared").


nathan


Re: [PATCH] Load external function address via GOT slot

2016-04-22 Thread Uros Bizjak
On Fri, Apr 22, 2016 at 2:54 PM, H.J. Lu  wrote:
> For -fno-plt, we load the external function address via the GOT slot
> so that linker won't create an PLT entry for extern function address.
>
> Tested on x86-64. I also built GCC with -fno-plt.  It removes 99% PLT
> entries.  OK for trunk?
>
> H.J.
> --
> gcc/
>
> PR target/pr67400
> * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New.
> * config/i386/i386.c (ix86_force_load_from_GOT_p): New function.
> (ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL for
> ix86_force_load_from_GOT_p returns true.
> (ix86_print_operand_address): Support UNSPEC_GOTPCREL if
> ix86_force_load_from_GOT_p returns true.
> (ix86_expand_move): Load the external function address via the
> GOT slot if ix86_force_load_from_GOT_p returns true.
> * config/i386/predicates.md (x86_64_immediate_operand): Return
> false if ix86_force_load_from_GOT_p returns true.
>
> gcc/testsuite/
>
> PR target/pr67400
> * gcc.target/i386/pr67400-1.c: New test.
> * gcc.target/i386/pr67400-2.c: Likewise.
> * gcc.target/i386/pr67400-3.c: Likewise.
> * gcc.target/i386/pr67400-4.c: Likewise.

Please get someone that knows this linker magic to review the
functionality first. Maybe Jakub can help?

Uros.


[PATCH] Load external function address via GOT slot

2016-04-22 Thread H.J. Lu
For -fno-plt, we load the external function address via the GOT slot
so that the linker won't create a PLT entry for the external function address.

Tested on x86-64. I also built GCC with -fno-plt.  It removes 99% PLT
entries.  OK for trunk?
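
For readers who want a small case to try, the following sketch takes the address of an external function without calling it directly.  It is not one of the pr67400 tests; the function names are made up here.  The expectation, stated as an assumption rather than a verified result, is that with -fno-plt on x86-64 non-PIC code the address in `address_of_puts` would be loaded from the GOT slot (a GOTPCREL reference) instead of referring to a PLT entry:

```cpp
#include <cassert>
#include <stdio.h>

typedef int (*puts_fn)(const char *);

// Materialise the address of an external function without calling it.
puts_fn address_of_puts()
{
  return &puts;
}

// Call through a pointer obtained that way; the pointer must be valid.
int call_via_pointer(puts_fn fn, const char *text)
{
  assert(fn != nullptr);
  return fn(text);
}
```

Comparing the assembly for `address_of_puts` with and without -fno-plt should show whether the address still goes through the PLT.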

H.J.
--
gcc/

PR target/pr67400
* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New.
* config/i386/i386.c (ix86_force_load_from_GOT_p): New function.
(ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL if
ix86_force_load_from_GOT_p returns true.
(ix86_print_operand_address): Support UNSPEC_GOTPCREL if
ix86_force_load_from_GOT_p returns true.
(ix86_expand_move): Load the external function address via the
GOT slot if ix86_force_load_from_GOT_p returns true.
* config/i386/predicates.md (x86_64_immediate_operand): Return
false if ix86_force_load_from_GOT_p returns true.

gcc/testsuite/

PR target/pr67400
* gcc.target/i386/pr67400-1.c: New test.
* gcc.target/i386/pr67400-2.c: Likewise.
* gcc.target/i386/pr67400-3.c: Likewise.
* gcc.target/i386/pr67400-4.c: Likewise.
---
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/i386.c| 42 +++
 gcc/config/i386/predicates.md |  4 +++
 gcc/testsuite/gcc.target/i386/pr67400-1.c | 13 ++
 gcc/testsuite/gcc.target/i386/pr67400-2.c | 14 +++
 gcc/testsuite/gcc.target/i386/pr67400-3.c | 16 
 gcc/testsuite/gcc.target/i386/pr67400-4.c | 13 ++
 7 files changed, 103 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr67400-4.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ff47bc1..42c941d 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -69,6 +69,7 @@ extern bool ix86_expand_set_or_movmem (rtx, rtx, rtx, rtx, rtx, rtx,
 extern bool constant_address_p (rtx);
 extern bool legitimate_pic_operand_p (rtx);
 extern bool legitimate_pic_address_disp_p (rtx);
+extern bool ix86_force_load_from_GOT_p (rtx);
 extern void print_reg (rtx, int, FILE*);
 extern void ix86_print_operand (FILE *, rtx, int);
 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6379313..499dc77 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14443,6 +14443,24 @@ ix86_legitimate_constant_p (machine_mode, rtx x)
   return true;
 }
 
+/* True if operand X should be loaded from GOT.  */
+
+bool
+ix86_force_load_from_GOT_p (rtx x)
+{
+  /* External function symbol should be loaded via the GOT slot for
+ -fno-plt.  */
+  return (!flag_plt
+ && !flag_pic
+ && ix86_cmodel != CM_LARGE
+ && TARGET_64BIT
+ && !TARGET_PECOFF
+ && !TARGET_MACHO
+ && GET_CODE (x) == SYMBOL_REF
+ && SYMBOL_REF_FUNCTION_P (x)
+ && !SYMBOL_REF_LOCAL_P (x));
+}
+
 /* Determine if it's legal to put X into the constant pool.  This
is not possible for the address of thread-local symbols, which
is checked above.  */
@@ -14823,6 +14841,10 @@ ix86_legitimate_address_p (machine_mode, rtx addr, bool strict)
return false;
 
  case UNSPEC_GOTPCREL:
+   gcc_assert (flag_pic
+   || ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0)));
+   goto is_legitimate_pic;
+
  case UNSPEC_PCREL:
gcc_assert (flag_pic);
goto is_legitimate_pic;
@@ -17427,6 +17449,11 @@ ix86_print_operand_address_as (FILE *file, rtx addr,
}
   else if (flag_pic)
output_pic_addr_const (file, disp, 0);
+  else if (GET_CODE (disp) == CONST
+  && GET_CODE (XEXP (disp, 0)) == UNSPEC
+  && XINT (XEXP (disp, 0), 1) == UNSPEC_GOTPCREL
+  && ix86_force_load_from_GOT_p (XVECEXP (XEXP (disp, 0), 0, 0)))
+   output_pic_addr_const (file, XEXP (disp, 0), code);
   else
output_addr_const (file, disp);
 }
@@ -18692,6 +18719,21 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  op1 = convert_to_mode (mode, op1, 1);
}
}
+   }
+  else if (ix86_force_load_from_GOT_p (op1))
+{
+  /* Load the external function address via the GOT slot to
+avoid PLT.  */
+  op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
+   (TARGET_64BIT
+? UNSPEC_GOTPCREL
+: UNSPEC_GOT));
+  op1 = gen_rtx_CONST (Pmode, op1);
+  op1 = gen_const_mem (Pmode, op1);
+  op1 = convert_to_mode (mode, op1, 1);
+  /* Force OP1 into register to prevent cse and fwprop from
+replacing a GOT load with a constant.  */
+  op1 = force_reg (mode, op1);
 }
   else
 {

Re: [PATCH] Allow all 1s of integer as standard SSE constants

2016-04-22 Thread Uros Bizjak
On Thu, Apr 21, 2016 at 10:58 PM, H.J. Lu  wrote:

> Here is the updated patch with my standard_sse_constant_p change and
> your SSE/AVX pattern change.  I didn't include your
> standard_sse_constant_opcode since it didn't compile nor is needed
> for this purpose.

H.J.,

please test the attached patch that finally rewrites and improves SSE
constants handling.

This is what I want to commit, a follow-up patch will further clean
standard_sse_constant_opcode wrt TARGET_AVX512VL.

Bootstrapped and regtested with i386.exp, full regtest in progress.

Uros.
diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index afdc546..c02c321 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -186,7 +186,9 @@
 
 (define_constraint "BC"
   "@internal SSE constant operand."
-  (match_test "standard_sse_constant_p (op)"))
+  (and (match_test "TARGET_SSE")
+   (ior (match_operand 0 "const0_operand")
+   (match_operand 0 "all_ones_operand"))))
 
 ;; Integer constant constraints.
 (define_constraint "I"
@@ -239,7 +241,8 @@
 ;; This can theoretically be any mode's CONST0_RTX.
 (define_constraint "C"
   "SSE constant zero operand."
-  (match_test "standard_sse_constant_p (op) == 1"))
+  (and (match_test "TARGET_SSE")
+   (match_operand 0 "const0_operand")))
 
 ;; Constant-or-symbol-reference constraints.
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ff47bc1..93b5e1e 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -50,7 +50,7 @@ extern bool ix86_using_red_zone (void);
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
 extern rtx standard_80387_constant_rtx (int);
-extern int standard_sse_constant_p (rtx);
+extern int standard_sse_constant_p (rtx, machine_mode);
 extern const char *standard_sse_constant_opcode (rtx_insn *, rtx);
 extern bool symbolic_reference_mentioned_p (rtx);
 extern bool extended_reg_mentioned_p (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 6379313..874f837 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10762,42 +10762,44 @@ standard_80387_constant_rtx (int idx)
   XFmode);
 }
 
-/* Return 1 if X is all 0s and 2 if x is all 1s
+/* Return 1 if X is all bits 0 and 2 if X is all bits 1
in supported SSE/AVX vector mode.  */
 
 int
-standard_sse_constant_p (rtx x)
+standard_sse_constant_p (rtx x, machine_mode pred_mode)
 {
   machine_mode mode;
 
   if (!TARGET_SSE)
 return 0;
 
-  mode = GET_MODE (x);
-  
-  if (x == const0_rtx || x == CONST0_RTX (mode))
+  if (const0_operand (x, VOIDmode))
 return 1;
-  if (vector_all_ones_operand (x, mode))
-switch (mode)
+
+  mode = GET_MODE (x);
+
+  /* VOIDmode integer constant, infer mode from the predicate.  */
+  if (mode == VOIDmode)
+mode = pred_mode;
+
+  else if (all_ones_operand (x, VOIDmode))
+switch (GET_MODE_SIZE (mode))
   {
-  case V16QImode:
-  case V8HImode:
-  case V4SImode:
-  case V2DImode:
-   if (TARGET_SSE2)
+  case 64:
+   if (TARGET_AVX512F)
  return 2;
-  case V32QImode:
-  case V16HImode:
-  case V8SImode:
-  case V4DImode:
+   break;
+  case 32:
if (TARGET_AVX2)
  return 2;
-  case V64QImode:
-  case V32HImode:
-  case V16SImode:
-  case V8DImode:
-   if (TARGET_AVX512F)
+   break;
+  case 16:
+   if (TARGET_SSE2)
  return 2;
+   break;
+  case 0:
+   /* VOIDmode */
+   gcc_unreachable ();
   default:
break;
   }
@@ -10811,53 +10813,81 @@ standard_sse_constant_p (rtx x)
 const char *
 standard_sse_constant_opcode (rtx_insn *insn, rtx x)
 {
-  switch (standard_sse_constant_p (x))
+  gcc_assert (TARGET_SSE);
+
+  if (const0_operand (x, VOIDmode))
 {
-case 1:
   switch (get_attr_mode (insn))
{
case MODE_XI:
  return "vpxord\t%g0, %g0, %g0";
-   case MODE_V16SF:
- return TARGET_AVX512DQ ? "vxorps\t%g0, %g0, %g0"
-: "vpxord\t%g0, %g0, %g0";
-   case MODE_V8DF:
- return TARGET_AVX512DQ ? "vxorpd\t%g0, %g0, %g0"
-: "vpxorq\t%g0, %g0, %g0";
+   case MODE_OI:
+ return (TARGET_AVX512VL
+ ? "vpxord\t%x0, %x0, %x0"
+ : "vpxor\t%x0, %x0, %x0");
case MODE_TI:
- return TARGET_AVX512VL ? "vpxord\t%t0, %t0, %t0"
-: "%vpxor\t%0, %d0";
+ return (TARGET_AVX512VL
+ ? "vpxord\t%t0, %t0, %t0"
+ : "%vpxor\t%0, %d0");
+
+   case MODE_V8DF:
+ return (TARGET_AVX512DQ
+ ? "vxorpd\t%g0, %g0, %g0"
+ : "vpxorq\t%g0, %g0, %g0");
+   case MODE_V4DF:
+ return "vxorpd\t%x0, %x0, %x0";
case MODE_V2DF:
 

Re: [PATCH 08/18] make side_effects a vec

2016-04-22 Thread Trevor Saunders
On Thu, Apr 21, 2016 at 11:12:48PM -0600, Jeff Law wrote:
> On 04/20/2016 12:22 AM, tbsaunde+...@tbsaunde.org wrote:
> >From: Trevor Saunders 
> >
> >gcc/ChangeLog:
> >
> >2016-04-19  Trevor Saunders  
> >
> > * var-tracking.c (struct adjust_mem_data): Make side_effects a vector.
> > (adjust_mems): Adjust.
> > (adjust_insn): Likewise.
> > (prepare_call_arguments): Likewise.
> >---
> >  gcc/var-tracking.c | 30 +++---
> >  1 file changed, 11 insertions(+), 19 deletions(-)
> >
> >diff --git a/gcc/var-tracking.c b/gcc/var-tracking.c
> >index 9f09d30..7fc6ed3 100644
> >--- a/gcc/var-tracking.c
> >+++ b/gcc/var-tracking.c
> >@@ -926,7 +926,7 @@ struct adjust_mem_data
> >bool store;
> >machine_mode mem_mode;
> >HOST_WIDE_INT stack_adjust;
> >-  rtx_expr_list *side_effects;
> >+  auto_vec side_effects;
> >  };
> Is auto_vec what you really want here?  AFAICT this object is never
> destructed, so we're not releasing the memory.  Am I missing something here?

It is destructed: auto_vec has a destructor, therefore adjust_mem_data has an
implicit destructor too, since it has a field with a destructor.
adjust_mem_data is always on the stack, so the compiler makes sure the
destructor is called when the object goes out of scope.
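
A minimal sketch of that rule, with made-up types standing in for auto_vec and adjust_mem_data (neither `counted_vec` nor `mem_data` is GCC code):

```cpp
#include <cassert>
#include <vector>

static int live_buffers = 0;

// Hypothetical stand-in for auto_vec: owns storage, releases it in its
// destructor, and counts how many instances are alive.
struct counted_vec
{
  std::vector<int> items;
  counted_vec() { ++live_buffers; }
  ~counted_vec() { --live_buffers; }
};

// Hypothetical stand-in for adjust_mem_data: no user-written destructor,
// but the implicit one destroys the side_effects member.
struct mem_data
{
  bool store = false;
  counted_vec side_effects;
};

static int live_during_use()
{
  mem_data amd;                          // lives on the stack
  amd.side_effects.items.push_back(1);
  assert(live_buffers == 1);
  return live_buffers;
}                                        // implicit ~mem_data runs ~counted_vec
```

After `live_during_use` returns, the counter is back to zero, which is the behaviour being relied on for the stack-allocated object.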

Trev

> 
> 
> Jeff


C++ PATCH for c++/70744 (wrong-code with x ?: y extension)

2016-04-22 Thread Marek Polacek
This PR shows that we generate wrong code with the x ?: y extension in case the
first operand contains either predecrement or preincrement.  The problem is
that we don't emit SAVE_EXPR, thus the operand is evaluated twice, which it
should not be.

While ++i or --i can be lvalues in C++, i++ or i-- can not.  The code in
build_conditional_expr_1 has:
  /* Make sure that lvalues remain lvalues.  See g++.oliva/ext1.C.  */
  if (real_lvalue_p (arg1))
    arg2 = arg1 = stabilize_reference (arg1);
  else
    arg2 = arg1 = save_expr (arg1);
so for ++i/--i we call stabilize_reference, but that doesn't know anything
about PREINCREMENT_EXPR or PREDECREMENT_EXPR and just returns the same
expression, so SAVE_EXPR isn't created.

I think one fix would be to teach stabilize_reference what to do with those,
similarly to how we handle COMPOUND_EXPR there.
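
As a small illustration of the intended semantics (a sketch written for this message, not part of the patch; the helper names are invented), the first operand of x ?: y has to be evaluated exactly once, whether it is a call or a preincrement:

```cpp
#include <cassert>

static int calls = 0;

// Helper with a visible side effect so evaluations can be counted.
static int next_value(int v)
{
  ++calls;
  return v;
}

// GNU extension: "a ?: b" yields a if it is nonzero, else b,
// and evaluates a only once.
static int value_or_default(int v, int fallback)
{
  return next_value(v) ?: fallback;
}

// The preincrement case from the PR: x must be bumped exactly once.
static int preincrement_result(int &x)
{
  return ++x ?: 42;
}
```

On a compiler with the bug, the preincrement form would be the one to go wrong, since the operand would be evaluated a second time.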

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-22  Marek Polacek  

PR c++/70744
* tree.c (stabilize_reference): Handle PREINCREMENT_EXPR and
PREDECREMENT_EXPR.

* g++.dg/ext/cond2.C: New test.

diff --git gcc/testsuite/g++.dg/ext/cond2.C gcc/testsuite/g++.dg/ext/cond2.C
index e69de29..d9f1d59 100644
--- gcc/testsuite/g++.dg/ext/cond2.C
+++ gcc/testsuite/g++.dg/ext/cond2.C
@@ -0,0 +1,28 @@
+// PR c++/70744
+// { dg-do run }
+// { dg-options "" }
+
+static void
+fn1 (void)
+{
+  int x = 2;
+  ++x ? : 42;
+  if (x != 3)
+__builtin_abort ();
+  --x ? : 42;
+  if (x != 2)
+__builtin_abort ();
+  x++ ? : 42;
+  if (x != 3)
+__builtin_abort ();
+  x-- ? : 42;
+  if (x != 2)
+__builtin_abort ();
+}
+
+int
+main ()
+{
+  fn1 ();
+  return 0;
+}
diff --git gcc/tree.c gcc/tree.c
index 6de46a8..8fd4e81 100644
--- gcc/tree.c
+++ gcc/tree.c
@@ -4255,6 +4255,13 @@ stabilize_reference (tree ref)
 volatiles.  */
   return stabilize_reference_1 (ref);
 
+case PREINCREMENT_EXPR:
+case PREDECREMENT_EXPR:
+  /* While in C++ postincrement and postdecrement expressions are not
+considered lvalues, preincrement and predecrement expressions can
+be lvalues.  */
+  return stabilize_reference_1 (ref);
+
   /* If arg isn't a kind of lvalue we recognize, make no change.
 Caller should recognize the error for an invalid lvalue.  */
 default:

Marek


Re: match.pd patch: min(-x, -y), min(~x, ~y)

2016-04-22 Thread Kyrill Tkachov


On 22/04/16 11:34, Marc Glisse wrote:
> On Fri, 22 Apr 2016, Kyrill Tkachov wrote:
>> On 22/04/16 10:43, Kyrill Tkachov wrote:
>>> On 22/04/16 10:42, Marc Glisse wrote:
>>>> On Fri, 22 Apr 2016, Kyrill Tkachov wrote:
>>>>>> 2016-04-21  Marc Glisse
>>>>>>
>>>>>> gcc/
>>>>>> * match.pd (min(-x, -y), max(-x, -y), min(~x, ~y), max(~x, ~y)):
>>>>>> New transformations.
>>>>>>
>>>>>> gcc/testsuite/
>>>>>> * gcc.dg/tree-ssa/minmax-2.c: New testcase.
>>>>>
>>>>> I see the new testcase failing on aarch64:
>>>>> FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized "__builtin_fmin"
>>>>
>>>> Strange, it seems to work in
>>>> https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02120.html
>>>>
>>>> Is that on some freestanding kind of setup where the builtin might be disabled?
>>>
>>> Ah, this is aarch64-none-elf which uses newlib as the C library.
>>> Let me check on aarch64-none-linux-gnu and get back to you.
>>
>> Yeah, I see it passing on aarch64-none-linux-gnu.
>> Do we have an appropriate effective target check to gate this test on?
>
> I don't know, I have a hard time finding something related. I am not even
> convinced the test should be skipped. It looks like __builtin_fmax was
> recognized, otherwise you would get a warning and a conversion int-double.
> Maybe gimple_call_combined_fn rejects it? Ah, builtins.def declares it with
> DEF_C99_BUILTIN, which checks targetm.libc_has_function (function_c99_misc).
> I assume newlib fails that check? That would make c99_runtime a relevant
> target check.




Yeah, adding the below makes this test UNSUPPORTED on aarch64-none-elf.
/* { dg-add-options c99_runtime } */
/* { dg-require-effective-target c99_runtime } */

I'll prepare a patch.

Thanks,
Kyrill
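
For readers following the thread, the identities behind the new match.pd
patterns can be checked with a small standalone sketch.  This is only an
illustration, not the minmax-2.c testcase: the function names are made up,
and it assumes the patterns rewrite min(-x, -y) to -max(x, y) and
min(~x, ~y) to ~max(x, y), which the ChangeLog entry suggests but does not
spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Left-hand side of the assumed min(-x, -y) pattern, using fmin as the
// failing scan-tree-dump in the thread looks for __builtin_fmin.
double min_of_negated(double x, double y) { return std::fmin(-x, -y); }

// Assumed simplified form: a single negation applied to max(x, y).
double negated_max(double x, double y) { return -std::fmax(x, y); }

// Left-hand side of the assumed min(~x, ~y) pattern.
int min_of_complemented(int x, int y) { return std::min(~x, ~y); }

// Assumed simplified form: a single complement applied to max(x, y).
int complemented_max(int x, int y) { return ~std::max(x, y); }

// Spot-check both identities on a few ordinary (non-NaN) inputs.
void check_minmax_identities() {
  assert(min_of_negated(1.5, -2.25) == negated_max(1.5, -2.25));
  assert(min_of_negated(-4.0, -8.0) == negated_max(-4.0, -8.0));
  assert(min_of_complemented(3, -7) == complemented_max(3, -7));
  assert(min_of_complemented(0, 42) == complemented_max(0, 42));
}
```

The complement case is exact for every int because ~x reverses the ordering
without overflow; the floating-point case is only spot-checked here and says
nothing about NaN or signed-zero handling.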


Re: [PATCH GCC]Improve tree ifconv by handling virtual PHIs which can be degenerated.

2016-04-22 Thread Richard Biener
On Fri, Apr 22, 2016 at 12:33 PM, Bin.Cheng  wrote:
> On Fri, Apr 22, 2016 at 11:25 AM, Richard Biener
>  wrote:
>> On Fri, Apr 22, 2016 at 12:07 PM, Bin Cheng  wrote:
>>> Hi,
>>> Tree if-conv has the code below checking virtual PHI nodes in
>>> if_convertible_phi_p:
>>>
>>>   if (any_mask_load_store)
>>> return true;
>>>
>>>   /* When there were no if-convertible stores, check
>>>  that there are no memory writes in the branches of the loop to be
>>>  if-converted.  */
>>>   if (virtual_operand_p (gimple_phi_result (phi)))
>>> {
>>>   imm_use_iterator imm_iter;
>>>   use_operand_p use_p;
>>>
>>>   if (bb != loop->header)
>>> {
>>>   if (dump_file && (dump_flags & TDF_DETAILS))
>>> fprintf (dump_file, "Virtual phi not on loop->header.\n");
>>>   return false;
>>> }
>>>
>>>   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result (phi))
>>> {
>>>   if (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI
>>>   && USE_STMT (use_p) != phi)
>>> {
>>>   if (dump_file && (dump_flags & TDF_DETAILS))
>>> fprintf (dump_file, "Difficult to handle this virtual phi.\n");
>>>   return false;
>>> }
>>> }
>>> }
>>>
>>> After investigation, I think it's to bypass code in the form of:
>>>
>>> 
>>>   .MEM_2232 = PHI <.MEM_574(179), .MEM_1247(183)> // PHI_1
>>>   ...
>>>   if (cond)
>>> goto 
>>>   else
>>> goto 
>>>
>>> :  //empty
>>> :
>>>   .MEM_1247 = PHI <.MEM_2232(180), .MEM_2232(181)> // PHI_2
>>>   if (cond2)
>>> goto 
>>>   else
>>> goto 
>>>
>>> :
>>>   goto 
>>>
>>> Here PHI_2 is degenerate and can be deleted.  Furthermore, after propagating
>>> .MEM_2232 to .MEM_1247's uses, PHI_1 also becomes degenerate and can be
>>> deleted in this case.  These cases are rejected because tree if-conv doesn't
>>> handle virtual PHI nodes during loop conversion (it only predicates scalar
>>> PHI nodes).  As a result such loops are neither converted nor vectorized.
>>> This patch first deletes the aforementioned checking code, then adds code
>>> handling such virtual PHIs during conversion.  With this change the use of
>>> `any_mask_load_store' is less ambiguous, which allows further cleanups and
>>> patches fixing PR56541.
>>> BTW, I think the recent fix for PR70725 on PHI nodes with only one argument
>>> is a special case covered by this change too.  Unfortunately I can't use
>>> replace_uses_by because I also need to handle PHIs at the use points after
>>> replacing.  This doesn't really matter since we only care about virtual
>>> PHIs, which cannot be used by anything other than the IR itself.
>>> Bootstrap and test on x86_64 and AArch64; is it OK if there are no regressions?
>>
>> Doesn't this undo my fix for degenerate non-virtual PHIs?
> No, since we already support degenerate non-virtual PHIs in
> predicate_scalar_phi, your fix is also for virtual PHIs handling.

Was it?  I don't remember ;)  I think it was for a non-virtual PHI.
Anyway, you should
see the PR70725 testcase fail again if not.

>>
>> I believe we can just drop virtual PHIs and rely on
>>
>>   if (any_mask_load_store)
>> {
>>   mark_virtual_operands_for_renaming (cfun);
>>   todo |= TODO_update_ssa_only_virtuals;
>> }
>>
>> re-creating them from scratch.  To do better than that we'd simply
> I tried this; simply enabling the above code for all cases doesn't resolve
> the verify_ssa issue.  I haven't looked into the details, but it looks like
> the SSA def-use chain is corrupted in if-conversion if we don't process it
> explicitly.  Maybe it's possible along with your suggestions below,
> but we need to handle uses outside of the loop too.

Yes.  I don't like all the new code to deal with virtual PHIs when doing
it correctly would also avoid the above virtual SSA update ...

After all the above seems to work for the case of if-converted stores
(which is where virtual PHIs appear as well, even not degenerate).
So I don't see exactly how it would break in the other case.  I suppose
you may need to call mark_virtual_phi_result_for_renaming () on
all virtual PHIs.

Thanks,
Richard.

> Thanks,
> bin
>> re-assign virtuals in combine_blocks in the new order (if there's any DEF,
>> use the header's virtual def as first live vuse, assign that to any stmt
>> with a vuse until you hit one with a vdef, then make that one live).
>>
>> Richard.
>>
>>> Thanks,
>>> bin
>>>
>>> 2016-04-22  Bin Cheng  
>>>
>>> * tree-if-conv.c (if_convertible_phi_p): Remove check on special
>>> virtual PHI nodes.  Delete parameter.
>>> (if_convertible_loop_p_1): Delete argument to above function.
>>> (degenerate_virtual_phi): New function.
>>> (predicate_all_scalar_phis): Rename to ...
>>> (process_all_phis): ... here.  Call degenerate_virtual_phi to

Re: match.pd patch: min(-x, -y), min(~x, ~y)

2016-04-22 Thread Marc Glisse

On Fri, 22 Apr 2016, Kyrill Tkachov wrote:



On 22/04/16 10:43, Kyrill Tkachov wrote:


On 22/04/16 10:42, Marc Glisse wrote:

On Fri, 22 Apr 2016, Kyrill Tkachov wrote:


2016-04-21  Marc Glisse 

gcc/
* match.pd (min(-x, -y), max(-x, -y), min(~x, ~y), max(~x, ~y)):
New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/minmax-2.c: New testcase.




I see the new testcase failing on aarch64:
FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized 
"__builtin_fmin"


Strange, it seems to work in 
https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02120.html


Is that on some freestanding kind of setup where the builtin might be 
disabled?




Ah, this is aarch64-none-elf which uses newlib as the C library.
Let me check on aarch64-none-linux-gnu and get back to you.



Yeah, I see it passing on aarch64-none-linux-gnu.
Do we have an appropriate effective target check to gate this test on?


I don't know, I have a hard time finding something related. I am not even 
convinced the test should be skipped. It looks like __builtin_fmax was 
recognized, otherwise you would get a warning and a conversion int-double. 
Maybe gimple_call_combined_fn rejects it? Ah, builtins.def declares it 
with DEF_C99_BUILTIN, which checks targetm.libc_has_function 
(function_c99_misc). I assume newlib fails that check? That would make 
c99_runtime a relevant target check.


--
Marc Glisse


Re: [PATCH GCC]Improve tree ifconv by handling virtual PHIs which can be degenerated.

2016-04-22 Thread Bin.Cheng
On Fri, Apr 22, 2016 at 11:25 AM, Richard Biener
 wrote:
> On Fri, Apr 22, 2016 at 12:07 PM, Bin Cheng  wrote:
>> Hi,
>> Tree if-conv has the code below checking virtual PHI nodes in
>> if_convertible_phi_p:
>>
>>   if (any_mask_load_store)
>> return true;
>>
>>   /* When there were no if-convertible stores, check
>>  that there are no memory writes in the branches of the loop to be
>>  if-converted.  */
>>   if (virtual_operand_p (gimple_phi_result (phi)))
>> {
>>   imm_use_iterator imm_iter;
>>   use_operand_p use_p;
>>
>>   if (bb != loop->header)
>> {
>>   if (dump_file && (dump_flags & TDF_DETAILS))
>> fprintf (dump_file, "Virtual phi not on loop->header.\n");
>>   return false;
>> }
>>
>>   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result (phi))
>> {
>>   if (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI
>>   && USE_STMT (use_p) != phi)
>> {
>>   if (dump_file && (dump_flags & TDF_DETAILS))
>> fprintf (dump_file, "Difficult to handle this virtual phi.\n");
>>   return false;
>> }
>> }
>> }
>>
>> After investigation, I think it's to bypass code in the form of:
>>
>> 
>>   .MEM_2232 = PHI <.MEM_574(179), .MEM_1247(183)> // PHI_1
>>   ...
>>   if (cond)
>> goto 
>>   else
>> goto 
>>
>> :  //empty
>> :
>>   .MEM_1247 = PHI <.MEM_2232(180), .MEM_2232(181)> // PHI_2
>>   if (cond2)
>> goto 
>>   else
>> goto 
>>
>> :
>>   goto 
>>
>> Here PHI_2 is degenerate and can be deleted.  Furthermore, after propagating
>> .MEM_2232 to .MEM_1247's uses, PHI_1 also becomes degenerate and can be
>> deleted in this case.  These cases are rejected because tree if-conv doesn't
>> handle virtual PHI nodes during loop conversion (it only predicates scalar
>> PHI nodes).  As a result such loops are neither converted nor vectorized.
>> This patch first deletes the aforementioned checking code, then adds code
>> handling such virtual PHIs during conversion.  With this change the use of
>> `any_mask_load_store' is less ambiguous, which allows further cleanups and
>> patches fixing PR56541.
>> BTW, I think the recent fix for PR70725 on PHI nodes with only one argument
>> is a special case covered by this change too.  Unfortunately I can't use
>> replace_uses_by because I also need to handle PHIs at the use points after
>> replacing.  This doesn't really matter since we only care about virtual
>> PHIs, which cannot be used by anything other than the IR itself.
>> Bootstrap and test on x86_64 and AArch64; is it OK if there are no regressions?
>
> Doesn't this undo my fix for degenerate non-virtual PHIs?
No, since we already support degenerate non-virtual PHIs in
predicate_scalar_phi, your fix is also for virtual PHIs handling.

>
> I believe we can just drop virtual PHIs and rely on
>
>   if (any_mask_load_store)
> {
>   mark_virtual_operands_for_renaming (cfun);
>   todo |= TODO_update_ssa_only_virtuals;
> }
>
> re-creating them from scratch.  To do better than that we'd simply
I tried this; simply enabling the above code for all cases doesn't resolve
the verify_ssa issue.  I haven't looked into the details, but it looks like
the SSA def-use chain is corrupted in if-conversion if we don't process it
explicitly.  Maybe it's possible along with your suggestions below,
but we need to handle uses outside of the loop too.

Thanks,
bin
> re-assign virtuals in combine_blocks in the new order (if there's any DEF,
> use the header's virtual def as first live vuse, assign that to any stmt
> with a vuse until you hit one with a vdef, then make that one live).
>
> Richard.
>
>> Thanks,
>> bin
>>
>> 2016-04-22  Bin Cheng  
>>
>> * tree-if-conv.c (if_convertible_phi_p): Remove check on special
>> virtual PHI nodes.  Delete parameter.
>> (if_convertible_loop_p_1): Delete argument to above function.
>> (degenerate_virtual_phi): New function.
>> (predicate_all_scalar_phis): Rename to ...
>> (process_all_phis): ... here.  Call degenerate_virtual_phi to
>> handle virtual PHIs.
>> (combine_blocks): Call renamed function.
>>


Re: [PATCH 2/2] (header usage fix) include c++ headers in system.h

2016-04-22 Thread Richard Biener
On Fri, Apr 22, 2016 at 12:02 PM, Szabolcs Nagy  wrote:
> Some gcc source files include standard headers after
> "system.h" but those headers may declare and use poisoned
> symbols, they also cannot be included before "system.h"
> because they might depend on macro definitions from there,
> so they must be included in system.h.
>
> This patch fixes the use of <list>, <map>, <set>, <vector>
> and <algorithm> headers, by using appropriate
> INCLUDE_{LIST, MAP, SET, VECTOR, ALGORITHM} macros.
> (Note that there are some other system header uses which
> did not get fixed.)
>
> Build tested on aarch64-*-gnu, sh-*-musl, x86_64-*-musl and
> bootstrapped x86_64-*-gnu (together with PATCH 1/2).
>
> is this ok for AIX?
> OK for trunk?

Ok for trunk and gcc-6.

Thanks,
Richard.

> This would be nice to fix in gcc-6 too, because at least
> with musl libc the bootstrap is broken.
>
> gcc/ChangeLog:
>
> 2016-04-22  Szabolcs Nagy  
>
> * system.h (list, map, set, vector): Include conditionally.
> * auto-profile.c (INCLUDE_MAP, INCLUDE_SET): Define.
> * graphite-isl-ast-to-gimple.c (INCLUDE_MAP): Define.
> * ipa-icf.c (INCLUDE_LIST): Define.
> * config/aarch64/cortex-a57-fma-steering.c (INCLUDE_LIST): Define.
> * config/sh/sh.c (INCLUDE_VECTOR): Define.
> * config/sh/sh_treg_combine.cc (INCLUDE_ALGORITHM): Define.
> (INCLUDE_LIST, INCLUDE_VECTOR): Define.
> * cp/logic.cc (INCLUDE_LIST): Define.
> * fortran/trans-common.c (INCLUDE_MAP): Define.


Re: [PATCH GCC]Improve tree ifconv by handling virtual PHIs which can be degenerated.

2016-04-22 Thread Richard Biener
On Fri, Apr 22, 2016 at 12:07 PM, Bin Cheng  wrote:
> Hi,
> Tree if-conv has the code below checking virtual PHI nodes in
> if_convertible_phi_p:
>
>   if (any_mask_load_store)
> return true;
>
>   /* When there were no if-convertible stores, check
>  that there are no memory writes in the branches of the loop to be
>  if-converted.  */
>   if (virtual_operand_p (gimple_phi_result (phi)))
> {
>   imm_use_iterator imm_iter;
>   use_operand_p use_p;
>
>   if (bb != loop->header)
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "Virtual phi not on loop->header.\n");
>   return false;
> }
>
>   FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result (phi))
> {
>   if (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI
>   && USE_STMT (use_p) != phi)
> {
>   if (dump_file && (dump_flags & TDF_DETAILS))
> fprintf (dump_file, "Difficult to handle this virtual phi.\n");
>   return false;
> }
> }
> }
>
> After investigation, I think it's to bypass code in the form of:
>
> 
>   .MEM_2232 = PHI <.MEM_574(179), .MEM_1247(183)> // PHI_1
>   ...
>   if (cond)
> goto 
>   else
> goto 
>
> :  //empty
> :
>   .MEM_1247 = PHI <.MEM_2232(180), .MEM_2232(181)> // PHI_2
>   if (cond2)
> goto 
>   else
> goto 
>
> :
>   goto 
>
> Here PHI_2 is degenerate and can be deleted.  Furthermore, after propagating
> .MEM_2232 to .MEM_1247's uses, PHI_1 also becomes degenerate and can be
> deleted in this case.  These cases are rejected because tree if-conv doesn't
> handle virtual PHI nodes during loop conversion (it only predicates scalar
> PHI nodes).  As a result such loops are neither converted nor vectorized.
> This patch first deletes the aforementioned checking code, then adds code
> handling such virtual PHIs during conversion.  With this change the use of
> `any_mask_load_store' is less ambiguous, which allows further cleanups and
> patches fixing PR56541.
> BTW, I think the recent fix for PR70725 on PHI nodes with only one argument
> is a special case covered by this change too.  Unfortunately I can't use
> replace_uses_by because I also need to handle PHIs at the use points after
> replacing.  This doesn't really matter since we only care about virtual
> PHIs, which cannot be used by anything other than the IR itself.
> Bootstrap and test on x86_64 and AArch64; is it OK if there are no regressions?
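
The notion of a degenerate PHI used in the quoted description can be
sketched on a toy model.  This is not code from the patch and does not use
GCC's gimple API: SSA names are plain integers, degenerate_phi_value is a
made-up helper, and treating an argument equal to the PHI's own result as
ignorable is an assumption drawn from the PHI_1 example (after .MEM_1247 is
replaced by .MEM_2232, PHI_1 only sees .MEM_574 and itself).

```cpp
#include <cassert>
#include <vector>

// Toy model: an SSA name is a positive int, -1 means "no single value".
// Returns the one value a PHI can be replaced with when all of its
// arguments are either that value or the PHI's own result; otherwise -1.
int degenerate_phi_value(int result, const std::vector<int> &args) {
  int value = -1;
  for (int arg : args) {
    if (arg == result)
      continue;            // self-reference (e.g. via the loop back edge)
    if (value == -1)
      value = arg;         // first real incoming value
    else if (arg != value)
      return -1;           // two different incoming values: a real merge
  }
  return value;
}

// Replays the example from the mail with the version numbers as names.
void check_degenerate_phi_example() {
  // PHI_2: .MEM_1247 = PHI <.MEM_2232, .MEM_2232>
  assert(degenerate_phi_value(1247, {2232, 2232}) == 2232);
  // PHI_1 before propagation: .MEM_2232 = PHI <.MEM_574, .MEM_1247>
  assert(degenerate_phi_value(2232, {574, 1247}) == -1);
  // PHI_1 after replacing .MEM_1247 by .MEM_2232
  assert(degenerate_phi_value(2232, {574, 2232}) == 574);
}
```

The middle assertion shows why the order matters in this model: PHI_1 only
becomes removable once PHI_2's result has been propagated into it.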

Doesn't this undo my fix for degenerate non-virtual PHIs?

I believe we can just drop virtual PHIs and rely on

  if (any_mask_load_store)
{
  mark_virtual_operands_for_renaming (cfun);
  todo |= TODO_update_ssa_only_virtuals;
}

re-creating them from scratch.  To do better than that we'd simply
re-assign virtuals in combine_blocks in the new order (if there's any DEF,
use the header's virtual def as first live vuse, assign that to any stmt
with a vuse until you hit one with a vdef, then make that one live).

Richard.
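
The linear re-assignment suggested above can be sketched on a toy statement
list.  This is a rough model under stated assumptions, not GCC code: Stmt,
reassign_virtuals and the integer "names" are all hypothetical, and the
fresh-name counter stands in for whatever SSA name allocation the real pass
would use.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a statement in the combined block.
struct Stmt {
  bool has_vuse;  // statement reads memory
  bool has_vdef;  // statement writes memory
  int vuse;       // virtual name read, filled in by the walk
  int vdef;       // virtual name written, filled in by the walk
};

// Walk the statements in their final order.  The header's virtual def is
// the first live name; every vuse gets the currently live name, and every
// vdef creates a fresh name that becomes live from then on.
// Returns the name that is live at the end of the block.
int reassign_virtuals(std::vector<Stmt> &stmts, int header_vdef,
                      int &next_name) {
  int live = header_vdef;
  for (Stmt &s : stmts) {
    if (s.has_vuse)
      s.vuse = live;
    if (s.has_vdef) {
      s.vdef = next_name++;
      live = s.vdef;
    }
  }
  return live;
}

// Load, store, load: the second load must see the store's new name.
void check_reassign_virtuals() {
  std::vector<Stmt> stmts = {{true, false, -1, -1},
                             {true, true, -1, -1},
                             {true, false, -1, -1}};
  int next_name = 10;
  int live = reassign_virtuals(stmts, 1, next_name);
  assert(stmts[0].vuse == 1);
  assert(stmts[1].vuse == 1 && stmts[1].vdef == 10);
  assert(stmts[2].vuse == 10);
  assert(live == 10 && next_name == 11);
}
```

The name left live at the end is what uses after the block would need to
refer to, which is where the "uses outside of the loop" concern raised in
the thread comes in; the sketch does not model that part.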

> Thanks,
> bin
>
> 2016-04-22  Bin Cheng  
>
> * tree-if-conv.c (if_convertible_phi_p): Remove check on special
> virtual PHI nodes.  Delete parameter.
> (if_convertible_loop_p_1): Delete argument to above function.
> (degenerate_virtual_phi): New function.
> (predicate_all_scalar_phis): Rename to ...
> (process_all_phis): ... here.  Call degenerate_virtual_phi to
> handle virtual PHIs.
> (combine_blocks): Call renamed function.
>


Re: [Patch, regex, libstdc++/70745] Fix match_not_bow and match_not_eow

2016-04-22 Thread Jonathan Wakely

On 21/04/16 21:10 -0700, Tim Shen wrote:

Bootstrapped and tested on x86-pc-linux-gnu debug.

It is a conformance fix, but I don't think it's very important. I'm
happy to backport it to gcc 5/4.9, but if it's not considered
necessary, I'm Ok as well.


OK for trunk.

It is a small, safe change, so OK for the branches (6/5/4.9) too, but
let's wait a short while to make sure nobody finds any problems on
trunk (and gcc-6-branch is frozen for release right now anyway).

Thanks.



Re: [PATCH 1/2] (header usage fix) remove unused system header includes

2016-04-22 Thread Richard Biener
On Fri, Apr 22, 2016 at 12:02 PM, Szabolcs Nagy  wrote:
> These headers include "system.h" which already includes
>  and  so including them after "system.h"
> is a noop and including them before may cause problems
> if they depend on gcc macros from system.h.
>
> ipa-icf-gimple.c includes  after system.h which
> poisons various libc symbols which may cause problems
> and it is not used at all.
>
> Tested together with PATCH 2/2.
>
> OK for trunk?

Ok.

Thanks,
Richard.

> gcc/ChangeLog:
>
> 2016-04-22  Szabolcs Nagy  
>
> * auto-profile.c: Remove  include.
> * ipa-icf-gimple.c: Remove  include.
> * diagnostic.c: Remove  include.
> * genmatch.c: Likewise.
> * pretty-print.c: Likewise.
> * toplev.c: Likewise
> * c/c-objc-common.c: Likewise.
> * cp/error.c: Likewise.
> * fortran/error.c: Likewise.


[PATCH GCC]Improve tree ifconv by handling virtual PHIs which can be degenerated.

2016-04-22 Thread Bin Cheng
Hi,
Tree if-conv has the code below checking virtual PHI nodes in
if_convertible_phi_p:

  if (any_mask_load_store)
return true;

  /* When there were no if-convertible stores, check
 that there are no memory writes in the branches of the loop to be
 if-converted.  */
  if (virtual_operand_p (gimple_phi_result (phi)))
{
  imm_use_iterator imm_iter;
  use_operand_p use_p;

  if (bb != loop->header)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Virtual phi not on loop->header.\n");
  return false;
}

  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result (phi))
{
  if (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI
  && USE_STMT (use_p) != phi)
{
  if (dump_file && (dump_flags & TDF_DETAILS))
fprintf (dump_file, "Difficult to handle this virtual phi.\n");
  return false;
}
}
}

After investigation, I think it's to bypass code in the form of:


  .MEM_2232 = PHI <.MEM_574(179), .MEM_1247(183)> //  // 

* tree-if-conv.c (if_convertible_phi_p): Remove check on special
virtual PHI nodes.  Delete parameter.
(if_convertible_loop_p_1): Delete argument to above function.
(degenerate_virtual_phi): New function.
(predicate_all_scalar_phis): Rename to ...
(process_all_phis): ... here.  Call degenerate_virtual_phi to
handle virtual PHIs.
(combine_blocks): Call renamed function.

Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 235359)
+++ gcc/tree-if-conv.c  (working copy)
@@ -640,16 +640,11 @@ phi_convertible_by_degenerating_args (gphi *phi)
PHI is not if-convertible if:
- it has more than 2 arguments.
 
-   When we didn't see if-convertible stores, PHI is not
-   if-convertible if:
-   - a virtual PHI is immediately used in another PHI node,
-   - there is a virtual PHI in a BB other than the loop->header.
When the aggressive_if_conv is set, PHI can have more than
two arguments.  */
 
 static bool
-if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
- bool any_mask_load_store)
+if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi)
 {
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
@@ -669,36 +664,6 @@ static bool
 }
 }
 
-  if (any_mask_load_store)
-return true;
-
-  /* When there were no if-convertible stores, check
- that there are no memory writes in the branches of the loop to be
- if-converted.  */
-  if (virtual_operand_p (gimple_phi_result (phi)))
-{
-  imm_use_iterator imm_iter;
-  use_operand_p use_p;
-
-  if (bb != loop->header)
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "Virtual phi not on loop->header.\n");
- return false;
-   }
-
-  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, gimple_phi_result (phi))
-   {
- if (gimple_code (USE_STMT (use_p)) == GIMPLE_PHI
- && USE_STMT (use_p) != phi)
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "Difficult to handle this virtual phi.\n");
- return false;
-   }
-   }
-}
-
   return true;
 }
 
@@ -1405,8 +1370,7 @@ if_convertible_loop_p_1 (struct loop *loop,
   gphi_iterator itr;
 
   for (itr = gsi_start_phis (bb); !gsi_end_p (itr); gsi_next ())
-   if (!if_convertible_phi_p (loop, bb, itr.phi (),
-  

Re: [PATCH 2/2] (header usage fix) include c++ headers in system.h

2016-04-22 Thread James Greenhalgh
On Fri, Apr 22, 2016 at 11:02:48AM +0100, Szabolcs Nagy wrote:
> Some gcc source files include standard headers after
> "system.h" but those headers may declare and use poisoned
> symbols, they also cannot be included before "system.h"
> because they might depend on macro definitions from there,
> so they must be included in system.h.
> 
> This patch fixes the use of <list>, <map>, <set>, <vector>
> and <algorithm> headers, by using appropriate
> INCLUDE_{LIST, MAP, SET, VECTOR, ALGORITHM} macros.
> (Note that there are some other system header uses which
> did not get fixed.)
> 
> Build tested on aarch64-*-gnu, sh-*-musl, x86_64-*-musl and
> bootstrapped x86_64-*-gnu (together with PATCH 1/2).
> 
> is this ok for AIX?
> OK for trunk?

The AArch64 part of this is OK.

Thanks,
James



[PATCH 2/2] (header usage fix) include c++ headers in system.h

2016-04-22 Thread Szabolcs Nagy
Some gcc source files include standard headers after
"system.h" but those headers may declare and use poisoned
symbols, they also cannot be included before "system.h"
because they might depend on macro definitions from there,
so they must be included in system.h.

This patch fixes the use of <list>, <map>, <set>, <vector>
and <algorithm> headers, by using appropriate
INCLUDE_{LIST, MAP, SET, VECTOR, ALGORITHM} macros.
(Note that there are some other system header uses which
did not get fixed.)
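
As a rough illustration of the mechanism (not the actual system.h text),
the opt-in pattern can be sketched in a single file.  The INCLUDE_* macro
names come from the ChangeLog below; the header each one maps to is
inferred from the macro name, and the mock "system.h" section and the
count_distinct_keys helper are made up for this sketch.

```cpp
// A source file opts in *before* including system.h.
#define INCLUDE_MAP
#define INCLUDE_SET

// ---- mock of the relevant part of system.h (assumed layout) ----
// The C++ headers are pulled in here, before any identifiers are
// poisoned, and only when the including file asked for them.
#ifdef INCLUDE_LIST
# include <list>
#endif
#ifdef INCLUDE_MAP
# include <map>
#endif
#ifdef INCLUDE_SET
# include <set>
#endif
#ifdef INCLUDE_VECTOR
# include <vector>
#endif
#ifdef INCLUDE_ALGORITHM
# include <algorithm>
#endif
// ---- end of mock system.h; the real file poisons symbols later ----

#include <cassert>

// Uses only the two containers this file requested.
int count_distinct_keys() {
  std::map<int, int> m;
  m[1] = 2;
  m[3] = 4;
  m[1] = 5;  // overwrites the existing key, does not add a new one
  std::set<int> keys;
  for (std::map<int, int>::const_iterator it = m.begin(); it != m.end(); ++it)
    keys.insert(it->first);
  return static_cast<int>(keys.size());
}

void check_opt_in_headers() { assert(count_distinct_keys() == 2); }
```

The point of the ordering is that the standard headers are seen before the
poisoning takes effect, so a libc such as musl that mentions a poisoned
symbol in its headers no longer breaks the build.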

Build tested on aarch64-*-gnu, sh-*-musl, x86_64-*-musl and
bootstrapped x86_64-*-gnu (together with PATCH 1/2).

is this ok for AIX?
OK for trunk?

This would be nice to fix in gcc-6 too, because at least
with musl libc the bootstrap is broken.

gcc/ChangeLog:

2016-04-22  Szabolcs Nagy  

* system.h (list, map, set, vector): Include conditionally.
* auto-profile.c (INCLUDE_MAP, INCLUDE_SET): Define.
* graphite-isl-ast-to-gimple.c (INCLUDE_MAP): Define.
* ipa-icf.c (INCLUDE_LIST): Define.
* config/aarch64/cortex-a57-fma-steering.c (INCLUDE_LIST): Define.
* config/sh/sh.c (INCLUDE_VECTOR): Define.
* config/sh/sh_treg_combine.cc (INCLUDE_ALGORITHM): Define.
(INCLUDE_LIST, INCLUDE_VECTOR): Define.
* cp/logic.cc (INCLUDE_LIST): Define.
* fortran/trans-common.c (INCLUDE_MAP): Define.
diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 0c726bd..cd82ab4 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -19,6 +19,8 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#define INCLUDE_MAP
+#define INCLUDE_SET
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -31,10 +33,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cgraph.h"
 #include "gcov-io.h"
 #include "diagnostic-core.h"
-
-#include 
-#include 
-
 #include "profile.h"
 #include "langhooks.h"
 #include "cfgloop.h"
diff --git a/gcc/config/aarch64/cortex-a57-fma-steering.c b/gcc/config/aarch64/cortex-a57-fma-steering.c
index 21159fe..1bf804b 100644
--- a/gcc/config/aarch64/cortex-a57-fma-steering.c
+++ b/gcc/config/aarch64/cortex-a57-fma-steering.c
@@ -19,6 +19,7 @@
.  */
 
 #include "config.h"
+#define INCLUDE_LIST
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -37,8 +38,6 @@
 #include "cortex-a57-fma-steering.h"
 #include "aarch64-protos.h"
 
-#include 
-
 /* For better performance, the destination of FMADD/FMSUB instructions should
have the same parity as their accumulator register if the accumulator
contains the result of a previous FMUL or FMADD/FMSUB instruction if
diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c
index 8c8fe3c..b18e59b 100644
--- a/gcc/config/sh/sh.c
+++ b/gcc/config/sh/sh.c
@@ -20,9 +20,9 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include 
-#include 
 
 #include "config.h"
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
diff --git a/gcc/config/sh/sh_treg_combine.cc b/gcc/config/sh/sh_treg_combine.cc
index bc1ee0e..4d40715 100644
--- a/gcc/config/sh/sh_treg_combine.cc
+++ b/gcc/config/sh/sh_treg_combine.cc
@@ -19,6 +19,9 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#define INCLUDE_ALGORITHM
+#define INCLUDE_LIST
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -32,10 +35,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "expr.h"
 
-#include 
-#include 
-#include 
-
 /*
 This pass tries to optimize for example this:
 	mov.l	@(4,r4),r1
diff --git a/gcc/cp/logic.cc b/gcc/cp/logic.cc
index e4967bb..c12c381 100644
--- a/gcc/cp/logic.cc
+++ b/gcc/cp/logic.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#define INCLUDE_LIST
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -45,8 +46,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "toplev.h"
 #include "type-utils.h"
 
-#include 
-
 namespace {
 
 // Helper algorithms
diff --git a/gcc/fortran/trans-common.c b/gcc/fortran/trans-common.c
index 44787ae..4fdccc9 100644
--- a/gcc/fortran/trans-common.c
+++ b/gcc/fortran/trans-common.c
@@ -93,6 +93,7 @@ along with GCC; see the file COPYING3.  If not see
block for each merged equivalence list.  */
 
 #include "config.h"
+#define INCLUDE_MAP
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
@@ -100,9 +101,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "gfortran.h"
 #include "trans.h"
 #include "stringpool.h"
-
-#include 
-
 #include "fold-const.h"
 #include "stor-layout.h"
 #include "varasm.h"
diff --git a/gcc/graphite-isl-ast-to-gimple.c b/gcc/graphite-isl-ast-to-gimple.c
index 88609c0..049a4c5 100644
--- 

[PATCH 1/2] (header usage fix) remove unused system header includes

2016-04-22 Thread Szabolcs Nagy
These headers include "system.h" which already includes
 and  so including them after "system.h"
is a noop and including them before may cause problems
if they depend on gcc macros from system.h.

ipa-icf-gimple.c includes  after system.h which
poisons various libc symbols which may cause problems
and it is not used at all.

Tested together with PATCH 2/2.

OK for trunk?

gcc/ChangeLog:

2016-04-22  Szabolcs Nagy  

* auto-profile.c: Remove  include.
* ipa-icf-gimple.c: Remove  include.
* diagnostic.c: Remove  include.
* genmatch.c: Likewise.
* pretty-print.c: Likewise.
* toplev.c: Likewise
* c/c-objc-common.c: Likewise.
* cp/error.c: Likewise.
* fortran/error.c: Likewise.
diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 5c0640a..0c726bd 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -32,7 +32,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "gcov-io.h"
 #include "diagnostic-core.h"
 
-#include 
 #include 
 #include 
 
diff --git a/gcc/c/c-objc-common.c b/gcc/c/c-objc-common.c
index 18247af..20dc024 100644
--- a/gcc/c/c-objc-common.c
+++ b/gcc/c/c-objc-common.c
@@ -27,8 +27,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-objc-common.h"
 
-#include   // For placement new.
-
 static bool c_tree_printer (pretty_printer *, text_info *, const char *,
 			int, bool, bool, bool);
 
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index aa5fd41..7d70f89 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -31,8 +31,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "ubsan.h"
 #include "internal-fn.h"
 
-#include // For placement-new.
-
 #define pp_separate_with_comma(PP) pp_cxx_separate_with (PP, ',')
 #define pp_separate_with_semicolon(PP) pp_cxx_separate_with (PP, ';')
 
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 6a679cb..8106172 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -40,8 +40,6 @@ along with GCC; see the file COPYING3.  If not see
 # include 
 #endif
 
-#include  // For placement new.
-
 #define pedantic_warning_kind(DC)			\
   ((DC)->pedantic_errors ? DK_ERROR : DK_WARNING)
 #define permissive_error_kind(DC) ((DC)->permissive ? DK_WARNING : DK_ERROR)
diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c
index 003702b..6cfe019 100644
--- a/gcc/fortran/error.c
+++ b/gcc/fortran/error.c
@@ -34,8 +34,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic-color.h"
 #include "tree-diagnostic.h" /* tree_diagnostics_defaults */
 
-#include  /* For placement-new */
-
 static int suppress_errors = 0;
 
 static bool warnings_not_errors = false;
diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index 1f5f45c..ce964fa 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -22,7 +22,6 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "bconfig.h"
-#include 
 #include "system.h"
 #include "coretypes.h"
 #include 
diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c
index 69db0d3..9e3c862 100644
--- a/gcc/ipa-icf-gimple.c
+++ b/gcc/ipa-icf-gimple.c
@@ -35,7 +35,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "fold-const.h"
 #include "gimple-iterator.h"
 #include "ipa-utils.h"
-#include 
 #include "tree-eh.h"
 #include "builtins.h"
 
diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index acb89e6..49e1cb9 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -25,8 +25,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "pretty-print.h"
 #include "diagnostic-color.h"
 
-#include // For placement-new.
-
 #if HAVE_ICONV
 #include 
 #endif
diff --git a/gcc/toplev.c b/gcc/toplev.c
index c480bfc..8979d26 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -87,8 +87,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "xcoffout.h"		/* Needed for external data declarations. */
 #endif
 
-#include 
-
 static void general_init (const char *, bool);
 static void do_compile ();
 static void process_options (void);


Re: match.pd patch: min(-x, -y), min(~x, ~y)

2016-04-22 Thread Kyrill Tkachov


On 22/04/16 10:43, Kyrill Tkachov wrote:


On 22/04/16 10:42, Marc Glisse wrote:

On Fri, 22 Apr 2016, Kyrill Tkachov wrote:


2016-04-21  Marc Glisse 

gcc/
* match.pd (min(-x, -y), max(-x, -y), min(~x, ~y), max(~x, ~y)):
New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/minmax-2.c: New testcase.




I see the new testcase failing on aarch64:
FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized "__builtin_fmin"


Strange, it seems to work in 
https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02120.html

Is that on some freestanding kind of setup where the builtin might be disabled?



Ah, this is aarch64-none-elf which uses newlib as the C library.
Let me check on aarch64-none-linux-gnu and get back to you.



Yeah, I see it passing on aarch64-none-linux-gnu.
Do we have an appropriate effective target check to gate this test on?

Kyrill


Thanks,
Kyrill





Re: [PATCH][AArch64][wwwdocs] Summarise some more AArch64 changes for GCC6

2016-04-22 Thread James Greenhalgh
On Thu, Apr 21, 2016 at 09:15:17AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> Here's a proposed summary of the changes in the AArch64 backend for GCC 6.
> If there's anything I've missed it's purely my oversight, feel free to add
> entries or suggest improvements.

For me, I'm mostly happy with the wording below (I've tried to be
helpful inline). But I'm not as conscientious at checking grammar as others
in the community. So this is OK from an AArch64 target perspective with
the changes below, but wait a short while to give Gerald or Sandra a chance
to comment.

> Index: htdocs/gcc-6/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
> retrieving revision 1.62
> diff -U 3 -r1.62 changes.html
> --- htdocs/gcc-6/changes.html 24 Feb 2016 09:36:06 -  1.62
> +++ htdocs/gcc-6/changes.html 12 Apr 2016 12:47:30 -
> @@ -312,29 +312,91 @@
>  AArch64
> 
>   
> +   A number of AArch64-specific options were added.  The most important

"were" is in a different tense to the other entries in this list. To match,
you might want to use

  A number of AArch64-specific options have been added.

Or:

  A number of AArch64-specific options are now supported.

> +   ones are summarised in this section but for usage instructions please
> +   refer to the documentation.
> + 
> + 
> The new command line options -march=native,
> -mcpu=native and -mtune=native are now
> available on native AArch64 GNU/Linux systems.  Specifying
> these options will cause GCC to auto-detect the host CPU and
> rewrite these options to the optimal setting for that system.
> -   If GCC is unable to detect the host CPU these options have no effect.
>   
>   
> -   -fpic is now supported by the AArch64 target when 
> generating
> +   -fpic is now supported when generating
> code for the small code model (-mcmodel=small).  The 
> size of
> the global offset table (GOT) is limited to 28KiB under the LP64 SysV 
> ABI
> , and 15KiB under the ILP32 SysV ABI.
>   
>   
> -   The AArch64 port now supports target attributes and pragmas.  Please
> -   refer to the  href="https://gcc.gnu.org/onlinedocs/gcc/AArch64-Function-Attributes.html#AArch64-Function-Attributes;>
> -   documentation for details of available attributes and
> +   Target attributes and pragmas are now supported.  Please
> +   refer to the documentation for details of available attributes and
> pragmas as well as usage instructions.
>   
>   
> Link-time optimization across translation units with different
> target-specific options is now supported.
>   
> + 
> +   The option -mtls-size= is now supported.  It can be used 
> to
> +   specify the bit size of TLS offsets, allowing GCC to generate
> +   better TLS instruction sequences.
> + 
> + 
> +   The option -fno-plt is now fixed and is fully
> +   functional.

Remove "is now fixed" ?

> + 
> + 
> +   The ARMv8.1-A architecture and the Large System Extensions are now
> +   supported.  They can be used by specifying the
> +   -march=armv8.1-a option.  Additionally, the
> +   +lse option extension can be used in a similar fashion
> +   to other option extensions.
> +   The Large System Extensions introduce new instructions that are used
> +   in the implementation of common atomic operations.

Remove "common"

Thanks,
James


Re: match.pd patch: min(-x, -y), min(~x, ~y)

2016-04-22 Thread Kyrill Tkachov


On 22/04/16 10:42, Marc Glisse wrote:

On Fri, 22 Apr 2016, Kyrill Tkachov wrote:


2016-04-21  Marc Glisse 

gcc/
* match.pd (min(-x, -y), max(-x, -y), min(~x, ~y), max(~x, ~y)):
New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/minmax-2.c: New testcase.




I see the new testcase failing on aarch64:
FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized "__builtin_fmin"


Strange, it seems to work in 
https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02120.html

Is that on some freestanding kind of setup where the builtin might be disabled?



Ah, this is aarch64-none-elf which uses newlib as the C library.
Let me check on aarch64-none-linux-gnu and get back to you.

Thanks,
Kyrill


Re: match.pd patch: min(-x, -y), min(~x, ~y)

2016-04-22 Thread Marc Glisse

On Fri, 22 Apr 2016, Kyrill Tkachov wrote:


2016-04-21  Marc Glisse  

gcc/
* match.pd (min(-x, -y), max(-x, -y), min(~x, ~y), max(~x, ~y)):
New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/minmax-2.c: New testcase.




I see the new testcase failing on aarch64:
FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized "__builtin_fmin"


Strange, it seems to work in 
https://gcc.gnu.org/ml/gcc-testresults/2016-04/msg02120.html


Is that on some freestanding kind of setup where the builtin might be 
disabled?


--
Marc Glisse


Re: Document OpenACC status for GCC 6

2016-04-22 Thread Thomas Schwinge
Hi!

On Thu, 21 Apr 2016 12:19:31 -0600, Sandra Loosemore  
wrote:
> On 04/21/2016 10:21 AM, Thomas Schwinge wrote:
> > + Code will be offloaded onto multiple gangs, but executes with
> > +   just one worker, and a vector length of 1.
> 
> "will be" (future) vs "executes" (present).  Assuming this is all 
> supposed to describe current behavior, please write consistently in the 
> present tense.

Thanks for that.  I keep getting that wrong...

> My only comment on the rest of the patch is that "a kernels region" 
> sounds like a mistake but I think that is the official terminology?

Correct: it's an "OpenACC kernels construct/directive/region".

> -Sandra the nit-picky

Thanks for the review; OK to commit as follows?  And then, should
something be added to the "News" section on 
itself, too?  (I don't know the policy for that.  We didn't suggest that
for GCC 5, because at that time we described the support as a
"preliminary implementation of the OpenACC 2.0a specification"; now it's
much more complete and usable.)

Index: htdocs/gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.75
diff -u -p -r1.75 changes.html
--- htdocs/gcc-6/changes.html   21 Apr 2016 15:57:43 -  1.75
+++ htdocs/gcc-6/changes.html   22 Apr 2016 09:22:19 -
@@ -124,6 +124,52 @@ For more information, see the
 
 New Languages and Language specific improvements
 
+Compared to GCC 5, the GCC 6 release series includes a much improved
+implementation of the http://www.openacc.org/;>OpenACC 2.0a
+  specification.  Highlights are:
+
+  In addition to single-threaded host-fallback execution, offloading is
+   supported for nvptx (Nvidia GPUs) on x86_64 and PowerPC 64-bit
+   little-endian GNU/Linux host systems.  For nvptx offloading, with the
+   OpenACC parallel construct, the execution model allows for an arbitrary
+   number of gangs, up to 32 workers, and 32 vectors.
+  Initial support for parallelized execution of OpenACC kernels
+   constructs:
+   
+ Parallelization of a kernels region is switched on
+   by -fopenacc combined with -O2 or
+   higher.
+ Code is offloaded onto multiple gangs, but executes with just one
+   worker, and a vector length of 1.
+ Directives inside a kernels region are not supported.
+ Loops with reductions can be parallelized.
+ Only kernels regions with one loop nest are parallelized.
+ Only the outer-most loop of a loop nest can be parallelized.
+ Loop nests containing sibling loops are not parallelized.
+   
+   Typically, using the OpenACC parallel construct gives much better
+   performance, compared to the initial support of the OpenACC kernels
+   construct.
+  The device_type clause is not supported.
+   The bind and nohost clauses are not
+   supported.  The host_data directive is not supported in
+   Fortran.
+  Nested parallelism (cf. CUDA dynamic parallelism) is not
+   supported.
+  Usage of OpenACC constructs inside multithreaded contexts (such as
+   created by OpenMP, or pthread programming) is not supported.
+  If a call to the acc_on_device function has a
+   compile-time constant argument, the function call evaluates to a
+   compile-time constant value only for C and C++ but not for
+   Fortran.
+
+See the https://gcc.gnu.org/wiki/OpenACC;>OpenACC
+and https://gcc.gnu.org/wiki/Offloading;>Offloading wiki pages
+for further information.
+  
+
 
 
 C family


Grüße
 Thomas


Re: match.pd patch: min(-x, -y), min(~x, ~y)

2016-04-22 Thread Kyrill Tkachov

Hi Marc,

On 21/04/16 11:32, Marc Glisse wrote:

Hello,

another simple transformation.

Instead of the ":s", I had single_use (@2) || single_use (@3), but changed it
for simplicity. There may be some patterns in match.pd where we want something
like that though, as requiring single_use on many expressions may be stricter
than we need.


We could generalize to cases where overflow is not undefined if we know (VRP) 
that the variables are not TYPE_MIN_VALUE, but that didn't look like a priority.

Bootstrap+regtest on powerpc64le-unknown-linux-gnu.

2016-04-21  Marc Glisse  

gcc/
* match.pd (min(-x, -y), max(-x, -y), min(~x, ~y), max(~x, ~y)):
New transformations.

gcc/testsuite/
* gcc.dg/tree-ssa/minmax-2.c: New testcase.




I see the new testcase failing on aarch64:
FAIL: gcc.dg/tree-ssa/minmax-2.c scan-tree-dump optimized "__builtin_fmin"

The tree dump for the function 'h' is:
h (double x, double y)
{
  double _2;
  double _4;
  double _5;

  :
  _2 = -x_1(D);
  _4 = -y_3(D);
  _5 = __builtin_fmax (_2, _4);
  return _5;

}

Kyrill
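
For readers following the thread, the identities behind the new match.pd patterns can be checked with a small standalone C sketch. The helper names (min_i, max_i, check_minmax_identities) are invented for this illustration and are not part of the patch; the check assumes two's-complement int and keeps INT_MIN out of the range so that -x cannot overflow.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers for this sketch only.  */
static int min_i (int a, int b) { return a < b ? a : b; }
static int max_i (int a, int b) { return a > b ? a : b; }

/* Return 1 if the four identities hold for every pair (x, y) with
   lo <= x, y <= hi.  Callers must keep lo > INT_MIN (so -x is defined)
   and hi < INT_MAX (so the loop counters do not overflow).  */
int
check_minmax_identities (int lo, int hi)
{
  for (int x = lo; x <= hi; x++)
    for (int y = lo; y <= hi; y++)
      {
        /* Negation reverses the ordering, so min and max swap.  */
        if (min_i (-x, -y) != -max_i (x, y))
          return 0;
        if (max_i (-x, -y) != -min_i (x, y))
          return 0;
        /* ~x equals -x - 1, which also reverses the ordering.  */
        if (min_i (~x, ~y) != ~max_i (x, y))
          return 0;
        if (max_i (~x, ~y) != ~min_i (x, y))
          return 0;
      }
  return 1;
}
```

Calling check_minmax_identities (-100, 100) exercises every pair in that range. The floating-point variants (fmin/fmax) that the failing scan-tree-dump looks for are not covered by this integer-only sketch.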


Re: [PATCH] AARCH64: Remove spurious attribute __unused__ from NEON intrinsic

2016-04-22 Thread James Greenhalgh
On Fri, Apr 22, 2016 at 07:59:41AM +0200, Wladimir J. van der Laan wrote:
> The lane parameter is not unused, so should not be marked as such.
> 
> The others were removed in https://patchwork.ozlabs.org/patch/272912/,
> but this one appears to have been missed.

The patch looks good to me, and is OK for trunk... Now for all the
administration!

This patch will need a ChangeLog entry [1], please draft one that I can
use when I apply the patch.

I'm guessing that you don't have a copyright assignment on file with the
FSF. While trivial changes like this don't generally need one, if you plan
to contribute more substantial changes to GCC in future, you may want to
start the process (see [2]).

Thanks for the contribution, if you provide a ChangeLog, I'd be happy to
apply the patch.

Thanks,
James

---
[1]: http://www.gnu.org/prep/standards/standards.html#Change-Logs
[2]: https://gcc.gnu.org/contribute.html


> ---
>  gcc/config/aarch64/arm_neon.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 2612a32..0a2aa7b 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -14070,7 +14070,7 @@ vdupb_laneq_p8 (poly8x16_t __a, const int __b)
>  }
>  
>  __extension__ static __inline int8_t __attribute__ ((__always_inline__))
> -vdupb_laneq_s8 (int8x16_t __a, const int __attribute__ ((unused)) __b)
> +vdupb_laneq_s8 (int8x16_t __a, const int __b)
>  {
>return __aarch64_vget_lane_any (__a, __b);
>  }
> -- 
> 1.9.1
> 


Re: match.pd patch: u + 3 < u is u > UINT_MAX - 3

2016-04-22 Thread Marc Glisse

On Fri, 22 Apr 2016, Richard Biener wrote:


On Fri, Apr 22, 2016 at 5:29 AM, Marc Glisse  wrote:

Hello,

this optimizes a common pattern for unsigned overflow detection, when one of
the arguments turns out to be a constant. There are more ways this could
look like, (a + 42 <= 41) in particular, but that'll be for another patch.


This case is also covered by fold_comparison which should be re-written
to match.pd patterns (and removed from fold-const.c).

fold_binary also has a few interesting/similar equality compare cases
like X +- Y CMP X to Y CMP 0 which look related.

Also your case is in fold_binary for the case of undefined overflow:


As far as I can tell, fold-const.c handles this kind of transformation 
strictly in the case of undefined overflow (or floats), while this is 
strictly in the case of unsigned with wrapping overflow. I thought it 
would be more readable to take advantage of the genmatch machinery and 
group the wrapping transforms in one place, and the undefined overflow 
ones in another place (they don't group the same way by operator, etc).


If you prefer to group by pattern shape and port the related fold-const.c 
bit at the same time, I could try that...



+/* When one argument is a constant, overflow detection can be simplified.
+   Currently restricted to single use so as not to interfere too much with
+   ADD_OVERFLOW detection in tree-ssa-math-opts.c.  */
+(for cmp (lt le ge gt)
+ out (gt gt le le)
+ (simplify
+  (cmp (plus@2 @0 integer_nonzerop@1) @0)
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
+   && TYPE_MAX_VALUE (TREE_TYPE (@0))
+   && single_use (@2))
+   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)))))
+(for cmp (gt ge le lt)
+ out (gt gt le le)
+ (simplify
+  (cmp @0 (plus@2 @0 integer_nonzerop@1))
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
+   && TYPE_MAX_VALUE (TREE_TYPE (@0))
+   && single_use (@2))
+   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)))))

please add a comment with the actual transform - A + CST CMP A -> A CMP' CST'

As we are relying on two's-complement wrapping you shouldn't need TYPE_MAX_VALUE
here but you can use wi::max_value (precision, sign).  I'm not sure we have
sensible TYPE_MAX_VALUE for vector or complex types - the accessor uses
NUMERICAL_TYPE_CHECK and TYPE_OVERFLOW_WRAPS checks for ANY_INTEGRAL_TYPE.
Thus I wonder if we should restrict this to INTEGRAL_TYPE_P (making the
wi::max_value route valid).


integer_nonzerop currently already restricts to INTEGER_CST or 
COMPLEX_CST, and I don't think complex can appear in a comparison. I'll go 
back to writing the more explicit INTEGER_CST in the pattern and I'll use 
wide_int.


Thanks,

--
Marc Glisse
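
As an illustration of the transform discussed above (A + CST CMP A -> A CMP' CST' for wrapping unsigned types), the following sketch checks all four comparison codes exhaustively on unsigned char. The function name is made up for this example, and the check is plain C rather than anything taken from match.pd.

```c
#include <assert.h>
#include <limits.h>

/* Return 1 if, for every unsigned char u and every nonzero constant c,
   the wrapping comparison agrees with its simplified form:
     u + c <  u   and   u + c <= u   become   u >  MAX - c
     u + c >= u   and   u + c >  u   become   u <= MAX - c
   This mirrors the (lt le ge gt) -> (gt gt le le) mapping in the patch.  */
int
check_overflow_rewrite (void)
{
  for (unsigned c = 1; c <= UCHAR_MAX; c++)
    for (unsigned v = 0; v <= UCHAR_MAX; v++)
      {
        unsigned char u = (unsigned char) v;
        /* The cast makes the sum wrap modulo UCHAR_MAX + 1.  */
        unsigned char sum = (unsigned char) (u + c);
        unsigned char limit = (unsigned char) (UCHAR_MAX - c);

        if ((sum < u) != (u > limit))
          return 0;
        if ((sum <= u) != (u > limit))
          return 0;
        if ((sum >= u) != (u <= limit))
          return 0;
        if ((sum > u) != (u <= limit))
          return 0;
      }
  return 1;
}
```

The lt and le cases collapse to the same result because c is nonzero, so the wrapped sum can never equal u; that is why the pattern requires integer_nonzerop.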


Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-04-22 Thread Richard Biener
On Thu, Apr 21, 2016 at 1:12 PM, kugan
 wrote:
> Hi Richard,
>
>
> On 19/04/16 22:11, Richard Biener wrote:
>>
>> On Tue, Apr 19, 2016 at 1:36 PM, Richard Biener
>>  wrote:
>>>
>>> On Tue, Apr 19, 2016 at 1:35 PM, Richard Biener
>>>  wrote:

>>>> On Mon, Feb 29, 2016 at 11:53 AM, kugan
>>>>  wrote:
>>>>>>
>>>>>>
>>>>>> Err.  I think the way you implement that in reassoc is ad-hoc and not
>>>>>> related to reassoc at all.
>>>>>>
>>>>>> In fact what reassoc is missing is to handle
>>>>>>
>>>>>>    -y * z * (-w) * x -> y * x * w * x
>>>>>>
>>>>>> thus optimize negates as if they were additional * -1 entries in a
>>>>>> multiplication chain.  And
>>>>>> then optimize a single remaining * -1 in the result chain to a negate.
>>>>>>
>>>>>> Then match.pd handles x + (-y) -> x - y (independent of
>>>>>> -frounding-math
>>>>>> btw).
>>>>>>
>>>>>> So no, this isn't ok as-is, IMHO you want to expand the multiplication
>>>>>> ops
>>>>>> chain
>>>>>> pulling in the * -1 ops (if single-use, of course).
>>>>>>
>>>>>
>>>>> I agree. Here is the updated patch along what you suggested. Does this
>>>>> look
>>>>> better ?
>>>>
>>>>
>>>> It looks better but I think you want to do factor_out_negate_expr before
>>>> the
>>>> first qsort/optimize_ops_list call to catch -1. * z * (-w) which also
>>>> means you
>>>> want to simply append a -1. to the ops list rather than adjusting the
>>>> result
>>>> with a negate stmt.
>>>>
>>>> You also need to guard all this with ! HONOR_SNANS (type) && (!
>>>> HONOR_SIGNED_ZEROS (type)
>>>> || ! COMPLEX_FLOAT_TYPE_P (type)) (see match.pd pattern transforming x
>>>> * -1. to -x).
>>>
>>>
>>> And please add at least one testcase.
>>
>>
>> And it appears to me that you could handle this in linearize_expr_tree
>> as well, similar
>> to how we handle MULT_EXPR with acceptable_pow_call there by adding -1.
>> and
>> op into the ops vec.
>>
>
>
> I am not sure I understand this. I tried doing this. If I add -1 and rhs1
> for the NEGATE_EXPR to the ops list, when it comes to rewrite_expr_tree the
> constant will be sorted early, which would make it hard to generate:
>  x + (-y * z * z) => x - y * z * z
>
> Do you want to swap the constant in MULT_EXPR chain (if present) like in
> swap_ops_for_binary_stmt and then create a NEGATE_EXPR ?

In addition to linearize_expr handling you need to handle a -1 in the MULT_EXPR
chain specially at rewrite_expr_tree time by emitting a NEGATE_EXPR instead
of a MULT_EXPR (which also means you should sort the -1 "last").

Richard.
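
The idea of treating each negate as an extra * -1 entry and emitting at most one negate at the end can be sketched outside of GCC as follows. This is only a toy model of the suggestion: struct operand, multiply_chain and add_chain are names invented for this illustration, it works on plain doubles rather than on GIMPLE, and it ignores the HONOR_SNANS / HONOR_SIGNED_ZEROS guards mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* One entry of a multiplication chain: the operand value and whether
   it appeared negated in the source expression.  */
struct operand
{
  double value;
  int negated;
};

/* Multiply the chain with all negations stripped.  Each negation is
   treated as an extra -1 factor; pairs of -1 cancel, so only the parity
   matters.  *NEEDS_NEGATE is set to 1 if a single -1 is left over.  */
double
multiply_chain (const struct operand *ops, size_t n, int *needs_negate)
{
  double product = 1.0;
  int odd = 0;
  for (size_t i = 0; i < n; i++)
    {
      product *= ops[i].value;
      odd ^= (ops[i].negated != 0);
    }
  *needs_negate = odd;
  return product;
}

/* x + (-p) is emitted as x - p instead of negating p first.  */
double
add_chain (double x, const struct operand *ops, size_t n)
{
  int needs_negate;
  double p = multiply_chain (ops, n, &needs_negate);
  return needs_negate ? x - p : x + p;
}
```

With the chain {-y, z, z} the leftover -1 turns the addition into a subtraction, which is the x + (-y * z * z) => x - y * z * z form from the subject line; with {-y, z, -w, x} the two negations cancel and no negate is needed.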

>
> Thanks,
> Kugan
>
>
>> Similar for the x + x + x -> 3 * x case we'd want to add a repeat op when
>> seeing
>> x + 3 * x + x and use ->count in that patch as well.
>>
>> Best split out the
>>
>>if (rhscode == MULT_EXPR
>>&& TREE_CODE (binrhs) == SSA_NAME
>>&& acceptable_pow_call (SSA_NAME_DEF_STMT (binrhs), ,
>> ))
>>  {
>>add_repeat_to_ops_vec (ops, base, exponent);
>>gimple_set_visited (SSA_NAME_DEF_STMT (binrhs), true);
>>  }
>>else
>>  add_to_ops_vec (ops, binrhs);
>>
>> pattern into a helper that handles the other cases.
>>
>> Richard.
>>
>>> Richard.
>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Kugan


Re: Inline across -ffast-math boundary

2016-04-22 Thread Richard Biener
On Thu, 21 Apr 2016, Jan Hubicka wrote:

> Hi,
> this patch implements the long promised logic to inline across -ffast-math
> boundary when either caller or callee has no fp operations in it.  This is
> needed to resolve code quality regression on Firefox with LTO where
> -O3/-O2/-Ofast flags are combined and we fail to inline a lot of comdats
> otherwise.
> 
> Bootstrapped/regtested x86_64-linux. Richard, I would like to know your
> opinion on fp_expression_p predicate - it is a bit ugly but I do not know how
> to implement it better.
> 
> We still won't inline -O1 code into -O2+ because flag_strict_overflow differs.
> I will implement similar logic for overflows incrementally. Similarly
> flag_errno_math can be handled better, but I am not sure it matters - I think
> the vast majority of the time users set errno_math in sync with other
> -ffast-math implied flags.

Note that for reasons PR70586 shows (const functions having possible
trapping side-effect because of FP math or division) we'd like to
have sth like "uses FP math" "uses possibly trapping integer math"
"uses integer math with undefined overflow" on a per function level
and propagated alongside pure/const/nothrow state.

So maybe you can fit that into a more suitable place than just the
inliner (which of course is interested in "uses FP math locally",
not the transitive answer we need for PR70586).

More comments below.

> Honza
> 
> 
>   * ipa-inline-analysis.c (reset_inline_summary): Clear fp_expressions
>   (dump_inline_summary): Dump it.
>   (fp_expression_p): New predicate.
>   (estimate_function_body_sizes): Use it.
>   (inline_merge_summary): Merge fp_expressions.
>   (inline_read_section): Read fp_expressions.
>   (inline_write_summary): Write fp_expressions.
>   * ipa-inline.c (can_inline_edge_p): Permit inlining across fp math
>   codegen boundary if either caller or callee is !fp_expressions.
>   * ipa-inline.h (inline_summary): Add fp_expressions.
>   * ipa-inline-transform.c (inline_call): When inlining !fp_expressions
>   to fp_expressions be sure the fp generation flags are updated.
> 
>   * gcc.dg/ipa/inline-8.c: New testcase.
> Index: ipa-inline-analysis.c
> ===
> --- ipa-inline-analysis.c (revision 235312)
> +++ ipa-inline-analysis.c (working copy)
> @@ -1069,6 +1069,7 @@ reset_inline_summary (struct cgraph_node
>  reset_inline_edge_summary (e);
>for (e = node->indirect_calls; e; e = e->next_callee)
>  reset_inline_edge_summary (e);
> +  info->fp_expressions = false;
>  }
>  
>  /* Hook that is called by cgraph.c when a node is removed.  */
> @@ -1423,6 +1424,8 @@ dump_inline_summary (FILE *f, struct cgr
>   fprintf (f, " inlinable");
>if (s->contains_cilk_spawn)
>   fprintf (f, " contains_cilk_spawn");
> +  if (s->fp_expressions)
> + fprintf (f, " fp_expression");
>fprintf (f, "\n  self time:   %i\n", s->self_time);
>fprintf (f, "  global time: %i\n", s->time);
>fprintf (f, "  self size:   %i\n", s->self_size);
> @@ -2459,6 +2462,42 @@ clobber_only_eh_bb_p (basic_block bb, bo
>return true;
>  }
>  
> +/* Return true if STMT computes a floating point expression that may be
> +   affected by -ffast-math and similar flags.  */
> +
> +static bool
> +fp_expression_p (gimple *stmt)
> +{
> +  tree fndecl;
> +
> +  if (gimple_code (stmt) == GIMPLE_ASSIGN
> +  /* Even conversion to and from float are FP expressions.  */
> +  && (FLOAT_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
> +   || FLOAT_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
> +  /* Plain moves are safe.  */
> +  && (IS_EXPR_CODE_CLASS (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)))
> +   || TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison))
> +return true;
> +
> +  /* Comparisons may be optimized with assumption that value is not NaN.  */
> +  if (gimple_code (stmt) == GIMPLE_COND
> +  && (FLOAT_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt)))
> +   || FLOAT_TYPE_P (TREE_TYPE (gimple_cond_rhs (stmt)))))
> +return true;
> +
> +  /* Builtins may be optimized depending on math mode.  We don't really have
> + list of these, so just check that there are no FP arguments.  */
> +  if (gimple_code (stmt) == GIMPLE_CALL
> +  && (fndecl = gimple_call_fndecl (stmt)) != NULL_TREE
> +  && DECL_BUILT_IN_CLASS (fndecl) != NOT_BUILT_IN)
> +{
> +  for (unsigned int i=0; i < gimple_call_num_args (stmt); i++)
> + if (FLOAT_TYPE_P (TREE_TYPE (gimple_call_arg (stmt, i))))
> +   return true;
> +}

I'd say a simpler implementation with the same effect would be

   FOR_EACH_SSA_TREE_OPERAND (op, stmt, i, SSA_OP_DEF|SSA_OP_USE)
 if (FLOAT_TYPE_P (TREE_TYPE (op)))
   return true;

relying on the fact that no unfolded fully constant expressions
appear in the IL (the above doesn't visit constants).
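
The inlining rule described in the ChangeLog above (permit inlining across the fp math codegen boundary if either caller or callee has no fp expressions) can be modelled with a small standalone sketch. The struct and function names here are hypothetical and do not correspond to the real inline_summary or can_inline_edge_p; the single fast_math flag stands in for the whole set of FP code generation options.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function summary for this sketch.  */
struct fn_summary
{
  bool fp_expressions;  /* Body contains FP-sensitive statements.  */
  bool fast_math;       /* Stand-in for the FP codegen flags.  */
};

/* Return true if inlining CALLEE into CALLER cannot change FP semantics:
   either the flags agree, or at least one side has no FP expressions
   that the flags could affect.  */
bool
can_inline_across_fp_boundary (const struct fn_summary *caller,
                               const struct fn_summary *callee)
{
  if (caller->fast_math == callee->fast_math)
    return true;
  return !caller->fp_expressions || !callee->fp_expressions;
}
```

When the caller is the side without FP expressions, the ChangeLog entry for ipa-inline-transform.c indicates that its FP generation flags have to be updated to those of the callee; the sketch does not model that step.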


Re: match.pd patch: u + 3 < u is u > UINT_MAX - 3

2016-04-22 Thread Richard Biener
On Fri, Apr 22, 2016 at 5:29 AM, Marc Glisse  wrote:
> Hello,
>
> this optimizes a common pattern for unsigned overflow detection, when one of
> the arguments turns out to be a constant. There are more ways this could
> look like, (a + 42 <= 41) in particular, but that'll be for another patch.

This case is also covered by fold_comparison which should be re-written
to match.pd patterns (and removed from fold-const.c).

fold_binary also has a few interesting/similar equality compare cases
like X +- Y CMP X to Y CMP 0 which look related.

Also your case is in fold_binary for the case of undefined overflow:

  /* Transform comparisons of the form X +- C CMP X.  */
  if ((TREE_CODE (arg0) == PLUS_EXPR || TREE_CODE (arg0) == MINUS_EXPR)
  && operand_equal_p (TREE_OPERAND (arg0, 0), arg1, 0)
  && ((TREE_CODE (TREE_OPERAND (arg0, 1)) == REAL_CST
   && !HONOR_SNANS (arg0))
  || (TREE_CODE (TREE_OPERAND (arg0, 1)) == INTEGER_CST
  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (arg1)))))
{
...

+/* When one argument is a constant, overflow detection can be simplified.
+   Currently restricted to single use so as not to interfere too much with
+   ADD_OVERFLOW detection in tree-ssa-math-opts.c.  */
+(for cmp (lt le ge gt)
+ out (gt gt le le)
+ (simplify
+  (cmp (plus@2 @0 integer_nonzerop@1) @0)
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
+   && TYPE_MAX_VALUE (TREE_TYPE (@0))
+   && single_use (@2))
+   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)))))
+(for cmp (gt ge le lt)
+ out (gt gt le le)
+ (simplify
+  (cmp @0 (plus@2 @0 integer_nonzerop@1))
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
+   && TYPE_MAX_VALUE (TREE_TYPE (@0))
+   && single_use (@2))
+   (out @0 (minus { TYPE_MAX_VALUE (TREE_TYPE (@0)); } @1)))))

please add a comment with the actual transform - A + CST CMP A -> A CMP' CST'

As we are relying on two's-complement wrapping you shouldn't need TYPE_MAX_VALUE
here but you can use wi::max_value (precision, sign).  I'm not sure we have
sensible TYPE_MAX_VALUE for vector or complex types - the accessor uses
NUMERICAL_TYPE_CHECK and TYPE_OVERFLOW_WRAPS checks for ANY_INTEGRAL_TYPE.
Thus I wonder if we should restrict this to INTEGRAL_TYPE_P (making the
wi::max_value route valid).

Thanks,
Richard.

> Bootstrap+regtest on powerpc64le-unknown-linux-gnu.



> 2016-04-22  Marc Glisse  
>
> gcc/
> * match.pd (X + CST CMP X): New transformation.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/overflow-1.c: New testcase.
>
> --
> Marc Glisse


Re: [PATCH GCC]Refactor IVOPT.

2016-04-22 Thread Richard Biener
On Thu, Apr 21, 2016 at 7:26 PM, Bin Cheng  wrote:
> Hi,
> This patch refactors IVOPT in three major aspects:
>
> Firstly it rewrites iv_use groups.  Use groups were originally introduced
> only for address type uses; this patch generalizes them to all (generic,
> compare, address) types.  Currently generic/compare groups contain only one
> iv_use, and compare groups can be extended to contain multiple uses.  As far
> as generic use is concerned, it won't contain multiple uses because IVOPT
> already reuses one iv_use structure for generic uses at different places.
> This change also cleans up algorithms as well as data structures.
>
> Secondly it implements the group data structure as a vector rather than as a
> list.  A list was used because it's easy to split.  Of course a list is hard
> to sort (for example, we use quadratic insertion sort now).  This problem
> will become more critical since I plan to introduce fine control over
> splitting small address groups by checking whether the target supports
> load/store pair instructions.  In that case an address group needs to be
> sorted more than once and against complex conditions; for example, memory
> loads in one basic block should be sorted together in ascending offset
> order.  With a vector group, sorting can be done very efficiently with quick
> sort.
>
> Thirdly this patch cleans up/reformats IVOPT's dump information.  I think
> the information is easier to read/test now.  Since the change of dump
> information is entangled with the group data-structure change, it's hard to
> make it a standalone patch.  Given that this part of the patch is quite
> straightforward, I hope it won't be confusing.
>
> Bootstrapped and tested on x86_64 and AArch64, no regressions.  I also
> checked the generated assembly for spec2k and spec2k6 on both platforms; the
> output assembly is almost unchanged except for several files.  After further
> investigation, I can confirm the difference is caused by a small change when
> sorting groups.  Given the original sorting condition below:
> -  /* Sub use list is maintained in offset ascending order.  */
> -  if (addr_offset <= group->addr_offset)
> -{
> -  use->related_cands = group->related_cands;
> -  group->related_cands = NULL;
> -  use->next = group;
> -  data->iv_uses[id_group] = use;
> -}
> iv_uses with the same addr_offset are sorted in reverse control flow order.
> This might be a typo since I don't remember any specific reason for it.  If
> this patch sorts groups in the same way, there is no difference in the
> generated assembly at all.  So this patch is pure refactoring work which
> doesn't have any functional change.
>
> Any comments?

Looks good to me.

Richard.
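
To illustrate the point about vectors being easier to sort than lists, here is a minimal sketch of sorting an address group by offset with qsort. The struct layout and the names (struct use, compare_offset, sort_group) are assumptions made for this example and are not the actual group_compare_offset from the patch. Since qsort is not guaranteed to be stable, the comparator breaks ties on the use id so that uses with the same offset keep a deterministic order.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an iv_use in an address group.  */
struct use
{
  long addr_offset;
  unsigned id;
};

/* Order by ascending offset, then by ascending id for equal offsets.  */
static int
compare_offset (const void *a, const void *b)
{
  const struct use *u1 = a;
  const struct use *u2 = b;

  if (u1->addr_offset != u2->addr_offset)
    return u1->addr_offset < u2->addr_offset ? -1 : 1;
  return (u1->id > u2->id) - (u1->id < u2->id);
}

/* Sort a group stored as a contiguous array of uses.  */
void
sort_group (struct use *uses, size_t n)
{
  qsort (uses, n, sizeof *uses, compare_offset);
}
```

Flipping the sign of the id tie-break would model the "reverse control flow order" behaviour of the old list insertion that the message attributes the small assembly differences to.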

> Thanks,
> bin
>
> 2016-04-19  Bin Cheng  
>
> * tree-ssa-loop-ivopts.c (struct iv): Use pointer to struct iv_use
> instead of redundant use_id and boolean have_use_for.
> (struct iv_use): Change sub_id into group_id.  Remove field next.
> Move fields: related_cands, n_map_members, cost_map and selected
> to ...
> (struct iv_group): ... here.  New structure.
> (struct iv_common_cand): Use structure declaration directly.
> (struct ivopts_data, iv_ca, iv_ca_delta): Rename fields.
> (MAX_CONSIDERED_USES): Rename macro to ...
> (MAX_CONSIDERED_GROUPS): ... here.
> (n_iv_uses, iv_use, n_iv_cands, iv_cand): Delete.
> (dump_iv, dump_use, dump_cand): Refactor format of dump information.
> (dump_uses): Rename to ...
> (dump_groups): ... here.  Update all uses.
> (tree_ssa_iv_optimize_init, alloc_iv): Update all uses.
> (find_induction_variables): Refactor format of dump information.
> (record_sub_use): Delete.
> (record_use): Update all uses.
> (record_group): New function.
> (record_group_use, find_interesting_uses_op): Call above functions.
> Update all uses.
> (find_interesting_uses_cond): Ditto.
> (group_compare_offset): New function.
> (split_all_small_groups): Rename to ...
> (split_small_address_groups_p): ... here.  Update all uses.
> (split_address_groups):  Update all uses.
> (find_interesting_uses): Refactor format of dump information.
> (add_candidate_1): Update all uses.  Remove redundant check on iv,
> base and step.
> (add_candidate, record_common_cand): Remove redundant assert.
> (add_iv_candidate_for_biv): Update use.
> (add_iv_candidate_derived_from_uses): Update all uses.
> (add_iv_candidate_for_groups, record_important_candidates): Ditto.
> (alloc_use_cost_map): Ditto.
> (set_use_iv_cost, get_use_iv_cost): Rename to ...
> (set_group_iv_cost, get_group_iv_cost): ... here.  Update all uses.
> (determine_use_iv_cost_generic): Ditto.
> (determine_group_iv_cost_generic): Ditto.
> 

Re: [PATCH] Fixup nb_iterations_upper_bound adjustment for vectorized loops

2016-04-22 Thread Richard Biener
On Thu, Apr 21, 2016 at 6:09 PM, Ilya Enkovich  wrote:
> Hi,
>
> Currently when loop is vectorized we adjust its nb_iterations_upper_bound
> by dividing it by VF.  This is incorrect since nb_iterations_upper_bound
> is upper bound for ( - 1) and therefore simple
> dividing it by VF in many cases gives us bounds greater than a real one.
> Correct value would be ((nb_iterations_upper_bound + 1) / VF - 1).

Yeah, that seems correct.
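
A small sketch of the arithmetic, using the numbers from the example later in this message (at most 127 scalar iterations, VF=64).  The function names are invented for the illustration, and clamping to zero when fewer than VF iterations are possible is an assumption of the sketch, not something the patch is known to do.

```c
#include <assert.h>

/* Bound used before the patch: plain floor division of the scalar bound.  */
unsigned long
old_vector_bound (unsigned long scalar_bound, unsigned long vf)
{
  return scalar_bound / vf;
}

/* Bound proposed in the patch: (scalar_bound + 1) / vf - 1.
   SCALAR_BOUND bounds the scalar iteration count minus one, so adding 1
   recovers the iteration count before dividing by the vectorization
   factor.  The clamp to 0 is a choice made for this sketch only.  */
unsigned long
new_vector_bound (unsigned long scalar_bound, unsigned long vf)
{
  unsigned long max_scalar_iterations = scalar_bound + 1;
  unsigned long max_vector_iterations = max_scalar_iterations / vf;
  return max_vector_iterations ? max_vector_iterations - 1 : 0;
}
```

For 127 scalar iterations the scalar bound is 126: the old formula gives 126 / 64 = 1 (up to two vector iterations), while the new one gives 127 / 64 - 1 = 0 (the vector body runs at most once).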

> Also decrement due to peeling for gaps should happen before we scale it
> by VF because peeling applies to a scalar loop, not vectorized one.

That's not true - PEELING_FOR_GAPs is so that the last _vector_ iteration
is peeled as scalar operations.  We do not account for the amount
of known prologue peeling (if peeling for alignment and the misalignment
is known at compile-time) - that would be peeling of scalar iterations.

But it would be interesting to know why we need the != 0 check - static
cost modelling should have disabled vectorization if the vectorized body
isn't run.

> This patch modifies nb_iterations_upper_bound computation to resolve
> these issues.

You do not adjust the ->nb_iterations_estimate accordingly.

> Running regression testing I got one failure due to an optimized loop.  Here
> is the loop:
>
> foo (signed char s)
> {
>   signed char i;
>   for (i = 0; i < s; i++)
> yy[i] = (signed int) i;
> }
>
> Here we vectorize for AVX512 using VF=64.  The original loop has at most 127
> iterations and therefore the vectorized loop may be executed only once.
> With the patch applied the compiler detects this and transforms the loop into
> a BB with just stores of constant vectors into yy.  The test was adjusted
> to increase the number of possible iterations.  A copy of the test was added
> to check that we can optimize out the original loop.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu.  OK for trunk?

I'd like to see testcases covering the corner-cases - have them have
upper bound estimates by adjusting known array sizes and also cover
the case of peeling for gaps.

Richard.

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-04-21  Ilya Enkovich  
>
> * tree-vect-loop.c (vect_transform_loop): Fix
> nb_iterations_upper_bound computation for vectorized loop.
>
> gcc/testsuite/
>
> 2016-04-21  Ilya Enkovich  
>
> * gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid
> optimization of vector loop.
> * gcc.target/i386/vect-unpack-3.c: New test.
>
>
> diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c 
> b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> index 4825248..51c518e 100644
> --- a/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-2.c
> @@ -6,19 +6,22 @@
>
>  #define N 120
>  signed int yy[1];
> +signed char zz[1];
>
>  void
> -__attribute__ ((noinline)) foo (signed char s)
> +__attribute__ ((noinline,noclone)) foo (int s)
>  {
> -   signed char i;
> +   int i;
> for (i = 0; i < s; i++)
> - yy[i] = (signed int) i;
> + yy[i] = zz[i];
>  }
>
>  void
>  avx512bw_test ()
>  {
>signed char i;
> +  for (i = 0; i < N; i++)
> +zz[i] = i;
>foo (N);
>for (i = 0; i < N; i++)
>  if ( (signed int)i != yy [i] )
> diff --git a/gcc/testsuite/gcc.target/i386/vect-unpack-3.c 
> b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
> new file mode 100644
> index 000..eb8a93e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/vect-unpack-3.c
> @@ -0,0 +1,29 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-vect-details -ftree-vectorize -ffast-math 
> -mavx512bw -save-temps" } */
> +/* { dg-require-effective-target avx512bw } */
> +
> +#include "avx512bw-check.h"
> +
> +#define N 120
> +signed int yy[1];
> +
> +void
> +__attribute__ ((noinline)) foo (signed char s)
> +{
> +   signed char i;
> +   for (i = 0; i < s; i++)
> + yy[i] = (signed int) i;
> +}
> +
> +void
> +avx512bw_test ()
> +{
> +  signed char i;
> +  foo (N);
> +  for (i = 0; i < N; i++)
> +if ( (signed int)i != yy [i] )
> +  abort ();
> +}
> +
> +/* { dg-final { scan-tree-dump "LOOP VECTORIZED" "vect" } } */
> +/* { dg-final { scan-assembler-not "vpmovsxbw\[ \\t\]+\[^\n\]*%zmm" } } */
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index d813b86..da98211 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6921,11 +6921,13 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>/* Reduce loop iterations by the vectorization factor.  */
>scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vectorization_factor),
>   expected_iterations / vectorization_factor);
> -  loop->nb_iterations_upper_bound
> -= wi::udiv_floor (loop->nb_iterations_upper_bound, vectorization_factor);
>if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
>&& loop->nb_iterations_upper_bound != 0)
>  loop->nb_iterations_upper_bound = loop->nb_iterations_upper_bound - 1;
> +  

Re: [4.9/5/6: PATCH] Replace -skip-rax-setup with -mskip-rax-setup

2016-04-22 Thread Richard Biener
On Thu, Apr 21, 2016 at 4:52 PM, H.J. Lu wrote:
> On Wed, Apr 20, 2016 at 5:56 AM, H.J. Lu wrote:
>> This fixed a typo.  Checked into trunk.
>>
>> H.J.
>> ---
>> Index: gcc/ChangeLog
>> ===
>> --- gcc/ChangeLog   (revision 235274)
>> +++ gcc/ChangeLog   (working copy)
>> @@ -1,3 +1,7 @@
>> +2016-04-20  H.J. Lu  
>> +
>> +   * doc/invoke.texi: Replace -skip-rax-setup with -mskip-rax-setup.
>> +
>>  2016-04-20  Richard Biener  
>>
>> * gimple-match.h (maybe_build_generic_op): Adjust prototype.
>> Index: gcc/doc/invoke.texi
>> ===
>> --- gcc/doc/invoke.texi (revision 235274)
>> +++ gcc/doc/invoke.texi (working copy)
>> @@ -24157,7 +24157,7 @@ useful together with @option{-mrecord-mc
>>  @itemx -mno-skip-rax-setup
>>  @opindex mskip-rax-setup
>>  When generating code for the x86-64 architecture with SSE extensions
>> -disabled, @option{-skip-rax-setup} can be used to skip setting up RAX
>> +disabled, @option{-mskip-rax-setup} can be used to skip setting up RAX
>>  register when there are no variable arguments passed in vector registers.
>>
>>  @strong{Warning:} Since RAX register is used to avoid unnecessarily
>
> OK for 4.9, 5 and 6 branches?

Ok.

Richard.

>
> --
> H.J.


[PATCH] AARCH64: Remove spurious attribute __unused__ from NEON intrinsic

2016-04-22 Thread Wladimir J. van der Laan
The lane parameter is not unused, so should not be marked as such.

The others were removed in https://patchwork.ozlabs.org/patch/272912/,
but this one appears to have been missed.
---
 gcc/config/aarch64/arm_neon.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 2612a32..0a2aa7b 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -14070,7 +14070,7 @@ vdupb_laneq_p8 (poly8x16_t __a, const int __b)
 }
 
 __extension__ static __inline int8_t __attribute__ ((__always_inline__))
-vdupb_laneq_s8 (int8x16_t __a, const int __attribute__ ((unused)) __b)
+vdupb_laneq_s8 (int8x16_t __a, const int __b)
 {
   return __aarch64_vget_lane_any (__a, __b);
 }
-- 
1.9.1
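
For illustration only, here is a minimal portable sketch of why the annotation removed above was misleading. It does not use arm_neon.h: the type vec16_s8 and the functions get_lane_marked_unused and get_lane are hypothetical stand-ins for int8x16_t and vdupb_laneq_s8, and the index masking is an assumption added to keep the sketch in bounds (the real intrinsic expects a constant lane number in range). The unused attribute only tells the compiler the parameter is possibly unused and suppresses the warning, so both variants should behave identically; the annotation is harmless to code generation but wrong as documentation.

```c
#include <stdint.h>

/* Hypothetical stand-in for int8x16_t: sixteen signed 8-bit lanes.  */
typedef struct { int8_t lane[16]; } vec16_s8;

/* Shape of the declaration before the patch: the lane index carries
   the unused attribute even though the body reads it.  */
static inline int8_t
get_lane_marked_unused (vec16_s8 a, const int __attribute__ ((unused)) b)
{
  /* Mask the index so the sketch never reads out of bounds.  */
  return a.lane[b & 15];
}

/* Shape of the declaration after the patch: no attribute, same body.  */
static inline int8_t
get_lane (vec16_s8 a, const int b)
{
  return a.lane[b & 15];
}
```

Both helpers return the same lane for the same inputs, which is consistent with the patch being a cleanup rather than a behaviour change.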