Re: [PATCH][MIPS] Add -mgrow-frame-downwards option

2016-05-20 Thread Sandra Loosemore

On 05/20/2016 08:58 AM, Robert Suchanek wrote:

Hi,

The patch changes the default behaviour of the direction in which
the local frame grows for MIPS16.

Code size is reduced by about 0.5% in the average case for -Os, hence
it is good to turn the option on by default.

Ok to apply?

Regards,
Robert

gcc/

2016-05-20  Matthew Fortune  

* config/mips/mips.h (FRAME_GROWS_DOWNWARD): Enable it
conditionally for MIPS16.
* config/mips/mips.opt: Add -mgrow-frame-downwards option.
Enable it by default for MIPS16.
* doc/invoke.texi: Document the option.
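
(The mips.h hunk itself is not quoted below; as a minimal sketch, it
presumably amounts to something like the following, reusing the
TARGET_FRAME_GROWS_DOWNWARDS variable from the mips.opt hunk -- the real
patch may well gate this differently:)

/* Sketch only, not the actual patch: grow the MIPS16 local frame
   downwards when the new option is on.  */
#define FRAME_GROWS_DOWNWARD (TARGET_MIPS16 && TARGET_FRAME_GROWS_DOWNWARDS)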


This may be a stupid question, but what point is there in exposing this 
as an option to users?  Users generally just want the compiler to emit 
good code when they compile with -O, not more individual optimization 
switches to twiddle.  Is FRAME_GROWS_DOWNWARD likely to be so buggy or 
poorly tested that it's necessary to provide a way to turn it off?


If we really must have this option...


diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 3b92ef5..53feb23 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -447,3 +447,7 @@ Enum(mips_cb_setting) String(always) Value(MIPS_CB_ALWAYS)
  minline-intermix
  Target Report Var(TARGET_INLINE_INTERMIX)
  Allow inlining even if the compression flags differ between caller and callee.
+
+mgrow-frame-downwards
+Target Report Var(TARGET_FRAME_GROWS_DOWNWARDS) Init(1)
+Change the behaviour to grow the frame downwards for MIPS16.


British spelling of "behaviour" here.  How about just "Grow the frame 
downwards for MIPS16."



diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2f6195e..6e5d620 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -17929,6 +17930,18 @@ vice-versa.  When using this option it is necessary to 
protect functions
  that cannot be compiled as MIPS16 with a @code{noinline} attribute to ensure
  they are not inlined into a MIPS16 function.

+@item -mgrow-frame-downwards
+@itemx -mno-grow-frame-downwards
+@opindex mgrow-frame-downwards
+Grow the local frame down (up) for MIPS16.
+
+Growing the frame downwards allows us to get spill slots created at the lowest


s/allows us to get spill slots created/allows GCC to create spill slots/


+address rather than the highest address in a local frame.  The benefit of this
+is smaller code size as accessing spill splots closer to the stack pointer
+can be done using using 16-bit instructions.


s/spill splots/spill slots/

But, this option description is so implementor-speaky that it just 
reinforces my thinking that it's likely to be uninteresting to users.



+
+The option is enabled by default (to grow frame downwards) for MIPS16.
+
  @item -mabi=32
  @itemx -mabi=o64
  @itemx -mabi=n32



-Sandra



Re: C PATCH to add -Wswitch-unreachable (PR c/49859)

2016-05-20 Thread Sandra Loosemore

On 05/20/2016 10:36 AM, Marek Polacek wrote:

diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index f3d087f..5909b9d 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -297,7 +297,8 @@ Objective-C and Objective-C++ Dialects}.
  -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol
  -Wsuggest-final-types @gol -Wsuggest-final-methods -Wsuggest-override @gol
  -Wmissing-format-attribute -Wsubobject-linkage @gol
--Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool -Wsync-nand @gol
+-Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool @gol
+-Wswitch-unreachable  -Wsync-nand @gol
  -Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs @gol
  -Wtype-limits  -Wundef @gol
  -Wuninitialized  -Wunknown-pragmas  -Wunsafe-loop-optimizations @gol


I think this list is supposed to be alphabetized except with respect to 
-Wno-foo being sorted as if it were -Wfoo.  I realize there are other 
inconsistencies, but can you at least keep the -Wswitch* entries in 
proper order?



@@ -4144,6 +4145,39 @@ switch ((int) (a == 4))
  @end smallexample
  This warning is enabled by default for C and C++ programs.

+@item -Wswitch-unreachable
+@opindex Wswitch-unreachable
+@opindex Wno-switch-unreachable
+Warn whenever a @code{switch} statement contains statements between the
+controlling expression and the first case label, which will never be
+executed.  For example:
+@smallexample
+@group
+switch (cond)
+  @{
+   i = 15;
+  @dots{}
+   case 5:
+  @dots{}
+  @}
+@end group
+@end smallexample
+@option{-Wswitch-unreachable} will not warn if the statement between the


s/will/does/


+controlling expression and the first case label is just a declaration:
+@smallexample
+@group
+switch (cond)
+  @{
+   int i;
+  @dots{}
+   case 5:
+   i = 5;
+  @dots{}
+  @}
+@end group
+@end smallexample
+This warning is enabled by default for C and C++ programs.
+
  @item -Wsync-nand @r{(C and C++ only)}
  @opindex Wsync-nand
  @opindex Wno-sync-nand


The doc part of the patch is OK with those things fixed.

-Sandra



Re: [RFA] Minor cleanup to allocate_dynamic_stack_space

2016-05-20 Thread Jeff Law

On 05/20/2016 03:44 PM, Eric Botcazou wrote:

So here's that cleanup.  The diffs are larger than one might expect
because of the reindentation that needs to happen.  So I've included a
-b diff variant which shows how little actually changed here.


I'm wondering if it isn't counter-productive.  The ??? comment is explicit
about where the problem comes from: STACK_POINTER_OFFSET used to be defined
only when needed, now it's always defined.

So I think that we should try to restore the initial state, this will very
likely generate fewer alignment operations, for example:

I pondered that as a direction, but was scared off by the overall 
fragility of this code when I looked back through the old BZs.  I 
figured cleanup preserving existing behavior was the first step.


We can go the other way if you prefer.   It just makes reasoning about 
how this code is supposed to work harder.


jeff



Re: [RFA] Minor cleanup to allocate_dynamic_stack_space

2016-05-20 Thread Eric Botcazou
> So here's that cleanup.  The diffs are larger than one might expect
> because of the reindentation that needs to happen.  So I've included a
> -b diff variant which shows how little actually changed here.

I'm wondering if it isn't counter-productive.  The ??? comment is explicit 
about where the problem comes from: STACK_POINTER_OFFSET used to be defined 
only when needed, now it's always defined.

So I think that we should try to restore the initial state, this will very 
likely generate fewer alignment operations, for example:

#if defined (STACK_DYNAMIC_OFFSET)
  if (1)
#else
  if (STACK_POINTER_OFFSET)
#endif
{
  must_align = true;
  extra_align = BITS_PER_UNIT;
}

-- 
Eric Botcazou


Re: [RFC] Type promotion pass and elimination of zext/sext

2016-05-20 Thread Jeff Law

On 05/20/2016 04:55 AM, Richard Biener wrote:

+/* Promote definition DEF to promoted type.  If the stmt that defines def
+   is def_stmt, make the type of def promoted type.  If the stmt is such
+   that, result of the def_stmt cannot be of promoted type, create a
new_def
+   of the original_type and make the def_stmt assign its value to newdef.
+   Then, create a NOP_EXPR to convert new_def to def of promoted type.
+
+   For example, for stmt with original_type char and promoted_type int:
+char _1 = mem;
+becomes:
+char _2 = mem;
+int _1 = (int)_2;


When does this case happen, and how is this any better than PRE or other
elimination/code motion algorithms in improving the generated code?


The above case mentions one - loads from memory.  Another case would be
vector element extracts from vNqi vectors or asm outputs.

Duh.  I should have looked at the code above more closely.





I would hazard a guess that it could happen if you still needed the
char-sized use in a small number of cases, but generally wanted to promote most
uses to int?


I think we want to promote all uses to int; we just can't always combine
the extension with the value-producing stmt on GIMPLE (we don't have
single-stmt sign-extending loads, for example).  Likewise we don't allow
the equivalent of (subreg:QI SI-reg) at SSA use sites and thus will
generally have a truncating stmt before such uses.
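
A sketch of that truncation pattern, in the same style as the example
earlier in the thread (char as the original type, int as the promoted one):

int _3 = _1 + _2;     /* arithmetic happens in the promoted type */
char _4 = (char) _3;  /* truncating stmt inserted before ... */
mem = _4;             /* ... the QImode use, e.g. a store */
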
Hmmm, this is all reminding me of some terrible hacks we used to have to 
change the types on the LHS of memory loads to discourage unnecessary 
extensions and encourage use of promoted types.  I wouldn't want to go 
down that path again.







So what does this mean for this pass?  It means that we need to think
about the immediate goal we want to fulfil - which might be to just
promote things that we can fully promote, avoiding the necessity to
prevent passes from undoing our work.

You're probably right.  That's probably valuable in and of itself and 
the other cases we can tackle later if they prove important.




That said - we need a set of testcases the pass should enable to be
optimized better than without it.

Agreed, 100%.


(I myself see the idea of promoting on GIMPLE according to PROMOTE_MODE
as good design cleanup towards pushing GIMPLE farther out).

I think we're in general agreement here as well.

jeff



Re: Question regarding bug 70584

2016-05-20 Thread Daniel Gutson
On Fri, May 20, 2016 at 5:42 PM, Jeff Law  wrote:
> On 05/20/2016 01:18 PM, Daniel Gutson wrote:
>>
>> (reposting in gcc@ and adding more information)
>>
>> On Fri, May 20, 2016 at 3:43 PM, Andres Tiraboschi
>>  wrote:
>>>
>>> While analysing this bug we arrived at the following code at
>>> tree.c:145 (lvalue_kind):
>>>
>>> case VAR_DECL:
>>>   if (TREE_READONLY (ref) && ! TREE_STATIC (ref)
>>>   && DECL_LANG_SPECIFIC (ref)
>>>   && DECL_IN_AGGR_P (ref))
>>> return clk_none;
>>>
>>> That condition fails, so a fall-through to the next case labels causes
>>> clk_ordinary to be returned, whereas this is about a constexpr value
>>> (rather than a reference).
>>>
>>> As an experiment, we forced the condition above to return clk_none and
>>> the bug is not reproduced.
>>>
>>> We suspect that either the condition is too restrictive or the
>>> fall-through is not intended.  Why is the condition requiring
>>> DECL_IN_AGGR_P?
>>
>>
>> Just to provide more information: DECL_LANG_SPECIFIC is NULL and
>> DECL_IN_AGGR_P is false.
>> Can somebody provide the rationale of the condition?
>
> I'm not really an expert in this code, but it looks like we're returning
> clk_none for a small subset of nodes that aren't really lvalues. Examples
> would be certain read-only objects which can't be lvalues.
>
> Other VAR_DECLs would be lvalues and should probably return clk_ordinary.
>
> At least that's how it appears to me.
>
> Jeff
>
Thanks; Jason and we already solved it.  I think he will commit the patch soon.


-- 

Daniel F. Gutson
Engineering Manager

San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina

Phone:   +54 351 4217888 / +54 351 4218211
Skype:dgutson
LinkedIn: http://ar.linkedin.com/in/danielgutson


Re: [RFA] Remove useless test in bitmap_find_bit.

2016-05-20 Thread Jeff Law

On 05/09/2016 03:29 AM, Bernd Schmidt wrote:

On 05/06/2016 11:18 PM, Jeff Law wrote:


OK for the trunk?


Counts as obvious, doesn't it?

It might, particularly in cases where the code has been essentially unchanged 
for 20 years and thus we don't have nearly as much concern that the 
preconditions are likely to change.


jeff


Re: [PATCH 3/3] function: Restructure *logue insertion

2016-05-20 Thread Segher Boessenkool
On Fri, May 20, 2016 at 11:28:25AM +0200, Thomas Schwinge wrote:
> > > > * function.c (make_epilogue_seq): Remove epilogue_end parameter.
> > > > (thread_prologue_and_epilogue_insns): Remove bb_flags.  Restructure
> > > > code.  Ignore sibcalls on EDGE_IGNORE edges.
> > > > * shrink-wrap.c (handle_simple_exit): New function.  Set EDGE_IGNORE
> > > > on edges for sibcalls that run without prologue.  The rest of the
> > > > function is combined from...
> > > > (fix_fake_fallthrough_edge): ... this, and ...
> > > > (try_shrink_wrapping): ... a part of this.  Remove the bb_with
> > > > function argument, make it a local variable.
> 
> On Thu, 19 May 2016 17:20:46 -0500, Segher Boessenkool 
>  wrote:
> > On Thu, May 19, 2016 at 04:00:22PM -0600, Jeff Law wrote:
> > > OK for the trunk, but please watch closely for any fallout.
> > 
> > Thanks, and I will!
> 
> With nvptx offloading on x86_64 GNU/Linux, this (r236491) is causing
> several execution test failures.  I'll have a look.

nvptx calls thread_prologue_and_epilogue_insns directly.  It seems that in
the "normal" path there is always a commit_edge_insertions call afterwards,
but for nvptx there isn't.  thread_prologue_and_epilogue_insns
should do it itself, of course.  This patch fixes it; regression tested
on powerpc64-linux, and Thomas says it looks good on nvptx.  Committing
to trunk as obvious.


Segher


===
This fixes a bug in my r236491: on nvptx, functions without prologue
would not get an epilogue either.


2016-05-20  Segher Boessenkool  

* function.c (thread_prologue_and_epilogue_insns): Commit the
insertion of the epilogue.

---
 gcc/function.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/function.c b/gcc/function.c
index 25e0e0c..b517012 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5977,6 +5977,7 @@ thread_prologue_and_epilogue_insns (void)
   if (epilogue_seq)
{
  insert_insn_on_edge (epilogue_seq, exit_fallthru_edge);
+ commit_edge_insertions ();
 
  /* The epilogue insns we inserted may cause the exit edge to no longer
 be fallthru.  */
-- 
1.9.3



[RFA] Minor cleanup to allocate_dynamic_stack_space

2016-05-20 Thread Jeff Law

On 05/19/2016 05:11 PM, Jeff Law wrote:
[ ... ]

This is a bit of a mess and I think the code
needs some TLC before we start hacking it up further.

Let's start with clean up of dead code:

 /* We will need to ensure that the address we return is aligned to
 REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
 always know its final value at this point in the compilation (it
 might depend on the size of the outgoing parameter lists, for
 example), so we must align the value to be returned in that case.
 (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
 STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
 We must also do an alignment operation on the returned value if
 the stack pointer alignment is less strict than REQUIRED_ALIGN.

 If we have to align, we must leave space in SIZE for the hole
 that might result from the alignment operation.  */

  must_align = (crtl->preferred_stack_boundary < required_align);
  if (must_align)
{
  if (required_align > PREFERRED_STACK_BOUNDARY)
extra_align = PREFERRED_STACK_BOUNDARY;
  else if (required_align > STACK_BOUNDARY)
extra_align = STACK_BOUNDARY;
  else
extra_align = BITS_PER_UNIT;
}

  /* ??? STACK_POINTER_OFFSET is always defined now.  */
#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
  must_align = true;
  extra_align = BITS_PER_UNIT;
#endif

If we look at defaults.h, it always defines STACK_POINTER_OFFSET.  So
all the code above I think collapses to:

  must_align = true;
  extra_align = BITS_PER_UNIT;

And the only other assignment to must_align assigns it the value "true".
There are two conditionals on must_align that look like

if (must_align)
  {
CODE;
  }

We should remove the conditional and pull CODE out by one indentation level,
and remove all remnants of must_align.
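
In other words, the tail of this code reduces to just (a sketch, matching
the diff further down):

  extra_align = BITS_PER_UNIT;

  unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
  size = plus_constant (Pmode, size, extra);
  size = force_operand (size, NULL_RTX);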

I don't think that changes your patch in any way.  Hopefully it makes
the whole function somewhat easier to grok.

Thoughts?
So here's that cleanup.  The diffs are larger than one might expect 
because of the reindentation that needs to happen.  So I've included a 
-b diff variant which shows how little actually changed here.


This should have no impact on any target.

Bootstrapped and regression tested on x86_64 linux.  Ok for the trunk?

Jeff

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b8fb96c..257e98b 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-05-20  Jeff Law  
+
+* explow.c (allocate_dynamic_stack_space): Simplify
+knowing that MUST_ALIGN was always true.
+
 2016-05-16  Ryan Burn  
 
* Makefile.in (GTFILES): Add cilk.h and cilk-common.c.
diff --git a/gcc/explow.c b/gcc/explow.c
index e0ce201..51897e0 100644
--- a/gcc/explow.c
+++ b/gcc/explow.c
@@ -1175,7 +1175,6 @@ allocate_dynamic_stack_space (rtx size, unsigned 
size_align,
   rtx_code_label *final_label;
   rtx final_target, target;
   unsigned extra_align = 0;
-  bool must_align;
 
   /* If we're asking for zero bytes, it doesn't matter what we point
  to since we can't dereference it.  But return a reasonable
@@ -1245,49 +1244,18 @@ allocate_dynamic_stack_space (rtx size, unsigned 
size_align,
   if (crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
 crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
 
-  /* We will need to ensure that the address we return is aligned to
- REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
- always know its final value at this point in the compilation (it
- might depend on the size of the outgoing parameter lists, for
- example), so we must align the value to be returned in that case.
- (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
- STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
- We must also do an alignment operation on the returned value if
- the stack pointer alignment is less strict than REQUIRED_ALIGN.
-
- If we have to align, we must leave space in SIZE for the hole
- that might result from the alignment operation.  */
-
-  must_align = (crtl->preferred_stack_boundary < required_align);
-  if (must_align)
-{
-  if (required_align > PREFERRED_STACK_BOUNDARY)
-   extra_align = PREFERRED_STACK_BOUNDARY;
-  else if (required_align > STACK_BOUNDARY)
-   extra_align = STACK_BOUNDARY;
-  else
-   extra_align = BITS_PER_UNIT;
-}
-
-  /* ??? STACK_POINTER_OFFSET is always defined now.  */
-#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
-  must_align = true;
   extra_align = BITS_PER_UNIT;
-#endif
 
-  if (must_align)
-{
-  unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
+  unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
 
-  size = plus_constant (Pmode, size, extra);
-  size = force_operand (size, NULL_RTX);
+  size = plus_constant (Pmode, 

[C++] code cleanup

2016-05-20 Thread Nathan Sidwell
When working on the constexpr machinery for gcc 6, I noticed a couple of cleanup 
opportunities.


1) cxx_bind_parameters_in_call contains 'if (cond) goto x; ... x:;', which can 
easily be rewritten to 'if (!cond) { ...}'


2) a which vs that grammar error.

applied to trunk.

nathan
2016-05-20  Nathan Sidwell  

	* constexpr.c (cxx_bind_parameters_in_call): Avoid gratuitous if
	... goto.
	(cxx_eval_call_expression): Fix comment grammar.

Index: cp/constexpr.c
===
--- cp/constexpr.c	(revision 236510)
+++ cp/constexpr.c	(working copy)
@@ -1201,18 +1201,18 @@ cxx_bind_parameters_in_call (const const
   /* Just discard ellipsis args after checking their constantitude.  */
   if (!parms)
 	continue;
-  if (*non_constant_p)
-	/* Don't try to adjust the type of non-constant args.  */
-	goto next;
-
-  /* Make sure the binding has the same type as the parm.  */
-  if (TREE_CODE (type) != REFERENCE_TYPE)
-	arg = adjust_temp_type (type, arg);
-  if (!TREE_CONSTANT (arg))
-	*non_constant_args = true;
-  *p = build_tree_list (parms, arg);
-  p = &TREE_CHAIN (*p);
-next:
+
+  if (!*non_constant_p)
+	{
+	  /* Make sure the binding has the same type as the parm.  But
+	 only for constant args.  */
+	  if (TREE_CODE (type) != REFERENCE_TYPE)
+	arg = adjust_temp_type (type, arg);
+	  if (!TREE_CONSTANT (arg))
+	*non_constant_args = true;
+	  *p = build_tree_list (parms, arg);
+	  p = &TREE_CHAIN (*p);
+	}
   parms = TREE_CHAIN (parms);
 }
 }
@@ -1420,7 +1420,7 @@ cxx_eval_call_expression (const constexp
	  *slot = entry = ggc_alloc<constexpr_call> ();
 	  *entry = new_call;
 	}
-  /* Calls which are in progress have their result set to NULL
+  /* Calls that are in progress have their result set to NULL,
 	 so that we can detect circular dependencies.  */
   else if (entry->result == NULL)
 	{


Re: Question regarding bug 70584

2016-05-20 Thread Jeff Law

On 05/20/2016 01:18 PM, Daniel Gutson wrote:

(reposting in gcc@ and adding more information)

On Fri, May 20, 2016 at 3:43 PM, Andres Tiraboschi
 wrote:

While analysing this bug we arrived at the following code at
tree.c:145 (lvalue_kind):

case VAR_DECL:
  if (TREE_READONLY (ref) && ! TREE_STATIC (ref)
  && DECL_LANG_SPECIFIC (ref)
  && DECL_IN_AGGR_P (ref))
return clk_none;

That condition fails, so a fall-through to the next case labels causes
clk_ordinary to be returned, whereas this is about a constexpr value
(rather than a reference).

As an experiment, we forced the condition above to return clk_none and
the bug is not reproduced.

We suspect that either the condition is too restrictive or the
fall-through is not intended.  Why is the condition requiring
DECL_IN_AGGR_P?


Just to provide more information: DECL_LANG_SPECIFIC is NULL and
DECL_IN_AGGR_P is false.
Can somebody provide the rationale of the condition?

I'm not really an expert in this code, but it looks like we're returning 
clk_none for a small subset of nodes that aren't really lvalues. 
Examples would be certain read-only objects which can't be lvalues.


Other VAR_DECLs would be lvalues and should probably return clk_ordinary.

At least that's how it appears to me.

Jeff



Re: [PATCH] Fix Fortran ICE due to realloc_string_callback bug (PR fortran/71204)

2016-05-20 Thread Jerry DeLisle
On 05/20/2016 04:36 AM, Jakub Jelinek wrote:
> Hi!
> 
> We ICE at -O0 while compiling the testcase below, because we don't reset
> two vars that are reset in all other places in frontend-passes.c when
> starting to process an unrelated statement.  Without this,
> we can emit some statement into a preexisting block that can be elsewhere
> in the current procedure or as in the testcase in completely different
> procedure.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6/5?


Yes, OK, thanks for patch!

Jerry
> 
> 2016-05-20  Jakub Jelinek  
> 
>   PR fortran/71204
>   * frontend-passes.c (realloc_string_callback): Clear inserted_block
>   and changed_statement before calling create_var.
> 
>   * gfortran.dg/pr71204.f90: New test.
> 
> --- gcc/fortran/frontend-passes.c.jj  2016-05-11 15:16:18.0 +0200
> +++ gcc/fortran/frontend-passes.c 2016-05-20 10:44:31.699542384 +0200
> @@ -174,8 +174,10 @@ realloc_string_callback (gfc_code **c, i
>  
>if (!gfc_check_dependency (expr1, expr2, true))
>  return 0;
> -  
> +
>current_code = c;
> +  inserted_block = NULL;
> +  changed_statement = NULL;
>n = create_var (expr2, "trim");
>co->expr2 = n;
>return 0;
> --- gcc/testsuite/gfortran.dg/pr71204.f90.jj  2016-05-20 10:45:40.738608941 
> +0200
> +++ gcc/testsuite/gfortran.dg/pr71204.f90 2016-05-20 10:46:25.873998687 
> +0200
> @@ -0,0 +1,17 @@
> +! PR fortran/71204
> +! { dg-do compile }
> +! { dg-options "-O0" }
> +
> +module pr71204
> +  character(10), allocatable :: z(:)
> +end module
> +
> +subroutine s1
> +  use pr71204
> +  z(2) = z(1)
> +end
> +
> +subroutine s2
> +  use pr71204
> +  z(2) = z(1)
> +end
> 
>   Jakub
> 


[PTX] toplevel-reorder and no-common flags

2016-05-20 Thread Nathan Sidwell
This patch stops us unconditionally setting the toplevel-reorder flag.  It's 
mostly needed, but a couple of testcases rely on it being unset.  Those now pass.


Also force -fno-common, unless explicitly specified.  As the comment says, we 
fudge common by using .weak, and that's not quite the right thing.  So only 
provide common storage when the user explicitly asks for it.  The 
ssa-store-ccp-2.c testcase is checking some semantics that are only applicable 
to common storage, so enable that flag.


applied to trunk.

nathan
2016-05-20  Nathan Sidwell  

	* config/nvptx/nptx.c (nvptx_option_override): Only set
	flag_toplevel_reorder, if not explicitly specified.  Set
	flag_no_common, unless explicitly specified.

	testsuite/
	* gcc.target/nvptx/uninit-decl.c: Force common storage,  add
	non-common cases.
	* gcc.dg/tree-ssa/ssa-store-ccp-2.c: Add -fcommon.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 236392)
+++ config/nvptx/nvptx.c	(working copy)
@@ -155,8 +155,19 @@ static void
 nvptx_option_override (void)
 {
   init_machine_status = nvptx_init_machine_status;
-  /* Gives us a predictable order, which we need especially for variables.  */
-  flag_toplevel_reorder = 1;
+
+  /* Set toplevel_reorder, unless explicitly disabled.  We need
+ reordering so that we emit necessary assembler decls of
+ undeclared variables. */
+  if (!global_options_set.x_flag_toplevel_reorder)
+flag_toplevel_reorder = 1;
+
+  /* Set flag_no_common, unless explicitly disabled.  We fake common
+ using .weak, and that's not entirely accurate, so avoid it
+ unless forced.  */
+  if (!global_options_set.x_flag_no_common)
+flag_no_common = 1;
+
   /* Assumes that it will see only hard registers.  */
   flag_var_tracking = 0;
 
Index: testsuite/gcc.target/nvptx/uninit-decl.c
===
--- testsuite/gcc.target/nvptx/uninit-decl.c	(revision 236392)
+++ testsuite/gcc.target/nvptx/uninit-decl.c	(working copy)
@@ -1,7 +1,21 @@
 /* { dg-do compile } */
 
-int __attribute__ ((used)) common;
-static int __attribute__ ((used)) local;
+int __attribute__ ((common)) common;
+static int local;
+extern int external_decl;
+int external_defn;
+
+int foo ()
+{
+  return common +  local + external_decl + external_defn;
+}
+
+void bar (int i)
+{
+  common = local = external_decl = external_defn = i;
+}
 
 /* { dg-final { scan-assembler "\[\n\r\]\[\t \]*.weak .global\[^,\n\r\]*common" } } */
 /* { dg-final { scan-assembler "\[\n\r\]\[\t \]*.global\[^,\n\r\]*local" } } */
+/* { dg-final { scan-assembler "\[\n\r\]\[\t \]*.extern .global\[^,\n\r\]*external_decl" } } */
+/* { dg-final { scan-assembler "\[\n\r\]\[\t \]*.visible .global\[^,\n\r\]*external_defn" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-store-ccp-2.c
===
--- testsuite/gcc.dg/tree-ssa/ssa-store-ccp-2.c	(revision 236392)
+++ testsuite/gcc.dg/tree-ssa/ssa-store-ccp-2.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-optimized -fcommon" } */
 
 const int conststaticvariable;
 


Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)

2016-05-20 Thread Nathan Sidwell

On 05/20/16 11:09, Alexander Monakov wrote:


This patch implements '-msoft-stack' code generation variant for NVPTX.  The
goal is to avoid relying on '.local' memory space for placement of automatic
data, and instead have an explicitely-maintained stack pointer (which can be
set up to point to preallocated global memory space).  This allows to have
stack data accessible from all threads and modifiable with atomic
instructions.  This also allows to implement variable-length stack allocation
(for 'alloca' and C99 VLAs).

Each warp has its own 'soft stack' pointer.  It lives in a shared memory array
called __nvptx_stacks, at index %tid.y (like OpenACC, OpenMP offloading is
going to use launch geometry such that %tid.y gives the warp index).  It is
retrieved in function prologue (if the function needs a stack frame) and may
also be written there (if the function is non-leaf, so that its callees see
the updated stack pointer), and restored prior to returning.
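
In C terms the prologue amounts to roughly this (a sketch only; the patch
emits the equivalent PTX directly, and FRAMESIZE and tid_y stand for the
function's rounded-up frame size and the PTX %tid.y value):

  extern char *__nvptx_stacks[];        /* one slot per warp, in .shared */

  char **slot = &__nvptx_stacks[tid_y]; /* %tid.y == warp index */
  char *sp = *slot;                     /* incoming soft-stack pointer */
  char *frame = sp - FRAMESIZE;         /* grow the stack downwards */
  *slot = frame;                        /* non-leaf: publish for callees */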

Startup code is responsible for setting up the initial soft-stack pointer. For
-mmainkernel testing it is libgcc's __main, for OpenMP offloading it's the
kernel region entry code.


ah, that's much more understandable, thanks.  Presumably this doesn't support 
worker-single mode (in OpenACC parlance; I don't know what the OpenMP version of 
that is?)  And neither would it support calls from vector-partitioned code (I 
think that's SIMD in OpenMP-land?).  It seems like we should reject the 
combination of -msoft-stack -fopenacc?



2016-05-19  Alexander Monakov  



2016-05-19  Alexander Monakov  



2016-05-19  Alexander Monakov  



2016-05-19  Alexander Monakov  



2016-03-15  Alexander Monakov  



2015-12-14  Alexander Monakov  



2015-12-09  Alexander Monakov  


why so many changelogs?  The on-branch development history is irrelevant for 
trunk -- the usual single changelog style should be followed.




diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2d4dad1..700c4b0 100644



@@ -992,16 +1091,56 @@ nvptx_declare_function_name (FILE *file, const char 
*name, const_tree decl)




+  else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca)
+{
+  /* Maintain 64-bit stack alignment.  */


This block needs a more descriptive comment -- it appears to be doing a great 
deal more than maintaining 64-bit stack alignment!



+  int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+  sz = ROUND_UP (sz, keep_align);
+  int bits = POINTER_SIZE;
+  fprintf (file, "\t.reg.u%d %%frame;\n", bits);
+  fprintf (file, "\t.reg.u32 %%fstmp0;\n");
+  fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits);
+  fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits);


Some of these register names appear to be long-lived -- and referenced in other 
functions.  It would be better to give those more descriptive names, or even 
give them hard regs.  You should certainly do so for those that are already 
hard regs (%frame & %stack) -- is it more feasible to augment init_frame to 
initialize them?



@@ -1037,6 +1178,10 @@ nvptx_output_return (void)
 {
   machine_mode mode = (machine_mode)cfun->machine->return_mode;

+  if (cfun->machine->using_softstack)
+fprintf (asm_out_file, "\tst.shared.u%d [%%fstmp2], %%fstmp1;\n",


See note above about obscure reg names.

  Since ptx is a virtual target, we just define a few

hard registers for special purposes and leave pseudos unallocated.
@@ -200,6 +205,7 @@ struct GTY(()) machine_function
   bool is_varadic;  /* This call is varadic  */
   bool has_varadic;  /* Current function has a varadic call.  */
   bool has_chain; /* Current function has outgoing static chain.  */
+  bool using_softstack; /* Need to restore __nvptx_stacks[tid.y].  */


Comment should describe what the attribute is, not what it implies.  In this 
case I think it's /* Current function has a soft stack frame.  */




diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 33a4862..e5650b6 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md




+(define_insn "set_softstack_insn"
+  [(unspec [(match_operand 0 "nvptx_register_operand" "R")] UNSPEC_ALLOCA)]
+  "TARGET_SOFT_STACK"
+{
+  return (cfun->machine->using_softstack
+ ? "%.\\tst.shared%t0\\t[%%fstmp2], %0;"
+ : "");
+})


Is this alloca related (UNSPEC_ALLOCA), restore related (invoked in 
restore_stack_block), or stack setting (as the insn name suggests)?  Things seem 
inconsistently named.  Comments would be good.




 (define_expand "restore_stack_block"
   [(match_operand 0 "register_operand" "")
(match_operand 1 "register_operand" "")]
   ""
 {
+  if (TARGET_SOFT_STACK)
+{
+  emit_move_insn (operands[0], operands[1]);
+  emit_insn (gen_set_softstack_insn (operands[0]));


Is it necessary to store the soft stack here?  Only 

Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Richard Biener
On May 20, 2016 6:08:34 PM GMT+02:00, Jakub Jelinek  wrote:
>On Fri, May 20, 2016 at 08:54:39AM -0700, Andi Kleen wrote:
>> I thought I had filed a bugzilla at some point, but can't
>> find it right now. If you compare bitfield code
>> compiled for Haswell on LLVM and GCC it is very visible
>> how much worse gcc is.
>
>We really need to lower bitfield operations (especially when there are
>multiple adjacent ones) to integer arithmetics on the underlying
>DECL_BIT_FIELD_REPRESENTATIVE fields somewhere in (late?) gimple and
>perform some cleanups after it and only after that expand.
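
(A sketch of what such lowering means at the source level, using a
hypothetical unsigned int representative and a 5-bit field at bit offset 3;
aliasing rules are glossed over here:)

struct S { unsigned a : 3; unsigned b : 5; };

void set_b (struct S *s, unsigned x)
{
  /* s->b = x; lowered onto the representative word: */
  unsigned int w = *(unsigned int *) s;           /* load whole representative */
  w = (w & ~(0x1fu << 3)) | ((x & 0x1fu) << 3);   /* insert the 5-bit field */
  *(unsigned int *) s = w;                        /* one combined store */
}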

Yes, I still have patches for this and plan to resurrect them now that one 
prerequisite is in.

Richard.

>   Jakub




Re: Question regarding bug 70584

2016-05-20 Thread Daniel Gutson
(reposting in gcc@ and adding more information)

On Fri, May 20, 2016 at 3:43 PM, Andres Tiraboschi
 wrote:
> While analysing this bug we arrived at the following code at
> tree.c:145 (lvalue_kind):
>
> case VAR_DECL:
>   if (TREE_READONLY (ref) && ! TREE_STATIC (ref)
>   && DECL_LANG_SPECIFIC (ref)
>   && DECL_IN_AGGR_P (ref))
> return clk_none;
>
> That condition fails, so a fall-through to the next case labels causes
> clk_ordinary to be returned, whereas this is about a constexpr value
> (rather than a reference).
>
> As an experiment, we forced the condition above to return clk_none and
> the bug is not reproduced.
>
> We suspect that either the condition is too restrictive or the
> fall-through is not intended.  Why is the condition requiring
> DECL_IN_AGGR_P?

Just to provide more information: DECL_LANG_SPECIFIC is NULL and
DECL_IN_AGGR_P is false.
Can somebody provide the rationale of the condition?

Thanks,

   Daniel.

>
> Thanks,
>
> Andres.



-- 

Daniel F. Gutson
Engineering Manager

San Lorenzo 47, 3rd Floor, Office 5
Córdoba, Argentina

Phone:   +54 351 4217888 / +54 351 4218211
Skype:dgutson
LinkedIn: http://ar.linkedin.com/in/danielgutson


Re: [PATCH 3/3] jit: implement gcc_jit_rvalue_set_bool_require_tail_call

2016-05-20 Thread David Malcolm
On Tue, 2016-05-17 at 18:49 -0400, Trevor Saunders wrote:
> On Tue, May 17, 2016 at 06:01:32PM -0400, David Malcolm wrote:
> > This implements the libgccjit support for must-tail-call via
> > a new:
> >   gcc_jit_rvalue_set_bool_require_tail_call
> > API entrypoint.
> 
> It seems to me like that's not a great name, the rvalue and bool
> parts
> are just about the argument types, not what the function does. 
>  Wouldn't
> gcc_jit_set_call_requires_tail_call be better?

Maybe.  I was thinking of it from the point of view of methods; it's
effectively a method on gcc_jit_rvalue.  The "set_bool" is for
consistency with various gcc_jit_context_ entrypoints.

FWIW, I've committed the patch as-is; all of the must-tail-call patches
are now in trunk, as of r236531.


Re: [PATCH, rs6000] Fix compilation issue on power7 from r235577

2016-05-20 Thread Segher Boessenkool
On Fri, May 20, 2016 at 01:35:15PM -0500, Bill Seurer wrote:
> This patch changes some of the dejagnu options to better restrict
> where the test cases run so that they will no longer cause failures on
> power7 machines.
> 
> Based on a subsequent patch I also updated the code formatting (indentation,
> etc.) for the code from the original patch (r235577) in both the test cases
> and in rs6000-c.c.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu (on both
> power7 and power8) and powerpc64-unknown-linux-gnu (power8) with no
> regressions. Is this ok for trunk?

Okay, one changelog issue...

> [gcc]
> 
> 2016-05-20  Bill Seurer  
> 
>   * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Fix
>   code formatting in ALTIVEC_BUILTIN_VEC_ADDEC section.
> 
> [gcc/testsuite]
> 
> 2016-05-20  Bill Seurer  
> 
>   * gcc.target/powerpc/vec-addec.c: Change dejagnu options, fix code
> formatting.
>   * gcc.target/powerpc/vec-addec-int128.c: Change dejagnu options, fix 
> code
> formatting.

"formatting" here should be indented like the *.


Segher


[PATCH] calls.c: fix warning on targets without REG_PARM_STACK_SPACE

2016-05-20 Thread David Malcolm
On Fri, 2016-05-20 at 18:03 +0100, Kyrill Tkachov wrote:
[...snip...]
> REG_PARM_STACK_SPACE is not defined on arm, which makes
> reg_parm_stack_space
> unused in this function and so breaks bootstrap on arm.
> Can you please add an ATTRIBUTE_UNUSED to reg_parm_stack_space?
> 
> Thanks,
> Kyrill
[...snip...]

Sorry about the breakage.

I manually verified that the following fixes the warning with
--target=arm-linux-gnueabi.

Successfully bootstrapped on x86_64-pc-linux-gnu.

Committed to trunk as r236527.

gcc/ChangeLog:
* calls.c (can_implement_as_sibling_call_p): Mark param
reg_parm_stack_space with ATTRIBUTE_UNUSED.
---
 gcc/calls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/calls.c b/gcc/calls.c
index 1b12eca..587969f 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -2373,7 +2373,7 @@ static bool
 can_implement_as_sibling_call_p (tree exp,
 rtx structure_value_addr,
 tree funtype,
-int reg_parm_stack_space,
+int reg_parm_stack_space ATTRIBUTE_UNUSED,
 tree fndecl,
 int flags,
 tree addr,
-- 
1.8.5.3



Question regarding bug 70584

2016-05-20 Thread Andres Tiraboschi
While analysing this bug we arrived at the following code at
tree.c:145 (lvalue_kind):

case VAR_DECL:
  if (TREE_READONLY (ref) && ! TREE_STATIC (ref)
  && DECL_LANG_SPECIFIC (ref)
  && DECL_IN_AGGR_P (ref))
return clk_none;

That condition fails, so a fall-through to the next case labels causes
clk_ordinary to be returned, whereas this is about a constexpr value
(rather than a reference).

As an experiment, we forced the condition above to return clk_none and
the bug is not reproduced.

We suspect that either the condition is too restrictive or the
fall-through is not intended.  Why is the condition requiring
DECL_IN_AGGR_P?

Thanks,

Andres.


[PATCH, rs6000] Fix compilation issue on power7 from r235577

2016-05-20 Thread Bill Seurer
This patch changes some of the dejagnu options to better restrict
where the test cases run so that they will no longer cause failures on
power7 machines.

Based on a subsequent patch I also updated the code formatting (indentation,
etc.) for the code from the original patch (r235577) in both the test cases
and in rs6000-c.c.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu (on both
power7 and power8) and powerpc64-unknown-linux-gnu (power8) with no
regressions. Is this ok for trunk?
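
(For context, a sketch of the builtin whose resolution the reformatted code
below handles -- assuming <altivec.h> and the vec_adde support added in
r235577:)

#include <altivec.h>

vector unsigned int
add_extended (vector unsigned int va, vector unsigned int vb,
              vector unsigned int carryv)
{
  /* vec_adde (va, vb, carryv)
     == vec_add (vec_add (va, vb), vec_and (carryv, 0x1)),
     per the comment in the rs6000-c.c hunk below.  */
  return vec_adde (va, vb, carryv);
}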

[gcc]

2016-05-20  Bill Seurer  

* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Fix
code formatting in ALTIVEC_BUILTIN_VEC_ADDEC section.

[gcc/testsuite]

2016-05-20  Bill Seurer  

* gcc.target/powerpc/vec-addec.c: Change dejagnu options, fix code
  formatting.
* gcc.target/powerpc/vec-addec-int128.c: Change dejagnu options, fix 
code
  formatting.

Index: /home/seurer/gcc/gcc-checkin2/gcc/config/rs6000/rs6000-c.c
===
--- /home/seurer/gcc/gcc-checkin2/gcc/config/rs6000/rs6000-c.c  (revision 
236518)
+++ /home/seurer/gcc/gcc-checkin2/gcc/config/rs6000/rs6000-c.c  (working copy)
@@ -4622,37 +4622,41 @@ assignment for unaligned loads and stores");
   /* All 3 arguments must be vectors of (signed or unsigned) (int or
  __int128) and the types must match.  */
   if ((arg0_type != arg1_type) || (arg1_type != arg2_type))
-   goto bad; 
+   goto bad;
   if (TREE_CODE (arg0_type) != VECTOR_TYPE)
-   goto bad; 
+   goto bad;
 
   switch (TYPE_MODE (TREE_TYPE (arg0_type)))
{
- /* For {un}signed ints, 
-vec_adde (va, vb, carryv) == vec_add (vec_add (va, vb), 
-   vec_and (carryv, 0x1)).  */
+ /* For {un}signed ints,
+vec_adde (va, vb, carryv) == vec_add (vec_add (va, vb),
+  vec_and (carryv, 0x1)).  */
  case SImode:
{
- vec *params = make_tree_vector();
+ vec *params = make_tree_vector ();
  vec_safe_push (params, arg0);
  vec_safe_push (params, arg1);
- tree call = altivec_resolve_overloaded_builtin
-(loc, rs6000_builtin_decls[ALTIVEC_BUILTIN_VEC_ADD], params);
- tree const1 = build_vector_from_val (arg0_type, 
-build_int_cstu(TREE_TYPE (arg0_type), 1));
- tree and_expr = fold_build2_loc (loc, BIT_AND_EXPR,
-   arg0_type, arg2, const1);
- params = make_tree_vector();
+ tree add_builtin = rs6000_builtin_decls[ALTIVEC_BUILTIN_VEC_ADD];
+ tree call = altivec_resolve_overloaded_builtin (loc, add_builtin,
+ params);
+ tree const1 = build_int_cstu (TREE_TYPE (arg0_type), 1);
+ tree ones_vector = build_vector_from_val (arg0_type, const1);
+ tree and_expr = fold_build2_loc (loc, BIT_AND_EXPR, arg0_type,
+  arg2, ones_vector);
+ params = make_tree_vector ();
  vec_safe_push (params, call);
  vec_safe_push (params, and_expr);
- return altivec_resolve_overloaded_builtin
-(loc, rs6000_builtin_decls[ALTIVEC_BUILTIN_VEC_ADD], params);
+ return altivec_resolve_overloaded_builtin (loc, add_builtin,
+params);
}
  /* For {un}signed __int128s use the vaddeuqm instruction
directly.  */
  case TImode:
-   return altivec_resolve_overloaded_builtin
-   (loc, rs6000_builtin_decls[P8V_BUILTIN_VEC_VADDEUQM], arglist);
+   {
+ tree adde_bii = rs6000_builtin_decls[P8V_BUILTIN_VEC_VADDEUQM];
+ return altivec_resolve_overloaded_builtin (loc, adde_bii,
+arglist);
+   }
 
  /* Types other than {un}signed int and {un}signed __int128
are errors.  */
@@ -4839,9 +4843,9 @@ assignment for unaligned loads and stores");
   arg1_type = TREE_TYPE (arg1);
 
   if (TREE_CODE (arg1_type) != VECTOR_TYPE)
-   goto bad; 
+   goto bad;
   if (!INTEGRAL_TYPE_P (TREE_TYPE (arg2)))
-   goto bad; 
+   goto bad;
 
   /* If we are targeting little-endian, but -maltivec=be has been
 specified to override the element order, adjust the element
@@ -4942,9 +4946,9 @@ assignment for unaligned loads and stores");
   arg2 = (*arglist)[2];
 
   if (TREE_CODE (arg1_type) != VECTOR_TYPE)
-   goto bad; 
+   goto bad;
   if (!INTEGRAL_TYPE_P (TREE_TYPE (arg2)))
-   goto bad; 
+   goto bad;
 
  

Re: [PATCH][MIPS] Add -mgrow-frame-downwards option

2016-05-20 Thread Bernhard Reutner-Fischer
On May 20, 2016 4:58:47 PM GMT+02:00, Robert Suchanek 
 wrote:

s/splots/slots/

thanks,



Re: [PATCH] Fix bootstrap on hppa*-*-hpux*

2016-05-20 Thread John David Anglin

On 2016-05-18 2:20 AM, Jakub Jelinek wrote:

On Tue, May 17, 2016 at 08:31:00PM -0400, John David Anglin wrote:

> r235550 introduced the use of long long, and the macros LLONG_MIN and
> LLONG_MAX.  These macros are not defined by default and we need to
> include  when compiling with c++ to define them.

IMNSHO we should get rid of those long long uses instead and just use
int64_t and INTTYPE_MINIMUM (int64_t) and INTTYPE_MAXIMUM (int64_t).

There is also another use of long long in libcpp, we should also replace
that.

The attached change implements the above.  There is an implicit assumption 
that int64_t is long long if it is not long.

The patch also changes gcov-tool.c.  This affects the interface somewhat, but 
I think consistently using int64_t is better.

Tested on hppa2.0w-hp-hpux11.11.  Okay for trunk?

Dave

--
John David Anglin  dave.ang...@bell.net

2016-05-20  John David Anglin  

PR bootstrap/71014
* c-common.c (get_source_date_epoch): Use int64_t instead of long long.

* gcov-tool.c (profile_rewrite): Use int64_t instead of long long.
(do_rewrite): Likewise.

* line-map.c (location_adhoc_data_update): Use int64_t instead of
long long.
(get_combined_adhoc_loc): Likewise.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c (revision 236418)
+++ gcc/c-family/c-common.c (working copy)
@@ -12798,7 +12798,7 @@
 get_source_date_epoch ()
 {
   char *source_date_epoch;
-  long long epoch;
+  int64_t epoch;
   char *endptr;
 
   source_date_epoch = getenv ("SOURCE_DATE_EPOCH");
@@ -12806,8 +12806,13 @@
 return (time_t) -1;
 
   errno = 0;
+#if defined(INT64_T_IS_LONG)
+  epoch = strtol (source_date_epoch, , 10);
+#else
   epoch = strtoll (source_date_epoch, , 10);
-  if ((errno == ERANGE && (epoch == LLONG_MAX || epoch == LLONG_MIN))
+#endif
+  if ((errno == ERANGE && (epoch == INTTYPE_MAXIMUM (int64_t)
+  || epoch == INTTYPE_MINIMUM (int64_t)))
   || (errno != 0 && epoch == 0))
 fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
 "strtoll: %s\n", xstrerror(errno));
@@ -12819,7 +12824,7 @@
 "trailing garbage: %s\n", endptr);
   if (epoch < 0)
 fatal_error (UNKNOWN_LOCATION, "environment variable $SOURCE_DATE_EPOCH: "
-"value must be nonnegative: %lld \n", epoch);
+"value must be nonnegative: %" SCNd64 "\n", epoch);
 
   return (time_t) epoch;
 }
Index: gcc/gcov-tool.c
===
--- gcc/gcov-tool.c (revision 236418)
+++ gcc/gcov-tool.c (working copy)
@@ -232,7 +232,7 @@
Otherwise, multiply the all counters by SCALE.  */
 
 static int
-profile_rewrite (const char *d1, const char *out, long long n_val,
+profile_rewrite (const char *d1, const char *out, int64_t n_val,
  float scale, int n, int d)
 {
   struct gcov_info * d1_profile;
@@ -261,7 +261,7 @@
   fnotice (file, "-v, --verbose   Verbose mode\n");
   fnotice (file, "-o, --output   Output directory\n");
   fnotice (file, "-s, --scale   Scale the profile 
counters\n");
-  fnotice (file, "-n, --normalize  Normalize the 
profile\n");
+  fnotice (file, "-n, --normalizeNormalize the 
profile\n");
 }
 
 static const struct option rewrite_options[] =
@@ -291,11 +291,7 @@
   int opt;
   int ret;
   const char *output_dir = 0;
-#ifdef HAVE_LONG_LONG
-  long long normalize_val = 0;
-#else
   int64_t normalize_val = 0;
-#endif
   float scale = 0.0;
   int numerator = 1;
   int denominator = 1;
@@ -315,12 +311,10 @@
   break;
 case 'n':
   if (!do_scaling)
-#if defined(HAVE_LONG_LONG)
-   normalize_val = strtoll (optarg, (char **)NULL, 10);
-#elif defined(INT64_T_IS_LONG)
+#if defined(INT64_T_IS_LONG)
normalize_val = strtol (optarg, (char **)NULL, 10);
 #else
-   sscanf (optarg, "%" SCNd64, &normalize_val);
+   normalize_val = strtoll (optarg, (char **)NULL, 10);
 #endif
   else
 fnotice (stderr, "scaling cannot co-exist with normalization,"
Index: libcpp/line-map.c
===
--- libcpp/line-map.c   (revision 236418)
+++ libcpp/line-map.c   (working copy)
@@ -102,7 +102,7 @@
 static int
 location_adhoc_data_update (void **slot, void *data)
 {
-  *((char **) slot) += *((long long *) data);
+  *((char **) slot) += *((int64_t *) data);
   return 1;
 }
 
@@ -224,7 +224,7 @@
  set->location_adhoc_data_map.allocated)
{
  char *orig_data = (char *) set->location_adhoc_data_map.data;
- long long offset;
+ int64_t offset;
  /* Cast away extern "C" from the type of xrealloc.  */
  line_map_realloc reallocator = (set->reallocator
   

[PATCH][Testsuite] Force testing of vectorized builtins rather than inlined i387 asm

2016-05-20 Thread Ilya Verbin
Hi!

In some cases the i387 version of a math function may be inlined from math.h,
and the testcase (like gcc.target/i386/sse4_1-ceil-vec.c) will actually test
the inlined asm instead of the vectorized builtin.  To fix this I've created a new file
gcc.dg/mathfunc.h (similar to gcc.dg/strlenopt.h) and changed vectorization
tests so that they include it instead of math.h.
Regtested on x86_64-linux and i686-linux.  Is it OK for trunk?

gcc/testsuite/
* gcc.dg/mathfunc.h: New file.
* gcc.target/i386/avx-ceil-sfix-2-vec.c: Do not skip if there is no M_PI
for vxworks_kernel.  Include mathfunc.h instead of math.h.  Remove
declaration.
* gcc.target/i386/avx-cvt-2-vec.c: Likewise.
* gcc.target/i386/avx-floor-sfix-2-vec.c: Likewise.
* gcc.target/i386/avx-rint-sfix-2-vec.c: Likewise.
* gcc.target/i386/avx-round-sfix-2-vec.c: Likewise.
* gcc.target/i386/avx512f-ceil-sfix-vec-1.c: Likewise.
* gcc.target/i386/avx512f-floor-sfix-vec-1.c: Likewise.
* gcc.target/i386/sse2-cvt-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceil-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceil-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceilf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-ceilf-vec.c: Likewise.
* gcc.target/i386/sse4_1-floor-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-floor-vec.c: Likewise.
* gcc.target/i386/sse4_1-rint-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rint-vec.c: Likewise.
* gcc.target/i386/sse4_1-rintf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-rintf-vec.c: Likewise.
* gcc.target/i386/sse4_1-round-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-round-vec.c: Likewise.
* gcc.target/i386/sse4_1-roundf-sfix-vec.c: Likewise.
* gcc.target/i386/sse4_1-roundf-vec.c: Likewise.
* gcc.target/i386/sse4_1-trunc-vec.c: Likewise.
* gcc.target/i386/sse4_1-truncf-vec.c: Likewise.
* gcc.target/i386/sse4_1-floorf-sfix-vec.c: Likewise.  Use floorf
instead of __builtin_floorf.
* gcc.target/i386/sse4_1-floorf-vec.c: Likewise.


diff --git a/gcc/testsuite/gcc.dg/mathfunc.h b/gcc/testsuite/gcc.dg/mathfunc.h
new file mode 100644
index 000..1c1b7bc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/mathfunc.h
@@ -0,0 +1,20 @@
+/* This is a replacement of needed parts from math.h for testing vectorization,
+   to ensure we are testing the builtins rather than whatever the OS has in its
+   headers.  */
+
+#define M_PI  3.14159265358979323846
+
+extern double ceil (double);
+extern float ceilf (float);
+
+extern double floor (double);
+extern float floorf (float);
+
+extern double trunc (double);
+extern float truncf (float);
+
+extern double round (double);
+extern float roundf (float);
+
+extern double rint (double);
+extern float rintf (float);
diff --git a/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c 
b/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c
index bf48b80..567a16d 100644
--- a/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-ceil-sfix-2-vec.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx" } */
 /* { dg-require-effective-target avx } */
-/* { dg-skip-if "no M_PI" { vxworks_kernel } } */
 
 #ifndef CHECK_H
 #define CHECK_H "avx-check.h"
@@ -13,9 +12,7 @@
 
 #include CHECK_H
 
-#include 
-
-extern double ceil (double);
+#include "../../gcc.dg/mathfunc.h"
 
 #define NUM 4
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-cvt-2-vec.c 
b/gcc/testsuite/gcc.target/i386/avx-cvt-2-vec.c
index 0081dcf..8a8369b 100644
--- a/gcc/testsuite/gcc.target/i386/avx-cvt-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-cvt-2-vec.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx" } */
 /* { dg-require-effective-target avx } */
-/* { dg-skip-if "no M_PI" { vxworks_kernel } } */
 
 #ifndef CHECK_H
 #define CHECK_H "avx-check.h"
@@ -13,7 +12,7 @@
 
 #include CHECK_H
 
-#include 
+#include "../../gcc.dg/mathfunc.h"
 
 #define NUM 4
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c 
b/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c
index 275199c..44002b4 100644
--- a/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c
+++ b/gcc/testsuite/gcc.target/i386/avx-floor-sfix-2-vec.c
@@ -1,7 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O2 -ffast-math -ftree-vectorize -mavx" } */
 /* { dg-require-effective-target avx } */
-/* { dg-skip-if "no M_PI" { vxworks_kernel } } */
 
 #ifndef CHECK_H
 #define CHECK_H "avx-check.h"
@@ -13,9 +12,7 @@
 
 #include CHECK_H
 
-#include 
-
-extern double floor (double);
+#include "../../gcc.dg/mathfunc.h"
 
 #define NUM 4
 
diff --git a/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c 
b/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c
index 9f273af..980b341 100644
--- a/gcc/testsuite/gcc.target/i386/avx-rint-sfix-2-vec.c
+++ 

[PATCH] Use flag_general_regs_only with -mgeneral-regs-only

2016-05-20 Thread H.J. Lu
On Fri, May 20, 2016 at 10:15 AM, Rainer Orth
 wrote:
> "H.J. Lu"  writes:
>
>> On Thu, May 12, 2016 at 10:54 AM, H.J. Lu  wrote:
> Here is a patch to add
> -mgeneral-regs-only option to x86 backend.   We can update
> spec for interrupt handle to recommend compiling interrupt handler
> with -mgeneral-regs-only option and add a note for compiler
> implementers.
>
> OK for trunk if there is no regression?


 I can't comment on the code patch, but for the documentation part:

> @@ -24242,6 +24242,12 @@ opcodes, to mitigate against certain forms of
> attack. At the moment,
>  this option is limited in what it can do and should not be relied
>  on to provide serious protection.
>
> +@item -mgeneral-regs-only
> +@opindex mgeneral-regs-only
> +Generate code which uses only the general-purpose registers.  This will


 s/which/that/

> +prevent the compiler from using floating-point, vector, mask and bound


 s/will prevent/prevents/

> +registers, but will not impose any restrictions on the assembler.


 Maybe you mean to say "does not restrict use of those registers in inline
 assembly code"?  In any case, please get rid of the future tense here, too.
>>>
>>> I changed it to
>>>
>>> ---
>>> @item -mgeneral-regs-only
>>> @opindex mgeneral-regs-only
>>> Generate code that uses only the general-purpose registers.  This
>>> prevents the compiler from using floating-point, vector, mask and bound
>>> registers.
>>> ---
>>>
>>
>> Here is the updated patch.  Tested on x86-64.  OK for trunk?
>
> This patch broke {i386,x86_64}-apple-darwin15.5.0 bootstrap:
>
> In file included from ./tm.h:16:0,
>  from /vol/gcc/src/hg/trunk/local/gcc/genattrtab.c:108:
> ./options.h:5443:2: error: #error too many target masks
>  #error too many target masks
>   ^
> Makefile:2497: recipe for target 'build/genattrtab.o' failed
> make[3]: *** [build/genattrtab.o] Error 1
>
> options.h has
>
> #define OPTION_MASK_ISA_XSAVES (HOST_WIDE_INT_1 << 62)
> #error too many target masks
>
> The tree bootstraps just fine at the previous revision.
>

Tested on x86-64.  OK for trunk?


-- 
H.J.
From 174ef61afa610d7721d290a8a61136f86fa3987f Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 20 May 2016 10:45:12 -0700
Subject: [PATCH] Use flag_general_regs_only with -mgeneral-regs-only

Since we run out of bits in target_flags for x86 Darwin targets, use
flag_general_regs_only instead of target_flags.

	* config/i386/i386.c (ix86_option_override_internal): Check
	x_flag_general_regs_only instead of MASK_GENERAL_REGS_ONLY.
	* config/i386/i386.opt (-mgeneral-regs-only): Replace
	GENERAL_REGS_ONLY with flag_general_regs_only.
---
 gcc/config/i386/i386.c   | 2 +-
 gcc/config/i386/i386.opt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 29a7941..e9b5182 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5339,7 +5339,7 @@ ix86_option_override_internal (bool main_args_p,
 
 	/* Don't enable x87 instructions if only the general registers
 	   are allowed.  */
-	if (!(opts_set->x_target_flags & MASK_GENERAL_REGS_ONLY)
+	if (!opts->x_flag_general_regs_only
 	&& !(opts_set->x_target_flags & MASK_80387))
 	  {
 	if (processor_alias_table[i].flags & PTA_NO_80387)
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index d12b29a..0fc7cf0 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -899,5 +899,5 @@ Target Var(flag_mitigate_rop) Init(0)
 Attempt to avoid generating instruction sequences containing ret bytes.
 
 mgeneral-regs-only
-Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
+Target Report RejectNegative Var(flag_general_regs_only) Init(0)
 Generate code which uses only the general registers.
-- 
2.5.5



Re: [PATCH] Fix Fortran ICE due to realloc_string_callback bug (PR fortran/71204)

2016-05-20 Thread Thomas Koenig

Hi Jakub,



We ICE at -O0 while compiling the testcase below, because we don't reset
two vars that are reset in all other places in frontend-passes.c when
starting to process an unrelated statement.  Without this,
we can emit some statement into a preexisting block that can be elsewhere
in the current procedure or as in the testcase in completely different
procedure.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6/5?


OK for all affected branches.

Thanks for the patch!

Thomas

[I am currently extremely busy in Real Life (TM), sorry for not taking
this on myself].



[PATCH, i386]: Improve constant RTX costs a bit

2016-05-20 Thread Uros Bizjak
2016-05-20  Uros Bizjak  

* gcc/config/i386/i386.c (ix86_rtx_costs) <case CONST_DOUBLE>:
Use IS_STACK_MODE when calculating cost of standard 80387 constants.
Fallthru to CONST_VECTOR case to calculate cost of standard SSE
constants.
<case CONST_WIDE_INT>: Calculate cost of (MEM (SYMBOL_REF)).
(ix86_legitimate_constant_p): Use CASE_CONST_SCALAR_INT
and CASE_CONST_ANY.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index cecea11..26ec3cb 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14864,8 +14864,7 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
 #endif
   break;
 
-case CONST_INT:
-case CONST_WIDE_INT:
+CASE_CONST_SCALAR_INT:
   switch (mode)
{
case TImode:
@@ -14900,18 +14899,16 @@ ix86_legitimate_constant_p (machine_mode mode, rtx x)
 static bool
 ix86_cannot_force_const_mem (machine_mode mode, rtx x)
 {
-  /* We can always put integral constants and vectors in memory.  */
+  /* We can put any immediate constant in memory.  */
   switch (GET_CODE (x))
 {
-case CONST_INT:
-case CONST_WIDE_INT:
-case CONST_DOUBLE:
-case CONST_VECTOR:
+CASE_CONST_ANY:
   return false;
 
 default:
   break;
 }
+
   return !ix86_legitimate_constant_p (mode, x);
 }
 
@@ -44073,43 +44070,43 @@ ix86_rtx_costs (rtx x, machine_mode mode, int 
outer_code_i, int opno,
*total = 0;
   return true;
 
-case CONST_WIDE_INT:
-  *total = 0;
-  return true;
-
 case CONST_DOUBLE:
-  switch (standard_80387_constant_p (x))
+  if (TARGET_80387 && IS_STACK_MODE (mode))
+   switch (standard_80387_constant_p (x))
+ {
+ case -1:
+ case 0:
+   break;
+ case 1: /* 0.0 */
+   *total = 1;
+   return true;
+ default: /* Other constants */
+   *total = 2;
+   return true;
+ }
+  /* FALLTHRU */
+
+case CONST_VECTOR:
+  switch (standard_sse_constant_p (x, mode))
{
-   case 1: /* 0.0 */
- *total = 1;
- return true;
-   default: /* Other constants */
- *total = 2;
- return true;
case 0:
-   case -1:
  break;
+   case 1:  /* 0: xor eliminates false dependency */
+ *total = 0;
+ return true;
+   default: /* -1: cmp contains false dependency */
+ *total = 1;
+ return true;
}
-  if (SSE_FLOAT_MODE_P (mode))
-   {
-   case CONST_VECTOR:
- switch (standard_sse_constant_p (x, mode))
-   {
-   case 0:
- break;
-   case 1:  /* 0: xor eliminates false dependency */
- *total = 0;
- return true;
-   default: /* -1: cmp contains false dependency */
- *total = 1;
- return true;
-   }
-   }
+  /* FALLTHRU */
+
+case CONST_WIDE_INT:
   /* Fall back to (MEM (SYMBOL_REF)), since that's where
 it'll probably end up.  Add a penalty for size.  */
   *total = (COSTS_N_INSNS (1)
-   + (flag_pic != 0 && !TARGET_64BIT)
-   + (mode == SFmode ? 0 : mode == DFmode ? 1 : 2));
+   + (!TARGET_64BIT && flag_pic)
+   + (GET_MODE_SIZE (mode) <= 4
+  ? 0 : GET_MODE_SIZE (mode) <= 8 ? 1 : 2));
   return true;
 
 case ZERO_EXTEND:


Re: PATCH: PR target/70738: Add -mgeneral-regs-only option

2016-05-20 Thread Rainer Orth
"H.J. Lu"  writes:

> On Thu, May 12, 2016 at 10:54 AM, H.J. Lu  wrote:
 Here is a patch to add
 -mgeneral-regs-only option to x86 backend.   We can update
 spec for interrupt handle to recommend compiling interrupt handler
 with -mgeneral-regs-only option and add a note for compiler
 implementers.

 OK for trunk if there is no regression?
>>>
>>>
>>> I can't comment on the code patch, but for the documentation part:
>>>
 @@ -24242,6 +24242,12 @@ opcodes, to mitigate against certain forms of
 attack. At the moment,
  this option is limited in what it can do and should not be relied
  on to provide serious protection.

 +@item -mgeneral-regs-only
 +@opindex mgeneral-regs-only
 +Generate code which uses only the general-purpose registers.  This will
>>>
>>>
>>> s/which/that/
>>>
 +prevent the compiler from using floating-point, vector, mask and bound
>>>
>>>
>>> s/will prevent/prevents/
>>>
 +registers, but will not impose any restrictions on the assembler.
>>>
>>>
>>> Maybe you mean to say "does not restrict use of those registers in inline
>>> assembly code"?  In any case, please get rid of the future tense here, too.
>>
>> I changed it to
>>
>> ---
>> @item -mgeneral-regs-only
>> @opindex mgeneral-regs-only
>> Generate code that uses only the general-purpose registers.  This
>> prevents the compiler from using floating-point, vector, mask and bound
>> registers.
>> ---
>>
>
> Here is the updated patch.  Tested on x86-64.  OK for trunk?

This patch broke {i386,x86_64}-apple-darwin15.5.0 bootstrap:

In file included from ./tm.h:16:0,
 from /vol/gcc/src/hg/trunk/local/gcc/genattrtab.c:108:
./options.h:5443:2: error: #error too many target masks
 #error too many target masks
  ^
Makefile:2497: recipe for target 'build/genattrtab.o' failed
make[3]: *** [build/genattrtab.o] Error 1

options.h has

#define OPTION_MASK_ISA_XSAVES (HOST_WIDE_INT_1 << 62)
#error too many target masks

The tree bootstraps just fine at the previous revision.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Marc Glisse

On Fri, 20 May 2016, Andi Kleen wrote:


On Fri, May 20, 2016 at 05:11:59PM +0200, Marc Glisse wrote:

On Fri, 20 May 2016, Andi Kleen wrote:


Richard Biener  writes:


The following patch adds BIT_FIELD_INSERT, an operation to
facilitate doing bitfield inserts on registers (as opposed
to currently where we'd have a BIT_FIELD_REF store).


I wonder if these patches would make it easier to use the Haswell
bit manipulations instructions on x86 (which act on registers).

I found that gcc makes significantly less use of them than LLVM,
sometimes leading to much bigger code.


Could you point at some bugzilla entries? I don't really see which
BMI* instruction could be helped by BIT_FIELD_INSERT (PDEP seems too
hard). There is one BMI1 instruction we don't use much, bextr (only
defined with an UNSPEC in i386.md, unlike the TBM version), but it
is about extracting.


Ok. Yes I was thinking of BEXTR.

I thought I had filed a bugzilla at some point, but can't
find it right now. If you compare bitfield code
compiled for Haswell on LLVM and GCC it is very visible
how much worse gcc is.


If I try some simple operations on bitfields, I don't see anything that 
obvious.


movl    $1285, %eax     # imm = 0x505
bextrl  %eax, %edi, %eax

vs

shrl    $5, %eax
andl    $31, %eax

is not that much better.
Incrementing a field gives one more shift with gcc and one more 'and' with 
clang, so maybe clang is slightly better there.
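
(For reference, a minimal sketch of the kind of field read being compared
here -- the struct layout is my own assumption, chosen so the field sits
at bit 5 with width 5, matching the 0x505 bextr control word above:)

struct s { unsigned a : 5, b : 5, c : 22; };

unsigned
get_b (struct s x)
{
  /* clang with BMI: bextrl with start 5, length 5, encoded as 0x505.
     gcc: shrl $5 / andl $31.  */
  return x.b;
}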



So perhaps it only needs changes in the backend.


With -mtbm we generate

bextr   $1285, %edi, %eax

so it shouldn't be hard to generate the same code as clang above, but I 
don't think that's an example of the "much worse" you have in mind.


--
Marc Glisse


Re: [PATCH 1/3] Introduce can_implement_as_sibling_call_p

2016-05-20 Thread Kyrill Tkachov

Hi David,

On 17/05/16 23:01, David Malcolm wrote:

This patch moves part of the logic for determining if tail
call optimizations are possible to a new helper function.

There are no functional changes.

expand_call is 1300 lines long, so there's arguably a
case for doing this on its own, but this change also
enables the followup patch.

The patch changes the logic from a big "if" with joined
|| clauses:

   if (first_problem ()
   ||second_problem ()
   /* ...etc... */
   ||final_problem ())
  try_tail_call = 0;

to a series of separate tests:

   if (first_problem ())
 return false;
   if (second_problem ())
 return false;
   /* ...etc... */
   if (final_problem ())
 return false;

I think the latter form has several advantages over the former:
- IMHO it's easier to read
- it makes it easy to put breakpoints on individual causes of failure
- it makes it easy to put specific error messages on individual causes
   of failure (as done in the followup patch).
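
(As an aside for readers new to this area, a minimal sketch of the
transformation being gated -- my example, not from the patch: a call in
tail position can reuse the caller's frame and become a jump.)

int callee (int);

int
caller (int x)
{
  /* If every check in can_implement_as_sibling_call_p passes, this
     compiles to a direct jump to callee instead of call/return.  */
  return callee (x + 1);
}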

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* calls.c (expand_call): Move "Rest of purposes for tail call
optimizations to fail" to...
(can_implement_as_sibling_call_p): ...this new function, and
split into multiple "if" statements.
---
  gcc/calls.c | 114 
  1 file changed, 76 insertions(+), 38 deletions(-)

diff --git a/gcc/calls.c b/gcc/calls.c
index 6cc1fc7..ac8092c 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -2344,6 +2344,78 @@ avoid_likely_spilled_reg (rtx x)
return x;
  }
  
+/* Helper function for expand_call.
+
+   Return false if EXP is not implementable as a sibling call.  */
+
+static bool
+can_implement_as_sibling_call_p (tree exp,
+rtx structure_value_addr,
+tree funtype,
+int reg_parm_stack_space,
+tree fndecl,
+int flags,
+tree addr,
+const args_size _size)
+{
+  if (!targetm.have_sibcall_epilogue ())
+return false;
+
+  /* Doing sibling call optimization needs some work, since
+ structure_value_addr can be allocated on the stack.
+ It does not seem worth the effort since few optimizable
+ sibling calls will return a structure.  */
+  if (structure_value_addr != NULL_RTX)
+return false;
+
+#ifdef REG_PARM_STACK_SPACE
+  /* If outgoing reg parm stack space changes, we can not do sibcall.  */
+  if (OUTGOING_REG_PARM_STACK_SPACE (funtype)
+  != OUTGOING_REG_PARM_STACK_SPACE (TREE_TYPE (current_function_decl))
+  || (reg_parm_stack_space != REG_PARM_STACK_SPACE (current_function_decl)))
+return false;
+#endif
+


REG_PARM_STACK_SPACE is not defined on arm, which makes reg_parm_stack_space
unused in this function and so breaks bootstrap on arm.
Can you please add an ATTRIBUTE_UNUSED to reg_parm_stack_space?

Thanks,
Kyrill


+  /* Check whether the target is able to optimize the call
+ into a sibcall.  */
+  if (!targetm.function_ok_for_sibcall (fndecl, exp))
+return false;
+
+  /* Functions that do not return exactly once may not be sibcall
+ optimized.  */
+  if (flags & (ECF_RETURNS_TWICE | ECF_NORETURN))
+return false;
+
+  if (TYPE_VOLATILE (TREE_TYPE (TREE_TYPE (addr
+return false;
+
+  /* If the called function is nested in the current one, it might access
+ some of the caller's arguments, but could clobber them beforehand if
+ the argument areas are shared.  */
+  if (fndecl && decl_function_context (fndecl) == current_function_decl)
+return false;
+
+  /* If this function requires more stack slots than the current
+ function, we cannot change it into a sibling call.
+ crtl->args.pretend_args_size is not part of the
+ stack allocated by our caller.  */
+  if (args_size.constant > (crtl->args.size - crtl->args.pretend_args_size))
+return false;
+
+  /* If the callee pops its own arguments, then it must pop exactly
+ the same number of arguments as the current function.  */
+  if (targetm.calls.return_pops_args (fndecl, funtype, args_size.constant)
+  != targetm.calls.return_pops_args (current_function_decl,
+TREE_TYPE (current_function_decl),
+crtl->args.size))
+return false;
+
+  if (!lang_hooks.decls.ok_for_sibcall (fndecl))
+return false;
+
+  /* All checks passed.  */
+  return true;
+}
+
  /* Generate all the code for a CALL_EXPR exp
 and return an rtx for its value.
 Store the value in TARGET (specified as an rtx) if convenient.
@@ -2740,44 +2812,10 @@ expand_call (tree exp, rtx target, int ignore)
  try_tail_call = 0;
  
/*  Rest of purposes for tail call optimizations to fail.  */

-  if (!try_tail_call
-  || !targetm.have_sibcall_epilogue ()
- 

Re: C PATCH to add -Wswitch-unreachable (PR c/49859)

2016-05-20 Thread Jason Merrill

On 05/20/2016 12:36 PM, Marek Polacek wrote:

PR c/49859
* c.opt (Wswitch-unreachable): New option.


This should go in common.opt, since the flag variable is used in 
language-independent code.  OK with that change.


Jason



Re: C PATCH to add -Wswitch-unreachable (PR c/49859)

2016-05-20 Thread Marek Polacek
On Thu, May 19, 2016 at 11:53:52AM -0400, Jason Merrill wrote:
> Why implement this in the front end rather than at the gimple level?

I was afraid that I wouldn't have location info as good as in the FE and
I wasn't sure if I'd be able to handle declarations well.

Now that I've rewritten this to GIMPLE, that's no longer a concern.  Locations are
ok for various gimple_assigns and I don't have to care about DECL_EXPRs.
Moreover, this works fine even for the C++ FE now.  (But I had to disable the
warning for Fortran.)

I also discovered a bug in my previous version - it wouldn't warn for
  switch (i)
{
  int i;
  int j = 10;
  case 4: ...
}
because it only looked at the first statement after switch (i) and punted
for DECL_EXPRs.  This now works as it should.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-05-20  Marek Polacek  

PR c/49859
* c.opt (Wswitch-unreachable): New option.

* doc/invoke.texi: Document -Wswitch-unreachable.
* gimplify.c (gimplify_switch_expr): Implement the -Wswitch-unreachable
warning.

* c-c++-common/Wswitch-unreachable-1.c: New test.
* gcc.dg/Wswitch-unreachable-1.c: New test.
* c-c++-common/goacc/sb-2.c (void foo): Add dg-warning.
* g++.dg/cpp0x/lambda/lambda-switch.C (main): Likewise.
* g++.dg/gomp/block-10.C: Likewise.
* gcc.dg/gomp/block-10.c: Likewise.
* g++.dg/gomp/block-9.C: Likewise.
* gcc.dg/gomp/block-9.c: Likewise.
* g++.dg/gomp/target-1.C: Likewise.
* g++.dg/gomp/target-2.C: Likewise.
* gcc.dg/gomp/target-1.c: Likewise.
* gcc.dg/gomp/target-2.c: Likewise. 
* g++.dg/gomp/taskgroup-1.C: Likewise.
* gcc.dg/gomp/taskgroup-1.c: Likewise.
* gcc.dg/gomp/teams-1.c: Likewise.
* g++.dg/gomp/teams-1.C: Likewise.
* g++.dg/overload/error3.C: Likewise.
* g++.dg/tm/jump1.C: Likewise.
* g++.dg/torture/pr40335.C: Likewise.
* gcc.dg/c99-vla-jump-5.c: Likewise.
* gcc.dg/switch-warn-1.c: Likewise.
* gcc.dg/Wjump-misses-init-1.c: Use -Wno-switch-unreachable.
* gcc.dg/nested-func-1.c: Likewise.
* gcc.dg/pr67784-4.c: Likewise.

diff --git gcc/c-family/c.opt gcc/c-family/c.opt
index 918df16..ed98503 100644
--- gcc/c-family/c.opt
+++ gcc/c-family/c.opt
@@ -634,6 +634,11 @@ Wswitch-bool
 C ObjC C++ ObjC++ Var(warn_switch_bool) Warning Init(1)
 Warn about switches with boolean controlling expression.
 
+Wswitch-unreachable
+C ObjC C++ ObjC++ Var(warn_switch_unreachable) Warning Init(1)
+Warn about statements between switch's controlling expression and the first
+case.
+
 Wtemplates
 C++ ObjC++ Var(warn_templates) Warning
 Warn on primary template declaration.
diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index f3d087f..5909b9d 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -297,7 +297,8 @@ Objective-C and Objective-C++ Dialects}.
 -Wsuggest-attribute=@r{[}pure@r{|}const@r{|}noreturn@r{|}format@r{]} @gol
 -Wsuggest-final-types @gol -Wsuggest-final-methods -Wsuggest-override @gol
 -Wmissing-format-attribute -Wsubobject-linkage @gol
--Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool -Wsync-nand @gol
+-Wswitch  -Wswitch-default  -Wswitch-enum -Wswitch-bool @gol
+-Wswitch-unreachable  -Wsync-nand @gol
 -Wsystem-headers  -Wtautological-compare  -Wtrampolines  -Wtrigraphs @gol
 -Wtype-limits  -Wundef @gol
 -Wuninitialized  -Wunknown-pragmas  -Wunsafe-loop-optimizations @gol
@@ -4144,6 +4145,39 @@ switch ((int) (a == 4))
 @end smallexample
 This warning is enabled by default for C and C++ programs.
 
+@item -Wswitch-unreachable
+@opindex Wswitch-unreachable
+@opindex Wno-switch-unreachable
+Warn whenever a @code{switch} statement contains statements between the
+controlling expression and the first case label, which will never be
+executed.  For example:
+@smallexample
+@group
+switch (cond)
+  @{
+   i = 15;
+  @dots{}
+   case 5:
+  @dots{}
+  @}
+@end group
+@end smallexample
+@option{-Wswitch-unreachable} does not warn if the statement between the
+controlling expression and the first case label is just a declaration:
+@smallexample
+@group
+switch (cond)
+  @{
+   int i;
+  @dots{}
+   case 5:
+   i = 5;
+  @dots{}
+  @}
+@end group
+@end smallexample
+This warning is enabled by default for C and C++ programs.
+
 @item -Wsync-nand @r{(C and C++ only)}
 @opindex Wsync-nand
 @opindex Wno-sync-nand
diff --git gcc/gimplify.c gcc/gimplify.c
index c433a84..57eca85 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -1595,6 +1595,32 @@ gimplify_switch_expr (tree *expr_p, gimple_seq *pre_p)
   gimplify_ctxp->case_labels.create (8);
 
   gimplify_stmt (_BODY (switch_expr), _body_seq);
+
+  /* Possibly warn about unreachable statements between switch's
+     controlling expression and the first case.  */
+  if (warn_switch_unreachable
+ /* This warning doesn't play well with Fortran when 

Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)

2016-05-20 Thread Sandra Loosemore

On 05/20/2016 09:09 AM, Alexander Monakov wrote:


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d281975..f0331e2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -19341,6 +19341,18 @@ offloading execution.
  Apply partitioned execution optimizations.  This is the default when any
  level of optimization is selected.

+@item -msoft-stack
+@opindex msoft-stack
+Switch to code generation variant that does not use @code{.local} memory


s/Switch to code generation variant/Generate code/


+directly for stack storage. Instead, a per-warp stack pointer is
+maintained explicitely. This enables variable-length stack allocation (with


s/explicitely/explicitly/


+variable-length arrays or @code{alloca}), and when global memory is used for
+underlying storage, makes possible to access automatic variables from other


s/makes possible/makes it possible/


+threads, or with atomic instructions. This code generation variant is used
+for OpenMP offloading, but the option is exposed on its own for the purpose
+of testing the compiler; to generate code suitable for linking into programs
+using OpenMP offloading, use option @option{-mgomp}.
+
  @end table

  @node PDP-11 Options


The documentation bits are OK with those changes.

-Sandra



Re: PATCH: PR target/70738: Add -mgeneral-regs-only option

2016-05-20 Thread Sandra Loosemore

On 05/13/2016 09:00 AM, H.J. Lu wrote:


I changed it to

---
@item -mgeneral-regs-only
@opindex mgeneral-regs-only
Generate code that uses only the general-purpose registers.  This
prevents the compiler from using floating-point, vector, mask and bound
registers.
---



Here is the updated patch.  Tested on x86-64.  OK for trunk?


The documentation looks OK now.

-Sandra



"omp declare target" on DECL_EXTERNAL vars

2016-05-20 Thread Jakub Jelinek
Hi!

While working on this patch, I've noticed the need to do:

On Fri, May 20, 2016 at 06:12:44PM +0200, Jakub Jelinek wrote:
>   * varpool.c (varpool_node::get_create): Set node->offloading
>   even for DECL_EXTERNAL decls.
...
> --- gcc/varpool.c.jj  2016-05-04 18:43:25.0 +0200
> +++ gcc/varpool.c 2016-05-20 12:18:14.636755302 +0200
> @@ -149,11 +149,11 @@ varpool_node::get_create (tree decl)
>node = varpool_node::create_empty ();
>node->decl = decl;
>  
> -  if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl)
> +  if ((flag_openacc || flag_openmp)
>&& lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
>  {
>node->offloadable = 1;
> -  if (ENABLE_OFFLOADING)
> +  if (ENABLE_OFFLOADING && !DECL_EXTERNAL (decl))
>   {
> g->have_offload = true;
> if (!in_lto_p)

but that made me think about what handling we want for the
"omp declare target" DECL_EXTERNAL vars.
The reason I needed the above is that both gimplify.c and omp-low.c
test just the node->offloadable flag, not the attribute, and so when
it is external and the flag wasn't set, we could privatize the vars
even when we were supposed to map them etc.
In the C/C++ FEs, we set not just node->offloadable, but also
for ENABLE_OFFLOADING g->have_offload and offload_vars too.
I wonder if that means we register even non-local vars; that would IMHO be a
bug.  On the other hand, we need to watch for an extern declaration
of a VAR_DECL marked for offloading and only later on locally defined,
in that case if we haven't set g->have_offload and added entry to
offload_vars, we'd need to do it when merging the extern decl with the
definition.
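
(The problematic pattern, as I understand it, in C form -- my sketch:)

#pragma omp declare target
extern int v;   /* first seen as DECL_EXTERNAL: no offload_vars entry */
#pragma omp end declare target

int v = 42;     /* definition merged later: the entry must be added now */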

So, your thoughts on that?

Jakub


[gomp4.5] Make even Fortran target use firstprivate for scalars by default, assorted fixes

2016-05-20 Thread Jakub Jelinek
Hi!

This patch turns on implicit firstprivate for scalars (unless
defaultmap(tofrom: scalar) is present) for !$omp target, and assorted fixes
so that the testsuite passes again.
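
(In C syntax, the behaviour being switched on -- a sketch, assuming the
OpenMP 4.5 semantics:)

int s = 1;

void
f (void)
{
#pragma omp target                 /* s is now implicitly firstprivate */
  s++;                             /* update not visible after the region */

#pragma omp target defaultmap(tofrom: scalar)
  s++;                             /* s mapped tofrom: update is visible */
}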

Tested on x86_64-linux, committed to branch.

2016-05-20  Jakub Jelinek  

gcc/
* langhooks.h (struct lang_hooks_for_decls): Add omp_scalar_p.
* langhooks-def.h (lhd_omp_scalar_p): New prototype.
(LANG_HOOKS_OMP_SCALAR_P): Define.
(LANG_HOOKS_DECLS): Use it.
* langhooks.c (lhd_omp_scalar_p): New function.
* gimplify.c (omp_notice_variable): Use lang_hooks.decls.omp_scalar_p.
(omp_no_lastprivate): Removed.
(gimplify_scan_omp_clauses): Set ctx->target_map_scalars_firstprivate
on OMP_TARGET even for Fortran.  Remove omp_no_lastprivate callers.
Propagate lastprivate on combined teams distribute parallel for simd
even to distribute and teams construct.
(gimplify_adjust_omp_clauses): Remove omp_no_lastprivate callers.
(gimplify_omp_for): Likewise.
(computable_teams_clause): Fail for automatic vars from current
function not yet seen in bind expr.
* omp-low.c (lower_omp_target): Fix up argument to is_reference.
* varpool.c (varpool_node::get_create): Set node->offloading
even for DECL_EXTERNAL decls.
gcc/fortran/
* trans.h (gfc_omp_scalar_p): New prototype.
* f95-lang.c (LANG_HOOKS_OMP_SCALAR_P): Redefine to gfc_omp_scalar_p.
* trans-openmp.c (gfc_omp_scalar_p): New function.
(gfc_trans_omp_do): Clear sched_simd flag.
(gfc_split_omp_clauses): Change firstprivate and lastprivate
handling for OpenMP 4.5.
(gfc_trans_omp_teams): Add omp_clauses argument, add it to other
teams clauses.
(gfc_trans_omp_target): For -fopenmp, translate num_teams and
thread_limit clauses on combined target teams early and pass to
gfc_trans_omp_teams.
(gfc_trans_omp_directive): Adjust gfc_trans_omp_teams caller.
libgomp/
* testsuite/libgomp.fortran/examples-4/declare_target-1.f90
(fib_wrapper): Add map(from: x) clause.
* testsuite/libgomp.fortran/examples-4/declare_target-2.f90
(e_53_2): Likewise.
* testsuite/libgomp.fortran/examples-4/declare_target-4.f90
(accum): Add map(tmp) clause.
* testsuite/libgomp.fortran/examples-4/declare_target-5.f90
(accum): Add map(tofrom: tmp) clause.
* testsuite/libgomp.fortran/examples-4/target_data-3.f90
(gramSchmidt): Likewise.
* testsuite/libgomp.fortran/examples-4/teams-2.f90 (dotprod): Add
map(tofrom: sum) clause.
* testsuite/libgomp.fortran/nestedfn5.f90 (foo): Add twice
map (alloc: a, l) clause.  Add defaultmap(tofrom: scalar) clause.
* testsuite/libgomp.fortran/pr66199-2.f90: Adjust for linear clause
only allowed on the loop iterator.
* testsuite/libgomp.fortran/target4.f90 (foo): Add map(t) clause.

--- gcc/langhooks.h.jj  2016-05-04 18:37:42.0 +0200
+++ gcc/langhooks.h 2016-05-19 18:14:56.474549712 +0200
@@ -256,6 +256,10 @@ struct lang_hooks_for_decls
 
   /* Do language specific checking on an implicitly determined clause.  */
   void (*omp_finish_clause) (tree clause, gimple_seq *pre_p);
+
+  /* Return true if DECL is a scalar variable (for the purpose of
+ implicit firstprivatization).  */
+  bool (*omp_scalar_p) (tree decl);
 };
 
 /* Language hooks related to LTO serialization.  */
--- gcc/langhooks-def.h.jj  2016-05-04 18:43:30.0 +0200
+++ gcc/langhooks-def.h 2016-05-19 18:13:40.817541557 +0200
@@ -80,6 +80,7 @@ struct gimplify_omp_ctx;
 extern void lhd_omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *,
   tree);
 extern bool lhd_omp_mappable_type (tree);
+extern bool lhd_omp_scalar_p (tree);
 
 #define LANG_HOOKS_NAME"GNU unknown"
 #define LANG_HOOKS_IDENTIFIER_SIZE sizeof (struct lang_identifier)
@@ -225,6 +226,7 @@ extern tree lhd_make_node (enum tree_cod
 #define LANG_HOOKS_OMP_CLAUSE_LINEAR_CTOR NULL
 #define LANG_HOOKS_OMP_CLAUSE_DTOR hook_tree_tree_tree_null
 #define LANG_HOOKS_OMP_FINISH_CLAUSE lhd_omp_finish_clause
+#define LANG_HOOKS_OMP_SCALAR_P lhd_omp_scalar_p
 
 #define LANG_HOOKS_DECLS { \
   LANG_HOOKS_GLOBAL_BINDINGS_P, \
@@ -249,7 +251,8 @@ extern tree lhd_make_node (enum tree_cod
   LANG_HOOKS_OMP_CLAUSE_ASSIGN_OP, \
   LANG_HOOKS_OMP_CLAUSE_LINEAR_CTOR, \
   LANG_HOOKS_OMP_CLAUSE_DTOR, \
-  LANG_HOOKS_OMP_FINISH_CLAUSE \
+  LANG_HOOKS_OMP_FINISH_CLAUSE, \
+  LANG_HOOKS_OMP_SCALAR_P \
 }
 
 /* LTO hooks.  */
--- gcc/langhooks.c.jj  2016-05-04 18:37:41.0 +0200
+++ gcc/langhooks.c 2016-05-19 18:24:57.213864107 +0200
@@ -514,6 +514,24 @@ lhd_omp_finish_clause (tree, gimple_seq
 {
 }
 
+/* Return true if DECL is a scalar variable (for the purpose of
+   implicit firstprivatization).  */
+
+bool

Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Jakub Jelinek
On Fri, May 20, 2016 at 08:54:39AM -0700, Andi Kleen wrote:
> I thought I had filed a bugzilla at some point, but can't
> find it right now. If you compare bitfield code
> compiled for Haswell on LLVM and GCC it is very visible
> how much worse gcc is.

We really need to lower bitfield operations (especially when there are
multiple adjacent ones) to integer arithmetic on the underlying
DECL_BIT_FIELD_REPRESENTATIVE fields somewhere in (late?) gimple and
perform some cleanups after it and only after that expand.
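
(A sketch of what such lowering could buy us -- my illustration; the byte
value shown assumes a little-endian layout with 'a' allocated first:)

struct s { unsigned char a : 4, b : 4; };

void
f (struct s *p)
{
  p->a = 1;   /* today: two read-modify-write sequences on the same byte */
  p->b = 2;   /* lowered on the representative, ideally one store:
                 *(unsigned char *) p = 0x21;  */
}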

Jakub


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Andi Kleen
On Fri, May 20, 2016 at 05:11:59PM +0200, Marc Glisse wrote:
> On Fri, 20 May 2016, Andi Kleen wrote:
> 
> >Richard Biener  writes:
> >
> >>The following patch adds BIT_FIELD_INSERT, an operation to
> >>facilitate doing bitfield inserts on registers (as opposed
> >>to currently where we'd have a BIT_FIELD_REF store).
> >
> >I wonder if these patches would make it easier to use the Haswell
> >bit manipulations instructions on x86 (which act on registers).
> >
> >I found that gcc makes significantly less use of them than LLVM,
> >sometimes leading to much bigger code.
> 
> Could you point at some bugzilla entries? I don't really see which
> BMI* instruction could be helped by BIT_FIELD_INSERT (PDEP seems too
> hard). There is one BMI1 instruction we don't use much, bextr (only
> defined with an UNSPEC in i386.md, unlike the TBM version), but it
> is about extracting.

Ok. Yes I was thinking of BEXTR.

I thought I had filed a bugzilla at some point, but can't
find it right now. If you compare bitfield code
compiled for Haswell on LLVM and GCC it is very visible
how much worse gcc is.

So perhaps it only needs changes in the backend.

-Andi



Re: PATCH: PR target/70738: Add -mgeneral-regs-only option

2016-05-20 Thread Uros Bizjak
On Fri, May 13, 2016 at 5:00 PM, H.J. Lu  wrote:
> On Thu, May 12, 2016 at 10:54 AM, H.J. Lu  wrote:
 Here is a patch to add
 -mgeneral-regs-only option to x86 backend.   We can update
 spec for interrupt handle to recommend compiling interrupt handler
 with -mgeneral-regs-only option and add a note for compiler
 implementers.

 OK for trunk if there is no regression?
>>>
>>>
>>> I can't comment on the code patch, but for the documentation part:
>>>
 @@ -24242,6 +24242,12 @@ opcodes, to mitigate against certain forms of
 attack. At the moment,
  this option is limited in what it can do and should not be relied
  on to provide serious protection.

 +@item -mgeneral-regs-only
 +@opindex mgeneral-regs-only
 +Generate code which uses only the general-purpose registers.  This will
>>>
>>>
>>> s/which/that/
>>>
 +prevent the compiler from using floating-point, vector, mask and bound
>>>
>>>
>>> s/will prevent/prevents/
>>>
 +registers, but will not impose any restrictions on the assembler.
>>>
>>>
>>> Maybe you mean to say "does not restrict use of those registers in inline
>>> assembly code"?  In any case, please get rid of the future tense here, too.
>>
>> I changed it to
>>
>> ---
>> @item -mgeneral-regs-only
>> @opindex mgeneral-regs-only
>> Generate code that uses only the general-purpose registers.  This
>> prevents the compiler from using floating-point, vector, mask and bound
>> registers.
>> ---
>>
>
> Here is the updated patch.  Tested on x86-64.  OK for trunk?

OK.

Thanks,
Uros.


Re: [C++ Patch/RFC] PR 70572 ("[4.9/5/6/7 Regression] ICE on code with decltype (auto) on x86_64-linux-gnu in digest_init_r")

2016-05-20 Thread Paolo Carlini

Hi,

On 20/05/2016 17:24, Jason Merrill wrote:

On 05/20/2016 07:17 AM, Paolo Carlini wrote:

The below passes testing. There are a few minor changes wrt your
suggestions (I think we want & as hint;


I disagree; if what the user wanted was a function pointer, there's no 
reason to use decltype(auto) over plain auto.  Much more likely that 
they meant to capture the (possibly reference) result of a call.
... ok, if you are sure that the user (ok, novice programmer) will not 
be puzzled when he sees that decltype (auto) a = foo () also does 
not work... but I agree hints are hints, will never be 100% correct and 
informative...

spacing consistent with
typeck2.c; DECL_NAME doesn't seem necessary). I wondered if we want to
tighten the condition consistently with the wording of the error
message, thus patchlet *2 below, which of course also passes testing.


No, I don't think there's any way to deduce function type without the 
initializer having function type.


So, the first patch is OK with the message changed to ().

Ok.

Paolo.


Re: [PATCH 3/3] function: Restructure *logue insertion

2016-05-20 Thread Segher Boessenkool
On Fri, May 20, 2016 at 10:47:19AM -0400, Nathan Sidwell wrote:
> On 05/20/16 09:21, Thomas Schwinge wrote:
> >Hi!
> >
> >The nvptx maintainer Bernd, Nathan: can you take it from here, or should
> >I continue to figure it out?
> 
> What is the defect?

I have a fix, testing now.


Segher


Clean up PURE_SLP_STMT handling

2016-05-20 Thread Richard Sandiford
The vectorizable_* routines had many instances of:

slp_node || PURE_SLP_STMT (stmt_info)

which gives the misleading impression that we can have
!slp_node && PURE_SLP_STMT (stmt_info).  In this context
it's really enough to test slp_node on its own.

There are three cases:

  loop vectorisation only:
vectorizable_foo called only with !slp_node

  pure SLP:
vectorizable_foo called only with slp_node

  hybrid SLP:
(e.g. a vector that's used in SLP statements and also in a reduction)
- vectorizable_foo called once with slp_node for the SLP uses.
- vectorizable_foo called once with !slp_node for the non-SLP uses.

Hybrid SLP isn't possible for stores, so I added an explicit assert
for that.
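
(A rough sketch of the hybrid case, where the loads feed both an SLP
group and a reduction -- my example:)

void
f (int *a, int *b, int *sum)
{
  int s = 0;
  for (int i = 0; i < 1024; i += 2)
    {
      a[i] = b[i] + 1;          /* SLP group of stores */
      a[i + 1] = b[i + 1] + 1;
      s += b[i];                /* reduction: non-SLP use of the same load */
    }
  *sum = s;
}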

I also made vectorizable_comparison static, to make it obvious that
no other callers outside tree-vect-stmts.c could use it with the
!slp && PURE_SLP_STMT combination.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vectorizer.h (vectorizable_comparison): Delete.
* tree-vect-loop.c (vectorizable_reduction): Remove redundant
PURE_SLP_STMT check.
* tree-vect-stmts.c (vectorizable_call): Likewise.
(vectorizable_simd_clone_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_assignment): Likewise.
(vectorizable_shift): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_load): Likewise.
(vectorizable_condition): Likewise.
(vectorizable_store): Likewise.  Assert that we don't have
hybrid SLP.
(vectorizable_comparison): Make static.  Remove redundant
PURE_SLP_STMT check.
(vect_transform_stmt): Assert that we always have an slp_node
if PURE_SLP_STMT.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h
+++ gcc/tree-vectorizer.h
@@ -1004,8 +1004,6 @@ extern void vect_remove_stores (gimple *);
 extern bool vect_analyze_stmt (gimple *, bool *, slp_tree);
 extern bool vectorizable_condition (gimple *, gimple_stmt_iterator *,
gimple **, tree, int, slp_tree);
-extern bool vectorizable_comparison (gimple *, gimple_stmt_iterator *,
-gimple **, tree, int, slp_tree);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
unsigned int *, unsigned int *,
stmt_vector_for_cost *,
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c
+++ gcc/tree-vect-loop.c
@@ -5594,7 +5594,7 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
   if (STMT_VINFO_LIVE_P (vinfo_for_stmt (reduc_def_stmt)))
 return false;
 
-  if (slp_node || PURE_SLP_STMT (stmt_info))
+  if (slp_node)
 ncopies = 1;
   else
 ncopies = (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c
+++ gcc/tree-vect-stmts.c
@@ -2342,7 +2342,7 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
}
 }
 
-  if (slp_node || PURE_SLP_STMT (stmt_info))
+  if (slp_node)
 ncopies = 1;
   else if (modifier == NARROW && ifn == IFN_LAST)
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
@@ -2792,7 +2792,7 @@ vectorizable_simd_clone_call (gimple *stmt, gimple_stmt_iterator *gsi,
 return false;
 
   /* FORNOW */
-  if (slp_node || PURE_SLP_STMT (stmt_info))
+  if (slp_node)
 return false;
 
   /* Process function arguments.  */
@@ -3761,7 +3761,7 @@ vectorizable_conversion (gimple *stmt, gimple_stmt_iterator *gsi,
   /* Multiple types in SLP are handled by creating the appropriate number of
  vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
  case of SLP.  */
-  if (slp_node || PURE_SLP_STMT (stmt_info))
+  if (slp_node)
 ncopies = 1;
   else if (modifier == NARROW)
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
@@ -4242,7 +4242,7 @@ vectorizable_assignment (gimple *stmt, gimple_stmt_iterator *gsi,
   /* Multiple types in SLP are handled by creating the appropriate number of
  vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
  case of SLP.  */
-  if (slp_node || PURE_SLP_STMT (stmt_info))
+  if (slp_node)
 ncopies = 1;
   else
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
@@ -4502,7 +4502,7 @@ vectorizable_shift (gimple *stmt, gimple_stmt_iterator *gsi,
   /* Multiple types in SLP are handled by creating the appropriate number of
  vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
  case of SLP.  */
-  if (slp_node || PURE_SLP_STMT (stmt_info))
+  if (slp_node)
 ncopies = 1;
   else
 ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
@@ -4933,7 +4933,7 @@ 

Avoid unnecessary peeling for gaps with LD3

2016-05-20 Thread Richard Sandiford
vectorizable_load forces peeling for gaps if the vectorisation factor
is not a multiple of the group size, since in that case we'd normally load
beyond the original scalar accesses but drop the excess elements as part
of a following permute:

  if (loop_vinfo
  && ! STMT_VINFO_STRIDED_P (stmt_info)
  && (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
  || (!slp && vf % GROUP_SIZE (vinfo_for_stmt (first_stmt)) != 0)))

This isn't necessary for LOAD_LANES though, since it loads only the
data needed and does the permute itself.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vect-stmts.c (vectorizable_load): Reorder checks so that
load_lanes/grouped_load classification comes first.  Don't check
whether the vectorization factor is a multiple of the group size
for load_lanes.

gcc/testsuite/
* gcc.dg/vect/vect-load-lanes-peeling-1.c: New test.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c
+++ gcc/tree-vect-stmts.c
@@ -6314,6 +6314,17 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   gcc_assert (!nested_in_vect_loop && !STMT_VINFO_GATHER_SCATTER_P (stmt_info));
 
   first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
+  group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+
+  if (!slp
+ && !PURE_SLP_STMT (stmt_info)
+ && !STMT_VINFO_STRIDED_P (stmt_info))
+   {
+ if (vect_load_lanes_supported (vectype, group_size))
+   load_lanes_p = true;
+ else if (!vect_grouped_load_supported (vectype, group_size))
+   return false;
+   }
 
   /* If this is single-element interleaving with an element distance
  that leaves unused vector loads around punt - we at least create
@@ -6341,7 +6352,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   if (loop_vinfo
  && ! STMT_VINFO_STRIDED_P (stmt_info)
  && (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
- || (!slp && vf % GROUP_SIZE (vinfo_for_stmt (first_stmt)) != 0)))
+ || (!slp && !load_lanes_p && vf % group_size != 0)))
{
  if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6361,8 +6372,6 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   if (slp && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
slp_perm = true;
 
-  group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
-
   /* ???  The following is overly pessimistic (as well as the loop
  case above) in the case we can statically determine the excess
 elements loaded are within the bounds of a decl that is accessed.
@@ -6375,16 +6384,6 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
  return false;
}
 
-  if (!slp
- && !PURE_SLP_STMT (stmt_info)
- && !STMT_VINFO_STRIDED_P (stmt_info))
-   {
- if (vect_load_lanes_supported (vectype, group_size))
-   load_lanes_p = true;
- else if (!vect_grouped_load_supported (vectype, group_size))
-   return false;
-   }
-
   /* Invalidate assumptions made by dependence analysis when vectorization
 on the unrolled body effectively re-orders stmts.  */
   if (!PURE_SLP_STMT (stmt_info)
Index: gcc/testsuite/gcc.dg/vect/vect-load-lanes-peeling-1.c
===
--- /dev/null
+++ gcc/testsuite/gcc.dg/vect/vect-load-lanes-peeling-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target vect_load_lanes } */
+
+void
+f (int *__restrict a, int *__restrict b)
+{
+  for (int i = 0; i < 96; ++i)
+a[i] = b[i * 3] + b[i * 3 + 1] + b[i * 3 + 2];
+}
+
+/* { dg-final { scan-tree-dump-not "Data access with gaps" "vect" } } */
+/* { dg-final { scan-tree-dump-not "epilog loop required" "vect" } } */


Fix GROUP_GAP for single-element interleaving

2016-05-20 Thread Richard Sandiford
vectorizable_load had a curious "force_peeling" variable, with no
comment explaining why we need it for single-element interleaving
but not for other cases.  I think it's simply because we weren't
initialising the GROUP_GAP correctly for single loads.
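
(For concreteness, the kind of access I mean -- my example, not from the
patch: a grouped load where only one element per group is used:)

void
f (int *a, int *b)
{
  for (int i = 0; i < 64; i++)
    a[i] = b[4 * i];   /* group size 4, single element used: gap of 3 */
}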

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vect-data-refs.c (vect_analyze_group_access_1): Set
GROUP_GAP for single-element interleaving.
* tree-vect-stmts.c (vectorizable_load): Remove force_peeling
variable.

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 7652e21..36d302a 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -2233,6 +2233,7 @@ vect_analyze_group_access_1 (struct data_reference *dr)
{
  GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = stmt;
  GROUP_SIZE (vinfo_for_stmt (stmt)) = groupsize;
+ GROUP_GAP (stmt_info) = groupsize - 1;
  if (dump_enabled_p ())
{
  dump_printf_loc (MSG_NOTE, vect_location,
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 9ab4af4..585c073 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -6319,7 +6319,6 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
  that leaves unused vector loads around punt - we at least create
 very sub-optimal code in that case (and blow up memory,
 see PR65518).  */
-  bool force_peeling = false;
   if (first_stmt == stmt
  && !GROUP_NEXT_ELEMENT (stmt_info))
{
@@ -6333,7 +6332,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
}
 
  /* Single-element interleaving requires peeling for gaps.  */
- force_peeling = true;
+ gcc_assert (GROUP_GAP (stmt_info));
}
 
   /* If there is a gap in the end of the group or the group size cannot
@@ -6341,8 +6340,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 elements in the last iteration and thus need to peel that off.  */
   if (loop_vinfo
  && ! STMT_VINFO_STRIDED_P (stmt_info)
- && (force_peeling
- || GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
+ && (GROUP_GAP (vinfo_for_stmt (first_stmt)) != 0
  || (!slp && vf % GROUP_SIZE (vinfo_for_stmt (first_stmt)) != 0)))
{
  if (dump_enabled_p ())



Re: [C++ Patch/RFC] PR 70572 ("[4.9/5/6/7 Regression] ICE on code with decltype (auto) on x86_64-linux-gnu in digest_init_r")

2016-05-20 Thread Jason Merrill

On 05/20/2016 07:17 AM, Paolo Carlini wrote:

The below passes testing. There are a few minor changes wrt your
suggestions (I think we want & as hint;


I disagree; if what the user wanted was a function pointer, there's no 
reason to use decltype(auto) over plain auto.  Much more likely that 
they meant to capture the (possibly reference) result of a call.



spacing consistent with
typeck2.c; DECL_NAME doesn't seem necessary). I wondered if we want to
tighten the condition consistently with the wording of the error
message, thus patchlet *2 below, which of course also passes testing.


No, I don't think there's any way to deduce function type without the 
initializer having function type.


So, the first patch is OK with the message changed to ().

Jason



Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Marc Glisse

On Fri, 20 May 2016, Andi Kleen wrote:


Richard Biener  writes:


The following patch adds BIT_FIELD_INSERT, an operation to
facilitate doing bitfield inserts on registers (as opposed
to currently where we'd have a BIT_FIELD_REF store).


I wonder if these patches would make it easier to use the Haswell
bit manipulations instructions on x86 (which act on registers).

I found that gcc makes significantly less use of them than LLVM,
sometimes leading to much bigger code.


Could you point at some bugzilla entries? I don't really see which BMI* 
instruction could be helped by BIT_FIELD_INSERT (PDEP seems too hard). 
There is one BMI1 instruction we don't use much, bextr (only defined with 
an UNSPEC in i386.md, unlike the TBM version), but it is about extracting.


--
Marc Glisse


Re: [PATCH] nvptx per-warp compiler-defined stacks (-msoft-stack)

2016-05-20 Thread Alexander Monakov
On Thu, 21 Apr 2016, Nathan Sidwell wrote:
> On 04/20/16 12:59, Alexander Monakov wrote:
> > This patch implements per-warp compiler-defined stacks under -msoft-stack
> > option, and implements alloca on top of that.  In a few obvious places,
> > changes from -muniform-simt patch are present in the hunks.
> >
> 
> It'd be better to not  mix fragments of patches, and have a description of how
> soft stacks works.

Added a description in the intro text.

> > +  /* fstmp2 = &__nvptx_stacks[tid.y];  */
> 
> ?

This is what is being computed in the emitted assembly.  Changed the comment
to say,
+  /* Now %fstmp2 holds the value of '&__nvptx_stacks[%tid.y]'.  */

> > +  /* crtl->is_leaf is not initialized because RA is not run.  */
> 
> Cryptic comment is cryptic.

Changed the comment to say,
+  /* Ideally we'd use 'crtl->is_leaf' here, but it is computed during
+ register allocator initialization, which is not done on NVPTX.  */

> > +  fprintf (asm_out_file, ".extern .shared .u%d __nvptx_stacks[32];\n",
> 
> Magic constant '32'?

It's the maximum warp count in a CUDA block, but since it's an external
declaration, it can be omitted altogether; changed to use empty brackets.

> > +  if (need_unisimt_decl)
> > +{
> > +  write_var_marker (asm_out_file, false, true, "__nvptx_uni");
> > +  fprintf (asm_out_file, ".extern .shared .u32 __nvptx_uni[32];\n");
> > +}
> 
> Looks like some other patch?

Yes, and likewise in other instances. Both the cover letter and the patch
description mentioned that.  Removed in this resubmission.

> Needs testcases.

Added a testcase that enables -msoft-stack explicitly; testing with
RUNTESTFLAGS=--target_board=nvptx-none-run/-msoft-stack gives this
functionality plenty of exercise, otherwise.

Updated patch below.

Alexander

This patch implements '-msoft-stack' code generation variant for NVPTX.  The
goal is to avoid relying on '.local' memory space for placement of automatic
data, and instead have an explicitly-maintained stack pointer (which can be
set up to point to preallocated global memory space).  This allows stack
data to be accessed from all threads and modified with atomic
instructions.  It also makes it possible to implement variable-length stack allocation
(for 'alloca' and C99 VLAs).

Each warp has its own 'soft stack' pointer.  It lives in a shared memory array
called __nvptx_stacks, at index %tid.y (like OpenACC, OpenMP offloading is
going to use launch geometry such that %tid.y gives the warp index).  It is
retrieved in function prologue (if the function needs a stack frame) and may
also be written there (if the function is non-leaf, so that its callees see
the updated stack pointer), and restored prior to returning.

Startup code is responsible for setting up the initial soft-stack pointer.  For
-mmainkernel testing it is libgcc's __main; for OpenMP offloading it's the
kernel region entry code.
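
In C-like pseudocode, the calling discipline is roughly as below (a sketch
only: FRAME_SIZE and tid_y are stand-ins for the frame size and the %tid.y
register; the emitted PTX differs in detail):

#define FRAME_SIZE 64              /* stand-in for this function's frame */
extern char *__nvptx_stacks[];     /* in .shared, one slot per warp */
extern int tid_y;                  /* stand-in for %tid.y */

void
frame_needing_function (void)
{
  char *sp = __nvptx_stacks[tid_y];   /* prologue: fetch warp's pointer */
  char *frame = sp - FRAME_SIZE;      /* carve out the new frame */
  __nvptx_stacks[tid_y] = frame;      /* non-leaf: publish for callees */
  /* ... body addresses locals relative to frame ... */
  __nvptx_stacks[tid_y] = sp;         /* epilogue: restore for the caller */
}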

2016-05-19  Alexander Monakov  

* lib/target-supports.exp (check_effective_target_alloca): Use a
compile test.

2016-05-19  Alexander Monakov  

* gcc.target/nvptx/softstack.c: New test.

2016-05-19  Alexander Monakov  

* config/nvptx/nvptx.c (nvptx_declare_function_name): Expand comments.
(nvptx_file_end): Do not emit element count in external declaration of
__nvptx_stacks.

2016-05-19  Alexander Monakov  

* doc/invoke.texi (msoft-stack): Rewrite.

2016-03-15  Alexander Monakov  

* config/nvptx/nvptx.h (STACK_SIZE_MODE): Define.

2015-12-14  Alexander Monakov  

* config/nvptx/nvptx.c (nvptx_declare_function_name): Emit %outargs
using .local %outargs_ar only if not TARGET_SOFT_STACK.  Emit %outargs
under TARGET_SOFT_STACK by offsetting from %frame.
(nvptx_get_drap_rtx): Return %argp as the DRAP if needed.
* config/nvptx/nvptx.md (nvptx_register_operand): Allow %outargs under
TARGET_SOFT_STACK.
(nvptx_nonimmediate_operand): Ditto.
(allocate_stack): Implement for TARGET_SOFT_STACK.  Remove unused code.
(allocate_stack_): Remove unused pattern.
(set_softstack_insn): New pattern.
(restore_stack_block): Handle for TARGET_SOFT_STACK.

2015-12-09  Alexander Monakov  

* config/nvptx/nvptx.c: (need_softstack_decl): New variable.
(nvptx_declare_function_name): Handle TARGET_SOFT_STACK.
(nvptx_output_return): Emit stack restore if needed.
(nvptx_file_end): Handle need_softstack_decl.
* config/nvptx/nvptx.h: (TARGET_CPU_CPP_BUILTINS): Define
__nvptx_softstack__ when -msoft-stack is active.
(struct machine_function): New bool field using_softstack.
* config/nvptx/nvptx.opt: (msoft-stack): New option.
* doc/invoke.texi (msoft-stack): Document.

diff --git a/gcc/config/nvptx/nvptx.c 

[PATCH][MIPS] Disable madd/msub when -mno-imadd is used with -mdsp

2016-05-20 Thread Robert Suchanek
Hi,

If the -mdsp option is used then adding -mno-imadd has no effect on code
generation.  This appears to be slightly inconsistent with the -m[no-]imadd option
we have.
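
(For context, the kind of source the maddsidi4 pattern matches -- my
example, not from the patch:)

long long
mac (int a, int b, long long acc)
{
  /* Widening 32x32->64 multiply-accumulate; with MASK_IMADD this can
     use madd, keeping the accumulator in HI/LO.  */
  return acc + (long long) a * b;
}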

Any potential problems/comments? Ok to commit?

Regards,
Robert

gcc/
* config/mips/mips.c (mips_option_override): Move DSP ASE checks
earlier in the function.  Enable madd/msub patterns by default
for DSP ASE.
* config/mips/mips.md (msubsidi4): Remove override for DSP.
(maddsidi4): Ditto.
---
 gcc/config/mips/mips.c  | 30 --
 gcc/config/mips/mips.md |  4 ++--
 2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 4312368..9c6c2a5 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -19758,13 +19758,27 @@ mips_option_override (void)
 warning (0, "the %qs architecture does not support branch-likely"
 " instructions", mips_arch_info->name);
 
+  /* If TARGET_DSPR2, enable TARGET_DSP.  */
+  if (TARGET_DSPR2)
+TARGET_DSP = true;
+
+  if (TARGET_DSP && mips_isa_rev >= 6)
+{
+  error ("the %qs architecture does not support DSP instructions",
+mips_arch_info->name);
+  TARGET_DSP = false;
+  TARGET_DSPR2 = false;
+}
+
   /* If the user hasn't specified -mimadd or -mno-imadd set
  MASK_IMADD based on the target architecture and tuning
  flags.  */
   if ((target_flags_explicit & MASK_IMADD) == 0)
 {
-  if (ISA_HAS_MADD_MSUB &&
-  (mips_tune_info->tune_flags & PTF_AVOID_IMADD) == 0)
+  if ((ISA_HAS_MADD_MSUB &&
+  (mips_tune_info->tune_flags & PTF_AVOID_IMADD) == 0)
+	 /* Enable for DSP by default.  */
+ || TARGET_DSP)
target_flags |= MASK_IMADD;
   else
target_flags &= ~MASK_IMADD;
@@ -19955,18 +19969,6 @@ mips_option_override (void)
   mips_r10k_cache_barrier = R10K_CACHE_BARRIER_NONE;
 }
 
-  /* If TARGET_DSPR2, enable TARGET_DSP.  */
-  if (TARGET_DSPR2)
-TARGET_DSP = true;
-
-  if (TARGET_DSP && mips_isa_rev >= 6)
-{
-  error ("the %qs architecture does not support DSP instructions",
-mips_arch_info->name);
-  TARGET_DSP = false;
-  TARGET_DSPR2 = false;
-}
-
   /* .eh_frame addresses should be the same width as a C pointer.
  Most MIPS ABIs support only one pointer size, so the assembler
  will usually know exactly how big an .eh_frame address is.
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 432ab1a..22f4f0b 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -2242,7 +2242,7 @@ (define_insn "msubsidi4"
   (mult:DI
  (any_extend:DI (match_operand:SI 1 "register_operand" "d"))
  (any_extend:DI (match_operand:SI 2 "register_operand" "d")]
-  "!TARGET_64BIT && (ISA_HAS_MSAC || GENERATE_MADD_MSUB || ISA_HAS_DSP)"
+  "!TARGET_64BIT && (ISA_HAS_MSAC || GENERATE_MADD_MSUB)"
 {
   if (ISA_HAS_DSP_MULT)
 return "msub\t%q0,%1,%2";
@@ -2516,7 +2516,7 @@ (define_insn "maddsidi4"
 (mult:DI (any_extend:DI (match_operand:SI 1 "register_operand" "d"))
  (any_extend:DI (match_operand:SI 2 "register_operand" "d")))
 (match_operand:DI 3 "muldiv_target_operand" "0")))]
-  "(TARGET_MAD || ISA_HAS_MACC || GENERATE_MADD_MSUB || ISA_HAS_DSP)
+  "(TARGET_MAD || ISA_HAS_MACC || GENERATE_MADD_MSUB)
&& !TARGET_64BIT"
 {
   if (TARGET_MAD)
-- 
2.8.2.396.g5fe494c


PING^3: [PATCH] PR target/70454: Build x86 libgomp with -march=i486 or better

2016-05-20 Thread H.J. Lu
On Mon, May 9, 2016 at 5:52 AM, H.J. Lu  wrote:
> On Mon, May 2, 2016 at 6:46 AM, H.J. Lu  wrote:
>> On Mon, Apr 25, 2016 at 1:36 PM, H.J. Lu  wrote:
>>> If x86 libgomp isn't compiled with -march=i486 or better, append
>>> -march=i486 XCFLAGS for x86 libgomp build.
>>>
>>> Tested on i686 with and without --with-arch=i386.  Tested on
>>> x86-64 with and without --with-arch_32=i386.  OK for trunk?
>>>
>>>
>>> H.J.
>>> ---
>>> PR target/70454
>>> * configure.tgt (XCFLAGS): Append -march=i486 to compile x86
>>> libgomp if needed.
>>> ---
>>>  libgomp/configure.tgt | 36 
>>>  1 file changed, 16 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt
>>> index 77e73f0..c876e80 100644
>>> --- a/libgomp/configure.tgt
>>> +++ b/libgomp/configure.tgt
>>> @@ -67,28 +67,24 @@ if test x$enable_linux_futex = xyes; then
>>> ;;
>>>
>>>  # Note that bare i386 is not included here.  We need cmpxchg.
>>> -i[456]86-*-linux*)
>>> +i[456]86-*-linux* | x86_64-*-linux*)
>>> config_path="linux/x86 linux posix"
>>> -   case " ${CC} ${CFLAGS} " in
>>> - *" -m64 "*|*" -mx32 "*)
>>> -   ;;
>>> - *)
>>> -   if test -z "$with_arch"; then
>>> - XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
>>> +   # Need i486 or better.
>>> +   cat > conftestx.c <<EOF
>>> +#if defined __x86_64__ || defined __i486__ || defined __pentium__ \
>>> +  || defined __pentiumpro__ || defined __pentium4__ \
>>> +  || defined __geode__ || defined __SSE__
>>> +# error Need i486 or better
>>> +#endif
>>> +EOF
>>> +   if ${CC} ${CFLAGS} -c -o conftestx.o conftestx.c > /dev/null 2>&1; then
>>> +   if test "${target_cpu}" = x86_64; then
>>> +   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
>>> +   else
>>> +   XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}"
>>> fi
>>> -   esac
>>> -   ;;
>>> -
>>> -# Similar jiggery-pokery for x86_64 multilibs, except here we
>>> -# can't rely on the --with-arch configure option, since that
>>> -# applies to the 64-bit side.
>>> -x86_64-*-linux*)
>>> -   config_path="linux/x86 linux posix"
>>> -   case " ${CC} ${CFLAGS} " in
>>> - *" -m32 "*)
>>> -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
>>> -   ;;
>>> -   esac
>>> +   fi
>>> +   rm -f conftestx.c conftestx.o
>>> ;;
>>>
>>>  # Note that sparcv7 and sparcv8 is not included here.  We need cas.
>>> --
>>> 2.5.5
>>>
>>
>> PING.
>>
>
> PING.
>

PING.


-- 
H.J.


PING: PATCH: PR target/70738: Add -mgeneral-regs-only option

2016-05-20 Thread H.J. Lu
On Fri, May 13, 2016 at 8:00 AM, H.J. Lu  wrote:
> On Thu, May 12, 2016 at 10:54 AM, H.J. Lu  wrote:
 Here is a patch to add
 -mgeneral-regs-only option to x86 backend.   We can update
 spec for interrupt handle to recommend compiling interrupt handler
 with -mgeneral-regs-only option and add a note for compiler
 implementers.

 OK for trunk if there is no regression?
>>>
>>>
>>> I can't comment on the code patch, but for the documentation part:
>>>
 @@ -24242,6 +24242,12 @@ opcodes, to mitigate against certain forms of
 attack. At the moment,
  this option is limited in what it can do and should not be relied
  on to provide serious protection.

 +@item -mgeneral-regs-only
 +@opindex mgeneral-regs-only
 +Generate code which uses only the general-purpose registers.  This will
>>>
>>>
>>> s/which/that/
>>>
 +prevent the compiler from using floating-point, vector, mask and bound
>>>
>>>
>>> s/will prevent/prevents/
>>>
 +registers, but will not impose any restrictions on the assembler.
>>>
>>>
>>> Maybe you mean to say "does not restrict use of those registers in inline
>>> assembly code"?  In any case, please get rid of the future tense here, too.
>>
>> I changed it to
>>
>> ---
>> @item -mgeneral-regs-only
>> @opindex mgeneral-regs-only
>> Generate code that uses only the general-purpose registers.  This
>> prevents the compiler from using floating-point, vector, mask and bound
>> registers.
>> ---
>>
>
> Here is the updated patch.  Tested on x86-64.  OK for trunk?
>

PING.

-- 
H.J.


[PATCH][MIPS] Add -mgrow-frame-downwards option

2016-05-20 Thread Robert Suchanek
Hi,

The patch changes the default behaviour of the direction in which
the local frame grows for MIPS16.

The code size reduces by about 0.5% in average case for -Os, hence,
it is good to turn the option on by default.

Ok to apply?

Regards,
Robert

gcc/

2016-05-20  Matthew Fortune  

* config/mips/mips.h (FRAME_GROWS_DOWNWARD): Enable it
conditionally for MIPS16.
* config/mips/mips.opt: Add -mgrow-frame-downwards option.
Enable it by default for MIPS16.
* doc/invoke.texi: Document the option.
---
 gcc/config/mips/mips.h   |  8 +++-
 gcc/config/mips/mips.opt |  4 
 gcc/doc/invoke.texi  | 13 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 5020208..6ab7dd3 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -2311,7 +2311,13 @@ enum reg_class
 
 #define STACK_GROWS_DOWNWARD 1
 
-#define FRAME_GROWS_DOWNWARD flag_stack_protect
+/* Growing the frame downwards allows us to put spills closest to
+   the stack pointer which is good as they are likely to be accessed
+   frequently.  We can also arrange for normal stack usage to place
+   scalars last so that they too are close to the stack pointer.  */
+#define FRAME_GROWS_DOWNWARD ((TARGET_MIPS16   \
+  && TARGET_FRAME_GROWS_DOWNWARDS) \
+ || flag_stack_protect)
 
 /* Size of the area allocated in the frame to save the GP.  */
 
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 3b92ef5..53feb23 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -447,3 +447,7 @@ Enum(mips_cb_setting) String(always) Value(MIPS_CB_ALWAYS)
 minline-intermix
 Target Report Var(TARGET_INLINE_INTERMIX)
 Allow inlining even if the compression flags differ between caller and callee.
+
+mgrow-frame-downwards
+Target Report Var(TARGET_FRAME_GROWS_DOWNWARDS) Init(1)
+Change the behaviour to grow the frame downwards for MIPS16.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2f6195e..6e5d620 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -838,6 +838,7 @@ Objective-C and Objective-C++ Dialects}.
 -minterlink-compressed -mno-interlink-compressed @gol
 -minterlink-mips16  -mno-interlink-mips16 @gol
 -minline-intermix -mno-inline-intermix @gol
+-mgrow-frame-downwards -mno-grow-frame-downwards @gol
 -mabi=@var{abi}  -mabicalls  -mno-abicalls @gol
 -mshared  -mno-shared  -mplt  -mno-plt  -mxgot  -mno-xgot @gol
 -mgp32  -mgp64  -mfp32  -mfpxx  -mfp64  -mhard-float  -msoft-float @gol
@@ -17929,6 +17930,18 @@ vice-versa.  When using this option it is necessary to 
protect functions
 that cannot be compiled as MIPS16 with a @code{noinline} attribute to ensure
 they are not inlined into a MIPS16 function.
 
+@item -mgrow-frame-downwards
+@itemx -mno-grow-frame-downwards
+@opindex mgrow-frame-downwards
+Grow the local frame down (up) for MIPS16.
+
+Growing the frame downwards allows us to get spill slots created at the lowest
+address rather than the highest address in a local frame.  The benefit of this
+is smaller code size as accessing spill splots closer to the stack pointer
+can be done using using 16-bit instructions.
+
+The option is enabled by default (to grow frame downwards) for MIPS16.
+
 @item -mabi=32
 @itemx -mabi=o64
 @itemx -mabi=n32
-- 
2.8.2.396.g5fe494c


Re: [PATCH 3/3] function: Restructure *logue insertion

2016-05-20 Thread Nathan Sidwell

On 05/20/16 09:21, Thomas Schwinge wrote:

Hi!

The nvptx maintainer Bernd, Nathan: can you take it from here, or should
I continue to figure it out?


What is the defect?




[PATCH][MIPS] Fix ICE for constant pool data in GP area for MIPS16

2016-05-20 Thread Robert Suchanek
Hi,

The patch fixes an ICE when the compiler tries to split an instruction.
Test attached.

No regression. Ok to apply?

Regards,
Robert

gcc/

2016-05-20  Andrew Bennett  

* config/mips/mips.c (mips_constant_pool_symbol_in_sdata_p): New
function.
(mips_output_move): Copy GP instead of splitting HIGH when
accessing constant pool data.

gcc/testsuite/

2016-05-20  Robert Suchanek  

* gcc.target/mips/mips16-gp-bug-1.c: New test.
---
 gcc/config/mips/mips.c  | 20 +++-
 gcc/testsuite/gcc.target/mips/mips16-gp-bug-1.c | 12 
 2 files changed, 31 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16-gp-bug-1.c

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index cbe1007..e6d8f76 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -5053,6 +5053,18 @@ mips_split_move_insn (rtx dest, rtx src, rtx insn)
   mips_split_move (dest, src, mips_insn_split_type (insn));
 }
 

+/* Return true if X is a GP-relative symbol in the constant pool.
+   CONTEXT is the context in which X appears.  */
+
+bool
+mips_constant_pool_symbol_in_sdata_p (rtx x, enum mips_symbol_context context)
+{
+  enum mips_symbol_type symbol_type;
+  return (mips_symbolic_constant_p (x, context, &symbol_type)
+ && symbol_type == SYMBOL_GP_RELATIVE
+ && CONSTANT_POOL_ADDRESS_P (x));
+}
+
 /* Return the appropriate instructions to move SRC into DEST.  Assume
that SRC is operand 1 and DEST is operand 0.  */
 
@@ -5202,7 +5214,13 @@ mips_output_move (rtx dest, rtx src)
}
 
   if (src_code == HIGH)
-   return TARGET_MIPS16 ? "#" : "lui\t%0,%h1";
+   {
+ if (mips_constant_pool_symbol_in_sdata_p (XEXP (src, 0),
+   SYMBOL_CONTEXT_MEM))
+   return "move\t%0,$28";
+
+ return TARGET_MIPS16 ? "#" : "lui\t%0,%h1";
+   }
 
   if (CONST_GP_P (src))
return "move\t%0,%1";
diff --git a/gcc/testsuite/gcc.target/mips/mips16-gp-bug-1.c b/gcc/testsuite/gcc.target/mips/mips16-gp-bug-1.c
new file mode 100644
index 000..85a6d83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/mips16-gp-bug-1.c
@@ -0,0 +1,12 @@
+/* { dg-options "(-mips16) -G4 -mno-abicalls -mcode-readable=no" } */
+typedef struct { int a, b, c, d; } A[5000];
+int a, b;
+extern void bar (int, int, A);
+
+void
+foo (int p1)
+{
+  A c;
+  while (p1)
+    bar (a, b, c);
+}
-- 
2.8.2.396.g5fe494c


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Andi Kleen
Richard Biener  writes:

> The following patch adds BIT_FIELD_INSERT, an operation to
> facilitate doing bitfield inserts on registers (as opposed
> to currently where we'd have a BIT_FIELD_REF store).

I wonder if these patches would make it easier to use the Haswell
bit manipulation instructions on x86 (which act on registers).

I found that gcc makes significantly less use of them than LLVM,
sometimes leading to much bigger code.

-Andi



Re: [ARM] Fix bogus -fstack-usage warning on naked functions

2016-05-20 Thread Kyrill Tkachov

Hi Eric,

On 16/05/16 09:40, Eric Botcazou wrote:

Hi,

-fstack-usage issues the "not supported by this target" warning on naked
functions because the prologue routines do an early return for them.

Tested on arm-eabi, may I apply it on all active branches?


2016-05-16  Eric Botcazou  

* config/arm/arm.c (arm_expand_prologue): Set the stack usage to 0
for naked functions.
(thumb1_expand_prologue): Likewise.



Ok.
Considering the use of current_function_static_stack_size in output_stack_usage
this definitely looks sensible and safe.

Thanks,
Kyrill


Re: [Patch ARM/AArch64 06/11] Add missing vtst_p8 and vtstq_p8 tests.

2016-05-20 Thread Kyrill Tkachov

Hi Christophe,

On 19/05/16 12:54, Christophe Lyon wrote:

On 13 May 2016 at 16:47, James Greenhalgh  wrote:

On Fri, May 13, 2016 at 04:41:33PM +0200, Christophe Lyon wrote:

On 13 May 2016 at 16:37, James Greenhalgh  wrote:

On Wed, May 11, 2016 at 03:23:56PM +0200, Christophe Lyon wrote:

2016-05-02  Christophe Lyon  

   * gcc.target/aarch64/advsimd-intrinsics/vtst.c: Add tests
   for vtst_p8 and vtstq_p8.

And vtst_p16 and vtstq_p16 too please.

vtst_s64
vtstq_s64
vtst_u64
vtstq_u64 are also missing (AArch64 only).


vtst_p16/vtstq_p16 are AArch64 only too, right?

Not in my copy of:

   
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf

I see it is missing from config/arm/arm_neon.h so that's a bug in the GCC
implementation. It should be easy to resolve, map it to the same place
as vtst_u16 and vtst_s16 - this is just a bit operation which takes no
semantics from the data-type.


Maybe you have a way of automatically checking that the doc and arm_neon.h
contents match? I mean:
- are there other intrinsics documented, but not defined in arm_neon.h ?
- are there intrinsics in arm_neon.h, but not in the doc?


Would you mind spinning the fix for that and committing it before this
patch?


I've attached an updated patch which contains the definition for the
missing vtst_p16 and vtstq_p16,
as well as tests for vtst_p8, vtstq_p8, vtst_p16 and vtstq_p16.


The patch is ok (I'm assuming the newly added tests pass ;)

Thanks,
Kyrill


My introduction message was not clear enough: this series
only attempts to fully cover AArch32 intrinsics.

Understood, sorry for the extra noise.


Coverage of AArch64 intrinsics will require another effort :)


Thanks,
James
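
For reference, a minimal use of the intrinsics being added (the signatures
follow the Neon intrinsics reference linked above; building this needs a
Neon-enabled arm or aarch64 toolchain):

#include <arm_neon.h>

/* vtst is a lane-wise "test bits" compare, (a & b) != 0; as noted in the
   thread, the poly element type carries no extra semantics here.  */
uint16x4_t
any_common_bits (poly16x4_t a, poly16x4_t b)
{
  return vtst_p16 (a, b);
}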






[PATCH][MIPS] Add support for P6600

2016-05-20 Thread Robert Suchanek
Hi,

The below patch adds support for MIPS P6600 CPU.

This patch will go in after the approval of the Binutils patch.

Tested with mips-img-linux-gnu.

Regards,
Robert

2016-05-20  Matthew Fortune  
Prachi Godbole  

* config/mips/mips-cpus.def: Add definition for p6600.
* config/mips/mips-tables.opt: Regenerate.
* config/mips/mips.c (mips_ucbranch_type): New enum.
(mips_rtx_cost_data): Add costs for p6600.
(mips_issue_rate): Add support for p6600.
(mips_multipass_dfa_lookahead): Likewise.
(mips_classify_branch_p6600): New function.
(mips_avoid_hazard): Optimize unconditional compact branch
hazard.
(mips_reorg_process_insns): Likewise.
* config/mips/mips.h (TUNE_P6600): New define.
(MIPS_ISA_LEVEL_SPEC): Infer mips64r6 from p6600.
(ENABLE_LD_ST_PAIRS): Enable load/store pairs for p6600.
* config/mips/mips.md: Include p6600.md.
(processor): Add p6600.
(JOIN_MODE): Add support for load/store pairs for 64-bit target.
* config/mips/p6600.md: New file.
* doc/invoke.texi: Add p6600 to supported architectures.
---
 gcc/config/mips/mips-cpus.def   |   1 +
 gcc/config/mips/mips-tables.opt |   3 +
 gcc/config/mips/mips.c  |  84 +-
 gcc/config/mips/mips.h  |   6 +-
 gcc/config/mips/mips.md |   3 +
 gcc/config/mips/p6600.md| 349 
 gcc/doc/invoke.texi |   2 +-
 7 files changed, 442 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/mips/p6600.md

diff --git a/gcc/config/mips/mips-cpus.def b/gcc/config/mips/mips-cpus.def
index 5df9807..5694e87 100644
--- a/gcc/config/mips/mips-cpus.def
+++ b/gcc/config/mips/mips-cpus.def
@@ -171,3 +171,4 @@ MIPS_CPU ("xlp", PROCESSOR_XLP, 65, PTF_AVOID_BRANCHLIKELY)
 
 /* MIPS64 Release 6 processors.  */
 MIPS_CPU ("i6400", PROCESSOR_I6400, 69, 0)
+MIPS_CPU ("p6600", PROCESSOR_P6600, 69, 0)
diff --git a/gcc/config/mips/mips-tables.opt b/gcc/config/mips/mips-tables.opt
index 34c12bd..270fcc0 100644
--- a/gcc/config/mips/mips-tables.opt
+++ b/gcc/config/mips/mips-tables.opt
@@ -696,3 +696,6 @@ Enum(mips_arch_opt_value) String(xlp) Value(101) Canonical
 EnumValue
 Enum(mips_arch_opt_value) String(i6400) Value(102) Canonical
 
+EnumValue
+Enum(mips_arch_opt_value) String(p6600) Value(103) Canonical
+
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 06acd30..cbe1007 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -194,6 +194,15 @@ enum mips_address_type {
   ADDRESS_SYMBOLIC
 };
 
+/* Classifies an unconditional branch of interest for the P6600.  */
+
+enum mips_ucbranch_type {
+  /* May not even be a branch.  */
+  UC_UNDEFINED,
+  UC_BALC,
+  UC_OTHER
+};
+
 /* Macros to create an enumeration identifier for a function prototype.  */
 #define MIPS_FTYPE_NAME1(A, B) MIPS_##A##_FTYPE_##B
 #define MIPS_FTYPE_NAME2(A, B, C) MIPS_##A##_FTYPE_##B##_##C
@@ -1122,6 +1131,19 @@ static const struct mips_rtx_cost_data
 COSTS_N_INSNS (36),   /* int_div_di */
2,/* branch_cost */
4 /* memory_latency */
+  },
+  { /* P6600 */
+COSTS_N_INSNS (4),/* fp_add */
+COSTS_N_INSNS (5),/* fp_mult_sf */
+COSTS_N_INSNS (5),/* fp_mult_df */
+COSTS_N_INSNS (17),   /* fp_div_sf */
+COSTS_N_INSNS (17),   /* fp_div_df */
+COSTS_N_INSNS (5),/* int_mult_si */
+COSTS_N_INSNS (5),/* int_mult_di */
+COSTS_N_INSNS (8),/* int_div_si */
+COSTS_N_INSNS (8),/* int_div_di */
+   2,/* branch_cost */
+   4 /* memory_latency */
   }
 };
 

@@ -14507,6 +14529,7 @@ mips_issue_rate (void)
 case PROCESSOR_LOONGSON_2F:
 case PROCESSOR_LOONGSON_3A:
 case PROCESSOR_P5600:
+case PROCESSOR_P6600:
   return 4;
 
 case PROCESSOR_XLP:
@@ -14642,7 +14665,7 @@ mips_multipass_dfa_lookahead (void)
   if (TUNE_OCTEON)
 return 2;
 
-  if (TUNE_P5600 || TUNE_I6400)
+  if (TUNE_P5600 || TUNE_I6400 || TUNE_P6600)
 return 4;
 
   return 0;
@@ -18496,6 +18519,28 @@ mips_orphaned_high_part_p (mips_offset_table *htab, rtx_insn *insn)
   return false;
 }
 
+/* Subroutine of mips_avoid_hazard.  We classify unconditional branches
+   of interest for the P6600 for performance reasons.  We are interested
+   in differentiating BALC from JIC, JIALC and BC.  */
+
+static enum mips_ucbranch_type
+mips_classify_branch_p6600 (rtx_insn *insn)
+{
+  if (!(insn
+   && USEFUL_INSN_P (insn)
+   && GET_CODE (PATTERN (insn)) != SEQUENCE))
+return UC_UNDEFINED;
+
+  if (get_attr_jal (insn) == JAL_INDIRECT /* JIC and JIALC.  */
+  || get_attr_type (insn) == TYPE_JUMP) /* BC as it is a loose jump.  */
+return UC_OTHER;

Re: Make array_at_struct_end_p to grok MEM_REFs

2016-05-20 Thread Richard Biener
On Fri, 20 May 2016, Jan Hubicka wrote:

> Hi,
> this patch makes array_at_struct_end_p not give up at MEM_REF, as discussed
> on IRC a few weeks back.  This happens a lot for Fortran testcases.
> I am bootstrapping/regtesting x86_64-linux and intend to commit it as
> obvious.
> 
> We still miss a lot of upper bounds for Fortran code because we cannot
> look up the origin of the array.  For example:
> [two unreadable tree dumps elided]
> 
> Moreover, the trailing array code is probably all unnecessary for Fortran.
> Richard, I wonder if we don't want to add a flag to ARRAY_REF (or some
> other place) specifying that a given access is not going to go past the
> size of the array type?

The assert below is unnecessary btw - it is ensured by IL checking.

Rather than annotating an ARRAY_REF I'd have FEs annotate FIELD_DECLs
that they are possibly flexible-size members.

Richard.

>   * tree.c (array_at_struct_end_p): Look through MEM_REFs.
> Index: tree.c
> ===
> --- tree.c(revision 236507)
> +++ tree.c(working copy)
> @@ -13076,6 +13076,13 @@ array_at_struct_end_p (tree ref)
>ref = TREE_OPERAND (ref, 0);
>  }
>  
> +  if (TREE_CODE (ref) == MEM_REF
> +  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
> +{
> +  ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
> +  gcc_assert (!handled_component_p (ref));
> +}
> +
>/* If the reference is based on a declared entity, the size of the array
>   is constrained by its given domain.  (Do not trust commons PR/69368).  
> */
>if (DECL_P (ref)
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
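
A source-level illustration of the distinction array_at_struct_end_p draws
(not GCC internals; the structs are made up):

struct fixed    { int x[4]; int y; };
struct trailing { int n; int tail[1]; };   /* pre-C99 flexible member */

int
sum_tail (struct trailing *b)
{
  int s = 0;
  for (int i = 0; i < b->n; i++)
    s += b->tail[i];   /* may validly run past tail[1] if over-allocated */
  return s;
}

Accesses into struct fixed are hard-bounded by the declared domain, while
b->tail[i] may not be; the patch lets the predicate see through
(*&decl).field[i] MEM_REF wrappings down to the declared entity so more
accesses of the first kind are recognized.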


Re: [PATCH, ARM 6/7, ping1] Add support for CB(N)Z and (U|S)DIV to ARMv8-M Baseline

2016-05-20 Thread Kyrill Tkachov

Hi Thomas,

On 17/05/16 11:14, Thomas Preudhomme wrote:

Ping?

*** gcc/ChangeLog ***

2015-11-13  Thomas Preud'homme  

 * config/arm/arm.c (arm_print_operand_punct_valid_p): Make %? valid
 for Thumb-1.
 * config/arm/arm.h (TARGET_HAVE_CBZ): Define.
 (TARGET_IDIV): Set for all Thumb targets provided they have hardware
 divide feature.
 * config/arm/thumb1.md (thumb1_cbz): New insn.


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
f42e996e5a7ce979fe406b8261d50fb2ba005f6b..347b5b0a5cc0bc1e3b5020c8124d968e76ce48a4
100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -271,9 +271,12 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
  /* Nonzero if this chip provides the movw and movt instructions.  */
  #define TARGET_HAVE_MOVT  (arm_arch_thumb2 || arm_arch8)
  
+/* Nonzero if this chip provides the cb{n}z instruction.  */

+#define TARGET_HAVE_CBZ(arm_arch_thumb2 || arm_arch8)
+
  /* Nonzero if integer division instructions supported.  */
  #define TARGET_IDIV   ((TARGET_ARM && arm_arch_arm_hwdiv) \
-|| (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
+|| (TARGET_THUMB && arm_arch_thumb_hwdiv))
  
  /* Nonzero if disallow volatile memory access in IT block.  */

  #define TARGET_NO_VOLATILE_CE (arm_arch_no_volatile_ce)
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
13b4b71ac8f9c1da8ef1945f7ff6985ca59f6832..445972ce0e3fd27d4411840ff69e9edbb23994fc
100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22684,7 +22684,7 @@ arm_print_operand_punct_valid_p (unsigned char code)
  {
return (code == '@' || code == '|' || code == '.'
  || code == '(' || code == ')' || code == '#'
- || (TARGET_32BIT && (code == '?'))
+ || code == '?'
  || (TARGET_THUMB2 && (code == '!'))
  || (TARGET_THUMB && (code == '_')));
  }


Hmm, I'm not a fan of this change. arm_print_operand_punct_valid_p is an 
implementation
of a target hook that is used to validate user-provided inline asm as well and 
is therefore
the right place to reject such invalid constructs.

This is just working around the fact that the output template for the [u]divsi3 
patterns
has a '%?' in it that is illegal in Thumb1 and will not be used for ARMv8-M 
Baseline anyway.
I'd prefer it if you add a second alternative to those patterns and emit
the sdiv/udiv mnemonic without the '%?' and enable that for the v8mb arch 
attribute
(and mark the existing alternative as requiring the "32" arch attribute).


diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index
4572456b8bc98503061846cad94bc642943db3a2..1b01ef6ce731fe3ff37c3d8c048fb9d5e7829b35
100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -973,6 +973,92 @@
DONE;
  })
  
+;; A pattern for the cb(n)z instruction added in ARMv8-M baseline profile,

+;; adapted from cbranchsi4_insn.  Modifying cbranchsi4_insn instead leads to
+;; code generation difference for ARMv6-M because the minimum length of the
+;; instruction becomes 2 even for it due to a limitation in genattrtab's
+;; handling of pc in the length condition.
+(define_insn "thumb1_cbz"
+  [(set (pc) (if_then_else
+ (match_operator 0 "equality_operator"
+  [(match_operand:SI 1 "s_register_operand" "l")
+   (const_int 0)])
+ (label_ref (match_operand 2 "" ""))
+ (pc)))]
+  "TARGET_THUMB1 && TARGET_HAVE_MOVT"
+{


s/TARGET_HAVE_MOVT/TARGET_HAVE_CBZ/


+  if (get_attr_length (insn) == 2)
+{
+  if (GET_CODE (operands[0]) == EQ)
+   return "cbz\t%1, %l2";
+  else
+   return "cbnz\t%1, %l2";
+}
+  else
+{
+  rtx t = cfun->machine->thumb1_cc_insn;
+  if (t != NULL_RTX)
+   {
+ if (!rtx_equal_p (cfun->machine->thumb1_cc_op0, operands[1])
+ || !rtx_equal_p (cfun->machine->thumb1_cc_op1, operands[2]))
+   t = NULL_RTX;
+ if (cfun->machine->thumb1_cc_mode == CC_NOOVmode)
+   {
+ if (!noov_comparison_operator (operands[0], VOIDmode))
+   t = NULL_RTX;
+   }
+ else if (cfun->machine->thumb1_cc_mode != CCmode)
+   t = NULL_RTX;
+   }
+  if (t == NULL_RTX)
+   {
+ output_asm_insn ("cmp\t%1, #0", operands);
+ cfun->machine->thumb1_cc_insn = insn;
+ cfun->machine->thumb1_cc_op0 = operands[1];
+ cfun->machine->thumb1_cc_op1 = operands[2];
+ cfun->machine->thumb1_cc_mode = CCmode;
+   }
+  else
+   /* Ensure we emit the right type of condition code on the jump.  */
+   XEXP (operands[0], 0) = gen_rtx_REG (cfun->machine->thumb1_cc_mode,
+CC_REGNUM);
+
+  switch (get_attr_length (insn))
+   {
+   case 4:  return "b%d0\t%l2";
+   case 6:  return "b%D0\t.LCB%=;b\t%l2\t%@long 
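
For reference, the kind of source construct the pattern targets
(illustrative only, not taken from the patch):

/* An equality compare of a low register against zero feeding a short
   forward branch; ARMv8-M Baseline can encode this as one 16-bit
   CBZ/CBNZ instruction.  */
void
call_if_zero (int x, void (*g) (void))
{
  if (x == 0)
    g ();
}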

Re: [PATCH 3/3] function: Restructure *logue insertion

2016-05-20 Thread Thomas Schwinge
Hi!

The nvptx maintainer Bernd, Nathan: can you take it from here, or should
I continue to figure it out?

On Fri, 20 May 2016 11:28:25 +0200, I wrote:
> > > > * function.c (make_epilogue_seq): Remove epilogue_end parameter.
> > > > (thread_prologue_and_epilogue_insns): Remove bb_flags.  Restructure
> > > > code.  Ignore sibcalls on EDGE_IGNORE edges.
> > > > * shrink-wrap.c (handle_simple_exit): New function.  Set EDGE_IGNORE
> > > > on edges for sibcalls that run without prologue.  The rest of the
> > > > function is combined from...
> > > > (fix_fake_fallthrough_edge): ... this, and ...
> > > > (try_shrink_wrapping): ... a part of this.  Remove the bb_with
> > > > function argument, make it a local variable.
> 
> On Thu, 19 May 2016 17:20:46 -0500, Segher Boessenkool 
>  wrote:
> > On Thu, May 19, 2016 at 04:00:22PM -0600, Jeff Law wrote:
> > > OK for the trunk, but please watch closely for any fallout.
> > 
> > Thanks, and I will!
> 
> With nvptx offloading on x86_64 GNU/Linux, this (r236491) is causing
> several execution test failures.  I'll have a look.

OK, no offloading required.  The problem -- or, "a" problem; hopefully
the same ;-) -- also reproduces with an nvptx-none target configuration.
A before/after r236491 diff of:

$ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ 
source-gcc/gcc/testsuite/gcc.c-torture/execute/2121-1.c -O0 -Wall -Wextra 
-Bbuild-gcc/nvptx-none/newlib/ -Lbuild-gcc/nvptx-none/newlib -mmainkernel -o 
./2121-1.exe -fdump-tree-all -fdump-ipa-all -fdump-rtl-all -save-temps

..., shows the execution failure ("nvptx-none-run-single 2121-1.exe"
returns exit code 1), and (aside from earlier, hopefully benign
address/ID changes) shows the following dump changes, starting with:

--- before/2121-1.c.281r.mach   2016-05-20 14:56:37.794367323 +0200
+++ after/2121-1.c.281r.mach2016-05-20 14:54:34.537741174 +0200
@@ -5,16 +5,10 @@
 ending the processing of deferred insns
 df_analyze called
 df_worklist_dataflow_doublequeue:n_basic_blocks 3 n_edges 2 count 3 (1)
-scanning new insn with uid = 11.
-changing bb of uid 13
-  unscanned insn
-changing bb of uid 11
-  from 2 to 3
 starting the processing of deferred insns
 ending the processing of deferred insns
 df_analyze called
-df_worklist_dataflow_doublequeue:n_basic_blocks 4 n_edges 3 count 4 (1)
-df_worklist_dataflow_doublequeue:n_basic_blocks 4 n_edges 3 count 4 (1)
+df_worklist_dataflow_doublequeue:n_basic_blocks 3 n_edges 2 count 3 (1)
 
 
 big
@@ -27,8 +21,8 @@
 ;;  entry block defs1 [%stack] 2 [%frame] 3 [%args] 4 [%chain]
 ;;  exit block uses 1 [%stack] 2 [%frame]
 ;;  regs ever live  2 [%frame]
-;;  ref usage  r1={1d,3u} r2={1d,4u} r3={1d,2u} r4={1d} r22={1d,1u} 
-;;total ref usage 15{5d,10u,0e} in 4{4 regular + 0 call} insns.
+;;  ref usage  r1={1d,2u} r2={1d,3u} r3={1d,1u} r4={1d} r22={1d,1u} 
+;;total ref usage 12{5d,7u,0e} in 3{3 regular + 0 call} insns.
 
 ( )->[0]->( 2 )
 ;; bb 0 artificial_defs: { d-1(1){ }d-1(2){ }d-1(3){ }d-1(4){ }}
@@ -42,7 +36,7 @@
 ;; lr  out  1 [%stack] 2 [%frame] 3 [%args]
 ;; live  out1 [%stack] 2 [%frame] 3 [%args]
 
-( 0 )->[2]->( 3 )
+( 0 )->[2]->( 1 )
 ;; bb 2 artificial_defs: { }
 ;; bb 2 artificial_uses: { u-1(1){ }u-1(2){ }u-1(3){ }}
 ;; lr  in   1 [%stack] 2 [%frame] 3 [%args]
@@ -54,19 +48,7 @@
 ;; lr  out  1 [%stack] 2 [%frame] 3 [%args]
 ;; live  out1 [%stack] 2 [%frame] 3 [%args]
 
-( 2 )->[3]->( 1 )
-;; bb 3 artificial_defs: { }
-;; bb 3 artificial_uses: { u-1(1){ }u-1(2){ }u-1(3){ }}
-;; lr  in   1 [%stack] 2 [%frame] 3 [%args]
-;; lr  use  1 [%stack] 2 [%frame] 3 [%args]
-;; lr  def 
-;; live  in 1 [%stack] 2 [%frame] 3 [%args]
-;; live  gen   
-;; live  kill  
-;; lr  out  1 [%stack] 2 [%frame] 3 [%args]
-;; live  out1 [%stack] 2 [%frame] 3 [%args]
-
-( 3 )->[1]->( )
+( 2 )->[1]->( )
 ;; bb 1 artificial_defs: { }
 ;; bb 1 artificial_uses: { u-1(1){ }u-1(2){ }}
 ;; lr  in   1 [%stack] 2 [%frame]
@@ -92,8 +74,8 @@
 ;;  entry block defs1 [%stack] 2 [%frame] 3 [%args] 4 [%chain]
 ;;  exit block uses 1 [%stack] 2 [%frame]
 ;;  regs ever live  2 [%frame]
-;;  ref usage  r1={1d,3u} r2={1d,4u} r3={1d,2u} r4={1d} r22={1d,1u} 
-;;total ref usage 15{5d,10u,0e} in 4{4 regular + 0 call} insns.
+;;  ref usage  r1={1d,2u} r2={1d,3u} r3={1d,1u} r4={1d} r22={1d,1u} 
+;;total ref usage 12{5d,7u,0e} in 3{3 regular + 0 call} insns.
 (note 1 0 5 NOTE_INSN_DELETED)
 (note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
 (insn 2 5 3 2 (set (reg:DI 22)
@@ -105,14 +87,8 @@
 (reg:DI 22)) 
source-gcc/gcc/testsuite/gcc.c-torture/execute/2121-1.c:1 5 {*movdi_insn}
 

Make profile updating after loop transforms bit more robust

2016-05-20 Thread Jan Hubicka
Hi,
this patch makes expected_loop_iterations a bit saner in corner cases.
First, expected_loop_iterations currently returns 0 when
-fguess-branch-probabilities is off and also in case the frequencies are
downscaled to 0.
Originally the function was intended to be used only for loops with a
computed profile, but this is not current practice, so it is better to do
something at least reasonably sane.  The patch makes
expected_loop_iterations_unbounded guess 3 iterations, which is about the
average number of iterations of a given loop in a program.

On the other hand, it may return bigger values than the maximal number of
iterations, which we now have readily available from the loop
infrastructure.  So it is a good idea to cap the value by this.

This makes updates after loop unrolling and similar transformations more sane.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

* cfgloop.h (expected_loop_iterations_unbounded,
expected_loop_iterations): Unconstify.
* cfgloopanal.c (expected_loop_iterations_unbounded): Sanity check the
profile with known upper bound; return 3 when profile is absent.
(expected_loop_iterations): Update.
Index: cfgloop.h
===
--- cfgloop.h   (revision 236507)
+++ cfgloop.h   (working copy)
@@ -316,8 +316,8 @@ extern void verify_loop_structure (void)
 
 /* Loop analysis.  */
 extern bool just_once_each_iteration_p (const struct loop *, 
const_basic_block);
-gcov_type expected_loop_iterations_unbounded (const struct loop *);
-extern unsigned expected_loop_iterations (const struct loop *);
+gcov_type expected_loop_iterations_unbounded (struct loop *);
+extern unsigned expected_loop_iterations (struct loop *);
 extern rtx doloop_condition_get (rtx);
 
 void mark_loop_for_removal (loop_p);
Index: cfgloopanal.c
===
--- cfgloopanal.c   (revision 236507)
+++ cfgloopanal.c   (working copy)
@@ -231,14 +231,20 @@ average_num_loop_insns (const struct loo
value.  */
 
 gcov_type
-expected_loop_iterations_unbounded (const struct loop *loop)
+expected_loop_iterations_unbounded (struct loop *loop)
 {
   edge e;
   edge_iterator ei;
+  gcov_type expected;
+  
 
-  if (loop->latch->count || loop->header->count)
+  /* An average loop rolls about 3 times.  If we have no profile at all, it
+     is the best we can do.  */
+  if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
+expected = 3;
+  else if (loop->latch->count || loop->header->count)
 {
-  gcov_type count_in, count_latch, expected;
+  gcov_type count_in, count_latch;
 
   count_in = 0;
   count_latch = 0;
@@ -253,8 +259,6 @@ expected_loop_iterations_unbounded (cons
expected = count_latch * 2;
   else
expected = (count_latch + count_in - 1) / count_in;
-
-  return expected;
 }
   else
 {
@@ -270,17 +274,28 @@ expected_loop_iterations_unbounded (cons
  freq_in += EDGE_FREQUENCY (e);
 
   if (freq_in == 0)
-   return freq_latch * 2;
-
-  return (freq_latch + freq_in - 1) / freq_in;
+   {
+ /* If we have no profile at all, expect 3 iterations.  */
+ if (!freq_latch)
+   expected = 3;
+ else
+   expected = freq_latch * 2;
+   }
+  else
+expected = (freq_latch + freq_in - 1) / freq_in;
 }
+
+  HOST_WIDE_INT max = get_max_loop_iterations_int (loop);
+  if (max != -1 && max < expected)
+return max;
+  return expected;
 }
 
 /* Returns expected number of LOOP iterations.  The returned value is bounded
by REG_BR_PROB_BASE.  */
 
 unsigned
-expected_loop_iterations (const struct loop *loop)
+expected_loop_iterations (struct loop *loop)
 {
   gcov_type expected = expected_loop_iterations_unbounded (loop);
   return (expected > REG_BR_PROB_BASE ? REG_BR_PROB_BASE : expected);
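
A condensed, self-contained sketch (not GCC code) of the estimate the patch
computes: iterations are roughly latch count / entry count, rounded up,
defaulting to 3 without profile data and capped by any known maximum:

#include <stdio.h>

static long
expected_iterations (int have_profile, long count_in, long count_latch,
                     long max_iter /* -1 if unknown */)
{
  long expected;

  if (!have_profile)
    expected = 3;                /* "average loop rolls about 3 times" */
  else if (count_in == 0)
    expected = count_latch * 2;  /* entry edge never sampled */
  else
    expected = (count_latch + count_in - 1) / count_in;

  if (max_iter != -1 && max_iter < expected)
    expected = max_iter;         /* never exceed the proven bound */
  return expected;
}

int
main (void)
{
  printf ("%ld\n", expected_iterations (1, 10, 95, -1)); /* 10 */
  printf ("%ld\n", expected_iterations (0, 0, 0, 2));    /* 3, capped to 2 */
  return 0;
}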


Make array_at_struct_end_p to grok MEM_REFs

2016-05-20 Thread Jan Hubicka
Hi,
this patch makes array_at_struct_end_p not give up at MEM_REF, as discussed
on IRC a few weeks back.  This happens a lot for Fortran testcases.
I am bootstrapping/regtesting x86_64-linux and intend to commit it as obvious.

We still miss a lot of upper bounds for Fortran code because we cannot look
up the origin of the array.  For example:
[two unreadable tree dumps elided]

Moreover, the trailing array code is probably all unnecessary for Fortran.
Richard, I wonder if we don't want to add a flag to ARRAY_REF (or some other
place) specifying that a given access is not going to go past the size of the
array type?
* tree.c (array_at_struct_end_p): Look through MEM_REFs.
Index: tree.c
===
--- tree.c  (revision 236507)
+++ tree.c  (working copy)
@@ -13076,6 +13076,13 @@ array_at_struct_end_p (tree ref)
   ref = TREE_OPERAND (ref, 0);
 }
 
+  if (TREE_CODE (ref) == MEM_REF
+  && TREE_CODE (TREE_OPERAND (ref, 0)) == ADDR_EXPR)
+{
+  ref = TREE_OPERAND (TREE_OPERAND (ref, 0), 0);
+  gcc_assert (!handled_component_p (ref));
+}
+
   /* If the reference is based on a declared entity, the size of the array
  is constrained by its given domain.  (Do not trust commons PR/69368).  */
   if (DECL_P (ref)


Re-apply fix to realistic loop estimates

2016-05-20 Thread Jan Hubicka
Hi,
this patch re-applies the idx_infer_loop_bounds change.  With the fix to
tree-vect-loop.c there should be no performance regressions.  The
prefetch-5.c testcase still changes, and I will send a patch adding likely
upper bounds shortly to handle this one.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* gcc.dg/tree-ssa/prefetch-5.c: xfail.
* tree-ssa-loop-niter.c (idx_infer_loop_bounds): We can not produce
realistic upper bounds here.
Index: testsuite/gcc.dg/tree-ssa/prefetch-5.c
===
--- testsuite/gcc.dg/tree-ssa/prefetch-5.c  (revision 236478)
+++ testsuite/gcc.dg/tree-ssa/prefetch-5.c  (working copy)
@@ -54,5 +54,7 @@ int loop5 (int n, struct tail5 *x)
   return s;
 }
 
-/* { dg-final { scan-tree-dump-times "Issued prefetch" 2 "aprefetch" } } */
-/* { dg-final { scan-tree-dump-times "Not prefetching" 1 "aprefetch" } } */
+/* Until we are able to track likely upper bounds, we can't really work out that
+   small trailing arrays should not be prefetched.  */
+/* { dg-final { scan-tree-dump-times "Issued prefetch" 2 "aprefetch" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Not prefetching" 1 "aprefetch" { xfail *-*-* } } } */
Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 236478)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -3115,7 +3115,6 @@ idx_infer_loop_bounds (tree base, tree *
   tree low, high, type, next;
   bool sign, upper = true, at_end = false;
   struct loop *loop = data->loop;
-  bool reliable = true;
 
   if (TREE_CODE (base) != ARRAY_REF)
 return true;
@@ -3187,14 +3186,14 @@ idx_infer_loop_bounds (tree base, tree *
   && tree_int_cst_compare (next, high) <= 0)
 return true;
 
-  /* If access is not executed on every iteration, we must ensure that overlow may
-     not make the access valid later.  */
+  /* If access is not executed on every iteration, we must ensure that overlow
+     may not make the access valid later.  */
   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, gimple_bb (data->stmt))
   && scev_probably_wraps_p (initial_condition_in_loop_num (ev, loop->num),
step, data->stmt, loop, true))
-reliable = false;
+upper = false;
 
-  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, reliable, upper);
+  record_nonwrapping_iv (loop, init, step, data->stmt, low, high, false, upper);
   return true;
 }
 


Re: [PATCH] ARC: configure script to allow non uclibc based triplets

2016-05-20 Thread Vineet Gupta
On Friday 20 May 2016 05:28 PM, Claudiu Zissulescu wrote:
> Hi Vineet,
>
>> gcc/
>> 2016-05-20  Vineet Gupta 
>>
>> * config.gcc: Remove uclibc from arc target spec
>>
>> -arc*-*-linux-uclibc*)
>> +arc*-*-linux*)
> Actually may make sense to have something like arc*-*-*linux-glibc* here (or 
> something of a sort) as we can properly select gcc driver configurations for 
> each system, as ARM for example does.

I didn't see any explicit glibc suffixes in the switch case for other arches,
hence the above.

Actually, autoconf automatically defines LIBC_{GLIBC,UCLIBC} based on the
triplet, so you get the desired multiplexer for driver configuration already,
although I doubt any differences will be needed for ARC between uclibc and
glibc!

In light of the above, Cupertino pointed me at another thing which needs
fixing, something like the below:

diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 44f812dfdbe9..10329524c710 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -77,7 +77,7 @@ along with GCC; see the file COPYING3.  If not see
 /* Names to predefine in the preprocessor for this target machine.  */
 #define TARGET_CPU_CPP_BUILTINS() arc_cpu_cpp_builtins (pfile)
 
-#if DEFAULT_LIBC == LIBC_UCLIBC
+#if (DEFAULT_LIBC == LIBC_UCLIBC) || (DEFAULT_LIBC == LIBC_GLIBC) ||
(DEFAULT_LIBC == LIBC_BIONIC)
 
 #define TARGET_OS_CPP_BUILTINS() \
   do \




Re: [PATCH, ARM 4/7, ping1] Factor out MOVW/MOVT availability and desirability checks

2016-05-20 Thread Kyrill Tkachov

Hi Thomas,

On 19/05/16 17:10, Thomas Preudhomme wrote:

On Wednesday 18 May 2016 11:47:47 Kyrill Tkachov wrote:

Hi Thomas,

Hi Kyrill,

Please find below the updated patch and associated ChangeLog entry.

*** gcc/ChangeLog ***

2016-05-18  Thomas Preud'homme  

 * config/arm/arm.h (TARGET_USE_MOVT): Check MOVT/MOVW availability
 with TARGET_HAVE_MOVT.
 (TARGET_HAVE_MOVT): Define.
 * config/arm/arm.c (const_ok_for_op): Check MOVT/MOVW
 availability with TARGET_HAVE_MOVT.
 * config/arm/arm.md (arm_movt): Use TARGET_HAVE_MOVT to check movt
 availability.
 (addsi splitter): Use TARGET_THUMB && TARGET_HAVE_MOVT rather than
 TARGET_THUMB2.
 (symbol_refs movsi splitter): Remove TARGET_32BIT check.
 (arm_movtas_ze): Use TARGET_HAVE_MOVT to check movt availability.
 * config/arm/constraints.md (define_constraint "j"): Use
 TARGET_HAVE_MOVT to check movt availability.



Please use capitalised MOVW/MOVT consistently in the ChangeLog.
Ok with a fixed ChangeLog.

Thanks,
Kyrill


diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
1d976b36300d92d538098b3cf83c60d62ed2be1c..d199e5ebb89194fdcc962ae9653dd159a67bb7bc
100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -237,7 +237,7 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
  
  /* Should MOVW/MOVT be used in preference to a constant pool.  */

  #define TARGET_USE_MOVT \
-  (arm_arch_thumb2 \
+  (TARGET_HAVE_MOVT \
 && (arm_disable_literal_pool \
 || (!optimize_size && !current_tune->prefer_constant_pool)))
  
@@ -268,6 +268,9 @@ extern void (*arm_lang_output_object_attributes_hook)

(void);
  /* Nonzero if this chip supports load-acquire and store-release.  */
  #define TARGET_HAVE_LDACQ (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
  
+/* Nonzero if this chip provides the MOVW and MOVT instructions.  */

+#define TARGET_HAVE_MOVT   (arm_arch_thumb2)
+
  /* Nonzero if integer division instructions supported.  */
  #define TARGET_IDIV   ((TARGET_ARM && arm_arch_arm_hwdiv) \
 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
7b95ba0b379c31ee650e714ce2198a43b1cadbac..d75a34f10d5ed22cff0a0b5d3ad433f111b059ee
100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3897,7 +3897,7 @@ const_ok_for_op (HOST_WIDE_INT i, enum rtx_code code)
  {
  case SET:
/* See if we can use movw.  */
-  if (arm_arch_thumb2 && (i & 0xffff0000) == 0)
+  if (TARGET_HAVE_MOVT && (i & 0xffff0000) == 0)
return 1;
else
/* Otherwise, try mvn.  */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index
4049f104c6d5fd8bfd8f68ecdfae6a3d34d4333f..8aa9fedf5c07e78bc7ba793b39bebcc45a4d5921
100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -5705,7 +5705,7 @@
[(set (match_operand:SI 0 "nonimmediate_operand" "=r")
(lo_sum:SI (match_operand:SI 1 "nonimmediate_operand" "0")
   (match_operand:SI 2 "general_operand"  "i")))]
-  "arm_arch_thumb2 && arm_valid_symbolic_address_p (operands[2])"
+  "TARGET_HAVE_MOVT && arm_valid_symbolic_address_p (operands[2])"
"movt%?\t%0, #:upper16:%c2"
[(set_attr "predicable" "yes")
 (set_attr "predicable_short_it" "no")
@@ -5765,7 +5765,8 @@
[(set (match_operand:SI 0 "arm_general_register_operand" "")
(const:SI (plus:SI (match_operand:SI 1 "general_operand" "")
   (match_operand:SI 2 "const_int_operand" ""))))]
-  "TARGET_THUMB2
+  "TARGET_THUMB
+   && TARGET_HAVE_MOVT
 && arm_disable_literal_pool
 && reload_completed
 && GET_CODE (operands[1]) == SYMBOL_REF"
@@ -5796,8 +5797,7 @@
  (define_split
[(set (match_operand:SI 0 "arm_general_register_operand" "")
 (match_operand:SI 1 "general_operand" ""))]
-  "TARGET_32BIT
-   && TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
+  "TARGET_USE_MOVT && GET_CODE (operands[1]) == SYMBOL_REF
 && !flag_pic && !target_word_relocations
 && !arm_tls_referenced_p (operands[1])"
[(clobber (const_int 0))]
@@ -10965,7 +10965,7 @@
 (const_int 16)
 (const_int 16))
  (match_operand:SI 1 "const_int_operand" ""))]
-  "arm_arch_thumb2"
+  "TARGET_HAVE_MOVT"
"movt%?\t%0, %L1"
   [(set_attr "predicable" "yes")
(set_attr "predicable_short_it" "no")
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index
3b71c4a527064290066348cb234c6abb8c8e2e43..4ece5f013c92adee04157b5c909e1d47c894c994
100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -66,7 +66,7 @@
  
  (define_constraint "j"

   "A constant suitable for a MOVW instruction. (ARM/Thumb-2)"
- (and (match_test "TARGET_32BIT && arm_arch_thumb2")
+ (and (match_test "TARGET_HAVE_MOVT")
(ior (and (match_code "high")
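
For readers not fluent in the port: the constant-building idiom behind these
checks, with the halfword split done in plain C (illustrative arithmetic
only, not compiler output):

#include <stdint.h>
#include <stdio.h>

/* MOVW loads the low halfword and zeroes the top; MOVT then fills the
   high halfword.  const_ok_for_op's (i & 0xffff0000) == 0 test is the
   "MOVW alone suffices" case.  */
int
main (void)
{
  uint32_t c = 0x12345678;
  uint32_t after_movw = c & 0xffffu;                     /* movw r0, #0x5678 */
  uint32_t after_movt = (c & 0xffff0000u) | after_movw;  /* movt r0, #0x1234 */
  printf ("%#x -> %#x\n", c, after_movt);
  return 0;
}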
 

Re: [PATCH] Fix devirt from dropping lhs with TREE_ADDRESSABLE type on noreturn calls (PR c++/71210)

2016-05-20 Thread Marek Polacek
On Fri, May 20, 2016 at 01:59:48PM +0200, Jakub Jelinek wrote:
> On Fri, May 20, 2016 at 01:40:01PM +0200, Marek Polacek wrote:
> > > +   if (lhs
> > > +   && (gimple_call_flags (stmt) & ECF_NORETURN)
> > > +   && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
> > > +   || ((TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> > > +== INTEGER_CST)
> > > +   && !TREE_ADDRESSABLE (TREE_TYPE (lhs)
> > 
> > Do you think it would be worth it to factor out this check into a new
> > predicate and use it throughout the codebase?
> 
> I think it would be worthwhile.  Are you willing to write a patch for this?

Yeah, sure.  can_remove_lhs_p?

Marek
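
A sketch of what such a predicate might look like, against GCC's internal
tree.h/gimple.h APIs; the name is only the suggestion from this thread, not
committed code, and the body simply mirrors the condition in Jakub's patch:

/* True if the LHS of a call that became noreturn may be dropped: the
   call returns void, or the LHS type has constant size and is not
   TREE_ADDRESSABLE.  (Sketch only, not committed code.)  */
static bool
can_remove_lhs_p (tree lhs, gimple *stmt)
{
  tree type = TREE_TYPE (lhs);
  return (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
	  || (TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
	      && !TREE_ADDRESSABLE (type)));
}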


Re: [PATCH] Fix devirt from dropping lhs with TREE_ADDRESSABLE type on noreturn calls (PR c++/71210)

2016-05-20 Thread Richard Biener
On Fri, 20 May 2016, Jakub Jelinek wrote:

> On Fri, May 20, 2016 at 01:40:01PM +0200, Marek Polacek wrote:
> > > +   if (lhs
> > > +   && (gimple_call_flags (stmt) & ECF_NORETURN)
> > > +   && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
> > > +   || ((TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> > > +== INTEGER_CST)
> > > +   && !TREE_ADDRESSABLE (TREE_TYPE (lhs)
> > 
> > Do you think it would be worth it to factor out this check into a new
> > predicate and use it throughout the codebase?
> 
> I think it would be worthwhile.  Are you willing to write a patch for this?
> Otherwise I can add it to my todo list, but it will take a while.

Maybe even make it a maybe_drop_lhs_from_noreturn_call () helper.

Richard.


Re: [PATCH] Fix devirt from dropping lhs with TREE_ADDRESSABLE type on noreturn calls (PR c++/71210)

2016-05-20 Thread Jakub Jelinek
On Fri, May 20, 2016 at 01:40:01PM +0200, Marek Polacek wrote:
> > + if (lhs
> > + && (gimple_call_flags (stmt) & ECF_NORETURN)
> > + && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
> > + || ((TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> > +  == INTEGER_CST)
> > + && !TREE_ADDRESSABLE (TREE_TYPE (lhs)
> 
> Do you think it would be worth it to factor out this check into a new
> predicate and use it throughout the codebase?

I think it would be worthwhile.  Are you willing to write a patch for this?
Otherwise I can add it to my todo list, but it will take a while.

Jakub


RE: [PATCH] ARC: configure script to allow non uclibc based triplets

2016-05-20 Thread Claudiu Zissulescu
Hi Vineet,

> gcc/
> 2016-05-20  Vineet Gupta 
> 
> * config.gcc: Remove uclibc from arc target spec
> 
> -arc*-*-linux-uclibc*)
> +arc*-*-linux*)

Actually may make sense to have something like arc*-*-*linux-glibc* here (or 
something of a sort) as we can properly select gcc driver configurations for 
each system, as ARM for example does.

//Claudiu


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Richard Biener
On Fri, 20 May 2016, Jakub Jelinek wrote:

> On Fri, May 20, 2016 at 01:41:01PM +0200, Richard Biener wrote:
> > I'd say ppc and aarch64 are fine.  Thanks for noticing.
> 
> So like this then?

Yes.

Thanks,
Richard.

> 2016-05-20  Jakub Jelinek  
> 
>   PR tree-optimization/29756
>   * gcc.dg/tree-ssa/vector-6.c: Add -Wno-psabi -w to dg-options.
>   Add -msse2 for x86 and -maltivec for powerpc.  Use scan-tree-dump-times
>   only on selected targets where V4SImode vectors are known to be
>   supported.
> 
> --- gcc/testsuite/gcc.dg/tree-ssa/vector-6.c.jj   2016-05-20 
> 12:44:33.0 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/vector-6.c  2016-05-20 13:49:19.880961132 
> +0200
> @@ -1,5 +1,7 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-ccp1" } */
> +/* { dg-options "-O -fdump-tree-ccp1 -Wno-psabi -w" } */
> +/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
> +/* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
>  
>  typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
>  
> @@ -30,4 +32,4 @@ v4si test4 (v4si v, int i)
>return v;
>  }
>  
> -/* { dg-final { scan-tree-dump-times "Now a gimple register: v" 4 "ccp1" } } */
> +/* { dg-final { scan-tree-dump-times "Now a gimple register: v" 4 "ccp1" { target { { i?86-*-* x86_64-*-* aarch64*-*-* spu*-*-* } || { powerpc_altivec_ok } } } } } */
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Jakub Jelinek
On Fri, May 20, 2016 at 01:41:01PM +0200, Richard Biener wrote:
> I'd say ppc and aarch64 are fine.  Thanks for noticing.

So like this then?

2016-05-20  Jakub Jelinek  

PR tree-optimization/29756
* gcc.dg/tree-ssa/vector-6.c: Add -Wno-psabi -w to dg-options.
Add -msse2 for x86 and -maltivec for powerpc.  Use scan-tree-dump-times
only on selected targets where V4SImode vectors are known to be
supported.

--- gcc/testsuite/gcc.dg/tree-ssa/vector-6.c.jj 2016-05-20 12:44:33.0 
+0200
+++ gcc/testsuite/gcc.dg/tree-ssa/vector-6.c2016-05-20 13:49:19.880961132 
+0200
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-ccp1" } */
+/* { dg-options "-O -fdump-tree-ccp1 -Wno-psabi -w" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+/* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
 
 typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
 
@@ -30,4 +32,4 @@ v4si test4 (v4si v, int i)
   return v;
 }
 
-/* { dg-final { scan-tree-dump-times "Now a gimple register: v" 4 "ccp1" } } */
+/* { dg-final { scan-tree-dump-times "Now a gimple register: v" 4 "ccp1" { target { { i?86-*-* x86_64-*-* aarch64*-*-* spu*-*-* } || { powerpc_altivec_ok } } } } } */


Jakub
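
In source terms, the property the scan verifies (the shape is taken from
vector-6.c itself):

typedef int v4si __attribute__((vector_size (4 * sizeof (int))));

/* With BIT_FIELD_INSERT, the element store no longer forces the whole
   vector into memory, so CCP reports "Now a gimple register: v".  */
v4si
set_elem (v4si v, int x)
{
  v[2] = x;
  return v;
}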


Re: [PATCH] Fix devirt from dropping lhs with TREE_ADDRESSABLE type on noreturn calls (PR c++/71210)

2016-05-20 Thread Richard Biener
On Fri, 20 May 2016, Jakub Jelinek wrote:

> Hi!
> 
> This is another case in the never ending story of dropping lhs of noreturn
> calls when we shouldn't.
> 
> Though, in this case, while we can optimize a call to a direct call to
> normal [[noreturn]] method, we can also optimize into __cxa_pure_virtual
> or __builtin_unreachable.  And in those cases IMHO it is desirable to
> not have the lhs, but we should also adjust gimple_call_set_fntype,
> because we are now calling something different, we've just reused the
> same call stmt for that.

Hmm, I think for devirt we can unconditionally adjust fntype.  It should
be impossible to create a wrongly typed virtual call in source
(maybe with the help of LTO and some ODR violations).

> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok with doing the set_fntype unconditionally.  If you think it's safer
the way you did it that's also ok.

Richard.

> 2016-05-20  Jakub Jelinek  
> 
>   PR c++/71210
>   * gimple-fold.c (gimple_fold_call): Do not remove lhs of noreturn
>   calls if the LHS is variable length or has addressable type.
>   If targets[0]->decl is a noreturn call with void return type and
>   zero arguments, adjust fntype and remove lhs in that case.
> 
>   * g++.dg/opt/pr71210-1.C: New test.
>   * g++.dg/opt/pr71210-2.C: New test.
> 
> --- gcc/gimple-fold.c.jj  2016-05-03 14:12:19.0 +0200
> +++ gcc/gimple-fold.c 2016-05-20 10:18:22.818728240 +0200
> @@ -3039,10 +3039,25 @@ gimple_fold_call (gimple_stmt_iterator *
>   }
> if (targets.length () == 1)
>   {
> -   gimple_call_set_fndecl (stmt, targets[0]->decl);
> +   tree fndecl = targets[0]->decl;
> +   gimple_call_set_fndecl (stmt, fndecl);
> changed = true;
> +   /* If changing the call to __cxa_pure_virtual
> +  or similar noreturn function, adjust gimple_call_fntype
> +  too.  */
> +   if ((gimple_call_flags (stmt) & ECF_NORETURN)
> +   && VOID_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))
> +   && TYPE_ARG_TYPES (TREE_TYPE (fndecl))
> +   && (TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl)))
> +   == void_type_node))
> + gimple_call_set_fntype (stmt, TREE_TYPE (fndecl));
> /* If the call becomes noreturn, remove the lhs.  */
> -   if (lhs && (gimple_call_flags (stmt) & ECF_NORETURN))
> +   if (lhs
> +   && (gimple_call_flags (stmt) & ECF_NORETURN)
> +   && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
> +   || ((TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> +== INTEGER_CST)
> +   && !TREE_ADDRESSABLE (TREE_TYPE (lhs)
>   {
> if (TREE_CODE (lhs) == SSA_NAME)
>   {
> --- gcc/testsuite/g++.dg/opt/pr71210-1.C.jj   2016-05-20 09:30:13.119724805 
> +0200
> +++ gcc/testsuite/g++.dg/opt/pr71210-1.C  2016-05-20 09:29:42.0 
> +0200
> @@ -0,0 +1,14 @@
> +// PR c++/71210
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +
+#include <typeinfo>
> +
> +void f1 (const std::type_info&) __attribute__((noreturn));
> +struct S1 { ~S1 (); };
> +struct S2
> +{
> +  virtual S1 f2 () const { f1 (typeid (*this)); }
> +  S1 f3 () const { return f2 (); }
> +};
> +void f4 () { S2 a; a.f3 (); }
> --- gcc/testsuite/g++.dg/opt/pr71210-2.C.jj   2016-05-20 10:20:00.918402232 
> +0200
> +++ gcc/testsuite/g++.dg/opt/pr71210-2.C  2016-05-20 10:19:48.0 
> +0200
> @@ -0,0 +1,23 @@
> +// PR c++/71210
> +// { dg-do compile }
> +// { dg-options "-O2" }
> +
> +struct C { int a; int b; C (); ~C (); };
> +
> +namespace
> +{
> +  struct A
> +  {
> +A () {}
> +virtual C bar (int) = 0;
> +C baz (int x) { return bar (x); }
> +  };
> +}
> +
> +A *a;
> +
> +void
> +foo ()
> +{
> +  C c = a->baz (0);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH, vec-tails 01/10] New compiler options

2016-05-20 Thread Ilya Enkovich
2016-05-20 14:17 GMT+03:00 Richard Biener :
> On Fri, May 20, 2016 at 11:50 AM, Ilya Enkovich  
> wrote:
>> 2016-05-20 12:26 GMT+03:00 Richard Biener :
>>> On Thu, May 19, 2016 at 9:36 PM, Ilya Enkovich  
>>> wrote:
 Hi,

 This patch introduces new options used for loop epilogues vectorization.
>>>
>>> Why's that?  This is a bit too much for the casual user and if it is
>>> really necessary
>>> to control this via options then it is not fine-grained enough.
>>>
>>> Why doesn't the vectorizer/backend have enough info to decide this itself?
>>
>> I don't expect casual user to decide which modes to choose.  These controls 
>> are
>> added for debugging and performance measurement purposes.  I see now I miss
>> -ftree-vectorize-epilogues aliased to -ftree-vectorize-epilogues=all.  Surely
>> I expect epilogues and short loops vectorization be enabled by default on -O3
>> or by -ftree-vectorize-loops.
>
> Can you make all these --params then?  I think to be useful to users we'd want
> them to be loop pragmas rather than options.

OK, I'll change it to params.  I didn't think about control via
pragmas but will do now.

Thanks,
Ilya

>
> Richard.
>
>> Thanks,
>> Ilya
>>
>>>
>>> Richard.
>>>
 Thanks,
 Ilya
 --
 gcc/

 2016-05-19  Ilya Enkovich  

 * common.opt (flag_tree_vectorize_epilogues): New.
 (ftree-vectorize-short-loops): New.
 (ftree-vectorize-epilogues=): New.
 (fno-tree-vectorize-epilogues): New.
 (fvect-epilogue-cost-model=): New.
 * flag-types.h (enum vect_epilogue_mode): New.
 * opts.c (parse_vectorizer_options): New.
 (common_handle_option): Support -ftree-vectorize-epilogues=
 and -fno-tree-vectorize-epilogues options.


 diff --git a/gcc/common.opt b/gcc/common.opt
 index 682cb41..6b83b79 100644
 --- a/gcc/common.opt
 +++ b/gcc/common.opt
 @@ -243,6 +243,10 @@ bool dump_base_name_prefixed = false
  Variable
  bool flag_disable_hsa = false

 +; Flag holding modes for loop epilogue vectorization
 +Variable
 +unsigned int flag_tree_vectorize_epilogues
 +
  ###
  Driver

 @@ -2557,6 +2561,19 @@ ftree-vectorize
  Common Report Var(flag_tree_vectorize) Optimization
  Enable vectorization on trees.

 +ftree-vectorize-short-loops
 +Common Report Var(flag_tree_vectorize_short_loops) Optimization
 +Enable vectorization of loops with low trip count using masking.
 +
 +ftree-vectorize-epilogues=
 +Common Report Joined Optimization
 +Comma separated list of loop epilogue vectorization modes.
 +Available modes: combine, mask, nomask.
 +
 +fno-tree-vectorize-epilogues
 +Common RejectNegative Optimization
 +Disable epilogues vectorization.
 +
  ftree-vectorizer-verbose=
  Common Joined RejectNegative Ignore
  Does nothing.  Preserved for backward compatibility.
 @@ -2577,6 +2594,10 @@ fsimd-cost-model=
  Common Joined RejectNegative Enum(vect_cost_model) 
 Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) Optimization
  Specifies the vectorization cost model for code marked with a simd 
 directive.

 +fvect-epilogue-cost-model=
 +Common Joined RejectNegative Enum(vect_cost_model) 
 Var(flag_vect_epilogue_cost_model) Init(VECT_COST_MODEL_DEFAULT) 
 Optimization
 +Specifies the cost model for epilogue vectorization.
 +
  Enum
  Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown 
 vectorizer cost model %qs)

 diff --git a/gcc/flag-types.h b/gcc/flag-types.h
 index dd57e16..24081b1 100644
 --- a/gcc/flag-types.h
 +++ b/gcc/flag-types.h
 @@ -200,6 +200,15 @@ enum vect_cost_model {
VECT_COST_MODEL_DEFAULT = 3
  };

 +/* Epilogue vectorization modes.  */
 +enum vect_epilogue_mode {
 +  VECT_EPILOGUE_COMBINE = 1 << 0,
 +  VECT_EPILOGUE_MASK = 1 << 1,
 +  VECT_EPILOGUE_NOMASK = 1 << 2,
 +  VECT_EPILOGUE_ALL = VECT_EPILOGUE_COMBINE | VECT_EPILOGUE_MASK
 + | VECT_EPILOGUE_NOMASK
 +};
 +
  /* Different instrumentation modes.  */
  enum sanitize_code {
/* AddressSanitizer.  */
 diff --git a/gcc/opts.c b/gcc/opts.c
 index 0f9431a..a0c0987 100644
 --- a/gcc/opts.c
 +++ b/gcc/opts.c
 @@ -1531,6 +1531,63 @@ parse_sanitizer_options (const char *p, location_t 
 loc, int scode,
return flags;
  }

 +/* Parse comma separated vectorizer suboptions from P for option SCODE,
 +   adjust previous FLAGS and return new ones.  If COMPLAIN is false,
 +   don't issue diagnostics.  */
 +
 +unsigned int
 +parse_vectorizer_options (const char *p, location_t loc, int scode,
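
For reference, a rough standalone sketch of what a comma-separated mode
parser of this shape does (the mode names come from the patch; the function
name and everything else here are illustrative, not the GCC implementation):

#include <stdio.h>
#include <string.h>

enum { EPI_COMBINE = 1 << 0, EPI_MASK = 1 << 1, EPI_NOMASK = 1 << 2 };

static unsigned
parse_epilogue_modes (const char *p)
{
  unsigned flags = 0;
  while (*p)
    {
      const char *comma = strchr (p, ',');
      size_t len = comma ? (size_t) (comma - p) : strlen (p);
      if (len == 7 && !strncmp (p, "combine", len))
	flags |= EPI_COMBINE;
      else if (len == 4 && !strncmp (p, "mask", len))
	flags |= EPI_MASK;
      else if (len == 6 && !strncmp (p, "nomask", len))
	flags |= EPI_NOMASK;
      else
	fprintf (stderr, "unknown mode %.*s\n", (int) len, p);
      p += len + (comma != 0);   /* skip the separator, if any */
    }
  return flags;
}

int
main (void)
{
  printf ("%#x\n", parse_epilogue_modes ("combine,mask"));  /* 0x3 */
  return 0;
}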

Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Richard Biener
On Fri, 20 May 2016, Jakub Jelinek wrote:

> On Fri, May 20, 2016 at 10:59:18AM +0200, Richard Biener wrote:
> > Sounds good.  I will commit later with your wording.
> 
> Unfortunately, the new testcase fails e.g. on i?86-*-* or on powerpc*.
> On i?86-*-* (without -msse) I actually see 2 different issues, one is
> extra -Wpsabi warnings, and another is the dump scan, the optimization isn't
> used there at all if we don't have SSE HW.
> Surprisingly, on powerpc* the only problem is the extra warnings about ABI
> compatibility, but the scan matches, even if there is no vector support.
> Similarly on s390* too (and there are no warnings even).

I suppose they still have vector modes enabled.

> So, dunno if we should limit the scan-tree-dump-times only to a few selected
> arches (e.g. those where we add dg-additional-options for, plus some where
> it is known to work without additional options, like perhaps aarch64*-*-*,
> maybe spu*-*-*, what else?).

I'd say ppc and aarch64 are fine.  Thanks for noticing.

Richard.

> 2016-05-20  Jakub Jelinek  
> 
>   PR tree-optimization/29756
>   gcc.dg/tree-ssa/vector-6.c: Add -Wno-psabi -w to dg-options.
>   Add -msse2 for x86 and -maltivec for powerpc.
> 
> --- gcc/testsuite/gcc.dg/tree-ssa/vector-6.c.jj   2016-05-20 
> 12:44:33.0 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/vector-6.c  2016-05-20 13:17:08.730168547 
> +0200
> @@ -1,5 +1,7 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-ccp1" } */
> +/* { dg-options "-O -fdump-tree-ccp1 -Wno-psabi -w" } */
> +/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
> +/* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
>  
>  typedef int v4si __attribute__((vector_size (4 * sizeof (int;
>  
> 
> 
>   Jakub


Re: [PATCH] Fix devirt from dropping lhs with TREE_ADDRESSABLE type on noreturn calls (PR c++/71210)

2016-05-20 Thread Marek Polacek
On Fri, May 20, 2016 at 01:31:22PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> This is another case in the never ending story of dropping lhs of noreturn
> calls when we shouldn't.
> 
> Though, in this case, while we can optimize a call to a direct call to
> normal [[noreturn]] method, we can also optimize into __cxa_pure_virtual
> or __builtin_unreachable.  And in those cases IMHO it is desirable to
> not have the lhs, but we should also adjust gimple_call_set_fntype,
> because we are now calling something different, we've just reused the
> same call stmt for that.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2016-05-20  Jakub Jelinek  
> 
>   PR c++/71210
>   * gimple-fold.c (gimple_fold_call): Do not remove lhs of noreturn
>   calls if the LHS is variable length or has addressable type.
>   If targets[0]->decl is a noreturn call with void return type and
>   zero arguments, adjust fntype and remove lhs in that case.
> 
>   * g++.dg/opt/pr71210-1.C: New test.
>   * g++.dg/opt/pr71210-2.C: New test.
> 
> --- gcc/gimple-fold.c.jj  2016-05-03 14:12:19.0 +0200
> +++ gcc/gimple-fold.c 2016-05-20 10:18:22.818728240 +0200
> @@ -3039,10 +3039,25 @@ gimple_fold_call (gimple_stmt_iterator *
>   }
> if (targets.length () == 1)
>   {
> -   gimple_call_set_fndecl (stmt, targets[0]->decl);
> +   tree fndecl = targets[0]->decl;
> +   gimple_call_set_fndecl (stmt, fndecl);
> changed = true;
> +   /* If changing the call to __cxa_pure_virtual
> +  or similar noreturn function, adjust gimple_call_fntype
> +  too.  */
> +   if ((gimple_call_flags (stmt) & ECF_NORETURN)
> +   && VOID_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))
> +   && TYPE_ARG_TYPES (TREE_TYPE (fndecl))
> +   && (TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl)))
> +   == void_type_node))
> + gimple_call_set_fntype (stmt, TREE_TYPE (fndecl));
> /* If the call becomes noreturn, remove the lhs.  */
> -   if (lhs && (gimple_call_flags (stmt) & ECF_NORETURN))
> +   if (lhs
> +   && (gimple_call_flags (stmt) & ECF_NORETURN)
> +   && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
> +   || ((TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
> +== INTEGER_CST)
> +   && !TREE_ADDRESSABLE (TREE_TYPE (lhs)

Do you think it would be worth it to factor out this check into a new
predicate and use it throughout the codebase?

Marek


[PATCH] Fix Fortran ICE due to realloc_string_callback bug (PR fortran/71204)

2016-05-20 Thread Jakub Jelinek
Hi!

We ICE at -O0 while compiling the testcase below, because we don't reset
two vars that are reset in all other places in frontend-passes.c when
starting to process an unrelated statement.  Without this,
we can emit some statement into a preexisting block that can be elsewhere
in the current procedure or, as in the testcase, in a completely different
procedure.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6/5?

2016-05-20  Jakub Jelinek  

PR fortran/71204
* frontend-passes.c (realloc_string_callback): Clear inserted_block
and changed_statement before calling create_var.

* gfortran.dg/pr71204.f90: New test.

--- gcc/fortran/frontend-passes.c.jj2016-05-11 15:16:18.0 +0200
+++ gcc/fortran/frontend-passes.c   2016-05-20 10:44:31.699542384 +0200
@@ -174,8 +174,10 @@ realloc_string_callback (gfc_code **c, i
 
   if (!gfc_check_dependency (expr1, expr2, true))
 return 0;
-  
+
   current_code = c;
+  inserted_block = NULL;
+  changed_statement = NULL;
   n = create_var (expr2, "trim");
   co->expr2 = n;
   return 0;
--- gcc/testsuite/gfortran.dg/pr71204.f90.jj	2016-05-20 10:45:40.738608941 +0200
+++ gcc/testsuite/gfortran.dg/pr71204.f90	2016-05-20 10:46:25.873998687 +0200
@@ -0,0 +1,17 @@
+! PR fortran/71204
+! { dg-do compile }
+! { dg-options "-O0" }
+
+module pr71204
+  character(10), allocatable :: z(:)
+end module
+
+subroutine s1
+  use pr71204
+  z(2) = z(1)
+end
+
+subroutine s2
+  use pr71204
+  z(2) = z(1)
+end

Jakub


Re: [PATCH, vec-tails 04/10] Add masking cost

2016-05-20 Thread Ilya Enkovich
2016-05-20 14:15 GMT+03:00 Richard Biener :
> On Fri, May 20, 2016 at 11:44 AM, Ilya Enkovich  
> wrote:
>> 2016-05-20 12:24 GMT+03:00 Richard Biener :
>>> On Thu, May 19, 2016 at 9:40 PM, Ilya Enkovich  
>>> wrote:
 Hi,

 This patch extends vectorizer cost model to include masking cost by
 adding new cost model locations and new target hook to compute
 masking cost.
>>>
>>> Can you explain a bit why you add separate overall
>>> masking_prologue/body_cost rather
>>> than using the existing prologue/body cost for that?
>>
>> When I make a decision I need vector loop cost without masking (what
>> we currently
>> have) and with masking (what I add).  This allows me to compute
>> profitability for
>> all options (scalar epilogue, combined epilogue, masked epilogue) and choose 
>> one
>> of them.  Using existing prologue/body cost would allow me compute masking
>> profitability with no fall back to scalar loop profitability.
>
> Yes, but for this kind of purpose you could simply re-start
> separate costing via the init_cost hook?

But that would require a double scan through the loop statements plus a
second profitability estimation.  I compute the masking cost during
statement analysis (see patch #05) in parallel with the regular cost
computation.  Note that the masking cost is the cost of masking only;
thus the cost of a masked vector iteration is
body cost + body masking cost.
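
To make that concrete, here is a hypothetical sketch of how the two numbers
could be combined (all names made up for illustration, not from the series):

  /* Decide whether a masked vector epilogue beats running NITER_TAIL
     iterations of the scalar loop.  */
  static bool
  masked_epilogue_cheaper_p (int body_cost, int body_masking_cost,
                             int scalar_iter_cost, int niter_tail)
  {
    /* One masked vector iteration costs the plain vector body plus
       the extra cost of masking each statement.  */
    int masked_iter_cost = body_cost + body_masking_cost;
    return masked_iter_cost < scalar_iter_cost * niter_tail;
  }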

>
>>> I realize that the current vectorizer cost infrastructure is a big
>>> mess, but isn't it possible
>>> to achieve what you did with the current add_stmt_cost hook?  (by
>>> inspecting stmt_info)
>>
>> Cost of a statement and cost of masking a statement are different things.
>> Two hooks called for the same statement return different values. I can
>> add vect_cost_for_stmt enum elements to cover masking but I thought
>> having stmt_masking_cost would me more clear.
>
> I agree we need some kind of overloading and I'm not against a separate hook
> for this.  On a related note what is "masking cost" here?  I could imagine
> that masking doesn't unconditionally add a cost to a stmt but its execution
> cost may now depend on whether an element is masked or not.
>
> Does the hook return the cost of the masked stmt or the cost of masking
> the stmt only (so you need to do add_stmt_cost as well on the same stmt)?

It returns the cost of masking the statement only.  Thus, if the hardware has
no penalty for executing a masked instruction, the return value should be 0.

Thanks,
Ilya

>
> Thanks,
> Richard.
>
>> Thanks,
>> Ilya
>>
>>>
>>> Richard.
>>>
 Thanks,
 Ilya


[PATCH] Fix devirt from dropping lhs with TREE_ADDRESSABLE type on noreturn calls (PR c++/71210)

2016-05-20 Thread Jakub Jelinek
Hi!

This is another case in the never-ending story of dropping the lhs of noreturn
calls when we shouldn't.

In this case, while we can optimize the call into a direct call to a
normal [[noreturn]] method, we can also optimize it into __cxa_pure_virtual
or __builtin_unreachable.  In those cases IMHO it is desirable not to
have the lhs, but we should also adjust gimple_call_fntype,
because we are now calling something different; we've just reused the
same call stmt for it.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-05-20  Jakub Jelinek  

PR c++/71210
* gimple-fold.c (gimple_fold_call): Do not remove lhs of noreturn
calls if the LHS is variable length or has addressable type.
If targets[0]->decl is a noreturn call with void return type and
zero arguments, adjust fntype and remove lhs in that case.

* g++.dg/opt/pr71210-1.C: New test.
* g++.dg/opt/pr71210-2.C: New test.

--- gcc/gimple-fold.c.jj2016-05-03 14:12:19.0 +0200
+++ gcc/gimple-fold.c   2016-05-20 10:18:22.818728240 +0200
@@ -3039,10 +3039,25 @@ gimple_fold_call (gimple_stmt_iterator *
}
  if (targets.length () == 1)
{
- gimple_call_set_fndecl (stmt, targets[0]->decl);
+ tree fndecl = targets[0]->decl;
+ gimple_call_set_fndecl (stmt, fndecl);
  changed = true;
+ /* If changing the call to __cxa_pure_virtual
+or similar noreturn function, adjust gimple_call_fntype
+too.  */
+ if ((gimple_call_flags (stmt) & ECF_NORETURN)
+ && VOID_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl)))
+ && TYPE_ARG_TYPES (TREE_TYPE (fndecl))
+ && (TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl)))
+ == void_type_node))
+   gimple_call_set_fntype (stmt, TREE_TYPE (fndecl));
  /* If the call becomes noreturn, remove the lhs.  */
- if (lhs && (gimple_call_flags (stmt) & ECF_NORETURN))
+ if (lhs
+ && (gimple_call_flags (stmt) & ECF_NORETURN)
+ && (VOID_TYPE_P (TREE_TYPE (gimple_call_fntype (stmt)))
+ || ((TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs)))
+  == INTEGER_CST)
+ && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))))
{
  if (TREE_CODE (lhs) == SSA_NAME)
{
--- gcc/testsuite/g++.dg/opt/pr71210-1.C.jj 2016-05-20 09:30:13.119724805 
+0200
+++ gcc/testsuite/g++.dg/opt/pr71210-1.C2016-05-20 09:29:42.0 
+0200
@@ -0,0 +1,14 @@
+// PR c++/71210
+// { dg-do compile }
+// { dg-options "-O2" }
+
+#include <typeinfo>
+
+void f1 (const std::type_info&) __attribute__((noreturn));
+struct S1 { ~S1 (); };
+struct S2
+{
+  virtual S1 f2 () const { f1 (typeid (*this)); }
+  S1 f3 () const { return f2 (); }
+};
+void f4 () { S2 a; a.f3 (); }
--- gcc/testsuite/g++.dg/opt/pr71210-2.C.jj 2016-05-20 10:20:00.918402232 
+0200
+++ gcc/testsuite/g++.dg/opt/pr71210-2.C2016-05-20 10:19:48.0 
+0200
@@ -0,0 +1,23 @@
+// PR c++/71210
+// { dg-do compile }
+// { dg-options "-O2" }
+
+struct C { int a; int b; C (); ~C (); };
+
+namespace
+{
+  struct A
+  {
+A () {}
+virtual C bar (int) = 0;
+C baz (int x) { return bar (x); }
+  };
+}
+
+A *a;
+
+void
+foo ()
+{
+  C c = a->baz (0);
+}

Jakub


Re: [PATCH][RFC] Introduce BIT_FIELD_INSERT

2016-05-20 Thread Jakub Jelinek
On Fri, May 20, 2016 at 10:59:18AM +0200, Richard Biener wrote:
> Sounds good.  I will commit later with your wording.

Unfortunately, the new testcase fails e.g. on i?86-*-* or on powerpc*.
On i?86-*-* (without -msse) I actually see two different issues: one is
extra -Wpsabi warnings, and the other is the dump scan; the optimization
isn't used there at all if we don't have SSE HW.
Surprisingly, on powerpc* the only problem is the extra warnings about ABI
compatibility; the scan matches even though there is no vector support.
Similarly on s390* (and there are not even any warnings).

So, dunno if we should limit the scan-tree-dump-times directive to a few
selected arches (e.g. those we add dg-additional-options for, plus some where
it is known to work without additional options, like perhaps aarch64*-*-*,
maybe spu*-*-*, what else?).
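
E.g. the dump scan could gain an explicit target selector (the arch list and
the scan pattern here are illustrative only; the pattern and count are
whatever the existing directive already uses):

/* { dg-final { scan-tree-dump-times "..." 1 "ccp1"
     { target { aarch64*-*-* i?86-*-* x86_64-*-* powerpc*-*-* s390*-*-* } } } } */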

2016-05-20  Jakub Jelinek  

PR tree-optimization/29756
* gcc.dg/tree-ssa/vector-6.c: Add -Wno-psabi -w to dg-options.
Add -msse2 for x86 and -maltivec for powerpc.

--- gcc/testsuite/gcc.dg/tree-ssa/vector-6.c.jj 2016-05-20 12:44:33.0 
+0200
+++ gcc/testsuite/gcc.dg/tree-ssa/vector-6.c2016-05-20 13:17:08.730168547 
+0200
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-ccp1" } */
+/* { dg-options "-O -fdump-tree-ccp1 -Wno-psabi -w" } */
+/* { dg-additional-options "-msse2" { target i?86-*-* x86_64-*-* } } */
+/* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
 
 typedef int v4si __attribute__((vector_size (4 * sizeof (int))));
 


Jakub


Re: [C++ Patch/RFC] PR 70572 ("[4.9/5/6/7 Regression] ICE on code with decltype (auto) on x86_64-linux-gnu in digest_init_r")

2016-05-20 Thread Paolo Carlini

Hi,

On 19/05/2016 15:58, Jason Merrill wrote:

On 05/18/2016 07:13 PM, Paolo Carlini wrote:

+  error ("cannot declare variable %q+D with function type", decl);


I think the error message would be more helpful if it mentioned 
decltype(auto), maybe


"initializer for % has function type, did you 
forget the %<()%>?", DECL_NAME (decl)


(or some other way to print the variable type as declared rather than 
as deduced).


The below passes testing. There are a few minor changes wrt your 
suggestions (I think we want & as hint; spacing consistent with 
typeck2.c; DECL_NAME doesn't seem necessary). I wondered if we want to 
tighten the condition consistently with the wording of the error 
message, thus patchlet *2 below, which of course also passes testing.
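
For reference, the behaviour being diagnosed (illustrative snippet only, not
the committed testcase):

  void foo ();

  void bar ()
  {
    decltype(auto) a = foo;   // error: initializer for 'decltype(auto)' has
                              // function type (did you forget the '&' ?)
    decltype(auto) b = &foo;  // OK: deduces void (*)()
  }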


Thanks,
Paolo.

///
Index: cp/decl.c
===
--- cp/decl.c   (revision 236496)
+++ cp/decl.c   (working copy)
@@ -6609,6 +6609,13 @@ cp_finish_decl (tree decl, tree init, bool init_co
adc_variable_type);
   if (type == error_mark_node)
return;
+  if (TREE_CODE (type) == FUNCTION_TYPE)
+   {
+ error ("initializer for % has function type "
+"(did you forget the %<&%> ?)", decl);
+ TREE_TYPE (decl) = error_mark_node;
+ return;
+   }
   cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
 }
 
Index: testsuite/g++.dg/cpp1y/auto-fn31.C
===
--- testsuite/g++.dg/cpp1y/auto-fn31.C  (revision 0)
+++ testsuite/g++.dg/cpp1y/auto-fn31.C  (working copy)
@@ -0,0 +1,7 @@
+// PR c++/70572
+// { dg-do compile { target c++14 } }
+
+void foo ()
+{
+  decltype (auto) a = foo;  // { dg-error "initializer" }
+}
Index: cp/decl.c
===
--- cp/decl.c   (revision 236496)
+++ cp/decl.c   (working copy)
@@ -6609,6 +6609,14 @@ cp_finish_decl (tree decl, tree init, bool init_co
adc_variable_type);
   if (type == error_mark_node)
return;
+  if (TREE_CODE (type) == FUNCTION_TYPE
+ && TREE_CODE (TREE_TYPE (d_init)) == FUNCTION_TYPE)
+   {
+ error ("initializer for % has function type "
+"(did you forget the %<&%> ?)", decl);
+ TREE_TYPE (decl) = error_mark_node;
+ return;
+   }
   cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
 }
 
Index: testsuite/g++.dg/cpp1y/auto-fn31.C
===
--- testsuite/g++.dg/cpp1y/auto-fn31.C  (revision 0)
+++ testsuite/g++.dg/cpp1y/auto-fn31.C  (working copy)
@@ -0,0 +1,7 @@
+// PR c++/70572
+// { dg-do compile { target c++14 } }
+
+void foo ()
+{
+  decltype (auto) a = foo;  // { dg-error "initializer" }
+}


Re: [PATCH, vec-tails 01/10] New compiler options

2016-05-20 Thread Richard Biener
On Fri, May 20, 2016 at 11:50 AM, Ilya Enkovich  wrote:
> 2016-05-20 12:26 GMT+03:00 Richard Biener :
>> On Thu, May 19, 2016 at 9:36 PM, Ilya Enkovich  
>> wrote:
>>> Hi,
>>>
>>> This patch introduces new options used for loop epilogues vectorization.
>>
>> Why's that?  This is a bit too much for the casual user and if it is
>> really necessary
>> to control this via options then it is not fine-grained enough.
>>
>> Why doesn't the vectorizer/backend have enough info to decide this itself?
>
> I don't expect casual user to decide which modes to choose.  These controls 
> are
> added for debugging and performance measurement purposes.  I see now I miss
> -ftree-vectorize-epilogues aliased to -ftree-vectorize-epilogues=all.  Surely
> I expect epilogues and short loops vectorization be enabled by default on -O3
> or by -ftree-vectorize-loops.

Can you make all these --params then?  I think to be useful to users we'd want
them to be loop pragmas rather than options.
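
Something like this, purely to illustrate the direction (no such pragma
exists today):

  /* Hypothetical per-loop control instead of a global -f option.  */
  #pragma GCC vect_epilogue mask
  for (int i = 0; i < n; i++)
    a[i] = b[i] + c[i];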

Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>
>>> Thanks,
>>> Ilya
>>> --
>>> gcc/
>>>
>>> 2016-05-19  Ilya Enkovich  
>>>
>>> * common.opt (flag_tree_vectorize_epilogues): New.
>>> (ftree-vectorize-short-loops): New.
>>> (ftree-vectorize-epilogues=): New.
>>> (fno-tree-vectorize-epilogues): New.
>>> (fvect-epilogue-cost-model=): New.
>>> * flag-types.h (enum vect_epilogue_mode): New.
>>> * opts.c (parse_vectorizer_options): New.
>>> (common_handle_option): Support -ftree-vectorize-epilogues=
>>> and -fno-tree-vectorize-epilogues options.
>>>
>>>
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index 682cb41..6b83b79 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -243,6 +243,10 @@ bool dump_base_name_prefixed = false
>>>  Variable
>>>  bool flag_disable_hsa = false
>>>
>>> +; Flag holding modes for loop epilogue vectorization
>>> +Variable
>>> +unsigned int flag_tree_vectorize_epilogues
>>> +
>>>  ###
>>>  Driver
>>>
>>> @@ -2557,6 +2561,19 @@ ftree-vectorize
>>>  Common Report Var(flag_tree_vectorize) Optimization
>>>  Enable vectorization on trees.
>>>
>>> +ftree-vectorize-short-loops
>>> +Common Report Var(flag_tree_vectorize_short_loops) Optimization
>>> +Enable vectorization of loops with low trip count using masking.
>>> +
>>> +ftree-vectorize-epilogues=
>>> +Common Report Joined Optimization
>>> +Comma separated list of loop epilogue vectorization modes.
>>> +Available modes: combine, mask, nomask.
>>> +
>>> +fno-tree-vectorize-epilogues
>>> +Common RejectNegative Optimization
>>> +Disable epilogues vectorization.
>>> +
>>>  ftree-vectorizer-verbose=
>>>  Common Joined RejectNegative Ignore
>>>  Does nothing.  Preserved for backward compatibility.
>>> @@ -2577,6 +2594,10 @@ fsimd-cost-model=
>>>  Common Joined RejectNegative Enum(vect_cost_model) 
>>> Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) Optimization
>>>  Specifies the vectorization cost model for code marked with a simd 
>>> directive.
>>>
>>> +fvect-epilogue-cost-model=
>>> +Common Joined RejectNegative Enum(vect_cost_model) 
>>> Var(flag_vect_epilogue_cost_model) Init(VECT_COST_MODEL_DEFAULT) 
>>> Optimization
>>> +Specifies the cost model for epilogue vectorization.
>>> +
>>>  Enum
>>>  Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown 
>>> vectorizer cost model %qs)
>>>
>>> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
>>> index dd57e16..24081b1 100644
>>> --- a/gcc/flag-types.h
>>> +++ b/gcc/flag-types.h
>>> @@ -200,6 +200,15 @@ enum vect_cost_model {
>>>VECT_COST_MODEL_DEFAULT = 3
>>>  };
>>>
>>> +/* Epilogue vectorization modes.  */
>>> +enum vect_epilogue_mode {
>>> +  VECT_EPILOGUE_COMBINE = 1 << 0,
>>> +  VECT_EPILOGUE_MASK = 1 << 1,
>>> +  VECT_EPILOGUE_NOMASK = 1 << 2,
>>> +  VECT_EPILOGUE_ALL = VECT_EPILOGUE_COMBINE | VECT_EPILOGUE_MASK
>>> + | VECT_EPILOGUE_NOMASK
>>> +};
>>> +
>>>  /* Different instrumentation modes.  */
>>>  enum sanitize_code {
>>>/* AddressSanitizer.  */
>>> diff --git a/gcc/opts.c b/gcc/opts.c
>>> index 0f9431a..a0c0987 100644
>>> --- a/gcc/opts.c
>>> +++ b/gcc/opts.c
>>> @@ -1531,6 +1531,63 @@ parse_sanitizer_options (const char *p, location_t 
>>> loc, int scode,
>>>return flags;
>>>  }
>>>
>>> +/* Parse comma separated vectorizer suboptions from P for option SCODE,
>>> +   adjust previous FLAGS and return new ones.  If COMPLAIN is false,
>>> +   don't issue diagnostics.  */
>>> +
>>> +unsigned int
>>> +parse_vectorizer_options (const char *p, location_t loc, int scode,
>>> + unsigned int flags, int value, bool complain)
>>> +{
>>> +  if (scode != OPT_ftree_vectorize_epilogues_)
>>> +return flags;
>>> +
>>> +  if (!p)
>>> +return value;
>>> +
>>> +  while (*p != 0)
>>> +{
>>> +  size_t len;
>>> +  const char *comma = strchr (p, ',');
>>> +   

Re: [PATCH, vec-tails 04/10] Add masking cost

2016-05-20 Thread Richard Biener
On Fri, May 20, 2016 at 11:44 AM, Ilya Enkovich  wrote:
> 2016-05-20 12:24 GMT+03:00 Richard Biener :
>> On Thu, May 19, 2016 at 9:40 PM, Ilya Enkovich  
>> wrote:
>>> Hi,
>>>
>>> This patch extends vectorizer cost model to include masking cost by
>>> adding new cost model locations and new target hook to compute
>>> masking cost.
>>
>> Can you explain a bit why you add separate overall
>> masking_prologue/body_cost rather
>> than using the existing prologue/body cost for that?
>
> When I make a decision I need vector loop cost without masking (what
> we currently
> have) and with masking (what I add).  This allows me to compute
> profitability for
> all options (scalar epilogue, combined epilogue, masked epilogue) and choose 
> one
> of them.  Using existing prologue/body cost would allow me compute masking
> profitability with no fall back to scalar loop profitability.

Yes, but for this kind of purpose you could simply re-start
separate costing via the init_cost hook?

>> I realize that the current vectorizer cost infrastructure is a big
>> mess, but isn't it possible
>> to achieve what you did with the current add_stmt_cost hook?  (by
>> inspecting stmt_info)
>
> Cost of a statement and cost of masking a statement are different things.
> Two hooks called for the same statement return different values. I can
> add vect_cost_for_stmt enum elements to cover masking but I thought
> having stmt_masking_cost would me more clear.

I agree we need some kind of overloading and I'm not against a separate hook
for this.  On a related note what is "masking cost" here?  I could imagine
that masking doesn't unconditionally add a cost to a stmt but its execution
cost may now depend on whether an element is masked or not.

Does the hook return the cost of the masked stmt or the cost of masking
the stmt only (so you need to do add_stmt_cost as well on the same stmt)?

Thanks,
Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>
>>> Thanks,
>>> Ilya


Re: [PATCH, ARM 5/7, ping1] Add support for MOVT/MOVW to ARMv8-M Baseline

2016-05-20 Thread Kyrill Tkachov

Hi Thomas,

On 19/05/16 17:11, Thomas Preudhomme wrote:

On Wednesday 18 May 2016 12:30:41 Kyrill Tkachov wrote:

Hi Thomas,

This looks mostly good with a few nits inline.
Please repost with the comments addressed.

Updated ChangeLog entries:

*** gcc/ChangeLog ***

2016-05-18  Thomas Preud'homme  

 * config/arm/arm.h (TARGET_HAVE_MOVT): Include ARMv8-M as having MOVT.
 * config/arm/arm.c (arm_arch_name): (const_ok_for_op): Check MOVT/MOVW
 availability with TARGET_HAVE_MOVT.
 (thumb_legitimate_constant_p): Strip the high part of a label_ref.
 (thumb1_rtx_costs): Also return 0 if setting a half word constant and
 MOVW is available and replace (unsigned HOST_WIDE_INT) INTVAL by
 UINTVAL.
 (thumb1_size_rtx_costs): Make set of half word constant also cost 1
 extra instruction if MOVW is available.  Use a cost variable
 incremented by COSTS_N_INSNS (1) when the condition match rather than
 returning an arithmetic expression based on COSTS_N_INSNS.  Make
 constant with bottom half word zero cost 2 instruction if MOVW is
 available.
 * config/arm/arm.md (define_attr "arch"): Add v8mb.
 (define_attr "arch_enabled"): Set to yes if arch value is v8mb and
 target is ARMv8-M Baseline.
 * config/arm/thumb1.md (thumb1_movdi_insn): Add ARMv8-M Baseline only
 alternative for constants satisfying j constraint.
 (thumb1_movsi_insn): Likewise.
 (movsi splitter for K alternative): Tighten condition to not trigger
 if movt is available and j constraint is satisfied.
 (Pe immediate splitter): Likewise.
 (thumb1_movhi_insn): Add ARMv8-M Baseline only alternative for
 constant fitting in an halfword to use MOVW.
 * doc/sourcebuild.texi (arm_thumb1_movt_ko): Document new ARM
 effective target.


*** gcc/testsuite/ChangeLog ***

2015-11-13  Thomas Preud'homme  

 * lib/target-supports.exp (check_effective_target_arm_thumb1_movt_ko):
 Define effective target.
 * gcc.target/arm/pr42574.c: Require arm_thumb1_movt_ko instead of
 arm_thumb1_ok as effective target to exclude ARMv8-M Baseline.


This is ok now, thanks for the changes.
I'd like to see some tests that generate MOVW/MOVT instructions
on ARMv8-M Baseline. They should be easy, just an:
int
foo ()
{
  return CONST;
}

and same for short and long long return types
(to exercise the HI, SI and DImode move patterns).

You can add them as part of this patch or as a separate followup.
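
For example (constants picked arbitrarily so that MOVW/MOVT are needed;
dg- directives omitted):

  short
  f_hi (void)
  {
    return 0x1234;              /* HImode: single MOVW.  */
  }

  int
  f_si (void)
  {
    return 0x12345678;          /* SImode: MOVW + MOVT.  */
  }

  long long
  f_di (void)
  {
    return 0x123456789abLL;     /* DImode move pattern.  */
  }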

Thanks,
Kyrill




and patch:

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index
47216b4a1959ccdb18e329db411bf7f941e67163..f42e996e5a7ce979fe406b8261d50fb2ba005f6b
100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -269,7 +269,7 @@ extern void (*arm_lang_output_object_attributes_hook)
(void);
  #define TARGET_HAVE_LDACQ (TARGET_ARM_ARCH >= 8 && arm_arch_notm)
  
  /* Nonzero if this chip provides the movw and movt instructions.  */

-#define TARGET_HAVE_MOVT   (arm_arch_thumb2)
+#define TARGET_HAVE_MOVT   (arm_arch_thumb2 || arm_arch8)
  
  /* Nonzero if integer division instructions supported.  */

  #define TARGET_IDIV   ((TARGET_ARM && arm_arch_arm_hwdiv) \
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index
d75a34f10d5ed22cff0a0b5d3ad433f111b059ee..a05e559c905daa55e686491a038342360c721912
100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -8220,6 +8220,12 @@ arm_legitimate_constant_p_1 (machine_mode, rtx x)
  static bool
  thumb_legitimate_constant_p (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
  {
+  /* Splitters for TARGET_USE_MOVT call arm_emit_movpair which creates high
+ RTX.  These RTX must therefore be allowed for Thumb-1 so that when run
+ for ARMv8-M baseline or later the result is valid.  */
+  if (TARGET_HAVE_MOVT && GET_CODE (x) == HIGH)
+x = XEXP (x, 0);
+
return (CONST_INT_P (x)
  || CONST_DOUBLE_P (x)
  || CONSTANT_ADDRESS_P (x)
@@ -8306,7 +8312,8 @@ thumb1_rtx_costs (rtx x, enum rtx_code code, enum
rtx_code outer)
  case CONST_INT:
if (outer == SET)
{
- if ((unsigned HOST_WIDE_INT) INTVAL (x) < 256)
+ if (UINTVAL (x) < 256
+ || (TARGET_HAVE_MOVT && !(INTVAL (x) & 0xffff0000)))
return 0;
  if (thumb_shiftable_const (INTVAL (x)))
return COSTS_N_INSNS (2);
@@ -9009,7 +9016,7 @@ static inline int
  thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer)
  {
machine_mode mode = GET_MODE (x);
-  int words;
+  int words, cost;
  
switch (code)

  {
@@ -9055,17 +9062,26 @@ thumb1_size_rtx_costs (rtx x, enum rtx_code code, enum
rtx_code outer)
/* A SET doesn't have a mode, so let's look at the SET_DEST to get
 the mode.  */
words = ARM_NUM_INTS (GET_MODE_SIZE (GET_MODE 

Re: [Committed] S/390: Disable scalar vector instructions with -mno-vx.

2016-05-20 Thread Jakub Jelinek
On Tue, May 10, 2016 at 11:03:57AM +0200, Andreas Krebbel wrote:
> Although the scalar variants of the vector instructions aren't
> actually vector instructions they are still executed in the vector
> facility and therefore need to be disabled when disabling the facility
> with -mno-vx.

OT, I see that check_vect_support_and_set_flags doesn't have any s390
support; I'd expect it to add -mvx and, depending on the hw, either
use just compile as the default action or run with -mvx.  Also, tree-vect.h
probably needs adjustment for z13 etc.

Jakub


Re: [PATCH] Fix PR tree-optimization/71170

2016-05-20 Thread Richard Biener
On Fri, May 20, 2016 at 1:51 AM, Kugan Vivekanandarajah
 wrote:
> Hi Richard,
>
>> I think it should have the same rank as op or op + 1 which is the current
>> behavior.  Sth else doesn't work correctly here I think, like inserting the
>> multiplication not near the definition of op.
>>
>> Well, the whole "clever insertion" logic is simply flawed.
>
> What I meant to say was that the simple logic we have now wouldn’t
> work.  "Clever logic" is knowing exactly where it is needed and
> inserting there.  I think that's what you are suggesting below in a
> simple-to-implement way.
>
>> I'd say that ideally we would delay inserting the multiplication to
>> rewrite_expr_tree time.  For example by adding a ops->stmt_to_insert
>> member.
>>
>
> Here is an implementation based on above. Bootstrap on x86-linux-gnu
> is OK. regression testing is ongoing.

I like it.  Please push the insertion code to a helper, as I think you need
to postpone setting the stmt's UID to that point.
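
Something like this, perhaps (name and exact placement assumed, not taken
from the patch):

  /* Insert STMT_TO_INSERT right before the use point STMT and give it
     the same UID so reassoc's ordering checks keep working.  */
  static void
  insert_stmt_before_use (gimple *stmt, gimple *stmt_to_insert)
  {
    gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
    gimple_set_uid (stmt_to_insert, gimple_uid (stmt));
    gsi_insert_before (&gsi, stmt_to_insert, GSI_SAME_STMT);
  }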

Ideally we'd make use of the same machinery in attempt_builtin_powi,
removing the special-casing of powi_result.  (same as I said that ideally
the plus->mult stuff would use the repeat-ops machinery...)

I'm not 100% convinced the place you insert the stmt is correct but I
haven't spent too much time to decipher reassoc in this area.

Thanks,
Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2016-05-20  Kugan Vivekanandarajah  
>
> * tree-ssa-reassoc.c (struct operand_entry): Add field stmt_to_insert.
> (add_to_ops_vec): Add stmt_to_insert.
> (add_repeat_to_ops_vec): Init stmt_to_insert.
> (transform_add_to_multiply): Remove mult_stmt insertion and add it
> to ops vector.
> (get_ops): Init stmt_to_insert.
> (maybe_optimize_range_tests): Likewise.
> (rewrite_expr_tree): Insert  stmt_to_insert before use stmt.
> (rewrite_expr_tree_parallel): Likewise.


Re: [RFC] Type promotion pass and elimination of zext/sext

2016-05-20 Thread Richard Biener
On Thu, May 19, 2016 at 8:17 PM, Jeff Law  wrote:
> On 05/15/2016 06:45 PM, Kugan Vivekanandarajah wrote:
>>
>> Hi Richard,
>>
>> Now that stage1 is open, I would like to get the type promotion passes
>> reviewed again. I have tested the patches on aarch64, x86-64, and
>> ppc64le without any new execution failures. There some test-cases that
>> fails for patterns. I will address them after getting feedback on the
>> basic structure.
>
> I find myself wondering if this will eliminate some of the cases where Kai's
> type casting motion was useful.  And just to be clear, that would be a good
> thing.
>
>>
>> 1. When we promote SSA as part of promote_ssa, we either promote the
>> definition. Or create a copy stmt that is inserted after the stmt that
>> define it. i.e, we want to promote the SSA and reflect the promotion
>> on all the uses (we promote in place). We do this because, we don’t
>> want to change all the uses.
>>
>> +/* Promote definition DEF to promoted type.  If the stmt that defines def
>> +   is def_stmt, make the type of def promoted type.  If the stmt is such
>> +   that, result of the def_stmt cannot be of promoted type, create a
>> new_def
>> +   of the original_type and make the def_stmt assign its value to newdef.
>> +   Then, create a NOP_EXPR to convert new_def to def of promoted type.
>> +
>> +   For example, for stmt with original_type char and promoted_type int:
>> +char _1 = mem;
>> +becomes:
>> +char _2 = mem;
>> +int _1 = (int)_2;
>
> When does this case happen, and how is this any better than PRE or other
> elimination/code motion algorithms in improving the generated code?

The above case mentions one - loads from memory.  Another case would be
vector element extracts from vNqi vectors or asm outputs.

> I would hazard a guess that it could happen if you still needed the char
> sized used in a small number of cases, but generally wanted to promote most
> uses to int?

I think we want to promote all uses to int; we only can't always combine
the extension with the value-producing stmt on GIMPLE (we don't have
single-stmt sign-extending loads, for example).  Likewise we don't allow
the equivalent of (subreg:QI SI-reg) at SSA use sites and thus will
generally have a truncating stmt before such uses.

This makes it somewhat difficult to assert the IL is in "promoted" form,
and it also exposes the challenge of inhibiting passes from undoing this.

I guess in the end we may want to allow "promoted" types in those stmts
directly, that is, have implicit conversions in the stmts so that we see

  void foo (char);
  char *p_2;
  int _1;

  _1 = *p_2;
  foo (_1);

where on the load we'd have an implicit extension and at the call an implicit
subreg.

Other cases would be, say

  char _1, _2;
  _1 = _2 <<

>> However, if the defining stmt has to be the last stmt in the basic
>> block (eg, stmt that can throw), and if there is more than one normal
>> edges where we use this value, we cant insert the copy in all the
>> edges. Please note that the copy stmt copes the value to promoted SSA
>> with the same name.
>>
>> Therefore I had to return false in this case for promote_ssa and fixup
>> uses. I ran into this while testing ppc64le. I am sure it can happen
>> in other cases.
>
> Right.
>
> Jeff


Re: [PATCH 2/3] Add profiling support for IVOPTS

2016-05-20 Thread Bin.Cheng
On Thu, May 19, 2016 at 11:28 AM, Martin Liška  wrote:
> On 05/17/2016 12:27 AM, Bin.Cheng wrote:
>>> As profile-guided optimization can provide very useful information
>>> about basic block frequencies within a loop, following patch set leverages
>>> that information. It speeds up a single benchmark from upcoming SPECv6
>>> suite by 20% (-O2 -profile-generate/-fprofile use) and I think it can
>>> also improve others (currently measuring numbers for PGO).
>> Hi,
>> Is this 20% improvement from this patch, or does it include the
>> existing PGO's improvement?
>
> Hello.
>
> It shows that current trunk (compared to GCC 6 branch)
> has significantly improved the benchmark with PGO.
> Currently, my patch improves PGO by ~5% w/ -O2, but our plan is to
> improve static profile that would utilize the patch.
>
>>
>> For the patch:
>>> +
>>> +  /* Return true if the frequency has a valid value.  */
>>> +  bool has_frequency ();
>>> +
>>>/* Return infinite comp_cost.  */
>>>static comp_cost get_infinite ();
>>>
>>> @@ -249,6 +272,9 @@ private:
>>>   complexity field should be larger for more
>>>   complex expressions and addressing modes).  */
>>>int m_scratch;  /* Scratch used during cost computation.  */
>>> +  sreal m_frequency;  /* Frequency of the basic block this comp_cost
>>> + belongs to.  */
>>> +  sreal m_cost_scaled;  /* Scalled runtime cost.  */
>> IMHO we shouldn't embed frequency in comp_cost, neither record scaled
>> cost in it.  I would suggest we compute cost and amortize the cost
>> over frequency in get_computation_cost_at before storing it into
>> comp_cost.  That is, once cost is computed/stored in comp_cost, it is
>> already scaled with frequency.  One argument is frequency info is only
>> valid for use's statement/basic_block, it really doesn't have clear
>> meaning in comp_cost structure.  Outside of function
>> get_computation_cost_at, I found it's hard to understand/remember
>> what's the meaning of comp_cost.m_frequency and where it came from.
>> There are other reasons embedded in below comments.
>>>
>>>
>>>  comp_cost&
>>> @@ -257,6 +283,8 @@ comp_cost::operator= (const comp_cost& other)
>>>m_cost = other.m_cost;
>>>m_complexity = other.m_complexity;
>>>m_scratch = other.m_scratch;
>>> +  m_frequency = other.m_frequency;
>>> +  m_cost_scaled = other.m_cost_scaled;
>>>
>>>return *this;
>>>  }
>>> @@ -275,6 +303,7 @@ operator+ (comp_cost cost1, comp_cost cost2)
>>>
>>>cost1.m_cost += cost2.m_cost;
>>>cost1.m_complexity += cost2.m_complexity;
>>> +  cost1.m_cost_scaled += cost2.m_cost_scaled;
>>>
>>>return cost1;
>>>  }
>>> @@ -290,6 +319,8 @@ comp_cost
>>>  comp_cost::operator+= (HOST_WIDE_INT c)
>> This and below operators need check for infinite cost first and return
>> immediately.
>>>  {
>>>this->m_cost += c;
>>> +  if (has_frequency ())
>>> +this->m_cost_scaled += scale_cost (c);
>>>
>>>return *this;
>>>  }
>>> @@ -5047,18 +5128,21 @@ get_computation_cost_at (struct ivopts_data *data,
>>>   (symbol/var1/const parts may be omitted).  If we are looking for an
>>>   address, find the cost of addressing this.  */
>>>if (address_p)
>>> -return cost + get_address_cost (symbol_present, var_present,
>>> -offset, ratio, cstepi,
>>> -mem_mode,
>>> -TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> -speed, stmt_is_after_inc, can_autoinc);
>>> +{
>>> +  cost += get_address_cost (symbol_present, var_present,
>>> + offset, ratio, cstepi,
>>> + mem_mode,
>>> + TYPE_ADDR_SPACE (TREE_TYPE (utype)),
>>> + speed, stmt_is_after_inc, can_autoinc);
>>> +  goto ret;
>>> +}
>>>
>>>/* Otherwise estimate the costs for computing the expression.  */
>>>if (!symbol_present && !var_present && !offset)
>>>  {
>>>if (ratio != 1)
>>>   cost += mult_by_coeff_cost (ratio, TYPE_MODE (ctype), speed);
>>> -  return cost;
>>> +  goto ret;
>>>  }
>>>
>>>/* Symbol + offset should be compile-time computable so consider that 
>>> they
>>> @@ -5077,7 +5161,8 @@ get_computation_cost_at (struct ivopts_data *data,
>>>aratio = ratio > 0 ? ratio : -ratio;
>>>if (aratio != 1)
>>>  cost += mult_by_coeff_cost (aratio, TYPE_MODE (ctype), speed);
>>> -  return cost;
>>> +
>>> +  goto ret;
>>>
>>>  fallback:
>>>if (can_autoinc)
>>> @@ -5093,8 +5178,13 @@ fallback:
>>>  if (address_p)
>>>comp = build_simple_mem_ref (comp);
>>>
>>> -return comp_cost (computation_cost (comp, speed), 0);
>>> +cost = comp_cost (computation_cost (comp, speed), 0);
>>>}
>>> +
>>> +ret:
>>> +  cost.calculate_scaled_cost (at->bb->frequency,
>>> +  data->current_loop->header->frequency);
>> Here cost consists of two parts.  One is for the loop invariant
>> computation; we amortize it against avg_loop_niter and record register
>> pressure (either via invariant variables or invariant expressions) for
>> it; the other is the loop variant part.  For the first part, we should 
>> 

[PATCH][AArch64] Tie operand 1 to operand 0 in AESMC pattern when AES/AESMC fusion is enabled

2016-05-20 Thread Kyrill Tkachov

Hi all,

The recent -frename-registers change exposed a deficiency in the way we fuse 
AESE/AESMC instruction
pairs in aarch64.

Basically we want to enforce:
AESE Vn, _
AESMC Vn, Vn

to enable the fusion, but regrename comes along and renames the output Vn 
register in AESMC to something
else, killing the fusion in the hardware.

The solution in this patch is to add an alternative that ties the input and 
output registers in the AESMC pattern
and enable that alternative when the fusion is enabled.

With this patch I've confirmed that the above preferred register sequence
is kept even with -frename-registers when tuning for a CPU that enables
the fusion, and that the chain is broken by regrename otherwise.  I have
also seen the appropriate improvement in a proprietary benchmark (which I
cannot name) that exercises this sequence.
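
A minimal source-level reproducer for the sequence (illustrative only, not
one of the benchmark kernels; compile with -march=armv8-a+crypto):

  #include <arm_neon.h>

  /* Should emit AESE followed by an AESMC whose input and output are the
     same register, so the pair can fuse.  */
  uint8x16_t
  encrypt_round (uint8x16_t data, uint8x16_t key)
  {
    return vaesmcq_u8 (vaeseq_u8 (data, key));
  }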

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-05-20  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_fusion_enabled_p): New function.
* config/aarch64/aarch64-protos.h (aarch64_fusion_enabled_p): Declare
prototype.
* config/aarch64/aarch64-simd.md (aarch64_crypto_aesv16qi):
Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 21cf55b60f86024429ea36ead0d2d8ae4c94b579..f6da854fbaeeab34239a1f874edaedf8a01bf9c2 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -290,6 +290,7 @@ bool aarch64_constant_address_p (rtx);
 bool aarch64_expand_movmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_function_arg_regno_p (unsigned);
+bool aarch64_fusion_enabled_p (unsigned int);
 bool aarch64_gen_movmemqi (rtx *);
 bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *);
 bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce64414e8bc01732d14311d742cf28f4586..a66948a28e99f4437824a8640b092f7be1c917f6 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -5424,13 +5424,25 @@ (define_insn "aarch64_crypto_aesv16qi"
   [(set_attr "type" "crypto_aese")]
 )
 
+;; When AES/AESMC fusion is enabled we want the register allocation to
+;; look like:
+;;AESE Vn, _
+;;AESMC Vn, Vn
+;; So prefer to tie operand 1 to operand 0 when fusing.
+
 (define_insn "aarch64_crypto_aesv16qi"
-  [(set (match_operand:V16QI 0 "register_operand" "=w")
-	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "w")]
+  [(set (match_operand:V16QI 0 "register_operand" "=w,w")
+	(unspec:V16QI [(match_operand:V16QI 1 "register_operand" "0,w")]
 	 CRYPTO_AESMC))]
   "TARGET_SIMD && TARGET_CRYPTO"
   "aes\\t%0.16b, %1.16b"
-  [(set_attr "type" "crypto_aesmc")]
+  [(set_attr "type" "crypto_aesmc")
+   (set_attr_alternative "enabled"
+ [(if_then_else (match_test
+		   "aarch64_fusion_enabled_p (AARCH64_FUSE_AES_AESMC)")
+		 (const_string "yes" )
+		 (const_string "no"))
+  (const_string "yes")])]
 )
 
 ;; sha1
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b93f961fc4ebd9eb3f50b0580741c80ab6eca427..815973ca6e764121f2669ad160918561450e6c50 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13359,6 +13359,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
   return false;
 }
 
+/* Return true iff the instruction fusion described by OP is enabled.  */
+
+bool
+aarch64_fusion_enabled_p (unsigned int op)
+{
+  return (aarch64_tune_params.fusible_ops & op) != 0;
+}
+
 /* If MEM is in the form of [base+offset], extract the two parts
of address and set to BASE and OFFSET, otherwise return false
after clearing BASE and OFFSET.  */


[PATCH][ARM] Tie operand 1 to operand 0 in AESMC pattern when fusing AES/AESMC

2016-05-20 Thread Kyrill Tkachov

Hi all,

The recent -frename-registers change exposed a deficiency in the way we fuse 
AESE/AESMC instruction
pairs in arm.

Basically we want to enforce:
AESE Vn, _
AESMC Vn, Vn

to enable the fusion, but regrename comes along and renames the output Vn 
register in AESMC to something
else, killing the fusion in the hardware.

The solution in this patch is to add an alternative that ties the input and 
output registers in the AESMC pattern
and enable that alternative when the fusion is enabled.

With this patch I've confirmed that the above preferred register sequence
is kept even with -frename-registers when tuning for a CPU that enables
the fusion, and that the chain is broken by regrename otherwise.  I have
also seen the appropriate improvement in a proprietary benchmark (which I
cannot name) that exercises this sequence.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill


2016-05-20  Kyrylo Tkachov  

* config/arm/arm.c (arm_fusion_enabled_p): New function.
* config/arm/arm-protos.h (arm_fusion_enabled_p): Declare prototype.
* config/arm/crypto.md (crypto_, CRYPTO_UNARY):
Add "=w,0" alternative.  Enable it when AES/AESMC fusion is enabled.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index cf221d6793eaf0959f2713fe0903a5d8602ec2f4..253f14be5c8266a8b305988d0e145e4b4742f256 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -131,6 +131,7 @@ extern int arm_const_double_inline_cost (rtx);
 extern bool arm_const_double_by_parts (rtx);
 extern bool arm_const_double_by_immediates (rtx);
 extern void arm_emit_call_insn (rtx, rtx, bool);
+extern bool arm_fusion_enabled_p (unsigned int);
 extern const char *output_call (rtx *);
 void arm_emit_movpair (rtx, rtx);
 extern const char *output_mov_long_double_arm_from_arm (rtx *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e2dc592de1974abf3fc03c8a7908dd204512f936..2cc7f7b452a62f898346a51ca7ede0d19bcfcfad 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -29642,6 +29642,13 @@ aarch_macro_fusion_pair_p (rtx_insn* prev, rtx_insn* curr)
   return false;
 }
 
+/* Return true iff the instruction fusion described by OP is enabled.  */
+bool
+arm_fusion_enabled_p (unsigned int op)
+{
+  return current_tune->fusible_ops & op;
+}
+
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/config/arm/crypto.md b/gcc/config/arm/crypto.md
index c6f17270b1dbaf6dc43eb1e9b8a182dbb0f5a1e1..0f510f069408471fcbf6751f161e984f39929813 100644
--- a/gcc/config/arm/crypto.md
+++ b/gcc/config/arm/crypto.md
@@ -18,14 +18,27 @@
 ;; along with GCC; see the file COPYING3.  If not see
;; <http://www.gnu.org/licenses/>.
 
+
+;; When AES/AESMC fusion is enabled we want the register allocation to
+;; look like:
+;;AESE Vn, _
+;;AESMC Vn, Vn
+;; So prefer to tie operand 1 to operand 0 when fusing.
+
 (define_insn "crypto_"
-  [(set (match_operand: 0 "register_operand" "=w")
+  [(set (match_operand: 0 "register_operand" "=w,w")
 (unspec: [(match_operand: 1
-   "register_operand" "w")]
+   "register_operand" "0,w")]
  CRYPTO_UNARY))]
   "TARGET_CRYPTO"
   ".\\t%q0, %q1"
-  [(set_attr "type" "")]
+  [(set_attr "type" "")
+   (set_attr_alternative "enabled"
+ [(if_then_else (match_test
+		   "arm_fusion_enabled_p (tune_params::FUSE_AES_AESMC)")
+		 (const_string "yes" )
+		 (const_string "no"))
+  (const_string "yes")])]
 )
 
 (define_insn "crypto_"


Re: [PATCH, vec-tails 01/10] New compiler options

2016-05-20 Thread Ilya Enkovich
2016-05-20 12:26 GMT+03:00 Richard Biener :
> On Thu, May 19, 2016 at 9:36 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> This patch introduces new options used for loop epilogues vectorization.
>
> Why's that?  This is a bit too much for the casual user and if it is
> really necessary
> to control this via options then it is not fine-grained enough.
>
> Why doesn't the vectorizer/backend have enough info to decide this itself?

I don't expect a casual user to decide which modes to choose.  These controls
are added for debugging and performance measurement purposes.  I see now that
I'm missing -ftree-vectorize-epilogues aliased to
-ftree-vectorize-epilogues=all.  Certainly I expect epilogue and short-loop
vectorization to be enabled by default at -O3 or by -ftree-vectorize-loops.

Thanks,
Ilya

>
> Richard.
>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  
>>
>> * common.opt (flag_tree_vectorize_epilogues): New.
>> (ftree-vectorize-short-loops): New.
>> (ftree-vectorize-epilogues=): New.
>> (fno-tree-vectorize-epilogues): New.
>> (fvect-epilogue-cost-model=): New.
>> * flag-types.h (enum vect_epilogue_mode): New.
>> * opts.c (parse_vectorizer_options): New.
>> (common_handle_option): Support -ftree-vectorize-epilogues=
>> and -fno-tree-vectorize-epilogues options.
>>
>>
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index 682cb41..6b83b79 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -243,6 +243,10 @@ bool dump_base_name_prefixed = false
>>  Variable
>>  bool flag_disable_hsa = false
>>
>> +; Flag holding modes for loop epilogue vectorization
>> +Variable
>> +unsigned int flag_tree_vectorize_epilogues
>> +
>>  ###
>>  Driver
>>
>> @@ -2557,6 +2561,19 @@ ftree-vectorize
>>  Common Report Var(flag_tree_vectorize) Optimization
>>  Enable vectorization on trees.
>>
>> +ftree-vectorize-short-loops
>> +Common Report Var(flag_tree_vectorize_short_loops) Optimization
>> +Enable vectorization of loops with low trip count using masking.
>> +
>> +ftree-vectorize-epilogues=
>> +Common Report Joined Optimization
>> +Comma separated list of loop epilogue vectorization modes.
>> +Available modes: combine, mask, nomask.
>> +
>> +fno-tree-vectorize-epilogues
>> +Common RejectNegative Optimization
>> +Disable epilogues vectorization.
>> +
>>  ftree-vectorizer-verbose=
>>  Common Joined RejectNegative Ignore
>>  Does nothing.  Preserved for backward compatibility.
>> @@ -2577,6 +2594,10 @@ fsimd-cost-model=
>>  Common Joined RejectNegative Enum(vect_cost_model) 
>> Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) Optimization
>>  Specifies the vectorization cost model for code marked with a simd 
>> directive.
>>
>> +fvect-epilogue-cost-model=
>> +Common Joined RejectNegative Enum(vect_cost_model) 
>> Var(flag_vect_epilogue_cost_model) Init(VECT_COST_MODEL_DEFAULT) Optimization
>> +Specifies the cost model for epilogue vectorization.
>> +
>>  Enum
>>  Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown 
>> vectorizer cost model %qs)
>>
>> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
>> index dd57e16..24081b1 100644
>> --- a/gcc/flag-types.h
>> +++ b/gcc/flag-types.h
>> @@ -200,6 +200,15 @@ enum vect_cost_model {
>>VECT_COST_MODEL_DEFAULT = 3
>>  };
>>
>> +/* Epilogue vectorization modes.  */
>> +enum vect_epilogue_mode {
>> +  VECT_EPILOGUE_COMBINE = 1 << 0,
>> +  VECT_EPILOGUE_MASK = 1 << 1,
>> +  VECT_EPILOGUE_NOMASK = 1 << 2,
>> +  VECT_EPILOGUE_ALL = VECT_EPILOGUE_COMBINE | VECT_EPILOGUE_MASK
>> + | VECT_EPILOGUE_NOMASK
>> +};
>> +
>>  /* Different instrumentation modes.  */
>>  enum sanitize_code {
>>/* AddressSanitizer.  */
>> diff --git a/gcc/opts.c b/gcc/opts.c
>> index 0f9431a..a0c0987 100644
>> --- a/gcc/opts.c
>> +++ b/gcc/opts.c
>> @@ -1531,6 +1531,63 @@ parse_sanitizer_options (const char *p, location_t 
>> loc, int scode,
>>return flags;
>>  }
>>
>> +/* Parse comma separated vectorizer suboptions from P for option SCODE,
>> +   adjust previous FLAGS and return new ones.  If COMPLAIN is false,
>> +   don't issue diagnostics.  */
>> +
>> +unsigned int
>> +parse_vectorizer_options (const char *p, location_t loc, int scode,
>> + unsigned int flags, int value, bool complain)
>> +{
>> +  if (scode != OPT_ftree_vectorize_epilogues_)
>> +return flags;
>> +
>> +  if (!p)
>> +return value;
>> +
>> +  while (*p != 0)
>> +{
>> +  size_t len;
>> +  const char *comma = strchr (p, ',');
>> +  unsigned int flag = 0;
>> +
>> +  if (comma == NULL)
>> +   len = strlen (p);
>> +  else
>> +   len = comma - p;
>> +  if (len == 0)
>> +   {
>> + p = comma + 1;
>> + continue;
>> +   }
>> +
>> +  /* Check to see if the string matches an option class name.  */
>> +  if (len == strlen ("combine")
>> + && 

Re: [PATCH, vec-tails 04/10] Add masking cost

2016-05-20 Thread Ilya Enkovich
2016-05-20 12:24 GMT+03:00 Richard Biener :
> On Thu, May 19, 2016 at 9:40 PM, Ilya Enkovich  wrote:
>> Hi,
>>
>> This patch extends vectorizer cost model to include masking cost by
>> adding new cost model locations and new target hook to compute
>> masking cost.
>
> Can you explain a bit why you add separate overall
> masking_prologue/body_cost rather
> than using the existing prologue/body cost for that?

When I make a decision I need the vector loop cost without masking (what we
currently have) and with masking (what I add).  This allows me to compute
profitability for all options (scalar epilogue, combined epilogue, masked
epilogue) and choose one of them.  Using the existing prologue/body cost
would only allow me to compute masking profitability, with no fallback to
scalar loop profitability.


>
> I realize that the current vectorizer cost infrastructure is a big
> mess, but isn't it possible
> to achieve what you did with the current add_stmt_cost hook?  (by
> inspecting stmt_info)

The cost of a statement and the cost of masking a statement are different
things: two hooks called for the same statement return different values.
I could add vect_cost_for_stmt enum elements to cover masking, but I thought
having stmt_masking_cost would be clearer.

Thanks,
Ilya

>
> Richard.
>
>> Thanks,
>> Ilya


Re: [patch,openacc] use firstprivate pointers for subarrays in c and c++

2016-05-20 Thread Jakub Jelinek
On Tue, May 10, 2016 at 01:29:50PM -0700, Cesar Philippidis wrote:
> @@ -12542,7 +12543,7 @@ c_finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
> t = OMP_CLAUSE_DECL (c);
> if (TREE_CODE (t) == TREE_LIST)
>   {
> -   if (handle_omp_array_sections (c, ort & C_ORT_OMP))
> +   if (handle_omp_array_sections (c, ort & (C_ORT_OMP | C_ORT_ACC)))
>   {
> remove = true;
> break;

There are no array sections in Cilk+, so the above is always true (for C).
Thus, the is_omp argument to handle_omp_array_sections* should be removed
and the code adjusted.

> +   if (ort == C_ORT_ACC)
> + error ("%qD appears more than once in data clasues", t);

Typo.

> +   else
> + error ("%qD appears both in data and map clauses", t);
> remove = true;
>   }
> else
> @@ -12893,7 +12908,10 @@ c_finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>   }
> else if (bitmap_bit_p (_head, DECL_UID (t)))
>   {
> -   error ("%qD appears both in data and map clauses", t);
> +   if (ort == C_ORT_ACC)
> + error ("%qD appears more than once in data clasues", t);

Likewise.

> +   else
> + error ("%qD appears both in data and map clauses", t);
> remove = true;
>   }
> else

> @@ -5796,12 +5796,14 @@ tree
>  finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
>  {
>bitmap_head generic_head, firstprivate_head, lastprivate_head;
> -  bitmap_head aligned_head, map_head, map_field_head;
> +  bitmap_head aligned_head, map_head, map_field_head, oacc_reduction_head;
>tree c, t, *pc;
>tree safelen = NULL_TREE;
>bool branch_seen = false;
>bool copyprivate_seen = false;
>bool ordered_seen = false;
> +  bool allow_fields = (ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
> +|| ort == C_ORT_ACC;
>  

Formatting.  You want = already on the new line, or add () around the whole
rhs and align || below (ort &.
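
I.e. something along the lines of:

  bool allow_fields
    = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP
       || ort == C_ORT_ACC);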

Though, this looks wrong to me: does OpenACC all of a sudden support
privatization of non-static data members in methods?

>bitmap_obstack_initialize (NULL);
>bitmap_initialize (_head, _default_obstack);
> @@ -5810,6 +5812,7 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>bitmap_initialize (_head, _default_obstack);
>bitmap_initialize (_head, _default_obstack);
>bitmap_initialize (_field_head, _default_obstack);
> +  bitmap_initialize (_reduction_head, _default_obstack);
>  
>for (pc = , c = clauses; c ; c = *pc)
>  {
> @@ -5829,8 +5832,7 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
> t = OMP_CLAUSE_DECL (c);
> if (TREE_CODE (t) == TREE_LIST)
>   {
> -   if (handle_omp_array_sections (c, ((ort & C_ORT_OMP_DECLARE_SIMD)
> -  == C_ORT_OMP)))
> +   if (handle_omp_array_sections (c, allow_fields))

IMNSHO you don't want to change this, instead adjust C++
handle_omp_array_sections* where it deals with array sections to just use
the is_omp variant; there are still other places where it deals with
non-static data members and I think you don't want to change those.
>   {
> remove = true;
> break;
> @@ -6040,6 +6042,17 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>  omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
> remove = true;
>   }
> +   else if (ort == C_ORT_ACC
> +&& OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION)
> + {
> +   if (bitmap_bit_p (_reduction_head, DECL_UID (t)))
> + {
> +   error ("%qD appears more than once in reduction clauses", t);
> +   remove = true;
> + }
> +   else
> + bitmap_set_bit (_reduction_head, DECL_UID (t));
> + }
> else if (bitmap_bit_p (_head, DECL_UID (t))
>  || bitmap_bit_p (_head, DECL_UID (t))
>  || bitmap_bit_p (_head, DECL_UID (t)))
> @@ -6050,7 +6063,10 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
> else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE
>  && bitmap_bit_p (_head, DECL_UID (t)))
>   {
> -   error ("%qD appears both in data and map clauses", t);
> +   if (ort == C_ORT_ACC)
> + error ("%qD appears more than once in data clauses", t);
> +   else
> + error ("%qD appears both in data and map clauses", t);
> remove = true;
>   }
> else
> @@ -6076,7 +6092,7 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>   omp_note_field_privatization (t, OMP_CLAUSE_DECL (c));
> else
>   t = OMP_CLAUSE_DECL (c);
> -   if (t == current_class_ptr)
> +   if 

Re: increase alignment of global structs in increase_alignment pass

2016-05-20 Thread Prathamesh Kulkarni
On 19 May 2016 at 13:19, Richard Biener  wrote:
> On Thu, 19 May 2016, Prathamesh Kulkarni wrote:
>
>> On 18 May 2016 at 19:38, Richard Biener  wrote:
>> > On Wed, 18 May 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 17 May 2016 at 18:36, Richard Biener  wrote:
>> >> > On Wed, 11 May 2016, Prathamesh Kulkarni wrote:
>> >> >
>> >> >> On 6 May 2016 at 17:20, Richard Biener  wrote:
>> >> >> >
>> >> >> > You can't simply use
>> >> >> >
>> >> >> > +  offset = int_byte_position (field);
>> >> >> >
>> >> >> > as it can be placed at variable offset which will make 
>> >> >> > int_byte_position
>> >> >> > ICE.  Note it also returns a truncated byte position (bit position
>> >> >> > stripped) which may be undesirable here.  I think you want to use
>> >> >> > bit_position and if and only if DECL_FIELD_OFFSET and
>> >> >> > DECL_FIELD_BIT_OFFSET are INTEGER_CST.
>> >> >> oops, I didn't realize offsets could be variable.
>> >> >> Will that be the case only for VLA member inside struct ?
>> >> >
>> >> > And non-VLA members after such member.
>> >> >
>> >> >> > Your observation about the expensiveness of the walk still stands I 
>> >> >> > guess
>> >> >> > and eventually you should at least cache the
>> >> >> > get_vec_alignment_for_record_decl cases.  Please make those workers
>> >> >> > _type rather than _decl helpers.
>> >> >> Done
>> >> >> >
>> >> >> > You seem to simply get at the maximum vectorized field/array element
>> >> >> > alignment possible for all arrays - you could restrict that to
>> >> >> > arrays with at least vector size (as followup).
>> >> >> Um sorry, I didn't understand this part.
>> >> >
>> >> > It doesn't make sense to align
>> >> >
>> >> > struct { int a; int b; int c; int d; float b[3]; int e; };
>> >> >
>> >> > because we have a float[3] member.  There is no vector size that
>> >> > would cover the float[3] array.
>> >> Thanks for the explanation.
>> >> So we do not want to align struct if sizeof (array_field) < sizeof
>> >> (vector_type).
>> >> This issue is also present without patch for global arrays, so I modified
>> >> get_vec_alignment_for_array_type, to return 0 if sizeof (array_type) <
>> >> sizeof (vectype).
>> >> >
>> >> >> >
>> >> >> > +  /* Skip artificial decls like typeinfo decls or if
>> >> >> > + record is packed.  */
>> >> >> > +  if (DECL_ARTIFICIAL (record_decl) || TYPE_PACKED (type))
>> >> >> > +return 0;
>> >> >> >
>> >> >> > I think we should honor DECL_USER_ALIGN as well and not mess with 
>> >> >> > those
>> >> >> > decls.
>> >> >> Done
>> >> >> >
>> >> >> > Given the patch now does quite some extra work it might make sense
>> >> >> > to split the symtab part out of the vect_can_force_dr_alignment_p
>> >> >> > predicate and call that early.
>> >> >> In the patch I call symtab_node::can_increase_alignment_p early. I 
>> >> >> tried
>> >> >> moving that to it's callers - vect_compute_data_ref_alignment and
>> >> >> increase_alignment::execute, however that failed some tests in vect, 
>> >> >> and
>> >> >> hence I didn't add the following hunk in the patch. Did I miss some
>> >> >> check ?
>> >> >
>> >> > Not sure.
>> >> >
>> >> >> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
>> >> >> index 7652e21..2c1acee 100644
>> >> >> --- a/gcc/tree-vect-data-refs.c
>> >> >> +++ b/gcc/tree-vect-data-refs.c
>> >> >> @@ -795,7 +795,10 @@ vect_compute_data_ref_alignment (struct 
>> >> >> data_reference *dr)
>> >> >>&& TREE_CODE (TREE_OPERAND (base, 0)) == ADDR_EXPR)
>> >> >>   base = TREE_OPERAND (TREE_OPERAND (base, 0), 0);
>> >> >>
>> >> >> -  if (!vect_can_force_dr_alignment_p (base, TYPE_ALIGN (vectype)))
>> >> >> +  if (!(TREE_CODE (base) == VAR_DECL
>> >> >> +&& decl_in_symtab_p (base)
>> >> >> +&& symtab_node::get (base)->can_increase_alignment_p ()
>> >> >> +&& vect_can_force_dr_alignment_p (base, TYPE_ALIGN 
>> >> >> (vectype
>> >> >>   {
>> >> >>if (dump_enabled_p ())
>> >> >>  {
>> >> >
>> >> > +  for (tree field = first_field (type);
>> >> > +   field != NULL_TREE;
>> >> > +   field = DECL_CHAIN (field))
>> >> > +{
>> >> > +  /* Skip if not FIELD_DECL or has variable offset.  */
>> >> > +  if (TREE_CODE (field) != FIELD_DECL
>> >> > + || TREE_CODE (DECL_FIELD_OFFSET (field)) != INTEGER_CST
>> >> > + || TREE_CODE (DECL_FIELD_BIT_OFFSET (field)) != INTEGER_CST
>> >> > + || DECL_USER_ALIGN (field)
>> >> > + || DECL_ARTIFICIAL (field))
>> >> > +   continue;
>> >> >
>> >> > you can stop processing the type and return 0 here if the offset
>> >> > is not an INTEGER_CST.  All following fields will have the same issue.
>> >> >
>> >> > +  /* FIXME: is this check necessary since we check for variable
>> >> > offset above ?  */
>> >> > +  if (TREE_CODE (offset_tree) != INTEGER_CST)
>> >> > +   continue;
>> >> >
>> >> > the check should not be 

Re: Make do_loop use estimated_num_iterations/expected_num_iterations

2016-05-20 Thread Richard Biener
On Thu, 19 May 2016, Jan Hubicka wrote:

> Hi,
> this patch makes doloop_optimize use the
> get_estimated_loop_iterations_int/get_max_loop_iterations_int instead of 
> the weaker
> check for const_iter.  Bootstrapped/regtested x86_64-linux, OK?

Ok.

Thanks,
Richard.

> Honza
> 
>   * loop-doloop.c (doloop_optimize): Use get_estimated_loop_iterations_int
>   and get_max_loop_iterations_int.
> Index: loop-doloop.c
> ===
> --- loop-doloop.c (revision 236450)
> +++ loop-doloop.c (working copy)
> @@ -610,7 +610,8 @@ doloop_optimize (struct loop *loop)
>widest_int iterations, iterations_max;
>rtx_code_label *start_label;
>rtx condition;
> -  unsigned level, est_niter;
> +  unsigned level;
> +  HOST_WIDE_INT est_niter;
>int max_cost;
>struct niter_desc *desc;
>unsigned word_mode_size;
> @@ -635,21 +636,16 @@ doloop_optimize (struct loop *loop)
>  }
>mode = desc->mode;
>  
> -  est_niter = 3;
> -  if (desc->const_iter)
> -est_niter = desc->niter;
> -  /* If the estimate on number of iterations is reliable (comes from profile
> - feedback), use it.  Do not use it normally, since the expected number
> - of iterations of an unrolled loop is 2.  */
> -  if (loop->header->count)
> -est_niter = expected_loop_iterations (loop);
> +  est_niter = get_estimated_loop_iterations_int (loop);
> +  if (est_niter == -1)
> +est_niter = get_max_loop_iterations_int (loop);
>  
> -  if (est_niter < 3)
> +  if (est_niter >= 0 && est_niter < 3)
>  {
>if (dump_file)
>   fprintf (dump_file,
>"Doloop: Too few iterations (%u) to be profitable.\n",
> -  est_niter);
> +  (unsigned int)est_niter);
>return false;
>  }
>  
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [patch] Teach VRP about NAME + CST1 vs CST2 comparison

2016-05-20 Thread Richard Biener
On Fri, May 20, 2016 at 11:20 AM, Eric Botcazou  wrote:
> Hi,
>
> this enhances VRP, more precisely compare_values_warnv, so as not to give up
> for the NAME + CST1 vs CST2 comparison if type overflow is undefined and the
> difference CST2 - CST1 overflows or underflows.  This makes it possible to
> optimize out a class of overflow checks in Ada, typically:
>
> function F (Val, Max : Integer) return Integer is
> begin
>if Val >= Max then
>   return Max;
>end if;
>return Val + 1;  -- overflow check can never fail
> end;
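>
> In C terms the folded check looks like this (an illustrative translation,
> not a testcase from the patch):
>
> int f (int val, int max)
> {
>   if (val >= max)
>     return max;
>   /* The overflow check is essentially "val + 1 >= INT_MIN" here:
>      CST2 - CST1 = INT_MIN - 1 underflows, but with undefined signed
>      overflow VRP can now fold the comparison to true.  */
>   return val + 1;
> }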
>
> The patch also streamlines the handling of symbolic values in the function by
> calling get_single_symbol instead of doing the decompostion manually twice.
>
> The Ada part is a tweak to expose more opportunities to VRP by removing casts.
>
> Tested on x86_64-suse-linux, OK for the mainline?

Ok.

Thanks,
Richard.

>
> 2016-05-20  Eric Botcazou  
>
> * tree-vrp.c (compare_values_warnv): Simplify handling of symbolic
> ranges by calling get_single_symbol and tidy up.  Look more closely
> into NAME + CST1 vs CST2 comparisons if type overflow is undefined.
> ada/
> * gcc-interface/decl.c (gnat_to_gnu_entity):
> Make same-sized subtypes of signed base types signed.
> * gcc-interface/utils.c (make_type_from_size): Adjust to above
> change.
> (unchecked_convert): Likewise.
>
>
> 2016-05-20  Eric Botcazou  
>
> * gnat.dg/opt53.adb: New test.
> * gnat.dg/opt54.adb: Likewise.
>
> --
> Eric Botcazou
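
A rough C analogue of the Ada example above (illustrative only): here
CST2 - CST1 is INT_MIN - 1, which underflows, exactly the case the patch
stops giving up on.

#include <limits.h>

/* With signed overflow undefined, val + 1 is at least INT_MIN + 1,
   so VRP can now fold this comparison to 1.  */
int
always_true (int val)
{
  return val + 1 > INT_MIN;
}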


Re: [patch] Fix PR tree-optimization/70884

2016-05-20 Thread Richard Biener
On Fri, May 20, 2016 at 9:34 AM, Eric Botcazou  wrote:
>> Effectively, the patch prevents late-SRA from doing anything for both
>> testcases (PR 70884 and PR 70919).  I have started a bootstrap and
>> testing on x86_64 and i686 only a few moments ago but it would be
>> great if someone also tried on an architecture for which the
>> constant-pool SRA enhancement was intended, just to be sure.
>
> Can you apply it now?  It's a wrong-code regression on the 6 branch and people
> can still chime in later in any case.

The patch is ok if it passed bootstrap/regtest.  I believe at least on
ARM we had
some tree-ssa.exp testcase(s) that were no longer XFAILing with the
added support.

Thanks,
Richard.

> --
> Eric Botcazou


Re: PR71206: inconsistent types after match.pd transformation

2016-05-20 Thread Richard Biener
On Fri, May 20, 2016 at 9:12 AM, Marc Glisse  wrote:
> On Fri, 20 May 2016, Marc Glisse wrote:
>
>> Hello,
>>
>> this was bootstrapped and regtested on powerpc64le-unknown-linux-gnu.
>>
>> 2016-05-20  Marc Glisse  
>
>
> PR tree-optimization/71079
> PR tree-optimization/71206

Ok.

Thanks,
Richard.

>
>> gcc/
>> * match.pd ((X ^ Y) ^ (X ^ Z)): Convert the arguments.
>>
>> gcc/testsuite/
>> * gcc.dg/tree-ssa/pr71206.c: New testcase.
>
>
> --
> Marc Glisse
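
For reference, a minimal sketch of the identity behind the rule; the fix
matters when the operands carry different but compatible types at the
GIMPLE level, so the folded arguments must be converted to the result
type.

/* (X ^ Y) ^ (X ^ Z) folds to Y ^ Z, since X ^ X == 0.  */
int
cancel_x (int x, int y, int z)
{
  return (x ^ y) ^ (x ^ z);   /* folds to y ^ z */
}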


Re: [PATCH] Fix PR tree-optimization/71179

2016-05-20 Thread Richard Biener
On Fri, May 20, 2016 at 4:13 AM, Kugan Vivekanandarajah wrote:
> Hi,
>
> We don’t allow vector types for integers.  Likewise, I am also disallowing
> floating-point vector types when transforming repeated addition to
> multiplication.
>
> This can be relaxed. I will send a separate patch to allow integer and
> floating point vectorization later.
>
> Bootstrapped and regression tested on x86-64-linux-gnu with no new regressions.
>
> Is this OK for trunk?

Ok.

Thanks,
Richard.

> Thanks,
> Kugan
>
> gcc/testsuite/ChangeLog:
>
> 2016-05-20  Kugan Vivekanandarajah  
>
> * gcc.dg/tree-ssa/pr71179.c: New test.
>
> gcc/ChangeLog:
>
> 2016-05-20  Kugan Vivekanandarajah  
>
> * tree-ssa-reassoc.c (transform_add_to_multiply): Disallow float
> VECTOR type.
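
For reference, a minimal sketch (not from the patch) of the scalar form
of the transformation; the fix simply punts when the operand type is a
floating-point vector.

/* transform_add_to_multiply turns a chain of identical addends into a
   multiplication; for floating point, this reassociation requires
   -funsafe-math-optimizations (e.g. via -ffast-math).  */
double
quadruple (double x)
{
  return x + x + x + x;   /* becomes x * 4.0 */
}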


Re: [PATCH 3/3] function: Restructure *logue insertion

2016-05-20 Thread Thomas Schwinge
Hi!

> > >   * function.c (make_epilogue_seq): Remove epilogue_end parameter.
> > >   (thread_prologue_and_epilogue_insns): Remove bb_flags.  Restructure
> > >   code.  Ignore sibcalls on EDGE_IGNORE edges.
> > >   * shrink-wrap.c (handle_simple_exit): New function.  Set EDGE_IGNORE
> > >   on edges for sibcalls that run without prologue.  The rest of the
> > >   function is combined from...
> > >   (fix_fake_fallthrough_edge): ... this, and ...
> > >   (try_shrink_wrapping): ... a part of this.  Remove the bb_with
> > >   function argument, make it a local variable.

On Thu, 19 May 2016 17:20:46 -0500, Segher Boessenkool wrote:
> On Thu, May 19, 2016 at 04:00:22PM -0600, Jeff Law wrote:
> > OK for the trunk, but please watch closely for any fallout.
> 
> Thanks, and I will!

With nvptx offloading on x86_64 GNU/Linux, this (r236491) is causing
several execution test failures.  I'll have a look.


Regards
 Thomas




Re: [PATCH, vec-tails 01/10] New compiler options

2016-05-20 Thread Richard Biener
On Thu, May 19, 2016 at 9:36 PM, Ilya Enkovich  wrote:
> Hi,
>
> This patch introduces new options used for loop epilogue vectorization.

Why's that?  This is a bit too much for the casual user, and if it is really
necessary to control this via options, then it is not fine-grained enough.

Why doesn't the vectorizer/backend have enough info to decide this itself?

Richard.
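
For reference, a usage sketch based only on the option definitions
quoted below (these spellings are this patch's proposal, not an
existing GCC interface):

  gcc -O3 -ftree-vectorize -ftree-vectorize-epilogues=combine,mask foo.c
  gcc -O3 -ftree-vectorize -fno-tree-vectorize-epilogues foo.c
  gcc -O3 -ftree-vectorize -fvect-epilogue-cost-model=unlimited foo.c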

> Thanks,
> Ilya
> --
> gcc/
>
> 2016-05-19  Ilya Enkovich  
>
> * common.opt (flag_tree_vectorize_epilogues): New.
> (ftree-vectorize-short-loops): New.
> (ftree-vectorize-epilogues=): New.
> (fno-tree-vectorize-epilogues): New.
> (fvect-epilogue-cost-model=): New.
> * flag-types.h (enum vect_epilogue_mode): New.
> * opts.c (parse_vectorizer_options): New.
> (common_handle_option): Support -ftree-vectorize-epilogues=
> and -fno-tree-vectorize-epilogues options.
>
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 682cb41..6b83b79 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -243,6 +243,10 @@ bool dump_base_name_prefixed = false
>  Variable
>  bool flag_disable_hsa = false
>
> +; Flag holding modes for loop epilogue vectorization
> +Variable
> +unsigned int flag_tree_vectorize_epilogues
> +
>  ###
>  Driver
>
> @@ -2557,6 +2561,19 @@ ftree-vectorize
>  Common Report Var(flag_tree_vectorize) Optimization
>  Enable vectorization on trees.
>
> +ftree-vectorize-short-loops
> +Common Report Var(flag_tree_vectorize_short_loops) Optimization
> +Enable vectorization of loops with low trip count using masking.
> +
> +ftree-vectorize-epilogues=
> +Common Report Joined Optimization
> +Comma separated list of loop epilogue vectorization modes.
> +Available modes: combine, mask, nomask.
> +
> +fno-tree-vectorize-epilogues
> +Common RejectNegative Optimization
> +Disable epilogues vectorization.
> +
>  ftree-vectorizer-verbose=
>  Common Joined RejectNegative Ignore
>  Does nothing.  Preserved for backward compatibility.
> @@ -2577,6 +2594,10 @@ fsimd-cost-model=
>  Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) Optimization
>  Specifies the vectorization cost model for code marked with a simd directive.
>
> +fvect-epilogue-cost-model=
> +Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_epilogue_cost_model) Init(VECT_COST_MODEL_DEFAULT) Optimization
> +Specifies the cost model for epilogue vectorization.
> +
>  Enum
>  Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
>
> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index dd57e16..24081b1 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -200,6 +200,15 @@ enum vect_cost_model {
>VECT_COST_MODEL_DEFAULT = 3
>  };
>
> +/* Epilogue vectorization modes.  */
> +enum vect_epilogue_mode {
> +  VECT_EPILOGUE_COMBINE = 1 << 0,
> +  VECT_EPILOGUE_MASK = 1 << 1,
> +  VECT_EPILOGUE_NOMASK = 1 << 2,
> +  VECT_EPILOGUE_ALL = VECT_EPILOGUE_COMBINE | VECT_EPILOGUE_MASK
> + | VECT_EPILOGUE_NOMASK
> +};
> +
>  /* Different instrumentation modes.  */
>  enum sanitize_code {
>/* AddressSanitizer.  */
> diff --git a/gcc/opts.c b/gcc/opts.c
> index 0f9431a..a0c0987 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -1531,6 +1531,63 @@ parse_sanitizer_options (const char *p, location_t loc, int scode,
>return flags;
>  }
>
> +/* Parse comma separated vectorizer suboptions from P for option SCODE,
> +   adjust previous FLAGS and return new ones.  If COMPLAIN is false,
> +   don't issue diagnostics.  */
> +
> +unsigned int
> +parse_vectorizer_options (const char *p, location_t loc, int scode,
> + unsigned int flags, int value, bool complain)
> +{
> +  if (scode != OPT_ftree_vectorize_epilogues_)
> +return flags;
> +
> +  if (!p)
> +return value;
> +
> +  while (*p != 0)
> +{
> +  size_t len;
> +  const char *comma = strchr (p, ',');
> +  unsigned int flag = 0;
> +
> +  if (comma == NULL)
> +   len = strlen (p);
> +  else
> +   len = comma - p;
> +  if (len == 0)
> +   {
> + p = comma + 1;
> + continue;
> +   }
> +
> +  /* Check to see if the string matches an option class name.  */
> +  if (len == strlen ("combine")
> + && memcmp (p, "combine", len) == 0)
> +   flag = VECT_EPILOGUE_COMBINE;
> +  else if (len == strlen ("mask")
> + && memcmp (p, "mask", len) == 0)
> +   flag = VECT_EPILOGUE_MASK;
> +  else if (len == strlen ("nomask")
> + && memcmp (p, "nomask", len) == 0)
> +   flag = VECT_EPILOGUE_NOMASK;
> +  else if (complain)
> +   error_at (loc, "unrecognized argument to -ftree-vectorize-epilogues= "
> + "option: %q.*s", (int) len, p);
> +
> +  if (value)
> +   flags |= flag;
> +  else
> +   flags &= ~flag;
> +
> +  
