[patch] Fix PHI optimization issue with boolean types

2016-10-17 Thread Eric Botcazou
Hi,

this is a regression present on the mainline and 6 branch: the compiler now 
generates wrong code for the attached testcase at -O because of an internal 
conflict about boolean types.  The sequence is as follows.  In .mergephi3:

  boolean _22;
  p__enum res;

  :
  if (_22 != 0)
goto ;
  else
goto ;

  :

  :
  # res_17 = PHI <2(8), 0(9), 1(10)>

is turned into:

COND_EXPR in block 9 and PHI in block 11 converted to straightline code.
PHI res_17 changed to factor conversion out from COND_EXPR.
New stmt with CAST that defines res_17.

  boolean _33;

  :
  # _33 = PHI <2(8), _22(9)>
  res_17 = (p__enum) _33;

[...]

  :
  if (res_17 != 0)
goto ;
  else
goto ;

  :
  _12 = res_17 == 2;
  _13 = (integer) _12

in .phiopt1.  So boolean _33 can have value 2.  Later forwprop3 propagates _33
into the uses of res_17:

  :
  if (_33 != 0)
goto ;
  else
goto ;

  :
  _12 = _33 == 2;
  _13 = (integer) _12;

and DOM3 deduces:

  :
  _12 = 0;
  _13 = 0;

because it computes that _33 has value 1 in BB 13 since it's a boolean.

The problem was introduced by the new factor_out_conditional_conversion:

  /* If arg1 is an INTEGER_CST, fold it to new type.  */
  if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
  && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
{
  if (gimple_assign_cast_p (arg0_def_stmt))
new_arg1 = fold_convert (TREE_TYPE (new_arg0), arg1);
  else
return NULL;
}
  else
return NULL

int_fits_type_p is documented as taking only INTEGER_TYPE, but is invoked
on constant 2 and a BOOLEAN_TYPE and returns true.

BOOLEAN_TYPE is special in Ada: it has precision 8 and range [0;255] so the
outcome of int_fits_type_p is not unreasonable.  But this goes against the
various transformations applied to boolean types in the compiler, which all
assume that they can only take values 0 or 1.

Hence the attached fix (which should be a no-op except for Ada), tested on 
x86_64-suse-linux, OK for mainline and 6 branch?


2016-10-18  Eric Botcazou  

* tree.c (int_fits_type_p): Accept only 0 and 1 for boolean types.


2016-10-18  Eric Botcazou  

* gnat.dg/opt59.adb: New test.
* gnat.dg/opt59_pkg.ad[sb]: New helper.

-- 
Eric BotcazouIndex: tree.c
===
--- tree.c	(revision 241294)
+++ tree.c	(working copy)
@@ -9065,8 +9065,8 @@ get_narrower (tree op, int *unsignedp_pt
   return win;
 }
 
-/* Returns true if integer constant C has a value that is permissible
-   for type TYPE (an INTEGER_TYPE).  */
+/* Return true if integer constant C has a value that is permissible
+   for TYPE, an integral type.  */
 
 bool
 int_fits_type_p (const_tree c, const_tree type)
@@ -9075,6 +9075,11 @@ int_fits_type_p (const_tree c, const_tre
   bool ok_for_low_bound, ok_for_high_bound;
   signop sgn_c = TYPE_SIGN (TREE_TYPE (c));
 
+  /* Short-circuit boolean types since various transformations assume that
+ they can only take values 0 and 1.  */
+  if (TREE_CODE (type) == BOOLEAN_TYPE)
+return integer_zerop (c) || integer_onep (c);
+
 retry:
   type_low_bound = TYPE_MIN_VALUE (type);
   type_high_bound = TYPE_MAX_VALUE (type);
-- { dg-do run }
-- { dg-options "-O" }

with Opt59_Pkg; use Opt59_Pkg;

procedure Opt59 is

  type Enum is (Zero, One, Two);

  function Has_True (V : Boolean_Vector) return Boolean is
  begin
 for I in V'Range loop
if V (I) then
   return True;
end if;
 end loop;
 return False;
  end;

  Data1  : constant Boolean_Vector := Get_BV1;
  Data2  : constant Boolean_Vector := Get_BV2;
  Result : Boolean_Vector;

  function F return Enum is
Res  : Enum := Zero;
Set1 : constant Boolean := Has_True (Data1);
Set2 : constant Boolean := Has_True (Data2);
  begin
if Set1 then
  Res := Two;
elsif Set2 then
  Res := One;
end if;
return Res;
  end;

  Val : constant Enum := F;

begin

  for I in Result'Range loop
Result (I) := Data1 (I) or Data2 (I);
  end loop;

  if Val /= Zero then
Test (Val = Two);
  end if;

end;
package body Opt59_Pkg is

  function Get_BV1 return Boolean_Vector is
  begin
return (others => True);
  end;

  function Get_BV2 return Boolean_Vector is
  begin
return (others => False);
  end;

  procedure Test (B : Boolean) is
  begin
if not B then
  raise Program_Error;
end if;
  end;

end Opt59_Pkg;
package Opt59_Pkg is

  type Boolean_Vector is array (1 .. 8) of Boolean;

  function Get_BV1 return Boolean_Vector;

  function Get_BV2 return Boolean_Vector;

  procedure Test (B : Boolean);

end Opt59_Pkg;


Re: [PATCH] Clear BB_VISITED in bb-reorder

2016-10-17 Thread Andrew Pinski
On Mon, Oct 17, 2016 at 5:26 AM, Richard Biener  wrote:
>
> $subject, applied as obvious.

I think you should do the same for the vectorizer too.  I noticed that
when testing the patch for loop splitting.

Thanks,
Andrew

>
> Richard.
>
> 2016-10-17  Richard Biener  
>
> * bb-reorder.c (reorder_basic_blocks_simple): Clear BB_VISITED
> before using it.
>
> Index: gcc/bb-reorder.c
> ===
> --- gcc/bb-reorder.c(revision 241228)
> +++ gcc/bb-reorder.c(working copy)
> @@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
>   To start with, everything points to itself, nothing is assigned yet.  */
>
>FOR_ALL_BB_FN (bb, cfun)
> -bb->aux = bb;
> +{
> +  bb->aux = bb;
> +  bb->flags &= ~BB_VISITED;
> +}
>
>EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
>


Re: Use FOR_ALL_BB_FN in a few more places (was: [PATCH] Clear BB_VISITED in bb-reorder)

2016-10-17 Thread Richard Biener
On October 17, 2016 6:09:02 PM GMT+02:00, Thomas Schwinge 
 wrote:
>Hi!
>
>On Mon, 17 Oct 2016 15:17:13 +0200, Richard Biener
> wrote:
>> On Mon, Oct 17, 2016 at 3:12 PM, Thomas Schwinge
>>  wrote:
>> > On Mon, 17 Oct 2016 14:26:42 +0200 (CEST), Richard Biener
> wrote:
>> >> --- gcc/bb-reorder.c  (revision 241228)
>> >> +++ gcc/bb-reorder.c  (working copy)
>> >> @@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
>> >>   To start with, everything points to itself, nothing is
>assigned yet.  */
>> >>
>> >>FOR_ALL_BB_FN (bb, cfun)
>> >> -bb->aux = bb;
>> >> +{
>> >> +  bb->aux = bb;
>> >> +  bb->flags &= ~BB_VISITED;
>> >> +}
>> >>
>> >>EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
>> >
>> > "EXIT_BLOCK_PTR_FOR_FN (cfun)->flags &= ~BB_VISITED;" is not
>required
>> > here, additionally?
>> 
>> ALL_BB includes EXIT_BLOCK.
>
>I see -- ;-) FOR_ALL_BB_FN is something different from
>FOR_EACH_BB_FN...
>
>
>We could use the former in a few more places; OK for trunk once tested?

OK.

Richard.

>commit 183433526b2befb7bd75e74f753b8a5d4bec14b0
>Author: Thomas Schwinge 
>Date:   Mon Oct 17 18:07:16 2016 +0200
>
>Use FOR_ALL_BB_FN in a few more places
>
>   gcc/
>   * cfg.c (clear_bb_flags): Use FOR_ALL_BB_FN.
>   * config/nvptx/nvptx.c (nvptx_find_sese): Likewise.
>---
> gcc/cfg.c| 2 +-
> gcc/config/nvptx/nvptx.c | 8 +---
> 2 files changed, 2 insertions(+), 8 deletions(-)
>
>diff --git gcc/cfg.c gcc/cfg.c
>index ee2e42c..6604b02 100644
>--- gcc/cfg.c
>+++ gcc/cfg.c
>@@ -386,7 +386,7 @@ clear_bb_flags (void)
> {
>   basic_block bb;
> 
>-  FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
>+  FOR_ALL_BB_FN (bb, cfun)
> bb->flags &= BB_FLAGS_TO_PRESERVE;
> }
>
>diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
>index 6ec8eb4..80fa9ae 100644
>--- gcc/config/nvptx/nvptx.c
>+++ gcc/config/nvptx/nvptx.c
>@@ -3091,17 +3091,11 @@ nvptx_find_sese (auto_vec &blocks,
>bb_pair_vec_t ®ions)
>   int ix;
> 
>   /* First clear each BB of the whole function.  */ 
>-  FOR_EACH_BB_FN (block, cfun)
>+  FOR_ALL_BB_FN (block, cfun)
> {
>   block->flags &= ~BB_VISITED;
>   BB_SET_SESE (block, 0);
> }
>-  block = EXIT_BLOCK_PTR_FOR_FN (cfun);
>-  block->flags &= ~BB_VISITED;
>-  BB_SET_SESE (block, 0);
>-  block = ENTRY_BLOCK_PTR_FOR_FN (cfun);
>-  block->flags &= ~BB_VISITED;
>-  BB_SET_SESE (block, 0);
> 
>   /* Mark blocks in the function that are in this graph.  */
>   for (ix = 0; blocks.iterate (ix, &block); ix++)
>
>
>Grüße
> Thomas




Re: RFC [1/3] divmod transform v2

2016-10-17 Thread Prathamesh Kulkarni
On 18 October 2016 at 02:46, Jeff Law  wrote:
> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>>
>> This patch is mostly the same as previous one, except it drops
>> targeting __udivmoddi4() because it gave undefined reference link
>> error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
>> aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
>> needed for the target (it was a bug in my patch that called
>> __udivmoddi4() even though aarch64 supported hardware div).
>
> This touches on the one high level question I had.  Namely what is the code
> generation strategy if the hardware has a div, but not divmod.
The divmod transform isn't enabled if target supports hardware div in the same
or wider mode even if divmod libfunc is available for the given mode.
>
> ISTM in that case I think we want to use the div instruction and synthesize
> mod from that result rather than relying on a software divmod.  So it looks
> like you ought to be doing the right thing for that case now based on your
> comment above.
>>
>>
>> However this makes me wonder if it's guaranteed that __udivmoddi4()
>> will be available for a target if it doesn't have hardware div and
>> divmod insn and doesn't have target-specific libfunc for DImode
>> divmod ? To be conservative, the attached patch doesn't generate call
>> to __udivmoddi4.
>
> I don't think that's a safe assumption.  Addition of the divmod routines
> into libgcc is controlled by the target and have to be explicitly added
> AFAICT.
>
> So on a target like the fr30 which has no div or mod insn and doesn't define
> the right bits in libgcc, there is no divmod libcall available. (On these
> targets there's a div libcall and a mod libcall, but not a combined one).
Thanks. I had erroneously  assumed __udivimoddi4() was available for all targets
because it was defined in libgcc2.c and generated call to it as "last
resort" for unsigned
DImode case if target didn't support hardware div and divmod insn and
didn't have
target-specific divmod libfunc for DImode.
The reason why it generated undefined reference on aarch64-linux-gnu
was because I
didn't properly check in the patch if target supported hardware div,
and ended up generating call to
__udivmoddi4() even though aarch64 has hardware div. I rectified that
before posting the
patch.
>
> I don't even think we have a way of knowing in the compiler if the target
> has enabled divmod support in libgcc.
Well the arm and c6x backends register target-specific divmod libfunc via
set_optab_libfunc(). I suppose that's sufficient to know if target has
divmod enabled
in libgcc ?
>
> ISTM that for now we have to limit to cases where we have a divmod
> insn/libcall defined.
Indeed. The transform is enabled only if the target has hardware divmod insn
or divmod libfunc (in the libfunc case, no hardware div insn).
Please see divmod_candidate_p() in the patch for cases when the
transform gets enabled.

Thanks,
Prathamesh
>
> jeff
>


Re: [PATCH] Fix PR77916

2016-10-17 Thread Markus Trippelsdorf
On 2016.10.18 at 05:13 +0200, Markus Trippelsdorf wrote:
> On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
> > Hi,
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
> > where SLSR will ICE when exposed to a cast from integer to pointer.  This
> > is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
> > MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
> > of pointer type.  This patch recognizes when pointer arithmetic is taking
> > place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
> > the case of the PR, this occurs in the logic where the stride S is a known
> > constant value, but the same problem could occur when it is an SSA_NAME
> > without known value.  Both possibilities are handled here.
> > 
> > Fixing the code to ensure that the unknown stride case always uses an 
> > initializer for a negative increment allows us to remove the stopgap fix
> > added for PR77937 as well.
> > 
> > Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> > regressions, committed.
> 
> Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
> X86_64 before committing these patches, because you broke it for the
> third time in the last couple of days.
> 
> markus@x4 ffmpeg % cat h264dsp.i
> extern int fn2(int);
> extern int fn3(int);
> int a, b, c;
> void fn1(long p1) {
>   char *d;
>   for (;; d += p1) {
> d[0] = fn2(1 >> c >> 1);
> fn2(c >> a);
> d[1] = fn3(d[1]) >> 1;
> d[6] = fn3(d[6] * b + 1) >> 1;
> d[7] = fn3(d[7] * b + 1) >> 1;
> d[8] = fn3(d[8] * b + 1) >> 1;
>   }
> }
> 
> markus@x4 ffmpeg % gcc -O3 -march=amdfam10 -c h264dsp.i
> h264dsp.i: In function ‘fn1’:
> h264dsp.i:4:6: internal compiler error: in replace_one_candidate, at 
> gimple-ssa-strength-reduction.c:3375
>  void fn1(long p1) {
>   ^~~
> 0x12773a9 replace_one_candidate
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3375
> 0x127af77 replace_profitable_candidates
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3486
> 0x127aeeb replace_profitable_candidates
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3495
> 0x127f3ee analyze_candidates_and_replace
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3574
> 0x127f3ee execute
> ../../gcc/gcc/gimple-ssa-strength-reduction.c:3648

Just figured out that this testcase is identical to:
gcc/testsuite/gcc.dg/torture/pr77937-2.c 

So please run the testsuite on X86_64 in the future.

-- 
Markus


Re: [PATCH] Fix PR77916

2016-10-17 Thread Markus Trippelsdorf
On 2016.10.17 at 17:23 -0500, Bill Schmidt wrote:
> Hi,
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
> where SLSR will ICE when exposed to a cast from integer to pointer.  This
> is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
> MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
> of pointer type.  This patch recognizes when pointer arithmetic is taking
> place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
> the case of the PR, this occurs in the logic where the stride S is a known
> constant value, but the same problem could occur when it is an SSA_NAME
> without known value.  Both possibilities are handled here.
> 
> Fixing the code to ensure that the unknown stride case always uses an 
> initializer for a negative increment allows us to remove the stopgap fix
> added for PR77937 as well.
> 
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
> regressions, committed.

Perhaps you should consider building ffmpeg with -O3 -march=amdfam10 on
X86_64 before committing these patches, because you broke it for the
third time in the last couple of days.

markus@x4 ffmpeg % cat h264dsp.i
extern int fn2(int);
extern int fn3(int);
int a, b, c;
void fn1(long p1) {
  char *d;
  for (;; d += p1) {
d[0] = fn2(1 >> c >> 1);
fn2(c >> a);
d[1] = fn3(d[1]) >> 1;
d[6] = fn3(d[6] * b + 1) >> 1;
d[7] = fn3(d[7] * b + 1) >> 1;
d[8] = fn3(d[8] * b + 1) >> 1;
  }
}

markus@x4 ffmpeg % gcc -O3 -march=amdfam10 -c h264dsp.i
h264dsp.i: In function ‘fn1’:
h264dsp.i:4:6: internal compiler error: in replace_one_candidate, at 
gimple-ssa-strength-reduction.c:3375
 void fn1(long p1) {
  ^~~
0x12773a9 replace_one_candidate
../../gcc/gcc/gimple-ssa-strength-reduction.c:3375
0x127af77 replace_profitable_candidates
../../gcc/gcc/gimple-ssa-strength-reduction.c:3486
0x127aeeb replace_profitable_candidates
../../gcc/gcc/gimple-ssa-strength-reduction.c:3495
0x127f3ee analyze_candidates_and_replace
../../gcc/gcc/gimple-ssa-strength-reduction.c:3574
0x127f3ee execute
../../gcc/gcc/gimple-ssa-strength-reduction.c:3648




-- 
Markus


Re: [patch, gfortran] PR48298 DTIO, implement size=

2016-10-17 Thread Steve Kargl
On Mon, Oct 17, 2016 at 06:02:52PM -0700, Jerry DeLisle wrote:
> Hi all,
> 
> The attached patch enables the size= specifier in a READ statement to work 
> with 
> child DTIO procedures. This is accomplished by moving the size_used variable 
> from the dtp structure to the gfc_unit structure so that the accumulation of 
> bytes during READ is carried across the procedures via the UNIT.
> 
> As far as I know, this is the last DTIO patch needed for full implementation 
> and 
> will close the PR.
> 
> After this patch is committed I plan to prepare a clean up patch to 
> reorganize 
> the dtp structure and clear at least one TODO related to stream IO. The 
> follow-on patch will bump the major version number of libgfortran to 4.
> 
> Regression tested on x86-64-linux. New test case attached.
> 
> OK for trunk?

Lookd good to me.

>   * transfer.c (read_sf_internal): Likewise. (read_sf): Likewise.
>   (read_block_form): Likewise. (read_block_form4): Likewise.


You can simplify this by

* transfer.c (read_sf_internal, read_sf, read_block_form,
read_block_form4): Likewise.

-- 
Steve


[PATCH] xtensa: add -mpreferred-stack-boundary option

2016-10-17 Thread Max Filippov
2016-05-24  Max Filippov  
gcc/
* config/xtensa/xtensa.opt (mpreferred-stack-boundary=): New
option.
* config/xtensa/xtensa.h (STACK_BOUNDARY): Redefine as 64 for
windowed ABI/32 for call0 ABI.
(PREFERRED_STACK_BOUNDARY): New definition.
* config/xtensa/xtensa.c (xtensa_function_arg_1,
xtensa_function_arg_boundary, STACK_BYTES,
xtensa_gimplify_va_arg_expr): Use PREFERRED_STACK_BOUNDARY
instead of STACK_BOUNDARY.
* doc/invoke.texi: Document -mpreferred-stack-boundary.
---
 gcc/config/xtensa/xtensa.c   | 12 +++-
 gcc/config/xtensa/xtensa.h   | 16 +---
 gcc/config/xtensa/xtensa.opt |  4 
 gcc/doc/invoke.texi  | 16 
 4 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/gcc/config/xtensa/xtensa.c b/gcc/config/xtensa/xtensa.c
index 60e5029..332aa81 100644
--- a/gcc/config/xtensa/xtensa.c
+++ b/gcc/config/xtensa/xtensa.c
@@ -2117,7 +2117,8 @@ xtensa_function_arg_1 (cumulative_args_t cum_v, 
machine_mode mode,
 
   if (type && (TYPE_ALIGN (type) > BITS_PER_WORD))
 {
-  int align = MIN (TYPE_ALIGN (type), STACK_BOUNDARY) / BITS_PER_WORD;
+  int align = MIN (TYPE_ALIGN (type),
+  PREFERRED_STACK_BOUNDARY) / BITS_PER_WORD;
   *arg_words = (*arg_words + align - 1) & -align;
 }
 
@@ -2158,8 +2159,8 @@ xtensa_function_arg_boundary (machine_mode mode, 
const_tree type)
   alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode);
   if (alignment < PARM_BOUNDARY)
 alignment = PARM_BOUNDARY;
-  if (alignment > STACK_BOUNDARY)
-alignment = STACK_BOUNDARY;
+  if (alignment > PREFERRED_STACK_BOUNDARY)
+alignment = PREFERRED_STACK_BOUNDARY;
   return alignment;
 }
 
@@ -2624,7 +2625,7 @@ xtensa_call_save_reg(int regno)
 /* Return the bytes needed to compute the frame pointer from the current
stack pointer.  */
 
-#define STACK_BYTES (STACK_BOUNDARY / BITS_PER_UNIT)
+#define STACK_BYTES (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT)
 #define XTENSA_STACK_ALIGN(LOC) (((LOC) + STACK_BYTES-1) & ~(STACK_BYTES-1))
 
 long
@@ -3184,7 +3185,8 @@ xtensa_gimplify_va_arg_expr (tree valist, tree type, 
gimple_seq *pre_p,
 
   if (TYPE_ALIGN (type) > BITS_PER_WORD)
 {
-  int align = MIN (TYPE_ALIGN (type), STACK_BOUNDARY) / BITS_PER_UNIT;
+  int align = MIN (TYPE_ALIGN (type),
+  PREFERRED_STACK_BOUNDARY) / BITS_PER_UNIT;
 
   t = build2 (PLUS_EXPR, integer_type_node, unshare_expr (orig_ndx),
  build_int_cst (integer_type_node, align - 1));
diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 82e9900..abcb799 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -504,9 +504,19 @@ enum reg_class
location above the first argument's address.  */
 #define FIRST_PARM_OFFSET(FNDECL) 0
 
-/* Align stack frames on 128 bits for Xtensa.  This is necessary for
-   128-bit datatypes defined in TIE (e.g., for Vectra).  */
-#define STACK_BOUNDARY 128
+/* Align stack frames on 32 bits for call0 ABI or on 64 bits for windowed
+   ABI on Xtensa.  These are the absolute minimums dictated by the hardware
+   (windowed is 64 bits because the entry instruction moves SP in multiples
+   of 8 bytes).  ABIs require alignment to be 128 bits to support datatypes
+   defined in TIE (e.g., for Vectra).  */
+#define STACK_BOUNDARY (TARGET_WINDOWED_ABI ? 64 : 32)
+
+/* Align stack frames on 128 bits in accordance with ABI.  This alignment
+   may be reduced when stack space conservation is preferred over ABI
+   compliance with -mpreferred-stack-boundary option.  */
+#define PREFERRED_STACK_BOUNDARY   \
+   MAX (STACK_BOUNDARY,\
+(BITS_PER_UNIT << xtensa_preferred_stack_boundary))
 
 /* Use a fixed register window size of 8.  */
 #define WINDOW_SIZE (TARGET_WINDOWED_ABI ? 8 : 0)
diff --git a/gcc/config/xtensa/xtensa.opt b/gcc/config/xtensa/xtensa.opt
index ea5c7d5..85e9622 100644
--- a/gcc/config/xtensa/xtensa.opt
+++ b/gcc/config/xtensa/xtensa.opt
@@ -45,3 +45,7 @@ Relax literals in assembler and place them automatically in 
the text section.
 mserialize-volatile
 Target Report Mask(SERIALIZE_VOLATILE)
 -mno-serialize-volatileDo not serialize volatile memory references 
with MEMW instructions.
+
+mpreferred-stack-boundary=
+Target RejectNegative Joined UInteger Var(xtensa_preferred_stack_boundary) 
Init(4)
+Attempt to keep stack aligned to this power of 2.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0241cb5..2e885a8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1202,6 +1202,7 @@ See RS/6000 and PowerPC Options.
 -mtext-section-literals  -mno-text-section-literals @gol
 -mauto-litpools  -mno-auto-litpools @gol
 -mtarget-align  -mno-target-align @gol
+-mpreferred-stack-boundary=@var{num} @gol
 -mlongcalls  -mno-longcalls}
 
 @emph{zSeries Op

[patch, gfortran] PR48298 DTIO, implement size=

2016-10-17 Thread Jerry DeLisle

Hi all,

The attached patch enables the size= specifier in a READ statement to work with 
child DTIO procedures. This is accomplished by moving the size_used variable 
from the dtp structure to the gfc_unit structure so that the accumulation of 
bytes during READ is carried across the procedures via the UNIT.


As far as I know, this is the last DTIO patch needed for full implementation and 
will close the PR.


After this patch is committed I plan to prepare a clean up patch to reorganize 
the dtp structure and clear at least one TODO related to stream IO. The 
follow-on patch will bump the major version number of libgfortran to 4.


Regression tested on x86-64-linux. New test case attached.

OK for trunk?

Jerry

2016-10-17  Jerry DeLisle  

PR fortran/48298
* io/io.h: Move size_used from dtp to unit structure. Add bool
has_size to unit structure.
* read.c (read_x): Use has_size and size_used.
* transfer.c (read_sf_internal): Likewise. (read_sf): Likewise.
(read_block_form): Likewise. (read_block_form4): Likewise.
(data_transfer_init): If parent, initialize the size variables.
(finalize_transfer): Set the size variable using size_used in
gfc_unit. (write_block): Delete bogus/dead code.


diff --git a/libgfortran/ChangeLog b/libgfortran/ChangeLog
index bfda86df..f20c5106 100644
--- a/libgfortran/ChangeLog
+++ b/libgfortran/ChangeLog
@@ -1,3 +1,15 @@
+2016-10-17  Jerry DeLisle  
+
+	PR fortran/48298
+	* io/io.h: Move size_used from dtp to unit structure. Add bool
+	has_size to unit structure.
+	* read.c (read_x): Use has_size and size_used.
+	* transfer.c (read_sf_internal): Likewise. (read_sf): Likewise.
+	(read_block_form): Likewise. (read_block_form4): Likewise.
+	(data_transfer_init): If parent, initialize the size variables.
+	(finalize_transfer): Set the size variable using size_used in
+	gfc_unit. (write_block): Delete bogus/dead code.
+
 2016-10-16  Janne Blomqvist  
 
 	PR libfortran/48587
diff --git a/libgfortran/io/io.h b/libgfortran/io/io.h
index aaacc089..edc520a9 100644
--- a/libgfortran/io/io.h
+++ b/libgfortran/io/io.h
@@ -514,7 +514,6 @@ typedef struct st_parameter_dt
 	 large enough to hold a complex value (two reals) of the
 	 largest kind.  */
 	  char value[32];
-	  GFC_IO_INT size_used;
 	  formatted_dtio fdtio_ptr;
 	  unformatted_dtio ufdtio_ptr;
 	} p;
@@ -650,6 +649,8 @@ typedef struct gfc_unit
   /* DTIO Parent/Child procedure, 0 = parent, >0 = child level.  */
   int child_dtio;
   int last_char;
+  bool has_size;
+  GFC_IO_INT size_used;
 }
 gfc_unit;
 
diff --git a/libgfortran/io/read.c b/libgfortran/io/read.c
index f8d5b72e..d72cdb37 100644
--- a/libgfortran/io/read.c
+++ b/libgfortran/io/read.c
@@ -1282,8 +1282,9 @@ read_x (st_parameter_dt *dtp, int n)
 } 
 
  done:
-  if ((dtp->common.flags & IOPARM_DT_HAS_SIZE) != 0)
-dtp->u.p.size_used += (GFC_IO_INT) n;
+  if (((dtp->common.flags & IOPARM_DT_HAS_SIZE) != 0) ||
+  dtp->u.p.current_unit->has_size)
+dtp->u.p.current_unit->size_used += (GFC_IO_INT) n;
   dtp->u.p.current_unit->bytes_left -= n;
   dtp->u.p.current_unit->strm_pos += (gfc_offset) n;
 }
diff --git a/libgfortran/io/transfer.c b/libgfortran/io/transfer.c
index 2232417a..e5805772 100644
--- a/libgfortran/io/transfer.c
+++ b/libgfortran/io/transfer.c
@@ -267,8 +267,9 @@ read_sf_internal (st_parameter_dt *dtp, int * length)
 
   dtp->u.p.current_unit->bytes_left -= *length;
 
-  if ((dtp->common.flags & IOPARM_DT_HAS_SIZE) != 0)
-dtp->u.p.size_used += (GFC_IO_INT) *length;
+  if (((dtp->common.flags & IOPARM_DT_HAS_SIZE) != 0) ||
+  dtp->u.p.current_unit->has_size)
+dtp->u.p.current_unit->size_used += (GFC_IO_INT) *length;
 
   return base;
 
@@ -397,8 +398,9 @@ read_sf (st_parameter_dt *dtp, int * length)
 
   dtp->u.p.current_unit->bytes_left -= n;
 
-  if ((dtp->common.flags & IOPARM_DT_HAS_SIZE) != 0)
-dtp->u.p.size_used += (GFC_IO_INT) n;
+  if (((dtp->common.flags & IOPARM_DT_HAS_SIZE) != 0) ||
+  dtp->u.p.current_unit->has_size)
+dtp->u.p.current_unit->size_used += (GFC_IO_INT) n;
 
   /* We can't call fbuf_getptr before the loop doing fbuf_getc, because
  fbuf_getc might reallocate the buffer.  So return current pointer
@@ -478,8 +480,9 @@ read_block_form (st_parameter_dt *dtp, int * nbytes)
   source = fbuf_read (dtp->u.p.current_unit, nbytes);
   fbuf_seek (dtp->u.p.current_unit, *nbytes, SEEK_CUR);
 
-  if ((dtp->common.flags & IOPARM_DT_HAS_SIZE) != 0)
-dtp->u.p.size_used += (GFC_IO_INT) *nbytes;
+  if (((dtp->common.flags & IOPARM_DT_HAS_SIZE) != 0) ||
+  dtp->u.p.current_unit->has_size)
+dtp->u.p.current_unit->size_used += (GFC_IO_INT) *nbytes;
 
   if (norig != *nbytes)
 {
@@ -536,8 +539,9 @@ read_block_form4 (st_parameter_dt *dtp, int * nbytes)
 
   dtp->u.p.current_unit->bytes_left -= *nbytes;
 
-  if ((dtp->common.flags & IOPARM_DT_HAS_SIZE) != 0)
-dtp->u.p.size_used += (GFC_IO_INT) *nbytes;
+  if (((dtp->c

[PATCH] xtensa: don't use unwind-dw2-fde-dip with elf targets

2016-10-17 Thread Max Filippov
Define LIB2ADDEH_XTENSA_UNWIND_DW2_FDE to unwind-dw2-fde.c in
xtensa/t-elf and to unwind-dw2-fde-dip.c in xtensa/t-linux and use
LIB2ADDEH_XTENSA_UNWIND_DW2_FDE in LIB2ADDEH definition.

2016-10-17  Max Filippov  
libgcc/
* config/xtensa/t-elf (LIB2ADDEH_XTENSA_UNWIND_DW2_FDE): New
definition.
* config/xtensa/t-linux (LIB2ADDEH_XTENSA_UNWIND_DW2_FDE): New
definition.
* config/xtensa/t-windowed (LIB2ADDEH): Use
LIB2ADDEH_XTENSA_UNWIND_DW2_FDE defined by either xtensa/t-elf
or xtensa/t-linux.

Signed-off-by: Max Filippov 
---
 libgcc/config/xtensa/t-elf  | 2 ++
 libgcc/config/xtensa/t-linux| 2 ++
 libgcc/config/xtensa/t-windowed | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/xtensa/t-elf b/libgcc/config/xtensa/t-elf
index 59d5121..967cf9b 100644
--- a/libgcc/config/xtensa/t-elf
+++ b/libgcc/config/xtensa/t-elf
@@ -3,3 +3,5 @@ CRTSTUFF_T_CFLAGS += -mlongcalls
 CRTSTUFF_T_CFLAGS_S += -mlongcalls
 
 HOST_LIBGCC2_CFLAGS += -mlongcalls
+
+LIB2ADDEH_XTENSA_UNWIND_DW2_FDE = $(srcdir)/unwind-dw2-fde.c
diff --git a/libgcc/config/xtensa/t-linux b/libgcc/config/xtensa/t-linux
index 6f4ae89..412ecca 100644
--- a/libgcc/config/xtensa/t-linux
+++ b/libgcc/config/xtensa/t-linux
@@ -1 +1,3 @@
 SHLIB_MAPFILES += $(srcdir)/config/xtensa/libgcc-glibc.ver
+
+LIB2ADDEH_XTENSA_UNWIND_DW2_FDE = $(srcdir)/unwind-dw2-fde-dip.c
diff --git a/libgcc/config/xtensa/t-windowed b/libgcc/config/xtensa/t-windowed
index a99156c..f140136 100644
--- a/libgcc/config/xtensa/t-windowed
+++ b/libgcc/config/xtensa/t-windowed
@@ -1,2 +1,2 @@
 LIB2ADDEH = $(srcdir)/config/xtensa/unwind-dw2-xtensa.c \
-   $(srcdir)/unwind-dw2-fde-dip.c $(srcdir)/unwind-sjlj.c $(srcdir)/unwind-c.c
+   $(LIB2ADDEH_XTENSA_UNWIND_DW2_FDE) $(srcdir)/unwind-sjlj.c 
$(srcdir)/unwind-c.c
-- 
2.1.4



Re: [PATCH] rs6000: Fix separate shrink-wrapping for TARGET_MULTIPLE

2016-10-17 Thread Iain Sandoe
Hi Segher,

> On 17 Oct 2016, at 09:13, Segher Boessenkool  
> wrote:
> 
> We cannot use {SAVE,REST}_MULTIPLE and separate shrink-wrapping together,
> not without checking when actually emitting the prologue/epilogue that the
> registers to save/restore are actually still one contiguous block up to
> (and including) 31.  So either:
> 
> 1) We delay the decision of whether to use lmw/stmw to later;
> 2) We disallow shrink-wrapping separate (integer) components when those
> strategies are selected; or
> 3) We don't use those strategies if we use separate shrink-wrapping.
> 
> This patch does 3).  In the long term it may be best to do 1) instead,
> it can be slightly more efficient.
> 
> This caused problems on darwin (it is the only config that uses lmw/stmw
> instructions by default).
> 
> Bootstrapped and tested on powerpc64-linux {-m64,-m32}.  I'll commit it
> if Iain's testing (on darwin) also succeeds.

thanks!

All-langs bootstrap was restored with the patch (and others in progress for 
existing known issues); 

I can’t see any evidence of the assert firing in the test-suite, so it all 
looks good to me (there’s quite a bit of stage-1-ish testsuite noise at 
present, however).

cheers
Iain
> 
> 
> 2016-10-17  Segher Boessenkool  
> 
>   * config/rs6000/rs6000.c: Add include of shrink-wrap.h .
>   (rs6000_savres_strategy): Do not select {SAVE,REST}_MULTIPLE if
>   shrink-wrapping separate components.
>   (rs6000_get_separate_components): Assert we do not have those
>   strategies selected.
> 
> ---
> gcc/config/rs6000/rs6000.c | 9 -
> 1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index df48980..83a7139 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -64,6 +64,7 @@
> #include "target-globals.h"
> #include "builtins.h"
> #include "context.h"
> +#include "shrink-wrap.h"
> #include "tree-pass.h"
> #if TARGET_XCOFF
> #include "xcoffout.h"  /* get declarations of xcoff_*_section_name */
> @@ -25506,7 +25507,10 @@ rs6000_savres_strategy (rs6000_stack_t *info,
>   if (TARGET_MULTIPLE
>   && !TARGET_POWERPC64
>   && !(TARGET_SPE_ABI && info->spe_64bit_regs_used)
> -  && info->first_gp_reg_save < 31)
> +  && info->first_gp_reg_save < 31
> +  && !(SHRINK_WRAPPING_ENABLED
> +&& flag_shrink_wrap_separate
> +&& optimize_function_for_speed_p (cfun)))
> {
>   /* Prefer store multiple for saves over out-of-line routines,
>since the store-multiple instruction will always be smaller.  */
> @@ -27440,6 +27444,9 @@ rs6000_get_separate_components (void)
>   sbitmap components = sbitmap_alloc (32);
>   bitmap_clear (components);
> 
> +  gcc_assert (!(info->savres_strategy & SAVE_MULTIPLE)
> +   && !(info->savres_strategy & REST_MULTIPLE));
> +
>   /* The GPRs we need saved to the frame.  */
>   if ((info->savres_strategy & SAVE_INLINE_GPRS)
>   && (info->savres_strategy & REST_INLINE_GPRS))
> -- 
> 1.9.3
> 



[PATCH] Remove leftover comment

2016-10-17 Thread Bill Schmidt
Whoops.  Discovered I had inadvertently left in a comment that made no
sense when fixing PR77916.  This patch removes it.  Committed as obvious.

Thanks,
Bill


2016-10-17  Bill Schmidt  

* gimple-ssa-strength-reduction.c (record_increment): Remove
garbage comment.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 241281)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -2507,9 +2507,7 @@ record_increment (slsr_cand_t c, widest_int increm
 as providing an initializer (if it does); we will revise this
 opinion later if it doesn't dominate all other occurrences.
  Exception:  increments of 0, 1 never need initializers;
-and phi adjustments don't ever provide initializers.  Note
- that we only will see an increment of -1 here for pointer
-arithmetic (otherwise we will have an initializer).  */
+and phi adjustments don't ever provide initializers.  */
   if (c->kind == CAND_ADD
  && !is_phi_adjust
  && c->index == increment



Re: [patch v2] Get rid of stack trampolines for nested functions (0/4)

2016-10-17 Thread Eric Botcazou
> On ia64 I get this regression:
> 
> FAIL: gcc.dg/Wtrampolines.c  (test for warnings, line 31)
> 
> Since ia64 never uses trampolines this is probably ok and the test
> should be adjusted.

This actually revealed a problem: the same regression should have appeared on 
PowerPC64/Linux (ELFv1 ABI) but it didn't because I botched the hookization of 
TARGET_CUSTOM_FUNCTION_DESCRIPTORS, which started as a good old macro:

/* Use custom descriptors instead of trampolines if not AIX or ELFv1.  */
#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS (DEFAULT_ABI != ABI_AIX)

doesn't work as intended on PowerPC64 because DEFAULT_ABI is a variable...

Corrective patch attached, tested on x86/Linux, x86-64/Linux, PowerPC/Linux, 
PowerPC64/Linux, IA-64/Linux and SPARC/Solaris, applied as obvious.


* config/i386/i386.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Move to...
* config/i386/i386.c (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): ...here.
* config/ia64/ia64.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Move to...
* config/ia64/ia64.c (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): ...here.
* config/rs6000/rs6000.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Move
to...
* config/rs6000/rs6000.c (TARGET_CUSTOM_FUNCTION_DESCRIPTORS):...here.
(rs6000_option_override_internal): Clear it if ABI_AIX.
* config/sparc/sparc.h (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Move
to...
* config/sparc/sparc.c (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): ...here.
testsuite/
* gcc.dg/Wtrampolines.c: XFAIL warning on ia64-*-* and powerpc64-*-*.
* gnat.dg/trampoline4.adb: Minor tweak.

-- 
Eric BotcazouIndex: config/i386/i386.c
===
--- config/i386/i386.c	(revision 241222)
+++ config/i386/i386.c	(working copy)
@@ -50833,6 +50833,9 @@ ix86_addr_space_zero_address_valid (addr
 #undef TARGET_HARD_REGNO_SCRATCH_OK
 #define TARGET_HARD_REGNO_SCRATCH_OK ix86_hard_regno_scratch_ok
 
+#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
+#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-i386.h"
Index: config/i386/i386.h
===
--- config/i386/i386.h	(revision 241222)
+++ config/i386/i386.h	(working copy)
@@ -2666,9 +2666,6 @@ extern void debug_dispatch_window (int);
 
 #define TARGET_SUPPORTS_WIDE_INT 1
 
-/* Use custom descriptors instead of trampolines when possible.  */
-#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
-
 /*
 Local variables:
 version-control: t
Index: config/ia64/ia64.c
===
--- config/ia64/ia64.c	(revision 241222)
+++ config/ia64/ia64.c	(working copy)
@@ -649,6 +649,9 @@ static const struct attribute_spec ia64_
 #undef TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P
 #define TARGET_ATTRIBUTE_TAKES_IDENTIFIER_P ia64_attribute_takes_identifier_p
 
+#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
+#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 0
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 /* Returns TRUE iff the target attribute indicated by ATTR_ID takes a plain
Index: config/ia64/ia64.h
===
--- config/ia64/ia64.h	(revision 241222)
+++ config/ia64/ia64.h	(working copy)
@@ -1715,7 +1715,4 @@ struct GTY(()) machine_function
 /* Switch on code for querying unit reservations.  */
 #define CPU_UNITS_QUERY 1
 
-/* IA-64 already uses descriptors for its standard calling sequence.  */
-#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 0
-
 /* End of ia64.h */
Index: config/rs6000/rs6000.c
===
--- config/rs6000/rs6000.c	(revision 241222)
+++ config/rs6000/rs6000.c	(working copy)
@@ -1867,6 +1867,9 @@ static const struct attribute_spec rs600
 
 #undef TARGET_OPTAB_SUPPORTED_P
 #define TARGET_OPTAB_SUPPORTED_P rs6000_optab_supported_p
+
+#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
+#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
 
 
 /* Processor table.  */
@@ -4862,6 +4865,10 @@ rs6000_option_override_internal (bool gl
 	 Linux and Darwin ABIs at the moment.  For now, only AIX is fixed.  */
   if (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN)
 	targetm.calls.split_complex_arg = NULL;
+
+  /* The AIX and ELFv1 ABIs define standard function descriptors.  */
+  if (DEFAULT_ABI == ABI_AIX)
+	targetm.calls.custom_function_descriptors = 0;
 }
 
   /* Initialize rs6000_cost with the appropriate target costs.  */
Index: config/rs6000/rs6000.h
===
--- config/rs6000/rs6000.h	(revision 241222)
+++ config/rs6000/rs6000.h	(working copy)
@@ -2922,9 +2922,6 @@ extern GTY(()) tree rs6000_builtin_decls
 
 #define TARGET_SUPPORTS_WIDE_INT 1
 
-/* Use custom descriptors instead of trampolines if not AIX or ELFv1.  */
-#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS (DEFAULT_ABI != ABI_AIX

[PATCH] Fix PR77916

2016-10-17 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77916 identifies a situation
where SLSR will ICE when exposed to a cast from integer to pointer.  This
is because we try to convert a PLUS_EXPR with an addend of -1 * S into a
MINUS_EXPR with a subtrahend of S, but the base operand is unexpectedly
of pointer type.  This patch recognizes when pointer arithmetic is taking
place and ensures that we use a POINTER_PLUS_EXPR at all such times.  In
the case of the PR, this occurs in the logic where the stride S is a known
constant value, but the same problem could occur when it is an SSA_NAME
without known value.  Both possibilities are handled here.

Fixing the code to ensure that the unknown stride case always uses an 
initializer for a negative increment allows us to remove the stopgap fix
added for PR77937 as well.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions, committed.

Thanks,
Bill


[gcc]

2016-10-17  Bill Schmidt  

PR tree-optimization/77916
* gimple-ssa-strength-reduction.c (create_add_on_incoming_edge):
Don't allow a MINUS_EXPR for pointer arithmetic for either known
or unknown strides.
(record_increment): Increments of -1 for unknown strides just use
a multiply initializer like other negative values.
(analyze_increments): Remove stopgap solution for -1 increment
applied to pointer arithmetic.

[gcc/testsuite]

2016-10-17  Bill Schmidt  

PR tree-optimization/77916
* gcc.dg/torture/pr77916.c: New.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 241245)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -2154,35 +2154,41 @@ create_add_on_incoming_edge (slsr_cand_t c, tree b
   basis_type = TREE_TYPE (basis_name);
   lhs = make_temp_ssa_name (basis_type, NULL, "slsr");
 
+  /* Occasionally people convert integers to pointers without a 
+ cast, leading us into trouble if we aren't careful.  */
+  enum tree_code plus_code
+= POINTER_TYPE_P (basis_type) ? POINTER_PLUS_EXPR : PLUS_EXPR;
+
   if (known_stride)
 {
   tree bump_tree;
-  enum tree_code code = PLUS_EXPR;
+  enum tree_code code = plus_code;
   widest_int bump = increment * wi::to_widest (c->stride);
-  if (wi::neg_p (bump))
+  if (wi::neg_p (bump) && !POINTER_TYPE_P (basis_type))
{
  code = MINUS_EXPR;
  bump = -bump;
}
 
-  bump_tree = wide_int_to_tree (basis_type, bump);
+  tree stride_type = POINTER_TYPE_P (basis_type) ? sizetype : basis_type;
+  bump_tree = wide_int_to_tree (stride_type, bump);
   new_stmt = gimple_build_assign (lhs, code, basis_name, bump_tree);
 }
   else
 {
   int i;
-  bool negate_incr = (!address_arithmetic_p && wi::neg_p (increment));
+  bool negate_incr = !POINTER_TYPE_P (basis_type) && wi::neg_p (increment);
   i = incr_vec_index (negate_incr ? -increment : increment);
   gcc_assert (i >= 0);
 
   if (incr_vec[i].initializer)
{
- enum tree_code code = negate_incr ? MINUS_EXPR : PLUS_EXPR;
+ enum tree_code code = negate_incr ? MINUS_EXPR : plus_code;
  new_stmt = gimple_build_assign (lhs, code, basis_name,
  incr_vec[i].initializer);
}
   else if (increment == 1)
-   new_stmt = gimple_build_assign (lhs, PLUS_EXPR, basis_name, c->stride);
+   new_stmt = gimple_build_assign (lhs, plus_code, basis_name, c->stride);
   else if (increment == -1)
new_stmt = gimple_build_assign (lhs, MINUS_EXPR, basis_name,
c->stride);
@@ -2500,12 +2506,14 @@ record_increment (slsr_cand_t c, widest_int increm
   /* Optimistically record the first occurrence of this increment
 as providing an initializer (if it does); we will revise this
 opinion later if it doesn't dominate all other occurrences.
- Exception:  increments of -1, 0, 1 never need initializers;
-and phi adjustments don't ever provide initializers.  */
+ Exception:  increments of 0, 1 never need initializers;
+and phi adjustments don't ever provide initializers.  Note
+ that we only will see an increment of -1 here for pointer
+arithmetic (otherwise we will have an initializer).  */
   if (c->kind == CAND_ADD
  && !is_phi_adjust
  && c->index == increment
- && (increment > 1 || increment < -1)
+ && (increment > 1 || increment < 0)
  && (gimple_assign_rhs_code (c->cand_stmt) == PLUS_EXPR
  || gimple_assign_rhs_code (c->cand_stmt) == POINTER_PLUS_EXPR))
{
@@ -2819,11 +2827,6 @@ analyze_increments (slsr_cand_t first_dep, machine
   && !POINTER_TYPE_P (first_dep->cand_type)))
incr_vec[i].cost = COST_NEUTRAL;
 
-  /* FIXME: We don't handle poin

Re: [PATCH] fix outstanding -Wformat-length failures (pr77735 et al.)

2016-10-17 Thread Martin Sebor



On 10/17/2016 01:11 PM, Jeff Law wrote:

On 10/02/2016 02:10 PM, Martin Sebor wrote:

The attached patch fixes a number of outstanding test failures
and ILP32-related bugs in the gimple-ssa-sprintf pattch pointed
out in bug 77676 and 77735).  The patch also fixes c_strlen to
correctly handle wide strings (previously it accepted them but
treated them as nul-terminated byte sequences), and adjusts the
handling of "%a" to avoid assuming a specific number of decimal
digits (this is likely a defect in C11 that I'm pursuing with
WG14).

Tested on powerpc64le, i386, and x86_64.

There is one outstanding failure in the builtin-sprintf-warn-1.c
test on powerpc64le that looks like it might be due to the
printf_pointer_format target hook not having been set up entirely
correctly.  I'll look into that separately, along with pr77819.

Martin

gcc-77735.diff


PR middle-end/77735 - FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-1.c

gcc/ChangeLog:
2016-10-02  Martin Sebor  

PR middle-end/77735
* builtins.c (string_length): New function.
(c_strlen): Use string_length.  Correctly handle wide strings.
* gimple-ssa-sprintf.c (target_max_value, target_size_max): New
functions.
(target_int_max): Call target_max_value.
(format_result::knownrange): New data member.
(fmtresult::fmtresult): Define default constructor.
(format_integer): Use it and set format_result::knownrange.
Handle global constants.
(format_floating_max): Add third argument.
(format_floating): Recompute maximum value for %a for each argument.
(get_string_length): Use fmtresult default ctor.
(format_string): Set format_result::knownrange.
(format_directive): Check format_result::knownrange.
(add_bytes): Same.  Correct caret placement in diagnostics.
(pass_sprintf_length::compute_format_length): Set
format_result::knownrange.
(pass_sprintf_length::handle_gimple_call): Use target_size_max.

gcc/testsuite/ChangeLog:
2016-10-02  Martin Sebor  

PR middle-end/77735
* gcc.dg/tree-ssa/builtin-sprintf-2.c: Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Adjust/relax.
* gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: XFAIL for LP64 only.
* gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 92f939e..410bbc3 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1281,9 +1330,9 @@ format_floating (const conversion_spec &spec,
int width, int prec)
 res.range.max = width;
 }

-  /* The argument is only considered bounded when the range of output
- bytes is exact.  */
-  res.bounded = res.range.min == res.range.max;
+  /* The argument is considered bounded when the output of a directive
+ is fully specified and the range of output bytes is exact.  */
+  // res.bounded &= res.range.min == res.range.max;

Did you mean to leave this commented out?


I'm pretty sure I didn't mean to leave the code commented out like
that.  I suspect I commented it out to confirm that it's not needed,
and forgot to remove it after it passed my testing.  Let me remove
it (and thanks for pointing it out!)


It looks like you're defining a constructor for "struct fmtresult".
Normally once we define a constructor we turn the underlying object into
a class.It appears that the constructor leaves argmin, argmax,
knownrange, bounded & constant uninitialized?  Am I missing something here?


I added the default ctor to avoid having to explicitly clear the
members.  The member() syntax means to default-initialize them by
setting them to zero or null:

 struct fmtresult
 {
+  fmtresult ()
+  : argmin (), argmax (), knownrange (), bounded (), constant ()
+  {
+range.min = range.max = HOST_WIDE_INT_MAX;
+  }
+

Or did you have something else in mind?  (I want to leave the members
public so they can conveniently accessed.)





+  /* A plain '%c' directive.  Its ouput is excactly 1.  */

Typo for exactly.



Will fix.

With that and with the comment above removed, would like me to post
an updated patch or is it okay commit?

Martin


[PATCH] DWARF5 DW_FORM_implicit_const support

2016-10-17 Thread Jakub Jelinek
Hi!

This patch starts using DW_FORM_implicit_const (new in DWARF5), though only
for cases where the constants are the same among all DIEs with the same
abbreviation (and only for abbrevs used only in the main compilation unit,
because right now we don't give all DIEs abbreviations first before
outputing anything.  I think more precise optimizations could be left to DWZ
(when it sees the whole binary or library, it can better simulate different
decisions like using two different abbreviations for different subsets of
DIEs which have the same attributes and forms, but differ in some const
values, etc.

The patch also switches the abbrev table from GC registered array with used
and allocated integers to normal va_gc vector.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-10-17  Jakub Jelinek  

* dwarf2out.h (enum dw_val_class): Add dw_val_class_const_implicit,
dw_val_class_unsigned_const_implicit and dw_val_class_file_implicit.
(struct dw_val_node): Add val_file_implicit field.
* dwarf2out.c (dw_val_equal_p, print_dw_val, attr_checksum,
attr_checksum_ordered, same_dw_val_p, size_of_die, value_format,
output_die): Handle dw_val_class_const_implicit,
dw_val_class_unsigned_const_implicit and dw_val_class_file_implicit.
(abbrev_die_table): Change into va_gc vec.
(abbrev_die_table_allocated, abbrev_die_table_in_use,
ABBREV_DIE_TABLE_INCREMENT): Remove.
(AT_int, AT_unsigned, AT_file): Allow dw_val_class_*_implicit.
(abbrev_opt_start, abbrev_usage_count, abbrev_dies): New variables.
(build_abbrev_table): Adjust for abbrev_die_table being a va_gc vec.
If abbrev_opt_start, fill in abbrev_usage_count and abbrev_dies
vectors.
(die_abbrev_cmp, optimize_implicit_const, optimize_abbrev_table): New
functions.
(output_die_abbrevs): For DW_FORM_implicit_const emit sleb128 with
the implicit value.
(output_abbrev_section): Adjust for abbrev_die_table being a va_gc
vec.
(output_comp_unit): Initialize abbrev_opt_start if emitting the main
unit.  Call optimize_abbrev_table.
(dwarf2out_init, dwarf2out_finish, dwarf2out_c_finalize): Adjust for
abbrev_die_table being a va_gc vec.

--- gcc/dwarf2out.h.jj  2016-04-28 17:26:05.0 +0200
+++ gcc/dwarf2out.h 2016-10-17 19:07:45.425312870 +0200
@@ -153,7 +153,10 @@ enum dw_val_class
   dw_val_class_vms_delta,
   dw_val_class_high_pc,
   dw_val_class_discr_value,
-  dw_val_class_discr_list
+  dw_val_class_discr_list,
+  dw_val_class_const_implicit,
+  dw_val_class_unsigned_const_implicit,
+  dw_val_class_file_implicit
 };
 
 /* Describe a floating point constant value, or a vector constant value.  */
@@ -198,7 +201,8 @@ struct GTY(()) dw_val_node {
   dw_loc_list_ref GTY ((tag ("dw_val_class_loc_list"))) val_loc_list;
   dw_loc_descr_ref GTY ((tag ("dw_val_class_loc"))) val_loc;
   HOST_WIDE_INT GTY ((default)) val_int;
-  unsigned HOST_WIDE_INT GTY ((tag ("dw_val_class_unsigned_const"))) 
val_unsigned;
+  unsigned HOST_WIDE_INT
+   GTY ((tag ("dw_val_class_unsigned_const"))) val_unsigned;
   double_int GTY ((tag ("dw_val_class_const_double"))) val_double;
   wide_int_ptr GTY ((tag ("dw_val_class_wide_int"))) val_wide;
   dw_vec_const GTY ((tag ("dw_val_class_vec"))) val_vec;
@@ -212,6 +216,8 @@ struct GTY(()) dw_val_node {
   char * GTY ((tag ("dw_val_class_lbl_id"))) val_lbl_id;
   unsigned char GTY ((tag ("dw_val_class_flag"))) val_flag;
   struct dwarf_file_data * GTY ((tag ("dw_val_class_file"))) val_file;
+  struct dwarf_file_data *
+   GTY ((tag ("dw_val_class_file_implicit"))) val_file_implicit;
   unsigned char GTY ((tag ("dw_val_class_data8"))) val_data8[8];
   tree GTY ((tag ("dw_val_class_decl_ref"))) val_decl_ref;
   struct dw_val_vms_delta_union
--- gcc/dwarf2out.c.jj  2016-10-17 14:46:46.0 +0200
+++ gcc/dwarf2out.c 2016-10-17 19:17:54.350626382 +0200
@@ -1363,6 +1363,8 @@ dw_val_equal_p (dw_val_node *a, dw_val_n
 case dw_val_class_offset:
 case dw_val_class_unsigned_const:
 case dw_val_class_const:
+case dw_val_class_unsigned_const_implicit:
+case dw_val_class_const_implicit:
 case dw_val_class_range_list:
 case dw_val_class_lineptr:
 case dw_val_class_macptr:
@@ -1385,6 +1387,7 @@ dw_val_equal_p (dw_val_node *a, dw_val_n
 case dw_val_class_flag:
   return a->v.val_flag == b->v.val_flag;
 case dw_val_class_file:
+case dw_val_class_file_implicit:
   return a->v.val_file == b->v.val_file;
 case dw_val_class_decl_ref:
   return a->v.val_decl_ref == b->v.val_decl_ref;
@@ -3006,17 +3009,9 @@ struct dw_loc_list_hasher : ggc_ptr_hash
 /* Table of cached location lists.  */
 static GTY (()) hash_table *cached_dw_loc_list_table;
 
-/* A pointer to the base of a list of references to DIE's that
-   are uniquely i

Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-17 Thread Mike Stump
On Oct 17, 2016, at 2:38 PM, Ximin Luo  wrote:
> 
> Mike Stump:
>> On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
>>> Therefore, it is better to emit it in all circumstances, in case the reader 
>>> needs to know what the working
>>> directory was at compile-time.
>> 
>> I can't help but wonder if this would break ccache some?
>> 
> 
> Could you explain this in some more detail? At the moment, GCC will already 
> emit DW_AT_name with an absolute path (in the scenario that this patch is 
> relevant to). How does ccache work around this at the moment? (Does it use 
> debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my 
> patch should be fine.)

If you compile the same file, but in a different directory, I wonder if cwd 
will cause the cache entry to not be reused.

I expect one of the ccache people that are around would just know if it will 
care at all.



[PATCH] DWARF5 DW_FORM_data16 support

2016-10-17 Thread Jakub Jelinek
Hi!

DWARF5 has a new 128-bit constant class form, this patch uses it
e.g. for __int128 values.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-10-17  Jakub Jelinek  

* dwarf2out.c (size_of_die, value_format, output_die): Use
DW_FORM_data16 for 128-bit dw_val_class_const_double or
dw_val_class_wide_int.

--- gcc/dwarf2out.c.jj  2016-10-17 11:45:03.0 +0200
+++ gcc/dwarf2out.c 2016-10-17 14:46:46.369083146 +0200
@@ -8591,14 +8591,14 @@ size_of_die (dw_die_ref die)
  break;
case dw_val_class_const_double:
  size += HOST_BITS_PER_DOUBLE_INT / HOST_BITS_PER_CHAR;
- if (HOST_BITS_PER_WIDE_INT >= 64)
+ if (HOST_BITS_PER_WIDE_INT >= (dwarf_version >= 5 ? 128 : 64))
size++; /* block */
  break;
case dw_val_class_wide_int:
  size += (get_full_len (*a->dw_attr_val.v.val_wide)
   * HOST_BITS_PER_WIDE_INT / HOST_BITS_PER_CHAR);
  if (get_full_len (*a->dw_attr_val.v.val_wide) * HOST_BITS_PER_WIDE_INT
- > 64)
+ > (dwarf_version >= 5 ? 128 : 64))
size++; /* block */
  break;
case dw_val_class_vec:
@@ -8979,6 +8979,9 @@ value_format (dw_attr_node *a)
case 32:
  return DW_FORM_data8;
case 64:
+ if (dwarf_version >= 5)
+   return DW_FORM_data16;
+ /* FALLTHRU */
default:
  return DW_FORM_block1;
}
@@ -8993,6 +8996,10 @@ value_format (dw_attr_node *a)
  return DW_FORM_data4;
case 64:
  return DW_FORM_data8;
+   case 128:
+ if (dwarf_version >= 5)
+   return DW_FORM_data16;
+ /* FALLTHRU */
default:
  return DW_FORM_block1;
}
@@ -9439,7 +9446,7 @@ output_die (dw_die_ref die)
  {
unsigned HOST_WIDE_INT first, second;
 
-   if (HOST_BITS_PER_WIDE_INT >= 64)
+   if (HOST_BITS_PER_WIDE_INT >= (dwarf_version >= 5 ? 128 : 64))
  dw2_asm_output_data (1,
   HOST_BITS_PER_DOUBLE_INT
   / HOST_BITS_PER_CHAR,
@@ -9468,9 +9475,9 @@ output_die (dw_die_ref die)
int i;
int len = get_full_len (*a->dw_attr_val.v.val_wide);
int l = HOST_BITS_PER_WIDE_INT / HOST_BITS_PER_CHAR;
-   if (len * HOST_BITS_PER_WIDE_INT > 64)
- dw2_asm_output_data (1, get_full_len (*a->dw_attr_val.v.val_wide) 
* l,
-  NULL);
+   if (len * HOST_BITS_PER_WIDE_INT > (dwarf_version >= 5 ? 128 : 64))
+ dw2_asm_output_data (1, get_full_len (*a->dw_attr_val.v.val_wide)
+ * l, NULL);
 
if (WORDS_BIG_ENDIAN)
  for (i = len - 1; i >= 0; --i)

Jakub


Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-17 Thread Ximin Luo
Mike Stump:
> On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
>> Therefore, it is better to emit it in all circumstances, in case the reader 
>> needs to know what the working
>> directory was at compile-time.
> 
> I can't help but wonder if this would break ccache some?
> 

Could you explain this in some more detail? At the moment, GCC will already 
emit DW_AT_name with an absolute path (in the scenario that this patch is 
relevant to). How does ccache work around this at the moment? (Does it use 
debug-prefix-map? In which case, this also affects DW_AT_comp_dir, so my patch 
should be fine.)

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git


[PATCH] DWARF5 DW_TAG_call_site support

2016-10-17 Thread Jakub Jelinek
Hi!

This is on top of the
http://gcc.gnu.org/ml/gcc-patches/2016-10/msg01202.html
patch.  This is yet another GNU extension that got accepted into DWARF5,
though in this case various attributes got renamed (other than just dropping
GNU_ middle parts), or extra attributes have been added instead of reusing
preexisting ones like DW_AT_location or DW_AT_abstract_origin.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-10-17  Jakub Jelinek  

* dwarf2out.c (dwarf_op): Renamed to ...
(dwarf_OP): ... this.
(convert_descriptor_to_mode, scompare_loc_descriptor,
minmax_loc_descriptor, typed_binop, mem_loc_descriptor,
implicit_ptr_descriptor, optimize_one_addr_into_implicit_ptr): Adjust
callers.
(dwarf_AT, dwarf_TAG): New functions.
(check_die): Disallow DW_AT_call_all_calls next to
DW_AT_GNU_all_call_sites.
(gen_call_site_die): Use dwarf_TAG and dwarf_AT with DWARF 5 tag
and attributes instead of the corresponding GNU tag and attributes.
(gen_subprogram_die): Likewise.  Emit call site information even
for -gdwarf-5 -gstrict-dwarf.  Replace DW_AT_GNU_defaulted with
DW_AT_defaulted in comment.
(resolve_addr): Handle DW_AT_call_origin attribute on
DW_TAG_call_site DIE like DW_AT_abstract_origin on
DW_TAG_GNU_call_site DIE.

--- gcc/dwarf2out.c.jj  2016-10-17 10:03:14.0 +0200
+++ gcc/dwarf2out.c 2016-10-17 11:45:03.113680457 +0200
@@ -1517,7 +1517,7 @@ loc_list_plus_const (dw_loc_list_ref lis
 /* Utility inline function for construction of ops that were GNU extension
before DWARF 5.  */
 static inline enum dwarf_location_atom
-dwarf_op (enum dwarf_location_atom op)
+dwarf_OP (enum dwarf_location_atom op)
 {
   switch (op)
 {
@@ -1562,6 +1562,90 @@ dwarf_op (enum dwarf_location_atom op)
   return op;
 }
 
+/* Similarly for attributes.  */
+static inline enum dwarf_attribute
+dwarf_AT (enum dwarf_attribute at)
+{
+  switch (at)
+{
+case DW_AT_call_return_pc:
+  if (dwarf_version < 5)
+   return DW_AT_low_pc;
+  break;
+
+case DW_AT_call_tail_call:
+  if (dwarf_version < 5)
+   return DW_AT_GNU_tail_call;
+  break;
+
+case DW_AT_call_origin:
+  if (dwarf_version < 5)
+   return DW_AT_abstract_origin;
+  break;
+
+case DW_AT_call_target:
+  if (dwarf_version < 5)
+   return DW_AT_GNU_call_site_target;
+  break;
+
+case DW_AT_call_target_clobbered:
+  if (dwarf_version < 5)
+   return DW_AT_GNU_call_site_target_clobbered;
+  break;
+
+case DW_AT_call_parameter:
+  if (dwarf_version < 5)
+   return DW_AT_abstract_origin;
+  break;
+
+case DW_AT_call_value:
+  if (dwarf_version < 5)
+   return DW_AT_GNU_call_site_value;
+  break;
+
+case DW_AT_call_data_value:
+  if (dwarf_version < 5)
+   return DW_AT_GNU_call_site_data_value;
+  break;
+
+case DW_AT_call_all_calls:
+  if (dwarf_version < 5)
+   return DW_AT_GNU_all_call_sites;
+  break;
+
+case DW_AT_call_all_tail_calls:
+  if (dwarf_version < 5)
+   return DW_AT_GNU_all_tail_call_sites;
+  break;
+
+default:
+  break;
+}
+  return at;
+}
+
+/* And similarly for tags.  */
+static inline enum dwarf_tag
+dwarf_TAG (enum dwarf_tag tag)
+{
+  switch (tag)
+{
+case DW_TAG_call_site:
+  if (dwarf_version < 5)
+   return DW_TAG_GNU_call_site;
+  break;
+
+case DW_TAG_call_site_parameter:
+  if (dwarf_version < 5)
+   return DW_TAG_GNU_call_site_parameter;
+  break;
+
+default:
+  break;
+}
+  return tag;
+}
+
 static unsigned long int get_base_type_offset (dw_die_ref);
 
 /* Return the size of a location descriptor.  */
@@ -5998,6 +6082,7 @@ check_die (dw_die_ref die)
&& a->dw_attr != DW_AT_high_pc
&& a->dw_attr != DW_AT_location
&& a->dw_attr != DW_AT_frame_base
+   && a->dw_attr != DW_AT_call_all_calls
&& a->dw_attr != DW_AT_GNU_all_call_sites);
 }
 }
@@ -12696,13 +12781,13 @@ convert_descriptor_to_mode (machine_mode
 
   if (GET_MODE_SIZE (mode) <= DWARF2_ADDR_SIZE)
 {
-  add_loc_descr (&op, new_loc_descr (dwarf_op (DW_OP_convert), 0, 0));
+  add_loc_descr (&op, new_loc_descr (dwarf_OP (DW_OP_convert), 0, 0));
   return op;
 }
   type_die = base_type_for_mode (outer_mode, 1);
   if (type_die == NULL)
 return NULL;
-  cvt = new_loc_descr (dwarf_op (DW_OP_convert), 0, 0);
+  cvt = new_loc_descr (dwarf_OP (DW_OP_convert), 0, 0);
   cvt->dw_loc_oprnd1.val_class = dw_val_class_die_ref;
   cvt->dw_loc_oprnd1.v.val_die_ref.die = type_die;
   cvt->dw_loc_oprnd1.v.val_die_ref.external = 0;
@@ -12767,12 +12852,12 @@ scompare_loc_descriptor (enum dwarf_loca
 
   if (type_die == NULL)
return NULL;
-  cvt = new_loc_descr (dwarf_op (DW_OP_convert), 0, 0)

Re: [PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-17 Thread Mike Stump
On Oct 17, 2016, at 11:00 AM, Ximin Luo  wrote:
> Therefore, it is better to emit it in all circumstances, in case the reader 
> needs to know what the working
> directory was at compile-time.

I can't help but wonder if this would break ccache some?

Re: RFC [1/3] divmod transform v2

2016-10-17 Thread Jeff Law

On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:

This patch is mostly the same as previous one, except it drops
targeting __udivmoddi4() because it gave undefined reference link
error for calling __udivmoddi4() on aarch64-linux-gnu. It appears
aarch64 has hardware insn for DImode div, so __udivmoddi4() isn't
needed for the target (it was a bug in my patch that called
__udivmoddi4() even though aarch64 supported hardware div).
This touches on the one high level question I had.  Namely what is the 
code generation strategy if the hardware has a div, but not divmod.


ISTM in that case I think we want to use the div instruction and 
synthesize mod from that result rather than relying on a software 
divmod.  So it looks like you ought to be doing the right thing for that 
case now based on your comment above.


However this makes me wonder if it's guaranteed that __udivmoddi4()
will be available for a target if it doesn't have hardware div and
divmod insn and doesn't have target-specific libfunc for DImode
divmod ? To be conservative, the attached patch doesn't generate call
to __udivmoddi4.
I don't think that's a safe assumption.  Addition of the divmod routines 
into libgcc is controlled by the target and have to be explicitly added 
AFAICT.


So on a target like the fr30 which has no div or mod insn and doesn't 
define the right bits in libgcc, there is no divmod libcall available. 
(On these targets there's a div libcall and a mod libcall, but not a 
combined one).


I don't even think we have a way of knowing in the compiler if the 
target has enabled divmod support in libgcc.


ISTM that for now we have to limit to cases where we have a divmod 
insn/libcall defined.


jeff



Re: [PATCH] read-md.c: Move various state to within class rtx_reader (v3)

2016-10-17 Thread Richard Sandiford
David Malcolm  writes:
>> I assume you didn't turn the functions above into member functions
>> because they aren't primitives.  OTOH it might seem inconsistent
>> to have things like traverse_enum_types as member functions but
>> traverse_md_constants not.
>
> In the following I've converted traverse_md_constants to a member
> function.
>
> Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

OK, thanks.  And thanks for doing this -- nice clean-up!

Richard


Re: [PATCH] (PR 65950) Improve predicate for exit(0);

2016-10-17 Thread Andrew Pinski
On Wed, Sep 21, 2016 at 1:53 AM, Richard Biener
 wrote:
> On Wed, Sep 21, 2016 at 10:45 AM, Andrew Pinski  wrote:
>> On Wed, Sep 21, 2016 at 4:32 PM, Richard Biener
>>  wrote:
>>> On Wed, Sep 21, 2016 at 9:13 AM, Andrew Pinski  wrote:
 Hi,
   While looking through bug reports, I noticed someone had reported
 that LTO caused the vectorizer not to kick in.  Originally it was not
 obvious why it would change the behavior since this was a simple
 program with nothing out of the ordinary.  Well it turned out paths
 leading to the exit(0); at the end of main was being marked as cold
 and in the LTO case the funciton (which had loop which should have
 been vectorized) was considered local and being marked as cold as it
 was only called now from the path leading to the exit(0);  Since
 exit(0); is considered a normal exit path, there is no reason to mark
 it as being as a cold path; let the other predicate handle it.

 So this patch changes the predicate for exit(0) not to mark the paths
 leading up to that function call as being cold.  Note this patch only
 disables that when the argument is a literal zero so if we a PHI node
 which contains the exit value, we still cause those paths to be
 considered cold; this will be for another patch.
>>>
>>> Hmm, it also doesn't work for main calling a function not returning but 
>>> exiting
>>> with exit (0) (it will be discovered as noreturn).
>>>
>>> I don't think that treating exit (0) as generally not terminating a cold 
>>> code
>>> sequence is good either.
>>>
>>> Predictions are always hard I guess ...
>>>
>>> But the thing to improve is maybe the different handling of main with
>>> respect to the guessed profile -- this is something that should not happen
>>> inconsistently between LTO / non-LTO as main is special in all cases.  
>>> Honza?
>>
>> Maybe I did not explain it very well but what is happening is a
>> function which is only called in the cold path is marked as cold only
>> in the LTO case as it is extern call.. Basically with LTO, the
>> function becomes local so it can be marked as cold while without LTO,
>> it cannot.
>
> Ah, so you have
>
> foo () { loop }
> main()
> {
>   if ()
>{
>   foo ();
>   exit (0);
>}
> ...
>   return 0;
> }
>
> and foo is marked cold because its only call is on the path to exit (0)?


Actually the case I have here is just:
foo () { loop }
int main(void)
{
.
foo();
...
exit (0);
}

Just an exit at the end of main.
Basically if we treat exit(0) as a normal function, main becomes hot again.

Thanks,
Andrew

>
> noreturn prediction is quite aggressive but it works also quite well.  Given
> you can only detect a very minor fraction of cases (consider exit (0) being
> in foo itself) I'm not sure that your fix is good progress.  IMHO we might
> want to switch from 'noreturn' to 'noreturn' + likely error which needs
> another attribute / auto-discovery and IPA propagation to handle this case
> better.
>
> Anyway, I'll leave the patch to Honza.
>
> Richard.
>
>> Thanks,
>> Andrew
>>
>>>
>>> Richard.
>>>
 OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
 Also tested it with SPEC CPU 2006 with no performance regressions.

 Thanks,
 Andrew Pinski

 ChangeLog:
 * predict.c (is_exit_with_zero_arg): New function.
 (tree_bb_level_predictions): Don't consider paths leading to exit(0)
 as nottaken.


[PATCH 7/7] make targetm.gen_ccmp{first,next} take rtx_insn **

2016-10-17 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* ccmp.c (expand_ccmp_expr_1): Adjust.
(expand_ccmp_expr): Likewise.
(expand_ccmp_next): Likewise.
* config/aarch64/aarch64.c (aarch64_gen_ccmp_next): Likewise.
(aarch64_gen_ccmp_first): Likewise.
* doc/tm.texi: Regenerate.
* target.def (gen_ccmp_first): Change argument types to rtx_insn *.
(gen_ccmp_next): Likewise.
---
 gcc/ccmp.c   | 21 ++---
 gcc/config/aarch64/aarch64.c | 10 +-
 gcc/doc/tm.texi  |  4 ++--
 gcc/target.def   |  4 ++--
 4 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index 615b7e6..14222ca 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -122,7 +122,7 @@ ccmp_candidate_p (gimple *g)
GEN_SEQ returns all compare insns.  */
 static rtx
 expand_ccmp_next (gimple *g, tree_code code, rtx prev,
- rtx *prep_seq, rtx *gen_seq)
+ rtx_insn **prep_seq, rtx_insn **gen_seq)
 {
   rtx_code rcode;
   int unsignedp = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (g)));
@@ -149,10 +149,8 @@ expand_ccmp_next (gimple *g, tree_code code, rtx prev,
PREP_SEQ returns all insns to prepare opearand.
GEN_SEQ returns all compare insns.  */
 static rtx
-expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx *gen_seq)
+expand_ccmp_expr_1 (gimple *g, rtx_insn **prep_seq, rtx_insn **gen_seq)
 {
-  rtx prep_seq_1, gen_seq_1;
-  rtx prep_seq_2, gen_seq_2;
   tree exp = gimple_assign_rhs_to_tree (g);
   tree_code code = TREE_CODE (exp);
   gimple *gs0 = get_gimple_for_ssa_name (TREE_OPERAND (exp, 0));
@@ -180,6 +178,7 @@ expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx *gen_seq)
  rcode0 = get_rtx_code (code0, unsignedp0);
  rcode1 = get_rtx_code (code1, unsignedp1);
 
+ rtx_insn *prep_seq_1, *gen_seq_1;
  tmp = targetm.gen_ccmp_first (&prep_seq_1, &gen_seq_1, rcode0,
gimple_assign_rhs1 (gs0),
gimple_assign_rhs2 (gs0));
@@ -187,14 +186,15 @@ expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx 
*gen_seq)
  if (tmp != NULL)
{
  ret = expand_ccmp_next (gs1, code, tmp, &prep_seq_1, &gen_seq_1);
- cost1 = seq_cost (safe_as_a  (prep_seq_1), speed_p);
- cost1 += seq_cost (safe_as_a  (gen_seq_1), speed_p);
+ cost1 = seq_cost (prep_seq_1, speed_p);
+ cost1 += seq_cost (gen_seq_1, speed_p);
}
 
  /* FIXME: Temporary workaround for PR69619.
 Avoid exponential compile time due to expanding gs0 and gs1 twice.
 If gs0 and gs1 are complex, the cost will be high, so avoid
 reevaluation if above an arbitrary threshold.  */
+ rtx_insn *prep_seq_2, *gen_seq_2;
  if (tmp == NULL || cost1 < COSTS_N_INSNS (25))
tmp2 = targetm.gen_ccmp_first (&prep_seq_2, &gen_seq_2, rcode1,
   gimple_assign_rhs1 (gs1),
@@ -207,8 +207,8 @@ expand_ccmp_expr_1 (gimple *g, rtx *prep_seq, rtx *gen_seq)
{
  ret2 = expand_ccmp_next (gs0, code, tmp2, &prep_seq_2,
   &gen_seq_2);
- cost2 = seq_cost (safe_as_a  (prep_seq_2), speed_p);
- cost2 += seq_cost (safe_as_a  (gen_seq_2), speed_p);
+ cost2 = seq_cost (prep_seq_2, speed_p);
+ cost2 += seq_cost (gen_seq_2, speed_p);
}
 
  if (cost2 < cost1)
@@ -262,14 +262,13 @@ expand_ccmp_expr (gimple *g)
 {
   rtx_insn *last;
   rtx tmp;
-  rtx prep_seq, gen_seq;
-
-  prep_seq = gen_seq = NULL_RTX;
 
   if (!ccmp_candidate_p (g))
 return NULL_RTX;
 
   last = get_last_insn ();
+
+  rtx_insn *prep_seq = NULL, *gen_seq = NULL;
   tmp = expand_ccmp_expr_1 (g, &prep_seq, &gen_seq);
 
   if (tmp)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 74f1a6a..771421c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13160,7 +13160,7 @@ aarch64_use_by_pieces_infrastructure_p (unsigned 
HOST_WIDE_INT size,
 }
 
 static rtx
-aarch64_gen_ccmp_first (rtx *prep_seq, rtx *gen_seq,
+aarch64_gen_ccmp_first (rtx_insn **prep_seq, rtx_insn **gen_seq,
int code, tree treeop0, tree treeop1)
 {
   machine_mode op_mode, cmp_mode, cc_mode = CCmode;
@@ -13234,8 +13234,8 @@ aarch64_gen_ccmp_first (rtx *prep_seq, rtx *gen_seq,
 }
 
 static rtx
-aarch64_gen_ccmp_next (rtx *prep_seq, rtx *gen_seq, rtx prev, int cmp_code,
-  tree treeop0, tree treeop1, int bit_code)
+aarch64_gen_ccmp_next (rtx_insn **prep_seq, rtx_insn **gen_seq, rtx prev,
+  int cmp_code, tree treeop0, tree treeop1, int bit_code)
 {
   rtx op0, op1, target;
   machine_mode op_mode, cmp_mode, cc_mode = CCmode;
@@ -13244,7 +13244,7 @@ aarch64_gen_ccmp_next (rtx *prep_s

[PATCH 5/7] remove cast in delete_insn_chain

2016-10-17 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* cfgrtl.c (delete_insn_chain): Change argument type to rtx_insn *
and adjust for that.
* cfgrtl.h (delete_insn_chain): Adjust prototype.
---
 gcc/cfgrtl.c | 8 +++-
 gcc/cfgrtl.h | 2 +-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 813f7ce..d2719db 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -240,17 +240,15 @@ delete_insn_and_edges (rtx_insn *insn)
insns that cannot be removed to NULL.  */
 
 void
-delete_insn_chain (rtx start, rtx finish, bool clear_bb)
+delete_insn_chain (rtx start, rtx_insn *finish, bool clear_bb)
 {
-  rtx_insn *prev, *current;
-
   /* Unchain the insns one by one.  It would be quicker to delete all of these
  with a single unchaining, rather than one at a time, but we need to keep
  the NOTE's.  */
-  current = safe_as_a  (finish);
+  rtx_insn *current = finish;
   while (1)
 {
-  prev = PREV_INSN (current);
+  rtx_insn *prev = PREV_INSN (current);
   if (NOTE_P (current) && !can_delete_note_p (as_a  (current)))
;
   else
diff --git a/gcc/cfgrtl.h b/gcc/cfgrtl.h
index d81928a..f4c1396 100644
--- a/gcc/cfgrtl.h
+++ b/gcc/cfgrtl.h
@@ -22,7 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 
 extern void delete_insn (rtx);
 extern bool delete_insn_and_edges (rtx_insn *);
-extern void delete_insn_chain (rtx, rtx, bool);
+extern void delete_insn_chain (rtx, rtx_insn *, bool);
 extern basic_block create_basic_block_structure (rtx_insn *, rtx_insn *,
 rtx_note *, basic_block);
 extern void compute_bb_for_insn (void);
-- 
2.9.3



[PATCH 3/7] use rtx_insn * more

2016-10-17 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* config/alpha/alpha.c (alpha_legitimize_address_1): Change
variable types from rtx to rtx_insn *.
(alpha_emit_xfloating_libcall): Likewise.
* config/arc/arc.c (arc_emit_call_tls_get_addr): Likewise.
* config/arm/arm.c (arm_call_tls_get_addr): Likewise.
(legitimize_tls_address): Likewise.
* config/bfin/bfin.c (hwloop_optimize): Likewise.
(bfin_gen_bundles): Likewise.
* config/c6x/c6x.c (reorg_split_calls): Likewise.
(c6x_reorg): Likewise.
* config/frv/frv.c (frv_reorder_packet): Likewise.
* config/i386/i386.c (ix86_split_idivmod): Likewise.
* config/ia64/ia64.c (ia64_expand_compare): Likewise.
* config/m32c/m32c.c (m32c_prepare_shift): Likewise.
* config/mips/mips.c (mips_call_tls_get_addr): Likewise.
(mips_legitimize_tls_address): Likewise.
* config/mn10300/mn10300.c: Likewise.
* config/rl78/rl78.c: Likewise.
* config/s390/s390.c (s390_fix_long_loop_prediction): Likewise.
* config/sh/sh-mem.cc (sh_expand_cmpstr): Likewise.
(sh_expand_cmpnstr): Likewise.
(sh_expand_strlen): Likewise.
(sh_expand_setmem): Likewise.
* config/sh/sh.md: Likewise.
* emit-rtl.c (emit_pattern_before): Likewise.
* except.c: Likewise.
* final.c: Likewise.
* jump.c: Likewise.
* optabs.c (expand_binop): Likewise.
* reload1.c (gen_reload): Likewise.
* reorg.c (relax_delay_slots): Likewise.
---
 gcc/config/alpha/alpha.c | 117 ++-
 gcc/config/arc/arc.c |   3 +-
 gcc/config/arm/arm.c |   9 ++--
 gcc/config/bfin/bfin.c   |   7 +--
 gcc/config/c6x/c6x.c |   9 ++--
 gcc/config/frv/frv.c |   2 +-
 gcc/config/i386/i386.c   |   3 +-
 gcc/config/ia64/ia64.c   |   4 +-
 gcc/config/m32c/m32c.c   |   4 +-
 gcc/config/mips/mips.c   |  61 +++---
 gcc/config/mn10300/mn10300.c |   2 +-
 gcc/config/rl78/rl78.c   |   2 +-
 gcc/config/s390/s390.c   |   7 +--
 gcc/config/sh/sh-mem.cc  |   8 +--
 gcc/config/sh/sh.md  |  18 +++
 gcc/emit-rtl.c   |   2 +-
 gcc/except.c |   2 +-
 gcc/final.c  |   4 +-
 gcc/jump.c   |   4 +-
 gcc/optabs.c |   5 +-
 gcc/reload1.c|   9 ++--
 gcc/reorg.c  |  19 ---
 22 files changed, 156 insertions(+), 145 deletions(-)

diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 7f53967..6d390ae 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -1017,7 +1017,8 @@ alpha_legitimize_address_1 (rtx x, rtx scratch, 
machine_mode mode)
   && GET_MODE_SIZE (mode) <= UNITS_PER_WORD
   && symbolic_operand (x, Pmode))
 {
-  rtx r0, r16, eqv, tga, tp, insn, dest, seq;
+  rtx r0, r16, eqv, tga, tp, dest, seq;
+  rtx_insn *insn;
 
   switch (tls_symbolic_operand_type (x))
{
@@ -1025,66 +1026,70 @@ alpha_legitimize_address_1 (rtx x, rtx scratch, 
machine_mode mode)
  break;
 
case TLS_MODEL_GLOBAL_DYNAMIC:
- start_sequence ();
+ {
+   start_sequence ();
 
- r0 = gen_rtx_REG (Pmode, 0);
- r16 = gen_rtx_REG (Pmode, 16);
- tga = get_tls_get_addr ();
- dest = gen_reg_rtx (Pmode);
- seq = GEN_INT (alpha_next_sequence_number++);
+   r0 = gen_rtx_REG (Pmode, 0);
+   r16 = gen_rtx_REG (Pmode, 16);
+   tga = get_tls_get_addr ();
+   dest = gen_reg_rtx (Pmode);
+   seq = GEN_INT (alpha_next_sequence_number++);
 
- emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
- insn = gen_call_value_osf_tlsgd (r0, tga, seq);
- insn = emit_call_insn (insn);
- RTL_CONST_CALL_P (insn) = 1;
- use_reg (&CALL_INSN_FUNCTION_USAGE (insn), r16);
+   emit_insn (gen_movdi_er_tlsgd (r16, pic_offset_table_rtx, x, seq));
+   rtx val = gen_call_value_osf_tlsgd (r0, tga, seq);
+   insn = emit_call_insn (val);
+   RTL_CONST_CALL_P (insn) = 1;
+   use_reg (&CALL_INSN_FUNCTION_USAGE (insn), r16);
 
-  insn = get_insns ();
- end_sequence ();
+   insn = get_insns ();
+   end_sequence ();
 
- emit_libcall_block (insn, dest, r0, x);
- return dest;
+   emit_libcall_block (insn, dest, r0, x);
+   return dest;
+ }
 
case TLS_MODEL_LOCAL_DYNAMIC:
- start_sequence ();
+ {
+   start_sequence ();
 
- r0 = gen_rtx_REG (Pmode, 0);
- r16 = gen_rtx_REG (Pmode, 16);
- tga = get_tls_get_addr ();
- scratch = gen_reg_rtx (Pmode);
- seq = GEN_INT (alpha_next_sequence_number++);
+   r0 = gen_rtx_REG (Pmode, 0);
+   

[PATCH 1/7] make LABEL_REF_LABEL a rtx_insn *

2016-10-17 Thread tbsaunde+gcc
From: Trevor Saunders 

While chainging LABEL_REF_LABEL it might as well become an inline
function, so that its clearer what types are involved.  Unfortunately
because it is still possible to use XEXP and related macros on a
LABEL_REF rtx you can still set the field to be a non insn rtx.  The
other unfortunate thing is that the generators actually create LABEL_REF
rtx that refer to MATCH_x rtx, so there we actually need to use XEXP to
bypass the checking this patch adds.

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* rtl.h (label_ref_label): New function.
(set_label_ref_label): New function.
(LABEL_REF_LABEL): Delete.
* alias.c (rtx_equal_for_memref_p): Adjust.
* cfgbuild.c (make_edges): Likewise.
(purge_dead_tablejump_edges): Likewise.
* cfgexpand.c (convert_debug_memory_address): Likewise.
* cfgrtl.c (patch_jump_insn): Likewise.
* combine.c (distribute_notes): Likewise.
* cse.c (hash_rtx_cb): Likewise.
(exp_equiv_p): Likewise.
(fold_rtx): Likewise.
(check_for_label_ref): Likewise.
* cselib.c (rtx_equal_for_cselib_1): Likewise.
(cselib_hash_rtx): Likewise.
* emit-rtl.c (mark_label_nuses): Likewise.
* explow.c (convert_memory_address_addr_space_1): Likewise.
* final.c (output_asm_label): Likewise.
(output_addr_const): Likewise.
* gcse.c (add_label_notes): Likewise.
* genconfig.c (walk_insn_part): Likewise.
* genrecog.c (validate_pattern): Likewise.
* ifcvt.c (cond_exec_get_condition): Likewise.
(noce_emit_store_flag): Likewise.
(noce_get_alt_condition): Likewise.
(noce_get_condition): Likewise.
* jump.c (maybe_propagate_label_ref): Likewise.
(mark_jump_label_1): Likewise.
(redirect_exp_1): Likewise.
(rtx_renumbered_equal_p): Likewise.
* lra-constraints.c (operands_match_p): Likewise.
* print-rtl.c (print_value): Likewise.
* reload.c (find_reloads): Likewise.
* reload1.c (set_label_offsets): Likewise.
* reorg.c (get_branch_condition): Likewise.
* rtl-tests.c (test_uncond_jump): Likewise.
* rtl.c (rtx_equal_p_cb): Likewise.
(rtx_equal_p): Likewise.
* rtlanal.c (reg_mentioned_p): Likewise.
(rtx_referenced_p): Likewise.
(get_condition): Likewise.
* varasm.c (const_hash_1): Likewise.
(compare_constant): Likewise.
(const_rtx_hash_1): Likewise.
(output_constant_pool_1): Likewise.
---
 gcc/alias.c   |  2 +-
 gcc/cfgbuild.c|  4 ++--
 gcc/cfgexpand.c   |  2 +-
 gcc/cfgrtl.c  |  2 +-
 gcc/combine.c |  4 ++--
 gcc/cse.c | 20 ++--
 gcc/cselib.c  |  4 ++--
 gcc/emit-rtl.c|  4 ++--
 gcc/explow.c  |  2 +-
 gcc/final.c   |  4 ++--
 gcc/gcse.c|  6 +++---
 gcc/genconfig.c   |  4 ++--
 gcc/genrecog.c|  4 ++--
 gcc/ifcvt.c   |  8 
 gcc/jump.c| 18 --
 gcc/lra-constraints.c |  2 +-
 gcc/print-rtl.c   |  2 +-
 gcc/reload.c  | 12 ++--
 gcc/reload1.c |  6 +++---
 gcc/reorg.c   |  6 +++---
 gcc/rtl-tests.c   |  2 +-
 gcc/rtl.c |  4 ++--
 gcc/rtl.h | 11 ++-
 gcc/rtlanal.c |  6 +++---
 gcc/varasm.c  | 20 +++-
 25 files changed, 84 insertions(+), 75 deletions(-)

diff --git a/gcc/alias.c b/gcc/alias.c
index f4a0a29..ca475ff 100644
--- a/gcc/alias.c
+++ b/gcc/alias.c
@@ -1767,7 +1767,7 @@ rtx_equal_for_memref_p (const_rtx x, const_rtx y)
   return REGNO (x) == REGNO (y);
 
 case LABEL_REF:
-  return LABEL_REF_LABEL (x) == LABEL_REF_LABEL (y);
+  return label_ref_label (x) == label_ref_label (y);
 
 case SYMBOL_REF:
   return compare_base_symbol_refs (x, y) == 1;
diff --git a/gcc/cfgbuild.c b/gcc/cfgbuild.c
index 40c011d..6c70971 100644
--- a/gcc/cfgbuild.c
+++ b/gcc/cfgbuild.c
@@ -275,7 +275,7 @@ make_edges (basic_block min, basic_block max, int update_p)
  && GET_CODE (SET_SRC (tmp)) == IF_THEN_ELSE
  && GET_CODE (XEXP (SET_SRC (tmp), 2)) == LABEL_REF)
make_label_edge (edge_cache, bb,
-LABEL_REF_LABEL (XEXP (SET_SRC (tmp), 2)), 0);
+label_ref_label (XEXP (SET_SRC (tmp), 2)), 0);
}
 
  /* If this is a computed jump, then mark it as reaching
@@ -415,7 +415,7 @@ purge_dead_tablejump_edges (basic_block bb, 
rtx_jump_table_data *table)
&& SET_DEST (tmp) == pc_rtx
&& GET_CODE (SET_SRC (tmp)) == IF_THEN_ELSE
&& GET_CODE (XEXP (SET_SRC (tmp), 2)) == LABEL_REF)
-mark_tablejump_edge (LABEL_REF_LABEL (XEXP (SET_SRC (tmp), 2)));
+mark_tablejump_edge (label_ref_label (XEXP (SET_SRC (tmp), 2)));
 
   for (ei = ei_start (bb->succs); (e = ei_safe_e

[PATCH 6/7] remove cast from prev_nonnote_insn_bb

2016-10-17 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* emit-rtl.c (prev_nonnote_insn_bb): Change argument type to
rtx_insn *.
* rtl.h (prev_nonnote_insn_bb): Adjust prototype.
---
 gcc/emit-rtl.c | 3 +--
 gcc/rtl.h  | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index a8516eb..e27587b 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -3354,9 +3354,8 @@ prev_nonnote_insn (rtx_insn *insn)
not look inside SEQUENCEs.  */
 
 rtx_insn *
-prev_nonnote_insn_bb (rtx uncast_insn)
+prev_nonnote_insn_bb (rtx_insn *insn)
 {
-  rtx_insn *insn = safe_as_a  (uncast_insn);
 
   while (insn)
 {
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 9abb054..3901749 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2856,7 +2856,7 @@ extern rtx_call_insn *last_call_insn (void);
 extern rtx_insn *previous_insn (rtx_insn *);
 extern rtx_insn *next_insn (rtx_insn *);
 extern rtx_insn *prev_nonnote_insn (rtx_insn *);
-extern rtx_insn *prev_nonnote_insn_bb (rtx);
+extern rtx_insn *prev_nonnote_insn_bb (rtx_insn *);
 extern rtx_insn *next_nonnote_insn (rtx_insn *);
 extern rtx_insn *next_nonnote_insn_bb (rtx_insn *);
 extern rtx_insn *prev_nondebug_insn (rtx_insn *);
-- 
2.9.3



[PATCH 2/7] make tablejump_p return the label as a rtx_insn *

2016-10-17 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* cfgcleanup.c (merge_blocks_move_successor_nojumps): Adjust.
(outgoing_edges_match): Likewise.
(try_crossjump_to_edge): Likewise.
* cfgrtl.c (try_redirect_by_replacing_jump): Likewise.
(rtl_tidy_fallthru_edge): Likewise.
* rtl.h (tablejump_p): Adjust prototype.
* rtlanal.c (tablejump_p): Return the label as a rtx_insn *.
---
 gcc/cfgcleanup.c |  8 
 gcc/cfgrtl.c |  7 +++
 gcc/rtl.h|  2 +-
 gcc/rtlanal.c| 32 
 4 files changed, 24 insertions(+), 25 deletions(-)

diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c
index 1c9691d..c67b4d7 100644
--- a/gcc/cfgcleanup.c
+++ b/gcc/cfgcleanup.c
@@ -688,7 +688,7 @@ static void
 merge_blocks_move_successor_nojumps (basic_block a, basic_block b)
 {
   rtx_insn *barrier, *real_b_end;
-  rtx label;
+  rtx_insn *label;
   rtx_jump_table_data *table;
 
   /* If we are partitioning hot/cold basic blocks, we don't want to
@@ -709,7 +709,7 @@ merge_blocks_move_successor_nojumps (basic_block a, 
basic_block b)
   /* If there is a jump table following block B temporarily add the jump table
  to block B so that it will also be moved to the correct location.  */
   if (tablejump_p (BB_END (b), &label, &table)
-  && prev_active_insn (as_a (label)) == BB_END (b))
+  && prev_active_insn (label) == BB_END (b))
 {
   BB_END (b) = table;
 }
@@ -1697,7 +1697,7 @@ outgoing_edges_match (int mode, basic_block bb1, 
basic_block bb2)
   /* Check whether there are tablejumps in the end of BB1 and BB2.
  Return true if they are identical.  */
 {
-  rtx label1, label2;
+  rtx_insn *label1, *label2;
   rtx_jump_table_data *table1, *table2;
 
   if (tablejump_p (BB_END (bb1), &label1, &table1)
@@ -1994,7 +1994,7 @@ try_crossjump_to_edge (int mode, edge e1, edge e2,
  they have been already compared for equivalence in outgoing_edges_match ()
  so replace the references to TABLE1 by references to TABLE2.  */
   {
-  rtx label1, label2;
+  rtx_insn *label1, *label2;
   rtx_jump_table_data *table1, *table2;
 
   if (tablejump_p (BB_END (osrc1), &label1, &table1)
diff --git a/gcc/cfgrtl.c b/gcc/cfgrtl.c
index 67cab71..813f7ce 100644
--- a/gcc/cfgrtl.c
+++ b/gcc/cfgrtl.c
@@ -1101,7 +1101,7 @@ try_redirect_by_replacing_jump (edge e, basic_block 
target, bool in_cfglayout)
 {
   rtx_code_label *target_label = block_label (target);
   rtx_insn *barrier;
-  rtx label;
+  rtx_insn *label;
   rtx_jump_table_data *table;
 
   emit_jump_insn_after_noloc (targetm.gen_jump (target_label), insn);
@@ -1773,7 +1773,7 @@ rtl_tidy_fallthru_edge (edge e)
   && (any_uncondjump_p (q)
  || single_succ_p (b)))
 {
-  rtx label;
+  rtx_insn *label;
   rtx_jump_table_data *table;
 
   if (tablejump_p (q, &label, &table))
@@ -1786,8 +1786,7 @@ rtl_tidy_fallthru_edge (edge e)
  PUT_CODE (label, NOTE);
  NOTE_KIND (label) = NOTE_INSN_DELETED_LABEL;
  NOTE_DELETED_LABEL_NAME (label) = name;
- rtx_insn *lab = safe_as_a  (label);
- reorder_insns (lab, lab, PREV_INSN (q));
+ reorder_insns (label, label, PREV_INSN (q));
  delete_insn (table);
}
 
diff --git a/gcc/rtl.h b/gcc/rtl.h
index 7ee0b61..c84fe71 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3038,7 +3038,7 @@ extern rtx replace_rtx (rtx, rtx, rtx, bool = false);
 extern void replace_label (rtx *, rtx, rtx, bool);
 extern void replace_label_in_insn (rtx_insn *, rtx, rtx, bool);
 extern bool rtx_referenced_p (const_rtx, const_rtx);
-extern bool tablejump_p (const rtx_insn *, rtx *, rtx_jump_table_data **);
+extern bool tablejump_p (const rtx_insn *, rtx_insn **, rtx_jump_table_data 
**);
 extern int computed_jump_p (const rtx_insn *);
 extern bool tls_referenced_p (const_rtx);
 
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 90b55b6..4e600c0 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -3103,26 +3103,26 @@ rtx_referenced_p (const_rtx x, const_rtx body)
*LABELP and the jump table to *TABLEP.  LABELP and TABLEP may be NULL.  */
 
 bool
-tablejump_p (const rtx_insn *insn, rtx *labelp, rtx_jump_table_data **tablep)
+tablejump_p (const rtx_insn *insn, rtx_insn **labelp,
+rtx_jump_table_data **tablep)
 {
-  rtx label;
-  rtx_insn *table;
-
   if (!JUMP_P (insn))
 return false;
 
-  label = JUMP_LABEL (insn);
-  if (label != NULL_RTX && !ANY_RETURN_P (label)
-  && (table = NEXT_INSN (as_a  (label))) != NULL_RTX
-  && JUMP_TABLE_DATA_P (table))
-{
-  if (labelp)
-   *labelp = label;
-  if (tablep)
-   *tablep = as_a  (table);
-  return true;
-}
-  return false;
+  rtx target = JUMP_LABEL (insn);
+  if (target == NULL_RTX || ANY_RETURN_P (target))
+return false;
+
+  rtx_insn *label = as_a (target);
+  rtx_insn *table = next_insn (label);
+  if (tab

[PATCH 4/7] remove cast to rtx_insn * in remove_note

2016-10-17 Thread tbsaunde+gcc
From: Trevor Saunders 

gcc/ChangeLog:

2016-10-17  Trevor Saunders  

* config/rl78/rl78.c (gen-and_emit_move): Change argument type
to rtx_insn *.
(transcode_memory_rtx): Likewise.
(move_to_acc): Likewise.
(move_from_acc): Likewise.
(move_acc_to_reg): Likewise.
(move_to_x): Likewise.
(move_to_hl): Likewise.
(move_to_de): Likewise.
* config/rs6000/rs6000.c (emit_frame_save): Likewise.
(rs6000_emit_savres_rtx): Likewise.
(rs6000_emit_prologue): Likewise.
* reorg.c (update_reg_unused_notes): Likewise.
* rtl.h (remove_note): Adjust prototype.
* rtlanal.c (remove_note): Make argument type rtx_insn *.
---
 gcc/config/rl78/rl78.c | 16 
 gcc/config/rs6000/rs6000.c | 26 ++
 gcc/reorg.c|  4 ++--
 gcc/rtl.h  |  2 +-
 gcc/rtlanal.c  |  4 ++--
 5 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/gcc/config/rl78/rl78.c b/gcc/config/rl78/rl78.c
index fd72a58..db3e25f 100644
--- a/gcc/config/rl78/rl78.c
+++ b/gcc/config/rl78/rl78.c
@@ -2778,7 +2778,7 @@ process_postponed_content_update (void)
after WHERE.  If TO already contains FROM then do nothing.  Returns TO if
BEFORE is true, FROM otherwise.  */
 static rtx
-gen_and_emit_move (rtx to, rtx from, rtx where, bool before)
+gen_and_emit_move (rtx to, rtx from, rtx_insn *where, bool before)
 {
   machine_mode mode = GET_MODE (to);
 
@@ -2833,7 +2833,7 @@ gen_and_emit_move (rtx to, rtx from, rtx where, bool 
before)
copy it into NEWBASE and return the updated MEM.  Otherwise just
return M.  Any needed insns are emitted before BEFORE.  */
 static rtx
-transcode_memory_rtx (rtx m, rtx newbase, rtx before)
+transcode_memory_rtx (rtx m, rtx newbase, rtx_insn *before)
 {
   rtx base, index, addendr;
   int addend = 0;
@@ -2934,7 +2934,7 @@ transcode_memory_rtx (rtx m, rtx newbase, rtx before)
 /* Copy SRC to accumulator (A or AX), placing any generated insns
before BEFORE.  Returns accumulator RTX.  */
 static rtx
-move_to_acc (int opno, rtx before)
+move_to_acc (int opno, rtx_insn *before)
 {
   rtx src = OP (opno);
   machine_mode mode = GET_MODE (src);
@@ -2968,7 +2968,7 @@ force_into_acc (rtx src, rtx_insn *before)
 /* Copy accumulator (A or AX) to DEST, placing any generated insns
after AFTER.  Returns accumulator RTX.  */
 static rtx
-move_from_acc (unsigned int opno, rtx after)
+move_from_acc (unsigned int opno, rtx_insn *after)
 {
   rtx dest = OP (opno);
   machine_mode mode = GET_MODE (dest);
@@ -2982,7 +2982,7 @@ move_from_acc (unsigned int opno, rtx after)
 /* Copy accumulator (A or AX) to REGNO, placing any generated insns
before BEFORE.  Returns reg RTX.  */
 static rtx
-move_acc_to_reg (rtx acc, int regno, rtx before)
+move_acc_to_reg (rtx acc, int regno, rtx_insn *before)
 {
   machine_mode mode = GET_MODE (acc);
   rtx reg;
@@ -2995,7 +2995,7 @@ move_acc_to_reg (rtx acc, int regno, rtx before)
 /* Copy SRC to X, placing any generated insns before BEFORE.
Returns X RTX.  */
 static rtx
-move_to_x (int opno, rtx before)
+move_to_x (int opno, rtx_insn *before)
 {
   rtx src = OP (opno);
   machine_mode mode = GET_MODE (src);
@@ -3018,7 +3018,7 @@ move_to_x (int opno, rtx before)
 /* Copy OP (opno) to H or HL, placing any generated insns before BEFORE.
Returns H/HL RTX.  */
 static rtx
-move_to_hl (int opno, rtx before)
+move_to_hl (int opno, rtx_insn *before)
 {
   rtx src = OP (opno);
   machine_mode mode = GET_MODE (src);
@@ -3041,7 +3041,7 @@ move_to_hl (int opno, rtx before)
 /* Copy OP (opno) to E or DE, placing any generated insns before BEFORE.
Returns E/DE RTX.  */
 static rtx
-move_to_de (int opno, rtx before)
+move_to_de (int opno, rtx_insn *before)
 {
   rtx src = OP (opno);
   machine_mode mode = GET_MODE (src);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 613af48..b5f1dc9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -26805,8 +26805,8 @@ output_probe_stack_range (rtx reg1, rtx reg2)
pointer.  That fails when saving regs off r1, and sched moves the
r31 setup past the reg saves.  */
 
-static rtx
-rs6000_frame_related (rtx insn, rtx reg, HOST_WIDE_INT val,
+static rtx_insn *
+rs6000_frame_related (rtx_insn *insn, rtx reg, HOST_WIDE_INT val,
  rtx reg2, rtx repl2)
 {
   rtx repl;
@@ -26970,11 +26970,11 @@ gen_frame_store (rtx reg, rtx frame_reg, int offset)
 /* Save a register into the frame, and emit RTX_FRAME_RELATED_P notes.
Save REGNO into [FRAME_REG + OFFSET] in mode MODE.  */
 
-static rtx
+static rtx_insn *
 emit_frame_save (rtx frame_reg, machine_mode mode,
 unsigned int regno, int offset, HOST_WIDE_INT frame_reg_to_sp)
 {
-  rtx reg, insn;
+  rtx reg;
 
   /* Some cases that need register indexed addressing.  */
   gcc_checking_assert (!((TARGET_ALTIVEC_ABI && ALTIVEC_VECTOR_MODE (m

[PATCH 0/7] more rtx-insn stuff

2016-10-17 Thread tbsaunde+gcc
From: Trevor Saunders 

Hi,

Rather late (travel and then busy day job interfiered), but this includes the
promised conversion of LABEL_REF_LABEL and tablejump_p to use rtx_insn *.  
Followed by a bit more random conversion of things to use rtx_insn *.

patches individually bootstrapped + regtested on x86_64-linux-gnu, and run
through config-list.mk as a whole.  Ok?

Thanks!

Trev

Trevor Saunders (7):
  make LABEL_REF_LABEL a rtx_insn *
  make tablejump_p return the label as a rtx_insn *
  use rtx_insn * more
  remove cast to rtx_insn * in remove_note
  remove cast in delete_insn_chain
  remove cast from prev_nonnote_insn_bb
  make targetm.gen_ccmp{first,next} take rtx_insn **

 gcc/alias.c  |   2 +-
 gcc/ccmp.c   |  21 
 gcc/cfgbuild.c   |   4 +-
 gcc/cfgcleanup.c |   8 +--
 gcc/cfgexpand.c  |   2 +-
 gcc/cfgrtl.c |  17 +++
 gcc/cfgrtl.h |   2 +-
 gcc/combine.c|   4 +-
 gcc/config/aarch64/aarch64.c |  10 ++--
 gcc/config/alpha/alpha.c | 117 ++-
 gcc/config/arc/arc.c |   3 +-
 gcc/config/arm/arm.c |   9 ++--
 gcc/config/bfin/bfin.c   |   7 +--
 gcc/config/c6x/c6x.c |   9 ++--
 gcc/config/frv/frv.c |   2 +-
 gcc/config/i386/i386.c   |   3 +-
 gcc/config/ia64/ia64.c   |   4 +-
 gcc/config/m32c/m32c.c   |   4 +-
 gcc/config/mips/mips.c   |  61 +++---
 gcc/config/mn10300/mn10300.c |   2 +-
 gcc/config/rl78/rl78.c   |  18 +++
 gcc/config/rs6000/rs6000.c   |  26 +-
 gcc/config/s390/s390.c   |   7 +--
 gcc/config/sh/sh-mem.cc  |   8 +--
 gcc/config/sh/sh.md  |  18 +++
 gcc/cse.c|  20 
 gcc/cselib.c |   4 +-
 gcc/doc/tm.texi  |   4 +-
 gcc/emit-rtl.c   |   9 ++--
 gcc/except.c |   2 +-
 gcc/explow.c |   2 +-
 gcc/final.c  |   8 +--
 gcc/gcse.c   |   6 +--
 gcc/genconfig.c  |   4 +-
 gcc/genrecog.c   |   4 +-
 gcc/ifcvt.c  |   8 +--
 gcc/jump.c   |  22 
 gcc/lra-constraints.c|   2 +-
 gcc/optabs.c |   5 +-
 gcc/print-rtl.c  |   2 +-
 gcc/reload.c |  12 ++---
 gcc/reload1.c|  15 +++---
 gcc/reorg.c  |  29 ++-
 gcc/rtl-tests.c  |   2 +-
 gcc/rtl.c|   4 +-
 gcc/rtl.h|  17 +--
 gcc/rtlanal.c|  42 
 gcc/target.def   |   4 +-
 gcc/varasm.c |  20 
 49 files changed, 316 insertions(+), 299 deletions(-)

-- 
2.9.3



Re: [Patch 3/11] Implement TARGET_C_EXCESS_PRECISION for s390

2016-10-17 Thread Andreas Krebbel1
> Here is a patch implementing what I think has been discussed in this 
thread.
> 
> OK?

Looks good to me.

Uli, do you agree with that change for S/390 or would you rather see us 
fixing the float_t type definition in Glibc?

-Andreas-




Re: [PATCH 1/2] Fix GCC split-stack variadic argument tests

2016-10-17 Thread Jeff Law

On 10/17/2016 01:12 PM, Adhemerval Zanella wrote:

This test adds the expect va_end after va_start on split-stack tests.

gcc/ChangeLog:

* gcc/testsuite/gcc.dg/split-3.c (down): Call va_end after va_start.
* gcc/testsuite/gcc.dg/split-6.c (down): Likewise.

OK.  Would probably fall under the obvious rule.
jeff



Re: [PATCH] Extend -Wint-in-bool-context to suspicious enum values (PR 77700)

2016-10-17 Thread Jeff Law

On 10/08/2016 04:50 PM, Trevor Saunders wrote:

On Fri, Oct 07, 2016 at 03:18:07PM +, Bernd Edlinger wrote:

Hi!

This extends -Wint-in-bool-context to uses of enum values in boolean
context, and fixes one place where accidentally an enum value was
passed to a bool parameter.

I excluded enum values 0 and 1 because that is used in
gimple-ssa-strength-reduction.c, where we have enums
which are passed in bool function arguments:

enum stride_status
{
   UNKNOWN_STRIDE = 0,
   KNOWN_STRIDE = 1
};

enum phi_adjust_status
{
   NOT_PHI_ADJUST = 0,
   PHI_ADJUST = 1
};

enum count_phis_status
{
   DONT_COUNT_PHIS = 0,
   COUNT_PHIS = 1
};

I would'nt use an enum in that way, but I think it is
at least not completely wrong to do it like that...


Using enums here seems reasonable see the discussion here
http://gcc.gnu.org/ml/gcc/2016-10/msg6.html however I don't
understand why that code makes the argument type bool instead of the
enum type.  Here's an untested patch to change those functions to take
the enum type.  It might make sense to change naming some, but I think
this is somewhat of an improvement as it is.

Trev

diff --git a/gcc/gimple-ssa-strength-reduction.c 
b/gcc/gimple-ssa-strength-reduction.c
index 7b14b91..f61805c 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c

[ ... ]
Seems reasonable.  Feel free to commit after the usual testing.

jeff



Re: [PATCH][2/2] Enable SSA propagator "DCE" for VRP

2016-10-17 Thread Jeff Law

On 10/12/2016 01:08 AM, Richard Biener wrote:

On Fri, 7 Oct 2016, Richard Biener wrote:



This turns the switch (which also requires propagating into ASSERT_EXPRs,
otherwise those will end up with released SSA names eventually).

A [3/2] would be to let ASSERT_EXPRs be removed by the propagator
which would a) require VRP to fix up its lattice for this task,
b) make match-and-simplify not be confused by them during
substitute-and-fold folding.  This also requires (I think) dealing
with symbolic valueizations during substitute-and-fold in a
way FRE does (track availability) -- currently propagators restrict
themselves to constants due to that (avoid substituting to where
the use does not dominate the definition).

Re-bootstrap / regtest pending on x86_64-unknown-linux-gnu.


The following is what I have applied -- for now we can't DCE ASSERT_EXPRs
because VRP jump threading relies on them (in fact, it looks like VRPs
jump threading "implementation" is nothing but backwards threading,
looking for ASSERT_EXPRs rather than dominating conditions ... Jeff,
you may be more familiar with the VRP threading code, it looks like
we might be able to rip it out without regressions?)
That'd be the one I'd want to see ripped out first as it's the least 
"interesting" lot term.  I haven't actually tried it though.


I can't remember all the details anymore, but there was some major 
headaches around sequencing threading into VRP.  IIRC we use two things 
from VRP.  First the threader knows that an ASSERT_EXPR is just a copy. 
Second, it uses the range information via the callback and I believe it 
can use those together.


In today's world where range information has become a first class 
citizen, there's a reasonable chance that we could drop the ASSERT_EXPRs 
from the IL and just query the range information.  However, I suspect 
we'll see regressions as range information is often lost (the same issue 
Aldy, Martin and Andrew have poked at recently).


But it's certainly worth investigation.

jeff



Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)

2016-10-17 Thread Bernd Edlinger
On 10/17/16 20:05, Joseph Myers wrote:
> On Sun, 16 Oct 2016, Bernd Edlinger wrote:
>
>> Second, the declaration in the glibc header files simply look wrong,
>> because the type of argv, and envp is "char *const *" while the
>> builtin function wants "const char**", thus only the array of char*
>> itself is const, not the actual char stings they point to.
>
> char *const * is the POSIX type.  (It can't be const char ** or const char
> *const * because you can't convert char ** implicitly to those types in
> ISO C.)  You'd need to check the list archives for rationale for the
> built-in functions doing something different.
>

Yes, that was discussed here:
https://gcc.gnu.org/ml/gcc-patches/2004-03/msg01148.html

No mention why the BT_PTR_CONST_TYPE does not match the posix type.
But the right types were used on __gcov_execv/e/p stubs, so the author
did know the right types at least.

So I think that was broken from the beginning, but that was hidden by
the loose checking in the C FE and not warning in the C++ FE when
prototypes don't match.

>> Third, in C the  builtins are not diagnosed, because C does only look
>> at the mode of the parameters see match_builtin_function_types in
>> c/c-decl.c, which may itself be wrong, because that makes an ABI
>> decision dependent on the mode of the parameter.
>
> The matching needs to be loose because of functions using types such as
> FILE * where the compiler doesn't know the exact contents of the type when
> processing built-in function definitions.  (Historically there were also
> issues with pre-ISO headers, but that may be less relevant now.)
>

The C++ FE has exactly the same problem with FILE* and struct tm*
but it solves it differently and "learns" the type but only for FILE*
and with this patch also for const struct tm*.  It is a lot more
restrictive than C, but that is because of the ++ ;)

Well in that case the posix functions have to use the prototypes
from POSIX.1.2008 although their rationale is a bit silly...

This updated patch fixes the prototype of execv/p/e,
and adds a new test case that checks that no type conflict
exists in the execve built-in any more.

Now we have no -Wsystem-headers warnings with glibc-headers any more.
And the gcov builtin also is working with C++.

Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
gcc:
2016-10-17  Bernd Edlinger  

PR c++/71973
* doc/invoke.texi: Document -Wbuiltin-function-redefined.
* builtin-types.def (BT_CONST_TM_PTR): New primitive type.
(BT_PTR_CONST_STRING): Updated.
(BT_FN_SIZE_STRING_SIZE_CONST_STRING_CONST_PTR): Removed.
(BT_FN_SIZE_STRING_SIZE_CONST_STRING_CONST_TM_PTR): New function type.
* builtins.def (strftime): Update builtin function.
* tree-core.h (TI_CONST_TM_PTR_TYPE): New enum value.
* tree.h (const_tm_ptr_type_node): New type node.
* tree.c (free_lang_data, build_common_tree_nodes): Initialize
const_tm_ptr_type_node.

c-family:
2016-10-17  Bernd Edlinger  

PR c++/71973
* c.opt (Wbuiltin-function-redefined): New warning.
* c-common.c (c_common_nodes_and_builtins): Initialize
const_tm_ptr_type_node.

cp:
2016-10-17  Bernd Edlinger  

PR c++/71973
* decl.c (duplicate_decls): Warn when a built-in function is redefined.
Don't overload builtin functions with C++ functions.
Handle const_tm_ptr_type_node like file_ptr_node.
Copy the TREE_NOTHROW flag unmodified to the old decl.

lto:
2016-10-17  Bernd Edlinger  

PR c++/71973
* lto-lang.c (lto_init): Assert const_tm_ptr_type_node is sane.

testsuite:
2016-10-17  Bernd Edlinger  

PR c++/71973
* g++.dg/pr71973-1.C: New test.
* g++.dg/pr71973-2.C: New test.
* g++.dg/pr71973-3.C: New test.
* g++.dg/lookup/extern-c-redecl4.C: Adjust test expectations.
Index: gcc/builtin-types.def
===
--- gcc/builtin-types.def	(revision 241271)
+++ gcc/builtin-types.def	(working copy)
@@ -103,6 +103,7 @@ DEF_PRIMITIVE_TYPE (BT_COMPLEX_LONGDOUBLE, complex
 
 DEF_PRIMITIVE_TYPE (BT_PTR, ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_FILEPTR, fileptr_type_node)
+DEF_PRIMITIVE_TYPE (BT_CONST_TM_PTR, const_tm_ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_CONST_PTR, const_ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_VOLATILE_PTR,
 		build_pointer_type
@@ -146,7 +147,12 @@ DEF_PRIMITIVE_TYPE (BT_I16, builtin_type_for_size
 
 DEF_PRIMITIVE_TYPE (BT_BND, pointer_bounds_type_node)
 
-DEF_POINTER_TYPE (BT_PTR_CONST_STRING, BT_CONST_STRING)
+/* The C type `char * const *'.  */
+DEF_PRIMITIVE_TYPE (BT_PTR_CONST_STRING,
+		build_pointer_type
+		 (build_qualified_type (string_type_node,
+	TYPE_QUAL_CONST)))
+
 DEF_POINTER_TYPE (BT_PTR_UINT, BT_UINT)
 DEF_POINTER_TYPE (BT_PTR_LONG, BT_LONG)
 DEF_POINTER_TYPE (BT_PTR_ULONG, BT_ULONG)
@@ -511,8 +517,8 @@ DEF_FUNCTION_TYPE

[PATCH v3 2/2] aarch64: Add split-stack initial support

2016-10-17 Thread Adhemerval Zanella
From: Adhemerval Zanella 

Changes from previous version:

 * Add missing new function comments;
 * Using gen_rtx_fmt_ee instead of gen_rtx_IF_THEN_ELSE to create
   conditional jumps;
 * Some typos and withspace issues.

--

This patch adds the split-stack support on aarch64 (PR #67877).  As for
other ports this patch should be used along with glibc and gold support.

The support is done similar to other architectures: a __private_ss field is
added on TCB in glibc, a target-specific __morestack implementation and
helper functions are added in libgcc and compiler supported in adjusted
(split-stack prologue, va_start for argument handling).  I also plan to
send the gold support to adjust stack allocation acrosss split-stack
and default code calls.

Current approach is similar to powerpc one: at most 2 GB of stack allocation
is support so stack adjustments can be done with 2 instructions (either just
a movn plus nop or a movn followed by movk).  The morestack call is non
standard with x10 hollding the requested stack pointer and x11 the required
stack size to be copied.  Also function arguments on the old stack are
accessed based on a value relative to the stack pointer, so x10 is used to
hold theold stack value.  Unwinding is handled by a personality routine that
knows how to find stack segments.

Split-stack prologue on function entry is as follow (this goes before the
usual function prologue):

mrsx9, tpidr_el0
movx10, -
nop/movk
addx10, sp, x10
ldrx9, [x9, 16]
cmpx10, x9
b.csenough
stpx30, [sp, -16]movx11, 
movx11, 
bl __morestack
ldpx30, [sp], 16
ret
enough:
mov x10, sp
[...]
b.csfunction
mov x10, x28
function:

Notes:

1. Even if a function does not allocate a stack frame, a split-stack prologue
   is created.  It is to avoid issues with tail call for external symbols
   which might require linker adjustment (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move split-stack TCB ldr
   to after the required stack calculation.

3. Similar to powerpc, When the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in "allocate"
   instructions (so non-split-stack code gets a larger stack).  The amount is
   tunable by a linker option.  The edit means aarch64 does not need to
   implement __morestack_non_split, necessary on x86 because insufficient
   space is available there to edit the stack comparison code.  This feature
   is only implemented in the GNU gold linker.

4. AArch64 does not handle >2G stack initially and although it is possible
   to implement it, limiting to 2G allows to materize the allocation with
   only 2 instructions (mov + movk) and thus simplifying the linker
   adjustments required.  Supporting multiple threads each requiring more
   than 2G of stack is probably not that important, and likely to OOM at
   run time.

5. The TCB support on GLIBC is meant to be included in version 2.25.

libgcc/ChangeLog:

* libgcc/config.host: Use t-stack and t-statck-aarch64 for
aarch64*-*-linux.
* libgcc/config/aarch64/morestack-c.c: New file.
* libgcc/config/aarch64/morestack.S: Likewise.
* libgcc/config/aarch64/t-stack-aarch64: Likewise.
* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
code.

gcc/ChangeLog:

* common/config/aarch64/aarch64-common.c
(aarch64_supports_split_stack): New function.
(TARGET_SUPPORTS_SPLIT_STACK): New macro.
* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
macro.
* gcc/config/aarch64/aarch64-protos.h: Add
aarch64_expand_split_stack_prologue and
aarch64_split_stack_space_check.
* gcc/config/aarch64/aarch64.c (split_stack_arg_pointer_used_p): New
function.
(aarch64_expand_prologue): Setup the argument pointer (x10) for
split-stack.
(aarch64_expand_builtin_va_start): Use internal argument pointer
instead of virtual_incoming_args_rtx.
(morestack_ref): New symbol.
(aarch64_expand_split_stack_prologue): New function.
(aarch64_file_end): Emit the split-stack note sections.
(aarch64_internal_arg_pointer): Likewise.
(aarch64_live_on_entry): Set the argument pointer for split-stack.
(aarch64_split_stack_space_check): Likewise.
(TARGET_ASM_FILE_END): New macro.
(TARGET_EXTRA_LIVE_ON_ENTRY): Likewise.
(TARGET_INTERNAL_ARG_POINTER): Likewise.
* gcc/config/aarch64/aarch64.h (aarch64_frame): Add
split_stack_arg_pointer to setup the argument pointer when using
split-stack.
* gcc/config/aarch64/aarch64.md (UNSPEC_STACK_CHECK): New unspec.
(UNSPECV_SPLIT_STACK_RETURN): Likewise.
(split_stack_prologue):

[PATCH 1/2] Fix GCC split-stack variadic argument tests

2016-10-17 Thread Adhemerval Zanella
This test adds the expect va_end after va_start on split-stack tests.

gcc/ChangeLog:

* gcc/testsuite/gcc.dg/split-3.c (down): Call va_end after va_start.
* gcc/testsuite/gcc.dg/split-6.c (down): Likewise.
---
 gcc/ChangeLog  | 5 +
 gcc/testsuite/gcc.dg/split-3.c | 1 +
 gcc/testsuite/gcc.dg/split-6.c | 1 +
 3 files changed, 7 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/split-3.c b/gcc/testsuite/gcc.dg/split-3.c
index 64bbb8c..5ba7616 100644
--- a/gcc/testsuite/gcc.dg/split-3.c
+++ b/gcc/testsuite/gcc.dg/split-3.c
@@ -40,6 +40,7 @@ down (int i, ...)
   || va_arg (ap, int) != 9
   || va_arg (ap, int) != 10)
 abort ();
+  va_end (ap);
 
   if (i > 0)
 {
diff --git a/gcc/testsuite/gcc.dg/split-6.c b/gcc/testsuite/gcc.dg/split-6.c
index b32cf8d..b3016ba 100644
--- a/gcc/testsuite/gcc.dg/split-6.c
+++ b/gcc/testsuite/gcc.dg/split-6.c
@@ -37,6 +37,7 @@ down (int i, ...)
   || va_arg (ap, int) != 9
   || va_arg (ap, int) != 10)
 abort ();
+  va_end (ap);
 
   if (i > 0)
 {
-- 
2.7.4



Re: PATCH to tidy up code in c-warn.c

2016-10-17 Thread Jeff Law

On 10/16/2016 04:53 AM, Marek Polacek wrote:

Found when looking at something else.  find_array_ref_with_const_idx_r would
uselessly keep on looking for other array refs with constant indices, even
though finding one is enough.  So return when something is found, instead of
just ignoring the subtrees.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-10-16  Marek Polacek  

* c-warn.c (find_array_ref_with_const_idx_r): Remove parameter names.
Return immediately when finding a match.
(warn_tautological_cmp): Remove a boolean variable that is no longer
needed.

OK.
jeff



Re: [PATCH] fix outstanding -Wformat-length failures (pr77735 et al.)

2016-10-17 Thread Jeff Law

On 10/02/2016 02:10 PM, Martin Sebor wrote:

The attached patch fixes a number of outstanding test failures
and ILP32-related bugs in the gimple-ssa-sprintf pattch pointed
out in bug 77676 and 77735).  The patch also fixes c_strlen to
correctly handle wide strings (previously it accepted them but
treated them as nul-terminated byte sequences), and adjusts the
handling of "%a" to avoid assuming a specific number of decimal
digits (this is likely a defect in C11 that I'm pursuing with
WG14).

Tested on powerpc64le, i386, and x86_64.

There is one outstanding failure in the builtin-sprintf-warn-1.c
test on powerpc64le that looks like it might be due to the
printf_pointer_format target hook not having been set up entirely
correctly.  I'll look into that separately, along with pr77819.

Martin

gcc-77735.diff


PR middle-end/77735 - FAIL: gcc.dg/tree-ssa/builtin-sprintf-warn-1.c

gcc/ChangeLog:
2016-10-02  Martin Sebor  

PR middle-end/77735
* builtins.c (string_length): New function.
(c_strlen): Use string_length.  Correctly handle wide strings.
* gimple-ssa-sprintf.c (target_max_value, target_size_max): New
functions.
(target_int_max): Call target_max_value.
(format_result::knownrange): New data member.
(fmtresult::fmtresult): Define default constructor.
(format_integer): Use it and set format_result::knownrange.
Handle global constants.
(format_floating_max): Add third argument.
(format_floating): Recompute maximum value for %a for each argument.
(get_string_length): Use fmtresult default ctor.
(format_string): Set format_result::knownrange.
(format_directive): Check format_result::knownrange.
(add_bytes): Same.  Correct caret placement in diagnostics.
(pass_sprintf_length::compute_format_length): Set
format_result::knownrange.
(pass_sprintf_length::handle_gimple_call): Use target_size_max.

gcc/testsuite/ChangeLog:
2016-10-02  Martin Sebor  

PR middle-end/77735
* gcc.dg/tree-ssa/builtin-sprintf-2.c: Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-2.c: Same.
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c: Adjust/relax.
* gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: Add test cases.
* gcc.dg/tree-ssa/builtin-sprintf-warn-6.c: XFAIL for LP64 only.
* gcc.dg/tree-ssa/builtin-sprintf.c: Add test cases.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 92f939e..410bbc3 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1281,9 +1330,9 @@ format_floating (const conversion_spec &spec, int width, 
int prec)
res.range.max = width;
 }

-  /* The argument is only considered bounded when the range of output
- bytes is exact.  */
-  res.bounded = res.range.min == res.range.max;
+  /* The argument is considered bounded when the output of a directive
+ is fully specified and the range of output bytes is exact.  */
+  // res.bounded &= res.range.min == res.range.max;

Did you mean to leave this commented out?

It looks like you're defining a constructor for "struct fmtresult". 
Normally once we define a constructor we turn the underlying object into 
a class.It appears that the constructor leaves argmin, argmax, 
knownrange, bounded & constant uninitialized?  Am I missing something here?




+ /* A plain '%c' directive.  Its ouput is excactly 1.  */

Typo for exactly.


Jeff


Re: [PATCH] AIX symbol encoding

2016-10-17 Thread Segher Boessenkool
Hi David,

On Mon, Oct 17, 2016 at 09:54:35AM -0400, David Edelsohn wrote:
> This patch shifts earlier the point during compilation where the
> rs6000 backend adds decoration to symbols: to encode_section_info.

> -  /* Currently C++ toc references to vtables can be emitted before it
> - is decided whether the vtable is public or private.  If this is
> - the case, then the linker will eventually complain that there is
> - a reference to an unknown section.  Thus, for vtables only,
> - we emit the TOC reference to reference the symbol and not the
> - section.  */

I think this comment should go...

>if (VTABLE_NAME_P (name))
>  {
>RS6000_OUTPUT_BASENAME (file, name);

... before here.

Otherwise looks fine.  Do you want this tested on Linux before you
commit?  We never test on AIX beforehand either ;-)


Segher


Re: [PATCH AArch64]Penalize vector cost for large loops with too many vect insns.

2016-10-17 Thread Bill Schmidt

> On Oct 17, 2016, at 12:26 PM, Bin.Cheng  wrote:
> 
> On Sat, Oct 15, 2016 at 3:07 AM, kugan
>  wrote:
>> Hi Bin,
>> 
>> On 15/10/16 00:15, Bin Cheng wrote:
>>> 
>>> +/* Test for likely overcommitment of vector hardware resources.  If a
>>> +   loop iteration is relatively large, and too large a percentage of
>>> +   instructions in the loop are vectorized, the cost model may not
>>> +   adequately reflect delays from unavailable vector resources.
>>> +   Penalize the loop body cost for this case.  */
>>> +
>>> +static void
>>> +aarch64_density_test (struct aarch64_vect_loop_cost_data *data)
>>> +{
>>> +  const int DENSITY_PCT_THRESHOLD = 85;
>>> +  const int DENSITY_SIZE_THRESHOLD = 128;
>>> +  const int DENSITY_PENALTY = 10;
>>> +  struct loop *loop = data->loop_info;
>>> +  basic_block *bbs = get_loop_body (loop);
>> 
>> 
>> Is this worth being part of the cost model such that it can have different
>> defaults for different micro-architecture?
> Hi,
> I don't know.  From my running, this penalizing function looks like a
> quite benchmark specific tuning.  If that's the case, tuning for
> different micro architecture may not give meaningful different
> results, at the cost of three parameters.
> Hi Bill,  I guess you are the original author?  Do you recall any
> motivation of this code or have some comments?  Thanks very much.
> Meanwhile, I can do some experiments on different AArch64 processors.

Hi Bin,

Yes, this is specific tuning due to problems observed on a
POWER model.  I don't necessarily recommend this approach for
other architectures without appropriate experimental verification
and tuning.

Bill

> 
> Thanks,
> bin
> 



Re: [PATCH/AARCH64] Fix some testcases for AARCH64 ILP32

2016-10-17 Thread Andrew Pinski
On Mon, Oct 17, 2016 at 2:55 AM, Richard Earnshaw (lists)
 wrote:
> On 16/10/16 22:39, Andrew Pinski wrote:
>> Hi,
>>   These testcases use long for 64bit integer which means they will
>> fail with -mabi=ilp32 on aarch64.  This reduces the number of failures
>> down for ILP32.
>>
>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions
>> (including with a multi-lib for ilp32).
>>
>> Thanks,
>> Andrew Pinski
>>
>> testsuite/ChangeLog:
>> * gcc.dg/tree-ssa/sra-17.c: Use long long instead of long.
>> * gcc.dg/tree-ssa/sra-18.c: Likewise.
>
> This changes the behaviour for these two tests on ARM, which is a 32-bit
> target.  What testing other than aarch64 have you done?   I'd like to
> see testing results for both 32-bit and 64-bit (including non-arm) systems.
>
>> * gcc.target/aarch64/aapcs64/test_align-7.c: Likewise.
>> * gcc.target/aarch64/cinc_common_1.c: Likewise.
>> * gcc.target/aarch64/combine_bfi_1.c: Likewise.
>> * gcc.target/aarch64/fmul_fcvt_1.c: Likewise.
>> * gcc.target/aarch64/mult-synth_4.c: Likewise.
>> * gcc.target/aarch64/pr68102_1.c: Likewise.
>> * gcc.target/aarch64/target_attr_3.c: Likewise.
>
> These are OK.

I committed the gcc.target changes and will be looking into
sra-17/sra-18 issue later this week.

Thanks,
Andrew

>
>>
>>
>> fixsometestcases.diff.txt
>>
>>
>> Index: testsuite/gcc.dg/tree-ssa/sra-17.c
>> ===
>> --- testsuite/gcc.dg/tree-ssa/sra-17.c(revision 241217)
>> +++ testsuite/gcc.dg/tree-ssa/sra-17.c(working copy)
>> @@ -7,7 +7,7 @@ extern void abort (void);
>>  int
>>  main (int argc, char **argv)
>>  {
>> -  long a[4] = { 7, 19, 11, 255 };
>> +  long long a[4] = { 7, 19, 11, 255 };
>>int tot = 0;
>>for (int i = 0; i < 4; i++)
>>  tot = (tot*256) + a[i];
>> Index: testsuite/gcc.dg/tree-ssa/sra-18.c
>> ===
>> --- testsuite/gcc.dg/tree-ssa/sra-18.c(revision 241217)
>> +++ testsuite/gcc.dg/tree-ssa/sra-18.c(working copy)
>> @@ -3,7 +3,7 @@
>>  /* { dg-additional-options "-mcpu=ev4" { target alpha*-*-* } } */
>>
>>  extern void abort (void);
>> -struct foo { long x; };
>> +struct foo { long long x; };
>>
>>  struct bar { struct foo f[2]; };
>>
>> Index: testsuite/gcc.target/aarch64/aapcs64/test_align-7.c
>> ===
>> --- testsuite/gcc.target/aarch64/aapcs64/test_align-7.c   (revision 
>> 241217)
>> +++ testsuite/gcc.target/aarch64/aapcs64/test_align-7.c   (working copy)
>> @@ -7,8 +7,8 @@
>>
>>  struct s
>>{
>> -long x;
>> -long y;
>> +long long x;
>> +long long y;
>>};
>>
>>  /* This still has size 16, so is still passed by value.  */
>> Index: testsuite/gcc.target/aarch64/cinc_common_1.c
>> ===
>> --- testsuite/gcc.target/aarch64/cinc_common_1.c  (revision 241217)
>> +++ testsuite/gcc.target/aarch64/cinc_common_1.c  (working copy)
>> @@ -15,14 +15,14 @@ barsi (int x)
>>return x > 100 ? x + 4 : x + 3;
>>  }
>>
>> -long
>> -foodi (long x)
>> +long long
>> +foodi (long long x)
>>  {
>>return x > 100 ? x - 2 : x - 1;
>>  }
>>
>> -long
>> -bardi (long x)
>> +long long
>> +bardi (long long x)
>>  {
>>return x > 100 ? x + 4 : x + 3;
>>  }
>> Index: testsuite/gcc.target/aarch64/combine_bfi_1.c
>> ===
>> --- testsuite/gcc.target/aarch64/combine_bfi_1.c  (revision 241217)
>> +++ testsuite/gcc.target/aarch64/combine_bfi_1.c  (working copy)
>> @@ -25,8 +25,8 @@ f4 (int x, int y)
>>return (x & ~0xff) | (y & 0xff);
>>  }
>>
>> -long
>> -f5 (long x, long y)
>> +long long
>> +f5 (long long x, long long y)
>>  {
>>return (x & ~0xull) | (y & 0x);
>>  }
>> Index: testsuite/gcc.target/aarch64/fmul_fcvt_1.c
>> ===
>> --- testsuite/gcc.target/aarch64/fmul_fcvt_1.c(revision 241217)
>> +++ testsuite/gcc.target/aarch64/fmul_fcvt_1.c(working copy)
>> @@ -27,13 +27,13 @@ ulsffoo##__a (float x)\
>>  }
>>
>>  #define FUNC_DEFD(__a)   \
>> -long \
>> +long long\
>>  dffoo##__a (double x)\
>>  {\
>>return x * __a##.0;\
>>  }\
>>   \
>> -unsigned long\
>> +unsigned long long   \
>>  udffoo##__a (double x)   \
>>  {\
>>return x * __a##.0;\
>> @@ -101,18 +101,18 @@ do 
>>  \
>>__builtin_abort ();\
>>  if (usffoo##__a (__b) != (unsigned int)(__b * __a))  \
>>__builtin_abort ();\
>> -if (lsffoo##__a (__b) != (long)(__b * __a))  \
>> +i

Re: [PATCH, C++] Warn on redefinition of builtin functions (PR c++/71973)

2016-10-17 Thread Joseph Myers
On Sun, 16 Oct 2016, Bernd Edlinger wrote:

> Second, the declaration in the glibc header files simply look wrong,
> because the type of argv, and envp is "char *const *" while the
> builtin function wants "const char**", thus only the array of char*
> itself is const, not the actual char stings they point to.

char *const * is the POSIX type.  (It can't be const char ** or const char 
*const * because you can't convert char ** implicitly to those types in 
ISO C.)  You'd need to check the list archives for rationale for the 
built-in functions doing something different.

> Third, in C the  builtins are not diagnosed, because C does only look
> at the mode of the parameters see match_builtin_function_types in
> c/c-decl.c, which may itself be wrong, because that makes an ABI
> decision dependent on the mode of the parameter.

The matching needs to be loose because of functions using types such as 
FILE * where the compiler doesn't know the exact contents of the type when 
processing built-in function definitions.  (Historically there were also 
issues with pre-ISO headers, but that may be less relevant now.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [patch] Fix ICE on ACATS test for Aarch64 at -O

2016-10-17 Thread Jeff Law

On 10/03/2016 09:45 AM, Eric Botcazou wrote:

Ping: https://gcc.gnu.org/ml/gcc-patches/2016-09/msg01781.html


2016-09-26  Eric Botcazou  

* expmed.c (expand_shift_1): Add MAY_FAIL parameter and do not assert
that the result is non-zero if it is true.
(maybe_expand_shift): New wrapper around expand_shift_1.
(emit_store_flag): Call maybe_expand_shift in lieu of expand_shift.



OK.  Sorry for the delay.

jeff


[PATCH] PR77895: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-17 Thread Ximin Luo
(Please keep me on CC, I am not subscribed)

In working on another patch for DW_AT_comp_dir behaviour, I tried to write a
test for that other patch. This test did not work as expected - DW_AT_comp_dir
was not even generated in the resulting output! But this had nothing to do with
my patch so I dug a bit deeper.

It turns out that GCC does not emit DW_AT_comp_dir if the source path given is
an absolute path, and the file being compiled contains a typedef to a compiler
builtin (sorry, I don't know the terminology here very well). This typedef
exists in the vast majority of cases in the real world via standard library
includes, so from the outside, the behaviour of GCC "looks like" it emits
DW_AT_comp_dir in all/most cases.

>From looking at the source code dwarf2out.c, Richard Biener determined that the
current code is written to "intend to" not emit DW_AT_comp_dir if to path is
absolute, regardless of the typedef. It is possible to "fix" this bug by a
1-line change, to conform to the way currently "intended" by GCC code. However,
I think a much better fix is a 22-line deletion of code from dwarf2out.c, to
instead emit DW_AT_comp_dir in all cases.

The original aim of not emitting DW_AT_comp_dir seems to be to avoid emitting
redundant information. However, this information is *not* redundant! The DWARF2
spec says:

"A DW_AT_comp_dir attribute whose value is a null-terminated string containing
the current working directory of the compilation command that produced this
compilation unit in whatever form makes sense for the host system."

Conceptually, the working directory is independent from an input file, so it
would *not* be redundant to emit it in the general case. In future versions of
DWARF, other information might be added (such as relative -I flags, etc) that
are dependent on knowing the working directory. The logic of "don't emit
DW_AT_comp_dir if DW_AT_name is absolute" would then break. In other words, the
choice to avoid emitting DW_AT_comp_dir is a brittle and premature optimisation
that loses information in the general case. Therefore, it is better to emit it
in all circumstances, in case the reader needs to know what the working
directory was at compile-time.

*Sometimes*, *parts* of it might be redundant yes - and rewriting DW_AT_name to
be relative to this, would remove the redundancy. Doing this is compatible with
the above points, and I could amend the patch to do this - although I suggest
it's not worth the effort, unless there is a function in GCC code that already
does this.

Some more minor advantages are:

- We delete 21 lines, this reduces complexity in the already-complex code.
  We can also get rid of the Darwin-specific workaround described here:
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53453 (Not currently part of my
  patch, but I can add this in.)

- The GCC test suite compiles tests via their absolute paths, and many many
  other buildsystems do this too. This was in fact my original problem - I was
  trying to test something involving DW_AT_comp_dir, but GCC does not emit this
  in the testsuite! Rather than saying "this is a bug, everyone else should
  spend effort fixing this", it is better to fix it in one place, i.e. the
  source of where it is (not) generated.

Thanks for your time,

Ximin

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
Index: gcc-7-20161009/gcc/dwarf2out.c
===
--- gcc-7-20161009.orig/gcc/dwarf2out.c
+++ gcc-7-20161009/gcc/dwarf2out.c
@@ -21994,7 +21994,7 @@ gen_compile_unit_die (const char *filena
 {
   add_name_attribute (die, filename);
   /* Don't add cwd for .  */
-  if (!IS_ABSOLUTE_PATH (filename) && filename[0] != '<')
+  if (filename[0] != '<')
 	add_comp_dir_attribute (die);
 }
 
@@ -26332,20 +26332,6 @@ prune_unused_types (void)
 prune_unmark_dies (ctnode->root_die);
 }
 
-/* Set the parameter to true if there are any relative pathnames in
-   the file table.  */
-int
-file_table_relative_p (dwarf_file_data **slot, bool *p)
-{
-  struct dwarf_file_data *d = *slot;
-  if (!IS_ABSOLUTE_PATH (d->filename))
-{
-  *p = true;
-  return 0;
-}
-  return 1;
-}
-
 /* Helpers to manipulate hash table of comdat type units.  */
 
 struct comdat_type_hasher : nofree_ptr_hash 
@@ -28152,15 +28138,7 @@ dwarf2out_early_finish (const char *file
   /* Add the name for the main input file now.  We delayed this from
  dwarf2out_init to avoid complications with PCH.  */
   add_name_attribute (comp_unit_die (), remap_debug_filename (filename));
-  if (!IS_ABSOLUTE_PATH (filename) || targetm.force_at_comp_dir)
-add_comp_dir_attribute (comp_unit_die ());
-  else if (get_AT (comp_unit_die (), DW_AT_comp_dir) == NULL)
-{
-  bool p = false;
-  file_table->traverse (&p);
-  if (p)
-	add_comp_dir_attribute (comp_unit_die ());
-}
+  add_comp_dir_attribute (comp_unit_die ());
 
   

[Patch, fortran] PR61420 [5/6/7 Regression] [OOP] type-bound procedure returning a procedure pointer fails to compile

2016-10-17 Thread Paul Richard Thomas
Dear All,

Committed as 'obvious' on trunk as revision 241274.

I will commit to 5- and 6-branches at the end of the week.

Cheers

Paul

2016-10-17  Paul Thomas  

PR fortran/61420
PR fortran/78013
* resolve.c (resolve_variable): Obtain the typespec for a
variable expression, when the variable is a function result
that is a procedure pointer.

2016-10-17  Paul Thomas  

PR fortran/61420
PR fortran/78013
* gfortran.dg/proc_ptr_49.f90: New test.
Index: gcc/fortran/resolve.c
===
*** gcc/fortran/resolve.c   (revision 241226)
--- gcc/fortran/resolve.c   (working copy)
*** resolve_variable (gfc_expr *e)
*** 5112,5117 
--- 5112,5122 
  
if (sym->ts.type != BT_UNKNOWN)
  gfc_variable_attr (e, &e->ts);
+   else if (sym->attr.flavor == FL_PROCEDURE
+  && sym->attr.function && sym->result
+  && sym->result->ts.type != BT_UNKNOWN
+  && sym->result->attr.proc_pointer)
+ e->ts = sym->result->ts;
else
  {
/* Must be a simple variable reference.  */
Index: gcc/testsuite/gfortran.dg/proc_ptr_49.f90
===
*** gcc/testsuite/gfortran.dg/proc_ptr_49.f90   (revision 0)
--- gcc/testsuite/gfortran.dg/proc_ptr_49.f90   (working copy)
***
*** 0 
--- 1,50 
+ ! { dg-do compile }
+ !
+ ! Tests the fix for PRs 78013 and 61420, both of which gave a
+ ! no IMPLICIT type message for the procedure pointer at assignment.
+ !
+ module m
+ 
+   implicit none
+ 
+   abstract interface
+ function I_f() result( r )
+   real :: r
+ end function I_f
+   end interface
+ 
+   type, abstract :: a_t
+ private
+ procedure(I_f), nopass, pointer :: m_f => null()
+   contains
+ private
+ procedure, pass(this), public :: f => get_f
+   end type a_t
+ 
+ contains
+ 
+   function get_f( this ) result( f_ptr )  ! Error message here.
+ class(a_t), intent(in)  :: this
+ procedure(I_f), pointer :: f_ptr
+ f_ptr => this%m_f ! Error here :-)
+   end function get_f
+ 
+ end module m
+ 
+ module test
+   implicit none
+ 
+   type functions
+   contains
+ procedure, nopass :: get_pf => get_it ! Error here
+   end type
+ 
+   class(functions), allocatable :: f
+ 
+ contains
+ 
+   function get_it()  ! Error message here.
+ procedure (real), pointer :: get_it
+   end function
+ 
+ end module


Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-10-17 Thread Markus Trippelsdorf
On 2016.10.17 at 17:30 +, Bernd Edlinger wrote:
> On 10/17/16 19:11, Markus Trippelsdorf wrote:
> >>> I'm seeing this warning a lot in valid low level C code for unsigned
> >>> integers. And I must say it look bogus in this context. Some examples:
> >
> > (All these examples are from qemu trunk.)
> >
> >>>  return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);
> >>>
> >
> > typedef struct {
> > uint64_t low;
> > uint16_t high;
> > } floatx80;
> >
> > static inline int floatx80_is_any_nan(floatx80 a)
> > {
> > return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);
> > }
> >
> >> With the shift op, the result depends on integer promotion rules,
> >> and if the value is signed, it can invoke undefined behavior.
> >>
> >> But if a.low is a unsigned short for instance, a warning would be
> >> more than justified here.
> >
> >>>  if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
> >>>
> >>
> >> Yes interesting, aSig is signed int, right?
> >
> > No, it is uint32_t.
> >
> >> So if the << will overflow, the code is invoking undefined behavior.
> >>
> >>
> >>>  && (uint64_t) (extractFloatx80Frac(a) << 1))
> >>>
> >>
> >> What is the result type of extractFloatx80Frac() ?
> >
> > static inline uint64_t extractFloatx80Frac( floatx80 a )
> >
> >>
> >>>  if ((plen < KEYLENGTH) && (key << plen))
> >>>
> >>
> >> This is from linux, yes, I have not seen that with the first
> >> version where the warning is only for signed shift ops.
> >>
> >> At first sight it looks really, like could it be that "key < plen"
> >> was meant? But yes, actually it works correctly as long
> >> as int is 32 bit, if int is 64 bits, that code would break
> >> immediately.
> >
> > u8 plen;
> > u32 key;
> >
> >> I think in the majority of code, where the author was aware of
> >> possible overflow issues and integer promotion rules, he will
> >> have used unsigned integer types, of sufficient precision.
> >
> > As I wrote above, all these warning were for unsigned integer types.
> > And all examples are perfectly valid code as far as I can see.
> >
> 
> I would be fine with disabling the warning in cases where the shift
> is done in unsigned int.  Note however, that the linux code is
> dependent on sizeof(int)<=sizeof(u32), but the warning would certainly
> be more helpful if it comes back at the day when int starts to be 64
> bits wide.
> 
> 
> How about this untested patch, does it reduce the false positive rate
> for you?

Yes, now everything is fine. Thank you.

-- 
Markus


Patch ping

2016-10-17 Thread Jakub Jelinek
Hi!

I'd like to ping the
[C++ PATCH] Fix -Wimplicit-fallthrough in templates (PR c++/77886)
patch:

On Sat, Oct 08, 2016 at 08:13:50AM +0200, Jakub Jelinek wrote:
> 2016-10-08  Jakub Jelinek  
> 
>   PR c++/77886
>   * pt.c (tsubst_expr)  Copy over
>   FALLTHROUGH_LABEL_P flag to the new LABEL_DECL.
>   (tsubst_expr) : Likewise.
> 
>   * g++.dg/warn/Wimplicit-fallthrough-2.C: New test.

Thanks.

Jakub


Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-10-17 Thread Bernd Edlinger
On 10/17/16 19:11, Markus Trippelsdorf wrote:
>>> I'm seeing this warning a lot in valid low level C code for unsigned
>>> integers. And I must say it look bogus in this context. Some examples:
>
> (All these examples are from qemu trunk.)
>
>>>  return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);
>>>
>
> typedef struct {
> uint64_t low;
> uint16_t high;
> } floatx80;
>
> static inline int floatx80_is_any_nan(floatx80 a)
> {
> return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);
> }
>
>> With the shift op, the result depends on integer promotion rules,
>> and if the value is signed, it can invoke undefined behavior.
>>
>> But if a.low is a unsigned short for instance, a warning would be
>> more than justified here.
>
>>>  if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
>>>
>>
>> Yes interesting, aSig is signed int, right?
>
> No, it is uint32_t.
>
>> So if the << will overflow, the code is invoking undefined behavior.
>>
>>
>>>  && (uint64_t) (extractFloatx80Frac(a) << 1))
>>>
>>
>> What is the result type of extractFloatx80Frac() ?
>
> static inline uint64_t extractFloatx80Frac( floatx80 a )
>
>>
>>>  if ((plen < KEYLENGTH) && (key << plen))
>>>
>>
>> This is from linux, yes, I have not seen that with the first
>> version where the warning is only for signed shift ops.
>>
>> At first sight it looks really, like could it be that "key < plen"
>> was meant? But yes, actually it works correctly as long
>> as int is 32 bit, if int is 64 bits, that code would break
>> immediately.
>
> u8 plen;
> u32 key;
>
>> I think in the majority of code, where the author was aware of
>> possible overflow issues and integer promotion rules, he will
>> have used unsigned integer types, of sufficient precision.
>
> As I wrote above, all these warning were for unsigned integer types.
> And all examples are perfectly valid code as far as I can see.
>

I would be fine with disabling the warning in cases where the shift
is done in unsigned int.  Note however, that the linux code is
dependent on sizeof(int)<=sizeof(u32), but the warning would certainly
be more helpful if it comes back at the day when int starts to be 64
bits wide.


How about this untested patch, does it reduce the false positive rate
for you?


Thanks
Bernd.

2016-10-17  Bernd Edlinger  

	* c-common.c (c_common_truthvalue_conversion): Warn only for signed
	integer shift ops in boolean context.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 241270)
+++ gcc/c-family/c-common.c	(working copy)
@@ -3328,8 +3328,10 @@
 	   TREE_OPERAND (expr, 0));
 
 case LSHIFT_EXPR:
-  warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
-		  "<< in boolean context, did you mean '<' ?");
+  if (TREE_CODE (TREE_TYPE (expr)) == INTEGER_TYPE
+	  && !TYPE_UNSIGNED (TREE_TYPE (expr)))
+	warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
+		"<< in boolean context, did you mean '<' ?");
   break;
 
 case COND_EXPR:


Re: [PATCH AArch64]Penalize vector cost for large loops with too many vect insns.

2016-10-17 Thread Bin.Cheng
On Sat, Oct 15, 2016 at 3:07 AM, kugan
 wrote:
> Hi Bin,
>
> On 15/10/16 00:15, Bin Cheng wrote:
>>
>> +/* Test for likely overcommitment of vector hardware resources.  If a
>> +   loop iteration is relatively large, and too large a percentage of
>> +   instructions in the loop are vectorized, the cost model may not
>> +   adequately reflect delays from unavailable vector resources.
>> +   Penalize the loop body cost for this case.  */
>> +
>> +static void
>> +aarch64_density_test (struct aarch64_vect_loop_cost_data *data)
>> +{
>> +  const int DENSITY_PCT_THRESHOLD = 85;
>> +  const int DENSITY_SIZE_THRESHOLD = 128;
>> +  const int DENSITY_PENALTY = 10;
>> +  struct loop *loop = data->loop_info;
>> +  basic_block *bbs = get_loop_body (loop);
>
>
> Is this worth being part of the cost model such that it can have different
> defaults for different micro-architecture?
Hi,
I don't know.  From my running, this penalizing function looks like a
quite benchmark specific tuning.  If that's the case, tuning for
different micro architecture may not give meaningful different
results, at the cost of three parameters.
Hi Bill,  I guess you are the original author?  Do you recall any
motivation of this code or have some comments?  Thanks very much.
Meanwhile, I can do some experiments on different AArch64 processors.

Thanks,
bin


Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-10-17 Thread Markus Trippelsdorf
On 2016.10.17 at 16:51 +, Bernd Edlinger wrote:
> On 10/17/16 17:23, Markus Trippelsdorf wrote:
> > On 2016.09.29 at 18:52 +, Bernd Edlinger wrote:
> >> On 09/29/16 20:03, Jason Merrill wrote:
> >>> On Wed, Sep 28, 2016 at 12:09 PM, Bernd Edlinger
> >>>  wrote:
>  On 09/28/16 16:41, Jason Merrill wrote:
> > On Tue, Sep 27, 2016 at 11:10 AM, Bernd Edlinger
> >  wrote:
> >> On 09/27/16 16:42, Jason Merrill wrote:
> >>> On Tue, Sep 27, 2016 at 10:28 AM, Bernd Edlinger
> >>>  wrote:
>  On 09/27/16 16:10, Florian Weimer wrote:
> > * Bernd Edlinger:
> >
> >>> “0 << 0” is used in a similar context, to create a zero constant 
> >>> for a
> >>> multi-bit subfield of an integer.
> >>>
> >>> This example comes from GDB, in bfd/elf64-alpha.c:
> >>>
> >>> |   insn = INSN_ADDQ | (16 << 21) | (0 << 16) | (0 << 0);
> >>>
> >>
> >> Of course that is not a boolean context, and will not get a 
> >> warning.
> >>
> >> Question is if "if (1 << 0)" is possibly a miss-spelled "if (1 < 
> >> 0)".
> >>
> >> Maybe 1 and 0 come from macro expansion
> >
> > But what's the intent of treating 1 << 0 and 0 << 0 differently in 
> > the
> > patch, then?
> 
>  I am not sure if it was a good idea.
> 
>  I saw, we had code of the form
>  bool flag = 1 << 2;
> 
>  another value LOOKUP_PROTECT is  1 << 0, and
>  bool flag = 1 << 0;
> 
>  would at least not overflow the allowed value range of a boolean.
> >>>
> >>> Assigning a bit mask to a bool variable is still probably not what was
> >>> intended, even if it doesn't change the value.
> >>
> >> That works for me too.
> >> I can simply remove that exception.
> >
> > Sounds good.
> 
>  Great.  Is that an "OK with that change"?
> >>>
> >>> What do you think about dropping the TYPE_UNSIGNED exception as well?
> >>> I don't see what difference that makes.
> >>>
> >>
> >>
> >> If I drop that exception, then I could also drop the check for
> >> INTEGER_TYPE and the whole if, because I think other types can not
> >> happen, but if they are allowed they are as well bogus here.
> >>
> >> I can try a bootstrap and see if there are false positives.
> >>
> >> But I can do that as well in a follow-up patch, this should probably
> >> be done step by step, especially when it may trigger some false
> >> positives.
> >>
> >> I think I could also add more stuff, like unary + or - ?
> >> or maybe also binary +, -, * and / ?
> >>
> >> We already discussed making this a multi-level option,
> >> and maybe enabling the higher level explicitly in the
> >> boot-strap.
> >>
> >> As long as the warning continues to find more bugs than false
> >> positives, it is probably worth extending it to more cases.
> >>
> >> However unsigned integer shift are not undefined if they overflow.
> >>
> >> It is possible that this warning will then trigger also on valid
> >> code that does loop termination with unsigned int left shifting.
> >> I dont have a real example, but maybe  like this hypothetical C-code:
> >>
> >>   unsigned int x=1, bits=0;
> >>   while (x << bits) bits++;
> >>   printf("bits=%d\n", bits);
> >>
> >>
> >> Is it OK for everybody to warn for this on -Wall, or maybe only
> >> when -Wextra or for instance -Wint-in-bool-context=2 is used ?
> >
> > I'm seeing this warning a lot in valid low level C code for unsigned
> > integers. And I must say it look bogus in this context. Some examples:

(All these examples are from qemu trunk.)

> >  return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);
> >

typedef struct {
uint64_t low;
uint16_t high;
} floatx80;

static inline int floatx80_is_any_nan(floatx80 a)
{
return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);
}

> With the shift op, the result depends on integer promotion rules,
> and if the value is signed, it can invoke undefined behavior.
>
> But if a.low is a unsigned short for instance, a warning would be
> more than justified here.

> >  if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
> >
>
> Yes interesting, aSig is signed int, right?

No, it is uint32_t.

> So if the << will overflow, the code is invoking undefined behavior.
>
>
> >  && (uint64_t) (extractFloatx80Frac(a) << 1))
> >
>
> What is the result type of extractFloatx80Frac() ?

static inline uint64_t extractFloatx80Frac( floatx80 a )

>
> >  if ((plen < KEYLENGTH) && (key << plen))
> >
>
> This is from linux, yes, I have not seen that with the first
> version where the warning is only for signed shift ops.
>
> At first sight it looks really, like could it be that "key < plen"
> was meant? But yes, actually it works correctly as long
> as int is 32 bit, if int is 64 bits, that code would break
> immediately.

u8 plen;
u32 key;

> I think in the majority of code

Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-10-17 Thread Alexander Monakov
On Mon, 17 Oct 2016, Bernd Schmidt wrote:
> On 10/14/2016 06:39 PM, Alexander Monakov wrote:
> > I'm resending the patch series with backend prerequisites for OpenMP
> > offloading to the NVIDIA PTX ISA.  The patches are rebased on trunk.
> 
> What's the status of the branch? Is it expected to work? I'm trying to compile
> the OpenMP version of these benchmarks:
>   https://codesign.llnl.gov/lulesh.php
> 
> and the resulting binary fails as follows:
> 
> libgomp: Link error log error   : Size doesn't match for '__nvptx_stacks' in
> 'Input 8', first specified in 'Input 8'
> error   : Multiple definition of '__nvptx_stacks' in 'Input 8', first defined
> in 'Input 8'

I've just pushed two commits to the branch to fix this issue.  Before those, the
last commit left the branch in a state where an incremental build seemed ok
(because libgcc/libgomp weren't rebuilt with the new cc1), but a from-scratch
build was broken like you've shown.  LULESH is known to work.  I also intend to
perform a trunk merge soon.

> I think before merging this work we'll need to have some idea of how well it
> works on real-world code.

This patchset and the branch lay the foundation, there's more work to be
done, in particular on the performance improvements side. There should be
an agreement on these fundamental bits first, before moving on to fine-tuning.

Alexander


Re: [PATCH AArch64]Penalize vector cost for large loops with too many vect insns.

2016-10-17 Thread James Greenhalgh
On Sat, Oct 15, 2016 at 01:07:12PM +1100, kugan wrote:
> Hi Bin,
> 
> On 15/10/16 00:15, Bin Cheng wrote:
> >+/* Test for likely overcommitment of vector hardware resources.  If a
> >+   loop iteration is relatively large, and too large a percentage of
> >+   instructions in the loop are vectorized, the cost model may not
> >+   adequately reflect delays from unavailable vector resources.
> >+   Penalize the loop body cost for this case.  */
> >+
> >+static void
> >+aarch64_density_test (struct aarch64_vect_loop_cost_data *data)
> >+{
> >+  const int DENSITY_PCT_THRESHOLD = 85;
> >+  const int DENSITY_SIZE_THRESHOLD = 128;
> >+  const int DENSITY_PENALTY = 10;
> >+  struct loop *loop = data->loop_info;
> >+  basic_block *bbs = get_loop_body (loop);
> 
> Is this worth being part of the cost model such that it can have
> different defaults for different micro-architecture?

I think this is a relevant point, even if we do choose these values for
the generic compilation model, we may want to tune this on a per-core basis.

So, pulling these magic numbers out in to a new field in the CPU tuning
structures (tune_params) is probably the right approach.

Thanks,
James



Re: [Patch] Backport fix for PR 52085 to gcc-5-branch?

2016-10-17 Thread Senthil Kumar Selvaraj

Richard Biener writes:

> On Mon, Oct 17, 2016 at 12:21 PM, Senthil Kumar Selvaraj
>  wrote:
>> Hi,
>>
>>   The fix for PR 52085 went into trunk when trunk was 6.0. I ran into the
>>   same issue on a gcc 5.x and found that the fix didn't get backported.
>>
>>   Bootstrapped and reg tested below patch with x86-64-pc-linux. Ok to
>>   backport to gcc-5-branch?
>
> Ok with me but please double-check there was no fallout.

I boostrapped and ran against x86_64-pc-linux again, just to be sure.
No regressions.

I'll run the reg tests against arm-none-eabi. Can I commit it if that
passes?

Regards
Senthil
>
> Richard.
>
>> Regards
>> Senthil
>>
>> gcc/c/ChangeLog
>>
>> 2016-10-17  Senthil Kumar Selvaraj  
>>
>>   Backport from mainline
>> 2015-04-25  Marek Polacek  
>> PR c/52085
>> * c-decl.c (finish_enum): Copy over TYPE_ALIGN.  Also check for 
>> "mode"
>> attribute.
>>
>> gcc/testsuite/ChangeLog
>> 2016-10-17  Senthil Kumar Selvaraj  
>>
>> Backport from mainline
>> 2015-04-25  Marek Polacek  
>> PR c/52085
>> * gcc.dg/enum-incomplete-2.c: New test.
>> * gcc.dg/enum-mode-1.c: New test.
>>
>>
>> diff --git gcc/c/c-decl.c gcc/c/c-decl.c
>> index d1e7444..c508e7f 100644
>> --- gcc/c/c-decl.c
>> +++ gcc/c/c-decl.c
>> @@ -8050,7 +8050,7 @@ finish_enum (tree enumtype, tree values, tree 
>> attributes)
>>
>>/* If the precision of the type was specified with an attribute and it
>>   was too small, give an error.  Otherwise, use it.  */
>> -  if (TYPE_PRECISION (enumtype))
>> +  if (TYPE_PRECISION (enumtype) && lookup_attribute ("mode", attributes))
>>  {
>>if (precision > TYPE_PRECISION (enumtype))
>> {
>> @@ -8078,6 +8078,7 @@ finish_enum (tree enumtype, tree values, tree 
>> attributes)
>>TYPE_MIN_VALUE (enumtype) = TYPE_MIN_VALUE (tem);
>>TYPE_MAX_VALUE (enumtype) = TYPE_MAX_VALUE (tem);
>>TYPE_UNSIGNED (enumtype) = TYPE_UNSIGNED (tem);
>> +  TYPE_ALIGN (enumtype) = TYPE_ALIGN (tem);
>>TYPE_SIZE (enumtype) = 0;
>>TYPE_PRECISION (enumtype) = TYPE_PRECISION (tem);
>>
>> diff --git gcc/testsuite/gcc.dg/enum-incomplete-2.c 
>> gcc/testsuite/gcc.dg/enum-incomplete-2.c
>> new file mode 100644
>> index 000..5970551
>> --- /dev/null
>> +++ gcc/testsuite/gcc.dg/enum-incomplete-2.c
>> @@ -0,0 +1,41 @@
>> +/* PR c/52085 */
>> +/* { dg-do compile } */
>> +/* { dg-options "" } */
>> +
>> +#define SA(X) _Static_assert((X),#X)
>> +
>> +enum e1;
>> +enum e1 { A } __attribute__ ((__packed__));
>> +enum e2 { B } __attribute__ ((__packed__));
>> +SA (sizeof (enum e1) == sizeof (enum e2));
>> +SA (_Alignof (enum e1) == _Alignof (enum e2));
>> +
>> +enum e3;
>> +enum e3 { C = 256 } __attribute__ ((__packed__));
>> +enum e4 { D = 256 } __attribute__ ((__packed__));
>> +SA (sizeof (enum e3) == sizeof (enum e4));
>> +SA (_Alignof (enum e3) == _Alignof (enum e4));
>> +
>> +enum e5;
>> +enum e5 { E = __INT_MAX__ } __attribute__ ((__packed__));
>> +enum e6 { F = __INT_MAX__ } __attribute__ ((__packed__));
>> +SA (sizeof (enum e5) == sizeof (enum e6));
>> +SA (_Alignof (enum e5) == _Alignof (enum e6));
>> +
>> +enum e7;
>> +enum e7 { G } __attribute__ ((__mode__(__byte__)));
>> +enum e8 { H } __attribute__ ((__mode__(__byte__)));
>> +SA (sizeof (enum e7) == sizeof (enum e8));
>> +SA (_Alignof (enum e7) == _Alignof (enum e8));
>> +
>> +enum e9;
>> +enum e9 { I } __attribute__ ((__packed__, __mode__(__byte__)));
>> +enum e10 { J } __attribute__ ((__packed__, __mode__(__byte__)));
>> +SA (sizeof (enum e9) == sizeof (enum e10));
>> +SA (_Alignof (enum e9) == _Alignof (enum e10));
>> +
>> +enum e11;
>> +enum e11 { K } __attribute__ ((__mode__(__word__)));
>> +enum e12 { L } __attribute__ ((__mode__(__word__)));
>> +SA (sizeof (enum e11) == sizeof (enum e12));
>> +SA (_Alignof (enum e11) == _Alignof (enum e12));
>> diff --git gcc/testsuite/gcc.dg/enum-mode-1.c 
>> gcc/testsuite/gcc.dg/enum-mode-1.c
>> new file mode 100644
>> index 000..a701123
>> --- /dev/null
>> +++ gcc/testsuite/gcc.dg/enum-mode-1.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +
>> +enum e1 { A = 256 } __attribute__((__mode__(__byte__))); /* { dg-error 
>> "specified mode too small for enumeral values" } */
>> +enum e2 { B = 256 } __attribute__((__packed__, __mode__(__byte__))); /* { 
>> dg-error "specified mode too small for enumeral values" } */
>> +
>> +enum e3 { C = __INT_MAX__ } __attribute__((__mode__(__QI__))); /* { 
>> dg-error "specified mode too small for enumeral values" } */
>> +enum e4 { D = __INT_MAX__ } __attribute__((__packed__, __mode__(__QI__))); 
>> /* { dg-error "specified mode too small for enumeral values" } */
>> +
>> +enum e5 { E = __INT_MAX__ } __attribute__((__mode__(__HI__))); /* { 
>> dg-error "specified mode too small for enumeral values" } */
>> +enum e6 { F = __INT_MAX__ } __attribute__((__packed__, __mode__(__HI__))); 
>> /* { dg-error "specified mode too small for enumeral values" } */



libgo patch committed: copy rdebug code from Go 1.7

2016-10-17 Thread Ian Lance Taylor
This patch to libgo copies the rdebug code from the Go 1.7 runtime to libgo.

While we're at it, this updates the runtime/debug package, and starts
running its testsuite by default.  I'm not sure why runtime/debug was
not previously updated to 1.7.  Doing that led me to fix some minor
aspects of runtime.Stack and the C function runtime/debug.readGCStats.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 241197)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-880cb0a45590d992880fc6aabc7484e54c817eeb
+314ba28067383516c213ba84c931f93325a48c39
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 241197)
+++ libgo/Makefile.am   (working copy)
@@ -515,7 +515,6 @@ runtime_files = \
lfstack.c \
malloc.c \
netpoll.c \
-   rdebug.c \
reflect.c \
runtime1.c \
sigqueue.c \
@@ -3035,6 +3034,7 @@ TEST_PACKAGES = \
os/user/check \
path/filepath/check \
regexp/syntax/check \
+   runtime/debug/check \
runtime/pprof/check \
runtime/internal/atomic/check \
runtime/internal/sys/check \
Index: libgo/go/runtime/debug/garbage.go
===
--- libgo/go/runtime/debug/garbage.go   (revision 240942)
+++ libgo/go/runtime/debug/garbage.go   (working copy)
@@ -16,17 +16,10 @@ type GCStats struct {
NumGC  int64   // number of garbage collections
PauseTotal time.Duration   // total pause for all collections
Pause  []time.Duration // pause history, most recent first
+   PauseEnd   []time.Time // pause end times history, most recent 
first
PauseQuantiles []time.Duration
 }
 
-// Implemented in package runtime.
-func readGCStats(*[]time.Duration)
-func enableGC(bool) bool
-func setGCPercent(int) int
-func freeOSMemory()
-func setMaxStack(int) int
-func setMaxThreads(int) int
-
 // ReadGCStats reads statistics about garbage collection into stats.
 // The number of entries in the pause history is system-dependent;
 // stats.Pause slice will be reused if large enough, reallocated otherwise.
@@ -38,25 +31,36 @@ func setMaxThreads(int) int
 func ReadGCStats(stats *GCStats) {
// Create a buffer with space for at least two copies of the
// pause history tracked by the runtime. One will be returned
-   // to the caller and the other will be used as a temporary buffer
-   // for computing quantiles.
+   // to the caller and the other will be used as transfer buffer
+   // for end times history and as a temporary buffer for
+   // computing quantiles.
const maxPause = len(((*runtime.MemStats)(nil)).PauseNs)
-   if cap(stats.Pause) < 2*maxPause {
-   stats.Pause = make([]time.Duration, 2*maxPause)
+   if cap(stats.Pause) < 2*maxPause+3 {
+   stats.Pause = make([]time.Duration, 2*maxPause+3)
}
 
-   // readGCStats fills in the pause history (up to maxPause entries)
-   // and then three more: Unix ns time of last GC, number of GC,
-   // and total pause time in nanoseconds. Here we depend on the
-   // fact that time.Duration's native unit is nanoseconds, so the
-   // pauses and the total pause time do not need any conversion.
+   // readGCStats fills in the pause and end times histories (up to
+   // maxPause entries) and then three more: Unix ns time of last GC,
+   // number of GC, and total pause time in nanoseconds. Here we
+   // depend on the fact that time.Duration's native unit is
+   // nanoseconds, so the pauses and the total pause time do not need
+   // any conversion.
readGCStats(&stats.Pause)
n := len(stats.Pause) - 3
stats.LastGC = time.Unix(0, int64(stats.Pause[n]))
stats.NumGC = int64(stats.Pause[n+1])
stats.PauseTotal = stats.Pause[n+2]
+   n /= 2 // buffer holds pauses and end times
stats.Pause = stats.Pause[:n]
 
+   if cap(stats.PauseEnd) < maxPause {
+   stats.PauseEnd = make([]time.Time, 0, maxPause)
+   }
+   stats.PauseEnd = stats.PauseEnd[:0]
+   for _, ns := range stats.Pause[n : n+n] {
+   stats.PauseEnd = append(stats.PauseEnd, time.Unix(0, int64(ns)))
+   }
+
if len(stats.PauseQuantiles) > 0 {
if n == 0 {
for i := range stats.PauseQuantiles {
@@ -91,9 +95,9 @@ func (x byDuration) Less(i, j int) bool
 // at startup, or 100 if the variable is not set.
 // A negative percentage disables garbage collection.
 func SetGCPercent(percent int) int {
-

Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-10-17 Thread Bernd Edlinger
On 10/17/16 17:23, Markus Trippelsdorf wrote:
> On 2016.09.29 at 18:52 +, Bernd Edlinger wrote:
>> On 09/29/16 20:03, Jason Merrill wrote:
>>> On Wed, Sep 28, 2016 at 12:09 PM, Bernd Edlinger
>>>  wrote:
 On 09/28/16 16:41, Jason Merrill wrote:
> On Tue, Sep 27, 2016 at 11:10 AM, Bernd Edlinger
>  wrote:
>> On 09/27/16 16:42, Jason Merrill wrote:
>>> On Tue, Sep 27, 2016 at 10:28 AM, Bernd Edlinger
>>>  wrote:
 On 09/27/16 16:10, Florian Weimer wrote:
> * Bernd Edlinger:
>
>>> “0 << 0” is used in a similar context, to create a zero constant 
>>> for a
>>> multi-bit subfield of an integer.
>>>
>>> This example comes from GDB, in bfd/elf64-alpha.c:
>>>
>>> |   insn = INSN_ADDQ | (16 << 21) | (0 << 16) | (0 << 0);
>>>
>>
>> Of course that is not a boolean context, and will not get a warning.
>>
>> Question is if "if (1 << 0)" is possibly a miss-spelled "if (1 < 0)".
>>
>> Maybe 1 and 0 come from macro expansion
>
> But what's the intent of treating 1 << 0 and 0 << 0 differently in the
> patch, then?

 I am not sure if it was a good idea.

 I saw, we had code of the form
 bool flag = 1 << 2;

 another value LOOKUP_PROTECT is  1 << 0, and
 bool flag = 1 << 0;

 would at least not overflow the allowed value range of a boolean.
>>>
>>> Assigning a bit mask to a bool variable is still probably not what was
>>> intended, even if it doesn't change the value.
>>
>> That works for me too.
>> I can simply remove that exception.
>
> Sounds good.

 Great.  Is that an "OK with that change"?
>>>
>>> What do you think about dropping the TYPE_UNSIGNED exception as well?
>>> I don't see what difference that makes.
>>>
>>
>>
>> If I drop that exception, then I could also drop the check for
>> INTEGER_TYPE and the whole if, because I think other types can not
>> happen, but if they are allowed they are as well bogus here.
>>
>> I can try a bootstrap and see if there are false positives.
>>
>> But I can do that as well in a follow-up patch, this should probably
>> be done step by step, especially when it may trigger some false
>> positives.
>>
>> I think I could also add more stuff, like unary + or - ?
>> or maybe also binary +, -, * and / ?
>>
>> We already discussed making this a multi-level option,
>> and maybe enabling the higher level explicitly in the
>> boot-strap.
>>
>> As long as the warning continues to find more bugs than false
>> positives, it is probably worth extending it to more cases.
>>
>> However unsigned integer shift are not undefined if they overflow.
>>
>> It is possible that this warning will then trigger also on valid
>> code that does loop termination with unsigned int left shifting.
>> I dont have a real example, but maybe  like this hypothetical C-code:
>>
>>   unsigned int x=1, bits=0;
>>   while (x << bits) bits++;
>>   printf("bits=%d\n", bits);
>>
>>
>> Is it OK for everybody to warn for this on -Wall, or maybe only
>> when -Wextra or for instance -Wint-in-bool-context=2 is used ?
>
> I'm seeing this warning a lot in valid low level C code for unsigned
> integers. And I must say it look bogus in this context. Some examples:
>
>  return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);
>

With the shift op, the result depends on integer promotion rules,
and if the value is signed, it can invoke undefined behavior.

But if a.low is a unsigned short for instance, a warning would be
more than justified here.

>  if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {
>

Yes interesting, aSig is signed int, right?

So if the << will overflow, the code is invoking undefined behavior.


>  && (uint64_t) (extractFloatx80Frac(a) << 1))
>

What is the result type of extractFloatx80Frac() ?


>  if ((plen < KEYLENGTH) && (key << plen))
>

This is from linux, yes, I have not seen that with the first
version where the warning is only for signed shift ops.

At first sight it looks really, like could it be that "key < plen"
was meant? But yes, actually it works correctly as long
as int is 32 bit, if int is 64 bits, that code would break
immediately.

I think in the majority of code, where the author was aware of
possible overflow issues and integer promotion rules, he will
have used unsigned integer types, of sufficient precision.

I think all of the places where I have seen this warning was issued for
wrong code it was with signed integer shifts.



Bernd.


Re: [PATCH] Reduce stack usage in sha512 (PR target/77308)

2016-10-17 Thread Kyrill Tkachov


On 30/09/16 14:34, Bernd Edlinger wrote:

On 09/30/16 12:14, Bernd Edlinger wrote:

Eric Botcazou wrote:

A comment before the SETs and a testcase would be nice.  IIRC
we do have stack size testcases via using -fstack-usage.

Or -Wstack-usage, which might be more appropriate here.

Yes.  good idea.  I was not aware that we already have that kind of tests.

When trying to write this test. I noticed, that I did not try -Os so far.
But for -Os the stack is still the unchanged 3500 bytes.

However for embedded targets I am often inclined to use -Os, and
would certainly not expect the stack to explode...

I see in arm.md there are places like

/* If we're optimizing for size, we prefer the libgcc calls.  */
if (optimize_function_for_size_p (cfun))
  FAIL;


Oh, yeah.  The comment is completely misleading.

If this pattern fails, expmed.c simply expands some
less efficient rtl, which also results in two shifts
and one or-op.  No libgcc calls at all.

So in simple cases without spilling the resulting
assembler is the same, regardless if this pattern
fails or not.  But the half-defined out registers
make a big difference when it has to be spilled.


/* Expand operation using core-registers.
   'FAIL' would achieve the same thing, but this is a bit smarter.  */
scratch1 = gen_reg_rtx (SImode);
scratch2 = gen_reg_rtx (SImode);
arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
   operands[2], scratch1, scratch2);


.. that explains why this happens.  I think it would be better to
use the emit_coreregs for shift count >= 32, because these are
effectively 32-bit shifts.

Will try if that can be improved, and come back with the
results.


The test case with -Os has 3520 bytes stack usage.
When only shift count >= 32 are handled we
have still 3000 bytes stack usage.
And when arm_emit_coreregs_64bit_shift is always
allowed to run, we have 2360 bytes stack usage.

Also for the code size it is better not to fail this
pattern.  So I propose to remove this exception in all
three expansions.

Here is an improved patch with the test case from the PR.
And a comment on the redundant SET why it is better to clear
the out register first.


Bootstrap and reg-testing on arm-linux-gnueabihf.
Is it OK for trunk?


This looks ok to me.
Thanks,
Kyrill



Thanks
Bernd.




[PING #2] [PATCH] fix outstanding -Wformat-length failures (pr77735 et al.)

2016-10-17 Thread Martin Sebor

I'm still looking for a review of the patch below:

  https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00043.html



The patch should clean up the remaining test suite failures on
ILP32 targets and also fixes up some remaining issues in the
gimple-ssa-sprintf pass that stand in the way of re-enabling
the printf-return-value optimization.

I'm traveling next week so I'm hoping to enable the optimization
shortly after this patch goes in so that if there's any fallout
from it I can fix it before I leave.

Thanks
Martin

On 10/02/2016 02:10 PM, Martin Sebor wrote:

The attached patch fixes a number of outstanding test failures
and ILP32-related bugs in the gimple-ssa-sprintf pattch pointed
out in bug 77676 and 77735).  The patch also fixes c_strlen to
correctly handle wide strings (previously it accepted them but
treated them as nul-terminated byte sequences), and adjusts the
handling of "%a" to avoid assuming a specific number of decimal
digits (this is likely a defect in C11 that I'm pursuing with
WG14).

Tested on powerpc64le, i386, and x86_64.

There is one outstanding failure in the builtin-sprintf-warn-1.c
test on powerpc64le that looks like it might be due to the
printf_pointer_format target hook not having been set up entirely
correctly.  I'll look into that separately, along with pr77819.

Martin




Re: [PATCH][AArch64] Use new target pass registration framework for FMA steering pass

2016-10-17 Thread James Greenhalgh
On Thu, Oct 13, 2016 at 06:14:16PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch moves the aarch64-specific FMA steering pass registration into the 
> new framework
> that Jakub introduced. With this patch the RTL dump for the steering pass is 
> now numbered properly
> so that it appears after the regrename pass, rather than getting a dump 
> number that puts it after
> all the other passes.
> 
> I've followed a similar approach to [1] and added an aarch64-passes.def file 
> and updated
> PASSES_EXTRA in t-aarch64. I deleted cortex-a57-fma-steering.h as I don't 
> think it adds any value.
> The prototype for the make_pass* function works just as well in 
> aarch64-protos.h I think.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> Manually checked that the pass still runs when tuning for Cortex-A57 and 
> doesn't run otherwise.
> 
> Ok for trunk?

OK, a comment on git diffs below that doesn't change the patch content.

> [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00615.html
> 
> 2016-10-13  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.c: Delete inclusion of
> cortex-a57-fma-steering.h.
> (aarch64_override_options): Delete call
> to aarch64_register_fma_steering.
> * config/aarch64/aarch64-protos.h (make_pass_fma_steering): Declare.
> * config/aarch64/cortex-a57-fma-steering.h: Delete.
> * config/aarch64/aarch64-passes.def: New file.
> * config/aarch64/cortex-a57-fma-steering.c
> (aarch64_register_fma_steering): Delete definition.
> (make_pass_fma_steering): Remove static qualifier.
> * config/aarch64/t-aarch64 (PASSES_EXTRA): New directive.
> (cortex-a57-fma-steering.o): Remove dependency on
> cortex-a57-fma-steering.h.


> diff --git a/gcc/config/aarch64/cortex-a57-fma-steering.h 
> b/gcc/config/aarch64/aarch64-passes.def
> similarity index 78%
> rename from gcc/config/aarch64/cortex-a57-fma-steering.h
> rename to gcc/config/aarch64/aarch64-passes.def
> index 
> 65bf5acc132d2db645d1b00ef031dc33a195bb78..7fe80391a3fb0dc79715b9fb23fd4c08a9d26d74
>  100644
> --- a/gcc/config/aarch64/cortex-a57-fma-steering.h
> +++ b/gcc/config/aarch64/aarch64-passes.def
> @@ -1,6 +1,5 @@
> -/* This file contains declarations for the FMA steering optimization
> -   pass for Cortex-A57.
> -   Copyright (C) 2015-2016 Free Software Foundation, Inc.
> +/* AArch64-specific passes declarations.
> +   Copyright (C) 2016 Free Software Foundation, Inc.
> Contributed by ARM Ltd.
>  
> This file is part of GCC.
> @@ -19,4 +18,4 @@
> along with GCC; see the file COPYING3.  If not see
> .  */
>  
> -void aarch64_register_fma_steering (void);
> +INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);

A technicality on your git diff format, this should not be a rename.

Just make sure when you apply it to svn you accurately record a delete of
the old file, and creation of the new file.

Thanks,
James



[PATCH] read-md.c: Move various state to within class rtx_reader (v3)

2016-10-17 Thread David Malcolm
On Mon, 2016-10-17 at 12:37 +0100, Richard Sandiford wrote:
> David Malcolm  writes:
> > @@ -2388,8 +2389,10 @@ gen_mnemonic_setattr (htab_t mnemonic_htab,
> > rtx insn)
> >  static int
> >  mnemonic_htab_callback (void **slot, void *info ATTRIBUTE_UNUSED)
> >  {
> > -  obstack_grow (&string_obstack, (char*)*slot, strlen
> > ((char*)*slot));
> > -  obstack_1grow (&string_obstack, ',');
> > +  struct obstack *string_obstack = rtx_reader_ptr
> > ->get_string_obstack ();
> > +
> > +  obstack_grow (string_obstack, (char*)*slot, strlen
> > ((char*)*slot));
> > +  obstack_1grow (string_obstack, ',');
> 
> The old code was missing a space after the cast.  Might be good to
> add it while you're there.

Fixed.

> > @@ -2470,8 +2474,8 @@ gen_mnemonic_attr (void)
> >htab_traverse (mnemonic_htab, mnemonic_htab_callback, NULL);
> >  
> >/* Replace the last ',' with the zero end character.  */
> > -  *((char *)obstack_next_free (&string_obstack) - 1) = '\0';
> > -  XSTR (mnemonic_attr, 1) = XOBFINISH (&string_obstack, char *);
> > +  *((char *)obstack_next_free (string_obstack) - 1) = '\0';
> > +  XSTR (mnemonic_attr, 1) = XOBFINISH (string_obstack, char *);
> 
> Same here.

Fixed.

> > @@ -129,9 +105,9 @@ get_md_ptr_loc (const void *ptr)
> >  void
> >  copy_md_ptr_loc (const void *new_ptr, const void *old_ptr)
> >  {
> > -  const struct ptr_loc *loc = get_md_ptr_loc (old_ptr);
> > +  const struct ptr_loc *loc = rtx_reader_ptr->get_md_ptr_loc
> > (old_ptr);
> >if (loc != 0)
> > -set_md_ptr_loc (new_ptr, loc->filename, loc->lineno);
> > +rtx_reader_ptr->set_md_ptr_loc (new_ptr, loc->filename, loc
> > ->lineno);
> >  }
> 
> I suppose it's bikeshedding, but since you need access to an
> rtx_reader
> to get and set the locations, it feels like the rtx_reader should be
> a parameter here too.

In the attached patch I've converted it to a member function, so
that the param is the implicit "this".

> > @@ -140,7 +116,7 @@ copy_md_ptr_loc (const void *new_ptr, const
> > void *old_ptr)
> >  void
> >  fprint_md_ptr_loc (FILE *outf, const void *ptr)
> >  {
> > -  const struct ptr_loc *loc = get_md_ptr_loc (ptr);
> > +  const struct ptr_loc *loc = rtx_reader_ptr->get_md_ptr_loc
> > (ptr);
> >if (loc != 0)
> >  fprintf (outf, "#line %d \"%s\"\n", loc->lineno, loc
> > ->filename);
> >  }
> 
> Same here.

Likewise, and for print_md_ptr_loc.

> > +/* As rtx_reader::join_c_conditions above, but call it on the
> > singleton
> > +   rtx_reader instance.  */
> > +
> > +const char *
> > +join_c_conditions (const char *cond1, const char *cond2)
> > +{
> > +  return rtx_reader_ptr->join_c_conditions (cond1, cond2);
> > +}
> > +
> 
> I think it'd be better to have the callers use the rtx_reader_ptr
> explicitly.
> 
> In general I think it'd be better if the read-md.h interface itself
> didn't rely on there being a singleton, other than providing the
> rtx_reader_ptr variable.  It would then be easy to allow non
> -singleton
> uses in future, without having to update gen* calls a second time.

OK

> > @@ -204,6 +189,15 @@ fprint_c_condition (FILE *outf, const char
> > *cond)
> >  }
> >  }
> >  
> > +/* As rtx_reader::fprint_c_condition above, but call it on the
> > singleton
> > +   rtx_reader instance.  */
> > +
> > +void
> > +fprint_c_condition (FILE *outf, const char *cond)
> > +{
> > +  rtx_reader_ptr->fprint_c_condition (outf, cond);
> > +}
> > +
> >  /* Special fprint_c_condition for writing to STDOUT.  */
> >  
> >  void
> 
> Another instance of the above.

Fixed.

> > @@ -808,7 +802,7 @@ handle_constants (void)
> >  void
> >  traverse_md_constants (htab_trav callback, void *info)
> >  {
> > -  htab_traverse (md_constants, callback, info);
> > +  htab_traverse (rtx_reader_ptr->get_md_constants (), callback,
> > info);
> >  }
> 
> Here too.
> 
> I assume you didn't turn the functions above into member functions
> because they aren't primitives.  OTOH it might seem inconsistent
> to have things like traverse_enum_types as member functions but
> traverse_md_constants not.

In the following I've converted traverse_md_constants to a member
function.

Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* genattrtab.c (attr_string): Use rtx_reader_ptr for call to
copy_md_ptr_loc.
(gen_attr): Use rtx_reader_ptr for lookup_enum_type call.
(write_test_expr): Use rtx_reader_ptr for calls to
fprint_c_condition.
(write_attr_value): Likewise.
* genconditions.c (write_one_condition): Use rtx_reader_ptr for
call to print_md_ptr_loc.
(write_one_condition): Likewise for calls to print_c_condition.
* genconstants.c: Include "statistics.h" and "vec.h".
(main): Update for conversion to member functions.
* genemit.c (emit_c_code): Use rtx_reader_ptr for
call to print_md_ptr_loc.
* genenums.c: Include "statistics.h" and "vec.h".
(main): Update for conversion of trave

Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-17 Thread Pat Haugen
On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
>> attempted to scale down the register limit used by -fsched-pressure for the 
>> case where the block in question executes as frequently as the entry block 
>> to just the call_clobbered (i.e. call_used) regs. But the code is actually 
>> scaling toward call_saved registers. The following patch corrects that by 
>> computing call_saved regs per class and subtracting out some scaled portion 
>> of that.
>> > 
>> > Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
> Hi Pat,
> 
> I stared at your patch and current code for good 30 minutes, and I still 
> don't see what is wrong with the current code.
> 
> With your patch the number of registers from class CL that scheduler has at 
> its disposal for a single-basic-block function will be:
> 
> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - 
> call_saved_regs_num[CL];
> 
> where call_saved_regs_num is number of registers in class CL that need to be 
> saved in the prologue (i.e., "free" registers).  I can see some logic in 
> setting
> 
> sched_call_regs_num[CL] = call_saved_regs_num[CL];
> 
> but not in subtracting number of such registers from the number of total 
> available hard registers.
> 
> Am I missing something?
> 

Your original patch gave the following reasoning:

"At the moment the scheduler does not account for spills in the prologues and 
restores in the epilogue, which occur from use of call-used registers.  The 
current state is, essentially, optimized for case when there is a hot loop 
inside the function, and the loop executes significantly more often than the 
prologue/epilogue.  However, on the opposite end, we have a case when the 
function is just a single non-cyclic basic block, which executes just as often 
as prologue / epilogue, so spills in the prologue hurt performance as much as 
spills in the basic block itself.  In such a case the scheduler should 
throttle-down on the number of available registers and try to not go beyond 
call-clobbered registers."

But the misunderstanding is that call-used registers do NOT cause any 
save/restore. That is to say, call-used == call-clobbered. Your last sentence 
explains the goal for a single block function, to not go beyond call-clobbered 
(i.e. call-used) registers, which makes perfect sense. My patch implements that 
goal by subtracting out call_saved_regs_num (those that require prolog/epilog 
save/restore) from the total regs, and using that as the target # of registers 
to be used for the block.


> Also, could you share the testcase that you used to investigate the problem 
> with register-aware scheduling?  I wonder if there is a problem lurking.

I don't have a testcase. I'm currently trying to get -fsched-pressure to be 
beneficial for PowerPC and was familiarizing myself with the code when I 
spotted the issue.

Thanks,
Pat



[PATCH][AArch64] Fix PR target/77822: Use tighter predicates for zero_extract patterns

2016-10-17 Thread Kyrill Tkachov

Hi all,

For the attached testcase the code ends up trying to extract bits outside the 
range of the normal register
widths. The aarch64 patterns for ubfz and tbnz end up accepting such operands 
and emitting invalid assembly
such as 'ubfx x18,x2,192,32'

The solution is to add proper predicates and guards to the operands of the 
zero_extract operations that are going on.
I had a look at all the other patterns in aarch64 that generate/use 
zero_extract and they all have guards on their
operands in one form or another to avoid them accessing an area that is out of 
range.

With this patch the testcase compiles and assembles fine.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-10-17  Kyrylo Tkachov  

PR target/77822
* config/aarch64/aarch64.md (*tb1): Use
aarch64_simd_shift_imm_ predicate for operand 1.
(, ANY_EXTRACT): Use tighter predicates on operands 2 and 3
to restrict them to an appropriate range and add FAIL check if the
region they specify is out of range.  Delete useless constraint
strings.
(*, ANY_EXTRACT): Add appropriate predicates on operands
2 and 3 to restrict their range and add pattern predicate.

2016-10-17  Kyrylo Tkachov  

PR target/77822
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 50cfdee909aa717afc097c74ab744c36137352ad..83d45931a28fc7b3442aa0a59b9e7315ef46c0e5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -639,7 +639,8 @@ (define_insn "*tb1"
   [(set (pc) (if_then_else
 	  (EQL (zero_extract:DI (match_operand:GPI 0 "register_operand" "r")
 (const_int 1)
-(match_operand 1 "const_int_operand" "n"))
+(match_operand 1
+  "aarch64_simd_shift_imm_" "n"))
 		   (const_int 0))
 	 (label_ref (match_operand 2 "" ""))
 	 (pc)))
@@ -4301,19 +4302,28 @@ (define_insn "*extend_ashr"
 
 (define_expand ""
   [(set (match_operand:DI 0 "register_operand" "=r")
-	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand" "r")
-			(match_operand 2 "const_int_operand" "n")
-			(match_operand 3 "const_int_operand" "n")))]
-  ""
+	(ANY_EXTRACT:DI (match_operand:DI 1 "register_operand")
+			(match_operand 2
+			  "aarch64_simd_shift_imm_offset_di")
+			(match_operand 3 "aarch64_simd_shift_imm_di")))]
   ""
+  {
+if (!IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+		   1, GET_MODE_BITSIZE (DImode) - 1))
+ FAIL;
+  }
 )
 
+
 (define_insn "*"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ANY_EXTRACT:GPI (match_operand:GPI 1 "register_operand" "r")
-			 (match_operand 2 "const_int_operand" "n")
-			 (match_operand 3 "const_int_operand" "n")))]
-  ""
+			 (match_operand 2
+			   "aarch64_simd_shift_imm_offset_" "n")
+			 (match_operand 3
+			   "aarch64_simd_shift_imm_" "n")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),
+	 1, GET_MODE_BITSIZE (mode) - 1)"
   "bfx\\t%0, %1, %3, %2"
   [(set_attr "type" "bfm")]
 )
diff --git a/gcc/testsuite/g++.dg/torture/pr77822.C b/gcc/testsuite/g++.dg/torture/pr77822.C
new file mode 100644
index ..4dc428b63eee981bda04e1faa29bb97e3986dca9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/torture/pr77822.C
@@ -0,0 +1,30 @@
+// PR target/77822
+// { dg-do compile }
+
+using UINT8 = char;
+using UINT32 = int;
+using UINT64 = long;
+class A
+{
+  void m_fn1 ();
+  struct B
+  {
+UINT32 m_multiplier;
+  };
+  UINT8 m_datawidth;
+  UINT8 m_subunits;
+  B m_subunit_infos[];
+};
+int a;
+UINT64 b;
+void
+A::m_fn1 ()
+{
+  int c = 32, d = m_datawidth / c;
+  for (int e = 0; e < d; e++)
+{
+  UINT32 f = e * 32;
+  if (b >> f & 1)
+	m_subunit_infos[m_subunits].m_multiplier = a;
+}
+}


Use FOR_ALL_BB_FN in a few more places (was: [PATCH] Clear BB_VISITED in bb-reorder)

2016-10-17 Thread Thomas Schwinge
Hi!

On Mon, 17 Oct 2016 15:17:13 +0200, Richard Biener  
wrote:
> On Mon, Oct 17, 2016 at 3:12 PM, Thomas Schwinge
>  wrote:
> > On Mon, 17 Oct 2016 14:26:42 +0200 (CEST), Richard Biener 
> >  wrote:
> >> --- gcc/bb-reorder.c  (revision 241228)
> >> +++ gcc/bb-reorder.c  (working copy)
> >> @@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
> >>   To start with, everything points to itself, nothing is assigned yet. 
> >>  */
> >>
> >>FOR_ALL_BB_FN (bb, cfun)
> >> -bb->aux = bb;
> >> +{
> >> +  bb->aux = bb;
> >> +  bb->flags &= ~BB_VISITED;
> >> +}
> >>
> >>EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
> >
> > "EXIT_BLOCK_PTR_FOR_FN (cfun)->flags &= ~BB_VISITED;" is not required
> > here, additionally?
> 
> ALL_BB includes EXIT_BLOCK.

I see -- ;-) FOR_ALL_BB_FN is something different from FOR_EACH_BB_FN...


We could use the former in a few more places; OK for trunk once tested?

commit 183433526b2befb7bd75e74f753b8a5d4bec14b0
Author: Thomas Schwinge 
Date:   Mon Oct 17 18:07:16 2016 +0200

Use FOR_ALL_BB_FN in a few more places

gcc/
* cfg.c (clear_bb_flags): Use FOR_ALL_BB_FN.
* config/nvptx/nvptx.c (nvptx_find_sese): Likewise.
---
 gcc/cfg.c| 2 +-
 gcc/config/nvptx/nvptx.c | 8 +---
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git gcc/cfg.c gcc/cfg.c
index ee2e42c..6604b02 100644
--- gcc/cfg.c
+++ gcc/cfg.c
@@ -386,7 +386,7 @@ clear_bb_flags (void)
 {
   basic_block bb;
 
-  FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
+  FOR_ALL_BB_FN (bb, cfun)
 bb->flags &= BB_FLAGS_TO_PRESERVE;
 }
 
diff --git gcc/config/nvptx/nvptx.c gcc/config/nvptx/nvptx.c
index 6ec8eb4..80fa9ae 100644
--- gcc/config/nvptx/nvptx.c
+++ gcc/config/nvptx/nvptx.c
@@ -3091,17 +3091,11 @@ nvptx_find_sese (auto_vec &blocks, 
bb_pair_vec_t ®ions)
   int ix;
 
   /* First clear each BB of the whole function.  */ 
-  FOR_EACH_BB_FN (block, cfun)
+  FOR_ALL_BB_FN (block, cfun)
 {
   block->flags &= ~BB_VISITED;
   BB_SET_SESE (block, 0);
 }
-  block = EXIT_BLOCK_PTR_FOR_FN (cfun);
-  block->flags &= ~BB_VISITED;
-  BB_SET_SESE (block, 0);
-  block = ENTRY_BLOCK_PTR_FOR_FN (cfun);
-  block->flags &= ~BB_VISITED;
-  BB_SET_SESE (block, 0);
 
   /* Mark blocks in the function that are in this graph.  */
   for (ix = 0; blocks.iterate (ix, &block); ix++)


Grüße
 Thomas


Re: [GCC-6][RFA 1/2]: Don't ignore target_header_dir when deciding inhibit_libc

2016-10-17 Thread Andre Vieira (lists)
On 09/09/16 15:32, Andre Vieira (lists) wrote:
> On 27/05/16 15:51, Ulrich Weigand wrote:
>> Andre Vieira (lists) wrote:
>>> On 07/04/16 10:30, Andre Vieira (lists) wrote:
 On 17/03/16 16:33, Andre Vieira (lists) wrote:
> On 23/10/15 12:31, Bernd Schmidt wrote:
>> On 10/12/2015 11:58 AM, Ulrich Weigand wrote:
>>>
>>> Index: gcc/configure.ac
>>> ===
>>> --- gcc/configure.ac(revision 228530)
>>> +++ gcc/configure.ac(working copy)
>>> @@ -1993,7 +1993,7 @@ elif test "x$TARGET_SYSTEM_ROOT" != x; t
>>>   fi
>>>
>>>   if test x$host != x$target || test "x$TARGET_SYSTEM_ROOT" != x; then
>>> -  if test "x$with_headers" != x; then
>>> +  if test "x$with_headers" != x && test "x$with_headers" != xyes; then
>>>   target_header_dir=$with_headers
>>> elif test "x$with_sysroot" = x; then
>>>  
>>> target_header_dir="${test_exec_prefix}/${target_noncanonical}/sys-include"
>>>
>>
>> I'm missing the beginning of this conversation, but this looks like a
>> reasonable change (avoiding target_header_dir=yes for --with-headers).
>> So, approved.
>>
>>
>> Bernd
>>
> Hi there,
>
> I was wondering why this never made it to trunk. I am currently running
> into an issue that this patch would fix.
>>
>> Seems I never actually checked this in, even though it was approved.
>> Thanks for the reminder, I've now checked the patch in.
>>
>> Bye,
>> Ulrich
>>
> 
> Is it OK to backport this fix to GCC-6? It applies cleanly, builds and
> no regressions for arm-none-eabi.
> 
> Regards,
> Andre
> 
Ping.


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types

2016-10-17 Thread Thomas Schwinge
Hi!

Ping.

On Thu, 29 Sep 2016 15:18:00 +0200, Thomas Schwinge  
wrote:
> On Mon, 19 Sep 2016 13:25:01 +0200, Richard Biener 
>  wrote:
> > On Mon, Sep 19, 2016 at 1:19 PM, Thomas Schwinge
> >  wrote:
> > > On Mon, 19 Sep 2016 10:18:35 +0200, Richard Biener 
> > >  wrote:
> > >> On Fri, Sep 16, 2016 at 3:32 PM, Thomas Schwinge
> > >>  wrote:
> > >> > --- gcc/tree-streamer.c
> > >> > +++ gcc/tree-streamer.c
> > >> > @@ -278,9 +278,23 @@ record_common_node (struct streamer_tree_cache_d 
> > >> > *cache, tree node)
> > >> >streamer_tree_cache_append (cache, node, cache->nodes.length ());
> > >> >
> > >> >if (POINTER_TYPE_P (node)
> > >> > -  || TREE_CODE (node) == COMPLEX_TYPE
> > >> >|| TREE_CODE (node) == ARRAY_TYPE)
> > >> >  record_common_node (cache, TREE_TYPE (node));
> > >> > +  else if (TREE_CODE (node) == COMPLEX_TYPE)
> > >> > +{
> > >> > +  /* Assert that complex types' component types have already been 
> > >> > handled
> > >> > +(and we thus don't need to recurse here).  See PR lto/77458.  
> > >> > */
> > >> > +[...]
> 
> > >> So I very much like to go forward with this kind of change as well
> 
> > > [patch]
> 
> > Ok with [changes]
> 
> Like this?  (I'll then continue to replicate this for other tree codes.)

commit 1c7dd6d92b7805cf12aaf822e509fa384cfcfbb0
Author: Thomas Schwinge 
Date:   Wed Sep 28 12:36:59 2016 +0200

[PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx 
types

gcc/
PR lto/77458
* tree-core.h (enum tree_index): Put the complex types after their
component types.
* tree-streamer.c (verify_common_node_recorded): New function.
(preload_common_nodes) : Use it.
---
 gcc/tree-core.h | 31 +--
 gcc/tree-streamer.c | 32 +++-
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git gcc/tree-core.h gcc/tree-core.h
index 1bfe682..3e3f31e 100644
--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -556,20 +556,6 @@ enum tree_index {
   TI_BOOLEAN_FALSE,
   TI_BOOLEAN_TRUE,
 
-  TI_COMPLEX_INTEGER_TYPE,
-  TI_COMPLEX_FLOAT_TYPE,
-  TI_COMPLEX_DOUBLE_TYPE,
-  TI_COMPLEX_LONG_DOUBLE_TYPE,
-
-  TI_COMPLEX_FLOAT16_TYPE,
-  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = TI_COMPLEX_FLOAT16_TYPE,
-  TI_COMPLEX_FLOAT32_TYPE,
-  TI_COMPLEX_FLOAT64_TYPE,
-  TI_COMPLEX_FLOAT128_TYPE,
-  TI_COMPLEX_FLOAT32X_TYPE,
-  TI_COMPLEX_FLOAT64X_TYPE,
-  TI_COMPLEX_FLOAT128X_TYPE,
-
   TI_FLOAT_TYPE,
   TI_DOUBLE_TYPE,
   TI_LONG_DOUBLE_TYPE,
@@ -599,6 +585,23 @@ enum tree_index {
 - TI_FLOATN_NX_TYPE_FIRST  \
 + 1)
 
+  /* Put the complex types after their component types, so that in (sequential)
+ tree streaming we can assert that their component types have already been
+ handled (see tree-streamer.c:record_common_node).  */
+  TI_COMPLEX_INTEGER_TYPE,
+  TI_COMPLEX_FLOAT_TYPE,
+  TI_COMPLEX_DOUBLE_TYPE,
+  TI_COMPLEX_LONG_DOUBLE_TYPE,
+
+  TI_COMPLEX_FLOAT16_TYPE,
+  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = TI_COMPLEX_FLOAT16_TYPE,
+  TI_COMPLEX_FLOAT32_TYPE,
+  TI_COMPLEX_FLOAT64_TYPE,
+  TI_COMPLEX_FLOAT128_TYPE,
+  TI_COMPLEX_FLOAT32X_TYPE,
+  TI_COMPLEX_FLOAT64X_TYPE,
+  TI_COMPLEX_FLOAT128X_TYPE,
+
   TI_FLOAT_PTR_TYPE,
   TI_DOUBLE_PTR_TYPE,
   TI_LONG_DOUBLE_PTR_TYPE,
diff --git gcc/tree-streamer.c gcc/tree-streamer.c
index 2139e96..60118dc 100644
--- gcc/tree-streamer.c
+++ gcc/tree-streamer.c
@@ -248,6 +248,32 @@ streamer_tree_cache_lookup (struct streamer_tree_cache_d 
*cache, tree t,
 }
 
 
+/* Verify that NODE is in CACHE.  */
+
+static void
+verify_common_node_recorded (struct streamer_tree_cache_d *cache, tree node)
+{
+  /* Restrict this to flag_checking only because in general violating it is
+ harmless plus we never know what happens on all targets/frontend/flag(!)
+ combinations.  */
+  if (!flag_checking)
+return;
+
+  bool found = false;
+  if (cache->node_map)
+gcc_assert (streamer_tree_cache_lookup (cache, node, NULL));
+  else
+{
+  gcc_assert (cache->nodes.exists ());
+  /* Linear search...  */
+  for (unsigned i = 0; !found && i < cache->nodes.length (); ++i)
+   if (cache->nodes[i] == node)
+ found = true;
+  gcc_assert (found);
+}
+}
+
+
 /* Record NODE in CACHE.  */
 
 static void
@@ -293,11 +319,15 @@ record_common_node (struct streamer_tree_cache_d *cache, 
tree node)
   /* No recursive trees.  */
   break;
 case ARRAY_TYPE:
-case COMPLEX_TYPE:
 case POINTER_TYPE:
 case REFERENCE_TYPE:
   record_common_node (cache, TREE_TYPE (node));
   break;
+case COMPLEX_TYPE:
+  /* Verify that a complex type's component type (node_type) has been
+handled already (and we thus don't need to recurse here).  */
+  verify_common_node_recorded (cache, TREE_TYPE (node));
+  break;
 case RECORD_TYPE:
   /* The FIELD_DECLs of structures should be shared, so that eve

Re: Explicitly list all tree codes in gcc/tree-streamer.c:record_common_node

2016-10-17 Thread Thomas Schwinge
Hi!

On Fri, 30 Sep 2016 09:47:56 +0200, Richard Biener  
wrote:
> On Thu, Sep 29, 2016 at 4:48 PM, Thomas Schwinge
>  wrote:
> > On Mon, 19 Sep 2016 13:25:01 +0200, Richard Biener 
> >  wrote:
> >> On Mon, Sep 19, 2016 at 1:19 PM, Thomas Schwinge
> >>  wrote:
> >> > On Mon, 19 Sep 2016 10:18:35 +0200, Richard Biener 
> >> >  wrote:
> >> >> On Fri, Sep 16, 2016 at 3:32 PM, Thomas Schwinge
> >> >>  wrote:
> >> >> > --- gcc/tree-streamer.c
> >> >> > +++ gcc/tree-streamer.c
> >> >> > @@ -278,9 +278,23 @@ record_common_node (struct streamer_tree_cache_d 
> >> >> > *cache, tree node)
> >> >> >streamer_tree_cache_append (cache, node, cache->nodes.length ());
> >> >> >
> >> >> >if (POINTER_TYPE_P (node)
> >> >> > -  || TREE_CODE (node) == COMPLEX_TYPE
> >> >> >|| TREE_CODE (node) == ARRAY_TYPE)
> >> >> >  record_common_node (cache, TREE_TYPE (node));
> >> >> > +  else if (TREE_CODE (node) == COMPLEX_TYPE)
> >> >> > [...]
> >> >> >else if (TREE_CODE (node) == RECORD_TYPE)
> >
> >> [looks to me we miss handling of vector type components alltogether,
> >> maybe there are no global vector type trees ...]
> >
> > Looks like it, yes.  Would a patch like the following be reasonable,
> > which explicitly lists/handles all expected tree codes, or is something
> > like that not feasible?  (That's a subset of tree codes I gathered by a
> > partial run of the GCC testsuite, and libgomp testsuite; not claiming
> > this is complete.)
> 
> I think it would be a nice thing to have indeed.
> 
> So -- I'm inclined to approve this patch ;)

After quite a bit of testing (contrib/config-list.mk, modified to run
-fself-test with -flto, to exercise the code I'm modifying), I have now
committed this to trunk in r241246:

commit 29cfc397b0ec2c953ff929d0ba57001c7018ec0c
Author: tschwinge 
Date:   Mon Oct 17 15:56:22 2016 +

Explicitly list all tree codes in gcc/tree-streamer.c:record_common_node

gcc/
* tree-streamer.c (record_common_node): Explicitly list expected
tree codes.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@241246 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   |  5 +
 gcc/tree-streamer.c | 30 +-
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 86df616..9acc738 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-10-17  Thomas Schwinge  
+
+   * tree-streamer.c (record_common_node): Explicitly list expected
+   tree codes.
+
 2016-10-17  Richard Biener  
 
PR tree-optimization/77988
diff --git gcc/tree-streamer.c gcc/tree-streamer.c
index 7ea7096..2139e96 100644
--- gcc/tree-streamer.c
+++ gcc/tree-streamer.c
@@ -277,12 +277,28 @@ record_common_node (struct streamer_tree_cache_d *cache, 
tree node)
  in the cache as hash value.  */
   streamer_tree_cache_append (cache, node, cache->nodes.length ());
 
-  if (POINTER_TYPE_P (node)
-  || TREE_CODE (node) == COMPLEX_TYPE
-  || TREE_CODE (node) == ARRAY_TYPE)
-record_common_node (cache, TREE_TYPE (node));
-  else if (TREE_CODE (node) == RECORD_TYPE)
+  switch (TREE_CODE (node))
 {
+case ERROR_MARK:
+case FIELD_DECL:
+case FIXED_POINT_TYPE:
+case IDENTIFIER_NODE:
+case INTEGER_CST:
+case INTEGER_TYPE:
+case POINTER_BOUNDS_TYPE:
+case REAL_TYPE:
+case TREE_LIST:
+case VOID_CST:
+case VOID_TYPE:
+  /* No recursive trees.  */
+  break;
+case ARRAY_TYPE:
+case COMPLEX_TYPE:
+case POINTER_TYPE:
+case REFERENCE_TYPE:
+  record_common_node (cache, TREE_TYPE (node));
+  break;
+case RECORD_TYPE:
   /* The FIELD_DECLs of structures should be shared, so that every
 COMPONENT_REF uses the same tree node when referencing a field.
 Pointer equality between FIELD_DECLs is used by the alias
@@ -291,6 +307,10 @@ record_common_node (struct streamer_tree_cache_d *cache, 
tree node)
 nonoverlapping_component_refs_of_decl_p).  */
   for (tree f = TYPE_FIELDS (node); f; f = TREE_CHAIN (f))
record_common_node (cache, f);
+  break;
+default:
+  /* Unexpected tree code.  */
+  gcc_unreachable ();
 }
 }
 


Grüße
 Thomas


Re: [PATCH 0/8] NVPTX offloading to NVPTX: backend patches

2016-10-17 Thread Bernd Schmidt

On 10/14/2016 06:39 PM, Alexander Monakov wrote:

I'm resending the patch series with backend prerequisites for OpenMP
offloading to the NVIDIA PTX ISA.  The patches are rebased on trunk.


What's the status of the branch? Is it expected to work? I'm trying to 
compile the OpenMP version of these benchmarks:

  https://codesign.llnl.gov/lulesh.php

and the resulting binary fails as follows:

libgomp: Link error log error   : Size doesn't match for 
'__nvptx_stacks' in 'Input 8', first specified in 'Input 8'
error   : Multiple definition of '__nvptx_stacks' in 'Input 8', first 
defined in 'Input 8'


I think before merging this work we'll need to have some idea of how 
well it works on real-world code.



Bernd


Re: [PATCH, rs6000] pr65479 Add option to fix failing asan test cases.

2016-10-17 Thread Bill Seurer

On 10/14/2016 04:59 PM, Segher Boessenkool wrote:

On Fri, Oct 14, 2016 at 02:37:42PM -0500, Bill Seurer wrote:

[PATCH, rs6000] pr65479 Add option to fix failing asan test cases.

This patch adds the -fasynchronous-unwind-tables option to several of the asan
test cases.  The option causes a full strack trace to be produced when the
sanitizer detects an error.  Without the full trace the 3 test cases fail.


Should we enable this whenever asan is used, instead?  Not just in the
testsuite?  And is this actually PowerPC-specific?


It is needed on these tests because of how they check the traceback 
stack looking for specific functions.  Without the option only the first 
(triggering) function is included.


I don't know if the option is power specific but on x86 for instance the 
traceback gives all the functions without the option.

--

-Bill Seurer



Re: [PATCH] Make -Wint-in-bool-context warn on suspicious shift ops

2016-10-17 Thread Markus Trippelsdorf
On 2016.09.29 at 18:52 +, Bernd Edlinger wrote:
> On 09/29/16 20:03, Jason Merrill wrote:
> > On Wed, Sep 28, 2016 at 12:09 PM, Bernd Edlinger
> >  wrote:
> >> On 09/28/16 16:41, Jason Merrill wrote:
> >>> On Tue, Sep 27, 2016 at 11:10 AM, Bernd Edlinger
> >>>  wrote:
>  On 09/27/16 16:42, Jason Merrill wrote:
> > On Tue, Sep 27, 2016 at 10:28 AM, Bernd Edlinger
> >  wrote:
> >> On 09/27/16 16:10, Florian Weimer wrote:
> >>> * Bernd Edlinger:
> >>>
> > “0 << 0” is used in a similar context, to create a zero constant 
> > for a
> > multi-bit subfield of an integer.
> >
> > This example comes from GDB, in bfd/elf64-alpha.c:
> >
> > |   insn = INSN_ADDQ | (16 << 21) | (0 << 16) | (0 << 0);
> >
> 
>  Of course that is not a boolean context, and will not get a warning.
> 
>  Question is if "if (1 << 0)" is possibly a miss-spelled "if (1 < 0)".
> 
>  Maybe 1 and 0 come from macro expansion
> >>>
> >>> But what's the intent of treating 1 << 0 and 0 << 0 differently in the
> >>> patch, then?
> >>
> >> I am not sure if it was a good idea.
> >>
> >> I saw, we had code of the form
> >> bool flag = 1 << 2;
> >>
> >> another value LOOKUP_PROTECT is  1 << 0, and
> >> bool flag = 1 << 0;
> >>
> >> would at least not overflow the allowed value range of a boolean.
> >
> > Assigning a bit mask to a bool variable is still probably not what was
> > intended, even if it doesn't change the value.
> 
>  That works for me too.
>  I can simply remove that exception.
> >>>
> >>> Sounds good.
> >>
> >> Great.  Is that an "OK with that change"?
> >
> > What do you think about dropping the TYPE_UNSIGNED exception as well?
> > I don't see what difference that makes.
> >
> 
> 
> If I drop that exception, then I could also drop the check for
> INTEGER_TYPE and the whole if, because I think other types can not
> happen, but if they are allowed they are as well bogus here.
> 
> I can try a bootstrap and see if there are false positives.
> 
> But I can do that as well in a follow-up patch, this should probably
> be done step by step, especially when it may trigger some false
> positives.
> 
> I think I could also add more stuff, like unary + or - ?
> or maybe also binary +, -, * and / ?
> 
> We already discussed making this a multi-level option,
> and maybe enabling the higher level explicitly in the
> boot-strap.
> 
> As long as the warning continues to find more bugs than false
> positives, it is probably worth extending it to more cases.
> 
> However unsigned integer shift are not undefined if they overflow.
> 
> It is possible that this warning will then trigger also on valid
> code that does loop termination with unsigned int left shifting.
> I dont have a real example, but maybe  like this hypothetical C-code:
> 
>   unsigned int x=1, bits=0;
>   while (x << bits) bits++;
>   printf("bits=%d\n", bits);
> 
> 
> Is it OK for everybody to warn for this on -Wall, or maybe only
> when -Wextra or for instance -Wint-in-bool-context=2 is used ?

I'm seeing this warning a lot in valid low level C code for unsigned
integers. And I must say it look bogus in this context. Some examples:

 return ((a.high & 0x7fff) == 0x7fff) && (a.low<<1);

 if ( (uint32_t) ( aSig<<( shiftCount & 31 ) ) ) {

 && (uint64_t) (extractFloatx80Frac(a) << 1))

 if ((plen < KEYLENGTH) && (key << plen))

-- 
Markus


Re: [PATCH] Add various DWARF5 constants to dwarf2.{h,def}

2016-10-17 Thread Jeff Law

On 10/14/2016 11:34 AM, Jakub Jelinek wrote:

Hi!

Now that DWARF5 public review draft has been released, I went through
the document looking for double dagger marked constants and added them to
dwarf2.{def,h}.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
It's just adding the new constants, so seems like it ought to fall under 
obvious to me, particularly since you keep the old name/values :-)


OK.
jeff



[PATCH] PR77994 Convert std::sample size argument to suitable type

2016-10-17 Thread Jonathan Wakely

The __sample algorithms should work with the iterator's
difference_type, not the user-supplied _Distance type.

PR libstdc++/77994
* include/bits/stl_algo.h (sample): Convert size argument to iterator
difference type.
* include/experimental/algorithm (experimental::sample): Likewise.
* testsuite/25_algorithms/sample/2.cc: New test.

Tested powerpc64le-linux, committed to trunk. I'll also fix
experimental::sample on gcc-6-branch.


commit 4110dc5e4dced0da923d9fff6d7481270eb63c9c
Author: Jonathan Wakely 
Date:   Mon Oct 17 15:17:33 2016 +0100

PR77994 Convert std::sample size argument to suitable type

PR libstdc++/77994
* include/bits/stl_algo.h (sample): Convert size argument to iterator
difference type.
* include/experimental/algorithm (experimental::sample): Likewise.
* testsuite/25_algorithms/sample/2.cc: New test.

diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h
index db99cb8..6c771bb 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -5730,8 +5730,9 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
   static_assert(is_integral<_Distance>::value,
"sample size must be an integer type");
 
+  typename iterator_traits<_PopulationIterator>::difference_type __d = __n;
   return std::__sample(__first, __last, __pop_cat{}, __out, __samp_cat{},
-  __n, std::forward<_UniformRandomBitGenerator>(__g));
+  __d, std::forward<_UniformRandomBitGenerator>(__g));
 }
 #endif // C++17
 #endif // C++14
diff --git a/libstdc++-v3/include/experimental/algorithm 
b/libstdc++-v3/include/experimental/algorithm
index eb18dde..ff66b43 100644
--- a/libstdc++-v3/include/experimental/algorithm
+++ b/libstdc++-v3/include/experimental/algorithm
@@ -76,8 +76,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   static_assert(is_integral<_Distance>::value,
"sample size must be an integer type");
 
+  typename iterator_traits<_PopulationIterator>::difference_type __d = __n;
   return std::__sample(__first, __last, __pop_cat{}, __out, __samp_cat{},
-  __n,
+  __d,
   std::forward<_UniformRandomNumberGenerator>(__g));
 }
 
diff --git a/libstdc++-v3/testsuite/25_algorithms/sample/2.cc 
b/libstdc++-v3/testsuite/25_algorithms/sample/2.cc
new file mode 100644
index 000..8865e18
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/sample/2.cc
@@ -0,0 +1,65 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do run { target c++1z } }
+
+#ifndef _GLIBCXX_ASSERTIONS
+// Make std::uniform_int_distribution check its parameters
+# define _GLIBCXX_ASSERTIONS
+#endif
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+std::mt19937 rng;
+
+using std::sample;
+using __gnu_test::test_container;
+using __gnu_test::output_iterator_wrapper;
+
+void
+test01()
+{
+  int pop[UCHAR_MAX]  = { };
+  for (int i = SCHAR_MAX; i < UCHAR_MAX; ++i)
+pop[i] = 1;
+  const signed char sample_size = SCHAR_MAX; // PR libstdc++/77994
+  int out[sample_size] = { };
+
+  // random access iterators for both population and result
+  // (uses reservoir sampling)
+  auto it = sample(std::begin(pop), std::end(pop), out, sample_size, rng);
+  auto sum = std::accumulate(out, it, 0);
+  VERIFY( sum != 0 ); // exceedingly unlikely!
+
+  // random access iterator for population and output iterator for result
+  // (uses selection sampling)
+  test_container samp2(out);
+  sample(std::begin(pop), std::end(pop), samp2.begin(), sample_size, rng);
+  sum = std::accumulate(std::begin(out), std::end(out), 0);
+  VERIFY( sum != 0 );
+}
+
+int
+main()
+{
+  test01();
+}


Re: [PATCH] Don't peel extra copy of loop in unroller for loops with exit at end

2016-10-17 Thread Pat Haugen
On 10/14/2016 10:29 PM, Andrew Pinski wrote:
>>> >> This patch bumps the iteration count by 1 for loops with the exit at the 
>>> >> end so that it represents the number of times the loop body is executed, 
>>> >> and therefore removes the need to always execute that first peeled copy. 
>>> >> With this change, when the number of executions of the loop is an even 
>>> >> multiple of the unroll factor then the code will jump to the unrolled 
>>> >> loop immediately instead of executing all the switch code and peeled 
>>> >> copies of the loop and then falling into the unrolled loop. This change 
>>> >> also reduces code size by removing a peeled copy of the loop.
>>> >>
>>> >> Bootstrap/regtest on powerpc64le with no new regressions. Ok for trunk?
>> >
>> > This patch or
>> > PR rtl-optimization/68212
>> > * cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
>> > frequency when computing scale factor for peeled copies.
>> > * loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
>> > values for switch/peel blocks/edges.
>> >
>> > Caused a ~2.7-3.5% regression in coremarks with -funroll-all-loops.
> I should say on ThunderX (aarch64-linux-gnu).

Sorry to hear about the degradation. Do you have more details on which patch 
and/or what specifically causes the degradation? This patch should only affect 
the execution path outside the unrolled loop (worst case is probably for loops 
that execute once). The pr68212 patch is just correcting some of the block 
frequency/count issues, so they're not as screwed up as what they were.

Thanks,
Pat



[PATCH] AIX symbol encoding

2016-10-17 Thread David Edelsohn
This patch shifts earlier the point during compilation where the
rs6000 backend adds decoration to symbols: to encode_section_info.

AIX uses the XCOFF file format, which extends COFF in a few ways,
including some increased granularity of sections through a mechanism
called Storage Mapping Class -- a decoration for identifiers that maps
to a COFF section.

Because of the order in which GCC emits symbols and the behavior of
the AIX Assembler, the AIX Assembler can see an early reference to an
undeclared identifier and create a symbol in the wrong section.

The rs6000 backend has been addressing this by emitting storage
mapping class decorations on DECLs declared external in
OUTPUT_SYMBOL_REF.  The rs6000 port replaces the string so that future
references also contain the mapping class.  This string replacement
conflicts with some hash tables so that the port emitted multiple
references to the symbol.

This patch changes the point at which the mapping class is added and
the string is replaced to encode_section_info hook called by
make_decl_rtl() and friends.  The patch also moves the string
management to the stringpool instead of malloc() called by concat().

Bootstrapped on powerpc-ibm-aix7.1.0.0.

Thanks, David

* config/rs6000/rs6000.c (rs6000_output_symbol_ref): Move storage
mapping class decoration from here ...
(rs6000_xcoff_encode_section): to here.

Index: rs6000.c
===
--- rs6000.c(revision 241237)
+++ rs6000.c(working copy)
@@ -30667,31 +30668,8 @@ rs6000_xcoff_strip_dollar (const char *name)
 void
 rs6000_output_symbol_ref (FILE *file, rtx x)
 {
-  /* Currently C++ toc references to vtables can be emitted before it
- is decided whether the vtable is public or private.  If this is
- the case, then the linker will eventually complain that there is
- a reference to an unknown section.  Thus, for vtables only,
- we emit the TOC reference to reference the symbol and not the
- section.  */
   const char *name = XSTR (x, 0);

-  tree decl = SYMBOL_REF_DECL (x);
-  if (decl /* sync condition with assemble_external () */
-  && DECL_P (decl) && DECL_EXTERNAL (decl) && TREE_PUBLIC (decl)
-  && (TREE_CODE (decl) == VAR_DECL
- || TREE_CODE (decl) == FUNCTION_DECL)
-  && name[strlen (name) - 1] != ']')
-{
-  name = concat (name,
-(TREE_CODE (decl) == FUNCTION_DECL
- ? "[DS]" : "[UA]"),
-NULL);
-
-  /* Don't modify name in extern VAR_DECL to include mapping class.  */
-  if (TREE_CODE (decl) == FUNCTION_DECL)
-   XSTR (x, 0) = name;
-}
-
   if (VTABLE_NAME_P (name))
 {
   RS6000_OUTPUT_BASENAME (file, name);
@@ -35264,6 +35242,7 @@ rs6000_xcoff_encode_section_info (tree decl, rtx r
 {
   rtx symbol;
   int flags;
+  const char *symname;

   default_encode_section_info (decl, rtl, first);

@@ -35280,6 +35259,28 @@ rs6000_xcoff_encode_section_info (tree decl, rtx r
 flags &= ~SYMBOL_FLAG_HAS_BLOCK_INFO;

   SYMBOL_REF_FLAGS (symbol) = flags;
+
+  /* Currently C++ toc references to vtables can be emitted before it
+ is decided whether the vtable is public or private.  If this is
+ the case, then the linker will eventually complain that there is
+ a reference to an unknown section.  Thus, for vtables only,
+ we emit the TOC reference to reference the symbol and not the
+ label identifier.  */
+  symname = XSTR (symbol, 0);
+
+  if (decl /* sync condition with assemble_external () */
+  && DECL_P (decl) && DECL_EXTERNAL (decl) && TREE_PUBLIC (decl)
+  && ((TREE_CODE (decl) == VAR_DECL && !DECL_THREAD_LOCAL_P (decl))
+ || TREE_CODE (decl) == FUNCTION_DECL)
+  && symname[strlen (symname) - 1] != ']')
+{
+  char *newname = (char *) alloca (strlen (symname) + 5);
+  strcpy (newname, symname);
+  strcat (newname, (TREE_CODE (decl) == FUNCTION_DECL
+   ? "[DS]" : "[UA]"));
+  XSTR (symbol, 0) = ggc_strdup (newname);
+}
+
 }
 #endif /* HAVE_AS_TLS */
 #endif /* TARGET_XCOFF */


[PING**2] [PATCH, ARM] Reduce stack usage in sha512 (PR target/77308)

2016-10-17 Thread Bernd Edlinger
Ping

On 10/06/16 18:18, Bernd Edlinger wrote:
> Hi,
>
> I'd like to ping for this patch:
> https://gcc.gnu.org/ml/gcc-patches/2016-09/msg02362.html
>
> Thanks,
> Bernd.
>
>
> On 09/30/16 15:33, Bernd Edlinger wrote:
>> On 09/30/16 12:14, Bernd Edlinger wrote:
>>> Eric Botcazou wrote:
> A comment before the SETs and a testcase would be nice.  IIRC
> we do have stack size testcases via using -fstack-usage.

 Or -Wstack-usage, which might be more appropriate here.
>>>
>>> Yes.  good idea.  I was not aware that we already have that kind of
>>> tests.
>>>
>>> When trying to write this test. I noticed, that I did not try -Os so
>>> far.
>>> But for -Os the stack is still the unchanged 3500 bytes.
>>>
>>> However for embedded targets I am often inclined to use -Os, and
>>> would certainly not expect the stack to explode...
>>>
>>> I see in arm.md there are places like
>>>
>>>/* If we're optimizing for size, we prefer the libgcc calls.  */
>>>if (optimize_function_for_size_p (cfun))
>>>  FAIL;
>>>
>>
>> Oh, yeah.  The comment is completely misleading.
>>
>> If this pattern fails, expmed.c simply expands some
>> less efficient rtl, which also results in two shifts
>> and one or-op.  No libgcc calls at all.
>>
>> So in simple cases without spilling the resulting
>> assembler is the same, regardless if this pattern
>> fails or not.  But the half-defined out registers
>> make a big difference when it has to be spilled.
>>
>>>/* Expand operation using core-registers.
>>>   'FAIL' would achieve the same thing, but this is a bit
>>> smarter.  */
>>>scratch1 = gen_reg_rtx (SImode);
>>>scratch2 = gen_reg_rtx (SImode);
>>>arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0],
>>> operands[1],
>>>   operands[2], scratch1, scratch2);
>>>
>>>
>>> .. that explains why this happens.  I think it would be better to
>>> use the emit_coreregs for shift count >= 32, because these are
>>> effectively 32-bit shifts.
>>>
>>> Will try if that can be improved, and come back with the
>>> results.
>>>
>>
>> The test case with -Os has 3520 bytes stack usage.
>> When only shift count >= 32 are handled we
>> have still 3000 bytes stack usage.
>> And when arm_emit_coreregs_64bit_shift is always
>> allowed to run, we have 2360 bytes stack usage.
>>
>> Also for the code size it is better not to fail this
>> pattern.  So I propose to remove this exception in all
>> three expansions.
>>
>> Here is an improved patch with the test case from the PR.
>> And a comment on the redundant SET why it is better to clear
>> the out register first.
>>
>>
>> Bootstrap and reg-testing on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.


Re: fix -fmax-errors & notes

2016-10-17 Thread Nathan Sidwell

On 10/14/16 12:17, David Malcolm wrote:


How about splitting out the bail-out code into a separate function:

   diagnostic_handle_max_errors


Having discovered how fortran buffers errors, I think this approach is 
necessary.  I think I'm going to be in assignment-limbo for a little while though.


nathan


[PATCH] Simplify conditions in EVRP, handle taken edge

2016-10-17 Thread Richard Biener

This refactors propagation vs. substitution and handles condition
simplification properly as well as passing a known taken edge down
to the DOM walker (avoiding useless work and properly handling PHIs).

If we do all the work it's stupid to not fold away dead code...

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Richard.

2016-10-17  Richard Biener  

* tree-vrp.c (evrp_dom_walker::before_dom_children): Handle
not visited but non-executable predecessors.  Return taken edge.
Simplify conditions and refactor propagation vs. folding step.

* gcc.dg/tree-ssa/pr20318.c: Disable EVRP.
* gcc.dg/tree-ssa/pr21001.c: Likewise.
* gcc.dg/tree-ssa/pr21090.c: Likewise.
* gcc.dg/tree-ssa/pr21294.c: Likewise.
* gcc.dg/tree-ssa/pr21563.c: Likewise.
* gcc.dg/tree-ssa/pr23744.c: Likewise.
* gcc.dg/tree-ssa/pr25382.c: Likewise.
* gcc.dg/tree-ssa/pr68431.c: Likewise.
* gcc.dg/tree-ssa/vrp03.c: Likewise.
* gcc.dg/tree-ssa/vrp06.c: Likewise.
* gcc.dg/tree-ssa/vrp07.c: Likewise.
* gcc.dg/tree-ssa/vrp09.c: Likewise.
* gcc.dg/tree-ssa/vrp19.c: Likewise.
* gcc.dg/tree-ssa/vrp20.c: Likewise.
* gcc.dg/tree-ssa/vrp92.c: Likewise.


Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241238)
+++ gcc/tree-vrp.c  (working copy)
@@ -10752,7 +10744,8 @@ evrp_dom_walker::before_dom_children (ba
   bool has_unvisived_preds = false;
 
   FOR_EACH_EDGE (e, ei, bb->preds)
-if (!(e->src->flags & BB_VISITED))
+if (e->flags & EDGE_EXECUTABLE
+   && !(e->src->flags & BB_VISITED))
   {
has_unvisived_preds = true;
break;
@@ -10772,81 +10765,90 @@ evrp_dom_walker::before_dom_children (ba
   update_value_range (lhs, &vr_result);
 }
 
+  edge taken_edge = NULL;
+
   /* Visit all other stmts and discover any new VRs possible.  */
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
 {
   gimple *stmt = gsi_stmt (gsi);
-  edge taken_edge;
   tree output = NULL_TREE;
   gimple *old_stmt = stmt;
   bool was_noreturn = (is_gimple_call (stmt)
   && gimple_call_noreturn_p (stmt));
 
-  /* TODO, if found taken_edge, we should visit (return it) and travel
-again to improve VR as done in DOM/SCCVN optimizations.  It should
-be done carefully as stmts might prematurely leave a BB like
-in EH.  */
-  if (stmt_interesting_for_vrp (stmt))
+  if (gcond *cond = dyn_cast  (stmt))
+   {
+ vrp_visit_cond_stmt (cond, &taken_edge);
+ if (taken_edge)
+   {
+ if (taken_edge->flags & EDGE_TRUE_VALUE)
+   gimple_cond_make_true (cond);
+ else if (taken_edge->flags & EDGE_FALSE_VALUE)
+   gimple_cond_make_false (cond);
+ else
+   gcc_unreachable ();
+   }
+   }
+  else if (stmt_interesting_for_vrp (stmt))
{
+ edge taken_edge;
  value_range vr = VR_INITIALIZER;
  extract_range_from_stmt (stmt, &taken_edge, &output, &vr);
  if (output
  && (vr.type == VR_RANGE || vr.type == VR_ANTI_RANGE))
-   update_value_range (output, &vr);
- else
-   set_defs_to_varying (stmt);
-
- /* Try folding stmts with the VR discovered.  */
- bool did_replace
-   = replace_uses_in (stmt,
-  op_with_constant_singleton_value_range);
- if (fold_stmt (&gsi, follow_single_use_edges)
- || did_replace)
-   update_stmt (gsi_stmt (gsi));
-
- if (did_replace)
-   {
- /* If we cleaned up EH information from the statement,
-remove EH edges.  */
- if (maybe_clean_or_replace_eh_stmt (old_stmt, stmt))
-   bitmap_set_bit (need_eh_cleanup, bb->index);
-
- /* If we turned a not noreturn call into a noreturn one
-schedule it for fixup.  */
- if (!was_noreturn
- && is_gimple_call (stmt)
- && gimple_call_noreturn_p (stmt))
-   stmts_to_fixup.safe_push (stmt);
+   {
+ update_value_range (output, &vr);
 
- if (gimple_assign_single_p (stmt))
+ /* Set the SSA with the value range.  */
+ if (INTEGRAL_TYPE_P (TREE_TYPE (output)))
{
- tree rhs = gimple_assign_rhs1 (stmt);
- if (TREE_CODE (rhs) == ADDR_EXPR)
-   recompute_tree_invariant_for_addr_expr (rhs);
-   }
-   }
+ value_range *vr = get_value_range (output);
 
- def_operand_p def_p = SINGLE_SSA_DEF_OPERAND (stmt, SSA_OP_DEF);
- /* Set the SSA with the value range.  */
- if (def_p
- && TREE_CODE (DEF_FROM_PT

Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing

2016-10-17 Thread Thomas Schwinge
Hi!

On Mon, 17 Oct 2016 14:08:44 +0200, Richard Biener  
wrote:
> On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
>  wrote:
> > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener 
> >  wrote:
> >> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> >>  wrote:
> >> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
> >> >  wrote:
> >> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  wrote:
> >> >> > On 10/14/16 05:28, Richard Biener wrote:
> >> >> >
> >> >> >> The BB_VISITED flag has indetermined state at the beginning of a 
> >> >> >> pass.
> >> >> >> You have to ensure it is cleared yourself.
> >> >> >
> >> >> >
> >> >> > In that case the openacc (&nvptx?) passes should be modified to clear 
> >> >> > the
> >> >> > flags at their start, rather than at their end.

> >> > OK to commit the following?  Is such a test case appropriate (which would
> >> > have caught this issue right away), in particular the dg-final
> >> > scan-tree-dump line?
> >>
> >> Ugh.  Not worse to what we do in various dwarf scanning I guess.
> >
> > ;-|
> >
> >> Doesn't failure lead to a miscompile eventually?  So you could formulate
> >> this as a dg-do run test with a check for the desired outcome?
> >
> > No, unfortunately.  In this case the error is "benign" such that the
> > OpenACC loop processing machinery will decide to not parallelize loops
> > that ought to be parallelized.
> 
> So you can scan for "loop parallelized" instead?

The dump would still contain the outer loop's "Loop 0(0)" marker, so I'd
have to scan for "Head"/"Tail"/"UNIQUE" or similar instead -- but that
seems likewise fragile (for false negatives), and less useful than
scanning for the complete pattern.

> I fear your pattern
> is quite fragile
> to maintain over time.

Agreed -- but then, that's intentional: my idea for this new test case
has been to have it actually verify the expected OpenACC loop processing,
so it's clear that this pattern will need to be adjusted if changing the
OpenACC loop processing.

> >  This won't generally cause any problem
> > (apart from performance regression, obviously); it just caused problems
> > in a few libgomp test cases that actually at run time test for
> > parallelized execution -- which will/did trigger only with nvptx
> > offloading enabled, which not too many people are testing.  The test case
> > I propose below will trigger also for non-offloading configurations.

On IRC, Segher suggested to 'use {} instead of "" to avoid [all those
backslashes]' -- thanks, done.

commit 88260dc23e752c3e05c6644ee3b653a947714440
Author: Thomas Schwinge 
Date:   Mon Oct 17 15:33:09 2016 +0200

Clear basic block flags before using BB_VISITED for OpenACC loops processing

gcc/
* omp-low.c (oacc_loop_discovery): Call clear_bb_flags.

gcc/testsuite/
* gcc.dg/goacc/loop-processing-1.c: New file.
---
 gcc/omp-low.c  |  9 +
 gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 213bf8c..5257d21 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
 static oacc_loop *
 oacc_loop_discovery ()
 {
-  basic_block bb;
+  /* Clear basic block flags, in particular BB_VISITED which we're going to use
+ in the following.  */
+  clear_bb_flags ();
   
   oacc_loop *top = new_oacc_loop_outer (current_function_decl);
   oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
@@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
  that diagnostics come out in an unsurprising order.  */
   top = oacc_loop_sibling_nreverse (top);
 
-  /* Reset the visited flags.  */
-  FOR_ALL_BB_FN (bb, cfun)
-bb->flags &= ~BB_VISITED;
+  /* Clear basic block flags again, as otherwise IRA will explode later on.  */
+  clear_bb_flags ();
 
   return top;
 }
diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c 
gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
new file mode 100644
index 000..619576a
--- /dev/null
+++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -0,0 +1,18 @@
+/* Make sure that OpenACC loop processing happens.  */
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
+
+extern int place ();
+
+void vector_1 (int *ary, int size)
+{
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
firstprivate (size)
+  {
+#pragma acc loop gang
+for (int jx = 0; jx < 1; jx++)
+#pragma acc loop auto
+  for (int ix = 0; ix < size; ix++)
+   ary[ix] = place ();
+  }
+}
+
+/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 
14\(1\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 
20\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 
20\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 
0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK, 
\.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = UNIQU

[PATCH] PR77990 refactor unique_ptr to encapsulate tuple

2016-10-17 Thread Jonathan Wakely

We are incorrectly requiring unique_ptr deleters to be copyable here:

 explicit
 unique_ptr(pointer __p) noexcept
 : _M_t(__p, deleter_type())
 { }

We could just do:

 explicit
 unique_ptr(pointer __p) noexcept
 : _M_t()
 { std::get<0>(_M_t) = __p; }

But having to deal directly with the std::tuple inside unique_ptr has
been bothering me for some time. The tuple is used so we get the empty
base-class optimisation for the deleter, but that implementation
detail 


This patch refactors unique_ptr to put the std::tuple member into a
new type which provides named accessors for the tuple elements, so we
can stop using get<0> and get<1>. That new type can also provide a
single-argument constructor to fix the copyable requirement for
deleters. This also removes the code for deducing the pointer type
which is duplciated in unique_ptr and unique_ptr, and while in
the neighbourhood I changed it from old-school SFINAE using overloaded
functions to the new hotness with __void_t<>.

I intend to commit this to trunk, but on the branches I'll just fix
the constructor as shown above, as it's a smaller change.


PR libstdc++/77990
* include/bits/unique_ptr.h (__uniq_ptr_impl): New type to
encapsulate implementation details.
(unique_ptr::unique_ptr(_Up)): Don't copy deleter object.
(unique_ptr::get, unique_ptr::get_deleter, unique_ptr::release):
Call member functions of implementation object.
(unique_ptr): Likewise.
* python/libstdcxx/v6/printers.py (UniquePointerPrinter): Adjust for
new implementation.
* python/libstdcxx/v6/xmethods.py (UniquePtrGetWorker): Likewise.
* testsuite/20_util/unique_ptr/assign/48635_neg.cc: Adjust dg-error
lines.
* testsuite/20_util/unique_ptr/assign/cv_qual.cc: Likewise.
* testsuite/20_util/unique_ptr/cons/cv_qual.cc: Likewise.
* testsuite/20_util/unique_ptr/cons/77990.cc: New test.

Tested powerpc64le-linux.

commit 48463e9c51e49d0c8972b5128ab8640aa90bc6a3
Author: Jonathan Wakely 
Date:   Fri Oct 14 22:56:11 2016 +0100

PR77990 refactor unique_ptr to encapsulate tuple

	PR libstdc++/77990
	* include/bits/unique_ptr.h (__uniq_ptr_impl): New type to
	encapsulate implementation details.
	(unique_ptr::unique_ptr(_Up)): Don't copy deleter object.
	(unique_ptr::get, unique_ptr::get_deleter, unique_ptr::release):
	Call member functions of implementation object.
	(unique_ptr): Likewise.
	* python/libstdcxx/v6/printers.py (UniquePointerPrinter): Adjust for
	new implementation.
	* python/libstdcxx/v6/xmethods.py (UniquePtrGetWorker): Likewise.
	* testsuite/20_util/unique_ptr/assign/48635_neg.cc: Adjust dg-error
	lines.
	* testsuite/20_util/unique_ptr/assign/cv_qual.cc: Likewise.
	* testsuite/20_util/unique_ptr/cons/cv_qual.cc: Likewise.
	* testsuite/20_util/unique_ptr/cons/77990.cc: New test.

diff --git a/libstdc++-v3/include/bits/unique_ptr.h b/libstdc++-v3/include/bits/unique_ptr.h
index a36794d..e32b6c9 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -111,33 +111,51 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 };
 
-  /// 20.7.1.2 unique_ptr for single objects.
-  template  >
-class unique_ptr
+  template 
+class __uniq_ptr_impl
 {
-  // use SFINAE to determine whether _Del::pointer exists
-  class _Pointer
-  {
-	template
-	  static typename _Up::pointer __test(typename _Up::pointer*);
+  template 
+	struct _Ptr
+	{
+	  using type = _Up*;
+	};
 
-	template
-	  static _Tp* __test(...);
-
-	typedef typename remove_reference<_Dp>::type _Del;
-
-  public:
-	typedef decltype(__test<_Del>(0)) type;
-  };
-
-  typedef std::tuple  __tuple_type;
-  __tuple_type  _M_t;
+  template 
+	struct
+	_Ptr<_Up, _Ep, __void_t::type::pointer>>
+	{
+	  using type = typename remove_reference<_Ep>::type::pointer;
+	};
 
 public:
-  typedef typename _Pointer::type   pointer;
-  typedef _Tp   element_type;
-  typedef _Dp   deleter_type;
+  using pointer = typename _Ptr<_Tp, _Dp>::type;
 
+  __uniq_ptr_impl() = default;
+  __uniq_ptr_impl(pointer __p) : _M_t() { _M_ptr() = __p; }
+
+  template
+  __uniq_ptr_impl(pointer __p, _Del&& __d)
+	: _M_t(__p, std::forward<_Del>(__d)) { }
+
+  pointer&   _M_ptr() { return std::get<0>(_M_t); }
+  pointer_M_ptr() const { return std::get<0>(_M_t); }
+  _Dp&   _M_deleter() { return std::get<1>(_M_t); }
+  const _Dp& _M_deleter() const { return std::get<1>(_M_t); }
+
+private:
+  tuple _M_t;
+};
+
+  /// 20.7.1.2 unique_ptr for single objects.
+  template >
+class unique_ptr
+{
+  __uniq_ptr_impl<_Tp, _Dp> _M_t;
+
+public:
+  using pointer	  = typename __uniq_ptr_impl<_Tp, _Dp>::pointer;
+  using el

Re: [PATCH] Fix computation of register limit for -fsched-pressure

2016-10-17 Thread Maxim Kuvyrkov
> On Oct 7, 2016, at 6:08 PM, Pat Haugen  wrote:
> 
> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
> attempted to scale down the register limit used by -fsched-pressure for the 
> case where the block in question executes as frequently as the entry block to 
> just the call_clobbered (i.e. call_used) regs. But the code is actually 
> scaling toward call_saved registers. The following patch corrects that by 
> computing call_saved regs per class and subtracting out some scaled portion 
> of that.
> 
> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?

Hi Pat,

I stared at your patch and current code for good 30 minutes, and I still don't 
see what is wrong with the current code.

With your patch the number of registers from class CL that scheduler has at its 
disposal for a single-basic-block function will be:

sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - call_saved_regs_num[CL];

where call_saved_regs_num is number of registers in class CL that need to be 
saved in the prologue (i.e., "free" registers).  I can see some logic in setting

sched_call_regs_num[CL] = call_saved_regs_num[CL];

but not in subtracting number of such registers from the number of total 
available hard registers.

Am I missing something?

Also, could you share the testcase that you used to investigate the problem 
with register-aware scheduling?  I wonder if there is a problem lurking.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org


> 
> -Pat
> 
> 
> 2016-10-07  Pat Haugen  
> 
>   * haifa-sched.c call_used_regs_num: Rename to...
>   call_saved_regs_num: ...this.
>   (sched_pressure_start_bb): Scale call_saved regs not call_used.
>   (alloc_global_sched_pressure_data): Compute call_saved regs.
> 
> 
> 



Re: [PATCH] Clear BB_VISITED in bb-reorder

2016-10-17 Thread Richard Biener
On Mon, Oct 17, 2016 at 3:12 PM, Thomas Schwinge
 wrote:
> Hi!
>
> On Mon, 17 Oct 2016 14:26:42 +0200 (CEST), Richard Biener  
> wrote:
>> --- gcc/bb-reorder.c  (revision 241228)
>> +++ gcc/bb-reorder.c  (working copy)
>> @@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
>>   To start with, everything points to itself, nothing is assigned yet.  
>> */
>>
>>FOR_ALL_BB_FN (bb, cfun)
>> -bb->aux = bb;
>> +{
>> +  bb->aux = bb;
>> +  bb->flags &= ~BB_VISITED;
>> +}
>>
>>EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
>
> "EXIT_BLOCK_PTR_FOR_FN (cfun)->flags &= ~BB_VISITED;" is not required
> here, additionally?

ALL_BB includes EXIT_BLOCK.

> Also, I see that in this file, pass_duplicate_computed_gotos::execute
> does call clear_bb_flags to clear basic block flags -- but
> reorder_basic_blocks_simple (cited above), called via
> reorder_basic_blocks, called via pass_reorder_blocks::execute is part of
> a different pass...

Yes, this file contains multiple passes.

Richard.

> Grüße
>  Thomas


Re: [PATCH] Clear BB_VISITED in bb-reorder

2016-10-17 Thread Thomas Schwinge
Hi!

On Mon, 17 Oct 2016 14:26:42 +0200 (CEST), Richard Biener  
wrote:
> --- gcc/bb-reorder.c  (revision 241228)
> +++ gcc/bb-reorder.c  (working copy)
> @@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
>   To start with, everything points to itself, nothing is assigned yet.  */
>  
>FOR_ALL_BB_FN (bb, cfun)
> -bb->aux = bb;
> +{
> +  bb->aux = bb;
> +  bb->flags &= ~BB_VISITED;
> +}
>  
>EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;

"EXIT_BLOCK_PTR_FOR_FN (cfun)->flags &= ~BB_VISITED;" is not required
here, additionally?

Also, I see that in this file, pass_duplicate_computed_gotos::execute
does call clear_bb_flags to clear basic block flags -- but
reorder_basic_blocks_simple (cited above), called via
reorder_basic_blocks, called via pass_reorder_blocks::execute is part of
a different pass...


Grüße
 Thomas


Re: [PATCH] Fix PR77937

2016-10-17 Thread Bill Schmidt

> On Oct 17, 2016, at 3:27 AM, Richard Biener  
> wrote:
> 
> On Fri, Oct 14, 2016 at 3:51 PM, Bill Schmidt
>  wrote:
>> Hi Richard,
>> 
>>> On Oct 14, 2016, at 4:19 AM, Richard Biener  
>>> wrote:
>>> 
>>> On Thu, Oct 13, 2016 at 5:38 PM, Bill Schmidt
>>>  wrote:
 The previous patch for
 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77937 is necessary, but not
 sufficient in all cases.  It allows -1 to be used with a pointer
 increment, which we really do not want given that this is generally not
 profitable.  Disable this case for now.  We can add logic later to
 estimate the cost for the rare case where it can be useful.
 
 Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
 regressions, committed.
>>> 
>>> Huh, I wonder what is special about -1 here.  Do we handle -2?
>> 
>> We do, subject to a little more stringent cost modeling later on, because it
>> requires introducing a multiply by the increment.  We have some special
>> case code for -1 that introduces a MINUS_EXPR, but that breaks for
>> pointer arithmetic.
> 
> Ah, ok.  Fine then.
> 
>> I am working on a better fix for this as part of the work for PR77916, which
>> exposes a related problem elsewhere in the code.  The current fix is a
>> stopgap until I can get that work completed.  For -1 we prefer a negate
>> over a multiply when we have pointer types and can't use minus, and need
>> to properly model that in the cost calculation.
> 
> Note that RTL expansion will turn this into a minus again so I dont' think you
> need any cost adjustment here.  It's just that GIMPLE doesnt' have a
> POINTER_MINUS_EXPR...
> (RTL just has plus and minus, nothing special for pointers).

I came to the same conclusion over the weekend.  So I think the real fix
will be pretty simple now.  I keep getting pulled off to other things, but 
should
have something for review in the next couple of days.

Thanks!
Bill

> 
> Richard.
> 
>> Bill
>> 
>>> 
>>> Richard.
>>> 
 Thanks,
 Bill
 
 
 2016-10-13  Bill Schmidt  
 
   PR tree-optimization/77937
   * gimple-ssa-strength-reduction.c (analyze_increments): Set cost
   to infinite when we have a pointer with an increment of -1.
 
 
 Index: gcc/gimple-ssa-strength-reduction.c
 ===
 --- gcc/gimple-ssa-strength-reduction.c (revision 241120)
 +++ gcc/gimple-ssa-strength-reduction.c (working copy)
 @@ -2818,6 +2818,11 @@ analyze_increments (slsr_cand_t first_dep, machine
  || (incr == -1
  && !POINTER_TYPE_P (first_dep->cand_type)))
   incr_vec[i].cost = COST_NEUTRAL;
 +
 +  /* FIXME: We don't handle pointers with a -1 increment yet.
 + They are usually unprofitable anyway.  */
 +  else if (incr == -1 && POINTER_TYPE_P (first_dep->cand_type))
 +   incr_vec[i].cost = COST_INFINITE;
 
  /* FORNOW: If we need to add an initializer, give up if a cast from
the candidate's type to its stride's type can lose precision.



Re: [PATCH v2][AArch64] Fix symbol offset limit

2016-10-17 Thread Wilco Dijkstra


ping


From: Wilco Dijkstra
Sent: 12 September 2016 15:50
To: Richard Earnshaw; GCC Patches
Cc: nd
Subject: Re: [PATCH v2][AArch64] Fix symbol offset limit
    
Wilco wrote:    
> The original example is from GCC itself, the fixed_regs array is small but 
> due to
> optimization we can end up with &fixed_regs + 0x.

We could also check the bounds of each symbol if they exist, like the patch 
below.


In aarch64_classify_symbol symbols are allowed full-range offsets on 
relocations.
This means the offset can use all of the +/-4GB offset, leaving no offset 
available
for the symbol itself.  This results in relocation overflow and link-time errors
for simple expressions like &global_char + 0xff00.

To avoid this, limit the offset to +/-1GB so that the symbol needs to be within 
a
3GB offset from its references.  For the tiny code model use a 64KB offset, 
allowing
most of the 1MB range for code/data between the symbol and its references.
For symbols with a defined size, limit the offset to be within the size of the 
symbol.

ChangeLog:
2016-09-12  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.c (aarch64_classify_symbol):
    Apply reasonable limit to symbol offsets.

    testsuite/
    * gcc.target/aarch64/symbol-range.c (foo): Set new limit.
    * gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
385bd560fb12cd5d404e6ddb2f01edf1fe72d729..275a828ac9e6e9b8187380c1b602ffb1b2bcfb21
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9351,6 +9351,8 @@ aarch64_classify_symbol (rtx x, rtx offset)
   if (aarch64_tls_symbol_p (x))
 return aarch64_classify_tls_symbol (x);
 
+  const_tree decl = SYMBOL_REF_DECL (x);
+
   switch (aarch64_cmodel)
 {
 case AARCH64_CMODEL_TINY:
@@ -9359,25 +9361,45 @@ aarch64_classify_symbol (rtx x, rtx offset)
  we have no way of knowing the address of symbol at compile time
  so we can't accurately say if the distance between the PC and
  symbol + offset is outside the addressible range of +/-1M in the
-    TINY code model.  So we rely on images not being greater than
-    1M and cap the offset at 1M and anything beyond 1M will have to
-    be loaded using an alternative mechanism.  Furthermore if the
-    symbol is a weak reference to something that isn't known to
-    resolve to a symbol in this module, then force to memory.  */
+    TINY code model.  So we limit the maximum offset to +/-64KB and
+    assume the offset to the symbol is not larger than +/-(1M - 64KB).
+    Furthermore force to memory if the symbol is a weak reference to
+    something that doesn't resolve to a symbol in this module.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ || !IN_RANGE (INTVAL (offset), -0x1, 0x1))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (decl_size
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_shwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_TINY_ABSOLUTE;
 
 case AARCH64_CMODEL_SMALL:
   /* Same reasoning as the tiny code model, but the offset cap here is
-    4G.  */
+    1G, allowing +/-3G for the offset to the symbol.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+ || !IN_RANGE (INTVAL (offset), -0x4000, 0x4000))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (decl_size
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_shwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_SMALL_ABSOLUTE;
 
 case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index 
d7e46b059e41f2672b3a1da5506fa8944e752e01..d399a3637ed834ddc4bb429594c4ec229b5c2ea8
 100644
--- a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
+++ b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
@@ -1,12 +1,12 @@
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-options "-O3 -save-temps -mcmodel=tiny" } */
 
-int fixed_regs[0x0020

[PATCH] Fix PR77988

2016-10-17 Thread Richard Biener

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-10-17  Richard Biener  

PR tree-optimization/77988
* tree-vrp.c (remove_range_assertions): Use replace_uses_by.

* gcc.dg/torture/pr77988.c: New testcase.

Index: gcc/testsuite/gcc.dg/torture/pr77988.c
===
--- gcc/testsuite/gcc.dg/torture/pr77988.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr77988.c  (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+
+static int a = 2;
+int b[1], c, d;
+
+int main ()
+{ 
+  int e = a, *f = &b[0];
+  if (d)
+for (e = 0; e < 1; e++)
+  ;
+  if (e)
+{
+L1:
+  if (b < f)
+   __builtin_abort ();
+  if (*f)
+   c++;
+  return 0;
+}
+  f = 0;
+  goto L1;
+  return 0;
+}
Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241227)
+++ gcc/tree-vrp.c  (working copy)
@@ -6882,7 +6882,6 @@ remove_range_assertions (void)
 for (si = gsi_after_labels (bb), is_unreachable = -1; !gsi_end_p (si);)
   {
gimple *stmt = gsi_stmt (si);
-   gimple *use_stmt;
 
if (is_gimple_assign (stmt)
&& gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
@@ -6890,8 +6889,6 @@ remove_range_assertions (void)
tree lhs = gimple_assign_lhs (stmt);
tree rhs = gimple_assign_rhs1 (stmt);
tree var;
-   use_operand_p use_p;
-   imm_use_iterator iter;
 
var = ASSERT_EXPR_VAR (rhs);
 
@@ -6927,12 +6924,7 @@ remove_range_assertions (void)
  }
 
/* Propagate the RHS into every use of the LHS.  */
-   FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs)
- {
-   FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
- SET_USE (use_p, var);
-   update_stmt (use_stmt);
- }
+   replace_uses_by (lhs, var);
 
/* And finally, remove the copy, it is not needed.  */
gsi_remove (&si, true);


Re: [PATCH][AArch64 - v3] Simplify eh_return implementation

2016-10-17 Thread Wilco Dijkstra


ping


From: Wilco Dijkstra
Sent: 02 September 2016 12:31
To: Ramana Radhakrishnan; GCC Patches
Cc: nd
Subject: Re: [PATCH][AArch64 - v3] Simplify eh_return implementation
    
Ramana Radhakrishnan wrote:
> Can you please file a PR for this and add some testcases ?  This sounds like 
> a serious enough problem that needs to be looked at probably going back since 
> the dawn of time.

I've created PR77455. Updated patch below:

This patch simplifies the handling of the EH return value.  We force the use of 
the
frame pointer so the return location is always at FP + 8.  This means we can 
emit
a simple volatile access in EH_RETURN_HANDLER_RTX without needing md
patterns, splitters and frame offset calculations.  The new implementation also
fixes various bugs in aarch64_final_eh_return_addr, which does not work with
-fomit-frame-pointer, alloca or outgoing arguments.

Bootstrap OK, GCC Regression OK, OK for trunk? Would it be useful to backport
this to GCC6.x?

ChangeLog:

2016-09-02  Wilco Dijkstra  

    PR77455
gcc/
    * config/aarch64/aarch64.md (eh_return): Remove pattern and splitter.
    * config/aarch64/aarch64.h (AARCH64_EH_STACKADJ_REGNUM): Remove.
    (EH_RETURN_HANDLER_RTX): New define.
    * config/aarch64/aarch64.c (aarch64_frame_pointer_required):
    Force frame pointer in EH return functions.
    (aarch64_expand_epilogue): Add barrier for eh_return.
    (aarch64_final_eh_return_addr): Remove.
    (aarch64_eh_return_handler_rtx): New function.
    * config/aarch64/aarch64-protos.h (aarch64_final_eh_return_addr):
    Remove.
    (aarch64_eh_return_handler_rtx): New prototype.

testsuite/
    * gcc.target/aarch64/eh_return.c: New test.
--
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
3cdd69b8af1089a839e5d45cda94bc70a15cd777..327c0a97f6f687604afef249b79ac22628418070
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -358,7 +358,7 @@ int aarch64_hard_regno_mode_ok (unsigned, machine_mode);
 int aarch64_hard_regno_nregs (unsigned, machine_mode);
 int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
-rtx aarch64_final_eh_return_addr (void);
+rtx aarch64_eh_return_handler_rtx (void);
 rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
003fec87e41db618570663f28cc2387a87e8252a..fa81e4b853daf08842955288861ec7e7acca
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -400,9 +400,9 @@ extern unsigned aarch64_architecture_version;
 #define ASM_DECLARE_FUNCTION_NAME(STR, NAME, DECL)  \
   aarch64_declare_function_name (STR, NAME, DECL)
 
-/* The register that holds the return address in exception handlers.  */
-#define AARCH64_EH_STACKADJ_REGNUM (R0_REGNUM + 4)
-#define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, AARCH64_EH_STACKADJ_REGNUM)
+/* For EH returns X4 contains the stack adjustment.  */
+#define EH_RETURN_STACKADJ_RTX gen_rtx_REG (Pmode, R4_REGNUM)
+#define EH_RETURN_HANDLER_RTX  aarch64_eh_return_handler_rtx ()
 
 /* Don't use __builtin_setjmp until we've defined it.  */
 #undef DONT_USE_BUILTIN_SETJMP
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
e742c19d76e6c62117aa62a990b9c2945aa06b74..f07d771ea343803e054e03f59c8c1efb698bf474
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2739,6 +2739,10 @@ aarch64_frame_pointer_required (void)
   && (!crtl->is_leaf || df_regs_ever_live_p (LR_REGNUM)))
 return true;
 
+  /* Force a frame pointer for EH returns so the return address is at FP+8.  */
+  if (crtl->calls_eh_return)
+    return true;
+
   return false;
 }
 
@@ -3298,7 +3302,8 @@ aarch64_expand_epilogue (bool for_sibcall)
  + cfun->machine->frame.saved_varargs_size) != 0;
 
   /* Emit a barrier to prevent loads from a deallocated stack.  */
-  if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca)
+  if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca
+  || crtl->calls_eh_return)
 {
   emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
   need_barrier_p = false;
@@ -3366,52 +3371,15 @@ aarch64_expand_epilogue (bool for_sibcall)
 emit_jump_insn (ret_rtx);
 }
 
-/* Return the place to copy the exception unwinding return address to.
-   This will probably be a stack slot, but could (in theory be the
-   return register).  */
+/* Implement EH_RETURN_HANDLER_RTX.  The return address is stored at FP + 8.
+   The access needs to be volatile to prevent it from being removed.  */
 rtx
-aarch64_final_eh_return_addr (void)
+aarch64_eh_return_handler_rtx (void)
 {
-  HOST_WIDE_INT fp_offset;
-
-  aarch64_layout_frame ();
-
-  fp_offset = cfun->machine->frame.frame_size
- 

Re: [PATCH][AArch64] Align FP callee-saves

2016-10-17 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 08 September 2016 14:35
To: GCC Patches
Cc: nd
Subject: [PATCH][AArch64] Align FP callee-saves
    
If the number of integer callee-saves is odd, the FP callee-saves use 8-byte 
aligned
LDP/STP.  Since 16-byte alignment may be faster on some CPUs, align the FP
callee-saves to 16 bytes and use the alignment gap for the last FP callee-save 
when
possible. Besides slightly different offsets for FP callee-saves, the generated 
code
doesn't change.

Bootstrap and regression pass, OK for commit?


ChangeLog:
2016-09-08  Wilco Dijkstra  

    * config/aarch64/aarch64.c (aarch64_layout_frame):
    Align FP callee-saves.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
fed3b6e803821392194dc34a6c3df5f653d2e33e..075b3802c72a68f63b47574e19186e7ce3440b28
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2735,7 +2735,7 @@ static void
 aarch64_layout_frame (void)
 {
   HOST_WIDE_INT offset = 0;
-  int regno;
+  int regno, last_fp_reg = INVALID_REGNUM;
 
   if (reload_completed && cfun->machine->frame.laid_out)
 return;
@@ -2781,7 +2781,10 @@ aarch64_layout_frame (void)
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
 if (df_regs_ever_live_p (regno)
 && !call_used_regs[regno])
-  cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
+  {
+   cfun->machine->frame.reg_offset[regno] = SLOT_REQUIRED;
+   last_fp_reg = regno;
+  }
 
   if (cfun->machine->frame.emit_frame_chain)
 {
@@ -2805,9 +2808,21 @@ aarch64_layout_frame (void)
 offset += UNITS_PER_WORD;
   }
 
+  HOST_WIDE_INT max_int_offset = offset;
+  offset = ROUND_UP (offset, STACK_BOUNDARY / BITS_PER_UNIT);
+  bool has_align_gap = offset != max_int_offset;
+
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
 if (cfun->machine->frame.reg_offset[regno] == SLOT_REQUIRED)
   {
+   /* If there is an alignment gap between integer and fp callee-saves,
+  allocate the last fp register to it if possible.  */
+   if (regno == last_fp_reg && has_align_gap && (offset & 8) == 0)
+ {
+   cfun->machine->frame.reg_offset[regno] = max_int_offset;
+   break;
+ }
+
 cfun->machine->frame.reg_offset[regno] = offset;
 if (cfun->machine->frame.wb_candidate1 == INVALID_REGNUM)
   cfun->machine->frame.wb_candidate1 = regno;



Re: [PATCH][AArch64] Improve stack adjustment

2016-10-17 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 10 August 2016 17:20
To: Richard Earnshaw; GCC Patches
Cc: nd
Subject: Re: [PATCH][AArch64] Improve stack adjustment
    
Richard Earnshaw wrote:
> I see you've added a default argument for your new parameter.  I think
> doing that is fine, but I have two comments about how we might use that
> in this case.
> 
> Firstly, if this parameter is suitable for having a default value, then
> I think the preceding one should also be treated in the same way.
> 
> Secondly, I think (due to these parameters being BOOL with no useful
> context to make it clear which is which) that having wrapper functions
> (inlined, of course) that describe the intended behaviour more clearly
> would be useful.  So, defining
> 
> static inline void
> aarch64_add_frame_constant (mode, regnum, scratchreg, delta)
> {
>    aarch64_add_frame_constant (mode, regnum, scratchreg, delta, true);
> }
> 
> would make it clearer at the call point than having a lot of true and
> false parameters scattered round the code.
> 
> Alternatively we might remove all the default parameters and require
> wrappers like the above to make it more explicit which API is intended -
> this might make more sense if not all combinations make sense.

OK here is the new version using simple wrapper functions and no
default parameters:

ChangeLog:
2016-08-10  Wilco Dijkstra  

gcc/
    * config/aarch64/aarch64.c (aarch64_add_constant_internal):
    Add extra argument to allow emitting the move immediate.
    Use add/sub with positive immediate.
    (aarch64_add_constant): Add inline function.
    (aarch64_add_sp): Likewise.
    (aarch64_sub_sp): Likewise.
    (aarch64_expand_prologue): Call aarch64_sub_sp.
    (aarch64_expand_epilogue): Call aarch64_add_sp.
    Decide when to leave out move.
    (aarch64_output_mi_thunk): Call aarch64_add_constant.

testsuite/
    * gcc.target/aarch64/test_frame_17.c: New test.
--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
b8536175a84b76f8c2939e61f1379ae279b20d43..5a25fce17785af9f9dc12e0f2a9609af09af0b35
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1941,15 +1941,21 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 }
 
 /* Add DELTA to REGNUM in mode MODE.  SCRATCHREG can be used to held
-   intermediate value if necessary.
+   intermediate value if necessary.  FRAME_RELATED_P should be true if
+   the RTX_FRAME_RELATED flag should be set and CFA adjustments added
+   to the generated instructions.  If SCRATCHREG is known to hold
+   abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
+   immediate again.
 
-   This function is sometimes used to adjust the stack pointer, so we must
-   ensure that it can never cause transient stack deallocation by writing an
-   invalid value into REGNUM.  */
+   Since this function may be used to adjust the stack pointer, we must
+   ensure that it cannot cause transient stack deallocation (for example
+   by first incrementing SP and then decrementing when adjusting by a
+   large immediate).  */
 
 static void
-aarch64_add_constant (machine_mode mode, int regnum, int scratchreg,
- HOST_WIDE_INT delta, bool frame_related_p)
+aarch64_add_constant_internal (machine_mode mode, int regnum, int scratchreg,
+  HOST_WIDE_INT delta, bool frame_related_p,
+  bool emit_move_imm)
 {
   HOST_WIDE_INT mdelta = abs_hwi (delta);
   rtx this_rtx = gen_rtx_REG (mode, regnum);
@@ -1967,11 +1973,11 @@ aarch64_add_constant (machine_mode mode, int regnum, 
int scratchreg,
   return;
 }
 
-  /* We need two add/sub instructions, each one performing part of the
- calculation.  Don't do this if the addend can be loaded into register with
- a single instruction, in that case we prefer a move to a scratch register
- following by an addition.  */
-  if (mdelta < 0x100 && !aarch64_move_imm (delta, mode))
+  /* We need two add/sub instructions, each one perform part of the
+ addition/subtraction, but don't this if the addend can be loaded into
+ register by single instruction, in that case we prefer a move to scratch
+ register following by addition.  */
+  if (mdelta < 0x100 && !aarch64_move_imm (mdelta, mode))
 {
   HOST_WIDE_INT low_off = mdelta & 0xfff;
 
@@ -1985,8 +1991,10 @@ aarch64_add_constant (machine_mode mode, int regnum, int 
scratchreg,
 
   /* Otherwise use generic function to handle all other situations.  */
   rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
-  aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (delta), true, mode);
-  insn = emit_insn (gen_add2_insn (this_rtx, scratch_rtx));
+  if (emit_move_imm)
+    aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
+  insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
+ : gen_add2_insn (this_rtx,

[PATCH] Fix cond-expr handling in genmatch

2016-10-17 Thread Richard Biener

This fixes matching of toplevel (cond (lt @1 @2) ...) as reported by
Bin to me privately.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2016-10-17  Richard Biener  

* genmatch.c (dt_operand::gen_gimple_expr): Use get_name to
get at the operand to look at with TREE_OPERAND for generic
sub-nodes.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c  (revision 241228)
+++ gcc/genmatch.c  (working copy)
@@ -2644,9 +2652,11 @@ dt_operand::gen_gimple_expr (FILE *f, in
  /* ???  If this is a memory operation we can't (and should not)
 match this.  The only sensible operand types are
 SSA names and invariants.  */
+ char opname[20];
+ get_name (opname);
  fprintf_indent (f, indent,
- "tree %s = TREE_OPERAND (gimple_assign_rhs1 
(def), %i);\n",
- child_opname, i);
+ "tree %s = TREE_OPERAND (%s, %i);\n",
+ child_opname, opname, i);
  fprintf_indent (f, indent,
  "if ((TREE_CODE (%s) == SSA_NAME\n",
  child_opname);



Re: [PATCH] rs6000: Fix shrink-wrap-separate for AIX

2016-10-17 Thread Segher Boessenkool
On Sun, Oct 16, 2016 at 08:37:30AM -0400, David Edelsohn wrote:
> Maybe something like the following:
> 
> * config/rs6000/rs6000.c (rs6000_savres_strategy) [AIX,ELFv2]: Inline
> FPR save and restore if shrink-wrapping and not optimizing for size.
> 
> Index: rs6000.c
> ===
> --- rs6000.c(revision 241210)
> +++ rs6000.c(working copy)
> @@ -25445,7 +25445,8 @@
>else
>  {
>gcc_checking_assert (DEFAULT_ABI == ABI_AIX || DEFAULT_ABI == 
> ABI_ELFv2);
> -  if (info->first_fp_reg_save > 61)
> +  if ((!optimize_size && flag_shrink_wrap_separate)
> + || info->first_fp_reg_save > 61)
> strategy |= SAVE_INLINE_FPRS | REST_INLINE_FPRS;
>strategy |= SAVE_INLINE_GPRS | REST_INLINE_GPRS;
>strategy |= SAVE_INLINE_VRS | REST_INLINE_VRS;
> 

With "optimize_function_for_speed_p (cfun)" (like you found out yourself
already), that looks fine to me.

Thanks,


Segher


[PATCH] Clear BB_VISITED in bb-reorder

2016-10-17 Thread Richard Biener

$subject, applied as obvious.

Richard.

2016-10-17  Richard Biener  

* bb-reorder.c (reorder_basic_blocks_simple): Clear BB_VISITED
before using it.

Index: gcc/bb-reorder.c
===
--- gcc/bb-reorder.c(revision 241228)
+++ gcc/bb-reorder.c(working copy)
@@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
  To start with, everything points to itself, nothing is assigned yet.  */
 
   FOR_ALL_BB_FN (bb, cfun)
-bb->aux = bb;
+{
+  bb->aux = bb;
+  bb->flags &= ~BB_VISITED;
+}
 
   EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;
 


[PATCH] PR77998 Remove std::thread::_Invoker::result_type

2016-10-17 Thread Jonathan Wakely

This typedef uses the type before it's complete, which Clang rejects.
It's not necessary anyay, we can just use decltype directly where
needed. The _Invoker type isn't user-facing and doesn't need a
result_type.

PR libstdc++/77998
* include/std/future (__future_base::_Deferred_state)
(__future_base::_Async_state_impl): Use decltype to deduce return
type, instead of _Invoker::result_type.
* include/std/thread (thread::_Invoker::operator()): Likewise.
(thread::_Invoker::result_type): Remove.

Tested powerpc64le-linux, committed to trunk.

commit dfb18d127220b2a0f338d927253bbc5da4993f15
Author: Jonathan Wakely 
Date:   Mon Oct 17 13:12:15 2016 +0100

PR77998 Remove std::thread::_Invoker::result_type

PR libstdc++/77998
* include/std/future (__future_base::_Deferred_state)
(__future_base::_Async_state_impl): Use decltype to deduce return
type, instead of _Invoker::result_type.
* include/std/thread (thread::_Invoker::operator()): Likewise.
(thread::_Invoker::result_type): Remove.

diff --git a/libstdc++-v3/include/std/future b/libstdc++-v3/include/std/future
index 31a21f9..5542d49 100644
--- a/libstdc++-v3/include/std/future
+++ b/libstdc++-v3/include/std/future
@@ -561,10 +561,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 class _Async_state_commonV2;
 #endif
 
-template
+template()())>
   class _Deferred_state;
 
-template
+template()())>
   class _Async_state_impl;
 
 template
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 7a3c407..576e577 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -235,14 +235,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
using _Indices
  = typename _Build_index_tuple::value>::__type;
 
-   using result_type
- = decltype(std::declval<_Invoker>()._M_invoke(_Indices()));
-
-   result_type
+   auto
operator()()
-   noexcept(noexcept(std::declval<_Invoker>()._M_invoke(_Indices(
+   noexcept(noexcept(std::declval<_Invoker&>()._M_invoke(_Indices(
+   -> decltype(std::declval<_Invoker&>()._M_invoke(_Indices()))
{ return _M_invoke(_Indices()); }
-
   };
 
 // Alias for _Invoker>


Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing

2016-10-17 Thread Richard Biener
On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
 wrote:
> Hi!
>
> On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener 
>  wrote:
>> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
>>  wrote:
>> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
>> >  wrote:
>> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  wrote:
>> >> > On 10/14/16 05:28, Richard Biener wrote:
>> >> >
>> >> >> The BB_VISITED flag has indetermined state at the beginning of a pass.
>> >> >> You have to ensure it is cleared yourself.
>> >> >
>> >> >
>> >> > In that case the openacc (&nvptx?) passes should be modified to clear 
>> >> > the
>> >> > flags at their start, rather than at their end.
>> >
>> > The gcc/config/nvptx/nvptx.c handling seems fine -- it explicitly clears
>> > BB_VISITED for all basic block it works on.
>> >
>> >> Yes.  But as I said, I ran into IRA ICEs (somewhere in the testsuite) 
>> >> when not
>> >> cleaning up after tree-ssa-propagate.c.  So somebody has to fix IRA first.
>> >
>> > Is there a GCC PR for that, or where are you tracking such issues?
>>
>> No, just tracking in my head.
>
> Tsk, tsk...  ;-)

bb-reorder.c has the same issue.

Index: gcc/bb-reorder.c
===
--- gcc/bb-reorder.c(revision 241228)
+++ gcc/bb-reorder.c(working copy)
@@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
  To start with, everything points to itself, nothing is assigned yet.  */

   FOR_ALL_BB_FN (bb, cfun)
-bb->aux = bb;
+{
+  bb->aux = bb;
+  bb->flags &= ~BB_VISITED;
+}

   EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;


note that I didn't really understand the IRA issue (the code looks
fine from a quick look - it initializes
to BB_VISITED).  Still removing the tree-ssa-propagate.c BB_VISITED
clearing resulted in a bootstrap failure on x86_64.

Richard.

>
>> > OK to commit the following?  Is such a test case appropriate (which would
>> > have caught this issue right away), in particular the dg-final
>> > scan-tree-dump line?
>>
>> Ugh.  Not worse to what we do in various dwarf scanning I guess.
>
> ;-|
>
>> Doesn't failure lead to a miscompile eventually?  So you could formulate
>> this as a dg-do run test with a check for the desired outcome?
>
> No, unfortunately.  In this case the error is "benign" such that the
> OpenACC loop processing machinery will decide to not parallelize loops
> that ought to be parallelized.

So you can scan for "loop parallelized" instead?  I fear your pattern
is quite fragile
to maintain over time.

Richard.

>  This won't generally cause any problem
> (apart from performance regression, obviously); it just caused problems
> in a few libgomp test cases that actually at run time test for
> parallelized execution -- which will/did trigger only with nvptx
> offloading enabled, which not too many people are testing.  The test case
> I propose below will trigger also for non-offloading configurations.
>
>> > Clear basic block flags before using BB_VISITED for OpenACC loops 
>> > processing
>> >
>> > gcc/
>> > * omp-low.c (oacc_loop_discovery): Call clear_bb_flags.
>> >
>> > gcc/testsuite/
>> > * gcc.dg/goacc/loop-processing-1.c: New file.
>> > ---
>> >  gcc/omp-low.c  |  9 +
>> >  gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
>> >  2 files changed, 23 insertions(+), 4 deletions(-)
>> >
>> > --- gcc/omp-low.c
>> > +++ gcc/omp-low.c
>> > @@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
>> >  static oacc_loop *
>> >  oacc_loop_discovery ()
>> >  {
>> > -  basic_block bb;
>> > +  /* Clear basic block flags, in particular BB_VISITED which we're going 
>> > to use
>> > + in the following.  */
>> > +  clear_bb_flags ();
>> >
>> >oacc_loop *top = new_oacc_loop_outer (current_function_decl);
>> >oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
>> > @@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
>> >   that diagnostics come out in an unsurprising order.  */
>> >top = oacc_loop_sibling_nreverse (top);
>> >
>> > -  /* Reset the visited flags.  */
>> > -  FOR_ALL_BB_FN (bb, cfun)
>> > -bb->flags &= ~BB_VISITED;
>> > +  /* Clear basic block flags again, as otherwise IRA will explode later 
>> > on.  */
>> > +  clear_bb_flags ();
>> >
>> >return top;
>> >  }
>> > --- /dev/null
>> > +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
>> > @@ -0,0 +1,18 @@
>> > +/* Make sure that OpenACC loop processing happens.  */
>> > +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
>> > +
>> > +extern int place ();
>> > +
>> > +void vector_1 (int *ary, int size)
>> > +{
>> > +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
>> > firstprivate (size)
>> > +  {
>> > +#pragma acc loop gang
>> > +for (int jx = 0; jx < 1; jx++)
>> > +#pragma acc loop auto
>> > +  for (int ix = 0; ix < size; ix++)
>> > +   ary[ix] = place ();
>> >

[PATCH] PR77987 Fix unique_ptr::reset(U) for T != U

2016-10-17 Thread Jonathan Wakely

This is a very simple change, to allow conversions that are supposed
to be valid according to DR 2118. The fix was slightly complicated by
the tests being a bit of a mess. We had tests for this requirement,
but they were in files which mixed positive tests and negative tests.
I've split the tests so the invalid operations are in separate files
and all the valid operations are in files that should pass.

PR libstdc++/77987
* include/bits/unique_ptr.h (unique_ptr::reset(U)): Copy
value to pointer of the correct type to swap, to support conversions
allowed by LWG 2118 / N4089.
* testsuite/20_util/unique_ptr/assign/assign_neg.cc: Move test for
incompatible deleters from ...
* testsuite/20_util/unique_ptr/assign/cv_qual.cc: ... here.
* testsuite/20_util/unique_ptr/modifiers/cv_qual.cc: Move tests for
incompatible pointers to ...
* testsuite/20_util/unique_ptr/modifiers/reset_neg.cc: ... here. Move
destructor definition to base class. Test for invalid derived-to-base
conversion.

I'll also backport this to the branches.

commit 2e75149f402d242310ba53b8d31ef38df0d17ed1
Author: Jonathan Wakely 
Date:   Sat Oct 15 19:06:09 2016 +0100

PR77987 Fix unique_ptr::reset(U) for T != U

	PR libstdc++/77987
	* include/bits/unique_ptr.h (unique_ptr::reset(U)): Copy
	value to pointer of the correct type to swap, to support conversions
	allowed by LWG 2118 / N4089.
	* testsuite/20_util/unique_ptr/assign/assign_neg.cc: Move test for
	incompatible deleters from ...
	* testsuite/20_util/unique_ptr/assign/cv_qual.cc: ... here.
	* testsuite/20_util/unique_ptr/modifiers/cv_qual.cc: Move tests for
	incompatible pointers to ...
	* testsuite/20_util/unique_ptr/modifiers/reset_neg.cc: ... here. Move
	destructor definition to base class. Test for invalid derived-to-base
	conversion.

diff --git a/libstdc++-v3/include/bits/unique_ptr.h b/libstdc++-v3/include/bits/unique_ptr.h
index 15ef8a8..a36794d 100644
--- a/libstdc++-v3/include/bits/unique_ptr.h
+++ b/libstdc++-v3/include/bits/unique_ptr.h
@@ -610,10 +610,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   reset(_Up __p) noexcept
   {
+	pointer __ptr = __p;
 	using std::swap;
-	swap(std::get<0>(_M_t), __p);
-	if (__p != nullptr)
-	  get_deleter()(__p);
+	swap(std::get<0>(_M_t), __ptr);
+	if (__ptr != nullptr)
+	  get_deleter()(__ptr);
   }
 
   void reset(nullptr_t = nullptr) noexcept
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/assign/assign_neg.cc b/libstdc++-v3/testsuite/20_util/unique_ptr/assign/assign_neg.cc
index 842b6ed..9142e61 100644
--- a/libstdc++-v3/testsuite/20_util/unique_ptr/assign/assign_neg.cc
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/assign/assign_neg.cc
@@ -48,4 +48,25 @@ test03()
   std::unique_ptr p2 = p1; // { dg-error "deleted" }
 }
 
+struct base_pointer { operator base*() const { return nullptr; } };
+
+template
+struct deleter
+{
+  deleter() = default;
+  template
+deleter(const deleter) { }
+  typedef T pointer;
+  void operator()(T) const { }
+};
+
+void
+test04()
+{
+  // Disallow conversions from incompatible deleter
+  std::unique_ptr> p;
+  std::unique_ptr> upA;
+  upA = std::move(p);  // { dg-error "no match" }
+}
+
 // { dg-prune-output "include" }
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/assign/cv_qual.cc b/libstdc++-v3/testsuite/20_util/unique_ptr/assign/cv_qual.cc
index 793499d..7add5a9 100644
--- a/libstdc++-v3/testsuite/20_util/unique_ptr/assign/cv_qual.cc
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/assign/cv_qual.cc
@@ -65,26 +65,3 @@ test03()
   std::unique_ptr cvA;
   cvA = std::move(upA);
 }
-
-struct A_pointer { operator A*() const { return nullptr; } };
-
-template
-struct deleter
-{
-  deleter() = default;
-  template
-deleter(const deleter) { }
-  typedef T pointer;
-  void operator()(T) const { }
-};
-
-void
-test04()
-{
-  // Disallow conversions from incompatible deleter
-  std::unique_ptr> p;
-  std::unique_ptr> upA;
-  upA = std::move(p);  // { dg-error "no match" }
-  // { dg-error "no type" "" { target *-*-* } 537 }
-  // { dg-error "no matching function" "" { target *-*-* } 614 }
-}
diff --git a/libstdc++-v3/testsuite/20_util/unique_ptr/modifiers/cv_qual.cc b/libstdc++-v3/testsuite/20_util/unique_ptr/modifiers/cv_qual.cc
index 0284563..f92949b 100644
--- a/libstdc++-v3/testsuite/20_util/unique_ptr/modifiers/cv_qual.cc
+++ b/libstdc++-v3/testsuite/20_util/unique_ptr/modifiers/cv_qual.cc
@@ -86,15 +86,4 @@ test07()
   vA2.reset((A*)p);
   std::unique_ptr cvA2;
   cvA2.reset((A*)p);
-  // Disallow conversions from user-defined pointer-like types
-  // for the array version
-  std::unique_ptr upA3;
-  upA3.reset(p); // { dg-error "no matching function" }
-  std::unique_ptr cA3;
-  cA3.reset(p); // { dg-error "no matching function" }
-  std::unique_ptr vA3;
-  vA3.reset(p); // { dg-error "no matching function" }
-  std::uniqu

[PATCH] Implement P0084R2, Emplace return type, for C++17

2016-10-17 Thread Jonathan Wakely

In C++17 the emplace_front and emplace_back members return a
reference. There isn't a very neat way to implement this, so it's just
lots of conditional compilation.

This isn't an ABI break, because these member functions are templates
and so the return type is part of the mangled name. We can't apply it
retroactively to older standards though, because it breaks code that
is valid in C++14, such as:

 void f(std::vector& v) { return v.emplace_back(1); }


* doc/xml/manual/status_cxx2017.xml: Update status.
* doc/html/*: Regenerate.
* include/bits/deque.tcc (deque::emplace_front, deque::emplace_back):
Return a reference in C++17 mode.
* include/bits/forward_list.h (forward_list::emplace_front): Likewise.
* include/bits/stl_bvector.h (vector::emplace_back): Likewise.
* include/bits/stl_deque.h (deque::emplace_front, deque::emplace_back):
Likewise.
* include/bits/stl_list.h (list::emplace_front, list::emplace_back):
Likewise.
* include/bits/stl_queue.h (queue::emplace): Likewise.
* include/bits/stl_stack.h (stack::emplace): Likewise.
* include/bits/stl_vector.h (vector::emplace_back): Likewise.
* include/bits/vector.tcc (vector::emplace_back): Likewise.
* include/debug/deque (__gnu_debug::deque::emplace_front)
(__gnu_debug::deque::emplace_back): Likewise.
* include/debug/vector (__gnu_debug::vector::emplace_back): Likewise.
* testsuite/23_containers/deque/modifiers/emplace/cxx17_return.cc:
New.
* testsuite/23_containers/forward_list/modifiers/
emplace_cxx17_return.cc: New.
* testsuite/23_containers/list/modifiers/emplace/cxx17_return.cc: New.
* testsuite/23_containers/queue/members/emplace_cxx17_return.cc: New.
* testsuite/23_containers/stack/members/emplace_cxx17_return.cc: New.
* testsuite/23_containers/vector/bool/emplace_cxx17_return.cc: New.
* testsuite/23_containers/vector/modifiers/emplace/cxx17_return.cc:
New.

Tested powerpc64le-linux, committed to trunk.

commit 8e8e806cbdd4581331364153d69d39b5765f8343
Author: Jonathan Wakely 
Date:   Sat Oct 15 18:56:24 2016 +0100

Implement P0084R2, Emplace return type, for C++17

* doc/xml/manual/status_cxx2017.xml: Update status.
* doc/html/*: Regenerate.
* include/bits/deque.tcc (deque::emplace_front, deque::emplace_back):
Return a reference in C++17 mode.
* include/bits/forward_list.h (forward_list::emplace_front): Likewise.
* include/bits/stl_bvector.h (vector::emplace_back): Likewise.
* include/bits/stl_deque.h (deque::emplace_front, deque::emplace_back):
Likewise.
* include/bits/stl_list.h (list::emplace_front, list::emplace_back):
Likewise.
* include/bits/stl_queue.h (queue::emplace): Likewise.
* include/bits/stl_stack.h (stack::emplace): Likewise.
* include/bits/stl_vector.h (vector::emplace_back): Likewise.
* include/bits/vector.tcc (vector::emplace_back): Likewise.
* include/debug/deque (__gnu_debug::deque::emplace_front)
(__gnu_debug::deque::emplace_back): Likewise.
* include/debug/vector (__gnu_debug::vector::emplace_back): Likewise.
* testsuite/23_containers/deque/modifiers/emplace/cxx17_return.cc:
New.
* testsuite/23_containers/forward_list/modifiers/
emplace_cxx17_return.cc: New.
* testsuite/23_containers/list/modifiers/emplace/cxx17_return.cc: New.
* testsuite/23_containers/queue/members/emplace_cxx17_return.cc: New.
* testsuite/23_containers/stack/members/emplace_cxx17_return.cc: New.
* testsuite/23_containers/vector/bool/emplace_cxx17_return.cc: New.
* testsuite/23_containers/vector/modifiers/emplace/cxx17_return.cc:
New.

diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml 
b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
index ae8dfa9..a1190bc 100644
--- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
+++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml
@@ -542,14 +542,13 @@ Feature-testing recommendations for C++.
 
 
 
-  
Emplace return type 
   
http://www.w3.org/1999/xlink"; 
xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0084r2.pdf";>
P0084R2

   
-   No 
+   7 
   
 
 
diff --git a/libstdc++-v3/include/bits/deque.tcc 
b/libstdc++-v3/include/bits/deque.tcc
index 96ec9f8..796b0cd 100644
--- a/libstdc++-v3/include/bits/deque.tcc
+++ b/libstdc++-v3/include/bits/deque.tcc
@@ -129,7 +129,11 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #if __cplusplus >= 201103L
   template
 template
+#if __cplusplus > 201402L
+  typename deque<_Tp, _Alloc>::reference
+#else
   void
+#endif
   deque<_Tp, _Alloc>::
   emplace_front(_Args&&... __args)
   {
@@ -142,11 +146,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
  

Re: Compile-time improvement for if conversion.

2016-10-17 Thread Richard Biener
On Fri, Oct 14, 2016 at 2:07 PM, Yuri Rumyantsev  wrote:
> Richard,
>
> Here is updated patch with the changes proposed by you.
>
> Bootstrapping and regression testing did not show any new failures.
> Is it OK for trunk?

Ok with dropping the free_dominance_info_for_region variant without
struct function argument and making the one with take struct function *
as _first_ argument.

Thanks,
Richard.

> ChangeLog:
> 2016-10-14  Yuri Rumyantsev  
>
> * dominance.c (dom_info::dom_info): Add new constructor for region
> presented by vector of basic blocks.
> (dom_init): New method to initialize members common for both
> constructors.
> (dom_info::dom_info): Invoke dom_init for partial initialization.
> (dom_info::get_idom): Add check to corner cases on basic blocks which
> are not in region.
> (dom_info::calc_dfs_tree): Check M_FAKE_EXIT_EDGE instead of M_REVERSE
> to detect unreachable bbs.
> (dom_info::calc_idoms): Likewise.
> (compute_dom_fast_query_in_region): New function.
> (calculate_dominance_info_for_region): Likewise.
> (free_dominance_info_for_region): Add couple functions to free
> dominance info for region.
> * dominance.h: Add prototypes for introduced region-based functions
> tree-if-conv.c: (build_region): New function.
> (if_convertible_loop_p_1): Invoke local version of post-dominators
> calculation before basic block predication with subsequent freeing
> post-dominator info.
> (tree_if_conversion): Remove free of post-dominator info
> (pass_if_conversion::execute): Delete detection of infinite loops
> and fake edges to exit block since post-dominator calculation is
> performed per if-converted loop only.
>
> 2016-10-13 13:15 GMT+03:00 Richard Biener :
>> On Wed, Oct 12, 2016 at 3:37 PM, Yuri Rumyantsev  wrote:
>>> Richard,
>>>
>>> Here is updated patch . I avoided creation of new entry/exit blocks
>>> but instead add check to border cases - do not consider also blocks
>>> which are out of region.
>>>
>>> Any comments will be appreciated.
>>
>> I mostly like it now.  Can you split out all the common parts from the
>> dom_info constructor
>> to a helper method please and just compute m_n_basic_blocks and a max_index 
>> and
>> do all the memory allocation in that common function?
>>
>> @@ -276,9 +334,10 @@ dom_info::calc_dfs_tree_nonrec (basic_block bb)
>>   bn = e->src;
>>
>>   /* If the next node BN is either already visited or a border
>> -block the current edge is useless, and simply overwritten
>> -with the next edge out of the current node.  */
>> - if (bn == m_end_block || m_dfs_order[bn->index])
>> +block or out of region the current edge is useless, and 
>> simply
>> +overwritten with the next edge out of the current node.  */
>> + if (bn == m_end_block || bn->dom[d_i] == NULL
>>
>> clever idea ;)  Just thinking if this means we support single entry,
>> multiple exit
>> regions for CDI_DOMINATORs and multiple entry, single exit regions for
>> CDI_POST_DOMINATORs.  I think so.  Please update the function comments
>> accordingly.
>>
>>
>> +  m_dfsnum = 1;
>> +  m_nodes = 0;
>> +  m_fake_exit_edge = NULL; /* Assume that region is SCC.  */
>>
>> you mean SESE rather than SCC?
>>
>> @@ -511,7 +573,7 @@ dom_info::calc_idoms ()
>>: ei_start (bb->preds);
>>edge_iterator einext;
>>
>> -  if (m_reverse)
>> +  if (m_reverse && !in_region)
>> {
>>   /* If this block has a fake edge to exit, process that first.  */
>>   if (bitmap_bit_p (m_fake_exit_edge, bb->index))
>>
>> I think it's better to simply change the if (m_reverse) test to a if
>> (m_fake_exit_edge) test.
>>
>> I noticed you do not initialize n_bbs_in_dom_tree (just set it to
>> zero), it's not really used
>> anywhere (but in an assert).
>>
>> free_dominance_info_for_region needs a function level comment (per
>> coding guidelines).
>>
>> Please make free_dominance_info_for_region take a struct function * pointer.
>>
>>  @@ -1318,11 +1350,13 @@ if_convertible_loop_p_1 (struct loop *loop,
>> vec *refs)
>>  {
>>unsigned int i;
>>basic_block exit_bb = NULL;
>> +  vec region;
>>
>>if (find_data_references_in_loop (loop, refs) == chrec_dont_know)
>>  return false;
>>
>> -  calculate_dominance_info (CDI_DOMINATORS);
>> +  if (dom_info_state (CDI_DOMINATORS) != DOM_OK)
>> +calculate_dominance_info (CDI_DOMINATORS);
>>
>> This is a premature (unwanted) change.
>>
>> Other than that the only O(function-size) part left is the zeroing in
>>
>> +  /* Determine max basic block index in region.  */
>> +  int max_index = region[0]->index;
>> +  for (size_t i = 1; i < num; i++)
>> +if (region[i]->index > max_index)
>> +  max_index = region[i]->index;
>> +  max_index += 1;
>> +  m_dfs_order = new_zero_array  (max_index + 1);
>> +  m_dfs_last = &m_dfs_order[max_index];
>>
>> I suppose we can try addressing this as followup.
>>
>> Thanks a lot

Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing

2016-10-17 Thread Thomas Schwinge
Hi!

On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener  
wrote:
> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
>  wrote:
> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
> >  wrote:
> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  wrote:
> >> > On 10/14/16 05:28, Richard Biener wrote:
> >> >
> >> >> The BB_VISITED flag has indetermined state at the beginning of a pass.
> >> >> You have to ensure it is cleared yourself.
> >> >
> >> >
> >> > In that case the openacc (&nvptx?) passes should be modified to clear the
> >> > flags at their start, rather than at their end.
> >
> > The gcc/config/nvptx/nvptx.c handling seems fine -- it explicitly clears
> > BB_VISITED for all basic block it works on.
> >
> >> Yes.  But as I said, I ran into IRA ICEs (somewhere in the testsuite) when 
> >> not
> >> cleaning up after tree-ssa-propagate.c.  So somebody has to fix IRA first.
> >
> > Is there a GCC PR for that, or where are you tracking such issues?
> 
> No, just tracking in my head.

Tsk, tsk...  ;-)


> > OK to commit the following?  Is such a test case appropriate (which would
> > have caught this issue right away), in particular the dg-final
> > scan-tree-dump line?
> 
> Ugh.  Not worse to what we do in various dwarf scanning I guess.

;-|

> Doesn't failure lead to a miscompile eventually?  So you could formulate
> this as a dg-do run test with a check for the desired outcome?

No, unfortunately.  In this case the error is "benign" such that the
OpenACC loop processing machinery will decide to not parallelize loops
that ought to be parallelized.  This won't generally cause any problem
(apart from performance regression, obviously); it just caused problems
in a few libgomp test cases that actually at run time test for
parallelized execution -- which will/did trigger only with nvptx
offloading enabled, which not too many people are testing.  The test case
I propose below will trigger also for non-offloading configurations.

> > Clear basic block flags before using BB_VISITED for OpenACC loops 
> > processing
> >
> > gcc/
> > * omp-low.c (oacc_loop_discovery): Call clear_bb_flags.
> >
> > gcc/testsuite/
> > * gcc.dg/goacc/loop-processing-1.c: New file.
> > ---
> >  gcc/omp-low.c  |  9 +
> >  gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > --- gcc/omp-low.c
> > +++ gcc/omp-low.c
> > @@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
> >  static oacc_loop *
> >  oacc_loop_discovery ()
> >  {
> > -  basic_block bb;
> > +  /* Clear basic block flags, in particular BB_VISITED which we're going 
> > to use
> > + in the following.  */
> > +  clear_bb_flags ();
> >
> >oacc_loop *top = new_oacc_loop_outer (current_function_decl);
> >oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > @@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
> >   that diagnostics come out in an unsurprising order.  */
> >top = oacc_loop_sibling_nreverse (top);
> >
> > -  /* Reset the visited flags.  */
> > -  FOR_ALL_BB_FN (bb, cfun)
> > -bb->flags &= ~BB_VISITED;
> > +  /* Clear basic block flags again, as otherwise IRA will explode later 
> > on.  */
> > +  clear_bb_flags ();
> >
> >return top;
> >  }
> > --- /dev/null
> > +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
> > @@ -0,0 +1,18 @@
> > +/* Make sure that OpenACC loop processing happens.  */
> > +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
> > +
> > +extern int place ();
> > +
> > +void vector_1 (int *ary, int size)
> > +{
> > +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
> > firstprivate (size)
> > +  {
> > +#pragma acc loop gang
> > +for (int jx = 0; jx < 1; jx++)
> > +#pragma acc loop auto
> > +  for (int ix = 0; ix < size; ix++)
> > +   ary[ix] = place ();
> > +  }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "OpenACC loops.*Loop 0\\\(0\\\).*Loop 
> > 14\\\(1\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 
> > 20\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 
> > 1, 20\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, 
> > \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = 
> > UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 
> > 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, 
> > \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Loop 
> > 6\\\(4\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 
> > 6\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 
> > 1, 6\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, 
> > \\\.data_dep\\\.\[0-9_\]+, 2\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = 
> > UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 
> > 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, 
> > \\\.data_dep\\\.\[0-9_\]+, 2\\\);" "oaccdevlow" } } */


Grüße

Re: [RFC] Speed-up -fprofile-update=atomic

2016-10-17 Thread Martin Liška
On 10/13/2016 11:43 AM, Richard Biener wrote:
> On Wed, Oct 12, 2016 at 3:52 PM, Martin Liška  wrote:
>> On 10/04/2016 11:45 AM, Richard Biener wrote:
>>> On Thu, Sep 15, 2016 at 12:00 PM, Martin Liška  wrote:
 On 09/07/2016 02:09 PM, Richard Biener wrote:
> On Wed, Sep 7, 2016 at 1:37 PM, Martin Liška  wrote:
>> On 08/18/2016 06:06 PM, Richard Biener wrote:
>>> On August 18, 2016 5:54:49 PM GMT+02:00, Jakub Jelinek 
>>>  wrote:
 On Thu, Aug 18, 2016 at 08:51:31AM -0700, Andi Kleen wrote:
>> I'd prefer to make updates atomic in multi-threaded applications.
>> The best proxy we have for that is -pthread.
>>
>> Is it slower, most definitely, but odds are we're giving folks
>> garbage data otherwise, which in many ways is even worse.
>
> It will likely be catastrophically slower in some cases.
>
> Catastrophically as in too slow to be usable.
>
> An atomic instruction is a lot more expensive than a single
 increment. Also
> they sometimes are really slow depending on the state of the machine.

 Can't we just have thread-local copies of all the counters (perhaps
 using
 __thread pointer as base) and just atomically merge at thread
 termination?
>>>
>>> I suggested that as well but of course it'll have its own class of 
>>> issues (short lived threads, so we need to somehow re-use counters from 
>>> terminated threads, large number of threads and thus using too much 
>>> memory for the counters)
>>>
>>> Richard.
>>
>> Hello.
>>
>> I've got written the approach on my TODO list, let's see whether it 
>> would be doable in a reasonable amount of time.
>>
>> I've just finished some measurements to illustrate slow-down of 
>> -fprofile-update=atomic approach.
>> All numbers are: no profile, -fprofile-generate, -fprofile-generate 
>> -fprofile-update=atomic
>> c-ray benchmark (utilizing 8 threads, -O3): 1.7, 15.5., 38.1s
>> unrar (utilizing 8 threads, -O3): 3.6, 11.6, 38s
>> tramp3d (1 thread, -O3): 18.0, 46.6, 168s
>>
>> So the slow-down is roughly 300% compared to -fprofile-generate. I'm not 
>> having much experience with default option
>> selection, but these numbers can probably help.
>>
>> Thoughts?
>
> Look at the generated code for an instrumented simple loop and see that 
> for
> the non-atomic updates we happily apply store-motion to the counter update
> and thus we only get one counter update per loop exit rather than one per
> loop iteration.  Now see what happens for the atomic case (I suspect you
> get one per iteration).
>
> I'll bet this accounts for most of the slowdown.
>
> Back in time ICC which had atomic counter updates (but using function
> calls - ugh!) had a > 1000% overhead with FDO for tramp3d (they also
> didn't have early inlining -- removing abstraction helps reducing the 
> number
> of counters significantly).
>
> Richard.

 Hi.

 During Cauldron I discussed with Richi approaches how to speed-up ARCS
 profile counter updates. My first attempt is to utilize TLS storage, where
 every function is accumulating arcs counters. These are eventually added
 (using atomic operations) to the global one at the very end of a function.
 Currently I rely on target support of TLS, which is questionable whether
 to have such a requirement for -fprofile-update=atomic, or to add a new 
 option value
 like -fprofile-update=atomic-tls?

 Running the patch on tramp3d, compared to previous numbers, it takes 88s 
 to finish.
 Time shrinks to 50%, compared to the current implementation.

 Thoughts?
>>>
>>> Hmm, I thought I suggested that you can simply use automatic storage
>>> (which effectively
>>> is TLS...) for regions that are not forked or abnormally left (which
>>> means SESE regions
>>> that have no calls that eventually terminate or throw externally).
>>>
>>> So why did you end up with TLS?
>>
>> Hi.
>>
>> Usage for TLS does not makes sense, stupid mistake ;)
>>
>> By using SESE regions, do you mean the infrastructure that is utilized
>> by Graphite machinery?
> 
> No, just as "single-entry single-exit region" which means placing of
> initializations of the internal counters to zero and the updates of the
> actual counters is "obvious".
> 
> Note that this "optimization" isn't one if the SESE region does not contain
> cycle(s).  Unless there is a way to do an atomic update of a bunch of
> counters faster than doing them separately.  This optimization will also
> increase register pressure (or force the internal counters to the stack).
> Thus selecting which counters to "optimize" and which ones to leave in place
> might be necessary.

Ok, I must admit the selection which counters to optimize is cruc

Re: [PATCH] read-md.c: Move various state to within class rtx_reader

2016-10-17 Thread Richard Sandiford
David Malcolm  writes:
> @@ -2388,8 +2389,10 @@ gen_mnemonic_setattr (htab_t mnemonic_htab, rtx insn)
>  static int
>  mnemonic_htab_callback (void **slot, void *info ATTRIBUTE_UNUSED)
>  {
> -  obstack_grow (&string_obstack, (char*)*slot, strlen ((char*)*slot));
> -  obstack_1grow (&string_obstack, ',');
> +  struct obstack *string_obstack = rtx_reader_ptr->get_string_obstack ();
> +
> +  obstack_grow (string_obstack, (char*)*slot, strlen ((char*)*slot));
> +  obstack_1grow (string_obstack, ',');

The old code was missing a space after the cast.  Might be good to
add it while you're there.

> @@ -2470,8 +2474,8 @@ gen_mnemonic_attr (void)
>htab_traverse (mnemonic_htab, mnemonic_htab_callback, NULL);
>  
>/* Replace the last ',' with the zero end character.  */
> -  *((char *)obstack_next_free (&string_obstack) - 1) = '\0';
> -  XSTR (mnemonic_attr, 1) = XOBFINISH (&string_obstack, char *);
> +  *((char *)obstack_next_free (string_obstack) - 1) = '\0';
> +  XSTR (mnemonic_attr, 1) = XOBFINISH (string_obstack, char *);

Same here.

> @@ -129,9 +105,9 @@ get_md_ptr_loc (const void *ptr)
>  void
>  copy_md_ptr_loc (const void *new_ptr, const void *old_ptr)
>  {
> -  const struct ptr_loc *loc = get_md_ptr_loc (old_ptr);
> +  const struct ptr_loc *loc = rtx_reader_ptr->get_md_ptr_loc (old_ptr);
>if (loc != 0)
> -set_md_ptr_loc (new_ptr, loc->filename, loc->lineno);
> +rtx_reader_ptr->set_md_ptr_loc (new_ptr, loc->filename, loc->lineno);
>  }

I suppose it's bikeshedding, but since you need access to an rtx_reader
to get and set the locations, it feels like the rtx_reader should be
a parameter here too.

> @@ -140,7 +116,7 @@ copy_md_ptr_loc (const void *new_ptr, const void *old_ptr)
>  void
>  fprint_md_ptr_loc (FILE *outf, const void *ptr)
>  {
> -  const struct ptr_loc *loc = get_md_ptr_loc (ptr);
> +  const struct ptr_loc *loc = rtx_reader_ptr->get_md_ptr_loc (ptr);
>if (loc != 0)
>  fprintf (outf, "#line %d \"%s\"\n", loc->lineno, loc->filename);
>  }

Same here.

> +/* As rtx_reader::join_c_conditions above, but call it on the singleton
> +   rtx_reader instance.  */
> +
> +const char *
> +join_c_conditions (const char *cond1, const char *cond2)
> +{
> +  return rtx_reader_ptr->join_c_conditions (cond1, cond2);
> +}
> +

I think it'd be better to have the callers use the rtx_reader_ptr
explicitly.

In general I think it'd be better if the read-md.h interface itself
didn't rely on there being a singleton, other than providing the
rtx_reader_ptr variable.  It would then be easy to allow non-singleton
uses in future, without having to update gen* calls a second time.

> @@ -204,6 +189,15 @@ fprint_c_condition (FILE *outf, const char *cond)
>  }
>  }
>  
> +/* As rtx_reader::fprint_c_condition above, but call it on the singleton
> +   rtx_reader instance.  */
> +
> +void
> +fprint_c_condition (FILE *outf, const char *cond)
> +{
> +  rtx_reader_ptr->fprint_c_condition (outf, cond);
> +}
> +
>  /* Special fprint_c_condition for writing to STDOUT.  */
>  
>  void

Another instance of the above.

> @@ -808,7 +802,7 @@ handle_constants (void)
>  void
>  traverse_md_constants (htab_trav callback, void *info)
>  {
> -  htab_traverse (md_constants, callback, info);
> +  htab_traverse (rtx_reader_ptr->get_md_constants (), callback, info);
>  }

Here too.

I assume you didn't turn the functions above into member functions
because they aren't primitives.  OTOH it might seem inconsistent
to have things like traverse_enum_types as member functions but
traverse_md_constants not.

Thanks,
Richard


  1   2   >