Re: [PATCH] Fix return type detection in visit()

2017-02-14 Thread Tim Shen via gcc-patches
On Tue, Feb 14, 2017 at 2:49 PM, Jonathan Wakely  wrote:
> On 14/02/17 13:59 -0800, Tim Shen via libstdc++ wrote:
>>
>> This is an obvious missing std::forward. :)
>
>
> I was about to look into it, I assumed it would be something simple!
>
>> diff --git a/libstdc++-v3/testsuite/20_util/variant/compile.cc
>> b/libstdc++-v3/testsuite/20_util/variant/compile.cc
>> index 65f4326c397..d40a4ccb784 100644
>> --- a/libstdc++-v3/testsuite/20_util/variant/compile.cc
>> +++ b/libstdc++-v3/testsuite/20_util/variant/compile.cc
>> @@ -291,6 +291,13 @@ void test_visit()
>> };
>> static_assert(visit(Visitor(), variant(0)), "");
>>   }
>> +  // PR libstdc++/79513
>> +  {
>> +std::variant v(5);
>> +std::visit([](int&){}, v);
>> +std::visit([](int&&){}, std::move(v));
>> +(void)v;
>
>
> Is this to suppress an unused variable warning?
>
> If it is, please use an attribute instead, as it's more reliable:
>
>std::variant v __attribute__((unused)) (5);

Even better, I used the shiny new [[gnu::unused]]. :)

>
> OK for trunk if testing passes, thanks.
>

Tested and committed.
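
For reference, the two attribute spellings discussed above behave the same under GCC; a minimal sketch (variable names and values are illustrative only, and C++17 also offers the portable [[maybe_unused]]):

```cpp
#include <cassert>
#include <variant>

// Both spellings suppress -Wunused-variable without a (void)v cast.
// [[gnu::unused]] is the C++11 attribute syntax for the GNU attribute;
// placed at the start of the declaration it appertains to the variable.
int demo()
{
  [[gnu::unused]] std::variant<int> v(5);          // C++11 attribute syntax
  __attribute__((unused)) std::variant<int> w(6);  // traditional GNU spelling
  return 0;
}
```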


-- 
Regards,
Tim Shen


Re: [v3 PATCH] Implement C++17 GB50 resolution

2017-02-14 Thread Ville Voutilainen
On 14 February 2017 at 23:22, Dinka Ranns  wrote:
> C++17 GB50 resolution
> * libstdc++-v3/include/std/chrono:

Pardon me for not noticing this while looking at the earlier versions
of this patch, but these should not include the libstdc++-v3 prefix,
so it should be

* include/std/chrono:

> * libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc:

And this should be

* testsuite/20_util/duration/arithmetic/constexpr.cc:

> * libstdc++-v3/testsuite/20_util/time_point/arithmetic/constexpr.cc: 
> new

and likewise here,

* testsuite/20_util/time_point/arithmetic/constexpr.cc: new

That's minor and can be fixed by a maintainer committing the patch,
but for future reference.


Backports to 6.x

2017-02-14 Thread Jakub Jelinek
Hi!

I've bootstrapped/regtested the following patches on x86_64-linux and
i686-linux on gcc-6-branch and committed them to 6.x.

Jakub
2017-02-15  Jakub Jelinek  

Backported from mainline
2017-01-17  Kito Cheng  
Kuan-Lin Chen  

PR target/79079
* internal-fn.c (expand_mul_overflow): Use convert_modes instead of
gen_lowpart.

--- gcc/internal-fn.c   (revision 244538)
+++ gcc/internal-fn.c   (revision 244539)
@@ -1483,8 +1483,8 @@ expand_mul_overflow (location_t loc, tre
  res = expand_expr_real_2 (, NULL_RTX, wmode, EXPAND_NORMAL);
  rtx hipart = expand_shift (RSHIFT_EXPR, wmode, res, prec,
 NULL_RTX, uns);
- hipart = gen_lowpart (mode, hipart);
- res = gen_lowpart (mode, res);
+ hipart = convert_modes (mode, wmode, hipart, uns);
+ res = convert_modes (mode, wmode, res, uns);
  if (uns)
/* For the unsigned multiplication, there was overflow if
   HIPART is non-zero.  */
@@ -1517,16 +1517,16 @@ expand_mul_overflow (location_t loc, tre
  unsigned int hprec = GET_MODE_PRECISION (hmode);
  rtx hipart0 = expand_shift (RSHIFT_EXPR, mode, op0, hprec,
  NULL_RTX, uns);
- hipart0 = gen_lowpart (hmode, hipart0);
- rtx lopart0 = gen_lowpart (hmode, op0);
+ hipart0 = convert_modes (hmode, mode, hipart0, uns);
+ rtx lopart0 = convert_modes (hmode, mode, op0, uns);
  rtx signbit0 = const0_rtx;
  if (!uns)
signbit0 = expand_shift (RSHIFT_EXPR, hmode, lopart0, hprec - 1,
 NULL_RTX, 0);
  rtx hipart1 = expand_shift (RSHIFT_EXPR, mode, op1, hprec,
  NULL_RTX, uns);
- hipart1 = gen_lowpart (hmode, hipart1);
- rtx lopart1 = gen_lowpart (hmode, op1);
+ hipart1 = convert_modes (hmode, mode, hipart1, uns);
+ rtx lopart1 = convert_modes (hmode, mode, op1, uns);
  rtx signbit1 = const0_rtx;
  if (!uns)
signbit1 = expand_shift (RSHIFT_EXPR, hmode, lopart1, hprec - 1,
@@ -1717,11 +1717,12 @@ expand_mul_overflow (location_t loc, tre
 if (loxhi >> (bitsize / 2) == 0 (if uns).  */
  rtx hipartloxhi = expand_shift (RSHIFT_EXPR, mode, loxhi, hprec,
  NULL_RTX, 0);
- hipartloxhi = gen_lowpart (hmode, hipartloxhi);
+ hipartloxhi = convert_modes (hmode, mode, hipartloxhi, 0);
  rtx signbitloxhi = const0_rtx;
  if (!uns)
signbitloxhi = expand_shift (RSHIFT_EXPR, hmode,
-gen_lowpart (hmode, loxhi),
+convert_modes (hmode, mode,
+   loxhi, 0),
 hprec - 1, NULL_RTX, 0);
 
  do_compare_rtx_and_jump (signbitloxhi, hipartloxhi, NE, true, hmode,
@@ -1731,7 +1732,8 @@ expand_mul_overflow (location_t loc, tre
  /* res = (loxhi << (bitsize / 2)) | (hmode) lo0xlo1;  */
  rtx loxhishifted = expand_shift (LSHIFT_EXPR, mode, loxhi, hprec,
   NULL_RTX, 1);
- tem = convert_modes (mode, hmode, gen_lowpart (hmode, lo0xlo1), 1);
+ tem = convert_modes (mode, hmode,
+  convert_modes (hmode, mode, lo0xlo1, 1), 1);
 
  tem = expand_simple_binop (mode, IOR, loxhishifted, tem, res,
 1, OPTAB_DIRECT);
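The code being patched expands `__builtin_mul_overflow`-style checks: multiply in a wider mode, then inspect the high half. A hypothetical C-level sketch of the unsigned case (names are illustrative, not GCC internals):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the expansion strategy in expand_mul_overflow for an
// unsigned 32x32 multiply: compute in the wider 64-bit "mode", shift
// out the high part (the value the patch now narrows with
// convert_modes rather than gen_lowpart), and flag overflow when it
// is non-zero.
static bool mul_overflow_u32(uint32_t a, uint32_t b, uint32_t *res)
{
  uint64_t wide = (uint64_t)a * b;          // multiply in the wide mode
  uint32_t hipart = (uint32_t)(wide >> 32); // narrow the shifted high part
  *res = (uint32_t)wide;                    // low part is the result
  return hipart != 0;                       // overflow iff HIPART non-zero
}
```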
2017-02-15  Jakub Jelinek  

Backported from mainline
2017-01-31  Jakub Jelinek  

PR tree-optimization/79267
* value-prof.c (gimple_ic): Only drop lhs for noreturn calls
if should_remove_lhs_p is true.

* g++.dg/opt/pr79267.C: New test.

--- gcc/value-prof.c(revision 245052)
+++ gcc/value-prof.c(revision 245053)
@@ -1376,7 +1376,13 @@ gimple_ic (gcall *icall_stmt, struct cgr
   gimple_call_set_fndecl (dcall_stmt, direct_call->decl);
   dflags = flags_from_decl_or_type (direct_call->decl);
   if ((dflags & ECF_NORETURN) != 0)
-gimple_call_set_lhs (dcall_stmt, NULL_TREE);
+{
+  tree lhs = gimple_call_lhs (dcall_stmt);
+  if (lhs
+  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (lhs))) == INTEGER_CST
+  && !TREE_ADDRESSABLE (TREE_TYPE (lhs)))
+   gimple_call_set_lhs (dcall_stmt, NULL_TREE);
+}
   gsi_insert_before (, dcall_stmt, GSI_SAME_STMT);
 
   /* Fix CFG. */
--- gcc/testsuite/g++.dg/opt/pr79267.C  (nonexistent)
+++ gcc/testsuite/g++.dg/opt/pr79267.C  (revision 245053)
@@ -0,0 +1,69 @@
+// PR tree-optimization/79267
+// { dg-do compile }
+// { dg-options "-O3" }
+
+struct A { A (int); };
+struct B
+{
+  virtual void av () 

Re: [RFC PATCH] Improve switchconv optimization (PR tree-optimization/79472)

2017-02-14 Thread Jakub Jelinek
On Wed, Feb 15, 2017 at 08:06:16AM +0100, Richard Biener wrote:
> On February 14, 2017 9:04:45 PM GMT+01:00, Jakub Jelinek  
> wrote:
> >Hi!
> >
> >The following patch is an attempt to fix a regression where we no longer
> >switch convert one switch because earlier optimizations turn it into an
> >unsupported shape.
> 
> Is that because of early threading?

Yes.

> >and expects to be optimized into return 3 by vrp1.  As switchconv is
> >earlier than that, vrp1 sees:
> >  _1 = a_3(D) & 1;
> >  _4 = (unsigned int) _1;
> >  _5 = CSWTCH.1[_4];
> >  return _5;
> >and doesn't optimize it.  If the testcase had say case 7: replaced with
> >default:, it wouldn't pass already before.
> 
> That looks odd...

Just a pass ordering issue.

> >  If the patch is ok, what should we do with vrp40.c?  Change it in some
> >way (e.g. return variable in one case) so that switchconv doesn't
> >trigger, or add an optimization in vrp if we see a load from constant
> >array with known initializer and the range is small enough and contains
> >the same value for all those values, replace it with the value?
> 
> Possibly, but for GCC 8.

To both this switchconv patch and the potential improvement for loading
from const arrays (can create an enhancement PR for that), or just the
latter?

> can we teach EVRP about this?  It runs before switch conversion.

I guess so.  It is a matter of calling simplify_switch_using_ranges
and then doing some cleanup (you wrote that optimization)
- to_update_switch_stmts handling.
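For context, the transformation under discussion rewrites a dense switch into a constant-table load; the vrp40.c case quoted above becomes, roughly (CSWTCH is the compiler-generated table name; this is a sketch, not the actual emitted GIMPLE):

```cpp
#include <cassert>

// Rough C-level picture of what switch conversion emits for the
// vrp40.c testcase: every reachable index of (a & 1) maps to 3, so a
// pass that understood constant-table loads could fold f to "return 3".
static const int CSWTCH[2] = {3, 3};

static int f(int a)
{
  unsigned idx = (unsigned)(a & 1); // _4 = (unsigned int) _1
  return CSWTCH[idx];               // _5 = CSWTCH.1[_4]
}
```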

Jakub


Re: [RFC PATCH] Improve switchconv optimization (PR tree-optimization/79472)

2017-02-14 Thread Richard Biener
On February 14, 2017 9:04:45 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>The following patch is an attempt to fix a regression where we no longer
>switch convert one switch because earlier optimizations turn it into an
>unsupported shape.

Is that because of early threading?

>The patch contains two important changes (that can perhaps be split off
>separately):
>1) handle virtual PHIs; because we require all the switch bbs to be
>   empty, all the edges from the switch to final_bb should have the same
>   virtual op SSA_NAME, but if the final_bb is reachable through other
>   edges from other code, it might have a virtual PHI and switchconv
>   would unnecessarily give up
>2) if the switch cases form a contiguous range, there is no need to
>   require anything about the default case; it can be abort, or arbitrary
>   code that can or might not fall through into final_bb, or it can e.g.
>   be an empty bb where just the values from the default bb might not be
>   appropriate constants; we emit an if (val is in range) vals =
>   CSWTCH[...]; else anyway and the else can be anything; we still have
>   to require a standard default case if the range is non-contiguous,
>   because then the default is used also for some values in the tables.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux.  It causes a single
>regression, vrp40.c, which looks like this:
>int f(int a) {
>  switch (a & 1) {
>  case 0: case 1: return 3;
>  case 2: return 5;
>  case 3: return 7;
>  case 4: return 11;
>  case 5: return 13;
>  case 6: return 17;
>  case 7: return 19;
>  }
>}
>and expects to be optimized into return 3 by vrp1.  As switchconv is
>earlier than that, vrp1 sees:
>  _1 = a_3(D) & 1;
>  _4 = (unsigned int) _1;
>  _5 = CSWTCH.1[_4];
>  return _5;
>and doesn't optimize it.  If the testcase had say case 7: replaced with
>default:, it wouldn't pass already before.

That looks odd...

>  If the patch is ok, what should we do with vrp40.c?  Change it in some
>way (e.g. return variable in one case) so that switchconv doesn't trigger,
>or add an optimization in vrp if we see a load from constant array with
>known initializer and the range is small enough and contains the same
>value for all those values, replace it with the value?

Possibly, but for GCC 8.

can we teach EVRP about this?  It runs before switch conversion.

Richard.


>It would help also with say:
>const int a[] = { 7, 8, 9, 1, 1, 1, 1, 2, 3, 4, 5, 6 };
>int foo (int x)
>{
>  if (x <= 2 || x >= 7) return 3;
>  return a[x];
>}
>turn it into
>int foo (int x)
>{
>  if (x <= 2 || x >= 7) return 3;
>  return 1;
>}
>
>2017-02-14  Jakub Jelinek  
>
>   PR tree-optimization/79472
>   * tree-switch-conversion.c (struct switch_conv_info): Add
>   contiguous_range and default_case_nonstandard fields.
>   (collect_switch_conv_info): Compute contiguous_range and
>   default_case_nonstandard fields, don't clear final_bb if
>   contiguous_range and only the default case doesn't have the required
>   structure.
>   (check_all_empty_except_final): Set default_case_nonstandard instead
>   of failing if contiguous_range and the default case doesn't have empty
>   block.
>   (check_final_bb): Add SWTCH argument, don't fail if contiguous_range
>   and only the default case doesn't have the required constants.  Skip
>   virtual phis.
>   (gather_default_values): Skip virtual phis.  Allow non-NULL CASE_LOW
>   if default_case_nonstandard.
>   (build_constructors): Build constant 1 just once.  Assert that default
>   values aren't inserted in between cases if contiguous_range.  Skip
>   virtual phis.
>   (build_arrays): Skip virtual phis.
>   (prune_bbs): Add DEFAULT_BB argument, don't remove that bb.
>   (fix_phi_nodes): Don't add e2f phi arg if default_case_nonstandard.
>   Handle virtual phis.
>   (gen_inbound_check): Handle default_case_nonstandard case.
>   (process_switch): Adjust check_final_bb caller.  Call
>   gather_default_values with the first non-default case instead of
>   default case if default_case_nonstandard.
>
>   * gcc.dg/tree-ssa/cswtch-3.c: New test.
>   * gcc.dg/tree-ssa/cswtch-4.c: New test.
>   * gcc.dg/tree-ssa/cswtch-5.c: New test.
>
>--- gcc/tree-switch-conversion.c.jj2017-02-14 14:54:08.020975500 +0100
>+++ gcc/tree-switch-conversion.c   2017-02-14 17:09:01.162826954 +0100
>@@ -592,6 +592,14 @@ struct switch_conv_info
>  dump file, if there is one.  */
>   const char *reason;
> 
>+  /* True if default case is not used for any value between range_min
>and
>+ range_max inclusive.  */
>+  bool contiguous_range;
>+
>+  /* True if default case does not have the required shape for other
>case
>+ labels.  */
>+  bool default_case_nonstandard;
>+
>   /* Parameters for expand_switch_using_bit_tests.  Should be computed
>  the same way as in expand_case.  */
>   unsigned int uniq;
>@@ -606,8 +614,9 @@ collect_switch_conv_info 

[PATCH] Fix DFP conversion from INTEGER_CST to REAL_CST (PR target/79487)

2017-02-14 Thread Jakub Jelinek
Hi!

As the following testcase shows, we store decimal REAL_CSTs always in
_Decimal128 internal form and perform all the arithmetics on that, but while
for arithmetics we then ensure rounding to the actual type (_Decimal{32,64}
or for _Decimal128 no further rounding), e.g. const_binop calls
  inexact = real_arithmetic (, code, , );
  real_convert (, mode, );
when converting integers to _Decimal{32,64} we do nothing like that.
We do that only for non-decimal conversions from INTEGER_CSTs to REAL_CSTs.

The following patch fixes that.  Bootstrapped/regtested on x86_64-linux
(i686-linux fails to bootstrap for other reason), and on 6.x branch on
x86_64-linux and i686-linux.  Dominik has kindly tested it on s390x (where
the bug has been originally reported on the float-cast-overflow-10.c test).
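To make the rounding concrete: _Decimal32 carries 7 significant decimal digits, so converting INT64_MIN must round. A sketch emulating that with plain doubles (an approximation for illustration, not GCC's internal decimal representation):

```cpp
#include <cassert>
#include <cmath>

// Round X to DIGITS significant decimal digits, emulating the rounding
// real_from_integer must now perform via real_convert when the target
// format is _Decimal32 (7 digits) or _Decimal64 (16 digits).
static double round_to_sig_digits(double x, int digits)
{
  if (x == 0.0)
    return 0.0;
  int exp10 = (int)std::floor(std::log10(std::fabs(x)));
  double scale = std::pow(10.0, digits - 1 - exp10);
  return std::round(x * scale) / scale;
}
```

With digits = 7, -9223372036854775808 rounds to roughly -9.223372E+18, which is what the testcase below relies on when it checks b - a == 0.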

Ok for trunk?

2017-02-15  Jakub Jelinek  

PR target/79487
* real.c (real_from_integer): Call real_convert even for decimal.

* gcc.dg/dfp/pr79487.c: New test.
* c-c++-common/ubsan/float-cast-overflow-8.c (TEST): Revert
2017-02-13 change.

--- gcc/real.c.jj   2017-01-01 12:45:37.0 +0100
+++ gcc/real.c  2017-02-14 21:35:35.868906203 +0100
@@ -2266,7 +2266,7 @@ real_from_integer (REAL_VALUE_TYPE *r, f
 
   if (fmt.decimal_p ())
 decimal_from_integer (r);
-  else if (fmt)
+  if (fmt)
 real_convert (r, fmt, r);
 }
 
--- gcc/testsuite/gcc.dg/dfp/pr79487.c.jj   2017-02-14 22:42:33.137938789 
+0100
+++ gcc/testsuite/gcc.dg/dfp/pr79487.c  2017-02-14 22:42:22.0 +0100
@@ -0,0 +1,16 @@
+/* PR target/79487 */
+/* { dg-options "-O2" } */
+
+int
+main ()
+{
+  _Decimal32 a = (-9223372036854775807LL - 1LL); 
+  _Decimal32 b = -9.223372E+18DF;
+  if (b - a != 0.0DF)
+__builtin_abort ();
+  _Decimal64 c = (-9223372036854775807LL - 1LL); 
+  _Decimal64 d = -9.223372036854776E+18DD;
+  if (d - c != 0.0DD)
+__builtin_abort ();
+  return 0;
+}
--- gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-8.c.jj 2017-02-14 
00:08:33.0 +0100
+++ gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-8.c2017-02-15 
07:46:46.780778627 +0100
@@ -8,7 +8,7 @@
 #define TEST(type1, type2) \
   if (type1##_MIN) \
 {  \
-  volatile type2 min = type1##_MIN;\
+  type2 min = type1##_MIN; \
   type2 add = -1.0;\
   while (1)\
{   \
@@ -28,7 +28,7 @@
   volatile type1 tem3 = cvt_##type1##_##type2 (-1.0f); \
 }  \
   {\
-volatile type2 max = type1##_MAX;  \
+type2 max = type1##_MAX;   \
 type2 add = 1.0;   \
 while (1)  \
   {\

Jakub


Re: [PATCH] Fix return type detection in visit()

2017-02-14 Thread Jonathan Wakely

On 14/02/17 13:59 -0800, Tim Shen via libstdc++ wrote:

> This is an obvious missing std::forward. :)

I was about to look into it, I assumed it would be something simple!

> diff --git a/libstdc++-v3/testsuite/20_util/variant/compile.cc
> b/libstdc++-v3/testsuite/20_util/variant/compile.cc
> index 65f4326c397..d40a4ccb784 100644
> --- a/libstdc++-v3/testsuite/20_util/variant/compile.cc
> +++ b/libstdc++-v3/testsuite/20_util/variant/compile.cc
> @@ -291,6 +291,13 @@ void test_visit()
> };
> static_assert(visit(Visitor(), variant(0)), "");
>   }
> +  // PR libstdc++/79513
> +  {
> +std::variant v(5);
> +std::visit([](int&){}, v);
> +std::visit([](int&&){}, std::move(v));
> +(void)v;

Is this to suppress an unused variable warning?

If it is, please use an attribute instead, as it's more reliable:

   std::variant v __attribute__((unused)) (5);

OK for trunk if testing passes, thanks.



patch to fix PR79282

2017-02-14 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79282

  The patch was bootstrapped and tested on x86-64 and tested on ARM.

  Committed as rev. 245459


Index: ChangeLog
===
--- ChangeLog	(revision 245458)
+++ ChangeLog	(working copy)
@@ -1,3 +1,19 @@
+2017-02-14  Vladimir Makarov  
+
+	PR target/79282
+	* lra-int.h (struct lra_operand_data, struct lra_insn_reg): Add
+	member early_clobber_alts.
+	* lra-lives.c (reg_early_clobber_p): New.
+	(process_bb_lives): Use it.
+	* lra.c (new_insn_reg): New arg early_clobber_alts.  Use it.
+	(debug_operand_data): Initialize early_clobber_alts.
+	(setup_operand_alternative): Set up early_clobber_alts.
+	(collect_non_operand_hard_regs): Ditto.  Pass early clobber
+	alternatives to new_insn_reg.
+	(add_regs_to_insn_regno_info): Add arg early_clobber_alts.  Use
+	it.
+	(lra_update_insn_regno_info): Pass the new arg.
+
 2017-02-14  Jakub Jelinek  
 
 	PR middle-end/79505
Index: lra-int.h
===
--- lra-int.h	(revision 245338)
+++ lra-int.h	(working copy)
@@ -130,6 +130,8 @@ struct lra_operand_data
 {
   /* The machine description constraint string of the operand.	*/
   const char *constraint;
+  /* Alternatives for which early_clobber can be true.  */
+  alternative_mask early_clobber_alts;
   /* It is taken only from machine description (which is different
  from recog_data.operand_mode) and can be of VOIDmode.  */
   ENUM_BITFIELD(machine_mode) mode : 16;
@@ -150,6 +152,8 @@ struct lra_operand_data
 /* Info about register occurrence in an insn.  */
 struct lra_insn_reg
 {
+  /* Alternatives for which early_clobber can be true.  */
+  alternative_mask early_clobber_alts;
   /* The biggest mode through which the insn refers to the register
  occurrence (remember the register can be accessed through a
  subreg in the insn).  */
Index: lra-lives.c
===
--- lra-lives.c	(revision 245338)
+++ lra-lives.c	(working copy)
@@ -586,6 +586,16 @@ check_pseudos_live_through_calls (int re
   SET_HARD_REG_SET (lra_reg_info[regno].conflict_hard_regs);
 }
 
+/* Return true if insn REG is an early clobber operand in alternative
+   NALT.  Negative NALT means that we don't know the current insn
+   alternative.  So assume the worst.  */
+static inline bool
+reg_early_clobber_p (const struct lra_insn_reg *reg, int n_alt)
+{
+  return (reg->early_clobber
+	  && (n_alt < 0 || TEST_BIT (reg->early_clobber_alts, n_alt)));
+}
+
 /* Process insns of the basic block BB to update pseudo live ranges,
pseudo hard register conflicts, and insn notes.  We do it on
backward scan of BB insns.  CURR_POINT is the program point where
@@ -638,7 +648,7 @@ process_bb_lives (basic_block bb, int 
   FOR_BB_INSNS_REVERSE_SAFE (bb, curr_insn, next)
 {
   bool call_p;
-  int dst_regno, src_regno;
+  int n_alt, dst_regno, src_regno;
   rtx set;
   struct lra_insn_reg *reg;
 
@@ -647,9 +657,10 @@ process_bb_lives (basic_block bb, int 
 
   curr_id = lra_get_insn_recog_data (curr_insn);
   curr_static_id = curr_id->insn_static_data;
+  n_alt = curr_id->used_insn_alternative;
   if (lra_dump_file != NULL)
-	fprintf (lra_dump_file, "   Insn %u: point = %d\n",
-		 INSN_UID (curr_insn), curr_point);
+	fprintf (lra_dump_file, "   Insn %u: point = %d, n_alt = %d\n",
+		 INSN_UID (curr_insn), curr_point, n_alt);
 
   set = single_set (curr_insn);
 
@@ -818,13 +829,15 @@ process_bb_lives (basic_block bb, int 
 
   /* See which defined values die here.  */
   for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type == OP_OUT && ! reg->early_clobber && ! reg->subreg_p)
+	if (reg->type == OP_OUT
+	&& ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
 	  need_curr_point_incr
 	|= mark_regno_dead (reg->regno, reg->biggest_mode,
 curr_point);
 
   for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
-	if (reg->type == OP_OUT && ! reg->early_clobber && ! reg->subreg_p)
+	if (reg->type == OP_OUT
+	&& ! reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
 	  make_hard_regno_dead (reg->regno);
 
   if (curr_id->arg_hard_regs != NULL)
@@ -901,13 +914,15 @@ process_bb_lives (basic_block bb, int 
 
   /* Mark early clobber outputs dead.  */
   for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-	if (reg->type == OP_OUT && reg->early_clobber && ! reg->subreg_p)
+	if (reg->type == OP_OUT
+	&& reg_early_clobber_p (reg, n_alt) && ! reg->subreg_p)
 	  need_curr_point_incr
 	|= mark_regno_dead (reg->regno, reg->biggest_mode,
 curr_point);
 
   for (reg = curr_static_id->hard_regs; reg != NULL; reg = reg->next)
-	if (reg->type == OP_OUT && reg->early_clobber && ! reg->subreg_p)
+	if (reg->type == OP_OUT
+	&& reg_early_clobber_p (reg, 
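Stripped of the RTL details, the new predicate can be sketched as follows (types simplified; `alternative_mask` is assumed here to be a plain bitmask with bit N set when alternative N may early-clobber the operand):

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the patch: early_clobber becomes meaningful only
// for the alternatives recorded in early_clobber_alts.  A negative
// n_alt means the chosen insn alternative is unknown, so assume the
// worst, as the real reg_early_clobber_p does.
using alternative_mask = std::uint64_t;

struct insn_reg
{
  bool early_clobber;                  // some alternative early-clobbers
  alternative_mask early_clobber_alts; // which alternatives do
};

static bool reg_early_clobber_p(const insn_reg &reg, int n_alt)
{
  return reg.early_clobber
         && (n_alt < 0 || ((reg.early_clobber_alts >> n_alt) & 1));
}
```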

Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-14 Thread Martin Sebor

On 02/14/2017 01:32 PM, Jakub Jelinek wrote:
> On Tue, Feb 14, 2017 at 12:15:59PM -0700, Martin Sebor wrote:
>> That comment explains how the likely_adjust variable ("the adjustment")
>> is being used, or more precisely, how it was being used in the first
>> version of the patch.  The comment became somewhat out of date with
>> the committed version of the patch (this was my bad).
>>
>> The variable is documented where it's defined and again where it's
>> assigned to.  With the removal of those comments it seems especially
>> important that the only remaining description of what's going on be
>> accurate.
>>
>> The comment is outdated because it refers to "the adjustment" which
>> doesn't exist anymore.  (It was replaced by a flag in my commit).
>> To bring it up to date it should say something like:
>>
>>   /* Set the LIKELY counter to MIN.  In base 8 and 16, when
>>      the argument is in range that includes zero, adjust it
>>      upward to include the length of the base prefix since
>>      in that case the MIN counter does include it.  */
>
> So for a comment, what about following then?  With or without
> the IMNSHO useless
> && (tree_int_cst_sgn (argmin) < 0 || tree_int_cst_sgn (argmax) > 0)

If the condition is redundant (it seems like it could be) it
shouldn't be included in the patch.  It seems like an opportunity
for further simplification.  I'm sure it's not the only one, either.

>> On a separate note, while testing the patch I noticed that it's
>> not exactly equivalent to what's on trunk.  Trunk silently accepts
>> the call below but with the patch it complains.  That's great (it
>> should complain) but the change should be tested.  More to my point,
>> while in this case your change happened to fix a subtle bug (which
>> I'm certainly happy about), it could have just as easily introduced
>> one.
>
> Yeah, indeed.  That should be a clear argument for why writing it in
> so many places is bad, it is simply much more error-prone, there are
> too many cases to get right.

No argument there.  There's always room for improvement, cleanup,
or refactoring.

Martin




  char d[2];

  void f (unsigned i)
  {
if (i < 1234 || 12345 < i)
  i = 1234;

__builtin_sprintf (d, "%#hhx", i);
  }


What happens is that because the original range doesn't contain zero
you set likely_adjust to false and then never update it again because
the implicit cast changed the range.

If some version of the patch is approved, I'll leave addition of this
testcase to you (incrementally).
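The corner case driving all of this is easy to see with snprintf itself: with the `#` flag, 0 is the only value that gets no base prefix, so a range containing zero has a shorter minimum length than every other value in the range:

```cpp
#include <cassert>
#include <cstdio>

// Length of the formatted output for "%#x": for 0 the '#' flag adds no
// prefix ("0", length 1), while every other value gains "0x" -- the
// asymmetry format_integer's LIKELY counter has to account for.
static int len_hash_x(unsigned v)
{
  return std::snprintf(nullptr, 0, "%#x", v);
}
```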

2017-02-14  Jakub Jelinek  

PR tree-optimization/79327
* gimple-ssa-sprintf.c (format_integer): Remove likely_adjust
variable, its initialization and use.

--- gcc/gimple-ssa-sprintf.c.jj 2017-02-14 21:21:56.048745037 +0100
+++ gcc/gimple-ssa-sprintf.c2017-02-14 21:25:20.939033174 +0100
@@ -1232,10 +1232,6 @@ format_integer (const directive , tr
of the format string by returning [-1, -1].  */
 return fmtresult ();

-  /* True if the LIKELY counter should be adjusted upward from the MIN
- counter to account for arguments with unknown values.  */
-  bool likely_adjust = false;
-
   fmtresult res;

   /* Using either the range the non-constant argument is in, or its
@@ -1265,14 +1261,6 @@ format_integer (const directive , tr

  res.argmin = argmin;
  res.argmax = argmax;
-
- /* Set the adjustment for an argument whose range includes
-zero since that doesn't include the octal or hexadecimal
-base prefix.  */
- wide_int wzero = wi::zero (wi::get_precision (min));
- if (wi::le_p (min, wzero, SIGNED)
- && !wi::neg_p (max))
-   likely_adjust = true;
}
   else if (range_type == VR_ANTI_RANGE)
{
@@ -1307,11 +1295,6 @@ format_integer (const directive , tr

   if (!argmin)
 {
-  /* Set the adjustment for an argument whose range includes
-zero since that doesn't include the octal or hexadecimal
-base prefix.  */
-  likely_adjust = true;
-
   if (TREE_CODE (argtype) == POINTER_TYPE)
{
  argmin = build_int_cst (pointer_sized_int_node, 0);
@@ -1364,14 +1347,19 @@ format_integer (const directive , tr
   res.range.max = MAX (max1, max2);
 }

-  /* Add the adjustment for an argument whose range includes zero
- since it doesn't include the octal or hexadecimal base prefix.  */
+  /* If the range is known, use the maximum as the likely length.  */
   if (res.knownrange)
 res.range.likely = res.range.max;
   else
 {
+  /* Otherwise, use the minimum.  Except for the case where for %#x or
+ %#o the minimum is just for a single value in the range (0) and
+ for all other values it is something longer, like 0x1 or 01.
+ Use the length for value 1 in that case instead as the likely
+ length.  */
   res.range.likely = res.range.min;
-  if (likely_adjust && maybebase && base != 10)
+  if (maybebase && base != 10
+ && (tree_int_cst_sgn (argmin) < 0 || 

[PATCH] Fix return type detection in visit()

2017-02-14 Thread Tim Shen via gcc-patches
This is an obvious missing std::forward. :)

Testing on x86_64-linux-gnu, but I expect it to pass.

-- 
Regards,
Tim Shen
commit 08235141a7e06db2b604b5869c9d8e4aaf8fa29b
Author: Tim Shen 
Date:   Tue Feb 14 13:55:18 2017 -0800

2017-02-14  Tim Shen  

PR libstdc++/79513
* include/std/variant (visit()): Forward variant types to the return
type detection code.
* testsuite/20_util/variant/compile.cc: Add test cases.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index c5138e56803..866c4c40a61 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -1263,7 +1263,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	__throw_bad_variant_access("Unexpected index");
 
   using _Result_type =
-	decltype(std::forward<_Visitor>(__visitor)(get<0>(__variants)...));
+	decltype(std::forward<_Visitor>(__visitor)(
+	get<0>(std::forward<_Variants>(__variants))...));
 
   constexpr auto& __vtable = __detail::__variant::__gen_vtable<
 	_Result_type, _Visitor&&, _Variants&&...>::_S_vtable;
diff --git a/libstdc++-v3/testsuite/20_util/variant/compile.cc b/libstdc++-v3/testsuite/20_util/variant/compile.cc
index 65f4326c397..d40a4ccb784 100644
--- a/libstdc++-v3/testsuite/20_util/variant/compile.cc
+++ b/libstdc++-v3/testsuite/20_util/variant/compile.cc
@@ -291,6 +291,13 @@ void test_visit()
 };
 static_assert(visit(Visitor(), variant(0)), "");
   }
+  // PR libstdc++/79513
+  {
+std::variant v(5);
+std::visit([](int&){}, v);
+std::visit([](int&&){}, std::move(v));
+(void)v;
+  }
 }
 
 void test_constexpr()
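Why the forwarding matters: without it, `get<0>(__variants)` inside the return-type decltype is always an lvalue, so a visitor accepting only rvalue references failed return-type detection. A standalone sketch of the now-accepted usage (the doubling lambda is illustrative only):

```cpp
#include <cassert>
#include <utility>
#include <variant>

// With the variants forwarded into the return-type decltype, visiting a
// moved variant with an rvalue-reference visitor deduces correctly.
static int demo()
{
  std::variant<int> v(5);
  std::visit([](int &) {}, v);  // lvalue variant -> alternative is int&
  return std::visit([](int &&x) { return x * 2; }, std::move(v));
}
```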


Re: [PATCH, rs6000] Fix RTL definitions of the xvcvsxdsp and xvcvuxdsp instructions

2017-02-14 Thread Segher Boessenkool
On Tue, Feb 14, 2017 at 12:59:27PM -0800, Carl E. Love wrote:
> The following patch addresses errors in the RTL define_insn statements
> for the xvcvsxdsp and xvcvuxdsp instructions.  The RTL defined the
> instructions with a V2DF argument and returning V4SI.  They should take
> a V2DI argument and return a V4SF based on the Power ISA document. 
> 
> Additionally, the RTL define_insn for the xvcvuxdsp was fixed to
> generate the correct xvcvuxdsp instruction instead of the xvcvuxwdp
> instruction. Note, this is an additional fix added to the previously
> reviewed patch.
> 
> A compile only test was added to test the argument and return types for
> the RTL define_insn definitions.
> 
> The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE)
> with no regressions.
> 
> Is the patch OK for gcc mainline?

This is okay.  Thanks!


Segher


Re: [RFA][PR tree-optimization/79095] [PATCH 1/4] Improve ranges for MINUS_EXPR and EXACT_DIV_EXPR

2017-02-14 Thread Jeff Law

On 02/14/2017 01:58 AM, Richard Biener wrote:
>> I spoke with Andrew a bit today, he's consistently seeing cases where the
>> union of 3 ranges is necessary to resolve the kinds of queries we're
>> interested in.  He's made a design decision not to use anti-ranges in his
>> work, so y'all are in sync on that long term.
>
> Ok.  I'd also not hard-code the number of union ranges but make the code
> agnostic.  Still the actual implementation might take a #define / template
> param for an upper bound.

Andrew was in-sync on not hard-coding the number of ranges either --
essentially he's considering the possibility that consumers might want
different levels of detail and thus a different number of recorded union
ranges.

I'm not 100% sure that level of engineering is needed, but a design
which accounts for that inherently avoids hard-coding the upper bound.


Jeff


Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling

2017-02-14 Thread Pat Haugen
On 02/14/2017 07:57 AM, Jan Hubicka wrote:
> So it seems that the frequency of the loop itself is unrealistically
> scaled down.
> Before vectorizing the frequency is 8500 and the predicted number of
> iterations is 6.6.  Now the loop is entered via BB 8 with frequency 1148,
> so the loop, by exit probability, exits with 15% probability and thus
> still has 6.6 iterations, but by BB frequencies its body executes fewer
> times than the preheader.
>
> Now this is a fragile area: vectorizing a loop should scale the number of
> iterations down 8 times.  However, guessed CFG profiles are always very
> "flat".  Of course, if the loop iterated 6.6 times on average, vectorizing
> would not make any sense.  Making guessed profiles less flat is
> unrealistic, because the average loop iterates few times, but of course
> while vectorizing we make the additional guess that the vectorizable loop
> matters, so the guessed profile is probably unrealistic.

We have the same problem in the RTL loop unroller, in that we'll scale the
unrolled loop by the unroll factor
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212#c3), which can result in a
loop with lower frequency than the surrounding code. The problem is compounded
if we vectorize the loop and then unroll it. Whatever approach is decided for
the case where we have a guessed profile should be applied to both the
vectorizer and the RTL loop unroller.

-Pat



[v3 PATCH] Implement C++17 GB50 resolution

2017-02-14 Thread Dinka Ranns
Tested on Linux-x64

Implementation of resolution for C++17 GB50

2017-02-12 Dinka Ranns 

C++17 GB50 resolution
* libstdc++-v3/include/std/chrono:
(duration::operator++()): Add constexpr.
(duration::operator++(int)): Likewise
(duration::operator--()): Likewise
(duration::operator--(int)): Likewise
(duration::operator+=(const duration&)): Likewise
(duration::operator-=(const duration&)): Likewise
(duration::operator*=(const rep&)): Likewise
(duration::operator/=(const rep&)): Likewise
(duration::operator%=(const rep&)): Likewise
(duration::operator%=(const duration&)): Likewise
(time_point::operator+=(const duration&)): Likewise
(time_point::operator-=(const duration&)): Likewise

* libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc:
new tests
* libstdc++-v3/testsuite/20_util/time_point/arithmetic/constexpr.cc: new
diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index ceae7f8..6a6995c 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -349,50 +349,50 @@ _GLIBCXX_END_NAMESPACE_VERSION
operator-() const
{ return duration(-__r); }
 
-   duration&
+   constexpr duration&
operator++()
{
  ++__r;
  return *this;
}
 
-   duration
+   constexpr duration
operator++(int)
{ return duration(__r++); }
 
-   duration&
+   constexpr duration&
operator--()
{
  --__r;
  return *this;
}
 
-   duration
+   constexpr duration
operator--(int)
{ return duration(__r--); }
 
-   duration&
+   constexpr duration&
operator+=(const duration& __d)
{
  __r += __d.count();
  return *this;
}
 
-   duration&
+   constexpr duration&
operator-=(const duration& __d)
{
  __r -= __d.count();
  return *this;
}
 
-   duration&
+   constexpr duration&
operator*=(const rep& __rhs)
{
  __r *= __rhs;
  return *this;
}
 
-   duration&
+   constexpr duration&
operator/=(const rep& __rhs)
{
  __r /= __rhs;
@@ -401,7 +401,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
 
// DR 934.
 template<typename _Rep2 = rep>
- typename enable_if<!treat_as_floating_point<_Rep2>::value,
+ constexpr typename enable_if<!treat_as_floating_point<_Rep2>::value,
 duration&>::type
  operator%=(const rep& __rhs)
  {
@@ -410,7 +410,7 @@ _GLIBCXX_END_NAMESPACE_VERSION
  }
 
 template<typename _Rep2 = rep>
- typename enable_if<!treat_as_floating_point<_Rep2>::value,
+ constexpr typename enable_if<!treat_as_floating_point<_Rep2>::value,
 duration&>::type
  operator%=(const duration& __d)
  {
@@ -631,14 +631,14 @@ _GLIBCXX_END_NAMESPACE_VERSION
{ return __d; }
 
// arithmetic
-   time_point&
+   constexpr time_point&
operator+=(const duration& __dur)
{
  __d += __dur;
  return *this;
}
 
-   time_point&
+   constexpr time_point&
operator-=(const duration& __dur)
{
  __d -= __dur;
diff --git a/libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc b/libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc
index 285f941..1128a52 100644
--- a/libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc
+++ b/libstdc++-v3/testsuite/20_util/duration/arithmetic/constexpr.cc
@@ -19,11 +19,31 @@
 
 #include <chrono>
 #include 
+constexpr auto test_operators()
+{
+  std::chrono::nanoseconds d1 { };
+  d1++;
+  ++d1;
+  d1--;
+  --d1;
+
+  auto d2(d1);
+
+  d1+=d2;
+  d1-=d2;
 
+  d1*=1;
+  d1/=1;
+  d1%=1;
+  d1%=d2;
+
+  return d1;
+}
 int main()
 {
   constexpr std::chrono::nanoseconds d1 { };
   constexpr auto d2(+d1);
   constexpr auto d3(-d2);
+  constexpr auto d4 = test_operators();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/20_util/time_point/arithmetic/constexpr.cc b/libstdc++-v3/testsuite/20_util/time_point/arithmetic/constexpr.cc
new file mode 100644
index 000..e87a226
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/time_point/arithmetic/constexpr.cc
@@ -0,0 +1,39 @@
+// { dg-do compile { target c++11 } }
+
+// Copyright (C) 2011-2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.

[PATCH, rs6000] Fix RTL definitions of the xvcvsxdsp and xvcvuxdsp instructions

2017-02-14 Thread Carl E. Love
GCC Maintainers:

The following patch addresses errors in the RTL define_insn statements
for the xvcvsxdsp and xvcvuxdsp instructions.  The RTL defined the
instructions with a V2DF argument and a V4SI return type.  Per the Power
ISA document, they should take a V2DI argument and return V4SF.

Additionally, the RTL define_insn for the xvcvuxdsp was fixed to
generate the correct xvcvuxdsp instruction instead of the xvcvuxwdp
instruction. Note, this is an additional fix added to the previously
reviewed patch.

A compile only test was added to test the argument and return types for
the RTL define_insn definitions.

The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE)
with no regressions.

Is the patch OK for gcc mainline?

   Carl Love



gcc/ChangeLog:

2017-02-14  Carl Love  

   * config/rs6000/rs6000.c: Add case statement entry to make the xvcvuxdsp
   built-in argument unsigned.
   * config/rs6000/vsx.md: Fix the source and return operand types so they
   match the instruction definitions from the ISA document.  Fix a typo
   in the instruction generation for the "vsx_xvcvuxdsp" define_insn
   statement.

gcc/testsuite/ChangeLog:

2017-02-14  Carl Love  

   * gcc.target/powerpc/vsx-builtin-3.c: Add missing test case for the
   xvcvsxdsp and xvcvuxdsp instructions.
---
 gcc/config/rs6000/rs6000.c   |  1 +
 gcc/config/rs6000/vsx.md | 10 +-
 gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c | 23 +++
 3 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b1c9ef5..f752d1d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -18594,6 +18594,7 @@ builtin_function_type (machine_mode mode_ret, machine_mode mode_arg0,
   break;
 
   /* unsigned args, signed return.  */
+case VSX_BUILTIN_XVCVUXDSP:
 case VSX_BUILTIN_XVCVUXDDP_UNS:
 case ALTIVEC_BUILTIN_UNSFLOAT_V4SI_V4SF:
   h.uns_p[1] = 1;
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b10ade4..9c3c07d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1914,19 +1914,19 @@
   [(set_attr "type" "vecdouble")])
 
 (define_insn "vsx_xvcvsxdsp"
-  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wd,?wa")
-   (unspec:V4SI [(match_operand:V2DF 1 "vsx_register_operand" "wf,wa")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wd,?wa")
+   (unspec:V4SF [(match_operand:V2DI 1 "vsx_register_operand" "wf,wa")]
 UNSPEC_VSX_CVSXDSP))]
   "VECTOR_UNIT_VSX_P (V2DFmode)"
   "xvcvsxdsp %x0,%x1"
   [(set_attr "type" "vecfloat")])
 
 (define_insn "vsx_xvcvuxdsp"
-  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wd,?wa")
-   (unspec:V4SI [(match_operand:V2DF 1 "vsx_register_operand" "wf,wa")]
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wd,?wa")
+   (unspec:V4SF [(match_operand:V2DI 1 "vsx_register_operand" "wf,wa")]
 UNSPEC_VSX_CVUXDSP))]
   "VECTOR_UNIT_VSX_P (V2DFmode)"
-  "xvcvuxwdp %x0,%x1"
+  "xvcvuxdsp %x0,%x1"
   [(set_attr "type" "vecdouble")])
 
 ;; Convert from 32-bit to 64-bit types
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
index f337c1c..ff5296c 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -35,6 +35,8 @@
 /* { dg-final { scan-assembler "xvcmpgesp" } } */
 /* { dg-final { scan-assembler "xxsldwi" } } */
 /* { dg-final { scan-assembler-not "call" } } */
+/* { dg-final { scan-assembler "xvcvsxdsp" } } */
+/* { dg-final { scan-assembler "xvcvuxdsp" } } */
 
 extern __vector int si[][4];
 extern __vector short ss[][4];
@@ -50,7 +52,9 @@ extern __vector __pixel p[][4];
 #ifdef __VSX__
 extern __vector double d[][4];
 extern __vector long sl[][4];
+extern __vector long long sll[][4];
 extern __vector unsigned long ul[][4];
+extern __vector unsigned long long ull[][4];
 extern __vector __bool long bl[][4];
 #endif
 
@@ -211,3 +215,22 @@ int do_xxsldwi (void)
   d[i][0] = __builtin_vsx_xxsldwi (d[i][1], d[i][2], 3); i++;
   return i;
 }
+
+int do_xvcvsxdsp (void)
+{
+  int i = 0;
+
+  f[i][0] = __builtin_vsx_xvcvsxdsp (sll[i][1]); i++;
+
+  return i;
+}
+
+int do_xvcvuxdsp (void)
+{
+  int i = 0;
+
+  f[i][0] = __builtin_vsx_xvcvuxdsp (ull[i][1]); i++;
+
+  return i;
+}
+
-- 
1.9.1





Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-14 Thread Jakub Jelinek
On Tue, Feb 14, 2017 at 12:15:59PM -0700, Martin Sebor wrote:
> That comment explains how the likely_adjust variable ("the adjustment")
> is being used, or more precisely, how it was being used in the first
> version of the patch.  The comment became somewhat out of date with
> the committed version of the patch (this was my bad).
> 
> The variable is documented where it's defined and again where it's
> assigned to.  With the removal of those comments it seems especially
> important that the only remaining description of what's going on be
> accurate.
> 
> The comment is outdated because it refers to "the adjustment" which
> doesn't exist anymore.  (It was replaced by a flag in my commit).
> To bring it up to date it should say something like:
> 
>   /* Set the LIKELY counter to MIN.  In base 8 and 16, when
>  the argument is in range that includes zero, adjust it
>  upward to include the length of the base prefix since
>  in that case the MIN counter does include it.  */

So for a comment, what about following then?  With or without
the IMNSHO useless
&& (tree_int_cst_sgn (argmin) < 0 || tree_int_cst_sgn (argmax) > 0)

> On a separate note, while testing the patch I noticed that it's
> not exactly equivalent to what's on trunk.  Trunk silently accepts
> the call below but with the patch it complains.  That's great (it
> should complain) but the change should be tested.  More to my point,
> while in this case your change happened to fix a subtle bug (which
> I'm certainly happy about), it could have just as easily introduced
> one.

Yeah, indeed.  That should be a clear argument for why writing it in
so many places is bad, it is simply much more error-prone, there are
too many cases to get right.

>   char d[2];
> 
>   void f (unsigned i)
>   {
> if (i < 1234 || 12345 < i)
>   i = 1234;
> 
> __builtin_sprintf (d, "%#hhx", i);
>   }

What happens is that because the original range doesn't contain zero
you set likely_adjust to false and then never update it again because
the implicit cast changed the range.

If some version of the patch is approved, I'll leave addition of this
testcase to you (incrementally).

2017-02-14  Jakub Jelinek  

PR tree-optimization/79327
* gimple-ssa-sprintf.c (format_integer): Remove likely_adjust
variable, its initialization and use.

--- gcc/gimple-ssa-sprintf.c.jj 2017-02-14 21:21:56.048745037 +0100
+++ gcc/gimple-ssa-sprintf.c	2017-02-14 21:25:20.939033174 +0100
@@ -1232,10 +1232,6 @@ format_integer (const directive &dir, tr
of the format string by returning [-1, -1].  */
 return fmtresult ();
 
-  /* True if the LIKELY counter should be adjusted upward from the MIN
- counter to account for arguments with unknown values.  */
-  bool likely_adjust = false;
-
   fmtresult res;
 
   /* Using either the range the non-constant argument is in, or its
@@ -1265,14 +1261,6 @@ format_integer (const directive &dir, tr
 
  res.argmin = argmin;
  res.argmax = argmax;
-
- /* Set the adjustment for an argument whose range includes
-zero since that doesn't include the octal or hexadecimal
-base prefix.  */
- wide_int wzero = wi::zero (wi::get_precision (min));
- if (wi::le_p (min, wzero, SIGNED)
- && !wi::neg_p (max))
-   likely_adjust = true;
}
   else if (range_type == VR_ANTI_RANGE)
{
@@ -1307,11 +1295,6 @@ format_integer (const directive &dir, tr
 
   if (!argmin)
 {
-  /* Set the adjustment for an argument whose range includes
-zero since that doesn't include the octal or hexadecimal
-base prefix.  */
-  likely_adjust = true;
-
   if (TREE_CODE (argtype) == POINTER_TYPE)
{
  argmin = build_int_cst (pointer_sized_int_node, 0);
@@ -1364,14 +1347,19 @@ format_integer (const directive &dir, tr
   res.range.max = MAX (max1, max2);
 }
 
-  /* Add the adjustment for an argument whose range includes zero
- since it doesn't include the octal or hexadecimal base prefix.  */
+  /* If the range is known, use the maximum as the likely length.  */
   if (res.knownrange)
 res.range.likely = res.range.max;
   else
 {
+  /* Otherwise, use the minimum.  Except for the case where for %#x or
+ %#o the minimum is just for a single value in the range (0) and
+ for all other values it is something longer, like 0x1 or 01.
+ Use the length for value 1 in that case instead as the likely
+ length.  */
   res.range.likely = res.range.min;
-  if (likely_adjust && maybebase && base != 10)
+  if (maybebase && base != 10
+ && (tree_int_cst_sgn (argmin) < 0 || tree_int_cst_sgn (argmax) > 0))
{
  if (res.range.min == 1)
res.range.likely += base == 8 ? 1 : 2;


Jakub


[PATCH] rs6000: Fix the vec-adde* testcases once more

2017-02-14 Thread Segher Boessenkool
David found the vec-adde{,c}-int128.c testcases fail on AIX.  Those
tests should only run on targets that have int128.

This also changes the non-int128 testcases to check for the hardware
they require.

Tested on powerpc64-linux {-m32,-m64} and powerpc64le-linux; committing
to trunk.


Segher


2017-02-14  Segher Boessenkool  

gcc/testsuite/
* gcc.target/powerpc/vec-adde-int128.c: Only run if int128 exists.
* gcc.target/powerpc/vec-addec-int128.c: Ditto.
* gcc.target/powerpc/vec-adde.c: Require vsx_hw, don't require a
64-bit default target.
* gcc.target/powerpc/vec-addec.c: Require p8vector_hw, don't require
a 64-bit default target.

---
 gcc/testsuite/gcc.target/powerpc/vec-adde-int128.c  | 2 +-
 gcc/testsuite/gcc.target/powerpc/vec-adde.c | 3 +--
 gcc/testsuite/gcc.target/powerpc/vec-addec-int128.c | 2 +-
 gcc/testsuite/gcc.target/powerpc/vec-addec.c| 3 +--
 4 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-adde-int128.c b/gcc/testsuite/gcc.target/powerpc/vec-adde-int128.c
index 8eed7f5..03e89df 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-adde-int128.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-adde-int128.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { powerpc*-*-* && p8vector_hw } } } */
+/* { dg-do run { target { powerpc*-*-* && { p8vector_hw && int128 } } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3" } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-adde.c b/gcc/testsuite/gcc.target/powerpc/vec-adde.c
index a235a1c..b6f87da 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-adde.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-adde.c
@@ -1,5 +1,4 @@
-/* { dg-do run { target { powerpc64*-*-* } } } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-do run { target { powerpc*-*-* && vsx_hw } } } */
 /* { dg-options "-mvsx -O3" } */
 
 /* Test that the vec_adde builtin works as expected.  */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-addec-int128.c b/gcc/testsuite/gcc.target/powerpc/vec-addec-int128.c
index 4388e06..3baa82b 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-addec-int128.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-addec-int128.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { powerpc*-*-* && p8vector_hw } } } */
+/* { dg-do run { target { powerpc*-*-* && { p8vector_hw && int128 } } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3" } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-addec.c b/gcc/testsuite/gcc.target/powerpc/vec-addec.c
index 53bd41f..330ec23 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-addec.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-addec.c
@@ -1,5 +1,4 @@
-/* { dg-do run { target { powerpc64*-*-* } } } */
-/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-do run { target { powerpc*-*-* && p8vector_hw } } } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */
 /* { dg-options "-mcpu=power8 -O3" } */
 
-- 
1.9.3



[C++ RFC] Fix up attribute handling in templates (PR c++/79502)

2017-02-14 Thread Jakub Jelinek
Hi!

The following testcase fails, because while we have the nodiscard
attribute on the template, we actually never propagate it to the
instantiation, which is where it is checked (I'm really surprised about
this).

Unfortunately, this patch regresses
FAIL: g++.dg/ext/visibility/template8.C  -std=gnu++{11,14,98}  scan-hidden hidden[ \\t_]*_Z1gI1AI1BEEvT_
It expects that the visibility attribute from the template never
makes it to the implementation or something, is that correct?  Or do
we need to handle visibility in some special way?

Regarding the first hunk, it is just a wild guess, I couldn't trigger
that code by make check-c++-all.  Is there a way to get it through
some partial instantiation of scoped enum with/without attributes or
something similar?

Anyway, except for that template8.C the patch passed bootstrap/regtest
on x86_64-linux and i686-linux.  But it really puzzles me that the
attributes aren't instantiated; what happens, e.g., with the abi_tag
attribute?

2017-02-14  Jakub Jelinek  

PR c++/79502
* pt.c (lookup_template_class_1): Set TYPE_ATTRIBUTES on class
instantiations as well as dependent enumeral instantiations.
Set ENUM_UNDERLYING_TYPE on the latter too.

* g++.dg/cpp1z/nodiscard4.C: New test.

--- gcc/cp/pt.c.jj  2017-02-10 21:35:30.0 +0100
+++ gcc/cp/pt.c 2017-02-14 08:36:35.459265103 +0100
@@ -8759,7 +8759,13 @@ lookup_template_class_1 (tree d1, tree a
  template parameters.  And, no one should be interested
  in the enumeration constants for such a type.  */
   t = cxx_make_type (ENUMERAL_TYPE);
+ ENUM_UNDERLYING_TYPE (t)
+   = tsubst (ENUM_UNDERLYING_TYPE (template_type),
+ arglist, complain, in_decl);
   SET_SCOPED_ENUM_P (t, SCOPED_ENUM_P (template_type));
+ TYPE_ATTRIBUTES (t)
+   = tsubst_attributes (TYPE_ATTRIBUTES (template_type),
+arglist, complain, in_decl);
 }
   SET_OPAQUE_ENUM_P (t, OPAQUE_ENUM_P (template_type));
  ENUM_FIXED_UNDERLYING_TYPE_P (t)
@@ -8786,6 +8792,10 @@ lookup_template_class_1 (tree d1, tree a
   equality testing, so this template class requires
   structural equality testing. */
SET_TYPE_STRUCTURAL_EQUALITY (t);
+
+ TYPE_ATTRIBUTES (t)
+   = tsubst_attributes (TYPE_ATTRIBUTES (template_type),
+arglist, complain, in_decl);
}
   else
gcc_unreachable ();
--- gcc/testsuite/g++.dg/cpp1z/nodiscard4.C.jj	2017-02-14 08:42:12.275765748 +0100
+++ gcc/testsuite/g++.dg/cpp1z/nodiscard4.C	2017-02-14 08:40:00.0 +0100
@@ -0,0 +1,14 @@
+// PR c++/79502
+// { dg-do compile { target c++11 } }
+
+template
+struct [[nodiscard]] missiles {};
+
+missiles make() { return {}; }
+missiles (*fnptr)() = make;
+
+int main()
+{
+  make();  // { dg-warning "ignoring returned value of type" }
+  fnptr(); // { dg-warning "ignoring returned value of type" }
+}

Jakub


Re: [PATCH] suppress unhelpful -Wformat-truncation=2 INT_MAX warning (PR 79448)

2017-02-14 Thread Martin Sebor

On 02/13/2017 04:33 PM, Jeff Law wrote:

On 02/10/2017 10:55 AM, Martin Sebor wrote:

The recent Fedora mass rebuild revealed that the Wformat-truncation=2
checker is still a bit too aggressive and complains about potentially
unbounded strings causing subsequent directives to exceed the INT_MAX
limit.  (It's unclear how the build ended up enabling level 2 of
the warning.)

This is because for the purposes of the return value optimization
the pass must assume that such strings really are potentially unbounded
and result in as many as INT_MAX bytes (or more).  That doesn't mean
that it should warn on such cases.

The attached patch relaxes the checker to avoid the warning in this
case.  Since there's no easy way for a user to suppress the warning,
is this change okay for trunk at this stage?

Martin

gcc-79448.diff


PR middle-end/79448 - unhelpful -Wformat-truncation=2 warning

gcc/testsuite/ChangeLog:

PR middle-end/79448
* gcc.dg/tree-ssa/builtin-snprintf-warn-3.c: New test.
* gcc.dg/tree-ssa/pr79448-2.c: New test.
* gcc.dg/tree-ssa/pr79448.c: New test.

gcc/ChangeLog:

PR middle-end/79448
* gimple-ssa-sprintf.c (format_directive): Avoid issuing INT_MAX
  warning for strings of unknown length.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index e6cc31d..bf76162 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -2561,11 +2561,15 @@ format_directive (const pass_sprintf_length::call_info &info,
   /* Raise the total unlikely maximum by the larger of the maximum
  and the unlikely maximum.  It doesn't matter if the unlikely
  maximum overflows.  */
+  unsigned HOST_WIDE_INT save = res->range.unlikely;
   if (fmtres.range.max < fmtres.range.unlikely)
 res->range.unlikely += fmtres.range.unlikely;
   else
 res->range.unlikely += fmtres.range.max;

+  if (res->range.unlikely < save)
+res->range.unlikely = HOST_WIDE_INT_M1U;
+

So this looks like you're doing an overflow check -- yet earlier your
comment says "It doesn't matter if the unlikely maximum overflows". ISTM
that comment needs updating -- if it doesn't matter, then why check for
it and clamp the value?



   res->range.min += fmtres.range.min;
   res->range.likely += fmtres.range.likely;

@@ -2616,7 +2620,12 @@ format_directive (const pass_sprintf_length::call_info &info,

   /* Has the likely and maximum directive output exceeded INT_MAX?  */
   bool likelyximax = *dir.beg && res->range.likely > target_int_max ();
-  bool maxximax = *dir.beg && res->range.max > target_int_max ();
+  /* Don't consider the maximum to be in excess when it's the result
+ of a string of unknown length (i.e., whose maximum has been set
+ to HOST_WIDE_INT_M1U.  */
+  bool maxximax = (*dir.beg
+   && res->range.max > target_int_max ()
+   && res->range.max < HOST_WIDE_INT_MAX);

So your comment mentions HOST_WIDE_INT_M1U as the key for a string of
unknown length.  But that doesn't obviously correspond to what the code
checks.

Can you please fix up the two comments.  With the comments fixed, this
is OK.


Sure, I updated the comments.

The code alternately uses HOST_WIDE_INT_M1U and HOST_WIDE_INT_MAX as
a stand-in for either a "can't happen" or "unbounded/unknown" size.
It's not fully consistent and should be cleaned up and the uses of
HOST_WIDE_INT should be replaced by a class like wide_int as someone
suggested in the past.  If I get to some of the enhancements I'd like
to make in stage 1 (e.g., integrating the pass with tree-ssa-strlen)
I'll see about cleaning this up.

Martin


Re: [PATCH] portability fix for gcc.dg/strncmp-2.c testcase

2017-02-14 Thread David Edelsohn
On Tue, Feb 14, 2017 at 2:24 PM, Aaron Sawdey
 wrote:
> On Tue, 2017-02-14 at 13:09 -0600, Segher Boessenkool wrote:
>> On Tue, Feb 14, 2017 at 11:56:50AM -0600, Aaron Sawdey wrote:
>> > This testcase I added failed to compile on AIX or older linux due
>> > to
>> > the use of aligned_alloc(). Now fixed to use posix_memalign if
>> > available, and valloc otherwise.
>> >
>> > Now it compiles and passes on x86_64 (fedora 25), ppc64 (RHEL6.8),
>> > and
>> > AIX. OK for trunk?
>>
>> Is valloc preferable to aligned_alloc on all systems where
>> posix_memalign
>> does not exist?  Okay for trunk if so.  Thanks,
>>
>>
>> Segher
>
> My reasoning here was to use the modern function (posix_memalign) if
> available and otherwise fall back to valloc which is in glibc dating
> back to 1996 and openbsd's man page says it was added in BSD 3.0 so
> pretty much anything should have it.

Recent AIX does provide aligned_alloc() and posix_memalign().

- David


[RFC PATCH] Improve switchconv optimization (PR tree-optimization/79472)

2017-02-14 Thread Jakub Jelinek
Hi!

The following patch is an attempt to fix a regression where we no longer
switch-convert one switch because earlier optimizations turn it into an
unsupported shape.

The patch contains two important changes (that can perhaps be split off
separately):
1) handle virtual PHIs; while because we require all the switch bbs
   to be empty, all the edges from the switch to final_bb should have the
   same virtual op SSA_NAME, if the final_bb is reachable through other
   edges from other code, it might have virtual PHI and switchconv would
   unnecessarily give up
2) if the switch cases form contiguous range, there is no need to require
   anything about the default case, it can be abort, or arbitrary code
   that can or might not fall through into final_bb, or it can e.g. be
   empty bb and just the values from default bb might not be appropriate
   constants; we emit an if (val is in range) vals = CSWTCH[...]; else
   anyway and the else can be anything; we still have to require standard
   default case if the range is non-contiguous, because then the default
   is used also for some values in the tables.

Bootstrapped/regtested on x86_64-linux and i686-linux.  It causes a single
regression, vrp40.c, which looks like this:
int f(int a) {
 switch (a & 1) { case 0: case 1: return 3; case 2: return 5; case 3: return 7;
 case 4: return 11; case 5: return 13; case 6: return 17; case 7: return 19; }
}
and expects to be optimized into return 3 by vrp1.  As switchconv is earlier
than that, vrp1 sees:
  _1 = a_3(D) & 1;
  _4 = (unsigned int) _1;
  _5 = CSWTCH.1[_4];
  return _5;
and doesn't optimize it.  If the testcase had say case 7: replaced with
default:, it wouldn't pass already before.  If the patch is ok, what should
we do with vrp40.c?  Change it in some way (e.g. return variable in one
case) so that switchconv doesn't trigger, or add an optimization in vrp
if we see a load from constant array with known initializer and the range
is small enough and contains the same value for all those values, replace
it with the value?  It would help also with say:
const int a[] = { 7, 8, 9, 1, 1, 1, 1, 2, 3, 4, 5, 6 };
int foo (int x)
{
  if (x <= 2 || x >= 7) return 3;
  return a[x];
}
turn it into
int foo (int x)
{
  if (x <= 2 || x >= 7) return 3;
  return 1;
}

2017-02-14  Jakub Jelinek  

PR tree-optimization/79472
* tree-switch-conversion.c (struct switch_conv_info): Add
contiguous_range and default_case_nonstandard fields.
(collect_switch_conv_info): Compute contiguous_range and
default_case_nonstandard fields, don't clear final_bb if
contiguous_range and only the default case doesn't have the required
structure.
(check_all_empty_except_final): Set default_case_nonstandard instead
of failing if contiguous_range and the default case doesn't have empty
block.
(check_final_bb): Add SWTCH argument, don't fail if contiguous_range
and only the default case doesn't have the required constants.  Skip
virtual phis.
(gather_default_values): Skip virtual phis.  Allow non-NULL CASE_LOW
if default_case_nonstandard.
(build_constructors): Build constant 1 just once.  Assert that default
values aren't inserted in between cases if contiguous_range.  Skip
virtual phis.
(build_arrays): Skip virtual phis.
(prune_bbs): Add DEFAULT_BB argument, don't remove that bb.
(fix_phi_nodes): Don't add e2f phi arg if default_case_nonstandard.
Handle virtual phis.
(gen_inbound_check): Handle default_case_nonstandard case.
(process_switch): Adjust check_final_bb caller.  Call
gather_default_values with the first non-default case instead of
default case if default_case_nonstandard.

* gcc.dg/tree-ssa/cswtch-3.c: New test.
* gcc.dg/tree-ssa/cswtch-4.c: New test.
* gcc.dg/tree-ssa/cswtch-5.c: New test.

--- gcc/tree-switch-conversion.c.jj 2017-02-14 14:54:08.020975500 +0100
+++ gcc/tree-switch-conversion.c	2017-02-14 17:09:01.162826954 +0100
@@ -592,6 +592,14 @@ struct switch_conv_info
  dump file, if there is one.  */
   const char *reason;
 
+  /* True if default case is not used for any value between range_min and
+ range_max inclusive.  */
+  bool contiguous_range;
+
+  /* True if default case does not have the required shape for other case
+ labels.  */
+  bool default_case_nonstandard;
+
   /* Parameters for expand_switch_using_bit_tests.  Should be computed
  the same way as in expand_case.  */
   unsigned int uniq;
@@ -606,8 +614,9 @@ collect_switch_conv_info (gswitch *swtch
   unsigned int branch_num = gimple_switch_num_labels (swtch);
   tree min_case, max_case;
   unsigned int count, i;
-  edge e, e_default;
+  edge e, e_default, e_first;
   edge_iterator ei;
+  basic_block first;
 
   memset (info, 0, sizeof (*info));
 
@@ -616,8 +625,8 @@ collect_switch_conv_info (gswitch *swtch

[committed] Fix memory leak in oacc code (PR middle-end/79505)

2017-02-14 Thread Jakub Jelinek
Hi!

We leak the loop->ifns vectors, fixed thusly, bootstrapped/regtested on
x86_64-linux and i686-linux, committed to trunk.

The first 2 hunks are just cleanup, loop is allocated using XCNEW and thus
cleared, no need to clear anything again (especially when it isn't all
but just random subset of the fields).

2017-02-14  Jakub Jelinek  

PR middle-end/79505
* omp-offload.c (free_oacc_loop): Release loop->ifns vector.
(new_oacc_loop_raw): Don't clear already cleared fields.

--- gcc/omp-offload.c.jj2017-02-09 16:16:01.0 +0100
+++ gcc/omp-offload.c   2017-02-14 13:23:24.198091131 +0100
@@ -681,7 +681,6 @@ new_oacc_loop_raw (oacc_loop *parent, lo
   oacc_loop *loop = XCNEW (oacc_loop);
 
   loop->parent = parent;
-  loop->child = loop->sibling = NULL;
 
   if (parent)
 {
@@ -690,15 +689,6 @@ new_oacc_loop_raw (oacc_loop *parent, lo
 }
 
   loop->loc = loc;
-  loop->marker = NULL;
-  memset (loop->heads, 0, sizeof (loop->heads));
-  memset (loop->tails, 0, sizeof (loop->tails));
-  loop->routine = NULL_TREE;
-
-  loop->mask = loop->e_mask = loop->flags = loop->inner = 0;
-  loop->chunk_size = 0;
-  loop->head_end = NULL;
-
   return loop;
 }
 
@@ -773,6 +763,7 @@ free_oacc_loop (oacc_loop *loop)
   if (loop->child)
 free_oacc_loop (loop->child);
 
+  loop->ifns.release ();
   free (loop);
 }
 

Jakub


Re: [PATCH] portability fix for gcc.dg/strncmp-2.c testcase

2017-02-14 Thread Aaron Sawdey
On Tue, 2017-02-14 at 13:09 -0600, Segher Boessenkool wrote:
> On Tue, Feb 14, 2017 at 11:56:50AM -0600, Aaron Sawdey wrote:
> > This testcase I added failed to compile on AIX or older linux due
> > to
> > the use of aligned_alloc(). Now fixed to use posix_memalign if
> > available, and valloc otherwise.
> > 
> > Now it compiles and passes on x86_64 (fedora 25), ppc64 (RHEL6.8),
> > and
> > AIX. OK for trunk?
> 
> Is valloc preferable to aligned_alloc on all systems where
> posix_memalign
> does not exist?  Okay for trunk if so.  Thanks,
> 
> 
> Segher

My reasoning here was to use the modern function (posix_memalign) if
available and otherwise fall back to valloc which is in glibc dating
back to 1996 and openbsd's man page says it was added in BSD 3.0 so
pretty much anything should have it.

Thanks,
Aaron


> 
> 
> > 2017-02-14  Aaron Sawdey  
> > 
> > * gcc.dg/strncmp-2.c: Portability fixes.
> 
> 
-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-14 Thread Martin Sebor

On 02/14/2017 09:39 AM, Jakub Jelinek wrote:

On Tue, Feb 14, 2017 at 09:36:44AM -0700, Martin Sebor wrote:

@@ -1371,7 +1354,8 @@ format_integer (const directive , tr
   else
 {
   res.range.likely = res.range.min;
-  if (likely_adjust && maybebase && base != 10)
+  if (maybebase && base != 10
+ && (tree_int_cst_sgn (argmin) < 0 || tree_int_cst_sgn (argmax) > 0))
{
  if (res.range.min == 1)
res.range.likely += base == 8 ? 1 : 2;


You've removed all the comments that explain what's going on.  If
you must make the change (I see no justification for it now) please
at least document it.


I thought that is the:
  /* Add the adjustment for an argument whose range includes zero
 since it doesn't include the octal or hexadecimal base prefix.  */
comment above the if.


That comment explains how the likely_adjust variable ("the adjustment")
is being used, or more precisely, how it was being used in the first
version of the patch.  The comment became somewhat out of date with
the committed version of the patch (this was my bad).

The variable is documented where it's defined and again where it's
assigned to.  With the removal of those comments it seems especially
important that the only remaining description of what's going on be
accurate.

The comment is outdated because it refers to "the adjustment" which
doesn't exist anymore.  (It was replaced by a flag in my commit).
To bring it up to date it should say something like:

  /* Set the LIKELY counter to MIN.  In base 8 and 16, when
 the argument is in range that includes zero, adjust it
 upward to include the length of the base prefix since
 in that case the MIN counter does include it.  */

On a separate note, while testing the patch I noticed that it's
not exactly equivalent to what's on trunk.  Trunk silently accepts
the call below but with the patch it complains.  That's great (it
should complain) but the change should be tested.  More to my point,
while in this case your change happened to fix a subtle bug (which
I'm certainly happy about), it could have just as easily introduced
one.

  char d[2];

  void f (unsigned i)
  {
if (i < 1234 || 12345 < i)
  i = 1234;

__builtin_sprintf (d, "%#hhx", i);
  }

Martin


Re: [PATCH] portability fix for gcc.dg/strncmp-2.c testcase

2017-02-14 Thread Segher Boessenkool
On Tue, Feb 14, 2017 at 11:56:50AM -0600, Aaron Sawdey wrote:
> This testcase I added failed to compile on AIX or older linux due to
> the use of aligned_alloc(). Now fixed to use posix_memalign if
> available, and valloc otherwise.
> 
> Now it compiles and passes on x86_64 (fedora 25), ppc64 (RHEL6.8), and
> AIX. OK for trunk?

Is valloc preferable to aligned_alloc on all systems where posix_memalign
does not exist?  Okay for trunk if so.  Thanks,


Segher


> 2017-02-14  Aaron Sawdey  
> 
>   * gcc.dg/strncmp-2.c: Portability fixes.


RE: [PATCH 3/5] Support WORD_REGISTER_OPERATIONS requirements in simplify_operand_subreg

2017-02-14 Thread Matthew Fortune
Sorry for the slow reply, been away for a few days

Eric Botcazou  writes:
> > This patch is a minimal change to prevent (subreg(mem)) from being
> > simplified to use the outer mode for WORD_REGISTER_OPERATIONS.  There
> > is high probability of refining and/or re-implementing this for GCC 8
> > but such a change would be too invasive.  This change at least ensures
> > correctness but may prevent simplification of some acceptable cases.
> 
> This one causes:
> 
> +FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
> +WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  compilation failed to produce executable
> +FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -g  (test for excess errors)
> +WARNING: gcc.dg/torture/builtin-complex-1.c   -O3 -g  compilation failed to produce executable
> +WARNING: program timed out.
> +WARNING: program timed out.
> 
> on SPARC 32-bit, i.e. LRA hangs.  Reduced testcase attached, compile at
> -O3 with a cc1 configured for sparc-sun-solaris2.10.
> 
> > gcc/
> > PR target/78660
> > * lra-constraints.c (simplify_operand_subreg): Handle
> > WORD_REGISTER_OPERATIONS targets.
> > ---
> >  gcc/lra-constraints.c | 17 -
> >  1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> > index 66ff2bb..484a70d 100644
> > --- a/gcc/lra-constraints.c
> > +++ b/gcc/lra-constraints.c
> > @@ -1541,11 +1541,18 @@ simplify_operand_subreg (int nop, machine_mode reg_mode)
> >  subregs as we don't substitute such equiv memory (see
> >  processing equivalences in function lra_constraints) and because for
> >  spilled pseudos we allocate stack memory enough for the biggest
> > -corresponding paradoxical subreg.  */
> > - if (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > -   && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > - || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > - && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg
> > +corresponding paradoxical subreg.
> > +
> > +However, never simplify a (subreg (mem ...)) for
> > +WORD_REGISTER_OPERATIONS targets as this may lead to loading junk
> > +data into a register when the inner is narrower than outer or
> > +missing important data from memory when the inner is wider than
> > +outer.  */
> > + if (!WORD_REGISTER_OPERATIONS
> > + && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
> > +   && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> > + || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> > + && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)
> > return true;
> >
> >   *curr_id->operand_loc[nop] = operand;
> 
> I think that we might need:
> 
>   if (!(GET_MODE_PRECISION (mode) > GET_MODE_PRECISION (innermode)
>   && WORD_REGISTER_OPERATIONS)
>   && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
>   && SLOW_UNALIGNED_ACCESS (mode, MEM_ALIGN (subst)))
> || (MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (innermode)
> && SLOW_UNALIGNED_ACCESS (innermode, MEM_ALIGN (reg)
> return true;
> 
> i.e. we force the reloading only for paradoxical subregs.

Maybe, though I'm not entirely convinced.  For WORD_REGISTER_OPERATIONS
both paradoxical and normal SUBREGs can be a problem as the inner mode
in both cases can be used elsewhere for a reload making the content
of the spill slot wrong either in the subreg reload or the ordinary
reload elsewhere. However, the condition can be tightened; I should
not have made it so simplistic I guess. I.e. both modes must be no
wider than a word and must be different precision to force an inner
reload.

Adding that check would fix this case for SPARC and should be fine
for MIPS but I need to wait for a bootstrap build to be sure.

I don't really understand why LRA can't reload this for SPARC though
as I'm not sure there is any guarantee provided to backends that some
SUBREGs will be reloaded using their outer mode.  If there is such a
guarantee then it would be much easier to reason about this logic but
as it stands I suspect we are having to tweak LRA to cope with
assumptions made in various targets that happen to have held true (and
I have no doubt that MIPS has some of these as well especially in terms
of the FP/GP register usage with float and int modes.)  All being well
we can capture such assumptions and formalise them so we ensure they
hold true (or modify backends appropriately I guess).

The condition would look like this. What do you think?

  if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION (innermode)
&& GET_MODE_SIZE (mode) <= UNITS_PER_WORD
 

Re: [ARM] Enable descriptors for nested functions in Ada

2017-02-14 Thread Eric Botcazou
> Is this ABI, or private to a release of the compiler?  If the latter,
> then OK.  Otherwise, I don't think we should presume that the reserved
> bits won't get used.

The latter, there is no fixed ABI for Ada.

-- 
Eric Botcazou


Re: [Aarch64] Enable descriptors for nested functions in Ada

2017-02-14 Thread Eric Botcazou
> Doesn't this imply a minimum function alignment of 8? That's not guaranteed
> on AArch64, at least -mcpu=exynos-m1 uses 4-byte alignment.

Well, the initial setting was 2, which would have required 4-byte alignment 
only and would have been perfectly fine IMO, but it was deemed problematic, 
hence the new value.  And, yes, the functions will be overaligned.

-- 
Eric Botcazou


[PATCH] portability fix for gcc.dg/strncmp-2.c testcase

2017-02-14 Thread Aaron Sawdey
This testcase I added failed to compile on AIX or older linux due to
the use of aligned_alloc(). Now fixed to use posix_memalign if
available, and valloc otherwise.

Now it compiles and passes on x86_64 (fedora 25), ppc64 (RHEL6.8), and
AIX. OK for trunk?

2017-02-14  Aaron Sawdey  

* gcc.dg/strncmp-2.c: Portability fixes.

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain

Index: strncmp-2.c
===
--- strncmp-2.c	(revision 245439)
+++ strncmp-2.c	(working copy)
@@ -19,7 +19,12 @@
 {
   long pgsz = sysconf(_SC_PAGESIZE);
   char buf1[sz+1];
-  char *buf2 = aligned_alloc(pgsz,2*pgsz);
+  char *buf2;
+#if _POSIX_C_SOURCE >= 200112L
+  if ( posix_memalign ((void **) &buf2, pgsz, 2*pgsz) ) abort ();
+#else
+  if ( !(buf2 = valloc(2*pgsz))) abort ();
+#endif
   char *p2;
   int r,i,e;
 
@@ -35,6 +40,7 @@
 e = lib_memcmp(buf1,p2,sz);
 (*test_memcmp)(buf1,p2,e);
   }
+  free(buf2);
 }
 
 #define RUN_TEST(SZ) test_driver_strncmp (test_strncmp_ ## SZ, test_memcmp_ ## SZ, SZ);


[PATCH, i386]: Fix PR61225, -fshrink-wrap interference with RMW peepholes

2017-02-14 Thread Uros Bizjak
Hello!

It turned out that with gcc.target/i386/pr49095.c, the default
-fshrink-wrap setting interferes with short testcases that test
various read-modify-write peephole patterns.

Use -fno-shrink-wrap to keep correct RTL sequences that test the transformation.

2017-02-14  Uros Bizjak  

PR middle-end/61225
* gcc.target/i386/pr49095.c: Add -fno-shrink-wrap to dg-options.
Use dg-additional-options for ia32 target.  Remove XFAIL.

Tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN, will be backported to gcc-6 branch.

Uros.
Index: gcc.target/i386/pr49095.c
===
--- gcc.target/i386/pr49095.c   (revision 245433)
+++ gcc.target/i386/pr49095.c   (working copy)
@@ -1,7 +1,7 @@
 /* PR rtl-optimization/49095 */
 /* { dg-do compile } */
-/* { dg-options "-Os" } */
-/* { dg-options "-Os -mregparm=2" { target ia32 } } */
+/* { dg-options "-Os -fno-shrink-wrap" } */
+/* { dg-additional-options "-mregparm=2" { target ia32 } } */
 
 void foo (void *);
 
@@ -70,5 +70,4 @@
 G (int)
 G (long)
 
-/* See PR61225 for the XFAIL.  */
-/* { dg-final { scan-assembler-not "test\[lq\]" { xfail { ia32 } } } } */
+/* { dg-final { scan-assembler-not "test\[lq\]" } } */


[PATCH, i386]: Fix PR79495, ICE in extract_constrain_insn with -msoft-float

2017-02-14 Thread Uros Bizjak
Attached patch adds a correct alternative for 64-bit targets. On these
targets, we have to prevent alternatives that split to moves from
DImode immediates outside the signed 32-bit range to memory.

2017-02-14  Uros Bizjak  

PR target/79495
* config/i386/i386.md (*movxf_internal): Add (o,rC) alternative.

testsuite/ChangeLog:

2017-02-14  Uros Bizjak  

PR target/79495
* gcc.target/i386/pr79495.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to gcc-6 branch.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 245433)
+++ config/i386/i386.md (working copy)
@@ -3248,9 +3248,9 @@
 ;; in alternatives 4, 6, 7 and 8.
 (define_insn "*movxf_internal"
   [(set (match_operand:XF 0 "nonimmediate_operand"
-"=f,m,f,?r ,!o,?*r ,!o,!o,!o,r  ,o")
+"=f,m,f,?r ,!o,?*r ,!o,!o,!o,r  ,o ,o")
(match_operand:XF 1 "general_operand"
-"fm,f,G,roF,r , *roF,*r,F ,C,roF,rF"))]
+"fm,f,G,roF,r ,*roF,*r,F ,C ,roF,rF,rC"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& (lra_in_progress || reload_completed
|| !CONST_DOUBLE_P (operands[1])
@@ -3277,19 +3277,19 @@
 }
 }
   [(set (attr "isa")
-   (cond [(eq_attr "alternative" "7")
+   (cond [(eq_attr "alternative" "7,10")
 (const_string "nox64")
-  (eq_attr "alternative" "8")
+  (eq_attr "alternative" "8,11")
 (const_string "x64")
  ]
  (const_string "*")))
(set (attr "type")
-   (cond [(eq_attr "alternative" "3,4,5,6,7,8,9,10")
+   (cond [(eq_attr "alternative" "3,4,5,6,7,8,9,10,11")
 (const_string "multi")
  ]
  (const_string "fmov")))
(set (attr "mode")
-   (cond [(eq_attr "alternative" "3,4,5,6,7,8,9,10")
+   (cond [(eq_attr "alternative" "3,4,5,6,7,8,9,10,11")
 (if_then_else (match_test "TARGET_64BIT")
   (const_string "DI")
   (const_string "SI"))
@@ -3300,7 +3300,7 @@
   (symbol_ref "false")]
(symbol_ref "true")))
(set (attr "enabled")
- (cond [(eq_attr "alternative" "9,10")
+ (cond [(eq_attr "alternative" "9,10,11")
   (if_then_else
(match_test "TARGET_HARD_XF_REGS")
(symbol_ref "false")
Index: testsuite/gcc.target/i386/pr79495.c
===
--- testsuite/gcc.target/i386/pr79495.c (nonexistent)
+++ testsuite/gcc.target/i386/pr79495.c (working copy)
@@ -0,0 +1,11 @@
+/* PR target/79495 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msoft-float" } */
+
+long double dnan = 1.0l/0.0l - 1.0l/0.0l;
+long double x = 1.0l;
+void fn1 (void)
+{
+  if (dnan != x)
+x = 1.0;
+}


Re: C++ PATCH to fix a couple of ice-on-invalid with incomplete type (PR c++/79420, c++/79463)

2017-02-14 Thread Jason Merrill
OK.

On Tue, Feb 14, 2017 at 11:40 AM, Marek Polacek  wrote:
> In both these PRs the problem is the same: we have a non-dependent incomplete
> postfix expression in a template, and since r245223 we treat it as dependent
> (with a pedwarn), and erase its type.  For OVERLOADs this is bad because we'll hit
> this in tsubst_copy:
> case OVERLOAD:
>   /* An OVERLOAD will always be a non-dependent overload set; an
>  overload set from function scope will just be represented with an
>  IDENTIFIER_NODE, and from class scope with a BASELINK.  */
>   gcc_assert (!uses_template_parms (t));
>
> and for VAR_DECLs it's bad because the subsequent code cannot cope with its
> null type.  Jason suggested to only clobber EXPR_P trees and said that these
> could stay dependent.  But only resetting the type for EXPR_Ps would mean that
> we'd print the incomplete type diagnostics twice, so I decided to do this
> instead.  It's all invalid code, so it doesn't seem to be a problem to skip the
> dependent_p = true; line and resetting the scope.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2017-02-14  Marek Polacek  
>
> PR c++/79420
> PR c++/79463
> * parser.c (cp_parser_postfix_dot_deref_expression): Avoid
> clobbering if the postfix expression isn't an EXPR_P.
>
> * g++.dg/cpp1y/pr79463.C: New.
> * g++.dg/template/incomplete10.C: New.
> * g++.dg/template/incomplete9.C: New.
>
> diff --git gcc/cp/parser.c gcc/cp/parser.c
> index ce45bba..ccafefd 100644
> --- gcc/cp/parser.c
> +++ gcc/cp/parser.c
> @@ -7331,7 +7331,9 @@ cp_parser_postfix_dot_deref_expression (cp_parser *parser,
>(scope, current_class_type
> {
>   scope = complete_type (scope);
> - if (!COMPLETE_TYPE_P (scope))
> + if (!COMPLETE_TYPE_P (scope)
> + /* Avoid clobbering e.g. OVERLOADs or DECLs.  */
> + && EXPR_P (postfix_expression))
> {
>   /* In a template, be permissive by treating an object expression
>  of incomplete type as dependent (after a pedwarn).  */
> diff --git gcc/testsuite/g++.dg/cpp1y/pr79463.C gcc/testsuite/g++.dg/cpp1y/pr79463.C
> index e69de29..fdf668b 100644
> --- gcc/testsuite/g++.dg/cpp1y/pr79463.C
> +++ gcc/testsuite/g++.dg/cpp1y/pr79463.C
> @@ -0,0 +1,7 @@
> +// PR c++/79463
> +// { dg-options "-g" }
> +// { dg-do compile { target c++14 } }
> +
> +struct A;
> +extern A a; // { dg-error "'a' has incomplete type" }
> +template < int > int f = a.x;
> diff --git gcc/testsuite/g++.dg/template/incomplete10.C gcc/testsuite/g++.dg/template/incomplete10.C
> index e69de29..f0b406d 100644
> --- gcc/testsuite/g++.dg/template/incomplete10.C
> +++ gcc/testsuite/g++.dg/template/incomplete10.C
> @@ -0,0 +1,13 @@
> +// PR c++/79420
> +
> +struct S;
> +extern S s; // { dg-error "'s' has incomplete type" }
> +template <int> int f ()
> +{
> +  return s.x;
> +}
> +
> +void g ()
> +{
> +  f<0> ();
> +}
> diff --git gcc/testsuite/g++.dg/template/incomplete9.C gcc/testsuite/g++.dg/template/incomplete9.C
> index e69de29..9e03232 100644
> --- gcc/testsuite/g++.dg/template/incomplete9.C
> +++ gcc/testsuite/g++.dg/template/incomplete9.C
> @@ -0,0 +1,11 @@
> +// PR c++/79420
> +
> +template <int> int f ()
> +{
> +  return f.x; // { dg-error "overloaded function with no contextual type information" }
> +}
> +
> +void g ()
> +{
> +  f<0> ();
> +}
>
> Marek


[PATCH] PR target/79498: Properly store 128-bit constant in large model

2017-02-14 Thread H.J. Lu
When converting a TI store with a CONST_INT to a V1TI store with a CONST_VECTOR
in the large model, an extra instruction may be needed to load the CONST_VECTOR
into a register.  Insert the extra instruction in the right place.

Tested on x86-64.  I am checking in this pre-approved patch.

Thanks.

H.J.
---
gcc/

PR target/79498
* config/i386/i386.c (timode_scalar_chain::convert_insn): Insert
the extra instruction to the right place to store 128-bit constant
when needed.

gcc/testsuite/

PR target/79498
* gcc.target/i386/pr79498.c: New test.
---
 gcc/config/i386/i386.c  |  5 +
 gcc/testsuite/gcc.target/i386/pr79498.c | 20 
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr79498.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d7dce4b..02287fd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3956,8 +3956,13 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
  /* Since there are no instructions to store 128-bit constant,
 temporary register usage is required.  */
  rtx tmp = gen_reg_rtx (V1TImode);
+ start_sequence ();
  src = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec (1, src));
  src = validize_mem (force_const_mem (V1TImode, src));
+ rtx_insn *seq = get_insns ();
+ end_sequence ();
+ if (seq)
+   emit_insn_before (seq, insn);
  emit_conversion_insns (gen_rtx_SET (dst, tmp), insn);
  dst = tmp;
}
diff --git a/gcc/testsuite/gcc.target/i386/pr79498.c b/gcc/testsuite/gcc.target/i386/pr79498.c
new file mode 100644
index 000..8f62393
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr79498.c
@@ -0,0 +1,20 @@
+/* PR target/79498 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -mno-avx512f -mcmodel=large -Wno-psabi" } */
+
+typedef unsigned U __attribute__ ((vector_size (64)));
+typedef unsigned __int128 V __attribute__ ((vector_size (64)));
+
+static inline V
+bar (U u, U x, V v)
+{
+  v = (V)(U) { 0, ~0 };
+  v[x[0]] <<= u[-63];
+  return v;
+}
+
+V
+foo (U u)
+{
+  return bar (u, (U) {}, (V) {});
+}
-- 
2.9.3



C++ PATCH to fix a couple of ice-on-invalid with incomplete type (PR c++/79420, c++/79463)

2017-02-14 Thread Marek Polacek
In both these PRs the problem is the same: we have a non-dependent incomplete
postfix expression in a template, and since r245223 we treat it as dependent
(with a pedwarn), and erase its type.  For OVERLOADs this is bad because we'll hit
this in tsubst_copy:
case OVERLOAD:
  /* An OVERLOAD will always be a non-dependent overload set; an
 overload set from function scope will just be represented with an
 IDENTIFIER_NODE, and from class scope with a BASELINK.  */
  gcc_assert (!uses_template_parms (t)); 

and for VAR_DECLs it's bad because the subsequent code cannot cope with its
null type.  Jason suggested to only clobber EXPR_P trees and said that these
could stay dependent.  But only resetting the type for EXPR_Ps would mean that
we'd print the incomplete type diagnostics twice, so I decided to do this
instead.  It's all invalid code, so it doesn't seem to be a problem to skip the
dependent_p = true; line and resetting the scope.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-02-14  Marek Polacek  

PR c++/79420
PR c++/79463
* parser.c (cp_parser_postfix_dot_deref_expression): Avoid
clobbering if the postfix expression isn't an EXPR_P.

* g++.dg/cpp1y/pr79463.C: New.
* g++.dg/template/incomplete10.C: New.
* g++.dg/template/incomplete9.C: New.

diff --git gcc/cp/parser.c gcc/cp/parser.c
index ce45bba..ccafefd 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -7331,7 +7331,9 @@ cp_parser_postfix_dot_deref_expression (cp_parser *parser,
   (scope, current_class_type
{
  scope = complete_type (scope);
- if (!COMPLETE_TYPE_P (scope))
+ if (!COMPLETE_TYPE_P (scope)
+ /* Avoid clobbering e.g. OVERLOADs or DECLs.  */
+ && EXPR_P (postfix_expression))
{
  /* In a template, be permissive by treating an object expression
 of incomplete type as dependent (after a pedwarn).  */
diff --git gcc/testsuite/g++.dg/cpp1y/pr79463.C gcc/testsuite/g++.dg/cpp1y/pr79463.C
index e69de29..fdf668b 100644
--- gcc/testsuite/g++.dg/cpp1y/pr79463.C
+++ gcc/testsuite/g++.dg/cpp1y/pr79463.C
@@ -0,0 +1,7 @@
+// PR c++/79463
+// { dg-options "-g" }
+// { dg-do compile { target c++14 } }
+
+struct A;
+extern A a; // { dg-error "'a' has incomplete type" }
+template < int > int f = a.x;
diff --git gcc/testsuite/g++.dg/template/incomplete10.C gcc/testsuite/g++.dg/template/incomplete10.C
index e69de29..f0b406d 100644
--- gcc/testsuite/g++.dg/template/incomplete10.C
+++ gcc/testsuite/g++.dg/template/incomplete10.C
@@ -0,0 +1,13 @@
+// PR c++/79420
+
+struct S;
+extern S s; // { dg-error "'s' has incomplete type" }
+template <int> int f ()
+{
+  return s.x;
+}
+
+void g ()
+{
+  f<0> ();
+}
diff --git gcc/testsuite/g++.dg/template/incomplete9.C gcc/testsuite/g++.dg/template/incomplete9.C
index e69de29..9e03232 100644
--- gcc/testsuite/g++.dg/template/incomplete9.C
+++ gcc/testsuite/g++.dg/template/incomplete9.C
@@ -0,0 +1,11 @@
+// PR c++/79420
+
+template <int> int f ()
+{
+  return f.x; // { dg-error "overloaded function with no contextual type information" }
+}
+
+void g ()
+{
+  f<0> ();
+}

Marek


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-14 Thread Jakub Jelinek
On Tue, Feb 14, 2017 at 09:36:44AM -0700, Martin Sebor wrote:
> > @@ -1371,7 +1354,8 @@ format_integer (const directive , tr
> >else
> >  {
> >res.range.likely = res.range.min;
> > -  if (likely_adjust && maybebase && base != 10)
> > +  if (maybebase && base != 10
> > + && (tree_int_cst_sgn (argmin) < 0 || tree_int_cst_sgn (argmax) > 0))
> > {
> >   if (res.range.min == 1)
> > res.range.likely += base == 8 ? 1 : 2;
> 
> You've removed all the comments that explain what's going on.  If
> you must make the change (I see no justification for it now) please
> at least document it.

I thought that is the:
  /* Add the adjustment for an argument whose range includes zero
 since it doesn't include the octal or hexadecimal base prefix.  */
comment above the if.

Jakub


Re: [PATCH][RFA][target/79404] Fix uninitialized reference to ira_register_move_cost[mode]

2017-02-14 Thread Vladimir Makarov

On 02/14/2017 01:30 AM, Jeff Law wrote:


So imagine we have two allocnos related by a copy chain (two operand 
architecture).


(gdb) p *cp->first
$11 = {num = 9, regno = 33, mode = DImode, wmode = DImode, aclass = GENERAL_REGS, dont_reassign_p = 0,
  bad_spill_p = 0, assigned_p = 1, conflict_vec_p = 0, hard_regno = -1, next_regno_allocno = 0x0,
  loop_tree_node = 0x1e0b190, nrefs = 13, freq = 8069, class_cost = 1380, updated_class_cost = 1380,
  memory_cost = 29656, updated_memory_cost = 29656, excess_pressure_points_num = 17, allocno_prefs = 0x0,
  allocno_copies = 0x1e4b400, cap = 0x0, cap_member = 0x0, num_objects = 1, objects = {0x1e8b6a0, 0x0},
  call_freq = 0, calls_crossed_num = 0, cheap_calls_crossed_num = 0, crossed_calls_clobbered_regs = 0,
  hard_reg_costs = 0x1da9510, updated_hard_reg_costs = 0x0, conflict_hard_reg_costs = 0x0,
  updated_conflict_hard_reg_costs = 0x0, add_data = 0x1e04378}

(gdb) p *cp->second
$12 = {num = 12, regno = 39, mode = SImode, wmode = SImode, aclass = GENERAL_REGS, dont_reassign_p = 0,
  bad_spill_p = 1, assigned_p = 1, conflict_vec_p = 0, hard_regno = 2, next_regno_allocno = 0x0,
  loop_tree_node = 0x1e0b190, nrefs = 2, freq = 388, class_cost = 0, updated_class_cost = 0, memory_cost = 1552,
  updated_memory_cost = 1552, excess_pressure_points_num = 0, allocno_prefs = 0x0, allocno_copies = 0x1e4b400,
  cap = 0x0, cap_member = 0x0, num_objects = 2, objects = {0x1e8b7e0, 0x1e8b830}, call_freq = 0,
  calls_crossed_num = 0, cheap_calls_crossed_num = 0, crossed_calls_clobbered_regs = 0,
  hard_reg_costs = 0x1da9550, updated_hard_reg_costs = 0x0, conflict_hard_reg_costs = 0x0,
  updated_conflict_hard_reg_costs = 0x0, add_data = 0x1e04480}


Note how cp->first is mode DImode.

Now assume that all the real uses of cp->first occur as SUBREG 
expressions.  But there is a DImode clobber of cp->first.  Like this:



(insn 7 2 3 2 (clobber (reg/v:DI 33 [ u ])) "/home/gcc/GIT-2/gcc/libgcc/libgcc2.c":404 -1
     (nil))
(insn 3 7 4 2 (set (subreg:HI (reg/v:DI 33 [ u ]) 0)
        (mem/c:HI (reg/f:HI 9 ap) [4 u+0 S2 A16])) "/home/gcc/GIT-2/gcc/libgcc/libgcc2.c":404 5 {*movhi_h8300}
     (nil))
(insn 4 3 5 2 (set (subreg:HI (reg/v:DI 33 [ u ]) 2)
        (mem/c:HI (plus:HI (reg/f:HI 9 ap)
                (const_int 2 [0x2])) [4 u+2 S2 A16])) "/home/gcc/GIT-2/gcc/libgcc/libgcc2.c":404 5 {*movhi_h8300}
     (nil))
[ ... ]
(insn 35 32 37 5 (parallel [
            (set (reg:SI 39 [ _32 ])
                (lshiftrt:SI (subreg:SI (reg/v:DI 33 [ u ]) 0)
                    (subreg:QI (reg:HI 38) 1)))
            (clobber (scratch:QI))
        ]) "/home/gcc/GIT-2/gcc/libgcc/libgcc2.c":415 229 {*shiftsi}
     (expr_list:REG_DEAD (reg:HI 38)
        (expr_list:REG_DEAD (reg/v:DI 33 [ u ])
            (expr_list:REG_EQUIV (mem/j/c:SI (plus:HI (reg/f:HI 11 fp)
                        (const_int -4 [0xfffc])) [1 w.s.low+0 S4 A16])
                (nil)

There are other references to (reg 33), but again, they all use subregs.
The only real DImode reference to (reg 33) is in the clobber.  And
remember that (reg 33) is involved in a copy chain.



So we'll eventually call allocno_copy_cost_saving and try to compute a 
cost savings using:


2764  cost += cp->freq * ira_register_move_cost[allocno_mode][rclass][rclass];


But ira_register_move_cost[DImode] is NULL -- it's never been 
initialized, presumably because we never see a real DImode reference 
to anything except in CLOBBER statements.


We can fix this in scan_one_insn via the attached patch.  I'm not sure 
if this is the best place to catch this or not.


I haven't included a testcase as this trips just building libgcc on 
the H8 target.  I could easily reduce it if folks think its worth the 
trouble.


I've verified this allows libgcc to build on the H8 target and 
bootstrapped/regression tested the change on x86_64-unknown-linux-gnu 
as well.


Vlad, is this OK for the trunk, or should we be catching this elsewhere?
There is no harm in putting this fix in.  We could check initialization at
every usage, but that would have a big impact on IRA's speed.  Another place
would be at the beginning of find_costs_and_classes in the following loop:


  for (i = max_reg_num () - 1; i >= FIRST_PSEUDO_REGISTER; i--)
    regno_best_class[i] = NO_REGS;

Still, your place will most probably save CPU cycles.

So the patch is ok for me.  Thank you, Jeff.



Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-14 Thread Martin Sebor

On 02/14/2017 12:18 AM, Jakub Jelinek wrote:

On Mon, Feb 13, 2017 at 04:53:19PM -0700, Jeff Law wrote:

dirtype is one of the standard {un,}signed {char,short,int,long,long long}
types, all of them have 0 in their ranges.
For VR_RANGE we almost always set res.knownrange to true:
  /* Set KNOWNRANGE if the argument is in a known subrange
 of the directive's type (KNOWNRANGE may be reset below).  */
  res.knownrange
= (!tree_int_cst_equal (TYPE_MIN_VALUE (dirtype), argmin)
   || !tree_int_cst_equal (TYPE_MAX_VALUE (dirtype), argmax));
(the exception is in case that range clearly has to include zero),
and reset it only if adjust_range_for_overflow returned true, which means
it also set the range to TYPE_M{IN,AX}_VALUE (dirtype) and again
includes zero.
So IMNSHO likely_adjust in what you've committed is always true
when you use it and thus just a useless computation and something to make
the code harder to understand.

If KNOWNRANGE is false, then LIKELY_ADJUST will be true.  But I don't see
how we can determine anything for LIKELY_ADJUST if KNOWNRANGE is true.


We can't, but that doesn't matter, we only use it if KNOWNRANGE is false.
The only user of LIKELY_ADJUST is:

  if (res.knownrange)
res.range.likely = res.range.max;
  else
{
// -- Here we know res.knownrage is false
  res.range.likely = res.range.min;
  if (likely_adjust && maybebase && base != 10)
// -- and here is the only user of likely_adjust
{
  if (res.range.min == 1)
res.range.likely += base == 8 ? 1 : 2;
  else if (res.range.min == 2
   && base == 16
   && (dir.width[0] == 2 || dir.prec[0] == 2))
++res.range.likely;
}
}


Even if you don't trust this, with the ranges in argmin/argmax, it is
IMHO undesirable to set it differently at the different code paths,
if you want to check whether the final range includes zero and at least
one another value, just do
-  if (likely_adjust && maybebase && base != 10)
+  if ((tree_int_cst_sgn (argmin) < 0 || tree_int_cst_sgn (argmax) > 0)
   && maybebase && base != 10)
Though, it is useless both for the above reason and for the reason that you
actually do something only:

I'm not convinced it's useless, but it does seem advisable to bring the test
down to where it's actually used and to base it strictly on argmin/argmax.
Can you test a patch which does that?


That would then be (the only difference compared to the previous patch is
the last hunk) following.  I can surely test that, I'm still convinced it
would work equally if that
(tree_int_cst_sgn (argmin) < 0 || tree_int_cst_sgn (argmax) > 0)
is just gcc_checking_assert.

2017-02-14  Jakub Jelinek  

PR tree-optimization/79327
* gimple-ssa-sprintf.c (format_integer): Remove likely_adjust
variable, its initialization and use.

--- gcc/gimple-ssa-sprintf.c.jj 2017-02-04 08:43:12.0 +0100
+++ gcc/gimple-ssa-sprintf.c	2017-02-04 08:45:33.173709580 +0100
@@ -1232,10 +1232,6 @@ format_integer (const directive , tr
of the format string by returning [-1, -1].  */
 return fmtresult ();

-  /* True if the LIKELY counter should be adjusted upward from the MIN
- counter to account for arguments with unknown values.  */
-  bool likely_adjust = false;
-
   fmtresult res;

   /* Using either the range the non-constant argument is in, or its
@@ -1265,14 +1261,6 @@ format_integer (const directive , tr

  res.argmin = argmin;
  res.argmax = argmax;
-
- /* Set the adjustment for an argument whose range includes
-zero since that doesn't include the octal or hexadecimal
-base prefix.  */
- wide_int wzero = wi::zero (wi::get_precision (min));
- if (wi::le_p (min, wzero, SIGNED)
- && !wi::neg_p (max))
-   likely_adjust = true;
}
   else if (range_type == VR_ANTI_RANGE)
{
@@ -1307,11 +1295,6 @@ format_integer (const directive , tr

   if (!argmin)
 {
-  /* Set the adjustment for an argument whose range includes
-zero since that doesn't include the octal or hexadecimal
-base prefix.  */
-  likely_adjust = true;
-
   if (TREE_CODE (argtype) == POINTER_TYPE)
{
  argmin = build_int_cst (pointer_sized_int_node, 0);
@@ -1371,7 +1354,8 @@ format_integer (const directive , tr
   else
 {
   res.range.likely = res.range.min;
-  if (likely_adjust && maybebase && base != 10)
+  if (maybebase && base != 10
+ && (tree_int_cst_sgn (argmin) < 0 || tree_int_cst_sgn (argmax) > 0))
{
  if (res.range.min == 1)
res.range.likely += base == 8 ? 1 : 2;


You've removed all the comments that explain what's going on.  If
you must make the change (I see no justification for it now) please
at least document it.

Martin



[PATCH] rs6000: Synchronize the --with-cpu list in config.gcc with reality

2017-02-14 Thread Segher Boessenkool
power, power2, rios, rios1, rios2, rsc, rsc1 support was removed.
rs64a never was a supported option; it's spelled rs64.
power5+ and powerpc64le are supported options but could not be set as
default.


Segher


2017-02-13  Segher Boessenkool  

* config.gcc (supported_defaults) [powerpc*-*-*]: Update.

---
 gcc/config.gcc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index c7d3899..3d3bf65 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4260,8 +4260,9 @@ case "${target}" in
eval "with_$which=405"
;;
"" | common | native \
-   | power | power[23456789] | power6x | powerpc | powerpc64 \
-   | rios | rios1 | rios2 | rsc | rsc1 | rs64a \
+   | power[3456789] | power5+ | power6x \
+   | powerpc | powerpc64 | powerpc64le \
+   | rs64 \
| 401 | 403 | 405 | 405fp | 440 | 440fp | 464 | 464fp \
| 476 | 476fp | 505 | 601 | 602 | 603 | 603e | ec603e \
| 604 | 604e | 620 | 630 | 740 | 750 | 7400 | 7450 \
-- 
1.9.3



Re: [PATCH][GRAPHITE] Use generic isl-val interface, not gmp special one

2017-02-14 Thread Richard Biener
On February 14, 2017 4:50:32 PM GMT+01:00, Sebastian Pop wrote:
>On Tue, Feb 14, 2017 at 7:09 AM, Richard Biener wrote:
>>
>> This removes all GMP code from graphite and instead arranges to use
>> widest_ints plus the generic ISL interface for building/creating vals
>> by pieces.  This removes one gmp allocation per conversion plus allows
>> ISL to be built with IMath or IMath with small integer optimization
>> (on the host or in-tree).
>>
>> ISL 0.15 supports IMath already but not IMath with small integer
>> optimization.  I didn't adjust Makefile.def to choose anything other
>> than the GMP default for in-tree builds (yet).
>>
>> I built and tested GCC with ISL 0.15, 0.16.1 and 0.18 all with GMP
>> and with IMath (or IMath-32 where available).
>
>Have you checked the speedup of using gmp / imath / imath32
>when bootstrapping with BOOT_CFLAGS="-O2 -floop-nest-optimize"?

No, I didn't do any measurements.

Richard.

>>
>> Full bootstrap and regtest running on x86_64-unknown-linux-gnu
>> (with host ISL 0.16.1).
>>
>> Ok for trunk?
>>
>> Thanks,
>> Richard.
>>
>> 2017-02-14  Richard Biener  
>>
>> * graphite.h: Do not include isl/isl_val_gmp.h, instead
>include
>> isl/isl_val.h.
>> * graphite-isl-ast-to-gimple.c (gmp_cst_to_tree): Remove.
>> (gcc_expression_from_isl_expr_int): Use generic isl_val
>interface.
>> * graphite-sese-to-poly.c: Do not include isl/isl_val_gmp.h.
>> (isl_val_int_from_wi): New function.
>> (extract_affine_gmp): Rename to ...
>> (extract_affine_wi): ... this, take a widest_int.
>> (extract_affine_int): Just wrap extract_affine_wi.
>> (add_param_constraints): Use isl_val_int_from_wi.
>> (add_loop_constraints): Likewise, and extract_affine_wi.
>
>Looks good to me.  Thanks!
>
>Sebastian
>
>>
>> Index: gcc/graphite-isl-ast-to-gimple.c
>> ===
>> --- gcc/graphite-isl-ast-to-gimple.c(revision 245417)
>> +++ gcc/graphite-isl-ast-to-gimple.c(working copy)
>> @@ -73,22 +73,6 @@ struct ast_build_info
>>bool is_parallelizable;
>>  };
>>
>> -/* Converts a GMP constant VAL to a tree and returns it.  */
>> -
>> -static tree
>> -gmp_cst_to_tree (tree type, mpz_t val)
>> -{
>> -  tree t = type ? type : integer_type_node;
>> -  mpz_t tmp;
>> -
>> -  mpz_init (tmp);
>> -  mpz_set (tmp, val);
>> -  wide_int wi = wi::from_mpz (t, tmp, true);
>> -  mpz_clear (tmp);
>> -
>> -  return wide_int_to_tree (t, wi);
>> -}
>> -
>>  /* Verifies properties that GRAPHITE should maintain during
>translation.  */
>>
>>  static inline void
>> @@ -325,16 +309,20 @@ gcc_expression_from_isl_expr_int (tree t
>>  {
>>gcc_assert (isl_ast_expr_get_type (expr) == isl_ast_expr_int);
>>isl_val *val = isl_ast_expr_get_val (expr);
>> -  mpz_t val_mpz_t;
>> -  mpz_init (val_mpz_t);
>> +  size_t n = isl_val_n_abs_num_chunks (val, sizeof (HOST_WIDE_INT));
>> +  HOST_WIDE_INT *chunks = XALLOCAVEC (HOST_WIDE_INT, n);
>>tree res;
>> -  if (isl_val_get_num_gmp (val, val_mpz_t) == -1)
>> +  if (isl_val_get_abs_num_chunks (val, sizeof (HOST_WIDE_INT),
>chunks) == -1)
>>  res = NULL_TREE;
>>else
>> -res = gmp_cst_to_tree (type, val_mpz_t);
>> +{
>> +  widest_int wi = widest_int::from_array (chunks, n, true);
>> +  if (isl_val_is_neg (val))
>> +   wi = -wi;
>> +  res = wide_int_to_tree (type, wi);
>> +}
>>isl_val_free (val);
>>isl_ast_expr_free (expr);
>> -  mpz_clear (val_mpz_t);
>>return res;
>>  }
>>
>> Index: gcc/graphite-sese-to-poly.c
>> ===
>> --- gcc/graphite-sese-to-poly.c (revision 245417)
>> +++ gcc/graphite-sese-to-poly.c (working copy)
>> @@ -55,7 +55,6 @@ along with GCC; see the file COPYING3.
>>  #include 
>>  #include 
>>  #include 
>> -#include 
>>
>>  #include "graphite.h"
>>
>> @@ -154,16 +153,32 @@ extract_affine_name (scop_p s, tree e, _
>>return isl_pw_aff_alloc (dom, aff);
>>  }
>>
>> +/* Convert WI to a isl_val with CTX.  */
>> +
>> +static __isl_give isl_val *
>> +isl_val_int_from_wi (isl_ctx *ctx, const widest_int &wi)
>> +{
>> +  if (wi::neg_p (wi, SIGNED))
>> +{
>> +  widest_int mwi = -wi;
>> +  return isl_val_neg (isl_val_int_from_chunks (ctx, mwi.get_len
>(),
>> +  sizeof
>(HOST_WIDE_INT),
>> +  mwi.get_val ()));
>> +}
>> +  return isl_val_int_from_chunks (ctx, wi.get_len (), sizeof
>(HOST_WIDE_INT),
>> + wi.get_val ());
>> +}
>> +
>>  /* Extract an affine expression from the gmp constant G.  */
>>
>>  static isl_pw_aff *
>> -extract_affine_gmp (mpz_t g, __isl_take isl_space *space)
>> +extract_affine_wi (const widest_int &wi, __isl_take isl_space *space)
>>  {
>>isl_local_space *ls = isl_local_space_from_space (isl_space_copy
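As an aside for readers new to the chunk interface used in this patch, here is a small sketch of the layout isl_val_int_from_chunks expects: the absolute value split into host-word-sized pieces, least significant chunk first. It is illustration only — it uses a fixed 128-bit magnitude instead of widest_int, and the helper name is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a magnitude into host-word chunks, least significant first,
   mirroring the layout isl_val_int_from_chunks expects.  Capped at
   two 64-bit words for this illustration.  */
size_t
magnitude_to_chunks (unsigned __int128 v, uint64_t out[2])
{
  out[0] = (uint64_t) v;           /* low 64 bits */
  out[1] = (uint64_t) (v >> 64);   /* high 64 bits */
  return out[1] ? 2 : 1;           /* chunks actually needed */
}
```

The sign is handled separately, as in the patch's isl_val_int_from_wi: negate the magnitude first, build the isl_val from the chunks, then apply isl_val_neg.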

Re: [PATCH] Fix PR79460

2017-02-14 Thread Richard Biener
On February 14, 2017 4:19:05 PM GMT+01:00, "Bin.Cheng"  wrote:
>On Tue, Feb 14, 2017 at 2:48 PM, Richard Biener 
>wrote:
>>
>> The following enables final value replacement for floating point
>> expressions if -funsafe-math-optimizations is set (that's the
>> flag the reassoc pass controls similar transforms on).
>Looks to me like it's kind of abusing the current implementation of
>SCEV for floating point values.  I believe it's designed only with
>integral types in mind; for example, we may need to reject float types
>when tracking the scev chain via type conversion.

Note the vectorizer relies on SCEV itself here.

Richard.

>Thanks,
>bin
>>
>> Bootstrapped / tested on x86_64-unknown-linux-gnu, queued for GCC 8.
>>
>> Richard.
>>
>> 2017-02-14  Richard Biener  
>>
>> PR tree-optimization/79460
>> * tree-scalar-evolution.c (final_value_replacement_loop):
>Also
>> allow final value replacement of floating point expressions.
>>
>> * gcc.dg/tree-ssa/sccp-3.c: New testcase.
>>
>> Index: gcc/tree-scalar-evolution.c
>> ===
>> --- gcc/tree-scalar-evolution.c (revision 245417)
>> +++ gcc/tree-scalar-evolution.c (working copy)
>> @@ -3718,8 +3718,10 @@ final_value_replacement_loop (struct loo
>>   continue;
>> }
>>
>> -  if (!POINTER_TYPE_P (TREE_TYPE (def))
>> - && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
>> +  if (! (POINTER_TYPE_P (TREE_TYPE (def))
>> +|| INTEGRAL_TYPE_P (TREE_TYPE (def))
>> +|| (FLOAT_TYPE_P (TREE_TYPE (def))
>> +&& flag_unsafe_math_optimizations)))
>> {
>>   gsi_next ();
>>   continue;
>> Index: gcc/testsuite/gcc.dg/tree-ssa/sccp-3.c
>> ===
>> --- gcc/testsuite/gcc.dg/tree-ssa/sccp-3.c  (nonexistent)
>> +++ gcc/testsuite/gcc.dg/tree-ssa/sccp-3.c  (working copy)
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-sccp" }
>*/
>> +
>> +float f(float x[])
>> +{
>> +  float p = 1.0;
>> +  for (int i = 0; i < 200; i++)
>> +p += 1;
>> +  return p;
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "final value replacement.* =
>2.01e\\+2;" "sccp" } } */



Re: [Aarch64] Enable descriptors for nested functions in Ada

2017-02-14 Thread Wilco Dijkstra
On 13/11/16 22:30, Eric Botcazou wrote:
> +/* The architecture reserves bits 0 and 1 so use bit 2 for descriptors.  */
> +#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
> +#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 4

Doesn't this imply a minimum function alignment of 8? That's not guaranteed
on AArch64, at least -mcpu=exynos-m1 uses 4-byte alignment.
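To make the alignment concern concrete, here is a sketch of how a descriptor scheme uses the hook value as a tag in the low bits of a code address. The names and the helper are invented for illustration; this is not GCC's actual runtime code.

```c
#include <assert.h>
#include <stdint.h>

/* TARGET_CUSTOM_FUNCTION_DESCRIPTORS == 4 means bit 2 tags descriptors,
   which is only sound if every real entry point has its low three bits
   clear, i.e. functions are aligned to at least 8 bytes.  */
#define DESCRIPTOR_TAG 4

int
points_to_descriptor (uintptr_t addr)
{
  return (addr & DESCRIPTOR_TAG) != 0;
}
```

With only 4-byte function alignment, a legitimate function placed at, say, 0x1004 would be misread as a descriptor — which is the problem raised above.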

Wilco


[Committed] S/390: Cleanup: Remove builtin type flags.

2017-02-14 Thread Andreas Krebbel
With the target attribute stuff the only user of the builtin types
flags value has been removed.  So drop that value from the builtin
types list entirely.

gcc/ChangeLog:

2017-02-14  Andreas Krebbel  

* config/s390/s390-builtin-types.def: Remove flags argument.
* config/s390/s390.c (s390_init_builtins): Likewise.
---
 gcc/config/s390/s390-builtin-types.def | 568 -
 gcc/config/s390/s390.c |  12 +-
 2 files changed, 287 insertions(+), 293 deletions(-)

diff --git a/gcc/config/s390/s390-builtin-types.def 
b/gcc/config/s390/s390-builtin-types.def
index a221203..b785dc5 100644
--- a/gcc/config/s390/s390-builtin-types.def
+++ b/gcc/config/s390/s390-builtin-types.def
@@ -19,320 +19,314 @@
along with GCC; see the file COPYING3.  If not see
.  */
 
-#define DEF_FN_TYPE_0(FN_TYPE, FLAGS, T1)  \
+#define DEF_FN_TYPE_0(FN_TYPE, T1) \
   DEF_FN_TYPE (FN_TYPE,\
-  FLAGS,   \
   s390_builtin_types[T1])
-#define DEF_FN_TYPE_1(FN_TYPE, FLAGS, T1, T2)  \
+#define DEF_FN_TYPE_1(FN_TYPE, T1, T2) \
   DEF_FN_TYPE (FN_TYPE,\
-  FLAGS,   \
   s390_builtin_types[T1],  \
   s390_builtin_types[T2])
-#define DEF_FN_TYPE_2(FN_TYPE, FLAGS, T1, T2, T3)  \
+#define DEF_FN_TYPE_2(FN_TYPE, T1, T2, T3) \
   DEF_FN_TYPE (FN_TYPE,\
-  FLAGS,   \
   s390_builtin_types[T1],  \
   s390_builtin_types[T2],  \
   s390_builtin_types[T3])
-#define DEF_FN_TYPE_3(FN_TYPE, FLAGS, T1, T2, T3, T4)  \
+#define DEF_FN_TYPE_3(FN_TYPE, T1, T2, T3, T4) \
   DEF_FN_TYPE (FN_TYPE,\
-  FLAGS,   \
   s390_builtin_types[T1],  \
   s390_builtin_types[T2],  \
   s390_builtin_types[T3],  \
   s390_builtin_types[T4])
-#define DEF_FN_TYPE_4(FN_TYPE, FLAGS, T1, T2, T3, T4, T5)  \
+#define DEF_FN_TYPE_4(FN_TYPE, T1, T2, T3, T4, T5) \
   DEF_FN_TYPE (FN_TYPE,\
-  FLAGS,   \
   s390_builtin_types[T1],  \
   s390_builtin_types[T2],  \
   s390_builtin_types[T3],  \
   s390_builtin_types[T4],  \
   s390_builtin_types[T5])
-#define DEF_FN_TYPE_5(FN_TYPE, FLAGS, T1, T2, T3, T4, T5, T6)  \
+#define DEF_FN_TYPE_5(FN_TYPE, T1, T2, T3, T4, T5, T6) \
   DEF_FN_TYPE (FN_TYPE,\
-  FLAGS,   \
   s390_builtin_types[T1],  \
   s390_builtin_types[T2],  \
   s390_builtin_types[T3],  \
   s390_builtin_types[T4],  \
   s390_builtin_types[T5],  \
   s390_builtin_types[T6])
-DEF_TYPE (BT_INT, B_HTM | B_VX, integer_type_node, 0)
-DEF_TYPE (BT_VOID, 0, void_type_node, 0)
-DEF_TYPE (BT_FLTCONST, B_VX, float_type_node, 1)
-DEF_TYPE (BT_UINT64, B_HTM, c_uint64_type_node, 0)
-DEF_TYPE (BT_FLT, B_VX, float_type_node, 0)
-DEF_TYPE (BT_UINT, 0, unsigned_type_node, 0)
-DEF_TYPE (BT_VOIDCONST, B_VX, void_type_node, 1)
-DEF_TYPE (BT_ULONG, B_VX, long_unsigned_type_node, 0)
-DEF_TYPE (BT_INT128, B_VX, intTI_type_node, 0)
-DEF_TYPE (BT_USHORTCONST, B_VX, short_unsigned_type_node, 1)
-DEF_TYPE (BT_SHORTCONST, B_VX, short_integer_type_node, 1)
-DEF_TYPE (BT_INTCONST, B_VX, integer_type_node, 1)
-DEF_TYPE (BT_UCHARCONST, B_VX, unsigned_char_type_node, 1)
-DEF_TYPE (BT_UCHAR, B_VX, unsigned_char_type_node, 0)
-DEF_TYPE (BT_SCHARCONST, B_VX, signed_char_type_node, 1)
-DEF_TYPE (BT_SHORT, B_VX, short_integer_type_node, 0)
-DEF_TYPE (BT_LONG, B_VX, long_integer_type_node, 0)
-DEF_TYPE (BT_SCHAR, B_VX, signed_char_type_node, 0)
-DEF_TYPE (BT_ULONGLONGCONST, B_VX, long_long_unsigned_type_node, 1)
-DEF_TYPE (BT_USHORT, B_VX, short_unsigned_type_node, 0)
-DEF_TYPE (BT_LONGLONG, B_VX, long_long_integer_type_node, 0)
-DEF_TYPE (BT_DBLCONST, B_VX, double_type_node, 1)
-DEF_TYPE (BT_ULONGLONG, B_VX, long_long_unsigned_type_node, 0)
-DEF_TYPE (BT_DBL, B_VX, double_type_node, 0)
-DEF_TYPE (BT_LONGLONGCONST, B_VX, long_long_integer_type_node, 1)
-DEF_TYPE (BT_UINTCONST, B_VX, unsigned_type_node, 1)
-DEF_VECTOR_TYPE (BT_UV2DI, B_VX, BT_ULONGLONG, 2)
-DEF_VECTOR_TYPE (BT_V4SI, B_VX, 

Re: [PATCH] rs6000: Mark powerpc*-*-*spe* as obsolete

2017-02-14 Thread David Edelsohn
On Tue, Feb 14, 2017 at 10:22 AM, Segher Boessenkool  wrote:
> As discussed in .
>
> Is this okay for trunk?
>
>
> Segher
>
>
> 2017-02-14  Segher Boessenkool  
>
> * config.gcc (Obsolete configurations): Add powerpc*-*-*spe* .

Okay.

Thanks, David


[PATCH] rs6000: Mark powerpc*-*-*spe* as obsolete

2017-02-14 Thread Segher Boessenkool
As discussed in .

Is this okay for trunk?


Segher


2017-02-14  Segher Boessenkool  

* config.gcc (Obsolete configurations): Add powerpc*-*-*spe* .

---
 gcc/config.gcc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index ddfa4dc..c7d3899 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -236,7 +236,7 @@ md_file=
 
 # Obsolete configurations.
 case ${target} in
- nothing   \
+ powerpc*-*-*spe*  \
  )
 if test "x$enable_obsolete" != xyes; then
   echo "*** Configuration ${target} is obsolete." >&2
-- 
1.9.3



Re: [PATCH] Fix PR79460

2017-02-14 Thread Bin.Cheng
On Tue, Feb 14, 2017 at 2:48 PM, Richard Biener  wrote:
>
> The following enables final value replacement for floating point
> expressions if -funsafe-math-optimizations is set (that's the
> flag the reassoc pass controls similar transforms on).
Looks to me like it's kind of abusing the current implementation of SCEV
for floating point values.  I believe it's designed only with integral
types in mind; for example, we may need to reject float types when
tracking the scev chain via type conversion.

Thanks,
bin
>
> Bootstrapped / tested on x86_64-unknown-linux-gnu, queued for GCC 8.
>
> Richard.
>
> 2017-02-14  Richard Biener  
>
> PR tree-optimization/79460
> * tree-scalar-evolution.c (final_value_replacement_loop): Also
> allow final value replacement of floating point expressions.
>
> * gcc.dg/tree-ssa/sccp-3.c: New testcase.
>
> Index: gcc/tree-scalar-evolution.c
> ===
> --- gcc/tree-scalar-evolution.c (revision 245417)
> +++ gcc/tree-scalar-evolution.c (working copy)
> @@ -3718,8 +3718,10 @@ final_value_replacement_loop (struct loo
>   continue;
> }
>
> -  if (!POINTER_TYPE_P (TREE_TYPE (def))
> - && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
> +  if (! (POINTER_TYPE_P (TREE_TYPE (def))
> +|| INTEGRAL_TYPE_P (TREE_TYPE (def))
> +|| (FLOAT_TYPE_P (TREE_TYPE (def))
> +&& flag_unsafe_math_optimizations)))
> {
>   gsi_next ();
>   continue;
> Index: gcc/testsuite/gcc.dg/tree-ssa/sccp-3.c
> ===
> --- gcc/testsuite/gcc.dg/tree-ssa/sccp-3.c  (nonexistent)
> +++ gcc/testsuite/gcc.dg/tree-ssa/sccp-3.c  (working copy)
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-sccp" } */
> +
> +float f(float x[])
> +{
> +  float p = 1.0;
> +  for (int i = 0; i < 200; i++)
> +p += 1;
> +  return p;
> +}
> +
> +/* { dg-final { scan-tree-dump "final value replacement.* = 2.01e\\+2;" "sccp" } } */


Re: [PATCH] Fix PR79460

2017-02-14 Thread Richard Biener
On Tue, 14 Feb 2017, Jakub Jelinek wrote:

> On Tue, Feb 14, 2017 at 03:48:38PM +0100, Richard Biener wrote:
> > 2017-02-14  Richard Biener  
> > 
> > PR tree-optimization/79460
> > * tree-scalar-evolution.c (final_value_replacement_loop): Also
> > allow final value replacement of floating point expressions.
> > 
> > * gcc.dg/tree-ssa/sccp-3.c: New testcase.
> > 
> > Index: gcc/tree-scalar-evolution.c
> > ===
> > --- gcc/tree-scalar-evolution.c (revision 245417)
> > +++ gcc/tree-scalar-evolution.c (working copy)
> > @@ -3718,8 +3718,10 @@ final_value_replacement_loop (struct loo
> >   continue;
> > }
> >  
> > -  if (!POINTER_TYPE_P (TREE_TYPE (def))
> > - && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
> > +  if (! (POINTER_TYPE_P (TREE_TYPE (def))
> > +|| INTEGRAL_TYPE_P (TREE_TYPE (def))
> > +|| (FLOAT_TYPE_P (TREE_TYPE (def))
> > +&& flag_unsafe_math_optimizations)))
> 
> I think Segher mentioned in the PR that this should be better
> flag_associative_math.

We don't associate anything though.

Technically we could, for constant niter, fully simulate
the rounding effects (flag_rounding_math would need checking then).

Given that reassoc transforms x + x + x + x -> 4 * x with just
-funsafe-math-optimizations using that flag is at least consistent.

I think flag_associative_math wasn't meant to change the number of
rounding steps.  flag_fp_contract_mode controls this IMHO, but we
have the reassoc precedent...

>  Also, FLOAT_TYPE_P stands not just for
> SCALAR_FLOAT_TYPE_P, but for COMPLEX_TYPE and VECTOR_TYPE thereof as well.
> Does SCEV handle complex and vector types well (it would be really nice
> if it could of course, but then we should use ANY_INTEGRAL_TYPE_P as
> well to also handle complex and vector integers)?

SCEV should gate out types it doesn't handle itself -- it already
gates out VECTOR_TYPE:

static tree
analyze_scalar_evolution_1 (struct loop *loop, tree var, tree res)
{
  tree type = TREE_TYPE (var);
  gimple *def;
  basic_block bb;
  struct loop *def_loop;

  if (loop == NULL || TREE_CODE (type) == VECTOR_TYPE)
return chrec_dont_know;

and given that it special-cases only REAL_CST, FIXED_CST and INTEGER_CST
in get_scalar_evolution I doubt it handles anything else reasonably.

But as I said, it's SCEV's job to reject them, not the API's users'.
I see no reason not to allow vectors, for example (it's just that the
code might not be ready and may use the wrong tree building interfaces).

Richard.


>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Patch ping^2

2017-02-14 Thread Nathan Sidwell

On 02/13/2017 10:46 AM, Jakub Jelinek wrote:

Hi!

I'd like to ping a couple of patches:



- C++ P1 PR79288 - wrong default TLS model for __thread static data members
  http://gcc.gnu.org/ml/gcc-patches/2017-01/msg02349.html


This is ok, but don't you think the changelog is misleading?  In your 
description you say it needs DECL_EXTERNAL set, but the changelog says 
'inline', which isn't something static member vars have (although I can 
see how it's involved in DECL_EXTERNAL setting).


nathan
--
Nathan Sidwell


Re: [PATCH] Fix PR79460

2017-02-14 Thread Jakub Jelinek
On Tue, Feb 14, 2017 at 03:48:38PM +0100, Richard Biener wrote:
> 2017-02-14  Richard Biener  
> 
>   PR tree-optimization/79460
>   * tree-scalar-evolution.c (final_value_replacement_loop): Also
>   allow final value replacement of floating point expressions.
> 
>   * gcc.dg/tree-ssa/sccp-3.c: New testcase.
> 
> Index: gcc/tree-scalar-evolution.c
> ===
> --- gcc/tree-scalar-evolution.c   (revision 245417)
> +++ gcc/tree-scalar-evolution.c   (working copy)
> @@ -3718,8 +3718,10 @@ final_value_replacement_loop (struct loo
> continue;
>   }
>  
> -  if (!POINTER_TYPE_P (TREE_TYPE (def))
> -   && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
> +  if (! (POINTER_TYPE_P (TREE_TYPE (def))
> +  || INTEGRAL_TYPE_P (TREE_TYPE (def))
> +  || (FLOAT_TYPE_P (TREE_TYPE (def))
> +  && flag_unsafe_math_optimizations)))

I think Segher mentioned in the PR that this should be better
flag_associative_math.  Also, FLOAT_TYPE_P stands not just for
SCALAR_FLOAT_TYPE_P, but for COMPLEX_TYPE and VECTOR_TYPE thereof as well.
Does SCEV handle complex and vector types well (it would be really nice
if it could of course, but then we should use ANY_INTEGRAL_TYPE_P as
well to also handle complex and vector integers)?

Jakub


[PATCH] Improve unroller size estimate

2017-02-14 Thread Richard Biener

The following patch improves the constant_after_peeling estimate of
the GIMPLE unroller by requiring not strictly a "simple-iv" but only
an evolution without symbols.  It also avoids computing any of this for
ops defined in a subloop of the loop we unroll (that only yields
garbage).  So it makes constant_after_peeling cheaper as well.

It also adjusts the simple-minded CCP to propagate all constants
(esp. float and vector constants).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress, queued for 
GCC 8.

Richard.

2017-02-14  Richard Biener  

* tree-ssa-loop-ivcanon.c (constant_after_peeling): Do not require
sth as strict as a simple_iv but a chrec without symbols and an
operand defined in the loop we are peeling (and not some subloop).
(propagate_constants_for_unrolling): Propagate all constants.

* gcc.dg/vect/no-scevccp-outer-13.c: Adjust to prevent unrolling
of inner loops.
* gcc.dg/vect/no-scevccp-outer-7.c: Likewise.
* gcc.dg/vect/vect-104.c: Likewise.

Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c (revision 245417)
+++ gcc/tree-ssa-loop-ivcanon.c (working copy)
@@ -157,8 +157,6 @@ struct loop_size
 static bool
 constant_after_peeling (tree op, gimple *stmt, struct loop *loop)
 {
-  affine_iv iv;
-
   if (is_gimple_min_invariant (op))
 return true;
 
@@ -188,12 +186,12 @@ constant_after_peeling (tree op, gimple
   return false;
 }
 
-  /* Induction variables are constants.  */
-  if (!simple_iv (loop, loop_containing_stmt (stmt), op, , false))
-return false;
-  if (!is_gimple_min_invariant (iv.base))
+  /* Induction variables are constants when defined in loop.  */
+  if (loop_containing_stmt (stmt) != loop)
 return false;
-  if (!is_gimple_min_invariant (iv.step))
+  tree ev = analyze_scalar_evolution (loop, op);
+  if (chrec_contains_undetermined (ev)
+  || chrec_contains_symbols (ev))
 return false;
   return true;
 }
@@ -1259,7 +1257,7 @@ propagate_constants_for_unrolling (basic
 
   if (! SSA_NAME_OCCURS_IN_ABNORMAL_PHI (result)
  && gimple_phi_num_args (phi) == 1
- && TREE_CODE (arg) == INTEGER_CST)
+ && CONSTANT_CLASS_P (arg))
{
  replace_uses_by (result, arg);
  gsi_remove (, true);
@@ -1276,7 +1274,7 @@ propagate_constants_for_unrolling (basic
   tree lhs;
 
   if (is_gimple_assign (stmt)
- && gimple_assign_rhs_code (stmt) == INTEGER_CST
+ && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_constant
  && (lhs = gimple_assign_lhs (stmt), TREE_CODE (lhs) == SSA_NAME)
  && !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (lhs))
{
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-13.c
===
--- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-13.c (revision 245417)
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-13.c (working copy)
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param max-completely-peel-times=1" } */
 
 #include 
 #include "tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c
===
--- gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c  (revision 245417)
+++ gcc/testsuite/gcc.dg/vect/no-scevccp-outer-7.c  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param max-completely-peel-times=1" } */
 
 #include 
 #include "tree-vect.h"
Index: gcc/testsuite/gcc.dg/vect/vect-104.c
===
--- gcc/testsuite/gcc.dg/vect/vect-104.c(revision 245417)
+++ gcc/testsuite/gcc.dg/vect/vect-104.c(working copy)
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "--param max-completely-peel-times=1" } */
 
 #include 
 #include 
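A toy example (not from the patch) of the property constant_after_peeling tries to recognize: after complete peeling, every use of the induction variable in each loop-body copy is a literal constant, so the whole body folds away.

```c
#include <assert.h>

/* Once the loop is fully peeled, each copy of i is a known constant,
   so i * 3 folds to 0, 3, 6, 9 and the function reduces to 18 -- no
   strict simple-iv needed, just an evolution without symbols.  */
int
sum_of_first_four_triples (void)
{
  int s = 0;
  for (int i = 0; i < 4; i++)
    s += i * 3;
  return s;
}
```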


Re: [ARM] Enable descriptors for nested functions in Ada

2017-02-14 Thread Richard Earnshaw (lists)
On 13/11/16 22:31, Eric Botcazou wrote:
> Similarly to x86, PowerPC and SPARC, this enables the use of custom run-time 
> descriptors in Ada, thus eliminating the need for trampolines and executable 
> stack in presence of pointers to nested functions.
> 
> This still uses bit 1 for the run-time identification scheme because bumping 
> the function alignment to 64 bits seems undesirable in Thumb mode.
> 
> Tested on ARM/Linux, OK for the mainline?
> 
> 
> 2016-11-13  Eric Botcazou  
> 
>   PR ada/67205
>   * config/arm/arm.c (TARGET_CUSTOM_FUNCTION_DESCRIPTORS): Define.
>   (arm_function_ok_for_sibcall): Return false for an indirect call by
>   descriptor if all the argument registers are used.
>   (arm_relayout_function): Use FUNCTION_ALIGNMENT macro to adjust the
>   alignment of the function.
> 

Is this ABI, or private to a release of the compiler?  If the latter,
then OK.  Otherwise, I don't think we should presume that the reserved
bits won't get used.

R.

> 
> p.diff
> 
> 
> Index: config/arm/arm.c
> ===
> --- config/arm/arm.c  (revision 242334)
> +++ config/arm/arm.c  (working copy)
> @@ -738,6 +738,11 @@ static const struct attribute_spec arm_a
>  #undef TARGET_EXPAND_DIVMOD_LIBFUNC
>  #define TARGET_EXPAND_DIVMOD_LIBFUNC arm_expand_divmod_libfunc
>  
> +/* Although the architecture reserves bits 0 and 1, only the former is
> +   used for ARM/Thumb ISA selection in v7 and earlier versions.  */
> +#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
> +#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 2
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
>  /* Obstack for minipool constant handling.  */
> @@ -6810,6 +6815,29 @@ arm_function_ok_for_sibcall (tree decl,
>&& DECL_WEAK (decl))
>  return false;
>  
> +  /* We cannot do a tailcall for an indirect call by descriptor if all the
> + argument registers are used because the only register left to load the
> + address is IP and it will already contain the static chain.  */
> +  if (!decl && CALL_EXPR_BY_DESCRIPTOR (exp) && !flag_trampolines)
> +{
> +  tree fntype = TREE_TYPE (TREE_TYPE (CALL_EXPR_FN (exp)));
> +  CUMULATIVE_ARGS cum;
> +  cumulative_args_t cum_v;
> +
> +  arm_init_cumulative_args (, fntype, NULL_RTX, NULL_TREE);
> +  cum_v = pack_cumulative_args ();
> +
> +  for (tree t = TYPE_ARG_TYPES (fntype); t; t = TREE_CHAIN (t))
> + {
> +   tree type = TREE_VALUE (t);
> +   if (!VOID_TYPE_P (type))
> + arm_function_arg_advance (cum_v, TYPE_MODE (type), type, true);
> + }
> +
> +  if (!arm_function_arg (cum_v, SImode, integer_type_node, true))
> + return false;
> +}
> +
>/* Everything else is ok.  */
>return true;
>  }
> @@ -29101,7 +29129,9 @@ arm_relayout_function (tree fndecl)
>  callee_tree = target_option_default_node;
>  
>struct cl_target_option *opts = TREE_TARGET_OPTION (callee_tree);
> -  SET_DECL_ALIGN (fndecl, FUNCTION_BOUNDARY_P (opts->x_target_flags));
> +  SET_DECL_ALIGN
> +(fndecl,
> + FUNCTION_ALIGNMENT (FUNCTION_BOUNDARY_P (opts->x_target_flags)));
>  }
>  
>  /* Inner function to process the attribute((target(...))), take an argument 
> and
> 



Re: [Aarch64] Enable descriptors for nested functions in Ada

2017-02-14 Thread Richard Earnshaw (lists)
On 13/11/16 22:30, Eric Botcazou wrote:
> Similarly to x86, PowerPC and SPARC, this enables the use of custom run-time 
> descriptors in Ada, thus eliminating the need for trampolines and executable 
> stack in presence of pointers to nested functions.
> 
> Tested on Aarch64/Linux, OK for the mainline?
> 
> 
> 2016-11-13  Eric Botcazou  
> 
>   PR ada/67205
>   * config/aarch64/aarch64.c (TARGET_CUSTOM_FUNCTION_DESCRIPTORS):
>   Define.
> 

Sorry, missed this.

OK.

R.

> 
> p.diff
> 
> 
> Index: config/aarch64/aarch64.c
> ===
> --- config/aarch64/aarch64.c  (revision 242334)
> +++ config/aarch64/aarch64.c  (working copy)
> @@ -14502,6 +14502,10 @@ aarch64_optab_supported_p (int op, machi
>  #undef TARGET_OMIT_STRUCT_RETURN_REG
>  #define TARGET_OMIT_STRUCT_RETURN_REG true
>  
> +/* The architecture reserves bits 0 and 1 so use bit 2 for descriptors.  */
> +#undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
> +#define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 4
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
>  #include "gt-aarch64.h"
> 



[PATCH] Fix PR79460

2017-02-14 Thread Richard Biener

The following enables final value replacement for floating point
expressions if -funsafe-math-optimizations is set (that's the
flag the reassoc pass controls similar transforms on).

Bootstrapped / tested on x86_64-unknown-linux-gnu, queued for GCC 8.

Richard.

2017-02-14  Richard Biener  

PR tree-optimization/79460
* tree-scalar-evolution.c (final_value_replacement_loop): Also
allow final value replacement of floating point expressions.

* gcc.dg/tree-ssa/sccp-3.c: New testcase.

Index: gcc/tree-scalar-evolution.c
===
--- gcc/tree-scalar-evolution.c (revision 245417)
+++ gcc/tree-scalar-evolution.c (working copy)
@@ -3718,8 +3718,10 @@ final_value_replacement_loop (struct loo
  continue;
}
 
-  if (!POINTER_TYPE_P (TREE_TYPE (def))
- && !INTEGRAL_TYPE_P (TREE_TYPE (def)))
+  if (! (POINTER_TYPE_P (TREE_TYPE (def))
+|| INTEGRAL_TYPE_P (TREE_TYPE (def))
+|| (FLOAT_TYPE_P (TREE_TYPE (def))
+&& flag_unsafe_math_optimizations)))
{
  gsi_next ();
  continue;
Index: gcc/testsuite/gcc.dg/tree-ssa/sccp-3.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/sccp-3.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/sccp-3.c  (working copy)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -funsafe-math-optimizations -fdump-tree-sccp" } */
+
+float f(float x[])
+{
+  float p = 1.0;
+  for (int i = 0; i < 200; i++)
+p += 1;
+  return p;
+}
+
+/* { dg-final { scan-tree-dump "final value replacement.* = 2.01e\\+2;" "sccp" } } */
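A standalone illustration (not part of the patch) of why the transform sits behind an unsafe-math flag: regrouping float additions can change the result, so replacing a float reduction by its closed-form value is not, in general, value-safe.

```c
#include <assert.h>

/* With 32-bit floats the spacing near 1e8f is 8.0, so adding 1.0f to
   -1e8f is absorbed by rounding.  Regrouping the same three addends
   therefore changes the answer.  */
float
grouped_left (void)
{
  return (1e8f + -1e8f) + 1.0f;   /* exact cancel, then + 1.0f */
}

float
grouped_right (void)
{
  return 1e8f + (-1e8f + 1.0f);   /* -1e8f + 1.0f rounds to -1e8f */
}
```

The testcase above happens to use a step of 1, where every intermediate sum is exact, but the gate has to cover the general case.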


Re: [PATCH][testsuite] Require shared effective target for some lto.exp tests

2017-02-14 Thread Richard Earnshaw (lists)
On 24/01/17 14:16, Kyrill Tkachov wrote:
> Hi all,
> 
> The tests in this patch fail for me on aarch64-none-elf with:
> relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol
> `_impure_ptr' can not be used when making a shared object; recompile
> with -fPIC
> 
> I believe since the tests pass -shared to the linker they should be
> gated on the 'shared' effective target?
> With this patch these tests appear as UNSUPPORTED on aarch64-none-elf
> rather than FAILing.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2017-01-24  Kyrylo Tkachov  
> 
> * gcc.dg/lto/pr54709_0.c: Require 'shared' effective target.
> * gcc.dg/lto/pr61526_0.c: Likewise.
> * gcc.dg/lto/pr64415_0.c: Likewise.
> 

OK.

R.

> lto-shared-tests.patch
> 
> 
> diff --git a/gcc/testsuite/gcc.dg/lto/pr54709_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr54709_0.c
> index f3db5dc..69697d8 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr54709_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr54709_0.c
> @@ -1,6 +1,7 @@
>  /* { dg-lto-do link } */
>  /* { dg-require-visibility "hidden" } */
>  /* { dg-require-effective-target fpic } */
> +/* { dg-require-effective-target shared } */
>  /* { dg-extra-ld-options { -shared } } */
>  /* { dg-lto-options { { -fPIC -fvisibility=hidden -flto } } } */
>  
> diff --git a/gcc/testsuite/gcc.dg/lto/pr61526_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr61526_0.c
> index 8a631f0..d3e2c80 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr61526_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr61526_0.c
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target fpic } */
> +/* { dg-require-effective-target shared } */
>  /* { dg-lto-do link } */
>  /* { dg-lto-options { { -fPIC -flto -flto-partition=1to1 } } } */
>  /* { dg-extra-ld-options { -shared } } */
> diff --git a/gcc/testsuite/gcc.dg/lto/pr64415_0.c 
> b/gcc/testsuite/gcc.dg/lto/pr64415_0.c
> index 4faab2b..11218e0 100644
> --- a/gcc/testsuite/gcc.dg/lto/pr64415_0.c
> +++ b/gcc/testsuite/gcc.dg/lto/pr64415_0.c
> @@ -1,5 +1,6 @@
>  /* { dg-lto-do link } */
>  /* { dg-require-effective-target fpic } */
> +/* { dg-require-effective-target shared } */
>  /* { dg-lto-options { { -O -flto -fpic } } } */
>  /* { dg-extra-ld-options { -shared } } */
>  /* { dg-extra-ld-options "-Wl,-undefined,dynamic_lookup" { target 
> *-*-darwin* } } */
> 



Re: [Patch AArch64] Use 128-bit vectors when autovectorizing 16-bit float types

2017-02-14 Thread Richard Earnshaw (lists)
On 23/01/17 11:23, James Greenhalgh wrote:
> 
> Hi,
> 
> As subject, we have an oversight in aarch64_simd_container_mode for
> HFmode inputs. This results in trunk only autovectorizing to a 64-bit vector,
> rather than a full 128-bit vector.
> 
> The fix is obvious, we just need to handle HFmode, and return an
> appropriate vector mode.
> 
> Tested on aarch64-none-elf with no issues. This patch looks low risk
> for this development stage to me, though it fixes an oversight rather
> than a regression.
> 
> OK?
> 

OK.

R.

> Thanks,
> James
> 
> ---
> gcc/
> 
> 2017-01-23  James Greenhalgh  
> 
>   * config/aarch64/aarch64.c (aarch64_simd_container_mode): Handle
>   HFmode.
> 
> gcc/testsuite/
> 
> 2017-01-23  James Greenhalgh  
> 
>   * gcc.target/aarch64/vect_fp16_1.c: New.
> 
> 
> 0001-Patch-AArch64-Use-128-bit-vectors-when-autovectorizi.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0cf7d12..7efc1f2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10777,6 +10777,8 @@ aarch64_simd_container_mode (machine_mode mode, 
> unsigned width)
>   return V2DFmode;
> case SFmode:
>   return V4SFmode;
> +   case HFmode:
> + return V8HFmode;
> case SImode:
>   return V4SImode;
> case HImode:
> @@ -10793,6 +10795,8 @@ aarch64_simd_container_mode (machine_mode mode, 
> unsigned width)
> {
> case SFmode:
>   return V2SFmode;
> +   case HFmode:
> + return V4HFmode;
> case SImode:
>   return V2SImode;
> case HImode:
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect_fp16_1.c 
> b/gcc/testsuite/gcc.target/aarch64/vect_fp16_1.c
> new file mode 100644
> index 000..da0cd81
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect_fp16_1.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fno-vect-cost-model" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 and __fp16
> +   types.  */
> +
> +/* Enable ARMv8.2-A+fp16 so we have access to the vector instructions.  */
> +#pragma GCC target ("arch=armv8.2-a+fp16")
> +
> +_Float16
> +sum_Float16 (_Float16 *__restrict__ __attribute__ ((__aligned__ (16))) a,
> +  _Float16 *__restrict__ __attribute__ ((__aligned__ (16))) b,
> +  _Float16 *__restrict__ __attribute__ ((__aligned__ (16))) c)
> +{
> +  for (int i = 0; i < 256; i++)
> +a[i] = b[i] + c[i];
> +}
> +
> +_Float16
> +sum_fp16 (__fp16 *__restrict__ __attribute__ ((__aligned__ (16))) a,
> +   __fp16 *__restrict__ __attribute__ ((__aligned__ (16))) b,
> +   __fp16 *__restrict__ __attribute__ ((__aligned__ (16))) c)
> +{
> +  for (int i = 0; i < 256; i++)
> +a[i] = b[i] + c[i];
> +}
> +
> +/* Two FADD operations on "8h" data widths, one from sum_Float16, one from
> +   sum_fp16.  */
> +/* { dg-final { scan-assembler-times "fadd\tv\[0-9\]\+.8h" 2 } } */
> 



Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling

2017-02-14 Thread Bin.Cheng
On Tue, Feb 14, 2017 at 1:57 PM, Jan Hubicka  wrote:
>> Thanks,
>> bin
>> 2017-02-13  Bin Cheng  
>>
>>   PR tree-optimization/79347
>>   * tree-vect-loop-manip.c (apply_probability_for_bb): New function.
>>   (vect_do_peeling): Maintain profile counters during peeling.
>>
>> gcc/testsuite/ChangeLog
>> 2017-02-13  Bin Cheng  
>>
>>   PR tree-optimization/79347
>>   * gcc.dg/vect/pr79347.c: New test.
>
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c 
>> b/gcc/testsuite/gcc.dg/vect/pr79347.c
>> new file mode 100644
>> index 000..586c638
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-additional-options "-fdump-tree-vect-all" } */
>> +
>> +short *a;
>> +int c;
>> +void n(void)
>> +{
>> +  for (int i = 0; i < c; i++)
>> +a[i]++;
>> +}
>
> Thanks for fixing the prologue.  I think there is still one extra problem in 
> the vectorizer.
> With the internal vectorized loop I now see:
>
> ;;   basic block 9, loop depth 1, count 0, freq 956, maybe hot
> ;;   Invalid sum of incoming frequencies 1961, should be 956
> ;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
> ;;pred:   10 [100.0%]  (FALLTHRU,DFS_BACK,EXECUTABLE)
> ;;8 [100.0%]  (FALLTHRU)
>   # i_18 = PHI 
>   # vectp_a.13_66 = PHI 
>   # vectp_a.19_75 = PHI 
>   # ivtmp_78 = PHI 
>   _2 = (long unsigned int) i_18;
>   _3 = _2 * 2;
>   _4 = a.0_1 + _3;
>   vect__5.15_68 = MEM[(short int *)vectp_a.13_66];
>   _5 = *_4;
>   vect__6.16_69 = VIEW_CONVERT_EXPR(vect__5.15_68);
>   _6 = (unsigned short) _5;
>   vect__7.17_71 = vect__6.16_69 + vect_cst__70;
>   _7 = _6 + 1;
>   vect__8.18_72 = VIEW_CONVERT_EXPR(vect__7.17_71);
>   _8 = (short int) _7;
>   MEM[(short int *)vectp_a.19_75] = vect__8.18_72;
>   i_14 = i_18 + 1;
>   vectp_a.13_67 = vectp_a.13_66 + 16;
>   vectp_a.19_76 = vectp_a.19_75 + 16;
>   ivtmp_79 = ivtmp_78 + 1;
>   if (ivtmp_79 < bnd.10_59)
> goto ; [85.00%]
>   else
> goto ; [15.00%]
>
> So it seems that the frequency of the loop itself is unrealistically scaled 
> down.
> Before vectorizing the frequency is 8500 and the predicted number of iterations
> is 6.6.  Now the loop is entered via BB 8 with frequency 1148, so the loop, by
> exit probability, exits with 15% probability and thus still has 6.6 iterations,
> but by BB frequencies its body executes fewer times than the preheader.
>
> Now this is a fragile area; vectorizing the loop should scale the number of
> iterations down 8 times.  However, guessed CFG profiles are always very "flat".
> Of course, if the loop iterated 6.6 times on average, vectorizing would not
> make any sense.  Making guessed profiles less flat is unrealistic, because the
> average loop iterates only a few times, but of course while vectorizing we make
> the additional guess that the vectorizable loop matters, and the guessed
> profile is probably unrealistic.
That's what I mentioned in the original patch.  The vectorizer calls
scale_loop_profile in function vect_transform_loop to scale down the loop's
frequency regardless of the mismatch between the loop and the preheader/exit
basic blocks.  In fact, after this patch all mismatches in the vectorizer are
introduced by this.  I don't see any way to keep consistency between the
vectorized loop and the rest of the program without visiting the whole CFG.
So shall we skip scaling down profile counters for the vectorized loop?

>
> GCC 6 seems, however, a bit more consistent.
>> +/* Apply probability PROB to basic block BB and its single succ edge.  */
>> +
>> +static void
>> +apply_probability_for_bb (basic_block bb, int prob)
>> +{
>> +  bb->frequency = apply_probability (bb->frequency, prob);
>> +  bb->count = apply_probability (bb->count, prob);
>> +  gcc_assert (single_succ_p (bb));
>> +  single_succ_edge (bb)->count = bb->count;
>> +}
>> +
>>  /* Function vect_do_peeling.
>>
>> Input:
>> @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
>> niters, tree nitersm1,
>>   may be preferred.  */
>>basic_block anchor = loop_preheader_edge (loop)->src;
>>if (skip_vector)
>> -split_edge (loop_preheader_edge (loop));
>> +{
>> +  split_edge (loop_preheader_edge (loop));
>> +
>> +  /* Due to the order in which we peel prolog and epilog, we first
>> +  propagate probability to the whole loop.  The purpose is to
>> +  avoid adjusting probabilities of both prolog and vector loops
>> +  separately.  Note in this case, the probability of epilog loop
>> +  needs to be scaled back later.  */
>> +  basic_block bb_before_loop = loop_preheader_edge (loop)->src;
>> +  apply_probability_for_bb (bb_before_loop, prob_vector);
> Aha, this is the bit I missed while trying to fix it myself.

Re: [PATCH/AARCH64] Change -mcpu=thunderx2t99 's -mcpu=native support

2017-02-14 Thread Richard Earnshaw (lists)
On 06/02/17 06:20, Andrew Pinski wrote:
> Hi,
>   When I implemented the -mcpu=thunderx2t99 I did not have the Cavium
> partno for ThunderX CN99xx, only the original part no.  This patch
> adds the new part no for the future versions of the chip.
> 
> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.
> 
> Thanks,
> Andrew
> 
> ChangeLog:
> * config/aarch64/aarch64-cores.def (thunderx2t99): Move to under 'C'
> cores and change the partno/implementer to be correct.
> (thunderx2t99p1): New core which replaces thunderx2t99 and still has
> the 'B' as the implementer.
> 

OK.

R.

> 
> midrthunderx2t99.diff.txt
> 
> 
> Index: config/aarch64/aarch64-cores.def
> ===
> --- config/aarch64/aarch64-cores.def  (revision 245203)
> +++ config/aarch64/aarch64-cores.def  (working copy)
> @@ -67,6 +67,7 @@ AARCH64_CORE("thunderxt88p1", thunderxt8
>  AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
> thunderx,  0x43, 0x0a1, -1)
>  AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8_1A,  
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
> thunderx,  0x43, 0x0a2, -1)
>  AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8_1A,  
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
> thunderx,  0x43, 0x0a3, -1)
> +AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  
> AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1)
>  
>  /* APM ('P') cores. */
>  AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
> xgene1, 0x50, 0x000, -1)
> @@ -74,7 +75,7 @@ AARCH64_CORE("xgene1",  xgene1,x
>  /* V8.1 Architecture Processors.  */
>  
>  /* Broadcom ('B') cores. */
> -AARCH64_CORE("thunderx2t99",  thunderx2t99, thunderx2t99, 8_1A,  
> AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
> +AARCH64_CORE("thunderx2t99p1",  thunderx2t99p1, thunderx2t99, 8_1A,  
> AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
>  AARCH64_CORE("vulcan",  vulcan, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 
> | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
>  
>  /* V8 big.LITTLE implementations.  */
> Index: config/aarch64/aarch64-tune.md
> ===
> --- config/aarch64/aarch64-tune.md(revision 245203)
> +++ config/aarch64/aarch64-tune.md(working copy)
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> - 
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,falkor,qdf24xx,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,thunderx2t99,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> + 
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,falkor,qdf24xx,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,thunderx2t99,xgene1,thunderx2t99p1,vulcan,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
>   (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> 



Re: [PATCH PR79347]Maintain profile counter information in vect_do_peeling

2017-02-14 Thread Jan Hubicka
> Thanks,
> bin
> 2017-02-13  Bin Cheng  
> 
>   PR tree-optimization/79347
>   * tree-vect-loop-manip.c (apply_probability_for_bb): New function.
>   (vect_do_peeling): Maintain profile counters during peeling.
> 
> gcc/testsuite/ChangeLog
> 2017-02-13  Bin Cheng  
> 
>   PR tree-optimization/79347
>   * gcc.dg/vect/pr79347.c: New test.

> diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c 
> b/gcc/testsuite/gcc.dg/vect/pr79347.c
> new file mode 100644
> index 000..586c638
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr79347.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fdump-tree-vect-all" } */
> +
> +short *a;
> +int c;
> +void n(void)
> +{
> +  for (int i = 0; i < c; i++)
> +a[i]++;
> +}

Thanks for fixing the prologue.  I think there is still one extra problem in 
the vectorizer.
With the internal vectorized loop I now see:

;;   basic block 9, loop depth 1, count 0, freq 956, maybe hot
;;   Invalid sum of incoming frequencies 1961, should be 956
;;prev block 8, next block 10, flags: (NEW, REACHABLE, VISITED)
;;pred:   10 [100.0%]  (FALLTHRU,DFS_BACK,EXECUTABLE)
;;8 [100.0%]  (FALLTHRU)
  # i_18 = PHI 
  # vectp_a.13_66 = PHI 
  # vectp_a.19_75 = PHI 
  # ivtmp_78 = PHI 
  _2 = (long unsigned int) i_18;
  _3 = _2 * 2;
  _4 = a.0_1 + _3;
  vect__5.15_68 = MEM[(short int *)vectp_a.13_66];
  _5 = *_4;
  vect__6.16_69 = VIEW_CONVERT_EXPR(vect__5.15_68);
  _6 = (unsigned short) _5;
  vect__7.17_71 = vect__6.16_69 + vect_cst__70;
  _7 = _6 + 1;
  vect__8.18_72 = VIEW_CONVERT_EXPR(vect__7.17_71);
  _8 = (short int) _7;
  MEM[(short int *)vectp_a.19_75] = vect__8.18_72;
  i_14 = i_18 + 1;
  vectp_a.13_67 = vectp_a.13_66 + 16;
  vectp_a.19_76 = vectp_a.19_75 + 16;
  ivtmp_79 = ivtmp_78 + 1;
  if (ivtmp_79 < bnd.10_59)
goto ; [85.00%]
  else
goto ; [15.00%]

So it seems that the frequency of the loop itself is unrealistically scaled 
down.
Before vectorizing the frequency is 8500 and the predicted number of iterations
is 6.6.  Now the loop is entered via BB 8 with frequency 1148, so the loop, by
exit probability, exits with 15% probability and thus still has 6.6 iterations,
but by BB frequencies its body executes fewer times than the preheader.

Now this is a fragile area; vectorizing the loop should scale the number of
iterations down 8 times.  However, guessed CFG profiles are always very "flat".
Of course, if the loop iterated 6.6 times on average, vectorizing would not make
any sense.  Making guessed profiles less flat is unrealistic, because the
average loop iterates only a few times, but of course while vectorizing we make
the additional guess that the vectorizable loop matters, and the guessed
profile is probably unrealistic.

GCC 6 seems, however, a bit more consistent.
> +/* Apply probability PROB to basic block BB and its single succ edge.  */
> +
> +static void
> +apply_probability_for_bb (basic_block bb, int prob)
> +{
> +  bb->frequency = apply_probability (bb->frequency, prob);
> +  bb->count = apply_probability (bb->count, prob);
> +  gcc_assert (single_succ_p (bb));
> +  single_succ_edge (bb)->count = bb->count;
> +}
> +
>  /* Function vect_do_peeling.
>  
> Input:
> @@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree 
> niters, tree nitersm1,
>   may be preferred.  */
>basic_block anchor = loop_preheader_edge (loop)->src;
>if (skip_vector)
> -split_edge (loop_preheader_edge (loop));
> +{
> +  split_edge (loop_preheader_edge (loop));
> +
> +  /* Due to the order in which we peel prolog and epilog, we first
> +  propagate probability to the whole loop.  The purpose is to
> +  avoid adjusting probabilities of both prolog and vector loops
> +  separately.  Note in this case, the probability of epilog loop
> +  needs to be scaled back later.  */
> +  basic_block bb_before_loop = loop_preheader_edge (loop)->src;
> +  apply_probability_for_bb (bb_before_loop, prob_vector);
Aha, this is the bit I missed while trying to fix it myself.
I used scale_bbs_frequencies_int (&bb_before_loop, 1, prob_vector,
REG_BR_PROB_BASE) to do this.  I plan to revamp the API for this next stage1,
but let's keep this consistent.
Patch is OK with this change and ...
> +  scale_loop_profile (loop, prob_vector, bound);
... please try to check whether the scaling is really done reasonably.  From the
above it seems that the vectorized loop is unrealistically scaled down, which
may prevent further optimization for speed...

Thanks for looking into this,
Honza


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-14 Thread Jakub Jelinek
On Tue, Feb 14, 2017 at 08:18:13AM +0100, Jakub Jelinek wrote:
> On Mon, Feb 13, 2017 at 04:53:19PM -0700, Jeff Law wrote:
> > > dirtype is one of the standard {un,}signed {char,short,int,long,long long}
> > > types, all of them have 0 in their ranges.
> > > For VR_RANGE we almost always set res.knownrange to true:
> > >   /* Set KNOWNRANGE if the argument is in a known subrange
> > >  of the directive's type (KNOWNRANGE may be reset below).  */
> > >   res.knownrange
> > > = (!tree_int_cst_equal (TYPE_MIN_VALUE (dirtype), argmin)
> > >|| !tree_int_cst_equal (TYPE_MAX_VALUE (dirtype), argmax));
> > > (the exception is in case that range clearly has to include zero),
> > > and reset it only if adjust_range_for_overflow returned true, which means
> > > it also set the range to TYPE_M{IN,AX}_VALUE (dirtype) and again
> > > includes zero.
> > > So IMNSHO likely_adjust in what you've committed is always true
> > > when you use it and thus just a useless computation and something to make
> > > the code harder to understand.
> > If KNOWNRANGE is false, then LIKELY_ADJUST will be true.  But I don't see
> > how we can determine anything for LIKELY_ADJUST if KNOWNRANGE is true.
> 
> We can't, but that doesn't matter, we only use it if KNOWNRANGE is false.
> The only user of LIKELY_ADJUST is:
>  
>   if (res.knownrange)
> res.range.likely = res.range.max;
>   else
> {
> // -- Here we know res.knownrange is false
>   res.range.likely = res.range.min;
>   if (likely_adjust && maybebase && base != 10)
> // -- and here is the only user of likely_adjust
> {
>   if (res.range.min == 1)
> res.range.likely += base == 8 ? 1 : 2;
>   else if (res.range.min == 2
>&& base == 16
>&& (dir.width[0] == 2 || dir.prec[0] == 2))
> ++res.range.likely;
> }
> }

Another argument I had was that if maybebase and base != 10, then if the range
does not include zero (if it ever could be !res.knownrange in that case), then
res.range.min won't be 1, and for base 16 won't even be 2, because all the
values in the range will include the 0 or 0x prefixes.  The only controversial
case then would be if the range were [0, 0]; then bumping res.range.likely
would not be appropriate.  But such a range is really res.knownrange and never
anything else.

Jakub


Re: [RFA][PR tree-optimization/79095] [PATCH 1/4] Improve ranges for MINUS_EXPR and EXACT_DIV_EXPR V3

2017-02-14 Thread Richard Biener
On Tue, Feb 14, 2017 at 7:53 AM, Jeff Law  wrote:
>
> This is the first patch in the series with Richi's comments from last week
> addressed.  #2, #3 and #4 were unchanged.
>
> Richi asked for the EXACT_DIV_EXPR handling in
> extract_range_from_binary_expr_1 to move out one level of IF conditional
> nesting.
>
> Richi noted that the use of symbolic_range_based_on_p was unsafe in the
> context I used it in extract_range_from_binary_expr, and that the test we
> want to make happens to be simpler as well.
>
> And finally we use Richi's heuristic for when to prefer ~[0,0] over a wide
> normal range.
>
> Bootstrapped and regression tested with the other 3 patches in this kit on
> x86_64-unknown-linux-gnu.
>
> All 4 patches are attached to this message for ease of review.
>
>
> Ok for the trunk?

Ok.

Thanks,
Richard.

> Thanks,
> jeff
>
> * tree-vrp.c (extract_range_from_binary_expr_1): For EXACT_DIV_EXPR,
> if the numerator has the range ~[0,0] make the resultant range
> ~[0,0].
> (extract_range_from_binary_expr): For MINUS_EXPR with no derived
> range,
> if the operands are known to be not equal, then the resulting range
> is ~[0,0].
> (intersect_ranges): If the new range is ~[0,0] and the old range is
> wide, then prefer ~[0,0].
>
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index b429217..9174948 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -2259,6 +2259,19 @@ extract_range_from_binary_expr_1 (value_range *vr,
>else if (vr1.type == VR_UNDEFINED)
>  set_value_range_to_varying (&vr1);
>
> +  /* We get imprecise results from ranges_from_anti_range when
> + code is EXACT_DIV_EXPR.  We could mask out bits in the resulting
> + range, but then we also need to hack up vrp_meet.  It's just
> + easier to special case when vr0 is ~[0,0] for EXACT_DIV_EXPR.  */
> +  if (code == EXACT_DIV_EXPR
> +  && vr0.type == VR_ANTI_RANGE
> +  && vr0.min == vr0.max
> +  && integer_zerop (vr0.min))
> +{
> +  set_value_range_to_nonnull (vr, expr_type);
> +  return;
> +}
> +
>/* Now canonicalize anti-ranges to ranges when they are not symbolic
>   and express ~[] op X as ([]' op X) U ([]'' op X).  */
>if (vr0.type == VR_ANTI_RANGE
> @@ -3298,6 +3311,21 @@ extract_range_from_binary_expr (value_range *vr,
>
>extract_range_from_binary_expr_1 (vr, code, expr_type, &n_vr0, &vr1);
>  }
> +
> +  /* If we didn't derive a range for MINUS_EXPR, and
> + op1's range is ~[op0,op0] or vice-versa, then we
> + can derive a non-null range.  This happens often for
> + pointer subtraction.  */
> +  if (vr->type == VR_VARYING
> +  && code == MINUS_EXPR
> +  && TREE_CODE (op0) == SSA_NAME
> +  && ((vr0.type == VR_ANTI_RANGE
> +  && vr0.min == op1
> +  && vr0.min == vr0.max)
> + || (vr1.type == VR_ANTI_RANGE
> + && vr1.min == op0
> + && vr1.min == vr1.max)))
> +  set_value_range_to_nonnull (vr, TREE_TYPE (op0));
>  }
>
>  /* Extract range information from a unary operation CODE based on
> @@ -8620,6 +8648,17 @@ intersect_ranges (enum value_range_type *vr0type,
>   else if (vrp_val_is_min (vr1min)
>&& vrp_val_is_max (vr1max))
> ;
> + /* Choose the anti-range if it is ~[0,0], that range is special
> +enough to special case when vr1's range is relatively wide.  */
> + else if (*vr0min == *vr0max
> +  && integer_zerop (*vr0min)
> +  && (TYPE_PRECISION (TREE_TYPE (*vr0min))
> +  == TYPE_PRECISION (ptr_type_node))
> +  && TREE_CODE (vr1max) == INTEGER_CST
> +  && TREE_CODE (vr1min) == INTEGER_CST
> +  && (wi::clz (wi::sub (vr1max, vr1min))
> +  < TYPE_PRECISION (TREE_TYPE (*vr0min)) / 2))
> +   ;
>   /* Else choose the range.  */
>   else
> {
>
> * tree-vrp.c (overflow_comparison_p_1): New function.
> (overflow_comparison_p): New function.
>
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index ad8173c..2c03a74 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -5186,6 +5186,118 @@ masked_increment (const wide_int &val_in, const
> wide_int &mask,
>return val ^ sgnbit;
>  }
>
> +/* Helper for overflow_comparison_p
> +
> +   OP0 CODE OP1 is a comparison.  Examine the comparison and potentially
> +   OP1's defining statement to see if it ultimately has the form
> +   OP0 CODE (OP0 PLUS INTEGER_CST)
> +
> +   If so, return TRUE indicating this is an overflow test and store into
> +   *NEW_CST an updated constant that can be used in a narrowed range test.
> +
> +   REVERSED indicates if the comparison was originally:
> +
> +   OP1 CODE' OP0.
> +
> +   This affects how we build the updated constant.  */
> +
> +static bool
> +overflow_comparison_p_1 (enum tree_code code, tree op0, tree op1,
> +

Re: [RFA][PR tree-optimization/79095][PATCH 4/4] Tests

2017-02-14 Thread Richard Biener
On Tue, Feb 7, 2017 at 7:32 PM, Jeff Law  wrote:
>
> This is unchanged from the original posting.  Reposting to make review
> easier.
>
>
> The tests in g++.dg start with a reduced test from Martin (pr79095-1.C) that
> includes a size check.  With the size != 0 check this testcase should not
> issue any warnings as the path that turns into a __builtin_memset should get
> eliminated.
>
> pr79095-2.C is the same test, but without the size != 0 test.  This test
> should trigger an "exceeds maximum object size" warning.
>
> pr79095-3.C is the original test from the BZ, but with a proper size check
> on the vector.  This test should not produce a warning.
>
> pr79095-4.C is the original test from the BZ, but without a size check on
> the vector.  This should produce *one* warning (trunk currently generates
> 3).  We verify that there's just one __builtin_memset by the time VRP2 is
> done and verify that we do get the desired warning.
>
> pr79095-5.C is another test from Martin which should not produce a warning.
>
> gcc-torture/execute/arith-1.c is updated to test a few more cases.  This was
> mis-compiled at some point during patch development and at that time I added
> the additional tests.
>
> gcc.dg/tree-ssa/pr79095.c is a new test to verify that VRP can propagate
> constants generated on the true/false arms of an overflow test and that
> constants do _not_ propagate into the wrong arm of the conditional.
>
>
> These were included in the bootstrap & regression testing of the prior
> patches.  All the tests pass with the prior patches installed.  OK for the
> trunk?

Ok.

Richard.

>
> * g++.dg/pr79095-1.C: New test
> * g++.dg/pr79095-2.C: New test
> * g++.dg/pr79095-3.C: New test
> * g++.dg/pr79095-4.C: New test
> * g++.dg/pr79095-5.C: New test
> * gcc.c-torture/execute/arith-1.c: Update with more cases.
> * gcc.dg/tree-ssa/pr79095-1.c: New test.
>
> diff --git a/gcc/testsuite/g++.dg/pr79095-1.C
> b/gcc/testsuite/g++.dg/pr79095-1.C
> new file mode 100644
> index 000..4b8043c
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr79095-1.C
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -O3" } */
> +
> +typedef long unsigned int size_t;
> +
> +inline void
> +fill (int *p, size_t n, int)
> +{
> +  while (n--)
> +*p++ = 0;
> +}
> +
> +struct B
> +{
> +  int* p0, *p1, *p2;
> +
> +  size_t size () const {
> +return size_t (p1 - p0);
> +  }
> +
> +  void resize (size_t n) {
> +if (n > size())
> +  append (n - size());
> +  }
> +
> +  void append (size_t n)
> +  {
> +if (size_t (p2 - p1) >= n)  {
> +  fill (p1, n, 0);
> +}
> +  }
> +};
> +
> +void foo (B &b)
> +{
> +  if (b.size () != 0)
> +b.resize (b.size () - 1);
> +}
> +
> +
> diff --git a/gcc/testsuite/g++.dg/pr79095-2.C
> b/gcc/testsuite/g++.dg/pr79095-2.C
> new file mode 100644
> index 000..9dabc7e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr79095-2.C
> @@ -0,0 +1,46 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -O3" } */
> +
> +typedef long unsigned int size_t;
> +
> +inline void
> +fill (int *p, size_t n, int)
> +{
> +  while (n--)
> +*p++ = 0;
> +}
> +
> +struct B
> +{
> +  int* p0, *p1, *p2;
> +
> +  size_t size () const {
> +return size_t (p1 - p0);
> +  }
> +
> +  void resize (size_t n) {
> +if (n > size())
> +  append (n - size());
> +  }
> +
> +  void append (size_t n)
> +  {
> +if (size_t (p2 - p1) >= n)  {
> +  fill (p1, n, 0);
> +}
> +  }
> +};
> +
> +void foo (B &b)
> +{
> +b.resize (b.size () - 1);
> +}
> +
> +/* If b.size() == 0, then the argument to b.resize is -1U (it overflowed).
> +   This will result in calling "fill" which turns into a memset with a bogus
> +   length argument.  We want to make sure we warn, which requires multiple
> +   things.  First the ldist pass converted the loop into a memset,
> +   cprop and simplifications made the length a constant and the static
> +   analysis pass determines it's a bogus size to pass to memset.  */
> +/* { dg-warning "exceeds maximum object size" "" { target *-*-* } 0 } */
> +
> diff --git a/gcc/testsuite/g++.dg/pr79095-3.C
> b/gcc/testsuite/g++.dg/pr79095-3.C
> new file mode 100644
> index 000..28c8a37
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr79095-3.C
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -O3" } */
> +
> +#include <vector>
> +
> +void foo(std::vector<int> &v);
> +
> +void vtest()
> +{
> +  std::vector<int> v;
> +  foo (v);
> +  if (v.size() > 0)
> +  {
> +v.resize (v.size()-1);
> +  }
> +}
> +
> diff --git a/gcc/testsuite/g++.dg/pr79095-4.C
> b/gcc/testsuite/g++.dg/pr79095-4.C
> new file mode 100644
> index 000..df55025
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/pr79095-4.C
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wall -O3 -fdump-tree-vrp2" } */
> +
> +#include <vector>
> +
> +void foo(std::vector<int> &v);
> +
> +void vtest()
> +{
> +  std::vector<int> v;
> +  foo (v);
> +  {
> +   

Re: [RFA] [PR tree-optimization/79095][PATCH 3/4] Improve ASSERT_EXPRs and simplification of overflow tests V2

2017-02-14 Thread Richard Biener
On Tue, Feb 7, 2017 at 7:32 PM, Jeff Law  wrote:
> This patch addresses issues Richi raised from V1.  Specifically the users of
> overflow_comparison_1 don't need to worry about trying both the original
> comparison and the reversed comparison.  This slightly simplifies the
> callers.
>
> Bootstrapped and regression tested as part of the full patch series.
>
> OK for the trunk?

Ok.

Richard.

> Jeff
>
> * tree-vrp.c (register_edge_assert_for_2): Register additional
> asserts
> if NAME is used in an overflow test.
> (vrp_evaluate_conditional_warnv_with_ops): If the ops represent an
> overflow check that can be expressed as an equality test, then
> adjust
> ops to be that equality test.
>
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index 2c03a74..21c459c 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -5319,7 +5319,17 @@ register_edge_assert_for_2 (tree name, edge e,
> gimple_stmt_iterator bsi,
>/* Only register an ASSERT_EXPR if NAME was found in the sub-graph
>   reachable from E.  */
>if (live_on_edge (e, name))
> -register_new_assert_for (name, name, comp_code, val, NULL, e, bsi);
> +{
> +  tree x;
> +  if (overflow_comparison_p (comp_code, name, val, false, &x))
> +   {
> + enum tree_code new_code
> +   = ((comp_code == GT_EXPR || comp_code == GE_EXPR)
> +  ? GT_EXPR : LE_EXPR);
> + register_new_assert_for (name, name, new_code, x, NULL, e, bsi);
> +   }
> +  register_new_assert_for (name, name, comp_code, val, NULL, e, bsi);
> +}
>
>/* In the case of NAME <= CST and NAME being defined as
>   NAME = (unsigned) NAME2 + CST2 we can assert NAME2 >= -CST2
> @@ -7678,6 +7688,39 @@ vrp_evaluate_conditional_warnv_with_ops (enum
> tree_code code, tree op0,
>&& !POINTER_TYPE_P (TREE_TYPE (op0)))
>  return NULL_TREE;
>
> +  /* If OP0 CODE OP1 is an overflow comparison, if it can be expressed
> + as a simple equality test, then prefer that over its current form
> + for evaluation.
> +
> + An overflow test which collapses to an equality test can always be
> + expressed as a comparison of one argument against zero.  Overflow
> + occurs when the chosen argument is zero and does not occur if the
> + chosen argument is not zero.  */
> +  tree x;
> +  if (overflow_comparison_p (code, op0, op1, use_equiv_p, &x))
> +{
> +  wide_int max = wi::max_value (TYPE_PRECISION (TREE_TYPE (op0)),
> UNSIGNED);
> +  /* B = A - 1; if (A < B) -> B = A - 1; if (A == 0)
> + B = A - 1; if (A > B) -> B = A - 1; if (A != 0)
> + B = A + 1; if (B < A) -> B = A + 1; if (B == 0)
> + B = A + 1; if (B > A) -> B = A + 1; if (B != 0) */
> +  if (integer_zerop (x))
> +   {
> + op1 = x;
> + code = (code == LT_EXPR || code == LE_EXPR) ? EQ_EXPR : NE_EXPR;
> +   }
> +  /* B = A + 1; if (A > B) -> B = A + 1; if (B == 0)
> + B = A + 1; if (A < B) -> B = A + 1; if (B != 0)
> + B = A - 1; if (B > A) -> B = A - 1; if (A == 0)
> + B = A - 1; if (B < A) -> B = A - 1; if (A != 0) */
> +  else if (wi::eq_p (x, max - 1))
> +   {
> + op0 = op1;
> + op1 = wide_int_to_tree (TREE_TYPE (op0), 0);
> + code = (code == GT_EXPR || code == GE_EXPR) ? EQ_EXPR : NE_EXPR;
> +   }
> +}
> +
>if ((ret = vrp_evaluate_conditional_warnv_with_ops_using_ranges
>(code, op0, op1, strict_overflow_p)))
>  return ret;
>


Re: [RFA] [PR tree-optimization/79095][PATCH 2/4] Add infrastructure to detect overflow checks V2

2017-02-14 Thread Richard Biener
On Tue, Feb 7, 2017 at 7:32 PM, Jeff Law  wrote:
>
> This patch addresses issues Richi raised from V1.  Specifically it relieves
> the callers from having to try op0 COND op1 and op1 COND' op0 separately and
> adds some additional comments about motivation.  There may have been minor
> nits Richi pointed out, if so, they were addressed as well.
>
> Bootstrapped and regression tested as part of the full patch series.
>
> OK for the trunk?

Ok.

Richard.

> Jeff
>
>
> * tree-vrp.c (overflow_comparison_p_1): New function.
> (overflow_comparison_p): New function.
>
> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
> index ad8173c..2c03a74 100644
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -5186,6 +5186,118 @@ masked_increment (const wide_int &val_in, const
> wide_int &mask,
>return val ^ sgnbit;
>  }
>
> +/* Helper for overflow_comparison_p
> +
> +   OP0 CODE OP1 is a comparison.  Examine the comparison and potentially
> +   OP1's defining statement to see if it ultimately has the form
> +   OP0 CODE (OP0 PLUS INTEGER_CST)
> +
> +   If so, return TRUE indicating this is an overflow test and store into
> +   *NEW_CST an updated constant that can be used in a narrowed range test.
> +
> +   REVERSED indicates if the comparison was originally:
> +
> +   OP1 CODE' OP0.
> +
> +   This affects how we build the updated constant.  */
> +
> +static bool
> +overflow_comparison_p_1 (enum tree_code code, tree op0, tree op1,
> +bool follow_assert_exprs, bool reversed, tree
> *new_cst)
> +{
> +  /* See if this is a relational operation between two SSA_NAMES with
> + unsigned, overflow wrapping values.  If so, check it more deeply.  */
> +  if ((code == LT_EXPR || code == LE_EXPR
> +   || code == GE_EXPR || code == GT_EXPR)
> +  && TREE_CODE (op0) == SSA_NAME
> +  && TREE_CODE (op1) == SSA_NAME
> +  && INTEGRAL_TYPE_P (TREE_TYPE (op0))
> +  && TYPE_UNSIGNED (TREE_TYPE (op0))
> +  && TYPE_OVERFLOW_WRAPS (TREE_TYPE (op0)))
> +{
> +  gimple *op1_def = SSA_NAME_DEF_STMT (op1);
> +
> +  /* If requested, follow any ASSERT_EXPRs backwards for OP1.  */
> +  if (follow_assert_exprs)
> +   {
> + while (gimple_assign_single_p (op1_def)
> +&& TREE_CODE (gimple_assign_rhs1 (op1_def)) == ASSERT_EXPR)
> +   {
> + op1 = TREE_OPERAND (gimple_assign_rhs1 (op1_def), 0);
> + if (TREE_CODE (op1) != SSA_NAME)
> +   break;
> + op1_def = SSA_NAME_DEF_STMT (op1);
> +   }
> +   }
> +
> +  /* Now look at the defining statement of OP1 to see if it adds
> +or subtracts a nonzero constant from another operand.  */
> +  if (op1_def
> + && is_gimple_assign (op1_def)
> + && gimple_assign_rhs_code (op1_def) == PLUS_EXPR
> + && TREE_CODE (gimple_assign_rhs2 (op1_def)) == INTEGER_CST
> + && !integer_zerop (gimple_assign_rhs2 (op1_def)))
> +   {
> + tree target = gimple_assign_rhs1 (op1_def);
> +
> + /* If requested, follow ASSERT_EXPRs backwards for op0 looking
> +for one where TARGET appears on the RHS.  */
> + if (follow_assert_exprs)
> +   {
> + /* Now see if that "other operand" is op0, following the chain
> +of ASSERT_EXPRs if necessary.  */
> + gimple *op0_def = SSA_NAME_DEF_STMT (op0);
> + while (op0 != target
> +&& gimple_assign_single_p (op0_def)
> +&& TREE_CODE (gimple_assign_rhs1 (op0_def)) ==
> ASSERT_EXPR)
> +   {
> + op0 = TREE_OPERAND (gimple_assign_rhs1 (op0_def), 0);
> + if (TREE_CODE (op0) != SSA_NAME)
> +   break;
> + op0_def = SSA_NAME_DEF_STMT (op0);
> +   }
> +   }
> +
> + /* If we did not find our target SSA_NAME, then this is not
> +an overflow test.  */
> + if (op0 != target)
> +   return false;
> +
> + tree type = TREE_TYPE (op0);
> + wide_int max = wi::max_value (TYPE_PRECISION (type), UNSIGNED);
> + tree inc = gimple_assign_rhs2 (op1_def);
> + if (reversed)
> +   *new_cst = wide_int_to_tree (type, max + inc);
> + else
> +   *new_cst = wide_int_to_tree (type, max - inc);
> + return true;
> +   }
> +}
> +  return false;
> +}
> +
> +/* OP0 CODE OP1 is a comparison.  Examine the comparison and potentially
> +   OP1's defining statement to see if it ultimately has the form
> +   OP0 CODE (OP0 PLUS INTEGER_CST)
> +
> +   If so, return TRUE indicating this is an overflow test and store into
> +   *NEW_CST an updated constant that can be used in a narrowed range test.
> +
> +   These statements are left as-is in the IL to facilitate discovery of
> +   {ADD,SUB}_OVERFLOW sequences later in the optimizer pipeline.  But
> +   the alternate range representation is 

Re: [PATCH] Fix memory leak in LTO

2017-02-14 Thread Richard Biener
On Tue, Feb 14, 2017 at 12:28 PM, Martin Liška  wrote:
> Hi.
>
> The patch fixes:
>
> ==137424== 24 bytes in 1 blocks are definitely lost in loss record 23 of 748
>
> ==137424==at 0x4C29110: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>
> ==137424==by 0x10C39D7: xmalloc (xmalloc.c:147)
>
> ==137424==by 0x10C3AE9: xstrdup (xstrdup.c:34)
>
> ==137424==by 0x61273F: lto_obj_file_open(char const*, bool) 
> (lto-object.c:93)
>
> ==137424==by 0x60AE62: do_stream_out(char*, lto_symtab_encoder_d*) 
> (lto.c:2284)
>
> ==137424==by 0x60FC72: stream_out (lto.c:2333)
>
> ==137424==by 0x60FC72: lto_wpa_write_files (lto.c:2470)
>
> ==137424==by 0x60FC72: do_whole_program_analysis (lto.c:3156)
>
> ==137424==by 0x60FC72: lto_main() (lto.c:3316)
>
> ==137424==by 0x9B830E: compile_file() (toplev.c:467)
>
> ==137424==by 0x5E2D98: do_compile (toplev.c:1984)
>
> ==137424==by 0x5E2D98: toplev::main(int, char**) (toplev.c:2118)
>
> ==137424==by 0x5E4B76: main (main.c:39)
>
>
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Ok.

Richard.

> Martin


Re: Fix profile updating after outer loop unswitching

2017-02-14 Thread Richard Biener
On Tue, Feb 14, 2017 at 12:22 PM, Martin Liška  wrote:
> On 02/05/2017 06:28 PM, Jan Hubicka wrote:
>> +  /* ... finally scale everything in the loop except for guarded basic 
>> blocks
>> + where profile does not change.  */
>> +  basic_block *body = get_loop_body (loop);
>
> Hello.
>
> This hunk causes a new memory leak:
>
> ==24882== 64 bytes in 1 blocks are definitely lost in loss record 328 of 892
>
> ==24882==at 0x4C29110: malloc (in 
> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>
> ==24882==by 0x115DFF7: xmalloc (xmalloc.c:147)
>
> ==24882==by 0x6FADCC: get_loop_body(loop const*) (cfgloop.c:834)
>
> ==24882==by 0xB77520: hoist_guard (tree-ssa-loop-unswitch.c:881)
>
> ==24882==by 0xB77520: tree_unswitch_outer_loop 
> (tree-ssa-loop-unswitch.c:536)
>
> ==24882==by 0xB77520: tree_ssa_unswitch_loops() 
> (tree-ssa-loop-unswitch.c:104)
>
> ==24882==by 0x99C65E: execute_one_pass(opt_pass*) (passes.c:2465)
>
> ==24882==by 0x99CE17: execute_pass_list_1(opt_pass*) (passes.c:2554)
>
> ==24882==by 0x99CE29: execute_pass_list_1(opt_pass*) (passes.c:2555)
>
> ==24882==by 0x99CE29: execute_pass_list_1(opt_pass*) (passes.c:2555)
>
> ==24882==by 0x99CE74: execute_pass_list(function*, opt_pass*) 
> (passes.c:2565)
>
> ==24882==by 0x71E745: cgraph_node::expand() (cgraphunit.c:2038)
>
> ==24882==by 0x71FCC3: expand_all_functions (cgraphunit.c:2174)
>
> ==24882==by 0x71FCC3: symbol_table::compile() (cgraphunit.c:2531)
>
> ==24882==by 0x7214DB: symbol_table::finalize_compilation_unit() 
> (cgraphunit.c:2621)
>
> ==24882==by 0xA5B3AB: compile_file() (toplev.c:492)
>
> ==24882==by 0x5F3A78: do_compile (toplev.c:1984)
>
> ==24882==by 0x5F3A78: toplev::main(int, char**) (toplev.c:2118)
>
> ==24882==by 0x5F5826: main (main.c:39)
>
>
> Fixed in attached patch that can bootstrap on ppc64le-redhat-linux and 
> survives regression tests.
>
> Ready to be installed?

Ok.

Thanks,
Richard.

> Martin
>


[PATCH][GRAPHITE] Use generic isl-val interface, not gmp special one

2017-02-14 Thread Richard Biener

This removes all GMP code from graphite and instead arranges to use
widest_ints plus the generic ISL interface for building/creating vals
by pieces.  This removes one gmp allocation per conversion plus allows
ISL to be built with IMath or IMath with small integer optimization
(on the host or in-tree).

ISL 0.15 supports IMath already but not IMath with small integer 
optimization.  I didn't adjust Makefile.def to choose anything other
than the GMP default for in-tree builds (yet).

I built and tested GCC with ISL 0.15, 0.16.1 and 0.18 all with GMP
and with IMath (or IMath-32 where available).

Full bootstrap and regtest running on x86_64-unknown-linux-gnu
(with host ISL 0.16.1).

Ok for trunk?

Thanks,
Richard.

2017-02-14  Richard Biener  

* graphite.h: Do not include isl/isl_val_gmp.h, instead include
isl/isl_val.h.
* graphite-isl-ast-to-gimple.c (gmp_cst_to_tree): Remove.
(gcc_expression_from_isl_expr_int): Use generic isl_val interface.
* graphite-sese-to-poly.c: Do not include isl/isl_val_gmp.h.
(isl_val_int_from_wi): New function.
(extract_affine_gmp): Rename to ...
(extract_affine_wi): ... this, take a widest_int.
(extract_affine_int): Just wrap extract_affine_wi.
(add_param_constraints): Use isl_val_int_from_wi.
(add_loop_constraints): Likewise, and extract_affine_wi.

Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 245417)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -73,22 +73,6 @@ struct ast_build_info
   bool is_parallelizable;
 };
 
-/* Converts a GMP constant VAL to a tree and returns it.  */
-
-static tree
-gmp_cst_to_tree (tree type, mpz_t val)
-{
-  tree t = type ? type : integer_type_node;
-  mpz_t tmp;
-
-  mpz_init (tmp);
-  mpz_set (tmp, val);
-  wide_int wi = wi::from_mpz (t, tmp, true);
-  mpz_clear (tmp);
-
-  return wide_int_to_tree (t, wi);
-}
-
 /* Verifies properties that GRAPHITE should maintain during translation.  */
 
 static inline void
@@ -325,16 +309,20 @@ gcc_expression_from_isl_expr_int (tree t
 {
   gcc_assert (isl_ast_expr_get_type (expr) == isl_ast_expr_int);
   isl_val *val = isl_ast_expr_get_val (expr);
-  mpz_t val_mpz_t;
-  mpz_init (val_mpz_t);
+  size_t n = isl_val_n_abs_num_chunks (val, sizeof (HOST_WIDE_INT));
+  HOST_WIDE_INT *chunks = XALLOCAVEC (HOST_WIDE_INT, n);
   tree res;
-  if (isl_val_get_num_gmp (val, val_mpz_t) == -1)
+  if (isl_val_get_abs_num_chunks (val, sizeof (HOST_WIDE_INT), chunks) == -1)
 res = NULL_TREE;
   else
-res = gmp_cst_to_tree (type, val_mpz_t);
+{
+  widest_int wi = widest_int::from_array (chunks, n, true);
+  if (isl_val_is_neg (val))
+   wi = -wi;
+  res = wide_int_to_tree (type, wi);
+}
   isl_val_free (val);
   isl_ast_expr_free (expr);
-  mpz_clear (val_mpz_t);
   return res;
 }
 
Index: gcc/graphite-sese-to-poly.c
===
--- gcc/graphite-sese-to-poly.c (revision 245417)
+++ gcc/graphite-sese-to-poly.c (working copy)
@@ -55,7 +55,6 @@ along with GCC; see the file COPYING3.
 #include 
 #include 
 #include 
-#include 
 
 #include "graphite.h"
 
@@ -154,16 +153,32 @@ extract_affine_name (scop_p s, tree e, _
   return isl_pw_aff_alloc (dom, aff);
 }
 
+/* Convert WI to a isl_val with CTX.  */
+
+static __isl_give isl_val *
+isl_val_int_from_wi (isl_ctx *ctx, const widest_int )
+{
+  if (wi::neg_p (wi, SIGNED))
+{
+  widest_int mwi = -wi;
+  return isl_val_neg (isl_val_int_from_chunks (ctx, mwi.get_len (),
+  sizeof (HOST_WIDE_INT),
+  mwi.get_val ()));
+}
+  return isl_val_int_from_chunks (ctx, wi.get_len (), sizeof (HOST_WIDE_INT),
+ wi.get_val ());
+}
+
 /* Extract an affine expression from the gmp constant G.  */
 
 static isl_pw_aff *
-extract_affine_gmp (mpz_t g, __isl_take isl_space *space)
+extract_affine_wi (const widest_int , __isl_take isl_space *space)
 {
   isl_local_space *ls = isl_local_space_from_space (isl_space_copy (space));
   isl_aff *aff = isl_aff_zero_on_domain (ls);
   isl_set *dom = isl_set_universe (space);
   isl_ctx *ct = isl_aff_get_ctx (aff);
-  isl_val *v = isl_val_int_from_gmp (ct, g);
+  isl_val *v = isl_val_int_from_wi (ct, g);
   aff = isl_aff_add_constant_val (aff, v);
 
   return isl_pw_aff_alloc (dom, aff);
@@ -174,13 +189,7 @@ extract_affine_gmp (mpz_t g, __isl_take
 static isl_pw_aff *
 extract_affine_int (tree e, __isl_take isl_space *space)
 {
-  mpz_t g;
-
-  mpz_init (g);
-  tree_int_to_gmp (e, g);
-  isl_pw_aff *res = extract_affine_gmp (g, space);
-  mpz_clear (g);
-
+  isl_pw_aff *res = extract_affine_wi (wi::to_widest (e), space);
   return res;
 }
 
@@ -411,15 +420,11 @@ add_param_constraints (scop_p scop, grap
 {
   

[PATCH PR79347]Maintain profile counter information in vect_do_peeling

2017-02-14 Thread Bin Cheng
Hi,
This patch fixes issue reported by PR79347 by calculating/maintaining profile 
counter information
on the fly in vect_do_peeling.  Due to the order that we first peel prologue 
loop, peel epilogue loop,
and then add guarding edge skipping prolog+vector loop if niter is small, this 
patch takes a trick
that firstly scales down counters for loop before peeling and scales counters 
back after adding the
aforementioned guarding edge.  Otherwise, more work would be needed to 
calculate counters for
prolog and vector loop. After this patch, # of profile counter for tramp3d 
benchmark is improved from:

tramp3d-v4.cpp.157t.ifcvt:296
tramp3d-v4.cpp.158t.vect:1118
tramp3d-v4.cpp.159t.dce6:1118
tramp3d-v4.cpp.160t.pcom:1118
tramp3d-v4.cpp.161t.cunroll:1019
tramp3d-v4.cpp.162t.slp1:1019
tramp3d-v4.cpp.164t.ivopts:1019
tramp3d-v4.cpp.165t.lim4:1019
tramp3d-v4.cpp.166t.loopdone:1007
tramp3d-v4.cpp.167t.no_loop:31
...
tramp3d-v4.cpp.226t.optimized:1009

to:

tramp3d-v4.cpp.157t.ifcvt:296
tramp3d-v4.cpp.158t.vect:814
tramp3d-v4.cpp.159t.dce6:814
tramp3d-v4.cpp.160t.pcom:814
tramp3d-v4.cpp.161t.cunroll:723
tramp3d-v4.cpp.162t.slp1:723
tramp3d-v4.cpp.164t.ivopts:723
tramp3d-v4.cpp.165t.lim4:723
tramp3d-v4.cpp.166t.loopdone:711
tramp3d-v4.cpp.167t.no_loop:31
...
tramp3d-v4.cpp.226t.optimized:831

Bootstrap and test on x86_64 and AArch64.  Is it OK?

BTW, with the patch, vectorizer only introduces mismatches by below code in 
vect_transform_loop:

  /* Reduce loop iterations by the vectorization factor.  */
  scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
  expected_iterations / vf);

Though it makes sense to scale down according to vect-factor, but it definitely 
introduces
mismatch between vector_loop's frequency and the rest program.  I also believe 
it is not
that useful to scale here, especially without profiling information.  At least 
we need to make
vector_loop's frequency consistent with the rest of the program.

Thanks,
bin
2017-02-13  Bin Cheng  

PR tree-optimization/79347
* tree-vect-loop-manip.c (apply_probability_for_bb): New function.
(vect_do_peeling): Maintain profile counters during peeling.

gcc/testsuite/ChangeLog
2017-02-13  Bin Cheng  

PR tree-optimization/79347
* gcc.dg/vect/pr79347.c: New test.
diff --git a/gcc/testsuite/gcc.dg/vect/pr79347.c 
b/gcc/testsuite/gcc.dg/vect/pr79347.c
new file mode 100644
index 000..586c638
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr79347.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fdump-tree-vect-all" } */
+
+short *a;
+int c;
+void n(void)
+{
+  for (int i = 0; i < c; i++)
...
+static void
+apply_probability_for_bb (basic_block bb, int prob)
+{
+  bb->frequency = apply_probability (bb->frequency, prob);
+  bb->count = apply_probability (bb->count, prob);
+  gcc_assert (single_succ_p (bb));
+  single_succ_edge (bb)->count = bb->count;
+}
+
 /* Function vect_do_peeling.
 
Input:
@@ -1690,7 +1701,18 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
  may be preferred.  */
   basic_block anchor = loop_preheader_edge (loop)->src;
   if (skip_vector)
-split_edge (loop_preheader_edge (loop));
+{
+  split_edge (loop_preheader_edge (loop));
+
+  /* Due to the order in which we peel prolog and epilog, we first
+propagate probability to the whole loop.  The purpose is to
+avoid adjusting probabilities of both prolog and vector loops
+separately.  Note in this case, the probability of epilog loop
+needs to be scaled back later.  */
+  basic_block bb_before_loop = loop_preheader_edge (loop)->src;
+  apply_probability_for_bb (bb_before_loop, prob_vector);
+  scale_loop_profile (loop, prob_vector, bound);
+}
 
   tree niters_prolog = build_int_cst (type, 0);
   source_location loop_loc = find_loop_location (loop);
@@ -1727,6 +1749,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
  guard_cond = fold_build2 (EQ_EXPR, boolean_type_node,
niters_prolog, build_int_cst (type, 0));
  guard_bb = loop_preheader_edge (prolog)->src;
+ basic_block bb_after_prolog = loop_preheader_edge (loop)->src;
  guard_to = split_edge (loop_preheader_edge (loop));
  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
   

Re: [gomp4] Async related additions to OpenACC runtime library

2017-02-14 Thread Chung-Lin Tang
On 2017/2/14 07:25 PM, Thomas Schwinge wrote:
> Hi Chung-Lin!
> 
> On Mon, 13 Feb 2017 18:13:42 +0800, Chung-Lin Tang  
> wrote:
>> Tested and committed to gomp-4_0-branch.
> 
> Thanks!  (Not yet reviewed.)  Testing this, I saw a lot of regressions,
> and in r245427 just committed the following to gomp-4_0-branch to address
> these.  Did you simply forget to commit your changes to
> libgomp/libgomp.map, or why did this work for you?  Please verify:

Weird, I did not see any regressions, but thanks for adding those.
I overlooked updating the map file.

Thanks,
Chung-Lin

> commit bd5613600754bd7a1fe85990eb3b7b6b5f2e1543
> Author: tschwinge 
> Date:   Tue Feb 14 11:20:31 2017 +
> 
> Update libgomp/libgomp.map for OpenACC async functions
> 
> libgomp/
> * libgomp.map: Add OACC_2.5 version, and add acc_copyin_async,
> acc_copyin_async_32_h_, acc_copyin_async_64_h_,
> acc_copyin_async_array_h_, acc_copyout_async,
> acc_copyout_async_32_h_, acc_copyout_async_64_h_,
> acc_copyout_async_array_h_, acc_create_async,
> acc_create_async_32_h_, acc_create_async_64_h_,
> acc_create_async_array_h_, acc_delete_async,
> acc_delete_async_32_h_, acc_delete_async_64_h_,
> acc_delete_async_array_h_, acc_get_default_async,
> acc_get_default_async_h_, acc_memcpy_from_device_async,
> acc_memcpy_to_device_async, acc_set_default_async,
> acc_set_default_async_h_, acc_update_device_async,
> acc_update_device_async_32_h_, acc_update_device_async_64_h_,
> acc_update_device_async_array_h_, acc_update_self_async,
> acc_update_self_async_32_h_, acc_update_self_async_64_h_, and
> acc_update_self_async_array_h_.  Add GOMP_PLUGIN_1.2 version, and
> add GOMP_PLUGIN_acc_thread_default_async.
> 
> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@245427 
> 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>  libgomp/ChangeLog.gomp | 20 
>  libgomp/libgomp.map| 39 +++
>  2 files changed, 59 insertions(+)
> 
> diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
> index 0a5f601..b811c28 100644
> --- libgomp/ChangeLog.gomp
> +++ libgomp/ChangeLog.gomp
> @@ -1,3 +1,23 @@
> +2017-02-14  Thomas Schwinge  
> +
> + * libgomp.map: Add OACC_2.5 version, and add acc_copyin_async,
> + acc_copyin_async_32_h_, acc_copyin_async_64_h_,
> + acc_copyin_async_array_h_, acc_copyout_async,
> + acc_copyout_async_32_h_, acc_copyout_async_64_h_,
> + acc_copyout_async_array_h_, acc_create_async,
> + acc_create_async_32_h_, acc_create_async_64_h_,
> + acc_create_async_array_h_, acc_delete_async,
> + acc_delete_async_32_h_, acc_delete_async_64_h_,
> + acc_delete_async_array_h_, acc_get_default_async,
> + acc_get_default_async_h_, acc_memcpy_from_device_async,
> + acc_memcpy_to_device_async, acc_set_default_async,
> + acc_set_default_async_h_, acc_update_device_async,
> + acc_update_device_async_32_h_, acc_update_device_async_64_h_,
> + acc_update_device_async_array_h_, acc_update_self_async,
> + acc_update_self_async_32_h_, acc_update_self_async_64_h_, and
> + acc_update_self_async_array_h_.  Add GOMP_PLUGIN_1.2 version, and
> + add GOMP_PLUGIN_acc_thread_default_async.
> +
>  2017-02-13  Cesar Philippidis  
>  
>   * plugin/plugin-nvptx.c (nvptx_exec): Adjust the default num_gangs.
> diff --git libgomp/libgomp.map libgomp/libgomp.map
> index b047ad9..2c9a13d 100644
> --- libgomp/libgomp.map
> +++ libgomp/libgomp.map
> @@ -378,6 +378,40 @@ OACC_2.0 {
>   acc_set_cuda_stream;
>  };
>  
> +OACC_2.5 {
> +  global:
> + acc_copyin_async;
> + acc_copyin_async_32_h_;
> + acc_copyin_async_64_h_;
> + acc_copyin_async_array_h_;
> + acc_copyout_async;
> + acc_copyout_async_32_h_;
> + acc_copyout_async_64_h_;
> + acc_copyout_async_array_h_;
> + acc_create_async;
> + acc_create_async_32_h_;
> + acc_create_async_64_h_;
> + acc_create_async_array_h_;
> + acc_delete_async;
> + acc_delete_async_32_h_;
> + acc_delete_async_64_h_;
> + acc_delete_async_array_h_;
> + acc_get_default_async;
> + acc_get_default_async_h_;
> + acc_memcpy_from_device_async;
> + acc_memcpy_to_device_async;
> + acc_set_default_async;
> + acc_set_default_async_h_;
> + acc_update_device_async;
> + acc_update_device_async_32_h_;
> + acc_update_device_async_64_h_;
> + acc_update_device_async_array_h_;
> + acc_update_self_async;
> + acc_update_self_async_32_h_;
> + acc_update_self_async_64_h_;
> + acc_update_self_async_array_h_;
> +} OACC_2.0;
> +
>  GOACC_2.0 {
>global:
>   GOACC_data_end;

[PATCH] Fix memory leak in LTO

2017-02-14 Thread Martin Liška
Hi.

The patch fixes:

==137424== 24 bytes in 1 blocks are definitely lost in loss record 23 of 748

==137424==at 0x4C29110: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)

==137424==by 0x10C39D7: xmalloc (xmalloc.c:147)

==137424==by 0x10C3AE9: xstrdup (xstrdup.c:34)

==137424==by 0x61273F: lto_obj_file_open(char const*, bool) 
(lto-object.c:93)

==137424==by 0x60AE62: do_stream_out(char*, lto_symtab_encoder_d*) 
(lto.c:2284)

==137424==by 0x60FC72: stream_out (lto.c:2333)

==137424==by 0x60FC72: lto_wpa_write_files (lto.c:2470)

==137424==by 0x60FC72: do_whole_program_analysis (lto.c:3156)

==137424==by 0x60FC72: lto_main() (lto.c:3316)

==137424==by 0x9B830E: compile_file() (toplev.c:467)

==137424==by 0x5E2D98: do_compile (toplev.c:1984)

==137424==by 0x5E2D98: toplev::main(int, char**) (toplev.c:2118)

==137424==by 0x5E4B76: main (main.c:39)



Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 66b97767a498d438eb5740141a299ac8347d4ebc Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 14 Feb 2017 10:35:09 +0100
Subject: [PATCH] Fix memory leak in LTO

gcc/lto/ChangeLog:

2017-02-14  Martin Liska  

	* lto.c (do_stream_out): Free LTO file filename string.

---
 gcc/lto/lto.c   | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index d77d85defb6..99d58cff4d4 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -2288,6 +2288,8 @@ do_stream_out (char *temp_filename, lto_symtab_encoder_t encoder)
 
   ipa_write_optimization_summaries (encoder);
 
+  free (CONST_CAST (char *, file->filename));
+
   lto_set_current_out_file (NULL);
   lto_obj_file_close (file);
   free (file);
-- 
2.11.0



Re: [gomp4] Async related additions to OpenACC runtime library

2017-02-14 Thread Thomas Schwinge
Hi Chung-Lin!

On Mon, 13 Feb 2017 18:13:42 +0800, Chung-Lin Tang  
wrote:
> Tested and committed to gomp-4_0-branch.

Thanks!  (Not yet reviewed.)  Testing this, I saw a lot of regressions,
and in r245427 just committed the following to gomp-4_0-branch to address
these.  Did you simply forget to commit your changes to
libgomp/libgomp.map, or why did this work for you?  Please verify:

commit bd5613600754bd7a1fe85990eb3b7b6b5f2e1543
Author: tschwinge 
Date:   Tue Feb 14 11:20:31 2017 +

Update libgomp/libgomp.map for OpenACC async functions

libgomp/
* libgomp.map: Add OACC_2.5 version, and add acc_copyin_async,
acc_copyin_async_32_h_, acc_copyin_async_64_h_,
acc_copyin_async_array_h_, acc_copyout_async,
acc_copyout_async_32_h_, acc_copyout_async_64_h_,
acc_copyout_async_array_h_, acc_create_async,
acc_create_async_32_h_, acc_create_async_64_h_,
acc_create_async_array_h_, acc_delete_async,
acc_delete_async_32_h_, acc_delete_async_64_h_,
acc_delete_async_array_h_, acc_get_default_async,
acc_get_default_async_h_, acc_memcpy_from_device_async,
acc_memcpy_to_device_async, acc_set_default_async,
acc_set_default_async_h_, acc_update_device_async,
acc_update_device_async_32_h_, acc_update_device_async_64_h_,
acc_update_device_async_array_h_, acc_update_self_async,
acc_update_self_async_32_h_, acc_update_self_async_64_h_, and
acc_update_self_async_array_h_.  Add GOMP_PLUGIN_1.2 version, and
add GOMP_PLUGIN_acc_thread_default_async.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@245427 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp | 20 
 libgomp/libgomp.map| 39 +++
 2 files changed, 59 insertions(+)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 0a5f601..b811c28 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,23 @@
+2017-02-14  Thomas Schwinge  
+
+   * libgomp.map: Add OACC_2.5 version, and add acc_copyin_async,
+   acc_copyin_async_32_h_, acc_copyin_async_64_h_,
+   acc_copyin_async_array_h_, acc_copyout_async,
+   acc_copyout_async_32_h_, acc_copyout_async_64_h_,
+   acc_copyout_async_array_h_, acc_create_async,
+   acc_create_async_32_h_, acc_create_async_64_h_,
+   acc_create_async_array_h_, acc_delete_async,
+   acc_delete_async_32_h_, acc_delete_async_64_h_,
+   acc_delete_async_array_h_, acc_get_default_async,
+   acc_get_default_async_h_, acc_memcpy_from_device_async,
+   acc_memcpy_to_device_async, acc_set_default_async,
+   acc_set_default_async_h_, acc_update_device_async,
+   acc_update_device_async_32_h_, acc_update_device_async_64_h_,
+   acc_update_device_async_array_h_, acc_update_self_async,
+   acc_update_self_async_32_h_, acc_update_self_async_64_h_, and
+   acc_update_self_async_array_h_.  Add GOMP_PLUGIN_1.2 version, and
+   add GOMP_PLUGIN_acc_thread_default_async.
+
 2017-02-13  Cesar Philippidis  
 
* plugin/plugin-nvptx.c (nvptx_exec): Adjust the default num_gangs.
diff --git libgomp/libgomp.map libgomp/libgomp.map
index b047ad9..2c9a13d 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -378,6 +378,40 @@ OACC_2.0 {
acc_set_cuda_stream;
 };
 
+OACC_2.5 {
+  global:
+   acc_copyin_async;
+   acc_copyin_async_32_h_;
+   acc_copyin_async_64_h_;
+   acc_copyin_async_array_h_;
+   acc_copyout_async;
+   acc_copyout_async_32_h_;
+   acc_copyout_async_64_h_;
+   acc_copyout_async_array_h_;
+   acc_create_async;
+   acc_create_async_32_h_;
+   acc_create_async_64_h_;
+   acc_create_async_array_h_;
+   acc_delete_async;
+   acc_delete_async_32_h_;
+   acc_delete_async_64_h_;
+   acc_delete_async_array_h_;
+   acc_get_default_async;
+   acc_get_default_async_h_;
+   acc_memcpy_from_device_async;
+   acc_memcpy_to_device_async;
+   acc_set_default_async;
+   acc_set_default_async_h_;
+   acc_update_device_async;
+   acc_update_device_async_32_h_;
+   acc_update_device_async_64_h_;
+   acc_update_device_async_array_h_;
+   acc_update_self_async;
+   acc_update_self_async_32_h_;
+   acc_update_self_async_64_h_;
+   acc_update_self_async_array_h_;
+} OACC_2.0;
+
 GOACC_2.0 {
   global:
GOACC_data_end;
@@ -417,3 +451,8 @@ GOMP_PLUGIN_1.1 {
   global:
GOMP_PLUGIN_target_task_completion;
 } GOMP_PLUGIN_1.0;
+
+GOMP_PLUGIN_1.2 {
+  global:
+   GOMP_PLUGIN_acc_thread_default_async;
+} GOMP_PLUGIN_1.1;


Regards
 Thomas


Re: Fix profile updating after outer loop unswitching

2017-02-14 Thread Martin Liška
On 02/05/2017 06:28 PM, Jan Hubicka wrote:
> +  /* ... finally scale everything in the loop except for guarded basic blocks
> + where profile does not change.  */
> +  basic_block *body = get_loop_body (loop);

Hello.

This hunk causes a new memory leak:

==24882== 64 bytes in 1 blocks are definitely lost in loss record 328 of 892

==24882==at 0x4C29110: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)

==24882==by 0x115DFF7: xmalloc (xmalloc.c:147)

==24882==by 0x6FADCC: get_loop_body(loop const*) (cfgloop.c:834)

==24882==by 0xB77520: hoist_guard (tree-ssa-loop-unswitch.c:881)

==24882==by 0xB77520: tree_unswitch_outer_loop 
(tree-ssa-loop-unswitch.c:536)

==24882==by 0xB77520: tree_ssa_unswitch_loops() 
(tree-ssa-loop-unswitch.c:104)

==24882==by 0x99C65E: execute_one_pass(opt_pass*) (passes.c:2465)

==24882==by 0x99CE17: execute_pass_list_1(opt_pass*) (passes.c:2554)

==24882==by 0x99CE29: execute_pass_list_1(opt_pass*) (passes.c:2555)

==24882==by 0x99CE29: execute_pass_list_1(opt_pass*) (passes.c:2555)

==24882==by 0x99CE74: execute_pass_list(function*, opt_pass*) 
(passes.c:2565)

==24882==by 0x71E745: cgraph_node::expand() (cgraphunit.c:2038)

==24882==by 0x71FCC3: expand_all_functions (cgraphunit.c:2174)

==24882==by 0x71FCC3: symbol_table::compile() (cgraphunit.c:2531)

==24882==by 0x7214DB: symbol_table::finalize_compilation_unit() 
(cgraphunit.c:2621)

==24882==by 0xA5B3AB: compile_file() (toplev.c:492)

==24882==by 0x5F3A78: do_compile (toplev.c:1984)

==24882==by 0x5F3A78: toplev::main(int, char**) (toplev.c:2118)

==24882==by 0x5F5826: main (main.c:39)


Fixed in attached patch that can bootstrap on ppc64le-redhat-linux and survives 
regression tests.

Ready to be installed?
Martin

From 290c88481c42ca4c09cf4c4202903fc84d0c9dc4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 14 Feb 2017 09:45:41 +0100
Subject: [PATCH 1/2] Fix memory leak in tree-ssa-loop-unswitch.c

gcc/ChangeLog:

2017-02-14  Martin Liska  

	* tree-ssa-loop-unswitch.c (hoist_guard): Release get_loop_body
	vector.  Fix trailing white spaces.
---
 gcc/tree-ssa-loop-unswitch.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-loop-unswitch.c b/gcc/tree-ssa-loop-unswitch.c
index 143caf73e86..afa04e9d110 100644
--- a/gcc/tree-ssa-loop-unswitch.c
+++ b/gcc/tree-ssa-loop-unswitch.c
@@ -820,7 +820,7 @@ hoist_guard (struct loop *loop, edge guard)
   /* Create new loop pre-header.  */
   e = split_block (pre_header, last_stmt (pre_header));
   if (dump_file && (dump_flags & TDF_DETAILS))
-fprintf (dump_file, "  Moving guard %i->%i (prob %i) to bb %i, "	
+fprintf (dump_file, "  Moving guard %i->%i (prob %i) to bb %i, "
 	 "new preheader is %i\n",
 	 guard->src->index, guard->dest->index, guard->probability,
 	 e->src->index, e->dest->index);
@@ -879,7 +879,7 @@ hoist_guard (struct loop *loop, edge guard)
   /* ... finally scale everything in the loop except for guarded basic blocks
  where profile does not change.  */
   basic_block *body = get_loop_body (loop);
-  
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "  Scaling nonguarded BBs in loop:");
   for (unsigned int i = 0; i < loop->num_nodes; i++)
@@ -920,6 +920,8 @@ hoist_guard (struct loop *loop, edge guard)
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 fprintf (dump_file, "\n  guard hoisted.\n");
+
+  free (body);
 }
 
 /* Return true if phi argument for exit edge can be used
-- 
2.11.0



Re: [PATCH] Fix exception handling for ILP32 aarch64

2017-02-14 Thread Richard Earnshaw (lists)
On 07/02/17 23:11, Steve Ellcey wrote:
> This patch was submitted last year by Andrew Pinski, this is a
> resubmit/ping of that patch.
> 
>   https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01726.html
> 
> During the initial submittal James Greenhalgh asked if this was an ABI change.
> I do not believe it is because while it is changing the size of the reg
> structure in _Unwind_Context that structure is opaque to anything outside of
> libunwind and so the change is not visible to user programs.  It is not
> changing the type or size of _Unwind_Word which remains as 64 bits but it
> fixes some places where those 64 bits are being truncated to 32 bits.  Any
> ILP32 Aarch64 program that uses an unwind library without this change may
> abort because of that truncation.
> 
> The other thing that this patch does (by setting REG_VALUE_IN_UNWIND_CONTEXT)
> is change the value of ASSUME_EXTENDED_UNWIND_CONTEXT to 1.  That should not
> matter because we are currently setting EXTENDED_CONTEXT_BIT in
> uw_init_context_1 anyway and that basically does the same thing, saying that
> we have the opaque extended unwind context structure which can be changed
> without affecting the ABI.  I believe that only pre-G++ 3.0 objects would not
> have the extended opaque context structure and there is no ILP32 Aarch64
> support in a compiler that old.
> 
> I am a little confused about why, when ASSUME_EXTENDED_UNWIND_CONTEXT
> is set in unwind-dw2.c, we don't still set the EXTENDED_CONTEXT_BIT in
> uw_init_context_1.  I guess it doesn't matter because once we start assuming
> the extended unwind context we will never go back to allowing a mix and match
> with pre 3.0 unextended unwind contexts so it doesn't matter if the bit is set
> or not.
> 
> I actually tested this patch without changing ASSUME_EXTENDED_UNWIND_CONTEXT
> and it worked as well as this patch where ASSUME_EXTENDED_UNWIND_CONTEXT is
> changed so that change could be left out out by setting it to 0 in the new
> aarch64 value-unwind.h header file if we thought there was a reason to do
> that.
> 
> Steve Ellcey
> sell...@cavium.com
> 
> 
> 2017-02-07  Andrew Pinski  
> 
>   * config/aarch64/value-unwind.h: New file.
>   * config.host (aarch64*-*-*): Add aarch64/value-unwind.h
>   to tm_file.
> 

OK.

R.

> 
> unwind.patch
> 
> 
> diff --git a/libgcc/config.host b/libgcc/config.host
> index 9472a60..8bab369 100644
> --- a/libgcc/config.host
> +++ b/libgcc/config.host
> @@ -1379,4 +1379,8 @@ i[34567]86-*-linux* | x86_64-*-linux*)
>   fi
>   tm_file="${tm_file} i386/value-unwind.h"
>   ;;
> +aarch64*-*-*)
> + # ILP32 needs an extra header for unwinding
> + tm_file="${tm_file} aarch64/value-unwind.h"
> + ;;
>  esac
> diff --git a/libgcc/config/aarch64/value-unwind.h 
> b/libgcc/config/aarch64/value-unwind.h
> index e69de29..c79e832 100644
> --- a/libgcc/config/aarch64/value-unwind.h
> +++ b/libgcc/config/aarch64/value-unwind.h
> @@ -0,0 +1,25 @@
> +/* Store register values as _Unwind_Word type in DWARF2 EH unwind context.
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +
> +   This file is part of GCC.
> +
> +   GCC is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published
> +   by the Free Software Foundation; either version 3, or (at your
> +   option) any later version.
> +
> +   GCC is distributed in the hope that it will be useful, but WITHOUT
> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +   License for more details.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* Define this macro if the target stores register values as _Unwind_Word
> +   type in unwind context.  Only enable it for ilp32.  */
> +#if defined __aarch64__ && !defined __LP64__
> +# define REG_VALUE_IN_UNWIND_CONTEXT
> +#endif
> 



Re: [RFA][PR tree-optimization/79095] [PATCH 1/4] Improve ranges for MINUS_EXPR and EXACT_DIV_EXPR V2

2017-02-14 Thread Marc Glisse

On Mon, 13 Feb 2017, Jeff Law wrote:


On 02/13/2017 09:15 AM, Marc Glisse wrote:

On Mon, 13 Feb 2017, Jeff Law wrote:


On 02/12/2017 12:13 AM, Marc Glisse wrote:

On Tue, 7 Feb 2017, Jeff Law wrote:


* tree-vrp.c (extract_range_from_binary_expr_1): For EXACT_DIV_EXPR,
if the numerator has the range ~[0,0] make the resultant range ~[0,0].


If I understand correctly, for x /[ex] 4 with x!=0, we currently split
~[0,0] into [INT_MIN,-1] and [1,INT_MAX], then apply EXACT_DIV_EXPR
which gives [INT_MIN/4,-1] and [1,INT_MAX/4], and finally compute the
union of those, which prefers [INT_MIN/4,INT_MAX/4] over ~[0,0]. We
could change the union function, but this patch prefers changing the
code elsewhere so that the new behavior is restricted to the
EXACT_DIV_EXPR case (and supposedly the patch will be reverted if we get
a new non-contiguous version of ranges, where union would already work).
Is that it?

That was one of alternate approaches I suggested.

Part of the problem is the conversion of ~[0,0] is imprecise when it's
going to be used in EXACT_DIV_EXPR, which I outlined elsewhere in the
thread.  As a result of that imprecision, the ranges we get are
[INT_MIN/4,0] U [0,INT_MAX/4].


If VRP for [1, INT_MAX] /[ex] 4 produces [0, INT_MAX/4] instead of [1,
INT_MAX/4], that's a bug that should be fixed in any case. You shouldn't
need [4, INT_MAX] for that.
Agreed.  But given it doesn't actually make anything around 79095 easier, I'd 
just as soon defer to gcc-8.


That all depends how you handle the intersection/union thing.

I suspect that we'll see nicely refined anti-ranges, but rarely see 
improvements in the generated code.





If we fix that imprecision so that the conversion yields [INT_MIN,-4]
U [4, INT_MAX] then apply EXACT_DIV_EXPR we get [INT_MIN/4,-1] U
[1,INT_MAX/4], which union_ranges turns into [INT_MIN/4,INT_MAX/4].
We still end up needing a hack in union_ranges that will look a hell
of a lot like the one we're contemplating for intersect_ranges.
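
The collapse described above can be sketched as follows (an illustrative model, not GCC's actual types): a value range stored as one contiguous interval can only keep the outer bounds of a union, so the gap around zero is lost.

```c
#include <limits.h>

/* Minimal model of a contiguous value range.  */
struct range { long min, max; };

/* Union represented as a single contiguous interval: only the outer
   bounds survive; any gap between the operands is lost.  */
static struct range
range_union (struct range a, struct range b)
{
  struct range r;
  r.min = a.min < b.min ? a.min : b.min;
  r.max = a.max > b.max ? a.max : b.max;
  return r;
}

static int
range_contains (struct range r, long v)
{
  return r.min <= v && v <= r.max;
}
```

Uniting [INT_MIN/4, -1] and [1, INT_MAX/4] this way reintroduces 0, even though x /[ex] 4 can never be 0 when x != 0.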


That was the point of my question. Do we want to put that "hack" (prefer
an anti-range in some cases) in a central place where it would apply any
time we try to replace [a,b]U[c,d] (b+1

Re: [PATCH, GCC/x86 mingw32] Add configure option to force wildcard behavior on Windows

2017-02-14 Thread JonY
On 02/14/2017 09:32 AM, Thomas Preudhomme wrote:
>>
>> Looks good, be sure to emphasize this option affects mingw hosted GCC
>> only, not the compiler output.
> 
> I think that should be pretty clear in the latest version of the patch,
> doc/install.texi contains:
> 
> "Note that this option only affects wildcard expansion for GCC itself. 
> It does
> not affect wildcard expansion of executables built by the resulting GCC."
> 
> If you think a part of that sentence is still confusing please let me
> know and I'll improve it.
> 
> Best regards,
> 
> Thomas
> 

Yes, that should be good, no more objections.






Re: [PATCH][ARM] PR rtl-optimization/68664 Implement TARGET_SCHED_CAN_SPECULATE_INSN hook

2017-02-14 Thread Richard Earnshaw (lists)
On 14/02/17 10:11, Kyrill Tkachov wrote:
> Hi all,
> 
> And this is the arm implementation of the hook. It is the same as the
> aarch64 one since the two ports
> share their instruction types for scheduling purposes.
> 
> Bootstrapped and tested on arm-none-linux-gnueabihf.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-02-07  Kyrylo Tkachov  
> 
> PR rtl-optimization/68664
> * config/arm/arm.c (arm_sched_can_speculate_insn):
> New function.  Declare prototype.
> (TARGET_SCHED_CAN_SPECULATE_INSN): Define.
> 

OK.

R.

> arm-spec.patch
> 
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 
> b7f7179d99ff211e6be518fdbbc4bdff312d6a07..08a472f8658b49455a57bf324eada2b674436541
>  100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -240,6 +240,7 @@ static bool arm_can_inline_p (tree, tree);
>  static void arm_relayout_function (tree);
>  static bool arm_valid_target_attribute_p (tree, tree, tree, int);
>  static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
> +static bool arm_sched_can_speculate_insn (rtx_insn *);
>  static bool arm_macro_fusion_p (void);
>  static bool arm_cannot_copy_insn_p (rtx_insn *);
>  static int arm_issue_rate (void);
> @@ -419,6 +420,9 @@ static const struct attribute_spec arm_attribute_table[] =
>  #undef  TARGET_COMP_TYPE_ATTRIBUTES
>  #define TARGET_COMP_TYPE_ATTRIBUTES arm_comp_type_attributes
>  
> +#undef TARGET_SCHED_CAN_SPECULATE_INSN
> +#define TARGET_SCHED_CAN_SPECULATE_INSN arm_sched_can_speculate_insn
> +
>  #undef TARGET_SCHED_MACRO_FUSION_P
>  #define TARGET_SCHED_MACRO_FUSION_P arm_macro_fusion_p
>  
> @@ -30085,6 +30089,35 @@ arm_fusion_enabled_p (tune_params::fuse_ops op)
>return current_tune->fusible_ops & op;
>  }
>  
> +/* Implement TARGET_SCHED_CAN_SPECULATE_INSN.  Return true if INSN can be
> +   scheduled for speculative execution.  Reject the long-running division
> +   and square-root instructions.  */
> +
> +static bool
> +arm_sched_can_speculate_insn (rtx_insn *insn)
> +{
> +  switch (get_attr_type (insn))
> +{
> +  case TYPE_SDIV:
> +  case TYPE_UDIV:
> +  case TYPE_FDIVS:
> +  case TYPE_FDIVD:
> +  case TYPE_FSQRTS:
> +  case TYPE_FSQRTD:
> +  case TYPE_NEON_FP_SQRT_S:
> +  case TYPE_NEON_FP_SQRT_D:
> +  case TYPE_NEON_FP_SQRT_S_Q:
> +  case TYPE_NEON_FP_SQRT_D_Q:
> +  case TYPE_NEON_FP_DIV_S:
> +  case TYPE_NEON_FP_DIV_D:
> +  case TYPE_NEON_FP_DIV_S_Q:
> +  case TYPE_NEON_FP_DIV_D_Q:
> + return false;
> +  default:
> + return true;
> +}
> +}
> +
>  /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
>  
>  static unsigned HOST_WIDE_INT
> 



Re: [PATCH][AArch64] PR rtl-optimization/68664 Implement TARGET_SCHED_CAN_SPECULATE_INSN hook

2017-02-14 Thread Richard Earnshaw (lists)
On 14/02/17 10:08, Kyrill Tkachov wrote:
> Hi all,
> 
> Following up from Segher's patch here is the aarch64 implementation of
> the new hook.
> It forbids speculation of the integer and floating-point division
> instructions as well as the
> square-root instructions.
> 
> With this patch the fsqrt is not speculated and the performance on the
> code in the PR is improved 3x
> on a Cortex-A53.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?
> 
> Thanks,
> Kyrill
> 
> 2016-02-14  Kyrylo Tkachov  
> 
> PR rtl-optimization/68664
> * config/aarch64/aarch64.c (aarch64_sched_can_speculate_insn):
> New function.
> (TARGET_SCHED_CAN_SPECULATE_INSN): Define.
> 
> aarch64-spec.patch

OK.

R.

> 
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> eab719d076358f01023cf8b2a37d3c8edd8d8f1f..f72e4c4423d28af66f3bd8068eeb83060d541839
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -14750,6 +14750,35 @@ aarch64_excess_precision (enum excess_precision_type 
> type)
>return FLT_EVAL_METHOD_UNPREDICTABLE;
>  }
>  
> +/* Implement TARGET_SCHED_CAN_SPECULATE_INSN.  Return true if INSN can be
> +   scheduled for speculative execution.  Reject the long-running division
> +   and square-root instructions.  */
> +
> +static bool
> +aarch64_sched_can_speculate_insn (rtx_insn *insn)
> +{
> +  switch (get_attr_type (insn))
> +{
> +  case TYPE_SDIV:
> +  case TYPE_UDIV:
> +  case TYPE_FDIVS:
> +  case TYPE_FDIVD:
> +  case TYPE_FSQRTS:
> +  case TYPE_FSQRTD:
> +  case TYPE_NEON_FP_SQRT_S:
> +  case TYPE_NEON_FP_SQRT_D:
> +  case TYPE_NEON_FP_SQRT_S_Q:
> +  case TYPE_NEON_FP_SQRT_D_Q:
> +  case TYPE_NEON_FP_DIV_S:
> +  case TYPE_NEON_FP_DIV_D:
> +  case TYPE_NEON_FP_DIV_S_Q:
> +  case TYPE_NEON_FP_DIV_D_Q:
> + return false;
> +  default:
> + return true;
> +}
> +}
> +
>  /* Target-specific selftests.  */
>  
>  #if CHECKING_P
> @@ -15138,6 +15167,9 @@ aarch64_libgcc_floating_mode_supported_p
>  #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
>aarch64_use_by_pieces_infrastructure_p
>  
> +#undef TARGET_SCHED_CAN_SPECULATE_INSN
> +#define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn
> +
>  #undef TARGET_CAN_USE_DOLOOP_P
>  #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
>  
> 



[PATCH][ARM] PR rtl-optimization/68664 Implement TARGET_SCHED_CAN_SPECULATE_INSN hook

2017-02-14 Thread Kyrill Tkachov

Hi all,

And this is the arm implementation of the hook. It is the same as the aarch64 
one since the two ports
share their instruction types for scheduling purposes.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2016-02-07  Kyrylo Tkachov  

PR rtl-optimization/68664
* config/arm/arm.c (arm_sched_can_speculate_insn):
New function.  Declare prototype.
(TARGET_SCHED_CAN_SPECULATE_INSN): Define.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b7f7179d99ff211e6be518fdbbc4bdff312d6a07..08a472f8658b49455a57bf324eada2b674436541 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -240,6 +240,7 @@ static bool arm_can_inline_p (tree, tree);
 static void arm_relayout_function (tree);
 static bool arm_valid_target_attribute_p (tree, tree, tree, int);
 static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode);
+static bool arm_sched_can_speculate_insn (rtx_insn *);
 static bool arm_macro_fusion_p (void);
 static bool arm_cannot_copy_insn_p (rtx_insn *);
 static int arm_issue_rate (void);
@@ -419,6 +420,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef  TARGET_COMP_TYPE_ATTRIBUTES
 #define TARGET_COMP_TYPE_ATTRIBUTES arm_comp_type_attributes
 
+#undef TARGET_SCHED_CAN_SPECULATE_INSN
+#define TARGET_SCHED_CAN_SPECULATE_INSN arm_sched_can_speculate_insn
+
 #undef TARGET_SCHED_MACRO_FUSION_P
 #define TARGET_SCHED_MACRO_FUSION_P arm_macro_fusion_p
 
@@ -30085,6 +30089,35 @@ arm_fusion_enabled_p (tune_params::fuse_ops op)
   return current_tune->fusible_ops & op;
 }
 
+/* Implement TARGET_SCHED_CAN_SPECULATE_INSN.  Return true if INSN can be
+   scheduled for speculative execution.  Reject the long-running division
+   and square-root instructions.  */
+
+static bool
+arm_sched_can_speculate_insn (rtx_insn *insn)
+{
+  switch (get_attr_type (insn))
+{
+  case TYPE_SDIV:
+  case TYPE_UDIV:
+  case TYPE_FDIVS:
+  case TYPE_FDIVD:
+  case TYPE_FSQRTS:
+  case TYPE_FSQRTD:
+  case TYPE_NEON_FP_SQRT_S:
+  case TYPE_NEON_FP_SQRT_D:
+  case TYPE_NEON_FP_SQRT_S_Q:
+  case TYPE_NEON_FP_SQRT_D_Q:
+  case TYPE_NEON_FP_DIV_S:
+  case TYPE_NEON_FP_DIV_D:
+  case TYPE_NEON_FP_DIV_S_Q:
+  case TYPE_NEON_FP_DIV_D_Q:
+	return false;
+  default:
+	return true;
+}
+}
+
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
 
 static unsigned HOST_WIDE_INT


[PATCH][AArch64] PR rtl-optimization/68664 Implement TARGET_SCHED_CAN_SPECULATE_INSN hook

2017-02-14 Thread Kyrill Tkachov

Hi all,

Following up from Segher's patch here is the aarch64 implementation of the new 
hook.
It forbids speculation of the integer and floating-point division instructions 
as well as the
square-root instructions.

With this patch the fsqrt is not speculated and the performance on the code in
the PR is improved 3x
on a Cortex-A53.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-02-14  Kyrylo Tkachov  

PR rtl-optimization/68664
* config/aarch64/aarch64.c (aarch64_sched_can_speculate_insn):
New function.
(TARGET_SCHED_CAN_SPECULATE_INSN): Define.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index eab719d076358f01023cf8b2a37d3c8edd8d8f1f..f72e4c4423d28af66f3bd8068eeb83060d541839 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14750,6 +14750,35 @@ aarch64_excess_precision (enum excess_precision_type type)
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Implement TARGET_SCHED_CAN_SPECULATE_INSN.  Return true if INSN can be
+   scheduled for speculative execution.  Reject the long-running division
+   and square-root instructions.  */
+
+static bool
+aarch64_sched_can_speculate_insn (rtx_insn *insn)
+{
+  switch (get_attr_type (insn))
+{
+  case TYPE_SDIV:
+  case TYPE_UDIV:
+  case TYPE_FDIVS:
+  case TYPE_FDIVD:
+  case TYPE_FSQRTS:
+  case TYPE_FSQRTD:
+  case TYPE_NEON_FP_SQRT_S:
+  case TYPE_NEON_FP_SQRT_D:
+  case TYPE_NEON_FP_SQRT_S_Q:
+  case TYPE_NEON_FP_SQRT_D_Q:
+  case TYPE_NEON_FP_DIV_S:
+  case TYPE_NEON_FP_DIV_D:
+  case TYPE_NEON_FP_DIV_S_Q:
+  case TYPE_NEON_FP_DIV_D_Q:
+	return false;
+  default:
+	return true;
+}
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -15138,6 +15167,9 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \
   aarch64_use_by_pieces_infrastructure_p
 
+#undef TARGET_SCHED_CAN_SPECULATE_INSN
+#define TARGET_SCHED_CAN_SPECULATE_INSN aarch64_sched_can_speculate_insn
+
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
 


[PATCH PR71437/V2]Simplify cond with assertions in threading

2017-02-14 Thread Bin Cheng
Hi,
This is the second try at fixing PR71437.  The old version of the patch tried to
fix the issue in VRP, but that requires further non-trivial changes in VRP,
specifically to better support variable value ranges, which is not appropriate
at stage 4.  Alternatively, this patch tries to fix the issue by improving
threading: it additionally simplifies the condition using assertion conditions.

Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2017-02-13  Bin Cheng  

PR tree-optimization/71437
* tree-ssa-loop-niter.c (tree_simplify_using_condition): Only
expand condition if new parameter says so.  Also change it to
global.
* tree-ssa-loop-niter.h (tree_simplify_using_condition): New
declaration.
* tree-ssa-threadedge.c (tree-ssa-loop-niter.h): New include file.
(simplify_control_stmt_condition_1): Simplify condition using
assert conditions.

gcc/testsuite/ChangeLog
2017-02-13  Bin Cheng  

PR tree-optimization/71437
* gcc.dg/tree-ssa/pr71437.c: New test.diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c
new file mode 100644
index 000..66a5405
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71437.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-ffast-math -O3 -fdump-tree-vrp1-details" } */
+
+int I = 50, J = 50;
+int S, L;
+const int *pL;
+const int *pS;
+
+void bar (float, float);
+
+void foo (int K)
+{
+  int k, i, j;
+  static float LD, SD;
+  for (k = 0 ; k < K; k++)
+{
+for( i = 0 ; i < ( I - 1 ) ; i++ )
+{
+if( ( L < pL[i+1] ) && ( L >= pL[i] ) )
+  break ;
+}
+
+if( i == ( I - 1 ) )
+  L = pL[i] ;
+LD = (float)( L - pL[i] ) /
+(float)( pL[i + 1] - pL[i] ) ;
+
+for( j = 0 ; j < ( J-1 ) ; j++ )
+{
+if( ( S < pS[j+1] ) && ( S >= pS[j] ) )
+  break ;
+}
+
+if( j == ( J - 1 ) )
+  S = pS[j] ;
+SD = (float)( S - pS[j] ) /
+ (float)( pS[j + 1] - pS[j] ) ;
+
+   bar (LD, SD);
+}
+}
+/* { dg-final { scan-tree-dump-times "Threaded jump " 2 "vrp1" } } */
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index efcf3ed..52baad1 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2057,12 +2057,14 @@ tree_simplify_using_condition_1 (tree cond, tree expr)
Wrapper around tree_simplify_using_condition_1 that ensures that chains
of simple operations in definitions of ssa names in COND are expanded,
so that things like casts or incrementing the value of the bound before
-   the loop do not cause us to fail.  */
+   the loop do not cause us to fail.  COND is expanded before simplifying
+   if EXPAND is true.  */
 
-static tree
-tree_simplify_using_condition (tree cond, tree expr)
+tree
+tree_simplify_using_condition (tree cond, tree expr, bool expand)
 {
-  cond = expand_simple_operations (cond);
+  if (expand)
+cond = expand_simple_operations (cond);
 
   return tree_simplify_using_condition_1 (cond, expr);
 }
diff --git a/gcc/tree-ssa-loop-niter.h b/gcc/tree-ssa-loop-niter.h
index b009857..4e572df 100644
--- a/gcc/tree-ssa-loop-niter.h
+++ b/gcc/tree-ssa-loop-niter.h
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_TREE_SSA_LOOP_NITER_H
 
 extern tree expand_simple_operations (tree, tree = NULL);
+extern tree tree_simplify_using_condition (tree, tree, bool = true);
 extern tree simplify_using_initial_conditions (struct loop *, tree);
 extern bool loop_only_exit_p (const struct loop *, const_edge);
 extern bool number_of_iterations_exit (struct loop *, edge,
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 4949bfa..fa2891d 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "gimple-iterator.h"
 #include "tree-cfg.h"
+#include "tree-ssa-loop-niter.h"
 #include "tree-ssa-threadupdate.h"
 #include "params.h"
 #include "tree-ssa-scopedtables.h"
@@ -561,6 +562,46 @@ simplify_control_stmt_condition_1 (edge e,
   if (limit == 0)
 return NULL_TREE;
 
+  /* Simplify condition using assertion conditions.  */
+  if (handle_dominating_asserts
+  && TREE_CODE (op0) == SSA_NAME && TREE_CODE (op1) == SSA_NAME)
+{
+  tree assert_op0 = op0, assert_op1 = op1;
+  tree assert_cond0 = NULL_TREE, assert_cond1 = NULL_TREE;
+  gimple *def0 = SSA_NAME_DEF_STMT (op0), *def1 = SSA_NAME_DEF_STMT (op1);
+
+  if (is_gimple_assign (def0)
+ && TREE_CODE (gimple_assign_rhs1 (def0)) == ASSERT_EXPR)
+   {
+ assert_op0 = TREE_OPERAND (gimple_assign_rhs1 (def0), 0);
+ assert_cond0 = TREE_OPERAND (gimple_assign_rhs1 (def0), 1);
+   }
+  if (is_gimple_assign (def1)
+ && 

Re: [PATCH] Fix PR70022

2017-02-14 Thread Richard Biener
On Tue, 14 Feb 2017, Martin Liška wrote:

> Hi.
> 
> As mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79498#c5, the 
> hunk in fold-const.c
> was not properly applied to GCC 5 branch. I've just tested the branch with 
> the patch.
> 
> Ready to install the hunk?

Sure - not sure how I messed that up...

Richard.

Re: [PATCH] Fix PR70022

2017-02-14 Thread Martin Liška
Hi.

As mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79498#c5, the hunk 
in fold-const.c
was not properly applied to GCC 5 branch. I've just tested the branch with the 
patch.

Ready to install the hunk?
Thanks,
Martin


Re: [PATCH, GCC/x86 mingw32] Add configure option to force wildcard behavior on Windows

2017-02-14 Thread Thomas Preudhomme

Hi Jonathan,

Sorry for the delay answering.

On 07/02/17 08:47, JonY wrote:

On 01/26/2017 01:04 PM, Thomas Preudhomme wrote:

Hi JonY,

On 19/01/17 01:37, JonY wrote:

On 01/18/2017 09:48 AM, Thomas Preudhomme wrote:

By default, wildcard support on Windows for programs compiled with mingw
depends on how the mingw runtime was configured. This means if one wants
to build GCC for Windows with a consistent behavior with Wildcard
(enabled or disabled) the mingw runtime must be built as well. This
patch adds an option to GCC configuration to force the behavior with
wildcard when building GCC for Windows host. It does so by setting the
_dowildcard variable in the driver to a given value depending on the
configure option value (yes or no), thus overriding the variable from
mingw runtime.
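
For context, the runtime mechanism being overridden can be sketched from the user side (a hedged illustration: the variable semantics follow the mingw-w64 convention, and on non-Windows hosts the definition is simply an inert global):

```c
/* The mingw-w64 startup code consults a global named _dowildcard and
   expands command-line wildcards into argv when it is nonzero.  The
   patch sets this variable inside the GCC driver itself; any
   mingw-hosted program could hypothetically do the same to force the
   behavior regardless of how the runtime library was configured.  */
int _dowildcard = -1;   /* nonzero: expand argv wildcards on mingw hosts */
```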

Testing: I've successfully done a build of the arm-none-eabi cross GCC
for Windows with Ubuntu system mingw runtime (configured without
wildcard support by default) with the three configure options:
  1) --enable-wildcard: wildcard can be used successfully and nm of
driver-mingw32.o shows that _dowildcard is in .data section
  2) --disable-wildcard: wildcard cannot be used and nm of
driver-mingw32.o shows that _dowildcard is in .bss section
  3) no option: wildcard cannot be used and nm of driver-mingw32.o shows
no _dowildcard defined and all sections are empty

Is this ok for stage1?



Looks good, be sure to emphasize this option affects mingw hosted GCC
only, not the compiler output.


I think that should be pretty clear in the latest version of the patch, 
doc/install.texi contains:


"Note that this option only affects wildcard expansion for GCC itself.  It does
not affect wildcard expansion of executables built by the resulting GCC."

If you think a part of that sentence is still confusing please let me know and 
I'll improve it.


Best regards,

Thomas


RE: [PATCH] [X86_64] Fix alignment for znver1 arch.

2017-02-14 Thread Kumar, Venkataramanan
Thanks Uros,

I committed on Amit's behalf.
https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=245423

regards,
venkat.

> -Original Message-
> From: Uros Bizjak [mailto:ubiz...@gmail.com]
> Sent: Tuesday, February 14, 2017 2:43 PM
> To: Pawar, Amit 
> Cc: gcc-patches@gcc.gnu.org; Kumar, Venkataramanan
> 
> Subject: Re: [PATCH] [X86_64] Fix alignment for znver1 arch.
> 
> On Tue, Feb 14, 2017 at 8:48 AM, Pawar, Amit  wrote:
> > Hi maintainers,
> >
> > Please find the below patch which changes the code alignment values for
> znver1. Bootstrap and regression test passed on x86_64.
> > OK to apply?
> 
> OK.
> 
> Thanks,
> Uros.
> 
> > Thanks,
> > Amit Pawar
> >
> > 
> > diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 2561d53..a5b0159
> > 100644
> > --- a/gcc/ChangeLog
> > +++ b/gcc/ChangeLog
> > @@ -1,3 +1,8 @@
> > +2017-02-13  Amit Pawar  
> > +
> > + * config/i386/i386.c (znver1_cost): Fix the alignment for function
> > + and max skip bytes for function, loop and jump.
> > +
> >  2017-02-13  Martin Sebor  
> >
> >   PR middle-end/79496
> > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index
> > d7dce4b..d9a4a38 100644
> > --- a/gcc/config/i386/i386.c
> > +++ b/gcc/config/i386/i386.c
> > @@ -2672,7 +2672,7 @@ static const struct ptt
> processor_target_table[PROCESSOR_max] =
> >{"bdver4", _cost, 16, 10, 16, 7, 11},
> >{"btver1", _cost, 16, 10, 16, 7, 11},
> >{"btver2", _cost, 16, 10, 16, 7, 11},
> > -  {"znver1", _cost, 16, 10, 16, 7, 11}
> > +  {"znver1", _cost, 16, 15, 16, 15, 16}
> >  };
> >  ^L
> >  static unsigned int
> > 


Re: [PATCH] Add missing _mm512_prefetch_i{32,64}gather_{pd,ps} (PR target/79481)

2017-02-14 Thread Uros Bizjak
On Mon, Feb 13, 2017 at 8:35 PM, Jakub Jelinek  wrote:
> Hi!
>
> As mentioned in the PR, ICC as well as clang have these non-masked
> gather prefetch intrinsics in addition to masked (and for scatter
> even GCC has both masked and non-masked), but GCC does not (the
> SDM actually doesn't mention those, only those for scatters).
>
> The following patch implements those, I think it is useful to have
> them for compatibility with the other compilers as well for consistency.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2017-02-13  Jakub Jelinek  
>
> PR target/79481
> * config/i386/avx512pfintrin.h (_mm512_prefetch_i32gather_pd,
> _mm512_prefetch_i32gather_ps, _mm512_prefetch_i64gather_pd,
> _mm512_prefetch_i64gather_ps): New inline functions and macros.
>
> * gcc.target/i386/sse-14.c (test_2vx): Add void return type.
> (test_3vx): Change return type from int to void.
> (_mm512_prefetch_i32gather_ps, _mm512_prefetch_i32scatter_ps,
> _mm512_prefetch_i64gather_ps, _mm512_prefetch_i64scatter_ps,
> _mm512_prefetch_i32gather_pd, _mm512_prefetch_i32scatter_pd,
> _mm512_prefetch_i64gather_pd, _mm512_prefetch_i64scatter_pd): New
> tests.
> * gcc.target/i386/sse-22.c (test_2vx): Add void return type.
> (test_3vx): Change return type from int to void.
> (_mm512_prefetch_i32gather_ps, _mm512_prefetch_i32scatter_ps,
> _mm512_prefetch_i64gather_ps, _mm512_prefetch_i64scatter_ps,
> _mm512_prefetch_i32gather_pd, _mm512_prefetch_i32scatter_pd,
> _mm512_prefetch_i64gather_pd, _mm512_prefetch_i64scatter_pd): New
> tests.
> * gcc.target/i386/avx512pf-vgatherpf0dpd-1.c: Add non-masked
> intrinsic.  Change scan-assembler-times number from 1 to 2.
> * gcc.target/i386/avx512pf-vgatherpf0dps-1.c: Likewise.
> * gcc.target/i386/avx512pf-vgatherpf0qpd-1.c: Likewise.
> * gcc.target/i386/avx512pf-vgatherpf0qps-1.c: Likewise.
> * gcc.target/i386/avx512pf-vgatherpf1dpd-1.c: Likewise.
> * gcc.target/i386/avx512pf-vgatherpf1dps-1.c: Likewise.
> * gcc.target/i386/avx512pf-vgatherpf1qpd-1.c: Likewise.
> * gcc.target/i386/avx512pf-vgatherpf1qps-1.c: Likewise.

OK.

Thanks,
Uros.

> --- gcc/config/i386/avx512pfintrin.h.jj 2017-01-17 18:40:59.0 +0100
> +++ gcc/config/i386/avx512pfintrin.h2017-02-13 09:56:21.03124 +0100
> @@ -48,6 +48,24 @@ typedef unsigned short __mmask16;
>  #ifdef __OPTIMIZE__
>  extern __inline void
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_prefetch_i32gather_pd (__m256i __index, void const *__addr,
> + int __scale, int __hint)
> +{
> +  __builtin_ia32_gatherpfdpd ((__mmask8) 0xFF, (__v8si) __index, __addr,
> + __scale, __hint);
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_prefetch_i32gather_ps (__m512i __index, void const *__addr,
> + int __scale, int __hint)
> +{
> +  __builtin_ia32_gatherpfdps ((__mmask16) 0xFFFF, (__v16si) __index, __addr,
> + __scale, __hint);
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_prefetch_i32gather_pd (__m256i __index, __mmask8 __mask,
>void const *__addr, int __scale, int 
> __hint)
>  {
> @@ -66,6 +84,24 @@ _mm512_mask_prefetch_i32gather_ps (__m51
>
>  extern __inline void
>  __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_prefetch_i64gather_pd (__m512i __index, void const *__addr,
> + int __scale, int __hint)
> +{
> +  __builtin_ia32_gatherpfqpd ((__mmask8) 0xFF, (__v8di) __index, __addr,
> + __scale, __hint);
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
> +_mm512_prefetch_i64gather_ps (__m512i __index, void const *__addr,
> + int __scale, int __hint)
> +{
> +  __builtin_ia32_gatherpfqps ((__mmask8) 0xFF, (__v8di) __index, __addr,
> + __scale, __hint);
> +}
> +
> +extern __inline void
> +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
>  _mm512_mask_prefetch_i64gather_pd (__m512i __index, __mmask8 __mask,
>void const *__addr, int __scale, int 
> __hint)
>  {
> @@ -155,6 +191,14 @@ _mm512_mask_prefetch_i64scatter_ps (void
>  }
>
>  #else
> +#define _mm512_prefetch_i32gather_pd(INDEX, ADDR, SCALE, HINT)  \
> +  __builtin_ia32_gatherpfdpd ((__mmask8)0xFF, (__v8si)(__m256i)INDEX,   \
> + (void const *)ADDR, (int)SCALE, (int)HINT)
> +
> +#define _mm512_prefetch_i32gather_ps(INDEX, ADDR, 

Re: [PATCH] [X86_64] Fix alignment for znver1 arch.

2017-02-14 Thread Uros Bizjak
On Tue, Feb 14, 2017 at 8:48 AM, Pawar, Amit  wrote:
> Hi maintainers,
>
> Please find the below patch which changes the code alignment values for 
> znver1. Bootstrap and regression test passed on x86_64.
> OK to apply?

OK.

Thanks,
Uros.

> Thanks,
> Amit Pawar
>
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 2561d53..a5b0159 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2017-02-13  Amit Pawar  
> +
> + * config/i386/i386.c (znver1_cost): Fix the alignment for function and
> + max skip bytes for function, loop and jump.
> +
>  2017-02-13  Martin Sebor  
>
>   PR middle-end/79496
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index d7dce4b..d9a4a38 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2672,7 +2672,7 @@ static const struct ptt 
> processor_target_table[PROCESSOR_max] =
>{"bdver4", _cost, 16, 10, 16, 7, 11},
>{"btver1", _cost, 16, 10, 16, 7, 11},
>{"btver2", _cost, 16, 10, 16, 7, 11},
> -  {"znver1", _cost, 16, 10, 16, 7, 11}
> +  {"znver1", _cost, 16, 15, 16, 15, 16}
>  };
>  ^L
>  static unsigned int
> 


Re: [Fortran, Patch, CAF] Failed Images patch (TS 18508)

2017-02-14 Thread Andre Vehreschild
Hi Alessandro,

thanks for the patch. Some polishing is still necessary:

Running in the source directory of gcc:

contrib/check_GNU_style.sh resurrected_patch_and_tests_REV1.diff

gives about 10 issues. Please correct them before applying. Style in gfortran
helps readability.

In check.c::gfc_check_image_status () you are checking the kind of the image's
argument to be gt gfc_default_integer_kind and lt twice the default. Why? In
the standard I see no argument to limit the kind of the arguments. Can you
elaborate?

In the same routine: All operators are standing alone, i.e. put space before
and after each operator (e.g. line 32: gfc_default_integer_kind*2 should be
'..._kind * 2'

You are introducing the notion of teams here in the error messages, but the
rest of gfortran does not have any knowledge about teams. This might confuse
users even if it is saying that teams are not supported. Just as a remark.

In intrinsic.c you declare the symbol "failed_images" l. 196 as
CLASS_TRANSFORMATIONAL. What data does the statement transform in which way? I
think CLASS_INQUIRY would be better suited, because the function is just asking
the runtime for information.

Same for "image_status" l. 209 and "stopped_images" l. 221.

In line 511 I feel like returning NULL in caf_single mode is insufficient.
Imagine an assignment f = failed_images(). Returning NULL will most likely make
the compiler ICE when evaluating the rhs (haven't tested though). Returning a
constant 0 expression would be wiser, because in caf_single mode only the
current image is present and that must be running to do the inquiry.

Same for the stopped images in l. 538.

and image_status in l. 563.

The FIXME in line 566 needs to be resolved or one of the middle-end guys will
step on your toes when that fails.

What are the three arguments to caf_failed_images in line 610. Most
interestingly the first one?

And in line 644 you see the result of returning NULL in the simplify_()
routines. Please remove this again here and return something reasonable in the
simplify-routines as suggested above. Checking for arg-expr not being
initialized here might lead to hard to find misbehavior of gfortran in other
cases.

Line 689 and 695: Do not sort symbols as a side-effect of a functional patch.
Correct style and change sort orders and the like in a separate patch for code,
that is not intrinsic to what you patch. It makes reviewers wonder why you
needed to change that!

Line 717: Why is a block needed here? You may just return the call and be done.

Line 725: Would it not be better to call the numeric stop function? With a
documented error code?

Line 783: As Jerry has pointed out already: This needs to be dg-do compile.
Furthermore is the coarray directory the wrong place, because there all tests
are called ones with -fcoarray=single and ones with -fcoarray=lib -lcaf_single
-latomic. So the test needs to go to gfortran.dg name it e.g.,
coarray_fail_st_1.f90.

Just checking above that the code compiles is only a quarter of the way. You
still don't know that correct API calls are generated. This has to be added
for -fcoarray=single and -fcoarray=lib.

Line 798: Same for this test as for the previous: Wrong test-mode,
wrong directory, not checking API-calls correctly. Also add a number to the
test's file name. It most likely will not be the only test for that feature.

Line 819: Same as for the previous two.

Line 881: Make it:

! { dg-final { scan-tree-dump-times "_gfortran_caf_failed_images
\\\(\\\.\[0-9\]+, 0B, 0B\\\);" 1 "original" } }

Experience has shown that gfortran on different systems chooses quite
arbitrary numbers for the atmp and then the test fails.

Line 989: Please remove the dependency on signal.h. I don't assume it is
present on all systems and you don't want to do the guard thing.

Line 999: Use exit(1); here instead of the sigkill. This would sync termination
with the way it is done by error_stop() and removes the need for signal.h.

Overall: You are adding several API-functions without a single line of
documentation in gfortran.texi. This is not good.

Therefore my rating: NOT ok for trunk yet.

Regards,
Andre


On Mon, 13 Feb 2017 13:35:37 -0700
Alessandro Fanfarillo  wrote:

> Now with the patch attached.
> 
> 2017-02-13 13:35 GMT-07:00 Alessandro Fanfarillo :
> > Thanks Jerry. That test case is supposed only to be compiled (it never
> > runs). Anyway, the attached patch has been modified according to your
> > suggestion.
> >
> > Patch built and regtested on x86_64-pc-linux-gnu.
> >
> > 2017-02-12 10:24 GMT-07:00 Jerry DeLisle :
> >> On 02/11/2017 03:02 PM, Alessandro Fanfarillo wrote:
> >>>
> >>> Dear all,
> >>> please find in attachment a new patch following the discussion at
> >>> https://gcc.gnu.org/ml/fortran/2017-01/msg00054.html.
> >>>
> >>> Suggestions on how to fix potential issues are more than welcome.
> >>>
> >>> Regards,
> >>> Alessandro
> >>>

Re: [RFA][PR tree-optimization/79095] [PATCH 1/4] Improve ranges for MINUS_EXPR and EXACT_DIV_EXPR

2017-02-14 Thread Richard Biener
On Tue, Feb 14, 2017 at 12:19 AM, Jeff Law  wrote:
> On 02/07/2017 01:39 AM, Richard Biener wrote:
>>
>> On Mon, Feb 6, 2017 at 10:57 PM, Jeff Law  wrote:
>>>
>>> On 02/06/2017 08:33 AM, Richard Biener wrote:
>>>
 ah, indeed vr0type is VR_ANTI_RANGE and yes we have the case of a
 range with an anti-range "inside".  This also covers [-1,1] v
 ~[0,0] where you choose the much larger anti-range now.  So at
 least we want to have some idea about the sizes of the ranges
 (ideally we'd choose the smaller one, though; in most further
 propagations anti-ranges often degenerate to varying...)
>>>
>>>
>>> vr0 as an anti-singleton range like ~[0,0] is the only one likely
>>> of any interest right now and that's always going to have a range
>>> that is all but one value :-)
>>>
>>> vr1 is the tricky case.  We could do vr1.max - vr1.min and if that
>>> overflows or is some "large" value (say > 65536 just to throw out a
>>> value), then we conclude creating the singleton anti-range like
>>> ~[0,0] is more useful.
>>
>>
>> Yes, but it's hard to tell.  As said, anti-ranges quickly degrade in
>> further propagation and I fear that without a better range
>> representation it's hard to do better in all cases here.  The fact is
>> we can't represent the result of the intersection and thus we have to
>> conservatively choose an approximation.  Sometimes we have the other
>> range on an SSA name and thus can use equivalences (when coming from
>> assert processing), but most of the time not and thus we can't use
>> equivalences (which use SSA name versions rather than an index into
>> a ranges array - one possible improvement to the range
>> representation).  If ~[0,0] is useful information, then querying something
>> for non-null should also look at equivalences, btw.
>
> I spoke with Andrew a bit today, he's consistently seeing cases where the
> union of 3 ranges is necessary to resolve the kinds of queries we're
> interested in.  He's made a design decision not to use anti-ranges in his
> work, so y'all are in sync on that long term.

Ok.  I'd also not hard-code the number of union ranges but make the code
agnostic.  Still the actual implementation might take a #define / template param
for an upper bound.
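
As an aside, the approximation dilemma above can be sketched in a few lines
(illustrative Python only, not GCC's actual representation or heuristic; the
65536 cutoff is just the ballpark value mentioned earlier in the thread):

```python
# Intersecting a plain value range with a singleton anti-range like
# ~[0,0].  When the hole falls strictly inside the range, the exact
# result [lo, -1] u [1, hi] is not representable as a single range,
# so we must approximate: either keep the range (dropping the hole)
# or keep the anti-range (dropping the bounds).

def intersect_range_anti_singleton(lo, hi, hole, large=65536):
    """Approximate [lo, hi] intersected with ~[hole, hole]."""
    if hole < lo or hole > hi:
        return ("range", lo, hi)        # hole outside: intersection is exact
    if hole == lo:
        return ("range", lo + 1, hi)    # trim an endpoint: exact
    if hole == hi:
        return ("range", lo, hi - 1)    # trim an endpoint: exact
    # Hole strictly inside: the exact result needs a union of two ranges.
    if hi - lo > large:
        return ("anti", hole, hole)     # huge range: ~[hole,hole] is sharper
    return ("range", lo, hi)            # small range: keep the bounds

print(intersect_range_anti_singleton(-1, 1, 0))          # small: keep [-1,1]
print(intersect_range_anti_singleton(-10**9, 10**9, 0))  # large: ~[0,0]
```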

> He and Aldy have some bits to change the underlying range representation
> that might make sense to hash through right after stage1 reopens.

Good.

> Jeff


Re: [PATCH] Fix buffer overflow in SH expand_cbranchdi4 (PR target/79462)

2017-02-14 Thread Oleg Endo
On Tue, 2017-02-14 at 09:22 +0100, Jakub Jelinek wrote:
> Hi!
> 
> The following patch fixes a buffer overflow in the SH backend.
> r235698 removed an operand (clobber of match_scratch) from the
> various
> cbranch patterns that called expand_cbranchdi4, as well as all but
> one reference to operands[4] in that code.  Now that the insn only
> has 4 operands, clearing operands[4] is a buffer overflow.
> 
> Tested by Kaz (thanks).
> In the PR Oleg asked for a comment, but I'm not sure how useful it
> is to document that something used to be cleared and is not anymore,
> because it doesn't exist.
> 
> Ok for trunk (or suggested wording for a comment)?
> 

Sorry, I haven't checked the code in a while.  If it's the last
reference, then of course a comment would just be confusing, as you've
said.  Thanks for figuring it out.  OK as it is for trunk and the other
branches.

Cheers,
Oleg


Re: [PATCH] Fix PR56888

2017-02-14 Thread Richard Biener
On Tue, Feb 23, 2016 at 12:32 PM, Richard Biener  wrote:
> On Tue, 23 Feb 2016, Jan Hubicka wrote:
>
>> >
>> > Ok, so maybe a better question to symtab would be if there is an
>> > actual definition for what __builtin_FOO will call.  Not really
>> > whether that definition is cfun.  Of course all the fortify
>> > always-inline wrappers should not count as such (just in case
>> > the symtab code is confused about those).
>>
>> Also GNU extern inlines that are often used to deal with special cases.
>> >
>> > So,
>> >
>> > bool symbol_table::have_definition (enum built_in_fn);
>> >
>> > ?  Not sure how to best implement that either.  asmname lookups are
>> > expensive ...
>>
>> I am back from China trip, so i can handle you patch if you want.

Honza - ping.  Can you please think of a symtab predicate that tells me
whether a cgraph node is an implementation for BUILT_IN_X?  (see original
patch in this thread).

It's shortly before another GCC release, and we've been trying to improve
QOI-wise here for almost 3 releases now...

Thanks a lot,
Richard.

>> I see that stopping the optimization in the whole translation unit that
>> defines memcpy/memset will solve the reachability issue I mentioned
>> in the previous mail, but when LTOing something like the Linux kernel, it
>> will also prevent the optimization across the whole program.
>
> Yes, but I think it's reasonable to disable such a transform if the
> memcpy implementation is being optimized.
>
>> I am not quite sure how to deal with the alwaysinline wrappers however,
>> because they theoretically may contain memcpy/memset loops themselves.
>
> It might be a non-issue, as we are doing the transforms only after
> inlining, when those bodies should be gone, and thus symtab shouldn't see
> such an implementation.
>
> Better to double-check, of course.  We'd want
>
> #include 
>
> int main()
> {
>   int s[1204];
>   for (int i = 0; i < 1204; ++i)
>s[i] = 0;
>   memset (s, 0, sizeof (s));
> }
>
> still be optimized as memset.
>
> Richard.
>
>> Honza
>> >
>> > Richard.
>>
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


[PATCH] Fix buffer overflow in SH expand_cbranchdi4 (PR target/79462)

2017-02-14 Thread Jakub Jelinek
Hi!

The following patch fixes a buffer overflow in the SH backend.
r235698 removed an operand (clobber of match_scratch) from the various
cbranch patterns that called expand_cbranchdi4, as well as all but
one reference to operands[4] in that code.  Now that the insn only
has 4 operands, clearing operands[4] is a buffer overflow.

Tested by Kaz (thanks).
In the PR Oleg asked for a comment, but I'm not sure how useful it
is to document that something used to be cleared and is not anymore,
because it doesn't exist.

Ok for trunk (or suggested wording for a comment)?

2017-02-14  Jakub Jelinek  

PR target/79462
* config/sh/sh.c (expand_cbranchdi4): Don't clear operands[4].

--- gcc/config/sh/sh.c.jj   2017-01-01 12:45:41.0 +0100
+++ gcc/config/sh/sh.c  2017-02-11 10:15:03.460321825 +0100
@@ -2152,7 +2152,6 @@ expand_cbranchdi4 (rtx *operands, enum r
 }
   operands[1] = op1h;
   operands[2] = op2h;
-  operands[4] = NULL_RTX;
 
   if (msw_taken != LAST_AND_UNUSED_RTX_CODE)
 expand_cbranchsi4 (operands, msw_taken, msw_taken_prob);

Jakub


Re: [PATCH] Improve x % y to x VRP optimization (PR tree-optimization/79408)

2017-02-14 Thread Richard Biener
On Mon, 13 Feb 2017, Jakub Jelinek wrote:

> On Mon, Feb 13, 2017 at 12:24:08PM +0100, Richard Biener wrote:
> > You'd of course allocate it on the stack.  But yeah, sth like your patch
> > works for me.
> 
> Now bootstrapped/regtested successfully on x86_64-linux and i686-linux.
> So is this ok for trunk and perhaps we can add new APIs later?

Yes.

Thanks,
Richard.

> > > 2017-02-13  Jakub Jelinek  
> > > 
> > >   PR tree-optimization/79408
> > >   * tree-vrp.c (simplify_div_or_mod_using_ranges): Handle also the
> > >   case when on TRUNC_MOD_EXPR op0 is INTEGER_CST.
> > >   (simplify_stmt_using_ranges): Call simplify_div_or_mod_using_ranges
> > >   also if rhs1 is INTEGER_CST.
> > > 
> > >   * gcc.dg/tree-ssa/pr79408-2.c: New test.
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)