Re: [PATCH] Fix ICE with missing lhs on noreturn call with addressable return type (PR tree-optimization/79267)

2017-01-30 Thread Richard Biener
On January 30, 2017 7:10:07 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>This is yet another occurrence of the bug where we drop the lhs on
>noreturn calls even when it should not be dropped (if it has an
>addressable type or a variable-length type).
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

>2017-01-30  Jakub Jelinek  
>
>   PR tree-optimization/79267
>   * value-prof.c (gimple_ic): Only drop lhs for noreturn calls
>   if should_remove_lhs_p is true.
>
>   * g++.dg/opt/pr79267.C: New test.
>
>--- gcc/value-prof.c.jj2017-01-01 12:45:38.0 +0100
>+++ gcc/value-prof.c   2017-01-30 12:30:47.179820533 +0100
>@@ -1358,7 +1358,8 @@ gimple_ic (gcall *icall_stmt, struct cgr
>   dcall_stmt = as_a <gcall *> (gimple_copy (icall_stmt));
>   gimple_call_set_fndecl (dcall_stmt, direct_call->decl);
>   dflags = flags_from_decl_or_type (direct_call->decl);
>-  if ((dflags & ECF_NORETURN) != 0)
>+  if ((dflags & ECF_NORETURN) != 0
>+  && should_remove_lhs_p (gimple_call_lhs (dcall_stmt)))
> gimple_call_set_lhs (dcall_stmt, NULL_TREE);
>   gsi_insert_before (&gsi, dcall_stmt, GSI_SAME_STMT);
> 
>--- gcc/testsuite/g++.dg/opt/pr79267.C.jj  2017-01-30 12:36:07.605516857
>+0100
>+++ gcc/testsuite/g++.dg/opt/pr79267.C 2017-01-30 12:35:51.0
>+0100
>@@ -0,0 +1,69 @@
>+// PR tree-optimization/79267
>+// { dg-do compile }
>+// { dg-options "-O3" }
>+
>+struct A { A (int); };
>+struct B
>+{
>+  virtual void av () = 0;
>+  void aw ();
>+  void h () { av (); aw (); }
>+};
>+template  struct G : B
>+{
>+  T ba;
>+  G (int, T) : ba (0) {}
>+  void av () { ba (0); }
>+};
>+struct I
>+{
>+  B *bc;
>+  template  I (j, T) try { G (0, 0); } catch
>(...) {}
>+  ~I () { bc->h (); }
>+};
>+template  struct C { typedef M *i; };
>+template  struct J
>+{
>+  J ();
>+  template  J (O, T p2) : be (0, p2) {}
>+  typename C::i operator-> ();
>+  I be;
>+};
>+struct H : A { H () : A (0) {} };
>+struct D { J d; void q (); };
>+template  class bs;
>+int z;
>+
>+void
>+foo (int p1, int *, int)
>+{
>+  if (p1 == 0)
>+throw H ();
>+}
>+
>+D bar ();
>+template  struct L
>+{
>+  struct K { K (int); void operator() (int *) { bar ().q (); } };
>+  static J bp () { bq (0); }
>+  template  static void bq (br) { J (0, K (0)); }
>+};
>+struct F
>+{
>+  virtual J x (int) { foo (0, 0, 0); J > (L >::bp ());
>}
>+};
>+
>+void
>+baz ()
>+{
>+  if (z)
>+{
>+  J d, e;
>+  d->x (0);
>+  e->x (0);
>+}
>+  J v, i, j;
>+  v->x (0);
>+  i->x (0);
>+  j->x (0);
>+}
>
>   Jakub



Re: [v3 PATCH] Implement LWG 2825, LWG 2756 breaks class template argument deduction for optional.

2017-01-30 Thread Tim Song
On Tue, Jan 31, 2017 at 8:48 AM Ville Voutilainen
 wrote:
>
> On 31 January 2017 at 00:41, Ville Voutilainen
>  wrote:
>
> I don't actually need to constrain it, I could just add a guide like
>
> template <typename _Tp> optional(optional<_Tp>) -> optional<_Tp>;
>
> However, I'm not convinced I need to. The preference to an explicit
> guide is, at least based
> on that paper, a tie-breaker rule. If the copy/move constructors are
> better matches than the guide,
> those should be picked over a guide. Jason?

Yes, but they are not "better matches". They are equally good matches
after deduction and substitution. The mechanism that selects
template <class T> void f(const optional<T>&) over template <class T>
void f(T) given an optional argument is partial ordering, and
that's the last tiebreaker in the list, *after* the implicit/explicit
guide tiebreaker.


[PATCH], PR target/78597, Fix PowerPC fp-int-convert-{float128-ieee,float64x}

2017-01-30 Thread Michael Meissner
This patch fixes PR target/78597 on PowerPC.  The basic problem is that
conversion between unsigned int and _Float128 fails for 0x8000.  Both
power{7,8} using
simulated IEEE 128-bit floating point and power9 using hardware IEEE 128-bit
failed in the same test.

I cut down the patches I had developed for 79038 that are waiting for GCC 8 to
open up to include the patches that fix the problem, but don't do additional
improvements (optimizing conversions between char/short and _Float128, and
optimizing converting _Float128 to int/short/char and storing the result).

This patch is a little on the big side, because I deleted the two functions
(convert_float128_to_int and convert_int_to_float128) that were doing the
integer/_Float128 conversions, and instead implemented them directly.  I also
deleted the various insns that those two functions called.  It only affects
_Float128/__float128 conversions.

I have tested this on a little endian power8 system.  Bootstrap passes, and the
only changes in the test suite runs were the following tests now pass:

gcc.dg/torture/fp-int-convert-float128-ieee.c
gcc.dg/torture/fp-int-convert-float64x.c

I did not add any tests, because it was covered by existing tests.  Can I check
this into trunk?

2017-01-30  Michael Meissner  

PR target/78597
PR target/79038
* config/rs6000/rs6000-protos.h (convert_float128_to_int): Delete,
no longer used.
(convert_int_to_float128): Likewise.
* config/rs6000/rs6000.c (convert_float128_to_int): Likewise.
(convert_int_to_float128): Likewise.
* config/rs6000/rs6000.md (UNSPEC_IEEE128_MOVE): Likewise.
(UNSPEC_IEEE128_CONVERT): Likewise.
(floatsi2, FLOAT128 iterator): Bypass calling
rs6000_expand_float128_convert if we have IEEE 128-bit hardware.
Use local variables for IBM extended format.
(fix_truncsi2, FLOAT128 iterator): Likewise.
(fix_truncsi2_fprs): Likewise.
(fixuns_trunc2): Likewise.
(floatuns2, IEEE128 iterator): Likewise.
(fix_si2_hw): Rework the IEEE 128-bit hardware support
to know that we can now have integers of all sizes in vector
registers.
(fix_di2_hw): Likewise.
(float_si2_hw): Likewise.
(fix_si2_hw): Likewise.
(fixuns_si2_hw): Likewise.
(float_di2_hw): Likewise.
(float_di2_hw): Likewise.
(float_si2_hw): Likewise.
(floatuns_di2_hw): Likewise.
(floatuns_si2_hw): Likewise.
(xscvqpwz_): Delete, no longer used.
(xscvqpdz_): Likewise.
(xscvdqp_): Likewise.
(ieee128_mfvsrd_64bit): Likewise.
(ieee128_mfvsrd_32bit): Likewise.
(ieee128_mfvsrwz): Likewise.
(ieee128_mtvsrw): Likewise.
(ieee128_mtvsrd_64bit): Likewise.
(ieee128_mtvsrd_32bit): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-protos.h
===
--- gcc/config/rs6000/rs6000-protos.h   (revision 245036)
+++ gcc/config/rs6000/rs6000-protos.h   (working copy)
@@ -57,8 +57,6 @@ extern const char *rs6000_output_move_12
 extern bool rs6000_move_128bit_ok_p (rtx []);
 extern bool rs6000_split_128bit_ok_p (rtx []);
 extern void rs6000_expand_float128_convert (rtx, rtx, bool);
-extern void convert_float128_to_int (rtx *, enum rtx_code);
-extern void convert_int_to_float128 (rtx *, enum rtx_code);
 extern void rs6000_expand_vector_init (rtx, rtx);
 extern void paired_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, int);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 245036)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -24609,92 +24609,6 @@ rs6000_expand_float128_convert (rtx dest
   return;
 }
 
-/* Split a conversion from __float128 to an integer type into separate insns.
-   OPERANDS points to the destination, source, and V2DI temporary
-   register. CODE is either FIX or UNSIGNED_FIX.  */
-
-void
-convert_float128_to_int (rtx *operands, enum rtx_code code)
-{
-  rtx dest = operands[0];
-  rtx src = operands[1];
-  rtx tmp = operands[2];
-  rtx cvt;
-  rtvec cvt_vec;
-  rtx cvt_unspec;
-  rtvec move_vec;
-  rtx move_unspec;
-
-  if (GET_CODE (tmp) == SCRATCH)
-tmp = gen_reg_rtx (V2DImode);
-
-  if (MEM_P (dest))
-dest = rs6000_address_for_fpconvert (dest);
-
-  /* Generate the actual convert insn of the form:
- (set (tmp) (unspec:V2DI [(fix:SI (reg:KF))] UNSPEC_IEEE128_CONVERT)).  */
-  cvt = gen_rtx_fmt_e (code, GET_MODE (dest), src);
-  cvt_vec = gen_rtvec (1, cvt);
-  cvt_unspec = gen_rtx_UNSPEC (V2DImode, cvt_vec, UNSPEC_IEEE128_CONVERT);
-  emit_insn (gen_rtx_SET (tmp, cvt_unspec));
-
-  /* Generate the 

Re: [PATCH 2/6] RISC-V Port: gcc

2017-01-30 Thread Andrew Waterman
On Fri, Jan 20, 2017 at 10:41 PM, Richard Henderson  wrote:
> On 01/11/2017 06:30 PM, Palmer Dabbelt wrote:
>>
>> +(define_register_constraint "f" "TARGET_HARD_FLOAT ? FP_REGS : NO_REGS"
>> +  "A floating-point register (if available).")
>> +
>
>
> I know this is the Traditional Way, but I do wonder if using the new enable
> attribute on the alternatives would be better.  No need to change;
> just something that makes me wonder.

Yeah, we could look into that later.

>
>> +(define_constraint "Q"
>> +  "@internal"
>> +  (match_operand 0 "const_arith_operand"))
>
>
> How does this differ from "I"?
>
>> +
>> +(define_memory_constraint "W"
>> +  "@internal
>> +   A memory address based on a member of @code{BASE_REG_CLASS}."
>> +  (and (match_code "mem")
>> +   (match_operand 0 "memory_operand")))
>
>
> How does this differ (materially) from "m"?  Or from just
>
>   (match_operand 0 "memory_operand")
>
> for that matter?
>
>
>> +;;
>> +;; DI -> SI optimizations
>> +;;
>> +
>> +;; Simplify (int)(a + 1), etc.
>> +(define_peephole2
>> +  [(set (match_operand:DI 0 "register_operand")
>> +   (match_operator:DI 4 "modular_operator"
>> + [(match_operand:DI 1 "register_operand")
>> +  (match_operand:DI 2 "arith_operand")]))
>> +   (set (match_operand:SI 3 "register_operand")
>> +   (truncate:SI (match_dup 0)))]
>> +  "TARGET_64BIT && (REGNO (operands[0]) == REGNO (operands[3]) ||
>> peep2_reg_dead_p (2, operands[0]))
>> +   && (GET_CODE (operands[4]) != ASHIFT || (CONST_INT_P (operands[2]) &&
>> INTVAL (operands[2]) < 32))"
>> +  [(set (match_dup 3)
>> + (truncate:SI
>> +(match_op_dup:DI 4
>> +  [(match_operand:DI 1 "register_operand")
>> +   (match_operand:DI 2 "arith_operand")])))])
>
>
> I take from this that combine + ree fail to do the job?
>
> I must say I'm less than thrilled about the use of truncate instead of
> simply properly representing the sign-extensions that are otherwise required
> by the ABI.  RISCV is unlike MIPS in that the 32-bit operations (addw et al)
> do not require sign-extended inputs in order to produce a correct output
> value.
>
> Consider Alpha as a counter-example to MIPS, where the ABI requires
> sign-extensions at function edges and the ISA requires sign-extensions for
> comparisons.

We've revamped the port to use TRULY_NOOP_TRUNCATION.

>>
>>
>> +(define_insn "*local_pic_storesi"
>> +  [(set (mem:ANYF (match_operand 0 "absolute_symbolic_operand" ""))
>> +   (match_operand:ANYF 1 "register_operand" "f"))
>> +   (clobber (reg:SI T0_REGNUM))]
>> +  "TARGET_HARD_FLOAT && !TARGET_64BIT && USE_LOAD_ADDRESS_MACRO
>> (operands[0])"
>> +  "\t%1,%0,t0"
>> +  [(set (attr "length") (const_int 8))])
>
>
> Use match_scratch so that you need not hard-code T0.
> And likewise for the other pic stores.
>
>> +(define_predicate "const_0_operand"
>> +  (and (match_code "const_int,const_double,const_vector")
>> +   (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>
> Probably want to support const_wide_int.
>
>> +  /* Don't handle multi-word moves this way; we don't want to introduce
>> + the individual word-mode moves until after reload.  */
>> +  if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
>> +return false;
>
>
> Why not?  Do the subreg passes not do a good job?  I would expect it to do
> well on a target like RISCV.
>
>> +(define_predicate "move_operand"
>> +  (match_operand 0 "general_operand")
>
>
> Normally this is handled by TARGET_LEGITIMATE_CONSTANT_P.
>
>> +/* Target CPU builtins.  */
>> +#define TARGET_CPU_CPP_BUILTINS()  \
>> +  do   \
>> +{  \
>> +  builtin_define ("__riscv");  \
>
>
> Perhaps better to move this to riscv-c.c?
>
>> +  if (TARGET_MUL)  \
>> +   builtin_define ("__riscv_mul"); \
>> +  if (TARGET_DIV)  \
>> +   builtin_define ("__riscv_div"); \
>> +  if (TARGET_DIV && TARGET_MUL)\
>> +   builtin_define ("__riscv_muldiv");  \
>
>
> Out of curiosity, why are these split when the M extension doesn't do so?

Implementations that have MUL but not DIV cannot claim conformance to
the M extension, but they do exist, and it's easy enough for us to
support them.  AFAIK, they are all FPGA soft cores, where MUL maps
very cheaply to the DSP macros, but DIV is relatively more expensive.

>
>> +/* Define this macro if it is advisable to hold scalars in registers
>> +   in a wider mode than that declared by the program.  In such cases,
>> +   the value is constrained to be within the 

Re: [PATCH 1/6] RISC-V Port: gcc/config/riscv/riscv.c

2017-01-30 Thread Andrew Waterman
Thanks for the feedback, Richard.  We've addressed the bulk of it, and
added some explanatory comments in the few cases where the current
implementation makes sense, but for less than obvious reasons.  We
will submit a v2 patch set reflecting these changes in the next couple
of days.

A few responses are in-line below.

On Fri, Jan 20, 2017 at 6:32 PM, Richard Henderson  wrote:
>
> On 01/11/2017 06:30 PM, Palmer Dabbelt wrote:
>>
>> +/* The largest number of operations needed to load an integer constant.
>> +   The worst case is LUI, ADDI, SLLI, ADDI, SLLI, ADDI, SLLI, ADDI,
>> +   but we may attempt and reject even worse sequences.  */
>> +#define RISCV_MAX_INTEGER_OPS 32
>
>
> Why would you?  Surely after you exhaust 8 you'd just abandon that search as 
> unprofitable.
>
>> +  if (cost > 2 && (value & 1) == 0)
>> +{
>> +  int shift = 0;
>> +  while ((value & 1) == 0)
>> +   shift++, value >>= 1;
>
>
>   shift = ctz_hwi (value);
>
> You also may want to test for
>
>   value | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - shift))
>
> i.e. shifting out leading 1's.  As well as checking with shift - 12.  The 
> latter is interesting for shifting a 20-bit value up into the high word.
>
> I once proposed a generic framework for this that cached results for 
> computation of sequences and costs.  Unfortunately it never gained traction. 
> Perhaps it's time to try again.
>
>> +  /* Try filling trailing bits with 1s.  */
>> +  while ((value << shift) >= 0)
>> +   shift++;
>
>
> clz_hwi.
>
>> +  return GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x) != 0;
>
>
> SYMBOL_REF_P.
>
>> +riscv_symbol_binds_local_p (const_rtx x)
>> +{
>> +  return (SYMBOL_REF_DECL (x)
>> + ? targetm.binds_local_p (SYMBOL_REF_DECL (x))
>> + : SYMBOL_REF_LOCAL_P (x));
>
>
> Missing SYMBOL_REF_P?
>
> Surely encode_section_info will already have set SYMBOL_FLAG_LOCAL, and thus 
> you need not invoke targetm.binds_local_p again.
>
>> +case LABEL_REF:
>> +  if (LABEL_REF_NONLOCAL_P (x))
>> +   return SYMBOL_GOT_DISP;
>> +  break;
>
>
> Non-local labels are not local to the current function, but they are still 
> local to the translation unit (they'll be local to one of the outer functions 
> of a nested function).
>
>> +  switch (*symbol_type)
>> +{
>> +case SYMBOL_ABSOLUTE:
>> +case SYMBOL_PCREL:
>> +case SYMBOL_TLS_LE:
>> +  return (int32_t) INTVAL (offset) == INTVAL (offset);
>
>
> Why?
>
>
>> +case MINUS:
>> +  if (float_mode_p
>> + && !HONOR_NANS (mode)
>> + && !HONOR_SIGNED_ZEROS (mode))
>> +   {
>> + /* See if we can use NMADD or NMSUB.  See riscv.md for the
>> +associated patterns.  */
>> + rtx op0 = XEXP (x, 0);
>> + rtx op1 = XEXP (x, 1);
>> + if (GET_CODE (op0) == MULT && GET_CODE (XEXP (op0, 0)) == NEG)
>> +   {
>> + *total = (tune_info->fp_mul[mode == DFmode]
>> +   + set_src_cost (XEXP (XEXP (op0, 0), 0), mode, speed)
>> +   + set_src_cost (XEXP (op0, 1), mode, speed)
>> +   + set_src_cost (op1, mode, speed));
>> + return true;
>> +   }
>> + if (GET_CODE (op1) == MULT)
>> +   {
>> + *total = (tune_info->fp_mul[mode == DFmode]
>> +   + set_src_cost (op0, mode, speed)
>> +   + set_src_cost (XEXP (op1, 0), mode, speed)
>> +   + set_src_cost (XEXP (op1, 1), mode, speed));
>> + return true;
>> +   }
>> +   }
>
>
> Do we not fold these to FMA + NEG?  If not, that's surprising and maybe 
> should be fixed.  Also, you appear to be missing costs for FMA in 
> riscv_rtx_costs.
>
>> +   case UNORDERED:
>> + *code = EQ;
>> + /* Fall through.  */
>> +
>> +   case ORDERED:
>> + /* a == a && b == b */
>> + tmp0 = gen_reg_rtx (SImode);
>> + riscv_emit_binary (EQ, tmp0, cmp_op0, cmp_op0);
>> + tmp1 = gen_reg_rtx (SImode);
>> + riscv_emit_binary (EQ, tmp1, cmp_op1, cmp_op1);
>> + *op0 = gen_reg_rtx (SImode);
>> + riscv_emit_binary (AND, *op0, tmp0, tmp1);
>> + break;
>
>
> Better with FCLASS + AND?  At least for a branch?

The code path is slightly shorter using FEQ instead.  If FEQ were two
or more cycles slower than FCLASS, then FCLASS would be a win for
latency, but that is not the case for any known implementation.

>
>> +static int
>> +riscv_flatten_aggregate_field (const_tree type,
>> +  riscv_aggregate_field fields[2],
>> +  int n, HOST_WIDE_INT offset)
>
>
> I don't see any code within to bound N to 2, so as not to overflow FIELDS.  
> Am I missing something?
>
> Are you missing code for COMPLEX_TYPE?  In the default case I only see 
> SCALAR_FLOAT_TYPE_P.
>
>> +  memset (info, 0, sizeof (*info));
>> +  

Re: [v3 PATCH] Implement LWG 2825, LWG 2756 breaks class template argument deduction for optional.

2017-01-30 Thread Ville Voutilainen
On 31 January 2017 at 00:41, Ville Voutilainen
 wrote:
> On 31 January 2017 at 00:06, Tim Song  wrote:
>> On Mon, Jan 30, 2017 at 9:36 PM Jonathan Wakely  wrote:
>>>
>>> On 30/01/17 13:28 +, Jonathan Wakely wrote:
>>> >On 30/01/17 13:47 +0200, Ville Voutilainen wrote:
>>> >>Tested on Linux-x64.
>>> >
>>> >OK, thanks.
>>>
>>> To be clear: this isn't approved by LWG yet, but I think we can be a
>>> bit adventurous with deduction guides and add them for experimental
>>> C++17 features. Getting more usage experience before we standardise
>>> these things will be good, and deduction guides are very new and
>>> untried. If we find problems we can remove them again, and will have
>>> invaluable feedback for the standards committee.
>>>
>>
>> My brain compiler says that this may cause problems with
>>
>> std::optional<int> o1;
>> std::optional o2 = o1; // wanted optional<int>, deduced 
>> optional<optional<int>>
>>
>> Trunk GCC deduces optional<int>, but I don't think it implements
>> P0512R0 yet, which prefers explicit guides to implicit ones before
>> considering partial ordering. This example is very similar to the
>> example in https://timsong-cpp.github.io/cppwp/over.match.best#1.6.
>
>
> I'll see about constraining the guide tomorrow.

I don't actually need to constrain it, I could just add a guide like

template <typename _Tp> optional(optional<_Tp>) -> optional<_Tp>;

However, I'm not convinced I need to. The preference to an explicit
guide is, at least based
on that paper, a tie-breaker rule. If the copy/move constructors are
better matches than the guide,
those should be picked over a guide. Jason?


Re: [PATCH] Fix ICEs with power8 fixuns_truncdi2 patterns (PR target/79197)

2017-01-30 Thread Segher Boessenkool
Hi Jakub, Mike,

On Mon, Jan 30, 2017 at 10:27:15PM +0100, Jakub Jelinek wrote:
> According to make mddump generated tmp-mddump.md, on powerpc the only pattern
> with unsigned_fix:DI where the inner operand is SF or DFmode is the
> *fixuns_truncdi2_fctiduz.

It seems like vsx_fixuns_trunc2 (in vsx.md) also wants to
handle this.  Mike, do you remember?


Segher


[committed] move constant to the right of relational operators (Re: [PATCH 4/5] distinguish likely and unlikely results (PR 78703))

2017-01-30 Thread Martin Sebor

So I see the introduction of many

if (const OP object) expressions

Can you please fix those as an independent patch after #4 and #5 are
installed on the trunk?  Consider that patch pre-approved, but please
post it here for the historical record.

I think a regexp of paren followed by a constant would probably take you
to them pretty quickly.


I committed the attached patch in r245040.

Martin
commit 9f6f1219ab8d2fda96b604ac5e719b815a7a9ff4
Author: Martin Sebor 
Date:   Mon Jan 30 15:59:21 2017 -0700

gcc/ChangeLog:
	* gimple-ssa-sprintf.c (fmtresult::adjust_for_width_or_precision):
	Move constant to the right of a relational operator.
	(get_mpfr_format_length, format_character, format_string): Ditto.
	(should_warn_p, maybe_warn): Same.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index ca356c6..92440d2 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,10 @@
 2017-01-30  Martin Sebor  
 
+	* gimple-ssa-sprintf.c (fmtresult::adjust_for_width_or_precision):
+	Move constant to the right of a relational operator.
+	(get_mpfr_format_length, format_character, format_string): Ditto.
+	(should_warn_p, maybe_warn): Same.
+
 	* doc/invoke.texi (-Wformat-truncation=1): Fix typo.
 
 2017-01-30  Maxim Ostapenko  
diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 8261a44..11f4174 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -520,7 +520,7 @@ fmtresult::adjust_for_width_or_precision (const HOST_WIDE_INT adjust[2],
   bool minadjusted = false;
 
   /* Adjust the minimum and likely counters.  */
-  if (0 <= adjust[0])
+  if (adjust[0] >= 0)
 {
   if (range.min < (unsigned HOST_WIDE_INT)adjust[0])
 	{
@@ -537,7 +537,7 @@ fmtresult::adjust_for_width_or_precision (const HOST_WIDE_INT adjust[2],
 knownrange = false;
 
   /* Adjust the maximum counter.  */
-  if (0 < adjust[1])
+  if (adjust[1] > 0)
 {
   if (range.max < (unsigned HOST_WIDE_INT)adjust[1])
 	{
@@ -1456,7 +1456,7 @@ get_mpfr_format_length (mpfr_ptr x, const char *flags, HOST_WIDE_INT prec,
 {
   /* Cap precision arbitrarily at 1KB and add the difference
 	 (if any) to the MPFR result.  */
-  if (1024 < prec)
+  if (prec > 1024)
 	p = 1024;
 }
 
@@ -1873,7 +1873,7 @@ format_character (const directive , tree arg)
 	  res.range.likely = 0;
 	  res.range.unlikely = 0;
 	}
-	  else if (0 < min && min < 128)
+	  else if (min > 0 && min < 128)
 	{
 	  /* A wide character in the ASCII range most likely results
 		 in a single byte, and only unlikely in up to MB_LEN_MAX.  */
@@ -1942,7 +1942,7 @@ format_string (const directive , tree arg)
 	 2 * wcslen (S).*/
 	  res.range.likely = res.range.min * 2;
 
-	  if (0 <= dir.prec[1]
+	  if (dir.prec[1] >= 0
 	  && (unsigned HOST_WIDE_INT)dir.prec[1] < res.range.max)
 	{
 	  res.range.max = dir.prec[1];
@@ -1952,7 +1952,7 @@ format_string (const directive , tree arg)
 
 	  if (dir.prec[0] < 0 && dir.prec[1] > -1)
 	res.range.min = 0;
-	  else if (0 <= dir.prec[0])
+	  else if (dir.prec[0] >= 0)
 	res.range.likely = dir.prec[0];
 
 	  /* Even a non-empty wide character string need not convert into
@@ -1992,7 +1992,7 @@ format_string (const directive , tree arg)
 	 in mode 2, and the maximum is PRECISION or -1 to disable
 	 tracking.  */
 
-  if (0 <= dir.prec[0])
+  if (dir.prec[0] >= 0)
 	{
 	  if (slen.range.min >= target_int_max ())
 	slen.range.min = 0;
@@ -2054,7 +2054,7 @@ should_warn_p (const pass_sprintf_length::call_info ,
 
   if (info.bounded)
 {
-  if (1 == warn_format_trunc && result.min <= avail.max
+  if (warn_format_trunc == 1 && result.min <= avail.max
 	  && info.retval_used ())
 	{
 	  /* The likely amount of space remaining in the destination is big
@@ -2062,7 +2062,7 @@ should_warn_p (const pass_sprintf_length::call_info ,
 	  return false;
 	}
 
-  if (1 == warn_format_trunc && result.likely <= avail.likely
+  if (warn_format_trunc == 1 && result.likely <= avail.likely
 	  && !info.retval_used ())
 	{
 	  /* The likely amount of space remaining in the destination is big
@@ -2082,7 +2082,7 @@ should_warn_p (const pass_sprintf_length::call_info ,
 }
   else
 {
-  if (1 == warn_level && result.likely <= avail.likely)
+  if (warn_level == 1 && result.likely <= avail.likely)
 	{
 	  /* The likely amount of space remaining in the destination is big
 	 enough for the likely output.  */
@@ -2196,7 +2196,7 @@ maybe_warn (substring_loc , source_range *pargrange,
 			  navail);
 	}
 
-  if (0 == res.min && res.max < maxbytes)
+  if (res.min == 0 && res.max < maxbytes)
 	{
 	  const char* fmtstr
 	= (info.bounded
@@ -2213,7 +2213,7 @@ maybe_warn (substring_loc , source_range *pargrange,
 			  res.max, navail);
 	}
 
-  if (0 == res.min && maxbytes <= res.max)
+  if (res.min == 0 && maxbytes <= res.max)
 	{
 	  /* This is a 

Re: [v3 PATCH] Implement LWG 2825, LWG 2756 breaks class template argument deduction for optional.

2017-01-30 Thread Ville Voutilainen
On 31 January 2017 at 00:06, Tim Song  wrote:
> On Mon, Jan 30, 2017 at 9:36 PM Jonathan Wakely  wrote:
>>
>> On 30/01/17 13:28 +, Jonathan Wakely wrote:
>> >On 30/01/17 13:47 +0200, Ville Voutilainen wrote:
>> >>Tested on Linux-x64.
>> >
>> >OK, thanks.
>>
>> To be clear: this isn't approved by LWG yet, but I think we can be a
>> bit adventurous with deduction guides and add them for experimental
>> C++17 features. Getting more usage experience before we standardise
>> these things will be good, and deduction guides are very new and
>> untried. If we find problems we can remove them again, and will have
>> invaluable feedback for the standards committee.
>>
>
> My brain compiler says that this may cause problems with
>
> std::optional<int> o1;
> std::optional o2 = o1; // wanted optional<int>, deduced 
> optional<optional<int>>
>
> Trunk GCC deduces optional<int>, but I don't think it implements
> P0512R0 yet, which prefers explicit guides to implicit ones before
> considering partial ordering. This example is very similar to the
> example in https://timsong-cpp.github.io/cppwp/over.match.best#1.6.


I'll see about constraining the guide tomorrow.


GCC patch committed: For DIE for function, use first matching variant

2017-01-30 Thread Ian Lance Taylor via gcc-patches
After the patch of 2016-11-03, gen_type_die_with_usage picks a variant
to use for the DIE for a FUNCTION_TYPE or METHOD_TYPE, as does
modified_type_die.  The two need to pick the same variant, otherwise
modified_type_die will not find the DIE that was written earlier.
Unfortunately the loops were written such that gen_type_die_with_usage
used the last matching variant and modified_type_die used the first
matching variant.  This caused PR 79289.  This patch changes
gen_type_die_with_usage to use the first matching variant.
Bootstrapped and tested on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian

2017-01-30  Ian Lance Taylor  

PR debug/79289
* dwarf2out.c (gen_type_die_with_usage): When picking a variant
for FUNCTION_TYPE/METHOD_TYPE, use the first matching one.
Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 245036)
+++ gcc/dwarf2out.c (working copy)
@@ -24453,8 +24453,13 @@
 but try to canonicalize.  */
   tree main = TYPE_MAIN_VARIANT (type);
   for (tree t = main; t; t = TYPE_NEXT_VARIANT (t))
-   if (check_base_type (t, main) && check_lang_type (t, type))
- type = t;
+   {
+ if (check_base_type (t, main) && check_lang_type (t, type))
+   {
+ type = t;
+ break;
+   }
+   }
 }
   else if (TREE_CODE (type) != VECTOR_TYPE
   && TREE_CODE (type) != ARRAY_TYPE)


Re: [v3 PATCH] Implement LWG 2825, LWG 2756 breaks class template argument deduction for optional.

2017-01-30 Thread Tim Song
On Mon, Jan 30, 2017 at 9:36 PM Jonathan Wakely  wrote:
>
> On 30/01/17 13:28 +, Jonathan Wakely wrote:
> >On 30/01/17 13:47 +0200, Ville Voutilainen wrote:
> >>Tested on Linux-x64.
> >
> >OK, thanks.
>
> To be clear: this isn't approved by LWG yet, but I think we can be a
> bit adventurous with deduction guides and add them for experimental
> C++17 features. Getting more usage experience before we standardise
> these things will be good, and deduction guides are very new and
> untried. If we find problems we can remove them again, and will have
> invaluable feedback for the standards committee.
>

My brain compiler says that this may cause problems with

std::optional<int> o1;
std::optional o2 = o1; // wanted optional<int>, deduced optional<optional<int>>

Trunk GCC deduces optional<int>, but I don't think it implements
P0512R0 yet, which prefers explicit guides to implicit ones before
considering partial ordering. This example is very similar to the
example in https://timsong-cpp.github.io/cppwp/over.match.best#1.6.

Tim


Patch ping Re: [PATCH] -fsanitize=address,undefined support on s390x

2017-01-30 Thread Jakub Jelinek
Hi!

On Mon, Jan 23, 2017 at 09:36:29PM +0100, Jakub Jelinek wrote:
> So, I've bootstrapped/regtested s390x-linux (64-bit only, don't have 32-bit
> userland around anymore to test 31-bit) with the attached patch (and on top
> of the PR79168 patch I'll post soon) and the
> only regressions I got are:
> FAIL: c-c++-common/asan/null-deref-1.c   {-O2,-O2 -flto 
> -fno-use-linker-plugin -flto-partition=none,-O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects,-O3 -g,-Os}  output pattern test
> FAIL: g++.dg/asan/deep-stack-uaf-1.C   {-O0,-O1,-O2,-O3 -g,-Os}  output 
> pattern test
> FAIL: c-c++-common/ubsan/overflow-vec-1.c   {-O0,-O1,-O2,-O2 -flto 
> -fno-use-linker-plugin -flto-partition=none,-O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects,-O3 -g,-Os}  execution test
> FAIL: c-c++-common/ubsan/overflow-vec-2.c   {-O0,-O1,-O2,-O2 -flto 
> -fno-use-linker-plugin -flto-partition=none,-O2 -flto -fuse-linker-plugin 
> -fno-fat-lto-objects,-O3 -g,-Os}  execution test
> All but deep-stack-uaf-1.C in both check-gcc and check-g++.
> 
> In null-deref-1.c it seems the problem is in the line for the deref,
> the testcase is expecting runtime error on line 10, while
> #0 0x8a6d in NullDeref c-c++-common/asan/null-deref-1.c:11
>  
> #1 0x88f1 in main c-c++-common/asan/null-deref-1.c:15 
>  
> #2 0x3ff93022c5f in __libc_start_main (/lib64/libc.so.6+0x22c5f)  
>  
> #3 0x896d  
> (gcc-7.0.1-20170120/obj-s390x-redhat-linux/gcc/testsuite/g++/null-deref-1.exe+0x896d)
>
> is reported.
> The second test fails
> ERROR: AddressSanitizer: heap-use-after-free on address 0x61500205 at pc 
> 0x8b12 bp 0x03fff8378928 sp 0x03fff8378918   
> READ of size 1 at 0x61500205 thread T0
>  
> #0 0x8b11 in main g++.dg/asan/deep-stack-uaf-1.C:33   
>  
> #1 0x3ffabe22c5f in __libc_start_main (/lib64/libc.so.6+0x22c5f)  
>  
> #2 0x89cd  
> (gcc-7.0.1-20170120/obj-s390x-redhat-linux/gcc/testsuite/g++/deep-stack-uaf-1.exe+0x89cd)
> I will need to debug whether we need to add further options on s390x to
> make sure it has all the frames it is expecting.
> The last 2 tests aren't really asan related, will debug.
> 
> Note apparently asan_test.C isn't enabled on non-i?86/x86_64, which
> is a big mistake, will try to change it separately.

I'd like to ping this patch, since then bootstrapped/regtested again
several times on s390x-linux.  asan_test.C is since then enabled
on all architectures and passes with the patch (lots of small tests),
the overflow-vec-*.c tests fail even on powerpc*-linux, so I think the
port is in quite good shape and for feature parity it would be nice to
have this feature on s390x-linux.

> 2017-01-23  Jakub Jelinek  
> 
> gcc/
>   * config/s390/s390.c (s390_asan_shadow_offset): New function.
>   (TARGET_ASAN_SHADOW_OFFSET): Redefine.
> libsanitizer/
>   * configure.tgt: Enable asan and ubsan on 64-bit s390*-*-linux*.
> 
> --- gcc/config/s390/s390.c.jj 2017-01-19 16:58:25.0 +0100
> +++ gcc/config/s390/s390.c2017-01-23 16:32:28.220398187 +0100
> @@ -15435,6 +15435,14 @@ s390_excess_precision (enum excess_preci
>return FLT_EVAL_METHOD_UNPREDICTABLE;
>  }
>  
> +/* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
> +
> +static unsigned HOST_WIDE_INT
> +s390_asan_shadow_offset (void)
> +{
> +  return TARGET_64BIT ? HOST_WIDE_INT_1U << 52 : HOST_WIDE_INT_UC 
> (0x2000);
> +}
> +
>  /* Initialize GCC target structure.  */
>  
>  #undef  TARGET_ASM_ALIGNED_HI_OP
> @@ -15536,6 +15544,8 @@ s390_excess_precision (enum excess_preci
>  #define TARGET_BUILD_BUILTIN_VA_LIST s390_build_builtin_va_list
>  #undef TARGET_EXPAND_BUILTIN_VA_START
>  #define TARGET_EXPAND_BUILTIN_VA_START s390_va_start
> +#undef TARGET_ASAN_SHADOW_OFFSET
> +#define TARGET_ASAN_SHADOW_OFFSET s390_asan_shadow_offset
>  #undef TARGET_GIMPLIFY_VA_ARG_EXPR
>  #define TARGET_GIMPLIFY_VA_ARG_EXPR s390_gimplify_va_arg
>  
> --- libsanitizer/configure.tgt.jj 2017-01-23 15:25:21.0 +0100
> +++ libsanitizer/configure.tgt2017-01-23 15:36:40.787456320 +0100
> @@ -39,6 +39,11 @@ case "${target}" in
>   ;;
>sparc*-*-linux*)
>   ;;
> +  s390*-*-linux*)
> + if test x$ac_cv_sizeof_void_p = x4; then
> + UNSUPPORTED=1
> + fi
> + ;;
>arm*-*-linux*)
>   ;;
>aarch64*-*-linux*)
> 

Jakub


[PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-01-30 Thread Martin Sebor

Bug 79275 - -Wformat-overflow false positive exceeding INT_MAX in
glibc sysdeps/posix/tempname.c points out a false positive found
during a Glibc build and caused by the checker using the upper
bound of a range of precisions in string directives with string
arguments of non-constant length.  The attached patch relaxes
the checker to use the lower bound instead when appropriate.

Martin
PR middle-end/79275 -  -Wformat-overflow false positive exceeding INT_MAX in glibc sysdeps/posix/tempname.c

gcc/testsuite/ChangeLog:

	PR middle-end/79275
	* gcc.dg/tree-ssa/builtin-sprintf-warn-11.c: New test.
	* gcc.dg/tree-ssa/pr79275.c: New test.

gcc/ChangeLog:

	PR middle-end/79275
	* gimple-ssa-sprintf.c (get_string_length): Set lower bound to zero.
	(format_string): Tighten up the range of output for non-constant
	strings and correct the expected range for wide non-constant strings.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 8261a44..c0c0a5f 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1832,10 +1832,11 @@ get_string_length (tree str)
 	}
   else
 	{
-	  /* When the upper bound is unknown (as assumed to be excessive)
+	  /* When the upper bound is unknown (it can be zero or excessive)
 	 set the likely length to the greater of 1 and the length of
-	 the shortest string.  */
+	 the shortest string and reset the lower bound to zero.  */
 	  res.range.likely = res.range.min ? res.range.min : warn_level > 1;
+	  res.range.min = 0;
 	}
 
   res.range.unlikely = res.range.max;
@@ -1986,43 +1987,89 @@ format_string (const directive , tree arg)
 }
   else
 {
-  /* For a '%s' and '%ls' directive with a non-constant string,
-	 the minimum number of characters is the greater of WIDTH
-	 and either 0 in mode 1 or the smaller of PRECISION and 1
-	 in mode 2, and the maximum is PRECISION or -1 to disable
-	 tracking.  */
+  /* For a '%s' and '%ls' directive with a non-constant string (either
+	 one of a number of strings of known length or an unknown string)
+	 the minimum number of characters is the lesser of PRECISION[0] and
+	 the length of the shortest known string or zero, and the maximum
+	 is the lesser of the length of the longest known string or
+	 PTRDIFF_MAX and PRECISION[1].  The likely length is either
+	 the minimum at level 1 and the greater of the minimum and 1
+	 at level 2.  This result is adjusted upward for width (if it's
+	 specified).  */
+
+  if (dir.modifier == FMT_LEN_l)
+	{
+	  /* A wide character converts to as few as zero bytes.  */
+	  slen.range.min = 0;
+	  if (slen.range.max < target_int_max ())
+	slen.range.max *= target_mb_len_max ();
+
+	  if (slen.range.likely < target_int_max ())
+	slen.range.likely *= 2;
+
+	  if (slen.range.likely < target_int_max ())
+	slen.range.unlikely *= target_mb_len_max ();
+	}
+
+  res.range = slen.range;
 
   if (0 <= dir.prec[0])
 	{
+	  /* Adjust the minimum to zero if the string length is unknown,
+	 or at most the lower bound of the precision otherwise.  */
 	  if (slen.range.min >= target_int_max ())
-	slen.range.min = 0;
+	res.range.min = 0;
 	  else if ((unsigned HOST_WIDE_INT)dir.prec[0] < slen.range.min)
-	{
-	  slen.range.min = dir.prec[0];
-	  slen.range.likely = slen.range.min;
-	}
+	res.range.min = dir.prec[0];
 
+	  /* Make both maxima no greater than the upper bound of precision.  */
 	  if ((unsigned HOST_WIDE_INT)dir.prec[1] < slen.range.max
 	  || slen.range.max >= target_int_max ())
 	{
-	  slen.range.max = dir.prec[1];
-	  slen.range.likely = slen.range.max;
+	  res.range.max = dir.prec[1];
+	  res.range.unlikely = dir.prec[1];
 	}
+
+	  /* If precision is constant, set the likely counter to the lesser
+	 of it and the maximum string length.  Otherwise, if the lower
+	 bound of precision is greater than zero, set the likely counter
+	 to the minimum.  Otherwise set it to zero or one based on
+	 the warning level.  */
+	  if (dir.prec[0] == dir.prec[1])
+	res.range.likely
+	  = ((unsigned HOST_WIDE_INT)dir.prec[0] < slen.range.max
+		 ? dir.prec[0] : slen.range.max);
+	  else if (dir.prec[0] > 0)
+	res.range.likely = res.range.min;
+	  else
+	res.range.likely = warn_level > 1;
+	}
+  else if (0 <= dir.prec[1])
+	{
+	  res.range.min = 0;
+	  if ((unsigned HOST_WIDE_INT)dir.prec[1] < slen.range.max)
+	res.range.max = dir.prec[1];
+	  res.range.likely = dir.prec[1] ? warn_level > 1 : 0;
 	}
   else if (slen.range.min >= target_int_max ())
 	{
-	  slen.range.min = 0;
-	  slen.range.max = HOST_WIDE_INT_MAX;
-	  /* At level one strings of unknown length are assumed to be
+	  res.range.min = 0;
+	  res.range.max = HOST_WIDE_INT_MAX;
+	  /* At level 1 strings of unknown length are assumed to be
	 empty, while at level 2 they are assumed to be one byte
 	 long.  */
-	  slen.range.likely = warn_level > 1;
+	  res.range.likely = 

[PATCH] Fix ICEs with power8 fixuns_trunc<mode>di2 patterns (PR target/79197)

2017-01-30 Thread Jakub Jelinek
Hi!

According to the make mddump generated tmp-mddump.md, on powerpc the only
pattern with unsigned_fix:DI where the inner operand is SFmode or DFmode is
*fixuns_trunc<mode>di2_fctiduz.  There is an expander for that instruction,
which uses different operand predicates and different condition, so the
following testcases show 2 different cases: in one, the condition for the
expander is true but false for the actual insn; in the other,
register_operand is true (SFmode subreg), while gpc_reg_operand is false.
(There are other expanders and patterns that handle vector modes, or
SImode, or DImode with KF/TFmodes.)

As there is just one insn that satisfies it, it makes no sense to have
different conditions or different predicates (there would need to be other
define_insn* patterns that would handle the rest), but I don't really even
see the point in duplication of the condition and predicates, the
define_insn itself can serve as expander.

Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

2017-01-30  Jakub Jelinek  

PR target/79197
* config/rs6000/rs6000.md (*fixuns_trunc<mode>di2_fctiduz): Rename to ...
(fixuns_trunc<mode>di2): ... this, remove previous expander.  Put all
conditions on a single line.

* gcc.target/powerpc/pr79197.c: New test.
* gcc.c-torture/compile/pr79197.c: New test.

--- gcc/config/rs6000/rs6000.md.jj  2017-01-23 18:41:20.0 +0100
+++ gcc/config/rs6000/rs6000.md 2017-01-30 14:44:12.148761705 +0100
@@ -5712,17 +5712,10 @@
-(define_expand "fixuns_trunc<mode>di2"
-  [(set (match_operand:DI 0 "register_operand" "")
-	(unsigned_fix:DI (match_operand:SFDF 1 "register_operand" "")))]
-  "TARGET_HARD_FLOAT && (TARGET_FCTIDUZ || VECTOR_UNIT_VSX_P (<MODE>mode))"
-  "")
-
-(define_insn "*fixuns_trunc<mode>di2_fctiduz"
+(define_insn "fixuns_trunc<mode>di2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wi")
	(unsigned_fix:DI (match_operand:SFDF 1 "gpc_reg_operand" "<Ff>,<Fv2>")))]
-  "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT && TARGET_FPRS
-&& TARGET_FCTIDUZ"
+  "TARGET_HARD_FLOAT && TARGET_DOUBLE_FLOAT && TARGET_FPRS && TARGET_FCTIDUZ"
   "@
fctiduz %0,%1
xscvdpuxds %x0,%x1"
--- gcc/testsuite/gcc.target/powerpc/pr79197.c.jj   2017-01-30 
14:54:55.533314402 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr79197.c  2017-01-30 14:55:20.407988406 
+0100
@@ -0,0 +1,11 @@
+/* PR target/79197 */
+/* { dg-do compile } */
+/* { dg-options "-O0 -mno-popcntd" } */
+
+unsigned a;
+
+void
+foo (void)
+{
+  a = *(double *) (__UINTPTR_TYPE__) 0x40;
+}
--- gcc/testsuite/gcc.c-torture/compile/pr79197.c.jj2017-01-30 
14:56:31.383058240 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr79197.c   2017-01-30 
14:56:40.902933477 +0100
@@ -0,0 +1,10 @@
+/* PR target/79197 */
+
+unsigned long b;
+
+unsigned long
+foo (float *a, float *x)
+{
+  __builtin_memcpy (a, x, sizeof (float));
+  return *a;
+}

Jakub


[wwwdocs] news.html - remove link to openmp.org in old entry

2017-01-30 Thread Gerald Pfeifer
This URL does not work any longer, and given how old that news items
is (really, we've been supporting OpenMP for more than a decade now?),
I figured to just leave the textual reference on this one.  We've got
plenty to links elsewhere.

Applied.

Gerald

Index: news.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/news.html,v
retrieving revision 1.156
diff -u -r1.156 news.html
--- news.html   29 Jan 2017 21:58:40 -  1.156
+++ news.html   30 Jan 2017 21:21:12 -
@@ -642,9 +642,8 @@
 March 9, 2006
 
 Richard Henderson, Jakub Jelinek and Diego Novillo of Red Hat Inc, and
-Dmitry Kurochkin have contributed an implementation of the <a
-href="http://openmp.org/wp/">OpenMP v2.5</a> parallel programming
-interface for C, C++ and Fortran.
+Dmitry Kurochkin have contributed an implementation of the
+OpenMP 2.5 parallel programming interface for C, C++ and Fortran.
 
 
 March 6, 2006


Re: Patch to fix PR79131

2017-01-30 Thread Christophe Lyon
Hi Vladimir,

On 26 January 2017 at 18:09, Vladimir Makarov  wrote:
> The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79131
>
> The patch also adapts IP IRA in LRA because without it GCC IP RA tests
> become broken (it was just a luck that the tests worked before the patch).
>
> The patch was successfully bootstrapped and tested on x86-64.
>
> Committed as rev. 244942.
>
>
After this patch, I've noticed regressions on arm:
FAIL:  gcc.target/arm/neon-for-64bits-1.c scan-assembler-times vshr 0
Your follow-up patch didn't fix this.
There are now 6 vshr instructions generated.

Can you check ?

Thanks,
Christophe


[C++ PATCH] Fix default TLS model for non-inline static data members (PR c++/79288)

2017-01-30 Thread Jakub Jelinek
Hi!

The inline variable changes broke the default TLS model of non-inline static
data members.  The decl_default_tls_model call needs DECL_EXTERNAL to be
already set appropriately in the !processing_template_decl case.  The
following patch moves the thread_p processing a few lines below, so that
it is already set there.

Bootstrapped/regtested on x86_64-linux and i686-linux, additionally
on both tested with make check-c++-all, ok for trunk?

2017-01-30  Jakub Jelinek  

PR c++/79288
* decl.c (grokdeclarator): For static data members, handle thread_p
only after handling inline.

* g++.dg/tls/pr79288.C: New test.

--- gcc/cp/decl.c.jj2017-01-26 09:14:24.0 +0100
+++ gcc/cp/decl.c   2017-01-30 18:49:38.544438710 +0100
@@ -12049,14 +12049,6 @@ grokdeclarator (const cp_declarator *dec
: input_location,
VAR_DECL, unqualified_id, type);
set_linkage_for_static_data_member (decl);
-   if (thread_p)
- {
-   CP_DECL_THREAD_LOCAL_P (decl) = true;
-   if (!processing_template_decl)
- set_decl_tls_model (decl, decl_default_tls_model (decl));
-   if (declspecs->gnu_thread_keyword_p)
- SET_DECL_GNU_TLS_P (decl);
- }
if (concept_p)
	error ("static data member %qE declared %<concept%>",
   unqualified_id);
@@ -12077,6 +12069,15 @@ grokdeclarator (const cp_declarator *dec
 definition is provided, unless this is an inline
 variable.  */
  DECL_EXTERNAL (decl) = 1;
+
+   if (thread_p)
+ {
+   CP_DECL_THREAD_LOCAL_P (decl) = true;
+   if (!processing_template_decl)
+ set_decl_tls_model (decl, decl_default_tls_model (decl));
+   if (declspecs->gnu_thread_keyword_p)
+ SET_DECL_GNU_TLS_P (decl);
+ }
  }
else
  {
--- gcc/testsuite/g++.dg/tls/pr79288.C.jj   2017-01-30 18:55:05.754282818 
+0100
+++ gcc/testsuite/g++.dg/tls/pr79288.C  2017-01-30 18:54:52.0 +0100
@@ -0,0 +1,28 @@
+// PR c++/79288
+// { dg-do compile { target nonpic } }
+// { dg-require-effective-target tls }
+// { dg-options "-O2" }
+// { dg-final { scan-assembler-not "@tpoff" { target i?86-*-* x86_64-*-* } } }
+
+struct S
+{
+  static __thread int *p;
+};
+
+template 
+struct T
+{
+  static __thread int *p;
+};
+
+int *
+foo ()
+{
+  return S::p;
+}
+
+int *
+bar ()
+{
+  return T<0>::p;
+}

Jakub


Re: [C++ PATCH] pr 67273 & 79253 lambdas, warnings & ICEs

2017-01-30 Thread Jason Merrill
On Fri, Jan 27, 2017 at 7:45 AM, Nathan Sidwell  wrote:
> Jason,
> I happened to be working on 67273, noticed a problem with my 77585 fix, and
> coincidentally 79253 got filed, which this also fixes.
>
> In 67273, -Wshadow checking was getting confused when determining the return
> type of an instantiated lambda.
>
> template <typename T> void Foo (T &lambda) {
>   int ARG = 2;
>   lambda (1);
> }
>
> void Baz () {
>   Foo ([] (auto &) {});
> }
>
> maybe_instantiate_decl gets called when building the 'lambda (1)' call
> during the instantiation of Foo (to determine its return type).  It goes
> ahead and calls instantiate_decl.  instantiate_decl decides not to push to
> top level at:
>
>   fn_context = decl_function_context (d);
>   ...
>   if (!fn_context)
> push_to_top_level ();
>   else
> ...
>
> because 'decl_function_context' is true (it's 'Baz'), but the current
> function is 'Foo'.  We end up in store_parms thinking we're
> pushing a parm 'ARG' that shadows the local var 'ARG'.  That doesn't result
> in wrong code, but does give a spurious shadowing warning.
>
> Again, this is an artifact of generic lambdas being template members of
> function-scope classes.  Not a thing that exists elsewhere.
>
> Unfortunately, there is a case where we do want to stay at the current level
> -- when we're instantiating the lambda body during instantiation of the
> containing template function.  instantiate_decl must be told in which
> context it's being invoked.

Why can't it figure that out for itself?  We should be able to tell
whether its containing function is currently open.

Jason


Re: [PATCH] Actually fix libhsail-rt build on x86_64/i?86 32-bit (take 2)

2017-01-30 Thread Jakub Jelinek
On Mon, Jan 30, 2017 at 08:53:18PM +0100, Bernhard Reutner-Fischer wrote:
> As this will likely bri{ck,g} horribly I'll leave these to Martin and Pekka.
> Thanks for fixing but note that you attached the wrong patch below.

Oops, here is the right one (bootstrapped/regtested on x86_64-linux and
i686-linux again):

2017-01-30  Jakub Jelinek  

* configure.tgt: Fix i?86-*-linux* entry.
* rt/sat_arithmetic.c (__hsail_sat_add_u32, __hsail_sat_add_u64,
__hsail_sat_add_s32, __hsail_sat_add_s64): Use __builtin_add_overflow.
(__hsail_sat_sub_u8, __hsail_sat_sub_u16): Remove pointless check for
overflow over maximum.
(__hsail_sat_sub_u32, __hsail_sat_sub_u64, __hsail_sat_sub_s32,
__hsail_sat_sub_s64): Use __builtin_sub_overflow.
(__hsail_sat_mul_u32, __hsail_sat_mul_u64, __hsail_sat_mul_s32,
__hsail_sat_mul_s64): Use __builtin_mul_overflow.
* rt/arithmetic.c (__hsail_borrow_u32, __hsail_borrow_u64): Use
__builtin_sub_overflow_p.
(__hsail_carry_u32, __hsail_carry_u64): Use __builtin_add_overflow_p.
* rt/misc.c (__hsail_groupbaseptr, __hsail_kernargbaseptr_u64):
Cast pointers to uintptr_t first before casting to some other integral
type.
* rt/segment.c (__hsail_segmentp_private, __hsail_segmentp_group): 
Likewise.
* rt/queue.c (__hsail_ldqueuereadindex, __hsail_ldqueuewriteindex,
__hsail_addqueuewriteindex, __hsail_casqueuewriteindex,
__hsail_stqueuereadindex, __hsail_stqueuewriteindex): Cast integral 
value
to uintptr_t first before casting to pointer.
* rt/workitems.c (__hsail_alloca_pop_frame): Cast memcpy first argument 
to
void * to avoid warning.

--- libhsail-rt/configure.tgt.jj2017-01-30 09:31:51.0 +0100
+++ libhsail-rt/configure.tgt   2017-01-30 09:35:04.402926654 +0100
@@ -28,9 +28,7 @@
 # broken systems. Currently it has been tested only on x86_64 Linux
 # of the upstream gcc targets. More targets shall be added after testing.
 case "${target}" in
-  i[[3456789]]86-*linux*)
-;;
-  x86_64-*-linux*)
+  x86_64-*-linux* | i?86-*-linux*)
 ;;
 *)
 UNSUPPORTED=1
--- libhsail-rt/rt/sat_arithmetic.c.jj  2017-01-26 09:17:46.0 +0100
+++ libhsail-rt/rt/sat_arithmetic.c 2017-01-30 10:27:27.861325330 +0100
@@ -49,21 +49,19 @@ __hsail_sat_add_u16 (uint16_t a, uint16_
 uint32_t
 __hsail_sat_add_u32 (uint32_t a, uint32_t b)
 {
-  uint64_t c = (uint64_t) a + (uint64_t) b;
-  if (c > UINT32_MAX)
+  uint32_t c;
+  if (__builtin_add_overflow (a, b, &c))
 return UINT32_MAX;
-  else
-return c;
+  return c;
 }
 
 uint64_t
 __hsail_sat_add_u64 (uint64_t a, uint64_t b)
 {
-  __uint128_t c = (__uint128_t) a + (__uint128_t) b;
-  if (c > UINT64_MAX)
+  uint64_t c;
+  if (__builtin_add_overflow (a, b, &c))
 return UINT64_MAX;
-  else
-return c;
+  return c;
 }
 
 int8_t
@@ -93,25 +91,19 @@ __hsail_sat_add_s16 (int16_t a, int16_t
 int32_t
 __hsail_sat_add_s32 (int32_t a, int32_t b)
 {
-  int64_t c = (int64_t) a + (int64_t) b;
-  if (c > INT32_MAX)
-return INT32_MAX;
-  else if (c < INT32_MIN)
-return INT32_MIN;
-  else
-return c;
+  int32_t c;
+  if (__builtin_add_overflow (a, b, &c))
+return b < 0 ? INT32_MIN : INT32_MAX;
+  return c;
 }
 
 int64_t
 __hsail_sat_add_s64 (int64_t a, int64_t b)
 {
-  __int128_t c = (__int128_t) a + (__int128_t) b;
-  if (c > INT64_MAX)
-return INT64_MAX;
-  else if (c < INT64_MIN)
-return INT64_MIN;
-  else
-return c;
+  int64_t c;
+  if (__builtin_add_overflow (a, b, &c))
+return b < 0 ? INT64_MIN : INT64_MAX;
+  return c;
 }
 
 uint8_t
@@ -120,8 +112,6 @@ __hsail_sat_sub_u8 (uint8_t a, uint8_t b
   int16_t c = (uint16_t) a - (uint16_t) b;
   if (c < 0)
 return 0;
-  else if (c > UINT8_MAX)
-return UINT8_MAX;
   else
 return c;
 }
@@ -132,8 +122,6 @@ __hsail_sat_sub_u16 (uint16_t a, uint16_
   int32_t c = (uint32_t) a - (uint32_t) b;
   if (c < 0)
 return 0;
-  else if (c > UINT16_MAX)
-return UINT16_MAX;
   else
 return c;
 }
@@ -141,25 +129,19 @@ __hsail_sat_sub_u16 (uint16_t a, uint16_
 uint32_t
 __hsail_sat_sub_u32 (uint32_t a, uint32_t b)
 {
-  int64_t c = (uint64_t) a - (uint64_t) b;
-  if (c < 0)
+  uint32_t c;
+  if (__builtin_sub_overflow (a, b, &c))
 return 0;
-  else if (c > UINT32_MAX)
-return UINT32_MAX;
-  else
-return c;
+  return c;
 }
 
 uint64_t
 __hsail_sat_sub_u64 (uint64_t a, uint64_t b)
 {
-  __int128_t c = (__uint128_t) a - (__uint128_t) b;
-  if (c < 0)
+  uint64_t c;
+  if (__builtin_sub_overflow (a, b, &c))
 return 0;
-  else if (c > UINT64_MAX)
-return UINT64_MAX;
-  else
-return c;
+  return c;
 }
 
 int8_t
@@ -189,25 +171,19 @@ __hsail_sat_sub_s16 (int16_t a, int16_t
 int32_t
 __hsail_sat_sub_s32 (int32_t a, int32_t b)
 {
-  int64_t c = (int64_t) a - (int64_t) b;
-  if (c > INT32_MAX)
-return INT32_MAX;
-  else if (c < INT32_MIN)
-return INT32_MIN;
-  else

Re: [PATCH] Actually fix libhsail-rt build on x86_64/i?86 32-bit (take 2)

2017-01-30 Thread Bernhard Reutner-Fischer
On 30 January 2017 18:37:00 CET, Jakub Jelinek  wrote:
>Hi!
>
>On Mon, Jan 30, 2017 at 05:56:36PM +0100, Bernhard Reutner-Fischer
>wrote:
>> On 30 January 2017 10:56:59 CET, Jakub Jelinek 
>wrote:
>> 
>> >+++ libhsail-rt/rt/sat_arithmetic.c 2017-01-30 10:27:27.861325330
>+0100
>> >@@ -49,21 +49,18 @@ __hsail_sat_add_u16 (uint16_t a, uint16_
>> > uint64_t
>> > __hsail_sat_add_u64 (uint64_t a, uint64_t b)
>> > {
>> >-  __uint128_t c = (__uint128_t) a + (__uint128_t) b;
>> >-  if (c > UINT64_MAX)
>> >+  uint64_t c;
>> >+  if (__builtin_add_overflow (a, b, &c))
>> > return UINT64_MAX;
>> >-  else
>> >-return c;
>> > }
>> 
>> Missing return c; ?
>
>Oops, right, fixed thusly.  Note the previously posted patch passed
>bootstrap/regtest on x86_64-linux (and bootstrapped on i686-linux,
>regtest
>still ongoing there), so likely nothing in the testsuite tests
>it.
>
>> Or maybe dead code since I'd have expected a warning here about not
>returning?
>
>No, but it seems libhsail-rt doesn't add any warnings at all (something
>that
>really should be fixed too, config/*.m4 has lots of functions to enable
>warnings that can be just added to configure.ac).

As this will likely bri{ck,g} horribly I'll leave these to Martin and Pekka.
Thanks for fixing but note that you attached the wrong patch below.

Cheers,
>
>2017-01-23  Jakub Jelinek  
>
>gcc/
>   * config/s390/s390.c (s390_asan_shadow_offset): New function.
>   (TARGET_ASAN_SHADOW_OFFSET): Redefine.
>libsanitizer/
>   * configure.tgt: Enable asan and ubsan on 64-bit s390*-*-linux*.
>
>--- gcc/config/s390/s390.c.jj  2017-01-19 16:58:25.0 +0100
>+++ gcc/config/s390/s390.c 2017-01-23 16:32:28.220398187 +0100
>@@ -15435,6 +15435,14 @@ s390_excess_precision (enum excess_preci
>   return FLT_EVAL_METHOD_UNPREDICTABLE;
> }
> 
>+/* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
>+
>+static unsigned HOST_WIDE_INT
>+s390_asan_shadow_offset (void)
>+{
>+  return TARGET_64BIT ? HOST_WIDE_INT_1U << 52 : HOST_WIDE_INT_UC
>(0x2000);
>+}
>+
> /* Initialize GCC target structure.  */
> 
> #undef  TARGET_ASM_ALIGNED_HI_OP
>@@ -15536,6 +15544,8 @@ s390_excess_precision (enum excess_preci
> #define TARGET_BUILD_BUILTIN_VA_LIST s390_build_builtin_va_list
> #undef TARGET_EXPAND_BUILTIN_VA_START
> #define TARGET_EXPAND_BUILTIN_VA_START s390_va_start
>+#undef TARGET_ASAN_SHADOW_OFFSET
>+#define TARGET_ASAN_SHADOW_OFFSET s390_asan_shadow_offset
> #undef TARGET_GIMPLIFY_VA_ARG_EXPR
> #define TARGET_GIMPLIFY_VA_ARG_EXPR s390_gimplify_va_arg
> 
>--- libsanitizer/configure.tgt.jj  2017-01-23 15:25:21.0 +0100
>+++ libsanitizer/configure.tgt 2017-01-23 15:36:40.787456320 +0100
>@@ -39,6 +39,11 @@ case "${target}" in
>   ;;
>   sparc*-*-linux*)
>   ;;
>+  s390*-*-linux*)
>+  if test x$ac_cv_sizeof_void_p = x4; then
>+  UNSUPPORTED=1
>+  fi
>+  ;;
>   arm*-*-linux*)
>   ;;
>   aarch64*-*-linux*)
>
>
>   Jakub



Re: [PATCH][RFA][PR tree-optimization/79095] Improve overflow test optimization and avoid invalid warnings

2017-01-30 Thread Richard Biener
On January 30, 2017 7:02:38 PM GMT+01:00, Jeff Law  wrote:
>On 01/30/2017 02:51 AM, Richard Biener wrote:
>> On Fri, Jan 27, 2017 at 11:21 PM, Jeff Law  wrote:
>>> On 01/27/2017 02:35 PM, Richard Biener wrote:

 On January 27, 2017 7:30:07 PM GMT+01:00, Jeff Law 
>wrote:
>
> On 01/27/2017 05:08 AM, Richard Biener wrote:
>>
>> On Fri, Jan 27, 2017 at 10:02 AM, Marc Glisse
>
>
> wrote:
>>>
>>> On Thu, 26 Jan 2017, Jeff Law wrote:
>>>
> I assume this causes a regression for code like
>
> unsigned f(unsigned a){
>   unsigned b=a+1;
>   if(b<a) return 0;
>   return b;
> }


 Yes.  The transformation ruins the conversion into ADD_OVERFLOW
>for
>
> the +-

 1 case.  However, ISTM that we could potentially recover the
>
> ADD_OVERFLOW in

 phi-opt.  It's a very simple pattern that would be presented to
>
> phi-opt, so

 it might not be terrible to recover -- which has the advantage
>that
>
> if a

 user wrote an optimized overflow test we'd be able to recover
>
> ADD_OVERFLOW

 for it.
>>>
>>>
>>>
>>> phi-opt is a bit surprising at first glance because there can be
>
> overflow
>>>
>>> checking without condition/PHI, but if it is convenient to catch
>
> many
>>>
>>> cases...
>>
>>
>> Yeah, and it's still on my TODO to add some helpers exercising
>> match.pd COND_EXPR
>> patterns from PHI nodes and their controlling condition.
>
> It turns out to be better to fix the existing machinery to detect
> ADD_OVERFLOW in the transformed case than to add new detection to
> phi-opt.
>
> The problem with improving the detection of ADD_OVERFLOW is that
>the
> transformed test may allow the ADD/SUB to be sunk.  So by the time
>we
> run the pass to detect ADD_OVERFLOW, the test and arithmetic may
>be in
> different blocks -- ugh.
>
> The more I keep thinking about this the more I wonder if
>transforming
> the conditional is just more of a headache than its worth -- the
>main
> need here is to drive propagation of known constants into the
>THEN/ELSE
>
> clauses.  Transforming the conditional makes that easy for VRP &
>DOM to
>
> discover those constant and the transform is easy to write in
>match.pd.
>
> But we could just go back to discovering the case in VRP or DOM
>via
> open-coding detection, then propagating the known constants
>without
> transforming the conditional.


 Indeed we can do that.  And in fact with named patterns in match.pd
>you
 could even avoid the open-coding.
>>>
>>> negate_expr_p being the example?  That does look mighty
>interesting... After
>>> recognition we'd still have to extract the operands, but it does
>look like
>>> it might handle the detection part.
>>
>> Yes, the (match ..) stuff is actually exported in gimple-match.c
>(just
>> no declarations
>> are emitted to headers yet).  logical_inverted_value might be a
>better example
>> given it has an "output":
>>
>> (match (logical_inverted_value @0)
>>  (truth_not @0))
>>
>> bool
>> gimple_logical_inverted_value (tree t, tree *res_ops, tree
>> (*valueize)(tree) ATTRIBUTE_UNUSED)
>> {
>Thanks.  If I look at the generated code for this match:
>
>(for cmp (lt le ge gt)
>  (match (overflow_test @0 @1 @2)
>   (cmp:c (plus@2 @0 INTEGER_CST@1) @0)
>   (if (TYPE_UNSIGNED (TREE_TYPE (@0))
>&& TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0))
>&& wi::ne_p (@1, 0)
>
>
>The generated code starts with...
>
>  switch (TREE_CODE (t))
> {
> case SSA_NAME:
>   if (do_valueize (valueize, t) != NULL_TREE)
> {
>   gimple *def_stmt = SSA_NAME_DEF_STMT (t);
>   if (gassign *def = dyn_cast <gassign *> (def_stmt))
> switch (gimple_assign_rhs_code (def))
>   {
>   case LT_EXPR:
> {
>
>So, it appears to want to match an SSA_NAME which is assigned from the 
>conditional.  So it's not all that helpful because the expression we 
>want to match shows up inside a GIMPLE_COND.  It also appears we'd have

Yeah...

>to build a tree to pass down to matching code.

A tree?  Not sure what you mean.

As this isn't used outside of match.pd yet it surely needs some work in 
genmatch like for the case where the expression to match has no SSA value.

Richard.

>
>Jeff



[PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-01-30 Thread Torvald Riegel
This patch fixes the __atomic builtins to not implement supposedly
lock-free atomic loads based on just a compare-and-swap operation.

If there is no hardware-backed atomic load for a certain memory
location, the current implementation can implement the load with a CAS
while claiming that the access is lock-free.  This is a bug in the cases
of volatile atomic loads and atomic loads to read-only-mapped memory; it
also creates a lot of contention in case of concurrent atomic loads,
which results in at least counter-intuitive performance because most
users probably understand "lock-free" to mean hardware-backed (and thus
"fast") instead of just in the progress-criteria sense.

This patch implements option 3b of the choices described here:
https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html

In a nutshell, this does change affected accesses to call libatomic
instead of inlining the CAS-based load emulation -- but libatomic will
continue to do the CAS-based approach.  Thus, there's no change in how
the loads are actually performed (including compatibility with the
__sync builtins, which are not changed) -- but we do make it much easier
to fix this in the future, and we do ensure that less future code will
have the problematic code inlined into it (and thus unfixable).

Second, the return of __atomic_always_lock_free and
__atomic_is_lock_free are changed to report false for the affected
accesses.  This aims at making users aware of the fact that they don't
get fast hardware-backed performance for the affected accesses.

This patch does not yet change how OpenMP atomics support is
implemented.  Jakub said he would take care of that.  I suppose one
approach would be to check can_atomic_load_p (introduced by this patch)
to decide in expand_omp_atomic whether to just use the mutex-based
approach; I think that conservatively, OpenMP atomics have to assume
that there could be an atomic load to a particular memory location
elsewhere in the program, so the choice between mutex-backed or not has
to be consistent for a particular memory location.

Thanks to Richard Henderson for providing the libatomic part of this
patch, and thanks to Andrew MacLeod for helping with the compiler parts.

I've tested this on an x86_64-linux bootstrap build and see no
regressions.  (With the exception of two OpenMP failures, which Jakub
will take care of.  The other failures I saw didn't seem atomics related
(eg, openacc); I haven't compared it against a full bootstrap build and
make check of trunk.).

AFAICT, in practice only x86 with -mcx16 (or where this is implicitly
implied) is affected.  The other architectures I looked at seemed to
have patterns for true hardware-backed atomic loads whenever they also
had a wider-than-word-sized CAS available.

I know we missed the stage3 deadline with this, unfortunately.  I think
it would be good to have this fix be part of GCC 7 though, because this
would ensure that less future code has the problematic load-via-CAS code
inlined.

Jakub: If this is OK for GCC 7, can you please take care of the OpenMP
bits and commit this?  Changelog entries are in the commit message.

If others could test on other hardware, this would also be appreciated.
commit 1db13cb386e673d5265bcaf2d70fc25dda22e5fd
Author: Torvald Riegel 
Date:   Fri Jan 27 17:40:44 2017 +0100

Fix __atomic to not implement atomic loads with CAS.

gcc/
	* builtins.c (fold_builtin_atomic_always_lock_free): Make "lock-free"
	conditional on existence of a fast atomic load.
	* optabs-query.c (can_atomic_load_p): New function.
	* optabs-query.h (can_atomic_load_p): Declare it.
	* optabs.c (expand_atomic_exchange): Always delegate to libatomic if
	no fast atomic load is available for the particular size of access.
	(expand_atomic_compare_and_swap): Likewise.
	(expand_atomic_load): Likewise.
	(expand_atomic_store): Likewise.
	(expand_atomic_fetch_op): Likewise.
	* testsuite/lib/target-supports.exp
	(check_effective_target_sync_int_128): Remove x86 because it provides
	no fast atomic load.
	(check_effective_target_sync_int_128_runtime): Likewise.

libatomic/
	* acinclude.m4: Add #define FAST_ATOMIC_LDST_*.
	* auto-config.h.in: Regenerate.
	* config/x86/host-config.h (FAST_ATOMIC_LDST_16): Define to 0.
	(atomic_compare_exchange_n): New.
	* glfree.c (EXACT, LARGER): Change condition and add comments.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index bf68e31..0a0e8b9 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6157,8 +6157,9 @@ fold_builtin_atomic_always_lock_free (tree arg0, tree arg1)
 
   /* Check if a compare_and_swap pattern exists for the mode which represents
  the required size.  The pattern is not allowed to fail, so the existence
- of the pattern indicates support is present.  */
-  if (can_compare_and_swap_p (mode, true))
+ of the pattern indicates support is present.  Also require that an
+ atomic load 

[committed] Fix build of the brig FE on i686-linux host

2017-01-30 Thread Jakub Jelinek
Hi!

The brig FE doesn't build on i686-linux, because size_t arguments
get a warning there when used with %lu.  As %zu is not portable enough,
cast to ulong is what is generally used elsewhere.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
as obvious.

2017-01-30  Jakub Jelinek  

* brigfrontend/brig-code-entry-handler.cc
(brig_code_entry_handler::get_tree_cst_for_hsa_operand): For %lu
cast size_t arguments to unsigned long.

--- gcc/brig/brigfrontend/brig-code-entry-handler.cc.jj 2017-01-26 
09:15:50.0 +0100
+++ gcc/brig/brigfrontend/brig-code-entry-handler.cc2017-01-30 
17:16:58.721940087 +0100
@@ -606,8 +606,9 @@ brig_code_entry_handler::get_tree_cst_fo
  if (bytes_left < scalar_element_size * element_count)
fatal_error (UNKNOWN_LOCATION,
 "Not enough bytes left for the initializer "
-"(%lu need %lu).",
-bytes_left, scalar_element_size * element_count);
+"(%lu need %lu).", (unsigned long) bytes_left,
+(unsigned long) (scalar_element_size
+ * element_count));
 
  vec *vec_els = NULL;
  for (size_t i = 0; i < element_count; ++i)
@@ -625,8 +626,8 @@ brig_code_entry_handler::get_tree_cst_fo
  if (bytes_left < scalar_element_size)
fatal_error (UNKNOWN_LOCATION,
 "Not enough bytes left for the initializer "
-"(%lu need %lu).",
-bytes_left, scalar_element_size);
+"(%lu need %lu).", (unsigned long) bytes_left,
+(unsigned long) scalar_element_size);
  cst = build_tree_cst_element (scalar_element_type, next_data);
  bytes_left -= scalar_element_size;
  next_data += scalar_element_size;

Jakub


[PATCH] Fix ICE with missing lhs on noreturn call with addressable return type (PR tree-optimization/79267)

2017-01-30 Thread Jakub Jelinek
Hi!

This is yet another occurrence of the bug that we drop lhs on noreturn
calls even when it actually should not be dropped (if it has addressable
type or if it is a variable length type).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-01-30  Jakub Jelinek  

PR tree-optimization/79267
* value-prof.c (gimple_ic): Only drop lhs for noreturn calls
if should_remove_lhs_p is true.

* g++.dg/opt/pr79267.C: New test.

--- gcc/value-prof.c.jj 2017-01-01 12:45:38.0 +0100
+++ gcc/value-prof.c2017-01-30 12:30:47.179820533 +0100
@@ -1358,7 +1358,8 @@ gimple_ic (gcall *icall_stmt, struct cgr
   dcall_stmt = as_a <gcall *> (gimple_copy (icall_stmt));
   gimple_call_set_fndecl (dcall_stmt, direct_call->decl);
   dflags = flags_from_decl_or_type (direct_call->decl);
-  if ((dflags & ECF_NORETURN) != 0)
+  if ((dflags & ECF_NORETURN) != 0
+  && should_remove_lhs_p (gimple_call_lhs (dcall_stmt)))
 gimple_call_set_lhs (dcall_stmt, NULL_TREE);
   gsi_insert_before (&gsi, dcall_stmt, GSI_SAME_STMT);
 
--- gcc/testsuite/g++.dg/opt/pr79267.C.jj   2017-01-30 12:36:07.605516857 
+0100
+++ gcc/testsuite/g++.dg/opt/pr79267.C  2017-01-30 12:35:51.0 +0100
@@ -0,0 +1,69 @@
+// PR tree-optimization/79267
+// { dg-do compile }
+// { dg-options "-O3" }
+
+struct A { A (int); };
+struct B
+{
+  virtual void av () = 0;
+  void aw ();
+  void h () { av (); aw (); }
+};
+template  struct G : B
+{
+  T ba;
+  G (int, T) : ba (0) {}
+  void av () { ba (0); }
+};
+struct I
+{
+  B *bc;
+  template  I (j, T) try { G (0, 0); } catch (...) {}
+  ~I () { bc->h (); }
+};
+template  struct C { typedef M *i; };
+template  struct J
+{
+  J ();
+  template  J (O, T p2) : be (0, p2) {}
+  typename C::i operator-> ();
+  I be;
+};
+struct H : A { H () : A (0) {} };
+struct D { J d; void q (); };
+template  class bs;
+int z;
+
+void
+foo (int p1, int *, int)
+{
+  if (p1 == 0)
+throw H ();
+}
+
+D bar ();
+template  struct L
+{
+  struct K { K (int); void operator() (int *) { bar ().q (); } };
+  static J bp () { bq (0); }
+  template  static void bq (br) { J (0, K (0)); }
+};
+struct F
+{
+  virtual J x (int) { foo (0, 0, 0); J > (L >::bp ()); }
+};
+
+void
+baz ()
+{
+  if (z)
+{
+  J d, e;
+  d->x (0);
+  e->x (0);
+}
+  J v, i, j;
+  v->x (0);
+  i->x (0);
+  j->x (0);
+}

Jakub


[C++ PATCH] Fix assignments with comma expression on lhs (PR c++/79232)

2017-01-30 Thread Jakub Jelinek
Hi!

cp_build_modify_expr has some special code for the case when the lhs
is a pre{inc,dec}rement, assignment, ?: or min/max expression, but the
comma expression handling has been removed (so that -fstrong-eval-order
is honored; can't we keep the previous behavior for
-fstrong-eval-order=none and/or some?).  This in turn means that if there
is a COMPOUND_EXPR with one of the above expressions in its right operand
(or a series of nested COMPOUND_EXPRs where the rightmost expression is
one of those), we generate invalid GENERIC and ICE on it, and even when
we don't, we generate code with the wrong evaluation order.

The following patch addresses that by handling COMPOUND_EXPR as it does
right now if the right-most expression isn't one of those, and otherwise
ensures the right thing happens.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-01-30  Jakub Jelinek  

PR c++/79232
* typeck.c (cp_build_modify_expr): Handle properly COMPOUND_EXPRs
on lhs that have {PRE{DEC,INC}REMENT,MODIFY,MIN,MAX,COND}_EXPR
in the rightmost operand.

* g++.dg/cpp1z/eval-order4.C: New test.
* g++.dg/other/pr79232.C: New test.

--- gcc/cp/typeck.c.jj  2017-01-30 09:31:43.076595640 +0100
+++ gcc/cp/typeck.c 2017-01-30 15:56:33.601002577 +0100
@@ -7568,16 +7568,26 @@ tree
 cp_build_modify_expr (location_t loc, tree lhs, enum tree_code modifycode,
  tree rhs, tsubst_flags_t complain)
 {
-  tree result;
+  tree result = NULL_TREE;
   tree newrhs = rhs;
   tree lhstype = TREE_TYPE (lhs);
+  tree olhs = lhs;
   tree olhstype = lhstype;
   bool plain_assign = (modifycode == NOP_EXPR);
+  bool compound_side_effects_p = false;
+  tree preeval = NULL_TREE;
 
   /* Avoid duplicate error messages from operands that had errors.  */
   if (error_operand_p (lhs) || error_operand_p (rhs))
 return error_mark_node;
 
+  while (TREE_CODE (lhs) == COMPOUND_EXPR)
+{
+  if (TREE_SIDE_EFFECTS (TREE_OPERAND (lhs, 0)))
+   compound_side_effects_p = true;
+  lhs = TREE_OPERAND (lhs, 1);
+}
+
   /* Handle control structure constructs used as "lvalues".  Note that we
  leave COMPOUND_EXPR on the LHS because it is sequenced after the RHS.  */
   switch (TREE_CODE (lhs))
@@ -7585,20 +7595,57 @@ cp_build_modify_expr (location_t loc, tr
   /* Handle --foo = 5; as these are valid constructs in C++.  */
 case PREDECREMENT_EXPR:
 case PREINCREMENT_EXPR:
+  if (compound_side_effects_p)
+   {
+ if (VOID_TYPE_P (TREE_TYPE (rhs)))
+   {
+ if (complain & tf_error)
+   error ("void value not ignored as it ought to be");
+ return error_mark_node;
+   }
+ newrhs = rhs = stabilize_expr (rhs, &preeval);
+   }
   if (TREE_SIDE_EFFECTS (TREE_OPERAND (lhs, 0)))
lhs = build2 (TREE_CODE (lhs), TREE_TYPE (lhs),
  cp_stabilize_reference (TREE_OPERAND (lhs, 0)),
  TREE_OPERAND (lhs, 1));
   lhs = build2 (COMPOUND_EXPR, lhstype, lhs, TREE_OPERAND (lhs, 0));
+maybe_add_compound:
+  /* If we had (bar, --foo) = 5; or (bar, (baz, --foo)) = 5;
+and looked through the COMPOUND_EXPRs, readd them now around
+the resulting lhs.  */
+  if (TREE_CODE (olhs) == COMPOUND_EXPR)
+   {
+ lhs = build2 (COMPOUND_EXPR, lhstype, TREE_OPERAND (olhs, 0), lhs);
+ tree *ptr = &TREE_OPERAND (lhs, 1);
+ for (olhs = TREE_OPERAND (olhs, 1);
+  TREE_CODE (olhs) == COMPOUND_EXPR;
+  olhs = TREE_OPERAND (olhs, 1))
+   {
+ *ptr = build2 (COMPOUND_EXPR, lhstype,
+TREE_OPERAND (olhs, 0), *ptr);
+ ptr = &TREE_OPERAND (*ptr, 1);
+   }
+   }
   break;
 
 case MODIFY_EXPR:
+  if (compound_side_effects_p)
+   {
+ if (VOID_TYPE_P (TREE_TYPE (rhs)))
+   {
+ if (complain & tf_error)
+   error ("void value not ignored as it ought to be");
+ return error_mark_node;
+   }
+ newrhs = rhs = stabilize_expr (rhs, &preeval);
+   }
   if (TREE_SIDE_EFFECTS (TREE_OPERAND (lhs, 0)))
lhs = build2 (TREE_CODE (lhs), TREE_TYPE (lhs),
  cp_stabilize_reference (TREE_OPERAND (lhs, 0)),
  TREE_OPERAND (lhs, 1));
   lhs = build2 (COMPOUND_EXPR, lhstype, lhs, TREE_OPERAND (lhs, 0));
-  break;
+  goto maybe_add_compound;
 
 case MIN_EXPR:
 case MAX_EXPR:
@@ -7626,7 +7673,6 @@ cp_build_modify_expr (location_t loc, tr
   except that the RHS goes through a save-expr
   so the code to compute it is only emitted once.  */
tree cond;
-   tree preeval = NULL_TREE;
 
if (VOID_TYPE_P (TREE_TYPE (rhs)))
  {
@@ -7652,14 +7698,31 @@ cp_build_modify_expr (location_t loc, tr
 
if (cond == error_mark_node)
  return cond;
+   /* If we had (e, (a ? b : c)) = d; 

Re: [PATCH][RFA][PR tree-optimization/79095] Improve overflow test optimization and avoid invalid warnings

2017-01-30 Thread Jeff Law

On 01/30/2017 02:51 AM, Richard Biener wrote:

On Fri, Jan 27, 2017 at 11:21 PM, Jeff Law  wrote:

On 01/27/2017 02:35 PM, Richard Biener wrote:


On January 27, 2017 7:30:07 PM GMT+01:00, Jeff Law  wrote:


On 01/27/2017 05:08 AM, Richard Biener wrote:


On Fri, Jan 27, 2017 at 10:02 AM, Marc Glisse 


wrote:


On Thu, 26 Jan 2017, Jeff Law wrote:


I assume this causes a regression for code like

unsigned f(unsigned a){
  unsigned b=a+1;
  if(b

Re: [wwwdocs] Added /gcc-7/porting_to.html

2017-01-30 Thread Jonathan Wakely

On 30/01/17 17:54 +, Jonathan Wakely wrote:

This adds the porting to guide for GCC 7. So far it only has details
of C++ changes, mostly in the std::lib.

Committed to CVS.


And this fixes the HTML errors.

Committed to CVS.

Index: htdocs/gcc-7/porting_to.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/porting_to.html,v
retrieving revision 1.1
diff -u -r1.1 porting_to.html
--- htdocs/gcc-7/porting_to.html	30 Jan 2017 17:55:20 -	1.1
+++ htdocs/gcc-7/porting_to.html	30 Jan 2017 17:58:37 -
@@ -35,9 +35,10 @@
 
 Mangling change for conversion operators
 
+
 GCC 7 changes the name mangling for a conversion operator that returns a type
 using the abi_tag attribute, see
-https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71712;>PR 71712.
+https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71712;>PR 71712.
 This affects code that has conversions to std::string,
 for example:
 
@@ -67,7 +68,7 @@
 unrelated headers such as memory,
future, mutex, and
 regex.
-Correct code should #include functional to define them.
+Correct code should #include &lt;functional&gt; to define them.
 
 
 Additional overloads of abs
@@ -86,18 +87,23 @@
 As a result of this change, code which overloads abs may no longer
 compile if the custom overloads conflict with one of the additional overloads
 in the standard headers. For example, this will not compile:
+
+
 #include stdlib.h
-float abs(float x) { return x < 0 ? -x : x; }
+float abs(float x) { return x &lt; 0 ? -x : x; }
 
+
+
 The solution is to use the standard functions, not define conflicting
 overloads. For portability to previous versions of GCC and other
 implementations the abs(float) function can be brought into
 scope by including cmath and adding a using-declaration:
+
+
 #include stdlib.h
 #include cmath// ensure std::abs(float) is declared
 using std::abs;
 
-
 
 
 Additionally, calling


[wwwdocs] Added /gcc-7/porting_to.html

2017-01-30 Thread Jonathan Wakely

This adds the porting to guide for GCC 7. So far it only has details
of C++ changes, mostly in the std::lib.

Committed to CVS.
Index: htdocs/gcc-7/porting_to.html
===
RCS file: htdocs/gcc-7/porting_to.html
diff -N htdocs/gcc-7/porting_to.html
--- /dev/null	1 Jan 1970 00:00:00 -
+++ htdocs/gcc-7/porting_to.html	30 Jan 2017 17:53:55 -
@@ -0,0 +1,159 @@
+
+
+
+Porting to GCC 7
+
+
+
+Porting to GCC 7
+
+
+The GCC 7 release series differs from previous GCC releases in
+a number of ways. Some of
+these are a result of bug fixing, and some old behaviors have been
+intentionally changed in order to support new standards, or relaxed
+in standards-conforming ways to facilitate compilation or run-time
+performance.  Some of these changes are not visible to the naked eye
+and will not cause problems when updating from older versions.
+
+
+
+However, some of these changes are visible, and can cause grief to
+users porting to GCC 7. This document is an effort to identify major
+issues and provide clear solutions in a quick and easily searched
+manner. Additions and suggestions for improvement are welcome.
+
+
+
+Preprocessor issues
+
+
+C language issues
+
+
+C++ language issues
+
+Mangling change for conversion operators
+
+GCC 7 changes the name mangling for a conversion operator that returns a type
+using the abi_tag attribute, see
+https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71712;>PR 71712.
+This affects code that has conversions to std::string,
+for example:
+
+
+struct A {
+  operator std::string() const;
+};
+
+
+
+Code using such conversions might fail to link if some objects are compiled
+with GCC 7 and some are compiled with older releases.
+
+
+Header dependency changes
+
+
+Several C++ Standard Library headers have been changed to no longer include
+the functional header.
+As such, C++ programs that used components defined in
+functional without explicitly including that header
+will no longer compile.
+
+
+Previously components such as std::bind
+and std::function were implicitly defined after including
+unrelated headers such as memory,
+future, mutex, and
+regex.
+Correct code should #include functional to define them.
+
+
+Additional overloads of abs
+
+
+As required by the latest C++ draft, all overloads of the abs
+function are declared by including either of
+cstdlib or cmath
+(and correspondingly by either of stdlib.h or
+math.h). Previously cmath only
+declared the overloads for floating-point types, and
+cstdlib only declared the overloads for integral types.
+
+
+
+As a result of this change, code which overloads abs may no longer
+compile if the custom overloads conflict with one of the additional overloads
+in the standard headers. For example, this will not compile:
+#include stdlib.h
+float abs(float x) { return x < 0 ? -x : x; }
+
+The solution is to use the standard functions, not define conflicting
+overloads. For portability to previous versions of GCC and other
+implementations the abs(float) function can be brought into
+scope by including cmath and adding a using-declaration:
+#include stdlib.h
+#include cmath// ensure std::abs(float) is declared
+using std::abs;
+
+
+
+
+Additionally, calling
+abs with an argument of unsigned type is now ill-formed after
+inclusion of any standard abs overload.
+
+
+std::ios_base::failure
+
+
+When iostream objects are requested to throw exceptions on stream buffer
+errors, the type of exception thrown has changed to use the
+https://gcc.gnu.org/gcc-5/changes.html#libstdcxx;>new libstdc++ ABI
+introduced in GCC 5. Code which does
+catch (const std::ios::failure) or similar will not catch
+the exception if it is built using the old ABI. To ensure the exception is
+caught either compile the catch handler using the new ABI, or use a handler
+of type std::exception (which will catch the old and new versions
+of std::ios_base::failure) or a handler of type
+std::system_error.
+
+
+Changes to std::function constructed with std::reference_wrapper
+
+
+Prior to GCC 7 a std::function constructed with a
+std::reference_wrapper<T> would unwrap the argument and
+store a target of type T, and target_type() would
+return typeid(T). GCC 7 has been changed to match the behavior
+of other implementations and not unwrap the argument. This means the target
+will be a std::reference_wrapper<T> and
+target_type() will return
+typeid(std::reference_wrapper<T>).
+Code which depends on the target type may need to be adjusted appropriately.
+
+
+Changes for array support in std::shared_ptr
+
+
+The behavior of std::shared_ptr<T[]> and 
+std::shared_ptr<T[N]> has changed to match the semantics
+in the C++17 draft. Previously specializations of std::shared_ptr
+for array types had unhelpful semantics and were hard to use correctly, so the
+semantics have changed to match the C++17 behavior in GCC 7. Code which uses
+specializations for array types may continue to work in C++11 and C++14 modes,
+but not in 

Re: [PR63238] output alignment debug information

2017-01-30 Thread Alexandre Oliva
On Jan 29, 2017, Cary Coutant  wrote:

>> for gcc/ChangeLog
>> 
>> PR debug/63238

> This is OK so far, but the DW_AT_alignment attribute also needs to be
> added to the checksum computation in die_checksum and
> die_checksum_ordered.

Thanks.  I see what to do in die_checksum_ordered, but die_checksum?  It
seems to handle attributes by value class, and AFAICT the classes that
DW_AT_alignment could use are already covered.  What am I missing?

Here's a patch I'm about to start testing.  Does it look ok?


[PR63238] include alignment debug information in DIE checksum

Add DW_AT_alignment to the DIE checksum.

for gcc/ChangeLog

PR debug/63238
* dwarf2out.c (struct checksum_attributes): Add at_alignment.
(collect_checksum_attributes): Set it.
(die_checksum_ordered): Use it.
---
 gcc/dwarf2out.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index f8fe4c1..15b7a66 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -6600,6 +6600,7 @@ struct checksum_attributes
   dw_attr_node *at_friend;
   dw_attr_node *at_accessibility;
   dw_attr_node *at_address_class;
+  dw_attr_node *at_alignment;
   dw_attr_node *at_allocated;
   dw_attr_node *at_artificial;
   dw_attr_node *at_associated;
@@ -6673,6 +6674,9 @@ collect_checksum_attributes (struct checksum_attributes 
*attrs, dw_die_ref die)
 case DW_AT_address_class:
   attrs->at_address_class = a;
   break;
+   case DW_AT_alignment:
+ attrs->at_alignment = a;
+ break;
 case DW_AT_allocated:
   attrs->at_allocated = a;
   break;
@@ -6879,6 +6883,7 @@ die_checksum_ordered (dw_die_ref die, struct md5_ctx 
*ctx, int *mark)
   CHECKSUM_ATTR (attrs.at_vtable_elem_location);
   CHECKSUM_ATTR (attrs.at_type);
   CHECKSUM_ATTR (attrs.at_friend);
+  CHECKSUM_ATTR (attrs.at_alignment);
 
   /* Checksum the child DIEs.  */
   c = die->die_child;


-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


[PATCH] Actually fix libhsail-rt build on x86_64/i?86 32-bit (take 2)

2017-01-30 Thread Jakub Jelinek
Hi!

On Mon, Jan 30, 2017 at 05:56:36PM +0100, Bernhard Reutner-Fischer wrote:
> On 30 January 2017 10:56:59 CET, Jakub Jelinek  wrote:
> 
> >+++ libhsail-rt/rt/sat_arithmetic.c  2017-01-30 10:27:27.861325330 +0100
> >@@ -49,21 +49,18 @@ __hsail_sat_add_u16 (uint16_t a, uint16_
> > uint64_t
> > __hsail_sat_add_u64 (uint64_t a, uint64_t b)
> > {
> >-  __uint128_t c = (__uint128_t) a + (__uint128_t) b;
> >-  if (c > UINT64_MAX)
> >+  uint64_t c;
> >+  if (__builtin_add_overflow (a, b, &c))
> > return UINT64_MAX;
> >-  else
> >-return c;
> > }
> 
> Missing return c; ?

Oops, right, fixed thusly.  Note the previously posted patch passed
bootstrap/regtest on x86_64-linux (and bootstrapped on i686-linux, regtest
still ongoing there), so likely nothing in the testsuite tests
it.

> Or maybe dead code since I'd have expected a warning here about not returning?

No, but it seems libhsail-rt doesn't add any warnings at all (something that
really should be fixed too, config/*.m4 has lots of functions to enable
warnings that can be just added to configure.ac).

2017-01-23  Jakub Jelinek  

gcc/
* config/s390/s390.c (s390_asan_shadow_offset): New function.
(TARGET_ASAN_SHADOW_OFFSET): Redefine.
libsanitizer/
* configure.tgt: Enable asan and ubsan on 64-bit s390*-*-linux*.

--- gcc/config/s390/s390.c.jj   2017-01-19 16:58:25.0 +0100
+++ gcc/config/s390/s390.c  2017-01-23 16:32:28.220398187 +0100
@@ -15435,6 +15435,14 @@ s390_excess_precision (enum excess_preci
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
+
+static unsigned HOST_WIDE_INT
+s390_asan_shadow_offset (void)
+{
+  return TARGET_64BIT ? HOST_WIDE_INT_1U << 52 : HOST_WIDE_INT_UC (0x2000);
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
@@ -15536,6 +15544,8 @@ s390_excess_precision (enum excess_preci
 #define TARGET_BUILD_BUILTIN_VA_LIST s390_build_builtin_va_list
 #undef TARGET_EXPAND_BUILTIN_VA_START
 #define TARGET_EXPAND_BUILTIN_VA_START s390_va_start
+#undef TARGET_ASAN_SHADOW_OFFSET
+#define TARGET_ASAN_SHADOW_OFFSET s390_asan_shadow_offset
 #undef TARGET_GIMPLIFY_VA_ARG_EXPR
 #define TARGET_GIMPLIFY_VA_ARG_EXPR s390_gimplify_va_arg
 
--- libsanitizer/configure.tgt.jj   2017-01-23 15:25:21.0 +0100
+++ libsanitizer/configure.tgt  2017-01-23 15:36:40.787456320 +0100
@@ -39,6 +39,11 @@ case "${target}" in
;;
   sparc*-*-linux*)
;;
+  s390*-*-linux*)
+   if test x$ac_cv_sizeof_void_p = x4; then
+   UNSUPPORTED=1
+   fi
+   ;;
   arm*-*-linux*)
;;
   aarch64*-*-linux*)


Jakub


[PATCH] MIPS: Fix mode mismatch error between Loongson builtin arguments and insn operands.

2017-01-30 Thread Toma Tabacu
Hi,

The builtins for the pshufh, psllh, psllw, psrah, psraw, psrlh, psrlw Loongson
instructions have the third argument's type set to UQI while its corresponding
insn operand is in SImode.

This results in the following error when matching insn operands:

../gcc/gcc/include/loongson.h: In function 'test_psllw_s':
../gcc/gcc/include/loongson.h:483:10: error: invalid argument to built-in 
function
   return __builtin_loongson_psllw_s (s, amount);
  ^~

This causes the loongson-simd.c and loongson-shift-count-truncated-1.c tests
to fail.

This patch fixes this by wrapping the QImode builtin argument inside a
paradoxical SUBREG with SImode, which will successfully match against the insn
operand.

Tested with mips-mti-elf.

Regards,
Toma

gcc/

* config/mips/mips.c (mips_expand_builtin_insn): Put the QImode
argument of the pshufh, psllh, psllw, psrah, psraw, psrlh, psrlw
builtins into an SImode paradoxical SUBREG.

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index da7fa8f..f1ca6e2 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -16574,6 +16574,20 @@ mips_expand_builtin_insn (enum insn_code icode, 
unsigned int nops,
 
   switch (icode)
 {
+/* The third argument needs to be in SImode in order to successfully match
+   the operand from the insn definition.  */
+case CODE_FOR_loongson_pshufh:
+case CODE_FOR_loongson_psllh:
+case CODE_FOR_loongson_psllw:
+case CODE_FOR_loongson_psrah:
+case CODE_FOR_loongson_psraw:
+case CODE_FOR_loongson_psrlh:
+case CODE_FOR_loongson_psrlw:
+  gcc_assert (has_target_p && nops == 3 && ops[2].mode == QImode);
+  ops[2].value = lowpart_subreg (SImode, ops[2].value, QImode);
+  ops[2].mode = SImode;
+  break;
+
 case CODE_FOR_msa_addvi_b:
 case CODE_FOR_msa_addvi_h:
 case CODE_FOR_msa_addvi_w:



Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-30 Thread Aldy Hernandez

On 01/30/2017 10:03 AM, Richard Biener wrote:

On Fri, Jan 27, 2017 at 12:20 PM, Aldy Hernandez  wrote:

On 01/26/2017 07:29 AM, Richard Biener wrote:


On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez  wrote:


On 01/24/2017 07:23 AM, Richard Biener wrote:




Your initial on-demand approach is fine to catch some of the cases but it
will not handle those for which we'd need post-dominance.

I guess we can incrementally add that.



No complaints from me.

This is my initial on-demand approach, with a few fixes you've commented on
throughout.

As you can see, there is still an #if 0 wrt to using your suggested
conservative handling of memory loads, which I'm not entirely convinced of,
as it pessimizes gcc.dg/loop-unswitch-1.c.  If you feel strongly about it, I
can enable the code again.


It is really required -- fortunately loop-unswitch-1.c is one of the cases where
the post-dom / always-executed bits help.  The comparison is inside the
loop header and thus always executed when the loop enters, so inserting it
on the preheader edge is fine.


Left as is then.




Also, I enhanced gcc.dg/loop-unswitch-1.c to verify that we're actually
unswitching something.  It seems kinda silly that we have various unswitch
tests, but we don't actually check whether we have unswitched anything.


Heh.  It probably was added for an ICE...


This test was the only one in the *unswitch*.c set that I saw was actually
being unswitched.  Of course, if you don't agree with my #if 0 above, I will
remove this change to the test.

Finally, are we even guaranteed to unswitch in loop-unswitch-1.c across
architectures?  If not, again, I can remove the one-liner.


I think so.


Left as well.





How does this look for trunk?


With a unswitch-local solution I meant to not add new files but put the
defined_or_undefined class (well, or rather a single function...) into
tree-ssa-loop-unswitch.c.


Done.



@@ -138,7 +141,7 @@ tree_may_unswitch_on (basic_block bb, struct loop *loop)
 {
   /* Unswitching on undefined values would introduce undefined
 behavior that the original program might never exercise.  */
-  if (ssa_undefined_value_p (use, true))
+  if (defined_or_undefined->is_maybe_undefined (use))
return NULL_TREE;
   def = SSA_NAME_DEF_STMT (use);
   def_bb = gimple_bb (def);

as I said, moving this (now more expensive check) after

  if (def_bb
  && flow_bb_inside_loop_p (loop, def_bb))
return NULL_TREE;

this cheap check would be better.  It should avoid 99% of all calls I bet.


Done.



You can recover the loop-unswitch-1.c case by passing down
the using stmt and checking its BB against loop_header (the only
block that we trivially know is always executed when entering the region).
Or do that check in the caller, like

if (bb != loop->header
   && ssa_undefined_value_p (use, true) /
defined_or_undefined->is_maybe_undefined (use))


Done in callee.



+  gimple *def = SSA_NAME_DEF_STMT (t);
+
+  /* Check that all the PHI args are fully defined.  */
+  if (gphi *phi = dyn_cast <gphi *> (def))
+   {
+ if (virtual_operand_p (PHI_RESULT (phi)))
+   continue;

You should never run into virtual operands (you only walk SSA_OP_USEs).


Done.



You can stop walking at stmts that dominate the region header,
like with

+  gimple *def = SSA_NAME_DEF_STMT (t);
/* Uses in stmts always executed when the region header
executes are fine.  */
if (dominated_by_p (CDI_DOMINATORS, loop_header, gimple_bb (def)))
  continue;


H... doing this causes the PR testcase (gcc.dg/loop-unswitch-5.c in 
the attached patch to FAIL).  I haven't looked at it, but I seem to 
recall in the testcase that we could have a DEF that dominated the loop 
but was a mess of PHI's, some of whose args were undefined.


Did you perhaps want to put that dominated_by_p call after the PHI arg 
checks (which seems to work)?:


  /* Check that all the PHI args are fully defined.  */
  if (gphi *phi = dyn_cast <gphi *> (def))
...
...
...

+  /* Uses in stmts always executed when the region header executes
+are fine.  */
+  if (dominated_by_p (CDI_DOMINATORS, loop->header, gimple_bb (def)))
+   continue;
+
   /* Handle calls and memory loads conservatively.  */
   if (!is_gimple_assign (def)
  || (gimple_assign_single_p (def)

Until this is clear, I've left this dominated_by_p call #if 0'ed out.



and the bail out for PARM_DECLs is wrong:

+  /* A PARM_DECL will not have an SSA_NAME_DEF_STMT.  Parameters
+get their initial value from function entry.  */
+  if (SSA_NAME_VAR (t) && TREE_CODE (SSA_NAME_VAR (t)) == PARM_DECL)
+   return false;

needs to be a continue; rather than a return false.


Done.

Preliminary test show the attached patch works.  Further tests on-going.

Aldy
gcc/

PR tree-optimization/71691
* bitmap.h (class 

Re: [PATCH, ARM] PR71607: New approach to arm_disable_literal_pool

2017-01-30 Thread Andre Vieira (lists)
On 27/01/17 12:13, Ramana Radhakrishnan wrote:
> On Thu, Jan 26, 2017 at 3:56 PM, Andre Vieira (lists)
>  wrote:
>> On 20/01/17 14:08, Ramana Radhakrishnan wrote:
>>> On Wed, Dec 28, 2016 at 9:58 AM, Andre Vieira (lists)
>>>  wrote:
 On 29/11/16 09:45, Andre Vieira (lists) wrote:
> On 17/11/16 10:00, Ramana Radhakrishnan wrote:
>> On Thu, Oct 6, 2016 at 2:57 PM, Andre Vieira (lists)
>>  wrote:
>>> Hello,
>>>
>>> This patch tackles the issue reported in PR71607. This patch takes a
>>> different approach for disabling the creation of literal pools. Instead
>>> of disabling the patterns that would normally transform the rtl into
>>> actual literal pools, it disables the creation of this literal pool rtl
>>> by making the target hook TARGET_CANNOT_FORCE_CONST_MEM return true if
>>> arm_disable_literal_pool is true. I added patterns to split floating
>>> point constants for both SF and DFmode. A pattern to handle the
>>> addressing of label_refs had to be included as well since all
>>> "memory_operand" patterns are disabled when
>>> TARGET_CANNOT_FORCE_CONST_MEM returns true. Also the pattern for
>>> splitting 32-bit immediates had to be changed, it was not accepting
>>> unsigned 32-bit unsigned integers with the MSB set. I believe
>>> const_int_operand expects the mode of the operand to be set to VOIDmode
>>> and not SImode. I have only changed it in the patterns that were
>>> affecting this code, though I suggest looking into changing it in the
>>> rest of the ARM backend.
>>>
>>> I added more test cases. No regressions for arm-none-eabi with
>>> Cortex-M0, Cortex-M3 and Cortex-M7.
>>>
>>> Is this OK for trunk?
>>
>> Including -mslow-flash-data in your multilib flags ? If no regressions
>> with that ok .
>>
>>
>> regards
>> Ramana
>>
>>>
>
> Hello,
>
> I found some new ICE's with the -mslow-flash-data testing so I had to
> rework this patch. I took the opportunity to rebase it as well.
>
> The problem was with the way the old version of the patch handled label
> references.  After some digging I found I wasn't using the right target
> hook and so I implemented the 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' for
> ARM.  This target hook determines whether a literal pool ends up in an
> 'object_block' structure. So I reverted the changes made in the old
> version of the patch to the ARM implementation of the
> 'TARGET_CANNOT_FORCE_CONST_MEM' hook and rely on
> 'TARGET_USE_BLOCKS_FOR_CONSTANT_P' instead. This patch adds an ARM
> implementation for this hook that returns false if
> 'arm_disable_literal_pool' is set to true and true otherwise.
>
> This version of the patch also reverts back to using the check for
> 'SYMBOL_REF' in 'thumb2_legitimate_address_p' that was removed in the
> last version, this code is required to place the label references in
> rodata sections.
>
> Another thing this patch does is revert the changes made to the 32-bit
> constant split in arm.md. The reason this was needed before was because
> 'real_to_target' returns a long array and does not sign-extend values in
> it, which would make sense on hosts with 64-bit longs. To fix this the
> value is now casted to 'int' first.  It would probably be a good idea to
> change the 'real_to_target' function to return an array with
> 'HOST_WIDE_INT' elements instead and either use all 64-bits or
> sign-extend them.  Something for the future?
>
> I added more test cases in this patch and reran regression tests for:
> Cortex-M0, Cortex-M4 with and without -mslow-flash-data. Also did a
> bootstrap+regressions on arm-none-linux-gnueabihf.
>
> Is this OK for trunk?
>
> Cheers,
> Andre
>
> gcc/ChangeLog:
>
> 2016-11-29  Andre Vieira  
>
> PR target/71607
> * config/arm/arm.md (use_literal_pool): Removes.
> (64-bit immediate split): No longer takes cost into consideration
> if 'arm_disable_literal_pool' is enabled.
> * config/arm/arm.c (arm_use_blocks_for_constant_p): New.
> (TARGET_USE_BLOCKS_FOR_CONSTANT_P): Define.
> (arm_max_const_double_inline_cost): Remove use of
> arm_disable_literal_pool.
> * config/arm/vfp.md (no_literal_pool_df_immediate): New.
> (no_literal_pool_sf_immediate): New.
>
>
> gcc/testsuite/ChangeLog:
>
> 2016-11-29  Andre Vieira  
> Thomas Preud'homme  
>
> PR target/71607
> * gcc.target/arm/thumb2-slow-flash-data.c: Renamed to ...
> * gcc.target/arm/thumb2-slow-flash-data-1.c: ... this.
> * 

Re: [PATCH/VECT/AARCH64] Improve cost model for ThunderX2 CN99xx

2017-01-30 Thread Andrew Pinski
On Mon, Jan 30, 2017 at 8:55 AM, Richard Earnshaw (lists)
 wrote:
> On 28/01/17 20:34, Andrew Pinski wrote:
>> Hi,
>>   On some (most) AARCH64 cores, it is not always profitable to
>> vectorize some integer loops.  This patch does two things (I can split
>> it into different patches if needed).
>> 1) It splits the aarch64 back-end's vector cost model's vector and
>> scalar costs into int and fp fields
>> 1a) For thunderx2t99, models correctly the integer vector/scalar costs.
>> 2) Fixes/Improves a few calls to record_stmt_cost in tree-vect-loop.c
>> where stmt_info was not being passed.
>>
>> OK?  Bootstrapped and tested on aarch64-linux-gnu and provides 20% on
>> libquantum and ~1% overall on SPEC CPU 2006 int.
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Pass
>> stmt_info to record_stmt_cost.
>> (vect_get_known_peeling_cost): Pass stmt_info if known to record_stmt_cost.
>>
>> * config/aarch64/aarch64-protos.h (cpu_vector_cost): Split
>> scalar_stmt_cost field into
>> scalar_int_stmt_cost and scalar_fp_stmt_cost.  Split vec_stmt_cost
>> field into vec_int_stmt_cost and vec_fp_stmt_cost.
>> * config/aarch64/aarch64.c (generic_vector_cost): Update for the
>> splitting of scalar_stmt_cost and vec_stmt_cost.
>> (thunderx_vector_cost): Likewise.
>> (cortexa57_vector_cost): Likewise.
>> (exynosm1_vector_cost): Likewise.
>> (xgene1_vector_cost): Likewise.
>> (thunderx2t99_vector_cost): Improve after the splitting of the two fields.
>> (aarch64_builtin_vectorization_cost): Update for the splitting of
>> scalar_stmt_cost and vec_stmt_cost.
>>
>>
>> improve-vect-cost.diff.txt
>>
>>
>> Index: config/aarch64/aarch64-protos.h
>> ===
>> --- config/aarch64/aarch64-protos.h   (revision 245002)
>> +++ config/aarch64/aarch64-protos.h   (working copy)
>> @@ -151,11 +151,17 @@ struct cpu_regmove_cost
>>  /* Cost for vector insn classes.  */
>>  struct cpu_vector_cost
>>  {
>> -  const int scalar_stmt_cost; /* Cost of any scalar 
>> operation,
>> +  const int scalar_int_stmt_cost; /* Cost of any int scalar operation,
>> + excluding load and store.  */
>> +  const int scalar_fp_stmt_cost;  /* Cost of any fp scalar operation,
>>   excluding load and store.  */
>>const int scalar_load_cost; /* Cost of scalar load.  */
>>const int scalar_store_cost;/* Cost of scalar store.  */
>> -  const int vec_stmt_cost;/* Cost of any vector operation,
>> +  const int vec_int_stmt_cost;/* Cost of any int vector 
>> operation,
>> + excluding load, store, permute,
>> + vector-to-scalar and
>> + scalar-to-vector operation.  */
>> +  const int vec_fp_stmt_cost; /* Cost of any fp vector 
>> operation,
>>   excluding load, store, permute,
>>   vector-to-scalar and
>>   scalar-to-vector operation.  */
>> Index: config/aarch64/aarch64.c
>> ===
>> --- config/aarch64/aarch64.c  (revision 245002)
>> +++ config/aarch64/aarch64.c  (working copy)
>> @@ -365,10 +365,12 @@ static const struct cpu_regmove_cost thu
>>  /* Generic costs for vector insn classes.  */
>>  static const struct cpu_vector_cost generic_vector_cost =
>>  {
>> -  1, /* scalar_stmt_cost  */
>> +  1, /* scalar_int_stmt_cost  */
>> +  1, /* scalar_fp_stmt_cost  */
>>1, /* scalar_load_cost  */
>>1, /* scalar_store_cost  */
>> -  1, /* vec_stmt_cost  */
>> +  1, /* vec_int_stmt_cost  */
>> +  1, /* vec_fp_stmt_cost  */
>>2, /* vec_permute_cost  */
>>1, /* vec_to_scalar_cost  */
>>1, /* scalar_to_vec_cost  */
>> @@ -383,10 +385,12 @@ static const struct cpu_vector_cost gene
>>  /* ThunderX costs for vector insn classes.  */
>>  static const struct cpu_vector_cost thunderx_vector_cost =
>>  {
>> -  1, /* scalar_stmt_cost  */
>> +  1, /* scalar_int_stmt_cost  */
>> +  1, /* scalar_fp_stmt_cost  */
>>3, /* scalar_load_cost  */
>>1, /* scalar_store_cost  */
>> -  4, /* vec_stmt_cost  */
>> +  4, /* vec_int_stmt_cost  */
>> +  4, /* vec_fp_stmt_cost  */
>>4, /* vec_permute_cost  */
>>2, /* vec_to_scalar_cost  */
>>2, /* scalar_to_vec_cost  */
>> @@ -401,10 +405,12 @@ static const struct cpu_vector_cost thun
>>  /* Generic costs for vector insn classes.  */
>>  static const struct cpu_vector_cost cortexa57_vector_cost =
>>  {
>> -  1, /* scalar_stmt_cost  */
>> +  1, /* scalar_int_stmt_cost  */
>> +  1, /* scalar_fp_stmt_cost  */
>>4, /* scalar_load_cost  */
>>1, /* scalar_store_cost  */
>> -  2, /* vec_stmt_cost 
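As an aside for readers of the archive: the effect of the split can be shown with a tiny stand-alone C sketch. The field names mirror the patch, but the cost numbers and the stmt_cost helper are hypothetical, not copied from aarch64.c:

```c
#include <stdbool.h>

/* Hypothetical sketch of the int/fp cost split; not GCC source.  */
struct vector_cost
{
  int scalar_int_stmt_cost;
  int scalar_fp_stmt_cost;
  int vec_int_stmt_cost;
  int vec_fp_stmt_cost;
};

/* Made-up numbers for illustration only.  */
static const struct vector_cost example_cost = { 1, 6, 4, 5 };

/* With the split, the cost lookup can distinguish int from fp work.  */
static int
stmt_cost (const struct vector_cost *c, bool vectorized, bool is_fp)
{
  if (vectorized)
    return is_fp ? c->vec_fp_stmt_cost : c->vec_int_stmt_cost;
  return is_fp ? c->scalar_fp_stmt_cost : c->scalar_int_stmt_cost;
}
```

Before the patch there was a single scalar_stmt_cost/vec_stmt_cost pair, so a core could not make integer vectorization look less profitable than fp vectorization.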

Re: [PATCH/VECT/AARCH64] Improve cost model for ThunderX2 CN99xx

2017-01-30 Thread Richard Earnshaw (lists)
On 28/01/17 20:34, Andrew Pinski wrote:
> Hi,
>   On some (most) AARCH64 cores, it is not always profitable to
> vectorize some integer loops.  This patch does two things (I can split
> it into different patches if needed).
> 1) It splits the aarch64 back-end's vector cost model's vector and
> scalar costs into int and fp fields
> 1a) For thunderx2t99, correctly models the integer vector/scalar costs.
> 2) Fixes/Improves a few calls to record_stmt_cost in tree-vect-loop.c
> where stmt_info was not being passed.
> 
> OK?  Bootstrapped and tested on aarch64-linux-gnu; provides a 20% improvement
> on libquantum and ~1% overall on SPEC CPU 2006 int.
> 
> Thanks,
> Andrew Pinski
> 
> ChangeLog:
> * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Pass
> stmt_info to record_stmt_cost.
> (vect_get_known_peeling_cost): Pass stmt_info if known to record_stmt_cost.
> 
> * config/aarch64/aarch64-protos.h (cpu_vector_cost): Split
> scalar_stmt_cost field into
> scalar_int_stmt_cost and scalar_fp_stmt_cost.  Split vec_stmt_cost
> field into vec_int_stmt_cost and vec_fp_stmt_cost.
> * config/aarch64/aarch64.c (generic_vector_cost): Update for the
> splitting of scalar_stmt_cost and vec_stmt_cost.
> (thunderx_vector_cost): Likewise.
> (cortexa57_vector_cost): Likewise.
> (exynosm1_vector_cost): Likewise.
> (xgene1_vector_cost): Likewise.
> (thunderx2t99_vector_cost): Improve after the splitting of the two fields.
> (aarch64_builtin_vectorization_cost): Update for the splitting of
> scalar_stmt_cost and vec_stmt_cost.
> 
> 
> improve-vect-cost.diff.txt
> 
> 
> Index: config/aarch64/aarch64-protos.h
> ===
> --- config/aarch64/aarch64-protos.h   (revision 245002)
> +++ config/aarch64/aarch64-protos.h   (working copy)
> @@ -151,11 +151,17 @@ struct cpu_regmove_cost
>  /* Cost for vector insn classes.  */
>  struct cpu_vector_cost
>  {
> -  const int scalar_stmt_cost; /* Cost of any scalar 
> operation,
> +  const int scalar_int_stmt_cost; /* Cost of any int scalar operation,
> + excluding load and store.  */
> +  const int scalar_fp_stmt_cost;  /* Cost of any fp scalar operation,
>   excluding load and store.  */
>const int scalar_load_cost; /* Cost of scalar load.  */
>const int scalar_store_cost;/* Cost of scalar store.  */
> -  const int vec_stmt_cost;/* Cost of any vector operation,
> +  const int vec_int_stmt_cost;/* Cost of any int vector 
> operation,
> + excluding load, store, permute,
> + vector-to-scalar and
> + scalar-to-vector operation.  */
> +  const int vec_fp_stmt_cost; /* Cost of any fp vector 
> operation,
>   excluding load, store, permute,
>   vector-to-scalar and
>   scalar-to-vector operation.  */
> Index: config/aarch64/aarch64.c
> ===
> --- config/aarch64/aarch64.c  (revision 245002)
> +++ config/aarch64/aarch64.c  (working copy)
> @@ -365,10 +365,12 @@ static const struct cpu_regmove_cost thu
>  /* Generic costs for vector insn classes.  */
>  static const struct cpu_vector_cost generic_vector_cost =
>  {
> -  1, /* scalar_stmt_cost  */
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
>1, /* scalar_load_cost  */
>1, /* scalar_store_cost  */
> -  1, /* vec_stmt_cost  */
> +  1, /* vec_int_stmt_cost  */
> +  1, /* vec_fp_stmt_cost  */
>2, /* vec_permute_cost  */
>1, /* vec_to_scalar_cost  */
>1, /* scalar_to_vec_cost  */
> @@ -383,10 +385,12 @@ static const struct cpu_vector_cost gene
>  /* ThunderX costs for vector insn classes.  */
>  static const struct cpu_vector_cost thunderx_vector_cost =
>  {
> -  1, /* scalar_stmt_cost  */
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
>3, /* scalar_load_cost  */
>1, /* scalar_store_cost  */
> -  4, /* vec_stmt_cost  */
> +  4, /* vec_int_stmt_cost  */
> +  4, /* vec_fp_stmt_cost  */
>4, /* vec_permute_cost  */
>2, /* vec_to_scalar_cost  */
>2, /* scalar_to_vec_cost  */
> @@ -401,10 +405,12 @@ static const struct cpu_vector_cost thun
>  /* Generic costs for vector insn classes.  */
>  static const struct cpu_vector_cost cortexa57_vector_cost =
>  {
> -  1, /* scalar_stmt_cost  */
> +  1, /* scalar_int_stmt_cost  */
> +  1, /* scalar_fp_stmt_cost  */
>4, /* scalar_load_cost  */
>1, /* scalar_store_cost  */
> -  2, /* vec_stmt_cost  */
> +  2, /* vec_int_stmt_cost  */
> +  2, /* vec_fp_stmt_cost  */
>3, /* vec_permute_cost  */
>8, /* vec_to_scalar_cost  */
>8, /* scalar_to_vec_cost  */
> @@ -418,10 +424,12 @@ 

Re: [PATCH 0/6] Improve -fprefetch-loop-arrays in general and for AArch64 in particular

2017-01-30 Thread Andrew Pinski
On Mon, Jan 30, 2017 at 3:24 AM, Maxim Kuvyrkov
 wrote:
> This patch series improves the -fprefetch-loop-arrays pass through small fixes 
> and tweaks, and then enables it for several AArch64 cores.
>
> My tunings were done on and for Qualcomm hardware, with results varying 
> between +0.5-1.9% for SPEC2006 INT and +0.25%-1.0% for SPEC2006 FP at -O3, 
> depending on hardware revision.
>
> This patch series enables restricted -fprefetch-loop-arrays at -O2, which 
> also improves SPEC2006 numbers.
>
> Biggest progressions are on 419.mcf and 437.leslie3d, with no serious 
> regressions on other benchmarks.
>
> I'm now investigating making -fprefetch-loop-arrays more aggressive for 
> Qualcomm hardware, which improves performance on most benchmarks, but also 
> causes big regressions on 454.calculix and 462.libquantum.  If I can fix 
> these two regressions, prefetching will give another boost to AArch64.

I have a patch which already makes prefetching more aggressive; it improves
libquantum for CN88xx.  I have not submitted it yet as I had just
restarted upstreaming my patch sets.

Thanks,
Andrew

>
> Andrew just posted similar prefetching tunings for Cavium's cores, and the 
> two patches have trivial conflicts.  I'll post mine as-is, since it addresses 
> one of the comments on Andrew's review (adding a stand-alone struct for 
> tuning parameters).
>
> Andrew, feel free to just copy-paste it to your patch, since it is just a 
> mechanical change.
>
> All patches were bootstrapped and regtested on x86_64-linux-gnu and 
> aarch64-linux-gnu.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>
>


Re: [PATCH 4/6] Port prefetch configuration from aarch32 to aarch64

2017-01-30 Thread Andrew Pinski
On Mon, Jan 30, 2017 at 3:48 AM, Maxim Kuvyrkov
 wrote:
> This patch ports the prefetch configuration from the aarch32 back end to aarch64.  
> There is no code-generation change from this patch.
>
> This patch also happens to address Kyrill's comment on Andrew's prefetching 
> patch at https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02133.html .
>
> This patch also fixes a minor bug in aarch64_override_options_internal(), 
> which used "selected_cpu->tune" instead of "aarch64_tune_params".

I am not a fan of the macro at all.

Thanks,
Andrew


>
> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>


Re: [PATCH/VECT/AARCH64] Improve cost model for ThunderX2 CN99xx

2017-01-30 Thread Andrew Pinski
On Mon, Jan 30, 2017 at 2:01 AM, Richard Biener
 wrote:
> On Sat, Jan 28, 2017 at 9:34 PM, Andrew Pinski  wrote:
>> Hi,
>>   On some (most) AARCH64 cores, it is not always profitable to
>> vectorize some integer loops.  This patch does two things (I can split
>> it into different patches if needed).
>> 1) It splits the aarch64 back-end's vector cost model's vector and
>> scalar costs into int and fp fields
>> 1a) For thunderx2t99, correctly models the integer vector/scalar costs.
>> 2) Fixes/Improves a few calls to record_stmt_cost in tree-vect-loop.c
>> where stmt_info was not being passed.
>>
>> OK?  Bootstrapped and tested on aarch64-linux-gnu; provides a 20% improvement
>> on libquantum and ~1% overall on SPEC CPU 2006 int.
>
> +   {
> + struct _stmt_vec_info *stmt_info
> +   = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
>
> use stmt_vec_info instead of 'struct _stmt_vec_info *'.

Understood.  I was just copying from earlier code in the file and did
not notice I should have converted it to C++ style :).

>
> The vectorizer changes are ok with that change.

Thanks,
Andrew


>
> Thanks,
> Richard.
>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Pass
>> stmt_info to record_stmt_cost.
>> (vect_get_known_peeling_cost): Pass stmt_info if known to record_stmt_cost.
>>
>> * config/aarch64/aarch64-protos.h (cpu_vector_cost): Split
>> scalar_stmt_cost field into
>> scalar_int_stmt_cost and scalar_fp_stmt_cost.  Split vec_stmt_cost
>> field into vec_int_stmt_cost and vec_fp_stmt_cost.
>> * config/aarch64/aarch64.c (generic_vector_cost): Update for the
>> splitting of scalar_stmt_cost and vec_stmt_cost.
>> (thunderx_vector_cost): Likewise.
>> (cortexa57_vector_cost): Likewise.
>> (exynosm1_vector_cost): Likewise.
>> (xgene1_vector_cost): Likewise.
>> (thunderx2t99_vector_cost): Improve after the splitting of the two fields.
>> (aarch64_builtin_vectorization_cost): Update for the splitting of
>> scalar_stmt_cost and vec_stmt_cost.


Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-01-30 Thread Andrew Pinski
On Mon, Jan 30, 2017 at 6:49 AM, Maxim Kuvyrkov
 wrote:
>> On Jan 27, 2017, at 6:59 PM, Andrew Pinski  wrote:
>>
>> On Fri, Jan 27, 2017 at 4:11 AM, Richard Biener
>>  wrote:
>>> On Fri, Jan 27, 2017 at 1:10 PM, Richard Biener
>>>  wrote:
 On Thu, Jan 26, 2017 at 9:56 PM, Andrew Pinski  wrote:
> Hi,
>  This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
> thunderx and thunderx2t99.  No other core currently enables software
> prefetching, so I set the fields to 0, which does not change the default
> parameters.
>
> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
> CN99xx's SPEC did not change.

 Heh, quite impressive for this kind of bit-rotten (and broken?) pass ;)
>>>
>>> And I wonder if most benefit comes from the unrolling the pass might do
>>> rather than from the prefetches...
>>
>> Not in this case.  The main reason why I know is because the number of
>> L1 and L2 misses drops a lot.
>
> I can confirm this.  In my experiments loop unrolling hurts several tests.

Not on the cores I tried.  I tested it on both ThunderX CN88xx and
ThunderX CN99xx and did not get any regressions due to unrolling.

Thanks,
Andrew

>
> The prefetching approach I'm testing for -O2 includes disabling of loop 
> unrolling to prevent code bloat.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>
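For context, here is a rough stand-alone C sketch of what -fprefetch-loop-arrays does to a loop. PF_DIST is a made-up distance; the real pass derives it from parameters such as prefetch_latency and the loop's memory stride:

```c
#include <stddef.h>

#define PF_DIST 64  /* hypothetical prefetch distance, in elements */

long
sum (const long *a, size_t n)
{
  long s = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* Hint the hardware to start fetching a future element (read access,
	 low temporal locality).  The guard keeps the address in bounds.  */
      if (i + PF_DIST < n)
	__builtin_prefetch (&a[i + PF_DIST], 0, 0);
      s += a[i];
    }
  return s;
}
```

The pass emits the equivalent of the __builtin_prefetch call automatically; whether that pays off depends on the core's prefetch latency and cache sizes, which is what the tuning fields discussed above feed into.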


Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-01-30 Thread Andrew Pinski
On Mon, Jan 30, 2017 at 4:14 AM, Maxim Kuvyrkov
 wrote:
>> On Jan 27, 2017, at 1:54 PM, Kyrill Tkachov  
>> wrote:
>>
>>
>> On 26/01/17 20:56, Andrew Pinski wrote:
>>> Hi,
>>>   This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>>> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
>>> thunderx and thunderx2t99.  No other core currently enables software
>>> prefetching, so I set the fields to 0, which does not change the default
>>> parameters.
>>>
>>> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>>> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
>>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>>> CN99xx's SPEC did not change.
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>> ChangeLog:
>>> * config/aarch64/aarch64-protos.h (struct tune_params): Add
>>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>>> l2_cache_size fields.
>>> (enum aarch64_autoprefetch_model): Add AUTOPREFETCHER_SW.
>>> * config/aarch64/aarch64.c (generic_tunings): Update to set the new
>>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>>> l2_cache_size fields to 0.
>>> (cortexa35_tunings): Likewise.
>>> (cortexa53_tunings): Likewise.
>>> (cortexa57_tunings): Likewise.
>>> (cortexa72_tunings): Likewise.
>>> (cortexa73_tunings): Likewise.
>>> (exynosm1_tunings): Likewise.
>>> (thunderx_tunings): Fill out some of the new fields.
>>> (thunderxt88_tunings): New variable.
>>> (xgene1_tunings): Update to set prefetch_latency,
>>> simultaneous_prefetches, l1_cache_size, and l2_cache_size fields to 0.
>>> (qdf24xx_tunings): Likewise.
>>> (thunderx2t99_tunings): Fill out some of the new fields.
>>> (aarch64_override_options_internal): Consider AUTOPREFETCHER_SW like
>>> AUTOPREFETCHER_OFF.
>>> Set param values if the fields are non-zero.  Turn on
>>> prefetch-loop-arrays if AUTOPREFETCHER_SW and optimize level is at
>>> least 3 or profile feedback usage is enabled.
>>> * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tuning.
>>> (thunderxt88): Likewise.
>>
>> --- config/aarch64/aarch64-protos.h   (revision 244917)
>> +++ config/aarch64/aarch64-protos.h   (working copy)
>> @@ -220,10 +220,19 @@ struct tune_params
>>   unsigned int max_case_values;
>>   /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default.  */
>>   unsigned int cache_line_size;
>> +  /* Value for PARAM_PREFETCH_LATENCY; or 0 to use the default.  */
>> +  unsigned int prefetch_latency;
>> +  /* Value for PARAM_SIMULTANEOUS_PREFETCHES; or 0 to use the default.  */
>> +  unsigned int simultaneous_prefetches;
>> +  /* Value for PARAM_L1_CACHE_SIZE; or 0 to use the default.  */
>> +  unsigned int l1_cache_size;
>> +  /* Value for PARAM_L2_CACHE_SIZE; or 0 to use the default.  */
>> +  unsigned int l2_cache_size;
>>
>> Not a blocker to the patch but I wonder whether it would be a good idea to 
>> group these prefetch-related parameters
>> (plus autoprefetcher_model) into a new nested struct here (prefetch_tunings 
>> or something) since there's a decent
>> number of them and they're all related.
>
> Feel free to copy-paste from 
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02292.html  , which is a 
> copy-paste from aarch32 backend anyway ;-).

I am not a fan of this macro ...

>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>


Re: [PATCH][PR target/79170] fix memcmp builtin expansion sequence for rs6000 target.

2017-01-30 Thread Peter Bergner

On 1/27/17 5:43 PM, Segher Boessenkool wrote:

On Fri, Jan 27, 2017 at 12:11:05PM -0600, Aaron Sawdey wrote:

+addi 9,4,7
+lwbrx 10,0,9
+addi 9,5,7
+lwbrx 9,0,9


It would be nice if this was

li 9,7
lwbrx 10,9,4
lwbrx 9,9,5


Nicer still, we want the base address as the RA operand
and the offset as the RB operand, like so:

li 9,7
lwbrx 10,4,9
lwbrx 9,5,9

On some processors, it matters performance-wise.

Peter
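The byte-reversed loads matter because memcmp must compare lexicographically, i.e. as if each word were a big-endian unsigned integer, which is the ordering lwbrx delivers for free on a little-endian target. A stand-alone C sketch of that idea (illustrative only, not the rs6000 expansion; load_be32 and memcmp4 are made-up names):

```c
/* Assemble 4 bytes in lexicographic (big-endian) order -- the ordering
   a byte-reversed load such as lwbrx provides on a little-endian target.  */
static unsigned int
load_be32 (const unsigned char *p)
{
  return ((unsigned int) p[0] << 24) | ((unsigned int) p[1] << 16)
	 | ((unsigned int) p[2] << 8) | (unsigned int) p[3];
}

/* Compare one 4-byte chunk the way memcmp would.  */
static int
memcmp4 (const void *a, const void *b)
{
  unsigned int wa = load_be32 (a);
  unsigned int wb = load_be32 (b);
  if (wa == wb)
    return 0;
  return wa < wb ? -1 : 1;
}
```

A single unsigned word compare thus replaces up to four byte compares, which is why shaving instructions off the address setup for the lwbrx pair is worth discussing.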



Re: [gomp4] optimize GOMP_MAP_TO_PSET

2017-01-30 Thread Thomas Schwinge
Hi Cesar!

On Mon, 30 Jan 2017 07:19:27 -0800, Cesar Philippidis  
wrote:
> On 01/30/2017 02:26 AM, Thomas Schwinge wrote:
> > On Fri, 27 Jan 2017 08:06:22 -0800, Cesar Philippidis 
> >  wrote:
> > PASS: libgomp.fortran/examples-4/async_target-2.f90   -O0  (test for 
> > excess errors)
> > [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O0  
> > execution test

> > PASS: libgomp.fortran/target3.f90   -O0  (test for excess errors)
> > [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O0  execution test

> > In all cases, the run-time error message is:
> > 
> > libgomp: Pointer target of array section wasn't mapped
> 
> I'm not seeing any of these failures on power8 and x86_64 with multilibs
> disabled. How did you configure gcc? And those are OpenMP tests. Are you
> testing OpenMP target offloading, if so how?

Well,
"--enable-offload-targets=nvptx-none="$T"/install/offload-nvptx-none,x86_64-intelmicemul-linux-gnu="$T"/install/offload-x86_64-intelmicemul-linux-gnu,hsa"
as done in my build scripts (trunk-offload-big.tar.bz2,
trunk-offload-light.tar.bz2) as posted on
.
(The versions uploaded there have not yet been updated to enable HSA
offloading, but that's not relevant in this discussion.)


Grüße
 Thomas


Re: [PATCH] Fix profile corruption with -O1 (PR gcov-profile/79259)

2017-01-30 Thread Martin Liška
On 01/30/2017 04:12 PM, Richard Biener wrote:
> On Mon, Jan 30, 2017 at 4:09 PM, Martin Liška  wrote:
>> Hello.
>>
>> During the investigation of another issue, I accidentally came across the
>> profile inconsistency mentioned in the PR. The problem is that flag_ipa_bit_cp
>> is enabled in the use stage of PGO and not in the instrumentation stage. That
>> causes ccp1 to find nonzero bits, which leads to CFG changes as a condition
>> can be folded away.
>>
>> The solution is to enable the same flag in the generate phase. However, I've
>> got one more question:
>> In -fprofile-generate we have 2 more functions available in symtab:
>>
>> _GLOBAL__sub_I_00100_0_c ()
>> {
>>[0.00%]:
>>   __gcov_init (&*.LPBX0);
>>   return;
>>
>> }
>>
>> _GLOBAL__sub_D_00100_1_c ()
>> {
>>[0.00%]:
>>   __gcov_exit ();
>>   return;
>>
>> }
>>
>> I'm wondering whether it can potentially influence early inlining decisions?
> 
> I guess yes...

Looks like it should be fine, as einline runs before the IPA passes where we
create the calls to __gcov_init and __gcov_exit. Thus the original functions
do not see the global {cd}tor.

#0  build_init_ctor (gcov_info_type=0x76a1a000) at ../../gcc/coverage.c:1045
#1  0x00a1086d in coverage_obj_init () at ../../gcc/coverage.c:1144
#2  0x00a10df7 in coverage_finish () at ../../gcc/coverage.c:1272
#3  0x009ff53a in ipa_passes () at ../../gcc/cgraphunit.c:2338
#4  symbol_table::compile (this=this@entry=0x7688a100) at 
../../gcc/cgraphunit.c:2460
#5  0x00a018b5 in symbol_table::compile (this=0x7688a100) at 
../../gcc/cgraphunit.c:2593
#6  symbol_table::finalize_compilation_unit (this=0x7688a100) at 
../../gcc/cgraphunit.c:2619
#7  0x00dfc5fa in compile_file () at ../../gcc/toplev.c:488
#8  0x00685d37 in do_compile () at ../../gcc/toplev.c:1983
#9  toplev::main (this=this@entry=0x7fffd8e0, argc=, 
argc@entry=21, argv=, argv@entry=0x7fffd9e8) at 
../../gcc/toplev.c:2117
#10 0x00688067 in main (argc=21, argv=0x7fffd9e8) at 
../../gcc/main.c:39

Martin

> 
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
> 
> Ok.
> 
> Thanks,
> Richard.
> 
>> Martin



Re: [gomp4] partially enable GOMP_MAP_FIRSTPRIVATE_POINTER in gfortran

2017-01-30 Thread Thomas Schwinge
Hi Cesar!  (It's me, again!)  ;-)

On Fri, 27 Jan 2017 09:13:06 -0800, Cesar Philippidis  
wrote:
> This patch partially enables GOMP_MAP_FIRSTPRIVATE_POINTER in gfortran.
> gfortran still falls back to GOMP_MAP_POINTER for arrays with
> descriptors and derived types. The limitation on derived types is there
> because we don't have much test coverage for it, and this patch series
> was more exploratory for performance enhancements.

While you still freshly remember it, please file an issue so that
we'll take care of that later.

> With that in mind,
> there are a couple of shortcomings with this patch.
> 
>  1) Dummy reduction variables fall back to GOMP_MAP_POINTER because of a
> pointer dereferencing bug.

Please also file an issue for that.


> The state of debugging such problems on
> PTX targets leaves something to be desired, especially since print
> isn't working on nvptx targets currently.

If the following is what you mean, then that's working for me:

$ cat < ../printf.c
int main(int argc, char *argv[])
{
#pragma acc parallel copyin(argv[0][0:__builtin_strlen(argv[0]) + 1])
  {
__builtin_printf("Offloaded from %s.\n", argv[0]);
  }

  return 0;
}
$ build-gcc/gcc/xgcc [...] -Wall -Wextra -g ../printf.c -fopenacc -O2
$ GOMP_DEBUG=1 ./a.out
[...]
  nvptx_exec: kernel main$_omp_fn$0: launch gangs=1, workers=1, vectors=32
Offloaded from ./a.out.
  nvptx_exec: kernel main$_omp_fn$0: finished
GOMP_offload_unregister_ver (1, 0x400c20, 5, 0x401560)
GOMP_offload_unregister_ver (0, 0x400c20, 6, 0x602050)

Again, please file an issue as appropriate.  ;-)


>  2) Apparently, firstprivate pointers negatively affect the alias
> analysis used by ACC KERNELS and parloops, so a couple more
> execution tests fail to generate offloaded code.
> 
> I plan to resolve issue 1) in a follow up patch later on (but maybe not
> in the immediate future). Regarding 2), ACC KERNELS are eventually going
> to need a significant rework, but that's not going to happen in the near
> future either. I've been pushing to get the performance of ACC PARALLEL
> regions on par with other OpenACC compilers first, and hopefully that
> won't be too far away.

Hmm, hmm.


> With this patch, I'm observing an approximate 0.6s reduction in
> CloverLeaf's original 0.9s execution time (it takes approximate 0.9s
> after the GOMP_MAP_FIRSTPRIVATE_INT and GOMP_MAP_TO_PSET patches), to
> yield a final execution time somewhere in the neighborhood of 0.3s.
> That's about a one second savings from the unpatched version of GCC.

Yay!  \o/


> This patch has been committed to gomp-4_0-branch.

(Not reviewed in detail.)

> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -2005,9 +2005,12 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, 
> gfc_omp_clauses *clauses,
>   (TREE_TYPE (TREE_TYPE (decl)
>   {
> tree orig_decl = decl;
> +   enum gomp_map_kind gmk = GOMP_MAP_FIRSTPRIVATE_POINTER;
> +   if (n->u.map_op == OMP_MAP_FORCE_DEVICEPTR)
> + gmk = GOMP_MAP_POINTER;

Curious, why is "deviceptr" different?

> node4 = build_omp_clause (input_location,
>   OMP_CLAUSE_MAP);
> -   OMP_CLAUSE_SET_MAP_KIND (node4, GOMP_MAP_POINTER);
> +   OMP_CLAUSE_SET_MAP_KIND (node4, gmk);
> OMP_CLAUSE_DECL (node4) = decl;
> OMP_CLAUSE_SIZE (node4) = size_int (0);
> decl = build_fold_indirect_ref (decl);

> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c

> @@ -6605,11 +6636,10 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
> *pre_p,
>ctx = new_omp_context (region_type);
>ctx->clauses = *list_p;
>outer_ctx = ctx->outer_context;
> -  if (code == OMP_TARGET && !lang_GNU_Fortran ())
> +  if (code == OMP_TARGET && !(lang_GNU_Fortran () && !(region_type & 
> ORT_ACC)))
>  {
> -  ctx->target_map_pointers_as_0len_arrays = true;
> -  /* FIXME: For Fortran we want to set this too, when
> -  the Fortran FE is updated to OpenMP 4.5.  */
> +  if (!lang_GNU_Fortran () || region_type & ORT_ACC)
> + ctx->target_map_pointers_as_0len_arrays = true;
>ctx->target_map_scalars_firstprivate = true;
>  }

I guess the Fortran OpenMP comment should stay?  And isn't that logic a
bit complicated?  Couldn't we simplify it as follows, unless I'm confused?

--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -6636,10 +6636,11 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq 
*pre_p,
   ctx = new_omp_context (region_type);
   ctx->clauses = *list_p;
   outer_ctx = ctx->outer_context;
-  if (code == OMP_TARGET && !(lang_GNU_Fortran () && !(region_type & ORT_ACC)))
+  /* FIXME: For Fortran OpenMP we want to set this too, when
+ the Fortran 

Re: [PATCH v3][PR lto/79061] Fix LTO plus ASAN fails with "AddressSanitizer: initialization-order-fiasco".

2017-01-30 Thread Jakub Jelinek
On Mon, Jan 30, 2017 at 04:14:40PM +0100, Richard Biener wrote:
> > as was figured out in PR, using DECL_NAME (TRANSLATION_UNIT_DECL) does not
> > always give us a correct module name in LTO mode because e.g. DECL_CONTEXT 
> > of
> > some variables can be NAMESPACE_DECL and LTO merges NAMESPACE_DECLs.
> 
> Yes, it indeed does.  Note that in GCC 8+ both TU decls and NAMESPACE_DECLs
> should no longer be necessary and will eventually vanish completely...
> (in lto1, that is).  Can we code-gen the init order stuff early before
> LTO write-out?

The problem is that at least right now the handling of the init-order is
done in 2 separate places.
The C++ FE emits the special libasan calls with the name of the TU at the
beginning and end of the static initialization.
And then the variables need to be registered with the libasan runtime from
an even earlier constructor, and this is something that is done very late
(where we collect all the variables).
So the options for LTO are:
1) somewhere preserve (at least for dynamically_initialized vars) the TU
it came originally from, if some dynamically_initialized var is owned by
more TUs, just drop that flag
2) rewrite in LTO the dynamic_init libasan calls (register with the whole
LTO partition name rather than individual original TUs); this has the major
disadvantage that it will not diagnose initialization order bugs between
vars from TUs from the same LTO partition
3) create the table of global vars for dynamically_initialized vars early
(save the artificial array with the var addresses + ctor into LTO bytecode),
at least for LTO, and then just register the non-dynamically_initialized
vars later (not a really good idea for non-LTO, where we want to register all
the vars from the whole TU together).

1) looks easiest to me, but it can grow the varpool_node struct by a pointer size.

Jakub


Re: [gomp4] optimize GOMP_MAP_TO_PSET

2017-01-30 Thread Cesar Philippidis
On 01/30/2017 02:26 AM, Thomas Schwinge wrote:
> On Fri, 27 Jan 2017 08:06:22 -0800, Cesar Philippidis 
>  wrote:

>> This is probably because CloverLeaf makes use
>> of ACC DATA regions in the critical sections, so all of those PSETs and
>> POINTERs are already preset on the accelerator.
>>
>> One thing I don't like about this patch is that I'm updating the host's
>> copy of the PSET prior to uploading it. The host's PSET does get
>> restored prior to returning from gomp_map_vars, however that might
>> impact things if the host were to run in multi-threaded applications.
>> Maybe I'll drop this patch from gomp4 since it's not very effective.
> 
> ... also there is some bug somewhere; I see:
> 
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O0  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O0  
> execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O1  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O1  
> execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O2  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O2  
> execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O3 -g  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O3 -g 
>  execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -Os  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -Os  
> execution test
> 
> ..., and:
> 
> PASS: libgomp.fortran/target3.f90   -O0  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O0  execution test
> PASS: libgomp.fortran/target3.f90   -O1  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O1  execution test
> PASS: libgomp.fortran/target3.f90   -O2  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O2  execution test
> PASS: libgomp.fortran/target3.f90   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
> errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> PASS: libgomp.fortran/target3.f90   -O3 -g  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O3 -g  execution test
> PASS: libgomp.fortran/target3.f90   -Os  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -Os  execution test
> 
> In all cases, the run-time error message is:
> 
> libgomp: Pointer target of array section wasn't mapped

I'm not seeing any of these failures on power8 and x86_64 with multilibs
disabled. How did you configure gcc? And those are OpenMP tests. Are you
testing OpenMP target offloading, if so how?

> For reference, I'm appending the patch, which wasn't included in the
> original email.

Whoops. Thanks!

Cesar



Re: [PATCH v3][PR lto/79061] Fix LTO plus ASAN fails with "AddressSanitizer: initialization-order-fiasco".

2017-01-30 Thread Jakub Jelinek
On Mon, Jan 30, 2017 at 06:09:36PM +0300, Maxim Ostapenko wrote:
> Hi,
> 
> as was figured out in PR, using DECL_NAME (TRANSLATION_UNIT_DECL) does not
> always give us a correct module name in LTO mode because e.g. DECL_CONTEXT
> of some variables can be NAMESPACE_DECL and LTO merges NAMESPACE_DECLs. The
> easiest fix is just to disable the initialization order checking altogether
> for LTO by forcing dynamically_initialized = 0 in LTO for now but we'll
> probably want to restore initialization order fiasco detection in the future
> (e.g. for GCC 8).
> This patch just disables initialization order fiasco detection for LTO for
> now. Tested and bootstrapped on x86_64-unknown-linux-gnu, OK to apply?
> Or do I need to cook a proper fix for GCC 7 (and branches) and come back
> then?
> 
> -Maxim

> gcc/ChangeLog:
> 
> 2017-01-30  Maxim Ostapenko  
> 

Please add
PR lto/79061
here.
>   * asan.c (get_translation_unit_decl): Remove function.
>   (asan_add_global): Force has_dynamic_init to zero in LTO mode.

Ok with that change.

Jakub


Re: [PATCH v3][PR lto/79061] Fix LTO plus ASAN fails with "AddressSanitizer: initialization-order-fiasco".

2017-01-30 Thread Richard Biener
On Mon, 30 Jan 2017, Maxim Ostapenko wrote:

> Hi,
> 
> as was figured out in PR, using DECL_NAME (TRANSLATION_UNIT_DECL) does not
> always give us a correct module name in LTO mode because e.g. DECL_CONTEXT of
> some variables can be NAMESPACE_DECL and LTO merges NAMESPACE_DECLs.

Yes, it indeed does.  Note that in GCC 8+ both TU decls and NAMESPACE_DECLs
should no longer be necessary and will eventually vanish completely...
(in lto1, that is).  Can we code-gen the init-order stuff early, before
LTO write-out?

> The easiest fix is just to disable the initialization order checking 
> altogether
> for LTO by forcing dynamically_initialized = 0 in LTO for now but we'll
> probably want to restore initialization order fiasco detection in the future
> (e.g. for GCC 8).
> This patch just disables initialization order fiasco detection for LTO for
> now. Tested and bootstrapped on x86_64-unknown-linux-gnu, OK to apply?
> Or do I need to cook a proper fix for GCC 7 (and branches) and come back then?

The patch works for me.

Richard.


Re: [PATCH] Fix profile corruption with -O1 (PR gcov-profile/79259)

2017-01-30 Thread Richard Biener
On Mon, Jan 30, 2017 at 4:09 PM, Martin Liška  wrote:
> Hello.
>
> During investigation of another issue, I accidentally came across a profile 
> inconsistency mentioned in the PR.  The problem is that flag_ipa_bit_cp is 
> enabled in the use stage of PGO and not in the instrumentation stage.  That 
> causes ccp1 to find nonzero bits, which leads to CFG changes as a condition 
> can be folded away.
>
> The solution is to enable the same flag in the generate phase.  However, 
> I've got one more question:
> In -fprofile-generate we have 2 more functions available in symtab:
>
> _GLOBAL__sub_I_00100_0_c ()
> {
>   <bb 2> [0.00%]:
>   __gcov_init (&*.LPBX0);
>   return;
>
> }
>
> _GLOBAL__sub_D_00100_1_c ()
> {
>   <bb 2> [0.00%]:
>   __gcov_exit ();
>   return;
>
> }
>
> I'm wondering whether it can potentially influence early inlining decisions?

I guess yes...

> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Ok.

Thanks,
Richard.

> Martin


Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-01-30 Thread Maxim Kuvyrkov
> On Jan 26, 2017, at 11:56 PM, Andrew Pinski  wrote:
> 
> Hi,
>  This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
> thunderx and thunderx2t99.  No other core currently enables software
> prefetching, so I set them to 0, which does not change the default
> parameters.
> 
> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
> CN99xx's SPEC did not change.

Below are several comments.

> Index: config/aarch64/aarch64-cores.def
> ===
> --- config/aarch64/aarch64-cores.def  (revision 244917)
> +++ config/aarch64/aarch64-cores.def  (working copy)
> @@ -63,8 +63,8 @@ AARCH64_CORE("qdf24xx", qdf24xx,   c
>  AARCH64_CORE("thunderx",  thunderx,  thunderx,  8A,
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
> thunderx,  0x43, 0x0a0, -1)
>  /* Do not swap around "thunderxt88p1" and "thunderxt88",
> this order is required to handle variant correctly. */
> -AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx,  8A,
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO,   
> thunderx,  0x43, 0x0a1, 0)
> -AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
> thunderx,  0x43, 0x0a1, -1)
> +AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx,  8A,
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO,   
> thunderxt88,  0x43, 0x0a1, 0)
> +AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
> thunderxt88,  0x43, 0x0a1, -1)
>  AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8_1A,  
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
> thunderx,  0x43, 0x0a2, -1)
>  AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8_1A,  
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
> thunderx,  0x43, 0x0a3, -1)

IMO, this should be in a separate patch that adds thunderxt88p1 tunings.

>  
> Index: config/aarch64/aarch64-protos.h
> ===
> --- config/aarch64/aarch64-protos.h   (revision 244917)
> +++ config/aarch64/aarch64-protos.h   (working copy)
> @@ -220,10 +220,19 @@ struct tune_params
>unsigned int max_case_values;
>/* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default.  */
>unsigned int cache_line_size;
> +  /* Value for PARAM_PREFETCH_LATENCY; or 0 to use the default.  */
> +  unsigned int prefetch_latency;
> +  /* Value for PARAM_SIMULTANEOUS_PREFETCHES; or 0 to use the default.  */
> +  unsigned int simultaneous_prefetches;
> +  /* Value for PARAM_L1_CACHE_SIZE; or 0 to use the default.  */
> +  unsigned int l1_cache_size;
> +  /* Value for PARAM_L2_CACHE_SIZE; or 0 to use the default.  */
> +  unsigned int l2_cache_size;
>  
>  /* An enum specifying how to take into account CPU autoprefetch capabilities
> during instruction scheduling:
> - AUTOPREFETCHER_OFF: Do not take autoprefetch capabilities into account.
> +   - AUTOPREFETCHER_SW: Turn on software based prefetching.
> - AUTOPREFETCHER_WEAK: Attempt to sort sequences of loads/store in order 
> of
> offsets but allow the pipeline hazard recognizer to alter that order to
> maximize multi-issue opportunities.
> @@ -233,6 +242,7 @@ struct tune_params
>enum aarch64_autoprefetch_model
>{
>  AUTOPREFETCHER_OFF,
> +AUTOPREFETCHER_SW,
>  AUTOPREFETCHER_WEAK,
>  AUTOPREFETCHER_STRONG
>} autoprefetcher_model;

As I explain below, it is not a good idea to mix loop array prefetching with 
scheduler's HW autoprefetcher model. 

> Index: config/aarch64/aarch64.c
> ===
> --- config/aarch64/aarch64.c  (revision 244917)
> +++ config/aarch64/aarch64.c  (working copy)
> @@ -535,6 +535,10 @@ static const struct tune_params generic_
>2, /* min_div_recip_mul_df.  */
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
> +  0, /* prefetch_latency. */
> +  0, /* simultaneous_prefetches. */
> +  0, /* l1_cache_size. */
> +  0, /* l2_cache_size. */
>tune_params::AUTOPREFETCHER_OFF,   /* autoprefetcher_model.  */
>(AARCH64_EXTRA_TUNE_NONE)  /* tune_flags.  */
>  };
> @@ -561,6 +565,10 @@ static const struct tune_params cortexa3
>2, /* min_div_recip_mul_df.  */
>0, /* max_case_values.  */
>0, /* cache_line_size.  */
> +  0, /* prefetch_latency. */
> +  0, /* simultaneous_prefetches. */
> +  0, /* l1_cache_size. */
> +  0, /* l2_cache_size. */
>tune_params::AUTOPREFETCHER_WEAK,  /* autoprefetcher_model.  

[PATCH v3][PR lto/79061] Fix LTO plus ASAN fails with "AddressSanitizer: initialization-order-fiasco".

2017-01-30 Thread Maxim Ostapenko

Hi,

as was figured out in PR, using DECL_NAME (TRANSLATION_UNIT_DECL) does 
not always give us a correct module name in LTO mode because e.g. 
DECL_CONTEXT of some variables can be NAMESPACE_DECL and LTO merges 
NAMESPACE_DECLs. The easiest fix is just to disable the initialization 
order checking altogether for LTO by forcing dynamically_initialized = 0 
in LTO for now but we'll probably want to restore initialization order 
fiasco detection in the future (e.g. for GCC 8).
This patch just disables initialization order fiasco detection for LTO 
for now. Tested and bootstrapped on x86_64-unknown-linux-gnu, OK to apply?
Or do I need to cook a proper fix for GCC 7 (and branches) and come back 
then?


-Maxim
gcc/ChangeLog:

2017-01-30  Maxim Ostapenko  

	* asan.c (get_translation_unit_decl): Remove function.
	(asan_add_global): Force has_dynamic_init to zero in LTO mode.

diff --git a/gcc/asan.c b/gcc/asan.c
index 9098121..6cdd59b 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -2373,22 +2373,6 @@ asan_needs_odr_indicator_p (tree decl)
 	  && TREE_PUBLIC (decl));
 }
 
-/* For given DECL return its corresponding TRANSLATION_UNIT_DECL.  */
-
-static const_tree
-get_translation_unit_decl (tree decl)
-{
-  const_tree context = decl;
-  while (context && TREE_CODE (context) != TRANSLATION_UNIT_DECL)
-{
-  if (TREE_CODE (context) == BLOCK)
-	context = BLOCK_SUPERCONTEXT (context);
-  else
-	context = get_containing_scope (context);
-}
-  return context;
-}
-
 /* Append description of a single global DECL into vector V.
TYPE is __asan_global struct type as returned by asan_global_struct.  */
 
@@ -2408,14 +2392,7 @@ asan_add_global (tree decl, tree type, vec<constructor_elt, va_gc> *v)
     pp_string (&asan_pp, "<unknown>");
   str_cst = asan_pp_string (&asan_pp);
 
-  const char *filename = main_input_filename;
-  if (in_lto_p)
-    {
-      const_tree translation_unit_decl = get_translation_unit_decl (decl);
-      if (translation_unit_decl && DECL_NAME (translation_unit_decl) != NULL)
-	filename = IDENTIFIER_POINTER (DECL_NAME (translation_unit_decl));
-    }
-  pp_string (&module_name_pp, filename);
+  pp_string (&module_name_pp, main_input_filename);
   module_name_cst = asan_pp_string (&module_name_pp);
 
   if (asan_needs_local_alias (decl))
@@ -2451,7 +2428,11 @@ asan_add_global (tree decl, tree type, vec<constructor_elt, va_gc> *v)
   CONSTRUCTOR_APPEND_ELT (vinner, NULL_TREE,
 			  fold_convert (const_ptr_type_node, module_name_cst));
   varpool_node *vnode = varpool_node::get (decl);
-  int has_dynamic_init = vnode ? vnode->dynamically_initialized : 0;
+  int has_dynamic_init = 0;
+  /* FIXME: Enable initialization order fiasco detection in LTO mode once
+ proper fix for PR 79061 will be applied.  */
+  if (!in_lto_p)
+    has_dynamic_init = vnode ? vnode->dynamically_initialized : 0;
   CONSTRUCTOR_APPEND_ELT (vinner, NULL_TREE,
 			  build_int_cst (uptr, has_dynamic_init));
   tree locptr = NULL_TREE;


[PATCH] Fix profile corruption with -O1 (PR gcov-profile/79259)

2017-01-30 Thread Martin Liška
Hello.

During investigation of another issue, I accidentally came across a profile 
inconsistency mentioned in the PR.  The problem is that flag_ipa_bit_cp is 
enabled in the use stage of PGO and not in the instrumentation stage.  That 
causes ccp1 to find nonzero bits, which leads to CFG changes as a condition 
can be folded away.

The solution is to enable the same flag in the generate phase.  However, 
I've got one more question:
In -fprofile-generate we have 2 more functions available in symtab:

_GLOBAL__sub_I_00100_0_c ()
{
  <bb 2> [0.00%]:
  __gcov_init (&*.LPBX0);
  return;

}

_GLOBAL__sub_D_00100_1_c ()
{
  <bb 2> [0.00%]:
  __gcov_exit ();
  return;

}

I'm wondering whether it can potentially influence early inlining decisions?

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 170193b48d41b90bc8b0f73a7dce2c1933430fc1 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 30 Jan 2017 10:55:36 +0100
Subject: [PATCH] Fix profile corruption with -O1 (PR gcov-profile/79259)

gcc/ChangeLog:

2017-01-30  Martin Liska  

	PR gcov-profile/79259
	* opts.c (common_handle_option): Enable flag_ipa_bit_cp w/
	-fprofile-generate.

gcc/testsuite/ChangeLog:

2017-01-30  Martin Liska  

	PR gcov-profile/79259
	* g++.dg/tree-prof/pr79259.C: New test.
---
 gcc/opts.c   |  2 ++
 gcc/testsuite/g++.dg/tree-prof/pr79259.C | 20 
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/tree-prof/pr79259.C

diff --git a/gcc/opts.c b/gcc/opts.c
index 5f573a16ff1..b38e9b4f3a7 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2150,6 +2150,8 @@ common_handle_option (struct gcc_options *opts,
 	opts->x_flag_profile_values = value;
   if (!opts_set->x_flag_inline_functions)
 	opts->x_flag_inline_functions = value;
+  if (!opts_set->x_flag_ipa_bit_cp)
+	opts->x_flag_ipa_bit_cp = value;
   /* FIXME: Instrumentation we insert makes ipa-reference bitmaps
 	 quadratic.  Disable the pass until better memory representation
 	 is done.  */
diff --git a/gcc/testsuite/g++.dg/tree-prof/pr79259.C b/gcc/testsuite/g++.dg/tree-prof/pr79259.C
new file mode 100644
index 000..a55172b62d2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-prof/pr79259.C
@@ -0,0 +1,20 @@
+/* { dg-options "-O1" } */
+
+inline bool
+a (int b)
+{
+  return (b & 5) != b;
+}
+int c;
+int
+fn2 ()
+{
+  if (a (c == 0))
+    return 0;
+}
+
+int main()
+{
+  fn2();
+}
+
-- 
2.11.0



Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-01-30 Thread Richard Biener
On Mon, Jan 30, 2017 at 3:49 PM, Maxim Kuvyrkov
 wrote:
>> On Jan 27, 2017, at 6:59 PM, Andrew Pinski  wrote:
>>
>> On Fri, Jan 27, 2017 at 4:11 AM, Richard Biener
>>  wrote:
>>> On Fri, Jan 27, 2017 at 1:10 PM, Richard Biener
>>>  wrote:
 On Thu, Jan 26, 2017 at 9:56 PM, Andrew Pinski  wrote:
> Hi,
>  This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
> thunderx and thunderx2t99.  No other core currently enables software
> prefetching, so I set them to 0, which does not change the default
> parameters.
>
> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
> CN99xx's SPEC did not change.

 Heh, quite impressive for this kind of bit-rotten (and broken?) pass ;)
>>>
>>> And I wonder if most benefit comes from the unrolling the pass might do
>>> rather than from the prefetches...
>>
>> Not in this case.  The main reason why I know is because the number of
>> L1 and L2 misses drops a lot.
>
> I can confirm this.  In my experiments loop unrolling hurts several tests.
>
> The prefetching approach I'm testing for -O2 includes disabling of loop 
> unrolling to prevent code bloat.

How do you get at the desired prefetching distance then?  Is it enough
to seed the HW prefetcher by
prefetching once before the loop?

> --
> Maxim Kuvyrkov
> www.linaro.org
>
>


Re: [PR tree-optimization/71691] Fix unswitching in presence of maybe-undef SSA_NAMEs (take 2)

2017-01-30 Thread Richard Biener
On Fri, Jan 27, 2017 at 12:20 PM, Aldy Hernandez  wrote:
> On 01/26/2017 07:29 AM, Richard Biener wrote:
>>
>> On Thu, Jan 26, 2017 at 1:04 PM, Aldy Hernandez  wrote:
>>>
>>> On 01/24/2017 07:23 AM, Richard Biener wrote:
>
>
>> Your initial on-demand approach is fine to catch some of the cases but it
>> will not handle those for which we'd need post-dominance.
>>
>> I guess we can incrementally add that.
>
>
> No complaints from me.
>
> This is my initial on-demand approach, with a few fixes you've commented on
> throughout.
>
> As you can see, there is still an #if 0 wrt to using your suggested
> conservative handling of memory loads, which I'm not entirely convinced of,
> as it pessimizes gcc.dg/loop-unswitch-1.c.  If you feel strongly about it, I
> can enable the code again.

It is really required -- fortunately loop-unswitch-1.c is one of the cases where
the post-dom / always-executed bits help.  The comparison is inside the
loop header and thus always executed when the loop enters, so inserting it
on the preheader edge is fine.

> Also, I enhanced gcc.dg/loop-unswitch-1.c to verify that we're actually
> unswitching something.  It seems kinda silly that we have various unswitch
> tests, but we don't actually check whether we have unswitched anything.

Heh.  It probably was added for an ICE...

> This test was the only one in the *unswitch*.c set that I saw was actually
> being unswitched.  Of course, if you don't agree with my #if 0 above, I will
> remove this change to the test.
>
> Finally, are we even guaranteed to unswitch in loop-unswitch-1.c across
> architectures?  If not, again, I can remove the one-liner.

I think so.

>
> How does this look for trunk?

With an unswitch-local solution I meant not to add new files but to put the
defined_or_undefined class (well, or rather a single function...) into
tree-ssa-loop-unswitch.c.

@@ -138,7 +141,7 @@ tree_may_unswitch_on (basic_block bb, struct loop *loop)
 {
   /* Unswitching on undefined values would introduce undefined
 behavior that the original program might never exercise.  */
-  if (ssa_undefined_value_p (use, true))
+  if (defined_or_undefined->is_maybe_undefined (use))
return NULL_TREE;
   def = SSA_NAME_DEF_STMT (use);
   def_bb = gimple_bb (def);

as I said, moving this (now more expensive check) after

  if (def_bb
  && flow_bb_inside_loop_p (loop, def_bb))
return NULL_TREE;

this cheap check would be better.  It should avoid 99% of all calls I bet.

You can recover the loop-unswitch-1.c case by passing down
the using stmt and checking its BB against loop_header (the only
block that we trivially know is always executed when entering the region).
Or do that check in the caller, like

if (bb != loop->header
   && ssa_undefined_value_p (use, true) /
defined_or_undefined->is_maybe_undefined (use))

+  gimple *def = SSA_NAME_DEF_STMT (t);
+
+  /* Check that all the PHI args are fully defined.  */
+  if (gphi *phi = dyn_cast  (def))
+   {
+ if (virtual_operand_p (PHI_RESULT (phi)))
+   continue;

You should never run into virtual operands (you only walk SSA_OP_USEs).

You can stop walking at stmts that dominate the region header,
like with

+  gimple *def = SSA_NAME_DEF_STMT (t);
/* Uses in stmts always executed when the region header
executes are fine.  */
if (dominated_by_p (CDI_DOMINATORS, loop_header, gimple_bb (def)))
  continue;

and the bail out for PARM_DECLs is wrong:

+  /* A PARM_DECL will not have an SSA_NAME_DEF_STMT.  Parameters
+get their initial value from function entry.  */
+  if (SSA_NAME_VAR (t) && TREE_CODE (SSA_NAME_VAR (t)) == PARM_DECL)
+   return false;

needs to be a continue; rather than a return false.


Otherwise looks ok and sorry for the continuing delays in reviewing this...

Thanks,
Richard.

> Aldy
>


Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-01-30 Thread Maxim Kuvyrkov
> On Jan 27, 2017, at 6:59 PM, Andrew Pinski  wrote:
> 
> On Fri, Jan 27, 2017 at 4:11 AM, Richard Biener
>  wrote:
>> On Fri, Jan 27, 2017 at 1:10 PM, Richard Biener
>>  wrote:
>>> On Thu, Jan 26, 2017 at 9:56 PM, Andrew Pinski  wrote:
 Hi,
  This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
 -mcpu=thunderxt88p1.  I filled out the tuning structures for both
 thunderx and thunderx2t99.  No other core currently enables software
 prefetching, so I set them to 0, which does not change the default
 parameters.
 
 OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
 CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
 on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
 CN99xx's SPEC did not change.
>>> 
>>> Heh, quite impressive for this kind of bit-rotten (and broken?) pass ;)
>> 
>> And I wonder if most benefit comes from the unrolling the pass might do
>> rather than from the prefetches...
> 
> Not in this case.  The main reason why I know is because the number of
> L1 and L2 misses drops a lot.

I can confirm this.  In my experiments loop unrolling hurts several tests.

The prefetching approach I'm testing for -O2 includes disabling of loop 
unrolling to prevent code bloat.

--
Maxim Kuvyrkov
www.linaro.org




Re: [PATCH 5/6][AArch64] Enable -fprefetch-loop-arrays at -O3 for cores that benefit from prefetching.

2017-01-30 Thread Maxim Kuvyrkov
> On Jan 30, 2017, at 3:23 PM, Kyrill Tkachov  
> wrote:
> 
> Hi Maxim,
> 
> On 30/01/17 12:06, Maxim Kuvyrkov wrote:
>> This patch enables prefetching at -O3 for aarch64 cores that set the 
>> "simultaneous prefetches" parameter above 0.  There are currently no such 
>> settings, so this patch doesn't change default code generation.
>> 
>> I'm now working on improvements to -fprefetch-loop-arrays pass to make it 
>> suitable for -O2.  I'll post this work in the next month.
>> 
>> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.
>> 
> 
> Are you aiming to get this in for GCC 8?
> I have one small comment on this patch:
> 
> +  /* Enable sw prefetching at -O3 for CPUS that have prefetch, and we
> + have deemed it beneficial (signified by setting
> + prefetch.num_slots to 1 or more).  */
> +  if (flag_prefetch_loop_arrays < 0
> +  && HAVE_prefetch
> 
> HAVE_prefetch will always be true on aarch64.
> I imagine midend code that had logic like this would need this check, but 
> aarch64-specific code shouldn't need it.

Agree, I'll remove HAVE_prefetch.

This pattern was copied from other backends, and HAVE_prefetch is most likely a 
historical artifact.

--
Maxim Kuvyrkov
www.linaro.org



[arm] PR target/79260: Fix header installation for plugins

2017-01-30 Thread Richard Earnshaw (lists)
The recent changes to the header infrastructure for ARM targets broke
building of plugins due to missing headers.

Patch below, tested on a cross build plus visual examination of the
install infrastructure.

PR target/79260
* config.gcc (arm*-*-*): Add arm/arm-flags.h and arm/arm-isa.h to tm_p_file.
* arm/arm-protos.h: Don't directly include arm-flags.h and arm-isa.h.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 03b1894..bc389eb 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -571,7 +571,7 @@ x86_64-*-*)
tm_file="vxworks-dummy.h ${tm_file}"
;;
 arm*-*-*)
-   tm_p_file="${tm_p_file} arm/aarch-common-protos.h"
+   tm_p_file="arm/arm-flags.h arm/arm-isa.h ${tm_p_file} 
arm/aarch-common-protos.h"
tm_file="vxworks-dummy.h ${tm_file}"
;;
 mips*-*-* | sh*-*-* | sparc*-*-*)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 1b16239..680a1e6 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -22,8 +22,6 @@
 #ifndef GCC_ARM_PROTOS_H
 #define GCC_ARM_PROTOS_H
 
-#include "arm-flags.h"
-#include "arm-isa.h"
 #include "sbitmap.h"
 
 extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);


Re: [PATCH] update-copyright.py: Retain file mode

2017-01-30 Thread Bernhard Reutner-Fischer
On Tue, Jun 21, 2016 at 09:00:58AM -0600, Jeff Law wrote:
> On 06/21/2016 08:14 AM, Bernhard Reutner-Fischer wrote:
> > Hi!
> > 
> > Ok for trunk?
> > 
> > thanks,
> > 
> > contrib/ChangeLog
> > 
> > 2016-06-21  Bernhard Reutner-Fischer  
> > 
> > * update-copyright.py (Copyright.process_file): Retain original
> > file mode.
> OK.
> jeff

Thanks Jeff. Now committed as r245028.


Re: [PATCH] Fix aarch64 PGO bootstrap (bootstrap/78985)

2017-01-30 Thread James Greenhalgh
On Mon, Jan 30, 2017 at 10:47:12AM +0100, Martin Liška wrote:
> Hi.
> 
> The following patch simply fixes issues reported by -Wmaybe-uninitialized.
> That enables PGO bootstrap
> on a ThunderX aarch64 machine.
> 
> Ready to be installed?

OK.

Thanks,
James



[PATCH] Fix PR79276

2017-01-30 Thread Richard Biener

Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-01-30  Richard Biener  

PR tree-optimization/79276
* tree-vrp.c (process_assert_insertions): Properly adjust common
when removing a duplicate.

* gcc.dg/torture/pr79276.c: New testcase.

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 245022)
+++ gcc/tree-vrp.c  (working copy)
@@ -6544,6 +6544,11 @@ process_assert_insertions (void)
  else if (loc->e == asserts[j-1]->e)
{
  /* Remove duplicate asserts.  */
+ if (commonj == j - 1)
+   {
+ commonj = j;
+ common = loc;
+   }
  free (asserts[j-1]);
  asserts[j-1] = NULL;
}
Index: gcc/testsuite/gcc.dg/torture/pr79276.c
===
--- gcc/testsuite/gcc.dg/torture/pr79276.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr79276.c  (working copy)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+
+short int
+ix (int *ld, short int oi)
+{
+  *ld = ((unsigned short int)oi | oi) && !!(*ld);
+  return (oi != 0) ? oi : 1;
+}


Re: [PATCH, wwwdocs/ARM] Mention new rmprofile value for --with-multilib-list

2017-01-30 Thread Kyrill Tkachov

Hi Thomas,

On 30/01/17 13:32, Thomas Preudhomme wrote:

Hi,

The ARM backend now supports a new set of multilib libraries enabled with 
--with-multilib-list=rmprofile [1]. This patch documents it in the changes for 
GCC 7.

[1] https://gcc.gnu.org/viewcvs/gcc?view=revision=r242696

Is this ok for wwwdocs?



This is ok.
Thanks,
Kyrill


Best regards,

Thomas




Re: [v3 PATCH] Implement LWG 2825, LWG 2756 breaks class template argument deduction for optional.

2017-01-30 Thread Jonathan Wakely

On 30/01/17 13:28 +, Jonathan Wakely wrote:

On 30/01/17 13:47 +0200, Ville Voutilainen wrote:

Tested on Linux-x64.


OK, thanks.


To be clear: this isn't approved by LWG yet, but I think we can be a
bit adventurous with deduction guides and add them for experimental
C++17 features. Getting more usage experience before we standardise
these things will be good, and deduction guides are very new and
untried. If we find problems we can remove them again, and will have
invaluable feedback for the standards committee.



Re: [PATCH] Fix PGO bootstrap on s390x (PR bootstrap/78985).

2017-01-30 Thread Martin Liška
On 01/30/2017 12:27 PM, Martin Liška wrote:
> Hi.
> 
> The following patch simply fixes issues reported by -Wmaybe-uninitialized.
> That enables PGO bootstrap
> on an s390x machine.
> 
> Ready to be installed?
> Martin
> 

There's a second version that adds one more hunk for the s390 target.

Martin
From 598d0a59b91070211b09056195bc0f971bc57ae1 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 30 Jan 2017 11:09:29 +0100
Subject: [PATCH] Fix PGO bootstrap on s390x (PR bootstrap/78985).

gcc/ChangeLog:

2017-01-30  Martin Liska  

	PR bootstrap/78985
	* config/s390/s390.c (s390_gimplify_va_arg): Initialize local
	variable to NULL.
	(print_operand_address): Initialize a struct to zero.
---
 gcc/config/s390/s390.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index fe65846a4f2..c02da273d8c 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7347,6 +7347,7 @@ void
 print_operand_address (FILE *file, rtx addr)
 {
   struct s390_address ad;
+  memset (&ad, 0, sizeof (s390_address));
 
   if (s390_loadrelative_operand_p (addr, NULL, NULL))
 {
@@ -12195,7 +12196,7 @@ s390_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
   tree f_gpr, f_fpr, f_ovf, f_sav;
   tree gpr, fpr, ovf, sav, reg, t, u;
   int indirect_p, size, n_reg, sav_ofs, sav_scale, max_reg;
-  tree lab_false, lab_over;
+  tree lab_false, lab_over = NULL_TREE;
   tree addr = create_tmp_var (ptr_type_node, "addr");
   bool left_align_p; /* How a value < UNITS_PER_LONG is aligned within
 			a stack slot.  */
-- 
2.11.0



[PATCH, wwwdocs/ARM] Mention new rmprofile value for --with-multilib-list

2017-01-30 Thread Thomas Preudhomme

Hi,

The ARM backend now supports a new set of multilib libraries enabled with 
--with-multilib-list=rmprofile [1]. This patch documents it in the changes for 
GCC 7.


[1] https://gcc.gnu.org/viewcvs/gcc?view=revision=r242696

Is this ok for wwwdocs?

Best regards,

Thomas
Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.44
diff -u -r1.44 changes.html
--- htdocs/gcc-7/changes.html	27 Jan 2017 09:54:32 -	1.44
+++ htdocs/gcc-7/changes.html	30 Jan 2017 10:00:48 -
@@ -517,6 +517,11 @@
   the generation of coprocessor instructions through the use of intrinsics
   such as cdp, ldc, and others.
 
+ <li>
+  The configure option <code>--with-multilib-list</code> now accepts the
+  value <code>rmprofile</code> to build multilib libraries for a range of
+  embedded targets.
+ </li>

 
 AVR


Re: [v3 PATCH] Implement LWG 2825, LWG 2756 breaks class template argument deduction for optional.

2017-01-30 Thread Jonathan Wakely

On 30/01/17 13:47 +0200, Ville Voutilainen wrote:

Tested on Linux-x64.


OK, thanks.


Re: [PATCH 3/6] Fix prefetch heuristic calculation.

2017-01-30 Thread Richard Biener
On Mon, Jan 30, 2017 at 12:43 PM, Maxim Kuvyrkov
 wrote:
> This patch fixes a heuristic in loop array prefetching to use 
> "round-to-nearest" instead of "floor", which is what _all_ other similar 
> heuristics in the pass are doing.

_all_ is a bit over-exaggerating...  In this context we are testing the
rounded value against min_insn_to_mem_ratio, so rounding
down makes sense to me.

What am I missing?

Richard.

> This subtle change allows a critical loop in 429.mcf to get prefetches 
> without making the pass too aggressive, which causes regressions on other 
> benchmarks.
>
> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>


Re: [PATCH 2/6] Improve debug output of loop data prefetching.

2017-01-30 Thread Richard Biener
On Mon, Jan 30, 2017 at 12:36 PM, Maxim Kuvyrkov
 wrote:
> Current debug output from -fprefetch-loop-arrays refers to prefetching 
> instances by their (void *) address, which makes it painful to compare dumps, 
> e.g., when investigating how different parameter values affect prefetching 
> decisions.
>
> This patch adds UIDs to two main prefetching concepts: mem_ref_group and 
> mem_ref.  [Mem_refs are combined into mem_ref_groups so that they can re-use 
> prefetches.]  Debug output is then changed to identify prefetch opportunities 
> as : instead of 0x.  Believe me, it makes a world of 
> difference for debugging tree-ssa-loop-prefetch.c.
>
> There is no change in code-generation from this patch.
>
> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.

Ok.

Richard.

> --
> Maxim Kuvyrkov
> www.linaro.org
>


Re: [PATCH 1/6] Add debug counter for loop array prefetching.

2017-01-30 Thread Richard Biener
On Mon, Jan 30, 2017 at 12:28 PM, Maxim Kuvyrkov
 wrote:
> This patch adds a debug counter to the -fprefetch-loop-arrays pass.  It can be 
> activated by "-fdbg-cnt=prefetch:10" to allow only the first 10 prefetches to 
> be issued.
>
> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.

Ok.

RIchard.

> --
> Maxim Kuvyrkov
> www.linaro.org
>


Re: [PATCH 5/6][AArch64] Enable -fprefetch-loop-arrays at -O3 for cores that benefit from prefetching.

2017-01-30 Thread Kyrill Tkachov

Hi Maxim,

On 30/01/17 12:06, Maxim Kuvyrkov wrote:

This patch enables prefetching at -O3 for aarch64 cores that set the "simultaneous 
prefetches" parameter above 0.  There are currently no such settings, so this patch 
doesn't change default code generation.

I'm now working on improvements to -fprefetch-loop-arrays pass to make it 
suitable for -O2.  I'll post this work in the next month.

Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.



Are you aiming to get this in for GCC 8?
I have one small comment on this patch:

+  /* Enable sw prefetching at -O3 for CPUS that have prefetch, and we
+ have deemed it beneficial (signified by setting
+ prefetch.num_slots to 1 or more).  */
+  if (flag_prefetch_loop_arrays < 0
+  && HAVE_prefetch

HAVE_prefetch will always be true on aarch64.
I imagine midend code that had logic like this would need this check, but 
aarch64-specific code shouldn't need it.

+  && optimize >= 3
+  && aarch64_tune_params.prefetch.num_slots > 0)
+flag_prefetch_loop_arrays = 1;
+

Cheers,
Kyrill



Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-01-30 Thread Maxim Kuvyrkov
> On Jan 27, 2017, at 1:54 PM, Kyrill Tkachov  
> wrote:
> 
> 
> On 26/01/17 20:56, Andrew Pinski wrote:
>> Hi,
>>   This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
>> thunderx and thunderx2t99.  No other core currently enables software
>> prefetching so I set them to 0 which does not change the default
>> parameters.
>> 
>> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>> CN99xx's SPEC did not change.
>> 
>> Thanks,
>> Andrew Pinski
>> 
>> ChangeLog:
>> * config/aarch64/aarch64-protos.h (struct tune_params): Add
>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>> l2_cache_size fields.
>> (enum aarch64_autoprefetch_model): Add AUTOPREFETCHER_SW.
>> * config/aarch64/aarch64.c (generic_tunings): Update to include
>> prefetch_latency, simultaneous_prefetches, l1_cache_size, and
>> l2_cache_size fields to 0.
>> (cortexa35_tunings): Likewise.
>> (cortexa53_tunings): Likewise.
>> (cortexa57_tunings): Likewise.
>> (cortexa72_tunings): Likewise.
>> (cortexa73_tunings): Likewise.
>> (exynosm1_tunings): Likewise.
>> (thunderx_tunings): Fill out some of the new fields.
>> (thunderxt88_tunings): New variable.
>> (xgene1_tunings): Update to include prefetch_latency,
>> simultaneous_prefetches, l1_cache_size, and l2_cache_size fields to 0.
>> (qdf24xx_tunings): Likewise.
>> (thunderx2t99_tunings): Fill out some of the new fields.
>> (aarch64_override_options_internal): Consider AUTOPREFETCHER_SW like
>> AUTOPREFETCHER_OFF.
>> Set param values if the fields are non-zero.  Turn on
>> prefetch-loop-arrays if AUTOPREFETCHER_SW and optimize level is at
>> least 3 or profile feed usage is enabled.
>> * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tuning.
>> (thunderxt88): Likewise.
> 
> --- config/aarch64/aarch64-protos.h   (revision 244917)
> +++ config/aarch64/aarch64-protos.h   (working copy)
> @@ -220,10 +220,19 @@ struct tune_params
>   unsigned int max_case_values;
>   /* Value for PARAM_L1_CACHE_LINE_SIZE; or 0 to use the default.  */
>   unsigned int cache_line_size;
> +  /* Value for PARAM_PREFETCH_LATENCY; or 0 to use the default.  */
> +  unsigned int prefetch_latency;
> +  /* Value for PARAM_SIMULTANEOUS_PREFETCHES; or 0 to use the default.  */
> +  unsigned int simultaneous_prefetches;
> +  /* Value for PARAM_L1_CACHE_SIZE; or 0 to use the default.  */
> +  unsigned int l1_cache_size;
> +  /* Value for PARAM_L2_CACHE_SIZE; or 0 to use the default.  */
> +  unsigned int l2_cache_size;
> 
> Not a blocker to the patch but I wonder whether it would be a good idea to 
> group these prefetch-related parameters
> (plus autoprefetcher_model) into a new nested struct here (prefetch_tunings 
> or something) since there's a decent
> number of them and they're all related.

Feel free to copy-paste from 
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02292.html  , which is a 
copy-paste from aarch32 backend anyway ;-).

--
Maxim Kuvyrkov
www.linaro.org




[PATCH 6/6][AArch64] Update prefetch tuning parameters for falkor and qdf24xx tunings.

2017-01-30 Thread Maxim Kuvyrkov
This patch enables software prefetching at -O3 for Qualcomm's qdf24xx cores.

Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.

--
Maxim Kuvyrkov
www.linaro.org



0006-Update-prefetch-tuning-parameters-for-falkor-and-qdf.patch
Description: Binary data


[PATCH 5/6][AArch64] Enable -fprefetch-loop-arrays at -O3 for cores that benefit from prefetching.

2017-01-30 Thread Maxim Kuvyrkov
This patch enables prefetching at -O3 for aarch64 cores that set the "simultaneous 
prefetches" parameter above 0.  There are currently no such settings, so this 
patch doesn't change default code generation.

I'm now working on improvements to -fprefetch-loop-arrays pass to make it 
suitable for -O2.  I'll post this work in the next month.

Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.

--
Maxim Kuvyrkov
www.linaro.org



0005-Enable-fprefetch-loop-arrays-at-O3-for-cores-that-be.patch
Description: Binary data




Re: [wwwdocs] remove developer.axis.com links

2017-01-30 Thread Hans-Peter Nilsson
> Date: Sun, 29 Jan 2017 23:06:56 +0100 (CET)
> From: Gerald Pfeifer 

> There is one reference left in gcc.gnu.org/readings.html; Hans-Peter,
> do you have a recommendation on how to best handle that?  (Remove it,
> or is there a good and stable replacement?)

Sorry, I don't know.  I'll open an internal ticket.

brgds, H-P


[PATCH 4/6] Port prefetch configuration from aarch32 to aarch64

2017-01-30 Thread Maxim Kuvyrkov
This patch ports prefetch configuration from the aarch32 backend to aarch64.  There 
is no code-generation change from this patch.

This patch also happens to address Kyrill's comment on Andrew's prefetching 
patch at https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02133.html .

This patch also fixes a minor bug in aarch64_override_options_internal(), which 
used "selected_cpu->tune" instead of "aarch64_tune_params".

Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.

--
Maxim Kuvyrkov
www.linaro.org



0004-Port-prefetch-configuration-from-aarch32-to-aarch64-.patch
Description: Binary data


[v3 PATCH] Implement LWG 2825, LWG 2756 breaks class template argument deduction for optional.

2017-01-30 Thread Ville Voutilainen
Tested on Linux-x64.

2017-01-30  Ville Voutilainen  

Implement LWG 2825, LWG 2756 breaks class template argument
deduction for optional.
* include/std/optional: Add a deduction guide.
* testsuite/20_util/optional/cons/deduction_guide.cc: New.
diff --git a/libstdc++-v3/include/std/optional 
b/libstdc++-v3/include/std/optional
index 887bf9e..905bc0a 100644
--- a/libstdc++-v3/include/std/optional
+++ b/libstdc++-v3/include/std/optional
@@ -981,6 +981,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// @}
 
+  template <typename _Tp> optional(_Tp) -> optional<_Tp>;
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
 
diff --git a/libstdc++-v3/testsuite/20_util/optional/cons/deduction_guide.cc 
b/libstdc++-v3/testsuite/20_util/optional/cons/deduction_guide.cc
new file mode 100644
index 000..59698dc
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/optional/cons/deduction_guide.cc
@@ -0,0 +1,44 @@
+// { dg-options "-std=gnu++17" }
+// { dg-do compile }
+
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+
+struct MoveOnly
+{
+  MoveOnly() = default;
+  MoveOnly(MoveOnly&&) {}
+  MoveOnly& operator=(MoveOnly&&) {}
+};
+
+int main()
+{
+std::optional x = 5;
+static_assert(std::is_same_v<decltype(x), std::optional<int>>);
+int y = 42;
+std::optional x2 = y;
+static_assert(std::is_same_v<decltype(x2), std::optional<int>>);
+const int z = 666;
+std::optional x3 = z;
+static_assert(std::is_same_v<decltype(x3), std::optional<int>>);
+std::optional mo = MoveOnly();
+static_assert(std::is_same_v<decltype(mo), std::optional<MoveOnly>>);
+mo = MoveOnly();
+}


[PATCH 3/6] Fix prefetch heuristic calculation.

2017-01-30 Thread Maxim Kuvyrkov
This patch fixes a heuristic in loop array prefetching to use "round-to-nearest" 
instead of "floor", which is what _all_ other similar heuristics in the pass 
are doing.

This subtle change allows a critical loop in 429.mcf to get prefetches without 
making the pass too aggressive, which causes regressions on other benchmarks.

Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.

--
Maxim Kuvyrkov
www.linaro.org



0003-Fix-prefetch-heuristic-calculation.patch
Description: Binary data


[PATCH 2/6] Improve debug output of loop data prefetching.

2017-01-30 Thread Maxim Kuvyrkov
Current debug output from -fprefetch-loop-arrays refers to prefetching 
instances by their (void *) address, which makes it painful to compare dumps, 
e.g., when investigating how different parameter values affect prefetching 
decisions.

This patch adds UIDs to two main prefetching concepts: mem_ref_group and 
mem_ref.  [Mem_refs are combined into mem_ref_groups so that they can re-use 
prefetches.]  Debug output is then changed to identify prefetch opportunities 
as <group-UID>:<ref-UID> instead of 0x<address>.  Believe me, it makes a world of 
difference for debugging tree-ssa-loop-prefetch.c.

There is no change in code-generation from this patch.

Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.

--
Maxim Kuvyrkov
www.linaro.org



0002-Improve-debug-output-of-loop-data-prefetching.patch
Description: Binary data


Re: [PATCH][ARM][PR target/78945] Fix libatomic on armv7-m

2017-01-30 Thread Szabolcs Nagy
On 25/01/17 12:35, Szabolcs Nagy wrote:
> ARM libatomic inline asm uses sel, uadd8, uadd16 instructions
> which are only available if __ARM_FEATURE_SIMD32 is defined.
> 
> libatomic/
> 2017-01-25  Szabolcs Nagy  
> 
>   PR target/78945
>   * config/arm/exch_n.c (libat_exchange): Check __ARM_FEATURE_SIMD32.

committed based on
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78945#c2

i will backport it to release branches.



[PATCH 1/6] Add debug counter for loop array prefetching.

2017-01-30 Thread Maxim Kuvyrkov
This patch adds a debug counter to the -fprefetch-loop-arrays pass.  It can be 
activated by "-fdbg-cnt=prefetch:10" to allow only the first 10 prefetches to be 
issued.

Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.

--
Maxim Kuvyrkov
www.linaro.org



0001-Add-debug-counter-for-loop-array-prefetching.patch
Description: Binary data


[PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).

2017-01-30 Thread Martin Liška
Hi.

The following patch simply fixes issues reported by -Wmaybe-uninitialized.  That 
enables PGO bootstrap on an s390x machine.

Ready to be installed?
Martin
From 3f3c3fe790ebffd038a033b6946de663e2305574 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 30 Jan 2017 11:09:29 +0100
Subject: [PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).

gcc/ChangeLog:

2017-01-30  Martin Liska  

	PR bootstrap/78985
	* config/s390/s390.c (s390_gimplify_va_arg): Initialize local
	variable to NULL.
---
 gcc/config/s390/s390.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index fe65846a4f2..3ac7df34826 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -12195,7 +12195,7 @@ s390_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
   tree f_gpr, f_fpr, f_ovf, f_sav;
   tree gpr, fpr, ovf, sav, reg, t, u;
   int indirect_p, size, n_reg, sav_ofs, sav_scale, max_reg;
-  tree lab_false, lab_over;
+  tree lab_false, lab_over = NULL_TREE;
   tree addr = create_tmp_var (ptr_type_node, "addr");
   bool left_align_p; /* How a value < UNITS_PER_LONG is aligned within
 			a stack slot.  */
-- 
2.11.0



Re: [PATCH] Fix PR79256

2017-01-30 Thread Richard Biener
On Mon, 30 Jan 2017, Uros Bizjak wrote:

> On Mon, Jan 30, 2017 at 11:56 AM, Richard Biener  wrote:
> > On Mon, 30 Jan 2017, Jakub Jelinek wrote:
> >
> >> On Mon, Jan 30, 2017 at 11:47:51AM +0100, Richard Biener wrote:
> >> > On Mon, 30 Jan 2017, Uros Bizjak wrote:
> >> >
> >> > > > 2017-01-30  Richard Biener  
> >> > > >
> >> > > > PR target/79277
> >> > > > * config/i386/i386-modes.def: Align DFmode properly.
> >> > >
> >> > > Index: gcc/config/i386/i386-modes.def
> >> > > ===
> >> > > --- gcc/config/i386/i386-modes.def (revision 245021)
> >> > > +++ gcc/config/i386/i386-modes.def (working copy)
> >> > > @@ -33,6 +33,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
> >> > >   : &ieee_extended_intel_96_format));
> >> > >  ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> >> > >  ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> >> > > +ADJUST_ALIGNMENT (DF, !TARGET_64BIT ? 4 : 8);
> >> > >
> >> > > Please avoid negative logic, just swap arms of the conditional around.
> >> >
> >> > It was just meant as an example fix.  I don't think this is appropriate
> >> > at this stage nor is it complete.  A full fix would basically make
> >> > x86_field_alignment unnecessary which limits most modes alignment
> >> > to 32bit (but not vector or 128bit float modes).  And the conditional
> >> > needs updating to honor TARGET_ALIGN_DOUBLE.
> >>
> >> Yeah, at least for GCC 7 that change (quite major ABI change) is too
> >> dangerous IMHO.
> >
> > Yes, __alignof__ (double) would change.  But I think for correctness
> > at least __alignof__ (*(double *)p) needs to change.  So another
> > way to fix this would be to change the FE to use a differently
> > aligned double type in contexts where the default one is wrong
> > (but we do have too many == double_type_node checks everywhere...).
> >
> > Btw, we should eventually change ADJUST_FIELD_ALIGN to take the
> > type of the field instead of the FIELD_DECL (as said, only frv.c
> > looks at the field, all others just look at its type).
> 
> Digging a bit more through the documentation:
> 
> `-malign-double'
> `-mno-align-double'
>  Control whether GCC aligns `double', `long double', and `long
>  long' variables on a two word boundary or a one word boundary.
>  Aligning `double' variables on a two word boundary will produce
>  code that runs somewhat faster on a `Pentium' at the expense of
>  more memory.
> 
>  On x86-64, `-malign-double' is enabled by default.
> 
>  *Warning:* if you use the `-malign-double' switch, structures
>  containing the above types will be aligned differently than the
>  published application binary interface specifications for the 386
>  and will not be binary compatible with structures in code compiled
>  without that switch.
> 
> The text above implies that on 32bit targets we already have alignment
> to a word boundary (4-byte), unless -malign-double is used. So,
> proposed i386-modes.def patch seems redundant to me.

double bar (double *p)
{
  return *p;
}

(insn 6 5 7 (set (reg:DF 90)
(mem:DF (reg/v/f:SI 88 [ p ]) [1 *p_2(D)+0 S8 A64])) "t.c":3 -1
 (nil))

with -m32 and -mno-align-double.  The flag only affects alignment
of fields within structs.  So consider

struct X { int i; double x; } x;
foo ();

clearly the RTL for bar is bogus.

Richard.


[PATCH 0/6] Improve -fprefetch-loop-arrays in general and for AArch64 in particular

2017-01-30 Thread Maxim Kuvyrkov
This patch series improves the -fprefetch-loop-arrays pass through small fixes and 
tweaks, and then enables it for several AArch64 cores.

My tunings were done on and for Qualcomm hardware, with results varying between 
+0.5-1.9% for SPEC2006 INT and +0.25%-1.0% for SPEC2006 FP at -O3, depending on 
hardware revision.

This patch series enables restricted -fprefetch-loop-arrays at -O2, which also 
improves SPEC2006 numbers.

Biggest progressions are on 419.mcf and 437.leslie3d, with no serious 
regressions on other benchmarks.

I'm now investigating making -fprefetch-loop-arrays more aggressive for 
Qualcomm hardware, which improves performance on most benchmarks, but also 
causes big regressions on 454.calculix and 462.libquantum.  If I can fix these 
two regressions, prefetching will give another boost to AArch64.

Andrew just posted similar prefetching tunings for Cavium's cores, and the two 
patches have trivial conflicts.  I'll post mine as-is, since it addresses one of 
the comments on Andrew's review (adding a stand-alone struct for tuning 
parameters).

Andrew, feel free to just copy-paste it to your patch, since it is just a 
mechanical change.

All patches were bootstrapped and regtested on x86_64-linux-gnu and 
aarch64-linux-gnu.
 
--
Maxim Kuvyrkov
www.linaro.org





Re: [PATCH] Fix PR79256

2017-01-30 Thread Uros Bizjak
On Mon, Jan 30, 2017 at 11:56 AM, Richard Biener  wrote:
> On Mon, 30 Jan 2017, Jakub Jelinek wrote:
>
>> On Mon, Jan 30, 2017 at 11:47:51AM +0100, Richard Biener wrote:
>> > On Mon, 30 Jan 2017, Uros Bizjak wrote:
>> >
>> > > > 2017-01-30  Richard Biener  
>> > > >
>> > > > PR target/79277
>> > > > * config/i386/i386-modes.def: Align DFmode properly.
>> > >
>> > > Index: gcc/config/i386/i386-modes.def
>> > > ===
>> > > --- gcc/config/i386/i386-modes.def (revision 245021)
>> > > +++ gcc/config/i386/i386-modes.def (working copy)
>> > > @@ -33,6 +33,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
>> > >   : &ieee_extended_intel_96_format));
>> > >  ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
>> > >  ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
>> > > +ADJUST_ALIGNMENT (DF, !TARGET_64BIT ? 4 : 8);
>> > >
>> > > Please avoid negative logic, just swap arms of the conditional around.
>> >
>> > It was just meant as an example fix.  I don't think this is appropriate
>> > at this stage nor is it complete.  A full fix would basically make
>> > x86_field_alignment unnecessary which limits most modes alignment
>> > to 32bit (but not vector or 128bit float modes).  And the conditional
>> > needs updating to honor TARGET_ALIGN_DOUBLE.
>>
>> Yeah, at least for GCC 7 that change (quite major ABI change) is too
>> dangerous IMHO.
>
> Yes, __alignof__ (double) would change.  But I think for correctness
> at least __alignof__ (*(double *)p) needs to change.  So another
> way to fix this would be to change the FE to use a differently
> aligned double type in contexts where the default one is wrong
> (but we do have too many == double_type_node checks everywhere...).
>
> Btw, we should eventually change ADJUST_FIELD_ALIGN to take the
> type of the field instead of the FIELD_DECL (as said, only frv.c
> looks at the field, all others just look at its type).

Digging a bit more through the documentation:

`-malign-double'
`-mno-align-double'
 Control whether GCC aligns `double', `long double', and `long
 long' variables on a two word boundary or a one word boundary.
 Aligning `double' variables on a two word boundary will produce
 code that runs somewhat faster on a `Pentium' at the expense of
 more memory.

 On x86-64, `-malign-double' is enabled by default.

 *Warning:* if you use the `-malign-double' switch, structures
 containing the above types will be aligned differently than the
 published application binary interface specifications for the 386
 and will not be binary compatible with structures in code compiled
 without that switch.

The text above implies that on 32bit targets we already have alignment
to a word boundary (4-byte), unless -malign-double is used. So,
proposed i386-modes.def patch seems redundant to me.

Uros.


Re: [PATCH] Avoid implicitly declared __mpxrt_stop in libmpx

2017-01-30 Thread Alexander Ivchenko
Hi Jakub,

Thanks for noticing that. Of course the declaration had to be in the
header for this case

Alexander

2017-01-30 13:13 GMT+03:00 Jakub Jelinek :
> Hi!
>
> On Mon, Dec 26, 2016 at 06:15:01PM +0300, Alexander Ivchenko wrote:
>> Submitted as r243928.
>> >> (__mpxrt_stop): Ditto.
>
> I've noticed:
> ../../../../libmpx/mpxrt/mpxrt.c:255:6: warning: implicit declaration of 
> function ‘__mpxrt_stop’; did you mean ‘__mpxrt_mode’? 
> [-Wimplicit-function-declaration]
>   __mpxrt_stop ();
>   ^~~~
>   __mpxrt_mode
>
> The following patch fixes that warning, ok for trunk?
>
> 2017-01-30  Jakub Jelinek  
>
> * mpxrt/mpxrt-utils.h (__mpxrt_stop): New prototype.
>
> --- gcc/mpxrt/mpxrt-utils.h.jj  2016-12-27 15:35:06.0 +0100
> +++ gcc/mpxrt/mpxrt-utils.h 2017-01-30 10:31:51.502825561 +0100
> @@ -66,5 +66,6 @@ void __mpxrt_print (verbose_type vt, con
>  mpx_rt_mode_t __mpxrt_mode (void);
>  void __mpxrt_utils_free (void);
>  void __mpxrt_print_summary (uint64_t num_brs, uint64_t l1_size);
> +void __mpxrt_stop (void) __attribute__ ((noreturn));
>
>  #endif /* MPXRT_UTILS_H */
>
>
> Jakub



-- 
--Alexander


Re: [PATCH] Fix PR79256

2017-01-30 Thread Richard Biener
On Mon, 30 Jan 2017, Jakub Jelinek wrote:

> On Mon, Jan 30, 2017 at 11:47:51AM +0100, Richard Biener wrote:
> > On Mon, 30 Jan 2017, Uros Bizjak wrote:
> > 
> > > > 2017-01-30  Richard Biener  
> > > >
> > > > PR target/79277
> > > > * config/i386/i386-modes.def: Align DFmode properly.
> > > 
> > > Index: gcc/config/i386/i386-modes.def
> > > ===
> > > --- gcc/config/i386/i386-modes.def (revision 245021)
> > > +++ gcc/config/i386/i386-modes.def (working copy)
> > > @@ -33,6 +33,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
> > >   : &ieee_extended_intel_96_format));
> > >  ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> > >  ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> > > +ADJUST_ALIGNMENT (DF, !TARGET_64BIT ? 4 : 8);
> > > 
> > > Please avoid negative logic, just swap arms of the conditional around.
> > 
> > It was just meant as an example fix.  I don't think this is appropriate
> > at this stage nor is it complete.  A full fix would basically make
> > x86_field_alignment unnecessary which limits most modes alignment
> > to 32bit (but not vector or 128bit float modes).  And the conditional
> > needs updating to honor TARGET_ALIGN_DOUBLE.
> 
> Yeah, at least for GCC 7 that change (quite major ABI change) is too
> dangerous IMHO.

Yes, __alignof__ (double) would change.  But I think for correctness
at least __alignof__ (*(double *)p) needs to change.  So another
way to fix this would be to change the FE to use a differently
aligned double type in contexts where the default one is wrong
(but we do have too many == double_type_node checks everywhere...).

Btw, we should eventually change ADJUST_FIELD_ALIGN to take the
type of the field instead of the FIELD_DECL (as said, only frv.c
looks at the field, all others just look at its type).

Richard.


Re: [PATCH] Fix PR79256

2017-01-30 Thread Jakub Jelinek
On Mon, Jan 30, 2017 at 11:47:51AM +0100, Richard Biener wrote:
> On Mon, 30 Jan 2017, Uros Bizjak wrote:
> 
> > > 2017-01-30  Richard Biener  
> > >
> > > PR target/79277
> > > * config/i386/i386-modes.def: Align DFmode properly.
> > 
> > Index: gcc/config/i386/i386-modes.def
> > ===
> > --- gcc/config/i386/i386-modes.def (revision 245021)
> > +++ gcc/config/i386/i386-modes.def (working copy)
> > @@ -33,6 +33,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
> >   : &ieee_extended_intel_96_format));
> >  ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
> >  ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> > +ADJUST_ALIGNMENT (DF, !TARGET_64BIT ? 4 : 8);
> > 
> > Please avoid negative logic, just swap arms of the conditional around.
> 
> It was just meant as an example fix.  I don't think this is appropriate
> at this stage nor is it complete.  A full fix would basically make
> x86_field_alignment unnecessary which limits most modes alignment
> to 32bit (but not vector or 128bit float modes).  And the conditional
> needs updating to honor TARGET_ALIGN_DOUBLE.

Yeah, at least for GCC 7 that change (quite major ABI change) is too
dangerous IMHO.

Jakub


Re: [PATCH] Fix PR79256

2017-01-30 Thread Richard Biener
On Mon, 30 Jan 2017, Uros Bizjak wrote:

> > 2017-01-30  Richard Biener  
> >
> > PR target/79277
> > * config/i386/i386-modes.def: Align DFmode properly.
> 
> Index: gcc/config/i386/i386-modes.def
> ===
> --- gcc/config/i386/i386-modes.def (revision 245021)
> +++ gcc/config/i386/i386-modes.def (working copy)
> @@ -33,6 +33,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
>   : &ieee_extended_intel_96_format));
>  ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
>  ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
> +ADJUST_ALIGNMENT (DF, !TARGET_64BIT ? 4 : 8);
> 
> Please avoid negative logic, just swap arms of the conditional around.

It was just meant as an example fix.  I don't think this is appropriate
at this stage nor is it complete.  A full fix would basically make
x86_field_alignment unnecessary which limits most modes alignment
to 32bit (but not vector or 128bit float modes).  And the conditional
needs updating to honor TARGET_ALIGN_DOUBLE.

So I leave such change to target maintainers and continue papering
over this issue in the middle-end (at least for GCC 7).

Thanks,
Richard.


Re: [PATCH] Fix PR79256

2017-01-30 Thread Uros Bizjak
> 2017-01-30  Richard Biener  
>
> PR target/79277
> * config/i386/i386-modes.def: Align DFmode properly.

Index: gcc/config/i386/i386-modes.def
===
--- gcc/config/i386/i386-modes.def (revision 245021)
+++ gcc/config/i386/i386-modes.def (working copy)
@@ -33,6 +33,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
   : &ieee_extended_intel_96_format));
 ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
 ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
+ADJUST_ALIGNMENT (DF, !TARGET_64BIT ? 4 : 8);

Please avoid negative logic, just swap arms of the conditional around.

Uros.


RE: [PATCH 0/2] [ARC] Define __NPS400__ properly

2017-01-30 Thread Claudiu Zissulescu
Hi,

> 
> Andrew Burgess (2):
>   ARC: Make arc_selected_cpu global
>   ARC: Better creation of __NPS400__ define
> 
>  gcc/ChangeLog   | 31 
>  gcc/config/arc/arc-arch.h   | 50 ++
> ---
>  gcc/config/arc/arc-c.def|  1 +
>  gcc/config/arc/arc.c| 35 ++-
>  gcc/config/arc/arc.h| 24 ++
>  gcc/config/arc/driver-arc.c |  2 +-
>  6 files changed, 88 insertions(+), 55 deletions(-)

I haven't checked them functionally, but they seem alright to me.

Thank you for your contribution,
Claudiu


[PATCH] Fix PR79256

2017-01-30 Thread Richard Biener

The following fixes PR79256 in a way suitable for stage4.  But the
underlying issue is that our IL advertises bogus alignment for
types like double on i?86: their alignment inside structures
is 4 bytes, not 8, yet double_type_node advertises 64-bit alignment.
The UBSAN folks invented the min_align_of_type "hack", quite an
expensive one: it builds a FIELD_DECL just for frv's target hook
(which also looks really broken); the other targets would be happy
with just the type of the decl.

So after this fix both RTL expansion and get_object_alignment still
compute bogus alignment for *(double *)p accesses.  We could fix
up get_object_alignment as well (but given min_align_of_type cost
I'd rather not do that w/o lowering the cost by changning the hook).
And we can fix this properly by fixing the target(s) to not claim
excessive alignment of such modes.  Patches attached below
(the x86 fix bootstrapped fine on x86_64, testing is in progress,
I expect code quality fallout and eventually ABI breakage of course).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Note RTL expansion has been broken since forever; I checked 3.4.[06] (x86_64
is new in that release).

I do think the proper fix is in the targets.  double_type_node
may not advertise 64bit alignment.

Richard.

2017-01-30  Richard Biener  

PR tree-optimization/79256
* targhooks.c (default_builtin_vector_alignment_reachable): Honor
BIGGEST_FIELD_ALIGNMENT and ADJUST_FIELD_ALIGN to fix up bogus
alignment on TYPE.
* tree.c (build_aligned_type): Set TYPE_USER_ALIGN.

Index: gcc/targhooks.c
===
--- gcc/targhooks.c (revision 244974)
+++ gcc/targhooks.c (working copy)
@@ -1130,9 +1130,14 @@ default_vector_alignment (const_tree typ
 /* By default assume vectors of element TYPE require a multiple of the natural
alignment of TYPE.  TYPE is naturally aligned if IS_PACKED is false.  */
 bool
-default_builtin_vector_alignment_reachable (const_tree /*type*/, bool 
is_packed)
+default_builtin_vector_alignment_reachable (const_tree type, bool is_packed)
 {
-  return ! is_packed;
+  if (is_packed)
+return false;
+
+  /* If TYPE can be differently aligned in field context we have to punt
+ as TYPE may have wrong TYPE_ALIGN here (PR79278).  */
+  return min_align_of_type (CONST_CAST_TREE (type)) == TYPE_ALIGN_UNIT (type);
 }
 
 /* By default, assume that a target supports any factor of misalignment
Index: gcc/tree.c
===
--- gcc/tree.c  (revision 244974)
+++ gcc/tree.c  (working copy)
@@ -6684,6 +6684,7 @@ build_aligned_type (tree type, unsigned
 
   t = build_variant_type_copy (type);
   SET_TYPE_ALIGN (t, align);
+  TYPE_USER_ALIGN (t) = 1;
 
   return t;
 }


2017-01-30  Richard Biener  

PR tree-optimization/79256
PR middle-end/79278
* builtins.c (get_object_alignment_2): Use min_align_of_type
to extract alignment for MEM_REFs to honor BIGGEST_FIELD_ALIGNMENT
and ADJUST_FIELD_ALIGN.

Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 244974)
+++ gcc/builtins.c  (working copy)
@@ -334,9 +334,11 @@ get_object_alignment_2 (tree exp, unsign
 Do so only if get_pointer_alignment_1 did not reveal absolute
 alignment knowledge and if using that alignment would
 improve the situation.  */
+  unsigned int talign;
   if (!addr_p && !known_alignment
- && TYPE_ALIGN (TREE_TYPE (exp)) > align)
-   align = TYPE_ALIGN (TREE_TYPE (exp));
+ && (talign = min_align_of_type (TREE_TYPE (exp)) * BITS_PER_UNIT)
+ && talign > align)
+   align = talign;
   else
{
  /* Else adjust bitpos accordingly.  */


2017-01-30  Richard Biener  

PR target/79277
* config/i386/i386-modes.def: Align DFmode properly.

Index: gcc/config/i386/i386-modes.def
===
--- gcc/config/i386/i386-modes.def  (revision 245021)
+++ gcc/config/i386/i386-modes.def  (working copy)
@@ -33,6 +33,7 @@ ADJUST_FLOAT_FORMAT (XF, (TARGET_128BIT_
  : &ieee_extended_intel_96_format));
 ADJUST_BYTESIZE  (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 12);
 ADJUST_ALIGNMENT (XF, TARGET_128BIT_LONG_DOUBLE ? 16 : 4);
+ADJUST_ALIGNMENT (DF, !TARGET_64BIT ? 4 : 8);
 
 /* Add any extra modes needed to represent the condition code.
 


Re: [gomp4] optimize GOMP_MAP_TO_PSET

2017-01-30 Thread Thomas Schwinge
Hi!

On Fri, 27 Jan 2017 08:06:22 -0800, Cesar Philippidis  
wrote:
> This patch optimizes GOMP_MAP_TO_PSET in libgomp by installing the
> remapped pointer to the array data directly in the PSET, instead of
> uploading it separately with GOMP_MAP_POINTER. Effectively this
> eliminates the GOMP_MAP_POINTER that is associated with the PSET,
> thereby eliminating an additional host2dev data transfer.
> 
> While it does work, it's not quite as effective as I had hope it would
> be. I'm only observing about 0.05s, if that, in CloverLeaf, and arguably
> that's statistical noise.

Ah, too bad.

> This is probably because CloverLeaf makes use
> of ACC DATA regions in the critical sections, so all of those PSETs and
> POINTERs are already preset on the accelerator.
> 
> One thing I don't like about this patch is that I'm updating the host's
> copy of the PSET prior to uploading it. The host's PSET does get
> restored prior to returning from gomp_map_vars, however that might
> impact things if the host were to run in multi-threaded applications.
> Maybe I'll drop this patch from gomp4 since it's not very effective.

... also there is some bug somewhere; I see:

PASS: libgomp.fortran/examples-4/async_target-2.f90   -O0  (test for excess 
errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O0  
execution test
PASS: libgomp.fortran/examples-4/async_target-2.f90   -O1  (test for excess 
errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O1  
execution test
PASS: libgomp.fortran/examples-4/async_target-2.f90   -O2  (test for excess 
errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O2  
execution test
PASS: libgomp.fortran/examples-4/async_target-2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
(test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O3 
-fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
execution test
PASS: libgomp.fortran/examples-4/async_target-2.f90   -O3 -g  (test for 
excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O3 -g  
execution test
PASS: libgomp.fortran/examples-4/async_target-2.f90   -Os  (test for excess 
errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -Os  
execution test

..., and:

PASS: libgomp.fortran/target3.f90   -O0  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O0  execution test
PASS: libgomp.fortran/target3.f90   -O1  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O1  execution test
PASS: libgomp.fortran/target3.f90   -O2  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O2  execution test
PASS: libgomp.fortran/target3.f90   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O3 -fomit-frame-pointer 
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
PASS: libgomp.fortran/target3.f90   -O3 -g  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O3 -g  execution test
PASS: libgomp.fortran/target3.f90   -Os  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -Os  execution test

In all cases, the run-time error message is:

libgomp: Pointer target of array section wasn't mapped


For reference, I'm appending the patch, which wasn't included in the
original email.


Grüße
 Thomas


commit 29f783a0e2162fea3e21e7f4295bc16f252e1c1f
Author: cesar 
Date:   Fri Jan 27 15:51:52 2017 +

Optimize GOMP_MAP_TO_PSET.

libgomp/
* target.c (gomp_map_pset): New function.
(gomp_map_vars): Update GOMP_MAP_POINTER with GOMP_MAP_TO_PSET, when
possible.



git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@244984 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp |   6 +++
 libgomp/target.c   | 101 ++---
 2 files changed, 77 insertions(+), 30 deletions(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 40b536f..0fb5297 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,11 @@
 2017-01-27  Cesar Philippidis  
 
+   * target.c (gomp_map_pset): New function.
+   (gomp_map_vars): Update GOMP_MAP_POINTER with GOMP_MAP_TO_PSET, when
+   possible.
+
+2017-01-27  Cesar Philippidis  
+
* plugin/plugin-nvptx.c (nvptx_exec): Make aware of
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
diff --git libgomp/target.c libgomp/target.c
index 

Re: [gomp4] enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC

2017-01-30 Thread Thomas Schwinge
Hi Cesar!

On Fri, 27 Jan 2017 07:45:52 -0800, Cesar Philippidis  
wrote:
> If you take a close look at lower_omp_target, you'll notice that I
> gave reference types special treatment. Specifically, I disabled this
> optimization on non-INTEGER_TYPE and floating point values, because the
> nvptx target was having some problems dereferencing boolean-typed
> pointers. That's something I have on my TODO list to track down later.

Please file an issue as appropriate.

> As for the performance gains, this optimization resulted in a
> non-trivial speedup in CloverLeaf running on a Nvidia Pascal board.
> CloverLeaf is somewhat special in that it consists of a lot of OpenACC
> offloaded regions which gets called multiple times throughout its
> execution. Consequently, it is I/O limited. The other benchmarks I ran
> didn't benefit nearly as much as CloverLeaf. I chose a small data set
> for CloverLeaf that only ran in 1.3s without the patch, hence making
> it even more I/O limited. After the patch, it ran 0.35s faster.

\o/ Yay!

> This patch has been applied to gomp-4_0-branch.

(Not reviewed in detail.)

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c

> +static tree
> +convert_from_firstprivate_pointer (tree var, bool is_ref, gimple_seq *gs)
> +{
> +  tree type = TREE_TYPE (var);
> +  tree new_type = NULL_TREE;
> +  tree tmp = NULL_TREE;
> +  tree inner_type = NULL_TREE;

[...]/source-gcc/gcc/omp-low.c: In function 'tree_node* 
convert_from_firstprivate_pointer(tree, bool, gimple**)':
[...]/source-gcc/gcc/omp-low.c:16515:8: warning: unused variable 
'inner_type' [-Wunused-variable]


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90

I see:

{+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  
(internal compiler error)+}
{+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O   4 
blank line(s) in output+}
{+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  (test 
for excess errors)+}
{+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 
-DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  
compilation failed to produce executable+}

That's the nvptx offloading compiler configured with
"--enable-checking=yes,df,fold,rtl":


[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90: 
In function 'MAIN__._omp_fn.1':

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
 error: conversion of register to a different size
VIEW_CONVERT_EXPR(_17);

_18 = VIEW_CONVERT_EXPR(_17);

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
 error: conversion of register to a different size
VIEW_CONVERT_EXPR(_20);

_21 = VIEW_CONVERT_EXPR(_20);

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
 error: conversion of register to a different size
VIEW_CONVERT_EXPR(_23);

_24 = VIEW_CONVERT_EXPR(_23);

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
 error: conversion of register to a different size
VIEW_CONVERT_EXPR(_26);

_27 = VIEW_CONVERT_EXPR(_26);

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
 internal compiler error: verify_gimple failed
0xa67d75 verify_gimple_in_cfg(function*, bool)
[...]/source-gcc/gcc/tree-cfg.c:5125
0x94ebbc execute_function_todo
[...]/source-gcc/gcc/passes.c:1958
0x94f513 execute_todo
[...]/source-gcc/gcc/passes.c:2010


And with "-m32" multilib testing, I see:

{+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 -DACC_DEVICE_TYPE_host=1 
-DACC_MEM_SHARED=1 -foffload=disable  -O  (test for excess errors)+}
{+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 
-DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O  compilation 
failed to produce executable+}

That is:


[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:10:18:
 Error: Kind 16 not supported for type INTEGER at (1)

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:16:18:
 Error: Kind 16 not supported for type LOGICAL at (1)

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:115:18:
 Error: Kind 16 not supported for type INTEGER at (1)

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:121:18:
 Error: Kind 16 not supported for type LOGICAL at (1)

[...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:31:6:
 Error: Symbol 'i16i' at 

[PATCH] Avoid implicitly declared __mpxrt_stop in libmpx

2017-01-30 Thread Jakub Jelinek
Hi!

On Mon, Dec 26, 2016 at 06:15:01PM +0300, Alexander Ivchenko wrote:
> Submitted as r243928.
> >> (__mpxrt_stop): Ditto.

I've noticed:
../../../../libmpx/mpxrt/mpxrt.c:255:6: warning: implicit declaration of 
function ‘__mpxrt_stop’; did you mean ‘__mpxrt_mode’? [-Wimplicit-function-declaration]
  __mpxrt_stop ();
  ^~~~
  __mpxrt_mode

The following patch fixes that warning, ok for trunk?

2017-01-30  Jakub Jelinek  

* mpxrt/mpxrt-utils.h (__mpxrt_stop): New prototype.

--- gcc/mpxrt/mpxrt-utils.h.jj  2016-12-27 15:35:06.0 +0100
+++ gcc/mpxrt/mpxrt-utils.h 2017-01-30 10:31:51.502825561 +0100
@@ -66,5 +66,6 @@ void __mpxrt_print (verbose_type vt, con
 mpx_rt_mode_t __mpxrt_mode (void);
 void __mpxrt_utils_free (void);
 void __mpxrt_print_summary (uint64_t num_brs, uint64_t l1_size);
+void __mpxrt_stop (void) __attribute__ ((noreturn));
 
 #endif /* MPXRT_UTILS_H */


Jakub


Re: [RFC] Bug lto/78140

2017-01-30 Thread Richard Biener
On Mon, Jan 30, 2017 at 12:23 AM, kugan
 wrote:
> Hi All,
>
> As suggested by Richard in the PR, I tried to implement variable size
> structures for VR as shown in attached patch. That is, I changed ipa-prop.h
> to:
>
> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index 93a2390c..acab2aa 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -157,13 +157,15 @@ struct GTY(()) ipa_bits
>  };
>
>  /* Info about value ranges.  */
> -struct GTY(()) ipa_vr
> +struct GTY ((variable_size)) ipa_vr
>  {
>/* The data fields below are valid only if known is true.  */
>bool known;
>enum value_range_type type;
> -  wide_int min;
> -  wide_int max;
> +  /* Minimum and maximum.  */
> +  TRAILING_WIDE_INT_ACCESSOR (min, ints, 0)
> +  TRAILING_WIDE_INT_ACCESSOR (max, ints, 1)
> +  trailing_wide_ints <2> ints;
>  };
>
>  /* A jump function for a callsite represents the values passed as actual
> @@ -525,7 +527,7 @@ struct GTY(()) ipcp_transformation_summary
>/* Known bits information.  */
>vec *bits;
>/* Value range information.  */
> -  vec *m_vr;
> +  vec *m_vr;
>  };
>
>  void ipa_set_node_agg_value_chain (struct cgraph_node *node,
>
> However, I am running into error when I do LTO bootstrap that memory seems
> to have deallocated by the garbage collector. Since we have the reference to
> the memory allocated by ggc_internal_alloc in the vector (m_vr), I thought
> it will not be deallocated. But during the bootstrap, when in
> ipa_node_params_t::duplicate, it seems to have been deallocated as shown in
> the back trace. I don't understand the internals of GC in GCC, so any help is
> appreciated.
>
>
> lto1: internal compiler error: Segmentation fault
> 0xdedc4b crash_signal
> ../../gcc/gcc/toplev.c:333
> 0xb46680 ipa_node_params_t::duplicate(cgraph_node*, cgraph_node*,
> ipa_node_params*, ipa_node_params*)
> ../../gcc/gcc/ipa-prop.c:3819
> 0xb306a3
> function_summary<ipa_node_params*>::symtab_duplication(cgraph_node*,
> cgraph_node*, void*)
> ../../gcc/gcc/symbol-summary.h:187
> 0x85aba7 symbol_table::call_cgraph_duplication_hooks(cgraph_node*,
> cgraph_node*)
> ../../gcc/gcc/cgraph.c:488
> 0x8765bf cgraph_node::create_clone(tree_node*, long, int, bool,
> vec<cgraph_edge*, va_heap, vl_ptr>, bool, cgraph_node*, bitmap_head*, char
> const*)
> ../../gcc/gcc/cgraphclones.c:522
> 0x166fb3b clone_inlined_nodes(cgraph_edge*, bool, bool, int*, int)
> ../../gcc/gcc/ipa-inline-transform.c:227
> 0x166fbd7 clone_inlined_nodes(cgraph_edge*, bool, bool, int*, int)
> ../../gcc/gcc/ipa-inline-transform.c:242
> 0x1670893 inline_call(cgraph_edge*, bool, vec<cgraph_edge*, va_heap,
> vl_ptr>*, int*, bool, bool*)
> ../../gcc/gcc/ipa-inline-transform.c:449
> 0x1665bd3 inline_small_functions
> ../../gcc/gcc/ipa-inline.c:2024
> 0x1667157 ipa_inline
> ../../gcc/gcc/ipa-inline.c:2434
> 0x1667fa7 execute
> ../../gcc/gcc/ipa-inline.c:2845
>
>
> I tried a similar thing without the variable-size structure:
> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index 93a2390c..b0cc832 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -525,7 +525,7 @@ struct GTY(()) ipcp_transformation_summary
>/* Known bits information.  */
>vec *bits;
>/* Value range information.  */
> -  vec *m_vr;
> +  vec *m_vr;
>  };
>
> This also has the same issue, so I don't think it has anything to do with
> the variable-size structure.

You have to debug that detail yourself but I wonder why the transformation
summary has a different representation than the jump function (and I think
the jump function size is the issue).

The JF has

  /* Information about zero/non-zero bits.  */
  struct ipa_bits bits;

  /* Information about value range, containing valid data only when vr_known is
 true.  */
  value_range m_vr;
  bool vr_known;

with ipa_bits having two widest_ints and value_range having two trees
and an unused bitmap and ipa_vr having two wide_ints (widest_ints are
smaller!).

Richard.

>
> Thanks,
> Kugan


Re: [gomp4] add support for derived types in ACC UPDATE

2017-01-30 Thread Thomas Schwinge
Hi Cesar!

On Thu, 10 Nov 2016 09:38:33 -0800, Cesar Philippidis  
wrote:
> This patch has been committed to gomp-4_0-branch.

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c

> @@ -242,7 +243,8 @@ gfc_match_omp_variable_list (const char *str, 
> gfc_omp_namelist **list,
>   case MATCH_YES:
> gfc_expr *expr;
> expr = NULL;
> -   if (allow_sections && gfc_peek_ascii_char () == '(')
> +   if (allow_sections && gfc_peek_ascii_char () == '('
> +   || allow_derived && gfc_peek_ascii_char () == '%')
>   {
> gfc_current_locus = cur_loc;
> m = gfc_match_variable (&expr, 0);

[...]/source-gcc/gcc/fortran/openmp.c: In function 'match 
{+gfc_match_omp_variable_list(const char*, gfc_omp_namelist**, bool, bool*, 
gfc_omp_namelist***, bool, bool)':+}
[...]/source-gcc/gcc/fortran/openmp.c:246:23: warning: suggest parentheses 
around '&&' within '||' [-Wparentheses]
if (allow_sections && gfc_peek_ascii_char () == '('
   ^

> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -1938,7 +1938,66 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, 
> gfc_omp_clauses *clauses,
> tree decl = gfc_get_symbol_decl (n->sym);
> if (DECL_P (decl))
>   TREE_ADDRESSABLE (decl) = 1;
> -   if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL)
> +   /* Handle derived-typed members for OpenACC Update.  */
> +   if (n->sym->ts.type == BT_DERIVED
> +   && n->expr != NULL && n->expr->ref != NULL
> +   && (n->expr->ref->next == NULL
> +   || (n->expr->ref->next != NULL
> +   && n->expr->ref->next->type == REF_ARRAY
> +   && n->expr->ref->next->u.ar.type == AR_FULL)))
> + {
> +   gfc_ref *ref = n->expr->ref;
> +   tree orig_decl = decl;

[...]/source-gcc/gcc/fortran/trans-openmp.c: In function 'tree_node* 
gfc_trans_omp_clauses_1(stmtblock_t*, gfc_omp_clauses*, locus, bool)':
[...]/source-gcc/gcc/fortran/trans-openmp.c:1947:10: warning: unused 
variable 'orig_decl' [-Wunused-variable]
 tree orig_decl = decl;
  ^

> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/goacc/derived-types.f90
> @@ -0,0 +1,78 @@
> +! Test ACC UPDATE with derived types. The DEVICE clause depends on an
> +! accelerator being present.

I guess that "DEVICE" comment here is a leftover?  (Doesn't apply to a
compile test.)

> +module dt
> +  integer, parameter :: n = 10
> +  type inner
> + integer :: d(n)
> +  end type inner
> +  type dtype
> + integer(8) :: a, b, c(n)
> + type(inner) :: in
> +  end type dtype
> +end module dt
> +
> +program derived_acc
> +  use dt
> +  
> +  implicit none
> +  type(dtype):: var
> +  integer i
> +  !$acc declare create(var)
> +  !$acc declare pcopy(var%a) ! { dg-error "Syntax error in OpenMP" }
> +
> +  !$acc update host(var)
> +  !$acc update host(var%a)
> +  !$acc update device(var)
> +  !$acc update device(var%a)
> +[...]

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/update-2.f90
> @@ -0,0 +1,285 @@
> +! Test ACC UPDATE with derived types. The DEVICE clause depends on an
> +! accelerator being present.

Why?  Shouldn't "!$acc update device" just be a no-op for host execution?


Grüße
 Thomas


> +! { dg-do run  { target openacc_nvidia_accel_selected } }
> +
> +module dt
> +  integer, parameter :: n = 10
> +  type inner
> + integer :: d(n)
> +  end type inner
> +  type mytype
> + integer(8) :: a, b, c(n)
> + type(inner) :: in
> +  end type mytype
> +end module dt
> +
> +program derived_acc
> +  use dt
> +
> +  implicit none
> +  integer i, res
> +  type(mytype) :: var
> +
> +  var%a = 0
> +  var%b = 1
> +  var%c(:) = 10
> +  var%in%d(:) = 100
> +
> +  var%c(:) = 10
> +
> +  !$acc enter data copyin(var)
> +
> +  !$acc parallel loop present(var)
> +  do i = 1, 1
> + var%a = var%b
> +  end do
> +  !$acc end parallel loop
> +
> +  !$acc update host(var%a)
> +
> +  if (var%a /= var%b) call abort
> +
> +  var%b = 100
> +
> +  !$acc update device(var%b)
> +
> +  !$acc parallel loop present(var)
> +  do i = 1, 1
> + var%a = var%b
> +  end do
> +  !$acc end parallel loop
> +
> +  !$acc update host(var%a)
> +
> +  if (var%a /= var%b) call abort
> +
> +  !$acc parallel loop present (var)
> +  do i = 1, n
> + var%c(i) = i
> +  end do
> +  !$acc end parallel loop
> +
> +  !$acc update host(var%c)
> +
> +  var%a = -1
> +
> +  do i = 1, n
> + if (var%c(i) /= i) call abort
> + var%c(i) = var%a
> +  end do
> +
> +  !$acc update device(var%a)
> +  !$acc update device(var%c)
> +
> +  res = 0
> +
> +  !$acc parallel loop present(var) reduction(+:res)
> +  do i = 1, n
> + if (var%c(i) /= var%a) res = res + 1
> +  end do
> +
> +  if (res /= 0) call abort
> +
> +  var%c(:) = 0
> +
> +  !$acc update device(var%c)
> +
> +  !$acc parallel loop 

Re: [PATCH/VECT/AARCH64] Improve cost model for ThunderX2 CN99xx

2017-01-30 Thread Richard Biener
On Sat, Jan 28, 2017 at 9:34 PM, Andrew Pinski  wrote:
> Hi,
>   On some (most) AARCH64 cores, it is not always profitable to
> vectorize some integer loops.  This patch does two things (I can split
> it into different patches if needed).
> 1) It splits the aarch64 back-end's vector cost model's vector and
> scalar costs into int and fp fields
> 1a) For thunderx2t99, models correctly the integer vector/scalar costs.
> 2) Fixes/Improves a few calls to record_stmt_cost in tree-vect-loop.c
> where stmt_info was not being passed.
>
> OK?  Bootstrapped and tested on aarch64-linux-gnu and provides 20% on
> libquantum and ~1% overall on SPEC CPU 2006 int.

+   {
+ struct _stmt_vec_info *stmt_info
+   = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;

use stmt_vec_info instead of 'struct _stmt_vec_info *'.

The vectorizer changes are ok with that change.

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Pass
> stmt_info to record_stmt_cost.
> (vect_get_known_peeling_cost): Pass stmt_info if known to record_stmt_cost.
>
> * config/aarch64/aarch64-protos.h (cpu_vector_cost): Split
> cpu_vector_cost field into
> scalar_int_stmt_cost and scalar_fp_stmt_cost.  Split vec_stmt_cost
> field into vec_int_stmt_cost and vec_fp_stmt_cost.
> * config/aarch64/aarch64.c (generic_vector_cost): Update for the
> splitting of scalar_stmt_cost and vec_stmt_cost.
> (thunderx_vector_cost): Likewise.
> (cortexa57_vector_cost): Likewise.
> (exynosm1_vector_cost): Likewise.
> (xgene1_vector_cost): Likewise.
> (thunderx2t99_vector_cost): Improve after the splitting of the two fields.
> (aarch64_builtin_vectorization_cost): Update for the splitting of
> scalar_stmt_cost and vec_stmt_cost.


[PATCH] Actually fix libhsail-rt build on x86_64/i?86 32-bit

2017-01-30 Thread Jakub Jelinek
Hi!

I've noticed a bug in libhsail-rt/configure.tgt - those files are actually
pure shell script, not autoconf templates (they are just sourced from
the autoconf templates as well as configure scripts generated by autoconf),
so [[ ]] doesn't really work there.

After fixing that I've noticed that libhsail-rt actually doesn't build at
all as 32-bit.  The most important reason is that the code uses __int128_t
and __uint128_t, which are not supported on ILP32.  While computing the
saturating arithmetics that way is probably efficient for 8-bit and 16-bit,
I believe even on x86_64 it is unnecessarily slow for 32-bit and very slow
for 64-bit SAT computations, so I rewrote those to use the overflow
builtins, which should emit efficient code.  Do you have a sufficient
testsuite to test those?  I also don't understand the 
uint8_t __hsail_sat_sub_u8 (uint8_t a, uint8_t b)
{
  int16_t c = (uint16_t) a - (uint16_t) b;
  if (c < 0)
return 0;
  else if (c > UINT8_MAX)
return UINT8_MAX;
  else
return c;
}
How could subtraction of unsigned values ever yield a value larger than
the unsigned maximum?  For a and b in [0, UINT8_MAX] a - b with infinite
precision is in the range [ -UINT8_MAX, UINT8_MAX ], so only the if (c < 0)
return 0; else return c; makes sense there to me.
The rest of the patch consists of workarounds for pointer-to-int and
int-to-pointer conversion warnings, and the last one strips away a volatile
qualifier.  Haven't tested whether it actually works then (but I only see
dg-do compile tests for brig anyway).

OT, should we ask for a DWARF language code for BRIG?  Shall it be
DW_LANG_hsail or DW_LANG_brig, something else?

2017-01-30  Jakub Jelinek  

* configure.tgt: Fix i?86-*-linux* entry.
* rt/sat_arithmetic.c (__hsail_sat_add_u32, __hsail_sat_add_u64,
__hsail_sat_add_s32, __hsail_sat_add_s64): Use __builtin_add_overflow.
(__hsail_sat_sub_u8, __hsail_sat_sub_u16): Remove pointless check for
overflow over maximum.
(__hsail_sat_sub_u32, __hsail_sat_sub_u64, __hsail_sat_sub_s32,
__hsail_sat_sub_s64): Use __builtin_sub_overflow.
(__hsail_sat_mul_u32, __hsail_sat_mul_u64, __hsail_sat_mul_s32,
__hsail_sat_mul_s64): Use __builtin_mul_overflow.
* rt/arithmetic.c (__hsail_borrow_u32, __hsail_borrow_u64): Use
__builtin_sub_overflow_p.
(__hsail_carry_u32, __hsail_carry_u64): Use __builtin_add_overflow_p.
* rt/misc.c (__hsail_groupbaseptr, __hsail_kernargbaseptr_u64):
Cast pointers to uintptr_t first before casting to some other integral
type.
* rt/segment.c (__hsail_segmentp_private, __hsail_segmentp_group): 
Likewise.
* rt/queue.c (__hsail_ldqueuereadindex, __hsail_ldqueuewriteindex,
__hsail_addqueuewriteindex, __hsail_casqueuewriteindex,
__hsail_stqueuereadindex, __hsail_stqueuewriteindex): Cast integral 
value
to uintptr_t first before casting to pointer.
* rt/workitems.c (__hsail_alloca_pop_frame): Cast memcpy first argument 
to
void * to avoid warning.

--- libhsail-rt/configure.tgt.jj2017-01-30 09:31:51.0 +0100
+++ libhsail-rt/configure.tgt   2017-01-30 09:35:04.402926654 +0100
@@ -28,9 +28,7 @@
 # broken systems. Currently it has been tested only on x86_64 Linux
 # of the upstream gcc targets. More targets shall be added after testing.
 case "${target}" in
-  i[[3456789]]86-*linux*)
-;;
-  x86_64-*-linux*)
+  x86_64-*-linux* | i?86-*-linux*)
 ;;
 *)
 UNSUPPORTED=1
--- libhsail-rt/rt/sat_arithmetic.c.jj  2017-01-26 09:17:46.0 +0100
+++ libhsail-rt/rt/sat_arithmetic.c 2017-01-30 10:27:27.861325330 +0100
@@ -49,21 +49,18 @@ __hsail_sat_add_u16 (uint16_t a, uint16_
 uint32_t
 __hsail_sat_add_u32 (uint32_t a, uint32_t b)
 {
-  uint64_t c = (uint64_t) a + (uint64_t) b;
-  if (c > UINT32_MAX)
+  uint32_t c;
+  if (__builtin_add_overflow (a, b, &c))
 return UINT32_MAX;
-  else
-return c;
+  return c;
 }
 
 uint64_t
 __hsail_sat_add_u64 (uint64_t a, uint64_t b)
 {
-  __uint128_t c = (__uint128_t) a + (__uint128_t) b;
-  if (c > UINT64_MAX)
+  uint64_t c;
+  if (__builtin_add_overflow (a, b, &c))
 return UINT64_MAX;
-  else
-return c;
+  return c;
 }
 
 int8_t
@@ -93,25 +90,19 @@ __hsail_sat_add_s16 (int16_t a, int16_t
 int32_t
 __hsail_sat_add_s32 (int32_t a, int32_t b)
 {
-  int64_t c = (int64_t) a + (int64_t) b;
-  if (c > INT32_MAX)
-return INT32_MAX;
-  else if (c < INT32_MIN)
-return INT32_MIN;
-  else
-return c;
+  int32_t c;
+  if (__builtin_add_overflow (a, b, &c))
+return b < 0 ? INT32_MIN : INT32_MAX;
+  return c;
 }
 
 int64_t
 __hsail_sat_add_s64 (int64_t a, int64_t b)
 {
-  __int128_t c = (__int128_t) a + (__int128_t) b;
-  if (c > INT64_MAX)
-return INT64_MAX;
-  else if (c < INT64_MIN)
-return INT64_MIN;
-  else
-return c;
+  int64_t c;
+  if (__builtin_add_overflow (a, b, &c))
+return b < 0 ? INT64_MIN : INT64_MAX;
+  return c;
 }
 

Re: [PATCH v2] S/390: PR target/79240: Fix assertion in s390_extzv_shift_ok.

2017-01-30 Thread Andreas Krebbel
On 01/26/2017 09:42 PM, Dominik Vogt wrote:
> On Thu, Jan 26, 2017 at 05:45:18PM +0100, Jakub Jelinek wrote:
>> On Thu, Jan 26, 2017 at 05:43:13PM +0100, Dominik Vogt wrote:
 If the predicates are supposed to ensure it, then I think the assert is
 fine.
>>>
>>> Is it guaranteed that the predicate conditions are evaluated
>>> before executing the conditions?
>>
>> Yes.  You can see it in insn-recog.c...
> 
> Updated patch attached.
> 
> changes:
> 
>   * Don't remove assertion.
>   * Use simplified test case.
> 
> Bootstrapped and regression tested on a zEC12 with s390x biarch
> and s390.

Applied. Thanks!

-Andreas-




Re: [PATCH][RFA][PR tree-optimization/79095] Improve overflow test optimization and avoid invalid warnings

2017-01-30 Thread Richard Biener
On Fri, Jan 27, 2017 at 11:21 PM, Jeff Law  wrote:
> On 01/27/2017 02:35 PM, Richard Biener wrote:
>>
>> On January 27, 2017 7:30:07 PM GMT+01:00, Jeff Law  wrote:
>>>
>>> On 01/27/2017 05:08 AM, Richard Biener wrote:

 On Fri, Jan 27, 2017 at 10:02 AM, Marc Glisse 
>>>
>>> wrote:
>
> On Thu, 26 Jan 2017, Jeff Law wrote:
>
>>> I assume this causes a regression for code like
>>>
>>> unsigned f(unsigned a){
>>>   unsigned b=a+1;
>>>   if(b<a)return 0;
>>>   return b;
>>> }
>>
>>
>> Yes.  The transformation ruins the conversion into ADD_OVERFLOW for
>>>
>>> the +-
>>
>> 1 case.  However, ISTM that we could potentially recover the
>>>
>>> ADD_OVERFLOW in
>>
>> phi-opt.  It's a very simple pattern that would be presented to
>>>
>>> phi-opt, so
>>
>> it might not be terrible to recover -- which has the advantage that
>>>
>>> if a
>>
>> user wrote an optimized overflow test we'd be able to recover
>>>
>>> ADD_OVERFLOW
>>
>> for it.
>
>
>
> phi-opt is a bit surprising at first glance because there can be
>>>
>>> overflow
>
> checking without condition/PHI, but if it is convenient to catch
>>>
>>> many
>
> cases...


 Yeah, and it's still on my TODO to add some helpers exercising
 match.pd COND_EXPR
 patterns from PHI nodes and their controlling condition.
>>>
>>> It turns out to be better to fix the existing machinery to detect
>>> ADD_OVERFLOW in the transformed case than to add new detection to
>>> phi-opt.
>>>
>>> The problem with improving the detection of ADD_OVERFLOW is that the
>>> transformed test may allow the ADD/SUB to be sunk.  So by the time we
>>> run the pass to detect ADD_OVERFLOW, the test and arithmetic may be in
>>> different blocks -- ugh.
>>>
>>> The more I keep thinking about this the more I wonder if transforming
>>> the conditional is just more of a headache than its worth -- the main
>>> need here is to drive propagation of known constants into the THEN/ELSE
>>>
>>> clauses.  Transforming the conditional makes that easy for VRP & DOM to
>>>
>>> discover those constant and the transform is easy to write in match.pd.
>>>
>>> But we could just go back to discovering the case in VRP or DOM via
>>> open-coding detection, then propagating the known constants without
>>> transforming the conditional.
>>
>>
>> Indeed we can do that.  And in fact with named patterns in match.pd you
>> could even avoid the open-coding.
>
> negate_expr_p being the example?  That does look mighty interesting... After
> recognition we'd still have to extract the operands, but it does look like
> it might handle the detection part.

Yes, the (match ..) stuff is actually exported in gimple-match.c (just
no declarations
are emitted to headers yet).  logical_inverted_value might be a better example
given it has an "output":

(match (logical_inverted_value @0)
 (truth_not @0))

bool
gimple_logical_inverted_value (tree t, tree *res_ops, tree
(*valueize)(tree) ATTRIBUTE_UNUSED)
{

you get @0 in res_ops[0] if that returns true.  I've at some point
written some of tree-vect-patterns.c
as match.pd (match...) but never really completed it.  In the end it
would be nice to write the patterns
in-inline at use points, say,

  /* (match (foo @0) ()) */
  if (gimple_foo (...))
   {
   }

and have some gen-program extract those patterns from source files
(plus inserting the necessary
prototypes locally?).

Richard.



> jeff


[PATCH] Fix aarch64 PGO bootstrap (bootstrap/78985)

2017-01-30 Thread Martin Liška
Hi.

The following patch simply fixes issues reported by -Wmaybe-uninitialized. That
enables PGO bootstrap
on a ThunderX aarch64 machine.

Ready to be installed?
Martin
From 6188df717836ee79f4d7951dca5066ef10b577a1 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 30 Jan 2017 10:44:07 +0100
Subject: [PATCH] Fix aarch64 PGO bootstrap (bootstrap/78985)

gcc/ChangeLog:

2017-01-30  Martin Liska  

	PR bootstrap/78985
	* config/aarch64/cortex-a57-fma-steering.c (func_fma_steering::analyze):
	Initialize variables with NULL value.
---
 gcc/config/aarch64/cortex-a57-fma-steering.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/aarch64/cortex-a57-fma-steering.c b/gcc/config/aarch64/cortex-a57-fma-steering.c
index a5acd71d784..4a3887984b4 100644
--- a/gcc/config/aarch64/cortex-a57-fma-steering.c
+++ b/gcc/config/aarch64/cortex-a57-fma-steering.c
@@ -923,10 +923,10 @@ func_fma_steering::analyze ()
   FOR_BB_INSNS (bb, insn)
 	{
 	  operand_rr_info *dest_op_info;
-	  struct du_chain *chain;
+	  struct du_chain *chain = NULL;
 	  unsigned dest_regno;
-	  fma_forest *forest;
-	  du_head_p head;
+	  fma_forest *forest = NULL;
+	  du_head_p head = NULL;
 	  int i;
 
 	  if (!is_fmul_fmac_insn (insn, true))
-- 
2.11.0


