Re: [PATCH] PR target/71549: Convert V1TImode register to TImode in debug insn

2016-07-08 Thread Markus Trippelsdorf
On 2016.06.28 at 11:21 -0700, Gary Funck wrote:
> On 06/20/16 04:55:16, H.J. Lu wrote:
> > TImode register referenced in debug insn can be converted to V1TImode
> > by scalar to vector optimization.  We need to convert a debug insn if
> > it has a variable in a TImode register.
> 
> We have a situation on a few of the UPC tests, where they ICE on
> this gcc_assert().
> 
> 3820  gcc_assert (REG_P (loc)
> 3821  && GET_MODE (loc) == V1TImode);
> 
> (gdb) p val
> $2 = (rtx) 0x7fffef6d3978
> (gdb) pr
> warning: Expression is not an assignment (and might have no effect)
> (var_location:TI newval (subreg:TI (reg/v/f:V1TI 307 [ newval ]) 0))
> 
> (gdb) p loc
> $1 = (rtx) 0x7fffef409210
> (gdb) pr
> warning: Expression is not an assignment (and might have no effect)
> (subreg:TI (reg/v/f:V1TI 307 [ newval ]) 0)
> 
> As you can see, 'loc' is already a TI mode subreg based upon a V1TI mode reg.
> 
> I didn't try tracking down how we end up with 'loc' as a subreg, but will
> note that in UPC the pointer-to-shared representation is a 16 byte struct,
> aligned on 16 bytes, so the generated code will frequently deal with TImode
> values in registers.
> 
> Given that the code that follows this assert,
> 
> 3822  /* Convert V1TImode register, which has been updated by a SET
> 3823 insn before, to SUBREG TImode.  */
> 3824  PAT_VAR_LOCATION_LOC (val) = gen_rtx_SUBREG (TImode, loc, 0);
> 3825  df_insn_rescan (insn);
> 
> converts the V1TImode register into a TImode subreg, and we already have
> that situation, I tried the following patch:
> 
> --- /a/gcc-trunk/gcc/config/i386/i386.c 2016-06-26 19:01:12.099740515 -0700
> +++ config/i386/i386.c  2016-06-28 11:17:26.323396045 -0700
> @@ -3814,6 +3814,9 @@
> continue;
>   gcc_assert (GET_CODE (val) == VAR_LOCATION);
>   rtx loc = PAT_VAR_LOCATION_LOC (val);
> + /* If already a SUBREG, skip.  */
> + if (SUBREG_P (loc))
> +   continue;
>   gcc_assert (REG_P (loc)
>   && GET_MODE (loc) == V1TImode);
>   /* Convert V1TImode register, which has been updated by a SET
> 
> 
> Can the patch be amended to include this fix?  Let me know if you need
> additional information, or would like me to try something else.

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71801, which has a
small testcase that is also fixed by Gary's patch.

-- 
Markus


Re: [PATCH] Support running the selftests under valgrind

2016-07-08 Thread Andrew Pinski
On Fri, Jul 8, 2016 at 12:46 PM, David Malcolm  wrote:
> This patch adds a new phony target to gcc/Makefile.in to make it easy
> to run the selftests under valgrind, via "make selftest-valgrind".
> This phony target isn't a dependency of anything; it's purely for
> convenience (it takes about 4-5 seconds on my box).
>
> Doing so uncovered a few leaks in the selftest suite, which the
> patch also fixes, so that it runs cleanly under valgrind (on
> x86_64-pc-linux-gnu, configured with --enable-valgrind-annotations,
> at least).
>
> Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.
> Manually verified that the valgrind output is "clean" on
> x86_64-pc-linux-gnu [1].
>
> OK for trunk?

I think this is a good idea.  I assume this is not turned on by default.
valgrind is still not fully working on aarch64 :).

Thanks,
Andrew

>
> [1]:
>
>  HEAP SUMMARY:
>  in use at exit: 1,203,983 bytes in 2,114 blocks
>total heap usage: 4,545 allocs, 2,431 frees, 3,212,841 bytes allocated
>
>  LEAK SUMMARY:
> definitely lost: 0 bytes in 0 blocks
> indirectly lost: 0 bytes in 0 blocks
>   possibly lost: 0 bytes in 0 blocks
> still reachable: 1,203,983 bytes in 2,114 blocks
>  suppressed: 0 bytes in 0 blocks
>  Reachable blocks (those to which a pointer was found) are not shown.
>  To see them, rerun with: --leak-check=full --show-leak-kinds=all
>
>  For counts of detected and suppressed errors, rerun with: -v
>  ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
>
> gcc/ChangeLog:
> * Makefile.in (selftest-valgrind): New phony target.
> * function-tests.c (selftest::build_cfg): Delete pass instances
> created by the test.
> (selftest::convert_to_ssa): Likewise.
> (selftest::test_expansion_to_rtl): Likewise.
> * tree-cfg.c (selftest::test_linear_chain): Release dominator
> vectors.
> (selftest::test_diamond): Likewise.
> ---
>  gcc/Makefile.in  | 6 ++
>  gcc/function-tests.c | 4 
>  gcc/tree-cfg.c   | 6 ++
>  3 files changed, 16 insertions(+)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 5e7422d..1a4b5d7 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1869,6 +1869,12 @@ s-selftest: $(GCC_PASSES) cc1$(exeext) stmp-int-hdrs
>  selftest-gdb: $(GCC_PASSES) cc1$(exeext) stmp-int-hdrs
> $(GCC_FOR_TARGET) -xc -S -c /dev/null -fself-test -wrapper gdb,--args
>
> +# Convenience method for running selftests under valgrind:
> +.PHONY: selftest-valgrind
> +selftest-valgrind: $(GCC_PASSES) cc1$(exeext) stmp-int-hdrs
> +   $(GCC_FOR_TARGET) -xc -S -c /dev/null -fself-test \
> + -wrapper valgrind,--leak-check=full
> +
>  # Recompile all the language-independent object files.
>  # This is used only if the user explicitly asks for it.
>  compilations: $(BACKEND)
> diff --git a/gcc/function-tests.c b/gcc/function-tests.c
> index c8188e7..edd355f 100644
> --- a/gcc/function-tests.c
> +++ b/gcc/function-tests.c
> @@ -296,6 +296,7 @@ build_cfg (tree fndecl)
>push_cfun (fun);
>lower_cf_pass->execute (fun);
>pop_cfun ();
> +  delete lower_cf_pass;
>
>/* We can now convert to CFG form; for our trivial test function this
>   gives us:
> @@ -310,6 +311,7 @@ build_cfg (tree fndecl)
>push_cfun (fun);
>build_cfg_pass->execute (fun);
>pop_cfun ();
> +  delete build_cfg_pass;
>  }
>
>  /* Convert a gimple+CFG function to SSA form.  */
> @@ -325,6 +327,7 @@ convert_to_ssa (tree fndecl)
>push_cfun (fun);
>build_ssa_pass->execute (fun);
>pop_cfun ();
> +  delete build_ssa_pass;
>  }
>
>  /* Assuming we have a simple 3-block CFG like this:
> @@ -594,6 +597,7 @@ test_expansion_to_rtl ()
>init_function_start (fndecl);
>expand_pass->execute (fun);
>pop_cfun ();
> +  delete expand_pass;
>
>/* On x86_64, I get this:
> (note 3 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index 0fac49c..6d69435 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -9276,6 +9276,7 @@ test_linear_chain ()
>ASSERT_EQ (1, dom_by_b.length ());
>ASSERT_EQ (bb_c, dom_by_b[0]);
>free_dominance_info (CDI_DOMINATORS);
> +  dom_by_b.release ();
>
>/* Similarly for post-dominance: each BB in our chain is post-dominated
>   by the one after it.  */
> @@ -9286,6 +9287,7 @@ test_linear_chain ()
>ASSERT_EQ (1, postdom_by_b.length ());
>ASSERT_EQ (bb_a, postdom_by_b[0]);
>free_dominance_info (CDI_POST_DOMINATORS);
> +  postdom_by_b.release ();
>
>pop_cfun ();
>  }
> @@ -9346,8 +9348,10 @@ test_diamond ()
>ASSERT_EQ (bb_a, get_immediate_dominator (CDI_DOMINATORS, bb_d));
>vec dom_by_a = get_dominated_by (CDI_DOMINATORS, bb_a);
>ASSERT_EQ (3, dom_by_a.length ()); /* B, C, D, in some order.  */
> +  dom_by_a.release ();
>vec dom_by_b = get_dominated_by (CDI_DOMINATORS, bb_b);
>ASSERT_EQ (0, dom_by_b.length ());
> +  dom_by_b.release ();
>free_dominance_info (CDI_

Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks

2016-07-08 Thread Manish Goregaokar
Yep, there are some test issues -- I don't have time right now but
plan to investigate further later.
-Manish


On Sat, Jul 9, 2016 at 12:06 AM, Jeff Law  wrote:
> On 07/06/2016 11:22 AM, Bernd Schmidt wrote:
>>
>> On 07/05/2016 12:41 PM, Richard Biener wrote:
>>>
>>> On Fri, Jul 1, 2016 at 3:10 PM, Manish Goregaokar 
>>> wrote:

>>>> Added a test:
>>>
>>>
>>> Ok if this passed bootstrap/regtest.
>>
>>
>>>> +  return flag_delete_null_pointer_checks
>>>> +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p)
>>>> +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p));
>>>>   case PLUS_EXPR:
>>
>>
>> But please fix the wrapping - multi-line expressions like this should be
>> enclosed in parentheses to make the editor deal with them correctly.
>
> I believe this patch regresses several tests in constexpr-array-ptr10.C.
>
> jeff
>


[PATCH FT32]: apply unbias to references to RAM symbols

2016-07-08 Thread James Bowman
The FT32 binutils use a bias to distinguish between RAM and flash
addresses.

This fix adds an ASM_OUTPUT_SYMBOL_REF() that unbiases references to
RAM symbols.

Only references to RAM objects have the bias applied. Flash objects
(that is, objects in ADDR SPACE 1) are not biased, so references to them
must not be unbiased. Likewise, references in debug sections need to keep
the biased address for gdb, so they are not unbiased either.
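
The rule above can be sketched in C++ rather than in the macro form the
patch uses. This is only an illustration under stated assumptions: the
struct and function names are hypothetical, the is_ram field stands in
for the 0x1000 flag set by the patch, and the "-0x80" suffix is copied
verbatim from the patch as quoted in this message. None of it is the
FT32 backend's actual interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a symbol reference; not a GCC type.
struct SymbolRefSketch
{
  std::string name;
  bool is_ram;  // plays the role of the 0x1000 flag set by the patch
};

// Return the text to emit for a reference to SYM.  RAM symbols are
// unbiased unless the reference is emitted into a debug section.
inline std::string
format_symbol_ref (const SymbolRefSketch &sym, bool in_debug_section)
{
  std::string out = sym.name;
  if (sym.is_ram && !in_debug_section)
    out += "-0x80";  // suffix as quoted in the patch
  return out;
}

// Usage: one RAM symbol and one flash symbol, in and out of debug info.
inline void
check_symbol_ref_sketch ()
{
  SymbolRefSketch ram = { "buf", true };
  SymbolRefSketch flash = { "tbl", false };
  assert (format_symbol_ref (ram, false) == "buf-0x80");
  assert (format_symbol_ref (ram, true) == "buf");
  assert (format_symbol_ref (flash, false) == "tbl");
}
```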

gcc/ChangeLog:

2016-07-08  James Bowman  

* config/ft32/ft32.c (ft32_elf_encode_section_info): New function.
* config/ft32/ft32.h (ASM_OUTPUT_SYMBOL_REF): New macro.

Index: gcc/config/ft32/ft32.c
===
--- gcc/config/ft32/ft32.c  (revision 237998)
+++ gcc/config/ft32/ft32.c  (working copy)
@@ -35,6 +35,7 @@
 #include "calls.h"
 #include "expr.h"
 #include "builtins.h"
+#include "print-tree.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -895,6 +896,46 @@ yes:
   return 1;
 }
 
+#undef TARGET_ENCODE_SECTION_INFO
+#define TARGET_ENCODE_SECTION_INFO  ft32_elf_encode_section_info
+
+void
+ft32_elf_encode_section_info (tree decl, rtx rtl, int first)
+{
+  enum tree_code code;
+  rtx symbol;
+
+  /* Careful not to prod global register variables.  */
+  if (!MEM_P (rtl))
+return;
+  symbol = XEXP (rtl, 0);
+  if (GET_CODE (symbol) != SYMBOL_REF)
+return;
+
+  default_encode_section_info (decl, rtl, first);
+
+  code = TREE_CODE (decl);
+  switch (TREE_CODE_CLASS (code))
+{
+case tcc_declaration:
+  {
+   tree type = TREE_TYPE (decl);
+   int is_flash = (type && TYPE_P (type) && !ADDR_SPACE_GENERIC_P (TYPE_ADDR_SPACE (type)));
+   if ((code == VAR_DECL) && !is_flash)
+ SYMBOL_REF_FLAGS (symbol) |= 0x1000;
+  }
+  break;
+case tcc_constant:
+case tcc_exceptional:
+  if (code == STRING_CST)
+   SYMBOL_REF_FLAGS (symbol) |= 0x1000;
+}
+
+  // debug_tree (decl);
+  // debug_rtx (rtl);
+  // printf("\n");
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-ft32.h"
Index: gcc/config/ft32/ft32.h
===
--- gcc/config/ft32/ft32.h  (revision 237998)
+++ gcc/config/ft32/ft32.h  (working copy)
@@ -506,4 +506,14 @@ do { \
 
 extern int ft32_is_mem_pm(rtx o);
 
+#define ASM_OUTPUT_SYMBOL_REF(stream, sym) \
+  do { \
+assemble_name (stream, XSTR (sym, 0)); \
+int section_debug = in_section && \
+  (SECTION_STYLE (in_section) == SECTION_NAMED) && \
+  (in_section->named.common.flags & SECTION_DEBUG); \
+if (!section_debug && SYMBOL_REF_FLAGS (sym) & 0x1000) \
+  asm_fprintf (stream, "-0x80"); \
+  } while (0)
+
 #endif /* GCC_FT32_H */


Re: [PATCH] Giant concepts patch

2016-07-08 Thread Jason Merrill
On Wed, Jun 22, 2016 at 2:25 AM, Andrew Sutton
 wrote:
>> > I've run into some trouble building cmcstl2: declarator requirements
>> > on a function can lead to constraints that tsubst_constraint doesn't
>> > handle.  What was your theory of only handling a few _CONSTR codes
>> > there?  This is blocking me from checking in the patch.
>
> I wonder if those were the problems that I was running into, but hadn't
> diagnosed. I had thought it shouldn't be possible to get the full set of
> constraints in tsubst_constraint. I may have mis-analyzed the problem for
> function constraints.

Any further thoughts?

Jason


PR fortran/68426 -- committed

2016-07-08 Thread Steve Kargl
2016-07-08  Steven G. Kargl  

PR fortran/68426
* simplify.c (gfc_simplify_spread): Adjust locus.


Index: simplify.c
===
--- simplify.c  (revision 238178)
+++ simplify.c  (working copy)
@@ -6183,8 +6183,7 @@ gfc_simplify_spread (gfc_expr *source, g
 }
   else
 {
-  gfc_error ("Simplification of SPREAD at %L not yet implemented",
-&source->where);
+  gfc_error ("Simplification of SPREAD at %C not yet implemented");
   return &gfc_bad_expr;
 }

-- 
Steve


Re: RFC (attributes): PATCH for c++/50800 to set affects_type_identity for may_alias

2016-07-08 Thread Jason Merrill
On Jun 27, 2016 12:53 PM, "Richard Biener"  wrote:
>
> On Thu, Jun 23, 2016 at 9:39 PM, Jason Merrill  wrote:
> > My earlier patch for 50800 fixed the ICE by consistently stripping
> > non-mangled attributes from template arguments, and mangling those that
> > affect type identity.  At the C++ meeting this week someone pointed out to
> > me that this is a real problem for x86 vector code, which relies on
> > may_alias semantics: if may_alias is stripped from __m128, users can't use
> > templates with vectors.
> >
> > So, it seems that the solution is to mangle may_alias by saying that it
> > affects type identity.  But since we still want to be able to convert back
> > and forth, I thought that it would make sense to treat the may_alias version
> > of a type as a variant, rather than a new distinct type.  So the first patch
> > creates a new category of attributes that are treated as type variants.
> >
> > An alternative patch just sets affects_type_identity and adjusts the C++
> > front end to allow conversion between pointers to add or discard may_alias.
> >
> > Thoughts?
>
> As may_alias purely affects semantics in the implementation of an API
> but not the ABI it shouldn't affect mangling.  In the middle-end we use
> TYPE_REF_CAN_ALIAS_ALL and that and the unqualified pointer
> share the same canonical type (but it's not a variant type, pointer types
> are chained via TYPE_POINTER_TO).
>
> Not sure if you can make use of the canonicalness in the C++ FE
> and maybe drop the attribute early there.

We already drop the attribute; the problem is that users want it to
affect template instantiations. For that to work it needs to affect
mangling of template arguments, at least, so that A<__m128d> and
A<__v2df> can be different types, as they need to be; if they are the
same, one or the other has the wrong semantics.
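
For readers unfamiliar with the semantics at stake, here is a minimal
sketch of what may_alias buys user code. It assumes GCC or Clang (the
attribute is an extension) and a 32-bit IEEE-754 float; the names
aliasing_u32 and float_bits are made up for the example. It does not
show the template-argument behaviour under discussion, only why code
relies on the attribute surviving.

```cpp
#include <cassert>
#include <cstdint>

// GCC/Clang extension: accesses through this type may alias any object.
typedef std::uint32_t __attribute__ ((__may_alias__)) aliasing_u32;

static_assert (sizeof (float) == sizeof (std::uint32_t),
               "sketch assumes a 32-bit float");

// Read the object representation of F through the may_alias type.
// Without the attribute this access would break strict aliasing.
inline std::uint32_t
float_bits (const float &f)
{
  return *reinterpret_cast<const aliasing_u32 *> (&f);
}

// Usage, assuming IEEE-754 single precision.
inline void
check_float_bits ()
{
  assert (float_bits (0.0f) == 0u);
  assert (float_bits (1.0f) == 0x3f800000u);
}
```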

But to answer Florian's question, mangling of structs would not be
affected, only attribute-qualified built-in types.

Or perhaps we could make __m128* somehow mangle using those names
rather than the underlying vectors, maybe by wrapping them in a
struct.

Jason


C++ PATCHes to correct value category predicate names

2016-07-08 Thread Jason Merrill
For a while I've been meaning to rename
lvalue_or_rvalue_with_address_p to glvalue_p; we've had that term for
a long time.

Then while I was at it, I decided to fix the name "lvalue_p", which
for a long time has really meant "glvalue or class prvalue".  I've
invented the name "obvalue" for this category and renamed the old
lvalue_p to obvalue_p; lvalue_p now actually matches the C++ notion of
lvalue.
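
The categories named above can be illustrated at the source level. This
is a sketch, not the compiler's implementation: it relies on the
standard rule that decltype((e)) is T& for an lvalue, T&& for an xvalue,
and plain T for a prvalue. The helper names, Widget, counter, and
make_widget are hypothetical, and is_obvalue_expr is only an
approximation of "glvalue or class prvalue".

```cpp
#include <type_traits>
#include <utility>

// Classify an expression from the type that decltype((e)) yields.
template <typename T> constexpr bool is_lvalue_expr ()
{
  return std::is_lvalue_reference<T>::value;
}
template <typename T> constexpr bool is_xvalue_expr ()
{
  return std::is_rvalue_reference<T>::value;
}
// glvalue = lvalue or xvalue.
template <typename T> constexpr bool is_glvalue_expr ()
{
  return std::is_reference<T>::value;
}
// "glvalue or class prvalue", the category the message calls obvalue.
template <typename T> constexpr bool is_obvalue_expr ()
{
  return std::is_reference<T>::value || std::is_class<T>::value;
}

struct Widget { int field; };  // hypothetical class type
extern int counter;            // declarations only: used unevaluated
Widget make_widget ();

static_assert (is_lvalue_expr<decltype ((counter))> (),
               "the name of a variable is an lvalue");
static_assert (is_xvalue_expr<decltype ((std::move (counter)))> (),
               "std::move yields an xvalue");
static_assert (is_glvalue_expr<decltype ((std::move (counter)))> (),
               "an xvalue is a glvalue");
static_assert (!is_glvalue_expr<decltype ((make_widget ()))> (),
               "a class prvalue is not a glvalue");
static_assert (is_obvalue_expr<decltype ((make_widget ()))> (),
               "but it falls in the obvalue category");
static_assert (!is_obvalue_expr<decltype ((42))> (),
               "a scalar prvalue is in neither category");
```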

Tested x86_64-pc-linux-gnu, applying to trunk.
commit c0d75bbf3a19e80d8c3eb553b91cf5ad822983dc
Author: Jason Merrill 
Date:   Thu Jul 7 16:05:29 2016 -0400

Rename lvalue_or_rvalue_with_address_p to glvalue_p.

* tree.c (glvalue_p): Rename from lvalue_or_rvalue_with_address_p.
* call.c, cp-tree.h, typeck.c: Adjust.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 8b93c61..8509566 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -4549,7 +4549,7 @@ conditional_conversion (tree e1, tree e2, tsubst_flags_t complain)
  If E2 is an xvalue: E1 can be converted to match E2 if E1 can be
  implicitly converted to the type "rvalue reference to T2", subject to
  the constraint that the reference must bind directly.  */
-  if (lvalue_or_rvalue_with_address_p (e2))
+  if (glvalue_p (e2))
 {
   tree rtype = cp_build_reference_type (t2, !real_lvalue_p (e2));
   conv = implicit_conversion (rtype,
@@ -4882,8 +4882,7 @@ build_conditional_expr_1 (location_t loc, tree arg1, tree arg2, tree arg3,
&& (CLASS_TYPE_P (arg2_type) || CLASS_TYPE_P (arg3_type)
|| (same_type_ignoring_top_level_qualifiers_p (arg2_type,
   arg3_type)
-   && lvalue_or_rvalue_with_address_p (arg2)
-   && lvalue_or_rvalue_with_address_p (arg3)
+   && glvalue_p (arg2) && glvalue_p (arg3)
&& real_lvalue_p (arg2) == real_lvalue_p (arg3
 {
   conversion *conv2;
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 5b87bb3..81f4a05 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6515,7 +6515,7 @@ extern tree copy_binfo(tree, tree, tree,
 extern int member_p(const_tree);
 extern cp_lvalue_kind real_lvalue_p(const_tree);
 extern cp_lvalue_kind lvalue_kind  (const_tree);
-extern bool lvalue_or_rvalue_with_address_p(const_tree);
+extern bool glvalue_p  (const_tree);
 extern bool xvalue_p   (const_tree);
 extern tree cp_stabilize_reference (tree);
 extern bool builtin_valid_in_constant_expr_p(const_tree);
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index fa8db0a..57da88f 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -266,20 +266,10 @@ real_lvalue_p (const_tree ref)
 return kind;
 }
 
-/* This differs from real_lvalue_p in that class rvalues are considered
-   lvalues.  */
+/* This differs from real_lvalue_p in that xvalues are included.  */
 
 bool
-lvalue_p (const_tree ref)
-{
-  return (lvalue_kind (ref) != clk_none);
-}
-
-/* This differs from real_lvalue_p in that rvalues formed by dereferencing
-   rvalue references are considered rvalues.  */
-
-bool
-lvalue_or_rvalue_with_address_p (const_tree ref)
+glvalue_p (const_tree ref)
 {
   cp_lvalue_kind kind = lvalue_kind (ref);
   if (kind & clk_class)
@@ -288,7 +278,16 @@ lvalue_or_rvalue_with_address_p (const_tree ref)
 return (kind != clk_none);
 }
 
-/* Returns true if REF is an xvalue, false otherwise.  */
+/* This differs from glvalue_p in that class prvalues are included.  */
+
+bool
+lvalue_p (const_tree ref)
+{
+  return (lvalue_kind (ref) != clk_none);
+}
+
+/* Returns true if REF is an xvalue (the result of dereferencing an rvalue
+   reference), false otherwise.  */
 
 bool
 xvalue_p (const_tree ref)
@@ -781,7 +780,7 @@ rvalue (tree expr)
 
   /* We need to do this for rvalue refs as well to get the right answer
  from decltype; see c++/36628.  */
-  if (!processing_template_decl && lvalue_or_rvalue_with_address_p (expr))
+  if (!processing_template_decl && glvalue_p (expr))
 expr = build1 (NON_LVALUE_EXPR, type, expr);
   else if (type != TREE_TYPE (expr))
 expr = build_nop (type, expr);
@@ -4260,7 +4259,7 @@ stabilize_expr (tree exp, tree* initp)
  arguments with such a type; just treat it as a pointer.  */
   else if (TREE_CODE (TREE_TYPE (exp)) == REFERENCE_TYPE
   || SCALAR_TYPE_P (TREE_TYPE (exp))
-  || !lvalue_or_rvalue_with_address_p (exp))
+  || !glvalue_p (exp))
 {
   init_expr = get_target_expr (exp);
   exp = TARGET_EXPR_SLOT (init_expr);
@@ -4388,7 +4387,7 @@ stabilize_init (tree init, tree *initp)
   && TREE_CODE (t) != CONSTRUCTOR
   && TREE_CODE (t) != AGGR_INIT_EXPR
   && (SCALAR_TYPE_P (TREE_TYPE (t))
- || lvalue_or_rvalue_with_address_p (t)))
+ || glvalue_p (t)))
 {
   TREE_OPERAND (init, 1) = stabilize_expr (t, init

C++ PATCH to generic lambda conversion operator

2016-07-08 Thread Jason Merrill
While working on another patch, I noticed that the dummy object used
for a generic lambda conversion operator was mistakenly created by
casting nullptr to the lambda closure type, rather than a pointer to
that type.  Fixed thus.
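
For context, the user-visible feature behind this conversion operator is
that a captureless generic lambda converts to a function pointer. The
following is a minimal sketch of that feature (C++14 or later), not of
the compiler internals the patch touches; the function names are
invented for the example.

```cpp
#include <cassert>

// A captureless generic lambda has a conversion function template; each
// target function-pointer type instantiates the call operator separately.
inline int
call_via_int_pointer (int v)
{
  auto add_one = [] (auto x) { return x + 1; };
  int (*fp) (int) = add_one;        // deduces auto = int
  return fp (v);
}

inline double
call_via_double_pointer (double v)
{
  auto add_one = [] (auto x) { return x + 1; };
  double (*fp) (double) = add_one;  // deduces auto = double
  return fp (v);
}

// Usage: both pointers call the same lambda body.
inline void
check_lambda_conversion ()
{
  assert (call_via_int_pointer (41) == 42);
  assert (call_via_double_pointer (0.5) == 1.5);
}
```
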
commit 972a2e5d623b46b32f9de4633b60d5ecf95fc5df
Author: Jason Merrill 
Date:   Thu Jul 7 17:03:57 2016 -0400

* lambda.c (maybe_add_lambda_conv_op): Fix null object argument.

diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 85ad9f8..3822882 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -904,6 +904,8 @@ maybe_add_lambda_conv_op (tree type)
   tree optype = TREE_TYPE (callop);
   tree fn_result = TREE_TYPE (optype);
 
+  tree thisarg = build_nop (TREE_TYPE (DECL_ARGUMENTS (callop)),
+   null_pointer_node);
   if (generic_lambda_p)
 {
   /* Prepare the dependent member call for the static member function
@@ -911,7 +913,8 @@ maybe_add_lambda_conv_op (tree type)
 return expression for a deduced return call op to allow for simple
 implementation of the conversion operator.  */
 
-  tree instance = build_nop (type, null_pointer_node);
+  tree instance = cp_build_indirect_ref (thisarg, RO_NULL,
+tf_warning_or_error);
   tree objfn = build_min (COMPONENT_REF, NULL_TREE,
  instance, DECL_NAME (callop), NULL_TREE);
   int nargs = list_length (DECL_ARGUMENTS (callop)) - 1;
@@ -923,9 +926,7 @@ maybe_add_lambda_conv_op (tree type)
   else
 {
   direct_argvec = make_tree_vector ();
-  direct_argvec->quick_push (build1 (NOP_EXPR,
-TREE_TYPE (DECL_ARGUMENTS (callop)),
-null_pointer_node));
+  direct_argvec->quick_push (thisarg);
 }
 
   /* Copy CALLOP's argument list (as per 'copy_list') as FN_ARGS in order to


[PATCH] RFC: On-demand locations within string-literals

2016-07-08 Thread David Malcolm
This patch implements precise tracking of source locations for the
individual chars within string literals, so that we can e.g. underline
specific ranges in -Wformat diagnostics.

It should also enable fixing PR inline-asm/57950 ("wrong line numbers
in error messages for inline assembler statements").

I posted a much earlier version of this here:
  "[PATCH 17/22] libcpp: add location tracking within string literals"
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00744.html
and:
  "[PATCH 18/22] Track locations within string literals in tree_string"
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00743.html
In that old approach, I attempted to capture the location data during
parsing, storing it within a new cpp_string_location class, accessed
by a new TREE_STRING_LOCATION field of STRING_CST.

Doing so would add a pointer to every string literal, and mean storing the
data somewhere (unless we only store it for the "interesting" cases
in a hash somewhere).

Manu implemented an alternative "on-demand" approach in r223470, in
c-format.c, which locates the relevant line in the source file and
effectively re-lexes the literal, thus avoiding having to store anything.
That implementation has a simplified lexer that doesn't support
all possible literals ("location_column_from_byte_offset" in c-format.c):

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d5a2ddc76a109258297ff345957c35cb50116c94#patch2

In particular, it doesn't support concatenation or macros (amongst other
things).

In the following patch, I've taken the on-demand idea, and reimplemented
it within libcpp's string literal lexer, where the generation of
source-location information is an optional extra aspect of
cpp_interpret_string.
It's disabled during regular lexing, but it's available through an
interface in input.{c|h}, which can rerun the libcpp code and capture
the per-char source_ranges when we need to issue a diagnostic.

This has the advantage that we share code with the libcpp string
literal lexer, rather than trying to duplicate it, and thus it can handle
everything the "real" lexer can (as it *is* the real lexer).

To handle concatenation, the patch adds some extra data storage:
every time a string concatenation happens in c-lex.c, it stores
the locations of the component tokens in a hash_map, keyed by
the spelling location of the start of the first token
(see class string_concat_db in input.h).

Hence it's only storing extra data for string concatenations,
not for simple string literals.
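
The storage scheme just described can be sketched as follows. This is a
hypothetical model, not the string_concat_db class from the patch: it
uses std::unordered_map instead of GCC's hash_map, and loc_id is a
made-up stand-in for location_t. It only shows the shape of the idea: a
table keyed by the first token's location, populated only for real
concatenations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

typedef std::uint32_t loc_id;  // hypothetical stand-in for location_t

class ConcatDbSketch
{
public:
  // Record one concatenation.  A lone literal is not a concatenation,
  // so nothing is stored for it.
  void record (const std::vector<loc_id> &token_locs)
  {
    if (token_locs.size () < 2)
      return;
    m_table[token_locs.front ()] = token_locs;
  }

  // Return the component locations keyed by FIRST_LOC, or null if no
  // concatenation starting at that location was recorded.
  const std::vector<loc_id> *lookup (loc_id first_loc) const
  {
    auto it = m_table.find (first_loc);
    return it == m_table.end () ? nullptr : &it->second;
  }

  std::size_t size () const { return m_table.size (); }

private:
  std::unordered_map<loc_id, std::vector<loc_id> > m_table;
};

// Usage: a three-token concatenation is stored, a lone literal is not.
inline void
check_concat_db_sketch ()
{
  ConcatDbSketch db;
  db.record ({ 10, 20, 30 });
  db.record ({ 40 });
  assert (db.size () == 1);
  assert (db.lookup (10) != nullptr && db.lookup (10)->size () == 3);
  assert (db.lookup (40) == nullptr);
}
```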

This approach also handles macros.

I have followup patches in progress (to c-format.c) that make it use
the new location information to underline bad format strings, and
provide fix-it hints for the format code that should have been
used, for PR c/64955 ("RFE: have -Wformat suggest the correct format
string to use").

Unfortunately this doesn't yet work with the C++ frontend;
the EXPR_LOCATION for the ADDR_EXPR wrapping the literals is
currently UNKNOWN_LOCATION, and this also gets overwritten
by the CALL_EXPR's location due to this in gimplify.c:

2397  /* FIXME diagnostics: This will mess up gcc.dg/Warray-bounds.c.  */
2398  /* Make sure arguments have the same location as the function call
2399 itself.  */
2400  protected_set_expr_location (*arg_p, call_location);

from 489c40889c8be89bd5bed4b166974f8c1e01e4ee (aka r140917):

+2008-10-06  Aldy Hernandez  
+
+   * gimplify.c (gimplify_arg): Add location argument.  Use it.
+   (gimplify_call_expr): Pass location to gimplify_arg.
+   (gimplify_modify_expr_to_memcpy): Same.
+   (gimplify_modify_expr_to_memset): Same.

which seems to be due to debug information:
  https://gcc.gnu.org/ml/gcc-patches/2008-10/msg00191.html

So this isn't quite ready yet.

Also, this patch currently makes the assumption (in charset.c)
that there's a 1:1 correspondence between bytes in the source
character set and bytes in the execution character set.  This can
be the case if both are, say, UTF-8, but might not hold in
general.

The source char set is UTF-8 or UTF-EBCDIC, and safe-ctype.c has:

# if HOST_CHARSET == HOST_CHARSET_EBCDIC
  #error "FIXME: write tables for EBCDIC"

so presumably we don't actually have any hosts that support EBCDIC
(do we?); as far as I can tell, we only currently support UTF-8
as the source char set.

Similarly, do we support any targets for which the execution
character set is *not* UTF-8?

Other notes:

- this patch is on top of
  "[PATCH] input.c: add lexing selftests and a test matrix for line_table states"
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01340.html
and uses the test matrix idea there to exercise the lexing
under lots of interesting situations.

- string_concat_db has a bit more indirection than I'd like,
but this was necessary in order to get gengtype to work.

- the older approach (storing locations during initial lexing),
had a reasonably compact representation, storing runs of equal
columns-per-char, but it was bit-rotted by the

patch for PR71621

2016-07-08 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71621

The patch was successfully bootstrapped and tested on x86/x86-64.

Committed as rev. 238178
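
The check added to process_alt_operands in the patch below can be
paraphrased as a small set computation. This is a rough sketch under
stated assumptions, not the LRA code: RegSet is a std::bitset standing
in for HARD_REG_SET, and the enum and function names are hypothetical.
Only the logic (class registers minus registers prohibited for the mode
minus non-allocatable registers, then memory with reject += 2 or
failure) is taken from the patch.

```cpp
#include <bitset>
#include <cassert>

typedef std::bitset<64> RegSet;  // hypothetical stand-in for HARD_REG_SET

enum AltDecision { ALT_KEEP_CLASS, ALT_USE_MEMORY, ALT_FAIL };

// Registers of the class that can hold the mode and are allocatable.
inline RegSet
available_regs (const RegSet &class_regs, const RegSet &prohibited_for_mode,
                const RegSet &no_alloc)
{
  return class_regs & ~prohibited_for_mode & ~no_alloc;
}

// Decide what to do with an alternative, updating REJECT as the patch does.
inline AltDecision
classify_alternative (const RegSet &class_regs,
                      const RegSet &prohibited_for_mode,
                      const RegSet &no_alloc, bool offmemok, int &reject)
{
  if (available_regs (class_regs, prohibited_for_mode, no_alloc).any ())
    return ALT_KEEP_CLASS;
  if (offmemok)
    {
      reject += 2;  // same penalty as in the patch
      return ALT_USE_MEMORY;
    }
  return ALT_FAIL;
}

// Usage: a two-register class where every register is ruled out.
inline void
check_alternative_sketch ()
{
  int reject = 0;
  RegSet cls (0x6), prohibited (0x2), no_alloc (0x4);
  assert (classify_alternative (cls, prohibited, no_alloc, true, reject)
          == ALT_USE_MEMORY);
  assert (reject == 2);
}
```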


Index: ChangeLog
===
--- ChangeLog	(revision 238175)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2016-07-08  Vladimir Makarov  
+
+	PR rtl-optimization/71621
+	* lra-constraints.c (process_alt_operands): Check combination of
+	reg class and mode.
+
 2016-06-25  Jason Merrill  
 	Richard Biener  
 
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 238175)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2016-07-08  Vladimir Makarov  
+
+	PR rtl-optimization/71621
+	* gcc.target/i386/pr71621-1.c: New.
+	* gcc.target/i386/pr71621-2.c: New.
+
 2016-07-08  Cesar Philippidis  
 
 	* gfortran.dg/goacc/pr71704.f90: New test.
Index: lra-constraints.c
===
--- lra-constraints.c	(revision 237993)
+++ lra-constraints.c	(working copy)
@@ -2261,6 +2261,41 @@ process_alt_operands (int only_alternati
 		  goto fail;
 		}
 
+	  if (this_alternative != NO_REGS)
+		{
+		  HARD_REG_SET available_regs;
+		  
+		  COPY_HARD_REG_SET (available_regs,
+ reg_class_contents[this_alternative]);
+		  AND_COMPL_HARD_REG_SET
+		(available_regs,
+		 ira_prohibited_class_mode_regs[this_alternative][mode]);
+		  AND_COMPL_HARD_REG_SET (available_regs, lra_no_alloc_regs);
+		  if (hard_reg_set_empty_p (available_regs))
+		{
+		  /* There are no hard regs holding a value of given
+			 mode.  */
+		  if (offmemok)
+			{
+			  this_alternative = NO_REGS;
+			  if (lra_dump_file != NULL)
+			fprintf (lra_dump_file,
+ "%d Using memory because of"
+ " a bad mode: reject+=2\n",
+ nop);
+			  reject += 2;
+			}
+		  else
+			{
+			  if (lra_dump_file != NULL)
+			fprintf (lra_dump_file,
+ "alt=%d: Wrong mode -- refuse\n",
+ nalt);
+			  goto fail;
+			}
+		}
+		}
+
 	  /* If not assigned pseudo has a class which a subset of
 		 required reg class, it is a less costly alternative
 		 as the pseudo still can get a hard reg of necessary
Index: testsuite/gcc.target/i386/pr71621-1.c
===
--- testsuite/gcc.target/i386/pr71621-1.c	(revision 0)
+++ testsuite/gcc.target/i386/pr71621-1.c	(working copy)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -w -ftree-vectorize -mavx2" } */
+
+int cn;
+int *li;
+
+void
+y8 (void)
+{
+  int gv;
+  int *be = &gv;
+  short int v4 = 2;
+
+  while (*li != 0)
+{
+  int sy;
+  for (sy = 0; sy < 5; ++sy)
+	{
+	  int **t6 = &be;
+	  gv |= sy ? 0 : v4;
+	  if (gv != 0)
+	++gv;
+	  t6 = &cn;
+	  if (gv != 0)
+	*t6 = 0;
+	}
+  for (gv = 0; gv < 24; ++gv)
+	v4 |= 1 <= 1 % 0;
+  ++(*li);
+}
+}
Index: testsuite/gcc.target/i386/pr71621-2.c
===
--- testsuite/gcc.target/i386/pr71621-2.c	(revision 0)
+++ testsuite/gcc.target/i386/pr71621-2.c	(working copy)
@@ -0,0 +1,39 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -mavx2" } */
+
+int hf, sv, zz, aj;
+
+void
+dn (int xb, int bl)
+{
+  while (zz < 1)
+{
+  if (xb == 0)
+	goto mr;
+
+  while (bl < 3)
+	{
+	  int d3;
+	  unsigned char vh;
+	  unsigned char *fj = &vh;
+
+	mr:
+	  while (bl < 1)
+	{
+	  hf += vh;
+	  ++bl;
+	}
+	  if (xb == 0)
+	zz = bl;
+	  if (d3 == 0)
+	return;
+	  while (sv < 1)
+	{
+	  --vh;
+	  aj += vh;
+	  ++sv;
+	}
+	}
+  sv = 0;
+}
+}


Re: RFA (gimplify): PATCH to implement C++ order of evaluation paper

2016-07-08 Thread Jason Merrill
On Fri, Jul 8, 2016 at 9:42 AM, Jakub Jelinek  wrote:
> On Thu, Jul 07, 2016 at 03:18:13PM -0400, Jason Merrill wrote:
>> How about this?  I also have a patch to handle assignment order
>> entirely in the front end, but my impression has been that you wanted
>> to make this change for other reasons as well.
>
> So what exactly is supposed to be the evaluation order for function calls
> with lhs in C++17?
> Reading
> http://en.cppreference.com/w/cpp/language/eval_order
> I'm confused.
> struct S { S (); ~S (); ... };
> S s[1024];
> typedef S (*fn) (int, int);
> fn a[1024];
> void foo (int *i, int *j, int *k, int *l)
> {
>   s[i[0]++] = (a[j[0]++]) (k[0]++, l[0]++);
> }
> So, j[0]++ needs to happen first, then k[0]++ and l[0]++ (indeterminately
> sequenced), but what about the function call vs. i[0]++?
>
> There is the rule that for E1 = E2 all side-effects of E2 happen before all
> side-effects of E1.
>
> I mean, if the function return type is a gimple reg type, then I see no
> problem in honoring that, the function call returns a temporary, then the
> side-effects of the lhs are evaluated and then it is stored to that lvalue.
>
> But, if the return type is non-POD, then we need to pass the address of the
> lhs as invisible reference to the function call, how can we do it if we
> can't yet evaluate the side-effects of the lhs?
>
> Perhaps better testcase is:
>
> int bar (int);
> void baz ()
> {
>   s[bar (0)] = (a[bar (1)]) (bar (2), 0);
> }
>
> In which order are all 4 calls made?
>
> What the patch you've posted does is that it gimplifies from_p first,
> and gimplify_call_expr will first evaluate bar (1), then bar (2),
> but then it is a CALL_EXPR; then it gimplifies the lhs, i.e. bar (0)
> call, and finally the indirect call.

As we discussed in IRC, to get the required semantics the front-end
needs to prevent this gimplifier optimization, as in this patch.  The
second patch changes -fargs-in-order to -fstrong-eval-order and
removes the ordering of function arguments.
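
The E1 = E2 rule discussed above can be observed with a small trace.
This is a minimal sketch, not part of the patch; eval_log, slots, and
note are names made up for illustration. In C++17 mode a compiler
implementing P0145 should record note (2) before note (1); in earlier
modes either order was permitted, so the checks below only cover
effects that hold in both cases.

```cpp
#include <cassert>
#include <vector>

std::vector<int> eval_log;  // hypothetical trace of call order
int slots[4];

// Record that the call with this ID ran, then return the ID.
int
note (int id)
{
  eval_log.push_back (id);
  return id;
}

// The right operand is note (2); the left operand contains note (1).
// Under the C++17 rule the right side is evaluated first.
void
assign_with_side_effects ()
{
  eval_log.clear ();
  slots[note (1)] = note (2);
}

// Usage: both calls run exactly once and the store lands in slots[1],
// whichever order the compiler chose.
void
check_assignment_sketch ()
{
  assign_with_side_effects ();
  assert (eval_log.size () == 2);
  assert (slots[1] == 2);
}
```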

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 6f19d771aa9d1afde0e56aca5b69235fb19d1daa
Author: Jason Merrill 
Date:   Thu Jul 7 14:30:43 2016 -0400

P0145R2: Refining Expression Order for C++ (assignment 2).

* cp-gimplify.c (lvalue_has_side_effects): New.
(cp_gimplify_expr): Implement assignment ordering.

diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c
index c04368f..8496d7c 100644
--- a/gcc/cp/cp-gimplify.c
+++ b/gcc/cp/cp-gimplify.c
@@ -559,6 +559,33 @@ simple_empty_class_p (tree type, tree op)
 && is_really_empty_class (type);
 }
 
+/* Returns true if evaluating E as an lvalue has side-effects;
+   specifically, a volatile lvalue has TREE_SIDE_EFFECTS, but it doesn't really
+   have side-effects until there is a read or write through it.  */
+
+static bool
+lvalue_has_side_effects (tree e)
+{
+  if (!TREE_SIDE_EFFECTS (e))
+return false;
+  while (handled_component_p (e))
+{
+  if (TREE_CODE (e) == ARRAY_REF
+ && TREE_SIDE_EFFECTS (TREE_OPERAND (e, 1)))
+   return true;
+  e = TREE_OPERAND (e, 0);
+}
+  if (DECL_P (e))
+/* Just naming a variable has no side-effects.  */
+return false;
+  else if (INDIRECT_REF_P (e))
+/* Similarly, indirection has no side-effects.  */
+return TREE_SIDE_EFFECTS (TREE_OPERAND (e, 0));
+  else
+/* For anything else, trust TREE_SIDE_EFFECTS.  */
+return TREE_SIDE_EFFECTS (e);
+}
+
 /* Do C++-specific gimplification.  Args are as for gimplify_expr.  */
 
 int
@@ -659,8 +686,6 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
/* Remove any copies of empty classes.  Also drop volatile
   variables on the RHS to avoid infinite recursion from
   gimplify_expr trying to load the value.  */
-   gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
-  is_gimple_lvalue, fb_lvalue);
if (TREE_SIDE_EFFECTS (op1))
  {
if (TREE_THIS_VOLATILE (op1)
@@ -669,8 +694,29 @@ cp_gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
 
gimplify_and_add (op1, pre_p);
  }
+   gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
+  is_gimple_lvalue, fb_lvalue);
*expr_p = TREE_OPERAND (*expr_p, 0);
  }
+   /* P0145 says that the RHS is sequenced before the LHS.
+  gimplify_modify_expr gimplifies the RHS before the LHS, but that
+  isn't quite strong enough in two cases:
+
+  1) gimplify.c wants to leave a CALL_EXPR on the RHS, which would
+  mean it's evaluated after the LHS.
+
+  2) the value calculation of the RHS is also sequenced before the
+  LHS, so for scalar assignment we need to preevaluate if the
+  RHS could be affected by LHS side-effects even if it has no
+  side-effects of its own.  We don't need this for classes 

Re: Importing gnulib into the gcc tree

2016-07-08 Thread ayush goel
Yes, that’s correct. It has been moved before the libiberty library in the list 
now. Bootstrapped the system with the changes as well.

PFA the updated patch

--  
Thanks,  
Ayush Goel

On 8 July 2016 at 2:29:04 AM, Manuel López-Ibáñez (lopeziba...@gmail.com) wrote:
> On 7 July 2016 at 13:48, ayush goel wrote:
> > In order to show the setup works, I’ve replaced libiberty’s version of
> > obstack with gnulib’s.
> > This was made possible by replacing the corresponding header file and then
> > including gnulib headers and the gnulib static library in the build path
> > required to compile gcc files.
>  
> Hi Ayush,
>  
> I'm not an expert on the build machinery, so this question might be
> misguided: How do you know it is using the version in gnulib rather
> than the one in libiberty? I see it uses gnulib's header file but:
>  
> # Dependencies on the intl and portability libraries.
> LIBDEPS= libcommon.a $(CPPLIB) $(LIBIBERTY) $(LIBINTL_DEP) $(LIBICONV_DEP) \
> - $(LIBDECNUMBER) $(LIBBACKTRACE)
> + $(LIBDECNUMBER) $(LIBBACKTRACE) $(LIBGNU)
>  
> makes me think that the code in libiberty is found before the one in libgnu.
>  
> Cheers,
>  
> Manuel.
>  


importgnulib_7_7
Description: Binary data


[PATCH] Support running the selftests under valgrind

2016-07-08 Thread David Malcolm
This patch adds a new phony target to gcc/Makefile.in to make it easy
to run the selftests under valgrind, via "make selftest-valgrind".
This phony target isn't a dependency of anything; it's purely for
convenience (it takes about 4-5 seconds on my box).

Doing so uncovered a few leaks in the selftest suite, which the
patch also fixes, so that it runs cleanly under valgrind (on
x86_64-pc-linux-gnu, configured with --enable-valgrind-annotations,
at least).

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.
Manually verified that the valgrind output is "clean" on
x86_64-pc-linux-gnu [1].

OK for trunk?

[1]:

 HEAP SUMMARY:
 in use at exit: 1,203,983 bytes in 2,114 blocks
   total heap usage: 4,545 allocs, 2,431 frees, 3,212,841 bytes allocated

 LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
  possibly lost: 0 bytes in 0 blocks
still reachable: 1,203,983 bytes in 2,114 blocks
 suppressed: 0 bytes in 0 blocks
 Reachable blocks (those to which a pointer was found) are not shown.
 To see them, rerun with: --leak-check=full --show-leak-kinds=all

 For counts of detected and suppressed errors, rerun with: -v
 ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

gcc/ChangeLog:
* Makefile.in (selftest-valgrind): New phony target.
* function-tests.c (selftest::build_cfg): Delete pass instances
created by the test.
(selftest::convert_to_ssa): Likewise.
(selftest::test_expansion_to_rtl): Likewise.
* tree-cfg.c (selftest::test_linear_chain): Release dominator
vectors.
(selftest::test_diamond): Likewise.
---
 gcc/Makefile.in  | 6 ++
 gcc/function-tests.c | 4 
 gcc/tree-cfg.c   | 6 ++
 3 files changed, 16 insertions(+)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 5e7422d..1a4b5d7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1869,6 +1869,12 @@ s-selftest: $(GCC_PASSES) cc1$(exeext) stmp-int-hdrs
 selftest-gdb: $(GCC_PASSES) cc1$(exeext) stmp-int-hdrs
$(GCC_FOR_TARGET) -xc -S -c /dev/null -fself-test -wrapper gdb,--args
 
+# Convenience method for running selftests under valgrind:
+.PHONY: selftest-valgrind
+selftest-valgrind: $(GCC_PASSES) cc1$(exeext) stmp-int-hdrs
+   $(GCC_FOR_TARGET) -xc -S -c /dev/null -fself-test \
+ -wrapper valgrind,--leak-check=full
+
 # Recompile all the language-independent object files.
 # This is used only if the user explicitly asks for it.
 compilations: $(BACKEND)
diff --git a/gcc/function-tests.c b/gcc/function-tests.c
index c8188e7..edd355f 100644
--- a/gcc/function-tests.c
+++ b/gcc/function-tests.c
@@ -296,6 +296,7 @@ build_cfg (tree fndecl)
   push_cfun (fun);
   lower_cf_pass->execute (fun);
   pop_cfun ();
+  delete lower_cf_pass;
 
   /* We can now convert to CFG form; for our trivial test function this
  gives us:
@@ -310,6 +311,7 @@ build_cfg (tree fndecl)
   push_cfun (fun);
   build_cfg_pass->execute (fun);
   pop_cfun ();
+  delete build_cfg_pass;
 }
 
 /* Convert a gimple+CFG function to SSA form.  */
@@ -325,6 +327,7 @@ convert_to_ssa (tree fndecl)
   push_cfun (fun);
   build_ssa_pass->execute (fun);
   pop_cfun ();
+  delete build_ssa_pass;
 }
 
 /* Assuming we have a simple 3-block CFG like this:
@@ -594,6 +597,7 @@ test_expansion_to_rtl ()
   init_function_start (fndecl);
   expand_pass->execute (fun);
   pop_cfun ();
+  delete expand_pass;
 
   /* On x86_64, I get this:
(note 3 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 0fac49c..6d69435 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -9276,6 +9276,7 @@ test_linear_chain ()
   ASSERT_EQ (1, dom_by_b.length ());
   ASSERT_EQ (bb_c, dom_by_b[0]);
   free_dominance_info (CDI_DOMINATORS);
+  dom_by_b.release ();
 
   /* Similarly for post-dominance: each BB in our chain is post-dominated
  by the one after it.  */
@@ -9286,6 +9287,7 @@ test_linear_chain ()
   ASSERT_EQ (1, postdom_by_b.length ());
   ASSERT_EQ (bb_a, postdom_by_b[0]);
   free_dominance_info (CDI_POST_DOMINATORS);
+  postdom_by_b.release ();
 
   pop_cfun ();
 }
@@ -9346,8 +9348,10 @@ test_diamond ()
   ASSERT_EQ (bb_a, get_immediate_dominator (CDI_DOMINATORS, bb_d));
   vec<basic_block> dom_by_a = get_dominated_by (CDI_DOMINATORS, bb_a);
   ASSERT_EQ (3, dom_by_a.length ()); /* B, C, D, in some order.  */
+  dom_by_a.release ();
   vec<basic_block> dom_by_b = get_dominated_by (CDI_DOMINATORS, bb_b);
   ASSERT_EQ (0, dom_by_b.length ());
+  dom_by_b.release ();
   free_dominance_info (CDI_DOMINATORS);
 
   /* Similarly for post-dominance.  */
@@ -9357,8 +9361,10 @@ test_diamond ()
   ASSERT_EQ (bb_d, get_immediate_dominator (CDI_POST_DOMINATORS, bb_c));
   vec<basic_block> postdom_by_d = get_dominated_by (CDI_POST_DOMINATORS, bb_d);
   ASSERT_EQ (3, postdom_by_d.length ()); /* A, B, C in some order.  */
+  postdom_by_d.release ();
   vec<basic_block> postdom_by_b = get_dominated_by (CDI_POST_DOMINATORS, bb_b);
   ASSERT_EQ (0, postdom_by_b

Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks

2016-07-08 Thread Jeff Law

On 07/06/2016 11:22 AM, Bernd Schmidt wrote:

On 07/05/2016 12:41 PM, Richard Biener wrote:

On Fri, Jul 1, 2016 at 3:10 PM, Manish Goregaokar 
wrote:

Added a test:


Ok if this passed bootstrap/regtest.



+  return flag_delete_null_pointer_checks
+&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p)
+|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p));
 case PLUS_EXPR:


But please fix the wrapping - multi-line expressions like this should be
enclosed in parentheses to make the editor deal with them correctly.

I believe this patch regresses several tests in constexpr-array-ptr10.C.

jeff



Re: [committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)

2016-07-08 Thread Jakub Jelinek
On Fri, Jul 08, 2016 at 11:26:12AM -0700, Cesar Philippidis wrote:
> There's probably no advantage. I just didn't want to change something
> that wasn't broken. But from a consistency standpoint, I agree that all
> of the directives except for routine and declare could use matcha. This
> patch makes that change.
> 
> Is this OK?

Ok for trunk/6.2/5.5, thanks.

> 2016-07-08  Cesar Philippidis  
> 
>   gcc/fortran/
>   * parse.c (matcha): Define.
>   (decode_oacc_directive): Add spec_only local var and set it.  Use
>   matcha to parse acc directives except for routine and declare.  Return
>   ST_GET_FCN_CHARACTERISTICS if a non-declarative directive could be
>   matched.
> 
>   gcc/testsuite/
>   * gfortran.dg/goacc/pr71704.f90: New test.

Jakub


Re: [committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)

2016-07-08 Thread Cesar Philippidis
On 07/08/2016 10:25 AM, Jakub Jelinek wrote:
> On Fri, Jul 08, 2016 at 09:58:57AM -0700, Cesar Philippidis wrote:
 +#define matcha(keyword, subr, st) \
 +do {  \
 +  if (spec_only && gfc_match (keyword) == MATCH_YES)  \
 +  goto do_spec_only;  \
 +  else if (match_word (keyword, subr, &old_locus) \
 + == MATCH_YES)\
 +  return st;  \
 +  else\
 +  undo_new_statement ();  \
 +} while (0);
 +
  static gfc_statement
  decode_oacc_directive (void)
  {
locus old_locus;
char c;
 +  bool spec_only = false;
  
gfc_enforce_clean_symbol_state ();
  
 @@ -608,6 +622,10 @@ decode_oacc_directive (void)
return ST_NONE;
  }
  
 +  if (gfc_current_state () == COMP_FUNCTION
 +  && gfc_current_block ()->result->ts.kind == -1)
 +spec_only = true;
 +
gfc_unset_implicit_pure (NULL);
  
old_locus = gfc_current_locus;
 @@ -627,7 +645,7 @@ decode_oacc_directive (void)
match ("cache", gfc_match_oacc_cache, ST_OACC_CACHE);
>>>
>>> Why isn't ST_OACC_ATOMIC matcha?
>>> At least from the case_executable/case_exec_markers vs.
>>> case_decl defines, all directives but "routine" and "declare" should
>>> be matcha IMHO.
>>
>> Because the atomic directive must operate on a sequence of instructions,
>> otherwise it should generate a syntax error.
> 
> But you are then relying on a nested decode_statement to do something, it
> works, but IMHO just rejecting them earlier is much cleaner and more
> maintainable, with the simple rule that even can be documented that
> declaration directives use match, all others use matcha (similarly how in
> decode_omp_directive directives use the matchd[os] while executable directives
> use match[os]).
> What do you see as advantage of only marking some of the executable
> directives?

There's probably no advantage. I just didn't want to change something
that wasn't broken. But from a consistency standpoint, I agree that all
of the directives except for routine and declare could use matcha. This
patch makes that change.

Is this OK?

Cesar


2016-07-08  Cesar Philippidis  

	gcc/fortran/
	* parse.c (matcha): Define.
	(decode_oacc_directive): Add spec_only local var and set it.  Use
	matcha to parse acc directives except for routine and declare.  Return
	ST_GET_FCN_CHARACTERISTICS if a non-declarative directive could be
	matched.

	gcc/testsuite/
	* gfortran.dg/goacc/pr71704.f90: New test.

diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index d795225..0aa736c 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -589,11 +589,25 @@ decode_statement (void)
   return ST_NONE;
 }
 
+/* Like match and if spec_only, goto do_spec_only without actually
+   matching.  */
+#define matcha(keyword, subr, st)\
+do {			\
+  if (spec_only && gfc_match (keyword) == MATCH_YES)	\
+	goto do_spec_only;	\
+  else if (match_word (keyword, subr, &old_locus)		\
+	   == MATCH_YES)	\
+	return st;		\
+  else			\
+	undo_new_statement ();  	\
+} while (0);
+
 static gfc_statement
 decode_oacc_directive (void)
 {
   locus old_locus;
   char c;
+  bool spec_only = false;
 
   gfc_enforce_clean_symbol_state ();
 
@@ -608,6 +622,10 @@ decode_oacc_directive (void)
   return ST_NONE;
 }
 
+  if (gfc_current_state () == COMP_FUNCTION
+  && gfc_current_block ()->result->ts.kind == -1)
+spec_only = true;
+
   gfc_unset_implicit_pure (NULL);
 
   old_locus = gfc_current_locus;
@@ -621,49 +639,52 @@ decode_oacc_directive (void)
   switch (c)
 {
 case 'a':
-  match ("atomic", gfc_match_oacc_atomic, ST_OACC_ATOMIC);
+  matcha ("atomic", gfc_match_oacc_atomic, ST_OACC_ATOMIC);
   break;
 case 'c':
-  match ("cache", gfc_match_oacc_cache, ST_OACC_CACHE);
+  matcha ("cache", gfc_match_oacc_cache, ST_OACC_CACHE);
   break;
 case 'd':
-  match ("data", gfc_match_oacc_data, ST_OACC_DATA);
+  matcha ("data", gfc_match_oacc_data, ST_OACC_DATA);
   match ("declare", gfc_match_oacc_declare, ST_OACC_DECLARE);
   break;
 case 'e':
-  match ("end atomic", gfc_match_omp_eos, ST_OACC_END_ATOMIC);
-  match ("end data", gfc_match_omp_eos, ST_OACC_END_DATA);
-  match ("end host_data", gfc_match_omp_eos, ST_OACC_END_HOST_DATA);
-  match ("end kernels loop", gfc_match_omp_eos, ST_OACC_END_KERNELS_LOOP);
-  match ("end kernels", gfc_match_omp_eos, ST_OACC_END_KERNELS);
-  match ("end loop", gfc_match_omp_eos, ST_OACC_END_LOOP);
-  match ("end parallel loop", gfc_match_omp_eos, ST_OACC_END_PARALLEL_LOOP);
-  match

Re: [PATCH] Add code-hoisting to GIMPLE

2016-07-08 Thread Jeff Law

On 07/08/2016 01:30 AM, Richard Biener wrote:

On Mon, 4 Jul 2016, Steven Bosscher wrote:


On Mon, Jul 4, 2016 at 1:26 PM, Richard Biener wrote:


The following patch is Steven's code-hoisting based on PRE forward-ported
and fixed for bootstrap plus the case of hoisting code across loops
which we generally do not want (expressions in the loop exit target block
are antic-in throughout the whole loop unless they are killed and thus
get inserted into the exit block and then PREd before the loop).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I'm going to try making the bitmap_set ops in do_hoist_insert a bit
faster - Steven, do you remember any issues with the approach from the
time you worked on it?


Hi Richi,

It's been almost 8 years since I worked on this, so I really don't
recall much about this at all. Sorry :-)


Fair enough ;)  Apart from the loop case I noticed that code-hoisting
will cause

  if (x1_6 > 6)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  i_7 = i_2(D) + 2;

  <bb 4>:
  # i_1 = PHI <i_2(D)(2), i_7(3)>
  i_8 = i_1 + 2;

to be re-written to

  _18 = i_2(D) + 2;
  if (x1_6 > 6)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  _19 = _18 + 2;

  <bb 4>:
  # i_8 = PHI <_18(2), _19(3)>

which is because critical edge splitting splits 2->4 and thus makes
i_2(D)+2 antic-in in the else block (IIRC it wouldn't be antic-in
in bb 4 but antic-out in bb 2).  Not sure if it is worth trying to
devise a "fix" for this, it's not really a pessimization.

But it generally shows that hoisting is quite aggressive.
I don't see it as a pessimization either, though the gratuitous code
motion can make it awfully hard to evaluate the real effects (I saw this
when evaluating Click's GCM/GVN algorithm for GCC).


I thought we had a BZ where we wanted to do this kind of hoisting up 
through PHIs.  Oh, 64700, but it's in the other direction -- sink a 
common expression through a PHI.


Jeff



Richard.





Re: [PATCH] Do not emit SAVE_EXPR for already assigned SSA_NAMEs (PR71606).

2016-07-08 Thread Richard Biener
On July 8, 2016 4:23:31 PM GMT+02:00, "Martin Liška"  wrote:
>On 07/07/2016 04:15 PM, Richard Biener wrote:
>> I think it's fine though the inliner's initializer handling looks
>> incredibly fragile to me ;)
>> 
>> Richard.
>
>OK, installed in trunk. May I install the patch to all active branches?
>Reg&bootstrap works for all of them.

Sure.

Richard.



Re: [committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)

2016-07-08 Thread Jakub Jelinek
On Fri, Jul 08, 2016 at 09:58:57AM -0700, Cesar Philippidis wrote:
> >> +#define matcha(keyword, subr, st) \
> >> +do {  \
> >> +  if (spec_only && gfc_match (keyword) == MATCH_YES)  \
> >> +  goto do_spec_only;  \
> >> +  else if (match_word (keyword, subr, &old_locus) \
> >> + == MATCH_YES)\
> >> +  return st;  \
> >> +  else\
> >> +  undo_new_statement ();  \
> >> +} while (0);
> >> +
> >>  static gfc_statement
> >>  decode_oacc_directive (void)
> >>  {
> >>locus old_locus;
> >>char c;
> >> +  bool spec_only = false;
> >>  
> >>gfc_enforce_clean_symbol_state ();
> >>  
> >> @@ -608,6 +622,10 @@ decode_oacc_directive (void)
> >>return ST_NONE;
> >>  }
> >>  
> >> +  if (gfc_current_state () == COMP_FUNCTION
> >> +  && gfc_current_block ()->result->ts.kind == -1)
> >> +spec_only = true;
> >> +
> >>gfc_unset_implicit_pure (NULL);
> >>  
> >>old_locus = gfc_current_locus;
> >> @@ -627,7 +645,7 @@ decode_oacc_directive (void)
> >>match ("cache", gfc_match_oacc_cache, ST_OACC_CACHE);
> > 
> > Why isn't ST_OACC_ATOMIC matcha?
> > At least from the case_executable/case_exec_markers vs.
> > case_decl defines, all directives but "routine" and "declare" should
> > be matcha IMHO.
> 
> Because the atomic directive must operate on a sequence of instructions,
> otherwise it should generate a syntax error.

But then you are relying on a nested decode_statement to do something. It
works, but IMHO just rejecting them earlier is much cleaner and more
maintainable, with a simple rule that can even be documented:
declaration directives use match, all others use matcha (similar to how, in
decode_omp_directive, declarative directives use matchd[os] while executable
directives use match[os]).
What do you see as the advantage of marking only some of the executable
directives?

> > Also, can you figure out in the OpenACC standard and/or discuss on lang
> > committee whether acc declare and/or acc routine can appear anywhere in the
> > specification part, or need to be ordered certain way?
> > If like in OpenMP they can appear anywhere, then
> > case ST_OACC_ROUTINE: case ST_OACC_DECLARE
> > should move from case_decl to case_omp_decl macro.
> 
> OK, I'll check with them. Can that be a follow up patch, or would you
> like to see it resolved in one patch?

Sure, that can be done incrementally.

Jakub


Re: [committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)

2016-07-08 Thread Cesar Philippidis
On 07/08/2016 09:31 AM, Jakub Jelinek wrote:
> On Fri, Jul 08, 2016 at 09:19:01AM -0700, Cesar Philippidis wrote:
>> 2016-07-08  Cesar Philippidis  
>>
>>  gcc/fortran/
>>  * parse.c (matcha): Define.
>>  (decode_oacc_directive): Add spec_only local var and set it.  Use
>>  matcha to parse acc data, enter data, exit data, host_data, parallel,
>>  kernels, update and wait directives.  Return ST_GET_FCN_CHARACTERISTICS
>>  if a non-declarative directive could be matched.
>>
>>  gcc/testsuite/
>>  * gfortran.dg/goacc/pr71704-acc.f90: New test.
> 
> I'd drop the -acc suffix, the directory is enough to differentiate the gomp
> vs. goacc test.

Done.

>> --- a/gcc/fortran/parse.c
>> +++ b/gcc/fortran/parse.c
>> @@ -589,11 +589,25 @@ decode_statement (void)
>>return ST_NONE;
>>  }
>>  
>> +/* Like match, but don't match anything if not -fopenacc
>> +   and if spec_only, goto do_spec_only without actually matching.  */
> 
> The comment doesn't match what the macro does.  The whole
> decode_oacc_directive function is only called if -fopenacc, so
> it is really "Like a match and if spec_only, ..."

The intent behind that comment was to note it shouldn't be used with the
OpenMP clauses. But I agree that the -fopenacc stuff isn't necessary.
This patch updates the comment.

>> +#define matcha(keyword, subr, st)   \
>> +do {\
>> +  if (spec_only && gfc_match (keyword) == MATCH_YES)\
>> +goto do_spec_only;  \
>> +  else if (match_word (keyword, subr, &old_locus)   \
>> +   == MATCH_YES)\
>> +return st;  \
>> +  else  \
>> +undo_new_statement ();  \
>> +} while (0);
>> +
>>  static gfc_statement
>>  decode_oacc_directive (void)
>>  {
>>locus old_locus;
>>char c;
>> +  bool spec_only = false;
>>  
>>gfc_enforce_clean_symbol_state ();
>>  
>> @@ -608,6 +622,10 @@ decode_oacc_directive (void)
>>return ST_NONE;
>>  }
>>  
>> +  if (gfc_current_state () == COMP_FUNCTION
>> +  && gfc_current_block ()->result->ts.kind == -1)
>> +spec_only = true;
>> +
>>gfc_unset_implicit_pure (NULL);
>>  
>>old_locus = gfc_current_locus;
>> @@ -627,7 +645,7 @@ decode_oacc_directive (void)
>>match ("cache", gfc_match_oacc_cache, ST_OACC_CACHE);
> 
> Why isn't ST_OACC_ATOMIC matcha?
> At least from the case_executable/case_exec_markers vs.
> case_decl defines, all directives but "routine" and "declare" should
> be matcha IMHO.

Because the atomic directive must operate on a sequence of instructions,
otherwise it should generate a syntax error.

> Also, can you figure out in the OpenACC standard and/or discuss on lang
> committee whether acc declare and/or acc routine can appear anywhere in the
> specification part, or need to be ordered certain way?
> If like in OpenMP they can appear anywhere, then
> case ST_OACC_ROUTINE: case ST_OACC_DECLARE
> should move from case_decl to case_omp_decl macro.

OK, I'll check with them. Can that be a follow up patch, or would you
like to see it resolved in one patch?

Is this patch OK for trunk and gcc6?

Cesar

2016-07-08  Cesar Philippidis  

	gcc/fortran/
	* parse.c (matcha): Define.
	(decode_oacc_directive): Add spec_only local var and set it.  Use
	matcha to parse acc data, enter data, exit data, host_data, parallel,
	kernels, update and wait directives.  Return ST_GET_FCN_CHARACTERISTICS
	if a non-declarative directive could be matched.

	gcc/testsuite/
	* gfortran.dg/goacc/pr71704.f90: New test.

diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index d795225..39fdd90 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -589,11 +589,25 @@ decode_statement (void)
   return ST_NONE;
 }
 
+/* Like match and if spec_only, goto do_spec_only without actually
+   matching.  */
+#define matcha(keyword, subr, st)\
+do {			\
+  if (spec_only && gfc_match (keyword) == MATCH_YES)	\
+	goto do_spec_only;	\
+  else if (match_word (keyword, subr, &old_locus)		\
+	   == MATCH_YES)	\
+	return st;		\
+  else			\
+	undo_new_statement ();  	\
+} while (0);
+
 static gfc_statement
 decode_oacc_directive (void)
 {
   locus old_locus;
   char c;
+  bool spec_only = false;
 
   gfc_enforce_clean_symbol_state ();
 
@@ -608,6 +622,10 @@ decode_oacc_directive (void)
   return ST_NONE;
 }
 
+  if (gfc_current_state () == COMP_FUNCTION
+  && gfc_current_block ()->result->ts.kind == -1)
+spec_only = true;
+
   gfc_unset_implicit_pure (NULL);
 
   old_locus = gfc_current_locus;
@@ -627,7 +645,7 @@ decode_oacc_directive (void)
   match ("cache", gfc_match_oacc_cache, ST_OACC_CACHE);
   break;
 case 'd':
-  match ("data", gfc_match_oacc_

Re: [committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)

2016-07-08 Thread Jakub Jelinek
On Fri, Jul 08, 2016 at 09:19:01AM -0700, Cesar Philippidis wrote:
> 2016-07-08  Cesar Philippidis  
> 
>   gcc/fortran/
>   * parse.c (matcha): Define.
>   (decode_oacc_directive): Add spec_only local var and set it.  Use
>   matcha to parse acc data, enter data, exit data, host_data, parallel,
>   kernels, update and wait directives.  Return ST_GET_FCN_CHARACTERISTICS
>   if a non-declarative directive could be matched.
> 
>   gcc/testsuite/
>   * gfortran.dg/goacc/pr71704-acc.f90: New test.

I'd drop the -acc suffix, the directory is enough to differentiate the gomp
vs. goacc test.

> --- a/gcc/fortran/parse.c
> +++ b/gcc/fortran/parse.c
> @@ -589,11 +589,25 @@ decode_statement (void)
>return ST_NONE;
>  }
>  
> +/* Like match, but don't match anything if not -fopenacc
> +   and if spec_only, goto do_spec_only without actually matching.  */

The comment doesn't match what the macro does.  The whole
decode_oacc_directive function is only called if -fopenacc, so
it is really "Like a match and if spec_only, ..."

> +#define matcha(keyword, subr, st)\
> +do { \
> +  if (spec_only && gfc_match (keyword) == MATCH_YES) \
> + goto do_spec_only;  \
> +  else if (match_word (keyword, subr, &old_locus)\
> +== MATCH_YES)\
> + return st;  \
> +  else   \
> + undo_new_statement ();  \
> +} while (0);
> +
>  static gfc_statement
>  decode_oacc_directive (void)
>  {
>locus old_locus;
>char c;
> +  bool spec_only = false;
>  
>gfc_enforce_clean_symbol_state ();
>  
> @@ -608,6 +622,10 @@ decode_oacc_directive (void)
>return ST_NONE;
>  }
>  
> +  if (gfc_current_state () == COMP_FUNCTION
> +  && gfc_current_block ()->result->ts.kind == -1)
> +spec_only = true;
> +
>gfc_unset_implicit_pure (NULL);
>  
>old_locus = gfc_current_locus;
> @@ -627,7 +645,7 @@ decode_oacc_directive (void)
>match ("cache", gfc_match_oacc_cache, ST_OACC_CACHE);

Why isn't ST_OACC_ATOMIC matcha?
At least from the case_executable/case_exec_markers vs.
case_decl defines, all directives but "routine" and "declare" should
be matcha IMHO.

Also, can you figure out in the OpenACC standard and/or discuss on lang
committee whether acc declare and/or acc routine can appear anywhere in the
specification part, or need to be ordered certain way?
If like in OpenMP they can appear anywhere, then
case ST_OACC_ROUTINE: case ST_OACC_DECLARE
should move from case_decl to case_omp_decl macro.

>break;
>  case 'd':
> -  match ("data", gfc_match_oacc_data, ST_OACC_DATA);
> +  matcha ("data", gfc_match_oacc_data, ST_OACC_DATA);
>match ("declare", gfc_match_oacc_declare, ST_OACC_DECLARE);
>break;
>  case 'e':
> @@ -639,19 +657,19 @@ decode_oacc_directive (void)
>match ("end loop", gfc_match_omp_eos, ST_OACC_END_LOOP);
>match ("end parallel loop", gfc_match_omp_eos, 
> ST_OACC_END_PARALLEL_LOOP);
>match ("end parallel", gfc_match_omp_eos, ST_OACC_END_PARALLEL);
> -  match ("enter data", gfc_match_oacc_enter_data, ST_OACC_ENTER_DATA);
> -  match ("exit data", gfc_match_oacc_exit_data, ST_OACC_EXIT_DATA);
> +  matcha ("enter data", gfc_match_oacc_enter_data, ST_OACC_ENTER_DATA);
> +  matcha ("exit data", gfc_match_oacc_exit_data, ST_OACC_EXIT_DATA);
>break;
>  case 'h':
> -  match ("host_data", gfc_match_oacc_host_data, ST_OACC_HOST_DATA);
> +  matcha ("host_data", gfc_match_oacc_host_data, ST_OACC_HOST_DATA);
>break;
>  case 'p':
>match ("parallel loop", gfc_match_oacc_parallel_loop, 
> ST_OACC_PARALLEL_LOOP);
> -  match ("parallel", gfc_match_oacc_parallel, ST_OACC_PARALLEL);
> +  matcha ("parallel", gfc_match_oacc_parallel, ST_OACC_PARALLEL);
>break;
>  case 'k':
>match ("kernels loop", gfc_match_oacc_kernels_loop, 
> ST_OACC_KERNELS_LOOP);
> -  match ("kernels", gfc_match_oacc_kernels, ST_OACC_KERNELS);
> +  matcha ("kernels", gfc_match_oacc_kernels, ST_OACC_KERNELS);
>break;
>  case 'l':
>match ("loop", gfc_match_oacc_loop, ST_OACC_LOOP);
> @@ -660,10 +678,10 @@ decode_oacc_directive (void)
>match ("routine", gfc_match_oacc_routine, ST_OACC_ROUTINE);
>break;
>  case 'u':
> -  match ("update", gfc_match_oacc_update, ST_OACC_UPDATE);
> +  matcha ("update", gfc_match_oacc_update, ST_OACC_UPDATE);
>break;
>  case 'w':
> -  match ("wait", gfc_match_oacc_wait, ST_OACC_WAIT);
> +  matcha ("wait", gfc_match_oacc_wait, ST_OACC_WAIT);
>break;
>  }
>  
> @@ -678,6 +696,13 @@ decode_oacc_directive (void)
>gfc_error_recovery ();

Re: Improve insert/emplace robustness to self insertion

2016-07-08 Thread Jonathan Wakely

On 06/07/16 21:46 +0200, François Dumont wrote:

Don't you plan to add it to the testsuite ?


Done with the attached patch.

On my side I rebased part of my patch to reorganize the code a little.
I reintroduced _M_realloc_insert, which isolates the code of
_M_insert_aux used when we need to reallocate memory, so _M_insert_aux
is used only when the insertion can be done in place. It is a nice
replacement for _M_emplace_back_aux, which has been removed. Most
vector modifiers start by checking whether we need to reallocate.
With this reorganization we don't check it several times. Moreover, as
soon as we reallocate we know that we don't need to make any temporary
copy, so insert_vs_emplace.cc test04 has been adapted and there is now
no situation where emplace and insert are not equivalent.


   * include/bits/stl_vector.h (push_back(const value_type&)): Forward
   to _M_realloc_insert.
   (insert(const_iterator, value_type&&)): Forward to _M_insert_rval.
   (_M_realloc_insert): Declare new function.
   (_M_emplace_back_aux): Remove definition.
   * include/bits/vector.tcc (emplace_back(_Args...)):
   Use _M_realloc_insert.
   (insert(const_iterator, const value_type&)): Likewise.
   (_M_insert_rval, _M_emplace_aux): Likewise.
   (_M_emplace_back_aux): Remove declaration.
   (_M_realloc_insert): Define.
   * testsuite/23_containers/vector/modifiers/insert_vs_emplace.cc:
   Adjust expected results for emplacing an lvalue with reallocation.

Tested under Linux x86_64.

Ok to commit ?


This is excellent work, thanks for doing it.

OK for trunk.


commit dd91c89f43bb79bc4e206824341536a234542c64
Author: redi 
Date:   Fri Jul 8 16:35:10 2016 +

	* testsuite/23_containers/vector/modifiers/insert/aliasing.cc: New.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@238169 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/aliasing.cc b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/aliasing.cc
new file mode 100644
index 000..2ef13b4
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/vector/modifiers/insert/aliasing.cc
@@ -0,0 +1,79 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-std=gnu++14" }
+
+#include <vector>
+#include <memory>
+#include <testsuite_hooks.h>
+
+// See https://gcc.gnu.org/ml/libstdc++/2016-07/msg8.html for background.
+
+struct T
+{
+ T(int v = 0) : value(v) { }
+ T(const T& t);
+ T& operator=(const T& t);
+ void make_child() { child = std::make_unique<T>(value + 10); }
+ std::unique_ptr<T> child;
+ int value;
+};
+
+T::T(const T& t) : value(t.value)
+{
+ if (t.child)
+   child.reset(new T(*t.child));
+}
+
+T& T::operator=(const T& t)
+{
+ value = t.value;
+ if (t.child)
+ {
+   if (child)
+ *child = *t.child;
+   else
+ child.reset(new T(*t.child));
+ }
+ else
+   child.reset();
+ return *this;
+}
+
+void
+test01()
+{
+ std::vector<T> v;
+ v.reserve(3);
+ v.push_back(T(1));
+ v.back().make_child();
+ v.push_back(T(2));
+ v.back().make_child();
+
+ VERIFY(v[1].child->value == 12);
+ VERIFY(v[1].child->child == nullptr);
+
+ v.insert(v.begin(), *v[1].child);
+
+ VERIFY(v[0].value == 12);
+ VERIFY(v[0].child == nullptr);
+}
+
+int main()
+{
+  test01();
+}


Re: [committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)

2016-07-08 Thread Cesar Philippidis
On 07/08/2016 09:18 AM, Jakub Jelinek wrote:
> On Fri, Jul 08, 2016 at 09:13:50AM -0700, Cesar Philippidis wrote:
>> On 06/30/2016 10:47 AM, Jakub Jelinek wrote:
>>
>>> The Fortran parser apparently relies, for functions whose result kind is
>>> still undecided, on the ST_GET_FCN_CHARACTERISTICS artificial statement
>>> being returned before any executable statements in the function.
>>> In normal statements that is ensured through decode_statement calling
>>> decode_specification_statement, which parses just a subset of statements,
>>> but for OpenMP we need to do something similar.  If we figure out we want
>>> only the case_omp_decl statements, for any other we just try to gfc_match
>>> the keyword and if we match it, it means we'd be about to return an OpenMP
>>> executable statement, so instead return ST_GET_FCN_CHARACTERISTICS.
>>>
>>> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
>>> queued for 6.2 backport.
>>>
>>> Cesar, note OpenACC will need something similar (though,
>>> decode_acc_statement uses just the match macro, so you'll need another one
>>> for the executable statements).
>>
>> Here's the OpenACC followup for this patch. Is it OK for trunk and gcc6?
> 
> ENOPATCH

Sorry!

Cesar

2016-07-08  Cesar Philippidis  

	gcc/fortran/
	* parse.c (matcha): Define.
	(decode_oacc_directive): Add spec_only local var and set it.  Use
	matcha to parse acc data, enter data, exit data, host_data, parallel,
	kernels, update and wait directives.  Return ST_GET_FCN_CHARACTERISTICS
	if a non-declarative directive could be matched.

	gcc/testsuite/
	* gfortran.dg/goacc/pr71704-acc.f90: New test.


diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c
index d795225..b1d9c00 100644
--- a/gcc/fortran/parse.c
+++ b/gcc/fortran/parse.c
@@ -589,11 +589,25 @@ decode_statement (void)
   return ST_NONE;
 }
 
+/* Like match, but don't match anything if not -fopenacc
+   and if spec_only, goto do_spec_only without actually matching.  */
+#define matcha(keyword, subr, st)\
+do {			\
+  if (spec_only && gfc_match (keyword) == MATCH_YES)	\
+	goto do_spec_only;	\
+  else if (match_word (keyword, subr, &old_locus)		\
+	   == MATCH_YES)	\
+	return st;		\
+  else			\
+	undo_new_statement ();  	\
+} while (0);
+
 static gfc_statement
 decode_oacc_directive (void)
 {
   locus old_locus;
   char c;
+  bool spec_only = false;
 
   gfc_enforce_clean_symbol_state ();
 
@@ -608,6 +622,10 @@ decode_oacc_directive (void)
   return ST_NONE;
 }
 
+  if (gfc_current_state () == COMP_FUNCTION
+  && gfc_current_block ()->result->ts.kind == -1)
+spec_only = true;
+
   gfc_unset_implicit_pure (NULL);
 
   old_locus = gfc_current_locus;
@@ -627,7 +645,7 @@ decode_oacc_directive (void)
   match ("cache", gfc_match_oacc_cache, ST_OACC_CACHE);
   break;
 case 'd':
-  match ("data", gfc_match_oacc_data, ST_OACC_DATA);
+  matcha ("data", gfc_match_oacc_data, ST_OACC_DATA);
   match ("declare", gfc_match_oacc_declare, ST_OACC_DECLARE);
   break;
 case 'e':
@@ -639,19 +657,19 @@ decode_oacc_directive (void)
   match ("end loop", gfc_match_omp_eos, ST_OACC_END_LOOP);
   match ("end parallel loop", gfc_match_omp_eos, ST_OACC_END_PARALLEL_LOOP);
   match ("end parallel", gfc_match_omp_eos, ST_OACC_END_PARALLEL);
-  match ("enter data", gfc_match_oacc_enter_data, ST_OACC_ENTER_DATA);
-  match ("exit data", gfc_match_oacc_exit_data, ST_OACC_EXIT_DATA);
+  matcha ("enter data", gfc_match_oacc_enter_data, ST_OACC_ENTER_DATA);
+  matcha ("exit data", gfc_match_oacc_exit_data, ST_OACC_EXIT_DATA);
   break;
 case 'h':
-  match ("host_data", gfc_match_oacc_host_data, ST_OACC_HOST_DATA);
+  matcha ("host_data", gfc_match_oacc_host_data, ST_OACC_HOST_DATA);
   break;
 case 'p':
   match ("parallel loop", gfc_match_oacc_parallel_loop, ST_OACC_PARALLEL_LOOP);
-  match ("parallel", gfc_match_oacc_parallel, ST_OACC_PARALLEL);
+  matcha ("parallel", gfc_match_oacc_parallel, ST_OACC_PARALLEL);
   break;
 case 'k':
   match ("kernels loop", gfc_match_oacc_kernels_loop, ST_OACC_KERNELS_LOOP);
-  match ("kernels", gfc_match_oacc_kernels, ST_OACC_KERNELS);
+  matcha ("kernels", gfc_match_oacc_kernels, ST_OACC_KERNELS);
   break;
 case 'l':
   match ("loop", gfc_match_oacc_loop, ST_OACC_LOOP);
@@ -660,10 +678,10 @@ decode_oacc_directive (void)
   match ("routine", gfc_match_oacc_routine, ST_OACC_ROUTINE);
   break;
 case 'u':
-  match ("update", gfc_match_oacc_update, ST_OACC_UPDATE);
+  matcha ("update", gfc_match_oacc_update, ST_OACC_UPDATE);
   break;
 case 'w':
-  match ("wait", gfc_match_oacc_wait, ST_OACC_WAIT);
+  matcha ("wait", gfc_match_oacc_wait, ST_OACC_WAIT);
   break;
 }
 
@@ -678,6 +696,13 @@ decode_oacc_directive (void)
   gfc_error_recovery ();
 
   return ST_NONE;
+
+ do_spec_only:
+  reject

Re: [committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)

2016-07-08 Thread Jakub Jelinek
On Fri, Jul 08, 2016 at 09:13:50AM -0700, Cesar Philippidis wrote:
> On 06/30/2016 10:47 AM, Jakub Jelinek wrote:
> 
> > The Fortran parser apparently relies, for functions whose result kind is
> > still undecided, on the ST_GET_FCN_CHARACTERISTICS artificial statement
> > being returned before any executable statements in the function.
> > In normal statements that is ensured through decode_statement calling
> > decode_specification_statement, which parses just a subset of statements,
> > but for OpenMP we need to do something similar.  If we figure out we want
> > only the case_omp_decl statements, for any other we just try to gfc_match
> > the keyword and if we match it, it means we'd be about to return an OpenMP
> > executable statement, so instead return ST_GET_FCN_CHARACTERISTICS.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
> > queued for 6.2 backport.
> > 
> > Cesar, note OpenACC will need something similar (though,
> > decode_acc_statement uses just the match macro, so you'll need another one
> > for the executable statements).
> 
> Here's the OpenACC followup for this patch. Is it OK for trunk and gcc6?

ENOPATCH

Jakub


Re: [committed] Fix OpenMP parsing of the specification part in functions (PR fortran/71704)

2016-07-08 Thread Cesar Philippidis
On 06/30/2016 10:47 AM, Jakub Jelinek wrote:

> The Fortran parser apparently relies, for functions whose result kind is
> still undecided, on the ST_GET_FCN_CHARACTERISTICS artificial statement
> being returned before any executable statements in the function.
> In normal statements that is ensured through decode_statement calling
> decode_specification_statement, which parses just a subset of statements,
> but for OpenMP we need to do something similar.  If we figure out we want
> only the case_omp_decl statements, for any other we just try to gfc_match
> the keyword and if we match it, it means we'd be about to return an OpenMP
> executable statement, so instead return ST_GET_FCN_CHARACTERISTICS.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
> queued for 6.2 backport.
> 
> Cesar, note OpenACC will need something similar (though,
> decode_acc_statement uses just the match macro, so you'll need another one
> for the executable statements).

Here's the OpenACC followup for this patch. Is it OK for trunk and gcc6?

Cesar
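
The idea described in the quoted text can be sketched outside of gfortran. This is a simplified model under stated assumptions, not the real parser: the enum, starts_with and decode_directive are hypothetical names, and only two directives are modelled. While the function's result kind is still undecided (spec_only), a keyword that would start an executable directive is not parsed; the decoder reports the artificial "get function characteristics" statement instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for gfc_statement values; not GCC's real enum.
enum class Statement { None, GetFcnCharacteristics, OaccDeclare, OaccParallel };

// True if TEXT begins with keyword KW.
static bool starts_with(const std::string& text, const std::string& kw)
{
  return text.compare(0, kw.size(), kw) == 0;
}

// Simplified model of the decoding logic.
Statement decode_directive(const std::string& text, bool spec_only)
{
  // Declarative directives belong to the specification part and are
  // matched normally in either mode.
  if (starts_with(text, "declare"))
    return Statement::OaccDeclare;

  // Executable directive: in spec_only mode, recognising the keyword is
  // enough to know the specification part has ended.
  if (starts_with(text, "parallel"))
    return spec_only ? Statement::GetFcnCharacteristics
                     : Statement::OaccParallel;

  return Statement::None;
}
```

In the real parser the spec_only flag would correspond to the current block being a function whose result kind has not been resolved yet.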


Re: [PATCH, GCC/LRA] Teach LRA to not use same register value for multiple output operands of an insn

2016-07-08 Thread Mike Stump
On Jul 8, 2016, at 8:07 AM, Thomas Preudhomme wrote:
> While investigating the root cause of a testsuite regression for the
> ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug
> seems to also affect trunk.

Hum...  If in 6.x, and safe to back port to 6, a back port would be nice...  I 
use LRA in 6.x, and seems like I'd be susceptible to this sort of thing, but, I 
didn't test it.

[COMMITTED][AArch64] Fix simd intrinsics bug on float vminnm/vmaxnm

2016-07-08 Thread Jiong Wang

On 07/07/16 10:34, James Greenhalgh wrote:


To make backporting easier, could you please write a very simple
standalone test that exposes this bug, and submit this patch with just
that simple test? I've already OKed the functional part of this patch, and
I'm happy to pre-approve a simple testcase.

With that committed to trunk, this needs to go to all active release
branches please.


Committed the attached patch to trunk as r238166.  The fmax/fmin patterns
were introduced by [1], which has been available since gcc 6, so I also
backported the patch to the gcc 6 branch as r238167.

--
[1] https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02654.html
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index 3e4740c..f1ad325 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -244,13 +244,17 @@
   /* Implemented by 3.
  smax variants map to fmaxnm,
  smax_nan variants map to fmax.  */
-  BUILTIN_VDQIF (BINOP, smax, 3)
-  BUILTIN_VDQIF (BINOP, smin, 3)
+  BUILTIN_VDQ_BHSI (BINOP, smax, 3)
+  BUILTIN_VDQ_BHSI (BINOP, smin, 3)
   BUILTIN_VDQ_BHSI (BINOP, umax, 3)
   BUILTIN_VDQ_BHSI (BINOP, umin, 3)
   BUILTIN_VDQF (BINOP, smax_nan, 3)
   BUILTIN_VDQF (BINOP, smin_nan, 3)
 
+  /* Implemented by 3.  */
+  BUILTIN_VDQF (BINOP, fmax, 3)
+  BUILTIN_VDQF (BINOP, fmin, 3)
+
   /* Implemented by aarch64_p.  */
   BUILTIN_VDQ_BHSI (BINOP, smaxp, 0)
   BUILTIN_VDQ_BHSI (BINOP, sminp, 0)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index ed24b59..b0ab1d3 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -17588,19 +17588,19 @@ vpminnms_f32 (float32x2_t a)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vmaxnm_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return __builtin_aarch64_smaxv2sf (__a, __b);
+  return __builtin_aarch64_fmaxv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vmaxnmq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return __builtin_aarch64_smaxv4sf (__a, __b);
+  return __builtin_aarch64_fmaxv4sf (__a, __b);
 }
 
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vmaxnmq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return __builtin_aarch64_smaxv2df (__a, __b);
+  return __builtin_aarch64_fmaxv2df (__a, __b);
 }
 
 /* vmaxv  */
@@ -17818,19 +17818,19 @@ vminq_u32 (uint32x4_t __a, uint32x4_t __b)
 __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
 vminnm_f32 (float32x2_t __a, float32x2_t __b)
 {
-  return __builtin_aarch64_sminv2sf (__a, __b);
+  return __builtin_aarch64_fminv2sf (__a, __b);
 }
 
 __extension__ static __inline float32x4_t __attribute__ ((__always_inline__))
 vminnmq_f32 (float32x4_t __a, float32x4_t __b)
 {
-  return __builtin_aarch64_sminv4sf (__a, __b);
+  return __builtin_aarch64_fminv4sf (__a, __b);
 }
 
 __extension__ static __inline float64x2_t __attribute__ ((__always_inline__))
 vminnmq_f64 (float64x2_t __a, float64x2_t __b)
 {
-  return __builtin_aarch64_sminv2df (__a, __b);
+  return __builtin_aarch64_fminv2df (__a, __b);
 }
 
 /* vminv  */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
new file mode 100644
index 000..8333f03
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vminmaxnm_1.c
@@ -0,0 +1,82 @@
+/* Test the `v[min|max]nm{q}_f*' AArch64 SIMD intrinsic.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include "arm_neon.h"
+
+extern void abort ();
+
+#define CHECK(T, N, R, E) \
+  {\
+int i = 0;\
+for (; i < N; i++)\
+  if (* (T *) &R[i] != * (T *) &E[i])\
+	abort ();\
+  }
+
+int
+main (int argc, char **argv)
+{
+  float32x2_t f32x2_input1 = vdup_n_f32 (-1.0);
+  float32x2_t f32x2_input2 = vdup_n_f32 (0.0);
+  float32x2_t f32x2_exp_minnm  = vdup_n_f32 (-1.0);
+  float32x2_t f32x2_exp_maxnm  = vdup_n_f32 (0.0);
+  float32x2_t f32x2_ret_minnm  = vminnm_f32 (f32x2_input1, f32x2_input2);
+  float32x2_t f32x2_ret_maxnm  = vmaxnm_f32 (f32x2_input1, f32x2_input2);
+
+  CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
+  CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
+
+  f32x2_input1 = vdup_n_f32 (__builtin_nanf (""));
+  f32x2_input2 = vdup_n_f32 (1.0);
+  f32x2_exp_minnm  = vdup_n_f32 (1.0);
+  f32x2_exp_maxnm  = vdup_n_f32 (1.0);
+  f32x2_ret_minnm  = vminnm_f32 (f32x2_input1, f32x2_input2);
+  f32x2_ret_maxnm  = vmaxnm_f32 (f32x2_input1, f32x2_input2);
+
+  CHECK (uint32_t, 2, f32x2_ret_minnm, f32x2_exp_minnm);
+  CHECK (uint32_t, 2, f32x2_ret_maxnm, f32x2_exp_maxnm);
+
+  float32x4_t f32x4_input1 = vdupq_n_f32 (-1024.0);
+  float32x4_t f32x4_input2 = vdupq_n_f32 (77.0);
+  float32x4_t f32x4_exp_minnm  = vdupq_n_f32 (-1024.0);
+  float32x4_t f32x4_exp_maxnm  = vdupq_n_f32 (77.0);
+  float32x4_t f32x4_ret_minnm  = vminnmq_f32 (f32x4_input1, f32x4_input2);
+  float32x4_t f32x4_ret_ma
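
For readers without AArch64 hardware, the expected values in the test above can be reproduced with a scalar analogue. This is only a sketch and an assumption on my part: std::fmin and std::fmax are not the intrinsics, but they share the property the test checks, namely that a quiet NaN operand is ignored when the other operand is a number. The wrapper names minnm and maxnm are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Scalar stand-ins for the per-lane behaviour checked by the test.
// std::fmin/std::fmax return the numeric operand when exactly one
// operand is a quiet NaN.
float minnm(float a, float b) { return std::fmin(a, b); }
float maxnm(float a, float b) { return std::fmax(a, b); }

// Convenience accessor for a quiet NaN, mirroring __builtin_nanf ("").
float quiet_nan() { return std::numeric_limits<float>::quiet_NaN(); }
```

With these helpers, minnm(quiet_nan(), 1.0f) and maxnm(quiet_nan(), 1.0f) both yield 1.0f, matching the second group of CHECKs in the test.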

Re: [PATCH, rs6000] Fix PR71297 (ICE on invalid calls to vec_ld and vec_st)

2016-07-08 Thread Bill Schmidt
Fixed in trunk with r238168, test case included.  Thanks!

Bill

> On Jul 8, 2016, at 7:29 AM, Bill Schmidt  wrote:
> 
>> 
>> On Jul 8, 2016, at 12:14 AM, Segher Boessenkool wrote:
>> 
>> On Thu, Jul 07, 2016 at 03:40:28PM -0500, Bill Schmidt wrote:
>>> PR71297 reports that we ICE when __builtin_vec_ld or __builtin_vec_st is
>>> provided with an incorrect number of arguments.  This patch fixes it by
>>> bypassing special handling for these intrinsics when the number of
>>> arguments is wrong, thus allowing the standard error handling for
>>> builtins to kick in.
>>> 
>>> The patch is pretty obvious and I think adding a test case would be
>>> extraneous, though I can do so if desired.
>> 
>> Well you could use the one from the PR?
>> 
>>> Bootstrapped and tested on
>>> powerpc64le-unknown-linux-gnu with no regressions, and the original
>>> failure is fixed.  Is this ok for trunk?
>> 
>> Yes, but please do a testcase.  Okay for backports, too.
> 
> No backports required; this is a 7 regression.
> 
> Bill
> 
>> 
>> 
>> Segher
>> 
>> 
PR target/71297
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
Allow standard error handling to take over when a wrong number
of arguments is presented to __builtin_vec_ld () or
__builtin_vec_st ().



[patch] libstdc++/58265 backport to gcc-5-branch (?)

2016-07-08 Thread Jonathan Wakely

I've had a request to backport the allocator propagation support for
std::__cxx11::string to the gcc-5-branch. That was done for gcc-6 by
this patch: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00784.html

The patch also fixed the fact that moving strings was not noexcept.
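
A quick way to observe the noexcept property mentioned above is a type-trait query. This is a minimal sketch, assuming a library in which the std::string move constructor is declared noexcept (as the patch provides for std::__cxx11::string); the function names are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Reports whether the library declares the std::string move constructor
// noexcept.  Containers such as std::vector consult this trait to decide
// whether to move or copy elements when they reallocate.
constexpr bool string_move_ctor_is_noexcept()
{
  return std::is_nothrow_move_constructible<std::string>::value;
}

// Move-constructs a new string from S and returns it.
std::string take(std::string&& s)
{
  return std::string(std::move(s));
}
```

On a library without the fix the trait would report false, and std::vector<std::string> would fall back to copying its elements when growing.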

Does anyone see any problem with doing that backport to complete the
std::__cxx11::string implementation in gcc-5?

I've already done it locally and confirmed all the tests pass, of
course. The patch for the branch is attached.



commit d4691e319e31386c218304a59e80a1d5ad3cee1e
Author: redi 
Date:   Fri Sep 11 11:02:14 2015 +

Implement N4258 noexcept for std::basic_string.

	Backport from mainline
	2015-10-02  Jonathan Wakely  

	* testsuite/21_strings/basic_string/allocator/char/minimal.cc: Guard
	explicit instantiation with check for new ABI.
	* testsuite/21_strings/basic_string/allocator/wchar_t/minimal.cc:
	Likewise. Use wchar_t as char_type.

	Backport from mainline
	2015-09-11  Jonathan Wakely  

	PR libstdc++/58265
	* doc/xml/manual/intro.xml: Document LWG 2063 and 2064 resolutions.
	* doc/html/manual/bugs.html: Regenerate.
	* include/bits/basic_string.h (basic_string): Implement N4258. Add
	correct exception-specifications and propagate allocators correctly.
	* include/bits/basic_string.tcc (basic_string::swap): Propagate
	allocators correctly.
	* include/debug/string (__gnu_debug::basic_string): Add correct
	exception-specifications and allocator-extended constructors.
	* testsuite/21_strings/basic_string/allocator/char/copy.cc: New.
	* testsuite/21_strings/basic_string/allocator/char/copy_assign.cc:
	New.
	* testsuite/21_strings/basic_string/allocator/char/minimal.cc: New.
	* testsuite/21_strings/basic_string/allocator/char/move.cc: New.
	* testsuite/21_strings/basic_string/allocator/char/move_assign.cc:
	New.
	* testsuite/21_strings/basic_string/allocator/char/noexcept.cc: New.
	* testsuite/21_strings/basic_string/allocator/char/swap.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/copy.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc:
	New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/minimal.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/move.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/move_assign.cc:
	New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/noexcept.cc: New.
	* testsuite/21_strings/basic_string/allocator/wchar_t/swap.cc: New.
	* testsuite/util/testsuite_allocator.h (tracker_allocator): Define
	defaulted assignment operators.

diff --git a/libstdc++-v3/doc/html/manual/bugs.html b/libstdc++-v3/doc/html/manual/bugs.html
index 8dccb02..02963ee 100644
--- a/libstdc++-v3/doc/html/manual/bugs.html
+++ b/libstdc++-v3/doc/html/manual/bugs.html
@@ -363,6 +363,12 @@
 2059:
 	C++0x ambiguity problem with map::erase
 Add additional overloads.
+2063:
+	Contradictory requirements for string move assignment
+Respect propagation trait for move assignment.
+2064:
+	More noexcept issues in basic_string
+Add noexcept to the comparison operators.
 2067:
 	packaged_task should have deleted copy c'tor with const parameter
 Fix signatures.
diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 2169905..1deb413 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -840,6 +840,18 @@ requirements of the license of GCC.
 Add additional overloads.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2063">2063:
+	Contradictory requirements for string move assignment
+
+Respect propagation trait for move assignment.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2064">2064:
+	More noexcept issues in basic_string
+
+Add noexcept to the comparison operators.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2067">2067:
 	packaged_task should have deleted copy c'tor with const parameter
 
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 9ef5be9..4020dcb 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -387,7 +387,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @brief  Construct an empty string using allocator @a a.
*/
   explicit
-  basic_string(const _Alloc& __a)
+  basic_string(const _Alloc& __a) _GLIBCXX_NOEXCEPT
   : _M_dataplus(_M_local_data(), __a)
   { _M_set_length(0); }
 
@@ -396,7 +396,8 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*  @param  __str  Source string.
*/
   basic_string(const basic_string& __str)
-  : _M_dataplus(_M_local_data(), __str._M_get_allocator()) // TODO A traits
+  

[PATCH, GCC/LRA] Teach LRA to not use same register value for multiple output operands of an insn

2016-07-08 Thread Thomas Preudhomme
Hi,

While investigating the root cause of a testsuite regression for the
ARM/embedded-5-branch GCC in gcc.dg/vect/slp-perm-5.c, we found that the bug
seems to also affect trunk. The bug manifests itself as an ICE in cselib due to
a parallel insn with two SETs to the same register. When processing the second
SET in cselib_record_set (), the assert gcc_assert (REG_VALUES (dreg)->elt ==
0) fails because the field was already set when processing the first SET. The
root cause seems to be a register allocation issue in lra-constraints.

When considering an output operand with matching input operand(s),
match_reload does a number of checks to see if it can reuse the first matching
input operand register value or if a new unique value should be given to the
output operand. The current check ignores the case of multiple output operands
(as in the neon_vtrn_insn insn pattern in config/arm/arm.md). This can lead
to cases where multiple output operands share the same register value when the
first matching input operand has the same value as another output operand,
leading to the ICE in cselib. This patch changes match_reload to get
information about other output operands and check whether this case is met or
not.
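
The extra check described above can be illustrated with a small model. This is a sketch under simplifying assumptions, not the code in lra-constraints.c: operands are reduced to the integer "value" LRA associates with a register, and both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model: each register operand is represented
// only by its register value number.  None of these names exist in GCC.

// True if the matching input's value is already used by another output
// operand of the same insn.
bool conflicts_with_other_output(int in_value,
                                 const std::vector<int>& other_out_values)
{
  for (int v : other_out_values)
    if (v == in_value)
      return true;
  return false;
}

// True when the output reload may share the matching input's value;
// false when a fresh unique value must be created instead.
bool may_share_input_value(bool early_clobber, bool input_dies_here,
                           int in_value,
                           const std::vector<int>& other_out_values)
{
  return !early_clobber
         && input_dies_here
         && !conflicts_with_other_output(in_value, other_out_values);
}
```

The last condition is the new one: without it, two outputs of a parallel insn could end up with the same value, which is the situation cselib asserts against.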

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2016-07-01  Thomas Preud'homme  

* lra-constraints.c (match_reload): Pass information about other
output operands.  Create new unique register value if matching input
operand shares same register value as output operand being considered.
(curr_insn_transform): Record output operands already processed.


The patch passed bootstrap on arm-none-linux-gnueabihf (Cortex-A57 in ARM mode
as well as Thumb mode), aarch64-linux-gnu (Cortex-A57) and x86_64-linux-gnu,
and testsuite results do not regress for these or for arm-none-eabi
targeting Cortex-A8.

Is this ok for trunk?

Best regards,

Thomas

diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index bf08dce2e0a4c2ef4c339aedbda4dba47cba1645..a3fd6c93c648050f3479dc8aca359a819d24863e 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -871,15 +871,18 @@ regno_val_use_in (unsigned int regno, rtx x)
 }
 
 /* Generate reloads for matching OUT and INS (array of input operand
-   numbers with end marker -1) with reg class GOAL_CLASS.  Add input
-   and output reloads correspondingly to the lists *BEFORE and *AFTER.
-   OUT might be negative.  In this case we generate input reloads for
-   matched input operands INS.  EARLY_CLOBBER_P is a flag that the
-   output operand is early clobbered for chosen alternative.  */
+   numbers with end marker -1) with reg class GOAL_CLASS, considering
+   output operands OUTS (similar array to INS) needing to be in different
+   registers.  Add input and output reloads correspondingly to the lists
+   *BEFORE and *AFTER.  OUT might be negative.  In this case we generate
+   input reloads for matched input operands INS.  EARLY_CLOBBER_P is a flag
+   that the output operand is early clobbered for chosen alternative.  */
 static void
-match_reload (signed char out, signed char *ins, enum reg_class goal_class,
-	  rtx_insn **before, rtx_insn **after, bool early_clobber_p)
+match_reload (signed char out, signed char *ins, signed char *outs,
+	  enum reg_class goal_class, rtx_insn **before,
+	  rtx_insn **after, bool early_clobber_p)
 {
+  bool out_conflict;
   int i, in;
   rtx new_in_reg, new_out_reg, reg;
   machine_mode inmode, outmode;
@@ -968,12 +971,32 @@ match_reload (signed char out, signed char *ins, enum reg_class goal_class,
 	 We don't care about eliminable hard regs here as we are
 	 interesting only in pseudos.  */
 
+  /* Matching input's register value is the same as one of the other
+	 output operand.  Output operands in a parallel insn must be in
+	 different registers.  */
+  out_conflict = false;
+  if (REG_P (in_rtx))
+	{
+	  for (i = 0; outs[i] >= 0; i++)
+	{
+	  rtx other_out_rtx = *curr_id->operand_loc[outs[i]];
+	  if (REG_P (other_out_rtx)
+		  && (regno_val_use_in (REGNO (in_rtx), other_out_rtx)
+		  != NULL_RTX))
+		{
+		  out_conflict = true;
+		  break;
+		}
+	}
+	}
+
   new_in_reg = new_out_reg
 	= (! early_clobber_p && ins[1] < 0 && REG_P (in_rtx)
 	   && (int) REGNO (in_rtx) < lra_new_regno_start
 	   && find_regno_note (curr_insn, REG_DEAD, REGNO (in_rtx))
 	   && (out < 0
 	   || regno_val_use_in (REGNO (in_rtx), out_rtx) == NULL_RTX)
+	   && !out_conflict
 	   ? lra_create_new_reg (inmode, in_rtx, goal_class, "")
 	   : lra_create_new_reg_with_unique_value (outmode, out_rtx,
 		   goal_class, ""));
@@ -3432,9 +3455,11 @@ curr_insn_transform (bool check_only_p)
   int i, j, k;
   int n_operands;
   int n_alternatives;
+  int n_outputs;
   int commutative;
   signed char goal_alt_matched[MAX_RECOG_OPERANDS][MAX_RECOG_OPERANDS];
   signed char match_inputs[MAX_RECOG_OPERANDS + 1];
+  signed char outputs[MAX_RECOG_OPERANDS + 1];
   rtx_i

[PING] Re: [PATCH] input.c: add lexing selftests and a test matrix for line_table states

2016-07-08 Thread David Malcolm
Ping.

I believe I need review of the selftest.h change; the rest I think I
can self-approve, if need be.

  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01340.html


On Fri, 2016-06-17 at 17:41 -0400, David Malcolm wrote:
> This patch adds explicit testing of lexing a source file,
> generalizing this (and the test of ordinary line maps) over
> a 2-dimensional test matrix covering:
> 
>   (1) line_table->default_range_bits: some frontends use a non-zero value
>   and others use zero
> 
>   (2) the fallback modes within line-map.c: there are various threshold
>   values for source_location/location_t beyond which line-map.c changes
>   behavior (disabling of the range-packing optimization, disabling
>   of column-tracking).  We exercise these by starting the line_table
>   at interesting values at or near these thresholds.
> 
> This helps ensure that location data works in all of these states,
> and that (I hope) we don't have lingering bugs relating to the
> transition between line_table states.
> 
> Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu;
> Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0.
> 
> OK for trunk?  (I can self-approve much of this, but it's probably
> worth having another pair of eyes look at it, if nothing else).
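
The two-dimensional matrix described above can be sketched as a pair of nested loops. This is an illustration only: LineTableCase and make_test_matrix are hypothetical names, and the thresholds are passed in as parameters instead of using the real LINE_MAP_MAX_LOCATION_* constants.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// One cell of the test matrix: a default_range_bits setting combined
// with the location value at which the line table starts.
struct LineTableCase
{
  int default_range_bits;
  std::uint32_t base_location;
};

// Build every combination of range-bits setting and starting location,
// probing just below, at, and just above each threshold.
std::vector<LineTableCase>
make_test_matrix(const std::vector<int>& range_bits,
                 const std::vector<std::uint32_t>& thresholds)
{
  std::vector<LineTableCase> cases;
  for (int bits : range_bits)
    for (std::uint32_t t : thresholds)
      for (std::uint32_t start : {t - 1, t, t + 1})
        cases.push_back({bits, start});
  return cases;
}
```

Each generated case would then be handed to the individual selftests, so that every test runs once per line_table state.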
> 
> gcc/ChangeLog:
>	* input.c: Include cpplib.h.
>	(selftest::temp_source_file): New class.
>	(selftest::temp_source_file::temp_source_file): New ctor.
>	(selftest::temp_source_file::~temp_source_file): New dtor.
>	(selftest::should_have_column_data_p): New function.
>	(selftest::test_should_have_column_data_p): New function.
>	(selftest::temp_line_table): New class.
>	(selftest::temp_line_table::temp_line_table): New ctor.
>	(selftest::temp_line_table::~temp_line_table): New dtor.
>	(selftest::test_accessing_ordinary_linemaps): Add case_ param; use
>	it to create a temp_line_table.
>	(selftest::assert_loceq): Only verify LOCATION_COLUMN for
>	locations that are known to have column data.
>	(selftest::line_table_case): New struct.
>	(selftest::test_reading_source_line): Move tempfile handling
>	to class temp_source_file.
>	(ASSERT_TOKEN_AS_TEXT_EQ): New macro.
>	(selftest::assert_token_loc_eq): New function.
>	(ASSERT_TOKEN_LOC_EQ): New macro.
>	(selftest::test_lexer): New function.
>	(selftest::boundary_locations): New array.
>	(selftest::input_c_tests): Call test_should_have_column_data_p.
>	Loop over a test matrix of interesting values of location and
>	default_range_bits, calling test_lexer on each case in the matrix.
>	Move call to test_accessing_ordinary_linemaps into the matrix.
>	* selftest.h (ASSERT_EQ): Reimplement in terms of...
>	(ASSERT_EQ_AT): New macro.
> 
> gcc/testsuite/ChangeLog:
>	* gcc.dg/plugin/location_overflow_plugin.c (plugin_init): Avoid
>	hardcoding the values of LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES
>	and LINE_MAP_MAX_LOCATION_WITH_COLS.
> 
> libcpp/ChangeLog:
>	* include/line-map.h (LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES):
>	Move here from line-map.c.
>	(LINE_MAP_MAX_LOCATION_WITH_COLS): Likewise.
>	* line-map.c (LINE_MAP_MAX_LOCATION_WITH_PACKED_RANGES): Move from
>	here to line-map.h.
>	(LINE_MAP_MAX_LOCATION_WITH_COLS): Likewise.
> ---
>  gcc/input.c                                        | 323 +++--
>  gcc/selftest.h                                     |  12 +-
>  .../gcc.dg/plugin/location_overflow_plugin.c       |   4 +-
>  libcpp/include/line-map.h                          |  10 +
>  libcpp/line-map.c                                  |  12 -
>  5 files changed, 327 insertions(+), 34 deletions(-)
> 
> diff --git a/gcc/input.c b/gcc/input.c
> index 3fb4a25..0016555 100644
> --- a/gcc/input.c
> +++ b/gcc/input.c
> @@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "intl.h"
>  #include "diagnostic-core.h"
>  #include "selftest.h"
> +#include "cpplib.h"
>  
>  /* This is a cache used by get_next_line to store the content of a
> file to be searched for file lines.  */
> @@ -1144,6 +1145,74 @@ namespace selftest {
>  
>  /* Selftests of location handling.  */
>  
> +/* A class for writing out a temporary sourcefile for use in selftests
> +   of input handling.  */
> +
> +class temp_source_file
> +{
> + public:
> +  temp_source_file (const location &loc, const char *suffix,
> +		    const char *content);
> +  ~temp_source_file ();
> +
> +  const char *get_filename () const { return m_filename; }
> +
> + private:
> +  char *m_filename;
> +};
> +
> +/* Constructor.  Create a tempfile using SUFFIX, and write CONTENT to
> +   it.  Abort if anything goes wrong, using LOC as the effective
> +   location in the problem report.  */
> +
> +temp_source_file::temp_source_file (const location

Re: [PATCH, rs6000] PR71800, use correct constraint for stxsiwx

2016-07-08 Thread Segher Boessenkool
On Fri, Jul 08, 2016 at 12:43:35AM -0500, Segher Boessenkool wrote:
> On Thu, Jul 07, 2016 at 03:42:55PM -0500, Pat Haugen wrote:
> > The following patch corrects the constraint so that we only generate 
> > 'stxsiwx' on Power8 or later hardware. Ok for trunk after successful 
> > bootstrap/regtest?
> 
> I don't really understand this.  Before, it required UPPER_REGS_DF (which
> seems correct), and now it requires UPPER_REGS_SF, which seems wrong.

After some discussion...  Okay, so it is a bit sub-optimal, but it does
work.  Okay for trunk, with a comment on why it is "wu" ("because "wu"
requires power8").

Thanks,


Segher


> > --- config/rs6000/rs6000.md (revision 238117)
> > +++ config/rs6000/rs6000.md (working copy)
> > @@ -5748,7 +5748,7 @@ (define_expand "lrounddi2"
> >  ; An UNSPEC is used so we don't have to support SImode in FP registers.
> >  (define_insn "stfiwx"
> >[(set (match_operand:SI 0 "memory_operand" "=Z,Z")
> > -   (unspec:SI [(match_operand:DI 1 "gpc_reg_operand" "d,wv")]
> > +   (unspec:SI [(match_operand:DI 1 "gpc_reg_operand" "d,wu")]
> >UNSPEC_STFIWX))]
> >"TARGET_PPC_GFXOPT"
> >"@


Re: [PATCH 0/9] separate shrink-wrapping

2016-07-08 Thread Bill Schmidt
Not that getting the terminology right isn't important, but it would be
nice if Segher could get a review for the rest of the content, too. :)

Bill

> On Jul 8, 2016, at 8:45 AM, Segher Boessenkool wrote:
> 
> On Fri, Jul 08, 2016 at 09:16:03AM -0400, David Malcolm wrote:
>> As far as I understand the idea, there are a number of target-specific
>> things that are to be done during a function call, and the optimization
>> tries to optimize each of these separately.
>> 
>> Some synonyms and near-synonyms for these "things":
>> 
>>  aspect
>>  component
>>  concern
>>  duty
>>  element
>>  facet
>>  factor
>>  item
>>  part
>>  piece
>>  portion
>>  responsibility
>> 
>> and I suppose "shrink_wrap_part" is shorter than
>> "shrink_wrap_component".
> 
> The reason I called it "concern" is that this isn't dealing with the
> prologue/epilogue divided neatly into separate insns.  The generic code
> only deals with what basic blocks will have what concerns the prologue
> deals with, dealt with.  The target code then worries about what code
> to write for that.  "concerns" does not map 1-1 to parts of the prologue,
> in the general case.  (A very simple example: the arm load/store pair
> instructions).
> 
> But component is abstract enough I think.
> 
>> (Yeah, I'm bike-shedding; sorry)
> 
> :-)
> 
> 
> Segher
> 



[PATCH] Fix Fortran DO loop fallback

2016-07-08 Thread Martin Liška
Hello

Following patch fixes fallout caused by the patch set:
https://gcc.gnu.org/ml/gcc-regression/2016-07/msg00097.html

Ready to be installed after it finishes regression testing?
Thanks,
Martin
From c5dd7ad62f795cce560c7f1bb8767b7ed9298d8a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 8 Jul 2016 15:51:54 +0200
Subject: [PATCH] Fix Fortran DO loop fallback

gcc/testsuite/ChangeLog:

2016-07-08  Martin Liska  

	* gfortran.dg/ldist-1.f90: Update expected dump scan.
	* gfortran.dg/pr42108.f90: Likewise.
	* gfortran.dg/vect/pr62283.f: Likewise.
---
 gcc/testsuite/gfortran.dg/ldist-1.f90| 2 +-
 gcc/testsuite/gfortran.dg/pr42108.f90| 2 --
 gcc/testsuite/gfortran.dg/vect/pr62283.f | 2 +-
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gfortran.dg/ldist-1.f90 b/gcc/testsuite/gfortran.dg/ldist-1.f90
index 2030328..071a651 100644
--- a/gcc/testsuite/gfortran.dg/ldist-1.f90
+++ b/gcc/testsuite/gfortran.dg/ldist-1.f90
@@ -32,4 +32,4 @@ end Subroutine PADEC
 ! There are 5 legal partitions in this code.  Based on the data
 ! locality heuristic, this loop should not be split.
 
-! { dg-final { scan-tree-dump "distributed: split to" "ldist" } }
+! { dg-final { scan-tree-dump-times "distributed: split to" 0 "ldist" } }
diff --git a/gcc/testsuite/gfortran.dg/pr42108.f90 b/gcc/testsuite/gfortran.dg/pr42108.f90
index eb93604..a913aa4 100644
--- a/gcc/testsuite/gfortran.dg/pr42108.f90
+++ b/gcc/testsuite/gfortran.dg/pr42108.f90
@@ -21,7 +21,5 @@ subroutine  eval(foo1,foo2,foo3,foo4,x,n,nnd)
   end do
 end subroutine eval
 
-! We should have hoisted the division
-! { dg-final { scan-tree-dump "in all uses of countm1\[^\n\]* / " "pre" } }
 ! There should be only one load from n left
 ! { dg-final { scan-tree-dump-times "\\*n_" 1 "fre1" } }
diff --git a/gcc/testsuite/gfortran.dg/vect/pr62283.f b/gcc/testsuite/gfortran.dg/vect/pr62283.f
index 7df3d99..2933f51 100644
--- a/gcc/testsuite/gfortran.dg/vect/pr62283.f
+++ b/gcc/testsuite/gfortran.dg/vect/pr62283.f
@@ -13,4 +13,4 @@ C { dg-additional-options "-fvect-cost-model=dynamic" }
   beta=3.141593
   y=y+beta*x
   end
-C { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_hw_misalign } } } }
+C { dg-final { scan-tree-dump "vectorized 1 loops" "vect" { target { vect_hw_misalign } } } }
-- 
2.8.4



Re: [PATCH] Fix PR rtl-optimization/71634

2016-07-08 Thread Martin Liška
On 07/08/2016 02:54 PM, Martin Liška wrote:
> On 07/08/2016 01:59 PM, Bernd Schmidt wrote:
>>
>> Gah, that's not right, that'll swap the numbers of kept/removed loops.
>>
>> I think the right answer is simply
>>   for (i = 0; i < n - IRA_MAX_LOOPS_NUM; i++)
>>
>>
>> Bernd
> 
> Thank you for the help, I've been testing the suggested change.
> 
> Martin
> 

It survives regression tests and bootstrap.
May I install the patch?

Thanks,
Martin


Re: [PATCH], PR 71806, Fix -mfloat128/-mfloat128-hardware defaults on power9

2016-07-08 Thread Michael Meissner
On Fri, Jul 08, 2016 at 09:13:50AM -0500, Segher Boessenkool wrote:
> On Fri, Jul 08, 2016 at 09:31:33AM -0400, Michael Meissner wrote:
> > * gcc.target/powerpc/p9-lxvx-stxvx-3.c: Add -mfloat128 option.
> 
> Is that the only testcase that needs updating?
> 
> > --- gcc/config/rs6000/rs6000-cpus.def   (revision 238127)
> > +++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
> > @@ -63,7 +63,6 @@
> >  /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
> > P9_MINMAX until the hardware that supports it is available.  */
> >  #define ISA_3_0_MASKS_SERVER   (ISA_2_7_MASKS_SERVER   \
> > -| OPTION_MASK_FLOAT128_HW  \
> 
> Please add a comment for this as well?

Ok.

> >/* IEEE 128-bit floating point hardware instructions imply enabling
> >   __float128.  */
> >if (TARGET_FLOAT128_HW
> > -  && (rs6000_isa_flags & (OPTION_MASK_P9_VECTOR
> > - | OPTION_MASK_DIRECT_MOVE
> > - | OPTION_MASK_UPPER_REGS_DI
> > - | OPTION_MASK_UPPER_REGS_DF
> > - | OPTION_MASK_UPPER_REGS_SF)) == 0)
> > +  && (rs6000_isa_flags & ISA_3_0_MASKS_IEEE) != ISA_3_0_MASKS_IEEE)
> >  {
> >if ((rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128_HW) != 0)
> > error ("-mfloat128-hardware requires full ISA 3.0 support");
> 
> That is not the same thing...  New one looks better, is this a bugfix?
> The changelog doesn't say.

I just moved the OPTION_MASK_* bits used here to a common macro that is checked
earlier to enable hardware support if -mfloat128.

> Okay for trunk and 6 with those nits fixed.  Thanks,

Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] Do not emit SAVE_EXPR for already assigned SSA_NAMEs (PR71606).

2016-07-08 Thread Martin Liška
On 07/07/2016 04:15 PM, Richard Biener wrote:
> I think it's fine though the inliner's initializer handling looks
> incredibly fragile to me ;)
> 
> Richard.

OK, installed on trunk. May I install the patch on all active branches?
Regression tests and bootstrap pass on all of them.


Re: [PATCH], PR 71806, Fix -mfloat128/-mfloat128-hardware defaults on power9

2016-07-08 Thread Segher Boessenkool
On Fri, Jul 08, 2016 at 10:17:14AM -0400, Michael Meissner wrote:
> > >/* IEEE 128-bit floating point hardware instructions imply enabling
> > >   __float128.  */
> > >if (TARGET_FLOAT128_HW
> > > -  && (rs6000_isa_flags & (OPTION_MASK_P9_VECTOR
> > > -   | OPTION_MASK_DIRECT_MOVE
> > > -   | OPTION_MASK_UPPER_REGS_DI
> > > -   | OPTION_MASK_UPPER_REGS_DF
> > > -   | OPTION_MASK_UPPER_REGS_SF)) == 0)
> > > +  && (rs6000_isa_flags & ISA_3_0_MASKS_IEEE) != ISA_3_0_MASKS_IEEE)
> > >  {
> > >if ((rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128_HW) != 0)
> > >   error ("-mfloat128-hardware requires full ISA 3.0 support");
> > 
> > That is not the same thing...  New one looks better, is this a bugfix?
> > The changelog doesn't say.
> 
> I just moved the OPTION_MASK_* bits used here to a common macro that is checked
> earlier to enable hardware support if -mfloat128.

The old code tests if all options are off; the new if any are off.


Segher
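
The difference between the two tests can be seen in a small standalone C
sketch.  The bit names and values below (FEATURE_A and so on) are made up for
illustration and are not the real rs6000 OPTION_MASK_* bits; only the two
comparison shapes are taken from the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical feature bits, standing in for the OPTION_MASK_* flags.  */
enum
{
  FEATURE_A = 1 << 0,
  FEATURE_B = 1 << 1,
  FEATURE_C = 1 << 2,
  REQUIRED_MASK = FEATURE_A | FEATURE_B | FEATURE_C
};

/* Shape of the old test: true only when every required bit is clear.  */
bool
all_required_off (unsigned flags)
{
  return (flags & REQUIRED_MASK) == 0;
}

/* Shape of the new test: true when at least one required bit is clear.  */
bool
any_required_off (unsigned flags)
{
  return (flags & REQUIRED_MASK) != REQUIRED_MASK;
}
```

With only FEATURE_A set, all_required_off returns false while
any_required_off returns true, so only the new shape notices that part of the
required support is missing.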


Re: [PATCH], PR 71806, Fix -mfloat128/-mfloat128-hardware defaults on power9

2016-07-08 Thread Segher Boessenkool
On Fri, Jul 08, 2016 at 09:31:33AM -0400, Michael Meissner wrote:
>   * gcc.target/powerpc/p9-lxvx-stxvx-3.c: Add -mfloat128 option.

Is that the only testcase that needs updating?

> --- gcc/config/rs6000/rs6000-cpus.def (revision 238127)
> +++ gcc/config/rs6000/rs6000-cpus.def (working copy)
> @@ -63,7 +63,6 @@
>  /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
> P9_MINMAX until the hardware that supports it is available.  */
>  #define ISA_3_0_MASKS_SERVER (ISA_2_7_MASKS_SERVER   \
> -  | OPTION_MASK_FLOAT128_HW  \

Please add a comment for this as well?

>/* IEEE 128-bit floating point hardware instructions imply enabling
>   __float128.  */
>if (TARGET_FLOAT128_HW
> -  && (rs6000_isa_flags & (OPTION_MASK_P9_VECTOR
> -   | OPTION_MASK_DIRECT_MOVE
> -   | OPTION_MASK_UPPER_REGS_DI
> -   | OPTION_MASK_UPPER_REGS_DF
> -   | OPTION_MASK_UPPER_REGS_SF)) == 0)
> +  && (rs6000_isa_flags & ISA_3_0_MASKS_IEEE) != ISA_3_0_MASKS_IEEE)
>  {
>if ((rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128_HW) != 0)
>   error ("-mfloat128-hardware requires full ISA 3.0 support");

That is not the same thing...  New one looks better, is this a bugfix?
The changelog doesn't say.

Okay for trunk and 6 with those nits fixed.  Thanks,


Segher


Re: [AArch64][1/14] ARMv8.2-A FP16 data processing intrinsics

2016-07-08 Thread James Greenhalgh
On Thu, Jul 07, 2016 at 05:13:56PM +0100, Jiong Wang wrote:
> Several data-processing instructions are agnostic to the type of their
> operands. This patch adds the mapping between them and those bit- and
> lane-manipulation instructions.
> 
> No ARMv8.2-A FP16 extension hardware support is required for these
> intrinsics.

These intrinsics are independent of the ARMv8.2-A implementation,
and are proposed to be added in a future ACLE specification. I've
checked that the intrinsics added here match those proposed.

OK for trunk.

Thanks,
James

> gcc/
> 2016-07-07  Jiong Wang 
> 
> * config/aarch64/aarch64-simd.md
> (aarch64_): Use VALL_F16.
> (aarch64_ext): Likewise.
> (aarch64_rev): Likewise.
> * config/aarch64/aarch64.c (aarch64_evpc_trn): Support
> V4HFmode and V8HFmode.
> (aarch64_evpc_uzp): Likewise.
> (aarch64_evpc_zip): Likewise.
> (aarch64_evpc_ext): Likewise.
> (aarch64_evpc_rev): Likewise.
> * config/aarch64/arm_neon.h (__aarch64_vdup_lane_f16): New.
> (__aarch64_vdup_laneq_f16): New.
> (__aarch64_vdupq_lane_f16): New.
> (__aarch64_vdupq_laneq_f16): New.
> (vbsl_f16): New.
> (vbslq_f16): New.
> (vdup_n_f16): New.
> (vdupq_n_f16): New.
> (vdup_lane_f16): New.
> (vdup_laneq_f16): New.
> (vdupq_lane_f16): New.
> (vdupq_laneq_f16): New.
> (vduph_lane_f16): New.
> (vduph_laneq_f16): New.
> (vext_f16): New.
> (vextq_f16): New.
> (vmov_n_f16): New.
> (vmovq_n_f16): New.
> (vrev64_f16): New.
> (vrev64q_f16): New.
> (vtrn1_f16): New.
> (vtrn1q_f16): New.
> (vtrn2_f16): New.
> (vtrn2q_f16): New.
> (vtrn_f16): New.
> (vtrnq_f16): New.
> (__INTERLEAVE_LIST): Support float16x4_t, float16x8_t.
> (vuzp1_f16): New.
> (vuzp1q_f16): New.
> (vuzp2_f16): New.
> (vuzp2q_f16): New.
> (vzip1_f16): New.
> (vzip2q_f16): New.
> (vmov_n_f16): Reimplement using vdup_n_f16.
> (vmovq_n_f16): Reimplement using vdupq_n_f16.



Re: [PATCH PR71734] Add missed check that reference defined inside loop.

2016-07-08 Thread Yuri Rumyantsev
Hi Richard,

Thanks for your help - your patch looks much better.
Here is a new patch in which an additional argument was added to determine
the source loop of the reference.

Bootstrap and regression testing did not show any new failures.

Is it OK for trunk?
ChangeLog:
2016-07-08  Yuri Rumyantsev  

PR tree-optimization/71734
* tree-ssa-loop-im.c (ref_indep_loop_p_1): Add REF_LOOP argument which
contains REF, use it to check safelen, assume that safelen value
must be greater than 1, fix style.
(ref_indep_loop_p_2): Add REF_LOOP argument.
(ref_indep_loop_p): Pass LOOP as additional argument to
ref_indep_loop_p_2.
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.

2016-07-08 11:18 GMT+03:00 Richard Biener :
> On Thu, Jul 7, 2016 at 5:38 PM, Yuri Rumyantsev  wrote:
>> I checked simd3.f90 and found out that my additional check rejects
>> independence of references
>>
>> REF is independent in loop#3
>> .istart0.19, .iend0.20
>> which are defined in loop#1 which is outer for loop#3.
>> Note that these references are defined by
>> _103 = __builtin_GOMP_loop_dynamic_next (&.istart0.19, &.iend0.20);
>> which is in loop#1.
>> It is clear that both these references can not be independent for loop#3.
>
> Ok, so we end up calling ref_indep_loop for ref in LOOP also for inner loops
> of LOOP to catch memory references in those as well.  So the issue is really
> that we look at the wrong loop for safelen and we _do_ want to apply safelen
> to inner loops as well.
>
> So better track the loop we are ultimately asking the question for, like in the
> attached patch (fixes the testcase for me).
>
> Richard.
>
>
>
>> 2016-07-07 17:11 GMT+03:00 Richard Biener :
>>> On Thu, Jul 7, 2016 at 4:04 PM, Yuri Rumyantsev  wrote:
>>>> I added this check because of new failures in libgomp.fortran suite.
>>>> Here is a copy of Jakub's message:
>>>> --- Comment #29 from Jakub Jelinek  ---
>>>> The #c27 r237844 change looks bogus to me.
>>>> First of all, IMNSHO you can argue this way only if ref is a reference seen in
>>>> loop LOOP,
>>>
>>> or inner loops of LOOP I guess.  I _think_ we never call ref_indep_loop_p_1 with
>>> a REF whose loop is not a sub-loop of LOOP or LOOP itself (as it would not make
>>> sense to do that, it would be a waste of time).
>>>
>>> So only if "or inner loops of LOOP" is not correct the check would be needed
>>> but then my issue with unrolling an inner loop and turning a ref that safelen
>>> does not apply to into a ref that it now applies to arises.
>>>
>>> I don't fully get what Jakub is hinting at.
>>>
>>> Can you install the safelen > 0 -> safelen > 1 fix please?  Jakub, can you
>>> explain that bitmap check with a simple testcase?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> which is the case of e.g. *.omp_data_i_23(D).a ref in simd3.f90 -O2
>>>> -fopenmp -msse2, but not the D.3815[0] case tested during can_sm_ref_p - the
>>>> D.3815[0] = 0; as well as something = D.3815[0]; stmt found in the outer loop
>>>> obviously can be dependent on many of the loads and/or stores in the loop, be
>>>> it "omp simd array" or not.
>>>> Say for
>>>> void
>>>> foo (int *p, int *q)
>>>> {
>>>>   #pragma omp simd
>>>>   for (int i = 0; i < 1024; i++)
>>>>     p[i] += q[0];
>>>> }
>>>> sure, q[0] can't alias p[0] ... p[1022], the earlier iterations could write
>>>> something that changes its value, and then it would behave differently from
>>>> using VF = 1024, where everything is performed in parallel.
>>>> Though, actually, it can alias, just it would have to write the same value as
>>>> was there.  So, if this is used to determine if it is safe to hoist the load
>>>> before the loop, it is fine, if it is used to determine if &q[0] >= &p[0] &&
>>>> &q[0] <= &p[1023], then it is not fine.
>>>>
>>>> For aliasing of q[0] and p[1023], I don't see why they couldn't alias in a
>>>> valid program.  #pragma omp simd I think guarantees that the last iteration is
>>>> executed last, it isn't necessarily executed last alone, it could be, or
>>>> together with one before last iteration, or (for simdlen INT_MAX) even all
>>>> iterations can be done concurrently, in hw or sw, so it is fine if it is
>>>> transformed into:
>>>>   int temp[1024], temp2[1024], temp3[1024];
>>>>   for (int i = 0; i < 1024; i++)
>>>>     temp[i] = p[i];
>>>>   for (int i = 0; i < 1024; i++)
>>>>     temp2[i] = q[0];
>>>>   /* The above two loops can be also swapped, or intermixed.  */
>>>>   for (int i = 0; i < 1024; i++)
>>>>     temp3[i] = temp[i] + temp2[i];
>>>>   for (int i = 0; i < 1024; i++)
>>>>     p[i] = temp3[i];
>>>>   /* Or the above loop reversed etc. */
>>>>
>>>> If you have:
>>>> int
>>>> bar (int *p, int *q)
>>>> {
>>>>   q[0] = 0;
>>>>   #pragma omp simd
>>>>   for (int i = 0; i < 1024; i++)
>>>>     p[i]++;
>>>>   return q[0];
>>>> }
>>>> i.e. something similar to what misbehaves in simd3.f90 with the change, then
>>>> the answer is that q[0] isn'

Re: [PATCH 0/9] separate shrink-wrapping

2016-07-08 Thread Segher Boessenkool
On Fri, Jul 08, 2016 at 09:16:03AM -0400, David Malcolm wrote:
> As far as I understand the idea, there are a number of target-specific
> things that are to be done during a function call, and the optimization
> tries to optimize each of these separately.
> 
> Some synonyms and near-synonyms for these "things":
> 
>   aspect
>   component
>   concern
>   duty
>   element
>   facet
>   factor
>   item
>   part
>   piece
>   portion
>   responsibility
> 
> and I suppose "shrink_wrap_part" is shorter than
> "shrink_wrap_component".

The reason I called it "concern" is that this isn't dealing with the
prologue/epilogue divided neatly into separate insns.  The generic code
only deals with what basic blocks will have what concerns the prologue
deals with, dealt with.  The target code then worries about what code
to write for that.  "concerns" does not map 1-1 to parts of the prologue,
in the general case.  (A very simple example: the arm load/store pair
instructions).

But component is abstract enough I think.

> (Yeah, I'm bike-shedding; sorry)

:-)


Segher


Re: RFA (gimplify): PATCH to implement C++ order of evaluation paper

2016-07-08 Thread Jakub Jelinek
On Thu, Jul 07, 2016 at 03:18:13PM -0400, Jason Merrill wrote:
> How about this?  I also have a patch to handle assignment order
> entirely in the front end, but my impression has been that you wanted
> to make this change for other reasons as well.

So what exactly is supposed to be the evaluation order for function calls
with lhs in C++17?
Reading
http://en.cppreference.com/w/cpp/language/eval_order
I'm confused.
struct S { S (); ~S (); ... };
S s[1024];
typedef S (*fn) (int, int);
fn a[1024];
void foo (int *i, int *j, int *k, int *l)
{
  s[i[0]++] = (a[j[0]++]) (k[0]++, l[0]++);
}
So, j[0]++ needs to happen first, then k[0]++ and l[0]++ (indeterminately
sequenced), but what about the function call vs. i[0]++?

There is the rule that for E1 = E2 all side-effects of E2 happen before all
side-effects of E1.

I mean, if the function return type is a gimple reg type, then I see no
problem in honoring that, the function call returns a temporary, then the
side-effects of the lhs are evaluated and then it is stored to that lvalue.

But, if the return type is non-POD, then we need to pass the address of the
lhs as invisible reference to the function call, how can we do it if we
can't yet evaluate the side-effects of the lhs?

Perhaps a better testcase is:

int bar (int);
void baz ()
{
  s[bar (0)] = (a[bar (1)]) (bar (2), 0);
}

In which order are all 4 calls made?

What the patch you've posted does is that it gimplifies from_p first,
and gimplify_call_expr will first evaluate bar (1), then bar (2),
but then it is a CALL_EXPR; then it gimplifies the lhs, i.e. bar (0)
call, and finally the indirect call.
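
To see what a given compiler actually does with a testcase of this shape, one
can log the calls.  The sketch below is plain C rather than C++17, so the
language leaves the order of the three bar calls unspecified and the log only
records what one compiler happened to choose.  It also uses int where the
testcase has the class type S, so it does not exercise the non-POD return
case.  The names call_log, call_count and callee are invented here.

```c
#define MAX_CALLS 8

int call_log[MAX_CALLS];  /* Arguments of bar, in the order it was called.  */
int call_count;

/* Record the call and return 0 so the result is always a valid index.  */
int
bar (int id)
{
  if (call_count < MAX_CALLS)
    call_log[call_count] = id;
  call_count++;
  return 0;
}

/* Hypothetical target of the indirect call.  */
int
callee (int x, int y)
{
  return x + y;
}

int s[1];
int (*a[1]) (int, int) = { callee };

/* Same shape as baz in the testcase, with int in place of S.  */
void
baz (void)
{
  s[bar (0)] = (a[bar (1)]) (bar (2), 0);
}
```

After calling baz, call_log[0] through call_log[2] hold 0, 1 and 2 in whatever
order the calls were made.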

> 
> In other news, I convinced the committee to drop function arguments
> from the order of evaluation paper, so we don't have to worry about
> that hit on PUSH_ARGS_REVERSED targets.
> 
> Jason

> commit 8dac319f5647d31568ad9278edeff3607aa1b3cc
> Author: Jason Merrill 
> Date:   Sat Jun 25 19:12:42 2016 +0300
> 
>   P0145: Refining Expression Order for C++ (assignment)
> 
>   * gimplify.c (initial_rhs_predicate_for): New.
>   (gimplify_modify_expr): Gimplify RHS before LHS.
> 
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 47c4d25..0276588 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -3813,6 +3813,18 @@ rhs_predicate_for (tree lhs)
>  return is_gimple_mem_rhs_or_call;
>  }
>  
> +/* Return the initial guess for an appropriate RHS predicate for this LHS,
> +   before the LHS has been gimplified.  */
> +
> +static gimple_predicate
> +initial_rhs_predicate_for (tree lhs)
> +{
> +  if (is_gimple_reg_type (TREE_TYPE (lhs)))
> +return is_gimple_reg_rhs_or_call;
> +  else
> +return is_gimple_mem_rhs_or_call;
> +}
> +
>  /* Gimplify a C99 compound literal expression.  This just means adding
> the DECL_EXPR before the current statement and using its anonymous
> decl instead.  */
> @@ -4778,10 +4790,6 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>   that is what we must do here.  */
>maybe_with_size_expr (from_p);
>  
> -  ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
> -  if (ret == GS_ERROR)
> -return ret;
> -
>/* As a special case, we have to temporarily allow for assignments
>   with a CALL_EXPR on the RHS.  Since in GIMPLE a function call is
>   a toplevel statement, when gimplifying the GENERIC expression
> @@ -4794,6 +4802,16 @@ gimplify_modify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
>   reaches the CALL_EXPR.  On return from gimplify_expr, the newly
>   created GIMPLE_CALL  will be the last statement in *PRE_P
>   and all we need to do here is set 'a' to be its LHS.  */
> +  ret = gimplify_expr (from_p, pre_p, post_p,
> +initial_rhs_predicate_for (*to_p), fb_rvalue);
> +  if (ret == GS_ERROR)
> +return ret;
> +
> +  ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
> +  if (ret == GS_ERROR)
> +return ret;
> +
> +  /* Now that the LHS is gimplified, we know what to use for the RHS.  */
>ret = gimplify_expr (from_p, pre_p, post_p, rhs_predicate_for (*to_p),
>  fb_rvalue);
>if (ret == GS_ERROR)


Jakub


[patch] Update LWG issues lists and implementation status

2016-07-08 Thread Jonathan Wakely

This backports a doc change to the gcc-5-branch. The patch doesn't
include the updated issues lists, since they're just copied from
upstream and are already on trunk and gcc-6-branch.

The list of implemented DRs is slightly different to the one on trunk,
because not all of them are fixed on gcc-5-branch. I think I got the
list right for the branch status.

Committed to gcc-5-branch.


commit a5d1cd4ac7d1d5be023ac62ef97497c8b34e9689
Author: redi 
Date:   Tue Jun 2 11:07:30 2015 +

Update LWG issues lists and implementation status

Backport from mainline
2015-06-02  Jonathan Wakely  

	* doc/html/ext/lwg-active.html: Update to R93.
	* doc/html/ext/lwg-closed.html: Likewise.
	* doc/html/ext/lwg-defects.html: Likewise.
	* doc/html/manual/*: Regenerate.
	* doc/xml/manual/intro.xml: Document status of several DRs.

diff --git a/libstdc++-v3/doc/xml/manual/intro.xml b/libstdc++-v3/doc/xml/manual/intro.xml
index 2dd833d..2169905 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -640,6 +640,12 @@ requirements of the license of GCC.
 Implement the resolution, basically cast less.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#445">445:
+	iterator_traits::reference unspecified for some iterator categories
+
+Change istreambuf_iterator::reference in C++11 mode.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#453">453:
 	basic_stringbuf::seekoff need not always fail for an empty stream
 
@@ -659,6 +665,12 @@ requirements of the license of GCC.
 	at(const key_type&) to std::map.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#467">467:
+	char_traits::lt(), compare(), and memcmp()
+
+Change lt.
+
+
 http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#508">508:
 	Bad parameters for ranlux64_base_01
 
@@ -810,6 +822,128 @@ requirements of the license of GCC.
 Return the end of the filled range.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2049">2049:
+	is_destructible underspecified
+
+Handle non-object types.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2056">2056:
+	future_errc enums start with value 0 (invalid value for broken_promise)
+
+Reorder enumerators.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2059">2059:
+	C++0x ambiguity problem with map::erase
+
+Add additional overloads.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2067">2067:
+	packaged_task should have deleted copy c'tor with const parameter
+
+Fix signatures.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2101">2101:
+	Some transformation types can produce impossible types
+
+Use the referenceable type concept.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2106">2106:
+	move_iterator wrapping iterators returning prvalues
+
+Change the reference type.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2132">2132:
+	std::function ambiguity
+
+Constrain the constructor to only accept callable types.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2141">2141:
+	common_type trait produces reference types
+
+Use decay for the result type.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2144">2144:
+	Missing noexcept specification in type_index
+
+Add noexcept
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2145">2145:
+	error_category default constructor
+
+Declare a public constexpr constructor.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2162">2162:
+	allocator_traits::max_size missing noexcept
+
+Add noexcept.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2187">2187:
+	vector is missing emplace and emplace_back member functions
+
+Add emplace and emplace_back member functions.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2196">2196:
+	Specification of is_*[copy/move]_[constructible/assignable] unclear for non-referencable types
+
+Use the referenceable type concept.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2313">2313:
+	tuple_size should always derive from integral_constant
+
+Update definitions of the partial specializations for const and volatile types.
+
+
+http://www.w3.org/1999/xlink"; xlink:href="../ext/lwg-defects.html#2329">2329:
+   regex_match()/regex_search() with match_results should forbid temporary strings
+
+Add deleted overloads for rvalue strings.
+
+
+http://www.w3.org/1999/xl

[PATCH], PR 71806, Fix -mfloat128/-mfloat128-hardware defaults on power9

2016-07-08 Thread Michael Meissner
If you configure either GCC 6.x or trunk with the --with-cpu=power9 option, it
will enable __float128 support, since power9 has the ISA 3.0 hardware IEEE
128-bit floating point instructions.  However, the libquadmath and libstdc++
libraries have not been fixed to enable the PowerPC support, and the build
fails.  Similarly, users who use -mcpu=power9 will get __float128 enabled, and
they will run into problems if they use features that are not yet present.

This patch changes the behavior so that IEEE 128-bit floating point
instructions are enabled if you use ISA 3.0 (-mcpu=power9), provided you also
use the -mfloat128 option.  Conversely, if you use the -mfloat128-hardware
option, it will enable the base -mfloat128 support.
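
As a rough standalone sketch of that defaulting rule (not the code in
rs6000.c): the bit values below are hypothetical, a single bit stands in for
the whole ISA_3_0_MASKS_IEEE set, and the error that the real code issues for
-mfloat128-hardware without full ISA 3.0 support is left out.

```c
#include <stdbool.h>

/* Hypothetical bit assignments, for illustration only.  */
enum
{
  MASK_FLOAT128 = 1 << 0,
  MASK_FLOAT128_HW = 1 << 1,
  MASK_ISA_3_0_IEEE = 1 << 2  /* Stand-in for all of ISA_3_0_MASKS_IEEE.  */
};

/* Return FLAGS after applying the defaulting rule.  EXPLICIT_FLAGS holds
   the bits the user set or cleared on the command line.  */
unsigned
apply_float128_defaults (unsigned flags, unsigned explicit_flags)
{
  bool full_isa_3_0 = (flags & MASK_ISA_3_0_IEEE) == MASK_ISA_3_0_IEEE;

  /* -mfloat128 on full ISA 3.0 turns on the hardware instructions,
     unless the user said otherwise.  */
  if ((flags & MASK_FLOAT128) != 0
      && (flags & MASK_FLOAT128_HW) == 0
      && full_isa_3_0
      && (explicit_flags & MASK_FLOAT128_HW) == 0)
    flags |= MASK_FLOAT128_HW;

  /* -mfloat128-hardware implies the base -mfloat128 support.  */
  if ((flags & MASK_FLOAT128_HW) != 0)
    flags |= MASK_FLOAT128;

  return flags;
}
```

In this sketch, ISA 3.0 alone leaves both float128 bits clear, which is the
change in default that the patch is after.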

It is expected that when glibc, libstdc++, and libquadmath are enhanced to add
__float128 support, that the -mfloat128 option will be enabled automatically
for power7/power8 systems, and -mfloat128-hardware will be enabled for power9
systems.

Included in this post are two attachments, one for trunk, and the other for the
GCC 6.x branch.  I have tested both patches against their respective bases, and
there are no regressions (once the one test that assumed -mcpu=power9 enabled
IEEE 128-bit floating point is fixed with this patch).  Are these ok to install
in their respective trees?

The only difference between the GCC 6 patch and the trunk patch is the trunk
adds a check for -mupper-regs-di, which has not been backported to the GCC 6
branch.

[gcc]
2016-07-08  Michael Meissner  

PR target/71806
* config/rs6000/rs6000-cpus.def (ISA_3_0_MASKS_SERVER): Do not
enable -mfloat128-hardware by default.
(ISA_3_0_MASKS_IEEE): New macro to give all of the VSX options
that IEEE 128-bit hardware support needs.
* config/rs6000/rs6000.c (rs6000_option_override_internal): If
-mcpu=power9 -mfloat128, enable -mfloat128-hardware by default.
Use ISA_3_0_MASKS_IEEE as the set of options that IEEE 128-bit
floating point requires.
* doc/invoke.texi (RS/6000 and PowerPC Options): Document
-mfloat128 and -mfloat128-hardware changes.

[gcc/testsuite]
2016-07-08  Michael Meissner  

* gcc.target/powerpc/p9-lxvx-stxvx-3.c: Add -mfloat128 option.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000-cpus.def
===
--- gcc/config/rs6000/rs6000-cpus.def   (revision 238127)
+++ gcc/config/rs6000/rs6000-cpus.def   (working copy)
@@ -63,7 +63,6 @@
 /* Add ISEL back into ISA 3.0, since it is supposed to be a win.  Do not add
P9_MINMAX until the hardware that supports it is available.  */
 #define ISA_3_0_MASKS_SERVER   (ISA_2_7_MASKS_SERVER   \
-| OPTION_MASK_FLOAT128_HW  \
 | OPTION_MASK_ISEL \
 | OPTION_MASK_MODULO   \
 | OPTION_MASK_P9_FUSION\
@@ -72,6 +71,16 @@
 | OPTION_MASK_P9_MISC  \
 | OPTION_MASK_P9_VECTOR)
 
+/* Support for the IEEE 128-bit floating point hardware requires a lot of the
+   VSX instructions that are part of ISA 3.0.  */
+#define ISA_3_0_MASKS_IEEE (OPTION_MASK_VSX\
+| OPTION_MASK_P8_VECTOR\
+| OPTION_MASK_P9_VECTOR\
+| OPTION_MASK_DIRECT_MOVE  \
+| OPTION_MASK_UPPER_REGS_DI\
+| OPTION_MASK_UPPER_REGS_DF\
+| OPTION_MASK_UPPER_REGS_SF)
+
 #define POWERPC_7400_MASK  (OPTION_MASK_PPC_GFXOPT | OPTION_MASK_ALTIVEC)
 
 /* Deal with ports that do not have -mstrict-align.  */
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 238130)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4381,14 +4381,21 @@ rs6000_option_override_internal (bool gl
   rs6000_isa_flags &= ~(OPTION_MASK_FLOAT128 | OPTION_MASK_FLOAT128_HW);
 }
 
+  /* If we have -mfloat128 and full ISA 3.0 support, enable -mfloat128-hardware
+ by default.  */
+  if (TARGET_FLOAT128 && !TARGET_FLOAT128_HW
+  && (rs6000_isa_flags & ISA_3_0_MASKS_IEEE) == ISA_3_0_MASKS_IEEE
+  && !(rs6000_isa_flags_explicit & OPTION_MASK_FLOAT128_HW))
+{
+  rs6000_isa_flags |= OPTION_MASK_FLOAT128_HW;
+  if ((rs6000_isa_flags & OPTION_MASK_FLOAT128) != 0)
+   rs6000_isa_flags_explicit |= OPTION_MASK_FLOAT128_HW;
+}
+
   /* IEEE 128-bit floating point hardware instructions imply enabling
 

Re: [PATCH 0/9] separate shrink-wrapping

2016-07-08 Thread David Malcolm
On Fri, 2016-07-08 at 07:11 -0500, Segher Boessenkool wrote:
> On Fri, Jul 08, 2016 at 12:42:34PM +0200, Bernd Schmidt wrote:
> > On 06/14/2016 11:24 PM, Segher Boessenkool wrote:
> > > On Wed, Jun 08, 2016 at 06:43:23PM +0200, Bernd Schmidt wrote:
> > > > On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
> > > > > There is no standard naming for this as far as I know.  I'll
> > > > > gladly
> > > > > use a better name anyone comes up with.
> > > > 
> > > > Maybe just subpart?
> > > 
> > > How about "factor"?
> > 
> > Still sounds odd to me. "Component" maybe? Ideally a native speaker
> > would help decide what sounds natural to them.
> 
> That does sound nice...  OTOH,
> 
> $ grep -i component *.c|wc -l
> 1081
> 
> but the opportunity for confusion is limited I think (and calling it
> "shrink-wrapping component" where needed sounds natural too!)

As far as I understand the idea, there are a number of target-specific
things that are to be done during a function call, and the optimization
tries to optimize each of these separately.

Some synonyms and near-synonyms for these "things":

  aspect
  component
  concern
  duty
  element
  facet
  factor
  item
  part
  piece
  portion
  responsibility

and I suppose "shrink_wrap_part" is shorter than
"shrink_wrap_component".

(Yeah, I'm bike-shedding; sorry)



Re: [PATCH] Fix PR rtl-optimization/71634

2016-07-08 Thread Martin Liška
On 07/08/2016 01:59 PM, Bernd Schmidt wrote:
> 
> Gah, that's not right, that'll swap the numbers of kept/removed loops.
> 
> I think the right answer is simply
>   for (i = 0; i < n - IRA_MAX_LOOPS_NUM; i++)
> 
> 
> Bernd

Thank you for the help, I've been testing the suggested change.

Martin
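
As a sketch of why the suggested bound keeps the right number of loops:
assuming the n candidates are ordered so that the ones to drop come first,
marking indices 0 to n - MAX - 1 removes n - MAX entries and keeps MAX.  The
names below (MAX_KEPT, mark_for_removal) are hypothetical and this is not the
actual IRA code.

```c
#include <stdbool.h>

#define MAX_KEPT 3  /* Stand-in for IRA_MAX_LOOPS_NUM; the value is arbitrary.  */

/* Mark the first n - MAX_KEPT entries for removal and return how many
   were marked.  When n <= MAX_KEPT the marking loop never runs.  */
int
mark_for_removal (bool *remove, int n)
{
  int i;

  for (i = 0; i < n; i++)
    remove[i] = false;
  for (i = 0; i < n - MAX_KEPT; i++)
    remove[i] = true;
  return n > MAX_KEPT ? n - MAX_KEPT : 0;
}
```

With five candidates, two are marked and three are kept; with two candidates,
nothing is marked.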


Re: [RFC, v2] Test coverage for --param boundary values

2016-07-08 Thread Martin Liška
Hi.

This is my second attempt at the patch, where I generate all tests on the fly.
First, params-options.h is used to generate a list of options in the form:

"predictable-branch-outcome"=2,0,50
"inline-min-speedup"=10,0,0
"max-inline-insns-single"=400,0,0
...

The list is then loaded in params.exp, which triggers dg-runtest.
I've also swapped the tested source file for a part of the bzip2 compression
algorithm.
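
The list generation relies on redefining DEFPARAM before including params.def.
The same X-macro idea can be sketched in one C file.  Here a local
SAMPLE_PARAMS macro stands in for params.def, with the three entries shown
above and placeholder help strings; the struct and function names are invented
for this example.  In the patch itself the preprocessor output is written to a
file via $(CPP) and sed rather than stored in a table.

```c
#include <stdio.h>

/* Three entries in the argument order of DEFPARAM from the patch:
   enumerator, option, help text, default, min, max.  */
#define SAMPLE_PARAMS(DEFPARAM) \
  DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME, "predictable-branch-outcome", "help", 2, 0, 50) \
  DEFPARAM (PARAM_INLINE_MIN_SPEEDUP, "inline-min-speedup", "help", 10, 0, 0) \
  DEFPARAM (PARAM_MAX_INLINE_INSNS_SINGLE, "max-inline-insns-single", "help", 400, 0, 0)

struct param_entry
{
  const char *option;
  int default_value, min, max;
};

/* Expand each entry into one table row.  */
#define AS_ROW(enumerator, option, help, def, min, max) { option, def, min, max },
const struct param_entry sample_params[] = { SAMPLE_PARAMS (AS_ROW) };
#undef AS_ROW

const int sample_param_count = sizeof sample_params / sizeof sample_params[0];

/* Write entry INDEX to BUF as "option"=default,min,max.  */
int
format_param (char *buf, size_t size, int index)
{
  const struct param_entry *p = &sample_params[index];

  return snprintf (buf, size, "\"%s\"=%d,%d,%d",
		   p->option, p->default_value, p->min, p->max);
}
```

Calling format_param for each index from 0 to sample_param_count - 1 yields
lines in the same form as the list above.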

Now the testing runs 1m20s (w/ -O3) and 0m58s (w/ -O2).
Hope it's much better?

Martin
From f84ce7be4a998089541fb4512e19f54a4ec25cf6 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 8 Jul 2016 10:59:24 +0200
Subject: [PATCH] Add tests that test boundary values of params

gcc/ChangeLog:

2016-07-08  Martin Liska  

	* Makefile.in: Append rule for params-options.h.
	* params-options.h: New file.

gcc/testsuite/ChangeLog:

2016-07-08  Martin Liska  

	* gcc.dg/params/blocksort-part.c: New test.
	* gcc.dg/params/params.exp: New file.
---
 gcc/Makefile.in  |   9 +-
 gcc/params-options.h |  27 +
 gcc/testsuite/gcc.dg/params/blocksort-part.c | 706 +++
 gcc/testsuite/gcc.dg/params/params.exp   |  64 +++
 4 files changed, 805 insertions(+), 1 deletion(-)
 create mode 100644 gcc/params-options.h
 create mode 100644 gcc/testsuite/gcc.dg/params/blocksort-part.c
 create mode 100644 gcc/testsuite/gcc.dg/params/params.exp

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 5e7422d..f365d29a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2496,7 +2496,7 @@ generated_files = config.h tm.h $(TM_P_H) $(TM_H) multilib.h \
$(ALL_GTFILES_H) gtype-desc.c gtype-desc.h gcov-iov.h \
options.h target-hooks-def.h insn-opinit.h \
common/common-target-hooks-def.h pass-instances.def \
-   c-family/c-target-hooks-def.h params.list case-cfn-macros.h \
+   c-family/c-target-hooks-def.h params.list params.options case-cfn-macros.h \
cfn-operators.pd
 
 #
@@ -3328,6 +3328,13 @@ s-params.list: $(srcdir)/params-list.h $(srcdir)/params.def
 	$(SHELL) $(srcdir)/../move-if-change tmp-params.list params.list
 	$(STAMP) s-params.list
 
+params.options: s-params.options; @true
+s-params.options: $(srcdir)/params-options.h $(srcdir)/params.def
+	$(CPP) $(srcdir)/params-options.h | sed 's/^#.*//;/^$$/d' > tmp-params.options
+	$(SHELL) $(srcdir)/../move-if-change tmp-params.options params.options
+	$(STAMP) s-params.options
+
+
 PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   toplev.h $(DIAGNOSTIC_CORE_H) $(BASIC_BLOCK_H) $(HASH_TABLE_H) \
   tree-ssa-alias.h $(INTERNAL_FN_H) gimple-fold.h tree-eh.h gimple-expr.h \
diff --git a/gcc/params-options.h b/gcc/params-options.h
new file mode 100644
index 000..44bb3c2
--- /dev/null
+++ b/gcc/params-options.h
@@ -0,0 +1,27 @@
+/* File used to generate params.options
+   Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define DEFPARAM(enumerator, option, nocmsgid, default, min, max) \
+  option=default,min,max
+#define DEFPARAMENUM5(enumerator, option, nocmsgid, default, \
+		  v0, v1, v2, v3, v4) \
+  option=v0,v1,v2,v3,v4
+#include "params.def"
+#undef DEFPARAM
+#undef DEFPARAMENUM5
diff --git a/gcc/testsuite/gcc.dg/params/blocksort-part.c b/gcc/testsuite/gcc.dg/params/blocksort-part.c
new file mode 100644
index 000..0eef2f3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/params/blocksort-part.c
@@ -0,0 +1,706 @@
+
+/*-*/
+/*--- Block sorting machinery   ---*/
+/*---   blocksort.c ---*/
+/*-*/
+
+/* --
+   This file is part of bzip2/libbzip2, a program and library for
+   lossless, block-sorting data compression.
+
+   bzip2/libbzip2 version 1.0.6 of 6 September 2010
+   Copyright (C) 1996-2010 Julian Seward 
+
+   Please read the WARNING, DISCLAIMER and PATENTS sections in the 
+   README file.
+
+   This program is released under the terms of the license contained
+   in the file LICENSE.
+   -- */
+
+typedef char            Char;
+typedef unsigned char   Bool;
+typedef unsigned char   UChar;
+typedef

Re: [PATCH] Add code-hoisting to GIMPLE

2016-07-08 Thread Rainer Orth
Hi Richard,

>> I've just bootstrapped the patch on sparc-sun-solaris2.12, which
>> uncovered a couple of testsuite failures:
>> 
>> +FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicating join block" 2
>> +FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicating join block" 2
>
> Saw this on x86_64 as well and should have been fixed with
>
> 2016-07-05  Richard Biener
>
> * gimple-ssa-split-paths.c (find_block_to_duplicate_for_splitting_pa):
> Handle empty else block.
> (is_feasible_trace): Likewise.
> (split_paths): Likewise.
>
>> Message doesn't occur at all.
>> 
>> +FAIL: gfortran.dg/ldist-1.f90 -O scan-tree-dump-not ldist "distributed: split to"
>> 
>> Likewise.
>
> You mean it does occur (it's a scan-tree-dump-not).  I saw this on x86_64
> as well and fixed it with
>
> 2016-07-05  Richard Biener  
>
> * tree-loop-distribution.c (distribute_loop): Fix issue with
> the cost model loop.
>
> maybe the fixes were not complete.  I'll have a second look with a
> sparc-solaris cross on Monday.

I guess there's no need: I applied the patch to a not fully up to date
tree which I'd bootstrapped before, to save me the full regtest.  At
r238001, it just didn't include either fix.

Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: RFA (gimplify): PATCH to implement C++ order of evaluation paper

2016-07-08 Thread Richard Biener
On Thu, Jul 7, 2016 at 9:18 PM, Jason Merrill  wrote:
> On Tue, Jun 28, 2016 at 10:00 AM, Richard Biener
>  wrote:
>> On Thu, Jun 16, 2016 at 6:15 PM, Jakub Jelinek  wrote:
>>> On Thu, Jun 16, 2016 at 11:28:48AM -0400, Jason Merrill wrote:
  gimple_predicate
  rhs_predicate_for (tree lhs)
  {
 -  if (is_gimple_reg (lhs))
 +  if (will_be_gimple_reg (lhs))
  return is_gimple_reg_rhs_or_call;
else
  return is_gimple_mem_rhs_or_call;
 @@ -4778,10 +4811,6 @@ gimplify_modify_expr (tree *expr_p, gimple_seq 
 *pre_p, gimple_seq *post_p,
   that is what we must do here.  */
maybe_with_size_expr (from_p);

 -  ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
 -  if (ret == GS_ERROR)
 -return ret;
 -
/* As a special case, we have to temporarily allow for assignments
   with a CALL_EXPR on the RHS.  Since in GIMPLE a function call is
   a toplevel statement, when gimplifying the GENERIC expression
 @@ -4799,6 +4828,10 @@ gimplify_modify_expr (tree *expr_p, gimple_seq 
 *pre_p, gimple_seq *post_p,
if (ret == GS_ERROR)
  return ret;

 +  ret = gimplify_expr (to_p, pre_p, post_p, is_gimple_lvalue, fb_lvalue);
 +  if (ret == GS_ERROR)
 +return ret;
 +
/* In case of va_arg internal fn wrappped in a WITH_SIZE_EXPR, add the 
 type
   size as argument to the call.  */
if (TREE_CODE (*from_p) == WITH_SIZE_EXPR)
>>>
>>> I wonder if instead of trying to guess early what we'll gimplify into it
>>> wouldn't be better to gimplify *from_p twice, first time with a predicate
>>> that would assume *to_p could be gimplified into is_gimple_ref, but
>>> guarantee there are no side-effects (so that those aren't evaluated
>>> after lhs side-effects), and second time if needed (if *to_p didn't end up
>>> being is_gimple_reg).  So something like a new predicate like:
>>
>> Yes, that is what I was suggesting.
>
> How about this?  I also have a patch to handle assignment order
> entirely in the front end, but my impression has been that you wanted
> to make this change for other reasons as well.

Yes.  Looks good to me.

> In other news, I convinced the committee to drop function arguments
> from the order of evaluation paper, so we don't have to worry about
> that hit on PUSH_ARGS_REVERSED targets.

Good.

Thanks,
Richard.

> Jason


Re: [PING^2] Re: Some fixes for autofdo test cases

2016-07-08 Thread Andi Kleen
Andi Kleen  writes:

Ping^2!

> Andi Kleen  writes:
>
> Ping!
>
>> This fixes some of the problems with profile test cases running with autofdo
>> There are still remaining failures that need to be addressed, but this is the
>> low hanging fruit.
>>
>> -Andi
>>
>>

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [PATCH, rs6000] Fix PR71297 (ICE on invalid calls to vec_ld and vec_st)

2016-07-08 Thread Bill Schmidt

> On Jul 8, 2016, at 12:14 AM, Segher Boessenkool  
> wrote:
> 
> On Thu, Jul 07, 2016 at 03:40:28PM -0500, Bill Schmidt wrote:
>>> PR71297 reports that we ICE when __builtin_vec_ld or __builtin_vec_st is
>>> provided with an incorrect number of arguments.  This patch fixes it by
>>> bypassing special handling for these intrinsics when the number of
>>> arguments is wrong, thus allowing the standard error handling for
>>> builtins to kick in.
>>> 
>>> The patch is pretty obvious and I think adding a test case would be
>>> extraneous, though I can do so if desired.
> 
> Well you could use the one from the PR?
> 
>>> Bootstrapped and tested on
>>> powerpc64le-unknown-linux-gnu with no regressions, and the original
>>> failure is fixed.  Is this ok for trunk?
> 
> Yes, but please do a testcase.  Okay for backports, too.

No backports required; this is a 7 regression.

Bill

> 
> 
> Segher
> 
> 
>>> PR target/71297
>>> * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>> Allow standard error handling to take over when a wrong number
>>> of arguments is presented to __builtin_vec_ld () or
>>> __builtin_vec_st ().
> 



Re: [PATCH] Add code-hoisting to GIMPLE

2016-07-08 Thread Richard Biener
On Fri, 8 Jul 2016, Rainer Orth wrote:

> Hi Richard,
> 
> > This is a final candidate patch to add code-hoisting to GIMPLE.
> >
> > I've already committed several patches fixing fallout and the following
> > one adds -fno-code-hoisting (I renamed the option) to a few testcases.
> > I filed PRs for the cases code-hoisting exposes missed optimization
> > opportunities in passes that I couldn't quickly fix (I fixed path
> > splitting and loop distribution but failed to grok SLSR).
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> >
> > I put the patch on the czerny tester for the weekend runs (x86_64 as 
> > well).
> >
> > Testing on other archs and comments are of course appreciated, if nothing
> > unusual happens I plan to commit this on Monday.
> 
> I've just bootstrapped the patch on sparc-sun-solaris2.12, which
> uncovered a couple of testsuite failures:
> 
> +FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicating join block" 2
> +FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicating join block" 2

Saw this on x86_64 as well and should have been fixed with

2016-07-05  Richard Biener   

* gimple-ssa-split-paths.c (find_block_to_duplicate_for_splitting_pa):  
Handle empty else block.
(is_feasible_trace): Likewise.  
(split_paths): Likewise.

> Message doesn't occur at all.
> 
> +FAIL: gfortran.dg/ldist-1.f90   -O   scan-tree-dump-not ldist "distributed: split to"
> 
> Likewise.

You mean it does occur (it's a scan-tree-dump-not).  I saw this on x86_64
as well and fixed it with

2016-07-05  Richard Biener  

* tree-loop-distribution.c (distribute_loop): Fix issue with
the cost model loop.

maybe the fixes were not complete.  I'll have a second look with a
sparc-solaris cross on Monday.

Thanks for testing,
Richard.


Re: [PATCH] Add code-hoisting to GIMPLE

2016-07-08 Thread Rainer Orth
Hi Richard,

> This is a final candidate patch to add code-hoisting to GIMPLE.
>
> I've already committed several patches fixing fallout and the following
> one adds -fno-code-hoisting (I renamed the option) to a few testcases.
> I filed PRs for the cases code-hoisting exposes missed optimization
> opportunities in passes that I couldn't quickly fix (I fixed path
> splitting and loop distribution but failed to grok SLSR).
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> I put the patch on the czerny tester for the weekend runs (x86_64 as 
> well).
>
> Testing on other archs and comments are of course appreciated, if nothing
> unusual happens I plan to commit this on Monday.

I've just bootstrapped the patch on sparc-sun-solaris2.12, which
uncovered a couple of testsuite failures:

+FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicating join block" 2
+FAIL: gcc.dg/tree-ssa/split-path-5.c scan-tree-dump-times split-paths "Duplicating join block" 2

Message doesn't occur at all.

+FAIL: gfortran.dg/ldist-1.f90   -O   scan-tree-dump-not ldist "distributed: split to"

Likewise.

Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 0/9] separate shrink-wrapping

2016-07-08 Thread Segher Boessenkool
On Fri, Jul 08, 2016 at 12:42:34PM +0200, Bernd Schmidt wrote:
> On 06/14/2016 11:24 PM, Segher Boessenkool wrote:
> >On Wed, Jun 08, 2016 at 06:43:23PM +0200, Bernd Schmidt wrote:
> >>On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
> >>>There is no standard naming for this as far as I know.  I'll gladly
> >>>use a better name anyone comes up with.
> >>
> >>Maybe just subpart?
> >
> >How about "factor"?
> 
> Still sounds odd to me. "Component" maybe? Ideally a native speaker 
> would help decide what sounds natural to them.

That does sound nice...  OTOH,

$ grep -i component *.c|wc -l
1081

but the opportunity for confusion is limited I think (and calling it
"shrink-wrapping component" where needed sounds natural too!)


Segher


Re: [PATCH] Fix PR rtl-optimization/71634

2016-07-08 Thread Bernd Schmidt

On 07/08/2016 01:52 PM, Bernd Schmidt wrote:

  int maxidx = MIN (IRA_MAX_LOOPS_NUM, n);
  for (i = 0; i < maxidx; i++)
{


Gah, that's not right, that'll swap the numbers of kept/removed loops.

I think the right answer is simply
  for (i = 0; i < n - IRA_MAX_LOOPS_NUM; i++)
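
To compare the three loop bounds discussed in this thread, the sketch below counts how many of n sorted loops each form would mark for removal. It is only an illustration: the function names are hypothetical, max_loops stands in for IRA_MAX_LOOPS_NUM, and the loop bodies are empty because only the iteration count matters here.

```c
#include <assert.h>

/* Original condition: n - i + 1 > max_loops.  */
static int
marked_original (int n, int max_loops)
{
  int i;
  assert (n >= 0 && max_loops >= 0);
  for (i = 0; n - i + 1 > max_loops; i++)
    ;
  return i;
}

/* First suggestion: iterate up to MIN (max_loops, n).  */
static int
marked_min (int n, int max_loops)
{
  int maxidx = max_loops < n ? max_loops : n;
  int i;
  for (i = 0; i < maxidx; i++)
    ;
  return i;
}

/* Follow-up suggestion: iterate up to n - max_loops.  */
static int
marked_difference (int n, int max_loops)
{
  int i;
  for (i = 0; i < n - max_loops; i++)
    ;
  return i;
}
```

With n = 5 and a limit of 1, the original form marks all 5 loops, the MIN form marks 1 (keeping 4), and the n - max_loops form marks 4 (keeping 1). With a limit of 0 the original form counts to 6, one past the end of a 5-element array.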


Bernd


Re: [PATCH] Fix PR rtl-optimization/71634

2016-07-08 Thread Bernd Schmidt

On 06/23/2016 12:56 PM, Martin Liška wrote:

Following patch changes minimum of ira-max-loops-num to 1.
Having the minimum equal to zero does not make much sense.

Ready after it finishes reg&bootstrap on x86_64-linux?


Hmm, why wouldn't a number of zero make sense if you want to try to have 
all loops removed?


The problem seems to be here:

  for (i = 0; n - i + 1 > IRA_MAX_LOOPS_NUM; i++)
{
  sorted_loops[i]->to_remove_p = true;

and this looks like an off-by-one error. n is the number of elements in 
the array, so if IRA_MAX_LOOPS_NUM is 1, it'll iterate i from 0 up to 
n-1, where

  n - i + 1 == 2 > 1
So it'll clear everything, where it seems like it should leave one loop 
around.


So maybe write this with a standard form of for loop so it's actually 
comprehensible:


  int maxidx = MIN (IRA_MAX_LOOPS_NUM, n);
  for (i = 0; i < maxidx; i++)
{
   


Bernd


RFA: new pass to warn on questionable uses of alloca() and VLAs

2016-07-08 Thread Aldy Hernandez

[New thread now that I actually have a tested patch :)].


I think detecting potentially problematic uses of alloca would
be useful, especially when done in an intelligent way like in
your patch (as opposed to simply diagnosing every call to
the function regardless of the value of its argument).  At
the same time, it seems that an even more reliable solution
than pointing out potentially unsafe calls to the function
and relying on users to modify their code to use malloc for
large/unbounded allocations would be to let GCC do it for
them automatically (i.e., in response to some other option,
emit a call to malloc instead, and insert a call to free when
appropriate).


As Jeff said, we were thinking the other way around: notice a malloced 
area that doesn't escape and replace it with a call to alloca.  But all 
this is beyond the scope of this patch.




I applied the patch and experimented with it a bit (I haven't
studied the code in any detail yet) and found a few opportunities
for improvements.  I describe them below (Sorry in advance for
the length of my comments!)


BTW, thank you so much for taking the time to look into this.  Your 
feedback has been invaluable.




I found the "warning: unbounded use of alloca" misleading when
a call to the function was, in fact, bounded but to a limit
that's greater than alloca-max-size as in the program below:


I have added various levels of granularity for the warning, along with 
appropriately different messages:


// Possible problematic uses of alloca.
enum alloca_type {
  // Alloca argument is within known bounds that are appropriate.
  ALLOCA_OK,

  // Alloca argument is KNOWN to have a value that is too large.
  ALLOCA_BOUND_DEFINITELY_LARGE,

  // Alloca argument may be too large.
  ALLOCA_BOUND_MAYBE_LARGE,

  // Alloca argument is bounded but of an indeterminate size.
  ALLOCA_BOUND_UNKNOWN,

  // Alloca argument was cast from a signed integer.
  ALLOCA_CAST_FROM_SIGNED,

  // Alloca appears in a loop.
  ALLOCA_IN_LOOP,

  // Alloca call is unbounded.  That is, there is no controlling
  // predicate for its argument.
  ALLOCA_UNBOUNDED
};

Of course, there are plenty of cases where we can't get the exact 
diagnosis (due to the limitations on our range info) and we fall back to 
ALLOCA_UNBOUNDED or ALLOCA_BOUND_MAYBE_LARGE.  In practice, I'm 
wondering whether we should lump everything into 2-3 warnings instead of 
trying so hard to get the exact reason for the problematic use of 
alloca.  (More details on upcoming changes to range info further down.)




  void f (void*);

  void g (int n)
  {
void *p;
if (n < 4096)
  p = __builtin_alloca (n);
else
  p = __builtin_malloc (n);
f (p);
  }
  t.C: In function ‘g’:
  t.C:7:7: warning: unbounded use of alloca [-Walloca]
   p = __builtin_alloca (n);


Well, in this particular case you are using a signed int, so n < 4096 
can cause the value passed to alloca  to be rather large in the case of 
n < 0.
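
A small sketch of why the upper-bound check alone is not enough for a signed argument. The function names are hypothetical and 4096 is simply the limit from the test case above; the conversion of a negative int to size_t is well defined in C and wraps to a very large value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The guard from the test case: only an upper bound.  */
static int
upper_bound_only (int n)
{
  return n < 4096;
}

/* A guard that also rejects zero and negative values.  */
static int
both_bounds (int n)
{
  return n > 0 && n < 4096;
}

/* The size an allocator taking size_t would actually receive.  */
static size_t
size_seen_by_alloca (int n)
{
  return (size_t) n;
}

/* Convert N to a size only after both bounds have been checked.  */
static size_t
checked_size (int n)
{
  assert (both_bounds (n));
  return (size_t) n;
}
```

For n = -1, upper_bound_only accepts the value while size_seen_by_alloca yields SIZE_MAX; both_bounds rejects it.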




I would suggest to rephrase the diagnostic to mention the limit,
e.g.,

  warning: calling alloca with an argument in excess of '4000'
  bytes


In the attached patch I try to diagnose these cases with:

a.c: In function ‘g’:
a.c:7:10: warning: cast from signed type in alloca [-Walloca]
p = __builtin_alloca (n);

I'm not 100% convinced this is the best idea, and I could be easily 
convinced to narrow the wide variety of warning cases I currently have 
into just a handful less specific ones.




I'm not sure I understand how -Walloca-max-size is supposed to
be used.  For example, it has no effect on the test case above
(i.e., I couldn't find a way to use it to raise the limit to
avoid the warning).  Maybe the interaction of the two options
is more subtle than I think.  I would have expected either


If by subtle you mean buggy, then yes-- thank you for your kind words 
:).  I have fixed it all, and those found responsible have been sacked.



a single option to control whether alloca warnings are to be
emitted and also (optionally) the allocation threshold, or
perhaps two options, one to turn the warning on and off, and
another just to override the threshold (though this latter
approach seems superfluous given that a single option can do
both).

...

I also think that VLA diagnostics would be better controlled
by a separate option, and emit a different diagnostic (one


I have overhauled the options and added extensive documentation to 
invoke.texi explaining them.  See the included testcases.  I have tried 
to add a testcase for everything the pass currently handles.


In the interest of keeping a consistent relationship with -Wvla, we now 
have:


-Walloca:   Warn on every use of alloca (not VLAs).
-Walloca=999:   Warn on unbounded uses of alloca, bounded uses with
no known limit, and bounded uses where the number of
bytes is greater than 999.
-Wvla:  Behaves as currently (warn on every use of VLAs).
-Wvla=999:  Similar to -Walloca=999, 

[PATCH 1/2] [ARC] [libgcc] Add support for QuarkSE processor.

2016-07-08 Thread Claudiu Zissulescu
libgcc/
2016-05-26  Claudiu Zissulescu  

* config/arc/dp-hack.h (ARC_OPTFPE): Define.
(__ARC_NORM__): Use ARC_OPTFPE instead.
* config/arc/fp-hack.h: Likewise.
* config/arc/lib1funcs.S (ARC_OPTFPE): Define.
(__ARC_MPY__): Use it instead of __ARC700__ and __HS__.
---
 libgcc/config/arc/dp-hack.h   |  12 +++--
 libgcc/config/arc/fp-hack.h   |   8 +--
 libgcc/config/arc/lib1funcs.S | 120 ++
 3 files changed, 74 insertions(+), 66 deletions(-)

diff --git a/libgcc/config/arc/dp-hack.h b/libgcc/config/arc/dp-hack.h
index 3c727b1..1f7f213 100644
--- a/libgcc/config/arc/dp-hack.h
+++ b/libgcc/config/arc/dp-hack.h
@@ -30,21 +30,23 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #define FINE_GRAINED_LIBRARIES
 #define ARC_DP_DEBUG 1
-#if !defined (__ARC_NORM__) || ARC_DP_DEBUG
+#define ARC_OPTFPE (defined (__ARC700__) || defined (__ARC_FPX_QUARK__))
+
+#if !ARC_OPTFPE || ARC_DP_DEBUG
 #define L_pack_df
 #define L_unpack_df
 #define L_make_df
 #define L_thenan_df
 #define L_sf_to_df
 #endif
-#ifndef __ARC_NORM__
+#if !ARC_OPTFPE
 #define L_addsub_df
 #elif ARC_DP_DEBUG
 #define L_addsub_df
 #define __adddf3 __adddf3_c
 #define __subdf3 __subdf3_c
 #endif
-#ifndef __ARC_NORM__
+#if !ARC_OPTFPE
 #define L_mul_df
 #define L_div_df
 #elif (!defined (__ARC700__) && !defined (__ARC_MUL64__) \
@@ -59,7 +61,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define L_div_df
 #define __divdf3 __divdf3_c
 #endif
-#ifndef __ARC_NORM__
+#if !ARC_OPTFPE
 #define L_df_to_sf
 #define L_si_to_df
 #define L_df_to_si
@@ -77,7 +79,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define L_usi_to_df
 #define __floatunsidf __floatunsidf_c
 #endif
-#ifndef __ARC_NORM__
+#if !ARC_OPTFPE
 #define L_fpcmp_parts_df
 #define L_compare_df
 #define L_eq_df
diff --git a/libgcc/config/arc/fp-hack.h b/libgcc/config/arc/fp-hack.h
index 30b547a..5144bb9 100644
--- a/libgcc/config/arc/fp-hack.h
+++ b/libgcc/config/arc/fp-hack.h
@@ -30,13 +30,15 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 
 #define ARC_FP_DEBUG 1
 #define FINE_GRAINED_LIBRARIES
-#if !defined (__ARC_NORM__) || ARC_FP_DEBUG
+#define ARC_OPTFPE (defined (__ARC700__) || defined (__ARC_FPX_QUARK__))
+
+#if !ARC_OPTFPE || ARC_FP_DEBUG
 #define L_pack_sf
 #define L_unpack_sf
 #define L_make_sf
 #define L_thenan_sf
 #endif
-#ifndef __ARC_NORM__
+#if !ARC_OPTFPE
 #define L_addsub_sf
 #define L_mul_sf
 #define L_div_sf
@@ -61,7 +63,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define L_usi_to_sf
 #define __floatunsisf __floatunsisf_c
 #endif
-#ifndef __ARC_NORM__
+#if !ARC_OPTFPE
 #define L_fpcmp_parts_sf
 #define L_compare_sf
 #define L_eq_sf
diff --git a/libgcc/config/arc/lib1funcs.S b/libgcc/config/arc/lib1funcs.S
index 1c8961c..9bb25e0 100644
--- a/libgcc/config/arc/lib1funcs.S
+++ b/libgcc/config/arc/lib1funcs.S
@@ -32,29 +32,29 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
This exception does not however invalidate any other reasons why
the executable file might be covered by the GNU General Public License.  */
 
- 
+
  /* ANSI concatenation macros.  */
- 
+
  #define CONCAT1(a, b) CONCAT2(a, b)
  #define CONCAT2(a, b) a ## b
- 
+
  /* Use the right prefix for global labels.  */
- 
+
  #define SYM(x) CONCAT1 (__USER_LABEL_PREFIX__, x)
- 
+
 #ifndef WORKING_ASSEMBLER
 #define abs_l abs
 #define asl_l asl
 #define mov_l mov
 #endif
-   
+
 #define FUNC(X) .type SYM(X),@function
 #define HIDDEN_FUNC(X) FUNC(X)` .hidden X
 #define ENDFUNC0(X) .Lfe_##X: .size X,.Lfe_##X-X
 #define ENDFUNC(X)  ENDFUNC0(X)
 
-   
-   
+
+
 #ifdef  L_mulsi3
.section .text
.align 4
@@ -64,10 +64,10 @@ SYM(__mulsi3):
 
 /* This the simple version.
 
-  while (a) 
+  while (a)
 {
   if (a & 1)
-r += b;
+   r += b;
   a >>= 1;
   b <<= 1;
 }
@@ -132,7 +132,7 @@ SYM(__mulsi3):
add2.cs r0,r0,r1
lsr.f r2,r2
add3.cs r0,r0,r1
-   bne.d .Loop 
+   bne.d .Loop
add3 r1,r3,r1
j_s [blink]
ENDFUNC(__mulsi3)
@@ -143,17 +143,17 @@ SYM(__mulsi3):
 .Lloop:
bbit0 r0,0,@.Ly
add_s r2,r2,r1  ; r += b
-.Ly:   
+.Ly:
lsr_s r0,r0 ; a >>= 1
-   asl_s r1,r1 ; b <<= 1   
-   brne_s r0,0,@.Lloop 
+   asl_s r1,r1 ; b <<= 1
+   brne_s r0,0,@.Lloop
 .Ldone:
j_s.d [blink]
mov_s r0,r2
ENDFUNC(__mulsi3)
 //
 #endif
-   
+
 #endif /* L_mulsi3 */
 
 #ifdef  L_umulsidi3
@@ -178,10 +178,10 @@ SYM(__umulsi3_highpart):
 
 /* This the simple version.
 
-  while (a) 
+  while (a)
 {
   if (a & 1)
-r += b;
+   r += b;
   a >>= 1;
   b <<= 1;
 }
@@ -455,18 +4

[PATCH 0/2] [libgcc] Add support for QuarkSE and cleanup macros.

2016-07-08 Thread Claudiu Zissulescu
This is a set of two libgcc patches that adds support for the QuarkSE
processor and changes the guarding of the ARC libgcc support routines
from CPU macros to feature macros.
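
As a rough picture of what guarding on a feature macro looks like, here is a C sketch. The real routines are assembly in lib1funcs.S; demo_mulsi3 and mul_shift_add are hypothetical names, the macros __ARC_MPY__ and __ARC_MUL64__ are the ones used in the patches, and the loop follows the "simple version" comment in lib1funcs.S.

```c
#include <stdint.h>

/* Shift-and-add multiply: add B into the result for every set bit
   of A, doubling B as A is shifted right.  */
static uint32_t
mul_shift_add (uint32_t a, uint32_t b)
{
  uint32_t r = 0;
  while (a)
    {
      if (a & 1)
        r += b;
      a >>= 1;
      b <<= 1;
    }
  return r;
}

/* Pick the implementation by feature rather than by CPU name, so a
   new core with a multiplier needs no change here.  */
static uint32_t
demo_mulsi3 (uint32_t a, uint32_t b)
{
#if defined (__ARC_MPY__) || defined (__ARC_MUL64__)
  return a * b;                 /* Hardware multiplier available.  */
#else
  return mul_shift_add (a, b);  /* Software fallback.  */
#endif
}
```

On a host where neither macro is defined, demo_mulsi3 takes the software path.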

Ok to apply?
Claudiu

Claudiu Zissulescu (2):
  [ARC] [libgcc] Add support for QuarkSE processor.
  [ARC] [libgcc] Fix defines

 libgcc/config/arc/dp-hack.h   |  12 +--
 libgcc/config/arc/fp-hack.h   |   8 +-
 libgcc/config/arc/lib1funcs.S | 165 ++
 3 files changed, 98 insertions(+), 87 deletions(-)

-- 
1.9.1



[PATCH 2/2] [ARC] [libgcc] Fix defines

2016-07-08 Thread Claudiu Zissulescu
Don't use CPU macros, use CPU feature macros.

libgcc/
2016-05-26  Claudiu Zissulescu  

* config/arc/lib1funcs.S (__mulsi3): Use feature defines instead
of checking for cpus.
(__umulsidi3, __umulsi3_highpart, __udivmodsi4, __divsi3)
(__modsi3, __clzsi2): Likewise.
---
 libgcc/config/arc/lib1funcs.S | 45 +++
 1 file changed, 24 insertions(+), 21 deletions(-)

diff --git a/libgcc/config/arc/lib1funcs.S b/libgcc/config/arc/lib1funcs.S
index 9bb25e0..422fd95 100644
--- a/libgcc/config/arc/lib1funcs.S
+++ b/libgcc/config/arc/lib1funcs.S
@@ -79,7 +79,7 @@ SYM(__mulsi3):
j_s.d [blink]
mov_s r0,mlo
ENDFUNC(__mulsi3)
-#elif defined (__ARC700__) || defined (__HS__)
+#elif defined (__ARC_MPY__)
HIDDEN_FUNC(__mulsi3)
mpyu    r0,r0,r1
nop_s
@@ -98,7 +98,7 @@ SYM(__mulsi3):
add_s   r1,r1,r1
 .Lend: j_s [blink]
ENDFUNC(__mulsi3)
-#elif !defined (__OPTIMIZE_SIZE__) && !defined(__ARC601__)
+#elif !defined (__OPTIMIZE_SIZE__) && defined (__ARC_BARREL_SHIFTER__)
/* Up to 3.5 times faster than the simpler code below, but larger.  */
FUNC(__mulsi3)
ror.f   r2,r0,4
@@ -170,7 +170,8 @@ SYM(__umulsidi3):
umulsi3_highpart implementation; the use of the latter label doesn't
actually benefit ARC601 platforms, but is useful when ARC601 code is linked
against other libraries.  */
-#if defined (__ARC700__) || defined (__ARC_MUL64__) || defined (__ARC601__)
+#if defined (__ARC_MPY__) || defined (__ARC_MUL64__) \
+   || !defined (__ARC_BARREL_SHIFTER__)
.global SYM(__umulsi3_highpart)
 SYM(__umulsi3_highpart):
HIDDEN_FUNC(__umulsi3_highpart)
@@ -188,18 +189,18 @@ SYM(__umulsi3_highpart):
 */
 #include "ieee-754/arc-ieee-754.h"
 
-#ifdef __ARC700__
+#ifdef __ARC_MPY__
mov_s   r12,DBL0L
mpyu    DBL0L,r12,DBL0H
j_s.d   [blink]
-   mpyhu   DBL0H,r12,DBL0H
+   MPYHU   DBL0H,r12,DBL0H
 #elif defined (__ARC_MUL64__)
 /* Likewise for __ARC_MUL64__ */
mulu64 r0,r1
mov_s DBL0L,mlo
j_s.d [blink]
mov_s DBL0H,mhi
-#else /* !__ARC700__ && !__ARC_MUL64__ */
+#else /* !__ARC_MPY__ && !__ARC_MUL64__ */
 /* Although it might look tempting to extend this to handle muldi3,
using mulsi3 twice with 2.25 cycles per 32 bit add is faster
than one loop with 3 or four cycles per 32 bit add.  */
@@ -223,9 +224,10 @@ SYM(__umulsi3_highpart):
mov_s DBL0L,r3
j_s.d [blink]
mov DBL0H,r2
-#endif /* !__ARC700__*/
+#endif /* !__ARC_MPY__*/
ENDFUNC(__umulsidi3)
-#if defined (__ARC700__) || defined (__ARC_MUL64__) || defined (__ARC601__)
+#if defined (__ARC_MPY__) || defined (__ARC_MUL64__) \
+   || !defined (__ARC_BARREL_SHIFTER__)
ENDFUNC(__umulsi3_highpart)
 #endif
 #endif /* L_umulsidi3 */
@@ -235,7 +237,8 @@ SYM(__umulsi3_highpart):
 /* For use without a barrel shifter, and for ARC700 / ARC_MUL64, the
mulsidi3 algorithms above look better, so for these, there is an
extra label up there.  */
-#if !defined (__ARC700__) && !defined (__ARC_MUL64__) && !defined (__ARC601__)
+#if !defined (__ARC_MPY__) && !defined (__ARC_MUL64__) \
+   && defined (__ARC_BARREL_SHIFTER__)
.global SYM(__umulsi3_highpart)
 SYM(__umulsi3_highpart):
HIDDEN_FUNC(__umulsi3_highpart)
@@ -251,7 +254,7 @@ SYM(__umulsi3_highpart):
 /* Make the result register peephole-compatible with mulsidi3.  */
lsr DBL0H,r2,r3
ENDFUNC(__umulsi3_highpart)
-#endif /* !__ARC700__  && !__ARC601__ */
+#endif /* !__ARC_MPY__  && __ARC_BARREL_SHIFTER__ */
 #endif /* L_umulsi3_highpart */
 
 #ifdef L_divmod_tools
@@ -295,7 +298,7 @@ udivmodsi4(int modwanted, unsigned long num, unsigned long den)
FUNC(__udivmodsi4)
 SYM(__udivmodsi4):
 
-#if defined (__ARC700__)
+#if defined (__ARC_EA__)
 /* Normalize divisor and divident, and then use the appropriate number of
divaw (the number of result bits, or one more) to produce the result.
There are some special conditions that need to be tested:
@@ -368,7 +371,7 @@ SYM(__udivmodsi4):
j_s.d   [blink]
mov.c   r0,0
 #elif !defined (__OPTIMIZE_SIZE__)
-#ifdef __ARC_NORM__
+#if defined (__ARC_NORM__) && defined (__ARC_BARREL_SHIFTER__)
lsr_s r2,r0
brhs.d r1,r2,.Lret0_3
norm r2,r2
@@ -393,17 +396,17 @@ SYM(__udivmodsi4):
lsr_s r1,r1
cmp_s r0,r1
xor.f r2,lp_count,31
-#if !defined (__EM__)
+#if !defined (__ARCEM__) && !defined (__ARCHS__)
mov_s lp_count,r2
 #else
mov lp_count,r2
nop_s
-#endif /* !__EM__ */
+#endif /* !__ARCEM__ && !__ARCHS__ */
 #endif /* !__ARC_NORM__ */
sub.cc r0,r0,r1
mov_s r3,3
sbc r3,r3,0
-#ifndef __ARC601__
+#if defined (__ARC_BARREL_SHIFTER__)
asl_s r3,r3,r2
rsub r1,r1,1
lpne @.Lloop2_end
@@ -503,7 +506,7 @@ SYM(__udivsi3):
.global SYM(__divsi3)
FUNC(__divs

Re: fold x ^ y to 0 if x == y

2016-07-08 Thread Richard Biener
On Fri, 8 Jul 2016, Richard Biener wrote:

> On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:
> 
> > Hi Richard,
> > For the following test-case:
> > 
> > int f(int x, int y)
> > {
> >int ret;
> > 
> >if (x == y)
> >  ret = x ^ y;
> >else
> >  ret = 1;
> > 
> >return ret;
> > }
> > 
> > I was wondering if x ^ y should be folded to 0 since
> > it's guarded by condition x == y ?
> > 
> > optimized dump shows:
> > f (int x, int y)
> > {
> >   int iftmp.0_1;
> >   int iftmp.0_4;
> > 
> >   :
> >   if (x_2(D) == y_3(D))
> > goto ;
> >   else
> > goto ;
> > 
> >   :
> >   iftmp.0_4 = x_2(D) ^ y_3(D);
> > 
> >   :
> >   # iftmp.0_1 = PHI 
> >   return iftmp.0_1;
> > 
> > }
> > 
> > The attached patch tries to fold for above case.
> > I am checking if op0 and op1 are equal using:
> > if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
> >&& operand_equal_p (vr1->min, vr1->max)
> >&& operand_equal_p (vr2->min, vr2->max))
> >   { /* equal */ }
> > 
> > I suppose intersection would check if op0 and op1 have equivalent ranges,
> > and added operand_equal_p check to ensure that there is only one
> > element within the range. Does that look correct ?
> > Bootstrap+test in progress on x86_64-unknown-linux-gnu.
> 
> I think VRP is the wrong place to catch this and DOM should have but it
> does
> 
> Optimizing block #3
> 
> 1>>> STMT 1 = x_2(D) le_expr y_3(D)
> 1>>> STMT 1 = x_2(D) ge_expr y_3(D)
> 1>>> STMT 1 = x_2(D) eq_expr y_3(D)
> 1>>> STMT 0 = x_2(D) ne_expr y_3(D)
> 0>>> COPY x_2(D) = y_3(D)
> 0>>> COPY y_3(D) = x_2(D)
> Optimizing statement ret_4 = x_2(D) ^ y_3(D);
>   Replaced 'x_2(D)' with variable 'y_3(D)'
>   Replaced 'y_3(D)' with variable 'x_2(D)'
>   Folded to: ret_4 = x_2(D) ^ y_3(D);
> LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)
> 
> heh, registering both equivalences is obviously not going to help...
> 
> The 2nd equivalence is from doing
> 
>   /* We already recorded that LHS = RHS, with canonicalization,
>  value chain following, etc.
> 
>  We also want to record RHS = LHS, but without any canonicalization
>  or value chain following.  */
>   if (TREE_CODE (rhs) == SSA_NAME)
> const_and_copies->record_const_or_copy_raw (rhs, lhs,
> SSA_NAME_VALUE (rhs));
> 
> generally recording both is not helpful.  Jeff?  This seems to be
> r233207 (fix for PR65917) which must have regressed this testcase.

Just verified it works fine on the GCC 5 branch:

Optimizing block #3

0>>> COPY y_3(D) = x_2(D)
1>>> STMT 1 = x_2(D) le_expr y_3(D)
1>>> STMT 1 = x_2(D) ge_expr y_3(D)
1>>> STMT 1 = x_2(D) eq_expr y_3(D)
1>>> STMT 0 = x_2(D) ne_expr y_3(D)
Optimizing statement ret_4 = x_2(D) ^ y_3(D);
  Replaced 'y_3(D)' with variable 'x_2(D)'
Applying pattern match.pd:240, gimple-match.c:11346
gimple_simplified to ret_4 = 0;
  Folded to: ret_4 = 0;
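
For reference, the source-level effect of that fold can be sketched as follows. f is the test case quoted above; f_folded is a hypothetical name for what the function looks like once x ^ y is replaced by 0 under the x == y guard.

```c
/* The test case from the thread.  */
static int
f (int x, int y)
{
  int ret;

  if (x == y)
    ret = x ^ y;
  else
    ret = 1;

  return ret;
}

/* Same function after folding: inside the guard x equals y, and any
   value XORed with itself is 0.  */
static int
f_folded (int x, int y)
{
  return x == y ? 0 : 1;
}
```

Both functions return 0 when the arguments are equal and 1 otherwise.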

Richard.


RE: [PATCH] [ARC] Various small miscellaneous fixes.

2016-07-08 Thread Claudiu Zissulescu
> > gcc/
> > 2016-05-09  Claudiu Zissulescu  
> >
> > * config/arc/arc.c (arc_process_double_reg_moves): Change.
> > * config/arc/arc.md (movsi_insn): Disable unsupported move
> > instructions for ARCv2 cores.
> > (movdi): Use prepare_move_operands.
> > (movsf, movdf): Use move_dest_operand predicate.
> > (arc_process_double_reg_moves): Change.
> > * config/arc/constraints.md (Chs): Enable when barrel shifter is
> > present.
> > * config/arc/fpu.md (divsf3): Change.
> > * config/arc/fpx.md (dexcl_3op_peep2_insn): Dx data register is
> > also a destination.
> > (dexcl_3op_peep2_insn_nores): Likewise.
> > * config/arc/arc.h (SHIFT_COUNT_TRUNCATED): Define to one.
> > (LINK_COMMAND_SPEC): Remove.
> > ---
> >  gcc/config/arc/arc.c  |  5 +
> >  gcc/config/arc/arc.h  | 27 +++
> >  gcc/config/arc/arc.md | 35 +++
> >  gcc/config/arc/constraints.md |  3 ++-
> >  gcc/config/arc/fpu.md |  4 +++-
> >  gcc/config/arc/fpx.md | 26 --
> >  6 files changed, 40 insertions(+), 60 deletions(-)

This patch needs to be considered together with this patch:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00386.html

Some of the issues observed by Andrew are fixed there.

Thanks,
Claudiu


[PATCH] [ARC] Add support for QuarkSE processor.

2016-07-08 Thread Claudiu Zissulescu
This patch adds support for an ARC EM version called QuarkSE.

This patch needs to be considered together with this previous patch:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg02044.html

Thanks,
Claudiu

gcc/
2016-05-25  Claudiu Zissulescu  

* config/arc/arc-arches.def: Add FPX quarkse instruction as valid
for arcem.
* config/arc/arc-c.def (__ARC_FPX_QUARK__): Define.
* config/arc/arc-cpus.def (quarkse_em): Add.
* config/arc/arc-options.def (FL_FPX_QUARK, FL_QUARK): Likewise.
* config/arc/arc-opts.h (FPX_QK): Define.
* config/arc/arc-tables.opt: Regenerate.
* config/arc/arc.c (gen_compare_reg): Change.
(arc_register_move_cost): Avoid Dy,Dx moves.
* config/arc/arc.h (TARGET_HARD_FLOAT): Change.
(TARGET_FPX_QUARK, TARGET_FP_ASSIST): Define.
* config/arc/arc.md (divsf3, sqrtsf2, fix_truncsfsi2, floatsisf2):
New expands.
* config/arc/fpu.md (divsf3_fpu, sqrtsf2_fpu, floatsisf2_fpu)
(fix_truncsfsi2_fpu): Rename.
* config/arc/fpx.md (cmp_quark, cmpsf_quark_, cmpsf_quark_ord)
(cmpsf_quark_uneq, cmpsf_quark_eq, divsf3_quark, sqrtsf2_quark)
(fix_truncsfsi2_quark, floatsisf2_quark): New patterns.
* config/arc/t-multilib: Regenerate.
---
 gcc/config/arc/arc-arches.def  |  2 +-
 gcc/config/arc/arc-c.def   |  1 +
 gcc/config/arc/arc-cpus.def|  1 +
 gcc/config/arc/arc-options.def |  2 +
 gcc/config/arc/arc-opts.h  |  2 +
 gcc/config/arc/arc-tables.opt  |  3 ++
 gcc/config/arc/arc.c   | 22 +-
 gcc/config/arc/arc.h   | 12 +++--
 gcc/config/arc/arc.md  | 46 
 gcc/config/arc/fpu.md  |  8 ++--
 gcc/config/arc/fpx.md  | 99 ++
 gcc/config/arc/t-multilib  |  5 ++-
 12 files changed, 188 insertions(+), 15 deletions(-)

diff --git a/gcc/config/arc/arc-arches.def b/gcc/config/arc/arc-arches.def
index da69a1a..db96ce1 100644
--- a/gcc/config/arc/arc-arches.def
+++ b/gcc/config/arc/arc-arches.def
@@ -19,7 +19,7 @@
 
 ARC_ARCH("arcem", em, FL_MPYOPT_1_6 | FL_DIVREM | FL_CD | FL_NORM  \
 | FL_BS | FL_SWAP | FL_FPUS | FL_SPFP | FL_DPFP\
-| FL_SIMD | FL_FPUDA, 0)
+| FL_SIMD | FL_FPUDA | FL_QUARK, 0)
 ARC_ARCH("archs", hs, FL_MPYOPT_7_9 | FL_DIVREM | FL_NORM | FL_CD  \
 | FL_ATOMIC | FL_LL64 | FL_BS | FL_SWAP\
 | FL_FPUS | FL_FPUD,   \
diff --git a/gcc/config/arc/arc-c.def b/gcc/config/arc/arc-c.def
index 4cfd7b6..fd64376 100644
--- a/gcc/config/arc/arc-c.def
+++ b/gcc/config/arc/arc-c.def
@@ -58,6 +58,7 @@ ARC_C_DEF ("__ARC_FPU_DP_DIV__", TARGET_FP_DP_SQRT)
 ARC_C_DEF ("__ARC_FPU_SP_FMA__", TARGET_FP_SP_FUSED)
 ARC_C_DEF ("__ARC_FPU_DP_FMA__", TARGET_FP_DP_FUSED)
 ARC_C_DEF ("__ARC_FPU_ASSIST__", TARGET_FP_DP_AX)
+ARC_C_DEF ("__ARC_FPX_QUARK__",  TARGET_FPX_QUARK)
 
 /* To be deprecated.  */
 ARC_C_DEF ("__A6__", TARGET_ARC600)
diff --git a/gcc/config/arc/arc-cpus.def b/gcc/config/arc/arc-cpus.def
index 6d93c89..8782bd5 100644
--- a/gcc/config/arc/arc-cpus.def
+++ b/gcc/config/arc/arc-cpus.def
@@ -23,6 +23,7 @@ ARC_CPU (em4,   em, FL_CD, NONE)
 ARC_CPU (em4_dmips, em, FL_MPYOPT_2|FL_CD|FL_DIVREM|FL_NORM|FL_SWAP|FL_BS, NONE)
 ARC_CPU (em4_fpus,  em, FL_MPYOPT_2|FL_CD|FL_DIVREM|FL_NORM|FL_SWAP|FL_BS|FL_FPU_FPUS, NONE)
 ARC_CPU (em4_fpuda, em, FL_MPYOPT_2|FL_CD|FL_DIVREM|FL_NORM|FL_SWAP|FL_BS|FL_FPU_FPUDA, NONE)
+ARC_CPU (quarkse_em, em, FL_MPYOPT_3|FL_CD|FL_DIVREM|FL_NORM|FL_SWAP|FL_BS|FL_FPX_QUARK|FL_SPFP|FL_DPFP, NONE)
 
 ARC_CPU (hs, hs, 0, NONE)
 ARC_CPU (archs,  hs, FL_MPYOPT_2|FL_DIVREM|FL_LL64, NONE)
diff --git a/gcc/config/arc/arc-options.def b/gcc/config/arc/arc-options.def
index 3834894..778f69d 100644
--- a/gcc/config/arc/arc-options.def
+++ b/gcc/config/arc/arc-options.def
@@ -59,10 +59,12 @@ ARC_OPTX (FL_FPU_FPUD,  (1ULL << 34), arc_fpu_build, FPU_FPUD,  "mfpu=fpud")
 ARC_OPTX (FL_FPU_FPUD_DIV,  (1ULL << 35), arc_fpu_build, FPU_FPUD_DIV, "mfpu=fpud_div")
 ARC_OPTX (FL_FPU_FPUD_FMA,  (1ULL << 36), arc_fpu_build, FPU_FPUD_FMA, "mfpu=fpud_fma")
 ARC_OPTX (FL_FPU_FPUD_ALL,  (1ULL << 37), arc_fpu_build, FPU_FPUD_ALL, "mfpu=fpud_all")
+ARC_OPTX (FL_FPX_QUARK, (1ULL << 38), arc_fpu_build, FPX_QK, "quarkse fp")
 
 ARC_OPT (FL_FPUS,  (0xFULL << 26), 0, "single precission floating point")
 ARC_OPT (FL_FPUDA, (0xFFULL << 26), 0, "double precission fp assist")
 ARC_OPT (FL_FPUD,  (0xF0FULL << 26), 0, "double precission floating point")
+ARC_OPT (FL_QUARK, (1ULL << 38), 0, "Quark SE fp extension")
 
 /* Local Variables: */
 /* mode: c */
diff --git a/gcc/config/arc/arc-opts.h b/gcc/config/arc/arc-opts.h
index 81446b4..819b97c 100644
--- a/gcc/config/arc/arc-opts.h
+++ b/gcc/config/arc/arc-opts.h
@@ -48,6 +48,8 @@ enum processor_type
 #define FPU_DD0x0080
 /* Double precision flo

Re: fold x ^ y to 0 if x == y

2016-07-08 Thread Richard Biener
On Fri, 8 Jul 2016, Prathamesh Kulkarni wrote:

> Hi Richard,
> For the following test-case:
> 
> int f(int x, int y)
> {
>int ret;
> 
>if (x == y)
>  ret = x ^ y;
>else
>  ret = 1;
> 
>return ret;
> }
> 
> I was wondering if x ^ y should be folded to 0 since
> it's guarded by condition x == y ?
> 
> optimized dump shows:
> f (int x, int y)
> {
>   int iftmp.0_1;
>   int iftmp.0_4;
> 
>   :
>   if (x_2(D) == y_3(D))
> goto ;
>   else
> goto ;
> 
>   :
>   iftmp.0_4 = x_2(D) ^ y_3(D);
> 
>   :
>   # iftmp.0_1 = PHI 
>   return iftmp.0_1;
> 
> }
> 
> The attached patch tries to fold the above case.
> I am checking if op0 and op1 are equal using:
> if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
>    && operand_equal_p (vr1->min, vr1->max, 0)
>    && operand_equal_p (vr2->min, vr2->max, 0))
>   { /* equal */ }
> 
> I suppose the intersection would check if op0 and op1 have equivalent ranges,
> and I added the operand_equal_p checks to ensure that there is only one
> element within each range. Does that look correct ?
> Bootstrap+test in progress on x86_64-unknown-linux-gnu.

I think VRP is the wrong place to catch this; DOM should have caught it,
but it does

Optimizing block #3

1>>> STMT 1 = x_2(D) le_expr y_3(D)
1>>> STMT 1 = x_2(D) ge_expr y_3(D)
1>>> STMT 1 = x_2(D) eq_expr y_3(D)
1>>> STMT 0 = x_2(D) ne_expr y_3(D)
0>>> COPY x_2(D) = y_3(D)
0>>> COPY y_3(D) = x_2(D)
Optimizing statement ret_4 = x_2(D) ^ y_3(D);
  Replaced 'x_2(D)' with variable 'y_3(D)'
  Replaced 'y_3(D)' with variable 'x_2(D)'
  Folded to: ret_4 = x_2(D) ^ y_3(D);
LKUP STMT ret_4 = x_2(D) bit_xor_expr y_3(D)

heh, registering both equivalences is obviously not going to help...

The 2nd equivalence is from doing

  /* We already recorded that LHS = RHS, with canonicalization,
     value chain following, etc.

     We also want to record RHS = LHS, but without any canonicalization
     or value chain following.  */
  if (TREE_CODE (rhs) == SSA_NAME)
    const_and_copies->record_const_or_copy_raw (rhs, lhs,
                                                SSA_NAME_VALUE (rhs));

generally recording both is not helpful.  Jeff?  This seems to be
r233207 (fix for PR65917) which must have regressed this testcase.

Richard.
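
Editorial sketch (not part of the original mail): the fold under discussion rests on the identity that x ^ y is 0 whenever x == y. The C fragment below repeats the test case from the thread as f and adds f_folded, a hypothetical hand-folded equivalent written only for illustration. It says nothing about how DOM or VRP would actually perform the transformation; it only shows that the two functions agree for any inputs.

```c
#include <assert.h>

/* Test case from the thread: on the x == y path, x ^ y is always 0.  */
int f(int x, int y)
{
    int ret;

    if (x == y)
        ret = x ^ y;
    else
        ret = 1;

    return ret;
}

/* Hypothetical hand-folded form (illustration only): the XOR under the
   equality guard has been replaced by the constant 0.  */
int f_folded(int x, int y)
{
    return (x == y) ? 0 : 1;
}
```

A compiler that records only one of the two copy equivalences (as in the GCC 5 dump quoted in the follow-up) rewrites the XOR to have identical operands, at which point the existing x ^ x pattern applies.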


RE: [PATCH] [ARC] Various small miscellaneous fixes.

2016-07-08 Thread Claudiu Zissulescu
> 
>../src/configure --target=arc-elf32 --enable-languages=c --with-cpu=arc700
>make all-gcc
> 

My bad, my build environment was polluted, I can see the error as well. I need 
to upstream another patch that fixes the named problem.

Best,
Claudiu


Re: [PATCH] [ARC] Various small miscellaneous fixes.

2016-07-08 Thread Andrew Burgess
* Claudiu Zissulescu [2016-07-08 08:18:00 +]:

> > > +   && (register_operand (operands[1], SFmode)
> > > +   || register_operand (operands[2], SFmode))"
> 
> This condition is necessary for reload cases.
> 
> > And, with this patch applied, I get a build error:
> > 
> > In file included from ./tm.h:43:0,
> >  from /path/to/gcc/gcc/backend.h:28,
> >  from insn-opinit.c:7:
> > insn-opinit.c: In function ‘void init_all_optabs(target_optabs*)’:
> > ./insn-flags.h:160:26: error: ‘operands’ was not declared in this scope
> > && (register_operand (operands[1], SFmode) \
> >   ^
> > insn-opinit.c:220:13: note: in expansion of macro ‘HAVE_divsf3’
> >ena[46] = HAVE_divsf3;
> 
> I applied this patch on the current trunk, but I've got no error. 

That's strange.  I double-checked afresh this morning, and I still see
the same error.

Could you confirm how you're configuring & building, maybe that's why
we're seeing different behaviours.

I'm using the official GNU GCC git mirror, commit 798fc30 (2 days old
now) with your patch applied on top.

Then just

   ../src/configure --target=arc-elf32 --enable-languages=c --with-cpu=arc700
   make all-gcc

And I still hit the error above.  Can you offer any advice?

Thanks,
Andrew


Re: [PATCH 0/9] separate shrink-wrapping

2016-07-08 Thread Bernd Schmidt

On 06/14/2016 11:24 PM, Segher Boessenkool wrote:
> On Wed, Jun 08, 2016 at 06:43:23PM +0200, Bernd Schmidt wrote:
>> On 06/08/2016 05:16 PM, Segher Boessenkool wrote:
>>> There is no standard naming for this as far as I know.  I'll gladly
>>> use a better name anyone comes up with.
>>
>> Maybe just subpart?
>
> How about "factor"?

Still sounds odd to me. "Component" maybe? Ideally a native speaker
would help decide what sounds natural to them.

Bernd



Re: [PATCH/AARCH64] Add rtx_costs routine for vulcan.

2016-07-08 Thread Virendra Pathak
Hi James,

Please find the patch after taking care of your comments.


> Did you see those patches, and did you consider whether there would be a
> benefit to doing the same for Vulcan?
In our simulation environment, we did not observe any performance gain
for specfp2006.
However, we made the change to keep the cost strategy the same as for
Cortex-A57/A53.

Please review and merge to trunk.


gcc/ChangeLog:

Virendra Pathak  
Julian Brown  

* config/aarch64/aarch64-cores.def: Update vulcan COSTS.
* config/aarch64/aarch64-cost-tables.h
(vulcan_extra_costs): New variable.
* config/aarch64/aarch64.c
(vulcan_addrcost_table): Likewise.
(vulcan_regmove_cost): Likewise.
(vulcan_vector_cost): Likewise.
(vulcan_branch_cost): Likewise.
(vulcan_tunings): Likewise.



with regards,
Virendra Pathak


On Wed, Jun 29, 2016 at 4:23 PM, Virendra Pathak
 wrote:
> Hi James,
>
>> Did you see those patches, and did you consider whether there would be a
>> benefit to doing the same for Vulcan?
> No. I have not studied those patches yet. Currently I am working on
> adding vulcan scheduler as a next patch.
> Kindly advise on the following:
> Could this patch be merged now (assuming you are okay),
> and I will update vulcan costs based on your patch later (after vulcan
> scheduler)?
>
> Thanks for your time.
>
> with regards,
> Virendra Pathak
>
>
> On Wed, Jun 29, 2016 at 4:11 PM, James Greenhalgh
>  wrote:
>> On Thu, Jun 23, 2016 at 02:45:21PM +0530, Virendra Pathak wrote:
>>> Hi gcc-patches group,
>>>
>>> Please find the patch for adding rtx_costs routine for vulcan cpu.
>>>
>>> Tested with compiling cross aarch64-linux-gcc , bootstrapped native
>>> aarch64-unknown-linux-gnu
>>> and make check (gcc). No new regression failure is added by this patch.
>>>
>>> Kindly review and merge the patch to trunk, if the patch is okay.
>>> Thanks.
>>
>> This is OK, but I have the same question for you as I had for the
>> qdf24xx tuning that Jim proposed, so I won't commit it yet...
>>
>>> gcc/ChangeLog:
>>>
>>> Virendra Pathak  
>>>
>>> * config/aarch64/aarch64-cores.def: Update vulcan COSTS.
>>> * config/aarch64/aarch64-cost-tables.h
>>> (vulcan_extra_costs): New variable.
>>> * config/aarch64/aarch64.c
>>> (vulcan_addrcost_table): Likewise.
>>> (vulcan_regmove_cost): Likewise.
>>> (vulcan_vector_cost): Likewise.
>>> (vulcan_branch_cost): Likewise.
>>> (vulcan_tunings): Likewise.
>>
>>
>>>
>>> +  {
>>> +/* FP SFmode */
>>> +{
>>> +  COSTS_N_INSNS (16),/* Div.  */
>>> +  COSTS_N_INSNS (6), /* Mult.  */
>>> +  COSTS_N_INSNS (6), /* Mult_addsub. */
>>> +  COSTS_N_INSNS (6), /* Fma.  */
>>> +  COSTS_N_INSNS (6), /* Addsub.  */
>>> +  COSTS_N_INSNS (5), /* Fpconst. */
>>> +  COSTS_N_INSNS (5), /* Neg.  */
>>> +  COSTS_N_INSNS (5), /* Compare.  */
>>> +  COSTS_N_INSNS (7), /* Widen.  */
>>> +  COSTS_N_INSNS (7), /* Narrow.  */
>>> +  COSTS_N_INSNS (7), /* Toint.  */
>>> +  COSTS_N_INSNS (7), /* Fromint.  */
>>> +  COSTS_N_INSNS (7)  /* Roundint.  */
>>> +},
>>> +/* FP DFmode */
>>> +{
>>> +  COSTS_N_INSNS (23),/* Div.  */
>>> +  COSTS_N_INSNS (6), /* Mult.  */
>>> +  COSTS_N_INSNS (6), /* Mult_addsub.  */
>>> +  COSTS_N_INSNS (6), /* Fma.  */
>>> +  COSTS_N_INSNS (6), /* Addsub.  */
>>> +  COSTS_N_INSNS (5), /* Fpconst.  */
>>> +  COSTS_N_INSNS (5), /* Neg.  */
>>> +  COSTS_N_INSNS (5), /* Compare.  */
>>> +  COSTS_N_INSNS (7), /* Widen.  */
>>> +  COSTS_N_INSNS (7), /* Narrow.  */
>>> +  COSTS_N_INSNS (7), /* Toint.  */
>>> +  COSTS_N_INSNS (7), /* Fromint.  */
>>> +  COSTS_N_INSNS (7)  /* Roundint.  */
>>> +}
>>
>> Recently ( https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00251.html ,
>>  https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01418.html ), I changed the
>> Cortex-A57 and Cortex-A53 cost tables to make the cost of a floating-point
>> operation relative to the cost of a floating-point move. This gave some
>> code generation benefits, particularly around conditional execution and
>> generation of constants.
>>
>> Did you see those patches, and did you consider whether there would be a
>> benefit to doing the same for Vulcan?
>>
>> Thanks,
>> James
>>
From ac65cb8040cb4ee5a0124106bf077afe8cd43105 Mon Sep 17 00:00:00 2001
From: Virendra Pathak 
Date: Tue, 21 Jun 2016 01:44:38 -0700
Subject: [PATCH] AArch64: Add rtx_costs routine for vulcan.

---
 gcc/config/aarch64/aarch64-cores.def |   2 +-
 gcc/config/aarch64/aarch64-cost-tables.h | 102 +++
 gcc/config/aarch64/aarch64.c |  75 +++
 3 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index 

fold x ^ y to 0 if x == y

2016-07-08 Thread Prathamesh Kulkarni
Hi Richard,
For the following test-case:

int f(int x, int y)
{
   int ret;

   if (x == y)
 ret = x ^ y;
   else
 ret = 1;

   return ret;
}

I was wondering if x ^ y should be folded to 0 since
it's guarded by condition x == y ?

optimized dump shows:
f (int x, int y)
{
  int iftmp.0_1;
  int iftmp.0_4;

  :
  if (x_2(D) == y_3(D))
goto ;
  else
goto ;

  :
  iftmp.0_4 = x_2(D) ^ y_3(D);

  :
  # iftmp.0_1 = PHI 
  return iftmp.0_1;

}

The attached patch tries to fold the above case.
I am checking if op0 and op1 are equal using:
if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
   && operand_equal_p (vr1->min, vr1->max, 0)
   && operand_equal_p (vr2->min, vr2->max, 0))
  { /* equal */ }

I suppose the intersection would check if op0 and op1 have equivalent ranges,
and I added the operand_equal_p checks to ensure that there is only one
element within each range. Does that look correct ?
Bootstrap+test in progress on x86_64-unknown-linux-gnu.

Thanks,
Prathamesh
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 4333d60..787d068 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -6965,6 +6965,59 @@ vrp_valueize_1 (tree name)
   return name;
 }
 
+/* Try to fold op0 xor op1 == 0 if op0 == op1.  */ 
+static tree
+maybe_fold_xor (gassign *stmt)
+{
+  if (!stmt)
+return NULL_TREE;
+
+  enum tree_code code = gimple_assign_rhs_code (stmt);
+  if (code != BIT_XOR_EXPR)
+return NULL_TREE;
+
+  tree op0 = gimple_assign_rhs1 (stmt);
+  tree op1 = gimple_assign_rhs2 (stmt);
+
+  if (TREE_CODE (op0) != SSA_NAME
+  || TREE_CODE (op1) != SSA_NAME)
+return NULL_TREE;
+
+  value_range *vr1 = get_value_range (op0);
+  value_range *vr2 = get_value_range (op1);
+
+  if (vr1 == NULL || vr2 == NULL)
+return NULL_TREE;
+
+  if (vr1->type != VR_RANGE || vr2->type != VR_RANGE)
+return NULL_TREE;
+
+  if (! (symbolic_range_p (vr1) && symbolic_range_p (vr2)))
+return NULL_TREE;
+
+  if (! (TREE_CODE (vr1->min) == SSA_NAME && TREE_CODE (vr1->max) == SSA_NAME
+         && TREE_CODE (vr2->min) == SSA_NAME && TREE_CODE (vr2->max) == SSA_NAME))
+return NULL_TREE;
+
+  if (! (vr1->equiv && vr2->equiv))
+return NULL_TREE;
+
+  /* check if op0 == op1.  */
+  if (bitmap_intersect_p (vr1->equiv, vr2->equiv)
+  && operand_equal_p (vr1->min, vr1->max, 0)
+  && operand_equal_p (vr2->min, vr2->max, 0)
+  && code == BIT_XOR_EXPR)
+{
+  gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+  gimple_assign_set_rhs_from_tree (&gsi, integer_zero_node); 
+  update_stmt (stmt);
+  return integer_zero_node;
+}
+
+  return NULL_TREE;
+}
+
+
 /* Visit assignment STMT.  If it produces an interesting range, record
the SSA name in *OUTPUT_P.  */
 
@@ -6990,8 +7043,11 @@ vrp_visit_assignment_or_call (gimple *stmt, tree *output_p)
   /* Try folding the statement to a constant first.  */
   tree tem = gimple_fold_stmt_to_constant_1 (stmt, vrp_valueize,
 vrp_valueize_1);
+  if (!tem)
+   tem = maybe_fold_xor (dyn_cast (stmt));
   if (tem && is_gimple_min_invariant (tem))
set_value_range_to_value (&new_vr, tem, NULL);
+
   /* Then dispatch to value-range extracting functions.  */
   else if (code == GIMPLE_CALL)
extract_range_basic (&new_vr, stmt);


ChangeLog
Description: Binary data


[wwwdocs] Add branch description for new branch unified-autovect

2016-07-08 Thread Sameera Deshpande
Hi!

I have created a new branch, unified-autovect, based on ToT.

Please find attached the patch adding information about the new branch
"unified-autovect" to the documentation.
Is it OK to commit?

- Thanks and regards,
  Sameera D.

unified-autovec-doc.patch
Description: unified-autovec-doc.patch


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-08 Thread Segher Boessenkool
On Fri, Jul 08, 2016 at 01:28:05AM +0930, Alan Modra wrote:
> BTW, both pr70098 and pr71763 are triggered by combine, not
> loop-doloop as I was thinking earlier.  See rtl dumps for the
> testcases.  I doubt the "optimization" done by combine here is worth
> keeping, since loop-doloop.c ought to already handle the beneficial
> inner loop use of ctr.  Elsewhere we typically end up with an insn
> that needs splitting back to the original sequence.  So we could avoid
> creating trouble for ourselves with the following patch.

I agree with the approach; if there are any missed optimisations because
of it, they do not outweigh the frequentish pessimisation we have now.

One case it will prevent is a bdz before a bdnz loop (for a loop count of
zero), but we usually do not generate that anyway, and it isn't obvious
that it is faster (or smaller, even).

> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index 7d9c660..b2d1118 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -148,6 +148,7 @@
> UNSPEC_IEEE128_MOVE
> UNSPEC_IEEE128_CONVERT
> UNSPEC_SIGNBIT
> +   UNSPEC_DONT_COMBINE
>])

That is a pretty horrible name.  Combine can combine such insns just fine;
it won't make up the unspec out of thin air though.

It seems you want to use this in other cases that should not be invented
by combine as well, but that won't work: combine could then morph one of
those into another kind.

Maybe just UNSPEC_BDZ?  UNSPEC_DOLOOP?

> -(define_insn "*ctr_internal5"
> +(define_insn "*ctr_internal3"

Please don't rename the patterns, not if you don't make better names.


Thanks, this should be a nice improvement,


Segher


Re: [PATCH 0/2, fortran] Better code generation for DO loops with +-1 step

2016-07-08 Thread FX
> This is what the Fortran standard says:
> 
>   The iteration count is established and is the value of the expression
>   (m2-m1+m3)/m3 unless that value is negative, in which case the
>   iteration count is 0.
> 
> My reading of this is that the do statement is undefined whenever the
> expression above is undefined (m1 is lower bound, m2 is upper bound,
> m3 is step) and because I think the evaluation order of m2-m1+m3 is not
> fixed, I think the statement is not defined whenever (m2-m1), (m1+m3)
> or (m2-m1)+m3
In the Fortran standard, (m2-m1+m3)/m3 is a mathematical expression, not a 
“construct”. So it cannot be “undefined”.
If you have explicit cases where you are asking “is this valid or invalid” 
please post them here (fortran@) and we will tell you.

FX

Re: [PATCH 0/2, fortran] Better code generation for DO loops with +-1 step

2016-07-08 Thread Jakub Jelinek
On Fri, Jul 08, 2016 at 11:03:35AM +0200, Jan Hubicka wrote:
> > On Fri, Jul 08, 2016 at 10:33:45AM +0200, Martin Liška wrote:
> > > On 07/07/2016 04:40 PM, Jan Hubicka wrote:
> > > >>
> > > >> Why is the behavior only undefined for step 1 if the last iteration IV
> > > >> increment overflows?
> > > >> Doesn't this apply to all step values?
> > > > 
> > > > This is what the Fortran standard says:
> > > > 
> > > >   The iteration count is established and is the value of the expression
> > > >   (m2-m1+m3)/m3 unless that value is negative, in which case the
> > > >   iteration count is 0.
> > > > 
> > > > My reading of this is that the do statement is undefined whenever the
> > > > expression above is undefined (m1 is lower bound, m2 is upper bound,
> > > > m3 is step) and because I think the evaluation order of m2-m1+m3 is not
> > > > fixed, I think the statement is not defined whenever (m2-m1), (m1+m3)
> > > > or (m2-m1)+m3
> > 
> > m1+m3?  Did you mean m3-m1 or -m1+m3 instead?
> 
> Ah yes, -m1+m3.  But I am by no means a language expert - this was meant as a
> heads up to Fortran people :)

Heads up to Fortran people should be sent to fort...@gcc.gnu.org ;)

Jakub


Re: [PATCH 0/2, fortran] Better code generation for DO loops with +-1 step

2016-07-08 Thread Jan Hubicka
> On Fri, Jul 08, 2016 at 10:33:45AM +0200, Martin Liška wrote:
> > On 07/07/2016 04:40 PM, Jan Hubicka wrote:
> > >>
> > >> Why is the behavior only undefined for step 1 if the last iteration IV
> > >> increment overflows?
> > >> Doesn't this apply to all step values?
> > > 
> > > This is what the Fortran standard says:
> > > 
> > >   The iteration count is established and is the value of the expression
> > >   (m2-m1+m3)/m3 unless that value is negative, in which case the
> > >   iteration count is 0.
> > > 
> > > My reading of this is that the do statement is undefined whenever the
> > > expression above is undefined (m1 is lower bound, m2 is upper bound,
> > > m3 is step) and because I think the evaluation order of m2-m1+m3 is not
> > > fixed, I think the statement is not defined whenever (m2-m1), (m1+m3)
> > > or (m2-m1)+m3
> 
> m1+m3?  Did you mean m3-m1 or -m1+m3 instead?

Ah yes, -m1+m3.  But I am by no means a language expert - this was meant as a
heads up to Fortran people :)

Honza
> 
>   Jakub


Re: [PATCH] rs6000: Make the ctr* patterns allow ints in vector regs (PR71763)

2016-07-08 Thread Segher Boessenkool
On Fri, Jul 08, 2016 at 12:37:55PM +0930, Alan Modra wrote:
> The regression tests passed.  I've been looking at differences in
> gcc/*.o and find many cases like the following.
> 
> orig/combine.o
> 1508: 01 00 3f 2c     cmpdi   r31,1
> 150c: ff ff ff 3b     addi    r31,r31,-1
> 1510: dc fe 82 41     beq     13ec
> patched/combine.o
> 1508: ff ff ff 37     addic.  r31,r31,-1
> 150c: e0 fe 82 41     beq     13ec
> 
> Combine transforms the first sequence to the second, then further
> transforms that to a bdz (ctr).  When that fails to get ctr
> allocated, the splitter takes us all the way back to the three insn
> sequence..

It used to do the addic. insn.  When I made the carry bit exposed to GCC,
it was no longer possible to always split to addic. though (CA might be
live there already).  Since the splitter should seldom be used at all,
it now never splits to addic. (and addic. is also slower on some machines:
it is cracked, with longer latency than you get with the compare to 1).

> With the patch we use ctr for the inner loop.  With unpatched gcc
> combine generates ctr for the outer loop, which of course uses
> ctr and isn't profitable with an inner loop using ctr.  Vagaries of
> the register allocator result in the outer loop using ctr with the
> inner one losing.  Oops, we generally want inner loops to be more
> highly optimized.

Lovely :-)


Segher


Re: [PATCH] Fix PR rtl-optimization/71634

2016-07-08 Thread Martin Liška
PING^1

On 06/23/2016 12:56 PM, Martin Liška wrote:
> Hello.
> 
> Following patch changes minimum of ira-max-loops-num to 1.
> Having the minimum equal to zero does not make much sense.
> 
> Ready after it finishes reg&bootstrap on x86_64-linux?
> 
> Thanks,
> Martin
> 



Re: [PATCH 0/2, fortran] Better code generation for DO loops with +-1 step

2016-07-08 Thread Jakub Jelinek
On Fri, Jul 08, 2016 at 10:33:45AM +0200, Martin Liška wrote:
> On 07/07/2016 04:40 PM, Jan Hubicka wrote:
> >>
> >> Why is the behavior only undefined for step 1 if the last iteration IV
> >> increment overflows?
> >> Doesn't this apply to all step values?
> > 
> > This is what the Fortran standard says:
> > 
> >   The iteration count is established and is the value of the expression
> >   (m2-m1+m3)/m3 unless that value is negative, in which case the
> >   iteration count is 0.
> > 
> > My reading of this is that the do statement is undefined whenever the
> > expression above is undefined (m1 is lower bound, m2 is upper bound,
> > m3 is step) and because I think the evaluation order of m2-m1+m3 is not
> > fixed, I think the statement is not defined whenever (m2-m1), (m1+m3)
> > or (m2-m1)+m3

m1+m3?  Did you mean m3-m1 or -m1+m3 instead?

Jakub


Re: [PATCH 2/2] Optimize fortran loops with +-1 step.

2016-07-08 Thread Martin Liška
On 07/07/2016 05:53 PM, Tobias Burnus wrote:
> On Thu, Jul 07, 2016 at 02:13:12PM +0200, Tobias Burnus wrote:
>> marxin wrote:
>>> gcc/fortran/ChangeLog:
>>>
>>> 2016-07-01  Martin Liska  
>>> * lang.opt (Wundefined-do-loop): New option.
>>>* resolve.c (gfc_resolve_iterator): Warn for Wundefined-do-loop.
>>> (gfc_trans_simple_do): Generate a c-style loop.
>>> (gfc_trans_do): Fix GNU coding style.
>>
>> Can you also document the new warning in gcc/fortran/invoke.texi?
>>
>> Otherwise, this patch looks good to me. Thanks for working on it.
> 
> 
> If I look at the commit to .opt,
> 
>   +Wundefined-do-loop
>   +Fortran Warning Var(warn_undefined_do_loop) LangEnabledBy(Fortran,Wall)
> 
> and to .texi,
> 
>   +@item -Wundefined-do-loop
>   [...]
>   +Warn if a DO loop with step either 1 or -1 yields an underflow or an overflow
>   +during iteration of an induction variable of the loop.  Enabled by default.
> 
> I think the "Enabled by default" is misleading as it is only enabled by -Wall,
> i.e. I had expected something like: "This warning is enabled by @option{-Wall}."
> 
> Cheers,
> 
> Tobias
> 

Thanks for the review, I'm going to install the following patch.

Martin
From 5bfd6b5672a99952dac5aeb6c9a86a36affb0065 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 8 Jul 2016 10:36:44 +0200
Subject: [PATCH] Enhance documentation of Wundefined-do-loop

gcc/fortran/ChangeLog:

2016-07-08  Martin Liska  

	* invoke.texi (Wundefined-do-loop): Enhance documentation.
---
 gcc/fortran/invoke.texi | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index c0be1ab..87baf15 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -929,7 +929,8 @@ is active for @option{-pedantic}, @option{-std=f95}, @option{-std=f2003},
 @opindex @code{Wundefined-do-loop}
 @cindex warnings, undefined do loop
 Warn if a DO loop with step either 1 or -1 yields an underflow or an overflow
-during iteration of an induction variable of the loop.  Enabled by default.
+during iteration of an induction variable of the loop.
+This option is implied by @option{-Wall}.
 
 @item -Wunderflow
 @opindex @code{Wunderflow}
-- 
2.8.4



Re: [PATCH 0/2, fortran] Better code generation for DO loops with +-1 step

2016-07-08 Thread Martin Liška
On 07/07/2016 04:40 PM, Jan Hubicka wrote:
>>
>> Why is the behavior only undefined for step 1 if the last iteration IV
>> increment overflows?
>> Doesn't this apply to all step values?
> 
> This is what the Fortran standard says:
> 
>   The iteration count is established and is the value of the expression
>   (m2-m1+m3)/m3 unless that value is negative, in which case the
>   iteration count is 0.
> 
> My reading of this is that the do statement is undefined whenever the
> expression above is undefined (m1 is lower bound, m2 is upper bound,
> m3 is step) and because I think the evaluation order of m2-m1+m3 is not
> fixed, I think the statement is not defined whenever (m2-m1), (m1+m3)
> or (m2-m1)+m3 overflows or underflows as signed integer.
> 
> For example it is not valid to iterate from -10 to INT_MAX with step 1.
> Honza
> 
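
Editorial sketch (not part of the original mail): the quoted reading can be made concrete with a small C helper that reports whether any of the intermediate values named above leaves the range of a signed int. The helper name do_bounds_overflow is hypothetical, the second check uses the corrected form -m1+m3 suggested elsewhere in the thread, and the code relies on the GCC/Clang __builtin_*_overflow extensions. It is an illustration of the argument, not the logic of the attached patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true if (m2 - m1), (-m1 + m3) or
   (m2 - m1) + m3 cannot be represented in a signed int.
   m1 is the lower bound, m2 the upper bound, m3 the step.  */
bool do_bounds_overflow(int m1, int m2, int m3)
{
    int diff, tmp;

    /* m2 - m1 */
    bool o1 = __builtin_sub_overflow(m2, m1, &diff);
    /* -m1 + m3, computed as m3 - m1 */
    bool o2 = __builtin_sub_overflow(m3, m1, &tmp);
    /* (m2 - m1) + m3, only meaningful if the first step did not overflow */
    bool o3 = o1 || __builtin_add_overflow(diff, m3, &tmp);

    return o1 || o2 || o3;
}
```

With these definitions, the example from the quoted mail (iterating from -10 to INT_MAX with step 1) is flagged because m2 - m1 already exceeds INT_MAX.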

Hi.

I'm attaching a candidate patch that emits the warnings. The problem with the
current implementation of loop generation (with a step other than 1) is that
it uses an unsigned type, so the calculation of the iteration count works
even though the expression overflows:

  do i = array(1), array(2), 17 
  block(i) = block(i) + 10
  end do

is transformed to:

D.3428 = (*array)[0];
D.3429 = (*array)[1];
i = D.3428;
countm1.0 = ((unsigned int) D.3429 - (unsigned int) D.3428) / 17;, if (D.3429 < D.3428)
  {
goto L.2;
  };
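
Editorial sketch (not part of the original mail): the dump above computes the trip count minus one in unsigned arithmetic, which wraps instead of overflowing. The C function below mirrors that single expression for 32-bit bounds; the name countm1 is borrowed from the dump, while the signature and the restriction to a positive step with from <= to are assumptions made for illustration.

```c
#include <stdint.h>

/* Illustrative only: iteration count minus one for
   "do i = from, to, step", assuming step > 0 and from <= to.
   The subtraction is done in uint32_t, so the distance from FROM to TO
   is always representable even when to - from would overflow int32_t.  */
uint32_t countm1(int32_t from, int32_t to, uint32_t step)
{
    return ((uint32_t) to - (uint32_t) from) / step;
}
```

For bounds spanning the whole int32_t range the signed difference is not representable, yet the unsigned computation still yields the mathematically expected count, which is why no overflow is observable in the generated code.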

Thoughts about the patch?
Martin
From 744e824a58f5861c4376e32c9e3169f4e52e2e00 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 8 Jul 2016 10:16:29 +0200
Subject: [PATCH] Enhance warning message for DO loops with |step| != 1.

---
 gcc/fortran/trans-stmt.c| 44 ++
 gcc/testsuite/gfortran.dg/do_undefined_warn.f90 | 61 +
 2 files changed, 105 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/do_undefined_warn.f90

diff --git a/gcc/fortran/trans-stmt.c b/gcc/fortran/trans-stmt.c
index 6e4e2a7..1c1cd18 100644
--- a/gcc/fortran/trans-stmt.c
+++ b/gcc/fortran/trans-stmt.c
@@ -2014,6 +2014,7 @@ gfc_trans_do (gfc_code * code, tree exit_cond)
   gfc_start_block (&block);
 
   loc = code->ext.iterator->start->where.lb->location;
+  locus *location = &code->ext.iterator->start->where;
 
   /* Evaluate all the expressions in the iterator.  */
   gfc_init_se (&se, NULL);
@@ -2097,6 +2098,49 @@ gfc_trans_do (gfc_code * code, tree exit_cond)
 {
   tree pos, neg, tou, fromu, stepu, tmp2;
 
+  if (TREE_CODE (from) == INTEGER_CST
+	  && TREE_CODE (to) == INTEGER_CST
+	  && TREE_CODE (step) == INTEGER_CST)
+	{
+	  wide_int t = to;
+	  wide_int f = from;
+	  wide_int s = step;
+	  bool r1_overflow, r2_overflow, r3_overflow;
+	  bool cmp1, cmp2, cmp3;
+	  bool has_warning = false;
+
+	  wide_int r1 = wi::sub (t, f, SIGNED, &r1_overflow);
+	  cmp1 = wi::les_p (f, t);
+	  wi::add (f, s, SIGNED, &r2_overflow);
+	  cmp2 = !wi::neg_p (f);
+	  wi::add (r1, s, SIGNED, &r3_overflow);
+	  cmp3 = wi::les_p (s, r1);
+
+	  if (r1_overflow)
+	{
+	  gfc_warning (OPT_Wundefined_do_loop,
+			   "DO loop at %L is undefined as the expression "
+			   "TO - FROM %s",
+			   location, cmp1 ? "overflows" : "underflows");
+	  has_warning = true;
+	}
+
+	  if (r2_overflow && !has_warning)
+	{
+	  gfc_warning (OPT_Wundefined_do_loop,
+			   "DO loop at %L is undefined as the expression "
+			   "FROM + STEP %s", location,
+			   cmp2 ? "overflows" : "underflows");
+	  has_warning = true;
+	}
+
+	  if (r3_overflow && !has_warning)
+	gfc_warning (OPT_Wundefined_do_loop,
+			 "DO loop at %L is undefined as the expression "
+			 "TO - FROM + STEP %s", location,
+			 cmp3 ? "overflows" : "underflows");
+	}
+
   /* The distance from FROM to TO cannot always be represented in a signed
  type, thus use unsigned arithmetic, also to avoid any undefined
 	 overflow issues.  */
diff --git a/gcc/testsuite/gfortran.dg/do_undefined_warn.f90 b/gcc/testsuite/gfortran.dg/do_undefined_warn.f90
new file mode 100644
index 000..3c20e00
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/do_undefined_warn.f90
@@ -0,0 +1,61 @@
+! { dg-options "-Wundefined-do-loop" }
+! Program to check corner cases for DO statements.
+
+function test2(array, s, block)
+integer(1) :: i, block(9), array(2)
+integer(2) :: j
+integer (8) :: s
+s = 1
+
+do i = -HUGE(i)-1, HUGE(i), 2 ! { dg-warning "is undefined as the expression TO - FROM overflows" }
+  s = s + 1
+end do
+
+do i = HUGE(i) - 10, HUGE(i), 12 ! { dg-warning "is undefined as the expression FROM \\+ STEP overflows" }
+  s = s + 1
+end do
+
+do i = 10, HUGE(i) - 10, 100 ! { dg-warning "is undefined as the expression TO - FROM \\+ STEP overflows" }
+ s = s + 1
+end do
+
+do i = 2, -HUGE(i)-1, 2 ! { dg-warning "is undefined as the expression TO - FROM underflows" }
+  s = s + 1
+end do
+
+do i = -HUGE(i)+11, 0, -13 ! { dg-warning "is undefined as the expression FROM \\+ STEP underflows" }
+  s = s + 1
+end do
+
+do i = -5, -HUGE(i)
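
For readers following the hunk above, here is a minimal sketch of the three checks in plain C, assuming an 8-bit loop variable as in the integer(1) testcase. The helper name check_do_bounds and the returned strings are made up for illustration; the patch itself uses wide_int and gfc_warning. The GCC overflow builtins stand in for wi::sub and wi::add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): report the first
   expression that does not fit in a signed 8-bit loop variable,
   or "ok" if all three fit.  */
static const char *
check_do_bounds (int8_t from, int8_t to, int8_t step)
{
  int8_t r1, r2, r3;

  assert (step != 0);  /* Fortran requires a nonzero DO step.  */

  /* The builtins return true when the exact result does not fit in
     the type of the third argument; the wrapped value is stored.  */
  bool r1_overflow = __builtin_sub_overflow (to, from, &r1);
  bool r2_overflow = __builtin_add_overflow (from, step, &r2);
  bool r3_overflow = __builtin_add_overflow (r1, step, &r3);

  /* Same priority order as the patch: only the first hit is reported.  */
  if (r1_overflow)
    return from <= to ? "TO - FROM overflows" : "TO - FROM underflows";
  if (r2_overflow)
    return from >= 0 ? "FROM + STEP overflows" : "FROM + STEP underflows";
  if (r3_overflow)
    return step <= r1 ? "TO - FROM + STEP overflows"
		      : "TO - FROM + STEP underflows";
  return "ok";
}
```

With HUGE(i) = 127 for integer(1), the first loop of the testcase corresponds to check_do_bounds (-128, 127, 2).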

Re: [RFC] Convert TYPE_ALIGN_OK into an TYPE_LANG_FLAG

2016-07-08 Thread Richard Biener
On Fri, 8 Jul 2016, Eric Botcazou wrote:

> > The discussion last time ended with a mail from you that TYPE_ALIGN_OK
> > is "somehow" relevant in the Ada FE, but I didn't see any feedback
> > from Eric nor results from the "extended" testing we wanted to perform.
> 
> TYPE_ALIGN_OK is definitely relevant in the Ada FE, the question being 
> whether 
> it is still relevant in the middle-end, in which case this would most likely 
> be for strict-alignment platforms.  So we need a serious evaluation of the 
> patch on a strict-alignment platform (no, the compiler bootstrap + testsuite 
> alone doesn't qualify here).

I expect it still has an effect due to the same reason as the
DECL_OFFSET_ALIGN thing - memory reference expansion doesn't gather
information from the full reference but tries to re-cover it from
the pieces returned from get_inner_reference.  So I guess we need
to fix that first.

> > So is there any news on that front?
> 
> I should have meaningful results tomorrow morning barring unexpected issues.

Thanks,
Richard.


Re: PING Re:[PATCH] PR 71667 - Handle live operations with DEBUG uses

2016-07-08 Thread Richard Biener
On Thu, Jul 7, 2016 at 4:35 PM, Alan Hayward  wrote:
> Ping.

Ok.

Richard.

>
> From: Alan Hayward 
> To: "gcc-patches at gcc dot gnu dot org"
> Date: Wed, 29 Jun 2016 08:49:34 +0100
> Subject: [PATCH] PR 71667 - Handle live operations with DEBUG uses
> Authentication-results: sourceware.org; auth=none
>
> In vectorizable_live_operation() we always assume uses of a live operation
> will be PHIs. However, when using -g a use of a live operation might be a
> DEBUG stmt.
>
> This patch avoids adding any DEBUG statements to the worklist in
> vectorizable_live_operation(). Also fixes comment.
>
> Tested on x86 and aarch64.
> Ok to commit?
>
> gcc/
> PR tree-optimization/71667
> * tree-vect-loop.c (vectorizable_live_operation): ignore DEBUG stmts
>
> testsuite/gcc.dg/vect
> PR tree-optimization/71667
> * pr71667.c: New
>
>
>
> diff --git a/gcc/testsuite/gcc.dg/vect/pr71667.c
> b/gcc/testsuite/gcc.dg/vect/pr71667.c
> new file mode 100644
> index
> ..e7012efa882a5497b0a6099c3d853f9eb
> 375cc53
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/pr71667.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-g" } */
> +
> +unsigned int mu;
> +int pt;
> +
> +void
> +qf (void)
> +{
> +  int gy;
> +  long int vz;
> +
> +  for (;;)
> +{
> +  for (gy = 0; gy < 80; ++gy)
> +  {
> +   vz = mu;
> +   ++mu;
> +   pt = (vz != 0) && (pt != 0);
> +  }
> +  while (gy < 81)
> +   while (gy < 83)
> + {
> +   vz = (vz != 0) ? 0 : mu;
> +   ++gy;
> + }
> +  pt = vz;
> +  ++mu;
> +}
> +}
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index
> 6c0337bbbcbebd6443fd3bcef45c1b23a7833486..2980a1b031cd3b919369b5e31dff7e066
> 5bc7578 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6352,11 +6352,12 @@ vectorizable_live_operation (gimple *stmt,
> : gimple_get_lhs (stmt);
>lhs_type = TREE_TYPE (lhs);
>
> -  /* Find all uses of STMT outside the loop - there should be exactly one.  */
> +  /* Find all uses of STMT outside the loop - there should be at least one.  */
>auto_vec worklist;
>FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, lhs)
> -if (!flow_bb_inside_loop_p (loop, gimple_bb (use_stmt)))
> -   worklist.safe_push (use_stmt);
> +if (!flow_bb_inside_loop_p (loop, gimple_bb (use_stmt))
> +   && !is_gimple_debug (use_stmt))
> +  worklist.safe_push (use_stmt);
>gcc_assert (worklist.length () >= 1);
>
>bitsize = TYPE_SIZE (TREE_TYPE (vectype));
>
>
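
The filtering done by the patch can be sketched outside of GCC roughly as follows. This is only a toy model: struct use_stmt and collect_live_uses are hypothetical names, and the two flags stand in for the results of flow_bb_inside_loop_p and is_gimple_debug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one immediate use of the live stmt's LHS.  */
struct use_stmt
{
  bool inside_loop;	/* Models flow_bb_inside_loop_p.  */
  bool is_debug;	/* Models is_gimple_debug.  */
};

/* Store pointers to the uses that are outside the loop and are not
   debug stmts in WORKLIST (room for N entries) and return how many
   were stored.  */
static size_t
collect_live_uses (const struct use_stmt *uses, size_t n,
		   const struct use_stmt **worklist)
{
  size_t len = 0;

  assert (uses != NULL && worklist != NULL);

  for (size_t i = 0; i < n; i++)
    if (!uses[i].inside_loop && !uses[i].is_debug)
      worklist[len++] = &uses[i];

  return len;
}
```

A debug use outside the loop is skipped here, so it is never treated as a use that needs the extracted value.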


RE: [PATCH] [ARC] Various small miscellaneous fixes.

2016-07-08 Thread Claudiu Zissulescu
> > +   && (register_operand (operands[1], SFmode)
> > +   || register_operand (operands[2], SFmode))"

This condition is necessary for reload cases.

> And, with this patch applied, I get a build error:
> 
> In file included from ./tm.h:43:0,
>  from /path/to/gcc/gcc/backend.h:28,
>  from insn-opinit.c:7:
> insn-opinit.c: In function ‘void init_all_optabs(target_optabs*)’:
> ./insn-flags.h:160:26: error: ‘operands’ was not declared in this scope
> && (register_operand (operands[1], SFmode) \
>   ^
> insn-opinit.c:220:13: note: in expansion of macro ‘HAVE_divsf3’
>ena[46] = HAVE_divsf3;

I applied this patch on the current trunk, but I've got no error. 

Regards,
Claudiu


Re: [RFC] Convert TYPE_ALIGN_OK into an TYPE_LANG_FLAG

2016-07-08 Thread Eric Botcazou
> The discussion last time ended with a mail from you that TYPE_ALIGN_OK
> is "somehow" relevant in the Ada FE, but I didn't see any feedback
> from Eric nor results from the "extended" testing we wanted to perform.

TYPE_ALIGN_OK is definitely relevant in the Ada FE, the question being whether 
it is still relevant in the middle-end, in which case this would most likely 
be for strict-alignment platforms.  So we need a serious evaluation of the 
patch on a strict-alignment platform (no, the compiler bootstrap + testsuite 
alone doesn't qualify here).

> So is there any news on that front?

I should have meaningful results tomorrow morning barring unexpected issues.

-- 
Eric Botcazou


Re: [PATCH] Handle undefined extern vars in output_in_order

2016-07-08 Thread Alexander Monakov
On Fri, 1 Jul 2016, Alexander Monakov wrote:
> On Thu, 23 Jun 2016, Alexander Monakov wrote:
> > Hi,
> > 
> > I've discovered that this assert in my patch was too restrictive:
> > 
> > +  if (DECL_HAS_VALUE_EXPR_P (pv->decl))
> > +   {
> > + gcc_checking_assert (lookup_attribute ("omp declare target link",
> > +DECL_ATTRIBUTES (pv->decl)));
> > 
> > Testing for the nvptx target uncovered that there's another case where a
> > global variable would have a value expr: emutls.  Sorry for not spotting it
> > earlier (but at least the new assert did its job).  I think we should always
> > skip here over decls that have value-exprs, just like hard-reg vars are
> > skipped.  The following patch does that.  Is this still OK?
> 
> Ping.

Ping^2.

> > (bootstrapped/regtested on x86-64)
> > 
> > Alexander
> > 
> > * cgraphunit.c (cgraph_order_sort_kind): New entry ORDER_VAR_UNDEF.
> > (output_in_order): Loop over undefined variables too.  Output them
> > via assemble_undefined_decl.  Skip variables that correspond to hard
> > registers or have value-exprs.
> > * varpool.c (symbol_table::output_variables): Handle undefined
> > variables together with defined ones.
> >  
> > diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> > index 4bfcad7..e30fe6e 100644
> > --- a/gcc/cgraphunit.c
> > +++ b/gcc/cgraphunit.c
> > @@ -2141,6 +2141,7 @@ enum cgraph_order_sort_kind
> >ORDER_UNDEFINED = 0,
> >ORDER_FUNCTION,
> >ORDER_VAR,
> > +  ORDER_VAR_UNDEF,
> >ORDER_ASM
> >  };
> >  
> > @@ -2187,16 +2188,20 @@ output_in_order (bool no_reorder)
> > }
> >  }
> >  
> > -  FOR_EACH_DEFINED_VARIABLE (pv)
> > -if (!DECL_EXTERNAL (pv->decl))
> > -  {
> > -   if (no_reorder && !pv->no_reorder)
> > -   continue;
> > -   i = pv->order;
> > -   gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
> > -   nodes[i].kind = ORDER_VAR;
> > -   nodes[i].u.v = pv;
> > -  }
> > +  /* There is a similar loop in symbol_table::output_variables.
> > + Please keep them in sync.  */
> > +  FOR_EACH_VARIABLE (pv)
> > +{
> > +  if (no_reorder && !pv->no_reorder)
> > +   continue;
> > +  if (DECL_HARD_REGISTER (pv->decl)
> > + || DECL_HAS_VALUE_EXPR_P (pv->decl))
> > +   continue;
> > +  i = pv->order;
> > +  gcc_assert (nodes[i].kind == ORDER_UNDEFINED);
> > +  nodes[i].kind = pv->definition ? ORDER_VAR : ORDER_VAR_UNDEF;
> > +  nodes[i].u.v = pv;
> > +}
> >  
> >for (pa = symtab->first_asm_symbol (); pa; pa = pa->next)
> >  {
> > @@ -,16 +2227,13 @@ output_in_order (bool no_reorder)
> >   break;
> >  
> > case ORDER_VAR:
> > -#ifdef ACCEL_COMPILER
> > - /* Do not assemble "omp declare target link" vars.  */
> > - if (DECL_HAS_VALUE_EXPR_P (nodes[i].u.v->decl)
> > - && lookup_attribute ("omp declare target link",
> > -  DECL_ATTRIBUTES (nodes[i].u.v->decl)))
> > -   break;
> > -#endif
> >   nodes[i].u.v->assemble_decl ();
> >   break;
> >  
> > +   case ORDER_VAR_UNDEF:
> > + assemble_undefined_decl (nodes[i].u.v->decl);
> > + break;
> > +
> > case ORDER_ASM:
> >   assemble_asm (nodes[i].u.a->asm_str);
> >   break;
> > diff --git a/gcc/varpool.c b/gcc/varpool.c
> > index ab615fa..e5f991e 100644
> > --- a/gcc/varpool.c
> > +++ b/gcc/varpool.c
> > @@ -733,11 +733,6 @@ symbol_table::output_variables (void)
> >  
> >timevar_push (TV_VAROUT);
> >  
> > -  FOR_EACH_VARIABLE (node)
> > -if (!node->definition
> > -   && !DECL_HAS_VALUE_EXPR_P (node->decl)
> > -   && !DECL_HARD_REGISTER (node->decl))
> > -  assemble_undefined_decl (node->decl);
> >FOR_EACH_DEFINED_VARIABLE (node)
> >  {
> >/* Handled in output_in_order.  */
> > @@ -747,20 +742,19 @@ symbol_table::output_variables (void)
> >node->finalize_named_section_flags ();
> >  }
> >  
> > -  FOR_EACH_DEFINED_VARIABLE (node)
> > +  /* There is a similar loop in output_in_order.  Please keep them in sync.  */
> > +  FOR_EACH_VARIABLE (node)
> >  {
> >/* Handled in output_in_order.  */
> >if (node->no_reorder)
> > continue;
> > -#ifdef ACCEL_COMPILER
> > -  /* Do not assemble "omp declare target link" vars.  */
> > -  if (DECL_HAS_VALUE_EXPR_P (node->decl)
> > - && lookup_attribute ("omp declare target link",
> > -  DECL_ATTRIBUTES (node->decl)))
> > +  if (DECL_HARD_REGISTER (node->decl)
> > + || DECL_HAS_VALUE_EXPR_P (node->decl))
> > continue;
> > -#endif
> > -  if (node->assemble_decl ())
> > -changed = true;
> > +  if (node->definition)
> > +   changed |= node->assemble_decl ();
> > +  else
> > +   assemble_undefined_decl (node->decl);
> >  }
> >timevar_pop (TV_VAROUT);
> >return changed;
> 
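
As a rough illustration of the new per-variable decision in output_in_order, here is a reduced sketch in C. struct var_node and classify_variable are hypothetical; the three flags model pv->definition, DECL_HARD_REGISTER and DECL_HAS_VALUE_EXPR_P, and the no_reorder filter is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum order_kind
{
  ORDER_UNDEFINED = 0,
  ORDER_FUNCTION,
  ORDER_VAR,
  ORDER_VAR_UNDEF,
  ORDER_ASM
};

/* Hypothetical reduced varpool node.  */
struct var_node
{
  bool definition;	/* Models pv->definition.  */
  bool hard_register;	/* Models DECL_HARD_REGISTER.  */
  bool has_value_expr;	/* Models DECL_HAS_VALUE_EXPR_P.  */
};

/* Return the kind that would be recorded for PV.  ORDER_UNDEFINED
   means the variable is skipped and its slot is left untouched.  */
static enum order_kind
classify_variable (const struct var_node *pv)
{
  assert (pv != NULL);

  if (pv->hard_register || pv->has_value_expr)
    return ORDER_UNDEFINED;

  return pv->definition ? ORDER_VAR : ORDER_VAR_UNDEF;
}
```

In the patch, ORDER_VAR entries go through assemble_decl and ORDER_VAR_UNDEF entries through assemble_undefined_decl.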

Re: [RFC] Convert TYPE_ALIGN_OK into an TYPE_LANG_FLAG

2016-07-08 Thread Richard Biener
On Thu, 7 Jul 2016, Bernd Edlinger wrote:

> Hi,
> 
> this patch re-factors the TYPE_ALIGN_OK into a new TYPE_LANG_FLAG,
> and removes one of the 9 parameters of get_inner_reference.  It therefore
> simplifies the middle end slightly.
> 
> It is quite a while ago, when I last proposed a similar patch, which focused
> only on get_inner_reference.  According to Eric's comment, I extended
> it to cover also the TYPE_ALIGN_OK which is only in use by Ada today.
> 
> As it turns out, the middle end does not need the TYPE_ALIGN_OK,
> but it cannot be completely removed, because Ada reads it back again,
> and it plays therefore the role of an additional TYPE_LANG_FLAG.
> 
> Removing the use of TYPE_ALIGN_OK in the pa backend can't have any
> side-effect, because when I look at the definition of mark_reg_pointer
> the align parameter only has an influence on REGNO_POINTER_ALIGN.
> Because that is never used by the pa backend, the only other place where
> it is used in the middle end is in rtlanal.c, where it is only evaluated for
> stack_pointer_rtx, frame_pointer_rtx, and arg_pointer_rtx, but these
> registers do not hold arbitrary pointer values.
> 
> 
> Boot-strapped and Reg-tested on arm-linux-gnueabihf and x86_64-pc-linux.
> I have also built cross-compilers for hppa and mips targets.
> Is it OK for trunk?

The discussion last time ended with a mail from you that TYPE_ALIGN_OK
is "somehow" relevant in the Ada FE, but I didn't see any feedback
from Eric nor results from the "extended" testing we wanted to perform.

So is there any news on that front?  That said, if we can remove
TYPE_ALIGN_OK setting from the Ada frontend then this patch becomes
much more obvious (and no need to find a lang-specific place for
TYPE_ALIGN_OK).

Richard.


Re: [PATCH 0/7] remove targets obsoleted in gcc 6

2016-07-08 Thread Eric Botcazou
> I removed the empty directories
> 
>gcc/common/config/mep
>gcc/config/mep
>libgcc/config/mep

https://gcc.gnu.org/backends.html needs to be updated accordingly.

-- 
Eric Botcazou


Re: [PATCH] Add code-hoisting to GIMPLE

2016-07-08 Thread Richard Biener
On Mon, 4 Jul 2016, Steven Bosscher wrote:

> On Mon, Jul 4, 2016 at 1:26 PM, Richard Biener wrote:
> >
> > The following patch is Steven's code-hoisting based on PRE forward-ported
> > and fixed for bootstrap plus the case of hoisting code across loops
> > which we generally do not want (expressions in the loop exit target block
> > are antic-in throughout the whole loop unless they are killed and thus
> > get inserted into the exit block and then PREd before the loop).
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > I'm going to try making the bitmap_set ops in do_hoist_insert a bit
> > faster - Steven, do you remember any issues with the approach from the
> > time you worked on it?
> 
> Hi Richi,
> 
> It's been almost 8 years since I worked on this, so I really don't
> recall much about this at all. Sorry :-)

Fair enough ;)  Apart from the loop case I noticed that code-hoisting
will cause

  if (x1_6 > 6) 
goto ;
  else  
goto ;

  :   
  i_7 = i_2(D) + 2; 

  :   
  # i_1 = PHI
  i_8 = i_1 + 2;

to be re-written to

  _18 = i_2(D) + 2;
  if (x1_6 > 6)
goto ;
  else
goto ;

  :
  _19 = _18 + 2;

  :
  # i_8 = PHI <_18(2), _19(3)>

which is because critical edge splitting splits 2->4 and thus makes
i_2(D)+2 antic-in in the else block (IIRC it wouldn't be antic-in
in bb 4 but antic-out in bb 2).  Not sure if it is worth trying to
devise a "fix" for this, it's not really a pessimization.

But it generally shows that hoisting is quite aggressive.

Richard.
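
At the source level, the transformation described above might look roughly like the following sketch. The function names are made up, and the shape of the original code is an assumption inferred from the rewritten GIMPLE (the first dump lost its basic block labels and PHI arguments), so take it as an illustration only.

```c
#include <assert.h>

/* Assumed original shape: i + 2 is computed on both paths, once
   before the join and once after it.  */
static int
before_hoisting (int x, int i)
{
  if (x > 6)
    i = i + 2;
  return i + 2;
}

/* After hoisting: i + 2 is evaluated once before the branch (_18)
   and only the taken path needs the second addition (_19).  */
static int
after_hoisting (int x, int i)
{
  int t18 = i + 2;
  if (x > 6)
    {
      int t19 = t18 + 2;
      return t19;
    }
  return t18;
}

/* Check that both forms agree on a small range of inputs.  */
static void
check_equivalence (void)
{
  for (int x = 0; x <= 12; x++)
    for (int i = -4; i <= 4; i++)
      assert (before_hoisting (x, i) == after_hoisting (x, i));
}
```

Both forms do two additions when x > 6 and one otherwise, which matches the remark that the rewrite is not really a pessimization.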


[PATCH] Add code-hoisting to GIMPLE

2016-07-08 Thread Richard Biener

This is a final candidate patch to add code-hoisting to GIMPLE.

I've already committed several patches fixing fallout and the following
one adds -fno-code-hoisting (I renamed the option) to a few testcases.
I filed PRs for the cases where code-hoisting exposes missed optimization
opportunities in passes that I couldn't quickly fix (I fixed path
splitting and loop distribution but failed to grok SLSR).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I put the patch on the czerny tester for the weekend runs (x86_64 as 
well).

Testing on other archs and comments are of course appreciated, if nothing
unusual happens I plan to commit this on Monday.

Richard.

2016-07-08  Steven Bosscher  
Richard Biener  

PR tree-optimization/23286
PR tree-optimization/70159
* doc/invoke.texi: Document -fcode-hoisting.
* common.opt (fcode-hoisting): New flag.
* opts.c (default_options_table): Enable -fcode-hoisting at -O2+.
* tree-ssa-pre.c (pre_stats): Add hoist_insert.
(do_regular_insertion): Rename to ...
(do_pre_regular_insertion): ... this and amend general comments
on insertion strategy.
(do_partial_partial_insertion): Rename to ...
(do_pre_partial_partial_insertion): ... this.
(do_hoist_insertion): New function.
(insert_aux): Take flags on whether to do PRE and/or hoist insertion
and call do_hoist_insertion properly.
(insert): Adjust.
(pass_pre::gate): Enable also if -fcode-hoisting is enabled.
(pass_pre::execute): Register hoist_insert stats.

* gcc.dg/tree-ssa/ssa-pre-11.c: Disable code hoisting.
* gcc.dg/tree-ssa/ssa-pre-27.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-28.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-2.c: Likewise.
* gcc.dg/tree-ssa/pr35286.c: Likewise.
* gcc.dg/tree-ssa/pr35287.c: Likewise.
* gcc.dg/hoist-register-pressure-1.c: Likewise.
* gcc.dg/hoist-register-pressure-2.c: Likewise.
* gcc.dg/hoist-register-pressure-3.c: Likewise.
* gcc.dg/pr51879-12.c: Likewise.
* gcc.dg/strlenopt-9.c: Likewise.
* gcc.dg/tree-ssa/pr47392.c: Likewise.
* gcc.dg/tree-ssa/pr68619-4.c: Likewise.
* gcc.dg/tree-ssa/split-path-5.c: Likewise.
* gcc.dg/tree-ssa/slsr-35.c: Likewise.
* gcc.dg/tree-ssa/slsr-36.c: Likewise.
* gcc.dg/tree-ssa/loadpre3.c: Adjust so hoisting doesn't apply.
* gcc.dg/tree-ssa/pr43491.c: Scan optimized dump for desired result.
* gcc.dg/tree-ssa/ssa-pre-31.c: Adjust expected outcome for hoisting.
* gcc.dg/tree-ssa/ssa-hoist-1.c: New testcase.
* gcc.dg/tree-ssa/ssa-hoist-2.c: New testcase.
* gcc.dg/tree-ssa/ssa-hoist-3.c: New testcase.
* gcc.dg/tree-ssa/ssa-hoist-4.c: New testcase.
* gcc.dg/tree-ssa/ssa-hoist-5.c: New testcase.
* gcc.dg/tree-ssa/ssa-hoist-6.c: New testcase.
* gfortran.dg/pr43984.f90: Adjust expected outcome.

Index: gcc/doc/invoke.texi
===
*** gcc/doc/invoke.texi.orig2016-07-07 14:45:27.156657281 +0200
--- gcc/doc/invoke.texi 2016-07-07 14:45:45.460875808 +0200
*** Objective-C and Objective-C++ Dialects}.
*** 404,410 
  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
! -ftree-dse -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
  -ftree-loop-if-convert-stores -ftree-loop-im @gol
  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
--- 404,410 
  -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
  -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
  -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts @gol
! -ftree-dse -ftree-forwprop -ftree-fre -fcode-hoisting -ftree-loop-if-convert @gol
  -ftree-loop-if-convert-stores -ftree-loop-im @gol
  -ftree-phiprop -ftree-loop-distribution -ftree-loop-distribute-patterns @gol
  -ftree-loop-ivcanon -ftree-loop-linear -ftree-loop-optimize @gol
*** also turns on the following optimization
*** 6372,6377 
--- 6372,6378 
  -fstrict-aliasing -fstrict-overflow @gol
  -ftree-builtin-call-dce @gol
  -ftree-switch-conversion -ftree-tail-merge @gol
+ -fcode-hoisting @gol
  -ftree-pre @gol
  -ftree-vrp @gol
  -fipa-ra}
*** and the @option{large-stack-frame-growth
*** 7265,7270 
--- 7266,7279 
  Perform reassociation on trees.  This flag is enabled by default
  at @option{-O} and higher.
  
+ @item -fcode-hoisting
+ @opindex fcode-hoisting
+ Perform code hoisting.  Code hoisting tries to move the
+ evaluation of expressions executed on all paths to the function exit
+ as early as possible.  This is especially useful as a code size