Re: [PING][PATCH] libgcc: Fix typos in comments for ARM FP emulation routines

2016-04-19 Thread Sandra Loosemore

On 04/19/2016 03:53 PM, Martin Galvan wrote:

A lifetime ago I contributed a patch that added CFI directives to ieee754-df.S,
among other files. For unrelated reasons I looked at that file again and saw
that some of the comments have extra '@' characters intertwined; this is probably
the result of splitting lines that were too long. This patch simply
removes those extra characters and fixes a couple of other cosmetic issues.


IMO, fixing typos and formatting in comments (especially in your own 
code!) qualifies as an "obvious fix" that you can check in without prior 
approval.  In any case, I looked over the patch and it seems fine to me.


Or, do you need someone to check this in for you because you don't have 
write access to the repository?


-Sandra



[committed] Fix handling of OpenMP implicit linear/lastprivate clauses (PR middle-end/70680)

2016-04-19 Thread Jakub Jelinek
Hi!

The following testcases show incorrect handling of implicit linear or
lastprivate clauses on #pragma omp simd iterators: if the vars aren't
private in the outer context and the constructs aren't combined, the
iterators should be noticed in the outer context.
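The crux, condensed from the new pr70680-1.c testcase below: i is
implicitly lastprivate on the simd, but it is shared in the enclosing
task region, so unless the gimplifier notices i in the outer context
the final value never makes it back out:

  int i = 0;
  #pragma omp task default(shared) if(0)
  {
  #pragma omp simd
    for (i = 0; i < 100; i++)
      ;
  }
  if (i != 100)
    __builtin_abort ();	/* aborted here before the fix */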

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued up for 6 branch after 6.1 is released.

2016-04-19  Jakub Jelinek  

PR middle-end/70680
* gimplify.c (gimplify_omp_for): Call omp_notice_variable for
implicitly linear or lastprivate iterator on the outer context.

* testsuite/libgomp.c/pr70680-1.c: New test.
* testsuite/libgomp.c/pr70680-2.c: New test.

--- gcc/gimplify.c.jj   2016-04-15 18:04:42.0 +0200
+++ gcc/gimplify.c  2016-04-19 20:03:19.347936293 +0200
@@ -8785,7 +8785,10 @@ gimplify_omp_for (tree *expr_p, gimple_s
  decl, false))
;
  else if (outer->region_type != ORT_COMBINED_PARALLEL)
-   outer = NULL;
+   {
+ omp_notice_variable (outer, decl, true);
+ outer = NULL;
+   }
  if (outer)
{
  n = splay_tree_lookup (outer->variables,
@@ -8868,7 +8871,10 @@ gimplify_omp_for (tree *expr_p, gimple_s
  decl, false))
;
  else if (outer->region_type != ORT_COMBINED_PARALLEL)
-   outer = NULL;
+   {
+ omp_notice_variable (outer, decl, true);
+ outer = NULL;
+   }
  if (outer)
{
  n = splay_tree_lookup (outer->variables,
--- libgomp/testsuite/libgomp.c/pr70680-1.c.jj  2016-04-19 20:13:54.998588014 
+0200
+++ libgomp/testsuite/libgomp.c/pr70680-1.c 2016-04-19 20:01:54.0 
+0200
@@ -0,0 +1,75 @@
+/* PR middle-end/70680 */
+
+int v;
+
+void
+f1 (void)
+{
+  int i = 0;
+#pragma omp task default(shared) if(0)
+  {
+#pragma omp simd
+for (i = 0; i < 100; i++)
+  ;
+v = i;
+  }
+  if (i != 100)
+__builtin_abort ();
+}
+
+void
+f2 (void)
+{
+  int i = 0;
+#pragma omp task default(shared) if(0)
+  {
+#pragma omp simd
+for (i = 0; i < 100; i++)
+  ;
+  }
+  if (i != 100)
+__builtin_abort ();
+}
+
+void
+f3 (void)
+{
+  int i = 0;
+#pragma omp task default(shared) if(0)
+  {
+#pragma omp simd linear(i: 1)
+for (i = 0; i < 100; i++)
+  ;
+v = i;
+  }
+  if (i != 100)
+__builtin_abort ();
+}
+
+void
+f4 (void)
+{
+  int i = 0;
+#pragma omp task default(shared) if(0)
+  {
+#pragma omp simd linear(i: 1)
+for (i = 0; i < 100; i++)
+  ;
+  }
+  if (i != 100)
+__builtin_abort ();
+}
+
+int
+main ()
+{
+  f1 ();
+  if (v++ != 100)
+__builtin_abort ();
+  f2 ();
+  f3 ();
+  if (v++ != 100)
+__builtin_abort ();
+  f4 ();
+  return 0;
+}
--- libgomp/testsuite/libgomp.c/pr70680-2.c.jj  2016-04-19 20:13:59.570527869 
+0200
+++ libgomp/testsuite/libgomp.c/pr70680-2.c 2016-04-19 20:09:02.0 
+0200
@@ -0,0 +1,79 @@
+/* PR middle-end/70680 */
+
+int v;
+
+void
+f1 (void)
+{
+  int i = 0, j = 0;
+#pragma omp task default(shared) if(0)
+  {
+#pragma omp simd collapse(2)
+for (i = 0; i < 10; i++)
+  for (j = 0; j < 10; j++)
+   ;
+v = i + j;
+  }
+  if (i != 10 || j != 10)
+__builtin_abort ();
+}
+
+void
+f2 (void)
+{
+  int i = 0, j = 0;
+#pragma omp task default(shared) if(0)
+  {
+#pragma omp simd collapse(2)
+for (i = 0; i < 10; i++)
+  for (j = 0; j < 10; j++)
+   ;
+  }
+  if (i != 10 || j != 10)
+__builtin_abort ();
+}
+
+void
+f3 (void)
+{
+  int i = 0, j = 0;
+#pragma omp task default(shared) if(0)
+  {
+#pragma omp simd collapse(2) lastprivate (i, j)
+for (i = 0; i < 10; i++)
+  for (j = 0; j < 10; j++)
+   ;
+v = i + j;
+  }
+  if (i != 10 || j != 10)
+__builtin_abort ();
+}
+
+void
+f4 (void)
+{
+  int i = 0, j = 0;
+#pragma omp task default(shared) if(0)
+  {
+#pragma omp simd collapse(2) lastprivate (i, j)
+for (i = 0; i < 10; i++)
+  for (j = 0; j < 10; j++)
+   ;
+  }
+  if (i != 10 || j != 10)
+__builtin_abort ();
+}
+
+int
+main ()
+{
+  f1 ();
+  if (v++ != 20)
+__builtin_abort ();
+  f2 ();
+  f3 ();
+  if (v++ != 20)
+__builtin_abort ();
+  f4 ();
+  return 0;
+}

Jakub


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread AKASHI Takahiro
On Tue, Apr 19, 2016 at 09:39:39AM +0300, Alexander Monakov wrote:
> On Tue, 19 Apr 2016, AKASHI Takahiro wrote:
> > > > But if Szabolcs' two-instruction 
> > > > sequence in the adjacent subthread is sufficient, this is moot.
> > > 
> > > It can also be solved by having just one NOP after the function label, 
> > > and a number of them before, then no thread can be in the nop pad.  That 
> > > seems to indicate that GCC should not try to be too clever and simply 
> > > leave the specified number of nops before and after the function label, 
> > > leaving safety measures to the patching infrastructure.
> > 
> > I don't get this idea very well.
> > How can the instructions *before* a function label be executed
> > after branching into this function?
> 
> The single nop after the function label is changed to a short backwards branch
> to the instructions just before the function label.
> 
> As a result, the last instruction in the pad would have to become a short
> forward branch jumping over the backwards branch described above, to the first
> real instruction of the function.

So you mean something like:
1:
  str x30, [sp, #-8]!
  bl _tracefunc
  ldr x30, [sp], #8
  b 2f
.global <func>
<func>:
  b 1b
2:
  <function body>
  ...
(We will not have to use x9 or anything else to preserve x30 here.)

Interesting.
Livepatch code in the kernel has an assumption that the address of
"bl _tracefunc" be equal to , but a recent patch for
power pc to support livepatch tries to ease this restriction [1],
and so hopefully it won't be an issue.
(I will have to dig into the kernel code to be sure that there is
no other issues though.)

Thanks,
-Takahiro AKASHI

[1] http://lkml.iu.edu//hypermail/linux/kernel/1604.1/04111.html and
http://lkml.iu.edu//hypermail/linux/kernel/1604.1/04112.html

> Alexander


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread AKASHI Takahiro
On Tue, Apr 19, 2016 at 09:44:37AM +0300, Alexander Monakov wrote:
> On Tue, 19 Apr 2016, AKASHI Takahiro wrote:
> > > looking at [2] I don't see why
> > > 
> > > func:
> > >   mov x9, x30
> > >   bl _tracefunc
> > >   <function body>
> > 
> > Actually,
> > mov x9, x30
> > bl _tracefunc
> > mov x30, x9
> > 
> 
> I think here Szabolcs' point was that the last instruction can be eliminated:
> _tracefunc can be responsible for restoring x30, and can use x9 to return to
> its caller. It has a non-standard calling convention and needs to be
> implemented in assembly anyway.

OK, but in _tracefunc, x30 has been updated, and so we should
return as follows:
mov xTMP, x30
mov x30, x9
ret xTMP

We need one more temp register here...

Thanks,
-Takahiro AKASHI

> Alexander


[PING][PATCH] libgcc: Fix typos in comments for ARM FP emulation routines

2016-04-19 Thread Martin Galvan
A lifetime ago I contributed a patch that added CFI directives to ieee754-df.S,
among other files. For unrelated reasons I looked at that file again and saw
that some of the comments have extra '@' characters intertwined; this is probably
the result of splitting lines that were too long. This patch simply
removes those extra characters and fixes a couple of other cosmetic issues.

libgcc/ChangeLog:
2016-04-19  Martin Galvan  

* config/arm/ieee754-df.S: Fix typos in comments.

Index: libgcc/config/arm/ieee754-df.S
===
--- libgcc/config/arm/ieee754-df.S  (revision 234960)
+++ libgcc/config/arm/ieee754-df.S  (working copy)
@@ -160,8 +160,8 @@
teq r4, r5
beq LSYM(Lad_d)
 
-@ CFI note: we're lucky that the branches to Lad_* that appear after this 
function
-@ have a CFI state that's exactly the same as the one we're in at this
+@ CFI note: we're lucky that the branches to Lad_* that appear after this
+@ function have a CFI state that's exactly the same as the one we're in at this
 @ point. Otherwise the CFI would change to a different state after the branch,
 @ which would be disastrous for backtracing.
 LSYM(Lad_x):
@@ -1158,8 +1158,8 @@
 1: str ip, [sp, #-4]!
.cfi_adjust_cfa_offset 4@ CFA is now sp + previousOffset + 4.
@ We're not adding CFI for ip as it's pushed into the stack
-   @ only because @ it may be popped off later as a return value
-   @ (i.e. we're not preserving @ it anyways).
+   @ only because it may be popped off later as a return value
+   @ (i.e. we're not preserving it anyways).
 
@ Trap any INF/NAN first.
mov ip, xh, lsl #1
@@ -1169,14 +1169,14 @@
COND(mvn,s,ne)  ip, ip, asr #21
beq 3f
.cfi_remember_state
-   @ Save the current CFI state. This is done because the branch
-   @ is conditional, @ and if we don't take it we'll issue a
-   @ .cfi_adjust_cfa_offset and return.  @ If we do take it,
-   @ however, the .cfi_adjust_cfa_offset from the non-branch @ code
-   @ will affect the branch code as well. To avoid this we'll
-   @ restore @ the current state before executing the branch code.
+   @ Save the current CFI state.  This is done because the branch
+   @ is conditional, and if we don't take it we'll issue a
+   @ .cfi_adjust_cfa_offset and return.  If we do take it,
+   @ however, the .cfi_adjust_cfa_offset from the non-branch code
+   @ will affect the branch code as well.  To avoid this we'll
+   @ restore the current state before executing the branch code.
 
-   @ Test for equality.  @ Note that 0.0 is equal to -0.0.
+   @ Test for equality.  Note that 0.0 is equal to -0.0.
 2: add sp, sp, #4
.cfi_adjust_cfa_offset -4   @ CFA is now sp + previousOffset.
 


Re: [PATCH] Fix missed DSE opportunity with operator delete.

2016-04-19 Thread Mikhail Maltsev
On 04/18/2016 12:14 PM, Richard Biener wrote:
> 
> Enlarging tree_function_decl is bad.
Probably using 3 bits for malloc_flag, operator_new_flag and free_flag is
redundant. I packed the state into 2 bits.
> 
> Passes should get at the info via flags_from_decl_or_type () and a new
> ECF_FREE.
Fixed.
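For reference, a consumer pass would then query it along these lines
(just a sketch; the stmt/decl plumbing is illustrative, only
flags_from_decl_or_type and ECF_FREE come from the patch):

  int flags = flags_from_decl_or_type (gimple_call_fndecl (stmt));
  if (flags & ECF_FREE)
    {
      /* The callee frees its argument, so stores into that memory
	 with no other readers are dead.  */
    }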

-- 
Regards,
Mikhail Maltsev
diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 357d26f..00e4f84 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -400,7 +400,7 @@ gigi (Node_Id gnat_root,
 			   ftype,
 			   NULL_TREE, is_disabled, false, true, true, false,
 			   true, false, NULL, Empty);
-  DECL_IS_MALLOC (malloc_decl) = 1;
+  DECL_SET_MALLOC (malloc_decl);
 
   /* free is a function declaration tree for a function to free memory.  */
   free_decl
diff --git a/gcc/ada/gcc-interface/utils.c b/gcc/ada/gcc-interface/utils.c
index d568dff..5b12e3d 100644
--- a/gcc/ada/gcc-interface/utils.c
+++ b/gcc/ada/gcc-interface/utils.c
@@ -6026,7 +6026,7 @@ handle_malloc_attribute (tree *node, tree name, tree ARG_UNUSED (args),
 {
   if (TREE_CODE (*node) == FUNCTION_DECL
   && POINTER_TYPE_P (TREE_TYPE (TREE_TYPE (*node))))
-DECL_IS_MALLOC (*node) = 1;
+DECL_SET_MALLOC (*node);
   else
 {
   warning (OPT_Wattributes, "%qs attribute ignored",
diff --git a/gcc/builtin-attrs.def b/gcc/builtin-attrs.def
index 089817a..ddaf3e6 100644
--- a/gcc/builtin-attrs.def
+++ b/gcc/builtin-attrs.def
@@ -88,6 +88,7 @@ DEF_ATTR_IDENT (ATTR_CONST, "const")
 DEF_ATTR_IDENT (ATTR_FORMAT, "format")
 DEF_ATTR_IDENT (ATTR_FORMAT_ARG, "format_arg")
 DEF_ATTR_IDENT (ATTR_MALLOC, "malloc")
+DEF_ATTR_IDENT (ATTR_FREE, "free")
 DEF_ATTR_IDENT (ATTR_NONNULL, "nonnull")
 DEF_ATTR_IDENT (ATTR_NORETURN, "noreturn")
 DEF_ATTR_IDENT (ATTR_NOTHROW, "nothrow")
@@ -141,6 +142,10 @@ DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC,	\
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LEAF_LIST, ATTR_MALLOC,	\
 			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_FREE_NOTHROW_LIST, ATTR_FREE,		\
+			ATTR_NULL, ATTR_NOTHROW_LIST)
+DEF_ATTR_TREE_LIST (ATTR_FREE_NOTHROW_LEAF_LIST, ATTR_FREE,	\
+			ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_SENTINEL_NOTHROW_LIST, ATTR_SENTINEL,	\
 			ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_SENTINEL_NOTHROW_LEAF_LIST, ATTR_SENTINEL,	\
@@ -269,8 +274,10 @@ DEF_ATTR_TREE_LIST (ATTR_TM_NOTHROW_RT_LIST,
 DEF_ATTR_TREE_LIST (ATTR_TMPURE_MALLOC_NOTHROW_LIST,
 		   ATTR_TM_TMPURE, ATTR_NULL, ATTR_MALLOC_NOTHROW_LIST)
 /* Same attributes used for BUILT_IN_FREE except with TM_PURE thrown in.  */
-DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LIST,
-		   ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LIST)
+DEF_ATTR_TREE_LIST (ATTR_TMPURE_FREE_NOTHROW_LIST,
+		   ATTR_TM_TMPURE, ATTR_NULL, ATTR_FREE_NOTHROW_LIST)
+DEF_ATTR_TREE_LIST (ATTR_TMPURE_FREE_NOTHROW_LEAF_LIST,
+		   ATTR_TM_TMPURE, ATTR_NULL, ATTR_FREE_NOTHROW_LEAF_LIST)
 
 DEF_ATTR_TREE_LIST (ATTR_TMPURE_NOTHROW_LEAF_LIST,
 		ATTR_TM_TMPURE, ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 2fc7f65..e3d1614 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -781,7 +781,7 @@ DEF_EXT_LIB_BUILTIN(BUILT_IN_FFSLL, "ffsll", BT_FN_INT_LONGLONG, ATTR_CONST_
 DEF_EXT_LIB_BUILTIN(BUILT_IN_FORK, "fork", BT_FN_PID, ATTR_NOTHROW_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_FRAME_ADDRESS, "frame_address", BT_FN_PTR_UINT, ATTR_NULL)
 /* [trans-mem]: Adjust BUILT_IN_TM_FREE if BUILT_IN_FREE is changed.  */
-DEF_LIB_BUILTIN(BUILT_IN_FREE, "free", BT_FN_VOID_PTR, ATTR_NOTHROW_LEAF_LIST)
+DEF_LIB_BUILTIN(BUILT_IN_FREE, "free", BT_FN_VOID_PTR, ATTR_FREE_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_FROB_RETURN_ADDR, "frob_return_addr", BT_FN_PTR_PTR, ATTR_NULL)
 DEF_EXT_LIB_BUILTIN(BUILT_IN_GETTEXT, "gettext", BT_FN_STRING_CONST_STRING, ATTR_FORMAT_ARG_1)
 DEF_C99_BUILTIN(BUILT_IN_IMAXABS, "imaxabs", BT_FN_INTMAX_INTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index cae2faf..12d7924 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -355,6 +355,7 @@ static tree handle_tls_model_attribute (tree *, tree, tree, int,
 static tree handle_no_instrument_function_attribute (tree *, tree,
 		 tree, int, bool *);
 static tree handle_malloc_attribute (tree *, tree, tree, int, bool *);
+static tree handle_free_attribute (tree *, tree, tree, int, bool *);
 static tree handle_returns_twice_attribute (tree *, tree, tree, int, bool *);
 static tree handle_no_limit_stack_attribute (tree *, tree, tree, int,
 	 bool *);
@@ -720,6 +721,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			  false },
   { "malloc", 0, 0, true,  false, false,
 			  handle_malloc_attribute, false },
+  { "free",   0, 0, true,  false, false,
+			  

Re: [PATCH, rs6000] Expand vec_ld and vec_st during parsing to improve performance

2016-04-19 Thread Bill Schmidt
On Tue, 2016-04-19 at 10:09 +0200, Richard Biener wrote:
> On Tue, Apr 19, 2016 at 12:05 AM, Bill Schmidt
>  wrote:
> > Hi,
> >
> > Expanding built-ins in the usual way (leaving them as calls until
> > expanding into RTL) restricts the amount of optimization that can be
> > performed on the code represented by the built-ins.  This has been
> > observed to be particularly bad for the vec_ld and vec_st built-ins on
> > PowerPC, which represent the lvx and stvx instructions.  Currently these
> > are expanded into UNSPECs that are left untouched by the optimizers, so
> > no redundant load or store elimination can take place.  For certain
> > idiomatic usages, this leads to very bad performance.
> >
> > Initially I planned to just change the UNSPEC representation to RTL that
> > directly expresses the address masking implicit in lvx and stvx.  This
> > turns out to be only partially successful in improving performance.
> > Among other things, by the time we reach RTL we have lost track of the
> > __restrict__ attribute, leading to more appearances of may-alias
> > relationships than should really be present.  Instead, this patch
> > expands the built-ins during parsing so that they are exposed to all
> > GIMPLE optimizations as well.
> >
> > This works well for vec_ld and vec_st.  It is also possible for
> > programmers to instead use __builtin_altivec_lvx_ and
> > __builtin_altivec_stvx_.  These are not so easy to catch during
> > parsing, since they are not processed by the overloaded built-in
> > function table.  For these, I am currently falling back to expansion
> > during RTL while still exposing the address-masking semantics, which
> > seems ok for these somewhat obscure built-ins.  At some future time we
> > may decide to handle them similarly to vec_ld and vec_st.
> >
> > For POWER8 little-endian only, the loads and stores during expand time
> > require some special handling, since the POWER8 expanders want to
> > convert these to lxvd2x/xxswapd and xxswapd/stxvd2x.  To deal with this,
> > I've added an extra pre-pass to the swap optimization phase that
> > recognizes the lvx and stvx patterns and canonicalizes them so they'll
> > be properly recognized.  This isn't an issue for earlier or later
> > processors, or for big-endian POWER8, so doing this as part of swap
> > optimization is appropriate.
> >
> > We have a lot of existing test cases for this code, which proved very
> > useful in discovering bugs, so I haven't seen a reason to add any new
> > tests.
> >
> > The patch is fairly large, but it isn't feasible to break it up into
> > smaller units without leaving something in a broken state.  So I will
> > have to just apologize for the size and leave it at that.  Sorry! :)
> >
> > Bootstrapped and tested successfully on powerpc64le-unknown-linux-gnu,
> > and on powerpc64-unknown-linux-gnu (-m32 and -m64) with no regressions.
> > Is this ok for trunk after GCC 6 releases?
> 
> Just took a very quick look but it seems you are using integer arithmetic
> for the pointer adjustment and bit-and.  You could use POINTER_PLUS_EXPR
> for the addition and BIT_AND_EXPR is also valid on pointer types.  Which
> means you don't need conversions to/from sizetype.

I just verified that I run into trouble with both these changes.  The
build_binary_op interface doesn't accept POINTER_PLUS_EXPR as a valid
code (we hit a gcc_unreachable in the main switch statement), but does
produce pointer additions from a PLUS_EXPR.  Also, apparently
BIT_AND_EXPR is not valid on at least these pointer types:

ld.c: In function 'test':
ld.c:68:9: error: invalid operands to binary & (have '__vector(16) unsigned char *' and '__vector(16) unsigned char *')
   vuc = vec_ld (0, (vector unsigned char *)svuc);
 ^

That's what happens if I try:

  tree aligned = build_binary_op (loc, BIT_AND_EXPR, addr,
				  build_int_cst (TREE_TYPE (arg1),
						 -16), 0);

If I try with building the -16 as a sizetype, I get the same error
message except that the second argument listed is 'sizetype'.  Is there
something else I should be trying instead?
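For reference, the direct tree-building variant of Richard's suggestion
would presumably look like this (untested sketch; assuming arg1 is the
pointer operand and arg0 the byte offset):

  tree addr = fold_build_pointer_plus (arg1, arg0);  /* POINTER_PLUS_EXPR */
  tree aligned = fold_build2_loc (loc, BIT_AND_EXPR, TREE_TYPE (addr),
				  addr, build_int_cst (TREE_TYPE (addr),
						       -16));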

Thanks,
Bill


> 
> x86 nowadays has intrinsics implemented as inlines - they come from
> header files.  It seems for ppc the intrinsics are somehow magically
> there, w/o a header file?
> 
> Richard.
> 
> > Thanks,
> > Bill
> >
> >
> > 2016-04-18  Bill Schmidt  
> >
> > * config/rs6000/altivec.md (altivec_lvx_<mode>): Remove.
> > (altivec_lvx_<mode>_internal): Document.
> > (altivec_lvx_<mode>_2op): New define_insn.
> > (altivec_lvx_<mode>_1op): Likewise.
> > (altivec_lvx_<mode>_2op_si): Likewise.
> > (altivec_lvx_<mode>_1op_si): Likewise.
> > (altivec_stvx_<mode>): Remove.
> > (altivec_stvx_<mode>_internal): Document.
> > (altivec_stvx_<mode>_2op): New define_insn.
> > (altivec_stvx_<mode>_1op): Likewise.
> > 

moxie-rtems patch for libgcc/config.host

2016-04-19 Thread Joel Sherrill

Hi

For some unknown reason, moxie-rtems has its own stanza
in libgcc/config.host which does not include extra_parts.
This results in C++ RTEMS applications not linking.

Also the tmake_file variable is overridden by the
shared moxie stanza rather than being added to.
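The shell-level difference is just the usual append-vs-assign idiom
(illustrative):

  tmake_file="moxie/t-moxie ..."             # replaces any earlier value
  tmake_file="$tmake_file moxie/t-moxie ..." # appends, as the patch does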

This patch addresses both issues. This patch (or some
minor variant) needs to be applied to every branch from
4.9 to master.

Comments?


2016-04-18  Joel Sherrill 

* config.host (moxie-*-rtems*): Merge this stanza with
other moxie targets so the same extra_parts are built.
Also have tmake_file add on to its value rather than override.

--
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35806
Support Available                (256) 722-9985
From fbeb49b7d12cb7a6c9ef15a9f3b000c9fc7b641e Mon Sep 17 00:00:00 2001
From: Joel Sherrill 
Date: Mon, 18 Apr 2016 16:31:06 -0500
Subject: [PATCH] config.host: Merge moxie-rtems stanza with other moxie
 targets

	2016-04-18  Joel Sherrill 

	* config.host (moxie-*-rtems*): Merge this stanza with
	other moxie targets so the same extra_parts are built.
	Also have tmake_file add on to its value rather than override.
---
 libgcc/config.host | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/libgcc/config.host b/libgcc/config.host
index b61a579..16a45c8 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -931,14 +931,9 @@ mmix-knuth-mmixware)
 mn10300-*-*)
 	tmake_file=t-fdpbit
 	;;
-moxie-*-elf | moxie-*-moxiebox* | moxie-*-uclinux*)
-	tmake_file="moxie/t-moxie t-softfp-sfdf t-softfp-excl t-softfp"
-	extra_parts="$extra_parts crti.o crtn.o crtbegin.o crtend.o"
-	;;
-moxie-*-rtems*)
+moxie-*-elf | moxie-*-moxiebox* | moxie-*-uclinux* | moxie-*-rtems*)
 	tmake_file="$tmake_file moxie/t-moxie t-softfp-sfdf t-softfp-excl t-softfp"
-	# Don't use default.
-	extra_parts=
+	extra_parts="$extra_parts crti.o crtn.o crtbegin.o crtend.o"
 	;;
 msp430*-*-elf)
 	tmake_file="$tm_file t-crtstuff t-fdpbit msp430/t-msp430"
-- 
1.8.3.1



Re: C++ PATCH for c++/66543 (-Wunused-but-set false positives)

2016-04-19 Thread Jakub Jelinek
On Tue, Apr 19, 2016 at 03:29:17PM -0400, Jason Merrill wrote:
> We've been seeing false positives from these warnings in template code due
> to uses not making it through into the instantiation:
> 
> 1) If a pack expansion has no elements
> 2) If a parameter used in a trailing-return-type is instantiated into a
> dummy distinct from the real instantiation
> 3) If a generic lambda that refers to the decl is never instantiated
> 
> Tested x86_64-pc-linux-gnu, applying to trunk.

Looks safe even for 6.1 to me, or if you want to give it more time on trunk,
at least for 6.2.  But we'll do an rc2 later this week in any case.

> commit 16a145d52dcd75c5da6702ca7024a4570abf6d36
> Author: Jason Merrill 
> Date:   Wed Mar 2 09:40:14 2016 -0500
> 
>   PR c++/66543 - -Wunused-but-set* false positives
> 
>   * expr.c (mark_exp_read): Handle NON_DEPENDENT_EXPR.
>   * pt.c (make_pack_expansion): Call mark_exp_read.
>   * semantics.c (finish_id_expression): Call mark_type_use in
>   unevaluated context.

Jakub


C++ PATCH for c++/66543 (-Wunused-but-set false positives)

2016-04-19 Thread Jason Merrill
We've been seeing false positives from these warnings in template code
due to uses not making it through into the instantiation:

1) If a pack expansion has no elements
2) If a parameter used in a trailing-return-type is instantiated into a
dummy distinct from the real instantiation
3) If a generic lambda that refers to the decl is never instantiated

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 16a145d52dcd75c5da6702ca7024a4570abf6d36
Author: Jason Merrill 
Date:   Wed Mar 2 09:40:14 2016 -0500

	PR c++/66543 - -Wunused-but-set* false positives

	* expr.c (mark_exp_read): Handle NON_DEPENDENT_EXPR.
	* pt.c (make_pack_expansion): Call mark_exp_read.
	* semantics.c (finish_id_expression): Call mark_type_use in
	unevaluated context.

diff --git a/gcc/cp/expr.c b/gcc/cp/expr.c
index 702b717..61b3953 100644
--- a/gcc/cp/expr.c
+++ b/gcc/cp/expr.c
@@ -145,6 +145,7 @@ mark_exp_read (tree exp)
 case ADDR_EXPR:
 case INDIRECT_REF:
 case FLOAT_EXPR:
+case NON_DEPENDENT_EXPR:
   mark_exp_read (TREE_OPERAND (exp, 0));
   break;
 case COMPOUND_EXPR:
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index f9a9d99..e18422f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3696,6 +3696,8 @@ make_pack_expansion (tree arg)
   /* Propagate type and const-expression information.  */
   TREE_TYPE (result) = TREE_TYPE (arg);
   TREE_CONSTANT (result) = TREE_CONSTANT (arg);
+  /* Mark this read now, since the expansion might be length 0.  */
+  mark_exp_read (arg);
 }
   else
 /* Just use structural equality for these TYPE_PACK_EXPANSIONS;
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 56864b4..85ef993 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -3487,6 +3487,12 @@ finish_id_expression (tree id_expression,
   if (!scope && decl != error_mark_node && identifier_p (id_expression))
 	maybe_note_name_used_in_class (id_expression, decl);
 
+  /* A use in unevaluated operand might not be instantiated appropriately
+	 if tsubst_copy builds a dummy parm, or if we never instantiate a
+	 generic lambda, so mark it now.  */
+  if (processing_template_decl && cp_unevaluated_operand)
+	mark_type_use (decl);
+
   /* Disallow uses of local variables from containing functions, except
 	 within lambda-expressions.  */
   if (outer_automatic_var_p (decl))
diff --git a/gcc/testsuite/g++.dg/warn/Wunused-parm-7.C b/gcc/testsuite/g++.dg/warn/Wunused-parm-7.C
new file mode 100644
index 000..ff1dda5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wunused-parm-7.C
@@ -0,0 +1,17 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-Wunused-but-set-parameter" }
+
+template <class... Ts> void sink(Ts...);
+
+struct A { int i; };
+
+template <int... I>
+void f(A a)
+{
+  return sink((a.i + I)...);
+}
+
+int main()
+{
+  f<>(A());
+}
diff --git a/gcc/testsuite/g++.dg/warn/Wunused-parm-8.C b/gcc/testsuite/g++.dg/warn/Wunused-parm-8.C
new file mode 100644
index 000..867ad6a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wunused-parm-8.C
@@ -0,0 +1,9 @@
+// { dg-do compile { target c++14 } }
+// { dg-options "-Wunused-but-set-parameter" }
+
+auto l = [](auto t) -> decltype(true ? t : 0) { return {}; };
+
+int main()
+{
+  l(42);
+}
diff --git a/gcc/testsuite/g++.dg/warn/Wunused-var-24.C b/gcc/testsuite/g++.dg/warn/Wunused-var-24.C
new file mode 100644
index 000..924b2db
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wunused-var-24.C
@@ -0,0 +1,10 @@
+// PR c++/66543
+// { dg-do compile { target c++14 } }
+// { dg-options "-Wunused-but-set-variable" }
+
+int main() {
+  auto f = []() { };
+  [=](auto) {
+using Foo = decltype(f());
+  };
+}
diff --git a/gcc/testsuite/g++.dg/warn/Wunused-var-25.C b/gcc/testsuite/g++.dg/warn/Wunused-var-25.C
new file mode 100644
index 000..959e79c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wunused-var-25.C
@@ -0,0 +1,15 @@
+// { dg-do compile { target c++14 } }
+// { dg-options "-Wunused-but-set-variable" }
+
+template <int...> struct A { };
+template <int... I>
+auto f()
+{
+  constexpr int ar[sizeof...(I)+1] = {I...};
+  return A<ar[I]...>();
+}
+
+int main()
+{
+  f<>();
+}


C++ PATCH for c++/68206, 68530 (ICE with loop in constexpr)

2016-04-19 Thread Jason Merrill
Well-formed C++14 code with a loop in a constexpr function works fine, 
but we were crashing while trying to diagnose an unsuitable constexpr 
function because potential_constant_expression_1 didn't understand loops.
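For instance, a well-formed C++14 function like this one (my example,
not from the testsuite) was already accepted and evaluated correctly:

constexpr int sum (int n)
{
  int r = 0;
  for (int i = 1; i <= n; ++i)
    r += i;
  return r;
}
static_assert (sum (4) == 10, "");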


The second patch improves constexpr handling of EXIT_EXPR and loops 
around COMPOUND_EXPR rather than STATEMENT_LIST.  This is not currently 
necessary, but might be in future.
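For reference, those forms represent something like "while (!done) ++i;"
roughly as follows (a sketch of the GENERIC shape, not a real dump):

  LOOP_EXPR
    COMPOUND_EXPR
      EXIT_EXPR <done>	// leave the loop once the condition is true
      ++i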


Tested x86_64-pc-linux-gnu, applying to trunk.
commit e9df0f90c8502653b06a100bef9c765d6ee52ca9
Author: Jason Merrill 
Date:   Thu Mar 3 08:15:50 2016 -0600

	PR c++/68206

	PR c++/68530
	* constexpr.c (potential_constant_expression_1): Handle LOOP_EXPR
	and GOTO_EXPR.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index ae0c973..d508660 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4924,6 +4924,8 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict,
 case NON_DEPENDENT_EXPR:
   /* For convenience.  */
 case RETURN_EXPR:
+case LOOP_EXPR:
+case EXIT_EXPR:
   return RECUR (TREE_OPERAND (t, 0), want_rval);
 
 case TRY_FINALLY_EXPR:
@@ -5135,6 +5137,15 @@ potential_constant_expression_1 (tree t, bool want_rval, bool strict,
 case EMPTY_CLASS_EXPR:
   return false;
 
+case GOTO_EXPR:
+  {
+	tree *target = &TREE_OPERAND (t, 0);
+	/* Gotos representing break and continue are OK; we should have
+	   rejected other gotos in parsing.  */
+	gcc_assert (breaks (target) || continues (target));
+	return true;
+  }
+
 default:
   if (objc_is_property_ref (t))
 	return false;
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-loop5.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-loop5.C
new file mode 100644
index 000..02f372d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-loop5.C
@@ -0,0 +1,19 @@
+// PR c++/68530
+// { dg-do compile { target c++14 } }
+
+struct thing {
+void foo() {}
+};
+
+template<typename T>
+constexpr int count()
+{
+auto item = thing {};
+for(; (item.foo(), false);); // { dg-error "foo" }
+return 0;
+}
+
+int main()
+{
+static_assert( count<int>() == 0, "" ); // { dg-error "" }
+}
commit 0e632adeb8c2253f6a9f9e4445c577eef51b1f4c
Author: Jason Merrill 
Date:   Tue Apr 19 13:59:05 2016 -0400

	Improve constexpr handling of other loop forms.

	* constexpr.c (breaks): Handle EXIT_EXPR.
	(cxx_eval_loop_expr): Handle COMPOUND_EXPR body.
	(cxx_eval_constant_expression): Handle EXIT_EXPR, improve handling
	of COMPOUND_EXPR.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index d508660..41f0b5c 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3241,8 +3241,9 @@ static bool
 breaks (tree *jump_target)
 {
   return *jump_target
-&& TREE_CODE (*jump_target) == LABEL_DECL
-&& LABEL_DECL_BREAK (*jump_target);
+&& ((TREE_CODE (*jump_target) == LABEL_DECL
+	 && LABEL_DECL_BREAK (*jump_target))
+	|| TREE_CODE (*jump_target) == EXIT_EXPR);
 }
 
 static bool
@@ -3358,8 +3359,8 @@ cxx_eval_loop_expr (const constexpr_ctx *ctx, tree t,
   hash_set<tree> save_exprs;
   new_ctx.save_exprs = &save_exprs;
 
-  cxx_eval_statement_list (&new_ctx, body,
-			   non_constant_p, overflow_p, jump_target);
+  cxx_eval_constant_expression (&new_ctx, body, /*lval*/false,
+				non_constant_p, overflow_p, jump_target);
 
   /* Forget saved values of SAVE_EXPRs.  */
   for (hash_set<tree>::iterator iter = save_exprs.begin();
@@ -3750,6 +3751,8 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
 	cxx_eval_constant_expression (ctx, op0,
 	  true, non_constant_p, overflow_p,
 	  jump_target);
+	if (*non_constant_p)
+	  return t;
 	op1 = TREE_OPERAND (t, 1);
 	r = cxx_eval_constant_expression (ctx, op1,
 	  lval, non_constant_p, overflow_p,
@@ -4015,6 +4018,17 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, tree t,
 	}
   break;
 
+case EXIT_EXPR:
+  {
+	tree cond = TREE_OPERAND (t, 0);
+	cond = cxx_eval_constant_expression (ctx, cond, /*lval*/false,
+	 non_constant_p, overflow_p);
+	VERIFY_CONSTANT (cond);
+	if (integer_nonzerop (cond))
+	  *jump_target = t;
+  }
+  break;
+
 case GOTO_EXPR:
   *jump_target = TREE_OPERAND (t, 0);
   gcc_assert (breaks (jump_target) || continues (jump_target));


Re: [PATCH] Don't build 32-bit libatomic with -march=i486 on x86-64

2016-04-19 Thread H.J. Lu
On Tue, Apr 19, 2016 at 11:30 AM, Uros Bizjak  wrote:
> On Tue, Apr 19, 2016 at 8:24 PM, H.J. Lu  wrote:
>> On Tue, Apr 19, 2016 at 11:18 AM, Uros Bizjak  wrote:
>>> On Tue, Apr 19, 2016 at 8:08 PM, H.J. Lu  wrote:
 On Tue, Apr 19, 2016 at 8:45 AM, Uros Bizjak  wrote:
> On Tue, Apr 19, 2016 at 5:07 PM, H.J. Lu  wrote:
>> Gcc uses the same -march= for both -m32 and -m64 on x86-64 unless
>> --with-arch-32= is used.  There is no need for -march=i486 to compile
>> 32-bit libatomic on x86-64.
>>
>> Tested on x86-64.  OK for trunk?
>>
>> H.J.
>> ---
>> PR target/70454
>> * configure.tgt (XCFLAGS): Don't add -march=i486 to compile
>> 32-bit x86 target library on x86-64.
>> ---
>>  libatomic/configure.tgt | 10 ++
>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
>> index c5470d7..bbb93fc 100644
>> --- a/libatomic/configure.tgt
>> +++ b/libatomic/configure.tgt
>> @@ -81,14 +81,8 @@ case "${target_cpu}" in
>> try_ifunc=yes
>> ;;
>>x86_64)
>> -   case " ${CC} ${CFLAGS} " in
>> - *" -m32 "*)
>> -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
>> -   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
>> -   ;;
>> - *)
>> -   ;;
>> -   esac
>> +   # Since 64-bit arch > i486, we can use the same -march= to build
>> +   # both 32-bit and 64-bit target libraries.
>> ARCH=x86
>> # ??? Detect when -mcx16 is already enabled.
>> try_ifunc=yes
>> --
>> 2.5.5
>>
>
> No, this is wrong. My build with default options defaults to i386. So,
> the difference between
>

 How was your GCC configured?  Did you use

 --with-arch_32=i386
>>>
>>> Nope, just:
>>>
>>> ~/gcc-svn/trunk/configure
>>>
>>
>> $ /ssd/uros/gcc-build/gcc/cc1 -E -dM -m32 hello.c > aaa
>>
>> I don't think cc1 is supposed to be used directly.  Can you use gcc
>> driver instead, like
>>
>> $ /ssd/uros/gcc-build/gcc/xgcc -B/ssd/uros/gcc-build/gcc/  -E -dM -m32 
>> hello.c
>
> This works, since the driver passes:
>
> COLLECT_GCC_OPTIONS='-B' '/ssd/uros/gcc-build/gcc/' '-E' '-dM' '-m32'
> '-mtune=generic' '-march=x86-64'
>

That is why I submitted my patches.  Since -m32 passes -march=x86-64
to cc1 on x86-64, we shouldn't pass -march=i486 to cc1.  It is undesirable
especially when --with-arch= is used.  I noticed the issue because 32-bit
libatomic/libgomp/libitm weren't optimized with -march=haswell when GCC
was configured with --with-arch=haswell.


-- 
H.J.


[PATCH] Fix ICE in predicate_mem_writes (PR tree-optimization/70725)

2016-04-19 Thread Marek Polacek
While predicate_mem_writes has a check to skip conditions that were evaluated
to true, it lacks the same check for false, so we hit an assert later on.
I'm therefore adding is_false_predicate.  Maybe it should be added to other
spots as well, but I'm not sure about that.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-04-19  Marek Polacek  

PR tree-optimization/70725
* tree-if-conv.c (is_false_predicate): New function.
(predicate_mem_writes): Use it.

* gcc.dg/pr70725.c: New test.

diff --git gcc/testsuite/gcc.dg/pr70725.c gcc/testsuite/gcc.dg/pr70725.c
index e69de29..fc7b674 100644
--- gcc/testsuite/gcc.dg/pr70725.c
+++ gcc/testsuite/gcc.dg/pr70725.c
@@ -0,0 +1,22 @@
+/* PR tree-optimization/70725 */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-additional-options "-march=skylake-avx512" { target { i?86-*-* x86_64-*-* } } } */
+
+extern short a;
+extern int b, d;
+extern int c[100];
+extern int e;
+extern int f;
+
+void
+fn1 ()
+{
+  for (; e < 2; e = e + 1)
+    d = a;
+  for (;;)
+    for (int g = 0; g < 5; g = g + 1)
+      for (int h = 0; h < 2; h = h + 1)
+	for (int i = 0; i < 3; i = i + 1)
+	  c[f + i] = a && b;
+}
diff --git gcc/tree-if-conv.c gcc/tree-if-conv.c
index 9e305c7..a9fbab9 100644
--- gcc/tree-if-conv.c
+++ gcc/tree-if-conv.c
@@ -262,6 +262,16 @@ ifc_temp_var (tree type, tree expr, gimple_stmt_iterator *gsi)
   return new_name;
 }
 
+/* Return true when COND is a false predicate.  */
+
+static inline bool
+is_false_predicate (tree cond)
+{
+  return (cond == NULL_TREE
+ || cond == boolean_false_node
+ || integer_zerop (cond));
+}
+
 /* Return true when COND is a true predicate.  */
 
 static inline bool
@@ -1988,7 +1998,7 @@ predicate_mem_writes (loop_p loop)
   gimple *stmt;
   int index;
 
-  if (is_true_predicate (cond))
+  if (is_true_predicate (cond) || is_false_predicate (cond))
continue;
 
   swap = false;

Marek


Re: [PATCH] Don't build 32-bit libatomic with -march=i486 on x86-64

2016-04-19 Thread Uros Bizjak
On Tue, Apr 19, 2016 at 8:24 PM, H.J. Lu  wrote:
> On Tue, Apr 19, 2016 at 11:18 AM, Uros Bizjak  wrote:
>> On Tue, Apr 19, 2016 at 8:08 PM, H.J. Lu  wrote:
>>> On Tue, Apr 19, 2016 at 8:45 AM, Uros Bizjak  wrote:
 On Tue, Apr 19, 2016 at 5:07 PM, H.J. Lu  wrote:
> Gcc uses the same -march= for both -m32 and -m64 on x86-64 unless
> --with-arch-32= is used.  There is no need for -march=i486 to compile
> 32-bit libatomic on x86-64.
>
> Tested on x86-64.  OK for trunk?
>
> H.J.
> ---
> PR target/70454
> * configure.tgt (XCFLAGS): Don't add -march=i486 to compile
> 32-bit x86 target library on x86-64.
> ---
>  libatomic/configure.tgt | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
> index c5470d7..bbb93fc 100644
> --- a/libatomic/configure.tgt
> +++ b/libatomic/configure.tgt
> @@ -81,14 +81,8 @@ case "${target_cpu}" in
> try_ifunc=yes
> ;;
>x86_64)
> -   case " ${CC} ${CFLAGS} " in
> - *" -m32 "*)
> -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
> -   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
> -   ;;
> - *)
> -   ;;
> -   esac
> +   # Since 64-bit arch > i486, we can use the same -march= to build
> +   # both 32-bit and 64-bit target libraries.
> ARCH=x86
> # ??? Detect when -mcx16 is already enabled.
> try_ifunc=yes
> --
> 2.5.5
>

 No, this is wrong. My build with default options defaults to i386. So,
 the difference between

>>>
>>> How was your GCC configured?  Did you use
>>>
>>> --with-arch_32=i386
>>
>> Nope, just:
>>
>> ~/gcc-svn/trunk/configure
>>
>
> $ /ssd/uros/gcc-build/gcc/cc1 -E -dM -m32 hello.c > aaa
>
> I don't think cc1 is supposed to be used directly.  Can you use gcc
> driver instead, like
>
> $ /ssd/uros/gcc-build/gcc/xgcc -B/ssd/uros/gcc-build/gcc/  -E -dM -m32 hello.c

This works, since the driver passes:

COLLECT_GCC_OPTIONS='-B' '/ssd/uros/gcc-build/gcc/' '-E' '-dM' '-m32'
'-mtune=generic' '-march=x86-64'

Uros.


[patch] Fix configure test for sendfile()

2016-04-19 Thread Jonathan Wakely

One more fix for the Filesystem library: my configure test for the GNU
and Solaris version of sendfile was failing because it used NULL
without stddef.h, so we never used sendfile. That was useful, because
it meant someone found and reported a bug in the alternative
implementation that doesn't use sendfile, but it will be much faster
to use sendfile when available.
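In other words, the check was effectively compiling the following, which
fails before it even gets to linking because nothing has declared NULL
(a sketch of the old test body; at least with glibc, <sys/sendfile.h>
does not drag in <stddef.h>):

  #include <sys/sendfile.h>
  int main() { sendfile(1, 2, (off_t*)NULL, sizeof 1); }  /* 'NULL' undeclared */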

This fixes the configure test, but as it affects the build config I
won't commit it until after 6.1 is released.

Tested x86_64-linux.

commit 867bf574bad0ba186d941520c2575e8be1ea5366
Author: Jonathan Wakely 
Date:   Tue Apr 19 19:17:50 2016 +0100

Fix configure test for sendfile()

	* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Fix test for sendfile.
	* configure: Regenerate.
	* config.h.in: Regenerate.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index b0f88cb..0824243 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4373,7 +4373,7 @@ dnl
   gnu* | linux* | solaris*)
 GCC_TRY_COMPILE_OR_LINK(
  [#include <sys/sendfile.h>],
-  [sendfile(1, 2, (off_t*)NULL, sizeof 1);],
+  [sendfile(1, 2, (off_t*)0, sizeof 1);],
   [glibcxx_cv_sendfile=yes],
   [glibcxx_cv_sendfile=no])
 ;;
@@ -4383,7 +4383,7 @@ dnl
 esac
   ])
   if test $glibcxx_cv_sendfile = yes; then
-AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in .])
+AC_DEFINE(_GLIBCXX_USE_SENDFILE, 1, [Define if sendfile is available in .])
   fi
   AC_MSG_RESULT($glibcxx_cv_sendfile)
 dnl


Re: [PATCH] Don't build 32-bit libatomic with -march=i486 on x86-64

2016-04-19 Thread H.J. Lu
On Tue, Apr 19, 2016 at 11:18 AM, Uros Bizjak  wrote:
> On Tue, Apr 19, 2016 at 8:08 PM, H.J. Lu  wrote:
>> On Tue, Apr 19, 2016 at 8:45 AM, Uros Bizjak  wrote:
>>> On Tue, Apr 19, 2016 at 5:07 PM, H.J. Lu  wrote:
 Gcc uses the same -march= for both -m32 and -m64 on x86-64 unless
 --with-arch-32= is used.  There is no need for -march=i486 to compile
 32-bit libatomic on x86-64.

 Tested on x86-64.  OK for trunk?

 H.J.
 ---
 PR target/70454
 * configure.tgt (XCFLAGS): Don't add -march=i486 to compile
 32-bit x86 target library on x86-64.
 ---
  libatomic/configure.tgt | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)

 diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
 index c5470d7..bbb93fc 100644
 --- a/libatomic/configure.tgt
 +++ b/libatomic/configure.tgt
 @@ -81,14 +81,8 @@ case "${target_cpu}" in
 try_ifunc=yes
 ;;
x86_64)
 -   case " ${CC} ${CFLAGS} " in
 - *" -m32 "*)
 -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
 -   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
 -   ;;
 - *)
 -   ;;
 -   esac
 +   # Since 64-bit arch > i486, we can use the same -march= to build
 +   # both 32-bit and 64-bit target libraries.
 ARCH=x86
 # ??? Detect when -mcx16 is already enabled.
 try_ifunc=yes
 --
 2.5.5

>>>
>>> No, this is wrong. My build with default options defaults to i386. So,
>>> the difference between
>>>
>>
>> How was your GCC configured?  Did you use
>>
>> --with-arch_32=i386
>
> Nope, just:
>
> ~/gcc-svn/trunk/configure
>

$ /ssd/uros/gcc-build/gcc/cc1 -E -dM -m32 hello.c > aaa

I don't think cc1 is supposed to be used directly.  Can you use gcc
driver instead, like

$ /ssd/uros/gcc-build/gcc/xgcc -B/ssd/uros/gcc-build/gcc/  -E -dM -m32 hello.c

-- 
H.J.


Re: [PATCH] Don't build 32-bit libatomic with -march=i486 on x86-64

2016-04-19 Thread Uros Bizjak
On Tue, Apr 19, 2016 at 8:08 PM, H.J. Lu  wrote:
> On Tue, Apr 19, 2016 at 8:45 AM, Uros Bizjak  wrote:
>> On Tue, Apr 19, 2016 at 5:07 PM, H.J. Lu  wrote:
>>> Gcc uses the same -march= for both -m32 and -m64 on x86-64 unless
>>> --with-arch-32= is used.  There is no need for -march=i486 to compile
>>> 32-bit libatomic on x86-64.
>>>
>>> Tested on x86-64.  OK for trunk?
>>>
>>> H.J.
>>> ---
>>> PR target/70454
>>> * configure.tgt (XCFLAGS): Don't add -march=i486 to compile
>>> 32-bit x86 target library on x86-64.
>>> ---
>>>  libatomic/configure.tgt | 10 ++
>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
>>> index c5470d7..bbb93fc 100644
>>> --- a/libatomic/configure.tgt
>>> +++ b/libatomic/configure.tgt
>>> @@ -81,14 +81,8 @@ case "${target_cpu}" in
>>> try_ifunc=yes
>>> ;;
>>>x86_64)
>>> -   case " ${CC} ${CFLAGS} " in
>>> - *" -m32 "*)
>>> -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
>>> -   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
>>> -   ;;
>>> - *)
>>> -   ;;
>>> -   esac
>>> +   # Since 64-bit arch > i486, we can use the same -march= to build
>>> +   # both 32-bit and 64-bit target libraries.
>>> ARCH=x86
>>> # ??? Detect when -mcx16 is already enabled.
>>> try_ifunc=yes
>>> --
>>> 2.5.5
>>>
>>
>> No, this is wrong. My build with default options defaults to i386. So,
>> the difference between
>>
>
> How was your GCC configured?  Did you use
>
> --with-arch_32=i386

Nope, just:

~/gcc-svn/trunk/configure

Uros.


Re: [PATCH] Don't build 32-bit libatomic with -march=i486 on x86-64

2016-04-19 Thread H.J. Lu
On Tue, Apr 19, 2016 at 8:45 AM, Uros Bizjak  wrote:
> On Tue, Apr 19, 2016 at 5:07 PM, H.J. Lu  wrote:
>> Gcc uses the same -march= for both -m32 and -m64 on x86-64 unless
>> --with-arch-32= is used.  There is no need for -march=i486 to compile
>> 32-bit libatomic on x86-64.
>>
>> Tested on x86-64.  OK for trunk?
>>
>> H.J.
>> ---
>> PR target/70454
>> * configure.tgt (XCFLAGS): Don't add -march=i486 to compile
>> 32-bit x86 target library on x86-64.
>> ---
>>  libatomic/configure.tgt | 10 ++
>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
>> index c5470d7..bbb93fc 100644
>> --- a/libatomic/configure.tgt
>> +++ b/libatomic/configure.tgt
>> @@ -81,14 +81,8 @@ case "${target_cpu}" in
>> try_ifunc=yes
>> ;;
>>x86_64)
>> -   case " ${CC} ${CFLAGS} " in
>> - *" -m32 "*)
>> -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
>> -   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
>> -   ;;
>> - *)
>> -   ;;
>> -   esac
>> +   # Since 64-bit arch > i486, we can use the same -march= to build
>> +   # both 32-bit and 64-bit target libraries.
>> ARCH=x86
>> # ??? Detect when -mcx16 is already enabled.
>> try_ifunc=yes
>> --
>> 2.5.5
>>
>
> No, this is wrong. My build with default options defaults to i386. So,
> the difference between
>

How was your GCC configured?  Did you use

--with-arch_32=i386

-- 
H.J.


[patch] libstdc++/69703 ignore endianness in codecvt_utf8

2016-04-19 Thread Jonathan Wakely

This was reported as a bug in the Filesystem library, but it's
actually a problem in the codecvt_utf8 facet that it uses.
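Roughly, the symptom on a little-endian host was this (my reconstruction
of the problem, not the exact testcase from the PR):

  #include <codecvt>
  #include <locale>
  #include <string>

  int main()
  {
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> conv;
    std::u16string s = conv.from_bytes("abc");
    return s == u"abc" ? 0 : 1;	// nonzero before the fix: the code
				// units came back byte-swapped
  }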

Tested x86_64-linux, committed to trunk.


commit 7f3a547a9e80556030e77ac090e2ad8e04e44abc
Author: Jonathan Wakely 
Date:   Tue Apr 19 18:32:17 2016 +0100

libstdc++/69703 ignore endianness in codecvt_utf8

	PR libstdc++/69703
	* src/c++11/codecvt.cc (__codecvt_utf8_base<char16_t>::do_in):
	Override endianness bit in mode.
	* testsuite/22_locale/codecvt/codecvt_utf8/69703.cc: New test.
	* testsuite/22_locale/codecvt/codecvt_utf8_utf16/66855.cc: Test
	that little_endian mode is ignored.
	* testsuite/experimental/filesystem/path/native/string.cc: New test.

diff --git a/libstdc++-v3/src/c++11/codecvt.cc b/libstdc++-v3/src/c++11/codecvt.cc
index 327beb6..b6b6358 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -789,7 +789,11 @@ do_in(state_type&, const extern_type* __from, const extern_type* __from_end,
 {
   range<const char> from{ __from, __from_end };
   range<char16_t> to{ __to, __to_end };
-  auto res = ucs2_in(from, to, _M_maxcode, _M_mode);
+  codecvt_mode mode = codecvt_mode(_M_mode & (consume_header|generate_header));
+#if __BYTE_ORDER__ != __ORDER_BIG_ENDIAN__
+  mode = codecvt_mode(mode | little_endian);
+#endif
+  auto res = ucs2_in(from, to, _M_maxcode, mode);
   __from_next = from.next;
   __to_next = to.next;
   return res;
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/69703.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/69703.cc
new file mode 100644
index 000..745d2c2
--- /dev/null
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8/69703.cc
@@ -0,0 +1,103 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++11" }
+
+#include <codecvt>
+#include <testsuite_hooks.h>
+
+void
+test01()
+{
+  bool test __attribute__((unused)) = true;
+
+  const char out[] = "abc";
+  char16_t in[4];
+  std::codecvt_utf8 cvt;
+  std::mbstate_t st;
+  const char* no;
+  char16_t* ni;
+  auto res = cvt.in(st, out, out+3, no, in, in+3, ni);
+  VERIFY( res == std::codecvt_base::ok );
+  VERIFY( in[0] == u'a' );
+  VERIFY( in[1] == u'b' );
+  VERIFY( in[2] == u'c' );
+}
+
+void
+test02()
+{
+  bool test __attribute__((unused)) = true;
+
+  const char out[] = "abc";
+  char16_t in[4];
+  std::codecvt_utf8 cvt;
+  std::mbstate_t st;
+  const char* no;
+  char16_t* ni;
+  auto res = cvt.in(st, out, out+3, no, in, in+3, ni);
+  VERIFY( res == std::codecvt_base::ok );
+  VERIFY( in[0] == u'a' );
+  VERIFY( in[1] == u'b' );
+  VERIFY( in[2] == u'c' );
+}
+
+void
+test03()
+{
+  bool test __attribute__((unused)) = true;
+
+  const char out[] = "abc";
+  char32_t in[4];
+  std::codecvt_utf8 cvt;
+  std::mbstate_t st;
+  const char* no;
+  char32_t* ni;
+  auto res = cvt.in(st, out, out+3, no, in, in+3, ni);
+  VERIFY( res == std::codecvt_base::ok );
+  VERIFY( in[0] == U'a' );
+  VERIFY( in[1] == U'b' );
+  VERIFY( in[2] == U'c' );
+}
+
+
+void
+test04()
+{
+  bool test __attribute__((unused)) = true;
+
+  const char out[] = "abc";
+  char32_t in[4];
+  std::codecvt_utf8 cvt;
+  std::mbstate_t st;
+  const char* no;
+  char32_t* ni;
+  auto res = cvt.in(st, out, out+3, no, in, in+3, ni);
+  VERIFY( res == std::codecvt_base::ok );
+  VERIFY( in[0] == U'a' );
+  VERIFY( in[1] == U'b' );
+  VERIFY( in[2] == U'c' );
+}
+
+int
+main()
+{
+  test01();
+  test02();
+  test03();
+  test04();
+}
diff --git a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/66855.cc b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/66855.cc
index 05e5bc6..49b750f 100644
--- a/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/66855.cc
+++ b/libstdc++-v3/testsuite/22_locale/codecvt/codecvt_utf8_utf16/66855.cc
@@ -45,8 +45,35 @@ test01()
   VERIFY( buf[3] == utf16[3] );
 }
 
+void
+test02()
+{
+  // Endianness flag should make no difference.
+  std::codecvt_utf8_utf16 cvt;
+  char16_t utf16[] = u"\ub098\ub294\ud0dc\uc624";
+  const char16_t* nf16;
+  char utf8[16];
+  char* nt8;
+  std::mbstate_t st{};
+  auto res = cvt.out(st, utf16, utf16+4, nf16, utf8, utf8+16, 

[patch] libstdc++/70609 fix filesystem::copy()

2016-04-19 Thread Jonathan Wakely

The conditional code in filesystem::copy() that uses stdio_filebuf
didn't handle the case where the file to be copied is empty: the
ostream insertion inserts no characters for an empty file and therefore
fails. I was also failing to check for errors when closing the files.
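The underlying quirk is standard behaviour: inserting a streambuf that
yields no characters sets failbit on the ostream. A minimal illustration
(not from the patch):

  #include <sstream>

  int main()
  {
    std::istringstream empty;
    std::ostringstream out;
    out << empty.rdbuf();	// inserts zero characters, so ...
    return out.fail() ? 0 : 1;	// ... failbit is set here
  }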

Rather embarrassingly, the copy.cc test was actually a duplicate of
absolute.cc and wasn't testing the copy function at all.

I also found some defects in the spec of filesystem::copy(), which
I've reported.

Tested x86_64-linux, committed to trunk.


commit 461aaa6fb392aad7fc0e2c210b0b493b9335b38c
Author: Jonathan Wakely 
Date:   Tue Apr 19 18:32:03 2016 +0100

libstdc++/70609 fix filesystem::copy()

	PR libstdc++/70609
	* src/filesystem/ops.cc (close_fd): New function.
	(do_copy_file): Set permissions before copying file contents. Check
	result of closing file descriptors. Don't copy streambuf when file
	is empty.
	(copy(const path&, const path&, copy_options, error_code&)): Use
	lstat for source file when copy_symlinks is set.
	* testsuite/experimental/filesystem/operations/copy.cc: Test copy().

diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index 756e140..aa26caf 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -300,6 +300,17 @@ namespace
 };
   }
 
+  // Returns true if the file descriptor was successfully closed,
+  // otherwise returns false and the reason will be in errno.
+  inline bool
+  close_fd(int fd)
+  {
+while (::close(fd))
+  if (errno != EINTR)
+	return false;
+return true;
+  }
+
   bool
   do_copy_file(const fs::path& from, const fs::path& to,
 	   fs::copy_options option,
@@ -376,7 +387,8 @@ namespace
   }
 
 struct CloseFD {
-  ~CloseFD() { if (fd != -1) ::close(fd); }
+  ~CloseFD() { if (fd != -1) close_fd(fd); }
+  bool close() { return close_fd(std::exchange(fd, -1)); }
   int fd;
 };
 
@@ -401,23 +413,6 @@ namespace
 	return false;
   }
 
-#ifdef _GLIBCXX_USE_SENDFILE
-auto n = ::sendfile(out.fd, in.fd, nullptr, from_st->st_size);
-if (n != from_st->st_size)
-  {
-	ec.assign(errno, std::generic_category());
-	return false;
-  }
-#else
-__gnu_cxx::stdio_filebuf<char> sbin(in.fd, std::ios::in);
-__gnu_cxx::stdio_filebuf<char> sbout(out.fd, std::ios::out);
-if ( !(std::ostream(&sbout) << &sbin) )
-  {
-	ec = std::make_error_code(std::errc::io_error);
-	return false;
-  }
-#endif
-
 #ifdef _GLIBCXX_USE_FCHMOD
 if (::fchmod(out.fd, from_st->st_mode))
 #elif _GLIBCXX_USE_FCHMODAT
@@ -429,6 +424,38 @@ namespace
 	ec.assign(errno, std::generic_category());
 	return false;
   }
+
+#ifdef _GLIBCXX_USE_SENDFILE
+const auto n = ::sendfile(out.fd, in.fd, nullptr, from_st->st_size);
+if (n != from_st->st_size)
+  {
+	ec.assign(errno, std::generic_category());
+	return false;
+  }
+if (!out.close() || !in.close())
+  {
+	ec.assign(errno, std::generic_category());
+	return false;
+  }
+#else
+__gnu_cxx::stdio_filebuf<char> sbin(in.fd, std::ios::in);
+__gnu_cxx::stdio_filebuf<char> sbout(out.fd, std::ios::out);
+if (sbin.is_open())
+  in.fd = -1;
+if (sbout.is_open())
+  out.fd = -1;
+if (from_st->st_size && !(std::ostream(&sbout) << &sbin))
+  {
+	ec = std::make_error_code(std::errc::io_error);
+	return false;
+  }
+if (!sbout.close() || !sbin.close())
+  {
+	ec.assign(errno, std::generic_category());
+	return false;
+  }
+#endif
+
 ec.clear();
 return true;
   }
@@ -439,13 +466,15 @@ void
 fs::copy(const path& from, const path& to, copy_options options,
 	 error_code& ec) noexcept
 {
-  bool skip_symlinks = is_set(options, copy_options::skip_symlinks);
-  bool create_symlinks = is_set(options, copy_options::create_symlinks);
-  bool use_lstat = create_symlinks || skip_symlinks;
+  const bool skip_symlinks = is_set(options, copy_options::skip_symlinks);
+  const bool create_symlinks = is_set(options, copy_options::create_symlinks);
+  const bool copy_symlinks = is_set(options, copy_options::copy_symlinks);
+  const bool use_lstat = create_symlinks || skip_symlinks;
 
   file_status f, t;
   stat_type from_st, to_st;
-  if (use_lstat
+  // N4099 doesn't check copy_symlinks here, but I think that's a defect.
+  if (use_lstat || copy_symlinks
    ? ::lstat(from.c_str(), &from_st)
    : ::stat(from.c_str(), &from_st))
 {
@@ -488,7 +517,7 @@ fs::copy(const path& from, const path& to, copy_options options,
 {
   if (skip_symlinks)
 	ec.clear();
-  else if (!exists(t) && is_set(options, copy_options::copy_symlinks))
+  else if (!exists(t) && copy_symlinks)
 	copy_symlink(from, to, ec);
   else
 	// Not clear what should be done here.
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/operations/copy.cc b/libstdc++-v3/testsuite/experimental/filesystem/operations/copy.cc
index 9e89002..a5f6a3e 100644
--- 

[patch] Add noexcept to Filesystem TS operators

2016-04-19 Thread Jonathan Wakely

This isn't terribly important, but these operators might as well be
noexcept, and committing it now isn't going to cause any problems
backporting anything to gcc-6-branch at the last minute.

I have a couple more patches for Filesystem TS issues coming too.

Tested x86_64-linux, committed to trunk.

commit ac48e4cb8e5ff17a03dd478077568a8fd023abde
Author: Jonathan Wakely 
Date:   Tue Apr 19 18:31:43 2016 +0100

Add noexcept to Filesystem TS operators

	* include/experimental/bits/fs_fwd.h (operator&, operator|, operator^,
	operator~, operator&=, operator|=, operator^=): Add noexcept to
	overloaded operators for copy_options, perms and directory_options.
	* src/filesystem/ops.cc (make_file_type, make_file_status,
	is_not_found_errno, file_time): Add noexcept.

diff --git a/libstdc++-v3/include/experimental/bits/fs_fwd.h b/libstdc++-v3/include/experimental/bits/fs_fwd.h
index 1482e18..57aa4d3 100644
--- a/libstdc++-v3/include/experimental/bits/fs_fwd.h
+++ b/libstdc++-v3/include/experimental/bits/fs_fwd.h
@@ -93,7 +93,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   };
 
   constexpr copy_options
-  operator&(copy_options __x, copy_options __y)
+  operator&(copy_options __x, copy_options __y) noexcept
   {
 using __utype = typename std::underlying_type<copy_options>::type;
 return static_cast<copy_options>(
@@ -101,7 +101,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   }
 
   constexpr copy_options
-  operator|(copy_options __x, copy_options __y)
+  operator|(copy_options __x, copy_options __y) noexcept
   {
 using __utype = typename std::underlying_type<copy_options>::type;
 return static_cast<copy_options>(
@@ -109,7 +109,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   }
 
   constexpr copy_options
-  operator^(copy_options __x, copy_options __y)
+  operator^(copy_options __x, copy_options __y) noexcept
   {
 using __utype = typename std::underlying_type<copy_options>::type;
 return static_cast<copy_options>(
@@ -117,22 +117,22 @@ _GLIBCXX_END_NAMESPACE_CXX11
   }
 
   constexpr copy_options
-  operator~(copy_options __x)
+  operator~(copy_options __x) noexcept
   {
 using __utype = typename std::underlying_type<copy_options>::type;
 return static_cast<copy_options>(~static_cast<__utype>(__x));
   }
 
   inline copy_options&
-  operator&=(copy_options& __x, copy_options __y)
+  operator&=(copy_options& __x, copy_options __y) noexcept
   { return __x = __x & __y; }
 
   inline copy_options&
-  operator|=(copy_options& __x, copy_options __y)
+  operator|=(copy_options& __x, copy_options __y) noexcept
   { return __x = __x | __y; }
 
   inline copy_options&
-  operator^=(copy_options& __x, copy_options __y)
+  operator^=(copy_options& __x, copy_options __y) noexcept
   { return __x = __x ^ __y; }
 
 
@@ -163,7 +163,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   };
 
   constexpr perms
-  operator&(perms __x, perms __y)
+  operator&(perms __x, perms __y) noexcept
   {
    using __utype = typename std::underlying_type<perms>::type;
    return static_cast<perms>(
@@ -171,7 +171,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   }
 
   constexpr perms
-  operator|(perms __x, perms __y)
+  operator|(perms __x, perms __y) noexcept
   {
    using __utype = typename std::underlying_type<perms>::type;
    return static_cast<perms>(
@@ -179,7 +179,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   }
 
   constexpr perms
-  operator^(perms __x, perms __y)
+  operator^(perms __x, perms __y) noexcept
   {
    using __utype = typename std::underlying_type<perms>::type;
    return static_cast<perms>(
@@ -187,22 +187,22 @@ _GLIBCXX_END_NAMESPACE_CXX11
   }
 
   constexpr perms
-  operator~(perms __x)
+  operator~(perms __x) noexcept
   {
    using __utype = typename std::underlying_type<perms>::type;
    return static_cast<perms>(~static_cast<__utype>(__x));
   }
 
   inline perms&
-  operator&=(perms& __x, perms __y)
+  operator&=(perms& __x, perms __y) noexcept
   { return __x = __x & __y; }
 
   inline perms&
-  operator|=(perms& __x, perms __y)
+  operator|=(perms& __x, perms __y) noexcept
   { return __x = __x | __y; }
 
   inline perms&
-  operator^=(perms& __x, perms __y)
+  operator^=(perms& __x, perms __y) noexcept
   { return __x = __x ^ __y; }
 
   // Bitmask type
@@ -211,7 +211,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   };
 
   constexpr directory_options
-  operator&(directory_options __x, directory_options __y)
+  operator&(directory_options __x, directory_options __y) noexcept
   {
    using __utype = typename std::underlying_type<directory_options>::type;
    return static_cast<directory_options>(
@@ -219,7 +219,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   }
 
   constexpr directory_options
-  operator|(directory_options __x, directory_options __y)
+  operator|(directory_options __x, directory_options __y) noexcept
   {
    using __utype = typename std::underlying_type<directory_options>::type;
    return static_cast<directory_options>(
@@ -227,7 +227,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   }
 
   constexpr directory_options
-  operator^(directory_options __x, directory_options __y)
+  operator^(directory_options __x, directory_options __y) noexcept
   {
    using __utype = typename std::underlying_type<directory_options>::type;
    return static_cast<directory_options>(
@@ 

[PATCH] Improve detection of constant conditions during jump threading

2016-04-19 Thread Patrick Palka
This patch makes the jump threader look through the BIT_AND_EXPRs and
BIT_IOR_EXPRs within a condition so that we can find dominating
ASSERT_EXPRs that help make the overall condition evaluate to a
constant.  For example, we currently don't perform any jump threading in
the following test case even though it's known that if the code calls
foo() then it can't possibly call bar() afterwards:

void
baz_1 (int a, int b, int c)
{
  if (a && b)
foo ();
  if (!b && c)
bar ();
}

   <bb 2>:
   _4 = a_3(D) != 0;
   _6 = b_5(D) != 0;
   _7 = _4 & _6;
   if (_7 != 0)
     goto <bb 3>;
   else
     goto <bb 4>;

   <bb 3>:
   b_15 = ASSERT_EXPR <b_5(D), b_5(D) != 0>;
   foo ();

   <bb 4>:
   _10 = b_5(D) == 0;
   _12 = c_11(D) != 0;
   _13 = _10 & _12;
   if (_13 != 0)
     goto <bb 5>;
   else
     goto <bb 6>;

   <bb 5>:
   bar ();

   <bb 6>:
   return;

So here we miss a jump threading opportunity that would have made bb 3 jump
straight to bb 6 instead of falling through to bb 4.

If we inspect the operands of the BIT_AND_EXPR of _13 we'll notice that
there is an ASSERT_EXPR that says its left operand b_5 is non-zero.  We
could use this ASSERT_EXPR to deduce that the condition (_13 != 0) is
always false.  This is what this patch does, basically by making
simplify_control_stmt_condition recurse into BIT_AND_EXPRs and
BIT_IOR_EXPRs.
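
For concreteness, here is a hand-written equivalent of baz_1 after the
threading this patch enables (my sketch, not compiler output):

void foo (void);
void bar (void);

void
baz_1_threaded (int a, int b, int c)
{
  if (a && b)
    {
      foo ();
      /* The dominating ASSERT_EXPR guarantees b != 0 on this path, so
         the (!b && c) test is statically false and can be skipped.  */
      return;
    }
  if (!b && c)
    bar ();
}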

Does this seem like a good idea/approach?

Notes:

1. This patch introduces a "regression" in gcc.dg/tree-ssa/ssa-thread-11.c
in that we no longer perform FSM threading during vrp2 but instead we
detect two new jump threading opportunities during vrp1.  Not sure if
the new code is better but it is shorter.  I wonder how this should be
resolved...

2. I haven't tested the performance impact of this patch.  What would be
a good way to do this?

3. According to my instrumentation, an older version of this change
added 4000 new threaded jumps during bootstrap.

gcc/ChangeLog:

* tree-ssa-threadedge.c (simplify_control_stmt_condition): Split
out into ...
(simplify_control_stmt_condition_1): ... here.  Recurse into
BIT_AND_EXPRs and BIT_IOR_EXPRs.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-thread-14.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c |  81 +
 gcc/tree-ssa-threadedge.c | 249 +-
 2 files changed, 285 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
new file mode 100644
index 000..db9ed3b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-thread-14.c
@@ -0,0 +1,81 @@
+/* { dg-do compile }  */
+/* { dg-additional-options "-O2 -fdump-tree-vrp" }  */
+/* { dg-final { scan-tree-dump-times "Threaded jump" 8 "vrp1" } }  */
+
+void foo (void);
+void bar (void);
+void blah (void);
+
+/* One jump threaded here.  */
+
+void
+baz_1 (int a, int b, int c)
+{
+  if (a && b)
+foo ();
+  if (!b && c)
+bar ();
+}
+
+/* One jump threaded here.  */
+
+void
+baz_2 (int a, int b, int c)
+{
+  if (a && b)
+foo ();
+  if (b || c)
+bar ();
+}
+
+/* One jump threaded here.  */
+
+void
+baz_3 (int a, int b, int c)
+{
+  if (a && b > 10)
+foo ();
+  if (b < 5 && c)
+bar ();
+}
+
+/* Two jumps threaded here.  */
+
+void
+baz_4 (int a, int b, int c)
+{
+  if (a && b)
+{
+  foo ();
+  if (c)
+bar ();
+}
+  if (b && c)
+blah ();
+}
+
+/* Two jumps threaded here.  */
+
+void
+baz_5 (int a, int b, int c)
+{
+  if (a && b)
+{
+  foo ();
+  if (c)
+bar ();
+}
+  if (!b || !c)
+blah ();
+}
+
+/* One jump threaded here.  */
+
+void
+baz_6 (int a, int b, int c)
+{
+  if (a == 39 && b == 41)
+foo ();
+  if (c == 12 || b == 41)
+bar ();
+}
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index f60be38..a4e8a26 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -376,6 +376,12 @@ record_temporary_equivalences_from_stmts_at_dest (edge e,
   return stmt;
 }
 
+static tree simplify_control_stmt_condition_1 (edge, gimple *,
+  class avail_exprs_stack *,
+  tree, enum tree_code, tree,
+  gcond *, pfn_simplify, bool,
+  unsigned);
+
 /* Simplify the control statement at the end of the block E->dest.
 
To avoid allocating memory unnecessarily, a scratch GIMPLE_COND
@@ -436,52 +442,14 @@ simplify_control_stmt_condition (edge e,
}
}
 
-  if (handle_dominating_asserts)
-   {
- /* Now see if the operand was consumed by an ASSERT_EXPR
-which dominates E->src.  If so, we want to replace the
-operand with the LHS of the ASSERT_EXPR.  */
- if (TREE_CODE (op0) == SSA_NAME)
-   op0 = lhs_of_dominating_assert (op0, e->src, 

C++ PATCH for some tiny cleanups

2016-04-19 Thread Jason Merrill
A few tiny bits that I noticed while working on other bugs during the 
GCC 6 process and decided to delay until stage 1.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit f2d4d53e2e76715a0a4e6379e75dd9eebff93c54
Author: Jason Merrill 
Date:   Tue Dec 8 15:18:22 2015 -0500

	Tiny C++ cleanups.

	* pt.c (tsubst_expr): Remove shadowing declaration.
	(tsubst_pack_expansion): Add assert.
	* semantics.c (add_decl_expr): Use DECL_SOURCE_LOCATION.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index a6d56d1..f9a9d99 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -10961,6 +10961,7 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
 	  /* We can't substitute for this parameter pack.  We use a flag as
 	 well as the missing_level counter because function parameter
 	 packs don't have a level.  */
+	  gcc_assert (processing_template_decl);
 	  unsubstituted_packs = true;
 	}
 }
@@ -15135,7 +15136,6 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 	  {
 	tree scope = USING_DECL_SCOPE (decl);
 	tree name = DECL_NAME (decl);
-	tree decl;
 
 	scope = tsubst (scope, args, complain, in_decl);
 	decl = lookup_qualified_name (scope, name,
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 0487adf..56864b4 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -428,7 +428,7 @@ maybe_cleanup_point_expr_void (tree expr)
 void
 add_decl_expr (tree decl)
 {
-  tree r = build_stmt (input_location, DECL_EXPR, decl);
+  tree r = build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, decl);
   if (DECL_INITIAL (decl)
      || (DECL_SIZE (decl) && TREE_SIDE_EFFECTS (DECL_SIZE (decl))))
 r = maybe_cleanup_point_expr_void (r);


C++ PATCH for core issue 2137

2016-04-19 Thread Jason Merrill
Issue 2137 corrects the previous adjustment of list-initialization to 
allow copying of aggregates so that it also gives such an initialization 
the same implicit conversion sequence rank as the same copy constructor 
called with () syntax.
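
A hedged illustration (mine, distinct from the committed tests): under the
DR, list-initializing from a single element of the same class type ranks
exactly like the parenthesized copy:

struct S {
  S() {}
  S(const S&) {}
};

int main()
{
  S s;
  S a(s);    // copy constructor, Exact Match rank
  S b{s};    // DR 2137: also the copy constructor, same Exact Match rank
}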


Tested x86_64-pc-linux-gnu, applying to trunk.
commit dd9a42cf5c84e8bb2f14d8d70544261e6c75794a
Author: Jason Merrill 
Date:   Mon Feb 29 21:42:31 2016 -0500

	DR 2137

	* call.c (implicit_conversion): If we choose a copy constructor
	for list-initialization from the same type, the conversion is an
	exact match.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index ef195f8..636f8f6 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -1862,7 +1862,24 @@ implicit_conversion (tree to, tree from, tree expr, bool c_cast_p,
 
   cand = build_user_type_conversion_1 (to, expr, flags, complain);
   if (cand)
-	conv = cand->second_conv;
+	{
+	  if (BRACE_ENCLOSED_INITIALIZER_P (expr)
+	  && CONSTRUCTOR_NELTS (expr) == 1
+	  && !is_list_ctor (cand->fn))
+	{
+	  /* "If C is not an initializer-list constructor and the
+		 initializer list has a single element of type cv U, where U is
+		 X or a class derived from X, the implicit conversion sequence
+		 has Exact Match rank if U is X, or Conversion rank if U is
+		 derived from X."  */
+	  tree elt = CONSTRUCTOR_ELT (expr, 0)->value;
+	  tree elttype = TREE_TYPE (elt);
+	  if (reference_related_p (to, elttype))
+		return implicit_conversion (to, elttype, elt,
+	c_cast_p, flags, complain);
+	}
+	  conv = cand->second_conv;
+	}
 
   /* We used to try to bind a reference to a temporary here, but that
 	 is now handled after the recursive call to this function at the end
diff --git a/gcc/testsuite/g++.dg/DRs/dr2137-1.C b/gcc/testsuite/g++.dg/DRs/dr2137-1.C
new file mode 100644
index 000..ad6b532
--- /dev/null
+++ b/gcc/testsuite/g++.dg/DRs/dr2137-1.C
@@ -0,0 +1,20 @@
+// DR 2137
+// { dg-do run { target c++11 } }
+
+// Test that an initializer_list constructor beats the copy constructor.
+
+#include <initializer_list>
+
+bool ok = false;
+
+struct Q {
+  Q() = default;
+  Q(Q const&) = default;
+  Q(Q&&) = default;
+  Q(std::initializer_list<Q>) { ok = true; }
+};
+
+int main() {
+  Q x = Q { Q() };
+  if (!ok) __builtin_abort ();
+}
diff --git a/gcc/testsuite/g++.dg/DRs/dr2137-2.C b/gcc/testsuite/g++.dg/DRs/dr2137-2.C
new file mode 100644
index 000..ba90860
--- /dev/null
+++ b/gcc/testsuite/g++.dg/DRs/dr2137-2.C
@@ -0,0 +1,21 @@
+// DR 2137
+// { dg-do link { target c++11 } }
+
+// Test that copying Q is better than converting to R.
+
+struct Q {
+  Q() { }
+  Q(const Q&) { }
+};
+
+struct R {
+  R(const Q&);
+};
+
+void f(Q) { }
+void f(R);
+
+int main()
+{
+  f({Q()});
+}


[PATCH GCC]Support BIT_AND_EXPR in scalar evolution

2016-04-19 Thread Bin Cheng
Hi,
Type conversion from an integer to a smaller unsigned type can be transformed
into BIT_AND_EXPR during compilation.  For example,
  int i;
  for (i = 0; i < n; i++)
    {
      unsigned char uc = (unsigned char) i;  // transformed into X = i & 255,
                                             // with both X and i of int type.
      b[uc] = 0;
    }
X here could be a valid SCEV if we can prove that i never exceeds 255, i.e.
if 'i' is a SCEV and its value stays within the representable range of type
"unsigned char".  This information can be available with
-faggressive-loop-optimizations.
This patch adds support for BIT_AND_EXPR in scalar evolution to handle such 
cases, as well as two new tests.
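
As a self-contained illustration (my sketch; in the new tests the bound is
instead proved via -faggressive-loop-optimizations from an array access):

void
illustrate (unsigned char *b)
{
  int i;
  for (i = 0; i < 200; i++)  /* i never exceeds 255 ...  */
    {
      int x = i & 255;       /* ... so x == i on every iteration,  */
      b[x] = 0;              /* and x is a no-overflow SCEV.  */
    }
}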

Bootstrap and test on x86_64 & AArch64.  Is it OK?

Thanks,
bin

2016-03-24  Bin Cheng  

* tree-scalar-evolution.c (interpret_rhs_expr): Handle BIT_AND_EXPR.

gcc/testsuite/ChangeLog
2016-03-24  Bin Cheng  

* gcc.dg/tree-ssa/scev-11.c: New test.
* gcc.dg/tree-ssa/scev-12.c: New test.

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index fdd5da0..b25af28 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1932,6 +1932,37 @@ interpret_rhs_expr (struct loop *loop, gimple *at_stmt,
   res = chrec_convert (type, chrec1, at_stmt);
   break;
 
+case BIT_AND_EXPR:
+  /* Given int variable A, handle A & 0xffff as (int)(unsigned short)A.
+If A is SCEV and its value is in the range of representable set
+of type unsigned short, the result expression is a (no-overflow)
+SCEV.  */
+  res = chrec_dont_know;
+  if (cst_and_fits_in_hwi (rhs2))
+   {
+ int precision;
+ unsigned HOST_WIDE_INT val;
+
+ val = (unsigned HOST_WIDE_INT) int_cst_value (rhs2);
+ val ++;
+ /* Skip if value of rhs2 wraps in unsigned HOST_WIDE_INT or
+it's not the maximum value of a smaller type than rhs1.  */
+ if (val != 0
+ && (precision = exact_log2 (val)) > 0
+ && (unsigned) precision < TYPE_PRECISION (TREE_TYPE (rhs1)))
+   {
+ tree utype = build_nonstandard_integer_type (precision, 1);
+
+ if (TYPE_PRECISION (utype) < TYPE_PRECISION (TREE_TYPE (rhs1)))
+   {
+ chrec1 = analyze_scalar_evolution (loop, rhs1);
+ chrec1 = chrec_convert (utype, chrec1, at_stmt);
+ res = chrec_convert (TREE_TYPE (rhs1), chrec1, at_stmt);
+   }
+   }
+   }
+  break;
+
 default:
   res = chrec_dont_know;
   break;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-11.c 
b/gcc/testsuite/gcc.dg/tree-ssa/scev-11.c
new file mode 100644
index 000..a7181b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-11.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int a[128];
+extern int b[];
+
+int bar (int *);
+
+int
+foo (int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+{
+  unsigned char uc = (unsigned char)i;
+  a[i] = i;
+  b[uc] = 0;
+}
+
+  bar (a);
+  return 0;
+}
+
+/* Address of array reference to b is scev.  */
+/* { dg-final { scan-tree-dump-times "use \[0-9\]\n  address" 2 "ivopts" } } */
+
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/scev-12.c 
b/gcc/testsuite/gcc.dg/tree-ssa/scev-12.c
new file mode 100644
index 000..6915ba8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/scev-12.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
+
+int a[128];
+extern int b[];
+
+int bar (int *);
+
+int
+foo (int x, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+{
+  unsigned char uc = (unsigned char)i;
+  if (x)
+   a[i] = i;
+  b[uc] = 0;
+}
+
+  bar (a);
+  return 0;
+}
+
+/* Address of array reference to b is not scev.  */
+/* { dg-final { scan-tree-dump-times "use \[0-9\]\n  address" 1 "ivopts" } } */
+
+
+


Re: [PATCH] PR libitm/70456: Allocate aligned memory in gtm_thread operator new

2016-04-19 Thread Torvald Riegel
On Sat, 2016-04-02 at 09:25 -0700, H.J. Lu wrote:
> On Wed, Mar 30, 2016 at 5:34 AM, H.J. Lu  wrote:
> > Since GTM::gtm_thread has
> >
> > gtm_thread *next_thread __attribute__((__aligned__(HW_CACHELINE_SIZE)));
> >
> > GTM::gtm_thread::operator new should allocate aligned memory.
> >
> > Tested on Linux/x86-64.  OK for trunk.
> >
> >
> 
> This patch is better.  Tested on Linux/x86-64.  OK for trunk?

OK.



Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-04-19 Thread Wilco Dijkstra
Richard Biener wrote:
>
> This folding should be added to gimple-fold.c:gimple_fold_builtin instead,
> the builtins.c foldings are purely for folding to constants nowadays.

So is this better? It's a lot more verbose for something so simple...
Unfortunately match.pd doesn't support this kind of thing either.
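
For reference, the identity being exploited (a minimal sketch, not part of
the patch):

#include <string.h>
#include <assert.h>

int
main (void)
{
  const char *s = "hello";
  /* strchr with a zero argument always finds the terminating NUL,
     so the result is exactly s + strlen (s).  */
  assert (strchr (s, 0) == s + strlen (s));
  return 0;
}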

Wilco


ChangeLog:
2016-04-19  Wilco Dijkstra  

gcc/
* gcc/gimple-fold.c (gimple_fold_builtin_strchr):
New function to optimize strchr (s, 0) to strlen.
(gimple_fold_builtin): Add BUILT_IN_STRCHR case.

testsuite/
* gcc/testsuite/gcc.dg/strlenopt-20.c: Update test.
* gcc/testsuite/gcc.dg/strlenopt-21.c: Likewise.
* gcc/testsuite/gcc.dg/strlenopt-22.c: Likewise.
* gcc/testsuite/gcc.dg/strlenopt-26.c: Likewise.
* gcc/testsuite/gcc.dg/strlenopt-5.c: Likewise.
* gcc/testsuite/gcc.dg/strlenopt-7.c: Likewise.
* gcc/testsuite/gcc.dg/strlenopt-9.c: Likewise.

--

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 
eb130d048469f0b8196e565fed9a40de74b098bd..11dcf69fc919f066362f4f713db392d14b39764e
 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1380,6 +1380,59 @@ gimple_fold_builtin_strncpy (gimple_stmt_iterator *gsi,
   return true;
 }
 
+/* Simplify strchr (str, 0) into str + strlen (str).
+   In general strlen is significantly faster than strchr
+   due to being a simpler operation.  */
+static bool
+gimple_fold_builtin_strchr (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree str = gimple_call_arg (stmt, 0);
+  tree c = gimple_call_arg (stmt, 1);
+  location_t loc = gimple_location (stmt);
+
+  if (optimize_function_for_size_p (cfun))
+return false;
+
+  if (!integer_zerop (c) || !gimple_call_lhs (stmt))
+return false;
+
+  tree newstr;
+  tree strlen_fn = builtin_decl_implicit (BUILT_IN_STRLEN);
+
+  if (!strlen_fn)
+return false;
+
+  /* Create newstr = strlen (str).  */
+  gimple_seq stmts = NULL, stmts2;
+  gimple *repl = gimple_build_call (strlen_fn, 1, str);
+  gimple_set_location (repl, loc);
+  if (gimple_in_ssa_p (cfun))
+newstr = make_ssa_name (size_type_node);
+  else
+newstr = create_tmp_reg (size_type_node);
+  gimple_call_set_lhs (repl, newstr);
+  gimple_seq_add_stmt_without_update (&stmts, repl);
+
+  /* Create (str p+ strlen (str)).  */
+  newstr = fold_build_pointer_plus_loc (loc, str, newstr);
+  newstr = force_gimple_operand (newstr, &stmts2, true, NULL_TREE);
+  gimple_seq_add_seq_without_update (&stmts, stmts2);
+
+  repl = gimple_build_assign (gimple_call_lhs (stmt), newstr);
+  gimple_seq_add_stmt_without_update (&stmts, repl);
+  gsi_replace_with_seq_vops (gsi, stmts);
+  /* gsi now points at the assignment to the lhs, get a
+ stmt iterator to the strlen.
+ ???  We can't use gsi_for_stmt as that doesn't work when the
+ CFG isn't built yet.  */
+  gimple_stmt_iterator gsi2 = *gsi;
+  gsi_prev (&gsi2);
+  gsi_prev (&gsi2);
+  fold_stmt (&gsi2);
+  return true;
+}
+
 /* Simplify a call to the strcat builtin.  DST and SRC are the arguments
to the call.
 
@@ -2821,6 +2874,8 @@ gimple_fold_builtin (gimple_stmt_iterator *gsi)
 gimple_call_arg (stmt, 1));
 case BUILT_IN_STRNCAT:
   return gimple_fold_builtin_strncat (gsi);
+case BUILT_IN_STRCHR:
+  return gimple_fold_builtin_strchr (gsi);
 case BUILT_IN_FPUTS:
   return gimple_fold_builtin_fputs (gsi, gimple_call_arg (stmt, 0),
gimple_call_arg (stmt, 1), false);
diff --git a/gcc/testsuite/gcc.dg/strlenopt-20.c 
b/gcc/testsuite/gcc.dg/strlenopt-20.c
index 
a83e845c26d88e5acdcabf142f7b319136663488..7b483eaeac1aa47278111a92148a16f00b2aaa2d
 100644
--- a/gcc/testsuite/gcc.dg/strlenopt-20.c
+++ b/gcc/testsuite/gcc.dg/strlenopt-20.c
@@ -86,9 +86,9 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "strlen \\(" 1 "strlen" } } */
+/* { dg-final { scan-tree-dump-times "strlen \\(" 2 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "memcpy \\(" 4 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "strcpy \\(" 0 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "strcat \\(" 0 "strlen" } } */
-/* { dg-final { scan-tree-dump-times "strchr \\(" 1 "strlen" } } */
+/* { dg-final { scan-tree-dump-times "strchr \\(" 0 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "stpcpy \\(" 0 "strlen" } } */
diff --git a/gcc/testsuite/gcc.dg/strlenopt-21.c 
b/gcc/testsuite/gcc.dg/strlenopt-21.c
index 
e22fa9fca9ba14354db2cd5f602283b64bd8dcac..05b85a49dde0a7f5d269174fd4269e40be910dbd
 100644
--- a/gcc/testsuite/gcc.dg/strlenopt-21.c
+++ b/gcc/testsuite/gcc.dg/strlenopt-21.c
@@ -57,9 +57,9 @@ main ()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "strlen \\(" 1 "strlen" } } */
+/* { dg-final { scan-tree-dump-times "strlen \\(" 2 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "memcpy \\(" 3 "strlen" } } */
 /* { dg-final { scan-tree-dump-times "strcpy \\(" 0 "strlen" } } */
 /* { 

Re: [PATCH] PR70674: S/390: Add memory barrier to stack pointer restore from fpr.

2016-04-19 Thread Andreas Krebbel
On 04/19/2016 12:54 PM, Jakub Jelinek wrote:
> Can you please:
>   rtx fpr = gen_rtx_REG (DImode, cfun_gpr_save_slot (i));
>   if (i == STACK_POINTER_REGNUM)
> insn = emit_insn (gen_stack_restore_from_fpr (fpr));
>   else
> insn = emit_move_insn (gen_rtx_REG (DImode, i), fpr);
> That way IMHO it is more nicely formatted, you avoid the ugly (
> at the end of line, it uses fewer lines anyway and additionally
> you can make it clear what the gen_rtx_REG (DImode, cfun_gpr_save_slot (i))
> means by giving it a name.  Of course, choose whatever other var
> name you prefer to describe what it is.

Right, that's better. I'll change the patch and commit it tomorrow. Thanks!

-Andreas-



[PATCH, i386] Relax target requirement for vec_unpacks_lo_hi

2016-04-19 Thread Ilya Enkovich
Hi,

vec_unpacks_lo_[si,hi,di] patterns for scalar masks don't need to extend
mask elements.  This means a simple register copy is good enough.

Currently the vec_unpacks_lo_hi pattern uses the kmovb instruction, which
requires the AVX512DQ target.  But converting 16-bit masks to/from 8-bit
masks is typical for AVX512F code with a mix of integer (or float, or
logical (kind=4) for Fortran) and double computations.  This patch
implements vec_unpacks_lo_hi as kmovw instead, making mask conversion
available for the AVX512F target.
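
A minimal example of that mix (my sketch): with 512-bit vectors the int
compare below yields a 16-element mask while the double stores need an
8-element one, so the vectorizer must convert between the two:

void
f (double *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    if (b[i] > 0)
      a[i] = 0.0;
}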

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Does it look OK
for trunk?

Thanks,
Ilya
--
gcc/

2016-04-19  Ilya Enkovich  

* config/i386/sse.md (vec_unpacks_lo_hi): Always
use kmovw to support AVX512F target.


diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 4d2927e..c213ee1 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -13661,9 +13661,9 @@
   "ix86_expand_sse_unpack (operands[0], operands[1], true, false); DONE;")
 
 (define_expand "vec_unpacks_lo_hi"
-  [(set (match_operand:QI 0 "register_operand")
-(subreg:QI (match_operand:HI 1 "register_operand") 0))]
-  "TARGET_AVX512DQ")
+  [(set (subreg:HI (match_operand:QI 0 "register_operand") 0)
+(match_operand:HI 1 "register_operand"))]
+  "TARGET_AVX512F")
 
 (define_expand "vec_unpacks_lo_si"
   [(set (match_operand:HI 0 "register_operand")


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread Torsten Duwe
On Mon, Apr 18, 2016 at 02:12:09PM +0200, Michael Matz wrote:
> 
> .  It can also be solved by having just one NOP after the function label, 
> and a number of them before, then no thread can be in the nop pad.  That 
> seems to indicate that GCC should not try to be too clever and simply 
> leave the specified number of nops before and after the function label, 
> leaving safety measures to the patching infrastructure.

Yes, please. Consistency is required to maintain a sane stream of instructions
for all CPUs involved; this gets particularly nasty on x86, whose byte-granular
code can in theory cross a cache line boundary at any byte. ARM does not have
this problem. Even if it did, it would be a kernel problem, not the compiler's.

All that kernel live patching needs is some space to place the patch calls.
I currently see the need for 2 NOPs at the beginning of each function; making
that configurable would be a plus; leaving additional configurable space
before the entry point would be even more flexible.

Fentry, like ppc64 profile-kernel, already does more than necessary by
generating a call.  On ppc64 currently, that branch target isn't even actively
used, and the first thing the kernel does is patch these calls with NOPs.

So why not start with the NOPs right away? An architecture-independent,
variable number might stimulate use cases on other OSes and architectures
as well.

Torsten



Re: [PATCH] Remove ix86_legitimate_combined_insn

2016-04-19 Thread Uros Bizjak
On Tue, Apr 19, 2016 at 5:36 PM, H.J. Lu  wrote:
> On Tue, Apr 19, 2016 at 8:27 AM, Uros Bizjak  wrote:
>> On Tue, Apr 19, 2016 at 5:18 PM, H.J. Lu  wrote:
>>> On Tue, Apr 19, 2016 at 8:08 AM, Uros Bizjak  wrote:
 On Tue, Apr 19, 2016 at 4:49 PM, H.J. Lu  wrote:
>
> From INSTRUCTION EXCEPTION SPECIFICATION section in Intel software
> developer manual volume 2, only legacy SSE instructions with memory
> operand not 16-byte aligned get General Protection fault.  There is
> no need to check 1, 2, 4, 8 byte alignments.  Since x86 backend has
> accurate constraints and predicates for 16-byte alignment, we can
> remove ix86_legitimate_combined_insn.
>
> Tested on x86-64.  OK for trunk?

 No. This function also handles cases where invalid hard register gets
 propagated into the insn during the combine pass, leading to spill
 failure later.

>>>
>>> ix86_legitimate_combined_insn was added to work around the
>>> reload issue:
>>
>> Sorry, I'm not convinced. Please see [1].
>>
>> You should remove only this part, together with now unused ssememalign
>> attribute.
>>
>> - /* For pre-AVX disallow unaligned loads/stores where the
>> -instructions don't support it.  */
>> - if (!TARGET_AVX
>> - && VECTOR_MODE_P (mode)
>> - && misaligned_operand (op, mode))
>> -   {
>> - unsigned int min_align = get_attr_ssememalign (insn);
>> - if (min_align == 0
>> - || MEM_ALIGN (op) < min_align)
>> -   return false;
>> -   }
>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46843
>>
>>> LRA doesn't have those limitation.  Removing
>>> ix86_legitimate_combined_insn causes no regressions.
>>
>> [1] https://gcc.gnu.org/ml/gcc-patches/2012-08/msg01195.html
>>
>> Uros.
>
> Here is the updated patch.  OK for trunk if there is no regression
> on x86-64?

OK...

BTW: I really hope that "INSTRUCTION EXCEPTION SPECIFICATION section"
quoted above is correct - we will quickly find out.

Thanks,
Uros.


Re: [PATCH] Don't build 32-bit libatomic with -march=i486 on x86-64

2016-04-19 Thread Uros Bizjak
On Tue, Apr 19, 2016 at 5:07 PM, H.J. Lu <hongjiu...@intel.com> wrote:
> Gcc uses the same -march= for both -m32 and -m64 on x86-64 unless
> --with-arch-32= is used.  There is no need for -march=i486 to compile
> 32-bit libatomic on x86-64.
>
> Tested on x86-64.  OK for trunk?
>
> H.J.
> ---
> PR target/70454
> * configure.tgt (XCFLAGS): Don't add -march=i486 to compile
> 32-bit x86 target library on x86-64.
> ---
>  libatomic/configure.tgt | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
> index c5470d7..bbb93fc 100644
> --- a/libatomic/configure.tgt
> +++ b/libatomic/configure.tgt
> @@ -81,14 +81,8 @@ case "${target_cpu}" in
> try_ifunc=yes
> ;;
>x86_64)
> -   case " ${CC} ${CFLAGS} " in
> - *" -m32 "*)
> -   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
> -   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
> -   ;;
> - *)
> -   ;;
> -   esac
> +   # Since 64-bit arch > i486, we can use the same -march= to build
> +   # both 32-bit and 64-bit target libraries.
> ARCH=x86
> # ??? Detect when -mcx16 is already enabled.
> try_ifunc=yes
> --
> 2.5.5
>

No, this is wrong. My build with default options defaults to i386. So,
the difference between

$ /ssd/uros/gcc-build/gcc/cc1 -E -dM -m32 hello.c > aaa

and

$ /ssd/uros/gcc-build/gcc/cc1 -E -dM -m32 -march=i486 hello.c > bbb

is substantial:
--- aaa 2016-04-19 17:44:08.798432467 +0200
+++ bbb 2016-04-19 17:44:16.078351225 +0200
@@ -12,12 +12,15 @@
 #define __ORDER_LITTLE_ENDIAN__ 1234
 #define __SIZE_MAX__ 0xU
 #define __WCHAR_MAX__ 0x7fffL
+#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1
+#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1
+#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
 #define __DBL_DENORM_MIN__ ((double)4.94065645841246544177e-324L)
-#define __GCC_ATOMIC_CHAR_LOCK_FREE 1
+#define __GCC_ATOMIC_CHAR_LOCK_FREE 2
 #define __GCC_IEC_559 2
 #define __FLT_EVAL_METHOD__ 2
 #define __unix__ 1
-#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 1
+#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2
 #define __UINT_FAST64_MAX__ 0xULL
 #define __SIG_ATOMIC_TYPE__ int
 #define __DBL_MIN_10_EXP__ (-307)
@@ -31,7 +34,7 @@
 #define __SHRT_MAX__ 0x7fff
 #define __LDBL_MAX__ 1.18973149535723176502e+4932L
 #define __UINT_LEAST8_MAX__ 0xff
-#define __GCC_ATOMIC_BOOL_LOCK_FREE 1
+#define __GCC_ATOMIC_BOOL_LOCK_FREE 2
 #define __UINTMAX_TYPE__ long long unsigned int
 #define __linux 1
 #define __DEC32_EPSILON__ 1E-6DF
@@ -44,7 +47,7 @@
 #define __WCHAR_MIN__ (-__WCHAR_MAX__ - 1)
 #define __INT64_C(c) c ## LL
 #define __DBL_DIG__ 15
-#define __GCC_ATOMIC_POINTER_LOCK_FREE 1
+#define __GCC_ATOMIC_POINTER_LOCK_FREE 2
 #define __SIZEOF_INT__ 4
 #define __SIZEOF_POINTER__ 4
 #define __USER_LABEL_PREFIX__
@@ -80,6 +83,7 @@
 #define __DEC128_EPSILON__ 1E-33DL
 #define __ATOMIC_HLE_RELEASE 131072
 #define __PTRDIFF_MAX__ 0x7fff
+#define __tune_i486__ 1
 #define __STDC_NO_THREADS__ 1
 #define __ATOMIC_HLE_ACQUIRE 65536
 #define __LONG_LONG_MAX__ 0x7fffLL
@@ -102,7 +106,7 @@
 #define __VERSION__ "7.0.0 20160419 (experimental) [trunk revision 235206]"
 #define __UINT64_C(c) c ## ULL
 #define _STDC_PREDEF_H 1
-#define __GCC_ATOMIC_INT_LOCK_FREE 1
+#define __GCC_ATOMIC_INT_LOCK_FREE 2
 #define __FLOAT_WORD_ORDER__ __ORDER_LITTLE_ENDIAN__
 #define __STDC_IEC_559_COMPLEX__ 1
 #define __INT32_C(c) c
@@ -111,6 +115,7 @@
 #define __DEC128_MIN_EXP__ (-6142)
 #define __code_model_32__ 1
 #define __INT_FAST32_TYPE__ int
+#define __i486__ 1
 #define __UINT_LEAST16_TYPE__ short unsigned int
 #define unix 1
 #define __INT16_MAX__ 0x7fff
@@ -125,7 +130,7 @@
 #define __LDBL_EPSILON__ 1.08420217248550443401e-19L
 #define __UINTMAX_C(c) c ## ULL
 #define __SIG_ATOMIC_MAX__ 0x7fff
-#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 1
+#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
 #define __SIZEOF_PTRDIFF_T__ 4
 #define __DEC32_SUBNORMAL_MIN__ 0.01E-95DF
 #define __INT_FAST16_MAX__ 0x7fff
@@ -146,7 +151,7 @@
 #define __INT64_MAX__ 0x7fffLL
 #define __UINT_LEAST32_MAX__ 0xU
 #define __SEG_GS 1
-#define __GCC_ATOMIC_LONG_LOCK_FREE 1
+#define __GCC_ATOMIC_LONG_LOCK_FREE 2
 #define __INT_LEAST64_TYPE__ long long int
 #define __INT16_TYPE__ short int
 #define __INT_LEAST8_TYPE__ signed char
@@ -169,12 +174,13 @@
 #define __FLT_DIG__ 6
 #define __UINT_FAST64_TYPE__ long long unsigned int
 #define __INT_MAX__ 0x7fff
+#define __i486 1
 #define __INT64_TYPE__ long long int
 #define __FLT_MAX_EXP__ 128
 #define __DBL_MANT_DIG__ 53
 #define __SIZEOF_FLOAT128__ 16
 #define __INT_LEAST64_MAX__ 0x7fffLL
-#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 1
+#define __GCC_A

Re: [PATCH] Remove ix86_legitimate_combined_insn

2016-04-19 Thread H.J. Lu
On Tue, Apr 19, 2016 at 8:27 AM, Uros Bizjak  wrote:
> On Tue, Apr 19, 2016 at 5:18 PM, H.J. Lu  wrote:
>> On Tue, Apr 19, 2016 at 8:08 AM, Uros Bizjak  wrote:
>>> On Tue, Apr 19, 2016 at 4:49 PM, H.J. Lu  wrote:

 From INSTRUCTION EXCEPTION SPECIFICATION section in Intel software
 developer manual volume 2, only legacy SSE instructions with memory
 operand not 16-byte aligned get General Protection fault.  There is
 no need to check 1, 2, 4, 8 byte alignments.  Since x86 backend has
 accurate constraints and predicates for 16-byte alignment, we can
 remove ix86_legitimate_combined_insn.

 Tested on x86-64.  OK for trunk?
>>>
>>> No. This function also handles cases where invalid hard register gets
>>> propagated into the insn during the combine pass, leading to spill
>>> failure later.
>>>
>>
>> ix86_legitimate_combined_insn was added to work around the
>> reload issue:
>
> Sorry, I'm not convinced. Please see [1].
>
> You should remove only this part, together with now unused ssememalign
> attribute.
>
> - /* For pre-AVX disallow unaligned loads/stores where the
> -instructions don't support it.  */
> - if (!TARGET_AVX
> - && VECTOR_MODE_P (mode)
> - && misaligned_operand (op, mode))
> -   {
> - unsigned int min_align = get_attr_ssememalign (insn);
> - if (min_align == 0
> - || MEM_ALIGN (op) < min_align)
> -   return false;
> -   }
>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46843
>
>> LRA doesn't have those limitation.  Removing
>> ix86_legitimate_combined_insn causes no regressions.
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2012-08/msg01195.html
>
> Uros.

Here is the updated patch.  OK for trunk if there is no regression
on x86-64?

-- 
H.J.
From d558242ecf4f9acd9ae4b0e1cda28fed6a75d99d Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Sat, 2 Jan 2016 14:57:09 -0800
Subject: [PATCH] Remove ssememalign

From INSTRUCTION EXCEPTION SPECIFICATION section in Intel software
developer manual volume 2, only legacy SSE instructions with memory
operand not 16-byte aligned get General Protection fault.  There is
no need to check 1, 2, 4, 8 byte alignments.  Since x86 backend has
accurate constraints and predicates for 16-byte alignment, we can
remove alignment check in ix86_legitimate_combined_insn.

	* config/i386/i386.c (ix86_legitimate_combined_insn): Remove
	alignment check.
	* config/i386/i386.md (ssememalign): Removed.
	* config/i386/sse.md: Remove ssememalign attribute from patterns.
---
 gcc/config/i386/i386.c  | 12 
 gcc/config/i386/i386.md |  7 ---
 gcc/config/i386/sse.md  | 30 --
 3 files changed, 49 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index d6c9200..6379313 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7317,18 +7317,6 @@ ix86_legitimate_combined_insn (rtx_insn *insn)
 	  bool win;
 	  int j;
 
-	  /* For pre-AVX disallow unaligned loads/stores where the
-	 instructions don't support it.  */
-	  if (!TARGET_AVX
-	  && VECTOR_MODE_P (mode)
-	  && misaligned_operand (op, mode))
-	{
-	  unsigned int min_align = get_attr_ssememalign (insn);
-	  if (min_align == 0
-		  || MEM_ALIGN (op) < min_align)
-		return false;
-	}
-
 	  /* A unary operator may be accepted by the predicate, but it
 	 is irrelevant for matching constraints.  */
 	  if (UNARY_P (op))
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6596a1d..38eb98c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -460,13 +460,6 @@
 	   (const_string "unknown")]
 	 (const_string "integer")))
 
-;; The minimum required alignment of vector mode memory operands of the SSE
-;; (non-VEX/EVEX) instruction in bits, if it is different from
-;; GET_MODE_ALIGNMENT of the operand, otherwise 0.  If an instruction has
-;; multiple alternatives, this should be conservative maximum of those minimum
-;; required alignments.
-(define_attr "ssememalign" "" (const_int 0))
-
 ;; The (bounding maximum) length of an instruction immediate.
 (define_attr "length_immediate" ""
   (cond [(eq_attr "type" "incdec,setcc,icmov,str,lea,other,multi,idiv,leave,
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index ed0a1a6..78c28c5 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1181,7 +1181,6 @@
   "%vlddqu\t{%1, %0|%0, %1}"
   [(set_attr "type" "ssemov")
(set_attr "movu" "1")
-   (set_attr "ssememalign" "8")
(set (attr "prefix_data16")
  (if_then_else
(match_test "TARGET_AVX")
@@ -1446,7 +1445,6 @@
vrcpss\t{%1, %2, %0|%0, %2, %k1}"
   [(set_attr "isa" "noavx,avx")
(set_attr "type" "sse")
-   (set_attr "ssememalign" "32")
(set_attr 

Re: [PATCH] Remove ix86_legitimate_combined_insn

2016-04-19 Thread Uros Bizjak
On Tue, Apr 19, 2016 at 5:18 PM, H.J. Lu  wrote:
> On Tue, Apr 19, 2016 at 8:08 AM, Uros Bizjak  wrote:
>> On Tue, Apr 19, 2016 at 4:49 PM, H.J. Lu  wrote:
>>>
>>> From INSTRUCTION EXCEPTION SPECIFICATION section in Intel software
>>> developer manual volume 2, only legacy SSE instructions with memory
>>> operand not 16-byte aligned get General Protection fault.  There is
>>> no need to check 1, 2, 4, 8 byte alignments.  Since x86 backend has
>>> accurate constraints and predicates for 16-byte alignment, we can
>>> remove ix86_legitimate_combined_insn.
>>>
>>> Tested on x86-64.  OK for trunk?
>>
>> No. This function also handles cases where invalid hard register gets
>> propagated into the insn during the combine pass, leading to spill
>> failure later.
>>
>
> ix86_legitimate_combined_insn was added to work around the
> reload issue:

Sorry, I'm not convinced. Please see [1].

You should remove only this part, together with now unused ssememalign
attribute.

- /* For pre-AVX disallow unaligned loads/stores where the
-instructions don't support it.  */
- if (!TARGET_AVX
- && VECTOR_MODE_P (mode)
- && misaligned_operand (op, mode))
-   {
- unsigned int min_align = get_attr_ssememalign (insn);
- if (min_align == 0
- || MEM_ALIGN (op) < min_align)
-   return false;
-   }

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46843

> LRA doesn't have those limitation.  Removing
> ix86_legitimate_combined_insn causes no regressions.

[1] https://gcc.gnu.org/ml/gcc-patches/2012-08/msg01195.html

Uros.


Re: gomp_target_fini

2016-04-19 Thread Alexander Monakov
On Tue, 19 Apr 2016, Thomas Schwinge wrote:
> Well, I certainly had done at least some thinking before proposing this:
> we're talking about the libgomp "fatal exit" function, called when
> something has gone very wrong, and we're about to terminate the process,
> because there's no hope to recover.

By the way, this relates to something I wanted to bring up for a while now.

The OpenMP spec does not talk about error conditions arising in well-formed
programs due to resource exhaustion (OOM, in particular).  My understanding
is that an implementation always has a "way out": if e.g. it fails to allocate
memory required for a thread, it could run with reduced parallelism.
Ultimately the implementation can "fail gracefully" all the way back to
running the program sequentially.

Offloading makes that unclear due to how host fallbacks for target regions are
observable (which I don't understand, and I hope we get a chance to discuss
it), but is the above understanding generally correct?  Today libgomp is
clearly "trigger happy" to crash the process when something goes slightly
wrong, but was graceful failure ever considered as a design [non-]goal?

In that light, can a general policy of avoiding aborting the program be in
place, and should plugin authors work towards introducing fallback paths
instead of [over-]using GOMP_PLUGIN_fatal?

Thanks.
Alexander


Re: [PATCH] Remove ix86_legitimate_combined_insn

2016-04-19 Thread H.J. Lu
On Tue, Apr 19, 2016 at 8:08 AM, Uros Bizjak  wrote:
> On Tue, Apr 19, 2016 at 4:49 PM, H.J. Lu  wrote:
>>
>> From INSTRUCTION EXCEPTION SPECIFICATION section in Intel software
>> developer manual volume 2, only legacy SSE instructions with memory
>> operand not 16-byte aligned get General Protection fault.  There is
>> no need to check 1, 2, 4, 8 byte alignments.  Since x86 backend has
>> accurate constraints and predicates for 16-byte alignment, we can
>> remove ix86_legitimate_combined_insn.
>>
>> Tested on x86-64.  OK for trunk?
>
> No. This function also handles cases where invalid hard register gets
> propagated into the insn during the combine pass, leading to spill
> failure later.
>

ix86_legitimate_combined_insn was added to work around the
reload issue:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46829
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46843

LRA doesn't have those limitation.  Removing
ix86_legitimate_combined_insn causes no regressions.

-- 
H.J.


[PATCH] Don't build 32-bit libatomic with -march=i486 on x86-64

2016-04-19 Thread H.J. Lu
Gcc uses the same -march= for both -m32 and -m64 on x86-64 unless
--with-arch-32= is used.  There is no need for -march=i486 to compile
32-bit libatomic on x86-64.

Tested on x86-64.  OK for trunk?

H.J.
---
PR target/70454
* configure.tgt (XCFLAGS): Don't add -march=i486 to compile
32-bit x86 target library on x86-64.
---
 libatomic/configure.tgt | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index c5470d7..bbb93fc 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -81,14 +81,8 @@ case "${target_cpu}" in
try_ifunc=yes
;;
   x86_64)
-   case " ${CC} ${CFLAGS} " in
- *" -m32 "*)
-   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
-   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
-   ;;
- *)
-   ;;
-   esac
+   # Since 64-bit arch > i486, we can use the same -march= to build
+   # both 32-bit and 64-bit target libraries.
ARCH=x86
# ??? Detect when -mcx16 is already enabled.
try_ifunc=yes
-- 
2.5.5



[PATCH] Don't build 32-bit libitm with -march=i486 on x86-64

2016-04-19 Thread H.J. Lu
Gcc uses the same -march= for both -m32 and -m64 on x86-64 unless
--with-arch-32= is used.  There is no need for -march=i486 to compile
32-bit libitm on x86-64.

Tested on x86-64.  OK for trunk?


H.J.
---
PR target/70454
* configure.tgt (XCFLAGS): Don't add -march=i486 to compile
32-bit target library on x86-64.
---
 libitm/configure.tgt | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/libitm/configure.tgt b/libitm/configure.tgt
index e84382f..c925f77 100644
--- a/libitm/configure.tgt
+++ b/libitm/configure.tgt
@@ -100,12 +100,8 @@ case "${target_cpu}" in
;;
 
   x86_64)
-   case " ${CC} ${CFLAGS} " in
- *" -m32 "*)
-   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
-   XCFLAGS="${XCFLAGS} -fomit-frame-pointer"
-   ;;
-   esac
+   # Since 64-bit arch > i486, we can use the same -march= to build
+   # both 32-bit and 64-bit target libraries.
XCFLAGS="${XCFLAGS} -mrtm"
ARCH=x86
;;
-- 
2.5.5



[PATCH] Don't build 32-bit libgomp with -march=i486 on x86-64

2016-04-19 Thread H.J. Lu
Gcc uses the same -march= for both -m32 and -m64 on x86-64 unless
--with-arch-32= is used.  There is no need for -march=i486 to compile
32-bit libgomp on x86-64.

Tested on x86-64.  OK for trunk?

H.J.
---
PR target/70454
* configure.tgt (XCFLAGS): Don't add -march=i486 to compile
32-bit target library on x86-64.
---
 libgomp/configure.tgt | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt
index 77e73f0..a36acc5 100644
--- a/libgomp/configure.tgt
+++ b/libgomp/configure.tgt
@@ -79,16 +79,10 @@ if test x$enable_linux_futex = xyes; then
esac
;;
 
-# Similar jiggery-pokery for x86_64 multilibs, except here we
-# can't rely on the --with-arch configure option, since that
-# applies to the 64-bit side.
 x86_64-*-linux*)
config_path="linux/x86 linux posix"
-   case " ${CC} ${CFLAGS} " in
- *" -m32 "*)
-   XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic"
-   ;;
-   esac
+   # Since 64-bit arch > i486, we can use the same -march= to build
+   # both 32-bit and 64-bit target libraries.
;;
 
 # Note that sparcv7 and sparcv8 is not included here.  We need cas.
-- 
2.5.5



Re: [PATCH] Remove ix86_legitimate_combined_insn

2016-04-19 Thread Uros Bizjak
On Tue, Apr 19, 2016 at 4:49 PM, H.J. Lu  wrote:
>
> From INSTRUCTION EXCEPTION SPECIFICATION section in Intel software
> developer manual volume 2, only legacy SSE instructions with memory
> operand not 16-byte aligned get General Protection fault.  There is
> no need to check 1, 2, 4, 8 byte alignments.  Since x86 backend has
> accurate constraints and predicates for 16-byte alignment, we can
> remove ix86_legitimate_combined_insn.
>
> Tested on x86-64.  OK for trunk?

No. This function also handles cases where invalid hard register gets
propagated into the insn during the combine pass, leading to spill
failure later.

Uros.

> H.J.
> ---
> * config/i386/i386.c (ix86_legitimate_combined_insn): Removed.
> (TARGET_LEGITIMATE_COMBINED_INSN): Likewise.
> (ix86_expand_special_args_builtin): Replace
> ix86_legitimate_combined_insn with vector_memory_operand in
> comments.
> ---
>  gcc/config/i386/i386.c | 96 
> ++
>  1 file changed, 2 insertions(+), 94 deletions(-)
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index e056f68..a66cfc4 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -7288,95 +7288,6 @@ ix86_return_pops_args (tree fundecl, tree funtype, int 
> size)
>
>return 0;
>  }
> -
> -/* Implement the TARGET_LEGITIMATE_COMBINED_INSN hook.  */
> -
> -static bool
> -ix86_legitimate_combined_insn (rtx_insn *insn)
> -{
> -  /* Check operand constraints in case hard registers were propagated
> - into insn pattern.  This check prevents combine pass from
> - generating insn patterns with invalid hard register operands.
> - These invalid insns can eventually confuse reload to error out
> - with a spill failure.  See also PRs 46829 and 46843.  */
> -  if ((INSN_CODE (insn) = recog (PATTERN (insn), insn, 0)) >= 0)
> -{
> -  int i;
> -
> -  extract_insn (insn);
> -  preprocess_constraints (insn);
> -
> -  int n_operands = recog_data.n_operands;
> -  int n_alternatives = recog_data.n_alternatives;
> -  for (i = 0; i < n_operands; i++)
> -   {
> - rtx op = recog_data.operand[i];
> - machine_mode mode = GET_MODE (op);
> - const operand_alternative *op_alt;
> - int offset = 0;
> - bool win;
> - int j;
> -
> - /* For pre-AVX disallow unaligned loads/stores where the
> -instructions don't support it.  */
> - if (!TARGET_AVX
> - && VECTOR_MODE_P (mode)
> - && misaligned_operand (op, mode))
> -   {
> - unsigned int min_align = get_attr_ssememalign (insn);
> - if (min_align == 0
> - || MEM_ALIGN (op) < min_align)
> -   return false;
> -   }
> -
> - /* A unary operator may be accepted by the predicate, but it
> -is irrelevant for matching constraints.  */
> - if (UNARY_P (op))
> -   op = XEXP (op, 0);
> -
> - if (SUBREG_P (op))
> -   {
> - if (REG_P (SUBREG_REG (op))
> - && REGNO (SUBREG_REG (op)) < FIRST_PSEUDO_REGISTER)
> -   offset = subreg_regno_offset (REGNO (SUBREG_REG (op)),
> - GET_MODE (SUBREG_REG (op)),
> - SUBREG_BYTE (op),
> - GET_MODE (op));
> - op = SUBREG_REG (op);
> -   }
> -
> - if (!(REG_P (op) && HARD_REGISTER_P (op)))
> -   continue;
> -
> - op_alt = recog_op_alt;
> -
> - /* Operand has no constraints, anything is OK.  */
> - win = !n_alternatives;
> -
> - alternative_mask preferred = get_preferred_alternatives (insn);
> - for (j = 0; j < n_alternatives; j++, op_alt += n_operands)
> -   {
> - if (!TEST_BIT (preferred, j))
> -   continue;
> - if (op_alt[i].anything_ok
> - || (op_alt[i].matches != -1
> - && operands_match_p
> - (recog_data.operand[i],
> -  recog_data.operand[op_alt[i].matches]))
> - || reg_fits_class_p (op, op_alt[i].cl, offset, mode))
> -   {
> - win = true;
> - break;
> -   }
> -   }
> -
> - if (!win)
> -   return false;
> -   }
> -}
> -
> -  return true;
> -}
>
>  /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
>
> @@ -39859,7 +39770,7 @@ ix86_expand_special_args_builtin (const struct 
> builtin_description *d,
>  on it.  Try to improve it using get_pointer_alignment,
>  and if the special builtin is one that requires strict
>  mode alignment, also from it's GET_MODE_ALIGNMENT.
> -Failure to do so could lead to ix86_legitimate_combined_insn
> +

Re: [PATCH] PR libitm/70456: Allocate aligned memory in gtm_thread operator new

2016-04-19 Thread H.J. Lu
On Sat, Apr 2, 2016 at 9:25 AM, H.J. Lu  wrote:
> On Wed, Mar 30, 2016 at 5:34 AM, H.J. Lu  wrote:
>> Since GTM::gtm_thread has
>>
>> gtm_thread *next_thread __attribute__((__aligned__(HW_CACHELINE_SIZE)));
>>
>> GTM::gtm_thread::operator new should allocate aligned memory.
>>
>> Tested on Linux/x86-64.  OK for trunk.
>>
>>
>
> This patch is better.  Tested on Linux/x86-64.  OK for trunk?
>

Hi Richard,

Is this patch:

https://gcc.gnu.org/ml/gcc-patches/2016-04/msg00119.html

OK for trunk?

-- 
H.J.


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread Andrew Haley
On 04/19/2016 03:37 PM, Pedro Alves wrote:
> On 04/19/2016 02:25 PM, Andrew Haley wrote:
>> On 04/19/2016 02:19 PM, Michael Matz wrote:
>>
>>> Well, yeah, that's traditional insn caches on multiple cores.  From
>>> user space you need kernel help for this, doing interprocess
>>> interrupts to flush all such buffers on all cores (or at least those
>>> potentially fetching stuff in the patched region, if such
>>> granularity is possible).  An implementation providing such is
>>> non-broken :)
>>
>> Sure.  If you know of any such facility in Linux userspace, please let
>> me know.  :-)
> 
> Sounds like a job for the sys_membarrier system call:
> 
>  https://lkml.org/lkml/2015/3/18/531
>  https://lwn.net/Articles/369567/
> 
> I think it's available in Linux 4.3+.

So it is, thanks.  I'm guessing that might be good enough for full
instruction synchronization barriers, but from looking at the kernel
source I can't really tell.

Andrew.




[PATCH] Simplify ix86_expand_vector_move_misalign

2016-04-19 Thread H.J. Lu
Since mov<mode>_internal patterns handle both aligned/unaligned load
and store, we can simplify ix86_avx256_split_vector_move_misalign and
ix86_expand_vector_move_misalign.

Tested on x86-64.  OK for trunk?

H.J.
---
* config/i386/i386.c (ix86_avx256_split_vector_move_misalign):
Short-cut unaligned load and store cases.  Handle all integer
vector modes.
(ix86_expand_vector_move_misalign): Short-cut unaligned load
and store cases.  Call ix86_avx256_split_vector_move_misalign
directly without checking mode class.
---
 gcc/config/i386/i386.c | 252 -
 1 file changed, 81 insertions(+), 171 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4e48572..e056f68 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18820,7 +18820,39 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx 
op1)
   rtx (*extract) (rtx, rtx, rtx);
   machine_mode mode;
 
-  switch (GET_MODE (op0))
+  if ((MEM_P (op1) && !TARGET_AVX256_SPLIT_UNALIGNED_LOAD)
+  || (MEM_P (op0) && !TARGET_AVX256_SPLIT_UNALIGNED_STORE))
+{
+  emit_insn (gen_rtx_SET (op0, op1));
+  return;
+}
+
+  rtx orig_op0 = NULL_RTX;
+  mode = GET_MODE (op0);
+  switch (GET_MODE_CLASS (mode))
+{
+case MODE_VECTOR_INT:
+case MODE_INT:
+  if (mode != V32QImode)
+   {
+ if (!MEM_P (op0))
+   {
+ orig_op0 = op0;
+ op0 = gen_reg_rtx (V32QImode);
+   }
+ else
+   op0 = gen_lowpart (V32QImode, op0);
+ op1 = gen_lowpart (V32QImode, op1);
+ mode = V32QImode;
+   }
+  break;
+case MODE_VECTOR_FLOAT:
+  break;
+default:
+  gcc_unreachable ();
+}
+
+  switch (mode)
 {
 default:
   gcc_unreachable ();
@@ -18840,34 +18872,25 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx 
op1)
 
   if (MEM_P (op1))
 {
-  if (TARGET_AVX256_SPLIT_UNALIGNED_LOAD
- && optimize_insn_for_speed_p ())
-   {
- rtx r = gen_reg_rtx (mode);
- m = adjust_address (op1, mode, 0);
- emit_move_insn (r, m);
- m = adjust_address (op1, mode, 16);
- r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m);
- emit_move_insn (op0, r);
-   }
-  else
-   emit_insn (gen_rtx_SET (op0, op1));
+  rtx r = gen_reg_rtx (mode);
+  m = adjust_address (op1, mode, 0);
+  emit_move_insn (r, m);
+  m = adjust_address (op1, mode, 16);
+  r = gen_rtx_VEC_CONCAT (GET_MODE (op0), r, m);
+  emit_move_insn (op0, r);
 }
   else if (MEM_P (op0))
 {
-  if (TARGET_AVX256_SPLIT_UNALIGNED_STORE
- && optimize_insn_for_speed_p ())
-   {
- m = adjust_address (op0, mode, 0);
- emit_insn (extract (m, op1, const0_rtx));
- m = adjust_address (op0, mode, 16);
- emit_insn (extract (m, op1, const1_rtx));
-   }
-  else
-   emit_insn (gen_rtx_SET (op0, op1));
+  m = adjust_address (op0, mode, 0);
+  emit_insn (extract (m, op1, const0_rtx));
+  m = adjust_address (op0, mode, 16);
+  emit_insn (extract (m, op1, const1_rtx));
 }
   else
 gcc_unreachable ();
+
+  if (orig_op0)
+emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0));
 }
 
 /* Implement the movmisalign patterns for SSE.  Non-SSE modes go
@@ -18925,118 +18948,50 @@ ix86_avx256_split_vector_move_misalign (rtx op0, rtx 
op1)
 void
 ix86_expand_vector_move_misalign (machine_mode mode, rtx operands[])
 {
-  rtx op0, op1, orig_op0 = NULL_RTX, m;
+  rtx op0, op1, m;
 
   op0 = operands[0];
   op1 = operands[1];
 
-  if (GET_MODE_SIZE (mode) == 64)
+  /* Use unaligned load/store for AVX512 or when optimizing for size.  */
+  if (GET_MODE_SIZE (mode) == 64 || optimize_insn_for_size_p ())
 {
-  switch (GET_MODE_CLASS (mode))
-   {
-   case MODE_VECTOR_INT:
-   case MODE_INT:
- if (GET_MODE (op0) != V16SImode)
-   {
- if (!MEM_P (op0))
-   {
- orig_op0 = op0;
- op0 = gen_reg_rtx (V16SImode);
-   }
- else
-   op0 = gen_lowpart (V16SImode, op0);
-   }
- op1 = gen_lowpart (V16SImode, op1);
- /* FALLTHRU */
-
-   case MODE_VECTOR_FLOAT:
-
- emit_insn (gen_rtx_SET (op0, op1));
- if (orig_op0)
-   emit_move_insn (orig_op0, gen_lowpart (GET_MODE (orig_op0), op0));
- break;
-
-   default:
- gcc_unreachable ();
-   }
-
+  emit_insn (gen_rtx_SET (op0, op1));
   return;
 }
 
-  if (TARGET_AVX
-  && GET_MODE_SIZE (mode) == 32)
+  if (TARGET_AVX)
 {
-  switch (GET_MODE_CLASS (mode))
-   {
-   case MODE_VECTOR_INT:
-   case MODE_INT:
- if (GET_MODE (op0) != V32QImode)
-   {
- if (!MEM_P (op0))
-   {
- orig_op0 = op0;
- op0 

[PATCH] Remove ix86_legitimate_combined_insn

2016-04-19 Thread H.J. Lu

From INSTRUCTION EXCEPTION SPECIFICATION section in Intel software
developer manual volume 2, only legacy SSE instructions with memory
operand not 16-byte aligned get General Protection fault.  There is
no need to check 1, 2, 4, 8 byte alignments.  Since x86 backend has
accurate constraints and predicates for 16-byte alignment, we can
remove ix86_legitimate_combined_insn.
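
As a concrete illustration of that rule (my sketch, not part of the patch),
only the explicitly aligned legacy SSE forms can fault on misalignment:

#include <xmmintrin.h>

float buf[8];

__m128
load_aligned (void)
{
  return _mm_load_ps (buf);   /* movaps: #GP if the address % 16 != 0 */
}

__m128
load_unaligned (void)
{
  return _mm_loadu_ps (buf);  /* movups: never faults on misalignment */
}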

Tested on x86-64.  OK for trunk?

H.J.
---
* config/i386/i386.c (ix86_legitimate_combined_insn): Removed.
(TARGET_LEGITIMATE_COMBINED_INSN): Likewise.
(ix86_expand_special_args_builtin): Replace
ix86_legitimate_combined_insn with vector_memory_operand in
comments.
---
 gcc/config/i386/i386.c | 96 ++
 1 file changed, 2 insertions(+), 94 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e056f68..a66cfc4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -7288,95 +7288,6 @@ ix86_return_pops_args (tree fundecl, tree funtype, int 
size)
 
   return 0;
 }
-
-/* Implement the TARGET_LEGITIMATE_COMBINED_INSN hook.  */
-
-static bool
-ix86_legitimate_combined_insn (rtx_insn *insn)
-{
-  /* Check operand constraints in case hard registers were propagated
- into insn pattern.  This check prevents combine pass from
- generating insn patterns with invalid hard register operands.
- These invalid insns can eventually confuse reload to error out
- with a spill failure.  See also PRs 46829 and 46843.  */
-  if ((INSN_CODE (insn) = recog (PATTERN (insn), insn, 0)) >= 0)
-{
-  int i;
-
-  extract_insn (insn);
-  preprocess_constraints (insn);
-
-  int n_operands = recog_data.n_operands;
-  int n_alternatives = recog_data.n_alternatives;
-  for (i = 0; i < n_operands; i++)
-   {
- rtx op = recog_data.operand[i];
- machine_mode mode = GET_MODE (op);
- const operand_alternative *op_alt;
- int offset = 0;
- bool win;
- int j;
-
- /* For pre-AVX disallow unaligned loads/stores where the
-instructions don't support it.  */
- if (!TARGET_AVX
- && VECTOR_MODE_P (mode)
- && misaligned_operand (op, mode))
-   {
- unsigned int min_align = get_attr_ssememalign (insn);
- if (min_align == 0
- || MEM_ALIGN (op) < min_align)
-   return false;
-   }
-
- /* A unary operator may be accepted by the predicate, but it
-is irrelevant for matching constraints.  */
- if (UNARY_P (op))
-   op = XEXP (op, 0);
-
- if (SUBREG_P (op))
-   {
- if (REG_P (SUBREG_REG (op))
- && REGNO (SUBREG_REG (op)) < FIRST_PSEUDO_REGISTER)
-   offset = subreg_regno_offset (REGNO (SUBREG_REG (op)),
- GET_MODE (SUBREG_REG (op)),
- SUBREG_BYTE (op),
- GET_MODE (op));
- op = SUBREG_REG (op);
-   }
-
- if (!(REG_P (op) && HARD_REGISTER_P (op)))
-   continue;
-
- op_alt = recog_op_alt;
-
- /* Operand has no constraints, anything is OK.  */
- win = !n_alternatives;
-
- alternative_mask preferred = get_preferred_alternatives (insn);
- for (j = 0; j < n_alternatives; j++, op_alt += n_operands)
-   {
- if (!TEST_BIT (preferred, j))
-   continue;
- if (op_alt[i].anything_ok
- || (op_alt[i].matches != -1
- && operands_match_p
- (recog_data.operand[i],
-  recog_data.operand[op_alt[i].matches]))
- || reg_fits_class_p (op, op_alt[i].cl, offset, mode))
-   {
- win = true;
- break;
-   }
-   }
-
- if (!win)
-   return false;
-   }
-}
-
-  return true;
-}
 
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
 
@@ -39859,7 +39770,7 @@ ix86_expand_special_args_builtin (const struct 
builtin_description *d,
 on it.  Try to improve it using get_pointer_alignment,
 and if the special builtin is one that requires strict
 mode alignment, also from it's GET_MODE_ALIGNMENT.
-Failure to do so could lead to ix86_legitimate_combined_insn
+Failure to do so could lead to vector_memory_operand
 rejecting all changes to such insns.  */
  unsigned int align = get_pointer_alignment (arg);
  if (aligned_mem && align < GET_MODE_ALIGNMENT (tmode))
@@ -39915,7 +39826,7 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
 on it.  Try to improve it using get_pointer_alignment,
 and if the special builtin is one that requires 

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread Pedro Alves
On 04/19/2016 02:25 PM, Andrew Haley wrote:
> On 04/19/2016 02:19 PM, Michael Matz wrote:
> 
>> Well, yeah, that's traditional insn caches on multiple cores.  From
>> user space you need kernel help for this, doing interprocess
>> interrupts to flush all such buffers on all cores (or at least those
>> potentially fetching stuff in the patched region, if such
>> granularity is possible).  An implementation providing such is
>> non-broken :)
> 
> Sure.  If you know of any such facility in Linux userspace, please let
> me know.  :-)

Sounds like a job for the sys_membarrier system call:

 https://lkml.org/lkml/2015/3/18/531
 https://lwn.net/Articles/369567/

I think it's available in Linux 4.3+.
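
A minimal sketch (not from the thread) of how user space would invoke
the syscall, assuming a 4.3+ kernel; whether its IPIs provide the
required context synchronization on every core is exactly the question
discussed above:

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int
global_barrier (void)
{
  /* Interrupts all cores currently running threads of any process
     sharing memory with the caller.  */
  return syscall (__NR_membarrier, MEMBARRIER_CMD_SHARED, 0);
}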

Thanks,
Pedro Alves



Re: [PATCH, libgomp] Fix deadlock in acc_set_device_type (ping x2)

2016-04-19 Thread Chung-Lin Tang
Ping x2.

Hi Jakub,
This patch is fairly straightforward, and solves an easily encountered
deadlock. Please approve for trunk and gcc-6-branch.

Thanks,
Chung-Lin

On 2016/4/16 03:39 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2016/3/28 05:45 PM, Chung-Lin Tang wrote:
>> Hi Jakub, there's a path for deadlock on acc_device_lock when going
>> through the acc_set_device_type() OpenACC library function.
>> Basically, the gomp_init_targets_once() function should not be
>> called with that lock held. The attached patch moves it appropriately.
>>
>> Also in this patch, there are several cases in acc_* functions
>> where gomp_init_targets_once() is guarded by a test of
>> !cached_base_dev. Since that function already uses pthread_once() to
>> call gomp_target_init(), and technically cached_base_dev
>> is protected by acc_device_lock, the cleanest way should be to
>> simply drop those "if(!cached_base_dev)" tests.
>>
>> Tested libgomp without regressions on an nvptx offloaded system,
>> is this okay for trunk?
>>
>> Thanks,
>> Chung-Lin
>>
>> 2016-03-28  Chung-Lin Tang  
>>
>> * oacc-init.c (acc_init): Remove !cached_base_dev condition on call 
>> to
>> gomp_init_targets_once().
>> (acc_set_device_type): Remove !cached_base_dev condition on call to
>> gomp_init_targets_once(), move call to before acc_device_lock 
>> acquire,
>> to avoid deadlock.
>> (acc_get_device_num): Remove !cached_base_dev condition on call to
>> gomp_init_targets_once().
>> (acc_set_device_num): Likewise.
>>
> 



[gomp-nvptx] doc: document nvptx shared attribute

2016-04-19 Thread Alexander Monakov
* doc/extend.texi (Nvidia PTX Variable Attributes): New section.
---
Applied to amonakov/gomp-nvptx branch.

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e11ce4d..5eeb179 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5469,6 +5469,7 @@ attributes.
 * MeP Variable Attributes::
 * Microsoft Windows Variable Attributes::
 * MSP430 Variable Attributes::
+* Nvidia PTX Variable Attributes::
 * PowerPC Variable Attributes::
 * RL78 Variable Attributes::
 * SPU Variable Attributes::
@@ -6099,6 +6100,20 @@ same name (@pxref{MSP430 Function Attributes}).
 These attributes can be applied to both functions and variables.
 @end table
 
+@node Nvidia PTX Variable Attributes
+@subsection Nvidia PTX Variable Attributes
+
+These variable attributes are supported by the Nvidia PTX back end:
+
+@table @code
+@item shared
+@cindex @code{shared} attribute, Nvidia PTX
+Use this attribute to place a variable in the @code{.shared} memory space.
+This memory space is private to each cooperative thread array; only threads
+within one thread block refer to the same instance of the variable.
+The runtime does not initialize variables in this memory space.
+@end table
+
 @node PowerPC Variable Attributes
 @subsection PowerPC Variable Attributes
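
As a usage illustration (hypothetical, not part of the patch), a
per-thread-block scratch buffer would be declared with the attribute
and then explicitly initialized at run time, since the runtime leaves
.shared memory uninitialized:

static int scratch[32] __attribute__((shared));

void
init_scratch (int v)
{
  /* Must be done by the program itself; .shared is not initialized.  */
  for (int i = 0; i < 32; i++)
    scratch[i] = v;
}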
 


Re: gomp_target_fini

2016-04-19 Thread Thomas Schwinge
Hi!

On Fri, 22 Jan 2016 11:16:07 +0100, Jakub Jelinek  wrote:
> On Thu, Jan 21, 2016 at 04:24:46PM +0100, Bernd Schmidt wrote:
> > On 12/16/2015 01:30 PM, Thomas Schwinge wrote:
> > >Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> > >atexit handler, gomp_target_fini, which, with the device lock held, will
> > >call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> > >clean up.
> > >
> > >Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> > >context is now in an inconsistent state
> > 
> > >Thus, any cuMemFreeHost invocations that are run during clean-up will now
> > >also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
> > >GOMP_PLUGIN_fatal, which again will trigger the same or another
> > >(GOMP_offload_unregister_ver) atexit handler, which will then deadlock
> > >trying to lock the device again, which is still locked.

(... causing "WARNING: program timed out" for the affected libgomp test
cases, as well as deadlocks for any such user code, too.)

> > >   libgomp/
> > >   * error.c (gomp_vfatal): Call _exit instead of exit.
> > 
> > It seems unfortunate to disable the atexit handlers for everything for what
> > seems purely an nvptx problem.  [...]

> I agree, _exit is just wrong, there could be important atexit hooks from the
> application.  You can set some flag that the libgomp or nvptx plugin atexit
> hooks should not do anything, or should do things differently.  But
> bypassing all atexit handlers is risky.

Well, I certainly had done at least some thinking before proposing this:
we're talking about the libgomp "fatal exit" function, called when
something has gone very wrong, and we're about to terminate the process,
because there's no hope to recover.  In this situation/consideration it
didn't seem important to me to have atexit handlers called.  Just like
these are also not called when we run into a SIGSEGV, or the kernel kills
the process for other reasons.  So I'm not completely convinced by your
assessment that calling "_exit is just wrong".  Anyway, I can certainly
accept that my understanding of the seriousness of a libgomp "fatal exit"
has been too pessimistic, and that we can do better than my proposed
_exit solution.
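
For reference, a sketch (hypothetical names, not the committed fix) of
the flag-based alternative Jakub describes: the fatal path sets a flag,
and the atexit hooks become no-ops once it is set.

static bool fatal_in_progress;

static void
cleanup_atexit_hook (void)
{
  if (fatal_in_progress)
    return;	/* We are already dying; skip device cleanup.  */
  /* ... normal device cleanup ... */
}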

Two other solutions have been proposed in the past months: Chung-Lin's
patches with subject: "Adjust offload plugin interface for avoiding
deadlock on exit", later: "Resolve libgomp plugin deadlock on exit",
later: "Resolve deadlock on plugin exit" (still pending review/approval),
and Alexander's much smaller patch with subject: "libgomp plugin: make
cuMemFreeHost error non-fatal",
.
(Both of which I have not reviewed in detail.)  Assuming that Chung-Lin's
patches are considered too invasive for gcc-6-branch, can we at least get
Alexander's patch committed to gcc-6-branch as well as on trunk, please?

commit d86a582bd9c21451dc888695ee6ecef37b5fb6ac
Author: Alexander Monakov 
Date:   Fri Mar 11 15:31:33 2016 +0300

libgomp plugin: make cuMemFreeHost error non-fatal

Unlike cuMemFree and other resource-releasing functions called on exit,
cuMemFreeHost appears to re-report errors encountered in kernel launch.
This leads to a deadlock after GOMP_PLUGIN_fatal is reentered.

While the behavior on libgomp side is suboptimal (there's no need to
call resource-releasing functions if we're about to destroy the CUDA
context anyway), this behavior on cuMemFreeHost part is not useful
and just makes error "recovery" harder.  This was reported to NVIDIA
(bug ref. 1737876), but we can work around it by simply reporting the
error without making it fatal.

* plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error non-fatal.
---
 libgomp/ChangeLog.gomp-nvptx  | 4 
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp-nvptx libgomp/ChangeLog.gomp-nvptx
index 7eefe0b..6bd9e5e 100644
--- libgomp/ChangeLog.gomp-nvptx
+++ libgomp/ChangeLog.gomp-nvptx
@@ -1,3 +1,7 @@
+2016-03-11  Alexander Monakov  
+
+   * plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error non-fatal.
+
 2016-03-04  Alexander Monakov  
 
* config/nvptx/bar.c: Remove wrong invocation of
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index adf57b1..4e44242 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -135,7 +135,7 @@ map_fini (struct ptx_stream *s)
 
   r = cuMemFreeHost (s->h);
   if (r != CUDA_SUCCESS)
-GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r));
+GOMP_PLUGIN_error ("cuMemFreeHost error: %s", cuda_error (r));
 }
 
 static void


Grüße
 Thomas


Re: [PATCH][GCC 7] Fix PR70171

2016-04-19 Thread Richard Biener
On Fri, Mar 11, 2016 at 3:01 PM, Richard Biener  wrote:
>
> The following teaches phiprop to handle the case of aggregate copies
> where the aggregate has non-BLKmode which means it is very likely
> expanded as reg-reg moves (any better test for that apart from
> checking for non-BLKmode?).  This improves code for the testcase
> from
>
> _Z14struct_ternary1SS_b:
> .LFB2:
> .cfi_startproc
> leaq-40(%rsp), %rcx
> leaq-24(%rsp), %rax
> testb   %dl, %dl
> movl%edi, -24(%rsp)
> movl%esi, -40(%rsp)
> cmove   %rcx, %rax
> movl(%rax), %eax
> ret
>
> to
>
> _Z14struct_ternary1SS_b:
> .LFB2:
> .cfi_startproc
> testb   %dl, %dl
> movl%edi, %eax
> cmove   %esi, %eax
> ret
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, queued for stage1.

Re-bootstrapped and tested on x86_64-unknown-linux-gnu, applied as r235208.
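
For context, a hedged reconstruction of the kind of source involved
(the actual g++.dg/tree-ssa/pr70171.C testcase is not quoted here; the
mangled name above demangles to struct_ternary(S, S, bool)):

struct S { int i; };

int
struct_ternary (S a, S b, bool c)
{
  S s = c ? a : b;	// non-BLKmode aggregate copy through a PHI of addresses
  return s.i;
}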

Richard.

> Richard.
>
> 2016-03-11  Richard Biener  
>
> PR tree-optimization/70171
> * tree-ssa-phiprop.c: Include stor-layout.h.
> (phiprop_insert_phi): Handle the aggregate copy case.
> (propagate_with_phi): Likewise.
>
> * g++.dg/tree-ssa/pr70171.C: New testcase.
>
> Index: gcc/tree-ssa-phiprop.c
> ===
> *** gcc/tree-ssa-phiprop.c  (revision 234134)
> --- gcc/tree-ssa-phiprop.c  (working copy)
> *** along with GCC; see the file COPYING3.
> *** 31,36 
> --- 31,37 
>   #include "tree-eh.h"
>   #include "gimplify.h"
>   #include "gimple-iterator.h"
> + #include "stor-layout.h"
>
>   /* This pass propagates indirect loads through the PHI node for its
>  address to make the load source possibly non-addressable and to
> *** phiprop_insert_phi (basic_block bb, gphi
> *** 132,138 
> struct phiprop_d *phivn, size_t n)
>   {
> tree res;
> !   gphi *new_phi;
> edge_iterator ei;
> edge e;
>
> --- 133,139 
> struct phiprop_d *phivn, size_t n)
>   {
> tree res;
> !   gphi *new_phi = NULL;
> edge_iterator ei;
> edge e;
>
> *** phiprop_insert_phi (basic_block bb, gphi
> *** 142,148 
> /* Build a new PHI node to replace the definition of
>the indirect reference lhs.  */
> res = gimple_assign_lhs (use_stmt);
> !   new_phi = create_phi_node (res, bb);
>
> if (dump_file && (dump_flags & TDF_DETAILS))
>   {
> --- 143,150 
> /* Build a new PHI node to replace the definition of
>the indirect reference lhs.  */
> res = gimple_assign_lhs (use_stmt);
> !   if (TREE_CODE (res) == SSA_NAME)
> ! new_phi = create_phi_node (res, bb);
>
> if (dump_file && (dump_flags & TDF_DETAILS))
>   {
> *** phiprop_insert_phi (basic_block bb, gphi
> *** 187,193 
> {
>   tree rhs = gimple_assign_rhs1 (use_stmt);
>   gcc_assert (TREE_CODE (old_arg) == ADDR_EXPR);
> ! new_var = make_ssa_name (TREE_TYPE (rhs));
>   if (!is_gimple_min_invariant (old_arg))
> old_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
>   else
> --- 189,198 
> {
>   tree rhs = gimple_assign_rhs1 (use_stmt);
>   gcc_assert (TREE_CODE (old_arg) == ADDR_EXPR);
> ! if (TREE_CODE (res) == SSA_NAME)
> !   new_var = make_ssa_name (TREE_TYPE (rhs));
> ! else
> !   new_var = unshare_expr (res);
>   if (!is_gimple_min_invariant (old_arg))
> old_arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
>   else
> *** phiprop_insert_phi (basic_block bb, gphi
> *** 210,222 
> }
> }
>
> !   add_phi_arg (new_phi, new_var, e, locus);
>   }
>
> !   update_stmt (new_phi);
>
> !   if (dump_file && (dump_flags & TDF_DETAILS))
> ! print_gimple_stmt (dump_file, new_phi, 0, 0);
>
> return res;
>   }
> --- 215,231 
> }
> }
>
> !   if (new_phi)
> !   add_phi_arg (new_phi, new_var, e, locus);
>   }
>
> !   if (new_phi)
> ! {
> !   update_stmt (new_phi);
>
> !   if (dump_file && (dump_flags & TDF_DETAILS))
> !   print_gimple_stmt (dump_file, new_phi, 0, 0);
> ! }
>
> return res;
>   }
> *** propagate_with_phi (basic_block bb, gphi
> *** 250,256 
> tree type = NULL_TREE;
>
> if (!POINTER_TYPE_P (TREE_TYPE (ptr))
> !   || !is_gimple_reg_type (TREE_TYPE (TREE_TYPE (ptr
>   return false;
>
> /* Check if we can "cheaply" dereference all phi arguments.  */
> --- 259,266 
> tree type = NULL_TREE;
>
> if (!POINTER_TYPE_P (TREE_TYPE (ptr))
> !   || (!is_gimple_reg_type (TREE_TYPE (TREE_TYPE (ptr)))
> ! && TYPE_MODE (TREE_TYPE (TREE_TYPE (ptr))) == BLKmode))
>   return false;
>
> /* Check if we can "cheaply" dereference all phi 

Re: gomp_target_fini

2016-04-19 Thread Jakub Jelinek
On Tue, Apr 19, 2016 at 04:01:06PM +0200, Thomas Schwinge wrote:
> Two other solutions have been proposed in the past months: Chung-Lin's
> patches with subject: "Adjust offload plugin interface for avoiding
> deadlock on exit", later: "Resolve libgomp plugin deadlock on exit",
> later: "Resolve deadlock on plugin exit" (still pending review/approval),
> and Alexander's much smaller patch with subject: "libgomp plugin: make
> cuMemFreeHost error non-fatal",
> .
> (Both of which I have not reviewed in detail.)  Assuming that Chung-Lin's
> patches are considered too invasive for gcc-6-branch, can we at least get
> Alexander's patch committed to gcc-6-branch as well as on trunk, please?

Yeah, Alex' patch is IMHO fine, even for gcc-6-branch.

> --- libgomp/ChangeLog.gomp-nvptx
> +++ libgomp/ChangeLog.gomp-nvptx
> @@ -1,3 +1,7 @@
> +2016-03-11  Alexander Monakov  
> +
> + * plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error non-fatal.
> +
>  2016-03-04  Alexander Monakov  
>  
>   * config/nvptx/bar.c: Remove wrong invocation of
> diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
> index adf57b1..4e44242 100644
> --- libgomp/plugin/plugin-nvptx.c
> +++ libgomp/plugin/plugin-nvptx.c
> @@ -135,7 +135,7 @@ map_fini (struct ptx_stream *s)
>  
>r = cuMemFreeHost (s->h);
>if (r != CUDA_SUCCESS)
> -GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r));
> +GOMP_PLUGIN_error ("cuMemFreeHost error: %s", cuda_error (r));
>  }
>  
>  static void

Jakub


[PATCH, i386]: Use lowpart_subreg instead of simplify_gen_subreg (... , 0).

2016-04-19 Thread Uros Bizjak
Trivial patch, no functional changes.
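
For reference, the equivalence being relied on (a sketch of the
existing definitions, not new code in the patch):

/* lowpart_subreg is simplify_gen_subreg at the lowpart offset:
     lowpart_subreg (omode, x, imode)
       == simplify_gen_subreg (omode, x, imode,
			       subreg_lowpart_offset (omode, imode))
   and subreg_lowpart_offset is 0 on little-endian targets such as
   x86, hence "no functional changes".  */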

2016-04-19  Uros Bizjak  

* config/i386/i386.c (ix86_decompose_address): Use lowpart_subreg
instead of simplify_gen_subreg (... , 0).
(ix86_delegitimize_address): Ditto.
(ix86_split_divmod): Ditto.
(ix86_split_copysign_const): Ditto.
(ix86_split_copysign_var): Ditto.
(ix86_expand_args_builtin): Ditto.
(ix86_expand_round_builtin): Ditto.
(ix86_expand_special_args_builtin): Ditto.
* config/i386/i386.md (TARGET_USE_VECTOR_FP_CONVERTS splitters): Ditto.
(TARGET_SSE_PARTIAL_REG_DEPENDENCY splitters and peephole2s): Ditto.
(udivmodqi4): Ditto.
(absneg splitters): Ditto.
(*jcc_bt_1): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu, committed to
mainline SVN.

Uros.
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 235206)
+++ config/i386/i386.c  (working copy)
@@ -14100,7 +14100,7 @@ ix86_decompose_address (rtx addr, struct ix86_addr
   else if (GET_CODE (addr) == AND
   && const_32bit_mask (XEXP (addr, 1), DImode))
{
- addr = simplify_gen_subreg (SImode, XEXP (addr, 0), DImode, 0);
+ addr = lowpart_subreg (SImode, XEXP (addr, 0), DImode);
  if (addr == NULL_RTX)
return 0;
 
@@ -16211,8 +16211,7 @@ ix86_delegitimize_address (rtx x)
  x = XVECEXP (XEXP (x, 0), 0, 0);
  if (GET_MODE (orig_x) != GET_MODE (x) && MEM_P (orig_x))
{
- x = simplify_gen_subreg (GET_MODE (orig_x), x,
-  GET_MODE (x), 0);
+ x = lowpart_subreg (GET_MODE (orig_x), x, GET_MODE (x));
  if (x == NULL_RTX)
return orig_x;
}
@@ -16303,7 +16302,7 @@ ix86_delegitimize_address (rtx x)
 }
   if (GET_MODE (orig_x) != Pmode && MEM_P (orig_x))
 {
-  result = simplify_gen_subreg (GET_MODE (orig_x), result, Pmode, 0);
+  result = lowpart_subreg (GET_MODE (orig_x), result, Pmode);
   if (result == NULL_RTX)
return orig_x;
 }
@@ -19580,9 +19579,9 @@ ix86_split_idivmod (machine_mode mode, rtx operand
   emit_label (qimode_label);
   /* Don't use operands[0] for result of 8bit divide since not all
  registers support QImode ZERO_EXTRACT.  */
-  tmp0 = simplify_gen_subreg (HImode, scratch, mode, 0);
-  tmp1 = simplify_gen_subreg (HImode, operands[2], mode, 0);
-  tmp2 = simplify_gen_subreg (QImode, operands[3], mode, 0);
+  tmp0 = lowpart_subreg (HImode, scratch, mode);
+  tmp1 = lowpart_subreg (HImode, operands[2], mode);
+  tmp2 = lowpart_subreg (QImode, operands[3], mode);
   emit_insn (gen_udivmodhiqi3 (tmp0, tmp1, tmp2));
 
   if (signed_p)
@@ -21016,7 +21015,7 @@ ix86_split_copysign_const (rtx operands[])
   mode = GET_MODE (dest);
   vmode = GET_MODE (mask);
 
-  dest = simplify_gen_subreg (vmode, dest, mode, 0);
+  dest = lowpart_subreg (vmode, dest, mode);
   x = gen_rtx_AND (vmode, dest, mask);
   emit_insn (gen_rtx_SET (dest, x));
 
@@ -21062,7 +21061,7 @@ ix86_split_copysign_var (rtx operands[])
   emit_insn (gen_rtx_SET (scratch, x));
 
   dest = mask;
-  op0 = simplify_gen_subreg (vmode, op0, mode, 0);
+  op0 = lowpart_subreg (vmode, op0, mode);
   x = gen_rtx_NOT (vmode, dest);
   x = gen_rtx_AND (vmode, x, op0);
   emit_insn (gen_rtx_SET (dest, x));
@@ -21076,7 +21075,7 @@ ix86_split_copysign_var (rtx operands[])
   else /* alternative 2,4 */
{
   gcc_assert (REGNO (mask) == REGNO (scratch));
-  op1 = simplify_gen_subreg (vmode, op1, mode, 0);
+  op1 = lowpart_subreg (vmode, op1, mode);
  x = gen_rtx_AND (vmode, scratch, op1);
}
   emit_insn (gen_rtx_SET (scratch, x));
@@ -21083,7 +21082,7 @@ ix86_split_copysign_var (rtx operands[])
 
   if (REGNO (op0) == REGNO (dest)) /* alternative 1,2 */
{
- dest = simplify_gen_subreg (vmode, op0, mode, 0);
+ dest = lowpart_subreg (vmode, op0, mode);
  x = gen_rtx_AND (vmode, dest, nmask);
}
   else /* alternative 3,4 */
@@ -21090,7 +21089,7 @@ ix86_split_copysign_var (rtx operands[])
{
   gcc_assert (REGNO (nmask) == REGNO (dest));
  dest = nmask;
- op0 = simplify_gen_subreg (vmode, op0, mode, 0);
+ op0 = lowpart_subreg (vmode, op0, mode);
  x = gen_rtx_AND (vmode, dest, op0);
}
   emit_insn (gen_rtx_SET (dest, x));
@@ -39115,7 +39114,7 @@ ix86_expand_args_builtin (const struct builtin_des
   else
 {
   real_target = gen_reg_rtx (tmode);
-  target = simplify_gen_subreg (rmode, real_target, tmode, 0);
+  target = lowpart_subreg (rmode, real_target, tmode);
 }
 
   for (i = 0; i < nargs; i++)
@@ -39132,7 +39131,7 @@ ix86_expand_args_builtin (const struct builtin_des
  

Re: [PATCH] Remove UNSPEC_LOADU and UNSPEC_STOREU

2016-04-19 Thread Kirill Yukhin
Hi,
On 18 Apr 21:13, Uros Bizjak wrote:
> On Mon, Apr 18, 2016 at 8:40 PM, H.J. Lu  wrote:
> > On Sun, Jan 10, 2016 at 11:45 PM, Uros Bizjak  wrote:
> >> On Sun, Jan 10, 2016 at 11:32 PM, H.J. Lu  wrote:
>>> Since *mov<mode>_internal and <avx512>_(load|store)<mode>_mask patterns
> >>> can handle unaligned load and store, we can remove UNSPEC_LOADU and
> >>> UNSPEC_STOREU.  We use function prototypes with pointer to scalar for
> >>> unaligned load/store builtin functions so that memory passed to
> >>> *mov_internal is unaligned.
> >>>
> >>> Tested on x86-64.  Is this OK for trunk in stage 3?
> >>
> >> This patch is not appropriate for stage 3.
> >>
> >> Uros.
> >>
> >>> H.J.
> >>> 
> >
> >
> > Here is the updated patch for GCC 7.  Tested on x86-64.  OK for
> > trrunk?
> 
> IIRC from previous discussion, are we sure we won't propagate
> unaligned memory into SSE arithmetic insns?
> 
> Otherwise, the patch is OK, but please wait for Kirill for AVX512 approval.
I am ok with it.
> 
> Thanks,
> Uros.

--
Thanks, K


C++ PATCH for c++/70522 (friend hides name in unnamed namespace)

2016-04-19 Thread Jason Merrill
cp_binding_level_find_binding_for_name can find a binding for a hidden 
friend declaration, in which case we shouldn't stop looking into 
anonymous namespaces.  This bug blocked the use of N4381 customization 
points.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 995a41f6f9153cbc4ec713ec645a3edebc408ec2
Author: Jason Merrill 
Date:   Tue Apr 19 09:11:38 2016 -0400

	PR c++/70522

	* name-lookup.c (qualified_lookup_using_namespace): Look through
	hidden names.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 89d84d7..b3828c0 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -4647,8 +4647,9 @@ qualified_lookup_using_namespace (tree name, tree scope,
 	cp_binding_level_find_binding_for_name (NAMESPACE_LEVEL (scope), name);
 	  if (binding)
 	{
-	  found_here = true;
 	  ambiguous_decl (result, binding, flags);
+	  if (result->type || result->value)
+		found_here = true;
 	}
 
 	  for (usings = DECL_NAMESPACE_USING (scope); usings;
diff --git a/gcc/testsuite/g++.dg/lookup/friend18.C b/gcc/testsuite/g++.dg/lookup/friend18.C
new file mode 100644
index 000..90cd2d7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lookup/friend18.C
@@ -0,0 +1,15 @@
+// PR c++/70522
+
+namespace A {
+  struct C {
+friend void i();
+  };
+  namespace {
+int i;
+  }
+}
+
+int main()
+{
+  return A::i;
+}


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread Andrew Haley
On 04/19/2016 02:19 PM, Michael Matz wrote:

> Well, yeah, that's traditional insn caches on multiple cores.  From
> user space you need kernel help for this, doing interprocess
> interrupts to flush all such buffers on all cores (or at least those
> potentially fetching stuff in the patched region, if such
> granularity is possible).  An implementation providing such is
> non-broken :)

Sure.  If you know of any such facility in Linux userspace, please let
me know.  :-)

But there are ways of doing patching sequences which don't require
IPIs across all the cores; which was my point.

> Alternatively the various invalidate cache instructions need to have
> a form that invalidates the i$ on all cores.

I'm fairly sure we haven't got that in the AArch64 architecture.

Andrew.


[Ada] Use thread_id as lwp id on Darwin

2016-04-19 Thread Arnaud Charlet
Not a functional change, but allows the use of debugserver for task switching
by gdb.

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-19  Tristan Gingold  

* adaint.c (__gnat_lwp_self): New function (for darwin).
* s-osinte-darwin.ads, s-osinte-darwin.adb (lwp_self): Import
of __gnat_lwp_self.

Index: s-osinte-darwin.adb
===
--- s-osinte-darwin.adb (revision 235192)
+++ s-osinte-darwin.adb (working copy)
@@ -6,7 +6,7 @@
 --  --
 --  B o d y --
 --  --
---  Copyright (C) 1999-2014, Free Software Foundation, Inc. --
+--  Copyright (C) 1999-2015, Free Software Foundation, Inc. --
 --  --
 -- GNARL is free software; you can  redistribute it  and/or modify it under --
 -- terms of the  GNU General Public License as published  by the Free Soft- --
@@ -172,17 +172,6 @@
   return 0;
end sched_yield;
 
-   --
-   -- lwp_self --
-   --
-
-   function lwp_self return Address is
-  function pthread_mach_thread_np (thread : pthread_t) return Address;
-  pragma Import (C, pthread_mach_thread_np, "pthread_mach_thread_np");
-   begin
-  return pthread_mach_thread_np (pthread_self);
-   end lwp_self;
-
--
-- pthread_init --
--
Index: adaint.c
===
--- adaint.c(revision 235192)
+++ adaint.c(working copy)
@@ -3101,6 +3101,30 @@
 }
 #endif
 
+#if defined (__APPLE__)
+#include 
+#include 
+#include 
+
+/* System-wide thread identifier.  Note it could be truncated on 32 bit
+   hosts.
+   Previously was: pthread_mach_thread_np (pthread_self ()).  */
+void *
+__gnat_lwp_self (void)
+{
+  thread_identifier_info_data_t data;
+  mach_msg_type_number_t count = THREAD_IDENTIFIER_INFO_COUNT;
+  kern_return_t kret;
+
+  kret = thread_info (mach_thread_self (), THREAD_IDENTIFIER_INFO,
+ (thread_info_t) &data, &count);
+  if (kret == KERN_SUCCESS)
+return (void *)(uintptr_t)data.thread_id;
+  else
+return 0;
+}
+#endif
+
 #if defined (__linux__)
 #include 
 
Index: s-osinte-darwin.ads
===
--- s-osinte-darwin.ads (revision 235192)
+++ s-osinte-darwin.ads (working copy)
@@ -228,6 +228,7 @@
-
 
function lwp_self return System.Address;
+   pragma Import (C, lwp_self, "__gnat_lwp_self");
--  Return the mach thread bound to the current thread.  The value is not
--  used by the run-time library but made available to debuggers.
 


[Ada] Always require an elaboration counter when preserving control flow

2016-04-19 Thread Arnaud Charlet
When control flow preservation is requested, we want to be explicit
about the units elaboration order in a partition, and we need to have
the elaboration counter available for that. This patch ensures we do,
even in circumstances where we are otherwise allowed to omit the
elaboration counter, e.g. under control of a No_Elaboration_Code pragma.

Compiling the code below:

   pragma Restrictions (No_Elaboration_Code);
   package Noelab is
 type Myint is new integer;
   end;

With -fpreserve-control-flow, this is expected to produce a "noelab_E"
counter that is not present in the absence of the option.

Tested on x86_64-pc-linux-gnu, committed on trunk

2016-04-19  Olivier Hainque  

* sem_util.adb (Build_Elaboration_Entity): Always request an
elab counter when preserving control-flow.

Index: sem_util.adb
===
--- sem_util.adb(revision 235200)
+++ sem_util.adb(working copy)
@@ -1662,10 +1662,18 @@
   elsif ASIS_Mode then
  return;
 
-  --  See if we need elaboration entity. We always need it for the dynamic
-  --  elaboration model, since it is needed to properly generate the PE
-  --  exception for access before elaboration.
+  --  See if we need elaboration entity.
 
+  --  We always need an elaboration entity when preserving control-flow, as
+  --  we want to remain explicit about the units elaboration order.
+
+  elsif Opt.Suppress_Control_Flow_Optimizations then
+ null;
+
+  --  We always need an elaboration entity for the dynamic elaboration
+  --  model, since it is needed to properly generate the PE exception for
+  --  access before elaboration.
+
   elsif Dynamic_Elaboration_Checks then
  null;
 


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread Michael Matz
Hi,

On Tue, 19 Apr 2016, Andrew Haley wrote:

> > I will happily declare any implementation where it's impossible to 
> > safely patch the instruction stream by flushing the respective buffers 
> > or other means completely under control of the patching machinery, to 
> > be broken by design.
> 
> You can declare anything you want, but we have to program for the 
> architectural specification.
> 
> > What failure mode do you envision, exactly?
> 
> It's easiest just to quote from the spec:
> 
> How far ahead of the current point of execution instructions are
> fetched from is IMPLEMENTATION DEFINED. Such prefetching can be
> either a fixed or a dynamically varying number of instructions,
> and can follow any or all possible future execution paths. For all
> types of memory:
> 
>The PE might have fetched the instructions from memory at any
>time since the last Context synchronization operation on that
>PE.
> 
>Any instructions fetched in this way might be executed multiple
>times, if this is required by the execution of the program,
>without being re-fetched from memory. In the absence of an ISB,
>there is no limit on the number of times such an instruction
>might be executed without being re-fetched from memory.
> 
> The ARM architecture does not require the hardware to ensure
> coherency between instruction caches and memory, even for
> locations of shared memory.

Well, yeah, that's traditional insn caches on multiple cores.  From user 
space you need kernel help for this, doing interprocess interrupts to 
flush all such buffers on all cores (or at least those potentially 
fetching stuff in the patched region, if such granularity is possible).  
An implementation providing such is non-broken :)  Alternatively the 
various invalidate cache instructions need to have a form that invalidates 
the i$ on all cores.

That's just normal code patching on multi-core systems.  Nothing specific 
for aarch64 and nothing the GCC side needs to cater for.
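
A minimal sketch of the per-PE half of this: the writer can clean and
invalidate the range (roughly what GCC's __builtin___clear_cache does
on AArch64), but each other PE that may execute the patched region
still needs its own context synchronization (ISB) -- the part the IPI
would provide.

void
publish_patch (char *begin, char *end)
{
  __builtin___clear_cache (begin, end);
  /* Other PEs must still execute an ISB before running the
     patched code.  */
}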

> I have wondered if it might be a good idea to use an inter-processor 
> interrupt to force a context synchronization event across all PEs.

So, this, exactly.


Ciao,
Michael.


Re: [PATCH, rs6000] Expand vec_ld and vec_st during parsing to improve performance

2016-04-19 Thread Bill Schmidt
On Tue, 2016-04-19 at 10:09 +0200, Richard Biener wrote:
> On Tue, Apr 19, 2016 at 12:05 AM, Bill Schmidt
>  wrote:
> > Hi,
> >
> > Expanding built-ins in the usual way (leaving them as calls until
> > expanding into RTL) restricts the amount of optimization that can be
> > performed on the code represented by the built-ins.  This has been
> > observed to be particularly bad for the vec_ld and vec_st built-ins on
> > PowerPC, which represent the lvx and stvx instructions.  Currently these
> > are expanded into UNSPECs that are left untouched by the optimizers, so
> > no redundant load or store elimination can take place.  For certain
> > idiomatic usages, this leads to very bad performance.
> >
> > Initially I planned to just change the UNSPEC representation to RTL that
> > directly expresses the address masking implicit in lvx and stvx.  This
> > turns out to be only partially successful in improving performance.
> > Among other things, by the time we reach RTL we have lost track of the
> > __restrict__ attribute, leading to more appearances of may-alias
> > relationships than should really be present.  Instead, this patch
> > expands the built-ins during parsing so that they are exposed to all
> > GIMPLE optimizations as well.
> >
> > This works well for vec_ld and vec_st.  It is also possible for
> > programmers to instead use __builtin_altivec_lvx_<mode> and
> > __builtin_altivec_stvx_<mode>.  These are not so easy to catch during
> > parsing, since they are not processed by the overloaded built-in
> > function table.  For these, I am currently falling back to expansion
> > during RTL while still exposing the address-masking semantics, which
> > seems ok for these somewhat obscure built-ins.  At some future time we
> > may decide to handle them similarly to vec_ld and vec_st.
> >
> > For POWER8 little-endian only, the loads and stores during expand time
> > require some special handling, since the POWER8 expanders want to
> > convert these to lxvd2x/xxswapd and xxswapd/stxvd2x.  To deal with this,
> > I've added an extra pre-pass to the swap optimization phase that
> > recognizes the lvx and stvx patterns and canonicalizes them so they'll
> > be properly recognized.  This isn't an issue for earlier or later
> > processors, or for big-endian POWER8, so doing this as part of swap
> > optimization is appropriate.
> >
> > We have a lot of existing test cases for this code, which proved very
> > useful in discovering bugs, so I haven't seen a reason to add any new
> > tests.
> >
> > The patch is fairly large, but it isn't feasible to break it up into
> > smaller units without leaving something in a broken state.  So I will
> > have to just apologize for the size and leave it at that.  Sorry! :)
> >
> > Bootstrapped and tested successfully on powerpc64le-unknown-linux-gnu,
> > and on powerpc64-unknown-linux-gnu (-m32 and -m64) with no regressions.
> > Is this ok for trunk after GCC 6 releases?
> 
> Just took a very quick look but it seems you are using integer arithmetic
> for the pointer adjustment and bit-and.  You could use POINTER_PLUS_EXPR
> for the addition and BIT_AND_EXPR is also valid on pointer types.  Which
> means you don't need conversions to/from sizetype.

Thanks, I appreciate that help -- I had tried to use BIT_AND_EXPR on
pointer types but it didn't work; I must have done something wrong, and
assumed it wasn't allowed.  I'll take another crack at that, as the
conversions are definitely an annoyance.

Using PLUS_EXPR was automatically getting me a POINTER_PLUS_EXPR based
on type, but it is probably best to make that explicit.
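
A rough sketch of the tree building being suggested, assuming "addr"
is a pointer-typed tree and "off" has sizetype (both names made up for
this example); the low-four-bit masking mirrors the lvx/stvx address
semantics described above:

  tree ptrtype = TREE_TYPE (addr);
  tree moved = fold_build2 (POINTER_PLUS_EXPR, ptrtype, addr, off);
  tree aligned = fold_build2 (BIT_AND_EXPR, ptrtype, moved,
			      fold_convert (ptrtype,
					    build_int_cst (sizetype, -16)));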

> 
> x86 nowadays has intrinsics implemented as inlines - they come from
> header files.  It seems for ppc the intrinsics are somehow magically
> there, w/o a header file?

Yes, and we really need to start gravitating to the inlines in header
files model (Clang does this successfully for PowerPC and it is quite a
bit cleaner, and allows for more optimization).  We have a very
complicated setup for handling overloaded built-ins that could use a
rewrite once somebody has time to attack it.  We do have one header file
for built-ins (altivec.h) but it largely just #defines well-known
aliases for the internal built-in names.  We have a lot of other things
we have to do in GCC 7, but I'd like to do something about this in the
relatively near future.  (Things like "vec_add" that just do a vector
addition aren't expanded until RTL time??  Gack.)

David, let me take another shot at eliminating the sizetype conversions
before you review this.

Thanks!

Bill

> 
> Richard.
> 
> > Thanks,
> > Bill
> >
> >
> > 2016-04-18  Bill Schmidt  
> >
> > * config/rs6000/altivec.md (altivec_lvx_<mode>): Remove.
> > (altivec_lvx_<mode>_internal): Document.
> > (altivec_lvx_<mode>_2op): New define_insn.
> > (altivec_lvx_<mode>_1op): Likewise.
> > (altivec_lvx_<mode>_2op_si): Likewise.
> > (altivec_lvx_<mode>_1op_si): Likewise.
> >

[patch] Fix comment in header for gcc-6-branch

2016-04-19 Thread Jonathan Wakely

I already did this as part of a larger change on trunk, this just
fixes the comment on the gcc-6-branch.

Jakub approved this on IRC yesterday.


commit 60c6d6865513d433a079c0fa0a6a152968c48721
Author: Jonathan Wakely 
Date:   Mon Apr 18 20:30:40 2016 +0100

	* include/bits/random.h: Fix filename in comment.

diff --git a/libstdc++-v3/include/bits/random.h b/libstdc++-v3/include/bits/random.h
index 1babe80..9de480c 100644
--- a/libstdc++-v3/include/bits/random.h
+++ b/libstdc++-v3/include/bits/random.h
@@ -1649,7 +1649,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-  // std::uniform_int_distribution is defined in 
+  // std::uniform_int_distribution is defined in 
 
   /**
* @brief Return true if two uniform integer distributions have


[PATCH v2] [libatomic] Add RTEMS support

2016-04-19 Thread Sebastian Huber
v2: Do not use architecture configuration due to broken ARM libatomic
support.

gcc/

* config/rtems.h (LIB_SPEC): Add -latomic.

libatomic/

* configure.tgt (*-*-rtems*): New supported target.
* config/rtems/host-config.h: New file.
* config/rtems/lock.c: Likewise.
---
 gcc/config/rtems.h   |  2 +-
 libatomic/config/rtems/host-config.h | 41 
 libatomic/config/rtems/lock.c| 37 
 libatomic/configure.tgt  | 10 +
 4 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 libatomic/config/rtems/host-config.h
 create mode 100644 libatomic/config/rtems/lock.c

diff --git a/gcc/config/rtems.h b/gcc/config/rtems.h
index f13f72f..e005547 100644
--- a/gcc/config/rtems.h
+++ b/gcc/config/rtems.h
@@ -45,6 +45,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #define LIB_SPEC "%{!qrtems: " STD_LIB_SPEC "} " \
 "%{!nostdlib: %{qrtems: --start-group \
  -lrtemsbsp -lrtemscpu \
- -lc -lgcc --end-group %{!qnolinkcmds: -T linkcmds%s}}}"
+ -latomic -lc -lgcc --end-group %{!qnolinkcmds: -T linkcmds%s}}}"
 
 #define TARGET_POSIX_IO
diff --git a/libatomic/config/rtems/host-config.h b/libatomic/config/rtems/host-config.h
new file mode 100644
index 000..d11e9ef
--- /dev/null
+++ b/libatomic/config/rtems/host-config.h
@@ -0,0 +1,41 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by Sebastian Huber .
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* Included after all more target-specific host-config.h.  */
+
+#include 
+
+static inline UWORD
+protect_start (void *ptr)
+{
+  return _Libatomic_Protect_start (ptr);
+}
+
+static inline void
+protect_end (void *ptr, UWORD isr_level)
+{
+  _Libatomic_Protect_end (ptr, isr_level);
+}
+
+#include_next <host-config.h>
diff --git a/libatomic/config/rtems/lock.c b/libatomic/config/rtems/lock.c
new file mode 100644
index 000..f999f9b
--- /dev/null
+++ b/libatomic/config/rtems/lock.c
@@ -0,0 +1,37 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by Sebastian Huber .
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#include "libatomic_i.h"
+
+void
+libat_lock_n (void *ptr, size_t n)
+{
+  _Libatomic_Lock_n (ptr, n);
+}
+
+void
+libat_unlock_n (void *ptr, size_t n)
+{
+  _Libatomic_Unlock_n (ptr, n);
+}
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index c5470d7..eab2765 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -26,6 +26,10 @@
 # Map the target cpu to an ARCH sub-directory.  At the same time,
 # work out any special compilation flags as necessary.
 
+# Give operating systems the opportunity to discard XCFLAGS modifications based
+# on ${target_cpu}.  For example to allow proper use of multilibs.
+configure_tgt_pre_target_cpu_XCFLAGS="${XCFLAGS}"
+
 case "${target_cpu}" in
   alpha*)
# fenv.c needs this option to generate inexact 

[Ada] Withing Ghost units

2016-04-19 Thread Arnaud Charlet
This patch implements context clauses for Ghost compilation units. It is now
possible to "with" and "use" a Ghost unit. If the Assertion_Policy for Ghost
is set to "Ignore", the Ghost compilation units do not generate ALI or object
files, and no cross-referencing information is present in living ALI files.


-- Source --


--  checked.adc

pragma Assertion_Policy (Ghost => Check);

--  ignored.adc

pragma Assertion_Policy (Ghost => Ignore);

--  g.ads

package G with Ghost is
   G_Obj : Integer := 1;
   procedure Force_Body;
end G;

with Ada.Text_IO; use Ada.Text_IO;

package body G is
   procedure Force_Body is
  G_Obj_2 : constant Integer := 2;
   begin
  null;
   end Force_Body;

begin
   Put_Line ("G");
end G;

--  gp.ads

procedure GP with Ghost;

--  gp.adb

with Ada.Text_IO; use Ada.Text_IO;

procedure GP is
begin
   Put_Line ("GP");
end GP;

--  gparent.ads

package Gparent with Ghost is
   procedure Force_Body;
end Gparent;

--  gparent.adb

with Ada.Text_IO; use Ada.Text_IO;

package body Gparent is
   procedure Force_Body is begin null; end Force_Body;
begin
   Put_Line ("Gparent");
end Gparent;

--  gparent-lchild.ads

package Gparent.Lchild is
   procedure Force_Body;
end Gparent.Lchild;

--  gparent-lchild.adb

with Ada.Text_IO; use Ada.Text_IO;

package body Gparent.Lchild is
   procedure Force_Body is begin null; end Force_Body;
begin
   Put_Line ("Gparent.Lchild");
end Gparent.Lchild;

--  g_withs_g.ads

with G;  use G;
with GP;

package G_Withs_G with Ghost is
   GWG_Obj : Integer := G_Obj;
   procedure Force_Body;
end G_Withs_G;

--  g_withs_g.adb

with Ada.Text_IO; use Ada.Text_IO;

package body G_Withs_G is
   procedure Force_Body is
   begin
  G_Obj := G_Obj + 1;
   end Force_Body;

begin
   Put_Line ("G_Withs_G");
   GP;
end G_Withs_G;

--  g_withs_g_withs_g.ads

with G_Withs_G; use G_Withs_G;

package G_Withs_G_Withs_G with Ghost is
   GWGWG_Obj : Integer := GWG_Obj;
   procedure Force_Body;
end G_Withs_G_Withs_G;

--  g_withs_g_withs_g.adb

with Ada.Text_IO; use Ada.Text_IO;

package body G_Withs_G_Withs_G is
   procedure Force_Body is
   begin
  GWG_Obj := GWG_Obj + 1;
   end Force_Body;

begin
   Put_Line ("G_Withs_G_Withs_G");
end G_Withs_G_Withs_G;

--  g_withs_g_withs_l.ads

with G_Withs_L; use G_Withs_L;

package G_Withs_G_Withs_L with Ghost is
   GWGWL_Obj : Integer := GWL_Obj;
   procedure Force_Body;
end G_Withs_G_Withs_L;

--  g_withs_g_withs_l.adb

with Ada.Text_IO; use Ada.Text_IO;

package body G_Withs_G_Withs_L is
   procedure Force_Body is
   begin
  GWL_Obj := GWL_Obj + 1;
   end Force_Body;

begin
   Put_Line ("G_Withs_G_Withs_L");
end G_Withs_G_Withs_L;

--  g_withs_l.ads

with L;  use L;
with LP;

package G_Withs_L with Ghost is
   GWL_Obj : Integer := L_Obj;
   procedure Force_Body;
end G_Withs_L;

--  g_withs_l.adb

with Ada.Text_IO; use Ada.Text_IO;

package body G_Withs_L is
   procedure Force_Body is
   begin
  L_Obj := L_Obj + 1;
   end Force_Body;

begin
   Put_Line ("G_Withs_L");
   LP;
end G_Withs_L;

--  g_withs_l_withs_g.ads

with L_Withs_G; use L_Withs_G;

package G_Withs_L_Withs_G with Ghost is
   GWLWG_Obj : Integer := LWG_Obj;
   procedure Force_Body;
end G_Withs_L_Withs_G;

--  g_withs_l_withs_g.adb

with Ada.Text_IO; use Ada.Text_IO;

package body G_Withs_L_Withs_G is
   procedure Force_Body is
   begin
  LWG_Obj := LWG_Obj + 1;
   end Force_Body;

begin
   Put_Line ("G_Withs_L_Withs_G");
end G_Withs_L_Withs_G;

--  g_withs_l_withs_l.ads

with L_Withs_L; use L_Withs_L;

package G_Withs_L_Withs_L with Ghost is
   GWLWL_Obj : Integer := LWL_Obj;
   procedure Force_Body;
end G_Withs_L_Withs_L;

--  g_withs_l_withs_l.adb

with Ada.Text_IO; use Ada.Text_IO;

package body G_Withs_L_Withs_L is
   procedure Force_Body is
   begin
  LWL_Obj := LWL_Obj + 1;
   end Force_Body;

begin
   Put_Line ("G_Withs_L_Withs_L");
end G_Withs_L_Withs_L;

--  l.ads

package L is
   L_Obj : Integer := 1;
   procedure Force_Body;
end L;

--  l.adb

with Ada.Text_IO; use Ada.Text_IO;

package body L is
   procedure Force_Body is
  L_Obj_2 : constant Integer := 2;
   begin
  null;
   end Force_Body;

begin
   Put_Line ("L");
end L;

--  lp.ads

procedure LP;

--  lp.adb

with Ada.Text_IO; use Ada.Text_IO;

procedure LP is
begin
   Put_Line ("LP");
end LP;

--  l_withs_g.ads

with G;  use G;
with GP;

package L_Withs_G is
   LWG_Obj : Integer := G_Obj with Ghost;
   procedure Force_Body;
end L_Withs_G;

--  l_withs_g.adb

with Ada.Text_IO; use Ada.Text_IO;

package body L_Withs_G is
   procedure Force_Body is
   begin
  G_Obj := G_Obj + 1;
   end Force_Body;

begin
   Put_Line ("L_Withs_G");
   GP;
end L_Withs_G;

--  l_withs_g_withs_g.ads

with G_Withs_G; use G_Withs_G;

package L_Withs_G_Withs_G is
   LWGWG_Obj : Integer := GWG_Obj with Ghost;
   procedure Force_Body;
end L_Withs_G_Withs_G;

--  l_withs_g_withs_g.adb

with Ada.Text_IO; use Ada.Text_IO;

package body L_Withs_G_Withs_G is
   procedure 

[Ada] Illegal use of type name in a context where it is not a current instance.

2016-04-19 Thread Arnaud Charlet
This patch fixes an omission in the code that checks the legality of a type
name as a prefix of 'access. Such uses are allowed when the type name is a
current instance, but previously the compiler also accepted them within
aggregates that are not within the declarative region of the type.

Compiling priority_queues.adb must yield:


   priority_queues.adb:85:48:
   "Unchecked_Access" attribute cannot be applied to type
   priority_queues.adb:86:48:
   "Unchecked_Access" attribute cannot be applied to type
---
with System;
with Ada.Containers.Synchronized_Queue_Interfaces;
with Ada.Finalization;
with Ada.Containers;

use Ada.Containers;

generic
   with package Queue_Interfaces is
 new Ada.Containers.Synchronized_Queue_Interfaces (<>);

   type Queue_Priority is private;

   with function Get_Priority
 (Element : Queue_Interfaces.Element_Type) return Queue_Priority is <>;

   with function Before
 (Left, Right : Queue_Priority) return Boolean is <>;

   Default_Ceiling : System.Any_Priority := System.Priority'Last;

package Priority_Queues is
   pragma Preelaborate;

   package Implementation is

  --  All identifiers in this unit are implementation defined

  pragma Implementation_Defined;

  type List_Type is tagged limited private;

  procedure Enqueue
(List : in out List_Type;
 New_Item : Queue_Interfaces.Element_Type);

  procedure Dequeue
(List: in out List_Type;
 Element : out Queue_Interfaces.Element_Type);

  procedure Dequeue
(List : in out List_Type;
 At_Least : Queue_Priority;
 Element  : in out Queue_Interfaces.Element_Type;
 Success  : out Boolean);

  function Length (List : List_Type) return Count_Type;

  function Max_Length (List : List_Type) return Count_Type;

   private

  type Node_Type;
  type Node_Access is access all Node_Type;

  type Node_Type is limited record
 Element : Queue_Interfaces.Element_Type;
 Next: Node_Access;
 First_Equal, Last_Equal : Node_Access;
  end record;

  type List_Type is new Ada.Finalization.Limited_Controlled with record
 First, Last   : Node_Access;
 Length: Count_Type := 0;
 Max_Length: Count_Type := 0;
  end record;

  overriding procedure Finalize (List : in out List_Type);

   end Implementation;

   protected type Queue (Ceiling : System.Any_Priority := Default_Ceiling)
   with
 Priority => Ceiling
   is new Queue_Interfaces.Queue with

  overriding entry Enqueue (New_Item : Queue_Interfaces.Element_Type);

  overriding entry Dequeue (Element : out Queue_Interfaces.Element_Type);

  --  The priority queue operation Dequeue_Only_High_Priority had been a
  --  protected entry in early drafts of AI05-0159, but it was discovered
  --  that that operation as specified was not in fact implementable. The
  --  operation was changed from an entry to a protected procedure per the
  --  ARG meeting in Edinburgh (June 2011), with a different signature and
  --  semantics.

  procedure Dequeue_Only_High_Priority
(At_Least : Queue_Priority;
 Element  : in out Queue_Interfaces.Element_Type;
 Success  : out Boolean);

  overriding function Current_Use return Count_Type;

  overriding function Peak_Use return Count_Type;

   private
  List : Implementation.List_Type;
   end Queue;

end Priority_Queues;
---
with Ada.Unchecked_Deallocation;

package body Priority_Queues is

   package body Implementation is

  ---
  -- Local Subprograms --
  ---

  procedure Free is
 new Ada.Unchecked_Deallocation (Node_Type, Node_Access);

  -
  -- Dequeue --
  -

  procedure Dequeue
(List: in out List_Type;
 Element : out Queue_Interfaces.Element_Type)
  is
 X : Node_Access;
  begin
 Element := List.First.Element;

 X := List.First;
 if X.Last_Equal = X then
-- Nothing to do
null;
 else
-- new First_Equal is next node
X.Last_Equal.First_Equal := X.Next;
-- update First_Equal / Last_Equal of next node with current last
X.Next.Last_Equal := X.Last_Equal;
X.Next.First_Equal := X.Next;
 end if;

 List.First := List.First.Next;

 if List.First = null then
List.Last := null;
 end if;

 List.Length := List.Length - 1;
 pragma Warnings (Off, """X"" modified by call, but never referenced");
 Free (X);
 pragma Warnings (On, """X"" modified by call, but never referenced");
  end Dequeue;

  procedure Dequeue
(List : in out List_Type;
 At_Least : Queue_Priority;
 Element  : in out Queue_Interfaces.Element_Type;
   

Re: [PATCH][GCC7] Remove scaling of COMPONENT_REF/ARRAY_REF ops 2/3

2016-04-19 Thread Richard Biener
On Fri, Feb 19, 2016 at 9:33 AM, Eric Botcazou  wrote:
>> The following experiment resulted from looking at making
>> array_ref_low_bound and array_ref_element_size non-mutating.  Again
>> I wondered why we do this strange scaling by offset/element alignment.
>
> I personally never really grasped it either...
>
>> So - I hope somebody from Adacore can evaluate this patch code-generation
>> wise.
>
> I will, this looks like a valuable simplification to me.

Did you manage to do this yet?  I'm flushing my stage1 queue of
"simple cleanups" right now.

Thanks,
Richard.


Re: [PATCH] Early "SSA" prerequesite - make SSA def stmt update cheaper

2016-04-19 Thread Richard Biener
On Thu, Jan 21, 2016 at 2:57 PM, Richard Biener  wrote:
>
> This makes the SSA def stmt update during inlining cheaper by adjusting
> it after remapping a SSA def instead of via an extra walk over all stmt
> defs (which incidentally is not possible with FOR_EACH_SSA_* during
> "early SSA" as we don't have SSA operands there).
>
> I've tested this independently of the
> [RFC] Delayed folding, match-and-simplify and early GIMPLE
> patch.
>
> This exposes that the walk_gimple_* stuff is somewhat awkward and
> needs some refactoring (can't re-construct wi->gsi as gsi_for_stmt
> only works for stmts in a BB and thus when we have a CFG).  Need to
> think about sth (simplest is require a gsi for walk_gimple_op, like
> we do for walk_gimple_stmt).
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Queued for GCC 7.

Re-bootstrapped and tested on x86_64-unknown-linux-gnu, applied
to trunk as r235190.

Richard.

> Richard.
>
> 2016-01-21  Richard Biener  
>
> * gimple-walk.h (struct walk_stmt_info): Add stmt member.
> * gimple-walk.c (walk_gimple_op): Initialize it.
> (walk_gimple_asm): Set wi->is_lhs before each callback invocation.
> * tree-inline.c (remap_gimple_op_r): Set SSA_NAME_DEF_STMT when
> remapping SSA names of defs.
> (copy_bb): Remove walk over all SSA defs and SSA_NAME_DEF_STMT
> adjustment.
>
> Index: gcc/gimple-walk.c
> ===
> *** gcc/gimple-walk.c   (revision 232670)
> --- gcc/gimple-walk.c   (working copy)
> *** walk_gimple_asm (gasm *stmt, walk_tree_f
> *** 100,108 
> noutputs = gimple_asm_noutputs (stmt);
> oconstraints = (const char **) alloca ((noutputs) * sizeof (const char 
> *));
>
> -   if (wi)
> - wi->is_lhs = true;
> -
> for (i = 0; i < noutputs; i++)
>   {
> op = gimple_asm_output_op (stmt, i);
> --- 100,105 
> *** walk_gimple_asm (gasm *stmt, walk_tree_f
> *** 114,119 
> --- 111,118 
>_reg, _inout))
> wi->val_only = (allows_reg || !allows_mem);
> }
> +   if (wi)
> +   wi->is_lhs = true;
> ret = walk_tree (_VALUE (op), callback_op, wi, NULL);
> if (ret)
> return ret;
> *** walk_gimple_op (gimple *stmt, walk_tree_
> *** 182,187 
> --- 181,189 
> unsigned i;
> tree ret = NULL_TREE;
>
> +   if (wi)
> + wi->stmt = stmt;
> +
> switch (gimple_code (stmt))
>   {
>   case GIMPLE_ASSIGN:
> Index: gcc/gimple-walk.h
> ===
> *** gcc/gimple-walk.h   (revision 232670)
> --- gcc/gimple-walk.h   (working copy)
> *** struct walk_stmt_info
> *** 28,33 
> --- 28,34 
>   {
> /* Points to the current statement being walked.  */
> gimple_stmt_iterator gsi;
> +   gimple *stmt;
>
> /* Additional data that the callback functions may want to carry
>through the recursion.  */
> Index: gcc/tree-inline.c
> ===
> *** gcc/tree-inline.c   (revision 232670)
> --- gcc/tree-inline.c   (working copy)
> *** remap_gimple_op_r (tree *tp, int *walk_s
> *** 862,871 
> --- 862,877 
> copy_body_data *id = (copy_body_data *) wi_p->info;
> tree fn = id->src_fn;
>
> +   /* For recursive invocations this is no longer the LHS itself.  */
> +   bool is_lhs = wi_p->is_lhs;
> +   wi_p->is_lhs = false;
> +
> if (TREE_CODE (*tp) == SSA_NAME)
>   {
> *tp = remap_ssa_name (*tp, id);
> *walk_subtrees = 0;
> +   if (is_lhs)
> +   SSA_NAME_DEF_STMT (*tp) = wi_p->stmt;
> return NULL;
>   }
> else if (auto_var_in_fn_p (*tp, fn))
> *** copy_bb (copy_body_data *id, basic_block
> *** 2089,2104 
>   maybe_duplicate_eh_stmt_fn (cfun, stmt, id->src_cfun, orig_stmt,
>   id->eh_map, id->eh_lp_nr);
>
> - if (gimple_in_ssa_p (cfun) && !is_gimple_debug (stmt))
> -   {
> - ssa_op_iter i;
> - tree def;
> -
> - FOR_EACH_SSA_TREE_OPERAND (def, stmt, i, SSA_OP_DEF)
> -   if (TREE_CODE (def) == SSA_NAME)
> - SSA_NAME_DEF_STMT (def) = stmt;
> -   }
> -
>   gsi_next (_gsi);
> }
> while (!gsi_end_p (copy_gsi));
> --- 2095,2100 


Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-04-19 Thread Richard Biener
On Tue, Apr 19, 2016 at 1:36 PM, Richard Biener
 wrote:
> On Tue, Apr 19, 2016 at 1:35 PM, Richard Biener
>  wrote:
>> On Mon, Feb 29, 2016 at 11:53 AM, kugan
>>  wrote:

 Err.  I think the way you implement that in reassoc is ad-hoc and not
 related to reassoc at all.

 In fact what reassoc is missing is to handle

  -y * z * (-w) * x -> y * z * w * x

 thus optimize negates as if they were additional * -1 entries in a
 multiplication chain.  And
 then optimize a single remaining * -1 in the result chain to a negate.

 Then match.pd handles x + (-y) -> x - y (independent of -frounding-math
 btw).

 So no, this isn't ok as-is, IMHO you want to expand the multiplication ops
 chain
 pulling in the * -1 ops (if single-use, of course).

>>>
>>> I agree. Here is the updated patch along what you suggested. Does this look
>>> better ?
>>
>> It looks better but I think you want to do factor_out_negate_expr before the
>> first qsort/optimize_ops_list call to catch -1. * z * (-w) which also means 
>> you
>> want to simply append a -1. to the ops list rather than adjusting the result
>> with a negate stmt.
>>
>> You also need to guard all this with ! HONOR_SNANS (type) && (!
>> HONOR_SIGNED_ZEROS (type)
>> || ! COMPLEX_FLOAT_TYPE_P (type)) (see match.pd pattern transforming x
>> * -1. to -x).
>
> And please add at least one testcase.

And it appears to me that you could handle this in linearize_expr_tree
as well, similar
to how we handle MULT_EXPR with acceptable_pow_call there by adding -1. and
op into the ops vec.

Similar for the x + x + x -> 3 * x case we'd want to add a repeat op when seeing
x + 3 * x + x and use ->count in that patch as well.

Best split out the

  if (rhscode == MULT_EXPR
  && TREE_CODE (binrhs) == SSA_NAME
  && acceptable_pow_call (SSA_NAME_DEF_STMT (binrhs), , ))
{
  add_repeat_to_ops_vec (ops, base, exponent);
  gimple_set_visited (SSA_NAME_DEF_STMT (binrhs), true);
}
  else
add_to_ops_vec (ops, binrhs);

pattern into a helper that handles the other cases.

Richard.

> Richard.
>
>> Richard.
>>
>>> Thanks,
>>> Kugan
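
To make the transform concrete, a hypothetical testcase sketch (not
part of the thread; the FP variants would need the HONOR_SNANS /
HONOR_SIGNED_ZEROS guards discussed above):

double
f (double x, double y, double z)
{
  return x + (-y * z * z);	/* expect: x - y * z * z */
}

double
g (double x, double y, double z, double w)
{
  return x + (-y) * z * (-w);	/* expect: x + y * z * w, negates cancelled */
}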


Re: [RFC][PATCH][PR63586] Convert x+x+x+x into 4*x

2016-04-19 Thread Richard Biener
On Tue, Apr 19, 2016 at 1:56 PM, Richard Biener
 wrote:
> On Wed, Mar 2, 2016 at 3:28 PM, Christophe Lyon
>  wrote:
>> On 29 February 2016 at 05:28, kugan  
>> wrote:
>>>
 That looks better, but I think the unordered_remove will break operand
 sorting
 and thus you probably don't handle x + x + x + x + y + y + y + y + y +
 y + z + z + z + z
 optimally.

 I'd say you simply want to avoid the recursion and collect a vector of
 [start, end] pairs
 before doing any modification to the ops vector.
>>>
>>>
>>> Hi Richard,
>>>
>>> Is the attached patch looks better?
>>>
>>
>> Minor comment, I've noticed typos in your updated comment:
>> "There should be two multiplication left in test1 (inculding one generated"
>> should be
>> "There should be two multiplications left in test1 (including one generated"
>
> +/* Transoform repeated addition of same values into multiply with
> +   constant.  */
>
> Transform
>
> +static void
> +transform_add_to_multiply (gimple_stmt_iterator *gsi, gimple *stmt,
> vec *ops)
>
> split the long line
>
> op_list looks redundant - ops[start]->op gives you the desired value
> already and if you
> use a vec> you can have a more C++ish start,end pair.
>
> +  tree tmp = make_temp_ssa_name (TREE_TYPE (op), NULL, "reassocmul");
> +  gassign *mul_stmt = gimple_build_assign (tmp, MULT_EXPR,
> +  op, build_int_cst
> (TREE_TYPE(op), count));
>
> this won't work for floating point or complex numbers - you need to use sth 
> like
> fold_convert (TREE_TYPE (op), build_int_cst (integer_type_node, count));
>
> For FP types you need to guard the transform with 
> flag_unsafe_math_optimizations
>
> +  gimple_set_location (mul_stmt, gimple_location (stmt));
> +  gimple_set_uid (mul_stmt, gimple_uid (stmt));
> +  gsi_insert_before (gsi, mul_stmt, GSI_SAME_STMT);
>
> I think you do not want to set the stmt uid and you want to insert the
> stmt right
> after the def of op (or at the original first add - though you can't
> get your hands at
> that easily).  You also don't want to set the location to the last stmt of the
> whole add sequence - simply leave it unset.
>
> +  oe = operand_entry_pool.allocate ();
> +  oe->op = tmp;
> +  oe->rank = get_rank (op) * count;
>
> ?  Why that?  oe->rank should be get_rank (tmp).
>
> +  oe->id = 0;
>
> other places use next_operand_entry_id++.  I think you want to simply
> use add_to_ops_vec (oe, tmp); here for all of the above.
>
> Please return whether you did any optimization and do the
> qsort of the operand vector only if you did sth.
>
> Testcase with FP math missing.  Likewise with complex or vector math.

Btw, does it handle associating

  x + 3 * x + x

to

  5 * x

?

Richard.

> Thanks,
> Richard.
>
>>> Thanks,
>>> Kugan


Re: [RFC][PATCH][PR63586] Convert x+x+x+x into 4*x

2016-04-19 Thread Richard Biener
On Wed, Mar 2, 2016 at 3:28 PM, Christophe Lyon
 wrote:
> On 29 February 2016 at 05:28, kugan  wrote:
>>
>>> That looks better, but I think the unordered_remove will break operand
>>> sorting and thus you probably don't handle
>>> x + x + x + x + y + y + y + y + y + y + z + z + z + z optimally.
>>>
>>> I'd say you simply want to avoid the recursion and collect a vector of
>>> [start, end] pairs
>>> before doing any modification to the ops vector.
>>
>>
>> Hi Richard,
>>
>> Is the attached patch looks better?
>>
>
> Minor comment, I've noticed typos in your updated comment:
> "There should be two multiplication left in test1 (inculding one generated"
> should be
> "There should be two multiplications left in test1 (including one generated"

+/* Transoform repeated addition of same values into multiply with
+   constant.  */

Transform

+static void
+transform_add_to_multiply (gimple_stmt_iterator *gsi, gimple *stmt,
+                           vec<operand_entry_t> *ops)

split the long line

op_list looks redundant - ops[start]->op gives you the desired value
already and if you use a vec<std::pair <int, int>> you can have a more
C++ish start,end pair.

+  tree tmp = make_temp_ssa_name (TREE_TYPE (op), NULL, "reassocmul");
+  gassign *mul_stmt = gimple_build_assign (tmp, MULT_EXPR, op,
+                      build_int_cst (TREE_TYPE (op), count));

this won't work for floating point or complex numbers - you need to use sth like
fold_convert (TREE_TYPE (op), build_int_cst (integer_type_node, count));

For FP types you need to guard the transform with flag_unsafe_math_optimizations

+  gimple_set_location (mul_stmt, gimple_location (stmt));
+  gimple_set_uid (mul_stmt, gimple_uid (stmt));
+  gsi_insert_before (gsi, mul_stmt, GSI_SAME_STMT);

I think you do not want to set the stmt uid and you want to insert the
stmt right after the def of op (or at the original first add - though
you can't get your hands at that easily).  You also don't want to set
the location to the last stmt of the whole add sequence - simply leave
it unset.

+  oe = operand_entry_pool.allocate ();
+  oe->op = tmp;
+  oe->rank = get_rank (op) * count;

?  Why that?  oe->rank should be get_rank (tmp).

+  oe->id = 0;

other places use next_operand_entry_id++.  I think you want to simply
use add_to_ops_vec (oe, tmp); here for all of the above.

Please return whether you did any optimization and do the
qsort of the operand vector only if you did sth.

Testcase with FP math missing.  Likewise with complex or vector math.

Thanks,
Richard.

>> Thanks,
>> Kugan


Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-04-19 Thread Richard Biener
On Tue, Apr 19, 2016 at 1:35 PM, Richard Biener
 wrote:
> On Mon, Feb 29, 2016 at 11:53 AM, kugan
>  wrote:
>>>
>>> Err.  I think the way you implement that in reassoc is ad-hoc and not
>>> related to reassoc at all.
>>>
>>> In fact what reassoc is missing is to handle
>>>
>>>   -y * z * (-w) * x -> y * z * w * x
>>>
>>> thus optimize negates as if they were additional * -1 entries in a
>>> multiplication chain.  And
>>> then optimize a single remaining * -1 in the result chain to a negate.
>>>
>>> Then match.pd handles x + (-y) -> x - y (independent of -frounding-math
>>> btw).
>>>
>>> So no, this isn't ok as-is, IMHO you want to expand the multiplication
>>> ops chain pulling in the * -1 ops (if single-use, of course).
>>>
>>
>> I agree. Here is the updated patch along what you suggested. Does this look
>> better ?
>
> It looks better but I think you want to do factor_out_negate_expr before the
> first qsort/optimize_ops_list call to catch -1. * z * (-w) which also means
> you want to simply append a -1. to the ops list rather than adjusting the result
> with a negate stmt.
>
> You also need to guard all this with ! HONOR_SNANS (type)
> && (! HONOR_SIGNED_ZEROS (type) || ! COMPLEX_FLOAT_TYPE_P (type))
> (see match.pd pattern transforming x * -1. to -x).

And please add at least one testcase.

Richard.

> Richard.
>
>> Thanks,
>> Kugan


Re: [RFC][PATCH][PR40921] Convert x + (-y * z * z) into x - y * z * z

2016-04-19 Thread Richard Biener
On Mon, Feb 29, 2016 at 11:53 AM, kugan
 wrote:
>>
>> Err.  I think the way you implement that in reassoc is ad-hoc and not
>> related to reassoc at all.
>>
>> In fact what reassoc is missing is to handle
>>
>>   -y * z * (-w) * x -> y * z * w * x
>>
>> thus optimize negates as if they were additional * -1 entries in a
>> multiplication chain.  And
>> then optimize a single remaining * -1 in the result chain to a negate.
>>
>> Then match.pd handles x + (-y) -> x - y (independent of -frounding-math
>> btw).
>>
>> So no, this isn't ok as-is, IMHO you want to expand the multiplication
>> ops chain pulling in the * -1 ops (if single-use, of course).
>>
>
> I agree. Here is the updated patch along what you suggested. Does this look
> better ?

It looks better but I think you want to do factor_out_negate_expr before the
first qsort/optimize_ops_list call to catch -1. * z * (-w) which also means you
want to simply append a -1. to the ops list rather than adjusting the result
with a negate stmt.

You also need to guard all this with ! HONOR_SNANS (type)
&& (! HONOR_SIGNED_ZEROS (type) || ! COMPLEX_FLOAT_TYPE_P (type))
(see match.pd pattern transforming x * -1. to -x).
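
(To make the negates-as-*-1-entries idea concrete, a standalone sketch of
the intended semantics -- illustrative C only, not the reassoc code:)

/* Sketch: fold each negated factor of a multiply chain in as an extra
   * -1 entry, then materialize at most one negate on the final result,
   e.g. -y * z * (-w) * x -> y * z * w * x.  */
#include <stdbool.h>
#include <stddef.h>

double
multiply_chain (const double *factors, const bool *negated, size_t n)
{
  bool negate = false;
  double acc = 1.0;
  for (size_t i = 0; i < n; i++)
    {
      acc *= factors[i];
      if (negated[i])
        negate = !negate;       /* the negate becomes a * -1 entry */
    }
  return negate ? -acc : acc;   /* single remaining negate, if any */
}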

Richard.

> Thanks,
> Kugan


Re: [wwwdocs] Document GCC 6 Solaris changes

2016-04-19 Thread Rainer Orth
Hi Gerald,

> On Mon, 18 Apr 2016, Rainer Orth wrote:
>> While updating docs for Solaris, here's a Solaris section for
>> gcc-6/changes.html.  Ok?
>
> Looks good to me, thanks.

thanks, installed.

Btw., I noticed that the subsections of `Operating Systems' are in
random order.  Shouldn't they be sorted alphabetically?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Re-apply reverted niter change 1/4

2016-04-19 Thread Bin.Cheng
On Tue, Apr 19, 2016 at 12:09 PM, Jan Hubicka  wrote:
>> > Index: tree-ssa-loop-ivopts.c
>> > ===
>> > --- tree-ssa-loop-ivopts.c  (revision 235064)
>> > +++ tree-ssa-loop-ivopts.c  (working copy)
>> > @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
>> >  {
>> >HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
>> >if (niter == -1)
>> > -return AVG_LOOP_NITER (loop);
>> > +{
>> > +  niter = max_stmt_executions_int (loop);
>> > +  if (niter == -1 || niter > AVG_LOOP_NITER (loop))
>> > +return AVG_LOOP_NITER (loop);
>> Any reason why AVG_LOOP_NITER is still used if niter gives larger number?
>
> if you have a loop like this
>
> int a[100];
>
> for (i=0; i<100; i++)
>   if (a[i])
>     break;
>
> max_stmt_executions_int will be 100 but that is just an upper bound, not a
> realistic estimate, and thus you can not assume that the average number of
> iterations is 100.  It is anywhere between 0 and 100, and I assume the
> constant of 5 which AVG_LOOP_NITER expands into was chosen to give ivopts a
> reasonable balance between setup cost and iteration cost.  For example,
> string manipulation loops tend to get large buffers and terminate in very
> few iterations.
>
> (I do not recall the data precisely, it has been a decade.  The average
> number of iterations of a random loop can be measured from profile
> feedback; it is somewhere between 3 and 10 for SPEC or GCC.)
> This is why:
> /* Loopback edge is taken.  */
> DEF_PREDICTOR (PRED_LOOP_BRANCH, "loop branch", HITRATE (86),
>                PRED_FLAG_FIRST_MATCH)
>
> /* Edge causing loop to terminate is probably not taken.  */
> DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (91),
>                PRED_FLAG_FIRST_MATCH)
>
> is set accordingly.
Thanks for explanation.

Thanks,
bin


Re: [PATCH] Fix PR70726

2016-04-19 Thread Jakub Jelinek
On Tue, Apr 19, 2016 at 01:03:51PM +0200, Richard Biener wrote:
> I am testing the following to fix PR70726.
> 
> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> 
> Richard.
> 
> 2016-04-19  Richard Biener  
> 
>   PR tree-optimization/70726
>   * tree-vect-stmts.c (vectorizable_shift): Do not use scalar
>   shift amounts from a pattern stmt operand.
> 
>   * g++.dg/vect/pr70726.cc: New testcase.

Looks reasonable to me.

Jakub


Re: Re-apply reverted niter change 1/4

2016-04-19 Thread Jan Hubicka
> > Index: tree-ssa-loop-ivopts.c
> > ===
> > --- tree-ssa-loop-ivopts.c  (revision 235064)
> > +++ tree-ssa-loop-ivopts.c  (working copy)
> > @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
> >  {
> >HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
> >if (niter == -1)
> > -return AVG_LOOP_NITER (loop);
> > +{
> > +  niter = max_stmt_executions_int (loop);
> > +  if (niter == -1 || niter > AVG_LOOP_NITER (loop))
> > +return AVG_LOOP_NITER (loop);
> Any reason why AVG_LOOP_NITER is still used if niter gives larger number?

if you have a loop like this

int a[100];

for (i=0; i<100; i++)
  if (a[i])
    break;

max_stmt_executions_int will be 100 but that is just an upper bound, not a
realistic estimate, and thus you can not assume that the average number of
iterations is 100.  It is anywhere between 0 and 100, and I assume the
constant of 5 which AVG_LOOP_NITER expands into was chosen to give ivopts a
reasonable balance between setup cost and iteration cost.  For example,
string manipulation loops tend to get large buffers and terminate in very
few iterations.

(I do not recall the data precisely, it has been a decade.  The average
number of iterations of a random loop can be measured from profile
feedback; it is somewhere between 3 and 10 for SPEC or GCC.)
This is why:
/* Loopback edge is taken.  */
DEF_PREDICTOR (PRED_LOOP_BRANCH, "loop branch", HITRATE (86),
               PRED_FLAG_FIRST_MATCH)

/* Edge causing loop to terminate is probably not taken.  */
DEF_PREDICTOR (PRED_LOOP_EXIT, "loop exit", HITRATE (91),
               PRED_FLAG_FIRST_MATCH)

is set accordingly.

Honza
> 
> Thanks,
> bin
> > +}
> >
> >return niter;
> >  }


[PATCH] Fix PR70726

2016-04-19 Thread Richard Biener

I am testing the following to fix PR70726.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-04-19  Richard Biener  

PR tree-optimization/70726
* tree-vect-stmts.c (vectorizable_shift): Do not use scalar
shift amounts from a pattern stmt operand.

* g++.dg/vect/pr70726.cc: New testcase.

Index: gcc/tree-vect-stmts.c
===
*** gcc/tree-vect-stmts.c   (revision 235188)
--- gcc/tree-vect-stmts.c   (working copy)
*** vectorizable_shift (gimple *stmt, gimple
*** 4532,4537 
--- 4532,4547 
if (!operand_equal_p (gimple_assign_rhs2 (slpstmt), op1, 0))
  scalar_shift_arg = false;
}
+ 
+      /* If the shift amount is computed by a pattern stmt we cannot
+         use the scalar amount directly thus give up and use a vector
+         shift.  */
+      if (dt[1] == vect_internal_def)
+        {
+          gimple *def = SSA_NAME_DEF_STMT (op1);
+          if (is_pattern_stmt_p (vinfo_for_stmt (def)))
+            scalar_shift_arg = false;
+        }
  }
else
  {
Index: gcc/testsuite/g++.dg/vect/pr70726.cc
===
*** gcc/testsuite/g++.dg/vect/pr70726.cc(revision 0)
--- gcc/testsuite/g++.dg/vect/pr70726.cc(working copy)
***
*** 0 
--- 1,19 
+ // { dg-do compile }
+ // { dg-additional-options "-Ofast" }
+ // { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } }
+ 
+ extern long a;
+ extern int b[100];
+ extern unsigned c[5][5][2][4][2][2][3];
+ void fn1() {
+   for (int d = 0; d < 2; d = d + 1)
+ for (int e = 0; e < 5; e = e + 1)
+   for (int f = 0; f < 3; f = f + 1)
+ for (int g = 0; g < 3; g = g + 1)
+   for (int h = 0; h < 2; h = h + 1)
+ for (int i = 0; i < 4; i = i + 1)
+   for (int j = 0; j < 2; j = j + 1)
+ for (int k = 0; k < 2; k = k + 1)
+   for (int l = 0; l < 3; l = l + 1)
+ c[d][e][h][i][j][k][l] = a << b[f * 5 + g] + 4;
+ }


Re: [PATCH] PR70674: S/390: Add memory barrier to stack pointer restore from fpr.

2016-04-19 Thread Jakub Jelinek
On Tue, Apr 19, 2016 at 11:02:34AM +0200, Andreas Krebbel wrote:
> I'll post the patches for the other two parts when gcc 7 entered stage
> 1 again.

It will not reenter stage 1 again, that happened last Friday ;)

> This needs to go into 4.9/5/6 branches.

Ok for 6, but I have formatting nit:

> +  rtx_insn *insn;
> +
> +  if (!FP_REGNO_P (cfun_gpr_save_slot (i)))
> + continue;
> +

Can you please:
rtx fpr = gen_rtx_REG (DImode, cfun_gpr_save_slot (i));
if (i == STACK_POINTER_REGNUM)
  insn = emit_insn (gen_stack_restore_from_fpr (fpr));
else
  insn = emit_move_insn (gen_rtx_REG (DImode, i), fpr);
That way IMHO it is more nicely formatted, you avoid the ugly (
at the end of line, it uses fewer lines anyway and additionally
you can make it clear what the gen_rtx_REG (DImode, cfun_gpr_save_slot (i))
means by giving it a name.  Of course, choose whatever other var
name you prefer to describe what it is.

> +  if (i == STACK_POINTER_REGNUM)
> + insn = emit_insn (gen_stack_restore_from_fpr (
> + gen_rtx_REG (DImode, cfun_gpr_save_slot (i))));
> +  else
> + insn =
> +   emit_move_insn (gen_rtx_REG (DImode, i),
> +   gen_rtx_REG (DImode, cfun_gpr_save_slot (i)));

Jakub


[PATCH] Fix PR70724

2016-04-19 Thread Richard Biener

I am testing the following patch fixing a python miscompile with FDO
(requires -ftracer to trigger).  A previous fix to SCCVN made resetting
of flow-sensitive SSA info from tail-merging ineffective by eventually
restoring the original info.

Fixed by splitting that part out of free_scc_vn and calling it before
tail-merging.

Bootstrap & regtest running on x86_64-unknown-linux-gnu, will apply
to trunk and branch if it succeeds.

Richard.

2016-04-19  Richard Biener  

PR tree-optimization/70724
* tree-ssa-sccvn.c (scc_vn_restore_ssa_info): Split SSA info
restoring out from ...
(free_scc_vn): ... here.
* tree-ssa-sccvn.h (scc_vn_restore_ssa_info): Declare.
* tree-ssa-pre.c (pass_pre::execute): Restore SSA info before
tail merging.
(pass_fre::execute): Restore SSA info.

* gcc.dg/torture/pr70724.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 235188)
--- gcc/tree-ssa-sccvn.c(working copy)
*** init_scc_vn (void)
*** 4300,4325 
  }
  }
  
  void
! free_scc_vn (void)
  {
!   size_t i;
! 
!   delete constant_to_value_id;
!   constant_to_value_id = NULL;
!   BITMAP_FREE (constant_value_ids);
!   shared_lookup_phiargs.release ();
!   shared_lookup_references.release ();
!   XDELETEVEC (rpo_numbers);
! 
!   for (i = 0; i < num_ssa_names; i++)
  {
tree name = ssa_name (i);
if (name
  && has_VN_INFO (name))
{
  if (VN_INFO (name)->needs_insertion)
!   release_ssa_name (name);
  else if (POINTER_TYPE_P (TREE_TYPE (name))
   && VN_INFO (name)->info.ptr_info)
SSA_NAME_PTR_INFO (name) = VN_INFO (name)->info.ptr_info;
--- 4300,4318 
  }
  }
  
+ /* Restore SSA info that has been reset on value leaders.  */
+ 
  void
! scc_vn_restore_ssa_info (void)
  {
!   for (unsigned i = 0; i < num_ssa_names; i++)
  {
tree name = ssa_name (i);
if (name
  && has_VN_INFO (name))
{
  if (VN_INFO (name)->needs_insertion)
!   ;
  else if (POINTER_TYPE_P (TREE_TYPE (name))
   && VN_INFO (name)->info.ptr_info)
SSA_NAME_PTR_INFO (name) = VN_INFO (name)->info.ptr_info;
*** free_scc_vn (void)
*** 4332,4337 
--- 4325,4352 
}
}
  }
+ }
+ 
+ void
+ free_scc_vn (void)
+ {
+   size_t i;
+ 
+   delete constant_to_value_id;
+   constant_to_value_id = NULL;
+   BITMAP_FREE (constant_value_ids);
+   shared_lookup_phiargs.release ();
+   shared_lookup_references.release ();
+   XDELETEVEC (rpo_numbers);
+ 
+   for (i = 0; i < num_ssa_names; i++)
+ {
+   tree name = ssa_name (i);
+   if (name
+ && has_VN_INFO (name)
+ && VN_INFO (name)->needs_insertion)
+   release_ssa_name (name);
+ }
obstack_free (_ssa_aux_obstack, NULL);
vn_ssa_aux_table.release ();
  
Index: gcc/tree-ssa-sccvn.h
===
*** gcc/tree-ssa-sccvn.h(revision 235188)
--- gcc/tree-ssa-sccvn.h(working copy)
*** extern vn_ssa_aux_t VN_INFO_GET (tree);
*** 204,209 
--- 204,210 
  tree vn_get_expr_for (tree);
  bool run_scc_vn (vn_lookup_kind);
  void free_scc_vn (void);
+ void scc_vn_restore_ssa_info (void);
  tree vn_nary_op_lookup (tree, vn_nary_op_t *);
  tree vn_nary_op_lookup_stmt (gimple *, vn_nary_op_t *);
  tree vn_nary_op_lookup_pieces (unsigned int, enum tree_code,
Index: gcc/tree-ssa-pre.c
===
*** gcc/tree-ssa-pre.c  (revision 235188)
--- gcc/tree-ssa-pre.c  (working copy)
*** pass_pre::execute (function *fun)
*** 4828,4833 
--- 4828,4836 
todo |= fini_eliminate ();
loop_optimizer_finalize ();
  
+   /* Restore SSA info before tail-merging as that resets it as well.  */
+   scc_vn_restore_ssa_info ();
+ 
/* TODO: tail_merge_optimize may merge all predecessors of a block, in which
   case we can merge the block with the remaining predecessor of the block.
   It should either:
*** pass_fre::execute (function *fun)
*** 4901,4906 
--- 4904,4910 
  
todo |= fini_eliminate ();
  
+   scc_vn_restore_ssa_info ();
free_scc_vn ();
  
statistics_counter_event (fun, "Insertions", pre_stats.insertions);
Index: gcc/testsuite/gcc.dg/torture/pr70724.c
===
*** gcc/testsuite/gcc.dg/torture/pr70724.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr70724.c  (working copy)
***
*** 0 
--- 1,39 
+ /* { dg-do run } */
+ /* { dg-additional-options "-ftracer" } */
+ 
+ extern void abort (void);
+ 
+ typedef long int _PyTime_t;
+ typedef enum { _PyTime_ROUND_FLOOR = 0, _PyTime_ROUND_CEILING = 1 }
+   

[PATCH] PR70674: S/390: Add memory barrier to stack pointer restore from fpr.

2016-04-19 Thread Andreas Krebbel
This patches fixes a problem with stack variable accesses being
scheduled after the stack pointer restore instructions.  In the
testcase this happened with the stack variable 'a' accessed through the
frame pointer.

The existing stack_tie we have in the backend is basically useless
when trying to block stack variable accesses from being scheduled
across an insn.  The alias set of stack variables and the frame alias
set usually differ and hence aren't in conflict with each other.  The
solution appears to be a magic MEM term with a scratch register which
is handled as a full memory barrier when analyzing scheduling
dependencies.
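
(The C-level analogue of such a compile-time scheduling barrier is an
empty asm with a "memory" clobber -- shown here only as an illustration;
the patch expresses the same thing in RTL as a (clobber (mem:BLK
(scratch))) in the restore insn's parallel:)

/* Sketch: a compiler-only full memory barrier.  No instruction is
   emitted, but no memory access may be scheduled across it.  */
static inline void
compiler_memory_barrier (void)
{
  __asm__ __volatile__ ("" : : : "memory");
}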

With the patch a (clobber (mem:BLK (scratch))) is being added to the
restore instruction in order to prevent any memory operations to be
scheduled across the insn.  The patch does that only for the one case
where the stack pointer is restored from an FPR.  Theoretically this
might happen also in the case where the stack pointer gets restored
using a load multiple.  However, triggering that problem with
load-multiple appears to be much harder since the load-multiple will
restore the frame pointer as well.  So in order to see the problem a
different call-clobbered register would need to be used as temporary
stack pointer.

Another case which needs to be handled some day is the stack pointer
allocation part.  It needs to be a memory barrier as well.

I'll post the patches for the other two parts when gcc 7 entered stage
1 again.

Bootstrapped and regression tested with --with-arch z196 and z13 on
s390 and s390x.

This needs to go into 4.9/5/6 branches.

-Andreas-

gcc/ChangeLog:

2016-04-19  Andreas Krebbel  

PR target/70674
* config/s390/s390.c (s390_restore_gprs_from_fprs): Pick the new
stack_restore_from_fpr pattern when restoring r15.
(s390_optimize_prologue): Strip away the memory barrier in the
parallel when trying to get rid of restore insns.
* config/s390/s390.md ("stack_restore_from_fpr"): New insn
definition for loading the stack pointer from an FPR.  Compared to
the normal move insn this pattern includes a full memory barrier.

gcc/testsuite/ChangeLog:

2016-04-19  Andreas Krebbel  

PR target/70674
* gcc.target/s390/pr70674.c: New test.
---
 gcc/config/s390/s390.c  | 91 +++--
 gcc/config/s390/s390.md | 10 
 gcc/testsuite/gcc.target/s390/pr70674.c | 13 +
 3 files changed, 76 insertions(+), 38 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr70674.c

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 1134d0f..e969542 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -10538,19 +10538,25 @@ s390_restore_gprs_from_fprs (void)
 
   for (i = 6; i < 16; i++)
 {
-  if (FP_REGNO_P (cfun_gpr_save_slot (i)))
-   {
- rtx_insn *insn =
-   emit_move_insn (gen_rtx_REG (DImode, i),
-   gen_rtx_REG (DImode, cfun_gpr_save_slot (i)));
- df_set_regs_ever_live (i, true);
- add_reg_note (insn, REG_CFA_RESTORE, gen_rtx_REG (DImode, i));
- if (i == STACK_POINTER_REGNUM)
-   add_reg_note (insn, REG_CFA_DEF_CFA,
- plus_constant (Pmode, stack_pointer_rtx,
-STACK_POINTER_OFFSET));
- RTX_FRAME_RELATED_P (insn) = 1;
-   }
+  rtx_insn *insn;
+
+  if (!FP_REGNO_P (cfun_gpr_save_slot (i)))
+   continue;
+
+  if (i == STACK_POINTER_REGNUM)
+   insn = emit_insn (gen_stack_restore_from_fpr (
+   gen_rtx_REG (DImode, cfun_gpr_save_slot (i))));
+  else
+   insn =
+ emit_move_insn (gen_rtx_REG (DImode, i),
+ gen_rtx_REG (DImode, cfun_gpr_save_slot (i)));
+  df_set_regs_ever_live (i, true);
+  add_reg_note (insn, REG_CFA_RESTORE, gen_rtx_REG (DImode, i));
+  if (i == STACK_POINTER_REGNUM)
+   add_reg_note (insn, REG_CFA_DEF_CFA,
+ plus_constant (Pmode, stack_pointer_rtx,
+STACK_POINTER_OFFSET));
+  RTX_FRAME_RELATED_P (insn) = 1;
 }
 }
 
@@ -13032,37 +13038,46 @@ s390_optimize_prologue (void)
 
   /* Remove ldgr/lgdr instructions used for saving and restore
 GPRs if possible.  */
-  if (TARGET_Z10
- && GET_CODE (pat) == SET
- && GET_MODE (SET_SRC (pat)) == DImode
- && REG_P (SET_SRC (pat))
- && REG_P (SET_DEST (pat)))
+  if (TARGET_Z10)
{
- int src_regno = REGNO (SET_SRC (pat));
- int dest_regno = REGNO (SET_DEST (pat));
- int gpr_regno;
- int fpr_regno;
+ rtx tmp_pat = pat;
 
- if (!((GENERAL_REGNO_P (src_regno) && FP_REGNO_P (dest_regno))
-   || (FP_REGNO_P (src_regno) && GENERAL_REGNO_P (dest_regno
-   continue;

Re: Re-apply reverted niter change 1/4

2016-04-19 Thread Bin.Cheng
On Mon, Apr 18, 2016 at 6:24 PM, Jan Hubicka  wrote:
> Hi,
> as discussed on IRC today, I would like to re-apply the patch to fix bogus
> realistic bounds in niter.  As it turned out, we seem to rely on this bogus
> estimate in a few benchmarks and there is a miscompilation with avx512.
>
> The performance regressions should be solved by my planned patch to introduce
> likely upper bounds - here we can track the assumption that there are no
> trailing arrays in the structures. I plan to send it after some benchmarking.
>
> Moreover we can get smarter about tracking trailing arrays.  We seem to get
> wrong MEM_REFs (as noticed by Richard), we may disable the path for non-C
> based languages (regressions are for Fortran testcases) and we can track object
> sizes.
>
> I plan to do that step-by-step so possible additional fallout is easier to
> track.  This patch re-instantiates the first fix included in the original
> patch - ivopts should consider max_stmt_executions_int when giving an estimate
> of the number of iterations.
>
> Bootstrapped/regtested x86_64-linux, committed.
>
> Honza
>
> Index: ChangeLog
> ===
> --- ChangeLog   (revision 235157)
> +++ ChangeLog   (working copy)
> @@ -1,3 +1,8 @@
> +2016-04-17  Jan Hubicka  
> +
> +   * tree-ssa-loop-ivopts.c (avg_loop_niter): Use also
> +   max_loop_iterations_int.
> +
>  2016-04-18  Richard Biener  
>
> PR tree-optimization/43434
> Index: tree-ssa-loop-ivopts.c
> ===
> --- tree-ssa-loop-ivopts.c  (revision 235064)
> +++ tree-ssa-loop-ivopts.c  (working copy)
> @@ -121,7 +121,11 @@ avg_loop_niter (struct loop *loop)
>  {
>HOST_WIDE_INT niter = estimated_stmt_executions_int (loop);
>if (niter == -1)
> -return AVG_LOOP_NITER (loop);
> +{
> +  niter = max_stmt_executions_int (loop);
> +  if (niter == -1 || niter > AVG_LOOP_NITER (loop))
> +return AVG_LOOP_NITER (loop);
Any reason why AVG_LOOP_NITER is still used if niter gives larger number?

Thanks,
bin
> +}
>
>return niter;
>  }


Re: Please include ada-hurd.diff upstream (try2)

2016-04-19 Thread Arnaud Charlet
> The updated attachment was included in message
> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01659.html

You should just put a FSF copyright on s-osinte-gnu.adb

OK with this change.

Arno


[HSA, PATCH] Load an HSA runtime via dlopen mechanism

2016-04-19 Thread Martin Liška
Hello.

After brief discussions about packaging of an HSA runtime, we've decided to load
the HSA runtime via the dlopen mechanism. The following patch introduces the
necessary header files, and all functions within the HSA plug-in are loaded via
dlsym.
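
In outline, the loading scheme looks like this (a simplified sketch; the
real plugin resolves every HSA entry point it uses into a table of
function pointers, and the library name below is just an assumed
default):

/* Sketch of dlopen/dlsym loading of one HSA runtime entry point.  */
#include <dlfcn.h>
#include <stdbool.h>

typedef int (*hsa_init_fn) (void);

static hsa_init_fn hsa_init_fptr;

static bool
init_hsa_runtime_functions (void)
{
  void *handle = dlopen ("libhsa-runtime64.so", RTLD_LAZY);
  if (handle == NULL)
    return false;
  hsa_init_fptr = (hsa_init_fn) dlsym (handle, "hsa_init");
  return hsa_init_fptr != NULL;
}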

Patch survives HSA regression tests, installed to the HSA branch as r235189.

Thanks,
Martin
From c93babc050cc31e1d370240568414dfa0f02f5d8 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 14 Apr 2016 14:25:58 +0200
Subject: [PATCH] Load an HSA runtime via dlopen mechanism

gcc/ChangeLog:

2016-04-19  Martin Liska  

	* doc/install.texi: Remove entry about --with-hsa-kmt-lib.

libgomp/ChangeLog:

2016-04-19  Martin Liska  

	* config.h.in: Introduce HSA_RUNTIME_LIB.
	* configure: Regenerated.
	* hsa.h: New file.
	* hsa_ext_finalize.h: New file.
	* plugin/configfrag.ac: Remove hsa-kmt-lib test.
	* plugin/plugin-hsa.c (struct hsa_runtime_fn_info): New
	structure.
	(init_enviroment_variables): Load newly introduced ENV
	variables.
	(hsa_warn): Call a function via hsa_fns data structure.
	(hsa_fatal): Likewise.
	(init_hsa_runtime_functions): Likewise.
	(suitable_hsa_agent_p): Likewise.
	(init_hsa_context): Likewise.
	(get_kernarg_memory_region): Likewise.
	(GOMP_OFFLOAD_init_device): Likewise.
	(destroy_hsa_program): Likewise.
	(create_and_finalize_hsa_program): Likewise.
	(create_single_kernel_dispatch): Likewise.
	(release_kernel_dispatch): Likewise.
	(init_single_kernel): Likewise.
	(GOMP_OFFLOAD_run): Likewise.
	(GOMP_OFFLOAD_fini_device): Likewise.
	* testsuite/lib/libgomp.exp: Remove hsa_kmt_lib support.
	* testsuite/libgomp-test-support.exp.in: Likewise.
---
 gcc/doc/install.texi  |   6 -
 libgomp/config.h.in   |   3 +
 libgomp/configure |  52 +--
 libgomp/hsa.h | 630 ++
 libgomp/hsa_ext_finalize.h| 265 +++
 libgomp/plugin/configfrag.ac  |  28 +-
 libgomp/plugin/plugin-hsa.c   | 312 ++---
 libgomp/testsuite/lib/libgomp.exp |   4 -
 libgomp/testsuite/libgomp-test-support.exp.in |   1 -
 9 files changed, 1161 insertions(+), 140 deletions(-)
 create mode 100644 libgomp/hsa.h
 create mode 100644 libgomp/hsa_ext_finalize.h

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 4268036..644f9dd 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -2012,12 +2012,6 @@ explicitly specify the directory where they are installed.  The
 shorthand for
 @option{--with-hsa-runtime-lib=@/@var{hsainstalldir}/lib} and
 @option{--with-hsa-runtime-include=@/@var{hsainstalldir}/include}.
-
-@item --with-hsa-kmt-lib=@var{pathname}
-
-If you configure GCC with HSA offloading but do not have the HSA
-KMT library installed in a standard location then you can
-explicitly specify the directory where it resides.
 @end table
 
 @subheading Cross-Compiler-Specific Options
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 226ac53..4483a84 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -125,6 +125,9 @@
 /* Define to 1 if the HSA plugin is built, 0 if not. */
 #undef PLUGIN_HSA
 
+/* Define path to HSA runtime.  */
+#undef HSA_RUNTIME_LIB
+
 /* Define to 1 if the NVIDIA plugin is built, 0 if not. */
 #undef PLUGIN_NVPTX
 
diff --git a/libgomp/configure b/libgomp/configure
index 8d03eb6..9a09369 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -637,7 +637,6 @@ PLUGIN_HSA_LIBS
 PLUGIN_HSA_LDFLAGS
 PLUGIN_HSA_CPPFLAGS
 PLUGIN_HSA
-HSA_KMT_LIB
 HSA_RUNTIME_LIB
 HSA_RUNTIME_INCLUDE
 PLUGIN_NVPTX_LIBS
@@ -794,7 +793,6 @@ with_cuda_driver_lib
 with_hsa_runtime
 with_hsa_runtime_include
 with_hsa_runtime_lib
-with_hsa_kmt_lib
 enable_linux_futex
 enable_tls
 enable_symvers
@@ -1476,7 +1474,6 @@ Optional Packages:
   --with-hsa-runtime-lib=PATH
   specify directory for the installed HSA run-time
   library
-  --with-hsa-kmt-lib=PATH specify directory for installed HSA KMT library.
 
 Some influential environment variables:
   CC  C compiler command
@@ -11145,7 +11142,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11148 "configure"
+#line 11145 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -11251,7 +11248,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 11254 "configure"
+#line 11251 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -15293,22 +15290,6 @@ if test "x$HSA_RUNTIME_LIB" != x; then
   HSA_RUNTIME_LDFLAGS=-L$HSA_RUNTIME_LIB
 fi
 
-HSA_KMT_LIB=
-
-HSA_KMT_LDFLAGS=
-
-# Check whether --with-hsa-kmt-lib was given.
-if test "${with_hsa_kmt_lib+set}" = set; then :
-  withval=$with_hsa_kmt_lib;
-fi
-
-if test "x$with_hsa_kmt_lib" != x; then
-  

Re: [PATCH] Optimize strchr (s, 0) to strlen

2016-04-19 Thread Richard Biener
On Mon, Apr 18, 2016 at 7:00 PM, Wilco Dijkstra  wrote:
> Optimize strchr (s, 0) to s + strlen (s).  strchr (s, 0) appears a common
> idiom for finding the end of a string, however it is not a very efficient
> way of doing so.  Strlen is a much simpler operation which is significantly
> faster (eg. on x86 strlen is 50% faster for strings of 8 bytes and about
> twice as fast as strchr on strings of 1KB).
>
> OK for trunk?

This folding should be added to gimple-fold.c:gimple_fold_builtin instead;
the builtins.c foldings are purely for folding to constants nowadays.
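
For reference, the rewrite being discussed corresponds to this
source-level transformation (illustrative only):

#include <string.h>

/* Common idiom for finding the end of a string ...  */
const char *
end_via_strchr (const char *s)
{
  return strchr (s, 0);
}

/* ... and the equivalent, typically faster form.  */
const char *
end_via_strlen (const char *s)
{
  return s + strlen (s);
}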

Richard.

> ChangeLog:
> 2016-04-18  Wilco Dijkstra  
>
> gcc/
> * gcc/builtins.c (fold_builtin_strchr): Optimize strchr (s, 0) into
> strlen.
>
> testsuite/
> * gcc/testsuite/gcc.dg/strlenopt-20.c: Update test.
> * gcc/testsuite/gcc.dg/strlenopt-21.c: Likewise.
> * gcc/testsuite/gcc.dg/strlenopt-22.c: Likewise.
> * gcc/testsuite/gcc.dg/strlenopt-26.c: Likewise.
> * gcc/testsuite/gcc.dg/strlenopt-5.c: Likewise.
> * gcc/testsuite/gcc.dg/strlenopt-7.c: Likewise.
> * gcc/testsuite/gcc.dg/strlenopt-9.c: Likewise.
>
> --
>
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index 
> 058ecc39aab205099713e503861103ce6ba5ee6d..150e707178a3e119d42ef630b384da3eaf7b2182
>  100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -8567,20 +8567,20 @@ fold_builtin_strchr (location_t loc, tree s1, tree 
> s2, tree type)
>else
>  {
>const char *p1;
> +  char c;
>
>if (TREE_CODE (s2) != INTEGER_CST)
> return NULL_TREE;
>
> +  if (target_char_cast (s2, &c))
> +   return NULL_TREE;
> +
>p1 = c_getstr (s1);
>if (p1 != NULL)
> {
> - char c;
>   const char *r;
>   tree tem;
>
> - if (target_char_cast (s2, &c))
> -   return NULL_TREE;
> -
>   r = strchr (p1, c);
>
>   if (r == NULL)
> @@ -8590,6 +8590,20 @@ fold_builtin_strchr (location_t loc, tree s1, tree s2, 
> tree type)
>   tem = fold_build_pointer_plus_hwi_loc (loc, s1, r - p1);
>   return fold_convert_loc (loc, type, tem);
> }
> +  else if (c == 0)
> +   {
> + tree fn = builtin_decl_implicit (BUILT_IN_STRLEN);
> + if (!fn)
> +   return NULL_TREE;
> +
> + s1 = builtin_save_expr (s1);
> +
> + /* Transform strchr (s1, '\0') to s1 + strlen (s1).  */
> + fn = build_call_expr_loc (loc, fn, 1, s1);
> + tree tem = fold_build_pointer_plus (s1, fn);
> + return fold_convert_loc (loc, type, tem);
> +   }
> +
>return NULL_TREE;
>  }
>  }
> diff --git a/gcc/testsuite/gcc.dg/strlenopt-20.c 
> b/gcc/testsuite/gcc.dg/strlenopt-20.c
> index 
> a83e845c26d88e5acdcabf142f7b319136663488..7b483eaeac1aa47278111a92148a16f00b2aaa2d
>  100644
> --- a/gcc/testsuite/gcc.dg/strlenopt-20.c
> +++ b/gcc/testsuite/gcc.dg/strlenopt-20.c
> @@ -86,9 +86,9 @@ main ()
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "strlen \\(" 1 "strlen" } } */
> +/* { dg-final { scan-tree-dump-times "strlen \\(" 2 "strlen" } } */
>  /* { dg-final { scan-tree-dump-times "memcpy \\(" 4 "strlen" } } */
>  /* { dg-final { scan-tree-dump-times "strcpy \\(" 0 "strlen" } } */
>  /* { dg-final { scan-tree-dump-times "strcat \\(" 0 "strlen" } } */
> -/* { dg-final { scan-tree-dump-times "strchr \\(" 1 "strlen" } } */
> +/* { dg-final { scan-tree-dump-times "strchr \\(" 0 "strlen" } } */
>  /* { dg-final { scan-tree-dump-times "stpcpy \\(" 0 "strlen" } } */
> diff --git a/gcc/testsuite/gcc.dg/strlenopt-21.c 
> b/gcc/testsuite/gcc.dg/strlenopt-21.c
> index 
> e22fa9fca9ba14354db2cd5f602283b64bd8dcac..05b85a49dde0a7f5d269174fd4269e40be910dbd
>  100644
> --- a/gcc/testsuite/gcc.dg/strlenopt-21.c
> +++ b/gcc/testsuite/gcc.dg/strlenopt-21.c
> @@ -57,9 +57,9 @@ main ()
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "strlen \\(" 1 "strlen" } } */
> +/* { dg-final { scan-tree-dump-times "strlen \\(" 2 "strlen" } } */
>  /* { dg-final { scan-tree-dump-times "memcpy \\(" 3 "strlen" } } */
>  /* { dg-final { scan-tree-dump-times "strcpy \\(" 0 "strlen" } } */
>  /* { dg-final { scan-tree-dump-times "strcat \\(" 0 "strlen" } } */
> -/* { dg-final { scan-tree-dump-times "strchr \\(" 1 "strlen" } } */
> +/* { dg-final { scan-tree-dump-times "strchr \\(" 0 "strlen" } } */
>  /* { dg-final { scan-tree-dump-times "stpcpy \\(" 0 "strlen" } } */
> diff --git a/gcc/testsuite/gcc.dg/strlenopt-22.c 
> b/gcc/testsuite/gcc.dg/strlenopt-22.c
> index 
> aa55f5ebd6a2d4803ee9a7fd60fc538d86f47124..b4ef772f0e59252f10a5419ede6837b3c8ca8265
>  100644
> --- a/gcc/testsuite/gcc.dg/strlenopt-22.c
> +++ b/gcc/testsuite/gcc.dg/strlenopt-22.c
> @@ -31,9 +31,9 @@ main ()
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "strlen \\(" 3 "strlen" } } */
> +/* { dg-final { scan-tree-dump-times "strlen \\(" 4 "strlen" } } */
>  /* { 

Re: [PATCH, rs6000] Expand vec_ld and vec_st during parsing to improve performance

2016-04-19 Thread Richard Biener
On Tue, Apr 19, 2016 at 12:05 AM, Bill Schmidt
 wrote:
> Hi,
>
> Expanding built-ins in the usual way (leaving them as calls until
> expanding into RTL) restricts the amount of optimization that can be
> performed on the code represented by the built-ins.  This has been
> observed to be particularly bad for the vec_ld and vec_st built-ins on
> PowerPC, which represent the lvx and stvx instructions.  Currently these
> are expanded into UNSPECs that are left untouched by the optimizers, so
> no redundant load or store elimination can take place.  For certain
> idiomatic usages, this leads to very bad performance.
>
> Initially I planned to just change the UNSPEC representation to RTL that
> directly expresses the address masking implicit in lvx and stvx.  This
> turns out to be only partially successful in improving performance.
> Among other things, by the time we reach RTL we have lost track of the
> __restrict__ attribute, leading to more appearances of may-alias
> relationships than should really be present.  Instead, this patch
> expands the built-ins during parsing so that they are exposed to all
> GIMPLE optimizations as well.
>
> This works well for vec_ld and vec_st.  It is also possible for
> programmers to instead use __builtin_altivec_lvx_ and
> __builtin_altivec_stvx_.  These are not so easy to catch during
> parsing, since they are not processed by the overloaded built-in
> function table.  For these, I am currently falling back to expansion
> during RTL while still exposing the address-masking semantics, which
> seems ok for these somewhat obscure built-ins.  At some future time we
> may decide to handle them similarly to vec_ld and vec_st.
>
> For POWER8 little-endian only, the loads and stores during expand time
> require some special handling, since the POWER8 expanders want to
> convert these to lxvd2x/xxswapd and xxswapd/stxvd2x.  To deal with this,
> I've added an extra pre-pass to the swap optimization phase that
> recognizes the lvx and stvx patterns and canonicalizes them so they'll
> be properly recognized.  This isn't an issue for earlier or later
> processors, or for big-endian POWER8, so doing this as part of swap
> optimization is appropriate.
>
> We have a lot of existing test cases for this code, which proved very
> useful in discovering bugs, so I haven't seen a reason to add any new
> tests.
>
> The patch is fairly large, but it isn't feasible to break it up into
> smaller units without leaving something in a broken state.  So I will
> have to just apologize for the size and leave it at that.  Sorry! :)
>
> Bootstrapped and tested successfully on powerpc64le-unknown-linux-gnu,
> and on powerpc64-unknown-linux-gnu (-m32 and -m64) with no regressions.
> Is this ok for trunk after GCC 6 releases?

Just took a very quick look but it seems you are using integer arithmetic
for the pointer adjustment and bit-and.  You could use POINTER_PLUS_EXPR
for the addition and BIT_AND_EXPR is also valid on pointer types.  Which
means you don't need conversions to/from sizetype.
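
(For reference, the masking semantics the expansion has to express: lvx
and stvx clear the low four bits of the effective address.  A C-level
sketch -- the GIMPLE form would build the same computation with
POINTER_PLUS_EXPR and BIT_AND_EXPR on pointer types:)

/* Sketch: effective address computed by lvx/stvx, i.e. (base + offset)
   rounded down to a 16-byte boundary.  */
#include <stdint.h>

static inline const void *
altivec_effective_address (const void *base, intptr_t offset)
{
  uintptr_t ea = (uintptr_t) base + (uintptr_t) offset;
  return (const void *) (ea & ~(uintptr_t) 15);
}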

x86 nowadays has intrinsics implemented as inlines - they come from
header files.  It seems for ppc the intrinsics are somehow magically
there, w/o a header file?

Richard.

> Thanks,
> Bill
>
>
> 2016-04-18  Bill Schmidt  
>
> * config/rs6000/altivec.md (altivec_lvx_): Remove.
> (altivec_lvx__internal): Document.
> (altivec_lvx__2op): New define_insn.
> (altivec_lvx__1op): Likewise.
> (altivec_lvx__2op_si): Likewise.
> (altivec_lvx__1op_si): Likewise.
> (altivec_stvx_): Remove.
> (altivec_stvx__internal): Document.
> (altivec_stvx__2op): New define_insn.
> (altivec_stvx__1op): Likewise.
> (altivec_stvx__2op_si): Likewise.
> (altivec_stvx__1op_si): Likewise.
> * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> Expand vec_ld and vec_st during parsing.
> * config/rs6000/rs6000.c (altivec_expand_lvx_be): Commentary
> changes.
> (altivec_expand_stvx_be): Likewise.
> (altivec_expand_lv_builtin): Expand lvx built-ins to expose the
> address-masking behavior in RTL.
> (altivec_expand_stv_builtin): Expand stvx built-ins to expose the
> address-masking behavior in RTL.
> (altivec_expand_builtin): Change builtin code arguments for calls
> to altivec_expand_stv_builtin and altivec_expand_lv_builtin.
> (insn_is_swappable_p): Avoid incorrect swap optimization in the
> presence of lvx/stvx patterns.
> (alignment_with_canonical_addr): New function.
> (alignment_mask): Likewise.
> (find_alignment_op): Likewise.
> (combine_lvx_pattern): Likewise.
> (combine_stvx_pattern): Likewise.
> (combine_lvx_stvx_patterns): Likewise.
> (rs6000_analyze_swaps): Perform a pre-pass to recognize lvx and
> stvx patterns 

Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread Andrew Haley
On 18/04/16 18:34, Michael Matz wrote:
> Hi,
> 
> On Mon, 18 Apr 2016, Andrew Haley wrote:
> 
 That may not be safe.  Consider an implementation which looks
 ahead in the instruction stream and decodes the instructions
 speculatively.
>>>
>>> It should go without saying that patching instructions is followed
>>> by whatever means necessary to flush any such caches on a
>>> particular implementation (here after patching the jump, after
>>> patching the rest, and after patching the first insn again,
>>> i.e. three times).
>>
>> That doesn't necessarily help you, though, without an ISB in the reading 
>> thread.
> 
> I don't understand, which reading thread?  We're writing, not reading 
> instructions.  You mean other executing threads? 

Yes.

> I will happily declare any implementation where it's impossible to
> safely patch the instruction stream by flushing the respective
> buffers or other means completely under control of the patching
> machinery, to be broken by design. 

You can declare anything you want, but we have to program for the
architectural specification.

> What failure mode do you envision, exactly?

It's easiest just to quote from the spec:

How far ahead of the current point of execution instructions are
fetched from is IMPLEMENTATION DEFINED. Such prefetching can be
either a fixed or a dynamically varying number of instructions,
and can follow any or all possible future execution paths. For all
types of memory:

   The PE might have fetched the instructions from memory at any
   time since the last Context synchronization operation on that
   PE.

   Any instructions fetched in this way might be executed multiple
   times, if this is required by the execution of the program,
   without being re-fetched from memory. In the absence of an ISB,
   there is no limit on the number of times such an instruction
   might be executed without being re-fetched from memory.

The ARM architecture does not require the hardware to ensure
coherency between instruction caches and memory, even for
locations of shared memory.

So, if you write a bunch of instructions (which might have been
pre-fetched) and then rewrite a NOP to jump to those instructions you
need to make sure that the thread which might be running concurrently
does an ISB.

Note also:

Memory accesses caused by instruction fetches are not required to
be observed in program order, unless they are separated by an ISB
or other context synchronization event.

So, if you modify instruction memory in one thread, other threads may
see those changes in a different order from the writing thread.  Sure,
the writing thread executes the cache maintenance instructions on its
side, but you also need to do something on the side which is executing
the instructions.

I have wondered if it might be a good idea to use an inter-processor
interrupt to force a context synchronization event across all PEs.

Andrew.



[PATCH] [libatomic] Add RTEMS support

2016-04-19 Thread Sebastian Huber
gcc/

* config/rtems.h (LIB_SPEC): Add -latomic.

libatomic/

* configure.tgt (*-*-rtems*): New supported target.
* config/rtems/host-config.h: New file.
* config/rtems/lock.c: Likewise.
---
 gcc/config/rtems.h   |  2 +-
 libatomic/config/rtems/host-config.h | 41 
 libatomic/config/rtems/lock.c| 37 
 libatomic/configure.tgt  | 10 +
 4 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 libatomic/config/rtems/host-config.h
 create mode 100644 libatomic/config/rtems/lock.c

diff --git a/gcc/config/rtems.h b/gcc/config/rtems.h
index f13f72f..e005547 100644
--- a/gcc/config/rtems.h
+++ b/gcc/config/rtems.h
@@ -45,6 +45,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If 
not, see
 #define LIB_SPEC "%{!qrtems: " STD_LIB_SPEC "} " \
 "%{!nostdlib: %{qrtems: --start-group \
  -lrtemsbsp -lrtemscpu \
- -lc -lgcc --end-group %{!qnolinkcmds: -T linkcmds%s}}}"
+ -latomic -lc -lgcc --end-group %{!qnolinkcmds: -T linkcmds%s}}}"
 
 #define TARGET_POSIX_IO
diff --git a/libatomic/config/rtems/host-config.h 
b/libatomic/config/rtems/host-config.h
new file mode 100644
index 000..d11e9ef
--- /dev/null
+++ b/libatomic/config/rtems/host-config.h
@@ -0,0 +1,41 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by Sebastian Huber .
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* Included after all more target-specific host-config.h.  */
+
+#include 
+
+static inline UWORD
+protect_start (void *ptr)
+{
+  return _Libatomic_Protect_start (ptr);
+}
+
+static inline void
+protect_end (void *ptr, UWORD isr_level)
+{
+  _Libatomic_Protect_end (ptr, isr_level);
+}
+
+#include_next 
diff --git a/libatomic/config/rtems/lock.c b/libatomic/config/rtems/lock.c
new file mode 100644
index 000..f999f9b
--- /dev/null
+++ b/libatomic/config/rtems/lock.c
@@ -0,0 +1,37 @@
+/* Copyright (C) 2016 Free Software Foundation, Inc.
+   Contributed by Sebastian Huber .
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+#include "libatomic_i.h"
+
+void
+libat_lock_n (void *ptr, size_t n)
+{
+  _Libatomic_Lock_n (ptr, n);
+}
+
+void
+libat_unlock_n (void *ptr, size_t n)
+{
+  _Libatomic_Unlock_n (ptr, n);
+}
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index c5470d7..702286f 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -26,6 +26,10 @@
 # Map the target cpu to an ARCH sub-directory.  At the same time,
 # work out any special compilation flags as necessary.
 
+# Give operating systems the opportunity to discard XCFLAGS modifications based
+# on ${target_cpu}.  For example to allow proper use of multilibs.
+configure_tgt_pre_target_cpu_XCFLAGS="${XCFLAGS}"
+
 case "${target_cpu}" in
   alpha*)
# fenv.c needs this option to generate inexact exceptions.
@@ -128,6 +132,12 @@ case "${target}" in
 ;;
 

Re: Please include ada-hurd.diff upstream (try2)

2016-04-19 Thread Svante Signell
ping!

The updated attachment was included in message
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg01659.html


On Thu, 2016-03-31 at 11:33 +0200, Svante Signell wrote:
> On Thu, 2016-03-17 at 08:51 +0100, Arnaud Charlet wrote:
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > The copyright notices are wrong (or at least incomplete).
> > > Hi, what is wrong then, copyright years and/or the text?
> > Both. The copyright year should include 2016 and the text should be
> > copyright FSF, not AdaCore.
> Attached is an updated ada-hurd.diff with your comments above taken care of.
> OK now?
> 
> Thanks!


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread AKASHI Takahiro
On Mon, Apr 18, 2016 at 06:54:33PM +0300, Alexander Monakov wrote:
> On Mon, 18 Apr 2016, Szabolcs Nagy wrote:
> > On 18/04/16 14:26, Alexander Monakov wrote:
> > > On Thu, 14 Apr 2016, Szabolcs Nagy wrote:
> > >> looking at [2] i don't see why
> > >>
> > >> func:
> > >>   mov x9, x30
> > >>   bl _tracefunc
> > >>   
> > >>
> > >> is not good for the kernel.
> > >>
> > >> mov x9, x30 is a nop at function entry, so in
> > >> theory 4 byte atomic write should be enough
> > >> to enable/disable tracing.
> > > 
> > > Overwriting x9 can be problematic because GCC has gained the ability
> > > to track register usage interprocedurally: if foo() calls bar(), and
> > > GCC has already emitted code for bar() and knows that it cannot change
> > > x9, it can use that knowledge to avoid saving/restoring x9 in foo()
> > > around calls to bar(). See option '-fipa-ra'.
> > > 
> > > If there's no register that can be safely used in place of x9 here,
> > > then the backend should emit the entry/pad appropriately (e.g. with an
> > > unspec that clobbers the possibly-overwritten register).
> > > 
> > 
> > (1) nop padded function can be assumed to clobber all temp regs
> 
> This may be undesirable if the nop pad is expected to be left untouched
> most of the time, because it would penalize the common case.  If only
> sufficiently complex functions (e.g. making other calls anyway) are expected
> to be padded, it's moot.

Almost all of the "C" functions in the kernel will be compiled
with -mfentry, and later on, we can dynamically turn tracing on and
off per-function.
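
(In C terms, the single-word toggle described above is one aligned
4-byte atomic store -- a conceptual sketch only; the real kernel code
must also do I-cache maintenance and context synchronization, as
discussed elsewhere in this thread:)

/* Sketch: swap one 4-byte instruction slot, e.g. between a NOP and a
   branch, with a single atomic store.  */
#include <stdint.h>

static void
patch_insn_word (uint32_t *slot, uint32_t new_insn)
{
  __atomic_store_n (slot, new_insn, __ATOMIC_RELAXED);
}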

> > (2) or _tracefunc must save/restore all temp regs, not just arg regs.
> 
> This doesn't work: when _tracefunc starts executing, old value of x9 is
> already unrecoverable.

Yeah. We may, instead, be able to preserve the LR value on the stack,
but obviously with a performance penalty.
I wondered whether we could partially stop "instruction scheduling"
and always generate a fixed sequence of instructions like
save x29, x30, [sp, #-XX]!
mov x29, x30
bl _mcount

but Maxim said no :)

Thanks,
-Takahiro AKASHI

> > on x86_64, glibc and linux _mcount and __fentry__ don't
> > save %r11 (temp reg), only the arg regs, so i think nop
> > padding should behave the same way (1).
> 
> That makes sense (modulo what I said above about penalizing tiny functions).
> 
> Heh, I started wondering if on x86 this is handled correctly when the calls
> are nopped out, and it turns out -pg disables -fipa-ra (in toplev.c)! :)
> 
> Alexander

-- 
Thanks,
-Takahiro AKASHI


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread Alexander Monakov
On Tue, 19 Apr 2016, AKASHI Takahiro wrote:
> > looking at [2] i don't see why
> > 
> > func:
> >   mov x9, x30
> >   bl _tracefunc
> >   
> 
> Actually,
> mov x9, x30
> bl _tracefunc
> mov x30, x9
> 

I think here Szabolcs' point was that the last instruction can be eliminated:
_tracefunc can be responsible for restoring x30, and can use x9 to return to
its caller. It has a non-standard calling convention and needs to be
implemented in assembly anyway.

Alexander


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread Alexander Monakov
On Tue, 19 Apr 2016, AKASHI Takahiro wrote:
> > > But if Szabolcs' two-instruction 
> > > sequence in the adjacent subthread is sufficient, this is moot.
> > 
> > .  It can also be solved by having just one NOP after the function label, 
> > and a number of them before, then no thread can be in the nop pad.  That 
> > seems to indicate that GCC should not try to be too clever and simply 
> > leave the specified number of nops before and after the function label, 
> > leaving safety measures to the patching infrastructure.
> 
> I don't get this idea very well.
> How can the instructions *before* a function label be executed
> after branching into this function?

The single nop after the function label is changed to a short backwards branch
to the instructions just before the function label.

As a result, the last instruction in the pad would have to become a short
forward branch jumping over the backwards branch described above, to the first
real instruction of the function.

Alexander


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread AKASHI Takahiro
On Mon, Apr 18, 2016 at 02:12:09PM +0200, Michael Matz wrote:
> Hi,
> 
> On Sun, 17 Apr 2016, Alexander Monakov wrote:
> 
> > I've noticed an issue in my (and probably Michael's) solution: if 
> > there's a thread that made it past the first nop, but is still executing 
> > the nop pad, it's unsafe to replace the nops.

Yeah, this issue also trapped me before :)
 
> > To solve that, it suffices to have a forward branch in place of the
> > first nop to begin with (i.e. have the compiler emit it).
>
> True.  I wonder if the generic solution in GCC should do that always or if 
> the patch infrastructure should do that to enable more freedom like doing 
> this:
> 
> > But if Szabolcs' two-instruction 
> > sequence in the adjacent subthread is sufficient, this is moot.
> 
> .  It can also be solved by having just one NOP after the function label, 
> and a number of them before, then no thread can be in the nop pad.  That 
> seems to indicate that GCC should not try to be too clever and simply 
> leave the specified number of nops before and after the function label, 
> leaving safety measures to the patching infrastructure.

I don't get this idea very well.
How can the instructions *before* a function label be executed
after branching into this function?

Thanks,
-Takahiro AKASHI

> 
> Ciao,
> Michael.

-- 
Thanks,
-Takahiro AKASHI


[gomp4] Merge trunk r235033 (2016-04-15) into gomp-4_0-branch

2016-04-19 Thread Thomas Schwinge
Hi!

Committed to gomp-4_0-branch in r235188:

commit d481297b3d1460e430341a837c1a8bc77335a266
Merge: 8798e58 a050099
Author: tschwinge 
Date:   Tue Apr 19 05:49:18 2016 +

svn merge -r 234575:235033 svn+ssh://gcc.gnu.org/svn/gcc/trunk


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@235188 
138bc75d-0d04-0410-961f-82ee72b054a4

(That is the gcc-6-branch branch point.)


Grüße
 Thomas


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread AKASHI Takahiro
On Thu, Apr 14, 2016 at 04:58:12PM +0100, Szabolcs Nagy wrote:
> On 14/04/16 14:15, Andrew Pinski wrote:
> > On Thu, Apr 14, 2016 at 9:08 PM, Maxim Kuvyrkov
> >  wrote:
> >> On Mar 14, 2016, at 11:14 AM, Li Bin  wrote:
> >>>
> >>> As ARM64 is entering enterprise world, machines can not be stopped for
> >>> some critical enterprise production environment, that is, live patch as
> >>> one of the RAS features is increasing more important for ARM64 arch now.
> >>>
> >>> Now, the mainstream live patch implementation which has been merged in
> >>> Linux kernel (x86/s390) is based on the 'ftrace with regs' feature, and
> >>> this feature needs the help of gcc.
> >>>
> >>> This patch proposes a generic solution for arm64 gcc which called mfentry,
> >>> following the example of x86, mips, s390, etc. and on these archs, this
> >>> feature has been used to implement the ftrace feature 'ftrace with regs'
> >>> to support live patch.
> >>>
> >>> By now, there is an another solution from linaro [1], which proposes to
> >>> implement a new option -fprolog-pad=N that generate a pad of N nops at the
> >>> beginning of each function. This solution is a arch-independent way for 
> >>> gcc,
> >>> but there may be some limitations which have not been recognized for Linux
> >>> kernel to adapt to this solution besides the discussion on [2]
> >>
> >> It appears that implementing -fprolog-pad=N option in GCC will not enable 
> >> kernel live-patching support for AArch64.  The proposal for the option was 
> >> to make GCC output a given number of NOPs at the beginning of each 
> >> function, and then the kernel could use that NOP pad to insert whatever 
> >> instructions it needs.  The modification of kernel instruction stream 
> >> needs to be done atomically, and, unfortunately, it seems the kernel can 
> >> use only architecture-provided atomicity primitives -- i.e., changing at 
> >> most 8 bytes at a time.
> >>
> > 
> > Can't we add a 16byte atomic primitive for ARM64 to the kernel?
> > Though you need to align all functions to a 16 byte boundary if the
> > -fprolog-pad=N needs to happen.  Do you know the size that needs
> > to be modified?  It does seem to be either 12 or 16 bytes.
> > 
> 
> looking at [2] i don't see why
> 
> func:
>   mov x9, x30
>   bl _tracefunc
>   

Actually,
mov x9, x30
bl _tracefunc
mov x30, x9

 
> is not good for the kernel.
> 
> mov x9, x30 is a nop at function entry, so in
> theory a 4-byte atomic write should be enough
> to enable/disable tracing.

Please see my previous reply to Maxim.
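
To make the point above concrete, here is a minimal C sketch of
flipping the single patch-site instruction between nop and bl.  This
is only an illustration, not the kernel's actual ftrace code: the
function names are invented here, and cache maintenance goes through
GCC builtins rather than the kernel's own patching helpers.

#include <stdint.h>

#define AARCH64_NOP 0xd503201fu	/* hint #0 */

/* Encode "bl target" for an instruction located at PC.  BL is opcode
   0x94000000 with a signed 26-bit word offset, so TARGET must be
   within +/-128MB of PC.  */
static uint32_t
aarch64_bl_insn (uintptr_t pc, uintptr_t target)
{
  int64_t offset = (int64_t) (target - pc);
  return 0x94000000u | (((uint64_t) offset >> 2) & 0x03ffffffu);
}

/* Flip the patch site between nop and bl.  An aligned 4-byte store is
   single-copy atomic on AArch64, so a concurrent instruction fetch
   sees either the old or the new instruction, never a mix.  */
static void
toggle_trace_call (uint32_t *site, uintptr_t tracefunc, int enable)
{
  uint32_t insn = enable
		  ? aarch64_bl_insn ((uintptr_t) site, tracefunc)
		  : AARCH64_NOP;
  __atomic_store_n (site, insn, __ATOMIC_RELAXED);
  /* Push the new instruction out to the instruction fetchers.  */
  __builtin___clear_cache ((char *) site, (char *) (site + 1));
}

A real implementation would additionally synchronize the other CPUs
(e.g. an ISB via IPI) before relying on the new behaviour, but none
of that changes the single 4-byte write at the patch site.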

Thanks,
-Takahiro AKASHI

> >> From the kernel discussion thread it appears that the pad needs to be more 
> >> than 8 bytes, and that the kernel can't update that atomically.  However 
> >> if the -mfentry approach is used, then we need to update only 4 (or 8) bytes 
> >> of the pad, and we avoid the atomicity problem.
> > 
> > I think you are incorrect; you could add a 16-byte atomic primitive if
> > needed.
> > 
> >>
> >> Therefore, [unless there is a clever multi-stage update process to 
> >> atomically change NOPs to whatever we need,] I think we have to go with 
> >> Li's -mfentry approach.
> > 
> > Please consider the above: if a 16-byte (128-bit) atomic
> > instruction were available, would that be enough?
> > 
> > Thanks,
> > Andrew
> > 
> >>
> >> Comments?
> >>
> >> --
> >> Maxim Kuvyrkov
> >> www.linaro.org
> >>
> >>
> >>> , typically
> >>> for powerpc archs. Furthermore, I think there are no good reasons to push
> >>> the other archs (such as x86), which have already implemented the 'ftrace
> >>> with regs' feature, to replace the current method with the new option;
> >>> that may bring heavy target-dependent code adaptation. As a result it
> >>> becomes an arm64-dedicated solution, leaving the kernel with two
> >>> different forms of implementation.
> >>>
> >>> [1] https://gcc.gnu.org/ml/gcc/2015-10/msg00090.html
> >>> [2] 
> >>> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401854.html
> >>
> > 
> 

-- 
Thanks,
-Takahiro AKASHI


Re: [PATCH] [AArch64] support -mfentry feature for arm64

2016-04-19 Thread AKASHI Takahiro
On Thu, Apr 14, 2016 at 04:08:23PM +0300, Maxim Kuvyrkov wrote:
> On Mar 14, 2016, at 11:14 AM, Li Bin  wrote:
> > 
> > As ARM64 is entering the enterprise world, machines cannot be stopped in
> > some critical enterprise production environments; that is, live patching,
> > as one of the RAS features, is increasingly important for the ARM64 arch now.
> > 
> > Now, the mainstream live patch implementation which has been merged into
> > the Linux kernel (x86/s390) is based on the 'ftrace with regs' feature, and
> > this feature needs the help of gcc.
> > 
> > This patch proposes a generic solution for arm64 gcc called -mfentry,
> > following the example of x86, mips, s390, etc.; on these archs, this
> > feature has been used to implement the ftrace feature 'ftrace with regs'
> > to support live patching.
> > 
> > By now, there is another solution from linaro [1], which proposes to
> > implement a new option -fprolog-pad=N that generates a pad of N nops at the
> > beginning of each function. This solution is an arch-independent way for
> > gcc, but there may be some limitations, not yet recognized, that keep the
> > Linux kernel from adapting to this solution, besides the discussion on [2]
> 
> It appears that implementing the -fprolog-pad=N option in GCC will not enable
> kernel live-patching support for AArch64.  The proposal for the option was to
> make GCC output a given number of NOPs at the beginning of each function, and
> then the kernel could use that NOP pad to insert whatever instructions it
> needs.  The modification of the kernel instruction stream needs to be done
> atomically, and, unfortunately, it seems the kernel can use only
> architecture-provided atomicity primitives -- i.e., changing at most 8 bytes
> at a time.

Let me clarify the issue with -fprolog-pad=N.
The kernel/ftrace has two opportunities to replace prologue instructions:
 1) at boot time, for all the "C" functions
 2) at run time, for given functions

1) is done as part of kernel/ftrace initialization and executed while
no other threads (CPUs) are running, so we don't need atomicity here.
See [1].

For 2), we only have to replace one instruction (nop <-> bl), as [1] states,
so we can guarantee atomicity.

Therefore, I still believe that the -fprolog-pad=N approach will work for
AArch64.
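
As a rough sketch of those two phases -- purely illustrative, since the
patch-site table and the patch_insn() helper below are hypothetical,
invented for this example, not existing kernel interfaces:

#include <stddef.h>
#include <stdint.h>

/* One -fprolog-pad patch site.  In a real kernel this table would be
   collected at build time, like the mcount location tables.  */
struct patch_site { uint32_t *addr; };

extern struct patch_site patch_sites[];
extern size_t n_patch_sites;

/* Assumed helper: write one instruction, then do cache maintenance.  */
extern void patch_insn (uint32_t *addr, uint32_t insn);

#define AARCH64_NOP 0xd503201fu

/* Phase 1: boot time, before other CPUs run.  Every pad can be
   rewritten freely to its quiescent form; no atomicity is needed
   because nothing else is executing.  */
void
ftrace_init_pads (void)
{
  for (size_t i = 0; i < n_patch_sites; i++)
    patch_insn (patch_sites[i].addr, AARCH64_NOP);
}

/* Phase 2: run time.  Only the single nop <-> bl slot is touched,
   which one aligned 4-byte store replaces atomically.  */
void
ftrace_enable_site (struct patch_site *s, uint32_t bl_insn)
{
  patch_insn (s->addr, bl_insn);
}

Only phase 2 ever runs concurrently with other CPUs, and it only ever
touches that one instruction slot.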
 
> From the kernel discussion thread it appears that the pad needs to be more 
> than 8 bytes, and that the kernel can't update that atomically.  However if 
> -mfentry approach is used, then we need to update only 4 (or 8) bytes of the 
> pad, and we avoid the atomicity problem.
> 
> Therefore, [unless there is a clever multi-stage update process to atomically 
> change NOPs to whatever we need,] I think we have to go with Li's -mfentry 
> approach.

The reason I gave up on this approach is that it is not as generic
as we had expected. At least, PowerPC needs a specific instruction
(i.e., saving the TOC) before the NOPs.
See discussions in [2].

[1] 
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401854.html
[2] http://lkml.iu.edu//hypermail/linux/kernel/1602.0/02257.html


Thanks,
-Takahiro AKASHI

> Comments?
> 
> --
> Maxim Kuvyrkov
> www.linaro.org
> 
> 
> > , typically
> > for powerpc archs. Furthermore, I think there are no good reasons to push
> > the other archs (such as x86), which have already implemented the 'ftrace
> > with regs' feature, to replace the current method with the new option;
> > that may bring heavy target-dependent code adaptation. As a result it
> > becomes an arm64-dedicated solution, leaving the kernel with two
> > different forms of implementation.
> > 
> > [1] https://gcc.gnu.org/ml/gcc/2015-10/msg00090.html
> > [2] 
> > http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401854.html
> 

-- 
Thanks,
-Takahiro AKASHI