Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-11-16 Thread Andrew Burgess
* Bernd Schmidt  [2016-11-03 13:01:32 +0100]:

> On 09/14/2016 03:00 PM, Andrew Burgess wrote:
> > In an attempt to get this patch merged (as I still think that its
> > correct) I've investigated, and documented a little more about how I
> > think things currently work.  I'm sure most people reading this will
> > already know this, but hopefully, if my understanding is wrong someone
> > can point it out.
> > gcc/ChangeLog:
> > 
> > * gcc/bb-reorder.c: Remove 'toplev.h' include.
> > (pass_partition_blocks::gate): No longer check
> > user_defined_section_attribute, instead check the function decl
> > for a section attribute.
> > * gcc/c-family/c-common.c (handle_section_attribute): No longer
> > set user_defined_section_attribute.
> > * gcc/final.c (rest_of_handle_final): Likewise.
> > * gcc/toplev.c: Remove definition of user_defined_section_attribute.
> > * gcc/toplev.h: Remove declaration of
> > user_defined_section_attribute.
> > 
> > gcc/testsuiteChangeLog:
> > 
> > * gcc.dg/tree-prof/section-attr-1.c: New file.
> > * gcc.dg/tree-prof/section-attr-2.c: New file.
> > * gcc.dg/tree-prof/section-attr-3.c: New file.
> 
> I think the explanation is perfectly reasonable and the patch looks good,
> except:
> 
> > +__attribute__((noinline))
> 
> Add noclone to all of these as well.

Thanks.  Considering Jeff said, I'm thinking about it, and you've said
yes, and given Jeff's not got back, I'm considering this patch
approved (with the fix you suggest).

My only remaining concern is the new tests, I've tried to restrict
them to targets that I suspect they'll pass on with:

/* { dg-final-use { scan-assembler "\.section\[\t 
\]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target 
*-*-linux* *-*-gnu* } } } */

but I'm still nervous that I'm going to introduce test failures.  Is
there any advice / guidance I should follow before I commit, or are
folk pretty relaxed so long as I've made a reasonable effort?

Thanks,
Andrew


Re: Fix PR78154

2016-11-16 Thread Martin Sebor

On 11/16/2016 11:49 AM, Prathamesh Kulkarni wrote:

Hi Richard,
Following your suggestion in PR78154, the patch checks if stmt
contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p
and returns true in that case.


Nice.  I think the list should also include mempcpy, stpcpy, and
stpncpy, and probably also the corresponding checking built-ins
such as __builtin___memcpy_chk.

FWIW, a more general solution to consider (possibly for GCC 8)
might be to extend attribute nonnull to apply to a functions return
value as well (e.g., use zero as the index for that), to indicate
that a pointer returned from it is not null.  That would let library
implementers annotate other functions (such as strerror)

Martin



[patch, libfortran] Add AVX-specific matmul

2016-11-16 Thread Thomas Koenig

Hello world,

the attached patch adds an AVX-specific version of the matmul
intrinsic to the Fortran library.  This works by using the target_clones
attribute.

For testing, I compiled this on powerpc64-unknown-linux-gnu,
without any ill effects.

Also, a resulting binary reached around 15 GFlops for larger matrices
on a 3.4 GHz i7-2600 CPU.  I am currently building/regtesting on
that machine. This can give another 40% speed increase  for large
matrices on AVX.

OK for trunk?

Regards

Thomas

2016-11-16  Thomas Koenig  

PR fortran/78379
* m4/matmul.m4:  For x86_64, make the work function for matmul
static with target_clones for AVX and default, and create
a wrapper function to call it.
* generated/matmul_c10.c
* generated/matmul_c16.c: Regenerated.
* generated/matmul_c4.c: Regenerated.
* generated/matmul_c8.c: Regenerated.
* generated/matmul_i1.c: Regenerated.
* generated/matmul_i16.c: Regenerated.
* generated/matmul_i2.c: Regenerated.
* generated/matmul_i4.c: Regenerated.
* generated/matmul_i8.c: Regenerated.
* generated/matmul_r10.c: Regenerated.
* generated/matmul_r16.c: Regenerated.
* generated/matmul_r4.c: Regenerated.
* generated/matmul_r8.c: Regenerated.
Index: generated/matmul_c10.c
===
--- generated/matmul_c10.c	(Revision 242477)
+++ generated/matmul_c10.c	(Arbeitskopie)
@@ -75,11 +75,37 @@ extern void matmul_c10 (gfc_array_c10 * const rest
 	int blas_limit, blas_call gemm);
 export_proto(matmul_c10);
 
+#ifdef __x86_64__
+
+/* For x86_64, we switch to AVX if that is available.  For this, we
+   let the actual work be done by the static aux_matmul - function.
+   The user-callable function will then automagically contain the
+   selection code for the right architecture.  This is done to avoid
+   knowledge of architecture details in the front end.  */
+
+static void aux_matmul_c10 (gfc_array_c10 * const restrict retarray, 
+	gfc_array_c10 * const restrict a, gfc_array_c10 * const restrict b, int try_blas,
+	int blas_limit, blas_call gemm)
+	__attribute__ ((target_clones("avx,default")));
+
 void
 matmul_c10 (gfc_array_c10 * const restrict retarray, 
 	gfc_array_c10 * const restrict a, gfc_array_c10 * const restrict b, int try_blas,
 	int blas_limit, blas_call gemm)
 {
+  aux_matmul_c10 (retarray, a, b, try_blas, blas_limit, gemm);
+}
+
+static void
+aux_matmul_c10 (gfc_array_c10 * const restrict retarray, 
+	gfc_array_c10 * const restrict a, gfc_array_c10 * const restrict b, int try_blas,
+	int blas_limit, blas_call gemm)
+#else
+matmul_c10 (gfc_array_c10 * const restrict retarray, 
+	gfc_array_c10 * const restrict a, gfc_array_c10 * const restrict b, int try_blas,
+	int blas_limit, blas_call gemm)
+#endif
+{
   const GFC_COMPLEX_10 * restrict abase;
   const GFC_COMPLEX_10 * restrict bbase;
   GFC_COMPLEX_10 * restrict dest;
Index: generated/matmul_c16.c
===
--- generated/matmul_c16.c	(Revision 242477)
+++ generated/matmul_c16.c	(Arbeitskopie)
@@ -75,11 +75,37 @@ extern void matmul_c16 (gfc_array_c16 * const rest
 	int blas_limit, blas_call gemm);
 export_proto(matmul_c16);
 
+#ifdef __x86_64__
+
+/* For x86_64, we switch to AVX if that is available.  For this, we
+   let the actual work be done by the static aux_matmul - function.
+   The user-callable function will then automagically contain the
+   selection code for the right architecture.  This is done to avoid
+   knowledge of architecture details in the front end.  */
+
+static void aux_matmul_c16 (gfc_array_c16 * const restrict retarray, 
+	gfc_array_c16 * const restrict a, gfc_array_c16 * const restrict b, int try_blas,
+	int blas_limit, blas_call gemm)
+	__attribute__ ((target_clones("avx,default")));
+
 void
 matmul_c16 (gfc_array_c16 * const restrict retarray, 
 	gfc_array_c16 * const restrict a, gfc_array_c16 * const restrict b, int try_blas,
 	int blas_limit, blas_call gemm)
 {
+  aux_matmul_c16 (retarray, a, b, try_blas, blas_limit, gemm);
+}
+
+static void
+aux_matmul_c16 (gfc_array_c16 * const restrict retarray, 
+	gfc_array_c16 * const restrict a, gfc_array_c16 * const restrict b, int try_blas,
+	int blas_limit, blas_call gemm)
+#else
+matmul_c16 (gfc_array_c16 * const restrict retarray, 
+	gfc_array_c16 * const restrict a, gfc_array_c16 * const restrict b, int try_blas,
+	int blas_limit, blas_call gemm)
+#endif
+{
   const GFC_COMPLEX_16 * restrict abase;
   const GFC_COMPLEX_16 * restrict bbase;
   GFC_COMPLEX_16 * restrict dest;
Index: generated/matmul_c4.c
===
--- generated/matmul_c4.c	(Revision 242477)
+++ generated/matmul_c4.c	(Arbeitskopie)
@@ -75,11 +75,37 @@ extern void matmul_c4 (gfc_array_c4 * const restri
 	int blas_limit, blas_call gemm);
 

Re: PR61409: -Wmaybe-uninitialized false-positive with -O2

2016-11-16 Thread Jeff Law

On 11/02/2016 11:16 AM, Aldy Hernandez wrote:

Hi Jeff.

As discussed in the PR, here is a patch exploring your idea of ignoring
unguarded uses if we can prove that the guards for such uses are
invalidated by the uninitialized operand paths being executed.

This is an updated patch from my suggestion in the PR.  It bootstraps
with no regression on x86-64 Linux, and fixes the PR in question.

As the "NOTE:" in the code states, we could be much smarter when
invalidating predicates, but for now let's do straight negation which
works for the simple case.  We could expand on this in the future.

OK for trunk?


curr


commit 8375d7e28c1a798dd0cc0f487d7fa1068d9eb124
Author: Aldy Hernandez 
Date:   Thu Aug 25 10:44:29 2016 -0400

PR middle-end/61409
* tree-ssa-uninit.c (use_pred_not_overlap_with_undef_path_pred):
Remove reference to missing NUM_PREDS in function comment.
(can_one_predicate_be_invalidated_p): New.
(can_chain_union_be_invalidated_p): New.
(flatten_out_predicate_chains): New.
(uninit_ops_invalidate_phi_use): New.
(is_use_properly_guarded): Call uninit_ops_invalidate_phi_use.

[ snip ]



+static bool
+can_one_predicate_be_invalidated_p (pred_info predicate,
+   vec worklist)
+{
+  for (size_t i = 0; i < worklist.length (); ++i)
+{
+  pred_info *p = worklist[i];
+
+  /* NOTE: This is a very simple check, and only understands an
+exact opposite.  So, [i == 0] is currently only invalidated
+by [.NOT. i == 0] or [i != 0].  Ideally we should also
+invalidate with say [i > 5] or [i == 8].  There is certainly
+room for improvement here.  */
+  if (pred_neg_p (predicate, *p))
It's good enough for now.  I saw some other routines that might allow us 
to handle more cases.  I'm OK with faulting those in if/when we see such 
cases in real code.




+
+/* Return TRUE if executing the path to some uninitialized operands in
+   a PHI will invalidate the use of the PHI result later on.
+
+   UNINIT_OPNDS is a bit vector specifying which PHI arguments have
+   arguments which are considered uninitialized.
+
+   USE_PREDS is the pred_chain_union specifying the guard conditions
+   for the use of the PHI result.
+
+   What we want to do is disprove each of the guards in the factors of
+   the USE_PREDS.  So if we have:
+
+   # USE_PREDS guards of:
+   #   1. i > 5 && i < 100
+   #   2. j > 10 && j < 88
Are USE_PREDS joined by an AND or IOR?  I guess based on their type it 
must be IOR.   Thus to get to a use  #1 or #2 must be true.  So to prove 
we can't reach a use, we have to prove that #1 and #2 are both not true. 
 Right?




+
+static bool
+uninit_ops_invalidate_phi_use (gphi *phi, unsigned uninit_opnds,
+  pred_chain_union use_preds)
+{
+  /* Look for the control dependencies of all the uninitialized
+ operands and build predicates describing them.  */
+  unsigned i;
+  pred_chain_union uninit_preds[32];
+  memset (uninit_preds, 0, sizeof (pred_chain_union) * 32);
+  for (i = 0; i < MIN (32, gimple_phi_num_args (phi)); i++)
Can you replace the magic "32" with a file scoped const or #define?  I 
believe there's 2 existing uses of a magic "32" elsewhere in 
tree-ssa-uninit.c as well.



+
+  /* Build the control dependency chain for `i'...  */
+  if (compute_control_dep_chain (find_dom (e->src),
+e->src,
+dep_chains,
+_chains,
+_chain,
+_calls))

Does this miss the control dependency carried by E itself.

ie, if e->src ends in a conditional, shouldn't that conditional be 
reflected in the control dependency chain as well?  I guess we'd have to 
have the notion of computing the control dependency for an edge rather 
than a block.  It doesn't look like compute_control_dep_chain is ready 
for that.  I'm willing to put that into a "future work" bucket.


So I think just confirming my question about how USE_PREDS are joined at 
the call to uninit_opts_invalidate_phi_use and fixing the magic 32 to be 
a file scoped const or a #define and this is good to go on the trunk.


jeff




Re: [libstdc++, testsuite] Add dg-require-thread-fence

2016-11-16 Thread Christophe Lyon
On 15 November 2016 at 12:50, Jonathan Wakely  wrote:
> On 14/11/16 14:32 +0100, Christophe Lyon wrote:
>>
>> On 20 October 2016 at 19:40, Jonathan Wakely  wrote:
>>>
>>> On 20/10/16 10:33 -0700, Mike Stump wrote:


 On Oct 20, 2016, at 9:34 AM, Jonathan Wakely  wrote:
>
>
>
> On 20/10/16 09:26 -0700, Mike Stump wrote:
>>
>>
>> On Oct 20, 2016, at 5:20 AM, Jonathan Wakely 
>> wrote:
>>>
>>>
>>>
>>> I am considering leaving this in the ARM backend to force people to
>>> think what they want to do about thread safety with statics and C++
>>> on bare-metal systems.
>
>
>
> The quoting makes it look like those are my words, but I was quoting
> Ramana from https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02751.html
>
>> Not quite in the GNU spirit?  The port people should decide the best
>> way
>> to get as much functionality as possible and everything should just
>> work, no
>> sharp edges.
>>
>> Forcing people to think sounds like a sharp edge?
>
>
>
> I'm inclined to agree, but we are talking about bare metal systems,



 So?  gcc has been doing bare metal systems for more than 2 years now.
 It
 is pretty good at it.  All my primary targets today are themselves bare
 metal systems (I test with newlib).

> where there is no one-size-fits-all solution.



 Configurations are like ice cream cones.  Everyone gets their flavor no
 matter how weird or strange.  Putting nails in a cone because you don't
 know
 if they like vanilla or chocolate isn't reasonable.  If you want, make
 two
 flavors, and vend two, if you want to just do one, pick the flavor and
 vend
 it.  Put an enum #define default_flavor vanilla, and you then have
 support
 for any flavor you want.  Want to add a configure option for the flavor
 select, add it.  You want to make a -mflavor=chocolate option, add it.
 gcc
 is literally littered with these things.
>>>
>>>
>>>
>>> Like I said, you can either build the library with
>>> -fno-threadsafe-statics or you can provide a definition of the missing
>>> symbol.
>>>
>> I gave this a try (using CXXFLAGS_FOR_TARGET=-fno-threadsafe-statics).
>> It seems to do the trick indeed: almost all tests now pass, the flag is
>> added
>> to testcase compilation.
>>
>> Among the 6 remaining failures, I noticed these two:
>> - experimental/type_erased_allocator/2.cc: still complains about the
>> missing
>> __sync_synchronize. Does it need dg-require-thread-fence?
>
>
> Yes, I think that test actually uses atomics directly, so does depend
> on the fence.
>
I've attached the patch to achieve this.
Is it OK?

>> - abi/header_cxxabi.c complains because the option is not valid for C.
>> I can see the test is already skipped for other C++-only options: it is OK
>> if I submit a patch to skip it if -fno-threadsafe-statics is used?
>
>
> Yes, it makes sense there too.

This one is not as obvious as I hoped. I tried:
-// { dg-skip-if "invalid options for C" { *-*-* } { "-std=c++??"
"-std=gnu++??" } }
+// { dg-skip-if "invalid options for C" { *-*-* } { "-std=c++??"
"-std=gnu++??" "-fno-threadsafe-statics" } }

but it does not work.

I set CXXFLAGS_FOR_TARGET=-fno-threadsafe-statics
before running GCC's configure.

This results in -fno-threadsafe-statics being used when compiling the tests,
but dg-skip-if does not consider it: it would if I passed it via
runtestflags/target-board, but then it would mean passing this flag
to all tests, not only the c++ ones, leading to errors everywhere.

Am I missing something?

Thanks,

Christophe

>> I think I'm going to use this flag in validations from now on (target
>> arm-none-eabi
>> only, with default mode/cpu/fpu).
>
>
> Thanks for the update on this.
>
libstdc++-v3/ChangeLog:

2016-11-16  Christophe Lyon  

* testsuite/experimental/type_erased_allocator/2.cc: Add
  dg-require-thread-fence.

diff --git a/libstdc++-v3/testsuite/experimental/type_erased_allocator/2.cc 
b/libstdc++-v3/testsuite/experimental/type_erased_allocator/2.cc
index 216a88c..0b73359 100644
--- a/libstdc++-v3/testsuite/experimental/type_erased_allocator/2.cc
+++ b/libstdc++-v3/testsuite/experimental/type_erased_allocator/2.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++14 } }
+// { dg-require-thread-fence "" }
 
 // Copyright (C) 2015-2016 Free Software Foundation, Inc.
 //


Re: C++ PATCH for c++/78358 (decltype and decomposition)

2016-11-16 Thread Jason Merrill
On Tue, Nov 15, 2016 at 11:31 AM, Jason Merrill  wrote:
> OK, (hopefully) one more patch for decltype and C++17 decomposition
> declarations.  I hadn't been thinking that "referenced type" meant to
> look through references in the tuple case, since other parts of
> [dcl.decomp] define "the referenced type" directly, but that does seem
> to be how it's used elsewhere in the standard.

Nope, that wasn't right, either.  The tuple section of [dcl.decomp]
also defines "the referenced type" in a way that makes it impossible
to determine from the actual type of the variable, so we need to store
it somewhere.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 235535a2ee99c30292f8f964a3759fbeb6067e45
Author: Jason Merrill 
Date:   Tue Nov 15 22:22:34 2016 -0500

Fix tuple decomposition decltype.

* decl.c (store_decomp_type, lookup_decomp_type): New.
(cp_finish_decomp): Call store_decomp_type.
* semantics.c (finish_decltype_type): Call lookup_decomp_type.
* cp-tree.h: Declare lookup_decomp_type.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 0dcb897..cb1b9fa 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5808,6 +5808,7 @@ extern tree start_decl(const 
cp_declarator *, cp_decl_specifier_seq *, int,
 extern void start_decl_1   (tree, bool);
 extern bool check_array_initializer(tree, tree, tree);
 extern void cp_finish_decl (tree, tree, bool, tree, int);
+extern tree lookup_decomp_type (tree);
 extern void cp_finish_decomp   (tree, tree, unsigned int);
 extern int cp_complete_array_type  (tree *, tree, bool);
 extern int cp_complete_array_type_or_error (tree *, tree, bool, 
tsubst_flags_t);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 23ba087..c54a2de 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7293,6 +7293,22 @@ get_tuple_decomp_init (tree decl, unsigned i)
 }
 }
 
+/* It's impossible to recover the decltype of a tuple decomposition variable
+   based on the actual type of the variable, so store it in a hash table.  */
+static GTY(()) hash_map *decomp_type_table;
+static void
+store_decomp_type (tree v, tree t)
+{
+  if (!decomp_type_table)
+decomp_type_table = hash_map::create_ggc (13);
+  decomp_type_table->put (v, t);
+}
+tree
+lookup_decomp_type (tree v)
+{
+  return *decomp_type_table->get (v);
+}
+
 /* Finish a decomposition declaration.  DECL is the underlying declaration
"e", FIRST is the head of a chain of decls for the individual identifiers
chained through DECL_CHAIN in reverse order and COUNT is the number of
@@ -7467,6 +7483,8 @@ cp_finish_decomp (tree decl, tree first, unsigned int 
count)
  v[i]);
  goto error_out;
}
+ /* Save the decltype away before reference collapse.  */
+ store_decomp_type (v[i], eltype);
  eltype = cp_build_reference_type (eltype, !lvalue_p (init));
  TREE_TYPE (v[i]) = eltype;
  layout_decl (v[i], 0);
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index dc5ad13..96c67a5 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -8902,7 +8902,7 @@ finish_decltype_type (tree expr, bool 
id_expression_or_member_access_p,
return unlowered_expr_type (expr);
  else
/* Expr is a reference variable for the tuple case.  */
-   return non_reference (TREE_TYPE (expr));
+   return lookup_decomp_type (expr);
}
 
   switch (TREE_CODE (expr))
diff --git a/gcc/testsuite/g++.dg/cpp1z/decomp17.C 
b/gcc/testsuite/g++.dg/cpp1z/decomp17.C
new file mode 100644
index 000..484094b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/decomp17.C
@@ -0,0 +1,15 @@
+// { dg-options -std=c++1z }
+
+#include 
+
+template  struct same_type;
+template  struct same_type {};
+
+int main() {
+  int i;
+  std::tuple tuple = { 1, i, 1 };
+  auto &[v, r, rr] = tuple;
+  same_type{};
+  same_type{};
+  same_type{};
+}
diff --git a/gcc/tree.h b/gcc/tree.h
index 6a98b6e..0a82a4a 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2457,8 +2457,7 @@ extern void decl_value_expr_insert (tree, tree);
 
 /* In a VAR_DECL or PARM_DECL, the location at which the value may be found,
if transformations have made this more complicated than evaluating the
-   decl itself.  This should only be used for debugging; once this field has
-   been set, the decl itself may not legitimately appear in the function.  */
+   decl itself.  */
 #define DECL_HAS_VALUE_EXPR_P(NODE) \
   (TREE_CHECK3 (NODE, VAR_DECL, PARM_DECL, RESULT_DECL) \
->decl_common.decl_flag_2)


RE: [PATCH 1/4] MIPS16/GCC: Fix DImode `casesi_internal_mips16_' assembly instructions

2016-11-16 Thread Maciej W. Rozycki
On Wed, 16 Nov 2016, Matthew Fortune wrote:

> OK. I have no idea what system supports 64-bit MIPS16 but given it costs
> little to improve consistency here then it is at least doing no harm.

 No recent real hardware I believe.  Among older implementations there 
were the NEC Vr4111 and Vr4121 processors, both at the MIPS III ISA level, 
sans atomics and FPU.  Which is likely why we only support the soft-float 
model with NewABI MIPS16 compilations.  They had the usual TLB MMU though, 
so in principle they could run Linux and have the FPU emulator kick in for 
hard-float operations, except of course their glory was well before MIPS16 
support was even considered for Linux.

 Then there's always QEMU of course too.

 Patch applied (with another obvious whitespace fix), along with the rest 
from this set.  Thanks for your review!

  Maciej


Re: [PATCH] Fix up ICEs with TREE_CONSTANT references (PR c++/78373)

2016-11-16 Thread Jakub Jelinek
On Wed, Nov 16, 2016 at 04:26:36PM -0500, Jason Merrill wrote:
> On Wed, Nov 16, 2016 at 4:00 PM, Jakub Jelinek  wrote:
> > Jason's recent patch to turn reference vars initialized with invariant
> > addresses broke the first testcase below, because >singleton
> > is considered TREE_CONSTANT (because self is TREE_CONSTANT VAR_DECL and
> > singleton field has constant offset), but after going into SSA form
> > it is not supposed to be TREE_CONSTANT anymore (_2->singleton),
> > because SSA_NAMEs don't have TREE_CONSTANT set on them.
> >
> > The following patch fixes it by gimplifying such vars into their
> > DECL_INITIAL unless in OpenMP regions, where such folding is deferred
> > until omplower pass finishes.
> 
> Hmm, this seems like a workaround; why don't we see the same problem
> with constant pointer variables?

Dunno, tried to construct a testcase, but it doesn't fail.  If it is e.g.
TREE_CONSTANT because of being constexpr, then the FE already replaces it
by its initializer.

> A simpler workaround would be to not set TREE_CONSTANT on references
> in the first place, since the constexpr code doesn't need it.  What do
> you think?

If the FE doesn't need it, indeed it would be simpler this way.

> commit 6cdd28bb152fcb07a7eb6c9f053cd435cf719a20
> Author: Jason Merrill 
> Date:   Wed Nov 16 16:13:25 2016 -0500
> 
> ref
> 
> diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
> index c54a2de..87db589 100644
> --- a/gcc/cp/decl.c
> +++ b/gcc/cp/decl.c
> @@ -6839,7 +6839,8 @@ cp_finish_decl (tree decl, tree init, bool 
> init_const_expr_p,
> /* Set these flags now for templates.  We'll update the flags in
>store_init_value for instantiations.  */
> DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = 1;
> -   if (decl_maybe_constant_var_p (decl))
> +   if (decl_maybe_constant_var_p (decl)
> +   && TREE_CODE (type) != REFERENCE_TYPE)
>   TREE_CONSTANT (decl) = 1;
>   }
>  }
> diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
> index 022a478..dcdb710 100644
> --- a/gcc/cp/typeck2.c
> +++ b/gcc/cp/typeck2.c
> @@ -824,7 +824,8 @@ store_init_value (tree decl, tree init, vec va_gc>** cleanups, int flags)
>const_init = (reduced_constant_expression_p (value)
>   || error_operand_p (value));
>DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = const_init;
> -  TREE_CONSTANT (decl) = const_init && decl_maybe_constant_var_p (decl);
> +  if (TREE_CODE (type) != REFERENCE_TYPE)
> + TREE_CONSTANT (decl) = const_init && decl_maybe_constant_var_p (decl);
>  }
>value = cp_fully_fold (value);
>  

Jakub


PR78319

2016-11-16 Thread Prathamesh Kulkarni
Hi,
As discussed in PR, this patch marks the test-case to xfail on arm-none-eabi.
OK to commit ?

Thanks,
Prathamesh
2016-11-17  Prathamesh Kulkarni  

PR tree-optimization/78319

testsuite/
* gcc.dg/uninit-pred-8_a.c (foo): Mark dg-bogus test to xfail on
arm-none-eabi.

diff --git a/gcc/testsuite/gcc.dg/uninit-pred-8_a.c 
b/gcc/testsuite/gcc.dg/uninit-pred-8_a.c
index 1b7c472..c45fba0 100644
--- a/gcc/testsuite/gcc.dg/uninit-pred-8_a.c
+++ b/gcc/testsuite/gcc.dg/uninit-pred-8_a.c
@@ -16,8 +16,9 @@ int foo (int n, int l, int m, int r)
   if (m) g++;
   else   bar();
 
+  /* marking this test as xfail on arm-none-eabi, see PR78319.  */
   if ( n ||  m || r || l)
-  blah(v); /* { dg-bogus "uninitialized" "bogus warning" } */
+  blah(v); /* { dg-bogus "uninitialized" "bogus warning" { xfail 
arm-none-eabi } } */
 
   if ( n )
   blah(v); /* { dg-bogus "uninitialized" "bogus warning" } */


Re: PR78319

2016-11-16 Thread Jeff Law

On 11/16/2016 01:23 PM, Prathamesh Kulkarni wrote:

Hi,
As discussed in PR, this patch marks the test-case to xfail on arm-none-eabi.
OK to commit ?
You might check if Aldy's change to the uninit code helps your case 
(approved earlier today, so hopefully in the tree very soon).  I quickly 
scanned the BZ.  There's some overlap, but it might be too complex for 
Aldy's enhancements to catch.


jeff


c-family PATCH to tidy switch diagnostics (PR c/78285)

2016-11-16 Thread Marek Polacek
As pointed out in Bug 78285, some error calls should actually be inform calls.
I'm not adding any new test; existing switch-5.c covers all the cases so I 
didn't
see much value in duplicating that part of the test.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2016-11-16  Marek Polacek  

PR c/78285
* c-common.c (c_add_case_label): Turn error_at calls into inform.

* gcc.dg/switch-5.c: Turn several dg-errors into dg-messages.
* g++.dg/ext/case-range2.C: Likewise.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 2997c83..3eb7f45 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -4968,19 +4968,19 @@ c_add_case_label (location_t loc, splay_tree cases, 
tree cond, tree orig_type,
   if (high_value)
{
  error_at (loc, "duplicate (or overlapping) case value");
- error_at (DECL_SOURCE_LOCATION (duplicate),
-   "this is the first entry overlapping that value");
+ inform (DECL_SOURCE_LOCATION (duplicate),
+ "this is the first entry overlapping that value");
}
   else if (low_value)
{
  error_at (loc, "duplicate case value") ;
- error_at (DECL_SOURCE_LOCATION (duplicate), "previously used here");
+ inform (DECL_SOURCE_LOCATION (duplicate), "previously used here");
}
   else
{
  error_at (loc, "multiple default labels in one switch");
- error_at (DECL_SOURCE_LOCATION (duplicate),
-   "this is the first default label");
+ inform (DECL_SOURCE_LOCATION (duplicate),
+ "this is the first default label");
}
   goto error_out;
 }
diff --git gcc/testsuite/g++.dg/ext/case-range2.C 
gcc/testsuite/g++.dg/ext/case-range2.C
index 985ded3..f1165ad 100644
--- gcc/testsuite/g++.dg/ext/case-range2.C
+++ gcc/testsuite/g++.dg/ext/case-range2.C
@@ -11,7 +11,7 @@ T f2 (T i)
 {
   switch (i)
   {
-case low ... high : return i + 1;  // { dg-error "previously" }
+case low ... high : return i + 1;  // { dg-message "previously" }
 case 5 : return i + 2; // { dg-error "duplicate" }
 default : return 0;
   }
@@ -20,7 +20,7 @@ T f2 (T i)
 int f (int i)
 {
   switch (i) {
-case 1 ... 10: return i + 1;   // { dg-error "first entry" }
+case 1 ... 10: return i + 1;   // { dg-message "first entry" }
 case 3 ... 5 : return i + 3;   // { dg-error "duplicate" }
 default: return f2 (i);// { dg-message "required" }
   }
diff --git gcc/testsuite/gcc.dg/switch-5.c gcc/testsuite/gcc.dg/switch-5.c
index 5a58490..a097d44 100644
--- gcc/testsuite/gcc.dg/switch-5.c
+++ gcc/testsuite/gcc.dg/switch-5.c
@@ -40,13 +40,13 @@ f (int a, double d, void *p)
   switch (a)
 {
 case 0:
-default: /* { dg-error "this is the first default label" } */
+default: /* { dg-message "this is the first default label" } */
 case 1:
 default: ; /* { dg-error "multiple default labels in one switch" } */
 }
   switch (a)
 {
-case 0: /* { dg-error "previously used here" } */
+case 0: /* { dg-message "previously used here" } */
 case 1:
 case 0: ; /* { dg-error "duplicate case value" } */
 }
@@ -60,11 +60,11 @@ f (int a, double d, void *p)
  }
switch (a)
  {
- case 0: /* { dg-error "this is the first entry overlapping that value" } 
*/
+ case 0: /* { dg-message "this is the first entry overlapping that value" 
} */
  case -1 ... 1: /* { dg-error "duplicate \\(or overlapping\\) case value" 
} */
- case 2 ... 3: /* { dg-error "previously used here" } */
+ case 2 ... 3: /* { dg-message "previously used here" } */
  case 2: /* { dg-error "duplicate case value" } */
- case 4 ... 7: /* { dg-error "this is the first entry overlapping that 
value" } */
+ case 4 ... 7: /* { dg-message "this is the first entry overlapping that 
value" } */
  case 6 ... 9: ; /* { dg-error "duplicate \\(or overlapping\\) case value" 
} */
  }
switch (a)

Marek


[PATCH v2 0/2] strncmp builtin expansion improvement

2016-11-16 Thread Aaron Sawdey
Builtin expansion of strncmp currently only happens when at least one
of the string arguments is a constant string. Two pieces are needed to
enable this:

1) Fix i386.md cmpstrnsi pattern. It uses repzcmpsb which does not
actually test for the zero byte ending the string. So this is only a
valid pattern when the length of one of the strings is known. So this
adds a test for one of the string args being a string constant, in
which case expand_builtin_strncmp will have made sure the length arg is
no larger than this known length. Also I've changed the pattern to
reflect the fact that the generated code can clobber operand 3 if it
happens to be in cx.

2) If c_strlen () was unable to determine the length of either string,
expand_builtin_strncmp will now use only the length argument and will
proceed anyway. Also I've removed a couple pieces that Richard
indicated are not needed any more.

Bootstrap & regtest passed on x86_64 with svn 242454, ok for trunk?

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH] Fix NetBSD bootstrap

2016-11-16 Thread Mike Stump

> On Nov 16, 2016, at 11:23 AM, Krister Walfridsson 
>  wrote:
> 
> On Wed, 16 Nov 2016, Mike Stump wrote:
> 
>> Looks reasonable.  The biggest issue would be if any of those values changed 
>> through time, and the current version works for older netbsd releases, the 
>> patch could break them.  Of course, I don't have any visibility into how any 
>> of those values might have changed through time.
> 
> This should not be an issue in this case, so I'll commit the patch. Thanks!

Oh, I don't know if you are tracking release branches so that previous releases 
just work, but if you are, you can keep track of patches that would need to be 
back ported for a release branch to work nicely.  As you finish with getting 
things in shape, you can then go back and see about back porting that work, if 
you are interested in that.



Re: [patch, libfortran] Add AVX-specific matmul

2016-11-16 Thread Jakub Jelinek
On Wed, Nov 16, 2016 at 10:30:03PM +0100, Thomas Koenig wrote:
> the attached patch adds an AVX-specific version of the matmul
> intrinsic to the Fortran library.  This works by using the target_clones
> attribute.

Don't you need to test in configure if the assembler supports AVX?
Otherwise if somebody is bootstrapping gcc with older assembler, it will
just fail to bootstrap.
For matmul_i*, wouldn't it make more sense to use avx2 instead of avx,
or both avx and avx2 and maybe avx512f?

> 2016-11-16  Thomas Koenig  
> 
> PR fortran/78379
> * m4/matmul.m4:  For x86_64, make the work function for matmul

Why the extra space before For?

> static with target_clones for AVX and default, and create
> a wrapper function to call it.
> * generated/matmul_c10.c

Missing : Regenerated.

Jakub


[PING^2][PATCH][aarch64] Improve Logical And Immediate Expressions

2016-11-16 Thread Michael Collison
Ping^2. Link to original post:

https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02305.html


[PATCH] PR fortran/58001 -- Handle tab in FORMAT

2016-11-16 Thread Steve Kargl
An earlier version of the attached patch lingered in bugzilla
for over 3 years.  I've updated the patch to include Manuel's
comment #12.  Regression tested on x86_64-*-freebsd.  OK to
commit?

2016-11-16  Steven G. Kargl  

PR fortran/58001
* io.c (next_char_not_space): Update handling of a 'tab' in a FORMAT.
(format_lex): Adjust invocations of next_char_not_space().
 
2016-11-16  Steven G. Kargl  

PR fortran/58001
* gfortran.dg/fmt_tab_1.f90: Adjust testcase.
* gfortran.dg/fmt_tab_2.f90: Ditto.

-- 
Steve
Index: gcc/fortran/io.c
===
--- gcc/fortran/io.c	(revision 242512)
+++ gcc/fortran/io.c	(working copy)
@@ -200,23 +200,14 @@ unget_char (void)
 /* Eat up the spaces and return a character.  */
 
 static char
-next_char_not_space (bool *error)
+next_char_not_space ()
 {
   char c;
   do
 {
   error_element = c = next_char (NONSTRING);
   if (c == '\t')
-	{
-	  if (gfc_option.allow_std & GFC_STD_GNU)
-	gfc_warning (0, "Extension: Tab character in format at %C");
-	  else
-	{
-	  gfc_error ("Extension: Tab character in format at %C");
-	  *error = true;
-	  return c;
-	}
-	}
+	gfc_warning (OPT_Wtabs, "Nonconforming tab character in format at %C");
 }
   while (gfc_is_whitespace (c));
   return c;
@@ -234,7 +225,6 @@ format_lex (void)
   char c, delim;
   int zflag;
   int negative_flag;
-  bool error = false;
 
   if (saved_token != FMT_NONE)
 {
@@ -243,7 +233,7 @@ format_lex (void)
   return token;
 }
 
-  c = next_char_not_space ();
+  c = next_char_not_space ();
   
   negative_flag = 0;
   switch (c)
@@ -253,7 +243,7 @@ format_lex (void)
   /* Falls through.  */
 
 case '+':
-  c = next_char_not_space ();
+  c = next_char_not_space ();
   if (!ISDIGIT (c))
 	{
 	  token = FMT_UNKNOWN;
@@ -264,7 +254,7 @@ format_lex (void)
 
   do
 	{
-	  c = next_char_not_space ();
+	  c = next_char_not_space ();
 	  if (ISDIGIT (c))
 	value = 10 * value + c - '0';
 	}
@@ -294,7 +284,7 @@ format_lex (void)
 
   do
 	{
-	  c = next_char_not_space ();
+	  c = next_char_not_space ();
 	  if (ISDIGIT (c))
 	{
 	  value = 10 * value + c - '0';
@@ -329,7 +319,7 @@ format_lex (void)
   break;
 
 case 'T':
-  c = next_char_not_space ();
+  c = next_char_not_space ();
   switch (c)
 	{
 	case 'L':
@@ -357,7 +347,7 @@ format_lex (void)
   break;
 
 case 'S':
-  c = next_char_not_space ();
+  c = next_char_not_space ();
   if (c != 'P' && c != 'S')
 	unget_char ();
 
@@ -365,7 +355,7 @@ format_lex (void)
   break;
 
 case 'B':
-  c = next_char_not_space ();
+  c = next_char_not_space ();
   if (c == 'N' || c == 'Z')
 	token = FMT_BLANK;
   else
@@ -427,7 +417,7 @@ format_lex (void)
   break;
 
 case 'E':
-  c = next_char_not_space ();
+  c = next_char_not_space ();
   if (c == 'N' )
 	token = FMT_EN;
   else if (c == 'S')
@@ -457,7 +447,7 @@ format_lex (void)
   break;
 
 case 'D':
-  c = next_char_not_space ();
+  c = next_char_not_space ();
   if (c == 'P')
 	{
 	  if (!gfc_notify_std (GFC_STD_F2003, "DP format "
@@ -478,7 +468,7 @@ format_lex (void)
 	  "specifier not allowed at %C"))
 	return FMT_ERROR;
 	  token = FMT_DT;
-	  c = next_char_not_space ();
+	  c = next_char_not_space ();
 	  if (c == '\'' || c == '"')
 	{
 	  delim = c;
@@ -518,7 +508,7 @@ format_lex (void)
   break;
 
 case 'R':
-  c = next_char_not_space ();
+  c = next_char_not_space ();
   switch (c)
 	{
 	case 'C':
@@ -559,9 +549,6 @@ format_lex (void)
   break;
 }
 
-  if (error)
-return FMT_ERROR;
-
   return token;
 }
 
Index: gcc/testsuite/gfortran.dg/fmt_tab_1.f90
===
--- gcc/testsuite/gfortran.dg/fmt_tab_1.f90	(revision 242512)
+++ gcc/testsuite/gfortran.dg/fmt_tab_1.f90	(working copy)
@@ -1,7 +1,12 @@
 ! { dg-do compile }
-! { dg-options -Wno-error=tabs }
+! { dg-options -Wtabs }
 ! PR fortran/32987
+! PR fortran/58001
   program TestFormat
 write (*, 10)
- 10 format ('Hello ',	'bug!') ! { dg-warning "Extension: Tab character in format" }
+! There is a tab character before 'bug!'.  This is accepted without
+! the -Wno-tabs option or a -std= option.
+ 10 format ('Hello ',	'bug!') ! { dg-warning "tab character in format" }
+
   end
+! { dg-excess-errors "tab character in format" }
Index: gcc/testsuite/gfortran.dg/fmt_tab_2.f90
===
--- gcc/testsuite/gfortran.dg/fmt_tab_2.f90	(revision 242512)
+++ gcc/testsuite/gfortran.dg/fmt_tab_2.f90	(working copy)
@@ -1,7 +1,9 @@
 ! { dg-do compile }
 ! { dg-options "-std=f2003" }
 ! PR fortran/32987
+! PR fortran/58001
   program TestFormat
-write 

Re: [PATCH] Fix up ICEs with TREE_CONSTANT references (PR c++/78373)

2016-11-16 Thread Jason Merrill
On Wed, Nov 16, 2016 at 4:00 PM, Jakub Jelinek  wrote:
> Jason's recent patch to turn reference vars initialized with invariant
> addresses broke the first testcase below, because >singleton
> is considered TREE_CONSTANT (because self is TREE_CONSTANT VAR_DECL and
> singleton field has constant offset), but after going into SSA form
> it is not supposed to be TREE_CONSTANT anymore (_2->singleton),
> because SSA_NAMEs don't have TREE_CONSTANT set on them.
>
> The following patch fixes it by gimplifying such vars into their
> DECL_INITIAL unless in OpenMP regions, where such folding is deferred
> until omplower pass finishes.

Hmm, this seems like a workaround; why don't we see the same problem
with constant pointer variables?

A simpler workaround would be to not set TREE_CONSTANT on references
in the first place, since the constexpr code doesn't need it.  What do
you think?
commit 6cdd28bb152fcb07a7eb6c9f053cd435cf719a20
Author: Jason Merrill 
Date:   Wed Nov 16 16:13:25 2016 -0500

ref

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index c54a2de..87db589 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6839,7 +6839,8 @@ cp_finish_decl (tree decl, tree init, bool 
init_const_expr_p,
  /* Set these flags now for templates.  We'll update the flags in
 store_init_value for instantiations.  */
  DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = 1;
- if (decl_maybe_constant_var_p (decl))
+ if (decl_maybe_constant_var_p (decl)
+ && TREE_CODE (type) != REFERENCE_TYPE)
TREE_CONSTANT (decl) = 1;
}
 }
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 022a478..dcdb710 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -824,7 +824,8 @@ store_init_value (tree decl, tree init, vec** 
cleanups, int flags)
   const_init = (reduced_constant_expression_p (value)
|| error_operand_p (value));
   DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = const_init;
-  TREE_CONSTANT (decl) = const_init && decl_maybe_constant_var_p (decl);
+  if (TREE_CODE (type) != REFERENCE_TYPE)
+   TREE_CONSTANT (decl) = const_init && decl_maybe_constant_var_p (decl);
 }
   value = cp_fully_fold (value);
 


Re: [patch,libgfortran] PR51119 - MATMUL slow for large matrices

2016-11-16 Thread Jerry DeLisle

Committed after approval on bugzilla to eliminate warnings.

2016-11-16  Jerry DeLisle  

PR libgfortran/51119
* Makefile.am: Remove -fno-protect-parens -fstack-arrays.
* Makefile.in: Regenerate.


r242517 = 026291bdda18395d7c746856dd7e4ed384856a1b (refs/remotes/svn/trunk)
M   libgfortran/Makefile.in
M   libgfortran/ChangeLog
M   libgfortran/Makefile.am

Regards,

Jerry


[PATCH v2 1/2, i386] cmpstrnsi needs string length

2016-11-16 Thread Aaron Sawdey
This patch adds a test to the cmpstrnsi pattern in i386.md so that it
will bail out (FAIL) if neither of the strings is a constant string. It
can only work as a proper strncmp if the length is not longer than both
of the strings. This change is required if expand_builtin_strncmp is
going to try expansion of strncmp when neither string argument is
constant. I've also changed the pattern to indicate that operand 3 may
be clobbered (if it happens to be in cx already).

2016-11-16  Aaron Sawdey  

* config/i386/i386.md (cmpstrnsi): New test to bail out if neither
string input is a string constant.  Clobber length argument.


-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/i386/i386.md
===
--- gcc/config/i386/i386.md	(revision 242428)
+++ gcc/config/i386/i386.md	(working copy)
@@ -16898,7 +16898,7 @@
   [(set (match_operand:SI 0 "register_operand")
 	(compare:SI (match_operand:BLK 1 "general_operand")
 		(match_operand:BLK 2 "general_operand")))
-   (use (match_operand 3 "general_operand"))
+   (clobber (match_operand 3 "general_operand"))
(use (match_operand 4 "immediate_operand"))]
   ""
 {
@@ -16911,6 +16911,21 @@
   if (fixed_regs[CX_REG] || fixed_regs[SI_REG] || fixed_regs[DI_REG])
 FAIL;
 
+  /* One of the strings must be a constant.  If so, expand_builtin_strncmp()
+ will have rewritten the length arg to be the minimum of the const string
+ length and the actual length arg.  If both strings are the same and
+ shorter than the length arg, repz cmpsb will not stop at the 0 byte and
+ will incorrectly base the results on chars past the 0 byte.  */
+  tree t1 = MEM_EXPR (operands[1]);
+  tree t2 = MEM_EXPR (operands[2]);
+  if (!((t1 && TREE_CODE (t1) == MEM_REF
+ && TREE_CODE (TREE_OPERAND (t1, 0)) == ADDR_EXPR
+ && TREE_CODE (TREE_OPERAND (TREE_OPERAND (t1, 0), 0)) == STRING_CST)
+  || (t2 && TREE_CODE (t2) == MEM_REF
+  && TREE_CODE (TREE_OPERAND (t2, 0)) == ADDR_EXPR
+  && TREE_CODE (TREE_OPERAND (TREE_OPERAND (t2, 0), 0)) == STRING_CST)))
+FAIL;
+
   out = operands[0];
   if (!REG_P (out))
 out = gen_reg_rtx (SImode);


[PATCH v2 2/2, expand] make expand_builtin_strncmp more general

2016-11-16 Thread Aaron Sawdey
This patch makes expand_builtin_strncmp attempt to expand via cmpstrnsi
even if neither of the string arguments are string constants.

2016-11-16  Aaron Sawdey  

* builtins.c (expand_builtin_strncmp): Attempt expansion of strncmp
via cmpstrnsi even if neither string is constant.

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: gcc/builtins.c
===
--- gcc/builtins.c	(revision 242428)
+++ gcc/builtins.c	(working copy)
@@ -3918,7 +3918,7 @@
   insn_code cmpstrn_icode = direct_optab_handler (cmpstrn_optab, SImode);
   if (cmpstrn_icode != CODE_FOR_nothing)
   {
-tree len, len1, len2;
+tree len, len1, len2, len3;
 rtx arg1_rtx, arg2_rtx, arg3_rtx;
 rtx result;
 tree fndecl, fn;
@@ -3937,14 +3937,19 @@
 if (len2)
   len2 = size_binop_loc (loc, PLUS_EXPR, ssize_int (1), len2);
 
+len3 = fold_convert_loc (loc, sizetype, arg3);
+
 /* If we don't have a constant length for the first, use the length
-   of the second, if we know it.  We don't require a constant for
+   of the second, if we know it.  If neither string is constant length,
+   use the given length argument.  We don't require a constant for
this case; some cost analysis could be done if both are available
but neither is constant.  For now, assume they're equally cheap,
unless one has side effects.  If both strings have constant lengths,
use the smaller.  */
 
-if (!len1)
+if (!len1 && !len2)
+  len = len3;
+else if (!len1)
   len = len2;
 else if (!len2)
   len = len1;
@@ -3961,23 +3966,10 @@
 else
   len = len2;
 
-/* If both arguments have side effects, we cannot optimize.  */
-if (!len || TREE_SIDE_EFFECTS (len))
-  return NULL_RTX;
-
-/* The actual new length parameter is MIN(len,arg3).  */
-len = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (len), len,
-		   fold_convert_loc (loc, TREE_TYPE (len), arg3));
-
-/* If we don't have POINTER_TYPE, call the function.  */
-if (arg1_align == 0 || arg2_align == 0)
-  return NULL_RTX;
-
-/* Stabilize the arguments in case gen_cmpstrnsi fails.  */
-arg1 = builtin_save_expr (arg1);
-arg2 = builtin_save_expr (arg2);
-len = builtin_save_expr (len);
-
+/* If we are not using the given length, we must incorporate it here.
+   The actual new length parameter will be MIN(len,arg3) in this case.  */
+if (len != len3)
+  len = fold_build2_loc (loc, MIN_EXPR, TREE_TYPE (len), len, len3);
 arg1_rtx = get_memory_rtx (arg1, len);
 arg2_rtx = get_memory_rtx (arg2, len);
 arg3_rtx = expand_normal (len);


Re: [2/9] Encoding support for AArch64 DWARF operations

2016-11-16 Thread Jason Merrill
On Wed, Nov 16, 2016 at 12:50 PM, Jason Merrill  wrote:
> On Fri, Nov 11, 2016 at 1:33 PM, Jiong Wang  wrote:
>> The encoding for new added AARCH64 DWARF operations.
>
> This patch seems rather incomplete; I only see a change to
> dwarf2out.c, which won't compile since the opcodes aren't defined
> anywhere.

Sorry, now I see the rest of the patchset.

Jason


Re: [PATCH] Follow-up patch on enabling new AVX512 instructions

2016-11-16 Thread Uros Bizjak
On Tue, Nov 15, 2016 at 10:50 PM, Andrew Senkevich
 wrote:
> Hi,
>
> this is follow-up with tests for new __target__ attributes and
> __builtin_cpu_supports update.
>
> gcc/
> * config/i386/i386.c (processor_features): Add
> F_AVX5124VNNIW, F_AVX5124FMAPS.
> (isa_names_table): Handle new features.
> libgcc/
> * config/i386/cpuinfo.c (processor_features): Add
> FEATURE_AVX5124VNNIW, FEATURE_AVX5124FMAPS.
> gcc/testsuite/
> * gcc.target/i386/builtin_target.c: Handle new "avx5124vnniw",
> "avx5124fmaps".
> * gcc.target/i386/funcspec-56.inc: Test new attributes.

OK.

Thanks,
Uros.

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 1da1abc..823930d
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -33205,6 +33205,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
>  F_AVX512PF,
>  F_AVX512VBMI,
>  F_AVX512IFMA,
> +F_AVX5124VNNIW,
> +F_AVX5124FMAPS,
>  F_MAX
>};
>
> @@ -33317,6 +33319,8 @@ fold_builtin_cpu (tree fndecl, tree *args)
>{"avx512pf",F_AVX512PF},
>{"avx512vbmi",F_AVX512VBMI},
>{"avx512ifma",F_AVX512IFMA},
> +  {"avx5124vnniw",F_AVX5124VNNIW},
> +  {"avx5124fmaps",F_AVX5124FMAPS},
>  };
>
>tree __processor_model_type = build_processor_model_struct ();
> diff --git a/gcc/testsuite/gcc.target/i386/builtin_target.c
> b/gcc/testsuite/gcc.target/i386/builtin_target.c
> index 8d45d83..c620a74
> --- a/gcc/testsuite/gcc.target/i386/builtin_target.c
> +++ b/gcc/testsuite/gcc.target/i386/builtin_target.c
> @@ -213,6 +213,10 @@ check_features (unsigned int ecx, unsigned int edx,
> assert (__builtin_cpu_supports ("avx512ifma"));
>if (ecx & bit_AVX512VBMI)
> assert (__builtin_cpu_supports ("avx512vbmi"));
> +  if (edx & bit_AVX5124VNNIW)
> +   assert (__builtin_cpu_supports ("avx5124vnniw"));
> +  if (edx & bit_AVX5124FMAPS)
> +   assert (__builtin_cpu_supports ("avx5124fmaps"));
>  }
>  }
>
> @@ -311,6 +315,10 @@ quick_check ()
>
>assert (__builtin_cpu_supports ("avx512f") >= 0);
>
> +  assert (__builtin_cpu_supports ("avx5124vnniw") >= 0);
> +
> +  assert (__builtin_cpu_supports ("avx5124fmaps") >= 0);
> +
>/* Check CPU type.  */
>assert (__builtin_cpu_is ("amd") >= 0);
>
> diff --git a/libgcc/config/i386/cpuinfo.c b/libgcc/config/i386/cpuinfo.c
> index af203f2..4a0ad25
> --- a/libgcc/config/i386/cpuinfo.c
> +++ b/libgcc/config/i386/cpuinfo.c
> @@ -115,7 +115,9 @@ enum processor_features
>FEATURE_AVX512ER,
>FEATURE_AVX512PF,
>FEATURE_AVX512VBMI,
> -  FEATURE_AVX512IFMA
> +  FEATURE_AVX512IFMA,
> +  FEATURE_AVX5124VNNIW,
> +  FEATURE_AVX5124FMAPS
>  };
>
>  struct __processor_model
> @@ -359,6 +361,10 @@ get_available_features (unsigned int ecx, unsigned int 
> edx,
> features |= (1 << FEATURE_AVX512IFMA);
>if (ecx & bit_AVX512VBMI)
> features |= (1 << FEATURE_AVX512VBMI);
> +  if (edx & bit_AVX5124VNNIW)
> +   features |= (1 << FEATURE_AVX5124VNNIW);
> +  if (edx & bit_AVX5124FMAPS)
> +   features |= (1 << FEATURE_AVX5124FMAPS);
>  }
>
>unsigned int ext_level;
> diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> index 521ac8a..9334e9e 100644
> --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
> @@ -28,6 +28,8 @@ extern void test_avx512dq(void)
>  __attribute__((__target__("avx512dq")));
>  extern void test_avx512er(void)
> __attribute__((__target__("avx512er")));
>  extern void test_avx512pf(void)
> __attribute__((__target__("avx512pf")));
>  extern void test_avx512cd(void)
> __attribute__((__target__("avx512cd")));
> +extern void test_avx5124fmaps(void)
> __attribute__((__target__("avx5124fmaps")));
> +extern void test_avx5124vnniw(void)
> __attribute__((__target__("avx5124vnniw")));
>  extern void test_bmi (void)
> __attribute__((__target__("bmi")));
>  extern void test_bmi2 (void)
> __attribute__((__target__("bmi2")));
>
> @@ -59,6 +61,8 @@ extern void test_no_avx512dq(void)
> __attribute__((__target__("no-avx512dq")));
>  extern void test_no_avx512er(void)
> __attribute__((__target__("no-avx512er")));
>  extern void test_bo_avx512pf(void)
> __attribute__((__target__("no-avx512pf")));
>  extern void test_no_avx512cd(void)
> __attribute__((__target__("no-avx512cd")));
> +extern void test_no_avx5124fmaps(void)
> __attribute__((__target__("no-avx5124fmaps")));
> +extern void test_no_avx5124vnniw(void)
> __attribute__((__target__("no-avx5124vnniw")));
>  extern void test_no_bmi (void)
> __attribute__((__target__("no-bmi")));
>  extern void test_no_bmi2 (void)
> __attribute__((__target__("no-bmi2")));
>
>
> --
> WBR,
> Andrew


[PATCH] Fix up ICEs with TREE_CONSTANT references (PR c++/78373)

2016-11-16 Thread Jakub Jelinek
Hi!

Jason's recent patch to turn reference vars initialized with invariant
addresses broke the first testcase below, because >singleton
is considered TREE_CONSTANT (because self is TREE_CONSTANT VAR_DECL and
singleton field has constant offset), but after going into SSA form
it is not supposed to be TREE_CONSTANT anymore (_2->singleton),
because SSA_NAMEs don't have TREE_CONSTANT set on them.

The following patch fixes it by gimplifying such vars into their
DECL_INITIAL unless in OpenMP regions, where such folding is deferred
until omplower pass finishes.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-16  Jakub Jelinek  

PR c++/78373
* gimplify.c (gimplify_decl_expr): For TREE_CONSTANT reference
vars with is_gimple_min_invariant initializer after stripping
useless conversions keep non-NULL DECL_INITIAL.
(gimplify_var_or_parm_decl): Add fallback argument.  Gimplify
TREE_CONSTANT reference vars with is_gimple_min_invariant initializer
outside of OpenMP contexts to the initializer if fb_rvalue is
allowed.
(gimplify_compound_lval, gimplify_expr): Pass through fallback
argument to gimplify_var_or_parm_decl.
* omp-low.c (lower_omp_regimplify_p): Return non-zero for
TREE_CONSTANT reference vars with is_gimple_min_invariant
initializer.

* g++.dg/opt/pr78373.C: New test.
* g++.dg/gomp/pr78373.C: New test.

--- gcc/gimplify.c.jj   2016-11-10 12:34:12.0 +0100
+++ gcc/gimplify.c  2016-11-16 16:15:13.496510476 +0100
@@ -1645,10 +1645,23 @@ gimplify_decl_expr (tree *stmt_p, gimple
{
  if (!TREE_STATIC (decl))
{
+ tree new_init = NULL_TREE;
+ if (TREE_CONSTANT (decl)
+ && TREE_CODE (TREE_TYPE (decl)) == REFERENCE_TYPE)
+   {
+ new_init = init;
+ STRIP_USELESS_TYPE_CONVERSION (new_init);
+ if (is_gimple_min_invariant (new_init))
+   new_init = unshare_expr (new_init);
+ else
+   new_init = NULL_TREE;
+   }
  DECL_INITIAL (decl) = NULL_TREE;
  init = build2 (INIT_EXPR, void_type_node, decl, init);
  gimplify_and_add (init, seq_p);
  ggc_free (init);
+ if (new_init && DECL_INITIAL (decl) == NULL_TREE)
+   DECL_INITIAL (decl) = new_init;
}
  else
/* We must still examine initializers for static variables
@@ -2546,7 +2559,7 @@ static tree nonlocal_vla_vars;
DECL_VALUE_EXPR, and it's worth re-examining things.  */
 
 static enum gimplify_status
-gimplify_var_or_parm_decl (tree *expr_p)
+gimplify_var_or_parm_decl (tree *expr_p, fallback_t fallback)
 {
   tree decl = *expr_p;
 
@@ -2607,6 +2620,29 @@ gimplify_var_or_parm_decl (tree *expr_p)
   return GS_OK;
 }
 
+  /* Gimplify TREE_CONSTANT C++ reference vars as their DECL_INITIAL.  */
+  if (VAR_P (decl)
+  && TREE_CONSTANT (decl)
+  && TREE_CODE (TREE_TYPE (decl)) == REFERENCE_TYPE
+  && DECL_INITIAL (decl)
+  && (fallback & fb_rvalue) != 0
+  && is_gimple_min_invariant (DECL_INITIAL (decl)))
+{
+  if (gimplify_omp_ctxp)
+   {
+ /* Don't do this in OpenMP regions, see e.g. libgomp.c++/target-14.C
+testcase.  We'll do that in the omplower pass later instead.  */
+ if ((gimplify_omp_ctxp->region_type != ORT_TARGET
+  || gimplify_omp_ctxp->outer_context != NULL
+  || !lookup_attribute ("omp declare target",
+DECL_ATTRIBUTES (cfun->decl)))
+ && gimplify_omp_ctxp->region_type != ORT_NONE)
+   return GS_ALL_DONE;
+   }
+  *expr_p = unshare_expr (DECL_INITIAL (decl));
+  return GS_OK;
+}
+
   return GS_ALL_DONE;
 }
 
@@ -2712,7 +2748,7 @@ gimplify_compound_lval (tree *expr_p, gi
   /* Expand DECL_VALUE_EXPR now.  In some cases that may expose
 additional COMPONENT_REFs.  */
   else if ((VAR_P (*p) || TREE_CODE (*p) == PARM_DECL)
-  && gimplify_var_or_parm_decl (p) == GS_OK)
+  && gimplify_var_or_parm_decl (p, fallback) == GS_OK)
goto restart;
   else
break;
@@ -11624,7 +11660,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 
case VAR_DECL:
case PARM_DECL:
- ret = gimplify_var_or_parm_decl (expr_p);
+ ret = gimplify_var_or_parm_decl (expr_p, fallback);
  break;
 
case RESULT_DECL:
--- gcc/omp-low.c.jj2016-11-16 09:23:46.0 +0100
+++ gcc/omp-low.c   2016-11-16 15:44:35.071617469 +0100
@@ -16963,6 +16963,14 @@ lower_omp_regimplify_p (tree *tp, int *w
   && bitmap_bit_p (task_shared_vars, DECL_UID (t)))
 return t;
 
+  /* And C++ references with TREE_CONSTANT set too.  */
+  if (VAR_P (t)
+  && TREE_CONSTANT (t)
+  && 

[PATCH] Fix combine's make_extraction (PR rtl-optimization/78378)

2016-11-16 Thread Jakub Jelinek
Hi!

If inner is a MEM, make_extraction requires that pos is a multiple of bytes
and deals with offsetting it.  Or otherwise requires that pos is a multiple
of BITS_PER_WORD and for REG inner it handles that too.  But if inner
is something different, it calls just force_to_mode to the target mode,
which only really works if pos is 0.

Thus the following patch restricts it to that case.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-11-16  Jakub Jelinek  

PR rtl-optimization/78378
* combine.c (make_extraction): Use force_to_mode for non-{REG,MEM}
inner only if pos is 0.  Fix up formatting.

* gcc.c-torture/execute/pr78378.c: New test.

--- gcc/combine.c.jj2016-11-16 18:51:58.0 +0100
+++ gcc/combine.c   2016-11-16 19:52:12.735674000 +0100
@@ -7382,6 +7382,7 @@ make_extraction (machine_mode mode, rtx
   if (tmode != BLKmode
   && ((pos_rtx == 0 && (pos % BITS_PER_WORD) == 0
   && !MEM_P (inner)
+  && (pos == 0 || REG_P (inner))
   && (inner_mode == tmode
   || !REG_P (inner)
   || TRULY_NOOP_TRUNCATION_MODES_P (tmode, inner_mode)
@@ -7458,10 +7459,9 @@ make_extraction (machine_mode mode, rtx
}
   else
new_rtx = force_to_mode (inner, tmode,
-len >= HOST_BITS_PER_WIDE_INT
-? HOST_WIDE_INT_M1U
-: (HOST_WIDE_INT_1U << len) - 1,
-0);
+len >= HOST_BITS_PER_WIDE_INT
+? HOST_WIDE_INT_M1U
+: (HOST_WIDE_INT_1U << len) - 1, 0);
 
   /* If this extraction is going into the destination of a SET,
 make a STRICT_LOW_PART unless we made a MEM.  */
--- gcc/testsuite/gcc.c-torture/execute/pr78378.c.jj2016-11-16 
20:02:16.975248597 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr78378.c   2016-11-16 
20:02:03.0 +0100
@@ -0,0 +1,18 @@
+/* PR rtl-optimization/78378 */
+
+unsigned long long __attribute__ ((noinline, noclone))
+foo (unsigned long long x)
+{
+  x <<= 41;
+  x /= 232;
+  return 1 + (unsigned short) x;
+}
+
+int
+main ()
+{
+  unsigned long long x = foo (1);
+  if (x != 0x2c24)
+__builtin_abort();
+  return 0;
+}

Jakub


Re: [PATCH] Fix NetBSD bootstrap

2016-11-16 Thread Krister Walfridsson

On Wed, 16 Nov 2016, Mike Stump wrote:

Looks reasonable.  The biggest issue would be if any of those values 
changed through time, and the current version works for older netbsd 
releases, the patch could break them.  Of course, I don't have any 
visibility into how any of those values might have changed through time.


This should not be an issue in this case, so I'll commit the patch. 
Thanks!


   /Krister


Re: [PATCH] Handle --enable-checking={yes,assert,release} in libcpp (PR bootstrap/72823)

2016-11-16 Thread Richard Biener
On November 16, 2016 7:22:51 PM GMT+01:00, Jakub Jelinek  
wrote:
>Hi!
>
>As mentioned in the PR, libcpp uses gcc_assert in a couple of places,
>but guards it with ENABLE_ASSERT_CHECKING macro that is never defined
>in libcpp.
>
>This patch arranges for it to be defined if ENABLE_ASSERT_CHECKING
>is going to be defined in gcc subdir.
>
>Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

>2016-11-16  Jakub Jelinek  
>
>   PR bootstrap/72823
>   * configure.ac (ENABLE_ASSERT_CHECKING): Define if gcc configure
>   would define that macro.
>   * configure: Regenerated.
>   * config.in: Regenerated.
>
>--- libcpp/configure.ac.jj 2016-05-20 12:44:36.0 +0200
>+++ libcpp/configure.ac2016-11-15 16:40:16.753842880 +0100
>@@ -152,9 +152,11 @@ for check in release $ac_checking_flags
> do
>   case $check in
>   # these set all the flags to specific states
>-  yes|all) ac_checking=1 ; ac_valgrind_checking= ;;
>-  no|none|release) ac_checking= ; ac_valgrind_checking= ;;
>+  yes|all) ac_checking=1 ; ac_assert_checking=1 ; ac_valgrind_checking=
>;;
>+  no|none) ac_checking= ; ac_assert_checking= ; ac_valgrind_checking=
>;;
>+  release) ac_checking= ; ac_assert_checking=1 ; ac_valgrind_checking=
>;;
>   # these enable particular checks
>+  assert) ac_assert_checking=1 ;;
>   misc) ac_checking=1 ;;
>   valgrind) ac_valgrind_checking=1 ;;
>   # accept
>@@ -170,6 +172,11 @@ else
>   AC_DEFINE(CHECKING_P, 0)
> fi
> 
>+if test x$ac_assert_checking != x ; then
>+  AC_DEFINE(ENABLE_ASSERT_CHECKING, 1,
>+[Define if you want assertions enabled.  This is a cheap check.])
>+fi
>+
> if test x$ac_valgrind_checking != x ; then
>   AC_DEFINE(ENABLE_VALGRIND_CHECKING, 1,
>[Define if you want to workaround valgrind (a memory checker) warnings
>about
>--- libcpp/configure.jj2016-05-20 12:44:36.0 +0200
>+++ libcpp/configure   2016-11-15 16:40:35.331607679 +0100
>@@ -7288,9 +7288,11 @@ for check in release $ac_checking_flags
> do
>   case $check in
>   # these set all the flags to specific states
>-  yes|all) ac_checking=1 ; ac_valgrind_checking= ;;
>-  no|none|release) ac_checking= ; ac_valgrind_checking= ;;
>+  yes|all) ac_checking=1 ; ac_assert_checking=1 ; ac_valgrind_checking=
>;;
>+  no|none) ac_checking= ; ac_assert_checking= ; ac_valgrind_checking=
>;;
>+  release) ac_checking= ; ac_assert_checking=1 ; ac_valgrind_checking=
>;;
>   # these enable particular checks
>+  assert) ac_assert_checking=1 ;;
>   misc) ac_checking=1 ;;
>   valgrind) ac_valgrind_checking=1 ;;
>   # accept
>@@ -7308,6 +7310,12 @@ else
> 
> fi
> 
>+if test x$ac_assert_checking != x ; then
>+
>+$as_echo "#define ENABLE_ASSERT_CHECKING 1" >>confdefs.h
>+
>+fi
>+
> if test x$ac_valgrind_checking != x ; then
> 
> $as_echo "#define ENABLE_VALGRIND_CHECKING 1" >>confdefs.h
>--- libcpp/config.in.jj2016-05-20 12:44:36.0 +0200
>+++ libcpp/config.in   2016-11-15 16:40:38.0 +0100
>@@ -14,6 +14,9 @@
> /* Define to 1 if using `alloca.c'. */
> #undef C_ALLOCA
> 
>+/* Define if you want assertions enabled. This is a cheap check. */
>+#undef ENABLE_ASSERT_CHECKING
>+
> /* Define to enable system headers canonicalization. */
> #undef ENABLE_CANONICAL_SYSTEM_HEADERS
> 
>
>   Jakub




Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-11-16 Thread Mike Stump
On Nov 16, 2016, at 12:09 PM, Andrew Burgess  
wrote:
> My only remaining concern is the new tests, I've tried to restrict
> them to targets that I suspect they'll pass on with:
> 
>/* { dg-final-use { scan-assembler "\.section\[\t 
> \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target 
> *-*-linux* *-*-gnu* } } } */
> 
> but I'm still nervous that I'm going to introduce test failures.  Is
> there any advice / guidance I should follow before I commit, or are
> folk pretty relaxed so long as I've made a reasonable effort?

So, if you are worried about the way the line is constructed, I usually test it 
by misspelling the *-*-linux* *-*-gnu* part as *-*-linNOTux* *-*-gnNOTu* and 
see if the test then doesn't run on your machine.  If it doesn't then you can 
be pretty confident that only machines that match the target triplet can be 
impacted.  I usually do this type of testing by running the test case in 
isolation (not the full tests suite).  Anyway, do the best you can, and don't 
worry about t it too much, learn from the experience, even if it goes wrong in 
some way.  If it did go wrong, just be responsive (don't check it in just 
before a 6 week vacation) about fixing it, if you can.



RE: [PATCH] microMIPS/GCC: Fix PIC call relaxation

2016-11-16 Thread Maciej W. Rozycki
On Wed, 16 Nov 2016, Matthew Fortune wrote:

> > Fix `-mrelax-pic-calls' support for microMIPS code where the relocation
> > produced is supposed to be R_MICROMIPS_JALR rather than R_MIPS_JALR.
> > The lack of short delay support comes from a missed update to this code
> > for microMIPS support and can be relieved as JALRS and JRS instructions
> > can be relaxed to BALS and B instructions respectively, so do that as
> > well.
> 
> Thanks. I didn't follow the background to this optimisation when I added
> the compact branch support so opted to retain the pre-existing behaviour.

 In any case there was no deliberate choice made here, but just a missed 
update along the original microMIPS change.

> > By doing so complement commit r196828 ("microMIPS gcc support"),
> > , which is the
> > original change that introduced microMIPS support, in particular to
> > MIPS_CALL, which is where this code previously resided.
> > 
> > Adjust the test suite accordingly, limiting R_MICROMIPS_JALR cases to
> 
> Typo fwiw: Should say R_MIPS_JALR for MIPS code.

 Indeed.  We don't use GIT style commit logs though so I can't correct it 
and your spotting will just stay here for posterity.

> >  NB the use of this feature for microMIPS is limited because short
> > encodings of register jump instructions usually do not have their branch
> > counterparts and long encodings typically are not used.  However at
> > least
> > tail calls can be converted if the jump target is in range, as can calls
> > in `-minsn32' code.  Perhaps we could switch to producing `j[al]r[s].32'
> > in the `-mrelax-pic-calls' mode like GAS does with the `jal' and `j'
> > microMIPS macros in PIC code.
> 
> Does the linker do anything for R_MICROMIPS_JALR currently? From memory
> I seem to think it was mostly ignored. Is there any risk with older linkers
> by introducing R_MICROMIPS_JALR in GCC generated code?

 For the o32 ABI the R_MICROMIPS_JALR reloc is currently effectively 
ignored by the BFD linker, although some preparations for handling have 
been made and therefore there are two switch cases where it appears.

 Actual handling for PIC call relaxation is only included for R_MIPS_JALR 
in `mips_elf_perform_relocation' though, where the actual JALR and JR 
instruction encoding is checked, which is different between the regular 
MIPS and the microMIPS ISAs.

 The presence of R_MIPS_JALR in microMIPS code will lead to an incorrect 
internal `cross_mode_jump_p' setting in `mips_elf_calculate_relocation', 
however this does not cause harm because `mips_elf_perform_relocation' 
code will check instruction encoding.

 For n32 and n64 ABIs the BFD linker handling of R_MIPS_JALR is done in 
`_bfd_mips_relax_section' instead and does not take `cross_mode_jump_p' 
into account, however the instruction encoding is likewise checked, so no 
harm there either.  I plan to remove this function, perhaps before 2.28 
even, as it is duplicated by the corresponding o32 handler now, but 
suffers from considerable bitrot.

 I plan to implement the missing R_MICROMIPS_JALR handling in the BFD 
linker sometime as well, although it may take a little bit yet.  As I 
noted in the original change description, the usable cases are limited, so 
there isn't as much incentive to have this implemented as there was with 
the regular MIPS ISA, which is also probably the reason why this was 
missed from the original microMIPS support patches.

> >  Let me know if the lack of microMIPS results would be a problem for
> > this
> > patch's acceptance.
> 
> Not a problem. We will pick this up as part of testing for the release.
> 
> There is a bit of coding style fuzz in the testcases but it is
> pre-existing from the code you duplicated so I don't think it needs
> fixing.

 I missed that indeed, having copied the test cases verbatim.  I wonder if 
we shouldn't actually factor out these test case bodies to shared .h files 
included from corresponding regular MIPS and microMIPS test cases, and 
then have the formatting fixes applied there instead.

> >  Otherwise, OK to apply?
> 
> OK, as long as you can say there is no risk with the new reloc and older
> linkers.

 As double-checked and documented above I can say there is no risk, so I 
have committed this change now.  Thanks for your review.

  Maciej


Re: Fix PR78154

2016-11-16 Thread Marc Glisse

On Wed, 16 Nov 2016, Martin Sebor wrote:


On 11/16/2016 11:49 AM, Prathamesh Kulkarni wrote:

Hi Richard,
Following your suggestion in PR78154, the patch checks if stmt
contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p
and returns true in that case.


Nice.  I think the list should also include mempcpy, stpcpy, and
stpncpy, and probably also the corresponding checking built-ins
such as __builtin___memcpy_chk.

FWIW, a more general solution to consider (possibly for GCC 8)
might be to extend attribute nonnull to apply to a functions return
value as well (e.g., use zero as the index for that), to indicate
that a pointer returned from it is not null.  That would let library
implementers annotate other functions (such as strerror)


We already have that, under the name returns_nonnull. IIRC, people found a 
new name clearer than using position 0, when I posted the patch. Note also 
that memcpy already has both an attribute that says that it returns its 
first argument, and an attribute that says that said first argument is 
nonnull.


(I've heard some noise in C++-land about making memcpy(0,0,0) valid, but 
that may have just been noise)


--
Marc Glisse


Re: [patch, libfortran] Add AVX-specific matmul

2016-11-16 Thread Jerry DeLisle

On 11/16/2016 01:30 PM, Thomas Koenig wrote:

Hello world,

the attached patch adds an AVX-specific version of the matmul
intrinsic to the Fortran library.  This works by using the target_clones
attribute.

For testing, I compiled this on powerpc64-unknown-linux-gnu,
without any ill effects.

Also, a resulting binary reached around 15 GFlops for larger matrices
on a 3.4 GHz i7-2600 CPU.  I am currently building/regtesting on
that machine. This can give another 40% speed increase  for large
matrices on AVX.

OK for trunk?



Did you intend to name it avx_matmul and not aux_matmul?

Are the compiler flags for avx handled automatically by the gcc attributes so no 
need to endit the Makefile.am?


Fix the first and if yes to the second question, OK

Jerry







Re: Fix PR78154

2016-11-16 Thread Martin Sebor

On 11/16/2016 02:21 PM, Marc Glisse wrote:

On Wed, 16 Nov 2016, Martin Sebor wrote:


On 11/16/2016 11:49 AM, Prathamesh Kulkarni wrote:

Hi Richard,
Following your suggestion in PR78154, the patch checks if stmt
contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p
and returns true in that case.


Nice.  I think the list should also include mempcpy, stpcpy, and
stpncpy, and probably also the corresponding checking built-ins
such as __builtin___memcpy_chk.

FWIW, a more general solution to consider (possibly for GCC 8)
might be to extend attribute nonnull to apply to a functions return
value as well (e.g., use zero as the index for that), to indicate
that a pointer returned from it is not null.  That would let library
implementers annotate other functions (such as strerror)


We already have that, under the name returns_nonnull. IIRC, people found
a new name clearer than using position 0, when I posted the patch. Note
also that memcpy already has both an attribute that says that it returns
its first argument, and an attribute that says that said first argument
is nonnull.


Ah, right!  Thanks for the reminder!

__builtin_memcpy and __builtin_strcpy are both declared with
the same attribute list ATTR_RET1_NOTHROW_NONNULL_LEAF (as are
their checking versions) and that's defined like so:

  DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC,
ATTR_LIST_STR1, ATTR_NOTHROW_NONNULL_LEAF)

ATTR_NOTHROW_NONNULL_LEAF doesn't include returns_nonull and neither
does ATTR_FNSPEC ATTR_LIST_STR1 (unless it's meant to imply that).

In any event, lookup_attribute ("returns_nonnull") fails for both
of these functions so I think the fix might be as "simple" as adding
the attribute.  Alternatively, if attribute fn spec str1 should imply
returns_nonnull when nonnull is set because it says (IIUC) that
the function returns the first (non-null) argument, the changes will
be a bit more involved and require adjusting other places besides
VRP (I see it used in fold-const.c as well similarly as in VRP).

(FWIW, I quoted "simple" above because it recently took me the better
part of an afternoon to figure out how to add attribute alloc_size to
malloc.)



(I've heard some noise in C++-land about making memcpy(0,0,0) valid, but
that may have just been noise)


We may have read the same discussion.  It would make some things
a little easier in C++ (and remove what most people view as yet
another unnecessary gotcha in the language).

Martin


Re: Ping: Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-11-16 Thread Andrew Burgess
* Mike Stump  [2016-11-16 12:59:53 -0800]:

> On Nov 16, 2016, at 12:09 PM, Andrew Burgess  
> wrote:
> > My only remaining concern is the new tests, I've tried to restrict
> > them to targets that I suspect they'll pass on with:
> > 
> >/* { dg-final-use { scan-assembler "\.section\[\t 
> > \]*\.text\.unlikely\[\\n\\r\]+\[\t \]*\.size\[\t \]*foo\.cold\.0" { target 
> > *-*-linux* *-*-gnu* } } } */
> > 
> > but I'm still nervous that I'm going to introduce test failures.  Is
> > there any advice / guidance I should follow before I commit, or are
> > folk pretty relaxed so long as I've made a reasonable effort?
> 
> So, if you are worried about the way the line is constructed, I usually test 
> it by misspelling the *-*-linux* *-*-gnu* part as *-*-linNOTux* *-*-gnNOTu* 
> and see if the test then doesn't run on your machine.  If it doesn't then you 
> can be pretty confident that only machines that match the target triplet can 
> be impacted.  I usually do this type of testing by running the test case in 
> isolation (not the full tests suite).  Anyway, do the best you can, and don't 
> worry about t it too much, learn from the experience, even if it goes wrong 
> in some way.  If it did go wrong, just be responsive (don't check it in just 
> before a 6 week vacation) about fixing it, if you can.
> 

Thanks for the feedback.

Change committed as revision 242519.  If anyone sees any issues just
let me know.

Thanks,
Andrew


Re: ubsan PATCH to fix compile-time hog with operator overloading (PR sanitizer/78208)

2016-11-16 Thread Marek Polacek
On Fri, Nov 04, 2016 at 05:16:00PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 04, 2016 at 05:05:51PM +0100, Marek Polacek wrote:
> > This is a similar case to PR sanitizer/70342.  Here, we were generating 
> > expression
> > in a quadratic fashion because of the initializer--we create SAVE_EXPR <>, 
> > then 
> > UBSAN_NULL >, and then COMPOUND_EXPR of these two and so on.
> > 
> > On this testcase we were instrumention CALL_EXPR that is in fact 
> > operator<<.  I
> > think those always return a reference, so it cannot be NULL, so there's no
> > point in instrumenting those?
> 
> How do you know what is the return type of a user defined overloaded
> operator?
> Even when it returns a reference, I thought the point of -fsanitize=null was
> for all references to verify their addresses are non-null.
> 
> I must say I don't understand the issue, if it is the same SAVE_EXPR in both
> lhs of COMPOUND_EXPR and UBSAN_NULL argument, shouldn't cp_genericize_r use
> of hash table to avoid walking the same tree multiple times avoid the
> exponential compile time/memory?

Sorry.  So consider the following:

class S
{
  virtual void foo () = 0;
};

struct T {
  T  << (const char *s);
};

T t;

void
S::foo ()
{
  t << "a" << "b" << "c";
}

Before
1498   if (flag_sanitize & (SANITIZE_NULL | SANITIZE_ALIGNMENT))
1499 ubsan_maybe_instrument_member_call (stmt, is_ctor);

the stmt will be

T::operator<< (T::operator<< (T::operator<< (, "a"), "b"), "c")

after ubsan_maybe_instrument_member_call it will be

T::operator<< (UBSAN_NULL (SAVE_EXPR , 4B, 0);, SAVE_EXPR ;, "c")

and that's what is saved into the hash table.  Then another stmt will be the
inner

T::operator<< (T::operator<< (, "a"), "b")

which we instrument and put into the hash table, and so on.  But those
SAVE_EXPRs aren't the same.  So we have a T::operator<< call that has nested
T::operator<< calls and we kind of recursively instrument all of them.

Not sure if I made this any clearer nor if this can be avoided... :(

Marek


Re: [PATCH] Fix NetBSD bootstrap

2016-11-16 Thread Joseph Myers
I'll presume you know best about the choices of stdint.h types.  You may 
wish to consider what the correct value of use_gcc_stdint is - the 
default "none" (rely on the system's header), or "wrap" (use GCC's header 
in freestanding mode) or "provide" (always use GCC's header).

Note that GCC's header includes support for TS 18661-1 integer width 
macros, and the testsuite verifies these work in freestanding mode.  So if 
you use "none" but your system's header lacks support for these macros, 
you'll have test failures.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFA: PATCH to gengtype to avoid putting tree_node support in front end objects

2016-11-16 Thread Jason Merrill
On Wed, Nov 16, 2016 at 1:45 PM, Moore, Catherine
 wrote:
> /scratch/cmoore/mips-sde-elf-upstream/src/gcc-trunk-6/gcc/hash-map.h:62:12: 
> error: no matching function for call to 'gt_ggc_mx(rtx_def*&)'
>
> I configured with --target=mips-sde-elf, but I do have some local multilib 
> definitions for that target.  This ought to reproduce with mti-elf as well.
> Will you please fix or revert?

Fixed thus, parallel to the declarations in tree.h.
commit 764bcd5c87000e3ddc85607674e68122fe986e51
Author: Jason Merrill 
Date:   Wed Nov 16 17:22:19 2016 -0500

* rtl.h: Declare gt_ggc_mx and gt_pch_nx.

diff --git a/gcc/rtl.h b/gcc/rtl.h
index df5172b..6a4cf36 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -3771,5 +3771,9 @@ struct GTY(()) cgraph_rtl_info {
   unsigned function_used_regs_valid: 1;
 };
 
+/* gtype-desc.c.  */
+extern void gt_ggc_mx (rtx &);
+extern void gt_pch_nx (rtx &);
+extern void gt_pch_nx (rtx &, gt_pointer_operator, void *);
 
 #endif /* ! GCC_RTL_H */


Re: [PATCH] Fix combine's make_extraction (PR rtl-optimization/78378)

2016-11-16 Thread Segher Boessenkool
On Wed, Nov 16, 2016 at 10:07:23PM +0100, Jakub Jelinek wrote:
> If inner is a MEM, make_extraction requires that pos is a multiple of bytes
> and deals with offsetting it.  Or otherwise requires that pos is a multiple
> of BITS_PER_WORD and for REG inner it handles that too.  But if inner
> is something different, it calls just force_to_mode to the target mode,
> which only really works if pos is 0.
> 
> Thus the following patch restricts it to that case.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Yes, thanks!


Segher


Re: [patch, libfortran] Add AVX-specific matmul

2016-11-16 Thread Jakub Jelinek
On Thu, Nov 17, 2016 at 12:03:18AM +0100, Thomas Koenig wrote:
> >Don't you need to test in configure if the assembler supports AVX?
> >Otherwise if somebody is bootstrapping gcc with older assembler, it will
> >just fail to bootstrap.
> 
> That's a good point.  The AVX instructions were added in binutils 2.19,
> which was released in 2011. This could be put in the prerequisites.
> 
> What should the test do?  Fail with an error message "you need newer
> binutils" or simply (and silently) not compile the AVX vesion?

>From what I understood, you want those functions just to be implementation
details, not exported from libgfortran.so*.  Thus the test would do
something similar to what gcc/testsuite/lib/target-supports.exp 
(check_effective_target_avx)
does, but of course in autoconf way, not in tcl.
Also, from what I see, target_clones just use IFUNCs, so you probably also
need some configure test whether ifuncs are supported (the
gcc.target/i386/mvc* tests use dg-require-ifunc, so you'd need something
similar again in configure.  But if so, then I have no idea why you use
a wrapper around the function, instead of using it on the exported APIs.

> >For matmul_i*, wouldn't it make more sense to use avx2 instead of avx,
> >or both avx and avx2 and maybe avx512f?
> 
> I did a vdiff of the disassembled code generated or avx and avx2, and
> (somewhat to my surprise) there was no difference.  Maybe, with more
> unrolling, something more might have happened. I didn't check for
> AVX512f, but I can do that.

For the float/double code it wouldn't surprise me (assuming you don't need
gather insns and similar stuff).  But for integers generally most of the
avx instructions can only handle 128-bit vectors, while avx2 has 256-bit
ones.

Jakub


Re: Fix PR78154

2016-11-16 Thread Martin Sebor

On 11/16/2016 05:17 PM, Martin Sebor wrote:

On 11/16/2016 02:21 PM, Marc Glisse wrote:

On Wed, 16 Nov 2016, Martin Sebor wrote:


On 11/16/2016 11:49 AM, Prathamesh Kulkarni wrote:

Hi Richard,
Following your suggestion in PR78154, the patch checks if stmt
contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p
and returns true in that case.


Nice.  I think the list should also include mempcpy, stpcpy, and
stpncpy, and probably also the corresponding checking built-ins
such as __builtin___memcpy_chk.

FWIW, a more general solution to consider (possibly for GCC 8)
might be to extend attribute nonnull to apply to a functions return
value as well (e.g., use zero as the index for that), to indicate
that a pointer returned from it is not null.  That would let library
implementers annotate other functions (such as strerror)


We already have that, under the name returns_nonnull. IIRC, people found
a new name clearer than using position 0, when I posted the patch. Note
also that memcpy already has both an attribute that says that it returns
its first argument, and an attribute that says that said first argument
is nonnull.


Ah, right!  Thanks for the reminder!

__builtin_memcpy and __builtin_strcpy are both declared with
the same attribute list ATTR_RET1_NOTHROW_NONNULL_LEAF (as are
their checking versions) and that's defined like so:

  DEF_ATTR_TREE_LIST (ATTR_RET1_NOTHROW_NONNULL_LEAF, ATTR_FNSPEC,
ATTR_LIST_STR1, ATTR_NOTHROW_NONNULL_LEAF)

ATTR_NOTHROW_NONNULL_LEAF doesn't include returns_nonull and neither
does ATTR_FNSPEC ATTR_LIST_STR1 (unless it's meant to imply that).

In any event, lookup_attribute ("returns_nonnull") fails for both
of these functions so I think the fix might be as "simple" as adding
the attribute.  Alternatively, if attribute fn spec str1 should imply
returns_nonnull when nonnull is set because it says (IIUC) that
the function returns the first (non-null) argument, the changes will
be a bit more involved and require adjusting other places besides
VRP (I see it used in fold-const.c as well similarly as in VRP).


I should have mentioned: when fully implemented, the test case
will pass even without VRP or without optimization.  A test for
the VRP bits will need to save the return value in a variable
and then use it (otherwise the check for memcpy(...) == 0 will
have already been optimized away by fold-const.c.



(FWIW, I quoted "simple" above because it recently took me the better
part of an afternoon to figure out how to add attribute alloc_size to
malloc.)



(I've heard some noise in C++-land about making memcpy(0,0,0) valid, but
that may have just been noise)


We may have read the same discussion.  It would make some things
a little easier in C++ (and remove what most people view as yet
another unnecessary gotcha in the language).

Martin




Re: [patch, libfortran] Add AVX-specific matmul

2016-11-16 Thread Thomas Koenig

Am 16.11.2016 um 23:01 schrieb Jakub Jelinek:

On Wed, Nov 16, 2016 at 10:30:03PM +0100, Thomas Koenig wrote:

the attached patch adds an AVX-specific version of the matmul
intrinsic to the Fortran library.  This works by using the target_clones
attribute.


Don't you need to test in configure if the assembler supports AVX?
Otherwise if somebody is bootstrapping gcc with older assembler, it will
just fail to bootstrap.


That's a good point.  The AVX instructions were added in binutils 2.19,
which was released in 2011. This could be put in the prerequisites.

What should the test do?  Fail with an error message "you need newer
binutils" or simply (and silently) not compile the AVX vesion?


For matmul_i*, wouldn't it make more sense to use avx2 instead of avx,
or both avx and avx2 and maybe avx512f?


I did a vdiff of the disassembled code generated or avx and avx2, and
(somewhat to my surprise) there was no difference.  Maybe, with more
unrolling, something more might have happened. I didn't check for
AVX512f, but I can do that.


2016-11-16  Thomas Koenig  

PR fortran/78379
* m4/matmul.m4:  For x86_64, make the work function for matmul


Why the extra space before For?


Will be removed.


static with target_clones for AVX and default, and create
a wrapper function to call it.
* generated/matmul_c10.c


Missing : Regenerated.


Will be added.

Regards

Thomas


Re: c-family PATCH to tidy switch diagnostics (PR c/78285)

2016-11-16 Thread Joseph Myers
On Wed, 16 Nov 2016, Marek Polacek wrote:

> As pointed out in Bug 78285, some error calls should actually be inform calls.
> I'm not adding any new test; existing switch-5.c covers all the cases so I 
> didn't
> see much value in duplicating that part of the test.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] libiberty: Add Rust symbol demangling.

2016-11-16 Thread David Tolnay
FSF just confirmed that my assignment/disclaimer process has been
completed. Ian can you take a look at your list again?

David

On Fri, Nov 11, 2016 at 1:48 PM, David Tolnay  wrote:
> On Fri, Nov 11, 2016 at 10:46 AM, Ian Lance Taylor  wrote:
>> The patch is OK when the copyright assignment is clear.
>
> Ian I sent out the paperwork Monday night to ass...@gnu.org.
>
> David


Re: [PATCH] libiberty: Add Rust symbol demangling.

2016-11-16 Thread Ian Lance Taylor
On Wed, Nov 16, 2016 at 2:18 PM, David Tolnay  wrote:
> FSF just confirmed that my assignment/disclaimer process has been
> completed. Ian can you take a look at your list again?

Yes, you are good.  Thanks.

Ian


Re: [PATCH] libiberty: Add Rust symbol demangling.

2016-11-16 Thread Mark Wielaard
On Wed, Nov 16, 2016 at 02:56:03PM -0800, Ian Lance Taylor wrote:
> On Wed, Nov 16, 2016 at 2:18 PM, David Tolnay  wrote:
> > FSF just confirmed that my assignment/disclaimer process has been
> > completed. Ian can you take a look at your list again?
> 
> Yes, you are good.  Thanks.

I rebased the patch on top of master (which trivially applied), rebuild,
retested and pushed.
https://gcc.gnu.org/ml/gcc-cvs/2016-11/msg00798.html

I am really happy this is part of libiberty now. I will also sync it
with binutils so c++filt will be able to demangle rust symbols too.

Thanks,

Mark


Re: [PATCH] Fix NetBSD bootstrap

2016-11-16 Thread Mike Stump

> On Nov 16, 2016, at 9:12 AM, Krister Walfridsson 
>  wrote:
> 
> NetBSD fails bootstrap with
>  stdatomic.h:55:17: error: unknown type name '__INT_LEAST8_TYPE__'
> This is fixed by the following patch (only i386 and x86_64 for now. I'll
> do the other ports after fixing some more issues -- the NetBSD support is
> rather broken at the moment...)
> 
> I'm the NetBSD maintainer, so I belive I don't need approval to commit this. 
> But I have been absent for a long time, so it makes sense for someone to 
> review at least this first patch.

Looks reasonable.  The biggest issue would be if any of those values changed 
through time, and the current version works for older netbsd releases, the 
patch could break them.  Of course, I don't have any visibility into how any of 
those values might have changed through time.



Re: [PATCH] Fix PR77848

2016-11-16 Thread Bill Schmidt
On 11/16/16 9:08 AM, Richard Biener wrote:

> On Tue, Nov 15, 2016 at 9:03 PM, Bill Schmidt
>  wrote:
>> -  if ((any_pred_load_store || any_complicated_phi)
>> -  && !version_loop_for_if_conversion (loop))
>> +  /* Since we have no cost model, always version loops if vectorization
>> + is enabled.  Either version this loop, or if the pattern is right
>> + for outer-loop vectorization, version the outer loop.  In the
>> + latter case we will still if-convert the original inner loop.  */
>> +  /* FIXME: When SLP vectorization can handle if-conversion on its own,
>> + predicate all of if-conversion on flag_tree_loop_vectorize.  */
>> +  if ((any_pred_load_store || any_complicated_phi || 
>> flag_tree_loop_vectorize)
> I'd say given fun->has_force_vectorize_loops this should be
>
>if (flag_tree_loop_if_convert != 1
>> +  && !version_loop_for_if_conversion
>> +  (versionable_outer_loop_p (loop_outer (loop))
>> +   ? loop_outer (loop) : loop))
>>  goto cleanup;
> and thus always version if the user didnt' specify -ftree-loop-if-convert
> (-ftree-loop-if-convert-stores is dead, I forgot to remove uses).
>
> Can you as a first patch (after fixing the minor things above) commit
> the patch w/o changing the condition under which we version
> (but _do_ version the outer loop if possible?).  This should be a strict
> improvement enabling more outer loop vectorization.

Done and committed.  Thanks!

>
> The 2nd patch can then fix the PR and change the condition.
>
> Thus, ok with the nits fixed and the condition unchanged.
>
> Can you re-test the 2nd part with my suggested changed predicate?

Yes, the new predicate works fine.  New patch below, bootstrapped and tested
on powerpc64le-unknown-linux-gnu, with only the bb-slp-cond-1.c regressions
previously discussed.  Is this ok for trunk?

Thanks,
Bill

>
> Thanks,
> Richard.
>
[gcc]

2016-11-16  Bill Schmidt  
Richard Biener  

PR tree-optimization/77848
* tree-if-conv.c (tree_if_conversion): Always version loops unless
the user specified -ftree-loop-if-convert.

[gcc/testsuite]

2016-11-16  Bill Schmidt  
Richard Biener  

PR tree-optimization/77848
* gfortran.dg/vect/pr77848.f: New test.


Index: gcc/testsuite/gfortran.dg/vect/pr77848.f
===
--- gcc/testsuite/gfortran.dg/vect/pr77848.f(revision 0)
+++ gcc/testsuite/gfortran.dg/vect/pr77848.f(working copy)
@@ -0,0 +1,24 @@
+! PR 77848: Verify versioning is on when vectorization fails
+! { dg-do compile }
+! { dg-options "-O3 -ffast-math -fdump-tree-ifcvt -fdump-tree-vect-details" }
+
+  subroutine sub(x,a,n,m)
+  implicit none
+  real*8 x(*),a(*),atemp
+  integer i,j,k,m,n
+  real*8 s,t,u,v
+  do j=1,m
+ atemp=0.d0
+ do i=1,n
+if (abs(a(i)).gt.atemp) then
+   atemp=a(i)
+   k = i
+end if
+ enddo
+ call dummy(atemp,k)
+  enddo
+  return
+  end
+
+! { dg-final { scan-tree-dump "LOOP_VECTORIZED" "ifcvt" } }
+! { dg-final { scan-tree-dump "vectorized 0 loops in function" "vect" } }
Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 242521)
+++ gcc/tree-if-conv.c  (working copy)
@@ -2803,10 +2803,12 @@ tree_if_conversion (struct loop *loop)
  || loop->dont_vectorize))
 goto cleanup;
 
-  /* Either version this loop, or if the pattern is right for outer-loop
- vectorization, version the outer loop.  In the latter case we will
- still if-convert the original inner loop.  */
-  if ((any_pred_load_store || any_complicated_phi)
+  /* Since we have no cost model, always version loops unless the user
+ specified -ftree-loop-if-convert.  Either version this loop, or if
+ the pattern is right for outer-loop vectorization, version the
+ outer loop.  In the latter case we will still if-convert the
+ original inner loop.  */
+  if (flag_tree_loop_if_convert != 1
   && !version_loop_for_if_conversion
   (versionable_outer_loop_p (loop_outer (loop))
? loop_outer (loop) : loop))



Re: [patch, libfortran] Add AVX-specific matmul

2016-11-16 Thread Thomas Koenig

Am 17.11.2016 um 00:20 schrieb Jakub Jelinek:

On Thu, Nov 17, 2016 at 12:03:18AM +0100, Thomas Koenig wrote:

Don't you need to test in configure if the assembler supports AVX?
Otherwise if somebody is bootstrapping gcc with older assembler, it will
just fail to bootstrap.


That's a good point.  The AVX instructions were added in binutils 2.19,
which was released in 2011. This could be put in the prerequisites.

What should the test do?  Fail with an error message "you need newer
binutils" or simply (and silently) not compile the AVX vesion?



From what I understood, you want those functions just to be implementation

details, not exported from libgfortran.so*.  Thus the test would do
something similar to what gcc/testsuite/lib/target-supports.exp 
(check_effective_target_avx)
does, but of course in autoconf way, not in tcl.


OK, that looks straightworward enough. I'll give it a shot.


Also, from what I see, target_clones just use IFUNCs, so you probably also
need some configure test whether ifuncs are supported (the
gcc.target/i386/mvc* tests use dg-require-ifunc, so you'd need something
similar again in configure.  But if so, then I have no idea why you use
a wrapper around the function, instead of using it on the exported APIs.


As you wrote above, I wanted this as an implementation detail. I also
wanted the ability to be able to add new instruction sets without
breaking the ABI.

Because the caller generates the ifunc, using a wrapper function seemed
like the best way to do it.  The overhead is neglible (the function
is one simple jump), especially considering that we only call the
library function for larger matrices.


For matmul_i*, wouldn't it make more sense to use avx2 instead of avx,
or both avx and avx2 and maybe avx512f?


I did a vdiff of the disassembled code generated or avx and avx2, and
(somewhat to my surprise) there was no difference.  Maybe, with more
unrolling, something more might have happened. I didn't check for
AVX512f, but I can do that.


For the float/double code it wouldn't surprise me (assuming you don't need
gather insns and similar stuff).  But for integers generally most of the
avx instructions can only handle 128-bit vectors, while avx2 has 256-bit
ones,


You're right - integer multiplication looks different.

Nobody I know cares about integer matrix multiplication
speed, whereas real has gotten a _lot_ of attention over
the decades.  So, putting in AVX will make the code run
faster on more machines, while putting in AVX2 will
(IMHO) bloat the library for no good reason.  However,
I am willing to stand corrected on this. Putting in AVX512f
makes sense.

I have also been trying to get target_clones to work on POWER
to get Altivec instructions, but to no avail. I also cannot
find any examples in the testsuite.

Since a lot of supercomputers use POWER nodes, that might also
be attractive.

Regards

Thomas


Re: [patch, libfortran] Add AVX-specific matmul

2016-11-16 Thread Janne Blomqvist
On Thu, Nov 17, 2016 at 9:41 AM, Thomas Koenig  wrote:
> Am 17.11.2016 um 00:20 schrieb Jakub Jelinek:
>>
>> On Thu, Nov 17, 2016 at 12:03:18AM +0100, Thomas Koenig wrote:

 Don't you need to test in configure if the assembler supports AVX?
 Otherwise if somebody is bootstrapping gcc with older assembler, it will
 just fail to bootstrap.
>>>
>>>
>>> That's a good point.  The AVX instructions were added in binutils 2.19,
>>> which was released in 2011. This could be put in the prerequisites.
>>>
>>> What should the test do?  Fail with an error message "you need newer
>>> binutils" or simply (and silently) not compile the AVX vesion?
>>
>>
>>> From what I understood, you want those functions just to be
>>> implementation
>>
>> details, not exported from libgfortran.so*.  Thus the test would do
>> something similar to what gcc/testsuite/lib/target-supports.exp
>> (check_effective_target_avx)
>> does, but of course in autoconf way, not in tcl.
>
>
> OK, that looks straightworward enough. I'll give it a shot.
>
>> Also, from what I see, target_clones just use IFUNCs, so you probably also
>> need some configure test whether ifuncs are supported (the
>> gcc.target/i386/mvc* tests use dg-require-ifunc, so you'd need something
>> similar again in configure.  But if so, then I have no idea why you use
>> a wrapper around the function, instead of using it on the exported APIs.
>
>
> As you wrote above, I wanted this as an implementation detail. I also
> wanted the ability to be able to add new instruction sets without
> breaking the ABI.
>
> Because the caller generates the ifunc, using a wrapper function seemed
> like the best way to do it.  The overhead is neglible (the function
> is one simple jump), especially considering that we only call the
> library function for larger matrices.
>
 For matmul_i*, wouldn't it make more sense to use avx2 instead of avx,
 or both avx and avx2 and maybe avx512f?
>>>
>>>
>>> I did a vdiff of the disassembled code generated or avx and avx2, and
>>> (somewhat to my surprise) there was no difference.  Maybe, with more
>>> unrolling, something more might have happened. I didn't check for
>>> AVX512f, but I can do that.
>>
>>
>> For the float/double code it wouldn't surprise me (assuming you don't need
>> gather insns and similar stuff).  But for integers generally most of the
>> avx instructions can only handle 128-bit vectors, while avx2 has 256-bit
>> ones,
>
>
> You're right - integer multiplication looks different.
>
> Nobody I know cares about integer matrix multiplication
> speed, whereas real has gotten a _lot_ of attention over
> the decades.  So, putting in AVX will make the code run
> faster on more machines, while putting in AVX2 will
> (IMHO) bloat the library for no good reason.  However,
> I am willing to stand corrected on this. Putting in AVX512f
> makes sense.
>
> I have also been trying to get target_clones to work on POWER
> to get Altivec instructions, but to no avail. I also cannot
> find any examples in the testsuite.
>
> Since a lot of supercomputers use POWER nodes, that might also
> be attractive.
>
> Regards
>
> Thomas

Hi,

In order to reduce bloat, might it make sense to make the core blocked
gemm algorithm that Jerry committed a few days ago into a separate
static function, and then only do the target_clone stuff for that one?
The rest of the matmul function deals with all kinds of stuff like
setup, handling non-stride-1 cases, calling the external gemm function
for -fexternal-blas etc., none of which vectorizes anyway so
generating different versions of this code using different vector
instructions looks like a waste?

In that case I guess one could add the avx2 variant as well on the odd
chance that somebody for some reason cares about integer matmul.

-- 
Janne Blomqvist


Re: PR78319

2016-11-16 Thread Prathamesh Kulkarni
On 17 November 2016 at 03:20, Jeff Law  wrote:
> On 11/16/2016 01:23 PM, Prathamesh Kulkarni wrote:
>>
>> Hi,
>> As discussed in PR, this patch marks the test-case to xfail on
>> arm-none-eabi.
>> OK to commit ?
>
> You might check if Aldy's change to the uninit code helps your case
> (approved earlier today, so hopefully in the tree very soon).  I quickly
> scanned the BZ.  There's some overlap, but it might be too complex for
> Aldy's enhancements to catch.
Hi Jeff,
I tried Aldy's patch [1], but it didn't catch the case in PR78319.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg00225.html

Thanks,
Prathamesh
>
> jeff


Re: Fix PR78154

2016-11-16 Thread Jeff Law

On 11/16/2016 05:17 PM, Martin Sebor wrote:



(I've heard some noise in C++-land about making memcpy(0,0,0) valid, but
that may have just been noise)


We may have read the same discussion.  It would make some things
a little easier in C++ (and remove what most people view as yet
another unnecessary gotcha in the language).

And that may be a reasonable thing to do.

While GCC does take advantage of the non-null attribute when trying to 
prove certain pointers must be non-null, it only does so when the magic 
flag is turned on.  There was a sense that it was too aggressive and 
that time may be necessary for folks to come to terms with what GCC was 
doing, particularly in the the memcpy (*, *, 0) case -- but I've never 
gotten the sense that happened and we've never turned that flag on by 
default.


jeff



Re: [PATCH,gcc/MIPS] Make loongson3a use fused madd.d

2016-11-16 Thread Paul Hua
ping...

On Thu, Nov 3, 2016 at 7:58 PM, Paul Hua  wrote:
> Hi Matthew,
>
> Thanks for your comments, update the patch.
>
> *** gcc/ChangeLog ***
>
> 2016-11-03 Chenghua Xu 
>
> * config/mips/mips.h (ISA_HAS_FUSED_MADD4): Enable for
> TARGET_LOONGSON_3A.
> (ISA_HAS_UNFUSED_MADD4): Exclude TARGET_LOONGSON_3A.
>
> Thanks,
> Paul
>
> On Thu, Nov 3, 2016 at 6:31 PM, Matthew Fortune
>  wrote:
>> Paul Hua  writes:
>>> Loongson3a has 4 operand fused madd instrcution. This patch set
>>> loongson3a use fused madd.d.
>>
>> Hi Paul,
>>
>> Thanks for the fix. I was vaguely aware that this was wrong for
>> loongson-3a but never confirmed it.
>>
>> I suspect this change is mechanical enough that it can bypass
>> copyright assignment but I'd need a global maintainer to comment.
>>
>> I've sent you copyright assignment paperwork separately.
>>
>> Two comments on the patch:
>>
>>> ChangeLog :
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2016-11-03 Chenghua Xu 
>>>
>>> config/mips/
>>> * mips.h: Set loongson3a use fused madd.d.
>>
>> The changelog needs to reference what was changed rather than the
>> effect of the change:
>>
>> * config/mips/mips.h (ISA_HAS_FUSED_MADD4): Enable for
>> TARGET_LOONGSON_3A.
>> (ISA_HAS_UNFUSED_MADD4): Exclude TARGET_LOONGSON_3A.
>>
>>
>>>diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
>>>index 81862a9..5076a2b 100644
>>>--- a/gcc/config/mips/mips.h
>>>+++ b/gcc/config/mips/mips.h
>>>@@ -1056,11 +1056,11 @@ struct mips_cpu_info {
>>>
>>> /* ISA has 4 operand fused madd instructions of the form
>>>'d = [+-] (a * b [+-] c)'.  */
>>>-#define ISA_HAS_FUSED_MADD4   TARGET_MIPS8000
>>>+#define ISA_HAS_FUSED_MADD4   (TARGET_MIPS8000 || TARGET_LOONGSON_3A)
>>>
>>> /* ISA has 4 operand unfused madd instructions of the form
>>>'d = [+-] (a * b [+-] c)'.  */
>>>-#define ISA_HAS_UNFUSED_MADD4 (ISA_HAS_FP4 && !TARGET_MIPS8000)
>>>+#define ISA_HAS_UNFUSED_MADD4 (ISA_HAS_FP4 && !TARGET_MIPS8000 && 
>>>!TARGET_LOONGSON_3A)
>>
>> Please split this line and move && !TARGET_LOONGSON_3A to the next line
>> under ISA_HAS_FP4.
>>
>>>
>>> /* ISA has 3 operand r6 fused madd instructions of the form
>>>'c = c [+-] (a * b)'.  */
>>
>> Thanks,
>> Matthew
>>


RE: RFA: PATCH to gengtype to avoid putting tree_node support in front end objects

2016-11-16 Thread Moore, Catherine


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
> ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek
> Sent: Thursday, November 10, 2016 7:53 AM
> To: Jason Merrill 
> Cc: gcc-patches List 
> Subject: Re: RFA: PATCH to gengtype to avoid putting tree_node
> support in front end objects
> 
> On Thu, Oct 27, 2016 at 09:36:09AM -0400, Jason Merrill wrote:
> > Currently, the way gengtype works it scans the list of source files
> > with front end files at the end, and pushes data structures onto a
> > stack.  It then processes the stack in LIFO order, so that data
> > structures from front ends are handled first.  As a result, if a GTY
> > data structure in a front end depends on tree_node, gengtype
> happily
> > puts gt_ggc_mx(tree_node*&) in a front end file, leading to link
> > errors on all other front ends.
> >
> > This patch avoids this problem by appending to the list of data
> > structures so that they are processed in FIFO order, and so
> tree_node
> > gets handled in gtype-desc.o.
> >
> > Tested x86_64-pc-linux-gnu, OK for trunk?
> 
> > commit 487a1c95c0d3169b2041942ff4f8d71c9ff689eb
> > Author: Jason Merrill 
> > Date:   Wed Oct 26 23:12:23 2016 -0400
> >
> > * gengtype.c (new_structure): Append to structures list.
> >
> > (find_structure): Likewise.
> 
> Please remove the blank line in the ChangeLog.
> 
> When looking at the differences it creates, it is hard, because
> all the generated files have all the functions emitted in reverse order
> now
> from what it used to be, so I only looked at files where the size
> changed,
> and that is beyond gtype.state only in my case gt-tree-phinodes.h
> which lost
> void
> gt_ggc_mx (struct gimple *& x)
> {
>   if (x)
> gt_ggc_mx_gimple ((void *) x);
> }
> and
> void
> gt_pch_nx (struct gimple *& x)
> {
>   if (x)
> gt_pch_nx_gimple ((void *) x);
> }
> and gtype-desc.c which didn't contain those but now it does (for
> gtype-desc.c it is hard to find out due to the reordering what else
> has changed, but as gt-tree-phinodes.h shrunk by 170 characters and
> gtype-desc.c grew by 170 characters, I'd think it is all that changed).
> I believe those routines belong to gtype-desc.c, that is where similar
> ones for tree_node, etc. are, tree-phinodes.h certainly isn't the header
> that defines gimple.
> 
> So I think this patch is ok for trunk.  Thanks.
> 

Hi -- This patch caused breakage for the MIPS port while compiling 
gcc/config/mips.c:

In file included from 
/scratch/cmoore/mips-sde-elf-upstream/src/gcc-trunk-6/gcc/hash-table.h:561:0,
 from 
/scratch/cmoore/mips-sde-elf-upstream/src/gcc-trunk-6/gcc/coretypes.h:351,
 from 
/scratch/cmoore/mips-sde-elf-upstream/src/gcc-trunk-6/gcc/config/mips/mips.c:26:
/scratch/cmoore/mips-sde-elf-upstream/src/gcc-trunk-6/gcc/hash-map.h: In 
instantiation of 'static void hash_map< , 
,  
>::hash_entry::ggc_mx(hash_map< , 
,  >::hash_entry&) [with KeyId 
= nofree_string_hash; Value = rtx_def*; Traits = 
simple_hashmap_traits]':
/scratch/cmoore/mips-sde-elf-upstream/src/gcc-trunk-6/gcc/hash-table.h:1029:17: 
  required from 'void gt_ggc_mx(hash_table*) [with E = 
hash_map::hash_entry]'
/scratch/cmoore/mips-sde-elf-upstream/src/gcc-trunk-6/gcc/hash-map.h:251:13:   
required from 'void gt_ggc_mx(hash_map*) [with K = nofree_string_hash; 
V = rtx_def*; H = 
simple_hashmap_traits]'
./gt-mips.h:38:19:   required from here
/scratch/cmoore/mips-sde-elf-upstream/src/gcc-trunk-6/gcc/hash-map.h:62:12: 
error: no matching function for call to 'gt_ggc_mx(rtx_def*&)'
  gt_ggc_mx (e.m_value);
^
... etc

I configured with --target=mips-sde-elf, but I do have some local multilib 
definitions for that target.  This ought to reproduce with mti-elf as well.
Will you please fix or revert?

Thanks,
Catherine



Re: [PATCH][2/2] GIMPLE Frontend, middle-end changes

2016-11-16 Thread Jeff Law

On 10/28/2016 05:51 AM, Richard Biener wrote:


These are the middle-end changes and additions to the testsuite.

They are pretty self-contained, I've organized the changelog
entries below in areas of changes:

 1) dump changes - we add a -gimple dump modifier that allows most
 function dumps to be directy fed back into the GIMPLE FE

 2) pass manager changes to implement the startwith("pass-name")
 feature which implements unit-testing for GIMPLE passes

 3) support for "SSA" input, a __PHI stmt that is lowered once the
 CFG is built, a facility to allow a specific SSA name to be allocated
 plus a small required change in the SSA rewriter to allow for
 pre-existing PHI arguments

Bootstrapped and tested on x86_64-unknown-linux-gnu (together with [1/2]).

I can approve all these changes myself but any comments are welcome.
My only worry would be ensuring that in the case where we're asking for 
a particular SSA_NAME in make_ssa_name_fn that we assert the requested 
name is available.


ISTM that if it's > the highest current version or in the freelist, then 
we ought to be safe.   If it isn't safe then we should either issue an 
error, or renumber the preexisting SSA_NAME (and determining if it's 
safe to renumber the preexisting SSA_NAME may require more context than 
we have).


jeff



Fix PR78154

2016-11-16 Thread Prathamesh Kulkarni
Hi Richard,
Following your suggestion in PR78154, the patch checks if stmt
contains call to memmove (and friends) in gimple_stmt_nonzero_warnv_p
and returns true in that case.

Bootstrapped+tested on x86_64-unknown-linux-gnu.
Cross-testing on arm*-*-*, aarch64*-*-* in progress.
Would it be OK to commit this patch in stage-3 ?

Thanks,
Prathamesh
2016-11-17  Prathamesh Kulkarni  

* tree-vrp.c (gimple_str_nonzero_warnv_p): New function.
(gimple_stmt_nonzero_warnv_p): Call gimple_str_nonzero_warnv_p.

testsuite/
* gcc.dg/tree-ssa/pr78154.c: New test-case.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr78154.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr78154.c
new file mode 100644
index 000..d3463f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr78154.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp-slim" } */
+
+void f (void *d, const void *s, __SIZE_TYPE__ n)
+{
+  if (__builtin_memcpy (d, s, n) == 0)
+__builtin_abort ();
+
+  if (__builtin_memmove (d, s, n) == 0)
+__builtin_abort ();
+
+  if (__builtin_memset (d, 0, n) == 0)
+__builtin_abort ();
+
+  if (__builtin_strcpy (d, s) == 0)
+__builtin_abort ();
+
+  if (__builtin_strcat (d, s) == 0)
+__builtin_abort ();
+
+  if (__builtin_strncpy (d, s, n) == 0)
+__builtin_abort ();
+
+  if (__builtin_strncat (d, s, n) == 0)
+__builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_abort" "evrp" } } */
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index c2a4133..b563a7f 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -1069,6 +1069,34 @@ gimple_assign_nonzero_warnv_p (gimple *stmt, bool 
*strict_overflow_p)
 }
 }
 
+/* Return true if STMT is known to contain call to a string-builtin function
+   that is known to return nonnull.  */
+
+static bool
+gimple_str_nonzero_warnv_p (gimple *stmt)
+{
+  if (!is_gimple_call (stmt))
+return false;
+
+  tree fndecl = gimple_call_fndecl (stmt);
+  if (!fndecl || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+return false;
+
+  switch (DECL_FUNCTION_CODE (fndecl))
+{
+  case BUILT_IN_MEMMOVE:
+  case BUILT_IN_MEMCPY:
+  case BUILT_IN_MEMSET:
+  case BUILT_IN_STRCPY:
+  case BUILT_IN_STRNCPY:
+  case BUILT_IN_STRCAT:
+  case BUILT_IN_STRNCAT:
+   return true;
+  default:
+   return false;
+}
+}
+
 /* Return true if STMT is known to compute a non-zero value.
If the return value is based on the assumption that signed overflow is
undefined, set *STRICT_OVERFLOW_P to true; otherwise, don't change
@@ -1097,7 +1125,7 @@ gimple_stmt_nonzero_warnv_p (gimple *stmt, bool 
*strict_overflow_p)
lookup_attribute ("returns_nonnull",
  TYPE_ATTRIBUTES (gimple_call_fntype (stmt
  return true;
-   return gimple_alloca_call_p (stmt);
+   return gimple_alloca_call_p (stmt) || gimple_str_nonzero_warnv_p (stmt);
   }
 default:
   gcc_unreachable ();


Re: [PATCH v2] bb-reorder: Improve compgotos pass (PR71785)

2016-11-16 Thread Jeff Law

On 11/01/2016 10:27 AM, Segher Boessenkool wrote:

For code like the testcase in PR71785 GCC factors all the indirect branches
to a single dispatcher that then everything jumps to.  This is because
having many indirect branches with each many jump targets does not scale
in large parts of the compiler.  Very late in the pass pipeline (right
before peephole2) the indirect branches are then unfactored again, by
the duplicate_computed_gotos pass.

This pass works by replacing branches to such a common dispatcher by a
copy of the dispatcher.  For code like this testcase this does not work
so well: most cases do a single addition instruction right before the
dispatcher, but not all, and we end up with only two indirect jumps: the
one without the addition, and the one with the addition in its own basic
block, and now everything else jumps _there_.

This patch solves this problem by simply running the core of the
duplicate_computed_gotos pass again, as long as it does any work.  The
patch looks much bigger than it is, because I factored out two routines
to simplify the control flow.

Tested on powerpc64-linux {-m32,-m64}, and on the testcase, and on a version
of the testcase that has 2000 cases instead of 4.  Is this okay for trunk?


Segher


2016-10-30  Segher Boessenkool  

PR rtl-optimization/71785
* bb-reorder.c (duplicate_computed_gotos_find_candidates): New
function, factored out from pass_duplicate_computed_gotos::execute.
(duplicate_computed_gotos_do_duplicate): Ditto.  Don't use BB_VISITED.
(pass_duplicate_computed_gotos::execute): Rewrite.  Rerun the pass as
long as it makes changes.
OK.  I'm just going to note for the record here that while we iterate 
until nothing changes, the statement and block clamps should in practice 
ensure we hit a point where nothing changes.


Ideally I'd like to see testcases with this kind of change.  It should 
be standard operating procedure at this point.


Jeff




Re: [RFC PATCH] avoid printing type suffix with %E

2016-11-16 Thread Martin Sebor

On 11/16/2016 11:34 AM, Jeff Law wrote:

On 10/26/2016 10:37 AM, Martin Sebor wrote:

When formatting an integer constant using the %E directive GCC
includes a suffix that indicates its type.  This can perhaps be
useful in some situations but in my experience it's distracting
and gets in the way when writing tests.

Here's an example:

  $ cat b.c && gcc b.c
  constexpr __SIZE_TYPE__ x = 2;

  enum E: bool { e = x };
  b.c:3:20: error: enumerator value 2ul is outside the range of
underlying type ‘bool’
   enum E: bool { e = x };
  ^

Notice the "2ul" in the error message.

As far as I can tell, Clang avoids printing the suffix and I think
it would be nice if the GCC pretty printer made it possible to avoid
it as well.

The attached patch implements one such approach by having the pretty
printer recognize the space format flag to suppress the type suffix,
so "%E" still prints the suffix but "% E" does not.  I did this to
preserve the existing output but I think it would be nicer to avoid
printing the suffix with %E and treat (for instance) the pound sign
as a request to add the suffix.  I have tested the attached patch
but not the alternative.

Does anyone have any comments/suggestions for which of the two
approaches would be preferable (or what I may have missed here)?
I CC David as the diagnostic maintainer.

I'm having a hard time seeing how this is a significant issue, even when
writing tests.

It also seems to me that relaying the type of the constant as a suffix
would help in cases that aren't so obvious.

What am I missing?


I don't think it's terribly important, more like nuisance.  Tests
that check the value printed by the %E directive (I've been writing
lots of those lately -- see for example (*)) have to consistently
use this pattern:

\[0-9\]+\[lu\]*

When the type of the %E argument is a type like size_t or similar
that can be an alias for unsigned long or unsigned int, it's easy
to make a mistake and hardcode either

\[0-9\]+lu

or

\[0-9\]+u

based on the target where the test is being developed and end
up with failures on targets where the actual type is the other.
Copying test cases exercising one type to those exercising the
other (say from int to long) is also more tedious than it would
be without the suffix.

Beyond tests, I have never found the suffix helpful in warnings
or errors, but I also haven't seen too many of them in released
versions of GCC.  With the work I've been doing on buffer
overflow where size expressions are routinely included in
the diagnostics, there are lots more of them.  In some (e.g.,
in all the -Wformat-length) I've taken care to avoid printing
the suffix by converting tree nodes to HOST_WIDE_INT.  But that's
cumbersome and error-prone, and leads to inconsistent output from
GCC for different diagnostics that don't do the same.

Martin

[*] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01672.html


Re: Fix PR78154

2016-11-16 Thread Jakub Jelinek
On Thu, Nov 17, 2016 at 12:19:37AM +0530, Prathamesh Kulkarni wrote:
> --- a/gcc/tree-vrp.c
> +++ b/gcc/tree-vrp.c
> @@ -1069,6 +1069,34 @@ gimple_assign_nonzero_warnv_p (gimple *stmt, bool 
> *strict_overflow_p)
>  }
>  }
>  
> +/* Return true if STMT is known to contain call to a string-builtin function
> +   that is known to return nonnull.  */
> +
> +static bool
> +gimple_str_nonzero_warnv_p (gimple *stmt)
> +{
> +  if (!is_gimple_call (stmt))
> +return false;

Shouldn't this be:
  if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
return false;

> +
> +  tree fndecl = gimple_call_fndecl (stmt);

> +  if (!fndecl || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
> +return false;

And drop the above 2 lines?

That we you also verify the arguments for sanity.

Otherwise I'll defer to Richard.

Jakub


Re: [PATCH][GCC/TESTSUITE] Make test for traditional-cpp depend on

2016-11-16 Thread Mike Stump
On Nov 16, 2016, at 7:57 AM, Tamar Christina  wrote:
> 
> Forgot to include the committed patch.

>>> This is causing all test names to depend on $srcdir.  A test name
>>> should never include the value of $srcdir.
>> 
>> Sorry about that, committed a fix as r242500 under the obvious rule.

Yeah, I easily could have approved it as well, so no worries.

The patch is Ok.

The way I usually catch this would be in reviewing the output of 
./contrib/compare_tests, it would complain about a ton of new tests now not 
passing, which is a dead giveaway.



Re: [PATCH][GCC/TESTSUITE] Make test for traditional-cpp depend on

2016-11-16 Thread Mike Stump
On Nov 16, 2016, at 10:58 AM, Mike Stump  wrote:
> 
> Yeah, I easily could have approved it as well, so no worries.

Oh.  I see I did approve the original patch, sorry for not catching it.  Thanks 
for all your work.



Re: [PATCH] Support -fsanitize=integer-arith-overflow even for vectors (PR sanitizer/77823)

2016-11-16 Thread Richard Biener
On Tue, 15 Nov 2016, Jakub Jelinek wrote:

> Hi!
> 
> On Mon, Nov 14, 2016 at 10:58:51AM +0100, Jakub Jelinek wrote:
> > Working virtually out of Samoa.
> > 
> > The following patch is an attempt to handle -fsanitize=undefined
> > for vectors.  We already diagnose out of bounds accesses for vector
> > subscripts, this patch adds expansion for vector UBSAN_CHECK_* and generates
> > those in ubsan.  Haven't finished up the many vect elements handling (want
> > to emit a loop for code size).  Is this something we want for GCC 7?
> 
> Here is the full patch (just for -fsanitize=signed-integer-overflow, not
> for -fsanitize=shift or -fsanitize={integer,float}-divide-by-zero for now).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

LGTM.

Thanks,
Richard.

> 2016-11-15  Jakub Jelinek  
> 
>   PR sanitizer/77823
>   * ubsan.c (ubsan_build_overflow_builtin): Add DATAP argument, if
>   it points to non-NULL tree, use it instead of ubsan_create_data.
>   (instrument_si_overflow): Handle vector signed integer overflow
>   checking.
>   * ubsan.h (ubsan_build_overflow_builtin): Add DATAP argument.
>   * tree-vrp.c (simplify_internal_call_using_ranges): Punt for
>   vector IFN_UBSAN_CHECK_*.
>   * internal-fn.c (expand_addsub_overflow): Add DATAP argument,
>   pass it through to ubsan_build_overflow_builtin.
>   (expand_neg_overflow, expand_mul_overflow): Likewise.
>   (expand_vector_ubsan_overflow): New function.
>   (expand_UBSAN_CHECK_ADD, expand_UBSAN_CHECK_SUB,
>   expand_UBSAN_CHECK_MUL): Use tit for vector arithmetics.
>   (expand_arith_overflow): Adjust expand_*_overflow callers.
> 
>   * c-c++-common/ubsan/overflow-vec-1.c: New test.
>   * c-c++-common/ubsan/overflow-vec-2.c: New test.
> 
> --- gcc/ubsan.c.jj2016-11-14 19:57:07.005897502 +0100
> +++ gcc/ubsan.c   2016-11-15 09:09:33.288146293 +0100
> @@ -1219,14 +1219,20 @@ instrument_null (gimple_stmt_iterator gs
>  
>  tree
>  ubsan_build_overflow_builtin (tree_code code, location_t loc, tree lhstype,
> -   tree op0, tree op1)
> +   tree op0, tree op1, tree *datap)
>  {
>if (flag_sanitize_undefined_trap_on_error)
>  return build_call_expr_loc (loc, builtin_decl_explicit (BUILT_IN_TRAP), 
> 0);
>  
> -  tree data = ubsan_create_data ("__ubsan_overflow_data", 1, ,
> -  ubsan_type_descriptor (lhstype), NULL_TREE,
> -  NULL_TREE);
> +  tree data;
> +  if (datap && *datap)
> +data = *datap;
> +  else
> +data = ubsan_create_data ("__ubsan_overflow_data", 1, ,
> +   ubsan_type_descriptor (lhstype), NULL_TREE,
> +   NULL_TREE);
> +  if (datap)
> +*datap = data;
>enum built_in_function fn_code;
>  
>switch (code)
> @@ -1272,14 +1278,15 @@ instrument_si_overflow (gimple_stmt_iter
>tree_code code = gimple_assign_rhs_code (stmt);
>tree lhs = gimple_assign_lhs (stmt);
>tree lhstype = TREE_TYPE (lhs);
> +  tree lhsinner = VECTOR_TYPE_P (lhstype) ? TREE_TYPE (lhstype) : lhstype;
>tree a, b;
>gimple *g;
>  
>/* If this is not a signed operation, don't instrument anything here.
>   Also punt on bit-fields.  */
> -  if (!INTEGRAL_TYPE_P (lhstype)
> -  || TYPE_OVERFLOW_WRAPS (lhstype)
> -  || GET_MODE_BITSIZE (TYPE_MODE (lhstype)) != TYPE_PRECISION (lhstype))
> +  if (!INTEGRAL_TYPE_P (lhsinner)
> +  || TYPE_OVERFLOW_WRAPS (lhsinner)
> +  || GET_MODE_BITSIZE (TYPE_MODE (lhsinner)) != TYPE_PRECISION 
> (lhsinner))
>  return;
>  
>switch (code)
> @@ -1305,7 +1312,7 @@ instrument_si_overflow (gimple_stmt_iter
>/* Represent i = -u;
>as
>i = UBSAN_CHECK_SUB (0, u);  */
> -  a = build_int_cst (lhstype, 0);
> +  a = build_zero_cst (lhstype);
>b = gimple_assign_rhs1 (stmt);
>g = gimple_build_call_internal (IFN_UBSAN_CHECK_SUB, 2, a, b);
>gimple_call_set_lhs (g, lhs);
> @@ -1316,7 +1323,7 @@ instrument_si_overflow (gimple_stmt_iter
>into
>_N = UBSAN_CHECK_SUB (0, u);
>i = ABS_EXPR<_N>;  */
> -  a = build_int_cst (lhstype, 0);
> +  a = build_zero_cst (lhstype);
>b = gimple_assign_rhs1 (stmt);
>g = gimple_build_call_internal (IFN_UBSAN_CHECK_SUB, 2, a, b);
>a = make_ssa_name (lhstype);
> --- gcc/ubsan.h.jj2016-11-14 19:57:07.027897220 +0100
> +++ gcc/ubsan.h   2016-11-14 20:37:20.892032650 +0100
> @@ -52,7 +52,8 @@ extern tree ubsan_create_data (const cha
>  extern tree ubsan_type_descriptor (tree, enum ubsan_print_style = 
> UBSAN_PRINT_NORMAL);
>  extern tree ubsan_encode_value (tree, bool = false);
>  extern bool is_ubsan_builtin_p (tree);
> -extern tree ubsan_build_overflow_builtin (tree_code, location_t, tree, tree, 
> tree);
> +extern tree ubsan_build_overflow_builtin (tree_code, location_t, tree, tree,
> +   

Re: [PATCH, vec-tails] Support loop epilogue vectorization

2016-11-16 Thread Richard Biener
On Tue, 15 Nov 2016, Yuri Rumyantsev wrote:

> Hi All,
> 
> Here is patch for non-masked epilogue vectoriziation.
> 
> Bootstrap and regression testing did not show any new failures.
> 
> Is it OK for trunk?

Ok for trunk.

I believe we ultimatively want to remove the new
--param and enable this by default with a better cost model.
What immediately comes to my mind when seeing

+  if (epilogue)
+{
+   unsigned int vector_sizes
+ = targetm.vectorize.autovectorize_vector_sizes ();
+   vector_sizes &= current_vector_size - 1;
+
+   if (!PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK))
+ epilogue = NULL;
+   else if (!vector_sizes)
+ epilogue = NULL;
+   else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+&& LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
+ {
+   int smallest_vec_size = 1 << ctz_hwi (vector_sizes);
+   int ratio = current_vector_size / smallest_vec_size;
+   int eiters = LOOP_VINFO_INT_NITERS (loop_vinfo)
+ - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+   eiters = eiters % vf;
+
+   epilogue->nb_iterations_upper_bound = eiters - 1;
+
+   if (eiters < vf / ratio)
+ epilogue = NULL;
+   }

is that we have targetm.vectorize.preferred_simd_mode which
for example with -mprefer-avx128 will first try with 128bit
vectorization.  So if a ! prefered vector size ends up
creating the epilogue we know vectorizing with the prefered
size will fail.  The above also does not take into account
peeling for gaps in which case we know the epilogue runs at
least VF times (but the vectorized epilogue might also need
an epilogue itself if not masked).

The natural next step is the masked epilogue support, after
that the merged masked epilogue one.

Thanks,
Richard.

> Thanks.
> Changelog:
> 
> 2016-11-15  Yuri Rumyantsev  
> 
> * params.def (PARAM_VECT_EPILOGUES_NOMASK): New.
> * tree-if-conv.c (tree_if_conversion): Make public.
> * * tree-if-conv.h: New file.
> * tree-vect-data-refs.c (vect_analyze_data_ref_dependences) Avoid
> dynamic alias checks for epilogues.
> * tree-vect-loop-manip.c (vect_do_peeling): Return created epilog.
> * tree-vect-loop.c: include tree-if-conv.h.
> (new_loop_vec_info): Add zeroing orig_loop_info field.
> (vect_analyze_loop_2): Don't try to enhance alignment for epilogues.
> (vect_analyze_loop): Add argument ORIG_LOOP_INFO which is not NULL
> if epilogue is vectorized, set up orig_loop_info field of loop_vinfo
> using passed argument.
> (vect_transform_loop): Check if created epilogue should be returned
> for further vectorization with less vf.  If-convert epilogue if
> required. Print vectorization success for epilogue.
> * tree-vectorizer.c (vectorize_loops): Add epilogue vectorization
> if it is required, pass loop_vinfo produced during vectorization of
> loop body to vect_analyze_loop.
> * tree-vectorizer.h (struct _loop_vec_info): Add new field
> orig_loop_info.
> (LOOP_VINFO_ORIG_LOOP_INFO): New.
> (LOOP_VINFO_EPILOGUE_P): New.
> (LOOP_VINFO_ORIG_VECT_FACTOR): New.
> (vect_do_peeling): Change prototype to return epilogue.
> (vect_analyze_loop): Add argument of loop_vec_info type.
> (vect_transform_loop): Return created loop.
> 
> gcc/testsuite/
> 
> * lib/target-supports.exp (check_avx2_hw_available): New.
> (check_effective_target_avx2_runtime): New.
> * gcc.dg/vect/vect-tail-nomask-1.c: New test.
> 
> 
> 2016-11-14 20:04 GMT+03:00 Richard Biener :
> > On November 14, 2016 4:39:40 PM GMT+01:00, Yuri Rumyantsev 
> >  wrote:
> >>Richard,
> >>
> >>I checked one of the tests designed for epilogue vectorization using
> >>patches 1 - 3 and found out that build compiler performs vectorization
> >>of epilogues with --param vect-epilogues-nomask=1 passed:
> >>
> >>$ gcc -Ofast -mavx2 t1.c -S --param vect-epilogues-nomask=1 -o
> >>t1.new-nomask.s -fdump-tree-vect-details
> >>$ grep VECTORIZED -c t1.c.156t.vect
> >>4
> >> Without param only 2 loops are vectorized.
> >>
> >>Should I simply add a part of tests related to this feature or I must
> >>delete all not necessary changes also?
> >
> > Please remove all not necessary changes.
> >
> > Richard.
> >
> >>Thanks.
> >>Yuri.
> >>
> >>2016-11-14 16:40 GMT+03:00 Richard Biener :
> >>> On Mon, 14 Nov 2016, Yuri Rumyantsev wrote:
> >>>
>  Richard,
> 
>  In my previous patch I forgot to remove couple lines related to aux
> >>field.
>  Here is the correct updated patch.
> >>>
> >>> Yeah, I noticed.  This patch would be ok for trunk (together with
> >>> necessary parts from 1 and 2) if all not required parts are removed
> >>> (and you'd add the testcases covering non-masked tail vect).
> >>>
> >>> Thus, can you please produce a single complete patch containing only
> >>> non-masked epilogue vectoriziation?
> >>>
> >>> Thanks,
> >>> Richard.
> >>>
>  Thanks.
>  Yuri.
> 
>  2016-11-14 15:51 

Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-16 Thread Jakub Jelinek
On Wed, Nov 16, 2016 at 01:25:04PM +0100, Martin Liška wrote:
>  
> +
> +/* Expand the ASAN_{LOAD,STORE} builtins.  */

Stale comment.

> +
> +bool
> +asan_expand_poison_ifn (gimple_stmt_iterator *iter,
> + bool *need_commit_edge_insert)
> +{
...
> +  use_operand_p use_p;
> +  imm_use_iterator imm_iter;
> +  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
> +{
> +  gimple *use = USE_STMT (use_p);
> +

You want to ignore debug stmts uses here (or reset them).

> +  built_in_function b = (recover_p
> +  ? BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT
> +  : BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE);
> +  tree fun = builtin_decl_implicit (b);
> +  pretty_printer pp;
> +  pp_tree_identifier (, DECL_NAME (var_decl));
> +
> +  gcall *call = gimple_build_call (fun, 2, asan_pp_string (),
> +DECL_SIZE_UNIT (var_decl));
> +  gimple_set_location (call, gimple_location (g));

Is that the location you want?  I mean shouldn't it use gimple_location (use)
instead?  The bug is on the use, not on the spot where it went out of scope.
Though the question is what to use if gimple_location (use) is
UNKNOWN_LOCATION.

> +
> +  /* If ASAN_POISON is used in a PHI node, let's insert the call on
> +  the leading to the PHI node BB.  */

The comment doesn't make sense gramatically to me.

> +  if (is_a  (use))
> + {
> +   gphi * phi = dyn_cast (use);
> +   for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
> + if (gimple_phi_arg_def (phi, i) == poisoned_var)
> +   {
> + edge e = gimple_phi_arg_edge (phi, i);
> + gsi_insert_seq_on_edge (e, call);
> + *need_commit_edge_insert = true;

What if there are multiple PHI args with that use?
Shouldn't you use just FOR_EACH_USE_ON_STMT or what macros we have?

> --- a/libsanitizer/asan/asan_errors.cc
> +++ b/libsanitizer/asan/asan_errors.cc
> @@ -279,6 +279,27 @@ void ErrorInvalidPointerPair::Print() {
>ReportErrorSummary(bug_type, );
>  }

As I wrote on IRC, we have to submit this to compiler-rt and only
if it is accepted, cherry-pick it together with the gcc changes.

> --- a/libsanitizer/asan/asan_errors.h
> +++ b/libsanitizer/asan/asan_errors.h
> @@ -294,6 +294,24 @@ struct ErrorInvalidPointerPair : ErrorBase {
>void Print();
>  };
>  
> +struct ErrorUseAfterScope : ErrorBase {
> +  uptr pc, bp, sp;
> +  const char *variable_name;
> +  u32 variable_size;

Shouldn't this be uptr?

> +  ErrorUseAfterScope(u32 tid, uptr pc_, uptr bp_, uptr sp_,
> + const char *variable_name_, u32 variable_size_)

And here.

> +// --- ReportUseAfterScope --- {{{1
> +void ReportUseAfterScope(const char *variable_name, u32 variable_size,

And here?

> +void ReportUseAfterScope(const char *variable_name, u32 variable_size,
> + bool fatal);

And here?

Jakub


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-16 Thread Mark Wielaard
On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote:
>   The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref 
> were
> designed as shortcut operations when LR is signed with A key and using
> function's CFA as salt.  This is the default behaviour of return address
> signing so is expected to be used for most of the time.  DW_OP_AARCH64_pauth
> is designed as a generic operation that allow describing pointer signing on
> any value using any salt and key in case we can't use the shortcut operations
> we can use this.

I admit to not fully understand the salting/keying involved. But given
that the DW_OP space is really tiny, so we would like to not eat up too
many of them for new opcodes. And given that introducing any new DW_OPs
using for CFI unwinding will break any unwinder anyway causing us to
update them all for this new feature. Have you thought about using a new
CIE augmentation string character for describing that the return
address/link register used by a function/frame is salted/keyed?

This seems a good description of CIE records and augmentation
characters: http://www.airs.com/blog/archives/460

It obviously also involves updating all unwinders to understand the new
augmentation character (and possible arguments). But it might be more
generic and saves us from using up too many DW_OPs.

Cheers,

Mark


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-16 Thread Jakub Jelinek
On Wed, Nov 16, 2016 at 02:54:56PM +0100, Mark Wielaard wrote:
> On Wed, 2016-11-16 at 10:00 +, Jiong Wang wrote:
> >   The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref 
> > were
> > designed as shortcut operations when LR is signed with A key and using
> > function's CFA as salt.  This is the default behaviour of return address
> > signing so is expected to be used for most of the time.  DW_OP_AARCH64_pauth
> > is designed as a generic operation that allow describing pointer signing on
> > any value using any salt and key in case we can't use the shortcut 
> > operations
> > we can use this.
> 
> I admit to not fully understand the salting/keying involved. But given
> that the DW_OP space is really tiny, so we would like to not eat up too
> many of them for new opcodes. And given that introducing any new DW_OPs
> using for CFI unwinding will break any unwinder anyway causing us to
> update them all for this new feature. Have you thought about using a new
> CIE augmentation string character for describing that the return
> address/link register used by a function/frame is salted/keyed?
> 
> This seems a good description of CIE records and augmentation
> characters: http://www.airs.com/blog/archives/460
> 
> It obviously also involves updating all unwinders to understand the new
> augmentation character (and possible arguments). But it might be more
> generic and saves us from using up too many DW_OPs.

>From what I understood, the return address is not always scrambled, so
it doesn't apply to the whole function, just to most of it (except for
an insn in the prologue and some in the epilogue).  So I think one op is
needed.  But can't it be just a toggable flag whether the return address
is scrambled + some arguments to it?
Thus DW_OP_AARCH64_scramble .uleb128 0 would mean that the default
way of scrambling starts here (if not already active) or any kind of
scrambling ends here (if already active), and
DW_OP_AARCH64_scramble .uleb128 non-zero would be whatever encoding you need
to represent details of the less common variants with details what to do.
Then you'd just hook through some MD_* macro in the unwinder the
descrambling operation if the scrambling is active at the insns you unwind
on.

Jakub


[PATCH, ARM] Add vfpv2 and neon-vfpv3

2016-11-16 Thread Richard Earnshaw (lists)
The options -mfpu=neon and -mfpu=vfp have always meant a specific
version of neon and vfp, but common usage seems to misunderstand this.

To help clarify things I've added a couple of new option values to the
-mfpu= option and have now documented the existing names as aliases.

As discussed at the Cauldron, I would like long term to deprecate -mfpu
entirely (watch this space).  But in the mean time this makes things a
bit clearer and perhaps also smoothes the transition path slightly.

* arm/arm-fpus.def (vfpv2): New FPU, currently an alias for 'vfp'.
(neon-vfpv3): New FPU, currently an alias for 'neon'.
* arm/arm-tables.opt: Regenerated.
* arm/t-aprofile (MULTILIB_REUSE): Add reuse rules for vfpv2 and
neon-vfpv3.
* doc/invoke.texi (ARM: -mfpu): Document new options.  Note that 'vfp'
and 'neon' are aliases for specific implementations.

Tested on arm-none-eabi.  Installed on trunk.

R.


diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def
index 
e0c43651d9392b718ce4fc5e2bbcf5fc1e097123..04b2ef140c44d78a1797a5125896fdf05a5a6fba
 100644
--- a/gcc/config/arm/arm-fpus.def
+++ b/gcc/config/arm/arm-fpus.def
@@ -26,6 +26,7 @@
genopt.sh assumes no whitespace up to the first "," in each entry.  */
 
 ARM_FPU("vfp", 2, VFP_REG_D16, FPU_FL_NONE)
+ARM_FPU("vfpv2",   2, VFP_REG_D16, FPU_FL_NONE)
 ARM_FPU("vfpv3",   3, VFP_REG_D32, FPU_FL_NONE)
 ARM_FPU("vfpv3-fp16",  3, VFP_REG_D32, FPU_FL_FP16)
 ARM_FPU("vfpv3-d16",   3, VFP_REG_D16, FPU_FL_NONE)
@@ -33,6 +34,7 @@ ARM_FPU("vfpv3-d16-fp16", 3, VFP_REG_D16, FPU_FL_FP16)
 ARM_FPU("vfpv3xd", 3, VFP_REG_SINGLE, FPU_FL_NONE)
 ARM_FPU("vfpv3xd-fp16",3, VFP_REG_SINGLE, FPU_FL_FP16)
 ARM_FPU("neon",3, VFP_REG_D32, FPU_FL_NEON)
+ARM_FPU("neon-vfpv3",  3, VFP_REG_D32, FPU_FL_NEON)
 ARM_FPU("neon-fp16",   3, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16)
 ARM_FPU("vfpv4",   4, VFP_REG_D32, FPU_FL_FP16)
 ARM_FPU("vfpv4-d16",   4, VFP_REG_D16, FPU_FL_FP16)
diff --git a/gcc/config/arm/arm-tables.opt b/gcc/config/arm/arm-tables.opt
index 
ee9e3bb7ec57e0e8f2f15b83442711b9faf82d20..31e069c8177976a01c8cedaa651e1b57d07e7d36
 100644
--- a/gcc/config/arm/arm-tables.opt
+++ b/gcc/config/arm/arm-tables.opt
@@ -465,56 +465,62 @@ EnumValue
 Enum(arm_fpu) String(vfp) Value(0)
 
 EnumValue
-Enum(arm_fpu) String(vfpv3) Value(1)
+Enum(arm_fpu) String(vfpv2) Value(1)
 
 EnumValue
-Enum(arm_fpu) String(vfpv3-fp16) Value(2)
+Enum(arm_fpu) String(vfpv3) Value(2)
 
 EnumValue
-Enum(arm_fpu) String(vfpv3-d16) Value(3)
+Enum(arm_fpu) String(vfpv3-fp16) Value(3)
 
 EnumValue
-Enum(arm_fpu) String(vfpv3-d16-fp16) Value(4)
+Enum(arm_fpu) String(vfpv3-d16) Value(4)
 
 EnumValue
-Enum(arm_fpu) String(vfpv3xd) Value(5)
+Enum(arm_fpu) String(vfpv3-d16-fp16) Value(5)
 
 EnumValue
-Enum(arm_fpu) String(vfpv3xd-fp16) Value(6)
+Enum(arm_fpu) String(vfpv3xd) Value(6)
 
 EnumValue
-Enum(arm_fpu) String(neon) Value(7)
+Enum(arm_fpu) String(vfpv3xd-fp16) Value(7)
 
 EnumValue
-Enum(arm_fpu) String(neon-fp16) Value(8)
+Enum(arm_fpu) String(neon) Value(8)
 
 EnumValue
-Enum(arm_fpu) String(vfpv4) Value(9)
+Enum(arm_fpu) String(neon-vfpv3) Value(9)
 
 EnumValue
-Enum(arm_fpu) String(vfpv4-d16) Value(10)
+Enum(arm_fpu) String(neon-fp16) Value(10)
 
 EnumValue
-Enum(arm_fpu) String(fpv4-sp-d16) Value(11)
+Enum(arm_fpu) String(vfpv4) Value(11)
 
 EnumValue
-Enum(arm_fpu) String(fpv5-sp-d16) Value(12)
+Enum(arm_fpu) String(vfpv4-d16) Value(12)
 
 EnumValue
-Enum(arm_fpu) String(fpv5-d16) Value(13)
+Enum(arm_fpu) String(fpv4-sp-d16) Value(13)
 
 EnumValue
-Enum(arm_fpu) String(neon-vfpv4) Value(14)
+Enum(arm_fpu) String(fpv5-sp-d16) Value(14)
 
 EnumValue
-Enum(arm_fpu) String(fp-armv8) Value(15)
+Enum(arm_fpu) String(fpv5-d16) Value(15)
 
 EnumValue
-Enum(arm_fpu) String(neon-fp-armv8) Value(16)
+Enum(arm_fpu) String(neon-vfpv4) Value(16)
 
 EnumValue
-Enum(arm_fpu) String(crypto-neon-fp-armv8) Value(17)
+Enum(arm_fpu) String(fp-armv8) Value(17)
 
 EnumValue
-Enum(arm_fpu) String(vfp3) Value(18)
+Enum(arm_fpu) String(neon-fp-armv8) Value(18)
+
+EnumValue
+Enum(arm_fpu) String(crypto-neon-fp-armv8) Value(19)
+
+EnumValue
+Enum(arm_fpu) String(vfp3) Value(20)
 
diff --git a/gcc/config/arm/t-aprofile b/gcc/config/arm/t-aprofile
index 
f852ecd04010ed54917c74e699a4580096b38e7f..7c5b10b75998ae0e20268fa946b3f35bf4456ada
 100644
--- a/gcc/config/arm/t-aprofile
+++ b/gcc/config/arm/t-aprofile
@@ -111,6 +111,8 @@ MULTILIB_MATCHES   += mfpu?vfpv4-d16=mfpu?vfpv4
 MULTILIB_MATCHES   += mfpu?vfpv4-d16=mfpu?fpv5-d16
 MULTILIB_MATCHES   += mfpu?vfpv4-d16=mfpu?fp-armv8
 MULTILIB_MATCHES   += mfpu?neon-fp-armv8=mfpu?crypto-neon-fp-armv8
+MULTILIB_MATCHES   += mfpu?vfp=mfpu?vfpv2
+MULTILIB_MATCHES   += mfpu?neon=mfpu?neon-vfpv3
 
 
 # Map all requests for vfpv3 with a later CPU to vfpv3-d16 v7-a.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 

Re: [PATCH] Fix PR78305

2016-11-16 Thread Marc Glisse

On Wed, 16 Nov 2016, Richard Biener wrote:


On Wed, 16 Nov 2016, Marc Glisse wrote:


On Wed, 16 Nov 2016, Richard Biener wrote:


I am testing the following to avoid undefined behavior when negating
a multiplication (basically extending a previous fix to properly handle
negative power of two).

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-11-16  Richard Biener  

PR middle-end/78305
* fold-const.c (negate_expr_p): Fix multiplication case.

* gcc.dg/torture/pr78305.c: New testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 242471)
+++ gcc/fold-const.c(working copy)
@@ -450,13 +450,15 @@ negate_expr_p (tree t)
  if (TYPE_UNSIGNED (type))
break;
  /* INT_MIN/n * n doesn't overflow while negating one operand it does
- if n is a power of two.  */
+ if n is a power of (minus) two.  */


if n is (minus) a power of two.
if n is a divisor of INT_MIN.


n is a divisor of INT_MIN is correct.


  if (INTEGRAL_TYPE_P (TREE_TYPE (t))
  && ! TYPE_OVERFLOW_WRAPS (TREE_TYPE (t))
  && ! ((TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST
-&& ! integer_pow2p (TREE_OPERAND (t, 0)))
+&& (wi::popcount (TREE_OPERAND (t, 0))
+!= 1 + wi::neg_p (TREE_OPERAND (t, 0), SIGNED)))


Is that supposed to test for (possibly negated) powers of 2? I don't see it.
For -2, aka 0b11...110, popcount is 31 != 1 + 1.


It's supposed to test for a power of two with the sign-bit ORed in.
I believe those are which, when multiplied with some number can
yield INT_MIN.  That is, we look for a test that ensures that there
exists no number when multiplied with X yields INT_MIN.


The first sentence about ORing the sign bit sounds strange (except for a 
sign-magnitude representation). With 2's complement, INT_MIN is -2^31, the 
divisors are the 2^k and -(2^k). -2 * 2^30 yields INT_MIN, but your test 
misses -2 as a possible divisor. On the other hand, 0b100...001 (aka 
-INT_MAX) is not a divisor of INT_MIN but your test says the reverse.


--
Marc Glisse


Re: [PATCH 9/9] Add "__RTL" to cc1 (v4)

2016-11-16 Thread Richard Biener
On Tue, Nov 15, 2016 at 10:07 PM, David Malcolm  wrote:
> On Mon, 2016-11-14 at 16:14 +0100, Richard Biener wrote:
>> On Fri, Nov 11, 2016 at 10:15 PM, David Malcolm 
>> wrote:
>> > Changed in this version:
>> >
>> > * Rather than running just one pass, run *all* passes, but start at
>> >   the given pass; support for "dg-do run" tests that execute the
>> >   resulting code.
>> > * Updated test cases to new "compact" dump format; more test cases;
>> >   use "dg-do run" in various places.
>> > * Lots of bugfixing
>> >
>> > Links to previous versions:
>> >   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00263.html
>> >   https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00500.html
>
>> Does running the RTL passes right from the parser work with -fsyntax
>> -only?
>
> It depends what you mean by "work".  If I run it with -fsyntax-only,
> then pass_rest_of_compilation::gate returns false, and none of the RTL
> passes are run.  Is this behavior correct?

Yes.

>> Doing it like __GIMPLE has the advantage of not exposing
>> "rest_of_compilation", etc..
>
> The gimple part of the compiler supports having multiple functions in
> memory at once, and then compiling them in arbitrary order based on
> decisions made by the callgraph.
>
> By contrast, the RTL part of the compiler is full of singleton state:
> things like crtl (aka x_rtl), the state of emit-rtl.c,
> "reload_completed", etc etc.

Ah, of course - I forgot that...

> To try to limit the scope of the project, for the RTL frontend work I'm
> merely attempting to restore the singleton RTL state from a dump,
> rather than to support having per function stashes of RTL state.
>
> Hence the rest of compilation gets invoked directly from the frontend
> for the __RTL case, since it will get overwritten if there's a second
> __RTL function in the source file (which sounds like an idea for a test
> case; I'll attempt such a test case).
>
> I hope this is a reasonable approach.  If not, I suppose I can attempt
> to bundle it up into some kind of RTL function state, but that seems
> like significant scope creep.

Yeah, I think it's a good approach for now.

>> I'm now handling __GIMPLE from within declspecs (the GIMPLE FE stuff
>> has been committed), it would be nice to match the __RTL piece here.
>
> (Congratulations on getting the GIMPLE FE stuff in)
>
> I'm not sure I understand you here - do you want me to rewrite the
> __RTL parsing to match the __GIMPLE piece, or the other way around?
> If the former, presumably I should reuse (and rename)
> c_parser_gimple_pass_list?

Handle __RTL like __GIMPLE, thus as declspec.  Of course ultimatively
Joseph has the last word here.

>
>> > gcc/ChangeLog:
>> > * Makefile.in (OBJS): Add run-rtl-passes.o.
>> >
>> > gcc/c-family/ChangeLog:
>> > * c-common.c (c_common_reswords): Add "__RTL".
>> > * c-common.h (enum rid): Add RID_RTL.
>> >
>> > gcc/c/ChangeLog:
>> > * c-parser.c: Include "read-rtl-function.h" and
>> > "run-rtl-passes.h".
>> > (c_parser_declaration_or_fndef): In the "GNU extensions"
>> > part of
>> > the leading comment, add an alternate production for
>> > "function-definition", along with new "rtl-body-specifier"
>> > and
>> > "rtl-body-pass-specifier" productions.  Handle "__RTL" by
>> > calling
>> > c_parser_parse_rtl_body.  Convert a timevar_push/pop pair
>> > to an auto_timevar, to cope with early exit.
>> > (c_parser_parse_rtl_body): New function.
>> >
>> > gcc/ChangeLog:
>> > * cfg.c (free_original_copy_tables): Remove assertion
>> > on original_copy_bb_pool.
>>
>> How can that trigger?
>
> It happens when running pass_outof_cfg_layout_mode::execute; seen with
> gcc.dg/rtl/x86_64/test-return-const.c.before-fwprop.c.
>
> The input file is a dump taken in cfg_layout mode (although in this
> case it's a trivial 3-basic-block CFG, but ideally there would be cases
> with non-trivial control flow); it has "fwprop1" as its starting pass.
>
> Running without -quiet shows:
>
> skipping pass: *rest_of_compilation
> skipping pass: vregs
> skipping pass: into_cfglayout
> skipping pass: jump
> skipping pass: subreg1
> skipping pass: cse1
> found starting pass: fwprop1
>
> i.e. it skips the into_cfglayout (amongst others), to start with
> fwprop1.
>
> In theory skipping a pass ought to be a no-op, assuming that we're
> faithfully reconstructing all RTL state.  However, RTL state management
> is fiddly, so the patch introduces some logic in passes.c to do some
> things when skipping a pass; in particular, it has:
>
>   /* Update the cfg hooks as appropriate.  */
>   if (strcmp (pass->name, "into_cfglayout") == 0)
> {
>   cfg_layout_rtl_register_cfg_hooks ();
>   cfun->curr_properties |= PROP_cfglayout;
> }
>   if (strcmp (pass->name, "outof_cfglayout") == 0)
> {

Re: Fix nb_iterations calculation in tree-vect-loop-manip.c

2016-11-16 Thread Richard Biener
On Tue, Nov 15, 2016 at 4:54 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Tue, Nov 15, 2016 at 1:44 PM, Richard Sandiford
>>  wrote:
>>> We previously stored the number of loop iterations rather
>>> than the number of latch iterations.
>>
>> So ->nb_iterations was unused without SVE?  Otherwise can you please
>> add a testcase?
>
> TBH I can't remember whether we noticed this by inspection or whether
> it did manifest in the output somehow.  If it did, it would have been
> an extra unnecessary iteration after complete unrolling, but usually
> a later patch would remove the iteration as dead.

Ok then.

Thanks,
Richard.

> Thanks,
> Richard


RE: [PATCH 3/3] MIPS/GCC: Mark trailing labels with `.insn'

2016-11-16 Thread Matthew Fortune
Maciej Rozycki  writes:
> On Mon, 14 Nov 2016, Matthew Fortune wrote:
> 
> > > This however requires the correct annotation of branch targets as
> > > code, because the ISA mode is not relevant for data symbols and is
> > > therefore not recorded for them.
> >
> > I wonder if it would have been possible to add the ISA mode to data
> > symbols and hide it in readelf/objdump? I don't know what older
> > binutils would have done with such labels but it would have made the
> > new checks compatible with pre-existing GCC code generation.
> > Regardless the changes in this patch would still be important to
> > correctly identify labels as text.
> 
>  Well, I suppose we could add ISA mode annotation (i.e. STO_MIPS16 and
> STO_MICROMIPS flags, as applicable) to data symbols and then ignore it
> for anything but branch/jump validation.  I have mixed feelings about
> such an arrangement though, and I don't find jumping to data symbols
> particularly clean in the first place, so I think we should avoid it
> anyway, at least in generated code.  Also I reckon errors from ISA mode
> checks in binutils have already identified a few Linux kernel bugs as
> support for microMIPS compilation was added there, so I think these
> checks have proved themselves useful and attempts should not be made to
> defeat them.

Sure. I just thought I'd mention it as an idea.

> > > ---
> > >  I have successfully regression-tested it with the `mips-mti-linux-
> gnu'
> > > target, with a big-endian o32 regular MIPS multilib and a little-
> endian
> > > o32 MIPS16 multilib, with no regressions, except as noted below.  I
> did
> > > some big-endian n64 regular MIPS and little-endian o32 microMIPS
> > > testing, including with the new cases, and things looked good,
> except as
> > > noted below.  I also generated assembly manually (for the assembly-
> match
> > > cases) and examined output visually, including all the four above
> > > multilibs, and also -fPIC and -mno-abicalls variants, which I have
> no
> > > immediate way of testing automatically.
> >
> > As noted below and my opinion in general... Dealing with the
> intricacies
> > of getting the MIPS part of the GCC testsuite running cleanly for all
> > variations of the architecture is a prohibitively expensive process to
> > apply to each patch. Now that we are in stage 3 then various testsuite
> > issues will get dealt with.
> 
>  Having test bots running the interesting matrix of targets and
> multilibs
> might help a bit, although most likely only over changes already
> committed
> unless we have a way to submit candidate patches for testing.  I believe
> the Linux kernel project has actually got this covered as I do see
> failure
> reports about people's mailing list patch submissions sent right away
> regularly, although for build errors only which are surely easier to
> catch.  Perhaps we could learn from their experience though sometime.

Yes, I get started on the setup for this every once in a while and then
run out of time.

> > >  With n64 (`-mabi=n64') testing none of the test cases under
> > > gcc.target/mips/ were run and the test harness broke as follows:
> > >
> > > ERROR: (DejaGnu) proc "cc1: error: '-mfpxx' can only be used with
> the
> > > o32 ABI" does not exist.
> > > The error code is NONE
> > > The info on the error is:
> > > invalid command name "cc1:"
> > > while executing
> > > "::tcl_unknown cc1: error: '-mfpxx' can only be used with the o32
> ABI"
> > > ("uplevel" body line 1)
> > > invoked from within
> > > "uplevel 1 ::tcl_unknown $args"
> > >
> > > I take it as a bug in the harness, which ought to be looked into
> > > separately, and not a problem with this change.
> >
> > I haven't seen this failure before which may be down to a different
> way
> > of invoking the testsuite I guess (I have run n64 testing fairly
> recently).
> 
>  I have figured out that the precedence of compilation flags is set in
> `default_target_compile' in /usr/share/dejagnu/target.exp, so it's not
> something we can easily control locally unless (in the usual TCL way) we
> override that procedure, which is quite a substantial piece of code
> actually.
> 
>  Based on this observation however I have determined that moving
> multilib
> and related flags such as `-mabi=n64' and possibly also `-Wl,-rpath,...'
> from `$board_info(...,cflags)' over to `$board_info(...,multilib_flags)'
> helps a bit and reduces the number of test suite issues, although still
> causes occasional troubles with link tests where wrong multilib matching
> happens in the absence of an exact match, such as with `-mabi=n64
> -mmicromips' choosing the (o32) `-mmicromips' multilib which is not link
> compatible rather than the (non-microMIPS) `-mabi=n64' which usually is
> (with the minor issue of short delay slots in cross-mode jump
> relaxation,
> which might be solvable by the harness with the `-minterlink-compressed'
> option where required).

So have you been modifying the 

Re: [RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-16 Thread Martin Liška
On 11/16/2016 01:25 PM, Martin Liška wrote:
> Hello
> 
> Following patch is a candidate that re-writes VAR_DECLs that are
> is_gimple_reg_type with:
> my_char_25 = ASAN_POISON ();
> 
> that is eventually transformed to:
> __builtin___asan_report_use_after_scope_noabort ("my_char", 1);
> 
> at places where my_char_25 is used. That introduces a new entry point
> to ASAN runtime, reporting:
> 
> ==18378==ERROR: AddressSanitizer: stack-use-after-scope at pc 0x004007b4 
> bp 0x0001 sp 0x00400603
> ACCESS of size 1 for variable 'my_char' thread T0
> #0 0x400602 in main (/tmp/a.out+0x400602)
> #1 0x7fa6e572d290 in __libc_start_main (/lib64/libc.so.6+0x20290)
> #2 0x400669 in _start (/tmp/a.out+0x400669)
> 
> SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400602) in main
> 
> I'm still not sure where exactly do the expansion of ASAN_POISON as some 
> cleanup
> after the transformation would be desired.
> 
> Thoughts?
> Thanks,
> Martin 
> 
> 
> 
> 

There's an example:

int
main (void)
{
  char *ptr;
  {
char my_char;
ptr = _char;
  }

  return *ptr;
}

$ g++ /tmp/use-after-scope-1.c -fsanitize=address -O0 && ./a.out 
=
==16035==ERROR: AddressSanitizer: stack-use-after-scope on address 
0x7ffe76322240 at pc 0x00400848 bp 0x7ffe76322200 sp 0x7ffe763221f8
READ of size 1 at 0x7ffe76322240 thread T0
#0 0x400847 in main (/tmp/a.out+0x400847)
#1 0x7f0005739290 in __libc_start_main (/lib64/libc.so.6+0x20290)
#2 0x4006b9 in _start (/tmp/a.out+0x4006b9)

Address 0x7ffe76322240 is located in stack of thread T0 at offset 32 in frame
#0 0x400786 in main (/tmp/a.out+0x400786)

  This frame has 1 object(s):
[32, 33) 'my_char' <== Memory access at offset 32 is inside this variable
HINT: this may be a false positive if your program uses some custom stack 
unwind mechanism or swapcontext
  (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400847) in main
Shadow bytes around the buggy address:
  0x10004ec5c3f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c410: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c420: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c430: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x10004ec5c440: 00 00 00 00 f1 f1 f1 f1[f8]f2 f2 f2 f3 f3 f3 f3
  0x10004ec5c450: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c460: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c470: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x10004ec5c490: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:   00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:   fa
  Freed heap region:   fd
  Stack left redzone:  f1
  Stack mid redzone:   f2
  Stack right redzone: f3
  Stack after return:  f5
  Stack use after scope:   f8
  Global redzone:  f9
  Global init order:   f6
  Poisoned by user:f7
  Container overflow:  fc
  Array cookie:ac
  Intra object redzone:bb
  ASan internal:   fe
  Left alloca redzone: ca
  Right alloca redzone:cb
==16035==ABORTING

$ g++ /tmp/use-after-scope-1.c -fsanitize=address -O2 && ./a.out 
=
==16049==ERROR: AddressSanitizer: stack-use-after-scope at pc 0x00400794 bp 
0x0001 sp 0x004005f3
ACCESS of size 1 for variable 'my_char' thread T0
#0 0x4005f2 in main (/tmp/a.out+0x4005f2)
#1 0x7f883337e290 in __libc_start_main (/lib64/libc.so.6+0x20290)
#2 0x400649 in _start (/tmp/a.out+0x400649)

SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x4005f2) in main
==16049==ABORTING

Martin


Re: [PATCH] Add sem_item::m_hash_set (PR ipa/78309) (v2)

2016-11-16 Thread Martin Liška
On 11/15/2016 05:46 PM, Jan Hubicka wrote:
> Yep, zero is definitly valid hash value:0
> 
> Patch is OK. We may consider backporting it to release branches.
> Honza

Thanks, sending v2 as I found an error in the previous version.
Changes from last version:
- comments for ctors are just in header file, not duplicated in ipa-icf.c
- hash argument has been removed from ctors
- ctors GNU coding style has been fixed and unified

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From 5be0ca49cfa67ca848002d6fe008ef4c2885bd40 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 11 Nov 2016 16:15:20 +0100
Subject: [PATCH] Add sem_item::m_hash_set (PR ipa/78309)

gcc/ChangeLog:

2016-11-16  Martin Liska  

	PR ipa/78309
	* ipa-icf.c (void sem_item::set_hash): Update m_hash_set.
	(sem_function::get_hash): Use the new field.
	(sem_function::parse): Remove an argument from ctor.
	(sem_variable::parse): Likewise.
	(sem_variable::get_hash): Use the new field.
	(sem_item_optimizer::read_section): Use new ctor and set hash.
	* ipa-icf.h: _hash is removed from sem_item::sem_item,
	sem_variable::sem_variable, sem_function::sem_function.
---
 gcc/ipa-icf.c | 64 ---
 gcc/ipa-icf.h | 17 
 2 files changed, 35 insertions(+), 46 deletions(-)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 1ab67f3..212e406 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -131,27 +131,20 @@ symbol_compare_collection::symbol_compare_collection (symtab_node *node)
 
 /* Constructor for key value pair, where _ITEM is key and _INDEX is a target.  */
 
-sem_usage_pair::sem_usage_pair (sem_item *_item, unsigned int _index):
-  item (_item), index (_index)
+sem_usage_pair::sem_usage_pair (sem_item *_item, unsigned int _index)
+: item (_item), index (_index)
 {
 }
 
-/* Semantic item constructor for a node of _TYPE, where STACK is used
-   for bitmap memory allocation.  */
-
-sem_item::sem_item (sem_item_type _type,
-		bitmap_obstack *stack): type (_type), m_hash (0)
+sem_item::sem_item (sem_item_type _type, bitmap_obstack *stack)
+: type (_type), m_hash (-1), m_hash_set (false)
 {
   setup (stack);
 }
 
-/* Semantic item constructor for a node of _TYPE, where STACK is used
-   for bitmap memory allocation. The item is based on symtab node _NODE
-   with computed _HASH.  */
-
 sem_item::sem_item (sem_item_type _type, symtab_node *_node,
-		hashval_t _hash, bitmap_obstack *stack): type(_type),
-  node (_node), m_hash (_hash)
+		bitmap_obstack *stack)
+: type (_type), node (_node), m_hash (-1), m_hash_set (false)
 {
   decl = node->decl;
   setup (stack);
@@ -230,23 +223,20 @@ sem_item::target_supports_symbol_aliases_p (void)
 void sem_item::set_hash (hashval_t hash)
 {
   m_hash = hash;
+  m_hash_set = true;
 }
 
 /* Semantic function constructor that uses STACK as bitmap memory stack.  */
 
-sem_function::sem_function (bitmap_obstack *stack): sem_item (FUNC, stack),
-  m_checker (NULL), m_compared_func (NULL)
+sem_function::sem_function (bitmap_obstack *stack)
+: sem_item (FUNC, stack), m_checker (NULL), m_compared_func (NULL)
 {
   bb_sizes.create (0);
   bb_sorted.create (0);
 }
 
-/*  Constructor based on callgraph node _NODE with computed hash _HASH.
-Bitmap STACK is used for memory allocation.  */
-sem_function::sem_function (cgraph_node *node, hashval_t hash,
-			bitmap_obstack *stack):
-  sem_item (FUNC, node, hash, stack),
-  m_checker (NULL), m_compared_func (NULL)
+sem_function::sem_function (cgraph_node *node, bitmap_obstack *stack)
+: sem_item (FUNC, node, stack), m_checker (NULL), m_compared_func (NULL)
 {
   bb_sizes.create (0);
   bb_sorted.create (0);
@@ -279,7 +269,7 @@ sem_function::get_bb_hash (const sem_bb *basic_block)
 hashval_t
 sem_function::get_hash (void)
 {
-  if (!m_hash)
+  if (!m_hash_set)
 {
   inchash::hash hstate;
   hstate.add_int (177454); /* Random number for function type.  */
@@ -1704,7 +1694,7 @@ sem_function::parse (cgraph_node *node, bitmap_obstack *stack)
   || DECL_STATIC_DESTRUCTOR (node->decl))
 return NULL;
 
-  sem_function *f = new sem_function (node, 0, stack);
+  sem_function *f = new sem_function (node, stack);
 
   f->init ();
 
@@ -1807,19 +1797,12 @@ sem_function::bb_dict_test (vec *bb_dict, int source, int target)
 return (*bb_dict)[source] == target;
 }
 
-
-/* Semantic variable constructor that uses STACK as bitmap memory stack.  */
-
 sem_variable::sem_variable (bitmap_obstack *stack): sem_item (VAR, stack)
 {
 }
 
-/*  Constructor based on varpool node _NODE with computed hash _HASH.
-Bitmap STACK is used for memory allocation.  */
-
-sem_variable::sem_variable (varpool_node *node, hashval_t _hash,
-			bitmap_obstack *stack): sem_item(VAR,
-  node, _hash, stack)
+sem_variable::sem_variable (varpool_node *node, bitmap_obstack *stack)
+: sem_item (VAR, node, stack)
 {
   

Re: [PATCH][PPC] Fix ICE using power9 with soft-float

2016-11-16 Thread Michael Meissner
On Tue, Nov 15, 2016 at 09:11:56PM +, Andrew Stubbs wrote:
> On 15/11/16 21:06, Michael Meissner wrote:
> >Now, that I have a little time, I can look into this, to at least make
> >predicate and peepholes match.  There is some other stuff (support for the 
> >new
> >load/store that were added to the compiler after that we should also tackle).
> 
> I've been investigating this today, and I've found that the insn
> does not match because the "fusion_addis_mem_combo_store" predicate
> requires TARGET_SF_FPR is true, which in turn requires
> TARGET_HARD_FLOAT is true.
> 
> So basically the fusion stuff is disabled in soft-float mode
> regardless of where the value is stored.
> 
> Anyway, I'm at end-of-day now, so let me know if you come up with anything.

Yeah, SFmode and DFmode should not have the TARGET_{S,D}F_FPR checks.

But a secondary problem is the early clobber in the match_scratch.

Before the peephole2 we have:

(insn 7 4 8 2 (set (reg/f:DI 3 3 [157])
(plus:DI (reg:DI 3 3 [ p ])
(const_int 327680 [0x5]))) "fusion3.c":12 75 {*adddi3}
 (nil))

(insn 8 7 14 2 (set (mem:SF (plus:DI (reg/f:DI 3 3 [157])
(const_int -29420 [0x8d14])) [1 MEM[(float *)p_1(D) 
+ 298260B]+0 S4 A32])
(reg:SF 4 4 [ f ])) "fusion3.c":12 484 {*movsf_softfloat}
 (expr_list:REG_DEAD (reg:SF 4 4 [ f ])
(expr_list:REG_DEAD (reg/f:DI 3 3 [157])
(nil

This gets combined into:

(insn 17 4 14 2 (parallel [
(set (mem:SF (plus:DI (plus:DI (reg:DI 3 3 [ p ])
(const_int 327680 [0x5]))
(const_int -29420 [0x8d14])) [1 MEM[(float 
*)p_1(D) + 298260B]+0 S4 A32])
(unspec:SF [
(reg:SF 4 4 [ f ])
] UNSPEC_FUSION_P9))
(clobber (reg/f:DI 3 3 [157]))
]) "fusion3.c":12 -1
 (nil))

where (reg:DI 3) is the base register temporary.  Since it was used as part of
the original address, the early clobber check in constrain_operands will flag
it.  This is what is causing PR 78101.  Thus there are two issues here.

In the scope of the ISA 3.0/power9 work, there is also the issue that the new
d-form load and stores should be folded into this as well.  This is due to me
doing the original power9/ISA 3.0 fusion first, and then putting it on the
shelf, and then months later working on the new address forms, and never
getting back to fusion.

As it is currently written, there was an expectation that the fusion stuff
would eventually move from being a peephole2 to being part of the addressing
handled before register allocation.  Then the secondary reload hook would call
the appropriate fusion helper function, which is why we have int_ and gpr_
fusion insn functions, each based on a separate register class.  The int_
fusion form allows SFmode/DFmode, but as you point out the combo recognizer
doesn't allow them in GPRs.

I'm not as sure that we will have the will to move fused addresses early before
register allocation, and whether we need to keep around the separate int_ and
fpr_ forms.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH, Fortran, pr78356, v1] [7 Regression] [OOP] segfault allocating polymorphic variable with polymorphic component with allocatable component

2016-11-16 Thread Andre Vehreschild
Hi Janus,

thanks for the quick review. Committed as r242490.

Regards,
Andre

On Tue, 15 Nov 2016 22:24:43 +0100
Janus Weil  wrote:

> Hi Andre,
> 
> > attached patch fixes the issue raised. The issue here was, that a copy of
> > the base class was generated and its address passed to the
> > _vptr->copy()-method, which then accessed memory, that was not present in
> > the copy being an object of the base class. The patch fixes this by making
> > sure the temporary handle is a pointer to the data to copy.
> >
> > Sorry, when that is not clear. I am not feeling so well today. So here in
> > terms of pseudo code. This code was formerly generated:
> >
> > struct ac {};
> > struct a : struct ac { integer *i; };
> >
> > a src, dst;
> > ac temp;
> >
> > temp = src; // temp is now only a copy of ac
> >
> > _vptr.copy(, ); // temp does not denote memory having a pointer to
> > i
> >
> > After the patch, this code is generated:
> >
> > // types as above
> > a src, dst;
> > ac *temp; // !!! Now a pointer
> >
> > temp = 
> > _vptr.copy(temp, ); // temp now points to memory that has a pointer to i
> > // and is valid for copying.
> >
> > Bootstraps and regtests ok on x86_64-linux/F23. Ok for trunk?  
> 
> ok with me. Thanks for the quick fix!
> 
> Cheers,
> Janus


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 
Index: gcc/fortran/ChangeLog
===
--- gcc/fortran/ChangeLog	(Revision 242489)
+++ gcc/fortran/ChangeLog	(Arbeitskopie)
@@ -1,3 +1,11 @@
+2016-11-16  Andre Vehreschild  
+
+	PR fortran/78356
+	* class.c (gfc_is_class_scalar_expr): Prevent taking an array ref for
+	a component ref.
+	* trans-expr.c (gfc_trans_assignment_1): Ensure a reference to the
+	object to copy is generated, when assigning class objects.
+
 2016-11-14  Thomas Koenig  
 
 	* dump-parse-tree.c (show_code):  Add prototype.
Index: gcc/fortran/class.c
===
--- gcc/fortran/class.c	(Revision 242489)
+++ gcc/fortran/class.c	(Arbeitskopie)
@@ -378,7 +378,8 @@
 	&& CLASS_DATA (e->symtree->n.sym)
 	&& !CLASS_DATA (e->symtree->n.sym)->attr.dimension
 	&& (e->ref == NULL
-	|| (strcmp (e->ref->u.c.component->name, "_data") == 0
+	|| (e->ref->type == REF_COMPONENT
+		&& strcmp (e->ref->u.c.component->name, "_data") == 0
 		&& e->ref->next == NULL)))
 return true;
 
@@ -390,7 +391,8 @@
 	&& CLASS_DATA (ref->u.c.component)
 	&& !CLASS_DATA (ref->u.c.component)->attr.dimension
 	&& (ref->next == NULL
-		|| (strcmp (ref->next->u.c.component->name, "_data") == 0
+		|| (ref->next->type == REF_COMPONENT
+		&& strcmp (ref->next->u.c.component->name, "_data") == 0
 		&& ref->next->next == NULL)))
 	return true;
 }
Index: gcc/fortran/trans-expr.c
===
--- gcc/fortran/trans-expr.c	(Revision 242489)
+++ gcc/fortran/trans-expr.c	(Arbeitskopie)
@@ -9628,6 +9628,7 @@
   int n;
   bool maybe_workshare = false;
   symbol_attribute lhs_caf_attr, rhs_caf_attr, lhs_attr;
+  bool is_poly_assign;
 
   /* Assignment of the form lhs = rhs.  */
   gfc_start_block ();
@@ -9648,6 +9649,19 @@
 	  || gfc_is_alloc_class_scalar_function (expr2)))
 expr2->must_finalize = 1;
 
+  /* Checking whether a class assignment is desired is quite complicated and
+ needed at two locations, so do it once only before the information is
+ needed.  */
+  lhs_attr = gfc_expr_attr (expr1);
+  is_poly_assign = (use_vptr_copy || lhs_attr.pointer
+		|| (lhs_attr.allocatable && !lhs_attr.dimension))
+		   && (expr1->ts.type == BT_CLASS
+		   || gfc_is_class_array_ref (expr1, NULL)
+		   || gfc_is_class_scalar_expr (expr1)
+		   || gfc_is_class_array_ref (expr2, NULL)
+		   || gfc_is_class_scalar_expr (expr2));
+
+
   /* Only analyze the expressions for coarray properties, when in coarray-lib
  mode.  */
   if (flag_coarray == GFC_FCOARRAY_LIB)
@@ -9676,6 +9690,10 @@
   if (rss == gfc_ss_terminator)
 	/* The rhs is scalar.  Add a ss for the expression.  */
 	rss = gfc_get_scalar_ss (gfc_ss_terminator, expr2);
+  /* When doing a class assign, then the handle to the rhs needs to be a
+	 pointer to allow for polymorphism.  */
+  if (is_poly_assign && expr2->rank == 0 && !UNLIMITED_POLY (expr2))
+	rss->info->type = GFC_SS_REFERENCE;
 
   /* Associate the SS with the loop.  */
   gfc_add_ss_to_loop (, lss);
@@ -9835,14 +9853,7 @@
 	gfc_add_block_to_block (, );
 }
 
-  lhs_attr = gfc_expr_attr (expr1);
-  if ((use_vptr_copy || lhs_attr.pointer
-   || (lhs_attr.allocatable && !lhs_attr.dimension))
-  && (expr1->ts.type == BT_CLASS
-	  || (gfc_is_class_array_ref (expr1, NULL)
-	  || gfc_is_class_scalar_expr (expr1))
-	  || (gfc_is_class_array_ref (expr2, NULL)
-	  || gfc_is_class_scalar_expr (expr2
+  if (is_poly_assign)
  

[Patch, testsuite] Fix bogus Wlogical-op-1.c test failure for avr

2016-11-16 Thread Senthil Kumar Selvaraj
Hi,

  The below patch fixes c-c++-common/Wlogical-op-1.c for avr by
  explicitly typedef'ing __INT32_TYPE for int and __INT16_TYPE__ for short
  if the target's int size is less than 4 bytes.

  The test assumes short is always smaller than int, and therefore does not 
  expect a warning when the logical operands are of type short and int.

  This isn't true for the avr - shorts and ints are of the same size, and
  therefore the warning triggers for the above case also.

  Committed to trunk as obvious.

Regards
Senthil

gcc/testsuite/ChangeLog

2016-11-16  Senthil Kumar Selvaraj  

* c-c++-common/Wlogical-op-1.c: Use __INT{16,32}_TYPE__ instead
of {short,int} if __SIZEOF_INT__ is less than 4 bytes.


Index: c-c++-common/Wlogical-op-1.c
===
--- c-c++-common/Wlogical-op-1.c(revision 242471)
+++ c-c++-common/Wlogical-op-1.c(working copy)
@@ -8,12 +8,22 @@
 # define false 0
 #endif
 
-extern int bar (void);
-extern int *p;
-struct R { int a, b; } S;
+#if __SIZEOF_INT__ < 4
+  __extension__ typedef __INT32_TYPE__ int32_t;
+  __extension__ typedef __UINT32_TYPE__ uint32_t;
+  __extension__ typedef __INT16_TYPE__ int16_t;
+#else
+  typedef int int32_t;
+  typedef unsigned int uint32_t;
+  typedef short int16_t;
+#endif
 
+extern int32_t bar (void);
+extern int32_t *p;
+struct R { int32_t a, b; } S;
+
 void
-andfn (int a, int b)
+andfn (int32_t a, int32_t b)
 {
   if (a && a) {}   /* { dg-warning "logical .and. of equal 
expressions" } */
   if (!a && !a) {} /* { dg-warning "logical .and. of equal 
expressions" } */
@@ -34,7 +44,7 @@
   if (p[0] && p[0]) {} /* { dg-warning "logical .and. of equal 
expressions" } */
   if (S.a && S.a) {}   /* { dg-warning "logical .and. of equal 
expressions" } */
   if ((bool) a && (bool) a) {} /* { dg-warning "logical .and. of equal 
expressions" } */
-  if ((unsigned) a && a) {}/* { dg-warning "logical .and. of equal 
expressions" } */
+  if ((uint32_t) a && a) {}/* { dg-warning "logical .and. of equal 
expressions" } */
 
   /* Stay quiet here.  */
   if (a && b) {}
@@ -48,7 +58,7 @@
 
   if (a > 0 && a > 1) {}
   if (a > -2 && a > 1) {}
-  if (a && (short) a) {}
+  if (a && (int16_t) a) {}
   if ((char) a && a) {}
   if (++a && a) {}
   if (++a && ++a) {}
@@ -61,7 +71,7 @@
 }
 
 void
-orfn (int a, int b)
+orfn (int32_t a, int32_t b)
 {
   if (a || a) {}   /* { dg-warning "logical .or. of equal 
expressions" } */
   if (!a || !a) {} /* { dg-warning "logical .or. of equal 
expressions" } */
@@ -82,7 +92,7 @@
   if (p[0] || p[0]) {} /* { dg-warning "logical .or. of equal 
expressions" } */
   if (S.a || S.a) {}   /* { dg-warning "logical .or. of equal 
expressions" } */
   if ((bool) a || (bool) a) {} /* { dg-warning "logical .or. of equal 
expressions" } */
-  if ((unsigned) a || a) {}/* { dg-warning "logical .or. of equal 
expressions" } */
+  if ((uint32_t) a || a) {}/* { dg-warning "logical .or. of equal 
expressions" } */
 
   /* Stay quiet here.  */
   if (a || b) {}
@@ -96,7 +106,7 @@
 
   if (a > 0 || a > 1) {}
   if (a > -2 || a > 1) {}
-  if (a || (short) a) {}
+  if (a || (int16_t) a) {}
   if ((char) a || a) {}
   if (++a || a) {}
   if (++a || ++a) {}


[PATCH] Fix PR78333

2016-11-16 Thread Richard Biener

Since GCC 4.6 we aggressively prune bodies of GNU extern inline functions
which means that instrumenting them via -finstrument-functions doesn't
work because that takes the address of the function.  Fixed by not
instrumenting those functions (we still instrument regular always-inline
functions and that works as expected).

In the PR this affects intrinsic header functions but it will also
affect fortify wrappers and in both cases instrumenting is undesirable
I think.

Bootstrap / regtest in progress on x86_64-unknown-linux-gnu.

Richard.

2016-11-16  Richard Biener  

PR middle-end/78333
* gimplify.c (gimplify_function_tree): Do not instrument
GNU extern inline functions.

* gcc.dg/pr78333.c: New testcase.

Index: gcc/gimplify.c
===
--- gcc/gimplify.c  (revision 242408)
+++ gcc/gimplify.c  (working copy)
@@ -12547,6 +12559,10 @@ gimplify_function_tree (tree fndecl)
   /* ??? Add some way to ignore exceptions for this TFE.  */
   if (flag_instrument_function_entry_exit
   && !DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT (fndecl)
+  /* Do not instrument extern inline functions.  */
+  && !(DECL_DECLARED_INLINE_P (fndecl)
+  && DECL_EXTERNAL (fndecl)
+  && DECL_DISREGARD_INLINE_LIMITS (fndecl))
   && !flag_instrument_functions_exclude_p (fndecl))
 {
   tree x;
Index: gcc/testsuite/gcc.dg/pr78333.c
===
--- gcc/testsuite/gcc.dg/pr78333.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr78333.c  (working copy)
@@ -0,0 +1,9 @@
+/* { dg-do link } */
+/* { dg-options "-finstrument-functions" } */
+
+extern inline __attribute__((gnu_inline, always_inline)) int foo () { }
+int main()
+{
+  foo ();
+  return 0;
+}


Re: [PATCH, GCC/ARM] Fix ICE when compiling empty FIQ interrupt handler in ARM mode

2016-11-16 Thread Kyrill Tkachov


On 09/11/16 16:19, Thomas Preudhomme wrote:

Hi,

This patch fixes the following ICE when building when compiling an empty FIQ 
interrupt handler in ARM mode:

empty_fiq_handler.c:5:1: error: insn does not satisfy its constraints:
 }
 ^

(insn/f 13 12 14 (set (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 11 fp)
(const_int 4 [0x4]))) irq.c:5 4 {*arm_addsi3}
 (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:SI 13 sp)
(plus:SI (reg/f:SI 11 fp)
(const_int 4 [0x4])))
(nil)))

The ICE was provoked by missing an alternative to reflect that ARM mode can do 
an add of general register into sp which is unpredictable in Thumb mode add 
immediate.

ChangeLog entries are as follow:

*** gcc/ChangeLog ***

2016-11-04  Thomas Preud'homme  

* config/arm/arm.md (arm_addsi3): Add alternative for addition of
general register with general register or ARM constant into SP
register.


*** gcc/testsuite/ChangeLog ***

2016-11-04  Thomas Preud'homme  

* gcc.target/arm/empty_fiq_handler.c: New test.


Testing: bootstrapped on ARMv7-A ARM mode & testsuite shows no regression.

Is this ok for trunk?



I see that "r" does not include the stack pointer (STACK_REG is separate from 
GENERAL_REGs) so we are indeed missing
that constraint.

Ok for trunk.
Thanks,
Kyrill


Best regards,

Thomas




Re: [PATCH] Fix PR78306

2016-11-16 Thread Richard Biener
On Tue, 15 Nov 2016, Richard Biener wrote:

> 
> Appearantly for some unknown reason we refuse to inline anything into
> functions calling cilk_spawn.  That breaks fortified headers and
> all other always-inline function calls (intrinsics come to my mind as 
> well).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk?

I had to fix the testcase somewhat to avoid the cilk.h header.  I also
added correctness checking.

I'll commit this to trunk if there are no comments on why we refuse
to inline into cilk-spawn containing functions (CCing original
author as well) - the code behaves this way since the original
merge and there's no testcase showing failure when removing it.

Thanks,
Richard.

2016-11-16  Richard Biener  

PR tree-optimization/78306
* ipa-inline-analysis.c (initialize_inline_failed): Do not
inhibit inlining if function calls cilk_spawn.
(can_inline_edge_p): Likewise.

* gcc.dg/cilk-plus/pr78306.c: New testcase.

Index: gcc/ipa-inline-analysis.c
===
--- gcc/ipa-inline-analysis.c   (revision 242408)
+++ gcc/ipa-inline-analysis.c   (working copy)
@@ -1507,9 +1507,6 @@ initialize_inline_failed (struct cgraph_
 e->inline_failed = CIF_BODY_NOT_AVAILABLE;
   else if (callee->local.redefined_extern_inline)
 e->inline_failed = CIF_REDEFINED_EXTERN_INLINE;
-  else if (cfun && fn_contains_cilk_spawn_p (cfun))
-/* We can't inline if the function is spawing a function.  */
-e->inline_failed = CIF_CILK_SPAWN;
   else
 e->inline_failed = CIF_FUNCTION_NOT_CONSIDERED;
   gcc_checking_assert (!e->call_stmt_cannot_inline_p
Index: gcc/ipa-inline.c
===
--- gcc/ipa-inline.c(revision 242408)
+++ gcc/ipa-inline.c(working copy)
@@ -368,11 +368,6 @@ can_inline_edge_p (struct cgraph_edge *e
   e->inline_failed = CIF_FUNCTION_NOT_INLINABLE;
   inlinable = false;
 }
-  else if (inline_summaries->get (caller)->contains_cilk_spawn)
-{
-  e->inline_failed = CIF_CILK_SPAWN;
-  inlinable = false;
-}
   /* Don't inline a function with mismatched sanitization attributes. */
   else if (!sanitize_attrs_match_for_inline_p (caller->decl, callee->decl))
 {
Index: gcc/testsuite/gcc.dg/cilk-plus/pr78306.c
===
--- gcc/testsuite/gcc.dg/cilk-plus/pr78306.c(revision 0)
+++ gcc/testsuite/gcc.dg/cilk-plus/pr78306.c(revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fcilkplus" } */
+
+#define _FORTIFY_SOURCE 2
+#include 
+
+int sum(int low, int high)
+{
+  if(low == high) {
+return low;
+  }
+
+  int mid = low + (high-low)/2;
+  int a = _Cilk_spawn sum(low, mid);
+  int b = sum(mid+1, high);
+
+  // Some very expensive computation here
+  int foo[64];
+  memset(foo, 0, 64*sizeof(int)); // <--- Fails
+
+  _Cilk_sync;
+
+  return a+b;
+}
+
+int main(void) {
+  if (sum(0, 100) != 5050)
+__builtin_abort ();
+  return 0;
+}


Re: Add more subreg offset helpers

2016-11-16 Thread Richard Sandiford
Eric Botcazou  writes:
>> Provide versions of subreg_lowpart_offset and subreg_highpart_offset
>> that work on mode sizes rather than modes.  Also provide a routine
>> that converts an lsb position to a subreg offset.
>
> The position is in bits but all the other parameters and the return
> value are in bytes so IMO this is a recipe for problems.  I'd also
> rename LSB into DIFF or LSB_OFFSET or something equivalent because LSB
> is not very explicit.

subreg_offset_from_lsb was supposed to be the inverse operation of
subreg_lsb, which also returns a bit number.  Should I change that
to return a byte number as well?

Thanks,
Richard


Re: [1/9][RFC][DWARF] Reserve three DW_OP numbers in vendor extension space

2016-11-16 Thread Jiong Wang

On 15/11/16 19:25, Richard Earnshaw (lists) wrote:

On 15/11/16 16:48, Jiong Wang wrote:

On 15/11/16 16:18, Jakub Jelinek wrote:

I know nothing about the aarch64 return address signing, would all 3
or say
2 usually appear together without any separate pc advance, or are they
all
going to appear frequently and at different pcs?


  I think it's the latter, the DW_OP_AARCH64_paciasp and
DW_OP_AARCH64_paciasp_deref are going to appear frequently and at
different pcs.
For example, the following function prologue, there are three
instructions
at 0x0, 0x4, 0x8.

  After the first instruction at 0x0, LR/X30 will be mangled.  The
"paciasp" always
mangle LR register using SP as salt and write back the value into LR.
We then generate
DW_OP_AARCH64_paciasp to notify any unwinder that the original LR is
mangled in this
way so they can unwind the original value properly.

  After the second instruction at 0x4, The mangled value of LR/X30 will
be pushed on
to stack, unlike usual .cfi_offset, the unwind rule for LR/X30 becomes:
first fetch the
mangled value from stack offset -16, then do whatever to restore the
original value
from the mangled value.  This is represented by
(DW_OP_AARCH64_paciasp_deref, offset).

.cfi_startproc
   0x0  paciasp (this instruction sign return address register LR/X30)
.cfi_val_expression 30, DW_OP_AARCH64_paciasp
   0x4  stp x29, x30, [sp, -32]!
.cfi_val_expression 30, DW_OP_AARCH64_paciasp_deref, -16
.cfi_offset 29, -32
.cfi_def_cfa_offset 32
   0x8  add x29, sp, 0



Now I'm confused.

I was thinking that we needed one opcode for the sign operation in the
prologue and one for the unsign/validate operation in the epilogue (to
support non-call exceptions.


  IMO, non-call exceptions is fine, it looks to me doesn't need extra
description as for non-call exceptions (exceptions thrown from signal
handler) the key point is how to unwind across signal frame.  For libgcc EH
unwinder, when normal unwinding failed, it will fall back to architecture
unwinding hook which restore some information from signal frame which is just
on top of the signal handler's frame.

  I can see AArch64 implementation will setup return address column like the
following logic where "sc->pc" is initialized by kernel and it's not signed
therefore should sucess on further unwinding.

fs->regs.reg[__LIBGCC_DWARF_ALT_FRAME_RETURN_COLUMN__].how =
  REG_SAVED_VAL_OFFSET;
fs->regs.reg[__LIBGCC_DWARF_ALT_FRAME_RETURN_COLUMN__].loc.offset =
  (_Unwind_Ptr) (sc->pc) - new_cfa;


But why do we need a separate code to say
that a previously signed value has now been pushed on the stack?  Surely
that's just a normal store operation that can be tracked through the
unwinding state machine.


  I was thinking the same thing, but found it doesn't work.  My understanding
of frame unwinding described at DWARF specification is: there are two steps
for frame unwinding.  The first step is to calculate register restore rules.
Unwinder scans register rules from function start to the unwinding PC, one
rule will be *overridden* by the next for the same register, there is *no
inheritance*.  The second step is then to evaluate all the final rules
collected at the unwinding PC.  According to the rule, either fetch the value
from stack or evaluate the value on DWARF expression stack etc.

 I also had tried to modify ".cfi_val_expression" at offset 0x4 in above
example into ".cfi_offset 30, -24", libgcc EH unwinder just doesn't work.



I was expecting the third opcode to be needed for the special operations
that are not frequently used by the compiler.


 The two operations DW_OP_AARCH64_paciasp and DW_OP_AARCH64_paciasp_deref were
designed as shortcut operations when LR is signed with A key and using
function's CFA as salt.  This is the default behaviour of return address
signing so is expected to be used for most of the time.  DW_OP_AARCH64_pauth
is designed as a generic operation that allow describing pointer signing on
any value using any salt and key in case we can't use the shortcut operations
we can use this.



Re: An alternative fix for PR70944

2016-11-16 Thread Richard Sandiford
Segher Boessenkool  writes:
> On Tue, Nov 15, 2016 at 12:33:06PM +, Richard Sandiford wrote:
>> The transformations made by make_compound_operation apply
>> only to scalar integer modes.  The fix for PR70944 had enforced
>> that by returning early for vector modes at the top of the
>> function.  However, the function is supposed to be recursive,
>> so we should continue to look at integer suboperands even if
>> the outer operation is a vector one.
>> 
>> This patch instead splits out the non-recursive parts
>> of make_compound_operation into a subroutine and checks
>> that the mode is a scalar integer before calling it.
>> The patch was originally written to help with the later
>> conversion to static type checking of mode classes, but it
>> also happened to reenable optimisation of things like
>> vec_duplicate operands.
>> 
>> Note that the gen_lowparts in the PLUS and MINUS cases
>> were redundant, since new_rtx already had mode "mode"
>> at those points.
>> 
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Yes, please do.  You can use maybe_swap_commutative_operands in
> change_zero_ext as well, perhaps in more places, do you want to
> take a look?

Ah, yeah.  change_zero_ext was the only one I could see.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
2016-11-15  Richard Sandiford  
Alan Hayward  
David Sherwood  

* combine.c (maybe_swap_commutative_operands): New function.
(combine_simplify_rtx): Use it.
(change_zero_ext): Likewise.
(make_compound_operation_int): New function, split out of...
(make_compound_operation): ...here.  Use
maybe_swap_commutative_operands for both.

diff --git a/gcc/combine.c b/gcc/combine.c
index 19ef52f..ca5ddae 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5479,6 +5479,21 @@ subst (rtx x, rtx from, rtx to, int in_dest, int 
in_cond, int unique_copy)
   return x;
 }
 
+/* If X is a commutative operation whose operands are not in the canonical
+   order, use substitutions to swap them.  */
+
+static void
+maybe_swap_commutative_operands (rtx x)
+{
+  if (COMMUTATIVE_ARITH_P (x)
+  && swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1)))
+{
+  rtx temp = XEXP (x, 0);
+  SUBST (XEXP (x, 0), XEXP (x, 1));
+  SUBST (XEXP (x, 1), temp);
+}
+}
+
 /* Simplify X, a piece of RTL.  We just operate on the expression at the
outer level; call `subst' to simplify recursively.  Return the new
expression.
@@ -5498,13 +5513,7 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int 
in_dest,
 
   /* If this is a commutative operation, put a constant last and a complex
  expression first.  We don't need to do this for comparisons here.  */
-  if (COMMUTATIVE_ARITH_P (x)
-  && swap_commutative_operands_p (XEXP (x, 0), XEXP (x, 1)))
-{
-  temp = XEXP (x, 0);
-  SUBST (XEXP (x, 0), XEXP (x, 1));
-  SUBST (XEXP (x, 1), temp);
-}
+  maybe_swap_commutative_operands (x);
 
   /* Try to fold this expression in case we have constants that weren't
  present before.  */
@@ -7744,55 +7753,38 @@ extract_left_shift (rtx x, int count)
   return 0;
 }
 
-/* Look at the expression rooted at X.  Look for expressions
-   equivalent to ZERO_EXTRACT, SIGN_EXTRACT, ZERO_EXTEND, SIGN_EXTEND.
-   Form these expressions.
-
-   Return the new rtx, usually just X.
+/* Subroutine of make_compound_operation.  *X_PTR is the rtx at the current
+   level of the expression and MODE is its mode.  IN_CODE is as for
+   make_compound_operation.  *NEXT_CODE_PTR is the value of IN_CODE
+   that should be used when recursing on operands of *X_PTR.
 
-   Also, for machines like the VAX that don't have logical shift insns,
-   try to convert logical to arithmetic shift operations in cases where
-   they are equivalent.  This undoes the canonicalizations to logical
-   shifts done elsewhere.
+   There are two possible actions:
 
-   We try, as much as possible, to re-use rtl expressions to save memory.
+   - Return null.  This tells the caller to recurse on *X_PTR with IN_CODE
+ equal to *NEXT_CODE_PTR, after which *X_PTR holds the final value.
 
-   IN_CODE says what kind of expression we are processing.  Normally, it is
-   SET.  In a memory address it is MEM.  When processing the arguments of
-   a comparison or a COMPARE against zero, it is COMPARE, or EQ if more
-   precisely it is an equality comparison against zero.  */
+   - Return a new rtx, which the caller returns directly.  */
 
-rtx
-make_compound_operation (rtx x, enum rtx_code in_code)
+static rtx
+make_compound_operation_int (machine_mode mode, rtx *x_ptr,
+enum rtx_code in_code,
+enum rtx_code *next_code_ptr)
 {
+  rtx x = *x_ptr;
+  enum rtx_code next_code = *next_code_ptr;
   enum 

Re: Add more subreg offset helpers

2016-11-16 Thread Eric Botcazou
> Provide versions of subreg_lowpart_offset and subreg_highpart_offset
> that work on mode sizes rather than modes.  Also provide a routine
> that converts an lsb position to a subreg offset.

The position is in bits but all the other parameters and the return value are 
in bytes so IMO this is a recipe for problems.  I'd also rename LSB into DIFF 
or LSB_OFFSET or something equivalent because LSB is not very explicit.

> 2016-11-15  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
>   * rtl.h (subreg_size_offset_from_lsb): Declare.
>   (subreg_offset_from_lsb): Likewise.
>   (subreg_size_lowpart_offset): Likewise.
>   (subreg_size_highpart_offset): Likewise.
>   * emit-rtl.c (subreg_size_lowpart_offset): New function.
>   (subreg_lowpart_offset): Use it.
>   (subreg_size_highpart_offset): New function.
>   (subreg_highpart_offset): Use it.
>   * rtlanal.c (subreg_size_offset_from_lsb): New function.
>   (subreg_offset_from_lsb): Likewise.

Please make the 3 wrappers inline functions.

-- 
Eric Botcazou


[PATCH] Fix PR78348

2016-11-16 Thread Richard Biener

This should fix a performance regression caused by recent loop 
distribution improvements.  It trivially uses dependence analysis
(in addition to the existing alias oracle query) to determine
if we can use memcpy instead of memmove (no attempt is made yet to
cover the case where the dependence distance is greater than
the copied region).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2016-11-16  Richard Biener  

PR tree-optimization/78348
* tree-loop-distribution.c (enum partition_kind): Add PKIND_MEMMOVE.
(generate_memcpy_builtin): Honor PKIND_MEMCPY on the partition.
(classify_partition): Set PKIND_MEMCPY if dependence analysis
revealed no dependency, PKIND_MEMMOVE otherwise.

* gcc.dg/tree-ssa/ldist-24.c: New testcase.

Index: gcc/tree-loop-distribution.c
===
--- gcc/tree-loop-distribution.c(revision 242408)
+++ gcc/tree-loop-distribution.c(working copy)
@@ -466,7 +466,7 @@ build_rdg (vec loop_nest, contro
 
 
 enum partition_kind {
-PKIND_NORMAL, PKIND_MEMSET, PKIND_MEMCPY
+PKIND_NORMAL, PKIND_MEMSET, PKIND_MEMCPY, PKIND_MEMMOVE
 };
 
 struct partition
@@ -875,10 +875,11 @@ generate_memcpy_builtin (struct loop *lo
   false, GSI_CONTINUE_LINKING);
   dest = build_addr_arg_loc (loc, partition->main_dr, nb_bytes);
   src = build_addr_arg_loc (loc, partition->secondary_dr, nb_bytes);
-  if (ptr_derefs_may_alias_p (dest, src))
-kind = BUILT_IN_MEMMOVE;
-  else
+  if (partition->kind == PKIND_MEMCPY
+  || ! ptr_derefs_may_alias_p (dest, src))
 kind = BUILT_IN_MEMCPY;
+  else
+kind = BUILT_IN_MEMMOVE;
 
   dest = force_gimple_operand_gsi (, dest, true, NULL_TREE,
   false, GSI_CONTINUE_LINKING);
@@ -970,6 +971,7 @@ generate_code_for_partition (struct loop
   break;
 
 case PKIND_MEMCPY:
+case PKIND_MEMMOVE:
   generate_memcpy_builtin (loop, partition);
   break;
 
@@ -1166,10 +1168,12 @@ classify_partition (loop_p loop, struct
  return;
}
}
+ partition->kind = PKIND_MEMMOVE;
}
+  else
+   partition->kind = PKIND_MEMCPY;
   free_dependence_relation (ddr);
   loops.release ();
-  partition->kind = PKIND_MEMCPY;
   partition->main_dr = single_store;
   partition->secondary_dr = single_load;
   partition->niter = nb_iter;
Index: gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/ldist-24.c(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-ldist-details" } */
+
+typedef struct S {
+double z[8][25];
+double x1[8][40];
+double x2[8][40];
+double y[8][35];
+} S;
+
+S * par;
+void foo ()
+{
+  int i, j;
+  for (i = 0; i<8; i++)
+for (j = 0; j<35; j++)
+  {
+   par->x1[i][j] = par->x2[i][j];
+   par->x2[i][j] = 0.0;
+  }
+}
+
+/* { dg-final { scan-tree-dump "generated memcpy" "ldist" } } */
+/* { dg-final { scan-tree-dump "generated memset zero" "ldist" } } */


[PATCH][ARM] PR target/78364: Add proper restrictions to zero and sign_extract patterns operands

2016-11-16 Thread Kyrill Tkachov

Hi all,

This patch fixes the arm build failure due to out of range ubfx operands. 
Combine now more aggressively generates zero_extracts
and it's up to the backend to reject invalid bit offsets and widths. And arm 
seems to suffer from the same problems as aarch64 and s390
did in PR 77822.

My ARMv7-A and ARMv7-R Architecture Reference Manual version C.c in section A8.8.246 
says that the bit offset () should be
in the range 0 to 31 whereas the width should be in the range 1 to 32 - . 
Same for SBFX.

This patch directly translates those restrictions into range checks on operands 
2 and 3 of the relevant patterns.

With this patch the arm build succeeds.
Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk in the interest of fixing the build.

Thanks,
Kyrill

2016-11-16  Kyrylo Tkachov  

PR target/78364
* config/arm/arm.md (*extv_reg): Restrict operands 2 and 3 to the
proper ranges for an SBFX instruction.
(extzv_t2): Likewise for UBFX.
commit b9ea5a6274834cbc469988040a807093156b52cf
Author: Kyrylo Tkachov 
Date:   Tue Nov 15 15:53:21 2016 +

[ARM] PR target/78364: Add proper restrictions to zero and sign_extract patterns operands

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ac3ef15..87b5ea6 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4561,9 +4561,11 @@ (define_insn "unaligned_storehi"
 (define_insn "*extv_reg"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(sign_extract:SI (match_operand:SI 1 "s_register_operand" "r")
- (match_operand:SI 2 "const_int_M_operand" "M")
- (match_operand:SI 3 "const_int_M_operand" "M")))]
-  "arm_arch_thumb2"
+			  (match_operand:SI 2 "const_int_operand" "n")
+			  (match_operand:SI 3 "const_int_operand" "n")))]
+  "arm_arch_thumb2
+   && IN_RANGE (INTVAL (operands[3]), 0, 31)
+   && IN_RANGE (INTVAL (operands[2]), 1, 32 - INTVAL (operands[3]))"
   "sbfx%?\t%0, %1, %3, %2"
   [(set_attr "length" "4")
(set_attr "predicable" "yes")
@@ -4574,9 +4576,11 @@ (define_insn "*extv_reg"
 (define_insn "extzv_t2"
   [(set (match_operand:SI 0 "s_register_operand" "=r")
 	(zero_extract:SI (match_operand:SI 1 "s_register_operand" "r")
- (match_operand:SI 2 "const_int_M_operand" "M")
- (match_operand:SI 3 "const_int_M_operand" "M")))]
-  "arm_arch_thumb2"
+			  (match_operand:SI 2 "const_int_operand" "n")
+			  (match_operand:SI 3 "const_int_operand" "n")))]
+  "arm_arch_thumb2
+   && IN_RANGE (INTVAL (operands[3]), 0, 31)
+   && IN_RANGE (INTVAL (operands[2]), 1, 32 - INTVAL (operands[3]))"
   "ubfx%?\t%0, %1, %3, %2"
   [(set_attr "length" "4")
(set_attr "predicable" "yes")


[committed] Fix pdp11 build

2016-11-16 Thread Richard Sandiford
Needed this to test the effect of the SVE patches on other targets.

Tested on pdp11 and committed as obvious.

Thanks,
Richard


gcc/
* config/pdp11/pdp11.c: Include dbxout.h.

diff --git a/gcc/config/pdp11/pdp11.c b/gcc/config/pdp11/pdp11.c
index 1dc3eef..8e2de14 100644
--- a/gcc/config/pdp11/pdp11.c
+++ b/gcc/config/pdp11/pdp11.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "calls.h"
 #include "expr.h"
 #include "builtins.h"
+#include "dbxout.h"
 
 /* This file should be included last.  */
 #include "target-def.h"



[PATCH] Fix PR78305

2016-11-16 Thread Richard Biener

I am testing the following to avoid undefined behavior when negating
a multiplication (basically extending a previous fix to properly handle
negative power of two).

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-11-16  Richard Biener  

PR middle-end/78305
* fold-const.c (negate_expr_p): Fix multiplication case.

* gcc.dg/torture/pr78305.c: New testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 242471)
+++ gcc/fold-const.c(working copy)
@@ -450,13 +450,15 @@ negate_expr_p (tree t)
   if (TYPE_UNSIGNED (type))
break;
   /* INT_MIN/n * n doesn't overflow while negating one operand it does
- if n is a power of two.  */
+ if n is a power of (minus) two.  */
   if (INTEGRAL_TYPE_P (TREE_TYPE (t))
  && ! TYPE_OVERFLOW_WRAPS (TREE_TYPE (t))
  && ! ((TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST
-&& ! integer_pow2p (TREE_OPERAND (t, 0)))
+&& (wi::popcount (TREE_OPERAND (t, 0))
+!= 1 + wi::neg_p (TREE_OPERAND (t, 0), SIGNED)))
|| (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST
-   && ! integer_pow2p (TREE_OPERAND (t, 1)
+   && (wi::popcount (TREE_OPERAND (t, 1))
+   != 1 + wi::neg_p (TREE_OPERAND (t, 1), SIGNED)
break;
 
   /* Fall through.  */
Index: gcc/testsuite/gcc.dg/torture/pr78305.c
===
--- gcc/testsuite/gcc.dg/torture/pr78305.c  (revision 0)
+++ gcc/testsuite/gcc.dg/torture/pr78305.c  (working copy)
@@ -0,0 +1,14 @@
+/* { dg-require-effective-target int32plus } */
+/* { dg-do run } */
+
+int main ()
+{
+  int a = 2;
+  int b = 1;
+
+  int t = -1 * ( -0x4000 * a / ( -0x2000 + b ) )  / -1;
+
+  if (t != 4) __builtin_abort();
+
+  return 0;
+}


[patch, nios2] Fix PR78357, adjust sync builtin initialization

2016-11-16 Thread Chung-Lin Tang
This patch adjusts the initialization of __sync built-in functions:
instead of conditionalizing on TARGET_LINUX_ABI, directly place the
target-hook #define in config/nios2/linux.h.  This appears to be in line
with other similar ports, e.g. m68k.

Sebastian, this should solve your issue of not wanting __sync_* libcalls
generated on RTEMS (which also uses TARGET_LINUX_ABI due to TLS support),
can you verify it works for you?

Chung-Lin

PR target/78357
* config/nios2/nios2.c (nios2_init_libfuncs): Remove TARGET_LINUX_ABI
condition.
(TARGET_INIT_LIBFUNCS): Delete definition and...
* config/nios2/linux.h (TARGET_INIT_LIBFUNCS): ...move to here, add
comments.
Index: config/nios2/nios2.c
===
--- config/nios2/nios2.c(revision 242468)
+++ config/nios2/nios2.c(working copy)
@@ -3607,9 +3607,7 @@ nios2_expand_builtin (tree exp, rtx target, rtx su
 static void
 nios2_init_libfuncs (void)
 {
-  /* For Linux, we have access to kernel support for atomic operations.  */
-  if (TARGET_LINUX_ABI)
-init_sync_libfuncs (UNITS_PER_WORD);
+  init_sync_libfuncs (UNITS_PER_WORD);
 }
 
 
@@ -4986,9 +4984,6 @@ nios2_adjust_reg_alloc_order (void)
 #undef TARGET_BUILTIN_DECL
 #define TARGET_BUILTIN_DECL nios2_builtin_decl
 
-#undef TARGET_INIT_LIBFUNCS
-#define TARGET_INIT_LIBFUNCS nios2_init_libfuncs
-
 #undef TARGET_FUNCTION_OK_FOR_SIBCALL
 #define TARGET_FUNCTION_OK_FOR_SIBCALL hook_bool_tree_tree_true
 
Index: config/nios2/linux.h
===
--- config/nios2/linux.h(revision 242468)
+++ config/nios2/linux.h(working copy)
@@ -44,3 +44,7 @@
Nios II Processor Reference Handbook.  */
 #define TARGET_LINUX_ABI 1
 
+/* For Linux, we have access to kernel support for atomic operations,
+   add initialization for __sync_* builtins.  */
+#undef TARGET_INIT_LIBFUNCS
+#define TARGET_INIT_LIBFUNCS nios2_init_libfuncs


[ARC] Fix missing brackets in arc.c

2016-11-16 Thread Richard Sandiford
The old code still built thanks to the brackets in the definition
of XVECEXP.

Tested on arc-elf and committed as obvious.

Thanks,
Richard


gcc/
* config/arc/arc.c (arc_loop_hazard): Add missing brackets.

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index 3bce7ef..98c7298 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -7812,7 +7812,7 @@ arc_loop_hazard (rtx_insn *pred, rtx_insn *succ)
 jump = pred;
   else if (GET_CODE (PATTERN (pred)) == SEQUENCE
   && JUMP_P (XVECEXP (PATTERN (pred), 0, 0)))
-jump = as_a  XVECEXP (PATTERN (pred), 0, 0);
+jump = as_a  (XVECEXP (PATTERN (pred), 0, 0));
   else
 return false;
 



[PATCH] [ARC] Add support for QuarkSE processor.

2016-11-16 Thread Claudiu Zissulescu
Updated QuarkSE patch.

Ok to apply?
Claudiu

gcc/
2016-05-25  Claudiu Zissulescu  

* config/arc/arc-arches.def: Add FPX quarkse instruction as valid
for arcem.
* config/arc/arc-c.def (__ARC_FPX_QUARK__): Define.
* config/arc/arc-cpus.def (quarkse_em): Add.
* config/arc/arc-options.def (FL_FPX_QUARK, FL_QUARK): Likewise.
* config/arc/arc-opts.h (FPX_QK): Define.
* config/arc/arc-tables.opt: Regenerate.
* config/arc/arc.c (gen_compare_reg): Change.
(arc_register_move_cost): Avoid Dy,Dx moves.
* config/arc/arc.h (TARGET_HARD_FLOAT): Change.
(TARGET_FPX_QUARK, TARGET_FP_ASSIST): Define.
* config/arc/arc.md (divsf3, sqrtsf2, fix_truncsfsi2, floatsisf2):
New expands.
* config/arc/fpu.md (divsf3_fpu, sqrtsf2_fpu, floatsisf2_fpu)
(fix_truncsfsi2_fpu): Rename.
* config/arc/fpx.md (cmp_quark, cmpsf_quark_, cmpsf_quark_ord)
(cmpsf_quark_uneq, cmpsf_quark_eq, divsf3_quark, sqrtsf2_quark)
(fix_truncsfsi2_quark, floatsisf2_quark): New patterns.
* config/arc/t-multilib: Regenerate.
---
 gcc/config/arc/arc-arches.def  |  2 +-
 gcc/config/arc/arc-c.def   |  1 +
 gcc/config/arc/arc-cpus.def|  1 +
 gcc/config/arc/arc-options.def |  2 +
 gcc/config/arc/arc-opts.h  |  2 +
 gcc/config/arc/arc-tables.opt  |  3 ++
 gcc/config/arc/arc.c   | 22 +-
 gcc/config/arc/arc.h   | 12 +++--
 gcc/config/arc/arc.md  | 46 
 gcc/config/arc/fpu.md  |  6 +--
 gcc/config/arc/fpx.md  | 99 ++
 gcc/config/arc/t-multilib  |  4 +-
 12 files changed, 186 insertions(+), 14 deletions(-)

diff --git a/gcc/config/arc/arc-arches.def b/gcc/config/arc/arc-arches.def
index f24babb..5fd45cd 100644
--- a/gcc/config/arc/arc-arches.def
+++ b/gcc/config/arc/arc-arches.def
@@ -40,7 +40,7 @@
 
 ARC_ARCH ("arcem", em, FL_MPYOPT_1_6 | FL_DIVREM | FL_CD | FL_NORM \
  | FL_BS | FL_SWAP | FL_FPUS | FL_SPFP | FL_DPFP   \
- | FL_SIMD | FL_FPUDA, 0)
+ | FL_SIMD | FL_FPUDA | FL_QUARK, 0)
 ARC_ARCH ("archs", hs, FL_MPYOPT_7_9 | FL_DIVREM | FL_NORM | FL_CD \
  | FL_ATOMIC | FL_LL64 | FL_BS | FL_SWAP   \
  | FL_FPUS | FL_FPUD,  \
diff --git a/gcc/config/arc/arc-c.def b/gcc/config/arc/arc-c.def
index 4cfd7b6..fd64376 100644
--- a/gcc/config/arc/arc-c.def
+++ b/gcc/config/arc/arc-c.def
@@ -58,6 +58,7 @@ ARC_C_DEF ("__ARC_FPU_DP_DIV__", TARGET_FP_DP_SQRT)
 ARC_C_DEF ("__ARC_FPU_SP_FMA__", TARGET_FP_SP_FUSED)
 ARC_C_DEF ("__ARC_FPU_DP_FMA__", TARGET_FP_DP_FUSED)
 ARC_C_DEF ("__ARC_FPU_ASSIST__", TARGET_FP_DP_AX)
+ARC_C_DEF ("__ARC_FPX_QUARK__",  TARGET_FPX_QUARK)
 
 /* To be deprecated.  */
 ARC_C_DEF ("__A6__", TARGET_ARC600)
diff --git a/gcc/config/arc/arc-cpus.def b/gcc/config/arc/arc-cpus.def
index 0ceb734..7200485 100644
--- a/gcc/config/arc/arc-cpus.def
+++ b/gcc/config/arc/arc-cpus.def
@@ -51,6 +51,7 @@ ARC_CPU (em4, em, FL_CD, NONE)
 ARC_CPU (em4_dmips, em, FL_MPYOPT_2|FL_CD|FL_DIVREM|FL_NORM|FL_SWAP|FL_BS, 
NONE)
 ARC_CPU (em4_fpus,  em, 
FL_MPYOPT_2|FL_CD|FL_DIVREM|FL_NORM|FL_SWAP|FL_BS|FL_FPU_FPUS, NONE)
 ARC_CPU (em4_fpuda, em, 
FL_MPYOPT_2|FL_CD|FL_DIVREM|FL_NORM|FL_SWAP|FL_BS|FL_FPU_FPUDA, NONE)
+ARC_CPU (quarkse_em, em, 
FL_MPYOPT_3|FL_CD|FL_DIVREM|FL_NORM|FL_SWAP|FL_BS|FL_FPX_QUARK|FL_SPFP|FL_DPFP, 
NONE)
 
 ARC_CPU (hs,hs, 0, NONE)
 ARC_CPU (archs, hs, FL_MPYOPT_2|FL_DIVREM|FL_LL64, NONE)
diff --git a/gcc/config/arc/arc-options.def b/gcc/config/arc/arc-options.def
index 0f9d36c..a16637e 100644
--- a/gcc/config/arc/arc-options.def
+++ b/gcc/config/arc/arc-options.def
@@ -99,10 +99,12 @@ ARC_OPTX (FL_FPU_FPUD,  (1ULL << 34), 
arc_fpu_build, FPU_FPUD,  "mfpu=fpud")
 ARC_OPTX (FL_FPU_FPUD_DIV,  (1ULL << 35), arc_fpu_build, FPU_FPUD_DIV, 
"mfpu=fpud_div")
 ARC_OPTX (FL_FPU_FPUD_FMA,  (1ULL << 36), arc_fpu_build, FPU_FPUD_FMA, 
"mfpu=fpud_fma")
 ARC_OPTX (FL_FPU_FPUD_ALL,  (1ULL << 37), arc_fpu_build, FPU_FPUD_ALL, 
"mfpu=fpud_all")
+ARC_OPTX (FL_FPX_QUARK,(1ULL << 38), arc_fpu_build, FPX_QK,
"quarkse fp")
 
 ARC_OPT (FL_FPUS,  (0xFULL << 26), 0, "single precission floating point")
 ARC_OPT (FL_FPUDA, (0xFFULL << 26), 0, "double precission fp assist")
 ARC_OPT (FL_FPUD,  (0xF0FULL << 26), 0, "double precission floating point")
+ARC_OPT (FL_QUARK, (1ULL << 38), 0, "Quark SE fp extension")
 
 /* Local Variables: */
 /* mode: c */
diff --git a/gcc/config/arc/arc-opts.h b/gcc/config/arc/arc-opts.h
index e5bca84..2fd3c34 100644
--- a/gcc/config/arc/arc-opts.h
+++ b/gcc/config/arc/arc-opts.h
@@ -48,6 +48,8 @@ enum processor_type
 #define FPU_DD0x0080
 /* Double precision floating point assist operations.  */
 #define FPX_DP0x0100
+/* Quark SE floating point instructions.  */
+#define FPX_QK

RE: [PATCH 1/4] MIPS16/GCC: Fix DImode `casesi_internal_mips16_' assembly instructions

2016-11-16 Thread Matthew Fortune
Maciej Rozycki  writes:
>   gcc/
>   config/mips/mips.md (casesi_internal_mips16_): Add missing
>instruction prefixes throughout.  Correct formatting.
> 
>   gcc/testsuite/
>   * gcc.target/mips/code-readable-4.c (dg-final): Expect `dla'
>   rather than `la'.
> ---
>  OK to apply?

OK. I have no idea what system supports 64-bit MIPS16 but given it costs
little to improve consistency here then it is at least doing no harm.

Thanks,
Matthew


RE: [PATCHv2 0/7] ARC: Add support for nps400 variant

2016-11-16 Thread Claudiu Zissulescu
Approved and committed: Committed r24248

//Claudiu

> -Original Message-
> From: Andrew Burgess [mailto:andrew.burg...@embecosm.com]
> Sent: Saturday, April 30, 2016 12:17 AM
> To: Claudiu Zissulescu ; Joern Wolfgang
> Rennecke 
> Cc: gcc-patches@gcc.gnu.org; noa...@mellanox.com
> Subject: Re: [PATCHv2 0/7] ARC: Add support for nps400 variant
> 
> * Claudiu Zissulescu  [2016-04-29
> 09:03:53 +]:
> 
> > I see the next tests failing:
> >
> > FAIL: gcc.target/arc/movb-1.c scan-assembler movb[ \t]+r[0-5]+, *r[0-5]+,
> *r[0-5]+, *19, *21, *8
> > FAIL: gcc.target/arc/movb-2.c scan-assembler movb[ \t]+r[0-5]+, *r[0-5]+,
> *r[0-5]+, *23, *23, *9
> > FAIL: gcc.target/arc/movb-5.c scan-assembler movb[ \t]+r[0-5]+, *r[0-5]+,
> *r[0-5]+, *23, *(23|7), *9
> > FAIL: gcc.target/arc/movh_cl-1.c scan-assembler movh.cl r[0-
> 9]+,0xc000>>16
> 
> Claudiu, Joern,
> 
> I believe that the patch below should resolve the issues that you're
> seeing for little endian arc tests.
> 
> It's mostly just updating the expected results, though one test needed
> improving for l/e arc.
> 
> In the final test the layout used for bitfields within a struct on
> little endian arc just happened to result in a movb (move bits) not
> being generated when it could / should have been.  I've added a new
> peephole2 case to catch this.
> 
> Thanks,
> Andrew
> 
> ---
> 
> gcc/arc: New peephole2 and little endian arc test fixes
> 
> Resolve some test failures introduced for little endian arc as a result
> of the recent arc/nps400 additions.
> 
> There's a new peephole2 optimisation to merge together two zero_extracts
> in order that the movb instruction can be used.
> 
> One of the test cases is extended so that the test does something
> meaningful in both big and little endian arc mode.
> 
> Other tests have their expected results updated to reflect improvements
> in other areas of GCC.
> 
> gcc/ChangeLog:
> 
>   * config/arc/arc.md (movb peephole2): New peephole2 to merge
> two
>   zero_extract operations to allow a movb to occur.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arc/movb-1.c: Update little endian arc results.
>   * gcc.target/arc/movb-2.c: Likewise.
>   * gcc.target/arc/movb-5.c: Likewise.
>   * gcc.target/arc/movh_cl-1.c: Extend test to cover little endian
>   arc.
> ---
>  gcc/ChangeLog|  5 +
>  gcc/config/arc/arc.md| 14 ++
>  gcc/testsuite/ChangeLog  |  8 
>  gcc/testsuite/gcc.target/arc/movb-1.c|  2 +-
>  gcc/testsuite/gcc.target/arc/movb-2.c|  2 +-
>  gcc/testsuite/gcc.target/arc/movb-5.c|  2 +-
>  gcc/testsuite/gcc.target/arc/movh_cl-1.c | 11 +++
>  7 files changed, 41 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index c61107f..0b92594 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -6144,6 +6144,20 @@
>  (zero_extract:SI (match_dup 1) (match_dup 5) (match_dup
> 7)))])
> (match_dup 1)])
> 
> +(define_peephole2
> +  [(set (match_operand:SI 0 "register_operand" "")
> +(zero_extract:SI (match_dup 0)
> +  (match_operand:SI 1 "const_int_operand" "")
> +  (match_operand:SI 2 "const_int_operand" "")))
> +   (set (zero_extract:SI (match_operand:SI 3 "register_operand" "")
> +  (match_dup 1)
> + (match_dup 2))
> + (match_dup 0))]
> +  "TARGET_NPS_BITOPS
> +   && !reg_overlap_mentioned_p (operands[0], operands[3])"
> +  [(set (zero_extract:SI (match_dup 3) (match_dup 1) (match_dup 2))
> +(zero_extract:SI (match_dup 0) (match_dup 1) (match_dup 2)))])
> +
>  ;; include the arc-FPX instructions
>  (include "fpx.md")
> 
> diff --git a/gcc/testsuite/gcc.target/arc/movb-1.c
> b/gcc/testsuite/gcc.target/arc/movb-1.c
> index 65d4ba4..94d9f5f 100644
> --- a/gcc/testsuite/gcc.target/arc/movb-1.c
> +++ b/gcc/testsuite/gcc.target/arc/movb-1.c
> @@ -10,4 +10,4 @@ f (void)
>bar.b = foo.b;
>  }
>  /* { dg-final { scan-assembler "movb\[ \t\]+r\[0-5\]+, *r\[0-5\]+, 
> *r\[0-5\]+,
> *5, *3, *8" { target arceb-*-* } } } */
> -/* { dg-final { scan-assembler "movb\[ \t\]+r\[0-5\]+, *r\[0-5\]+, 
> *r\[0-5\]+,
> *19, *21, *8" { target arc-*-* } } } */
> +/* { dg-final { scan-assembler "movb\[ \t\]+r\[0-5\]+, *r\[0-5\]+, 
> *r\[0-5\]+,
> *3, *5, *8" { target arc-*-* } } } */
> diff --git a/gcc/testsuite/gcc.target/arc/movb-2.c
> b/gcc/testsuite/gcc.target/arc/movb-2.c
> index 1ba9976..708f393 100644
> --- a/gcc/testsuite/gcc.target/arc/movb-2.c
> +++ b/gcc/testsuite/gcc.target/arc/movb-2.c
> @@ -9,5 +9,5 @@ f (void)
>  {
>bar.b = foo.b;
>  }
> -/* { dg-final { scan-assembler "movb\[ \t\]+r\[0-5\]+, *r\[0-5\]+, 
> *r\[0-5\]+,
> *23, *23, *9" { target arc-*-* } } } */
> +/* { dg-final { scan-assembler "movb\[ \t\]+r\[0-5\]+, 

Re: Rework subreg_get_info

2016-11-16 Thread Eric Botcazou
> This isn't intended to change the behaviour, just rewrite the
> existing logic in a different (and hopefully clearer) way.

Yes, I agree that it's an improvement.  A few remarks below.

> diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> index ca6cced..7c0acf5 100644
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -3601,31 +3601,28 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, unsigned int offset, machine_mode ymode,
>struct subreg_info *info)
>  {
> -  int nregs_xmode, nregs_ymode;
> -  int mode_multiple, nregs_multiple;
> -  int offset_adj, y_offset, y_offset_adj;
> -  int regsize_xmode, regsize_ymode;
> -  bool rknown;
> +  unsigned int nregs_xmode, nregs_ymode;
> 
>gcc_assert (xregno < FIRST_PSEUDO_REGISTER);
> 
> -  rknown = false;
> +  unsigned int xsize = GET_MODE_SIZE (xmode);
> +  unsigned int ysize = GET_MODE_SIZE (ymode);
> +  bool rknown = false;
> 
>/* If there are holes in a non-scalar mode in registers, we expect
> - that it is made up of its units concatenated together.  */
> + that it is made up of its units concatenated together.  Each scalar
> + unit occupies at least one register.  */

Why using "scalar unit" while the first sentence uses "unit"?  Are they 
different units?  What's the status of the new sentence?  Assertion?  
Expectation?  I'd also make the first sentence grammatical.

>if (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode))
>  {
> -  machine_mode xmode_unit;
> -
>nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
> -  xmode_unit = GET_MODE_INNER (xmode);
> +  unsigned int nunits = GET_MODE_NUNITS (xmode);
> +  machine_mode xmode_unit = GET_MODE_INNER (xmode);
>gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
>gcc_assert (nregs_xmode
> -   == (GET_MODE_NUNITS (xmode)
> +   == (nunits
> * HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode_unit)));
>gcc_assert (hard_regno_nregs[xregno][xmode]
> -   == (hard_regno_nregs[xregno][xmode_unit]
> -   * GET_MODE_NUNITS (xmode)));
> +   == hard_regno_nregs[xregno][xmode_unit] * nunits);
> 
>/* You can only ask for a SUBREG of a value with holes in the middle
>if you don't cross the holes.  (Such a SUBREG should be done by
> @@ -3635,11 +3632,9 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, 3 for each part, but in memory it's two 128-bit parts.
>Padding is assumed to be at the end (not necessarily the 'high part')
>of each unit.  */
> -  if ((offset / GET_MODE_SIZE (xmode_unit) + 1
> -< GET_MODE_NUNITS (xmode))
> +  if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < nunits)
> && (offset / GET_MODE_SIZE (xmode_unit)
> -   != ((offset + GET_MODE_SIZE (ymode) - 1)
> -   / GET_MODE_SIZE (xmode_unit
> +   != ((offset + ysize - 1) / GET_MODE_SIZE (xmode_unit
>   {
> info->representable_p = false;
> rknown = true;

This part is OK.

> @@ -3651,18 +3646,17 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, nregs_ymode = hard_regno_nregs[xregno][ymode];
> 
>/* Paradoxical subregs are otherwise valid.  */
> -  if (!rknown
> -  && offset == 0
> -  && GET_MODE_PRECISION (ymode) > GET_MODE_PRECISION (xmode))
> +  if (!rknown && offset == 0 && ysize > xsize)
>  {
>info->representable_p = true;
>/* If this is a big endian paradoxical subreg, which uses more
>actual hard registers than the original register, we must
>return a negative offset so that we find the proper highpart
>of the register.  */
> -  if (GET_MODE_SIZE (ymode) > UNITS_PER_WORD
> -   ? REG_WORDS_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> - info->offset = nregs_xmode - nregs_ymode;
> +  if (REG_WORDS_BIG_ENDIAN != WORDS_BIG_ENDIAN && ysize >
> UNITS_PER_WORD +? REG_WORDS_BIG_ENDIAN
> +   : byte_lowpart_offset (ymode, xmode) != 0)
> + info->offset = (int) nregs_xmode - (int) nregs_ymode;
>else
>   info->offset = 0;
>info->nregs = nregs_ymode;

This part is not OK, it turns straightforward code into convoluted code.

> @@ -3673,31 +3667,23 @@ subreg_get_info (unsigned int xregno, machine_mode
> xmode, modes, we cannot generally form this subreg.  */
>if (!HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode)
>&& !HARD_REGNO_NREGS_HAS_PADDING (xregno, ymode)
> -  && (GET_MODE_SIZE (xmode) % nregs_xmode) == 0
> -  && (GET_MODE_SIZE (ymode) % nregs_ymode) == 0)
> +  && (xsize % nregs_xmode) == 0
> +  && (ysize % nregs_ymode) == 0)
>  {
> -  regsize_xmode = GET_MODE_SIZE (xmode) / nregs_xmode;
> -  regsize_ymode = GET_MODE_SIZE (ymode) / nregs_ymode;
> -  if (!rknown && regsize_xmode > regsize_ymode && nregs_ymode > 1)
> - {
> -   info->representable_p = false;
> -   info->nregs
> - = (GET_MODE_SIZE 

[PATCH 1/4] [ARC] Various fixes.

2016-11-16 Thread Claudiu Zissulescu
The ifconversion was failing because a move involving the lp_count was
not match by movsi_ne.  This patch updates the constraints such that
movsi_ne will match.  The failing test is dg-torture.exp=pr68955.c for
archs and without small data.

gcc/
2016-07-11  Claudiu Zissulescu  

* config/arc/arc.h (REG_CLASS_NAMES): Reorder.
* config/arc/arc.md (*add_n): Change.
(*sub_n): Likewise.
(movsi_ne): Update constraint to Rac.
---
 gcc/config/arc/arc.h  |  2 +-
 gcc/config/arc/arc.md | 18 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index cd8b9f1..642bf73 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -617,10 +617,10 @@ enum reg_class
   "MPY_WRITABLE_CORE_REGS",   \
   "WRITABLE_CORE_REGS",   \
   "CHEAP_CORE_REGS", \
+  "ALL_CORE_REGS",   \
   "R0R3_CD_REGS", \
   "R0R1_CD_REGS", \
   "AC16_H_REGS",   \
-  "ALL_CORE_REGS",   \
   "ALL_REGS" \
 }
 
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 5a7699f..217935c 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -2897,10 +2897,10 @@
(set (match_dup 3) (match_dup 4))])
 
 (define_insn "*add_n"
-  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcqq,Rcw,W,W,w,w")
-   (plus:SI (mult:SI (match_operand:SI 1 "register_operand" 
"Rcqq,c,c,c,c,c")
- (match_operand:SI 2 "_2_4_8_operand" ""))
-(match_operand:SI 3 "nonmemory_operand" 
"0,0,c,?Cal,?c,??Cal")))]
+  [(set (match_operand:SI 0 "dest_reg_operand"  "=Rcqq,Rcw,W,  
W,w,w")
+   (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "Rcqq,  c,c,  
c,c,c")
+ (match_operand:SI 2 "_2_4_8_operand"   ""))
+(match_operand:SI 3 "nonmemory_operand""0,  
0,c,Cal,c,Cal")))]
   ""
   "add%z2%? %0,%3,%1%&"
   [(set_attr "type" "shift")
@@ -2912,9 +2912,9 @@
 ;; N.B. sub[123] has the operands of the MINUS in the opposite order from
 ;; what synth_mult likes.
 (define_insn "*sub_n"
-  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcw,w,w")
-   (minus:SI (match_operand:SI 1 "nonmemory_operand" "0,c,?Cal")
- (mult:SI (match_operand:SI 2 "register_operand" "c,c,c")
+  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcw,w,w")
+   (minus:SI (match_operand:SI 1 "nonmemory_operand" "0,c,?Cal")
+ (mult:SI (match_operand:SI 2 "register_operand" "w,w,w")
   (match_operand:SI 3 "_2_4_8_operand" ""]
   ""
   "sub%z3%? %0,%1,%2"
@@ -3570,8 +3570,8 @@
 (define_insn "*movsi_ne"
   [(cond_exec
  (ne (match_operand:CC_Z 2 "cc_use_register""Rcc,  Rcc,  Rcc,Rcc,Rcc") 
(const_int 0))
- (set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,Rcq#q,Rcq#q,  w,w")
- (match_operand:SI 1 "nonmemory_operand"   "C_0,h, ?Cal, 
Lc,?Cal")))]
+ (set (match_operand:SI 0 "dest_reg_operand" "=Rcq#q,Rcq#q,Rcq#q,   w,w")
+ (match_operand:SI 1 "nonmemory_operand"   "C_0,h, 
?Cal,LRac,?Cal")))]
   ""
   "@
* current_insn_predicate = 0; return \"sub%?.ne %0,%0,%0%&\";
-- 
1.9.1



[RFC][PATCH] Speed-up use-after-scope (re-writing to SSA)

2016-11-16 Thread Martin Liška
Hello

Following patch is a candidate that re-writes VAR_DECLs that are
is_gimple_reg_type with:
my_char_25 = ASAN_POISON ();

that is eventually transformed to:
__builtin___asan_report_use_after_scope_noabort ("my_char", 1);

at places where my_char_25 is used. That introduces a new entry point
to ASAN runtime, reporting:

==18378==ERROR: AddressSanitizer: stack-use-after-scope at pc 0x004007b4 bp 
0x0001 sp 0x00400603
ACCESS of size 1 for variable 'my_char' thread T0
#0 0x400602 in main (/tmp/a.out+0x400602)
#1 0x7fa6e572d290 in __libc_start_main (/lib64/libc.so.6+0x20290)
#2 0x400669 in _start (/tmp/a.out+0x400669)

SUMMARY: AddressSanitizer: stack-use-after-scope (/tmp/a.out+0x400602) in main

I'm still not sure where exactly do the expansion of ASAN_POISON as some cleanup
after the transformation would be desired.

Thoughts?
Thanks,
Martin 




>From c115207230a5be979119b6ac6572ae6af2a0ccd7 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 14 Nov 2016 16:49:05 +0100
Subject: [PATCH] use-after-scope: introduce ASAN_POISON internal fn

---
 gcc/asan.c   | 72 +++-
 gcc/asan.h   |  1 +
 gcc/internal-fn.c|  7 
 gcc/internal-fn.def  |  1 +
 gcc/sanitizer.def|  8 +
 gcc/sanopt.c |  9 +
 gcc/tree-ssa.c   | 65 ++--
 libsanitizer/asan/asan_errors.cc | 21 
 libsanitizer/asan/asan_errors.h  | 19 +++
 libsanitizer/asan/asan_report.cc | 10 ++
 libsanitizer/asan/asan_report.h  |  3 ++
 libsanitizer/asan/asan_rtl.cc| 16 +
 12 files changed, 221 insertions(+), 11 deletions(-)

diff --git a/gcc/asan.c b/gcc/asan.c
index 6e93ea3..d7d4267 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -32,8 +32,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "memmodel.h"
 #include "tm_p.h"
+#include "ssa.h"
 #include "stringpool.h"
-#include "tree-vrp.h"
 #include "tree-ssanames.h"
 #include "optabs.h"
 #include "emit-rtl.h"
@@ -2979,6 +2979,76 @@ asan_expand_check_ifn (gimple_stmt_iterator *iter, bool use_calls)
   return true;
 }
 
+
+/* Expand the ASAN_{LOAD,STORE} builtins.  */
+
+bool
+asan_expand_poison_ifn (gimple_stmt_iterator *iter,
+			bool *need_commit_edge_insert)
+{
+  gimple *g = gsi_stmt (*iter);
+  tree poisoned_var = gimple_call_lhs (g);
+  if (!poisoned_var)
+{
+  gsi_remove (iter, true);
+  return true;
+}
+
+  tree var_decl = SSA_NAME_VAR (poisoned_var);
+
+  bool recover_p;
+  if (flag_sanitize & SANITIZE_USER_ADDRESS)
+recover_p = (flag_sanitize_recover & SANITIZE_USER_ADDRESS) != 0;
+  else
+recover_p = (flag_sanitize_recover & SANITIZE_KERNEL_ADDRESS) != 0;
+
+  use_operand_p use_p;
+  imm_use_iterator imm_iter;
+  FOR_EACH_IMM_USE_FAST (use_p, imm_iter, poisoned_var)
+{
+  gimple *use = USE_STMT (use_p);
+
+  built_in_function b = (recover_p
+			 ? BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE_NOABORT
+			 : BUILT_IN_ASAN_REPORT_USE_AFTER_SCOPE);
+  tree fun = builtin_decl_implicit (b);
+  pretty_printer pp;
+  pp_tree_identifier (, DECL_NAME (var_decl));
+
+  gcall *call = gimple_build_call (fun, 2, asan_pp_string (),
+   DECL_SIZE_UNIT (var_decl));
+  gimple_set_location (call, gimple_location (g));
+
+  /* If ASAN_POISON is used in a PHI node, let's insert the call on
+	 the leading to the PHI node BB.  */
+  if (is_a  (use))
+	{
+	  gphi * phi = dyn_cast (use);
+	  for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+	if (gimple_phi_arg_def (phi, i) == poisoned_var)
+	  {
+		edge e = gimple_phi_arg_edge (phi, i);
+		gsi_insert_seq_on_edge (e, call);
+		*need_commit_edge_insert = true;
+
+		break;
+	  }
+	}
+  else
+	{
+	  gimple_stmt_iterator gsi = gsi_for_stmt (use);
+	  gsi_insert_before (, call, GSI_NEW_STMT);
+	}
+}
+
+  gimple *nop = gimple_build_nop ();
+  SSA_NAME_IS_DEFAULT_DEF (poisoned_var) = true;
+  SSA_NAME_DEF_STMT (poisoned_var) = nop;
+  gsi_replace (iter, nop, GSI_NEW_STMT);
+
+  return false;
+}
+
 /* Instrument the current function.  */
 
 static unsigned int
diff --git a/gcc/asan.h b/gcc/asan.h
index 9cf5904..6c25955 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -30,6 +30,7 @@ extern void initialize_sanitizer_builtins (void);
 extern tree asan_dynamic_init_call (bool);
 extern bool asan_expand_check_ifn (gimple_stmt_iterator *, bool);
 extern bool asan_expand_mark_ifn (gimple_stmt_iterator *);
+extern bool asan_expand_poison_ifn (gimple_stmt_iterator *, bool *);
 
 extern gimple_stmt_iterator create_cond_insert_point
  (gimple_stmt_iterator *, bool, bool, bool, basic_block *, basic_block *);
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index ca347c5..17624e8 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -246,6 +246,13 @@ expand_ASAN_MARK (internal_fn, gcall *)
  

Re: An alternative fix for PR70944

2016-11-16 Thread Segher Boessenkool
On Wed, Nov 16, 2016 at 10:16:16AM +, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > On Tue, Nov 15, 2016 at 12:33:06PM +, Richard Sandiford wrote:
> >> The transformations made by make_compound_operation apply
> >> only to scalar integer modes.  The fix for PR70944 had enforced
> >> that by returning early for vector modes at the top of the
> >> function.  However, the function is supposed to be recursive,
> >> so we should continue to look at integer suboperands even if
> >> the outer operation is a vector one.
> >> 
> >> This patch instead splits out the non-recursive parts
> >> of make_compound_operation into a subroutine and checks
> >> that the mode is a scalar integer before calling it.
> >> The patch was originally written to help with the later
> >> conversion to static type checking of mode classes, but it
> >> also happened to reenable optimisation of things like
> >> vec_duplicate operands.
> >> 
> >> Note that the gen_lowparts in the PLUS and MINUS cases
> >> were redundant, since new_rtx already had mode "mode"
> >> at those points.
> >> 
> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >
> > Yes, please do.  You can use maybe_swap_commutative_operands in
> > change_zero_ext as well, perhaps in more places, do you want to
> > take a look?
> 
> Ah, yeah.  change_zero_ext was the only one I could see.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Yes, still okay.  Thanks!


Segher


RE: [PATCH] microMIPS/GCC: Fix PIC call relaxation

2016-11-16 Thread Matthew Fortune
Maciej Rozycki  writes:
> Fix `-mrelax-pic-calls' support for microMIPS code where the relocation
> produced is supposed to be R_MICROMIPS_JALR rather than R_MIPS_JALR.
> The lack of short delay support comes from a missed update to this code
> for microMIPS support and can be relieved as JALRS and JRS instructions
> can be relaxed to BALS and B instructions respectively, so do that as
> well.

Thanks. I didn't follow the background to this optimisation when I added
the compact branch support so opted to retain the pre-existing behaviour.

> By doing so complement commit r196828 ("microMIPS gcc support"),
> , which is the
> original change that introduced microMIPS support, in particular to
> MIPS_CALL, which is where this code previously resided.
> 
> Adjust the test suite accordingly, limiting R_MICROMIPS_JALR cases to

Typo fwiw: Should say R_MIPS_JALR for MIPS code.

> regular MIPS code only, and adding corresponding R_MICROMIPS_JALR cases
> for microMIPS code.
> 
>   gcc/
>   * config/mips/mips.c (mips_output_jump): Output R_MICROMIPS_JALR
>   rather than R_MIPS_JALR relocation in microMIPS code.  Do not
>   cancel short delay slots in PIC call relaxation.
> 
>   gcc/testsuite/
>   * gcc.target/mips/call-1.c (dg-options): Add `-mno-micromips'.
>   (dg-final): Remove microMIPS JALRS mnemonic matching.
>   * gcc.target/mips/call-2.c (dg-options): Add `-mno-micromips'.
>   (dg-final): Remove microMIPS JALRS mnemonic matching.
>   * gcc.target/mips/call-3.c (dg-options): Add `-mno-micromips'.
>   (dg-final): Remove microMIPS JALRS mnemonic matching.
>   * gcc.target/mips/call-4.c (dg-options): Add `-mno-micromips'.
>   * gcc.target/mips/call-5.c (dg-options): Add `-mno-micromips'.
>   * gcc.target/mips/call-6.c (dg-options): Add `-mno-micromips'.
>   * gcc.target/mips/call-1u.c: New test case.
>   * gcc.target/mips/call-2u.c: New test case.
>   * gcc.target/mips/call-3u.c: New test case.
>   * gcc.target/mips/call-4u.c: New test case.
>   * gcc.target/mips/call-5u.c: New test case.
>   * gcc.target/mips/call-6u.c: New test case.
> ---
>  NB the use of this feature for microMIPS is limited because short
> encodings of register jump instructions usually do not have their branch
> counterparts and long encodings typically are not used.  However at
> least
> tail calls can be converted if the jump target is in range, as can calls
> in `-minsn32' code.  Perhaps we could switch to producing `j[al]r[s].32'
> in the `-mrelax-pic-calls' mode like GAS does with the `jal' and `j'
> microMIPS macros in PIC code.

Does the linker do anything for R_MICROMIPS_JALR currently? From memory
I seem to think it was mostly ignored. Is there any risk with older linkers
by introducing R_MICROMIPS_JALR in GCC generated code?

>  Let me know if the lack of microMIPS results would be a problem for
> this
> patch's acceptance.

Not a problem. We will pick this up as part of testing for the release.

There is a bit of coding style fuzz in the testcases but it is
pre-existing from the code you duplicated so I don't think it needs
fixing.

>  Otherwise, OK to apply?

OK, as long as you can say there is no risk with the new reloc and older
linkers.

Thanks,
Matthew



RE: [PATCH 3/4] MIPS16/GCC: Improve `casesi_internal_mips16_'s instruction count estimate

2016-11-16 Thread Matthew Fortune
Maciej Rozycki  writes:
>   gcc/
>   * config/mips/mips.md (casesi_internal_mips16_): Set
>   `insn_count' to 11 rather than 16.
> ---
>  OK to apply?

Good catch again. OK.

Thanks,
Matthew


RE: [PATCH 2/4] MIPS16/GCC: Correct `casesi_internal_mips16_'s RTL pattern

2016-11-16 Thread Matthew Fortune
Maciej Rozycki  writes:
>   gcc/
>   * config/mips/mips.md (casesi_internal_mips16_): Use the
>   `ltu' rather than `leu' operation in the RTL pattern
> ---
>  OK to apply?

Good spot. OK to commit.

Thanks,
Matthew


Re: [PATCH][PR sanitizer/78307] Fix missing symbols in libubsan after recent merge.

2016-11-16 Thread Jakub Jelinek
On Fri, Nov 11, 2016 at 06:32:43PM +0300, Maxim Ostapenko wrote:
> Tested on x86_64-unknown-linux-gnu. OK for mainline?

Ok (though the LOCAL_PATCHES file is missing in the patch).

> libsanitizer/ChangeLog:
> 
> 2016-11-11  Maxim Ostapenko  
> 
>   PR sanitizer/78307
>   * ubsan/ubsan_handlers.cc (__ubsan_handle_cfi_bad_icall): New function.
>   ( __ubsan_handle_cfi_bad_icall_abort): Likewise. 
>   * ubsan/ubsan_handlers.h (struct CFIBadIcallData): New type.
>   * ubsan/ubsan_handlers_cxx.cc (__ubsan_handle_cfi_bad_type): New
>   function.
>   (__ubsan_handle_cfi_bad_type_abort): Likewise.
>   * ubsan/ubsan_handlers_cxx.h (struct CFIBadTypeData): New type.
>   (__ubsan_handle_cfi_bad_type): Export function.
>   (__ubsan_handle_cfi_bad_type_abort): Likewise.
>   * LOCAL_PATCHES: New file.
>   * HOWTO_MERGE: Update documentation.

Jakub


Re: [PATCH] Fix PR78305

2016-11-16 Thread Richard Biener
On Wed, 16 Nov 2016, Marc Glisse wrote:

> On Wed, 16 Nov 2016, Richard Biener wrote:
> 
> > I am testing the following to avoid undefined behavior when negating
> > a multiplication (basically extending a previous fix to properly handle
> > negative power of two).
> > 
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > 
> > Richard.
> > 
> > 2016-11-16  Richard Biener  
> > 
> > PR middle-end/78305
> > * fold-const.c (negate_expr_p): Fix multiplication case.
> > 
> > * gcc.dg/torture/pr78305.c: New testcase.
> > 
> > Index: gcc/fold-const.c
> > ===
> > --- gcc/fold-const.c(revision 242471)
> > +++ gcc/fold-const.c(working copy)
> > @@ -450,13 +450,15 @@ negate_expr_p (tree t)
> >   if (TYPE_UNSIGNED (type))
> > break;
> >   /* INT_MIN/n * n doesn't overflow while negating one operand it does
> > - if n is a power of two.  */
> > + if n is a power of (minus) two.  */
> 
> if n is (minus) a power of two.
> if n is a divisor of INT_MIN.

n is a divisor of INT_MIN is correct.

> >   if (INTEGRAL_TYPE_P (TREE_TYPE (t))
> >   && ! TYPE_OVERFLOW_WRAPS (TREE_TYPE (t))
> >   && ! ((TREE_CODE (TREE_OPERAND (t, 0)) == INTEGER_CST
> > -&& ! integer_pow2p (TREE_OPERAND (t, 0)))
> > +&& (wi::popcount (TREE_OPERAND (t, 0))
> > +!= 1 + wi::neg_p (TREE_OPERAND (t, 0), SIGNED)))
> 
> Is that supposed to test for (possibly negated) powers of 2? I don't see it.
> For -2, aka 0b11...110, popcount is 31 != 1 + 1.

It's supposed to test for a power of two with the sign-bit ORed in.
I believe those are which, when multiplied with some number can
yield INT_MIN.  That is, we look for a test that ensures that there
exists no number when multiplied with X yields INT_MIN.

Richard.

> > || (TREE_CODE (TREE_OPERAND (t, 1)) == INTEGER_CST
> > -   && ! integer_pow2p (TREE_OPERAND (t, 1)
> > +   && (wi::popcount (TREE_OPERAND (t, 1))
> > +   != 1 + wi::neg_p (TREE_OPERAND (t, 1), SIGNED)
> > break;
> > 
> >   /* Fall through.  */
> > Index: gcc/testsuite/gcc.dg/torture/pr78305.c
> > ===
> > --- gcc/testsuite/gcc.dg/torture/pr78305.c  (revision 0)
> > +++ gcc/testsuite/gcc.dg/torture/pr78305.c  (working copy)
> > @@ -0,0 +1,14 @@
> > +/* { dg-require-effective-target int32plus } */
> > +/* { dg-do run } */
> > +
> > +int main ()
> > +{
> > +  int a = 2;
> > +  int b = 1;
> > +
> > +  int t = -1 * ( -0x4000 * a / ( -0x2000 + b ) )  / -1;
> > +
> > +  if (t != 4) __builtin_abort();
> > +
> > +  return 0;
> > +}
> > 
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix PR sanitizer/78270 (part 2)

2016-11-16 Thread Martin Liška
I'm pinging this as it breaks ASAN bootstrap.

On 11/11/2016 02:44 PM, Martin Liška wrote:
> Hello.
> 
> Due to a stupid mistake I did, following patch is needed for the test-case
> to properly save previous gimplify_ctxp->live_switch_vars.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I was able to run asan bootstrap on x86_64-linux-gnu and kernel build with
> allyesconfig works fine.
> 
> Ready to be installed?
> Martin
> 



Re: [PATCH] Fix PR sanitizer/78270 (part 2)

2016-11-16 Thread Jakub Jelinek
On Fri, Nov 11, 2016 at 02:44:40PM +0100, Martin Liška wrote:
> Hello.
> 
> Due to a stupid mistake I did, following patch is needed for the test-case
> to properly save previous gimplify_ctxp->live_switch_vars.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I was able to run asan bootstrap on x86_64-linux-gnu and kernel build with
> allyesconfig works fine.
> 
> Ready to be installed?
> Martin

> >From 53dd3c035283863a25a24feb90bf359295999bca Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Fri, 11 Nov 2016 11:21:29 +0100
> Subject: [PATCH] Fix PR sanitizer/78270 (part 2)
> 
> gcc/ChangeLog:
> 
> 2016-11-11  Martin Liska  
> 
>   PR sanitizer/78270
>   * gimplify.c (gimplify_switch_expr): Always save previous
>   gimplify_ctxp->live_switch_vars.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-11-11  Martin Liska  
> 
>   PR sanitizer/78270
>   * gcc.dg/asan/pr78270-2.c: New test.

Ok, thanks.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/asan/pr78270-2.c
> @@ -0,0 +1,17 @@
> +// { dg-do compile }
> +// { dg-additional-options "-Wno-switch-unreachable" }

Please use /* */ comments instead of //, that is common
in the gcc.dg/ tests, and start with
/* PR sanitizer/78270 */
line.

Thanks.

Jakub


  1   2   >