Re: [PATCH] PR fortran/78033 -- This was a REAL pain

2016-10-21 Thread Paul Richard Thomas
Hi Steve,

Thanks for persevering with this. The patch looks good to me. If it
has regtested OK, please feel free to commit.

Cheers

Paul

On 22 October 2016 at 02:22, Steve Kargl
 wrote:
> All,
>
> The attached patch fixes PR fortran/78033.  This was a REAL pain
> to fix because Fortran overloads REAL as an intrinsic type and
> an intrinsic subprogram.
>
> gfc_match_type_spec() in match.c is used to match Fortran 2003
> type-specs in things like array constructors and TYPE IS statements.
> At some point in time, PR fortran/54730 was submitted because an ICE
> occurred for
>
>   subroutine s
>     implicit none
>     intrinsic :: real
>     real :: vec(1:2)
>     vec = (/ real(a = 1), 1. /)
>   end subroutine s
>
> where a symbol for 'a' was created while parsing for a valid
> typespec.  The invalid 'a' was causing an ICE during translation.
> Mikael fixed the ICE by introducing checkpointing of the symbols in
> gfc_match_array_constructor() in array.c, which allowed 'a' to be
> removed.
>
> Fast-forward to PR fortran/78033, submitted a few days ago.
> Code like
>
>   subroutine f(n, x)
>     integer, intent(in) :: n
>     complex, intent(in) :: x(1:n)
>     real :: y(2*n)
>     y = [real(x(1:n), aimag(x(1:n))]
>   end subroutine f
>
> was now ICE'ing due to what appears to be a dangling checkpoint.
>
> f951: internal compiler error: in enforce_single_undo_checkpoint,
> at fortran/symbol.c:3514
>
> If I disabled Mikael's fix for PR fortran/54730, then PR fortran/78033
> would compile, with the expected regression in PR fortran/54730.  Having
> spent too much time looking for a mismatch in checkpoints, I decided to
> remove Mikael's fix in gfc_match_array_constructor() and fix the issue
> in gfc_match_type_spec(), where I special-case the parsing of
> REAL([KIND=]scalar-int-initialization-expr).
>
> An early version of the patch passed regression testing except for gomp/udr3.f90.
> Note that gfortran never visits gfc_match_type_spec while compiling udr3.f90.
> I've deleted obj/ and started a clean bootstrap to see if this failure
> was collateral damage from my tinkering.  If regression testing is
> successful, OK to commit?
>
> 2016-10-21  Steven G. Kargl  
>
> PR fortran/78033
> * array.c (gfc_match_array_constructor): Remove checkpointing
> introduced in r196416.  Move initialization to top of function.
> * match.c (gfc_match_type_spec): Special case matching for REAL.
>
> 2016-10-21  Steven G. Kargl  
>
> PR fortran/78033
> * gfortran.dg/pr78033.f90: New test.
>
> --
> Steve



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Re: [PATCH] Extend -Wint-in-bool-context to warn for multiplications

2016-10-21 Thread Bernd Edlinger
On 10/22/16 04:17, Martin Sebor wrote:
> On 10/21/2016 04:37 PM, Joseph Myers wrote:
>> The quoting in the diagnostic should be %<&&%>, not '&&'.
>
> Presumably same for '*' (i.e., %<*%>).
>
> But I would actually suggest a somewhat more formal phrasing than
> "better use xxx here" such as "suggest %<&&%> instead" or something
> akin to what's already in place elsewhere in gcc.pot.
>

Aehm, yes.  That would be better then:


Index: c-common.c
===
--- c-common.c  (revision 241400)
+++ c-common.c  (working copy)
@@ -3327,6 +3327,11 @@
return c_common_truthvalue_conversion (location,
   TREE_OPERAND (expr, 0));

+case MULT_EXPR:
+  warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
+ "%<*%> in boolean context, suggest %<&&%> instead");
+  break;
+
  case LSHIFT_EXPR:
/* We will only warn on signed shifts here, because the majority of
 false positive warnings happen in code where unsigned arithmetic


I assume then I should adjust the warning a few lines below as well:

 warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
 "<< in boolean context, did you mean '<' ?");



Bernd.
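To make the motivation for the `*` case concrete, here is a minimal C sketch (the function names are made up for illustration) of the kind of code the new check targets: a product used as a truth value where `&&` was meant. It assumes a 32-bit `unsigned int`, so the unsigned product wraps modulo 2^32 and can be zero even though both operands are nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Suspicious: a product used as a truth value.  This is the shape of
   code the proposed "%<*%> in boolean context" warning targets.  */
static bool
both_nonzero_mul (uint32_t a, uint32_t b)
{
  if (a * b)  /* unsigned product wraps modulo 2^32 */
    return true;
  return false;
}

/* What was most likely intended.  */
static bool
both_nonzero_and (uint32_t a, uint32_t b)
{
  return a && b;
}
```

For small operands the two agree, but for 65536 and 65536 the product wraps to zero while `&&` still reports both operands as nonzero.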


Re: [PATCH] Extend -Wint-in-bool-context to warn for multiplications

2016-10-21 Thread Martin Sebor

On 10/21/2016 04:37 PM, Joseph Myers wrote:

> The quoting in the diagnostic should be %<&&%>, not '&&'.


Presumably same for '*' (i.e., %<*%>).

But I would actually suggest a somewhat more formal phrasing than
"better use xxx here" such as "suggest %<&&%> instead" or something
akin to what's already in place elsewhere in gcc.pot.

Martin


Re: [PATCH 2/5] [AARCH64] Change IMP and PART over to integers from strings.

2016-10-21 Thread Andrew Pinski
On Fri, Oct 21, 2016 at 9:28 AM, James Greenhalgh
 wrote:
> On Fri, Oct 21, 2016 at 04:57:22PM +0100, Richard Earnshaw (lists) wrote:
>> On 21/10/16 14:59, James Greenhalgh wrote:
>> > On Sat, Oct 15, 2016 at 07:38:40PM -0700, Andrew Pinski wrote:
>> >> On Wed, Nov 25, 2015 at 11:59 AM, Andrew Pinski  wrote:
>> >> Here is finally an updated (fixed) patch.  (I did not implement the two
>> >> implementer big.LITTLE support yet; that will be for a different patch,
>> >> since I also fixed the part number not being unique as a separate patch.)
>> >> Once I get a new enough kernel, I will also look into doing the
>> >> /sys/cpu/* style detection first.
>> >>
>> >> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions
>> >> (and tested hacking the location of the read in file to see if it
>> >> works with big.LITTLE and other formats of /proc/cpuinfo).
>> >
>> > I'm OK with this in principle, but it needs some polish for pedantic
>> > style comments...
>> >
>> >> * config/aarch64/aarch64-cores.def: Rewrite so IMP and PART are
>> >> integer constants.
>> >> * config/aarch64/driver-aarch64.c (struct aarch64_core_data): Change
>> >> implementer_id to unsigned char.
>> >> Change part_no to unsigned int.
>> >> (AARCH64_BIG_LITTLE): New define.
>> >> (INVALID_IMP): New define.
>> >> (INVALID_CORE): New define.
>> >> (cpu_data): Change the last element's implementer_id and part_no to 
>> >> integers.
>> >> (valid_bL_string_p): Rewrite to ..
>> >> (valid_bL_core_p): this for integers instead of strings.
>> >> (parse_field): New function.
>> >> (contains_string_p): Rewrite to ...
>> >> (contains_core_p): this for integers and only for the part_no.
>> >> (host_detect_local_cpu): Rewrite handling of implementation and part
>> >> num to be integers;
>> >> simplifying the code.
>> >
>> >> Index: config/aarch64/aarch64-cores.def
>> >> ===
>> >> --- config/aarch64/aarch64-cores.def   (revision 241200)
>> >> +++ config/aarch64/aarch64-cores.def   (working copy)
>> >> @@ -32,43 +32,46 @@
>> >> FLAGS are the bitwise-or of the traits that apply to that core.
>> >> This need not include flags implied by the architecture.
>> >> COSTS is the name of the rtx_costs routine to use.
>> >> -   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it can
>> >> -   be found in /proc/cpuinfo.
>> >> +   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it
>> >> +   can be found in /proc/cpuinfo. A partial list of implementer IDs is
>> >> +   given in the ARM Architecture Reference Manual ARMv8, for
>> >> -   in /proc/cpuinfo.  For big.LITTLE systems this should have the form at of
>> >> -   ".".  */
>> >> +   in /proc/cpuinfo.  For big.LITTLE systems this should use the macro AARCH64_BIG_LITTLE
>> >> +   where the big part number comes as the first arugment to the macro and little is the
>> >> +   second.  */
>> >
>> > Needs rewrapping for 80 char width.
>> >
>>
>> I don't think it's a good idea to line wrap the def files, some of them
>> are processed with AWK during configure and having a complete entry per
>> line avoids potential matching problems.
>
> Agreed (and essential) for the entries themselves. This is just a comment
> that hangs over the end and should be fixed.

Yes, I agree too.  I think the entries being over 80 characters wide made me
miss that the comment was over 80 as well.  Anyway, I fixed it.

>
> While I'm here...
>
>> >> +   where the big part number comes as the first arugment to the macro and little is the
>
> s/arugment/argument.

Fixed.

I also changed the comment for parse_field to say why it would
return -1 and that it parses hex rather than any
base (I noticed this due to the request to fix the style of the function itself :)).
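A minimal sketch of the kind of helper described here, assuming /proc/cpuinfo lines of the form `CPU implementer : 0x41`. The name `parse_hex_field` is hypothetical and this is not the driver-aarch64.c code; it only illustrates parsing the value as base 16 and returning -1 when that fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the value after ':' in a /proc/cpuinfo line
   such as "CPU part\t: 0xd03" as hexadecimal.  Returns -1 if there is no
   ':' or the value is not a non-negative hex number.  */
static long
parse_hex_field (const char *line)
{
  const char *colon = strchr (line, ':');
  if (colon == NULL)
    return -1;

  char *end;
  errno = 0;
  /* Base 16: strtol skips leading blanks and accepts an optional "0x".  */
  long val = strtol (colon + 1, &end, 16);
  if (errno != 0 || end == colon + 1 || val < 0)
    return -1;

  /* Only trailing whitespace may follow the number.  */
  while (*end == ' ' || *end == '\t' || *end == '\n')
    end++;
  return *end == '\0' ? val : -1;
}
```

Returning -1 works as an error marker here because valid implementer and part fields are never negative.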

Attached is what I am applying after another bootstrap/test on
aarch64-linux-gnu.

Thanks,
Andrew

* config/aarch64/aarch64-cores.def: Rewrite so IMP and PART are
integer constants.
* config/aarch64/driver-aarch64.c (struct aarch64_core_data): Change
implementer_id to unsigned char.
Change part_no to unsigned int.
(AARCH64_BIG_LITTLE): New define.
(INVALID_IMP): New define.
(INVALID_CORE): New define.
(cpu_data): Change the last element's implementer_id and part_no to integers.
(valid_bL_string_p): Rewrite to ..
(valid_bL_core_p): this for integers instead of strings.
(parse_field): New function.
(contains_string_p): Rewrite to ...
(contains_core_p): this for integers and only for the part_no.
(host_detect_local_cpu): Rewrite handling of implementation and part
num to be integers;
simplifying the code.
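Storing IMP and PART as integers makes the big.LITTLE checks plain integer comparisons. The sketch below assumes a hypothetical packing. The `SKETCH_*` and `sketch_*` names are invented, and the real AARCH64_BIG_LITTLE definition is not shown in this mail. MIDR part numbers are 12 bits wide, so a big and a LITTLE part number fit side by side in one `unsigned int`. The values 0xd07 (Cortex-A57) and 0xd03 (Cortex-A53) are used only as sample inputs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of a big.LITTLE pair of 12-bit part numbers.  */
#define SKETCH_BIG_LITTLE(big, little) \
  ((((unsigned) (big) & 0xfffu) << 12) | ((unsigned) (little) & 0xfffu))
#define SKETCH_BIG_PART(core)    (((core) >> 12) & 0xfffu)
#define SKETCH_LITTLE_PART(core) ((core) & 0xfffu)

/* True if the two part numbers seen in /proc/cpuinfo, in either order,
   form the big.LITTLE pair encoded in CORE.  */
static bool
sketch_valid_bl_core_p (unsigned core, unsigned part_a, unsigned part_b)
{
  unsigned big = SKETCH_BIG_PART (core);
  unsigned little = SKETCH_LITTLE_PART (core);
  return (big == part_a && little == part_b)
	 || (big == part_b && little == part_a);
}

/* True if PART is among the N part numbers collected so far.  */
static bool
sketch_contains_core_p (const unsigned *parts, unsigned n, unsigned part)
{
  for (unsigned i = 0; i < n; i++)
    if (parts[i] == part)
      return true;
  return false;
}
```

For example, `SKETCH_BIG_LITTLE (0xd07, 0xd03)` is `0xd07d03`, and `sketch_valid_bl_core_p` accepts the detected parts in either order.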

>
> Cheers,
> James
>
Index: config/aarch64/aarch64-cores.def
===
--- config/aarch64/aarch64-cores.def	(revision 241432)
+++ config/aarch64/aarch64-cores.def	(working copy)
@@ -32,43 +32,46 @@
FLAGS are the bitwise-or of the traits that apply to that core.
This need not inclu

Re: Ping Re: [PATCH] go-lang.c: remove a redundant cast

2016-10-21 Thread Ian Lance Taylor
David Malcolm  writes:

>> gcc/go/ChangeLog:
>>  > * go-lang.c (go_langhook_type_for_mode): Remove redundant cast
>>  > from result of GET_MODE_CLASS.  Minor formatting fixes.

This is OK.

Thanks.

Ian


[PATCH] PR fortran/78033 -- This was a REAL pain

2016-10-21 Thread Steve Kargl
All,

The attached patch fixes PR fortran/78033.  This was a REAL pain
to fix because Fortran overloads REAL as an intrinsic type and
an intrinsic subprogram.

gfc_match_type_spec() in match.c is used to match Fortran 2003
type-specs in things like array constructors and TYPE IS statements.
At some point in time, PR fortran/54730 was submitted because an ICE
occurred for

  subroutine s
    implicit none
    intrinsic :: real
    real :: vec(1:2)
    vec = (/ real(a = 1), 1. /)
  end subroutine s

where a symbol for 'a' was created while parsing for a valid
typespec.  The invalid 'a' was causing an ICE during translation.
Mikael fixed the ICE by introducing checkpointing of the symbols in
gfc_match_array_constructor() in array.c, which allowed 'a' to be
removed.
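The checkpointing idea can be pictured with a small model. Open a checkpoint before a speculative parse, and on failure roll back any symbols created in the meantime, such as the bogus 'a'. Every exit path must either restore or drop the checkpoint. A checkpoint left open on some path is the kind of mismatch that a single-checkpoint assertion catches. This is a hypothetical sketch: the names are invented and only loosely mirror the gfc_new_undo_checkpoint / gfc_restore_last_undo_checkpoint / gfc_drop_last_undo_checkpoint calls in the array.c diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of speculative parsing with undo checkpoints.
   The names are illustrative; this is not the gfortran symbol.c API.  */
enum { MAX_SYMS = 32, MAX_CHECKPOINTS = 4 };

static const char *symtab[MAX_SYMS];
static size_t nsyms;

static size_t checkpoint_stack[MAX_CHECKPOINTS]; /* nsyms when opened */
static size_t ncheckpoints;

static void
new_checkpoint (void)
{
  assert (ncheckpoints < MAX_CHECKPOINTS);
  checkpoint_stack[ncheckpoints++] = nsyms;
}

static void
add_symbol (const char *name)
{
  assert (nsyms < MAX_SYMS);
  symtab[nsyms++] = name;
}

/* Undo: forget every symbol created since the innermost checkpoint.  */
static void
restore_last_checkpoint (void)
{
  assert (ncheckpoints > 0);
  nsyms = checkpoint_stack[--ncheckpoints];
}

/* Commit: keep the new symbols and just close the innermost checkpoint.  */
static void
drop_last_checkpoint (void)
{
  assert (ncheckpoints > 0);
  ncheckpoints--;
}

/* A check in the spirit of "enforce a single checkpoint": code that
   assumes it owns the only open checkpoint fails if another one leaked.  */
static bool
single_checkpoint_p (void)
{
  return ncheckpoints == 1;
}

static bool
has_symbol (const char *name)
{
  for (size_t i = 0; i < nsyms; i++)
    if (strcmp (symtab[i], name) == 0)
      return true;
  return false;
}
```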

Fast-forward to PR fortran/78033, submitted a few days ago.
Code like 

  subroutine f(n, x)
    integer, intent(in) :: n
    complex, intent(in) :: x(1:n)
    real :: y(2*n)
    y = [real(x(1:n), aimag(x(1:n))]
  end subroutine f

was now ICE'ing due to what appears to be a dangling checkpoint.

f951: internal compiler error: in enforce_single_undo_checkpoint,
at fortran/symbol.c:3514

If I disabled Mikael's fix for PR fortran/54730, then PR fortran/78033
would compile, with the expected regression in PR fortran/54730.  Having
spent too much time looking for a mismatch in checkpoints, I decided to
remove Mikael's fix in gfc_match_array_constructor() and fix the issue
in gfc_match_type_spec(), where I special-case the parsing of
REAL([KIND=]scalar-int-initialization-expr).
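The ambiguity is that `real(` can start either a type-spec or a reference to the REAL intrinsic. One way to decide without creating any symbols is a purely lexical lookahead, because an array-constructor type-spec must be followed by `::`. The sketch below is a hypothetical illustration of that idea and is not taken from the patch. It assumes free-form source with no continuation lines or comments, and the function names are invented.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

static const char *
skip_blanks (const char *p)
{
  while (isspace ((unsigned char) *p))
    p++;
  return p;
}

/* Hypothetical lookahead: does TEXT, the contents of an array constructor
   after "[" or "(/", start with "REAL [(...)] ::"?  Nothing is entered in
   any symbol table, so nothing has to be undone on failure.  */
static bool
starts_with_real_type_spec (const char *text)
{
  static const char keyword[] = "real";
  const char *p = skip_blanks (text);

  /* Case-insensitive match of the keyword.  */
  for (size_t i = 0; keyword[i] != '\0'; i++, p++)
    if (tolower ((unsigned char) *p) != keyword[i])
      return false;

  /* "realpart" or "real_x" is a different name, not the keyword.  */
  if (isalnum ((unsigned char) *p) || *p == '_')
    return false;

  p = skip_blanks (p);
  if (*p == '(')
    {
      /* Skip balanced parentheses: a kind selector or an argument list,
	 which cannot be told apart yet.  */
      int depth = 0;
      do
	{
	  if (*p == '\0')
	    return false;
	  if (*p == '(')
	    depth++;
	  else if (*p == ')')
	    depth--;
	  p++;
	}
      while (depth > 0);
      p = skip_blanks (p);
    }

  /* Only a type-spec is followed by "::".  */
  return p[0] == ':' && p[1] == ':';
}
```

With this check, `real(8) :: 1., 2.` is classified as a type-spec, while `real(a = 1), 1.` is left for the ordinary expression parser.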

An early version of the patch passed regression testing except for gomp/udr3.f90.
Note that gfortran never visits gfc_match_type_spec while compiling udr3.f90.
I've deleted obj/ and started a clean bootstrap to see if this failure
was collateral damage from my tinkering.  If regression testing is 
successful, OK to commit?

2016-10-21  Steven G. Kargl  

PR fortran/78033
* array.c (gfc_match_array_constructor): Remove checkpointing
introduced in r196416.  Move initialization to top of function.
* match.c (gfc_match_type_spec): Special case matching for REAL.

2016-10-21  Steven G. Kargl  

PR fortran/78033
* gfortran.dg/pr78033.f90: New test.

-- 
Steve
Index: gcc/fortran/array.c
===
--- gcc/fortran/array.c	(revision 241433)
+++ gcc/fortran/array.c	(working copy)
@@ -1091,7 +1091,6 @@ gfc_match_array_constructor (gfc_expr **
 {
   gfc_constructor *c;
   gfc_constructor_base head;
-  gfc_undo_change_set changed_syms;
   gfc_expr *expr;
   gfc_typespec ts;
   locus where;
@@ -1099,6 +1098,9 @@ gfc_match_array_constructor (gfc_expr **
   const char *end_delim;
   bool seen_ts;
 
+  head = NULL;
+  seen_ts = false;
+
   if (gfc_match (" (/") == MATCH_NO)
 {
   if (gfc_match (" [") == MATCH_NO)
@@ -1115,12 +1117,9 @@ gfc_match_array_constructor (gfc_expr **
 end_delim = " /)";
 
   where = gfc_current_locus;
-  head = NULL;
-  seen_ts = false;
 
   /* Try to match an optional "type-spec ::"  */
   gfc_clear_ts (&ts);
-  gfc_new_undo_checkpoint (changed_syms);
   m = gfc_match_type_spec (&ts);
   if (m == MATCH_YES)
 {
@@ -1130,16 +1129,12 @@ gfc_match_array_constructor (gfc_expr **
 	{
 	  if (!gfc_notify_std (GFC_STD_F2003, "Array constructor "
 			   "including type specification at %C"))
-	{
-	  gfc_restore_last_undo_checkpoint ();
-	  goto cleanup;
-	}
+	goto cleanup;
 
 	  if (ts.deferred)
 	{
 	  gfc_error ("Type-spec at %L cannot contain a deferred "
 			 "type parameter", &where);
-	  gfc_restore_last_undo_checkpoint ();
 	  goto cleanup;
 	}
 
@@ -1148,24 +1143,15 @@ gfc_match_array_constructor (gfc_expr **
 	{
 	  gfc_error ("Type-spec at %L cannot contain an asterisk for a "
 			 "type parameter", &where);
-	  gfc_restore_last_undo_checkpoint ();
 	  goto cleanup;
 	}
 	}
 }
   else if (m == MATCH_ERROR)
-{
-  gfc_restore_last_undo_checkpoint ();
-  goto cleanup;
-}
+goto cleanup;
 
-  if (seen_ts)
-gfc_drop_last_undo_checkpoint ();
-  else
-{
-  gfc_restore_last_undo_checkpoint ();
-  gfc_current_locus = where;
-}
+  if (!seen_ts)
+gfc_current_locus = where;
 
   if (gfc_match (end_delim) == MATCH_YES)
 {
Index: gcc/fortran/match.c
===
--- gcc/fortran/match.c	(revision 241433)
+++ gcc/fortran/match.c	(working copy)
@@ -1989,6 +1989,7 @@ gfc_match_type_spec (gfc_typespec *ts)
 {
   match m;
   locus old_locus;
+  char name[GFC_MAX_SYMBOL_LEN + 1];
 
   gfc_clear_ts (ts);
   gfc_gobble_whitespace ();
@@ -2013,13 +2014,6 @@ gfc_match_type_spec (gfc_typespec *ts)
   goto kind_selector;
 }
 
-  if (gfc_match ("real") == MATCH_YES)
-{
-  ts->type = BT_REAL;
-  ts->kind = gfc_default_real_kind;
-  goto kind_selector;
-}
-
   if (gfc_match ("double precisi

relax rule for flexible array members in 6.x (78039 - fails to compile glibc tests)

2016-10-21 Thread Martin Sebor

Bug 78039 complains that the fix for c++/71912 recently backported
to the GCC 6 branch causes GCC 6 to reject Glibc tests that expect
to be able to define structs with multiple flexible array members,
despite it violating the C standard(*).

The rejected code is unsafe and was intended to be rejected in 6.1
to begin with (i.e., it was a bug I had missed that the code wasn't
rejected in 6.1), and an alternate solution exists, so the backport
seemed appropriate to me.

However, it was pointed out to me that apparently there is a policy
or convention of not backporting to release branches bug fixes that
cause GCC to reject code that was previously accepted, even if the
code is invalid.

To comply with this policy the attached patch adjusts the backported
code to accept the invalid flexible array member with just a pedantic
warning (same as in C mode).  The patch also adds the tests that were
part of the fix for bug 71912 but that were accidentally left out of
the original backport.
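For readers less familiar with the feature, here is the well-defined C use that these diagnostics protect. A struct has a single flexible array member, declared last, and the storage for its elements is allocated past the end of the struct. This is a minimal sketch, and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One flexible array member, declared as the last member of its struct.  */
struct msg
{
  size_t len;
  char text[];
};

/* Allocate a struct msg with room for S and its terminating NUL.  */
static struct msg *
msg_new (const char *s)
{
  size_t len = strlen (s);
  struct msg *m = malloc (sizeof *m + len + 1);
  if (m == NULL)
    return NULL;
  m->len = len;
  memcpy (m->text, s, len + 1);
  return m;
}

/* By contrast, embedding such a struct before another member, e.g.
     struct outer { struct msg m; int after; };
   puts other data where the flexible array's elements would live; that is
   the kind of definition diagnosed in this thread.  */
```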

Martin

[*] Bug 77650 discusses the background on this.

PS I checked the GCC Development Plan but couldn't find a mention
of this policy.  Since this seems like an important guarantee for
users to know about and for contributors to maintain, I suggest
updating the document to reflect it.  If there are no objections,
I'll propose a separate change to mention it.

  https://gcc.gnu.org/develop.html
PR c++/78039 - fails to compile glibc tests

gcc/cp/ChangeLog:
2016-10-21  Martin Sebor  

	PR c++/78039
	* class.c (diagnose_flexarrays): Avoid rejecting an invalid flexible
	array member with a hard error when it is followed by another member
	in a different struct, and instead issue just a pedantic warning.

gcc/testsuite/ChangeLog:
2016-10-21  Martin Sebor  

	PR c++/78039
	* g++.dg/ext/flexary18.C: New test.
	* g++.dg/ext/flexary19.C: New test.

Index: gcc/cp/class.c
===
--- gcc/cp/class.c	(revision 241433)
+++ gcc/cp/class.c	(working copy)
@@ -6960,7 +6960,20 @@ diagnose_flexarrays (tree t, const flexmems_t *fme
 	  location_t loc = DECL_SOURCE_LOCATION (fmem->array);
 	  diagd = true;
 
-	  error_at (loc, msg, fmem->array, t);
+	  /* For compatibility with GCC 6.2 and 6.1 reject with an error
+	 a flexible array member of a plain struct that's followed
+	 by another member only if they are both members of the same
+	 struct.  Otherwise, issue just a pedantic warning.  See bug
+	 71375 for details.  */
+	  if (fmem->after[0]
+	  && (!TYPE_BINFO (t)
+		  || 0 == BINFO_N_BASE_BINFOS (TYPE_BINFO (t)))
+	  && DECL_CONTEXT (fmem->array) != DECL_CONTEXT (fmem->after[0])
+	  && !ANON_AGGR_TYPE_P (DECL_CONTEXT (fmem->array))
+	  && !ANON_AGGR_TYPE_P (DECL_CONTEXT (fmem->after[0])))
+	pedwarn (loc, OPT_Wpedantic, msg, fmem->array, t);
+	  else
+	error_at (loc, msg, fmem->array, t);
 
 	  /* In the unlikely event that the member following the flexible
 	 array member is declared in a different class, or the member
Index: gcc/testsuite/g++.dg/ext/flexary18.C
===
--- gcc/testsuite/g++.dg/ext/flexary18.C	(revision 0)
+++ gcc/testsuite/g++.dg/ext/flexary18.C	(working copy)
@@ -0,0 +1,213 @@
+// PR c++/71912 - [6/7 regression] flexible array in struct in union rejected
+// { dg-do compile }
+// { dg-additional-options "-Wpedantic -Wno-error=pedantic" }
+
+#if __cplusplus
+
+namespace pr71912 {
+
+#endif
+
+struct foo {
+  int a;
+  char s[]; // { dg-message "array member .char pr71912::foo::s \\\[\\\]. declared here" }
+};
+
+struct bar {
+  double d;
+  char t[];
+};
+
+struct baz {
+  union {
+struct foo f;
+struct bar b;
+  }
+  // The definition of struct foo is fine but the use of struct foo
+  // in the definition of u below is what's invalid and must be clearly
+  // diagnosed.
+u;  // { dg-warning "invalid use of .struct pr71912::foo. with a flexible array member in .struct pr71912::baz." }
+};
+
+struct xyyzy {
+  union {
+struct {
+  int a;
+  char s[]; // { dg-message "declared here" }
+} f;
+struct {
+  double d;
+  char t[];
+} b;
+  } u;  // { dg-warning "invalid use" }
+};
+
+struct baz b;
+struct xyyzy x;
+
+#if __cplusplus
+
+}
+
+#endif
+
+// The following definitions aren't strictly valid but, like those above,
+// are accepted for compatibility with GCC (in C mode).  They are benign
+// in that the flexible array member is at the highest offset within
+// the outermost type and doesn't overlap with other members except for
+// those of the union.
+union UnionStruct1 {
+  struct { int n1, a[]; } s;
+  int n2;
+};
+
+union UnionStruct2 {
+  struct { int n1, a1[]; } s1;
+  struct { int n2, a2[]; } s2;
+  int n3;
+};
+
+union UnionStruct3 {
+  struct { int n1, a1[]; } s1;
+  struct { double n2, a2[]; 

Fwd: [Patch, fortran] PR69834 - Collision in derived type hashes

2016-10-21 Thread Dominique d'Humières


> Begin forwarded message:
> 
> From: Dominique d'Humières 
> Subject: Re: [Patch, fortran] PR69834 - Collision in derived type hashes
> Date: 22 October 2016 at 01:04:21 UTC+2
> To: Paul Richard Thomas 
> Cc: Andre Vehreschild , fort...@gcc.gnu.org, gcc-patches List 
> 
> 
> Dear Paul,
> 
> Unless I made a mistake, this patch conflicts seriously with Andre's patch
> at https://gcc.gnu.org/ml/fortran/2016-10/msg00141.html.
> 
> Cheers,
> 
> Dominique
> 



[rs6000] Add support for signed overflow arithmetic

2016-10-21 Thread Eric Botcazou
Hi,

this implements support for signed overflow arithmetic on PowerPC.  It's an 
implementation for Power ISA v2.0x, i.e. it doesn't take into account the new OV32 
flag introduced in v3.0.  It doesn't implement unsigned overflow arithmetic 
because my understanding is that the generic support already generates optimal 
code in most cases on PowerPC for unsigned.
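At the source level, signed overflow arithmetic is what GCC's `__builtin_add_overflow` and `__builtin_sub_overflow` builtins request. As I understand it, calls like the ones below become internal functions that expand_arith_overflow expands. With this patch, the `addv4`/`subv4` expanders from the ChangeLog let PowerPC use the XER OV flag for them. This is a minimal sketch, and the wrapper names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Each builtin stores the (possibly wrapped) result in *RES and returns
   true if the exact result did not fit in an int.  */
static bool
checked_add (int a, int b, int *res)
{
  return __builtin_add_overflow (a, b, res);
}

static bool
checked_sub (int a, int b, int *res)
{
  return __builtin_sub_overflow (a, b, res);
}
```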

It introduces a new MODE_CC mode (CCVmode) which represents the OV flag of the 
XER, and the overflow arithmetic instructions are paired with a mcrxr.  The 
comparisons are written in terms of UNSPECs because I used that for Visium and 
SPARC, but I can rewrite them a la x86/ARM if requested.

There is also a tweak to expand_arith_overflow, because it would otherwise 
"promote" signed multiplication to unsigned multiplication in some cases and 
this badly pessimizes for PowerPC.
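The multiplication case is where the expand_arith_overflow tweak matters. When the operands are known to be non-negative, treating them as unsigned makes the overflow check easier to open-code. According to the description above, though, that can cost two multiplications where the hardware needs only one. The sketch below shows the signed builtin next to a portable widening reference. It illustrates only the semantics being expanded, not the code GCC generates, and the function names are made up.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Signed multiply with overflow detection via the GCC builtin.  */
static bool
checked_mul (int a, int b, int *res)
{
  return __builtin_mul_overflow (a, b, res);
}

/* Portable reference for comparison: form the exact product in a wider
   type, then range-check it.  Assumes long long is wider than int, as on
   common targets.  */
static bool
checked_mul_ref (int a, int b, int *res)
{
  long long p = (long long) a * b;
  if (p < INT_MIN || p > INT_MAX)
    return true;
  *res = (int) p;
  return false;
}
```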

Tested on PowerPC/Linux and PowerPC64/Linux, OK for the mainline?


2016-10-21  Eric Botcazou  

* internal-fn.c (expand_arith_overflow): Do not promote a signed
multiplication done in hardware to an unsigned open-coded one.
* config/rs6000/rs6000-modes.def (CCV): New.
* config/rs6000/rs6000-protos.h (rs6000_select_cc_mode): Declare.
* config/rs6000/rs6000.h (SELECT_CC_MODE): Call it.
* config/rs6000/rs6000.c (rs6000_debug_reg_global): Handle CCVmode.
(validate_condition_mode): Likewise.
(print_operand): Handle %C modifier.
(rs6000_select_cc_mode): Likewise.
(output_cbranch): Likewise.  Tidy up.
* config/rs6000/rs6000.md (UNSPEC_{ADD,SUB,NEG,MUL}V): New constants.
(addv4): New expander.
(add3_overflow): New instruction.
(add3_overflow_carry_in): New expander.
(add3_overflow_carry_in_internal): New instruction.
(add3_overflow_carry_in_0): Likewise.
(add3_overflow_carry_in_m1): Likewise.
(subv4): New expander.
(subf3_overflow): New instruction.
(subf3_overflow_carry_in): New expander.
(sub3_overflow_carry_in_internal): New instruction.
(subf3_overflow_carry_in_0): Likewise.
(subf3_overflow_carry_in_m1): Likewise.
(negv3): New expander.
(neg2_overflow): New instruction.
(mulv4): New expander.
(mulv3_overflow): New instruction.
testsuite/
* gcc.target/powerpc/overflow-1.c: New test.
* gcc.target/powerpc/overflow-2.c: Likewise.
* gcc.target/powerpc/overflow-3.c: Likewise.
* gcc.target/powerpc/overflow-4.c: Likewise.

-- 
Eric Botcazou
Index: internal-fn.c
===
--- internal-fn.c	(revision 241379)
+++ internal-fn.c	(working copy)
@@ -1772,10 +1772,23 @@ expand_arith_overflow (enum tree_code co
   int prec1 = TYPE_PRECISION (TREE_TYPE (arg1));
   int precres = TYPE_PRECISION (type);
   location_t loc = gimple_location (stmt);
-  if (!uns0_p && get_range_pos_neg (arg0) == 1)
-uns0_p = true;
-  if (!uns1_p && get_range_pos_neg (arg1) == 1)
-uns1_p = true;
+  /* Try to promote to unsigned since unsigned overflow is easier to open
+ code than signed overflow, but not for multiplication if that would
+ mean not using the hardware because this would very likely result in
+ doing 2 multiplications instead of only 1, e.g. on PowerPC.  */
+  if (code == MULT_EXPR
+  && !unsr_p
+  && precres <= BITS_PER_WORD
+  && optab_handler (mulv4_optab, TYPE_MODE (type)) != CODE_FOR_nothing
+  && optab_handler (umulv4_optab, TYPE_MODE (type)) == CODE_FOR_nothing)
+;
+  else
+{
+  if (!uns0_p && get_range_pos_neg (arg0) == 1)
+	uns0_p = true;
+  if (!uns1_p && get_range_pos_neg (arg1) == 1)
+	uns1_p = true;
+}
   int pr = get_min_precision (arg0, uns0_p ? UNSIGNED : SIGNED);
   prec0 = MIN (prec0, pr);
   pr = get_min_precision (arg1, uns1_p ? UNSIGNED : SIGNED);
Index: config/rs6000/rs6000-modes.def
===
--- config/rs6000/rs6000-modes.def	(revision 241313)
+++ config/rs6000/rs6000-modes.def	(working copy)
@@ -32,13 +32,15 @@ FLOAT_MODE (TF, 16, ieee_quad_format);
 /* Add any extra modes needed to represent the condition code.
 
For the RS/6000, we need separate modes when unsigned (logical) comparisons
-   are being done and we need a separate mode for floating-point.  We also
-   use a mode for the case when we are comparing the results of two
-   comparisons, as then only the EQ bit is valid in the register.  */
+   are being done and we need a separate mode for floating-point.  We also use
+   a mode for the case when we are comparing the results of two comparisons,
+   as then only the EQ bit is valid in the register.  We also use a mode for
+   detecting signed overflow, as only the GT bit is valid in the register.  */
 
 CC_MODE (CCUNS);
 CC_MODE (CCFP);
 CC_MODE (CCEQ);
+CC_MODE (CCV);
 
 /* Vector modes.  */
 VECTOR_MODES (INT, 8);/*   V8QI  V4HI V2SI */
Index

Re: [PATCH] Extend -Wint-in-bool-context to warn for multiplications

2016-10-21 Thread Joseph Myers
The quoting in the diagnostic should be %<&&%>, not '&&'.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH v3] gcc/config/tilegx/tilegx.c (tilegx_function_profiler): Save r10 to stack before call mcount

2016-10-21 Thread Chen Gang
On 10/20/16 06:42, Jeff Law wrote:
>> On 6/4/16 21:25, cheng...@emindsoft.com.cn wrote:
>>> From: Chen Gang 
>>>
>>> r10 may also be used as the parameter stack pointer for a nested function,
>>> so it needs to be saved before calling mcount.
>>>
>>> Also clean up code: use '!' instead of "== 0" for checking
>>> static_chain_decl and compute_total_frame_size.
>>>
>>> 2016-06-04  Chen Gang  
>>>
>>> gcc/
>>> PR target/71331
>>> * config/tilegx/tilegx.c (tilegx_function_profiler): Save r10
>>> to stack before call mcount.
>>> (tilegx_can_use_return_insn_p): Clean up code.
> So if I understand the tilegx architecture correctly, you're issuing the r10 
> save & sp adjustment as a bundle, and the restore & sp adjustment as a bundle.
> 
> The problem is the semantics of bundling on the tilegx effectively mean that 
> all source operands are read in parallel, then all outputs occur in parallel.
> 
> So if we take the bundle
> 
> {addi sp,sp,-8 ; st sp, r10}
> 
> The address used for the st is the value of the stack pointer before the addi 
> instruction.
> 
> Similarly for the restore r10 bundle.  The address used for the load is sp 
> before adjustment.
> 
> Given my understanding of the tilegx bundling semantics, that seems wrong.
> 
> Jeff
>
 
The comments on the first page of "TILE-Gx Instruction Set Architecture":

Individual instructions within a bundle must comply with certain register 
semantics. Read-after-write (RAW) dependencies are enforced between instruction 
bundles. There is no ordering within a bundle, and the numbering of pipelines 
or instruction slots within a bundle is only used for convenience and does not 
imply any ordering. Within an instruction bundle, it is valid to encode an 
output operand that is the same as an input operand. Because there is 
explicitly no implied dependency within a bundle, the semantics for this 
specify that the input operands for all instructions in a bundle are read 
before any of the output operands are written.

Write-after-write (WAW) semantics between two bundles are defined as: the 
latest write over-writes earlier writes.

Within a bundle, WAW dependencies are forbidden. If more than one instruction 
in a bundle writes to the same output operand register, unpredictable results 
for any destination operand within that bundle can occur. Also, implementations 
are free to signal this case as an illegal instruction. There is one exception 
to this rule—multiple instructions within a bundle may legally target the zero 
register. Lastly, some instructions, such as instructions that implicitly write 
the link register, implicitly write registers. If an instruction implicitly 
writes to a register that another instruction in the same bundle writes to, 
unpredictable results can occur for any output register used by that bundle 
and/or an illegal instruction interrupt can occur.
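The read-before-write rule quoted above can be modeled directly. The toy sketch below executes the two bundles discussed in this thread by reading every input first and then writing every output. All names are made up, and only sp, r10 and a few eight-byte memory slots are modeled. Under this model, the store in `{addi sp, sp, -8; st sp, r10}` uses the old sp. The load in `{addi sp, sp, 8; ld r10, sp}` likewise uses the sp value from before its own adjustment, so the two bundles access different slots.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine state: two registers and 16 eight-byte memory slots.  */
struct machine
{
  uint64_t sp;
  uint64_t r10;
  uint64_t mem[16];  /* slot index = address / 8 */
};

/* { addi sp, sp, IMM ; st sp, r10 }: read all inputs, then write.  */
static void
bundle_addi_st (struct machine *m, int64_t imm)
{
  uint64_t sp_in = m->sp;            /* read phase */
  uint64_t r10_in = m->r10;
  m->sp = sp_in + (uint64_t) imm;    /* write phase */
  m->mem[sp_in / 8] = r10_in;        /* address is the sp read above */
}

/* { addi sp, sp, IMM ; ld r10, sp }: same rule.  */
static void
bundle_addi_ld (struct machine *m, int64_t imm)
{
  uint64_t sp_in = m->sp;            /* read phase */
  uint64_t loaded = m->mem[sp_in / 8];
  m->sp = sp_in + (uint64_t) imm;    /* write phase */
  m->r10 = loaded;
}
```

Whether that mismatch matters for the actual save/restore sequence depends on the rest of the patch. The model only makes the quoted bundle semantics concrete.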

On Page 221, ld instruction is:

  ld Dest, Src

On Page 251, st instruction is:

  st SrcA, SrcB


So as I understand it:

  Bundle {addi sp, sp, 8; ld r10, sp} is OK, it is RAW.

  Bundle {addi sp, sp, -8; st sp, r10} is OK, too, it is RAW (not WAW --
  both SrcA and SrcB are input operands).


Please help check; if you need the related document, please let me know.

Thanks.
-- 
Chen Gang (陈刚)

Managing Natural Environments is the Duty of Human Beings.


[PATCH, RFC] Fix PR71915, PR71490: Handle casts on strides consistently

2016-10-21 Thread Bill Schmidt
Hi,

I've been meaning for some time now to do a better job handling strength
reduction candidates where the stride of the candidate and its basis
involves a cast (usually widening) from another type.  The two PRs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71490 and
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71915 show that we miss
strength reduction opportunities in such cases.  Essentially the way I
was handling this before required a cast to go back to the original
type, which was illegal when this was a narrowing operation or a cast
from a wrapping type to a non-wrapping type.

This patch improves matters by tracking the effective type of the stride
so that we perform arithmetic in that type directly.  This allows us to
strength-reduce some cases missed before.  We still have to be careful
not to ever perform a narrowing or wrap-to-nonwrap conversion, but these
cases don't come up nearly so often in practice with this patch.
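A source-level analogy shows why the arithmetic has to happen in the stride's effective type. It is hypothetical and not SLSR code, since the pass works on GIMPLE rather than C. Suppose `x1 = (uint64_t) (i * s)` and `x2 = (uint64_t) ((i + 1) * s)`, where `i` and `s` are 32-bit unsigned. Rewriting `x2` as `x1 + (uint64_t) s` is wrong when the 32-bit product wraps. Adding in the 32-bit stride type and widening afterwards reproduces the original value. The function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Assumes 32-bit unsigned int, so uint32_t arithmetic wraps mod 2^32.  */

/* Original computation: the multiply is done in the 32-bit stride type
   and only the result is widened.  */
static uint64_t
offset_orig (uint32_t i, uint32_t s)
{
  return (uint64_t) (i * s);
}

/* Unsafe update: add the widened stride to the widened previous offset.
   The addition happens in 64 bits, so the 32-bit wraparound is lost.  */
static uint64_t
next_offset_widened (uint64_t prev, uint32_t s)
{
  return prev + (uint64_t) s;
}

/* Safe update: do the addition in the stride's own 32-bit type, then widen.  */
static uint64_t
next_offset_in_stride_type (uint64_t prev, uint32_t s)
{
  return (uint32_t) ((uint32_t) prev + s);
}
```

With `s = 2` and `i = 0x7fffffff`, the original next offset wraps to 0. The widened update yields 0x100000000 instead, while the update done in the stride type yields 0.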

The patch is pretty straightforward but rather large, so I'd like to
leave it up for review for a week or so before committing it -- extra
eyes welcome!

There is an existing test case that checks for the narrowing problem,
and now fails with this patch as it should.  That is, it's now legal to
insert the initializer that we formerly knew to be bad for this test.
Since the test no longer serves any purpose, I'm deleting it.

gcc.dg/tree-ssa/slsr-8.c generates quite different intermediate code now
than when I first added it to the test suite.  As a result of various
optimization changes, it's no longer maintained as a single block;
instead, the optimizable computations are sunk into two legs of a
conditional.  This exposed another similar case, leading to the bug
reports.  This test now passes, but I had to adjust it for the new code
generation we get.  I also added some commentary to indicate what we
expect to happen in that test, since it isn't too obvious.

I've bootstrapped this and tested it on powerpc64le-unknown-linux-gnu
with no regressions.  To avoid the Wrath of Octoploid ;), I've also
tested it against ffmpeg using an x86_64-pc-linux-gnu cross with -O3
-march=amdfam10, also with no failures.  I've also verified that we hit
this code about 35 times in the test suite, so it looks like we have
some decent additional test coverage.

Thanks in advance to anyone willing to take a look.

Bill


[gcc]

2016-10-21  Bill Schmidt  

PR tree-optimization/71915
* gimple-ssa-strength-reduction.c (struct slsr_cand_d): Add
stride_type field.
(find_basis_for_base_expr): Require stride types to match when
seeking a basis.
(alloc_cand_and_find_basis): Record the stride type.
(slsr_process_phi): Pass stride type to alloc_cand_and_find_basis.
(backtrace_base_for_ref): Pass types to legal_cast_p_1 rather than
the expressions having those types.
(slsr_process_ref): Pass stride type to alloc_cand_and_find_basis.
(create_mul_ssa_cand): Likewise.
(create_mul_imm_cand): Likewise.
(create_add_ssa_cand): Likewise.
(create_add_imm_cand): Likewise.
(legal_cast_p_1): Change interface to accept types rather than the
expressions having those types.
(legal_cast_p): Pass types to legal_cast_p_1.
(slsr_process_cast): Pass stride type to
alloc_cand_and_find_basis.
(slsr_process_copy): Likewise.
(dump_candidate): Display stride type when a cast exists.
(create_add_on_incoming_edge): Introduce a cast when necessary for
the stride type.
(analyze_increments): Change the code checking for invalid casts
to rely on the stride type, and update the documentation and
example.  Change the code checking for pointer multiplies to rely
on the stride type.
(insert_initializers): Introduce a cast when necessary for the
stride type.  Use the stride type for the type of the initializer.

[gcc/testsuite]

2016-10-21  Bill Schmidt  

PR tree-optimization/71915
* gcc.dg/tree-ssa/pr54245.c: Delete.
* gcc.dg/tree-ssa/slsr-8.c: Adjust for new optimization and
document why.


Index: gcc/gimple-ssa-strength-reduction.c
===
--- gcc/gimple-ssa-strength-reduction.c (revision 241379)
+++ gcc/gimple-ssa-strength-reduction.c (working copy)
@@ -246,6 +246,13 @@ struct slsr_cand_d
  replacement MEM_REF.)  */
   tree cand_type;
 
+  /* The type to be used to interpret the stride field when the stride
+ is not a constant.  Normally the same as the type of the recorded
+ stride, but when the stride has been cast we need to maintain that
+ knowledge in order to make legal substitutions without losing 
+ precision.  When the stride is a constant, this will be sizetype.  */
+  tree stride_type;
+
   /* The kind of candidate (CAND_MULT, etc.).  */
   enum cand_kind kind;
 
@@ -502,6 +509,7 @@ find_basis

[RFA][PR tree-optimization/72785] Avoid duplicating blocks with b_c_p or b_o_s calls.

2016-10-21 Thread Jeff Law


As noted in the BZ, jump threading is isolating a path which contains a 
b_c_p call.  The result of the isolation is that on the original path, 
b_c_p continues to return 0 (not-constant), but on the isolated path it 
returns 1 (because the feeding PHI argument is constant).


That in turn causes the isolated path to issue a call to a function that 
would not have been called by the original code.  That violates the 
as-if rule that governs so many of our transformations/optimizations.


I've come to the conclusion that a block with a b_c_p/b_o_s call cannot 
be duplicated unless we can prove the original and duplicate continue to 
produce the same result -- including after any edge redirections to wire 
in the duplicate block.


This patch addresses the block duplication problem by disallowing 
duplication if the block contains a b_c_p or b_o_s call and making sure 
that we test can_duplicate_block_p in the old threader (the backward 
threader already has that test).
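The rule added to gimple_can_duplicate_bb_p can be modelled in a few lines of plain C. This is a hypothetical, simplified model: the statement and block types below are invented for illustration and are not GCC's gimple API. A block counts as duplicable only if it contains no call to __builtin_constant_p or __builtin_object_size.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified statement model (not GCC's gimple API).  */
enum stmt_kind { STMT_ASSIGN, STMT_CALL };
enum callee_code { CALLEE_NONE, CALLEE_CONSTANT_P, CALLEE_OBJECT_SIZE,
                   CALLEE_OTHER };

struct stmt
{
  enum stmt_kind kind;
  enum callee_code callee;   /* meaningful only for STMT_CALL */
};

struct block
{
  const struct stmt *stmts;
  size_t n_stmts;
};

/* Return false if BB contains a call whose result may differ once the
   block is copied onto another path (b_c_p or b_o_s), true otherwise.  */
bool
block_can_be_duplicated (const struct block *bb)
{
  for (size_t i = 0; i < bb->n_stmts; i++)
    {
      const struct stmt *s = &bb->stmts[i];
      if (s->kind == STMT_CALL
          && (s->callee == CALLEE_CONSTANT_P
              || s->callee == CALLEE_OBJECT_SIZE))
        return false;
    }
  return true;
}
```

Like the patch, the check is deliberately conservative. It refuses duplication whenever such a call is present, rather than trying to prove that the call's result would be unchanged on the copied path.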


That's sufficient to resolve this particular BZ.

However, I suspect that we have a deeper problem, namely that any 
transformation which deletes an edge from the CFG can cause a b_c_p 
result to change from non-constant to constant.  It's left as an 
exercise to the reader to produce such a case.


I'm certainly open to someone redefining the semantics of b_c_p or b_o_s 
so that they're not subject to these issues.  But I think the bar for 
acceptance of such a change is fairly high, particularly when this kind 
of change in semantics would allow the optimizers to change code in an 
observable way.



Anyway, attached is the patch which restricts block duplication when the 
block has a b_c_p or b_o_s call.   It's been bootstrapped and regression 
tested on x86_64-linux-gnu.


OK for the trunk?

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fd28129..9d00980 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2016-10-21  Jeff Law  
+
+   PR tree-optimization/72785
+   * tree-cfg.c (gimple_can_duplicate_bb_p): Do not allow duplication of
+   blocks with builtin_constant_p or builtin_object_size calls.
+   * tree-ssa-threadedge.c: Include cfghooks.h.
+   (thread_through_normal_block): Test can_duplicate_block_p before
+   threading through the block.
+
 2016-10-21  Kugan Vivekanandarajah  
 
* ipa-prop.c (ipa_compute_jump_functions_for_edge): Create nonzero
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 09db0f8..79c637f 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,8 @@
 2016-10-21  Jeff Law  
 
+   PR tree-optimization/72785
+   * gcc.dg/tree-ssa/pr72785.c: New test
+
* PR tree-optimization/71947
* gcc.dg/tree-ssa/pr71947-4.c: Avoid x86 opcode.
* gcc.dg/tree-ssa/pr71947-5.c: Likewise.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr72785.c b/gcc/testsuite/gcc.dg/tree-ssa/pr72785.c
new file mode 100644
index 000..17cda22
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr72785.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+
+
+int a, b;
+extern int link_error(void);
+void by(void) {
+  int c = 1;
+  b = a ?: c;
+  __builtin_constant_p(b) ? b ?link_error() : 0 : 0;
+}
+
+
+/* { dg-final { scan-tree-dump-not "link_error" "optimized" } } */
+
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index dfa82aa..f6c7da4 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -5922,8 +5922,25 @@ gimple_split_block_before_cond_jump (basic_block bb)
 /* Return true if basic_block can be duplicated.  */
 
 static bool
-gimple_can_duplicate_bb_p (const_basic_block bb ATTRIBUTE_UNUSED)
+gimple_can_duplicate_bb_p (const_basic_block bb)
 {
+  /* We can not duplicate a block with a call to builtin_constant_p.  */
+  for (gimple_stmt_iterator gsi = gsi_start_bb (const_cast<basic_block> (bb));
+   !gsi_end_p (gsi);
+   gsi_next (&gsi))
+{
+  gimple *stmt = gsi_stmt (gsi);
+
+  if (gimple_code (stmt) == GIMPLE_CALL)
+   {
+ tree callee = gimple_call_fndecl (stmt);
+ if (callee
+ && DECL_BUILT_IN_CLASS (callee) == BUILT_IN_NORMAL
+ && (DECL_FUNCTION_CODE (callee) == BUILT_IN_CONSTANT_P
+ || DECL_FUNCTION_CODE (callee) == BUILT_IN_OBJECT_SIZE))
+   return false;
+   }
+}
   return true;
 }
 
diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
index 170e456..fee8035 100644
--- a/gcc/tree-ssa-threadedge.c
+++ b/gcc/tree-ssa-threadedge.c
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-dom.h"
 #include "gimple-fold.h"
 #include "cfganal.h"
+#include "cfghooks.h"
 
 /* To avoid code explosion due to jump threading, we limit the
number of statements we are going to copy.  This variable
@@ -1030,6 +1031,11 @@ thread_through_normal_block (edge e,
 vec<jump_thread_edge *> *path,
 bitmap visited)
 {
+  /* If we can't duplica

Re: libgo patch committed: Rewrite interface code into Go

2016-10-21 Thread Ian Lance Taylor
On Fri, Oct 21, 2016 at 4:16 AM, Rainer Orth
 wrote:
>
>> This patch to libgo rewrites the interface code from C to Go.
>>
>> I started to copy the Go 1.7 interface code, but the gc and gccgo
>> representations of interfaces are too different.  So instead I rewrote
>> the gccgo interface code from C to Go.  The code is largely the same
>> as it was, but the names are more like those used in the gc runtime.
>>
>> I also copied over the string comparison functions, and tweaked the
>> compiler to use eqstring when comparing strings for equality.
>>
>> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
>> to mainline.
>
> this patch broke Solaris 11 and 12 bootstrap:
>
> In file included from /vol/gcc/src/hg/trunk/local/libgo/runtime/runtime.h:113:0,
>  from /vol/gcc/src/hg/trunk/local/libgo/runtime/go-main.c:17:
> ./runtime.inc:2:12: error: expected identifier or '(' before numeric constant
>  #define c1 326713
> ^
> ./runtime.inc:713:11: note: in expansion of macro 'c1'
>   uint32_t c1;
>^~
> Makefile:1630: recipe for target 'libgobegin_a-go-main.o' failed
> make[4]: *** [libgobegin_a-go-main.o] Error 1
>
> runtime.inc starts with
>
> #define c0 2860486313
> #define c1 326713
>
> and lines 712-713 have
>
> struct _Compartments_t {
> uint32_t c1;
>
> which stems from  (Compartments_t).
>
> It seems c[01] were introduced via the new go/runtime/alg.go.

I committed this patch which should fix this problem.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu, which admittedly proves
little.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 241430)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-9806c8a8e4e448eaf6810ff1acffa715745d2549
+6d9929a1641b180e724c2fdcdd55f6a254f1dec0
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 241427)
+++ libgo/Makefile.am   (working copy)
@@ -1230,7 +1230,7 @@ runtime-go.lo:
 runtime.inc: s-runtime-inc; @true
 s-runtime-inc: runtime-go.lo Makefile
rm -f runtime.inc.tmp2
-   grep -v "#define _" runtime.inc.tmp > runtime.inc.tmp2
+   grep -v "#define _" runtime.inc.tmp | grep -v "#define c0 " | grep -v "#define c1 " > runtime.inc.tmp2
for pattern in '_G[a-z]' '_P[a-z]' _Max _Lock _Sig _Trace _MHeap _Num; do \
  grep "#define $$pattern" runtime.inc.tmp >> runtime.inc.tmp2; \
done
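The effect of the new filter can be sketched as a small C predicate. keep_runtime_inc_line is a hypothetical name and is not part of libgo. A generated line is dropped if it contains any excluded pattern, mirroring `grep -v`, which matches anywhere in the line. The trailing space in "#define c1 " matters: a macro such as c10 is still kept. The loop that follows in the Makefile rule re-adds selected "#define _..." lines, and that step is not modelled here.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if LINE from runtime.inc.tmp survives the grep -v
   pipeline: i.e. it contains none of the excluded patterns.  */
bool
keep_runtime_inc_line (const char *line)
{
  static const char *const excluded[] = {
    "#define _",
    "#define c0 ",   /* c0/c1 macros clash with the c1 field of */
    "#define c1 ",   /* Solaris' Compartments_t */
  };
  for (size_t i = 0; i < sizeof excluded / sizeof excluded[0]; i++)
    if (strstr (line, excluded[i]) != NULL)
      return false;
  return true;
}
```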


Re: [PATCH][check_GNU_style.sh] More aggressively ignore dg-xxx directives

2016-10-21 Thread Mike Stump
On Oct 21, 2016, at 12:47 PM, Martin Sebor  wrote:
> 
> The latest patch works as expected for me, both with an operand
> and with stdin.  But since I'm not empowered to approve it one
> of the others reviewers will need to give it their blessing.

Seems fine from a test suite perspective, but not my file.



Re: [PATCH] PR debug/77315 - use DW_OP_form_tls_address

2016-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2016 at 02:38:11PM -0600, Tom Tromey wrote:
> > "Jakub" == Jakub Jelinek  writes:
> 
> Jakub> Also, as this effectively requires the latest unreleased GDB under the
> Jakub> default options for something that has been working previously, I wonder
> Jakub> if it e.g. for some time shouldn't be guarded with dwarf_version >= 5
> 
> Here's what that looks like.

LGTM.

> commit 7865ede46e519fa6bc3d6367f943a40179b4d380
> Author: Tom Tromey 
> Date:   Thu Oct 20 17:03:35 2016 -0600
> 
> PR debug/77315 - use DW_OP_form_tls_address
> 
> This patch changes gcc to emit DW_OP_form_tls_address rather than
> DW_OP_GNU_push_tls_address.  This is PR debug/77315.
> 
> DW_OP_form_tls_address was added in DWARF 3.  However, this patch checks
> for DWARF version 5 or above to decide which opcode to emit, because gdb
> did not implement the DWARF 3 opcode until recently -- not until after
> 7.12.  This approach seems safest because DWARF 5 is also going to
> require other gdb changes.
> 
> Built and regtested on x86-64 Fedora 24.
> 
> 2016-10-21  Tom Tromey  
> 
>   PR debug/77315:
>   * dwarf2out.c (mem_loc_descriptor): Use DW_OP_form_tls_address.
>   (resolve_args_picking_1): Move DW_OP_form_tls_address case next to
>   DW_OP_GNU_push_tls_address case.
>   (loc_list_from_tree_1): Use DW_OP_form_tls_address.

Jakub


libgo patch committed: Fix int64 alignment on 32-bit SPARC

2016-10-21 Thread Ian Lance Taylor
This patch to libgo fixes the expected alignment of int64 types on
32-bit SPARC.  Without this most calls to select fail at runtime, as
the compiler and the library disagree about the expected size of the
hselect struct.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu, bootstrapped and ran a few libgo tests on
sparc-sun-solaris.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 241427)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-df6046971233854e5b7533140d4ead095ab69857
+9806c8a8e4e448eaf6810ff1acffa715745d2549
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/configure.ac
===
--- libgo/configure.ac  (revision 241341)
+++ libgo/configure.ac  (working copy)
@@ -364,7 +364,6 @@ GOARCH_MINFRAMESIZE=8
 #endif],
 [GOARCH=sparc
 GOARCH_FAMILY=SPARC
-GOARCH_INT64ALIGN=4
 ],
 [GOARCH=sparc64
 GOARCH_FAMILY=SPARC64
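An int64 alignment disagreement changes struct sizes. The minimal sketch below shows this by computing the layout of a hypothetical struct { int32_t a; int64_t b; } under two assumed int64 alignments. It does not reproduce the real hselect layout. It only shows how 4-byte versus 8-byte alignment, the difference made by dropping the GOARCH_INT64ALIGN=4 override above, yields different sizes for the same declaration.

```c
#include <stddef.h>

/* Round OFFSET up to the next multiple of ALIGN (a power of two).  */
size_t
round_up (size_t offset, size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}

/* Size of a hypothetical struct { int32_t a; int64_t b; } when int64
   fields are aligned to INT64_ALIGN bytes.  The struct's own alignment
   is max (4, INT64_ALIGN), so the total size is rounded up to it.  */
size_t
pair_size (size_t int64_align)
{
  size_t struct_align = int64_align > 4 ? int64_align : 4;
  size_t off = 4;                      /* just past the int32_t field */
  off = round_up (off, int64_align);   /* start of the int64_t field */
  off += 8;                            /* size of the int64_t field */
  return round_up (off, struct_align);
}
```

With 4-byte alignment the struct is 12 bytes; with 8-byte alignment it is 16. If the compiler and the library each assume a different value, they disagree about every size and offset after the first int64 field.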


Re: RFC [1/3] divmod transform v2

2016-10-21 Thread Jeff Law

On 10/21/2016 04:34 AM, Prathamesh Kulkarni wrote:

On 20 October 2016 at 15:02, Richard Biener  wrote:

On Wed, 19 Oct 2016, Jeff Law wrote:


On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:

Hi,
After approval from Bernd Schmidt, I committed the patch to remove the
optab functions for sdivmod_optab and udivmod_optab in optabs.def, which
removes the blocker for the divmod patch.

This patch is mostly the same as the previous one, except that it drops
targeting __udivmoddi4(), because calling __udivmoddi4() gave an
undefined-reference link error on aarch64-linux-gnu.
It appears aarch64 has a hardware insn for DImode div, so __udivmoddi4()
isn't needed for that target (it was a bug in my patch that it called
__udivmoddi4() even though aarch64 supports hardware div).

However, this makes me wonder: is __udivmoddi4() guaranteed to be
available for a target that has neither a hardware div nor a divmod
insn, and no target-specific libfunc for DImode divmod? To be
conservative, the attached patch doesn't generate a call to __udivmoddi4.
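The transform targets a division and a modulo with the same operands and combines them into one DIVMOD operation. As a rough source-level illustration (the function names are hypothetical), the two C functions below compute the same pair. The second uses the standard ldiv() to get quotient and remainder from a single call. This is only an analogy for the internal DIVMOD function, not the code the pass generates.

```c
#include <stdlib.h>

/* Source pattern the transform looks for: / and % on the same operands.
   On a target without a hardware divide, this may end up as two
   separate library calls.  */
void
split_div_mod (long a, long b, long *q, long *r)
{
  *q = a / b;
  *r = a % b;
}

/* Combined form: one call that yields both results.  */
void
combined_div_mod (long a, long b, long *q, long *r)
{
  ldiv_t res = ldiv (a, b);
  *q = res.quot;
  *r = res.rem;
}
```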

Passes bootstrap+test on x86_64-unknown-linux.
Cross-tested on arm*-*-*, aarch64*-*-*.
Verified that there are no regressions with SPEC2006 on
x86_64-unknown-linux-gnu.
OK to commit?

Thanks,
Prathamesh


divmod-v2-3-main.txt


2016-10-15  Prathamesh Kulkarni  
Kugan Vivekanandarajah  
Jim Wilson  

* target.def: New hook expand_divmod_libfunc.
* doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
* doc/tm.texi: Regenerate.
* internal-fn.def: Add new entry for DIVMOD ifn.
* internal-fn.c (expand_DIVMOD): New.
* tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
targhooks.h.
(widen_mul_stats): Add new field divmod_calls_inserted.
(target_supports_divmod_p): New.
(divmod_candidate_p): Likewise.
(convert_to_divmod): Likewise.
(pass_optimize_widening_mul::execute): Call
calculate_dominance_info(), renumber_gimple_stmt_uids() at
beginning of function. Call convert_to_divmod()
and record stats for divmod.

Starting with some high level design comments.  If these conflict with
comments from others, let me know and we'll work through the issues.

I don't really like introducing code conditional on the target capabilities
this early in the gimple optimization pipeline.


It's basically done right before RTL expansion
(pass_optimize_widening_mul).


Would it be possible to always do the transformation to divmod in the gimple
optimizers, regardless of the target capabilities?  Then, in the gimple->RTL
expanders, make a final decision about divmod insn, libcall, or using div/mod
insns.


The issue is that it hoists one or both of the division and the modulo, and
if we don't do the transform we'd want to undo that code motion.


That would move all the target dependencies out of the gimple optimizers and
into the gimple->rtl expansion phase, which is the preferred place to start
introducing this kind of target dependency.

With that background, I'm going to focus more on the identification of divmod
opportunities than the expansion bits.




diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a4a8e49..866c368 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
 the hook implementation for how different fusion types are supported.
 @end deftypefn

+@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx *@var{quot}, rtx *@var{rem})
+Define this hook for enabling divmod transform if the port does not have
+hardware divmod insn but defines target-specific divmod libfuncs.
+@end deftypefn
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 265f1be..c4c387b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4890,6 +4890,8 @@ them: try the first ones in this list first.

 @hook TARGET_SCHED_FUSION_PRIORITY

+@hook TARGET_EXPAND_DIVMOD_LIBFUNC
+
 @node Sections
 @section Dividing the Output into Sections (Texts, Data, @dots{})
 @c the above section title is WAY too long.  maybe cut the part between
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 0b32d5f..42c6973 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall *call)
   expand_ifn_atomic_compare_exchange (call);
 }

+/* Expand DIVMOD() using:

In general, we do not use () when referring to objects in comments.


+ a) optab handler for udivmod/sdivmod if it is available.
+ b) If optab_handler doesn't exist, generate call to
+target-specific divmod libfunc.  */
+
+static void
+expand_DIVMOD (internal_fn, gcall *call_stmt)

In general, don't use upper case for function names.  L

Minor testsuite fixes for 71947 tests

2016-10-21 Thread Jeff Law


This fixes minor issues in the recently added tests for 71947.

First, two of the tests use an x86 opcode.  We don't actually assemble the 
tests, so it really doesn't matter, but to make it clear that the tests 
are not x86-specific, the opcode was changed to "xyzzy" :-)


One test is dependent on branch costing, so it has been changed to an 
opt-in test.


Installed on the trunk.

commit 0fe8d31788ee6fe0c70e0ab52dd0a32df122bdad
Author: Jeff Law 
Date:   Fri Oct 21 14:40:51 2016 -0600

* PR tree-optimization/71947
* gcc.dg/tree-ssa/pr71947-4.c: Avoid x86 opcode.
* gcc.dg/tree-ssa/pr71947-5.c: Likewise.
* gcc.dg/tree-ssa/pr71947-6.c: Make it opt-in rather than opt-out.

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 62aa521..09db0f8 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2016-10-21  Jeff Law  
+
+   * PR tree-optimization/71947
+   * gcc.dg/tree-ssa/pr71947-4.c: Avoid x86 opcode.
+   * gcc.dg/tree-ssa/pr71947-5.c: Likewise.
+   * gcc.dg/tree-ssa/pr71947-6.c: Make it opt-in rather than opt-out.
+
 2016-10-21  Kugan Vivekanandarajah  
 
* gcc.dg/ipa/vrp5.c: New test.
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-4.c
index a881f0d..a2b19fe 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-4.c
@@ -6,7 +6,7 @@
 static inline long load(long *p)
 {
 long ret;
-asm ("movq  %1,%0\n\t" : "=r" (ret) : "m" (*p));
+asm ("xyzzy  %1,%0\n\t" : "=r" (ret) : "m" (*p));
 if (ret != *p)
 __builtin_unreachable();
 return ret;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
index fa679f0..e7038d0 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-5.c
@@ -5,7 +5,7 @@
 static inline long load(long *p)
 {
 long ret;
-asm ("movq  %1,%0\n\t" : "=r" (ret) : "m" (*p));
+asm ("xyzzy  %1,%0\n\t" : "=r" (ret) : "m" (*p));
 if (ret != *p)
 __builtin_unreachable();
 return ret;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-6.c b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-6.c
index 9cb89cb..9463535 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr71947-6.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr71947-6.c
@@ -1,4 +1,5 @@
-/* { dg-do compile } */
+/* This is highly dependent on branch costing, so make it opt-in.  */
+/* { dg-do compile { target { i?86-*-* x86_64-*-* } } } */
 /* { dg-options "-O2 -fno-tree-vrp -fdump-tree-dom-details" } */
 
 


Re: [PATCH] PR debug/77315 - use DW_OP_form_tls_address

2016-10-21 Thread Tom Tromey
> "Jakub" == Jakub Jelinek  writes:

Jakub> Also, as this effectively requires the latest unreleased GDB under the
Jakub> default options for something that has been working previously, I wonder
Jakub> if it e.g. for some time shouldn't be guarded with dwarf_version >= 5

Here's what that looks like.

Tom

commit 7865ede46e519fa6bc3d6367f943a40179b4d380
Author: Tom Tromey 
Date:   Thu Oct 20 17:03:35 2016 -0600

PR debug/77315 - use DW_OP_form_tls_address

This patch changes gcc to emit DW_OP_form_tls_address rather than
DW_OP_GNU_push_tls_address.  This is PR debug/77315.

DW_OP_form_tls_address was added in DWARF 3.  However, this patch checks
for DWARF version 5 or above to decide which opcode to emit, because gdb
did not implement the DWARF 3 opcode until recently -- not until after
7.12.  This approach seems safest because DWARF 5 is also going to
require other gdb changes.

Built and regtested on x86-64 Fedora 24.
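The version check itself is small, so it can be sketched directly in C. select_tls_op is a hypothetical helper; the patch makes the same choice inline in mem_loc_descriptor and loc_list_from_tree_1. The opcode values are those assigned by the DWARF standard and by the GNU extension.

```c
/* DW_OP_form_tls_address is standard (DWARF 3 and later);
   DW_OP_GNU_push_tls_address is the older GNU extension.  */
enum tls_opcode
{
  DW_OP_form_tls_address = 0x9b,
  DW_OP_GNU_push_tls_address = 0xe0
};

/* Use the standard opcode only for DWARF 5 or later, because gdb did
   not implement DW_OP_form_tls_address until after 7.12.  */
enum tls_opcode
select_tls_op (int dwarf_version)
{
  return (dwarf_version >= 5
          ? DW_OP_form_tls_address
          : DW_OP_GNU_push_tls_address);
}
```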

2016-10-21  Tom Tromey  

PR debug/77315:
* dwarf2out.c (mem_loc_descriptor): Use DW_OP_form_tls_address.
(resolve_args_picking_1): Move DW_OP_form_tls_address case next to
DW_OP_GNU_push_tls_address case.
(loc_list_from_tree_1): Use DW_OP_form_tls_address.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6102719..481a2a9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2016-10-21  Tom Tromey  
+
+   PR debug/77315:
+   * dwarf2out.c (mem_loc_descriptor): Use DW_OP_form_tls_address.
+   (resolve_args_picking_1): Move DW_OP_form_tls_address case next to
+   DW_OP_GNU_push_tls_address case.
+   (loc_list_from_tree_1): Use DW_OP_form_tls_address.
+
 2016-10-21  Jakub Jelinek  
 
* config/i386/adxintrin.h (_subborrow_u32, _addcarry_u32,
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4683e1c..73b0ea0 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -13619,7 +13619,12 @@ mem_loc_descriptor (rtx rtl, machine_mode mode,
 
   temp = new_addr_loc_descr (rtl, dtprel_true);
 
- mem_loc_result = new_loc_descr (DW_OP_GNU_push_tls_address, 0, 0);
+ /* We check for DWARF 5 here because gdb did not implement
+DW_OP_form_tls_address until after 7.12.  */
+ mem_loc_result = new_loc_descr ((dwarf_version >= 5
+  ? DW_OP_form_tls_address
+  : DW_OP_GNU_push_tls_address),
+ 0, 0);
  add_loc_descr (&mem_loc_result, temp);
 
  break;
@@ -15467,7 +15472,6 @@ resolve_args_picking_1 (dw_loc_descr_ref loc, unsigned initial_frame_offset,
case DW_OP_piece:
case DW_OP_deref_size:
case DW_OP_nop:
-   case DW_OP_form_tls_address:
case DW_OP_bit_piece:
case DW_OP_implicit_value:
case DW_OP_stack_value:
@@ -15595,6 +15599,7 @@ resolve_args_picking_1 (dw_loc_descr_ref loc, unsigned initial_frame_offset,
break;
  }
 
+   case DW_OP_form_tls_address:
case DW_OP_GNU_push_tls_address:
case DW_OP_GNU_uninit:
case DW_OP_GNU_encoded_addr:
@@ -15924,8 +15929,11 @@ loc_list_from_tree_1 (tree loc, int want_address,
  operand shouldn't be.  */
  if (DECL_EXTERNAL (loc) && !targetm.binds_local_p (loc))
return 0;
- dtprel = dtprel_true;
- tls_op = DW_OP_GNU_push_tls_address;
+ dtprel = dtprel_true;
+ /* We check for DWARF 5 here because gdb did not implement
+DW_OP_form_tls_address until after 7.12.  */
+ tls_op = (dwarf_version >= 5 ? DW_OP_form_tls_address
+   : DW_OP_GNU_push_tls_address);
}
  else
{


[PATCH] Extend -Wint-in-bool-context to warn for multiplications

2016-10-21 Thread Bernd Edlinger
Hi!

This patch extends -Wint-in-bool-context to warn for multiplications
used in boolean context.  Multiplication in boolean context is rarely
useful, and where it is used, it could easily be replaced with &&, for
instance.  I think that multiplications in boolean context should be
warned about regardless of the data type.

This warning has already found one bug, in stringop_block_profile in
value-prof.c, where undefined overflow in a signed multiplication was
used to terminate a loop.  That is fixed as well.
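The value-prof.c fix can be seen in isolation in the sketch below. It mirrors the patched loop: an unsigned counter and an explicit bound replace the signed product that previously ended the loop only through undefined overflow. expected_align_bits is a hypothetical wrapper name. BITS_PER_UNIT is assumed to be 8, as in the new test case.

```c
#include <limits.h>

#define BITS_PER_UNIT 8   /* assumed 8-bit bytes, as in the test case */

/* Old loop shape, now flagged by -Wint-in-bool-context:
     int alignment = 1;
     while (!(count & alignment) && (alignment * 2 * BITS_PER_UNIT))
       alignment <<= 1;
   The multiplication only "becomes false" after signed overflow, which
   is undefined behavior.  The version below bounds the loop instead.  */
unsigned int
expected_align_bits (long long count)
{
  unsigned int alignment = 1;
  while (!(count & alignment)
         && alignment <= UINT_MAX / 2 / BITS_PER_UNIT)
    alignment <<= 1;
  return alignment * BITS_PER_UNIT;   /* cannot overflow given the bound */
}
```

For count = 12 (binary 1100) the lowest set bit is 4, so the result is 32 bits. For count = 0 the loop stops at the bound instead of relying on overflow.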


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
c-family:
2016-10-21  Bernd Edlinger  

	* c-common.c (c_common_truthvalue_conversion): Warn for
	multiplications in boolean context.

gcc:
2016-10-21  Bernd Edlinger  

	* doc/invoke.texi (Wint-in-bool-context): Update documentation.
	* value-prof.c (stringop_block_profile): Fix a warning.

testsuite:
2016-10-21  Bernd Edlinger  

	* c-c++-common/Wint-in-bool-context-3.c: New test.

Index: gcc/c-family/c-common.c
===
--- gcc/c-family/c-common.c	(revision 241400)
+++ gcc/c-family/c-common.c	(working copy)
@@ -3327,6 +3327,11 @@ c_common_truthvalue_conversion (location_t locatio
 	return c_common_truthvalue_conversion (location,
 	   TREE_OPERAND (expr, 0));
 
+case MULT_EXPR:
+  warning_at (EXPR_LOCATION (expr), OPT_Wint_in_bool_context,
+		  "* in boolean context, better use '&&' here");
+  break;
+
 case LSHIFT_EXPR:
   /* We will only warn on signed shifts here, because the majority of
 	 false positive warnings happen in code where unsigned arithmetic
Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi	(revision 241400)
+++ gcc/doc/invoke.texi	(working copy)
@@ -6169,8 +6169,9 @@ of the C++ standard.
 @opindex Wno-int-in-bool-context
 Warn for suspicious use of integer values where boolean values are expected,
 such as conditional expressions (?:) using non-boolean integer constants in
-boolean context, like @code{if (a <= b ? 2 : 3)}.  Or left shifting in
-boolean context, like @code{for (a = 0; 1 << a; a++);}.
+boolean context, like @code{if (a <= b ? 2 : 3)}.  Or left shifting of signed
+integers in boolean context, like @code{for (a = 0; 1 << a; a++);}.  Likewise
+for all kinds of multiplications regardless of the data type.
 This warning is enabled by @option{-Wall}.
 
 @item -Wno-int-to-pointer-cast
Index: gcc/value-prof.c
===
--- gcc/value-prof.c	(revision 241400)
+++ gcc/value-prof.c	(working copy)
@@ -1878,12 +1878,12 @@ stringop_block_profile (gimple *stmt, unsigned int
   else
 {
   gcov_type count;
-  int alignment;
+  unsigned int alignment;
 
   count = histogram->hvalue.counters[0];
   alignment = 1;
   while (!(count & alignment)
-	 && (alignment * 2 * BITS_PER_UNIT))
+	 && (alignment <= UINT_MAX / 2 / BITS_PER_UNIT))
 	alignment <<= 1;
   *expected_align = alignment * BITS_PER_UNIT;
   gimple_remove_histogram_value (cfun, stmt, histogram);
Index: gcc/testsuite/c-c++-common/Wint-in-bool-context-3.c
===
--- gcc/testsuite/c-c++-common/Wint-in-bool-context-3.c	(revision 0)
+++ gcc/testsuite/c-c++-common/Wint-in-bool-context-3.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-options "-Wint-in-bool-context" } */
+/* { dg-do compile } */
+
+#define BITS_PER_UNIT 8
+
+int foo (int count)
+{
+  int alignment;
+ 
+  alignment = 1;
+  while (!(count & alignment)
+ && (alignment * 2 * BITS_PER_UNIT)) /* { dg-warning "boolean context" } */
+alignment <<= 1;
+  return alignment * BITS_PER_UNIT;
+}


Ping Re: [PATCH] go-lang.c: remove a redundant cast

2016-10-21 Thread David Malcolm
On Fri, 2016-10-07 at 15:12 -0400, David Malcolm wrote:
> Amongst many other changes, r146451 added this cast:
> 
>  -#define GET_MODE_CLASS(MODE)  mode_class[MODE]
>  +#define GET_MODE_CLASS(MODE)  ((enum mode_class) mode_class[MODE])
> 
> making a cast in go-lang.c redundant; remove it.
> 
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> 
> OK for trunk?
> 
> gcc/go/ChangeLog:
>   > * go-lang.c (go_langhook_type_for_mode): Remove redundant cast
>   > from result of GET_MODE_CLASS.  Minor formatting fixes.
> ---
>  gcc/go/go-lang.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/go/go-lang.c b/gcc/go/go-lang.c
> index 88667e0..acf1fb7 100644
> --- a/gcc/go/go-lang.c
> +++ b/gcc/go/go-lang.c
> @@ -370,10 +370,9 @@ go_langhook_type_for_mode (machine_mode mode, int unsignedp)
>return NULL_TREE;
>  }
>  
> -  // FIXME: This static_cast should be in machmode.h.
> -  enum mode_class mc = static_cast<enum mode_class>(GET_MODE_CLASS(mode));
> +  enum mode_class mc = GET_MODE_CLASS (mode);
>if (mc == MODE_INT)
> -return go_langhook_type_for_size(GET_MODE_BITSIZE(mode), unsignedp);
> +return go_langhook_type_for_size (GET_MODE_BITSIZE (mode), unsignedp);
>else if (mc == MODE_FLOAT)
>  {
>switch (GET_MODE_BITSIZE (mode))

Ping.
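A toy version of the macro shows why the cast in go-lang.c is redundant. All names below are hypothetical stand-ins for machmode.h. go-lang.c is compiled as C++, where converting an integer table entry to an enum needs an explicit cast. Once the accessor macro supplies that cast, the caller needs none. The sketch is valid as both C and C++.

```c
/* Toy stand-ins: the class table stores small integers, and the
   accessor macro converts the entry to the enum type.  */
enum toy_mode_class { TOY_MODE_INT, TOY_MODE_FLOAT, TOY_MODE_OTHER };

static const unsigned char toy_mode_class_table[] = {
  TOY_MODE_INT, TOY_MODE_FLOAT, TOY_MODE_OTHER
};

#define TOY_GET_MODE_CLASS(MODE) \
  ((enum toy_mode_class) toy_mode_class_table[MODE])

/* Because the macro already yields an enum value, no extra cast is
   needed at the call site.  */
enum toy_mode_class
classify (int mode)
{
  enum toy_mode_class mc = TOY_GET_MODE_CLASS (mode);
  return mc;
}
```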


libgo patch committed: Copy lfstack code from Go 1.7 runtime

2016-10-21 Thread Ian Lance Taylor
This patch to libgo copies the lfstack code from the Go 1.7 runtime to
libgo.  The older gccgo-specific copy of lfstack had some
modifications that work on Solaris; those are implemented in a
superior form in the new version.  Bootstrapped and ran Go testsuite
on x86_64-pc-linux-gnu.  Bootstrapped and ran a few libgo tests on
i386-sun-solaris and sparc-sun-solaris.  Committed to mainline.
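Readers less familiar with Go may find the 32-bit packing scheme from the new lfstack_32bit.go easier to follow restated in C. The node address occupies the high 32 bits of the uint64 head and the push count the low 32 bits. Addresses are modelled here as plain uint32_t values, and the function names are hypothetical. The count is bumped on every push, so re-pushing the same node yields a different head value. Presumably this helps the compare-and-swap in lfstackpop reject a stale head (the usual ABA concern).

```c
#include <stdint.h>

/* Pack a 32-bit node address and a 32-bit push count into one word.  */
uint64_t
lfstack_pack32 (uint32_t node_addr, uint32_t cnt)
{
  return ((uint64_t) node_addr << 32) | (uint64_t) cnt;
}

/* Recover the node address from a packed head value.  */
uint32_t
lfstack_unpack32 (uint64_t val)
{
  return (uint32_t) (val >> 32);
}
```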

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 241384)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-14dc8052a09ad0a2226e64ab6b5af69c6923b830
+df6046971233854e5b7533140d4ead095ab69857
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/Makefile.am
===
--- libgo/Makefile.am   (revision 241384)
+++ libgo/Makefile.am   (working copy)
@@ -488,7 +488,6 @@ runtime_files = \
$(runtime_thread_files) \
runtime/yield.c \
$(rtems_task_variable_add_file) \
-   lfstack.c \
malloc.c \
runtime1.c \
sigqueue.c \
Index: libgo/go/runtime/export_test.go
===
--- libgo/go/runtime/export_test.go (revision 241341)
+++ libgo/go/runtime/export_test.go (working copy)
@@ -6,6 +6,10 @@
 
 package runtime
 
+import (
+   "unsafe"
+)
+
 //var Fadd64 = fadd64
 //var Fsub64 = fsub64
 //var Fmul64 = fmul64
@@ -32,11 +36,13 @@ type LFNode struct {
Pushcnt uintptr
 }
 
-func lfstackpush_go(head *uint64, node *LFNode)
-func lfstackpop_go(head *uint64) *LFNode
+func LFStackPush(head *uint64, node *LFNode) {
+   lfstackpush(head, (*lfnode)(unsafe.Pointer(node)))
+}
 
-var LFStackPush = lfstackpush_go
-var LFStackPop = lfstackpop_go
+func LFStackPop(head *uint64) *LFNode {
+   return (*LFNode)(unsafe.Pointer(lfstackpop(head)))
+}
 
 type ParFor struct {
body   func(*ParFor, uint32)
Index: libgo/go/runtime/lfstack.go
===
--- libgo/go/runtime/lfstack.go (revision 0)
+++ libgo/go/runtime/lfstack.go (working copy)
@@ -0,0 +1,50 @@
+// Copyright 2012 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// Lock-free stack.
+// Initialize head to 0, compare with 0 to test for emptiness.
+// The stack does not keep pointers to nodes,
+// so they can be garbage collected if there are no other pointers to nodes.
+// The following code runs only in non-preemptible contexts.
+
+package runtime
+
+import (
+   "runtime/internal/atomic"
+   "unsafe"
+)
+
+// Temporary for C code to call:
+//go:linkname lfstackpush runtime.lfstackpush
+//go:linkname lfstackpop runtime.lfstackpop
+
+func lfstackpush(head *uint64, node *lfnode) {
+   node.pushcnt++
+   new := lfstackPack(node, node.pushcnt)
+   if node1 := lfstackUnpack(new); node1 != node {
+   print("runtime: lfstackpush invalid packing: node=", node, " cnt=", hex(node.pushcnt), " packed=", hex(new), " -> node=", node1, "\n")
+   throw("lfstackpush")
+   }
+   for {
+   old := atomic.Load64(head)
+   node.next = old
+   if atomic.Cas64(head, old, new) {
+   break
+   }
+   }
+}
+
+func lfstackpop(head *uint64) unsafe.Pointer {
+   for {
+   old := atomic.Load64(head)
+   if old == 0 {
+   return nil
+   }
+   node := lfstackUnpack(old)
+   next := atomic.Load64(&node.next)
+   if atomic.Cas64(head, old, next) {
+   return unsafe.Pointer(node)
+   }
+   }
+}
Index: libgo/go/runtime/lfstack_32bit.go
===
--- libgo/go/runtime/lfstack_32bit.go   (revision 0)
+++ libgo/go/runtime/lfstack_32bit.go   (working copy)
@@ -0,0 +1,19 @@
+// Copyright 2014 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build 386 arm nacl armbe m68k mips mipsle mips64p32 mips64p32le mipso32 mipsn32 s390 sparc
+
+package runtime
+
+import "unsafe"
+
+// On 32-bit systems, the stored uint64 has a 32-bit pointer and 32-bit count.
+
+func lfstackPack(node *lfnode, cnt uintptr) uint64 {
+   return uint64(uintptr(unsafe.Pointer(node)))<<32 | uint64(cnt)
+}
+
+func lfstackUnpack(val uint64) *lfnode {
+   return (*lfnode)(unsafe.Pointer(uintptr(val >> 32)))
+}
Index: libgo/go/runtime/lfstack_64bit.go
===
--- libgo/go/runtime/lfstack_64bit.go   (revision 241341)
+++ libgo/go/runtime/lfstack_64bit.go   (working copy)
@@ -

[PATCH] print_rtx: implement support for reuse IDs

2016-10-21 Thread David Malcolm
On Thu, 2016-10-20 at 11:22 +0200, Bernd Schmidt wrote:
> > Recognizing by SCRATCH wouldn't catch everything, I believe. You should
> > be able to check n_dups and dup_loc in recog_data to identify cases
> > where you need to ensure something is restored with pointer equality.

Thanks.  I attempted to use those fields of recog_data, but they don't
seem to be exactly what's needed here.

Recall that we have:

  (cinsn (set (mem/v:BLK (scratch:DI) [0  A8])
(unspec:BLK [
(mem/v:BLK (scratch:DI) [0  A8])
] UNSPEC_MEMORY_BLOCKAGE)) "test.c":2
 (nil))

If I do a recog and then insn_extract on the insn in question, then
this code in insn-extract.c fires:

case 695:  /* *memory_blockage */
  ro[0] = *(ro_loc[0] = &XEXP (pat, 0));
  recog_data.dup_loc[0] = &XVECEXP (XEXP (pat, 1), 0, 0);
  recog_data.dup_num[0] = 0;
  break;

and we have:

  (gdb) call debug (*recog_data.dup_loc[0])
  (mem/v:BLK (scratch:DI) [0  A8])

  (gdb) call debug (ro[0])
  (mem/v:BLK (scratch:DI) [0  A8])

  (gdb) p ro[0]
  $17 = (rtx) 0x719bca98
  (gdb) p recog_data.dup_loc[0]
  $18 = (rtx *) 0x7190eeb8

i.e. it's recorded that the two "(mem/v:BLK (scratch:DI) [0  A8])" match.

However, this doesn't seem to help in terms of actually writing out
the data: it has identified a match_dup, but between the two MEM
instances, whereas what is actually shared is the SCRATCH instance
within them.

Somehow we'd need to traverse the identified match_dup cases and figure
out which descendants within them have identical pointers.

So I came up with a different approach, which doesn't directly use
recog_data.  Instead, there's a new "rtx_reuse_manager" class,
which supports directly identifying the identical SCRATCH instances
during a dump.  This approach seems to me to be simpler, and it's more
flexible, as it can cope with other ways in which pointer-equality
could occur, outside of a match_dup.

On Thu, 2016-10-20 at 17:43 +0200, Bernd Schmidt wrote:
> On 10/20/2016 04:51 PM, David Malcolm wrote:
> >(0|scratch:DI)
> > 
> > with the insn as a whole looking like:
> > 
> >(cinsn (set (mem/v:BLK (0|scratch:DI) [0  A8])
> >   (unspec:BLK [
> >   (mem/v:BLK (reuse_rtx 0) [0  A8])
> >   ] UNSPEC_MEMORY_BLOCKAGE)) "test.c":2
> >(nil))
> 
> LGTM. I'd try to expose match_dup though, it's the standard name for
> this sort of thing. Hopefully it won't have to be added to a lot of
> switch statements to shut up warnings.

The following patch implements the dumping side of the above proposed
format, via the new "rtx_reuse_manager" class.

I didn't expose match_dup to the host, instead introducing "reuse_rtx"
as a generic place for rtx pointer reuse.
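To make the two-pass idea concrete, here is a rough C++ model of what such a reuse manager has to do. It is a sketch under assumptions, not the patch's code: the toy `Node` type stands in for an rtx, the class and method names only loosely echo the ChangeLog entries (preprocess, seen_def_p, set_seen_def), and unlike the patch, which restricts reuse IDs to certain rtx codes (SCRATCH among them), it gives an ID to any node reached more than once.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an rtx: a named node whose children may be shared, i.e.
// the same child object can be reachable from more than one parent.
struct Node
{
  std::string name;
  std::vector<const Node *> kids;
};

// Hypothetical reuse manager (not GCC's rtx_reuse_manager).
class ReuseManager
{
public:
  // Pass 1: count how often each node pointer is reached, and hand out an
  // ID the moment a node is seen for the second time.
  void preprocess (const Node *n)
  {
    int count = ++m_count[n];
    if (count == 2)
      m_ids[n] = m_next_id++;
    if (count > 1)
      return;  // Only walk a shared subtree once.
    for (const Node *k : n->kids)
      preprocess (k);
  }

  // Pass 2: print as an S-expression.  A node with an ID gets an "ID|"
  // prefix the first time it is printed and "(reuse_rtx ID)" afterwards.
  std::string print (const Node *n)
  {
    std::string prefix;
    auto it = m_ids.find (n);
    if (it != m_ids.end ())
      {
        std::string id = std::to_string (it->second);
        if (m_seen.count (n))
          return "(reuse_rtx " + id + ")";
        m_seen.insert (n);
        prefix = id + "|";
      }
    std::string out = "(" + prefix + n->name;
    for (const Node *k : n->kids)
      out += " " + print (k);
    return out + ")";
  }

private:
  std::map<const Node *, int> m_count;
  std::map<const Node *, int> m_ids;
  std::set<const Node *> m_seen;
  int m_next_id = 0;
};
```

Running both passes over a tree in which two MEM-like nodes share one SCRATCH-like child prints `(set (mem (0|scratch)) (unspec (mem (reuse_rtx 0))))`, which mirrors the shape of the proposed dump format.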

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu (on top
of "[PATCH] Start adding target-specific selftests").

OK for trunk?

I've (separately) implemented support for loading this format, using
test dumps emitted by this patch, and it resolves the issue I had with
the __RTL cc1 selftest.  The testcases for that loading support are
currently integrated with the function_reader code, so I'll save them
for a followup.

gcc/ChangeLog:
* config/i386/i386.c: Include print-rtl-reuse.h.
(selftest::ix86_test_dumping_memory_blockage): New function.
(selftest::ix86_run_selftests): Call it.
* print-rtl-function.c: Include "print-rtl-reuse.h".
(print_rtx_function): Create an rtx_reuse_manager and use it.
* print-rtl-reuse.h: New file.
* print-rtl.c: Include "print-rtl-reuse.h" and "rtl-iter.h".
(rtx_reuse_manager::singleton): New global.
(rtx_reuse_manager::rtx_reuse_manager): New ctor.
(rtx_reuse_manager::~rtx_reuse_manager): New dtor.
(uses_rtx_reuse_p): New function.
(rtx_reuse_manager::preprocess): New function.
(rtx_reuse_manager::has_reuse_id): New function.
(rtx_reuse_manager::seen_def_p): New function.
(rtx_reuse_manager::set_seen_def): New function.
(print_rtx): If "in_rtx" has a reuse ID, print
it as a prefix the first time in_rtx is seen, and print
reuse_rtx subsequently.
* rtl-tests.c: Include "print-rtl-reuse.h".
(selftest::test_dumping_rtx_reuse): New function.
(selftest::rtl_tests_c_tests): Call it.
---
 gcc/config/i386/i386.c   |  23 
 gcc/print-rtl-function.c |   6 ++
 gcc/print-rtl-reuse.h    | 100 
 gcc/print-rtl.c          | 144 +--
 gcc/rtl-tests.c  |  49 
 5 files changed, 318 insertions(+), 4 deletions(-)
 create mode 100644 gcc/print-rtl-reuse.h

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8f6ceb4..b979cae 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c

Re: [PATCH][check_GNU_style.sh] More aggressively ignore dg-xxx directives

2016-10-21 Thread Martin Sebor

The latest patch works as expected for me, both with an operand
and with stdin.  But since I'm not empowered to approve it, one
of the other reviewers will need to give it their blessing.

Thanks
Martin

On 10/21/2016 07:56 AM, Kyrill Tkachov wrote:

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00982.html

Thanks,
Kyrill

On 13/10/16 09:11, Kyrill Tkachov wrote:


On 12/10/16 17:49, Martin Sebor wrote:

On 10/12/2016 06:43 AM, Kyrill Tkachov wrote:


On 12/10/16 11:18, Kyrill Tkachov wrote:


On 12/10/16 10:57, Kyrill Tkachov wrote:


On 11/10/16 20:19, Jakub Jelinek wrote:

On Tue, Oct 11, 2016 at 01:11:04PM -0600, Martin Sebor wrote:

Also, the pattern that starts with "/\+\+\+" looks like it's missing
the ^ anchor.  Presumably it should be "/^\+\+\+ \/testsuite\//".

No, it will almost never be +++ /testsuite/
There needs to be .* in between "+++ " and "/testsuite/", and perhaps
it should also ignore "+++ testsuite/".
So /^\+\+\+ (.*\/)?testsuite\// ?
Also, normally (when matching $0) there won't be newlines in the text.

Jakub


Thanks.
Here is the updated patch with your suggestions.



Actually, I've encountered a problem:

# Remove the testsuite part of the diff.  We don't care about GNU style
# in testcases and the dg-* directives give too many false positives.
remove_testsuite ()
{
  awk 'BEGIN{testsuite=0} /\+\+\+ / && ! /testsuite\//{testsuite=0} \
       {if (!testsuite) print} /^\+\+\+ (.*\/)?testsuite\//{testsuite=1}'
}

grep $format '^+' $files \
    | remove_testsuite \
    | grep -v ':+++' \
    > $inp


The /^\+\+\+ (.*\/)?testsuite\// pattern never matches when the ^ anchor
is used.  The awk command matches fine by itself, but not when fed from
the "grep $format '^+' $files" command, because grep adds the line
numbers and file names.  So is it okay to omit the ^ here?


I think the AWK regex will not work correctly when the patch has
the line number prefix like "1234: " (AFAICT, this can only happen
in the second invocation of the remove_testsuite function, which
also has the problem below, making me wonder whether your testing
exercised that mode).



Huh, you're right, but it didn't cause problems in my testing, which
is weird.


I think the AWK regex needs to be changed to handle that.  It should
start with something like "^([1-9][0-9]*:)?\+\+\+"


I think it needs to be
^(.*:)?([1-9][0-9]*:)?\+\+\+
because grep -nH would add the filename as well as the line number in
the first invocation of remove_testsuite.
This revision does that.
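The combined filtering behaviour can be modelled in C++. This is an illustrative sketch, not the script: it assumes the final testsuite-header pattern is the prefix-tolerant `^(.*:)?([1-9][0-9]*:)?\+\+\+` followed by the ` (.*/)?testsuite/` tail from earlier in the thread, and it applies the three awk rules of `remove_testsuite` in the same order. The function name `remove_testsuite_lines` is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Testsuite file header, allowing an optional "file:" prefix and an optional
// "N:" line-number prefix such as grep -H / grep -n add (assumed pattern).
static const std::regex testsuite_header(
    R"(^(.*:)?([1-9][0-9]*:)?\+\+\+ (.*/)?testsuite/)");

// Model of the awk filter: drop the hunks of testsuite files, keep the rest.
std::vector<std::string>
remove_testsuite_lines (const std::vector<std::string> &lines)
{
  bool in_testsuite = false;
  std::vector<std::string> kept;
  for (const std::string &line : lines)
    {
      // Rule 1: a "+++ " header for a non-testsuite file ends the skipped region.
      if (line.find ("+++ ") != std::string::npos
          && line.find ("testsuite/") == std::string::npos)
        in_testsuite = false;
      // Rule 2: keep the line unless we are inside a testsuite file.
      if (!in_testsuite)
        kept.push_back (line);
      // Rule 3: a testsuite header starts the skipped region.  As in the awk,
      // the header itself was already kept; in the script's pipeline the
      // later grep -v ':+++' is what removes it.
      if (std::regex_search (line, testsuite_header))
        in_testsuite = true;
    }
  return kept;
}
```

Because the optional prefixes are part of the anchored pattern, the same regex matches a bare "+++ b/gcc/testsuite/..." line as well as one prefixed with "file.diff:12:".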



I tried to test the patch but it doesn't seem to work.  When passed
a patch as an argument it hangs.  The hunk below isn't quite right:

 # Don't reuse $inp, which may be generated using -H and thus contain a
-# file prefix.
-grep -n '^+' $f \
+# file prefix.  Re-remove the testsuite since we're not using $inp.
+remove_testsuite $f \
+| grep -n '^+' \
 | grep -v ':+++' \
 > $tmp

The remove_testsuite function ignores arguments so passing $f to it
won't do anything except hang waiting for input.  This should look
closer to this (it worked in my very limited testing):

cat $f | remove_testsuite \



Thanks for the help,
Kyrill

2016-10-13  Kyrylo Tkachov  

* check_GNU_style.sh (remove_testsuite): New function.
Use it to remove testsuite from the diff.


Martin








C++ PATCH to debug_tree of TEMPLATE_PARM_INDEX

2016-10-21 Thread Jason Merrill
It was bugging me that the default debug_tree of a TEMPLATE_PARM_INDEX
didn't give the name of the template parameter, so I'm adding the
corresponding _DECL to the dump.
commit 56ee7ba41e0a1b6568f157a5a82230cb8f57
Author: Jason Merrill 
Date:   Fri Oct 21 15:29:04 2016 -0400

* ptree.c (cxx_print_xnode) [TEMPLATE_PARM_INDEX]: Dump the decl.

diff --git a/gcc/cp/ptree.c b/gcc/cp/ptree.c
index 5726f96..e3e5e33 100644
--- a/gcc/cp/ptree.c
+++ b/gcc/cp/ptree.c
@@ -236,6 +236,7 @@ cxx_print_xnode (FILE *file, tree node, int indent)
   print_node (file, "chain", TREE_CHAIN (node), indent+4);
   break;
 case TEMPLATE_PARM_INDEX:
+  print_node (file, "decl", TEMPLATE_PARM_DECL (node), indent+4);
   indent_to (file, indent + 3);
   fprintf (file, "index %d level %d orig_level %d",
   TEMPLATE_PARM_IDX (node), TEMPLATE_PARM_LEVEL (node),


C++ PATCH for c++/77656 (bogus warning with non-type template parameters)

2016-10-21 Thread Jason Merrill
We were warning that shifting a uint64_t right by 32 bits exceeded the
width of the type because we didn't do any conversion when
substituting the 32-bit _X of Test2 for _Val in Test.
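The shape of the problem can be sketched as follows. The bodies are assumptions: only the names echo the message, and `_Val`/`_X` are renamed `Val`/`X` because identifiers beginning with an underscore and a capital letter are reserved. The point is that `X` is value-dependent but not type-dependent inside `Test2`, so it has to be converted to `std::uint64_t` when substituted for `Val`; once converted, `Val >> 32` is a shift of a 64-bit value. Whether a pre-fix compiler warned on exactly this snippet has not been checked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reconstruction of the PR c++/77656 pattern.  Only the names
// Test, Test2, Val and X come from the message; the members are assumed.
template <std::uint64_t Val>
struct Test
{
  // Val is a 64-bit value here, so shifting it right by 32 is well defined.
  static constexpr std::uint64_t high = Val >> 32;
  static constexpr std::uint64_t low = Val & 0xffffffffu;
};

// X is value-dependent but not type-dependent inside Test2; when it is
// substituted for Val it must first be converted to std::uint64_t.
template <std::uint32_t X>
struct Test2 : Test<X>
{
};

static_assert (Test<0x100000002ull>::high == 1, "high word");
static_assert (Test2<7>::high == 0, "a 32-bit argument has no high word");
```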

This patch fixes this by doing conversions for value-dependent (but
not type-dependent) arguments in convert_template_argument, and
adjusts a few places that needed to learn to look past the
conversions.  There may be others that will need adjustment, too.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 3e8031f29419189b52753d1b3fb09e2a67e167a0
Author: Jason Merrill 
Date:   Fri Oct 21 15:29:20 2016 -0400

PR c++/77656
* pt.c (convert_template_argument): Call convert_nontype_argument
on value-dependent but not type-dependent arguments.
(convert_nontype_argument): Handle value-dependent arguments.
(canonicalize_expr_argument): New.
(deducible_expression, unify): Skip CONVERT_EXPR.
* error.c (dump_template_argument): Likewise.
* mangle.c (write_expression): Likewise.

diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index 4cf0041..917a448 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -171,6 +171,10 @@ dump_template_argument (cxx_pretty_printer *pp, tree arg, int flags)
   if (TREE_CODE (arg) == TREE_LIST)
arg = TREE_VALUE (arg);
 
+  /* Strip implicit conversions.  */
+  while (CONVERT_EXPR_P (arg))
+   arg = TREE_OPERAND (arg, 0);
+
   dump_expr (pp, arg, (flags | TFF_EXPR_IN_PARENS) & ~TFF_CLASS_KEY_OR_ENUM);
 }
 }
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 9f86e91..cb2f260 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -2774,10 +2774,9 @@ write_expression (tree expr)
 {
   enum tree_code code = TREE_CODE (expr);
 
-  /* Skip NOP_EXPRs.  They can occur when (say) a pointer argument
- is converted (via qualification conversions) to another
- type.  */
-  while (TREE_CODE (expr) == NOP_EXPR
+  /* Skip NOP_EXPR and CONVERT_EXPR.  They can occur when (say) a pointer
+ argument is converted (via qualification conversions) to another type.  */
+  while (CONVERT_EXPR_CODE_P (code)
 /* Parentheses aren't mangled.  */
 || code == PAREN_EXPR
 || TREE_CODE (expr) == NON_LVALUE_EXPR)
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 1db01f8..1f1df7f 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -214,6 +214,7 @@ static tree tsubst_template_parm (tree, tree, tsubst_flags_t);
 static tree instantiate_alias_template (tree, tree, tsubst_flags_t);
 static bool complex_alias_template_p (const_tree tmpl);
 static tree tsubst_attributes (tree, tree, tsubst_flags_t, tree);
+static tree canonicalize_expr_argument (tree, tsubst_flags_t);
 
 /* Make the current scope suitable for access checking when we are
processing T.  T can be FUNCTION_DECL for instantiated function
@@ -6297,6 +6298,9 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
   if (non_dep)
 expr = instantiate_non_dependent_expr_internal (expr, complain);
 
+  if (value_dependent_expression_p (expr))
+expr = canonicalize_expr_argument (expr, complain);
+
   /* 14.3.2/5: The null pointer{,-to-member} conversion is applied
  to a non-type argument of "nullptr".  */
   if (expr == nullptr_node && TYPE_PTR_OR_PTRMEM_P (type))
@@ -6405,7 +6409,8 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
 
   /* Notice that there are constant expressions like '4 % 0' which
 do not fold into integer constants.  */
-  if (TREE_CODE (expr) != INTEGER_CST)
+  if (TREE_CODE (expr) != INTEGER_CST
+ && !value_dependent_expression_p (expr))
{
  if (complain & tf_error)
{
@@ -6452,7 +6457,7 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
Here, we do not care about functions, as they are invalid anyway
for a parameter of type pointer-to-object.  */
 
-  if (DECL_P (expr) && DECL_TEMPLATE_PARM_P (expr))
+  if (value_dependent_expression_p (expr))
/* Non-type template parameters are OK.  */
;
   else if (cxx_dialect >= cxx11 && integer_zerop (expr))
@@ -6567,27 +6572,30 @@ convert_nontype_argument (tree type, tree expr, tsubst_flags_t complain)
}
}
 
-  if (!DECL_P (expr))
+  if (!value_dependent_expression_p (expr))
{
- if (complain & tf_error)
-   error ("%qE is not a valid template argument for type %qT "
-  "because it is not an object with linkage",
-  expr, type);
- return NULL_TREE;
-   }
+ if (!DECL_P (expr))
+   {
+ if (complain & tf_error)
+   error ("%qE is not a valid template argument for type %qT "
+  "because it is not an object with linkage",
+  expr, type);
+ return NULL_TREE;
+   }
 
-  /* DR 1155 allows internal link

Re: Use version namespace in normal mode

2016-10-21 Thread Jonathan Wakely


Some quick comments before I go offline ...


On 21/10/16 21:21 +0200, François Dumont wrote:

Hi

   I configured libstdc++ to use gnu-versioned-namespace and there are
a number of failures, see below. But none of them are related to this
patch, so is it OK to commit?


The results:

FAIL: libstdc++-abi/abi_check

   3709 symbols reported as added. I don't know what to think about
it. I see a gnu-versioned-namespace.ver in config/abi/pre; is it the
list of symbols to support when the versioned namespace is activated? The
list looks pretty limited.


Because everything gets matched by wildcards and added to a single
symbol version.

I just ignore abi-check for the versioned namespace, since it's
explicitly not ABI compatible. Does the test even work for the
versioned namespace or does it only use the normal baselines?



FAIL: 18_support/headers/limits/synopsis.cc (test for excess errors)
FAIL: 20_util/headers/functional/synopsis.cc (test for excess errors)
FAIL: 20_util/headers/memory/synopsis.cc (test for excess errors)
FAIL: 20_util/headers/utility/synopsis.cc (test for excess errors)
FAIL: 21_strings/headers/string/synopsis.cc (test for excess errors)
FAIL: 22_locale/headers/locale/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/bitset/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/deque/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/forward_list/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/list/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/map/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/queue/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/set/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/stack/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/vector/synopsis.cc (test for excess errors)
FAIL: 24_iterators/headers/iterator/synopsis.cc (test for excess errors)
FAIL: 24_iterators/headers/iterator/synopsis_c++11.cc (test for excess errors)
FAIL: 24_iterators/headers/iterator/synopsis_c++14.cc (test for excess errors)
FAIL: 24_iterators/headers/iterator/synopsis_c++17.cc (test for excess errors)
FAIL: 26_numerics/headers/complex/synopsis.cc (test for excess errors)
FAIL: 26_numerics/headers/valarray/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/fstream/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/ios/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/istream/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/ostream/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/sstream/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/streambuf/synopsis.cc (test for excess errors)
FAIL: tr1/2_general_utilities/headers/functional/synopsis.cc (test for excess errors)
FAIL: tr1/2_general_utilities/headers/memory/synopsis.cc (test for excess errors)
FAIL: tr1/3_function_objects/headers/functional/synopsis.cc (test for excess errors)
FAIL: tr1/4_metaprogramming/headers/type_traits/synopsis.cc (test for excess errors)
FAIL: tr1/6_containers/headers/array/synopsis.cc (test for excess errors)
FAIL: tr1/6_containers/headers/unordered_map/synopsis.cc (test for excess errors)
FAIL: tr1/6_containers/headers/unordered_set/synopsis.cc (test for excess errors)


   All those failures come from declarations or explicit
instantiations of template types expected to be in std but actually
located in std::__7. Should I add usage of
_GLIBCXX_BEGIN_NAMESPACE_VERSION/_GLIBCXX_END_NAMESPACE_VERSION in
those files? Or introduce a dg-require-no-versioned-namespace?


I've just been ignoring those failures, as the reason is known. Either
of your suggestions would work, although I've been thinking we should
avoid using _GLIBCXX_ macros in the tests, so they are independent of
our implementation details.

We could define GLIBCXX_TEST_ macros for use in the tests, and define
them independently, so we could add a GLIBCXX_TEST_INLINE_NS to those
tests which would add the "inline namespace __7" bit.
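A self-contained sketch of how such test-side macros might work. Everything here is hypothetical: the macro names (`TEST_BEGIN_INLINE_NS`, `TEST_END_INLINE_NS`, `MOCK_VERSIONED_NAMESPACE`), the namespace `lib`, and `v7`, which stands in for libstdc++'s `__7`. It illustrates why a plain redeclaration breaks: declaring `box` directly in `lib` would introduce a second template alongside `lib::v7::box`, which typically makes `lib::box<int>` ambiguous, whereas redeclaring it inside the same inline namespace refers to the existing template.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical switch: 1 when the library under test uses a versioned
// (inline) namespace, 0 otherwise.
#define MOCK_VERSIONED_NAMESPACE 1

#if MOCK_VERSIONED_NAMESPACE
# define TEST_BEGIN_INLINE_NS inline namespace v7 {
# define TEST_END_INLINE_NS }
#else
# define TEST_BEGIN_INLINE_NS
# define TEST_END_INLINE_NS
#endif

// "Library" side: the definition lives in the (possibly) versioned namespace.
namespace lib
{
  TEST_BEGIN_INLINE_NS
  template <typename T> struct box { T value; };
  TEST_END_INLINE_NS
}

// "Synopsis test" side: redeclare the template as the test file would.
// Wrapping it in the same macros makes it a redeclaration of the existing
// template rather than a new, unrelated lib::box.
namespace lib
{
  TEST_BEGIN_INLINE_NS
  template <typename T> struct box;
  TEST_END_INLINE_NS
}
```

Keeping the macros test-specific, as suggested, means the tests do not depend on the library's own _GLIBCXX_ spelling of the same idea.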

We don't need to worry about it for now though.



FAIL: 17_intro/using_namespace_std_tr1_neg.cc  (test for errors, line 65)
FAIL: 21_strings/basic_string/cons/char/7.cc (test for excess errors)
FAIL: 21_strings/basic_string/cons/wchar_t/7.cc (test for excess errors)
FAIL: 21_strings/basic_string/lwg2758.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/append/char/4.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/append/wchar_t/4.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/assign/char/4.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/assign/wchar_t/4.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/insert/char/3.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/insert/wchar_t/3.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/replace/char/7.cc (test for excess errors)
FAIL: 21_string

[PATCH] DWARF5 .debug_info headers, .debug_types -> .debug_info DW_UT_type

2016-10-21 Thread Jakub Jelinek
Hi!

This patch changes the .debug_info headers to follow the current
specification (I still hope the useless padding1/padding2 fields will be
removed), and also changes the -gsplit-dwarf stuff to move dwo_id into
the header and use DW_UT_{skeleton,split_*}.
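For reference, a sketch of a 32-bit-format DWARF 5 unit header as laid out in the published DWARF 5 standard. Note that the final standard has no padding1/padding2 fields, so this is not byte-for-byte what the patch above (written against a draft) emits. The helper names `put` and `unit_header`, the little-endian byte order and the 8-byte address size are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// DWARF 5 unit type encodings.
enum DwarfUnitType : std::uint8_t
{
  DW_UT_compile = 0x01,
  DW_UT_type = 0x02,
  DW_UT_partial = 0x03,
  DW_UT_skeleton = 0x04,
  DW_UT_split_compile = 0x05,
  DW_UT_split_type = 0x06,
};

// Append VALUE as SIZE little-endian bytes (assumed target byte order).
static void
put (std::vector<std::uint8_t> &out, std::uint64_t value, int size)
{
  for (int i = 0; i < size; ++i)
    out.push_back (static_cast<std::uint8_t> (value >> (8 * i)));
}

// EXTRA is the dwo_id for skeleton/split-compile units and the type
// signature for type units; TYPE_OFFSET is only used for type units.
// BODY_SIZE is the size of the DIEs that follow the header.
std::vector<std::uint8_t>
unit_header (DwarfUnitType ut, std::uint32_t abbrev_offset,
             std::uint64_t extra, std::uint32_t type_offset,
             std::uint32_t body_size)
{
  std::vector<std::uint8_t> rest;
  put (rest, 5, 2);              // version
  put (rest, ut, 1);             // unit_type (new in DWARF 5)
  put (rest, 8, 1);              // address_size (assumes a 64-bit target)
  put (rest, abbrev_offset, 4);  // debug_abbrev_offset (DWARF 4 put this
                                 // before address_size)
  switch (ut)
    {
    case DW_UT_skeleton:
    case DW_UT_split_compile:
      put (rest, extra, 8);        // dwo_id
      break;
    case DW_UT_type:
    case DW_UT_split_type:
      put (rest, extra, 8);        // type_signature
      put (rest, type_offset, 4);  // type_offset
      break;
    default:
      break;
    }
  std::vector<std::uint8_t> out;
  // unit_length covers everything after itself: the rest of the header
  // plus the DIEs.
  put (out, rest.size () + body_size, 4);
  out.insert (out.end (), rest.begin (), rest.end ());
  return out;
}
```

With these assumptions a DW_UT_compile header is 12 bytes, a DW_UT_skeleton header 20 bytes (the extra 8 being the dwo_id), and a DW_UT_type header 24 bytes.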

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-10-21  Jakub Jelinek  

* dwarf2out.c (dwarf_AT): Handle DW_AT_dwo_name.
(use_debug_types): Adjust comment for DWARF5 DW_UT_type units.
(new_die): Handle DW_TAG_skeleton_unit like DW_TAG_compile_unit.
(is_cu_die, is_unit_die): Likewise.
(should_move_die_to_comdat, break_out_comdat_types): Adjust
comments for DWARF5 DW_UT_type units.
(output_compilation_unit_header): Add UT argument, output
start of DWARF5 .debug_info section header.
(output_comp_unit): Add dwo_id argument.  Adjust
output_compilation_unit_header caller, for DW_UT_split_compile
emit dwo_id field, otherwise padding1.  Emit padding2 field.
(add_top_level_skeleton_die_attrs): Add DW_AT_dwo_name
rather than DW_AT_GNU_dwo_name attr for -gdwarf-5.
(output_skeleton_debug_sections): Add dwo_id argument, for
-gdwarf-5 emit DWARF 5 DW_UT_skeleton header.
(output_comdat_type_unit): For -gdwarf-5 emit .debug_info
DW_UT_type or DW_UT_split_type units rather than .debug_types.
(dwarf2out_finish): Use DW_TAG_skeleton_unit rather than
DW_TAG_compile_unit for skeleton unit die.  Don't add
DW_AT_GNU_dwo_id attributes for -gdwarf-5, instead pass checksum
address to output_comp_unit and output_skeleton_debug_sections.

--- gcc/dwarf2out.c.jj  2016-10-21 18:10:55.380302806 +0200
+++ gcc/dwarf2out.c 2016-10-21 18:40:34.433237109 +0200
@@ -1634,6 +1634,11 @@ dwarf_AT (enum dwarf_attribute at)
return DW_AT_GNU_all_tail_call_sites;
   break;
 
+case DW_AT_dwo_name:
+  if (dwarf_version < 5)
+   return DW_AT_GNU_dwo_name;
+  break;
+
 default:
   break;
 }
@@ -2758,7 +2763,8 @@ const struct gcc_debug_hooks dwarf2_line
-fno-debug-types-section.  It is more efficient to put them in a
separate comdat sections since the linker will then be able to
remove duplicates.  But not all tools support .debug_types sections
-   yet.  */
+   yet.  For Dwarf V5 or higher .debug_types doesn't exist any more,
+   it is DW_UT_type unit type in .debug_info section.  */
 
 #define use_debug_types (dwarf_version >= 4 && flag_debug_types_section)
 
@@ -3422,8 +3428,8 @@ static void output_abbrev_section (void)
 static void output_die_abbrevs (unsigned long, dw_die_ref);
 static void output_die_symbol (dw_die_ref);
 static void output_die (dw_die_ref);
-static void output_compilation_unit_header (void);
-static void output_comp_unit (dw_die_ref, int);
+static void output_compilation_unit_header (enum dwarf_unit_type);
+static void output_comp_unit (dw_die_ref, int, const unsigned char *);
 static void output_comdat_type_unit (comdat_type_node *);
 static const char *dwarf2_name (tree, int);
 static void add_pubname (tree, dw_die_ref);
@@ -5229,6 +5235,7 @@ new_die (enum dwarf_tag tag_value, dw_di
  /* These are allowed because they're generated while
 breaking out COMDAT units late.  */
  && tag_value != DW_TAG_type_unit
+ && tag_value != DW_TAG_skeleton_unit
  && !early_dwarf
  /* Allow nested functions to live in limbo because they will
 only temporarily live there, as decls_for_scope will fix
@@ -7257,7 +7264,8 @@ is_symbol_die (dw_die_ref c)
 static inline bool
 is_cu_die (dw_die_ref c)
 {
-  return c && c->die_tag == DW_TAG_compile_unit;
+  return c && (c->die_tag == DW_TAG_compile_unit
+  || c->die_tag == DW_TAG_skeleton_unit);
 }
 
 /* Returns true iff C is a unit DIE of some sort.  */
@@ -7267,7 +7275,8 @@ is_unit_die (dw_die_ref c)
 {
   return c && (c->die_tag == DW_TAG_compile_unit
   || c->die_tag == DW_TAG_partial_unit
-  || c->die_tag == DW_TAG_type_unit);
+  || c->die_tag == DW_TAG_type_unit
+  || c->die_tag == DW_TAG_skeleton_unit);
 }
 
 /* Returns true iff C is a namespace DIE.  */
@@ -7552,7 +7561,8 @@ contains_subprogram_definition (dw_die_r
 }
 
 /* Return non-zero if this is a type DIE that should be moved to a
-   COMDAT .debug_types section.  */
+   COMDAT .debug_types section or .debug_info section with DW_UT_*type
+   unit type.  */
 
 static int
 should_move_die_to_comdat (dw_die_ref die)
@@ -8069,8 +8079,9 @@ copy_dwarf_procs_ref_in_dies (dw_die_ref
copied_dwarf_procs));
 }
 
-/* Traverse the DIE and set up additional .debug_types sections for each
-   type worthy of being placed in a COMDAT section.  */
+/* Traverse the DIE and set up additional .debug_types or .debug_info
+   DW_UT_*type sections for each type worthy of being placed in 

Re: [PATCH] Simplify conditions in EVRP, handle taken edge

2016-10-21 Thread Andrew Pinski
On Wed, Oct 19, 2016 at 11:10 PM, kugan
 wrote:
> Hi,
>
>
> On 20/10/16 02:54, Andrew Pinski wrote:
>>
>> On Wed, Oct 19, 2016 at 1:01 AM, Christophe Lyon
>>  wrote:
>>>
>>> On 18 October 2016 at 09:34, Richard Biener  wrote:

 On Mon, 17 Oct 2016, Richard Biener wrote:

>
> This refactors propagation vs. substitution and handles condition
> simplification properly as well as passing a known taken edge down
> to the DOM walker (avoiding useless work and properly handling PHIs).
>
> If we do all the work it's stupid to not fold away dead code...
>
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu.


 The following is what I applied, also fixing a spelling mistake noticed
 by Bernhard.

>>> Hi Richard,
>>>
>>> This patch is causing regressions on aarch64. These tests now fail:
>>
>>
>> So I looked into it and found the testcases themselves need to be changed.
>> The functions are marked as noinline but not noclone.
>> For an example:
>> static void __attribute__((noinline))
>> check_args_8 (int a0, int a1, int a2, int a3, int a4, int a5, int a6, int a7,
>>   int a8)
>> 
>>
>
> Indeed. In test12 and so on, the arguments for check_args_8/check_args_24 are
> now becoming constant, which enables ipa-cp to create specialized clones.
> Though this is good, in order to preserve the tested functionality, we need
> to add the noclone attribute. Here is a patch to do this.
>
> Regression tested on aarch64-linux-gnu. Is this OK for trunk?

I think this is obvious, mainly because noinline was already there.
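A minimal illustration of the attribute change. The body below is a hypothetical stand-in (the real test function's body is elided in the message) that simply sums its arguments; the point is the attribute list. With only noinline, ipa-cp may still create a specialized clone when every call passes constants, which can change the code the scan-assembler patterns expect; noclone prevents that. These are GCC attributes, and other compilers may warn that noclone is unknown.

```cpp
#include <cassert>

// Stand-in body: the real test function checks its arguments instead.
static int __attribute__((noinline, noclone))
check_args_8 (int a0, int a1, int a2, int a3, int a4, int a5, int a6,
              int a7, int a8)
{
  return a0 + a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}
```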

Thanks,
Andrew

>
> Thanks,
> Kugan
>
> gcc/testsuite/ChangeLog:
>
> 2016-10-20  Kugan Vivekanandarajah  
>
> * gcc.target/aarch64/test_frame_common.h: Add noclone attribute
> such that cloned versions of tested functions are not created.
>
>
>
>> Thanks,
>> Andrew
>>
>>>   gcc.target/aarch64/test_frame_12.c scan-assembler-times ldp\tx29, x30, \\[sp, [0-9]+\\] 1
>>>   gcc.target/aarch64/test_frame_12.c scan-assembler-times sub\tsp, sp, #[0-9]+ 1
>>>   gcc.target/aarch64/test_frame_15.c scan-assembler-times stp\tx29, x30, \\[sp, [0-9]+\\] 1
>>>   gcc.target/aarch64/test_frame_15.c scan-assembler-times sub\tsp, sp, #[0-9]+ 1
>>>   gcc.target/aarch64/test_frame_8.c scan-assembler-times ldr\tx30, \\[sp, [0-9]+\\] 1
>>>   gcc.target/aarch64/test_frame_8.c scan-assembler-times str\tx30, \\[sp, [0-9]+\\] 1
>>>
>>> Christophe
>>>
 Richard.

 2016-10-18  Richard Biener  

 * tree-vrp.c (evrp_dom_walker::before_dom_children): Handle
 not visited but non-executable predecessors.  Return taken edge.
 Simplify conditions and refactor propagation vs. folding step.

 * gcc.dg/tree-ssa/pr20318.c: Disable EVRP.
 * gcc.dg/tree-ssa/pr21001.c: Likewise.
 * gcc.dg/tree-ssa/pr21090.c: Likewise.
 * gcc.dg/tree-ssa/pr21294.c: Likewise.
 * gcc.dg/tree-ssa/pr21563.c: Likewise.
 * gcc.dg/tree-ssa/pr23744.c: Likewise.
 * gcc.dg/tree-ssa/pr25382.c: Likewise.
 * gcc.dg/tree-ssa/pr68431.c: Likewise.
 * gcc.dg/tree-ssa/vrp03.c: Likewise.
 * gcc.dg/tree-ssa/vrp06.c: Likewise.
 * gcc.dg/tree-ssa/vrp07.c: Likewise.
 * gcc.dg/tree-ssa/vrp09.c: Likewise.
 * gcc.dg/tree-ssa/vrp19.c: Likewise.
 * gcc.dg/tree-ssa/vrp20.c: Likewise.
 * gcc.dg/tree-ssa/vrp92.c: Likewise.
 * gcc.dg/pr68217.c: Likewise.
 * gcc.dg/predict-9.c: Likewise.
 * gcc.dg/tree-prof/val-prof-5.c: Adjust.
 * gcc.dg/predict-1.c: Likewise.



 Index: gcc/tree-vrp.c
 ===
 --- gcc/tree-vrp.c  (revision 241242)
 +++ gcc/tree-vrp.c  (working copy)
 @@ -10741,12 +10741,13 @@ evrp_dom_walker::before_dom_children (ba
gimple_stmt_iterator gsi;
edge e;
edge_iterator ei;
 -  bool has_unvisived_preds = false;
 +  bool has_unvisited_preds = false;

FOR_EACH_EDGE (e, ei, bb->preds)
 -if (!(e->src->flags & BB_VISITED))
 +if (e->flags & EDGE_EXECUTABLE
 +   && !(e->src->flags & BB_VISITED))
{
 -   has_unvisived_preds = true;
 +   has_unvisited_preds = true;
 break;
}

 @@ -10756,7 +10757,7 @@ evrp_dom_walker::before_dom_children (ba
gphi *phi = gpi.phi ();
tree lhs = PHI_RESULT (phi);
value_range vr_result = VR_INITIALIZER;
 -  if (!has_unvisived_preds
 +  if (!has_unvisited_preds
   && stmt_interesting_for_vrp (phi))
 extract_range_from_phi_node (phi, &vr_result);
else
 @@ -10764,81 +10765,90 @@ evrp_dom_walker::before_dom_children (ba
update_value_range (lhs, &vr_result);
>>>

Re: Use version namespace in normal mode

2016-10-21 Thread François Dumont

Hi

I configured libstdc++ to use gnu-versioned-namespace and there are a
number of failures, see below. But none of them are related to this patch,
so is it OK to commit?


The results:

FAIL: libstdc++-abi/abi_check

3709 symbols reported as added. I don't know what to think about
it. I see a gnu-versioned-namespace.ver in config/abi/pre; is it the
list of symbols to support when the versioned namespace is activated? The
list looks pretty limited.


FAIL: 18_support/headers/limits/synopsis.cc (test for excess errors)
FAIL: 20_util/headers/functional/synopsis.cc (test for excess errors)
FAIL: 20_util/headers/memory/synopsis.cc (test for excess errors)
FAIL: 20_util/headers/utility/synopsis.cc (test for excess errors)
FAIL: 21_strings/headers/string/synopsis.cc (test for excess errors)
FAIL: 22_locale/headers/locale/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/bitset/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/deque/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/forward_list/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/list/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/map/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/queue/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/set/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/stack/synopsis.cc (test for excess errors)
FAIL: 23_containers/headers/vector/synopsis.cc (test for excess errors)
FAIL: 24_iterators/headers/iterator/synopsis.cc (test for excess errors)
FAIL: 24_iterators/headers/iterator/synopsis_c++11.cc (test for excess errors)
FAIL: 24_iterators/headers/iterator/synopsis_c++14.cc (test for excess errors)
FAIL: 24_iterators/headers/iterator/synopsis_c++17.cc (test for excess errors)
FAIL: 26_numerics/headers/complex/synopsis.cc (test for excess errors)
FAIL: 26_numerics/headers/valarray/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/fstream/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/ios/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/istream/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/ostream/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/sstream/synopsis.cc (test for excess errors)
FAIL: 27_io/headers/streambuf/synopsis.cc (test for excess errors)
FAIL: tr1/2_general_utilities/headers/functional/synopsis.cc (test for excess errors)
FAIL: tr1/2_general_utilities/headers/memory/synopsis.cc (test for excess errors)
FAIL: tr1/3_function_objects/headers/functional/synopsis.cc (test for excess errors)
FAIL: tr1/4_metaprogramming/headers/type_traits/synopsis.cc (test for excess errors)
FAIL: tr1/6_containers/headers/array/synopsis.cc (test for excess errors)
FAIL: tr1/6_containers/headers/unordered_map/synopsis.cc (test for excess errors)
FAIL: tr1/6_containers/headers/unordered_set/synopsis.cc (test for excess errors)


All those failures come from declarations or explicit
instantiations of template types expected to be in std but actually
located in std::__7. Should I add usage of
_GLIBCXX_BEGIN_NAMESPACE_VERSION/_GLIBCXX_END_NAMESPACE_VERSION in
those files? Or introduce a dg-require-no-versioned-namespace?



FAIL: 17_intro/using_namespace_std_tr1_neg.cc  (test for errors, line 65)
FAIL: 21_strings/basic_string/cons/char/7.cc (test for excess errors)
FAIL: 21_strings/basic_string/cons/wchar_t/7.cc (test for excess errors)
FAIL: 21_strings/basic_string/lwg2758.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/append/char/4.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/append/wchar_t/4.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/assign/char/4.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/assign/wchar_t/4.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/insert/char/3.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/insert/wchar_t/3.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/replace/char/7.cc (test for excess errors)
FAIL: 21_strings/basic_string/modifiers/replace/wchar_t/7.cc (test for excess errors)
FAIL: 21_strings/basic_string/operations/compare/char/2.cc (test for excess errors)
FAIL: 21_strings/basic_string/operations/compare/wchar_t/2.cc (test for excess errors)
FAIL: 21_strings/basic_string/operations/find/char/5.cc (test for excess errors)
FAIL: 21_strings/basic_string/operations/find/wchar_t/5.cc (test for excess errors)
FAIL: 21_strings/basic_string/operators/char/5.cc (test for excess errors)
FAIL: 21_strings/basic_string/operators/wchar_t/5.cc (test for excess errors)
FAIL: 21_strings/basic_string_view/cons/char/1.cc (test for excess errors)
FAIL: 21_strings/basic_string_view/cons/wchar_t/1.cc (test for excess errors)
FAIL: 21_strings/basic_string_view/inserters/char/1.cc (test for excess errors)
FAIL: 21_strings/basic_string_view/inserters/char/2

Re: [PATCH][v6] GIMPLE store merging pass

2016-10-21 Thread Richard Biener
On October 21, 2016 3:29:15 PM GMT+02:00, Kyrill Tkachov 
 wrote:
>Hi Richard,
>
>On 21/10/16 13:37, Richard Biener wrote:
>> On Tue, 18 Oct 2016, Kyrill Tkachov wrote:
>>
>>> Hi Richard,
>>>
>>> This patch is a merge of [1] and [2] and implements the manual merging of
>>> bitfields as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN
>>> too.  It caused me a lot of headache because the bit offset is counted from
>>> the most significant bit in the byte, even though BITS_BIG_ENDIAN was 0
>>> (BITS_BIG_ENDIAN looks irrelevant for store merging anyway as it's just
>>> used to describe RTL extract operations).
>>> I've included ASCII diagrams of the steps in the merging algorithm.
>> Heh, thanks.
>>
>>> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
>>> x86_64-unknown-linux-gnu.
>>> Also tested on aarch64_be-none-elf.
>>>
>>> How does this version look now?
>> Mostly good.  For
>>
>> +bool
>> +pass_store_merging::terminate_all_aliasing_chains (tree dest, tree base,
>> +  gimple *stmt)
>> +{
>> ...
>> +  /* Check if the assignment destination (BASE) is part of a store chain.
>> + This is to catch non-constant stores to destinations that may be part
>> + of a chain.  */
>> +  if (base)
>> +{
>> +  chain_info = m_stores.get (base);
>> +  if (chain_info)
>> +   {
>> + struct store_immediate_info *info;
>> + unsigned int i;
>> + FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
>> +   {
>> + if (refs_may_alias_p (info->dest, dest))
>> +   {
>>
>> I suppose the chain is not yet sorted in any way?
>>
>> At least for 'dest' which do not have a known constant offset we
>> could do
>>
>> if (base)
>>   terminate_and_release_chain (base);
>
>Do you mean when get_inner_reference returns non-NULL for POFFSET?

Yes.

>Or do you think we should try to look into dest in this function?
>
>> to speed things up?  IIRC we do not terminate chains early in
>> this phase when we have enough stores to form a group, so
>> writing a testcase that triggers quadraticness would be as simple
>> as having
>>
>> char a[1000];
>>
>> void foo ()
>> {
>>   a[0] = 1;
>>   a[1] = 2;
>>   ...
>>   a[999] = 3;
>> }
>>
>> ?
>>
>> so I think you probably want to limit the number of stores you
>> ever put onto a chain and if you reach that limit, terminate
>> and release it?  Like just choose 16 or 64?  (and experiment
>> with the above kind of testcases)
>
>I was initially thinking of imposing such a limit as well but
>later I thought we'd want to extend the output code to be able to emit
>a memcpy (or memset) call for large regions, so detecting the largest
>possible regions would be needed.

But then we need a better data structure here to avoid the quadraticness.

>But that is not implemented yet (though I have experimented
>with it) so I can add a limit here. Should I just hardcode a limit or
>should I make it into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN or something)?

A param is preferred.
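A minimal C++ sketch of the chain-length limit discussed above, assuming a store can be modelled as a (base, offset, value) triple grouped into one chain per base. `chain_tracker`, `record` and `terminate_and_release` are hypothetical stand-ins, not the pass's real data structures, and `max_stores` plays the role of the suggested --param (something like the proposed MAX_STMTS_IN_STORE_MERGING_CHAIN):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a store: real GIMPLE stores carry trees.
struct store {
  std::string base;
  long offset;
  int value;
};

class chain_tracker {
 public:
  // max_stores stands in for a --param limiting the chain length.
  explicit chain_tracker(std::size_t max_stores) : max_stores_(max_stores) {}

  // Record a store; returns true if it caused its chain to be terminated.
  bool record(const store &s) {
    std::vector<store> &chain = chains_[s.base];
    chain.push_back(s);
    if (chain.size() < max_stores_)
      return false;
    terminate_and_release(s.base);
    return true;
  }

  std::size_t terminated() const { return terminated_; }

  std::size_t pending(const std::string &base) const {
    auto it = chains_.find(base);
    return it == chains_.end() ? 0 : it->second.size();
  }

 private:
  void terminate_and_release(const std::string &base) {
    // A real pass would try to merge the chain's stores here before
    // dropping it; this model only counts the termination.
    chains_.erase(base);
    ++terminated_;
  }

  std::size_t max_stores_;
  std::size_t terminated_ = 0;
  std::map<std::string, std::vector<store>> chains_;
};
```

Because no chain ever holds more than `max_stores` entries, any per-chain scan (such as the aliasing walk in terminate_all_aliasing_chains) is bounded by the limit rather than by the number of stores in the function.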

>>
>> + bit_off = byte_off << LOG2_BITS_PER_UNIT;
>> + if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off))
>> +   {
>> +   {
>> + bitpos += bit_off.to_shwi ();
>> +
>>
>> I think you want bit_off += bitpos before the fits_shwi check
>> otherwise this add may still overflow.
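The point about ordering can be illustrated with a small C++ sketch: do the addition in a wide type first and only then check that the sum fits a signed 64-bit HOST_WIDE_INT. Checking the shifted byte offset alone and adding `bitpos` afterwards can still overflow. This assumes `__int128` (a GCC/Clang extension) as a stand-in for GCC's wide offset type and 8-bit bytes; `fold_offset` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Stand-in for a wide offset type; __int128 is a GCC/Clang extension.
using wide_offset = __int128;

constexpr int log2_bits_per_unit = 3;  // assumes 8-bit bytes

// Fold a byte offset into an existing bit position.  The sum is formed
// in the wide type *before* the range check, so converting the checked
// result to int64_t cannot overflow (assuming byte_off is far from the
// limits of the wide type).
std::optional<std::int64_t> fold_offset(wide_offset byte_off,
                                        std::int64_t bitpos) {
  wide_offset bit_off = byte_off * (wide_offset{1} << log2_bits_per_unit);
  bit_off += bitpos;
  if (bit_off < 0 || bit_off > std::numeric_limits<std::int64_t>::max())
    return std::nullopt;
  return static_cast<std::int64_t>(bit_off);
}
```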
>>
>> + base_addr = copy_node (base_addr);
>> + TREE_OPERAND (base_addr, 1)
>> +   = build_zero_cst (TREE_TYPE (TREE_OPERAND (base_addr, 1)));
>>
>> I'd prefer
>>
>>base_addr = build2 (MEM_REF, ...);
>>
>> here.
>
>Thanks for the feedback,
>Kyrill
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Kyrill
>>>
>>> [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html
>>> [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html
>>>
>>> 2016-10-18  Kyrylo Tkachov  
>>>
>>>  PR middle-end/22141
>>>  * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
>>>  * common.opt (fstore-merging): New Optimization option.
>>>  * opts.c (default_options_table): Add entry for
>>>  OPT_ftree_store_merging.
>>>  * fold-const.h (can_native_encode_type_p): Declare prototype.
>>>  * fold-const.c (can_native_encode_type_p): Define.
>>>  * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
>>>  * passes.def: Insert pass_tree_store_merging.
>>>  * tree-pass.h (make_pass_store_merging): Declare extern
>>>  prototype.
>>>  * gimple-ssa-store-merging.c: New file.
>>>  * doc/invoke.texi (Optimization Options): Document
>>>  -fstore-merging.
>>>
>>> 2016-10-18  Kyrylo Tkachov  
>>>  Jakub Jelinek  
>>>  Andrew Pinski  
>>>
>>>  PR middle-end/22141
>>>  PR rtl-optimization/23684
>>>  * gcc.c-torture/execute/pr2

[COMMITED] PR78055 Many new gfortran test failures

2016-10-21 Thread Jerry DeLisle

Committed as obvious.

M   libgfortran/ChangeLog
M   libgfortran/io/io.h
r241422 = 268e62788f36198cb64a3ce953daacbd3b0107ee (refs/remotes/svn/trunk)


2016-10-21  Jerry DeLisle  

PR libfortran/78055
* io/io.h (st_parameter_dt): Restore GFC_IO_INT to maintain
alignment.

--- trunk/libgfortran/io/io.h   2016/10/21 17:27:15 241421
+++ trunk/libgfortran/io/io.h   2016/10/21 18:02:32 241422
@@ -514,6 +514,7 @@
 large enough to hold a complex value (two reals) of the
 largest kind.  */
  char value[32];
+ GFC_IO_INT not_used; /* Needed for alignment. */
  formatted_dtio fdtio_ptr;
  unformatted_dtio ufdtio_ptr;
} p;
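The fix restores a padding member ahead of the DTIO function pointers. A minimal C++ sketch of what such a member does to the offsets of the fields that follow it, assuming GFC_IO_INT is a 64-bit integer and simplifying the DTIO pointer types to plain function pointers (both struct names are hypothetical). Code built against one layout would read the wrong slots if handed the other:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumptions for this sketch only.
using gfc_io_int = std::int64_t;
using dtio_fn = void (*)();

struct layout_with_pad {
  char value[32];
  gfc_io_int not_used;  // padding member, as restored by the patch
  dtio_fn fdtio_ptr;
  dtio_fn ufdtio_ptr;
};

struct layout_without_pad {
  char value[32];
  dtio_fn fdtio_ptr;
  dtio_fn ufdtio_ptr;
};

// Dropping the member shifts every later field by its size.
static_assert(offsetof(layout_with_pad, fdtio_ptr) -
                      offsetof(layout_without_pad, fdtio_ptr) ==
                  sizeof(gfc_io_int),
              "padding member shifts the following fields");
```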




Re: [ada, testsuite] Parallelize check-gnat

2016-10-21 Thread Mike Stump
On Oct 21, 2016, at 7:01 AM, Rainer Orth  wrote:
> 
> I happened to notice that the gnat.dg testsuite run is slow

>  2.6 GHz AMD Opteron 8435, -j24   43m 24s => 33m 4s
>  2.93 GHz Intel Xeon X7350, -j16   30m 7s  =>  9m 8s
>  2.67 GHz Intel Xeon X7542, -j48   14m 56s =>  5m 50s
> 
> Seems like a worthwhile speedup to me.

> Ok for mainline

I like the change as well (if it shortens bootstrap and/or check).
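Converting the quoted wall-clock timings to seconds gives speedups of roughly 1.31x (Opteron 8435, -j24), 3.30x (Xeon X7350, -j16) and 2.56x (Xeon X7542, -j48). The arithmetic as a small C++ helper (function names are illustrative):

```cpp
#include <cassert>

// Convert a "minutes, seconds" wall-clock time into seconds.
constexpr int to_seconds(int minutes, int seconds) {
  return minutes * 60 + seconds;
}

// Speedup of the parallel run relative to the serial one.
constexpr double speedup(int before_s, int after_s) {
  return static_cast<double>(before_s) / after_s;
}
```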



Re: RFC [1/3] divmod transform v2

2016-10-21 Thread Jeff Law

On 10/20/2016 03:32 AM, Richard Biener wrote:

>> Starting with some high level design comments.  If these conflict with
>> comments from others, let me know and we'll work through the issues.
>>
>> I don't really like introducing code conditional on the target capabilities
>> this early in the gimple optimization pipeline.
>
> It's basically done right before RTL expansion
> (pass_optimize_widening_mul).

I didn't look at the placement of the pass as it hadn't changed. 
Looking at that now, I see it's fairly late in the pipeline, so I won't 
object to its placement.






>> Would it be possible to always do the transformation to divmod in the gimple
>> optimizers, regardless of the target capabilities?  Then in the gimple->RTL
>> expanders make a final decision about divmod insn, libcall, or using div/mod
>> insns?
>
> The issue is that it hoists one or both of the division and the modulo, and
> if we don't do the transform we'd want to undo that code motion.

But don't we only hoist when we find a suitable div/mod pair to turn 
into a divmod?   Maybe I'm mis-understanding how the identification works.
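At the source level, the combination under discussion replaces a division and a modulo with the same operands by a single operation that yields both results. A minimal C++ analogy, using `std::div` (which, like `/` and `%`, truncates toward zero) as a stand-in for the combined divmod. The real transform works on GIMPLE, not C++ source, and the function names here are hypothetical:

```cpp
#include <cassert>
#include <cstdlib>
#include <initializer_list>

// Before: two separate operations on the same operands
// (TRUNC_DIV_EXPR and TRUNC_MOD_EXPR in GIMPLE terms).
void div_and_mod(int a, int b, int &q, int &r) {
  q = a / b;
  r = a % b;
}

// After: one combined operation producing quotient and remainder.
void combined_divmod(int a, int b, int &q, int &r) {
  const std::div_t d = std::div(a, b);
  q = d.quot;
  r = d.rem;
}
```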


[ snip ]


*/

+
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
+{
+  if (is_gimple_assign (use_stmt)
+ && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
+ || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
+ && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
+ && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))

>> So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to be
>> TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
>> TRUNC_DIV_EXPR statement.
>>
>> The only way I could see TRUNC_MOD_EXPR being useful here would be if there is
>> a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some other
>> optimizer hasn't done its job.  Have you seen this to be useful in practice?
>
> Note that I've reviewed these parts already and we restructured things
> in this way, which requires looking for TRUNC_MOD_EXPR to properly handle
> finding a dominating trunc _or_ div and iteratively doing so
> correctly if we have more than one pair.

But doesn't having more than one pair essentially mean that the other
optimizers have left redundant junk in the IL?  I'm clearly missing
something here.




+
+  gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+  gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
+  update_stmt (use_stmt);
+
+  if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
+   cfg_changed = true;

>> Does this ever happen given how you filter out throwing statements earlier?
>
> We filter out throwing stmts that are not "top".  The top one we replace
> with the divmod call and we can preserve EH for that (and thus handle
> the case where the original div/mod at this place may throw).

OK.  It just struck me as a bit odd.  No worries if y'all have already 
worked through this.



Jeff


Re: [ada, testsuite] Parallelize check-gnat

2016-10-21 Thread Mike Stump
On Oct 21, 2016, at 9:54 AM, Eric Botcazou  wrote:
> 
>> I'm not strongly against your patch, I'm just very surprised it is really
>> needed (acats is much larger, check-gnat is small).
> 
> In what unit do you count?  ACATS has fewer tests than gnat.dg nowadays.

The only unit that matters, wall seconds.



Re: [PATCH] PR debug/77315 - use DW_OP_form_tls_address

2016-10-21 Thread Tom Tromey
> "Jakub" == Jakub Jelinek  writes:

Jakub> I admit I haven't looked at the GDB changes, but how will the debugger know
Jakub> if it is an emutls emulation or normal ELF TLS, if the same op is used
Jakub> in both cases?

Because gdb never implemented DW_OP_form_tls_address, that emutls case
has never worked.

My view is that, if it ought to work, then this has to be some sort of
platform-specific thing.  gdb already lets different platforms implement
TLS lookup differently; it can be done by the target or by the arch, see
target_translate_tls_address.

My suspicion is that the history here is just that
DW_OP_GNU_push_tls_address was introduced to solve this problem, then
formalized in DWARF, and then for some reason never followed up on.

Jakub> Also, as this effectively requires the latest unreleased GDB under the
Jakub> default options for something that has been working previously, I wonder
Jakub> if it e.g. for some time shouldn't be guarded with dwarf_version >= 5
Jakub> (which will require substantial changes in gdb anyway), and only be changed
Jakub> back to dwarf_version >= 3 in 2 years or so when newer debugger will be much
Jakub> more common.

I'd be amenable to that.  It did occur to me after sending that the
DWARF 5 work is going to require a new gdb anyhow.
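The posted patch selects the opcode with `dwarf_version > 2`; the suggestion above is to raise that threshold to DWARF 5 for now. A minimal C++ sketch of the selection with the threshold as a parameter. The helper name is hypothetical; the opcode values are 0x9b for the standard DW_OP_form_tls_address and 0xe0 for the GNU extension:

```cpp
#include <cassert>

// DW_OP_form_tls_address is the DWARF 3 standard opcode;
// DW_OP_GNU_push_tls_address is the older GNU vendor extension.
enum dwarf_tls_op : unsigned char {
  DW_OP_form_tls_address = 0x9b,
  DW_OP_GNU_push_tls_address = 0xe0,
};

// Hypothetical helper: emit the standard opcode only when the requested
// DWARF version reaches min_version (3 in the posted patch, 5 under the
// proposal to wait until newer debuggers are common).
dwarf_tls_op choose_tls_op(int dwarf_version, int min_version) {
  return dwarf_version >= min_version ? DW_OP_form_tls_address
                                      : DW_OP_GNU_push_tls_address;
}
```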

Tom


Re: [PATCH] PR debug/77315 - use DW_OP_form_tls_address

2016-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2016 at 11:17:33AM -0600, Tom Tromey wrote:
> This patch changes gcc to emit DW_OP_form_tls_address rather than
> DW_OP_GNU_push_tls_address.  This is PR debug/77315.
> 
> DW_OP_form_tls_address was added in DWARF 3, and this patch uses the
> DWARF version to decide which opcode to emit.
> 
> Note that gdb did not implement the DWARF 3 opcode until recently -- in
> fact it isn't even in 7.12 (I thought I had put it in there but it seems
> not).  I'm not sure if that means that this should wait a cycle.
> 
> Built and regtested on x86-64 Fedora 24.

I admit I haven't looked at the GDB changes, but how will the debugger know
if it is an emutls emulation or normal ELF TLS, if the same op is used
in both cases?

Also, as this effectively requires the latest unreleased GDB under the
default options for something that has been working previously, I wonder
if it e.g. for some time shouldn't be guarded with dwarf_version >= 5
(which will require substantial changes in gdb anyway), and only be changed
back to dwarf_version >= 3 in 2 years or so when newer debugger will be much
more common.

> 2016-10-21  Tom Tromey  
> 
>   PR debug/77315:
>   * dwarf2out.c (mem_loc_descriptor): Use DW_OP_form_tls_address.
>   (resolve_args_picking_1): Move DW_OP_form_tls_address case next to
>   DW_OP_GNU_push_tls_address case.
>   (loc_list_from_tree_1): Use DW_OP_form_tls_address.
> ---
>  gcc/ChangeLog   |  8 
>  gcc/dwarf2out.c | 12 
>  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 6102719..481a2a9 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,11 @@
> +2016-10-21  Tom Tromey  
> +
> + PR debug/77315:
> + * dwarf2out.c (mem_loc_descriptor): Use DW_OP_form_tls_address.
> + (resolve_args_picking_1): Move DW_OP_form_tls_address case next to
> + DW_OP_GNU_push_tls_address case.
> + (loc_list_from_tree_1): Use DW_OP_form_tls_address.
> +
>  2016-10-21  Jakub Jelinek  
>  
>   * config/i386/adxintrin.h (_subborrow_u32, _addcarry_u32,
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 4683e1c..6ab6ff8 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -13619,7 +13619,10 @@ mem_loc_descriptor (rtx rtl, machine_mode mode,
>  
>temp = new_addr_loc_descr (rtl, dtprel_true);
>  
> -   mem_loc_result = new_loc_descr (DW_OP_GNU_push_tls_address, 0, 0);
> +   mem_loc_result = new_loc_descr ((dwarf_version > 2
> +? DW_OP_form_tls_address
> +: DW_OP_GNU_push_tls_address),
> +   0, 0);
> add_loc_descr (&mem_loc_result, temp);
>  
> break;
> @@ -15467,7 +15470,6 @@ resolve_args_picking_1 (dw_loc_descr_ref loc, unsigned initial_frame_offset,
>   case DW_OP_piece:
>   case DW_OP_deref_size:
>   case DW_OP_nop:
> - case DW_OP_form_tls_address:
>   case DW_OP_bit_piece:
>   case DW_OP_implicit_value:
>   case DW_OP_stack_value:
> @@ -15595,6 +15597,7 @@ resolve_args_picking_1 (dw_loc_descr_ref loc, unsigned initial_frame_offset,
>   break;
> }
>  
> + case DW_OP_form_tls_address:
>   case DW_OP_GNU_push_tls_address:
>   case DW_OP_GNU_uninit:
>   case DW_OP_GNU_encoded_addr:
> @@ -15924,8 +15927,9 @@ loc_list_from_tree_1 (tree loc, int want_address,
> operand shouldn't be.  */
> if (DECL_EXTERNAL (loc) && !targetm.binds_local_p (loc))
>   return 0;
> - dtprel = dtprel_true;
> - tls_op = DW_OP_GNU_push_tls_address;
> +   dtprel = dtprel_true;
> +   tls_op = (dwarf_version > 2 ? DW_OP_form_tls_address
> + : DW_OP_GNU_push_tls_address);
>   }
> else
>   {
> -- 
> 2.7.4

Jakub


[PATCH] PR debug/77315 - use DW_OP_form_tls_address

2016-10-21 Thread Tom Tromey
This patch changes gcc to emit DW_OP_form_tls_address rather than
DW_OP_GNU_push_tls_address.  This is PR debug/77315.

DW_OP_form_tls_address was added in DWARF 3, and this patch uses the
DWARF version to decide which opcode to emit.

Note that gdb did not implement the DWARF 3 opcode until recently -- in
fact it isn't even in 7.12 (I thought I had put it in there but it seems
not).  I'm not sure if that means that this should wait a cycle.

Built and regtested on x86-64 Fedora 24.

2016-10-21  Tom Tromey  

PR debug/77315:
* dwarf2out.c (mem_loc_descriptor): Use DW_OP_form_tls_address.
(resolve_args_picking_1): Move DW_OP_form_tls_address case next to
DW_OP_GNU_push_tls_address case.
(loc_list_from_tree_1): Use DW_OP_form_tls_address.
---
 gcc/ChangeLog   |  8 
 gcc/dwarf2out.c | 12 
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6102719..481a2a9 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,11 @@
+2016-10-21  Tom Tromey  
+
+   PR debug/77315:
+   * dwarf2out.c (mem_loc_descriptor): Use DW_OP_form_tls_address.
+   (resolve_args_picking_1): Move DW_OP_form_tls_address case next to
+   DW_OP_GNU_push_tls_address case.
+   (loc_list_from_tree_1): Use DW_OP_form_tls_address.
+
 2016-10-21  Jakub Jelinek  
 
* config/i386/adxintrin.h (_subborrow_u32, _addcarry_u32,
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 4683e1c..6ab6ff8 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -13619,7 +13619,10 @@ mem_loc_descriptor (rtx rtl, machine_mode mode,
 
   temp = new_addr_loc_descr (rtl, dtprel_true);
 
- mem_loc_result = new_loc_descr (DW_OP_GNU_push_tls_address, 0, 0);
+ mem_loc_result = new_loc_descr ((dwarf_version > 2
+  ? DW_OP_form_tls_address
+  : DW_OP_GNU_push_tls_address),
+ 0, 0);
  add_loc_descr (&mem_loc_result, temp);
 
  break;
@@ -15467,7 +15470,6 @@ resolve_args_picking_1 (dw_loc_descr_ref loc, unsigned initial_frame_offset,
case DW_OP_piece:
case DW_OP_deref_size:
case DW_OP_nop:
-   case DW_OP_form_tls_address:
case DW_OP_bit_piece:
case DW_OP_implicit_value:
case DW_OP_stack_value:
@@ -15595,6 +15597,7 @@ resolve_args_picking_1 (dw_loc_descr_ref loc, unsigned initial_frame_offset,
break;
  }
 
+   case DW_OP_form_tls_address:
case DW_OP_GNU_push_tls_address:
case DW_OP_GNU_uninit:
case DW_OP_GNU_encoded_addr:
@@ -15924,8 +15927,9 @@ loc_list_from_tree_1 (tree loc, int want_address,
  operand shouldn't be.  */
  if (DECL_EXTERNAL (loc) && !targetm.binds_local_p (loc))
return 0;
- dtprel = dtprel_true;
- tls_op = DW_OP_GNU_push_tls_address;
+ dtprel = dtprel_true;
+ tls_op = (dwarf_version > 2 ? DW_OP_form_tls_address
+   : DW_OP_GNU_push_tls_address);
}
  else
{
-- 
2.7.4



[PATCH] Start adding target-specific selftests

2016-10-21 Thread David Malcolm
On Fri, 2016-10-21 at 12:04 +0200, Bernd Schmidt wrote:
> On 10/21/2016 02:36 AM, David Malcolm wrote:
> > +  /* Test dumping of hard regs.  This is inherently target-specific due
> > + to the name.  */
> > +#ifdef I386_OPTS_H
> > +  ASSERT_RTL_DUMP_EQ ("(reg:SI ax)", gen_raw_REG (SImode, 0));
> > +#endif
> 
> Generally putting in target dependencies like this isn't something we
> like to do. The patch is OK without this part, and we can revisit this,
> but maybe there wants to be a target hook for running target-specific
> selftests.

Thanks.  I removed the above target-specific part, and committed it
as r241405 (having reverified bootstrap&regrtest on x86_64-pc-linux-gnu).

The following patch implements a target hook for running target-specific
selftests.

It implements the above test for dumping of hard regs, putting it
within i386.c.

It's rather trivial, but I have followups that add further
target-specific tests, so hopefully this foundation is OK.
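A minimal C++ model of the dispatch described in the ChangeLog below, where selftest::run_tests calls targetm.run_target_selftests only if the hook is non-NULL. The struct and function names are hypothetical simplifications, not GCC's real struct gcc_target:

```cpp
#include <cassert>

// Hypothetical, heavily simplified target vector with one optional hook.
struct target_hooks {
  void (*run_target_selftests)(void);  // null for targets without tests
};

static int target_selftest_calls = 0;

// Stand-in for a port's hook such as ix86_run_selftests.
static void demo_target_selftests(void) { ++target_selftest_calls; }

// Run generic selftests, then the target's own tests if it has any.
void run_all_selftests(const target_hooks &t) {
  // ... target-independent selftests would run here ...
  if (t.run_target_selftests)
    t.run_target_selftests();
}
```

Making the hook optional means only ports that opt in (as i386 does here under CHECKING_P) pay for target-specific tests.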

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

OK for trunk?
 
> > +  ASSERT_RTL_DUMP_EQ ("(cjump_insn (set (pc)\n"
> > + "(label_ref 0))\n"
> > + " (nil))\n",
> > + jump_insn);
> >  }
> 
> I do wonder about the (nil)s and whether we can eliminate them.

I hope to.

gcc/ChangeLog:
* config/i386/i386.c: Include "selftest.h" and "selftest-rtl.h".
(selftest::ix86_test_dumping_hard_regs): New function.
(selftest::ix86_run_selftests): New function.
(TARGET_RUN_TARGET_SELFTESTS): When CHECKING_P, wire this up to
selftest::ix86_run_selftests.
* doc/tm.texi.in (TARGET_RUN_TARGET_SELFTESTS): New.
* doc/tm.texi: Regenerate.
* rtl-tests.c: Include "selftest-rtl.h".
(selftest::assert_rtl_dump_eq): Make non-static.
(ASSERT_RTL_DUMP_EQ): Move to selftest-rtl.h.
(selftest::test_dumping_regs): Update comment.
* selftest-rtl.h: New file.
* selftest-run-tests.c: Include "target.h".
(selftest::run_tests): If non-NULL, call
targetm.run_target_selftests.
* target.def (run_target_selftests): New hook.
---
 gcc/config/i386/i386.c   | 34 ++
 gcc/doc/tm.texi  |  4 
 gcc/doc/tm.texi.in   |  2 ++
 gcc/rtl-tests.c  | 10 +++---
 gcc/selftest-rtl.h   | 45 +
 gcc/selftest-run-tests.c |  5 +
 gcc/target.def   |  6 ++
 7 files changed, 99 insertions(+), 7 deletions(-)
 create mode 100644 gcc/selftest-rtl.h

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 3e6f8fd..8f6ceb4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -77,6 +77,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "case-cfn-macros.h"
 #include "regrename.h"
 #include "dojump.h"
+#include "selftest.h"
+#include "selftest-rtl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -50365,6 +50367,33 @@ ix86_addr_space_zero_address_valid (addr_space_t as)
 #undef TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID
 #define TARGET_ADDR_SPACE_ZERO_ADDRESS_VALID ix86_addr_space_zero_address_valid
 
+/* Target-specific selftests.  */
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Verify that hard regs are dumped as expected (in compact mode).  */
+
+static void
+ix86_test_dumping_hard_regs ()
+{
+  ASSERT_RTL_DUMP_EQ ("(reg:SI ax)", gen_raw_REG (SImode, 0));
+  ASSERT_RTL_DUMP_EQ ("(reg:SI dx)", gen_raw_REG (SImode, 1));
+}
+
+/* Run all target-specific selftests.  */
+
+static void
+ix86_run_selftests (void)
+{
+  ix86_test_dumping_hard_regs ();
+}
+
+} // namespace selftest
+
+#endif /* CHECKING_P */
+
 /* Initialize the GCC target structure.  */
 #undef TARGET_RETURN_IN_MEMORY
 #define TARGET_RETURN_IN_MEMORY ix86_return_in_memory
@@ -50840,6 +50869,11 @@ ix86_addr_space_zero_address_valid (addr_space_t as)
 #undef TARGET_CUSTOM_FUNCTION_DESCRIPTORS
 #define TARGET_CUSTOM_FUNCTION_DESCRIPTORS 1
 
+#if CHECKING_P
+#undef TARGET_RUN_TARGET_SELFTESTS
+#define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
+#endif /* #if CHECKING_P */
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-i386.h"
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 29dc73b..7efcf57 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11821,3 +11821,7 @@ All and all it does not take long to convert ports that the
 maintainer is familiar with.
 
 @end defmac
+
+@deftypefn {Target Hook} void TARGET_RUN_TARGET_SELFTESTS (void)
+If selftests are enabled, run any selftests for this target.
+@end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index efcd741..fb94dd8 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8307,3 +8307,5 @@ All and all it does not take long to convert ports that the
 maintainer is familiar with.
 
 @end defmac
+
+@hook TARGET_RUN_TARGET_SELFTESTS
diff --git a/gcc/rtl-tests.c b/gcc/rtl-tests.c

[PATCH] Three patches for std::experimental::filesystem

2016-10-21 Thread Jonathan Wakely

This implements some DR resolutions for the filesystem lib.

Tested x86_64-linux, committed to trunk.


commit 03db1baaa50ea8d97b4442fffaae4e68a03eebad
Author: Jonathan Wakely 
Date:   Thu Oct 20 19:26:47 2016 +0100

LWG2720 implement filesystem::perms::symlink_nofollow

	* include/experimental/bits/fs_fwd.h (perms::resolve_symlinks):
	Replace with symlink_nofollow (LWG 2720).
	* src/filesystem/ops.cc (permissions(const path&, perms, error_code&)):
	Handle symlink_nofollow.
	* testsuite/experimental/filesystem/operations/create_symlink.cc: New
	test.
	* testsuite/experimental/filesystem/operations/permissions.cc: Test
	overload taking error_code.
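A minimal C++ model of how permissions() interprets the flag bits after LWG 2720: symlink_nofollow switches the status query to symlink_status and the fchmodat flag to AT_SYMLINK_NOFOLLOW (and, without fchmodat, yields operation_not_supported). The flag values are the Filesystem TS ones; `chmod_plan` and `plan_permissions` are hypothetical, and the way add/remove combine with the current mode is an assumption about code outside the hunk shown:

```cpp
#include <cassert>

// Simplified model of the TS perms bitmask (only the enumerators needed).
enum class perms : unsigned {
  mask = 07777,
  add_perms = 0x10000,
  remove_perms = 0x20000,
  symlink_nofollow = 0x40000,
};

constexpr bool is_set(perms p, perms bit) {
  return (static_cast<unsigned>(p) & static_cast<unsigned>(bit)) != 0;
}

// Hypothetical summary of what a permissions() call would do.
struct chmod_plan {
  bool valid;            // false if add_perms and remove_perms are both set
  bool follow_symlinks;  // false: use symlink_status and AT_SYMLINK_NOFOLLOW
  unsigned new_mode;     // permission bits to apply
};

// current_mode is what status(), or symlink_status() when nofollow is
// set, reports for the path.
chmod_plan plan_permissions(perms prms, unsigned current_mode) {
  const bool add = is_set(prms, perms::add_perms);
  const bool remove = is_set(prms, perms::remove_perms);
  const bool nofollow = is_set(prms, perms::symlink_nofollow);
  if (add && remove)
    return {false, !nofollow, 0};  // ops.cc reports invalid_argument here
  unsigned bits =
      static_cast<unsigned>(prms) & static_cast<unsigned>(perms::mask);
  if (add)
    bits |= current_mode;
  else if (remove)
    bits = current_mode & ~bits;
  return {true, !nofollow, bits};
}
```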

diff --git a/libstdc++-v3/include/experimental/bits/fs_fwd.h b/libstdc++-v3/include/experimental/bits/fs_fwd.h
index 1c08b19..fb8521a 100644
--- a/libstdc++-v3/include/experimental/bits/fs_fwd.h
+++ b/libstdc++-v3/include/experimental/bits/fs_fwd.h
@@ -162,7 +162,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
   unknown		=  0xFFFF,
   add_perms		= 0x10000,
   remove_perms	= 0x20000,
-  resolve_symlinks	= 0x40000
+  symlink_nofollow	= 0x40000
   };
 
   constexpr perms
diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc
index 6b38584..68343a9 100644
--- a/libstdc++-v3/src/filesystem/ops.cc
+++ b/libstdc++-v3/src/filesystem/ops.cc
@@ -1101,6 +1101,7 @@ void fs::permissions(const path& p, perms prms, error_code& ec) noexcept
 {
   const bool add = is_set(prms, perms::add_perms);
   const bool remove = is_set(prms, perms::remove_perms);
+  const bool nofollow = is_set(prms, perms::symlink_nofollow);
   if (add && remove)
 {
   ec = std::make_error_code(std::errc::invalid_argument);
@@ -1111,7 +1112,7 @@ void fs::permissions(const path& p, perms prms, error_code& ec) noexcept
 
   if (add || remove)
 {
-  auto st = status(p, ec);
+  auto st = nofollow ? symlink_status(p, ec) : status(p, ec);
   if (ec)
 	return;
   auto curr = st.permissions();
@@ -1122,9 +1123,12 @@ void fs::permissions(const path& p, perms prms, error_code& ec) noexcept
 }
 
 #if _GLIBCXX_USE_FCHMODAT
-  if (::fchmodat(AT_FDCWD, p.c_str(), static_cast<mode_t>(prms), 0))
+  const int flag = nofollow ? AT_SYMLINK_NOFOLLOW : 0;
+  if (::fchmodat(AT_FDCWD, p.c_str(), static_cast<mode_t>(prms), flag))
 #else
-  if (::chmod(p.c_str(), static_cast<mode_t>(prms)))
+  if (nofollow)
+    ec = std::make_error_code(std::errc::operation_not_supported);
+  else if (::chmod(p.c_str(), static_cast<mode_t>(prms)))
 #endif
 ec.assign(errno, std::generic_category());
   else
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/operations/create_symlink.cc b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_symlink.cc
new file mode 100644
index 0000000..7297259
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/operations/create_symlink.cc
@@ -0,0 +1,93 @@
+// Copyright (C) 2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+// { dg-options "-lstdc++fs" }
+// { dg-do run { target c++11 } }
+// { dg-require-filesystem-ts "" }
+
+#include <experimental/filesystem>
+#include <testsuite_hooks.h>
+#include <testsuite_fs.h>
+
+namespace fs = std::experimental::filesystem;
+
+void
+test01()
+{
+  std::error_code ec, ec2;
+  __gnu_test::scoped_file f;
+  auto tgt = f.path;
+
+  // Test empty path.
+  fs::path p;
+  create_symlink(tgt, p, ec );
+  VERIFY( ec );
+  try
+  {
+create_symlink(tgt, p);
+  }
+  catch (const std::experimental::filesystem::filesystem_error& ex)
+  {
+ec2 = ex.code();
+VERIFY( ex.path1() == tgt );
+VERIFY( ex.path2() == p );
+  }
+  VERIFY( ec2 == ec );
+}
+
+void
+test02()
+{
+  std::error_code ec, ec2;
+  __gnu_test::scoped_file f;
+  auto tgt = f.path;
+
+  // Test non-existent path
+  auto p = __gnu_test::nonexistent_path();
+  VERIFY( !exists(p) );
+
+  create_symlink(tgt, p, ec); // create the symlink once
+  VERIFY( !ec );
+  VERIFY( exists(p) );
+  VERIFY( is_symlink(p) );
+  remove(p);
+  create_symlink(tgt, p); // create the symlink again
+  VERIFY( exists(p) );
+  VERIFY( is_symlink(p) );
+
+  create_symlink(tgt, p, ec); // Try to create existing symlink
+  VERIFY( ec );
+  try
+  {
+create_symlink(tgt, p);
+  }
+  catch (const std::experimental::filesystem::filesystem_error& ex)
+  {
+ec2 = ex.code();
+VERIFY

Re: [ada, testsuite] Parallelize check-gnat

2016-10-21 Thread Eric Botcazou
> I'm not strongly against your patch, I'm just very surprised it is really
> needed (acats is much larger, check-gnat is small).

In what unit do you count?  ACATS has fewer tests than gnat.dg nowadays.

-- 
Eric Botcazou


Re: [Patch AArch64] Add floatdihf2 and floatunsdihf2 patterns

2016-10-21 Thread Kyrill Tkachov


On 06/09/16 10:19, James Greenhalgh wrote:

Hi,

This patch adds patterns for conversion from 64-bit integer to 16-bit
floating-point values under AArch64 targets which don't have support for
the ARMv8.2-A 16-bit floating point extensions.

We implement these by first saturating to SImode (we know that any
value of 65520 or more will round to infinity after conversion to HFmode,
whose largest finite value is 65504), then
converting to a DFmode (unsigned conversions could go to SFmode, but there
is no performance benefit to this). Then converting to HFmode.
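To see why saturating to SImode first loses nothing, here is a small C++ model of rounding a non-negative integer to the nearest IEEE binary16 value (round-to-nearest, ties-to-even). It is a sketch under stated assumptions: it handles only non-negative inputs, returns the rounded value as a double, and skips the intermediate DFmode step, which is exact because every SImode value fits in a double's 53-bit significand. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Round a non-negative integer to the nearest binary16 value
// (ties-to-even); returns +infinity when the result overflows.
double round_to_half(std::int64_t n) {
  if (n < 2048)
    return static_cast<double>(n);  // fits the 11-bit significand exactly
  int e = 0;
  while ((n >> (e + 1)) != 0)
    ++e;                            // e = floor(log2(n))
  if (e >= 16)
    return std::numeric_limits<double>::infinity();  // n >= 65536
  const std::int64_t ulp = std::int64_t{1} << (e - 10);
  std::int64_t q = n / ulp;
  const std::int64_t r = n % ulp;
  if (2 * r > ulp || (2 * r == ulp && (q & 1) != 0))
    ++q;                            // round half to even
  const std::int64_t rounded = q * ulp;
  if (rounded > 65504)              // largest finite binary16 value
    return std::numeric_limits<double>::infinity();
  return static_cast<double>(rounded);
}

// Saturate to the int32 (SImode) range first, then round.
double saturate_then_round(std::int64_t n) {
  const std::int64_t sat = n > std::numeric_limits<std::int32_t>::max()
                               ? std::numeric_limits<std::int32_t>::max()
                               : n;
  return round_to_half(sat);
}
```

Every value clamped by the saturation is already far above 65520, so it rounds to infinity either way.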

Having added these patterns, the expansion path in "expand_float" will
now try to use them for conversions from SImode to HFmode as there is no
floatsihf2 pattern. expand_float first tries widening the integer size and
looking for a match, so it will try SImode -> DImode. But our DI mode
pattern is going to then saturate us back to SImode which is wasteful.

It would be better for us to provide float(uns)sihf2 patterns directly,
so that's what this patch does.

The testcase added in this patch would fail on trunk for AArch64. There is
no libgcc routine to make the conversion, and we don't provide appropriate
patterns in the backend, so we get a link-time error.

Bootstrapped and tested on aarch64-none-linux-gnu

OK for trunk?


Looks ok to me FWIW, but I can't approve.

Kyrill


James

---
2016-09-06  James Greenhalgh  

* config/aarch64/aarch64.md (sihf2): Convert to expand.
(dihf2): Likewise.
(aarch64_fp16_hf2): New.

2016-09-06  James Greenhalgh  

* gcc.target/aarch64/floatdihf2_1.c: New.





RE: [PATCHv2][GCC] Optimise the fpclassify builtin to perform integer operations when possible

2016-10-21 Thread Tamar Christina
Hi Richard, Jeff,

Fair enough, I understand the reservations both of you have.

I'll spend some time experimenting with what kind of code I'd
get out of it from lowering early and come up with an updated
patch.

Thanks!
Tamar

> -----Original Message-----
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: 21 October 2016 09:05
> To: Jeff Law
> Cc: Tamar Christina; GCC Patches; nd; Richard Earnshaw; Wilco Dijkstra;
> ja...@redhat.com; Joseph Myers; Michael Meissner; Moritz Klammler;
> Andrew Pinski
> Subject: Re: [PATCHv2][GCC] Optimise the fpclassify builtin to perform
> integer operations when possible
> 
> On Thu, 20 Oct 2016, Jeff Law wrote:
> 
> > On 09/30/2016 07:22 AM, Tamar Christina wrote:
> > > Hi All,
> > >
> > > This is v2 of the patch which adds an optimized route to the
> > > fpclassify builtin for floating point numbers which are similar to
> > > IEEE-754 in format.
> > >
> > > I have addressed most comments from everyone except for two things:
> > >
> > > 1) Providing a back-end hook to override the functionality. While certainly
> > >    possible the current fpclassify doesn't provide this either. So I'd like to
> > >    treat it as an enhancement rather than an issue.
> > I think the concern here is PPC, particularly the newer ones which
> > have significant hardware support for these kind of characterizations.
> >
> > Based on the discussions though, I suspect we're going to need
> > something nontrivial due to the way the API for __builtin_fpclassify
> > works.  In the end I can easily see some target way to override the default
> > code synthesis.
> >
> > I think these issues should be left for the PPC folks to propose a
> > solution when they're ready to exploit their new hardware.  I don't
> > think this should block the patch.
> >
> >
> >
> > >
> > > 2) Doing it in a lowering phase. If the general consensus is that this is the
> > >    path the patch must take then I'd be happy to reconsider. However,
> > >    this patch does not seem to produce worse code than what there was before.
> > I think that was a desire from Richi.   I'm a bit torn here.
> >
> > The code looks more like lowering rather than folding.  But it's also
> > generating non-gimple trees and relies on gimple_fold_builtin to
> > re-gimplify the result AFAICT.
> >
> > Richi -- thoughts?
> 
> I'm not entirely happy with the patch but also not with the current state of
> handling of fpclassify.  I do see the need to lower(!) fpclassify early because
> we want to optimize it both depending on the return value usage and the
> input value.
> 
> The lowering we currently apply open-codes isnormal (we have a builtin for
> this) and isfinite (likewise).  I'd prefer if we can apply the lowering in the
> gimplifier and somehow avoid the early decision on whether to use FP or
> integer code to perform the operation.  Sth like
> 
>  fpclassify(x) -> isnan(x) ? FP_NAN : isnormal(x) ? FP_NORMAL
> : !isfinite(x) ? FP_INFINITE : x == 0 ? FP_ZERO : FP_SUBNORMAL
> 
> (leaves the comparison against zero in explicit FP math).  We do have later
> foldings that expand isnormal and isfinite and isinf to use compares -- those
> are the ones that we might want to change to integer reps.
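The lowering suggested above can be written out as ordinary C++ to check that the chain of classification builtins reproduces `fpclassify`. This is a source-level illustration only (the proposal is to do it in the gimplifier), and `lowered_fpclassify` is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// fpclassify(x) -> isnan(x) ? FP_NAN : isnormal(x) ? FP_NORMAL
//   : !isfinite(x) ? FP_INFINITE : x == 0 ? FP_ZERO : FP_SUBNORMAL
// The comparison against zero stays in explicit FP math.
int lowered_fpclassify(double x) {
  return std::isnan(x)       ? FP_NAN
         : std::isnormal(x)  ? FP_NORMAL
         : !std::isfinite(x) ? FP_INFINITE
         : x == 0            ? FP_ZERO
                             : FP_SUBNORMAL;
}
```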
> 
> We are also missing optabs for most of the sub-classification tasks which
> would make it possible to re-combine the whole thing back to a single
> fpclassify asm op.
> 
> That said, the folding to integer ops obfuscates the real operation and thus
> makes the job of a (not yet existing) pass optimizing these kind of
> classifications via range analysis or the like hard.
> Thus I'd rather apply those at or near to RTL expansion time.
> 
> Richard.
> 
> > --
> >
> > I think it's nontrivial to judge worse vs better since it's really a
> > function of the target's micro-architecture and the context in which
> > fpclassify is called -- particularly where the input value lives and
> > whether or not it's used in other ways nearby.
> >
> > In the case where the input value is in memory or not used in floating
> > point arithmetic nearby, your change should be a clear win (with the
> > exception of the latest ppc hardware perhaps).
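To make the trade-off concrete, here is a hedged sketch of the integer-representation approach: the value's bits are copied into a 64-bit integer and classified by inspecting the exponent and fraction fields. It assumes double is IEEE-754 binary64 on the host, and int_fpclassify is a hypothetical name; this is not the code from the patch under review. The memcpy is the portable C analogue of reinterpreting the value (roughly what a VIEW_CONVERT_EXPR does in GCC's IL).

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Classify a binary64 value using only integer operations on its bits.  */
static int
int_fpclassify (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);  /* reinterpret, no FP arithmetic */

  uint64_t biased_exp = (bits >> 52) & 0x7ff;              /* 11 bits */
  uint64_t fraction = bits & ((UINT64_C (1) << 52) - 1);   /* 52 bits */

  if (biased_exp == 0x7ff)          /* all-ones exponent: Inf or NaN */
    return fraction ? FP_NAN : FP_INFINITE;
  if (biased_exp == 0)              /* zero exponent: zero or subnormal */
    return fraction ? FP_SUBNORMAL : FP_ZERO;
  return FP_NORMAL;
}
```

When x already lives in an FP register, that memcpy corresponds to a move from the FP to the GP register file, which is exactly the cost being weighed here.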
> >
> > If the input value is not in memory and used nearby in FP ops, then it
> > gets a lot trickier.  We run the risk of making the object addressable
> > which means it won't be an SSA_NAME and thus not exposed to the high
> > level optimizers.
> >
> > Richi has indicated that in gimple an object need not be addressable
> > just because we access random pieces of it, including the ability to
> > avoid marking something as addressable even though we have MEM
> > (&decl) style expressions.
> > I'm not sure how all that works, but trust Richi implicitly.
> > Additionally you're using VIEW_CONVERT_EXPR now rather than ADDR_EXPR,
> > so that may mitigate things as well.
> >
> > Finally there's the issue of having to transfer the object between the
> > FP and GP register files which can be highly e

Re: Ping^5 Re: [Patch AArch64] Add floatdihf2 and floatunsdihf2 patterns

2016-10-21 Thread James Greenhalgh
On Wed, Oct 12, 2016 at 04:56:52PM +0100, James Greenhalgh wrote:
> On Wed, Sep 28, 2016 at 05:17:14PM +0100, James Greenhalgh wrote:
> > On Wed, Sep 21, 2016 at 10:42:03AM +0100, James Greenhalgh wrote:
> > > On Tue, Sep 13, 2016 at 10:31:28AM +0100, James Greenhalgh wrote:
> > > > On Tue, Sep 06, 2016 at 10:19:50AM +0100, James Greenhalgh wrote:
> > > > > This patch adds patterns for conversion from 64-bit integer to 16-bit
> > > > > floating-point values under AArch64 targets which don't have support for
> > > > > the ARMv8.2-A 16-bit floating point extensions.
> > > > >
> > > > > We implement these by first saturating to SImode (we know that any
> > > > > values >= 65504 will round to infinity after conversion to HFmode), then
> > > > > converting to DFmode (unsigned conversions could go to SFmode, but there
> > > > > is no performance benefit to this), and finally converting to HFmode.
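For readers less familiar with the RTL, the following C sketch models the saturate-then-widen steps described above. The helper names are hypothetical, and the final narrowing to half precision is left out because standard C has no portable half-precision type. The largest finite half value is 65504, and under round-to-nearest anything of magnitude 65520 or more becomes infinity. Every value outside the 32-bit range is far beyond that, so clamping cannot change the final result. Every int32_t is also exactly representable in a double, so the SImode -> DFmode step is exact and only the last narrowing rounds. The same reasoning covers the sihf2 path.

```c
#include <stdint.h>

/* Signed case: clamp a 64-bit integer into the 32-bit range.  */
static int32_t
saturate_di_to_si (int64_t v)
{
  if (v > INT32_MAX)
    return INT32_MAX;
  if (v < INT32_MIN)
    return INT32_MIN;
  return (int32_t) v;
}

/* Unsigned case: clamp into the unsigned 32-bit range.  */
static uint32_t
saturate_udi_to_usi (uint64_t v)
{
  return v > UINT32_MAX ? UINT32_MAX : (uint32_t) v;
}

/* int32 -> double is exact (53-bit significand), so no rounding happens
   before the (not modelled) DFmode -> HFmode narrowing.  */
static double
di_to_df_via_si (int64_t v)
{
  return (double) saturate_di_to_si (v);
}
```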
> > > > > 
> > > > > Having added these patterns, the expansion path in "expand_float" will
> > > > > now try to use them for conversions from SImode to HFmode, as there is no
> > > > > floatsihf2 pattern. expand_float first tries widening the integer size and
> > > > > looking for a match, so it will try SImode -> DImode. But our DImode
> > > > > pattern is going to then saturate us back to SImode, which is wasteful.
> > > > > 
> > > > > It would be better for us to provide float(uns)sihf2 patterns directly.
> > > > > So that's what this patch does.
> > > > > 
> > > > > The testcase added in this patch would fail on trunk for AArch64. There is
> > > > > no libgcc routine to make the conversion, and we don't provide appropriate
> > > > > patterns in the backend, so we get a link-time error.
> > > > > 
> > > > > Bootstrapped and tested on aarch64-none-linux-gnu
> > > > > 
> > > > > OK for trunk?
> > > > 
> > > > Ping.
> > > 
> > > Ping^2
> > 
> > Ping^3
> 
> Ping^4

Ping^5

Thanks,
James

> > 
> > There was an off-list question as to whether the mid-end could catch this,
> > rather than requiring the target to do so. My objection to that is that it
> > would involve teaching the midend about saturating narrowing operations,
> > which if the target doesn't provide them natively require branching.
> > 
> > I'd rather push targets that want DImode to HFmode (and don't provide a
> > DImode to TFmode to go through first) to use libgcc/soft-fp than try to add
> > a special generic expander for DImode to HFmode conversions.
> > 
> > Note that even if we did have a generic expander for these types, we would
> > still need some version of this patch, as we want to override the behaviour
> > where the ARMv8.2-A 16-bit floating-point types are available.
> > > > > 2016-09-06  James Greenhalgh  
> > > > > 
> > > > >   * config/aarch64/aarch64.md (sihf2): Convert to expand.
> > > > >   (dihf2): Likewise.
> > > > >   (aarch64_fp16_hf2): New.
> > > > > 
> > > > > 2016-09-06  James Greenhalgh  
> > > > > 
> > > > >   * gcc.target/aarch64/floatdihf2_1.c: New.
> > > > > 
> > > > 
> > > > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > > > > index 6afaf90..1882a72 100644
> > > > > --- a/gcc/config/aarch64/aarch64.md
> > > > > +++ b/gcc/config/aarch64/aarch64.md
> > > > > @@ -4630,7 +4630,14 @@
> > > > >[(set_attr "type" "f_cvti2f")]
> > > > >  )
> > > > >  
> > > > > -(define_insn "hf2"
> > > > > +;; If we do not have ARMv8.2-A 16-bit floating point extensions, the
> > > > > +;; midend will arrange for an SImode conversion to HFmode to first go
> > > > > +;; through DFmode, then to HFmode.  But first it will try converting
> > > > > +;; to DImode then down, which would match our DImode pattern below and
> > > > > +;; give very poor code-generation.  So, we must provide our own emulation
> > > > > +;; of the mid-end logic.
> > > > > +
> > > > > +(define_insn "aarch64_fp16_hf2"
> > > > >[(set (match_operand:HF 0 "register_operand" "=w")
> > > > >   (FLOATUORS:HF (match_operand:GPI 1 "register_operand" "r")))]
> > > > >"TARGET_FP_F16INST"
> > > > > @@ -4638,6 +4645,53 @@
> > > > >[(set_attr "type" "f_cvti2f")]
> > > > >  )
> > > > >  
> > > > > +(define_expand "sihf2"
> > > > > +  [(set (match_operand:HF 0 "register_operand")
> > > > > + (FLOATUORS:HF (match_operand:SI 1 "register_operand")))]
> > > > > +  "TARGET_FLOAT"
> > > > > +{
> > > > > +  if (TARGET_FP_F16INST)
> > > > > +emit_insn (gen_aarch64_fp16_sihf2 (operands[0], operands[1]));
> > > > > +  else
> > > > > +{
> > > > > +  rtx convert_target = gen_reg_rtx (DFmode);
> > > > > +  emit_insn (gen_sidf2 (convert_target, operands[1]));
> > > > > +  emit_insn (gen_truncdfhf2 (operands[0], convert_target));
> > > > > +}
> > > > > +  DONE;
> > > > > +}
> > > > > +)
> > > > > +
> > > > > +;; For DImode there is no wide enough floating-point mode that we
> > > > > +;; can convert thro

Re: [PATCH 2/5] [AARCH64] Change IMP and PART over to integers from strings.

2016-10-21 Thread James Greenhalgh
On Fri, Oct 21, 2016 at 04:57:22PM +0100, Richard Earnshaw (lists) wrote:
> On 21/10/16 14:59, James Greenhalgh wrote:
> > On Sat, Oct 15, 2016 at 07:38:40PM -0700, Andrew Pinski wrote:
> >> On Wed, Nov 25, 2015 at 11:59 AM, Andrew Pinski  wrote:
> >> Here is finally an updated (fixed) patch (I did not implement the two
> >> implementer big.LITTLE support yet, that will be for a different patch
> >> since I also fixed the part no not being unique as a separate patch).
> >> Once I get a new enough kernel, I will also look into doing the
> >> /sys/cpu/* style detection first.
> >>
> >> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions
> >> (and tested hacking the location of the read in file to see if it
> >> works with big.LITTLE and other formats of /proc/cpuinfo).
> > 
> > I'm OK with this in principle, but it needs some polish for pedantic
> > style comments...
> > 
> >> * config/aarch64/aarch64-cores.def: Rewrite so IMP and PART are
> >> integer constants.
> >> * config/aarch64/driver-aarch64.c (struct aarch64_core_data): Change
> >> implementer_id to unsigned char.
> >> Change part_no to unsigned int.
> >> (AARCH64_BIG_LITTLE): New define.
> >> (INVALID_IMP): New define.
> >> (INVALID_CORE): New define.
> >> (cpu_data): Change the last element's implementer_id and part_no to integers.
> >> (valid_bL_string_p): Rewrite to ..
> >> (valid_bL_core_p): this for integers instead of strings.
> >> (parse_field): New function.
> >> (contains_string_p): Rewrite to ...
> >> (contains_core_p): this for integers and only for the part_no.
> >> (host_detect_local_cpu): Rewrite handling of implementation and part
> >> num to be integers;
> >> simplifying the code.
> > 
> >> Index: config/aarch64/aarch64-cores.def
> >> ===
> >> --- config/aarch64/aarch64-cores.def   (revision 241200)
> >> +++ config/aarch64/aarch64-cores.def   (working copy)
> >> @@ -32,43 +32,46 @@
> >> FLAGS are the bitwise-or of the traits that apply to that core.
> >> This need not include flags implied by the architecture.
> >> COSTS is the name of the rtx_costs routine to use.
> >> -   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it can
> >> -   be found in /proc/cpuinfo.
> >> +   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it
> >> +   can be found in /proc/cpuinfo. A partial list of implementer IDs is
> >> +   given in the ARM Architecture Reference Manual ARMv8, for
> >> -   in /proc/cpuinfo.  For big.LITTLE systems this should have the form at of
> >> -   ".".  */
> >> +   in /proc/cpuinfo.  For big.LITTLE systems this should use the macro AARCH64_BIG_LITTLE
> >> +   where the big part number comes as the first arugment to the macro and little is the
> >> +   second.  */
> > 
> > Needs rewrapped for 80 char width.
> > 
> 
> I don't think it's a good idea to line wrap the def files, some of them
> are processed with AWK during configure and having a complete entry per
> line avoids potential matching problems.

Agreed (and essential) for the entries themselves. This is just a comment
that hangs over the end and should be fixed.

While I'm here...

> >> +   where the big part number comes as the first arugment to the macro and little is the

s/arugment/argument.

Cheers,
James



Re: [PATCH 2/5] [AARCH64] Change IMP and PART over to integers from strings.

2016-10-21 Thread Richard Earnshaw (lists)
On 21/10/16 14:59, James Greenhalgh wrote:
> On Sat, Oct 15, 2016 at 07:38:40PM -0700, Andrew Pinski wrote:
>> On Wed, Nov 25, 2015 at 11:59 AM, Andrew Pinski  wrote:
>> Here is finally an updated (fixed) patch (I did not implement the two
>> implementer big.LITTLE support yet, that will be for a different patch
>> since I also fixed the part no not being unique as a separate patch).
>> Once I get a new enough kernel, I will also look into doing the
>> /sys/cpu/* style detection first.
>>
>> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions
>> (and tested hacking the location of the read in file to see if it
>> works with big.LITTLE and other formats of /proc/cpuinfo).
> 
> I'm OK with this in principle, but it needs some polish for pedantic
> style comments...
> 
>> * config/aarch64/aarch64-cores.def: Rewrite so IMP and PART are
>> integer constants.
>> * config/aarch64/driver-aarch64.c (struct aarch64_core_data): Change
>> implementer_id to unsigned char.
>> Change part_no to unsigned int.
>> (AARCH64_BIG_LITTLE): New define.
>> (INVALID_IMP): New define.
>> (INVALID_CORE): New define.
>> (cpu_data): Change the last element's implementer_id and part_no to integers.
>> (valid_bL_string_p): Rewrite to ..
>> (valid_bL_core_p): this for integers instead of strings.
>> (parse_field): New function.
>> (contains_string_p): Rewrite to ...
>> (contains_core_p): this for integers and only for the part_no.
>> (host_detect_local_cpu): Rewrite handling of implementation and part
>> num to be integers;
>> simplifying the code.
> 
>> Index: config/aarch64/aarch64-cores.def
>> ===
>> --- config/aarch64/aarch64-cores.def (revision 241200)
>> +++ config/aarch64/aarch64-cores.def (working copy)
>> @@ -32,43 +32,46 @@
>> FLAGS are the bitwise-or of the traits that apply to that core.
>> This need not include flags implied by the architecture.
>> COSTS is the name of the rtx_costs routine to use.
>> -   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it can
>> -   be found in /proc/cpuinfo.
>> +   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it
>> +   can be found in /proc/cpuinfo. A partial list of implementer IDs is
>> +   given in the ARM Architecture Reference Manual ARMv8, for
>> -   in /proc/cpuinfo.  For big.LITTLE systems this should have the form at of
>> -   ".".  */
>> +   in /proc/cpuinfo.  For big.LITTLE systems this should use the macro AARCH64_BIG_LITTLE
>> +   where the big part number comes as the first arugment to the macro and little is the
>> +   second.  */
> 
> Needs rewrapped for 80 char width.
> 

I don't think it's a good idea to line wrap the def files, some of them
are processed with AWK during configure and having a complete entry per
line avoids potential matching problems.

R.

>>  
>> -static bool
>> -valid_bL_string_p (const char** core, const char* bL_string)
>> + static bool
>> +valid_bL_core_p (unsigned int *core, unsigned int bL_core)
> 
> Stray space before static.
> 
>>  {
>> -  return strstr (bL_string, core[0]) != NULL
>> -&& strstr (bL_string, core[1]) != NULL;
>> +  return AARCH64_BIG_LITTLE (core[0], core[1]) == bL_core
>> + || AARCH64_BIG_LITTLE (core[1], core[0]) == bL_core;
>> +}
>> +
>> +/* Returns the integer that is after ':' for the field. */
>> +static unsigned parse_field (const char *field)
> 
> parse_field should be on a new line, FIELD should be capitalised in the
> explanatory comment.
> 
> OK with the appropriate changes to rectify these points.
> 
> Thanks,
> James
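A hedged C sketch of the integer scheme discussed in this review follows. The body of valid_bL_core_p mirrors the quoted hunk. The definition of AARCH64_BIG_LITTLE and the body of parse_field are not in the quoted text, so both are assumptions made for illustration: here the macro packs two 12-bit part numbers into one integer, and parse_field reads the number after ':' in a /proc/cpuinfo line with strtoul in base 0, which accepts hex such as "0xd07".

```c
#include <stdlib.h>
#include <string.h>

/* Assumed encoding: big part number in bits [23:12], LITTLE in [11:0].  */
#define AARCH64_BIG_LITTLE(BIG, LITTLE) \
  ((((BIG) & 0xFFFu) << 12) | ((LITTLE) & 0xFFFu))

/* As in the quoted hunk: the two detected cores match the encoded
   big.LITTLE pair in either order.  */
static int
valid_bL_core_p (const unsigned int *core, unsigned int bL_core)
{
  return AARCH64_BIG_LITTLE (core[0], core[1]) == bL_core
         || AARCH64_BIG_LITTLE (core[1], core[0]) == bL_core;
}

/* Sketch: return the integer that follows ':' in FIELD, for example
   "CPU part\t: 0xd07".  Returning 0 for a line without ':' is an
   assumption of this sketch.  */
static unsigned
parse_field (const char *field)
{
  const char *colon = strchr (field, ':');
  if (colon == NULL)
    return 0;
  /* strtoul skips leading whitespace; base 0 accepts hex and decimal.  */
  return (unsigned) strtoul (colon + 1, NULL, 0);
}
```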
> 



Re: [patch, libstdc++] Optimize selection sampling by using generator to get two values at once

2016-10-21 Thread Jonathan Wakely

On 19/10/16 12:48 +0200, Eelis van der Weegen wrote:

This is the same optimization as was recently applied to std::shuffle.

It reduces the runtime of the following program by 20% on my machine:

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <random>
#include <vector>

int main()
{
    std::vector a(1), b(1000);
    std::mt19937 gen;

    uint64_t c = 0;

    for (int i = 0; i != 1000; ++i)
    {
        std::sample(a.begin(), a.end(), b.begin(), b.size(), gen);
        c += uint64_t(b[32]);
    }

    std::cout << c;
}


Thanks, I've committed this slightly revised version to trunk (tweaking
some whitespace, removing some redundant std:: qualification, and
using foo_t aliases instead of typename foo::type).

Tested powerpc64le-linux. Committed to trunk.
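The idea behind this change, composing two small ranges into one draw and splitting the result with / and %, can be modelled in plain C. This is an illustrative sketch, not libstdc++ code: uniform_below, xorshift32, gen_two_uniform_ints and shuffle_two_at_a_time are hypothetical names, and uniform_below uses simple rejection sampling as a stand-in for uniform_int_distribution. The shuffle follows the shape of the std::shuffle loop in the patch: a[k] swaps with an index in [0, k] and a[k + 1] with one in [0, k + 1], both taken from a single draw.

```c
#include <stdint.h>

/* Stand-in for uniform_int_distribution: uniform in [0, n), n > 0.
   Draws below 2^32 mod n are rejected so the modulo is unbiased.  */
static uint32_t
uniform_below (uint32_t n, uint32_t (*gen) (void *), void *state)
{
  uint32_t threshold = (uint32_t) -n % n;
  uint32_t x;
  do
    x = gen (state);
  while (x < threshold);
  return x % n;
}

/* Marsaglia xorshift32; STATE must point to a non-zero uint32_t.  */
static uint32_t
xorshift32 (void *state)
{
  uint32_t *s = state;
  *s ^= *s << 13;
  *s ^= *s >> 17;
  *s ^= *s << 5;
  return *s;
}

/* One draw over [0, b0 * b1) yields a pair uniform over
   [0, b0) x [0, b1).  Requires b0 * b1 to be non-zero and to fit in
   32 bits, mirroring the "__b0 * __b1 <= __g.max() - __g.min()"
   precondition.  */
static void
gen_two_uniform_ints (uint32_t b0, uint32_t b1, uint32_t (*gen) (void *),
                      void *state, uint32_t *i, uint32_t *j)
{
  uint32_t x = uniform_below (b0 * b1, gen, state);
  *i = x / b1;
  *j = x % b1;
}

static void
swap_int (int *p, int *q)
{
  int t = *p;
  *p = *q;
  *q = t;
}

/* Fisher-Yates shuffle consuming one draw per two swaps.  */
static void
shuffle_two_at_a_time (int *a, uint32_t n, uint32_t (*gen) (void *),
                       void *state)
{
  if (n < 2)
    return;
  uint32_t k = 1;
  if (n % 2 == 0)
    {
      /* Even length: one ordinary step first so the rest pair up.  */
      swap_int (&a[k], &a[uniform_below (2, gen, state)]);
      k++;
    }
  while (k != n)
    {
      uint32_t p, q;
      gen_two_uniform_ints (k + 1, k + 2, gen, state, &p, &q);
      swap_int (&a[k], &a[p]);      /* p in [0, k] */
      swap_int (&a[k + 1], &a[q]);  /* q in [0, k + 1] */
      k += 2;
    }
}
```

The saving comes from the generator's range being much larger than b0 * b1. A single draw supplies enough randomness for both indices, instead of discarding most of two separate draws.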


commit 01535578bc44d810e7cf4c2bfbc3836d7977e229
Author: Jonathan Wakely 
Date:   Fri Oct 21 16:37:21 2016 +0100

Optimize RNG use in std::sample selection sampling

2016-10-21  Eelis van der Weegen  

	* include/bits/stl_algo.h (__gen_two_uniform_ints): Move logic out
	of shuffle into new function.
	(shuffle): Call __gen_two_uniform_ints.
	(__sample): Use
	__gen_two_uniform_ints and perform two samples at a time.

diff --git a/libstdc++-v3/include/bits/stl_algo.h b/libstdc++-v3/include/bits/stl_algo.h
index 6c771bb..3ecb33b 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -3741,6 +3741,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #ifdef _GLIBCXX_USE_C99_STDINT_TR1
   /**
+   *  @brief Generate two uniformly distributed integers using a
+   * single distribution invocation.
+   *  @param  __b0  The upper bound for the first integer.
+   *  @param  __b1  The upper bound for the second integer.
+   *  @param  __g A UniformRandomBitGenerator.
+   *  @return  A pair (i, j) with i and j uniformly distributed
+   *   over [0, __b0) and [0, __b1), respectively.
+   *
+   *  Requires: __b0 * __b1 <= __g.max() - __g.min().
+   *
+   *  Using uniform_int_distribution with a range that is very
+   *  small relative to the range of the generator ends up wasting
+   *  potentially expensively generated randomness, since
+   *  uniform_int_distribution does not store leftover randomness
+   *  between invocations.
+   *
+   *  If we know we want two integers in ranges that are sufficiently
+   *  small, we can compose the ranges, use a single distribution
+   *  invocation, and significantly reduce the waste.
+  */
+  template
+pair<_IntType, _IntType>
+__gen_two_uniform_ints(_IntType __b0, _IntType __b1,
+			   _UniformRandomBitGenerator&& __g)
+{
+  _IntType __x
+	= uniform_int_distribution<_IntType>{0, (__b0 * __b1) - 1}(__g);
+  return std::make_pair(__x / __b1, __x % __b1);
+}
+
+  /**
*  @brief Shuffle the elements of a sequence using a uniform random
* number generator.
*  @ingroup mutating_algorithms
@@ -3773,8 +3804,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef typename std::uniform_int_distribution<__ud_type> __distr_type;
   typedef typename __distr_type::param_type __p_type;
 
-  typedef typename std::remove_reference<_UniformRandomNumberGenerator>::type _Gen;
-  typedef typename std::common_type::type __uc_type;
+  typedef typename remove_reference<_UniformRandomNumberGenerator>::type
+	_Gen;
+  typedef typename common_type::type
+	__uc_type;
 
   const __uc_type __urngrange = __g.max() - __g.min();
   const __uc_type __urange = __uc_type(__last - __first);
@@ -3801,13 +3834,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	while (__i != __last)
 	{
 	  const __uc_type __swap_range = __uc_type(__i - __first) + 1;
-	  const __uc_type __comp_range = __swap_range * (__swap_range + 1);
 
-	  std::uniform_int_distribution<__uc_type> __d{0, __comp_range - 1};
-	  const __uc_type __pospos = __d(__g);
+	  const pair<__uc_type, __uc_type> __pospos =
+	__gen_two_uniform_ints(__swap_range, __swap_range + 1, __g);
 
-	  std::iter_swap(__i++, __first + (__pospos % __swap_range));
-	  std::iter_swap(__i++, __first + (__pospos / __swap_range));
+	  std::iter_swap(__i++, __first + __pospos.first);
+	  std::iter_swap(__i++, __first + __pospos.second);
 	}
 
 	return;
@@ -5695,9 +5727,52 @@ _GLIBCXX_BEGIN_NAMESPACE_ALGO
 {
   using __distrib_type = uniform_int_distribution<_Size>;
   using __param_type = typename __distrib_type::param_type;
+  using _USize = make_unsigned_t<_Size>;
+  using _Gen = remove_reference_t<_UniformRandomBitGenerator>;
+  using __uc_type = common_type_t;
+
   __distrib_type __d{};
   _Size __unsampled_sz = std::distance(__first, __last);
-  for (__n = std::min(__n, __unsampled_sz); __n != 0; ++__first)
+  __n = std::min(__n, __unsampled_sz);
+
+  // If possible, we use __gen_two_uniform_ints to efficiently produce
+  // two random numbers using a single d

Re: [PATCH] Formatting fixes for some x86 intrin headers

2016-10-21 Thread Uros Bizjak
On Fri, Oct 21, 2016 at 5:28 PM, Jakub Jelinek  wrote:
> Hi!
>
> While looking at the bextr/bextri/bzhi/pdep/pext intrinsics,
> I've noticed some ugly formatted code in the headers, this patch fixes
> what I found.  Because the headers are installed, IMHO it is more important
> to keep them properly formatted.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-10-21  Jakub Jelinek  
>
> * config/i386/adxintrin.h (_subborrow_u32, _addcarry_u32,
> _addcarryx_u32, _subborrow_u64, _addcarry_u64, _addcarryx_u64):
> Formatting fixes.
> * config/i386/rdseedintrin.h (_rdseed16_step, _rdseed32_step,
> _rdseed64_step): Likewise.
> * config/i386/tbmintrin.h (__bextri_u32): Likewise.

OK.

(This is obvious patch, similar future formatting fixes are rubber-stamped OK).

Thanks,
Uros.

> --- gcc/config/i386/adxintrin.h.jj  2016-01-04 14:55:55.0 +0100
> +++ gcc/config/i386/adxintrin.h 2016-10-21 12:50:33.121927989 +0200
> @@ -31,9 +31,9 @@
>  extern __inline unsigned char
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _subborrow_u32 (unsigned char __CF, unsigned int __X,
> -   unsigned int __Y, unsigned int *__P)
> +   unsigned int __Y, unsigned int *__P)
>  {
> -return __builtin_ia32_sbb_u32 (__CF, __Y, __X, __P);
> +  return __builtin_ia32_sbb_u32 (__CF, __Y, __X, __P);
>  }
>
>  extern __inline unsigned char
> @@ -41,7 +41,7 @@ __attribute__((__gnu_inline__, __always_
>  _addcarry_u32 (unsigned char __CF, unsigned int __X,
>unsigned int __Y, unsigned int *__P)
>  {
> -return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P);
> +  return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P);
>  }
>
>  extern __inline unsigned char
> @@ -49,16 +49,16 @@ __attribute__((__gnu_inline__, __always_
>  _addcarryx_u32 (unsigned char __CF, unsigned int __X,
> unsigned int __Y, unsigned int *__P)
>  {
> -return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P);
> +  return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P);
>  }
>
>  #ifdef __x86_64__
>  extern __inline unsigned char
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _subborrow_u64 (unsigned char __CF, unsigned long long __X,
> -   unsigned long long __Y, unsigned long long *__P)
> +   unsigned long long __Y, unsigned long long *__P)
>  {
> -return __builtin_ia32_sbb_u64 (__CF, __Y, __X, __P);
> +  return __builtin_ia32_sbb_u64 (__CF, __Y, __X, __P);
>  }
>
>  extern __inline unsigned char
> @@ -66,7 +66,7 @@ __attribute__((__gnu_inline__, __always_
>  _addcarry_u64 (unsigned char __CF, unsigned long long __X,
>unsigned long long __Y, unsigned long long *__P)
>  {
> -return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
> +  return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
>  }
>
>  extern __inline unsigned char
> @@ -74,7 +74,7 @@ __attribute__((__gnu_inline__, __always_
>  _addcarryx_u64 (unsigned char __CF, unsigned long long __X,
> unsigned long long __Y, unsigned long long *__P)
>  {
> -return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
> +  return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
>  }
>  #endif
>
> --- gcc/config/i386/rdseedintrin.h.jj   2016-08-19 17:24:43.0 +0200
> +++ gcc/config/i386/rdseedintrin.h  2016-10-21 12:52:14.680652144 +0200
> @@ -39,14 +39,14 @@ extern __inline int
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _rdseed16_step (unsigned short *__p)
>  {
> -return __builtin_ia32_rdseed_hi_step (__p);
> +  return __builtin_ia32_rdseed_hi_step (__p);
>  }
>
>  extern __inline int
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _rdseed32_step (unsigned int *__p)
>  {
> -return __builtin_ia32_rdseed_si_step (__p);
> +  return __builtin_ia32_rdseed_si_step (__p);
>  }
>
>  #ifdef __x86_64__
> @@ -54,7 +54,7 @@ extern __inline int
>  __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  _rdseed64_step (unsigned long long *__p)
>  {
> -return __builtin_ia32_rdseed_di_step (__p);
> +  return __builtin_ia32_rdseed_di_step (__p);
>  }
>  #endif
>
> --- gcc/config/i386/tbmintrin.h.jj  2016-01-04 14:55:55.0 +0100
> +++ gcc/config/i386/tbmintrin.h 2016-10-21 12:51:16.194386886 +0200
> @@ -38,12 +38,12 @@
>  extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
>  __bextri_u32 (unsigned int __X, const unsigned int __I)
>  {
> -   return __builtin_ia32_bextri_u32 (__X, __I);
> +  return __builtin_ia32_bextri_u32 (__X, __I);
>  }
>  #else
> -#define __bextri_u32(X, I)   \
> -((unsigned int)__builtin_ia32_bextri_u32 ((unsigned int)(X), \
> - (unsigned int)(I)))
> +#define __bextri_u32(X, I)  

Re: [PATCH] Also fold bmi/bmi2/tbm bextr/bextri/bzhi/pext/pdep builtins

2016-10-21 Thread Uros Bizjak
On Fri, Oct 21, 2016 at 5:26 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch on top of the just posted patch adds folding for a couple more
> builtins (though, hundreds or thousands of other md builtins remain unfolded
> even though they actually could be folded for e.g. const arguments).
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2016-10-21  Jakub Jelinek  
>
> * config/i386/i386.c (ix86_fold_builtin): Handle
> IX86_BUILTIN_BEXTR{,I}{32,64}, IX86_BUILTIN_BZHI{32,64},
> IX86_BUILTIN_PDEP{32,64} and IX86_BUILTIN_PEXT{32,64}.
> (ix86_gimple_fold_builtin): Handle IX86_BUILTIN_BZHI{32,64},
> IX86_BUILTIN_PDEP{32,64} and IX86_BUILTIN_PEXT{32,64}.
>
> * gcc.target/i386/bmi2-pext-1.c: New test.
> * gcc.target/i386/bmi2-pdep-1.c: New test.
> * gcc.target/i386/bmi2-bzhi-3.c: New test.
> * gcc.target/i386/tbm-bextri-1.c: New test.
> * gcc.target/i386/bmi-bextr-6.c: New test.

I'm not versed in this area, let's ask Richi for a review...

OK if Richi says so...

Thanks,
Uros.
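As a reading aid for the BEXTR and BZHI cases in the quoted ix86_fold_builtin hunk, here is a hedged C model of the same constant-folding rules for the 32-bit variants. The names fold_bextr32 and fold_bzhi32 are hypothetical. For BEXTR, control bits [7:0] give the start bit and bits [15:8] the length. For BZHI, only the low 8 bits of the index are used, and an index at or beyond the operand width returns the source unchanged.

```c
#include <stdint.h>

/* Extract LEN bits of SRC starting at bit START.  */
static uint32_t
fold_bextr32 (uint32_t src, uint32_t ctrl)
{
  unsigned start = ctrl & 0xff;
  unsigned len = (ctrl >> 8) & 0xff;
  if (start >= 32 || len == 0)
    return 0;
  uint32_t res = src >> start;
  if (len < 32)                     /* len >= 32 keeps every bit */
    res &= (UINT32_C (1) << len) - 1;
  return res;
}

/* Zero the bits of SRC at positions >= IDX.  */
static uint32_t
fold_bzhi32 (uint32_t src, uint32_t ctrl)
{
  unsigned idx = ctrl & 0xff;
  if (idx >= 32)
    return src;
  return src & ~(UINT32_MAX << idx);
}
```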

> --- gcc/config/i386/i386.c.jj   2016-10-21 14:31:21.770818850 +0200
> +++ gcc/config/i386/i386.c  2016-10-21 14:58:58.897893832 +0200
> @@ -33369,6 +33369,88 @@ ix86_fold_builtin (tree fndecl, int n_ar
> }
>   break;
>
> +   case IX86_BUILTIN_BEXTR32:
> +   case IX86_BUILTIN_BEXTR64:
> +   case IX86_BUILTIN_BEXTRI32:
> +   case IX86_BUILTIN_BEXTRI64:
> + gcc_assert (n_args == 2);
> + if (tree_fits_uhwi_p (args[1]))
> +   {
> + unsigned HOST_WIDE_INT res = 0;
> + unsigned int prec = TYPE_PRECISION (TREE_TYPE (args[0]));
> + unsigned int start = tree_to_uhwi (args[1]);
> + unsigned int len = (start & 0xff00) >> 8;
> + start &= 0xff;
> + if (start >= prec || len == 0)
> +   res = 0;
> + else if (!tree_fits_uhwi_p (args[0]))
> +   break;
> + else
> +   res = tree_to_uhwi (args[0]) >> start;
> + if (len > prec)
> +   len = prec;
> + if (len < HOST_BITS_PER_WIDE_INT)
> +   res &= (HOST_WIDE_INT_1U << len) - 1;
> + return build_int_cstu (TREE_TYPE (TREE_TYPE (fndecl)), res);
> +   }
> + break;
> +
> +   case IX86_BUILTIN_BZHI32:
> +   case IX86_BUILTIN_BZHI64:
> + gcc_assert (n_args == 2);
> + if (tree_fits_uhwi_p (args[1]))
> +   {
> + unsigned int idx = tree_to_uhwi (args[1]) & 0xff;
> + if (idx >= TYPE_PRECISION (TREE_TYPE (args[0])))
> +   return args[0];
> + if (!tree_fits_uhwi_p (args[0]))
> +   break;
> + unsigned HOST_WIDE_INT res = tree_to_uhwi (args[0]);
> + res &= ~(HOST_WIDE_INT_M1U << idx);
> + return build_int_cstu (TREE_TYPE (TREE_TYPE (fndecl)), res);
> +   }
> + break;
> +
> +   case IX86_BUILTIN_PDEP32:
> +   case IX86_BUILTIN_PDEP64:
> + gcc_assert (n_args == 2);
> + if (tree_fits_uhwi_p (args[0]) && tree_fits_uhwi_p (args[1]))
> +   {
> + unsigned HOST_WIDE_INT src = tree_to_uhwi (args[0]);
> + unsigned HOST_WIDE_INT mask = tree_to_uhwi (args[1]);
> + unsigned HOST_WIDE_INT res = 0;
> + unsigned HOST_WIDE_INT m, k = 1;
> + for (m = 1; m; m <<= 1)
> +   if ((mask & m) != 0)
> + {
> +   if ((src & k) != 0)
> + res |= m;
> +   k <<= 1;
> + }
> + return build_int_cstu (TREE_TYPE (TREE_TYPE (fndecl)), res);
> +   }
> + break;
> +
> +   case IX86_BUILTIN_PEXT32:
> +   case IX86_BUILTIN_PEXT64:
> + gcc_assert (n_args == 2);
> + if (tree_fits_uhwi_p (args[0]) && tree_fits_uhwi_p (args[1]))
> +   {
> + unsigned HOST_WIDE_INT src = tree_to_uhwi (args[0]);
> + unsigned HOST_WIDE_INT mask = tree_to_uhwi (args[1]);
> + unsigned HOST_WIDE_INT res = 0;
> + unsigned HOST_WIDE_INT m, k = 1;
> + for (m = 1; m; m <<= 1)
> +   if ((mask & m) != 0)
> + {
> +   if ((src & m) != 0)
> + res |= k;
> +   k <<= 1;
> + }
> + return build_int_cstu (TREE_TYPE (TREE_TYPE (fndecl)), res);
> +   }
> + break;
> +
> default:
>   break;
> }
> @@ -33393,7 +33475,7 @@ ix86_gimple_fold_builtin (gimple_stmt_it
>int n_args = gimple_call_num_args (stmt);
>enum ix86_builtins fn_code = (enum ix86_builtins) DECL_FUNCTION_CODE (fndecl);
>tree decl = NULL_TREE;
> -  tree arg0;
> +  tree arg0, arg1;
>
>switch (fn_code)
>  {
> @@ -33432,6 +33514,41 @@ ix86_gimple_fold_builtin (gimple_stmt_i

Re: [PATCH] Fold __builtin_ia32_[tl]zcnt_u{16,32,64} (PR target/78057)

2016-10-21 Thread Uros Bizjak
On Fri, Oct 21, 2016 at 5:31 PM, Jakub Jelinek  wrote:
> On Fri, Oct 21, 2016 at 05:28:42PM +0200, Uros Bizjak wrote:
>> On Fri, Oct 21, 2016 at 5:23 PM, Jakub Jelinek  wrote:
>> > Hi!
>> >
>> > This patch adds folding for the new ia32 md builtins.
>> > If they can be folded into constant, it is done in ix86_fold_builtin,
>> > if they can fold to corresponding generic __builtin_c[lt]z* (which have
>> > e.g. the advantage that VRP knows about what values it can have etc.),
>> > it is done in gimple_fold_builtin target hook.
>>
>> Are you sure that there is no way zero will be passed to generic
>> __builtin_c[lt]z?
>
> The patch only folds the ia32 specific builtins into __builtin_c[lt]z, if
> the argument is known not to be 0 (from VRP).
> That is the expr_not_equal_to call, which uses get_range_info under the
> hood.

I was expecting this answer ;)

Thanks, the patch is OK.

(I'll backport this and my patch to gcc-6 early next week).

Uros.
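A small C model of why the non-zero check matters. TZCNT and LZCNT are defined for a zero input and return the operand width, whereas GCC documents the result of __builtin_ctz and __builtin_clz as undefined for 0. Rewriting the ia32 builtins as the generic ones is therefore only safe once the argument is known to be non-zero. The helper names are hypothetical, and the sketch assumes a GCC-compatible compiler and a 32-bit unsigned int.

```c
#include <stdint.h>

static unsigned
tzcnt32 (uint32_t x)
{
  /* __builtin_ctz (0) is undefined, so handle zero explicitly.  */
  return x == 0 ? 32 : (unsigned) __builtin_ctz (x);
}

static unsigned
lzcnt32 (uint32_t x)
{
  return x == 0 ? 32 : (unsigned) __builtin_clz (x);
}

static unsigned
lzcnt16 (uint16_t x)
{
  /* x is promoted to 32 bits, so discount the 16 extra leading zeros.  */
  return x == 0 ? 16 : (unsigned) __builtin_clz (x) - 16;
}
```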


Re: RFC: Split into smaller pieces

2016-10-21 Thread Jonathan Wakely

On 13/10/16 18:34 +0100, Jonathan Wakely wrote:

This splits the large (2200 lines)  header into smaller
pieces, so there are separate headers for:

- std::less, std::equal_to etc. (already in their own header)
- std::__invoke (already in its own header)
- std::reference_wrapper (often used on its own, e.g. in )
- std::function (used in  and )

Everything else (std::mem_fn, std::bind, std::not_fn, searchers) stays
in , because we don't actually need them elsewhere in the
library.

Code which doesn't need the whole of  should include the
relevant  header instead.

This means that we don't need to pull the whole of  (and
 and ) into  just because shared_ptr
wants to use reference_wrapper in one place.  This reduces 
from 48kloc to 30kloc!

The patch is compressed because it's quite large, but it's mostly just
moving big blocks of code from  into new headers.

Any objections?


Nobody objected, so I'm doing it, here's the patch.

Tested powerpc64le-linux. Committed to trunk.


commit fe1ee3343d9a010bef95634338014a0e79f3c8c0
Author: Jonathan Wakely 
Date:   Wed Oct 12 15:59:23 2016 +0100

Split  into smaller pieces

	* include/Makefile.am: Add  and .
	Order alphabetically.
	* include/Makefile.in: Regenerate.
	* include/bits/refwrap.h: New header.
	(_Maybe_get_result_type,_Weak_result_type_impl, _Weak_result_type)
	(_Reference_wrapper_base_impl, _Reference_wrapper_base)
	(reference_wrapper, ref, cref): Move here from .
	* include/bits/shared_ptr_base.h: Include  and
	 instead of .
	* include/bits/std_function.h: New header.
	(_Maybe_unary_or_binary_function, bad_function_call)
	(__is_location_invariant, _Nocopy_types, _Any_data)
	(_Simple_type_wrapper, _Function_base, _Function_handler, function):
	Move here from .
	* include/bits/unique_ptr.h: Include .
	* include/std/functional: Include new headers and move components to
	them.
	* include/std/future: Include  instead of
	.
	* include/std/mutex: Likewise.
	* include/std/regex: Likewise.
	* src/c++11/compatibility-thread-c++0x.cc: Include .
	* testsuite/20_util/default_delete/48631_neg.cc: Adjust dg-error line.
	* testsuite/20_util/default_delete/void_neg.cc: Likewise.
	* testsuite/20_util/unique_ptr/assign/48635_neg.cc: Adjust dg-error
	lines.
	* testsuite/20_util/unique_ptr/cons/cv_qual_neg.cc: Likewise.
	* testsuite/30_threads/packaged_task/49668.cc: Include .

diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am
index bb4a532..15a164e 100644
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -140,6 +140,7 @@ bits_headers = \
 	${bits_srcdir}/random.h \
 	${bits_srcdir}/random.tcc \
 	${bits_srcdir}/range_access.h \
+	${bits_srcdir}/refwrap.h \
 	${bits_srcdir}/regex.h \
 	${bits_srcdir}/regex.tcc \
 	${bits_srcdir}/regex_constants.h \
@@ -152,14 +153,13 @@ bits_headers = \
 	${bits_srcdir}/regex_compiler.tcc \
 	${bits_srcdir}/regex_executor.h \
 	${bits_srcdir}/regex_executor.tcc \
-	${bits_srcdir}/stream_iterator.h \
-	${bits_srcdir}/streambuf_iterator.h \
 	${bits_srcdir}/shared_ptr.h \
 	${bits_srcdir}/shared_ptr_atomic.h \
 	${bits_srcdir}/shared_ptr_base.h \
 	${bits_srcdir}/slice_array.h \
 	${bits_srcdir}/sstream.tcc \
 	${bits_srcdir}/std_abs.h \
+	${bits_srcdir}/std_function.h \
 	${bits_srcdir}/std_mutex.h \
 	${bits_srcdir}/stl_algo.h \
 	${bits_srcdir}/stl_algobase.h \
@@ -186,6 +186,8 @@ bits_headers = \
 	${bits_srcdir}/stl_tree.h \
 	${bits_srcdir}/stl_uninitialized.h \
 	${bits_srcdir}/stl_vector.h \
+	${bits_srcdir}/stream_iterator.h \
+	${bits_srcdir}/streambuf_iterator.h \
 	${bits_srcdir}/streambuf.tcc \
 	${bits_srcdir}/stringfwd.h \
 	${bits_srcdir}/string_view.tcc \
diff --git a/libstdc++-v3/include/bits/refwrap.h b/libstdc++-v3/include/bits/refwrap.h
new file mode 100644
index 000..06948ff
--- /dev/null
+++ b/libstdc++-v3/include/bits/refwrap.h
@@ -0,0 +1,383 @@
+// Implementation of std::reference_wrapper -*- C++ -*-
+
+// Copyright (C) 2004-2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+

[PATCH] Also fold bmi/bmi2/tbm bextr/bextri/bzhi/pext/pdep builtins

2016-10-21 Thread Jakub Jelinek
Hi!

This patch on top of the just posted patch adds folding for a couple more
builtins (though, hundreds or thousands of other md builtins remain unfolded
even though they actually could be folded for e.g. const arguments).
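
For reference, the constant semantics being folded for BEXTR/BEXTRI and BZHI can be sketched in plain C. This is a minimal model written for illustration only. The helper names bextr_ref and bzhi_ref are hypothetical and are neither GCC nor intrinsic-header functions. It mirrors the checks in the ix86_fold_builtin hunk: start in control bits 7:0 and length in bits 15:8, a zero result when start >= precision or length == 0, and BZHI returning its input unchanged when the index byte is at least the operand width. It assumes 32-bit operands are passed zero-extended.

```c
#include <stdint.h>

/* BEXTR: extract LEN bits of SRC starting at bit START, where
   START = CONTROL[7:0] and LEN = CONTROL[15:8].  PREC is the
   operand width (32 or 64).  */
static uint64_t
bextr_ref (uint64_t src, unsigned int control, unsigned int prec)
{
  unsigned int start = control & 0xff;
  unsigned int len = (control >> 8) & 0xff;
  if (start >= prec || len == 0)
    return 0;
  uint64_t res = src >> start;
  if (len > prec)
    len = prec;
  if (len < 64)                 /* avoid an undefined 64-bit shift */
    res &= (UINT64_C (1) << len) - 1;
  return res;
}

/* BZHI: clear all bits of SRC at positions >= INDEX[7:0]; if that
   index is >= PREC, SRC is returned unchanged.  */
static uint64_t
bzhi_ref (uint64_t src, unsigned int index, unsigned int prec)
{
  unsigned int idx = index & 0xff;
  if (idx >= prec)
    return src;
  return src & ~(UINT64_MAX << idx);
}
```

For example, bextr_ref (0x12345678, 0x0808, 32) extracts the 8 bits starting at bit 8, giving 0x56.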

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-10-21  Jakub Jelinek  

* config/i386/i386.c (ix86_fold_builtin): Handle
IX86_BUILTIN_BEXTR{,I}{32,64}, IX86_BUILTIN_BZHI{32,64},
IX86_BUILTIN_PDEP{32,64} and IX86_BUILTIN_PEXT{32,64}.
(ix86_gimple_fold_builtin): Handle IX86_BUILTIN_BZHI{32,64},
IX86_BUILTIN_PDEP{32,64} and IX86_BUILTIN_PEXT{32,64}.

* gcc.target/i386/bmi2-pext-1.c: New test.
* gcc.target/i386/bmi2-pdep-1.c: New test.
* gcc.target/i386/bmi2-bzhi-3.c: New test.
* gcc.target/i386/tbm-bextri-1.c: New test.
* gcc.target/i386/bmi-bextr-6.c: New test.
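
PDEP and PEXT are inverse bit scatter/gather operations. The sketch below uses the same single pass over the set bits of the mask as the folding code in the patch. The helpers pdep_ref and pext_ref are hypothetical. The sketch assumes 64-bit operands, with 32-bit values passed zero-extended so that the upper mask bits are simply zero.

```c
#include <stdint.h>

/* PDEP: scatter the low-order bits of SRC, in order, into the
   positions of the set bits of MASK.  */
static uint64_t
pdep_ref (uint64_t src, uint64_t mask)
{
  uint64_t res = 0, k = 1;
  for (uint64_t m = 1; m != 0; m <<= 1)
    if (mask & m)
      {
        if (src & k)
          res |= m;
        k <<= 1;                /* next source bit */
      }
  return res;
}

/* PEXT: gather the bits of SRC selected by MASK into the low-order
   bits of the result.  */
static uint64_t
pext_ref (uint64_t src, uint64_t mask)
{
  uint64_t res = 0, k = 1;
  for (uint64_t m = 1; m != 0; m <<= 1)
    if (mask & m)
      {
        if (src & m)
          res |= k;
        k <<= 1;                /* next destination bit */
      }
  return res;
}
```

pdep_ref (5, 0xf0) deposits the low bits 1, 0, 1 into mask bits 4..6, giving 0x50. pext_ref (0x50, 0xf0) gathers them back to 5.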

--- gcc/config/i386/i386.c.jj   2016-10-21 14:31:21.770818850 +0200
+++ gcc/config/i386/i386.c  2016-10-21 14:58:58.897893832 +0200
@@ -33369,6 +33369,88 @@ ix86_fold_builtin (tree fndecl, int n_ar
}
  break;
 
+   case IX86_BUILTIN_BEXTR32:
+   case IX86_BUILTIN_BEXTR64:
+   case IX86_BUILTIN_BEXTRI32:
+   case IX86_BUILTIN_BEXTRI64:
+ gcc_assert (n_args == 2);
+ if (tree_fits_uhwi_p (args[1]))
+   {
+ unsigned HOST_WIDE_INT res = 0;
+ unsigned int prec = TYPE_PRECISION (TREE_TYPE (args[0]));
+ unsigned int start = tree_to_uhwi (args[1]);
+ unsigned int len = (start & 0xff00) >> 8;
+ start &= 0xff;
+ if (start >= prec || len == 0)
+   res = 0;
+ else if (!tree_fits_uhwi_p (args[0]))
+   break;
+ else
+   res = tree_to_uhwi (args[0]) >> start;
+ if (len > prec)
+   len = prec;
+ if (len < HOST_BITS_PER_WIDE_INT)
+   res &= (HOST_WIDE_INT_1U << len) - 1;
+ return build_int_cstu (TREE_TYPE (TREE_TYPE (fndecl)), res);
+   }
+ break;
+
+   case IX86_BUILTIN_BZHI32:
+   case IX86_BUILTIN_BZHI64:
+ gcc_assert (n_args == 2);
+ if (tree_fits_uhwi_p (args[1]))
+   {
+ unsigned int idx = tree_to_uhwi (args[1]) & 0xff;
+ if (idx >= TYPE_PRECISION (TREE_TYPE (args[0])))
+   return args[0];
+ if (!tree_fits_uhwi_p (args[0]))
+   break;
+ unsigned HOST_WIDE_INT res = tree_to_uhwi (args[0]);
+ res &= ~(HOST_WIDE_INT_M1U << idx);
+ return build_int_cstu (TREE_TYPE (TREE_TYPE (fndecl)), res);
+   }
+ break;
+
+   case IX86_BUILTIN_PDEP32:
+   case IX86_BUILTIN_PDEP64:
+ gcc_assert (n_args == 2);
+ if (tree_fits_uhwi_p (args[0]) && tree_fits_uhwi_p (args[1]))
+   {
+ unsigned HOST_WIDE_INT src = tree_to_uhwi (args[0]);
+ unsigned HOST_WIDE_INT mask = tree_to_uhwi (args[1]);
+ unsigned HOST_WIDE_INT res = 0;
+ unsigned HOST_WIDE_INT m, k = 1;
+ for (m = 1; m; m <<= 1)
+   if ((mask & m) != 0)
+ {
+   if ((src & k) != 0)
+ res |= m;
+   k <<= 1;
+ }
+ return build_int_cstu (TREE_TYPE (TREE_TYPE (fndecl)), res);
+   }
+ break;
+
+   case IX86_BUILTIN_PEXT32:
+   case IX86_BUILTIN_PEXT64:
+ gcc_assert (n_args == 2);
+ if (tree_fits_uhwi_p (args[0]) && tree_fits_uhwi_p (args[1]))
+   {
+ unsigned HOST_WIDE_INT src = tree_to_uhwi (args[0]);
+ unsigned HOST_WIDE_INT mask = tree_to_uhwi (args[1]);
+ unsigned HOST_WIDE_INT res = 0;
+ unsigned HOST_WIDE_INT m, k = 1;
+ for (m = 1; m; m <<= 1)
+   if ((mask & m) != 0)
+ {
+   if ((src & m) != 0)
+ res |= k;
+   k <<= 1;
+ }
+ return build_int_cstu (TREE_TYPE (TREE_TYPE (fndecl)), res);
+   }
+ break;
+
default:
  break;
}
@@ -33393,7 +33475,7 @@ ix86_gimple_fold_builtin (gimple_stmt_it
   int n_args = gimple_call_num_args (stmt);
   enum ix86_builtins fn_code = (enum ix86_builtins) DECL_FUNCTION_CODE (fndecl);
   tree decl = NULL_TREE;
-  tree arg0;
+  tree arg0, arg1;
 
   switch (fn_code)
 {
@@ -33432,6 +33514,41 @@ ix86_gimple_fold_builtin (gimple_stmt_it
  gimple_set_location (g, loc);
  gsi_replace (gsi, g, true);
  return true;
+   }
+  break;
+
+case IX86_BUILTIN_BZHI32:
+case IX86_BUILTIN_BZHI64:
+  gcc_assert (n_args == 2);
+  arg1 = gimple_call_arg (stmt, 1);
+  if (tree_fits_uhwi_p (arg1) && gimple_call_lhs (stmt))
+   {
+ unsigned int idx = tree_to_uhwi (arg1) & 0xff;

Re: [PATCH] Fold __builtin_ia32_[tl]zcnt_u{16,32,64} (PR target/78057)

2016-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2016 at 05:28:42PM +0200, Uros Bizjak wrote:
> On Fri, Oct 21, 2016 at 5:23 PM, Jakub Jelinek  wrote:
> > Hi!
> >
> > This patch adds folding for the new ia32 md builtins.
> > If they can be folded into a constant, it is done in ix86_fold_builtin,
> > if they can fold to corresponding generic __builtin_c[lt]z* (which have
> > e.g. the advantage that VRP knows about what values it can have etc.),
> > it is done in gimple_fold_builtin target hook.
> 
> Are you sure that there is no way zero will be passed to generic
> __builtin_c[lt]z?

The patch only folds the ia32 specific builtins into __builtin_c[lt]z, if
the argument is known not to be 0 (from VRP).
That is the expr_not_equal_to call, which uses get_range_info under the
hood.
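
The zero case is what matters here. TZCNT and LZCNT return the operand width for a zero input, whereas __builtin_ctz and __builtin_clz are undefined for 0. The following minimal sketch of the 32-bit semantics uses hypothetical helper names and assumes a GCC-compatible compiler for the builtins. It shows that the two agree only for nonzero arguments, which is why the rewrite is gated on expr_not_equal_to.

```c
#include <stdint.h>

/* TZCNT: trailing zero count, defined as 32 for a zero input.  */
static unsigned int
tzcnt32_ref (uint32_t x)
{
  return x == 0 ? 32 : (unsigned int) __builtin_ctz (x);
}

/* LZCNT: leading zero count, defined as 32 for a zero input.  */
static unsigned int
lzcnt32_ref (uint32_t x)
{
  return x == 0 ? 32 : (unsigned int) __builtin_clz (x);
}
```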

Jakub


Re: [PATCH] Fold __builtin_ia32_[tl]zcnt_u{16,32,64} (PR target/78057)

2016-10-21 Thread Uros Bizjak
On Fri, Oct 21, 2016 at 5:23 PM, Jakub Jelinek  wrote:
> Hi!
>
> This patch adds folding for the new ia32 md builtins.
> If they can be folded into a constant, it is done in ix86_fold_builtin,
> if they can fold to corresponding generic __builtin_c[lt]z* (which have
> e.g. the advantage that VRP knows about what values it can have etc.),
> it is done in gimple_fold_builtin target hook.

Are you sure that there is no way zero will be passed to generic
__builtin_c[lt]z?

Uros.


[PATCH] Formatting fixes for some x86 intrin headers

2016-10-21 Thread Jakub Jelinek
Hi!

While looking at the bextr/bextri/bzhi/pdep/pext intrinsics,
I've noticed some badly formatted code in the headers; this patch fixes
what I found.  Because the headers are installed, IMHO it is especially
important to keep them properly formatted.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-10-21  Jakub Jelinek  

* config/i386/adxintrin.h (_subborrow_u32, _addcarry_u32,
_addcarryx_u32, _subborrow_u64, _addcarry_u64, _addcarryx_u64):
Formatting fixes.
* config/i386/rdseedintrin.h (_rdseed16_step, _rdseed32_step,
_rdseed64_step): Likewise.
* config/i386/tbmintrin.h (__bextri_u32): Likewise.

--- gcc/config/i386/adxintrin.h.jj  2016-01-04 14:55:55.0 +0100
+++ gcc/config/i386/adxintrin.h 2016-10-21 12:50:33.121927989 +0200
@@ -31,9 +31,9 @@
 extern __inline unsigned char
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _subborrow_u32 (unsigned char __CF, unsigned int __X,
-   unsigned int __Y, unsigned int *__P)
+   unsigned int __Y, unsigned int *__P)
 {
-return __builtin_ia32_sbb_u32 (__CF, __Y, __X, __P);
+  return __builtin_ia32_sbb_u32 (__CF, __Y, __X, __P);
 }
 
 extern __inline unsigned char
@@ -41,7 +41,7 @@ __attribute__((__gnu_inline__, __always_
 _addcarry_u32 (unsigned char __CF, unsigned int __X,
   unsigned int __Y, unsigned int *__P)
 {
-return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P);
+  return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P);
 }
 
 extern __inline unsigned char
@@ -49,16 +49,16 @@ __attribute__((__gnu_inline__, __always_
 _addcarryx_u32 (unsigned char __CF, unsigned int __X,
unsigned int __Y, unsigned int *__P)
 {
-return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P);
+  return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P);
 }
 
 #ifdef __x86_64__
 extern __inline unsigned char
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _subborrow_u64 (unsigned char __CF, unsigned long long __X,
-   unsigned long long __Y, unsigned long long *__P)
+   unsigned long long __Y, unsigned long long *__P)
 {
-return __builtin_ia32_sbb_u64 (__CF, __Y, __X, __P);
+  return __builtin_ia32_sbb_u64 (__CF, __Y, __X, __P);
 }
 
 extern __inline unsigned char
@@ -66,7 +66,7 @@ __attribute__((__gnu_inline__, __always_
 _addcarry_u64 (unsigned char __CF, unsigned long long __X,
   unsigned long long __Y, unsigned long long *__P)
 {
-return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
+  return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
 }
 
 extern __inline unsigned char
@@ -74,7 +74,7 @@ __attribute__((__gnu_inline__, __always_
 _addcarryx_u64 (unsigned char __CF, unsigned long long __X,
unsigned long long __Y, unsigned long long *__P)
 {
-return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
+  return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P);
 }
 #endif
 
--- gcc/config/i386/rdseedintrin.h.jj   2016-08-19 17:24:43.0 +0200
+++ gcc/config/i386/rdseedintrin.h  2016-10-21 12:52:14.680652144 +0200
@@ -39,14 +39,14 @@ extern __inline int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _rdseed16_step (unsigned short *__p)
 {
-return __builtin_ia32_rdseed_hi_step (__p);
+  return __builtin_ia32_rdseed_hi_step (__p);
 }
 
 extern __inline int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _rdseed32_step (unsigned int *__p)
 {
-return __builtin_ia32_rdseed_si_step (__p);
+  return __builtin_ia32_rdseed_si_step (__p);
 }
 
 #ifdef __x86_64__
@@ -54,7 +54,7 @@ extern __inline int
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _rdseed64_step (unsigned long long *__p)
 {
-return __builtin_ia32_rdseed_di_step (__p);
+  return __builtin_ia32_rdseed_di_step (__p);
 }
 #endif
 
--- gcc/config/i386/tbmintrin.h.jj  2016-01-04 14:55:55.0 +0100
+++ gcc/config/i386/tbmintrin.h 2016-10-21 12:51:16.194386886 +0200
@@ -38,12 +38,12 @@
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 __bextri_u32 (unsigned int __X, const unsigned int __I)
 {
-   return __builtin_ia32_bextri_u32 (__X, __I);
+  return __builtin_ia32_bextri_u32 (__X, __I);
 }
 #else
-#define __bextri_u32(X, I)   \
-((unsigned int)__builtin_ia32_bextri_u32 ((unsigned int)(X), \
- (unsigned int)(I)))
+#define __bextri_u32(X, I) \
+  ((unsigned int)__builtin_ia32_bextri_u32 ((unsigned int)(X), \
+   (unsigned int)(I)))
 #endif /*__OPTIMIZE__ */
 
 extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))

Jakub


[PATCH] Fold __builtin_ia32_[tl]zcnt_u{16,32,64} (PR target/78057)

2016-10-21 Thread Jakub Jelinek
Hi!

This patch adds folding for the new ia32 md builtins.
If they can be folded into a constant, it is done in ix86_fold_builtin,
if they can fold to corresponding generic __builtin_c[lt]z* (which have
e.g. the advantage that VRP knows about what values it can have etc.),
it is done in gimple_fold_builtin target hook.
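
The 16-bit variants count within a 16-bit operand. This explains why the constant folder first converts the argument to short_unsigned_type_node, and why the gimple hook only maps the 32- and 64-bit forms to BUILT_IN_CTZ/CTZLL/CLZ/CLZLL. The sketch below shows the 16-bit semantics using hypothetical helpers. It assumes a 32-bit unsigned int and a GCC-compatible compiler, and derives the counts from the 32-bit builtins on the zero-extended value.

```c
#include <stdint.h>

/* TZCNT16: trailing zeros of a 16-bit value; 16 for zero.  */
static unsigned int
tzcnt16_ref (uint16_t x)
{
  return x == 0 ? 16 : (unsigned int) __builtin_ctz (x);
}

/* LZCNT16: leading zeros of a 16-bit value; 16 for zero.  The 32-bit
   clz of the zero-extended value counts 16 extra leading zeros.  */
static unsigned int
lzcnt16_ref (uint16_t x)
{
  return x == 0 ? 16 : (unsigned int) __builtin_clz (x) - 16;
}
```

lzcnt16_ref (0x2345) returns 2. __builtin_clz sees 18 leading zeros in the 32-bit value, and 16 of those come from the zero extension.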

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2016-10-21  Jakub Jelinek  

PR target/78057
* config/i386/i386.c: Include fold-const-call.h, tree-vrp.h
and tree-ssanames.h.
(ix86_fold_builtin): Fold IX86_BUILTIN_[LT]ZCNT{16,32,64}
with INTEGER_CST argument.
(ix86_gimple_fold_builtin): New function.
(TARGET_GIMPLE_FOLD_BUILTIN): Define.

* gcc.target/i386/pr78057.c: New test.

--- gcc/config/i386/i386.c.jj   2016-10-21 11:36:33.135677698 +0200
+++ gcc/config/i386/i386.c  2016-10-21 11:57:58.248530521 +0200
@@ -77,6 +77,9 @@ along with GCC; see the file COPYING3.
 #include "case-cfn-macros.h"
 #include "regrename.h"
 #include "dojump.h"
+#include "fold-const-call.h"
+#include "tree-vrp.h"
+#include "tree-ssanames.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2,6 +5,40 @@ ix86_fold_builtin (tree fndecl, int n_ar
return build_real (type, inf);
  }
 
+   case IX86_BUILTIN_TZCNT16:
+   case IX86_BUILTIN_TZCNT32:
+   case IX86_BUILTIN_TZCNT64:
+ gcc_assert (n_args == 1);
+ if (TREE_CODE (args[0]) == INTEGER_CST)
+   {
+ tree type = TREE_TYPE (TREE_TYPE (fndecl));
+ tree arg = args[0];
+ if (fn_code == IX86_BUILTIN_TZCNT16)
+   arg = fold_convert (short_unsigned_type_node, arg);
+ if (integer_zerop (arg))
+   return build_int_cst (type, TYPE_PRECISION (TREE_TYPE (arg)));
+ else
+   return fold_const_call (CFN_CTZ, type, arg);
+   }
+ break;
+
+   case IX86_BUILTIN_LZCNT16:
+   case IX86_BUILTIN_LZCNT32:
+   case IX86_BUILTIN_LZCNT64:
+ gcc_assert (n_args == 1);
+ if (TREE_CODE (args[0]) == INTEGER_CST)
+   {
+ tree type = TREE_TYPE (TREE_TYPE (fndecl));
+ tree arg = args[0];
+ if (fn_code == IX86_BUILTIN_LZCNT16)
+   arg = fold_convert (short_unsigned_type_node, arg);
+ if (integer_zerop (arg))
+   return build_int_cst (type, TYPE_PRECISION (TREE_TYPE (arg)));
+ else
+   return fold_const_call (CFN_CLZ, type, arg);
+   }
+ break;
+
default:
  break;
}
@@ -33344,6 +33381,67 @@ ix86_fold_builtin (tree fndecl, int n_ar
   return NULL_TREE;
 }
 
+/* Fold a MD builtin (use ix86_fold_builtin for folding into
+   constant) in GIMPLE.  */
+
+bool
+ix86_gimple_fold_builtin (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (stmt);
+  gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
+  int n_args = gimple_call_num_args (stmt);
+  enum ix86_builtins fn_code = (enum ix86_builtins) DECL_FUNCTION_CODE (fndecl);
+  tree decl = NULL_TREE;
+  tree arg0;
+
+  switch (fn_code)
+{
+case IX86_BUILTIN_TZCNT32:
+  decl = builtin_decl_implicit (BUILT_IN_CTZ);
+  goto fold_tzcnt_lzcnt;
+
+case IX86_BUILTIN_TZCNT64:
+  decl = builtin_decl_implicit (BUILT_IN_CTZLL);
+  goto fold_tzcnt_lzcnt;
+
+case IX86_BUILTIN_LZCNT32:
+  decl = builtin_decl_implicit (BUILT_IN_CLZ);
+  goto fold_tzcnt_lzcnt;
+
+case IX86_BUILTIN_LZCNT64:
+  decl = builtin_decl_implicit (BUILT_IN_CLZLL);
+  goto fold_tzcnt_lzcnt;
+
+fold_tzcnt_lzcnt:
+  gcc_assert (n_args == 1);
+  arg0 = gimple_call_arg (stmt, 0);
+  if (TREE_CODE (arg0) == SSA_NAME && decl && gimple_call_lhs (stmt))
+   {
+ int prec = TYPE_PRECISION (TREE_TYPE (arg0));
+ if (!expr_not_equal_to (arg0, wi::zero (prec)))
+   return false;
+
+ location_t loc = gimple_location (stmt);
+ gimple *g = gimple_build_call (decl, 1, arg0);
+ gimple_set_location (g, loc);
+ tree lhs = make_ssa_name (integer_type_node);
+ gimple_call_set_lhs (g, lhs);
+ gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ g = gimple_build_assign (gimple_call_lhs (stmt), NOP_EXPR, lhs);
+ gimple_set_location (g, loc);
+ gsi_replace (gsi, g, true);
+ return true;
+   }
+  break;
+
+default:
+  break;
+}
+
+  return false;
+}
+
 /* Make builtins to detect cpu type and features supported.  NAME is
the builtin name, CODE is the builtin code, and FTYPE is the function
type of the builtin.  */
@@ -50531,6 +50629,9 @@ ix86_addr_space_zero_address_valid (addr
 #undef TARGET_FOLD_BUILTIN
 #define TARGET_FOLD_BUILTIN ix86_fold_builtin
 
+#undef TARGET_GIMPLE_FOLD_BUILTIN
+#define TARGET_GIMPLE_FOLD_B

Re: [C++ PATCH] RFC: implement P0386R2 - C++17 inline variables

2016-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2016 at 03:57:34PM +0100, Yao Qi wrote:
> Hi Jakub,
> 
> On Thu, Oct 20, 2016 at 5:21 PM, Andre Vieira (lists)
>  wrote:
> >  <2><8f5>: Abbrev Number: 38 (DW_TAG_member)
> > <8f6>   DW_AT_specification: <0x8be>
> > <8fa>   DW_AT_linkage_name: (indirect string, offset: 0x4a0):
> > _ZN6BANANA1sE
> > <8fe>   DW_AT_location: 5 byte block: 3 64 bf 1 0
> > (DW_OP_addr: 1bf64)
> >
> > I haven't tested it on other targets.
> 
> I can reproduce it on x86_64 as well.

First of all, I have a pending patch for this area:
http://gcc.gnu.org/ml/gcc-patches/2016-10/msg01183.html
so I think it doesn't really make much sense to discuss it until it gets in.
But unless you are talking about -std=c++17 or code with explicit inline
vars, I don't think anything has really changed in the debug representation
with that patch.

Jakub


Re: [PATCH] Emit DW_AT_const_expr for constexpr variables (take 2)

2016-10-21 Thread Jason Merrill
On Fri, Oct 21, 2016 at 10:40 AM, Jakub Jelinek  wrote:
> On Fri, Oct 21, 2016 at 09:58:01AM -0400, Jason Merrill wrote:
>> On Thu, Oct 20, 2016 at 2:27 PM, Jakub Jelinek  wrote:
>> > +  if ((dwarf_version >= 4 || !dwarf_strict)
>>
>> Why >=4?  Isn't this a DWARF 5 feature?
>
> It is actually DWARF 4 already.

Ah, so it is, I was looking at an earlier draft.  OK.

Jason


Re: [C++ PATCH] RFC: implement P0386R2 - C++17 inline variables

2016-10-21 Thread Yao Qi
Hi Jakub,

On Thu, Oct 20, 2016 at 5:21 PM, Andre Vieira (lists)
 wrote:
>  <2><8f5>: Abbrev Number: 38 (DW_TAG_member)
> <8f6>   DW_AT_specification: <0x8be>
> <8fa>   DW_AT_linkage_name: (indirect string, offset: 0x4a0):
> _ZN6BANANA1sE
> <8fe>   DW_AT_location: 5 byte block: 3 64 bf 1 0
> (DW_OP_addr: 1bf64)
>
> I haven't tested it on other targets.

I can reproduce it on x86_64 as well.

 <1><328>: Abbrev Number: 20 (DW_TAG_class_type)
<329>   DW_AT_name: A
<32b>   DW_AT_byte_size   : 24
<32c>   DW_AT_decl_file   : 1
<32d>   DW_AT_decl_line   : 23
<32e>   DW_AT_containing_type: <0x328>
<332>   DW_AT_sibling : <0x458>

 <2><336>: Abbrev Number: 19 (DW_TAG_member)
<337>   DW_AT_name: s
<339>   DW_AT_decl_file   : 1
<33a>   DW_AT_decl_line   : 40
<33b>   DW_AT_type: <0x5e>
<33f>   DW_AT_external: 1
<33f>   DW_AT_accessibility: 1  (public)
<340>   DW_AT_declaration : 1
 <2><36d>: Abbrev Number: 23 (DW_TAG_member)
<36e>   DW_AT_specification: <0x336>
<372>   DW_AT_linkage_name: (indirect string, offset: 0x447): _ZN1A1sE
<376>   DW_AT_location: 9 byte block: 3 10 15 60 0 0 0 0 0
 (DW_OP_addr: 601510)

We have two DIEs for member 's'.  GDB adds both of them as fields,
the first one as a static member (because of DW_AT_declaration), and the
second one as a non-static member.  GDB doesn't understand the
relationship between these two DIEs expressed by DW_AT_specification.

Is attribute DW_AT_specification applicable to DW_TAG_member?
This is not documented in DWARF5 Appendix A "Attributes by Tag Value",
Page 258.

-- 
Yao (齐尧)


Re: [ada, testsuite] Parallelize check-gnat

2016-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2016 at 04:01:48PM +0200, Rainer Orth wrote:
> I happened to notice that the gnat.dg testsuite run is slow even on a
> reasonably fast SPARC machine (3.6 GHz SPARC T5) and together with the
> libgomp testsuite (PR libgomp/66005) dominates bootstrap time: within a
> make -j96 -k check, it takes 1h 18m 37s.  For unknown reasons,
> check-gnat isn't parallelized though it is trivial to do and buys quite
> a bit:

check-gnat dominates anything?  That's just really weird,
it has only
# of expected passes2544
# of unexpected failures2
# of expected failures  24
# of unsupported tests  3

compared to the 10+ tests in gcc/g++ or 4+ in gfortran testsuites
it is just nothing.

libgomp is a known problem, sure, the problem with parallelizing it is that
many tests just use all available cores/threads.  Perhaps we should do some
small (at most 2 or 3 concurrent libgomp tests) parallelization of the
libgomp testsuite unless disallowed through some env var option, but in that
case bound OMP_NUM_THREADS if `getconf _NPROCESSORS_ONLN` > 32 to
`getconf _NPROCESSORS_ONLN` / 2 or something similar.
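
The thread cap suggested above amounts to simple arithmetic on the online CPU count. The following is a minimal C sketch. It assumes sysconf (_SC_NPROCESSORS_ONLN) is available, which holds on Linux and Solaris although the constant is a common extension rather than POSIX proper. The helper names are hypothetical, and nothing like this exists in the libgomp testsuite.

```c
#include <unistd.h>

/* Hypothetical cap on OMP_NUM_THREADS for concurrently running libgomp
   tests: half the online CPUs when there are more than 32, otherwise
   0, meaning "leave OMP_NUM_THREADS unset".  */
static long
libgomp_test_thread_cap (long nproc)
{
  return nproc > 32 ? nproc / 2 : 0;
}

/* Query the online CPU count and compute the cap; returns 0 when the
   count cannot be determined.  */
static long
libgomp_test_thread_cap_here (void)
{
  long nproc = sysconf (_SC_NPROCESSORS_ONLN);
  return nproc > 0 ? libgomp_test_thread_cap (nproc) : 0;
}
```

With 96 online CPUs this gives a cap of 48. With 32 or fewer, no cap is applied.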

I'm not strongly against your patch, I'm just very surprised it is really
needed (acats is much larger, check-gnat is small).

> 2016-10-21  Rainer Orth  
> 
>   * gcc-interface/Make-lang.in (lang_checks_parallelized): New target.
>   (check_gnat_parallelize): Likewise.
> 

Jakub


Re: [testsuite, i386] Work around 32-bit i386 STV testcases failing with -mstackrealign (PR target/77483)

2016-10-21 Thread Uros Bizjak
On Fri, Oct 21, 2016 at 4:17 PM, Rainer Orth
 wrote:
> The following patch works around quite a number of i386 testcases
> FAILing on Solaris/x86, as reported in the PR.  To avoid tons of
> testsuite noise, the following patch adds -mno-stackrealign to the
> affected testcases and will thus benefit other targets that default to
> -mstackrealign, too.
>
> Bootstrapped without regressions on i386-pc-solaris2.12 and
> x86_64-pc-linux-gnu (both multilibs in each case).
>
> Ok for mainline and (eventually) the gcc-6 branch?
>
> Thanks.
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2016-09-07  Rainer Orth  
>
> PR target/77483
> * gcc.target/i386/mask-unpack.c (dg-options): Add -mno-stackrealign.
> * gcc.target/i386/pr65105-1.c: Likewise.
> * gcc.target/i386/pr65105-2.c: Likewise.
> * gcc.target/i386/pr65105-3.c: Likewise.
> * gcc.target/i386/pr65105-5.c: Likewise.
> * gcc.target/i386/pr67761.c: Likewise.
> * gcc.target/i386/pr70799-1.c: Likewise.

OK, also for backport.

FTR, we realign main even with -mno-stackrealign, so simple runtime
tests should work OK on targets that default to -mstackrealign.

Thanks,
Uros.


Re: [PATCH] Emit DW_AT_const_expr for constexpr variables (take 2)

2016-10-21 Thread Jakub Jelinek
On Fri, Oct 21, 2016 at 09:58:01AM -0400, Jason Merrill wrote:
> On Thu, Oct 20, 2016 at 2:27 PM, Jakub Jelinek  wrote:
> > +  if ((dwarf_version >= 4 || !dwarf_strict)
> 
> Why >=4?  Isn't this a DWARF 5 feature?

It is actually DWARF 4 already.

Looking at DWARF4 DW_AT_* additions, we also don't emit
DW_AT_data_bit_offset (should replace DW_AT_bit_offset in some cases),
other attributes are emitted already.

Jakub


Re: [PATCH, RFC] Introduce -fsanitize=use-after-scope (v2)

2016-10-21 Thread Jakub Jelinek
On Wed, Oct 12, 2016 at 04:07:53PM +0200, Martin Liška wrote:
> > Ok, first let me list some needed follow-ups that don't need to be handled
> > right away:
> > - r237814-like changes for ASAN_MARK
> 
> Yes, that's missing. I'm wondering whether the same approach would be viable,
> as the {un}poisoning happens during gimplification. Thus it generates &var
> expressions which the verifier won't be happy about?

Sure, it uses &var.  The trick is that the addressable sub-pass then ignores
the taking of the address just in ASAN_MARK, and if some var is determined
to be addressable solely because of ASAN_MARK &var uses and no other reason,
the ASAN_MARK would be dropped and variable rewritten into SSA form.

> > - optimization to remove ASAN_MARK unpoisoning at the start of the function
> 
> As the current implementation does not poison variables at the very beginning
> of a function (RTL stack frame emission), it won't be needed.

But you still ASAN_MARK unpoison the vars when they get into scope, right?
And those can be removed if the optimizers could prove that the area has not
been poisoned yet since the beginning of the function.

> > - optimization to remove ASAN_MARK poisoning at the end of function (and
> >   remove epilogue unpoisoning)
> > - optimization to remove ASAN_MARK unpoisoning followed by ASAN_MARK poisoning
> >   or vice versa if there are no memory accesses in between
> 
> Yep, neither is handled yet, and both are very similar from my perspective:
> pairing a poisoning and an unpoisoning that have no memory operation in
> between (in the dominator sense).
> I'm wondering whether this can be done efficiently, as we would need to visit
> all BBs in a dominance domain (get_all_dominated_blocks) and check for memory
> operations.
> One improvement could be a set of BBs that do not have any memory operations
> (excluding ASAN_CHECK and ASAN_MARK builtins), but it can still be expensive
> to simplify poisoning for all local variables. Or am I wrong?

My memory is weak, but isn't this something the sanopt pass
(sanopt_optimize) is already doing similarly for e.g. ASAN_CHECK, UBSAN_NULL
and UBSAN_VPTR checks?  For ASAN_MARK, you actually don't care about any
calls, since those shouldn't poison or unpoison the vars again behind the
compiler's back.
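
The following toy model illustrates the pairing optimization discussed here, restricted to a single straight-line block. Everything in it is hypothetical: the op kinds and remove_redundant_marks are not GCC internals. It removes an unpoison/poison pair (or the reverse) on the same variable when no memory access lies between the two marks. As suggested above, it skips over calls, and it conservatively treats any memory access as a barrier. It assumes the marks for a given variable strictly alternate, as they do for well-nested scopes, so that dropping both halves of a pair leaves the state after the pair unchanged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified operations within one basic block.  */
enum op_kind { OP_POISON, OP_UNPOISON, OP_MEM, OP_CALL };

struct op
{
  enum op_kind kind;
  int var;      /* variable id for OP_POISON / OP_UNPOISON */
  bool dead;    /* set when the op has been removed */
};

/* Remove POISON/UNPOISON pairs on the same variable with no memory
   access in between.  Returns the number of ops removed.  */
static size_t
remove_redundant_marks (struct op *ops, size_t n)
{
  size_t removed = 0;
  for (size_t i = 0; i < n; i++)
    {
      if (ops[i].dead
          || (ops[i].kind != OP_POISON && ops[i].kind != OP_UNPOISON))
        continue;
      enum op_kind opposite
        = ops[i].kind == OP_POISON ? OP_UNPOISON : OP_POISON;
      for (size_t j = i + 1; j < n; j++)
        {
          if (ops[j].dead || ops[j].kind == OP_CALL)
            continue;           /* calls do not (un)poison behind our back */
          if (ops[j].kind == OP_MEM)
            break;              /* a memory access might observe the state */
          if (ops[j].var != ops[i].var)
            continue;           /* mark on an unrelated variable */
          if (ops[j].kind == opposite)
            {
              ops[i].dead = ops[j].dead = true;
              removed += 2;
            }
          break;                /* next mark on the same variable */
        }
    }
  return removed;
}
```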

> > - try to improve the goto handling
> 
> Works for me as a target for stage3.

Sure.

> 2016-09-27  Martin Liska  

Likely newer date :)

>   * c-common.c (warn_for_unused_label): Save all labels used
>   in goto or in &label;

&label.
instead?

> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> +   const char *n = DECL_NAME (decl)
> + ? IDENTIFIER_POINTER (DECL_NAME (decl)) : "";

Bad formatting.

  const char *n = (DECL_NAME (decl)
                   ? IDENTIFIER_POINTER (DECL_NAME (decl))
                   : "");
or
  const char *n
    = (DECL_NAME (decl)
       ? IDENTIFIER_POINTER (DECL_NAME (decl)) : "");

>  /* Return true if DECL should be guarded on the stack.  */
>  
>  static inline bool
>  asan_protect_stack_decl (tree decl)
>  {
> -  return DECL_P (decl) && !DECL_ARTIFICIAL (decl);
> +  return DECL_P (decl) && TREE_ADDRESSABLE (decl);
>  }

Can you explain this change?  It won't affect just
-fsanitize-use-after-scope, and goes in both directions, adds some
decls that weren't previously protected and removes others that were
previously protected.

Is the removal of !DECL_ARTIFICIAL so that you can use-after-scope
verify the C++ TARGET_EXPR temporaries?  Do we need to stack protect
those even for -fno-sanitize-use-after-scope?
I'm not 100% sure if we keep TREE_ADDRESSABLE meaningful for variables that
need to live in memory for other reasons until expansion.

So, I wonder if we don't want && (TREE_ADDRESSABLE (decl) || !DECL_ARTIFICIAL (decl))
or just DECL_P (decl); or conditionalize something on
-fsanitize-use-after-scope.  Perhaps the change is good as is, just want to
point out that it affects also other -fsanitize=address modes in both
ways (instruments something that hasn't been before, and stops instrumenting
something that has been before).

> @@ -1514,7 +1503,8 @@ defer_stack_allocation (tree var, bool toplevel)
>/* If stack protection is enabled, *all* stack variables must be deferred,
>   so that we can re-order the strings to the top of the frame.
>   Similarly for Address Sanitizer.  */
> -  if (flag_stack_protect || asan_sanitize_stack_p ())
> +  if (flag_stack_protect
> +  || asan_sanitize_stack_p ())
>  return true;

This hunk isn't needed, if all the conditions fit on one line,
one line is better.

Jakub


[testsuite, i386] Work around 32-bit i386 STV testcases failing with -mstackrealign (PR target/77483)

2016-10-21 Thread Rainer Orth
The following patch works around quite a number of i386 testcases
FAILing on Solaris/x86, as reported in the PR.  To avoid tons of
testsuite noise, the following patch adds -mno-stackrealign to the
affected testcases and will thus benefit other targets that default to
-mstackrealign, too.

Bootstrapped without regressions on i386-pc-solaris2.12 and
x86_64-pc-linux-gnu (both multilibs in each case).

Ok for mainline and (eventually) the gcc-6 branch?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2016-09-07  Rainer Orth  

PR target/77483
* gcc.target/i386/mask-unpack.c (dg-options): Add -mno-stackrealign.
* gcc.target/i386/pr65105-1.c: Likewise.
* gcc.target/i386/pr65105-2.c: Likewise.
* gcc.target/i386/pr65105-3.c: Likewise.
* gcc.target/i386/pr65105-5.c: Likewise.
* gcc.target/i386/pr67761.c: Likewise.
* gcc.target/i386/pr70799-1.c: Likewise.

# HG changeset patch
# Parent  3e7f3a609bf8231e3e4c8be3a1a84b62a02a1e1e
Work around -mstackrealign disabled for 32-bit (PR target/77483)

diff --git a/gcc/testsuite/gcc.target/i386/mask-unpack.c b/gcc/testsuite/gcc.target/i386/mask-unpack.c
--- a/gcc/testsuite/gcc.target/i386/mask-unpack.c
+++ b/gcc/testsuite/gcc.target/i386/mask-unpack.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx512bw -mavx512dq -O3 -fopenmp-simd -fdump-tree-vect-details" } */
+/* { dg-options "-mavx512bw -mavx512dq -mno-stackrealign -O3 -fopenmp-simd -fdump-tree-vect-details" } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 10 "vect" } } */
 /* { dg-final { scan-assembler-not "maskmov" } } */
 
diff --git a/gcc/testsuite/gcc.target/i386/pr65105-1.c b/gcc/testsuite/gcc.target/i386/pr65105-1.c
--- a/gcc/testsuite/gcc.target/i386/pr65105-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr65105-1.c
@@ -1,6 +1,6 @@
 /* PR target/pr65105 */
 /* { dg-do run { target { ia32 } } } */
-/* { dg-options "-O2 -msse2 -mtune=slm -save-temps" } */
+/* { dg-options "-O2 -msse2 -mtune=slm -mno-stackrealign -save-temps" } */
 /* { dg-require-effective-target sse2 } */
 /* { dg-final { scan-assembler "por" } } */
 /* { dg-final { scan-assembler "pand" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr65105-2.c b/gcc/testsuite/gcc.target/i386/pr65105-2.c
--- a/gcc/testsuite/gcc.target/i386/pr65105-2.c
+++ b/gcc/testsuite/gcc.target/i386/pr65105-2.c
@@ -1,6 +1,6 @@
 /* PR target/pr65105 */
 /* { dg-do compile { target { ia32 } } } */
-/* { dg-options "-O2 -msse2" } */
+/* { dg-options "-O2 -msse2 -mno-stackrealign" } */
 /* { dg-final { scan-assembler "por" } } */
 
 long long i1, i2, res;
diff --git a/gcc/testsuite/gcc.target/i386/pr65105-3.c b/gcc/testsuite/gcc.target/i386/pr65105-3.c
--- a/gcc/testsuite/gcc.target/i386/pr65105-3.c
+++ b/gcc/testsuite/gcc.target/i386/pr65105-3.c
@@ -1,6 +1,6 @@
 /* PR target/pr65105 */
 /* { dg-do compile { target { ia32 } } } */
-/* { dg-options "-O2 -march=slm -msse4.2" } */
+/* { dg-options "-O2 -march=slm -msse4.2 -mno-stackrealign" } */
 /* { dg-final { scan-assembler "pand" } } */
 /* { dg-final { scan-assembler "por" } } */
 /* { dg-final { scan-assembler "ptest" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr65105-5.c b/gcc/testsuite/gcc.target/i386/pr65105-5.c
--- a/gcc/testsuite/gcc.target/i386/pr65105-5.c
+++ b/gcc/testsuite/gcc.target/i386/pr65105-5.c
@@ -1,6 +1,6 @@
 /* PR target/pr65105 */
 /* { dg-do compile { target { ia32 } } } */
-/* { dg-options "-O2 -march=core-avx2" } */
+/* { dg-options "-O2 -march=core-avx2 -mno-stackrealign" } */
 /* { dg-final { scan-assembler "pandn" } } */
 /* { dg-final { scan-assembler "pxor" } } */
 /* { dg-final { scan-assembler "ptest" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr67761.c b/gcc/testsuite/gcc.target/i386/pr67761.c
--- a/gcc/testsuite/gcc.target/i386/pr67761.c
+++ b/gcc/testsuite/gcc.target/i386/pr67761.c
@@ -1,6 +1,6 @@
 /* PR target/pr67761 */
 /* { dg-do compile { target { ia32 } } } */
-/* { dg-options "-O2 -march=slm -g" } */
+/* { dg-options "-O2 -march=slm -mno-stackrealign -g" } */
 /* { dg-final { scan-assembler "paddq" } } */
 
 void
diff --git a/gcc/testsuite/gcc.target/i386/pr70799-1.c b/gcc/testsuite/gcc.target/i386/pr70799-1.c
--- a/gcc/testsuite/gcc.target/i386/pr70799-1.c
+++ b/gcc/testsuite/gcc.target/i386/pr70799-1.c
@@ -1,6 +1,6 @@
 /* PR target/pr70799 */
 /* { dg-do compile { target { ia32 } } } */
-/* { dg-options "-O2 -march=slm" } */
+/* { dg-options "-O2 -march=slm -mno-stackrealign" } */
 /* { dg-final { scan-assembler "pxor" } } */
 /* { dg-final { scan-assembler "pcmpeqd" } } */
 /* { dg-final { scan-assembler "movdqa\[ \\t\]+.?LC0" } } */


Re: [ada, testsuite] Parallelize check-gnat

2016-10-21 Thread Arnaud Charlet
> Ok for mainline (and eventually for 5 and 6 branches given the small
> size and low risk of the patch)?

I'm not familiar with lang_checks_parallelized, but that's OK with me on
principle.

Arno


Re: [PATCH] Emit DW_AT_const_expr for constexpr variables (take 2)

2016-10-21 Thread Jason Merrill
On Thu, Oct 20, 2016 at 2:27 PM, Jakub Jelinek  wrote:
> +  if ((dwarf_version >= 4 || !dwarf_strict)

Why >=4?  Isn't this a DWARF 5 feature?

OK with that change.

Jason


[PATCH] S/390: Add support for arch arch/tune options.

2016-10-21 Thread Andreas Krebbel
This patch adds an alternate CPU level naming following the
architecture level number in the Principles of Operations manual.  So
instead of having z196, zEC12, and z13 you can use arch9, arch10, and
arch11.  The old cpu names stay valid and should preferably be used.

The alternate names are supposed to improve compatibility with the IBM
XL compiler toolchain which uses the arch numbering.
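The aliases are pure renames: each archN string maps to an existing processor_type value, and ASM_SPEC rewrites it back to the old CPU name before invoking GAS. As a rough illustration only, the same correspondence can be written as a lookup table in C++. The function name is hypothetical and not part of GCC; the pairs are taken from the s390.opt and linux.h hunks below (g6 and z9-109 get no alias).

```cpp
#include <map>
#include <string>

// Hypothetical helper: translate an archN alias to the CPU name that
// older binutils understand, mirroring the ASM_SPEC rewrite in linux.h.
std::string arch_to_cpu_name(const std::string &arch) {
  static const std::map<std::string, std::string> table = {
      {"arch3", "g5"},     {"arch5", "z900"},  {"arch6", "z990"},
      {"arch7", "z9-ec"},  {"arch8", "z10"},   {"arch9", "z196"},
      {"arch10", "zEC12"}, {"arch11", "z13"},
  };
  auto it = table.find(arch);
  // Anything that is not an archN alias (e.g. "z13") passes through unchanged.
  return it == table.end() ? arch : it->second;
}
```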

Tested on s390x. No regression.

I'll commit it in a few days.

-Andreas-

gcc/testsuite/ChangeLog:

2016-10-21  Andreas Krebbel  

* gcc.target/s390/target-attribute/tattr-m64-33.c: New test.

gcc/ChangeLog:

2016-10-21  Andreas Krebbel  

* config/s390/s390.opt: Support alternate cpu level naming (archXX).
* config.gcc: Support alternate archXX cpu levels with
--with-arch= and --with-tune=.
* config/s390/linux.h: Translate new archXX cpu levels to the
original names when calling GAS.
* config/s390/tpf.h: Likewise.
* doc/invoke.texi: Document the alternate cpu level names.
---
 gcc/config.gcc |   2 +-
 gcc/config/s390/linux.h|  14 +-
 gcc/config/s390/s390.opt   |  24 ++
 gcc/config/s390/tpf.h  |  17 +-
 gcc/doc/invoke.texi|  16 +-
 .../s390/target-attribute/tattr-m64-33.c   | 353 +
 6 files changed, 416 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/target-attribute/tattr-m64-33.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 2143d63..507af5c 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4172,7 +4172,7 @@ case "${target}" in
for which in arch tune; do
eval "val=\$with_$which"
case ${val} in
-   "" | native | g5 | g6 | z900 | z990 | z9-109 | z9-ec | z10 | z196 | zEC12 | z13)
+   "" | native | g5 | g6 | z900 | z990 | z9-109 | z9-ec | z10 | z196 | zEC12 | z13 | arch3 | arch5 | arch6 | arch7 | arch8 | arch9 | arch10 | arch11)
# OK
;;
*)
diff --git a/gcc/config/s390/linux.h b/gcc/config/s390/linux.h
index 9b00af7..541006d 100644
--- a/gcc/config/s390/linux.h
+++ b/gcc/config/s390/linux.h
@@ -47,9 +47,19 @@ along with GCC; see the file COPYING3.  If not see
 
 
 /* Target specific assembler settings.  */
-
+/* Rewrite -march=arch* options to the original CPU name in order to
+   make it work with older binutils.  */
 #undef  ASM_SPEC
-#define ASM_SPEC "%{m31&m64}%{mesa&mzarch}%{march=*}"
+#define ASM_SPEC   \
+  "%{m31&m64}%{mesa&mzarch}%{march=z*}"\
+  "%{march=arch3:-march=g5}"   \
+  "%{march=arch5:-march=z900}" \
+  "%{march=arch6:-march=z990}" \
+  "%{march=arch7:-march=z9-ec}"\
+  "%{march=arch8:-march=z10}"  \
+  "%{march=arch9:-march=z196}" \
+  "%{march=arch10:-march=zEC12}"   \
+  "%{march=arch11:-march=z13}"
 
 
 /* Target specific linker settings.  */
diff --git a/gcc/config/s390/s390.opt b/gcc/config/s390/s390.opt
index 1ae1396..569ed95 100644
--- a/gcc/config/s390/s390.opt
+++ b/gcc/config/s390/s390.opt
@@ -62,33 +62,57 @@ EnumValue
 Enum(processor_type) String(g5) Value(PROCESSOR_9672_G5)
 
 EnumValue
+Enum(processor_type) String(arch3) Value(PROCESSOR_9672_G5)
+
+EnumValue
 Enum(processor_type) String(g6) Value(PROCESSOR_9672_G6)
 
 EnumValue
 Enum(processor_type) String(z900) Value(PROCESSOR_2064_Z900)
 
 EnumValue
+Enum(processor_type) String(arch5) Value(PROCESSOR_2064_Z900)
+
+EnumValue
 Enum(processor_type) String(z990) Value(PROCESSOR_2084_Z990)
 
 EnumValue
+Enum(processor_type) String(arch6) Value(PROCESSOR_2084_Z990)
+
+EnumValue
 Enum(processor_type) String(z9-109) Value(PROCESSOR_2094_Z9_109)
 
 EnumValue
 Enum(processor_type) String(z9-ec) Value(PROCESSOR_2094_Z9_EC)
 
 EnumValue
+Enum(processor_type) String(arch7) Value(PROCESSOR_2094_Z9_EC)
+
+EnumValue
 Enum(processor_type) String(z10) Value(PROCESSOR_2097_Z10)
 
 EnumValue
+Enum(processor_type) String(arch8) Value(PROCESSOR_2097_Z10)
+
+EnumValue
 Enum(processor_type) String(z196) Value(PROCESSOR_2817_Z196)
 
 EnumValue
+Enum(processor_type) String(arch9) Value(PROCESSOR_2817_Z196)
+
+EnumValue
 Enum(processor_type) String(zEC12) Value(PROCESSOR_2827_ZEC12)
 
 EnumValue
+Enum(processor_type) String(arch10) Value(PROCESSOR_2827_ZEC12)
+
+EnumValue
 Enum(processor_type) String(z13) Value(PROCESSOR_2964_Z13)
 
 EnumValue
+Enum(processor_type) String(arch11) Value(PROCESSOR_2964_Z13)
+
+EnumValue
 Enum(processor_type) String(native) Value(PROCESSOR_NATIVE) DriverOnly
 
 mbackchain
diff --git a/gcc/config/s390/tpf.h b/gcc/config/s390/tpf.h
index 

[ada, testsuite] Parallelize check-gnat

2016-10-21 Thread Rainer Orth
I happened to notice that the gnat.dg testsuite run is slow even on a
reasonably fast SPARC machine (3.6 GHz SPARC T5) and together with the
libgomp testsuite (PR libgomp/66005) dominates bootstrap time: within a
make -j96 -k check, it takes 1h 18m 37s.  For unknown reasons,
check-gnat isn't parallelized though it is trivial to do and buys quite
a bit:

* On the same machine, though otherwise idle, it reduces make -j96
  check-gnat time to 2m 23s and even within a full bootstrap, the time
  goes down to 44m 6s.

* On x86 systems, there are also considerable speedups:

  2.6 GHz AMD Opteron 8435,   -j24   43m 24s => 33m  4s
  2.93 GHz Intel Xeon X7350, -j16   30m  7s =>  9m  8s
  2.67 GHz Intel Xeon X7542, -j48   14m 56s =>  5m 50s
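As a quick sanity check on these numbers, the speedup factors (sequential time divided by parallel time) work out to roughly 1.3x, 3.3x and 2.6x on the three x86 machines, and about 1.8x for check-gnat within the full SPARC run. The arithmetic, written out in C++ purely for illustration:

```cpp
// Convert the quoted "Nm Ss" timings to seconds.
constexpr int to_seconds(int minutes, int seconds) {
  return minutes * 60 + seconds;
}

// Speedup factor: sequential wall time divided by parallel wall time.
constexpr double speedup(int sequential_s, int parallel_s) {
  return static_cast<double>(sequential_s) / parallel_s;
}

// Opteron 8435 (-j24), Xeon X7350 (-j16), Xeon X7542 (-j48), and
// check-gnat inside the full SPARC T5 run (1h 18m 37s => 44m 6s).
constexpr double kOpteron = speedup(to_seconds(43, 24), to_seconds(33, 4));
constexpr double kX7350 = speedup(to_seconds(30, 7), to_seconds(9, 8));
constexpr double kX7542 = speedup(to_seconds(14, 56), to_seconds(5, 50));
constexpr double kSparcT5 = speedup(to_seconds(78, 37), to_seconds(44, 6));
```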

Seems like a worthwhile speedup to me.  Bootstrapped without regressions
on i386-pc-solaris2.12, sparc-sun-solaris2.12, and x86_64-pc-linux-gnu.
dg-cmp-results.sh reports the sequential and parallel gnat.sum as
identical.

Ok for mainline (and eventually for 5 and 6 branches given the small
size and low risk of the patch)?

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2016-10-21  Rainer Orth  

* gcc-interface/Make-lang.in (lang_checks_parallelized): New target.
(check_gnat_parallelize): Likewise.

# HG changeset patch
# Parent  13db0c5f22f787b7a09b81e1173677a02afa240d
Parallelize check-gnat

diff --git a/gcc/ada/gcc-interface/Make-lang.in b/gcc/ada/gcc-interface/Make-lang.in
--- a/gcc/ada/gcc-interface/Make-lang.in
+++ b/gcc/ada/gcc-interface/Make-lang.in
@@ -863,6 +863,9 @@ ada.stagefeedback: stagefeedback-start
 	-$(MV) ada/stamp-* stagefeedback/ada
 
 lang_checks += check-gnat
+lang_checks_parallelized += check-gnat
+# For description see the check_$lang_parallelize comment in gcc/Makefile.in.
+check_gnat_parallelize = 1000
 
 check-ada: check-acats check-gnat
 check-ada-subtargets: check-acats-subtargets check-gnat-subtargets


Re: [PATCH 2/5] [AARCH64] Change IMP and PART over to integers from strings.

2016-10-21 Thread James Greenhalgh
On Sat, Oct 15, 2016 at 07:38:40PM -0700, Andrew Pinski wrote:
> On Wed, Nov 25, 2015 at 11:59 AM, Andrew Pinski  wrote:
> Here is finally an updated (fixed) patch. (I did not implement the two
> implementer big.LITTLE support yet; that will be a different patch,
> since I also fixed the part number not being unique as a separate patch.)
> Once I get a new enough kernel, I will also look into doing the
> /sys/cpu/* style detection first.
> 
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions
> (and tested hacking the location of the read in file to see if it
> works with big.LITTLE and other formats of /proc/cpuinfo).

I'm OK with this in principle, but it needs some polish for pedantic
style comments...

> * config/aarch64/aarch64-cores.def: Rewrite so IMP and PART are
> integer constants.
> * config/aarch64/driver-aarch64.c (struct aarch64_core_data): Change
> implementer_id to unsigned char.
> Change part_no to unsigned int.
> (AARCH64_BIG_LITTLE): New define.
> (INVALID_IMP): New define.
> (INVALID_CORE): New define.
> (cpu_data): Change the last element's implementer_id and part_no to integers.
> (valid_bL_string_p): Rewrite to ..
> (valid_bL_core_p): this for integers instead of strings.
> (parse_field): New function.
> (contains_string_p): Rewrite to ...
> (contains_core_p): this for integers and only for the part_no.
> (host_detect_local_cpu): Rewrite handling of implementation and part
> num to be integers;
> simplifying the code.

> Index: config/aarch64/aarch64-cores.def
> ===
> --- config/aarch64/aarch64-cores.def  (revision 241200)
> +++ config/aarch64/aarch64-cores.def  (working copy)
> @@ -32,43 +32,46 @@
> FLAGS are the bitwise-or of the traits that apply to that core.
> This need not include flags implied by the architecture.
> COSTS is the name of the rtx_costs routine to use.
> -   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it can
> -   be found in /proc/cpuinfo.
> +   IMP is the implementer ID of the CPU vendor.  On a GNU/Linux system it
> +   can be found in /proc/cpuinfo. A partial list of implementer IDs is
> +   given in the ARM Architecture Reference Manual ARMv8, for
> -   in /proc/cpuinfo.  For big.LITTLE systems this should have the form at of
> -   ".".  */
> +   in /proc/cpuinfo.  For big.LITTLE systems this should use the macro AARCH64_BIG_LITTLE
> +   where the big part number comes as the first arugment to the macro and little is the
> +   second.  */

Needs rewrapped for 80 char width.

>  
> -static bool
> -valid_bL_string_p (const char** core, const char* bL_string)
> + static bool
> +valid_bL_core_p (unsigned int *core, unsigned int bL_core)

Stray space before static.

>  {
> -  return strstr (bL_string, core[0]) != NULL
> -&& strstr (bL_string, core[1]) != NULL;
> +  return AARCH64_BIG_LITTLE (core[0], core[1]) == bL_core
> + || AARCH64_BIG_LITTLE (core[1], core[0]) == bL_core;
> +}
> +
> +/* Returns the integer that is after ':' for the field. */
> +static unsigned parse_field (const char *field)

parse_field should be on a new line, FIELD should be capitalised in the
explanatory comment.
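For readers unfamiliar with the format, here is a minimal sketch of what integer-based helpers could look like. It assumes /proc/cpuinfo lines such as "CPU part : 0xd07" and assumes that AARCH64_BIG_LITTLE packs two 12-bit part numbers into one integer with the big core in the upper bits. The lower-cased names are illustrative stand-ins, not the patch's code.

```cpp
#include <cstdlib>
#include <cstring>

// Assumed packing: 12 bits per part number, big core in the upper half.
constexpr unsigned aarch64_big_little(unsigned big, unsigned little) {
  return ((big & 0xFFFu) << 12) | (little & 0xFFFu);
}

// Hypothetical parse_field: return the integer after ':' in FIELD,
// e.g. "CPU part\t: 0xd07", or -1u if no number follows the colon.
unsigned parse_field(const char *field) {
  const char *colon = std::strchr(field, ':');
  if (colon == nullptr)
    return -1u;
  char *end;
  // Base 0 accepts "0x41" as well as decimal; leading blanks are skipped.
  unsigned long value = std::strtoul(colon + 1, &end, 0);
  if (end == colon + 1)  // strtoul sets END to its input when nothing parses
    return -1u;
  return static_cast<unsigned>(value);
}

// A detected big.LITTLE pair matches a table entry in either order,
// as valid_bL_core_p does in the patch.
bool valid_bl_core(const unsigned core[2], unsigned bl_core) {
  return aarch64_big_little(core[0], core[1]) == bl_core
         || aarch64_big_little(core[1], core[0]) == bl_core;
}
```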

OK with the appropriate changes to rectify these points.

Thanks,
James



Re: [PATCH][check_GNU_style.sh] More aggressively ignore dg-xxx directives

2016-10-21 Thread Kyrill Tkachov

Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00982.html

Thanks,
Kyrill

On 13/10/16 09:11, Kyrill Tkachov wrote:


On 12/10/16 17:49, Martin Sebor wrote:

On 10/12/2016 06:43 AM, Kyrill Tkachov wrote:


On 12/10/16 11:18, Kyrill Tkachov wrote:


On 12/10/16 10:57, Kyrill Tkachov wrote:


On 11/10/16 20:19, Jakub Jelinek wrote:

On Tue, Oct 11, 2016 at 01:11:04PM -0600, Martin Sebor wrote:

Also, the pattern that starts with "/\+\+\+" looks like it's missing
the ^ anchor.  Presumably it should be "/^\+\+\+ \/testsuite\//".

No, it will be almost never +++ /testsuite/
There needs to be .* in between "+++ " and "/testsuite/", and perhaps
it should also ignore "+++ testsuite/".
So /^\+\+\+ (.*\/)?testsuite\// ?
Also, normally (when matching $0) there won't be newlines in the text.

Jakub


Thanks.
Here is the updated patch with your suggestions.



Actually, I've encountered a problem:

  # Remove the testsuite part of the diff.  We don't care about GNU style
  # in testcases and the dg-* directives give too many false positives.
  remove_testsuite ()
  {
    awk 'BEGIN{testsuite=0} /\+\+\+ / && ! /testsuite\//{testsuite=0} \
         {if (!testsuite) print} /^\+\+\+ (.*\/)?testsuite\//{testsuite=1}'
  }

  grep $format '^+' $files \
  | remove_testsuite \
  | grep -v ':+++' \
  > $inp


The /^\+\+\+ (.*\/)?testsuite\// doesn't ever match when the ^ anchor
is used.
The awk command matches fine by itself but not when fed from the "grep
$format '^+' $files"
command because grep adds the line numbers and file names.
So is it okay to omit the ^ here?


I think the AWK regex will not work correctly when the patch has
the line number prefix like "1234: " (AFAICT, this can only happen
in the second invocation of the remove_testsuite function which
also has the problem below making me wonder if your testing
exercised that mode).



Huh, you're right, but it didn't cause problems in my testing, which is weird.


I think the AWK regex needs to be changed to handle that.  It should
start with something like "^([1-9][0-9]*:)?\+\+\+"


I think it needs to be
^(.*:)?([1-9][0-9]*:)?\+\+\+
because grep -nH would add the filename as well as the line number in the first
invocation of remove_testsuite.
This revision does that.
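The effect of the optional prefixes can be checked outside awk. The sketch below uses C++ std::regex with the POSIX extended grammar (awk regexes are EREs); the slashes need no escaping here because they are not awk delimiters. The function name is hypothetical. A "+++" header line naming a testsuite file should match whether it is bare, prefixed by a line number (grep -n), or prefixed by a file name and a line number (grep -nH).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if LINE is a "+++" diff header for a testsuite
// file, optionally preceded by "file:" and/or "NN:" as added by grep.
bool is_testsuite_header(const std::string &line) {
  static const std::regex re(
      R"(^(.*:)?([1-9][0-9]*:)?\+\+\+ (.*/)?testsuite/)",
      std::regex::extended);
  return std::regex_search(line, re);
}
```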



I tried to test the patch but it doesn't seem to work.  When passed
a patch as an argument it hangs.  The hunk below isn't quite right:

 # Don't reuse $inp, which may be generated using -H and thus contain a
-# file prefix.
-grep -n '^+' $f \
+# file prefix.  Re-remove the testsuite since we're not using $inp.
+remove_testsuite $f \
+| grep -n '^+' \
 | grep -v ':+++' \
 > $tmp

The remove_testsuite function ignores arguments so passing $f to it
won't do anything except hang waiting for input.  This should look
closer to this (it worked in my very limited testing):

cat $f | remove_testsuite \



Thanks for the help,
Kyrill

2016-10-13  Kyrylo Tkachov  

* check_GNU_style.sh (remove_testsuite): New function.
Use it to remove testsuite from the diff.


Martin






Re: [PATCH] Do not disable aggressive loop opts for -fsanitize=unreachable or leak

2016-10-21 Thread Jakub Jelinek
On Wed, Oct 19, 2016 at 01:39:16PM +0200, Martin Liška wrote:
> >From 7f1648ef3480c6db856e567153cf9bb838c77d4f Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Mon, 17 Oct 2016 15:58:50 +0200
> Subject: [PATCH] Do not disable aggressive loop opts for
>  -fsanitize=unreachable or leak
> 
> gcc/ChangeLog:
> 
> 2016-10-17  Martin Liska  
> 
>   PR sanitizer/77966
>   * opts.c (finish_options): Skip conditionally.
> 
> gcc/testsuite/ChangeLog:
> 
> 2016-10-17  Martin Liska  
> 
>   PR sanitizer/77966
>   * c-c++-common/ubsan/unreachable-3.c: New test.

Ok, thanks.

Jakub


Re: [PATCH][v6] GIMPLE store merging pass

2016-10-21 Thread Kyrill Tkachov

Hi Richard,

On 21/10/16 13:37, Richard Biener wrote:

On Tue, 18 Oct 2016, Kyrill Tkachov wrote:


Hi Richard,

This patch is a merge of [1] and [2] and implements the manual merging of
bitfields as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN
too. It caused me a lot of headache because the bit offset is counted from
the most significant bit in the byte, even though BITS_BIG_ENDIAN was 0
(BITS_BIG_ENDIAN looks irrelevant for store merging anyway as it's just used
to describe RTL extract operations).
I've included ASCII diagrams of the steps in the merging algorithm.

Heh, thanks.


Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
x86_64-unknown-linux-gnu.
Also tested on aarch64_be-none-elf.

How does this version look now?

Mostly good.  For

+bool
+pass_store_merging::terminate_all_aliasing_chains (tree dest, tree base,
+  gimple *stmt)
+{
...
+  /* Check if the assignment destination (BASE) is part of a store chain.
+ This is to catch non-constant stores to destinations that may be part
+ of a chain.  */
+  if (base)
+{
+  chain_info = m_stores.get (base);
+  if (chain_info)
+   {
+ struct store_immediate_info *info;
+ unsigned int i;
+ FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
+   {
+ if (refs_may_alias_p (info->dest, dest))
+   {

I suppose the chain is not yet sorted in any way?

At least for 'dest' which do not have a known constant offset we
could do

if (base)
  terminate_and_release_chain (base);


Do you mean when get_inner_reference returns non-NULL for POFFSET?
Or do you think we should try to look into dest in this function?


to speed things up?  IIRC we do not terminate chains early in
this phase when we have enough stores to form a group, so
writing a testcase that triggers quadraticness would be as simple
as having

char a[1000];

void foo ()
{
  a[0] = 1;
  a[1] = 2;
  
  a[999] = 3;
}

?

so I think you probably want to limit the number of stores you
ever put onto a chain and if you reach that limit, terminate
and release it?  Like just choose 16 or 64?  (and experiment
with the above kind of testcases)


I was initially thinking of imposing such a limit as well but
later I thought we'd want to extend the output code to be able to emit
a memcpy (or memset) call for large regions, so detecting the largest possible
regions would be needed. But that is not implemented yet (though I have
experimented with it) so I can add a limit here. Should I just hardcode a
limit or should I make it into a --param (MAX_STMTS_IN_STORE_MERGING_CHAIN
or something)?



+ bit_off = byte_off << LOG2_BITS_PER_UNIT;
+ if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off))
+   {
+ bitpos += bit_off.to_shwi ();
+

I think you want bit_off += bitpos before the fits_shwi check
otherwise this add may still overflow.

+ base_addr = copy_node (base_addr);
+ TREE_OPERAND (base_addr, 1)
+   = build_zero_cst (TREE_TYPE (TREE_OPERAND (
+  base_addr, 1)));

I'd prefer

   base_addr = build2 (MEM_REF, ...);

here.


Thanks for the feedback,
Kyrill



Thanks,
Richard.


Thanks,
Kyrill

[1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html

2016-10-18  Kyrylo Tkachov  

 PR middle-end/22141
 * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
 * common.opt (fstore-merging): New Optimization option.
 * opts.c (default_options_table): Add entry for
 OPT_ftree_store_merging.
 * fold-const.h (can_native_encode_type_p): Declare prototype.
 * fold-const.c (can_native_encode_type_p): Define.
 * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
 * passes.def: Insert pass_tree_store_merging.
 * tree-pass.h (make_pass_store_merging): Declare extern
 prototype.
 * gimple-ssa-store-merging.c: New file.
 * doc/invoke.texi (Optimization Options): Document
 -fstore-merging.

2016-10-18  Kyrylo Tkachov  
 Jakub Jelinek  
 Andrew Pinski  

 PR middle-end/22141
 PR rtl-optimization/23684
 * gcc.c-torture/execute/pr22141-1.c: New test.
 * gcc.c-torture/execute/pr22141-2.c: Likewise.
 * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
 * gcc.target/aarch64/ldp_stp_4.c: Likewise.
 * gcc.dg/store_merging_1.c: New test.
 * gcc.dg/store_merging_2.c: Likewise.
 * gcc.dg/store_merging_3.c: Likewise.
 * gcc.dg/store_merging_4.c: Likewise.
 * gcc.dg/store_merging_5.c: Likewise.
 * gcc.dg/store_merging_6.c: Likewise.
 * gcc.dg/store_merging_7.c: Likewise.
 * gcc.target/i386/pr22141.c: Likewise.
 * gcc.target/i386/pr34012.c: Add -fno-store-mer

Re: [PATCHv2] do not throw in std::make_exception_ptr

2016-10-21 Thread Jonathan Wakely

On 21/10/16 13:57 +0100, Jonathan Wakely wrote:

On 21/10/16 15:33 +0300, Gleb Natapov wrote:

On Fri, Oct 21, 2016 at 02:58:26PM +0300, Gleb Natapov wrote:

On Fri, Oct 21, 2016 at 12:44:39PM +0100, Jonathan Wakely wrote:

On 21/10/16 14:36 +0300, Gleb Natapov wrote:
> On Thu, Oct 20, 2016 at 11:53:49PM -0400, Ryan Burn wrote:
> > Are exception classes required to support emplace new construction
> > like that? With this change, Intel's TBB library no longer compiles
> > because its exception class declares its own new operator (see
> > https://github.com/wjakob/tbb/blob/master/include/tbb/tbb_exception.h):
> >
> Can you test this patch please:

That doesn't help, the overloaded new still prevents placement new.
Dammit.


Hmm, are you sure? This program compiles for me (while it fails without ::):


Looks like tbb also compiles and passes tests.


Bah! I didn't include <new> in my test.

I'll make that change, thanks.


Tested powerpc64le-linux, committed to trunk.


commit 1e88a5c65a6158462d3703869766f973db91527f
Author: Jonathan Wakely 
Date:   Fri Oct 21 14:04:16 2016 +0100

Use global operator new in std::make_exception_ptr

	* libsupc++/exception_ptr.h (make_exception_ptr): Qualify new.
	* testsuite/18_support/exception_ptr/make_exception_ptr_2.cc: New
	test.

diff --git a/libstdc++-v3/libsupc++/exception_ptr.h b/libstdc++-v3/libsupc++/exception_ptr.h
index 21e4e8b..a47a585 100644
--- a/libstdc++-v3/libsupc++/exception_ptr.h
+++ b/libstdc++-v3/libsupc++/exception_ptr.h
@@ -187,10 +187,10 @@ namespace std
 	{
 #if __cpp_rtti && !_GLIBCXX_HAVE_CDTOR_CALLABI
   void *__e = __cxxabiv1::__cxa_allocate_exception(sizeof(_Ex));
-  (void)__cxxabiv1::__cxa_init_primary_exception(__e,
-   const_cast<type_info*>(&typeid(__ex)),
-   __exception_ptr::__dest_thunk<_Ex>);
-  new (__e) _Ex(__ex);
+  (void)__cxxabiv1::__cxa_init_primary_exception(
+	  __e, const_cast<type_info*>(&typeid(__ex)),
+	  __exception_ptr::__dest_thunk<_Ex>);
+  ::new (__e) _Ex(__ex);
   return exception_ptr(__e);
 #else
   throw __ex;
diff --git a/libstdc++-v3/testsuite/18_support/exception_ptr/make_exception_ptr_2.cc b/libstdc++-v3/testsuite/18_support/exception_ptr/make_exception_ptr_2.cc
new file mode 100644
index 000..378
--- /dev/null
+++ b/libstdc++-v3/testsuite/18_support/exception_ptr/make_exception_ptr_2.cc
@@ -0,0 +1,43 @@
+// { dg-do run { target c++11 } }
+// { dg-require-atomic-builtins "" }
+
+// Copyright (C) 2010-2016 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// <http://www.gnu.org/licenses/>.
+
+#include <exception>
+#include <testsuite_hooks.h>
+
+// https://gcc.gnu.org/ml/libstdc++/2016-10/msg00139.html
+
+struct E {
+  void* operator new(std::size_t) = delete;
+};
+
+void test01()
+{
+  E e;
+  std::exception_ptr p = std::make_exception_ptr(e);
+
+  VERIFY( p );
+}
+
+int main()
+{
+  test01();
+
+  return 0;
+}


Re: [PATCH][AArch64] Improve stack adjustment

2016-10-21 Thread James Greenhalgh
On Tue, Oct 18, 2016 at 07:10:07PM +0100, Wilco Dijkstra wrote:
> James Greenhalgh wrote:
> On Mon, Oct 17, 2016 at 12:38:36PM +, Wilco Dijkstra wrote:
> 
> >> +  /* We need two add/sub instructions, each one perform part of the
> >> + addition/subtraction, but don't this if the addend can be loaded into
> >> + register by single instruction, in that case we prefer a move to scratch
> >> + register following by addition.  */
> 
> > This sentence is missing some words.
> 
> Sorry, badly edited old comment. I decided to just rewrite it, so here is the
> new version:
> 
> +  /* Emit 2 additions/subtractions if the adjustment is less than 24 bits.
> + Only do this if mdelta is not a 16-bit move as adjusting using a move
> + is better.  */
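The 24-bit bound in the new comment comes from the AArch64 ADD/SUB (immediate) encoding, which takes a 12-bit unsigned immediate optionally shifted left by 12. Any adjustment below 2^24 therefore fits in two instructions: one for the low 12 bits and one for the next 12 bits with LSL #12. A hypothetical helper showing that split (not part of the patch):

```cpp
#include <cstdint>
#include <utility>

// Precondition: mdelta < 0x1000000 (24 bits).
// Returns {low, high}: the first ADD/SUB uses #low, the second uses
// #(high >> 12), LSL #12. low + high == mdelta.
std::pair<uint32_t, uint32_t> split_adjust_24(uint32_t mdelta) {
  uint32_t low = mdelta & 0xfffu;       // bits 0..11
  uint32_t high = mdelta & 0xfff000u;   // bits 12..23
  return {low, high};
}
```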
> 
> 
> > > +  if (mdelta < 0x100 && !aarch64_move_imm (mdelta, mode))
> 
> > Could you explain this change? The comment makes it seem like delta would
> > still be correct. Probably the comment needs to say "followed by
> > addition/subtraction" rather than what is currently written?
> 
> aarch64_move_imm (mdelta, mode) is not always the same as aarch64_move_imm
> (delta, mode). We later emit a move using mdelta (so that both prolog and
> epilog use positive adjustments). Therefore we must check mdelta here, not
> delta.
> 
> > > +  if (emit_move_imm)
> > > +    aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
> > > +  insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
> > > + : gen_add2_insn (this_rtx, scratch_rtx));
> 
> > What is contained in scratch_rtx here if we didn't set it up with
> > aarch64_internal_move_immediate? Are you not adding junk values in the
> > !emit_move_imm case?
> 
> This function should only be called with !emit_move_imm if scratchreg is
> known to contain mdelta. The prolog initializes the scratch register, the
> liveness check is done in the epilog:
> 
> +  aarch64_add_sp (IP0_REGNUM, initial_adjust, df_regs_ever_live_p (IP0_REGNUM));

OK. I see where I went wrong here. I thought you were constructing a fresh
scratch register, and only initialising it if emit_move_imm - now I see that
scratch_rtx is just wrapping scratchreg.

This is OK.

Thanks,
James



Re: [PATCHv2] do not throw in std::make_exception_ptr

2016-10-21 Thread Jonathan Wakely

On 21/10/16 15:33 +0300, Gleb Natapov wrote:

On Fri, Oct 21, 2016 at 02:58:26PM +0300, Gleb Natapov wrote:

On Fri, Oct 21, 2016 at 12:44:39PM +0100, Jonathan Wakely wrote:
> On 21/10/16 14:36 +0300, Gleb Natapov wrote:
> > On Thu, Oct 20, 2016 at 11:53:49PM -0400, Ryan Burn wrote:
> > > Are exception classes required to support emplace new construction
> > > like that? With this change, Intel's TBB library no longer compiles
> > > because its exception class declares its own new operator (see
> > > https://github.com/wjakob/tbb/blob/master/include/tbb/tbb_exception.h):
> > >
> > Can you test this patch please:
>
> That doesn't help, the overloaded new still prevents placement new.
> Dammit.
>
Hmm, are you sure? This program compiles for me (while it fails without ::):


Looks like tbb also compiles and passes tests.


Bah! I didn't include <new> in my test.

I'll make that change, thanks.



#include <stdlib.h>
#include <new>

struct S {
  void* operator new (size_t size);
};

int main() {
 void* p = malloc(sizeof(S));
 ::new(p) S();
}

--
Gleb.


--
Gleb.


Re: [Patch, fortran] PR69566 - Failure of SELECT TYPE with unlimited polymorphic function result

2016-10-21 Thread Paul Richard Thomas
Hi Andre,

Committed to trunk as revision 241403.

Thanks for the review.

Paul

On 20 October 2016 at 11:43, Andre Vehreschild  wrote:
> Hi Paul,
>
> after looking at your patch again, I understood why these extra copies are
> needed. May be a comment would prevent future gfortran hackers from trying to
> remove them again.
>
> The patch is ok for me. Thanks for working on this.
>
> Regards,
> Andre
>
>
> On Wed, 19 Oct 2016 20:02:14 +0200
> Andre Vehreschild  wrote:
>
>> Hi Paul,
>>
>> I am not completely through with your patch, but what caught my eye was
>> that you copy ref in resolve_select_type and again in fixup_array_ref, when
>> you use it? Maybe I am overlooking something; you know this code better. Is
>> the double copying necessary (lines 49 and 82, as well as 95, respectively)?
>> IMHO the copy in line 49 could be sufficient.
>>
>> I'll look into it more thoroughly tomorrow. Please wait before submitting a
>> new version. Maybe I'll see something more :-)
>>
>> So far, thanks for working on this.
>>
>> Regards,
>>   Andre
>>
>> On Wed, 19 Oct 2016 09:28:39 +0200
>> Paul Richard Thomas  wrote:
>>
>> > Dear Andre,
>> >
>> > Following our exchange yesterday, I have eliminated the modification
>> > to trans_associate_var and have corrected the offending expressions in
>> > resolve.c(fixup_array_ref).
>> >
>> > Please find attached the corrected patch.
>> >
>> > Cheers
>> >
>> > Paul
>> >
>> > 2016-10-19  Paul Thomas  
>> >
>> > PR fortran/69566
>> > * resolve.c (fixup_array_ref): New function.
>> > (resolve_select_type): Gather up the rank and array reference,
>> > if any, from the selector. Fix up the 'associate name' and the
>> > 'associate entities' as necessary.
>> > * trans-expr.c (gfc_conv_class_to_class): If the symbol backend
>> > decl is a FUNCTION_DECL, use the 'fake_result_decl' instead.
>> >
>> > 2016-10-19  Paul Thomas  
>> >
>> > PR fortran/69566
>> > * gfortran.dg/select_type_37.f03: New test.
>> >
>> > On 18 October 2016 at 18:16, Andre Vehreschild  wrote:
>> > > Hi Paul,
>> > >
>> > >> For reasons I don't understand, sometimes the expression type comes
>> > >> through as BT_DERIVED, whilst the symbol is BT_CLASS. I could repair
>> > >> this in resolve.c(fixup_array_ref) if you think that would be
>> > >> cleaner.
>> > >
>> > > I think that I figured the rule:
>> > >
>> > > - when no _class-ref is present, then the type is BT_CLASS,
>> > > - as soon as a _class-ref is present the type is BT_DERIVED.
>> > >
>> > > There is an attr.is_class. Would that be an alternative? I don't know how
>> > > reliably it is set.
>> > >
>> > >> > I am regression testing my polymorphic class patch at the moment,
>> > >> > therefore I can't test.
>> > >>
>> > >> OK - I can wait. I have quite a few other things to do :-(
>> > >
>> > > I found an error in my patch that only manifests itself with an
>> > > optimization level greater than 0. Now I am searching, never having done
>> > > anything there.
>> > >
>> > > - Andre
>> > > --
>> > > Andre Vehreschild * Email: vehre ad gmx dot de
>> >
>> >
>> >
>>
>>
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


Re: [PATCH][v6] GIMPLE store merging pass

2016-10-21 Thread Richard Biener
On Tue, 18 Oct 2016, Kyrill Tkachov wrote:

> Hi Richard,
> 
> This patch is a merge of [1] and [2] and implements the manual merging of
> bitfields as outlined in [1] but actually makes it work on BYTES_BIG_ENDIAN
> too. It caused me a lot of headache because the bit offset is counted from
> the most significant bit in the byte, even though BITS_BIG_ENDIAN was 0
> (BITS_BIG_ENDIAN looks irrelevant for store merging anyway as it's just used
> to describe RTL extract operations).
> I've included ASCII diagrams of the steps in the merging algorithm.

Heh, thanks.

> Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
> x86_64-unknown-linux-gnu.
> Also tested on aarch64_be-none-elf.
> 
> How does this version look now?

Mostly good.  For

+bool
+pass_store_merging::terminate_all_aliasing_chains (tree dest, tree base,
+  gimple *stmt)
+{
...
+  /* Check if the assignment destination (BASE) is part of a store chain.
+ This is to catch non-constant stores to destinations that may be part
+ of a chain.  */
+  if (base)
+{
+  chain_info = m_stores.get (base);
+  if (chain_info)
+   {
+ struct store_immediate_info *info;
+ unsigned int i;
+ FOR_EACH_VEC_ELT ((*chain_info)->m_store_info, i, info)
+   {
+ if (refs_may_alias_p (info->dest, dest))
+   {

I suppose the chain is not yet sorted in any way?

At least for 'dest' which do not have a known constant offset we
could do

   if (base)
 terminate_and_release_chain (base);

to speed things up?  IIRC we do not terminate chains early in 
this phase when we have enough stores to form a group, so
writing a testcase that triggers quadraticness would be as simple
as having

char a[1000];

void foo ()
{
 a[0] = 1;
 a[1] = 2;
 
 a[999] = 3;
}

?

so I think you probably want to limit the number of stores you
ever put onto a chain and if you reach that limit, terminate
and release it?  Like just choose 16 or 64?  (and experiment
with the above kind of testcases)
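The suggested cap can be sketched roughly as follows, in C++ with hypothetical names (the pass's real chain type is not shown in this message). Each store to a given base is appended to that base's chain; once the chain reaches the cap it is flushed, so an aliasing walk over one chain costs at most the cap per statement instead of growing with the number of stores in the function. The value 64 is just one of the two numbers floated above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cap on the number of stores kept in one chain.
constexpr std::size_t kMaxStoresPerChain = 64;

struct Store {
  long bitpos;
  long bitsize;
};

class StoreChain {
 public:
  // Append S; returns true if the chain hit the cap and was flushed.
  bool add(const Store &s) {
    stores_.push_back(s);
    if (stores_.size() >= kMaxStoresPerChain) {
      flush();
      return true;
    }
    return false;
  }
  std::size_t size() const { return stores_.size(); }
  std::size_t flushes() const { return flushes_; }

 private:
  // Stand-in for "terminate and release": the real pass would try to
  // merge and emit the collected stores here.
  void flush() {
    stores_.clear();
    ++flushes_;
  }
  std::vector<Store> stores_;
  std::size_t flushes_ = 0;
};
```

With a[0] through a[999] as in the testcase, the chain is flushed 15 times and 40 stores remain pending at the end.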

+ bit_off = byte_off << LOG2_BITS_PER_UNIT;
+ if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off))
+   {
+ bitpos += bit_off.to_shwi ();
+

I think you want bit_off += bitpos before the fits_shwi check
otherwise this add may still overflow.
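To make the overflow point concrete: if the range check is applied to bit_off alone and bitpos is added afterwards, a bit_off just below the signed 64-bit limit passes the check and the later addition overflows. Adding first, in the wider type, and checking the sum avoids that. This is a sketch under stated assumptions: __int128 (a GCC/Clang extension) stands in for GCC's wide offset type, units are 8 bits, and combine_bitpos is a hypothetical name.

```cpp
#include <cstdint>
#include <optional>

// Assumption: __int128 plays the role of GCC's wide offset integer.
using wide_int_t = __int128;

// Returns byte_off * 8 + bitpos if that is non-negative and fits in
// int64_t, otherwise std::nullopt.
std::optional<int64_t> combine_bitpos(wide_int_t byte_off, int64_t bitpos) {
  // Multiply rather than shift: left-shifting a negative value is undefined
  // before C++20. 8 == 1 << LOG2_BITS_PER_UNIT for 8-bit units.
  wide_int_t bit_off = byte_off * 8;
  // Add before checking, so the range check covers the final value.
  bit_off += bitpos;
  if (bit_off < 0 || bit_off > INT64_MAX)
    return std::nullopt;
  return static_cast<int64_t>(bit_off);
}
```

With byte_off = INT64_MAX / 8, byte_off * 8 on its own still fits in int64_t, so checking before the addition would accept it and a subsequent += 64 would overflow; checking the sum rejects it.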

+ base_addr = copy_node (base_addr);
+ TREE_OPERAND (base_addr, 1)
+   = build_zero_cst (TREE_TYPE (TREE_OPERAND (
+  base_addr, 1)));

I'd prefer

  base_addr = build2 (MEM_REF, ...);

here.

Thanks,
Richard.

> Thanks,
> Kyrill
> 
> [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00573.html
> [2] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg00572.html
> 
> 2016-10-18  Kyrylo Tkachov  
> 
> PR middle-end/22141
> * Makefile.in (OBJS): Add gimple-ssa-store-merging.o.
> * common.opt (fstore-merging): New Optimization option.
> * opts.c (default_options_table): Add entry for
> OPT_ftree_store_merging.
> * fold-const.h (can_native_encode_type_p): Declare prototype.
> * fold-const.c (can_native_encode_type_p): Define.
> * params.def (PARAM_STORE_MERGING_ALLOW_UNALIGNED): Define.
> * passes.def: Insert pass_tree_store_merging.
> * tree-pass.h (make_pass_store_merging): Declare extern
> prototype.
> * gimple-ssa-store-merging.c: New file.
> * doc/invoke.texi (Optimization Options): Document
> -fstore-merging.
> 
> 2016-10-18  Kyrylo Tkachov  
> Jakub Jelinek  
> Andrew Pinski  
> 
> PR middle-end/22141
> PR rtl-optimization/23684
> * gcc.c-torture/execute/pr22141-1.c: New test.
> * gcc.c-torture/execute/pr22141-2.c: Likewise.
> * gcc.target/aarch64/ldp_stp_1.c: Adjust for -fstore-merging.
> * gcc.target/aarch64/ldp_stp_4.c: Likewise.
> * gcc.dg/store_merging_1.c: New test.
> * gcc.dg/store_merging_2.c: Likewise.
> * gcc.dg/store_merging_3.c: Likewise.
> * gcc.dg/store_merging_4.c: Likewise.
> * gcc.dg/store_merging_5.c: Likewise.
> * gcc.dg/store_merging_6.c: Likewise.
> * gcc.dg/store_merging_7.c: Likewise.
> * gcc.target/i386/pr22141.c: Likewise.
> * gcc.target/i386/pr34012.c: Add -fno-store-merging to dg-options.
> * g++.dg/init/new17.C: Likewise.
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)


Re: [PATCHv2] do not throw in std::make_exception_ptr

2016-10-21 Thread Gleb Natapov
On Fri, Oct 21, 2016 at 02:58:26PM +0300, Gleb Natapov wrote:
> On Fri, Oct 21, 2016 at 12:44:39PM +0100, Jonathan Wakely wrote:
> > On 21/10/16 14:36 +0300, Gleb Natapov wrote:
> > > On Thu, Oct 20, 2016 at 11:53:49PM -0400, Ryan Burn wrote:
> > > > Are exception classes required to support emplace new construction
> > > > like that? With this change, Intel's TBB library no longer compiles
> > > > because its exception class declares its own new operator (see
> > > > https://github.com/wjakob/tbb/blob/master/include/tbb/tbb_exception.h):
> > > > 
> > > Can you test this patch please:
> > 
> > That doesn't help, the overloaded new still prevents placement new.
> > Dammit.
> > 
> Hmm, are you sure? This program compiles for me (while it fails without ::):
> 
Looks like tbb also compiles and passes tests.

> #include <stdlib.h>
> #include <new>
> 
> struct S {
>   void* operator new (size_t size);
> };
> 
> int main() {
>  void* p = malloc(sizeof(S));
>  ::new(p) S();
> }
> 
> --
>   Gleb.

--
Gleb.


Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-10-21 Thread Bernd Schmidt



On 10/21/2016 02:04 PM, Jiong Wang wrote:

+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */
+  rtx *ptail = &REG_NOTES (new_rtx);
+  while (*ptail != NULL_RTX)
+    ptail = &XEXP (*ptail, 1);


I was thinking along the lines of something like this (untested, 
emit-rtl.c part omitted). Eric can choose whether he likes either of 
these or wants something else.



Bernd

Index: gcc/rtl.h
===
--- gcc/rtl.h   (revision 241233)
+++ gcc/rtl.h   (working copy)
@@ -3008,6 +3008,7 @@ extern rtx alloc_reg_note (enum reg_note
 extern void add_reg_note (rtx, enum reg_note, rtx);
 extern void add_int_reg_note (rtx, enum reg_note, int);
 extern void add_shallow_copy_of_reg_note (rtx_insn *, rtx);
+extern rtx duplicate_reg_note (rtx);
 extern void remove_note (rtx, const_rtx);
 extern void remove_reg_equal_equiv_notes (rtx_insn *);
 extern void remove_reg_equal_equiv_notes_for_regno (unsigned int);
Index: gcc/rtlanal.c
===
--- gcc/rtlanal.c   (revision 241233)
+++ gcc/rtlanal.c   (working copy)
@@ -2304,6 +2304,21 @@ add_shallow_copy_of_reg_note (rtx_insn *
 add_reg_note (insn, REG_NOTE_KIND (note), XEXP (note, 0));
 }

+/* Duplicate NOTE and return the copy.  */
+rtx
+duplicate_reg_note (rtx note)
+{
+  reg_note_kind kind = REG_NOTE_KIND (note);
+
+  if (GET_CODE (note) == INT_LIST)
+    return gen_rtx_INT_LIST ((machine_mode) kind, XINT (note, 0), NULL_RTX);
+  else if (GET_CODE (note) == EXPR_LIST)
+    return alloc_reg_note (kind, copy_insn_1 (XEXP (note, 0)), NULL_RTX);
+  else
+    return alloc_reg_note (kind, XEXP (note, 0), NULL_RTX);
+}
+
 /* Remove register note NOTE from the REG_NOTES of INSN.  */

 void
Index: gcc/sel-sched-ir.c
===
--- gcc/sel-sched-ir.c  (revision 241233)
+++ gcc/sel-sched-ir.c  (working copy)
@@ -5762,6 +5762,11 @@ create_copy_of_insn_rtx (rtx insn_rtx)
   res = create_insn_rtx_from_pattern (copy_rtx (PATTERN (insn_rtx)),
   NULL_RTX);

+  /* Locate the end of existing REG_NOTES in RES.  */
+  rtx *ptail = &REG_NOTES (res);
+  while (*ptail != NULL_RTX)
+    ptail = &XEXP (*ptail, 1);
+
   /* Copy all REG_NOTES except REG_EQUAL/REG_EQUIV and REG_LABEL_OPERAND
  since mark_jump_label will make them.  REG_LABEL_TARGETs are created
  there too, but are supposed to be sticky, so we copy them.  */
@@ -5770,11 +5775,8 @@ create_copy_of_insn_rtx (rtx insn_rtx)
&& REG_NOTE_KIND (link) != REG_EQUAL
&& REG_NOTE_KIND (link) != REG_EQUIV)
   {
-   if (GET_CODE (link) == EXPR_LIST)
- add_reg_note (res, REG_NOTE_KIND (link),
-   copy_insn_1 (XEXP (link, 0)));
-   else
- add_reg_note (res, REG_NOTE_KIND (link), XEXP (link, 0));
+   *ptail = duplicate_reg_note (link);
+   ptail = &XEXP (*ptail, 1);
   }

   return res;


[PATCH] Use SCEV in EVRP, fix single predecessor discovery

2016-10-21 Thread Richard Biener

This makes us derive ranges for loop IVs in EVRP using 
adjust_range_with_scev.  It also allows us to derive ranges from
conditions in loop preheaders (I think that's still broken because
we force simple preheaders and predecessor search doesn't follow
forwarders -- something for a follow-up).
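The new predecessor discovery can be modelled outside GCC roughly as follows. This sketch uses a hypothetical simplified edge type, in which `src_dominated_by_dest` stands in for `dominated_by_p (CDI_DOMINATORS, e->src, e->dest)`:

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified predecessor edge.  SRC_DOMINATED_BY_DEST marks
// a backedge, e.g. the latch edge into a loop header.
struct edge_info
{
  int src;
  bool src_dominated_by_dest;
};

// Return the index of the unique predecessor edge that is not a backedge,
// or -1 if there is none or more than one.  This mirrors the loop in the patch.
static int
single_non_backedge_pred (const std::vector<edge_info> &preds)
{
  int found = -1;
  for (int i = 0; i < (int) preds.size (); ++i)
    {
      if (preds[i].src_dominated_by_dest)
        continue;   // ignore backedges so loop headers still qualify
      if (found != -1)
        return -1;  // two forward predecessors: no single one
      found = i;
    }
  return found;
}
```

For a loop header with one preheader edge and one latch edge, the preheader edge is selected. A join block with two forward predecessors gets none.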

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2016-10-21  Richard Biener  

* tree-vrp.c (evrp_dom_walker::before_dom_children): Ignore
backedges when identifying the single predecessor to take
conditional info from.  Use SCEV to get at ranges for loop IVs.
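Here is a rough illustration of what SCEV contributes. Take an affine IV {base, +, step} whose latch executes at most niter times. Its value lies between base and base + step * niter. The sketch below is conceptual only. It is not adjust_range_with_scev, and the names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: inclusive bounds of the IV's value.
struct iv_range
{
  std::int64_t min, max;
};

// Conceptual sketch: bound an affine induction variable {BASE, +, STEP}
// whose loop latch runs at most NITER times.  Overflow and symbolic
// (non-constant) BASE/STEP/NITER, which real code must handle, are ignored.
static iv_range
affine_iv_range (std::int64_t base, std::int64_t step, std::int64_t niter)
{
  std::int64_t last = base + step * niter;  // value after the final increment
  if (step >= 0)
    return {base, last};
  return {last, base};
}
```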

Index: gcc/tree-vrp.c
===
--- gcc/tree-vrp.c  (revision 241400)
+++ gcc/tree-vrp.c  (working copy)
@@ -10693,12 +10693,29 @@ edge
 evrp_dom_walker::before_dom_children (basic_block bb)
 {
   tree op0 = NULL_TREE;
+  edge_iterator ei;
+  edge e;
 
   push_value_range (NULL_TREE, NULL);
-  if (single_pred_p (bb))
+
+  edge pred_e = NULL;
+  FOR_EACH_EDGE (e, ei, bb->preds)
+{
+  /* Ignore simple backedges from this to allow recording conditions
+in loop headers.  */
+  if (dominated_by_p (CDI_DOMINATORS, e->src, e->dest))
+   continue;
+  if (! pred_e)
+   pred_e = e;
+  else
+   {
+ pred_e = NULL;
+ break;
+   }
+}
+  if (pred_e)
 {
-  edge e = single_pred_edge (bb);
-  gimple *stmt = last_stmt (e->src);
+  gimple *stmt = last_stmt (pred_e->src);
   if (stmt
  && gimple_code (stmt) == GIMPLE_COND
  && (op0 = gimple_cond_lhs (stmt))
@@ -10715,7 +10732,7 @@ evrp_dom_walker::before_dom_children (ba
op1 = drop_tree_overflow (op1);
 
  /* If condition is false, invert the cond.  */
- if (e->flags & EDGE_FALSE_VALUE)
+ if (pred_e->flags & EDGE_FALSE_VALUE)
code = invert_tree_comparison (gimple_cond_code (stmt),
   HONOR_NANS (op0));
  /* Add VR when (OP0 CODE OP1) condition is true.  */
@@ -10743,11 +10760,7 @@ evrp_dom_walker::before_dom_children (ba
 }
 
   /* Visit PHI stmts and discover any new VRs possible.  */
-  gimple_stmt_iterator gsi;
-  edge e;
-  edge_iterator ei;
   bool has_unvisited_preds = false;
-
   FOR_EACH_EDGE (e, ei, bb->preds)
 if (e->flags & EDGE_EXECUTABLE
&& !(e->src->flags & BB_VISITED))
@@ -10761,12 +10774,24 @@ evrp_dom_walker::before_dom_children (ba
 {
   gphi *phi = gpi.phi ();
   tree lhs = PHI_RESULT (phi);
+  if (virtual_operand_p (lhs))
+   continue;
   value_range vr_result = VR_INITIALIZER;
   if (!has_unvisited_preds
  && stmt_interesting_for_vrp (phi))
extract_range_from_phi_node (phi, &vr_result);
   else
-   set_value_range_to_varying (&vr_result);
+   {
+ /* When we have an unvisited executable predecessor we can't
+use PHI arg ranges which may be still UNDEFINED but have
+to use VARYING for them.  But we can still resort to
+SCEV for loop header PHIs.  */
+ set_value_range_to_varying (&vr_result);
+ struct loop *l;
+ if ((l = loop_containing_stmt (phi))
+ && l->header == gimple_bb (phi))
+   adjust_range_with_scev (&vr_result, l, phi, lhs);
+   }
   update_value_range (lhs, &vr_result);
 
   /* Mark PHIs whose lhs we fully propagate for removal.  */
@@ -10778,7 +10803,8 @@ evrp_dom_walker::before_dom_children (ba
   edge taken_edge = NULL;
 
   /* Visit all other stmts and discover any new VRs possible.  */
-  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+  for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+   !gsi_end_p (gsi); gsi_next (&gsi))
 {
   gimple *stmt = gsi_stmt (gsi);
   tree output = NULL_TREE;


Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-10-21 Thread Jiong Wang

On 21/10/16 11:13, Bernd Schmidt wrote:

On 10/21/2016 09:43 AM, Eric Botcazou wrote:
I disagree: there are currently n ways of copying NOTEs in the RTL 
middle-end,
with different properties each time.  We need only one primitive in 
rtlanal.c.


I feel the fact that they have different properties means we shouldn't 
try to unify them: we'll just end up with a long list of boolean 
parameters, with no way of quickly telling what a given function call 
is doing. A copy loop is short enough that it can be implemented 
in-place and people can quickly tell what is going on by looking at it.


Maybe the inner if statement could be a small helper function 
(append_copy_of_reg_note).



Bernd


Hi Bernd, Eric,

  How does the attached patch look to you?  x86_64 bootstrap & regression OK.

  I borrowed Bernd's code to write the tail pointer directly.
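The tail pointer matters because add_reg_note pushes each note onto the front of REG_NOTES. Copying a list note by note with it therefore reverses the order. Keeping a pointer to the last `next` field appends the copies in order instead. The following self-contained sketch uses a hypothetical plain linked list, not the real EXPR_LIST/INT_LIST nodes:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a REG_NOTES chain.
struct note
{
  int kind;
  note *next;
};

// Like add_reg_note: push the new note onto the front of the list.
static void
prepend_note (note **head, int kind)
{
  *head = new note {kind, *head};
}

// Append copies of SRC to the end of *DST, preserving their order,
// using the pointer-to-pointer tail idiom from the patch.
static void
append_copies (note **dst, const note *src)
{
  note **ptail = dst;
  while (*ptail != nullptr)  // locate the end of the existing notes
    ptail = &(*ptail)->next;
  for (const note *link = src; link; link = link->next)
    {
      *ptail = new note {link->kind, nullptr};
      ptail = &(*ptail)->next;
    }
}

// Flatten a list into its kinds, for inspection.
static std::vector<int>
kinds (const note *n)
{
  std::vector<int> out;
  for (; n; n = n->next)
    out.push_back (n->kind);
  return out;
}

static void
free_notes (note *n)
{
  while (n)
    {
      note *next = n->next;
      delete n;
      n = next;
    }
}
```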


2016-10-21  Bernd Schmidt  
Jiong Wang  
  
gcc/


PR middle-end/78016
* emit-rtl.c (emit_copy_of_insn_after): Copy REG_NOTES in order instead
of in reverse order.
* sel-sched-ir.c (create_copy_of_insn_rtx): Likewise.


diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 2d6d1eb6c1311871f15dbed13d7c084ed3845a86..4d849ca6e64273bedc5bf8b9a62a5cc5d4606129 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -6168,17 +6168,31 @@ emit_copy_of_insn_after (rtx_insn *insn, rtx_insn *after)
  which may be duplicated by the basic block reordering code.  */
   RTX_FRAME_RELATED_P (new_rtx) = RTX_FRAME_RELATED_P (insn);
 
+  /* Locate the end of existing REG_NOTES in NEW_RTX.  */
+  rtx *ptail = &REG_NOTES (new_rtx);
+  while (*ptail != NULL_RTX)
+    ptail = &XEXP (*ptail, 1);
+
   /* Copy all REG_NOTES except REG_LABEL_OPERAND since mark_jump_label
  will make them.  REG_LABEL_TARGETs are created there too, but are
  supposed to be sticky, so we copy them.  */
   for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
 if (REG_NOTE_KIND (link) != REG_LABEL_OPERAND)
   {
-	if (GET_CODE (link) == EXPR_LIST)
-	  add_reg_note (new_rtx, REG_NOTE_KIND (link),
-			copy_insn_1 (XEXP (link, 0)));
+	rtx new_node;
+
+	if (GET_CODE (link) == INT_LIST)
+	  new_node = gen_rtx_INT_LIST ((machine_mode) REG_NOTE_KIND (link),
+   XINT (link, 0), NULL_RTX);
 	else
-	  add_shallow_copy_of_reg_note (new_rtx, link);
+	  new_node = alloc_reg_note (REG_NOTE_KIND (link),
+ (GET_CODE (link) == EXPR_LIST
+  ? copy_insn_1 (XEXP (link, 0))
+  : XEXP (link, 0)),
+ NULL_RTX);
+
+	*ptail = new_node;
+	ptail = &XEXP (new_node, 1);
   }
 
   INSN_CODE (new_rtx) = INSN_CODE (insn);
diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 210b1e4edfb359a161cda4826704005ae9ab5a24..324ae8cf05209757a3a3f3dee97c9274876c7ed7 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -5761,6 +5761,11 @@ create_copy_of_insn_rtx (rtx insn_rtx)
   res = create_insn_rtx_from_pattern (copy_rtx (PATTERN (insn_rtx)),
   NULL_RTX);
 
+  /* Locate the end of existing REG_NOTES in RES.  */
+  rtx *ptail = &REG_NOTES (res);
+  while (*ptail != NULL_RTX)
+    ptail = &XEXP (*ptail, 1);
+
   /* Copy all REG_NOTES except REG_EQUAL/REG_EQUIV and REG_LABEL_OPERAND
  since mark_jump_label will make them.  REG_LABEL_TARGETs are created
  there too, but are supposed to be sticky, so we copy them.  */
@@ -5769,11 +5774,12 @@ create_copy_of_insn_rtx (rtx insn_rtx)
 	&& REG_NOTE_KIND (link) != REG_EQUAL
 	&& REG_NOTE_KIND (link) != REG_EQUIV)
   {
-	if (GET_CODE (link) == EXPR_LIST)
-	  add_reg_note (res, REG_NOTE_KIND (link),
-			copy_insn_1 (XEXP (link, 0)));
-	else
-	  add_reg_note (res, REG_NOTE_KIND (link), XEXP (link, 0));
+	rtx new_node = alloc_reg_note (REG_NOTE_KIND (link),
+   (GET_CODE (link) == EXPR_LIST
+	? copy_insn_1 (XEXP (link, 0))
+	: XEXP (link, 0)), NULL_RTX);
+	*ptail = new_node;
+	ptail = &XEXP (new_node, 1);
   }
 
   return res;


Re: [PATCHv2] do not throw in std::make_exception_ptr

2016-10-21 Thread Gleb Natapov
On Fri, Oct 21, 2016 at 12:44:39PM +0100, Jonathan Wakely wrote:
> On 21/10/16 14:36 +0300, Gleb Natapov wrote:
> > On Thu, Oct 20, 2016 at 11:53:49PM -0400, Ryan Burn wrote:
> > > Are exception classes required to support emplace new construction
> > > like that? With this change, Intel's TBB library no longer compiles
> > > because its exception class declares its own new operator (see
> > > https://github.com/wjakob/tbb/blob/master/include/tbb/tbb_exception.h):
> > > 
> > Can you test this patch please:
> 
> That doesn't help, the overloaded new still prevents placement new.
> Dammit.
> 
Hmm, are you sure? This program compiles for me (while it fails without ::):

#include <stdlib.h>
#include <new>

struct S {
  void* operator new (size_t size);
};

int main() {
 void* p = malloc(sizeof(S));
 ::new(p) S();
}

--
Gleb.


Re: [PATCH] Change ranges_table and ranges_by_label arrays into vec<*, va_gc> *

2016-10-21 Thread Bernd Schmidt

On 10/20/2016 08:30 PM, Jakub Jelinek wrote:

This patch changes these two manually maintained arrays into normal
vec.h vectors.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Looks good. For safety, could you make a before/after comparison on one 
large source file to make sure the code is identical?



Bernd



Re: [PATCHv2] do not throw in std::make_exception_ptr

2016-10-21 Thread Jonathan Wakely

On 21/10/16 12:44 +0100, Jonathan Wakely wrote:

On 21/10/16 14:36 +0300, Gleb Natapov wrote:

On Thu, Oct 20, 2016 at 11:53:49PM -0400, Ryan Burn wrote:

Are exception classes required to support emplace new construction
like that? With this change, Intel's TBB library no longer compiles
because its exception class declares its own new operator (see
https://github.com/wjakob/tbb/blob/master/include/tbb/tbb_exception.h):


Can you test this patch please:


That doesn't help, the overloaded new still prevents placement new.
Dammit.

I'll see what we can do ...



We should be able to use SFINAE to detect when the placement
new-expression is valid.
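Here is one possible shape for such a check. It is a C++11 sketch using the usual void-specialization SFINAE idiom. The trait and type names are hypothetical, and this is not necessarily what libstdc++ ended up doing. The unqualified form fails for a type that, like tbb_exception, declares only operator new (size_t). Lookup stops at the class-scope allocation function, while the ::new form still finds the global placement operator new:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical trait: true when "new (p) T (arg)" is well-formed.
template<typename T, typename Arg, typename = void>
struct has_placement_new : std::false_type { };

template<typename T, typename Arg>
struct has_placement_new<T, Arg,
    decltype (void (new (std::declval<void *> ()) T (std::declval<Arg> ())))>
  : std::true_type { };

// Same check for "::new (p) T (arg)", which skips class-scope lookup.
template<typename T, typename Arg, typename = void>
struct has_global_placement_new : std::false_type { };

template<typename T, typename Arg>
struct has_global_placement_new<T, Arg,
    decltype (void (::new (std::declval<void *> ()) T (std::declval<Arg> ())))>
  : std::true_type { };

// Example types: one without its own allocation function, and one that,
// like tbb_exception, declares only "operator new (size_t)".
struct plain_exc { };
struct own_new_exc
{
  void *operator new (std::size_t);
};
```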



Re: [PATCHv2] do not throw in std::make_exception_ptr

2016-10-21 Thread Jonathan Wakely

On 21/10/16 14:36 +0300, Gleb Natapov wrote:

On Thu, Oct 20, 2016 at 11:53:49PM -0400, Ryan Burn wrote:

Are exception classes required to support emplace new construction
like that? With this change, Intel's TBB library no longer compiles
because its exception class declares its own new operator (see
https://github.com/wjakob/tbb/blob/master/include/tbb/tbb_exception.h):


Can you test this patch please:


That doesn't help, the overloaded new still prevents placement new.
Dammit.

I'll see what we can do ...




Re: [PATCHv2] do not throw in std::make_exception_ptr

2016-10-21 Thread Gleb Natapov
On Thu, Oct 20, 2016 at 11:53:49PM -0400, Ryan Burn wrote:
> Are exception classes required to support emplace new construction
> like that? With this change, Intel's TBB library no longer compiles
> because its exception class declares its own new operator (see
> https://github.com/wjakob/tbb/blob/master/include/tbb/tbb_exception.h):
> 
Can you test this patch please:


diff --git a/libstdc++-v3/libsupc++/exception_ptr.h b/libstdc++-v3/libsupc++/exception_ptr.h
index 21e4e8b..6ade626 100644
--- a/libstdc++-v3/libsupc++/exception_ptr.h
+++ b/libstdc++-v3/libsupc++/exception_ptr.h
@@ -190,7 +190,7 @@ namespace std
   (void)__cxxabiv1::__cxa_init_primary_exception(__e,
	 const_cast<type_info*>(&typeid(__ex)),
	 __exception_ptr::__dest_thunk<_Ex>);
-  new (__e) _Ex(__ex);
+  ::new (__e) _Ex(__ex);
   return exception_ptr(__e);
 #else
   throw __ex;
> 
> class tbb_exception : public std::exception
> {
> /** No operator new is provided because the TBB usage model assumes dynamic
>  creation of the TBB exception objects only by means of applying move()
>  operation on an exception thrown out of TBB scheduler. **/
> void* operator new ( size_t );
> 
> 
> 
> On Mon, Aug 22, 2016 at 1:29 PM, Jonathan Wakely  wrote:
> > On 21/08/16 15:20 +0300, Gleb Natapov wrote:
> >>
> >> Jonathan,
> >>
> >> Is this version OK with you?
> >
> >
> > I've committed the attached version, which just adds some whitespace
> > and fixes the testsuite_abi.cc test.
> >
> > Thanks very much for the improvement to the code.
> >

--
Gleb.


MAINTAINERS: Update Hartmut Penner's email address

2016-10-21 Thread Ulrich Weigand
Hello,

as requested by Hartmut, I've updated the MAINTAINERS file to show
his new email address since the old one no longer works.

Bye,
Ulrich

Index: MAINTAINERS
===
--- MAINTAINERS (revision 241398)
+++ MAINTAINERS (working copy)
@@ -94,7 +94,7 @@ rs6000/powerpc port   David Edelsohn  
 rs6000 vector extnsAldy Hernandez  
 rx portNick Clifton
-s390 port  Hartmut Penner  
+s390 port  Hartmut Penner  
 s390 port  Ulrich Weigand  
 s390 port  Andreas Krebbel 
 score port Chen Liqin  
-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



[PATCH][2/2] Early LTO debug -- main part

2016-10-21 Thread Richard Biener

This is the main part of the early LTO debug support.  The main parts
of the changes are to dwarf2out.c where most of the changes are related
to the fact that we eventually have to output debug info twice, once
for the early LTO part and once for the fat part of the object file.

Bootstrapped and tested on x86_64-unknown-linux-gnu with ASAN and TSAN
extra FAILs (see PR78063, a libbacktrace missing feature or libsanitizer
being too pessimistic).  There's an extra

XPASS: gcc.dg/guality/inline-params.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test

the previously reported extra VLA guality FAILs are gone.

I've compared testresults with -flto -g added for all languages and
only see expected differences (libstdc++ pretty printers now work,
most scan-assembler-times debug testcases fail because we have everything
twice now).

See https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01842.html for
the last posting of this patch which has a high-level overview of
Early LTO debug.  You may want to refer to the slides I presented
at the GNU Cauldron as well.

Thanks,
Richard.

2016-10-21  Richard Biener  

* debug.h (struct gcc_debug_hooks): Add die_ref_for_decl and
register_external_die hooks.
(debug_false_tree_charstarstar_uhwistar): Declare.
(debug_nothing_tree_charstar_uhwi): Likewise.
* debug.c (do_nothing_debug_hooks): Adjust.
(debug_false_tree_charstarstar_uhwistar): New do nothing.
(debug_nothing_tree_charstar_uhwi): Likewise.
* dbxout.c (dbx_debug_hooks): Adjust.
(xcoff_debug_hooks): Likewise.
* sdbout.c (sdb_debug_hooks): Likewise.
* vmsdbgout.c (vmsdbg_debug_hooks): Likewise.

* dwarf2out.c (macinfo_label_base): New global.
(dwarf2out_register_external_die): New function for the
register_external_die hook.
(dwarf2out_die_ref_for_decl): Likewise for die_ref_for_decl.
(dwarf2_debug_hooks): Use them.
(dwarf2_lineno_debug_hooks): Adjust.
(struct die_struct): Add with_offset flag.
(DEBUG_LTO_DWO_INFO_SECTION, DEBUG_LTO_INFO_SECTION,
DEBUG_LTO_DWO_ABBREV_SECTION, DEBUG_LTO_ABBREV_SECTION,
DEBUG_LTO_DWO_MACINFO_SECTION, DEBUG_LTO_MACINFO_SECTION,
DEBUG_LTO_DWO_MACRO_SECTION, DEBUG_LTO_MACRO_SECTION,
DEBUG_LTO_LINE_SECTION, DEBUG_LTO_DWO_STR_OFFSETS_SECTION,
DEBUG_LTO_STR_DWO_SECTION, DEBUG_STR_LTO_SECTION): New macros
defining section names for the early LTO debug variants.
(reset_indirect_string): New helper.
(add_AT_external_die_ref): Helper for dwarf2out_register_external_die.
(print_dw_val): Add support for offsetted symbol references.
(compute_section_prefix_1): Split out worker to distinguish
the comdat from the LTO case.
(compute_section_prefix): Wrap old comdat case here.
(output_die): Skip DIE symbol output for the LTO added one.
Handle DIE symbol references with offset.
(output_comp_unit): Guard section name mangling properly.
For LTO debug sections emit a symbol at the section beginning
which we use to refer to its DIEs.
(add_abstract_origin_attribute): For DIEs registered via
dwarf2out_register_external_die directly refer to the early
DIE rather than indirectly through the shadow one we created.
(gen_array_type_die): When generating early LTO debug do
not emit DW_AT_string_length.
(gen_formal_parameter_die): Do not re-create DIEs for PARM_DECLs
late when in LTO.
(gen_subprogram_die): Adjust the check for whether we face
a concrete instance DIE for an inline we can reuse for the
late LTO case.  Likewise avoid another specification DIE
for early built declarations/definitions for the late LTO case.
(gen_variable_die): Add type references for late duplicated VLA dies
when in late LTO.
(gen_inlined_subroutine_die): Do not call dwarf2out_abstract_function,
we have the abstract instance already.
(process_scope_var): Adjust decl DIE contexts in LTO which
first puts them in limbo.
(gen_decl_die): Do not generate type DIEs late apart from
types for VLAs or for decls we do not yet have a DIE.
(dwarf2out_early_global_decl): Make sure to create DIEs
for abstract instances of a decl first.
(dwarf2out_late_global_decl): Adjust comment.
(output_macinfo_op): With multiple macro sections use
macinfo_label_base to distinguish labels.
(output_macinfo): Likewise.  Update macinfo_label_base.
Pass in the line info label.
(init_sections_and_labels): Add early LTO debug flag parameter
and generate different sections and names if set.  Add generation
counter for the labels so we can have multiple of them.
(reset_dies): Helper to allow DIEs to be output multiple times.
(dwarf2out_

Re: libgo patch committed: Rewrite interface code into Go

2016-10-21 Thread Rainer Orth
Hi Ian,

> This patch to libgo rewrites the interface code from C to Go.
>
> I started to copy the Go 1.7 interface code, but the gc and gccgo
> representations of interfaces are too different.  So instead I rewrote
> the gccgo interface code from C to Go.  The code is largely the same
> as it was, but the names are more like those used in the gc runtime.
>
> I also copied over the string comparison functions, and tweaked the
> compiler to use eqstring when comparing strings for equality.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.

This patch broke the Solaris 11 and 12 bootstrap:

In file included from /vol/gcc/src/hg/trunk/local/libgo/runtime/runtime.h:113:0,
 from /vol/gcc/src/hg/trunk/local/libgo/runtime/go-main.c:17:
./runtime.inc:2:12: error: expected identifier or '(' before numeric constant
 #define c1 326713
^
./runtime.inc:713:11: note: in expansion of macro 'c1'
  uint32_t c1;
   ^~
Makefile:1630: recipe for target 'libgobegin_a-go-main.o' failed
make[4]: *** [libgobegin_a-go-main.o] Error 1

runtime.inc starts with

#define c0 2860486313
#define c1 326713

and lines 712-713 have

struct _Compartments_t {
uint32_t c1;

which stems from  (Compartments_t).

It seems c[01] were introduced via the new go/runtime/alg.go.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [libstdc++, testsuite] Add dg-require-thread-fence

2016-10-21 Thread Kyrill Tkachov

Hi all,

On 21/10/16 09:00, Christophe Lyon wrote:

[ccying Ramana]


Ramana is currently OoO all of this week.

Kyrill


On 20 October 2016 at 18:34, Jonathan Wakely  wrote:

On 20/10/16 09:26 -0700, Mike Stump wrote:

On Oct 20, 2016, at 5:20 AM, Jonathan Wakely  wrote:


I am considering leaving this in the ARM backend to force people to
think what they want to do about thread safety with statics and C++
on bare-metal systems.


The quoting makes it look like those are my words, but I was quoting
Ramana from https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02751.html


Not quite in the GNU spirit?  The port people should decide the best way
to get as much functionality as possible and everything should just work, no
sharp edges.

Forcing people to think sounds like a sharp edge?


I'm inclined to agree, but we are talking about bare metal systems,
where there is no one-size-fits-all solution. Choosing something that
makes most of the library unusable will upset one group of people, and
choosing something that adds overhead that could be avoided will upset
another group.

Either way, I don't think disabling 50% of the testsuite is the
answer. If you don't like the failures, configure the library to build
without threadsafe statics, or configure it to depend on libatomic, or
something else. (We might want new --enable-xxx switches to simplify
doing that).


So if we say that the current behaviour has to keep being the default,
so that users think about what they are really doing, I can certainly
patch my validation scripts to add a configure flag for this particular
configuration.

Is arm-none-eabi the only target having this problem?

Thanks,

Christophe





[PATCH][1/2] Early LTO debug -- simple-object piece

2016-10-21 Thread Richard Biener

Apart from fixing a memleak, this is unchanged from the last posting,
which means it still has no support for Mach-O or [X]COFF.

Bootstrapped / tested on x86_64-unknown-linux-gnu.

Richard.

2016-10-21  Richard Biener  

include/
* simple-object.h (simple_object_copy_lto_debug_sections): New
function.

libiberty/
* simple-object-common.h (struct simple_object_functions): Add
copy_lto_debug_sections hook.
* simple-object.c: Include fcntl.h.
(handle_lto_debug_sections): New helper function.
(simple_object_copy_lto_debug_sections): New function copying
early LTO debug sections to regular debug sections in a new file.
(simple_object_start_write): Handle NULL segment_name.
* simple-object-coff.c (simple_object_coff_functions): Adjust
for not implemented copy_lto_debug_sections hook.
* simple-object-mach-o.c (simple_object_mach_o_functions): Likewise.
* simple-object-xcoff.c (simple_object_xcoff_functions): Likewise.
* simple-object-elf.c (SHT_NULL, SHT_SYMTAB, SHT_RELA, SHT_REL,
SHT_GROUP): Add various section header types.
(SHF_EXCLUDE): Add flag.
(Elf32_External_Sym, Elf64_External_Sym): Add symbol struct.
(ELF_ST_BIND, ELF_ST_TYPE, ELF_ST_INFO): Add accessors.
(STT_OBJECT, STT_FUNC, STT_TLS, STT_GNU_IFUNC): Add Symbol types.
(STV_DEFAULT): Add symbol visibility.
(SHN_COMMON): Add special section index name.
(struct simple_object_elf_write): New.
(simple_object_elf_start_write): Adjust for new private data.
(simple_object_elf_write_shdr): Pass in values for all fields
we write.
(simple_object_elf_write_to_file): Adjust.  Copy from recorded
section headers if requested.
(simple_object_elf_release_write): Release private data.
(simple_object_elf_copy_lto_debug_sections): Copy and rename sections
as denoted by PFN and all their dependences, symbols and relocations
to the empty destination file.
(simple_object_elf_functions): Adjust for copy_lto_debug_sections hook.
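For reference, the st_info byte of an ELF symbol holds the binding in its high four bits and the type in its low four bits. The ELF_ST_BIND, ELF_ST_TYPE and ELF_ST_INFO accessors encode this packing. The sketch below shows the same packing as constexpr functions. The lowercase names are hypothetical, since the patch itself uses macros. The constant values follow the ELF specification, and STT_GNU_IFUNC is a GNU extension:

```cpp
#include <cassert>
#include <cstdint>

// Symbol bindings and types as defined by the ELF specification.
constexpr unsigned STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2;
constexpr unsigned STT_OBJECT = 1, STT_FUNC = 2, STT_TLS = 6, STT_GNU_IFUNC = 10;

// st_info keeps the binding in the high nibble and the type in the low one.
constexpr unsigned elf_st_bind (std::uint8_t info) { return info >> 4; }
constexpr unsigned elf_st_type (std::uint8_t info) { return info & 0xf; }

constexpr std::uint8_t
elf_st_info (unsigned bind, unsigned type)
{
  return static_cast<std::uint8_t> ((bind << 4) + (type & 0xf));
}
```

For example, a global function symbol has st_info 0x12.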

Index: early-lto-debug/include/simple-object.h
===
--- early-lto-debug.orig/include/simple-object.h	2016-10-19 13:19:58.012326431 +0200
+++ early-lto-debug/include/simple-object.h	2016-10-20 10:51:49.861722998 +0200
@@ -197,6 +197,14 @@ simple_object_write_to_file (simple_obje
 extern void
 simple_object_release_write (simple_object_write *);
 
+/* Copy LTO debug sections from SRC_OBJECT to DEST.
+   If an error occurs, return the errno value in ERR and an error string.  */
+
+extern const char *
+simple_object_copy_lto_debug_sections (simple_object_read *src_object,
+  const char *dest,
+  int *err);
+
 #ifdef __cplusplus
 }
 #endif
Index: early-lto-debug/libiberty/simple-object-common.h
===
--- early-lto-debug.orig/libiberty/simple-object-common.h	2016-10-19 13:19:58.012326431 +0200
+++ early-lto-debug/libiberty/simple-object-common.h	2016-10-20 10:51:49.865723045 +0200
@@ -141,6 +141,12 @@ struct simple_object_functions
 
   /* Release the private data for an simple_object_write.  */
   void (*release_write) (void *);
+
+  /* Copy LTO debug sections.  */
+  const char *(*copy_lto_debug_sections) (simple_object_read *sobj,
+ simple_object_write *dobj,
+ int (*pfn) (const char **),
+ int *err);
 };
 
 /* The known object file formats.  */
Index: early-lto-debug/libiberty/simple-object-elf.c
===
--- early-lto-debug.orig/libiberty/simple-object-elf.c	2016-10-19 13:19:58.012326431 +0200
+++ early-lto-debug/libiberty/simple-object-elf.c	2016-10-20 10:51:49.865723045 +0200
@@ -183,8 +183,55 @@ typedef struct {
 
 /* Values for sh_type field.  */
 
+#define SHT_NULL   0   /* Section header table entry unused */
 #define SHT_PROGBITS   1   /* Program data */
+#define SHT_SYMTAB 2   /* Link editing symbol table */
 #define SHT_STRTAB 3   /* A string table */
+#define SHT_RELA   4   /* Relocation entries with addends */
+#define SHT_REL    9   /* Relocation entries, no addends */
+#define SHT_GROUP  17  /* Section contains a section group */
+
+/* Values for sh_flags field.  */
+
+#define SHF_EXCLUDE    0x80000000  /* Link editor is to exclude this
+  section from executable and
+  shared library that it builds
+  when those objects are not to be

Re: [Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-10-21 Thread Bernd Schmidt

On 10/21/2016 12:46 PM, Senthil Kumar Selvaraj wrote:

How does this look?


Looks good, thanks.


Bernd



Re: [PATCH] PR77985: DWARF: Emit DW_AT_comp_dir in all cases, even if source is an absolute path

2016-10-21 Thread Ximin Luo
Richard Biener:
> On Tue, Oct 18, 2016 at 2:35 PM, Ximin Luo  wrote:
>>
>> Thanks, I'll add the Changelog entry. My computer isn't very powerful, so I 
>> didn't bootstrap it yet, I only tested it on a stage1 compiler, on Debian 
>> testing/unstable. I'll find some time to bootstrap it and test it fully over 
>> the next few days.
>>
>> Shall I also get rid of the Darwin force_at_comp_dir stuff? Looking into it 
>> a bit more, my patch basically obsoletes the need for this so I can delete 
>> that as well.
> 
> That would be nice.
> 

Hi,

Attached is the ChangeLog plus updated patch, rebased against the 2016-10-16 
snapshot. Also I noticed I got the wrong bug number, the correct one is 77985 
not 77895.

I've tested it on a Debian testing/unstable x86_64-linux-gnu system. The 
results are good: the same tests fail both before and after the patch, and we 
have 2 new expected successes. Unfortunately I don't have access (and am 
unlikely to get access) to a Darwin system to test it on.

Snippets of the test logs are attached. The full logs are about 200MB each in 
size (4MB XZ-compressed, each) so I guessed I shouldn't send them via email... 
The snippets were grepped from the logs using the '^FAIL: \|^# of\|pr77985' 
pattern. You can diff them to check that the results are same in both cases.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git
== gcc-build/gcc/testsuite/gcc/gcc.log ==
FAIL: c is -1, not 6303904
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303904
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303920
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303920
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303936
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303936
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303952
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303952
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303968
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303968
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: c is -1, not 6303984
FAIL: v is -1, not 13
FAIL: e is -1, not 6304000
FAIL: ret is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: o is -1, not 6303904
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: o is -1, not 6303904
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: o is -1, not 6303920
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: o is -1, not 6303920
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: o is -1, not 6303936
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: o is -1, not 6303936
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: o is -1, not 6303952
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: o is -1, not 6303952
FAIL: w is -1, not 6303984
FAIL: c is -1, not 6303984
FAIL: o is -1, not 6303968
FAIL: w is -1, not 6303984
FAIL: ret is -1, not 6303968
FAIL: c is -1, not 6303904
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303904
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: e is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: e is -1, not 6303984
FAIL: ret is -1, not 0
FAIL: c is -1, not 6303904
FAIL: n is -1, not 6303920
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303904
FAIL: n is -1, not 6303920
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: n is -1, not 6303936
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303920
FAIL: n is -1, not 6303936
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: n is -1, not 6303952
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303936
FAIL: n is -1, not 6303952
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: n is -1, not 6303968
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303952
FAIL: n is -1, not 6303968
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: n is -1, not 6303984
FAIL: t is -1, not 6303984
FAIL: c is -1, not 6303968
FAIL: n is -1, not 6303984
FAIL: t is -1, not 6303984
FAIL: ret is -1, not 0
FAIL: 5 PASS, 114 FAIL, 0 UNRESOLVED
FAIL: ret is -1, not 6299888
FAIL: o is -1, not 6299808
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299808
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299824
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299824
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299840
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299840
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299856
FAIL: w is -1, not 6299888
FAIL: o is -1, not 6299856
FAIL: w is -1, not 6299888
FAIL: o is -1, no

Re: [Patch, reload, tentative, PR 71627] Tweak conditions in find_valid_class_1

2016-10-21 Thread Senthil Kumar Selvaraj

Bernd Schmidt writes:

> On 10/18/2016 02:15 PM, Senthil Kumar Selvaraj wrote:
>> Will do both the changes and re-run the reg tests. Ok for trunk if the
>> tests pass for x86_64-pc-linux and avr?
>>
> Probably but let's see the patch first.

How does this look?

Bootstrapped and reg-tested x86_64-pc-linux on top of trunk@190252 with
the in_hard_reg_set_p patch backported; there were no failures. Also ran
regtests for avr on trunk, and there were no failures there either.

Ok to commit to trunk?

Regards
Senthil

gcc/ChangeLog:

2016-10-21  Senthil Kumar Selvaraj  

* reload.c (find_valid_class_1): Allow regclass if at least one
regno in class is ok. Compute and use rclass size based on
actually available regnos for mode in rclass.

gcc/testsuite/ChangeLog:

2016-10-21  Senthil Kumar Selvaraj  

* gcc.target/avr/pr71627.c: New test.
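The reload.c ChangeLog entry above can be read as the following selection rule. This is a minimal standalone C sketch, not reload.c: register classes are modelled as bitmasks over a hypothetical 8-register file, `mode_ok` stands in for HARD_REGNO_MODE_OK, and the names `usable_class_size` and `pick_class` are illustrative. The explicit skip of classes with no usable register is an assumption taken from the ChangeLog wording ("at least one regno in class is ok"); the diff itself has no such line.

```c
#include <stdbool.h>

/* Hypothetical 8-register machine; mode_ok[r] models
   HARD_REGNO_MODE_OK (r, mode) for the mode being reloaded.  */
#define NREGS 8

/* Count the registers in CLASS_MASK that can actually hold the mode,
   in the spirit of computed_rclass_size in the patch.  */
static unsigned
usable_class_size (unsigned class_mask, const bool mode_ok[NREGS])
{
  unsigned n = 0;
  for (unsigned r = 0; r < NREGS; r++)
    if ((class_mask & (1u << r)) != 0 && mode_ok[r])
      n++;
  return n;
}

/* Return the index of the preferred class, or -1 if none qualifies.
   The comparison mirrors the patched loop, but ranks classes by their
   usable size instead of their full size.  */
static int
pick_class (const unsigned classes[], const int costs[], int nclasses,
            const bool mode_ok[NREGS])
{
  int best_class = -1;
  unsigned best_size = 0;
  int best_cost = -1;

  for (int c = 1; c < nclasses; c++)   /* class 0 models NO_REGS */
    {
      unsigned size = usable_class_size (classes[c], mode_ok);
      int cost = costs[c];

      /* Assumption of this sketch: a class qualifies when at least
         one of its registers is usable for the mode.  */
      if (size == 0)
        continue;

      if ((size > best_size && (best_cost < 0 || best_cost >= cost))
          || best_cost > cost)
        {
          best_class = c;
          best_size = size;
          best_cost = cost;
        }
    }
  return best_class;
}
```

With mode_ok = {1, 1, 0, 0, 1, 1, 1, 1}, a class covering registers 0-3 now counts as size 2 and can be chosen, whereas the pre-patch loop rejected it outright because registers 2 and 3 are bad.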




diff --git gcc/reload.c gcc/reload.c
index 9a859e5..880099e 100644
--- gcc/reload.c
+++ gcc/reload.c
@@ -715,25 +715,23 @@ find_valid_class_1 (machine_mode outer ATTRIBUTE_UNUSED,
 
   for (rclass = 1; rclass < N_REG_CLASSES; rclass++)
 {
-  int bad = 0;
-  for (regno = 0; regno < FIRST_PSEUDO_REGISTER && !bad; regno++)
-   {
- if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno)
- && !HARD_REGNO_MODE_OK (regno, mode))
-   bad = 1;
-   }
-  
-  if (bad)
-   continue;
+  unsigned int computed_rclass_size = 0;
+
+  for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+{
+  if (in_hard_reg_set_p (reg_class_contents[rclass], mode, regno)
+  && (HARD_REGNO_MODE_OK (regno, mode)))
+computed_rclass_size++;
+}
 
   cost = register_move_cost (outer, (enum reg_class) rclass, dest_class);
 
-  if ((reg_class_size[rclass] > best_size
+  if ((computed_rclass_size > best_size
   && (best_cost < 0 || best_cost >= cost))
  || best_cost > cost)
{
  best_class = (enum reg_class) rclass;
- best_size = reg_class_size[rclass];
+ best_size = computed_rclass_size;
  best_cost = register_move_cost (outer, (enum reg_class) rclass,
  dest_class);
}
diff --git gcc/testsuite/gcc.target/avr/pr71627.c gcc/testsuite/gcc.target/avr/pr71627.c
new file mode 100644
index 000..eaef3d2
--- /dev/null
+++ gcc/testsuite/gcc.target/avr/pr71627.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+
+extern volatile __memx const long  a, b, c, d, e, f;
+extern volatile long result;
+
+extern void vfunc (const char*, ...);
+
+void foo (void)
+{
+   result = a + b + c + d + e + f;
+   vfunc ("text", a, b, c, d, e, f, result);
+}


Re: RFC [1/3] divmod transform v2

2016-10-21 Thread Prathamesh Kulkarni
On 20 October 2016 at 15:02, Richard Biener  wrote:
> On Wed, 19 Oct 2016, Jeff Law wrote:
>
>> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>> > Hi,
>> > After approval from Bernd Schmidt, I committed the patch to remove the
>> > optab functions for sdivmod_optab and udivmod_optab in optabs.def,
>> > which unblocks the divmod patch.
>> >
>> > This patch is mostly the same as the previous one, except that it drops
>> > targeting __udivmoddi4(), because calling __udivmoddi4() gave an
>> > undefined-reference link error on aarch64-linux-gnu.
>> > It appears aarch64 has a hardware insn for DImode div, so __udivmoddi4()
>> > isn't needed for that target (it was a bug in my patch that it called
>> > __udivmoddi4() even though aarch64 supports hardware div).
>> >
>> > However, this makes me wonder whether __udivmoddi4() is guaranteed to
>> > be available for a target that has neither hardware div nor a divmod
>> > insn, and no target-specific libfunc for DImode divmod. To be
>> > conservative, the attached patch doesn't generate calls to
>> > __udivmoddi4.
>> >
>> > Passes bootstrap+test on x86_64-unknown-linux.
>> > Cross-tested on arm*-*-*, aarch64*-*-*.
>> > Verified that there are no regressions with SPEC2006 on
>> > x86_64-unknown-linux-gnu.
>> > OK to commit ?
>> >
>> > Thanks,
>> > Prathamesh
>> >
>> >
>> > divmod-v2-3-main.txt
>> >
>> >
>> > 2016-10-15  Prathamesh Kulkarni  
>> > Kugan Vivekanandarajah  
>> > Jim Wilson  
>> >
>> > * target.def: New hook expand_divmod_libfunc.
>> > * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
>> > * doc/tm.texi: Regenerate.
>> > * internal-fn.def: Add new entry for DIVMOD ifn.
>> > * internal-fn.c (expand_DIVMOD): New.
>> > * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
>> > targhooks.h.
>> > (widen_mul_stats): Add new field divmod_calls_inserted.
>> > (target_supports_divmod_p): New.
>> > (divmod_candidate_p): Likewise.
>> > (convert_to_divmod): Likewise.
>> > (pass_optimize_widening_mul::execute): Call
>> > calculate_dominance_info(), renumber_gimple_stmt_uids() at
>> > beginning of function. Call convert_to_divmod()
>> > and record stats for divmod.
>> Starting with some high level design comments.  If these conflict with
>> comments from others, let me know and we'll work through the issues.
>>
>> I don't really like introducing code conditional on the target capabilities
>> this early in the gimple optimization pipeline.
>
> It's basically done right before RTL expansion
> (pass_optimize_widening_mul).
>
>> Would it be possible to always do the transformation to divmod in the
>> gimple optimizers, regardless of the target capabilities, and then make
>> the final decision about divmod insn, libcall, or separate div/mod insns
>> in the gimple->RTL expanders?
>
> The issue is that it hoists one or both of the division and the modulo,
> and if we don't do the transform we'd want to undo that code motion.
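To make the transform under discussion concrete: when both a / b and a % b of the same operands are needed, one call can return the quotient and store the remainder through a pointer, the same shape as libgcc's __udivmoddi4. This C is only a sketch; `sketch_udivmod`, `separate` and `combined` are hypothetical names, and the actual patch works on GIMPLE (an internal DIVMOD call), not on C source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical divmod routine: returns N / D and stores N % D in *REM,
   using restoring shift-and-subtract division.  */
static uint64_t
sketch_udivmod (uint64_t n, uint64_t d, uint64_t *rem)
{
  uint64_t q = 0, r = 0;

  assert (d != 0);
  for (int bit = 63; bit >= 0; bit--)
    {
      /* Remember the bit shifted out of R: if it was set, the true
         partial remainder is >= 2^64 > D, so a subtraction is due.  */
      uint64_t carry = r >> 63;
      r = (r << 1) | ((n >> bit) & 1);
      if (carry || r >= d)
        {
          r -= d;
          q |= (uint64_t) 1 << bit;
        }
    }
  *rem = r;
  return q;
}

/* Before the transform: two independent operations.  */
static void
separate (uint64_t a, uint64_t b, uint64_t *x, uint64_t *y)
{
  *x = a / b;
  *y = a % b;
}

/* After the transform: one combined call feeds both results.  */
static void
combined (uint64_t a, uint64_t b, uint64_t *x, uint64_t *y)
{
  uint64_t r;
  *x = sketch_udivmod (a, b, &r);
  *y = r;
}
```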
>
>> That would move all the target dependencies out of the gimple optimizers and
>> into the gimple->rtl expansion phase, which is the preferred place to start
>> introducing this kind of target dependency.
>>
>> With that background, I'm going to focus more on the identification of divmod
>> opportunities than the expansion bits.
>>
>>
>> >
>> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> > index a4a8e49..866c368 100644
>> > --- a/gcc/doc/tm.texi
>> > +++ b/gcc/doc/tm.texi
>> > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets, please refer to
>> >  the hook implementation for how different fusion types are supported.
>> >  @end deftypefn
>> >
>> > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx *@var{quot}, rtx *@var{rem})
>> > +Define this hook for enabling divmod transform if the port does not have
>> > +hardware divmod insn but defines target-specific divmod libfuncs.
>> > +@end deftypefn
>> > +
>> >  @node Sections
>> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
>> >  @c the above section title is WAY too long.  maybe cut the part between
>> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> > index 265f1be..c4c387b 100644
>> > --- a/gcc/doc/tm.texi.in
>> > +++ b/gcc/doc/tm.texi.in
>> > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
>> >
>> >  @hook TARGET_SCHED_FUSION_PRIORITY
>> >
>> > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
>> > +
>> >  @node Sections
>> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
>> >  @c the above section title is WAY too long.  maybe cut the part between
>> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>> > index 0b32d5f..42c6973 100644
>> > --- a/gcc/internal-fn.c
>> > +++ b/gcc/internal-fn.c
>> > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (i

[Patch, fortran] PR69834 - Collision in derived type hashes

2016-10-21 Thread Paul Richard Thomas
Dear All,

I had the attached patch more or less working at the end of January.
However, there was a regression with submodule_6.f08, which I had
quite a struggle with and only resolved yesterday.

Until now, select type used the hash value to do the type selection,
with the inevitable consequence that a collision occurred, albeit a
good number of years after the introduction of OOP. The new testcase
is that of the reporter.

I had developed a fix that used the full, composite string containing
the type name and its module. This works fine, but the string length is
such that there is a significant performance hit.

Mikael suggested using the address of the vtable for type selection
and, apart from the regression mentioned above, this was pretty easy
to get going and causes no measurable performance hit.
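A standalone C sketch of the difference between the two selection schemes, assuming a cut-down vtable that carries only a hash (gfortran's real vtables have more fields; the struct and function names here are hypothetical). Two distinct types with colliding hashes are misdispatched by the hash comparison but not by the address comparison. Address comparison is sound only if each type has exactly one vtable.

```c
/* Hypothetical, cut-down vtable: only the hash field matters here.  */
struct vtab
{
  unsigned hash;
};

/* Two distinct derived types whose hash values happen to collide.  */
static const struct vtab vtab_t_a = { 42u };
static const struct vtab vtab_t_b = { 42u };

/* Old scheme: dispatch on the hash of the dynamic type.  */
static int
select_by_hash (const struct vtab *dyn)
{
  if (dyn->hash == vtab_t_a.hash)
    return 1;                   /* TYPE IS (t_a) */
  if (dyn->hash == vtab_t_b.hash)
    return 2;                   /* TYPE IS (t_b) */
  return 0;                     /* CLASS DEFAULT */
}

/* New scheme: dispatch on the vtable address, which is unique per type
   as long as only one vtable is emitted for each type.  */
static int
select_by_address (const struct vtab *dyn)
{
  if (dyn == &vtab_t_a)
    return 1;
  if (dyn == &vtab_t_b)
    return 2;
  return 0;
}
```

An object of type t_b reaches the t_a branch under the hash scheme, which is the silent failure the reporter hit.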

The problem with submodule_6.f08 was that multiple versions of the
vtable existed for derived type 't_b'. The modifications to
class.c (gfc_find_derived_vtab) solve this issue and ensure that the
vtable is unique. See the comments in the patch to understand the
mechanism.

I have retained the use of the hash value for intrinsic types, since I
know that there are no collisions there. For classes and derived
types, the addresses of the corresponding vtables are used.
resolve_select_type has been modified accordingly. Note that since
select type is no longer translated into select case, a test for
repeated cases had to be introduced. I retained the original message.
If desired, the logic could be broken out into a separate function and
the message modified to reflect the source being select type rather
than select case.

The translation now occurs in two functions in trans-stmt.c. The
implementation is straightforward. Note that I have used a series of
if (condition) {block; goto end_label;} rather than nested
if () {} else {} blocks. This reduces the complexity somewhat and should
not lead to any significant performance problems.
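For comparison, here is the control-flow shape of the two styles in plain C. This is only an analogy: gfc_trans_select_type builds GENERIC trees rather than C source, and the function names and return values below are hypothetical.

```c
/* Flat style: each guarded block runs its case and jumps to a common
   end label, so later tests are skipped.  */
static int
classify_flat (int tag)
{
  int result;

  if (tag == 1)
    {
      result = 10;              /* first TYPE IS block */
      goto end_label;
    }
  if (tag == 2)
    {
      result = 20;              /* second TYPE IS block */
      goto end_label;
    }
  result = 0;                   /* CLASS DEFAULT */
end_label:
  return result;
}

/* Equivalent nested if/else form: each extra case adds a level.  */
static int
classify_nested (int tag)
{
  int result;

  if (tag == 1)
    result = 10;
  else
    {
      if (tag == 2)
        result = 20;
      else
        result = 0;
    }
  return result;
}
```

Each block in the flat form is self-contained, so adding a case appends one more block instead of deepening the nesting.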

I took the opportunity to replace the repeated code chunks that
obtain the symbol for the vtable and then its backend_decl with a
single new function. This constitutes the second block in the
fortran ChangeLog.

Bootstrapped and regtested on FC21/x86_64 - OK for trunk?

It crosses my mind that, although this is not a regression, it might be
a good idea to backport the patch to the 6 branch in a month or two,
since it constitutes a potentially silent gotcha.

Cheers

Paul

2016-10-21  Paul Thomas  

PR fortran/69834
* class.c (gfc_find_derived_vtab): Obtain the gsymbol for the
derived type's module. If the gsymbol is present and the top
level namespace corresponds to a module, use the gsymbol name
space. In the search to see if the vtable exists, try the gsym
namespace first.
* dump-parse-tree.c (show_code_node): Add explicit dump for the
select type construct.
* resolve.c (build_loc_call): New function.
(resolve_select_type): Add check for repeated type is cases.
Retain selector expression and use it later instead of expr1.
Store the address for the vtable in the 'low' expression and
the hash value in the 'high' expression, for each case. Do not
call resolve_select.
* trans.c (trans_code): Call gfc_trans_select_type.
* trans-stmt.c (gfc_trans_select_type_cases): New function.
(gfc_trans_select_type): New function.
* trans-stmt.h: Add prototype for gfc_trans_select_type.

Tidy up retrieval of vtable backend decl.
* trans.h: Add prototype for gfc_get_vtable_decl.
* trans-array.c (structure_alloc_comps): Use it.
* trans-decl.c (gfc_get_symbol_decl, gfc_trans_deferred_vars,
gfc_trans_deferred_vars): The same.
* trans-expr.c (gfc_get_vtable_decl): New function to obtain
the vtable symbol and its backend decl for any typespec.
(gfc_reset_vptr, gfc_conv_derived_to_class,
gfc_conv_intrinsic_to_class, gfc_trans_class_assign,
gfc_conv_procedure_call, gfc_trans_subcomponent_assign): Use it.
* trans-intrinsic.c (scalar_transfer, conv_intrinsic_move_alloc):
The same.
* trans-io.c (transfer_namelist_element): The same.
* trans-stmt.c (gfc_trans_allocate): The same.

2016-10-21  Paul Thomas  

PR fortran/69834
* gfortran.dg/select_type_36.f03: New test.
Index: gcc/fortran/class.c
===
*** gcc/fortran/class.c (revision 241393)
--- gcc/fortran/class.c (working copy)
*** add_procs_to_declared_vtab (gfc_symbol *
*** 2187,2204 
  gfc_symbol *
  gfc_find_derived_vtab (gfc_symbol *derived)
  {
!   gfc_namespace *ns;
gfc_symbol *vtab = NULL, *vtype = NULL, *found_sym = NULL, *def_init = NULL;
gfc_symbol *copy = NULL, *src = NULL, *dst = NULL;
  
/* Find the top-level namespace.  */
for (ns = gfc_current_ns; ns; ns = ns->parent)
  if (!ns->parent)
break;
  
!   /* If the type is a class container, use the underlying derived type.  */
!   if (!derived->attr.un

PR libgcc/78064: Add missing include directive to unwind-c.c

2016-10-21 Thread Florian Weimer

See

  https://gcc.gnu.org/ml/gcc/2016-10/msg00165.html

for the background.  This causes pthread_cond_wait in glibc to write out 
of bounds on i386.


The fix was suggested by Jim Wilson.

Tested on x86_64-redhat-linux-gnu, with no new regressions.  Also tested 
against the i386 glibc reproducer, and verified that _Unwind_GetIPInfo 
is called from __gcc_personality_v0.
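A missing #include can change behaviour without any compile error. Code guarded by a configure-generated macro silently falls back to its other branch when the header defining that macro is not seen. The sketch assumes, from our reading of unwind-c.c, that the personality routine chooses between _Unwind_GetIPInfo and _Unwind_GetIP under a HAVE_GETIPINFO test, and then decrements the IP unless it is known to designate the instruction itself. The macro SKETCH_HAVE_GETIPINFO and the function lookup_address are hypothetical stand-ins, and no real unwinder calls are made.

```c
#include <stdint.h>

/* Hypothetical stand-in for a macro from the generated auto-target.h.
   Delete this line to model the build in which the header was never
   included: the file still compiles, but takes the fallback branch.  */
#define SKETCH_HAVE_GETIPINFO 1

/* Return the address used to search the call-site table.
   IS_SIGNAL_FRAME models the extra information that _Unwind_GetIPInfo
   reports through its second argument.  */
static uintptr_t
lookup_address (uintptr_t ip, int is_signal_frame)
{
  int ip_before_insn = 0;

#ifdef SKETCH_HAVE_GETIPINFO
  ip_before_insn = is_signal_frame;
#else
  (void) is_signal_frame;       /* information silently dropped */
#endif

  /* A return address points past the call; step back into it unless
     the IP already designates the instruction of interest.  */
  if (!ip_before_insn)
    --ip;
  return ip;
}
```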


Okay for trunk?

We probably should backport this to all active branches as well.  The 
risk is fairly low because the C unwinder personality is rarely used 
(Eric Botcazou pointed out that the C++ and Ada personalities are not 
affected by this).


2016-10-21  Florian Weimer  

PR libgcc/78064
* unwind-c.c: Include auto-target.h.
Index: libgcc/unwind-c.c
===
--- libgcc/unwind-c.c	(revision 241395)
+++ libgcc/unwind-c.c	(working copy)
@@ -26,6 +26,7 @@
 
 #include "tconfig.h"
 #include "tsystem.h"
+#include "auto-target.h"
 #include "unwind.h"
 #define NO_SIZE_OF_ENCODED_VALUE
 #include "unwind-pe.h"


Re: [Patch, rtl] PR middle-end/78016, keep REG_NOTE order during insn copy

2016-10-21 Thread Bernd Schmidt

On 10/21/2016 09:43 AM, Eric Botcazou wrote:

> I disagree: there are currently n ways of copying NOTEs in the RTL middle-end,
> with different properties each time.  We need only one primitive in rtlanal.c.


I feel the fact that they have different properties means we shouldn't 
try to unify them: we'll just end up with a long list of boolean 
parameters, with no way of quickly telling what a given function call is 
doing. A copy loop is short enough that it can be implemented in-place 
and people can quickly tell what is going on by looking at it.


Maybe the inner if statement could be a small helper function 
(append_copy_of_reg_note).
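Here is a sketch of an order-preserving copy loop with the inner step factored into a helper, along the lines suggested above. It uses a hypothetical singly linked `struct note` rather than GCC's rtx REG_NOTES. Both `append_copy_of_note` and the `skip_kind` filter are illustrative stand-ins for whatever condition the real loop tests.

```c
#include <stdlib.h>

/* Hypothetical singly linked note list standing in for REG_NOTES.  */
struct note
{
  int kind;
  struct note *next;
};

/* Hypothetical helper: copy SRC, link the copy at *TAIL, and return
   the slot where the next copy should be linked.  */
static struct note **
append_copy_of_note (struct note **tail, const struct note *src)
{
  struct note *copy = malloc (sizeof *copy);
  if (copy == NULL)
    abort ();
  copy->kind = src->kind;
  copy->next = NULL;
  *tail = copy;
  return &copy->next;
}

/* Copy LIST in its original order, leaving out notes of SKIP_KIND.  */
static struct note *
copy_notes (const struct note *list, int skip_kind)
{
  struct note *head = NULL;
  struct note **tail = &head;

  for (const struct note *n = list; n != NULL; n = n->next)
    if (n->kind != skip_kind)
      tail = append_copy_of_note (tail, n);
  return head;
}
```

Appending through a tail pointer keeps the copies in source order. Prepending each copy to the head is simpler to write, but it reverses the list.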



Bernd


Re: [PATCH] Start adding selftests for print_rtx

2016-10-21 Thread Bernd Schmidt

On 10/21/2016 02:36 AM, David Malcolm wrote:

> +  /* Test dumping of hard regs.  This is inherently target-specific due
> +     to the name.  */
> +#ifdef I386_OPTS_H
> +  ASSERT_RTL_DUMP_EQ ("(reg:SI ax)", gen_raw_REG (SImode, 0));
> +#endif


Generally putting in target dependencies like this isn't something we 
like to do. The patch is OK without this part, and we can revisit this, 
but maybe there wants to be a target hook for running target-specific 
selftests.
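One possible shape for such a target hook, sketched in plain C, is a hook vector holding a function pointer with a no-op default. The generic selftest driver then needs no #ifdef on the target. The struct, hook and function names here are hypothetical, not existing GCC interfaces.

```c
/* Hypothetical hook vector; in GCC this role would be played by
   targetm.  */
struct sketch_target_hooks
{
  void (*run_target_selftests) (void);
};

/* Counts target-specific checks run, so the dispatch is observable.  */
static int target_selftests_run;

/* Default for ports with nothing target-specific to test.  */
static void
sketch_default_run_target_selftests (void)
{
}

/* What an x86 port might supply: the "(reg:SI ax)" hard-register dump
   check currently guarded by #ifdef I386_OPTS_H would live here.  */
static void
sketch_i386_run_target_selftests (void)
{
  target_selftests_run++;
}

/* Generic driver: run the target-independent tests, then defer to the
   hook instead of testing target macros with #ifdef.  */
static void
sketch_run_rtl_selftests (const struct sketch_target_hooks *hooks)
{
  /* ... target-independent print_rtx checks ... */
  hooks->run_target_selftests ();
}
```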



> +  ASSERT_RTL_DUMP_EQ ("(cjump_insn (set (pc)\n"
> +                      "(label_ref 0))\n"
> +                      " (nil))\n",
> +                      jump_insn);
>  }


I do wonder about the (nil)s and whether we can eliminate them.


Bernd


