Re: [wwwdocs] Added /gcc-7/porting_to.html

2017-02-02 Thread Gerald Pfeifer

On Mon, 30 Jan 2017, Jonathan Wakely wrote:

This adds the porting to guide for GCC 7. So far it only has details
of C++ changes, mostly in the std::lib.


Thanks for doing that, Jonathan!

One minor observation: This has references to the GCC 5 and GCC 6
release notes, where the latter is a relative link and the former
an absolute.  


I assume that was not intentional, and applied the patch below
after verifying that neither maintainer-scripts/gcc_release nor
contrib/gennews actually package porting_to.html.

Gerald

Index: gcc-7/porting_to.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/porting_to.html,v
retrieving revision 1.2
diff -u -r1.2 porting_to.html
--- gcc-7/porting_to.html   30 Jan 2017 17:58:52 -  1.2
+++ gcc-7/porting_to.html   3 Feb 2017 07:53:45 -
@@ -116,7 +116,7 @@

When iostream objects are requested to throw exceptions on stream buffer
errors, the type of exception thrown has changed to use the
-https://gcc.gnu.org/gcc-5/changes.html#libstdcxx;>new libstdc++ ABI
+new libstdc++ ABI
introduced in GCC 5. Code which does
catch (const std::ios::failure) or similar will not catch
the exception if it is built using the old ABI. To ensure the exception is


Re: Fix profile updating in ifcombine

2017-02-02 Thread Jeff Law

On 02/02/2017 01:29 PM, Jan Hubicka wrote:

Hi,
this patches fixes profile updating in the ifcombine.  This is not hard to do
and ifcombine is #2 profile update offender out of tree passes (#1 is the
vectorizer).

I think this counts as a regression, becuase one can trigger arbitrarily
bad profile after ifconversion and defnitly construct a testcase where this
will cause us optimize for size where we optimized for speed previously.

Bootstrapped/regtested x86_64-linux. Will commit it tomorrow (after testers
pick up the threading fix) unless there are complains.

* gcc.dg/tree-ssa/ssa-ifcombine-1.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-2.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-3.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-4.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-5.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-6.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-7.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-8.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-9.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-10.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-11.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-12.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-13.c: Check for no profile mismatches.

* tree-ssa-ifcombine.c (update_profile_after_ifcombine): New function.
(ifcombine_ifandif): Use it.

Index: testsuite/gcc.dg/tree-ssa/threadbackward-1.c
===
--- testsuite/gcc.dg/tree-ssa/threadbackward-1.c(revision 0)
+++ testsuite/gcc.dg/tree-ssa/threadbackward-1.c(working copy)
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ethread" } */
+char *c;
+int t()
+{
+  for (int i=0;i<5000;i++)
+c[i]=i;
+}
+/* { dg-final { scan-tree-dump-times "Registering FSM jump thread" 1 
"ethread"} } */
Seems like this is unrelated to fixing profile updates in ifcombine.  If 
you want the new test it's probably best done in another independent 
patch :-)




Index: tree-ssa-ifcombine.c
===
--- tree-ssa-ifcombine.c(revision 245134)
+++ tree-ssa-ifcombine.c(working copy)
@@ -332,6 +332,51 @@ recognize_bits_test (gcond *cond, tree *
   return true;
 }

+
+/* Update profile after code in outer_cond_bb was adjuted so

s/adjuted/adjusted/

OK with the nit fixed.

jeff



[wwwdocs] Consistently use "testsuite"

2017-02-02 Thread Gerald Pfeifer
Of course, an innocent grep after providing this input in a review
of a patch by David revealed that we have a number more cases where
we used "test suite" instead of "testsuite".

This fixes a number of those.

Applied.

Gerald

Index: contributewhy.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/contributewhy.html,v
retrieving revision 1.8
diff -u -r1.8 contributewhy.html
--- contributewhy.html  27 Jun 2014 11:48:45 -  1.8
+++ contributewhy.html  3 Feb 2017 07:39:07 -
@@ -91,7 +91,7 @@
 port in the official GCC sources maintained by the GNU Project.  The
 wider exposure in the GCC developer and tester community will help
 keep your port up to date with the current sources.  You may even find
-that volunteers will run the ever-growing test suite on your port and
+that volunteers will run the ever-growing testsuite on your port and
 fix problems during the development cycle -- sometimes without your
 intervention.
 
Index: faq.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/faq.html,v
retrieving revision 1.223
diff -u -r1.223 faq.html
--- faq.html29 Dec 2016 21:07:36 -  1.223
+++ faq.html3 Feb 2017 07:39:07 -
@@ -42,7 +42,7 @@
   
 How do I pass flags like
 -fnew-abi to the testsuite?
-How can I run the test suite with multiple 
options?
+How can I run the testsuite with multiple 
options?
   
 
   Miscellaneous
@@ -297,7 +297,7 @@
 
 
 
- How can I run the test suite with multiple 
options? 
+ How can I run the testsuite with multiple 
options? 
 
 If you invoke runtest directly, you can use the
 --target_board option, e.g:
Index: readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.255
diff -u -r1.255 readings.html
--- readings.html   22 Jan 2017 20:20:57 -  1.255
+++ readings.html   3 Feb 2017 07:39:07 -
@@ -442,7 +433,7 @@
 http://www.itl.nist.gov/div897/ctg/fortran_form.htm;>FORTRAN
 77 test suite by the NIST Information Technology Laboratory
 (http://www.itl.nist.gov/div897/ctg/software.htm;>license)
-The test suite contains legal and operational Fortran 77 code.
+contains legal and operational Fortran 77 code.
   
   
 IDRIS
@@ -465,7 +456,7 @@
 Arjen Markus (source provided).
   
   
-http://gdbf95.sourceforge.net/;>gdbf95 test suite.
+http://gdbf95.sourceforge.net/;>gdbf95 testsuite.
   
   
 Tests of run-time checking capabilities
@@ -487,7 +478,7 @@
 
   http://www.ifremer.fr/ditigo/molagnon/fortran90/engfaq.html;>
   Michel Olagnon's Fortran 90 List contains a "Tests and
-  Benchmarks" section mentioning commercial test suites.
+  Benchmarks" section mentioning commercial testsuites.
 
 
   http://www.personal.psu.edu/faculty/h/d/hdk/fortran.html;>Herman
@@ -498,7 +489,7 @@
   Complying with Fortran 90, How does the current crop of
   Fortran 90 compilers measure up to the standard?, Steven
   Baker, Dr Dobb's, January 1995. It described the results of several
-  commercial test suites.
+  commercial testsuites.
 
   



Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Jeff Law

On 02/02/2017 09:58 AM, Martin Sebor wrote:


Otherwise all the tests succeeded, though looking e.g. at the diff
between builtin-sprintf-warn-1.c diagnostics with your patch and with
the patch below instead, there are also differences like:
-builtin-sprintf-warn-1.c:1119:3: note: using the range [1, -128] for
directive argument
+builtin-sprintf-warn-1.c:1119:3: note: using the range [-128, 127]
for directive argument
where [1, -128] looks clearly wrong, that isn't a valid range,


The range in the note is the artificial one the pass uses to compute
the range of output.  I went back and forth on this as I think it's
debatable if [-128, 127] is clearer than [1, -128] (or [-128, 1]).
The informative notes aren't completely consistent in what ranges
they print.  It would be nice to nail down what we think is the most
useful output and make them all consistent.
I don't think there's any argument against moving towards consistency, 
but I would well see folks not agreeing on which consistent style is 
preferable.


I do think that we should strive to print ranges in the same format that 
we use to describe ranges internally.  So a range like [1, -128] is not 
a valid range.  That may argue against using the artificial range 
representation in the output.







As for the rest of the patch, while I appreciate your help and
acknowledge it's not my call to make I'm not entirely comfortable
with this much churn at this stage.  My preference would to just
fix the bugs and do the restructuring in stage 1.
I'm really torn here.  I'm keen to raise the bar in terms of what we're 
willing to do for gcc-7.  But I'm also keen to have the codebase in a 
reasonable state that will allow for the ongoing maintenance of the 
gcc-7 branch.


The sprintf stuff is fairly dense and there's almost certainly more 
dusty corner case issues we're going to have to work through.  Thus we 
stand to be in a better state to maintain the code if we can do some 
refactoring.


The fact that Jakub, one of the release managers, is proposing the 
change carries a lot of weight in terms of trying to make a decision 
here.  Similarly your weight as the author of this code carries a lot of 
weight.  Leaving us without a clearly preferable path.



Jeff



Re: New Port for RISC-V v2

2017-02-02 Thread Gerald Pfeifer
On Thu, 2 Feb 2017, Palmer Dabbelt wrote:
> OK, thanks.  As far as I know, I don't have write access to anything 
> yet. Should I request it from overse...@sourceware.org?  I already have 
> binutils git access.

Yes, and you can name me as your sponsor.

Gerald


[wwwdocs] codingconventions.html - use "testsuite"

2017-02-02 Thread Gerald Pfeifer
So, while checking this page for our preferred way to write
"testsuite", I noticed it has a case of "test suite" itself. ;-)

Fixed thusly.

Gerald

Index: codingconventions.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/codingconventions.html,v
retrieving revision 1.77
diff -u -r1.77 codingconventions.html
--- codingconventions.html  18 Sep 2016 13:55:17 -  1.77
+++ codingconventions.html  3 Feb 2017 07:10:37 -
@@ -233,7 +233,7 @@
 testcase passed, the old testcase should be removed and a new testcase
 (with a different name) should be added.  This helps automated
 regression-checkers distinguish a true regression from an improvement
-to the test suite.
+to the testsuite.
 
 
 Diagnostics Conventions


Re: [wwwdocs] Mention GIMPLE and RTL frontends in changes.html

2017-02-02 Thread Gerald Pfeifer
On Thu, 2 Feb 2017, David Malcolm wrote:
> This patch to the website moves the section about the selftest suite to
> the bottom of "Other significant improvements" section, and rewrites it
> to also cover the GIMPLE and RTL "frontends", and tries to couch these
> changes in terms of the benefit to the end-user (i.e. a more reliable
> compiler).

Index: htdocs/gcc-7/changes.html
===
+GCC's already extensive self-test suite has gained some new
+  capabilities, to further improve the reliability of the compiler:

Would "testsuite" be sufficient here?

(Per codingconventions.html we use "testsuite" as opposed to "test 
suite", as I just checked.)

+  GCC now has has an internal unit testing API and a suite of tests
+for programmatic self-testing of implementation subsystems.

Omit "implementation"?

Those are genuine questions (where I feel a little stronger about 
the former than the latter), i.e., while I personally would make 
those changes, I am not making this a requirement of my review.

The patch is okay, just consider the above, please.

Thanks,
Gerald


Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-02 Thread Hurugalawadi, Naveen
Hi Andrew,

Thanks for clearing the confusion.

> I don't understand this comment and how it relates to your updated patch

foo, foo1 and foo2 generates calls to "popcountdi2" which should have
been "popcountsi2" for foo1. When Kyrill commented on using the
popcountsi2; I was confused :).

Hence, the testcase generally checks for the absence of call to "popcount"
and the presence of "cnt" instruction instead.

>> Now of course what should change still is the argument 
>> types to foo1/foo2 

The arguments to foo1 and foo2 are modified as required.

Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.

Please let us know if its okay for stage 1?

Thanks,
Naveendiff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a693a3b..684a833 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3778,6 +3778,39 @@
   }
 )
 
+;; Pop count be done via the "CNT" instruction in AdvSIMD.
+;;
+;; MOV	v.1d, x0
+;; CNT	v1.8b, v.8b
+;; ADDV b2, v1.8b
+;; MOV	w0, v2.b[0]
+
+(define_expand "popcount2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  "TARGET_SIMD"
+{
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  rtx r = gen_reg_rtx (QImode);
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if(mode == SImode)
+{
+  rtx tmp;
+  tmp = gen_reg_rtx (DImode);
+  /* If we have SImode, zero extend to DImode, pop count does
+ not change if we have extra zeros. */
+  emit_insn (gen_zero_extendsidi2 (tmp, in));
+  in = tmp;
+}
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
+  emit_insn (gen_zero_extendqi2 (out, r));
+  DONE;
+})
+
 (define_insn "clrsb2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c b/gcc/testsuite/gcc.target/aarch64/popcnt.c
new file mode 100644
index 000..7e95796
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo (int x)
+{
+  return __builtin_popcount (x);
+}
+
+long
+foo1 (long x)
+{
+  return __builtin_popcountl (x);
+}
+
+long long
+foo2 (long long x)
+{
+  return __builtin_popcountll (x);
+}
+
+/* { dg-final { scan-assembler-not "popcount" } } */
+/* { dg-final { scan-assembler-times "cnt\t" 3 } } */


Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-02 Thread Andrew Pinski
On Thu, Feb 2, 2017 at 3:55 AM, James Greenhalgh
 wrote:
> On Thu, Feb 02, 2017 at 04:03:42AM +, Hurugalawadi, Naveen wrote:
>> Hi James and Kyrill,
>>
>> Thanks for the review and comments on the patch.
>>
>> >> On ILP32 systems this would still use the SImode patterns,
>> >> so I suggest you use __builtin_popcountll and
>> >> an unsigned long long return type to ensure you always exercise the 
>> >> 64-bit code.
>>
>> Sorry for not commenting on this part.
>> The issue is that code generates "__popcountdi2" for all the codes in 
>> testcase
>> for LP64 and ILP32 variants.
>> __builtin_popcount, __builtin_popcountl and __builtin_popcount.
>>
>> Hence, modified the patch to check for "popcount".
>
> I don't understand this comment and how it relates to your updated
> patch (which still checks for CNT).

Now I understand Kyrill's comments as Naveen and I did not understand
before.  What Naveen was trying to test was that there was no longer
any call to __popcountdi2 or __popcountsi2 any more for all three:
__builtin_popcount, __builtin_popcountl and __builtin_popcountll.
Kyrill was asking him to change the test for __builtin_popcountl to
__builtin_popcountll even though there was already one test in that
file already.  Now of course what should change still is the argument
types to foo1/foo2 so they take long and long long respectively.  He
will post a new version of the patch soon.

>
>> Bootstrapped and regression tested on AArch64-Thunderx-Linux machine.
>>
>> Please find attached the modified patch and let us know if its okay?
>
> Could you please clarify what you meant above? The patch looks fine (though
> possibly not for Stage 4), but I'm not sure of your intent.

What he meant was he tested on aarch64-linux-gnu with no regressions.
Is it ok for when stage 1 opens with the change of the testcase?

Thanks,
Andrew

>
> Thanks,
> James


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Martin Sebor

On 02/02/2017 05:31 PM, Martin Sebor wrote:

-  T (2, "%#hho",a); /* { dg-warning "nul past the end"
} */
-  T (2, "%#hhx",a); /* { dg-warning ".%#hhx. directive
writing between 3 and . bytes into a region of size 2" } */
+  T (2, "%#hho",a);
+  T (2, "%#hhx",a);


On reflection, this isn't quite the right fix.  We want to both set
the correct range and warn because the call will likely overflow.
This is an example of why the likely/unlikely counters have been
introduced.  By setting min = 1 and likely = 2 for the %#hho and
3 for the %#hhx we get the desired result.


Attached is a simple patch that removes the vestigial setting of
the minimum counter while preserving the warnings above by using
the likely counter.

I had overlooked this when I introduced the likely counter and so
in the corner cases of "%#o" and "%#x" with a non-constant argument
that could be zero, the minimum counter would be set to 2 and 3
respectively rather than 1 (because zero is formatted without
the '0' or '0x' base prefix).


The attached is an updated version that addresses Jakub's comments
in a separate post.

Martin
gcc/ChangeLog:

	* gimple-ssa-sprintf.c (tree_digits): Avoid adding octal prefix
	when precision has resulted in leading zeros.
	(format_integer): Adjust the likely counter to assume an unknown
	argument that may be zero is non-zero.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/builtin-sprintf-warn-13.c: New test.

Index: gcc/gimple-ssa-sprintf.c
===
--- gcc/gimple-ssa-sprintf.c	(revision 245142)
+++ gcc/gimple-ssa-sprintf.c	(working copy)
@@ -762,7 +762,9 @@ tree_digits (tree x, int base, HOST_WIDE_INT prec,
 
   res += prec < ndigs ? ndigs : prec;
 
-  if (prefix && absval)
+  /* Adjust a non-zero value for the base prefix, either hexadecimal,
+ or, unless precision has resulted in a leading zero, also octal.  */
+  if (prefix && absval && (base == 16 || prec <= ndigs))
 {
   if (base == 8)
 	res += 1;
@@ -1242,6 +1244,10 @@ format_integer (const directive , tree arg)
of the format string by returning [-1, -1].  */
 return fmtresult ();
 
+  /* True if the LIKELY counter should be adjusted upward from the MIN
+ counter to account for arguments with unknown values.  */
+  bool likely_adjust = false;
+
   fmtresult res;
 
   /* Using either the range the non-constant argument is in, or its
@@ -1273,6 +1279,14 @@ format_integer (const directive , tree arg)
 
 	  res.argmin = argmin;
 	  res.argmax = argmax;
+
+	  /* Set the adjustment for an argument whose range includes
+	 zero since that doesn't include the octal or hexadecimal
+	 base prefix.  */
+	  wide_int wzero = wi::zero (wi::get_precision (min));
+	  if (wi::le_p (min, wzero, SIGNED)
+	  && !wi::neg_p (max))
+	likely_adjust = true;
 	}
   else if (range_type == VR_ANTI_RANGE)
 	{
@@ -1311,12 +1325,14 @@ format_integer (const directive , tree arg)
 	 or one whose value range cannot be determined, create a T_MIN
 	 constant if the argument's type is signed and T_MAX otherwise,
 	 and use those to compute the range of bytes that the directive
-	 can output.  When precision may be zero, use zero as the minimum
-	 since it results in no bytes on output (unless width is specified
-	 to be greater than 0).  */
-  bool zero = dir.prec[0] <= 0 && dir.prec[1] >= 0;
-  argmin = build_int_cst (argtype, !zero);
+	 can output.  */
+  argmin = integer_zero_node;
 
+  /* Set the adjustment for an argument whose range includes
+	 zero since that doesn't include the octal or hexadecimal
+	 base prefix.  */
+  likely_adjust = true;
+
   int typeprec = TYPE_PRECISION (dirtype);
   int argprec = TYPE_PRECISION (argtype);
 
@@ -1391,7 +1407,24 @@ format_integer (const directive , tree arg)
   res.range.min = tmp;
 }
 
-  res.range.likely = res.knownrange ? res.range.max : res.range.min;
+  /* Add the adjustment for an argument whose range includes zero
+ since it doesn't include the octal or hexadecimal base prefix.  */
+  if (res.knownrange)
+res.range.likely = res.range.max;
+  else
+{
+  res.range.likely = res.range.min;
+  if (likely_adjust && maybebase && base != 10)
+	{
+	  if (res.range.min == 1)
+	res.range.likely += base == 8 ? 1 : 2;
+	  else if (res.range.min == 2
+		   && base == 16
+		   && (dir.width[0] == 2 || dir.prec[0] == 2))
+	++res.range.likely;
+	}
+}
+
   res.range.unlikely = res.range.max;
   res.adjust_for_width_or_precision (dir.width, dirtype, base,
  (sign | maybebase) + (base == 16));
Index: gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c	(revision 245142)
+++ gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-1.c	(working copy)
@@ -1152,7 +1152,7 @@ void test_sprintf_chk_hh_nonconst (int w, 

[PATCH] assume arrays at end of structs are unbounded (PR 79352)

2017-02-02 Thread Martin Sebor

Bug 79352 points out that the gimple-ssa-printf pass doesn't allow
for an array at the end of a struct to be treated as a poor man's
flexible array member and hold a string that's longer than the upper
bound of the array.  Rather, the pass assumes that the string's
length must at most as long as the upper bound of the array - 1.

To allow for this the attached patch adjusts the get_range_strlen
function the pass uses to obtain the range of string lengths to
expose that bit of information.  The patch introduces a call to
array_at_struct_end_p to determine this but since the array is
in a COMPONENT_REF that array_at_struct_end_p doesn't consider
the patch extends the function to allow it to handle that case.
When the string can reference such an array the pass considers
it to be potentially unbounded in the worst (unlikely) case.

Martin
PR tree-optimization/79352 - -fprintf-return-value doesn't handle flexible-like array members properly

gcc/ChangeLog:

	PR tree-optimization/79352
	* gimple-fold.c (get_range_strlen): Add argument.
	(get_range_strlen): Change return type to bool.
	(get_maxval_strlen): Pass in a dummy argument.
	* gimple-fold.h (get_range_strlen): Change return type to bool.
	* gimple-ssa-sprintf.c (get_string_length): Set unlikely counter.
	* tree.h (array_at_struct_end_p): Add argument.
	* tree.c (array_at_struct_end_p): Handle it.

gcc/testsuite/ChangeLog:

	PR tree-optimization/79352
	* gcc.dg/tree-ssa/pr79352.c: New test.

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index ef1afd1..7d21592 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1177,11 +1177,15 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
length and 2 for maximum value ARG can have.
When FUZZY is set and the length of a string cannot be determined,
the function instead considers as the maximum possible length the
-   size of a character array it may refer to.  */
+   size of a character array it may refer to.
+   Set *FLEXP to true if the range of the string lengths has been
+   obtained from the upper bound of an array at the end of a struct.
+   Such an array may hold a string that's longer than its upper bound
+   due to it being used as a poor-man's flexible array member.  */
 
 static bool
 get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
-		  bool fuzzy)
+		  bool fuzzy, bool *flexp)
 {
   tree var, val;
   gimple *def_stmt;
@@ -1202,7 +1206,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	  if (TREE_CODE (aop0) == INDIRECT_REF
 	  && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME)
 	return get_range_strlen (TREE_OPERAND (aop0, 0),
- length, visited, type, fuzzy);
+ length, visited, type, fuzzy, flexp);
 	}
 
   if (type == 2)
@@ -1219,7 +1223,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	{
 	  if (TREE_CODE (arg) == ADDR_EXPR)
 	return get_range_strlen (TREE_OPERAND (arg, 0), length,
- visited, type, fuzzy);
+ visited, type, fuzzy, flexp);
 
 	  if (TREE_CODE (arg) == COMPONENT_REF
 	  && TREE_CODE (TREE_TYPE (TREE_OPERAND (arg, 1))) == ARRAY_TYPE)
@@ -1228,7 +1232,12 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 		 bound on the length of the array.  This may be overly
 		 optimistic if the array itself isn't NUL-terminated and
 		 the caller relies on the subsequent member to contain
-		 the NUL.  */
+		 the NUL.
+		 Set *FLEXP to true if the array whose bound is being
+		 used is at the end of a struct.  */
+	  if (array_at_struct_end_p (arg, true))
+		*flexp = true;
+
 	  arg = TREE_OPERAND (arg, 1);
 	  val = TYPE_SIZE_UNIT (TREE_TYPE (arg));
 	  if (!val || integer_zerop (val))
@@ -1295,14 +1304,14 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 || gimple_assign_unary_nop_p (def_stmt))
   {
 tree rhs = gimple_assign_rhs1 (def_stmt);
-	return get_range_strlen (rhs, length, visited, type, fuzzy);
+	return get_range_strlen (rhs, length, visited, type, fuzzy, flexp);
   }
 	else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR)
 	  {
 	tree op2 = gimple_assign_rhs2 (def_stmt);
 	tree op3 = gimple_assign_rhs3 (def_stmt);
-	return get_range_strlen (op2, length, visited, type, fuzzy)
-	  && get_range_strlen (op3, length, visited, type, fuzzy);
+	return get_range_strlen (op2, length, visited, type, fuzzy, flexp)
+	  && get_range_strlen (op3, length, visited, type, fuzzy, flexp);
   }
 return false;
 
@@ -1325,7 +1334,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 if (arg == gimple_phi_result (def_stmt))
   continue;
 
-	if (!get_range_strlen (arg, length, visited, type, fuzzy))
+	if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp))
 	  {
 		if (fuzzy)
 		  *maxlen = build_all_ones_cst 

Re: [doc] extend.texi - "lock critical sections"?

2017-02-02 Thread Andi Kleen
On Wed, Feb 01, 2017 at 10:44:10PM +0100, Gerald Pfeifer wrote:
> Hi Andi, or Uros,
> 
> I am not sure, but got a pointer off-list.  Is the patch below
> appropriate, or is the term "lock critical section" a special
> one for x86?

Hi Gerald,

The patch is ok.  Lock critical region isn't a special term.

-Andi

> 
> Gerald
> 
> Index: extend.texi
> ===
> --- extend.texi   (revision 245106)
> +++ extend.texi   (working copy)
> @@ -10103,7 +10103,7 @@
>  @section x86-Specific Memory Model Extensions for Transactional Memory
>  
>  The x86 architecture supports additional memory ordering flags
> -to mark lock critical sections for hardware lock elision. 
> +to mark critical sections for hardware lock elision. 
>  These must be specified in addition to an existing memory order to
>  atomic intrinsics.
>  
> 


Re: [PATCH rs6000 testsuite] Fix PR79158

2017-02-02 Thread Segher Boessenkool
On Thu, Feb 02, 2017 at 02:09:55PM -0600, Pat Haugen wrote:
> The testcase has been failing on BE because the compiler is simply storing 
> the value straight from the GPRs. The following patch fixes the issue by 
> using 'r' in an expression which forces the value back to a VSR. Verified the 
> testcase now passes for powerpc64 and still passes for powerpc64le. Ok for 
> trunk?

Okay.  Thanks,


Segher

> testsuite/ChangeLog:
> 2017-02-02  Pat Haugen  
> 
>   PR target/79158
>   * gcc.target/powerpc/pr70669.c: Use 'r' in an expression to force back
>   to VSX reg.
> 
> 
> Index: gcc.target/powerpc/pr70669.c
> ===
> --- gcc.target/powerpc/pr70669.c  (revision 245032)
> +++ gcc.target/powerpc/pr70669.c  (working copy)
> @@ -13,7 +13,7 @@ void foo (TYPE *p, TYPE *q)
>  #ifndef NO_ASM
>__asm__ (" # %0" : "+r" (r));
>  #endif
> -  *p = r;
> +  *p = r + r;
>  }
>  
>  /* { dg-final { scan-assembler   "mfvsrd"} } */


Re: New Port for RISC-V v2

2017-02-02 Thread Palmer Dabbelt
On Thu, 02 Feb 2017 13:07:25 PST (-0800), ger...@pfeifer.com wrote:
> Hi Palmer,
>
> On Thu, 2 Feb 2017, Palmer Dabbelt wrote:
>> Additionally, here's a diff against wwwdocs.  This is really just to
>> check this is all I'm supposed to do, I can submit a proper patch via
>> the mailing list (I just don't know how to use CVS, sorry).
>>
>>   Index: htdocs/gcc-7/changes.html
>>   ===
>>   +RISC-V
>>   +
>>   +  Support for the RISC-V instruction set has been added
>>   +
>
> This looks fine and is approved.  (Well, modulo adding a full stop. ;-)
>
> And of course we should add this achievement to the News section of our
> home page.  Something like
>
>RISC-V support
>Support for the RISC-V processor family was added, contributed
>by Palmer Dabbelt and Andrew Waterman.
>
> That is pre-approved as well.

OK, thanks.  As far as I know, I don't have write access to anything yet.
Should I request it from overse...@sourceware.org?  I already have binutils git
access.


Re: [PATCH 2/6] RISC-V Port: gcc

2017-02-02 Thread Palmer Dabbelt
On Thu, 02 Feb 2017 10:08:27 PST (-0800), jos...@codesourcery.com wrote:
> On Thu, 2 Feb 2017, Palmer Dabbelt wrote:
>
>> +@table @gcctabopt
>> +@item -mbranch-cost=@var{N}
>> +@opindex mbranch-cost
>> +Set the cost of branches to roughly N instructions.
>
> @var{n} (both places; Texinfo may convert to uppercase depending on the
> output format).
>
>> +@item -mplt
>> +@itemx -mno-plt
>> +@opindex plt
>> +When generating -fpic code, allow the use of PLTs. Ignored for fno-pic.
>
> @option{-fpic}, @option{-fno-pic} (or, better, say PIC and non-PIC rather
> than referring to just one of the options for controlling PIC).
>
>> +natural calling convention: eg LP64 for RV64I, ILP32 for RV32I, LP64D for
>
> e.g.@: (see codingconventions.html).
>
>> +Generate code for given RISC-V ISA (e.g. rv64im).  ISA strings must be
>
> Likewise.
>
>> +lower-case.  Examples include "rv64i", "rv32g", and "rv32imaf".
>
> @samp{rv64i}, etc.
>
>> +@item -msmall-data-limit=@var{N}
>> +@opindex msmall-data-limit
>> +Put global and static data smaller than @var{N} bytes into a special section
>> +(on some targets).
>
> @var{n}, again.

I believe these are all fixed in

  
https://github.com/riscv/riscv-gcc/commit/1fad2b80b27a0a44c3838970e8fd5fd0b51d5d6c

We'll submit a v3 patchset with this (and all the other) reviews fixed.

Thanks!


Re: New Port for RISC-V v2

2017-02-02 Thread Palmer Dabbelt
On Thu, 02 Feb 2017 09:58:32 PST (-0800), jos...@codesourcery.com wrote:
> On Thu, 2 Feb 2017, Palmer Dabbelt wrote:
>
>> Additionally, here's a diff against wwwdocs.  This is really just to check 
>> this
>> is all I'm supposed to do, I can submit a proper patch via the mailing list 
>> (I
>> just don't know how to use CVS, sorry).
>
> Other parts of the wwwdocs changes (backends.html, readings.html) are
> probably of more interest.

Thanks, that didn't feel like enough stuff.  I'll submit some wwwdocs changes
that address all the feedback from this code review.


Re: [PATCH 2/6] RISC-V Port: gcc

2017-02-02 Thread Palmer Dabbelt
On Thu, 02 Feb 2017 11:17:42 PST (-0800), mer...@debian.org wrote:
> On Thu, Feb 02, 2017 at 01:05:13AM -0800, Palmer Dabbelt wrote:
>
>> diff --git a/gcc/doc/contrib.texi b/gcc/doc/contrib.texi
>> index 5554d5f..5b14fc4 100644
>> --- a/gcc/doc/contrib.texi
>> +++ b/gcc/doc/contrib.texi
>> @@ -173,6 +173,10 @@ Denis Chertykov for contributing and maintaining the 
>> AVR port, the first GCC por
>>  for an 8-bit architecture.
>>
>>  @item
>> +Kito Cheng for his work on the RISC-V port, including bringing up the test
>> +suite and maintiance.
>  ^^
> Typo?

I'm actually just really bad at spelling :).  This is fixed here

  
https://github.com/riscv/riscv-gcc/commit/8ee844525e6c11b9fcf8b7734ed93d21c110157f

We'll submit a v3 with this (and all other) reviews fixed.

Thanks!


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Martin Sebor

On 02/02/2017 04:23 PM, Jakub Jelinek wrote:

On Thu, Feb 02, 2017 at 12:59:11PM -0700, Martin Sebor wrote:

-  T (2, "%#hho",a); /* { dg-warning "nul past the end" } */
-  T (2, "%#hhx",a); /* { dg-warning ".%#hhx. directive
writing between 3 and . bytes into a region of size 2" } */
+  T (2, "%#hho",a);
+  T (2, "%#hhx",a);


On reflection, this isn't quite the right fix.  We want to both set
the correct range and warn because the call will likely overflow.
This is an example of why the likely/unlikely counters have been
introduced.  By setting min = 1 and likely = 2 for the %#hho and
3 for the %#hhx we get the desired result.


Then please first define what should likely mean and document that.


That's a fair request.   I implemented the "likely counter" approach
based on your suggestion (in comment #3 on bug 78703) so I'd expect
the concept to feel intuitive to you, but documenting it is a good
idea (and in fact on my list of things to do).



That is unrelated to the patch, both in the current trunk, with your
patch as well as with my patch there is just
  res.range.likely = res.knownrange ? res.range.max : res.range.min;
  res.range.unlikely = res.range.max;
for these cases.

Do you want likely 2 because that the shortest length for more than
one value (only a single value has the shortest length)?
Something else?


For "%#o" the shortest output of one byte (for zero) is less likely
than the next shortest output of 2 bytes (0 vs 01).

For "%#x" it's one byte vs three bytes (0 vs 0x1).

Since specifying '#' clearly indicates the user wants the base
prefix it seems very likely that the argument will be non-zero.
Whether it's going to be greater than 7 or 15 is not so clear.

So I think setting the likely counter to 2 for octal and 3 for
hexadecimal makes the most sense.

That's what I did in the patch I just posted:
  https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00223.html

Martin


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Jakub Jelinek
On Thu, Feb 02, 2017 at 05:31:19PM -0700, Martin Sebor wrote:
> index 9e099f0..84dd3671 100644
> --- a/gcc/gimple-ssa-sprintf.c
> +++ b/gcc/gimple-ssa-sprintf.c
> @@ -1242,6 +1242,10 @@ format_integer (const directive , tree arg)
> of the format string by returning [-1, -1].  */
>  return fmtresult ();
>  
> +  /* The adjustment to add to the MIN counter when computing the LIKELY
> + counter for arguments with unknown values.  */
> +  unsigned likely_adjust = 0;
> +
>fmtresult res;
>  
>/* Using either the range the non-constant argument is in, or its
> @@ -1273,6 +1277,15 @@ format_integer (const directive , tree arg)
>  
> res.argmin = argmin;
> res.argmax = argmax;
> +
> +   /* Set the adjustment for an argument whose range includes
> +  zero since that doesn't include the octal or hexadecimal
> +  base prefix.  */
> +   wide_int wzero = wi::zero (wi::get_precision (min));
> +   if (maybebase
> +   && wi::le_p (min, wzero, SIGNED)
> +   && wi::le_p (wzero, max, SIGNED))
> + likely_adjust = base == 8 ? 1 : base == 16 ? 2 : 0;
>   }
>else if (range_type == VR_ANTI_RANGE)
>   {
> @@ -1311,11 +1324,14 @@ format_integer (const directive , tree arg)
>or one whose value range cannot be determined, create a T_MIN
>constant if the argument's type is signed and T_MAX otherwise,
>and use those to compute the range of bytes that the directive
> -  can output.  When precision may be zero, use zero as the minimum
> -  since it results in no bytes on output (unless width is specified
> -  to be greater than 0).  */
> -  bool zero = dir.prec[0] <= 0 && dir.prec[1] >= 0;
> -  argmin = build_int_cst (argtype, !zero);
> +  can output.  */
> +  argmin = integer_zero_node;
> +
> +  /* Set the adjustment for an argument whose range includes
> +  zero since that doesn't include the octal or hexadecimal
> +  base prefix.  */
> +  if (maybebase)
> + likely_adjust = base == 8 ? 1 : base == 16 ? 2 : 0;

This is another sign why the patch I've posted is desirable.  Now you have
to tweak several places, with the patch just one, you have a final range
and you can check whether it includes zero or not.

wi::le_p (wzero, max, SIGNED) is !wi::neg_p (max) I think.

That said, adding something to res.range.min looks wrong to me, you don't
know without more analysis whether actually the length for 0 and length for
say 1 or -1 isn't the same.
Consider %#x where indeed 0 is 2 byte shorter than 1, but say %#3x where
it is the same length, so if you still add your likely_adjust of 2 to that,
the res.range.likely value will be suddenly 5 instead of correct 3.

Easier would be to see if 0 is in the range, see if 1 is also in the range
and set res.range.likely to the length of 1, or, if 0 is in the range, 1
is not but -1 is, to that of -1 or something like that.

Jakub


Re: [patch 79279] combine/simplify_set issue

2017-02-02 Thread Segher Boessenkool
On Thu, Feb 02, 2017 at 11:27:12AM +0100, Aurelien Buhrig wrote:
> > Hrm, maybe you can show the RTL before and after this transform?
> RTL before combine:
> (set (reg:SI 31 (lshiftt:SI (reg:SI 29) (const_int 16
> (set (reg:HI 1 "r1") (reg:HI 25))
> (set (reg:HI 0 "r0") (subreg:HI (reg:SI 31) 0)); LE target
> 
> r0 and r1 are HI regs
> 
> RTL after combining 1 --> 3:
> (set (reg:HI 1 "r1") (reg:HI 25))
> (set (reg:SI 0 "r0") (lshift:SI (reg:SI 29) (const_int 16)))
>  and r1 is clobbered by last insn.

Ah, much clearer now, thanks!

There is this test:

  && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1))
   / UNITS_PER_WORD)
  == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))
   + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD))

On most targets the size of a register is (at least) UNITS_PER_WORD, so
this works there.  But this code should really use can_change_dest_mode,
at least for hard registers, which does

  if (regno < FIRST_PSEUDO_REGISTER)
return (HARD_REGNO_MODE_OK (regno, mode)
&& REG_NREGS (x) >= hard_regno_nregs[regno][mode]);

i.e. it tests the new mode does not use more hard registers than the
old one did, which indeed would be bad.


Segher


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Martin Sebor

-  T (2, "%#hho",a); /* { dg-warning "nul past the end" } */
-  T (2, "%#hhx",a); /* { dg-warning ".%#hhx. directive
writing between 3 and . bytes into a region of size 2" } */
+  T (2, "%#hho",a);
+  T (2, "%#hhx",a);


On reflection, this isn't quite the right fix.  We want to both set
the correct range and warn because the call will likely overflow.
This is an example of why the likely/unlikely counters have been
introduced.  By setting min = 1 and likely = 2 for the %#hho and
3 for the %#hhx we get the desired result.


Attached is a simple patch that removes the vestigial setting of
the minimum counter while preserving the warnings above by using
the likely counter.

I had overlooked this when I introduced the likely counter and so
in the corner cases of "%#o" and "%#x" with a non-constant argument
that could be zero, the minimum counter would be set to 2 and 3
respectively rather than 1 (because zero is formatted without
the '0' or '0x' base prefix).

Martin
gcc/ChangeLog:

	* gimple-ssa-sprintf.c (format_integer): Adjust the likely counter
	to assume an unknown argument that may be zero is non-zero.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/builtin-sprintf-warn-13.c: New test.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 9e099f0..84dd3671 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1242,6 +1242,10 @@ format_integer (const directive , tree arg)
of the format string by returning [-1, -1].  */
 return fmtresult ();
 
+  /* The adjustment to add to the MIN counter when computing the LIKELY
+ counter for arguments with unknown values.  */
+  unsigned likely_adjust = 0;
+
   fmtresult res;
 
   /* Using either the range the non-constant argument is in, or its
@@ -1273,6 +1277,15 @@ format_integer (const directive , tree arg)
 
 	  res.argmin = argmin;
 	  res.argmax = argmax;
+
+	  /* Set the adjustment for an argument whose range includes
+	 zero since that doesn't include the octal or hexadecimal
+	 base prefix.  */
+	  wide_int wzero = wi::zero (wi::get_precision (min));
+	  if (maybebase
+	  && wi::le_p (min, wzero, SIGNED)
+	  && wi::le_p (wzero, max, SIGNED))
+	likely_adjust = base == 8 ? 1 : base == 16 ? 2 : 0;
 	}
   else if (range_type == VR_ANTI_RANGE)
 	{
@@ -1311,11 +1324,14 @@ format_integer (const directive , tree arg)
 	 or one whose value range cannot be determined, create a T_MIN
 	 constant if the argument's type is signed and T_MAX otherwise,
 	 and use those to compute the range of bytes that the directive
-	 can output.  When precision may be zero, use zero as the minimum
-	 since it results in no bytes on output (unless width is specified
-	 to be greater than 0).  */
-  bool zero = dir.prec[0] <= 0 && dir.prec[1] >= 0;
-  argmin = build_int_cst (argtype, !zero);
+	 can output.  */
+  argmin = integer_zero_node;
+
+  /* Set the adjustment for an argument whose range includes
+	 zero since that doesn't include the octal or hexadecimal
+	 base prefix.  */
+  if (maybebase)
+	likely_adjust = base == 8 ? 1 : base == 16 ? 2 : 0;
 
   int typeprec = TYPE_PRECISION (dirtype);
   int argprec = TYPE_PRECISION (argtype);
@@ -1391,7 +1407,11 @@ format_integer (const directive , tree arg)
   res.range.min = tmp;
 }
 
-  res.range.likely = res.knownrange ? res.range.max : res.range.min;
+  /* Add the adjustment for an argument whose range includes zero
+ since it doesn't include the octal or hexadecimal base prefix.  */
+  res.range.likely
+= res.knownrange ? res.range.max : res.range.min + likely_adjust;
+
   res.range.unlikely = res.range.max;
   res.adjust_for_width_or_precision (dir.width, dirtype, base,
  (sign | maybebase) + (base == 16));
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-13.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-13.c
new file mode 100644
index 000..deba705
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-13.c
@@ -0,0 +1,180 @@
+/* { dg-do compile }
+   { dg-options "-O2 -Wall -Wformat-overflow=1 -ftrack-macro-expansion=0" }
+   { dg-require-effective-target int32plus } */
+
+typedef __SIZE_TYPE__  size_t;
+typedef __WCHAR_TYPE__ wchar_t;
+
+#define INT_MAX __INT_MAX__
+#define INT_MIN (-INT_MAX - 1)
+
+/* When debugging, define LINE to the line number of the test case to exercise
+   and avoid exercising any of the others.  The buffer and objsize macros
+   below make use of LINE to avoid warnings for other lines.  */
+#ifndef LINE
+# define LINE 0
+#endif
+
+void sink (char*, char*);
+
+int dummy_sprintf (char*, const char*, ...);
+
+char buffer [256];
+extern char *ptr;
+
+int int_range (int min, int max)
+{
+  extern int int_value (void);
+  int n = int_value ();
+  return n < min || max < n ? min : n;
+}
+
+unsigned uint_range (unsigned min, unsigned max)
+{
+  extern unsigned uint_value (void);
+  unsigned n = uint_value 

RE: [PATCH] MIPS: Fix mode mismatch error between Loongson builtin arguments and insn operands.

2017-02-02 Thread Matthew Fortune
Toma Tabacu  writes:
> > From: Matthew Fortune
> > > +/* The third argument needs to be in SImode in order to succesfully
> > > match
> > > +   the operand from the insn definition.  */
> >
> > Please refer to operand here not argument as it is the second argument
> > to the builtin but third operand of the instruction.  Also double ss in
> > successfully.
> >
> 
> I have rewritten the comment to address these mistakes.
> 
> > > +case CODE_FOR_loongson_pshufh:
> > > +case CODE_FOR_loongson_psllh:
> > > +case CODE_FOR_loongson_psllw:
> > > +case CODE_FOR_loongson_psrah:
> > > +case CODE_FOR_loongson_psraw:
> > > +case CODE_FOR_loongson_psrlh:
> > > +case CODE_FOR_loongson_psrlw:
> > > +  gcc_assert (has_target_p && nops == 3 && ops[2].mode == QImode);
> > > +  ops[2].value = lowpart_subreg (SImode, ops[2].value, QImode);
> > > +  ops[2].mode = SImode;
> > > +  break;
> > > +
> > >  case CODE_FOR_msa_addvi_b:
> > >  case CODE_FOR_msa_addvi_h:
> > >  case CODE_FOR_msa_addvi_w:
> >
> > For the record, given paradoxical subregs are a headache...
> > I am OK with this on the basis that the argument to psllh etc is actually
> > a uint8_t which means that bits 8 upwards are guaranteed to be zero so
> > the subreg can be eliminated without any explicit sign or zero extension
> > inserted.  This is the same kind of optimisation that combine would
> > perform when eliminating zero extension.
> >
> > Please can you check that a zero extension is inserted for the following
> > case with -O2 or above:
> >
> > int16x4_t testme(int16x4_t s, int amount)
> > {
> >   return psllh_s (s, amount);
> > }
> >
> > If my understanding is correct there should be an ANDI 0xff inserted
> > or similar.
> >
> 
> The ANDI 0xff is present for -O0, after the first time the value is loaded
> from memory, but it is not generated for -O1 and -O2.
> I'm not seeing any zero extension happening for -O1 and -O2.

That's not what I hoped but is what I was concerned about as I believe it
means we have a change of behaviour.  It boils down to simply ignoring the
argument type of unsigned char.  My guess is that a zero extension is
created but then immediately eliminated because of the paradoxical subreg.

I think you need to create a temporary and perform the zero extension to
ensure we honour the unsigned char operand:

  rtx new_dst = gen_reg_rtx (SImode);
  emit_insn (gen_zero_extendqisi2 (new_dst, ops[2].value));
  ops[2].value = foo;

This should mean that the testcase I sent always has a zero extension but if
you change the type of 'amount' to be unsigned char then there should not be
a zero extension as the argument will be assumed to be correctly zero extended
already and the explicitly introduced zero_extend will be eliminated.

Apologies for not proposing this before.

Thanks,
Matthew

> 
> The only change in the patch below is the fixed comment.
> 
> Regards,
> Toma
> 
> gcc/
> 
>   * config/mips/mips.c (mips_expand_builtin_insn): Put the QImode
>   argument of the pshufh, psllh, psllw, psrah, psraw, psrlh, psrlw
>   builtins into an SImode paradoxical SUBREG.
> 
> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index da7fa8f..e5b2d9a 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -16574,6 +16574,20 @@ mips_expand_builtin_insn (enum insn_code icode, 
> unsigned int
> nops,
> 
>switch (icode)
>  {
> +/* The third operand of these instructions is in SImode, so we need to
> +   bring the corresponding builtin argument from QImode into SImode.  */
> +case CODE_FOR_loongson_pshufh:
> +case CODE_FOR_loongson_psllh:
> +case CODE_FOR_loongson_psllw:
> +case CODE_FOR_loongson_psrah:
> +case CODE_FOR_loongson_psraw:
> +case CODE_FOR_loongson_psrlh:
> +case CODE_FOR_loongson_psrlw:
> +  gcc_assert (has_target_p && nops == 3 && ops[2].mode == QImode);
> +  ops[2].value = lowpart_subreg (SImode, ops[2].value, QImode);
> +  ops[2].mode = SImode;
> +  break;
> +
>  case CODE_FOR_msa_addvi_b:
>  case CODE_FOR_msa_addvi_h:
>  case CODE_FOR_msa_addvi_w:



Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Jakub Jelinek
On Thu, Feb 02, 2017 at 12:59:11PM -0700, Martin Sebor wrote:
> > > -  T (2, "%#hho",a); /* { dg-warning "nul past the end" } */
> > > -  T (2, "%#hhx",a); /* { dg-warning ".%#hhx. directive
> > > writing between 3 and . bytes into a region of size 2" } */
> > > +  T (2, "%#hho",a);
> > > +  T (2, "%#hhx",a);
> 
> On reflection, this isn't quite the right fix.  We want to both set
> the correct range and warn because the call will likely overflow.
> This is an example of why the likely/unlikely counters have been
> introduced.  By setting min = 1 and likely = 2 for the %#hho and
> 3 for the %#hhx we get the desired result.

Then please first define what should likely mean and document that.

That is unrelated to the patch, both in the current trunk, with your
patch as well as with my patch there is just
  res.range.likely = res.knownrange ? res.range.max : res.range.min;
  res.range.unlikely = res.range.max;
for these cases.

Do you want likely 2 because that the shortest length for more than
one value (only a single value has the shortest length)?
Something else?

Jakub


[RFC] Attacking 79095 -- a new approach

2017-02-02 Thread Jeff Law


This is an RFC on the updated work for fix 79095.  I've got to work on 
the testcases tomorrow, but wanted to give folks the option for early 
feedback.


There's been a number of ratholes and restarts, but I think where things 
are going now is significantly better than prior work.


The fundamental concept I'm working from now is to avoid rewriting tests 
in the IL and instead extract more precise range information for VRP 
as-if the tests had been rewritten in the IL.  This avoids a slew of 
problems with ADD_OVERFLOW detection.


The better propagated ranges help considerably, but will not eliminate 
the false positives in a slightly modified version of the testcase from 
79095.  Consider:


#include 

void foo(std::vector );

void vtest()
{
  std::vector v;
  foo (v);
  if (v.size() > 0)
  {
v.resize (v.size()-1);
  }
}


Note the inclusion of the v.size () > 0 check.  In the original 
testcase, if v.size () is zero, then the warning is warranted.  But with 
the check any warning is a false positive.  Ugh.


So the the key bits from the VRP dump:

;;   basic block 3, loop depth 0, count 0, freq 1, maybe hot
;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;pred:   2 [100.0%]  (FALLTHRU,EXECUTABLE)
  _9 = MEM[(unsigned int * *)];
  _10 = MEM[(unsigned int * *) + 8B];
  _6 = (long int) _10;
  _11 = (long int) _9;
  _12 = _6 - _11;
  _13 = _12 /[ex] 4;
  _14 = (long unsigned int) _13;
  if (_6 != _11)
goto ; [48.88%]
  else
goto ; [51.12%]
;;succ:   4 [48.9%]  (TRUE_VALUE,EXECUTABLE)
;;11 [51.1%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, count 0, freq 4888, maybe hot
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [48.9%]  (TRUE_VALUE,EXECUTABLE)
  _1 = _14 + 18446744073709551615;
  if (_1 > _14)
goto ; [29.56%]
  else
goto ; [70.44%]

The TRUE arm of the second conditional contains the code which 
ultimately triggers the warning.  Careful examination would tell us that 
if _6 != _11, then _14 can never have the value 0 which is the only 
value which would result in that second conditional being true.


We had an assert that _6 != _11, but we don't have a back-propagation 
step that would allow us to infer a range for _1 or _14 in BB4 based on 
the _6 != _11 ASSERT_EXPR.  Insert plug here for Andrew's work...



However, through a series of steps between vrp1 and vrp2 we end up with 
this at the start of vrp2:


;;   basic block 3, loop depth 0, count 0, freq 1, maybe hot
;;prev block 2, next block 4, flags: (NEW, REACHABLE, VISITED)
;;pred:   2 [100.0%]  (FALLTHRU,EXECUTABLE)
  _9 = MEM[(unsigned int * *)];
  _10 = MEM[(unsigned int * *) + 8B];
  _6 = (long int) _10;
  _11 = (long int) _9;
  if (_6 != _11)
goto ; [48.88%]
  else
goto ; [51.12%]
;;succ:   4 [48.9%]  (TRUE_VALUE,EXECUTABLE)
;;10 [51.1%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 4, loop depth 0, count 0, freq 4888, maybe hot
;;prev block 3, next block 5, flags: (NEW, REACHABLE, VISITED)
;;pred:   3 [48.9%]  (TRUE_VALUE,EXECUTABLE)
  _12 = _6 - _11;
  _13 = _12 /[ex] 4;
  _14 = (long unsigned int) _13;
  _1 = _14 + 18446744073709551615;
  if (_1 > _14)
goto ; [29.56%]
  else
goto ; [70.44%]

Note how some assignments sunk down into BB4.  And recall that we'll get 
a suitable ASSERT_EXPR indicating _6 != _11 at the start of BB4.  We can 
work with that.


What we have here is the IL in a form where we don't need any 
back-propagation step to determine a more useful range for _14.  So here 
we go.


First we need to derive the range ~[0,0] for _12.  Right now we compute 
that range as VARYING, so refining to ~[0,0] is obviously a step forward.


We then want to improve the range for _14.  Right now we compute 
something like [MIN_INT/4,MAX_INT/4].  That is for all intensive 
purposes useless.  It is more useful to derive ~[0,0] for EXACT_DIV_EXPR.


*Then* we need intersect_ranges to prefer the ~[0,0] when merging the 
new range for _14 with the one found during vrp1.


With those pieces in place we then derive the range ~[0,0] for _14, and 
the new code knows that for the conditional in question to be true that 
_14 must have the value zero.  So VRP determines that the conditional is 
always false, which in turn allows removing the block with the problem 
memset.


So there's a few conceptual pieces to this patchkit.

First, deriving a range for A - B when we know that A != B, deriving a 
better range for EXACT_DIV_EXPR when the numerator has the range ~[0,0]. 
 And finally not losing the range when merging them in intersect_ranges.


The second conceptual pieces is the check for the overflow comparison. 
The only real interesting stuff here is the conditional walking of the 
ASSERT_EXPR chains.  Often an ASSERT_EXPR will get in the way of 
detecting A + CST CMP A expressions.   When following the ASSERT_EXPR 
chains, we follow the chain 

Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-02 Thread Jakub Jelinek
On Thu, Feb 02, 2017 at 10:12:32AM -0700, Jeff Law wrote:
> On 02/01/2017 03:45 AM, Richard Biener wrote:
> > 
> > I agree.  But this means we should look for a vectorizer-local fix
> > without a new global predicate then (there seem to be subtly different
> > needs and coming up with good names for all of them sounds difficult...).
> Well, we could go with Jakub's INTEGRAL_BOOLEAN_TYPE as posted, but in
> contexts where we use it and really depend on single bit objects, we add the
> precision == 1 check back.  Jakub's patch removes the type precision check
> in tree-vect-patterns for example.  There's likely all kinds of places where
> we need to add that check as well.

The 3 cases in tree-vect-patterns.c where I've removed the check were
exactly what the proposed macro does, i.e.
  if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1
   || !TYPE_UNSIGNED (TREE_TYPE (rhs1)))
  && TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE)
return false;
i.e. bail out unless the rhs1 type is a BOOLEAN_TYPE (any precision,
assuming it has only valid values of 0 and 1) or unless it is unsigned
precision type 1 integer (i.e. something that (if it has also QImode)
forwprop etc. could have changed a BOOLEAN_TYPE with precision 1 into.

Note Fortran has been using I think precision one boolean_type_node even in GCC 
6.x,
it had:
  boolean_type_node = gfc_get_logical_type (gfc_default_logical_kind);
  boolean_true_node = build_int_cst (boolean_type_node, 1);
  boolean_false_node = build_int_cst (boolean_type_node, 0);
Fortran logical(4) etc. are built using:
  new_type = make_unsigned_type (bit_size);
  TREE_SET_CODE (new_type, BOOLEAN_TYPE);
  TYPE_MAX_VALUE (new_type) = build_int_cst (new_type, 1);
  TYPE_PRECISION (new_type) = 1;
thus I believe they have e.g. SImode or DImode rather than QImode, but still
TYPE_PRECISION of 1.  The non-Ada boolean is also:
  boolean_type_node = make_unsigned_type (BOOL_TYPE_SIZE);
  TREE_SET_CODE (boolean_type_node, BOOLEAN_TYPE);
  TYPE_PRECISION (boolean_type_node) = 1;
where BOOL_TYPE_SIZE is CHAR_TYPE_SIZE everywhere but on powerpc*-darwin*
with some command line option.
Ada is
  boolean_type_node = make_unsigned_type (8);
  TREE_SET_CODE (boolean_type_node, BOOLEAN_TYPE);
and thus it is indeed precision 8 QImode.

So, requiring precision 1 for all BOOLEAN_TYPEs in the vectorizer would
only affect Ada.  Requiring QImode or whatever other fixed TYPE_MODE
in the macro for precision 1 unsigned non-BOOLEAN_TYPE integers would
actually also affect Fortran a lot, because e.g. SImode precision 1
INTEGER_TYPE is considered compatible with SImode precision 1 BOOLEAN_TYPE.

> /me is frustrated that we have booleans with nonstandard precision, even
> though I understand why it was done.  It creates numerous headaches.

Ditto.

Jakub


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Jakub Jelinek
Hi!

Note, the second patch I've posted passed bootstrap/regtest on both
x86_64-linux and i686-linux.

On Thu, Feb 02, 2017 at 09:58:06AM -0700, Martin Sebor wrote:
> > int
> > main (void)
> > {
> >   int i;
> >   char buf[64];
> >   if (__builtin_sprintf (buf, "%#hho", a) > 1)
> > __builtin_abort ();
> >   if (__builtin_sprintf (buf, "%#hhx", a) > 1)
> > __builtin_abort ();
> >   return 0;
> > }
> > Current trunk as well as trunk + your patch computes ranges [2, 4] and
> > [3, 4], but the correct ranges are [1, 4] and [1, 4], because for a 0
> > (or say -131072 etc.) it sets buf to "0" in both cases.
> 
> That's right.  This is a good find.  This case is being tested in
> builtin-sprintf-warn-1.c but the assertions (on lines 1154 and
> 1155) seem to expect the wrong numbers.  I would expect your fix
> to cause failures here. (And now that I've read the rest of the
> patch I see it does.)

That testcase is derived from the builtin-sprintf-warn-1.c:115{4,5}
FAILs I got with the patch + analysis (and included in the second patch
in the testsuite as executable testcase too).

> The range in the note is the artificial one the pass uses to compute
> the range of output.  I went back and forth on this as I think it's
> debatable if [-128, 127] is clearer than [1, -128] (or [-128, 1]).
> The informative notes aren't completely consistent in what ranges
> they print.  It would be nice to nail down what we think is the most
> useful output and make them all consistent.

You say in the wording range and print it as a range, then it really should
be a range, and not an artificial one, but one reflecting actual possible
values.  If the user knows that the variable can have values -123 and 126
at that point and you still print [1, -128] or [-128, 1] as range, the
users will just think the compiler is buggy and disable the warning.

> > Plus there are certain notes removed:
> > -builtin-sprintf-warn-1.c:1311:3: note: using the range [1, -2147483648] 
> > for directive argument
> >   T (0, "%d",   a); /* { dg-warning "into a region" } */
> > without replacement, here a has [-2147483648, 2147483647] range, but as that
> > is equal to the type's range, I bet it omits it.
> 
> Could be.  As I said, there are opportunities for improvements in
> what the notes print.  Someone pointed out, for example, that very
> large numbers are hard to read.  It might be clearer to print some
> of them using constants like LONG_MAX - 2, or in hex, etc.

The *_MAX or hex is a separate issue, that can be done or doesn't have to be
done, the values are simply the same.  But picking some random numbers in
the ranges is just wrong.

> > -  tree dirmin = TYPE_MIN_VALUE (dirtype);
> > -  tree dirmax = TYPE_MAX_VALUE (dirtype);
> > -
> > -  if (TYPE_UNSIGNED (dirtype))
> > -{
> > -  *argmin = dirmin;
> > -  *argmax = dirmax;
> > -}
> > -  else
> > -{
> > -  *argmin = integer_zero_node;
> > -  *argmax = dirmin;
> > -}
> > -
> > +  *argmin = TYPE_MIN_VALUE (dirtype);
> > +  *argmax = TYPE_MAX_VALUE (dirtype);
> 
> This is likely behind the changes in the notes you pointed out above.
> The code here is intentional to reflect the range synthesized by
> the pass to compute the output.  I don't have a big issue with

The change I've done was completely intentional, it is again mixing of
value ranges with already premature guessing on what values are shorter or
longer.

> As for the rest of the patch, while I appreciate your help and
> acknowledge it's not my call to make I'm not entirely comfortable
> with this much churn at this stage.  My preference would to just
> fix the bugs and do the restructuring in stage 1.

The thing is, right now we have 3 independent but intermixed code paths,
one for arguments with VR_RANGE that doesn't need conversion for dirtype
(then argmin/argmax until very lately is actual range and the
shortest/longest computation is done at the latest point), then
VR_RANGE that does need conversion for dirtype (that one is prematurely
partly converted to the shortest/longest above), and then
another one that handles the non-VR_RANGE, which commits to the
shortest/longest immediately.  What my patch does is that it unifies all
the 3 paths on essentially what the first path does.  If we don't apply that
patch, we are going to have 3 times as many possibilities for bugs; you will
need to add some hack to get the above mentioned wrong-code fixed, and
IMNSHO another hack to get the bogus printed ranges fixed.
It is always better to remove code than to add it.

Jakub


[wwwdocs] Mention GIMPLE and RTL frontends in changes.html

2017-02-02 Thread David Malcolm
This patch to the website moves the section about the selftest suite to
the bottom of "Other significant improvements" section, and rewrites it
to also cover the GIMPLE and RTL "frontends", and tries to couch these
changes in terms of the benefit to the end-user (i.e. a more reliable
compiler).

Validates.

OK to commit?Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.46
diff -u -p -r1.46 changes.html
--- htdocs/gcc-7/changes.html	28 Jan 2017 01:15:53 -	1.46
+++ htdocs/gcc-7/changes.html	2 Feb 2017 21:47:21 -
@@ -735,9 +735,6 @@ $ gcc -c test.c --param max-early-inline
 cc1: error: invalid --param name 'max-early-inliner-iteration'; did you mean 'max-early-inliner-iterations'?
 
 
-  GCC has gained an internal unit-testing framework, allowing for
-more detailed testing of its implementation details.
-
   Profile-guided optimization (PGO) instrumentation, as well as test coverage (GCOV),
   can newly instrument constructors (functions marks with __attribute__((constructor))),
   destructors and C++ constructors (and destructors) of classes that are used
@@ -749,6 +746,25 @@ cc1: error:
   -pthread on command line would result in selection of atomic
   profile updating (when supports by a target).
   
+  
+GCC's already extensive self-test suite has gained some new
+  capabilities, to further improve the reliability of the compiler:
+
+  GCC now has has an internal unit testing API and a suite of tests
+for programmatic self-testing of implementation subsystems.
+  GCC's C frontend has been extended so that it can parse dumps of
+GCC's internal representations, allowing for DejaGnu tests
+that more directly exercise specific optimization passes.  This
+covers both the
+https://gcc.gnu.org/onlinedocs/gccint/GIMPLE-Tests.html;>
+GIMPLE representation (for testing higher-level
+optimizations) and the
+https://gcc.gnu.org/onlinedocs/gccint/RTL-Tests.html;>
+RTL representation, allowing for more direct testing of
+lower-level details, such as register allocation and instruction
+selection.
+
+  
 
 
 

Re: [PATCH v3][PR lto/79061] Fix LTO plus ASAN fails with "AddressSanitizer: initialization-order-fiasco".

2017-02-02 Thread Jakub Jelinek
On Thu, Feb 02, 2017 at 06:02:07PM +0300, Maxim Ostapenko wrote:
> Hi,
> 
> PR lto/79061 actually affects gcc-{5, 6}-branch too
> Is it OK to apply following patch on branches?
> 
> -Maxim

> gcc/ChangeLog:
> 
> 2017-02-02  Maxim Ostapenko  
> 
>   PR lto/79061
>   * asan.c (asan_add_global): Force has_dynamic_init to zero in LTO mode.

Ok, thanks.

Jakub


[wwwdocs] gcc-6/changes.html: use command-line option

2017-02-02 Thread Gerald Pfeifer
...instead of command line option.

Spotted while looking for something else, but since it was easy to
fix...applied.

Gerald

Index: gcc-6/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-6/changes.html,v
retrieving revision 1.91
diff -u -r1.91 changes.html
--- gcc-6/changes.html  28 Dec 2016 23:42:42 -  1.91
+++ gcc-6/changes.html  2 Feb 2017 21:43:48 -
@@ -721,7 +721,7 @@
   to make use of these builtins the vecintrin.h
   header file needs to be included.
 
-The new command line options -march=native,
+The new command-line options -march=native,
   and -mtune=native are now available on native IBM
   z Systems.  Specifying these options will cause GCC to
   auto-detect the host CPU and rewrite these options to the
@@ -844,7 +844,7 @@
 
   
 The gcc and g++ driver programs will now
-  provide suggestions for misspelled command line options.
+  provide suggestions for misspelled command-line options.
 
 $ gcc -static-libfortran test.f95
 gcc: error: unrecognized command line option 
'-static-libfortran'; did you mean '-static-libgfortran'?


Re: [PATCH][wwwdocs] Mention native CPU detection in aarch64 notes for GCC 6

2017-02-02 Thread Gerald Pfeifer
On Tue, 26 May 2015, James Greenhalgh wrote:
> I think I remember Gerald saying in the past that it is within the
> remit of port maintainers/reviewers to OK these, but be ready to 
> revert or update it if I am wrong!

You were (and are) very much quite right, James. :-)

(And given that Gerald is not always able to respond as quickly as 
he'd like, this mail being a "nice" example, this just makes sense.)

Gerald


[wwwdocs] update about.html (was: Skeleton for GCC 6 release notes)

2017-02-02 Thread Gerald Pfeifer
On Fri, 4 Sep 2015, Sebastian Huber wrote:
>> is it possible checking outhttps://gcc.gnu.org/about.html  is all
>> you are looking for, or am I thinking to simple?:-)
> I searched the web and found this page before. Then I clicked at "browse the
> repository " and landed in
> the GCC sources. This somehow confused me.

Got it, even if with delay.  Unfortunately we don't seem to have a
current repository browser for CVS/wwwdocs, so let me remove the 
link.  And while I was it it, simplify/clarify the language on live
updates.

Gerald

Index: about.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/about.html,v
retrieving revision 1.26
diff -u -r1.26 about.html
--- about.html  28 Jun 2015 15:17:24 -  1.26
+++ about.html  2 Feb 2017 21:35:46 -
@@ -17,11 +17,9 @@
 https://gcc.gnu.org/onlinedocs/gcc/Contributors.html;>contributors
 .
 
-The web pages are under CVS control and you
-can https://gcc.gnu.org/cgi-bin/cvsweb.cgi/wwwdocs/;>browse
-the repository online.
-The pages on gcc.gnu.org are updated "live" directly after a
-change has been made; www.gnu.org is updated once a day at 4:00 -0700
+The web pages are under CVS control.
+The pages on gcc.gnu.org are updated directly after a
+change has been committed; www.gnu.org is updated once a day at 4:00 -0700
 (PDT).
 
 Please send feedback, problem reports and patches to our


Re: Fix profile updating in ifcombine

2017-02-02 Thread Andrew Pinski
On Thu, Feb 2, 2017 at 12:29 PM, Jan Hubicka  wrote:
> Hi,
> this patches fixes profile updating in the ifcombine.  This is not hard to do
> and ifcombine is #2 profile update offender out of tree passes (#1 is the
> vectorizer).
>
> I think this counts as a regression, becuase one can trigger arbitrarily
> bad profile after ifconversion and defnitly construct a testcase where this
> will cause us optimize for size where we optimized for speed previously.


This might be related to PR 78200 but I could be wrong.

Thanks,
Andrew

>
> Bootstrapped/regtested x86_64-linux. Will commit it tomorrow (after testers
> pick up the threading fix) unless there are complains.
>
> * gcc.dg/tree-ssa/ssa-ifcombine-1.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-2.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-3.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-4.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-5.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-6.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-7.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-8.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-9.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-10.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-11.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-12.c: Check for no profile mismatches.
> * gcc.dg/tree-ssa/ssa-ifcombine-13.c: Check for no profile mismatches.
>
> * tree-ssa-ifcombine.c (update_profile_after_ifcombine): New function.
> (ifcombine_ifandif): Use it.
>
> Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-1.c
> ===
> --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-1.c (revision 245134)
> +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-1.c (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-optimized" } */
> +/* { dg-options "-O -fdump-tree-optimized-details-blocks" } */
>
>  /* Testcase for PR31657.  */
>
> @@ -14,3 +14,4 @@ int foo (int x, int a, int b)
>  }
>
>  /* { dg-final { scan-tree-dump "\\|" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */
> Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c
> ===
> --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c(revision 245134)
> +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c(working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-optimized-details-blocks" } */
>
>  /* Testcase for PR31657.  */
>
> @@ -17,3 +17,4 @@ int f(int x, int a, int b)
>return t;
>  }
>  /* { dg-final { scan-tree-dump "& 5" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */
> Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c
> ===
> --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c(revision 245134)
> +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c(working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O -fdump-tree-optimized" } */
> +/* { dg-options "-O -fdump-tree-optimized-details-blocks" } */
>
>  /* Testcase for PR31657.  */
>  int g(void);
> @@ -18,3 +18,4 @@ int f(int x, int a, int b)
>  }
>
>  /* { dg-final { scan-tree-dump "& 5" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */
> Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-12.c
> ===
> --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-12.c(revision 245134)
> +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-12.c(working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-optimized-details-blocks" } */
>
>  /* Testcase for PR31657.  */
>
> @@ -17,3 +17,4 @@ int f(int x, int a, int b)
>return t;
>  }
>  /* { dg-final { scan-tree-dump "& 5" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */
> Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-13.c
> ===
> --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-13.c(revision 245134)
> +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-13.c(working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O1 -fdump-tree-optimized" } */
> +/* { dg-options "-O1 -fdump-tree-optimized-details-blocks" } */
>  /* { 

Re: [wwwdocs] Add various new warnings for GCC 7

2017-02-02 Thread Gerald Pfeifer
Hi Marek,

a couple of comments (and minor changes) below.  Once you have made
those, this is okay to commit.  Quite some stuff going into GCC 7!

On Fri, 27 Jan 2017, Marek Polacek wrote:
> Index: changes.html
> ===
> +New command-line options have been added for the C and C++ compilers:
> +  
> +-Wimplicit-fallthrough warns when a switch case falls
> +through.  This warning has five different levels.  The compiler is
> + able to parse a wide range of falls through comments, depending on

Perhaps say "fall through" (in quotes)?

And why is it 'fallthrough' in doc/invoke.texi and 'falls through' here?

> + the level.  It also handles control-flow statements, such as ifs.

I have to admit I am not sure what the above means practically, i.e.,
in what manner control-flow statement interact here.

(So I did check doc/invoke.texi and there is an example, alas no
explanation.  Is the point that GCC only warns about the "i < 1"
arm?  That may be good describing in the documentation.)

> + It's possible to suppres the warning by either adding a falls through
> + comment, or by using a null statement: __attribute__

Same as above.

> +-Wpointer-compare warns when a pointer is compared with
> +a zero character constant.  This code is now invalid in C++11 and
> + GCC rejects such code.  This warning is enabled by default.

How about "Such code is now invalid in C++11 and GCC rejects it"?

> +-Wrestrict warns when an argument passed to a
> +restrict-qualified parameter aliases with another argument.

restrict-qualified

> +-Wmemset-elt-size warns for memset calls, when the first

memset

> +-Wswitch-unreachable warns when a switch statement has

switch

> +-Wregister warns about uses of register
> +storage specifier.  In C++17 this keyword has been removed and for 
> C++17

"the...storage specified" (adding "the")

> +has duplicate const, volatile, restrict or _Atomic specifier.

const
volatile
restrict
_Atomic

> +The new -Wdangling-else command-line option has been split
> +out of -Wparentheses and warns about dangling else.

else


Gerald


Re: New Port for RISC-V v2

2017-02-02 Thread Gerald Pfeifer
Hi Palmer,

On Thu, 2 Feb 2017, Palmer Dabbelt wrote:
> Additionally, here's a diff against wwwdocs.  This is really just to 
> check this is all I'm supposed to do, I can submit a proper patch via 
> the mailing list (I just don't know how to use CVS, sorry).
> 
>   Index: htdocs/gcc-7/changes.html
>   ===
>   +RISC-V
>   +
>   +  Support for the RISC-V instruction set has been added
>   +

This looks fine and is approved.  (Well, modulo adding a full stop. ;-)

And of course we should add this achievement to the News section of our 
home page.  Something like

   RISC-V support
   Support for the RISC-V processor family was added, contributed
   by Palmer Dabbelt and Andrew Waterman.

That is pre-approved as well.

Gerald


Re: [Patch][ARM,AArch64] more poly64 intrinsics and tests

2017-02-02 Thread Christophe Lyon
Hello,

Is it too late for this patch?

On 11 January 2017 at 11:13, Christophe Lyon  wrote:
> Ping?
>
> James, I'm not sure whether your comment was a request for a new
> version of my patch or just FYI?
>
>
> On 3 January 2017 at 16:47, Christophe Lyon  
> wrote:
>> Ping?
>>
>>
>> On 14 December 2016 at 23:09, Christophe Lyon
>>  wrote:
>>> On 14 December 2016 at 17:55, James Greenhalgh  
>>> wrote:
 On Mon, Dec 12, 2016 at 05:03:31PM +0100, Christophe Lyon wrote:
> Hi,
>
> After the recent update from Tamar, I noticed a few discrepancies
> between ARM and AArch64 regarding a few poly64 intrinsics.
>
> This patch:
> - adds vtst_p64 and vtstq_p64 to AArch64's arm_neon.h
> - adds vgetq_lane_p64, vset_lane_p64 and vsetq_lane_p64 to ARM's 
> arm_neon.h
> ( vget_lane_p64 was already there)
> - adds the corresponding tests, and moves the vget_lane_p64 ones out
> of the #ifdef __aarch64__ zone.
>
> Cross-tested on arm* and aarch64* targets.
>
> OK?

 The AArch64 parts of this look fine to me, but I do have one question on
 your inline assembly implementation for vtstq_p64:

> +__extension__ extern __inline uint64x2_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vtstq_p64 (poly64x2_t a, poly64x2_t b)
> +{
> +  uint64x2_t result;
> +  __asm__ ("cmtst %0.2d, %1.2d, %2.2d"
> +   : "=w"(result)
> +   : "w"(a), "w"(b)
> +   : /* No clobbers */);
> +  return result;
> +}
> +

 Why can this not be written as many of the other vtstq intrinsics are; 
 e.g.:

__extension__ extern __inline uint64x2_t
   __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
   vtstq_p64 (poly64x2_t __a, poly64x2_t __b)
   {
 return (uint64x2_t) uint64x2_t) __a) & ((uint64x2_t) __b))
   != __AARCH64_INT64_C (0));
   }

>>>
>>> I don't know, I just followed the pattern used for vtstq_p8 and vtstq_p16
>>> just above...
>>>
>>>
 Thanks,
 James

> gcc/ChangeLog:
>
> 2016-12-12  Christophe Lyon  
>
>   * config/aarch64/arm_neon.h (vtst_p64): New.
>   (vtstq_p64): New.
>   * config/arm/arm_neon.h (vgetq_lane_p64): New.
>   (vset_lane_p64): New.
>   (vsetq_lane_p64): New.
>
> gcc/testsuite/ChangeLog:
>
> 2016-12-12  Christophe Lyon  
>
>   * gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
>   (vget_lane_expected, vset_lane_expected, vtst_expected_poly64x1):
>   New.
>   (vmov_n_expected0, vmov_n_expected1, vmov_n_expected2)
>   (expected_vld_st2_0, expected_vld_st2_1, expected_vld_st3_0)
>   (expected_vld_st3_1, expected_vld_st3_2, expected_vld_st4_0)
>   (expected_vld_st4_1, expected_vld_st4_2, expected_vld_st4_3)
>   (vtst_expected_poly64x2): Move to aarch64-only section.
>   (vget_lane_p64, vgetq_lane_p64, vset_lane_p64, vsetq_lane_p64)
>   (vtst_p64, vtstq_p64): New tests.
>




Fix profile updating in ifcombine

2017-02-02 Thread Jan Hubicka
Hi,
this patches fixes profile updating in the ifcombine.  This is not hard to do
and ifcombine is #2 profile update offender out of tree passes (#1 is the
vectorizer).

I think this counts as a regression, becuase one can trigger arbitrarily
bad profile after ifconversion and defnitly construct a testcase where this
will cause us optimize for size where we optimized for speed previously.

Bootstrapped/regtested x86_64-linux. Will commit it tomorrow (after testers
pick up the threading fix) unless there are complains.

* gcc.dg/tree-ssa/ssa-ifcombine-1.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-2.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-3.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-4.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-5.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-6.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-7.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-8.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-9.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-10.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-11.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-12.c: Check for no profile mismatches.
* gcc.dg/tree-ssa/ssa-ifcombine-13.c: Check for no profile mismatches.

* tree-ssa-ifcombine.c (update_profile_after_ifcombine): New function.
(ifcombine_ifandif): Use it.

Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-1.c
===
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-1.c (revision 245134)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-1.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-optimized" } */
+/* { dg-options "-O -fdump-tree-optimized-details-blocks" } */
 
 /* Testcase for PR31657.  */
 
@@ -14,3 +14,4 @@ int foo (int x, int a, int b)
 }
 
 /* { dg-final { scan-tree-dump "\\|" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c
===
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c(revision 245134)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-10.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-optimized-details-blocks" } */
 
 /* Testcase for PR31657.  */
 
@@ -17,3 +17,4 @@ int f(int x, int a, int b)
   return t;
 }
 /* { dg-final { scan-tree-dump "& 5" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c
===
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c(revision 245134)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-11.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-optimized" } */
+/* { dg-options "-O -fdump-tree-optimized-details-blocks" } */
 
 /* Testcase for PR31657.  */
 int g(void);
@@ -18,3 +18,4 @@ int f(int x, int a, int b)
 }
 
 /* { dg-final { scan-tree-dump "& 5" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-12.c
===
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-12.c(revision 245134)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-12.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-vrp -fdump-tree-optimized-details-blocks" } */
 
 /* Testcase for PR31657.  */
 
@@ -17,3 +17,4 @@ int f(int x, int a, int b)
   return t;
 }
 /* { dg-final { scan-tree-dump "& 5" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "Invalid sum" "optimized" } } */
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-13.c
===
--- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-13.c(revision 245134)
+++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-13.c(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-optimized" } */
+/* { dg-options "-O1 -fdump-tree-optimized-details-blocks" } */
 /* { dg-additional-options "-mbranch-cost=2" { target { i?86-*-* x86_64-*-* 
s390*-*-* avr*-*-* } } } */
 
 _Bool f1(_Bool a, _Bool b)
@@ -18,3 +18,4 @@ _Bool f1(_Bool a, _Bool b)
 /* For LOGICAL_OP_NON_SHORT_CIRCUIT, this should be optimized
into return a & b;, with no ifs.  */
 /* { dg-final { scan-tree-dump-not "if" "optimized" { target { 

[PATCH] diagnostics: fix end-points of ranges within macros (PR c++/79300)

2017-02-02 Thread David Malcolm
PR c++/79300 identifies an issue in which diagnostics_show_locus
prints the wrong end-point for a range within a macro:

   assert ((p + val_size) - buf == encoded_len);
   ~^~~~

as opposed to:

   assert ((p + val_size) - buf == encoded_len);
   ~^~

The caret, start and finish locations of this compound location are
all virtual locations.

The root cause is that when diagnostic-show-locus.c's layout ctor
expands the caret and end-points, it calls
  linemap_client_expand_location_to_spelling_point
which (via expand_location_1) unwinds the macro expansions, and
then calls linemap_expand_location.  Doing so implicitly picks the
*caret* location for any virtual locations, and so in the above case
it picks these spelling locations for the three parts of the location:

   assert ((p + val_size) - buf == encoded_len);
   ^^  ^
   START|  FINISH
  CARET

and so erroneously strips the underlining from the final token, apart
from its first character.

The fix is for layout's ctor to indicate that it wants the start/finish
locations in such a situation, adding a new param to
linemap_client_expand_location_to_spelling_point, so that
expand_location_1 can handle this case by extracting the relevant part
of the unwound compound location, and thus choose:

   assert ((p + val_size) - buf == encoded_len);
   ^^^
   START|FINISH
  CARET

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for stage 4, or should I wait until stage 1?

gcc/ChangeLog:
PR c++/79300
* diagnostic-show-locus.c (layout::layout): Use start and finish
spelling location for the start and finish of each range.
* genmatch.c (linemap_client_expand_location_to_spelling_point):
Add unused aspect param.
* input.c (expand_location_1): Add "aspect" param, and use it
to access the correct part of the location.
(expand_location): Pass LOCATION_ASPECT_CARET to new param of
expand_location_1.
(expand_location_to_spelling_point): Likewise.
(linemap_client_expand_location_to_spelling_point): Add "aspect"
param, and pass it to expand_location_1.

gcc/testsuite/ChangeLog:
PR c++/79300
* c-c++-common/Wmisleading-indentation-3.c (fn_14): Update
expected underlining within macro expansion.
* c-c++-common/pr70264.c: Likewise.
* g++.dg/plugin/diagnostic-test-expressions-1.C
(test_within_macro_1): New test.
(test_within_macro_2): Likewise.
(test_within_macro_3): Likewise.
(test_within_macro_4): Likewise.
* gcc.dg/format/diagnostic-ranges.c (test_macro_3): Update
expected underlining within macro expansion.
(test_macro_4): Likewise.
* gcc.dg/plugin/diagnostic-test-expressions-1.c
(test_within_macro_1): New test.
(test_within_macro_2): Likewise.
(test_within_macro_3): Likewise.
(test_within_macro_4): Likewise.
* gcc.dg/spellcheck-fields-2.c (test_macro): Update expected
underlining within macro expansion.

libcpp/ChangeLog:
PR c++/79300
* include/line-map.h (enum location_aspect): New enum.
(linemap_client_expand_location_to_spelling_point): Add
enum location_aspect param.
* line-map.c (source_range::intersects_line_p): Update for new
param of linemap_client_expand_location_to_spelling_point.
(rich_location::get_expanded_location): Likewise.
(fixit_insert::affects_line_p): Likewise.
---
 gcc/diagnostic-show-locus.c|  9 ++-
 gcc/genmatch.c |  3 +-
 gcc/input.c| 52 +++---
 .../c-c++-common/Wmisleading-indentation-3.c   |  2 +-
 gcc/testsuite/c-c++-common/pr70264.c   |  2 +-
 .../g++.dg/plugin/diagnostic-test-expressions-1.C  | 78 +
 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c|  4 +-
 .../gcc.dg/plugin/diagnostic-test-expressions-1.c  | 79 ++
 gcc/testsuite/gcc.dg/spellcheck-fields-2.c |  2 +-
 libcpp/include/line-map.h  | 12 +++-
 libcpp/line-map.c  | 15 ++--
 11 files changed, 234 insertions(+), 24 deletions(-)

diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index 5c386ae..a0914ef 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -784,11 +784,14 @@ layout::layout (diagnostic_context * context,
 
   /* Expand the various locations.  */
   expanded_location start
-   = linemap_client_expand_location_to_spelling_point (src_range.m_start);
+   = linemap_client_expand_location_to_spelling_point
+   

[PATCH rs6000 testsuite] Fix PR79158

2017-02-02 Thread Pat Haugen
The testcase has been failing on BE because the compiler is simply storing the 
value straight from the GPRs. The following patch fixes the issue by using 'r' 
in an expression which forces the value back to a VSR. Verified the testcase 
now passes for powerpc64 and still passes for powerpc64le. Ok for trunk?

-Pat


testsuite/ChangeLog:
2017-02-02  Pat Haugen  

PR target/79158
* gcc.target/powerpc/pr70669.c: Use 'r' in an expression to force back
to VSX reg.


Index: gcc.target/powerpc/pr70669.c
===
--- gcc.target/powerpc/pr70669.c(revision 245032)
+++ gcc.target/powerpc/pr70669.c(working copy)
@@ -13,7 +13,7 @@ void foo (TYPE *p, TYPE *q)
 #ifndef NO_ASM
   __asm__ (" # %0" : "+r" (r));
 #endif
-  *p = r;
+  *p = r + r;
 }
 
 /* { dg-final { scan-assembler   "mfvsrd"} } */



Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Martin Sebor

So far I haven't done full bootstrap/regtest on this, just
make check-gcc -j16 -k RUNTESTFLAGS=tree-ssa.exp
which revealed the need for gcc.dg/tree-ssa/builtin-sprintf-warn-1.c hunk
below.  Here it is turned into a short wrong-code without the patch
below:
volatile int a;

int
main (void)
{
  int i;
  char buf[64];
  if (__builtin_sprintf (buf, "%#hho", a) > 1)
__builtin_abort ();
  if (__builtin_sprintf (buf, "%#hhx", a) > 1)
__builtin_abort ();
  return 0;
}
Current trunk as well as trunk + your patch computes ranges [2, 4] and
[3, 4], but the correct ranges are [1, 4] and [1, 4], because for a 0
(or say -131072 etc.) it sets buf to "0" in both cases.


That's right.  This is a good find.  This case is being tested in
builtin-sprintf-warn-1.c but the assertions (on lines 1154 and
1155) seem to expect the wrong numbers.  I would expect your fix
to cause failures here. (And now that I've read the rest of the
patch I see it does.)

...

-  T (2, "%#hho",a); /* { dg-warning "nul past the end" } */
-  T (2, "%#hhx",a); /* { dg-warning ".%#hhx. directive
writing between 3 and . bytes into a region of size 2" } */
+  T (2, "%#hho",a);
+  T (2, "%#hhx",a);


On reflection, this isn't quite the right fix.  We want to both set
the correct range and warn because the call will likely overflow.
This is an example of why the likely/unlikely counters have been
introduced.  By setting min = 1 and likely = 2 for the %#hho and
3 for the %#hhx we get the desired result.

Martin



Re: [PATCH][libgcc, fuchsia]

2017-02-02 Thread Josh Conner via gcc-patches

Ping?


On 1/17/17 10:40 AM, Josh Conner wrote:

The attached patch adds fuchsia support to libgcc.

OK for trunk?

Thanks -

Josh

2017-01-17  Joshua Conner  

* config/arm/unwind-arm.h (_Unwind_decode_typeinfo_ptr): Use
pc-relative indirect handling for fuchsia.
* config/t-slibgcc-fuchsia: New file.
* config.host (*-*-fuchsia*, aarch64*-*-fuchsia*, arm*-*-fuchsia*,
x86_64-*-fuchsia*): Add definitions.





[PATCH] Improve aarch64 conditional compare usage

2017-02-02 Thread Steve Ellcey
This patch extends conditional compare code generation for
aarch64.  Right now if there is an AND or OR of two compares, GCC will
generate a compare followed by a conditional compare.  But if you have
a _Bool variable on one side (or both sides) instead of a comparision
than ccmp.c does not recoginize the code as something that can be done
with a conditional compare instruction.  This patch fixes that
limitation.

Most of the changes are restructuring the code to allow the change and
do not affect the actual output.  The actual behavour change is in
ccmp_tree_comparison_p where we recoginize a boolean variable as well
as a compare expression as code that can be done with a conditionial
compare and in get_compare_parts where we treat a boolean variable X
as 'X != 0' and generate that comparision.

Since the code in ccmp.c is ony used when TARGET_GEN_CCMP_FIRST is
set and TARGET_GEN_CCMP_FIRST is only set for aarch64 this change will
only affect aarch64.

Tested with no regressions and a new test is added to verify that we
generate a ccmp instruction with the change.

OK for checkin after 7.0?

Steve Ellcey
sell...@cavium.com


GCC ChangeLog:

2017-02-02  Steve Ellcey  

* ccmp.c (ccmp_tree_comparison_p): New function.
(ccmp_candidate_p): Update to use above function.
(get_compare_parts): New function.
(expand_ccmp_next): Update to use new functions.
(expand_ccmp_expr_1): Take tree arg instead of gimple, update to use
new functions.
(expand_ccmp_expr): Pass tree instead of gimple to expand_ccmp_expr_1,
take mode as argument.
* ccmp.h (expand_ccmp_expr): Add mode as argument.
* expr.c (expand_expr_real_1): Pass mode as argument.

GCC testsuite ChangeLog:

2017-02-02  Steve Ellcey  

* gcc.target/aarch64/ccmp_2.c: New test.




diff --git a/gcc/ccmp.c b/gcc/ccmp.c
index 92ca133..4fa3ebd 100644
--- a/gcc/ccmp.c
+++ b/gcc/ccmp.c
@@ -38,6 +38,29 @@ along with GCC; see the file COPYING3.  If not see
 #include "ccmp.h"
 #include "predict.h"
 
+/* Check whether T is a simple boolean variable or a SSA name
+   set by a comparison operator in the same basic block.  */
+static bool
+ccmp_tree_comparison_p (tree t, basic_block bb)
+{
+  gimple *g = get_gimple_for_ssa_name (t);
+  tree_code tcode;
+
+  /* If we have a boolean variable allow it and generate a compare
+ to zero reg when expanding.  */
+  if (!g)
+return (TREE_CODE (TREE_TYPE (t)) == BOOLEAN_TYPE);
+
+  /* Check to see if SSA name is set by a comparison operator in
+ the same basic block.  */ 
+  if (!is_gimple_assign (g))
+return false;
+  if (bb != gimple_bb (g))
+return false;
+  tcode = gimple_assign_rhs_code (g);
+  return TREE_CODE_CLASS (tcode) == tcc_comparison;
+}
+
 /* The following functions expand conditional compare (CCMP) instructions.
Here is a short description about the over all algorithm:
  * ccmp_candidate_p is used to identify the CCMP candidate
@@ -71,49 +94,69 @@ along with GCC; see the file COPYING3.  If not see
 static bool
 ccmp_candidate_p (gimple *g)
 {
-  tree rhs = gimple_assign_rhs_to_tree (g);
+  tree rhs;
   tree lhs, op0, op1;
   gimple *gs0, *gs1;
-  tree_code tcode, tcode0, tcode1;
-  tcode = TREE_CODE (rhs);
+  tree_code tcode;
+  basic_block bb;
+
+  if (!g)
+return false;
 
+  rhs = gimple_assign_rhs_to_tree (g);
+  tcode = TREE_CODE (rhs);
   if (tcode != BIT_AND_EXPR && tcode != BIT_IOR_EXPR)
 return false;
 
   lhs = gimple_assign_lhs (g);
   op0 = TREE_OPERAND (rhs, 0);
   op1 = TREE_OPERAND (rhs, 1);
+  bb = gimple_bb (g);
 
   if ((TREE_CODE (op0) != SSA_NAME) || (TREE_CODE (op1) != SSA_NAME)
   || !has_single_use (lhs))
 return false;
 
-  gs0 = get_gimple_for_ssa_name (op0);
-  gs1 = get_gimple_for_ssa_name (op1);
-  if (!gs0 || !gs1 || !is_gimple_assign (gs0) || !is_gimple_assign (gs1)
-  /* g, gs0 and gs1 must be in the same basic block, since current stage
-	 is out-of-ssa.  We can not guarantee the correctness when forwording
-	 the gs0 and gs1 into g whithout DATAFLOW analysis.  */
-  || gimple_bb (gs0) != gimple_bb (gs1)
-  || gimple_bb (gs0) != gimple_bb (g))
-return false;
+  gs0 = get_gimple_for_ssa_name (op0); /* gs0 may be NULL */
+  gs1 = get_gimple_for_ssa_name (op1); /* gs1 may be NULL */
 
-  tcode0 = gimple_assign_rhs_code (gs0);
-  tcode1 = gimple_assign_rhs_code (gs1);
-  if (TREE_CODE_CLASS (tcode0) == tcc_comparison
-  && TREE_CODE_CLASS (tcode1) == tcc_comparison)
+  if (ccmp_tree_comparison_p (op0, bb) && ccmp_tree_comparison_p (op1, bb))
 return true;
-  if (TREE_CODE_CLASS (tcode0) == tcc_comparison
-  && ccmp_candidate_p (gs1))
+  if (ccmp_tree_comparison_p (op0, bb) && ccmp_candidate_p (gs1))
 return true;
-  else if (TREE_CODE_CLASS (tcode1) == tcc_comparison
-	   && ccmp_candidate_p (gs0))
+  if (ccmp_tree_comparison_p (op1, bb) && ccmp_candidate_p (gs0))
 return true;
   /* We 

Re: [PATCH] Add support for Fuchsia (OS)

2017-02-02 Thread Josh Conner via gcc-patches

On 2/1/17 3:29 PM, Gerald Pfeifer wrote:


On Tue, 17 Jan 2017, Josh Conner via gcc-patches wrote:

Attached is my recommended patch for changes to the web docs describing
Fuchsia support. Please let me know if there's anything else I can do.

This looks fine (just remove the blank before "Fuchsia"); and
sorry for the delay getting back to this!


Done. Thanks!

- Josh



Re: New Port for RISC-V v2

2017-02-02 Thread Joseph Myers
On Thu, 2 Feb 2017, Palmer Dabbelt wrote:

> Additionally, here's a diff against wwwdocs.  This is really just to check 
> this
> is all I'm supposed to do, I can submit a proper patch via the mailing list (I
> just don't know how to use CVS, sorry).

Other parts of the wwwdocs changes (backends.html, readings.html) are 
probably of more interest.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/6] RISC-V Port: gcc

2017-02-02 Thread Joseph Myers
On Thu, 2 Feb 2017, Palmer Dabbelt wrote:

> +@table @gcctabopt
> +@item -mbranch-cost=@var{N}
> +@opindex mbranch-cost
> +Set the cost of branches to roughly N instructions.

@var{n} (both places; Texinfo may convert to uppercase depending on the 
output format).

> +@item -mplt
> +@itemx -mno-plt
> +@opindex plt
> +When generating -fpic code, allow the use of PLTs. Ignored for fno-pic.

@option{-fpic}, @option{-fno-pic} (or, better, say PIC and non-PIC rather 
than referring to just one of the options for controlling PIC).

> +natural calling convention: eg LP64 for RV64I, ILP32 for RV32I, LP64D for

e.g.@: (see codingconventions.html).

> +Generate code for given RISC-V ISA (e.g. rv64im).  ISA strings must be

Likewise.

> +lower-case.  Examples include "rv64i", "rv32g", and "rv32imaf".

@samp{rv64i}, etc.

> +@item -msmall-data-limit=@var{N}
> +@opindex msmall-data-limit
> +Put global and static data smaller than @var{N} bytes into a special section
> +(on some targets).

@var{n}, again.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-02-02 Thread Jeff Law

On 02/02/2017 11:09 AM, Marek Polacek wrote:



It seems to me that we should be able to write these expressions
the way that's natural to us and at the same time be able to
comfortably read them both ways.  As always, I fully support
consistency and following a coding style where it matters.  I
just don't think this does.


I find these harder to read and they always give me a pause, especially with >= 
or
<=.  I'd say that 99% of the codebase uses "obj >= 0", so we should fix the
rest and be consistent.

Exactly.

I liked the CONST == VAR style because it makes it impossible to typo = 
instead of ==.  I wanted to see how it'd play out so I didn't initially 
call the nit out to be fixed.  But I'm finding that consistently I'm 
having to double-take on conditionals where the constant was first.


Personal style sometimes takes a back seat to project consistency.  And 
as Marek points out, the vast majority of conditions in GCC are written 
as variable op constant.


It's time to bring consistency into the sprintf checking bits.

Jeff


Re: [PATCH doc] clean up -fdump-tree- options (PR 32003)

2017-02-02 Thread Martin Sebor

On 02/01/2017 10:28 PM, Sandra Loosemore wrote:

On 02/01/2017 08:26 PM, Martin Sebor wrote:

On 02/01/2017 08:06 PM, Sandra Loosemore wrote:

On 02/01/2017 06:57 PM, Martin Sebor wrote:

As discussed in bug 32003 - Undocumented -fdump-tree options, rather
than duplicating the same boiler-plate text for each of the dozens
(138 by my count) of undocumented passes, the attached patch removes
the pass-specific -fdump-tree- options replacing them with a list of
generic steps to determine the full list of such options and the names
of the dump files for each.

The bug only talks about -fdump-tree- options so the attached patch
only tackles those and leaves the -fdump-rtl- options for another
time.


Thanks for tackling this.  I think the patch requires another round of
cleanup, though.


Attached is an updated version.


Just a couple more small things.


@@ -12985,9 +12965,11 @@ whole compilation unit while @samp{-details}
dumps every event as
 the passes generate them.  The default with no option is to sum
 counters for each function compiled.

+@item -fdump-tree-all
 @item -fdump-tree-@var{switch}


You need to make this one @itemx now since you added @item just above it.


+To enable the creation of the dump file, append the pass code to
+the @option{-fdump-} option prefix and invoke GCC with it.  For example,
+to enable the dump from the Early Value Range Propagation pass, invoke
+GCC with the @option{-fdump-tree-evrp} option.  Optionally, you may
+specify the name of the dump file.  If you don't specify one, GCC will
+create one as described below.


s/will create one/creates one/


@@ -15572,7 +15434,7 @@ before calling a function and popped afterwards.

 Popping the arguments after the function call can be expensive on
 AVR so that accumulating the stack space might lead to smaller
-executables because arguments need not to be removed from the
+executables because arguments need not be removed from the
 stack after such a function call.

 This option can lead to reduced code size for functions that perform


This change is good, but it's clearly not related to the other parts of
the patch, and wasn't present in the previous version.  Did you intend
to do this?  If so, please commit it separately.  (I often commit big
patches with collected copy-edits and typo corrections, but I try to
keep those separate from content rewrites or restructuring patches.)


I was prompted to make the change by your request to replace
"need not be" with "are not".  I wanted to see if there were
other uses of the former.  This one was incorrect so I fixed
it.

Martin



The rest of the patch is OK to commit with the two nits fixed.

-Sandra





Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-02-02 Thread Marek Polacek
On Thu, Feb 02, 2017 at 11:00:44AM -0700, Martin Sebor wrote:
> On 02/02/2017 10:26 AM, Jeff Law wrote:
> > On 01/30/2017 02:28 PM, Martin Sebor wrote:
> > > Bug 79275 - -Wformat-overflow false positive exceeding INT_MAX in
> > > glibc sysdeps/posix/tempname.c points out a false positive found
> > > during a Glibc build and caused by the checker using the upper
> > > bound of a range of precisions in string directives with string
> > > arguments of non-constant length.  The attached patch relaxes
> > > the checker to use the lower bound instead when appropriate.
> > > 
> > > Martin
> > > 
> > > gcc-79275.diff
> > > 
> > > 
> > > PR middle-end/79275 -  -Wformat-overflow false positive exceeding
> > > INT_MAX in glibc sysdeps/posix/tempname.c
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > PR middle-end/79275
> > > * gcc.dg/tree-ssa/builtin-sprintf-warn-11.c: New test.
> > > * gcc.dg/tree-ssa/pr79275.c: New test.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > PR middle-end/79275
> > > * gimple-ssa-sprintf.c (get_string_length): Set lower bound to zero.
> > > (format_string): Tighten up the range of output for non-constant
> > > strings and correct the expected range for wide non-constant strings.
> > Couple more nits.
> > 
> > First, I expect the patch won't apply as-is with the operand order
> > fixes.  There'll be trivial changes you'll need to make for that.
> 
> I had to back out those changes to avoid the massive conflict when
> applying this patch and before retesting it.  Then I reapplied those
> changes by hand.
> 
> > 
> > Along the same lines, this patch would introduce a new operand order nit
> > here:
> > 
> > 
> > > +}
> > > +  else if (0 <= dir.prec[1])
> > > +{
> 
> In light of the hassle this has caused (above, plus with the big
> patch earlier) I can't help but question the value in making these
> changes.  Isn't the code just as easy to read this way as the other?
> 
>   else if (dir.prec[1] >= 0)
> 
> Wouldn't it be if the 0 were spelled ZERO?
> 
> I never even thought about it until you started pointing it out,
> but my own personal (subconscious) bias is clear from how I write
> the code.  I'm also not the only one.
> 
> FWIW, the example above may not be one of them but there are cases
> when the constant on the left actually makes the code much easier
> to read than the other way, such as:
 
Certainly not for me.

>   gcc/read-rtl-function.c:  if (0 == strncmp (desc, "orig:", 5))
> 
> or
> 
>   cp/decl.c:if (1 == simple_cst_equal (TREE_PURPOSE (t1)
> 
> It seems to me that we should be able to write these expressions
> the way that's natural to us and at the same time be able to
> comfortably read them both ways.  As always, I fully support
> consistency and following a coding style where it matters.  I
> just don't think this does.

I find these harder to read and they always give me a pause, especially with >= 
or
<=.  I'd say that 99% of the codebase uses "obj >= 0", so we should fix the
rest and be consistent.

Marek


Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-02-02 Thread Martin Sebor

On 02/02/2017 10:26 AM, Jeff Law wrote:

On 01/30/2017 02:28 PM, Martin Sebor wrote:

Bug 79275 - -Wformat-overflow false positive exceeding INT_MAX in
glibc sysdeps/posix/tempname.c points out a false positive found
during a Glibc build and caused by the checker using the upper
bound of a range of precisions in string directives with string
arguments of non-constant length.  The attached patch relaxes
the checker to use the lower bound instead when appropriate.

Martin

gcc-79275.diff


PR middle-end/79275 -  -Wformat-overflow false positive exceeding
INT_MAX in glibc sysdeps/posix/tempname.c

gcc/testsuite/ChangeLog:

PR middle-end/79275
* gcc.dg/tree-ssa/builtin-sprintf-warn-11.c: New test.
* gcc.dg/tree-ssa/pr79275.c: New test.

gcc/ChangeLog:

PR middle-end/79275
* gimple-ssa-sprintf.c (get_string_length): Set lower bound to zero.
(format_string): Tighten up the range of output for non-constant
strings and correct the expected range for wide non-constant strings.

Couple more nits.

First, I expect the patch won't apply as-is with the operand order
fixes.  There'll be trivial changes you'll need to make for that.


I had to back out those changes to avoid the massive conflict when
applying this patch and before retesting it.  Then I reapplied those
changes by hand.



Along the same lines, this patch would introduce a new operand order nit
here:



+}
+  else if (0 <= dir.prec[1])
+{


In light of the hassle this has caused (above, plus with the big
patch earlier) I can't help but question the value in making these
changes.  Isn't the code just as easy to read this way as the other?

  else if (dir.prec[1] >= 0)

Wouldn't it be if the 0 were spelled ZERO?

I never even thought about it until you started pointing it out,
but my own personal (subconscious) bias is clear from how I write
the code.  I'm also not the only one.

FWIW, the example above may not be one of them but there are cases
when the constant on the left actually makes the code much easier
to read than the other way, such as:

  gcc/read-rtl-function.c:  if (0 == strncmp (desc, "orig:", 5))

or

  cp/decl.c:if (1 == simple_cst_equal (TREE_PURPOSE (t1)

It seems to me that we should be able to write these expressions
the way that's natural to us and at the same time be able to
comfortably read them both ways.  As always, I fully support
consistency and following a coding style where it matters.  I
just don't think this does.



Please consider documenting how we handle strings with unknown lengths.


I don't think those warrant waiting for another review round.  Fix,
bootstrap, test and install.


Will do.

Thanks
Martin


Re: [PATCH] Fix memory leaks in IPA CP (PR ipa/79337).

2017-02-02 Thread Jeff Law

On 02/02/2017 09:37 AM, Martin Liška wrote:

On 02/02/2017 05:13 PM, Martin Jambor wrote:

Hi,

On Thu, Feb 02, 2017 at 01:53:35PM +0100, Martin Liska wrote:

Hello.

As mentioned in the PR, there is memory leak that is caused by fact, that 
ipa_node_params_t
does release memory just in ipa_node_params_t::remove. That's wrong because the 
callback is called
just when cgraph_removal_hook is triggered. Thus the proper implementation is 
to move it do destructor.
Similar should be done for construction (currently being in 
ipa_node_params_t::insert).

Ah, that did not occur to me.  Still, I must say that destructors
called by the garbage collector are tough to wrap one's head around.
Or at least my head.

Mine too. It took me some time before I realized what's wrong :)


I can't approve it but I like the patch, with a little nit...


Apart from that, I noticed that when using GGC memory, the machinery implicitly 
calls dtors for types
that have __has_trivial_destructor == true. That's case of ipa_node_params_t, 
but as symbol_summary already
calls a dtor before ggc_free is called. Problem comes when hash_table marks a 
memory slot as empty and GGC finalizer
calls dtor for an invalid memory. Thus I believe not using such finalizer is 
proper fix.

Last change fixes issue where ipa_free_all_node_params destroys a 
symbol_summary (living in GGC memory) and dtor
is called for second time via ggc release mechanism.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
From 08a3ecda47d9d52cd4bb6435aaf18b5b099ef683 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 2 Feb 2017 11:13:13 +0100
Subject: [PATCH] Fix memory leaks in IPA CP (PR ipa/79337).

gcc/ChangeLog:

2017-02-02  Martin Liska  

PR ipa/79337
* ipa-prop.c (ipa_node_params_t::insert): Remove current
implementation.
(ipa_node_params_t::remove): Likewise.
* ipa-prop.h (ipa_node_params::ipa_node_params): Make default
initialization from ipa_node_params_t::insert.
(ipa_node_params::~ipa_node_params): Move from
ipa_node_params_t::release.
* symbol-summary.h (symbol_summary::m_released): New member.
Do not release a summary twice.  Do not allow to call finalizer
for types of a summary that live in GGC memory.
---
 gcc/ipa-prop.c   | 32 
 gcc/ipa-prop.h   | 28 ++--
 gcc/symbol-summary.h | 27 ++-
 3 files changed, 40 insertions(+), 47 deletions(-)


...


diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 93a2390c321..8c5ba25fcec 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -320,6 +320,12 @@ struct GTY(()) ipa_param_descriptor

 struct GTY((for_user)) ipa_node_params
 {
+  /* Default constructor.  */
+  ipa_node_params ();
+
+  /* Default destructor.  */
+  ~ipa_node_params ();
+
   /* Information about individual formal parameters that are gathered when
  summaries are generated. */
   vec *descriptors;
@@ -356,6 +362,24 @@ struct GTY((for_user)) ipa_node_params
   unsigned versionable : 1;
 };

+inline
+ipa_node_params::ipa_node_params ()
+: descriptors (NULL), lattices (NULL), ipcp_orig_node (NULL),
+  known_csts (vNULL), known_contexts (vNULL), analysis_done (0),
+  node_enqueued (0), do_clone_for_all_contexts (0), is_all_contexts_clone (0),
+  node_dead (0), node_within_scc (0), node_calling_single_call (0),
+  versionable (0)
+{
+}
+
+inline
+ipa_node_params::~ipa_node_params ()
+{
+  free (lattices);
+  known_csts.release ();
+  known_contexts.release ();
+}
+
 /* Intermediate information that we get from alias analysis about a particular
parameter in a particular basic_block.  When a parameter or the memory it
references is marked modified, we use that information in all dominated
@@ -580,9 +604,9 @@ public:
 function_summary (table, ggc) { }

   /* Hook that is called by summary when a node is deleted.  */
-  virtual void insert (cgraph_node *, ipa_node_params *info);
+  virtual void insert (cgraph_node *, ipa_node_params *) {}
   /* Hook that is called by summary when a node is deleted.  */
-  virtual void remove (cgraph_node *, ipa_node_params *info);
+  virtual void remove (cgraph_node *, ipa_node_params *) {}

...that these could be just removed, no?

Yes. Changed in attached patch.

Martin


Thanks for looking into this,

Martin



0001-Fix-memory-leaks-in-IPA-CP-PR-ipa-79337-v2.patch


From 0086e44dc0ca43737db6e94c1f29cad903d6b39f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 2 Feb 2017 11:13:13 +0100
Subject: [PATCH] Fix memory leaks in IPA CP (PR ipa/79337).

gcc/ChangeLog:

2017-02-02  Martin Liska  

PR ipa/79337
* ipa-prop.c (ipa_node_params_t::insert): Remove current
implementation.
(ipa_node_params_t::remove): Likewise.
* ipa-prop.h 

Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-02 Thread Torvald Riegel
On Thu, 2017-02-02 at 13:58 +0100, Thomas Schwinge wrote:
> > The other failures I saw didn't seem atomics related
> > (eg, openacc)
> 
> I suppose you're testing without nvptx offloading -- which failures do
> you see for OpenACC testing?  (There shouldn't be any for host fallback
> testing.)

Sorry, false alarm.  Those tests print out "FAILED", but they don't
actually fail.

> > I haven't compared it against a full bootstrap build and
> > make check of trunk.).
> 
> I just happened to do that for x86_64-pc-linux-gnu, and I'm seeing the
> following additional changes; posting this just in case that's not
> expected to happen:
> 
> -PASS: gcc.dg/atomic-compare-exchange-5.c (test for excess errors)
> -PASS: gcc.dg/atomic-compare-exchange-5.c execution test
> +UNSUPPORTED: gcc.dg/atomic-compare-exchange-5.c

Those changes are expected because we don't expect anymore that
cmpxchg16b availability also means that 16-byte atomics will be inlined.



Re: Enable jump threading on paths meeting hot paths

2017-02-02 Thread Jeff Law

On 02/02/2017 06:49 AM, Jan Hubicka wrote:

Hi,
it seems I forgot to send the updated patch. Here it is.
We now dump info like:
Checking profitability of path:  5 (16 insns) 3 (2 insns) 34 (2 insns) 33 (4 
insns) 32 (1 insns) 10 (3 insns) 6
  Control statement insns: 16
  Overall: 12 insns
  Registering FSM jump thread: (6, 10) incoming edge;  (10, 32)  (32, 33)  (33, 
34)  (34, 3)  (3, 5)  (5, 16) nocopy; (5, 16)

path is printed backwards. It is how the loop process it.
Yea.  I've always found that confusing, particularly since the other 
threaders go the other direction.


One thing on the todo list is to stop converting between the two 
recording formats and remove the forward/backwards threading 
entanglement in tree-ssa-threadupdate.c.


Anyway, the patch looks fine to me as well (please fix the nits Richi 
pointed out).


jeff


Re: [PATCH] Fix memory leaks in gimple-ssa-sprintf.c (PR tree-optimization/79339).

2017-02-02 Thread Jeff Law

On 02/02/2017 07:30 AM, Martin Liška wrote:

Hi.

As mentioned in the PR, mpfr_clear should be called in order to release memory.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin


0001-Fix-memory-leaks-in-gimple-ssa-sprintf.c-PR-tree-opt.patch


From 627ea01882a2a307b107e5e4aa8de6dc60530a81 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 2 Feb 2017 12:25:32 +0100
Subject: [PATCH] Fix memory leaks in gimple-ssa-sprintf.c (PR
 tree-optimization/79339).

gcc/ChangeLog:

2017-02-02  Martin Liska  

PR tree-optimization/79339
* gimple-ssa-sprintf.c (format_floating_max): Call mpfr_clear.
(format_floating): Likewise.

OK.  Thanks for tracking these down.


Jeff



Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-02-02 Thread Jeff Law

On 01/30/2017 02:28 PM, Martin Sebor wrote:

Bug 79275 - -Wformat-overflow false positive exceeding INT_MAX in
glibc sysdeps/posix/tempname.c points out a false positive found
during a Glibc build and caused by the checker using the upper
bound of a range of precisions in string directives with string
arguments of non-constant length.  The attached patch relaxes
the checker to use the lower bound instead when appropriate.

Martin

gcc-79275.diff


PR middle-end/79275 -  -Wformat-overflow false positive exceeding INT_MAX in 
glibc sysdeps/posix/tempname.c

gcc/testsuite/ChangeLog:

PR middle-end/79275
* gcc.dg/tree-ssa/builtin-sprintf-warn-11.c: New test.
* gcc.dg/tree-ssa/pr79275.c: New test.

gcc/ChangeLog:

PR middle-end/79275
* gimple-ssa-sprintf.c (get_string_length): Set lower bound to zero.
(format_string): Tighten up the range of output for non-constant
strings and correct the expected range for wide non-constant strings.

Couple more nits.

First, I expect the patch won't apply as-is with the operand order 
fixes.  There'll be trivial changes you'll need to make for that.


Along the same lines, this patch would introduce a new operand order nit 
here:




+   }
+  else if (0 <= dir.prec[1])
+   {


Please consider documenting how we handle strings with unknown lengths.


I don't think those warrant waiting for another review round.  Fix, 
bootstrap, test and install.


jeff



Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-02-02 Thread Martin Sebor

My general inclination is to ask this to wait for gcc-8 as it is not a
regression, but instead a false positive in a new warning.

So as I mentioned in my message to Joseph, I'm going to go with Joseph &
Jakub's view that this should be considered a regression.


Okay.  I'll wait for your approval of the patch then (with the fix
for the typo you pointed out).


My biggest concern with being more aggressive than that (besides
the pushback) is that I can't think of a good function to compute
the size (it can't very well be a constant).

Presumably the argument against simply giving up and not checking at all
is that by assuming length 1, we can still check all the other arguments
and perhaps still give a warning if the sprintf overflows when the
unbound string is essentially ignored?


That's right.  Since assuming the length of an unknown string is zero
is always safe, ignoring the rest of the format when one is found never
even crossed my mind.  There are other problems we can find if we keep
going that don't necessarily depend on our knowledge of the string
length.  (E.g., excessive widths and precisions, null string pointers,
or even unterminated character arrays if/when that is implemented,
etc.)

Martin


Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-02 Thread Jeff Law

On 02/01/2017 03:45 AM, Richard Biener wrote:


I agree.  But this means we should look for a vectorizer-local fix
without a new global predicate then (there seem to be subtly different
needs and coming up with good names for all of them sounds difficult...).
Well, we could go with Jakub's INTEGRAL_BOOLEAN_TYPE as posted, but in 
contexts where we use it and really depend on single bit objects, we add 
the precision == 1 check back.  Jakub's patch removes the type precision 
check in tree-vect-patterns for example.  There's likely all kinds of 
places where we need to add that check as well.


/me is frustrated that we have booleans with nonstandard precision, even 
though I understand why it was done.  It creates numerous headaches.



Jeff


Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-02 Thread Jeff Law

On 02/01/2017 01:21 AM, Richard Biener wrote:


+/* Nonzero if TYPE represents a (scalar) boolean type or type
+   in the middle-end compatible with it.  */
+
+#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
+  (TREE_CODE (TYPE) == BOOLEAN_TYPE\
+   || ((TREE_CODE (TYPE) == INTEGER_TYPE   \
+   || TREE_CODE (TYPE) == ENUMERAL_TYPE)   \
+   && TYPE_PRECISION (TYPE) == 1   \
+   && TYPE_UNSIGNED (TYPE)))

(just to quote what you proposed).
And my suggestion is to move the PRECISION/UNSIGNED checks so that 
they're tested for BOOLEAN_TYPEs as well.





As of useless_type_conversion_p, I don't remember why we have

  /* Preserve conversions to/from BOOLEAN_TYPE if types are not
 of precision one.  */
  if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
   != (TREE_CODE (outer_type) == BOOLEAN_TYPE))
  && TYPE_PRECISION (outer_type) != 1)
return false;

it came with r173854 where you see other BOOLEAN_TYPE
-> integral-type with precision 1 check changes, so a new predicate
is very welcome IMHO.
Funny, it's your change ;-)  Looks like Kai added a gimple checker that 
verified that the operand of a TRUTH_NOT_EXPR must be compatible with 
boolean_type_node.


That check tripped (bz48989) on some Fortran code.  Your fix seems to 
imply that the Fortran front-end created non-1-precision BOOLEAN_TYPE 
object.





all BOOLEAN_TYPEs but Adas have precision one and are unsigned
(their TYPE_SIZE may vary though).  Adas larger precision boolean
has only two valid values but needs to be able to encode some 'NaT'
state.

I think BOOLEAN_COMPATIBLE_TYPE_P would be misleading as it isn't
equal to types_compatible_p (boolean_type_node, t).

Maybe we want TWO_VALUED_UNSIGNED_INTEGRAL_TYPE_P ()? (ick)
I thought "BOOLEAN" covers TWO_VALUED_UNSIGNED well enough but
simply BOOLEAN_TYPE_P is easily confused with TREE_CODE () ==
BOOLEAN_TYPE.

I'm fine with changing the predicate to be more explicit, like

#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
  (INTEGRAL_TYPE_P (TYPE) && TYPE_PRECISION (TYPE) == 1)

not sure if we really need the TYPE_UNSIGNED check?  The middle-end
has various places that just check for a 1-precision type when
asking for a boolean context.

So naming set aside, would you agree with the above definition?
(modulo a && TYPE_UNSIGNED (TYPE))?
I could agree to that.  Alternately, we could restore the TYPE_PRECISION 
checks that Jakub removed in the vectorizer.


Jeff


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Martin Sebor

On 02/02/2017 02:52 AM, Jakub Jelinek wrote:

On Thu, Feb 02, 2017 at 08:37:07AM +0100, Jakub Jelinek wrote:

+  if (res.range.max < res.range.min)
+   {
+ unsigned HOST_WIDE_INT tmp = res.range.max;
+ res.range.max = res.range.min;
+ res.range.min = tmp;


These 3 lines are
std::swap (res.range.max, res.range.min);
also note the spaces instead of tabs and indentation by 7/9 spaces
instead of 1 tab and 1 tab + 2 spaces.


+   }
+
+  if (!TYPE_UNSIGNED (argtype)
+ && tree_int_cst_lt (integer_zero_node, argmax)
+ && tree_int_cst_lt (argmin, integer_zero_node))


And these 2 lines
  && tree_int_cst_sgn (argmax) > 0
  && tree_int_cst_sgn (argmin) < 0


That said, as I've said in the PR and earlier in December on gcc-patches,
the way format_integer is structured is not really maintainable, has
very different code paths for arguments with known ranges and without them
intermixed together and I still ran into wrong-code issues or wrong warnings
with your patch.  See below.  Thus I'd like to propose to just very much
simplify the code:

 gimple-ssa-sprintf.c |  108 ---
 1 file changed, 27 insertions(+), 81 deletions(-)

So far I haven't done full bootstrap/regtest on this, just
make check-gcc -j16 -k RUNTESTFLAGS=tree-ssa.exp
which revealed the need for gcc.dg/tree-ssa/builtin-sprintf-warn-1.c hunk
below.  Here it is turned into a short wrong-code without the patch below:
volatile int a;

int
main (void)
{
  int i;
  char buf[64];
  if (__builtin_sprintf (buf, "%#hho", a) > 1)
__builtin_abort ();
  if (__builtin_sprintf (buf, "%#hhx", a) > 1)
__builtin_abort ();
  return 0;
}
Current trunk as well as trunk + your patch computes ranges [2, 4] and
[3, 4], but the correct ranges are [1, 4] and [1, 4], because for a 0
(or say -131072 etc.) it sets buf to "0" in both cases.


That's right.  This is a good find.  This case is being tested in
builtin-sprintf-warn-1.c but the assertions (on lines 1154 and
1155) seem to expect the wrong numbers.  I would expect your fix
to cause failures here. (And now that I've read the rest of the
patch I see it does.)


Otherwise all the tests succeeded, though looking e.g. at the diff
between builtin-sprintf-warn-1.c diagnostics with your patch and with
the patch below instead, there are also differences like:
-builtin-sprintf-warn-1.c:1119:3: note: using the range [1, -128] for directive 
argument
+builtin-sprintf-warn-1.c:1119:3: note: using the range [-128, 127] for 
directive argument
where [1, -128] looks clearly wrong, that isn't a valid range,


The range in the note is the artificial one the pass uses to compute
the range of output.  I went back and forth on this as I think it's
debatable if [-128, 127] is clearer than [1, -128] (or [-128, 1]).
The informative notes aren't completely consistent in what ranges
they print.  It would be nice to nail down what we think is the most
useful output and make them all consistent.


and
this test is -O0 anyway, plus a doesn't have a known range:
  T (0, "%hhd", a); /* { dg-warning ".%hhd. directive writing between 1 
and . bytes into a region of size 0" } */
(so to some extent even the note with the patch isn't accurate, the
printed value range is the range of the value after an implicit conversion
from int to signed char).
Or
-builtin-sprintf-warn-1.c:1122:3: note: using the range [1, 255] for directive 
argument
+builtin-sprintf-warn-1.c:1122:3: note: using the range [0, 255] for directive 
argument


Same as above.


  T (0, "%hhu", a); /* { dg-warning "into a region" } */
similarly, a has [INT_MIN, INT_MAX] range, but after implicit conversion to 
unsigned
char it is [0, 255], certainly not [1, 255].


Ditto.


Plus there are certain notes removed:
-builtin-sprintf-warn-1.c:1311:3: note: using the range [1, -2147483648] for 
directive argument
  T (0, "%d",   a); /* { dg-warning "into a region" } */
without replacement, here a has [-2147483648, 2147483647] range, but as that
is equal to the type's range, I bet it omits it.


Could be.  As I said, there are opportunities for improvements in
what the notes print.  Someone pointed out, for example, that very
large numbers are hard to read.  It might be clearer to print some
of them using constants like LONG_MAX - 2, or in hex, etc.



2017-02-02  Jakub Jelinek  
Martin Sebor  

PR tree-optimization/79327
* gimple-ssa-sprintf.c (adjust_range_for_overflow): If returning
true, always set *argmin and *argmax to TYPE_{MIN,MAX}_VALUE of
dirtype.
(format_integer): Use wide_int_to_tree instead of build_int_cst
+ to_?hwi.  If argmin is NULL, just set argmin and argmax to
TYPE_{MIN,MAX}_VALUE of argtype.  Simplify and fix computation
of shortest and longest sequence.

* 

Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-02-02 Thread Jeff Law

On 02/01/2017 05:40 PM, Martin Sebor wrote:

On 01/31/2017 03:33 PM, Jeff Law wrote:

On 01/30/2017 02:28 PM, Martin Sebor wrote:

Bug 79275 - -Wformat-overflow false positive exceeding INT_MAX in
glibc sysdeps/posix/tempname.c points out a false positive found
during a Glibc build and caused by the checker using the upper
bound of a range of precisions in string directives with string
arguments of non-constant length.  The attached patch relaxes
the checker to use the lower bound instead when appropriate.

Martin

gcc-79275.diff


PR middle-end/79275 -  -Wformat-overflow false positive exceeding
INT_MAX in glibc sysdeps/posix/tempname.c

gcc/testsuite/ChangeLog:

PR middle-end/79275
* gcc.dg/tree-ssa/builtin-sprintf-warn-11.c: New test.
* gcc.dg/tree-ssa/pr79275.c: New test.

gcc/ChangeLog:

PR middle-end/79275
* gimple-ssa-sprintf.c (get_string_length): Set lower bound to zero.
(format_string): Tighten up the range of output for non-constant
strings and correct the expected range for wide non-constant
strings.

My general inclination is to ask this to wait for gcc-8 as it is not a
regression, but instead a false positive in a new warning.
So as I mentioned in my message to Joseph, I'm going to go with Joseph & 
Jakub's view that this should be considered a regression.



My rationale for zero and one for strings of unknown length was
to try to avoid false positives.  The checker does use the size
of the array when a string of unknown length points to one as
the likely length.  That's run into the expected pushback but
at least there is (what I consider) a defensible rationale for
it (the string could potentially be as long as the array,
otherwise why make the array that big?)  It also uses the length
of the longest of the strings an argument is known to point to,
on the same basis.

My biggest concern with being more aggressive than that (besides
the pushback) is that I can't think of a good function to compute
the size (it can't very well be a constant).
Presumably the argument against simply giving up and not checking at all 
is that by assuming length 1, we can still check all the other arguments 
and perhaps still give a warning if the sprintf overflows when the 
unbound string is essentially ignored?


jeff




Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-02-02 Thread Jeff Law

On 01/31/2017 03:59 PM, Joseph Myers wrote:

On Tue, 31 Jan 2017, Jeff Law wrote:


My general inclination is to ask this to wait for gcc-8 as it is not a
regression, but instead a false positive in a new warning.


I'd hope it would be possible for current releases of GCC and glibc to
build with each other.  As this seems to be a case where the warning is
clearly bogus (and started appearing during GCC stage 4), a fix in GCC
seems more appropriate than disabling the warning for that code in glibc.
As I mentioned to Jakub, I see things differently, but I can certainly 
see both your and Jakub's point of view.


The irony here is I'm currently poking at 79095 which is a false 
positive in a new warning as well and marked as a regression.  One could 
make the same arguments on both sides about that BZ :-)


Given we've got two release managers view 79275 as a regression and one 
non-release-manager that does not, I won't push on the not-a-regression 
issue.  I'll add the regression marker and put the patch back in the queue.


jeff




Re: [PATCH] Fix memory leaks in IPA CP (PR ipa/79337).

2017-02-02 Thread Martin Liška
On 02/02/2017 05:13 PM, Martin Jambor wrote:
> Hi,
> 
> On Thu, Feb 02, 2017 at 01:53:35PM +0100, Martin Liska wrote:
>> Hello.
>>
>> As mentioned in the PR, there is memory leak that is caused by fact, that 
>> ipa_node_params_t
>> does release memory just in ipa_node_params_t::remove. That's wrong because 
>> the callback is called
>> just when cgraph_removal_hook is triggered. Thus the proper implementation 
>> is to move it do destructor.
>> Similar should be done for construction (currently being in 
>> ipa_node_params_t::insert).
> 
> Ah, that did not occur to me.  Still, I must say that destructors
> called by the garbage collector are tough to wrap one's head around.
> Or at least my head.

Mine too. It took me some time before I realized what's wrong :)

> 
> I can't approve it but I like the patch, with a little nit...
> 
>>
>> Apart from that, I noticed that when using GGC memory, the machinery 
>> implicitly calls dtors for types
>> that have __has_trivial_destructor == true. That's case of 
>> ipa_node_params_t, but as symbol_summary already
>> calls a dtor before ggc_free is called. Problem comes when hash_table marks 
>> a memory slot as empty and GGC finalizer
>> calls dtor for an invalid memory. Thus I believe not using such finalizer is 
>> proper fix.
>>
>> Last change fixes issue where ipa_free_all_node_params destroys a 
>> symbol_summary (living in GGC memory) and dtor
>> is called for second time via ggc release mechanism.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>
>> Ready to be installed?
>> Martin
> 
>> From 08a3ecda47d9d52cd4bb6435aaf18b5b099ef683 Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Thu, 2 Feb 2017 11:13:13 +0100
>> Subject: [PATCH] Fix memory leaks in IPA CP (PR ipa/79337).
>>
>> gcc/ChangeLog:
>>
>> 2017-02-02  Martin Liska  
>>
>>  PR ipa/79337
>>  * ipa-prop.c (ipa_node_params_t::insert): Remove current
>>  implementation.
>>  (ipa_node_params_t::remove): Likewise.
>>  * ipa-prop.h (ipa_node_params::ipa_node_params): Make default
>>  initialization from ipa_node_params_t::insert.
>>  (ipa_node_params::~ipa_node_params): Move from
>>  ipa_node_params_t::release.
>>  * symbol-summary.h (symbol_summary::m_released): New member.
>>  Do not release a summary twice.  Do not allow to call finalizer
>>  for types of a summary that live in GGC memory.
>> ---
>>  gcc/ipa-prop.c   | 32 
>>  gcc/ipa-prop.h   | 28 ++--
>>  gcc/symbol-summary.h | 27 ++-
>>  3 files changed, 40 insertions(+), 47 deletions(-)
>>
> 
> ...
> 
>> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
>> index 93a2390c321..8c5ba25fcec 100644
>> --- a/gcc/ipa-prop.h
>> +++ b/gcc/ipa-prop.h
>> @@ -320,6 +320,12 @@ struct GTY(()) ipa_param_descriptor
>>  
>>  struct GTY((for_user)) ipa_node_params
>>  {
>> +  /* Default constructor.  */
>> +  ipa_node_params ();
>> +
>> +  /* Default destructor.  */
>> +  ~ipa_node_params ();
>> +
>>/* Information about individual formal parameters that are gathered when
>>   summaries are generated. */
>>vec *descriptors;
>> @@ -356,6 +362,24 @@ struct GTY((for_user)) ipa_node_params
>>unsigned versionable : 1;
>>  };
>>  
>> +inline
>> +ipa_node_params::ipa_node_params ()
>> +: descriptors (NULL), lattices (NULL), ipcp_orig_node (NULL),
>> +  known_csts (vNULL), known_contexts (vNULL), analysis_done (0),
>> +  node_enqueued (0), do_clone_for_all_contexts (0), is_all_contexts_clone 
>> (0),
>> +  node_dead (0), node_within_scc (0), node_calling_single_call (0),
>> +  versionable (0)
>> +{
>> +}
>> +
>> +inline
>> +ipa_node_params::~ipa_node_params ()
>> +{
>> +  free (lattices);
>> +  known_csts.release ();
>> +  known_contexts.release ();
>> +}
>> +
>>  /* Intermediate information that we get from alias analysis about a 
>> particular
>> parameter in a particular basic_block.  When a parameter or the memory it
>> references is marked modified, we use that information in all dominated
>> @@ -580,9 +604,9 @@ public:
>>  function_summary (table, ggc) { }
>>  
>>/* Hook that is called by summary when a node is deleted.  */
>> -  virtual void insert (cgraph_node *, ipa_node_params *info);
>> +  virtual void insert (cgraph_node *, ipa_node_params *) {}
>>/* Hook that is called by summary when a node is deleted.  */
>> -  virtual void remove (cgraph_node *, ipa_node_params *info);
>> +  virtual void remove (cgraph_node *, ipa_node_params *) {}
> 
> ...that these could be just removed, no?

Yes. Changed in attached patch.

Martin

> 
> Thanks for looking into this,
> 
> Martin
> 

>From 0086e44dc0ca43737db6e94c1f29cad903d6b39f Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 2 Feb 2017 11:13:13 +0100
Subject: [PATCH] Fix memory leaks in IPA CP (PR ipa/79337).

gcc/ChangeLog:

2017-02-02  Martin Liska 

Re: [GCC][PATCH]{AArch64][Testsuite] Fix failing vector_initialization_nostack.c

2017-02-02 Thread James Greenhalgh
On Fri, Jan 27, 2017 at 10:55:34AM +, Ramana Radhakrishnan wrote:
> On Fri, Jan 27, 2017 at 10:30 AM, Tamar Christina
>  wrote:
> > Hi all,
> >
> > This fixes (PR78142) by only creating one vector in the char case.
> > r241590 is causing more registers to be used and so
> > the SP registered happens to be picked and used.
> >
> > This test I believe was checking explicitly that the
> > SP is not used if not needed. By creating a single vector then less
> > registers are needed so SP won't be used.
> 
> 
> 
> The test is written that way because our previuos vector
> initialization code involved constructing the initializer on the stack
> and then reloading it into the vector registers instead of
> constructing the vector initializer through a sequence of ins
> instructions which is what we changed the vector initialization code
> to. I think it helped hmmer (?) but memory is fading  :)

It seems to me that running with one vector still tests that behaviour,
but removes the risk of accidentally failing when some other element of
the compiler goes wrong and needs 32 general-purpose registers live to
acheieve the initialisation. If there's a real bug here that causes the extra
resource usage, then it would be nice to isolate it and have a PR opened.

However, this patch gets the test closer to testing a single compiler
behaviour, so this is OK for trunk.

Thanks for the fix.

James

> > gcc/testsuite/
> >
> > 2017-01-26  Tamar Christina  
> >
> > PR middle-end/78142
> > * gcc.target/aarch64/vector_initialization_nostack.c
> > (f12): Use one vector.


Re: [v3] Solaris baseline maintenance

2017-02-02 Thread Jonathan Wakely

On 02/02/17 11:09 +0100, Rainer Orth wrote:

This patch consists of two parts:

* The first just updates the Solaris libstdc++ baselines for GCC 7.

* However, there's a maintenance issue that has been bothering me for
 some time: on 32-bit x86, there are the CXXABI_FLOAT128 on top of
 what's in the sparc baseline.  So far, I've just used the 32-bit sparc
 baselines as the common ground and lived with the few added symbols on
 x86, but this means you always have to check if there's anything else
 that needs attention.

 So I've now decided to bite the bullet and separate the x86 and sparc
 baselines, so we now have i386-solaris2.1[01] and
 sparc-solaris2.1[01] instead of just solaris2.1[01].

Bootstrapped without regressions on i386-pc-solaris2.1[02] and
sparc-sun-solaris2.1[02].  Ok for mainline now or better wait a bit
closer to release?  I had to remove the recently-added-and-removed-again
basic_string::_M_copy_assign symbols from them baselines again.


Yes, sorry about that hiccup. There shouldn't be any new symbols
before the release, so doing this now seems fine. Thanks.



Re: [PATCH] Fix memory leaks in IPA CP (PR ipa/79337).

2017-02-02 Thread Martin Jambor
Hi,

On Thu, Feb 02, 2017 at 01:53:35PM +0100, Martin Liska wrote:
> Hello.
> 
> As mentioned in the PR, there is memory leak that is caused by fact, that 
> ipa_node_params_t
> does release memory just in ipa_node_params_t::remove. That's wrong because 
> the callback is called
> just when cgraph_removal_hook is triggered. Thus the proper implementation is 
> to move it do destructor.
> Similar should be done for construction (currently being in 
> ipa_node_params_t::insert).

Ah, that did not occur to me.  Still, I must say that destructors
called by the garbage collector are tough to wrap one's head around.
Or at least my head.

I can't approve it but I like the patch, with a little nit...

> 
> Apart from that, I noticed that when using GGC memory, the machinery 
> implicitly calls dtors for types
> that have __has_trivial_destructor == true. That's case of ipa_node_params_t, 
> but as symbol_summary already
> calls a dtor before ggc_free is called. Problem comes when hash_table marks a 
> memory slot as empty and GGC finalizer
> calls dtor for an invalid memory. Thus I believe not using such finalizer is 
> proper fix.
> 
> Last change fixes issue where ipa_free_all_node_params destroys a 
> symbol_summary (living in GGC memory) and dtor
> is called for second time via ggc release mechanism.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin

> From 08a3ecda47d9d52cd4bb6435aaf18b5b099ef683 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Thu, 2 Feb 2017 11:13:13 +0100
> Subject: [PATCH] Fix memory leaks in IPA CP (PR ipa/79337).
> 
> gcc/ChangeLog:
> 
> 2017-02-02  Martin Liska  
> 
>   PR ipa/79337
>   * ipa-prop.c (ipa_node_params_t::insert): Remove current
>   implementation.
>   (ipa_node_params_t::remove): Likewise.
>   * ipa-prop.h (ipa_node_params::ipa_node_params): Make default
>   initialization from ipa_node_params_t::insert.
>   (ipa_node_params::~ipa_node_params): Move from
>   ipa_node_params_t::release.
>   * symbol-summary.h (symbol_summary::m_released): New member.
>   Do not release a summary twice.  Do not allow to call finalizer
>   for types of a summary that live in GGC memory.
> ---
>  gcc/ipa-prop.c   | 32 
>  gcc/ipa-prop.h   | 28 ++--
>  gcc/symbol-summary.h | 27 ++-
>  3 files changed, 40 insertions(+), 47 deletions(-)
> 

...

> diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> index 93a2390c321..8c5ba25fcec 100644
> --- a/gcc/ipa-prop.h
> +++ b/gcc/ipa-prop.h
> @@ -320,6 +320,12 @@ struct GTY(()) ipa_param_descriptor
>  
>  struct GTY((for_user)) ipa_node_params
>  {
> +  /* Default constructor.  */
> +  ipa_node_params ();
> +
> +  /* Default destructor.  */
> +  ~ipa_node_params ();
> +
>/* Information about individual formal parameters that are gathered when
>   summaries are generated. */
>vec *descriptors;
> @@ -356,6 +362,24 @@ struct GTY((for_user)) ipa_node_params
>unsigned versionable : 1;
>  };
>  
> +inline
> +ipa_node_params::ipa_node_params ()
> +: descriptors (NULL), lattices (NULL), ipcp_orig_node (NULL),
> +  known_csts (vNULL), known_contexts (vNULL), analysis_done (0),
> +  node_enqueued (0), do_clone_for_all_contexts (0), is_all_contexts_clone 
> (0),
> +  node_dead (0), node_within_scc (0), node_calling_single_call (0),
> +  versionable (0)
> +{
> +}
> +
> +inline
> +ipa_node_params::~ipa_node_params ()
> +{
> +  free (lattices);
> +  known_csts.release ();
> +  known_contexts.release ();
> +}
> +
>  /* Intermediate information that we get from alias analysis about a 
> particular
> parameter in a particular basic_block.  When a parameter or the memory it
> references is marked modified, we use that information in all dominated
> @@ -580,9 +604,9 @@ public:
>  function_summary (table, ggc) { }
>  
>/* Hook that is called by summary when a node is deleted.  */
> -  virtual void insert (cgraph_node *, ipa_node_params *info);
> +  virtual void insert (cgraph_node *, ipa_node_params *) {}
>/* Hook that is called by summary when a node is deleted.  */
> -  virtual void remove (cgraph_node *, ipa_node_params *info);
> +  virtual void remove (cgraph_node *, ipa_node_params *) {}

...that these could be just removed, no?

Thanks for looking into this,

Martin



Re: [RFC] Bug lto/78140

2017-02-02 Thread Martin Jambor
Hi,

I am sorry, I am apparently not really able to follow all email this
week and am mostly skimming through this thread too, but...

On Thu, Feb 02, 2017 at 01:48:26PM +0100, Jan Hubicka wrote:
> > 
> > 2017-02-02  Kugan Vivekanandarajah  
> > 
> > * ipa-cp.c (ipcp_store_bits_results): Construct bits vector.
> > (ipcp_store_vr_results): Constrict m_vr vector.
> > * ipa-prop.c (ipa_node_params_t::remove): Set transaction summary to
> > null.
> > (ipa_node_params_t::duplicate): Construct bits and m_vr vector.
> > (read_ipcp_transformation_info): Likewise.
> > 
> > 
> > Thanks,
> > Kugan
> > 
> > >>I tried similar think without variable structure like:
> > >>diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> > >>index 93a2390c..b0cc832 100644
> > >>--- a/gcc/ipa-prop.h
> > >>+++ b/gcc/ipa-prop.h
> > >>@@ -525,7 +525,7 @@ struct GTY(()) ipcp_transformation_summary
> > >>   /* Known bits information.  */
> > >>   vec *bits;
> > >>   /* Value range information.  */
> > >>-  vec *m_vr;
> > >>+  vec *m_vr;
> > >> };
> > >>
> > >>This also has the same issue so I don't think it has anything to do with
> > >>variable structure.
> > >
> > >You have to debug that detail yourself but I wonder why the transformation
> > >summary has a different representation than the jump function (and I think
> > >the jump function size is the issue).
> > >
> > >The JF has
> > >
> > >  /* Information about zero/non-zero bits.  */
> > >  struct ipa_bits bits;
> > >
> > >  /* Information about value range, containing valid data only when 
> > > vr_known is
> > > true.  */
> > >  value_range m_vr;
> > >  bool vr_known;
> > >
> > >with ipa_bits having two widest_ints and value_range having two trees
> > >and an unused bitmap and ipa_vr having two wide_ints (widest_ints are
> > >smaller!).
> > >
> > >Richard.
> > >
> > >>
> > >>Thanks,
> > >>Kugan
> 
> > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> > index aa3c997..5103555 100644
> > --- a/gcc/ipa-cp.c
> > +++ b/gcc/ipa-cp.c
> > @@ -4865,6 +4865,8 @@ ipcp_store_bits_results (void)
> >  
> >ipcp_grow_transformations_if_necessary ();
> >ipcp_transformation_summary *ts = ipcp_get_transformation_summary 
> > (node);
> > +  if (!ts->bits)
> > +   ts->bits = (new (ggc_cleared_alloc ()) ipa_bits_vec ());
> >vec_safe_reserve_exact (ts->bits, count);
> >  
> >for (unsigned i = 0; i < count; i++)
> > @@ -4940,6 +4942,8 @@ ipcp_store_vr_results (void)
> >  
> >ipcp_grow_transformations_if_necessary ();
> >ipcp_transformation_summary *ts = ipcp_get_transformation_summary 
> > (node);
> > +  if (!ts->m_vr)
> > +   ts->m_vr = new (ggc_cleared_alloc ()) ipa_vr_vec ();
> >vec_safe_reserve_exact (ts->m_vr, count);
> >  
> >for (unsigned i = 0; i < count; i++)
> > diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> > index 834c27d..b992a7f 100644
> > --- a/gcc/ipa-prop.c
> > +++ b/gcc/ipa-prop.c
> > @@ -3759,13 +3759,20 @@ ipa_node_params_t::insert (cgraph_node *, 
> > ipa_node_params *info)
> > to.  */
> >  
> >  void
> > -ipa_node_params_t::remove (cgraph_node *, ipa_node_params *info)
> > +ipa_node_params_t::remove (cgraph_node *node, ipa_node_params *info)
> >  {
> >free (info->lattices);
> >/* Lattice values and their sources are deallocated with their alocation
> >   pool.  */
> >info->known_csts.release ();
> >info->known_contexts.release ();
> > +  ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
> > +  if (ts != NULL)

why des this need to be conditional? ipcp_get_transformation_summary
also lives in garbage collector so it should be able to hold any
necessary references properly.

> > +{
> > +  ts->agg_values = NULL;
> > +  ts->bits = NULL;
> > +  ts->m_vr = NULL;
> > +}
> 
> I would go for explicit ggc_free for them: garbage collrector is hardly ever 
> run during
> WPA stage and thus we are not going to get the memory back otherwise.

ggc_freeing might make a difference but I fail to see how the above
can, unless ipa_node_params_t still holds a reference to the info,
which it is about to drop.  Moreover...

> 
> Patch is OK with that change.

I believe this patch conflicts with Martin's fix for 79337 which might
deal with parts of what Kugan wants to achieve better so it may be
better to re-base the patch?

Thanks,

Martin


Re: [PATCH/AARCH64] Add scheduler for Thunderx2t99

2017-02-02 Thread James Greenhalgh
On Thu, Feb 02, 2017 at 05:21:05AM +, Hurugalawadi, Naveen wrote:
> Hi James,
> 
> Thanks for reviewing the patch and comments.
> 
> >> I wonder whether the current modeling of:
> >> (define_insn_reservation "thunderx2t99_asimd_load4_elts" 6
> >> Actually benefits the schedule in a meaningful way, or if it just increases
> 
> Done. Removed the scheduler modeling for thunderx2t99_asimd_load*_mult and
> thunderx2t99_asimd_load*_elts for ld3/ld4 and st3/st4 which are rarely used.
> 
> The automaton size has come down drastically without that and hopefully
> should be okay.
> 
> Automaton `thunderx2t99'
>   184 NDFA states,838 NDFA arcs
>   184 DFA states, 838 DFA arcs
>   184 minimal DFA states, 838 minimal DFA arcs
>   360 all insns  8 insn equivalence classes
> 0 locked states
>  1016 transition comb vector els,  1472 trans table els: use simple vect
>  1472 min delay table els, compression factor 4
> 
> Automaton `thunderx2t99_advsimd'
>   453 NDFA states,   1966 NDFA arcs
>   453 DFA states,1966 DFA arcs
>   351 minimal DFA states,1562 minimal DFA arcs
>   360 all insns  7 insn equivalence classes
> 0 locked states
>  1901 transition comb vector els,  2457 trans table els: use simple vect
>  2457 min delay table els, compression factor 2
> 
> Automaton `thunderx2t99_ldst'
>41 NDFA states,163 NDFA arcs
>41 DFA states, 163 DFA arcs
>14 minimal DFA states,  78 minimal DFA arcs
>   360 all insns  8 insn equivalence classes
> 0 locked states
>83 transition comb vector els,   112 trans table els: use simple vect
>   112 min delay table els, compression factor 4
> 
> Automaton `thunderx2t99_mult'
> 2 NDFA states,  5 NDFA arcs
> 2 DFA states,   5 DFA arcs
> 2 minimal DFA states,   5 minimal DFA arcs
>   360 all insns  3 insn equivalence classes
> 0 locked states
> 6 transition comb vector els, 6 trans table els: use simple vect
> 6 min delay table els, compression factor 8
> 
> 
> >> You'll want to update this to use your new scheduling model :-).
> 
> Done. I had overlooked it :-).
> 
> >> you should be changing vulcan to use the new thunderx2t99 model. 
> 
> Done. Using the new thunderx2t99 model.

That looks much better.

I'm assuming you've tested this as appropriate for the subtargets you're
modifying and are comfortable with the level of risk taking the patch at
this stage. As it only changes behaviour for the thunderx2t99 and vulcan
targets, I'd be happy to take the patch now. Though please give
Richard/Marcus 24 hours to object.

OK if no objections from others in the next 24 hours.

Thanks,
James

> Please review the modified patch and let us know your comments on the same.
> 
> Thanks,
> Naveen

> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index a7a4b33..1b958e3 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -74,8 +74,8 @@ AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  
> AARCH64_FL_FOR_ARCH8, xge
>  /* V8.1 Architecture Processors.  */
>  
>  /* Broadcom ('B') cores. */
> -AARCH64_CORE("thunderx2t99",  thunderx2t99, cortexa57, 8_1A,  
> AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
> -AARCH64_CORE("vulcan",  vulcan, cortexa57, 8_1A,  AARCH64_FL_FOR_ARCH8_1 | 
> AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
> +AARCH64_CORE("thunderx2t99",  thunderx2t99, thunderx2t99, 8_1A,  
> AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
> +AARCH64_CORE("vulcan",  vulcan, thunderx2t99, 8_1A,  AARCH64_FL_FOR_ARCH8_1 
> | AARCH64_FL_CRYPTO, thunderx2t99, 0x42, 0x516, -1)
>  
>  /* V8 big.LITTLE implementations.  */
>  
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index a693a3b..7550c3e 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -225,6 +225,7 @@
>  (include "../arm/exynos-m1.md")
>  (include "thunderx.md")
>  (include "../arm/xgene1.md")
> +(include "thunderx2t99.md")
>  
>  ;; ---
>  ;; Jumps and other miscellaneous insns
> diff --git a/gcc/config/aarch64/thunderx2t99.md 
> b/gcc/config/aarch64/thunderx2t99.md
> new file mode 100644
> index 000..0dd7199
> --- /dev/null
> +++ b/gcc/config/aarch64/thunderx2t99.md
> @@ -0,0 +1,443 @@
> +;; Cavium ThunderX 2 CN99xx pipeline description
> +;; Copyright (C) 2016-2017 Free Software Foundation, Inc.
> +;;
> +;; Contributed by Cavium, Broadcom and Mentor Embedded.
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify
> +;; it under 

Re: [Patch AArch64] Use 128-bit vectors when autovectorizing 16-bit float types

2017-02-02 Thread James Greenhalgh
On Mon, Jan 23, 2017 at 11:23:48AM +, James Greenhalgh wrote:
> 
> Hi,
> 
> As subject, we have an oversight in aarch64_simd_container_mode for
> HFmode inputs. This results in trunk only autovectorizing to a 64-bit vector,
> rather than a full 128-bit vector.
> 
> The fix is obvious, we just need to handle HFmode, and return an
> appropriate vector mode.
> 
> Tested on aarch64-none-elf with no issues. This patch looks low risk
> for this development stage to me, though it fixes an oversight rather
> than a regression.

*Ping*

Thanks,
James

> gcc/
> 
> 2017-01-23  James Greenhalgh  
> 
>   * config/aarch64/aarch64.c (aarch64_simd_container_mode): Handle
>   HFmode.
> 
> gcc/testsuite/
> 
> 2017-01-23  James Greenhalgh  
> 
>   * gcc.target/aarch64/vect_fp16_1.c: New.
> 

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 0cf7d12..7efc1f2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10777,6 +10777,8 @@ aarch64_simd_container_mode (machine_mode mode, 
> unsigned width)
>   return V2DFmode;
> case SFmode:
>   return V4SFmode;
> +   case HFmode:
> + return V8HFmode;
> case SImode:
>   return V4SImode;
> case HImode:
> @@ -10793,6 +10795,8 @@ aarch64_simd_container_mode (machine_mode mode, 
> unsigned width)
> {
> case SFmode:
>   return V2SFmode;
> +   case HFmode:
> + return V4HFmode;
> case SImode:
>   return V2SImode;
> case HImode:
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect_fp16_1.c 
> b/gcc/testsuite/gcc.target/aarch64/vect_fp16_1.c
> new file mode 100644
> index 000..da0cd81
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect_fp16_1.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fno-vect-cost-model" } */
> +
> +/* Check that we vectorize to a full 128-bit vector for _Float16 and __fp16
> +   types.  */
> +
> +/* Enable ARMv8.2-A+fp16 so we have access to the vector instructions.  */
> +#pragma GCC target ("arch=armv8.2-a+fp16")
> +
> +_Float16
> +sum_Float16 (_Float16 *__restrict__ __attribute__ ((__aligned__ (16))) a,
> +  _Float16 *__restrict__ __attribute__ ((__aligned__ (16))) b,
> +  _Float16 *__restrict__ __attribute__ ((__aligned__ (16))) c)
> +{
> +  for (int i = 0; i < 256; i++)
> +a[i] = b[i] + c[i];
> +}
> +
> +_Float16
> +sum_fp16 (__fp16 *__restrict__ __attribute__ ((__aligned__ (16))) a,
> +   __fp16 *__restrict__ __attribute__ ((__aligned__ (16))) b,
> +   __fp16 *__restrict__ __attribute__ ((__aligned__ (16))) c)
> +{
> +  for (int i = 0; i < 256; i++)
> +a[i] = b[i] + c[i];
> +}
> +
> +/* Two FADD operations on "8h" data widths, one from sum_Float16, one from
> +   sum_fp16.  */
> +/* { dg-final { scan-assembler-times "fadd\tv\[0-9\]\+.8h" 2 } } */



Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-02 Thread Torvald Riegel
On Thu, 2017-02-02 at 14:48 +, Ramana Radhakrishnan wrote:
> On 30/01/17 18:54, Torvald Riegel wrote:
> > This patch fixes the __atomic builtins to not implement supposedly
> > lock-free atomic loads based on just a compare-and-swap operation.
> >
> > If there is no hardware-backed atomic load for a certain memory
> > location, the current implementation can implement the load with a CAS
> > while claiming that the access is lock-free.  This is a bug in the cases
> > of volatile atomic loads and atomic loads to read-only-mapped memory; it
> > also creates a lot of contention in case of concurrent atomic loads,
> > which results in at least counter-intuitive performance because most
> > users probably understand "lock-free" to mean hardware-backed (and thus
> > "fast") instead of just in the progress-criteria sense.
> >
> > This patch implements option 3b of the choices described here:
> > https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html
> 
> 
> Will Deacon pointed me at this thread asking if something similar could 
> be done on ARM.

It would be nice if someone more familiar with ARM could double-check
that ARM is not affected.  I guess ARM isn't, but that's based on me
looking at machine descriptions, which I hadn't ever done before working
on this patch...

The patch introduces a can_atomic_load_p, which checks that an entry
exists in optabs and the machine descriptions for an atomic load of the
particular mode.

IIRC, arm and aarch64 had atomic load patterns defined whenever they had
a CAS defined.  It would be nice if you could double-check that.  If
that's the case, nothing should change with my patch because
can_atomic_load_p would always return true if a CAS could be issued.
If that's not the case, arm/aarch64 could be in the same boat as x86_64
with cmpxchg16b; then, using Option 3b might be a useful (intermediary)
solution for ARM as well (OTOH, compatibility with existing code and
__sync builtins might perhaps not be as much of a factor as it might be
on x86_64).

The atomic load patterns you have could still be wrong, for example by
generating a LDXP/STXP loop or something else that can store on an
atomic load.  In that case, the problem is similar as not having a
custom load pattern, and the second case in the previous paragraph
applies.

> On armv8-a we can implement an atomic load of 16 bytes using an LDXP / 
> STXP loop as a 16 byte load isnt' single copy atomic. On armv8.1-a we do 
> have a CAS on 16 bytes.
> 
> This was discussed last year here.
> 
> https://gcc.gnu.org/ml/gcc/2016-06/msg00017.html
> 
> and the consensus seemed to be that we couldn't really do a CAS on 
> readonly data as we would have no way of avoiding a trap.

Yes, and I agree that not doing a CAS (or anything that can store on
atomic load) is the right thing to do.

BTW, on the ARM targets, is it possible that the __sync builtins would
use LDXP/STXP on 16 byte and the __atomic loads would fall back to using
libatomic and locks?  Or do you simply not provide 16-byte __sync
builtins?

> I'm sure I'm missing something piece of the puzzle, so I'd be interested 
> in how you avoid this in this implementation on x86_64 with what I can 
> see is a CAS.

If we'd start from scratch, we wouldn't use cmpxchg16b if we don't have
a 16-byte atomic load.  However, we did do that, and there might be code
out there that does have inlined cmpxchg16b.  As discussed in the thread
you cited, changing GCC 7 back to libatomics *and* the use of locks
would be an effective ABI break.  Therefore, my patch (and Option 3b)
makes a compromise and delegates to libatomic but libatomic gets custom
code to actually keep using cmpxchg16b.  That means that the actual
synchronization doesn't change, but we constrain this problem to
libatomic and prevent application code generated with GCC 7 from being
directly affected.

Does this clarify the x86_64 situation?



Re: [PATCH][wwwdocs] Mention -march=armv8.3-a -msign-return-address= for GCC 7

2017-02-02 Thread Gerald Pfeifer

On Thu, 2 Feb 2017, Jiong Wang wrote:

Please review, thanks.


Looks good to me, thank you!

Gerald


Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-02 Thread Ramana Radhakrishnan

On 02/02/17 14:52, Jakub Jelinek wrote:

On Thu, Feb 02, 2017 at 02:48:42PM +, Ramana Radhakrishnan wrote:

On 30/01/17 18:54, Torvald Riegel wrote:

This patch fixes the __atomic builtins to not implement supposedly
lock-free atomic loads based on just a compare-and-swap operation.

If there is no hardware-backed atomic load for a certain memory
location, the current implementation can implement the load with a CAS
while claiming that the access is lock-free.  This is a bug in the cases
of volatile atomic loads and atomic loads to read-only-mapped memory; it
also creates a lot of contention in case of concurrent atomic loads,
which results in at least counter-intuitive performance because most
users probably understand "lock-free" to mean hardware-backed (and thus
"fast") instead of just in the progress-criteria sense.

This patch implements option 3b of the choices described here:
https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html



Will Deacon pointed me at this thread asking if something similar could be
done on ARM.

On armv8-a we can implement an atomic load of 16 bytes using an LDXP / STXP
loop as a 16 byte load isnt' single copy atomic. On armv8.1-a we do have a
CAS on 16 bytes.


If the AArch64 ISA guarantees LDXP is atomic, then yes, you can do that.
The problem we have on x86_64 is that I think neither Intel nor AMD gave us
guarantees that aligned SSE or AVX loads are guaranteed to be atomic.



LDXP is not single copy atomic.

so this would become something like the following with appropriate 
additional barriers.


You need to write this as a loop of

.retry
LDXP x0, x1 , [x2]
STXP X3, X0, X1, [x2]
CBNZ X3, .retry

You have to do the write again to guarantee atomicity on AArch64.

I missed the first paragraph in this thread on gcc@.

https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html



I consider this a bug because it can
result in a store being issued (e.g., when loading from a read-only
page, or when trying to do a volatile atomic load), and because it can
increase contention (which makes atomic loads perform much different
than HW load instructions would).  See the thread "GCC libatomic ABI
specification draft" for more background.


Ok, this means implementing such a change in libatomic for AArch64 will 
introduce the bug that Torvald is worried about for x86_64 .



regards
Ramana








Jakub





Re: [gomp4] optimize GOMP_MAP_TO_PSET

2017-02-02 Thread Cesar Philippidis
On 01/30/2017 02:26 AM, Thomas Schwinge wrote:

> ... also there is some bug somewhere; I see:
> 
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O0  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O0  
> execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O1  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O1  
> execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O2  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O2  
> execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O3 
> -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  
> execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -O3 -g  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -O3 -g 
>  execution test
> PASS: libgomp.fortran/examples-4/async_target-2.f90   -Os  (test for 
> excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/examples-4/async_target-2.f90   -Os  
> execution test
> 
> ..., and:
> 
> PASS: libgomp.fortran/target3.f90   -O0  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O0  execution test
> PASS: libgomp.fortran/target3.f90   -O1  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O1  execution test
> PASS: libgomp.fortran/target3.f90   -O2  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O2  execution test
> PASS: libgomp.fortran/target3.f90   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess 
> errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> PASS: libgomp.fortran/target3.f90   -O3 -g  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -O3 -g  execution test
> PASS: libgomp.fortran/target3.f90   -Os  (test for excess errors)
> [-PASS:-]{+FAIL:+} libgomp.fortran/target3.f90   -Os  execution test
> 
> In all cases, the run-time error message is:
> 
> libgomp: Pointer target of array section wasn't mapped

That problem occurred because I wasn't handling NULL pointers
gomp_map_pset. Basically, that can occur when you have an nullified
pointer to an array. I taught libgomp to handle that situation like
gomp_map_pointer, i.e., just propagate the NULL pointer to the accelerator.

This patch also fixes a bug with pointers to reference types. The
approach I took in this patch is to be conservative by demoting
GOMP_MAP_FIRSTPRIVATE_POINTER to GOMP_MAP_POINTER for such types.

I've applied this patch to gomp-4_0-branch.

Cesar


Re: [PATCH v3][PR lto/79061] Fix LTO plus ASAN fails with "AddressSanitizer: initialization-order-fiasco".

2017-02-02 Thread Maxim Ostapenko

Hi,

PR lto/79061 actually affects gcc-{5, 6}-branch too
Is it OK to apply following patch on branches?

-Maxim
gcc/ChangeLog:

2017-02-02  Maxim Ostapenko  

	PR lto/79061
	* asan.c (asan_add_global): Force has_dynamic_init to zero in LTO mode.

diff --git a/gcc/asan.c b/gcc/asan.c
index 398a508..0f55dc0 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -2275,7 +2275,11 @@ asan_add_global (tree decl, tree type, vec *v)
   CONSTRUCTOR_APPEND_ELT (vinner, NULL_TREE,
 			  fold_convert (const_ptr_type_node, module_name_cst));
   varpool_node *vnode = varpool_node::get (decl);
-  int has_dynamic_init = vnode ? vnode->dynamically_initialized : 0;
+  int has_dynamic_init = 0;
+  /* FIXME: Enable initialization order fiasco detection in LTO mode once
+ proper fix for PR 79061 will be applied.  */
+  if (!in_lto_p)
+has_dynamic_init = vnode ? vnode->dynamically_initialized : 0;
   CONSTRUCTOR_APPEND_ELT (vinner, NULL_TREE,
 			  build_int_cst (uptr, has_dynamic_init));
   tree locptr = NULL_TREE;


Re: [PATCH] IPA: enhance dump output

2017-02-02 Thread Jan Hubicka
> 2017-01-24  Martin Liska  
> 
>   * cgraph.c (cgraph_node::dump): Dump function version info.
>   * symtab.c (symtab_node::dump_base): Add missing new line.
> ---
>  gcc/cgraph.c | 10 ++
>  gcc/symtab.c |  1 +
>  2 files changed, 11 insertions(+)
> 
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index ef2dc50282c..74839f7d993 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -2066,6 +2066,16 @@ cgraph_node::dump (FILE *f)
>  fprintf (f, "  Profile id: %i\n",
>profile_id);
>fprintf (f, "  First run: %i\n", tp_first_run);
> +  cgraph_function_version_info *vi = function_version ();
> +  if (vi != NULL)
> +{
> +  /* Iterate to first item in the chain.  */
> +  while (vi->prev != NULL)
> + vi = vi->prev;
> +  fprintf (f, "  Version info: ");
> +  dump_addr (f, "@", (void *)vi);
> +  fprintf (f, "\n");

I suppose it is useful to know that version info is attached, but instead of
dumping an address, i would rather meaningfully print its contents (i.e.
dispatcher and prev/next pointers in list).

OK with that change.

honza
> +}
>fprintf (f, "  Function flags:");
>if (count)
>  fprintf (f, " executed %" PRId64"x",
> diff --git a/gcc/symtab.c b/gcc/symtab.c
> index 87febdc212f..0078896c8a8 100644
> --- a/gcc/symtab.c
> +++ b/gcc/symtab.c
> @@ -890,6 +890,7 @@ symtab_node::dump_base (FILE *f)
>  {
>fprintf (f, "  Aux:");
>dump_addr (f, " @", (void *)aux);
> +  fprintf (f, "\n");
>  }
>  
>fprintf (f, "  References: ");
> -- 
> 2.11.0
> 



Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-02 Thread Jakub Jelinek
On Thu, Feb 02, 2017 at 02:48:42PM +, Ramana Radhakrishnan wrote:
> On 30/01/17 18:54, Torvald Riegel wrote:
> > This patch fixes the __atomic builtins to not implement supposedly
> > lock-free atomic loads based on just a compare-and-swap operation.
> > 
> > If there is no hardware-backed atomic load for a certain memory
> > location, the current implementation can implement the load with a CAS
> > while claiming that the access is lock-free.  This is a bug in the cases
> > of volatile atomic loads and atomic loads to read-only-mapped memory; it
> > also creates a lot of contention in case of concurrent atomic loads,
> > which results in at least counter-intuitive performance because most
> > users probably understand "lock-free" to mean hardware-backed (and thus
> > "fast") instead of just in the progress-criteria sense.
> > 
> > This patch implements option 3b of the choices described here:
> > https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html
> 
> 
> Will Deacon pointed me at this thread asking if something similar could be
> done on ARM.
> 
> On armv8-a we can implement an atomic load of 16 bytes using an LDXP / STXP
> loop as a 16 byte load isnt' single copy atomic. On armv8.1-a we do have a
> CAS on 16 bytes.

If the AArch64 ISA guarantees LDXP is atomic, then yes, you can do that.
The problem we have on x86_64 is that I think neither Intel nor AMD gave us
guarantees that aligned SSE or AVX loads are guaranteed to be atomic.

Jakub


Re: [PATCH][wwwdocs] Mention -march=armv8.3-a -msign-return-address= for GCC 7

2017-02-02 Thread Jiong Wang

On 02/02/17 13:31, Gerald Pfeifer wrote:

On Thu, 2 Feb 2017, Jiong Wang wrote:

This patch adds a short entry for the -march=armv8.3-a and
-msign-return-address= options in GCC 7 to the "AArch64" section.


Thanks, Jiong.

Index: gcc-7/changes.html
===
 
+   The ARMv8.3-A architecture is now supported.  It can be used by
+   specifying the -march=armv8.3-a option.
+
+   The option -msign-return-address= is supported to enable
+   return address protection using ARMv8.3-A Pointer Authentication
+   Extensions.  Please refer to the documentation for more information on
+   the arguments accepted by this option.
+ 

Would it make sense to make this two different items?  The way it
is currently marked up, the blank line will be "gone" once rendered.


OK, seperated them into two different items.



Where you "refer to the documentation", what kind of documentation
is that?  ARM reference manuals, GCC's documentation,...?  Being a
bit more explicit here and/or using a link would be good.


It's GCC user manual, have added the link in the updated patch.

Please review, thanks.


Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.54
diff -u -r1.54 changes.html
--- htdocs/gcc-7/changes.html	1 Feb 2017 19:23:00 -	1.54
+++ htdocs/gcc-7/changes.html	2 Feb 2017 14:34:49 -
@@ -711,6 +711,18 @@
 AArch64

  
+   The ARMv8.3-A architecture is now supported.  It can be used by
+   specifying the -march=armv8.3-a option.
+ 
+ 
+   The option -msign-return-address= is supported to enable
+   return address protection using ARMv8.3-A Pointer Authentication
+   Extensions.  For more information on the arguments accepted by this
+   option, please refer to
+	https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html#AArch64-Options;>
+	AArch64-Options.
+ 
+ 
The ARMv8.2-A architecture and the ARMv8.2-A 16-bit Floating-Point
Extensions are now supported.  They can be used by specifying the
-march=armv8.2-a or -march=armv8.2-a+fp16


Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-02 Thread Ramana Radhakrishnan

On 30/01/17 18:54, Torvald Riegel wrote:

This patch fixes the __atomic builtins to not implement supposedly
lock-free atomic loads based on just a compare-and-swap operation.

If there is no hardware-backed atomic load for a certain memory
location, the current implementation can implement the load with a CAS
while claiming that the access is lock-free.  This is a bug in the cases
of volatile atomic loads and atomic loads to read-only-mapped memory; it
also creates a lot of contention in case of concurrent atomic loads,
which results in at least counter-intuitive performance because most
users probably understand "lock-free" to mean hardware-backed (and thus
"fast") instead of just in the progress-criteria sense.

This patch implements option 3b of the choices described here:
https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html



Will Deacon pointed me at this thread asking if something similar could 
be done on ARM.


On armv8-a we can implement an atomic load of 16 bytes using an LDXP / 
STXP loop as a 16 byte load isnt' single copy atomic. On armv8.1-a we do 
have a CAS on 16 bytes.


This was discussed last year here.

https://gcc.gnu.org/ml/gcc/2016-06/msg00017.html

and the consensus seemed to be that we couldn't really do a CAS on 
readonly data as we would have no way of avoiding a trap.


I'm sure I'm missing something piece of the puzzle, so I'd be interested 
in how you avoid this in this implementation on x86_64 with what I can 
see is a CAS.


regards
Ramana




In a nutshell, this does change affected accesses to call libatomic
instead of inlining the CAS-based load emulation -- but libatomic will
continue to do the CAS-based approach.  Thus, there's no change in how
the changes are actually performed (including compatibility with the
__sync builtins, which are not changed) -- but we do make it much easier
to fix this in the future, and we do ensure that less future code will
have the problematic code inlined into it (and thus unfixable).

Second, the return of __atomic_always_lock_free and
__atomic_is_lock_free are changed to report false for the affected
accesses.  This aims at making users aware of the fact that they don't
get fast hardware-backed performance for the affected accesses.

This patch does not yet change how OpenMP atomics support is
implemented.  Jakub said he would take care of that.  I suppose one
approach would be to check can_atomic_load_p (introduced by this patch)
to decide in expand_omp_atomic whether to just use the mutex-based
approach; I think that conservatively, OpenMP atomics have to assume
that there could be an atomic load to a particular memory location
elsewhere in the program, so the choice between mutex-backed or not has
to be consistent for a particular memory location.

Thanks to Richard Henderson for providing the libatomic part of this
patch, and thanks to Andrew MacLeod for helping with the compiler parts.

I've tested this on an x86_64-linux bootstrap build and see no
regressions.  (With the exception of two OpenMP failures, which Jakub
will take care of.  The other failures I saw didn't seem atomics related
(eg, openacc); I haven't compared it against a full bootstrap build and
make check of trunk.).

AFAICT, in practice only x86 with -mcx16 (or where this is implicitly
implied) is affected.  The other architectures I looked at seemed to
have patterns for true hardware-backed atomic loads whenever they also
had a wider-than-word-sized CAS available.

I know we missed the stage3 deadline with this, unfortunately.  I think
it would be good to have this fix be part of GCC 7 though, because this
would ensure that less future code has the problematic load-via-CAS code
inlined.

Jakub: If this is OK for GCC 7, can you please take care of the OpenMP
bits and commit this?  Changelog entries are in the commit message.

If others could test on other hardware, this would also be appreciated.





Re: [PATCH][ARM] Remove movdi_vfp_cortexa8

2017-02-02 Thread Wilco Dijkstra

ping



From: Wilco Dijkstra
Sent: 29 November 2016 11:05
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Remove movdi_vfp_cortexa8
    
Merge the movdi_vfp_cortexa8 pattern into movdi_vfp and remove it to avoid
unnecessary duplication and repeating bugs like PR78439 due to changes being
applied only to one of the duplicates.

Bootstrap OK for ARM and Thumb-2 gnueabihf targets. OK for commit?

ChangeLog:
2016-11-29  Wilco Dijkstra  

    * config/arm/vfp.md (movdi_vfp): Merge changes from movdi_vfp_cortexa8.
    * (movdi_vfp_cortexa8): Remove pattern.
--

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 
2051f1018f1cbff9c5bf044e71304d78e615458e..a917aa625a7b15f6c9e2b549ab22e5219bb9b99c
 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -304,9 +304,9 @@
 ;; DImode moves
 
 (define_insn "*movdi_vfp"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" "=r,r,r,r,q,q,m,w,r,w,w, 
Uv")
+  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
"=r,r,r,r,q,q,m,w,!r,w,w, Uv")
    (match_operand:DI 1 "di_operand"  
"r,rDa,Db,Dc,mi,mi,q,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune != TARGET_CPU_cortexa8
+  "TARGET_32BIT && TARGET_HARD_FLOAT
    && (   register_operand (operands[0], DImode)
    || register_operand (operands[1], DImode))
    && !(TARGET_NEON && CONST_INT_P (operands[1])
@@ -339,71 +339,25 @@
 }
   "
   [(set_attr "type" 
"multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1,4,5,6") (const_int 8)
+   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
   (eq_attr "alternative" "2") (const_int 12)
   (eq_attr "alternative" "3") (const_int 16)
+ (eq_attr "alternative" "4,5,6")
+  (symbol_ref "arm_count_output_move_double_insns 
(operands) * 4")
   (eq_attr "alternative" "9")
    (if_then_else
  (match_test "TARGET_VFP_SINGLE")
  (const_int 8)
  (const_int 4))]
   (const_int 4)))
+   (set_attr "predicable"    "yes")
    (set_attr "arm_pool_range" "*,*,*,*,1020,4096,*,*,*,*,1020,*")
    (set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
    (set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
+   (set (attr "ce_count") (symbol_ref "get_attr_length (insn) / 4"))
    (set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,any")]
 )
 
-(define_insn "*movdi_vfp_cortexa8"
-  [(set (match_operand:DI 0 "nonimmediate_di_operand" 
"=r,r,r,r,r,r,m,w,!r,w,w, Uv")
-   (match_operand:DI 1 "di_operand"  
"r,rDa,Db,Dc,mi,mi,r,r,w,w,Uvi,w"))]
-  "TARGET_32BIT && TARGET_HARD_FLOAT && arm_tune == TARGET_CPU_cortexa8
-    && (   register_operand (operands[0], DImode)
-    || register_operand (operands[1], DImode))
-    && !(TARGET_NEON && CONST_INT_P (operands[1])
-    && neon_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
-  "*
-  switch (which_alternative)
-    {
-    case 0: 
-    case 1:
-    case 2:
-    case 3:
-  return \"#\";
-    case 4:
-    case 5:
-    case 6:
-  return output_move_double (operands, true, NULL);
-    case 7:
-  return \"vmov%?\\t%P0, %Q1, %R1\\t%@ int\";
-    case 8:
-  return \"vmov%?\\t%Q0, %R0, %P1\\t%@ int\";
-    case 9:
-  return \"vmov%?.f64\\t%P0, %P1\\t%@ int\";
-    case 10: case 11:
-  return output_move_vfp (operands);
-    default:
-  gcc_unreachable ();
-    }
-  "
-  [(set_attr "type" 
"multiple,multiple,multiple,multiple,load2,load2,store2,f_mcrr,f_mrrc,ffarithd,f_loadd,f_stored")
-   (set (attr "length") (cond [(eq_attr "alternative" "1") (const_int 8)
-   (eq_attr "alternative" "2") (const_int 12)
-   (eq_attr "alternative" "3") (const_int 16)
-   (eq_attr "alternative" "4,5,6") 
-  (symbol_ref 
-   "arm_count_output_move_double_insns (operands) \
- * 4")]
-  (const_int 4)))
-   (set_attr "predicable"    "yes")
-   (set_attr "arm_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
-   (set_attr "thumb2_pool_range" "*,*,*,*,1018,4094,*,*,*,*,1018,*")
-   (set_attr "neg_pool_range" "*,*,*,*,1004,0,*,*,*,*,1004,*")
-   (set (attr "ce_count") 
-   (symbol_ref "get_attr_length (insn) / 4"))
-   (set_attr "arch"   "t2,any,any,any,a,t2,any,any,any,any,any,any")]
- )
-
 ;; HFmode moves
 
 (define_insn "*movhf_vfp_fp16"


Re: Enable jump threading on paths meeting hot paths

2017-02-02 Thread Jan Hubicka
> > + if (!contains_hot_bb && speed_p && j < path_length - 1)
> 
> j < path_length - 1 is already checked above?
> 
> Otherwise looks ok.  If it does fix the regression - does it?

Thanks, yes it fixes the regression.

Honza


Re: [PATCH][ARM] Fix ldrd offsets

2017-02-02 Thread Wilco Dijkstra

    
ping


From: Wilco Dijkstra
Sent: 03 November 2016 12:20
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Fix ldrd offsets
    
Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020,
without -255..4091.  This reduces the number of addressing instructions
when using DI mode operations (such as in PR77308).

Bootstrap & regress OK.

ChangeLog:
2015-11-03  Wilco Dijkstra  

    gcc/
    * config/arm/arm.c (arm_legitimate_index_p): Add comment.
    (thumb2_legitimate_index_p): Use correct range for DI/DF mode.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
3c4c7042d9c2101619722b5822b3d1ca37d637b9..5d12cf9c46c27d60a278d90584bde36ec86bb3fe
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7486,6 +7486,8 @@ arm_legitimate_index_p (machine_mode mode, rtx index, 
RTX_CODE outer,
 {
   HOST_WIDE_INT val = INTVAL (index);
 
+ /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+    If vldr is selected it uses arm_coproc_mem_operand.  */
   if (TARGET_LDRD)
 return val > -256 && val < 256;
   else
@@ -7613,11 +7615,13 @@ thumb2_legitimate_index_p (machine_mode mode, rtx 
index, int strict_p)
   if (code == CONST_INT)
 {
   HOST_WIDE_INT val = INTVAL (index);
- /* ??? Can we assume ldrd for thumb2?  */
- /* Thumb-2 ldrd only has reg+const addressing modes.  */
- /* ldrd supports offsets of +-1020.
-    However the ldr fallback does not.  */
- return val > -256 && val < 256 && (val & 3) == 0;
+ /* Thumb-2 ldrd only has reg+const addressing modes.
+    Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+    If vldr is selected it uses arm_coproc_mem_operand.  */
+ if (TARGET_LDRD)
+   return IN_RANGE (val, -1020, 1020) && (val & 3) == 0;
+ else
+   return IN_RANGE (val, -255, 4095 - 4);
 }
   else
 return 0;    

Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-02-02 Thread Wilco Dijkstra

    

ping


From: Wilco Dijkstra
Sent: 10 November 2016 17:19
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve max_insns_skipped logic
    
Improve the logic when setting max_insns_skipped.  Limit the maximum size of IT
to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are needed,
increasing codesize.  Given 4 works well for Thumb-2, use the same limit for ARM
for consistency. 

ChangeLog:
2016-11-04  Wilco Dijkstra  

    * config/arm/arm.c (arm_option_params_internal): Improve setting of
    max_insns_skipped.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
f046854e9665d54911616fc1c60fee407188f7d6..29e8d1d07d918fbb2a627a653510dfc8587ee01a
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2901,20 +2901,12 @@ arm_option_params_internal (void)
   targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
 }
 
-  if (optimize_size)
-    {
-  /* If optimizing for size, bump the number of instructions that we
- are prepared to conditionally execute (even on a StrongARM).  */
-  max_insns_skipped = 6;
+  /* Increase the number of conditional instructions with -Os.  */
+  max_insns_skipped = optimize_size ? 4 : current_tune->max_insns_skipped;
 
-  /* For THUMB2, we limit the conditional sequence to one IT block.  */
-  if (TARGET_THUMB2)
-    max_insns_skipped = arm_restrict_it ? 1 : 4;
-    }
-  else
-    /* When -mrestrict-it is in use tone down the if-conversion.  */
-    max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
-  ? 1 : current_tune->max_insns_skipped;
+  /* For THUMB2, we limit the conditional sequence to one IT block.  */
+  if (TARGET_THUMB2)
+    max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);
 }
 
 /* True if -mflip-thumb should next add an attribute for the default

    

Re: [PATCH v3][AArch64] Fix symbol offset limit

2017-02-02 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 17 January 2017 15:14
To: Richard Earnshaw; GCC Patches; James Greenhalgh
Cc: nd
Subject: Re: [PATCH v3][AArch64] Fix symbol offset limit
    
Here is v3 of the patch - tree_fits_uhwi_p was necessary to ensure the size of a
declaration is an integer. So the question is whether we should allow
largish offsets outside of the bounds of symbols (v1), no offsets (this 
version), or
small offsets (small negative and positive offsets just outside a symbol are 
common).
The only thing we can't allow is any offset like we currently do...

In aarch64_classify_symbol symbols are allowed full-range offsets on 
relocations.
This means the offset can use all of the +/-4GB offset, leaving no offset 
available
for the symbol itself.  This results in relocation overflow and link-time errors
for simple expressions like _char + 0xff00.

To avoid this, limit the offset to +/-1GB so that the symbol needs to be within 
a
3GB offset from its references.  For the tiny code model use a 64KB offset, 
allowing
most of the 1MB range for code/data between the symbol and its references.
For symbols with a defined size, limit the offset to be within the size of the 
symbol.


ChangeLog:
2017-01-17  Wilco Dijkstra  

    gcc/
    * config/aarch64/aarch64.c (aarch64_classify_symbol):
    Apply reasonable limit to symbol offsets.

    testsuite/
    * gcc.target/aarch64/symbol-range.c (foo): Set new limit.
    * gcc.target/aarch64/symbol-range-tiny.c (foo): Likewise.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
e8d65ead95a3c5730c2ffe64a9e057779819f7b4..f1d54e332dc1cf1ef0bc4b1e46b0ebebe1c4cea4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -9809,6 +9809,8 @@ aarch64_classify_symbol (rtx x, rtx offset)
   if (aarch64_tls_symbol_p (x))
 return aarch64_classify_tls_symbol (x);
 
+  const_tree decl = SYMBOL_REF_DECL (x);
+
   switch (aarch64_cmodel)
 {
 case AARCH64_CMODEL_TINY:
@@ -9817,25 +9819,45 @@ aarch64_classify_symbol (rtx x, rtx offset)
  we have no way of knowing the address of symbol at compile time
  so we can't accurately say if the distance between the PC and
  symbol + offset is outside the addressible range of +/-1M in the
-    TINY code model.  So we rely on images not being greater than
-    1M and cap the offset at 1M and anything beyond 1M will have to
-    be loaded using an alternative mechanism.  Furthermore if the
-    symbol is a weak reference to something that isn't known to
-    resolve to a symbol in this module, then force to memory.  */
+    TINY code model.  So we limit the maximum offset to +/-64KB and
+    assume the offset to the symbol is not larger than +/-(1M - 64KB).
+    Furthermore force to memory if the symbol is a weak reference to
+    something that doesn't resolve to a symbol in this module.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || INTVAL (offset) < -1048575 || INTVAL (offset) > 1048575)
+ || !IN_RANGE (INTVAL (offset), -0x1, 0x1))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_TINY_ABSOLUTE;
 
 case AARCH64_CMODEL_SMALL:
   /* Same reasoning as the tiny code model, but the offset cap here is
-    4G.  */
+    1G, allowing +/-3G for the offset to the symbol.  */
   if ((SYMBOL_REF_WEAK (x)
    && !aarch64_symbol_binds_local_p (x))
- || !IN_RANGE (INTVAL (offset), HOST_WIDE_INT_C (-4294967263),
-   HOST_WIDE_INT_C (4294967264)))
+ || !IN_RANGE (INTVAL (offset), -0x4000, 0x4000))
 return SYMBOL_FORCE_TO_MEM;
+
+ /* Limit offset to within the size of a declaration if available.  */
+ if (decl && DECL_P (decl))
+   {
+ const_tree decl_size = DECL_SIZE (decl);
+
+ if (tree_fits_uhwi_p (decl_size)
+ && !IN_RANGE (INTVAL (offset), 0, tree_to_uhwi (decl_size)))
+   return SYMBOL_FORCE_TO_MEM;
+   }
+
   return SYMBOL_SMALL_ABSOLUTE;
 
 case AARCH64_CMODEL_TINY_PIC:
diff --git a/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c 
b/gcc/testsuite/gcc.target/aarch64/symbol-range-tiny.c
index 
d7e46b059e41f2672b3a1da5506fa8944e752e01..d49ff4dbe5786ef6d343d2b90052c09676dd7fe5
 100644
--- 

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-02-02 Thread Wilco Dijkstra

ping

From: Wilco Dijkstra
Sent: 17 January 2017 18:00
To: GCC Patches
Cc: nd; Kyrylo Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove Thumb-2 iordi_not patterns
    
After Bernd's DImode patch [1] almost all DImode operations are expanded
early (except for -mfpu=neon). This means the Thumb-2 iordi_notdi_di
patterns are no longer used - the split ORR and NOT instructions are merged
into ORN by Combine.  With -mfpu=neon the iordi_notdi_di patterns are used
on Thumb-2, and after this patch the orndi3_neon pattern matches instead
(which still emits ORN).  After this there are no Thumb-2 specific DImode 
patterns.

[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/thumb2.md (iordi_notdi_di): Remove pattern.
    (iordi_notzesidi_di): Likewise.
    (iordi_notdi_zesidi): Likewise.
    (iordi_notsesidi_di): Likewise.

--

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 
2e7580f220eae1524fef69719b1796f50f5cf27c..91471d4650ecae4f4e87b549d84d11adf3014ad2
 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1434,103 +1434,6 @@
    (set_attr "type" "alu_sreg")]
 )
 
-; Constants for op 2 will never be given to these patterns.
-(define_insn_and_split "*iordi_notdi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 1 "s_register_operand" "0,r"))
-   (match_operand:DI 2 "s_register_operand" "r,0")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 1)) (match_dup 2)))
-   (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[5] = gen_highpart (SImode, operands[2]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notzesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (zero_extend:DI
-    (match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,?r")))]
-  "TARGET_THUMB2"
-  "#"
-  ; (not (zero_extend...)) means operand0 will always be 0x
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (const_int -1))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "4,8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notdi_zesidi"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
-   (zero_extend:DI
-    (match_operand:SI 1 "s_register_operand" "r,r"]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (not:SI (match_dup 4)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-    operands[4] = gen_highpart (SImode, operands[2]);
-    operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notsesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=,")
-   (ior:DI (not:DI (sign_extend:DI
-    (match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,r")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (ior:SI (not:SI
-   (ashiftrt:SI (match_dup 2) (const_int 31)))
-  (match_dup 4)))]
-  "
-  {
-    operands[3] = gen_highpart (SImode, operands[0]);
-    operands[0] = gen_lowpart (SImode, operands[0]);
-    operands[4] = gen_highpart (SImode, operands[1]);
-    operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
 (define_insn "*orsi_notsi_si"
   [(set (match_operand:SI 0 

Re: [PATCH][ARM] Remove DImode expansions for 1-bit shifts

2017-02-02 Thread Wilco Dijkstra

ping



From: Wilco Dijkstra
Sent: 17 January 2017 19:23
To: GCC Patches
Cc: nd; Kyrill Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove DImode expansions for 1-bit shifts
    
A left shift of 1 can always be done using an add, so slightly adjust rtx
cost for DImode left shift by 1 so that adddi3 is preferred in all cases,
and the arm_ashldi3_1bit is redundant.

DImode right shifts of 1 are rarely used (6 in total in the GCC binary),
so there is little benefit of the arm_ashrdi3_1bit and arm_lshrdi3_1bit
patterns.

Bootstrap OK on arm-linux-gnueabihf.

ChangeLog:
2017-01-17  Wilco Dijkstra  

    * config/arm/arm.md (ashldi3): Remove shift by 1 expansion.
    (arm_ashldi3_1bit): Remove pattern.
    (ashrdi3): Remove shift by 1 expansion.
    (arm_ashrdi3_1bit): Remove pattern.
    (lshrdi3): Remove shift by 1 expansion.
    (arm_lshrdi3_1bit): Remove pattern.
    * config/arm/arm.c (arm_rtx_costs_internal): Slightly increase
    cost of ashldi3 by 1.
    * config/arm/neon.md (ashldi3_neon): Remove shift by 1 expansion.
    (di3_neon): Likewise.
--
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 
7d82ba358306189535bf7eee08a54e2f84569307..d47f4005446ff3e81968d7888c6573c0360cfdbd
 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9254,6 +9254,9 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum 
rtx_code outer_code,
    + rtx_cost (XEXP (x, 0), mode, code, 0, speed_p));
   if (speed_p)
 *cost += 2 * extra_cost->alu.shift;
+ /* Slightly disparage left shift by 1 at so we prefer adddi3.  */
+ if (code == ASHIFT && XEXP (x, 1) == CONST1_RTX (SImode))
+   *cost += 1;
   return true;
 }
   else if (mode == SImode)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 
0d69c8be9a2f98971c23c3b6f1659049f369920e..92b734ca277079f5f7343c7cc21a343f48d234c5
 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4061,12 +4061,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4083,18 +4077,6 @@
   "
 )
 
-(define_insn "arm_ashldi3_1bit"
-  [(set (match_operand:DI    0 "s_register_operand" "=r,")
-    (ashift:DI (match_operand:DI 1 "s_register_operand" "0,r")
-   (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%Q0, %Q1, asl #1\;adc\\t%R0, %R1, %R1"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand" "")
 (ashift:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4130,12 +4112,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4152,18 +4128,6 @@
   "
 )
 
-(define_insn "arm_ashrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (ashiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, asr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]
-)
-
 (define_expand "ashrsi3"
   [(set (match_operand:SI  0 "s_register_operand" "")
 (ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
@@ -4196,12 +4160,6 @@
 {
   rtx scratch1, scratch2;
 
-  if (operands[2] == CONST1_RTX (SImode))
-    {
-  emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
-  DONE;
-    }
-
   /* Ideally we should use iwmmxt here if we could know that operands[1]
  ends up already living in an iwmmxt register. Otherwise it's
  cheaper to have the alternate code being generated than moving
@@ -4218,18 +4176,6 @@
   "
 )
 
-(define_insn "arm_lshrdi3_1bit"
-  [(set (match_operand:DI  0 "s_register_operand" "=r,")
-    (lshiftrt:DI (match_operand:DI 1 "s_register_operand" "0,r")
- (const_int 1)))
-   (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT"
-  "movs\\t%R0, %R1, lsr #1\;mov\\t%Q0, %Q1, rrx"
-  [(set_attr "conds" "clob")
-   (set_attr "length" "8")
-   (set_attr "type" "multiple")]

libgomp, nvptx plugin: Make "nvptx_exec" static (was: [PATCH 7/10] OpenACC 2.0 support for libgomp - OpenACC runtime, NVidia PTX/CUDA plugin)

2017-02-02 Thread Thomas Schwinge
Hi!

On Tue, 23 Sep 2014 19:19:31 +0100, Julian Brown  
wrote:
> This patch contains the bulk of the OpenACC 2.0 runtime support, [...]

> --- /dev/null
> +++ b/libgomp/plugin-nvptx.c

> +void
> +PTX_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
> +   size_t *sizes, unsigned short *kinds, int num_gangs, int num_workers,
> +   int vector_length, int async, void *targ_mem_desc)
> +{

As obvious, committed to trunk in r245127:

commit fbfa5aaf7537a8d4a86873dd993fdd20aaed0298
Author: tschwinge 
Date:   Thu Feb 2 14:35:30 2017 +

libgomp, nvptx plugin: Make "nvptx_exec" static

libgomp/
* plugin/plugin-nvptx.c (nvptx_exec): Make it static.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@245127 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog | 2 ++
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 5f05cdb..56dc5bb 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,7 @@
 2017-02-02  Thomas Schwinge  
 
+   * plugin/plugin-nvptx.c (nvptx_exec): Make it static.
+
* libgomp-plugin.h (GOMP_OFFLOAD_openacc_parallel): Rename to
GOMP_OFFLOAD_openacc_exec.  Adjust all users.
(GOMP_OFFLOAD_openacc_get_current_cuda_device): Rename to
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 0284c7f..36d447c 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1041,7 +1041,7 @@ event_add (enum ptx_event_type type, CUevent *e, void *h, 
int val)
   pthread_mutex_unlock (_event_lock);
 }
 
-void
+static void
 nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
int async, unsigned *dims, void *targ_mem_desc)
 {


Backported to gomp-4_0-branch in r245128:

commit 18ebacbb8d4a978eee356d6bfb5051a431e6400b
Author: tschwinge 
Date:   Thu Feb 2 14:36:31 2017 +

libgomp, nvptx plugin: Make "nvptx_exec" static

Backport from trunk r245127:

libgomp/
* plugin/plugin-nvptx.c (nvptx_exec): Make it static.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@245128 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp| 5 +
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 17b10ef..062103e 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,5 +1,10 @@
 2017-02-02  Thomas Schwinge  
 
+   Backport from trunk r245127:
+   2017-02-02  Thomas Schwinge  
+
+   * plugin/plugin-nvptx.c (nvptx_exec): Make it static.
+
Backport from trunk r245125:
2017-02-02  Thomas Schwinge  
 
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 79c58c6..15018e5 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -888,7 +888,7 @@ event_add (enum ptx_event_type type, CUevent *e, void *h, 
int val)
   pthread_mutex_unlock (_event_lock);
 }
 
-void
+static void
 nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
int async, unsigned *dims, void *targ_mem_desc)
 {


Grüße
 Thomas


[PATCH] Fix memory leaks in gimple-ssa-sprintf.c (PR tree-optimization/79339).

2017-02-02 Thread Martin Liška
Hi.

As mentioned in the PR, mpfr_clear should be called in order to release memory.
Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From 627ea01882a2a307b107e5e4aa8de6dc60530a81 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 2 Feb 2017 12:25:32 +0100
Subject: [PATCH] Fix memory leaks in gimple-ssa-sprintf.c (PR
 tree-optimization/79339).

gcc/ChangeLog:

2017-02-02  Martin Liska  

	PR tree-optimization/79339
	* gimple-ssa-sprintf.c (format_floating_max): Call mpfr_clear.
	(format_floating): Likewise.
---
 gcc/gimple-ssa-sprintf.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 11f41741f95..fe4083ef1e2 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1501,7 +1501,10 @@ format_floating_max (tree type, char spec, HOST_WIDE_INT prec)
   mpfr_from_real (x, , GMP_RNDN);
 
   /* Return a value one greater to account for the leading minus sign.  */
-  return 1 + get_mpfr_format_length (x, "", prec, spec, 'D');
+  unsigned HOST_WIDE_INT r
+= 1 + get_mpfr_format_length (x, "", prec, spec, 'D');
+  mpfr_clear (x);
+  return r;
 }
 
 /* Return a range representing the minimum and maximum number of bytes
@@ -1739,6 +1742,7 @@ format_floating (const directive , tree arg)
 	   of the result struct.  */
 	*minmax[i] = get_mpfr_format_length (mpfrval, fmtstr, prec[i],
 	 dir.specifier, rndspec);
+	mpfr_clear (mpfrval);
   }
   }
 
-- 
2.11.0



Re: Enable jump threading on paths meeting hot paths

2017-02-02 Thread Richard Biener
On Thu, Feb 2, 2017 at 2:49 PM, Jan Hubicka  wrote:
> Hi,
> it seems I forgot to send the updated patch. Here it is.
> We now dump info like:
> Checking profitability of path:  5 (16 insns) 3 (2 insns) 34 (2 insns) 33 (4 
> insns) 32 (1 insns) 10 (3 insns) 6
>   Control statement insns: 16
>   Overall: 12 insns
>   Registering FSM jump thread: (6, 10) incoming edge;  (10, 32)  (32, 33)  
> (33, 34)  (34, 3)  (3, 5)  (5, 16) nocopy; (5, 16)
>
> path is printed backwards. It is how the loop process it.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
> PR middle-end/77445
> * gcc.dg/tree-ssa/pr77445-2.c: Update testcase to check that all
> threading is done.
> * tree-ssa-threadbackward.c (profitable_jump_thread_path): Dump
> statistics of the analyzed path; allow threading for speed when
> any of BBs along the path are optimized for speed.
>
> Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c
> ===
> --- testsuite/gcc.dg/tree-ssa/pr77445-2.c   (revision 245124)
> +++ testsuite/gcc.dg/tree-ssa/pr77445-2.c   (working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats" } */
> +/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats 
> -fdump-tree-thread2-details-blocks-stats 
> -fdump-tree-thread3-details-blocks-stats 
> -fdump-tree-thread4-details-blocks-stats" } */

You can use -fdump-tree-thread-details-blocks-stats to get all thread dumps.

>  typedef enum STATES {
> START=0,
> INVALID,
> @@ -121,3 +121,7 @@ enum STATES FMS( u8 **in , u32 *transiti
> increase much.  */
>  /* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
>  /* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread4" } } */
> Index: tree-ssa-threadbackward.c
> ===
> --- tree-ssa-threadbackward.c   (revision 245124)
> +++ tree-ssa-threadbackward.c   (working copy)
> @@ -159,6 +159,10 @@ profitable_jump_thread_path (vecbool threaded_through_latch = false;
>bool multiway_branch_in_path = false;
>bool threaded_multiway_branch = false;
> +  bool contains_hot_bb = false;
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +fprintf (dump_file, "Checking profitability of path: ");
>
>/* Count the number of instructions on the path: as these instructions
>   will have to be duplicated, we will not record the path if there
> @@ -168,6 +172,8 @@ profitable_jump_thread_path (vec  {
>basic_block bb = (*path)[j];
>
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, " %i", bb->index);
>/* Remember, blocks in the path are stored in opposite order
>  in the PATH array.  The last entry in the array represents
>  the block with an outgoing edge that we will redirect to the
> @@ -177,6 +183,7 @@ profitable_jump_thread_path (vec  branch.  */
>if (j < path_length - 1)
> {
> + int orig_n_insns = n_insns;
>   if (bb->loop_father != loop)
> {
>   path_crosses_loops = true;
> @@ -219,6 +226,9 @@ profitable_jump_thread_path (vec }
> }
>
> +

Please do not add excess vertical space.

> + if (!contains_hot_bb && speed_p && j < path_length - 1)

j < path_length - 1 is already checked above?

Otherwise looks ok.  If it does fix the regression - does it?

Richard.

> +   contains_hot_bb |= optimize_bb_for_speed_p (bb);
>   for (gsi = gsi_after_labels (bb);
>!gsi_end_p (gsi);
>gsi_next_nondebug ())
> @@ -229,8 +239,10 @@ profitable_jump_thread_path (vec   && !(gimple_code (stmt) == GIMPLE_ASSIGN
>&& gimple_assign_rhs_code (stmt) == ASSERT_EXPR)
>   && !is_gimple_debug (stmt))
> -   n_insns += estimate_num_insns (stmt, _size_weights);
> +   n_insns += estimate_num_insns (stmt, _size_weights);
> }
> + if (dump_file && (dump_flags & TDF_DETAILS))
> +   fprintf (dump_file, " (%i insns)", n_insns-orig_n_insns);
>
>   /* We do not look at the block with the threaded branch
>  in this loop.  So if any block with a last statement that
> @@ -264,7 +276,13 @@ profitable_jump_thread_path (vec   last block in the threading path.  So don't count it against our
>   statement count.  */
>
> -  n_insns-= estimate_num_insns (stmt, _size_weights);
> +  int stmt_insns = estimate_num_insns (stmt, _size_weights);
> +  

Re: libgomp: Normalize the names of a few functions of the libgomp plugin API

2017-02-02 Thread Thomas Schwinge
Hi!

On Thu, 2 Feb 2017 13:27:11 +0100, Jakub Jelinek  wrote:
> On Thu, Feb 02, 2017 at 01:22:37PM +0100, Thomas Schwinge wrote:
> > On Tue, 31 Jan 2017 15:44:46 +0100, I wrote:
> > > --- libgomp/libgomp.h
> > > +++ libgomp/libgomp.h
> > > @@ -882,31 +882,35 @@ typedef struct acc_dispatch_t
> > 
> > > +  __typeof (GOMP_OFFLOAD_openacc_parallel) *exec_func;
> > 
> > As can be seen here, for a handful of functions, the name of the
> > implementation in the plugins ("GOMP_OFFLOAD_openacc_parallel") doesn't
> > match the name used in libgomp proper ("openacc.exec_func").  Here is a
> > patch to adjust these.  OK for trunk?  If not now, then in next stage 1?
> 
> Ok for trunk now

Thanks for the quick review!

> otherwise we'd have to bump the plugin interface in GCC8, even
> when we don't know if we'll need to do that.

I don't think that's actually a concern: this changes the libgomp <->
libgomp plugin interface only, for which versioning is not a concern.


Anyway, without changes committed to trunk in r245125:

commit 65caa53b3e767f1df02673f25f281926d11219a7
Author: tschwinge 
Date:   Thu Feb 2 14:13:57 2017 +

libgomp: Normalize the names of a few functions of the libgomp plugin API

libgomp/
* libgomp-plugin.h (GOMP_OFFLOAD_openacc_parallel): Rename to
GOMP_OFFLOAD_openacc_exec.  Adjust all users.
(GOMP_OFFLOAD_openacc_get_current_cuda_device): Rename to
GOMP_OFFLOAD_openacc_cuda_get_current_device.  Adjust all users.
(GOMP_OFFLOAD_openacc_get_current_cuda_context): Rename to
GOMP_OFFLOAD_openacc_cuda_get_current_context.  Adjust all users.
(GOMP_OFFLOAD_openacc_get_cuda_stream): Rename to
GOMP_OFFLOAD_openacc_cuda_get_stream.  Adjust all users.
(GOMP_OFFLOAD_openacc_set_cuda_stream): Rename to
GOMP_OFFLOAD_openacc_cuda_set_stream.  Adjust all users.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@245125 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog | 13 +
 libgomp/libgomp-plugin.h  | 12 ++--
 libgomp/libgomp.h | 10 +-
 libgomp/plugin/plugin-nvptx.c | 14 +++---
 libgomp/target.c  | 10 +-
 5 files changed, 36 insertions(+), 23 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 829a30f..5f05cdb 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,16 @@
+2017-02-02  Thomas Schwinge  
+
+   * libgomp-plugin.h (GOMP_OFFLOAD_openacc_parallel): Rename to
+   GOMP_OFFLOAD_openacc_exec.  Adjust all users.
+   (GOMP_OFFLOAD_openacc_get_current_cuda_device): Rename to
+   GOMP_OFFLOAD_openacc_cuda_get_current_device.  Adjust all users.
+   (GOMP_OFFLOAD_openacc_get_current_cuda_context): Rename to
+   GOMP_OFFLOAD_openacc_cuda_get_current_context.  Adjust all users.
+   (GOMP_OFFLOAD_openacc_get_cuda_stream): Rename to
+   GOMP_OFFLOAD_openacc_cuda_get_stream.  Adjust all users.
+   (GOMP_OFFLOAD_openacc_set_cuda_stream): Rename to
+   GOMP_OFFLOAD_openacc_cuda_set_stream.  Adjust all users.
+
 2017-01-31  Thomas Schwinge  
 
* libgomp-plugin.h: #include .
diff --git libgomp/libgomp-plugin.h libgomp/libgomp-plugin.h
index fba45ee..ff81350 100644
--- libgomp/libgomp-plugin.h
+++ libgomp/libgomp-plugin.h
@@ -93,8 +93,8 @@ extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, 
size_t);
 extern bool GOMP_OFFLOAD_can_run (void *);
 extern void GOMP_OFFLOAD_run (int, void *, void *, void **);
 extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void *);
-extern void GOMP_OFFLOAD_openacc_parallel (void (*) (void *), size_t, void **,
-  void **, int, unsigned *, void *);
+extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
+  void **, int, unsigned *, void *);
 extern void GOMP_OFFLOAD_openacc_register_async_cleanup (void *, int);
 extern int GOMP_OFFLOAD_openacc_async_test (int);
 extern int GOMP_OFFLOAD_openacc_async_test_all (void);
@@ -105,10 +105,10 @@ extern void GOMP_OFFLOAD_openacc_async_wait_all_async 
(int);
 extern void GOMP_OFFLOAD_openacc_async_set_async (int);
 extern void *GOMP_OFFLOAD_openacc_create_thread_data (int);
 extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);
-extern void *GOMP_OFFLOAD_openacc_get_current_cuda_device (void);
-extern void *GOMP_OFFLOAD_openacc_get_current_cuda_context (void);
-extern void *GOMP_OFFLOAD_openacc_get_cuda_stream (int);
-extern int GOMP_OFFLOAD_openacc_set_cuda_stream (int, void *);
+extern void *GOMP_OFFLOAD_openacc_cuda_get_current_device (void);
+extern void *GOMP_OFFLOAD_openacc_cuda_get_current_context (void);
+extern void *GOMP_OFFLOAD_openacc_cuda_get_stream (int);
+extern int 

Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-02 Thread Jakub Jelinek
On Thu, Feb 02, 2017 at 10:52:18AM +0100, Jakub Jelinek wrote:
> That said, as I've said in the PR and earlier in December on gcc-patches,
> the way format_integer is structured is not really maintainable, has
> very different code paths for arguments with known ranges and without them
> intermixed together and I still ran into wrong-code issues or wrong warnings
> with your patch.  See below.  Thus I'd like to propose to just very much
> simplify the code:
> 
>  gimple-ssa-sprintf.c |  108 
> ---
>  1 file changed, 27 insertions(+), 81 deletions(-)
> 
> So far I haven't done full bootstrap/regtest on this, just
> make check-gcc -j16 -k RUNTESTFLAGS=tree-ssa.exp
> which revealed the need for gcc.dg/tree-ssa/builtin-sprintf-warn-1.c hunk
> below.  Here it is turned into a short wrong-code without the patch below:

Testing revealed a bug in my patch (POINTER_TYPE args really need special
handling, restored), and one further testcase glitch, plus I've added
another testcase for the other wrong-code.

Going to bootstrap/regtest this again:

2017-02-02  Jakub Jelinek  
Martin Sebor  

PR tree-optimization/79327
* gimple-ssa-sprintf.c (adjust_range_for_overflow): If returning
true, always set *argmin and *argmax to TYPE_{MIN,MAX}_VALUE of
dirtype.
(format_integer): Use wide_int_to_tree instead of build_int_cst
+ to_?hwi.  If argmin is NULL, just set argmin and argmax to
TYPE_{MIN,MAX}_VALUE of argtype.  Simplify and fix computation
of shortest and longest sequence.

* gcc.dg/tree-ssa/pr79327.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
(test_sprintf_chk_hh_nonconst): Don't expect 2 bogus warnings.
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
(test_sprintf_chk_range_schar): Adjust dg-message.
* gcc.dg/tree-ssa/builtin-sprintf-warn-12.c: New test.
* gcc.c-torture/execute/pr79327.c: New test.

--- gcc/gimple-ssa-sprintf.c.jj 2017-02-02 11:03:46.536526801 +0100
+++ gcc/gimple-ssa-sprintf.c2017-02-02 14:53:54.657592450 +0100
@@ -1014,8 +1014,8 @@ get_int_range (tree arg, tree type, HOST
determined by checking for the actual argument being in the range
of the type of the directive.  If it isn't it must be assumed to
take on the full range of the directive's type.
-   Return true when the range has been adjusted to the full unsigned
-   range of DIRTYPE, or [0, DIRTYPE_MAX], and false otherwise.  */
+   Return true when the range has been adjusted to the full range
+   of DIRTYPE, and false otherwise.  */
 
 static bool
 adjust_range_for_overflow (tree dirtype, tree *argmin, tree *argmax)
@@ -1051,20 +1051,8 @@ adjust_range_for_overflow (tree dirtype,
return false;
 }
 
-  tree dirmin = TYPE_MIN_VALUE (dirtype);
-  tree dirmax = TYPE_MAX_VALUE (dirtype);
-
-  if (TYPE_UNSIGNED (dirtype))
-{
-  *argmin = dirmin;
-  *argmax = dirmax;
-}
-  else
-{
-  *argmin = integer_zero_node;
-  *argmax = dirmin;
-}
-
+  *argmin = TYPE_MIN_VALUE (dirtype);
+  *argmax = TYPE_MAX_VALUE (dirtype);
   return true;
 }
 
@@ -1260,10 +1248,8 @@ format_integer (const directive , tr
   enum value_range_type range_type = get_range_info (arg, , );
   if (range_type == VR_RANGE)
{
- argmin = build_int_cst (argtype, wi::fits_uhwi_p (min)
- ? min.to_uhwi () : min.to_shwi ());
- argmax = build_int_cst (argtype, wi::fits_uhwi_p (max)
- ? max.to_uhwi () : max.to_shwi ());
+ argmin = wide_int_to_tree (argtype, min);
+ argmax = wide_int_to_tree (argtype, max);
 
  /* Set KNOWNRANGE if the argument is in a known subrange
 of the directive's type (KNOWNRANGE may be reset below).  */
@@ -1307,47 +1293,16 @@ format_integer (const directive , tr
 
   if (!argmin)
 {
-  /* For an unknown argument (e.g., one passed to a vararg function)
-or one whose value range cannot be determined, create a T_MIN
-constant if the argument's type is signed and T_MAX otherwise,
-and use those to compute the range of bytes that the directive
-can output.  When precision may be zero, use zero as the minimum
-since it results in no bytes on output (unless width is specified
-to be greater than 0).  */
-  bool zero = dir.prec[0] <= 0 && dir.prec[1] >= 0;
-  argmin = build_int_cst (argtype, !zero);
-
-  int typeprec = TYPE_PRECISION (dirtype);
-  int argprec = TYPE_PRECISION (argtype);
-
-  if (argprec < typeprec)
+  if (TREE_CODE (argtype) == POINTER_TYPE)
{
- if (POINTER_TYPE_P (argtype))
-   argmax = build_all_ones_cst (argtype);
- else if (TYPE_UNSIGNED (argtype))
-   argmax = TYPE_MAX_VALUE (argtype);
- else
-   argmax = 

Re: Enable jump threading on paths meeting hot paths

2017-02-02 Thread Jan Hubicka
Hi,
it seems I forgot to send the updated patch. Here it is.
We now dump info like:
Checking profitability of path:  5 (16 insns) 3 (2 insns) 34 (2 insns) 33 (4 
insns) 32 (1 insns) 10 (3 insns) 6
  Control statement insns: 16
  Overall: 12 insns
  Registering FSM jump thread: (6, 10) incoming edge;  (10, 32)  (32, 33)  (33, 
34)  (34, 3)  (3, 5)  (5, 16) nocopy; (5, 16) 

path is printed backwards. It is how the loop process it.

Bootstrapped/regtested x86_64-linux, OK?

Honza

PR middle-end/77445
* gcc.dg/tree-ssa/pr77445-2.c: Update testcase to check that all
threading is done.
* tree-ssa-threadbackward.c (profitable_jump_thread_path): Dump
statistics of the analyzed path; allow threading for speed when
any of BBs along the path are optimized for speed.

Index: testsuite/gcc.dg/tree-ssa/pr77445-2.c
===
--- testsuite/gcc.dg/tree-ssa/pr77445-2.c   (revision 245124)
+++ testsuite/gcc.dg/tree-ssa/pr77445-2.c   (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats" } */
+/* { dg-options "-O2 -fdump-tree-thread1-details-blocks-stats 
-fdump-tree-thread2-details-blocks-stats 
-fdump-tree-thread3-details-blocks-stats 
-fdump-tree-thread4-details-blocks-stats" } */
 typedef enum STATES {
START=0,
INVALID,
@@ -121,3 +121,7 @@ enum STATES FMS( u8 **in , u32 *transiti
increase much.  */
 /* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
 /* { dg-final { scan-tree-dump-times "Invalid sum" 2 "thread1" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */
+/* { dg-final { scan-tree-dump-not "not considered" "thread4" } } */
Index: tree-ssa-threadbackward.c
===
--- tree-ssa-threadbackward.c   (revision 245124)
+++ tree-ssa-threadbackward.c   (working copy)
@@ -159,6 +159,10 @@ profitable_jump_thread_path (vecindex);
   /* Remember, blocks in the path are stored in opposite order
 in the PATH array.  The last entry in the array represents
 the block with an outgoing edge that we will redirect to the
@@ -177,6 +183,7 @@ profitable_jump_thread_path (vecloop_father != loop)
{
  path_crosses_loops = true;
@@ -219,6 +226,9 @@ profitable_jump_thread_path (vec= PARAM_VALUE (PARAM_MAX_FSM_THREAD_PATH_INSNS))
{


Re: [PATCH][wwwdocs] Mention -march=armv8.3-a -msign-return-address= for GCC 7

2017-02-02 Thread Gerald Pfeifer

On Thu, 2 Feb 2017, Jiong Wang wrote:

This patch adds a short entry for the -march=armv8.3-a and
-msign-return-address= options in GCC 7 to the "AArch64" section.


Thanks, Jiong.

Index: gcc-7/changes.html
===
 
+   The ARMv8.3-A architecture is now supported.  It can be used by
+   specifying the -march=armv8.3-a option.
+
+   The option -msign-return-address= is supported to enable
+   return address protection using ARMv8.3-A Pointer Authentication
+   Extensions.  Please refer to the documentation for more information on
+   the arguments accepted by this option.
+ 

Would it make sense to make this two different items?  The way it
is currently marked up, the blank line will be "gone" once rendered.

Where you "refer to the documentation", what kind of documentation
is that?  ARM reference manuals, GCC's documentation,...?  Being a
bit more explicit here and/or using a link would be good.

Gerald


[PATCH] IPA: enhance dump output

2017-02-02 Thread Martin Liška
Following patch was helpful to deal with the PR.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed or is it stage1 material?
Martin
>From dc02af2bbbe67d9d1fb6119b606b3c1fea726062 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 24 Jan 2017 13:33:58 +0100
Subject: [PATCH] IPA: enhance dump output

gcc/ChangeLog:

2017-01-24  Martin Liska  

	* cgraph.c (cgraph_node::dump): Dump function version info.
	* symtab.c (symtab_node::dump_base): Add missing new line.
---
 gcc/cgraph.c | 10 ++
 gcc/symtab.c |  1 +
 2 files changed, 11 insertions(+)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index ef2dc50282c..74839f7d993 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2066,6 +2066,16 @@ cgraph_node::dump (FILE *f)
 fprintf (f, "  Profile id: %i\n",
 	 profile_id);
   fprintf (f, "  First run: %i\n", tp_first_run);
+  cgraph_function_version_info *vi = function_version ();
+  if (vi != NULL)
+{
+  /* Iterate to first item in the chain.  */
+  while (vi->prev != NULL)
+	vi = vi->prev;
+  fprintf (f, "  Version info: ");
+  dump_addr (f, "@", (void *)vi);
+  fprintf (f, "\n");
+}
   fprintf (f, "  Function flags:");
   if (count)
 fprintf (f, " executed %" PRId64"x",
diff --git a/gcc/symtab.c b/gcc/symtab.c
index 87febdc212f..0078896c8a8 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -890,6 +890,7 @@ symtab_node::dump_base (FILE *f)
 {
   fprintf (f, "  Aux:");
   dump_addr (f, " @", (void *)aux);
+  fprintf (f, "\n");
 }
 
   fprintf (f, "  References: ");
-- 
2.11.0



Re: [PATCH] Make multiple_target.c aware of LTO (PR lto/66295)

2017-02-02 Thread Martin Liška
Ok, I spent more time with understanding how that pass works and I believe it 
can be
significantly simplified. I guess target_clones are very close to 'target' 
attribute
that is handled by C++ FE. It creates cgraph_function_version_info and function 
dispatcher
is generated.

I hope doing the very same by an early simple IPA pass should be the right 
approach.
Works fine apart from an assert it triggers:

$ ./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/gcc.target/i386/mvc9.c 
-flto -O1
lto1: internal compiler error: in binds_to_current_def_p, at symtab.c:2239
0x8c580a symtab_node::binds_to_current_def_p(symtab_node*)
../../gcc/symtab.c:2239
0x18cb40b worse_state
../../gcc/ipa-pure-const.c:477
0x18cd61f propagate_pure_const
../../gcc/ipa-pure-const.c:1346
0x18ce304 execute
../../gcc/ipa-pure-const.c:1679

triggered for foo.ifunc:

foo.ifunc/6 (foo.ifunc) @0x7f9535b138a0
  Type: function definition analyzed alias
  Visibility: prevailing_def_ironly artificial
  References: foo.resolver/7 (alias)
  Referring: 
  Read from file: /tmp/ccdj2ikS.o
  Availability: overwritable
  First run: 0
  Function flags:
  Called by: main/2 (1.00 per call) 
  Calls: 

The assert is removed in attached patch, but maybe there's a better approach?

Thanks,
Martin
>From 4dbaad270f77d3bf79d4d980371f120a19f7d3a3 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 24 Jan 2017 13:41:25 +0100
Subject: [PATCH] Simplify creation of target_clones (PR lto/66295)

gcc/ChangeLog:

2017-01-24  Martin Liska  

	* multiple_target.c (create_dispatcher_calls): Redirect edge
	from a caller of a dispatcher.
	(expand_target_clones): Make the clones local.
	(ipa_target_clone): Do both target clones and resolvers.
	(ipa_dispatcher_calls): Remove the pass.
	(pass_dispatcher_calls::gate): Likewise.
	(make_pass_dispatcher_calls): Likewise.
	* passes.def (pass_target_clone): Put as very first IPA early
	pass.
	* symtab.c (symtab_node::binds_to_current_def_p): Remove assert.

gcc/testsuite/ChangeLog:

2017-01-24  Martin Liska  

	* gcc.target/i386/mvc9.c: New test.
---
 gcc/multiple_target.c| 71 +---
 gcc/passes.def   |  3 +-
 gcc/symtab.c |  2 -
 gcc/testsuite/gcc.target/i386/mvc9.c | 28 ++
 4 files changed, 39 insertions(+), 65 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/mvc9.c

diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
index 5be3980db20..7b735ae81ae 100644
--- a/gcc/multiple_target.c
+++ b/gcc/multiple_target.c
@@ -87,6 +87,7 @@ create_dispatcher_calls (struct cgraph_node *node)
 	inode->resolve_alias (cgraph_node::get (resolver_decl));
 
   e->redirect_callee (inode);
+  e->redirect_call_stmt_to_callee ();
   /*  Since REDIRECT_CALLEE modifies NEXT_CALLER field we move to
 	  previously set NEXT_CALLER.  */
   e = NULL;
@@ -283,6 +284,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
   create_new_asm_name (attr, suffix);
   /* Create new target clone.  */
   cgraph_node *new_node = create_target_clone (node, definition, suffix);
+  new_node->local.local = false;
   XDELETEVEC (suffix);
 
   /* Set new attribute for the clone.  */
@@ -334,17 +336,19 @@ expand_target_clones (struct cgraph_node *node, bool definition)
   return ret;
 }
 
-static bool target_clone_pass;
-
 static unsigned int
 ipa_target_clone (void)
 {
   struct cgraph_node *node;
 
-  target_clone_pass = false;
+  bool target_clone_pass = false;
   FOR_EACH_FUNCTION (node)
-if (node->definition)
-  target_clone_pass |= expand_target_clones (node, true);
+target_clone_pass |= expand_target_clones (node, node->definition);
+
+  if (target_clone_pass)
+FOR_EACH_FUNCTION (node)
+  create_dispatcher_calls (node);
+
   return 0;
 }
 
@@ -360,7 +364,7 @@ const pass_data pass_data_target_clone =
   0,/* properties_provided */
   0,/* properties_destroyed */
   0,/* todo_flags_start */
-  0/* todo_flags_finish */
+  TODO_update_ssa		/* todo_flags_finish */
 };
 
 class pass_target_clone : public simple_ipa_opt_pass
@@ -388,58 +392,3 @@ make_pass_target_clone (gcc::context *ctxt)
 {
   return new pass_target_clone (ctxt);
 }
-
-static unsigned int
-ipa_dispatcher_calls (void)
-{
-  struct cgraph_node *node;
-
-  FOR_EACH_FUNCTION (node)
-if (!node->definition)
-  target_clone_pass |= expand_target_clones (node, false);
-  if (target_clone_pass)
-FOR_EACH_FUNCTION (node)
-  create_dispatcher_calls (node);
-  return 0;
-}
-
-namespace {
-
-const pass_data pass_data_dispatcher_calls =
-{
-  SIMPLE_IPA_PASS,		/* type */
-  "dispatchercalls",		/* name */
-  OPTGROUP_NONE,		/* optinfo_flags */
-  TV_NONE,			/* tv_id */
-  ( PROP_ssa | PROP_cfg ),	/* properties_required */
-  0,/* properties_provided */
-  0,/* properties_destroyed */
-  0,/* 

Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-02 Thread Thomas Schwinge
Hi!

On Mon, 30 Jan 2017 19:54:00 +0100, Torvald Riegel  wrote:
> This patch fixes the __atomic builtins to not implement supposedly
> lock-free atomic loads based on just a compare-and-swap operation.  [...]

> I've tested this on an x86_64-linux bootstrap build and see no
> regressions.  (With the exception of two OpenMP failures, which Jakub
> will take care of.

These are:

[-PASS:-]{+FAIL:+} libgomp.c/atomic-2.c (test for excess errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.c/atomic-2.c [-execution 
test-]{+compilation failed to produce executable+}

atomic-2.c:(.text+0x50): undefined reference to `__atomic_load_16'

[-PASS:-]{+FAIL:+} libgomp.c/atomic-5.c (test for excess errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.c/atomic-5.c [-execution 
test-]{+compilation failed to produce executable+}

atomic-5.c:(.text+0x55): undefined reference to `__atomic_load_16'

(Would've been nice to xfail these as part of your commit, until Jakub
gets to address that.)

> The other failures I saw didn't seem atomics related
> (eg, openacc)

I suppose you're testing without nvptx offloading -- which failures do
you see for OpenACC testing?  (There shouldn't be any for host fallback
testing.)

> I haven't compared it against a full bootstrap build and
> make check of trunk.).

I just happened to do that for x86_64-pc-linux-gnu, and I'm seeing the
following additional changes; posting this just in case that's not
expected to happen:

-PASS: gcc.dg/atomic-compare-exchange-5.c (test for excess errors)
-PASS: gcc.dg/atomic-compare-exchange-5.c execution test
+UNSUPPORTED: gcc.dg/atomic-compare-exchange-5.c

-PASS: gcc.dg/atomic-exchange-5.c (test for excess errors)
-PASS: gcc.dg/atomic-exchange-5.c execution test
+UNSUPPORTED: gcc.dg/atomic-exchange-5.c

-PASS: gcc.dg/atomic-load-5.c (test for excess errors)
-PASS: gcc.dg/atomic-load-5.c execution test
+UNSUPPORTED: gcc.dg/atomic-load-5.c

-PASS: gcc.dg/atomic-op-5.c (test for excess errors)
-PASS: gcc.dg/atomic-op-5.c execution test
+UNSUPPORTED: gcc.dg/atomic-op-5.c

-PASS: gcc.dg/atomic-store-5.c (test for excess errors)
-PASS: gcc.dg/atomic-store-5.c execution test
+UNSUPPORTED: gcc.dg/atomic-store-5.c

-PASS: gcc.dg/atomic-store-6.c (test for excess errors)
-PASS: gcc.dg/atomic-store-6.c execution test
+UNSUPPORTED: gcc.dg/atomic-store-6.c

-PASS: gcc.dg/simulate-thread/atomic-load-int128.c   -O0 -g  (test for 
excess errors)
-PASS: gcc.dg/simulate-thread/atomic-load-int128.c   -O0 -g  thread 
simulation test
+UNSUPPORTED: gcc.dg/simulate-thread/atomic-load-int128.c   -O0 -g 
-PASS: gcc.dg/simulate-thread/atomic-load-int128.c   -O2 -g  (test for 
excess errors)
-PASS: gcc.dg/simulate-thread/atomic-load-int128.c   -O2 -g  thread 
simulation test
+UNSUPPORTED: gcc.dg/simulate-thread/atomic-load-int128.c   -O2 -g 
-PASS: gcc.dg/simulate-thread/atomic-load-int128.c   -O3 -g  (test for 
excess errors)
-PASS: gcc.dg/simulate-thread/atomic-load-int128.c   -O3 -g  thread 
simulation test
+UNSUPPORTED: gcc.dg/simulate-thread/atomic-load-int128.c   -O3 -g 

-PASS: gcc.dg/simulate-thread/atomic-other-int128.c   -O0 -g  (test for 
excess errors)
-PASS: gcc.dg/simulate-thread/atomic-other-int128.c   -O0 -g  thread 
simulation test
+UNSUPPORTED: gcc.dg/simulate-thread/atomic-other-int128.c   -O0 -g 
-PASS: gcc.dg/simulate-thread/atomic-other-int128.c   -O2 -g  (test for 
excess errors)
-PASS: gcc.dg/simulate-thread/atomic-other-int128.c   -O2 -g  thread 
simulation test
+UNSUPPORTED: gcc.dg/simulate-thread/atomic-other-int128.c   -O2 -g 
-PASS: gcc.dg/simulate-thread/atomic-other-int128.c   -O3 -g  (test for 
excess errors)
-PASS: gcc.dg/simulate-thread/atomic-other-int128.c   -O3 -g  thread 
simulation test
+UNSUPPORTED: gcc.dg/simulate-thread/atomic-other-int128.c   -O3 -g 


Grüße
 Thomas


Re: [RFC] Bug lto/78140

2017-02-02 Thread Richard Biener
On Thu, Feb 2, 2017 at 1:52 PM, Richard Biener
 wrote:
> On Thu, Feb 2, 2017 at 1:48 PM, Jan Hubicka  wrote:
>>>
>>> 2017-02-02  Kugan Vivekanandarajah  
>>>
>>> * ipa-cp.c (ipcp_store_bits_results): Construct bits vector.
>>> (ipcp_store_vr_results): Constrict m_vr vector.
>>> * ipa-prop.c (ipa_node_params_t::remove): Set transaction summary to
>>> null.
>>> (ipa_node_params_t::duplicate): Construct bits and m_vr vector.
>>> (read_ipcp_transformation_info): Likewise.
>>>
>>>
>>> Thanks,
>>> Kugan
>>>
>>> >>I tried similar think without variable structure like:
>>> >>diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
>>> >>index 93a2390c..b0cc832 100644
>>> >>--- a/gcc/ipa-prop.h
>>> >>+++ b/gcc/ipa-prop.h
>>> >>@@ -525,7 +525,7 @@ struct GTY(()) ipcp_transformation_summary
>>> >>   /* Known bits information.  */
>>> >>   vec *bits;
>>> >>   /* Value range information.  */
>>> >>-  vec *m_vr;
>>> >>+  vec *m_vr;
>>> >> };
>>> >>
>>> >>This also has the same issue so I don't think it has anything to do with
>>> >>variable structure.
>>> >
>>> >You have to debug that detail yourself but I wonder why the transformation
>>> >summary has a different representation than the jump function (and I think
>>> >the jump function size is the issue).
>>> >
>>> >The JF has
>>> >
>>> >  /* Information about zero/non-zero bits.  */
>>> >  struct ipa_bits bits;
>>> >
>>> >  /* Information about value range, containing valid data only when 
>>> > vr_known is
>>> > true.  */
>>> >  value_range m_vr;
>>> >  bool vr_known;
>>> >
>>> >with ipa_bits having two widest_ints and value_range having two trees
>>> >and an unused bitmap and ipa_vr having two wide_ints (widest_ints are
>>> >smaller!).
>>> >
>>> >Richard.
>>> >
>>> >>
>>> >>Thanks,
>>> >>Kugan
>>
>>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>>> index aa3c997..5103555 100644
>>> --- a/gcc/ipa-cp.c
>>> +++ b/gcc/ipa-cp.c
>>> @@ -4865,6 +4865,8 @@ ipcp_store_bits_results (void)
>>>
>>>ipcp_grow_transformations_if_necessary ();
>>>ipcp_transformation_summary *ts = ipcp_get_transformation_summary 
>>> (node);
>>> +  if (!ts->bits)
>>> + ts->bits = (new (ggc_cleared_alloc ()) ipa_bits_vec ());
>
>
> Why do you need those placement news?

Ah, I see we handle finalization but not construction in ggc_[cleared_]alloc...

> Richard.
>
>>>vec_safe_reserve_exact (ts->bits, count);
>>>
>>>for (unsigned i = 0; i < count; i++)
>>> @@ -4940,6 +4942,8 @@ ipcp_store_vr_results (void)
>>>
>>>ipcp_grow_transformations_if_necessary ();
>>>ipcp_transformation_summary *ts = ipcp_get_transformation_summary 
>>> (node);
>>> +  if (!ts->m_vr)
>>> + ts->m_vr = new (ggc_cleared_alloc ()) ipa_vr_vec ();
>>>vec_safe_reserve_exact (ts->m_vr, count);
>>>
>>>for (unsigned i = 0; i < count; i++)
>>> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
>>> index 834c27d..b992a7f 100644
>>> --- a/gcc/ipa-prop.c
>>> +++ b/gcc/ipa-prop.c
>>> @@ -3759,13 +3759,20 @@ ipa_node_params_t::insert (cgraph_node *, 
>>> ipa_node_params *info)
>>> to.  */
>>>
>>>  void
>>> -ipa_node_params_t::remove (cgraph_node *, ipa_node_params *info)
>>> +ipa_node_params_t::remove (cgraph_node *node, ipa_node_params *info)
>>>  {
>>>free (info->lattices);
>>>/* Lattice values and their sources are deallocated with their alocation
>>>   pool.  */
>>>info->known_csts.release ();
>>>info->known_contexts.release ();
>>> +  ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
>>> +  if (ts != NULL)
>>> +{
>>> +  ts->agg_values = NULL;
>>> +  ts->bits = NULL;
>>> +  ts->m_vr = NULL;
>>> +}
>>
>> I would go for explicit ggc_free for them: garbage collrector is hardly ever 
>> run during
>> WPA stage and thus we are not going to get the memory back otherwise.
>>
>> Patch is OK with that change.
>>
>> Honza
>>>  }
>>>
>>>  /* Hook that is called by summary when a node is duplicated.  */
>>> @@ -3811,8 +3818,10 @@ ipa_node_params_t::duplicate(cgraph_node *src, 
>>> cgraph_node *dst,
>>>ipcp_grow_transformations_if_necessary ();
>>>src_trans = ipcp_get_transformation_summary (src);
>>>const vec *src_vr = src_trans->m_vr;
>>> -  vec *_vr
>>> - = ipcp_get_transformation_summary (dst)->m_vr;
>>> +  ipcp_transformation_summary *dts = ipcp_get_transformation_summary 
>>> (dst);
>>> +  if (!dts->m_vr)
>>> + dts->m_vr = (new (ggc_cleared_alloc ()) ipa_vr_vec ());
>>> +  vec *_vr = dts->m_vr;
>>>if (vec_safe_length (src_trans->m_vr) > 0)
>>>   {
>>> vec_safe_reserve_exact (dst_vr, src_vr->length ());
>>> @@ -3826,8 +3835,10 @@ ipa_node_params_t::duplicate(cgraph_node *src, 
>>> cgraph_node *dst,
>>>ipcp_grow_transformations_if_necessary ();
>>>src_trans = 

Re: [RFC] Bug lto/78140

2017-02-02 Thread Martin Liška
On 02/02/2017 02:36 AM, kugan wrote:
> This is due to an existing issue. That is, in ipa_node_params_t::remove, m_vr 
> and bits vectors are not set to null such that the gc can claim it.

I've just sent patch that should remove such need as ~ipa_node_params_t should 
be called
just once:

https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00148.html

Thanks,
Martin


[PATCH] Fix memory leaks in IPA CP (PR ipa/79337).

2017-02-02 Thread Martin Liška
Hello.

As mentioned in the PR, there is memory leak that is caused by fact, that 
ipa_node_params_t
does release memory just in ipa_node_params_t::remove. That's wrong because the 
callback is called
just when cgraph_removal_hook is triggered. Thus the proper implementation is 
to move it do destructor.
Similar should be done for construction (currently being in 
ipa_node_params_t::insert).

Apart from that, I noticed that when using GGC memory, the machinery implicitly 
calls dtors for types
that have __has_trivial_destructor == true. That's case of ipa_node_params_t, 
but as symbol_summary already
calls a dtor before ggc_free is called. Problem comes when hash_table marks a 
memory slot as empty and GGC finalizer
calls dtor for an invalid memory. Thus I believe not using such finalizer is 
proper fix.

Last change fixes issue where ipa_free_all_node_params destroys a 
symbol_summary (living in GGC memory) and dtor
is called for second time via ggc release mechanism.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
>From 08a3ecda47d9d52cd4bb6435aaf18b5b099ef683 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 2 Feb 2017 11:13:13 +0100
Subject: [PATCH] Fix memory leaks in IPA CP (PR ipa/79337).

gcc/ChangeLog:

2017-02-02  Martin Liska  

	PR ipa/79337
	* ipa-prop.c (ipa_node_params_t::insert): Remove current
	implementation.
	(ipa_node_params_t::remove): Likewise.
	* ipa-prop.h (ipa_node_params::ipa_node_params): Make default
	initialization from ipa_node_params_t::insert.
	(ipa_node_params::~ipa_node_params): Move from
	ipa_node_params_t::release.
	* symbol-summary.h (symbol_summary::m_released): New member.
	Do not release a summary twice.  Do not allow to call finalizer
	for types of a summary that live in GGC memory.
---
 gcc/ipa-prop.c   | 32 
 gcc/ipa-prop.h   | 28 ++--
 gcc/symbol-summary.h | 27 ++-
 3 files changed, 40 insertions(+), 47 deletions(-)

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 3ef3d4fae9e..d031a70caa4 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3736,38 +3736,6 @@ ipa_add_new_function (cgraph_node *node, void *data ATTRIBUTE_UNUSED)
 ipa_analyze_node (node);
 }
 
-/* Initialize a newly created param info.  */
-
-void
-ipa_node_params_t::insert (cgraph_node *, ipa_node_params *info)
-{
-  info->lattices = NULL;
-  info->ipcp_orig_node = NULL;
-  info->known_csts = vNULL;
-  info->known_contexts = vNULL;
-  info->analysis_done = 0;
-  info->node_enqueued = 0;
-  info->do_clone_for_all_contexts = 0;
-  info->is_all_contexts_clone = 0;
-  info->node_dead = 0;
-  info->node_within_scc = 0;
-  info->node_calling_single_call = 0;
-  info->versionable = 0;
-}
-
-/* Frees all dynamically allocated structures that the param info points
-   to.  */
-
-void
-ipa_node_params_t::remove (cgraph_node *, ipa_node_params *info)
-{
-  free (info->lattices);
-  /* Lattice values and their sources are deallocated with their alocation
- pool.  */
-  info->known_csts.release ();
-  info->known_contexts.release ();
-}
-
 /* Hook that is called by summary when a node is duplicated.  */
 
 void
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 93a2390c321..8c5ba25fcec 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -320,6 +320,12 @@ struct GTY(()) ipa_param_descriptor
 
 struct GTY((for_user)) ipa_node_params
 {
+  /* Default constructor.  */
+  ipa_node_params ();
+
+  /* Default destructor.  */
+  ~ipa_node_params ();
+
   /* Information about individual formal parameters that are gathered when
  summaries are generated. */
   vec *descriptors;
@@ -356,6 +362,24 @@ struct GTY((for_user)) ipa_node_params
   unsigned versionable : 1;
 };
 
+inline
+ipa_node_params::ipa_node_params ()
+: descriptors (NULL), lattices (NULL), ipcp_orig_node (NULL),
+  known_csts (vNULL), known_contexts (vNULL), analysis_done (0),
+  node_enqueued (0), do_clone_for_all_contexts (0), is_all_contexts_clone (0),
+  node_dead (0), node_within_scc (0), node_calling_single_call (0),
+  versionable (0)
+{
+}
+
+inline
+ipa_node_params::~ipa_node_params ()
+{
+  free (lattices);
+  known_csts.release ();
+  known_contexts.release ();
+}
+
 /* Intermediate information that we get from alias analysis about a particular
parameter in a particular basic_block.  When a parameter or the memory it
references is marked modified, we use that information in all dominated
@@ -580,9 +604,9 @@ public:
 function_summary (table, ggc) { }
 
   /* Hook that is called by summary when a node is deleted.  */
-  virtual void insert (cgraph_node *, ipa_node_params *info);
+  virtual void insert (cgraph_node *, ipa_node_params *) {}
   /* Hook that is called by summary when a node is deleted.  */
-  virtual void remove (cgraph_node *, ipa_node_params *info);
+  virtual void remove (cgraph_node *, 

Re: [RFC] Bug lto/78140

2017-02-02 Thread Richard Biener
On Thu, Feb 2, 2017 at 1:48 PM, Jan Hubicka  wrote:
>>
>> 2017-02-02  Kugan Vivekanandarajah  
>>
>> * ipa-cp.c (ipcp_store_bits_results): Construct bits vector.
>> (ipcp_store_vr_results): Constrict m_vr vector.
>> * ipa-prop.c (ipa_node_params_t::remove): Set transaction summary to
>> null.
>> (ipa_node_params_t::duplicate): Construct bits and m_vr vector.
>> (read_ipcp_transformation_info): Likewise.
>>
>>
>> Thanks,
>> Kugan
>>
>> >>I tried similar think without variable structure like:
>> >>diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
>> >>index 93a2390c..b0cc832 100644
>> >>--- a/gcc/ipa-prop.h
>> >>+++ b/gcc/ipa-prop.h
>> >>@@ -525,7 +525,7 @@ struct GTY(()) ipcp_transformation_summary
>> >>   /* Known bits information.  */
>> >>   vec *bits;
>> >>   /* Value range information.  */
>> >>-  vec *m_vr;
>> >>+  vec *m_vr;
>> >> };
>> >>
>> >>This also has the same issue so I don't think it has anything to do with
>> >>variable structure.
>> >
>> >You have to debug that detail yourself but I wonder why the transformation
>> >summary has a different representation than the jump function (and I think
>> >the jump function size is the issue).
>> >
>> >The JF has
>> >
>> >  /* Information about zero/non-zero bits.  */
>> >  struct ipa_bits bits;
>> >
>> >  /* Information about value range, containing valid data only when 
>> > vr_known is
>> > true.  */
>> >  value_range m_vr;
>> >  bool vr_known;
>> >
>> >with ipa_bits having two widest_ints and value_range having two trees
>> >and an unused bitmap and ipa_vr having two wide_ints (widest_ints are
>> >smaller!).
>> >
>> >Richard.
>> >
>> >>
>> >>Thanks,
>> >>Kugan
>
>> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
>> index aa3c997..5103555 100644
>> --- a/gcc/ipa-cp.c
>> +++ b/gcc/ipa-cp.c
>> @@ -4865,6 +4865,8 @@ ipcp_store_bits_results (void)
>>
>>ipcp_grow_transformations_if_necessary ();
>>ipcp_transformation_summary *ts = ipcp_get_transformation_summary 
>> (node);
>> +  if (!ts->bits)
>> + ts->bits = (new (ggc_cleared_alloc ()) ipa_bits_vec ());


Why do you need those placement news?

Richard.

>>vec_safe_reserve_exact (ts->bits, count);
>>
>>for (unsigned i = 0; i < count; i++)
>> @@ -4940,6 +4942,8 @@ ipcp_store_vr_results (void)
>>
>>ipcp_grow_transformations_if_necessary ();
>>ipcp_transformation_summary *ts = ipcp_get_transformation_summary 
>> (node);
>> +  if (!ts->m_vr)
>> + ts->m_vr = new (ggc_cleared_alloc ()) ipa_vr_vec ();
>>vec_safe_reserve_exact (ts->m_vr, count);
>>
>>for (unsigned i = 0; i < count; i++)
>> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
>> index 834c27d..b992a7f 100644
>> --- a/gcc/ipa-prop.c
>> +++ b/gcc/ipa-prop.c
>> @@ -3759,13 +3759,20 @@ ipa_node_params_t::insert (cgraph_node *, 
>> ipa_node_params *info)
>> to.  */
>>
>>  void
>> -ipa_node_params_t::remove (cgraph_node *, ipa_node_params *info)
>> +ipa_node_params_t::remove (cgraph_node *node, ipa_node_params *info)
>>  {
>>free (info->lattices);
>>/* Lattice values and their sources are deallocated with their alocation
>>   pool.  */
>>info->known_csts.release ();
>>info->known_contexts.release ();
>> +  ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
>> +  if (ts != NULL)
>> +{
>> +  ts->agg_values = NULL;
>> +  ts->bits = NULL;
>> +  ts->m_vr = NULL;
>> +}
>
> I would go for explicit ggc_free for them: garbage collrector is hardly ever 
> run during
> WPA stage and thus we are not going to get the memory back otherwise.
>
> Patch is OK with that change.
>
> Honza
>>  }
>>
>>  /* Hook that is called by summary when a node is duplicated.  */
>> @@ -3811,8 +3818,10 @@ ipa_node_params_t::duplicate(cgraph_node *src, 
>> cgraph_node *dst,
>>ipcp_grow_transformations_if_necessary ();
>>src_trans = ipcp_get_transformation_summary (src);
>>const vec *src_vr = src_trans->m_vr;
>> -  vec *_vr
>> - = ipcp_get_transformation_summary (dst)->m_vr;
>> +  ipcp_transformation_summary *dts = ipcp_get_transformation_summary 
>> (dst);
>> +  if (!dts->m_vr)
>> + dts->m_vr = (new (ggc_cleared_alloc ()) ipa_vr_vec ());
>> +  vec *_vr = dts->m_vr;
>>if (vec_safe_length (src_trans->m_vr) > 0)
>>   {
>> vec_safe_reserve_exact (dst_vr, src_vr->length ());
>> @@ -3826,8 +3835,10 @@ ipa_node_params_t::duplicate(cgraph_node *src, 
>> cgraph_node *dst,
>>ipcp_grow_transformations_if_necessary ();
>>src_trans = ipcp_get_transformation_summary (src);
>>const vec *src_bits = src_trans->bits;
>> -  vec *_bits
>> - = ipcp_get_transformation_summary (dst)->bits;
>> +  ipcp_transformation_summary *dts = ipcp_get_transformation_summary 
>> (dst);
>> +  if 

Re: [RFC] Bug lto/78140

2017-02-02 Thread Jan Hubicka
> 
> 2017-02-02  Kugan Vivekanandarajah  
> 
> * ipa-cp.c (ipcp_store_bits_results): Construct bits vector.
> (ipcp_store_vr_results): Constrict m_vr vector.
> * ipa-prop.c (ipa_node_params_t::remove): Set transaction summary to
> null.
> (ipa_node_params_t::duplicate): Construct bits and m_vr vector.
> (read_ipcp_transformation_info): Likewise.
> 
> 
> Thanks,
> Kugan
> 
> >>I tried similar think without variable structure like:
> >>diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
> >>index 93a2390c..b0cc832 100644
> >>--- a/gcc/ipa-prop.h
> >>+++ b/gcc/ipa-prop.h
> >>@@ -525,7 +525,7 @@ struct GTY(()) ipcp_transformation_summary
> >>   /* Known bits information.  */
> >>   vec *bits;
> >>   /* Value range information.  */
> >>-  vec *m_vr;
> >>+  vec *m_vr;
> >> };
> >>
> >>This also has the same issue so I don't think it has anything to do with
> >>variable structure.
> >
> >You have to debug that detail yourself but I wonder why the transformation
> >summary has a different representation than the jump function (and I think
> >the jump function size is the issue).
> >
> >The JF has
> >
> >  /* Information about zero/non-zero bits.  */
> >  struct ipa_bits bits;
> >
> >  /* Information about value range, containing valid data only when vr_known 
> > is
> > true.  */
> >  value_range m_vr;
> >  bool vr_known;
> >
> >with ipa_bits having two widest_ints and value_range having two trees
> >and an unused bitmap and ipa_vr having two wide_ints (widest_ints are
> >smaller!).
> >
> >Richard.
> >
> >>
> >>Thanks,
> >>Kugan

> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index aa3c997..5103555 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -4865,6 +4865,8 @@ ipcp_store_bits_results (void)
>  
>ipcp_grow_transformations_if_necessary ();
>ipcp_transformation_summary *ts = ipcp_get_transformation_summary 
> (node);
> +  if (!ts->bits)
> + ts->bits = (new (ggc_cleared_alloc ()) ipa_bits_vec ());
>vec_safe_reserve_exact (ts->bits, count);
>  
>for (unsigned i = 0; i < count; i++)
> @@ -4940,6 +4942,8 @@ ipcp_store_vr_results (void)
>  
>ipcp_grow_transformations_if_necessary ();
>ipcp_transformation_summary *ts = ipcp_get_transformation_summary 
> (node);
> +  if (!ts->m_vr)
> + ts->m_vr = new (ggc_cleared_alloc ()) ipa_vr_vec ();
>vec_safe_reserve_exact (ts->m_vr, count);
>  
>for (unsigned i = 0; i < count; i++)
> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index 834c27d..b992a7f 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -3759,13 +3759,20 @@ ipa_node_params_t::insert (cgraph_node *, 
> ipa_node_params *info)
> to.  */
>  
>  void
> -ipa_node_params_t::remove (cgraph_node *, ipa_node_params *info)
> +ipa_node_params_t::remove (cgraph_node *node, ipa_node_params *info)
>  {
>free (info->lattices);
>/* Lattice values and their sources are deallocated with their alocation
>   pool.  */
>info->known_csts.release ();
>info->known_contexts.release ();
> +  ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
> +  if (ts != NULL)
> +{
> +  ts->agg_values = NULL;
> +  ts->bits = NULL;
> +  ts->m_vr = NULL;
> +}

I would go for explicit ggc_free for them: garbage collrector is hardly ever 
run during
WPA stage and thus we are not going to get the memory back otherwise.

Patch is OK with that change.

Honza
>  }
>  
>  /* Hook that is called by summary when a node is duplicated.  */
> @@ -3811,8 +3818,10 @@ ipa_node_params_t::duplicate(cgraph_node *src, 
> cgraph_node *dst,
>ipcp_grow_transformations_if_necessary ();
>src_trans = ipcp_get_transformation_summary (src);
>const vec *src_vr = src_trans->m_vr;
> -  vec *_vr
> - = ipcp_get_transformation_summary (dst)->m_vr;
> +  ipcp_transformation_summary *dts = ipcp_get_transformation_summary 
> (dst);
> +  if (!dts->m_vr)
> + dts->m_vr = (new (ggc_cleared_alloc ()) ipa_vr_vec ());
> +  vec *_vr = dts->m_vr;
>if (vec_safe_length (src_trans->m_vr) > 0)
>   {
> vec_safe_reserve_exact (dst_vr, src_vr->length ());
> @@ -3826,8 +3835,10 @@ ipa_node_params_t::duplicate(cgraph_node *src, 
> cgraph_node *dst,
>ipcp_grow_transformations_if_necessary ();
>src_trans = ipcp_get_transformation_summary (src);
>const vec *src_bits = src_trans->bits;
> -  vec *_bits
> - = ipcp_get_transformation_summary (dst)->bits;
> +  ipcp_transformation_summary *dts = ipcp_get_transformation_summary 
> (dst);
> +  if (!dts->bits)
> + dts->bits = (new (ggc_cleared_alloc ()) ipa_bits_vec ());
> +  vec *_bits = dts->bits;
>vec_safe_reserve_exact (dst_bits, src_bits->length ());
>for (unsigned i = 0; i < src_bits->length (); ++i)
> 

Re: [patch] Fix PR middle-end/78468

2017-02-02 Thread Eric Botcazou
> Here's a revised version along these lines, OK for mainline after testing?
> 
> 
>   PR middle-end/78468
>   * emit-rtl.c (init_emit): Add ??? comment for problematic alignment
>   settings of the virtual registers.
> 
>   Revert again
>   2016-08-23  Dominik Vogt  
> 
>   * explow.c (get_dynamic_stack_size): Take known alignment of stack
>   pointer + STACK_DYNAMIC_OFFSET into account when calculating the size
>   needed.

Jakub said in the audit trail that he was also OK with this for GCC 7 so I 
have installed the patch after boostrapping/regtesting on SPARC/Solaris.

-- 
Eric Botcazou


Re: libgomp: Normalize the names of a few functions of the libgomp plugin API (was: libgomp: Provide prototypes for functions implemented by libgomp plugins)

2017-02-02 Thread Jakub Jelinek
On Thu, Feb 02, 2017 at 01:22:37PM +0100, Thomas Schwinge wrote:
> Hi!
> 
> On Tue, 31 Jan 2017 15:44:46 +0100, I wrote:
> > --- libgomp/libgomp.h
> > +++ libgomp/libgomp.h
> > @@ -882,31 +882,35 @@ typedef struct acc_dispatch_t
> 
> > +  __typeof (GOMP_OFFLOAD_openacc_parallel) *exec_func;
> 
> As can be seen here, for a handful of functions, the name of the
> implementation in the plugins ("GOMP_OFFLOAD_openacc_parallel") doesn't
> match the name used in libgomp proper ("openacc.exec_func").  Here is a
> patch to adjust these.  OK for trunk?  If not now, then in next stage 1?

Ok for trunk now, otherwise we'd have to bump the plugin interface in GCC8, even
when we don't know if we'll need to do that.

Jakub


libgomp: Normalize the names of a few functions of the libgomp plugin API (was: libgomp: Provide prototypes for functions implemented by libgomp plugins)

2017-02-02 Thread Thomas Schwinge
Hi!

On Tue, 31 Jan 2017 15:44:46 +0100, I wrote:
> --- libgomp/libgomp.h
> +++ libgomp/libgomp.h
> @@ -882,31 +882,35 @@ typedef struct acc_dispatch_t

> +  __typeof (GOMP_OFFLOAD_openacc_parallel) *exec_func;

As can be seen here, for a handful of functions, the name of the
implementation in the plugins ("GOMP_OFFLOAD_openacc_parallel") doesn't
match the name used in libgomp proper ("openacc.exec_func").  Here is a
patch to adjust these.  OK for trunk?  If not now, then in next stage 1?

commit 4dc0a9c9a6a1d67e298b254d9c1aa7592db5c763
Author: Thomas Schwinge 
Date:   Thu Jan 26 20:18:30 2017 +0100

libgomp: Normalize the names of a few functions of the libgomp plugin API

libgomp/
* libgomp-plugin.h (GOMP_OFFLOAD_openacc_parallel): Rename to
GOMP_OFFLOAD_openacc_exec.  Adjust all users.
(GOMP_OFFLOAD_openacc_get_current_cuda_device): Rename to
GOMP_OFFLOAD_openacc_cuda_get_current_device.  Adjust all users.
(GOMP_OFFLOAD_openacc_get_current_cuda_context): Rename to
GOMP_OFFLOAD_openacc_cuda_get_current_context.  Adjust all users.
(GOMP_OFFLOAD_openacc_get_cuda_stream): Rename to
GOMP_OFFLOAD_openacc_cuda_get_stream.  Adjust all users.
(GOMP_OFFLOAD_openacc_set_cuda_stream): Rename to
GOMP_OFFLOAD_openacc_cuda_set_stream.  Adjust all users.
---
 libgomp/ChangeLog | 11 +++
 libgomp/libgomp-plugin.h  | 12 ++--
 libgomp/libgomp.h | 10 +-
 libgomp/plugin/plugin-nvptx.c | 14 +++---
 libgomp/target.c  | 10 +-
 5 files changed, 34 insertions(+), 23 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
[snipped]
diff --git libgomp/libgomp-plugin.h libgomp/libgomp-plugin.h
index fba45ee..ff81350 100644
--- libgomp/libgomp-plugin.h
+++ libgomp/libgomp-plugin.h
@@ -93,8 +93,8 @@ extern bool GOMP_OFFLOAD_dev2dev (int, void *, const void *, 
size_t);
 extern bool GOMP_OFFLOAD_can_run (void *);
 extern void GOMP_OFFLOAD_run (int, void *, void *, void **);
 extern void GOMP_OFFLOAD_async_run (int, void *, void *, void **, void *);
-extern void GOMP_OFFLOAD_openacc_parallel (void (*) (void *), size_t, void **,
-  void **, int, unsigned *, void *);
+extern void GOMP_OFFLOAD_openacc_exec (void (*) (void *), size_t, void **,
+  void **, int, unsigned *, void *);
 extern void GOMP_OFFLOAD_openacc_register_async_cleanup (void *, int);
 extern int GOMP_OFFLOAD_openacc_async_test (int);
 extern int GOMP_OFFLOAD_openacc_async_test_all (void);
@@ -105,10 +105,10 @@ extern void GOMP_OFFLOAD_openacc_async_wait_all_async 
(int);
 extern void GOMP_OFFLOAD_openacc_async_set_async (int);
 extern void *GOMP_OFFLOAD_openacc_create_thread_data (int);
 extern void GOMP_OFFLOAD_openacc_destroy_thread_data (void *);
-extern void *GOMP_OFFLOAD_openacc_get_current_cuda_device (void);
-extern void *GOMP_OFFLOAD_openacc_get_current_cuda_context (void);
-extern void *GOMP_OFFLOAD_openacc_get_cuda_stream (int);
-extern int GOMP_OFFLOAD_openacc_set_cuda_stream (int, void *);
+extern void *GOMP_OFFLOAD_openacc_cuda_get_current_device (void);
+extern void *GOMP_OFFLOAD_openacc_cuda_get_current_context (void);
+extern void *GOMP_OFFLOAD_openacc_cuda_get_stream (int);
+extern int GOMP_OFFLOAD_openacc_cuda_set_stream (int, void *);
 
 #ifdef __cplusplus
 }
diff --git libgomp/libgomp.h libgomp/libgomp.h
index 6dfe9aa..1769a48 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -882,7 +882,7 @@ typedef struct acc_dispatch_t
   struct target_mem_desc *data_environ;
 
   /* Execute.  */
-  __typeof (GOMP_OFFLOAD_openacc_parallel) *exec_func;
+  __typeof (GOMP_OFFLOAD_openacc_exec) *exec_func;
 
   /* Async cleanup callback registration.  */
   __typeof (GOMP_OFFLOAD_openacc_register_async_cleanup)
@@ -905,12 +905,12 @@ typedef struct acc_dispatch_t
 
   /* NVIDIA target specific routines.  */
   struct {
-__typeof (GOMP_OFFLOAD_openacc_get_current_cuda_device)
+__typeof (GOMP_OFFLOAD_openacc_cuda_get_current_device)
   *get_current_device_func;
-__typeof (GOMP_OFFLOAD_openacc_get_current_cuda_context)
+__typeof (GOMP_OFFLOAD_openacc_cuda_get_current_context)
   *get_current_context_func;
-__typeof (GOMP_OFFLOAD_openacc_get_cuda_stream) *get_stream_func;
-__typeof (GOMP_OFFLOAD_openacc_set_cuda_stream) *set_stream_func;
+__typeof (GOMP_OFFLOAD_openacc_cuda_get_stream) *get_stream_func;
+__typeof (GOMP_OFFLOAD_openacc_cuda_set_stream) *set_stream_func;
   } cuda;
 } acc_dispatch_t;
 
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 4144218..0284c7f 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1922,9 +1922,9 @@ GOMP_OFFLOAD_dev2dev (int ord, void *dst, const void 
*src, size_t n)
 void (*device_run) (int n, void *fn_ptr, void *vars) = 

  1   2   >