RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
-Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

Kumar,

This is not unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantees only a minimum number of bits per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the number of precise bits required by any given benchmark, one may see miscompares.

True. Do you know the exact number of bits that the initial estimate and the subsequent refinement steps add for your micro-arch?

I am not sure about this. I need to check for the cortex-a57 case. But the best thing for the cortex-a57 case is not to use this optimization by default. If we get an -mrecip-sqrt command-line option, then we can add it for gromacs-like applications to get gains. Any thoughts on this?

Regards, Venkat.

Thanks, Philipp.

On 29 Jun 2015, at 10:17, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote:

Hmm, reducing the iterations to 1 step for float and 2 steps for double, I got VE (miscompares) on the following benchmarks: 416.gamess, 453.povray, 454.calculix, 459.GemsFDTD.

Benedikt, I get an ICE for 444.namd with your patch; not sure if something is wrong in my local tree.

Regards, Venkat.

-Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Sunday, June 28, 2015 8:35 PM To: Kumar, Venkataramanan Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote:

I got around ~12% gain with -Ofast -mcpu=cortex-a57.
I get around 11-12% on thunderX with the patch plus the reduced-iterations change (1/2) compared to without the patch.

Thanks, Andrew

Regards, Venkat.

-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich Sent: Thursday, June 25, 2015 9:13 PM To: Kumar, Venkataramanan Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

Kumar, what is the relative gain that you see on Cortex-A57? Thanks, Philipp.

On 25 Jun 2015, at 17:35, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote:

Changing to 1 step for float and 2 steps for double gives better gains now for gromacs on cortex-a57. Regards, Venkat.

-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Thursday, June 25, 2015 4:09 PM To: pins...@gmail.com Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math

Andrew,

This is NOT a win on thunderX, at least for single precision, because the divide and sqrt take the same time as 5 multiplies (estimate and step are multiplies in the thunderX pipeline). Double is 10 multiplies, which is just the same as what the patch does (it is really slightly less than 10; I rounded up). So in the end this is NOT a win at all for thunderX unless we do one less step for both single and double.

Yes, the expected benefit from rsqrt estimation is implementation specific. If one has a better initial rsqrte or an application that can trade precision for execution time, we could offer a command-line option to do only 2 steps for double and 1 step for float, similar to -mrecip-precision for PowerPC. What are your thoughts on that?

Best regards, Benedikt
Re: [AArch64][2/2] Implement -fpic for -mcmodel=small
On 27 June 2015 at 14:49, Jiong Wang jiong.w...@arm.com wrote:

Andreas Schwab writes:
Jiong Wang jiong.w...@arm.com writes:
Andreas Schwab writes:

spawn -ignore SIGHUP /opt/gcc/gcc-20150627/Build/gcc/xgcc -B/opt/gcc/gcc-20150627/Build/gcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -w -fpie -c -o pr65593.o /opt/gcc/gcc-20150627/gcc/testsuite/gcc.c-torture/compile/pr65593.c
/tmp/cc0Pymaf.s: Assembler messages:
/tmp/cc0Pymaf.s:11: Error: unknown relocation modifier at operand 2 -- `ldr x0,[x0,#:gotpage_lo15:a]'
/tmp/cc0Pymaf.s:19: Error: unknown relocation modifier at operand 2 -- `ldr x2,[x0,#:gotpage_lo15:bar]'

Andreas, the binutils patch has been upstreamed already. Please check out the latest binutils code.

That must work with the current binutils.

I see, agreed. Will adopt the current ILP32 feature detection mechanism into -fpic. And some new TLS patches may need a similar check also. Thanks.

Hi Jiong,

I'm not sure I fully understand your answer. I believe you are going to change the testcase, right? I'm seeing whole-toolchain build failures (when building glibc) since you committed this, because of the new relocations. I understand your suggestion is to move to trunk binutils to have support for these new relocations. Is there anything else I could do to avoid changing the binutils version I use?

Thanks, Christophe.

-- Regards, Jiong
[wwwdocs] Knuth's pages have moved, adjust URLs
Applied. Gerald Index: readings.html === RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v retrieving revision 1.239 diff -u -r1.239 readings.html --- readings.html 27 Jun 2015 19:26:46 - 1.239 +++ readings.html 29 Jun 2015 11:30:46 - @@ -209,13 +209,12 @@ [EM-micks]. The number stands for the average of numbers of 14 actual computers very similar to MMIX. The name may also be due to a predecessor appropriately named MIX. - br /MMIX is used in program examples in a - href=http://www-cs-faculty.stanford.edu/~knuth/;Donald E. Knuth/a's - a href=http://www-cs-faculty.stanford.edu/~knuth/taocp.html;The Art + br /MMIX is used in program examples in Donald E. Knuth's + a href=http://www-cs-faculty.stanford.edu/~uno/taocp.html;The Art of Computer Programming/a (ISBN 0-201-89683-4). - br /a href=http://www-cs-faculty.stanford.edu/~knuth/mmix.html;Knuth's - MMIX page/a has more information about MMIX. Knuth also wrote a - a href=http://www-cs-faculty.stanford.edu/~knuth/mmixware.html;book + br /The a href=http://www-cs-faculty.stanford.edu/~uno/mmix.html;MMIX + page/a has more information about MMIX. Knuth also wrote a + a href=http://www-cs-faculty.stanford.edu/~uno/mmixware.html;book specifically about MMIX/a (MMIXware, ISBN 3-540-66938-8). /li Index: gcc-3.1/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-3.1/changes.html,v retrieving revision 1.68 diff -u -r1.68 changes.html --- gcc-3.1/changes.html28 Jun 2014 07:45:10 - 1.68 +++ gcc-3.1/changes.html29 Jun 2015 11:30:46 - @@ -256,7 +256,7 @@ ul liHans-Peter Nilsson has contributed a port to a -href=http://www-cs-faculty.stanford.edu/~knuth/mmix.html;MMIX/a, +href=http://www-cs-faculty.stanford.edu/~uno/mmix.html;MMIX/a, the CPU architecture used in new editions of Donald E. 
Knuth's emThe Art of Computer Programming/em./li Index: projects/prefetch.html === RCS file: /cvs/gcc/wwwdocs/htdocs/projects/prefetch.html,v retrieving revision 1.25 diff -u -r1.25 prefetch.html --- projects/prefetch.html 30 Jan 2011 18:38:35 - 1.25 +++ projects/prefetch.html 29 Jun 2015 11:30:46 - @@ -786,14 +786,14 @@ pa name=ref_11[11]/a emMMIX Op Codes/em, Don Knuth; -a href=http://www-cs-faculty.stanford.edu/~knuth/mmop.html; -http://www-cs-faculty.stanford.edu/~knuth/mmop.html/a./p +a href=http://www-cs-faculty.stanford.edu/~uno/mmop.html; +http://www-cs-faculty.stanford.edu/~uno/mmop.html/a./p pa name=ref_12[12]/a emThe Art of Computer Programming, Fascicle 1: MMIX/em, Don Knuth, Addison Wesley Longman, 2001; a href=http://www-cs-faculty.stanford.edu/~knuth/fasc1.ps.gz; -http://www-cs-faculty.stanford.edu/~knuth/fasc1.ps.gz/a./p +http://www-cs-faculty.stanford.edu/~uno/fasc1.ps.gz/a./p pa name=ref_13[13]/a emPA-RISC 2.0 Instruction Set Architecture/em;
Re: [wwwdocs] Knuth's pages have moved, adjust URLs
Oh, and the patch also removed a link to his home page. Enough links in place already. Gerald
Re: C/C++ PATCH to smarten up -Wswitch-bool (PR c/66322)
On Thu, Jun 25, 2015 at 12:40:09PM -0600, Jeff Law wrote: If this patch is approved, I'd like to backport it even to 5 branch after some time so it's fixed in 5.2. I've committed the patch to trunk. If it's OK with Jakub, it's fine with me. So Jakub, is it ok to backport this to 5.2 after a few days? OK. Thanks for review! Marek
[PATCH] Add canonicalize_funcptr_for_compare target-insn
Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Richard. 2015-06-29 Richard Biener rguent...@suse.de * target-insns.def (canonicalize_funcptr_for_compare): Add. * fold-const.c (build_range_check): Replace uses of HAVE_canonicalize_funcptr_for_compare. (fold_widened_comparison): Likewise. (fold_sign_changed_comparison): Likewise. * dojump.c: Include target.h. (do_compare_and_jump): Replace uses of HAVE_canonicalize_funcptr_for_compare and gen_canonicalize_funcptr_for_compare. * expr.c (do_store_flag): Likewise. Index: gcc/target-insns.def === --- gcc/target-insns.def(revision 225115) +++ gcc/target-insns.def(working copy) @@ -32,3 +32,4 @@ Instructions should be documented in md.texi rather than here. */ DEF_TARGET_INSN (return, (void)) DEF_TARGET_INSN (simple_return, (void)) +DEF_TARGET_INSN (canonicalize_funcptr_for_compare, (rtx x0, rtx x1)) Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 225115) +++ gcc/fold-const.c(working copy) @@ -4488,14 +4488,12 @@ build_range_check (location_t loc, tree { tree etype = TREE_TYPE (exp), value; -#ifdef HAVE_canonicalize_funcptr_for_compare /* Disable this optimization for function pointer expressions on targets that require function pointer canonicalization. */ - if (HAVE_canonicalize_funcptr_for_compare + if (targetm.have_canonicalize_funcptr_for_compare () TREE_CODE (etype) == POINTER_TYPE TREE_CODE (TREE_TYPE (etype)) == FUNCTION_TYPE) return NULL_TREE; -#endif if (! in_p) { @@ -6964,14 +6962,12 @@ fold_widened_comparison (location_t loc, return NULL_TREE; shorter_type = TREE_TYPE (arg0_unw); -#ifdef HAVE_canonicalize_funcptr_for_compare /* Disable this optimization if we're casting a function pointer type on targets that require function pointer canonicalization. 
*/ - if (HAVE_canonicalize_funcptr_for_compare + if (targetm.have_canonicalize_funcptr_for_compare () TREE_CODE (shorter_type) == POINTER_TYPE TREE_CODE (TREE_TYPE (shorter_type)) == FUNCTION_TYPE) return NULL_TREE; -#endif if (TYPE_PRECISION (TREE_TYPE (arg0)) = TYPE_PRECISION (shorter_type)) return NULL_TREE; @@ -7059,14 +7055,12 @@ fold_sign_changed_comparison (location_t arg0_inner = TREE_OPERAND (arg0, 0); inner_type = TREE_TYPE (arg0_inner); -#ifdef HAVE_canonicalize_funcptr_for_compare /* Disable this optimization if we're casting a function pointer type on targets that require function pointer canonicalization. */ - if (HAVE_canonicalize_funcptr_for_compare + if (targetm.have_canonicalize_funcptr_for_compare () TREE_CODE (inner_type) == POINTER_TYPE TREE_CODE (TREE_TYPE (inner_type)) == FUNCTION_TYPE) return NULL_TREE; -#endif if (TYPE_PRECISION (inner_type) != TYPE_PRECISION (outer_type)) return NULL_TREE; Index: gcc/dojump.c === --- gcc/dojump.c(revision 225115) +++ gcc/dojump.c(working copy) @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3. #include predict.h #include basic-block.h #include tm_p.h +#include target.h static bool prefer_and_bit_test (machine_mode, int); static void do_jump_by_parts_greater (tree, tree, int, @@ -1204,13 +1205,12 @@ do_compare_and_jump (tree treeop0, tree unsignedp = TYPE_UNSIGNED (type); code = unsignedp ? unsigned_code : signed_code; -#ifdef HAVE_canonicalize_funcptr_for_compare /* If function pointers need to be canonicalized before they can be reliably compared, then canonicalize them. Only do this if *both* sides of the comparison are function pointers. If one side isn't, we want a noncanonicalized comparison. See PR middle-end/17564. 
*/ - if (HAVE_canonicalize_funcptr_for_compare + if (targetm.have_canonicalize_funcptr_for_compare () TREE_CODE (TREE_TYPE (treeop0)) == POINTER_TYPE TREE_CODE (TREE_TYPE (TREE_TYPE (treeop0))) == FUNCTION_TYPE @@ -1221,13 +1221,12 @@ do_compare_and_jump (tree treeop0, tree rtx new_op0 = gen_reg_rtx (mode); rtx new_op1 = gen_reg_rtx (mode); - emit_insn (gen_canonicalize_funcptr_for_compare (new_op0, op0)); + emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op0, op0)); op0 = new_op0; - emit_insn (gen_canonicalize_funcptr_for_compare (new_op1, op1)); + emit_insn (targetm.gen_canonicalize_funcptr_for_compare (new_op1, op1)); op1 = new_op1; } -#endif do_compare_rtx_and_jump (op0, op1, code, unsignedp, mode, ((mode == BLKmode) Index: gcc/expr.c === --- gcc/expr.c (revision 225115) +++ gcc/expr.c
[AArch64] Fall back to -fPIC if no assembler support for the -fpic relocation modifier
This patch fixes the breakage caused by https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01913.html. We fall back to -fPIC if there is no assembler support for the new relocation modifiers used by -fpic. OK for trunk?

gcc/
* configure.ac: Add check for aarch64 assembler -fpic relocation modifier support.
* configure: Regenerate.
* config.in: Regenerate.
* config/aarch64/aarch64.c (initialize_aarch64_code_model): Fall back to -fPIC if the assembler does not support the -fpic relocation modifiers.

-- Regards, Jiong

diff --git a/gcc/config.in b/gcc/config.in index 3aee936..476d585 100644 --- a/gcc/config.in +++ b/gcc/config.in @@ -606,6 +606,12 @@ #endif +/* Define if your assembler supports relocs needed by -fpic. */ +#ifndef USED_FOR_TARGET +#undef HAVE_AS_SPIC_RELOCS +#endif + + /* Define if your assembler and linker support thread-local storage. */ #ifndef USED_FOR_TARGET #undef HAVE_AS_TLS diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f130f8d..7e09e3b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7581,9 +7581,13 @@ initialize_aarch64_code_model (void) aarch64_cmodel = AARCH64_CMODEL_TINY_PIC; break; case AARCH64_CMODEL_SMALL: +#ifdef HAVE_AS_SPIC_RELOCS aarch64_cmodel = (flag_pic == 2 ? AARCH64_CMODEL_SMALL_PIC : AARCH64_CMODEL_SMALL_SPIC); +#else + aarch64_cmodel = AARCH64_CMODEL_SMALL_PIC; +#endif break; case AARCH64_CMODEL_LARGE: sorry (code model %qs with -f%s, large, diff --git a/gcc/configure b/gcc/configure index 3f3f578..b72cde7 100755 --- a/gcc/configure +++ b/gcc/configure @@ -24228,6 +24228,40 @@ $as_echo #define HAVE_AS_MABI_OPTION 1 confdefs.h done fi fi +# Check if we have binutils support for relocations types needed by -fpic +{ $as_echo $as_me:${as_lineno-$LINENO}: checking assembler for -fpic relocs 5 +$as_echo_n checking assembler for -fpic relocs...
6; } +if test ${gcc_cv_as_aarch64_picreloc+set} = set; then : + $as_echo_n (cached) 6 +else + gcc_cv_as_aarch64_picreloc=no + if test x$gcc_cv_as != x; then +$as_echo ' + .text + ldr x0, [x2, #:gotpage_lo15:globalsym] +' conftest.s +if { ac_try='$gcc_cv_as $gcc_cv_as_flags -o conftest.o conftest.s 5' + { { eval echo \\$as_me\:${as_lineno-$LINENO}: \$ac_try\; } 5 + (eval $ac_try) 25 + ac_status=$? + $as_echo $as_me:${as_lineno-$LINENO}: \$? = $ac_status 5 + test $ac_status = 0; }; } +then + gcc_cv_as_aarch64_picreloc=yes +else + echo configure: failed program was 5 + cat conftest.s 5 +fi +rm -f conftest.o conftest.s + fi +fi +{ $as_echo $as_me:${as_lineno-$LINENO}: result: $gcc_cv_as_aarch64_picreloc 5 +$as_echo $gcc_cv_as_aarch64_picreloc 6; } +if test $gcc_cv_as_aarch64_picreloc = yes; then + +$as_echo #define HAVE_AS_SPIC_RELOCS 1 confdefs.h + +fi # Enable default workaround for AArch64 Cortex-A53 erratum 835769. # Check whether --enable-fix-cortex-a53-835769 was given. if test ${enable_fix_cortex_a53_835769+set} = set; then : diff --git a/gcc/configure.ac b/gcc/configure.ac index 85f72d5..fb7dbfb 100644 --- a/gcc/configure.ac +++ b/gcc/configure.ac @@ -3562,6 +3562,13 @@ case $target in done fi fi +# Check if we have binutils support for relocations types needed by -fpic +gcc_GAS_CHECK_FEATURE([-fpic relocs], gcc_cv_as_aarch64_picreloc,,, +[ + .text + ldr x0, [[x2, #:gotpage_lo15:globalsym]] +],,[AC_DEFINE(HAVE_AS_SPIC_RELOCS, 1, + [Define if your assembler supports relocs needed by -fpic.])]) # Enable default workaround for AArch64 Cortex-A53 erratum 835769. AC_ARG_ENABLE(fix-cortex-a53-835769, [
Re: conditional lim
On Tue, Jun 9, 2015 at 10:11 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi Richard, Here is some explanation. I hope you let me know if I need to clarify something. Also, you asked me about concrete example, to make sure you don’t miss my answer here is the link: https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html. Also, I doubt whether it’s convenient for you to create a build with my patch or not. May be to clarify things you could send me some examples/concrete cases, then I’ll compile them with –fdump-tree-loopinit-details and –fdump-tree-lim-details and send you these dumps. May be these dumps will be useful. (I’ll only disable cleanup_cfg TODO after lim to let you know the exact picture after lim). What do you think? 1. invariantness _dom_walker – 1.1 for each GIMPLE_COND in given bb calls handle_cond_stmt to call for true and false edges handle_branch_edge, which calls SET_TARGET_OF for all bb ‘predicated’ by given GIMPLE_COND. SET_TARGET_OF sets in basic_blocks aux 2 facts: a) this is true or false edge; b) link to cond stmt; Handle_branch_edge works this way: If (cond1) { bb1; if (cond2} { bb2; } Being called for cond1, it sets cond1 as condition for both bb1 and bb2 (the whole branch for cond1, ie also for bb containing cond2), then this method will be called (as there is dominance order) for cond2 to correct things (ie to set cond2 as condition for bb2). Hmm, why not track the current condition as state during the DOM walk and thus avoid processing more than one basic-block in handle_branch_edge? Thus get rid of handle_branch_edge and instead do everything in handle_cond_stmt plus the dom-walkers BB visitor? I need to look more carefully how to implement it, but I think I understand what you mean and this optimization of course looks reasonable to me. Will do. 
I see you don't handle BBs with multiple predecessors - that's ok, but are you sure you don't run into correctness issues when not marking such BBs as predicated? This misses handling of, say, if (a || b) bb; which is a pity (but can be fixed later if desired).

I had a test (in the gcc testsuite or bootstrap build) which worked incorrectly because of multiple predecessors. As far as I remember, the situation was like this (later I will make some notes about such tests to clarify this better); I mean with the previous version of my code, which handled bbs with 2 predecessors:

if (a)
  tmpvar = something;
while ()
  if (a || b)
    basic_block { do something with tmpvar; }  /* a basic block predicated by the bb with a and the bb with b */

So, if a is false (that is, we didn't do tmpvar = something outside the loop) BUT we are in basic_block (we went via the bb with b), we got a segmentation fault in basic_block { do something with tmpvar; }. I think we can reproduce all the details of this test if I remove the handling restriction for bbs with 2 predecessors. So I wouldn't move bbs with 2 predecessors (such a bb is neither an always-executed bb in any loop nor a conditionally executed one; they will not be moved at all). This is my more detailed explanation on this point. Perhaps I didn't understand your question about correctness. Could you repeat it in other words (based on this new clarification)? So I think, according to the current code, it will not be moved. What incorrectness do you mean?

If the block isn't marked as predicated, the question is whether it is handled correctly or assumed to be unconditionally executed.

I note that collecting predicates has similarities to what if-conversion does in tree-if-conv.c (even if its implementation is completely different, of course).

Ok, I'll look at this. But could you please clarify your point? (Should I just take this into account with low priority and look at this later, or do you want some refactoring?)

I just noted similar code exists elsewhere - it may be possible to factor it out, but I didn't investigate.
And no, doing that isn't a prerequisite for this patch.

1.2 As 1.1 runs, we identify whether a given bb is predicated by some condition or not. bb->aux->type will be [TRUE/FALSE]_TARGET_OF and bb->aux->cond_stmt = the cond stmt (the nearest condition). If the bb is always executed, bb->aux->type = ALWAYS_EXECUTED_IN, bb->loop->loop (this info was available in the clean build).

1.3 As this walker is called in dominance order, information about the condition is available when invariantness_dom_walker is called for a given bb. So we can make some computations based on the bb->aux structure. This is done in check_conditionally_executed. The main goal of this method is to check that the root condition is always executed in the loop. I did so to avoid situations like this:

Loop:
  Jmp somewhere;
  If (cond1)
    If (cond2)
Re: Fix PR43404, PR48470, PR64744 ICE on naked functions
I've updated the patch with an attribute lookup. Is it OK? -- Alexander

2015-06-26 9:33 GMT+03:00 Alexander Basov coo...@gmail.com:
2015-06-25 21:47 GMT+03:00 Jeff Law l...@redhat.com:
On 06/03/2015 02:15 PM, Alexander Basov wrote:

Hello Jeff, please find the updated patch attached.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index b190f91..c6db8a9 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -1382,7 +1382,15 @@ expand_one_var (tree var, bool toplevel, bool really_expand) else { if (really_expand) -expand_one_stack_var (origvar); +{ + if (!targetm.calls.allocate_stack_slots_for_args ()) +error (cannot allocate stack for variable %q+D, naked function., + var); + + expand_one_stack_var (origvar); +}

So how do you know ORIGVAR is an argument here before issuing the error? I.e., shouldn't you verify that the underlying object is a PARM_DECL? If there's some way we already know we're dealing with a PARM_DECL, then just say so.

In the case of a naked function the stack should not be used, not only for function args but also for any local variables. So I think we don't need to check whether the underlying object is a PARM_DECL.

Then that would indicate that we're using the wrong test (allocate_stack_slots_for_args). That hook is for whether or not arguments should have stack slots allocated. Yet you're issuing an error for more than just PARM_DECLs. Shouldn't you instead be checking whether the current function is a naked function by checking the attributes of the current function? Jeff

All allocate_stack_slots_for_args does is check whether the current function is naked or not. Maybe it would be better to remove allocate_stack_slots_for_args and replace it with an explicit check for the naked attribute? -- Alexander

commit 3a72dac72beb713ab6a566728b77c4da6d297755 Author: Alexander Basov coo...@gmail.com Date: Tue Mar 10 14:15:24 2015 +0300

PR middle-end/64744 PR middle-end/48470 PR middle-end/43404
* gcc/cfgexpand.c (expand_one_var): Add check if stack is going to be used in naked function.
* gcc/expr.c (expand_expr_addr_expr_1): Remove excess checking whether expression should not reside in MEM.
* gcc/function.c (use_register_for_decl): Do not use registers for non-register things (volatile, float, BLKMode) in naked functions.
* gcc/testsuite/gcc.target/arm/pr43404.c: New testcase.
* gcc/testsuite/gcc.target/arm/pr48470.c: New testcase.
* gcc/testsuite/gcc.target/arm/pr64744-1.c: New testcase.
* gcc/testsuite/gcc.target/arm/pr64744-2.c: New testcase.

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index 05eb2ad..b7b4804 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -1349,7 +1349,16 @@ expand_one_var (tree var, bool toplevel, bool really_expand) else { if (really_expand) -expand_one_stack_var (origvar); +{ + if (lookup_attribute (naked, +DECL_ATTRIBUTES (current_function_decl))) +error (cannot allocate stack for variable %q+D, naked function., + var); + + expand_one_stack_var (origvar); +} + + return tree_to_uhwi (DECL_SIZE_UNIT (var)); } return 0; diff --git a/gcc/expr.c b/gcc/expr.c index 408ae1a..34cd7de 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -7631,15 +7631,7 @@ expand_expr_addr_expr_1 (tree exp, rtx target, machine_mode tmode, marked TREE_ADDRESSABLE, which will be either a front-end or a tree optimizer bug. */ - if (TREE_ADDRESSABLE (exp) - ! MEM_P (result) - ! targetm.calls.allocate_stack_slots_for_args ()) - { - error (local frame unavailable (naked function?)); - return result; - } - else - gcc_assert (MEM_P (result)); + gcc_assert (MEM_P (result)); result = XEXP (result, 0); /* ??? Is this needed anymore? */ diff --git a/gcc/function.c b/gcc/function.c index cffe323..0866c49 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -2110,9 +2110,6 @@ aggregate_value_p (const_tree exp, const_tree fntype) bool use_register_for_decl (const_tree decl) { - if (!targetm.calls.allocate_stack_slots_for_args ()) -return true; - /* Honor volatile.
*/ if (TREE_SIDE_EFFECTS (decl)) return false; @@ -2140,6 +2137,9 @@ use_register_for_decl (const_tree decl) if (flag_float_store FLOAT_TYPE_P (TREE_TYPE (decl))) return false; + if (!targetm.calls.allocate_stack_slots_for_args ()) +return true; + /* If we're not interested in tracking debugging information for this decl, then we can certainly put it in a register. */ if (DECL_IGNORED_P (decl)) diff --git a/gcc/testsuite/gcc.target/arm/pr43404.c b/gcc/testsuite/gcc.target/arm/pr43404.c new file mode 100644 index 000..4f2291d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr43404.c @@ -0,0 +1,10
Re: [PATCH] Move X - (X / Y) * Y folding to match.pd
On Mon, 29 Jun 2015, Marek Polacek wrote: On Mon, Jun 29, 2015 at 09:36:59AM +0200, Richard Biener wrote: Anything wrong with this? +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1 That looks awfully similar to a variant I also tried (but I remember having convert1? and convert2? in it). Not sure what was wrong with that one; certainly yours seems to work fine. Not sure whether we need some tree_nop_conversion_p in it. Perhaps not. Yes. Eventually even (convert? (mult (convert1? (trunc_div ...)? Of course with matching @0 between the two operands of the minus you constrain types quite a bit. I'm starting to dislike this whole convert business ;). ;) fold-const.c STRIP_NOPS certainly was both convenient and error-prone at the same time. I'd say just single-step through fold and see what types it get present when folding a - (unsigned) ((a / b) * b). I did that. The whole expression has type unsigned int, arg0 is a of type int and arg1 is (a / b) * b of type int. The following version is what Marc suggests. Bootstrapped/regtested on x86_64-linux, ok for trunk? Ok. Thanks, Richard. 2015-06-29 Marek Polacek pola...@redhat.com Marc Glisse marc.gli...@inria.fr * fold-const.c (fold_binary_loc): Move X - (X / Y) * Y - X % Y to ... * match.pd: ... pattern here. diff --git gcc/fold-const.c gcc/fold-const.c index 6f12dd0..01e3983 100644 --- gcc/fold-const.c +++ gcc/fold-const.c @@ -10509,19 +10509,6 @@ fold_binary_loc (location_t loc, fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0))); - /* X - (X / Y) * Y is X % Y. 
*/ - if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) -TREE_CODE (arg1) == MULT_EXPR -TREE_CODE (TREE_OPERAND (arg1, 0)) == TRUNC_DIV_EXPR -operand_equal_p (arg0, - TREE_OPERAND (TREE_OPERAND (arg1, 0), 0), 0) -operand_equal_p (TREE_OPERAND (TREE_OPERAND (arg1, 0), 1), - TREE_OPERAND (arg1, 1), 0)) - return - fold_convert_loc (loc, type, - fold_build2_loc (loc, TRUNC_MOD_EXPR, TREE_TYPE (arg0), - arg0, TREE_OPERAND (arg1, 1))); - if (! FLOAT_TYPE_P (type)) { /* Fold A - (A B) into ~B A. */ diff --git gcc/match.pd gcc/match.pd index b2f8429..2bc158b 100644 --- gcc/match.pd +++ gcc/match.pd @@ -238,6 +238,12 @@ along with GCC; see the file COPYING3. If not see tree_nop_conversion_p (type, TREE_TYPE (@1))) (trunc_mod @0 (convert @1 +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1 + /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR, i.e. X % C into X (C - 1), if X and C are positive. Also optimize A % (C N) where C is a power of 2, Marek -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
Re: Debug mode enhancements
On 12/06/15 19:11 +0200, François Dumont wrote:

Hi,

This is a patch to:
- Enhance __get_distance to give better feedback about the distance between iterators, so that we can make sharper decisions about what is right or not. This function is now aware of safe iterators and leverages them a little like std::distance does with C++11 list::iterator.
- Make debug mode aware of the iterator adapters reverse_iterator and move_iterator.
- Thanks to the previous points, this patch also extends the situations where it is possible to remove debug layers on iterators, to lower the performance impact of this mode. We now detect at runtime whether we know enough about the iterator range to get rid of the potential debug layer.

For the last point I introduced __gnu_debug::__unsafe, which removes the debug layer unconditionally, as opposed to __gnu_debug::__base, which does so only for random access iterators. The latter has been kept to be used in the context of constructors. I had to introduce new debug headers to limit the impact on stl_iterator.h. We shall not include debug.h there, as the purpose is not to inject debug checks into the normal code.

Note that the new __get_distance will be very useful for implementing proper debug algos.

I didn't check every line of this, but it looks good and I trust your judgment for debug mode changes, so it's OK for trunk, thanks.
[PATCH, PR66652] Use max_loop_iterations in transform_to_exit_first_loop_alt
Hi, this patch fixes PR66652. It uses max_loop_iterations in transform_to_exit_first_loop_alt to ensure that the new loop bound nit + 1 doesn't overflow. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom Use max_loop_iterations in transform_to_exit_first_loop_alt 2015-06-29 Tom de Vries t...@codesourcery.com PR tree-optimization/66652 * tree-parloops.c (try_transform_to_exit_first_loop_alt): Use max_loop_iterations to determine if nit + 1 overflows. * testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c (f): Rewrite using restrict pointers. (main): Add arguments to calls to f. * testsuite/libgomp.c/parloops-exit-first-loop-alt.c: Same. * gcc.dg/parloops-exit-first-loop-alt-pr66652.c: New test. * gcc.dg/parloops-exit-first-loop-alt-3.c (f): Rewrite using restrict pointers. * gcc.dg/parloops-exit-first-loop-alt.c: Same. --- .../gcc.dg/parloops-exit-first-loop-alt-3.c| 2 +- .../gcc.dg/parloops-exit-first-loop-alt-pr66652.c | 31 ++ .../gcc.dg/parloops-exit-first-loop-alt.c | 19 + gcc/tree-parloops.c| 31 ++ .../libgomp.c/parloops-exit-first-loop-alt-3.c | 6 ++--- .../libgomp.c/parloops-exit-first-loop-alt.c | 7 ++--- 6 files changed, 77 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c index b0fde37..fec53a1 100644 --- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c @@ -7,7 +7,7 @@ unsigned int *a; unsigned int -f (unsigned int n) +f (unsigned int n, unsigned int *__restrict__ a) { int i; unsigned int sum = 1; diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c new file mode 100644 index 000..2ea097d --- /dev/null +++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-pr66652.c @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { 
dg-require-effective-target pthread } */
+/* { dg-options "-O2 -ftree-parallelize-loops=2 -fdump-tree-parloops" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <limits.h>
+
+unsigned int
+f (unsigned int n, unsigned int sum)
+{
+  unsigned int i;
+
+  i = UINT_MAX;
+  do
+    {
+      sum += i % 13;
+      i++;
+    }
+  while (i < n - 1);
+
+  return sum;
+}
+
+/* Four times % 13:
+   - once in f._loopfn.0
+   - once in the parallel
+   - once in the low iteration count loop
+   - once for a peeled off last iteration following the parallel.
+   In other words, we want try_transform_to_exit_first_loop_alt to fail.  */
+/* { dg-final { scan-tree-dump-times "(?n)% 13" 4 "parloops" } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
index b36f01b..e088fa1 100644
--- a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
@@ -4,14 +4,9 @@
 /* Variable bound, vector addition.  */
-#define N 1000
-
-unsigned int a[N];
-unsigned int b[N];
-unsigned int c[N];
-
 void
-f (unsigned int n)
+f (unsigned int n, unsigned int *__restrict__ a, unsigned int *__restrict__ b,
+   unsigned int *__restrict__ c)
 {
   int i;
 
@@ -19,9 +14,9 @@ f (unsigned int n)
     c[i] = a[i] + b[i];
 }
 
-/* Three times a store:
-   - one in f._loopfn.0
-   - one in the parallel
-   - one in the low iteration count loop
    Crucially, none for a peeled off last iteration following the parallel.
 */
-/* { dg-final { scan-tree-dump-times "(?n)\\\[i" 9 "parloops" } } */
+/* { dg-final { scan-tree-dump-times "(?n)^  \\*_\[0-9\]*" 3 "parloops" } } */
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index cb0ba54..32d059a 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -1795,8 +1795,39 @@ try_transform_to_exit_first_loop_alt (struct loop *loop,
   gcc_assert (TREE_CODE (nit) == SSA_NAME);
 
+  /* Variable nit is the loop bound as returned by canonicalize_loop_ivs, for an
+     iv with base 0 and step 1 that is incremented in the latch, like this:
+
+     <bb header>:
+     # iv_1 = PHI <0 (preheader), iv_2 (latch)>
+     ...
+     if (iv_1 < nit)
+       goto <bb latch>;
+     else
+       goto <bb exit>;
+
+     <bb latch>:
+     iv_2 = ivtmp_1 + 1;
+     goto <bb header>;
+
+     The range of iv_1 is [0, nit].  The latch edge is taken for
+     iv_1 in [0, nit - 1] and the exit edge is taken for iv_1 == nit.  So the
+     number of latch executions is equal to nit.
+
+     The function max_loop_iterations gives us the maximum number of latch
+     executions, so it gives us the maximum value of nit.  */
+  widest_int nit_max;
+  if (!max_loop_iterations
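The situation the patch guards against is ordinary unsigned wrap-around: the alternative loop form needs nit + 1 iterations, and for nit == UINT_MAX that bound wraps to 0. A minimal illustration (the helper name is made up, not from the patch):

```cpp
#include <cassert>
#include <climits>

// The transformed loop would run nit + 1 times.  When nit is already the
// maximum value of its unsigned type, nit + 1 wraps to 0, so the transform
// must be rejected unless max_loop_iterations proves nit cannot be UINT_MAX.
unsigned int alt_loop_bound (unsigned int nit)
{
  return nit + 1;  // well-defined unsigned wrap-around
}
```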
Re: [PATCH 3/3][ARM][PR target/65697] Add tests for __sync builtins.
On Mon, Jun 22, 2015 at 10:52 AM, Matthew Wahab matthew.wa...@arm.com wrote: This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html). This patch adds tests for the code generated by the ARM backend for the __sync builtins. Tested the series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/testsuite 2015-06-22 Matthew Wahab matthew.wa...@arm.com PR Target/65697 And as earlier ... s/Target/target :) * gcc.target/arm/armv8-sync-comp-swap.c: New. * gcc.target/arm/armv8-sync-op-acquire.c: New. * gcc.target/arm/armv8-sync-op-full.c: New. * gcc.target/arm/armv8-sync-op-release.c: New. OK. Ramana
[RFC, PATCH] Split pool_allocator and create a new object_allocator
Hello. This mail thread is follow-up from the previous thread, where Ulrich Weigand spotted a miscompilation: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00868.html Following patch changes behavior of the pool_allocator: 1) pool_allocator is typeless and does not call any ctor 2) pool_allocator returns void* as a result of allocate function 3) all classes (e.g. et_occ) which use an operator new utilize the typeless allocator 4) object_allocator is a new type-based allocator which is used for allocation of objects 5) for a type T, the allocator calls placement new constructor and T* is return type of the allocate function 6) the object_allocator internally utilizes pool_allocator with m_size == sizeof(T) The patch can bootstrap on x86_64-unknown-linux-gnu and ppc64le-unknown-linux-gnu and regression tests have been running. Thoughts? Thanks, Martin From 9702a6ea0b60985f08ff28b0977c1dc46c25f27b Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Wed, 24 Jun 2015 13:42:52 +0200 Subject: [PATCH] Add new object_allocator. gcc/c-family/ChangeLog: 2015-06-24 Martin Liska mli...@suse.cz * c-format.c (static void check_format_info_main): Use object_allocator instead of pool_allocator. (check_format_arg): Likewise. (check_format_info_main): Likewise. gcc/ChangeLog: 2015-06-24 Martin Liska mli...@suse.cz * alloc-pool.h (pool_allocator::initialize): Change to type-less pool_allocator. (pool_allocator::allocate): Likewise. (pool_allocator::remove): Likewise. * asan.c (struct asan_mem_ref): Likewise. * cfg.c (initialize_original_copy_tables): Use object_allocator instead of pool_allocator. * cselib.c (struct elt_list): Change declaration of used pool_allocator. (new_cselib_val): Likewise. * cselib.h (struct cselib_val): Change to type-less pool_allocator.. (struct elt_loc_list): Likewise. * df-problems.c (df_chain_alloc): Use object_allocator instead of pool_allocator.. * df-scan.c (struct df_scan_problem_data): Likewise. (df_scan_alloc): Likewise. 
* df.h (struct dataflow): Likewise. * dse.c (struct read_info_type): Change to type-less pool_allocator. (struct insn_info_type): Likewise. (struct dse_bb_info_type): Likewise. (struct group_info): Likewise. (struct deferred_change): Likewise. * et-forest.c (struct et_occ): Likewise. * et-forest.h (struct et_node): Likewise. * ipa-cp.c: Use object_allocator instead of pool_allocator. * ipa-inline-analysis.c: Likewise. * ipa-profile.c: Likewise. * ipa-prop.c: Likewise. * ipa-prop.h: Likewise. * ira-build.c (initiate_cost_vectors): Change to type-less pool_allocator. (ira_allocate_cost_vector): Likewise. * ira-color.c (struct update_cost_record): Use object_allocator instead of pool_allocator. * lra-int.h (struct lra_live_range): Change to type-less pool_allocator. (struct lra_copy): Likewise. (struct lra_insn_reg): Likewise. * lra-lives.c: Likewise. * lra.c: Likewise. * regcprop.c (struct queued_debug_insn_change): Use object_allocator instead of pool_allocator. * sched-deps.c (sched_deps_init): Likewise. * sel-sched-ir.c: Likewise. * sel-sched-ir.h: Likewise. * stmt.c (expand_case): Likewise. (expand_sjlj_dispatch_table): Likewise. * tree-sra.c (struct access): Change to type-less pool_allocator. (struct assign_link): Likewise. * tree-ssa-math-opts.c (pass_cse_reciprocals::execute): Use object_allocator instead of pool_allocator. * tree-ssa-pre.c: Likewise. * tree-ssa-reassoc.c: Likewise. * tree-ssa-sccvn.c (allocate_vn_table): Likewise. * tree-ssa-strlen.c: Likewise. * tree-ssa-structalias.c: Likewise. * var-tracking.c (onepart_pool_allocate): Change to type-less pool_allocator. (unshare_variable): Likewise. (variable_merge_over_cur): Likewise. (variable_from_dropped): Likewise. (variable_was_changed): Likewise. (set_slot_part): Likewise. (emit_notes_for_differences_1): Likewise. 
--- gcc/alloc-pool.h | 176 + gcc/asan.c | 8 +-- gcc/c-family/c-format.c| 7 +- gcc/cfg.c | 5 +- gcc/cselib.c | 17 +++-- gcc/cselib.h | 12 ++-- gcc/df-problems.c | 2 +- gcc/df-scan.c | 24 +++ gcc/df.h | 2 +- gcc/dse.c | 50 +++-- gcc/et-forest.c| 11 ++- gcc/et-forest.h| 6 +- gcc/ipa-cp.c | 8 +-- gcc/ipa-inline-analysis.c | 2 +- gcc/ipa-profile.c | 2 +- gcc/ipa-prop.c | 2 +- gcc/ipa-prop.h | 8 +-- gcc/ira-build.c| 18 ++--- gcc/ira-color.c| 8 +-- gcc/lra-int.h | 19 +++-- gcc/lra-lives.c| 3 +- gcc/lra.c | 4 +- gcc/regcprop.c | 4 +- gcc/sched-deps.c | 8 +-- gcc/sel-sched-ir.c | 2 +- gcc/sel-sched-ir.h | 2 +- gcc/stmt.c | 7 +- gcc/tree-sra.c | 16 ++---
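In outline, the proposed split looks roughly like the sketch below. This is a simplification for illustration: the names mirror the patch's description (points 1-6 above), but the real pool_allocator manages free lists and block chains rather than calling operator new per chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// (1)(2) pool_allocator is typeless: it hands out raw fixed-size chunks,
// never runs a constructor, and allocate returns void *.
class pool_allocator
{
public:
  explicit pool_allocator (size_t size) : m_size (size) {}

  void *allocate ()
  {
    void *p = ::operator new (m_size);
    m_blocks.push_back (p);  // stand-in for the real free-list/block logic
    return p;
  }

  void remove (void *) { /* return chunk to the free list; elided here */ }

  ~pool_allocator ()
  {
    for (void *p : m_blocks)
      ::operator delete (p);
  }

private:
  size_t m_size;
  std::vector<void *> m_blocks;
};

// (4)(5)(6) object_allocator layers type knowledge on top: it wraps a
// pool_allocator with m_size == sizeof (T) and placement-news on allocate.
template <typename T>
class object_allocator
{
public:
  object_allocator () : m_pool (sizeof (T)) {}

  T *allocate () { return ::new (m_pool.allocate ()) T (); }

  void remove (T *p)
  {
    p->~T ();
    m_pool.remove (p);
  }

private:
  pool_allocator m_pool;
};
```

Classes that overload operator new (point 3, e.g. et_occ) would keep using the typeless allocator directly, since operator new must return raw storage anyway.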
Re: [PATCH] Move X - (X / Y) * Y folding to match.pd
On Mon, Jun 29, 2015 at 09:36:59AM +0200, Richard Biener wrote: Anything wrong with this? +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1)))) That looks awfully similar to a variant I also tried (but I remember having convert1? and convert2? in it). Not sure what was wrong with that one; certainly yours seems to work fine. Not sure whether we need some tree_nop_conversion_p in it. Perhaps not. Yes. Eventually even (convert? (mult (convert1? (trunc_div ...)? Of course with matching @0 between the two operands of the minus you constrain types quite a bit. I'm starting to dislike this whole convert business ;). I'd say just single-step through fold and see what types it gets presented when folding a - (unsigned) ((a / b) * b). I did that. The whole expression has type unsigned int, arg0 is a of type int and arg1 is (a / b) * b of type int. The following version is what Marc suggests. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-06-29 Marek Polacek pola...@redhat.com Marc Glisse marc.gli...@inria.fr * fold-const.c (fold_binary_loc): Move X - (X / Y) * Y -> X % Y to ... * match.pd: ... pattern here. diff --git gcc/fold-const.c gcc/fold-const.c index 6f12dd0..01e3983 100644 --- gcc/fold-const.c +++ gcc/fold-const.c @@ -10509,19 +10509,6 @@ fold_binary_loc (location_t loc, fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0))); - /* X - (X / Y) * Y is X % Y. */ - if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) - && TREE_CODE (arg1) == MULT_EXPR - && TREE_CODE (TREE_OPERAND (arg1, 0)) == TRUNC_DIV_EXPR - && operand_equal_p (arg0, - TREE_OPERAND (TREE_OPERAND (arg1, 0), 0), 0) - && operand_equal_p (TREE_OPERAND (TREE_OPERAND (arg1, 0), 1), - TREE_OPERAND (arg1, 1), 0)) - return - fold_convert_loc (loc, type, - fold_build2_loc (loc, TRUNC_MOD_EXPR, TREE_TYPE (arg0), -arg0, TREE_OPERAND (arg1, 1))); - if (!
FLOAT_TYPE_P (type)) { /* Fold A - (A & B) into ~B & A. */ diff --git gcc/match.pd gcc/match.pd index b2f8429..2bc158b 100644 --- gcc/match.pd +++ gcc/match.pd @@ -238,6 +238,12 @@ along with GCC; see the file COPYING3. If not see tree_nop_conversion_p (type, TREE_TYPE (@1))) (trunc_mod @0 (convert @1 +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1)))) + /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR, i.e. X % C into X & (C - 1), if X and C are positive. Also optimize A % (C << N) where C is a power of 2, Marek
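The identity behind the pattern is easy to sanity-check for truncating integer division (C/C++ semantics; the function name below is just for illustration):

```cpp
#include <cassert>

// X - (X / Y) * Y with truncating division is exactly X % Y, which is
// what the rule (minus @0 (mult (trunc_div @0 @1) @1)) -> (trunc_mod @0 @1)
// encodes.  Since C99/C++11 both / and % truncate toward zero, so the
// identity also holds for negative operands.
int mod_via_div (int x, int y)
{
  return x - (x / y) * y;
}
```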
[committed] Simplify structure try_transform_to_exit_first_loop_alt
Hi, this patch simplifies the structure of function try_transform_to_exit_first_loop_alt, no functional changes. Bootstrapped on x86_64, retested parloops-exit-first-loop-alt*.* testcases. Committed as trivial. Thanks, - Tom Simplify structure try_transform_to_exit_first_loop_alt 2015-06-29 Tom de Vries t...@codesourcery.com * tree-parloops.c (try_transform_to_exit_first_loop_alt): Simplify function structure. --- gcc/tree-parloops.c | 59 ++--- 1 file changed, 15 insertions(+), 44 deletions(-) diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index ab77f32..ec708c6 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1788,60 +1788,31 @@ try_transform_to_exit_first_loop_alt (struct loop *loop, nit, build_one_cst (nit_type)); gcc_assert (TREE_CODE (alt_bound) == INTEGER_CST); + transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound); + return true; } else { /* Todo: Figure out if we can trigger this, if it's worth to handle optimally, and if we can handle it optimally. 
*/ + return false; } } - else -{ - gcc_assert (TREE_CODE (nit) == SSA_NAME); - - gimple def = SSA_NAME_DEF_STMT (nit); - - if (def - && is_gimple_assign (def) - && gimple_assign_rhs_code (def) == PLUS_EXPR) - { - tree op1 = gimple_assign_rhs1 (def); - tree op2 = gimple_assign_rhs2 (def); - if (integer_minus_onep (op1)) - alt_bound = op2; - else if (integer_minus_onep (op2)) - alt_bound = op1; - } - /* There is a number of test-cases for which we don't get an alt_bound - here: they're listed here, with the lhs of the last stmt as the nit: + gcc_assert (TREE_CODE (nit) == SSA_NAME); - libgomp.graphite/force-parallel-1.c: - _21 = (signed long) N_6(D); - _19 = _21 + -1; - _7 = (unsigned long) _19; + gimple def = SSA_NAME_DEF_STMT (nit); - libgomp.graphite/force-parallel-2.c: - _33 = (signed long) N_9(D); - _16 = _33 + -1; - _37 = (unsigned long) _16; - - libgomp.graphite/force-parallel-5.c: - <bb 6>: - # graphite_IV.5_46 = PHI <0(5), graphite_IV.5_47(11)> - <bb 7>: - _33 = (unsigned long) graphite_IV.5_46; - - g++.dg/tree-ssa/pr34355.C: - _2 = (unsigned int) i_9; - _3 = 4 - _2; - - gcc.dg/pr53849.c: - _5 = d.0_11 + -2; - _18 = (unsigned int) _5; - - We will be able to handle some of these cases, if we can determine when - it's safe to look past casts. */ + if (def + && is_gimple_assign (def) + && gimple_assign_rhs_code (def) == PLUS_EXPR) +{ + tree op1 = gimple_assign_rhs1 (def); + tree op2 = gimple_assign_rhs2 (def); + if (integer_minus_onep (op1)) + alt_bound = op2; + else if (integer_minus_onep (op2)) + alt_bound = op1; } if (alt_bound == NULL_TREE) -- 1.9.1
Re: conditional lim
On Mon, Jun 29, 2015 at 5:10 PM, Richard Biener richard.guent...@gmail.com wrote: On Tue, Jun 9, 2015 at 10:11 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener richard.guent...@gmail.com wrote: On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova evgeniya.maenk...@gmail.com wrote: Hi Richard, Here is some explanation. I hope you let me know if I need to clarify something. Also, you asked me about a concrete example; to make sure you don't miss my answer, here is the link: https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html. Also, I doubt whether it's convenient for you to create a build with my patch or not. Maybe to clarify things you could send me some examples/concrete cases, then I'll compile them with -fdump-tree-loopinit-details and -fdump-tree-lim-details and send you these dumps. Maybe these dumps will be useful. (I'll only disable the cleanup_cfg TODO after lim to let you know the exact picture after lim.) What do you think? 1. invariantness_dom_walker: 1.1 for each GIMPLE_COND in a given bb, calls handle_cond_stmt to call, for the true and false edges, handle_branch_edge, which calls SET_TARGET_OF for all bbs 'predicated' by the given GIMPLE_COND. SET_TARGET_OF sets in basic_blocks aux 2 facts: a) this is a true or false edge; b) a link to the cond stmt. handle_branch_edge works this way: if (cond1) { bb1; if (cond2) { bb2; } } Being called for cond1, it sets cond1 as condition for both bb1 and bb2 (the whole branch for cond1, i.e. also for the bb containing cond2); then this method will be called (as there is dominance order) for cond2 to correct things (i.e. to set cond2 as condition for bb2). Hmm, why not track the current condition as state during the DOM walk and thus avoid processing more than one basic-block in handle_branch_edge? Thus get rid of handle_branch_edge and instead do everything in handle_cond_stmt plus the dom-walkers BB visitor?
I need to look more carefully how to implement it, but I think I understand what you mean and this optimization of course looks reasonable to me. Will do. I see you don't handle BBs with multiple predecessors - that's ok, but are you sure you don't run into correctness issues when not marking such BBs as predicated? This misses handling of, say if (a || b) bb; which is a pity (but can be fixed later if desired). I had some test (in the gcc testsuite or bootstrap build) which worked incorrectly because of multiple predecessors. As far as I remember the situation was (further, I will make some notes about such tests to clarify this better), I mean with the previous version of my code which handled bbs with 2 predecessors: if (a) tmpvar=something; while() if (a || b) basic_block {do something with tmpvar;} // I mean the basic block predicated by the bb with a and the bb with b So, if a is false, I mean we didn't do tmpvar=something (outside the loop), BUT we are in basic_block (we went by the bb with b), we got a Segmentation fault in basic_block {do something with tmpvar;}. I think we can reproduce all the details of this test if I remove the not-handling of bbs with 2 predecessors. So I wouldn't move bbs with 2 predecessors (this is not an always executed bb in any loop, nor conditionally; they will not be moved at all). This is my more detailed explanation on this point. Perhaps I didn't understand your question about correctness. Could you repeat it in other words (based on this new clarification)? So I think according to the current code it will not be moved. What incorrectness do you mean? If the block isn't marked as predicated the question is whether it is handled correctly or assumed to be unconditionally executed. I note that collecting predicates has similarities to what if-conversion does in tree-ifcvt.c (even if its implementation is completely different, of course). Ok, I'll look at this. But could you please clarify your point?
(Should I just take this into account with low priority and look at this later or do you want some refactoring?) I just noted similar code exists elsewhere - it may be possible to factor it out but I didn't investigate. And no, doing that isn't a prerequisite for this patch. 1.2 As 1.1 goes we identify whether some bb is predicated by some condition or not. bb->aux->type will be [TRUE/FALSE]_TARGET_OF and bb->aux->cond_stmt = cond stmt (the nearest condition). If bb is always executed, bb->aux->type = ALWAYS_EXECUTED_IN, bb->aux->loop = loop (this info was available in the clean build). 1.3 As this walker is called in dominance order, information about the condition is available when invariantness_dom_walker is called for a given bb. So we can make some computations based on the bb->aux structure. This is done in check_conditionally_executed. The main goal of this method is to check that the root condition is always executed in the loop. I did so to
Re: [PATCH 2/3][ARM][PR target/65697] Strengthen barriers for compare-and-swap builtin.
On Mon, Jun 22, 2015 at 10:50 AM, Matthew Wahab matthew.wa...@arm.com wrote: This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html). This patch changes the code generated for __sync_type_compare_and_swap to remove the acquire-barrier from the load and end the operation with a fence. This also strengthens the acquire barrier generated for __sync_lock_test_and_set which, like compare-and-swap, is implemented as a form of atomic exchange. Tested as part of a series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com PR Target/65697 s/Target/target * config/arm/arm.c (arm_split_compare_and_swap): For ARMv8, replace an initial acquire barrier with a final full barrier. OK. Ramana
Re: [PATCH 1/3][ARM][PR target/65697] Strengthen memory barriers for __sync builtins
On Mon, Jun 22, 2015 at 10:48 AM, Matthew Wahab matthew.wa...@arm.com wrote: This is the ARM version of the patches to strengthen memory barriers for the __sync builtins on ARMv8 targets (https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01989.html). The problem is that the barriers generated for the __sync builtins for ARMv8 targets are too weak. This affects the full and the acquire barriers in the __sync fetch-and-op, compare-and-swap functions and __sync_lock_test_and_set. This patch series changes the code to strengthen the barriers by replacing initial load-acquires with a simple load and adding a final memory barrier to prevent code hoisting. - Full barriers: __sync_fetch_and_op, __sync_op_and_fetch __sync_*_compare_and_swap [load-acquire; code; store-release] becomes [load; code; store-release; barrier]. - Acquire barriers: __sync_lock_test_and_set [load-acquire; code; store] becomes [load; code; store; barrier] This patch changes the code generated for __sync_fetch_and_op and __sync_op_and_fetch builtins. Tested as part of a series for arm-none-linux-gnueabihf with check-gcc. Ok for trunk? Matthew gcc/ 2015-06-22 Matthew Wahab matthew.wa...@arm.com PR Target/65697 s/Target/target * config/arm/arm.c (arm_split_atomic_op): For ARMv8, replace an initial acquire barrier with a final full barrier. Otherwise OK. Ramana
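For reference, the source-level builtins whose barriers this series strengthens are the legacy __sync GCC extensions; a minimal sketch of their documented semantics (illustration only, not the patch's code):

```cpp
#include <cassert>

// __sync_fetch_and_add: full barrier, returns the old value.
// __sync_bool_compare_and_swap: full barrier, returns whether it stored.
// __sync_lock_test_and_set: acquire barrier, atomic exchange (some targets
// only guarantee storing the constant 1, so 1 is used here).
// __sync_lock_release: release barrier, stores 0.
int sync_demo ()
{
  int v = 1;
  int old = __sync_fetch_and_add (&v, 2);             // old = 1, v = 3
  bool ok = __sync_bool_compare_and_swap (&v, 3, 5);  // stored, v = 5
  int prev = __sync_lock_test_and_set (&v, 1);        // prev = 5, v = 1
  __sync_lock_release (&v);                           // v = 0
  return old * 100 + (ok ? 10 : 0) + prev;            // 100 + 10 + 5
}
```

The series' point is that on ARMv8 the instruction sequences emitted for these must end with a barrier strong enough to match these documented full/acquire semantics.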
Re: [PATCH, C++-1z] Implement N4197 - Adding u8 character literals
On 06/27/2015 08:14 PM, Ed Smith-Rowland wrote: In c-ada-specs.c/print_ada_macros() I just write these as a char constant rather than spelling the token. We could do the latter. You'd see the u8 then I think. I couldn't find in the Ada test suite where this was exercised. If Ada folks don't respond, let's do the latter. OK with that change. Jason
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Hmm, Reducing the iterations to 1 step for float and 2 steps for double I got VE (miscompares) on following benchmarks 416.gamess 453.povray 454.calculix 459.GemsFDTD Benedikt , I have ICE for 444.namd with your patch, not sure if something wrong in my local tree. I could not reproduce the ICE for namd. Could you provide the spec config entry or the command line options, please. Which ICE did you get? Please provide the output. Thank you and best regards, Benedikt
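For context, the scheme under discussion computes an initial hardware estimate (FRSQRTE on AArch64) and refines it with Newton-Raphson steps (FRSQRTS). Each step roughly squares the relative error, i.e. doubles the number of correct bits, which is why one step for float and two for double can be too few on a micro-arch whose initial estimate carries few bits, producing the miscompares above. A scalar model of the refinement (illustration only, not the patch's code):

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~ 1/sqrt(x):
//   y' = y * (1.5 - 0.5 * x * y * y)
// Starting from a crude estimate, each step roughly doubles the number of
// correct bits; too few steps leaves a result accurate enough for some
// workloads but not for others.
double rsqrt_refine (double x, double y, int steps)
{
  for (int i = 0; i < steps; ++i)
    y = y * (1.5 - 0.5 * x * y * y);
  return y;
}
```

Starting from the deliberately poor estimate y = 1.0 for x = 0.25 (true value 2.0), two steps still leave a visible error while six converge to full precision, mirroring why the step count matters.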
Re: [AArch64][2/2] Implement -fpic for -mcmodel=small
Christophe Lyon writes: On 27 June 2015 at 14:49, Jiong Wang jiong.w...@arm.com wrote: Andreas Schwab writes: Jiong Wang jiong.w...@arm.com writes: Andreas Schwab writes: spawn -ignore SIGHUP /opt/gcc/gcc-20150627/Build/gcc/xgcc -B/opt/gcc/gcc-20150627/Build/gcc/ -fno-diagnostics-show-caret -fdiagnostics-color=never -O0 -w -fpie -c -o pr65593.o /opt/gcc/gcc-20150627/gcc/testsuite/gcc.c-torture/compile/pr65593.c /tmp/cc0Pymaf.s: Assembler messages: /tmp/cc0Pymaf.s:11: Error: unknown relocation modifier at operand 2 -- `ldr x0,[x0,#:gotpage_lo15:a]' /tmp/cc0Pymaf.s:19: Error: unknown relocation modifier at operand 2 -- `ldr x2,[x0,#:gotpage_lo15:bar]' Andreas, The binutils patch has been upstreamed already. Please checkout the latest binutils code. That must work with the current binutils. I see, agree. Will adopt current ILP32 feature detection mechanism into -fpic. And some new TLS patches may need similiar check also. Thanks. Hi Jiong, I'm not sure I fully understand your answer. I believe you are going to change the testcase, right? Sorry about the trouble. I am going to add gcc_GAS_CHECK_FEATURE to detect whether those relocs needed by -fpic supported. Just like what's done for ILP32 detection. commit 891f2090d994d420cb56942e486f6bba3ede265f Author: clyon clyon@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Wed Dec 11 12:57:08 2013 + I'm seeing whole toolchain build failures (when building glibc) since you committed this, because of the new relocations. I understand your suggestion is to move to trunk binutils to have support for these new relocations. Any other things I could do to avoid changing the binutils version I use? Thanks Christophe. -- Regards, Jiong -- Regards, Jiong
Re: Move ABS detection from fold-const.c to match.pd
On Sun, Jun 28, 2015 at 8:34 PM, Marc Glisse marc.gli...@inria.fr wrote: (this message looks like it was lost in my draft folder...) On Tue, 26 May 2015, Richard Biener wrote: +(match zerop integer_zerop) +(match zerop real_zerop) Would it also include fixed_zerop? Probably, yes. The main issue is that I know next to nothing about fixed-point types, so I am always unsure how to handle them (when I don't forget them completely). For instance, in the recently added -A CMP -B, we could probably replace (if (FLOAT_TYPE_P (TREE_TYPE (@0)) || (ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0)) && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0)))) with (if (FLOAT_TYPE_P (TREE_TYPE (@0)) || TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))) Not sure if TYPE_OVERFLOW_UNDEFINED says sth sensible for fixed-point types given that there is no overflow for them but they saturate. As far as I see the check would even ICE without guarding it with ANY_INTEGRAL_TYPE_P. So it would be || NON_SAT_FIXED_POINT_TYPE_P (TREE_TYPE (@0)) where I am not sure whether overflow is undefined for non-saturating fixed-point types ... Note that with inlining implemented it would duplicate the pattern for each match variant thus in this case adding a tree.[ch] function zerop () might be better. Ah... I actually thought we might end up moving things like integer_zerop from tree.c to match.pd, especially since predicates are not declared 'static'... Ok, reverse gear. Yeah, I don't think match.pd is a good fit for them. Note that inlining does not seem necessary to implement more advanced predicates like negated_value_for_comparison in the parent message. Sure not necessary but one point of match-and-simplify was that the pattern matching is fast because it uses a decision tree. Once you introduce predicates that are in functions with their own decision tree you get back to testing all of them.
+ (simplify +(cnd (cmp @0 zerop) (convert?@2 @0) (negate@1 @2)) +(if (cmp == EQ_EXPR || cmp == UNEQ_EXPR) + @1) +(if (cmp == NE_EXPR || cmp == LTGT_EXPR) + (non_lvalue @2)) +(if (TYPE_SIGN (TREE_TYPE (@0)) == SIGNED /* implicit */ + && TYPE_SIGN (type) == SIGNED + && element_precision (type) <= element_precision (TREE_TYPE (@0))) + (if (cmp == GE_EXPR || cmp == GT_EXPR + || (!flag_trapping_math && (cmp == UNGE_EXPR || cmp == UNGT_EXPR))) + (abs @2)) + (if (cmp == LE_EXPR || cmp == LT_EXPR + || (!flag_trapping_math && (cmp == UNLE_EXPR || cmp == UNLT_EXPR))) + (negate (abs @2) + /* Now with the branches swapped. */ + (simplify +(cnd (cmp @0 zerop) (negate@1 (convert?@2 @0)) @2) not obvious from a quick look - but would you be able to remove the swapped branch variant if (cnd:c (cmp @0 zerop) X Y) would work by swapping X and Y? Hmm. How do I test if I am currently in the original or commuted version of the simplification? You can't. I could add a with block that defines truecmp as either cmp or invert_tree_comparison (cmp) and test that. Otherwise, I would need a test before each return as swapped versions don't return the same thing. It might make a slight difference on the handling of flag_trapping_math, but that handling already seems strange to me... (cnd:c (cmp @0 zerop) (convert?@2 @0) (negate@1 @2)) would get you (cnd (cmp @0 zerop) (convert?@2 @0) (negate@1 @2)) and (cnd (cmp @0 zerop) (negate@1 @2) (convert?@2 @0)) in the patterns it almost literally looked like what you did manually. The fold-const.c code doesn't seem to handle as many variants (esp. the swapping?), The fold-const.c function is called twice, once on regular operands, once with inverted comparison and swapped operands. I really don't think I am handling more cases (except maybe the silly a?a:0 is extended to unsigned). Ok. so maybe you can add a testcase that exercises some of the above on GIMPLE? So mostly the VEC_COND_EXPR version? We don't seem to have that much COND_EXPR left in gimple. Ah, true.
Yes, the vector variant then. Thanks, Richard. -- Marc Glisse
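The source shape these patterns target is the classic conditional negate; on scalars it typically only survives as a COND_EXPR before gimplification, hence the suggestion to test the VEC_COND_EXPR form. A scalar illustration of the basic A < 0 ? -A : A -> ABS fold:

```cpp
#include <cassert>

// The COND_EXPR shape (cnd (lt @0 zero) (negate @0) @0) that the discussed
// match.pd pattern rewrites to (abs @0) for signed types (leaving aside
// the INT_MIN overflow caveat the TYPE_OVERFLOW_UNDEFINED checks guard).
int cond_abs (int a)
{
  return a < 0 ? -a : a;
}
```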
Re: [patch] Fix PR middle-end/66633
Don't you need to handle convert_nonlocal_omp_clauses similarly (need_chain in that case)? At least looking at your r211308 commit, for !optimize you force not just the frame, but also chain. You're very likely right, although I didn't manage to write a Fortran testcase with my limited knowledge of the language (I get a Bad statement code ICE). In fact the conditions should also mimic those of the aforementioned commit. I initially didn't realize it, because I didn't quite grasp how the GOMP stuff was interacting with the nesting stuff, but I guess that the code ought not to do anything if there aren't pre-existing nested functions in the sources. Tested on x86_64-suse-linux, OK for the mainline and 5 branch? 2015-06-29 Eric Botcazou ebotca...@adacore.com PR middle-end/66633 * tree-nested.c (convert_nonlocal_omp_clauses): Initialize need_chain to true if the function is nested and if not optimizing. (convert_local_omp_clauses): Initialize need_frame to true if the function contains nested functions and if not optimizing. 2015-06-29 Eric Botcazou ebotca...@adacore.com * gfortran.dg/gomp/pr66633.f90: New test. -- Eric Botcazou Index: tree-nested.c === --- tree-nested.c (revision 225111) +++ tree-nested.c (working copy) @@ -1069,7 +1069,9 @@ static bool convert_nonlocal_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) { struct nesting_info *const info = (struct nesting_info *) wi->info; - bool need_chain = false, need_stmts = false; + /* If not optimizing, we will force the creation of the CHAIN object in + convert_all_function_calls, so we need to take it into account here.
*/ + bool need_chain = info->outer && !optimize, need_stmts = false; tree clause, decl; int dummy; bitmap new_suppress; @@ -1697,7 +1699,9 @@ static bool convert_local_omp_clauses (tree *pclauses, struct walk_stmt_info *wi) { struct nesting_info *const info = (struct nesting_info *) wi->info; - bool need_frame = false, need_stmts = false; + /* If not optimizing, we will force the creation of the FRAME object in + convert_all_function_calls, so we need to take it into account here. */ + bool need_frame = info->inner && !optimize, need_stmts = false; tree clause, decl; int dummy; bitmap new_suppress; ! PR middle-end/66633 ! Testcase by Andrew Benson abenso...@gmail.com ! { dg-do compile } ! { dg-options "-O0 -fopenmp" } module spls contains function spl() !$omp parallel write (0,*) igrt(fli) !$omp end parallel contains double precision function fli() end function fli end function spl end module spls
Re: [PATCH, stage1] Make parloops gate more strict
On Fri, Jun 19, 2015 at 9:47 AM, Tom de Vries tom_devr...@mentor.com wrote: On 16/06/15 13:18, Richard Biener wrote: On Mon, Jun 15, 2015 at 12:38 AM, Tom de Vries tom_devr...@mentor.com wrote: On 14/06/15 23:49, Bernhard Reutner-Fischer wrote: On June 14, 2015 10:55:59 AM GMT+02:00, Tom de Vries tom_devr...@mentor.com wrote: On 13/03/15 11:36, Richard Biener wrote: On Fri, Mar 13, 2015 at 11:32 AM, Tom de Vries tom_devr...@mentor.com wrote: Hi, this patch moves a bunch of early-out tests from the parloops pass to the gate function. The only effect is for functions that we don't consider at all for parallelization in the parloops pass. We no longer dump those in the parloops dump file. Bootstrapped and reg-tested on x86_64. OK for stage1 trunk? Does it work with -fdump-passes? Hi, with -fdump-passes now fixed to work on a dummy function (r222129), I'm resubmitting this patch, split up in two patches. The first patch moves two trivial early-exit tests to the parloops gate. The second patch moves the number_of_loops test to the parloops gate, and adds a dummy loops structure in the dummy function for -fdump-passes. Bootstrapped and reg-tested on x86_64. Both patches OK for trunk? diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 02f44eb..a1659a3 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -2535,10 +2535,6 @@ parallelize_loops (void) source_location loop_loc; /* Do not parallelize loops in the functions created by parallelization. */ - if (parallelized_function_p (cfun->decl)) -return false; - if (cfun->has_nonlocal_label) -return false; gcc_obstack_init (parloop_obstack); reduction_info_table_type reduction_list (10); Now stray comment? Stopped reading here. Fixed in updated patch. Also: - made sure cfun is not used in the gate function - added missing update of function header comment for init_loops_structure - improved comment in pass_manager::dump_passes. OK for trunk?
For -fdump-passes this doesn't make much sense but I suppose you are after not getting the untouched functions dumped. Note that generally this is undesired for debugging (in my opinion) as diffs from the pass dump before parloops to parloops will contain a lot of spurious changes (and if the preceeding pass is changed similarly the function we run parloops on is maybe not even dumped there!). So I question the general idea of the change. I suppose there are two competing principles: 1. ensure the state before the pass is in the previous dump 2. only dump if changed Indeed in general we use the first principle, although it doesn't hold for f.i. pass_tree_loop and a function without loops. The problems I'm trying to solve with this patch are the following: - even if you're compiling a single function, part of the function occurs twice (once in the original, once in the split-off function), making formulating scan-tree-dumps more complicated for parloops. - the intuitive thing is to look in the parloops tree-dump and look at the original function and the split-off function, and think that the split-off function is the immediate result of the split-off that happened in the pass, which is incorrect (since it jumps back in the pass list and traverses other passes before arriving back at parloops). Adding something like a flag -fno-dump-new-function would solve the first problem, but not the second. Yeah, any pass introducing new functions has this issue. We could also dump new functions in a different dump file src.c.new.123t.pass, and this would solve both problems. But it will cause the 'where did that function go' confusion, certainly initially. Any other ideas? No, not really. But it sounds like a very special thing you try to solve. You could also dump the split off function twice - once next to the function it was split off and once when it arrives at parloops again... Richard. Thanks, - Tom
Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT
On Jun 28, 2015, at 2:28 PM, Kugan kugan.vivekanandara...@linaro.org wrote: This patch allows setting REG_EQUAL for ZERO_EXTRACT and handles it in cse (where the src for the ZERO_EXTRACT needs to be calculated). Thanks, Kugan From 75e746e559ffd21b25542b3db627e3b318118569 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah kugan.vivekanandara...@linaro.org Date: Fri, 26 Jun 2015 17:12:07 +1000 Subject: [PATCH 1/2] Allow adding REG_EQUAL for ZERO_EXTRACT --- gcc/ChangeLog | 6 ++ gcc/cse.c | 41 - gcc/emit-rtl.c | 3 ++- 3 files changed, 48 insertions(+), 2 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 080aa39..d4a73d6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2015-06-26 Kugan Vivekanandarajah kug...@linaro.org + + * cse.c (cse_insn): Calculate src_eqv for ZERO_EXTRACT. + * emit-rtl.c (set_for_reg_notes): Allow ZERO_EXTRACT to set + REG_EQUAL note. + 2015-06-25 H.J. Lu hongjiu...@intel.com * gentarget-def.c (def_target_insn): Cast return of strtol to diff --git a/gcc/cse.c b/gcc/cse.c index 100c9c8..8add651 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -4531,8 +4531,47 @@ cse_insn (rtx_insn *insn) if (n_sets == 1 && REG_NOTES (insn) != 0 && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0 && (! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl)) + || GET_CODE (SET_DEST (sets[0].rtl)) == ZERO_EXTRACT || GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART)) -src_eqv = copy_rtx (XEXP (tem, 0)); +{ + src_eqv = copy_rtx (XEXP (tem, 0)); + + /* If DEST is of the form ZERO_EXTRACT, as in: + (set (zero_extract:SI (reg:SI 119) + (const_int 16 [0x10]) + (const_int 16 [0x10])) + (const_int 51154 [0xc7d2])) + the REG_EQUAL note will specify the value of register (reg:SI 119) at this + point. Note that this is different from SRC_EQV. We can however + calculate SRC_EQV with the position and width of ZERO_EXTRACT. 
*/ + if (GET_CODE (SET_DEST (sets[0].rtl)) == ZERO_EXTRACT) Consider changing if (something && (!rtx_equal_p || ZERO_EXTRACT || STRICT_LOW_PART)) to if (something && !rtx_equal_p) { if (ZERO_EXTRACT) { } else if (STRICT_LOW_PART) { } } Otherwise looks good to me, but you still need another approval. + { + if (CONST_INT_P (src_eqv) + && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 1)) + && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 2))) + { + rtx dest_reg = XEXP (SET_DEST (sets[0].rtl), 0); + rtx width = XEXP (SET_DEST (sets[0].rtl), 1); + rtx pos = XEXP (SET_DEST (sets[0].rtl), 2); + HOST_WIDE_INT val = INTVAL (src_eqv); + HOST_WIDE_INT mask; + unsigned int shift; + if (BITS_BIG_ENDIAN) + shift = GET_MODE_PRECISION (GET_MODE (dest_reg)) + - INTVAL (pos) - INTVAL (width); + else + shift = INTVAL (pos); + if (INTVAL (width) == HOST_BITS_PER_WIDE_INT) + mask = ~(HOST_WIDE_INT) 0; + else + mask = ((HOST_WIDE_INT) 1 << INTVAL (width)) - 1; + val = (val >> shift) & mask; + src_eqv = GEN_INT (val); + } + else + src_eqv = 0; + } +} /* Set sets[i].src_elt to the class each source belongs to. Detect assignments from or to volatile things diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index e7f7eab..cb891b1 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -5228,7 +5228,8 @@ set_for_reg_notes (rtx insn) reg = SET_DEST (pat); /* Notes apply to the contents of a STRICT_LOW_PART. */ - if (GET_CODE (reg) == STRICT_LOW_PART) + if (GET_CODE (reg) == STRICT_LOW_PART + || GET_CODE (reg) == ZERO_EXTRACT) reg = XEXP (reg, 0); /* Check that we have a register. */ -- 1.9.1 -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
On Mon, Jun 29, 2015 at 10:18:23AM +0100, Kumar, Venkataramanan wrote: -Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, This does not come unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantuees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. True. I would be very uncomfortable with this approach. From Richard Biener's post in the thread Michael Matz linked earlier in the thread: It would follow existing practice of things we allow in -funsafe-math-optimizations. Existing practice in that we want to allow -ffast-math use with common benchmarks we care about. https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00100.html With the solution you seem to be converging on (2-steps for some microarchitectures, 3 for others), a binary generated for one micro-arch may drop below a minimum guarantee of precision when run on another. This seems to go against the spirit of the practice above. I would only support adding this optimization to -Ofast if we could keep to architectural guarantees of precision in the generated code (i.e. 3-steps everywhere). I don't object to adding a -mlow-precision-recip-sqrt style option, which would be off by default, would enable the 2-step mode, and would need to be explicitly enabled (i.e. not implied by -mcpu=foo) but I don't see what this buys you beyond the Gromacs boost (and even there you would be creating an Invalid Run as optimization flags must be applied across all workloads). 
For the 3-step optimization, it is clear to me that for generic tuning we don't want this to be enabled by default experimental results and advice in this thread argues against it for thunderx and cortex-a57 targets. However, enabling it based on the CPU tuning selected seems fine to me. Thanks, James Do you know the exact number of bits that the initial estimate and the subsequent refinement steps add for your micro-arch? I am not sure on this. I need to check for cortex-a57 case. But best thing for cortex-a57 case is not to use this optimization by default. If we get an -mrecip-sqrt command line , then we can add it for gromacs kind application to get gains. Any thoughts on this? Regards, Venkat. Thanks, Philipp. On 29 Jun 2015, at 10:17, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Hmm, Reducing the iterations to 1 step for float and 2 steps for double I got VE (miscompares) on following benchmarks 416.gamess 453.povray 454.calculix 459.GemsFDTD Benedikt , I have ICE for 444.namd with your patch, not sure if something wrong in my local tree. Regards, Venkat. -Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Sunday, June 28, 2015 8:35 PM To: Kumar, Venkataramanan Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: I got around ~12% gain with -Ofast -mcpu=cortex-a57. I get around 11/12% on thunderX with the patch and the decreasing the iterations change (1/2) compared to without the patch. Thanks, Andrew Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Dr. 
Philipp Tomsich Sent: Thursday, June 25, 2015 9:13 PM To: Kumar, Venkataramanan Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, what is the relative gain that you see on Cortex-A57? Thanks, Philipp. On 25 Jun 2015, at 17:35, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Changing to 1 step for float and 2 steps for double gives better gains now for gromacs on cortex-a57. Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Thursday, June 25, 2015 4:09 PM To: pins...@gmail.com Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma- systems.com Subject: Re: [PATCH] [aarch64] Implemented reciprocal square
Re: [PATCH 2/2] Set REG_EQUAL
On Jun 28, 2015, at 2:30 PM, Kugan kugan.vivekanandara...@linaro.org wrote: This patch sets REG_EQUAL when emitting arm_emit_movpair. Thanks, Kugan gcc/testsuite/ChangeLog: 2015-06-26 Kugan Vivekanandarajah kug...@linaro.org * gcc.target/arm/reg_equal_test.c: New test. gcc/ChangeLog: 2015-06-26 Kugan Vivekanandarajah kug...@linaro.org * config/arm/arm.c (arm_emit_movpair): Add REG_EQUAL notes to instruction. 0002-Add-REG_EQUAL-note-for-arm_emit_movpair.patch LGTM, but you need ARM maintainer's approval. -- Maxim Kuvyrkov www.linaro.org
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
James, On 29 Jun 2015, at 13:36, James Greenhalgh james.greenha...@arm.com wrote: On Mon, Jun 29, 2015 at 10:18:23AM +0100, Kumar, Venkataramanan wrote: -Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, This does not come unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantuees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. True. I would be very uncomfortable with this approach. Same here. The default must be safe. Always. Unlike other architectures, we don’t have a problem with making the proper defaults for “safety”, as the ARMv8 ISA guarantees a minimum number of precise bits per iteration. From Richard Biener's post in the thread Michael Matz linked earlier in the thread: It would follow existing practice of things we allow in -funsafe-math-optimizations. Existing practice in that we want to allow -ffast-math use with common benchmarks we care about. https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00100.html With the solution you seem to be converging on (2-steps for some microarchitectures, 3 for others), a binary generated for one micro-arch may drop below a minimum guarantee of precision when run on another. This seems to go against the spirit of the practice above. I would only support adding this optimization to -Ofast if we could keep to architectural guarantees of precision in the generated code (i.e. 3-steps everywhere). 
I don't object to adding a -mlow-precision-recip-sqrt style option, which would be off by default, would enable the 2-step mode, and would need to be explicitly enabled (i.e. not implied by -mcpu=foo) but I don't see what this buys you beyond the Gromacs boost (and even there you would be creating an Invalid Run as optimization flags must be applied across all workloads). Any flag that reduces precision (and thus breaks IEEE floating-point semantics) needs to be gated with an “unsafe” flag (i.e. one that is never on by default). As a consequence, the “peak”-tuning for SPEC will turn this on… but barely anyone else would. For the 3-step optimization, it is clear to me that for generic tuning we don't want this to be enabled by default experimental results and advice in this thread argues against it for thunderx and cortex-a57 targets. However, enabling it based on the CPU tuning selected seems fine to me. I do not agree on this one, as I would like to see the safe form (i.e. 3 and 5 iterations respectively) to become the default. Most “server-type” chips should not see a performance regression, while it will be easier to optimise for this in hardware than for a (potentially microcoded) sqrt-instruction (and subsequent, dependent divide). I have not heard anyone claim a performance regression (either on thunderx or on cortex-a57), but merely heard a “no speed-up”. So I am strongly in favor of defaulting to the ‘safe’ number of iterations, even when compiling for a generic target. Best, Philipp.
[PATCH][7/n] Remove GENERIC stmt combining from SCCVN
This moves a few more patterns from fold-const.c to match.pd, also covering two cases where fold basically recurses (calling fold_unary or using negate_expr_p/negate_expr). For these cases we have to implement a subset of all possible cases (all fold cases basically searching for some stuff arbitrarily deep in an expression tree would need to be converted to a propagator-like pass rather than too many patterns). It also moves (x y) | x - x before (x | CST1) CST2 - (x CST2) | (CST1 CST2) so it has the chance to match (x | 1) 1 which with the first pattern simplifies without needing intermediate stmts (which I don't allow for SCCVN for efficiency reasons). As for other generator inputs what comes earlier is matched earlier. Bootstrap and regtest running on x86_64-unknown-linux-gnu. I think next I'll merge trunk into match-and-simplify and see what patterns I have implemented there and will either kill them off if they are broken or move them. Richard. 2015-06-29 Richard Biener rguent...@suse.de * fold-const.c (fold_unary_loc): Move abs(abs(x)) - abs(x), ~ (-A) to A - 1, ~ (A - 1) or ~ (A + -1) to -A and some cases of ~(X ^ Y) to ~X ^ Y or X ^ ~Y if ~X or ~Y simplify to ... * match.pd: ... here. Add a few cases of A - B - A + (-B) when B easily negates. Move (x y) | x - x and friends before (x | CST1) CST2 - (x CST2) | (CST1 CST2). Index: gcc/fold-const.c === *** gcc/fold-const.c(revision 225115) --- gcc/fold-const.c(working copy) *** fold_unary_loc (location_t loc, enum tre *** 8131,8139 TREE_TYPE (targ0), targ0)); } - /* ABS_EXPRABS_EXPRx = ABS_EXPRx even if flag_wrapv is on. */ - else if (TREE_CODE (arg0) == ABS_EXPR) - return arg0; /* Strip sign ops from argument. */ if (TREE_CODE (type) == REAL_TYPE) --- 8131,8136 *** fold_unary_loc (location_t loc, enum tre *** 8161,8193 return NULL_TREE; case BIT_NOT_EXPR: - /* Convert ~ (-A) to A - 1. 
*/ - if (INTEGRAL_TYPE_P (type) TREE_CODE (arg0) == NEGATE_EXPR) - return fold_build2_loc (loc, MINUS_EXPR, type, - fold_convert_loc (loc, type, TREE_OPERAND (arg0, 0)), - build_int_cst (type, 1)); - /* Convert ~ (A - 1) or ~ (A + -1) to -A. */ - else if (INTEGRAL_TYPE_P (type) - ((TREE_CODE (arg0) == MINUS_EXPR -integer_onep (TREE_OPERAND (arg0, 1))) - || (TREE_CODE (arg0) == PLUS_EXPR - integer_all_onesp (TREE_OPERAND (arg0, 1) - { - /* Perform the negation in ARG0's type and only then convert -to TYPE as to avoid introducing undefined behavior. */ - tree t = fold_build1_loc (loc, NEGATE_EXPR, - TREE_TYPE (TREE_OPERAND (arg0, 0)), - TREE_OPERAND (arg0, 0)); - return fold_convert_loc (loc, type, t); - } /* Convert ~(X ^ Y) to ~X ^ Y or X ^ ~Y if ~X or ~Y simplify. */ ! else if (TREE_CODE (arg0) == BIT_XOR_EXPR ! (tem = fold_unary_loc (loc, BIT_NOT_EXPR, type, !fold_convert_loc (loc, type, ! TREE_OPERAND (arg0, 0) return fold_build2_loc (loc, BIT_XOR_EXPR, type, tem, ! fold_convert_loc (loc, type, ! TREE_OPERAND (arg0, 1))); else if (TREE_CODE (arg0) == BIT_XOR_EXPR (tem = fold_unary_loc (loc, BIT_NOT_EXPR, type, fold_convert_loc (loc, type, --- 8158,8171 return NULL_TREE; case BIT_NOT_EXPR: /* Convert ~(X ^ Y) to ~X ^ Y or X ^ ~Y if ~X or ~Y simplify. */ ! if (TREE_CODE (arg0) == BIT_XOR_EXPR ! (tem = fold_unary_loc (loc, BIT_NOT_EXPR, type, ! fold_convert_loc (loc, type, ! TREE_OPERAND (arg0, 0) return fold_build2_loc (loc, BIT_XOR_EXPR, type, tem, ! fold_convert_loc (loc, type, ! TREE_OPERAND (arg0, 1))); else if (TREE_CODE (arg0) == BIT_XOR_EXPR (tem = fold_unary_loc (loc, BIT_NOT_EXPR, type, fold_convert_loc (loc, type, Index: gcc/match.pd === *** gcc/match.pd(revision 225115) --- gcc/match.pd(working copy) *** (define_operator_list swapped_tcc_compar *** 378,389 --- 378,409 (bit_and @0 @1)) (simplify
Re: [gomp4][PATCH] Handle casts in bound in try_transform_to_exit_first_loop_alt
On 22/06/15 16:36, Richard Biener wrote: On Thu, 18 Jun 2015, Tom de Vries wrote: On 13/06/15 16:24, Tom de Vries wrote: Hi, this patch allows try_transform_to_exit_first_loop_alt to succeed when handling cases where the expression representing the number of iterations contains a cast. Currently, transform_to_exit_first_loop_alt testcase gfortran/parloops-exit-first-loop-alt.f95 will fail. The nit is _19, which is defined as follows: ... _20 = _6 + -1; _19 = (unsigned int) _20; ... And transform_to_exit_first_loop_alt currently only handles nits with defining stmt 'nit = x - 1', for which it finds alt_bound 'x'. The patch: - uses try_get_loop_niter to get nit as a nested tree expression '(unsigned int) (_6 + -1)' - strips the outer nops (assuming no change in value) - uses '(unsigned int)_6' as the alt_bound, and - gimplifies the expression. Bootstrapped and reg-tested on x86_64. Cleaned up whitespace in testcases. Committed to gomp-4_0-branch as atttached. OK for trunk? I assume the above also handles the reverse, (int) (_6 + -1)? AFAIU, niter.niter is always unsigned. In this case what happens if _6 == INT_MAX + 1? nit is INT_MAX but (int) _6 is INT_MIN. Likewise what happens if _6 + -1 under-/overflows? While playing around with underflow/overflow cases, I ran into PR66652, which reproduces on trunk. I've submitted for trunk: - https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02084.html , which fixes PR66652, and should handle all overflow concerns. - https://gcc.gnu.org/ml/gcc-patches/2015-06/msg02085.html , which deals with the case that nit is not defined by assignment nit = n - 1, in other words, with the case that this patch is trying to address. I'll revert this patch on gomp-4_0-branch, and commit these patches instead (in addition to https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01768.html, which fixes low iteration count reduction loops handled by transform_to_exit_first_loop_alt). Thanks, - Tom
[committed] Flags outputs for asms
Sorry for the delay, but here's the third and final version. This includes the requested cpp symbol and updated documentation. r~ * config/i386/constraints.md (Bf): New constraint. * config/i386/i386-c.c (ix86_target_macros): Define __GCC_ASM_FLAG_OUTPUTS__. * config/i386/i386.c (ix86_md_asm_adjust): Handle =@cc* constraints as flags outputs. * doc/extend.texi (FlagOutputOperands): Document them. * gcc.target/i386/asm-flag-1.c: New. * gcc.target/i386/asm-flag-2.c: New. * gcc.target/i386/asm-flag-3.c: New. * gcc.target/i386/asm-flag-4.c: New. * gcc.target/i386/asm-flag-5.c: New. diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md index c718bc1..2861d8d 100644 --- a/gcc/config/i386/constraints.md +++ b/gcc/config/i386/constraints.md @@ -146,11 +146,16 @@ @internal Lower SSE register when avoiding REX prefix and all SSE registers otherwise.) ;; We use the B prefix to denote any number of internal operands: +;; f FLAGS_REG ;; g GOT memory operand. ;; s Sibcall memory operand, not valid for TARGET_X32 ;; w Call memory operand, not valid for TARGET_X32 ;; z Constant call address operand. +(define_constraint Bf + @internal Flags register operand. + (match_operand 0 flags_reg_operand)) + (define_constraint Bg @internal GOT memory operand. 
(match_operand 0 GOT_memory_operand)) diff --git a/gcc/config/i386/i386-c.c b/gcc/config/i386/i386-c.c index 28444f9..d506345 100644 --- a/gcc/config/i386/i386-c.c +++ b/gcc/config/i386/i386-c.c @@ -560,6 +560,8 @@ ix86_target_macros (void) cpp_define_formatted (parse_in, __ATOMIC_HLE_ACQUIRE=%d, IX86_HLE_ACQUIRE); cpp_define_formatted (parse_in, __ATOMIC_HLE_RELEASE=%d, IX86_HLE_RELEASE); + cpp_define (parse_in, __GCC_ASM_FLAG_OUTPUTS__); + ix86_target_macros_internal (ix86_isa_flags, ix86_arch, ix86_tune, diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 9598600..144c1a6 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -4,21 +4,144 @@ ix86_c_mode_for_suffix (char suffix) /* Worker function for TARGET_MD_ASM_ADJUST. - We do this in the new i386 backend to maintain source compatibility + We implement asm flag outputs, and maintain source compatibility with the old cc0-based compiler. */ static rtx_insn * -ix86_md_asm_adjust (vecrtx /*outputs*/, vecrtx /*inputs*/, - vecconst char * /*constraints*/, +ix86_md_asm_adjust (vecrtx outputs, vecrtx /*inputs*/, + vecconst char * constraints, vecrtx clobbers, HARD_REG_SET clobbered_regs) { - clobbers.safe_push (gen_rtx_REG (CCmode, FLAGS_REG)); clobbers.safe_push (gen_rtx_REG (CCFPmode, FPSR_REG)); - - SET_HARD_REG_BIT (clobbered_regs, FLAGS_REG); SET_HARD_REG_BIT (clobbered_regs, FPSR_REG); - return NULL; + bool saw_asm_flag = false; + + start_sequence (); + for (unsigned i = 0, n = outputs.length (); i n; ++i) +{ + const char *con = constraints[i]; + if (strncmp (con, =@cc, 4) != 0) + continue; + con += 4; + if (strchr (con, ',') != NULL) + { + error (alternatives not allowed in asm flag output); + continue; + } + + bool invert = false; + if (con[0] == 'n') + invert = true, con++; + + machine_mode mode = CCmode; + rtx_code code = UNKNOWN; + + switch (con[0]) + { + case 'a': + if (con[1] == 0) + mode = CCAmode, code = EQ; + else if (con[1] == 'e' con[2] == 0) + mode = CCCmode, code = EQ; 
+ break; + case 'b': + if (con[1] == 0) + mode = CCCmode, code = EQ; + else if (con[1] == 'e' con[2] == 0) + mode = CCAmode, code = NE; + break; + case 'c': + if (con[1] == 0) + mode = CCCmode, code = EQ; + break; + case 'e': + if (con[1] == 0) + mode = CCZmode, code = EQ; + break; + case 'g': + if (con[1] == 0) + mode = CCGCmode, code = GT; + else if (con[1] == 'e' con[2] == 0) + mode = CCGCmode, code = GE; + break; + case 'l': + if (con[1] == 0) + mode = CCGCmode, code = LT; + else if (con[1] == 'e' con[2] == 0) + mode = CCGCmode, code = LE; + break; + case 'o': + if (con[1] == 0) + mode = CCOmode, code = EQ; + break; + case 'p': + if (con[1] == 0) + mode = CCPmode, code = EQ; + break; + case 's': + if (con[1] == 0) + mode = CCSmode, code = EQ; + break; + case 'z': + if (con[1] == 0) + mode = CCZmode, code = EQ; + break; + } + if (code == UNKNOWN) + { + error (unknown asm flag output %qs,
Four jit backports to gcc 5 branch
I've gone over the changes to the gcc/jit and gcc/testsuite/jit.dg directories in trunk since gcc 5 and backported the following 4 changes from trunk to the gcc-5-branch: gcc-5-branch's r225123: * trunk's r222863 (8120405bda9bf28b004e09b23e74eda57ae1, addition of test-benchmark.c): https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00500.html gcc-5-branch's r225125: * trunk's r224531 (8154a3514d5fc8067a6928531d5f61cd768bd62c along with trunk's r224535 (1828cd755cf5e4a34d5638f543a059f3562ad957, PR jit/66539: Add parentheses as needed to gcc_jit_object_get_debug_string): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01158.html gcc-5-branch's r225127: * trunk's r224536 (3052eeefc4607a7147fdc55af6d86845890eb281, jit: Add a test for compound assignment): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01164.html gcc-5-branch's r225129: * trunk's r224565 (6689f47f53079d76bbb051d3b5da9018c2e0161a, jit: Add missing type-checking to gcc_jit_{l|r}value_access_field) https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01211.html Note to self: I believe that covers everything on master up to ec2e0095a3a5988b03a2706b5ffe0e807b238ba8 (on 2015-06-25) that might be backportable. Dave
[PATCH] Insert new bound in try_transform_to_exit_first_loop_alt
Hi, this patch allows try_transform_to_exit_first_loop_alt to handle the case that the new loop bound nit + 1 is not available as ssa-name n in the assignment nit = n - 1, by inserting the new loop bound. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom Insert new bound in try_transform_to_exit_first_loop_alt 2015-06-29 Tom de Vries t...@codesourcery.com * tree-parloops.c (try_transform_to_exit_first_loop_alt): If not found, insert nit + 1 bound. * testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95: New test. * testsuite/libgomp.fortran/parloops-exit-first-loop-alt.f95: New test. * gfortran.dg/parloops-exit-first-loop-alt-2.f95: New test. * gfortran.dg/parloops-exit-first-loop-alt.f95: New test. --- .../gfortran.dg/parloops-exit-first-loop-alt-2.f95 | 24 + .../gfortran.dg/parloops-exit-first-loop-alt.f95 | 25 + gcc/tree-parloops.c| 18 +- .../parloops-exit-first-loop-alt-2.f95 | 40 + .../parloops-exit-first-loop-alt.f95 | 41 ++ 5 files changed, 147 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 create mode 100644 gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 create mode 100644 libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95 create mode 100644 libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt.f95 diff --git a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 new file mode 100644 index 000..f26a6e3 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt-2.f95 @@ -0,0 +1,24 @@ +! { dg-additional-options -O2 } +! { dg-require-effective-target pthread } +! { dg-additional-options -ftree-parallelize-loops=2 } +! { dg-additional-options -fdump-tree-parloops } + +! Constant bound, vector addition. 
+ +subroutine foo () + integer, parameter :: n = 1000 + integer, dimension (0:n-1) :: a, b, c + common a, b, c + integer :: ii + + do ii = 0, n - 1 + c(ii) = a(ii) + b(ii) + 25 + end do +end subroutine foo + +! Three times plus 25: +! - once in f._loopfn.0 +! - once in the parallel +! - once in the low iteration count loop +! Crucially, none for a peeled off last iteration following the parallel. +! { dg-final { scan-tree-dump-times (?n) \\+ 25; 3 parloops } } diff --git a/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 new file mode 100644 index 000..6dc8a38 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/parloops-exit-first-loop-alt.f95 @@ -0,0 +1,25 @@ +! { dg-additional-options -O2 } +! { dg-require-effective-target pthread } +! { dg-additional-options -ftree-parallelize-loops=2 } +! { dg-additional-options -fdump-tree-parloops } + +! Variable bound, vector addition. + +subroutine foo (nr) + integer, intent(in) :: nr + integer, parameter :: n = 1000 + integer, dimension (0:n-1) :: a, b, c + common a, b, c + integer :: ii + + do ii = 0, nr - 1 + c(ii) = a(ii) + b(ii) + 25 + end do +end subroutine foo + +! Three times plus 25: +! - once in f._loopfn.0 +! - once in the parallel +! - once in the low iteration count loop +! Crucially, none for a peeled off last iteration following the parallel. +! { dg-final { scan-tree-dump-times (?n) \\+ 25; 3 parloops } } diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c index 32d059a..7a07c7d 100644 --- a/gcc/tree-parloops.c +++ b/gcc/tree-parloops.c @@ -1840,8 +1840,24 @@ try_transform_to_exit_first_loop_alt (struct loop *loop, alt_bound = op1; } + /* If not found, insert nit + 1. 
*/ if (alt_bound == NULL_TREE) -return false; +{ + alt_bound = fold_build2 (PLUS_EXPR, nit_type, nit, + build_int_cst_type (nit_type, 1)); + + gimple_seq pre = NULL, post = NULL; + push_gimplify_context (true); + gimplify_expr (alt_bound, pre, post, is_gimple_reg, + fb_rvalue); + pop_gimplify_context (NULL); + + gimple_seq_add_seq (pre, post); + + gimple_stmt_iterator gsi + = gsi_last_bb (loop_preheader_edge (loop)-src); + gsi_insert_seq_after (gsi, pre, GSI_CONTINUE_LINKING); +} transform_to_exit_first_loop_alt (loop, reduction_list, alt_bound); return true; diff --git a/libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95 b/libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95 new file mode 100644 index 000..56add65 --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/parloops-exit-first-loop-alt-2.f95 @@ -0,0 +1,40 @@ +! { dg-do run } +! { dg-additional-options -O2 } +! { dg-additional-options -ftree-parallelize-loops=2 } + +! Constant bound, vector addition. + +subroutine foo () + integer, parameter :: n = 1000 + integer, dimension (0:n-1) :: a, b, c + common a, b, c + integer :: ii + +
Re: [Patch Vax] zero/sign extend patterns need to be SUBREG aware
On Fri, Jun 19, 2015 at 05:30:21PM +0100, Matt Thomas wrote: On Jun 19, 2015, at 8:51 AM, Jan-Benedict Glaw jbg...@lug-owl.de wrote: Hi James, On Tue, 2015-06-16 10:58:48 +0100, James Greenhalgh james.greenha...@arm.com wrote: The testcase in this patch, from libgcc, causes an ICE in the Vax build. [...] As far as I know, reload is going to get rid of these SUBREGs for us, so we don't need to modify the output statement. Tested that this restores the VAX build and that the code-gen is sensible for the testcase. OK? Looks good to me, but Matt has to ACK this fix. I so ACK. Now that this has had a few days sitting on trunk without seeing any complaints, would you mind if I backported it to the GCC 5 branch? Thanks, James
Re: [Patch Vax] zero/sign extend patterns need to be SUBREG aware
On Jun 29, 2015, at 8:19 AM, James Greenhalgh james.greenha...@arm.com wrote: Now that this has had a few days sitting on trunk without seeing any complaints, would you mind if I backported it to the GCC 5 branch? I don’t have a problem with that.
[PATCH C/C++/middle-end] move Wunused-parameter to the FEs
Wunused-parameter warnings are given from cgraph::finalize_function, which is the middle-end. This is an oddity compared to other -Wunused-* warnings. Moreover, Fortran has its own definition of -Wunused-parameter that conflicts with the middle-end definition. This patch moves the middle-end part of Wunused-parameter to the C/C++ FEs. I'm not sure if other FEs expected this warning to work. If so, they do not seem to test for it. Ada, for example, explicitly disables it. This also fixes [fortran/66605] -Wunused-parameter causes internal compiler error with gfortran 5.1.0. Bootstrapped regression tested on x86_64-linux-gnu with languages c,c++,objc,fortran,ada,obj-c++ OK? gcc/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * cgraphunit.c (cgraph_node::finalize_function): Do not call do_warn_unused_parameter. * function.c (do_warn_unused_parameter): Move from here. * function.h (do_warn_unused_parameter): Do not declare. gcc/c-family/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * c-common.c (do_warn_unused_parameter): Move here. * c-common.h (do_warn_unused_parameter): Declare. gcc/ada/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * gcc-interface/misc.c (gnat_post_options): No need to disable warn_unused_parameter anymore. gcc/cp/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * decl.c (finish_function): Call do_warn_unused_parameter. gcc/testsuite/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * gfortran.dg/wunused-parameter.f90: New test. gcc/c/ChangeLog: 2015-06-28 Manuel López-Ibáñez m...@gcc.gnu.org PR fortran/66605 * c-decl.c (finish_function): Call do_warn_unused_parameter. 
Index: gcc/c-family/c-common.c === --- gcc/c-family/c-common.c (revision 224997) +++ gcc/c-family/c-common.c (working copy) @@ -12045,10 +12045,27 @@ do_warn_double_promotion (tree result_ty else return; warning_at (loc, OPT_Wdouble_promotion, gmsgid, source_type, result_type); } +/* Possibly warn about unused parameters. */ + +void +do_warn_unused_parameter (tree fn) +{ + tree decl; + + for (decl = DECL_ARGUMENTS (fn); + decl; decl = DECL_CHAIN (decl)) +if (!TREE_USED (decl) TREE_CODE (decl) == PARM_DECL +DECL_NAME (decl) !DECL_ARTIFICIAL (decl) +!TREE_NO_WARNING (decl)) + warning_at (DECL_SOURCE_LOCATION (decl), OPT_Wunused_parameter, + unused parameter %qD, decl); +} + + /* Setup a TYPE_DECL node as a typedef representation. X is a TYPE_DECL for a typedef statement. Create a brand new ..._TYPE node (which will be just a variant of the existing ..._TYPE node with identical properties) and then install X Index: gcc/c-family/c-common.h === --- gcc/c-family/c-common.h (revision 224997) +++ gcc/c-family/c-common.h (working copy) @@ -1041,10 +1041,11 @@ extern void warn_for_div_by_zero (locati extern void warn_for_sign_compare (location_t, tree orig_op0, tree orig_op1, tree op0, tree op1, tree result_type, enum tree_code resultcode); +extern void do_warn_unused_parameter (tree); extern void do_warn_double_promotion (tree, tree, tree, const char *, location_t); extern void set_underlying_type (tree); extern void record_types_used_by_current_var_decl (tree); extern void record_locally_defined_typedef (tree); Index: gcc/c/c-decl.c === --- gcc/c/c-decl.c (revision 224997) +++ gcc/c/c-decl.c (working copy) @@ -9026,10 +9026,14 @@ finish_function (void) /* Complain about locally defined typedefs that are not used in this function. */ maybe_warn_unused_local_typedefs (); + /* Possibly warn about unused parameters. */ + if (warn_unused_parameter) +do_warn_unused_parameter (fndecl); + /* Store the end of the function, so that we get good line number info for the epilogue. 
*/ cfun->function_end_locus = input_location; /* Finalize the ELF visibility for the function. */ Index: gcc/cgraphunit.c === --- gcc/cgraphunit.c (revision 224997) +++ gcc/cgraphunit.c (working copy) @@ -470,14 +470,10 @@ cgraph_node::finalize_function (tree dec /* If we've not yet emitted decl, tell the debug info about it. */ if (!TREE_ASM_WRITTEN (decl)) (*debug_hooks->deferred_inline_function) (decl); - /* Possibly warn about unused parameters. */ - if (warn_unused_parameter) - do_warn_unused_parameter (decl); - if
Re: [PATCH C/C++/middle-end] move Wunused-parameter to the FEs
OK. Jason
Re: [PATCH] top-level for libvtv: use normal (not raw_cxx) target exports
On 06/25/2015 09:15 PM, Jeff Law wrote: On 06/09/2015 08:22 AM, Michael Haubenwallner wrote: Hi build machinery maintainers, since we always build the C++ compiler now, I fail to see the need to still use RAW_CXX_TARGET_EXPORTS for libvtv. But why is vtv special here? Wouldn't this also apply to libstdc++-v3 which would still have raw_cxx=true after your change? libstdc++ breaks with shell syntax errors in maintainer-mode, which more obviously spots the configure script recreation while running as the breakage's origin, where the simple enough workaround is to run 'make bootstrap' again. This is because libstdc++-v3/configure runs 'generate-headers' (via config.status), which is 'cd include && make', causing dependencies in 'include/Makefile' to be evaluated, causing make to run autoconf to recreate libstdc++-v3/configure. Actually, an unexpected libstdc++-v3/configure change is introduced by some inconsistency in https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=223196 where acinclude.m4 contains different code than the recreated configure: acinclude.m4-3973: case "${target_os}" in acinclude.m4#3974: gnu* | linux* | solaris*) acinclude.m4-3975: GCC_TRY_COMPILE_OR_LINK( configure-79218: case "${target_os}" in configure#79219: gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu | solaris*) configure-79220: if test x$gcc_no_link = xyes; then Not sure how to handle such kind of inconsistencies though... It's unfortunate that David didn't document the meaning of raw_cxx... I've read the threads from 2004 and I'm still lost. For vtv the breakage once resulted in the different CC/CXX values here, but that must've been related to maintainer-mode actually, not raw_cxx. From my point of view, this breakage vanished as reason for the raw_cxx patch. /haubi/
RE: [Patch, MIPS] Enable fp-contract on MIPS and update -mfused-madd
Richard, please have a look at my question below in a reference to your previous statement. On Thu, 18 Jun 2015, Steve Ellcey wrote: OK, I checked in the prequel patch and here is a new copy of the original patch based off of that (and with no HONOR_NAN checks in the fma/madd instructions). OK for checkin? Please see below for my notes. 2015-06-18 Steve Ellcey sell...@imgtec.com * config.gcc (mips*-*-*): Add fused-madd.opt. Please use angle brackets as per https://www.gnu.org/prep/standards/html_node/Indicating-the-Part-Changed.html, i.e.: * config.gcc <mips*-*-*>: Add fused-madd.opt. There's no function or similar entity involved here and `mips*-*-*' is a case value like with the C language's `switch' statement where you'd use angle brackets too to refer to individual cases. (*nmsub4<mode>_fastmath) Update condition. Extraneous space here. diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md index f6912e1..4f5692c 100644 --- a/gcc/config/mips/mips.md +++ b/gcc/config/mips/mips.md [...] +;; fnma is defined in GCC as (fma (neg op1) op2 op3) +;; (-op1 * op2) + op3 == -(op1 * op2) + op3 == -((op1 * op2) - op3) +;; The mips nmsub instructions implement -((op1 * op2) - op3) +;; This transformation means we may return the wrong signed zero +;; so we check HONOR_SIGNED_ZEROS. + +(define_expand "fnma<mode>4" + [(set (match_operand:ANYF 0 "register_operand") + (fma:ANYF (neg:ANYF (match_operand:ANYF 1 "register_operand")) + (match_operand:ANYF 2 "register_operand") + (match_operand:ANYF 3 "register_operand")))] + "(ISA_HAS_FUSED_MADD3 || ISA_HAS_FUSED_MADD4) + && !HONOR_SIGNED_ZEROS (<MODE>mode)") Have you considered the alternative/complementary approach proposed by Richard here: http://gcc.gnu.org/ml/gcc-patches/2010-11/msg00680.html, i.e. to introduce further expanders, e.g.: fman<M>4: (neg:M (fma:M OP1 OP2 OP3)) (multiply-add, negated) fmsn<M>4: (neg:M (fma:M OP1 OP2 (neg:M OP3))) (multiply-subtract, negated) ? 
These patterns wouldn't need a check for !HONOR_SIGNED_ZEROS as they match the respective hardware instructions in an exact manner. Therefore I think they would be more useful as they would also suit software that claims/requires full IEEE Std 754 compliance. Richard, do you maintain the introduction of these additional operations would be a good idea and one you're willing to support for the purpose of patch acceptance/approval if implemented? +;; fnms is defined as: (fma (neg op1) op2 (neg op3)) +;; ((-op1) * op2) - op3 == -(op1 * op2) - op3 == -((op1 * op2) + op3) +;; The mips nmadd instructions implement -((op1 * op2) + op3) +;; This transformation means we may return the wrong signed zero +;; so we check HONOR_SIGNED_ZEROS. + +(define_expand "fnms<mode>4" + [(set (match_operand:ANYF 0 "register_operand") + (fma:ANYF + (neg:ANYF (match_operand:ANYF 1 "register_operand")) + (match_operand:ANYF 2 "register_operand") + (neg:ANYF (match_operand:ANYF 3 "register_operand"))))] + "(ISA_HAS_FUSED_MADD3 || ISA_HAS_FUSED_MADD4) + && !HONOR_SIGNED_ZEROS (<MODE>mode)") Same observation here. The change looks good to me otherwise. Maciej
[nios2, committed] fix bad assertion
When I was preparing to regression-test something else in a nios2-linux-gnu build, I discovered it was ICE'ing while building shared libraries with -fpic (glibc, libgomp). I tracked this down to having started with r224048, but on further investigation I decided that commit merely exposed a latent bug. The trouble is that the assertion in nios2_delegitimize_address is too restrictive compared to what nios2_legitimize_address can produce. It's expecting to find a SYMBOL_REF underneath but in one case it was crashing on a LABEL_REF (for a computed goto), and in another case it was a symbol + offset expression which is even documented with a big block of comments in nios2_legitimize_address. I've checked in this patch to relax the assertion; it allows the toolchain to build again, and test results look decent. -Sandra 2015-06-29 Sandra Loosemore san...@codesourcery.com gcc/ * config/nios2/nios2.c (nios2_delegitimize_address): Make assert less restrictive. Index: gcc/config/nios2/nios2.c === --- gcc/config/nios2/nios2.c (revision 225094) +++ gcc/config/nios2/nios2.c (working copy) @@ -1920,7 +1920,7 @@ nios2_delegitimize_address (rtx x) case UNSPEC_LOAD_TLS_IE: case UNSPEC_ADD_TLS_LE: x = XVECEXP (XEXP (x, 0), 0, 0); - gcc_assert (GET_CODE (x) == SYMBOL_REF); + gcc_assert (CONSTANT_P (x)); break; } }
[gomp4] Remove unnecessary conversions
Hi, This patch removes the conversion of the arguments. As the arguments are already of the proper type, via the front ends, there's no need to the conversion. Furthermore, if the node is a 'const int', then the conversion function will create a node that can't be used as an argument to a gimple call. Committed to gomp-4_0-branch. Jim diff --git a/gcc/ChangeLog.gomp b/gcc/ChangeLog.gomp index 4d449b0..f2b6c4b 100644 --- a/gcc/ChangeLog.gomp +++ b/gcc/ChangeLog.gomp @@ -1,3 +1,7 @@ +2015-06-29 James Norris jnor...@codesourcery.com + + * omp-low.c (expand_omp_target): Remove unnecessary conversion. + 2015-06-26 Nathan Sidwell nat...@codesourcery.com * config/nvptx/nvptx.md (UNSPEC_ID): Rename to ... diff --git a/gcc/omp-low.c b/gcc/omp-low.c index 5914fcf..24fac7c 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -10138,22 +10138,17 @@ expand_omp_target (struct omp_region *region) = fold_convert_loc (gimple_location (entry_stmt), integer_type_node, integer_one_node); /* ..., but if present, use the value specified by the respective - clause, making sure that are of the correct type. */ + clause. */ c = find_omp_clause (clauses, OMP_CLAUSE_NUM_GANGS); if (c) - t_num_gangs = fold_convert_loc (OMP_CLAUSE_LOCATION (c), - integer_type_node, - OMP_CLAUSE_NUM_GANGS_EXPR (c)); + t_num_gangs = OMP_CLAUSE_NUM_GANGS_EXPR (c); c = find_omp_clause (clauses, OMP_CLAUSE_NUM_WORKERS); if (c) - t_num_workers = fold_convert_loc (OMP_CLAUSE_LOCATION (c), - integer_type_node, - OMP_CLAUSE_NUM_WORKERS_EXPR (c)); + t_num_workers = OMP_CLAUSE_NUM_WORKERS_EXPR (c); c = find_omp_clause (clauses, OMP_CLAUSE_VECTOR_LENGTH); if (c) - t_vector_length = fold_convert_loc (OMP_CLAUSE_LOCATION (c), - integer_type_node, - OMP_CLAUSE_VECTOR_LENGTH_EXPR (c)); + t_vector_length = OMP_CLAUSE_VECTOR_LENGTH_EXPR (c); + args.quick_push (t_num_gangs); args.quick_push (t_num_workers); args.quick_push (t_vector_length);
Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
Hi All, Here is updated patch containing missed change in slpeel_tree_peel_loop_to_edge which prevents renaming of exit PHI uses in inner loop. ChangeLog: 2015-06-29 Yuri Rumyantsev ysrum...@gmail.com * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument to allow renaming of PHI arguments on edges incoming from outer loop header, add corresponding check before start PHI iterator. (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool variable DUPLICATE_OUTER_LOOP and set it to true for outer loops with true force_vectorize. Set-up dominator for outer loop too. Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb. (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it was marked with force_vectorize and has restricted cfg. (slpeel_tree_peel_loop_to_edge): Do not rename exit PHI uses in inner loop. * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not do peeling for outer loops. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-outer-simd-2.c: New test. 2015-06-17 19:35 GMT+03:00 Yuri Rumyantsev ysrum...@gmail.com: Richard, I attached updated patch. You asked me to explain why I did changes for renaming. If we do not change PHI arguments for inner loop header we can get the following IR: source outer loop: bb 5: outer-loop header # i_45 = PHI 0(4), i_18(9) # .MEM_17 = PHI .MEM_4(D)(4), .MEM_44(9) bb 6:inner-loop header # .MEM_46 = PHI .MEM_44(7), .MEM_17(5) after duplication we have (without argument correction): bb 12: # i_74 = PHI i_64(13), 0(17) # .MEM_73 = PHI .MEM_97(13), .MEM_4(D)(17) bb 15: # .MEM_63 = PHI .MEM_17(12), .MEM_97(16) and later we get verifier error: test1.c:20:6: error: definition in block 6 does not dominate use in block 10 void foo (int n) ^ for SSA_NAME: .MEM_17 in statement: .MEM_63 = PHI .MEM_17(10), .MEM_97(14) and you can see that we need to rename MEM_17 argument for out-coming edge to MEM_73 since MEM_17 was converted to MEM_73 in outer-loop header. 
This explains my simple fix in rename_variables_in_bb. Note also that loop distribution is not performed for outer loops. I also did a change in slpeel_can_duplicate_loop_p to simplify check. Any comments will be appreciated. Yuri. 2015-06-17 15:24 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Thanks a lot Richard for your review. I presented updated patch which is not gated by force_vectorize. I added test on outer-loop in vect_enhance_data_refs_alignment and it returns false for it because we can not improve dr alighment through outer-loop peeling in general. So I assume that only versioning for alignment can be applied for targets do not support unaligned memory access. @@ -998,7 +998,12 @@ gimple stmt = DR_STMT (dr); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); tree vectype = STMT_VINFO_VECTYPE (stmt_info); + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + /* Peeling of outer loops can't improve alignment. */ + if (loop_vinfo LOOP_VINFO_LOOP (loop_vinfo)-inner) +return false; + but this looks wrong. It depends on the context (DRs in the outer loop can improve alignment by peeling the outer loop and we can still peel the inner loop for alignment). So IMHO the correct place to amend is vect_enhance_data_refs_alignment (which it seems doesn't consider peeling the inner loop). I'd say you should simply add || loop-inner) to the /* Check if we can possibly peel the loop. */ if (!vect_can_advance_ivs_p (loop_vinfo) || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))) do_peeling = false; check. I did not change tests for outer loops in slpeel_can_duplicate_loop_p as you proposed since it is not called outside vectorization. There is still no reason for this complex condition, so please remove it. _Please_ also generate diffs with -p, it is very tedious to see patch hunks without a function name context. 
You didn't explain why you needed the renaming changes as I don't remember needing it when using the code from loop distribution. I also noticed one not-resolved issue with outer-loop peeling - we don't consider remainder for possible vectorization of inner-loop as we can see on the following example: for (i = 0; i < n; i++) { diff = 0; for (j = 0; j < M; j++) { diff += in[j+i]*coeff[j]; } out[i] = diff; } Is it worth to fix it? You mean vectorizing the inner loop in the niter peel epilogue loop of the outer loop? Possibly yes. Thanks, Richard. 2015-06-16 Yuri Rumyantsev ysrum...@gmail.com * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument to allow renaming of PHI arguments on edges incoming from outer loop header, add corresponding check before start PHI iterator. (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool variable
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
On Jun 29, 2015, at 4:44 AM, Dr. Philipp Tomsich philipp.toms...@theobroma-systems.com wrote: James, On 29 Jun 2015, at 13:36, James Greenhalgh james.greenha...@arm.com wrote: On Mon, Jun 29, 2015 at 10:18:23AM +0100, Kumar, Venkataramanan wrote: -Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, This does not come unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantuees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. True. I would be very uncomfortable with this approach. Same here. The default must be safe. Always. Unlike other architectures, we don’t have a problem with making the proper defaults for “safety”, as the ARMv8 ISA guarantees a minimum number of precise bits per iteration. From Richard Biener's post in the thread Michael Matz linked earlier in the thread: It would follow existing practice of things we allow in -funsafe-math-optimizations. Existing practice in that we want to allow -ffast-math use with common benchmarks we care about. https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00100.html With the solution you seem to be converging on (2-steps for some microarchitectures, 3 for others), a binary generated for one micro-arch may drop below a minimum guarantee of precision when run on another. This seems to go against the spirit of the practice above. I would only support adding this optimization to -Ofast if we could keep to architectural guarantees of precision in the generated code (i.e. 3-steps everywhere). 
I don't object to adding a -mlow-precision-recip-sqrt style option, which would be off by default, would enable the 2-step mode, and would need to be explicitly enabled (i.e. not implied by -mcpu=foo) but I don't see what this buys you beyond the Gromacs boost (and even there you would be creating an Invalid Run as optimization flags must be applied across all workloads). Any flag that reduces precision (and thus breaks IEEE floating-point semantics) needs to be gated with an “unsafe” flag (i.e. one that is never on by default). As a consequence, the “peak”-tuning for SPEC will turn this on… but barely anyone else would. For the 3-step optimization, it is clear to me that for generic tuning we don't want this to be enabled by default experimental results and advice in this thread argues against it for thunderx and cortex-a57 targets. However, enabling it based on the CPU tuning selected seems fine to me. I do not agree on this one, as I would like to see the safe form (i.e. 3 and 5 iterations respectively) to become the default. Most “server-type” chips should not see a performance regression, while it will be easier to optimise for this in hardware than for a (potentially microcoded) sqrt-instruction (and subsequent, dependent divide). I have not heard anyone claim a performance regression (either on thunderx or on cortex-a57), but merely heard a “no speed-up”. Actually it does regress performance on thunderX, I just assumed that when I said not going to be a win it was taken as a slow down. It regress gromacs by more than 10% on thunderX but I can't remember how much as i had someone else run it. The latency difference is also over 40%; for example single precision: 29 cycles with div (12) sqrt(17) directly vs 42 cycles with the rsqrte and 2 iterations of 2mul/rsqrts (double is 53 vs 60). That is huge difference right there. ThunderX has a fast div and a fast sqrt for 32bit and a reasonable one for double. 
So again this is not just not a win but rather a regression for thunderX. I suspect cortex-a57 is also true. Thanks, Andrew So I am strongly in favor of defaulting to the ‘safe’ number of iterations, even when compiling for a generic target. Best, Philipp.
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Benedikt , I have ICE for 444.namd with your patch, not sure if something wrong in my local tree. Venkat, now I could reproduce it. Strangely it does not happen with -flto. I will try to find out the reason for that. Thank you for the catch, Benedikt signature.asc Description: Message signed with OpenPGP using GPGMail
Fix PR ada/63310
This fixes the -fcompare-debug bootstrap failure reported by Jakub. Tested on x86_64-suse-linux, applied on the mainline. 2015-06-29 Eric Botcazou ebotca...@adacore.com PR ada/63310 * gcc-interface/utils.c (gnat_write_global_declarations): Always build the dummy global variable if code was generated. -- Eric Botcazou Index: gcc-interface/utils.c === --- gcc-interface/utils.c (revision 225111) +++ gcc-interface/utils.c (working copy) @@ -5202,8 +5202,9 @@ gnat_write_global_declarations (void) tree iter; /* If we have declared types as used at the global level, insert them in - the global hash table. We use a dummy variable for this purpose. */ - if (types_used_by_cur_var_decl && !types_used_by_cur_var_decl->is_empty ()) + the global hash table. We use a dummy variable for this purpose, but + we need to build it unconditionally to avoid -fcompare-debug issues. */ + if (first_global_object_name) { struct varpool_node *node; char *label; @@ -5218,11 +5219,12 @@ gnat_write_global_declarations (void) node->definition = 1; node->force_output = 1; - while (!types_used_by_cur_var_decl->is_empty ()) - { - tree t = types_used_by_cur_var_decl->pop (); - types_used_by_var_decl_insert (t, dummy_global); - } + if (types_used_by_cur_var_decl) + while (!types_used_by_cur_var_decl->is_empty ()) + { + tree t = types_used_by_cur_var_decl->pop (); + types_used_by_var_decl_insert (t, dummy_global); + } } /* Output debug information for all global type declarations first. This
[PATCH, i386] Remove MPX jump insn patterns
Hello! Attached patch removes special MPX jump insn patterns and extends generic jump insn patterns to handle bnd prefix using existing MPX infrastructure. 2015-06-29 Uros Bizjak ubiz...@gmail.com * config/i386/i386.md (*jcc_1): Use %! in asm template. Set attribute length_nobnd instead of length. (*jcc_2): Ditto. (jump): Ditto. (*jcc_1_bnd, *jcc_2_bnd, jump_bnd): Remove insn patterns. Patch was tested on x86_64-linux-gnu {,-m32} and committed to mainline SVN. Uros. Index: config/i386/i386.md === --- config/i386/i386.md (revision 225123) +++ config/i386/i386.md (working copy) @@ -10948,24 +10948,6 @@ ;; Basic conditional jump instructions. ;; We ignore the overflow flag for signed branch instructions. -(define_insn *jcc_1_bnd - [(set (pc) - (if_then_else (match_operator 1 ix86_comparison_operator - [(reg FLAGS_REG) (const_int 0)]) - (label_ref (match_operand 0)) - (pc)))] - TARGET_MPX ix86_bnd_prefixed_insn_p (insn) - bnd %+j%C1\t%l0 - [(set_attr type ibr) - (set_attr modrm 0) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 3) -(const_int 7)))]) - (define_insn *jcc_1 [(set (pc) (if_then_else (match_operator 1 ix86_comparison_operator @@ -10973,35 +10955,18 @@ (label_ref (match_operand 0)) (pc)))] - %+j%C1\t%l0 + %!%+j%C1\t%l0 [(set_attr type ibr) (set_attr modrm 0) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 2) -(const_int 6)))]) + (set (attr length_nobnd) + (if_then_else + (and (ge (minus (match_dup 0) (pc)) + (const_int -126)) + (lt (minus (match_dup 0) (pc)) + (const_int 128))) + (const_int 2) + (const_int 6)))]) -(define_insn *jcc_2_bnd - [(set (pc) - (if_then_else (match_operator 1 ix86_comparison_operator - [(reg FLAGS_REG) (const_int 0)]) - (pc) - (label_ref (match_operand 0] - TARGET_MPX ix86_bnd_prefixed_insn_p (insn) - bnd %+j%c1\t%l0 
- [(set_attr type ibr) - (set_attr modrm 0) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 3) -(const_int 7)))]) - (define_insn *jcc_2 [(set (pc) (if_then_else (match_operator 1 ix86_comparison_operator @@ -11009,16 +10974,17 @@ (pc) (label_ref (match_operand 0] - %+j%c1\t%l0 + %!%+j%c1\t%l0 [(set_attr type ibr) (set_attr modrm 0) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 2) -(const_int 6)))]) + (set (attr length_nobnd) + (if_then_else + (and (ge (minus (match_dup 0) (pc)) + (const_int -126)) + (lt (minus (match_dup 0) (pc)) + (const_int 128))) + (const_int 2) + (const_int 6)))]) ;; In general it is not safe to assume too much about CCmode registers, ;; so simplify-rtx stops when it sees a second one. Under certain @@ -11452,35 +11418,21 @@ ;; Unconditional and other jump instructions -(define_insn jump_bnd - [(set (pc) - (label_ref (match_operand 0)))] - TARGET_MPX ix86_bnd_prefixed_insn_p (insn) - bnd jmp\t%l0 - [(set_attr type ibr) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -(const_int 3) -(const_int 6))) - (set_attr modrm 0)]) - (define_insn jump [(set (pc) (label_ref (match_operand 0)))] - jmp\t%l0 + %!jmp\t%l0 [(set_attr type ibr) - (set (attr length) - (if_then_else (and (ge (minus (match_dup 0) (pc)) - (const_int -126)) - (lt (minus (match_dup 0) (pc)) - (const_int 128))) -
Re: [PATCH, PR66652] Use max_loop_iterations in transform_to_exit_first_loop_alt
On 06/29/2015 08:24 AM, Tom de Vries wrote: Hi, this patch fixes PR66652. It uses max_loop_iterations in transform_to_exit_first_loop_alt to ensure that the new loop bound nit + 1 doesn't overflow. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom 0001-Use-max_loop_iterations-in-transform_to_exit_first_l.patch Use max_loop_iterations in transform_to_exit_first_loop_alt 2015-06-29 Tom de Vriest...@codesourcery.com PR tree-optimization/66652 * tree-parloops.c (try_transform_to_exit_first_loop_alt): Use max_loop_iterations to determine if nit + 1 overflows. * testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c (f): Rewrite using restrict pointers. (main): Add arguments to calls to f. * testsuite/libgomp.c/parloops-exit-first-loop-alt.c: Same. * gcc.dg/parloops-exit-first-loop-alt-pr66652.c: New test. * gcc.dg/parloops-exit-first-loop-alt-3.c (f): Rewrite using restrict pointers. * gcc.dg/parloops-exit-first-loop-alt.c: Same. OK. Make sure to put the PR marker in the testsuite/ChangeLog entry and drop the testsuite/ prefix in the testsuite/ChangeLog entry. jeff
Re: [Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
On Mon, 29 Jun 2015, David Sherwood wrote: Hi, I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the IEEE versions of fmin and fmax. This is done by recognising the math library fmax and fmin builtin functions in a similar way to how this is done for -ffast-math. This also allows us to vectorise the IEEE max/min functions for targets that support it, for example aarch64/aarch32. This patch is missing documentation. You need to document the new insn patterns in md.texi and the new tree codes in generic.texi. -- Joseph S. Myers jos...@codesourcery.com
RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Hi, -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Monday, June 29, 2015 11:04 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Dr. Philipp Tomsich; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Benedikt , I have ICE for 444.namd with your patch, not sure if something wrong in my local tree. Venkat, now I could reproduce it. Strangely it does not happen with -flto. I will try to find out the reason for that. Sure, I kind of remember that RTX for target in builtin rsqrt was 0. But not sure if we need to create reg rtx in such cases. Thank you for the catch, Benedikt
Re: [PATCH 1/2] Add gcc/typed-splay-tree.h
On Thu, 2015-06-25 at 13:18 -0600, Jeff Law wrote: On 06/25/2015 01:13 PM, David Malcolm wrote: I found when implementing switch statements for the jit that it was much easier to work with libiberty's splay-tree.h by first wrapping it in a C++ wrapper to add typesafety. This patch adds such a wrapper, implementing the methods I needed. It's unused in this patch, but is used in the followup patch (which only touches the jit). OK for trunk? gcc/ChangeLog: * typed-splay-tree.h: New file. OK. Well, this is embarrassing, it seems the patch I posted to the list doesn't actually compile. The underlying splay_tree_insert returns a splay_tree_node, I had the typed_splay_tree::insert returning a value_type. I dropped this bogus return type from the insert method in the implementation, in favor of void, but I forgot to update the declaration, leading to errors when attempting to actually compile this (via jit/jit-recording.c in the followup patch). The attached one-liner patch drops it from the declaration, and applies to [PATCH 1/2]. I don't know if I can count this as obvious... It does compile now, and make check-jit looks good. Suitably mortified Dave commit 839fdd693d2fe8d12f04ab5bde20e8595b3031f2 Author: David Malcolm dmalc...@redhat.com Date: Fri Jun 26 05:22:32 2015 -0400 fix typed-splay-tree.h insert diff --git a/gcc/typed-splay-tree.h b/gcc/typed-splay-tree.h index 1ec4894..7849862 100644 --- a/gcc/typed-splay-tree.h +++ b/gcc/typed-splay-tree.h @@ -42,7 +42,7 @@ class typed_splay_tree value_type lookup (key_type k); value_type predecessor (key_type k); value_type successor (key_type k); - value_type insert (key_type k, value_type v); + void insert (key_type k, value_type v); private: static value_type node_to_value (splay_tree_node node);
Re: [RFA] Factor conversion out of COND_EXPR using match.pd pattern
On 06/01/2015 04:55 AM, Richard Biener wrote: On Sat, May 30, 2015 at 11:11 AM, Marc Glisse marc.gli...@inria.fr wrote: (only commenting on the technique, not on the transformation itself) +(simplify + (cond @0 (convert @1) INTEGER_CST@2) + (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)) +COMPARISON_CLASS_P (@0) If you add COMPARISON_CLASS_P to define_predicates, you could write: (cond COMPARISON_CLASS_P@0 (convert @1) INTEGER_CST@2) But that would fail to match on GIMPLE, so I don't like either variant as Jeffs relies on the awkward fact that on GIMPLE cond expr conditions are GENERIC and yours wouldn't work. That said - for this kind of patterns testcases that exercise the patterns on GIMPLE would be very appreciated. It may be the case that these patterns don't make a lot of sense on gimple and should be restricted to generic, at least with our current infrastructure. The problem is when we lower from generic to gimple, we end up with branchy code, not straight line code and there's no good way I see to write a match.pd pattern which encompasses flow control. So to discover the MIN/MAX with typecast, we're probably stuck hacking minmax_replacement to know when it can ignore the typecast. That may in fact be a good thing -- I haven't looked closely yet, but 45397 may be of a similar nature (no good chance to see straight line code for match.pd, thus we're forced to handle it in phiopt). So do we want to restrict the new pattern on GENERIC, then rely on phiopt to get smarter and catch this stuff for GIMPLE? Or drop the pattern totally and do everything in phiopt on GIMPLE? or maybe use a for loop on comparisons, which would give names to TREE_OPERAND (@0, *). This should even handle the operand_equal_p alternative: (cond (cmp:c@0 @1 @2) (convert @1) INTEGER_CST@2) Yes, that would be my reference. But won't this require pointer equivalence? Are INTEGER_CST nodes fully shared? What if @1 is something more complex than a _DECL node (remember, we're working with GENERIC). 
So something like (cond (cmp:c@0 @1 @2) (convert @3) INTEGER_CST@4)) And using operand_equal_p seems more appropriate to me (and is still better than the original (cond @0 ...) and grubbing around inside @0 to look at operands. + && int_fits_type_p (@2, TREE_TYPE (@1)) + && ((operand_equal_p (TREE_OPERAND (@0, 0), @2, 0) + && operand_equal_p (TREE_OPERAND (@0, 1), @1, 0)) + || (operand_equal_p (TREE_OPERAND (@0, 0), @1, 0) + && operand_equal_p (TREE_OPERAND (@0, 1), @2, 0)))) + (with { tree itype = TREE_TYPE (@1); tree otype = TREE_TYPE (@2); } + (convert:otype (cond:itype @0 @1 (convert:itype @2)))) This should be enough, no need to specify the outer type (convert (cond:itype @0 @1 (convert:itype @2))) Yes. I believe we should not have to write cond:itype here, cond should be made to use the type of its second argument instead of the first one, by default (expr::gen_transform already has a few special cases). Indeed. Patch welcome (I'd have expected it already works...) With Marc's fix, those explicit types are no longer needed. Jeff
Re: [gomp4, fortran] Patch to fix continuation checks of OpenACC and OpenMP directives
Hi Cesar! Thanks for your review! 08.06.2015, 17:59, Cesar Philippidis ce...@codesourcery.com: On 06/07/2015 02:05 PM, Ilmir Usmanov wrote: Fixed fortran mail-list address. Sorry for inconvenience. 08.06.2015, 00:01, Ilmir Usmanov m...@ilmir.us: Hi Cesar! This patch fixes checks of OpenMP and OpenACC continuations in case if someone mixes them (i.e. continues OpenMP directive with !$ACC sentinel or vice versa). OK for gomp branch? Thanks for working on this. Does this fix PR63858 by any chance? No problem. I had a feeling that something is wrong in the scanner since I've committed an initial support of OpenACC ver. 1.0 to gomp branch (more than a year ago). Now it does fix the PR, because I've added support of fixed form to the patch. BTW, your test in the PR has a wrong continuation. Fixed test added to the patch. two minor nits... 0001-Fix-mix-of-OpenACC-and-OpenMP-sentinels-in-continuat.patch From 5492bf5bc991b6924f5e3b35c11eeaed745df073 Mon Sep 17 00:00:00 2001 From: Ilmir Usmanov i.usma...@samsung.com Date: Sun, 7 Jun 2015 23:55:22 +0300 Subject: [PATCH] Fix mix of OpenACC and OpenMP sentinels in continuation --- gcc/fortran/ChangeLog | 5 + Use ChangeLog.gomp for gomp-4_0-branch. Done. + /* In case we have an OpenMP directive continued by OpenACC + sentinel, or vice versa, we get both openmp_flag and + openacc_flag on. */ + + if (openacc_flag && openmp_flag) + { + int is_openmp = 0; + for (i = 0; i < 5; i++, c = next_char ()) + { + if (gfc_wide_tolower (c) != (unsigned char) "!$acc"[i]) + is_openmp = 1; + if (i == 4) + old_loc = gfc_current_locus; + } + gfc_error ("Wrong %s continuation at %C: expected %s, got %s", + is_openmp ? "OpenACC" : "OpenMP", + is_openmp ? "!$ACC" : "!$OMP", + is_openmp ? "!$OMP" : "!$ACC"); I think it's better for the translation project if you made this a complete string. So maybe change this line into gfc_error (is_openmp ? 
"Wrong continuation at %C: expected !$ACC, got !$OMP" : "Wrong continuation at %C: expected !$OMP, got !$ACC"); Done Other than that, it looks fine. Thanks, Cesar OK for gomp branch? -- Ilmir. From dc98062b499838ca4391c0ed497f40205267a44e Mon Sep 17 00:00:00 2001 From: Ilmir Usmanov m...@ilmir.us Date: Tue, 30 Jun 2015 03:05:23 +0300 Subject: [PATCH] PR/63858 --- gcc/fortran/ChangeLog.gomp | 10 ++ gcc/fortran/scanner.c | 259 +--- gcc/testsuite/ChangeLog.gomp| 7 + gcc/testsuite/gfortran.dg/goacc/omp-fixed.f | 32 gcc/testsuite/gfortran.dg/goacc/omp.f95 | 8 + 5 files changed, 216 insertions(+), 100 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp-fixed.f diff --git a/gcc/fortran/ChangeLog.gomp b/gcc/fortran/ChangeLog.gomp index aa02ffe..0a5bf47 100644 --- a/gcc/fortran/ChangeLog.gomp +++ b/gcc/fortran/ChangeLog.gomp @@ -1,3 +1,13 @@ +2015-06-30 Ilmir Usmanov m...@ilmir.us + + PR/63858 + Fix mix of OpenACC and OpenMP sentinels in continuation. + * scanner.c (skip_omp_attribute_fixed, skip_oacc_attribute_fixed): New + functions. + (skip_fixed_comments): Fix mix of OpenACC and OpenMP sentinels in + continuation. + (gfc_next_char_literal): Likewise. + 2015-06-18 James Norris jnor...@codesourcery.com * trans-decl.c (find_module_oacc_declare_clauses): Fix setting of diff --git a/gcc/fortran/scanner.c b/gcc/fortran/scanner.c index f0e6404..d66cf65 100644 --- a/gcc/fortran/scanner.c +++ b/gcc/fortran/scanner.c @@ -935,6 +935,63 @@ skip_free_comments (void) return false; } +/* Return true if MP was matched in fixed form. */ +static bool +skip_omp_attribute_fixed (locus *start) +{ + gfc_char_t c; + if (((c = next_char ()) == 'm' || c == 'M') + && ((c = next_char ()) == 'p' || c == 'P')) +{ + c = next_char (); + if (c != '\n' + && (continue_flag + || c == ' ' || c == '\t' || c == '0')) + { + do + c = next_char (); + while (gfc_is_whitespace (c)); + if (c != '\n' && c != '!') + { + /* Canonicalize to *$omp. 
*/ + *start->nextc = '*'; + openmp_flag = 1; + gfc_current_locus = *start; + return true; + } + } +} + return false; +} + +/* Return true if CC was matched in fixed form. */ +static bool +skip_oacc_attribute_fixed (locus *start) +{ + gfc_char_t c; + if (((c = next_char ()) == 'c' || c == 'C') + && ((c = next_char ()) == 'c' || c == 'C')) +{ + c = next_char (); + if (c != '\n' + && (continue_flag + || c == ' ' || c == '\t' || c == '0')) + { + do + c = next_char (); + while (gfc_is_whitespace (c)); + if (c != '\n' && c != '!') + { + /* Canonicalize to *$acc. */ + *start->nextc = '*'; + openacc_flag = 1; + gfc_current_locus = *start; + return true; + } + } +} + return false; +} /* Skip comment lines in fixed source mode. We have the same rules as in skip_free_comment(), except that we can
[PATCH][6/n] Remove GENERIC stmt combining from SCCVN
This moves X ==/!= ptr comparison folding which triggers surprisingly often. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-06-29 Richard Biener rguent...@suse.de * genmatch.c (add_operator): Treat ADDR_EXPR as atom. * fold-const.c (fold_binary_loc): Move A - B simplification via ptr_difference_const ... * match.pd: ... here. When matching (X ^ Y) == Y also match with swapped operands. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 225007) +++ gcc/genmatch.c (working copy) @@ -324,6 +324,9 @@ add_operator (enum tree_code code, const /* And allow CONSTRUCTOR for vector initializers. */ && !(code == CONSTRUCTOR)) return; + /* Treat ADDR_EXPR as atom, thus don't allow matching its operand. */ + if (code == ADDR_EXPR) +nargs = 0; operator_id *op = new operator_id (code, id, nargs, tcc); id_base **slot = operators->find_slot_with_hash (op, op->hashval, INSERT); if (*slot) Index: gcc/fold-const.c === --- gcc/fold-const.c(revision 225007) +++ gcc/fold-const.c(working copy) @@ -10618,16 +10562,6 @@ fold_binary_loc (location_t loc, fold_convert_loc (loc, type, negate_expr (arg1))); - /* Try folding difference of addresses. */ - { - HOST_WIDE_INT diff; - - if ((TREE_CODE (arg0) == ADDR_EXPR -|| TREE_CODE (arg1) == ADDR_EXPR) - && ptr_difference_const (arg0, arg1, &diff)) - return build_int_cst_type (type, diff); - } - /* Fold a[i] - a[j] to i-j. */ if (TREE_CODE (arg0) == ADDR_EXPR && TREE_CODE (TREE_OPERAND (arg0, 0)) == ARRAY_REF Index: gcc/match.pd === *** gcc/match.pd(revision 225007) --- gcc/match.pd(working copy) *** (define_operator_list swapped_tcc_compar *** 545,550 --- 545,565 (with { tree algn = wide_int_to_tree (TREE_TYPE (@0), wi::bit_not (@1)); } (bit_and @0 { algn; }))) + /* Try folding difference of addresses. 
*/ + (simplify + (minus (convert ADDR_EXPR@0) (convert @1)) + (if (tree_nop_conversion_p (type, TREE_TYPE (@0))) + (with { HOST_WIDE_INT diff; } +(if (ptr_difference_const (@0, @1, &diff)) + { build_int_cst_type (type, diff); })))) + (simplify + (minus (convert @0) (convert ADDR_EXPR@1)) + (if (tree_nop_conversion_p (type, TREE_TYPE (@0))) + (with { HOST_WIDE_INT diff; } +(if (ptr_difference_const (@0, @1, &diff)) + { build_int_cst_type (type, diff); })))) + + /* We can't reassociate at all for saturating types. */ (if (!TYPE_SATURATING (type)) *** (define_operator_list swapped_tcc_compar *** 1229,1235 /* (X ^ Y) == Y becomes X == 0. Likewise (X ^ Y) == X becomes Y == 0. */ (simplify ! (cmp (bit_xor:c @0 @1) @0) (cmp @1 { build_zero_cst (TREE_TYPE (@1)); })) /* (X ^ C1) op C2 can be rewritten as X op (C1 ^ C2). */ --- 1244,1250 /* (X ^ Y) == Y becomes X == 0. Likewise (X ^ Y) == X becomes Y == 0. */ (simplify ! (cmp:c (bit_xor:c @0 @1) @0) (cmp @1 { build_zero_cst (TREE_TYPE (@1)); })) /* (X ^ C1) op C2 can be rewritten as X op (C1 ^ C2). */
RE: [Patch, MIPS] Enable fp-contract on MIPS and update -mfused-madd
Maciej W. Rozycki ma...@linux-mips.org writes: Richard, please have a look at my question below in a reference to your previous statement. On Thu, 18 Jun 2015, Steve Ellcey wrote: OK, I checked in the prequel patch and here is a new copy of the original patch based off of that (and with no HONOR_NAN checks in the fma/madd instructions). OK for checkin? Please see below for my notes. 2015-06-18 Steve Ellcey sell...@imgtec.com * config.gcc (mips*-*-*): Add fused-madd.opt. Please use angle brackets as per https://www.gnu.org/prep/standards/html_node/Indicating-the-Part-Changed.html, i.e.: * config.gcc <mips*-*-*>: Add fused-madd.opt. There's no function or similar entity involved here and `mips*-*-*' is a case value like with the C language's `switch' statement where you'd use angle brackets too to refer to individual cases. (*nmsub4<mode>_fastmath): Update condition. Extraneous space here. The change looks good to me otherwise. Maciej If nobody can identify any further functional issues with this patch then I'd like to get this committed and pursue enhancements as a second round. Catherine, would you be happy for this to be committed on that basis? Thanks, Matthew
[PATCH, libcpp]: Use asm flag outputs in search_line_sse42 main loop
Hello! Attached patch introduces asm flag outputs in seach_line_sse42 main loop to handle carry flag value from pcmpestri insn. Slightly improved old code that uses asm loop compiles to: 96:66 0f 6f 05 00 00 00 movdqa 0x0(%rip),%xmm0 9d:00 9e:48 83 ef 10 sub$0x10,%rdi a2:ba 10 00 00 00 mov$0x10,%edx a7:b8 04 00 00 00 mov$0x4,%eax ac:0f 1f 40 00 nopl 0x0(%rax) b0:48 83 c7 10 add$0x10,%rdi b4:66 0f 3a 61 07 00pcmpestri $0x0,(%rdi),%xmm0 ba:73 f4jaeb0 _ZL17search_line_sse42PKhS0_+0x20 bc:48 8d 04 0f lea(%rdi,%rcx,1),%rax c0:c3 retq and new code results in: 96:66 0f 6f 05 00 00 00 movdqa 0x0(%rip),%xmm0 9d:00 9e:ba 10 00 00 00 mov$0x10,%edx a3:b8 04 00 00 00 mov$0x4,%eax a8:66 0f 3a 61 07 00pcmpestri $0x0,(%rdi),%xmm0 ae:72 0cjb bc _ZL17search_line_sse42PKhS0_+0x2c b0:48 83 c7 10 add$0x10,%rdi b4:66 0f 3a 61 07 00pcmpestri $0x0,(%rdi),%xmm0 ba:73 f4jaeb0 _ZL17search_line_sse42PKhS0_+0x20 bc:48 8d 04 0f lea(%rdi,%rcx,1),%rax c0:c3 retq which looks like an improvement to me. 2015-06-29 Uros Bizjak ubiz...@gmail.com * lex.c (search_line_sse42) [__GCC_ASM_FLAG_OUTPUTS__]: New main loop using asm flag outputs. Patch was bootstrapped and regression tested on x86_64-linux-gnu {,m32} (ivybridge), so both code paths were exercised. Since this is a new feature - does the approach look OK? Uros. Index: lex.c === --- lex.c (revision 225138) +++ lex.c (working copy) @@ -450,15 +450,30 @@ search_line_sse42 (const uchar *s, const uchar *en s = (const uchar *)((si + 16) -16); } - /* Main loop, processing 16 bytes at a time. By doing the whole loop - in inline assembly, we can make proper use of the flags set. */ - __asm ( sub $16, %1\n - .balign 16\n + /* Main loop, processing 16 bytes at a time. 
*/ +#ifdef __GCC_ASM_FLAG_OUTPUTS__ + while (1) +{ + char f; + __asm ("%vpcmpestri\t$0, %2, %3" +: "=c"(index), "=@ccc"(f) +: "m"(*s), "x"(search), "a"(4), "d"(16)); + if (f) + break; + + s += 16; +} +#else + s -= 16; + /* By doing the whole loop in inline assembly, + we can make proper use of the flags set. */ + __asm ( ".balign 16\n" "0: add $16, %1\n" - " %vpcmpestri $0, (%1), %2\n" + " %vpcmpestri\t$0, (%1), %2\n" " jnc 0b" : "=c"(index), "+r"(s) : "x"(search), "a"(4), "d"(16)); +#endif found: return s + index;
RE: [Patch, MIPS] Enable fp-contract on MIPS and update -mfused-madd
-Original Message- From: Matthew Fortune [mailto:matthew.fort...@imgtec.com] Sent: Monday, June 29, 2015 3:07 PM To: Maciej W. Rozycki; Steve Ellcey; Moore, Catherine; Richard Biener Cc: Richard Sandiford; Richard Sandiford; Myers, Joseph; gcc- patc...@gcc.gnu.org Subject: RE: [Patch, MIPS] Enable fp-contract on MIPS and update -mfused- madd Maciej W. Rozycki ma...@linux-mips.org writes: Richard, please have a look at my question below in a reference to your previous statement. On Thu, 18 Jun 2015, Steve Ellcey wrote: OK, I checked in the prequel patch and here is a new copy of the original patch based off of that (and with no HONOR_NAN checks in the fma/madd instructions). OK for checkin? Please see below for my notes. 2015-06-18 Steve Ellcey sell...@imgtec.com * config.gcc (mips*-*-*): Add fused-madd.opt. Please use angle brackets as per https://www.gnu.org/prep/standards/html_node/Indicating-the-Part- Chan ged.html, i.e.: * config.gcc mips*-*-*: Add fused-madd.opt. There's no function or similar entity involved here and `mips*-*-*' is a case value like with the C language's `switch' statement where you'd use angle brackets too to refer to individual cases. (*nmsub4mode_fastmath) Update condition. Extraneous space here. The change looks good to me otherwise. Maciej If nobody can identify any further functional issues with this patch then I'd like to get this committed and pursue enhancements as a second round. Catherine, would you be happy for this to be committed on that basis? Yes, this is fine with me.
RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Hi, -Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Monday, June 29, 2015 10:23 PM To: Dr. Philipp Tomsich Cc: James Greenhalgh; Kumar, Venkataramanan; Benedikt Huber; gcc- patc...@gcc.gnu.org; Marcus Shawcroft; Ramana Radhakrishnan; Richard Earnshaw Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math On Jun 29, 2015, at 4:44 AM, Dr. Philipp Tomsich philipp.toms...@theobroma-systems.com wrote: James, On 29 Jun 2015, at 13:36, James Greenhalgh james.greenha...@arm.com wrote: On Mon, Jun 29, 2015 at 10:18:23AM +0100, Kumar, Venkataramanan wrote: -Original Message- From: Dr. Philipp Tomsich [mailto:philipp.toms...@theobroma-systems.com] Sent: Monday, June 29, 2015 2:17 PM To: Kumar, Venkataramanan Cc: pins...@gmail.com; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, This does not come unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantuees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. True. I would be very uncomfortable with this approach. Same here. The default must be safe. Always. Unlike other architectures, we don’t have a problem with making the proper defaults for “safety”, as the ARMv8 ISA guarantees a minimum number of precise bits per iteration. From Richard Biener's post in the thread Michael Matz linked earlier in the thread: It would follow existing practice of things we allow in -funsafe-math-optimizations. Existing practice in that we want to allow -ffast-math use with common benchmarks we care about. 
https://gcc.gnu.org/ml/gcc-patches/2009-11/msg00100.html With the solution you seem to be converging on (2-steps for some microarchitectures, 3 for others), a binary generated for one micro-arch may drop below a minimum guarantee of precision when run on another. This seems to go against the spirit of the practice above. I would only support adding this optimization to -Ofast if we could keep to architectural guarantees of precision in the generated code (i.e. 3-steps everywhere). I don't object to adding a -mlow-precision-recip-sqrt style option, which would be off by default, would enable the 2-step mode, and would need to be explicitly enabled (i.e. not implied by -mcpu=foo) but I don't see what this buys you beyond the Gromacs boost (and even there you would be creating an Invalid Run as optimization flags must be applied across all workloads). Any flag that reduces precision (and thus breaks IEEE floating-point semantics) needs to be gated with an “unsafe” flag (i.e. one that is never on by default). As a consequence, the “peak”-tuning for SPEC will turn this on… but barely anyone else would. For the 3-step optimization, it is clear to me that for generic tuning we don't want this to be enabled by default experimental results and advice in this thread argues against it for thunderx and cortex- a57 targets. However, enabling it based on the CPU tuning selected seems fine to me. I do not agree on this one, as I would like to see the safe form (i.e. 3 and 5 iterations respectively) to become the default. Most “server-type” chips should not see a performance regression, while it will be easier to optimise for this in hardware than for a (potentially microcoded) sqrt-instruction (and subsequent, dependent divide). I have not heard anyone claim a performance regression (either on thunderx or on cortex-a57), but merely heard a “no speed-up”. 
Actually it does regress performance on thunderX; I just assumed that when I said "not going to be a win" it was taken as a slowdown. It regresses gromacs by more than 10% on thunderX, but I can't remember by how much as I had someone else run it. The latency difference is also over 40%; for example, single precision: 29 cycles with div (12) and sqrt (17) directly vs 42 cycles with the rsqrte and 2 iterations of 2mul/rsqrts (double is 53 vs 60). That is a huge difference right there. ThunderX has a fast div and a fast sqrt for 32bit and a reasonable one for double. So again this is not just not a win but rather a regression for thunderX. I suspect the same is true for cortex-a57. Thanks, Andrew Yes, theoretically it should be true for the cortex-a57 case as well. But I believe hardware pipelining with instruction scheduling in the compiler helps a little for the gromacs case, ~3% to 4% with the original patch. I have not tested other FP benchmarks. As James said a flag -mlow-precision-recip-sqrt if allowed can be
Re: C++ PATCH for c++/66255 (ICE with non-type template parameter of typedef type)
On 06/27/2015 04:28 AM, Andreas Schwab wrote: * pt.c (check_unstripped_args): Mark parameter as unused. Thanks. Jason
Re: [RFA] Factor conversion out of COND_EXPR using match.pd pattern
On Mon, Jun 29, 2015 at 3:12 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 29 Jun 2015, Jeff Law wrote: That said - for this kind of patterns testcases that exercise the patterns on GIMPLE would be very appreciated. It may be the case that these patterns don't make a lot of sense on gimple and should be restricted to generic, at least with our current infrastructure. The problem is when we lower from generic to gimple, we end up with branchy code, not straight line code and there's no good way I see to write a match.pd pattern which encompasses flow control. Andrew was working on a way to generate phiopt code automatically from match.pd patterns involving cond_expr, I don't know what the status is. I stopped due to many different things. One was match.pd was too immature for conditional expressions. The other reason was because I had other things to do internally. I might pick up but it won't be for another three months. Thanks, Andrew or maybe use a for loop on comparisons, which would give names to TREE_OPERAND (@0, *). This should even handle the operand_equal_p alternative: (cond (cmp:c@0 @1 @2) (convert @1) INTEGER_CST@2) Hmm, looks like I wrote INTEGER_CST before the second occurence of @2 instead of the first, so it is probably ignored :-( Yes, that would be my reference. But won't this require pointer equivalence? Are INTEGER_CST nodes fully shared? What if @1 is something more complex than a _DECL node (remember, we're working with GENERIC). So something like (cond (cmp:c@0 @1 @2) (convert @3) INTEGER_CST@4)) And using operand_equal_p seems more appropriate to me (and is still better than the original (cond @0 ...) and grubbing around inside @0 to look at operands. I don't understand this comment. When you write @1 twice, genmatch emits a call to operand_equal_p to check that they are the same. -- Marc Glisse
Re: [C++ Patch] PR 65977
OK. Jason
Re: [PATCH] Move X - (X / Y) * Y folding to match.pd
On Mon, 29 Jun 2015, Marek Polacek wrote: On Mon, Jun 29, 2015 at 09:36:59AM +0200, Richard Biener wrote: Anything wrong with this? +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1 That looks awfully similar to a variant I also tried (but I remember having convert1? and convert2? in it). Not sure what was wrong with that one; certainly yours seems to work fine. Afterwards I thought of a limitation. Nothing bad, but it highlights a trap I regularly fall into: several @0 in the same pattern may have different types (for INTEGER_CST, operand_equal_p mostly ignores the type). So for an int x, 42L-42/x*x should fail to match, while using convert1? and convert2? should match. -- Marc Glisse
[C++/66443]
This patch fixes 66443, a defect introduced by a C++14 change concerning virtual bases of abstract classes. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66443 points at DR1611 and DR1658, which describe the issue. Essentially, an abstract class type can only exist as a base of some other (non-abstract) class. Any virtual bases of the abstract class will be constructed by the containing class itself. In this specific instance, a deleted default ctor of such a virtual base should not delete the abstract class's default ctor. Built and tested on x86_64-linux, OK? nathan 2015-06-29 Nathan Sidwell nat...@acm.org PR c++/66443 * init.c (emit_mem_initializers): Do not emit initializers for virtual bases of abstract classes. * method.c (synthesized_method_walk): Skip virtual bases of abstract classes in C++14 mode. PR c++/66443 * cpp1y/pr66443.C: New test. Index: cp/init.c === --- cp/init.c (revision 225103) +++ cp/init.c (working copy) @@ -1139,9 +1139,7 @@ emit_mem_initializers (tree mem_inits) } /* Initialize the base. */ - if (BINFO_VIRTUAL_P (subobject)) - construct_virtual_base (subobject, arguments); - else + if (!BINFO_VIRTUAL_P (subobject)) { tree base_addr; @@ -1155,6 +1153,8 @@ emit_mem_initializers (tree mem_inits) tf_warning_or_error); expand_cleanup_for_base (subobject, NULL_TREE); } + else if (!ABSTRACT_CLASS_TYPE_P (current_class_type)) + construct_virtual_base (subobject, arguments); } in_base_initializer = 0; Index: cp/method.c === --- cp/method.c (revision 225103) +++ cp/method.c (working copy) @@ -1505,7 +1505,10 @@ synthesized_method_walk (tree ctype, spe vbases = CLASSTYPE_VBASECLASSES (ctype); if (vec_safe_is_empty (vbases)) /* No virtual bases to worry about. */; - else if (!assign_p) + else if (!assign_p + /* C++14 ignores virtual bases of abstract classes, as they + can never be directly called. 
*/ + && (cxx_dialect < cxx14 || !ABSTRACT_CLASS_TYPE_P (ctype))) { if (constexpr_p) *constexpr_p = false; } Index: testsuite/g++.dg/cpp1y/pr66443.C === --- testsuite/g++.dg/cpp1y/pr66443.C (revision 0) +++ testsuite/g++.dg/cpp1y/pr66443.C (working copy) @@ -0,0 +1,22 @@ +// PR c++/66443 +// { dg-do compile { target c++14 } } + +class A { +public: +A( int ) { } +}; + +// B's virtual base is ignored for default ctor determination as B is +// abstract. DR1611 DR1658 + +class B: virtual public A { +public: +virtual void do_something() = 0; +}; + +class C: public B { +public: +C(): A( 1 ) { } +virtual void do_something() { } +}; +
Re: [PATCH] Use PIE_SPEC/NO_PIE_SPEC for crtend.o/crtendS.o
On Sun, 28 Jun 2015, H.J. Lu wrote: Here is a patch. OK for trunk? Thanks. I'd like to check in this patch. Any comments, objections? Thanks. H.J. From 50bebf531193c18efb0982ac119694aa9f650e44 Mon Sep 17 00:00:00 2001 From: H.J. Lu hjl.to...@gmail.com Date: Thu, 25 Jun 2015 03:04:56 -0700 Subject: [PATCH 1/2] Use PIE_SPEC/NO_PIE_SPEC for crtend.o/crtendS.o We need to link with crtend.o and crtendS.o properly for GCC configured to generate PIE by default. * config/gnu-user.h (GNU_USER_TARGET_ENDFILE_SPEC): Use PIE_SPEC and NO_PIE_SPEC if HAVE_LD_PIE is defined. OK. -- Joseph S. Myers jos...@codesourcery.com
Re: Four jit backports to gcc 5 branch
On 29 June 2015 at 16:44, David Malcolm dmalc...@redhat.com wrote: I've gone over the changes to the gcc/jit and gcc/testsuite/jit.dg directories in trunk since gcc 5 and backported the following 4 changes from trunk to the gcc-5-branch: gcc-5-branch's r225123: * trunk's r222863 (8120405bda9bf28b004e09b23e74eda57ae1, addition of test-benchmark.c): https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00500.html gcc-5-branch's r225125: * trunk's r224531 (8154a3514d5fc8067a6928531d5f61cd768bd62c along with trunk's r224535 (1828cd755cf5e4a34d5638f543a059f3562ad957, PR jit/66539: Add parentheses as needed to gcc_jit_object_get_debug_string): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01158.html gcc-5-branch's r225127: * trunk's r224536 (3052eeefc4607a7147fdc55af6d86845890eb281, jit: Add a test for compound assignment): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01164.html gcc-5-branch's r225129: * trunk's r224565 (6689f47f53079d76bbb051d3b5da9018c2e0161a, jit: Add missing type-checking to gcc_jit_{l|r}value_access_field) https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01211.html Hi Dave - is this what will go into 5.2? If so, is the option to disable unreachable block validation going to be available? Thanks and Regards Dibyendu
Re: [RFA] Factor conversion out of COND_EXPR using match.pd pattern
On Mon, 29 Jun 2015, Jeff Law wrote: That said - for this kind of patterns testcases that exercise the patterns on GIMPLE would be very appreciated. It may be the case that these patterns don't make a lot of sense on gimple and should be restricted to generic, at least with our current infrastructure. The problem is when we lower from generic to gimple, we end up with branchy code, not straight line code and there's no good way I see to write a match.pd pattern which encompasses flow control. Andrew was working on a way to generate phiopt code automatically from match.pd patterns involving cond_expr, I don't know what the status is. or maybe use a for loop on comparisons, which would give names to TREE_OPERAND (@0, *). This should even handle the operand_equal_p alternative: (cond (cmp:c@0 @1 @2) (convert @1) INTEGER_CST@2) Hmm, looks like I wrote INTEGER_CST before the second occurence of @2 instead of the first, so it is probably ignored :-( Yes, that would be my reference. But won't this require pointer equivalence? Are INTEGER_CST nodes fully shared? What if @1 is something more complex than a _DECL node (remember, we're working with GENERIC). So something like (cond (cmp:c@0 @1 @2) (convert @3) INTEGER_CST@4)) And using operand_equal_p seems more appropriate to me (and is still better than the original (cond @0 ...) and grubbing around inside @0 to look at operands. I don't understand this comment. When you write @1 twice, genmatch emits a call to operand_equal_p to check that they are the same. -- Marc Glisse
Re: [patch] PR debug/66653: avoid late_global_decl on decl_type_context()s
On 06/29/2015 05:07 AM, Richard Biener wrote: On Fri, Jun 26, 2015 at 11:59 PM, Jason Merrill ja...@redhat.com wrote: On 06/26/2015 05:37 AM, Richard Biener wrote: Can we defer TLS model setting to template instantiation? We need to represent somehow that __thread (or thread_local) was used in the declaration, but DECL_THREAD_LOCAL_P was changed to refer to the TLS model. Ok, so easiest would be to allocate a bit from decl_with_vis for this... Or I can find a flag in the front end. I guess I'll do that. Jason
[PATCH] Discard Scops for which entry==exit
In this patch we discard the scops where entry and exit are the same BB. This is an effort to remove graphite-scop-detection.c:limit_scops. Removing the limit_scops function introduces correctness regressions. We are making relevant changes in incremental steps to fix those bugs, and finally we intend to remove limit_scops. 2015-06-29 Aditya Kumar aditya...@samsung.com Sebastian Pop s@samsung.com * graphite-scop-detection.c (build_scops_1): Discard scops for which entry==exit --- gcc/graphite-scop-detection.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index e8ddecd..f57cc4a 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -810,7 +810,14 @@ build_scops_1 (basic_block current, loop_p outermost_loop, { open_scop.exit = sinfo.exit; gcc_assert (open_scop.exit); - scops->safe_push (open_scop); + if (open_scop.entry != open_scop.exit) + scops->safe_push (open_scop); + else + { + sinfo.difficult = true; + sinfo.exits = false; + sinfo.exit = NULL; + } } result.exit = sinfo.exit; -- 2.1.0.243.g30d45f7
Re: Thinking about libgccjit SONAME bump for gcc 5.2 (was Re: Four jit backports to gcc 5 branch)
On 29 June 2015 at 22:26, David Malcolm dmalc...@redhat.com wrote: I'm working on that, I know that it's important to Ravi (the deadline for gcc-5.2 is Friday; 2015-07-03). Thanks! There's no good way to track enums in binary metadata, so I'm currently thinking of this approach for the new options: extern void gcc_jit_context_set_bool_allow_unreachable_blocks (gcc_jit_context *ctxt, int bool_value); tagging it within the linker mapfile as being a new API. I'm looking at ways to manage libgccjit API/ABI as per this thread: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01982.html by transitioning to using symbol versioning, so that the linker can tag subsets of libgccjit symbols in both libgccjit and in client binaries. Adding symbol versioning will involve a SONAME bump (from libgccjit.so.0 to libgccjit.so.1), thus requiring anyone using libgccjit to recompile for this one change (the transition to versioned symbols, that is); hopefully we never need to bump the SONAME again. Given that this is needed by Ravi I feel that this break is worth backporting to gcc 5.2 (I'd rather do this one-time ABI break for the benefit of specific users, than not do it out of concern for hypothetical ones). Hope this sounds sane. If you are just adding some API then does it need to be versioned? Apologies if I am misunderstanding the reason here. A breaking change would be if you changed an existing API. Regards Dibyendu
[PATCH] Graphite cannot handle return stmt
No regressions. 2015-06-29 Aditya Kumar aditya...@samsung.com Sebastian Pop s@samsung.com * graphite-scop-detection.c (stmt_simple_for_scop_p): Bail out in case of a return statement. --- gcc/graphite-scop-detection.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index e8ddecd..a10702e 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -365,6 +365,8 @@ stmt_simple_for_scop_p (basic_block scop_entry, loop_p outermost_loop, switch (gimple_code (stmt)) { case GIMPLE_RETURN: + return false; + case GIMPLE_LABEL: return true; -- 2.1.0.243.g30d45f7
Re: Thinking about libgccjit SONAME bump for gcc 5.2 (was Re: Four jit backports to gcc 5 branch)
On Mon, Jun 29, 2015 at 5:26 PM, David Malcolm dmalc...@redhat.com wrote: I'm looking at ways to manage libgccjit API/ABI as per this thread: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01982.html by transitioning to using symbol versioning, so that the linker can tag subsets of libgccjit symbols in both libgccjit and in client binaries. You don't have to bump the SONAME to introduce symbol versioning. glibc in the beginning didn't have symbol versioning and we wrote the linker and dynamic linker support so that no SONAME change was necessary. The idea is that unversioned symbols are satisfied by the oldest symbol version.
[C++ Patch] PR 65977
Hi, given in particular Richard Smith's analysis here: http://stackoverflow.com/questions/29871138/constexpr-is-not-allowed-in-declaration-of-friend-template-specialization/29957648#29957648 I think it's pretty straightforward that we don't have a good reason to reject constexpr friend declarations. In practice the most recent compilers I have at hand all agree. Tested x86_64-linux. Thanks, Paolo. /// /cp 2015-06-29 Paolo Carlini paolo.carl...@oracle.com PR c++/65977 * decl.c (grokfndecl): Allow constexpr declarations of friend template specializations. /testsuite 2015-06-29 Paolo Carlini paolo.carl...@oracle.com PR c++/65977 * g++.dg/cpp0x/constexpr-friend-3.C: New. * g++.dg/cpp0x/constexpr-friend-2.C: Adjust. Index: cp/decl.c === --- cp/decl.c (revision 225123) +++ cp/decl.c (working copy) @@ -7738,15 +7738,12 @@ grokfndecl (tree ctype, } if (inlinep & 1) - error ("%<inline%> is not allowed in declaration of friend " - "template specialization %qD", - decl); - if (inlinep & 2) - error ("%<constexpr%> is not allowed in declaration of friend " - "template specialization %qD", - decl); - if (inlinep) - return NULL_TREE; + { + error ("%<inline%> is not allowed in declaration of friend " +"template specialization %qD", +decl); + return NULL_TREE; + } } } Index: testsuite/g++.dg/cpp0x/constexpr-friend-2.C === --- testsuite/g++.dg/cpp0x/constexpr-friend-2.C (revision 225123) +++ testsuite/g++.dg/cpp0x/constexpr-friend-2.C (working copy) @@ -3,5 +3,5 @@ template<typename T> void f(T); template <class T> class A { - friend constexpr void f(int); // { dg-error "'constexpr' is not allowed" } + friend constexpr void f(int); }; Index: testsuite/g++.dg/cpp0x/constexpr-friend-3.C === --- testsuite/g++.dg/cpp0x/constexpr-friend-3.C (revision 0) +++ testsuite/g++.dg/cpp0x/constexpr-friend-3.C (working copy) @@ -0,0 +1,21 @@ +// PR c++/65977 +// { dg-do compile { target c++11 } } + +template<__SIZE_TYPE__> +class bitset; + +template<__SIZE_TYPE__ N> +constexpr bool operator==(const bitset<N>&, const bitset<N>&) noexcept; 
+ +template<__SIZE_TYPE__ N> +class bitset +{ + friend constexpr bool operator==<> (const bitset<N>&, + const bitset<N>&) noexcept; +}; + +template<__SIZE_TYPE__ N> +constexpr bool operator==(const bitset<N>&, const bitset<N>&) noexcept +{ + return true; +}
Thinking about libgccjit SONAME bump for gcc 5.2 (was Re: Four jit backports to gcc 5 branch)
On Mon, 2015-06-29 at 21:57 +0100, Dibyendu Majumdar wrote: On 29 June 2015 at 16:44, David Malcolm dmalc...@redhat.com wrote: I've gone over the changes to the gcc/jit and gcc/testsuite/jit.dg directories in trunk since gcc 5 and backported the following 4 changes from trunk to the gcc-5-branch: gcc-5-branch's r225123: * trunk's r222863 (8120405bda9bf28b004e09b23e74eda57ae1, addition of test-benchmark.c): https://gcc.gnu.org/ml/gcc-patches/2015-05/msg00500.html gcc-5-branch's r225125: * trunk's r224531 (8154a3514d5fc8067a6928531d5f61cd768bd62c along with trunk's r224535 (1828cd755cf5e4a34d5638f543a059f3562ad957, PR jit/66539: Add parentheses as needed to gcc_jit_object_get_debug_string): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01158.html gcc-5-branch's r225127: * trunk's r224536 (3052eeefc4607a7147fdc55af6d86845890eb281, jit: Add a test for compound assignment): https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01164.html gcc-5-branch's r225129: * trunk's r224565 (6689f47f53079d76bbb051d3b5da9018c2e0161a, jit: Add missing type-checking to gcc_jit_{l|r}value_access_field) https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01211.html Hi Dave - is this what will go into 5.2? No, it's just the things that I've backported so far. If so, is the option to disable unreachable block validation going to be available? I'm working on that, I know that it's important to Ravi (the deadline for gcc-5.2 is Friday; 2015-07-03). There's no good way to track enums in binary metadata, so I'm currently thinking of this approach for the new options: extern void gcc_jit_context_set_bool_allow_unreachable_blocks (gcc_jit_context *ctxt, int bool_value); tagging it within the linker mapfile as being a new API. I'm looking at ways to manage libgccjit API/ABI as per this thread: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01982.html by transitioning to using symbol versioning, so that the linker can tag subsets of libgccjit symbols in both libgccjit and in client binaries. 
Adding symbol versioning will involve a SONAME bump (from libgccjit.so.0 to libgccjit.so.1), thus requiring anyone using libgccjit to recompile for this one change (the transition to versioned symbols, that is); hopefully we never need to bump the SONAME again. Given that this is needed by Ravi I feel that this break is worth backporting to gcc 5.2 (I'd rather do this one-time ABI break for the benefit of specific users, than not do it out of concern for hypothetical ones). Hope this sounds sane. As mentioned in that other thread, I'm wondering if linker experts could advise on the sanity of using a feature-based naming scheme for symbol map tags, rather than pure numbers? e.g. LIBGCCJIT_ABI_INITIAL LIBGCCJIT_ABI_WITH_SWITCH_STATEMENTS LIBGCCJIT_ABI_WITH_BOOL_OPTION_ERROR_ON_UNREACHABLE_BLOCKS etc (how long can they be, and is there a cost? can they include lower case?) Dave
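For readers following along, a feature-named version script in the style Dave proposes might look like the sketch below. This is illustrative only: apart from gcc_jit_context_set_bool_allow_unreachable_blocks, the symbol names and the node grouping are invented, not the actual libgccjit ABI history.

```
/* Hypothetical libgccjit.map -- a sketch of feature-named version nodes. */
LIBGCCJIT_ABI_INITIAL {
  global:
    gcc_jit_context_acquire;
    gcc_jit_context_release;
    /* ... every symbol exported before versioning was introduced ... */
  local: *;
};

/* Each new API entry point gets its own feature-named node, chained to
   the node it builds on via the trailing node name.  */
LIBGCCJIT_ABI_WITH_BOOL_OPTION_ERROR_ON_UNREACHABLE_BLOCKS {
  global:
    gcc_jit_context_set_bool_allow_unreachable_blocks;
} LIBGCCJIT_ABI_INITIAL;
```

As far as I'm aware, GNU ld version node names are ordinary strings (lower case included), so feature-based names should be mechanically fine; whether long names carry a measurable size cost in .gnu.version_d is exactly the kind of thing the linker experts would need to confirm.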
Re: [PATCH] Move X - (X / Y) * Y folding to match.pd
On Sat, 27 Jun 2015, Marc Glisse wrote: On Fri, 26 Jun 2015, Marek Polacek wrote: This is an attempt to move one pattern from fold-const.c to match.pd. It ought to be 1:1, but is not; e.g. with this patch we won't fold int f (int a, int b) { return a - (unsigned) ((a / b) * b); } anymore, but we're able to fold int ff (int a, unsigned int b) { return a - ((a / b) * b); } and fold-const.c is not. Interesting. I played around with converts, but didn't find anything that would work well. Any suggestions how to make this pattern better? Anything wrong with this? +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus (convert? @0) (convert? (mult (trunc_div @0 @1) @1))) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (convert (trunc_mod @0 @1)))) Yes. Eventually even (convert? (mult (convert1? (trunc_div ...)? Of course with matching @0 between the two operands of the minus you constrain types quite a bit. I'd say just single-step through fold and see what types it gets presented with when folding a - (unsigned) ((a / b) * b). Thanks, Richard. (the other div/mod pairs could benefit from the same transformation as long as there are no conversions, but the conversion seems easier to handle with trunc_) diff --git gcc/match.pd gcc/match.pd index b2f8429..2bc158b 100644 --- gcc/match.pd +++ gcc/match.pd @@ -238,6 +238,12 @@ along with GCC; see the file COPYING3. If not see tree_nop_conversion_p (type, TREE_TYPE (@1))) (trunc_mod @0 (convert @1)))) +/* X - (X / Y) * Y is the same as X % Y. */ +(simplify + (minus @0 (mult (trunc_div @0 @1) @1)) + (if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) + (trunc_mod @0 @1))) + /* Optimize TRUNC_MOD_EXPR by a power of two into a BIT_AND_EXPR, i.e. X % C into X & (C - 1), if X and C are positive. Also optimize A % (C << N) where C is a power of 2, Marek -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu, Graham Norton, HRB 21284 (AG Nuernberg)
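For reference, the identity behind the pattern is easy to sanity-check in plain C (a standalone snippet, not part of the patch): with C's truncating division, x == (x/y)*y + x%y holds for any nonzero y, so the subtraction below is exactly x % y, including for negative operands.

```c
#include <assert.h>

/* X - (X / Y) * Y computed by hand; C99's truncating division makes
   this exactly X % Y for every nonzero Y.  */
static int sub_div_mul (int x, int y)
{
  return x - (x / y) * y;
}
```

The convert? issue in the thread is precisely about when this still holds after the multiply has been computed in a different (but same-width) type.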
RE: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Hmm, Reducing the iterations to 1 step for float and 2 steps for double I got VE (miscompares) on the following benchmarks: 416.gamess 453.povray 454.calculix 459.GemsFDTD Benedikt, I have an ICE for 444.namd with your patch; not sure if something is wrong in my local tree. Regards, Venkat. -Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Sunday, June 28, 2015 8:35 PM To: Kumar, Venkataramanan Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: I got around ~12% gain with -Ofast -mcpu=cortex-a57. I get around 11/12% on thunderX with the patch and the decreased-iterations change (1/2) compared to without the patch. Thanks, Andrew Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich Sent: Thursday, June 25, 2015 9:13 PM To: Kumar, Venkataramanan Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, what is the relative gain that you see on Cortex-A57? Thanks, Philipp. On 25 Jun 2015, at 17:35, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Changing to 1 step for float and 2 steps for double gives better gains now for gromacs on cortex-a57. Regards, Venkat.
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Thursday, June 25, 2015 4:09 PM To: pins...@gmail.com Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Andrew, This is NOT a win on thunderX at least for single precision because you have to do the divide and sqrt in the same time as it takes 5 multiplies (estimate and step are multiplies in the thunderX pipeline). Doubles is 10 multiplies, which is just the same as what the patch does (but it is really slightly less than 10, I rounded up). So in the end this is NOT a win at all for thunderX unless we do one less step for both single and double. Yes, the expected benefit from rsqrt estimation is implementation specific. If one has a better initial rsqrte or an application that can trade precision for execution time, we could offer a command line option to do only 2 steps for double and 1 step for float; similar to -mrecip-precision for PowerPC. What are your thoughts on that? Best regards, Benedikt
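For readers unfamiliar with the trade-off being discussed: the patch replaces 1/sqrt(x) with a hardware estimate refined by Newton-Raphson steps, and each step roughly squares the relative error. The scalar sketch below models the idea only; the table-driven FRSQRTE seed is stood in for by the classic bit-trick seed, so the exact precision numbers do not match any real micro-arch.

```c
#include <stdint.h>
#include <math.h>
#include <assert.h>

/* Low-precision seed standing in for FRSQRTE (the classic bit trick;
   real hardware uses a lookup table with implementation-defined
   accuracy).  */
static float rsqrt_seed (float x)
{
  union { float f; uint32_t u; } v = { x };
  v.u = 0x5f3759dfu - (v.u >> 1);
  return v.f;
}

/* One Newton-Raphson refinement, y' = y * (3 - x*y*y) / 2 -- the form
   the AArch64 sequence builds out of FRSQRTS plus multiplies.  Each
   step roughly squares the relative error.  */
static float rsqrt_step (float x, float y)
{
  return y * (3.0f - x * y * y) * 0.5f;
}

static float rsqrt_est (float x, int nsteps)
{
  float y = rsqrt_seed (x);
  for (int i = 0; i < nsteps; i++)
    y = rsqrt_step (x, y);
  return y;
}
```

This is why dropping from 3/2 steps to 2/1 (double/float) is a precision-for-speed trade: some benchmarks tolerate the lost low bits and others (gamess, povray, calculix, GemsFDTD above) miscompare.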
Re: [PATCH][5/n] Remove GENERIC stmt combining from SCCVN
On Sat, 27 Jun 2015, Marc Glisse wrote: On Fri, 26 Jun 2015, Richard Biener wrote: + /* Equality compare simplifications from fold_binary */ + (for cmp (eq ne) + + /* If we have (A | C) == D where C & ~D != 0, convert this into 0. + Similarly for NE_EXPR. */ + (simplify + (cmp (convert?@3 (bit_ior @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@0)) + && wi::bit_and_not (@1, @2) != 0) +{ constant_boolean_node (cmp == NE_EXPR, type); })) + + /* (X ^ Y) == 0 becomes X == Y, and (X ^ Y) != 0 becomes X != Y. */ + (simplify + (cmp (bit_xor @0 @1) integer_zerop) + (cmp @0 @1)) + + /* (X ^ Y) == Y becomes X == 0. + Likewise (X ^ Y) == X becomes Y == 0. */ + (simplify + (cmp (bit_xor:c @0 @1) @0) Don't you need cmp:c for this one? The transformation still somehow happens through forward_propagate_into_comparison, but it looks like an accident. Yeah, I think it canonicalizes operand order (and uses the GENERIC folding path). I'm testing a patch changing the above to cmp:c. + (cmp @1 { build_zero_cst (TREE_TYPE (@1)); })) + + /* (X ^ C1) op C2 can be rewritten as X op (C1 ^ C2). */ + (simplify + (cmp (convert?@3 (bit_xor @0 INTEGER_CST@1)) INTEGER_CST@2) + (if (tree_nop_conversion_p (TREE_TYPE (@3), TREE_TYPE (@0))) +(cmp @0 (bit_xor @1 (convert @2))))) I guess we'll have to generalize this to vectors at some point... True - I'm trying to move patterns 1:1 leaving improvements to followups Thanks, Richard.
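The individual rules being moved are easy to check on scalars (an illustrative snippet; the helper names are invented, and each helper's body is the unsimplified form the pattern rewrites):

```c
#include <stdint.h>
#include <assert.h>

/* (X ^ Y) == 0  simplifies to  X == Y.  */
static int xor_eq_zero (uint32_t x, uint32_t y) { return (x ^ y) == 0; }

/* (X ^ Y) == Y  simplifies to  X == 0.  */
static int xor_eq_opnd (uint32_t x, uint32_t y) { return (x ^ y) == y; }

/* (X ^ C1) == C2  simplifies to  X == (C1 ^ C2).  */
static int xor_eq_cst (uint32_t x, uint32_t c1, uint32_t c2)
{ return (x ^ c1) == c2; }

/* (X | C) == D is identically false when C & ~D != 0: OR can never
   clear the bits that C sets but D lacks.  */
static int ior_eq_cst (uint32_t x, uint32_t c, uint32_t d)
{ return (x | c) == d; }
```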
[Patch] Add support for IEEE-conformant versions of scalar fmin* and fmax*
Hi, I have added new STRICT_MAX_EXPR and STRICT_MIN_EXPR expressions to support the IEEE versions of fmin and fmax. This is done by recognising the math library fmax and fmin builtin functions in a similar way to how this is done for -ffast-math. This also allows us to vectorise the IEEE max/min functions for targets that support it, for example aarch64/aarch32. Tested: x86_64-linux: no regressions aarch64-none-elf: no regressions aarch64_be-none-elf: no regressions arm-none-eabi: no regressions ChangeLog: 2015-06-26 David Sherwood david.sherw...@arm.com gcc/ * builtins.c (integer_valued_real_p): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR. (fold_builtin_fmin_fmax): For strict math, convert builtin fmin and fmax to STRICT_MIN_EXPR and STRICT_MAX_EXPR, respectively. * expr.c (expand_expr_real_2): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR. * fold-const.c (const_binop): Likewise. (fold_binary_loc, tree_binary_nonnegative_warnv_p): Likewise. (tree_binary_nonzero_warnv_p): Likewise. * optabs.h (strict_minmax_support): Declare. * optabs.def: Add new optabs strict_max_optab/strict_min_optab. * optabs.c (optab_for_tree_code): Return new optabs for STRICT_MIN_EXPR and STRICT_MAX_EXPR. (strict_minmax_support): New function. * real.c (real_arithmetic): Add STRICT_MIN_EXPR and STRICT_MAX_EXPR. * tree.def: Likewise. * tree.c (associative_tree_code, commutative_tree_code): Likewise. * tree-cfg.c (verify_expr): Likewise. (verify_gimple_assign_binary): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node, op_code_prio): Likewise. (op_symbol_code): Likewise. * config/aarch64/aarch64.md: New pattern. * config/aarch64/aarch64-simd.md: Likewise. * config/aarch64/iterators.md: New unspecs, iterators. * config/arm/iterators.md: New iterators. * config/arm/unspecs.md: New unspecs. * config/arm/neon.md: New pattern. * config/arm/vfp.md: Likewise. gcc/testsuite * gcc.target/aarch64/maxmin_strict.c: New test.
* gcc.target/arm/maxmin_strict.c: New test. * lib/target-supports.exp (check_effective_target_arm_v8_vfp_neon_ok): New check. (add_options_for_arm_v8_vfp_neon): New options. Regards, David Sherwood.
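For context on why a separate STRICT_MIN_EXPR/STRICT_MAX_EXPR is needed rather than reusing MIN_EXPR/MAX_EXPR: IEEE-conformant fmin/fmax return the non-NaN operand when exactly one operand is a NaN, which the bare compare-and-select that -ffast-math emits does not honour. A small illustration (naive_max is my sketch of the fast-math lowering, not code from the patch):

```c
#include <math.h>
#include <assert.h>

/* What a fast-math MAX_EXPR amounts to: a bare compare-and-select.
   Any comparison involving a NaN is false, so a NaN second operand
   leaks through unchanged.  */
static double naive_max (double a, double b)
{
  return a > b ? a : b;
}
```

IEEE fmax (NAN, 1.0) and fmax (1.0, NAN) both yield 1.0, while naive_max (1.0, NAN) yields the NaN; signed zeros are a similar hazard. Hence the distinct tree codes and optabs.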
Re: [PATCH][4/n] Remove GENERIC stmt combining from SCCVN
On Fri, 26 Jun 2015, Jeff Law wrote: On 06/26/2015 03:24 AM, Richard Biener wrote: On Thu, 25 Jun 2015, Richard Biener wrote: This moves fold_sign_changed_comparison. Shows up in gcc.dg/pr55833.c I'll eventually massage it according to Jakub's suggestion to do a #ifndef HAVE_canonicalize_funcptr_for_compare #define HAVE_canonicalize_funcptr_for_compare 0 #endif somewhere (defaults.h should work I guess). Bootstrap and regtest running on x86_64-unknown-linux-gnu. This runs into Running target unix//-m32 FAIL: g++.dg/tree-ssa/calloc.C -std=gnu++11 scan-tree-dump-not optimized malloc where we now optimize n_5 = (size_t) n_4(D); ... bb 5: - if (n_5 != 0) + if (n_4(D) != 0) but both VRP and DOM fail to record equivalences for n_5 from the updated condition (I have a patch to fix VRP but not DOM). So you want an equivalence recorded for n_5, even though it no longer appears in the conditional (but presumably has other uses)? DOM has some code for this already, but it's kind-of backwards from what you're trying to do in terms of when it's used. That code might be factorable and usable elsewhere in DOM. Not sure, I came across something like that. In principle the following works for the testcase (even w/o fixing the VRP part). Index: gcc/tree-ssa-dom.c === --- gcc/tree-ssa-dom.c (revision 225007) +++ gcc/tree-ssa-dom.c (working copy) @@ -1409,6 +1409,14 @@ simplify_stmt_for_jump_threading (gimple return lookup_avail_expr (stmt, false); } +static tree +dom_valueize (tree t) +{ + if (TREE_CODE (t) == SSA_NAME) +return SSA_NAME_VALUE (t); + return t; +} + /* Record into the equivalence tables any equivalences implied by traversing edge E (which are cached in E->aux). @@ -1429,7 +1437,33 @@ record_temporary_equivalences (edge e) /* If we have a simple NAME = VALUE equivalence, record it.
*/ if (lhs && TREE_CODE (lhs) == SSA_NAME) - const_and_copies->record_const_or_copy (lhs, rhs); + { + gimple use_stmt; + imm_use_iterator iter; + const_and_copies->record_const_or_copy (lhs, rhs); + FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs) + { + /* Only bother to record more equivalences for lhs that +can be directly used by e->dest. +??? If the code gets re-organized to a worklist to +catch more indirect opportunities and it is made to +handle PHIs then this should only consider use_stmts +in basic-blocks we have already visited. */ + if (!dominated_by_p (CDI_DOMINATORS, + e->dest, gimple_bb (use_stmt))) + continue; + tree lhs = gimple_get_lhs (use_stmt); + if (lhs && TREE_CODE (lhs) == SSA_NAME) + { + tree res = gimple_fold_stmt_to_constant_1 (use_stmt, +dom_valueize, + no_follow_ssa_edges); + if (TREE_CODE (res) == SSA_NAME + || is_gimple_min_invariant (res)) + const_and_copies->record_const_or_copy (lhs, res); + } + } + } /* If we have 0 = COND or 1 = COND equivalences, record them into our expression hash tables. */ it's not using DOM's own stmt visiting machinery as that always modifies stmts in-place. As stated in the comment it doesn't catch secondary opportunities. That would be possible by using a work-list seeded by LHS we recorded new const/copies for and re-visiting their uses. You can get extra fancy here by properly handling PHIs and conditionals. But it's a question of cost here, of course. Note that I think this isn't really backward propagation but just context sensitive value-numbering. Like VRP restricts what it inserts asserts for, stmt re-visiting could also be constrained on availability of uses dominated by the edge providing the original equivalency. So in some sense the above is a hack (similar to all the special-cases VRP has for similar situations). Under what constraints do you think something like the above is ok to put on trunk? Thanks, Richard. I think we're simply missing a pass that can properly deal with this kind of stuff.
For example DOM could re-visit stmts which have uses of SSA names we added temporary equivalences for (ones that dominate the current block, others we'll (re-)visit anyway). That would fix this testcase but to catch secondary effects you'd need to re-visit uses of the defs of the revisited stmts as well (if anything interesting was produced, of course). This problem feels a bit like it's better handled by an optimizer independent of DOM. Probably a backwards IL walking context
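A reduced C shape of the calloc.C situation under discussion (names invented, purely illustrative): after the fold-sign-changed-comparison change the condition tests the narrow original while the allocation uses the widened copy, so the malloc+memset pair is only recognizable as calloc if the optimizers record the copy's nonzero-ness along the true edge.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* Illustrative reduction, not the actual testcase.  'len' is a
   sign-change copy of 'n'; the folded condition now tests 'n', so
   inside the branch the calloc recognizer must learn len != 0 from
   the n != 0 guard via the recorded equivalence.  */
void *alloc_zeroed (unsigned n)
{
  size_t len = (size_t) n;          /* len is equivalent to n here */
  if (n != 0)                       /* condition tests n, not len */
    {
      void *p = malloc (len * sizeof (int));
      if (p)
        memset (p, 0, len * sizeof (int));  /* foldable to calloc */
      return p;
    }
  return NULL;
}
```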
[PATCH] Bump LTO_major_version on trunk
Was still the same as on the GCC 5 branch. Committed as obvious. Richard. 2015-06-29 Richard Biener rguent...@suse.de * lto-streamer.h (LTO_major_version): Bump to 5. Index: gcc/lto-streamer.h === --- gcc/lto-streamer.h (revision 225112) +++ gcc/lto-streamer.h (working copy) @@ -130,7 +130,7 @@ along with GCC; see the file COPYING3. String are represented in the table as pairs, a length in ULEB128 form followed by the data for the string. */ -#define LTO_major_version 4 +#define LTO_major_version 5 #define LTO_minor_version 0 typedef unsigned char lto_decl_flags_t;
Re: Remove redundant AND from count reduction loop
On Sun, Jun 28, 2015 at 1:45 PM, Marc Glisse marc.gli...@inria.fr wrote: On Fri, 26 Jun 2015, Richard Biener wrote: OK. The reason I was being paranoid was that I couldn't see anywhere where we enforced that the vector condition in a VEC_COND had to have the same element width as the values being selected. We don't require that indeed. tree-cfg.c only checks that rhs2 and rhs3 are compatible with the result. There doesn't seem to be any checking of rhs1 vs. the other types. So I wasn't sure whether anything stopped us from, e.g., comparing two V4HIs and using the result to select between two V4SIs. Nothing does (or should). The documentation patch you approved in https://gcc.gnu.org/ml/gcc-patches/2012-10/msg01109.html says something different. If it is really wrong, could you fix it? Hmm, that simplifies things. On the other hand, vectors of bools could be (I haven't thought about it much) nice to have, especially for avx512 (and at least one other arch, maybe sparc). It would be nice if these constraints would also be checked in the gimple verifier... This passed bootstrap+testsuite on powerpc64le-unknown-linux-gnu. Ok. Thanks, Richard. 2015-06-29 Marc Glisse marc.gli...@inria.fr * tree-cfg.c (verify_gimple_assign_ternary) <VEC_COND_EXPR>: Check the first argument. -- Marc Glisse Index: gcc/tree-cfg.c === --- gcc/tree-cfg.c (revision 225104) +++ gcc/tree-cfg.c (working copy) @@ -4001,8 +4001,22 @@ } break; +case VEC_COND_EXPR: + if (!VECTOR_INTEGER_TYPE_P (rhs1_type) + || TYPE_SIGN (rhs1_type) != SIGNED + || TYPE_SIZE (rhs1_type) != TYPE_SIZE (lhs_type) + || TYPE_VECTOR_SUBPARTS (rhs1_type) +!= TYPE_VECTOR_SUBPARTS (lhs_type)) + { + error ("the first argument of a VEC_COND_EXPR must be of a signed " +"integral vector type of the same size and number of " +"elements as the result"); + debug_generic_expr (lhs_type); + debug_generic_expr (rhs1_type); + return true; + } + /* Fallthrough.
*/ case COND_EXPR: -case VEC_COND_EXPR: if (!useless_type_conversion_p (lhs_type, rhs2_type) || !useless_type_conversion_p (lhs_type, rhs3_type)) {
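The constraint the new check enforces is what a vector comparison in GCC's generic vector extension already produces: a signed integral mask vector with the same element size and lane count as the compared operands. An illustrative sketch (not GCC internals code) of the mask and the selection a VEC_COND_EXPR performs:

```c
#include <assert.h>

typedef int   v4si __attribute__ ((vector_size (16)));
typedef float v4sf __attribute__ ((vector_size (16)));

/* A vector comparison yields a signed integral vector with the same
   element size and lane count as the operands; true lanes are all
   ones (-1), false lanes are 0 -- exactly what the verifier demands
   of a VEC_COND_EXPR condition.  */
static v4si mask_of (v4sf a, v4sf b)
{
  return a > b;
}

/* The selection spelled out with the mask: all-ones lanes take 'a',
   all-zeros lanes take 'b' (bit reinterpretation, not conversion).  */
static v4sf blend (v4si m, v4sf a, v4sf b)
{
  return (v4sf) (((v4si) a & m) | ((v4si) b & ~m));
}
```

The same-size requirement is what makes the bitwise blend well defined; a V4HI mask selecting between V4SIs has no such lane-for-lane correspondence.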
Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math
Kumar, This is not unexpected, as the initial estimation and each iteration will add an architecturally-defined number of bits of precision (ARMv8 guarantees only a minimum number of bits provided per operation… the exact number is specific to each micro-arch, though). Depending on your architecture and on the required number of precise bits by any given benchmark, one may see miscompares. Do you know the exact number of bits that the initial estimate and the subsequent refinement steps add for your micro-arch? Thanks, Philipp. On 29 Jun 2015, at 10:17, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Hmm, Reducing the iterations to 1 step for float and 2 steps for double I got VE (miscompares) on the following benchmarks: 416.gamess 453.povray 454.calculix 459.GemsFDTD Benedikt, I have an ICE for 444.namd with your patch; not sure if something is wrong in my local tree. Regards, Venkat. -Original Message- From: pins...@gmail.com [mailto:pins...@gmail.com] Sent: Sunday, June 28, 2015 8:35 PM To: Kumar, Venkataramanan Cc: Dr. Philipp Tomsich; Benedikt Huber; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math On Jun 25, 2015, at 9:44 AM, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: I got around ~12% gain with -Ofast -mcpu=cortex-a57. I get around 11/12% on thunderX with the patch and the decreased-iterations change (1/2) compared to without the patch. Thanks, Andrew Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Dr. Philipp Tomsich Sent: Thursday, June 25, 2015 9:13 PM To: Kumar, Venkataramanan Cc: Benedikt Huber; pins...@gmail.com; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Kumar, what is the relative gain that you see on Cortex-A57? Thanks, Philipp.
On 25 Jun 2015, at 17:35, Kumar, Venkataramanan venkataramanan.ku...@amd.com wrote: Changing to 1 step for float and 2 steps for double gives better gains now for gromacs on cortex-a57. Regards, Venkat. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Benedikt Huber Sent: Thursday, June 25, 2015 4:09 PM To: pins...@gmail.com Cc: gcc-patches@gcc.gnu.org; philipp.tomsich@theobroma-systems.com Subject: Re: [PATCH] [aarch64] Implemented reciprocal square root (rsqrt) estimation in -ffast-math Andrew, This is NOT a win on thunderX at least for single precision because you have to do the divide and sqrt in the same time as it takes 5 multiplies (estimate and step are multiplies in the thunderX pipeline). Doubles is 10 multiplies, which is just the same as what the patch does (but it is really slightly less than 10, I rounded up). So in the end this is NOT a win at all for thunderX unless we do one less step for both single and double. Yes, the expected benefit from rsqrt estimation is implementation specific. If one has a better initial rsqrte or an application that can trade precision for execution time, we could offer a command line option to do only 2 steps for double and 1 step for float; similar to -mrecip-precision for PowerPC. What are your thoughts on that? Best regards, Benedikt
Re: [patch] PR debug/66653: avoid late_global_decl on decl_type_context()s
On Fri, Jun 26, 2015 at 11:59 PM, Jason Merrill ja...@redhat.com wrote: On 06/26/2015 05:37 AM, Richard Biener wrote: Can we defer TLS model setting to template instantiation? We need to represent somehow that __thread (or thread_local) was used in the declaration, but DECL_THREAD_LOCAL_P was changed to refer to the TLS model. Ok, so easiest would be to allocate a bit from decl_with_vis for this... Richard. Jason
Re: [PATCH] Graphite cannot handle return stmt
On Mon, Jun 29, 2015 at 3:58 PM, Aditya Kumar hiradi...@msn.com wrote: No regressions. 2015-06-29 Aditya Kumar aditya...@samsung.com Sebastian Pop s@samsung.com * graphite-scop-detection.c (stmt_simple_for_scop_p): Bail out in case of a return statement. Looks good to me. Tobi, do you see a good reason not to cut scops at return stmts? Thanks, Sebastian --- gcc/graphite-scop-detection.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index e8ddecd..a10702e 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -365,6 +365,8 @@ stmt_simple_for_scop_p (basic_block scop_entry, loop_p outermost_loop, switch (gimple_code (stmt)) { case GIMPLE_RETURN: + return false; + case GIMPLE_LABEL: return true; -- 2.1.0.243.g30d45f7
[PATCH, rs6000] Add more overloaded built-ins for vec_cmpge and vec_cmple
Hi, This patch adds built-in functions for vec_cmpge and vec_cmple required by the ELFv2 ABI but not yet present in gcc. These make use of the existing patterns for gt:VEC_I and gtu:VEC_I, applying the not operator and reversing the order of operands as needed. The result is generation of a vcmpgt[su][bhwd] instruction followed by an xxlnor. Since it is likely that a vec_cmp[gl]e will be followed by a vec_sel, it will be important to optimize the xxlnor/xxsel* to remove the xxlnor by reversing the sense of the select. A separate patch will follow to add this pattern to simplify-rtx.c. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Ok for trunk? Thanks, Bill [gcc] 2015-06-29 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/rs6000-builtin.def (CMPGE_16QI): New built-in definition. (CMPGE_8HI): Likewise. (CMPGE_4SI): Likewise. (CMPGE_2DI): Likewise. (CMPGE_U16QI): Likewise. (CMPGE_U8HI): Likewise. (CMPGE_U4SI): Likewise. (CMPGE_U2DI): Likewise. (CMPLE_16QI): Likewise. (CMPLE_8HI): Likewise. (CMPLE_4SI): Likewise. (CMPLE_2DI): Likewise. (CMPLE_U16QI): Likewise. (CMPLE_U8HI): Likewise. (CMPLE_U4SI): Likewise. (CMPLE_U2DI): Likewise. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add overloads for ALTIVEC_BUILTIN_VEC_CMPGE and ALTIVEC_BUILTIN_VEC_CMPLE. * config/rs6000/vector.md (vector_ge<mode>): Restrict to floating-point vector modes. (vector_nlt<mode>): New define_expand. (vector_nltu<mode>): Likewise. (vector_ngt<mode>): Likewise. (vector_ngtu<mode>): Likewise. [gcc/testsuite] 2015-06-29 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.target/powerpc/vec-cmp.c: New test.
Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def(revision 224924) +++ gcc/config/rs6000/rs6000-builtin.def(working copy) @@ -1284,6 +1284,24 @@ BU_VSX_2 (XVCVUXDDP_SCALE,xvcvuxddp_scale, C BU_VSX_2 (XVCVDPSXDS_SCALE, xvcvdpsxds_scale, CONST, vsx_xvcvdpsxds_scale) BU_VSX_2 (XVCVDPUXDS_SCALE, xvcvdpuxds_scale, CONST, vsx_xvcvdpuxds_scale) +BU_VSX_2 (CMPGE_16QI, cmpge_16qi, CONST, vector_nltv16qi) +BU_VSX_2 (CMPGE_8HI, cmpge_8hi, CONST, vector_nltv8hi) +BU_VSX_2 (CMPGE_4SI, cmpge_4si, CONST, vector_nltv4si) +BU_VSX_2 (CMPGE_2DI, cmpge_2di, CONST, vector_nltv2di) +BU_VSX_2 (CMPGE_U16QI,cmpge_u16qi,CONST, vector_nltuv16qi) +BU_VSX_2 (CMPGE_U8HI, cmpge_u8hi, CONST, vector_nltuv8hi) +BU_VSX_2 (CMPGE_U4SI, cmpge_u4si, CONST, vector_nltuv4si) +BU_VSX_2 (CMPGE_U2DI, cmpge_u2di, CONST, vector_nltuv2di) + +BU_VSX_2 (CMPLE_16QI, cmple_16qi, CONST, vector_ngtv16qi) +BU_VSX_2 (CMPLE_8HI, cmple_8hi, CONST, vector_ngtv8hi) +BU_VSX_2 (CMPLE_4SI, cmple_4si, CONST, vector_ngtv4si) +BU_VSX_2 (CMPLE_2DI, cmple_2di, CONST, vector_ngtv2di) +BU_VSX_2 (CMPLE_U16QI,cmple_u16qi,CONST, vector_ngtuv16qi) +BU_VSX_2 (CMPLE_U8HI, cmple_u8hi, CONST, vector_ngtuv8hi) +BU_VSX_2 (CMPLE_U4SI, cmple_u4si, CONST, vector_ngtuv4si) +BU_VSX_2 (CMPLE_U2DI, cmple_u2di, CONST, vector_ngtuv2di) + /* VSX abs builtin functions. 
*/ BU_VSX_A (XVABSDP, xvabsdp,CONST, absv2df2) BU_VSX_A (XVNABSDP, xvnabsdp, CONST, vsx_nabsv2df2) Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 224924) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -1096,6 +1096,26 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_bool_V4SI, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_XVCMPGEDP, RS6000_BTI_bool_V2DI, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 }, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_16QI, +RS6000_BTI_bool_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U16QI, +RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, +RS6000_BTI_unsigned_V16QI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_8HI, +RS6000_BTI_bool_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U8HI, +RS6000_BTI_bool_V8HI, RS6000_BTI_unsigned_V8HI, +RS6000_BTI_unsigned_V8HI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_4SI, +RS6000_BTI_bool_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_U4SI, +RS6000_BTI_bool_V4SI, RS6000_BTI_unsigned_V4SI, +RS6000_BTI_unsigned_V4SI, 0}, + { ALTIVEC_BUILTIN_VEC_CMPGE, VSX_BUILTIN_CMPGE_2DI, +
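The lane-wise identity the new vector_nlt/vector_ngt expanders rely on can be modelled on a single lane (an illustrative scalar model, not the rs6000 code): with all-ones/all-zeros comparison results, not(a < b) is exactly a >= b, which is what the vcmpgt-with-swapped-operands plus xxlnor sequence computes.

```c
#include <stdint.h>
#include <assert.h>

/* One lane of a vector "less than": all ones (-1) when true, 0 when
   false (models vcmpgt with the operands swapped).  */
static int32_t lane_lt (int32_t a, int32_t b)
{
  return a < b ? -1 : 0;
}

/* One lane of "greater than or equal" built as not(lt) -- the bitwise
   NOT models the trailing xxlnor.  */
static int32_t lane_ge_via_not_lt (int32_t a, int32_t b)
{
  return ~lane_lt (a, b);
}
```

The follow-up simplify-rtx.c patch Bill mentions would then fold not(mask) feeding a select into a select with the arms swapped, removing the xxlnor entirely.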
Re: [patch] fix regrename pass to ensure renamings produce valid insns
Hi all: This patch seems like it will be broken for c6x when assert checking is disabled (the gcc_assert arguments would then not be evaluated): Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) > unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); On Mon, Jun 29, 2015 at 5:08 AM, Sandra Loosemore san...@codesourcery.com wrote: On 06/24/2015 09:46 PM, Jeff Law wrote: On 06/23/2015 07:00 PM, Sandra Loosemore wrote: On 06/18/2015 11:32 AM, Eric Botcazou wrote: The attached patch teaches regrename to validate insns affected by each register renaming before making the change. I can see at least two other ways to handle this -- earlier, by rejecting renamings that result in invalid instructions when it's searching for the best renaming; or later, by validating the entire set of renamings as a group instead of incrementally for each one -- but doing it all in regrename_do_replace seems least disruptive and risky in terms of the existing code. OK, but the patch looks incomplete, rename_chains should be adjusted as well, i.e. regrename_do_replace should now return a boolean. Like this? I tested this on nios2 and x86_64-linux-gnu, as before, plus built for aarch64-linux-gnu and ran the gcc testsuite. The c6x back end also calls regrename_do_replace.
I am not set up to build or test on that target, and Bernd told me off-list that it would never fail on that target anyway so I have left that code alone. -Sandra regrename-2.log 2015-06-23 Chung-Lin Tang clt...@codesourcery.com Sandra Loosemore san...@codesourcery.com gcc/ * regrename.h (regrename_do_replace): Change to return bool. * regrename.c (rename_chains): Check return value of regrename_do_replace. (regrename_do_replace): Re-validate the modified insns and return bool status. * config/aarch64/cortex-a57-fma-steering.c (rename_single_chain): Update to match rename_chains changes. As Eric mentioned, please put an assert to verify that the call from the c6x backend never fails. The regrename and ARM bits are fine. Do you have a testcase that you can add to the suite? If so it'd be appreciated if you could include that too. Approved with the c6x assert if a testcase isn't available or exceedingly difficult to produce. Thanks. I've committed the attached version. Re the testcase, this fixed 16 FAILs on existing tests in the gcc testsuite with the forthcoming nios2 load/store multiple instruction support, all assembler errors due to the bad instructions being generated. There's nothing I can do on nios2 for a testcase until I get those patches committed (I'm still trying to re-test and tidy them up for submission), plus I think the failures are rather fragile -- depending on the register allocator choosing an initial register numbering that allows peephole optimizers to trigger, etc. But, I will revisit this later and see what I can do. -Sandra
Re: [PATCH GCC][refacor]Manage allocation of struct iv in obstack.
On Sat, Jun 27, 2015 at 5:13 AM, Jeff Law l...@redhat.com wrote: On 06/26/2015 03:02 AM, Bin Cheng wrote: Hi, GCC avoids multi-pointers/dangling-pointers of struct iv by allocating multiple copies of the structure. This patch is an obvious fix to the issue by managing iv structures in obstack. Bootstrap on x86_64, will apply to trunk if no objection. Thanks, bin 2015-06-26 Bin Cheng bin.ch...@arm.com * tree-ssa-loop-ivopts.c (struct ivopts_data): New field iv_obstack. (tree_ssa_iv_optimize_init): Initialize iv_obstack. (alloc_iv): New parameter. Allocate struct iv using obstack_alloc. (set_iv, find_interesting_uses_address, add_candidate_1): New argument. (find_interesting_uses_op): Don't duplicate struct iv. (free_loop_data): Don't free iv structure explicitly. (tree_ssa_iv_optimize_finalize): Free iv_obstack. Presumably you're trying to simplify the memory management here so that you don't have to track lifetimes of the IV structures so carefully, which in turn simplifies some upcoming patch? Yes, that's exactly the reason. I am still on the way fixing missed-optimizations in IVO, and plan to do some refactoring/simplification afterwards. Note we don't have a no objection policy for this kind of patch. However, I think it may make sense to look into having you as a maintainer for the IV optimizations if you're interested. Oh, that would be my great honor. Thanks, bin Jeff
Re: [PATCH] Discard Scops for which entry==exit
On Mon, Jun 29, 2015 at 3:04 PM, Aditya Kumar hiradi...@msn.com wrote: In this patch we discard the scops where entry and exit are the same BB. This is an effort to remove graphite-scop-detection.c:limit_scops. Removing the limit_scops function introduces correctness regressions. We are making relevant changes in incremental steps to fix those bugs, and finally we intend to remove limit_scops. 2015-06-29 Aditya Kumar aditya...@samsung.com Sebastian Pop s@samsung.com * graphite-scop-detection.c (build_scops_1): Discard scops for which entry==exit. Looks good to me. Let's wait on comments from Tobi before pushing this patch. Thanks, Sebastian --- gcc/graphite-scop-detection.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c index e8ddecd..f57cc4a 100644 --- a/gcc/graphite-scop-detection.c +++ b/gcc/graphite-scop-detection.c @@ -810,7 +810,14 @@ build_scops_1 (basic_block current, loop_p outermost_loop, { open_scop.exit = sinfo.exit; gcc_assert (open_scop.exit); - scops->safe_push (open_scop); + if (open_scop.entry != open_scop.exit) + scops->safe_push (open_scop); + else + { + sinfo.difficult = true; + sinfo.exits = false; + sinfo.exit = NULL; + } } result.exit = sinfo.exit; -- 2.1.0.243.g30d45f7
Re: [s390] Revert TPF C++ library changes
But don't we need to support the older system (with 2 libstdc++s) and the newer system (just one libstdc++)? Which implies these changes need to be conditional, right? The CPP2 configuration was never shipped to TPF customers, so there's no need to retain both ways.
[PATCH, ARM] stop changing signedness in PROMOTE_MODE
This is my suggested fix for PR 65932, which is a linux kernel miscompile with gcc-5.1. The problem here is caused by a chain of events. The first is that the relatively new eipa_sra pass creates fake parameters that behave slightly differently than normal parameters. The second is that the optimizer creates phi nodes that copy local variables to fake parameters and/or vice versa. The third is that the out-of-ssa pass assumes that it can emit simple move instructions for these phi nodes. And the fourth is that the ARM port has a PROMOTE_MODE macro that forces QImode and HImode to unsigned, but a TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and short parameters have different in-register representations than local variables, and require a conversion when copying between them, a conversion that the out-of-ssa pass can't easily emit. Ultimately, I think this is a problem in the arm backend. It should not have a PROMOTE_MODE macro that is changing the sign of char and short local variables. I also think that we should merge the PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to prevent this from happening again. I see four general problems with the current ARM PROMOTE_MODE definition. 1) Unsigned char is only faster for armv5 and earlier, before the sxtb instruction was added. It is a loss for armv6 and later. 2) Unsigned short was only faster for targets that don't support unaligned accesses. Support for these targets was removed a while ago, and this PROMOTE_MODE hunk should have been removed at the same time. It was accidentally left behind. 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook; when it was converted to a function, the PROMOTE_MODE code was copied without the UNSIGNEDP changes. Thus it is only an accident that TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree. Changing TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE changes to resolve the difference are safe. 
4) There is a general principle that you should only change signedness in PROMOTE_MODE if the hardware forces it, as otherwise this results in extra conversion instructions that make code slower. The mips64 hardware for instance requires that 32-bit values be sign-extended regardless of type, and instructions may trap if this is not true. However, it has a set of 32-bit instructions that operate on these values, and hence no conversions are required. There is no similar case on ARM. Thus the conversions are unnecessary and unwise. This can be seen in the testcases where gcc emits both a zero-extend and a sign-extend inside a loop, as the sign-extend is required for a compare, and the zero-extend is required by PROMOTE_MODE. My change was tested with an arm bootstrap, make check, and SPEC CPU2000 run. The original poster verified that this gives a linux kernel that boots correctly. The PROMOTE_MODE change causes 3 testsuite testcases to fail. These are tests to verify that smulbb and/or smlabb are generated. Eliminating the unnecessary sign conversions causes us to get better code that doesn't include the smulbb and smlabb instructions. I had to modify the testcases to get them to emit the desired instructions. With the testcase changes there are no additional testsuite failures, though I'm concerned that these testcases with the changes may be fragile, and future changes may break them again. If there are ARM parts where smulbb/smlabb are faster than mul/mla, then maybe we should try to add new patterns to get the instructions emitted again for the unmodified testcases. Jim gcc/ 2015-06-29 Jim Wilson jim.wil...@linaro.org PR target/65932 * config/arm/arm.h (PROMOTE_MODE): Don't set UNSIGNEDP for QImode and HImode. gcc/testsuite/ 2015-06-29 Jim Wilson jim.wil...@linaro.org PR target/65932 * gcc.target/arm/wmul-1.c (mac): Change a and b to int pointers. Cast multiply operands to short. * gcc.target/arm/wmul-2.c (vec_mpy): Cast multiply result to short. 
* gcc.target/arm/wmul-3.c (mac): Cast multiply results to short. Index: config/arm/arm.h === --- config/arm/arm.h (revision 224672) +++ config/arm/arm.h (working copy) @@ -523,16 +523,10 @@ extern int arm_arch_crc; type, but kept valid in the wider mode. The signedness of the extension may differ from that of the type. */ -/* It is far faster to zero extend chars than to sign extend them */ - #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE) \ if (GET_MODE_CLASS (MODE) == MODE_INT \ && GET_MODE_SIZE (MODE) < 4) \ { \ - if (MODE == QImode) \ - UNSIGNEDP = 1;\ - else if (MODE == HImode) \ - UNSIGNEDP = 1;\ (MODE) = SImode;\ } Index: testsuite/gcc.target/arm/wmul-1.c === --- testsuite/gcc.target/arm/wmul-1.c (revision 224672) +++ testsuite/gcc.target/arm/wmul-1.c (working copy) @@ -2,14 +2,14 @@
Re: Thinking about libgccjit SONAME bump for gcc 5.2 (was Re: Four jit backports to gcc 5 branch)
On Mon, 2015-06-29 at 18:34 -0400, Ulrich Drepper wrote: On Mon, Jun 29, 2015 at 5:26 PM, David Malcolm dmalc...@redhat.com wrote: I'm looking at ways to manage libgccjit API/ABI as per this thread: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01982.html by transitioning to using symbol versioning, so that the linker can tag subsets of libgccjit symbols in both libgccjit and in client binaries. You don't have to bump the SONAME to introduce symbol versioning. glibc in the beginning didn't have symbol versioning, and we wrote the linker and dynamic linker support so that no SONAME change was necessary. The idea is that unversioned symbols are satisfied by the oldest symbol version. Aha! Thanks. I won't bump the SONAME, in that case; I'll just add symbol versioning, and each new symbol will go in a new tag.
Re: [PATCH, rs6000] Add more overloaded built-ins for vec_cmpge and vec_cmple
On Mon, Jun 29, 2015 at 10:05 PM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: Hi, This patch adds built-in functions for vec_cmpge and vec_cmple required by the ELFv2 ABI but not yet present in gcc. These make use of the existing patterns for gt:VEC_I and gtu:VEC_I, applying the not operator and reversing the order of operands as needed. The result is generation of a vcmpgt[su][bhwd] instruction followed by an xxlnor. Since it is likely that a vec_cmp[gl]e will be followed by a vec_sel, it will be important to optimize the xxlnor/xxsel* to remove the xxlnor by reversing the sense of the select. A separate patch will follow to add this pattern to simplify-rtx.c. Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions. Ok for trunk? Thanks, Bill [gcc] 2015-06-29 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/rs6000-builtin.def (CMPGE_16QI): New built-in definition. (CMPGE_8HI): Likewise. (CMPGE_4SI): Likewise. (CMPGE_2DI): Likewise. (CMPGE_U16QI): Likewise. (CMPGE_U8HI): Likewise. (CMPGE_U4SI): Likewise. (CMPGE_U2DI): Likewise. (CMPLE_16QI): Likewise. (CMPLE_8HI): Likewise. (CMPLE_4SI): Likewise. (CMPLE_2DI): Likewise. (CMPLE_U16QI): Likewise. (CMPLE_U8HI): Likewise. (CMPLE_U4SI): Likewise. (CMPLE_U2DI): Likewise. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add overloads for ALTIVEC_BUILTIN_VEC_CMPGE and ALTIVEC_BUILTIN_VEC_CMPLE. * config/rs6000/vector.md (vector_ge<mode>): Restrict to floating-point vector modes. (vector_nlt<mode>): New define_expand. (vector_nltu<mode>): Likewise. (vector_ngt<mode>): Likewise. (vector_ngtu<mode>): Likewise. [gcc/testsuite] 2015-06-29 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc.target/powerpc/vec-cmp.c: New test. Okay. Thanks, David
Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT
On 29/06/15 21:56, Maxim Kuvyrkov wrote: On Jun 28, 2015, at 2:28 PM, Kugan kugan.vivekanandara...@linaro.org wrote: This patch allows setting REG_EQUAL for ZERO_EXTRACT and handles that in cse (where the src for the ZERO_EXTRACT needs to be calculated). Thanks, Kugan From 75e746e559ffd21b25542b3db627e3b318118569 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah kugan.vivekanandara...@linaro.org Date: Fri, 26 Jun 2015 17:12:07 +1000 Subject: [PATCH 1/2] Allow adding REG_EQUAL for ZERO_EXTRACT --- gcc/ChangeLog | 6 ++ gcc/cse.c | 41 - gcc/emit-rtl.c | 3 ++- 3 files changed, 48 insertions(+), 2 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 080aa39..d4a73d6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2015-06-26 Kugan Vivekanandarajah kug...@linaro.org + +* cse.c (cse_insn): Calculate src_eqv for ZERO_EXTRACT. +* emit-rtl.c (set_for_reg_notes): Allow ZERO_EXTRACT to set +REG_EQUAL note. + 2015-06-25 H.J. Lu hongjiu...@intel.com * gentarget-def.c (def_target_insn): Cast return of strtol to diff --git a/gcc/cse.c b/gcc/cse.c index 100c9c8..8add651 100644 --- a/gcc/cse.c +++ b/gcc/cse.c @@ -4531,8 +4531,47 @@ cse_insn (rtx_insn *insn) if (n_sets == 1 && REG_NOTES (insn) != 0 && (tem = find_reg_note (insn, REG_EQUAL, NULL_RTX)) != 0 && (! rtx_equal_p (XEXP (tem, 0), SET_SRC (sets[0].rtl)) + || GET_CODE (SET_DEST (sets[0].rtl)) == ZERO_EXTRACT || GET_CODE (SET_DEST (sets[0].rtl)) == STRICT_LOW_PART)) -src_eqv = copy_rtx (XEXP (tem, 0)); +{ + src_eqv = copy_rtx (XEXP (tem, 0)); + + /* If DEST is of the form ZERO_EXTRACT, as in: + (set (zero_extract:SI (reg:SI 119) + (const_int 16 [0x10]) + (const_int 16 [0x10])) + (const_int 51154 [0xc7d2])) + REG_EQUAL note will specify the value of register (reg:SI 119) at this + point. Note that this is different from SRC_EQV. We can however + calculate SRC_EQV with the position and width of ZERO_EXTRACT. 
*/ + if (GET_CODE (SET_DEST (sets[0].rtl)) == ZERO_EXTRACT) Consider changing if (something && (!rtx_equal_p || ZERO_EXTRACT || STRICT_LOW_PART)) to if (something && !rtx_equal_p) { if (ZERO_EXTRACT) { } else if (STRICT_LOW_PART) { } } Otherwise looks good to me, but you still need another approval. Thanks Maxim for the review. How about the attached patch? Thanks, Kugan +{ + if (CONST_INT_P (src_eqv) + && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 1)) + && CONST_INT_P (XEXP (SET_DEST (sets[0].rtl), 2))) +{ + rtx dest_reg = XEXP (SET_DEST (sets[0].rtl), 0); + rtx width = XEXP (SET_DEST (sets[0].rtl), 1); + rtx pos = XEXP (SET_DEST (sets[0].rtl), 2); + HOST_WIDE_INT val = INTVAL (src_eqv); + HOST_WIDE_INT mask; + unsigned int shift; + if (BITS_BIG_ENDIAN) +shift = GET_MODE_PRECISION (GET_MODE (dest_reg)) + - INTVAL (pos) - INTVAL (width); + else +shift = INTVAL (pos); + if (INTVAL (width) == HOST_BITS_PER_WIDE_INT) +mask = ~(HOST_WIDE_INT) 0; + else +mask = ((HOST_WIDE_INT) 1 << INTVAL (width)) - 1; + val = (val >> shift) & mask; + src_eqv = GEN_INT (val); +} + else +src_eqv = 0; +} +} /* Set sets[i].src_elt to the class each source belongs to. Detect assignments from or to volatile things diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c index e7f7eab..cb891b1 100644 --- a/gcc/emit-rtl.c +++ b/gcc/emit-rtl.c @@ -5228,7 +5228,8 @@ set_for_reg_notes (rtx insn) reg = SET_DEST (pat); /* Notes apply to the contents of a STRICT_LOW_PART. */ - if (GET_CODE (reg) == STRICT_LOW_PART) + if (GET_CODE (reg) == STRICT_LOW_PART + || GET_CODE (reg) == ZERO_EXTRACT) reg = XEXP (reg, 0); /* Check that we have a register. 
*/ -- 1.9.1 -- Maxim Kuvyrkov www.linaro.org From 3754bcd11fb732e03a39d6625a9eb14934b36643 Mon Sep 17 00:00:00 2001 From: Kugan Vivekanandarajah kugan.vivekanandara...@linaro.org Date: Fri, 26 Jun 2015 17:12:07 +1000 Subject: [PATCH 1/2] Allow adding REG_EQUAL for ZERO_EXTRACT --- gcc/ChangeLog | 6 ++ gcc/cse.c | 49 ++--- gcc/emit-rtl.c | 3 ++- 3 files changed, 50 insertions(+), 8 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 080aa39..d4a73d6 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,9 @@ +2015-06-26 Kugan Vivekanandarajah kug...@linaro.org + + * cse.c (cse_insn): Calculate src_eqv for ZERO_EXTRACT. + * emit-rtl.c (set_for_reg_notes): Allow ZERO_EXTRACT to set + REG_EQUAL note. + 2015-06-25 H.J. Lu
Re: [patch] fix regrename pass to ensure renamings produce valid insns
On 06/29/2015 09:07 PM, Kito Cheng wrote: Hi all: This patch seems like it will break the build for c6x when assert checking is disabled: Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) < unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); I'm sorry; do you have a suggestion for a fix? I thought this was the change I was asked to make, and as I noted previously, I'm not set up to test (or even build) for this target. -Sandra the obviously confused :-(
Re: [patch] fix regrename pass to ensure renamings produce valid insns
On 2015/6/30 12:22 PM, Sandra Loosemore wrote: On 06/29/2015 09:07 PM, Kito Cheng wrote: Hi all: This patch seems like it will break the build for c6x when assert checking is disabled: Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) < unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); I'm sorry; do you have a suggestion for a fix? I thought this was the change I was asked to make, and as I noted previously, I'm not set up to test (or even build) for this target. -Sandra the obviously confused :-( You probably have to separate out the regrename_do_replace() bool result into a variable; placing the whole call into the gcc_assert() might make it disappear when assertions are turned off. Chung-Lin
Re: [C++/66443]
On 06/29/2015 06:57 PM, Nathan Sidwell wrote: * method.c (synthesized_method_walk): Skip virtual bases of abstract classes in C++14 mode. Let's not limit this to C++14 mode; most DRs apply to earlier standards as well. Jason
Re: [s390] Revert TPF C++ library changes
On 06/29/2015 07:00 PM, DJ Delorie wrote: But don't we need to support the older system (with 2 libstdc++s) and the newer system (just one libstdc++)? Which implies these changes need to be conditional, right? The CPP2 configuration was never shipped to TPF customers, so there's no need to retain both ways. Ah, then approved. Jeff
Re: [patch] fix regrename pass to ensure renamings produce valid insns
On 2015/6/30 1:13 PM, Chung-Lin Tang wrote: On 2015/6/30 12:22 PM, Sandra Loosemore wrote: On 06/29/2015 09:07 PM, Kito Cheng wrote: Hi all: This patch seems like it will break the build for c6x when assert checking is disabled: Index: gcc/config/c6x/c6x.c === --- gcc/config/c6x/c6x.c (revision 225104) +++ gcc/config/c6x/c6x.c (working copy) @@ -3516,7 +3516,7 @@ try_rename_operands (rtx_insn *head, rtx best_reg = find_rename_reg (this_head, super_class, unavailable, old_reg, true); - regrename_do_replace (this_head, best_reg); + gcc_assert (regrename_do_replace (this_head, best_reg)); count_unit_reqs (new_reqs, head, PREV_INSN (tail)); merge_unit_reqs (new_reqs); @@ -3529,7 +3529,7 @@ try_rename_operands (rtx_insn *head, rtx unit_req_imbalance (reqs), unit_req_imbalance (new_reqs)); } if (unit_req_imbalance (new_reqs) < unit_req_imbalance (reqs)) -regrename_do_replace (this_head, old_reg); +gcc_assert (regrename_do_replace (this_head, old_reg)); else memcpy (reqs, new_reqs, sizeof (unit_req_table)); I'm sorry; do you have a suggestion for a fix? I thought this was the change I was asked to make, and as I noted previously, I'm not set up to test (or even build) for this target. -Sandra the obviously confused :-( You probably have to separate out the regrename_do_replace() bool result into a variable; placing the whole call into the gcc_assert() might make it disappear when assertions are turned off. I notice that, the way gcc_assert() is defined in system.h now, the test won't disappear even when runtime checks are disabled, though you might still adjust it to avoid any programmer confusion. Chung-Lin
Re: [PATCH] config/bfin/bfin.c (hwloop_optimize): Use return false instead of gcc_assert for checking jump_insn.
On 06/28/2015 04:15 PM, Chen Gang wrote: For bfin looping optimization, after lsetup optimization, it can have the correct lsetup-related insns, which causes the gcc_assert failure for jump_insn. I've been debugging this for a bit, and at least the explanation of the patch is wrong - it's finding an LSETUP for a different loop. There seems to be an inconsistency in the CFG, and it looks like it's caused by the unusual (?) situation that both arms out of a conditional branch lead directly to a hwloop candidate. So, not OK until further investigation I think. Bernd