Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-03 Thread James Greenhalgh
On Fri, Feb 03, 2017 at 06:57:54AM +, Hurugalawadi, Naveen wrote:
> Hi Andrew,
> 
> Thanks for clearing the confusion.
> 
> > I don't understand this comment and how it relates to your updated patch
> 
> foo, foo1 and foo2 generates calls to "popcountdi2" which should have
> been "popcountsi2" for foo1. When Kyrill commented on using the
> popcountsi2; I was confused :).
> 
> Hence, the testcase generally checks for the absence of call to "popcount"
> and the presence of "cnt" instruction instead.
> 
> >> Now of course what should change still is the argument 
> >> types to foo1/foo2 
> 
> The arguments to foo1 and foo2 are modified as required.
> 
> Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.
> 
> Please let us know if its okay for stage 1?

This looks OK to me now. The change is self contained, looks safe, and
has only been held up for the testcase issue.

But, please give Richard/Marcus a reasonable time to object (say, end of day
Tuesday) before committing.

Thanks,
James

> 
> Thanks,
> Naveen

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index a693a3b..684a833 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3778,6 +3778,39 @@
>}
>  )
>  
> +;; Pop count be done via the "CNT" instruction in AdvSIMD.
> +;;
> +;; MOV   v.1d, x0
> +;; CNT   v1.8b, v.8b
> +;; ADDV b2, v1.8b
> +;; MOV   w0, v2.b[0]
> +
> +(define_expand "popcount2"
> +  [(match_operand:GPI 0 "register_operand")
> +   (match_operand:GPI 1 "register_operand")]
> +  "TARGET_SIMD"
> +{
> +  rtx v = gen_reg_rtx (V8QImode);
> +  rtx v1 = gen_reg_rtx (V8QImode);
> +  rtx r = gen_reg_rtx (QImode);
> +  rtx in = operands[1];
> +  rtx out = operands[0];
> +  if(mode == SImode)
> +{
> +  rtx tmp;
> +  tmp = gen_reg_rtx (DImode);
> +  /* If we have SImode, zero extend to DImode, pop count does
> + not change if we have extra zeros. */
> +  emit_insn (gen_zero_extendsidi2 (tmp, in));
> +  in = tmp;
> +}
> +  emit_move_insn (v, gen_lowpart (V8QImode, in));
> +  emit_insn (gen_popcountv8qi2 (v1, v));
> +  emit_insn (gen_reduc_plus_scal_v8qi (r, v1));
> +  emit_insn (gen_zero_extendqi2 (out, r));
> +  DONE;
> +})
> +
>  (define_insn "clrsb2"
>[(set (match_operand:GPI 0 "register_operand" "=r")
>  (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
> diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt.c 
> b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> new file mode 100644
> index 000..7e95796
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/popcnt.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int
> +foo (int x)
> +{
> +  return __builtin_popcount (x);
> +}
> +
> +long
> +foo1 (long x)
> +{
> +  return __builtin_popcountl (x);
> +}
> +
> +long long
> +foo2 (long long x)
> +{
> +  return __builtin_popcountll (x);
> +}
> +
> +/* { dg-final { scan-assembler-not "popcount" } } */
> +/* { dg-final { scan-assembler-times "cnt\t" 3 } } */




[committed] Fix gfortran.dg/coarray_43.f90 on hppa*-*-hpux*

2017-02-03 Thread John David Anglin
We need to link against libatomic when available.  Committed to trunk.

Dave
--
John David Anglin   dave.ang...@bell.net


2017-02-03  John David Anglin  

* gfortran.dg/coarray_43.f90: Add "-latomic" option if
libatomic_available.

Index: gfortran.dg/coarray_43.f90
===
--- gfortran.dg/coarray_43.f90  (revision 245109)
+++ gfortran.dg/coarray_43.f90  (working copy)
@@ -1,5 +1,6 @@
 ! { dg-do link }
 ! { dg-options "-fcoarray=lib -lcaf_single" }
+! { dg-additional-options "-latomic" { target libatomic_available } }
 
 program coarray_43
   implicit none


[committed] Require alias support for gcc.dg/pr77587.c

2017-02-03 Thread John David Anglin
This avoids test failure on 32-bit hppa*-*-hpux*.  Committed to trunk.

Dave
--
John David Anglin   dave.ang...@bell.net


2017-02-03  John David Anglin  

* gcc.dg/pr77587.c: Require alias support.
* gcc.dg/pr77587a.c: Likewise.

Index: gcc.dg/pr77587.c
===
--- gcc.dg/pr77587.c(revision 245109)
+++ gcc.dg/pr77587.c(working copy)
@@ -1,5 +1,6 @@
 /* PR target/77587 */
 /* { dg-do run } */
+/* { dg-require-alias "" } */
 /* { dg-require-weak-override "" } */
 /* { dg-additional-sources "pr77587a.c" } */
 
Index: gcc.dg/pr77587a.c
===
--- gcc.dg/pr77587a.c   (revision 245109)
+++ gcc.dg/pr77587a.c   (working copy)
@@ -1,5 +1,6 @@
 /* PR target/77587 */
 /* { dg-do compile } */
+/* { dg-require-alias "" } */
 /* { dg-require-weak-override "" } */
 
 void


[committed] Skip tests on hppa*-*-hpux*

2017-02-03 Thread John David Anglin
The attached change skips a couple of tests on hpux.  The 32-bit hppa-hpux 
target lacks support for
the dwarf debug format.  We also lack the %hhd format except on 11.31.

Committed to trunk.

Dave
--
John David Anglin   dave.ang...@bell.net


2017-02-03  John David Anglin  

* g++.dg/pr78112-2.C: Skip on hppa*-*-hpux*.
* gcc.c-torture/execute/pr78622.c: Likewise.

Index: g++.dg/pr78112-2.C
===
--- g++.dg/pr78112-2.C  (revision 245109)
+++ g++.dg/pr78112-2.C  (working copy)
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-skip-if "No dwarf debug support" { hppa*-*-hpux* } "*" "" } */
 /* { dg-options "-g -dA -gdwarf-4 -std=gnu++11" } */
 /* { dg-options "-g -dA -std=gnu++11 -gdwarf-4" } */
 /* { dg-final { scan-assembler-times DW_AT_object_pointer 18 } } */
Index: gcc.c-torture/execute/pr78622.c
===
--- gcc.c-torture/execute/pr78622.c (revision 245109)
+++ gcc.c-torture/execute/pr78622.c (working copy)
@@ -1,5 +1,6 @@
 /* PR middle-end/78622 - [7 Regression] -Wformat-overflow/-fprintf-return-value
incorrect with overflow/wrapping
+   { dg-skip-if "Requires %hhd format" { hppa*-*-hpux* } { "*" } { "" } }
{ dg-additional-options "-Wformat-overflow=2" } */
 
 __attribute__((noinline, noclone)) int


[committed] Fix some more tests that fail on hppa*-*-hpux* due to limited common alignment

2017-02-03 Thread John David Anglin
The attached change adds the -fno-common option to several tests that fail on 
hppa*-*-hpux*.
Committed to trunk.

Dave
--
John David Anglin   dave.ang...@bell.net


2017-02-03  John David Anglin  

* c-c++-common/Wunused-var-15.c: Add -fno-common additional option on
hppa*-*-hpux*.
* c-c++-common/Wunused-var-16.c: Likewise.
* c-c++-common/builtin-shuffle-1.c: Likewise.
* gcc.dg/debug/dwarf2/align-[1-6].c Likewise.
* gcc.dg/debug/dwarf2/align-as-1.c: Likewise.

Index: c-c++-common/Wunused-var-15.c
===
--- c-c++-common/Wunused-var-15.c   (revision 245109)
+++ c-c++-common/Wunused-var-15.c   (working copy)
@@ -1,6 +1,7 @@
 /* PR c/71719 */
 /* { dg-do compile } */
 /* { dg-options "-Wunused -W -Wno-psabi" } */
+/* { dg-additional-options "-fno-common" { target hppa*-*-hpux* } } */
 
 typedef unsigned V __attribute__ ((vector_size (16)));
 
Index: c-c++-common/Wunused-var-16.c
===
--- c-c++-common/Wunused-var-16.c   (revision 245109)
+++ c-c++-common/Wunused-var-16.c   (working copy)
@@ -1,6 +1,7 @@
 /* PR c++/78949 */
 /* { dg-do compile } */
 /* { dg-options "-Wunused" } */
+/* { dg-additional-options "-fno-common" { target hppa*-*-hpux* } } */
 
 typedef unsigned char V __attribute__((vector_size(16)));
 V v;
Index: c-c++-common/builtin-shuffle-1.c
===
--- c-c++-common/builtin-shuffle-1.c(revision 245109)
+++ c-c++-common/builtin-shuffle-1.c(working copy)
@@ -1,5 +1,6 @@
 /* PR c++/78089 */
 /* { dg-do run } */
+/* { dg-additional-options "-fno-common" { target hppa*-*-hpux* } } */
 
 typedef int V __attribute__((vector_size (4 * __SIZEOF_INT__)));
 V a, b, c;
Index: gcc.dg/debug/dwarf2/align-1.c
===
--- gcc.dg/debug/dwarf2/align-1.c   (revision 245109)
+++ gcc.dg/debug/dwarf2/align-1.c   (working copy)
@@ -1,5 +1,6 @@
 // { dg-do compile }
 // { dg-options "-O -g -dA -gno-strict-dwarf" }
+// { dg-additional-options "-fno-common" { target hppa*-*-hpux* } }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { 
powerpc-ibm-aix* } } } }
 
 int __attribute__((__aligned__(64))) i;
Index: gcc.dg/debug/dwarf2/align-2.c
===
--- gcc.dg/debug/dwarf2/align-2.c   (revision 245109)
+++ gcc.dg/debug/dwarf2/align-2.c   (working copy)
@@ -1,5 +1,6 @@
 // { dg-do compile }
 // { dg-options "-O -g -dA -gno-strict-dwarf" }
+// { dg-additional-options "-fno-common" { target hppa*-*-hpux* } }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { 
powerpc-ibm-aix* } } } }
 
 typedef int __attribute__((__aligned__(64))) i_t;
Index: gcc.dg/debug/dwarf2/align-3.c
===
--- gcc.dg/debug/dwarf2/align-3.c   (revision 245109)
+++ gcc.dg/debug/dwarf2/align-3.c   (working copy)
@@ -1,5 +1,6 @@
 // { dg-do compile }
 // { dg-options "-O -g -dA -gno-strict-dwarf" }
+// { dg-additional-options "-fno-common" { target hppa*-*-hpux* } }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { 
powerpc-ibm-aix* } } } }
 
 typedef int int_t;
Index: gcc.dg/debug/dwarf2/align-4.c
===
--- gcc.dg/debug/dwarf2/align-4.c   (revision 245109)
+++ gcc.dg/debug/dwarf2/align-4.c   (working copy)
@@ -1,5 +1,6 @@
 // { dg-do compile }
 // { dg-options "-O -g -dA -gno-strict-dwarf" }
+// { dg-additional-options "-fno-common" { target hppa*-*-hpux* } }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 2 { xfail { 
powerpc-ibm-aix* } } } }
 
 struct tt {
Index: gcc.dg/debug/dwarf2/align-5.c
===
--- gcc.dg/debug/dwarf2/align-5.c   (revision 245109)
+++ gcc.dg/debug/dwarf2/align-5.c   (working copy)
@@ -1,5 +1,6 @@
 // { dg-do compile }
 // { dg-options "-O -g -dA -gno-strict-dwarf" }
+// { dg-additional-options "-fno-common" { target hppa*-*-hpux* } }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { 
powerpc-ibm-aix* } } } }
 
 struct tt {
Index: gcc.dg/debug/dwarf2/align-6.c
===
--- gcc.dg/debug/dwarf2/align-6.c   (revision 245109)
+++ gcc.dg/debug/dwarf2/align-6.c   (working copy)
@@ -1,5 +1,6 @@
 // { dg-do compile }
 // { dg-options "-O -g -dA -gno-strict-dwarf" }
+// { dg-additional-options "-fno-common" { target hppa*-*-hpux* } }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { 
powerpc-ibm-aix* } } } }
 
 struct tt {
Index: gcc.dg/debug/dwarf2/align-as-1.c
===
--- gcc.dg/debug/dwarf2/align-as-1.c  

Re: [wwwdocs, coding conventions] Mention OVERRIDE/FINAL

2017-02-03 Thread Trevor Saunders
On Fri, Feb 03, 2017 at 09:34:52AM +0100, Gerald Pfeifer wrote:
> On Fri, 14 Oct 2016, David Malcolm wrote:
> > On Fri, 2016-10-14 at 16:27 +0100, Pedro Alves wrote:
> >> FYI, I pushed these in now.  I also bootstrapped with the
> >> jit included in the selected languages, and hacked the
> >> jit code a bit to trigger the problems OVERRIDE intends to
> >> catch, just to make sure it still works.
> > I propose that we update our coding conventions to mention the OVERRIDE
> > and FINAL macros in the paragraph that discusses virtual funcs.
> > 
> > The attached patch (to the website) does so.
> > 
> > OK to commit?
> 
> I noticed this one has neither been rejected nor applied.
> 
> The patch appears fine wearing my wwwdocs maintainer hat, alas I
> do not feel confident approving it (content-wise).

fwiw I can't think of any big downsides, I guess there's slightly more
work ocassionally when you add a new class that inherits from an old one
and slightly more verbosity, but it definitely seems worth it to me.

Trev

> 
> Perhaps something for Jeff (now added) or Bernd?
> 
> Gerald

> Index: htdocs/codingconventions.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/codingconventions.html,v
> retrieving revision 1.77
> diff -u -p -r1.77 codingconventions.html
> --- htdocs/codingconventions.html 18 Sep 2016 13:55:17 -  1.77
> +++ htdocs/codingconventions.html 14 Oct 2016 21:22:44 -
> @@ -902,7 +902,10 @@ Its use with data-carrying classes is mo
>  
>  Think carefully about the size and performance impact
>  of virtual functions and virtual bases
> -before using them.
> +before using them.  If you do use virtual functions, use the
> +OVERRIDE and FINAL macros from
> +include/ansidecl.h to annotate the code for a human reader,
> +and to allow sufficiently modern C++ compilers to detect mistakes.
>  
>  
>  



[committed] Add gcc/function-tests.o to compare_exclusions for 32-bit hppa*-*-hpux*.

2017-02-03 Thread John David Anglin
Depending on the bootstrap compiler and configure options, we sometimes get a 
compare failure
for gcc/function-tests.o on 32-bit hppa*-*-hpux*.  The problem is a 
"_GLOBAL__F_" label containing
a "random" string.  For example, 

.EXPORT _GLOBAL__F_.._.._gcc_gcc_function_tests.c_DFF67DD7_0xb048e3,DATA
_GLOBAL__F_.._.._gcc_gcc_function_tests.c_DFF67DD7_0xb048e3:

The attached change adds gcc/function-tests.o to the compare exclusions for 
hppa*-*-hpux*. 

Tested on hppa2.0w-hp-hpux11.11.  Committed to trunk.

Dave
--
John David Anglin   dave.ang...@bell.net


2017-02-03  John David Anglin  

* configure.ac: Add gcc/function-tests.o to compare_exclusions for
32-bit hppa*-*-hpux*.
* configure: Regenerate.

Index: configure.ac
===
--- configure.ac(revision 245137)
+++ configure.ac(working copy)
@@ -3509,7 +3509,7 @@
 compare_exclusions="gcc/cc*-checksum\$(objext) | gcc/ada/*tools/*"
 case "$target" in
   hppa*64*-*-hpux*) ;;
-  hppa*-*-hpux*) compare_exclusions="gcc/cc*-checksum\$(objext) | 
*/libgcc/lib2funcs* | gcc/ada/*tools/*" ;;
+  hppa*-*-hpux*) compare_exclusions="gcc/cc*-checksum\$(objext) | 
*/libgcc/lib2funcs* | gcc/ada/*tools/* | gcc/function-tests.o" ;;
   powerpc*-ibm-aix*) compare_exclusions="gcc/cc*-checksum\$(objext) | 
gcc/ada/*tools/* | *libgomp*\$(objext)" ;;
 esac
 AC_SUBST(compare_exclusions)


Re: [PATCH] sched: Do not move expensive insns speculatively (PR68664)

2017-02-03 Thread Segher Boessenkool
Hi Jeff,

On Fri, Feb 03, 2017 at 04:31:58PM -0700, Jeff Law wrote:
> >+@deftypefn {Target Hook} bool TARGET_SCHED_CAN_SPECULATE_INSN (rtx_insn 
> >*@var{insn})
> >+Some instructions should never be speculated by the schedulers, usually
> >+ because the instruction is too expensive to get this wrong.  This hook
> >+ should return @code{false} if @var{insn} should not be speculated.
> >+@end deftypefn
> Consider adding something like this:
> 
> Define this hook to return false for instructions which are not fully 
> modeled by the pipeline description to avoid DFA size explosion. 
> Otherwise the scheduler may erroneously speculate those instructions 
> into a pipeline bubble that is too small which may severely impact 
> performance.

Well, it speculates it even _if_ you correctly model it, currently
anyway.  But I'll write something similar, good idea.

> >+  if (INSN_BB (insn) != target_bb && IS_SPECULATIVE_INSN (insn))
> >+{
> >+  /* Cannot schedule this insn unless all operands are live.  */
> >+  if (!check_live (insn, INSN_BB (insn)))
> >+return 0;
> >+
> >+  /* Should not move expensive instructions speculatively.  */
> >+  if (GET_CODE (PATTERN (insn)) != CLOBBER
> >+  && !targetm.sched.can_speculate_insn (insn))
> >+return 0;
> >+}
> You probably want to filter both CLOBBER and USE insns.

I had that originally (and more checks), but only CLOBBER can be
speculated and there is that IS_SPECULATIVE_INSN check right before.
I had the CLOBBER check in the rs6000 hook first -- get_attr_type
will die on it -- but I figured it makes more sense in the generic
code.

> OK with those two nits addressed.

Thanks!


Segher


Re: [PATCH] PR target/66144, PowerPC improve vector compare

2017-02-03 Thread Michael Meissner
On Fri, Feb 03, 2017 at 06:07:56PM -0600, Segher Boessenkool wrote:
> On Fri, Feb 03, 2017 at 04:25:00PM -0500, Michael Meissner wrote:
> > +;; Return 1 if operand is either a vector constant of all 0 bits of a 
> > vector
> > +;; constant of all 1 bits.
> > +(define_predicate "vector_int_same_bit"
> > +  (match_code "const_vector")
> > +{
> > +  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
> > +return 0;
> > +
> > +  else
> > +return op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode);
> > +})
> 
> This predicate is unused as far as I see?

Right.  It was used in my first attempt when I had a peephole2 to
eliminate the extra loads.  Since I moved the processing to when we create the
conditional vector assignment and deleted the peephole2, I missed deleting the
predicate.  Thanks for catching it.

> > +  /* Optimize vec1 == vec2, to know the mask generates -1/0.  */
> > +  if (GET_MODE_CLASS (dest_mode) == MODE_VECTOR_INT)
> >  {
> > -  tmp = op_true;
> > -  op_true = op_false;
> > -  op_false = tmp;
> > +  if (op_true == constant_m1 && op_false == constant_0)
> > +   {
> > + emit_move_insn (dest, mask);
> > + return 1;
> > +   }
> > +
> > +  else if (op_true == constant_0 && op_false == constant_m1)
> > +   {
> > + emit_insn (gen_rtx_SET (dest, gen_rtx_NOT (dest_mode, mask)));
> > + return 1;
> > +   }
> >  }
> 
> Do you need to test for dest_mode == mask_mode here, like below?

Yes, because there is support for vcondv4siv4sf and vcondv4sfv4si where the
mask is one of V4SI or V4SF and the destination is the other.  The dest_mode ==
mask_mode checks for that, and ignore that case.

I.e. something like:

static float af[1024], bf[1024], cf[1024];
static int di[1024], ei[1024];

// ..

for (i = 0; i < 1024; i++)
  af[i] = (di[i] == ei[i]) ? bf[i] : cf[i];

> 
> > +  if (op_true == constant_m1 && dest_mode == mask_mode)
> > +op_true = mask;
> > +  else if (!REG_P (op_true) && !SUBREG_P (op_true))
> > +op_true = force_reg (dest_mode, op_true);
> > +
> > +  if (op_false == constant_0 && dest_mode == mask_mode)
> > +op_false = mask;
> > +  else if (!REG_P (op_false) && !SUBREG_P (op_false))
> > +op_false = force_reg (dest_mode, op_false);
> 
> Another thing you could try is, if either op_true or op_false is 0
> or -1, let the result be
>   (mask & op_true) | (~mask & op_false)
> 
> and let the rest of the optimisers sort it out (it's a single vor/vand
> or vorc/vandc, or a vnot, or nothing).  A later improvement perhaps.
> Or does it already handle all cases now :-)

I don't know if it would handle it.

> 
> Okay for trunk with the unused predicate removed, and the dest_mode ==
> mask_mode thing looked at.  Thanks!

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH, rs6000] gcc 5 back port of vec_packs and vec_vgbbd builtin table entry fixes

2017-02-03 Thread Segher Boessenkool
On Fri, Feb 03, 2017 at 02:23:12PM -0800, Carl E. Love wrote:
> The GCC 6 branch for the back port of the following two mainline commits
> was approved and committed to GCC 6.  The following patch is the back
> port for the two commits for the GCC 5 branch.  The only difference
> relative to the GCC 6 patch is the line numbers.

> Is the patch OK for gcc 5 branch?  

Sure :-)  (But, typo in the changelog, see below).


Segher


> 2017-02-03  Carl Love  
> 
> Backport of two commits from mainline, r244943 and r244904,
> dated 017-01-26 and 2017-01-25 respectively

^^^ I doubt that date.  A strange Y2K remnant?

> * config/rs6000/rs6000-c (altivec_overloaded_builtins): Fix order
> of entries for ALTIVEC_BUILTIN_VEC_PACKS.  Remove bogus entries
> for P8V_BUILTIN_VEC_VGBBD.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-02-03  Carl Love  
> * gcc.target/powerpc/builtins-3-p8.c:  Add new testfile for missing
> vec_packs built-in tests.


Re: [PATCH] PR target/66144, PowerPC improve vector compare

2017-02-03 Thread Segher Boessenkool
On Fri, Feb 03, 2017 at 04:25:00PM -0500, Michael Meissner wrote:
> +;; Return 1 if operand is either a vector constant of all 0 bits of a vector
> +;; constant of all 1 bits.
> +(define_predicate "vector_int_same_bit"
> +  (match_code "const_vector")
> +{
> +  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
> +return 0;
> +
> +  else
> +return op == CONST0_RTX (mode) || op == CONSTM1_RTX (mode);
> +})

This predicate is unused as far as I see?

> +  /* Optimize vec1 == vec2, to know the mask generates -1/0.  */
> +  if (GET_MODE_CLASS (dest_mode) == MODE_VECTOR_INT)
>  {
> -  tmp = op_true;
> -  op_true = op_false;
> -  op_false = tmp;
> +  if (op_true == constant_m1 && op_false == constant_0)
> + {
> +   emit_move_insn (dest, mask);
> +   return 1;
> + }
> +
> +  else if (op_true == constant_0 && op_false == constant_m1)
> + {
> +   emit_insn (gen_rtx_SET (dest, gen_rtx_NOT (dest_mode, mask)));
> +   return 1;
> + }
>  }

Do you need to test for dest_mode == mask_mode here, like below?

> +  if (op_true == constant_m1 && dest_mode == mask_mode)
> +op_true = mask;
> +  else if (!REG_P (op_true) && !SUBREG_P (op_true))
> +op_true = force_reg (dest_mode, op_true);
> +
> +  if (op_false == constant_0 && dest_mode == mask_mode)
> +op_false = mask;
> +  else if (!REG_P (op_false) && !SUBREG_P (op_false))
> +op_false = force_reg (dest_mode, op_false);

Another thing you could try is, if either op_true or op_false is 0
or -1, let the result be
(mask & op_true) | (~mask & op_false)

and let the rest of the optimisers sort it out (it's a single vor/vand
or vorc/vandc, or a vnot, or nothing).  A later improvement perhaps.
Or does it already handle all cases now :-)

Okay for trunk with the unused predicate removed, and the dest_mode ==
mask_mode thing looked at.  Thanks!


Segher


Re: PR79286, ira combine_and_move_insns in loops

2017-02-03 Thread Jeff Law

On 02/02/2017 02:31 AM, Alan Modra wrote:

Revised patch that cures the lra related -m32 -Os regression too.

The code that I'm patching here is changing a REG_EQUAL note to
REG_EQUIV, ie. asserting that the value of the reg is always the value
set by the current instruction.  Which is not always true when the
insn is in a loop and the use of the reg happens before the def.  Of
course that implies the value of the reg is initially undefined, so
it's not unreasonable to make the initial reg value the same.
Except when the code setting the initial value may trap, as it does in
the testcase.

Bootstrap and regression test on x86_64-linux in progress.  OK
assuming no regressions?

I also took a look at gcc/*.o to see whether there were any code
quality regressions.  No differences found except the expected change
in ira.o.

PR rtl-optimization/79286
* ira.c (update_equiv_regs): Do not create an equivalence for
mems that may trap.
testsuite/
* gcc.c-torture/execute/pr79286.c: New.
So I haven't been able to find where we've hashed through this issue 
before.  My recollection was that if we encountered a use before the def 
that we would avoid recording the equivalence when we later see the 
REG_EQUAL note.


That works within a BB.  If the use/set are in different blocks then 
don't we just need to check that the set dominates the use?


Jeff




Re: [PATCH] sched: Do not move expensive insns speculatively (PR68664)

2017-02-03 Thread Jeff Law

On 02/03/2017 01:12 PM, Segher Boessenkool wrote:

Scheduling should never move very expensive instructions to places they
are executed more frequently.  This patch fixes that, reducing the
execution time of c-ray by over 40% (I tested on a BE Power7 system).

This introduces a new target hook sched.can_speculate_insn which returns
whether the scheduler is allowed to speculate a given instruction.  The
rs6000 implementation disallows all divide and square root instructions.

Bootstrapped and tested on powerpc64-linux {-m32,-m64}.  Is this okay
for trunk?


Segher


2017-02-03  Segher Boessenkool  

PR rtl-optimization/68664
* target.def (can_speculate_insn): New hook.
* doc/tm.texi.in (TARGET_SCHED_CAN_SPECULATE_INSN): New hook.
* doc/tm.texi: Regenerate.
* sched-rgn.c (can_schedule_ready_p): Use the new hook.
* config/rs6000/rs6000.c (TARGET_SCHED_CAN_SPECULATE_INSN): New macro.
(rs6000_sched_can_speculate_insn): New function.

---
 gcc/config/rs6000/rs6000.c | 20 
 gcc/doc/tm.texi|  6 ++
 gcc/doc/tm.texi.in |  2 ++
 gcc/sched-rgn.c| 19 +--
 gcc/target.def |  7 +++
 5 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 909589c..1963402 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7000,6 +7000,12 @@ The structure *@var{spec_info} should be filled in by 
the target.
 The structure describes speculation types that can be used in the scheduler.
 @end deftypefn

+@deftypefn {Target Hook} bool TARGET_SCHED_CAN_SPECULATE_INSN (rtx_insn 
*@var{insn})
+Some instructions should never be speculated by the schedulers, usually
+ because the instruction is too expensive to get this wrong.  This hook
+ should return @code{false} if @var{insn} should not be speculated.
+@end deftypefn

Consider adding something like this:

Define this hook to return false for instructions which are not fully 
modeled by the pipeline description to avoid DFA size explosion. 
Otherwise the scheduler may erroneously speculate those instructions 
into a pipeline bubble that is too small which may severely impact 
performance.




+
 @deftypefn {Target Hook} int TARGET_SCHED_SMS_RES_MII (struct ddg *@var{g})
 This hook is called by the swing modulo scheduler to calculate a
 resource-based lower bound which is based on the resources available in
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index ea74d37..756c118 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4882,6 +4882,8 @@ them: try the first ones in this list first.
 descr
 @hook TARGET_SCHED_SET_SCHED_FLAGS

+@hook TARGET_SCHED_CAN_SPECULATE_INSN
+
 @hook TARGET_SCHED_SMS_RES_MII

 @hook TARGET_SCHED_DISPATCH
diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index 2af3a03..a09fc5d 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -2147,12 +2147,19 @@ static int
 can_schedule_ready_p (rtx_insn *insn)
 {
   /* An interblock motion?  */
-  if (INSN_BB (insn) != target_bb
-  && IS_SPECULATIVE_INSN (insn)
-  && !check_live (insn, INSN_BB (insn)))
-return 0;
-  else
-return 1;
+  if (INSN_BB (insn) != target_bb && IS_SPECULATIVE_INSN (insn))
+{
+  /* Cannot schedule this insn unless all operands are live.  */
+  if (!check_live (insn, INSN_BB (insn)))
+   return 0;
+
+  /* Should not move expensive instructions speculatively.  */
+  if (GET_CODE (PATTERN (insn)) != CLOBBER
+ && !targetm.sched.can_speculate_insn (insn))
+   return 0;
+}

You probably want to filter both CLOBBER and USE insns.

OK with those two nits addressed.

jeff


Re: [PATCH] Fix improper combiner optimization of mixed mode vector/scalar expressions (PR 79365)

2017-02-03 Thread Segher Boessenkool
Hi,

A couple of things.  Firstly, we are in stage 4 now, so unless this
is a regression it will have to wait for GCC 8.  I can carry the patch
or you can repost it when we are in stage 1.

Secondly, you forgot the changelog.

On Fri, Feb 03, 2017 at 03:03:06PM -0500, Walter Lee wrote:
> In looking at PR 79365 I found that the problem is actually in the
> combiner.  The combiner sometimes applies scalar optimizations to
> vector context where they are invalid.  i.e. (a > b) >> 1 can optimize
> to 0 if ">" is a scalar op but not if it is a vector op.  The reason
> this shows up on tile* and not other architectures is because tile*
> uses the same register file for both scalars and vectors, so the
> combiner sees these mixed mode expressions on tile* that would go
> through memory on other architectures.

There are other architectures that are similar, but perhaps none that
have their vector size equal to their most common integer size.

> I have found two places where this improper optimization occurs.

> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -7267,6 +7267,7 @@ expand_field_assignment (const_rtx x)
>else if (GET_CODE (SET_DEST (x)) == SUBREG
>  /* We need SUBREGs to compute nonzero_bits properly.  */
>  && nonzero_sign_valid
> +&& !VECTOR_MODE_P (GET_MODE (SET_DEST (x)))
>  && (((GET_MODE_SIZE (GET_MODE (SET_DEST (x)))
>+ (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
>  == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (SET_DEST (x

Please use SCALAR_INT_MODE_P instead.

> @@ -13030,6 +13031,7 @@ record_dead_and_set_regs_1 (rtx dest, const_rtx 
> setter, void *data)
>   record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
>else if (GET_CODE (setter) == SET
>  && GET_CODE (SET_DEST (setter)) == SUBREG
> +&& !VECTOR_MODE_P (GET_MODE (SET_DEST (setter)))
>  && SUBREG_REG (SET_DEST (setter)) == dest
>  && GET_MODE_PRECISION (GET_MODE (dest)) <= BITS_PER_WORD
>  && subreg_lowpart_p (SET_DEST (setter)))

What is this for?  It isn't triggered by the testcase in the PR.

You should add testcases to the testsuite, too.  In gcc.target is fine
if they are tile*-specific.

Thanks,


Segher


Re: [PATCH] Fix memory leak in parloops (PR tree-optimization/79338)

2017-02-03 Thread Jeff Law

On 02/03/2017 03:27 PM, Jakub Jelinek wrote:

Hi!

On the gcc.dg/autopar/outer-4.c testcase we leak memory, because
vect_analyze_loop_form allocates vinfos for all the stmts in the loop
and when it calls vect_analyze_loop_form on the loop->inner before
the former is destroyed, we overwrite all those vinfo pointers for
stmts inside of the inner loop with newly allocated memory.

The following patch fixes that by making sure vect_analyze_loop_form
for the inner loop is only called after destroy_loop_vec_info
of the outer loop.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-03  Jakub Jelinek  

PR tree-optimization/79338
* tree-parloops.c (gather_scalar_reductions): Don't call
vect_analyze_loop_form for loop->inner before destroying loop's
loop_vinfo.

OK.
jeff



C++ PATCH for c++/78689 (ICE on constructor with label)

2017-02-03 Thread Jason Merrill
My patch for the new inheriting constructor semantics changed
tree-inline.c to avoid copying non-taken branches, in order to avoid
trying to refer to parameters that are omitted from base constructor
clones; this overlooked that we might enter such branches via goto.
This patch reverts that change, and addresses the omitted parameter
issue in the front end by providing a suitable replacement.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit e4a9364d66494c3cfd6cf528d8176a02b24b9d00
Author: Jason Merrill 
Date:   Fri Feb 3 15:30:24 2017 -0500

PR c++/78689 - ICE on constructor with label

gcc/
* tree-inline.c (copy_tree_body_r) [COND_EXPR]: Revert change to
avoid copying non-taken branch.
gcc/cp/
* optimize.c (maybe_clone_body): Replace omitted parameters with
null lvalues.
* class.c (build_clone): Fix logic for omitting inherited parms.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index d99ebcd..7ec07c9 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -4818,7 +4818,7 @@ build_clone (tree fn, tree name)
 
   /* A base constructor inheriting from a virtual base doesn't get the
  arguments.  */
-  if (ctor_omit_inherited_parms (fn))
+  if (ctor_omit_inherited_parms (clone))
 DECL_CHAIN (DECL_CHAIN (DECL_ARGUMENTS (clone))) = NULL_TREE;
 
   for (parms = DECL_ARGUMENTS (clone); parms; parms = DECL_CHAIN (parms))
diff --git a/gcc/cp/optimize.c b/gcc/cp/optimize.c
index f61d035..933612c 100644
--- a/gcc/cp/optimize.c
+++ b/gcc/cp/optimize.c
@@ -621,9 +621,21 @@ maybe_clone_body (tree fn)
  function.  */
   else
 {
-  decl_map->put (parm, clone_parm);
+ tree replacement;
  if (clone_parm)
-   clone_parm = DECL_CHAIN (clone_parm);
+   {
+ replacement = clone_parm;
+ clone_parm = DECL_CHAIN (clone_parm);
+   }
+ else
+   {
+ /* Inheriting ctors can omit parameters from the base
+clone.  Replace them with null lvalues.  */
+ tree reftype = build_reference_type (TREE_TYPE (parm));
+ replacement = fold_convert (reftype, null_pointer_node);
+ replacement = convert_from_reference (replacement);
+   }
+  decl_map->put (parm, replacement);
 }
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp1z/inh-ctor23.C 
b/gcc/testsuite/g++.dg/cpp1z/inh-ctor23.C
index 0c862f7..c0cf040 100644
--- a/gcc/testsuite/g++.dg/cpp1z/inh-ctor23.C
+++ b/gcc/testsuite/g++.dg/cpp1z/inh-ctor23.C
@@ -14,6 +14,9 @@ Z z(0); // OK: initialization of Y does not invoke default 
constructor of X
 // { dg-final { scan-assembler "_ZN1YCI21WEi" } }
 // { dg-final { scan-tree-dump "Y::Y ._2, _3.;" "gimple" } }
 
+// And that we aren't expecting the int, either.
+// { dg-final { scan-tree-dump-not "Y::Y.int\[^\n\]*int" "gimple" } }
+
 // And that we *are* passing the int along to V::V.
 // { dg-final { scan-assembler "_ZN1VCI21WEi" } }
 // { dg-final { scan-tree-dump "V::V .this, _1.;" "gimple" } }
diff --git a/gcc/testsuite/g++.dg/init/ctor12.C 
b/gcc/testsuite/g++.dg/init/ctor12.C
new file mode 100644
index 000..7c1aab72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/init/ctor12.C
@@ -0,0 +1,14 @@
+// PR c++/78689 - ICE on constructor with label
+
+struct e {
+  e() {
+goto aj;
+if (0)
+aj:;
+  }
+};
+
+void f()
+{
+  struct e x;
+}
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index d63c70f..138b992 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -1045,7 +1045,6 @@ copy_tree_body_r (tree *tp, int *walk_subtrees, void 
*data)
   copy_body_data *id = (copy_body_data *) data;
   tree fn = id->src_fn;
   tree new_block;
-  bool copied = false;
 
   /* Begin by recognizing trees that we'll completely rewrite for the
  inlining context.  Our output for these trees is completely
@@ -1242,40 +1241,10 @@ copy_tree_body_r (tree *tp, int *walk_subtrees, void 
*data)
  *walk_subtrees = 0;
  return NULL;
}
-  else if (TREE_CODE (*tp) == COND_EXPR)
-   {
- tree cond = TREE_OPERAND (*tp, 0);
- walk_tree (, copy_tree_body_r, data, NULL);
- tree folded = fold (cond);
- if (TREE_CODE (folded) == INTEGER_CST)
-   {
- /* Only copy the taken branch; for a C++ base constructor clone
-inherited from a virtual base, copying the other branch leads
-to references to parameters that were optimized away.  */
- tree branch = (integer_nonzerop (folded)
-? TREE_OPERAND (*tp, 1)
-: TREE_OPERAND (*tp, 2));
- tree type = TREE_TYPE (*tp);
- if (VOID_TYPE_P (type)
- 

Re: [PATCH] Fix memory leak in the vectorizer (PR tree-optimization/79340)

2017-02-03 Thread Jeff Law

On 02/03/2017 03:24 PM, Jakub Jelinek wrote:

Hi!

As mentioned in the PR, on the gfortran.dg/forall_7.f90 testcase
we leak memory starting with r239542.  vect_get_slp_defs allocates
a vector of vectors, elements corresponding to NULL in slp_ops
are .create (0) vectors that don't need freeing, but the rest
is safe_spliced into other vectors.  We need to release those
vectors afterwards, before the vector containing them is released.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-03  Jakub Jelinek  

PR tree-optimization/79340
* tree-vect-loop.c (vectorizable_reduction): Release
vec_defs elements after safe_splicing them into other vectors.
Formatting fixes.

OK.
jeff



[PATCH] Fix memory leak in parloops (PR tree-optimization/79338)

2017-02-03 Thread Jakub Jelinek
Hi!

On the gcc.dg/autopar/outer-4.c testcase we leak memory, because
vect_analyze_loop_form allocates vinfos for all the stmts in the loop
and when it calls vect_analyze_loop_form on the loop->inner before
the former is destroyed, we overwrite all those vinfo pointers for
stmts inside of the inner loop with newly allocated memory.

The following patch fixes that by making sure vect_analyze_loop_form
for the inner loop is only called after destroy_loop_vec_info
of the outer loop.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-03  Jakub Jelinek  

PR tree-optimization/79338
* tree-parloops.c (gather_scalar_reductions): Don't call
vect_analyze_loop_form for loop->inner before destroying loop's
loop_vinfo.

--- gcc/tree-parloops.c.jj  2017-02-01 16:25:21.0 +0100
+++ gcc/tree-parloops.c 2017-02-03 15:37:13.940564625 +0100
@@ -2513,8 +2513,8 @@ gather_scalar_reductions (loop_p loop, r
 {
   gphi_iterator gsi;
   loop_vec_info simple_loop_info;
-  loop_vec_info simple_inner_loop_info = NULL;
-  bool allow_double_reduc = true;
+  auto_vec double_reduc_phis;
+  auto_vec double_reduc_stmts;
 
   if (!stmt_vec_info_vec.exists ())
 init_stmt_vec_info_vec ();
@@ -2544,43 +2544,55 @@ gather_scalar_reductions (loop_p loop, r
 
   if (double_reduc)
{
- if (!allow_double_reduc
- || loop->inner->inner != NULL)
+ if (loop->inner->inner != NULL)
continue;
 
- if (!simple_inner_loop_info)
-   {
- simple_inner_loop_info = vect_analyze_loop_form (loop->inner);
- if (!simple_inner_loop_info)
-   {
- allow_double_reduc = false;
- continue;
-   }
-   }
-
- use_operand_p use_p;
- gimple *inner_stmt;
- bool single_use_p = single_imm_use (res, _p, _stmt);
- gcc_assert (single_use_p);
- if (gimple_code (inner_stmt) != GIMPLE_PHI)
-   continue;
- gphi *inner_phi = as_a  (inner_stmt);
- if (simple_iv (loop->inner, loop->inner, PHI_RESULT (inner_phi),
-, true))
-   continue;
-
- gimple *inner_reduc_stmt
-   = vect_force_simple_reduction (simple_inner_loop_info, inner_phi,
-  true, _reduc, true);
- gcc_assert (!double_reduc);
- if (inner_reduc_stmt == NULL)
-   continue;
+ double_reduc_phis.safe_push (phi);
+ double_reduc_stmts.safe_push (reduc_stmt);
+ continue;
}
 
   build_new_reduction (reduction_list, reduc_stmt, phi);
 }
   destroy_loop_vec_info (simple_loop_info, true);
-  destroy_loop_vec_info (simple_inner_loop_info, true);
+
+  if (!double_reduc_phis.is_empty ())
+{
+  simple_loop_info = vect_analyze_loop_form (loop->inner);
+  if (simple_loop_info)
+   {
+ gphi *phi;
+ unsigned int i;
+
+ FOR_EACH_VEC_ELT (double_reduc_phis, i, phi)
+   {
+ affine_iv iv;
+ tree res = PHI_RESULT (phi);
+ bool double_reduc;
+
+ use_operand_p use_p;
+ gimple *inner_stmt;
+ bool single_use_p = single_imm_use (res, _p, _stmt);
+ gcc_assert (single_use_p);
+ if (gimple_code (inner_stmt) != GIMPLE_PHI)
+   continue;
+ gphi *inner_phi = as_a  (inner_stmt);
+ if (simple_iv (loop->inner, loop->inner, PHI_RESULT (inner_phi),
+, true))
+   continue;
+
+ gimple *inner_reduc_stmt
+   = vect_force_simple_reduction (simple_loop_info, inner_phi,
+  true, _reduc, true);
+ gcc_assert (!double_reduc);
+ if (inner_reduc_stmt == NULL)
+   continue;
+
+ build_new_reduction (reduction_list, double_reduc_stmts[i], phi);
+   }
+ destroy_loop_vec_info (simple_loop_info, true);
+   }
+}
 
  gather_done:
   /* Release the claim on gimple_uid.  */

Jakub


[PATCH] Fix memory leak in the vectorizer (PR tree-optimization/79340)

2017-02-03 Thread Jakub Jelinek
Hi!

As mentioned in the PR, on the gfortran.dg/forall_7.f90 testcase
we leak memory starting with r239542.  vect_get_slp_defs allocates
a vector of vectors, elements corresponding to NULL in slp_ops
are .create (0) vectors that don't need freeing, but the rest
is safe_spliced into other vectors.  We need to release those
vectors afterwards, before the vector containing them is released.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-03  Jakub Jelinek  

PR tree-optimization/79340
* tree-vect-loop.c (vectorizable_reduction): Release
vec_defs elements after safe_splicing them into other vectors.
Formatting fixes.

--- gcc/tree-vect-loop.c.jj 2017-02-02 08:44:13.0 +0100
+++ gcc/tree-vect-loop.c2017-02-03 13:19:00.087499820 +0100
@@ -6172,20 +6172,24 @@ vectorizable_reduction (gimple *stmt, gi
  if (slp_node)
{
  /* Get vec defs for all the operands except the reduction index,
-   ensuring the ordering of the ops in the vector is kept.  */
+ensuring the ordering of the ops in the vector is kept.  */
  auto_vec slp_ops;
  auto_vec vec_defs;
 
- slp_ops.quick_push ((reduc_index == 0) ? NULL : ops[0]);
- slp_ops.quick_push ((reduc_index == 1) ? NULL : ops[1]);
+ slp_ops.quick_push (reduc_index == 0 ? NULL : ops[0]);
+ slp_ops.quick_push (reduc_index == 1 ? NULL : ops[1]);
  if (op_type == ternary_op)
-   slp_ops.quick_push ((reduc_index == 2) ? NULL : ops[2]);
+   slp_ops.quick_push (reduc_index == 2 ? NULL : ops[2]);
 
  vect_get_slp_defs (slp_ops, slp_node, _defs, -1);
 
- vec_oprnds0.safe_splice (vec_defs[(reduc_index == 0) ? 1 : 0]);
+ vec_oprnds0.safe_splice (vec_defs[reduc_index == 0 ? 1 : 0]);
+ vec_defs[reduc_index == 0 ? 1 : 0].release ();
  if (op_type == ternary_op)
-   vec_oprnds1.safe_splice (vec_defs[(reduc_index == 2) ? 1 : 2]);
+   {
+ vec_oprnds1.safe_splice (vec_defs[reduc_index == 2 ? 1 : 2]);
+ vec_defs[reduc_index == 2 ? 1 : 2].release ();
+   }
}
   else
{
@@ -6194,7 +6198,7 @@ vectorizable_reduction (gimple *stmt, gi
   vec_oprnds0.quick_push (loop_vec_def0);
   if (op_type == ternary_op)
{
-op1 = (reduc_index == 0) ? ops[2] : ops[1];
+op1 = reduc_index == 0 ? ops[2] : ops[1];
  loop_vec_def1 = vect_get_vec_def_for_operand (op1, stmt);
  vec_oprnds1.quick_push (loop_vec_def1);
}

Jakub


[PATCH, rs6000] gcc 5 back port of vec_packs and vec_vgbbd builtin table entry fixes

2017-02-03 Thread Carl E. Love
GCC Maintainers:

The GCC 6 branch for the back port of the following two mainline commits
was approved and committed to GCC 6.  The following patch is the back
port for the two commits for the GCC 5 branch.  The only difference
relative to the GCC 6 patch is the line numbers.

mainline commits.
commit r244943  Remove bogus entries for the P8V_BUILTIN_VEC_VGBBD   
built-ins
commit r244904  Fix order of entries for ALTIVEC_BUILTIN_VEC_PACKS
and P8V_BUILTIN_VEC_VGBBD.

The patch fixes the issue of the vec_packs built-in entries not being
contiguous and removes the bogus entries for the vec_vgbbd built-in.

The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE)
with no regressions.

Is the patch OK for gcc 5 branch?  

   Carl Love
\

gcc/ChangeLog:

2017-02-03  Carl Love  

Backport of two commits from mainline, r244943 and r244904,
dated 017-01-26 and 2017-01-25 respectively

* config/rs6000/rs6000-c (altivec_overloaded_builtins): Fix order
of entries for ALTIVEC_BUILTIN_VEC_PACKS.  Remove bogus entries
for P8V_BUILTIN_VEC_VGBBD.

gcc/testsuite/ChangeLog:

2017-02-03  Carl Love  
* gcc.target/powerpc/builtins-3-p8.c:  Add new testfile for missing
vec_packs built-in tests.
---
 gcc/config/rs6000/rs6000-c.c | 13 
 gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c | 26 
 2 files changed, 30 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index fbb8e2d..33cb27f 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -2065,14 +2065,14 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V4SI, 
RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACKS, ALTIVEC_BUILTIN_VPKSWSS,
 RS6000_BTI_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
-  { ALTIVEC_BUILTIN_VEC_VPKSWSS, ALTIVEC_BUILTIN_VPKSWSS,
-RS6000_BTI_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
-  { ALTIVEC_BUILTIN_VEC_VPKUWUS, ALTIVEC_BUILTIN_VPKUWUS,
-RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V4SI, 
RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACKS, P8V_BUILTIN_VPKUDUS,
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, 
RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACKS, P8V_BUILTIN_VPKSDSS,
 RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_VPKSWSS, ALTIVEC_BUILTIN_VPKSWSS,
+RS6000_BTI_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_VPKUWUS, ALTIVEC_BUILTIN_VPKUWUS,
+RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V4SI, 
RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKSHSS, ALTIVEC_BUILTIN_VPKSHSS,
 RS6000_BTI_V16QI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKUHUS, ALTIVEC_BUILTIN_VPKUHUS,
@@ -4196,11 +4196,6 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
   { P8V_BUILTIN_VEC_VUPKLSW, P8V_BUILTIN_VUPKLSW,
 RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
 
-  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
-RS6000_BTI_V16QI, 0, 0, 0 },
-  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
-RS6000_BTI_unsigned_V16QI, 0, 0, 0 },
-
   /* Crypto builtins.  */
   { CRYPTO_BUILTIN_VPERMXOR, CRYPTO_BUILTIN_VPERMXOR_V16QI,
 RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c
new file mode 100644
index 000..2c06ea7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8" } */
+
+#include 
+
+vector signed int
+test_vsi_packs_vsll_vsll (vector signed long long x,
+  vector signed long long y)
+{
+  return vec_packs (x, y);
+}
+
+vector unsigned int
+test_vui_packs_vull_vull (vector unsigned long long x,
+  vector unsigned long long y)
+{
+  return vec_packs (x, y);
+}
+
+/* Expected test results:
+ test_vsi_packs_vsll_vsll  1 vpksdss
+ test_vui_packs_vull_vull  1 vpkudus */
+
+/* { dg-final { scan-assembler-times "vpksdss"  1 } } */
+/* { dg-final { scan-assembler-times "vpkudus"  1 } } */
-- 
1.9.1





Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-02-03 Thread Andrew Pinski
On Fri, Feb 3, 2017 at 4:00 AM, Maxim Kuvyrkov
 wrote:
> Hi Andrew,
>
> I took the liberty of rebasing your patch on top of my patchset.  Does it 
> look correct?

Yes this looks correct.

Thanks,
Andrew

>
> I think I addressed all the comments you had about my review and posted 
> updated patches.
>
> --
> Maxim Kuvyrkov
> www.linaro.org
>
>
>
>> On Jan 30, 2017, at 7:25 PM, Andrew Pinski  wrote:
>>
>> On Mon, Jan 30, 2017 at 6:49 AM, Maxim Kuvyrkov
>>  wrote:
 On Jan 27, 2017, at 6:59 PM, Andrew Pinski  wrote:

 On Fri, Jan 27, 2017 at 4:11 AM, Richard Biener
  wrote:
> On Fri, Jan 27, 2017 at 1:10 PM, Richard Biener
>  wrote:
>> On Thu, Jan 26, 2017 at 9:56 PM, Andrew Pinski  
>> wrote:
>>> Hi,
>>> This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>>> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
>>> thunderx and thunderx2t99.  No other core current enables software
>>> prefetching so I set them to 0 which does not change the default
>>> parameters.
>>>
>>> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>>> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
>>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>>> CN99xx's SPEC did not change.
>>
>> Heh, quite impressive for this kind of bit-rotten (and broken?) pass ;)
>
> And I wonder if most benefit comes from the unrolling the pass might do
> rather than from the prefetches...

 Not in this case.  The main reason why I know is because the number of
 L1 and L2 misses drops a lot.
>>>
>>> I can confirm this.  In my experiments loop unrolling hurts several tests.
>>
>> Not on the cores I tried it.  I tried it on both ThunderX CN88xx and
>> ThunderX CN99xx, I did not get any regressions due to unrolling.
>>
>> Thanks,
>> Andrew
>>
>>>
>>> The prefetching approach I'm testing for -O2 includes disabling of loop 
>>> unrolling to prevent code bloat.
>>>
>>> --
>>> Maxim Kuvyrkov
>>> www.linaro.org
>


Re: New Port for RISC-V v2

2017-02-03 Thread Gerald Pfeifer
On Fri, 3 Feb 2017, Palmer Dabbelt wrote:
> How do these look?

Fine for readings.html.  backends.html, not my cup of tea. ;-)

Gerald


Re: [PATCH] diagnostics: fix end-points of ranges within macros (PR c++/79300)

2017-02-03 Thread Jeff Law

On 02/03/2017 01:52 PM, David Malcolm wrote:


IIRC you had another issue that was borderline,
but which we did want to move forward on gcc-7.  The reported
testcase
was a problem with a new warning, but the underlying issue could (in
theory) be tripped for existing warnings.  RIght?


I think you're talking about PR preprocessor/79210.  I found a trivial
way to turn the reproducer for that one into one that ICE-d gcc 7 with
input that gcc 6 could handle, so I marked it as "[7 Regression]",
committed a fix to trunk, and marked it as RESOLVED FIXED.

That's the one :-)  Perfect.  Thanks.


jeff



[PATCH] PR target/66144, PowerPC improve vector compare

2017-02-03 Thread Michael Meissner
This patch improves the code generation for:

vector char
compare_vc (vector char a, vector char b)
{
  return a == b;
}

Previously it generated:

vcmpequb 2,2,3
vspltisw 1,-1
vspltisw 0,0
xxsel 34,32,33,34

and now it generates:

vcmpequb 2,2,3

I changed the vector conditional support for integer vectors to allow constant
all 0's or all 1's in addition to registers, and it knows that after a vector
comparison, the vector element will be all 1's if the comparison was true, and
all 0's if the comparison was false.

I did the normal bootstrap build and make check.  There were no regressions.

I built Spec 2006 with the compiler, and I noticed that 15 out of the 29
benchmarks had code changes with this fix (bzip2, gcc, gamess,milc, gromacs,
gobmk, dealII, calculix, hmmer, sjeng, libquantum, h264ref, tonto, wrf,
sphinx3).  I did a quick run (one pass instead of the normal 3) of the
benchmarks that had code changes.  There were no regressions in performance,
and 2 runs that were faster (gcc and tonto) by 3-4%.

Can I check this fix into the GCC trunk?

Note, I will be on starting on vacation on Wednesday (February 8th), and I will
return to work on Friday February 24th.  If I don't get the approval by Monday,
I will likely hold the patches until I return to work.

[gcc]
2017-02-03  Michael Meissner  

PR target/66144
* config/rs6000/vector.md (vcond): Allow the true and
false values to be constant vectors with all 0 or all 1 bits set.
(vcondu): Likewise.
* config/rs6000/predicates.md (vector_int_same_bit): New
predicate.
(vector_int_reg_or_same_bit): Likewise.
(fpmask_comparison_operator): Update comment.
(vecint_comparison_operator): New predicate.
* config/rs6000/rs6000.c (rs6000_emit_vector_cond_expr): Optimize
vector conditionals when the true and false values are constant
vectors with all 0 bits or all 1 bits set.

[gcc/testsuite]
2017-02-03  Michael Meissner  

PR target/66144
* gcc.target/powerpc/pr66144-1.c: New test.
* gcc.target/powerpc/pr66144-2.c: Likewise.
* gcc.target/powerpc/pr66144-3.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/vector.md
===
--- gcc/config/rs6000/vector.md (revision 245101)
+++ gcc/config/rs6000/vector.md (working copy)
@@ -390,13 +390,13 @@ (define_expand "vcond"
 }")
 
 (define_expand "vcond"
-  [(set (match_operand:VEC_I 0 "vint_operand" "")
+  [(set (match_operand:VEC_I 0 "vint_operand")
(if_then_else:VEC_I
 (match_operator 3 "comparison_operator"
-[(match_operand:VEC_I 4 "vint_operand" "")
- (match_operand:VEC_I 5 "vint_operand" "")])
-(match_operand:VEC_I 1 "vint_operand" "")
-(match_operand:VEC_I 2 "vint_operand" "")))]
+[(match_operand:VEC_I 4 "vint_operand")
+ (match_operand:VEC_I 5 "vint_operand")])
+(match_operand:VEC_I 1 "vector_int_reg_or_same_bit")
+(match_operand:VEC_I 2 "vector_int_reg_or_same_bit")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
   "
 {
@@ -446,13 +446,13 @@ (define_expand "vcondv4siv4sf"
 }")
 
 (define_expand "vcondu"
-  [(set (match_operand:VEC_I 0 "vint_operand" "")
+  [(set (match_operand:VEC_I 0 "vint_operand")
(if_then_else:VEC_I
 (match_operator 3 "comparison_operator"
-[(match_operand:VEC_I 4 "vint_operand" "")
- (match_operand:VEC_I 5 "vint_operand" "")])
-(match_operand:VEC_I 1 "vint_operand" "")
-(match_operand:VEC_I 2 "vint_operand" "")))]
+[(match_operand:VEC_I 4 "vint_operand")
+ (match_operand:VEC_I 5 "vint_operand")])
+(match_operand:VEC_I 1 "vector_int_reg_or_same_bit")
+(match_operand:VEC_I 2 "vector_int_reg_or_same_bit")))]
   "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
   "
 {
Index: gcc/config/rs6000/predicates.md
===
--- gcc/config/rs6000/predicates.md (revision 245101)
+++ gcc/config/rs6000/predicates.md (working copy)
@@ -808,6 +808,33 @@ (define_predicate "all_ones_constant"
   (and (match_code "const_int,const_double,const_wide_int,const_vector")
(match_test "op == CONSTM1_RTX (mode) && !FLOAT_MODE_P (mode)")))
 
+;; Return 1 if operand is either a vector constant of all 0 bits of a vector
+;; constant of all 1 bits.
+(define_predicate "vector_int_same_bit"
+  (match_code "const_vector")
+{
+  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
+return 0;
+
+  else
+return op == CONST0_RTX (mode) 

[PATCH/AARCH64] Fix ThunderX core definition

2017-02-03 Thread Andrew Pinski
Hi,
  It turns out due to a hardware issue, LSE support is turned off.
This patch updates the thunderx cores to disable LSE and 8.1 where as
needed.

OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

ChangeLog:
* config/aarch64/aarch64-cores.def (thunderx): Disable LSE.
(thunderxt88): Likewise.
(thunderxt81): Disable LSE and change v8.1 to v8.
(thunderxt83): Likewise.
Index: config/aarch64/aarch64-cores.def
===
--- config/aarch64/aarch64-cores.def(revision 244077)
+++ config/aarch64/aarch64-cores.def(working copy)
@@ -60,13 +60,13 @@ AARCH64_CORE("falkor",  falkor,c
 AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | 
AARCH64_FL_CRC | AARCH64_FL_CRYPTO, qdf24xx,   0x51, 0xC00, -1)
 
 /* Cavium ('C') cores. */
-AARCH64_CORE("thunderx",  thunderx,  thunderx,  8A,
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
thunderx,  0x43, 0x0a0, -1)
+AARCH64_CORE("thunderx",  thunderx,  thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a0, -1)
 /* Do not swap around "thunderxt88p1" and "thunderxt88",
this order is required to handle variant correctly. */
-AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx,  8A,
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, 
thunderx,  0x43, 0x0a1, 0)
-AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
thunderx,  0x43, 0x0a1, -1)
-AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8_1A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
thunderx,  0x43, 0x0a2, -1)
-AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8_1A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO | AARCH64_FL_LSE, 
thunderx,  0x43, 0x0a3, -1)
+AARCH64_CORE("thunderxt88p1", thunderxt88p1, thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO,thunderx,  
0x43, 0x0a1, 0)
+AARCH64_CORE("thunderxt88",   thunderxt88,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a1, -1)
+AARCH64_CORE("thunderxt81",   thunderxt81,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a2, -1)
+AARCH64_CORE("thunderxt83",   thunderxt83,   thunderx,  8A,  
AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  0x43, 
0x0a3, -1)
 
 /* APM ('P') cores. */
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, 
xgene1, 0x50, 0x000, -1)


Re: [PATCH] Fix powerpc movsi_from_sf define_insn_and_split constraints (PR target/79354)

2017-02-03 Thread Michael Meissner
On Fri, Feb 03, 2017 at 09:11:38PM +0100, Jakub Jelinek wrote:
> On Fri, Feb 03, 2017 at 03:05:58PM -0500, Michael Meissner wrote:
> > > > --- gcc/testsuite/gcc.target/powerpc/pr79354.c.jj   2017-02-03 
> > > > 02:37:44.147938375 +0100
> > > > +++ gcc/testsuite/gcc.target/powerpc/pr79354.c  2017-02-03 
> > > > 02:38:34.838303987 +0100
> > > > @@ -0,0 +1,23 @@
> > > > +/* PR target/79354 */
> > > > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> > 
> > The code that moves SFmode vector registers to GPRs needs -m64 (the convert
> > from scalar format to vector format puts the converted SFmode value in the
> > upper 32-bits of the vector register, and then after the direct move, it 
> > does a
> > 32-bit shift right in the GPR).  Similarly, going from a GPR to a vector
> > register, we do a shift left 32-bits, direct move, and then convert from 
> > vector
> > format to scalar format.
> 
> The testcase is testing that the particular power9 instruction doesn't
> appear in -mcpu=power8 output.  The test is valid source even for -m32 and
> should not contain the power9 instruction either.
> Without the rs6000.md change the test will only FAIL with -m64 indeed.

The function being patched (movsi_from_sf) will only be called when -m64 and
-mcpu=power9 or -mcpu=power8 are used.  So, it doesn't harm anything to omit
the && lp64, but the fix being tested would only occur in 64-bit mode.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] diagnostics: fix end-points of ranges within macros (PR c++/79300)

2017-02-03 Thread David Malcolm
On Fri, 2017-02-03 at 13:28 -0700, Jeff Law wrote:
> On 02/02/2017 01:53 PM, David Malcolm wrote:
> > PR c++/79300 identifies an issue in which diagnostics_show_locus
> > prints the wrong end-point for a range within a macro:
> > 
> >assert ((p + val_size) - buf == encoded_len);
> >~^~~~
> > 
> > as opposed to:
> > 
> >assert ((p + val_size) - buf == encoded_len);
> >~^~
> > 
> > The caret, start and finish locations of this compound location are
> > all virtual locations.
> > 
> > The root cause is that when diagnostic-show-locus.c's layout ctor
> > expands the caret and end-points, it calls
> >   linemap_client_expand_location_to_spelling_point
> > which (via expand_location_1) unwinds the macro expansions, and
> > then calls linemap_expand_location.  Doing so implicitly picks the
> > *caret* location for any virtual locations, and so in the above
> > case
> > it picks these spelling locations for the three parts of the
> > location:
> > 
> >assert ((p + val_size) - buf == encoded_len);
> >^^  ^
> >START|  FINISH
> >   CARET
> > 
> > and so erroneously strips the underlining from the final token,
> > apart
> > from its first character.
> > 
> > The fix is for layout's ctor to indicate that it wants the
> > start/finish
> > locations in such a situation, adding a new param to
> > linemap_client_expand_location_to_spelling_point, so that
> > expand_location_1 can handle this case by extracting the relevant
> > part
> > of the unwound compound location, and thus choose:
> > 
> >assert ((p + val_size) - buf == encoded_len);
> >^^^
> >START|FINISH
> >   CARET
> > 
> > Successfully bootstrapped on x86_64-pc-linux-gnu.
> > 
> > OK for stage 4, or should I wait until stage 1?
> 
> > 
> > gcc/ChangeLog:
> > PR c++/79300
> > * diagnostic-show-locus.c (layout::layout): Use start and
> > finish
> > spelling location for the start and finish of each range.
> > * genmatch.c
> > (linemap_client_expand_location_to_spelling_point):
> > Add unused aspect param.
> > * input.c (expand_location_1): Add "aspect" param, and use it
> > to access the correct part of the location.
> > (expand_location): Pass LOCATION_ASPECT_CARET to new param of
> > expand_location_1.
> > (expand_location_to_spelling_point): Likewise.
> > (linemap_client_expand_location_to_spelling_point): Add
> > "aspect"
> > param, and pass it to expand_location_1.
> > 
> > gcc/testsuite/ChangeLog:
> > PR c++/79300
> > * c-c++-common/Wmisleading-indentation-3.c (fn_14): Update
> > expected underlining within macro expansion.
> > * c-c++-common/pr70264.c: Likewise.
> > * g++.dg/plugin/diagnostic-test-expressions-1.C
> > (test_within_macro_1): New test.
> > (test_within_macro_2): Likewise.
> > (test_within_macro_3): Likewise.
> > (test_within_macro_4): Likewise.
> > * gcc.dg/format/diagnostic-ranges.c (test_macro_3): Update
> > expected underlining within macro expansion.
> > (test_macro_4): Likewise.
> > * gcc.dg/plugin/diagnostic-test-expressions-1.c
> > (test_within_macro_1): New test.
> > (test_within_macro_2): Likewise.
> > (test_within_macro_3): Likewise.
> > (test_within_macro_4): Likewise.
> > * gcc.dg/spellcheck-fields-2.c (test_macro): Update expected
> > underlining within macro expansion.
> > 
> > libcpp/ChangeLog:
> > PR c++/79300
> > * include/line-map.h (enum location_aspect): New enum.
> > (linemap_client_expand_location_to_spelling_point): Add
> > enum location_aspect param.
> > * line-map.c (source_range::intersects_line_p): Update for new
> > param of linemap_client_expand_location_to_spelling_point.
> > (rich_location::get_expanded_location): Likewise.
> > (fixit_insert::affects_line_p): Likewise.
> I'd suggest waiting. 

OK; I plan to self-approve this one in stage 1.

> IIRC you had another issue that was borderline, 
> but which we did want to move forward on gcc-7.  The reported
> testcase 
> was a problem with a new warning, but the underlying issue could (in 
> theory) be tripped for existing warnings.  RIght?

I think you're talking about PR preprocessor/79210.  I found a trivial
way to turn the reproducer for that one into one that ICE-d gcc 7 with
input that gcc 6 could handle, so I marked it as "[7 Regression]",
committed a fix to trunk, and marked it as RESOLVED FIXED.

Dave


Re: New Port for RISC-V v2

2017-02-03 Thread Palmer Dabbelt
On Thu, 02 Feb 2017 17:08:06 PST (-0800), Palmer Dabbelt wrote:
> On Thu, 02 Feb 2017 09:58:32 PST (-0800), jos...@codesourcery.com wrote:
>> On Thu, 2 Feb 2017, Palmer Dabbelt wrote:
>>
>>> Additionally, here's a diff against wwwdocs.  This is really just to check 
>>> this
>>> is all I'm supposed to do, I can submit a proper patch via the mailing list 
>>> (I
>>> just don't know how to use CVS, sorry).
>>
>> Other parts of the wwwdocs changes (backends.html, readings.html) are
>> probably of more interest.
>
> Thanks, that didn't feel like enough stuff.  I'll submit some wwwdocs changes
> that address all the feedback from this code review.

How do these look?

Index: htdocs/backends.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/backends.html,v
retrieving revision 1.73
diff -u -r1.73 backends.html
--- htdocs/backends.html3 Jan 2017 22:03:25 -   1.73
+++ htdocs/backends.html3 Feb 2017 20:44:56 -
@@ -99,6 +99,7 @@
 nvptx  |   S Q   Cq mg   e
 pa |   ? Q   CBD  qr  b   i  e
 pdp11  |L   ICqrc b  e
+riscv  | Q   Cqr gia
 rl78   |L  F l   gs
 rs6000 | Q   Cqr pb   ia
 rx |  s
Index: htdocs/readings.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
retrieving revision 1.256
diff -u -r1.256 readings.html
--- htdocs/readings.html3 Feb 2017 07:43:57 -   1.256
+++ htdocs/readings.html3 Feb 2017 20:44:56 -
@@ -250,6 +250,12 @@
   Manufacturer: DEC
   http://simh.trailing-edge.com/;>Simulators
  
+
+ riscv
+  Manufacturer: Many (open ISA standard)
+  http://riscv.org;>RISC-V Foundation
+  http://riscv.org/specifications/;>ISA Specifications
+ 

  rs6000 (powerpc, powerpcle)
   Manufacturer: IBM, Motorola


Re: [PATCH] diagnostics: fix end-points of ranges within macros (PR c++/79300)

2017-02-03 Thread Jeff Law

On 02/02/2017 01:53 PM, David Malcolm wrote:

PR c++/79300 identifies an issue in which diagnostics_show_locus
prints the wrong end-point for a range within a macro:

   assert ((p + val_size) - buf == encoded_len);
   ~^~~~

as opposed to:

   assert ((p + val_size) - buf == encoded_len);
   ~^~

The caret, start and finish locations of this compound location are
all virtual locations.

The root cause is that when diagnostic-show-locus.c's layout ctor
expands the caret and end-points, it calls
  linemap_client_expand_location_to_spelling_point
which (via expand_location_1) unwinds the macro expansions, and
then calls linemap_expand_location.  Doing so implicitly picks the
*caret* location for any virtual locations, and so in the above case
it picks these spelling locations for the three parts of the location:

   assert ((p + val_size) - buf == encoded_len);
   ^^  ^
   START|  FINISH
  CARET

and so erroneously strips the underlining from the final token, apart
from its first character.

The fix is for layout's ctor to indicate that it wants the start/finish
locations in such a situation, adding a new param to
linemap_client_expand_location_to_spelling_point, so that
expand_location_1 can handle this case by extracting the relevant part
of the unwound compound location, and thus choose:

   assert ((p + val_size) - buf == encoded_len);
   ^^^
   START|FINISH
  CARET

Successfully bootstrapped on x86_64-pc-linux-gnu.

OK for stage 4, or should I wait until stage 1?




gcc/ChangeLog:
PR c++/79300
* diagnostic-show-locus.c (layout::layout): Use start and finish
spelling location for the start and finish of each range.
* genmatch.c (linemap_client_expand_location_to_spelling_point):
Add unused aspect param.
* input.c (expand_location_1): Add "aspect" param, and use it
to access the correct part of the location.
(expand_location): Pass LOCATION_ASPECT_CARET to new param of
expand_location_1.
(expand_location_to_spelling_point): Likewise.
(linemap_client_expand_location_to_spelling_point): Add "aspect"
param, and pass it to expand_location_1.

gcc/testsuite/ChangeLog:
PR c++/79300
* c-c++-common/Wmisleading-indentation-3.c (fn_14): Update
expected underlining within macro expansion.
* c-c++-common/pr70264.c: Likewise.
* g++.dg/plugin/diagnostic-test-expressions-1.C
(test_within_macro_1): New test.
(test_within_macro_2): Likewise.
(test_within_macro_3): Likewise.
(test_within_macro_4): Likewise.
* gcc.dg/format/diagnostic-ranges.c (test_macro_3): Update
expected underlining within macro expansion.
(test_macro_4): Likewise.
* gcc.dg/plugin/diagnostic-test-expressions-1.c
(test_within_macro_1): New test.
(test_within_macro_2): Likewise.
(test_within_macro_3): Likewise.
(test_within_macro_4): Likewise.
* gcc.dg/spellcheck-fields-2.c (test_macro): Update expected
underlining within macro expansion.

libcpp/ChangeLog:
PR c++/79300
* include/line-map.h (enum location_aspect): New enum.
(linemap_client_expand_location_to_spelling_point): Add
enum location_aspect param.
* line-map.c (source_range::intersects_line_p): Update for new
param of linemap_client_expand_location_to_spelling_point.
(rich_location::get_expanded_location): Likewise.
(fixit_insert::affects_line_p): Likewise.
I'd suggest waiting.  IIRC you had another issue that was borderline, 
but which we did want to move forward on gcc-7.  The reported testcase 
was a problem with a new warning, but the underlying issue could (in 
theory) be tripped for existing warnings.  RIght?


Jeff



Re: [PATCH] Fix improper combiner optimization of mixed mode vector/scalar expressions (PR 79365)

2017-02-03 Thread Walter Lee

On 2/3/2017 3:03 PM, Walter Lee wrote:

Hi,

In looking at PR 79365 I found that the problem is actually in the
combiner.  The combiner sometimes applies scalar optimizations to
vector context where they are invalid.  i.e. (a > b) >> 1 can optimize
to 0 if ">" is a scalar op but not if it is a vector op.  The reason
this shows up on tile* and not other architectures is because tile*
uses the same register file for both scalars and vectors, so the
combiner sees these mixed mode expressions on tile* that would go
through memory on other architectures.

I have found two places where this improper optimization occurs.
Patch below.

Ok to commit?


I forgot to mention this fix has been validated on TILEPro/TILE-Gx.  I also 
sanity checked it against x86.

Thanks,

Walter



[PATCH] sched: Do not move expensive insns speculatively (PR68664)

2017-02-03 Thread Segher Boessenkool
Scheduling should never move very expensive instructions to places they
are executed more frequently.  This patch fixes that, reducing the
execution time of c-ray by over 40% (I tested on a BE Power7 system).

This introduces a new target hook sched.can_speculate_insn which returns
whether the scheduler is allowed to speculate a given instruction.  The
rs6000 implementation disallows all divide and square root instructions.

Bootstrapped and tested on powerpc64-linux {-m32,-m64}.  Is this okay
for trunk?


Segher


2017-02-03  Segher Boessenkool  

PR rtl-optimization/68664
* target.def (can_speculate_insn): New hook.
* doc/tm.texi.in (TARGET_SCHED_CAN_SPECULATE_INSN): New hook.
* doc/tm.texi: Regenerate.
* sched-rgn.c (can_schedule_ready_p): Use the new hook.
* config/rs6000/rs6000.c (TARGET_SCHED_CAN_SPECULATE_INSN): New macro.
(rs6000_sched_can_speculate_insn): New function.

---
 gcc/config/rs6000/rs6000.c | 20 
 gcc/doc/tm.texi|  6 ++
 gcc/doc/tm.texi.in |  2 ++
 gcc/sched-rgn.c| 19 +--
 gcc/target.def |  7 +++
 5 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 98e5f91..5e85858 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1608,6 +1608,9 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_SCHED_FREE_SCHED_CONTEXT
 #define TARGET_SCHED_FREE_SCHED_CONTEXT rs6000_free_sched_context
 
+#undef TARGET_SCHED_CAN_SPECULATE_INSN
+#define TARGET_SCHED_CAN_SPECULATE_INSN rs6000_sched_can_speculate_insn
+
 #undef TARGET_VECTORIZE_BUILTIN_MASK_FOR_LOAD
 #define TARGET_VECTORIZE_BUILTIN_MASK_FOR_LOAD rs6000_builtin_mask_for_load
 #undef TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
@@ -34986,6 +34989,23 @@ rs6000_free_sched_context (void *_sc)
   free (_sc);
 }
 
+static bool
+rs6000_sched_can_speculate_insn (rtx_insn *insn)
+{
+  switch (get_attr_type (insn))
+{
+case TYPE_DIV:
+case TYPE_SDIV:
+case TYPE_DDIV:
+case TYPE_VECDIV:
+case TYPE_SSQRT:
+case TYPE_DSQRT:
+  return false;
+
+default:
+  return true;
+  }
+}
 
 /* Length in units of the trampoline for entering a nested function.  */
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 909589c..1963402 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7000,6 +7000,12 @@ The structure *@var{spec_info} should be filled in by 
the target.
 The structure describes speculation types that can be used in the scheduler.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_SCHED_CAN_SPECULATE_INSN (rtx_insn 
*@var{insn})
+Some instructions should never be speculated by the schedulers, usually
+ because the instruction is too expensive to get this wrong.  This hook
+ should return @code{false} if @var{insn} should not be speculated.
+@end deftypefn
+
 @deftypefn {Target Hook} int TARGET_SCHED_SMS_RES_MII (struct ddg *@var{g})
 This hook is called by the swing modulo scheduler to calculate a
 resource-based lower bound which is based on the resources available in
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index ea74d37..756c118 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4882,6 +4882,8 @@ them: try the first ones in this list first.
 
 @hook TARGET_SCHED_SET_SCHED_FLAGS
 
+@hook TARGET_SCHED_CAN_SPECULATE_INSN
+
 @hook TARGET_SCHED_SMS_RES_MII
 
 @hook TARGET_SCHED_DISPATCH
diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c
index 2af3a03..a09fc5d 100644
--- a/gcc/sched-rgn.c
+++ b/gcc/sched-rgn.c
@@ -2147,12 +2147,19 @@ static int
 can_schedule_ready_p (rtx_insn *insn)
 {
   /* An interblock motion?  */
-  if (INSN_BB (insn) != target_bb
-  && IS_SPECULATIVE_INSN (insn)
-  && !check_live (insn, INSN_BB (insn)))
-return 0;
-  else
-return 1;
+  if (INSN_BB (insn) != target_bb && IS_SPECULATIVE_INSN (insn))
+{
+  /* Cannot schedule this insn unless all operands are live.  */
+  if (!check_live (insn, INSN_BB (insn)))
+   return 0;
+
+  /* Should not move expensive instructions speculatively.  */
+  if (GET_CODE (PATTERN (insn)) != CLOBBER
+ && !targetm.sched.can_speculate_insn (insn))
+   return 0;
+}
+
+  return 1;
 }
 
 /* Updates counter and other information.  Split from can_schedule_ready_p ()
diff --git a/gcc/target.def b/gcc/target.def
index 7308da1..dbb5d76 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1480,6 +1480,13 @@ DEFHOOK_UNDOC
  "Return speculation types that are checked for instruction @var{insn}",
  unsigned int, (rtx_insn *insn), NULL)
 
+DEFHOOK
+(can_speculate_insn,
+ "Some instructions should never be speculated by the schedulers, usually\n\
+ because the instruction is too expensive to get this wrong.  This hook\n\
+ should return @code{false} if @var{insn} should not be speculated.",
+ bool, (rtx_insn *insn), 

Re: [PATCH] Fix powerpc movsi_from_sf define_insn_and_split constraints (PR target/79354)

2017-02-03 Thread Jakub Jelinek
On Fri, Feb 03, 2017 at 03:05:58PM -0500, Michael Meissner wrote:
> > > --- gcc/testsuite/gcc.target/powerpc/pr79354.c.jj 2017-02-03 
> > > 02:37:44.147938375 +0100
> > > +++ gcc/testsuite/gcc.target/powerpc/pr79354.c2017-02-03 
> > > 02:38:34.838303987 +0100
> > > @@ -0,0 +1,23 @@
> > > +/* PR target/79354 */
> > > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> 
> The code that moves SFmode vector registers to GPRs needs -m64 (the convert
> from scalar format to vector format puts the converted SFmode value in the
> upper 32-bits of the vector register, and then after the direct move, it does 
> a
> 32-bit shift right in the GPR).  Similarly, going from a GPR to a vector
> register, we do a shift left 32-bits, direct move, and then convert from 
> vector
> format to scalar format.

The testcase is testing that the particular power9 instruction doesn't
appear in -mcpu=power8 output.  The test is valid source even for -m32 and
should not contain the power9 instruction either.
Without the rs6000.md change the test will only FAIL with -m64 indeed.

Jakub


Re: [PATCH] Fix powerpc movsi_from_sf define_insn_and_split constraints (PR target/79354)

2017-02-03 Thread Michael Meissner
On Fri, Feb 03, 2017 at 09:59:45AM -0600, Segher Boessenkool wrote:
> Hi Jakub,
> 
> On Fri, Feb 03, 2017 at 10:10:47AM +0100, Jakub Jelinek wrote:
> > As mentioned in the PR, for the following testcase we emit a power9
> > instruction even with -mcpu=power8.  Similar movsf_hardfloat instruction
> > uses wb constraint for the stxssp insn source rather than wu, which it
> > only uses for stxsspx (power7?).
> > 
> > Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?
> 
> Yes please.  Thanks!
> 
> Some testcase stuff below...
> 
> 
> > --- gcc/testsuite/gcc.target/powerpc/pr79354.c.jj   2017-02-03 
> > 02:37:44.147938375 +0100
> > +++ gcc/testsuite/gcc.target/powerpc/pr79354.c  2017-02-03 
> > 02:38:34.838303987 +0100
> > @@ -0,0 +1,23 @@
> > +/* PR target/79354 */
> > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */

The code that moves SFmode vector registers to GPRs needs -m64 (the convert
from scalar format to vector format puts the converted SFmode value in the
upper 32-bits of the vector register, and then after the direct move, it does a
32-bit shift right in the GPR).  Similarly, going from a GPR to a vector
register, we do a shift left 32-bits, direct move, and then convert from vector
format to scalar format.

So to enable the code that was causing the problem, you need 64-bit.  In
32-bit, it does a store, ori 2,2,0, and then a load to move SFmode values
between GPR and vector registers.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[PATCH] Fix improper combiner optimization of mixed mode vector/scalar expressions (PR 79365)

2017-02-03 Thread Walter Lee

Hi,

In looking at PR 79365 I found that the problem is actually in the
combiner.  The combiner sometimes applies scalar optimizations to
vector context where they are invalid.  i.e. (a > b) >> 1 can optimize
to 0 if ">" is a scalar op but not if it is a vector op.  The reason
this shows up on tile* and not other architectures is because tile*
uses the same register file for both scalars and vectors, so the
combiner sees these mixed mode expressions on tile* that would go
through memory on other architectures.

I have found two places where this improper optimization occurs.
Patch below.

Ok to commit?

Thanks,

Walter

diff --git a/gcc/combine.c b/gcc/combine.c
index 28133ff..8f2615e 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -7267,6 +7267,7 @@ expand_field_assignment (const_rtx x)
   else if (GET_CODE (SET_DEST (x)) == SUBREG
   /* We need SUBREGs to compute nonzero_bits properly.  */
   && nonzero_sign_valid
+  && !VECTOR_MODE_P (GET_MODE (SET_DEST (x)))
   && (((GET_MODE_SIZE (GET_MODE (SET_DEST (x)))
 + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
   == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (SET_DEST (x
@@ -13030,6 +13031,7 @@ record_dead_and_set_regs_1 (rtx dest, const_rtx setter, 
void *data)
record_value_for_reg (dest, record_dead_insn, SET_SRC (setter));
   else if (GET_CODE (setter) == SET
   && GET_CODE (SET_DEST (setter)) == SUBREG
+  && !VECTOR_MODE_P (GET_MODE (SET_DEST (setter)))
   && SUBREG_REG (SET_DEST (setter)) == dest
   && GET_MODE_PRECISION (GET_MODE (dest)) <= BITS_PER_WORD
   && subreg_lowpart_p (SET_DEST (setter)))
--
2.7.2



C++ PATCH for c++/79294 (ICE with invalid pointer-to-function template argument)

2017-02-03 Thread Jason Merrill
In this testcase, if we saw a value-dependent expression we were
leaving its type unchanged, leading to an abort at the end of the
function.  Instead, we should complain about an argument with the
wrong type.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 336288e5f6a35f54003b4d16e840cc2fd34bf115
Author: Jason Merrill 
Date:   Wed Feb 1 17:43:12 2017 -0500

PR c++/79294 - ICE with invalid template argument

* pt.c (convert_nontype_argument_function): Check value-dependence.
(convert_nontype_argument): Don't check it here for function ptrs.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index c69c270..4c4941a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -5966,6 +5966,9 @@ convert_nontype_argument_function (tree type, tree expr,
   if (fn == error_mark_node)
 return error_mark_node;
 
+  if (value_dependent_expression_p (fn))
+return fn;
+
   fn_no_ptr = strip_fnptr_conv (fn);
   if (TREE_CODE (fn_no_ptr) == ADDR_EXPR)
 fn_no_ptr = TREE_OPERAND (fn_no_ptr, 0);
@@ -6698,8 +6701,7 @@ convert_nontype_argument (tree type, tree expr, 
tsubst_flags_t complain)
/* Null pointer values are OK in C++11.  */
return perform_qualification_conversions (type, expr);
 
-  if (!value_dependent_expression_p (expr))
-   expr = convert_nontype_argument_function (type, expr, complain);
+  expr = convert_nontype_argument_function (type, expr, complain);
   if (!expr || expr == error_mark_node)
return expr;
 }
@@ -6723,8 +6725,7 @@ convert_nontype_argument (tree type, tree expr, 
tsubst_flags_t complain)
  return NULL_TREE;
}
 
-  if (!value_dependent_expression_p (expr))
-   expr = convert_nontype_argument_function (type, expr, complain);
+  expr = convert_nontype_argument_function (type, expr, complain);
   if (!expr || expr == error_mark_node)
return expr;
 }
diff --git a/gcc/testsuite/g++.dg/template/error57.C 
b/gcc/testsuite/g++.dg/template/error57.C
new file mode 100644
index 000..f67e0a6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/error57.C
@@ -0,0 +1,5 @@
+// PR c++/79294
+
+template  struct a;
+template  a < b// { dg-error "int" }
+// { dg-error "expected" "" { target *-*-* } .-1 }


Re: [PATCH][C++] Improve memory use for PR12245

2017-02-03 Thread Jason Merrill
On Thu, Feb 2, 2017 at 4:21 AM, Richard Biener  wrote:
> On Wed, 1 Feb 2017, Richard Biener wrote:
>
>> On Wed, 1 Feb 2017, Jakub Jelinek wrote:
>>
>> > On Wed, Feb 01, 2017 at 02:14:20PM +0100, Richard Biener wrote:
>> > >
>> > > Looks like we cache the answer to maybe_constant_value (INTEGER_CST)
>> > > which results in (-fmem-report):
>> > >
>> > > cp/constexpr.c:4814 (maybe_constant_value) 67108816:100.0%
>> > > 10066310417:  0.0%   ggc
>> > >
>> > > this can be improved trivially to
>> > >
>> > > cp/constexpr.c:4817 (maybe_constant_value) 2032: 13.6%
>> > > 2144 2:  0.0%   ggc
>> > >
>> > > with the following patch which I am testing right now.
>> > >
>> > > Ok for trunk?
>> > >
>> > > (just in case it causes some fallout because, err, some tcc_constant
>> > > is not really constant, what's the subset we can cheaply check here?
>> > > basically we want to avoid caching all INTEGER_CSTs we use for
>> > > CONSTRUCTOR_INDEX in large initializers)
>> >
>> > I'm worried that we don't want to handle all the constants that way.
>> > As I wrote on IRC, I see some problematic constants:
>> > 1) not sure if constants can't be
>> >potential_nondependent_constant_expression, then we don't want to return
>> >them
>> > 2) cxx_eval_outermost_constant_expr has some special handling of
>> >trees with vector type (and array type)
>> > 3) constants with TREE_OVERFLOW should go through maybe_constant_value_1
>> > 4) INTEGER_CSTs with POINTER_TYPE (if they aren't 0) likewise
>> >
>> > For 3) and 4) I believe maybe_constant_value is supposed to wrap the
>> > constants into a NOP_EXPR or something.
>>
>> Just to mention, bootstrap & regtest completed successfully without
>> regressions on x86_64-unknown-linux-gnu so we at least have zero
>> testing coverage for the cases that break.
>>
>> I'll wait for Jason to suggest specific things to avoid, TREE_OVERFLOW
>> and pointer types are easy (no need to special case zero, it's just
>> one entry per pointer type).
>
> Oh, and just to mention the same issue of course plagues
> maybe_constant_init which ends up allocating a hash_map 1630776 times
> (fixing that doesn't fix any memory-hog but would avoid some needless
> cycles spent on this).  Similar "simple" patch would be
>
> * constexpr.c (maybe_constant_init): Bail out early
> for CONSTANT_CLASS_P.
>
> Index: gcc/cp/constexpr.c
> ===
> --- gcc/cp/constexpr.c  (revision 245119)
> +++ gcc/cp/constexpr.c  (working copy)
> @@ -4916,6 +4919,8 @@ maybe_constant_init (tree t, tree decl)
>  t = TARGET_EXPR_INITIAL (t);
>if (!potential_nondependent_static_init_expression (t))
>  /* Don't try to evaluate it.  */;
> +  else if (CONSTANT_CLASS_P (t))
> +return t;
>else
>  t = cxx_eval_outermost_constant_expr (t, true, false, decl);
>if (TREE_CODE (t) == TARGET_EXPR)
>
> which is even eventually safer because it's after
> the !potential_nondependent_static_init_expression (if that can
> be ever true for CONSTANT_CLASS_P t).
>
> Then the other issue noticed is that we always copy every
> CONSTRUCTOR at least once via reshape_init_array.
>
> I think both maybe_constant_value and maybe_constant_init are
> low-hanging fruit to fix at this point so waiting for some
> guidance on Jakubs concerns (or just take it yourself from here).

Yeah, I think doing the early return after potential_nondependent_* is
the way to go.  Here's what I'm applying:
commit 7b3d88c4524ae080518f8cbe4afbd57ffe4fdc3c
Author: Jason Merrill 
Date:   Thu Feb 2 17:24:41 2017 -0500

PR c++/12245 - excessive memory use

* constexpr.c (maybe_constant_value): Fold maybe_constant_value_1
back in.  Don't cache constants.
(maybe_constant_init): Don't cache constants.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 67d2428..f9bc5186 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -4777,8 +4777,10 @@ fold_simple (tree t)
Otherwise, if T does not have TREE_CONSTANT set, returns T.
Otherwise, returns a version of T without TREE_CONSTANT.  */
 
-static tree
-maybe_constant_value_1 (tree t, tree decl)
+static GTY((deletable)) hash_map *cv_cache;
+
+tree
+maybe_constant_value (tree t, tree decl)
 {
   tree r;
 
@@ -4791,6 +4793,14 @@ maybe_constant_value_1 (tree t, tree decl)
}
   return t;
 }
+  else if (CONSTANT_CLASS_P (t))
+/* No caching or evaluation needed.  */
+return t;
+
+  if (cv_cache == NULL)
+cv_cache = hash_map::create_ggc (101);
+  if (tree *cached = cv_cache->get (t))
+return *cached;
 
   r = cxx_eval_outermost_constant_expr (t, true, true, decl);
   gcc_checking_assert (r == t
@@ -4798,29 +4808,10 @@ maybe_constant_value_1 (tree t, tree decl)
   || TREE_CODE (t) == VIEW_CONVERT_EXPR
   || 

Re: [PATCH] PR66145 use new ABI for std::ios::failure exceptions

2017-02-03 Thread Jonathan Wakely

On 16/01/17 15:57 +, Jonathan Wakely wrote:

This changes the version of std::ios_base::failure thrown by
libstdc++.so to be the new __cxx11 ABI one, matching the default for
headers that are trying to catch the exception.

There's no simple way to do this as an easy transition, but hopefully
by now most people are either using the new ABI or building GCC
without the dual ABI.


PR libstdc++/66145
* src/c++11/functexcept.cc: Use new ABI for std::ios_base::failure
exceptions.
* testsuite/27_io/basic_ios/copyfmt/char/1.cc: Don't override ABI
for test, so new ios::failure can be caught.
* testsuite/27_io/basic_ios/exceptions/char/1.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/char/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/
exceptions_failbit.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/extractors_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/char/12297.cc: Likewise.
* testsuite/27_io/basic_istream/sentry/wchar_t/12297.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/char/
exceptions_null.cc: Likewise.
* testsuite/27_io/basic_ostream/inserters_other/wchar_t/
exceptions_null.cc: Likewise.
* testsuite/27_io/ios_base/storage/2.cc: Likewise.


Instead of just removing these overrides from the tests:


-// The library still throws the original definition of std::ios::failure
-// { dg-options "-D_GLIBCXX_USE_CXX11_ABI=0" }


I should have flipped the override to force the new ABI, otherwise
those tests fail when the library is configured with
--with-default-libstdcxx-default-abi=gcc4-compatible

This puts them back, adjusts the comment, and changes the value.

Tested powerpc64le-linux, with gcc4-compatible and without dual-abi.

Committed to trunk.

commit fe0a528549e96d631d4c53bcde7322449e08e19b
Author: Jonathan Wakely 
Date:   Fri Feb 3 19:06:25 2017 +

PR libstdc++/66145 ensure new ABI for ios::failure tests

	PR libstdc++/66145
	* testsuite/27_io/basic_ios/copyfmt/char/1.cc: Restore ABI override
	so new ios::failure can be caught even when old ABI is the default.
	* testsuite/27_io/basic_ios/exceptions/char/1.cc: Likewise.
	* testsuite/27_io/basic_istream/extractors_arithmetic/char/
	exceptions_failbit.cc: Likewise.
	* testsuite/27_io/basic_istream/extractors_arithmetic/wchar_t/
	exceptions_failbit.cc: Likewise.
	* testsuite/27_io/basic_istream/extractors_other/char/
	exceptions_null.cc: Likewise.
	* testsuite/27_io/basic_istream/extractors_other/wchar_t/
	exceptions_null.cc: Likewise.
	* testsuite/27_io/basic_istream/sentry/char/12297.cc: Likewise.
	* testsuite/27_io/basic_istream/sentry/wchar_t/12297.cc: Likewise.
	* testsuite/27_io/basic_ostream/inserters_other/char/
	exceptions_null.cc: Likewise.
	* testsuite/27_io/basic_ostream/inserters_other/wchar_t/
	exceptions_null.cc: Likewise.
	* testsuite/27_io/ios_base/storage/2.cc: Likewise.

diff --git a/libstdc++-v3/testsuite/27_io/basic_ios/copyfmt/char/1.cc b/libstdc++-v3/testsuite/27_io/basic_ios/copyfmt/char/1.cc
index d922b18..87551a0 100644
--- a/libstdc++-v3/testsuite/27_io/basic_ios/copyfmt/char/1.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_ios/copyfmt/char/1.cc
@@ -17,6 +17,9 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
+// The library throws the new definition of std::ios::failure
+// { dg-options "-D_GLIBCXX_USE_CXX11_ABI=1" }
+
 // 27.4.4.2 basic_ios member functions
 
 // NB: Don't include any other headers in this file.
diff --git a/libstdc++-v3/testsuite/27_io/basic_ios/exceptions/char/1.cc b/libstdc++-v3/testsuite/27_io/basic_ios/exceptions/char/1.cc
index a7f829a..86b7bee 100644
--- a/libstdc++-v3/testsuite/27_io/basic_ios/exceptions/char/1.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_ios/exceptions/char/1.cc
@@ -17,6 +17,9 @@
 // with this library; see the file COPYING3.  If not see
 // .
 
+// The library throws the new definition of std::ios::failure
+// { dg-options "-D_GLIBCXX_USE_CXX11_ABI=1" }
+
 // 27.4.4.2 basic_ios member functions
 
 // NB: Don't include any other headers in this file.
diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/extractors_arithmetic/char/exceptions_failbit.cc b/libstdc++-v3/testsuite/27_io/basic_istream/extractors_arithmetic/char/exceptions_failbit.cc
index ab4d7b1..2f1edf6 100644
--- a/libstdc++-v3/testsuite/27_io/basic_istream/extractors_arithmetic/char/exceptions_failbit.cc
+++ b/libstdc++-v3/testsuite/27_io/basic_istream/extractors_arithmetic/char/exceptions_failbit.cc
@@ -15,6 +15,9 @@
 // with this library; see the file COPYING3.  

[PATCH, committed] TILEPro/TILE-Gx: add blockage to avoid bad scheduler + dwarf interaction (PR target/78862)

2017-02-03 Thread Walter Lee

This patch fixes PR target/78862.  Add blockage to prevent the
scheduler from reordering a LR save with a subsequent instruction that
changes the CFA register.  This trips up the dwarf generating logic.

Bootstrapped and tested on TILEPro/TILE-Gx hardware, also backported
to GCC 6.

* config/tilegx/tilegx.md (tilegx_expand_prologue): Add blockage
  after initial stackframe link reg save.
* config/tilepro/tilepro.md (tilepro_expand_prologue): Likewise.

diff --git a/gcc/config/tilegx/tilegx.c b/gcc/config/tilegx/tilegx.c
index b04e708..d8ca14b 100644
--- a/gcc/config/tilegx/tilegx.c
+++ b/gcc/config/tilegx/tilegx.c
@@ -3998,8 +3998,11 @@ tilegx_expand_prologue (void)
   /* Save lr first in its special location because code after this
  might use the link register as a scratch register.  */
   if (df_regs_ever_live_p (TILEGX_LINK_REGNUM) || crtl->calls_eh_return)
-FRP (frame_emit_store (TILEGX_LINK_REGNUM, TILEGX_LINK_REGNUM,
-  stack_pointer_rtx, stack_pointer_rtx, 0));
+{
+  FRP (frame_emit_store (TILEGX_LINK_REGNUM, TILEGX_LINK_REGNUM,
+stack_pointer_rtx, stack_pointer_rtx, 0));
+  emit_insn (gen_blockage ());
+}
 
   if (total_size == 0)

 {
diff --git a/gcc/config/tilepro/tilepro.c b/gcc/config/tilepro/tilepro.c
index b16ad38..aa1bb1c 100644
--- a/gcc/config/tilepro/tilepro.c
+++ b/gcc/config/tilepro/tilepro.c
@@ -3533,8 +3533,11 @@ tilepro_expand_prologue (void)
   /* Save lr first in its special location because code after this
  might use the link register as a scratch register.  */
   if (df_regs_ever_live_p (TILEPRO_LINK_REGNUM) || crtl->calls_eh_return)
-FRP (frame_emit_store (TILEPRO_LINK_REGNUM, TILEPRO_LINK_REGNUM,
-  stack_pointer_rtx, stack_pointer_rtx, 0));
+{
+  FRP (frame_emit_store (TILEPRO_LINK_REGNUM, TILEPRO_LINK_REGNUM,
+stack_pointer_rtx, stack_pointer_rtx, 0));
+  emit_insn (gen_blockage ());
+}
 
   if (total_size == 0)

 {
--
2.7.2



Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Martin Sebor

On 02/03/2017 12:02 PM, Jeff Law wrote:

On 02/02/2017 05:31 PM, Martin Sebor wrote:

-  T (2, "%#hho",a); /* { dg-warning "nul past the end"
} */
-  T (2, "%#hhx",a); /* { dg-warning ".%#hhx. directive
writing between 3 and . bytes into a region of size 2" } */
+  T (2, "%#hho",a);
+  T (2, "%#hhx",a);


On reflection, this isn't quite the right fix.  We want to both set
the correct range and warn because the call will likely overflow.
This is an example of why the likely/unlikely counters have been
introduced.  By setting min = 1 and likely = 2 for the %#hho and
3 for the %#hhx we get the desired result.


Attached is a simple patch that removes the vestigial setting of
the minimum counter while preserving the warnings above by using
the likely counter.

I had overlooked this when I introduced the likely counter and so
in the corner cases of "%#o" and "%#x" with a non-constant argument
that could be zero, the minimum counter would be set to 2 and 3
respectively rather than 1 (because zero is formatted without
the '0' or '0x' base prefix).

This patch almost certainly conflicts with Jakub's.  But I think if
anything it may get simpler after Jakub applies his patch.

Jakub, if you want to do the updates and commit after your patch so they
can both get into any potential weekend gcc spin for Fedora, go right
ahead :-)

Otherwise it's good to go for Martin after making the minor updates.


Let's let Jakub go first so he can be done with his day/week.
I'll deal with the conflicts and retest everything.

Martin



Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Jeff Law

On 02/02/2017 05:49 PM, Martin Sebor wrote:


That is unrelated to the patch, both in the current trunk, with your
patch as well as with my patch there is just
  res.range.likely = res.knownrange ? res.range.max : res.range.min;
  res.range.unlikely = res.range.max;
for these cases.

Do you want likely 2 because that the shortest length for more than
one value (only a single value has the shortest length)?
Something else?


For "%#o" the shortest output of one byte (for zero) is less likely
than the next shortest output of 2 bytes (0 vs 01).

For "%#x" it's one byte vs three bytes (0 vs 0x1).

Since specifying '#' clearly indicates the user wants the base
prefix it seems very likely that the argument will be non-zero.
Whether it's going to be greater than 7 or 15 is not so clear.

So I think setting the likely counter to 2 for octal and 3 for
hexadecimal makes the most sense.
We're trying to guess intent here, which I generally prefer to avoid. 
While we may get it right often, there's going to be cases where we 
don't -- and I see tuning that kind of decision as ultimately a losing 
battle.


To some degree it may be inevitable in this code, but let's not let it 
get out of hand.


Jeff


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Jeff Law

On 02/02/2017 05:31 PM, Martin Sebor wrote:

-  T (2, "%#hho",a); /* { dg-warning "nul past the end"
} */
-  T (2, "%#hhx",a); /* { dg-warning ".%#hhx. directive
writing between 3 and . bytes into a region of size 2" } */
+  T (2, "%#hho",a);
+  T (2, "%#hhx",a);


On reflection, this isn't quite the right fix.  We want to both set
the correct range and warn because the call will likely overflow.
This is an example of why the likely/unlikely counters have been
introduced.  By setting min = 1 and likely = 2 for the %#hho and
3 for the %#hhx we get the desired result.


Attached is a simple patch that removes the vestigial setting of
the minimum counter while preserving the warnings above by using
the likely counter.

I had overlooked this when I introduced the likely counter and so
in the corner cases of "%#o" and "%#x" with a non-constant argument
that could be zero, the minimum counter would be set to 2 and 3
respectively rather than 1 (because zero is formatted without
the '0' or '0x' base prefix).
This patch almost certainly conflicts with Jakub's.  But I think if 
anything it may get simpler after Jakub applies his patch.


Jakub, if you want to do the updates and commit after your patch so they 
can both get into any potential weekend gcc spin for Fedora, go right 
ahead :-)


Otherwise it's good to go for Martin after making the minor updates.

jeff


[PATCH] PR libstdc++/60936 reduce coupling between objects in libstdc++.a

2017-02-03 Thread Jonathan Wakely

Move explicit instantiation definitions for string I/O functions into
their own files so that iostream and locale definitions are not needed
for uses of strings without I/O. Move functions for throwing C++11
exceptions into the individual files defining the exception types, so
that using any of the functions from functexcept.cc doesn't pull in
large pieces of the C++11 library. Finally, avoid using __int_to_char in
snprintf_lite.cc to avoid pulling in locale-inst.cc for one function.

PR libstdc++/60936
* src/c++11/Makefile.am: Add new files.
* src/c++11/Makefile.in: Regenerate.
* src/c++11/cow-string-inst.cc [!_GLIBCXX_USE_CXX11_ABI]
(operator<<, operator>>, getline): Move explicit instantiations to ...
* src/c++11/cow-string-io-inst.cc: ... new file.
* src/c++11/cow-wstring-inst.cc [!_GLIBCXX_USE_CXX11_ABI]
(operator<<, operator>>, getline): Move explicit instantiations to ...
* src/c++11/cow-wstring-io-inst.cc: ... new file.
* src/c++11/functexcept.cc (__throw_ios_failure, __throw_system_error)
(__throw_future_error, __throw_bad_function_call):
(__throw_regex_error): Move functions for C++11 exceptions to the
files that define the exception types.
* src/c++11/functional.cc (__throw_bad_function_call): Move here.
* src/c++11/future.cc (__throw_future_error): Likewise.
* src/c++11/ios.cc (__throw_ios_failure): Likewise.
* src/c++11/regex.cc (__throw_regex_error): Likewise.
* src/c++11/snprintf_lite.cc (__concat_size_t): Print decimal
representation directly instead of calling __int_to_char.
* src/c++11/sso_string.cc (__sso_string): New file for definition
of __sso_string type.
* src/c++11/string-io-inst.cc [_GLIBCXX_USE_CXX11_ABI]: New file for
explicit instantiations of narrow string I/O functions.
* src/c++11/system_error.cc (__throw_system_error): Move here.
(__sso_string): Move to new file.
* src/c++11/wstring-io-inst.cc [_GLIBCXX_USE_CXX11_ABI]: New file for
explicit instantiations of wide string I/O functions.
* src/c++98/misc-inst.cc [_GLIBCXX_USE_CXX11_ABI] (operator<<)
(operator>>, getline): Remove explicit instantiations from here.

Tested powerpc64le-linux, using the default settings and also with
--with-default-libstdcxx-default-abi=gcc4-compatible and also with
--disable-libstdcxx-dual-abi

Committed to trunk.

In theory this could be backported, since it only shuffles things
around between files, and the PR shows how sizes of anything using
std::string have been growing since 4.9 as new C++11 stuff was added,
and as the __cxx11::string symbols were added.

commit 204cfaf2c746c6351731072fc4b155c89db908db
Author: Jonathan Wakely 
Date:   Thu Feb 2 13:54:21 2017 +

PR libstdc++/60936 reduce coupling between objects in libstdc++.a

Move explicit instantiation definitions for string I/O functions into
their own files so that iostream and locale definitions are not needed
for uses of strings without I/O. Move functions for throwing C++11
exceptions into the individual files defining the exception types, so
that using any of the functions from functexcept.cc doesn't pull in
large pieces of the C++11 library. Finally, avoid using __int_to_char in
snprintf_lite.cc to avoid pulling in locale-inst.cc for one function.

PR libstdc++/60936
* src/c++11/Makefile.am: Add new files.
* src/c++11/Makefile.in: Regenerate.
* src/c++11/cow-string-inst.cc [!_GLIBCXX_USE_CXX11_ABI]
(operator<<, operator>>, getline): Move explicit instantiations to ...
* src/c++11/cow-string-io-inst.cc: ... new file.
* src/c++11/cow-wstring-inst.cc [!_GLIBCXX_USE_CXX11_ABI]
(operator<<, operator>>, getline): Move explicit instantiations to ...
* src/c++11/cow-wstring-io-inst.cc: ... new file.
* src/c++11/functexcept.cc (__throw_ios_failure, __throw_system_error)
(__throw_future_error, __throw_bad_function_call):
(__throw_regex_error): Move functions for C++11 exceptions to the
files that define the exception types.
* src/c++11/functional.cc (__throw_bad_function_call): Move here.
* src/c++11/future.cc (__throw_future_error): Likewise.
* src/c++11/ios.cc (__throw_ios_failure): Likewise.
* src/c++11/regex.cc (__throw_regex_error): Likewise.
* src/c++11/snprintf_lite.cc (__concat_size_t): Print decimal
representation directly instead of calling __int_to_char.
* src/c++11/sso_string.cc (__sso_string): New file for definition
of __sso_string type.
* src/c++11/string-io-inst.cc [_GLIBCXX_USE_CXX11_ABI]: New file for
explicit instantiations of narrow string I/O functions.
* src/c++11/system_error.cc (__throw_system_error): Move here.
(__sso_string): 

Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Jakub Jelinek
On Fri, Feb 03, 2017 at 11:43:53AM -0700, Jeff Law wrote:
> > +  if (TREE_CODE (argtype) == POINTER_TYPE)
> > {
> > - if (POINTER_TYPE_P (argtype))
> > -   argmax = build_all_ones_cst (argtype);
> > - else if (TYPE_UNSIGNED (argtype))
> > -   argmax = TYPE_MAX_VALUE (argtype);
> > - else
> > -   argmax = TYPE_MIN_VALUE (argtype);
> > + argmin = build_int_cst (pointer_sized_int_node, 0);
> > + argmax = build_all_ones_cst (pointer_sized_int_node);
> Curious why you didn't use the wide_int_to_tree API here.  I'm not objecting
> to using build_XXX, it's more to help guide me when I need to make a choice
> between the APIs for building INTEGER_CST nodes.

For wide_int_to_tree I need to have some wide_int, which I have from the
VR_RANGE query, but not here.  If I have an integer rather than wide-int,
then build_int_cst is the right API (I could use build_zero_cst, but
that in the end just checks if it is INTEGER_TYPE and calls build_int_cst
with 0 in that case).

> OK pending resolution of any conflicts with vla/flexible array changes from
> last night.

Thanks, I think there aren't any conflicts.

Jakub


[PATCH, i386]: Use pextr for TARGET_SSE4_1 when creating STV scalar copy

2017-02-03 Thread Uros Bizjak
Hello!

This patch fixes asymmetry for TARGET_SSE4_1, where pinsr was used to
create vector copy, but not pextr when creating scalar copy with
-mstv.

2017-02-03  Uros Bizjak  

* config/i386/i386.c (dimode_scalar_chain::convert_reg):
Use pextrd for TARGET_SSE4_1 when creating scalar copy.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32},
configured with --with-arch=core-avx-i.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 01a05c6..3a65945 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3622,11 +3622,26 @@ dimode_scalar_chain::convert_reg (unsigned regno)
 
   if (scalar_copy)
{
- rtx vcopy = gen_reg_rtx (V2DImode);
-
  start_sequence ();
- if (TARGET_INTER_UNIT_MOVES_FROM_VEC)
+ if (TARGET_SSE4_1)
+   {
+ rtx tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const0_rtx));
+ emit_insn
+   (gen_rtx_SET
+(gen_rtx_SUBREG (SImode, scopy, 0),
+ gen_rtx_VEC_SELECT (SImode,
+ gen_rtx_SUBREG (V4SImode, reg, 0), tmp)));
+
+ tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
+ emit_insn
+   (gen_rtx_SET
+(gen_rtx_SUBREG (SImode, scopy, 4),
+ gen_rtx_VEC_SELECT (SImode,
+ gen_rtx_SUBREG (V4SImode, reg, 0), tmp)));
+   }
+ else if (TARGET_INTER_UNIT_MOVES_FROM_VEC)
{
+ rtx vcopy = gen_reg_rtx (V2DImode);
  emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
  emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
  gen_rtx_SUBREG (SImode, vcopy, 0));


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Jeff Law

On 02/02/2017 03:15 PM, Jakub Jelinek wrote:



Plus there are certain notes removed:
-builtin-sprintf-warn-1.c:1311:3: note: using the range [1, -2147483648] for 
directive argument
  T (0, "%d",   a); /* { dg-warning "into a region" } */
without replacement, here a has [-2147483648, 2147483647] range, but as that
is equal to the type's range, I bet it omits it.


Could be.  As I said, there are opportunities for improvements in
what the notes print.  Someone pointed out, for example, that very
large numbers are hard to read.  It might be clearer to print some
of them using constants like LONG_MAX - 2, or in hex, etc.


The *_MAX or hex is a separate issue, that can be done or doesn't have to be
done, the values are simply the same.  But picking some random numbers in
the ranges is just wrong.
Agreed.  Further refinements in this area should be distinct patches 
from the ones currently under consideration.


One of the high level things that worries me is the level of patch 
dependencies I see as we work through the issues in this code.  That is 
part of what led to the mega patch that I asked Martin to break down 
into pieces, and Martin has indicated levels of pain around keeping 
patches up-to-date WRT the trunk sources.


This is often an indication we went forward with the original work too 
quickly.  I'll take the heat on gating this stuff in too quickly.  It 
happens once in a while (I did the same with the IPA ICF code a couple 
years back).


It's a manageable problem as long as we're aware of and work to avoid 
dependencies.


Jeff


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Jeff Law

On 02/02/2017 07:12 AM, Jakub Jelinek wrote:

On Thu, Feb 02, 2017 at 10:52:18AM +0100, Jakub Jelinek wrote:

That said, as I've said in the PR and earlier in December on gcc-patches,
the way format_integer is structured is not really maintainable, has
very different code paths for arguments with known ranges and without them
intermixed together and I still ran into wrong-code issues or wrong warnings
with your patch.  See below.  Thus I'd like to propose to just very much
simplify the code:

 gimple-ssa-sprintf.c |  108 ---
 1 file changed, 27 insertions(+), 81 deletions(-)

So far I haven't done full bootstrap/regtest on this, just
make check-gcc -j16 -k RUNTESTFLAGS=tree-ssa.exp
which revealed the need for gcc.dg/tree-ssa/builtin-sprintf-warn-1.c hunk
below.  Here it is turned into a short wrong-code without the patch below:


Testing revealed a bug in my patch (POINTER_TYPE args really need special
handling, restored), and one further testcase glitch, plus I've added
another testcase for the other wrong-code.

Going to bootstrap/regtest this again:

2017-02-02  Jakub Jelinek  
Martin Sebor  

PR tree-optimization/79327
* gimple-ssa-sprintf.c (adjust_range_for_overflow): If returning
true, always set *argmin and *argmax to TYPE_{MIN,MAX}_VALUE of
dirtype.
(format_integer): Use wide_int_to_tree instead of build_int_cst
+ to_?hwi.  If argmin is NULL, just set argmin and argmax to
TYPE_{MIN,MAX}_VALUE of argtype.  Simplify and fix computation
of shortest and longest sequence.

* gcc.dg/tree-ssa/pr79327.c: New test.
* gcc.dg/tree-ssa/builtin-sprintf-warn-1.c
(test_sprintf_chk_hh_nonconst): Don't expect 2 bogus warnings.
* gcc.dg/tree-ssa/builtin-sprintf-warn-3.c
(test_sprintf_chk_range_schar): Adjust dg-message.
* gcc.dg/tree-ssa/builtin-sprintf-warn-12.c: New test.
* gcc.c-torture/execute/pr79327.c: New test.
After a lot of thinking about the situation, I think we're better off 
doing this bit of cleanup now.  Jakub and I both expect we're going to 
be in this code again during the gcc-7 cycle, so the cleanups have 
immediate benefit.  I don't see them as inherently destabilizing and 
they're a smaller than I had anticipated.


They may (or may not) need tweaks to adjust for the vla/flexible array 
support that went in last night.  I don't see an obvious conflict, but 
if there is use your best judgment about whether or not you need to go 
through another review/approval cycle.





--- gcc/gimple-ssa-sprintf.c.jj 2017-02-02 11:03:46.536526801 +0100
+++ gcc/gimple-ssa-sprintf.c2017-02-02 14:53:54.657592450 +0100
@@ -1260,10 +1248,8 @@ format_integer (const directive , tr
@@ -1307,47 +1293,16 @@ format_integer (const directive , tr

   if (!argmin)
 {
-  /* For an unknown argument (e.g., one passed to a vararg function)
-or one whose value range cannot be determined, create a T_MIN
-constant if the argument's type is signed and T_MAX otherwise,
-and use those to compute the range of bytes that the directive
-can output.  When precision may be zero, use zero as the minimum
-since it results in no bytes on output (unless width is specified
-to be greater than 0).  */
-  bool zero = dir.prec[0] <= 0 && dir.prec[1] >= 0;
-  argmin = build_int_cst (argtype, !zero);
-
-  int typeprec = TYPE_PRECISION (dirtype);
-  int argprec = TYPE_PRECISION (argtype);
-
-  if (argprec < typeprec)
+  if (TREE_CODE (argtype) == POINTER_TYPE)
{
- if (POINTER_TYPE_P (argtype))
-   argmax = build_all_ones_cst (argtype);
- else if (TYPE_UNSIGNED (argtype))
-   argmax = TYPE_MAX_VALUE (argtype);
- else
-   argmax = TYPE_MIN_VALUE (argtype);
+ argmin = build_int_cst (pointer_sized_int_node, 0);
+ argmax = build_all_ones_cst (pointer_sized_int_node);
Curious why you didn't use the wide_int_to_tree API here.  I'm not 
objecting to using build_XXX, it's more to help guide me when I need to 
make a choice between the APIs for building INTEGER_CST nodes.


OK pending resolution of any conflicts with vla/flexible array changes 
from last night.


jeff


Re: [PATCH, rs6000] gcc 6 back port of vec_packs and vec_vgbbd builtin table entry fixes

2017-02-03 Thread Segher Boessenkool
On Fri, Feb 03, 2017 at 09:57:31AM -0800, Carl E. Love wrote:
> I have squashed the following two patches from mainline 
> 
> commit r244943  Remove bogus entries for the P8V_BUILTIN_VEC_VGBBD   
> built-ins
> commit r244904  Fix order of entries for ALTIVEC_BUILTIN_VEC_PACKS
> and P8V_BUILTIN_VEC_VGBBD.
> 
> and back ported them to the GCC 6 branch.  The patch fixes the issue
> of the vec_packs built-in entries not being contiguous and removes the
> bogus entries for the vec_vgbbd built-in.
> 
> The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE)
> with no regressions.
> 
> Is the patch OK for gcc 6 branch?  

Yes, okay.  Thanks,


Segher


> 2017-02-03  Carl Love  
> 
> * config/rs6000/rs6000-c (altivec_overloaded_builtins): Fix order
> of entries for ALTIVEC_BUILTIN_VEC_PACKS.  Remove bogus entries
> for P8V_BUILTIN_VEC_VGBBD.
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-02-03  Carl Love  
> * gcc.target/powerpc/builtins-3-p8.c:  Add new testfile for missing
> vec_packs built-in tests.


[PATCH, rs6000] gcc 6 back port of vec_packs and vec_vgbbd builtin table entry fixes

2017-02-03 Thread Carl E. Love
GCC Maintainers:

I have squashed the following two patches from mainline 

commit r244943  Remove bogus entries for the P8V_BUILTIN_VEC_VGBBD   
built-ins
commit r244904  Fix order of entries for ALTIVEC_BUILTIN_VEC_PACKS
and P8V_BUILTIN_VEC_VGBBD.

and back ported them to the GCC 6 branch.  The patch fixes the issue
of the vec_packs built-in entries not being contiguous and removes the
bogus entries for the vec_vgbbd built-in.

The patch has been tested on powerpc64le-unknown-linux-gnu (Power 8 LE)
with no regressions.

Is the patch OK for gcc 6 branch?  

   Carl Love
-
gcc/ChangeLog:

2017-02-03  Carl Love  

* config/rs6000/rs6000-c (altivec_overloaded_builtins): Fix order
of entries for ALTIVEC_BUILTIN_VEC_PACKS.  Remove bogus entries
for P8V_BUILTIN_VEC_VGBBD.

gcc/testsuite/ChangeLog:

2017-02-03  Carl Love  
* gcc.target/powerpc/builtins-3-p8.c:  Add new testfile for missing
vec_packs built-in tests.
---
 gcc/config/rs6000/rs6000-c.c | 13 
 gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c | 26 
 2 files changed, 30 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 9eb6d54..474b48c 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -2133,14 +2133,14 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V4SI, 
RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACKS, ALTIVEC_BUILTIN_VPKSWSS,
 RS6000_BTI_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
-  { ALTIVEC_BUILTIN_VEC_VPKSWSS, ALTIVEC_BUILTIN_VPKSWSS,
-RS6000_BTI_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
-  { ALTIVEC_BUILTIN_VEC_VPKUWUS, ALTIVEC_BUILTIN_VPKUWUS,
-RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V4SI, 
RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACKS, P8V_BUILTIN_VPKUDUS,
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V2DI, 
RS6000_BTI_unsigned_V2DI, 0 },
   { ALTIVEC_BUILTIN_VEC_PACKS, P8V_BUILTIN_VPKSDSS,
 RS6000_BTI_V4SI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 },
+  { ALTIVEC_BUILTIN_VEC_VPKSWSS, ALTIVEC_BUILTIN_VPKSWSS,
+RS6000_BTI_V8HI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0 },
+  { ALTIVEC_BUILTIN_VEC_VPKUWUS, ALTIVEC_BUILTIN_VPKUWUS,
+RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V4SI, 
RS6000_BTI_unsigned_V4SI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKSHSS, ALTIVEC_BUILTIN_VPKSHSS,
 RS6000_BTI_V16QI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0 },
   { ALTIVEC_BUILTIN_VEC_VPKUHUS, ALTIVEC_BUILTIN_VPKUHUS,
@@ -4550,11 +4550,6 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
   { P8V_BUILTIN_VEC_VUPKLSW, P8V_BUILTIN_VUPKLSW,
 RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
 
-  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
-RS6000_BTI_V16QI, 0, 0, 0 },
-  { P8V_BUILTIN_VEC_VGBBD, P8V_BUILTIN_VGBBD,
-RS6000_BTI_unsigned_V16QI, 0, 0, 0 },
-
   { P9V_BUILTIN_VEC_VSLV, P9V_BUILTIN_VSLV,
 RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
 RS6000_BTI_unsigned_V16QI, 0 },
diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c
new file mode 100644
index 000..2c06ea7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-3-p8.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8" } */
+
+#include 
+
+vector signed int
+test_vsi_packs_vsll_vsll (vector signed long long x,
+  vector signed long long y)
+{
+  return vec_packs (x, y);
+}
+
+vector unsigned int
+test_vui_packs_vull_vull (vector unsigned long long x,
+  vector unsigned long long y)
+{
+  return vec_packs (x, y);
+}
+
+/* Expected test results:
+ test_vsi_packs_vsll_vsll  1 vpksdss
+ test_vui_packs_vull_vull  1 vpkudus */
+
+/* { dg-final { scan-assembler-times "vpksdss"  1 } } */
+/* { dg-final { scan-assembler-times "vpkudus"  1 } } */
-- 
1.9.1





Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Jeff Law

On 02/03/2017 09:39 AM, Jakub Jelinek wrote:

On Fri, Feb 03, 2017 at 09:13:23AM -0700, Martin Sebor wrote:

I'm not opposed to the changes, certainly not to cleaning things up,
but I don't think now is the time to be making these tweaks.  Jakub's
patch is fine with me without those tweaks, and with the correction


What do you mean by my patch without those tweaks?
The intent of the patch is not just fix the diagnostics and
wrong-code issue, but primarily that any further needed fix will need
to tweak just a single spot, not many, otherwise e.g. you need to have
sufficient testsuite coverage for all those paths.
Say in the http://gcc.gnu.org/ml/gcc-patches/2017-02/msg00234.html
patch, you would with my patch need just the tree_digits part,
and then the very last hunk in gimple-ssa-sprintf.c (with
likely_adjust &&
removed).  Because you do the adjustments only if !res.knownrange
and in that case you know argmin/argmax are actually dirtype's min/max,
so 0 must be in the range.
Right.  It's a combination of 3 things in Jakub patch.  FIx the 
diagnostics, fix the codegen issue and refactoring to make the ongoing 
maintenance easier.



I'm not opposed to changing this to make it more intuitive or useful
but again, I would rather not spend time on it now.  Instead, I would
prefer to discuss what we want after the dust from the release has
settled and implement a consistent solution in GCC 8.


I fear this isn't the last fix needed, the code is very complex and not
sufficiently covered by testcases, so if we don't want to make the code
more maintainable now, I'd be strongly for just turning
-fprintf-return-value off by default for the release.  Bogus warnings
can be lived with or worked around, silent wrong-code is much worse.
I also expect we're going to be in this code more through the gcc-7 
process and that's the primary reason why I'm considering the 
refactoring aspects of Jakub's patch.   Normally I would have nixed 
those changes as inappropriate at this stage.


Jeff


Re: [PATCH] Fix powerpc movsi_from_sf define_insn_and_split constraints (PR target/79354)

2017-02-03 Thread Jakub Jelinek
On Fri, Feb 03, 2017 at 09:59:45AM -0600, Segher Boessenkool wrote:
> Hi Jakub,
> 
> On Fri, Feb 03, 2017 at 10:10:47AM +0100, Jakub Jelinek wrote:
> > As mentioned in the PR, for the following testcase we emit a power9
> > instruction even with -mcpu=power8.  Similar movsf_hardfloat instruction
> > uses wb constraint for the stxssp insn source rather than wu, which it
> > only uses for stxsspx (power7?).
> > 
> > Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?
> 
> Yes please.  Thanks!
> 
> Some testcase stuff below...
> 
> 
> > --- gcc/testsuite/gcc.target/powerpc/pr79354.c.jj   2017-02-03 
> > 02:37:44.147938375 +0100
> > +++ gcc/testsuite/gcc.target/powerpc/pr79354.c  2017-02-03 
> > 02:38:34.838303987 +0100
> > @@ -0,0 +1,23 @@
> > +/* PR target/79354 */
> > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> 
> powerpc*-*-* instead?  And why is lp64 needed?

Neither is needed, I've copied/adjusted the dg-do.*dg-options
block from another testcase and missed this.

> > +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> > "-mcpu=power8" } } */
> > +/* { dg-require-effective-target powerpc_p8vector_ok } */
> > +/* { dg-options "-mcpu=power8 -O2" } */
> > +/* { dg-final { scan-assembler-not "stxssp\[^x]" } } */
> 
> \M is nicer and more future-proof, but this works.

{\mstxssp\M} is what I've committed after retesting it without/with the
rs6000.md change.  Thanks.

2017-02-03  Jakub Jelinek  

PR target/79354
* config/rs6000/rs6000.md (movsi_from_sf): Use wb constraint instead of
wu for stxssp alternative.

* gcc.target/powerpc/pr79354.c: New test.
* gcc.c-torture/execute/pr79354.c: New test.

--- gcc/config/rs6000/rs6000.md.jj  2017-02-02 11:04:27.0 +0100
+++ gcc/config/rs6000/rs6000.md 2017-02-03 02:29:42.754962983 +0100
@@ -6814,7 +6814,7 @@ (define_insn_and_split "movsi_from_sf"
 
(unspec:SI [(match_operand:SF 1 "input_operand"
"r,  m,   Z,   Z,r,
-f,  wu,  wu,  wIwH, r,
+f,  wb,  wu,  wIwH, r,
 wK")]
UNSPEC_SI_FROM_SF))
 
--- gcc/testsuite/gcc.target/powerpc/pr79354.c.jj   2017-02-03 
02:37:44.147938375 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr79354.c  2017-02-03 02:38:34.838303987 
+0100
@@ -0,0 +1,22 @@
+/* PR target/79354 */
+/* { dg-do compile } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-final { scan-assembler-not {\mstxssp\M} } } */
+
+int b, f, g;
+float e;
+unsigned long d;
+
+void
+foo (int *a)
+{
+  for (g = 0; g < 32; g++)
+if (f)
+  {
+e = d;
+__builtin_memcpy (, , sizeof (float));
+b = *a;
+  }
+}
--- gcc/testsuite/gcc.c-torture/execute/pr79354.c.jj2017-02-03 
02:36:36.746781897 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr79354.c   2017-02-03 
02:36:07.0 +0100
@@ -0,0 +1,30 @@
+/* PR target/79354 */
+
+int b, f, g;
+float e;
+unsigned long d;
+
+__attribute__((noinline, noclone)) void
+foo (int *a)
+{
+  for (g = 0; g < 32; g++)
+if (f)
+  {
+e = d;
+__builtin_memcpy (, , sizeof (float));
+b = *a;
+  }
+}
+
+int
+main ()
+{
+  int h = 5;
+  f = 1;
+  asm volatile ("" : : : "memory");
+  foo ();
+  asm volatile ("" : : : "memory");
+  foo ();
+  asm volatile ("" : : : "memory");
+  return 0;
+}


Jakub


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Jakub Jelinek
On Fri, Feb 03, 2017 at 10:02:48AM -0700, Martin Sebor wrote:
> On 02/03/2017 09:39 AM, Jakub Jelinek wrote:
> > On Fri, Feb 03, 2017 at 09:13:23AM -0700, Martin Sebor wrote:
> > > I'm not opposed to the changes, certainly not to cleaning things up,
> > > but I don't think now is the time to be making these tweaks.  Jakub's
> > > patch is fine with me without those tweaks, and with the correction
> > 
> > What do you mean by my patch without those tweaks?
> ...
> > I fear this isn't the last fix needed, the code is very complex and not
> > sufficiently covered by testcases, so if we don't want to make the code
> > more maintainable now, I'd be strongly for just turning
> > -fprintf-return-value off by default for the release.  Bogus warnings
> > can be lived with or worked around, silent wrong-code is much worse.
> 
> I mean without changes to the content of the notes.  If you feel
> it important to simplify the code now that's okay with me.

I haven't changed anything in order to change the notes, that is just the
side-effect of arg{min,max} to always mean the range boundaries, and only
res.  If you want to print what values the pass considered as shortest or
longest output, then it would need to remember (in other named fields)
those values that it picked and use suitable wording to tell those
values to the user.  I think the ranges are very desirable to be printed
in any cases, and if further info is added, if it doesn't clutter the
message too much, why not.

> I'm not empowered to either approve or reject any changes so what
> say is obviously just my input into Jeff's decision.  I also care

I'm not a maintainer of that part of GCC, so I can only wait for some
reviewer or maintainer too.

> a lot about this work so I'm not going to resist when faced with
> a threat of having it disabled.  If leaving the optimization
> enabled is conditional on it then feel free to make whatever
> changes you see fit.

Leaving the optimization enabled by default should be dependent on
1) what value it brings 2) what are the risks (how much we can trust
it not to have other silent wrong-code issues lurking in).
That decision is of course again something some reviewer would need
to ack if I were to propose it, but with Fedora gcc package maintainer
hat on, that is an option I'm seriously considering at least for the
upcoming non-test mass rebuild, because we have at most days to resolve
stuff.

Jakub


[PATCH][AArch64] Accept more addressing modes for PRFM

2017-02-03 Thread Kyrill Tkachov

Hi all,

While evaluating Maxim's SW prefetch patches [1] I noticed that the aarch64 
prefetch pattern is
overly restrictive in its address operand. It only accepts simple register 
addressing modes.
In fact, the PRFM instruction accepts almost all modes that a normal 64-bit LDR 
supports.
The restriction in the pattern leads to explicit address calculation code to be 
emitted which we could avoid.

This patch relaxes the restrictions on the prefetch define_insn. It creates a 
predicate and constraint that
allow the full addressing modes that PRFM allows. Thus for the testcase in the 
patch (adapted from one of the existing
__builtin_prefetch tests in the testsuite) we can generate a:
prfmPLDL1STRM, [x1, 8]

instead of the current
prfmPLDL1STRM, [x1]
with an explicit increment of x1 by 8 in a separate instruction.

I've removed the %a output modifier in the output template and wrapped the 
address operand into a DImode MEM before
passing it down to aarch64_print_operand.

This is because operand 0 is an address operand rather than a memory operand 
and thus doesn't have a mode associated
with it.  When processing the 'a' output modifier the code in final.c will call 
TARGET_PRINT_OPERAND_ADDRESS with a VOIDmode
argument.  This will ICE on aarch64 because we need a mode for the memory in 
order for aarch64_classify_address to work
correctly.  Rather than overriding the VOIDmode in 
aarch64_print_operand_address I decided to instead create the DImode
MEM in the "prefetch" output template and treat it as a normal 64-bit memory 
address, which at the point of assembly output
is what it is anyway.

With this patch I see a reduction in instruction count in the SPEC2006 
benchmarks when SW prefetching is enabled on top
of Maxim's patchset because fewer address calculation instructions are emitted 
due to the use of the more expressive
addressing modes. It also fixes a performance regression that I observed in 
410.bwaves from Maxim's patches on Cortex-A72.
I'll be running a full set of benchmarks to evaluate this further, but I think 
this is the right thing to do.

Bootstrapped and tested on aarch64-none-linux-gnu.

Maxim, do you want to try this on top of your patches on your hardware to see 
if it helps with the regressions you mentioned?

Thanks,
Kyrill


[1] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02284.html

2016-02-03  Kyrylo Tkachov  

* config/aarch64/aarch64.md (prefetch); Adjust predicate and
constraint on operand 0 to allow more general addressing modes.
Adjust output template.
* config/aarch64/aarch64.c (aarch64_address_valid_for_prefetch_p):
New function.
* config/aarch64/aarch64-protos.h
(aarch64_address_valid_for_prefetch_p): Declare prototype.
* config/aarch64/constraints.md (Dp): New address constraint.
* config/aarch64/predicates.md (aarch64_prefetch_operand): New
predicate.

2016-02-03  Kyrylo Tkachov  

* gcc.target/aarch64/prfm_imm_offset_1.c: New test.
commit a324e2f2ea243fe42f23a026ecbe1435876e2c8b
Author: Kyrylo Tkachov 
Date:   Thu Feb 2 14:46:11 2017 +

[AArch64] Accept more addressing modes for PRFM

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index babc327..61706de 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -300,6 +300,7 @@ extern struct tune_params aarch64_tune_params;
 
 HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
 int aarch64_get_condition_code (rtx);
+bool aarch64_address_valid_for_prefetch_p (rtx, bool);
 bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
 unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
 unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index acc093a..c05eff3 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4549,6 +4549,24 @@ aarch64_classify_address (struct aarch64_address_info *info,
 }
 }
 
+/* Return true if the address X is valid for a PRFM instruction.
+   STRICT_P is true if we should do strict checking with
+   aarch64_classify_address.  */
+
+bool
+aarch64_address_valid_for_prefetch_p (rtx x, bool strict_p)
+{
+  struct aarch64_address_info addr;
+
+  /* PRFM accepts the same addresses as DImode...  */
+  bool res = aarch64_classify_address (, x, DImode, MEM, strict_p);
+  if (!res)
+return false;
+
+  /* ... except writeback forms.  */
+  return addr.type != ADDRESS_REG_WB;
+}
+
 bool
 aarch64_symbolic_address_p (rtx x)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b9e8edf..c6201a5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -518,27 +518,31 @@ (define_insn "nop"
 )
 
 (define_insn "prefetch"
-  [(prefetch (match_operand:DI 0 "register_operand" "r")
+  [(prefetch 

Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Martin Sebor

On 02/03/2017 09:39 AM, Jakub Jelinek wrote:

On Fri, Feb 03, 2017 at 09:13:23AM -0700, Martin Sebor wrote:

I'm not opposed to the changes, certainly not to cleaning things up,
but I don't think now is the time to be making these tweaks.  Jakub's
patch is fine with me without those tweaks, and with the correction


What do you mean by my patch without those tweaks?

...

I fear this isn't the last fix needed, the code is very complex and not
sufficiently covered by testcases, so if we don't want to make the code
more maintainable now, I'd be strongly for just turning
-fprintf-return-value off by default for the release.  Bogus warnings
can be lived with or worked around, silent wrong-code is much worse.


I mean without changes to the content of the notes.  If you feel
it important to simplify the code now that's okay with me.

I'm not empowered to either approve or reject any changes so what
say is obviously just my input into Jeff's decision.  I also care
a lot about this work so I'm not going to resist when faced with
a threat of having it disabled.  If leaving the optimization
enabled is conditional on it then feel free to make whatever
changes you see fit.

Martin


Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Jakub Jelinek
On Fri, Feb 03, 2017 at 09:13:23AM -0700, Martin Sebor wrote:
> I'm not opposed to the changes, certainly not to cleaning things up,
> but I don't think now is the time to be making these tweaks.  Jakub's
> patch is fine with me without those tweaks, and with the correction

What do you mean by my patch without those tweaks?
The intent of the patch is not just fix the diagnostics and
wrong-code issue, but primarily that any further needed fix will need
to tweak just a single spot, not many, otherwise e.g. you need to have
sufficient testsuite coverage for all those paths.
Say in the http://gcc.gnu.org/ml/gcc-patches/2017-02/msg00234.html
patch, you would with my patch need just the tree_digits part,
and then the very last hunk in gimple-ssa-sprintf.c (with
likely_adjust && 
removed).  Because you do the adjustments only if !res.knownrange
and in that case you know argmin/argmax are actually dirtype's min/max,
so 0 must be in the range.

> to keep the warning (and fix the octal base prefix) that I posted
> in the followup patch.  (The followup patch is also necessary to
> avoid incorrect optimization.)
> 
> Regarding the printed ranges: for integer arguments the pass prints
> one of two sets:
> 
> 1) either the range the argument supplied by the caller is known to
>be in, or
> 2) the range synthesized by the pass for an argument in an unknown
>range, or after conversion to the type of the directive
> 
> The notes distinguish between these two by using the two phrases:
> 
> 1) directive argument in the range [MIN, MAX]
> 
> 2) using the range [MIN, MAX] for directive argument
> 
> I suspect this isn't entirely consistent with all the recent changes
> but that's the idea behind it.  It's subtle and I'm not surprised
> Jakub missed it.

You still print as range [MIN, MAX] something that is in no way a range,
just some values picked from the range.

> I'm not opposed to changing this to make it more intuitive or useful
> but again, I would rather not spend time on it now.  Instead, I would
> prefer to discuss what we want after the dust from the release has
> settled and implement a consistent solution in GCC 8.

I fear this isn't the last fix needed, the code is very complex and not
sufficiently covered by testcases, so if we don't want to make the code
more maintainable now, I'd be strongly for just turning
-fprintf-return-value off by default for the release.  Bogus warnings
can be lived with or worked around, silent wrong-code is much worse.

Jakub


Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-03 Thread Jakub Jelinek
On Fri, Feb 03, 2017 at 04:19:58PM +, Ramana Radhakrishnan wrote:
> > > Would it be acceptable for those users to have loads that perform like
> > > CAS loops, especially under contention?  Or are these users more
> > > concerned about aarch64 not offering a true atomic 16-byte load?
> > 
> > Can the store you need for atomicity be into an automatic var on the stack?
> 
> No, it has to be to the same location.

But then it is the same problem as using cmpxchg16b on x86_64, the location
could be read-only, or that it is too slow otherwise for what users expect
for atomic load.

Jakub


Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-03 Thread Ramana Radhakrishnan

On 03/02/17 15:13, Jakub Jelinek wrote:

On Fri, Feb 03, 2017 at 04:07:22PM +0100, Torvald Riegel wrote:

On Fri, 2017-02-03 at 13:44 +, Ramana Radhakrishnan wrote:

__atomic_load on ARM appears to be ok as well

except for

__atomic_load_di which should really be the ldrexd / strexd loop but we
could ameliorate that similar to your option 3b.


This uses just ldrexd now, and thus is not guaranteed to be atomic?


On AArch64

* <16 byte loads have always been fine. The architecture allows single
copy atomic loads using single load instructions for all other sizes and
memory models, so we are fine there.

* we have gone through the libatomic locks from day one of the port for
16 byte loads.  This has been a bit of a bugbear for a number of users
within ARM who would really like to get performance without heavy weight
locks for 16 byte atomic ops.


Would it be acceptable for those users to have loads that perform like
CAS loops, especially under contention?  Or are these users more
concerned about aarch64 not offering a true atomic 16-byte load?


Can the store you need for atomicity be into an automatic var on the stack?


No, it has to be to the same location.

Ramana



Re: [PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-03 Thread Martin Sebor

On 02/03/2017 12:35 AM, Jeff Law wrote:

On 02/02/2017 09:58 AM, Martin Sebor wrote:


Otherwise all the tests succeeded, though looking e.g. at the diff
between builtin-sprintf-warn-1.c diagnostics with your patch and with
the patch below instead, there are also differences like:
-builtin-sprintf-warn-1.c:1119:3: note: using the range [1, -128] for
directive argument
+builtin-sprintf-warn-1.c:1119:3: note: using the range [-128, 127]
for directive argument
where [1, -128] looks clearly wrong, that isn't a valid range,


The range in the note is the artificial one the pass uses to compute
the range of output.  I went back and forth on this as I think it's
debatable if [-128, 127] is clearer than [1, -128] (or [-128, 1]).
The informative notes aren't completely consistent in what ranges
they print.  It would be nice to nail down what we think is the most
useful output and make them all consistent.

I don't think there's any argument against moving towards consistency,
but I would well see folks not agreeing on which consistent style is
preferable.

I do think that we should strive to print ranges in the same format that
we use to describe ranges internally.  So a range like [1, -128] is not
a valid range.  That may argue against using the artificial range
representation in the output.






As for the rest of the patch, while I appreciate your help and
acknowledge it's not my call to make I'm not entirely comfortable
with this much churn at this stage.  My preference would to just
fix the bugs and do the restructuring in stage 1.

I'm really torn here.  I'm keen to raise the bar in terms of what we're
willing to do for gcc-7.  But I'm also keen to have the codebase in a
reasonable state that will allow for the ongoing maintenance of the
gcc-7 branch.

The sprintf stuff is fairly dense and there's almost certainly more
dusty corner case issues we're going to have to work through.  Thus we
stand to be in a better state to maintain the code if we can do some
refactoring.

The fact that Jakub, one of the release managers, is proposing the
change carries a lot of weight in terms of trying to make a decision
here.  Similarly your weight as the author of this code carries a lot of
weight.  Leaving us without a clearly preferable path.


I'm not opposed to the changes, certainly not to cleaning things up,
but I don't think now is the time to be making these tweaks.  Jakub's
patch is fine with me without those tweaks, and with the correction
to keep the warning (and fix the octal base prefix) that I posted
in the followup patch.  (The followup patch is also necessary to
avoid incorrect optimization.)

Regarding the printed ranges: for integer arguments the pass prints
one of two sets:

1) either the range the argument supplied by the caller is known to
   be in, or
2) the range synthesized by the pass for an argument in an unknown
   range, or after conversion to the type of the directive

The notes distinguish between these two by using the two phrases:

1) directive argument in the range [MIN, MAX]

2) using the range [MIN, MAX] for directive argument

I suspect this isn't entirely consistent with all the recent changes
but that's the idea behind it.  It's subtle and I'm not surprised
Jakub missed it.

I'm not opposed to changing this to make it more intuitive or useful
but again, I would rather not spend time on it now.  Instead, I would
prefer to discuss what we want after the dust from the release has
settled and implement a consistent solution in GCC 8.

Martin



Re: [PATCH] Fix powerpc movsi_from_sf define_insn_and_split constraints (PR target/79354)

2017-02-03 Thread Segher Boessenkool
Hi Jakub,

On Fri, Feb 03, 2017 at 10:10:47AM +0100, Jakub Jelinek wrote:
> As mentioned in the PR, for the following testcase we emit a power9
> instruction even with -mcpu=power8.  Similar movsf_hardfloat instruction
> uses wb constraint for the stxssp insn source rather than wu, which it
> only uses for stxsspx (power7?).
> 
> Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

Yes please.  Thanks!

Some testcase stuff below...


> --- gcc/testsuite/gcc.target/powerpc/pr79354.c.jj 2017-02-03 
> 02:37:44.147938375 +0100
> +++ gcc/testsuite/gcc.target/powerpc/pr79354.c2017-02-03 
> 02:38:34.838303987 +0100
> @@ -0,0 +1,23 @@
> +/* PR target/79354 */
> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */

powerpc*-*-* instead?  And why is lp64 needed?

> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> "-mcpu=power8" } } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mcpu=power8 -O2" } */
> +/* { dg-final { scan-assembler-not "stxssp\[^x]" } } */

\M is nicer and more future-proof, but this works.


Segher


Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-03 Thread Jakub Jelinek
On Fri, Feb 03, 2017 at 04:07:22PM +0100, Torvald Riegel wrote:
> On Fri, 2017-02-03 at 13:44 +, Ramana Radhakrishnan wrote:
> > __atomic_load on ARM appears to be ok as well
> > 
> > except for
> > 
> > __atomic_load_di which should really be the ldrexd / strexd loop but we 
> > could ameliorate that similar to your option 3b.
> 
> This uses just ldrexd now, and thus is not guaranteed to be atomic?
> 
> > On AArch64
> > 
> > * <16 byte loads have always been fine. The architecture allows single 
> > copy atomic loads using single load instructions for all other sizes and 
> > memory models, so we are fine there.
> > 
> > * we have gone through the libatomic locks from day one of the port for 
> > 16 byte loads.  This has been a bit of a bugbear for a number of users 
> > within ARM who would really like to get performance without heavy weight 
> > locks for 16 byte atomic ops.
> 
> Would it be acceptable for those users to have loads that perform like
> CAS loops, especially under contention?  Or are these users more
> concerned about aarch64 not offering a true atomic 16-byte load?

Can the store you need for atomicity be into an automatic var on the stack?
Then there really shouldn't be any contention on that var (especially if it
is padded so the cache-line doesn't contain anything else), does load
contention on the atomic var matter if there are no stores to it?

Jakub


Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-03 Thread Torvald Riegel
On Fri, 2017-02-03 at 13:44 +, Ramana Radhakrishnan wrote:
> __atomic_load on ARM appears to be ok as well
> 
> except for
> 
> __atomic_load_di which should really be the ldrexd / strexd loop but we 
> could ameliorate that similar to your option 3b.

This uses just ldrexd now, and thus is not guaranteed to be atomic?

> On AArch64
> 
> * <16 byte loads have always been fine. The architecture allows single 
> copy atomic loads using single load instructions for all other sizes and 
> memory models, so we are fine there.
> 
> * we have gone through the libatomic locks from day one of the port for 
> 16 byte loads.  This has been a bit of a bugbear for a number of users 
> within ARM who would really like to get performance without heavy weight 
> locks for 16 byte atomic ops.

Would it be acceptable for those users to have loads that perform like
CAS loops, especially under contention?  Or are these users more
concerned about aarch64 not offering a true atomic 16-byte load?

> We could never switch this around on AArch64 to using the loop (or CAS) 
> by default as this would be the reverse issue (i.e. old compilers 
> calling libatomic locks and new code using the inlined instruction 
> sequence).

I think there's an implicit assumption that whenever one uses code
generated with a particular compiler, the libatomic version being used
at runtime is at least as recent as that particular compiler.  In a mix
of old and new code, this would mean that new libatomic has to be used.
However, AFAIK we haven't specified that explicitly yet.  (Richard
Henderson said we'd make similar implicit assumptions for libgcc, if I
understood him correctly.)

If that assumption holds, then switching from locks in libatomic to
inlined instructions is possible, provided that libatomic switches too.

> 
> My interest in this patch was piqued because if we were to switch 
> aarch64 to not use the locks in libatomic and go to a CAS or the loop 
> sequence I showed in my reply to Jakub, I don't believe we have an ABI 
> issue as I would expect there to be a single copy of libatomic on the 
> system and everyone would simply be using that.

Right, and it would have to be at least as recent as the most recent
compiler being used (eg, the system compiler on that system).

> However we would end up with the situation of generating stores for 
> __atomic_loads as you described above.
> 
> We could still in theory do that and gain the same bug for rodata and 
> volatile atomic pointers on AArch64 - I don't see a way of avoiding that 
> on aarch32 as we have a similar ABI issue to you.

Agreed.

> 
> > The patch introduces a can_atomic_load_p, which checks that an entry
> > exists in optabs and the machine descriptions for an atomic load of the
> > particular mode.
> >
> > IIRC, arm and aarch64 had atomic load patterns defined whenever they had
> > a CAS defined.  It would be nice if you could double-check that.  If
> > that's the case, nothing should change with my patch because
> > can_atomic_load_p would always return true if a CAS could be issued.
> > If that's not the case, arm/aarch64 could be in the same boat as x86_64
> > with cmpxchg16b; then, using Option 3b might be a useful (intermediary)
> > solution for ARM as well (OTOH, compatibility with existing code and
> > __sync builtins might perhaps not be as much of a factor as it might be
> > on x86_64).
> 
> On AArch64 IIRC only those instructions that are single copy atomic as 
> per the architecture are allowed to be atomic loads. Thus we do not 
> generate such loops anywhere.

I think the condition we'd have to check is rather that whenever there
is an atomic CAS available, there is also an atomic load available.
That is not quite the same as having only truly atomic loads available,
because then a CAS could still be available regardless of the
availability of a matching load.



[wwwdocs] Uncomment link to gcc-7/porting_to.html

2017-02-03 Thread David Malcolm
We now have a porting_to.html for gcc 7, so this patch uncomments the
link to it on gcc 7's changes page.

Validates.

I took the liberty of committing this, under the "obvious" rule.
Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.58
diff -u -p -r1.58 changes.html
--- htdocs/gcc-7/changes.html	3 Feb 2017 14:00:31 -	1.58
+++ htdocs/gcc-7/changes.html	3 Feb 2017 14:04:41 -
@@ -14,7 +14,7 @@
 
 This page is a brief summary of some of the huge number of improvements in GCC 7.
 For more information, see the
-
+Porting to GCC 7 page and the
 full GCC documentation.
 
 


Re: [wwwdocs] Add various new warnings for GCC 7

2017-02-03 Thread Marek Polacek
On Thu, Feb 02, 2017 at 10:24:34PM +0100, Gerald Pfeifer wrote:
> Hi Marek,
> 
> a couple of comments (and minor changes) below.  Once you have made
> those, this is okay to commit.  Quite some stuff going into GCC 7!
>
> On Fri, 27 Jan 2017, Marek Polacek wrote:
> > Index: changes.html
> > ===
> > +New command-line options have been added for the C and C++ compilers:
> > +  
> > +-Wimplicit-fallthrough warns when a switch case falls
> > +through.  This warning has five different levels.  The compiler is
> > +   able to parse a wide range of falls through comments, depending on
> 
> Perhaps say "fall through" (in quotes)?
> 
> And why is it 'fallthrough' in doc/invoke.texi and 'falls through' here?
 
I don't think there's one established term.  I changed this to fallthrough.

> > +   the level.  It also handles control-flow statements, such as ifs.
> 
> I have to admit I am not sure what the above means practically, i.e.,
> in what manner control-flow statement interact here.
> 
> (So I did check doc/invoke.texi and there is an example, alas no
> explanation.  Is the point that GCC only warns about the "i < 1"
> arm?  That may be good describing in the documentation.)
 
Yeah, it means that the warning can figure out that only some (or none)
of the branches of an if can fall through.  (In the future it should also
handle switches much in the same way.)

> > +   It's possible to suppres the warning by either adding a falls through
> > +   comment, or by using a null statement: __attribute__
> 
> Same as above.
 
Changed, too.

> > +-Wpointer-compare warns when a pointer is compared 
> > with
> > +a zero character constant.  This code is now invalid in C++11 and
> > +   GCC rejects such code.  This warning is enabled by default.
> 
> How about "Such code is now invalid in C++11 and GCC rejects it"?
 
Ok. 
 
> > +-Wrestrict warns when an argument passed to a
> > +restrict-qualified parameter aliases with another argument.
> 
> restrict-qualified
> 
> > +-Wmemset-elt-size warns for memset calls, when the 
> > first
> 
> memset
> 
> > +-Wswitch-unreachable warns when a switch statement has
> 
> switch
> 
> > +-Wregister warns about uses of register
> > +storage specifier.  In C++17 this keyword has been removed and for 
> > C++17
> 
> "the...storage specified" (adding "the")
> 
> > +has duplicate const, volatile, restrict or _Atomic specifier.
> 
> const
> volatile
> restrict
> _Atomic
> 
> > +The new -Wdangling-else command-line option has been split
> > +out of -Wparentheses and warns about dangling else.
> 
> else

All fixed & committed.

Marek


Re: [wwwdocs] Mention GIMPLE and RTL frontends in changes.html

2017-02-03 Thread David Malcolm
On Fri, 2017-02-03 at 08:09 +0100, Gerald Pfeifer wrote:
> On Thu, 2 Feb 2017, David Malcolm wrote:
> > This patch to the website moves the section about the selftest
> > suite to
> > the bottom of "Other significant improvements" section, and
> > rewrites it
> > to also cover the GIMPLE and RTL "frontends", and tries to couch
> > these
> > changes in terms of the benefit to the end-user (i.e. a more
> > reliable
> > compiler).
> 
> Index: htdocs/gcc-7/changes.html
> ===
> +GCC's already extensive self-test suite has gained some new
> +  capabilities, to further improve the reliability of the
> compiler:
> 
> Would "testsuite" be sufficient here?
> 
> (Per codingconventions.html we use "testsuite" as opposed to "test 
> suite", as I just checked.)

Changed.

> +  GCC now has has an internal unit testing API and a suite
> of tests
> +for programmatic self-testing of implementation
> subsystems.
> 
> Omit "implementation"?
> 
> Those are genuine questions (where I feel a little stronger about 
> the former than the latter), i.e., while I personally would make 
> those changes, I am not making this a requirement of my review.
> 
> The patch is okay, just consider the above, please.

I tried a few variants here, but in the end I considered your
suggestion as the cleanest; thanks!

Committed

Dave


Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-03 Thread Ramana Radhakrishnan

On 02/02/17 15:21, Torvald Riegel wrote:

On Thu, 2017-02-02 at 14:48 +, Ramana Radhakrishnan wrote:

On 30/01/17 18:54, Torvald Riegel wrote:

This patch fixes the __atomic builtins to not implement supposedly
lock-free atomic loads based on just a compare-and-swap operation.

If there is no hardware-backed atomic load for a certain memory
location, the current implementation can implement the load with a CAS
while claiming that the access is lock-free.  This is a bug in the cases
of volatile atomic loads and atomic loads to read-only-mapped memory; it
also creates a lot of contention in case of concurrent atomic loads,
which results in at least counter-intuitive performance because most
users probably understand "lock-free" to mean hardware-backed (and thus
"fast") instead of just in the progress-criteria sense.

This patch implements option 3b of the choices described here:
https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html



Will Deacon pointed me at this thread asking if something similar could
be done on ARM.


It would be nice if someone more familiar with ARM could double-check
that ARM is not affected.  I guess ARM isn't, but that's based on me
looking at machine descriptions, which I hadn't ever done before working
on this patch...



ARM doesn't have __int128 support, so I don't think the problem exists 
there.


On ARM, on architecture levels (i.e arch < armv6k) that do not have 
single copy atomic routines we end up with calling the kernel helper 
routines where the appropriate handling is done by the kernel depending 
on whether you are multicore or not.


__atomic_load on ARM appears to be ok as well

except for

__atomic_load_di which should really be the ldrexd / strexd loop but we 
could ameliorate that similar to your option 3b.




On AArch64

* <16 byte loads have always been fine. The architecture allows single 
copy atomic loads using single load instructions for all other sizes and 
memory models, so we are fine there.


* we have gone through the libatomic locks from day one of the port for 
16 byte loads.  This has been a bit of a bugbear for a number of users 
within ARM who would really like to get performance without heavy weight 
locks for 16 byte atomic ops.
We could never switch this around on AArch64 to using the loop (or CAS) 
by default as this would be the reverse issue (i.e. old compilers 
calling libatomic locks and new code using the inlined instruction 
sequence).



My interest in this patch was piqued because if we were to switch 
aarch64 to not use the locks in libatomic and go to a CAS or the loop 
sequence I showed in my reply to Jakub, I don't believe we have an ABI 
issue as I would expect there to be a single copy of libatomic on the 
system and everyone would simply be using that.


However we would end up with the situation of generating stores for 
__atomic_loads as you described above.


We could still in theory do that and gain the same bug for rodata and 
volatile atomic pointers on AArch64 - I don't see a way of avoiding that 
on aarch32 as we have a similar ABI issue to you.




The patch introduces a can_atomic_load_p, which checks that an entry
exists in optabs and the machine descriptions for an atomic load of the
particular mode.

IIRC, arm and aarch64 had atomic load patterns defined whenever they had
a CAS defined.  It would be nice if you could double-check that.  If
that's the case, nothing should change with my patch because
can_atomic_load_p would always return true if a CAS could be issued.
If that's not the case, arm/aarch64 could be in the same boat as x86_64
with cmpxchg16b; then, using Option 3b might be a useful (intermediary)
solution for ARM as well (OTOH, compatibility with existing code and
__sync builtins might perhaps not be as much of a factor as it might be
on x86_64).


On AArch64 IIRC only those instructions that are single copy atomic as 
per the architecture are allowed to be atomic loads. Thus we do not 
generate such loops anywhere.





The atomic load patterns you have could still be wrong, for example by
generating a LDXP/STXP loop or something else that can store on an
atomic load.  In that case, the problem is similar as not having a
custom load pattern, and the second case in the previous paragraph
applies.






On armv8-a we can implement an atomic load of 16 bytes using an LDXP /
STXP loop as a 16 byte load isnt' single copy atomic. On armv8.1-a we do
have a CAS on 16 bytes.

This was discussed last year here.

https://gcc.gnu.org/ml/gcc/2016-06/msg00017.html

and the consensus seemed to be that we couldn't really do a CAS on
readonly data as we would have no way of avoiding a trap.


Yes, and I agree that not doing a CAS (or anything that can store on
atomic load) is the right thing to do.

BTW, on the ARM targets, is it possible that the __sync builtins would
use LDXP/STXP on 16 byte and the __atomic loads would fall back to using
libatomic and locks?  Or do you simply not provide 16-byte __sync
builtins?


Re: [PATCH] Make multiple_target.c aware of LTO (PR lto/66295)

2017-02-03 Thread Martin Liška
On 02/02/2017 02:18 PM, Martin Liška wrote:
> Ok, I spent more time with understanding how that pass works and I believe it 
> can be
> significantly simplified. I guess target_clones are very close to 'target' 
> attribute
> that is handled by C++ FE. It creates cgraph_function_version_info and 
> function dispatcher
> is generated.
> 
> I hope doing the very same by an early simple IPA pass should be the right 
> approach.
> Works fine apart from an assert it triggers:

After reading of the original thread, it's still unclear why 2 passes we 
introduced.
As I read the original patch, there was just one. Having pre-approved patch by 
Honza,
I'll install that after testing.

Martin

> 
> $ ./xgcc -B. 
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.target/i386/mvc9.c -flto -O1
> lto1: internal compiler error: in binds_to_current_def_p, at symtab.c:2239
> 0x8c580a symtab_node::binds_to_current_def_p(symtab_node*)
>   ../../gcc/symtab.c:2239
> 0x18cb40b worse_state
>   ../../gcc/ipa-pure-const.c:477
> 0x18cd61f propagate_pure_const
>   ../../gcc/ipa-pure-const.c:1346
> 0x18ce304 execute
>   ../../gcc/ipa-pure-const.c:1679
> 
> triggered for foo.ifunc:
> 
> foo.ifunc/6 (foo.ifunc) @0x7f9535b138a0
>   Type: function definition analyzed alias
>   Visibility: prevailing_def_ironly artificial
>   References: foo.resolver/7 (alias)
>   Referring: 
>   Read from file: /tmp/ccdj2ikS.o
>   Availability: overwritable
>   First run: 0
>   Function flags:
>   Called by: main/2 (1.00 per call) 
>   Calls: 
> 
> The assert is removed in attached patch, but maybe there's a better approach?
> 
> Thanks,
> Martin
> 

>From e2ff9bb3ebe8f005e7334780ede92dab6afb1a07 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 24 Jan 2017 13:41:25 +0100
Subject: [PATCH] Simplify creation of target_clones (PR lto/66295)

gcc/ChangeLog:

2017-01-24  Martin Liska  

	* multiple_target.c (create_dispatcher_calls): Redirect edge
	from a caller of a dispatcher.
	(expand_target_clones): Make the clones local.
	(ipa_target_clone): Do both target clones and resolvers.
	(ipa_dispatcher_calls): Remove the pass.
	(pass_dispatcher_calls::gate): Likewise.
	(make_pass_dispatcher_calls): Likewise.
	* passes.def (pass_target_clone): Put as very first IPA early
	pass.

gcc/testsuite/ChangeLog:

2017-01-24  Martin Liska  

	* gcc.target/i386/mvc9.c: New test.
---
 gcc/multiple_target.c| 71 +---
 gcc/passes.def   |  3 +-
 gcc/testsuite/gcc.target/i386/mvc9.c | 28 ++
 3 files changed, 39 insertions(+), 63 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/mvc9.c

diff --git a/gcc/multiple_target.c b/gcc/multiple_target.c
index 5be3980db20..7b735ae81ae 100644
--- a/gcc/multiple_target.c
+++ b/gcc/multiple_target.c
@@ -87,6 +87,7 @@ create_dispatcher_calls (struct cgraph_node *node)
 	inode->resolve_alias (cgraph_node::get (resolver_decl));
 
   e->redirect_callee (inode);
+  e->redirect_call_stmt_to_callee ();
   /*  Since REDIRECT_CALLEE modifies NEXT_CALLER field we move to
 	  previously set NEXT_CALLER.  */
   e = NULL;
@@ -283,6 +284,7 @@ expand_target_clones (struct cgraph_node *node, bool definition)
   create_new_asm_name (attr, suffix);
   /* Create new target clone.  */
   cgraph_node *new_node = create_target_clone (node, definition, suffix);
+  new_node->local.local = false;
   XDELETEVEC (suffix);
 
   /* Set new attribute for the clone.  */
@@ -334,17 +336,19 @@ expand_target_clones (struct cgraph_node *node, bool definition)
   return ret;
 }
 
-static bool target_clone_pass;
-
 static unsigned int
 ipa_target_clone (void)
 {
   struct cgraph_node *node;
 
-  target_clone_pass = false;
+  bool target_clone_pass = false;
   FOR_EACH_FUNCTION (node)
-if (node->definition)
-  target_clone_pass |= expand_target_clones (node, true);
+target_clone_pass |= expand_target_clones (node, node->definition);
+
+  if (target_clone_pass)
+FOR_EACH_FUNCTION (node)
+  create_dispatcher_calls (node);
+
   return 0;
 }
 
@@ -360,7 +364,7 @@ const pass_data pass_data_target_clone =
   0,/* properties_provided */
   0,/* properties_destroyed */
   0,/* todo_flags_start */
-  0/* todo_flags_finish */
+  TODO_update_ssa		/* todo_flags_finish */
 };
 
 class pass_target_clone : public simple_ipa_opt_pass
@@ -388,58 +392,3 @@ make_pass_target_clone (gcc::context *ctxt)
 {
   return new pass_target_clone (ctxt);
 }
-
-static unsigned int
-ipa_dispatcher_calls (void)
-{
-  struct cgraph_node *node;
-
-  FOR_EACH_FUNCTION (node)
-if (!node->definition)
-  target_clone_pass |= expand_target_clones (node, false);
-  if (target_clone_pass)
-FOR_EACH_FUNCTION (node)
-  create_dispatcher_calls (node);
-  return 0;
-}
-
-namespace {
-
-const pass_data pass_data_dispatcher_calls =
-{
-  SIMPLE_IPA_PASS,		/* type */
-  

Re: [PATCH] Make multiple_target.c aware of LTO (PR lto/66295)

2017-02-03 Thread Martin Liška
On 02/02/2017 02:18 PM, Martin Liška wrote:
> Ok, I spent more time with understanding how that pass works and I believe it 
> can be
> significantly simplified. I guess target_clones are very close to 'target' 
> attribute
> that is handled by C++ FE. It creates cgraph_function_version_info and 
> function dispatcher
> is generated.
> 
> I hope doing the very same by an early simple IPA pass should be the right 
> approach.
> Works fine apart from an assert it triggers:
> 
> $ ./xgcc -B. 
> /home/marxin/Programming/gcc/gcc/testsuite/gcc.target/i386/mvc9.c -flto -O1
> lto1: internal compiler error: in binds_to_current_def_p, at symtab.c:2239
> 0x8c580a symtab_node::binds_to_current_def_p(symtab_node*)
>   ../../gcc/symtab.c:2239
> 0x18cb40b worse_state
>   ../../gcc/ipa-pure-const.c:477
> 0x18cd61f propagate_pure_const
>   ../../gcc/ipa-pure-const.c:1346
> 0x18ce304 execute
>   ../../gcc/ipa-pure-const.c:1679
> 
> triggered for foo.ifunc:
> 
> foo.ifunc/6 (foo.ifunc) @0x7f9535b138a0
>   Type: function definition analyzed alias
>   Visibility: prevailing_def_ironly artificial
>   References: foo.resolver/7 (alias)
>   Referring: 
>   Read from file: /tmp/ccdj2ikS.o
>   Availability: overwritable
>   First run: 0
>   Function flags:
>   Called by: main/2 (1.00 per call) 
>   Calls: 
> 
> The assert is removed in attached patch, but maybe there's a better approach?

After discussion with Honza, we agreed that the function should bail out when
declaration has ifunc attribute. I'm going to install it as separate patch
right after proper testing.

Martin

> 
> Thanks,
> Martin
> 

>From 3f1d907737cf132f52c7a85d697e1f9d267efa86 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 3 Feb 2017 14:33:16 +0100
Subject: [PATCH] Bail out binds_to_current_def_p for ifunc functions.

gcc/ChangeLog:

2017-02-03  Martin Liska  

	* symtab.c (symtab_node::binds_to_current_def_p): Bail out
	in case of a function with ifunc attribute.
---
 gcc/symtab.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/symtab.c b/gcc/symtab.c
index 0078896c8a8..f0baf081040 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -2225,6 +2225,8 @@ symtab_node::binds_to_current_def_p (symtab_node *ref)
   if (transparent_alias)
 return definition
 	   && get_alias_target()->binds_to_current_def_p (ref);
+  if (lookup_attribute ("ifunc", DECL_ATTRIBUTES (decl)))
+return false;
   if (decl_binds_to_current_def_p (decl))
 return true;
 
-- 
2.11.0



Re: [PATCH] IPA: enhance dump output

2017-02-03 Thread Martin Liška
On 02/02/2017 03:54 PM, Jan Hubicka wrote:
>> 2017-01-24  Martin Liska  
>>
>>  * cgraph.c (cgraph_node::dump): Dump function version info.
>>  * symtab.c (symtab_node::dump_base): Add missing new line.
>> ---
>>  gcc/cgraph.c | 10 ++
>>  gcc/symtab.c |  1 +
>>  2 files changed, 11 insertions(+)
>>
>> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
>> index ef2dc50282c..74839f7d993 100644
>> --- a/gcc/cgraph.c
>> +++ b/gcc/cgraph.c
>> @@ -2066,6 +2066,16 @@ cgraph_node::dump (FILE *f)
>>  fprintf (f, "  Profile id: %i\n",
>>   profile_id);
>>fprintf (f, "  First run: %i\n", tp_first_run);
>> +  cgraph_function_version_info *vi = function_version ();
>> +  if (vi != NULL)
>> +{
>> +  /* Iterate to first item in the chain.  */
>> +  while (vi->prev != NULL)
>> +vi = vi->prev;
>> +  fprintf (f, "  Version info: ");
>> +  dump_addr (f, "@", (void *)vi);
>> +  fprintf (f, "\n");
> 
> I suppose it is useful to know that version info is attached, but instead of
> dumping an address, i would rather meaningfully print its contents (i.e.
> dispatcher and prev/next pointers in list).
> 
> OK with that change.

Yep, there's final version of patch, which I'll install just after testing.

Martin

> 
> honza
>> +}
>>fprintf (f, "  Function flags:");
>>if (count)
>>  fprintf (f, " executed %" PRId64"x",
>> diff --git a/gcc/symtab.c b/gcc/symtab.c
>> index 87febdc212f..0078896c8a8 100644
>> --- a/gcc/symtab.c
>> +++ b/gcc/symtab.c
>> @@ -890,6 +890,7 @@ symtab_node::dump_base (FILE *f)
>>  {
>>fprintf (f, "  Aux:");
>>dump_addr (f, " @", (void *)aux);
>> +  fprintf (f, "\n");
>>  }
>>  
>>fprintf (f, "  References: ");
>> -- 
>> 2.11.0
>>
> 

>From 9e0affe017bf9420c14f5a53d7f6f282fea43e4a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 24 Jan 2017 13:33:58 +0100
Subject: [PATCH] IPA: enhance dump output

gcc/ChangeLog:

2017-01-24  Martin Liska  

	* cgraph.c (cgraph_node::dump): Dump function version info.
	* symtab.c (symtab_node::dump_base): Add missing new line.
---
 gcc/cgraph.c | 22 ++
 gcc/symtab.c |  1 +
 2 files changed, 23 insertions(+)

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index ef2dc50282c..f2f763e31b3 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -2066,6 +2066,28 @@ cgraph_node::dump (FILE *f)
 fprintf (f, "  Profile id: %i\n",
 	 profile_id);
   fprintf (f, "  First run: %i\n", tp_first_run);
+  cgraph_function_version_info *vi = function_version ();
+  if (vi != NULL)
+{
+  fprintf (f, "  Version info: ");
+  if (vi->prev != NULL)
+	{
+	  fprintf (f, "prev: ");
+	  fprintf (f, "%s/%i ", vi->prev->this_node->asm_name (),
+		   vi->prev->this_node->order);
+	}
+  if (vi->next != NULL)
+	{
+	  fprintf (f, "next: ");
+	  fprintf (f, "%s/%i ", vi->next->this_node->asm_name (),
+		   vi->next->this_node->order);
+	}
+  if (vi->dispatcher_resolver != NULL_TREE)
+	fprintf (f, "dispatcher: %s",
+		 lang_hooks.decl_printable_name (vi->dispatcher_resolver, 2));
+
+  fprintf (f, "\n");
+}
   fprintf (f, "  Function flags:");
   if (count)
 fprintf (f, " executed %" PRId64"x",
diff --git a/gcc/symtab.c b/gcc/symtab.c
index 87febdc212f..0078896c8a8 100644
--- a/gcc/symtab.c
+++ b/gcc/symtab.c
@@ -890,6 +890,7 @@ symtab_node::dump_base (FILE *f)
 {
   fprintf (f, "  Aux:");
   dump_addr (f, " @", (void *)aux);
+  fprintf (f, "\n");
 }
 
   fprintf (f, "  References: ");
-- 
2.11.0



Re: [PATCH/AARCH64] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-02-03 Thread Maxim Kuvyrkov
Hi Andrew,

I took the liberty of rebasing your patch on top of my patchset.  Does it look 
correct?

I think I addressed all the comments you had about my review and posted updated 
patches.

--
Maxim Kuvyrkov
www.linaro.org



> On Jan 30, 2017, at 7:25 PM, Andrew Pinski  wrote:
> 
> On Mon, Jan 30, 2017 at 6:49 AM, Maxim Kuvyrkov
>  wrote:
>>> On Jan 27, 2017, at 6:59 PM, Andrew Pinski  wrote:
>>> 
>>> On Fri, Jan 27, 2017 at 4:11 AM, Richard Biener
>>>  wrote:
 On Fri, Jan 27, 2017 at 1:10 PM, Richard Biener
  wrote:
> On Thu, Jan 26, 2017 at 9:56 PM, Andrew Pinski  wrote:
>> Hi,
>> This patch enables -fprefetch-loop-arrays for -mcpu=thunderxt88 and
>> -mcpu=thunderxt88p1.  I filled out the tuning structures for both
>> thunderx and thunderx2t99.  No other core current enables software
>> prefetching so I set them to 0 which does not change the default
>> parameters.
>> 
>> OK?  Bootstrapped and tested on both ThunderX2 CN99xx and ThunderX
>> CN88xx with no regressions.  I got a 2x improvement for 462.libquantum
>> on CN88xx, overall a 10% improvement on SPEC INT on CN88xx at -Ofast.
>> CN99xx's SPEC did not change.
> 
> Heh, quite impressive for this kind of bit-rotten (and broken?) pass ;)
 
 And I wonder if most benefit comes from the unrolling the pass might do
 rather than from the prefetches...
>>> 
>>> Not in this case.  The main reason why I know is because the number of
>>> L1 and L2 misses drops a lot.
>> 
>> I can confirm this.  In my experiments loop unrolling hurts several tests.
> 
> Not on the cores I tried it.  I tried it on both ThunderX CN88xx and
> ThunderX CN99xx, I did not get any regressions due to unrolling.
> 
> Thanks,
> Andrew
> 
>> 
>> The prefetching approach I'm testing for -O2 includes disabling of loop 
>> unrolling to prevent code bloat.
>> 
>> --
>> Maxim Kuvyrkov
>> www.linaro.org



0007-Prefetch-tuning-for-ThunderX.patch
Description: Binary data


Re: [PATCH 5/6][AArch64] Enable -fprefetch-loop-arrays at -O3 for cores that benefit from prefetching.

2017-02-03 Thread Maxim Kuvyrkov
> On Jan 30, 2017, at 5:50 PM, Maxim Kuvyrkov  wrote:
> 
>> On Jan 30, 2017, at 3:23 PM, Kyrill Tkachov  
>> wrote:
>> 
>> Hi Maxim,
>> 
>> On 30/01/17 12:06, Maxim Kuvyrkov wrote:
>>> This patch enables prefetching at -O3 for aarch64 cores that set 
>>> "simultaneous prefetches" parameter above 0.  There are currently no such 
>>> settings, so this patch doesn't change default code generation.
>>> 
>>> I'm now working on improvements to -fprefetch-loop-arrays pass to make it 
>>> suitable for -O2. I'll post this work in the next month.
>>> 
>>> Bootstrapped and regtested on x86_64-linux-gnu and aarch64-linux-gnu.
>>> 
>> 
>> Are you aiming to get this in for GCC 8?
>> I have one small comment on this patch:
>> 
>> +  /* Enable sw prefetching at -O3 for CPUS that have prefetch, and we
>> + have deemed it beneficial (signified by setting
>> + prefetch.num_slots to 1 or more).  */
>> +  if (flag_prefetch_loop_arrays < 0
>> +  && HAVE_prefetch
>> 
>> HAVE_prefetch will always be true on aarch64.
>> I imagine midend code that had logic like this would need this check, but 
>> aarch64-specific code shouldn't need it.
> 
> Agree, I'll remove HAVE_prefetch.
> 
> This pattern was copied from other backends, and HAVE_prefetch is most likely 
> a historical artifact.

Andrew raised a good point in the review of his patch that it is a bad idea to 
use one of prefetching parameters (simultaneous_prefetches) as indicator for 
whether to enable prefetching pass by default.  Indeed there are cases when we 
want to set simultaneous_prefetch according to HW documentation (or 
experimental results), but not enable prefetching pass by default.

This update to the patch addresses it.  The patch adds a new explicit field to 
prefetch tuning structure "default_opt_level" that sets optimization level from 
which prefetching should be enabled by default.  The current value is to enable 
prefetching at -O3; additionally, this parameter will come handy for enabling 
prefetching at -O2 [when it is ready].

--
Maxim Kuvyrkov
www.linaro.org




0005-Enable-fprefetch-loop-arrays-at-O3-for-cores-that-be.patch
Description: Binary data


Re: [PATCH 4/6] Port prefetch configuration from aarch32 to aarch64

2017-02-03 Thread Maxim Kuvyrkov
> On Jan 30, 2017, at 7:35 PM, Andrew Pinski  wrote:
> 
> On Mon, Jan 30, 2017 at 3:48 AM, Maxim Kuvyrkov
>  wrote:
>> This patch port prefetch configuration from aarch32 backend to aarch64.  
>> There is no code-generation change from this patch.
>> 
>> This patch also happens to address Kyrill's comment on Andrew's prefetching 
>> patch at https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02133.html .
>> 
>> This patch also fixes a minor bug in aarch64_override_options_internal(), 
>> which used "selected_cpu->tune" instead of "aarch64_tune_params".
> 
> I am not a fan of the macro at all.

Are you referring to the AARCH64_PREFETCH_NON_BENEFICIAL macro?  If so, then 
consider consider how it simplifies updates to prefetch tuning parameters for 
targets that don't care about prefetching.  My guess is that growing number of 
ARM cores was the reason for this macro in aarch32 backend.  Similarly, the 
number of aarch64 cores will only increase over time, and updating every single 
tuning structure will become a pain.

Here is an updated patch that fixes references from global_options.foo to 
opts->foo.  I spotted this mistake while looking at Andrew's patch.

--
Maxim Kuvyrkov
www.linaro.org



0004-Port-prefetch-configuration-from-aarch32-to-aarch64-.patch
Description: Binary data


Re: [PATCH][PR sanitizer/78663] Fix 'dyld: Symbol not found: _memmem' linkage error on Darwin 10.6.

2017-02-03 Thread Jakub Jelinek
On Fri, Feb 03, 2017 at 02:46:39PM +0300, Maxim Ostapenko wrote:
> Hi,
> 
> this patch fixes  'dyld: Symbol not found: _memmem' linkage error on Darwin
> 10.6.
> Just cherry-pick r293992 from upstream, OK to apply?
> 
> -Maxim

> libsanitizer/ChangeLog:
> 
> 2017-02-03  Maxim Ostapenko  
> 
>   PR sanitizer/78663
>   * sanitizer_common/sanitizer_mac.cc: Cherry-pick upstream r293992.
>   * sanitizer_common/sanitizer_platform_interceptors.h: Likewise.

Ok for trunk, thanks a lot.

> diff --git a/libsanitizer/sanitizer_common/sanitizer_mac.cc 
> b/libsanitizer/sanitizer_common/sanitizer_mac.cc
> index 62be7b0..2a05102 100644
> --- a/libsanitizer/sanitizer_common/sanitizer_mac.cc
> +++ b/libsanitizer/sanitizer_common/sanitizer_mac.cc
> @@ -91,20 +91,22 @@ namespace __sanitizer {
>  
>  #include "sanitizer_syscall_generic.inc"
>  
> -// Direct syscalls, don't call libmalloc hooks.
> +// Direct syscalls, don't call libmalloc hooks (but not available on 10.6).
>  extern "C" void *__mmap(void *addr, size_t len, int prot, int flags, int 
> fildes,
> -off_t off);
> -extern "C" int __munmap(void *, size_t);
> +off_t off) SANITIZER_WEAK_ATTRIBUTE;
> +extern "C" int __munmap(void *, size_t) SANITIZER_WEAK_ATTRIBUTE;
>  
>  // -- sanitizer_libc.h
>  uptr internal_mmap(void *addr, size_t length, int prot, int flags,
> int fd, u64 offset) {
>if (fd == -1) fd = VM_MAKE_TAG(VM_MEMORY_ANALYSIS_TOOL);
> -  return (uptr)__mmap(addr, length, prot, flags, fd, offset);
> +  if (__mmap) return (uptr)__mmap(addr, length, prot, flags, fd, offset);
> +  return (uptr)mmap(addr, length, prot, flags, fd, offset);
>  }
>  
>  uptr internal_munmap(void *addr, uptr length) {
> -  return __munmap(addr, length);
> +  if (__munmap) return __munmap(addr, length);
> +  return munmap(addr, length);
>  }
>  
>  int internal_mprotect(void *addr, uptr length, int prot) {
> @@ -190,17 +192,19 @@ uptr internal_sigprocmask(int how, __sanitizer_sigset_t 
> *set,
>return sigprocmask(how, set, oldset);
>  }
>  
> -// Doesn't call pthread_atfork() handlers.
> -extern "C" pid_t __fork(void);
> +// Doesn't call pthread_atfork() handlers (but not available on 10.6).
> +extern "C" pid_t __fork(void) SANITIZER_WEAK_ATTRIBUTE;
>  
>  int internal_fork() {
> -  return __fork();
> +  if (__fork)
> +return __fork();
> +  return fork();
>  }
>  
>  int internal_forkpty(int *amaster) {
>int master, slave;
>if (openpty(, , nullptr, nullptr, nullptr) == -1) return -1;
> -  int pid = __fork();
> +  int pid = internal_fork();
>if (pid == -1) {
>  close(master);
>  close(slave);
> diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_interceptors.h 
> b/libsanitizer/sanitizer_common/sanitizer_platform_interceptors.h
> index 2a88605..6b2ba31 100644
> --- a/libsanitizer/sanitizer_common/sanitizer_platform_interceptors.h
> +++ b/libsanitizer/sanitizer_common/sanitizer_platform_interceptors.h
> @@ -81,8 +81,16 @@
>  #define SANITIZER_INTERCEPT_MEMMOVE 1
>  #define SANITIZER_INTERCEPT_MEMCPY 1
>  #define SANITIZER_INTERCEPT_MEMCMP 1
> +#if defined(__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__) && \
> +__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ < 1070
> +# define SI_MAC_DEPLOYMENT_BELOW_10_7 1
> +#else
> +# define SI_MAC_DEPLOYMENT_BELOW_10_7 0
> +#endif
> +// memmem on Darwin doesn't exist on 10.6
>  // FIXME: enable memmem on Windows.
> -#define SANITIZER_INTERCEPT_MEMMEM SI_NOT_WINDOWS
> +#define SANITIZER_INTERCEPT_MEMMEM \
> +  SI_NOT_WINDOWS && !SI_MAC_DEPLOYMENT_BELOW_10_7
>  #define SANITIZER_INTERCEPT_MEMCHR 1
>  #define SANITIZER_INTERCEPT_MEMRCHR SI_FREEBSD || SI_LINUX
>  


Jakub


[PATCH][PR sanitizer/78663] Fix 'dyld: Symbol not found: _memmem' linkage error on Darwin 10.6.

2017-02-03 Thread Maxim Ostapenko

Hi,

this patch fixes  'dyld: Symbol not found: _memmem' linkage error on 
Darwin 10.6.

Just cherry-pick r293992 from upstream, OK to apply?

-Maxim
libsanitizer/ChangeLog:

2017-02-03  Maxim Ostapenko  

	PR sanitizer/78663
	* sanitizer_common/sanitizer_mac.cc: Cherry-pick upstream r293992.
	* sanitizer_common/sanitizer_platform_interceptors.h: Likewise.

diff --git a/libsanitizer/sanitizer_common/sanitizer_mac.cc b/libsanitizer/sanitizer_common/sanitizer_mac.cc
index 62be7b0..2a05102 100644
--- a/libsanitizer/sanitizer_common/sanitizer_mac.cc
+++ b/libsanitizer/sanitizer_common/sanitizer_mac.cc
@@ -91,20 +91,22 @@ namespace __sanitizer {
 
 #include "sanitizer_syscall_generic.inc"
 
-// Direct syscalls, don't call libmalloc hooks.
+// Direct syscalls, don't call libmalloc hooks (but not available on 10.6).
 extern "C" void *__mmap(void *addr, size_t len, int prot, int flags, int fildes,
-off_t off);
-extern "C" int __munmap(void *, size_t);
+off_t off) SANITIZER_WEAK_ATTRIBUTE;
+extern "C" int __munmap(void *, size_t) SANITIZER_WEAK_ATTRIBUTE;
 
 // -- sanitizer_libc.h
 uptr internal_mmap(void *addr, size_t length, int prot, int flags,
int fd, u64 offset) {
   if (fd == -1) fd = VM_MAKE_TAG(VM_MEMORY_ANALYSIS_TOOL);
-  return (uptr)__mmap(addr, length, prot, flags, fd, offset);
+  if (__mmap) return (uptr)__mmap(addr, length, prot, flags, fd, offset);
+  return (uptr)mmap(addr, length, prot, flags, fd, offset);
 }
 
 uptr internal_munmap(void *addr, uptr length) {
-  return __munmap(addr, length);
+  if (__munmap) return __munmap(addr, length);
+  return munmap(addr, length);
 }
 
 int internal_mprotect(void *addr, uptr length, int prot) {
@@ -190,17 +192,19 @@ uptr internal_sigprocmask(int how, __sanitizer_sigset_t *set,
   return sigprocmask(how, set, oldset);
 }
 
-// Doesn't call pthread_atfork() handlers.
-extern "C" pid_t __fork(void);
+// Doesn't call pthread_atfork() handlers (but not available on 10.6).
+extern "C" pid_t __fork(void) SANITIZER_WEAK_ATTRIBUTE;
 
 int internal_fork() {
-  return __fork();
+  if (__fork)
+return __fork();
+  return fork();
 }
 
 int internal_forkpty(int *amaster) {
   int master, slave;
   if (openpty(, , nullptr, nullptr, nullptr) == -1) return -1;
-  int pid = __fork();
+  int pid = internal_fork();
   if (pid == -1) {
 close(master);
 close(slave);
diff --git a/libsanitizer/sanitizer_common/sanitizer_platform_interceptors.h b/libsanitizer/sanitizer_common/sanitizer_platform_interceptors.h
index 2a88605..6b2ba31 100644
--- a/libsanitizer/sanitizer_common/sanitizer_platform_interceptors.h
+++ b/libsanitizer/sanitizer_common/sanitizer_platform_interceptors.h
@@ -81,8 +81,16 @@
 #define SANITIZER_INTERCEPT_MEMMOVE 1
 #define SANITIZER_INTERCEPT_MEMCPY 1
 #define SANITIZER_INTERCEPT_MEMCMP 1
+#if defined(__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__) && \
+__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ < 1070
+# define SI_MAC_DEPLOYMENT_BELOW_10_7 1
+#else
+# define SI_MAC_DEPLOYMENT_BELOW_10_7 0
+#endif
+// memmem on Darwin doesn't exist on 10.6
 // FIXME: enable memmem on Windows.
-#define SANITIZER_INTERCEPT_MEMMEM SI_NOT_WINDOWS
+#define SANITIZER_INTERCEPT_MEMMEM \
+  SI_NOT_WINDOWS && !SI_MAC_DEPLOYMENT_BELOW_10_7
 #define SANITIZER_INTERCEPT_MEMCHR 1
 #define SANITIZER_INTERCEPT_MEMRCHR SI_FREEBSD || SI_LINUX
 


Re: [PATCH][DOC][OBVIOUS] Document default value for use-after-scope-direct-emission-threshold

2017-02-03 Thread Martin Liška
On 02/03/2017 10:45 AM, Gerald Pfeifer wrote:
> On Fri, 3 Feb 2017, Martin Liška wrote:
>> Installed as obvious as r245147.
> 
> Yes, but...

Hi.

Thanks for that, stupid typos.

> 
> I think you wanted to say "is 256", not "in 256" and make this a
> full sentence?  Looking at the entire entry I noticed a few other
> issues ("smaller of equal" instead of "smaller or equal", missing
> articles, passive voice,...) which I believe the patch below addresses.
> 
> (I'm still not quite sure what this option does, and whether we 
> could say "poison and unpoison it", for example?  Can you advise?)

I'm sending enhancement of that. Basically: direct poisoning = emitting
instructions that touch shadow memory vs. runtime callbacks = function
that does that :)

I welcome help with that.

Martin

> 
> Gerald
> 
> 2017-02-03  Gerald Pfeifer  
> 
>   * doc/invoke.texi (use-after-scope-direct-emission-threshold):
>   Fix typos and grammar; use active voice.
> 
> Index: doc/invoke.texi
> ===
> --- doc/invoke.texi   (revision 245148)
> +++ doc/invoke.texi   (working copy)
> @@ -10469,9 +10469,9 @@
>  @option{--param asan-instrumentation-with-call-threshold=0}.
>  
>  @item use-after-scope-direct-emission-threshold
> -If size of a local variables in bytes is smaller of equal to this number,
> -direct instruction emission is utilized to poison and unpoison local 
> variables.
> -Default value in 256.
> +If the size of a local variable in bytes is smaller or equal to this number,
> +utilize direct instruction emission to poison and unpoison local variables.
> +The default value is 256.
>  
>  @item chkp-max-ctor-size
>  Static constructors generated by Pointer Bounds Checker may become very
> 

>From 685cf7e8e22380916ac2b020f03d517cc2ae3a9a Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 3 Feb 2017 12:29:35 +0100
Subject: [PATCH] v2

---
 gcc/doc/invoke.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 08d26a1d858..1dd26ad11c7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10469,9 +10469,9 @@ E.g. to disable inline code use
 @option{--param asan-instrumentation-with-call-threshold=0}.
 
 @item use-after-scope-direct-emission-threshold
-If size of a local variables in bytes is smaller of equal to this number,
-direct instruction emission is utilized to poison and unpoison local variables.
-Default value in 256.
+If size of a local variable in bytes is smaller or equal to this number,
+directly poison (or unpoison) shadow memory.  Otherwise runtime callbacks
+are used.  The default value is 256.
 
 @item chkp-max-ctor-size
 Static constructors generated by Pointer Bounds Checker may become very
-- 
2.11.0



Re: [PATCH v2,rs6000] PR78056: Finish fixing build failure on Power7

2017-02-03 Thread Andreas Schwab
On Dez 16 2016, Kelvin Nilsen  wrote:

> Index: gcc/testsuite/gcc.target/powerpc/pr78056-8.c
> ===
> --- gcc/testsuite/gcc.target/powerpc/pr78056-8.c  (revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr78056-8.c  (working copy)
> @@ -0,0 +1,26 @@
> +/* { dg-do compile { target { powerpc*-*-* } } } */
> +/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
> "-mcpu=power5" } } */
> +
> +/* powerpc_popcntb_ok represents support for power 5.  */
> +/* { dg-require-effective-target powerpc_popcntb_ok } */
> +/* dfp_hw represents support for power 6.  */
> +/* { dg-skip-if "" { dfp_hw } } */
> +/* { dg-skip-if "" { powerpc*-*-aix* } } */
> +/* { dg-options "-mcpu=power5" } */
> +
> +/* This test follows the pattern of pr78056-2.c, which has been
> + * exercised with binutils 2.25.  This test, however, has not
> + * been exercised because the author of the test does not have access
> + * to a development environment that succesfully bootstraps gcc
> + * while at the same lacking assembler support for power 6.  */
> +
> +/* This test should succeed on both 32- and 64-bit configurations.  */
> +/* Though the command line specifies power5 target, this function is
> +   to support power6. Expect an error message here because this target
> +   does not support power6.  */
> +__attribute__((target("cpu=power6")))
> +/* fabs/fnabs/fsel */
> +double normal1 (double a, double b)
> +{ /* { dg-warning "lacks power6 support" } */
> +  return __builtin_copysign (a, b); /* { dg-warning "implicit declaration" } 
> */
> +}

http://gcc.gnu.org/ml/gcc-testresults/2017-02/msg00294.html

spawn -ignore SIGHUP /daten/gcc/gcc-20170202/Build/gcc/xgcc 
-B/daten/gcc/gcc-20170202/Build/gcc/ 
/daten/gcc/gcc-20170202/gcc/testsuite/gcc.target/powerpc/pr78056-8.c -m32 
-fno-diagnostics-show-caret -fdiagnostics-color=never -mcpu=power5 -S -o 
pr78056-8.s
FAIL: gcc.target/powerpc/pr78056-8.c  (test for warnings, line 24)
FAIL: gcc.target/powerpc/pr78056-8.c  (test for warnings, line 25)

spawn -ignore SIGHUP /daten/gcc/gcc-20170202/Build/gcc/xgcc 
-B/daten/gcc/gcc-20170202/Build/gcc/ 
/daten/gcc/gcc-20170202/gcc/testsuite/gcc.target/powerpc/pr78056-8.c -m64 
-fno-diagnostics-show-caret -fdiagnostics-color=never -mcpu=power5 -S -o 
pr78056-8.s
FAIL: gcc.target/powerpc/pr78056-8.c  (test for warnings, line 24)
FAIL: gcc.target/powerpc/pr78056-8.c  (test for warnings, line 25)

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-03 Thread Kyrill Tkachov


On 03/02/17 06:57, Hurugalawadi, Naveen wrote:

Hi Andrew,

Thanks for clearing the confusion.


I don't understand this comment and how it relates to your updated patch

foo, foo1 and foo2 generates calls to "popcountdi2" which should have
been "popcountsi2" for foo1. When Kyrill commented on using the
popcountsi2; I was confused :).

Hence, the testcase generally checks for the absence of call to "popcount"
and the presence of "cnt" instruction instead.



Sorry for the confusion, I should have been more explicit that I was talking
about the optab.


Now of course what should change still is the argument
types to foo1/foo2

The arguments to foo1 and foo2 are modified as required.

Bootstrapped and regression tested on aarch64-linux-gnu with no regressions.

Please let us know if its okay for stage 1?


This looks good to me, but you'll need James (or another aarch64 maintainer) to 
approve.

Thanks,
Kyrill


Thanks,
Naveen




Re: [PATCH][DOC][OBVIOUS] Document default value for use-after-scope-direct-emission-threshold

2017-02-03 Thread Gerald Pfeifer
On Fri, 3 Feb 2017, Martin Liška wrote:
> Installed as obvious as r245147.

Yes, but...

I think you wanted to say "is 256", not "in 256" and make this a
full sentence?  Looking at the entire entry I noticed a few other
issues ("smaller of equal" instead of "smaller or equal", missing
articles, passive voice,...) which I believe the patch below addresses.

(I'm still not quite sure what this option does, and whether we 
could say "poison and unpoison it", for example?  Can you advise?)

Gerald

2017-02-03  Gerald Pfeifer  

* doc/invoke.texi (use-after-scope-direct-emission-threshold):
Fix typos and grammar; use active voice.

Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 245148)
+++ doc/invoke.texi (working copy)
@@ -10469,9 +10469,9 @@
 @option{--param asan-instrumentation-with-call-threshold=0}.
 
 @item use-after-scope-direct-emission-threshold
-If size of a local variables in bytes is smaller of equal to this number,
-direct instruction emission is utilized to poison and unpoison local variables.
-Default value in 256.
+If the size of a local variable in bytes is smaller or equal to this number,
+utilize direct instruction emission to poison and unpoison local variables.
+The default value is 256.
 
 @item chkp-max-ctor-size
 Static constructors generated by Pointer Bounds Checker may become very

Re: [PATCH] assume arrays at end of structs are unbounded (PR 79352)

2017-02-03 Thread Jakub Jelinek
Hi!

Some formatting nits:

On Thu, Feb 02, 2017 at 10:10:32PM -0700, Martin Sebor wrote:
>   else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR)
> {
>   tree op2 = gimple_assign_rhs2 (def_stmt);
>   tree op3 = gimple_assign_rhs3 (def_stmt);
> - return get_range_strlen (op2, length, visited, type, fuzzy)
> -   && get_range_strlen (op3, length, visited, type, fuzzy);
> + return get_range_strlen (op2, length, visited, type, fuzzy, flexp)
> +   && get_range_strlen (op3, length, visited, type, fuzzy, flexp);

This should be written as:
return (get_range_strlen (op2, length, visited, type, fuzzy, flexp)
&& get_range_strlen (op3, length, visited, type, fuzzy,
 flexp));
or so.

> +   Return true if the range of the string lengths has been obtained
> +   from the upper bound of an array at the end of a struct.  Such
> +   an array may hold a string that's longer than its upper bound
> +   due to it being used as a poor-man's flexible array member.  */
>  
> -void get_range_strlen (tree arg, tree minmaxlen[2])
> +bool get_range_strlen (tree arg, tree minmaxlen[2])

This should be
bool
get_range_strlen (tree arg, tree minmaxlen[2])

Jakub


Re: [wwwdocs] Added /gcc-7/porting_to.html

2017-02-03 Thread Jonathan Wakely
On 3 February 2017 at 07:56, Gerald Pfeifer wrote:
> On Mon, 30 Jan 2017, Jonathan Wakely wrote:
>>
>> This adds the porting to guide for GCC 7. So far it only has details
>> of C++ changes, mostly in the std::lib.
>
>
> Thanks for doing that, Jonathan!
>
> One minor observation: This has references to the GCC 5 and GCC 6
> release notes, where the latter is a relative link and the former
> an absolute.
> I assume that was not intentional, and applied the patch below
> after verifying that neither maintainer-scripts/gcc_release nor
> contrib/gennews actually package porting_to.html.

It wasn't intentional, thanks for fixing it.


Re: [PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-02-03 Thread Iain Sandoe

> On 2 Feb 2017, at 08:08, Richard Biener  wrote:
> 
> On Wed, 1 Feb 2017, Segher Boessenkool wrote:
> 
>> On Wed, Feb 01, 2017 at 03:53:09PM -0600, Segher Boessenkool wrote:
>>> On Wed, Feb 01, 2017 at 11:59:10AM +0100, Richard Biener wrote:
 Wasn't successful in making a cross to ppc64-linux build its libobjc.
>>> 
>>> I'll do a native build.  Just the patch in the first message in this
>>> thread?  And just running the testsuite is enough, or is there
>>> something specific you want tested?
>> 
>> Done now.  No new failures on powerpc64-linux {-m32,-m64}.  I needed
>> the following additional patch; I do not know if that works on other
>> targets (or if it actually is correct!)
> 
> Yes, I knew I needed something there but didn't know what is correct
> either ;)  But I concluded as 'type' is const char * here which is
> totally broken it probably doesn't matter what we pass as 2nd argument
> either ...
> 
> So, consider the patch changed like Segher proposed below (and thanks
> Segher for the testing).  For reference attached again below, including
> tm.texi.in.
> 
> Barring further comments I'll check this in at the beginning of next
> week.

I bootstrapped  (+ada +obj-c++) 245116+this patch on x86_64-darwin15 and 
powerpc-darwin9, I don’t see any regressions on the objective-c or 
objective-c++ suites (which include testing -fgnu-runtime on Darwin).
Iain

> 
> Thanks,
> Richard.
> 
> 2017-02-02  Richard Biener  
> 
>   PR tree-optimization/79256
>   PR middle-end/79278
>   * builtins.c (get_object_alignment_2): Use min_align_of_type
>   to extract alignment for MEM_REFs to honor BIGGEST_FIELD_ALIGNMENT
>   and ADJUST_FIELD_ALIGN.
> 
>   * doc/tm.texi.in (ADJUST_FIELD_ALIGN): Adjust to take additional
>   type parameter.
>   * doc/tm.texi: Regenerate.
>   * stor-layout.c (layout_decl): Adjust.
>   (update_alignment_for_field): Likewise.
>   (place_field): Likewise.
>   (min_align_of_type): Likewise.
>   * config/arc/arc.h (ADJUST_FIELD_ALIGN): Adjust.
>   * config/epiphany/epiphany.h (ADJUST_FIELD_ALIGN): Likewise.
>   * config/epiphany/epiphany.c (epiphany_adjust_field_align): Likewise.
>   * config/frv/frv.h (ADJUST_FIELD_ALIGN): Likewise.
>   * config/frv/frv.c (frv_adjust_field_align): Likewise.
>   * config/i386/i386.h (ADJUST_FIELD_ALIGN): Likewise.
>   * config/i386/i386.c (x86_field_alignment): Likewise.
>   * config/rs6000/aix.h (ADJUST_FIELD_ALIGN): Likewise.
>   * config/rs6000/darwin.h (ADJUST_FIELD_ALIGN): Likewise.
>   * config/rs6000/freebsd64.h (ADJUST_FIELD_ALIGN): Likewise.
>   * config/rs6000/linux64.h (ADJUST_FIELD_ALIGN): Likewise.
>   * config/rs6000/sysv4.h (ADJUST_FIELD_ALIGN): Likewise.
>   * config/rs6000/rs6000.c (rs6000_special_adjust_field_align_p):
>Likewise.
> 
>   go/
>   * go-backend.c (go_field_alignment): Adjust.
> 
>   libobjc/
>   * encoding.c (objc_layout_structure_next_member): Adjust
>   ADJUST_FIELD_ALIGN usage.
> 
> Index: gcc/builtins.c
> ===
> --- gcc/builtins.c(revision 245115)
> +++ gcc/builtins.c(working copy)
> @@ -334,9 +334,11 @@ get_object_alignment_2 (tree exp, unsign
>Do so only if get_pointer_alignment_1 did not reveal absolute
>alignment knowledge and if using that alignment would
>improve the situation.  */
> +  unsigned int talign;
>   if (!addr_p && !known_alignment
> -   && TYPE_ALIGN (TREE_TYPE (exp)) > align)
> - align = TYPE_ALIGN (TREE_TYPE (exp));
> +   && (talign = min_align_of_type (TREE_TYPE (exp)) * BITS_PER_UNIT)
> +   && talign > align)
> + align = talign;
>   else
>   {
> /* Else adjust bitpos accordingly.  */
> Index: gcc/config/arc/arc.h
> ===
> --- gcc/config/arc/arc.h  (revision 245115)
> +++ gcc/config/arc/arc.h  (working copy)
> @@ -317,8 +317,8 @@ if (GET_MODE_CLASS (MODE) == MODE_INT \
>construct.
> */
> 
> -#define ADJUST_FIELD_ALIGN(FIELD, COMPUTED) \
> -(TYPE_MODE (strip_array_types (TREE_TYPE (FIELD))) == DFmode \
> +#define ADJUST_FIELD_ALIGN(FIELD, TYPE, COMPUTED) \
> +(TYPE_MODE (strip_array_types (TYPE)) == DFmode \
>  ? MIN ((COMPUTED), 32) : (COMPUTED))
> 
> 
> Index: gcc/config/epiphany/epiphany.c
> ===
> --- gcc/config/epiphany/epiphany.c(revision 245115)
> +++ gcc/config/epiphany/epiphany.c(working copy)
> @@ -2855,12 +2855,12 @@ epiphany_special_round_type_align (tree
>arrays-at-the-end-of-structs work, like for struct gcov_fn_info in
>libgcov.c .  */
> unsigned
> -epiphany_adjust_field_align (tree field, unsigned computed)
> +epiphany_adjust_field_align (tree type, unsigned computed)
> {
>   if (computed == 32
> -  && TREE_CODE 

[PATCH][DOC][OBVIOUS] Document default value for use-after-scope-direct-emission-threshold

2017-02-03 Thread Martin Liška
Hello.

Installed as obvious as r245147.

Martin
>From 705314a0d896e5434dcb30b6e5790fc5dab132b4 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 3 Feb 2017 10:07:05 +0100
Subject: [PATCH] Document default value for
 use-after-scope-direct-emission-threshold

gcc/ChangeLog:

2017-02-03  Martin Liska  

	* doc/invoke.texi: Document default value for
	use-after-scope-direct-emission-threshold.
---
 gcc/doc/invoke.texi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2d8ee57552d..08d26a1d858 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10471,6 +10471,7 @@ E.g. to disable inline code use
 @item use-after-scope-direct-emission-threshold
 If size of a local variables in bytes is smaller of equal to this number,
 direct instruction emission is utilized to poison and unpoison local variables.
+Default value in 256.
 
 @item chkp-max-ctor-size
 Static constructors generated by Pointer Bounds Checker may become very
-- 
2.11.0



[PATCH] Fix powerpc movsi_from_sf define_insn_and_split constraints (PR target/79354)

2017-02-03 Thread Jakub Jelinek
Hi!

As mentioned in the PR, for the following testcase we emit a power9
instruction even with -mcpu=power8.  Similar movsf_hardfloat instruction
uses wb constraint for the stxssp insn source rather than wu, which it
only uses for stxsspx (power7?).

Bootstrapped/regtested on powerpc64{,le}-linux, ok for trunk?

2017-02-03  Jakub Jelinek  

PR target/79354
* config/rs6000/rs6000.md (movsi_from_sf): Use wb constraint instead of
wu for stxssp alternative.

* gcc.target/powerpc/pr79354.c: New test.
* gcc.c-torture/execute/pr79354.c: New test.

--- gcc/config/rs6000/rs6000.md.jj  2017-02-02 11:04:27.0 +0100
+++ gcc/config/rs6000/rs6000.md 2017-02-03 02:29:42.754962983 +0100
@@ -6814,7 +6814,7 @@ (define_insn_and_split "movsi_from_sf"
 
(unspec:SI [(match_operand:SF 1 "input_operand"
"r,  m,   Z,   Z,r,
-f,  wu,  wu,  wIwH, r,
+f,  wb,  wu,  wIwH, r,
 wK")]
UNSPEC_SI_FROM_SF))
 
--- gcc/testsuite/gcc.target/powerpc/pr79354.c.jj   2017-02-03 
02:37:44.147938375 +0100
+++ gcc/testsuite/gcc.target/powerpc/pr79354.c  2017-02-03 02:38:34.838303987 
+0100
@@ -0,0 +1,23 @@
+/* PR target/79354 */
+/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-mcpu=power8 -O2" } */
+/* { dg-final { scan-assembler-not "stxssp\[^x]" } } */
+
+
+int b, f, g;
+float e;
+unsigned long d;
+
+void
+foo (int *a)
+{
+  for (g = 0; g < 32; g++)
+if (f)
+  {
+e = d;
+__builtin_memcpy (, , sizeof (float));
+b = *a;
+  }
+}
--- gcc/testsuite/gcc.c-torture/execute/pr79354.c.jj2017-02-03 
02:36:36.746781897 +0100
+++ gcc/testsuite/gcc.c-torture/execute/pr79354.c   2017-02-03 
02:36:07.0 +0100
@@ -0,0 +1,30 @@
+/* PR target/79354 */
+
+int b, f, g;
+float e;
+unsigned long d;
+
+__attribute__((noinline, noclone)) void
+foo (int *a)
+{
+  for (g = 0; g < 32; g++)
+if (f)
+  {
+e = d;
+__builtin_memcpy (, , sizeof (float));
+b = *a;
+  }
+}
+
+int
+main ()
+{
+  int h = 5;
+  f = 1;
+  asm volatile ("" : : : "memory");
+  foo ();
+  asm volatile ("" : : : "memory");
+  foo ();
+  asm volatile ("" : : : "memory");
+  return 0;
+}

Jakub


Re: PR79286, ira combine_and_move_insns in loops

2017-02-03 Thread Jeff Law

On 02/02/2017 02:31 AM, Alan Modra wrote:

Revised patch that cures the lra related -m32 -Os regression too.

The code that I'm patching here is changing a REG_EQUAL note to
REG_EQUIV, ie. asserting that the value of the reg is always the value
set by the current instruction.  Which is not always true when the
insn is in a loop and the use of the reg happens before the def.  Of
course that implies the value of the reg is initially undefined, so
it's not unreasonable to make the initial reg value the same.
Except when the code setting the initial value may trap, as it does in
the testcase.

Bootstrap and regression test on x86_64-linux in progress.  OK
assuming no regressions?

I also took a look at gcc/*.o to see whether there were any code
quality regressions.  No differences found except the expected change
in ira.o.

PR rtl-optimization/79286
* ira.c (update_equiv_regs): Do not create an equivalence for
mems that may trap.
testsuite/
* gcc.c-torture/execute/pr79286.c: New.
That seems pretty pessimistic -- do we have dominance information at 
this point?  If so we could check that the assignment to the register 
dominates the use.   If they are in the same block, then you have to 
look at LUIDs or somesuch.


That would address the problem you're trying to solve without 
pessimizing the code.


THe hell of it is I know I've poked at this problem in the past.

jeff



Re: [patch,wwwdocs] Add v7 AVR release notes.

2017-02-03 Thread Gerald Pfeifer

On Fri, 7 Oct 2016, Georg-Johann Lay wrote:

Mentioning some AVR target specific improvements.


Below some small editorial changes that I just applied on top
of this.

Gerald

Index: gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.56
diff -u -r1.56 changes.html
--- gcc-7/changes.html  2 Feb 2017 19:10:59 -   1.56
+++ gcc-7/changes.html  3 Feb 2017 08:42:26 -
@@ -804,10 +804,10 @@
is now properly supported.  Respective read-only variables are located
in flash memory in section .progmem.data.  No special
code is needed to access such variables; the compiler automatically
-adds an offset of 0x4000 to all addresses which is needed
+adds an offset of 0x4000 to all addresses, which is needed
to access variables in flash memory.  As opposed to ordinary cores
where it is sufficient to specify the progmem attribute
-with definitions, on the reduced Tiny cores the attribute has also to
+with definitions, on the reduced Tiny cores the attribute also has to
be specified with (external) declarations:

extern const int array[] __attribute__((__progmem__));
@@ -824,15 +824,15 @@
  return array[idx];
}
  A new command-line option -Wmisspelled-isr has been added.
-It can be used to turn off  or turn into errors 
+It turns off  or turns into errors 
warnings that are reported for interrupt service routines (ISRs)
which don't follow AVR-LibC's naming convention of prefixing
ISR names with __vector.
-  __builtin_avr_nops(n_nops) is a new
+  __builtin_avr_nops(n) is a new
https://gcc.gnu.org/onlinedocs/gcc/AVR-Built-in-Functions.html;>built-in
  function
-that inserts n_nops NOP instructions into
-the instruction stream. n_nops must be a value known at
+that inserts n NOP instructions into
+the instruction stream. n must be a value known at
compile time.




Re: [wwwdocs, coding conventions] Mention OVERRIDE/FINAL

2017-02-03 Thread Gerald Pfeifer
On Fri, 14 Oct 2016, David Malcolm wrote:
> On Fri, 2016-10-14 at 16:27 +0100, Pedro Alves wrote:
>> FYI, I pushed these in now.  I also bootstrapped with the
>> jit included in the selected languages, and hacked the
>> jit code a bit to trigger the problems OVERRIDE intends to
>> catch, just to make sure it still works.
> I propose that we update our coding conventions to mention the OVERRIDE
> and FINAL macros in the paragraph that discusses virtual funcs.
> 
> The attached patch (to the website) does so.
> 
> OK to commit?

I noticed this one has neither been rejected nor applied.

The patch appears fine wearing my wwwdocs maintainer hat, alas I
do not feel confident approving it (content-wise).

Perhaps something for Jeff (now added) or Bernd?

GeraldIndex: htdocs/codingconventions.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/codingconventions.html,v
retrieving revision 1.77
diff -u -p -r1.77 codingconventions.html
--- htdocs/codingconventions.html	18 Sep 2016 13:55:17 -	1.77
+++ htdocs/codingconventions.html	14 Oct 2016 21:22:44 -
@@ -902,7 +902,10 @@ Its use with data-carrying classes is mo
 
 Think carefully about the size and performance impact
 of virtual functions and virtual bases
-before using them.
+before using them.  If you do use virtual functions, use the
+OVERRIDE and FINAL macros from
+include/ansidecl.h to annotate the code for a human reader,
+and to allow sufficiently modern C++ compilers to detect mistakes.
 
 
 


Re: [PATCH] assume arrays at end of structs are unbounded (PR 79352)

2017-02-03 Thread Jeff Law

On 02/02/2017 10:10 PM, Martin Sebor wrote:

Bug 79352 points out that the gimple-ssa-printf pass doesn't allow
for an array at the end of a struct to be treated as a poor man's
flexible array member and hold a string that's longer than the upper
bound of the array.  Rather, the pass assumes that the string's
length must at most as long as the upper bound of the array - 1.

To allow for this the attached patch adjusts the get_range_strlen
function the pass uses to obtain the range of string lengths to
expose that bit of information.  The patch introduces a call to
array_at_struct_end_p to determine this but since the array is
in a COMPONENT_REF that array_at_struct_end_p doesn't consider
the patch extends the function to allow it to handle that case.
When the string can reference such an array the pass considers
it to be potentially unbounded in the worst (unlikely) case.

Martin

gcc-79352.diff


PR tree-optimization/79352 - -fprintf-return-value doesn't handle flexible-like 
array members properly

gcc/ChangeLog:

PR tree-optimization/79352
* gimple-fold.c (get_range_strlen): Add argument.
(get_range_strlen): Change return type to bool.
(get_maxval_strlen): Pass in a dummy argument.
* gimple-fold.h (get_range_strlen): Change return type to bool.
* gimple-ssa-sprintf.c (get_string_length): Set unlikely counter.
* tree.h (array_at_struct_end_p): Add argument.
* tree.c (array_at_struct_end_p): Handle it.

OK.

Jeff



Re: [wwwdocs] Consistently use "testsuite"

2017-02-03 Thread Gerald Pfeifer
On Fri, 3 Feb 2017, Gerald Pfeifer wrote:
> Of course, an innocent grep after providing this input in a review
> of a patch by David revealed that we have a number more cases where
> we used "test suite" instead of "testsuite".

And now a recursive grep.

Applied.

Gerald

Index: bugs/minimize.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/bugs/minimize.html,v
retrieving revision 1.8
diff -u -r1.8 minimize.html
--- bugs/minimize.html  28 Jun 2014 08:14:41 -  1.8
+++ bugs/minimize.html  3 Feb 2017 08:14:32 -
@@ -26,7 +26,7 @@
 
   GCC developers prefer bug reports with small, portable test cases.
 
-  Minimized test cases can be added to the GCC test suites.
+  Minimized test cases can be added to the GCC testsuite.
 
 
 There are basically two methods: either directly
Index: projects/beginner.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/projects/beginner.html,v
retrieving revision 1.64
diff -u -r1.64 beginner.html
--- projects/beginner.html  28 May 2016 20:40:35 -  1.64
+++ projects/beginner.html  3 Feb 2017 08:14:33 -
@@ -39,7 +39,7 @@
 Bug patrol
 
 These projects all have to do with bugs in the compiler, and our
-test suite which is supposed to make sure no bugs come back.
+testsuite which is supposed to make sure no bugs come back.
 
 
 Analyze failing test cases.
@@ -92,7 +92,7 @@
 It's likely that the same test has been added more than once, over
 the years.  You'd need to figure out a sensible definition of "the
 same test" that can be checked mechanically, then write a program that
-does that check, and run it against the entire test suite.
+does that check, and run it against the entire testsuite.
 
 
 Perform additional GCC testing.
Index: testing/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/testing/index.html,v
retrieving revision 1.37
diff -u -r1.37 index.html
--- testing/index.html  29 Jun 2014 18:50:44 -  1.37
+++ testing/index.html  3 Feb 2017 08:14:33 -
@@ -16,7 +16,7 @@
 For information about testsuite organization and adding new tests, see
 https://gcc.gnu.org/onlinedocs/gccint/Testsuites.html;>
 Test Suites in the GCC Internals manual and the README files in
-the test suite directories.
+the testsuite directories.
 
 Current efforts
 
Index: testing/testing-blitz.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/testing/testing-blitz.html,v
retrieving revision 1.7
diff -u -r1.7 testing-blitz.html
--- testing/testing-blitz.html  27 May 2016 20:04:50 -  1.7
+++ testing/testing-blitz.html  3 Feb 2017 08:14:33 -
@@ -17,11 +17,11 @@
 href="https://sourceforge.net/projects/blitz/files/;>Blitz++ download
 page at SourceForge is a 2.0 MB file.  The uncompressed
 distribution comprises 10.5 MB of source files.  Building and running the
-test suite adds an additional 75 or so MB of object files
+testsuite adds an additional 75 or so MB of object files
 and executables to this, and building the Blitz++ example programs
 adds another 50 MB.
 
-Building and running the Blitz++ test suite on a 750 Mhz
+Building and running the Blitz++ testsuite on a 750 Mhz
 Pentium III laptop takes 17 minutes.  Building the Blitz++ example
 programs takes another 20 minutes.