Re: [PATCH, fortran] PR/83864 - ICE in gfc_apply_init, at fortran/expr.c:4271

2018-01-17 Thread Steve Kargl
On Wed, Jan 17, 2018 at 11:06:23PM +0100, Harald Anlauf wrote:
> 
> Changelog:
> 
> 2018-01-17  Harald Anlauf  
> 
>   PR fortran/83864
>   * expr.c (add_init_expr_to_sym): Do not dereference NULL pointer.
> 
> Testcase:
> 
> 2018-01-17  Harald Anlauf  
> 
>   PR fortran/83864
>   * gfortran.dg/pr83864.f90: New test.
> 

Committed revision 256837

-- 
Steve


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-17 Thread Uros Bizjak
On Wed, Jan 17, 2018 at 7:29 PM, Woodhouse, David  wrote:
> I'm not sure I understand the concern. When compiling a large project for
> -m32 vs. -m64, there must be a million times the compiler has to decide
> whether to emit "r" or "e" before a register name. HJ's patch already does
> this for the thunk symbol. What is the future requirement that I am not
> understanding, and that is so hard?

No, the concern is not with one extra fputc in the compiler.

IIRC, these thunks are also intended to be called from C code.  So,
when this code is compiled for a 64-bit target, the thunk has a
different name than when the same code is compiled for a 32-bit
target.  This puts an extra burden on the developer, who has to use
the correct thunk name in their code.  Sure, this can be solved
trivially with #ifdef __x86_64__, so the issue is minor, but I thought
it had to be mentioned before the name is set in stone.

BTW: The names of the registers are ax, bx, di, si, bp, ... and this
is reflected in the 32-bit PIC thunk names.  The "e" prefix stands for
"extended", and "r" was added to be consistent with r8 ... r15.  The
pack of registers added on the 64-bit target has different naming
rules for sub-word access, e.g. r8b, r10w, r12d.

Uros.


[PATCH, committed] Add myself to the MAINTAINERS file

2018-01-17 Thread Siddhesh Poyarekar
From: Siddhesh Poyarekar 

* MAINTAINERS (write after approval): Add myself.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@256836 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 ChangeLog   | 4 
 MAINTAINERS | 1 +
 2 files changed, 5 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index aaa2187fdd7..ec85087a369 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2018-01-18  Siddhesh Poyarekar  
+
+   * MAINTAINERS (write after approval): Add myself.
+
 2018-01-16  Sebastian Perta  
 
* MAINTAINERS (write after approval): Add myself.
diff --git a/MAINTAINERS b/MAINTAINERS
index 4732672b292..373910bb91c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -541,6 +541,7 @@ Kaushik Phatak  

 Nicolas Pitre  
 Paul Pluzhnikov
 Antoniu Pop
+Siddhesh Poyarekar 
 Vidya Praveen  
 Thomas Preud'homme 
 Vladimir Prus  
-- 
2.14.3



Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
Sorry, fixed a couple of typos that prevented the patch from actually
working.  Here's the updated version.  I'll be building on
ADDR_QUERY_STR for identifying and preventing pre/post incrementing
addresses for stores for falkor.

Siddhesh

2018-xx-xx  Jim Wilson  
Kugan Vivenakandarajah  
Siddhesh Poyarekar  

gcc/
* gcc/config/aarch64/aarch64-protos.h (aarch64_addr_query_type):
New member ADDR_QUERY_STR.
* gcc/config/aarch64/aarch64-tuning-flags.def
(SLOW_REGOFFSET_QUADWORD_STORE): New.
* gcc/config/aarch64/aarch64.c (qdf24xx_tunings): Add
SLOW_REGOFFSET_QUADWORD_STORE to tuning flags.
(aarch64_classify_address): Avoid register indexing for quad
mode stores when SLOW_REGOFFSET_QUADWORD_STORE is set.
* gcc/config/aarch64/constraints.md (Uts): New constraint.
* gcc/config/aarch64/aarch64.md (movti_aarch64, movtf_aarch64):
Use it.
* gcc/config/aarch64/aarch64-simd.md (aarch64_simd_mov):
Likewise.

gcc/testsuite/
* gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case.

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 159bc6aee7e..15924fc3f58 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -120,6 +120,9 @@ enum aarch64_symbol_type
ADDR_QUERY_LDP_STP
   Query what is valid for a load/store pair.
 
+   ADDR_QUERY_STR
+  Query what is valid for a store.
+
ADDR_QUERY_ANY
   Query what is valid for at least one memory constraint, which may
   allow things that "m" doesn't.  For example, the SVE LDR and STR
@@ -128,6 +131,7 @@ enum aarch64_symbol_type
 enum aarch64_addr_query_type {
   ADDR_QUERY_M,
   ADDR_QUERY_LDP_STP,
+  ADDR_QUERY_STR,
   ADDR_QUERY_ANY
 };
 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 3d1f6a01cb7..48d92702723 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -131,9 +131,9 @@
 
 (define_insn "*aarch64_simd_mov"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, Umq,  m,  w, ?r, ?w, ?r, w")
+   "=w, Umq, Uts,  w, ?r, ?w, ?r, w")
(match_operand:VQ 1 "general_operand"
-   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz,w,  w,  w,  r,  r, Dn"))]
   "TARGET_SIMD
&& (register_operand (operands[0], mode)
|| aarch64_simd_reg_or_zero (operands[1], mode))"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def 
b/gcc/config/aarch64/aarch64-tuning-flags.def
index ea9ead234cb..04baf5b6de6 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", 
SLOW_UNALIGNED_LDPW)
are not considered cheap.  */
 AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 
+/* Don't use a register offset in a memory address for a quad-word store.  */
+AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store",
+SLOW_REGOFFSET_QUADWORD_STORE)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0599a79bfeb..664d4a18354 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -894,7 +894,7 @@ static const struct tune_params qdf24xx_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE),  /* tune_flags.  */
   _prefetch_tune
 };
 
@@ -5531,6 +5531,16 @@ aarch64_classify_address (struct aarch64_address_info 
*info,
|| vec_flags == VEC_ADVSIMD
|| vec_flags == VEC_SVE_DATA));
 
+  /* Avoid register indexing for 128-bit stores when the
+ AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE option is set.  */
+  if (!optimize_size
+  && type == ADDR_QUERY_STR
+  && (aarch64_tune_params.extra_tuning_flags
+ & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE)
+  && (mode == TImode || mode == TFmode
+ || aarch64_vector_data_mode_p (mode)))
+allow_reg_index_p = false;
+
   /* For SVE, only accept [Rn], [Rn, Rm, LSL #shift] and
  [Rn, #offset, MUL VL].  */
   if ((vec_flags & (VEC_SVE_DATA | VEC_SVE_PRED)) != 0
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index edb6a758333..348b867ff7f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1079,7 +1079,7 @@
 
 (define_insn "*movti_aarch64"
   [(set (match_operand:TI 0
-"nonimmediate_operand"  "=r, w,r,w,r,m,m,w,m")
+"nonimmediate_operand"  

Re: Go patch committed: Update to Go1.10beta1

2018-01-17 Thread Ian Lance Taylor
On Thu, Jan 11, 2018 at 1:46 AM, Rainer Orth
 wrote:
>
>> On Wed, Jan 10, 2018 at 5:42 AM, Ian Lance Taylor  wrote:
>>>
>>> Whoops, there's a bug on big-endian 32-bit systems.  I'm testing
>>> https://golang.org/cl/87135.
>>
>> Committed as follows.
>
> thanks, that fixed quite a lot of the failures.
>
> However, many others remain, too many to report here.  I've filed PR
> go/83787 to capture those.

Thanks.  I found the problem: there is a new function makechan that
takes a size argument of type int, and the old makechan, which took
int64, is now makechan64.  Since the size argument was the last one,
this worked fine except on 32-bit big-endian systems.  Fixed with this
patch.  Bootstrapped and tested on x86_64-pc-linux-gnu and
sparc-sun-solaris2.12.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 256820)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-1072286ca9249bd6f75628aead325a66286bcf5b
+925635f067d40d30acf565b620cc859ee7cbc990
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 256820)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -360,6 +360,7 @@ Node::op_format() const
  break;
 
case Runtime::MAKECHAN:
+   case Runtime::MAKECHAN64:
case Runtime::MAKEMAP:
case Runtime::MAKESLICE:
case Runtime::MAKESLICE64:
@@ -1602,6 +1603,7 @@ Escape_analysis_assign::expression(Expre
switch (fe->runtime_code())
  {
  case Runtime::MAKECHAN:
+ case Runtime::MAKECHAN64:
  case Runtime::MAKEMAP:
  case Runtime::MAKESLICE:
  case Runtime::MAKESLICE64:
@@ -2284,6 +2286,7 @@ Escape_analysis_assign::assign(Node* dst
switch (fe->runtime_code())
  {
  case Runtime::MAKECHAN:
+ case Runtime::MAKECHAN64:
  case Runtime::MAKEMAP:
  case Runtime::MAKESLICE:
  case Runtime::MAKESLICE64:
@@ -3056,6 +3059,7 @@ Escape_analysis_flood::flood(Level level
   switch (call->fn()->func_expression()->runtime_code())
 {
 case Runtime::MAKECHAN:
+   case Runtime::MAKECHAN64:
 case Runtime::MAKEMAP:
 case Runtime::MAKESLICE:
 case Runtime::MAKESLICE64:
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 256820)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -7565,7 +7565,10 @@ Builtin_call_expression::lower_make(Stat
   else if (is_chan)
 {
   Expression* type_arg = Expression::make_type_descriptor(type, type_loc);
-  call = Runtime::make_call(Runtime::MAKECHAN, loc, 2, type_arg, len_arg);
+  Runtime::Function code = Runtime::MAKECHAN;
+  if (!len_small)
+   code = Runtime::MAKECHAN64;
+  call = Runtime::make_call(code, loc, 2, type_arg, len_arg);
 }
   else
 go_unreachable();
Index: gcc/go/gofrontend/runtime.def
===
--- gcc/go/gofrontend/runtime.def   (revision 256593)
+++ gcc/go/gofrontend/runtime.def   (working copy)
@@ -139,7 +139,8 @@ DEF_GO_RUNTIME(MAPITERNEXT, "runtime.map
 
 
 // Make a channel.
-DEF_GO_RUNTIME(MAKECHAN, "runtime.makechan", P2(TYPE, INT64), R1(CHAN))
+DEF_GO_RUNTIME(MAKECHAN, "runtime.makechan", P2(TYPE, INT), R1(CHAN))
+DEF_GO_RUNTIME(MAKECHAN64, "runtime.makechan64", P2(TYPE, INT64), R1(CHAN))
 
 // Send a value on a channel.
 DEF_GO_RUNTIME(CHANSEND, "runtime.chansend1", P2(CHAN, POINTER), R0())
Index: libgo/go/runtime/chan.go
===
--- libgo/go/runtime/chan.go(revision 256593)
+++ libgo/go/runtime/chan.go(working copy)
@@ -26,6 +26,7 @@ import (
 // themselves, so that the compiler will export them.
 //
 //go:linkname makechan runtime.makechan
+//go:linkname makechan64 runtime.makechan64
 //go:linkname chansend1 runtime.chansend1
 //go:linkname chanrecv1 runtime.chanrecv1
 //go:linkname chanrecv2 runtime.chanrecv2


[PATCH][committed][PR testsuite/83883] Tighten expected output to work on callee-copies targets

2018-01-17 Thread Jeff Law


On targets where the callee may make a copy of incoming aggregates,
DSE would trigger in both functions -- prior to inlining, of course.

This patch tightens the test to look for DSE triggering in the spot
where we really wanted to check for it.  It's not strictly a regression
fix, but given it's a testsuite only change it seems appropriate.

Verified it fixes the failure on the hppa targets (prior to the ABI
change) as well as that it still passes on x86_64.

Installed on the trunk,

Jeff
commit 0a82247b5bb50c2fb62e334bc20c35a1654c10ca
Author: law 
Date:   Thu Jan 18 04:05:27 2018 +

PR testsuite/83883
* gcc.dg/tree-ssa/ssa-dse-26.c: Tighten expected output.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@256833 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index f710c158848..492c650911e 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2018-01-17  Jeff Law  
+
+   PR testsuite/83883
+   * gcc.dg/tree-ssa/ssa-dse-26.c: Tighten expected output.
+
 2018-01-17  Bill Schmidt  
 
* gcc.target/powerpc/safe-indirect-jump-1.c: Remove endian
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
index a5638b58247..8e0a24a6c2c 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-26.c
@@ -29,4 +29,6 @@ constraint_equal (struct constraint a, struct constraint b)
 && constraint_expr_equal (a.rhs, b.rhs);
 }
 
-/* { dg-final { scan-tree-dump-times "Deleted dead store" 2 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Deleted dead store: x = " 1 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Deleted dead store: y = " 1 "dse1" } } */
+


Re: Compilation warning in simple-object-xcoff.c

2018-01-17 Thread Eli Zaretskii
> From: DJ Delorie 
> Cc: sch...@linux-m68k.org, gcc-patches@gcc.gnu.org, gdb-patc...@sourceware.org
> Date: Wed, 17 Jan 2018 15:47:49 -0500
> 
> Eli Zaretskii  writes:
> 
> > DJ, would the following semi-kludgey workaround be acceptable?
> 
> It would be no worse than what we have now, if the only purpose is to
> avoid a warning.
> 
> Ideally, we would check to see if we're discarding non-zero values from
> that offset, and not call the callback with known bogus data.  I suppose
> the usefulness of that depends on how often you'll encounter 4Gb+ xcoff64
> files on mingw32 ?

The answer to that question is "never", AFAIU.


[PATCH,NVPTX] Fix PR83920

2018-01-17 Thread Cesar Philippidis
In PR83920, I encountered an nvptx bug where live predicate variables
were clobbered before their value was broadcast.  Apparently, there
were problems in certain versions of the CUDA driver where the JIT
would generate wrong code for shfl broadcasts.  The attached patch
teaches nvptx_single not to apply that workaround if the predicate
register is live.

Tom, does this patch look sane to you? I'm not sure if it defeats the
purpose of your original patch. Regardless, the live predicate registers
shouldn't be clobbered before they are used.

Unfortunately, I cannot reproduce the runtime failure with the gemm
example in the PR, so I didn't include it in the patch.  However, this patch does
fix the failure with da-1.c in og7. This patch does not cause any
regressions.

Is it OK for trunk?

Thanks,
Cesar
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 55c7e3c..698c574 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3957,6 +3957,7 @@ bb_first_real_insn (basic_block bb)
 static void
 nvptx_single (unsigned mask, basic_block from, basic_block to)
 {
+  bitmap live = DF_LIVE_IN (from);
   rtx_insn *head = BB_HEAD (from);
   rtx_insn *tail = BB_END (to);
   unsigned skip_mask = mask;
@@ -4126,8 +4127,9 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	 There is nothing in the PTX spec to suggest that this is wrong, or
 	 to explain why the extra initialization is needed.  So, we classify
 	 it as a JIT bug, and the extra initialization as workaround.  */
-	  emit_insn_before (gen_movbi (pvar, const0_rtx),
-			bb_first_real_insn (from));
+	  if (!bitmap_bit_p (live, REGNO (pvar)))
+	emit_insn_before (gen_movbi (pvar, const0_rtx),
+			  bb_first_real_insn (from));
 #endif
 	  emit_insn_before (nvptx_gen_vcast (pvar), tail);
 	}


Re: [PATCH], Fix PR target/pr83862: Fix PowerPC long double signbit with -mabi=ieeelongdouble

2018-01-17 Thread Michael Meissner
On Wed, Jan 17, 2018 at 04:09:57PM -0600, Segher Boessenkool wrote:
> On Tue, Jan 16, 2018 at 10:55:43PM -0500, Michael Meissner wrote:
> > PR target/83862 pointed out a problem I put into the 128-bit floating point
> > type signbit optimization.  The issue is we want to avoid doing a load to a
> > floating point/vector register and then a direct move to do signbit, so we
> > change the load to load the upper 64-bits of the floating point value to get
> > the sign bit.  Unfortunately, if the type is IEEE 128-bit and memory is
> > addressed with an indexed address on a little endian system, it generates an
> > illegal address and generates an internal compiler error.
> 
> So all this is caused by these splitters running after reload.  Why do
> we have to do that?  Do we?  We should be able to just change it to a
> subreg and shift, early already?

The part that is failing is trying to optimize the case:

x = signbit (*p)

Doing the code with just registers means that you will get:

vr = load *p
gr = direct move from vr
shift

With the optimization you get:

gr = 'upper' word
shift

If the address is:

base + index

In little endian, the compiler tried to generate:

(base + index) + 8

And it didn't have a temporary register to put base+index.  It only shows up on
a little endian system on a type that does REG+REG addressing.  IBM extended
double does not allow REG+REG addressing, so it wasn't an issue for that.  But
with GLIBC starting to test -mabi=ieeelongdouble, it showed up.

I believe when I worked on it, we were mostly big endian, and the IEEE stuff
wasn't yet completely solid, so it was a thinko on my part.

I did the memory optimization to a GPR and avoided the direct move because
signbit is used frequently in the GLIBC math library (and direct moves on
power8 are on the slow side).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH], Set PowerPC .gnu_attribute for long double use if no call

2018-01-17 Thread Michael Meissner
On Fri, Jan 12, 2018 at 11:21:04AM -0600, Segher Boessenkool wrote:
> On Thu, Jan 11, 2018 at 01:11:05PM -0500, Michael Meissner wrote:
> > In working on the transition of PowerPC long double from using the IBM 
> > extended
> > double format to IEEE 128-bit floating point, I noticed that the long double
> > .gnu_attribute (#4) was not set if the compiler can handle long double 
> > directly
> > without doing the call to an emulator, such as using IEEE 128-bit floating
> > point on an ISA 3.0 (power9) 64-bit system.  This patch sets the attribute 
> > if
> > there is a move of the appropriate type.  I only check TF/TCmode for the 
> > normal
> > case, and DF/DCmode for -mlong-double-64, since IFmode is used for __ibm128
> > when long double is IEEE and KFmode is used for __float128 when long double 
> > is
> > IEEE.
> > 
> > I have checked this on a little endian power8 system with bootstrap and make
> > check.  There were no regressions, and I verified that the three new tests 
> > are
> > run and pass.  Can I check this into the trunk?
> 
> > [gcc]
> > 2018-01-11  Michael Meissner  
> > 
> > (rs6000_emit_move): If we load or store a long double type, set
> > the flags for noting the default long double type, even if we
> > don't pass or return a long double type.
> > 
> > [gcc/testsuite]
> > 2018-01-11  Michael Meissner  
> > 
> > * gcc.target/powerpc/gnuattr1.c: New test to make sure we set the
> > appropriate .gnu_attribute for the long double type, if we use the
> > long double type, but do not generate any calls.
> > * gcc.target/powerpc/gnuattr2.c: Likewise.
> > * gcc.target/powerpc/gnuattr3.c: Likewise.
> 
> 
> > +  if (rs6000_gnu_attr
> > +  && ((HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE || TARGET_64BIT))
> 
> One pair of parens is enough ;-)
> 
> > +  && ((TARGET_LONG_DOUBLE_128
> > +  && (mode == TFmode || mode == TCmode))
> > + || (!TARGET_LONG_DOUBLE_128
> > + && (mode == DFmode || mode == DCmode
> 
> It's easier to read if you join these lines pairwise:
> 
> > +  && ((TARGET_LONG_DOUBLE_128 && (mode == TFmode || mode == TCmode))
> > + || (!TARGET_LONG_DOUBLE_128 && (mode == DFmode || mode == DCmode
> 
> Or maybe something with ?:, or break the statement into multiple.
> 
> Okay for trunk if you make it a bit more readable :-)  Thanks,

This is what I just checked in:

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 256810)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -10493,6 +10493,23 @@ rs6000_emit_move (rtx dest, rtx source,
   gcc_unreachable ();
 }
 
+#ifdef HAVE_AS_GNU_ATTRIBUTE
+  /* If we use a long double type, set the flags in .gnu_attribute that say
+ what the long double type is.  This is to allow the linker's warning
+ message for the wrong long double to be useful, even if the function does
+ not do a call (for example, doing a 128-bit add on power9 if the long
+ double type is IEEE 128-bit).  Do not set this if __ibm128 or __float128 are
+ used if they aren't the default long double type.  */
+  if (rs6000_gnu_attr && (HAVE_LD_PPC_GNU_ATTR_LONG_DOUBLE || TARGET_64BIT))
+{
+  if (TARGET_LONG_DOUBLE_128 && (mode == TFmode || mode == TCmode))
+   rs6000_passes_float = rs6000_passes_long_double = true;
+
+  else if (!TARGET_LONG_DOUBLE_128 && (mode == DFmode || mode == DCmode))
+   rs6000_passes_float = rs6000_passes_long_double = true;
+}
+#endif
+
   /* See if we need to special case SImode/SFmode SUBREG moves.  */
   if ((mode == SImode || mode == SFmode) && SUBREG_P (source)
   && rs6000_emit_move_si_sf_subreg (dest, source, mode))

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[rs6000] ira issue hidden by gimple folding.

2018-01-17 Thread Will Schmidt
Hi folks, 

(I wanted to get this to the list before my EOD..)  :-)

This is a simplified test that is failing for me on Power8, BE, when
gimple-folding is disabled.   
I noticed this while working testcase patches for the mergehl folding,
but this is a pre-existing issue.
The majority of the builtins-1-be.c test is OK, so possibly just this
one intrinsic that has the underlying issue.

--><--

/* { dg-do compile { target { powerpc64-*-* } } } */
/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power8" } } */

// ICE on power8 BE in ira code with -mno-fold-gimple.  Works OK with -mfold-gimple.

/* { dg-options "-mcpu=power8 -O0 -mno-fold-gimple" } */

#include 

int main ()
{

  vector long long l3 = {5L, 14L};
  vector long long l4 = {7L, 24L};
  vector long long l7 = vec_div (l3, l4);

return 0;
}


/home/willschm/gcc/build/gcc-mainline-baseline/gcc/xgcc 
-B/home/willschm/gcc/build/gcc-mainline-baseline/gcc/ 
/home/willschm/gcc/testme.c  -fno-diagnostics-show-caret 
-fdiagnostics-color=never -mcpu=power8 -O0 -ffat-lto-objects -S -m32  -o 
builtins-1-be.s -dap -mno-fold-gimple  -da 
gimple folding of rs6000 builtins has been disabled.
during RTL pass: ira
dump file: testme.c.278r.ira
/home/willschm/gcc/testme.c: In function ‘main’:
/home/willschm/gcc/testme.c:16:1: internal compiler error: in 
elimination_costs_in_insn, at reload1.c:3633
0x108a05af elimination_costs_in_insn
/home/willschm/gcc/gcc-mainline-baseline/gcc/reload1.c:3630
0x108a8be7 calculate_elim_costs_all_insns()
/home/willschm/gcc/gcc-mainline-baseline/gcc/reload1.c:1607
0x106f79b7 ira_costs()
/home/willschm/gcc/gcc-mainline-baseline/gcc/ira-costs.c:2249
0x106ef2d3 ira_build()
/home/willschm/gcc/gcc-mainline-baseline/gcc/ira-build.c:3421
0x106e30af ira
/home/willschm/gcc/gcc-mainline-baseline/gcc/ira.c:5292
0x106e30af execute
/home/willschm/gcc/gcc-mainline-baseline/gcc/ira.c:5603
Please submit a full bug report,
with preprocessed source if appropriate.




On Wed, 2018-01-17 at 11:31 -0600, Segher Boessenkool wrote:
<...>
> > My regression test results suggest that the addition of the
> > -mno-fold-gimple option to the existing testcases appears to have
> > uncovered an ICE, so pausing for the moment...
> 
> Good luck :-)  If you are reasonably certain the bug is not in your patch
> (but pre-existing), please do commit the patch.
> 
> 
> Segher
> 




Re: [PATCH,PTX] Add support for CUDA 9

2018-01-17 Thread Tom de Vries

On 01/17/2018 06:29 PM, Cesar Philippidis wrote:

Is this patch OK for trunk?


You haven't made the changes I asked for; this is the same patch as
before.


Thanks,
- Tom


Re: [PATCH], Fix PR target/pr83862: Fix PowerPC long double signbit with -mabi=ieeelongdouble

2018-01-17 Thread Segher Boessenkool
On Tue, Jan 16, 2018 at 10:55:43PM -0500, Michael Meissner wrote:
> PR target/83862 pointed out a problem I put into the 128-bit floating point
> type signbit optimization.  The issue is we want to avoid doing a load to a
> floating point/vector register and then a direct move to do signbit, so we
> change the load to load the upper 64-bits of the floating point value to get
> the sign bit.  Unfortunately, if the type is IEEE 128-bit and memory is
> addressed with an indexed address on a little endian system, it generates an
> illegal address and generates an internal compiler error.

So all this is caused by these splitters running after reload.  Why do
we have to do that?  Do we?  We should be able to just change it to a
subreg and shift, early already?


Segher


[PATCH, fortran] PR/83864 - ICE in gfc_apply_init, at fortran/expr.c:4271

2018-01-17 Thread Harald Anlauf
The following obvious patch fixes a NULL pointer dereference:

Index: gcc/fortran/expr.c
===
--- gcc/fortran/expr.c  (revision 256671)
+++ gcc/fortran/expr.c  (working copy)
@@ -4267,7 +4269,7 @@
 gfc_set_constant_character_len (len, init, -1);
   else if (init
   && init->ts.type == BT_CHARACTER
-   && init->ts.u.cl
+   && init->ts.u.cl && init->ts.u.cl->length
&& mpz_cmp (ts->u.cl->length->value.integer,
init->ts.u.cl->length->value.integer))
 {

Regtests without new failures on i686-pc-linux-gnu.
Testcase derived from PR, see below.

Changelog:

2018-01-17  Harald Anlauf  

PR fortran/83864
* expr.c (add_init_expr_to_sym): Do not dereference NULL pointer.


Testcase:

2018-01-17  Harald Anlauf  

PR fortran/83864
* gfortran.dg/pr83864.f90: New test.


Index: gfortran.dg/pr83864.f90
===
--- gfortran.dg/pr83864.f90 (revision 0)
+++ gfortran.dg/pr83864.f90 (revision 0)
@@ -0,0 +1,13 @@
+! { dg-do run }
+! PR fortran/83864
+!
! Derived from the PR.  Contributed by Gerhard Steinmetz 
+!
+program p
+  implicit none
+  type t
+ character :: c(3) = transfer('abc','z',3)
+  end type t
+  type(t) :: x
+  if (any (x%c /= ["a", "b", "c"])) call abort ()
+end

Whoever wants to commit this to 8-trunk, please do so.

Thanks,
Harald


Re: [PATCH] Add clobbers for callee copied argument temporaries (PR sanitizer/81715, PR testsuite/83882)

2018-01-17 Thread John David Anglin

On 2018-01-17 3:07 PM, Jakub Jelinek wrote:

John, do you think you could test this on hppa without the callee copies
default change?

Or should we not care anymore if there aren't any similar targets left?

I'll test, probably starting the build this evening.
I only changed the linux target.  The hpux and bsd targets are still 
callee copies.


Dave

--
John David Anglin  dave.ang...@bell.net



Re: [PATCH v2, rs6000] Implement 32- and 64-bit BE handling for -mno-speculate-indirect-jumps

2018-01-17 Thread Segher Boessenkool
On Tue, Jan 16, 2018 at 08:08:57PM -0600, Bill Schmidt wrote:
> This patch supersedes and extends 
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01479.html,
> adding the remaining big-endian support for -mno-speculate-indirect-jumps.
> This includes 32-bit support for indirect calls and sibling calls, and 
> 64-bit support for indirect calls.  The endian-neutral switch handling has
> already been committed.
> 
> Using -m32 -O2 on safe-indirect-jumps-1.c results in a test for a sibling 
> call, so this has been added as safe-indirect-jumps-8.c.  Also, 
> safe-indirect-jumps-7.c adds a variant that will not generate a sibling 
> call for -m32, so we still get indirect call coverage.
> 
> Bootstrapped and tested on powerpc64-linux-gnu and powerpc64le-linux-gnu 
> with no regressions.  Is this okay for trunk?

Okay for trunk and backports.  A few possible cleanups (okay with or
without, you need to get this in soon):

> -   (set_attr "length" "4,4,8,8")])
> +   (set (attr "length")
> + (cond [(and (eq (symbol_ref "which_alternative") (const_int 0))
> + (eq (symbol_ref "rs6000_speculate_indirect_jumps")
> + (const_int 1)))
> +   (const_string "4")

You could leave out the "4" cases, it's the default.  Might be easier
to read.

I'd use "ne 0" instead of "eq 1", but this will work I guess.

> @@ -10909,7 +10982,13 @@
>  output_asm_insn (\"creqv 6,6,6\", operands);
>  
>if (which_alternative >= 2)
> -return \"b%T0\";
> +{
> +  if (rs6000_speculate_indirect_jumps)
> + return \"b%T0\";

You can write the block as a block ({}) instead of as a string ("*{}")
so you don't need all the backslashes.

Thanks for all the work!


Segher


Re: C++ PATCH for c++/83714, ICE checking return from template

2018-01-17 Thread Jason Merrill
On Wed, Jan 17, 2018 at 3:46 PM, Paolo Carlini  wrote:
> Hi Jason,
>
> On 17/01/2018 00:04, Jason Merrill wrote:
>>
>> Like my recent patch for 83186, we were missing a
>> build_non_dependent_expr.
>>
>> Tested x86_64-pc-linux-gnu, applying to trunk.
>
> Lately I'm seeing (H.J. Lu too) a regression:
>
> FAIL: g++.dg/template/inherit4.C -std=c++11 (test for excess errors)
> FAIL: g++.dg/template/inherit4.C -std=c++14 (test for excess errors)
>
> which seems related to this change of yours: if I comment out the new
> build_non_dependent_expr call the test is accepted again. Could you please
> have a look?

Hmm, wonder why I didn't see that in my testing.  Checking a fix now.

Jason


Go patch committed: Enable escape analysis for runtime

2018-01-17 Thread Ian Lance Taylor
This patch to the Go frontend by Cherry Zhang enables escape analysis
for the runtime package in the Go frontend.  The runtime package was
hard-coded as non-escape, and the escape analysis was not run for the
runtime package.  This patch removes the hard-coding and lets the
escape analysis decide.  Local variables and closures in the runtime
are not allowed to be heap-allocated; this patch adds checks to make
sure that they indeed do not escape.

The escape analysis is always run when compiling the runtime now.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

I'm almost done.  I promise.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 256810)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-3ea7fc3b918210e7248dbc51d90af20639dc4167
+1072286ca9249bd6f75628aead325a66286bcf5b
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 256707)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -873,13 +873,12 @@ escape_hash_match(std::string suffix, st
 void
 Gogo::analyze_escape()
 {
-  if (!optimize_allocation_flag.is_enabled() || saw_errors())
+  if (saw_errors())
 return;
 
-  // Currently runtime is hard-coded to non-escape in various places.
-  // Don't run escape analysis for runtime.
-  // TODO: remove this once it works for runtime.
-  if (this->compiling_runtime() && this->package_name() == "runtime")
+  if (!optimize_allocation_flag.is_enabled()
+  && !this->compiling_runtime())
+// We always run escape analysis when compiling runtime.
 return;
 
   // Discover strongly connected groups of functions to analyze for escape
@@ -1473,6 +1472,35 @@ Escape_analysis_assign::statement(Block*
   return TRAVERSE_SKIP_COMPONENTS;
 }
 
+// Helper function to emit moved-to-heap diagnostics.
+
+static void
+move_to_heap(Gogo* gogo, Expression *expr)
+{
+  Named_object* no;
+  if (expr->var_expression() != NULL)
+no = expr->var_expression()->named_object();
+  else if (expr->enclosed_var_expression() != NULL)
+no = expr->enclosed_var_expression()->variable();
+  else
+return;
+
+  if ((no->is_variable()
+   && !no->var_value()->is_global())
+  || no->is_result_variable())
+{
+  Node* n = Node::make_node(expr);
+  if (gogo->debug_escape_level() != 0)
+go_inform(n->definition_location(),
+  "moved to heap: %s",
+  n->ast_format(gogo).c_str());
+  if (gogo->compiling_runtime() && gogo->package_name() == "runtime")
+go_error_at(expr->location(),
+"%s escapes to heap, not allowed in runtime",
+n->ast_format(gogo).c_str());
+}
+}
+
 // Model expressions within a function as assignments and flows between nodes.
 
 int
@@ -1489,13 +1517,7 @@ Escape_analysis_assign::expression(Expre
   if (debug_level > 1)
go_inform((*pexpr)->location(), "%s too large for stack",
   n->ast_format(gogo).c_str());
-  if (debug_level != 0
-  && ((*pexpr)->var_expression() != NULL
-  || (*pexpr)->enclosed_var_expression() != NULL))
-go_inform(n->definition_location(),
-  "moved to heap: %s",
-  n->ast_format(gogo).c_str());
-
+  move_to_heap(gogo, *pexpr);
   n->set_encoding(Node::ESCAPE_HEAP);
   (*pexpr)->address_taken(true);
   this->assign(this->context_->sink(), n);
@@ -2968,25 +2990,20 @@ Escape_analysis_flood::flood(Level level
  if (src_leaks)
{
  src->set_encoding(Node::ESCAPE_HEAP);
- if (debug_level != 0 && osrcesc != src->encoding())
-   {
-  if (underlying->var_expression() != NULL
-  || underlying->enclosed_var_expression() != NULL)
-go_inform(underlying_node->definition_location(),
-  "moved to heap: %s",
-  underlying_node->ast_format(gogo).c_str());
-
- if (debug_level > 1)
-   go_inform(src->location(),
- "%s escapes to heap, level={%d %d}, "
- "dst.eld=%d, src.eld=%d",
- src->ast_format(gogo).c_str(), level.value(),
- level.suffix_value(), dst_state->loop_depth,
- mod_loop_depth);
- else
-   go_inform(src->location(), "%s escapes to heap",
- src->ast_format(gogo).c_str());
-   }
+  if (osrcesc != src->encoding())
+{
+  move_to_heap(gogo, underlying);
+  if 

Re: [PATCH] PR/83874: ICE initializing character array from derived type

2018-01-17 Thread Steve Kargl
On Wed, Jan 17, 2018 at 10:10:29PM +0100, Harald Anlauf wrote:
> 
> Whoever wants to take it, please commit to 8-trunk.
>

I take it from the above that you do not have write
access to svn.  If you think that you will be taking
on additional bugs, write access can be arranged. 

-- 
Steve


Re: [C++ Patch] PR 78344 ("ICE on invalid c++ code (tree check: expected tree_list, have error_mark in cp_check_const_attributes, at cp/decl2.c:1347")

2018-01-17 Thread Jason Merrill
On Wed, Jan 17, 2018 at 4:07 PM, Jakub Jelinek  wrote:
> On Wed, Jan 17, 2018 at 02:31:29PM -0500, Jason Merrill wrote:
>> > First, thanks for your messages. Personally, at this late time for 8, I 
>> > vote for something like my most recent grokdeclarator fix and yours above 
>> > for 83824. Then, for 9, or even 8.2, the more encompassing change for all 
>> > those chainons. Please both of you let me know how shall we proceed, I 
>> > could certainly take care of the latter too from now on. Thanks again!
>>
>> Let's go ahead with your patch to grokdeclarator.  In the parser,
>> let's do what Jakub is suggesting here:
>>
>> > So, either we want to go with what Paolo posted even in this case,
>> > i.e. turn decl_specs->attributes into error_mark_node if attrs
>> > is error_mark_node, and don't chainon anything to it if 
>> > decl_specs->attributes
>> > is already error_mark_node, e.g. something like:
>> > if (attrs == error_mark_node || decl_specs->attributes == error_mark_node)
>> >   decl_specs->attributes = error_mark_node;
>> > else
>> >   decl_specs->attributes = chainon (decl_specs->attributes, attrs);
>>
>> without any assert.  Putting this logic in an attr_chainon function sounds 
>> good.
>
> So like this?  So far just tested with make check-c++-all on both
> x86_64-linux and i686-linux, full bootstrap/regtest scheduled, ok if it
> passes?  I gave up on the original idea to return void and have the first
> argument pointer, because while many of the calls do x = chainon (x, y);,
>> there are several ones that assign it to something else, like y = chainon (x, y); etc.

Looks good.

Jason


[PATCH] PR/83874: ICE initializing character array from derived type

2018-01-17 Thread Harald Anlauf
The following obvious patch fixes a NULL pointer dereference:

Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c  (revision 256671)
+++ gcc/fortran/decl.c  (working copy)
@@ -1718,7 +1718,7 @@
}
  else if (init->expr_type == EXPR_ARRAY)
{
- if (init->ts.u.cl)
+ if (init->ts.u.cl && init->ts.u.cl->length)
{
  const gfc_expr *length = init->ts.u.cl->length;
  if (length->expr_type != EXPR_CONSTANT)


Regtests without new failures on i686-pc-linux-gnu.
Testcase derived from the PR, see below.

Whoever wants to take it, please commit to 8-trunk.
Due to the nature of the patch, it should be safe to backport
to the 6 and 7 branches.

Thanks,
Harald

---

Changelog:

2018-01-17  Harald Anlauf  

PR fortran/83874
* decl.c (add_init_expr_to_sym): Do not dereference NULL pointer.



Testsuite:

2018-01-17  Harald Anlauf  

PR fortran/83874
* gfortran.dg/pr83874.f90: New test.


Index: gfortran.dg/pr83874.f90
===
--- gfortran.dg/pr83874.f90 (revision 0)
+++ gfortran.dg/pr83874.f90 (revision 0)
@@ -0,0 +1,19 @@
+! { dg-do run }
+! PR fortran/83874
+! There was an ICE while initializing the character arrays
+!
+! Contributed by Harald Anlauf 
+!
+program charinit
+  implicit none
+  type t
+ character(len=1) :: name
+  end type t
+  type(t), parameter :: z(2)= [ t ('a'), t ('b') ]
+  character(len=1), parameter :: names1(*) = z% name
+  character(len=*), parameter :: names2(2) = z% name
+  character(len=*), parameter :: names3(*) = z% name
+  if (.not. (names1(1) == "a" .and. names1(2) == "b")) call abort ()
+  if (.not. (names2(1) == "a" .and. names2(2) == "b")) call abort ()
+  if (.not. (names3(1) == "a" .and. names3(2) == "b")) call abort ()
+end program charinit


Re: [C++ Patch] PR 78344 ("ICE on invalid c++ code (tree check: expected tree_list, have error_mark in cp_check_const_attributes, at cp/decl2.c:1347")

2018-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2018 at 02:31:29PM -0500, Jason Merrill wrote:
> > First, thanks for your messages. Personally, at this late time for 8, I 
> > vote for something like my most recent grokdeclarator fix and yours above 
> > for 83824. Then, for 9, or even 8.2, the more encompassing change for all 
> > those chainons. Please both of you let me know how shall we proceed, I 
> > could certainly take care of the latter too from now on. Thanks again!
> 
> Let's go ahead with your patch to grokdeclarator.  In the parser,
> let's do what Jakub is suggesting here:
> 
> > So, either we want to go with what Paolo posted even in this case,
> > i.e. turn decl_specs->attributes into error_mark_node if attrs
> > is error_mark_node, and don't chainon anything to it if 
> > decl_specs->attributes
> > is already error_mark_node, e.g. something like:
> > if (attrs == error_mark_node || decl_specs->attributes == error_mark_node)
> >   decl_specs->attributes = error_mark_node;
> > else
> >   decl_specs->attributes = chainon (decl_specs->attributes, attrs);
> 
> without any assert.  Putting this logic in an attr_chainon function sounds 
> good.

So like this?  So far just tested with make check-c++-all on both
x86_64-linux and i686-linux, full bootstrap/regtest scheduled, ok if it
passes?  I gave up on the original idea to return void and have the first
argument pointer, because while many of the calls do x = chainon (x, y);,
there are several ones that assign it to something else, like y = chainon (x, y); etc.

2018-01-17  Jakub Jelinek  

PR c++/83824
* parser.c (attr_chainon): New function.
(cp_parser_label_for_labeled_statement, cp_parser_decl_specifier_seq,
cp_parser_namespace_definition, cp_parser_init_declarator,
cp_parser_type_specifier_seq, cp_parser_parameter_declaration,
cp_parser_gnu_attributes_opt): Use it.
(cp_parser_member_declaration, cp_parser_objc_class_ivars,
cp_parser_objc_struct_declaration): Likewise.  Don't reset
prefix_attributes if attributes is error_mark_node.

* g++.dg/cpp0x/pr83824.C: New test.

--- gcc/cp/parser.c.jj  2018-01-13 17:57:38.115836072 +0100
+++ gcc/cp/parser.c 2018-01-17 20:46:21.809738257 +0100
@@ -10908,6 +10908,18 @@ cp_parser_statement (cp_parser* parser,
"attributes at the beginning of statement are ignored");
 }
 
+/* Append ATTR to attribute list ATTRS.  */
+
+static tree
+attr_chainon (tree attrs, tree attr)
+{
+  if (attrs == error_mark_node)
+return error_mark_node;
+  if (attr == error_mark_node)
+return error_mark_node;
+  return chainon (attrs, attr);
+}
+
 /* Parse the label for a labeled-statement, i.e.
 
identifier :
@@ -11027,7 +11039,7 @@ cp_parser_label_for_labeled_statement (c
   else if (!cp_parser_parse_definitely (parser))
;
   else
-   attributes = chainon (attributes, attrs);
+   attributes = attr_chainon (attributes, attrs);
 }
 
   if (attributes != NULL_TREE)
@@ -13394,8 +13406,7 @@ cp_parser_decl_specifier_seq (cp_parser*
  else
{
  decl_specs->std_attributes
-   = chainon (decl_specs->std_attributes,
-  attrs);
+   = attr_chainon (decl_specs->std_attributes, attrs);
  if (decl_specs->locations[ds_std_attribute] == 0)
decl_specs->locations[ds_std_attribute] = 
token->location;
}
@@ -13403,9 +13414,8 @@ cp_parser_decl_specifier_seq (cp_parser*
}
}
 
-   decl_specs->attributes
- = chainon (decl_specs->attributes,
-attrs);
+ decl_specs->attributes
+   = attr_chainon (decl_specs->attributes, attrs);
  if (decl_specs->locations[ds_attribute] == 0)
decl_specs->locations[ds_attribute] = token->location;
  continue;
@@ -18471,7 +18481,7 @@ cp_parser_namespace_definition (cp_parse
  identifier = cp_parser_identifier (parser);
 
  /* Parse any attributes specified after the identifier.  */
- attribs = chainon (attribs, cp_parser_attributes_opt (parser));
+ attribs = attr_chainon (attribs, cp_parser_attributes_opt (parser));
}
 
   if (cp_lexer_next_token_is_not (parser->lexer, CPP_SCOPE))
@@ -19633,7 +19643,7 @@ cp_parser_init_declarator (cp_parser* pa
   decl = grokfield (declarator, decl_specifiers,
initializer, !is_non_constant_init,
/*asmspec=*/NULL_TREE,
-   chainon (attributes, prefix_attributes));
+   attr_chainon (attributes, prefix_attributes));
   if (decl && TREE_CODE (decl) == FUNCTION_DECL)
cp_parser_save_default_args (parser, decl);
   cp_finalize_omp_declare_simd (parser, decl);
@@ -21007,9 +21017,9 @@ cp_parser_type_specifier_seq (cp_parser*
  

Re: Compilation warning in simple-object-xcoff.c

2018-01-17 Thread DJ Delorie
Eli Zaretskii  writes:

> DJ, would the following semi-kludgey workaround be acceptable?

It would be no worse than what we have now, if the only purpose is to
avoid a warning.

Ideally, we would check to see if we're discarding non-zero values from
that offset, and not call the callback with known bogus data.  I suppose
the usefulness of that depends on how often you'll encounter 4Gb+ xcoff64
files on mingw32?


Re: C++ PATCH for c++/83714, ICE checking return from template

2018-01-17 Thread Paolo Carlini

Hi Jason,

On 17/01/2018 00:04, Jason Merrill wrote:

Like my recent patch for 83186, we were missing a build_non_dependent_expr.

Tested x86_64-pc-linux-gnu, applying to trunk.

Lately I'm seeing (H.J. Lu too) a regression:

FAIL: g++.dg/template/inherit4.C -std=c++11 (test for excess errors)
FAIL: g++.dg/template/inherit4.C -std=c++14 (test for excess errors)

which seems related to this change of yours: if I comment out the new 
build_non_dependent_expr call the test is accepted again. Could you 
please have a look?


Thanks!
Paolo.


Fix ICE with flatten on uninlinable call

2018-01-17 Thread Jan Hubicka
Hi,
this patch fixes ICE where we manage to lose the CIF_FINAL failure on inlining.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

PR ipa/83051
* ipa-inline.c (flatten_function): Do not overwrite final inlining
failure.
* gcc.c-torture/compile/pr83051-2.c: New testcase.
Index: ipa-inline.c
===
--- ipa-inline.c(revision 256795)
+++ ipa-inline.c(working copy)
@@ -2083,7 +2083,8 @@ flatten_function (struct cgraph_node *no
 "Not inlining %s into %s to avoid cycle.\n",
 xstrdup_for_dump (callee->name ()),
 xstrdup_for_dump (e->caller->name ()));
- e->inline_failed = CIF_RECURSIVE_INLINING;
+ if (cgraph_inline_failed_type (e->inline_failed) != CIF_FINAL_ERROR)
+   e->inline_failed = CIF_RECURSIVE_INLINING;
  continue;
}
 
Index: testsuite/gcc.c-torture/compile/pr83051-2.c
===
--- testsuite/gcc.c-torture/compile/pr83051-2.c (revision 0)
+++ testsuite/gcc.c-torture/compile/pr83051-2.c (working copy)
@@ -0,0 +1,12 @@
+/* { dg-options "-fno-early-inlining" } */
+void
+bar ()
+{
+  bar (0);
+}
+
+__attribute__ ((flatten))
+void foo ()
+{
+  bar ();
+}


Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
On Thursday 18 January 2018 01:11 AM, Jim Wilson wrote:
> This is the only solution I found that worked.

I tried a few things and ended up with pretty much the same fix you have
except with the check in a slightly different place.  That is, I used
aarch64_classify_address to gate the change because I intend to use that
to gate pre/post-increments for stores as well for falkor.

I also fixed up my submission to incorporate your more recent changes,
which was to add an extra tuning option instead of just checking for falkor.

Siddhesh

2018-xx-xx  Jim Wilson  
Kugan Vivenakandarajah  
Siddhesh Poyarekar  

gcc/
* gcc/config/aarch64/aarch64-protos.h (aarch64_addr_query_type):
New member ADDR_QUERY_STR.
* gcc/config/aarch64/aarch64-tuning-flags.def
(SLOW_REGOFFSET_QUADWORD_STORE): New.
* gcc/config/aarch64/aarch64.c (qdf24xx_tunings): Add
SLOW_REGOFFSET_QUADWORD_STORE to tuning flags.
(aarch64_classify_address): Avoid register indexing for quad
mode stores when SLOW_REGOFFSET_QUADWORD_STORE is set.
* gcc/config/aarch64/constraints.md (Uts): New constraint.
* gcc/config/aarch64/aarch64.md (movti_aarch64, movtf_aarch64):
Use it.
* gcc/config/aarch64/aarch64-simd.md (aarch64_simd_mov):
Likewise.

gcc/testsuite/
* gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 159bc6aee7e..15924fc3f58 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -120,6 +120,9 @@ enum aarch64_symbol_type
ADDR_QUERY_LDP_STP
   Query what is valid for a load/store pair.
 
+   ADDR_QUERY_STR
+  Query what is valid for a store.
+
ADDR_QUERY_ANY
   Query what is valid for at least one memory constraint, which may
   allow things that "m" doesn't.  For example, the SVE LDR and STR
@@ -128,6 +131,7 @@ enum aarch64_symbol_type
 enum aarch64_addr_query_type {
   ADDR_QUERY_M,
   ADDR_QUERY_LDP_STP,
+  ADDR_QUERY_STR,
   ADDR_QUERY_ANY
 };
 
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 3d1f6a01cb7..48d92702723 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -131,9 +131,9 @@
 
 (define_insn "*aarch64_simd_mov"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, Umq,  m,  w, ?r, ?w, ?r, w")
+   "=w, Umq, Uts,  w, ?r, ?w, ?r, w")
(match_operand:VQ 1 "general_operand"
-   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz,w,  w,  w,  r,  r, Dn"))]
   "TARGET_SIMD
&& (register_operand (operands[0], mode)
|| aarch64_simd_reg_or_zero (operands[1], mode))"
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index ea9ead234cb..04baf5b6de6 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -41,4 +41,8 @@ AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
are not considered cheap.  */
 AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 
+/* Don't use a register offset in a memory address for a quad-word store.  */
+AARCH64_EXTRA_TUNING_OPTION ("slow_regoffset_quadword_store",
+SLOW_REGOFFSET_QUADWORD_STORE)
+
 #undef AARCH64_EXTRA_TUNING_OPTION
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0599a79bfeb..d17f3b70271 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -894,7 +894,7 @@ static const struct tune_params qdf24xx_tunings =
   2,   /* min_div_recip_mul_df.  */
   0,   /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE),   /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE),  /* tune_flags.  */
  &qdf24xx_prefetch_tune
 };
 
@@ -5531,6 +5531,16 @@ aarch64_classify_address (struct aarch64_address_info *info,
|| vec_flags == VEC_ADVSIMD
|| vec_flags == VEC_SVE_DATA));
 
+  /* Avoid register indexing for 128-bit stores when the
+ AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE option is set.  */
+  if (!optimize_size
+  && type == ADDR_QUERY_STR
+  && (aarch64_tune_params.extra_tuning_flags
+ & AARCH64_EXTRA_TUNE_SLOW_REGOFFSET_QUADWORD_STORE)
+  && (mode == TImode || mode == TFmode
+ || aarch64_vector_data_mode_p (mode)))
+allow_reg_index_p = false;
+
   /* For SVE, only accept [Rn], [Rn, Rm, LSL #shift] and
  [Rn, #offset, MUL VL].  */
   if ((vec_flags & (VEC_SVE_DATA | VEC_SVE_PRED)) != 0
diff --git a/gcc/config/aarch64/aarch64.md 

[PATCH] Avoid creating overflows in match.pd (P + A) - (P + B) POINTER_DIFF_EXPR optimization (PR c/61240)

2018-01-17 Thread Jakub Jelinek
Hi!

POINTER_DIFF_EXPR returns a signed integer, but sadly POINTER_PLUS_EXPR
second arguments are unsigned integers, so if we are adding negative numbers
to pointers, those are very large numbers and we get TREE_OVERFLOW which the
C FE then during c_fully_fold diagnoses.

We want to treat the numbers as signed integers for this purpose; using
a VCE seems the easiest way to get rid of the unwanted overflows.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Or do you prefer manual fold_converts in the (with { }) block + test
for INTEGER_CST && TREE_OVERFLOW + drop_tree_overflow instead?

2018-01-17  Jakub Jelinek  

PR c/61240
* match.pd ((P + A) - P, P - (P + A), (P + A) - (P + B)): For
pointer_diff optimizations use view_convert instead of convert.

* gcc.dg/pr61240.c: New test.

--- gcc/match.pd.jj 2018-01-15 10:02:04.0 +0100
+++ gcc/match.pd2018-01-17 17:10:54.855061485 +0100
@@ -1832,7 +1832,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* The second argument of pointer_plus must be interpreted as signed, and
thus sign-extended if necessary.  */
 (with { tree stype = signed_type_for (TREE_TYPE (@1)); }
- (convert (convert:stype @1
+ /* Use view_convert instead of convert here, as POINTER_PLUS_EXPR
+   second arg is unsigned even when we need to consider it as signed,
+   we don't want to diagnose overflow here.  */
+ (convert (view_convert:stype @1
 
   /* (T)P - (T)(P + A) -> -(T) A */
   (simplify
@@ -1876,7 +1879,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* The second argument of pointer_plus must be interpreted as signed, and
thus sign-extended if necessary.  */
 (with { tree stype = signed_type_for (TREE_TYPE (@1)); }
- (negate (convert (convert:stype @1)
+ /* Use view_convert instead of convert here, as POINTER_PLUS_EXPR
+   second arg is unsigned even when we need to consider it as signed,
+   we don't want to diagnose overflow here.  */
+ (negate (convert (view_convert:stype @1)
 
   /* (T)(P + A) - (T)(P + B) -> (T)A - (T)B */
   (simplify
@@ -1927,7 +1933,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* The second argument of pointer_plus must be interpreted as signed, and
thus sign-extended if necessary.  */
 (with { tree stype = signed_type_for (TREE_TYPE (@1)); }
- (minus (convert (convert:stype @1)) (convert (convert:stype @2)))
+ /* Use view_convert instead of convert here, as POINTER_PLUS_EXPR
+   second arg is unsigned even when we need to consider it as signed,
+   we don't want to diagnose overflow here.  */
+ (minus (convert (view_convert:stype @1))
+   (convert (view_convert:stype @2)))
 
 
 /* Simplifications of MIN_EXPR, MAX_EXPR, fmin() and fmax().  */
--- gcc/testsuite/gcc.dg/pr61240.c.jj   2018-01-17 17:25:45.821030898 +0100
+++ gcc/testsuite/gcc.dg/pr61240.c  2018-01-17 17:26:24.118029550 +0100
@@ -0,0 +1,20 @@
+/* PR c/61240 */
+/* { dg-do compile } */
+
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+
+ptrdiff_t
+foo (ptrdiff_t a[4])
+{
+  int i[4];
+  int *p = i + 2;
+  static ptrdiff_t b = p - (p - 1);	/* { dg-bogus "integer overflow in expression" } */
+  static ptrdiff_t c = (p - 1) - p;	/* { dg-bogus "integer overflow in expression" } */
+  static ptrdiff_t d = (p - 2) - (p - 1);	/* { dg-bogus "integer overflow in expression" } */
+  static ptrdiff_t e = (p - 1) - (p - 2);	/* { dg-bogus "integer overflow in expression" } */
+  a[0] = p - (p - 1);	/* { dg-bogus "integer overflow in expression" } */
+  a[1] = (p - 1) - p;	/* { dg-bogus "integer overflow in expression" } */
+  a[2] = (p - 2) - (p - 1);	/* { dg-bogus "integer overflow in expression" } */
+  a[3] = (p - 1) - (p - 2);	/* { dg-bogus "integer overflow in expression" } */
+  return b + c + d + e;
+}

Jakub


Re: [C++ PATCH] Avoid appending a useless __builtin_unreachable in functions which do return (PR c++/83897)

2018-01-17 Thread Jason Merrill
OK.

On Wed, Jan 17, 2018 at 3:10 PM, Jakub Jelinek  wrote:
> Hi!
>
> I've noticed several testcases recently that have a dead
> __builtin_unreachable call in functions right after return stmt.
> They are optimized away after a while (typically in the cfg pass), but we
> don't really need to generate them when there is the return.
> The reason we don't find it is because it is wrapped in some cases in
> CLEANUP_POINT_EXPR.  That doesn't change anything on the function actually
> ending with a return.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2018-01-17  Jakub Jelinek  
>
> PR c++/83897
> * cp-gimplify.c (cp_maybe_instrument_return): Handle
> CLEANUP_POINT_EXPR.
>
> * g++.dg/cpp0x/pr83897.C: New test.
>
> --- gcc/cp/cp-gimplify.c.jj 2018-01-11 18:58:48.348391787 +0100
> +++ gcc/cp/cp-gimplify.c2018-01-16 17:24:41.087336680 +0100
> @@ -1581,6 +1581,7 @@ cp_maybe_instrument_return (tree fndecl)
>   t = BIND_EXPR_BODY (t);
>   continue;
> case TRY_FINALLY_EXPR:
> +   case CLEANUP_POINT_EXPR:
>   t = TREE_OPERAND (t, 0);
>   continue;
> case STATEMENT_LIST:
> --- gcc/testsuite/g++.dg/cpp0x/pr83897.C.jj	2018-01-16 17:41:54.723256147 +0100
> +++ gcc/testsuite/g++.dg/cpp0x/pr83897.C	2018-01-16 17:41:34.274257947 +0100
> @@ -0,0 +1,13 @@
> +// PR c++/83897
> +// { dg-do compile { target c++11 } }
> +// { dg-options "-O2 -fdump-tree-gimple" }
> +// { dg-final { scan-tree-dump-not "__builtin_unreachable" "gimple" } }
> +
> +struct A {};
> +struct B { int a; int b = 5; };
> +
> +A
> +bar (B)
> +{
> +  return {};
> +}
>
> Jakub


[C++ PATCH] Avoid appending a useless __builtin_unreachable in functions which do return (PR c++/83897)

2018-01-17 Thread Jakub Jelinek
Hi!

I've noticed several testcases recently that have a dead
__builtin_unreachable call in functions right after return stmt.
They are optimized away after a while (typically in the cfg pass), but we
don't really need to generate them when there is the return.
The reason we don't find it is because it is wrapped in some cases in
CLEANUP_POINT_EXPR.  That doesn't change anything on the function actually
ending with a return.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-17  Jakub Jelinek  

PR c++/83897
* cp-gimplify.c (cp_maybe_instrument_return): Handle
CLEANUP_POINT_EXPR.

* g++.dg/cpp0x/pr83897.C: New test.

--- gcc/cp/cp-gimplify.c.jj 2018-01-11 18:58:48.348391787 +0100
+++ gcc/cp/cp-gimplify.c2018-01-16 17:24:41.087336680 +0100
@@ -1581,6 +1581,7 @@ cp_maybe_instrument_return (tree fndecl)
  t = BIND_EXPR_BODY (t);
  continue;
case TRY_FINALLY_EXPR:
+   case CLEANUP_POINT_EXPR:
  t = TREE_OPERAND (t, 0);
  continue;
case STATEMENT_LIST:
--- gcc/testsuite/g++.dg/cpp0x/pr83897.C.jj	2018-01-16 17:41:54.723256147 +0100
+++ gcc/testsuite/g++.dg/cpp0x/pr83897.C	2018-01-16 17:41:34.274257947 +0100
@@ -0,0 +1,13 @@
+// PR c++/83897
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -fdump-tree-gimple" }
+// { dg-final { scan-tree-dump-not "__builtin_unreachable" "gimple" } }
+
+struct A {};
+struct B { int a; int b = 5; };
+
+A
+bar (B)
+{
+  return {};
+}

Jakub


[PATCH] Add clobbers for callee copied argument temporaries (PR sanitizer/81715, PR testsuite/83882)

2018-01-17 Thread Jakub Jelinek
Hi!

PR83882 complains that the PR81715 testcase fails on targets where
parameters are callee-copied.  The following patch ought to fix that, but I have only
bootstrapped/regtested it on x86_64-linux and i686-linux + on the testcase
with hppa.

John, do you think you could test this on hppa without the callee copies
default change?

Or should we not care anymore if there aren't any similar targets left?

2018-01-17  Jakub Jelinek  

PR sanitizer/81715
PR testsuite/83882
* function.h (gimplify_parameters): Add gimple_seq * argument.
* function.c: Include gimple.h and options.h.
(gimplify_parameters): Add cleanup argument, add CLOBBER stmts
for the added local temporaries if needed.
* gimplify.c (gimplify_body): Adjust gimplify_parameters caller,
if there are any parameter cleanups, wrap whole body into a
try/finally with the cleanups.

--- gcc/function.h.jj   2018-01-03 10:19:53.858533740 +0100
+++ gcc/function.h  2018-01-16 14:31:21.409972177 +0100
@@ -607,7 +607,7 @@ extern bool initial_value_entry (int i,
 extern void instantiate_decl_rtl (rtx x);
 extern int aggregate_value_p (const_tree, const_tree);
 extern bool use_register_for_decl (const_tree);
-extern gimple_seq gimplify_parameters (void);
+extern gimple_seq gimplify_parameters (gimple_seq *);
 extern void locate_and_pad_parm (machine_mode, tree, int, int, int,
 tree, struct args_size *,
 struct locate_and_pad_arg_data *);
--- gcc/function.c.jj   2018-01-12 11:35:48.901222595 +0100
+++ gcc/function.c  2018-01-16 15:13:22.165665047 +0100
@@ -79,6 +79,8 @@ along with GCC; see the file COPYING3.
 #include "tree-ssa.h"
 #include "stringpool.h"
 #include "attribs.h"
+#include "gimple.h"
+#include "options.h"
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -3993,7 +3995,7 @@ gimplify_parm_type (tree *tp, int *walk_
statements to add to the beginning of the function.  */
 
 gimple_seq
-gimplify_parameters (void)
+gimplify_parameters (gimple_seq *cleanup)
 {
   struct assign_parm_data_all all;
   tree parm;
@@ -4058,6 +4060,16 @@ gimplify_parameters (void)
  else if (TREE_CODE (type) == COMPLEX_TYPE
   || TREE_CODE (type) == VECTOR_TYPE)
DECL_GIMPLE_REG_P (local) = 1;
+
+ if (!is_gimple_reg (local)
+ && flag_stack_reuse != SR_NONE)
+   {
+ tree clobber = build_constructor (type, NULL);
+ gimple *clobber_stmt;
+ TREE_THIS_VOLATILE (clobber) = 1;
+ clobber_stmt = gimple_build_assign (local, clobber);
+ gimple_seq_add_stmt (cleanup, clobber_stmt);
+   }
}
  else
{
--- gcc/gimplify.c.jj   2018-01-16 12:21:15.895859416 +0100
+++ gcc/gimplify.c  2018-01-16 14:41:27.643872081 +0100
@@ -12589,7 +12589,7 @@ gbind *
 gimplify_body (tree fndecl, bool do_parms)
 {
   location_t saved_location = input_location;
-  gimple_seq parm_stmts, seq;
+  gimple_seq parm_stmts, parm_cleanup = NULL, seq;
   gimple *outer_stmt;
   gbind *outer_bind;
   struct cgraph_node *cgn;
@@ -12628,7 +12628,7 @@ gimplify_body (tree fndecl, bool do_parm
 
   /* Resolve callee-copies.  This has to be done before processing
  the body so that DECL_VALUE_EXPR gets processed correctly.  */
-  parm_stmts = do_parms ? gimplify_parameters () : NULL;
+  parm_stmts = do_parms ? gimplify_parameters (&parm_cleanup) : NULL;
 
   /* Gimplify the function's body.  */
   seq = NULL;
@@ -12657,6 +12657,13 @@ gimplify_body (tree fndecl, bool do_parm
   tree parm;
 
   gimplify_seq_add_seq (&parm_stmts, gimple_bind_body (outer_bind));
+  if (parm_cleanup)
+   {
+ gtry *g = gimple_build_try (parm_stmts, parm_cleanup,
+ GIMPLE_TRY_FINALLY);
+ parm_stmts = NULL;
+ gimple_seq_add_stmt (&parm_stmts, g);
+   }
   gimple_bind_set_body (outer_bind, parm_stmts);
 
   for (parm = DECL_ARGUMENTS (current_function_decl);

Jakub


Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Jim Wilson

On 01/17/2018 05:37 AM, Wilco Dijkstra wrote:

In general I think the best way to achieve this would be to use the
existing cost models which are there for exactly this purpose. If
this doesn't work well enough then we should fix those. 


I tried using cost models, and this didn't work, because the costs don't 
allow us to distinguish between loads and stores.  If you mark reg+reg 
as expensive, you get a performance loss from losing the loads, and a 
performance gain from losing the stores, and they cancel each other out.



this patch disables a whole class of instructions for a specific
target rather than simply telling GCC that they are expensive and
should only be used if there is no cheaper alternative.


This is the only solution I found that worked.


Also there is potential impact on generic code from:

  (define_insn "*aarch64_simd_mov"
[(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, Umq,  m,  w, ?r, ?w, ?r, w")
+   "=w, Umq, Utf,  w, ?r, ?w, ?r, w")
(match_operand:VQ 1 "general_operand"
-   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz,w,  w,  w,  r,  r, Dn"))]

It seems an 'm' constraint has special meaning in the register allocator,
using a different constraint can block certain simplifications (for example
merging stack offsets into load/store in the post-reload cleanup pass),
so we'd need to verify this doesn't cause regressions.


No optimizer should be checking for 'm'.  They should be checking for 
CT_MEMORY, which indicates a constraint that accepts memory.  Utf is 
properly marked as a memory constraint.


I did some testing to verify that the patch would not affect other 
aarch64 targets at the time, though I don't recall now exactly what I did.


Jim




[libsanitizer] Guard against undefined weak symbols before Mac OS X 10.9 (PR sanitizer/82825)

2018-01-17 Thread Rainer Orth
As described in the PR, older versions of Mac OS X don't reliably
support undefined weak symbols, leading to hundreds of testsuite
failures for the sanitizers.  The following patch has been approved
upstream (https://reviews.llvm.org/D41346), but not been applied yet.
To fit into the gcc tree, it had to be modified slightly to account for
the Solaris sanitizer port which isn't in our tree yet.

Jakub suggested in the PR to apply the patch now, so that's what I'm
doing here.  I'll insert the upstream revision number in the ChangeLog
once it has been applied there.

Tested on x86_64-apple-darwin11.4.2.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-01-13  Rainer Orth  

PR sanitizer/82825
* sanitizer_common/sanitizer_internal_defs.h: Cherry-pick upstream
r??.

# HG changeset patch
# Parent  5a47a5ded615a8757277869e03afe2337e6de3d4
Guard against undefined weak symbols before Mac OS X 10.9 (PR sanitizer/82825)

diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
--- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
+++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h
@@ -63,7 +63,13 @@
 // SANITIZER_SUPPORTS_WEAK_HOOKS means that we support real weak functions that
 // will evaluate to a null pointer when not defined.
 #ifndef SANITIZER_SUPPORTS_WEAK_HOOKS
-#if (SANITIZER_LINUX || SANITIZER_MAC) && !SANITIZER_GO
+#if SANITIZER_LINUX && !SANITIZER_GO
+# define SANITIZER_SUPPORTS_WEAK_HOOKS 1
+// Before Xcode 4.5, the Darwin linker doesn't reliably support undefined
+// weak symbols.  Mac OS X 10.9/Darwin 13 is the first release only supported
+// by Xcode >= 4.5.
+#elif SANITIZER_MAC && \
+__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ >= 1090 && !SANITIZER_GO
 # define SANITIZER_SUPPORTS_WEAK_HOOKS 1
 #else
 # define SANITIZER_SUPPORTS_WEAK_HOOKS 0


Re: [C++ Patch] PR 78344 ("ICE on invalid c++ code (tree check: expected tree_list, have error_mark in cp_check_const_attributes, at cp/decl2.c:1347")

2018-01-17 Thread Jason Merrill
On Sat, Jan 13, 2018 at 7:59 AM, Paolo Carlini  wrote:
> Hi Jakub, all,
>
>> On 13 Jan 2018, at 12:32, Jakub Jelinek  wrote:
>>
>>> On Sat, Jan 13, 2018 at 12:12:02PM +0100, Jakub Jelinek wrote:
>>> Or we could not add those error_mark_nodes and
>>>  gcc_assert (seen_error () || cp_parser_error_occurred (parser));
>>
>> This fixes the testcase:
>> --- gcc/cp/parser.c.jj2018-01-11 18:58:48.386391801 +0100
>> +++ gcc/cp/parser.c2018-01-13 12:17:20.545347195 +0100
>> @@ -13403,6 +13403,9 @@ cp_parser_decl_specifier_seq (cp_parser*
>>}
>>}
>>
>> +  if (attrs == error_mark_node)
>> +gcc_assert (seen_error () || cp_parser_error_occurred (parser));
>> +  else
>>decl_specs->attributes
>>  = chainon (decl_specs->attributes,
>> attrs);
>> but there are 13 chainon calls like this in parser.c.  Shouldn't we introduce
>> a helper function for this, perhaps:
>>
>> void
>> attr_chainon (cp_parser *parser, tree *attrs, tree attr)
>> {
>>  /* Don't add error_mark_node to the chain, it can't be chained.
>> Assert this is during error recovery.  */
>>  if (attr == error_mark_node)
>>gcc_assert (seen_error () || cp_parser_error_occurred (parser));
>>  else
>>*attrs = chainon (*attrs, attr);
>> }
>>
>> and change all affected spots, like
>>
>>  attr_chainon (parser, &decl_specs->attributes, attrs);
>>
>> ?
>
> First, thanks for your messages. Personally, at this late time for 8, I vote 
> for something like my most recent grokdeclarator fix and yours above for 
> 83824. Then, for 9, or even 8.2, the more encompassing change for all those 
> chainons. Please both of you let me know how shall we proceed, I could 
> certainly take care of the latter too from now on. Thanks again!

Let's go ahead with your patch to grokdeclarator.  In the parser,
let's do what Jakub is suggesting here:

> So, either we want to go with what Paolo posted even in this case,
> i.e. turn decl_specs->attributes into error_mark_node if attrs
> is error_mark_node, and don't chainon anything to it if decl_specs->attributes
> is already error_mark_node, e.g. something like:
> if (attrs == error_mark_node || decl_specs->attributes == error_mark_node)
>   decl_specs->attributes = error_mark_node;
> else
>   decl_specs->attributes = chainon (decl_specs->attributes, attrs);

without any assert.  Putting this logic in an attr_chainon function sounds good.

Jason


Re: [PATCH, rs6000] (v2) Support for gimple folding of mergeh, mergel intrinsics

2018-01-17 Thread Will Schmidt
On Wed, 2018-01-17 at 11:31 -0600, Segher Boessenkool wrote:
> On Wed, Jan 17, 2018 at 09:51:54AM -0600, Will Schmidt wrote:
> > On Tue, 2018-01-16 at 16:34 -0600, Segher Boessenkool wrote:
> > > Hi!
> > > On Tue, Jan 16, 2018 at 01:39:28PM -0600, Will Schmidt wrote:
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-be-folded.c
> > > > @@ -0,0 +1,11 @@
> > > > +/* { dg-do compile { target { powerpc-*-* } } } */
> > > 
> > > Do you want powerpc*-*-*?  That is default in gcc.target/powerpc; dg-do
> > > compile is default, too, so you can either say
> > > 
> > > /* { dg-do compile } */
> > > 
> > > or nothing at all, to taste.
> > > 
> > > But it looks like you want to restrict to BE?  We still don't have a
> > > dejagnu thingy for that; you could put some #ifdef around it all (there
> > > are some examples in other testcases).  Not ideal, but works.
> > 
> > Just want to ensure continuing coverage.  :-)  This test in particular
> > is a copy/paste + tweak of an existing test, which tries to limit itself
> > to BE, there is an LE counterpart.
> 
> powerpc-*-* means those compilers that were configured for a 32-bit BE
> default target.  Which we do not usually have these days.  It also doesn't
> say much about what target the test is running for.
> 
> > My regression test results suggest that the addition of the
> > -mno-fold-gimple option to the existing testcases appears to have
> > uncovered an ICE, so pausing for the moment...
> 
> Good luck :-)  If you are reasonably certain the bug is not in your patch
> (but pre-existing), please do commit the patch.

Ok.  That does seem to be the case.

I'll commit this one shortly, and will do some additional follow-up on
the ICE, see if I can at least narrow it down a bit.

Thanks,
-Will

> 
> 
> Segher
> 




[PATCH][arm] Fix gcc.target/arm/g2.c and scd42-2.c for --with-mode=thumb hardfloat targets

2018-01-17 Thread Kyrill Tkachov

Hi all,

These -mcpu=xscale tests are ARM-only tests and they go to great pains to reject
explicit overriding options, but they're missing the -marm in their dg-options, 
which means
they will still give that nasty Thumb1 hard-float error when testing an 
implicit --with-mode=thumb
toolchain (--with-cpu=cortex-a15 --with-fpu=neon-vfpv4 --with-float=hard 
--with-mode=thumb, for example).

This patch adds the missing -marm and all is good again.

Committing to trunk.
Thanks,
Kyrill

2018-01-17  Kyrylo Tkachov  

* gcc.target/arm/g2.c: Add -marm to dg-options.
* gcc.target/arm/scd42-2.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/arm/g2.c b/gcc/testsuite/gcc.target/arm/g2.c
index 85ba1906a916882fb9e6e57a67cc6d96b6b8c4b3..e3680178df17a349fe06c7fed275b505e377e92d 100644
--- a/gcc/testsuite/gcc.target/arm/g2.c
+++ b/gcc/testsuite/gcc.target/arm/g2.c
@@ -1,6 +1,6 @@
 /* Verify that hardware multiply is preferred on XScale. */
 /* { dg-do compile } */
-/* { dg-options "-mcpu=xscale -O2" } */
+/* { dg-options "-mcpu=xscale -O2 -marm" } */
 /* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-march=*" } { "-march=xscale" } } */
 /* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
 /* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
diff --git a/gcc/testsuite/gcc.target/arm/scd42-2.c b/gcc/testsuite/gcc.target/arm/scd42-2.c
index 8cd4bde475d65fd388c6f7b3ddb955339620b564..6d9e5e1fa39994d35f5e55bedbb04e22c975931f 100644
--- a/gcc/testsuite/gcc.target/arm/scd42-2.c
+++ b/gcc/testsuite/gcc.target/arm/scd42-2.c
@@ -4,7 +4,7 @@
 /* { dg-skip-if "Test is specific to the Xscale" { arm*-*-* } { "-mcpu=*" } { "-mcpu=xscale" } } */
 /* { dg-skip-if "Test is specific to ARM mode" { arm*-*-* } { "-mthumb" } { "" } } */
 /* { dg-require-effective-target arm32 } */
-/* { dg-options "-mcpu=xscale -O" } */
+/* { dg-options "-mcpu=xscale -O -marm" } */
 
 unsigned load2(void) __attribute__ ((naked));
 unsigned load2(void)


[PATCH] RISC-V: Mark fsX as call clobbered when soft-float.

2018-01-17 Thread Jim Wilson
We support architecture/ABI combinations where the ABI uses smaller FP sizes
than the hardware supports.  To avoid corruption of the call-saved FP
registers, fs0 through fs11, we only allow values in them that are smaller or
equal in size to the ABI FP size.  This means that for soft-float, we never
allow any value in these registers.  This restriction is implemented in
riscv_hard_regno_mode_ok.  This was confirmed by cross compiling SPEC CPU2006
for rv32gc/ilp32 and disassembling.  The only uses of the fsX registers are
in Unwind_* routines that are saving/restore every call saved register.

This patch allows us to use the fsX register for soft-float code by marking
them as call clobbered.  This is already specified in the ABI documents, and
this patch just changes the compiler to match the ABI.  With this patch, I am
seeing fsX registers used in the SPEC compile, and smaller frame sizes due to
better register allocation.  There is no ABI change, because we were not using
these registers at all before the patch.

Tested with a rv32gc/ilp32 make check.  There were no regressions.

Committed.

Jim

2018-01-17  Andrew Waterman  
gcc/
* config/riscv/riscv.c (riscv_conditional_register_usage): If
UNITS_PER_FP_ARG is 0, set call_used_regs to 1 for all FP regs.
---
 gcc/config/riscv/riscv.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 19a01e0825a..20660a4061a 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -4123,6 +4123,13 @@ riscv_conditional_register_usage (void)
   for (int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
fixed_regs[regno] = call_used_regs[regno] = 1;
 }
+
+  /* In the soft-float ABI, there are no callee-saved FP registers.  */
+  if (UNITS_PER_FP_ARG == 0)
+{
+  for (int regno = FP_REG_FIRST; regno <= FP_REG_LAST; regno++)
+   call_used_regs[regno] = 1;
+}
 }
 
 /* Return a register priority for hard reg REGNO.  */
-- 
2.14.1


Re: [PR c++/83160] local ref to capture

2018-01-17 Thread Jason Merrill
On Fri, Jan 12, 2018 at 2:11 PM, Nathan Sidwell  wrote:
> Jason,
> this fixes 83160, where we complain about not having an lvalue in things
> like:
>
> void foo () {
>   const int a = 0;
>   [] () {
> const int &b = a;  // here
>   };
> }
>
> The problem is that in convert_like_real we have ref_bind->identity
> conversions, and the identity conversion calls 'mark_rvalue_use'.  For a
> regular rvalue use of a 'const int' VAR_DECL, we return the VAR_DECL -- not
> collapse it to the initialized value.  However, for captures, there is code
> to look through the capture when rvalue_p is true.  We end up looking
> through the lambda capture, through the captured 'a' and to its initializer.
> oops.
>
> Calling mark_lvalue_use instead is also wrong, because the identity conv may
> of course be being applied to an rvalue, and we end up giving an equally
> wrong error in the opposite case.  Dealing with ck_identity this way seems
> wrong -- context determines whether this is an rvalue or lvalue use.
>
> I think the solution is for the identity conv to know whether it's going to
> be the subject of a direct reference binding and call mark_lvalue_use in that
> case.  There's 2 issues with that:
>
> 1) mark_lvalue_use isn't quite right because we want to specify
> reject_builtin to the inner mark_use call.  (I didn't try changing
> mark_lvalue_use to pass true, I suspected it'd break actual calls of
> builtins).  Fixed by making mark_use externally reachable, and passing in an
> appropriate 'rvalue_p' parm directly.  I think your recent patch to select
> between the two marking fns may be neater using this entry point?
>
> 2) I modify direct_reference_binding to look at the incoming conv, and if it
> is ck_identity set that conv's rvaluedness_matches_p flag.  Then deploy that
> flag to determine the arg in #1.  'rvaluedness_matches_p' seemed the least
> worst existing flag to press into service here.

This makes sense to me.  But I think we'd want also that flag set on
the ck_identity inside the ck_base that direct_reference_binding
creates, so setting it first rather than in an else.

Jason


libgo patch committed: Support stat and device numbers on AIX

2018-01-17 Thread Ian Lance Taylor
This patch by Tony Reix adds support for stat and device numbers on
AIX.  Bootstrapped and ran Go tests on x86_64-pc-linux-gnu.  Committed
to  mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 256794)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-ca805b704fc141d7ad61f8fcd3badbaa04b7e363
+3ea7fc3b918210e7248dbc51d90af20639dc4167
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/archive/tar/stat_actime1.go
===
--- libgo/go/archive/tar/stat_actime1.go(revision 256593)
+++ libgo/go/archive/tar/stat_actime1.go(working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build linux dragonfly openbsd solaris
+// +build aix linux dragonfly openbsd solaris
 
 package tar
 
Index: libgo/go/archive/tar/stat_unix.go
===
--- libgo/go/archive/tar/stat_unix.go   (revision 256593)
+++ libgo/go/archive/tar/stat_unix.go   (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build linux darwin dragonfly freebsd openbsd netbsd solaris
+// +build aix linux darwin dragonfly freebsd openbsd netbsd solaris
 
 package tar
 
@@ -54,6 +54,16 @@ func statUnix(fi os.FileInfo, h *Header)
if h.Typeflag == TypeChar || h.Typeflag == TypeBlock {
dev := uint64(sys.Rdev) // May be int32 or uint32
switch runtime.GOOS {
+   case "aix":
+   var major, minor uint32
+   if runtime.GOARCH == "ppc64" {
+   major = uint32((dev & 0x3fffffff00000000) >> 32)
+   minor = uint32((dev & 0x00000000ffffffff) >> 0)
+   } else {
+   major = uint32((dev >> 16) & 0xffff)
+   minor = uint32(dev & 0xffff)
+   }
+   h.Devmajor, h.Devminor = int64(major), int64(minor)
case "linux":
// Copied from golang.org/x/sys/unix/dev_linux.go.
major := uint32((dev & 0x000fff00) >> 8)


Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre

2018-01-17 Thread Woodhouse, David
I'm not sure I understand the concern. When compiling a large project for -m32 
vs. -m64, there must be a million times the compiler has to decide whether to 
emit "r" or "e" before a register name. HJ's patch already does this for the 
thunk symbol. What is the future requirement that I am not understanding, and 
that is so hard?

Back to real computer and will stop top-posting HTML soon, I promise!

On 14 Jan 2018 19:22, Uros Bizjak  wrote:

On Sun, Jan 14, 2018 at 6:44 PM, Woodhouse, David  wrote:
> This won't make the list; I'll send a more coherent and less HTML-afflicted
> version later.
>
> The bare 'ax' naming made it painful to instantiate the external thunks for
> 32-bit and 64-bit code because we had to put the e/r back again inside the
> .irp reg ax bx... code.
>
> We could probably have lived with that but it would be painful to change now
> that Linux and Xen patches with the current ABI are all lined up. I
> appreciate they weren't in GCC yet so we get little sympathy but these are
> strange times and we had to move fast.
>
> I'd really like *not* to change it now. Having the thunk name actually
> include the name of the register it's using does seem nicer anyway...

That's unfortunate... I suspect that in the future, one will need
#ifdef __x86_64__ around eventual calls to thunks from c code because
of this decision, since thunks for x86_64 target will have different
names than thunks for x86_32 target. I don't know if the (single?)
case of mixing 32 and 64 bit assembly in the highly specialized part
of the kernel really warrants this decision. Future programmers will
be grateful if kernel people can reconsider their choice in the
not-yet-released ABI.

Uros.




Amazon Web Services UK Limited. Registered in England and Wales with 
registration number 08650665 and which has its registered office at 60 Holborn 
Viaduct, London EC1A 2FD, United Kingdom.


[C++/83287] Another overload lookup ice

2018-01-17 Thread Nathan Sidwell
This fixes the other problem in 83287, another case where a lookup was 
not marked for keeping.  I checked other build4 (& build3) calls, and it 
looks like this is the only one where this could happen.  Also, retested 
using the original testcase and no further issues discovered.


nathan
--
Nathan Sidwell
2018-01-17  Nathan Sidwell  

	PR c++/83287
	* init.c (build_raw_new_expr): Scan list for lookups to keep.

	PR c++/83287
	* g++.dg/lookup/pr83287-2.C: New.

Index: cp/init.c
===
--- cp/init.c	(revision 256795)
+++ cp/init.c	(working copy)
@@ -2325,7 +2325,12 @@ build_raw_new_expr (vec<tree, va_gc> *pl
   else if (init->is_empty ())
 init_list = void_node;
   else
-init_list = build_tree_list_vec (init);
+{
+  init_list = build_tree_list_vec (init);
+  for (tree v = init_list; v; v = TREE_CHAIN (v))
+	if (TREE_CODE (TREE_VALUE (v)) == OVERLOAD)
+	  lookup_keep (TREE_VALUE (v), true);
+}
 
   new_expr = build4 (NEW_EXPR, build_pointer_type (type),
 		 build_tree_list_vec (placement), type, nelts,
Index: testsuite/g++.dg/lookup/pr83287-2.C
===
--- testsuite/g++.dg/lookup/pr83287-2.C	(revision 0)
+++ testsuite/g++.dg/lookup/pr83287-2.C	(working copy)
@@ -0,0 +1,20 @@
+// PR c++/83287 failed to keep lookup until instantiation time
+
+void foo ();
+
+namespace {
+  void foo ();
+}
+
template <typename T>
+void
+bar ()
+{
+  new T (foo); // { dg-error "cannot resolve" }
+}
+
+void
+baz ()
+{
  bar<int> ();
+}


Re: [PATCH] Use pointer sized array indices.

2018-01-17 Thread Janne Blomqvist
PING

On Fri, Dec 29, 2017 at 8:28 PM, Janne Blomqvist
 wrote:
> On Fri, Dec 29, 2017 at 7:17 PM, Thomas Koenig  wrote:
>> Hi Janne,
>>
>>> Using pointer sized variables (e.g. size_t / ptrdiff_t) when the
>>> variables are used as array indices allows accessing larger arrays,
>>> and can be a slight performance improvement due to no need for sign or
>>> zero extending, or masking.
>>
>>
>> Unless I have missed something, all the examples are for cases where
>> the array is of maximum size GFC_MAX_DIMENSIONS.
>
> Many, but not all.
>
>> This is why they
>> were left as int in the first place (because it is unlikely we will have
>> arrays of more than 2^31-1 dimensions soon :-).
>>
>> Do you really think this change is necessary? If not, I'd rather avoid
>> such a change.
>
> I'm not planning on supporting > 2**31-1 dimensions, no. :)
>
> But even if we know that the maximum value is always going to be
> smaller, by using pointer-sized variables the compiler can generate
> slightly more efficient code.
>
> See e.g. https://godbolt.org/g/oAvw5L ; in the functions with a loop,
> the ones which use pointer-sized indices have shorter preambles as
> well as loop bodies. And for the simple functions that just index an
> array, by using pointer-sized indices there is no need to zero the
> upper half of the registers.
>
> I mean, it's not a huge improvement, but it might be a tiny one in some cases.
>
> Also, by moving the induction variable from the beginning of the
> function into the loop header, it makes it easier for both readers and
> the compiler to see the scope of the variable.
>
> --
> Janne Blomqvist



-- 
Janne Blomqvist


Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
On Wednesday 17 January 2018 11:13 PM, Wilco Dijkstra wrote:
> Are you saying the same issue exists for all stores with writeback? If so then
> your patch would need to address that too.

Yes, I'll be posting a separate patch for that because the condition set
is slightly different for it.  It will also be accompanied with a
slightly different tuning for addrcost, which is why it needs separate
testing.

> It seems way more fundamental if it affects anything that isn't a simple
> immediate offset. Again I suggest using the existing cost infrastructure
> to find a setting that improves performance. If discouraging pre/post 
> increment
> helps Falkor then that's better than doing nothing.

The existing costs don't differentiate between loads and stores and that
is specifically what I need for falkor.

>>> I think a special case for Falkor in aarch64_address_cost would be 
>>> acceptable
>>> in GCC8 - that would be much smaller and cleaner than the current patch. 
>>> If required we could improve upon this in GCC9 and add a way to 
>>> differentiate
>>> between loads and stores.
>>
>> I can't do this in address_cost since I can't determine whether the
>> address is a load or a store location.  The most minimal way seems to be
>> using the patterns in the md file.
> 
> Well I don't think the approach of blocking specific patterns is a good 
> solution to
> this problem and may not be accepted by AArch64 maintainers. Try your example
> with -funroll-loops and compare with my suggestion (with or without extra 
> code to
> increase cost of writeback too). It seems to me adjusting costs is always 
> going to
> result in better overall code quality, even if it also applies to loads for 
> the time being.

Costs are not useful for this scenario because they cannot differentiate
between loads and stores.  To make that distinction I have to block
specific patterns, unless there's a better way I'm unaware of that helps
determine whether a memory reference is a load or a store.

Another approach I am trying to minimize the change is to add a new
ADDR_QUERY_STR for aarch64_legitimate_address_p, which can then be used
in classify_address to skip register addressing mode for falkor.  That
way we avoid the additional hook.  It will still need the additional Utf
memory constraint though.

Do you know of a way I can distinguish between loads and stores in costs
tuning?

Siddhesh


Re: [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()

2018-01-17 Thread Bernd Edlinger
Hi,

+  return targetm.speculation_safe_load (mode, target, mem, lower, upper,
+   cmpptr, true);

So portable programming will use this builtin when available,
but for the majority of targets the default will be wrong.
For instance, why should a PDP port be touched just to silence
a warning?


Bernd.


Re: GCC 8.0.0 Status Report (2018-01-15), Trunk in Regression and Documentation fixes only mode

2018-01-17 Thread Segher Boessenkool
On Tue, Jan 16, 2018 at 06:50:07AM -0600, Segher Boessenkool wrote:
> On Mon, Jan 15, 2018 at 09:21:07AM +0100, Richard Biener wrote:
> > We're still in pretty bad shape regression-wise.  Please also take
> > the opportunity to check the state of your favorite host/target
> > combination to make sure building and testing works appropriately.
> 
> I tested building Linux (the kernel) for all supported architectures.
> Everything builds (with my usual tweaks, link with libgcc etc.);
> except x86_64 and sh have more problems in the kernel, and mips has
> an ICE.  I'll open a PR for that one.

I messed up this testing, accidentally tested trunk instead.  Whoops :-/

For 7, mips works just dandy.  Rest is the same (i.e. all works, x86_64
and sh have problems in the kernel code itself, not our problem).


Segher


Re: [PATCH, PR82428] Add __builtin_goacc_{gang,worker,vector}_{id,size}

2018-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2018 at 06:42:33PM +0100, Tom de Vries wrote:
> +static rtx
> +expand_builtin_goacc_parlevel_id_size (tree exp, rtx target, int ignore)
> +{
> +  tree fndecl = get_callee_fndecl (exp);
> +
> +  const char *name;
> +  switch (DECL_FUNCTION_CODE (fndecl))
> +{
> +case BUILT_IN_GOACC_PARLEVEL_ID:
> +  name = "__builtin_goacc_parlevel_id";
> +  break;
> +case BUILT_IN_GOACC_PARLEVEL_SIZE:
> +  name = "__builtin_goacc_parlevel_size";
> +  break;

Can you avoid that many switches on DECL_FUNCTION_CODE?
Like initialize in this one not just name, but also the
fallback_retval and gen_fn variables and just use them later?

> +default:
> +  gcc_unreachable ();
> +}
> +
> +  if (oacc_get_fn_attrib (current_function_decl) == NULL_TREE)
> +{
> +  error ("%s only supported in openacc code", name);

OpenACC ?

> +  return const0_rtx;
> +}
> +
> +  tree arg = CALL_EXPR_ARG (exp, 0);
> +  if (TREE_CODE (arg) != INTEGER_CST)
> +{
> +  error ("non-constant argument 0 to %s", name);

%qs instead of %s, 2 times.

> +  return const0_rtx;
> +}
> +
> +  int dim = TREE_INT_CST_LOW (arg);
> +  switch (dim)
> +{
> +case GOMP_DIM_GANG:
> +case GOMP_DIM_WORKER:
> +case GOMP_DIM_VECTOR:
> +  break;
> +default:
> +  error ("illegal argument 0 to %s", name);
> +  return const0_rtx;
> +}
> +
> +  if (ignore)
> +return target;
> +
> +  if (!targetm.have_oacc_dim_size ())
> +{
> +  rtx retval;
> +  switch (DECL_FUNCTION_CODE (fndecl))
> + {
> + case BUILT_IN_GOACC_PARLEVEL_ID:
> +   retval = const0_rtx;
> +   break;
> + case BUILT_IN_GOACC_PARLEVEL_SIZE:
> +   retval = GEN_INT (1);

In addition to moving these assignments to a single switch,
this one can be fallback_retval = const1_rtx;

Otherwise LGTM for stage1.

Jakub


Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
Siddhesh Poyarekar wrote: 
On Wednesday 17 January 2018 08:31 PM, Wilco Dijkstra wrote:
> Why is that a bad thing? With the patch as is, the testcase generates:
> 
> .L4:
>    ldr q0, [x2, x3]
>    add x5, x1, x3
>    add x3, x3, 16
>    cmp x3, x4
>    str q0, [x5]
>    bne .L4
> 
> With a change in address cost (for loads and stores) we would get:
> 
> .L4:
>    ldr q0, [x3], 16
>    str q0, [x4], 16
>    cmp x3, x5
>    bne .L4

> This is great for the load because of the way the falkor prefetcher
> works, but it is terrible for the store because of the way the pipeline
> works.  The only performant store for falkor is an indirect load with a
> constant or zero offset.  Everything else has hidden costs.

Are you saying the same issue exists for all stores with writeback? If so then
your patch would need to address that too.

> I briefly looked at the possibility of splitting the register_offset
> cost into load and store, but I realized that I'd have to modify the
> target hook for it to be useful, which is way too much work for this
> single quirk.

It seems way more fundamental if it affects anything that isn't a simple
immediate offset. Again I suggest using the existing cost infrastructure
to find a setting that improves performance. If discouraging pre/post increment
helps Falkor then that's better than doing nothing.

>> I think a special case for Falkor in aarch64_address_cost would be acceptable
>> in GCC8 - that would be much smaller and cleaner than the current patch. 
>> If required we could improve upon this in GCC9 and add a way to differentiate
>> between loads and stores.
>
> I can't do this in address_cost since I can't determine whether the
> address is a load or a store location.  The most minimal way seems to be
> using the patterns in the md file.

Well I don't think the approach of blocking specific patterns is a good
solution to this problem and may not be accepted by AArch64 maintainers.
Try your example with -funroll-loops and compare with my suggestion
(with or without extra code to increase cost of writeback too).  It
seems to me adjusting costs is always going to result in better overall
code quality, even if it also applies to loads for the time being.

Wilco

Re: [PATCH v2] C++: Fix crash in warn_for_memset within templates (PR c++/83814)

2018-01-17 Thread Jason Merrill
OK.

On Wed, Jan 17, 2018 at 12:27 PM, David Malcolm  wrote:
> On Wed, 2018-01-17 at 09:28 -0500, Jason Merrill wrote:
>> On Wed, Jan 17, 2018 at 5:34 AM, Jakub Jelinek 
>> wrote:
>> > On Fri, Jan 12, 2018 at 05:09:24PM -0500, David Malcolm wrote:
>> > > PR c++/83814 reports an ICE introduced by the location wrapper
>> > > patch
>> > > (r256448), affecting certain memset calls within templates.
>> >
>> > Note, I think this issue sadly affects a lot of code, so it is
>> > quite urgent.
>> >
>> > That said, wonder if we really can't do any folding when
>> > processing_template_decl, could we e.g. do at least
>> > maybe_constant_value,
>> > or fold if the expression is not type nor value dependent?
>>
>> Yes, in a template we should call fold_non_dependent_expr.
>>
>> > BTW, never know if cp_fold_rvalue is a superset of
>> > maybe_constant_value or not.
>>
>> It is.
>>
>> Jason
>
> Thanks.  Here's an updated version of the patch.
>
> Changed in v2:
> * use fold_non_dependent_expr in the C++ impl of fold_for_warn
> * add some test coverage of folding to g++.dg/wrappers/pr83814.C
> * added another testcase (from PR c++/83902)
>
> Successfully bootstrapped on x86_64-pc-linux-gnu.
> OK for trunk?
>
>
> gcc/c-family/ChangeLog:
> PR c++/83814
> * c-common.c (fold_for_warn): Move to c/c-fold.c and cp/expr.c.
>
> gcc/c/ChangeLog:
> PR c++/83814
> * c-fold.c (fold_for_warn): Move from c-common.c, reducing to just
> the C part.
>
> gcc/cp/ChangeLog:
> PR c++/83814
> * expr.c (fold_for_warn): Move from c-common.c, reducing to just
> the C++ part.  If processing a template, call
> fold_non_dependent_expr rather than fully folding.
>
> gcc/testsuite/ChangeLog:
> PR c++/83814
> PR c++/83902
> * g++.dg/wrappers/pr83814.C: New test case.
> * g++.dg/wrappers/pr83902.C: New test case.
> ---
>  gcc/c-family/c-common.c | 13 --
>  gcc/c/c-fold.c  | 10 +
>  gcc/cp/expr.c   | 15 +++
>  gcc/testsuite/g++.dg/wrappers/pr83814.C | 70 
> +
>  gcc/testsuite/g++.dg/wrappers/pr83902.C |  9 +
>  5 files changed, 104 insertions(+), 13 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/wrappers/pr83814.C
>  create mode 100644 gcc/testsuite/g++.dg/wrappers/pr83902.C
>
> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 097d192..858ed68 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -868,19 +868,6 @@ c_get_substring_location (const substring_loc &substr_loc,
>  }
>
>
> -/* Fold X for consideration by one of the warning functions when checking
> -   whether an expression has a constant value.  */
> -
> -tree
> -fold_for_warn (tree x)
> -{
> -  if (c_dialect_cxx ())
> -return c_fully_fold (x, /*for_init*/false, /*maybe_constp*/NULL);
> -  else
> -/* The C front-end has already folded X appropriately.  */
> -return x;
> -}
> -
>  /* Return true iff T is a boolean promoted to int.  */
>
>  bool
> diff --git a/gcc/c/c-fold.c b/gcc/c/c-fold.c
> index 5776f1b..be6a0fc 100644
> --- a/gcc/c/c-fold.c
> +++ b/gcc/c/c-fold.c
> @@ -668,3 +668,13 @@ c_fully_fold_internal (tree expr, bool in_init, bool 
> *maybe_const_operands,
>  }
>return ret;
>  }
> +
> +/* Fold X for consideration by one of the warning functions when checking
> +   whether an expression has a constant value.  */
> +
> +tree
> +fold_for_warn (tree x)
> +{
> +  /* The C front-end has already folded X appropriately.  */
> +  return x;
> +}
> diff --git a/gcc/cp/expr.c b/gcc/cp/expr.c
> index 7d79215..b1ab453 100644
> --- a/gcc/cp/expr.c
> +++ b/gcc/cp/expr.c
> @@ -315,3 +315,18 @@ mark_exp_read (tree exp)
>  }
>  }
>
> +/* Fold X for consideration by one of the warning functions when checking
> +   whether an expression has a constant value.  */
> +
> +tree
> +fold_for_warn (tree x)
> +{
> +  /* C++ implementation.  */
> +
> +  /* It's not generally safe to fully fold inside of a template, so
> + call fold_non_dependent_expr instead.  */
> +  if (processing_template_decl)
> +return fold_non_dependent_expr (x);
> +
> +  return c_fully_fold (x, /*for_init*/false, /*maybe_constp*/NULL);
> +}
> diff --git a/gcc/testsuite/g++.dg/wrappers/pr83814.C 
> b/gcc/testsuite/g++.dg/wrappers/pr83814.C
> new file mode 100644
> index 000..b9f8faa
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/wrappers/pr83814.C
> @@ -0,0 +1,70 @@
> +/* Verify that our memset warnings don't crash when folding
> +   arguments within a template (PR c++/83814).  */
> +
> +// { dg-options "-Wno-int-to-pointer-cast -Wmemset-transposed-args -Wmemset-elt-size" }
> +
> +template <typename T>
> +void test_1()
> +{
> +  __builtin_memset (int() - char(), 0, 0);
> +}
> +
> +template <typename T>
> +void test_2()
> +{
> +  __builtin_memset (0, 0, int() - char());
> +}
> +
> +template <typename T>
> +void test_3 (unsigned a, int c)
> +{

Re: [PATCH, PR82428] Add __builtin_goacc_{gang,worker,vector}_{id,size}

2018-01-17 Thread Tom de Vries

On 01/15/2018 12:25 PM, Jakub Jelinek wrote:
> On Mon, Jan 15, 2018 at 12:12:10PM +0100, Tom de Vries wrote:
> It can be just number of course.  parlevel is fine for me.
>
>> So, in summary, I propose as interface:
>> - int __builtin_goacc_parlevel_id (int);
>> - int __builtin_goacc_parlevel_size (int);
>> with arguments 0, 1, and 2 meaning gang, worker and vector.
>
> LGTM.



Hi,

I've updated the patch for this interface.

Bootstrapped and reg-tested on x86_64.
Build and reg-tested libgomp on x86_64 with nvptx accelerator.

OK for stage1?

Thanks,
- Tom
Add __builtin_goacc_parlevel_{id,size}

2018-01-06  Tom de Vries  

	PR libgomp/82428
	* builtins.def (DEF_GOACC_BUILTIN_ONLY): Define.
	* omp-builtins.def (BUILT_IN_GOACC_PARLEVEL_ID)
	(BUILT_IN_GOACC_PARLEVEL_SIZE): New builtin.
	* builtins.c (expand_builtin_goacc_parlevel_id_size): New function.
	(expand_builtin): Call expand_builtin_goacc_parlevel_id_size.
	* doc/extend.texi (Other Builtins): Add __builtin_goacc_parlevel_id and
	__builtin_goacc_parlevel_size.

	* f95-lang.c (DEF_GOACC_BUILTIN_ONLY): Define.

	* c-c++-common/goacc/builtin-goacc-parlevel-id-size-2.c: New test.
	* c-c++-common/goacc/builtin-goacc-parlevel-id-size.c: New test.

	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Use
	__builtin_goacc_parlevel_{id,size}.
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-dim-default.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Same.
	* testsuite/libgomp.oacc-c-c++-common/tile-1.c: Same.

---
 gcc/builtins.c | 91 ++
 gcc/builtins.def   |  4 +
 gcc/doc/extend.texi| 10 +++
 gcc/fortran/f95-lang.c |  4 +
 gcc/omp-builtins.def   |  5 ++
 .../goacc/builtin-goacc-parlevel-id-size-2.c   | 37 +
 .../goacc/builtin-goacc-parlevel-id-size.c | 79 +++
 .../libgomp.oacc-c-c++-common/gang-static-2.c  | 21 ++---
 .../libgomp.oacc-c-c++-common/loop-auto-1.c| 18 ++---
 .../libgomp.oacc-c-c++-common/loop-dim-default.c   | 14 ++--
 .../testsuite/libgomp.oacc-c-c++-common/loop-g-1.c | 17 ++--
 .../testsuite/libgomp.oacc-c-c++-common/loop-g-2.c | 16 ++--
 .../libgomp.oacc-c-c++-common/loop-gwv-1.c | 17 ++--
 .../libgomp.oacc-c-c++-common/loop-red-g-1.c   | 16 ++--
 .../libgomp.oacc-c-c++-common/loop-red-gwv-1.c | 16 ++--
 .../libgomp.oacc-c-c++-common/loop-red-v-1.c   | 16 ++--
 .../libgomp.oacc-c-c++-common/loop-red-v-2.c   | 16 ++--
 .../libgomp.oacc-c-c++-common/loop-red-w-1.c   | 16 ++--
 .../libgomp.oacc-c-c++-common/loop-red-w-2.c   | 16 ++--
 .../libgomp.oacc-c-c++-common/loop-red-wv-1.c  | 12 +--
 .../testsuite/libgomp.oacc-c-c++-common/loop-v-1.c | 16 ++--
 .../testsuite/libgomp.oacc-c-c++-common/loop-w-1.c | 16 ++--
 .../libgomp.oacc-c-c++-common/loop-wv-1.c  | 16 ++--
 .../libgomp.oacc-c-c++-common/parallel-dims.c  | 19 +
 .../libgomp.oacc-c-c++-common/routine-g-1.c| 18 ++---
 .../libgomp.oacc-c-c++-common/routine-gwv-1.c  | 18 ++---
 .../libgomp.oacc-c-c++-common/routine-v-1.c| 18 ++---
 .../libgomp.oacc-c-c++-common/routine-w-1.c| 18 ++---
 .../libgomp.oacc-c-c++-common/routine-wv-1.c   | 18 ++---
 .../libgomp.oacc-c-c++-common/routine-wv-2.c   | 19 ++---
 .../testsuite/libgomp.oacc-c-c++-common/tile-1.c   | 15 ++--
 31 files changed, 402 insertions(+), 230 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 98eb804..c4ca8e6 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -70,6 +70,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "case-cfn-macros.h"
 #include "gimple-fold.h"
 #include "intl.h"
+#include 

Re: [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic

2018-01-17 Thread Bernd Edlinger
Hi,


+  if (mode == TImode || TARGET_THUMB1)
+return default_speculation_safe_load (mode, result, mem, lower_bound,
+ upper_bound, cmpptr, warn);

TImode cannot happen here, so you could just as well let that one
run into the gcc_unreachable below, or use an assertion here?


Bernd.


C++ PATCH for c++/81843, ICE with member variadic template

2018-01-17 Thread Jason Merrill
My patch for 72801 added in enclosing template arguments, but then as
a result we wrongly tried to deduce them.

Tested x86_64-pc-linux-gnu, applying to trunk and 7.
commit 7b299d41c412665a03c5bcdb088d3a39dc86d9c2
Author: Jason Merrill 
Date:   Wed Jan 17 11:12:23 2018 -0500

PR c++/81843 - ICE with variadic member template.

PR c++/72801
* pt.c (unify_pack_expansion): Don't try to deduce enclosing
template args.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 322408d92ec..1c7b91ac0df 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -20356,6 +20356,7 @@ unify_pack_expansion (tree tparms, tree targs, tree packed_parms,
 
   /* Add in any args remembered from an earlier partial instantiation.  */
   targs = add_to_template_args (PACK_EXPANSION_EXTRA_ARGS (parm), targs);
+  int levels = TMPL_ARGS_DEPTH (targs);
 
   packed_args = expand_template_argument_pack (packed_args);
 
@@ -20371,6 +20372,8 @@ unify_pack_expansion (tree tparms, tree targs, tree packed_parms,
 
   /* Determine the index and level of this parameter pack.  */
   template_parm_level_and_index (parm_pack, &level, &idx);
+  if (level < levels)
+   continue;
 
   /* Keep track of the parameter packs and their corresponding
  argument packs.  */
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic171.C b/gcc/testsuite/g++.dg/cpp0x/variadic171.C
new file mode 100644
index 000..1e268141d6d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic171.C
@@ -0,0 +1,12 @@
+// PR c++/81843
+// { dg-do compile { target c++11 } }
+
+template < typename > struct A;
+template < typename, typename > struct B;
+template < typename ... S > struct C
+{
+  template < typename > struct D {};
+  template < typename ... T > struct D < A < B < S, T > ... > >;
+};
+
+C <>::D < A < B < int, int > > > c;


Re: [PATCH, rs6000] (v2) Support for gimple folding of mergeh, mergel intrinsics

2018-01-17 Thread Segher Boessenkool
On Wed, Jan 17, 2018 at 09:51:54AM -0600, Will Schmidt wrote:
> On Tue, 2018-01-16 at 16:34 -0600, Segher Boessenkool wrote:
> > Hi!
> > On Tue, Jan 16, 2018 at 01:39:28PM -0600, Will Schmidt wrote:
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-be-folded.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do compile { target { powerpc-*-* } } } */
> > 
> > Do you want powerpc*-*-*?  That is default in gcc.target/powerpc; dg-do
> > compile is default, too, so you can either say
> > 
> > /* { dg-do compile } */
> > 
> > or nothing at all, to taste.
> > 
> > But it looks like you want to restrict to BE?  We still don't have a
> > dejagnu thingy for that; you could put some #ifdef around it all (there
> > are some examples in other testcases).  Not ideal, but works.
> 
> Just want to ensure continuing coverage.  :-)  This test in particular
> is a copy/paste + tweak of an existing test, which tries to limit itself
> to BE, there is an LE counterpart.

powerpc-*-* means those compilers that were configured for a 32-bit BE
default target.  Which we do not usually have these days.  It also doesn't
say much about what target the test is running for.

> My regression test results suggest that the addition of the
> -mno-fold-gimple option to the existing testcases appears to have
> uncovered an ICE, so pausing for the moment...

Good luck :-)  If you are reasonably certain the bug is not in your patch
(but pre-existing), please do commit the patch.


Segher


Re: [PATCH,PTX] Add support for CUDA 9

2018-01-17 Thread Cesar Philippidis
On 12/27/2017 01:16 AM, Tom de Vries wrote:
> On 12/21/2017 06:19 PM, Cesar Philippidis wrote:
>> My test results are somewhat inconsistent. On MG's build servers, there
>> are no regressions in CUDA 8. 
> 
> Ack.
> 
>> On my laptop, there are fewer regressions
>> in CUDA 9, than CUDA 8.
> 
> If the patch causes regressions for either cuda 8 or cuda 9, then they
> need to be analyzed and fixed.
> 
> Please clarify what you think it means if in one case there are less
> regressions than in the other.
> 
>> However, I think some of those failures are due
>> to premature timeouts on my laptop (I'm setting dejagnu to timeout after
>> 90s instead of 5m locally).
> 
> If you have flawed test results due to a local change you made, you need
> to undo the local change and rerun the test, and report the sane test
> results instead of reporting flawed test results.
> 
>> I know your on vacation, so I'll commit this patch to og7. We can
>> revisit the patch for trunk and other backports later.
> 
> If you don't have time to do the testing now, then please file a PR for
> this issue and attach the patch with the updates that address my comments.

Sorry for taking so long to respond. I finally had a chance to analyze
the results. There are no regressions with this patch. In fact, using
the unpatched CUDA8 build as a baseline, after the CUDA9 patch, 66
additional tests pass in CUDA 8 and 73 additional tests pass in
CUDA 9.

Is this patch OK for trunk?

Cesar
2017-12-19  Cesar Philippidis  

	gcc/
	* config/nvptx/nvptx.c (output_init_frag): Don't use generic address
	spaces for function labels.

	gcc/testsuite/
	* gcc.target/nvptx/indirect_call.c: New test.


diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index dfb27ef..a7b4c09 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1894,9 +1894,15 @@ output_init_frag (rtx sym)
   
   if (sym)
 {
-  fprintf (asm_out_file, "generic(");
+  bool function = SYMBOL_REF_DECL (sym)
+	&& (TREE_CODE (SYMBOL_REF_DECL (sym)) == FUNCTION_DECL);
+  if (!function)
+	fprintf (asm_out_file, "generic(");
   output_address (VOIDmode, sym);
-  fprintf (asm_out_file, val ? ") + " : ")");
+  if (!function)
+	fprintf (asm_out_file, val ? ") + " : ")");
+  else if (val)
+	fprintf (asm_out_file, " + ");
 }
 
   if (!sym || val)
diff --git a/gcc/testsuite/gcc.target/nvptx/indirect_call.c b/gcc/testsuite/gcc.target/nvptx/indirect_call.c
new file mode 100644
index 000..39992a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/indirect_call.c
@@ -0,0 +1,19 @@
+/* { dg-options "-O2 -msoft-stack" } */
+/* { dg-do run } */
+
+int
+f1 (int a)
+{
+  return a + 1;
+}
+  
+int (*f2)(int) = f1;
+
+int
+main ()
+{
+  if (f2 (100) != 101)
+__builtin_abort();
+
+  return 0;
+}


[PATCH v2] C++: Fix crash in warn_for_memset within templates (PR c++/83814)

2018-01-17 Thread David Malcolm
On Wed, 2018-01-17 at 09:28 -0500, Jason Merrill wrote:
> On Wed, Jan 17, 2018 at 5:34 AM, Jakub Jelinek 
> wrote:
> > On Fri, Jan 12, 2018 at 05:09:24PM -0500, David Malcolm wrote:
> > > PR c++/83814 reports an ICE introduced by the location wrapper
> > > patch
> > > (r256448), affecting certain memset calls within templates.
> > 
> > Note, I think this issue sadly affects a lot of code, so it is
> > quite urgent.
> > 
> > That said, wonder if we really can't do any folding when
> > processing_template_decl, could we e.g. do at least
> > maybe_constant_value,
> > or fold if the expression is not type nor value dependent?
> 
> Yes, in a template we should call fold_non_dependent_expr.
> 
> > BTW, never know if cp_fold_rvalue is a superset of
> > maybe_constant_value or not.
> 
> It is.
> 
> Jason

Thanks.  Here's an updated version of the patch.

Changed in v2:
* use fold_non_dependent_expr in the C++ impl of fold_for_warn
* add some test coverage of folding to g++.dg/wrappers/pr83814.C
* added another testcase (from PR c++/83902)

Successfully bootstrapped on x86_64-pc-linux-gnu.
OK for trunk?


gcc/c-family/ChangeLog:
PR c++/83814
* c-common.c (fold_for_warn): Move to c/c-fold.c and cp/expr.c.

gcc/c/ChangeLog:
PR c++/83814
* c-fold.c (fold_for_warn): Move from c-common.c, reducing to just
the C part.

gcc/cp/ChangeLog:
PR c++/83814
* expr.c (fold_for_warn): Move from c-common.c, reducing to just
the C++ part.  If processing a template, call
fold_non_dependent_expr rather than fully folding.

gcc/testsuite/ChangeLog:
PR c++/83814
PR c++/83902
* g++.dg/wrappers/pr83814.C: New test case.
* g++.dg/wrappers/pr83902.C: New test case.
---
 gcc/c-family/c-common.c | 13 --
 gcc/c/c-fold.c  | 10 +
 gcc/cp/expr.c   | 15 +++
 gcc/testsuite/g++.dg/wrappers/pr83814.C | 70 +
 gcc/testsuite/g++.dg/wrappers/pr83902.C |  9 +
 5 files changed, 104 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/wrappers/pr83814.C
 create mode 100644 gcc/testsuite/g++.dg/wrappers/pr83902.C

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 097d192..858ed68 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -868,19 +868,6 @@ c_get_substring_location (const substring_loc &substr_loc,
 }
 
 
-/* Fold X for consideration by one of the warning functions when checking
-   whether an expression has a constant value.  */
-
-tree
-fold_for_warn (tree x)
-{
-  if (c_dialect_cxx ())
-return c_fully_fold (x, /*for_init*/false, /*maybe_constp*/NULL);
-  else
-/* The C front-end has already folded X appropriately.  */
-return x;
-}
-
 /* Return true iff T is a boolean promoted to int.  */
 
 bool
diff --git a/gcc/c/c-fold.c b/gcc/c/c-fold.c
index 5776f1b..be6a0fc 100644
--- a/gcc/c/c-fold.c
+++ b/gcc/c/c-fold.c
@@ -668,3 +668,13 @@ c_fully_fold_internal (tree expr, bool in_init, bool *maybe_const_operands,
 }
   return ret;
 }
+
+/* Fold X for consideration by one of the warning functions when checking
+   whether an expression has a constant value.  */
+
+tree
+fold_for_warn (tree x)
+{
+  /* The C front-end has already folded X appropriately.  */
+  return x;
+}
diff --git a/gcc/cp/expr.c b/gcc/cp/expr.c
index 7d79215..b1ab453 100644
--- a/gcc/cp/expr.c
+++ b/gcc/cp/expr.c
@@ -315,3 +315,18 @@ mark_exp_read (tree exp)
 }
 }
 
+/* Fold X for consideration by one of the warning functions when checking
+   whether an expression has a constant value.  */
+
+tree
+fold_for_warn (tree x)
+{
+  /* C++ implementation.  */
+
+  /* It's not generally safe to fully fold inside of a template, so
+ call fold_non_dependent_expr instead.  */
+  if (processing_template_decl)
+return fold_non_dependent_expr (x);
+
+  return c_fully_fold (x, /*for_init*/false, /*maybe_constp*/NULL);
+}
diff --git a/gcc/testsuite/g++.dg/wrappers/pr83814.C b/gcc/testsuite/g++.dg/wrappers/pr83814.C
new file mode 100644
index 000..b9f8faa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/wrappers/pr83814.C
@@ -0,0 +1,70 @@
+/* Verify that our memset warnings don't crash when folding
+   arguments within a template (PR c++/83814).  */
+
+// { dg-options "-Wno-int-to-pointer-cast -Wmemset-transposed-args -Wmemset-elt-size" }
+
+template 
+void test_1()
+{
+  __builtin_memset (int() - char(), 0, 0);
+}
+
+template 
+void test_2()
+{
+  __builtin_memset (0, 0, int() - char());
+}
+
+template 
+void test_3 (unsigned a, int c)
+{
+  __builtin_memset((char *)c + a, 0, a);
+}
+
+template 
+void test_4 (unsigned a, int c)
+{
+  __builtin_memset(0, 0, (char *)c + a);
+}
+
+/* Verify that we warn for -Wmemset-transposed-args inside
+   a template.  */
+
+char buf[1024];
+
+template 
+void test_5 ()
+{
+  __builtin_memset (buf, sizeof buf, 0); // { dg-warning "transposed parameters" }
+}
+

Re: [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()

2018-01-17 Thread Jakub Jelinek
On Wed, Jan 17, 2018 at 05:17:29PM +, Joseph Myers wrote:
> On Wed, 17 Jan 2018, Richard Earnshaw wrote:
> 
> > +  if (TREE_CODE (type) == ARRAY_TYPE)
> > +{
> > +  /* Force array-to-pointer decay for c++.  */
> > +  gcc_assert (c_dialect_cxx ());
> 
> What's the basis for the assertion?  Why can't you have a pointer-to-array 
> passed in C?

Yeah, please see e.g. the PR82112 patches for a reason why something like
this doesn't work in C, try it with -std=gnu90 and
struct S { int a[10]; } bar (void);
...
  __whatever_builtin (bar ().a, ...);

Jakub


Re: [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()

2018-01-17 Thread Joseph Myers
On Wed, 17 Jan 2018, Richard Earnshaw wrote:

> +  if (TREE_CODE (type) == ARRAY_TYPE)
> +{
> +  /* Force array-to-pointer decay for c++.  */
> +  gcc_assert (c_dialect_cxx ());

What's the basis for the assertion?  Why can't you have a pointer-to-array 
passed in C?

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load

2018-01-17 Thread Joseph Myers
This patch series seems to be missing testcases (generic and 
architecture-specific).  Generic ones should include testing the error 
cases that are diagnosed.

-- 
Joseph S. Myers
jos...@codesourcery.com


C++ PATCH for c++/81067, redundant NULL warning

2018-01-17 Thread Jason Merrill
My June 9 patch to remove the call to scalar_constant_value from
convert_like_real wrongly also removed the NULL handling that avoids
repeated warnings.  This patch restores that code.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 6d452cf34a321c16b6777c2e20a72b39fbf77a24
Author: Jason Merrill 
Date:   Wed Jan 17 11:29:23 2018 -0500

PR c++/81067 - redundant NULL warning.

* call.c (convert_like_real): Restore null_node handling.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f5542850cea..1f326d5c1ad 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6804,6 +6804,12 @@ convert_like_real (conversion *convs, tree expr, tree fn, int argnum,
 
   if (type_unknown_p (expr))
expr = instantiate_type (totype, expr, complain);
+  if (expr == null_node
+ && INTEGRAL_OR_UNSCOPED_ENUMERATION_TYPE_P (totype))
+   /* If __null has been converted to an integer type, we do not want to
+  continue to warn about uses of EXPR as an integer, rather than as a
+  pointer.  */
+   expr = build_int_cst (totype, 0);
   return expr;
 case ck_ambig:
   /* We leave bad_p off ck_ambig because overload resolution considers


Ping: Re: [PATCH] __VA_OPT__ fixes (PR preprocessor/83063, PR preprocessor/83708)

2018-01-17 Thread Jakub Jelinek
Hi!

I'd like to ping this patch.
As I wrote, it isn't a full solution for all the __VA_OPT__ issues,
but it isn't even clear to me how exactly it should behave, but fixes
some ICEs and a couple of the most important issues and shouldn't make things
worse, at least on the gcc and clang __VA_OPT__ testcases.

On Wed, Jan 10, 2018 at 01:04:07PM +0100, Jakub Jelinek wrote:
> In libcpp, we have quite a lot of state on the token flags, some
> related to the stuff that comes before the token (e.g.
> PREV_FALLTHROUGH, PREV_WHITE and STRINGIFY_ARG), others related to the
> stuff that comes after the token (e.g. PASTE_LEFT, SP_DIGRAPH, SP_PREV_WHITE).
> Unfortunately, with the current __VA_OPT__ handling that information is
> lost, because it is on the __VA_OPT__ or closing ) tokens that we are always
> DROPing.
> 
> The following patch attempts to fix various issues, including some ICEs,
> by introducing 3 new states, two of them are alternatives to INCLUDE used
> for the very first token after __VA_OPT__( , where we want to take into
> account also flags from the __VA_OPT__ token, and before the closing )
> token where we want to use flags from the closing ) token.  Plus
> PADDING which is used for the case where there are no varargs and __VA_OPT__
> is supposed to fold into a placemarker, or for the case of __VA_OPT__(),
> which is similar to that, in both cases we need to take into account in
> those cases both flags from __VA_OPT__ and from the closing ).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> This is just a partial fix, one thing this patch doesn't change is that
> the standard says that __VA_OPT__ ( contents ) should be treated as
> parameter, which means that #__VA_OPT__ ( contents ) should stringify it,
> which we right now reject.  My preprocessor knowledge is too limited to
> handle this right myself, including all the corner cases, e.g. one can have
> #define f(x, ...) #__VA_OPT__(#x x ## x) etc..  I presume
> m_flags = token->flags & (PREV_FALLTHROUGH | PREV_WHITE);
> could be changed into:
> m_flags = token->flags & (PREV_FALLTHROUGH | PREV_WHITE | STRINGIFY_ARG);
> and when handling the PADDING result from update, we could just emit a 
> "" token, but for INCLUDE_FIRST with this we'd need something complex,
> probably a new routine similar to stringify_arg to some extent.
> 
> I've also cross-checked the libcpp implementation with this patch against
> trunk clang which apparently also implements __VA_OPT__ now, on the
> testcases included here the output is the same and on their
> macro_vaopt_expand.cpp testcase, if I remove all tests that test
> #__VA_OPT__ ( contents ) handling which we just reject now, there are still
> some differences:
> $ /usr/src/llvm/obj8/bin/clang++ -E /tmp/macro_vaopt_expand.cpp -std=c++2a > /tmp/1
> $ ~/src/gcc/obj20/gcc/cc1plus -quiet -E /tmp/macro_vaopt_expand.cpp -std=c++2a > /tmp/2
> diff -up /tmp/1 /tmp/2
> -4: f(0 )
> +4: f(0)
> -6: f(0, a )
> -7: f(0, a )
> +6: f(0, a)
> +7: f(0, a)
> -9: TONG C ( ) B ( ) "A()"
> +9: HT_A() C ( ) B ( ) "A()"
> -16: S foo ;
> +16: S foo;
> -26: B1
> -26_1: B1
> +26: B 1
> +26_1: B 1
> -27: B11
> -27_1: BexpandedA0 11
> -28: B11
> +27: B 11
> +27_1: BA0 11
> +28: B 11
> 
> Perhaps some of the whitespace changes aren't significant, but 9:, and
> 2[678]{,_1}: are significantly different.
> 9: is
> #define LPAREN ( 
> #define A() B LPAREN )
> #define B() C LPAREN )
> #define HT_B() TONG
> #define F(x, ...) HT_ ## __VA_OPT__(x x A()  #x)
> 9: F(A(),1)
> 
> Thoughts on what is right and why?
> Similarly for expansion on the last token from __VA_OPT__ when followed
> by ##, like:
> #define m1 (
> #define f16() f17 m1 )
> #define f17() f18 m1 )
> #define f18() m2 m1 )
> #define m3f17() g
> #define f19(x, ...) m3 ## __VA_OPT__(x x f16() #x)
> #define f20(x, ...) __VA_OPT__(x x)##m4()
> #define f21() f17
> #define f17m4() h
> t25 f19 (f16 (), 1);
> t26 f20 (f21 (), 2);
> 
> E.g. 26: is:
> #define F(a,...)  __VA_OPT__(B a ## a) ## 1
> 26: F(,1)
> I really wonder why clang emits B1 in that case, there
> is no ## in between B and a, so those are 2 separate tokens
> separated by whitespace, even when a ## a is a placemarker.
> Does that mean __VA_OPT__ should throw away all the placemarkers
> and return the last non-placemarker token for the ## handling?
> 
> Can somebody please take the rest over?
> 
> BTW, Tom, perhaps you should update your MAINTAINERS entry email address...
> 
> 2018-01-10  Jakub Jelinek  
> 
>   PR preprocessor/83063
>   PR preprocessor/83708
>   * macro.c (enum macro_arg_token_kind): Fix comment typo.
>   (vaopt_state): Add m_flags field, reorder m_last_was_paste before
>   m_state.
>   (vaopt_state::vaopt_state): Adjust for the above changes.
>   (vaopt_state::update_flags): Add INCLUDE_FIRST, INCLUDE_LAST and
>   PADDING.
>   (vaopt_state::update): Add limit argument, update m_flags member,
>   return INCLUDE_FIRST instead 

Re: [C++ Patch] PR 81054 ("[7/8 Regression] ICE with volatile variable in constexpr function") [Take 2]

2018-01-17 Thread Jason Merrill
OK.

On Wed, Jan 17, 2018 at 11:16 AM, Paolo Carlini
 wrote:
> Hi,
>
> On 17/01/2018 15:58, Jason Merrill wrote:
>>
>> On Tue, Jan 16, 2018 at 4:40 PM, Paolo Carlini 
>> wrote:
>>>
>>> On 16/01/2018 22:35, Jason Merrill wrote:

 On Tue, Jan 16, 2018 at 3:32 PM, Paolo Carlini
 
 wrote:
>
> thus I figured out what was badly wrong in my first try: I misread
> ensure_literal_type_for_constexpr_object and missed that it can return
> NULL_TREE without emitting a hard error. Thus my first try even caused
> miscompilations :( Anyway, when DECL_DECLARED_CONSTEXPR_P is true we
> are
> safe and indeed we want to clear it as a matter of error recovery. Then,
> in
> this safe case the only change in the below is returning early, thus
> avoiding any internal inconsistencies later and also the redundant /
> misleading diagnostic which I already mentioned.

 I can't see how this could be right.  In the cases where we don't give
 an error (e.g. because we're dealing with an instantiation of a
 variable template) there is no error, so we need to proceed with the
 rest of cp_finish_decl as normal.
>>>
>>> The cases where we don't give an error all fall under
>>> DECL_DECLARED_CONSTEXPR_P == false, thus aren't affected at all.
>>
>> Ah, true.  Though that's a bit subtle; maybe change ensure_... to
>> return error_mark_node in the error case?
>
> Agreed. The below does that and I'm finishing testing it (in libstdc++ at
> the moment, so far so good and I checked separately for those nasty
> breakages I had yesterday). Note, however, this isn't exactly equivalent to
> my previous patch: it's definitely cleaner and less subtle IMO but more
> aggressive because we immediately bail out of cp_finish_decl in all cases of
> error, not just when DECL_DECLARED_CONSTEXPR_P == true. Ok if it passes?
>
> Thanks!
> Paolo.
>
> 
>
>


[PATCH][arm][testsuite] Fix -march tests in effective target checks auto-generation

2018-01-17 Thread Kyrill Tkachov

Hi all,

There is a typo in the armv8.1-a and armv8.2-a effective target check 
generators.
They are not actually used anywhere in the testsuite as far as I can tell, but 
the fix is obvious.

Committing to trunk.
Thanks,
Kyrill

2018-01-17  Kyrylo Tkachov  

* lib/target-supports.exp: Fix -march arguments in arm arch effective
target check autogenerator for armv8.1-a and armv8.2-a.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 409e1aa3b828ca6ceac1ee4c16252bc8ad03c76a..4095f6386b19d601ffd345922b14e015565a2462 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4108,8 +4108,8 @@ foreach { armfunc armflag armdefs } {
 	v7ve "-march=armv7ve -marm"
 		"__ARM_ARCH_7A__ && __ARM_FEATURE_IDIV"
 	v8a "-march=armv8-a" __ARM_ARCH_8A__
-	v8_1a "-march=armv8.1a" __ARM_ARCH_8A__
-	v8_2a "-march=armv8.2a" __ARM_ARCH_8A__
+	v8_1a "-march=armv8.1-a" __ARM_ARCH_8A__
+	v8_2a "-march=armv8.2-a" __ARM_ARCH_8A__
 	v8m_base "-march=armv8-m.base -mthumb -mfloat-abi=soft"
 		__ARM_ARCH_8M_BASE__
 	v8m_main "-march=armv8-m.main -mthumb" __ARM_ARCH_8M_MAIN__


Re: [PATCH] PR82964: Fix 128-bit immediate ICEs

2018-01-17 Thread Wilco Dijkstra
James Greenhalgh wrote:

> -  /* Do not allow wide int constants - this requires support in movti.  */
> +  /* Only allow simple 128-bit immediates.  */
>    if (CONST_WIDE_INT_P (x))
> -    return false;
> +    return aarch64_mov128_immediate (x);

> I can see why this could be correct, but it is unclear why it is neccessary
> to fix the bug. What goes wrong if we leave this as "return false".

It's not necessary, things only go wrong if you return true for a wider set of
immediates than those directly supported by the movti pattern - and that may
be a regalloc issue.

However removing it (returning false in all cases) actually improves code 
quality
due to a bug in memset expansion. Therefore I'll commit it as returning false
for now (there was no change in test results) and update it once memset is fixed
and inlining works as expected.

Returning true means memset(p, 32, 63) expands as:

mov x2, 2314885530818453536
mov x3, 2314885530818453536
mov x6, 2314885530818453536
mov w5, 538976288
mov w4, 8224
mov w1, 32
stp x2, x3, [x0]
stp x2, x3, [x0, 16]
stp x2, x3, [x0, 32]
str x6, [x0, 48]
str w5, [x0, 56]
strh	w4, [x0, 60]
strb	w1, [x0, 62]
ret

Wilco



Re: [C++ Patch] PR 81054 ("[7/8 Regression] ICE with volatile variable in constexpr function") [Take 2]

2018-01-17 Thread Paolo Carlini

Hi,

On 17/01/2018 15:58, Jason Merrill wrote:

On Tue, Jan 16, 2018 at 4:40 PM, Paolo Carlini  wrote:

On 16/01/2018 22:35, Jason Merrill wrote:

On Tue, Jan 16, 2018 at 3:32 PM, Paolo Carlini 
wrote:

thus I figured out what was badly wrong in my first try: I misread
ensure_literal_type_for_constexpr_object and missed that it can return
NULL_TREE without emitting a hard error. Thus my first try even caused
miscompilations :( Anyway, when DECL_DECLARED_CONSTEXPR_P is true we are
safe and indeed we want to clear it as a matter of error recovery. Then, in
this safe case the only change in the below is returning early, thus
avoiding any internal inconsistencies later and also the redundant /
misleading diagnostic which I already mentioned.

I can't see how this could be right.  In the cases where we don't give
an error (e.g. because we're dealing with an instantiation of a
variable template) there is no error, so we need to proceed with the
rest of cp_finish_decl as normal.

The cases where we don't give an error all fall under
DECL_DECLARED_CONSTEXPR_P == false, thus aren't affected at all.

Ah, true.  Though that's a bit subtle; maybe change ensure_... to
return error_mark_node in the error case?
Agreed. The below does that and I'm finishing testing it (in libstdc++ 
at the moment, so far so good and I checked separately for those nasty 
breakages I had yesterday). Note, however, this isn't exactly equivalent 
to my previous patch: it's definitely cleaner and less subtle IMO but 
more aggressive because we immediately bail out of cp_finish_decl in all 
cases of error, not just when DECL_DECLARED_CONSTEXPR_P == true. Ok if 
it passes?


Thanks!
Paolo.




/cp
2018-01-17  Paolo Carlini  

PR c++/81054
* constexpr.c (ensure_literal_type_for_constexpr_object): Return
error_mark_node when we give an error.
decl.c (cp_finish_decl): Use the latter.

/testsuite
2018-01-17  Paolo Carlini  

PR c++/81054
* g++.dg/cpp0x/constexpr-ice19.C: New.
Index: cp/constexpr.c
===
--- cp/constexpr.c  (revision 256794)
+++ cp/constexpr.c  (working copy)
@@ -75,7 +75,8 @@ literal_type_p (tree t)
 }
 
 /* If DECL is a variable declared `constexpr', require its type
-   be literal.  Return the DECL if OK, otherwise NULL.  */
+   be literal.  Return error_mark_node if we give an error, the
+   DECL otherwise.  */
 
 tree
 ensure_literal_type_for_constexpr_object (tree decl)
@@ -97,6 +98,7 @@ ensure_literal_type_for_constexpr_object (tree dec
	  error ("the type %qT of %<constexpr%> variable %qD "
 "is not literal", type, decl);
  explain_non_literal_class (type);
+ decl = error_mark_node;
}
  else
{
@@ -105,10 +107,10 @@ ensure_literal_type_for_constexpr_object (tree dec
	  error ("variable %qD of non-literal type %qT in %<constexpr%> "
		 "function", decl, type);
  explain_non_literal_class (type);
+ decl = error_mark_node;
}
  cp_function_chain->invalid_constexpr = true;
}
- return NULL;
}
 }
   return decl;
Index: cp/decl.c
===
--- cp/decl.c   (revision 256794)
+++ cp/decl.c   (working copy)
@@ -6810,8 +6810,12 @@ cp_finish_decl (tree decl, tree init, bool init_co
   cp_apply_type_quals_to_decl (cp_type_quals (type), decl);
 }
 
-  if (!ensure_literal_type_for_constexpr_object (decl))
-DECL_DECLARED_CONSTEXPR_P (decl) = 0;
+  if (ensure_literal_type_for_constexpr_object (decl)
+  == error_mark_node)
+{
+  DECL_DECLARED_CONSTEXPR_P (decl) = 0;
+  return;
+}
 
   if (VAR_P (decl)
   && DECL_CLASS_SCOPE_P (decl)
Index: testsuite/g++.dg/cpp0x/constexpr-ice19.C
===
--- testsuite/g++.dg/cpp0x/constexpr-ice19.C(nonexistent)
+++ testsuite/g++.dg/cpp0x/constexpr-ice19.C(working copy)
@@ -0,0 +1,13 @@
+// PR c++/81054
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  volatile int i;
+  constexpr A() : i() {}
+};
+
+struct B
+{
+  static constexpr A a {};  // { dg-error "not literal" }
+};


[PATCH] i386: Use const reference of struct ix86_frame to avoid copy

2018-01-17 Thread H.J. Lu
We can use const reference of struct ix86_frame to avoid making a local
copy of ix86_frame.  ix86_expand_epilogue makes a local copy of struct
ix86_frame and uses the reg_save_offset field as a local variable.  This
patch uses a separate local variable for reg_save_offset.

Tested on x86-64 with ada.  OK for trunk?

H.J.
--
PR target/83905
* config/i386/i386.c (ix86_expand_prologue): Use const reference
of struct ix86_frame.
(ix86_expand_epilogue): Likewise.  Add a local variable for
the reg_save_offset field in struct ix86_frame.
---
 gcc/config/i386/i386.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a301e18ed70..340eca42449 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13385,7 +13385,6 @@ ix86_expand_prologue (void)
 {
   struct machine_function *m = cfun->machine;
   rtx insn, t;
-  struct ix86_frame frame;
   HOST_WIDE_INT allocate;
   bool int_registers_saved;
   bool sse_registers_saved;
@@ -13413,7 +13412,7 @@ ix86_expand_prologue (void)
   m->fs.sp_valid = true;
   m->fs.sp_realigned = false;
 
-  frame = m->frame;
+  const struct ix86_frame &frame = cfun->machine->frame;
 
   if (!TARGET_64BIT && ix86_function_ms_hook_prologue (current_function_decl))
 {
@@ -14291,7 +14290,6 @@ ix86_expand_epilogue (int style)
 {
   struct machine_function *m = cfun->machine;
   struct machine_frame_state frame_state_save = m->fs;
-  struct ix86_frame frame;
   bool restore_regs_via_mov;
   bool using_drap;
   bool restore_stub_is_tail = false;
@@ -14304,7 +14302,7 @@ ix86_expand_epilogue (int style)
 }
 
   ix86_finalize_stack_frame_flags ();
-  frame = m->frame;
+  const struct ix86_frame &frame = cfun->machine->frame;
 
   m->fs.sp_realigned = stack_realign_fp;
   m->fs.sp_valid = stack_realign_fp
@@ -14348,11 +14346,13 @@ ix86_expand_epilogue (int style)
  + UNITS_PER_WORD);
 }
 
+  HOST_WIDE_INT reg_save_offset = frame.reg_save_offset;
+
   /* Special care must be taken for the normal return case of a function
  using eh_return: the eax and edx registers are marked as saved, but
  not restored along this path.  Adjust the save location to match.  */
   if (crtl->calls_eh_return && style != 2)
-frame.reg_save_offset -= 2 * UNITS_PER_WORD;
+reg_save_offset -= 2 * UNITS_PER_WORD;
 
   /* EH_RETURN requires the use of moves to function properly.  */
   if (crtl->calls_eh_return)
@@ -14368,11 +14368,11 @@ ix86_expand_epilogue (int style)
   else if (TARGET_EPILOGUE_USING_MOVE
   && cfun->machine->use_fast_prologue_epilogue
   && (frame.nregs > 1
-  || m->fs.sp_offset != frame.reg_save_offset))
+  || m->fs.sp_offset != reg_save_offset))
 restore_regs_via_mov = true;
   else if (frame_pointer_needed
   && !frame.nregs
-  && m->fs.sp_offset != frame.reg_save_offset)
+  && m->fs.sp_offset != reg_save_offset)
 restore_regs_via_mov = true;
   else if (frame_pointer_needed
   && TARGET_USE_LEAVE
@@ -14440,7 +14440,7 @@ ix86_expand_epilogue (int style)
   rtx t;
 
   if (frame.nregs)
-   ix86_emit_restore_regs_using_mov (frame.reg_save_offset, style == 2);
+   ix86_emit_restore_regs_using_mov (reg_save_offset, style == 2);
 
   /* eh_return epilogues need %ecx added to the stack pointer.  */
   if (style == 2)
@@ -14535,19 +14535,19 @@ ix86_expand_epilogue (int style)
 in epilogues.  */
   if (!m->fs.sp_valid || m->fs.sp_realigned
  || (TARGET_SEH
- && (m->fs.sp_offset - frame.reg_save_offset
+ && (m->fs.sp_offset - reg_save_offset
  >= SEH_MAX_FRAME_SIZE)))
{
  pro_epilogue_adjust_stack (stack_pointer_rtx, hard_frame_pointer_rtx,
 GEN_INT (m->fs.fp_offset
- - frame.reg_save_offset),
+ - reg_save_offset),
 style, false);
}
-  else if (m->fs.sp_offset != frame.reg_save_offset)
+  else if (m->fs.sp_offset != reg_save_offset)
{
  pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
 GEN_INT (m->fs.sp_offset
- - frame.reg_save_offset),
+ - reg_save_offset),
 style,
 m->fs.cfa_reg == stack_pointer_rtx);
}
-- 
2.14.3




Re: [PATCH v2] Fix failure building LLVM with location wrapper nodes (PR c++/83799)

2018-01-17 Thread David Malcolm
On Fri, 2018-01-12 at 12:37 -0500, David Malcolm wrote:
> On Fri, 2018-01-12 at 09:07 +0100, Markus Trippelsdorf wrote:
> > On 2018.01.11 at 18:21 -0500, David Malcolm wrote:

[...snip...]

> > Thanks for the fix. Minor nit:
> > Please make TargetLoweringBase and TargetLowering a struct
> > instead
> > of a class, to prevent illegal access of private members.
> 
> Here's an updated version of the patch which does so (so that the
> testcase compiles cleanly on clang).
> 
> I also added some more assertions to
> selftest::test_type_dependent_expression_p.
> 
> Successfully bootstrapped on x86_64-pc-linux-gnu.
> Manually tested with "make s-selftest-c++" (since we don't run the
> selftests for cc1plus by default).
> 
> OK for trunk?

Jason approved the v2 patch on IRC, so I've committed it to trunk as
r256796.

[...snip...]


Re: [PATCH, rs6000] (v2) Support for gimple folding of mergeh, mergel intrinsics

2018-01-17 Thread Will Schmidt
On Tue, 2018-01-16 at 16:34 -0600, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Jan 16, 2018 at 01:39:28PM -0600, Will Schmidt wrote:
> > Sniff-tests of the target tests on a single system look OK.  Full regtests 
> > are
> > currently running across assorted power systems.
> > OK for trunk, pending successful results?
> 
> Just a few little things:
> 
> > 2018-01-16  Will Schmidt  
> > 
> > * config/rs6000/rs6000.c: (rs6000_gimple_builtin) Add gimple folding
> > support for merge[hl].
> 
> The : goes after the ).
> 
> >  (define_insn "altivec_vmrghw_direct"
> > -  [(set (match_operand:V4SI 0 "register_operand" "=v")
> > -(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> > -  (match_operand:V4SI 2 "register_operand" "v")]
> > - UNSPEC_VMRGH_DIRECT))]
> > +  [(set (match_operand:V4SI 0 "register_operand" "=v,wa")
> > +   (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v,wa")
> > + (match_operand:V4SI 2 "register_operand" "v,wa")]
> > +UNSPEC_VMRGH_DIRECT))]
> >"TARGET_ALTIVEC"
> > -  "vmrghw %0,%1,%2"
> > +  "@
> > +  vmrghw %0,%1,%2
> > +  xxmrghw %x0,%x1,%x2"
> 
> Those last two lines should be indented one more space, so that everything
> aligns (with the @).
> 
> > +  "@
> > +  vmrglw %0,%1,%2
> > +  xxmrglw %x0,%x1,%x2"
> 
> Same here of course.
> 
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/builtins-1-be-folded.c
> > @@ -0,0 +1,11 @@
> > +/* { dg-do compile { target { powerpc-*-* } } } */
> 
> Do you want powerpc*-*-*?  That is default in gcc.target/powerpc; dg-do
> compile is default, too, so you can either say
> 
> /* { dg-do compile } */
> 
> or nothing at all, to taste.
> 
> But it looks like you want to restrict to BE?  We still don't have a
> dejagnu thingy for that; you could put some #ifdef around it all (there
> are some examples in other testcases).  Not ideal, but works.

Just want to ensure continuing coverage.  :-)  This test in particular
is a copy/paste + tweak of an existing test, which tries to limit itself
to BE; there is an LE counterpart.

My regression test results suggest that the addition of the
-mno-fold-gimple option to the existing testcases appears to have
uncovered an ICE, so pausing for the moment...

(Power7 BE, gcc revision 256400, mergeh/mergel gimple folding patch applied)

spawn -ignore SIGHUP 
/home/willschm/gcc/build/gcc-mainline-regtest_patches/gcc/xgcc 
-B/home/willschm/gcc/build/gcc-mainline-regtest_patches/gcc/ 
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/builtins-1-be.c
 -fno-diagnostics-show-caret -fdiagnostics-color=never -mcpu=power8 -O0 
-mno-fold-gimple -ffat-lto-objects -S -m32 -mno-fold-gimple -o builtins-1-be.s
gimple folding of rs6000 builtins has been disabled.
during RTL pass: ira
In file included from 
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/builtins-1-be.c:70:
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/builtins-1.h:
 In function 'main':
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/testsuite/gcc.target/powerpc/builtins-1.h:229:1:
 internal compiler error: in elimination_costs_in_insn, at reload1.c:3633
0x1089878f elimination_costs_in_insn
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/reload1.c:3630
0x108a0dc7 calculate_elim_costs_all_insns()
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/reload1.c:1607
0x106efc17 ira_costs()
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/ira-costs.c:2249
0x106e7533 ira_build()
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/ira-build.c:3421
0x106db30f ira
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/ira.c:5292
0x106db30f execute
/home/willschm/gcc/gcc-mainline-regtest_patches/gcc/ira.c:5603



Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
On Wednesday 17 January 2018 08:31 PM, Wilco Dijkstra wrote:
> Why is that a bad thing? With the patch as is, the testcase generates:
> 
> .L4:
>   ldr q0, [x2, x3]
>   add x5, x1, x3
>   add x3, x3, 16
>   cmp x3, x4
>   str q0, [x5]
>   bne .L4
> 
> With a change in address cost (for loads and stores) we would get:
> 
> .L4:
>   ldr q0, [x3], 16
>   str q0, [x4], 16
>   cmp x3, x5
>   bne .L4
> 
> This looks better to me, especially if there are more loads and stores and
> some have offsets as well (the writeback is once per stream while the extra
> add happens for every store). It may be worth trying both possibilities
> on a large body of code and see which comes out smallest/fastest.

This is great for the load because of the way the falkor prefetcher
works, but it is terrible for the store because of the way the pipeline
works.  The only performant store for falkor is one with a register-indirect
address and a constant or zero offset.  Everything else has hidden costs.

> Note using the cost model as intended means the compiler tries to use the
> lowest cost possibility rather than never emitting the instruction, not even
> when optimizing for size. I think it's wrong to always block a valid 
> instruction.

> It's not clear whether it is easy to split out the costs today (it could be 
> done
> in aarch64_rtx_costs but not aarch64_address_cost, and the latter is what
> IVOpt uses).

I briefly looked at the possibility of splitting the register_offset
cost into load and store, but I realized that I'd have to modify the
target hook for it to be useful, which is way too much work for this
single quirk.

>> Further, it seems like worthwhile work only if there are other parts
>> that actually have the same quirk and can use this split.  Do you know
>> of any such cores?
> 
> Currently there are several supported CPUs which use a much higher cost
> for TImode and for register offsets. So it's a common thing to want, however
> I don't know whether splitting load/store address costs helps for those.

It wouldn't.  This ought to be expressed already using the addr_scale_costs.

> I think a special case for Falkor in aarch64_address_cost would be acceptable
> in GCC8 - that would be much smaller and cleaner than the current patch. 
> If required we could improve upon this in GCC9 and add a way to differentiate
> between loads and stores.

I can't do this in address_cost since I can't determine whether the
address is a load or a store location.  The most minimal way seems to be
using the patterns in the md file.

Siddhesh


[C++/83739] bogus error tsubsting range for in generic lambda

2018-01-17 Thread Nathan Sidwell
When a generic lambda contains a range_for, and we're instantiating the 
containing function, we must rebuild a range_for.  We should only 
convert to a regular for when tsubsting the resulting generic lambda itself.


This patch looks at processing_template_decl to determine that.

Applying to trunk.

nathan

--
Nathan Sidwell
2018-01-17  Nathan Sidwell  

	PR c++/83739
	* pt.c (tsubst_expr) <RANGE_FOR_STMT>: Rebuild a range_for if
	this is not a final instantiation.

	PR c++/83739
	* g++.dg/cpp1y/pr83739.C: New.

Index: cp/pt.c
===
--- cp/pt.c	(revision 256794)
+++ cp/pt.c	(working copy)
@@ -16153,26 +16153,40 @@ tsubst_expr (tree t, tree args, tsubst_f
 
 case RANGE_FOR_STMT:
   {
+	/* Construct another range_for, if this is not a final
+	   substitution (for inside a generic lambda of a
+	   template).  Otherwise convert to a regular for.  */
 tree decl, expr;
-stmt = begin_for_stmt (NULL_TREE, NULL_TREE);
+stmt = (processing_template_decl
+		? begin_range_for_stmt (NULL_TREE, NULL_TREE)
+		: begin_for_stmt (NULL_TREE, NULL_TREE));
 decl = RANGE_FOR_DECL (t);
 decl = tsubst (decl, args, complain, in_decl);
 maybe_push_decl (decl);
 expr = RECUR (RANGE_FOR_EXPR (t));
-	const unsigned short unroll
-	  = RANGE_FOR_UNROLL (t) ? tree_to_uhwi (RANGE_FOR_UNROLL (t)) : 0;
+
+	tree decomp_first = NULL_TREE;
+	unsigned decomp_cnt = 0;
 	if (VAR_P (decl) && DECL_DECOMPOSITION_P (decl))
+	  decl = tsubst_decomp_names (decl, RANGE_FOR_DECL (t), args,
+  complain, in_decl,
+  &decomp_first, &decomp_cnt);
+
+	if (processing_template_decl)
 	  {
-	unsigned int cnt;
-	tree first;
-	decl = tsubst_decomp_names (decl, RANGE_FOR_DECL (t), args,
-	complain, in_decl, &first, &cnt);
-	stmt = cp_convert_range_for (stmt, decl, expr, first, cnt,
-	 RANGE_FOR_IVDEP (t), unroll);
+	RANGE_FOR_IVDEP (stmt) = RANGE_FOR_IVDEP (t);
+	RANGE_FOR_UNROLL (stmt) = RANGE_FOR_UNROLL (t);
+	finish_range_for_decl (stmt, decl, expr);
 	  }
 	else
-	  stmt = cp_convert_range_for (stmt, decl, expr, NULL_TREE, 0,
-   RANGE_FOR_IVDEP (t), unroll);
+	  {
+	unsigned short unroll = (RANGE_FOR_UNROLL (t)
+ ? tree_to_uhwi (RANGE_FOR_UNROLL (t)) : 0);
+	stmt = cp_convert_range_for (stmt, decl, expr,
+	 decomp_first, decomp_cnt,
+	 RANGE_FOR_IVDEP (t), unroll);
+	  }
+
 	bool prev = note_iteration_stmt_body_start ();
 RECUR (RANGE_FOR_BODY (t));
 	note_iteration_stmt_body_end (prev);
Index: testsuite/g++.dg/cpp1y/pr83739.C
===
--- testsuite/g++.dg/cpp1y/pr83739.C	(revision 0)
+++ testsuite/g++.dg/cpp1y/pr83739.C	(working copy)
@@ -0,0 +1,16 @@
+// { dg-do compile { target c++14 } }
+
+// PR 83739, deduced range-for in lambda in template
+
+template <typename T> void f()
+{
+  int x[2];
+  auto delegate = [](auto & foo)
+  {
+for (auto bar : foo);
+  };
+  delegate(x);
+}
+int main() {
+  f<int>();
+}


Re: Compilation warning in simple-object-xcoff.c

2018-01-17 Thread Eli Zaretskii
> From: Andreas Schwab 
> Cc: Eli Zaretskii ,  gcc-patches@gcc.gnu.org,  
> gdb-patc...@sourceware.org
> Date: Tue, 16 Jan 2018 23:00:55 +0100
> 
> On Jan 16 2018, DJ Delorie  wrote:
> 
> > And it's not the host's bit size that counts; there are usually ways to
> > get 64-bit file operations on 32-bit hosts.
> 
> If ACX_LARGEFILE doesn't succeed in enabling those 64-bit file
> operations (thus making off_t a 64-bit type) then you are out of luck
> (or AC_SYS_LARGEFILE doesn't support your host yet).

Yes, AC_SYS_LARGEFILE doesn't support MinGW.

DJ, would the following semi-kludgey workaround be acceptable?

--- libiberty/simple-object-xcoff.c~0   2018-01-12 05:31:04.0 +0200
+++ libiberty/simple-object-xcoff.c 2018-01-17 12:21:08.496186000 +0200
@@ -596,13 +596,19 @@ simple_object_xcoff_find_sections (simpl
  aux = (unsigned char *) auxent;
  if (u64)
{
+ /* Use an intermediate 64-bit type to avoid
+compilation warning about 32-bit shift below on
+hosts with 32-bit off_t which aren't supported by
+AC_SYS_LARGEFILE.  */
+ ulong_type x_scnlen64;
+
  if ((auxent->u.xcoff64.x_csect.x_smtyp & 0x7) != XTY_SD
  || auxent->u.xcoff64.x_csect.x_smclas != XMC_XO)
continue;
 
- x_scnlen = fetch_32 (aux + offsetof (union external_auxent,
-  u.xcoff64.x_csect.x_scnlen_hi));
- x_scnlen = x_scnlen << 32
+ x_scnlen64 = fetch_32 (aux + offsetof (union external_auxent,
+u.xcoff64.x_csect.x_scnlen_hi));
+ x_scnlen = x_scnlen64 << 32
   | fetch_32 (aux + offsetof (union external_auxent,
   u.xcoff64.x_csect.x_scnlen_lo));
}


Re: [PATCH] document -Wclass-memaccess suppression by casting (PR 81327)

2018-01-17 Thread Jason Merrill
On Mon, Jan 15, 2018 at 5:09 AM, Florian Weimer  wrote:
> * Martin Sebor:
>
>> +the virtual table.  Modifying the representation of such objects may violate
>^vtable pointer?
>
> The vtable itself is not corrupted, I assume.

Indeed.

Jason


Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
Siddhesh Poyarekar wrote:
  
> The current cost model will disable reg offset for loads as well as
> stores, which doesn't work well since loads with reg offset are faster
> for falkor.

Why is that a bad thing? With the patch as is, the testcase generates:

.L4:
ldr q0, [x2, x3]
add x5, x1, x3
add x3, x3, 16
cmp x3, x4
str q0, [x5]
bne .L4

With a change in address cost (for loads and stores) we would get:

.L4:
ldr q0, [x3], 16
str q0, [x4], 16
cmp x3, x5
bne .L4

This looks better to me, especially if there are more loads and stores and
some have offsets as well (the writeback is once per stream while the extra
add happens for every store). It may be worth trying both possibilities
on a large body of code and see which comes out smallest/fastest.

Note using the cost model as intended means the compiler tries to use the
lowest cost possibility rather than never emitting the instruction, not even
when optimizing for size. I think it's wrong to always block a valid 
instruction.

> Also, this is a very specific tweak for a specific processor, i.e. I
> don't know if there is value in splitting out the costs into loads and
> stores and further into 128-bit and lower just to set the 128 store cost
> higher.  That will increase the size of the change by quite a bit and
> may not make it suitable for inclusion into gcc8 at this stage, while
> the current one still qualifies given its contained impact.

It's not clear whether it is easy to split out the costs today (it could be done
in aarch64_rtx_costs but not aarch64_address_cost, and the latter is what
IVOpt uses).

> Further, it seems like worthwhile work only if there are other parts
> that actually have the same quirk and can use this split.  Do you know
> of any such cores?

Currently there are several supported CPUs which use a much higher cost
for TImode and for register offsets. So it's a common thing to want, however
I don't know whether splitting load/store address costs helps for those.

I think a special case for Falkor in aarch64_address_cost would be acceptable
in GCC8 - that would be much smaller and cleaner than the current patch. 
If required we could improve upon this in GCC9 and add a way to differentiate
between loads and stores.

Wilco

Re: [C++ Patch] PR 81054 ("[7/8 Regression] ICE with volatile variable in constexpr function") [Take 2]

2018-01-17 Thread Jason Merrill
On Tue, Jan 16, 2018 at 4:40 PM, Paolo Carlini  wrote:
> On 16/01/2018 22:35, Jason Merrill wrote:
>> On Tue, Jan 16, 2018 at 3:32 PM, Paolo Carlini 
>> wrote:
>>>
>>> thus I figured out what was badly wrong in my first try: I misread
>>> ensure_literal_type_for_constexpr_object and missed that it can return
>>> NULL_TREE without emitting an hard error. Thus my first try even caused
>>> miscompilations :( Anyway, when DECL_DECLARED_CONSTEXPR_P is true we are
>>> safe and indeed we want to clear it as matter of error recovery. Then, in
>>> this safe case the only change in the below is returning early, thus
>>> avoiding any internal inconsistencies later and also the redundant /
>>> misleading diagnostic which I already mentioned.
>>
>> I can't see how this could be right.  In the cases where we don't give
>> an error (e.g. because we're dealing with an instantiation of a
>> variable template) there is no error, so we need to proceed with the
>> rest of cp_finish_decl as normal.
>
> The cases where we don't give an error all fall under
> DECL_DECLARED_CONSTEXPR_P == false, thus aren't affected at all.

Ah, true.  Though that's a bit subtle; maybe change ensure_... to
return error_mark_node in the error case?

Jason


[PATCH 3/3] [arm] Implement support for the de-speculation intrinsic

2018-01-17 Thread Richard Earnshaw

This patch implements despeculation on ARM.  We only support it when
generating ARM or Thumb2 code (we need conditional execution); and we
only support it for sizes up to DImode.  For unsupported cases we
fall back to the generic code generation sequence so that a suitable
failure warning is emitted.

* config/arm/arm.c (arm_speculation_safe_load): New function
(TARGET_SPECULATION_SAFE_LOAD): Redefine.
* config/arm/unspec.md (VUNSPEC_NOSPECULATE): New unspec_volatile code.
* config/arm/arm.md (cmp_ior): Make this pattern callable.
(nospeculate<mode>, nospeculatedi): New patterns.
---
 gcc/config/arm/arm.c  | 100 ++
 gcc/config/arm/arm.md |  40 ++-
 gcc/config/arm/unspecs.md |   1 +
 3 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 11e35ad..f28ad2b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -321,6 +321,8 @@ static unsigned int arm_hard_regno_nregs (unsigned int, machine_mode);
 static bool arm_hard_regno_mode_ok (unsigned int, machine_mode);
 static bool arm_modes_tieable_p (machine_mode, machine_mode);
 static HOST_WIDE_INT arm_constant_alignment (const_tree, HOST_WIDE_INT);
+static rtx arm_speculation_safe_load (machine_mode, rtx, rtx, rtx, rtx, rtx,
+  bool);
 
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -804,6 +806,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment
+
+#undef TARGET_SPECULATION_SAFE_LOAD
+#define TARGET_SPECULATION_SAFE_LOAD arm_speculation_safe_load
 
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -31523,6 +31528,101 @@ arm_constant_alignment (const_tree exp, HOST_WIDE_INT align)
   return align;
 }
 
+static rtx
+arm_speculation_safe_load (machine_mode mode, rtx result, rtx mem,
+  rtx lower_bound, rtx upper_bound,
+			   rtx cmpptr, bool warn)
+{
+  rtx cond, comparison;
+
+  /* We can't support this for Thumb1 as we have no suitable conditional
+ move operations.  Nor do we support it for TImode.  For both
+ these cases fall back to the generic code sequence which will emit
+ a suitable warning for us.  */
+  if (mode == TImode || TARGET_THUMB1)
+return default_speculation_safe_load (mode, result, mem, lower_bound,
+	  upper_bound, cmpptr, warn);
+
+
+  rtx target = gen_reg_rtx (mode);
+  rtx tgt2 = result;
+
+  if (!register_operand (tgt2, mode))
+tgt2 = gen_reg_rtx (mode);
+
+  if (!register_operand (cmpptr, ptr_mode))
+cmpptr = force_reg (ptr_mode, cmpptr);
+
+  /* There's no point in comparing against a lower bound that is NULL, all
+ addresses are greater than or equal to that.  */
+  if (lower_bound == const0_rtx)
+{
+  if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+  cond = arm_gen_compare_reg (GEU, cmpptr, upper_bound, NULL);
+  comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+}
+  else
+{
+  /* We want to generate code for
+	   result = (cmpptr < lower || cmpptr >= upper) ? 0 : *ptr;
+	 Which can be recast to
+	   result = (cmpptr < lower || upper <= cmpptr) ? 0 : *ptr;
+	 which can be implemented as
+	   cmp   cmpptr, lower
+	   cmpcs upper, cmpptr
+	   bls   1f
+	   ldr   result, [ptr]
+	  1:
+	   movls result, #0
+	 with suitable IT instructions as needed for thumb2.  Later
+	 optimization passes may make the load conditional.  */
+
+  if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+  if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+  rtx comparison1 = gen_rtx_LTU (SImode, cmpptr, lower_bound);
+  rtx comparison2 = gen_rtx_LEU (SImode, upper_bound, cmpptr);
+  cond = gen_rtx_REG (arm_select_dominance_cc_mode (comparison1,
+			comparison2,
+			DOM_CC_X_OR_Y),
+			  CC_REGNUM);
+  emit_insn (gen_cmp_ior (cmpptr, lower_bound, upper_bound, cmpptr,
+			  comparison1, comparison2, cond));
+  comparison = gen_rtx_NE (SImode, cond, const0_rtx);
+}
+
+  rtx_code_label *label = gen_label_rtx ();
+  emit_jump_insn (gen_arm_cond_branch (label, comparison, cond));
+  emit_move_insn (target, mem);
+  emit_label (label);
+
+  insn_code icode;
+
+  /* We don't support TImode on Arm, but that can't currently be generated
+ for integral types on this architecture.  */
+  switch (mode)
+{
+case E_QImode: icode = CODE_FOR_nospeculateqi; break;
+case E_HImode: icode = CODE_FOR_nospeculatehi; break;
+case E_SImode: icode = CODE_FOR_nospeculatesi; break;
+case E_DImode: icode = CODE_FOR_nospeculatedi; break;
+default:
+  gcc_unreachable ();
+}
+
+  emit_insn (GEN_FCN (icode) (tgt2, 

[PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()

2018-01-17 Thread Richard Earnshaw

This patch adds generic support for the new builtin
__builtin_speculation_safe_load.  It provides the overloading of the
different access sizes and a default fall-back expansion for targets
that do not support a mechanism for inhibiting speculation.

* builtin-types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
New builtin type signature.
(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
* builtins.def (BUILT_IN_SPECULATION_SAFE_LOAD_N): New builtin.
(BUILT_IN_SPECULATION_SAFE_LOAD_1): Likewise.
(BUILT_IN_SPECULATION_SAFE_LOAD_2): Likewise.
(BUILT_IN_SPECULATION_SAFE_LOAD_4): Likewise.
(BUILT_IN_SPECULATION_SAFE_LOAD_8): Likewise.
(BUILT_IN_SPECULATION_SAFE_LOAD_16): Likewise.
* target.def (speculation_safe_load): New hook.
* doc/tm.texi.in (TARGET_SPECULATION_SAFE_LOAD): Add to
documentation.
* doc/tm.texi: Regenerated.
* doc/cpp.texi: Document __HAVE_SPECULATION_SAFE_LOAD.
* doc/extend.texi: Document __builtin_speculation_safe_load.
* c-family/c-common.c (speculation_safe_load_resolve_size): New
function.
(speculation_safe_load_resolve_params): New function.
(speculation_safe_load_resolve_return): New function.
(resolve_overloaded_builtin): Handle overloading
__builtin_speculation_safe_load.
* builtins.c (expand_speculation_safe_load): New function.
(expand_builtin): Handle new speculation-safe builtins.
* targhooks.h (default_speculation_safe_load): Declare.
* targhooks.c (default_speculation_safe_load): New function.
---
 gcc/builtin-types.def   |  16 +
 gcc/builtins.c  |  81 +++
 gcc/builtins.def|  17 +
 gcc/c-family/c-common.c | 152 
 gcc/c-family/c-cppbuiltin.c |   5 +-
 gcc/doc/cpp.texi|   4 ++
 gcc/doc/extend.texi |  68 
 gcc/doc/tm.texi |   9 +++
 gcc/doc/tm.texi.in  |   2 +
 gcc/target.def  |  34 ++
 gcc/targhooks.c |  59 +
 gcc/targhooks.h |   3 +
 12 files changed, 449 insertions(+), 1 deletion(-)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index bb50e60..492d4f6 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -785,6 +785,22 @@ DEF_FUNCTION_TYPE_VAR_3 (BT_FN_SSIZE_STRING_SIZE_CONST_STRING_VAR,
 DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_FILEPTR_INT_CONST_STRING_VAR,
 			 BT_INT, BT_FILEPTR, BT_INT, BT_CONST_STRING)
 
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I1, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 BT_I2, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I4, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I8, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I16, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+
 DEF_FUNCTION_TYPE_VAR_4 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VAR,
 			 BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING)
 
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 98eb804..c0a15d1 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6602,6 +6602,79 @@ expand_stack_save (void)
   return ret;
 }
 
+/* Expand a call to __builtin_speculation_safe_load_.  MODE
+   represents the size of the first argument to that call.  We emit a
+   warning if the result isn't used (IGNORE != 0), since the
+   implementation might rely on the value being used to correctly
+   inhibit speculation.  */
+static rtx
+expand_speculation_safe_load (machine_mode mode, tree exp, rtx target,
+			  int ignore)
+{
+  rtx ptr, mem, lower, upper, cmpptr;
+  unsigned nargs = call_expr_nargs (exp);
+
+  if (ignore)
+{
+  warning_at (input_location, 0,
+		  "result of __builtin_speculation_safe_load must be used to "
+		  "ensure correct operation");
+  target = NULL;
+}
+
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  tree arg2 = CALL_EXPR_ARG (exp, 2);
+
+  ptr = expand_expr (arg0, NULL_RTX, ptr_mode, EXPAND_SUM);
+  mem = validize_mem (gen_rtx_MEM (mode, convert_memory_address (Pmode, ptr)));
+
+  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
+			   get_pointer_alignment (arg0)));
+  

[PATCH 2/3] [aarch64] Implement support for __builtin_speculation_safe_load

2018-01-17 Thread Richard Earnshaw

This patch implements support for __builtin_speculation_safe_load on
AArch64.  On this architecture we inhibit speculation by emitting a
combination of CSEL and a hint instruction that ensures the CSEL is
fully resolved when the operands to the CSEL may involve a speculative
load.

* config/aarch64/aarch64.c (aarch64_print_operand): Handle zero passed
to 'H' operand qualifier.
(aarch64_speculation_safe_load): New function.
(TARGET_SPECULATION_SAFE_LOAD): Redefine.
* config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New unspec_volatile
code.
(nospeculate<mode>, nospeculateti): New patterns.
---
 gcc/config/aarch64/aarch64.c  | 81 +++
 gcc/config/aarch64/aarch64.md | 28 +++
 2 files changed, 109 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 93e9d9f9..6591d19 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5315,6 +5315,14 @@ aarch64_print_operand (FILE *f, rtx x, int code)
   break;
 
 case 'H':
+   /* Print the higher numbered register of a pair (TImode) of regs.  */
+  if (x == const0_rtx
+	  || (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x)))
+	{
+	  asm_fprintf (f, "xzr");
+	  break;
+	}
+
   if (!REG_P (x) || !GP_REGNUM_P (REGNO (x) + 1))
 	{
 	  output_operand_lossage ("invalid operand for '%%%c'", code);
@@ -15115,6 +15123,76 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
 }
 }
 
+static rtx
+aarch64_speculation_safe_load (machine_mode mode, rtx result, rtx mem,
+			   rtx lower_bound, rtx upper_bound, rtx cmpptr,
+			   bool warn ATTRIBUTE_UNUSED)
+{
+  rtx cond, comparison;
+  rtx target = gen_reg_rtx (mode);
+  rtx tgt2 = result;
+
+  if (!register_operand (cmpptr, ptr_mode))
+cmpptr = force_reg (ptr_mode, cmpptr);
+
+  if (!register_operand (tgt2, mode))
+tgt2 = gen_reg_rtx (mode);
+
+  if (lower_bound == const0_rtx)
+{
+  if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+  cond = aarch64_gen_compare_reg (GEU, cmpptr, upper_bound);
+  comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+}
+  else
+{
+  if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+  if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+  rtx cond1 = aarch64_gen_compare_reg (GEU, cmpptr, lower_bound);
+  rtx comparison1 = gen_rtx_GEU (ptr_mode, cond1, const0_rtx);
+  rtx failcond = GEN_INT (aarch64_get_condition_code (comparison1)^1);
+  cond = gen_rtx_REG (CCmode, CC_REGNUM);
+  if (ptr_mode == SImode)
+	emit_insn (gen_ccmpsi (cond1, cond, cmpptr, upper_bound, comparison1,
+			   failcond));
+  else
+	emit_insn (gen_ccmpdi (cond1, cond, cmpptr, upper_bound, comparison1,
+			   failcond));
+  comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+}
+
+  rtx_code_label *label = gen_label_rtx ();
+  emit_jump_insn (gen_condjump (comparison, cond, label));
+  emit_move_insn (target, mem);
+  emit_label (label);
+
+  insn_code icode;
+
+  switch (mode)
+{
+case E_QImode: icode = CODE_FOR_nospeculateqi; break;
+case E_HImode: icode = CODE_FOR_nospeculatehi; break;
+case E_SImode: icode = CODE_FOR_nospeculatesi; break;
+case E_DImode: icode = CODE_FOR_nospeculatedi; break;
+case E_TImode: icode = CODE_FOR_nospeculateti; break;
+default:
+  gcc_unreachable ();
+}
+
+  emit_insn (GEN_FCN (icode) (tgt2, comparison, cond, target, const0_rtx));
+
+  if (tgt2 != result)
+emit_move_insn (result, tgt2);
+
+  return result;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -15554,6 +15632,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
+#undef TARGET_SPECULATION_SAFE_LOAD
+#define TARGET_SPECULATION_SAFE_LOAD aarch64_speculation_safe_load
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f1e2a07..1a1f398 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,7 @@
 UNSPECV_SET_FPSR		; Represent assign of FPSR content.
 UNSPECV_BLOCKAGE		; Represent a blockage
 UNSPECV_PROBE_STACK_RANGE	; Represent stack range probing.
+UNSPECV_NOSPECULATE		; Inhibit speculation
   ]
 )
 
@@ -5797,6 +5798,33 @@
   DONE;
 })
 
+(define_insn "nospeculate"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+(unspec_volatile:ALLI
+ [(match_operator 1 "aarch64_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:ALLI 3 "register_operand" "r")
+	  (match_operand:ALLI 4 "aarch64_reg_or_zero" "rZ")]
+	 

[PATCH 0/3] [v2] Implement __builtin_speculation_safe_load

2018-01-17 Thread Richard Earnshaw

This patch series is version 2 for a patch to protect against
speculative use of a load instruction.  It's based on a roll-up of the
feedback from the first version.

What's changed since v1?

First and foremost, the API has changed to make it possible to reduce
the amount of code that needs to be generated on architectures that
provide an unconditional speculation barrier.  Although the builtin
still requires the additional operands (and does some checking to
ensure that they aren't completely unreasonable), back-end expansion
can ignore them and simply emit a load instruction along with the
barrier operation.

Secondly, I've dropped the failval parameter.  With the new API, with
undefined behaviour for an out-of-bounds access, this parameter
no longer makes sense.

Thirdly, based on off-list comments, I've dropped the ability to make
upper_bound NULL.  As a bounds value it was not particularly logical
and coding it required special handling in the back-end (wrong code
would be generated otherwise).  By removing it the bounds are now all
natural.  (NULL is still supported as a lower bound, nothing can be
less than address 0 --- at least on unsigned addressing-mode machines,
which appear to be all that GCC supports --- so it is a simple back-end
optimization to drop the check in this case.)

Finally, I've changed the name of the builtin.  This was mostly for
Arm's benefit since we already have some folk building code with the
old API and I need to be able to transition them cleanly to the new
one.  This is a one-off change, I'm not intending to support such
renames if we have to iterate again on this builtin before the initial
commit (ie I won't do it again, promise).

I've updated the documentation for the changes, please read that for
additional details (in patch 1).  Richi commented that he thought some
examples would help: I agree, but feel that putting them in
extend.texi with the builtin itself doesn't really fit with the rest
of that section - I think a web, or wiki page on the subject would be a
better bet.  That can be kept up-to-date more easily than the
documentation that comes with the compiler.

R.

Richard Earnshaw (3):
  [builtins] Generic support for __builtin_speculation_safe_load()
  [aarch64] Implement support for __builtin_speculation_safe_load
  [arm] Implement support for the de-speculation intrinsic

 gcc/builtin-types.def |  16 +
 gcc/builtins.c|  81 ++
 gcc/builtins.def  |  17 +
 gcc/c-family/c-common.c   | 152 ++
 gcc/c-family/c-cppbuiltin.c   |   5 +-
 gcc/config/aarch64/aarch64.c  |  81 ++
 gcc/config/aarch64/aarch64.md |  28 
 gcc/config/arm/arm.c  | 100 +++
 gcc/config/arm/arm.md |  40 ++-
 gcc/config/arm/unspecs.md |   1 +
 gcc/doc/cpp.texi  |   4 ++
 gcc/doc/extend.texi   |  68 +++
 gcc/doc/tm.texi   |   9 +++
 gcc/doc/tm.texi.in|   2 +
 gcc/target.def|  34 ++
 gcc/targhooks.c   |  59 
 gcc/targhooks.h   |   3 +
 17 files changed, 698 insertions(+), 2 deletions(-)



[PATCH] Fix PR83887, re-implement SESE region merging

2018-01-17 Thread Richard Biener

This fixes PR83887 where current GRAPHITE SESE region merging sometimes
finds non-SESE regions as a result.

It re-implements SESE region merging by searching for the entry/exit
with a worklist based algorithm seeded by the entry/exit block of
the to be merged regions, walking predecessors and successors and
using dominator information to see whether we deal with a
(possible) entry or exit edge.

The patch misses optimization in that known SESE regions can be
skipped in the walk (and trivially known ones are single exit
loops).  While easily added during iteration I've yet think about
a way to handle the case where the initial seeds are entries of
such regions.

I probably also should add some comments.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also built
SPEC CPU 2006 with -floop-nest-optimize [-fgraphite-identity] with
no new issues and 4 less optimized loop nests (maybe we did get away
with some invalid SESE regions - I did not investigate yet).

Comments welcome.

Thanks,
Richard.

2018-01-17  Richard Biener  

PR tree-optimization/83887
* graphite-scop-detection.c
(scop_detection::get_nearest_dom_with_single_entry): Remove.
(scop_detection::get_nearest_pdom_with_single_exit): Likewise.
(scop_detection::merge_sese): Re-implement with a flood-fill
algorithm that properly finds a SESE region if it exists.

* gcc.dg/graphite/pr83887.c: New testcase.
* gfortran.dg/graphite/pr83887.f90: Likewise.
* gfortran.dg/graphite/pr83887.f: Likewise.

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 256776)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -309,16 +309,6 @@ public:
 
   sese_l get_sese (loop_p loop);
 
-  /* Return the closest dominator with a single entry edge.  In case of a
- back-loop the back-edge is not counted.  */
-
-  static edge get_nearest_dom_with_single_entry (basic_block dom);
-
-  /* Return the closest post-dominator with a single exit edge.  In case of a
- back-loop the back-edge is not counted.  */
-
-  static edge get_nearest_pdom_with_single_exit (basic_block dom);
-
   /* Merge scops at same loop depth and returns the new sese.
  Returns a new SESE when merge was successful, INVALID_SESE otherwise.  */
 
@@ -441,85 +431,6 @@ scop_detection::get_sese (loop_p loop)
   return sese_l (scop_begin, scop_end);
 }
 
-/* Return the closest dominator with a single entry edge.  */
-
-edge
-scop_detection::get_nearest_dom_with_single_entry (basic_block dom)
-{
-  if (!dom->preds)
-return NULL;
-
-  /* If any of the dominators has two predecessors but one of them is a back
- edge, then that basic block also qualifies as a dominator with single
- entry.  */
-  if (dom->preds->length () == 2)
-{
-  /* If e1->src dominates e2->src then e1->src will also dominate dom.  */
-  edge e1 = (*dom->preds)[0];
-  edge e2 = (*dom->preds)[1];
-  loop_p l = dom->loop_father;
-  loop_p l1 = e1->src->loop_father;
-  loop_p l2 = e2->src->loop_father;
-  if (l != l1 && l == l2
- && dominated_by_p (CDI_DOMINATORS, e2->src, e1->src))
-   return e1;
-  if (l != l2 && l == l1
- && dominated_by_p (CDI_DOMINATORS, e1->src, e2->src))
-   return e2;
-}
-
-  while (dom->preds->length () != 1)
-{
-  if (dom->preds->length () < 1)
-   return NULL;
-  dom = get_immediate_dominator (CDI_DOMINATORS, dom);
-  if (!dom->preds)
-   return NULL;
-}
-  return (*dom->preds)[0];
-}
-
-/* Return the closest post-dominator with a single exit edge.  In case of a
-   back-loop the back-edge is not counted.  */
-
-edge
-scop_detection::get_nearest_pdom_with_single_exit (basic_block pdom)
-{
-  if (!pdom->succs)
-return NULL;
-
-  /* If any of the post-dominators has two successors but one of them is a back
- edge, then that basic block also qualifies as a post-dominator with single
- exit. */
-  if (pdom->succs->length () == 2)
-{
-  /* If e1->dest post-dominates e2->dest then e1->dest will also
-post-dominate pdom.  */
-  edge e1 = (*pdom->succs)[0];
-  edge e2 = (*pdom->succs)[1];
-  loop_p l = pdom->loop_father;
-  loop_p l1 = e1->dest->loop_father;
-  loop_p l2 = e2->dest->loop_father;
-  if (l != l1 && l == l2
- && dominated_by_p (CDI_POST_DOMINATORS, e2->dest, e1->dest))
-   return e1;
-  if (l != l2 && l == l1
- && dominated_by_p (CDI_POST_DOMINATORS, e1->dest, e2->dest))
-   return e2;
-}
-
-  while (pdom->succs->length () != 1)
-{
-  if (pdom->succs->length () < 1)
-   return NULL;
-  pdom = get_immediate_dominator (CDI_POST_DOMINATORS, pdom);
-  if (!pdom->succs)
-   return NULL;
-}
-
-  return (*pdom->succs)[0];
-}
-
 /* Merge scops at same loop depth and returns the new sese.

Re: [PATCH] C++: Fix ICE in warn_for_memset within templates (PR c++/83814)

2018-01-17 Thread Jason Merrill
On Wed, Jan 17, 2018 at 5:34 AM, Jakub Jelinek  wrote:
> On Fri, Jan 12, 2018 at 05:09:24PM -0500, David Malcolm wrote:
>> PR c++/83814 reports an ICE introduced by the location wrapper patch
>> (r256448), affecting certain memset calls within templates.
>
> Note, I think this issue sadly affects a lot of code, so it is quite urgent.
>
> That said, wonder if we really can't do any folding when
> processing_template_decl, could we e.g. do at least maybe_constant_value,
> or fold if the expression is not type nor value dependent?

Yes, in a template we should call fold_non_dependent_expr.

> BTW, never know if cp_fold_rvalue is a superset of maybe_constant_value or 
> not.

It is.

Jason


libgo patch committed: Update to Go1.10beta2 release

2018-01-17 Thread Ian Lance Taylor
This patch updates libgo to the Go1.10beta2 release.  The complete
patch is too large to include in this e-mail message, mainly due to
some test changes.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian


patch.txt.bz2
Description: BZip2 compressed data


Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
On Wednesday 17 January 2018 07:07 PM, Wilco Dijkstra wrote:
> (finished version this time, somehow Outlook loves to send emails early...)
> 
> Hi,
> 
> In general I think the best way to achieve this would be to use the
> existing cost models which are there for exactly this purpose. If
> this doesn't work well enough then we should fix those. As is,
> this patch disables a whole class of instructions for a specific
> target rather than simply telling GCC that they are expensive and
> should only be used if there is no cheaper alternative.

The current cost model will disable reg offset for loads as well as
stores, which doesn't work well since loads with reg offset are faster
for falkor.

Also, this is a very specific tweak for a specific processor, i.e. I
don't know if there is value in splitting out the costs into loads and
stores and further into 128-bit and lower just to set the 128 store cost
higher.  That will increase the size of the change by quite a bit and
may not make it suitable for inclusion into gcc8 at this stage, while
the current one still qualifies given its contained impact.

Further, it seems like worthwhile work only if there are other parts
that actually have the same quirk and can use this split.  Do you know
of any such cores?

> Also there is potential impact on generic code from:
> 
>  (define_insn "*aarch64_simd_mov"
>[(set (match_operand:VQ 0 "nonimmediate_operand"
> - "=w, Umq,  m,  w, ?r, ?w, ?r, w")
> + "=w, Umq, Utf,  w, ?r, ?w, ?r, w")
>   (match_operand:VQ 1 "general_operand"
> - "m,  Dz, w,  w,  w,  r,  r, Dn"))]
> + "m,  Dz,w,  w,  w,  r,  r, Dn"))]
> 
> It seems an 'm' constraint has special meaning in the register allocator,
> using a different constraint can block certain simplifications (for example
> merging stack offsets into load/store in the post-reload cleanup pass),
> so we'd need to verify this doesn't cause regressions.

I'll verify this.

> Also it is best to introduce generic interfaces:
> 
> +/* Return TRUE if OP is a good address mode for movti target on falkor.  */
> +bool
> +aarch64_falkor_movti_target_operand_p (rtx op)
> 
> +(define_memory_constraint "Utf"
> +  "@iternal
> +   A good address for a falkor movti target operand."
> +  (and (match_code "mem")
> +   (match_test "aarch64_falkor_movti_target_operand_p (op)")))
> 
> We should use generic names here even if the current implementation
> wants to do something specific for Falkor.

I'll fix this.

Thanks,
Siddhesh


Re: [PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Wilco Dijkstra
(finished version this time, somehow Outlook loves to send emails early...)

Hi,

In general I think the best way to achieve this would be to use the
existing cost models which are there for exactly this purpose. If
this doesn't work well enough then we should fix those. As is,
this patch disables a whole class of instructions for a specific
target rather than simply telling GCC that they are expensive and
should only be used if there is no cheaper alternative.

Also there is potential impact on generic code from:

 (define_insn "*aarch64_simd_mov"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, Umq,  m,  w, ?r, ?w, ?r, w")
+   "=w, Umq, Utf,  w, ?r, ?w, ?r, w")
(match_operand:VQ 1 "general_operand"
-   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz,w,  w,  w,  r,  r, Dn"))]

It seems an 'm' constraint has special meaning in the register allocator,
using a different constraint can block certain simplifications (for example
merging stack offsets into load/store in the post-reload cleanup pass),
so we'd need to verify this doesn't cause regressions.

Also it is best to introduce generic interfaces:

+/* Return TRUE if OP is a good address mode for movti target on falkor.  */
+bool
+aarch64_falkor_movti_target_operand_p (rtx op)

+(define_memory_constraint "Utf"
+  "@iternal
+   A good address for a falkor movti target operand."
+  (and (match_code "mem")
+   (match_test "aarch64_falkor_movti_target_operand_p (op)")))

We should use generic names here even if the current implementation
wants to do something specific for Falkor.

Wilco


GCC 7 branch now frozen for the release of GCC 7.3

2018-01-17 Thread Richard Biener

The GCC 7 branch is now frozen in preparation for GCC 7.3 RC1.  All
changes from this point to the final release of GCC 7.3 now require
release manager approval.

As said I'm happily taking adjustments/enhancements to the spectre
mitigation patches (as well as rs6000 backports).

Richard.


Re: [PATCH] Fix store-merging for ~ of bswap (PR tree-optimization/83843)

2018-01-17 Thread Richard Biener
On Wed, 17 Jan 2018, Jakub Jelinek wrote:

> On Tue, Jan 16, 2018 at 02:19:16PM +0100, Christophe Lyon wrote:
> > I've noticed that this new test fails on arm, eg:
> > arm-none-linux-gnueabihf
> > --with-mode arm
> > --with-cpu cortex-a9
> > --with-fpu neon-fp16
> > FAIL: gcc.dg/store_merging_18.c scan-tree-dump-times store-merging
> > "Merging successful" 3 (found 0 times)
> 
> Ugh, the problem that arm announces itself as a store_merge target when
> it can't do unaligned stores again, so essentially dup of PR83195.
> We really shouldn't lie :(.
> 
> Anyway, for now I've checked in the following which matches what I've done
> for PR83195.
> 
> Better would be to have store_merge_unaligned and store_merge, where
> the former would be current store_merge except for arm, and latter
> would be all targets that can perform store merging (isn't that all except
> targets that don't have 8-bit chars or pdp endianity)?

Ok.

> 2018-01-17  Jakub Jelinek  
> 
>   PR tree-optimization/83843
>   * gcc.dg/store_merging_18.c: Don't expect "Merging successful" on arm.
>   * gcc.dg/store_merging_19.c: New test.
> 
> --- gcc/testsuite/gcc.dg/store_merging_18.c.jj2018-01-16 
> 09:52:26.231235131 +0100
> +++ gcc/testsuite/gcc.dg/store_merging_18.c   2018-01-17 12:10:07.862957549 
> +0100
> @@ -1,7 +1,7 @@
>  /* PR tree-optimization/83843 */
>  /* { dg-do run } */
>  /* { dg-options "-O2 -fdump-tree-store-merging" } */
> -/* { dg-final { scan-tree-dump-times "Merging successful" 3 "store-merging" 
> { target store_merge } } } */
> +/* { dg-final { scan-tree-dump-times "Merging successful" 3 "store-merging" 
> { target { store_merge && { ! arm*-*-* } } } } } */
>  
>  __attribute__((noipa)) void
>  foo (unsigned char *buf, unsigned char *tab)
> --- gcc/testsuite/gcc.dg/store_merging_19.c.jj2018-01-17 
> 12:10:34.819962003 +0100
> +++ gcc/testsuite/gcc.dg/store_merging_19.c   2018-01-17 12:13:08.425987375 
> +0100
> @@ -0,0 +1,57 @@
> +/* PR tree-optimization/83843 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fdump-tree-store-merging" } */
> +/* { dg-final { scan-tree-dump-times "Merging successful" 3 "store-merging" 
> { target store_merge } } } */
> +
> +__attribute__((noipa)) void
> +foo (unsigned char *buf, unsigned char *tab)
> +{
> +  tab = __builtin_assume_aligned (tab, 2);
> +  buf = __builtin_assume_aligned (buf, 2);
> +  unsigned v = tab[1] ^ (tab[0] << 8);
> +  buf[0] = ~(v >> 8);
> +  buf[1] = ~v;
> +}
> +
> +__attribute__((noipa)) void
> +bar (unsigned char *buf, unsigned char *tab)
> +{
> +  tab = __builtin_assume_aligned (tab, 2);
> +  buf = __builtin_assume_aligned (buf, 2);
> +  unsigned v = tab[1] ^ (tab[0] << 8);
> +  buf[0] = (v >> 8);
> +  buf[1] = ~v;
> +}
> +
> +__attribute__((noipa)) void
> +baz (unsigned char *buf, unsigned char *tab)
> +{
> +  tab = __builtin_assume_aligned (tab, 2);
> +  buf = __builtin_assume_aligned (buf, 2);
> +  unsigned v = tab[1] ^ (tab[0] << 8);
> +  buf[0] = ~(v >> 8);
> +  buf[1] = v;
> +}
> +
> +int
> +main ()
> +{
> +  volatile unsigned char l1 = 0;
> +  volatile unsigned char l2 = 1;
> +  unsigned char buf[2] __attribute__((aligned (2)));
> +  unsigned char tab[2] __attribute__((aligned (2))) = { l1 + 1, l2 * 2 };
> +  foo (buf, tab);
> +  if (buf[0] != (unsigned char) ~1 || buf[1] != (unsigned char) ~2)
> +__builtin_abort ();
> +  buf[0] = l1 + 7;
> +  buf[1] = l2 * 8;
> +  bar (buf, tab);
> +  if (buf[0] != 1 || buf[1] != (unsigned char) ~2)
> +__builtin_abort ();
> +  buf[0] = l1 + 9;
> +  buf[1] = l2 * 10;
> +  baz (buf, tab);
> +  if (buf[0] != (unsigned char) ~1 || buf[1] != 2)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][arm] PR target/83687: Fix invalid combination of VSUB + VABS into VABD

2018-01-17 Thread Kyrill Tkachov

Hi all,

On 15/01/18 11:23, Kyrill Tkachov wrote:

Hi all,

In this wrong-code bug we combine a VSUB.I8 and a VABS.S8
into a VABD.S8 instruction . This combination is not valid
for integer operands because in the VABD instruction the semantics
are that the difference is computed in notionally infinite precision
and the absolute difference is computed on that, whereas for a
VSUB.I8 + VABS.S8 sequence the VSUB operation will perform any
wrapping that's needed for the 8-bit signed type before the VABS
gets its hands on it.

This leads to the wrong-code in the PR where the expected
sequence from the intrinsics:
VSUB + VABS of two vectors {-100, -100, -100...}, {100, 100, 100...}
gives a result of {56, 56, 56...} (-100 - 100)

but GCC optimises it into a single
VABD of {-100, -100, -100...}, {100, 100, 100...}
which produces a result of {200, 200, 200...}

The transformation is still valid for floating-point operands,
which is why it was added in the first place I believe (r178817)
but this patch disables it for integer operands.
The HFmode variants though only exist for TARGET_NEON_FP16INST, so
this patch adds the appropriate guards to the new mode iterator.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.


I've backported this patch to the GCC 7 branch after
bootstrapping and testing on arm-none-linux-gnueabihf.

Thanks,
Kyrill



Thanks,
Kyrill

2018-01-15  Kyrylo Tkachov  

 PR target/83687
 * config/arm/iterators.md (VF): New mode iterator.
 * config/arm/neon.md (neon_vabd_2): Use the above.
 Remove integer-related logic from pattern.
 (neon_vabd_3): Likewise.

2018-01-15  Kyrylo Tkachov  

 PR target/83687
 * gcc.target/arm/neon-combine-sub-abs-into-vabd.c: Delete integer
 tests.
 * gcc.target/arm/pr83687.c: New test.




[testsuite] Tweak gcc.dg/ipa/inlinehint-4.c

2018-01-17 Thread Eric Botcazou
This adds --param inline-unit-growth=20 to the set of options passed to the 
testcase, which is the default setting but makes it possible for the test to 
pass on targets that change the default, like Visium.

Tested on visium-elf & x86_64-suse-linux, applied on the mainline as obvious.


2018-01-17  Eric Botcazou  

* gcc.dg/ipa/inlinehint-4.c: Also pass --param inline-unit-growth=20.

-- 
Eric BotcazouIndex: gcc.dg/ipa/inlinehint-4.c
===
--- gcc.dg/ipa/inlinehint-4.c	(revision 256776)
+++ gcc.dg/ipa/inlinehint-4.c	(working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "-O3 -fdump-ipa-inline-details -fno-early-inlining --param large-unit-insns=1 -fno-partial-inlining"  } */
+/* { dg-options "-O3 -fdump-ipa-inline-details -fno-early-inlining -fno-partial-inlining --param large-unit-insns=1 --param inline-unit-growth=20" } */
 /* { dg-add-options bind_pic_locally } */
 int *hashval;
 int *hash;


Re: [PATCH] C/C++: Add -Waddress-of-packed-member

2018-01-17 Thread H.J. Lu
On Mon, Jan 15, 2018 at 7:02 AM, H.J. Lu  wrote:
> On Mon, Jan 15, 2018 at 1:42 AM, Jakub Jelinek  wrote:
>> On Sun, Jan 14, 2018 at 06:29:54AM -0800, H.J. Lu wrote:
>>> +   if (TREE_CODE (field) == FIELD_DECL && DECL_PACKED (field))
>>> + {
>>> +   tree field_type = TREE_TYPE (field);
>>> +   unsigned int type_align = TYPE_ALIGN (field_type);
>>> +   tree context = DECL_CONTEXT (field);
>>> +   unsigned int record_align = TYPE_ALIGN (context);
>>> +   if ((record_align % type_align) != 0)
>>> + return context;
>>> +   type_align /= BITS_PER_UNIT;
>>> +   unsigned HOST_WIDE_INT field_off
>>> +  = (tree_to_uhwi (DECL_FIELD_OFFSET (field))
>>> + + (tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field))
>>> +/ BITS_PER_UNIT));
>>
>> This has the same bug I've just created PR83844 for, you can't assume
>> DECL_FIELD_OFFSET is INTEGER_CST that fits into UHWI, and also we have
>> byte_position wrapper that should be used to compute the offset from
>> DECL_FIELD_*OFFSET.
>
> Here is the updated patch to use byte_position wrapper.  OK for trunk?
>

Here is the updated patch not to warn:

struct pair_t {
char c;
__int128_t i;
} __attribute__((packed));

typedef struct unaligned_int128_t_ {
__int128_t value;
} __attribute__((packed)) unaligned_int128_t;

struct pair_t p = {0, 1};
unaligned_int128_t *addr = (unaligned_int128_t *)();

int main() {
addr->value = ~(__int128_t)0;
return (p.i != 1) ? 0 : 1;
}

by properly checking the expected alignment against the field alignment.


-- 
H.J.
From 54b68f11c18971d1371d5bb5bde7b0c1d3e6ee7b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 12 Jan 2018 21:12:05 -0800
Subject: [PATCH] C/C++: Add -Waddress-of-packed-member

When address of packed member of struct or union is taken, it may result
in an unaligned pointer value.  This patch adds -Waddress-of-packed-member
to warn about it:

$ cat x.i
struct pair_t
{
  char c;
  int i;
} __attribute__ ((packed));

extern struct pair_t p;
int *addr = 
$ gcc -O2 -S x.i
x.i:8:13: warning: initialization of 'int *' from address of packed member of 'struct pair_t' may result in an unaligned pointer value [-Waddress-of-packed-member]
 int *addr = 
 ^
$

This warning is enabled by default.

gcc/c/

	PR c/51628
	* doc/invoke.texi: Document -Wno-address-of-packed-member.

gcc/c-family/

	PR c/51628
	* c-common.h (warn_for_address_of_packed_member): New.
	* c-warn.c (warn_for_address_of_packed_member): New function.
	* c.opt: Add -Wno-address-of-packed-member.

gcc/c/

	PR c/51628
	* c-typeck.c (convert_for_assignment): Call
	warn_for_address_of_packed_member.  Issue a warning if address
	of packed member is taken.

gcc/cp/

	PR c/51628
	* call.c (convert_for_arg_passing): Call
	warn_for_address_of_packed_member.  Issue a warning if address
	of packed member is taken.
	* typeck.c (convert_for_assignment): Likewise.

gcc/testsuite/

	PR c/51628
	* c-c++-common/pr51628-1.c: New tests.
	* c-c++-common/pr51628-10.c: Likewise.
	* c-c++-common/pr51628-2.c: Likewise.
	* c-c++-common/pr51628-3.c: Likewise.
	* c-c++-common/pr51628-4.c: Likewise.
	* c-c++-common/pr51628-5.c: Likewise.
	* c-c++-common/pr51628-6.c: Likewise.
	* c-c++-common/pr51628-7.c: Likewise.
	* c-c++-common/pr51628-8.c: Likewise.
	* c-c++-common/pr51628-9.c: Likewise.
	* gcc.dg/pr51628-10.c: Likewise.
	* gcc.dg/pr51628-11.c: Likewise.
	* c-c++-common/ubsan/align-10.c: Add -Wno-address-of-packed-member.
	* c-c++-common/ubsan/align-2.c: Likewise.
	* c-c++-common/ubsan/align-4.c: Likewise.
	* c-c++-common/ubsan/align-6.c: Likewise.
	* c-c++-common/ubsan/align-7.c: Likewise.
	* c-c++-common/ubsan/align-8.c: Likewise.
	* g++.dg/ubsan/align-2.C: Likewise.
	* gcc.target/i386/avx512bw-vmovdqu16-2.c: Likewise.
	* gcc.target/i386/avx512f-vmovdqu32-2.c: Likewise.
	* gcc.target/i386/avx512f-vmovdqu64-2.c: Likewise.
	* gcc.target/i386/avx512vl-vmovdqu16-2.c: Likewise.
	* gcc.target/i386/avx512vl-vmovdqu32-2.c: Likewise.
	* gcc.target/i386/avx512vl-vmovdqu64-2.c: Likewise.
---
 gcc/c-family/c-common.h|  1 +
 gcc/c-family/c-warn.c  | 56 ++
 gcc/c-family/c.opt |  4 ++
 gcc/c/c-typeck.c   | 40 +++-
 gcc/cp/call.c  |  8 
 gcc/cp/typeck.c| 41 
 gcc/doc/invoke.texi| 11 -
 gcc/testsuite/c-c++-common/pr51628-1.c | 29 +++
 gcc/testsuite/c-c++-common/pr51628-10.c| 24 ++
 gcc/testsuite/c-c++-common/pr51628-2.c | 29 +++
 gcc/testsuite/c-c++-common/pr51628-3.c | 35 ++
 gcc/testsuite/c-c++-common/pr51628-4.c | 35 ++
 

Re: Backports for GCC 7 branch

2018-01-17 Thread Martin Liška
Hello.

Another 4 patches that I've just tested and bootstrapped.

Martin
>From af6233cb16c9dc174ef4e45da06c43bfd5442d4e Mon Sep 17 00:00:00 2001
From: jakub 
Date: Thu, 4 Jan 2018 21:13:17 +
Subject: Backport r256266

gcc/testsuite/ChangeLog:

2018-01-04  Jakub Jelinek  

	PR ipa/82352
	* g++.dg/ipa/pr82352.C (size_t): Define to __SIZE_TYPE__ instead of
	long unsigned int.

---
diff --git a/gcc/testsuite/g++.dg/ipa/pr82352.C b/gcc/testsuite/g++.dg/ipa/pr82352.C
index c044345a486..08516da0c8a 100644
--- a/gcc/testsuite/g++.dg/ipa/pr82352.C
+++ b/gcc/testsuite/g++.dg/ipa/pr82352.C
@@ -2,7 +2,7 @@
 // { dg-do compile }
 // { dg-options "-O2" }

-typedef long unsigned int size_t;
+typedef __SIZE_TYPE__ size_t;

 class A
 {
--
2.14.3
>From 6ed5216d2b8b2be5c9373a9f9dc0c38ef09abce7 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Thu, 4 Jan 2018 08:54:17 +
Subject: Backport r256226

gcc/ChangeLog:

2018-01-04  Martin Liska  

	PR ipa/82352
	* ipa-icf.c (sem_function::merge): Do not cross comdat boundary.

gcc/testsuite/ChangeLog:

2018-01-04  Martin Liska  

	PR ipa/82352
	* g++.dg/ipa/pr82352.C: New test.

---
diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index edb0b7896cd..b9f2bf30744 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -1113,6 +1113,17 @@ sem_function::merge (sem_item *alias_item)
   return false;
 }

+  if (!original->in_same_comdat_group_p (alias)
+  || original->comdat_local_p ())
+{
+  if (dump_file)
+	fprintf (dump_file,
+		 "Not unifying; alias nor wrapper cannot be created; "
+		 "across comdat group boundary\n\n");
+
+  return false;
+}
+
   /* See if original is in a section that can be discarded if the main
  symbol is not used.  */

diff --git a/gcc/testsuite/g++.dg/ipa/pr82352.C b/gcc/testsuite/g++.dg/ipa/pr82352.C
new file mode 100644
index 000..c044345a486
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr82352.C
@@ -0,0 +1,93 @@
+// PR ipa/82352
+// { dg-do compile }
+// { dg-options "-O2" }
+
+typedef long unsigned int size_t;
+
+class A
+{
+public :
+  typedef enum { Zero = 0, One = 1 } tA;
+  A(tA a) { m_a = a; }
+
+private :
+  tA m_a;
+};
+
+class B
+{
+public :
+  void *operator new(size_t t) { return (void*)(42); };
+};
+
+class C
+{
+public:
+  virtual void  () = 0;
+};
+
+class D
+{
+ public :
+  virtual void g() = 0;
+  virtual void h() = 0;
+};
+
+template class : public T, public D
+{
+public:
+ void ()
+ {
+   if (!m_i2) throw A(A::One);
+ };
+
+ void h()
+ {
+  if (m_i2) throw A(A::Zero);
+ }
+
+protected:
+ virtual void g()
+ {
+  if (m_i1 !=0) throw A(A::Zero);
+ };
+
+private :
+ int m_i1;
+ void *m_i2;
+};
+
+class E
+{
+private:
+size_t m_e;
+static const size_t Max;
+
+public:
+E& i(size_t a, size_t b, size_t c)
+{
+if ((a > Max) || (c > Max)) throw A(A::Zero );
+if (a + b > m_e) throw A(A::One );
+return (*this);
+}
+
+  inline E& j(const E )
+{
+  return i(0,0,s.m_e);
+}
+};
+
+class F : public C { };
+class G : public C { };
+class  : public B, public F, public G { };
+
+void k()
+{
+new ();
+}
+
+void l()
+{
+  E e1, e2;
+  e1.j(e2);
+}
--
2.14.3
>From f7491b347eed2606bcaf8ae8497f8fae3738ec6e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 3 Jan 2018 14:15:58 +
Subject: Backport r256177

gcc/ChangeLog:

2018-01-03  Martin Liska  

	PR ipa/83549
	* cif-code.def (VARIADIC_THUNK): New enum value.
	* ipa-fnsummary.c (compute_fn_summary): Do not inline variadic
	thunks.

gcc/testsuite/ChangeLog:

2018-01-03  Martin Liska  

	PR ipa/83549
	* g++.dg/ipa/pr83549.C: New test.

---
diff --git a/gcc/cif-code.def b/gcc/cif-code.def
index 6d7e2b4070b..19a76213943 100644
--- a/gcc/cif-code.def
+++ b/gcc/cif-code.def
@@ -95,6 +95,10 @@ DEFCIFCODE(MISMATCHED_ARGUMENTS, CIF_FINAL_ERROR,
 DEFCIFCODE(LTO_MISMATCHED_DECLARATIONS, CIF_FINAL_ERROR,
 	   N_("mismatched declarations during linktime optimization"))

+/* Caller is variadic thunk.  */
+DEFCIFCODE(VARIADIC_THUNK, CIF_FINAL_ERROR,
+	   N_("variadic thunk call"))
+
 /* Call was originally indirect.  */
 DEFCIFCODE(ORIGINALLY_INDIRECT_CALL, CIF_FINAL_NORMAL,
 	   N_("originally indirect function call not considered for inlining"))
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index fc18518d48f..e9f76d5cdac 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -2422,6 +2422,11 @@ compute_fn_summary (struct cgraph_node *node, bool early)
   info->inlinable = false;
   node->callees->inline_failed = CIF_CHKP;
 	}
+  else if (stdarg_p (TREE_TYPE (node->decl)))
+	{
+	  info->inlinable = false;
+	  node->callees->inline_failed = CIF_VARIADIC_THUNK;
+	}
   else
 info->inlinable = true;
 }
diff --git a/gcc/testsuite/g++.dg/ipa/pr83549.C 

[PATCH][arm] Convert gcc.target/arm/stl-cond.c into an RTL test

2018-01-17 Thread Kyrill Tkachov

Hi all,

This is an awkward testsuite failure. The original bug was that we were failing 
to output
the condition code in the conditional form of the STL instruction (oops!).
So we wanted to output STLNE, but instead output STL.
The testcase relies on if-conversion to conditionalise the insn for STL.
However, ever since r251643 the expansion of a non-relaxed atomic store
always includes a compiler barrier. That blocks if-conversion in all cases.

So there's no easy way to get a conditional STL instruction from a C program.
But we do want to test the original bug fix: if the RTL insn for STL is
conditionalised, it should output the conditional code.

The solution in this patch is to convert the test into an RTL test with the
COND_EXEC form of the STL insn and scan the assembly output there.
This seems to work fine, and gives us an opportunity to create a
gcc.dg/rtl/arm directory in the RTL tests.

This now makes the gcc.target/arm/stl-cond.c test disappear (as it is
deleted) and the new test in gcc.dg/rtl/arm/stl-cond.c passes.

Committing to trunk.
Thanks,
Kyrill

2018-01-17  Kyrylo Tkachov  

* gcc.dg/rtl/arm/stl-cond.c: New test.
* gcc.target/arm/stl-cond.c: Delete.
diff --git a/gcc/testsuite/gcc.dg/rtl/arm/stl-cond.c b/gcc/testsuite/gcc.dg/rtl/arm/stl-cond.c
new file mode 100644
index ..e2bc610e1faf4012d06764d78d7853d2237c7b01
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/rtl/arm/stl-cond.c
@@ -0,0 +1,61 @@
+/* { dg-do compile { target arm-*-* } } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-require-effective-target arm_arch_v8a_ok } */
+/* { dg-options "-O2 -marm" } */
+/* { dg-add-options arm_arch_v8a } */
+
+/* We want to test that the STL instruction gets the conditional
+   suffix when under a COND_EXEC.  However, COND_EXEC is very hard to
+   generate from C code because the atomic_store expansion adds a compiler
+   barrier before the insn, preventing if-conversion.  So test the output
+   here with a hand-crafted COND_EXEC wrapped around an STL.  */
+
+void __RTL (startwith ("final")) foo (int *a, int b)
+{
+(function "foo"
+  (param "a"
+(DECL_RTL (reg/v:SI r0))
+(DECL_RTL_INCOMING (reg:SI r0))
+  )
+  (param "b"
+(DECL_RTL (reg/v:SI r1))
+(DECL_RTL_INCOMING (reg:SI r1))
+  )
+  (insn-chain
+(block 2
+	(edge-from entry (flags "FALLTHRU"))
+	(cnote 5 [bb 2] NOTE_INSN_BASIC_BLOCK)
+
+  (insn:TI 7 (parallel [
+	(set (reg:CC cc)
+	 (compare:CC (reg:SI r1)
+			 (const_int 0)))
+	(set (reg/v:SI r1)
+	 (reg:SI r1 ))
+])  ;; {*movsi_compare0}
+ (nil))
+
+  ;; A conditional atomic store-release: STLNE for Armv8-A.
+  (insn 10 (cond_exec (ne (reg:CC cc)
+	   (const_int 0))
+	(set (mem/v:SI (reg/v/f:SI r0) [-1  S4 A32])
+		(unspec_volatile:SI [
+		(reg/v:SI r1)
+		(const_int 3)
+		] VUNSPEC_STL))) ;; {*p atomic_storesi}
+	(expr_list:REG_DEAD (reg:CC cc)
+	(expr_list:REG_DEAD (reg/v:SI r1)
+	(expr_list:REG_DEAD (reg/v/f:SI r0)
+		(nil)
+  (edge-to exit (flags "FALLTHRU"))
+) ;; block 2
+  ) ;; insn-chain
+  (crtl
+(return_rtx
+  (reg/i:SI r0)
+) ;; return_rtx
+  ) ;; crtl
+) ;; function
+}
+
+/* { dg-final { scan-assembler "stlne" } } */
diff --git a/gcc/testsuite/gcc.target/arm/stl-cond.c b/gcc/testsuite/gcc.target/arm/stl-cond.c
deleted file mode 100644
index de14bb580b82eaf8ca0a3e6e11f842c4baf5c756..
--- a/gcc/testsuite/gcc.target/arm/stl-cond.c
+++ /dev/null
@@ -1,19 +0,0 @@
-/* { dg-do compile } */
-/* { dg-require-effective-target arm_arm_ok } */ 
-/* { dg-require-effective-target arm_arch_v8a_ok } */
-/* { dg-options "-O2 -marm" } */
-/* { dg-add-options arm_arch_v8a } */
-
-struct backtrace_state
-{
-  int threaded;
-  int lock_alloc;
-};
-
-void foo (struct backtrace_state *state)
-{
-  if (state->threaded)
-    __sync_lock_release (&state->lock_alloc);
-}
-
-/* { dg-final { scan-assembler "stlne" } } */


[PING][PATCH, AArch64] Disable reg offset in quad-word store for Falkor

2018-01-17 Thread Siddhesh Poyarekar
From: Siddhesh Poyarekar 

Hi,

Jim Wilson posted a patch for this in September [1] and it appears,
following the discussions, that the patch was an acceptable fix for falkor.
Kugan followed up [2] with a test case, since one was requested during
initial review.  Jim has moved on from Linaro, so I'm pinging this patch
with the hope that it is OK for inclusion, since it was posted before the
freeze and its impact is isolated to just falkor.

Siddhesh

[1] https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01547.html
[2] https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00050.html

On Falkor, because of an idiosyncrasy of how the pipelines are designed, a
quad-word store using a reg+reg addressing mode is almost twice as slow as an
add followed by a quad-word store with a single reg addressing mode.  So we
get better performance if we disallow addressing modes using register offsets
with quad-word stores.

Using lmbench compiled with -O2 -ftree-vectorize as my benchmark, I see a 13%
performance increase on stream copy using this patch, and a 16% performance
increase on stream scale using this patch.  I also see a small performance
increase on SPEC CPU2006 of around 0.2% for int and 0.4% for FP at -O3.

2018-01-17  Jim Wilson  
Kugan Vivekanandarajah  

gcc/
* config/aarch64/aarch64-protos.h
(aarch64_falkor_movti_target_operand_p): Declare.
* config/aarch64/aarch64.c (aarch64_falkor_movti_target_operand_p): New.
* config/aarch64/constraints.md (Utf): New.
* config/aarch64/aarch64.md (movti_aarch64): Use Utf constraint instead
of m.
(movtf_aarch64): Likewise.
* config/aarch64/aarch64-simd.md (aarch64_simd_mov<mode>): Use Utf
constraint instead of m.

gcc/testsuite/
* gcc/testsuite/gcc.target/aarch64/pr82533.c: New test case.

---
 gcc/config/aarch64/aarch64-protos.h|  1 +
 gcc/config/aarch64/aarch64-simd.md |  4 ++--
 gcc/config/aarch64/aarch64.c   | 10 ++
 gcc/config/aarch64/aarch64.md  |  6 +++---
 gcc/config/aarch64/constraints.md  |  6 ++
 gcc/testsuite/gcc.target/aarch64/pr82533.c | 11 +++
 6 files changed, 33 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr82533.c

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 2d705d2..088d864 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -433,6 +433,7 @@ bool aarch64_simd_mem_operand_p (rtx);
 bool aarch64_sve_ld1r_operand_p (rtx);
 bool aarch64_sve_ldr_operand_p (rtx);
 bool aarch64_sve_struct_memory_operand_p (rtx);
+bool aarch64_falkor_movti_target_operand_p (rtx);
 rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
 rtx aarch64_tls_get_addr (void);
 tree aarch64_fold_builtin (tree, int, tree *, bool);
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 3d1f6a0..f7daac3 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -131,9 +131,9 @@
 
 (define_insn "*aarch64_simd_mov<mode>"
   [(set (match_operand:VQ 0 "nonimmediate_operand"
-   "=w, Umq,  m,  w, ?r, ?w, ?r, w")
+   "=w, Umq, Utf,  w, ?r, ?w, ?r, w")
(match_operand:VQ 1 "general_operand"
-   "m,  Dz, w,  w,  w,  r,  r, Dn"))]
+   "m,  Dz,w,  w,  w,  r,  r, Dn"))]
   "TARGET_SIMD
 && (register_operand (operands[0], <MODE>mode)
 || aarch64_simd_reg_or_zero (operands[1], <MODE>mode))"
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2e70f3a..0db7a4f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13477,6 +13477,16 @@ aarch64_sve_struct_memory_operand_p (rtx op)
  && offset_4bit_signed_scaled_p (SVE_BYTE_MODE, last));
 }
 
+/* Return TRUE if OP is a good address mode for movti target on falkor.  */
+bool
+aarch64_falkor_movti_target_operand_p (rtx op)
+{
+  if ((enum attr_tune) aarch64_tune == TUNE_FALKOR)
+    return MEM_P (op) && ! (GET_CODE (XEXP (op, 0)) == PLUS
+			    && ! CONST_INT_P (XEXP (XEXP (op, 0), 1)));
+  return MEM_P (op);
+}
+
 /* Emit a register copy from operand to operand, taking care not to
early-clobber source registers in the process.
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index edb6a75..696fd12 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1079,7 +1079,7 @@
 
 (define_insn "*movti_aarch64"
   [(set (match_operand:TI 0
-"nonimmediate_operand"  "=r, w,r,w,r,m,m,w,m")
+"nonimmediate_operand"  "=r, w,r,w,r,m,m,w,Utf")
(match_operand:TI 1
 "aarch64_movti_operand" " rn,r,w,w,m,r,Z,m,w"))]
   "(register_operand (operands[0], TImode)
@@ -1226,9 +1226,9 @@
 
 (define_insn "*movtf_aarch64"
   [(set 

Re: [PATCH] Fix store-merging for ~ of bswap (PR tree-optimization/83843)

2018-01-17 Thread Jakub Jelinek
On Tue, Jan 16, 2018 at 02:19:16PM +0100, Christophe Lyon wrote:
> I've noticed that this new test fails on arm, eg:
> arm-none-linux-gnueabihf
> --with-mode arm
> --with-cpu cortex-a9
> --with-fpu neon-fp16
> FAIL: gcc.dg/store_merging_18.c scan-tree-dump-times store-merging
> "Merging successful" 3 (found 0 times)

Ugh, the problem is again that arm announces itself as a store_merge target
when it can't do unaligned stores, so this is essentially a dup of PR83195.
We really shouldn't lie :(.

Anyway, for now I've checked in the following which matches what I've done
for PR83195.

Better would be to have store_merge_unaligned and store_merge, where
the former would be the current store_merge except for arm, and the latter
would be all targets that can perform store merging (isn't that all except
targets that don't have 8-bit chars or pdp endianity)?

2018-01-17  Jakub Jelinek  

PR tree-optimization/83843
* gcc.dg/store_merging_18.c: Don't expect "Merging successful" on arm.
* gcc.dg/store_merging_19.c: New test.

--- gcc/testsuite/gcc.dg/store_merging_18.c.jj	2018-01-16 09:52:26.231235131 +0100
+++ gcc/testsuite/gcc.dg/store_merging_18.c	2018-01-17 12:10:07.862957549 +0100
@@ -1,7 +1,7 @@
 /* PR tree-optimization/83843 */
 /* { dg-do run } */
 /* { dg-options "-O2 -fdump-tree-store-merging" } */
-/* { dg-final { scan-tree-dump-times "Merging successful" 3 "store-merging" { target store_merge } } } */
+/* { dg-final { scan-tree-dump-times "Merging successful" 3 "store-merging" { target { store_merge && { ! arm*-*-* } } } } } */
 
 __attribute__((noipa)) void
 foo (unsigned char *buf, unsigned char *tab)
--- gcc/testsuite/gcc.dg/store_merging_19.c.jj	2018-01-17 12:10:34.819962003 +0100
+++ gcc/testsuite/gcc.dg/store_merging_19.c	2018-01-17 12:13:08.425987375 +0100
@@ -0,0 +1,57 @@
+/* PR tree-optimization/83843 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fdump-tree-store-merging" } */
+/* { dg-final { scan-tree-dump-times "Merging successful" 3 "store-merging" { target store_merge } } } */
+
+__attribute__((noipa)) void
+foo (unsigned char *buf, unsigned char *tab)
+{
+  tab = __builtin_assume_aligned (tab, 2);
+  buf = __builtin_assume_aligned (buf, 2);
+  unsigned v = tab[1] ^ (tab[0] << 8);
+  buf[0] = ~(v >> 8);
+  buf[1] = ~v;
+}
+
+__attribute__((noipa)) void
+bar (unsigned char *buf, unsigned char *tab)
+{
+  tab = __builtin_assume_aligned (tab, 2);
+  buf = __builtin_assume_aligned (buf, 2);
+  unsigned v = tab[1] ^ (tab[0] << 8);
+  buf[0] = (v >> 8);
+  buf[1] = ~v;
+}
+
+__attribute__((noipa)) void
+baz (unsigned char *buf, unsigned char *tab)
+{
+  tab = __builtin_assume_aligned (tab, 2);
+  buf = __builtin_assume_aligned (buf, 2);
+  unsigned v = tab[1] ^ (tab[0] << 8);
+  buf[0] = ~(v >> 8);
+  buf[1] = v;
+}
+
+int
+main ()
+{
+  volatile unsigned char l1 = 0;
+  volatile unsigned char l2 = 1;
+  unsigned char buf[2] __attribute__((aligned (2)));
+  unsigned char tab[2] __attribute__((aligned (2))) = { l1 + 1, l2 * 2 };
+  foo (buf, tab);
+  if (buf[0] != (unsigned char) ~1 || buf[1] != (unsigned char) ~2)
+__builtin_abort ();
+  buf[0] = l1 + 7;
+  buf[1] = l2 * 8;
+  bar (buf, tab);
+  if (buf[0] != 1 || buf[1] != (unsigned char) ~2)
+__builtin_abort ();
+  buf[0] = l1 + 9;
+  buf[1] = l2 * 10;
+  baz (buf, tab);
+  if (buf[0] != (unsigned char) ~1 || buf[1] != 2)
+__builtin_abort ();
+  return 0;
+}

Jakub


[PATCH][arm] Fix gcc.target/arm/pr40887.c directives

2018-01-17 Thread Kyrill Tkachov

Hi all,

This patch converts gcc.target/arm/pr40887.c to use the proper effective
target check and dg-add-options for armv5te so that we avoid situations
where we end up trying to compile the test with a Thumb1 hard-float ABI,
which makes the compiler complain.

This allows the test to pass gracefully for me with a compiler configured with:
--with-cpu=cortex-a15 --with-fpu=neon-vfpv4 --with-float=hard --with-mode=thumb

Committing to trunk.

Thanks,
Kyrill

2018-01-17  Kyrylo Tkachov  

* gcc.target/arm/pr40887.c: Add armv5te effective target checks and
directives.
diff --git a/gcc/testsuite/gcc.target/arm/pr40887.c b/gcc/testsuite/gcc.target/arm/pr40887.c
index 0329916d014c034fb37dbc62b6a2a99c32aa6510..5baa05695374a3746c2d08801da5d31c729def2a 100644
--- a/gcc/testsuite/gcc.target/arm/pr40887.c
+++ b/gcc/testsuite/gcc.target/arm/pr40887.c
@@ -1,5 +1,8 @@
+/* { dg-do compile } */
 /* { dg-skip-if "need at least armv5" { *-*-* } { "-march=armv[234]*" } { "" } } */
-/* { dg-options "-O2 -march=armv5te" }  */
+/* { dg-require-effective-target arm_arch_v5te_ok } */
+/* { dg-add-options arm_arch_v5te } */
+/* { dg-options "-O2" }  */
 /* { dg-final { scan-assembler "blx" } } */
 
 int (*indirect_func)(int x);


[PATCH][arm] Fix gcc.target/arm/xor-and.c

2018-01-17 Thread Kyrill Tkachov

Hi all,

This test is naughty because it doesn't use the proper effective target
checks and add-options mechanisms for setting a Thumb1 target, which leads
to Thumb1 hard-float errors when testing a toolchain configured with
--with-cpu=cortex-a15 --with-fpu=neon-vfpv4 --with-float=hard
--with-mode=thumb.

This patch fixes that in the obvious way.

Committing to trunk.
Thanks,
Kyrill

2018-01-17  Kyrylo Tkachov 

* gcc.target/arm/xor-and.c: Fix armv6 effective target checks
and options.
diff --git a/gcc/testsuite/gcc.target/arm/xor-and.c b/gcc/testsuite/gcc.target/arm/xor-and.c
index 3715530cd7bf9ad8abb24cb21cd51ae3802079e8..9afa81d3ec10c983ba2555c867f6f00a85f80150 100644
--- a/gcc/testsuite/gcc.target/arm/xor-and.c
+++ b/gcc/testsuite/gcc.target/arm/xor-and.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
-/* { dg-options "-O -march=armv6" } */
-/* { dg-prune-output "switch .* conflicts with" } */
+/* { dg-require-effective-target arm_arch_v6_ok } */
+/* { dg-add-options arm_arch_v6 } */
+/* { dg-options "-O" }  */
 
 unsigned short foo (unsigned short x)
 {


[committed] Add testcase for PR rtl-optimization/83771

2018-01-17 Thread Jakub Jelinek
Hi!

This testcase got fixed with Honza's PR middle-end/83575 fix
r256479, so I've just added a testcase after testing it on x86_64/i686-linux
and committed as obvious, so we can close the PR.

2018-01-17  Jakub Jelinek  

PR rtl-optimization/83771
* gcc.dg/pr83771.c: New test.

--- gcc/testsuite/gcc.dg/pr83771.c.jj   2018-01-17 11:56:01.097820393 +0100
+++ gcc/testsuite/gcc.dg/pr83771.c  2018-01-17 11:55:54.093819292 +0100
@@ -0,0 +1,19 @@
+/* PR rtl-optimization/83771 */
+/* { dg-do compile } */
+/* { dg-options "-O3 -fmodulo-sched -fno-ssa-phiopt" } */
+
+long int a;
+int b;
+int foo (int);
+
+void
+bar (void)
+{
+  int c;
+  do
+{
+  c = a / (!!b == 1);
+  c = !!c + 1;
+}
+  while (foo (c) < 1);
+}

Jakub


Re: [patch] Fix PR tree-optimization/81184

2018-01-17 Thread Richard Biener
On Wed, Jan 17, 2018 at 11:37 AM, Eric Botcazou  wrote:
> Hi,
>
> as suggested by Jakub in the audit trail, this simply adjusts the dg-final
> line according to whether it's for a logical_op_short_circuit target or not.
>
> Tested on visium-elf and x86_64-suse-linux, OK for the mainline?

Ok.

Richard.

>
> 2018-01-17  Eric Botcazou  
>
> PR tree-optimization/81184
> * gcc.dg/pr21643.c: Adjust dg-final line for logical_op_short_circuit
> targets.
> * gcc.dg/tree-ssa/phi-opt-11.c: Likewise.
>
> --
> Eric Botcazou


[patch] Fix PR tree-optimization/81184

2018-01-17 Thread Eric Botcazou
Hi,

as suggested by Jakub in the audit trail, this simply adjusts the dg-final 
line according to whether it's for a logical_op_short_circuit target or not.

Tested on visium-elf and x86_64-suse-linux, OK for the mainline?


2018-01-17  Eric Botcazou  

PR tree-optimization/81184
* gcc.dg/pr21643.c: Adjust dg-final line for logical_op_short_circuit
targets.
* gcc.dg/tree-ssa/phi-opt-11.c: Likewise.

-- 
Eric Botcazou

Index: gcc.dg/pr21643.c
===
--- gcc.dg/pr21643.c	(revision 256776)
+++ gcc.dg/pr21643.c	(working copy)
@@ -87,4 +87,5 @@ f9 (unsigned char c)
   return 1;
 }
 
-/* { dg-final { scan-tree-dump-times "Optimizing range tests c_\[0-9\]*.D. -.0, 31. and -.32, 32.\[\n\r\]* into" 6 "reassoc1" } } */
+/* { dg-final { scan-tree-dump-times "Optimizing range tests c_\[0-9\]*.D. -.0, 31. and -.32, 32.\[\n\r\]* into" 6 "reassoc1" { target { ! logical_op_short_circuit } } } }  */
+/* { dg-final { scan-tree-dump-times "Optimizing range tests c_\[0-9\]*.D. -.0, 31. and -.32, 32.\[\n\r\]* into" 5 "reassoc1" { target logical_op_short_circuit } } } */
Index: gcc.dg/tree-ssa/phi-opt-11.c
===
--- gcc.dg/tree-ssa/phi-opt-11.c	(revision 256776)
+++ gcc.dg/tree-ssa/phi-opt-11.c	(working copy)
@@ -22,4 +22,6 @@ int h(int a, int b, int c, int d)
return d;
  return a;
 }
-/* { dg-final { scan-tree-dump-times "if" 0 "optimized"} } */
+
+/* { dg-final { scan-tree-dump-times "if" 0 "optimized" { target { ! logical_op_short_circuit } } } } */
+/* { dg-final { scan-tree-dump-times "if" 2 "optimized" { target logical_op_short_circuit } } } */


Re: [PATCH] C++: Fix ICE in warn_for_memset within templates (PR c++/83814)

2018-01-17 Thread Jakub Jelinek
On Fri, Jan 12, 2018 at 05:09:24PM -0500, David Malcolm wrote:
> PR c++/83814 reports an ICE introduced by the location wrapper patch
> (r256448), affecting certain memset calls within templates.

Note, I think this issue sadly affects a lot of code, so it is quite urgent.

That said, I wonder if we really can't do any folding when
processing_template_decl; could we e.g. do at least maybe_constant_value,
or fold if the expression is not type nor value dependent?

BTW, I never know if cp_fold_rvalue is a superset of maybe_constant_value or not.

> --- a/gcc/cp/expr.c
> +++ b/gcc/cp/expr.c
> @@ -315,3 +315,25 @@ mark_exp_read (tree exp)
>  }
>  }
>  
> +/* Fold X for consideration by one of the warning functions when checking
> +   whether an expression has a constant value.  */
> +
> +tree
> +fold_for_warn (tree x)
> +{
> +  /* C++ implementation.  */
> +
> +  /* It's not generally safe to fold inside of a template, so
> + merely strip any location wrapper and read through enum values.  */
> +  if (processing_template_decl)
> +{
> +  STRIP_ANY_LOCATION_WRAPPER (x);
> +
> +  if (TREE_CODE (x) == CONST_DECL)
> + x = DECL_INITIAL (x);
> +
> +  return x;
> +}
> +
> +  return c_fully_fold (x, /*for_init*/false, /*maybe_constp*/NULL);
> +}

Jakub


Re: [patch][x86] Fix PR83618

2018-01-17 Thread Uros Bizjak
On Wed, Jan 17, 2018 at 9:56 AM, Koval, Julia  wrote:
> Fix a bug where the rdpid intrinsic used eax instead of rax in 64-bit mode.
> OK for trunk?
>
> gcc/
> * config/i386/i386.c (ix86_expand_builtin): Handle IX86_BUILTIN_RDPID.
> * config/i386/i386.md (rdpid_rex64) New.
> (rdpid): Make 32bit only.
>
> gcc/testsuite/
> * gcc.target/i386/rdpid.c: Remove "eax".

OK, but please fix the comment:

+  /* mode is VOIDmode if __builtin_rd* has been called
+ without lhs.  */

We have __builtin_rdpid here.

Uros.


Re: [testsuite] Tweak Wrestrict.c

2018-01-17 Thread Eric Botcazou
> There should be just one warning per call, and (as it is) -Wrestrict
> should suppress -Wstringop-overflow.  This suppression was a recent
> change (r256683). 

Sorry, the changes indeed crossed, I'm going to revert mine.

-- 
Eric Botcazou


[patch][x86] Fix PR83618

2018-01-17 Thread Koval, Julia
Fix a bug where the rdpid intrinsic used eax instead of rax in 64-bit mode.
OK for trunk?

gcc/
* config/i386/i386.c (ix86_expand_builtin): Handle IX86_BUILTIN_RDPID.
* config/i386/i386.md (rdpid_rex64): New.
(rdpid): Make 32bit only.

gcc/testsuite/
* gcc.target/i386/rdpid.c: Remove "eax".

Thanks,
Julia


0001-fix.patch
Description: 0001-fix.patch


Re: [committed] hppa: Switch hppa-linux to caller copies ABI

2018-01-17 Thread Richard Biener
On Wed, Jan 17, 2018 at 3:00 AM, John David Anglin  wrote:
> The callee copies ABI used for 32-bit hppa causes no end of optimization
> issues and problems with
> OpenMP.  The hppa target is only in Debian unstable and gentoo.  In both
> cases, packages are
> rebuilt often.  So, Helge and I decided that it was better to break the ABI
> and accept whatever
> problems that result from the switch.
>
> Committed to trunk.

That deserves a warning in gcc-8/changes.html

Richard.

> Dave
>
> --
> John David Anglin  dave.ang...@bell.net
>

