Re: [AARCH64] Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_84341

2017-08-29 Thread Kugan Vivekanandarajah
Hi James,

On 29 August 2017 at 21:31, James Greenhalgh  wrote:
> On Tue, Jun 27, 2017 at 11:20:02AM +1000, Kugan Vivekanandarajah wrote:
>> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00614.html  added this
>> workaround to get kernel building with when TARGET_FIX_ERR_A53_843419
>> is enabled.
>>
>> This was added to support building kernel loadable modules. In kernel,
>> when CONFIG_ARM64_ERRATUM_843419 is selected, the relocation needed
>> for ADRP/LDR (R_AARCH64_ADR_PREL_PG_HI21 and
>> R_AARCH64_ADR_PREL_PG_HI21_NC are removed from the kernel to avoid
>> loading objects with possibly offending sequence). Thus, it could only
>> support pc relative literal loads.
>>
>> However, the following patch was posted to kernel to add
>> -mpc-relative-literal-loads
>> http://www.spinics.net/lists/arm-kernel/msg476149.html
>>
>> -mpc-relative-literal-loads is unconditionally added to the kernel
>> build as can be seen from:
>> https://github.com/torvalds/linux/blob/master/arch/arm64/Makefile
>>
>> Therefore this patch removes the hunk so that applications like
>> SPECcpu2017's 521/621.wrf can be built (with LTO in this case) without
>> -mno-pc-relative-literal-loads
>>
>> Bootstrapped and regression tested on aarch64-linux-gnu with no new 
>> regressions.
>>
>> Is this OK for trunk?
>
> Hi Kugan,
>
> I'm struggling a little to convince myself that this is correct. I think
> the argument is that this was a workaround for one very particular issue
> with the linux kernel module loader, which hard faults by refusing to
> handle these relocations when in a workaround mode for Erratum 843419.

Yes.

> Most of these relocations won't occur because the kernel builds with
> -mcmodel=large, but literals would always use a small model style
> adrp/load, unless we were using -mpc-relative-literal-loads . So, this
> workaround enabled -mpc-relative-literal-loads always when we were in
> a workaround mode, thus allowing the kernel loader to continue.
>
> The argument for removing it then, is that with the kernel now always using
> -mpc-relative-literal-loads there is no reason for this workaround to stay
> in place. The linkers which we will use will apply the workaround if needed.

Yes.

> Testcases and a detailed problem report of the build failures you had seen in
> 521.wrf would have helped me get closer to understanding this, and made
> review substantially easier.

Sorry for not being clear with this. Unfortunately 521.wrf  was
showing this error with LTO so I could not reproduce with a small
enough test case.

> Am I on the right track?
>
> If so, this is OK for trunk. If not, please can you expand on what is going
> on in this patch.

Thanks for the review.
Kugan

>
> Thanks,
> James
>
>
>>
>> Thanks,
>> Kugan
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2017-06-27  Kugan Vivekanandarajah  
>>
>> * gcc.target/aarch64/pr63304_1.c: Remove-mno-fix-cortex-a53-843419.
>>
>> gcc/ChangeLog:
>>
>> 2017-06-27  Kugan Vivekanandarajah  
>>
>> * config/aarch64/aarch64.c (aarch64_override_options_after_change_1):
>> Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_84341
>> for default.
>
>


Re: [PATCH][RFA/RFC] Stack clash mitigation patch 06/08 - V3

2017-08-29 Thread Segher Boessenkool
Hi Jeff,

Sorry for the delay in reviewing this.  It mostly looks good :-)

On Sun, Jul 30, 2017 at 11:45:16PM -0600, Jeff Law wrote:
> 
> This contains the PPC bits for stack clash protection.
> 
> Changes since V2:
> 
> Exploits inlined/unrolled probes and rotated loops for the dynamic area.
>  Some trivial simplifications.  It also uses the new params to control
> if probes are needed and how often to probe.


>   * config/rs6000/rs6000-protos.h (output_probe_stack_range): Update
>   prototype for new argument.
>   * config/rs6000/rs6000.c (wrap_frame_mem): New function extracted
>   from rs6000_emit_allocate_stack.
>   (handle_residual): New function. 

Trailing space.

>   (rs6000_emit_probe_stack_range_stack_clash): New function.
>   (rs6000_emit_allocate_stack): Use wrap_frame_mem.
>   Call rs6000_emit_probe_stack_range_stack_clash as needed.
>   (rs6000_emit_probe_stack_range): Add additional argument
>   to call to gen_probe_stack_range{si,di}.
>   (output_probe_stack_range): New.
>   (output_probe_stack_range_1):  Renamed from output_probe_stack_range.

Double space.

>   (output_probe_stack_range_stack_clash): New.
>   (rs6000_emit_prologue): Emit notes into dump file as requested.
>   * rs6000.md (allocate_stack): Handle -fstack-clash-protection

Missing full stop.

>   (probe_stack_range): Operand 0 is now early-clobbered.
>   Add additional operand and pass it to output_probe_stack_range.


> +/* INSN allocates SIZE bytes on the stack (STACK_REG) using a store
> +   with update style insn.
> +
> +   Set INSN's alias set/attributes and add suitable flags and notes
> +   for the dwarf CFI machinery.  */
> +static void
> +wrap_frame_mem (rtx insn, rtx stack_reg, HOST_WIDE_INT size)

This isn't such a great name...  Something with "frame allocate"?  And
the "wrap" part is meaningless...  "set_attrs_for_frame_allocation"?
Or maybe "stack allocation" is better.

Actually, everywhere it is used it has a Pmode == SImode wart before
it, to emit the right update_stack insn...  So fold that into this
function, name it rs6000_emit_allocate_stack_1?

> +/* Allocate ROUNDED_SIZE - ORIG_SIZE bytes on the stack, storing
> +   ORIG_SP into *sp after the allocation.

This is a bit misleading: it has to store it at the _same time_ as doing
any change to r1.  Or, more precisely, it has to ensure r1 points at a
valid stack backchain at all times.  Since the red zone is pretty small
(if it exists at all, it doesn't on all ABIs) you need atomic store-with-
update in most cases.

> +static rtx_insn *
> +handle_residual (HOST_WIDE_INT orig_size,
> +  HOST_WIDE_INT rounded_size,
> +  rtx orig_sp)

Not a great name either.  Since this is only used once, and it is hard
to make a good name for it, and it is only a handful of statements, just
inline it?

> +/* Allocate ORIG_SIZE bytes on the stack and probe the newly
> +   allocated space every STACK_CLASH_PROTECTION_PROBE_INTERVAL bytes.
> +
> +   COPY_REG, if non-null, should contain a copy of the original
> +   stack pointer modified by COPY_OFF at exit from this function.
> +
> +   This is subtly different than the Ada probing in that it tries hard to
> +   prevent attacks that jump the stack guard.  Thus it is never allowed to
> +   allocate more than STACK_CLASH_PROTECTION_PROBE_INTERVAL bytes of stack
> +   space without a suitable probe.  */

Please handle the COPY_OFF part in the (single) caller of this, that
will simplify things a little I think?

You don't describe what the return value here is (but neither does
rs6000_emit_allocate_stack); it is the single instruction that actually
changes the stack pointer?  For the split stack support?  (Does the stack
clash code actually work with split stack, has that been tested?)

Maybe we should keep track of that sp_adjust insn in a global, or in
machine_function, or even in the stack info struct.

> +static rtx_insn *
> +rs6000_emit_probe_stack_range_stack_clash (HOST_WIDE_INT orig_size,
> +rtx copy_reg, int copy_off)
> +{
> +  rtx orig_sp = copy_reg;
> +
> +  HOST_WIDE_INT probe_interval
> += PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
> +
> +  /* Round the size down to a multiple of PROBE_INTERVAL.  */
> +  HOST_WIDE_INT rounded_size = orig_size & -probe_interval;

Use ROUND_DOWN?

> +
> +  /* If explicitly requested,
> +   or the rounded size is not the same as the original size
> +   or the the rounded size is greater than a page,
> + then we will need a copy of the original stack pointer.  */
> +  if (rounded_size != orig_size
> +  || rounded_size > probe_interval
> +  || copy_reg)
> +{
> +  /* If the caller requested a copy of the incoming stack pointer,
> +  then ORIG_SP == COPY_REG and will not be NULL.
> +
> +  If no copy was requested, then we use r0 to hold the copy.  */
> +  if (orig_sp == NULL_RTX)
> + orig_sp = 

Re: [PATCH] libquadmath/81848, allow PowerPC to enable libquadmath

2017-08-29 Thread Jakub Jelinek
On Tue, Aug 15, 2017 at 11:06:01PM -0400, Michael Meissner wrote:
> 2017-08-15  Michael Meissner  
> 
>   PR libquadmath/81848
>   * configure.ac (powerpc*-linux*): Use attribute mode KC to create
>   complex __float128 on PowerPC instead of attribute mode TC.
>   * quadmth.h (__complex128): Likewise.

quadmath.h ?

>   * configure: Regenerate.
>   * math/cbrtq.c (CBRT2): Use __float128 not long double.
>   (CBRT4): Likewise.
>   (CBRT2I): Likewise.
>   (CBRT4I): Likewise.
>   * math/j0q.c (U0): Likewise.
>   * math/sqrtq.c (sqrtq): Don't depend on implicit conversion
>   between __float128, instead explicitly convert the __float128
>   value to long double because the PowerPC does not allow __float128
>   and long double in the same expression.

Does the Q suffix on ppc* imply __float128 like on x86_64 etc.?

> --- libquadmath/math/sqrtq.c  (revision 251097)
> +++ libquadmath/math/sqrtq.c  (working copy)
> @@ -31,15 +31,18 @@ sqrtq (const __float128 x)
>  return y;
>}
>  
> -#ifdef HAVE_SQRTL
> -  if (x <= LDBL_MAX && x >= LDBL_MIN)
> +#if defined(HAVE_SQRTL)

Why the #ifdef -> #if defined change?  That looks unnecessary.

>{
> -/* Use long double result as starting point.  */
> -y = sqrtl ((long double) x);
> +long double xl = (long double)x;

Please add a space after (long double)

> +if (xl <= LDBL_MAX && xl >= LDBL_MIN)
> +  {
> + /* Use long double result as starting point.  */
> + y = sqrtl (xl);
>  
> -/* One Newton iteration.  */
> -y -= 0.5q * (y - x / y);
> -return y;
> + /* One Newton iteration.  */
> + y -= 0.5q * (y - x / y);
> + return y;
> +  }
>}
>  #endif
>  

Otherwise LGTM.

Jakub


Re: [C++ PATCH] Make taking the address of an overloaded function a non-deduced context

2017-08-29 Thread Ville Voutilainen
On 30 August 2017 at 01:19, Ville Voutilainen
 wrote:
> 2017-08-29  Ville Voutilainen  
>
> Make taking the address of an overloaded function a non-deduced context
>
> cp/
>
> * pt.c (unify_overload_resolution_failure): Return unify_success
> instead of unify_invalid.
>
> testsuite/
>
> * g++.dg/overload/template6.C: New.

Ah yes, tested on Linux-PPC64. I see some libstdc++ (runtime) failures
from the testsuite, but as far as I can
see they are there even before this patch.


[C++ PATCH] Make taking the address of an overloaded function a non-deduced context

2017-08-29 Thread Ville Voutilainen
2017-08-29  Ville Voutilainen  

Make taking the address of an overloaded function a non-deduced context

cp/

* pt.c (unify_overload_resolution_failure): Return unify_success
instead of unify_invalid.

testsuite/

* g++.dg/overload/template6.C: New.
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 564ffb0..4f731fd 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -6370,7 +6370,7 @@ unify_overload_resolution_failure (bool explain_p, tree 
arg)
 inform (input_location,
"  could not resolve address from overloaded function %qE",
arg);
-  return unify_invalid (explain_p);
+  return unify_success (explain_p);
 }
 
 /* Attempt to convert the non-type template parameter EXPR to the
diff --git a/gcc/testsuite/g++.dg/overload/template6.C 
b/gcc/testsuite/g++.dg/overload/template6.C
new file mode 100644
index 000..f2650aa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/overload/template6.C
@@ -0,0 +1,47 @@
+// { dg-do compile { target c++11 } }
+
+template 
+struct is_function {
+  static constexpr bool value = false;
+};
+
+template 
+struct is_function
+{
+  static constexpr bool value = true;
+};
+
+template struct enable_if {};
+
+template struct enable_if 
+{
+  typedef T type;
+};
+
+template 
+struct remove_pointer
+{
+  typedef T type;
+};
+
+template 
+struct remove_pointer
+{
+  typedef T type;
+};
+
+void f(int) {}
+void f(double) {}
+
+template 
+struct X
+{
+  template ::type>::value,
+  bool>::type = false> X(U&&) {}
+};
+
+int main() {
+  X x0(f);
+}


Re: Turn FUNCTION_ARG_PADDING into a target hook

2017-08-29 Thread Jeff Law
On 08/28/2017 04:05 AM, Richard Sandiford wrote:
> This involved renaming the rather general-sounding "enum direction" to
> "enum pad_direction" to avoid a conflict with the Fortran frontend.
> 
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> Also tested by checking that there were no extra warnings or changes in
> testsuite assembly output for at least one target per CPU.  OK to install?
> 
> Richard
> 
> 
> 2017-08-28  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * coretypes.h (pad_direction): New enum.
>   * defaults.h (DEFAULT_FUNCTION_ARG_PADDING): Delete.
>   (FUNCTION_ARG_PADDING): Likewise.
>   * target.def (function_arg_padding): New hook.
>   * targhooks.h (default_function_arg_padding): Declare.
>   * targhooks.c (default_function_arg_padding): New function.
>   * doc/tm.texi.in (FUNCTION_ARG_PADDING): Replace with...
>   (TARGET_FUNCTION_ARG_PADDING): ...this.
>   * doc/tm.texi: Regenerate.
>   * calls.c (store_unaligned_arguments_into_pseudos): Use pad_direction
>   instead of direction.
>   (compute_argument_addresses): Likewise.
>   (load_register_parameters): Likewise.
>   (emit_library_call_value_1): Likewise.
>   (store_one_arg): Use targetm.calls.function_arg_padding instead
>   of FUNCTION_ARG_PADDING.
>   (must_pass_in_stack_var_size_or_pad): Likewise.
>   * expr.c (emit_group_load_1): Use pad_direction instead of direction.
>   (emit_group_store): Likewise.
>   (emit_single_push_insn_1): Use targetm.calls.function_arg_padding
>   instead of FUNCTION_ARG_PADDING.
>   (emit_push_insn): Likewise, and propagate enum change throughout
>   function.
>   * function.h (direction): Delete.
>   (locate_and_pad_arg_data::where_pad): Use pad_direction instead
>   of direction.
>   * function.c (assign_parm_find_stack_rtl): Likewise.
>   (assign_parm_setup_block_p): Likewise.
>   (assign_parm_setup_block): Likewise.
>   (gimplify_parameters): Likewise.
>   (locate_and_pad_parm): Use targetm.calls.function_arg_padding
>   instead of FUNCTION_ARG_PADDING, and propagate enum change throughout
>   function.
>   * config/aarch64/aarch64.h (FUNCTION_ARG_PADDING): Delete.
>   (BLOCK_REG_PADDING): Use pad_direction instead of direction.
>   * config/aarch64/aarch64-protos.h (aarch64_pad_arg_upward): Delete.
>   * config/aarch64/aarch64.c (aarch64_pad_arg_upward): Replace with...
>   (aarch64_function_arg_padding): ...this new function.
>   (aarch64_gimplify_va_arg_expr): Use pad_direction instead of direction.
>   (TARGET_FUNCTION_ARG_PADDING): Redefine.
>   * config/arm/arm.h (FUNCTION_ARG_PADDING): Delete.
>   (BLOCK_REG_PADDING): Use pad_direction instead of direction.
>   * config/arm/arm-protos.h (arm_pad_arg_upward): Delete.
>   * config/arm/arm.c (TARGET_FUNCTION_ARG_PADDING): Redefine.
>   (arm_pad_arg_upward): Replace with...
>   (arm_function_arg_padding): ...this new function.
>   * config/c6x/c6x.h (BLOCK_REG_PADDING): Use pad_direction instead
>   of direction.
>   * config/ia64/hpux.h (FUNCTION_ARG_PADDING): Delete.
>   * config/ia64/ia64-protos.h (ia64_hpux_function_arg_padding): Delete.
>   * config/ia64/ia64.c (TARGET_FUNCTION_ARG_PADDING): Redefine.
>   (ia64_hpux_function_arg_padding): Replace with...
>   (ia64_function_arg_padding): ...this new function.  Use pad_direction
>   instead of direction.  Check for TARGET_HPUX.
>   * config/iq2000/iq2000.h (FUNCTION_ARG_PADDING): Delete.
>   * config/iq2000/iq2000.c (TARGET_FUNCTION_ARG_PADDING): Redefine.
>   (iq2000_function_arg_padding): New function.
>   * config/mips/mips-protos.h (mips_pad_arg_upward): Delete.
>   * config/mips/mips.c (mips_pad_arg_upward): Replace with...
>   (mips_function_arg_padding): ...this new function.
>   (mips_pad_reg_upward): Update accordingly.
>   (TARGET_FUNCTION_ARG_PADDING): Redefine.
>   * config/mips/mips.h (PAD_VARARGS_DOWN): Use
>   targetm.calls.function_arg_padding.
>   (FUNCTION_ARG_PADDING): Delete.
>   (BLOCK_REG_PADDING): Use pad_direction instead of direction.
>   * config/nios2/nios2.h (FUNCTION_ARG_PADDING): Delete.
>   (PAD_VARARGS_DOWN): Use targetm.calls.function_arg_padding.
>   * config/nios2/nios2-protos.h (nios2_function_arg_padding): Delete.
>   (nios2_block_reg_padding): Return pad_direction instead of direction.
>   * config/nios2/nios2.c (nios2_block_reg_padding): Return pad_direction
>   instead of direction.
>   (nios2_function_arg_padding): Likewise.  Make static.
>   (TARGET_FUNCTION_ARG_PADDING): Redefine.
>   * config/pa/pa.h (FUNCTION_ARG_PADDING): Delete.
>   (BLOCK_REG_PADDING): Use targetm.calls.function_arg_padding.
>  

libgo patch committed: build libgo/net/internal/socktest/sys_unix.go on AIX

2017-08-29 Thread Ian Lance Taylor
This libgo patch by Tony Reix builds the file
libgo/net/internal/socktest/sys_unix.go on AIX.  Bootstrapped on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251439)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-930304ec98bfe8e9cccb34d340873958e14255df
+5674b5927d5336e20fbec455a9f7b0b8ed70166c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/net/internal/socktest/sys_unix.go
===
--- libgo/go/net/internal/socktest/sys_unix.go  (revision 250873)
+++ libgo/go/net/internal/socktest/sys_unix.go  (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build darwin dragonfly freebsd linux nacl netbsd openbsd solaris
+// +build aix darwin dragonfly freebsd linux nacl netbsd openbsd solaris
 
 package socktest
 


libgo patch committed: netinet/icmp6.h require netinet/in.h on AIX

2017-08-29 Thread Ian Lance Taylor
This patch from Tony Reix fixes the libgo configure script to
correctly decide whether netinet/icmp6.h exists on AIX.  Bootstrapped
and ran Go testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251436)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-03a2c6be0c6e2b8ef62a5a424c5518bfb7cce0b9
+930304ec98bfe8e9cccb34d340873958e14255df
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/configure.ac
===
--- libgo/configure.ac  (revision 250873)
+++ libgo/configure.ac  (working copy)
@@ -575,7 +575,11 @@ AC_C_BIGENDIAN
 
 GCC_CHECK_UNWIND_GETIPINFO
 
-AC_CHECK_HEADERS(port.h sched.h semaphore.h sys/file.h sys/mman.h syscall.h 
sys/epoll.h sys/event.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h 
sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h 
netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h 
sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/ptrace.h linux/reboot.h 
netinet/icmp6.h netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h 
netinet/if_ether.h)
+AC_CHECK_HEADERS(port.h sched.h semaphore.h sys/file.h sys/mman.h syscall.h 
sys/epoll.h sys/event.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h 
sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h 
netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h 
sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/ptrace.h linux/reboot.h 
netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h netinet/if_ether.h)
+
+AC_CHECK_HEADERS([netinet/icmp6.h], [], [],
+[#include 
+])
 
 AC_CHECK_HEADERS([linux/filter.h linux/if_addr.h linux/if_ether.h 
linux/if_tun.h linux/netlink.h linux/rtnetlink.h], [], [],
 [#ifdef HAVE_SYS_SOCKET_H


Re: Add a partial_subreg_p predicate

2017-08-29 Thread Jeff Law
On 08/21/2017 07:34 AM, Richard Sandiford wrote:
> This patch adds a partial_subreg_p predicate to go alongside
> paradoxical_subreg_p.
> 
> The first two changes to cse_insn preserve the current behaviour,
> but the condition seems strange.  Shouldn't we be able to continue
> to cse if the inner modes of the two subregs have the same size?
I would think so.  It could well be a simple oversight.  We're talking
about very old code -- this stuff is virtually unchanged in 25 years.

This specific chunk of code is something I would like to look more
deeply into its utility -- most of what it's doing should be handled by
DOM these days.  It'd be an interesting exercise to see if this code is
important at all anymore.

If you wanted to handle that case, I wouldn't object as a follow-up if
you can come up with a testcase that shows when its useful.


> The patch also preserves the existing condition in
> simplify_operand_subreg, but perhaps it should be using
> a df_read_modify_subreg_p-style check instead.  We don't
> need to reload the inner value first if the inner value
> is no bigger than a word, for example.
Not sure on this one.  We could try to change it as a follow-up though.
Vlad may have a better sense of how this might interact with other parts
of LRA than I would

> 
> The use in curr_insn_transform also seemed strange.  Surely the
> modes of the SET_DEST and SET_SRC will be the same, given that
> this code isn't meant for constants?
Is it possible this is for the zero/sign extension support?

> 
> Like the paradoxical_subreg_p patch, this one replaces some tests that
> were based on GET_MODE_SIZE rather than GET_MODE_PRECISION.  In each
> case the change should be a no-op or an improvement.
> 
> Doing this in regcprop.c prevents some replacements of the 82-bit RFmode
> with the 80-bit XFmode on ia64.  I don't understand the target details
> here particularly well, but from the way the values are described in
> ia64-modes.def, it isn't valid to assume that an XFmode can carry an
> RFmode payload.  A comparison of the testsuite assembly output for one
> target per CPU showed no other differences.
> 
> Some of the places changed here are tracking the widest access mode
> found for a register.  The series tries to standardise on:
> 
>   if (partial_subreg_p (widest_seen, new_mode))
> widest_seen = new_mode;
> 
> rather than:
> 
>   if (paradoxical_subreg_p (new_mode, widest_seen))
> widest_seen = new_mode;
> 
> Either would have been OK.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu in addition to the above.
> OK to install?
> 
> Richard
> 
> 
> 2017-08-21  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * rtl.h (partial_subreg_p): New function.
>   * caller-save.c (save_call_clobbered_regs): Use it.
>   * calls.c (expand_call): Likewise.
>   * combine.c (combinable_i3pat): Likewise.
>   (simplify_set): Likewise.
>   (make_extraction): Likewise.
>   (make_compound_operation_int): Likewise.
>   (gen_lowpart_or_truncate): Likewise.
>   (force_to_mode): Likewise.
>   (make_field_assignment): Likewise.
>   (reg_truncated_to_mode): Likewise.
>   (record_truncated_value): Likewise.
>   (move_deaths): Likewise.
>   * cse.c (record_jump_cond): Likewise.
>   (cse_insn): Likewise.
>   * cselib.c (cselib_lookup_1): Likewise.
>   * df-scan.c (df_read_modify_subreg_p): Likewise.
>   * expmed.c (extract_bit_field_using_extv): Likewise.
>   * function.c (assign_parm_setup_reg): Likewise.
>   * ifcvt.c (noce_convert_multiple_sets): Likewise.
>   * ira-build.c (create_insn_allocnos): Likewise.
>   * lra-coalesce.c (merge_pseudos): Likewise.
>   * lra-constraints.c (match_reload): Likewise.
>   (simplify_operand_subreg): Likewise.
>   (curr_insn_transform): Likewise.
>   * lra-lives.c (process_bb_lives): Likewise.
>   * lra.c (new_insn_reg): Likewise.
>   (lra_substitute_pseudo): Likewise.
>   * regcprop.c (mode_change_ok): Likewise.
>   (maybe_mode_change): Likewise.
>   (copyprop_hardreg_forward_1): Likewise.
>   * reload.c (push_reload): Likewise.
>   (find_reloads): Likewise.
>   (find_reloads_subreg_address): Likewise.
>   * reload1.c (alter_reg): Likewise.
>   (eliminate_regs_1): Likewise.
>   * simplify-rtx.c (simplify_unary_operation_1): Likewise.
Any thought on whether or not a same size subreg predicate would be
useful.  ie, (subreg:SI (reg:SF)) and the like.  Certainly not something
you have to do to go forward, just something to ponder.

The patch is OK for the trunk.  The issues noted above all seem like
potentially follow-up items if we want to tackle them at all.

jeff



[PATCH v2, rs6000] Fix PR81833

2017-08-29 Thread Bill Schmidt
Hi Segher,

Thanks for approving the previous patch with changes.  I've made those and also
modified the test case to require VSX hardware for execution.  I duplicated the
test so we get coverage on P7 BE 32/64 and P8 BE/LE.  I'd appreciate it if you
could look over the dejagnu instructions once more on these.  Thanks!

Bill


[gcc]

2017-08-29  Bill Schmidt  

PR target/81833
* config/rs6000/altivec.md (altivec_vsum2sws): Convert from a
define_insn to a define_expand.
(altivec_vsum2sws_direct): New define_insn.
(altivec_vsumsws): Convert from a define_insn to a define_expand.

[gcc/testsuite]

2017-08-29  Bill Schmidt  

PR target/81833
* gcc.target/powerpc/pr81833-1.c: New file.
* gcc.target/powerpc/pr81833-2.c: New file.


Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 251369)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -1804,51 +1804,61 @@
   "vsum4ss %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; FIXME: For the following two patterns, the scratch should only be
-;; allocated for !VECTOR_ELT_ORDER_BIG, and the instructions should
-;; be emitted separately.
-(define_insn "altivec_vsum2sws"
-  [(set (match_operand:V4SI 0 "register_operand" "=v")
-(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
-  (match_operand:V4SI 2 "register_operand" "v")]
-UNSPEC_VSUM2SWS))
-   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
-   (clobber (match_scratch:V4SI 3 "=v"))]
+(define_expand "altivec_vsum2sws"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:V4SI 1 "register_operand"))
+   (use (match_operand:V4SI 2 "register_operand"))]
   "TARGET_ALTIVEC"
 {
   if (VECTOR_ELT_ORDER_BIG)
-return "vsum2sws %0,%1,%2";
+emit_insn (gen_altivec_vsum2sws_direct (operands[0], operands[1],
+operands[2]));
   else
-return "vsldoi %3,%2,%2,12\n\tvsum2sws %3,%1,%3\n\tvsldoi %0,%3,%3,4";
-}
-  [(set_attr "type" "veccomplex")
-   (set (attr "length")
- (if_then_else
-   (match_test "VECTOR_ELT_ORDER_BIG")
-   (const_string "4")
-   (const_string "12")))])
+{
+  rtx tmp1 = gen_reg_rtx (V4SImode);
+  rtx tmp2 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_altivec_vsldoi_v4si (tmp1, operands[2],
+  operands[2], GEN_INT (12)));
+  emit_insn (gen_altivec_vsum2sws_direct (tmp2, operands[1], tmp1));
+  emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2,
+  GEN_INT (4)));
+}
+  DONE;
+})
 
-(define_insn "altivec_vsumsws"
+; FIXME: This can probably be expressed without an UNSPEC.
+(define_insn "altivec_vsum2sws_direct"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
-  (match_operand:V4SI 2 "register_operand" "v")]
-UNSPEC_VSUMSWS))
-   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))
-   (clobber (match_scratch:V4SI 3 "=v"))]
+ (match_operand:V4SI 2 "register_operand" "v")]
+UNSPEC_VSUM2SWS))
+   (set (reg:SI VSCR_REGNO) (unspec:SI [(const_int 0)] UNSPEC_SET_VSCR))]
   "TARGET_ALTIVEC"
+  "vsum2sws %0,%1,%2"
+  [(set_attr "type" "veccomplex")])
+
+(define_expand "altivec_vsumsws"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:V4SI 1 "register_operand"))
+   (use (match_operand:V4SI 2 "register_operand"))]
+  "TARGET_ALTIVEC"
 {
   if (VECTOR_ELT_ORDER_BIG)
-return "vsumsws %0,%1,%2";
+emit_insn (gen_altivec_vsumsws_direct (operands[0], operands[1],
+   operands[2]));
   else
-return "vspltw %3,%2,0\n\tvsumsws %3,%1,%3\n\tvsldoi %0,%3,%3,12";
-}
-  [(set_attr "type" "veccomplex")
-   (set (attr "length")
- (if_then_else
-   (match_test "(VECTOR_ELT_ORDER_BIG)")
-   (const_string "4")
-   (const_string "12")))])
+{
+  rtx tmp1 = gen_reg_rtx (V4SImode);
+  rtx tmp2 = gen_reg_rtx (V4SImode);
+  emit_insn (gen_altivec_vspltw_direct (tmp1, operands[2], const0_rtx));
+  emit_insn (gen_altivec_vsumsws_direct (tmp2, operands[1], tmp1));
+  emit_insn (gen_altivec_vsldoi_v4si (operands[0], tmp2, tmp2,
+  GEN_INT (12)));
+}
+  DONE;
+})
 
+; FIXME: This can probably be expressed without an UNSPEC.
 (define_insn "altivec_vsumsws_direct"
   [(set (match_operand:V4SI 0 "register_operand" "=v")
 (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
Index: gcc/testsuite/gcc.target/powerpc/pr81833-1.c
===
--- 

C++ PATCH for c++/81236, missed 'this' capture with template-id in generic lambda

2017-08-29 Thread Jason Merrill
The problem here was that normally in finish_id_expression we add
'this->' to a mention of a non-static member, but there is special
code for doing less if the lookup result is dependent, and so we
didn't get that processing.

A few months back Adam fixed 64382 by deciding to do all the normal
processing if we're in a generic lambda in a template, but in this
testcase the generic lambda isn't in a template.

We could approach this by extending that change to all generic
lambdas, and that might be appropriate for GCC 7, but it seems to me
that this approach will just mean any problems with doing all the
normal processing in a template will remain latent until someone
happens to use them in a generic lambda; instead, this patch removes
the template special case and fixes the normal code to work properly
in templates.

The cp_parser_postfix_dot_deref_expression hunk was a latent
diagnostic quality issue: there's no point in trying to be permissive
about an incomplete type like void, since it can't be completed by
instantiation time.

The cp_walk_subtrees hunk was necessary to make abi-tag21.C work with
this change; without walking into the scope of the BASELINK, we didn't
see the explicit scope in the return type of fv.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 28df39b2192b5feeffa6dd2b8208cf0270ac5330
Author: Jason Merrill 
Date:   Thu Aug 24 14:09:01 2017 -0400

PR c++/81236 - ICE with template-id in generic lambda

* semantics.c (finish_id_expression): Remove special dependent case.
Avoid some later pieces when dependent.
(finish_qualified_id_expr): Do normal BASELINK handling in a
template.  Always build a SCOPE_REF for a destructor BIT_NOT_EXPR.
(parsing_default_capturing_generic_lambda_in_template): Remove.
* parser.c (cp_parser_postfix_dot_deref_expression): Always give an
error for types that will never be complete.
* mangle.c (write_expression): Add sanity check.
* tree.c (build_qualified_name): Add sanity check.
(cp_walk_subtrees): Walk into the class context of a BASELINK.
* lambda.c (add_capture): Improve diagnostic for generic lambda
capture failure.
* call.c (build_new_method_call_1): Print the right constructor
name.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index f7f9297..c446057 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -9007,6 +9007,7 @@ build_new_method_call_1 (tree instance, tree fns, 
vec **args,
   if (! (complain & tf_error))
return error_mark_node;
 
+  basetype = DECL_CONTEXT (fn);
   name = constructor_name (basetype);
   if (permerror (input_location,
 "cannot call constructor %<%T::%D%> directly",
diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 4747a72..9ba3df1 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -590,7 +590,11 @@ add_capture (tree lambda, tree id, tree orig_init, bool 
by_reference_p,
   /* Add it to the appropriate closure class if we've started it.  */
   if (current_class_type
   && current_class_type == LAMBDA_EXPR_CLOSURE (lambda))
-finish_member_declaration (member);
+{
+  if (COMPLETE_TYPE_P (current_class_type))
+   internal_error ("trying to capture %qD after closure is complete", id);
+  finish_member_declaration (member);
+}
 
   tree listmem = member;
   if (variadic)
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index a87f97f..ce7c0c5 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -3015,6 +3015,7 @@ write_expression (tree expr)
{
  scope = TREE_OPERAND (expr, 0);
  member = TREE_OPERAND (expr, 1);
+ gcc_assert (!BASELINK_P (member));
}
   else
{
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d0d71fa..47d91bf 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -7446,11 +7446,14 @@ cp_parser_postfix_dot_deref_expression (cp_parser 
*parser,
  /* In a template, be permissive by treating an object expression
 of incomplete type as dependent (after a pedwarn).  */
  diagnostic_t kind = (processing_template_decl
+  && MAYBE_CLASS_TYPE_P (scope)
   ? DK_PEDWARN
   : DK_ERROR);
  cxx_incomplete_type_diagnostic
(location_of (postfix_expression),
 postfix_expression, scope, kind);
+ if (!MAYBE_CLASS_TYPE_P (scope))
+   return error_mark_node;
  if (processing_template_decl)
{
  dependent_p = true;
@@ -20671,33 +20674,6 @@ parsing_nsdmi (void)
   return false;
 }
 
-/* Return true iff our current scope is a default capturing generic lambda
-   defined within a template.  FIXME: This is part of a workaround (see
-   semantics.c) to handle 

libgo patch committed: fix Stat_t on AIX

2017-08-29 Thread Ian Lance Taylor
This patch to libgo's mksysinfo.sh script handles the timespec field
found in the stat struct on AIX.  Bootstrapped on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251435)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a28f1d8aa306bdb5166af1f087e5f5027ce51d6d
+03a2c6be0c6e2b8ef62a5a424c5518bfb7cce0b9
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 250873)
+++ libgo/mksysinfo.sh  (working copy)
@@ -462,6 +462,7 @@ fi | sed -e 's/type _stat64/type Stat_t/
  -e 's/st_ctim/Ctim/' \
  -e 's/\([^a-zA-Z0-9_]\)_timeval\([^a-zA-Z0-9_]\)/\1Timeval\2/g' \
  -e 's/\([^a-zA-Z0-9_]\)_timespec_t\([^a-zA-Z0-9_]\)/\1Timespec\2/g' \
+ -e 
's/\([^a-zA-Z0-9_]\)_st_timespec_t\([^a-zA-Z0-9_]\)/\1Timespec\2/g' \
  -e 's/\([^a-zA-Z0-9_]\)_timespec\([^a-zA-Z0-9_]\)/\1Timespec\2/g' \
  -e 's/\([^a-zA-Z0-9_]\)_timestruc_t\([^a-zA-Z0-9_]\)/\1Timestruc\2/g' 
\
  -e 's/Godump_[0-9] struct { \([^;]*;\) };/\1/g' \


libgo patch committed: Fix go-nosys.c when HAVE_SYSCALL is not defined

2017-08-29 Thread Ian Lance Taylor
This patch from Tony Reix fixes the compilation of go-nosys.c when
HAVE_SYSCALL is not defined.  Bootstrapped on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251420)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-db8e3801bf8508656606d6e465c76cdc6e9a9eb7
+a28f1d8aa306bdb5166af1f087e5f5027ce51d6d
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/runtime/go-nosys.c
===
--- libgo/runtime/go-nosys.c(revision 250873)
+++ libgo/runtime/go-nosys.c(working copy)
@@ -506,7 +506,7 @@ strerror_r (int errnum, char *buf, size_
 
 #ifndef HAVE_SYSCALL
 int
-syscall(int number, ...)
+syscall(int number __attribute__ ((unused)), ...)
 {
   errno = ENOSYS;
   return -1;


Re: [patch, fortran] Warn about suspicious assignment to contiguous pointers

2017-08-29 Thread Thomas Koenig

Hi Janus,


and in that
case I would argue that, beyond being "not useful", it's even illegal,
so why not throw a hard error, if we can infer at compile-time that
the target is non-contiguous?


Problem is, we cannot infer this from the tests done.
We would also have to add a test if the array is empty
or that it contains only a single element, and that (I think)
is a) impossible in the general case, and b) not worth the bother.


I'm not sure I understand which cases you're worried about here. Maybe
you can give an example?


   real, dimension(5,5), target :: a
   real, dimension(:,:), pointer, contiguous :: ap
   ap => a(4::2,4::2)

points to a single element, which is (by definition) contiguous.


yes, I see that your current patch mishandles this case.


It does _not_ mishandle that case.  It issues a warning for a
very dubious practice. The warning can be turned off at will.

Issuing a hard error whould be mishandling this case.



But why
should it be impossible to detect that there is only a single element?
If the triplet and the array bounds are constant, it is certainly
possible.


Too much work into supporting a corner case of a dubious practice.  I'm
not going to do it.


There are also a few other checks that I missed, for
example

   subroutine foo(a)
   real, dimension(:), target, intent(inout) :: a
   real, dimension(:), contiguous :: ap
   ap => a

I will also address this in a future version of the patch.


How do you want to do that?


Warn in all cases.

Because the code writer can not how if the array passed to the
subroutine is contigous (use case of a library), it is not
possible for him to know if the user will do this correctly or
not.  The correct way to write this subroutine then is

subroutine foo(a)
  real, dimension(:), target, intent(inout), contiguous :: a
  real, dimension(:), contiguous :: ap
  ap => a

when gfortran will someday complain if the dummy argument
is associated with a non-contiguous entity.


C++ PATCH to overhaul lambdas in templates

2017-08-29 Thread Jason Merrill
Our old model for handling lambdas in templates was to handle them
like any other template class, and instantiate them by normal
substitution.  This was insufficient for lambdas for two reasons: for
one, it made it impossible to get implicit captures quite right in
templates.

Also, if a lambda appears in a pack expansion, e.g.

template 
auto f() {
  int i = 42;
  return ([i]{ return T(i); }() + ...);
}

There needs to be one lambda for each element of T, not just one for
each instantiation of f.  So instantiating a LAMBDA_EXPR needs to
build up a new LAMBDA_EXPR each time.  This patch implements that,
primarily in the new tsubst_lambda_expr function.

Generating the op() decl for the new lambda is a lot like a normal
instantiation, but not quite.  I decided to do this by factoring out
tsubst_function_decl and tsubst_template_decl, and having them handle
the lambda special case appropriately: when we're dealing with the
lambda function, it isn't actually a specialization of the lambda in
the template, and it isn't a member of a specialization of the closure
in the template.

The code in process_outer_var_ref for handling generic lambdas
specifically can now go away; we now defer implicit capture until the
enclosing context is instantiated for generic and non-generic lambdas.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit bdbf060161375648283cf82d4662362ea788b298
Author: Jason Merrill 
Date:   Thu Jul 6 19:32:32 2017 -0400

Reimplement handling of lambdas in templates.

* cp-tree.h (LAMBDA_FUNCTION_P): Check DECL_DECLARES_FUNCTION_P.
* decl.c (start_preparsed_function): Call start_lambda_scope.
(finish_function): Call finish_lambda_scope.
* init.c (get_nsdmi): Call start/finish_lambda_scope.
* lambda.c (start_lambda_scope): Only ignore VAR_DECL in a function.
* parser.c (cp_parser_function_definition_after_declarator): Don't
call start/finish_lambda_scope.
* pt.c (retrieve_specialization): Ignore lambda functions in
templates.
(find_parameter_packs_r): Ignore capture proxies.  Look into
lambdas.
(check_for_bare_parameter_packs): Allow bare packs in lambdas.
(tsubst_default_argument): Call start/finish_lambda_scope.
(tsubst_function_decl): Handle lambda functions differently.
(tsubst_template_decl): Likewise.
(tsubst_expr) [DECL_EXPR]: Skip closure declarations and capture
proxies.
(tsubst_lambda_expr): Create a new closure rather than instantiate
the one from the template.
(tsubst_copy_and_build): Don't register a specialization of a pack.
(regenerate_decl_from_template): Call start/finish_lambda_scope.
(instantiate_decl): Remove special lambda function handling.
* semantics.c (process_outer_var_ref): Remove special generic lambda
handling.  Don't implicitly capture in a lambda in a template.  Look
for an existing proxy.
* class.c (current_nonlambda_class_type): Use decl_type_context.

diff --git a/gcc/cp/class.c b/gcc/cp/class.c
index 28cf7dc..a5f1007 100644
--- a/gcc/cp/class.c
+++ b/gcc/cp/class.c
@@ -7709,27 +7709,10 @@ outermost_open_class (void)
 tree
 current_nonlambda_class_type (void)
 {
-  int i;
-
-  /* We start looking from 1 because entry 0 is from global scope,
- and has no type.  */
-  for (i = current_class_depth; i > 0; --i)
-{
-  tree c;
-  if (i == current_class_depth)
-   c = current_class_type;
-  else
-   {
- if (current_class_stack[i].hidden)
-   break;
- c = current_class_stack[i].type;
-   }
-  if (!c)
-   continue;
-  if (!LAMBDA_TYPE_P (c))
-   return c;
-}
-  return NULL_TREE;
+  tree type = current_class_type;
+  while (type && LAMBDA_TYPE_P (type))
+type = decl_type_context (TYPE_NAME (type));
+  return type;
 }
 
 /* When entering a class scope, all enclosing class scopes' names with
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ad97be4..41c48ec 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1216,8 +1216,9 @@ struct GTY (()) tree_trait_expr {
   (CLASS_TYPE_P (NODE) && CLASSTYPE_LAMBDA_EXPR (NODE))
 
 /* Test if FUNCTION_DECL is a lambda function.  */
-#define LAMBDA_FUNCTION_P(FNDECL) \
-  (DECL_OVERLOADED_OPERATOR_P (FNDECL) == CALL_EXPR \
+#define LAMBDA_FUNCTION_P(FNDECL)  \
+  (DECL_DECLARES_FUNCTION_P (FNDECL)   \
+   && DECL_OVERLOADED_OPERATOR_P (FNDECL) == CALL_EXPR \
&& LAMBDA_TYPE_P (CP_DECL_CONTEXT (FNDECL)))
 
 enum cp_lambda_default_capture_mode_type {
@@ -6828,6 +6829,11 @@ extern bool is_lambda_ignored_entity(tree);
 extern bool lambda_static_thunk_p  (tree);
 extern tree finish_builtin_launder (location_t, tree,
 

C++ PATCH to remove unnecessary LAMBDA_EXPR fields

2017-08-29 Thread Jason Merrill
The explicit return type for a lambda is only used during parsing, so
we can track it in a local variable.  Later on we can look at the
actual return type of the function.

Since tree_lambda_expr is typed, and the type of a lambda-expression
is the closure, we can use TREE_TYPE for the closure instead of an
additional field.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit f5fe57f1735a860988df4c3eb30c2340463852b7
Author: Jason Merrill 
Date:   Thu Jul 6 17:37:15 2017 -0400

Remove unnecessary LAMBDA_EXPR fields.

* cp-tree.h (LAMBDA_EXPR_CLOSURE): Use TREE_TYPE.
(LAMBDA_EXPR_RETURN_TYPE): Remove.
(struct tree_lambda_expr): Remove closure and return_type fields.
* lambda.c (build_lambda_expr): Don't set LAMBDA_EXPR_RETURN_TYPE.
* pt.c (tsubst_copy_and_build): Likewise.
* parser.c (cp_parser_lambda_declarator_opt): Track return type.
(cp_parser_lambda_body): Adjust unspecified return type check.
* ptree.c (cxx_print_lambda_node): Don't print closure or
return type.

diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index a46f9c3..890723f 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -468,8 +468,7 @@ DEFTREECODE (TRAIT_EXPR, "trait_expr", tcc_exceptional, 0)
LAMBDA_EXPR_THIS_CAPTURE goes straight to the capture of `this', if it 
exists.
LAMBDA_EXPR_PENDING_PROXIES is a vector of capture proxies which need to
be pushed once scope returns to the lambda.
-   LAMBDA_EXPR_MUTABLE_P signals whether this lambda was declared mutable.
-   LAMBDA_EXPR_RETURN_TYPE holds the return type, if it was specified.  */
+   LAMBDA_EXPR_MUTABLE_P signals whether this lambda was declared mutable.  */
 DEFTREECODE (LAMBDA_EXPR, "lambda_expr", tcc_exceptional, 0)
 
 /* The declared type of an expression.  This is a C++0x extension.
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index a58e7bd..ad97be4 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -1253,11 +1253,6 @@ enum cp_lambda_default_capture_mode_type {
 #define LAMBDA_EXPR_MUTABLE_P(NODE) \
   TREE_LANG_FLAG_1 (LAMBDA_EXPR_CHECK (NODE))
 
-/* The return type in the expression.
- * NULL_TREE indicates that none was specified.  */
-#define LAMBDA_EXPR_RETURN_TYPE(NODE) \
-  (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->return_type)
-
 /* The source location of the lambda.  */
 #define LAMBDA_EXPR_LOCATION(NODE) \
   (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->locus)
@@ -1276,20 +1271,17 @@ enum cp_lambda_default_capture_mode_type {
 #define LAMBDA_EXPR_PENDING_PROXIES(NODE) \
   (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->pending_proxies)
 
-/* The closure type of the lambda.  Note that the TREE_TYPE of a
-   LAMBDA_EXPR is always NULL_TREE, because we need to instantiate the
-   LAMBDA_EXPR in order to instantiate the type.  */
+/* The closure type of the lambda, which is also the type of the
+   LAMBDA_EXPR.  */
 #define LAMBDA_EXPR_CLOSURE(NODE) \
-  (((struct tree_lambda_expr *)LAMBDA_EXPR_CHECK (NODE))->closure)
+  (TREE_TYPE (LAMBDA_EXPR_CHECK (NODE)))
 
 struct GTY (()) tree_lambda_expr
 {
   struct tree_typed typed;
   tree capture_list;
   tree this_capture;
-  tree return_type;
   tree extra_scope;
-  tree closure;
   vec *pending_proxies;
   location_t locus;
   enum cp_lambda_default_capture_mode_type default_capture_mode;
diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 0e8934b..ff76178 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -42,7 +42,6 @@ build_lambda_expr (void)
   LAMBDA_EXPR_CAPTURE_LIST (lambda) = NULL_TREE;
   LAMBDA_EXPR_THIS_CAPTURE (lambda) = NULL_TREE;
   LAMBDA_EXPR_PENDING_PROXIES  (lambda) = NULL;
-  LAMBDA_EXPR_RETURN_TYPE  (lambda) = NULL_TREE;
   LAMBDA_EXPR_MUTABLE_P(lambda) = false;
   return lambda;
 }
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 55f088d..07913d6 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -10419,6 +10419,7 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, 
tree lambda_expr)
   tree exception_spec = NULL_TREE;
   tree template_param_list = NULL_TREE;
   tree tx_qual = NULL_TREE;
+  tree return_type = NULL_TREE;
   cp_decl_specifier_seq lambda_specs;
   clear_decl_specs (_specs);
 
@@ -10493,8 +10494,7 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, 
tree lambda_expr)
   if (cp_lexer_next_token_is (parser->lexer, CPP_DEREF))
 {
   cp_lexer_consume_token (parser->lexer);
-  LAMBDA_EXPR_RETURN_TYPE (lambda_expr)
-   = cp_parser_trailing_type_id (parser);
+  return_type = cp_parser_trailing_type_id (parser);
 }
 
   /* The function parameters must be in scope all the way until after the
@@ -10517,8 +10517,8 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, 
tree lambda_expr)
 void *p;
 
 clear_decl_specs (_type_specs);
-if 

Re: [patch, fortran] Warn about suspicious assignment to contiguous pointers

2017-08-29 Thread Janus Weil
Hi Thomas,

 and in that
 case I would argue that, beyond being "not useful", it's even illegal,
 so why not throw a hard error, if we can infer at compile-time that
 the target is non-contiguous?
>>>
>>> Problem is, we cannot infer this from the tests done.
>>> We would also have to add a test if the array is empty
>>> or that it contains only a single element, and that (I think)
>>> is a) impossible in the general case, and b) not worth the bother.
>>
>> I'm not sure I understand which cases you're worried about here. Maybe
>> you can give an example?
>
>   real, dimension(5,5), target :: a
>   real, dimension(:,:), pointer, contiguous :: ap
>   ap => a(4::2,4::2)
>
> points to a single element, which is (by definition) contiguous.

yes, I see that your current patch mishandles this case. But why
should it be impossible to detect that there is only a single element?
If the triplet and the array bounds are constant, it is certainly
possible.


>> In any case, I think your test case is a bit short, so I extended it
>> somewhat (see attachment) and found two cases along the way where your
>> patch throws a warning but shouldn't:
>>
>> r => x(::-1)
>
> This one is _not_ contiguous; contiguos implies stride==1.

Ok, apparently I did not read the F08 standard carefully. I guess the
mention of the "array element order" in chapter 5.3.7 forbids the
negative stride (which means reversed order).


>> Apart from the two mishandled cases above, one other thing comes to my
>> mind: It might be a good idea to apply your checks not only to pointer
>> assignments, but also to dummy arguments (passing a non-contiguous
>> array to a contiguous dummy pointer), where the same rules should
>> apply.
>
> This is true.
>
> There are also a few other checks that I missed, for
> example
>
>   subroutine foo(a)
>   real, dimension(:), target, intent(inout) :: a
>   real, dimension(:), contiguous :: ap
>   ap => a
>
> I will also address this in a future version of the patch.

How do you want to do that? I don't see a way to tell at compile time
if 'a' here is contiguous (like in many other cases). I guess one
would ultimately need a runtime check. Or do you want to warn
unconditionally because it 'might' be non-contiguous?

Cheers,
Janus


2017-08-29 20:13 GMT+02:00 Thomas Koenig :
> Hi Janus,
>
> I think an unconditional warning is OK
> in this case because
>
> - Assigning to a pointer from an obvious non-contiguous target
> is not useful at all, that I can see



 I guess you're talking about a *contiguous* pointer here,
>>>
>>>
>>> Correct.
>>>
 and in that
 case I would argue that, beyond being "not useful", it's even illegal,
 so why not throw a hard error, if we can infer at compile-time that
 the target is non-contiguous?
>>>
>>>
>>> Problem is, we cannot infer this from the tests done.
>>> We would also have to add a test if the array is empty
>>> or that it contains only a single element, and that (I think)
>>> is a) impossible in the general case, and b) not worth the bother.
>>
>>
>> I'm not sure I understand which cases you're worried about here. Maybe
>> you can give an example?
>
>
>
>   real, dimension(5,5), target :: a
>   real, dimension(:,:), pointer, contiguous :: ap
>   ap => a(4::2,4::2)
>
> points to a single element, which is (by definition) contiguous.
>
>> In any case, I think your test case is a bit short, so I extended it
>> somewhat (see attachment) and found two cases along the way where your
>> patch throws a warning but shouldn't:
>>
>> r => x(::-1)
>
>
> This one is _not_ contiguous; contiguos implies stride==1.
>
>> r => x2(2:3,1)
>
>
> Correct, I will correct that one.
>
>> Apart from the two mishandled cases above, one other thing comes to my
>> mind: It might be a good idea to apply your checks not only to pointer
>> assignments, but also to dummy arguments (passing a non-contiguous
>> array to a contiguous dummy pointer), where the same rules should
>> apply.
>
>
> This is true.
>
> There are also a few other checks that I missed, for
> example
>
>   subroutine foo(a)
>   real, dimension(:), target, intent(inout) :: a
>   real, dimension(:), contiguous :: ap
>   ap => a
>
> I will also address this in a future version of the patch. This will
> take a bit of time, though.
>
> Regards
>
> Thomas


C++ PATCH for various small fixes

2017-08-29 Thread Jason Merrill
Various small things I noticed while working on the lambda overhaul:

We try to avoid mentioning the lambda conversion static thunk in
diagnostics, but it was showing up in constexpr messages.

If the lambda itself is erroneous, don't try to build a closure object.

We can't defer instantiation of a function with undeduced auto type.

tsubst_decl should set the DECL_CONTEXT on function parameters immediately.

Don't build a non-type parameter with erroneous type.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 98a3f06a6ef75905a280209cd176d154e0ca7387
Author: Jason Merrill 
Date:   Wed Aug 23 15:46:01 2017 -0400

Various small fixes.

* lambda.c (build_lambda_object): Check for error_mark_node.
* pt.c (make_pack_expansion): Set PACK_EXPANSION_LOCAL_P on the type
pack as well.
(tsubst_decl) [FUNCTION_DECL]: Set DECL_CONTEXT on the parameters.
(tsubst) [TEMPLATE_PARM_INDEX]: Check for error_mark_node.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 8cfc5a4..daeec9d 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1458,7 +1458,8 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
 {
   if (!ctx->quiet)
{
- error_at (loc, "call to non-constexpr function %qD", fun);
+ if (!lambda_static_thunk_p (fun))
+   error_at (loc, "call to non-constexpr function %qD", fun);
  explain_invalid_constexpr_fn (fun);
}
   *non_constant_p = true;
diff --git a/gcc/cp/lambda.c b/gcc/cp/lambda.c
index 14ff6c2..337b9ee 100644
--- a/gcc/cp/lambda.c
+++ b/gcc/cp/lambda.c
@@ -59,7 +59,7 @@ build_lambda_object (tree lambda_expr)
   tree node, expr, type;
   location_t saved_loc;
 
-  if (processing_template_decl)
+  if (processing_template_decl || lambda_expr == error_mark_node)
 return lambda_expr;
 
   /* Make sure any error messages refer to the lambda-introducer.  */
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 416d17b..aaae06d 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3773,6 +3773,7 @@ make_pack_expansion (tree arg)
   purpose = cxx_make_type (TYPE_PACK_EXPANSION);
   SET_PACK_EXPANSION_PATTERN (purpose, TREE_PURPOSE (arg));
   PACK_EXPANSION_PARAMETER_PACKS (purpose) = parameter_packs;
+  PACK_EXPANSION_LOCAL_P (purpose) = at_function_scope_p ();
 
   /* Just use structural equality for these TYPE_PACK_EXPANSIONS;
 they will rarely be compared to anything.  */
@@ -9535,6 +9536,7 @@ static inline bool
 neglectable_inst_p (tree d)
 {
   return (DECL_P (d)
+ && !undeduced_auto_decl (d)
  && !(TREE_CODE (d) == FUNCTION_DECL ? DECL_DECLARED_CONSTEXPR_P (d)
   : decl_maybe_constant_var_p (d)));
 }
@@ -12413,6 +12415,8 @@ tsubst_decl (tree t, tree args, tsubst_flags_t complain)
 
DECL_ARGUMENTS (r) = tsubst (DECL_ARGUMENTS (t), args,
 complain, t);
+   for (tree parm = DECL_ARGUMENTS (r); parm; parm = DECL_CHAIN (parm))
+ DECL_CONTEXT (parm) = r;
DECL_RESULT (r) = NULL_TREE;
 
TREE_STATIC (r) = 0;
@@ -13786,6 +13790,8 @@ tsubst (tree t, tree args, tsubst_flags_t complain, 
tree in_decl)
   couldn't do it earlier because it might be an auto parameter,
   and we wouldn't need to if we had an argument.  */
type = tsubst (type, args, complain, in_decl);
+   if (type == error_mark_node)
+ return error_mark_node;
r = reduce_template_parm_level (t, type, levels, args, complain);
break;
 


C++ PATCH for c++/80935, wrong error on lambda in C++17 mode

2017-08-29 Thread Jason Merrill
In this testcase, trying to treat a lambda op() as a constexpr
function led to an inappropriate error; we should instead just decide
that it isn't constexpr after all.

This patch also tweaks a couple of other places to use
is_instantiation_of_constexpr and var_in_maybe_constexpr_fn
appropriately.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 5eeca22016ed9ff34e83afb88e8195796096acae
Author: Jason Merrill 
Date:   Tue Aug 29 15:45:47 2017 -0400

PR c++/80935 - wrong C++17 error with lambda

* decl.c (check_for_uninitialized_const_var): Check
is_instantiation_of_constexpr.
* constexpr.c (ensure_literal_type_for_constexpr_object): Check
is_instantiation_of_constexpr.
(potential_constant_expression_1): Check var_in_maybe_constexpr_fn.

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index daeec9d..f3e868c 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -100,7 +100,7 @@ ensure_literal_type_for_constexpr_object (tree decl)
}
  else
{
- if (!DECL_TEMPLATE_INSTANTIATION (current_function_decl))
+ if (!is_instantiation_of_constexpr (current_function_decl))
{
  error ("variable %qD of non-literal type %qT in % 
"
 "function", decl, type);
@@ -5335,8 +5335,7 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict, bool now,
 STRIP_NOPS (x);
 if (is_this_parameter (x) && !is_capture_proxy (x))
  {
-   if (DECL_CONTEXT (x)
-   && !DECL_DECLARED_CONSTEXPR_P (DECL_CONTEXT (x)))
+   if (!var_in_maybe_constexpr_fn (x))
  {
if (flags & tf_error)
  error_at (loc, "use of % in a constant expression");
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index ff3127e..23829b0 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -5525,9 +5525,10 @@ check_for_uninitialized_const_var (tree decl)
   "uninitialized const %qD", decl);
   else
{
- error_at (DECL_SOURCE_LOCATION (decl),
-   "uninitialized variable %qD in % function",
-   decl);
+ if (!is_instantiation_of_constexpr (current_function_decl))
+   error_at (DECL_SOURCE_LOCATION (decl),
+ "uninitialized variable %qD in % function",
+ decl);
  cp_function_chain->invalid_constexpr = true;
}
 
diff --git a/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda16.C 
b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda16.C
new file mode 100644
index 000..ad5d885
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/constexpr-lambda16.C
@@ -0,0 +1,16 @@
+// PR c++/80642
+// { dg-do compile { target c++14 } }
+
+int main()
+{
+  [](auto i)
+{
+  if (i)
+{
+ int j;
+ static int k;
+ return i + j;
+}
+  return i;
+}(0);
+}


Re: [PATCH], Fix PR 81959 (power9 IEEE 128-bit float convert from 32-bit memory)

2017-08-29 Thread Michael Meissner
On Tue, Aug 29, 2017 at 06:14:37AM -0500, Segher Boessenkool wrote:
> Hi Mike,
> 
> On Mon, Aug 28, 2017 at 02:50:02PM -0400, Michael Meissner wrote:
> > When I added the optimization for loading 32-bit values directly into the
> > vector registers from memory to convert to IEEE 128-bit floating point, I
> > forgot to make sure the address did not have PRE_INCREMENT, etc. addressing.
> 
> > * config/rs6000/rs6000.md (float_si2_hw): If register
> > allocation hasn't been done, make sure the memory address is
> > X-FORM (register+register).
> > (floatuns_si2_hw2): Likewise.
> 
> Why is it okay after RA but not before?

Because the Z constraint forces the RA to fix the address.  Before RA, the
address is:

(MEM:SI (PRE_INC:DI (REG:DI ...)))

When the insn is split before RA, the split insn is no longer valid.  I could
either have made the split test to be after RA (which would have moved the
increment outside of the address) or generated a new pseudo to fix up the
address.

> > --- gcc/config/rs6000/rs6000.md (revision 251358)
> > +++ gcc/config/rs6000/rs6000.md (working copy)
> > @@ -14505,6 +14505,9 @@ (define_insn_and_split "float_si2_
> >  {
> >if (GET_CODE (operands[2]) == SCRATCH)
> >  operands[2] = gen_reg_rtx (DImode);
> > +
> > +  if (MEM_P (operands[1]) && !reload_completed)
> > +operands[1] = rs6000_address_for_fpconvert (operands[1]);
> >  })
> 
> It will need a comment here, then (other callers of
> rs6000_address_for_fpconvert do not test for !reload_completed).

rs6000_address_for_fpconvert cannot be called after RA because it allocates a
pseudo in some cases.  In theory the other cases where it is used is either in
the define_expands or in define_splits before RA.  I can certainly add a
comment.  You don't necessary need a gcc_assert in rs6000_address_for_fpconvert
since creating the pseudo will cause a fatal.

> Or maybe the predicate should be stricter in all these cases?
> nonimmediate_operand allows a lot ;-)

It is one of those balancing acts, if you make it too strict early on, you
either don't get the optimization because it doesn't match during expand or
combine, or you potentially get insn not found messages.

> > --- gcc/testsuite/gcc.target/powerpc/pr81959.c  (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/pr81959.c  (revision 0)
> > @@ -0,0 +1,25 @@
> > +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-options "-mpower9-vector -O2 -mfloat128" } */
> 
> powerpc*-*-*, or does that not work?

As we were discussing on private IRC yesterday, in order to enable the ISA 3.0
IEEE 128-bit floating point instructions you need support for TImode, and
TImode is not supported on 32-bit.  Since the code only fails when converting
SImode to KFmode, you need the && lp64.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



C++ PATCH for c++/80767, unnecessary instantiation of generic lambda

2017-08-29 Thread Jason Merrill
In this testcase, when we try to call the object of 'overloader' type,
we consider the conversion function from the first lambda to void
(*)(a) and build up a surrogate call function for it.  We consider how
to convert from overloader to that type, which means also looking at
the template conversion function from the second lambda, which
involves instantiating the operator() to determine the return type,
which is ill-formed.

But the standard seems to say that we shouldn't bother to consider how
to do that conversion unless we actually choose the surrogate call
function for that type, so that's what this patch implemenets.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit b987db09b63a8fc89f0d6cb2712c1ba8ee47eefd
Author: Jason Merrill 
Date:   Mon Aug 28 17:38:59 2017 -0400

PR c++/80767 - unnecessary instantiation of generic lambda

* call.c (convert_like_real): Call build_user_type_conversion_1 if
cand is null.
(add_conv_candidate): Build a ck_user conversion with no candidate.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 70c2f86..c446057 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -2278,8 +2278,10 @@ add_conv_candidate (struct z_candidate **candidates, 
tree fn, tree obj,
 
   if (i == 0)
{
- t = implicit_conversion (totype, argtype, arg, /*c_cast_p=*/false,
-  flags, complain);
+ t = build_identity_conv (argtype, NULL_TREE);
+ t = build_conv (ck_user, totype, t);
+ /* Leave the 'cand' field null; we'll figure out the conversion in
+convert_like_real if this candidate is chosen.  */
  convert_type = totype;
}
   else if (parmnode == void_list_node)
@@ -6692,6 +6694,13 @@ convert_like_real (conversion *convs, tree expr, tree 
fn, int argnum,
 case ck_user:
   {
struct z_candidate *cand = convs->cand;
+
+   if (cand == NULL)
+ /* We chose the surrogate function from add_conv_candidate, now we
+actually need to build the conversion.  */
+ cand = build_user_type_conversion_1 (totype, expr,
+  LOOKUP_NO_CONVERSION, complain);
+
tree convfn = cand->fn;
 
/* When converting from an init list we consider explicit
diff --git a/gcc/testsuite/g++.dg/cpp1z/lambda-inherit1.C 
b/gcc/testsuite/g++.dg/cpp1z/lambda-inherit1.C
new file mode 100644
index 000..75ef586
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/lambda-inherit1.C
@@ -0,0 +1,23 @@
+// PR c++/80767
+// { dg-options -std=c++17 }
+
+template  
+struct overloader : Fs...
+{
+overloader(Fs... fs) 
+: Fs(fs)...
+{ } 
+
+using Fs::operator()...;
+};
+
+struct a { void foo() { } };
+struct b { void bar() { } };
+struct c { void bar() { } };
+
+int main() {
+overloader{
+[](a x) { x.foo(); },
+[](auto x) { x.bar(); }
+}(a{});
+}


Re: [i386] Document a dozen of IA-32 builtins

2017-08-29 Thread Eric Botcazou
> Builtins that are used through ia32intrin.h are considered an
> "implementation detail", and not a stable interface that needs to be
> documented. Intrinsic headers should be used instead, and users should
> be discouraged to use builtins directly.

That works for C/C++ but not for other languages though, but OK I guess.

-- 
Eric Botcazou


libgo patch committed: Fix lfstack for AIX

2017-08-29 Thread Ian Lance Taylor
This patch by Tony Reix fixes the runtime package's lock-free stack
code to work on 64-bit AIX.  Bootstrapped on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 251188)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-2c4a2bd826e58c8c8c51b9196c8d2c67abc4037e
+db8e3801bf8508656606d6e465c76cdc6e9a9eb7
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/lfstack_64bit.go
===
--- libgo/go/runtime/lfstack_64bit.go   (revision 250873)
+++ libgo/go/runtime/lfstack_64bit.go   (working copy)
@@ -43,6 +43,14 @@ const (
// 52 address bits each (with 64k page size).
ia64AddrBits = 55
ia64CntBits  = 64 - ia64AddrBits + 3
+
+   // On AIX, 64-bit addresses are split into 36-bit segment number and 
28-bit
+   // offset in segment.  Segment numbers in the range 
0x07000-0x07FFF
+   // and 0x0A000-0x0AFFF(LSA) are available for mmap.
+   // We assume all lfnode addresses are from memory allocated with mmap.
+   // We use one bit to distinguish between the two ranges.
+   aixAddrBits = 57
+   aixCntBits  = 64 - aixAddrBits + 3
 )
 
 func lfstackPack(node *lfnode, cnt uintptr) uint64 {
@@ -54,6 +62,9 @@ func lfstackPack(node *lfnode, cnt uintp
val := uint64(uintptr(unsafe.Pointer(node)))
return (val<<(64-ia64AddrBits))&(1<<(64-3)-1) | 
val&^(1<<(64-3)-1) | uint64(cnt&(1<>ia64CntBits<<3)&(1<<(64-3)-1) | 
val&^(1<<(64-3)-1
}
+   if GOARCH == "ppc64" && GOOS == "aix" {
+   if val&(1<<63) != 0 {
+   return (*lfnode)(unsafe.Pointer(uintptr((val >> 
aixCntBits << 3) | 0x7<<56)))
+   } else {
+   return (*lfnode)(unsafe.Pointer(uintptr((val >> 
aixCntBits << 3) | 0xa<<56)))
+   }
+   }
return (*lfnode)(unsafe.Pointer(uintptr(val >> cntBits << 3)))
 }


C++ PATCH to lambda in default argument of inherited constructor

2017-08-29 Thread Jason Merrill
When inheriting a template constructor, we've been leaving the
template in DECL_INHERITED_CTOR rather than the appropriate
specialization; this leads to a crash when we try to mangle the
context for a lambda in a default argument for such a function.

This patch fixes this by remembering which specialization we found in
synthesized_method_walk and setting DECL_INHERITED_CTOR to that.

While I was looking at this, I noticed that the class template
argument deduction code's use of DECL_ABSTRACT_ORIGIN was misleading;
it refers to the constructor from which a deduction guide was
generated, not an abstract version of the same function.  I've moved
that information to DECL_ABSTRACT_ORIGIN on the template instead,
which shouldn't confuse anything because we never look at the
DECL_ABSTRACT_ORIGIN of other templates.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 77d09e0d881bf8803fce194fcab1dcb2817adc88
Author: Jason Merrill 
Date:   Wed Aug 16 23:53:05 2017 -0700

Fix lambdas in template default argument of inherited ctor.

* method.c (synthesized_method_base_walk): Replace an inherited
template with its specialization.
(synthesized_method_walk): Make inheriting_ctor a pointer.
(maybe_explain_implicit_delete, explain_implicit_non_constexpr)
(deduce_inheriting_ctor, implicitly_declare_fn): Adjust.

diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index 809ebc8..012d02a 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -1458,7 +1458,7 @@ static tree
 synthesized_method_base_walk (tree binfo, tree base_binfo, 
  int quals, bool copy_arg_p,
  bool move_p, bool ctor_p,
- tree inheriting_ctor, tree inherited_parms,
+ tree *inheriting_ctor, tree inherited_parms,
  tree fnname, int flags, bool diag,
  tree *spec_p, bool *trivial_p,
  bool *deleted_p, bool *constexpr_p)
@@ -1469,8 +1469,9 @@ synthesized_method_base_walk (tree binfo, tree base_binfo,
 
   if (copy_arg_p)
 argtype = build_stub_type (BINFO_TYPE (base_binfo), quals, move_p);
-  else if ((inherited_binfo
-   = binfo_inherited_from (binfo, base_binfo, inheriting_ctor)))
+  else if (inheriting_ctor
+  && (inherited_binfo
+  = binfo_inherited_from (binfo, base_binfo, *inheriting_ctor)))
 {
   argtype = inherited_parms;
   /* Don't check access on the inherited constructor.  */
@@ -1492,6 +1493,12 @@ synthesized_method_base_walk (tree binfo, tree 
base_binfo,
   if (defer != dk_no_deferred)
 pop_deferring_access_checks ();
 
+  /* Replace an inherited template with the appropriate specialization.  */
+  if (inherited_binfo && rval
+  && DECL_P (*inheriting_ctor) && DECL_P (rval)
+  && DECL_CONTEXT (*inheriting_ctor) == DECL_CONTEXT (rval))
+*inheriting_ctor = DECL_CLONED_FUNCTION (rval);
+
   process_subob_fn (rval, spec_p, trivial_p, deleted_p,
constexpr_p, diag, BINFO_TYPE (base_binfo));
   if (ctor_p &&
@@ -1526,7 +1533,7 @@ static void
 synthesized_method_walk (tree ctype, special_function_kind sfk, bool const_p,
 tree *spec_p, bool *trivial_p, bool *deleted_p,
 bool *constexpr_p, bool diag,
-tree inheriting_ctor, tree inherited_parms)
+tree *inheriting_ctor, tree inherited_parms)
 {
   tree binfo, base_binfo, fnname;
   int i;
@@ -1581,7 +1588,7 @@ synthesized_method_walk (tree ctype, 
special_function_kind sfk, bool const_p,
 }
 
   gcc_assert ((sfk == sfk_inheriting_constructor)
- == (inheriting_ctor != NULL_TREE));
+ == (inheriting_ctor && *inheriting_ctor != NULL_TREE));
 
   /* If that user-written default constructor would satisfy the
  requirements of a constexpr constructor (7.1.5), the
@@ -1656,7 +1663,7 @@ synthesized_method_walk (tree ctype, 
special_function_kind sfk, bool const_p,
   tree scope = push_scope (ctype);
 
   int flags = LOOKUP_NORMAL | LOOKUP_SPECULATIVE;
-  if (!inheriting_ctor)
+  if (sfk != sfk_inheriting_constructor)
 flags |= LOOKUP_DEFAULTED;
 
   tsubst_flags_t complain = diag ? tf_warning_or_error : tf_none;
@@ -1770,9 +1777,9 @@ get_defaulted_eh_spec (tree decl, tsubst_flags_t complain)
   bool const_p = CP_TYPE_CONST_P (non_reference (parm_type));
   tree spec = empty_except_spec;
   bool diag = !DECL_DELETED_FN (decl) && (complain & tf_error);
+  tree inh = DECL_INHERITED_CTOR (decl);
   synthesized_method_walk (ctype, sfk, const_p, , NULL, NULL,
-  NULL, diag, DECL_INHERITED_CTOR (decl),
-  parms);
+  NULL, diag, , parms);
   return spec;
 }
 
@@ -1847,10 +1854,11 @@ maybe_explain_implicit_delete (tree decl)
  tree raises = NULL_TREE;
  

C++ PATCH to allow copying of local_specializations

2017-08-29 Thread Jason Merrill
When instantiating a lambda-expression in the new model, we want to
retain the local_specializations from the enclosing function but still
be able to remember specializations local to the lambda and discard
them afterwards.  With this patch a local_specialization_stack object
can start with a copy of the enclosing map.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit aa72ed613b20715d171287c643f611b7c30b3f12
Author: Jason Merrill 
Date:   Tue Jul 11 17:32:21 2017 -0400

Support copying local_specializations.

* cp-tree.h (enum lss_policy): New.
(local_specialization_stack): Add policy parameter to default ctor.
* pt.c (local_specialization_stack): Copy local_specializations if
lss_copy.

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index f0eafb3..a58e7bd 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5117,9 +5117,10 @@ enum unification_kind_t {
 // An RAII class used to create a new pointer map for local
 // specializations. When the stack goes out of scope, the
 // previous pointer map is restored.
+enum lss_policy { lss_blank, lss_copy };
 struct local_specialization_stack
 {
-  local_specialization_stack ();
+  local_specialization_stack (lss_policy = lss_blank);
   ~local_specialization_stack ();
 
   hash_map *saved;
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 847cd68..416d17b 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -77,10 +77,13 @@ static tree cur_stmt_expr;
 //
 // Implementation of the RAII helper for creating new local
 // specializations.
-local_specialization_stack::local_specialization_stack ()
+local_specialization_stack::local_specialization_stack (lss_policy policy)
   : saved (local_specializations)
 {
-  local_specializations = new hash_map;
+  if (policy == lss_blank || !saved)
+local_specializations = new hash_map;
+  else
+local_specializations = new hash_map(*saved);
 }
 
 local_specialization_stack::~local_specialization_stack ()


C++ PATCH to add immediate version of potential_constant_expression

2017-08-29 Thread Jason Merrill
potential_constant_expression returns whether or not an expression
could be a constant expression after constexpr substitution of
parameters, but some of the places that we have been using this
predicate aren't subject to constexpr substitution, so it would be
good to have a version that doesn't consider parameters potentially
constant.

This patch adds such a variant, and calls it is_constant_expression.
That isn't quite accurate, since actual evaluation might fail to
produce a constant value, in which case the expression isn't actually
a constant expression, but I haven't come up with a better name.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit e8401fb168ffdbc03123fd449807c821e0ee4331
Author: Jason Merrill 
Date:   Fri Jul 7 18:23:27 2017 -0400

Add immediate potential_constant_expression variants.

* constexpr.c (potential_constant_expression_1): Add "now" parm.
(is_constant_expression, require_constant_expression): New.
(is_static_init_expression, is_nondependent_constant_expression)
(is_nondependent_static_init_expression): Drop "potential".
* except.c (build_must_not_throw_expr): Do type conversion on
value-dependent argument.
* pt.c, semantics.c, typeck2.c: Use variants without "potential".

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 29ba2c3..8cfc5a4 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1181,7 +1181,7 @@ cxx_eval_builtin_function_call (const constexpr_ctx *ctx, 
tree t, tree fun,
   return t;
 }
 
-  if (!potential_constant_expression (new_call))
+  if (!is_constant_expression (new_call))
 {
   if (!*non_constant_p && !ctx->quiet)
error ("%q+E is not a constant expression", new_call);
@@ -4861,7 +4861,7 @@ maybe_constant_value (tree t, tree decl)
 {
   tree r;
 
-  if (!potential_nondependent_constant_expression (t))
+  if (!is_nondependent_constant_expression (t))
 {
   if (TREE_OVERFLOW_P (t))
{
@@ -4929,7 +4929,7 @@ fold_non_dependent_expr (tree t)
  as two declarations of the same function, for example.  */
   if (processing_template_decl)
 {
-  if (potential_nondependent_constant_expression (t))
+  if (is_nondependent_constant_expression (t))
{
  processing_template_decl_sentinel s;
  t = instantiate_non_dependent_expr_internal (t, tf_none);
@@ -4982,7 +4982,7 @@ maybe_constant_init (tree t, tree decl)
 t = TREE_OPERAND (t, 1);
   if (TREE_CODE (t) == TARGET_EXPR)
 t = TARGET_EXPR_INITIAL (t);
-  if (!potential_nondependent_static_init_expression (t))
+  if (!is_nondependent_static_init_expression (t))
 /* Don't try to evaluate it.  */;
   else if (CONSTANT_CLASS_P (t))
 /* No evaluation needed.  */;
@@ -5025,7 +5025,9 @@ check_automatic_or_tls (tree ref)
 
 /* Return true if T denotes a potentially constant expression.  Issue
diagnostic as appropriate under control of FLAGS.  If WANT_RVAL is true,
-   an lvalue-rvalue conversion is implied.
+   an lvalue-rvalue conversion is implied.  If NOW is true, we want to
+   consider the expression in the current context, independent of constexpr
+   substitution.
 
C++0x [expr.const] used to say
 
@@ -5041,10 +5043,12 @@ check_automatic_or_tls (tree ref)
   not evaluated are not considered.   */
 
 static bool
-potential_constant_expression_1 (tree t, bool want_rval, bool strict,
+potential_constant_expression_1 (tree t, bool want_rval, bool strict, bool now,
 tsubst_flags_t flags)
 {
-#define RECUR(T,RV) potential_constant_expression_1 ((T), (RV), strict, flags)
+#define RECUR(T,RV) \
+  potential_constant_expression_1 ((T), (RV), strict, now, flags)
+
   enum { any = false, rval = true };
   int i;
   tree tmp;
@@ -5087,7 +5091,6 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict,
 case USERDEF_LITERAL:
   /* We can see a FIELD_DECL in a pointer-to-member expression.  */
 case FIELD_DECL:
-case PARM_DECL:
 case RESULT_DECL:
 case USING_DECL:
 case USING_STMT:
@@ -5098,6 +5101,15 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict,
 case STATIC_ASSERT:
   return true;
 
+case PARM_DECL:
+  if (now)
+   {
+ if (flags & tf_error)
+   error ("%qE is not a constant expression", t);
+ return false;
+   }
+  return true;
+
 case AGGR_INIT_EXPR:
 case CALL_EXPR:
   /* -- an invocation of a function other than a constexpr function
@@ -5173,7 +5185,11 @@ potential_constant_expression_1 (tree t, bool want_rval, 
bool strict,
tree x = get_nth_callarg (t, 0);
if (is_this_parameter (x))
  return true;
-   else if (!RECUR (x, rval))
+   /* Don't require an immediately constant value, as
+  constexpr substitution might 

C++ PATCH to instantiate default args and default member inits once

2017-08-29 Thread Jason Merrill
G++ has traditionally instantiated default arguments and default
member initializers wherever they are needed, but the standard
describes default arguments as separate definitions, and the old way
is problematic for the new lambda instantiation model.

This patch charts a middle course: now we will remember instantiations
of default arguments/DMI, but if the instantiation fails in SFINAE
context we can still fail substitution rather than give a hard error.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 68acac95b51e692c7ccd267aeebbc93e43e219d6
Author: Jason Merrill 
Date:   Tue Aug 15 14:14:44 2017 -0700

Instantiate default arguments/member initializers once.

* init.c (get_nsdmi): Remember NSDMI instantiations.
* parser.c (inject_this_parameter): Be more picky about
current_class_ptr.
* pt.c (tsubst_copy): Simplify 'this' handling.
(tsubst_default_argument): Remember default argument
instantiations.  Take parameter number.
(tsubst_default_arguments): Pass it.
* call.c (convert_default_arg): Likewise.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 6405be2..cfedd30 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -7282,7 +7282,7 @@ convert_default_arg (tree type, tree arg, tree fn, int 
parmnum,
   push_defarg_context (fn);
 
   if (fn && DECL_TEMPLATE_INFO (fn))
-arg = tsubst_default_argument (fn, type, arg, complain);
+arg = tsubst_default_argument (fn, parmnum, type, arg, complain);
 
   /* Due to:
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index f2e54a8..f0eafb3 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -6465,7 +6465,7 @@ extern tree maybe_process_partial_specialization (tree);
 extern tree most_specialized_instantiation (tree);
 extern void print_candidates   (tree);
 extern void instantiate_pending_templates  (int);
-extern tree tsubst_default_argument(tree, tree, tree,
+extern tree tsubst_default_argument(tree, int, tree, tree,
 tsubst_flags_t);
 extern tree tsubst (tree, tree, tsubst_flags_t, tree);
 extern tree tsubst_copy_and_build  (tree, tree, tsubst_flags_t,
diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 83e685c..56a5df8 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -535,6 +535,8 @@ perform_target_ctor (tree init)
 
 /* Return the non-static data initializer for FIELD_DECL MEMBER.  */
 
+static GTY(()) hash_map *nsdmi_inst;
+
 tree
 get_nsdmi (tree member, bool in_ctor, tsubst_flags_t complain)
 {
@@ -542,31 +544,36 @@ get_nsdmi (tree member, bool in_ctor, tsubst_flags_t 
complain)
   tree save_ccp = current_class_ptr;
   tree save_ccr = current_class_ref;
   
-  if (!in_ctor)
-{
-  /* Use a PLACEHOLDER_EXPR when we don't have a 'this' parameter to
-refer to; constexpr evaluation knows what to do with it.  */
-  current_class_ref = build0 (PLACEHOLDER_EXPR, DECL_CONTEXT (member));
-  current_class_ptr = build_address (current_class_ref);
-}
-
   if (DECL_LANG_SPECIFIC (member) && DECL_TEMPLATE_INFO (member))
 {
   init = DECL_INITIAL (DECL_TI_TEMPLATE (member));
+  location_t expr_loc
+   = EXPR_LOC_OR_LOC (init, DECL_SOURCE_LOCATION (member));
+  tree *slot;
   if (TREE_CODE (init) == DEFAULT_ARG)
/* Unparsed.  */;
+  else if (nsdmi_inst && (slot = nsdmi_inst->get (member)))
+   init = *slot;
   /* Check recursive instantiation.  */
   else if (DECL_INSTANTIATING_NSDMI_P (member))
{
  if (complain & tf_error)
-   error ("recursive instantiation of default member "
-  "initializer for %qD", member);
+   error_at (expr_loc, "recursive instantiation of default member "
+ "initializer for %qD", member);
  init = error_mark_node;
}
   else
{
+ int un = cp_unevaluated_operand;
+ cp_unevaluated_operand = 0;
+
+ location_t sloc = input_location;
+ input_location = expr_loc;
+
  DECL_INSTANTIATING_NSDMI_P (member) = 1;
 
+ inject_this_parameter (DECL_CONTEXT (member), TYPE_UNQUALIFIED);
+
  /* Do deferred instantiation of the NSDMI.  */
  init = (tsubst_copy_and_build
  (init, DECL_TI_ARGS (member),
@@ -575,6 +582,16 @@ get_nsdmi (tree member, bool in_ctor, tsubst_flags_t 
complain)
  init = digest_nsdmi_init (member, init, complain);
  
  DECL_INSTANTIATING_NSDMI_P (member) = 0;
+
+ if (init != error_mark_node)
+   {
+ if (!nsdmi_inst)
+   nsdmi_inst = hash_map::create_ggc (37);
+ nsdmi_inst->put (member, init);
+   }
+
+ input_location = sloc;
+ cp_unevaluated_operand = un;
}
 }
   else
@@ -592,6 +609,19 @@ get_nsdmi (tree member, bool 

C++ PATCH to default argument conversion and SFINAE

2017-08-29 Thread Jason Merrill
A Core working group discussion noted that G++ was getting
__is_constructible wrong in this case, where the conversion from
nullptr to the parameter type fails, but we weren't noticing that and
so happily built up a CALL_EXPR with an ERROR_MARK argument.

When that is fixed, we can change the parser to remember an erroneous
default argument to improve error recovery.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 6d165eb63bbd0cd4f312b89f2711f15d48a4411a
Author: Jason Merrill 
Date:   Tue Aug 15 15:29:23 2017 -0700

Fix default argument conversion failure and SFINAE.

* call.c (build_over_call): Check convert_default_arg result for
error_mark_node.
* parser.c (cp_parser_late_parsing_default_args): Remember
error_mark_node.

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 067db59a..6405be2 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -7892,10 +7892,13 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
 {
   if (TREE_VALUE (parm) == error_mark_node)
return error_mark_node;
-  argarray[j++] = convert_default_arg (TREE_VALUE (parm),
-  TREE_PURPOSE (parm),
-  fn, i - is_method,
-  complain);
+  val = convert_default_arg (TREE_VALUE (parm),
+TREE_PURPOSE (parm),
+fn, i - is_method,
+complain);
+  if (val == error_mark_node)
+return error_mark_node;
+  argarray[j++] = val;
 }
 
   /* Ellipsis */
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index b849824..9b7c2c0 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -27645,11 +27645,6 @@ cp_parser_late_parsing_default_args (cp_parser 
*parser, tree fn)
= cp_parser_late_parse_one_default_arg (parser, parmdecl,
default_arg,
TREE_VALUE (parm));
-  if (parsed_arg == error_mark_node)
-   {
- continue;
-   }
-
   TREE_PURPOSE (parm) = parsed_arg;
 
   /* Update any instantiations we've already created.  */
diff --git a/gcc/testsuite/g++.dg/ext/is_constructible1.C 
b/gcc/testsuite/g++.dg/ext/is_constructible1.C
new file mode 100644
index 000..b80ac28
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_constructible1.C
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+template struct Foo { Foo(T = nullptr) {} };
+
+static_assert (!__is_constructible(Foo));
diff --git a/gcc/testsuite/g++.dg/other/new1.C 
b/gcc/testsuite/g++.dg/other/new1.C
index 7138370..30b6513 100644
--- a/gcc/testsuite/g++.dg/other/new1.C
+++ b/gcc/testsuite/g++.dg/other/new1.C
@@ -10,5 +10,5 @@ struct A
 
 void foo()
 {
-  new A;   // { dg-error "default argument" }
+  new A;
 }
diff --git a/gcc/testsuite/g++.dg/parse/crash40.C 
b/gcc/testsuite/g++.dg/parse/crash40.C
index df352dd..537cdb7 100644
--- a/gcc/testsuite/g++.dg/parse/crash40.C
+++ b/gcc/testsuite/g++.dg/parse/crash40.C
@@ -37,6 +37,6 @@ void bar()
   int i;
   i.C::foo<0>(); /* { dg-error "which is of non-class type" } */
 
-  S s; /* { dg-error "default argument" } */
+  S s;
   SS ss;
 }
diff --git a/gcc/testsuite/g++.dg/parse/defarg12.C 
b/gcc/testsuite/g++.dg/parse/defarg12.C
index 2d2d7e7..df80581 100644
--- a/gcc/testsuite/g++.dg/parse/defarg12.C
+++ b/gcc/testsuite/g++.dg/parse/defarg12.C
@@ -9,5 +9,5 @@ struct A
 
 void foo()
 {
-  A().i; /* { dg-error "default argument" } */
+  A().i;
 }
diff --git a/gcc/testsuite/g++.dg/template/error15.C 
b/gcc/testsuite/g++.dg/template/error15.C
index 8693658..ad18a1b 100644
--- a/gcc/testsuite/g++.dg/template/error15.C
+++ b/gcc/testsuite/g++.dg/template/error15.C
@@ -18,7 +18,7 @@ protected:
 
 template 
 void B::g(void) {
-  f(); // { dg-error "default argument" }
+  f();
 }
 
 template class B;


Re: [i386] Document a dozen of IA-32 builtins

2017-08-29 Thread Uros Bizjak
Hello!

>> it seems that there are many undocumented IA-32 builtins.  This patch starts
>> small and documents the builtins corresponding to the basic instrinsics
>> which are declared in ia32intrin.h.
>
> Any comment on this?  Should I open a PR with a list of undocumented builtins?

Builtins that are used through ia32intrin.h are considered an
"implementation detail", and not a stable interface that needs to be
documented. Intrinsic headers should be used instead, and users should
be discouraged to use builtins directly.

Uros.


Re: [patch, fortran] Warn about suspicious assignment to contiguous pointers

2017-08-29 Thread Thomas Koenig

Hi Janus,


I think an unconditional warning is OK
in this case because

- Assigning to a pointer from an obvious non-contiguous target
is not useful at all, that I can see



I guess you're talking about a *contiguous* pointer here,


Correct.


and in that
case I would argue that, beyond being "not useful", it's even illegal,
so why not throw a hard error, if we can infer at compile-time that
the target is non-contiguous?


Problem is, we cannot infer this from the tests done.
We would also have to add a test if the array is empty
or that it contains only a single element, and that (I think)
is a) impossible in the general case, and b) not worth the bother.


I'm not sure I understand which cases you're worried about here. Maybe
you can give an example?



  real, dimension(5,5), target :: a
  real, dimension(:,:), pointer, contiguous :: ap
  ap => a(4::2,4::2)

points to a single element, which is (by definition) contiguous.


In any case, I think your test case is a bit short, so I extended it
somewhat (see attachment) and found two cases along the way where your
patch throws a warning but shouldn't:

r => x(::-1)


This one is _not_ contiguous; contiguos implies stride==1.


r => x2(2:3,1)


Correct, I will correct that one.


Apart from the two mishandled cases above, one other thing comes to my
mind: It might be a good idea to apply your checks not only to pointer
assignments, but also to dummy arguments (passing a non-contiguous
array to a contiguous dummy pointer), where the same rules should
apply.


This is true.

There are also a few other checks that I missed, for
example

  subroutine foo(a)
  real, dimension(:), target, intent(inout) :: a
  real, dimension(:), contiguous :: ap
  ap => a

I will also address this in a future version of the patch. This will
take a bit of time, though.

Regards

Thomas


[PATCH, testsuite]: Cleanup dg-options in gcc.target/i386/ a bit

2017-08-29 Thread Uros Bizjak
2017-08-29  Uros Bizjak  

* gcc.target/i386/20030926-1.c: Add dg-additional-options.
* gcc.target/i386/abi-2.c: Ditto.
* gcc.target/i386/interrupt-sibcall-2.c: Ditto.
* gcc.target/i386/pr22076.c: Ditto.
* gcc.target/i386/pr37216.c: Ditto.
* gcc.target/i386/pr39431.c: Ditto.
* gcc.target/i386/pr40906-1.c: Ditto.
* gcc.target/i386/pr40906-2.c: Ditto.
* gcc.target/i386/pr43766.c: Ditto.
* gcc.target/i386/pr46226.c: Ditto.
* gcc.target/i386/pr46470.c: Ditto.
* gcc.target/i386/pr59929.c: Ditto.
* gcc.target/i386/sse-10.c: Ditto.
* gcc.target/i386/vararg-1.c: Ditto.
* gcc.target/i386/vararg-1.c: Ditto.
* gcc.target/i386/asm-6.c: Compile for fpic target only.
* gcc.target/i386/pr44223.c: Ditto.
* gcc.target/i386/bitfield1.c (dg-options): Remove target selector.
* gcc.target/i386/bitfield2.c (dg-options): Ditto.
* gcc.target/i386/pr67480.c (dg-options): Ditto.
* gcc.target/i386/vect-cond-1.c (dg-options): Ditto.
* gcc.target/i386/bittest.c (scan-assembler-times): Ditto.
* gcc.target/i386/darwin-fpmath.c (dg-do): Simplify target selector.
* gcc.target/i386/mvc9.c: Compile for lto target only.
* gcc.target/i386/pr45234.c: Compile for ia32 target only.
* gcc.target/i386/pr49866.c: Compile for lp64 target only.
* gcc.target/i386/pr57091.c: Ditto.
* gcc.target/i386/pr61599-2.c: Ditto.

Tested on x86_64-linux-gnu {,-m32}.

Committed to mainline.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/20030926-1.c 
b/gcc/testsuite/gcc.target/i386/20030926-1.c
index ebde34085113..f4c8f618d4c3 100644
--- a/gcc/testsuite/gcc.target/i386/20030926-1.c
+++ b/gcc/testsuite/gcc.target/i386/20030926-1.c
@@ -1,7 +1,7 @@
 /* PR optimization/11741  */
 /* { dg-do compile } */
 /* { dg-options "-O2 -minline-all-stringops" } */
-/* { dg-options "-O2 -minline-all-stringops -march=pentium4" { target ia32 } } 
*/
+/* { dg-additional-options "-march=pentium4" { target ia32 } } */
 
 extern void *memcpy (void *, const void *, __SIZE_TYPE__);
 extern __SIZE_TYPE__ strlen (const char *);
diff --git a/gcc/testsuite/gcc.target/i386/abi-2.c 
b/gcc/testsuite/gcc.target/i386/abi-2.c
index 39eafc250396..72a17eca31fc 100644
--- a/gcc/testsuite/gcc.target/i386/abi-2.c
+++ b/gcc/testsuite/gcc.target/i386/abi-2.c
@@ -1,7 +1,7 @@
 /* Make certain that we pass __m256i in the correct register for AVX.  */
 /* { dg-do compile } */
 /* { dg-options "-O1 -mavx" } */
-/* { dg-options "-mabi=sysv -O1 -mavx" { target x86_64-*-mingw* } } */
+/* { dg-additional-options "-mabi=sysv" { target x86_64-*-mingw* } } */
 
 typedef long long __m256i __attribute__ ((__vector_size__ (32)));
 __m256i foo (void) { return (__m256i){ 1, 2, 3, 4 }; }
diff --git a/gcc/testsuite/gcc.target/i386/asm-6.c 
b/gcc/testsuite/gcc.target/i386/asm-6.c
index 6aa37ef4276a..225e3193f7ad 100644
--- a/gcc/testsuite/gcc.target/i386/asm-6.c
+++ b/gcc/testsuite/gcc.target/i386/asm-6.c
@@ -1,8 +1,8 @@
 /* PR rtl-optimization/44174 */
 /* Testcase by Jakub Jelinek  */
 
-/* { dg-do compile } */
-/* { dg-options "-O2 -fpic" { target fpic } } */
+/* { dg-do compile { target fpic } } */
+/* { dg-options "-O2 -fpic" } */
 
 int f0 (int, int, int, int, int);
 int f1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/bitfield1.c 
b/gcc/testsuite/gcc.target/i386/bitfield1.c
index 714792c3d026..ecc7efe64010 100644
--- a/gcc/testsuite/gcc.target/i386/bitfield1.c
+++ b/gcc/testsuite/gcc.target/i386/bitfield1.c
@@ -1,8 +1,7 @@
 // Test for bitfield alignment in structs on IA-32
 // { dg-do run }
 // { dg-require-effective-target ia32 }
-// { dg-options "-O2" }
-// { dg-options "-mno-align-double -mno-ms-bitfields" { target i?86-*-* 
x86_64-*-* } }
+// { dg-options "-O2 -mno-align-double -mno-ms-bitfields" }
 
 extern void abort (void);
 extern void exit (int);
diff --git a/gcc/testsuite/gcc.target/i386/bitfield2.c 
b/gcc/testsuite/gcc.target/i386/bitfield2.c
index 5784bf0ccb45..58f7cea1b603 100644
--- a/gcc/testsuite/gcc.target/i386/bitfield2.c
+++ b/gcc/testsuite/gcc.target/i386/bitfield2.c
@@ -1,8 +1,7 @@
 // Test for bitfield alignment in structs on IA-32
 // { dg-do run }
 // { dg-require-effective-target ia32 }
-// { dg-options "-O2" }
-// { dg-options "-mno-align-double -mno-ms-bitfields" { target i?86-*-* 
x86_64-*-* } }
+// { dg-options "-O2 -mno-align-double -mno-ms-bitfields" }
 
 extern void abort (void);
 extern void exit (int);
diff --git a/gcc/testsuite/gcc.target/i386/bittest.c 
b/gcc/testsuite/gcc.target/i386/bittest.c
index 7b7ce9eed105..79c389718591 100644
--- a/gcc/testsuite/gcc.target/i386/bittest.c
+++ b/gcc/testsuite/gcc.target/i386/bittest.c
@@ -19,4 +19,4 @@ gate_rtl_cprop (void)
memory and mask off bits are unnecessary.  In theory we can just count
the move-with-extension, and and testb instructions.  There should be
only one.  */
-/* { dg-final { scan-assembler-times "movzbl|and|testb" 1 { target { i?86-*-* 
x86_64-*-*} } } } 

[PATCH, i386]: Simplify setting of opts->x_flag_entry.

2017-08-29 Thread Uros Bizjak
Check opts_set->x_flag_fentry and remove check for non-existing
PROFILE_BEFORE_PROLOGUE definition.

2017-08-29  Uros Bizjak  

* config/i386/i386.opt (flag_fentry): Do not init to -1.
* config/i386/i386.c (ix86_option_override_internal): Simplify
setting of opts->x_flag_entry.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b5c113d95aad..509fd3a26d5b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -6545,27 +6545,18 @@ ix86_option_override_internal (bool main_args_p,
 opts->x_target_flags |= MASK_CLD & ~opts_set->x_target_flags;
 #endif
 
-  if (!TARGET_64BIT_P (opts->x_ix86_isa_flags) && opts->x_flag_pic)
+  /* Set the default value for -mfentry.  */
+  if (!opts_set->x_flag_fentry)
+opts->x_flag_fentry = TARGET_SEH;
+  else
 {
-  if (opts->x_flag_fentry > 0)
-sorry ("-mfentry isn%'t supported for 32-bit in combination "
+  if (!TARGET_64BIT_P (opts->x_ix86_isa_flags) && opts->x_flag_pic
+ && opts->x_flag_fentry)
+   sorry ("-mfentry isn%'t supported for 32-bit in combination "
   "with -fpic");
-  opts->x_flag_fentry = 0;
-}
-  else if (TARGET_SEH)
-{
-  if (opts->x_flag_fentry == 0)
+  else if (TARGET_SEH && !opts->x_flag_fentry)
sorry ("-mno-fentry isn%'t compatible with SEH");
-  opts->x_flag_fentry = 1;
 }
-  else if (opts->x_flag_fentry < 0)
-   {
-#if defined(PROFILE_BEFORE_PROLOGUE)
- opts->x_flag_fentry = 1;
-#else
- opts->x_flag_fentry = 0;
-#endif
-   }
 
   if (TARGET_SEH && TARGET_CALL_MS2SYSV_XLOGUES)
 sorry ("-mcall-ms2sysv-xlogues isn%'t currently supported with SEH");
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 8bf6af21fd18..81bbc1e2170a 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -862,7 +862,7 @@ Target Report Mask(ISA_PREFETCHWT1) Var(ix86_isa_flags) Save
 Support PREFETCHWT1 built-in functions and code generation.
 
 mfentry
-Target Report Var(flag_fentry) Init(-1)
+Target Report Var(flag_fentry)
 Emit profiling counter call at function entry before prologue.
 
 mrecord-mcount


Re: [i386] Document a dozen of IA-32 builtins

2017-08-29 Thread Eric Botcazou
> it seems that there are many undocumented IA-32 builtins.  This patch starts
> small and documents the builtins corresponding to the basic instrinsics
> which are declared in ia32intrin.h.

Any comment on this?  Should I open a PR with a list of undocumented builtins?

-- 
Eric Botcazou


Re: [PATCH] ira-costs: avoid missing base registers in record_address_regs

2017-08-29 Thread Vladimir Makarov



On 08/28/2017 09:06 AM, Alexander Monakov wrote:

Hello,

The code in record_address_regs shown in the following patch assumes that
if a given target cannot have two registers in a memory address, then the
sole register, if present, must be the leftmost operand in the PLUS chain.

I think this is not true if the target uses unspecs to signify special
addressing modes such as TLS.  In that case the unspec can be to the left
of the register, and this function won't see the register.

The proposed fix is to always recurse into non-constant operands like in the
adjacent case of index registers being the same as base registers. OK to apply?

Yes.  Thank you for the patch, Alexander.

As I remember this code was adopted from the old regclass.



Re: [PATCH] [MSP430] [PR80993] Prevent lto removing interrupt handlers

2017-08-29 Thread DJ Delorie

Richard Biener  writes:
> Humm... don't you have to register interrupt handlers somehow?

MSP430 uses an "if they're present, they're registered" approach, so
it's driven by the user tagging functions as interrupts - the linker
notices that they're present and links them into the interrupt table for
you.  So, adding "used" is consistent with this method, as the interrupt
attribute *is* the registration.



Re: Improve DOM's ability to derive equivalences when traversing edges

2017-08-29 Thread Christophe Lyon
On 29 August 2017 at 17:28, Jeff Law <l...@redhat.com> wrote:
> On 08/29/2017 03:13 AM, Christophe Lyon wrote:
>> Hi Jeff,
> [ ... ]
>>>
>>> commit a370df2c52074abbb044d1921a0c7df235758050
>>> Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
>>> Date:   Tue Aug 29 05:03:36 2017 +
>>>
>>> * tree-ssa-dom.c (edge_info::record_simple_equiv): Call
>>> derive_equivalences.
>>> (derive_equivalences_from_bit_ior, 
>>> record_temporary_equivalences):
>>> Code moved into
>>> (edge_info::derive_equivalences): New private member function
>>>
>>> * gcc.dg/torture/pr57214.c: Fix type of loop counter.
>>> * gcc.dg/tree-ssa/ssa-sink-16.c: Disable DOM.
>>> * gcc.dg/tree-ssa/ssa-dom-thread-11.c: New test.
>>> * gcc.dg/tree-ssa/ssa-dom-thread-12.c: New test.
>>> * gcc.dg/tree-ssa/ssa-dom-thread-13.c: New test.
>>> * gcc.dg/tree-ssa/ssa-dom-thread-14.c: New test.
>>> * gcc.dg/tree-ssa/ssa-dom-thread-15.c: New test.
>>> * gcc.dg/tree-ssa/ssa-dom-thread-16.c: New test.
>>> * gcc.dg/tree-ssa/ssa-dom-thread-17.c: New test.
>>>
>>> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251397 
>>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>>
>>
>> 3 of the new tests fail on arm-none-linux-gnueabihf
>> --with-cpu=cortex-a15 --with-fpu=vfpv3-d16-fp16
>>
>> FAIL:gcc.dg/tree-ssa/ssa-dom-thread-11.c scan-tree-dump-times dom2
>> "Threaded" 1
>> FAIL:gcc.dg/tree-ssa/ssa-dom-thread-14.c scan-tree-dump-times dom2
>> "Threaded" 1
>> FAIL:gcc.dg/tree-ssa/ssa-dom-thread-16.c scan-tree-dump-times dom2
>> "Threaded" 1
>>
>> they do pass when configuring for cpu cortex-a9/a15 and fpu 
>> neon-fp16/neon-vfpv4
>>
>> I do not have the dumps since it's automated testing; let me know if
>> you need me to
>> reproduce it manually and extract the dumps.
> Strange.  I can't reproduce this.
>
> /home/law/gcc-testing/gcc2/configure --target=arm-none-linux-gnueabihf
> --with-cpu=cortex-a15 --with-fpu=vfpv3-d16-fp16
>

Sorry, it was a typo: I meant cortex-a5.


> [ Wait for build... ]
> make check-gcc RUNTESTFLAGS=tree-ssa.exp=ssa-dom-thread-11.c
>
> Gets me 2 passes.  If I run it manually and look at the dumps it
> produces exactly the code I would expect.
>
> Is something perhaps passing down a -mtune or other option?
>
> ./cc1 -quiet -v -iprefix
> /opt/notnfs/law/gcc-testing/arm/gcc/../lib/gcc/arm-none-linux-gnueabihf/8.0.0/
> -isystem ./include -isystem ./include-fixed j.c -quiet -dumpbase j.c
> -mcpu=cortex-a15 -mfpu=vfpv3-d16-fp16 -mtls-dialect=gnu -marm
> -march=armv7ve -auxbase j -O2 -version -o /tmp/cc6vzJl6.s
> -fdump-tree-all-blocks-details
> GNU C11 (GCC) version 8.0.0 20170829 (experimental)
> (arm-none-linux-gnueabihf)
> compiled by GNU C version 6.3.1 20161221 (Red Hat 6.3.1-1), GMP
> version 6.1.1, MPFR version 3.1.5, MPC version 1.0.2, isl version none
>
> I'm happy to dig further and make sure we get the selectors right, but I
> have to be able to reproduce the problem first :-)
>
> jeff


Re: [06/77] Make GET_MODE_WIDER return an opt_mode

2017-08-29 Thread Jeff Law
On 08/29/2017 09:36 AM, Richard Sandiford wrote:
> Jeff Law  writes:
>> On 08/29/2017 09:01 AM, Richard Sandiford wrote:
>>> Jeff Law  writes:
>>>
>>> OK, here's a patch that uses require ().  I've updated the following
>>> patches in the obvious way.
>> Thanks.
>>
>>>
>>> This does make me want to reconsider another decision though.
>>> Using opt_mode for iterators leads to things like:
>>>
>>>   opt_scalar_int_mode wider_mode_iter;
>>>   FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
>>> {
>>>   scalar_int_mode wider_mode = wider_mode_iter.require ();
>>>   if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
>>> ...
>>>
>>> which isn't pretty.  It would be easy to support:
>> No ideal, but it's reasonably explicit in what it does, so I wouldn't
>> expect anyone to be surprised.
>>
>>
>>>
>>>   scalar_int_mode wider_mode;
>>>   FOR_EACH_WIDER_MODE (wider_mode, mode)
>>> if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
>>>   ...
>>>
>>> *but* this would mean that "wider_mode" is accessible but undefined
>>> after the loop (unlike the first loop, where wider_mode_iter is
>>> guaranteed to be empty if the loop runs to completion).  Is that OK?
>>> Or is it too suprising?
>> I think most folks would be surprised that they can't look at wider_mode
>> in a meaningful way after the loop.
>>
>>>
>>> Another alternative would be:
>>>
>>>   opt_scalar_int_mode wider_mode_iter;
>>>   scalar_int_mode wider_mode;
>>>   FOR_EACH_WIDER_MODE (wider_mode_iter, wider_mode, mode)
>>> if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
>>>   ...
>>>
>>> which gives both.  But perhaps this would be odd for plain machine_mode
>>> iterators, where there's no obvious distinction between the first and
>>> second arguments.
>> Well, this is like the first to me, except we've separated the iterator
>> from the mode.
>>
>> I slightly prefer the first and think the second is probably too
>> different from the traditional way we think about the state of loop
>> variables after the loop has terminated.
> 
> OK, that's easy then :-)  Is OK to apply the approved parts of the
> series with the "require" change then?  If the consensus ends up
> being that we should handle the iterators a different way, I'll
> volunteer to do a bulk update.
Yea, I think the approved parts are good to go for the trunk.  I think
that covers the whole series, except the aarch64 backend bits which I
left to the ARM folks.

> 
> Thanks again for all the reviews.
No problem.  Sorry it took so long.

jeff


Re: [06/77] Make GET_MODE_WIDER return an opt_mode

2017-08-29 Thread Richard Sandiford
Jeff Law  writes:
> On 08/29/2017 09:01 AM, Richard Sandiford wrote:
>> Jeff Law  writes:
>> 
>> OK, here's a patch that uses require ().  I've updated the following
>> patches in the obvious way.
> Thanks.
>
>> 
>> This does make me want to reconsider another decision though.
>> Using opt_mode for iterators leads to things like:
>> 
>>   opt_scalar_int_mode wider_mode_iter;
>>   FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
>> {
>>   scalar_int_mode wider_mode = wider_mode_iter.require ();
>>   if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
>> ...
>> 
>> which isn't pretty.  It would be easy to support:
> No ideal, but it's reasonably explicit in what it does, so I wouldn't
> expect anyone to be surprised.
>
>
>> 
>>   scalar_int_mode wider_mode;
>>   FOR_EACH_WIDER_MODE (wider_mode, mode)
>> if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
>>   ...
>> 
>> *but* this would mean that "wider_mode" is accessible but undefined
>> after the loop (unlike the first loop, where wider_mode_iter is
>> guaranteed to be empty if the loop runs to completion).  Is that OK?
>> Or is it too suprising?
> I think most folks would be surprised that they can't look at wider_mode
> in a meaningful way after the loop.
>
>> 
>> Another alternative would be:
>> 
>>   opt_scalar_int_mode wider_mode_iter;
>>   scalar_int_mode wider_mode;
>>   FOR_EACH_WIDER_MODE (wider_mode_iter, wider_mode, mode)
>> if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
>>   ...
>> 
>> which gives both.  But perhaps this would be odd for plain machine_mode
>> iterators, where there's no obvious distinction between the first and
>> second arguments.
> Well, this is like the first to me, except we've separated the iterator
> from the mode.
>
> I slightly prefer the first and think the second is probably too
> different from the traditional way we think about the state of loop
> variables after the loop has terminated.

OK, that's easy then :-)  Is OK to apply the approved parts of the
series with the "require" change then?  If the consensus ends up
being that we should handle the iterators a different way, I'll
volunteer to do a bulk update.

Thanks again for all the reviews.

Richard


Re: Improve DOM's ability to derive equivalences when traversing edges

2017-08-29 Thread Jeff Law
On 08/29/2017 03:13 AM, Christophe Lyon wrote:
> Hi Jeff,
[ ... ]
>>
>> commit a370df2c52074abbb044d1921a0c7df235758050
>> Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
>> Date:   Tue Aug 29 05:03:36 2017 +
>>
>> * tree-ssa-dom.c (edge_info::record_simple_equiv): Call
>> derive_equivalences.
>> (derive_equivalences_from_bit_ior, 
>> record_temporary_equivalences):
>> Code moved into
>> (edge_info::derive_equivalences): New private member function
>>
>> * gcc.dg/torture/pr57214.c: Fix type of loop counter.
>> * gcc.dg/tree-ssa/ssa-sink-16.c: Disable DOM.
>> * gcc.dg/tree-ssa/ssa-dom-thread-11.c: New test.
>> * gcc.dg/tree-ssa/ssa-dom-thread-12.c: New test.
>> * gcc.dg/tree-ssa/ssa-dom-thread-13.c: New test.
>> * gcc.dg/tree-ssa/ssa-dom-thread-14.c: New test.
>> * gcc.dg/tree-ssa/ssa-dom-thread-15.c: New test.
>> * gcc.dg/tree-ssa/ssa-dom-thread-16.c: New test.
>> * gcc.dg/tree-ssa/ssa-dom-thread-17.c: New test.
>>
>> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251397 
>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>
> 
> 3 of the new tests fail on arm-none-linux-gnueabihf
> --with-cpu=cortex-a15 --with-fpu=vfpv3-d16-fp16
> 
> FAIL:gcc.dg/tree-ssa/ssa-dom-thread-11.c scan-tree-dump-times dom2
> "Threaded" 1
> FAIL:gcc.dg/tree-ssa/ssa-dom-thread-14.c scan-tree-dump-times dom2
> "Threaded" 1
> FAIL:gcc.dg/tree-ssa/ssa-dom-thread-16.c scan-tree-dump-times dom2
> "Threaded" 1
> 
> they do pass when configuring for cpu cortex-a9/a15 and fpu 
> neon-fp16/neon-vfpv4
> 
> I do not have the dumps since it's automated testing; let me know if
> you need me to
> reproduce it manually and extract the dumps.
Strange.  I can't reproduce this.

/home/law/gcc-testing/gcc2/configure --target=arm-none-linux-gnueabihf
--with-cpu=cortex-a15 --with-fpu=vfpv3-d16-fp16

[ Wait for build... ]
make check-gcc RUNTESTFLAGS=tree-ssa.exp=ssa-dom-thread-11.c

Gets me 2 passes.  If I run it manually and look at the dumps it
produces exactly the code I would expect.

Is something perhaps passing down a -mtune or other option?

./cc1 -quiet -v -iprefix
/opt/notnfs/law/gcc-testing/arm/gcc/../lib/gcc/arm-none-linux-gnueabihf/8.0.0/
-isystem ./include -isystem ./include-fixed j.c -quiet -dumpbase j.c
-mcpu=cortex-a15 -mfpu=vfpv3-d16-fp16 -mtls-dialect=gnu -marm
-march=armv7ve -auxbase j -O2 -version -o /tmp/cc6vzJl6.s
-fdump-tree-all-blocks-details
GNU C11 (GCC) version 8.0.0 20170829 (experimental)
(arm-none-linux-gnueabihf)
compiled by GNU C version 6.3.1 20161221 (Red Hat 6.3.1-1), GMP
version 6.1.1, MPFR version 3.1.5, MPC version 1.0.2, isl version none

I'm happy to dig further and make sure we get the selectors right, but I
have to be able to reproduce the problem first :-)

jeff


Re: [06/77] Make GET_MODE_WIDER return an opt_mode

2017-08-29 Thread Jeff Law
On 08/29/2017 09:01 AM, Richard Sandiford wrote:
> Jeff Law  writes:
> 
> OK, here's a patch that uses require ().  I've updated the following
> patches in the obvious way.
Thanks.

> 
> This does make me want to reconsider another decision though.
> Using opt_mode for iterators leads to things like:
> 
>   opt_scalar_int_mode wider_mode_iter;
>   FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
> {
>   scalar_int_mode wider_mode = wider_mode_iter.require ();
>   if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
> ...
> 
> which isn't pretty.  It would be easy to support:
No ideal, but it's reasonably explicit in what it does, so I wouldn't
expect anyone to be surprised.


> 
>   scalar_int_mode wider_mode;
>   FOR_EACH_WIDER_MODE (wider_mode, mode)
> if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
>   ...
> 
> *but* this would mean that "wider_mode" is accessible but undefined
> after the loop (unlike the first loop, where wider_mode_iter is
> guaranteed to be empty if the loop runs to completion).  Is that OK?
> Or is it too suprising?
I think most folks would be surprised that they can't look at wider_mode
in a meaningful way after the loop.

> 
> Another alternative would be:
> 
>   opt_scalar_int_mode wider_mode_iter;
>   scalar_int_mode wider_mode;
>   FOR_EACH_WIDER_MODE (wider_mode_iter, wider_mode, mode)
> if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
>   ...
> 
> which gives both.  But perhaps this would be odd for plain machine_mode
> iterators, where there's no obvious distinction between the first and
> second arguments.
Well, this is like the first to me, except we've separated the iterator
from the mode.

I slightly prefer the first and think the second is probably too
different from the traditional way we think about the state of loop
variables after the loop has terminated.

Jeff


RE:[PATCH,AIX] Cleanup in libiberty for AIX.

2017-08-29 Thread REIX, Tony
Description:
 * This patch does some cleanup in libiberty for AIX.

Tests:
 * AIX: Build: SUCCESS
   - build made by means of gmake in trunk.
   - patch generated by: 
cd gcc-svn-trunk/
svn diff libiberty/ChangeLog libiberty/simple-object-xcoff.c

ChangeLog:
   +   * simple-object-xcoff.c (simple_object_xcoff_find_sections):
   +   Improve .go_export csect handling.  Don't make assumptions
   +   on containing section or number of auxiliary entries.

Cordialement,

Tony Reix

Bull - ATOS
IBM Coop Architect & Technical Leader

Office : +33 (0) 4 76 29 72 67
1 rue de Provence - 38432 Échirolles - France
www.atos.net
Index: ./libiberty/ChangeLog
===
--- ./libiberty/ChangeLog	(revision 251399)
+++ ./libiberty/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2017-08-29  Tony Reix  
+
+	* simple-object-xcoff.c (simple_object_xcoff_find_sections):
+	Improve .go_export csect handling.  Don't make assumptions
+	on containing section or number of auxiliary entries.
+
 2017-08-28  Richard Biener  
 
 	PR lto/81968
Index: libiberty/simple-object-xcoff.c
===
--- libiberty/simple-object-xcoff.c	(revision 251399)
+++ libiberty/simple-object-xcoff.c	(working copy)
@@ -255,11 +255,15 @@ union external_auxent
 #define IMAGE_SYM_TYPE \
   ((IMAGE_SYM_DTYPE_NULL << 4) | IMAGE_SYM_TYPE_NULL)
 
+#define C_EXT		(2)
 #define C_STAT		(3)
 #define C_FILE		(103)
+#define C_HIDEXT	(107)
 
-#define DBXMASK		0x80
+#define XTY_SD		(1)	/* section definition */
 
+#define XMC_XO		(7)	/* extended operation */
+
 /* Private data for an simple_object_read.  */
 
 struct simple_object_xcoff_read
@@ -400,6 +404,7 @@ simple_object_xcoff_find_sections (simple_object_r
   size_t scnhdr_size;
   unsigned char *scnbuf;
   const char *errmsg;
+  unsigned short (*fetch_16) (const unsigned char *);
   unsigned int (*fetch_32) (const unsigned char *);
   ulong_type (*fetch_64) (const unsigned char *);
   unsigned int nscns;
@@ -407,7 +412,6 @@ simple_object_xcoff_find_sections (simple_object_r
   size_t strtab_size;
   struct external_syment *symtab = NULL;
   unsigned int i;
-  off_t textptr = 0;
 
   scnhdr_size = u64 ? SCNHSZ64 : SCNHSZ32;
   scnbuf = XNEWVEC (unsigned char, scnhdr_size * ocr->nscns);
@@ -420,6 +424,7 @@ simple_object_xcoff_find_sections (simple_object_r
   return errmsg;
 }
 
+  fetch_16 = simple_object_fetch_big_16;
   fetch_32 = simple_object_fetch_big_32;
   fetch_64 = simple_object_fetch_big_64;
 
@@ -433,7 +438,7 @@ simple_object_xcoff_find_sections (simple_object_r
   char namebuf[SCNNMLEN + 1];
   char *name;
   off_t scnptr;
-  unsigned int size;
+  off_t size;
 
   scnhdr = scnbuf + i * scnhdr_size;
   scnname = scnhdr + offsetof (struct external_scnhdr, s_name);
@@ -489,24 +494,24 @@ simple_object_xcoff_find_sections (simple_object_r
 	  u.xcoff32.s_size));
 	}
 
-  if (strcmp (name, ".text") == 0)
-	textptr = scnptr;
   if (!(*pfn) (data, name, scnptr, size))
 	break;
 }
 
-  /* Special handling for .go_export CSECT. */
-  if (textptr != 0 && ocr->nsyms > 0)
+  /* Special handling for .go_export csect.  */
+  if (ocr->nsyms > 0)
 {
-  unsigned char *sym, *aux;
+  unsigned char *sym;
   const char *n_name;
-  unsigned long n_value, n_offset, n_zeroes, x_scnlen;
+  off_t size, n_value;
+  unsigned int n_numaux, n_offset, n_zeroes;
+  short n_scnum;
 
-  /* Read symbol table. */
+  /* Read symbol table.  */
   symtab = XNEWVEC (struct external_syment, ocr->nsyms * SYMESZ);
   if (!simple_object_internal_read (sobj->descriptor,
 	sobj->offset + ocr->symptr,
-	(unsigned char *)symtab,
+	(unsigned char *) symtab,
 	ocr->nsyms * SYMESZ,
 	, err))
 	{
@@ -515,17 +520,25 @@ simple_object_xcoff_find_sections (simple_object_r
 	  return NULL;
 	}
 
-  /* Search in symbol table if we have a ".go_export" symbol. */
-  for (i = 0; i < ocr->nsyms; ++i)
+  /* Search in symbol table if we have a ".go_export" symbol.  */
+  for (i = 0; i < ocr->nsyms; i += n_numaux + 1)
 	{
-	  sym = (unsigned char *)[i];
+	  sym = (unsigned char *) [i];
+	  n_numaux = symtab[i].n_numaux[0];
 
-	  if (symtab[i].n_sclass[0] & DBXMASK)
-	{
-	  /* Skip debug symbols whose names are in stabs. */
-	  i += symtab[i].n_numaux[0];
-	  continue;
-	}
+	  if (symtab[i].n_sclass[0] != C_EXT
+	  && symtab[i].n_sclass[0] != C_HIDEXT)
+	continue;
+
+	  /* Must have at least one csect auxiliary entry.  */
+	  if (n_numaux < 1 || i + n_numaux >= ocr->nsyms)
+	continue;
+
+	  n_scnum = fetch_16 (sym + offsetof (struct external_syment,
+	  n_scnum));
+	  if (n_scnum < 1 || (unsigned int) n_scnum > nscns)
+	continue;
+
 	  if (u64)
 	{
 	  n_value = fetch_64 (sym + offsetof (struct external_syment,
@@ 

Re: Fix inconsistent section flags

2017-08-29 Thread Joerg Sonnenberger
On Mon, Aug 28, 2017 at 11:42:53AM -0600, Jeff Law wrote:
> Does the attached (untested) patch work for you?

Works for the cases I care about, yes.

Joerg


Re: [06/77] Make GET_MODE_WIDER return an opt_mode

2017-08-29 Thread Richard Sandiford
Jeff Law  writes:
> On 08/28/2017 01:05 PM, Richard Sandiford wrote:
>> 
>>> As for the name, get_nonvoid?  Ugh.  Not sure.  Open to suggestions.
>> 
>> I'd rather avoid "nonvoid", since the use of VOIDmode for "no mode" is
>> really an implementation detail in things like opt_mode .
>> Other possiblities might be:
> Yea, good point on encoding the implementation detail not being a good idea.
>
>> 
>>   - require
>>   - demand
>>   - mode
>>   - get_mode
>>   - require_mode
>>   - demand_mode
>>   - else_fail (to go with else_void and else_blk)
>>   - noelse
> require, demand with or without the _mode suffix seem good to me.

OK, here's a patch that uses require ().  I've updated the following
patches in the obvious way.

This does make me want to reconsider another decision though.
Using opt_mode for iterators leads to things like:

  opt_scalar_int_mode wider_mode_iter;
  FOR_EACH_WIDER_MODE (wider_mode_iter, mode)
{
  scalar_int_mode wider_mode = wider_mode_iter.require ();
  if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
...

which isn't pretty.  It would be easy to support:

  scalar_int_mode wider_mode;
  FOR_EACH_WIDER_MODE (wider_mode, mode)
if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
  ...

*but* this would mean that "wider_mode" is accessible but undefined
after the loop (unlike the first loop, where wider_mode_iter is
guaranteed to be empty if the loop runs to completion).  Is that OK?
Or is it too suprising?

Another alternative would be:

  opt_scalar_int_mode wider_mode_iter;
  scalar_int_mode wider_mode;
  FOR_EACH_WIDER_MODE (wider_mode_iter, wider_mode, mode)
if (optab_handler (unoptab, wider_mode) != CODE_FOR_nothing)
  ...

which gives both.  But perhaps this would be odd for plain machine_mode
iterators, where there's no obvious distinction between the first and
second arguments.

Thanks,
Richard


2017-08-29  Richard Sandiford  
Alan Hayward  
David Sherwood  

gcc/
* coretypes.h (opt_mode): New class.
* machmode.h (opt_mode): Likewise.
(opt_mode::else_void): New function.
(opt_mode::require): Likewise.
(opt_mode::exists): Likewise.
(GET_MODE_WIDER_MODE): Turn into a function and return an opt_mode.
(GET_MODE_2XWIDER_MODE): Likewise.
(mode_iterator::get_wider): Update accordingly.
(mode_iterator::get_2xwider): Likewise.
(mode_iterator::get_known_wider): Likewise, turning into a template.
* combine.c (make_extraction): Update use of GET_MODE_WIDER_MODE,
forcing a wider mode to exist.
* config/cr16/cr16.h (LONG_REG_P): Likewise.
* rtlanal.c (init_num_sign_bit_copies_in_rep): Likewise.
* config/c6x/c6x.c (c6x_rtx_costs): Update use of
GET_MODE_2XWIDER_MODE, forcing a wider mode to exist.
* lower-subreg.c (init_lower_subreg): Likewise.
* optabs-libfuncs.c (init_sync_libfuncs_1): Likewise, but not
on the final iteration.
* config/i386/i386.c (ix86_expand_set_or_movmem): Check whether
a wider mode exists before asking for a move pattern.
(get_mode_wider_vector): Update use of GET_MODE_WIDER_MODE,
forcing a wider mode to exist.
(expand_vselect_vconcat): Update use of GET_MODE_2XWIDER_MODE,
returning false if no such mode exists.
* config/ia64/ia64.c (expand_vselect_vconcat): Likewise.
* config/mips/mips.c (mips_expand_vselect_vconcat): Likewise.
* expmed.c (init_expmed_one_mode): Update use of GET_MODE_WIDER_MODE.
Avoid checking for a MODE_INT if we already know the mode is not a
SCALAR_INT_MODE_P.
(extract_high_half): Update use of GET_MODE_WIDER_MODE,
forcing a wider mode to exist.
(expmed_mult_highpart_optab): Likewise.
(expmed_mult_highpart): Likewise.
* expr.c (expand_expr_real_2): Update use of GET_MODE_WIDER_MODE,
using else_void.
* lto-streamer-in.c (lto_input_mode_table): Likewise.
* optabs-query.c (find_widening_optab_handler_and_mode): Likewise.
* stor-layout.c (bit_field_mode_iterator::next_mode): Likewise.
* internal-fn.c (expand_mul_overflow): Update use of
GET_MODE_2XWIDER_MODE.
* omp-low.c (omp_clause_aligned_alignment): Likewise.
* tree-ssa-math-opts.c (convert_mult_to_widen): Update use of
GET_MODE_WIDER_MODE.
(convert_plusminus_to_widen): Likewise.
* tree-switch-conversion.c (array_value_type): Likewise.
* var-tracking.c (emit_note_insn_var_location): Likewise.
* tree-vrp.c (simplify_float_conversion_using_ranges): Likewise.
Return false inside rather than outside the loop if no wider mode
exists
* optabs.c (expand_binop): Update use of GET_MODE_WIDER_MODE
and GET_MODE_2XWIDER_MODE
   

Re: Improve DOM's ability to derive equivalences when traversing edges

2017-08-29 Thread Jeff Law
On 08/29/2017 03:13 AM, Christophe Lyon wrote:
> Hi Jeff,
> 
> 
> On 29 August 2017 at 07:07, Jeff Law  wrote:
>> This is a two part patchkit to improve DOM's ability to derive constant
>> equivalences that arise as a result of traversing a particular edge in
>> the CFG.
>>
>> Until now we only allowed a single NAME = NAME|CONST equivalence to be
>> associated with an edge in the CFG.  Patch #1 generalizes that code so
>> that we can record multiple simple equivalences on an edge.  Much like
>> expression equivalences, we just shove them into a vec and iterate on
>> the vec in the appropriate places.
>>
>> Patch #2 has the interesting bits.
>>
>> Back in the gcc-7 effort I added the ability to look at the operands of
>> a BIT_IOR_EXPR that had a zero result.  In that case each operand of the
>> BIT_IOR must have a zero value.  This was to address a missed
>> optimization regression bug during stage4.
>>
>> The plan was always to add analogous BIT_AND support, but I didn't feel
>> like handling BIT_AND was appropriate at the time (no bz entry and no
>> regressions related to that capability).
>>
>> I'd also had the sense that further improvements could be made here. For
>> example, it is common for the BIT_IOR or BIT_AND to be fed by a
>> comparison and we ought to be able to record the result of the
>> comparison.  If the comparison happened to be an equality test, then we
>> may ultimately derive a constant for on operand of the equality test as
>> well.
>>
>> It also seemed like the NOP/CONVERT_EXPR handling could be incorporated
>> into such a change.
>>
>> So I pulled together some instrumentation.  Lots of things generate
>> equivalences -- but a much smaller subset of those equivalences are
>> ultimately useful.
>>
>> Probably the most surprising was BIT_XOR, which allows us to generate
>> all kinds of equivalences, but none that were useful for ultimate
>> simplification in any of the tests I looked at.
>>
>>
>> The most subtle was COND_EXPRs.  We might have something like
>>
>> res = (a != 5) ? x : 1;
>>
>>
>> We can't actually derive anything useful for "a" here, even if we know
>> the result is one.  That's because "x" could have the value 1.  So you
>> end up only being able to derive equivalences for COND_EXPRs when both
>> arms have a constant value.  That restriction dramatically reduces the
>> utility of handling COND_EXPR -- to the point where I'm not including it.
>>
>> So what we end up with is BIT_AND/BIT_IOR, conversions, plus/minus,
>> comparisons and neg/not.
>>
>> So when we determine that a particular SSA_NAME has a constant value, we
>> look at the defining statement and essentially try to derive a value for
>> the input operand(s) based on knowing the result value.  If we can
>> derive a constant value for an input operand, we record that value and
>> recurse.
>>
>> In cases where we walk backwards to a condition.  We will record the
>> condition into the available expression table.
>>
>>
>> The code is written such that if we find cases where the equivalences
>> for other nodes are useful, they're easy to add.
>>
>>
>> These equivalences are most useful to the threader, but I've seen them
>> help in other cases as well.  There's a half-dozen or so new tests
>> reduced from GCC itself.
>>
>> Bootstrapped and regression tested on x86_64, lightly tested on ppc64le
>> via bootstrapping and running the new tests to verify they do the right
>> thing on a !logical_op_short_circuit target.
>>
>> Installing on the trunk.
>>
>> Jeff
>>
>>
>> commit 506ac60cacbc4c4e5e166513ea83c1d2e14eaf3b
>> Author: law 
>> Date:   Tue Aug 29 05:03:22 2017 +
>>
>> * tree-ssa-dom.c (class edge_info): Changed from a struct
>> to a class.  Add ctor/dtor, methods and data members.
>> (edge_info::edge_info): Renamed from allocate_edge_info.
>> Initialize additional members.
>> (edge_info::~edge_info): New.
>> (free_dom_edge_info): Delete the edge info.
>> (record_edge_info): Use new class & associated member functions.
>> Tighten forms for testing for edge equivalences.
>> (record_temporary_equivalences): Iterate over the simple
>> equivalences rather than assuming there's only one per edge.
>> (cprop_into_successor_phis): Iterate over the simple
>> equivalences rather than assuming there's only one per edge.
>> (optimize_stmt): Use operand_equal_p rather than pointer
>> equality for mini-DSE code.
>>
[ snip ]

>> commit a370df2c52074abbb044d1921a0c7df235758050
>> Author: law 
>> Date:   Tue Aug 29 05:03:36 2017 +
>>
>> * tree-ssa-dom.c (edge_info::record_simple_equiv): Call
>> derive_equivalences.
>> (derive_equivalences_from_bit_ior, 
>> record_temporary_equivalences):
>> Code moved 

Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.

2017-08-29 Thread Martin Liška
On 08/29/2017 01:42 PM, Richard Biener wrote:
> On Thu, Aug 24, 2017 at 5:35 PM, Martin Liška  wrote:
>> On 08/18/2017 12:25 PM, Martin Liška wrote:
>>> On 08/18/2017 11:30 AM, Richard Biener wrote:
 On Tue, Aug 15, 2017 at 2:37 PM, Martin Liška  wrote:
> On 08/14/2017 10:32 AM, Richard Biener wrote:
>> Hmm, but the existing "lowering" part is called from the
>> switch-conversion pass.  So
>> I'm not sure a new file is good.
>
> Good, I'm not against having that in a single file. So new version of the 
> patch
> does that.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

 Hmm, I see you duplicate add_case_node for example.  Is that just 
 temporary?
 If not can you please factor out the data structure and common code?
 (case.[Ch]?)
>>>
>>> You are right. As we'll generate just jump table in stmt.c the proper fix 
>>> is to remove
>>> all usages of 'case_node' in the file because simple iteration of labels 
>>> will work fine.
>>> Let me do it incrementally to minimize fall out :)
>>
>> Hello.
>>
>> So lesson learned. I should follow your recommendation and make the clean-up 
>> in stmt.c. I didn't
>> so adding new variant of case_node with a different size caused bootstrap 
>> failure on aarch64 and
>> it was quite hard to debug. So sending updated version of the patch which 
>> has cleaned up stmt.c.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. 
>> Same of aarch64-linux-gnu.
>>
>> Ready to be installed?
> 
> No ChangeLog entry for tree-switch-conversion.c?  At least you added
> make_pass_lower_switch
> and friends.

Hi.

Yes, sorry for the missing entries. Fixed and installed as r251412.
Hopefully fall out will be small.

Martin

> 
> Ok with a little more verbose changelog.
> 
> Thanks and sorry for the delay,
> Richard.
> 
>> Martin
>>>
>>> Martin
>>>

 Thanks,
 Richard.

> Martin
>>>
>>



Re: [PATCH] [MSP430] Read mcu data from file instead of hardcoded data

2017-08-29 Thread Nick Clifton
Hi Jozef,

> If the file is found then it is parsed, and the device passed to the
> mmcu option is searched for. If the file or device aren't found, then
> GCC uses the old hard-coded data instead.

Whilst testing this patch with -mlarge enabled on the command line, I
encountered this (new) failure:

  cc1: error: -mlarge requires a 430X-compatible -mmcu=

  compiler exited with status 1
  FAIL: gcc.target/msp430/no_devices_warn_msp430.c (test for excess errors)

I think that you might need to extend the skip list in the test...

Cheers
  Nick


Re: [PATCH v3] Fix PR81503 (SLSR invalid fold)

2017-08-29 Thread Bill Schmidt

> On Aug 29, 2017, at 4:24 AM, Richard Biener  
> wrote:
> 
> On Mon, Aug 28, 2017 at 10:16 PM, Bill Schmidt
>  wrote:
>>> On Aug 28, 2017, at 7:37 AM, Richard Biener  
>>> wrote:
>>> 
>>> Not sure, but would it be fixed in a similar way when writing
>> 
>>> ?
>> 
>> Thanks, Richard, that works very well.  I decided this was a good 
>> opportunity to also
>> simplify the control flow a little with early returns.  Here's the result.  
>> Bootstrapped and
>> tested on powerpc64le-linux-gnu with no regressions.  Is this ok for trunk, 
>> and
>> eventually for backport to gcc 5, 6, and 7?  (I can omit the control flow 
>> cleanups for
>> the older releases if desired.)
> 
> Note I removed the to_shwi () != HOST_WIDE_INT_MIN check and we might end up
> negating a signed MIN value, turning it into itself but doing plus -> minus in

Right, I was initially concerned about that, but I believe we're ok.  Since the 
code was
first written, the wide-int C++ class came along and bump became a widest_int.  
So
when signed MIN gets negated it will be a positive number outside the range of 
an HWI.
The check for fitting in the signed target type should then take care of 
overflow.  The 
net is that we no longer need a separate check, which is much cleaner.

> 
>> +  if (wi::neg_p (bump))
>> {
>> -  enum tree_code code = PLUS_EXPR;
>> -  tree bump_tree;
>> -  gimple *stmt_to_print = NULL;
>> +  code = MINUS_EXPR;
>> +  bump = -bump;
>> +}
> 
> I'm not sure if that's an issue in practice (I doubt).  I think we
> keep whatever the
> user wrote if you write that in source in regular folding, not doing x - 
> INT_MIN
> to x + INT_MIN canonicalization.  Was the != HOST_WIDE_INT_MIN check
> done for any wrong-code issue?

Yes, it was.  It's been a long time, so I'm not sure exactly which test covers 
this;
but I've always added test cases for these corner cases as they arise, and the
test suite is clean with the patch.
> 
> Patch is ok.  Cleanups are fine to backport if they are obvious (the
> patch doesn't tell that ;))

Yeah, that didn't diff nicely *at all*.  Sorry about that.

> Please wait a few days with backporting though.

Thanks!  Will do.

Bill
> 
> Thanks,
> Richard.
> 
>> Thanks,
>> Bill
>> 
>> [gcc]
>> 
>> 2017-08-03  Bill Schmidt  
>>Jakub Jelinek  
>>Richard Biener  
>> 
>>PR tree-optimization/81503
>>* gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure
>>folded constant fits in the target type; reorder tests for clarity.
>> 
>> [gcc/testsuite]
>> 
>> 2017-08-03  Bill Schmidt  
>>Jakub Jelinek  
>>Richard Biener  
>> 
>>PR tree-optimization/81503
>>* gcc.c-torture/execute/pr81503.c: New file.
>> 
>> 
>> Index: gcc/gimple-ssa-strength-reduction.c
>> ===
>> --- gcc/gimple-ssa-strength-reduction.c (revision 251369)
>> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
>> @@ -2089,104 +2089,104 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>>   tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
>>   enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
>> 
>> -  /* It is highly unlikely, but possible, that the resulting
>> - bump doesn't fit in a HWI.  Abandon the replacement
>> - in this case.  This does not affect siblings or dependents
>> - of C.  Restriction to signed HWI is conservative for unsigned
>> - types but allows for safe negation without twisted logic.  */
>> -  if (wi::fits_shwi_p (bump)
>> -  && bump.to_shwi () != HOST_WIDE_INT_MIN
>> -  /* It is not useful to replace casts, copies, negates, or adds of
>> -an SSA name and a constant.  */
>> -  && cand_code != SSA_NAME
>> -  && !CONVERT_EXPR_CODE_P (cand_code)
>> -  && cand_code != PLUS_EXPR
>> -  && cand_code != POINTER_PLUS_EXPR
>> -  && cand_code != MINUS_EXPR
>> -  && cand_code != NEGATE_EXPR)
>> +  /* It is not useful to replace casts, copies, negates, or adds of
>> + an SSA name and a constant.  */
>> +  if (cand_code == SSA_NAME
>> +  || CONVERT_EXPR_CODE_P (cand_code)
>> +  || cand_code == PLUS_EXPR
>> +  || cand_code == POINTER_PLUS_EXPR
>> +  || cand_code == MINUS_EXPR
>> +  || cand_code == NEGATE_EXPR)
>> +return;
>> +
>> +  enum tree_code code = PLUS_EXPR;
>> +  tree bump_tree;
>> +  gimple *stmt_to_print = NULL;
>> +
>> +  if (wi::neg_p (bump))
>> {
>> -  enum tree_code code = PLUS_EXPR;
>> -  tree bump_tree;
>> -  gimple *stmt_to_print = NULL;
>> +  code = MINUS_EXPR;
>> +  bump = -bump;
>> +}
>> 
>> -  /* If the basis name and the candidate's LHS have incompatible
>> -types, 

Re: [PATCH] [MSP430] [PR80993] Prevent lto removing interrupt handlers

2017-08-29 Thread Richard Biener
On Tue, Aug 29, 2017 at 3:23 PM, Nick Clifton  wrote:
> Hi Jozef,
>
>> As reported in PR80993, enabling lto causes interrupt handlers to be
>> removed. This patch marks interrupt handlers as used, preventing them
>> from being optimized out.
>>
>> If the attached patch is acceptable, I would appreciate if someone could
>> commit it for me, as I do not have write access.
>
> Approved and applied.  Thanks for fixing this bug.

Humm... don't you have to register interrupt handlers somehow?

That is, I fail to see why it is not the users responsibility to mark
things as used if they are used in a way GCC (and the linker
plugin) doesn't see?

Anyway, I guess this side-effect should be documented.  It also
will prevent GCC from making the symbol hidden in case it
resides in a shared library IIRC.

Richard.

> Cheers
>   Nick
>
>


Re: [PATCH], PR target/82015, add PowerPC warning for unpack_vector_int128 with illegal 2nd argument

2017-08-29 Thread Segher Boessenkool
Hi!

On Mon, Aug 28, 2017 at 11:41:47PM -0400, Michael Meissner wrote:
> One of the local programmers tried to use the __builtin_unpack_vector_int128
> function, but his second argument was not the constant 0 or 1.  The compiler
> put the 2nd argument into a register, but there wasn't a valid insn for this,
> and raised an insn not found message.  GCC should warn about this illegal
> usage.

Error, not warn (all the code is correct though).

>   * config/rs6000/rs6000.c (rs6000_expand_binop_builtin): Insure
>   that the second argument of the built-in functions to unpack
>   128-bit scalar types to 64-bit values is 0 or 1.  Change to use a
>   switch statement instead a lot of if statements.

It usually is easier to review if you post the big mechanical changes
as a separate patch.  But I'll manage, this one isn't so bad :-)

> @@ -14050,6 +14051,21 @@ rs6000_expand_binop_builtin (enum insn_c
> error ("argument 2 must be a 7-bit unsigned literal");
> return CONST0_RTX (tmode);
>   }
> +  break;
> +case CODE_FOR_unpackv1ti:
> +case CODE_FOR_unpackkf:
> +case CODE_FOR_unpacktf:
> +case CODE_FOR_unpackif:
> +case CODE_FOR_unpacktd:
> +  /* Only allow 1-bit unsigned literals. */
> +  STRIP_NOPS (arg1);
> +  if (TREE_CODE (arg1) != INTEGER_CST
> +   || !IN_RANGE (TREE_INT_CST_LOW (arg1), 0, 1))
> + {
> +   error ("argument 2 must be 0 or 1");
> +   return CONST0_RTX (tmode);
> + }
> +  break;

This loses that it must be a literal, compared to the 5/6/7 bit messages.
Maybe just say "1-bit unsigned literal", it reads a little bit funny, but
at least it is correct (for some meaning of "literal", anyway) ;-)

Okay for trunk; okay for the release branches with the obvious changes.
Thanks!


Segher


Re: [PATCH] [MSP430] [PR80993] Prevent lto removing interrupt handlers

2017-08-29 Thread Nick Clifton
Hi Jozef,

> As reported in PR80993, enabling lto causes interrupt handlers to be
> removed. This patch marks interrupt handlers as used, preventing them
> from being optimized out.
> 
> If the attached patch is acceptable, I would appreciate if someone could
> commit it for me, as I do not have write access.

Approved and applied.  Thanks for fixing this bug.

Cheers
  Nick




Re: [PATCH, rs6000] Stop non-volatile CR usage from killing shrink-wrap

2017-08-29 Thread Segher Boessenkool
Hi Pat,

On Mon, Aug 28, 2017 at 04:34:16PM -0500, Pat Haugen wrote:
> The following patch allows shrink-wrapping to succeed in the presence of
> non-volatile CR save/restore. The movesi_from_cr define_insn used to
> list all CRs as used, even though it's only the non-volatile values that
> we are interested in saving/restoring. This prevented the prolog from
> being moved past the early exit test because that compare was defining a
> register used in the prolog (a volatile CR). The patch removes the
> mentions of the volatile CRs and renames the functions involved so that
> it's hopefully clear they are for prolog generation only.

It is spelled "prologue" (prolog is a computer language).

> --- gcc/config/rs6000/rs6000.c(revision 251389)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -26083,10 +26083,14 @@ rs6000_emit_savres_rtx (rs6000_stack_t *
>return insn;
>  }
>  
> -/* Emit code to store CR fields that need to be saved into REG.  */
> +/* Emit prolog code to store CR fields that need to be saved into REG. This

Super nit: two spaces after a full stop.

Okay for trunk with those fixed.  Thanks!


Segher


Re: [PATCH] Fix PR82011, early LTO debug fallout

2017-08-29 Thread Richard Biener
On Tue, 29 Aug 2017, Richard Biener wrote:

> 
> The following avoids adding DW_AT_inline attributes twice on which
> dsymutil complains.  The duplicate attribute is caused by stray
> code I left in (I guess I hoped nothing ends up DECL_ABSTRACT_P ...).
> Thus the following patch removes it -- DW_AT_inline is solely set
> by dwarf2out_abstract_function now.
> 
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.  The
> gdb testsuite shows no regression.
> 
> Will commit once testing finished.
> 
> Note the assert is deliberately restricted to DW_AT_inline for now
> given enabling it unconditionally fires left and right ... :/
> (sth to fix, but only as followups)

The following seems to cure it as far as preliminary testing goes.

Full bootstrap & regtest currently running on x86_64-unknown-linux-gnu.

Will apply once it succeeds.

Richard.

2017-08-29  Richard Biener  

* dwarf2out.c (add_dwarf_attr): Check we don't add duplicate
attributes.
(gen_subprogram_die): Add DW_AT_object_pointer only early.
(dwarf2out_early_global_decl): Only generate a DIE for the
abstract origin if it doesn't already exist.
(resolve_addr): Do not add the linkage name twice when
generating a stub DIE for the DW_TAG_GNU_call_site target.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 251409)
+++ gcc/dwarf2out.c (working copy)
@@ -4129,7 +4129,7 @@ add_dwarf_attr (dw_die_ref die, dw_attr_
   dw_attr_node *a;
   unsigned ix;
   FOR_EACH_VEC_SAFE_ELT (die->die_attr, ix, a)
-   gcc_assert (a->dw_attr != attr->dw_attr || a->dw_attr != DW_AT_inline);
+   gcc_assert (a->dw_attr != attr->dw_attr);
 }
 
   vec_safe_reserve (die->die_attr, 1);
@@ -22334,7 +22334,8 @@ gen_subprogram_die (tree decl, dw_die_re
{
  dw_die_ref parm_die = gen_decl_die (parm, NULL, NULL, subr_die);
 
- if (parm == DECL_ARGUMENTS (decl)
+ if (early_dwarf
+ && parm == DECL_ARGUMENTS (decl)
  && TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE
  && parm_die
  && (dwarf_version >= 3 || !dwarf_strict))
@@ -25479,10 +25480,11 @@ dwarf2out_early_global_decl (tree decl)
 with C++ constructor clones for example and makes
 dwarf2out_abstract_function happy which requires the early
 DIE of the abstract instance to be present.  */
- if (DECL_ABSTRACT_ORIGIN (decl))
+ tree origin = DECL_ABSTRACT_ORIGIN (decl);
+ if (origin != NULL && lookup_decl_die (origin) == NULL)
{
- current_function_decl = DECL_ABSTRACT_ORIGIN (decl);
- dwarf2out_decl (DECL_ABSTRACT_ORIGIN (decl));
+ current_function_decl = origin;
+ dwarf2out_decl (origin);
}
 
  current_function_decl = decl;
@@ -29047,7 +29049,7 @@ resolve_addr (dw_die_ref die)
add_AT_flag (tdie, DW_AT_external, 1);
add_AT_flag (tdie, DW_AT_declaration, 1);
add_linkage_attr (tdie, tdecl);
-   add_name_and_src_coords_attributes (tdie, tdecl);
+   add_name_and_src_coords_attributes (tdie, tdecl, true);
equate_decl_number_to_die (tdecl, tdie);
  }
  }


Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-08-29 Thread Richard Biener
On Tue, Aug 29, 2017 at 1:35 PM, Jackson Woodruff
 wrote:
> Hi all,
>
> Apologies again to those CC'ed, who (again) received this twice.
>
> Joseph: Yes you are correct. I misread the original thread, now fixed.
>
> Richard: I've moved the optimizations out of fold-const.c. One has been
> replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C) I've
> deleted as it only introduced a new optimization when running with the flags
> '-O0 -funsafe-math-optimizations'.

Hmm, how did you verify that, that it only adds sth with -O0
-funsafe-math-optimizations?
Is that because in GIMPLE the reassoc pass should do this transform?

You added

+/* Simplify x / (- y) to -x / y.  */
+(simplify
+ (rdiv @0 (negate @1))
+ (rdiv (negate @0) @1))

for

  /* (-A) / (-B) -> A / B  */
  if (TREE_CODE (arg0) == NEGATE_EXPR && negate_expr_p (arg1))
return fold_build2_loc (loc, RDIV_EXPR, type,
TREE_OPERAND (arg0, 0),
negate_expr (arg1));
  if (TREE_CODE (arg1) == NEGATE_EXPR && negate_expr_p (arg0))
return fold_build2_loc (loc, RDIV_EXPR, type,
negate_expr (arg0),
TREE_OPERAND (arg1, 0));

presumably?  This isn't equivalent.  It's more like

(simplify
  (rdiv (negate_expr_p @0) (negate @1))
  (rdiv (negate @0) @1))
(simplify
  (rdiv (negate @0) (negate_expr_p @1))
  (rdiv @0 (negate @1)))

and you should remove the corresponding fold-const.c code.

Please do these changes independently to aid bisecting in case of issues.

 (if (flag_reciprocal_math)
- /* Convert (A/B)/C to A/(B*C)  */
+ /* Convert (A/B)/C to A/(B*C) where neither B nor C are constant.  */
  (simplify
   (rdiv (rdiv:s @0 @1) @2)
-   (rdiv @0 (mult @1 @2)))
+   (if (TREE_CODE (@1) != REAL_CST && TREE_CODE (@1) != REAL_CST)
+(rdiv @0 (mult @1 @2

why?  I guess to avoid ping-poning with

+ /* Simplify x / (C * y) to (x / C) / y where C is a constant.  */
+ (simplify
+  (rdiv @0
+   (mult @1 REAL_CST@2))
+  (rdiv (rdiv @0 @2) @1))

?  If so why not just disallow for @1 == REAL_CST?

> On O1 and up, the pattern that replaces 'x / C' with 'x * (1 / C)'
> is enabled and then the pattern is covered by that and
> (x * C +- y * C -> C * (x +- y)) (which is already present in match.pd)
>
> I have also updated the testcase for those optimizations to use 'O1' to
> avoid that case.
>
>
> On 08/24/2017 10:06 PM, Jeff Law wrote:
>>
>> On 08/17/2017 03:55 AM, Wilco Dijkstra wrote:
>>>
>>> Richard Biener wrote:

 On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra 
 wrote:
>
> Richard Biener wrote:
>>>
>>> We also change the association of
>>>
>>>x / (y * C) -> (x / C) / y
>>>
>>> If C is a constant.
>>
>>
>> Why's that profitable?
>
>
> It enables (x * C1) / (y * C2) -> (x * C1/C2) / y for example.
> Also 1/y is now available to the reciprocal optimization, see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 for details.


 Sure, but on its own it's going to be slower.  So this isn't the
 correct way to enable those followup transforms.
>>>
>>>
>>> How can it be any slower? It's one division and one multiply in both
>>> cases.
>>
>> x / (y * C) -> (x / C) / y
>>
>> Goes from one division and one multiplication to two divisions.  I'm
>> guessing that's what Richi is (reasonably) concerned about.
>>
>> So it may be the case that we need more complex pattern matching here
>> for when to perform the first transformation to ultimately enable the
>> second.
>>
>
> The only case where we don't remove the division but still execute this
> pattern is when run with'-O0 -freciprocal-math'.
>
> As long as we have 'O1' or greater (and -freciprocal-math), that is enough
> to enable the removal of the reciprocal.

I don't see this.  I presume you mean this happens in the recip pass?
But we don't optimize this when optimizing for size (but the above pattern
still applies) or when targetm.min_divisions_for_recip_mul is too large.

So I still think this is the wrong place to do this and instead the recip
pass should be extended.



>
> Jackson.
>
>>
>> Jeff
>>
>


Re: [PATCH, rs6000] Fix PR81833 (incorrect code gen for vec_msum)

2017-08-29 Thread Segher Boessenkool
Hi Bill,

On Mon, Aug 28, 2017 at 03:56:21PM -0500, Bill Schmidt wrote:
> PR81833 identifies a problem with the little-endian vector multiply-sum
> instructions.  The original implementation is quite poor (and I am allowed
> to say that, since it was mine).  This patch fixes the code properly.
> 
> The revised code still uses UNSPECs for these ops, which is not strictly
> necessary, although descriptive rtl for them would be pretty complex.  I've
> put in a FIXME to make note of that for a future cleanup.

There is ss_plus, but that is the saturated sum of two things, not of
five resp. three as we need for vsumsws and vsum2sws.

If you convert to double-width, then add, then clamp, the result will
be correct; but then, very often some of that will be folded away (say,
when combine works on these patterns), and then you need a lot of
different patterns to catch all of this.

So we really need an ss_plus that has more than two arguments.  We could
have an unspec for that as well (instead of the unspecs that work on
vectors, unspecs that work on the elements), that may work.

Or perhaps it will work best as-is: the only way we get these unspecs
is via intrinsics, maybe we should just trust the user.

>   PR target/81833
>   * config/rs6000/altivec.md (altivec_vsum2sws): Convert from a
>   define_insn to a define_expand.
>   (altivec_vsum2sws_direct): New define_insn.
>   (altivec_vsumsws): Convert from a define_insn to a define_expand.
> 
> [gcc/testsuite]
> 
> 2017-08-28  Bill Schmidt  
> 
>   PR target/81833
>   * gcc.target/powerpc/pr81833.c: New file.


> +; FIXME: This can probably be expressed without an UNSPEC.
> +(define_insn "altivec_vsum2sws_direct"
>[(set (match_operand:V4SI 0 "register_operand" "=v")
>  (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "v")
> +   (match_operand:V4SI 2 "register_operand" "v")]
> +  UNSPEC_VSUM2SWS))]
>"TARGET_ALTIVEC"
> +  "vsum2sws %0,%1,%2"
> +  [(set_attr "type" "veccomplex")
> +   (set_attr "length" "4")])

This no longer writes to VSCR, please fix that.  "length" 4 is the
default, just leave it.

vsumsws has both of these correct already.

> --- gcc/testsuite/gcc.target/powerpc/pr81833.c(nonexistent)
> +++ gcc/testsuite/gcc.target/powerpc/pr81833.c(working copy)
> @@ -0,0 +1,54 @@
> +/* PR81833: This used to fail due to improper implementation of vec_msum.  */
> +
> +/* { dg-do run {target { lp64 } } } */

Does this need lp64?  I don't at first glance see anything that would
need it.  Oh, vec_vsx_ld perhaps?

Okay for trunk and branches with the above fixes.  Thanks!


Segher


Re: [PATCH][RFC] Make expansion of balanced binary trees of switches on tree level.

2017-08-29 Thread Richard Biener
On Thu, Aug 24, 2017 at 5:35 PM, Martin Liška  wrote:
> On 08/18/2017 12:25 PM, Martin Liška wrote:
>> On 08/18/2017 11:30 AM, Richard Biener wrote:
>>> On Tue, Aug 15, 2017 at 2:37 PM, Martin Liška  wrote:
 On 08/14/2017 10:32 AM, Richard Biener wrote:
> Hmm, but the existing "lowering" part is called from the
> switch-conversion pass.  So
> I'm not sure a new file is good.

 Good, I'm not against having that in a single file. So new version of the 
 patch
 does that.

 Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

 Ready to be installed?
>>>
>>> Hmm, I see you duplicate add_case_node for example.  Is that just temporary?
>>> If not can you please factor out the data structure and common code?
>>> (case.[Ch]?)
>>
>> You are right. As we'll generate just jump table in stmt.c the proper fix is 
>> to remove
>> all usages of 'case_node' in the file because simple iteration of labels 
>> will work fine.
>> Let me do it incrementally to minimize fall out :)
>
> Hello.
>
> So lesson learned. I should follow your recommendation and make the clean-up 
> in stmt.c. I didn't
> so adding new variant of case_node with a different size caused bootstrap 
> failure on aarch64 and
> it was quite hard to debug. So sending updated version of the patch which has 
> cleaned up stmt.c.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests. 
> Same of aarch64-linux-gnu.
>
> Ready to be installed?

No ChangeLog entry for tree-switch-conversion.c?  At least you added
make_pass_lower_switch
and friends.

Ok with a little more verbose changelog.

Thanks and sorry for the delay,
Richard.

> Martin
>>
>> Martin
>>
>>>
>>> Thanks,
>>> Richard.
>>>
 Martin
>>
>


Re: [testsuite, i386] Require -static support in gcc.dg/pie-static-[12].c (PR testsuite/81793)

2017-08-29 Thread Iain Sandoe
> On 29 Aug 2017, at 09:56, Rainer Orth  wrote:

>>> All this begs the question why on earth would darwin.h (STARTFILE_SPEC)
>>> accept -static and try to link libcrt0.o it the latter doesn't exist.
>>> 
>>> However, I've just found that the bundled clang on Darwin 11 does the
>>> same (and fails in the same way: "ld: library not found for -lcrt0.o”).
>> 
>> I think it’s because when one is bringing up the system, then one does (or
>> at least used to) have a static libc/crt set.
>> Thus the compiler did/does need to support it for that case.  I’ve not done
>> a kernel bootstrap since ≈ Darwin9 era, so things might have changed and
>> this could be history leaking through.
> 
> I've now found the following statement:
> 
>   https://developer.apple.com/library/content/qa/qa1118/_index.html
> effectively declaring statically linked binaries unsupported (same as
> newer Solaris versions do, btw.).

It's been unsupported in user-space for as long as I recall … 

> I've now checked clang from Darwin 17 (Xcode 9 Beta) and it behaves the
> same.  I've found the following in clang sources
> 
>  } else {
>if (Args.hasArg(options::OPT_static) ||
>Args.hasArg(options::OPT_object) ||
>Args.hasArg(options::OPT_preload)) {
>  CmdArgs.push_back("-lcrt0.o");
> 
> in lib/Driver/ToolChains/Darwin.cpp (Darwin::addStartObjectFileArgs).
> Several comments seem to indicate that most of this is just a
> straightforward translation of GCC's Darwin specs to C++ ;-)
> 
> Overall, mostly a historical artifact these days, it seems.

possibly, indeed - but ...
…  as noted above. when bringing the system up from scratch (i.e. bootstrapping 
the kernel, and associated stuff) there’s (or at least used to be) a phase 
where some tools _are_ built with static linkage - and part of that process 
involves building the crt0 + a statically-linkable libc.  That the code is 
still present in clang, suggests this could still be the case (although the 
‘just left in there by accident’ explanation is possible too, I will try to 
query someone for a definite answer).

In any case, we might as well just skip such tests (or xfail them) on Darwin, 
that’s a correct reflection of the user-land expectation.

Iain Sandoe
CodeSourcery / Mentor Embedded / Siemens





Re: [PATCH] Factor out division by squares and remove division around comparisons (1/2)

2017-08-29 Thread Jackson Woodruff

Hi all,

Apologies again to those CC'ed, who (again) received this twice.

Joseph: Yes you are correct. I misread the original thread, now fixed.

Richard: I've moved the optimizations out of fold-const.c. One has been 
replicated in match.pd, and the other (x / C +- y / C -> (x +- y) / C) 
I've deleted as it only introduced a new optimization when running with 
the flags '-O0 -funsafe-math-optimizations'.


On O1 and up, the pattern that replaces 'x / C' with 'x * (1 / C)'
is enabled and then the pattern is covered by that and
(x * C +- y * C -> C * (x +- y)) (which is already present in match.pd)

I have also updated the testcase for those optimizations to use 'O1' to 
avoid that case.



On 08/24/2017 10:06 PM, Jeff Law wrote:

On 08/17/2017 03:55 AM, Wilco Dijkstra wrote:

Richard Biener wrote:

On Tue, Aug 15, 2017 at 4:11 PM, Wilco Dijkstra  wrote:

Richard Biener wrote:

We also change the association of

   x / (y * C) -> (x / C) / y

If C is a constant.


Why's that profitable?


It enables (x * C1) / (y * C2) -> (x * C1/C2) / y for example.
Also 1/y is now available to the reciprocal optimization, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71026 for details.


Sure, but on its own it's going to be slower.  So this isn't the
correct way to enable those followup transforms.


How can it be any slower? It's one division and one multiply in both cases.

x / (y * C) -> (x / C) / y

Goes from one division and one multiplication to two divisions.  I'm
guessing that's what Richi is (reasonably) concerned about.

So it may be the case that we need more complex pattern matching here
for when to perform the first transformation to ultimately enable the
second.



The only case where we don't remove the division but still execute this 
pattern is when run with'-O0 -freciprocal-math'.


As long as we have 'O1' or greater (and -freciprocal-math), that is 
enough to enable the removal of the reciprocal.



Jackson.



Jeff

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 
eeeff1ed166734328a612142fdf6235274f9e858..907d96c3d994db1ec8dc0ef692ac0b4d59c99a4c
 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -3834,47 +3834,6 @@ invert_truthvalue_loc (location_t loc, tree arg)
   : TRUTH_NOT_EXPR,
  type, arg);
 }
-
-/* Knowing that ARG0 and ARG1 are both RDIV_EXPRs, simplify a binary operation
-   with code CODE.  This optimization is unsafe.  */
-static tree
-distribute_real_division (location_t loc, enum tree_code code, tree type,
- tree arg0, tree arg1)
-{
-  bool mul0 = TREE_CODE (arg0) == MULT_EXPR;
-  bool mul1 = TREE_CODE (arg1) == MULT_EXPR;
-
-  /* (A / C) +- (B / C) -> (A +- B) / C.  */
-  if (mul0 == mul1
-  && operand_equal_p (TREE_OPERAND (arg0, 1),
-  TREE_OPERAND (arg1, 1), 0))
-return fold_build2_loc (loc, mul0 ? MULT_EXPR : RDIV_EXPR, type,
-   fold_build2_loc (loc, code, type,
-TREE_OPERAND (arg0, 0),
-TREE_OPERAND (arg1, 0)),
-   TREE_OPERAND (arg0, 1));
-
-  /* (A / C1) +- (A / C2) -> A * (1 / C1 +- 1 / C2).  */
-  if (operand_equal_p (TREE_OPERAND (arg0, 0),
-  TREE_OPERAND (arg1, 0), 0)
-  && TREE_CODE (TREE_OPERAND (arg0, 1)) == REAL_CST
-  && TREE_CODE (TREE_OPERAND (arg1, 1)) == REAL_CST)
-{
-  REAL_VALUE_TYPE r0, r1;
-  r0 = TREE_REAL_CST (TREE_OPERAND (arg0, 1));
-  r1 = TREE_REAL_CST (TREE_OPERAND (arg1, 1));
-  if (!mul0)
-   real_arithmetic (, RDIV_EXPR, , );
-  if (!mul1)
-real_arithmetic (, RDIV_EXPR, , );
-  real_arithmetic (, code, , );
-  return fold_build2_loc (loc, MULT_EXPR, type,
- TREE_OPERAND (arg0, 0),
- build_real (type, r0));
-}
-
-  return NULL_TREE;
-}
 
 /* Return a BIT_FIELD_REF of type TYPE to refer to BITSIZE bits of INNER
starting at BITPOS.  The field is unsigned if UNSIGNEDP is nonzero
@@ -9419,12 +9378,6 @@ fold_binary_loc (location_t loc,
}
}
 
- if (flag_unsafe_math_optimizations
- && (TREE_CODE (arg0) == RDIV_EXPR || TREE_CODE (arg0) == 
MULT_EXPR)
- && (TREE_CODE (arg1) == RDIV_EXPR || TREE_CODE (arg1) == 
MULT_EXPR)
- && (tem = distribute_real_division (loc, code, type, arg0, arg1)))
-   return tem;
-
   /* Convert a + (b*c + d*e) into (a + b*c) + d*e.
  We associate floats only if the user has specified
  -fassociative-math.  */
@@ -9772,13 +9725,6 @@ fold_binary_loc (location_t loc,
return tem;
}
 
-  if (FLOAT_TYPE_P (type)
- && flag_unsafe_math_optimizations
- && (TREE_CODE (arg0) == RDIV_EXPR || TREE_CODE (arg0) == MULT_EXPR)
- && (TREE_CODE (arg1) == RDIV_EXPR || TREE_CODE (arg1) == MULT_EXPR)
- && 

Re: [AARCH64] Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_84341

2017-08-29 Thread James Greenhalgh
On Tue, Jun 27, 2017 at 11:20:02AM +1000, Kugan Vivekanandarajah wrote:
> https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00614.html  added this
> workaround to get kernel building with when TARGET_FIX_ERR_A53_843419
> is enabled.
> 
> This was added to support building kernel loadable modules. In kernel,
> when CONFIG_ARM64_ERRATUM_843419 is selected, the relocation needed
> for ADRP/LDR (R_AARCH64_ADR_PREL_PG_HI21 and
> R_AARCH64_ADR_PREL_PG_HI21_NC are removed from the kernel to avoid
> loading objects with possibly offending sequence). Thus, it could only
> support pc relative literal loads.
> 
> However, the following patch was posted to kernel to add
> -mpc-relative-literal-loads
> http://www.spinics.net/lists/arm-kernel/msg476149.html
> 
> -mpc-relative-literal-loads is unconditionally added to the kernel
> build as can be seen from:
> https://github.com/torvalds/linux/blob/master/arch/arm64/Makefile
> 
> Therefore this patch removes the hunk so that applications like
> SPECcpu2017's 521/621.wrf can be built (with LTO in this case) without
> -mno-pc-relative-literal-loads
> 
> Bootstrapped and regression tested on aarch64-linux-gnu with no new 
> regressions.
> 
> Is this OK for trunk?

Hi Kugan,

I'm struggling a little to convince myself that this is correct. I think
the argument is that this was a workaround for one very particular issue
with the linux kernel module loader, which hard faults by refusing to
handle these relocations when in a workaround mode for Erratum 843419.
Most of these relocations won't occur because the kernel builds with
-mcmodel=large, but literals would always use a small model style
adrp/load, unless we were using -mpc-relative-literal-loads . So, this
workaround enabled -mpc-relative-literal-loads always when we were in
a workaround mode, thus allowing the kernel loader to continue.

The argument for removing it then, is that with the kernel now always using
-mpc-relative-literal-loads there is no reason for this workaround to stay
in place. The linkers which we will use will apply the workaround if needed.

Testcases and a detailed problem report of the build failures you had seen in
521.wrf would have helped me get closer to understanding this, and made
review substantially easier.

Am I on the right track?

If so, this is OK for trunk. If not, please can you expand on what is going
on in this patch.

Thanks,
James


> 
> Thanks,
> Kugan
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-06-27  Kugan Vivekanandarajah  
> 
> * gcc.target/aarch64/pr63304_1.c: Remove-mno-fix-cortex-a53-843419.
> 
> gcc/ChangeLog:
> 
> 2017-06-27  Kugan Vivekanandarajah  
> 
> * config/aarch64/aarch64.c (aarch64_override_options_after_change_1):
> Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_84341
> for default.




Re: [Patch][aarch64] Use IFUNCs to enable LSE instructions in libatomic on aarch64

2017-08-29 Thread Szabolcs Nagy
On 28/08/17 19:25, Steve Ellcey wrote:
> On Fri, 2017-08-25 at 15:37 +0100, Szabolcs Nagy wrote:
> 
>> the use of ifunc in gcc target libraries was a mistake
>> in my opinion, there are several known bugs in the ifunc
>> design and uclibc/musl/bionic don't plan to support it.
>> but at this point i dont have a better proposal for doing
>> runtime selection of optimal atomics code.
>>
>> however in this patch i don't see why would the ctor run
>> before ifunc resolvers. how does this work on x86_64?
>> (there the different 16byte atomics are not even compatible,
>> so if ifunc resolvers in different dsos return different
>> result because one ran before the ctor, another after then
>> the runtime behaviour is broken. this can happen when one
>> dso is bindnow so ifunc relocation is processed before
>> ctors and another runs resolvers lazily or dlopened later..
>> but i didnt test it just looks broken)
> 
> I am not sure I followed everything here.  My basic testing all
> worked, init_cpu_revision got run when libatomic was loaded and
> then the IFUNC resolver gets called after that when one of the IFUNC
> atomic functions is called.  init_cpu_revision sets libat_have_lse
> which, I assume, will not be used by any other libraries.  If other
> libraries have IFUNCs they would have their own static constructors and
> set their own variables to be checked by their own IFUNCs.  So I am
> not sure how one DSO is going to break another DSO.
> 

this can only possibly work with lazy binding of the
libatomic calls (there are many reasons why that may
not be the case starting from static linking to compiling
with -fno-plt or setting LD_BIND_NOW at runtime) as i
expected this is broken on x86 see the execution test
i added to pr 60790:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60790

>> note that aarch64 ifunc resolvers get hwcap as an argument
>> so all this brokenness can be avoided (and if we want to
>> disable hwcap flags then probably glibc should take care
>> of that before passing hwcap to the ifunc resolver).
> 
> I looked at the IFUNC memcpy resolver in glibc and it does not look
> like it gets hwcap as an argument.  When I preprocess everything I see:
> 
> void *__libc_memcpy_ifunc (void) __asm__ ("__libc_memcpy");
> void *__libc_memcpy_ifunc (void)
> {
>   uint64_t __attribute__((unused)) midr = _dl_aarch64_cpu_features.midr_el1;
>   *res = ** expression that looks at midr value and returns function pointer 
> **;
>   return res;
> }
> __asm__ (".type " "__libc_memcpy" ", %gnu_indirect_function");
> extern __typeof (__libc_memcpy) memcpy __attribute__ ((alias 
> ("__libc_memcpy")));
> 

it's a general bug that most ifunc users (e.g. in binutils
tests) declare the ifunc resolvers incorrectly, the correct
prototype is the way the dynamic linker invokes the resolver
in sysdeps/aarch64/dl-irel.h:

static inline ElfW(Addr)
__attribute ((always_inline))
elf_ifunc_invoke (ElfW(Addr) addr)
{
  return ((ElfW(Addr) (*) (unsigned long int)) (addr)) (GLRO(dl_hwcap));
}

(that argument should be uint64_t for ilp32, but that's a
separate issue)

in glibc the hwcap is not used, because it has accesses to
cached dispatch info, but in libatomic using the hwcap
argument is the right way.




Re: [PATCH], Fix PR 81959 (power9 IEEE 128-bit float convert from 32-bit memory)

2017-08-29 Thread Segher Boessenkool
Hi Mike,

On Mon, Aug 28, 2017 at 02:50:02PM -0400, Michael Meissner wrote:
> When I added the optimization for loading 32-bit values directly into the
> vector registers from memory to convert to IEEE 128-bit floating point, I
> forgot to make sure the address did not have PRE_INCREMENT, etc. addressing.

>   * config/rs6000/rs6000.md (float_si2_hw): If register
>   allocation hasn't been done, make sure the memory address is
>   X-FORM (register+register).
>   (floatuns_si2_hw2): Likewise.

Why is it okay after RA but not before?

> --- gcc/config/rs6000/rs6000.md   (revision 251358)
> +++ gcc/config/rs6000/rs6000.md   (working copy)
> @@ -14505,6 +14505,9 @@ (define_insn_and_split "float_si2_
>  {
>if (GET_CODE (operands[2]) == SCRATCH)
>  operands[2] = gen_reg_rtx (DImode);
> +
> +  if (MEM_P (operands[1]) && !reload_completed)
> +operands[1] = rs6000_address_for_fpconvert (operands[1]);
>  })

It will need a comment here, then (other callers of
rs6000_address_for_fpconvert do not test for !reload_completed).

Or maybe the predicate should be stricter in all these cases?
nonimmediate_operand allows a lot ;-)

> --- gcc/testsuite/gcc.target/powerpc/pr81959.c(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr81959.c(revision 0)
> @@ -0,0 +1,25 @@
> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mpower9-vector -O2 -mfloat128" } */

powerpc*-*-*, or does that not work?


Segher


[PATCH] Fix PR82011, early LTO debug fallout

2017-08-29 Thread Richard Biener

The following avoids adding DW_AT_inline attributes twice on which
dsymutil complains.  The duplicate attribute is caused by stray
code I left in (I guess I hoped nothing ends up DECL_ABSTRACT_P ...).
Thus the following patch removes it -- DW_AT_inline is solely set
by dwarf2out_abstract_function now.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.  The
gdb testsuite shows no regression.

Will commit once testing finished.

Note the assert is deliberately restricted to DW_AT_inline for now
given enabling it unconditionally fires left and right ... :/
(sth to fix, but only as followups)

Richard.

2017-08-29  Richard Biener  

* dwarf2out.c (add_dwarf_attr): When checking is enabled verify
we do not add a DW_AT_inline attribute twice.
(gen_subprogram_die): Remove code setting DW_AT_inline on
DECL_ABSTRACT_P nodes.

Index: gcc/dwarf2out.c
===
--- gcc/dwarf2out.c (revision 251401)
+++ gcc/dwarf2out.c (working copy)
@@ -4122,6 +4122,16 @@ add_dwarf_attr (dw_die_ref die, dw_attr_
   if (die == NULL)
 return;
 
+  if (flag_checking)
+{
+  /* Check we do not add duplicate attrs.  Can't use get_AT here
+ because that recurses to the specification/abstract origin DIE.  */
+  dw_attr_node *a;
+  unsigned ix;
+  FOR_EACH_VEC_SAFE_ELT (die->die_attr, ix, a)
+   gcc_assert (a->dw_attr != attr->dw_attr || a->dw_attr != DW_AT_inline);
+}
+
   vec_safe_reserve (die->die_attr, 1);
   vec_safe_push (die->die_attr, *attr);
 }
@@ -22082,28 +22092,6 @@ gen_subprogram_die (tree decl, dw_die_re
add_AT_flag (subr_die, DW_AT_rvalue_reference, 1);
}
 }
-  /* Tag abstract instances with DW_AT_inline.  */
-  else if (DECL_ABSTRACT_P (decl))
-{
-  if (DECL_DECLARED_INLINE_P (decl))
-   {
- if (cgraph_function_possibly_inlined_p (decl))
-   add_AT_unsigned (subr_die, DW_AT_inline, DW_INL_declared_inlined);
- else
-   add_AT_unsigned (subr_die, DW_AT_inline, 
DW_INL_declared_not_inlined);
-   }
-  else
-   {
- if (cgraph_function_possibly_inlined_p (decl))
-   add_AT_unsigned (subr_die, DW_AT_inline, DW_INL_inlined);
- else
-   add_AT_unsigned (subr_die, DW_AT_inline, DW_INL_not_inlined);
-   }
-
-  if (DECL_DECLARED_INLINE_P (decl)
- && lookup_attribute ("artificial", DECL_ATTRIBUTES (decl)))
-   add_AT_flag (subr_die, DW_AT_artificial, 1);
-}
   /* For non DECL_EXTERNALs, if range information is available, fill
  the DIE with it.  */
   else if (!DECL_EXTERNAL (decl) && !early_dwarf)


Re: [PING #2] [PATCH] enhance -Wrestrict to handle string built-ins (PR 78918)

2017-08-29 Thread Richard Biener
On Tue, Aug 29, 2017 at 2:20 AM, Martin Sebor  wrote:
> On 08/22/2017 02:45 AM, Richard Biener wrote:
>>
>> On Mon, Aug 21, 2017 at 10:10 PM, Martin Sebor  wrote:
>>>
>>> On 08/09/2017 10:14 AM, Jeff Law wrote:


 On 08/06/2017 05:08 PM, Martin Sebor wrote:

>>
>> Well, simply because the way as implemented isn't a must-alias query
>> but maybe one that's good enough for warnings (reduces false positives
>> but surely doesn't eliminate them).
>
>
>
> I'm very interested in reducing the rate of false positives in
> these and all other warnings.  As I mentioned in my comments,
> I did my best to weed them out of the implementation by building
> GDB, Glibc, Busybox, and the Linux kernel.  That of course isn't
> a guarantee that there aren't any.  But the first implementation
> of any non-trivial feature is never perfect, and hardly any
> warning of sufficient complexity is free of false positives, no
> matter here it's implemented (the middle-end, front-end, or
> a standalone static analysis tool).  If you spotted some cases
> I had missed I'd certainly be grateful for examples.  Otherwise,
> they will undoubtedly be reported as more software is exposed
> to the warning and, if possible, fixed, as happens with all
> other warnings.


 I think Richi is saying that the must alias query you've built isn't
 proper/correct.  It's less about false positives for Richi and more
 about building a proper must alias query if I understand him correctly.

 I suspect he's also saying that you can't reasonably build must-alias on
 top of a may-alias query framework.  They're pretty different queries.

 If you need something that is close to, but not quite a must alias
 query, then you're going to have to make a argument for that and you
 can't call it a must alias query.
>>>
>>>
>>>
>>> Attached is an updated and simplified patch that avoids making
>>> changes to any of the may-alias functions.  It turns out that
>>> all the information the logic needs to determine the overlap
>>> is already in the ao_ref structures populated by
>>> ao_ref_init_from_ptr_and_size.  The only changes to the pass
>>> are the enhancement to ao_ref_init_from_ptr_and_size to make
>>> use of range info and the addition of the two new functions
>>> used by the -Wrestrict clients outside the pass.
>>
>>
>> Warning for memcpy (p, p, ...) is going to fire false positives all around
>> given the C++ FE emits those in some cases and optimization can
>> expose that we are dealing with self-assignments.  And *p = *p is
>> valid.
>
>
> I changed it to only warn for calls to the library function and
> not to the __builtin_xxx.  Though I haven't been able to reproduce
> the problem you referring to (bug 32667) which makes me wonder if
> it's been fixed.

@@ -699,6 +706,15 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator *gsi,
  DEST{,+LEN,+LEN-1}.  */
   if (operand_equal_p (src, dest, 0))
 {
+  /* Avoid diagnosing exact overlap in calls to __builtin_memcpy.
+It's safe and may even be emitted by GCC itself (see bug
+32667).  However, diagnose it in explicit calls to the memcpy
+function.  */
+  if (check_overlap && *IDENTIFIER_POINTER (DECL_NAME (func)) != '_')
+   warning_at (loc, OPT_Wrestrict,

You can have explicit calls to __builtin_memcpy so that check looks bogus to me.
I think there's no way to distinguish the cases and I'd simply remove the check
and set TREE_NO_WARNING on the implicit calls generated by frontends.

>>
>> @@ -1028,6 +1066,10 @@ gimple_fold_builtin_memory_op (gimple_stmt_iterator
>> *gsi,
>> }
>> }
>>
>> +  /* Avoid folding the call if overlap is detected.  */
>> +  if (check_overlap && detect_overlap (loc, stmt, dest, src, len))
>> +   return false;
>> +
>>
>> no, please not.  You diagnosed the call (which might be a false positive)
>> so why keep it undefined?  The folded stmt will either have the same
>> semantics (aggregate copies expanded as memcpy) or have all reads
>> performed before writes.
>
>
> My goal was to follow the approach reflected in the comments
> elsewhere in the file:
>
> /* Out of bound array access.  Value is undefined,
>but don't fold.  */

This is done to keep -Warray-bounds triggering IIRC.  Also there's no
good value to fold to (well, we used zero).  I also vetoed emitting
warnings from folding code given that's inherently fragile ... (I think
I already said I don't like warning from gimple_fold_builtin_memory_op
too much either).

> While gimple_fold_builtin_memory_op may be able to provide well-
> defined behavior for the otherwise undefined semantics in a small
> subset of cases, it doesn't attempt to fold many more that it
> otherwise could (it only folds calls with sizes that are powers
> of 2).  So it seems of dubious value to make 

RE: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like numbers in GIMPLE.

2017-08-29 Thread Tamar Christina


> -Original Message-
> From: Richard Biener [mailto:rguent...@suse.de]
> Sent: 24 August 2017 10:16
> To: Tamar Christina
> Cc: Joseph Myers; Christophe Lyon; Markus Trippelsdorf; Jeff Law; GCC
> Patches; Wilco Dijkstra; Michael Meissner; nd
> Subject: RE: [PATCH][GCC][PATCHv3] Improve fpclassify w.r.t IEEE like
> numbers in GIMPLE.
> 
> On Wed, 23 Aug 2017, Tamar Christina wrote:
> 
> > Hi Richard,
> >
> > Thanks for the feedback,
> >
> > >
> > > I think you should split up your work a bit.  The pieces from
> > > fold_builtin_* you remove and move to lowering that require no
> > > control flow like __builtin_isnan (x) -> x UNORD x should move to
> > > match.pd patterns (aka foldings that will then be applied both on
> GENERIC and GIMPLE).
> > >
> >
> > But emitting them as an UNORD wouldn't solve the performance or
> > correctness issues The patch is trying to address. For correctness,
> > for instance an UNORD would still signal When -fsignaling-nans is used
> (PR/66462).
> 
> Well, then address the correctness issues separately.  I think lumping
> everything into a single patch just makes your live harder ;)

But the patch was made to fix the correctness issues. Which unless I'm mistaken
can only be solved using integer operations. To give a short history on the 
patch:

My original patch had this in builtins.c where the current code is and had
none of this problem. The intention of this patch was just address the 
correctness
issues with isnan etc.

Since we had to do this using integer operations we figured we'd also replace 
the
operations with faster ones so the code was tweaked in order to e.g. make the 
common
paths of fpclassify exit earlier. (I had provided benchmark results to show that
the integer operations are faster than the floating point ones, also on x86).

It was then suggested by upstream review that I do this as a gimple lowering
phase. I pushed back against this but ultimately lost and submitted a new 
version
that moved from builtins.c to gimple-low.c, which was why late expansion for C++
broke, because the expansion is now done very early.

The rational for this move was that it would give the rest of the pipeline a 
chance
to optimize the code further.

After a few revisions this was ultimately accepted (the v3 version), but was 
reverted
because it broke IBM's long double format due to the special case code for it 
violating SSA.

That was trivially fixed (once I finally had access to a PowerPC hardware 
months later) and I changed the
patch to leave the builtin as a function call if at lowering time it wasn't 
known to be a builtin.
This then "fixes" the C++ testcase, in that the test doesn't fail anymore, but 
it doesn't generate
the same code as before.
Though the test is quite contrived and I don't know if actual user code often 
has this.

While checking to see if this behavior is OK, I was suggested to do it as a 
gimple-fold instead.
I did, but then turns out it won't work for non-constants as I can't generate 
new control flow from
a fold (which makes sense in retrospect but I didn't know this before).

This is why the patch is more complex than what it was before. The original 
version only emitted simple tree.

> 
> I was saying that if I write isunordered (x, y) and use -fno-signaling-nans 
> (the
> default) then I would expect the same code to be generated as if I say isnan
> (x) || isnan (y).  And I'm not sure if isnan (x) || isnan (y) is more 
> efficient if
> both are implemented with integer ops compared to using an unordered FP
> compare.
> 

But isnan (x) || isnan (y) can be rewritten using a match.pd rule to 
isunordered (x, y)
If that should really generate the same code. So I don’t think it's an issue 
for this patch.

> A canonical high-level representation on GIMPLE is equally important as
> maybe getting some optimization on a lowered "optimized" form.
> 
> And a good canonical representation is short (x UNORD x or x UNORD y
> counts as short here).
> 
> > > As fallback the expanders should simply emit a call (as done for
> > > isinf when not folded for example).
> >
> > Yes the patch currently already does this.
> 
> Ah, I probably looked at an old version then.
> 
> > My question had more to do if
> > the late expansion when you alias one of these functions in C++ was
> > really an issue, since It used to (sometimes, only when you got the
> > type of the argument correct) expand before whereas now it'll always
> > just emit a function call in this edge case.
> 
> The answer here is really that we should perform lowering at a point where
> we do not omit possible optimizations.  For the particular case this would
> mean lowering after IPA optimizations (also think of accelerator offloading
> which uses the early optimization phase of the host).  This also allows early
> optimization to more accurately estimate code size (when optimizing for
> size).  And I still think whether to expose fp classification as FP or 
> integer ops
> should be a 

[C++ Patch] PR 70621 ("[6/7/8 Regression] ICE on invalid code at -O1 and above on x86_64-linux-gnu in record_reference, at cgraphbuild.c:64")

2017-08-29 Thread Paolo Carlini

Hi,

in this error recovery regression, we ICE only when optimizing, while 
building the cgraph. Avoiding the reported problem seems easy, just 
check the return value of duplicate_decls in start_decl and immediately 
return back error_mark_node. However, while working on the issue, I 
noticed something slightly more interesting, IMO: we have, after the 
relevant duplicate_decls call:


-  if (decl_spec_seq_has_spec_p (declspecs, ds_constexpr)
-  && !DECL_DECLARED_CONSTEXPR_P (field))
-error ("%qD declared % outside its class", 
field);


which I propose to remove. In fact - something I didn't really know - 
for well formed user code, duplicate_decls, near the end, does some 
memcpys which mean that its second argument (would be 'field' in the 
start_decl section we are looking at)  is adjusted to have a 
DECL_DECLARED_CONSTEXPR_P consistent with its first argument. That's of 
course because we want to accept snippets like:


struct A
{
  static const int x;
};

constexpr int A::x = 0;

In turn that means the error above is issued only when something went 
wrong in duplicate_decls in the first place. Thus the above error seems 
at least verbose. However, here in start_decl we are only handling VAR_P 
(decl), thus I don't think the message above even makes sense and could 
be misleading, given snippets like the above. Therefore, I propose to 
remove the diagnostic entirely, which overall also simplifies a patch 
dealing with c++/70621. The below passes testing as-is on x86_64-linux.


Thanks, Paolo.



/cp
2017-29-08  Paolo Carlini  

PR c++/70621
* decl.c (start_decl): Early return error_mark_node if duplicate_decls
returns it; avoid misleading error message.

/testsuite
2017-29-08  Paolo Carlini  

PR c++/70621
* g++.dg/torture/pr70621.C: New.
Index: cp/decl.c
===
--- cp/decl.c   (revision 251375)
+++ cp/decl.c   (working copy)
@@ -5023,11 +5023,12 @@ start_decl (const cp_declarator *declarator,
 about this situation, and so we check here.  */
  if (initialized && DECL_INITIALIZED_IN_CLASS_P (field))
error ("duplicate initialization of %qD", decl);
- if (duplicate_decls (decl, field, /*newdecl_is_friend=*/false))
+ field = duplicate_decls (decl, field,
+  /*newdecl_is_friend=*/false);
+ if (field == error_mark_node)
+   return error_mark_node;
+ else if (field)
decl = field;
-  if (decl_spec_seq_has_spec_p (declspecs, ds_constexpr)
-  && !DECL_DECLARED_CONSTEXPR_P (field))
-error ("%qD declared % outside its class", field);
}
}
   else
Index: testsuite/g++.dg/torture/pr70621.C
===
--- testsuite/g++.dg/torture/pr70621.C  (revision 0)
+++ testsuite/g++.dg/torture/pr70621.C  (working copy)
@@ -0,0 +1,13 @@
+float foo();
+
+struct A
+{
+  static float x;  // { dg-message "previous declaration" }
+};
+
+double A::x = foo();  // { dg-error "conflicting declaration" }
+
+void bar()
+{
+  A::x = 0;
+}


Re: [PATCH v3] Fix PR81503 (SLSR invalid fold)

2017-08-29 Thread Richard Biener
On Mon, Aug 28, 2017 at 10:16 PM, Bill Schmidt
 wrote:
>> On Aug 28, 2017, at 7:37 AM, Richard Biener  
>> wrote:
>>
>> Not sure, but would it be fixed in a similar way when writing
> 
>> ?
>
> Thanks, Richard, that works very well.  I decided this was a good opportunity 
> to also
> simplify the control flow a little with early returns.  Here's the result.  
> Bootstrapped and
> tested on powerpc64le-linux-gnu with no regressions.  Is this ok for trunk, 
> and
> eventually for backport to gcc 5, 6, and 7?  (I can omit the control flow 
> cleanups for
> the older releases if desired.)

Note I removed the to_shwi () != HOST_WIDE_INT_MIN check and we might end up
negating a signed MIN value, turning it into itself but doing plus -> minus in

> +  if (wi::neg_p (bump))
>  {
> -  enum tree_code code = PLUS_EXPR;
> -  tree bump_tree;
> -  gimple *stmt_to_print = NULL;
> +  code = MINUS_EXPR;
> +  bump = -bump;
> +}

I'm not sure if that's an issue in practice (I doubt).  I think we
keep whatever the
user wrote if you write that in source in regular folding, not doing x - INT_MIN
to x + INT_MIN canonicalization.  Was the != HOST_WIDE_INT_MIN check
done for any wrong-code issue?

Patch is ok.  Cleanups are fine to backport if they are obvious (the
patch doesn't tell that ;))
Please wait a few days with backporting though.

Thanks,
Richard.

> Thanks,
> Bill
>
> [gcc]
>
> 2017-08-03  Bill Schmidt  
> Jakub Jelinek  
> Richard Biener  
>
> PR tree-optimization/81503
> * gimple-ssa-strength-reduction.c (replace_mult_candidate): Ensure
> folded constant fits in the target type; reorder tests for clarity.
>
> [gcc/testsuite]
>
> 2017-08-03  Bill Schmidt  
> Jakub Jelinek  
> Richard Biener  
>
> PR tree-optimization/81503
> * gcc.c-torture/execute/pr81503.c: New file.
>
>
> Index: gcc/gimple-ssa-strength-reduction.c
> ===
> --- gcc/gimple-ssa-strength-reduction.c (revision 251369)
> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> @@ -2089,104 +2089,104 @@ replace_mult_candidate (slsr_cand_t c, tree basis_
>tree target_type = TREE_TYPE (gimple_assign_lhs (c->cand_stmt));
>enum tree_code cand_code = gimple_assign_rhs_code (c->cand_stmt);
>
> -  /* It is highly unlikely, but possible, that the resulting
> - bump doesn't fit in a HWI.  Abandon the replacement
> - in this case.  This does not affect siblings or dependents
> - of C.  Restriction to signed HWI is conservative for unsigned
> - types but allows for safe negation without twisted logic.  */
> -  if (wi::fits_shwi_p (bump)
> -  && bump.to_shwi () != HOST_WIDE_INT_MIN
> -  /* It is not useful to replace casts, copies, negates, or adds of
> -an SSA name and a constant.  */
> -  && cand_code != SSA_NAME
> -  && !CONVERT_EXPR_CODE_P (cand_code)
> -  && cand_code != PLUS_EXPR
> -  && cand_code != POINTER_PLUS_EXPR
> -  && cand_code != MINUS_EXPR
> -  && cand_code != NEGATE_EXPR)
> +  /* It is not useful to replace casts, copies, negates, or adds of
> + an SSA name and a constant.  */
> +  if (cand_code == SSA_NAME
> +  || CONVERT_EXPR_CODE_P (cand_code)
> +  || cand_code == PLUS_EXPR
> +  || cand_code == POINTER_PLUS_EXPR
> +  || cand_code == MINUS_EXPR
> +  || cand_code == NEGATE_EXPR)
> +return;
> +
> +  enum tree_code code = PLUS_EXPR;
> +  tree bump_tree;
> +  gimple *stmt_to_print = NULL;
> +
> +  if (wi::neg_p (bump))
>  {
> -  enum tree_code code = PLUS_EXPR;
> -  tree bump_tree;
> -  gimple *stmt_to_print = NULL;
> +  code = MINUS_EXPR;
> +  bump = -bump;
> +}
>
> -  /* If the basis name and the candidate's LHS have incompatible
> -types, introduce a cast.  */
> -  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
> -   basis_name = introduce_cast_before_cand (c, target_type, basis_name);
> -  if (wi::neg_p (bump))
> +  /* It is possible that the resulting bump doesn't fit in target_type.
> + Abandon the replacement in this case.  This does not affect
> + siblings or dependents of C.  */
> +  if (bump != wi::ext (bump, TYPE_PRECISION (target_type),
> +  TYPE_SIGN (target_type)))
> +return;
> +
> +  bump_tree = wide_int_to_tree (target_type, bump);
> +
> +  /* If the basis name and the candidate's LHS have incompatible types,
> + introduce a cast.  */
> +  if (!useless_type_conversion_p (target_type, TREE_TYPE (basis_name)))
> +basis_name = introduce_cast_before_cand (c, target_type, basis_name);
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +{
> +  fputs ("Replacing: ", 

Re: [C++ PATCH] Kill CLASSTYPE_SORTED_FIELDS

2017-08-29 Thread Richard Biener
On Mon, Aug 28, 2017 at 9:08 PM, Nathan Sidwell  wrote:
> Some quick tests show that memory use increased by 1.5%  Compilation time
> reduced by 1% to 3%
>
> Your comment on IRC about not needing a identifier->decl map, just a decl
> hash, because the decl already has the identifier is a good one.  I recall
> considering that when converting namespaces, but there the anonymous
> namespace has a NULL name, which is annoying.  I just used the same map
> table here.
>
> However, it's probably worth the effort.
>
> It'd also be nice if the hash table primitive didn't unconditionally hold
> instrumentation counters.  AFAICT they only get folded to global counters
> upon destruction of the hash table.  Which for these things never happens.

There's some debug methods like ::collision that use the fields.

I think for embedding hash_table it would be nice to subclass it,
have hash_table_embed not including those fields / methods and have
hash_table derive from it.  Not sure what to do about things like
hash_map and hash_set -- either template them on the hash-table
type or always use the embed variant and see if sth blows up.

Richard.

> Remember, when I succeed in folding METHOD_VEC into the same structure,
> we'll have several members in this table for all but the simplest of
> structures.
>
> nathan
>
> --
> Nathan Sidwell


Re: Improve DOM's ability to derive equivalences when traversing edges

2017-08-29 Thread Christophe Lyon
Hi Jeff,


On 29 August 2017 at 07:07, Jeff Law  wrote:
> This is a two part patchkit to improve DOM's ability to derive constant
> equivalences that arise as a result of traversing a particular edge in
> the CFG.
>
> Until now we only allowed a single NAME = NAME|CONST equivalence to be
> associated with an edge in the CFG.  Patch #1 generalizes that code so
> that we can record multiple simple equivalences on an edge.  Much like
> expression equivalences, we just shove them into a vec and iterate on
> the vec in the appropriate places.
>
> Patch #2 has the interesting bits.
>
> Back in the gcc-7 effort I added the ability to look at the operands of
> a BIT_IOR_EXPR that had a zero result.  In that case each operand of the
> BIT_IOR must have a zero value.  This was to address a missed
> optimization regression bug during stage4.
>
> The plan was always to add analogous BIT_AND support, but I didn't feel
> like handling BIT_AND was appropriate at the time (no bz entry and no
> regressions related to that capability).
>
> I'd also had the sense that further improvements could be made here. For
> example, it is common for the BIT_IOR or BIT_AND to be fed by a
> comparison and we ought to be able to record the result of the
> comparison.  If the comparison happened to be an equality test, then we
> may ultimately derive a constant for on operand of the equality test as
> well.
>
> It also seemed like the NOP/CONVERT_EXPR handling could be incorporated
> into such a change.
>
> So I pulled together some instrumentation.  Lots of things generate
> equivalences -- but a much smaller subset of those equivalences are
> ultimately useful.
>
> Probably the most surprising was BIT_XOR, which allows us to generate
> all kinds of equivalences, but none that were useful for ultimate
> simplification in any of the tests I looked at.
>
>
> The most subtle was COND_EXPRs.  We might have something like
>
> res = (a != 5) ? x : 1;
>
>
> We can't actually derive anything useful for "a" here, even if we know
> the result is one.  That's because "x" could have the value 1.  So you
> end up only being able to derive equivalences for COND_EXPRs when both
> arms have a constant value.  That restriction dramatically reduces the
> utility of handling COND_EXPR -- to the point where I'm not including it.
>
> So what we end up with is BIT_AND/BIT_IOR, conversions, plus/minus,
> comparisons and neg/not.
>
> So when we determine that a particular SSA_NAME has a constant value, we
> look at the defining statement and essentially try to derive a value for
> the input operand(s) based on knowing the result value.  If we can
> derive a constant value for an input operand, we record that value and
> recurse.
>
> In cases where we walk backwards to a condition.  We will record the
> condition into the available expression table.
>
>
> The code is written such that if we find cases where the equivalences
> for other nodes are useful, they're easy to add.
>
>
> These equivalences are most useful to the threader, but I've seen them
> help in other cases as well.  There's a half-dozen or so new tests
> reduced from GCC itself.
>
> Bootstrapped and regression tested on x86_64, lightly tested on ppc64le
> via bootstrapping and running the new tests to verify they do the right
> thing on a !logical_op_short_circuit target.
>
> Installing on the trunk.
>
> Jeff
>
>
> commit 506ac60cacbc4c4e5e166513ea83c1d2e14eaf3b
> Author: law 
> Date:   Tue Aug 29 05:03:22 2017 +
>
> * tree-ssa-dom.c (class edge_info): Changed from a struct
> to a class.  Add ctor/dtor, methods and data members.
> (edge_info::edge_info): Renamed from allocate_edge_info.
> Initialize additional members.
> (edge_info::~edge_info): New.
> (free_dom_edge_info): Delete the edge info.
> (record_edge_info): Use new class & associated member functions.
> Tighten forms for testing for edge equivalences.
> (record_temporary_equivalences): Iterate over the simple
> equivalences rather than assuming there's only one per edge.
> (cprop_into_successor_phis): Iterate over the simple
> equivalences rather than assuming there's only one per edge.
> (optimize_stmt): Use operand_equal_p rather than pointer
> equality for mini-DSE code.
>
> git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@251396 
> 138bc75d-0d04-0410-961f-82ee72b054a4
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index abe7d855260..258d4242f30 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,20 @@
> +2017-08-28  Jeff Law  
> +
> +   * tree-ssa-dom.c (class edge_info): Changed from a struct
> +   to a class.  Add ctor/dtor, methods and data members.
> +   (edge_info::edge_info): Renamed from allocate_edge_info.
> +   Initialize additional members.
> +

Re: [testsuite, i386] Require -static support in gcc.dg/pie-static-[12].c (PR testsuite/81793)

2017-08-29 Thread Rainer Orth
Hi Iain,

>> All this begs the question why on earth would darwin.h (STARTFILE_SPEC)
>> accept -static and try to link libcrt0.o it the latter doesn't exist.
>> 
>> However, I've just found that the bundled clang on Darwin 11 does the
>> same (and fails in the same way: "ld: library not found for -lcrt0.o”).
>
> I think it’s because when one is bringing up the system, then one does (or
> at least used to) have a static libc/crt set.
> Thus the compiler did/does need to support it for that case.  I’ve not done
> a kernel bootstrap since ≈ Darwin9 era, so things might have changed and
> this could be history leaking through.

I've now found the following statement:

https://developer.apple.com/library/content/qa/qa1118/_index.html

effectively declaring statically linked binaries unsupported (same as
newer Solaris versions do, btw.).

I've now checked clang from Darwin 17 (Xcode 9 Beta) and it behaves the
same.  I've found the following in clang sources

  } else {
if (Args.hasArg(options::OPT_static) ||
Args.hasArg(options::OPT_object) ||
Args.hasArg(options::OPT_preload)) {
  CmdArgs.push_back("-lcrt0.o");

in lib/Driver/ToolChains/Darwin.cpp (Darwin::addStartObjectFileArgs).
Several comments seem to indicate that most of this is just a
straightforward translation of GCC's Darwin specs to C++ ;-)

Overall, mostly a historical artifact these days, it seems.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH][compare-elim] Merge zero-comparisons with normal ops

2017-08-29 Thread Kyrill Tkachov

Hi Jeff,
Some comments since I wrote this patch some time ago...

On 28/08/17 19:26, Jeff Law wrote:

On 08/10/2017 03:14 PM, Michael Collison wrote:

Hi all,

One issue that we keep encountering on aarch64 is GCC not making good use of 
the flag-setting arithmetic instructions
like ADDS, SUBS, ANDS etc. that perform an arithmetic operation and compare the 
result against zero.
They are represented in a fairly standard way in the backend as PARALLEL 
patterns:
(parallel [(set (reg x1) (op (reg x2) (reg x3)))
(set (reg cc) (compare (op (reg x2) (reg x3)) (const_int 0)))])

GCC isn't forming these from separate arithmetic and comparison instructions as 
aggressively as it could.
A particular pain point is when the result of the arithmetic insn is used 
before the comparison instruction.
The testcase in this patch is one such example where we have:
(insn 7 35 33 2 (set (reg/v:SI 0 x0 [orig:73  ] [73])
 (plus:SI (reg:SI 0 x0 [ x ])
 (reg:SI 1 x1 [ y ]))) "comb.c":3 95 {*addsi3_aarch64}
  (nil))
(insn 33 7 34 2 (set (reg:SI 1 x1 [77])
 (plus:SI (reg/v:SI 0 x0 [orig:73  ] [73])
 (const_int 2 [0x2]))) "comb.c":4 95 {*addsi3_aarch64}
  (nil))
(insn 34 33 17 2 (set (reg:CC 66 cc)
 (compare:CC (reg/v:SI 0 x0 [orig:73  ] [73])
 (const_int 0 [0]))) "comb.c":4 391 {cmpsi}
  (nil))

This scares combine away as x0 is used in insn 33 as well as the comparison in 
insn 34.
I think the compare-elim pass can help us here.

Is it the multiple use or the hard register that combine doesn't
appreciate.  The latter would definitely steer us towards compare-elim.


It's the multiple use IIRC.


This patch extends it by handling comparisons against zero, finding the 
defining instruction of the compare
and merging the comparison with the defining instruction into a PARALLEL that 
will hopefully match the form
described above. If between the comparison and the defining insn we find an 
instruction that uses the condition
registers or any control flow we bail out, and we don't cross basic blocks.
This simple technique allows us to catch cases such as this example.

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu and 
x86_64.

Ok for trunk?

2017-08-05  Kyrylo Tkachov  
Michael Collison 

* compare-elim.c: Include emit-rtl.h.
(can_merge_compare_into_arith): New function.
(try_validate_parallel): Likewise.
(try_merge_compare): Likewise.
(try_eliminate_compare): Call the above when no previous clobber
is available.
(execute_compare_elim_after_reload): Add DF_UD_CHAIN and DF_DU_CHAIN
dataflow problems.

2017-08-05  Kyrylo Tkachov  
Michael Collison 

* gcc.target/aarch64/cmpelim_mult_uses_1.c: New test.


pr5198v1.patch


diff --git a/gcc/compare-elim.c b/gcc/compare-elim.c
index 7e557a2..c65d155 100644
--- a/gcc/compare-elim.c
+++ b/gcc/compare-elim.c
@@ -65,6 +65,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "tm_p.h"
  #include "insn-config.h"
  #include "recog.h"
+#include "emit-rtl.h"
  #include "cfgrtl.h"
  #include "tree-pass.h"
  #include "domwalk.h"
@@ -579,6 +580,145 @@ equivalent_reg_at_start (rtx reg, rtx_insn *end, rtx_insn 
*start)
return reg;
  }
  
+/* Return true if it is okay to merge the comparison CMP_INSN with

+   the instruction ARITH_INSN.  Both instructions are assumed to be in the
+   same basic block with ARITH_INSN appearing before CMP_INSN.  This checks
+   that there are no uses or defs of the condition flags or control flow
+   changes between the two instructions.  */
+
+static bool
+can_merge_compare_into_arith (rtx_insn *cmp_insn, rtx_insn *arith_insn)
+{
+  for (rtx_insn *insn = PREV_INSN (cmp_insn);
+   insn && insn != arith_insn;
+   insn = PREV_INSN (insn))
+{
+  if (!NONDEBUG_INSN_P (insn))
+   continue;
+  /* Bail if there are jumps or calls in between.  */
+  if (!NONJUMP_INSN_P (insn))
+   return false;
+
+  df_ref ref;
+  /* Find a USE of the flags register.  */
+  FOR_EACH_INSN_USE (ref, insn)
+   if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+ return false;
+
+  /* Find a DEF of the flags register.  */
+  FOR_EACH_INSN_DEF (ref, insn)
+   if (DF_REF_REGNO (ref) == targetm.flags_regnum)
+ return false;
+}
+  return true;
+}

What about old style asms?  Do you need to explicit punt those?  I don't
think they carry DF info.


I think we want to bail out on those to be safe. How are they 
represented in RTL?



Don't you also have to verify that the inputs to ARITH_INSN are
unchanged between ARITH_INSN and CMP_INSN?


I don't think so. As long as there is no flag clobbering instructions 
between them we should be ok.


Consider:
r1 := r2 + r3  // arith_insn
r2 := r4 + r5  // 

Re: [PATCH][RFC] Patch candidate for other/39851

2017-08-29 Thread Martin Liška
On 08/28/2017 05:59 PM, Joseph Myers wrote:
> On Tue, 8 Aug 2017, Martin Liška wrote:
> 
>> Hi.
>>
>> As mentioned in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39851#c0 we need
>> to call targetm.target_option.override () in order to properly report which
>> ISA options are enabled for a -march/-mtune. Currently, opts.c uses just
>> #include "common/common-target.h" we are unable to call the function direct.
>> One solution might be to put the hook to gcc/common/common-target.def, but
>> that would require huge refactoring of i386.c and i386-common.c files.
>>
>> Thus I came with a small hook that lives in cl_option_handler_func.
>> With that I see proper results for test-case mentioned in the PR.
> 
> This patch is OK.  As you note, making the targetm.target_option.override 
> hook into a common option would involve much refactoring, because many of 
> those hooks mix things acting purely on the options structure (which could 
> readily become common) and things relating to other back-end state (which 
> would need to stay in a hook that's only used in the compiler proper, not 
> the driver).
> 

Thanks for the review. I've just installed the patch and a small fallout for 
Ada.

Martin
>From 891cac45682dddbecb343c3a7660ea0ab264373e Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 29 Aug 2017 10:34:32 +0200
Subject: [PATCH] Fix --help=target (Ada) (PR other/39851)

gcc/ada/ChangeLog:

2017-08-29  Martin Liska  

	PR other/39851
	* gcc-interface/trans.c (Pragma_to_gnu): Set argument to NULL.
---
 gcc/ada/gcc-interface/trans.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
index 67044b7b574..c0c6fb30915 100644
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -1486,7 +1486,7 @@ Pragma_to_gnu (Node_Id gnat_node)
 	else
 	  option_index = 0;
 
-	set_default_handlers ();
+	set_default_handlers (, NULL);
 	control_warning_option (option_index, (int) kind, arg, imply, location,
 lang_mask, , _options,
 _options_set, global_dc);
-- 
2.14.1



Re: [PATCH] Fix PR81921

2017-08-29 Thread Richard Biener
On Mon, 28 Aug 2017, Joseph Myers wrote:

> On Mon, 28 Aug 2017, Joseph Myers wrote:
> 
> > On Sat, 26 Aug 2017, Richard Biener wrote:
> > 
> > > On August 26, 2017 12:51:57 AM GMT+02:00, Joseph Myers 
> > >  wrote:
> > > >I'm seeing a build failure for s390x-linux-gnu that looks like it could
> > > >be 
> > > >related to this change (build was OK at r251332, failed at r251358).
> > > 
> > > Can you please open a bug? Can you confirm it fails the same way before 
> > > the patch if you use - flto?
> > 
> > Bug 82012 filed (as noted there, I don't know if the problem is in the 
> > compiler or libitm).  I can confirm the same failure appears building the 
> > preprocessed source with a GCC 7 branch compiler with -flto added to the 
> > options, rebuilding trunk r251332 to check with that.
> 
> Now confirmed with -flto with trunk r251332.

Thanks.  This is a target issue -- targets implementing
TARGET_OPTION_VALID_ATTRIBUTE_P need to also implement
TARGET_CAN_INLINE_P which s390 fails to.  nios2 as well.

Richard.



[PING] [PATCH, 3/3] Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx plugin

2017-08-29 Thread Tom de Vries

On 07/04/2017 12:23 PM, Tom de Vries wrote:

On 07/04/2017 12:05 PM, Tom de Vries wrote:

On 07/03/2017 04:24 PM, Tom de Vries wrote:

On 07/03/2017 04:08 PM, Thomas Schwinge wrote:

Hi!

On Mon, 26 Jun 2017 17:29:11 +0200, Jakub Jelinek  
wrote:

On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote:

On Mon, 26 Jun 2017, Tom de Vries wrote:

2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp 
nvptx plugin


This patch adds handling of:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
- GOMP_OPENACC_NVPTX_DISASM=[01]


Why the "OPENACC" in these names?


I took the format from 'GOMP_OPENACC_DIM'.


Doesn't this debugging aid apply to
any variant of offloading?


I guess you're right. These environment variables would also be 
applicable for f.i. offloading via openmp on nvptx. I'll strip the 
'OPENACC_' bit from the variables.


The filename used for dumping the module is 
plugin-nvptx..cubin.


Also, I suggest to make these names similar to their controlling 
options,

that is: "gomp-nvptx*", for example.



Makes sense, will do.


Changes in the patch series:
- removed OPENACC_ from environment variable names
- made temp files use gomp-nvptx prefix.
- fixed build error due to missing _GNU_SOURCE in libgomp-nvptx.c.
- merged the three GOMP_NVPTX_JIT patches into one
- rewrote GOMP_NVPTX_JIT to add no extra flags to the JIT compiler
   invocation if GOMP_NVPTX_JIT if not defined, removing the need for
   hardcoding default values
- added CU_JIT_TARGET to plugin/cuda/cuda.h

Build on x86_64 with nvptx offloading enabled (using plugin/cuda/cuda.h).

The patch series now looks like:
1. Handle GOMP_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
2. Handle GOMP_NVPTX_PTXRW in libgomp nvptx plugin
3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx
plugin

I'll repost the patch series in reply to this email.



3. Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx
plugin
( combination of 3 GOMP_NVPTX_JIT patches originally submitted at:
  https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01921.html
  https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01920.html
  https://gcc.gnu.org/ml/gcc-patches/2017-06/msg02407.html )




Ping. I'd like to use GOMP_NVPTX_JIT in a workaround for a cuda JIT bug 
triggered in libgomp.c/for-5.c (see PR81805), like this:

...
 /* { dg-set-target-env-var GOMP_NVPTX_JIT "-O0" } */
...

Thanks,- Tom



0003-Handle-GOMP_NVPTX_JIT-O-0-4-ori-arch-n-in-libgomp-nvptx-plugin.patch


Handle GOMP_NVPTX_JIT={-O[0-4],-ori,-arch=} in libgomp nvptx plugin

2017-06-26  Tom de Vries  

* plugin/cuda/cuda.h (enum CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL,
CU_JIT_NEW_SM3X_OPT and CU_JIT_TARGET.
* plugin/plugin-nvptx.c (parse_number): New function.
(process_GOMP_NVPTX_JIT): New function.
(link_ptx): Add CU_JIT_OPTIMIZATION_LEVEL, CU_JIT_NEW_SM3X_OPT and
CU_JIT_TARGET to opts if specified.

---
  libgomp/plugin/cuda/cuda.h|   5 +-
  libgomp/plugin/plugin-nvptx.c | 108 --
  2 files changed, 109 insertions(+), 4 deletions(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 25d5d19..7d190f1 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -88,7 +88,10 @@ typedef enum {
CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4,
CU_JIT_ERROR_LOG_BUFFER = 5,
CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
-  CU_JIT_LOG_VERBOSE = 12
+  CU_JIT_OPTIMIZATION_LEVEL = 7,
+  CU_JIT_TARGET = 9,
+  CU_JIT_LOG_VERBOSE = 12,
+  CU_JIT_NEW_SM3X_OPT = 15
  } CUjit_option;
  
  typedef enum {

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index cc2ee5e..f5b9502 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -144,6 +144,10 @@ init_cuda_lib (void)
  
  #include "secure_getenv.h"
  
+#if CUDA_VERSION < 8000

+#define CU_JIT_NEW_SM3X_OPT 15
+#endif
+
  /* Convenience macros for the frequently used CUDA library call and
 error handling sequence as well as CUDA library calls that
 do the error checking themselves or don't do it at all.  */
@@ -1106,11 +1110,77 @@ post_process_ptx (unsigned num, const char **res_code, 
size_t *res_size)
  }
  
  static bool

+parse_number (const char *c, unsigned long* resp, char **end)
+{
+  unsigned long res;
+
+  errno = 0;
+  res = strtoul (c, end, 10);
+  if (errno)
+return false;
+
+  *resp = res;
+  return true;
+}
+
+static void
+process_GOMP_NVPTX_JIT (intptr_t *gomp_nvptx_o, intptr_t *gomp_nvptx_ori,
+   uintptr_t *gomp_nvptx_target)
+{
+  const char *var_name = "GOMP_NVPTX_JIT";
+  const char *env_var = getenv (var_name);
+  notify_var (var_name, env_var);
+
+  if (env_var == NULL)
+return;
+
+  const char *c = env_var;
+  while (*c != '\0')
+{
+  while (*c == ' ')
+   c++;
+
+  if (c[0] == '-' && c[1] == 'O'
+ && '0' <= c[2] && c[2] <= '4'